Phish or scam? - Part 2
We continue the MSG analysis of yesterday.
There are several ways to take a look at the text contained in a Word .docx file without using MS Office.
Here we will look at the raw XML. The content of a Word file is stored in the following file:


As you can see, the text of the document is contained between XML tags. Filtering out these XML tags, for example with a regular expression and SED, reveals the text without any formatting:

But it can be harder to understand without any new lines. And sometimes, this method will strip away info you want to see.
That is why I wrote a simple tool in Python that reads XML and can extract various information: xmldump.py.
You can achieve the same result as with sed by using command xmldump.py text:

Command wordtext is like command text, but it looks for paragraphs (<w:p>) and inserts a newline after extracting the text of each paragraph:


From the content of the Word document, it's clear that this is a scam.
Just for the sake of trying to be thorough, I poked around a bit looking for exploits or feature abuse (like DDE), but found nothing.
 
Didier Stevens
Microsoft MVP Consumer Security
blog.DidierStevens.com DidierStevensLabs.com
 
              
Comments