Last Updated: 2015-01-02 19:42:31 UTC
by Rick Wanner (Version: 1)
In his “Rocket Kitten” diary entry, Johannes introduces research by Gadi Evron and Tillmann Werner. They analyzed a PE-file embedded in the VBA macro code of an XLSM spreadsheet.
I want to show you how you can quickly analyze MS Offices documents and extract files. Just using my Python oledump tool, nothing else. You don’t need MS Office for this analysis.
First we run oledump on sample 266CFE755A0A66776DF9FD8CD2FEE1F1.xlsm like this:
The first line (A: ) indicates that oledump found an OLE file named xl/vbaProject.bin inside the XLSM file. Remember that the new MS Office file format (.docx, .xlsm, …) is a set of XML files stored inside a ZIP file. But VBA macros are not stored in XML files, they still use the older MS Office file format: OLE files.
oledump reports the streams it finds inside the OLE file: from index A1 through A10. A letter M next to the index is an indicator for the presence of VBA code. A lowercase letter m indicates VBA code with only Attribute statements, an uppercase letter M indicates more sophisticated VBA code, i.e. code with other statement types than Attribute statements.
If oledump finds streams with VBA macros, I always look first at the streams marked with an uppercase letter M, as these contain the most promising code.
After the column with the macro indicator M, comes a column with the size (in bytes) of the stream and another column with the full name of the stream.
Let’s take a look at the VBA code in stream A3 like this:
oledump.py –s A3 –v 266CFE755A0A66776DF9FD8CD2FEE1F1.xlsm
Option –s A3 selects stream A3 for analysis, and option –v decompresses the VBA source code.
Here is a part of the VBA source code. Remark function A0: it concatenates characters generated with function Chr into a long string. If you’re very familiar with the ASCII table, you know that Chr(77) + Chr(90) evaluates to string MZ. So this is very likely a PE file.
Looking at the bottom of the source code, we get a better idea:
With this it becomes clear that a very long series of Chr functions is concatenated into a long string s, and that this string s is written to a file with name agent2.dll, and then executed via rundll32.
So in a nutshell: this VBA macro contains a DLL file, encoded in Chr functions, which is written to disk and then executed automatically when the spreadsheet is opened.
What we would like to do now is extract the embedded DLL file without having to execute the VBA macro. This can be done with a special decoder. oledump supports decoders written in Python. Decoders transform the content of a stream into another form. We are going to use the Chr decoder: decoder_chr. This decoder finds all Chr functions in VBA source code, takes the numerical argument, converts it to a byte and concatenates all bytes.
Here is how this works:
By default, you get a hex-ascii dump of the embedded file. Now you can see that the embedded file is a PE file.
Last, we dump (option –d) the embedded PE file and pipe it through pecheck (a tool to analyze PE files):
The MD5 of the PE file is c222199c9a7eb0d162d5e96955739447. That is one of the IOCs Johannes included in his diary entry.
Oledump can be found on my blog.
-- Didier Stevens