Last Updated: 2020-09-07 16:41:49 UTC
by Didier Stevens (Version: 1)
A reader asked whether a particular Emotet sample was a malformed ZIP file. It is not, and in this diary entry I will explain why you might think it is.
I create an example Word document and save it as a .doc file (OLE file).
When I look at it with my tool zipdump.py, I get this output:
Why do I get output for a ZIP file when the .doc file is an OLE file?
What the reader noticed is that when they used my tool zipdump.py with option -f L to find and list all PKZIP records, the output showed that there was data before the first PKZIP record (p = prefix, 10566 bytes) and after the last PKZIP record (s = suffix, 12898 bytes):
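The prefix and suffix values come from scanning a file for PKZIP record signatures. Below is a minimal sketch of that idea in Python, not the actual zipdump.py code. It searches for the local file header, central directory file header and end of central directory signatures, takes the offset of the first record found as the prefix, and, assuming the last record is an end of central directory record, computes the suffix as the bytes after that record (22 fixed bytes plus a variable-length comment). The function names are hypothetical. A naive scan like this can also report false positives, because these 4-byte sequences can occur by chance in unrelated data.

```python
import struct

# PKZIP record signatures: "PK" followed by two bytes identifying the record type.
SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory file header",
    b"PK\x05\x06": "end of central directory",
}


def find_pkzip_records(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, record type) for every PKZIP signature found in data, sorted by offset."""
    records = []
    for signature, name in SIGNATURES.items():
        start = 0
        while (offset := data.find(signature, start)) != -1:
            records.append((offset, name))
            start = offset + 1
    return sorted(records)


def prefix_and_suffix(data: bytes) -> tuple[int, int]:
    """Count the bytes before the first PKZIP record and after the last EOCD record."""
    records = find_pkzip_records(data)
    if not records:
        raise ValueError("no PKZIP records found")
    prefix = records[0][0]
    eocd_offsets = [offset for offset, name in records if name == "end of central directory"]
    if not eocd_offsets:
        raise ValueError("no end of central directory record found")
    last_eocd = eocd_offsets[-1]
    # The fixed part of the EOCD record is 22 bytes; its last two bytes
    # (little-endian) hold the length of the comment that follows it.
    (comment_length,) = struct.unpack_from("<H", data, last_eocd + 20)
    end = last_eocd + 22 + comment_length
    return prefix, len(data) - end
```

Applied to the bytes of a .doc file, prefix_and_suffix would report the OLE data surrounding the embedded ZIP as prefix and suffix.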
We have indeed seen ZIP files with data prepended or appended to try to fool anti-virus products, but that is not the case here.
What is going on is that each .doc file created with Office contains an embedded ZIP file with theme data.
When I use oledump.py with its YARA option to do an ad hoc search for the filename theme1.xml, I see that this string is in the 1Table stream. This is where the ZIP file is embedded:
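Once you have the raw bytes of the 1Table stream, the embedded ZIP file can be carved out and opened with Python's standard zipfile module. The sketch below assumes the stream content has already been extracted with an OLE parser (for example the olefile package's OleFileIO.openstream method, which is not shown here), and that the stream holds a single embedded ZIP file. The helper names carve_zip and list_embedded_zip are hypothetical. Carving runs from the first local file header to the end of the last end of central directory record.

```python
import io
import struct
import zipfile

LOCAL_FILE_HEADER = b"PK\x03\x04"
END_OF_CENTRAL_DIRECTORY = b"PK\x05\x06"


def carve_zip(stream: bytes) -> bytes:
    """Return the bytes from the first local file header to the end of the last EOCD record."""
    start = stream.find(LOCAL_FILE_HEADER)
    eocd = stream.rfind(END_OF_CENTRAL_DIRECTORY)
    if start == -1 or eocd == -1 or eocd < start:
        raise ValueError("no embedded ZIP file found")
    # 22 fixed EOCD bytes, plus the comment whose length is stored at offset 20.
    (comment_length,) = struct.unpack_from("<H", stream, eocd + 20)
    return stream[start:eocd + 22 + comment_length]


def list_embedded_zip(stream: bytes) -> list[str]:
    """Open the carved ZIP file in memory and return its member names."""
    with zipfile.ZipFile(io.BytesIO(carve_zip(stream))) as archive:
        return archive.namelist()
```

Because the embedded ZIP file's internal offsets are relative to its own first byte, the carved bytes form a standalone ZIP file that zipfile can read without any adjustment.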
This file theme1.xml, found in a ZIP file embedded in an OLE file (.doc), is also present in the OOXML format (.docx):
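Since a .docx file is itself a ZIP file, the same kind of member can be looked up directly. The following is a small illustrative sketch; the function name theme_members is hypothetical, and it simply matches any member whose file name is theme1.xml, regardless of the folder it is stored in.

```python
import io
import zipfile


def theme_members(data: bytes) -> list[str]:
    """Return the names of all members called theme1.xml in ZIP-based content such as a .docx."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [name for name in archive.namelist() if name.rsplit("/", 1)[-1] == "theme1.xml"]
```

For a file on disk, you could call it as theme_members(pathlib.Path("example.docx").read_bytes()), where example.docx is a placeholder filename.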
.doc files (and also .xls files) created with Microsoft Office contain an embedded ZIP file with theme data, and this ZIP file can be found with zipdump.py.