Identifying Files: Failure Happens
Last Updated: 2019-02-19 21:45:33 UTC
by Didier Stevens (Version: 1)
I regularly post diary entries analyzing malware. A couple of times, I have also posted diary entries about files that turned out not to be malicious.
In this diary entry, I analyse a file that I can't classify as malicious or benign: I just fail to identify its file type.
With oledump.py, I take a look at the .msg file that was given to me:
Streams 1 through 7 contain the attachment content and metadata. Plugin plugin_msg provides more info:
The file extension is .g, and the MIME tag is application/octet-stream. This MIME tag means that the sending client did not identify the file type, beyond it being a stream of bytes.
To identify files for Windows machines, one can use the file extension or take a look at the content. I'm not familiar with extension .g, thus I start with the content.
I'm none the wiser after looking at the content (stream 3):
file-magic.py is a tool that uses libmagic to try to identify files:
Identification "data" means that libmagic was not able to identify the file type based on the content. It also tells me that this is not a text file: libmagic would have reported it as text.
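To illustrate the idea behind content-based identification, here is a minimal Python sketch. It is not libmagic and does not use its database: the signature table is a small, hand-picked assumption, and the function name identify is hypothetical. It checks a few well-known magic numbers, then falls back to "ASCII text" or "data".

```python
# Illustrative signature table: a tiny hand-picked subset, NOT libmagic's database.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
    (b"MZ", "DOS/Windows executable"),
]

# Printable ASCII plus tab, line feed and carriage return.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def identify(content: bytes) -> str:
    """Return a rough description of content based on its leading bytes."""
    for magic, description in SIGNATURES:
        if content.startswith(magic):
            return description
    # No known header: distinguish plain text from everything else.
    if content and all(b in TEXT_BYTES for b in content):
        return "ASCII text"
    # Same fallback label that libmagic uses for unidentified binary content.
    return "data"
```

A file like the one analyzed here, with no known header and many non-printable bytes, ends up in the "data" fallback.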
byte-stats.py is a tool that calculates all kinds of byte statistics for its input. It's the next tool I try in order to identify the attachment:
What I take from these numbers is that all 256 possible byte values are present in this file, and that the entropy (7.96...) is close to the maximum value of 8.0.
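The two numbers discussed here, the count of distinct byte values and the entropy in bits per byte, can be computed with a few lines of Python. This is a sketch of the calculation only, not the code of byte-stats.py; the function name byte_stats is an assumption.

```python
import math
from collections import Counter


def byte_stats(content: bytes) -> tuple[int, float]:
    """Return (number of distinct byte values, Shannon entropy in bits per byte)."""
    total = len(content)
    if total == 0:
        return 0, 0.0
    counts = Counter(content)
    # Shannon entropy: -sum(p * log2(p)) over the observed byte values.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return len(counts), entropy
```

An entropy of 8.0 is only reached when all 256 values occur equally often. Encrypted and well-compressed data come close to that value; text and most structured file formats stay well below it.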
The bytes look randomly distributed, and there is no simple mathematical sequence:
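One way to test for a simple mathematical sequence is to look for long runs of bytes that increase or decrease by a constant step. The sketch below does this under my own assumptions (the function name longest_constant_step_run is hypothetical, and differences are taken modulo 256); it is not how byte-stats.py implements its checks.

```python
def longest_constant_step_run(content: bytes) -> int:
    """Length of the longest run in which each byte differs from the
    previous one by the same amount (modulo 256)."""
    if len(content) < 2:
        return len(content)
    best = current = 2
    previous_step = (content[1] - content[0]) % 256
    for i in range(2, len(content)):
        step = (content[i] - content[i - 1]) % 256
        if step == previous_step:
            current += 1
        else:
            # The step changed: a new run starts with the last two bytes.
            current = 2
            previous_step = step
        best = max(best, current)
    return best
```

For random-looking data, I would expect this value to stay very small compared to the file size. A long run would point to padding, a counter, or some other generated pattern.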
This is either random data, or strong encryption without a recognizable header (for example, GPG encrypted files have a header that libmagic will recognize).
VirusTotal also fails to provide help in identifying the file.
Failing to identify the file based on its content, I search the web for file extension .g and discover that this is the extension used by a CAD program called BRL-CAD. I can't easily find sample files for this CAD program: I resort to installing this CAD program and creating a test.g file.
It turns out these files have a recognizable header:
So the file I'm analyzing is not a BRL-CAD drawing.
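This comparison of a known-good sample with the unknown file can also be scripted. The following is a minimal sketch: the function name shared_prefix_length and the file name attachment.g are assumptions for illustration, and no actual BRL-CAD header bytes are hard-coded.

```python
def shared_prefix_length(reference: bytes, unknown: bytes) -> int:
    """Count how many leading bytes two byte strings have in common."""
    length = 0
    for a, b in zip(reference, unknown):
        if a != b:
            break
        length += 1
    return length


# Example usage (hypothetical file names):
# with open("test.g", "rb") as f_ref, open("attachment.g", "rb") as f_unk:
#     print(shared_prefix_length(f_ref.read(16), f_unk.read(16)))
```

A result of 0, or close to 0, means that the unknown file does not start with the header of the reference sample.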
Most of the time, looking at the content of files is enough to identify their type. You just need to use one of the techniques I showed here, like using libmagic. Here however, there's no recognizable data or structure found inside this file.
And the file extension doesn't match the content.
The body of the email is empty, and the subject is h0y2fmrmvw: not much help.
What I can still try is to see what happens when I open this file with BRL-CAD: does the program crash?
But with static techniques, I'm not able to identify what file this is. It just looks like random data or strong encryption.
It happens: sometimes you lose.
Please post a comment if you have an idea what this is.
Update: BRL-CAD does not recognize the file (and handles the exception):