Analyzing Encoded Shellcode with scdbg
Last Updated: 2018-09-24 21:39:23 UTC
by Didier Stevens (Version: 1)
Reader Jason analyzed a malicious RTF file: using OfficeMalScanner and xorsearch he was able to extract and find the entry point of the shellcode, but scdbg was not able to emulate the shellcode.
Finally, Jason figured out what the shellcode did via dynamic analysis using jmp2it.
I took a look and found a way to conduct the analysis with scdbg.
rtfdump.py is what I used first to start analyzing the RTF file:
Taking a look at the 3rd item, I see it contains an OLE file:
This can be extracted and then analyzed with oledump.py:
This is the content of the stream:
That dump doesn't give many clues at first sight. Neither does strings:
This is probably encoded shellcode.
But when I try to emulate it with scdbg, I immediately get an error: the entry point is not at position zero.
xorsearch has embedded rules to detect instructions often found inside shellcode. Option -W needs to be used to search with these embedded rules:
Two common methods that shellcode uses to locate its own position in memory (GetEIP methods) are found, at addresses 4CE and 305 (unencoded: XOR 00).
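To illustrate what such a search looks for, here is a minimal Python sketch that scans a buffer for two well-known GetEIP idioms under a given single-byte XOR key. It is not xorsearch's actual rule set, and which rules matched in this sample is not stated here; the function name and pattern table are my own choices for illustration.

```python
# Simplified illustration only: xorsearch's embedded rules are more extensive.
# The two byte patterns below are classic x86 GetEIP idioms.
GETEIP_PATTERNS = {
    "call next instruction (E8 00 00 00 00)": bytes.fromhex("e800000000"),
    "fnstenv [esp-0xc] (D9 74 24 F4)": bytes.fromhex("d97424f4"),
}


def find_geteip(data: bytes, xor_key: int = 0):
    """Return sorted (offset, description) pairs found after XORing data with xor_key."""
    decoded = bytes(b ^ xor_key for b in data)
    hits = []
    for name, pattern in GETEIP_PATTERNS.items():
        start = decoded.find(pattern)
        while start != -1:
            hits.append((start, name))
            start = decoded.find(pattern, start + 1)
    return sorted(hits)
```

Calling find_geteip(data, 0) corresponds to the "XOR 00" case, where the bytes are searched as they are. Looping xor_key over range(256) would cover single-byte XOR encodings.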
This is what Jason found on his own, and then he used jmp2it to execute the shellcode, to discover that it's a downloader.
Here I'm going to show how this analysis can be concluded with scdbg.
With option /foff, scdbg can be directed to start emulating the shellcode at a given file offset instead of position zero:
I get an error too, but notice that the stepcount is 2650. This means that scdbg was able to emulate 2650 instructions of the shellcode, so I probably found the correct entrypoint.
Now I check whether the shellcode decoded itself (with those 2650 instructions). I do this with option /d: this dump option directs scdbg to write the unpacked shellcode to disk:
The shellcode has changed, and the first change is at position 1454. The complete shellcode, with changes, is written to file sc.unpack.
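The position of the first change can be cross-checked independently by comparing the original shellcode with the dumped file byte by byte. The sketch below does that in Python. The file name shellcode.bin is a placeholder for the extracted stream (sc.unpack is the dump mentioned above), and both function names are my own.

```python
from pathlib import Path


def first_difference(original: bytes, unpacked: bytes):
    """Return the offset of the first differing byte, or None if no byte differs
    within the length the two buffers have in common."""
    for offset, (a, b) in enumerate(zip(original, unpacked)):
        if a != b:
            return offset
    return None


def first_difference_in_files(original_path: str, unpacked_path: str):
    """Same comparison, reading both buffers from disk."""
    return first_difference(Path(original_path).read_bytes(),
                            Path(unpacked_path).read_bytes())


# Hypothetical usage (shellcode.bin is a placeholder file name):
# offset = first_difference_in_files("shellcode.bin", "sc.unpack")
# print(None if offset is None else hex(offset))
```

Everything before the returned offset is identical in both files, so in a self-decoding shellcode that region is typically the decoder stub and the region after it is the decoded payload.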
Looking at this file starting from position 1454, a URL is revealed:
Now, strings reveals more strings:
It's easy to understand that this is a downloader: I see the URL and the filename.
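For readers without a strings utility at hand, the same kind of extraction can be approximated in a few lines of Python. This is a minimal sketch that only looks for runs of printable ASCII characters (no UTF-16 handling); the function name and the minimum length of 4 are my assumptions.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII characters found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical usage on the dump written by scdbg:
# from pathlib import Path
# for s in ascii_strings(Path("sc.unpack").read_bytes()):
#     print(s)
```

Run against the decoded dump, a URL and a filename stand out immediately among the shorter noise strings.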
Although scdbg is not able to emulate the complete shellcode, it is able to emulate the decoder stage and dump the decoded shellcode to disk. Of course, once the decoder stage has finished, the decoded shellcode is executed. This is the part that jmp2it was able to run, but that scdbg could not emulate. Still, the decoded shellcode contains enough cleartext strings to reveal its purpose and provide good IOCs.
By tracing the execution of the decoder stage, it becomes clear what encoding is used:
It's XOR encoding: 4 bytes at a time are decoded with a key (register edi) that changes with each loop iteration: the key is multiplied by 1b09af21 and then 198677c1 is added (both values are hexadecimal).
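The decoding loop can be sketched in Python as follows. Only the multiplier and the increment come from the trace. The initial key, the offset and length of the encoded region, and whether the key is updated before or after each XOR are not given here, so they are parameters of this sketch and must be taken from the trace. I also assume 32-bit wraparound arithmetic and little-endian dwords, as is normal for x86 code.

```python
import struct

MULTIPLIER = 0x1B09AF21  # from the decoder trace
INCREMENT = 0x198677C1   # from the decoder trace
MASK32 = 0xFFFFFFFF      # assumption: 32-bit register arithmetic wraps around


def next_key(key: int) -> int:
    """Advance the rolling key: key * MULTIPLIER + INCREMENT, truncated to 32 bits."""
    return (key * MULTIPLIER + INCREMENT) & MASK32


def xor_decode(data: bytes, initial_key: int, update_first: bool = True) -> bytes:
    """XOR each little-endian dword of data with the rolling key.

    initial_key and update_first are assumptions to be read from the trace.
    Trailing bytes that do not fill a whole dword are copied unchanged.
    """
    out = bytearray()
    key = initial_key
    usable = len(data) - len(data) % 4
    for (dword,) in struct.iter_unpack("<I", data[:usable]):
        if update_first:
            key = next_key(key)
        out += struct.pack("<I", dword ^ key)
        if not update_first:
            key = next_key(key)
    out += data[usable:]
    return bytes(out)
```

Because XOR is its own inverse, applying xor_decode twice with the same parameters returns the original bytes, which is a handy sanity check before running it on the encoded region of the sample.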
This is an exploit for CVE-2017-11882 ("Equation Editor vulnerability"):
The exploit (a buffer overflow that overwrites a return address) leads to the execution of shellcode at location E:
But scdbg is not able to emulate this shellcode, as it reads data from the Equation Editor process memory to locate instructions and API functions.
There have been several write-ups that analyze this shellcode in detail, like this one.
But for quick analysis, if scdbg can decode the shellcode, a string analysis is often enough. So if you get an error with scdbg, check whether it nevertheless emulated enough instructions to help you understand what the shellcode does.