Decoding Custom Substitution Encodings with translate.py

Published: 2018-10-01. Last Updated: 2018-10-01 18:49:52 UTC
by Didier Stevens (Version: 1)

Reader Jan Hugo submitted a malicious spreadsheet (MD5 942e941ed7344ffc691f569083949a31).

It has some aspects that I want to highlight in this diary entry. oledump.py can be used to analyze it:

The obfuscated command is a single string:

Function BOOL decodes the string by calling function Check, which does the decoding character by character. Unlike similar functions in malicious documents that use a For or While loop to iterate over each character of an encoded string, function Check uses recursion (RED):

The decoding is done by shifting each character 9 positions to the left (BLUE), not in the ASCII table, but in a custom table hidden as form property "without" (BLUE and GREEN):
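The recursive, character-by-character approach can be sketched in Python. This is only an illustration, not a translation of the VBA code: the names decode_recursive, TABLE and SHIFT are hypothetical, and skipping characters that are not in the table is an assumption.

```python
# Custom table; in the sample it is stored in the form property "without".
TABLE = 'qwertyuiopasdfghjklzxcvbnm/"\'()[]${}.,\\;-%_|: 1234567890'
SHIFT = 9  # number of positions to shift to the left


def decode_recursive(encoded, position=0):
    """Decode the character at `position`, then recurse on the remainder."""
    if position >= len(encoded):
        return ''  # end of string: recursion stops here
    index = TABLE.find(encoded[position])
    if index < 0:
        decoded = ''  # assumption: characters outside the table are skipped
    else:
        # Shift left inside the custom table, wrapping around at the start.
        decoded = TABLE[(index - SHIFT) % len(TABLE)]
    return decoded + decode_recursive(encoded, position + 1)
```

Note that Python limits recursion depth (1000 frames by default), so a loop is the better choice for long strings; the recursion here only mirrors the structure described above.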

It's a substitution cipher. This encoding can also be decoded with translate.py, albeit not with a single expression, but with a small function:

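def Substitute(number):
    # Custom table (form property "without") as a list of character codes
    key = [ord(char) for char in 'qwertyuiopasdfghjklzxcvbnm/"\'()[]${}.,\\;-%_|: 1234567890']
    # Characters that are not in the table yield None
    if number not in key:
        return None
    # Shift 9 positions to the left, wrapping around the start of the table
    return key[(key.index(number) - 9) % len(key)]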

After extracting the encoded string with re-search.py and using sed to replace each pair of double quotes with a single double quote (that's how VBA escapes a double quote inside a string), the string can be decoded with translate.py: pass the script with option -s and call the decoding function Substitute:
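The extraction and quote-unescaping step can also be sketched in Python. This is not the re-search.py and sed command line used here; the function name extract_vba_strings and the regular expression are assumptions made for illustration.

```python
import re


def extract_vba_strings(line):
    """Return the string literals found on one line of VBA source.

    Inside a VBA string literal, a double quote is written as two
    double quotes, so each "" is turned back into a single ".
    """
    # A literal is a quote, then any run of non-quotes or doubled quotes,
    # then the closing quote.
    literals = re.findall(r'"((?:[^"]|"")*)"', line)
    return [literal.replace('""', '"') for literal in literals]
```

For example, the VBA line x = "ab""cd" contains one literal, and the function returns it as ab"cd.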

You can use this translate script if you encounter similar encodings: just replace the offset (9) and the custom table ("qwerty...") with your own.
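To try a different offset or table before running translate.py, the same logic can be wrapped in a small factory and tested on its own. This is a minimal sketch: make_substitute and apply_to_bytes are hypothetical helpers, not part of translate.py, and the sample string is made up for the round-trip test.

```python
def make_substitute(table, offset):
    """Build a function that maps one byte value to its decoded byte value."""
    key = [ord(char) for char in table]

    def substitute(number):
        if number not in key:
            return None  # not in the custom table
        # Shift `offset` positions to the left, wrapping around the table.
        return key[(key.index(number) - offset) % len(key)]

    return substitute


def apply_to_bytes(function, data):
    """Apply `function` to every byte; assumption: None results are skipped."""
    out = bytearray()
    for number in bytearray(data):
        result = function(number)
        if result is not None:
            out.append(result)
    return bytes(out)


TABLE = 'qwertyuiopasdfghjklzxcvbnm/"\'()[]${}.,\\;-%_|: 1234567890'
decode = make_substitute(TABLE, 9)    # same shift as Substitute
encode = make_substitute(TABLE, -9)   # inverse shift, to build test input

sample = b'hello world 123'           # made-up test string
roundtrip = apply_to_bytes(decode, apply_to_bytes(encode, sample))
```

If roundtrip equals sample, the table and offset are at least self-consistent; the real check is whether the decoded command makes sense.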