Quickie: Generating a YARA Rule to Detect Obfuscated Strings

Published: 2023-09-10
Last Updated: 2023-09-10 21:47:09 UTC
by Didier Stevens (Version: 1)
0 comment(s)

In diary entry "Creating a YARA Rule to Detect Obfuscated Strings" I explain how to tune a YARA rule with regular expressions for performance.

I'm sharing here a Python script I wrote to generate regular expressions. The script takes one argument: the string to BASE64 encode and generate regexes for (string "ActiveMime" in my previous diary entry):

#20230829
import base64
import itertools
import sys

def GenerateRegex(word):
    strings = []
    whitespace = [' ', '\\t', '\\r', '\\n']
    detect = word[:len(word) // 3 * 3]
    print(f'String to search: {word}')
    print(f'String to search (* 3): {detect}')
    detectBASE64 = base64.standard_b64encode(detect.encode('utf8')).decode('latin')
    print(f'BASE64 string to search: {detectBASE64}')
    whitespaceregex = '[' + ''.join(whitespace) + ']*'
    print(f'Whitespace characters: {whitespaceregex}')

    detectBASE64 = [char for char in detectBASE64]

    strings.append(whitespaceregex.join(detectBASE64))

    for ws in itertools.product(whitespace, whitespace):
        strings.append(detectBASE64[0] + ''.join(ws) + whitespaceregex.join([''] + detectBASE64[1:]))
    
    for ws1 in whitespace:
        strings.append(''.join(detectBASE64[0:2]) + ws1 + whitespaceregex.join([''] + detectBASE64[2:]))
    
    strings.append(''.join(detectBASE64[0:3]) + whitespaceregex.join([''] + detectBASE64[3:]))

    return strings, detect


def Main():
    regexStrings, detect = GenerateRegex(sys.argv[1])

    print('        $base64_%s%d = /%s/' % (detect, 0, regexStrings[0]))
    print()
    for index, regex in enumerate(regexStrings[1:]):
        print('        $base64_%s%d = /%s/' % (detect, index + 1, regex))

if __name__ == '__main__':
    Main()

Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com

Keywords:
0 comment(s)

Comments


Diary Archives