Quick and Dirty Malicious HTA Analysis
Last Updated: 2019-03-10 19:34:02 UTC
by Didier Stevens (Version: 1)
Reader Ahmed shared his analysis of a malicious HTA file: the reason why he had to perform static analysis, is that dynamic analysis failed: the sandbox he used reported no activity by the HTA file.
It's a rule of thumb when reversing: if you don't succeed with one particular analysis method, try another one. Even if that second method fails too, it might give you insight to help you progress with the first method.
An HTA file is an HTML Application (extension .hta): it's an HTML file with scripts (VBScript, JScript, ...) that is executed by the HTA engine (mshta.exe). Unlike a browser, scripts running inside that engine are not restricted and use the full permissions of the user running the HTA engine.
The VBScript in this HTA file has a string that is heavily obfuscated. This string is passed on to the Create method of a WMI class to create a new process, but first it is processed by the Replace function:
This call to the Replace function, replaces string ![_%/+-$>#*&])(=?< with an empty string: the result is that each ![_%/+-$>#*&])(=?< occurence is removed from the string passed on to the Create method.
Normally it's easy to do the same with the stream editor sed, except that this string contains meta-characters that have to be escaped, like this:
Now it's clear that this is a PowerShell command, and that the script is obfuscated. We can manually deobfuscate this script, like Ahmed did, but in this diary entry I want to show a quick and dirty method to find out what this script is doing.
First of all, it's clear that we are dealing with malware. A malicious PowerShell script like this one, is almost always a downloader: a script that downloads a payload from the Internet. The URL(s) is/are often obfuscated. But if you search for the character : (found in http://), you might be lucky and find a fragment of a URL.
And that's what we have here for the third occurence of the : character:
Let me just clean this up: a bit to the left there's an .Invoke method call (that's the beginning of the statement) and a bit to the right there's a ; character (that's the end of the statement):
In pink, I've highlighted fragments of text that are clearly part of a URL. This URL uses an IPv4 address, starting with 46.101.8.
In yellow, I've highlighted all the remaining digits: it looks to me that 5.43 is the rest of the IPv4 address.
To be sure, I'm looking it up with VirusTotal: 46.101.85[.]43.
And we are lucky: this IPv4 address is known, and there's one URL with a bad score. The path of this URL is putt.txt, and with that info, I can further identify the fragments of the URL:
When you are dealing with an obfuscated PowerShell script, it's often a downloader. Depending on the obfuscation method, it's possible that the URL (or URLs) is broken up in different fragments, but that the characters have not been encoded. In that case, it can be possible to identify the different fragments, sometimes with the help of threat intel.