Last Updated: 2010-01-20 22:04:25 UTC
by Lenny Zeltser (Version: 2)
Curl and Wget are excellent command-line tools for Windows and Unix. They can download remote files and save them locally without attempting to display or render them. As a result, these tools are handy for retrieving files from potentially malicious websites for local analysis. The small feature set of these utilities, compared to traditional Web browsers, minimizes the vulnerability surface.
Both Curl and Wget support HTTP, HTTPS and FTP protocols, and allow the user to define custom HTTP headers that malicious websites may examine before attempting to attack the visitor (more on that below). Curl also supports other protocols you might find useful, such as LDAP and SFTP; however, these protocols are rarely used by analysts when examining content and code of malicious websites.
Overall, the two tools are similar when it comes to retrieving remote website files. However, the one limitation of Wget that is relevant for analyzing malicious websites is its inability to display the contents of remote error pages. These error pages might be fake and contain attack code. Curl will retrieve their full contents for your review; Wget will simply display the HTTP error code.
Consider this example that uses Wget:
$ wget http://www.example.com/page
Connecting to www.example.com:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2010-01-19 05:37:11 ERROR 404: Not Found.
Many analysts assume that the malicious web page is gone when they see this. However, consider the same connection made with Curl:
$ curl http://www.example.com/page
<HEAD><TITLE>404 Not Found</TITLE></HEAD>
<H2>404 Not Found</H2>
document.write("Hi there, bear!");
<P>The requested URL was not found on this server.</P>
To have Curl save the response headers as well, use the "-D" parameter to specify the filename where the headers should be saved:
$ curl http://www.example.com/page -D headers.txt
<HEAD><TITLE>404 Not Found</TITLE></HEAD>
$ cat headers.txt
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=iso-8859-1
Date: Wed, 19 Jan 2010 05:51:44 GMT
Last-Modified: Wed, 19 Jan 2010 03:51:44 GMT
If you wish Curl to also save the retrieved page to a file, instead of sending it to STDOUT, use the "-o" parameter, or simply redirect STDOUT to a file using ">". This is particularly useful when retrieving binary files, or when the web server responds with an ASCII file that it automatically compressed. If you're not sure about the type of the file you obtained, check it using the Unix "file" command or the TrID utility (available for Windows and Unix).
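As a minimal sketch of the options above, the helper below saves the headers and the body to separate files and then checks the body's type. The function name fetch_sample, the filename prefix, and the "-s" and "-w" options are my additions for illustration; they are not part of the commands shown earlier.

```shell
# fetch_sample URL PREFIX   (hypothetical helper name)
# Saves response headers to PREFIX.headers and the body to PREFIX.body,
# prints the HTTP status code, then asks "file" what the body looks like.
fetch_sample() {
    url=$1
    prefix=$2
    # -s  hide the progress meter
    # -D  write the response headers to a file
    # -o  write the body to a file instead of STDOUT
    # -w  print the numeric HTTP status once the transfer is done
    curl -s -D "${prefix}.headers" -o "${prefix}.body" \
         -w 'HTTP status: %{http_code}\n' "$url"
    file "${prefix}.body"
}

# Example (contacts the remote site, so run it from an analysis system):
#   fetch_sample http://www.example.com/page sample
```

Keeping the status code, headers and body side by side makes it harder to dismiss a "404" page that actually carries content.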
Update: Didier Stevens mentioned that using "-d -o" parameters to Wget allows him to capture full HTTP request and response details in the specified log file. However, this does not seem to address the issue of Wget not displaying contents of HTTP error pages.
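A rough sketch of that Wget usage follows. The helper names and the log filename are hypothetical. The second helper rests on an assumption that goes beyond this article: to my knowledge, Wget 1.14 and later document a "--content-on-error" option that keeps the body of an error response, so check your own version's manual before relying on it.

```shell
# wget_debug URL LOGFILE   (hypothetical helper name)
# -d  turn on debug output (needs a Wget build with debug support)
# -o  send all messages to LOGFILE instead of the terminal
wget_debug() {
    wget -d -o "$2" "$1"
}

# wget_keep_error_body URL OUTFILE   (hypothetical helper name)
# Assumption: a Wget release that documents --content-on-error.
# -O  write the retrieved document to OUTFILE
wget_keep_error_body() {
    wget --content-on-error -O "$2" "$1"
}

# Example (contacts the remote site):
#   wget_debug http://www.example.com/page wget.log
#   grep -n 'HTTP/' wget.log    # locate the request and status lines in the log
```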
Whether using Curl or Wget to retrieve files from potentially malicious websites, consider what headers you are supplying to the remote site as part of your HTTP request. Many malicious sites look at the headers to determine how or whether to attack the victim, so if they notice Curl's or Wget's identifier in the User-Agent header, you won't get far. Malicious sites also frequently examine the Referer header to target users who came from specific sites, such as Google. Even if you define these headers, the lack of other less-important headers typically set by traditional Web browsers could give you away as an analyst.
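One quick way to see what you are giving away is to look at the request Curl sends. The sketch below relies on Curl's "-v" option, which prints the outgoing request lines prefixed with "> ". The helper name and the local URL in the example are hypothetical.

```shell
# show_request_headers URL   (hypothetical helper name)
# -v  prints the conversation to STDERR; lines starting with "> " are the
#     request line and headers that Curl sent
# -o /dev/null  discards the body, since only the request is of interest here
show_request_headers() {
    curl -s -v -o /dev/null "$1" 2>&1 | grep '^> '
}

# Example (point it at a server you control, not at the suspicious site):
#   show_request_headers http://127.0.0.1:8080/
```

Comparing this output with what a real browser sends to the same server shows which headers still need to be defined.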
I recommend creating a .curlrc or a .wgetrc file that defines the headers you wish these tools to supply. You can define these options on the command-line when calling Curl and Wget, but I find it more convenient to use the configuration files. Consider using your own web server, "nc -l -p 80", and/or a network sniffer to observe what headers a typical browser such as Internet Explorer sends, and define them in your .curlrc or .wgetrc file. Here's one example of a .curlrc file:
header = "Accept: image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, */*"
header = "Accept-Language: en-us"
header = "Accept-Encoding: gzip, deflate"
header = "Connection: Keep-Alive"
user-agent = "Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)"
referer = "http://www.google.com/search?hl=en&q=web&aq=f&oq=&aqi=g1"
The syntax for .wgetrc is very similar, except you should not use quotation marks when defining each field. (Here is another example specific to .wgetrc.)
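For comparison, here is a sketch that writes part of the .curlrc example in .wgetrc form. It assumes Wget's "header", "user_agent" and "referer" startup-file commands and the WGETRC environment variable. The filename wgetrc.sample is arbitrary, and the field values are simply copied from the .curlrc example.

```shell
# Write a sample Wget startup file; note the absence of quotation marks.
cat > wgetrc.sample <<'EOF'
header = Accept-Language: en-us
header = Accept-Encoding: gzip, deflate
header = Connection: Keep-Alive
user_agent = Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)
referer = http://www.google.com/search?hl=en&q=web&aq=f&oq=&aqi=g1
EOF

# Point Wget at the file for a single command instead of replacing ~/.wgetrc:
#   WGETRC=./wgetrc.sample wget http://www.example.com/page
```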
You may need to tweak "user-agent" and "referer" fields for a specific situation. For more examples of User-Agent strings, see UserAgentString.com.
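For a one-off tweak, the same two fields can be given on the command line, which should take precedence over the configuration file for that run. The sketch below uses Curl's "-A" and "-e" options and Wget's "--user-agent" and "--referer" options. The helper names and the User-Agent and Referer values are hypothetical placeholders.

```shell
# Hypothetical values for one investigation; adjust to the site being examined.
UA='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7'
REF='http://www.bing.com/search?q=web'

# fetch_with_curl URL OUTFILE
# -A sets the User-Agent header, -e sets the Referer header
fetch_with_curl() {
    curl -s -A "$UA" -e "$REF" -o "$2" "$1"
}

# fetch_with_wget URL OUTFILE
# --user-agent and --referer do the same for Wget; -O names the output file
fetch_with_wget() {
    wget -q --user-agent="$UA" --referer="$REF" -O "$2" "$1"
}

# Example (contacts the remote site):
#   fetch_with_curl http://www.example.com/page page.body
```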
The "Accept-Encoding" header specifies that your browser is willing to accept compressed files from the web server. This will slow you down a bit, because you'll need to decompress the responses (e.g., with "gunzip"); however, it will make your request seem more legitimate to the malicious website.
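The decompression step might look like the sketch below. It fabricates a small gzip-compressed file as a stand-in for a saved response body, checks for the gzip signature (the bytes 1f 8b), and decompresses only if the signature is present. The filenames and sample content are hypothetical.

```shell
# Stand-in for a saved response body; a real one would come from the server.
workdir=$(mktemp -d)
printf '<html><body>test page</body></html>\n' | gzip -c > "$workdir/page.body"

# A gzip stream starts with the two bytes 1f 8b.
magic=$(head -c 2 "$workdir/page.body" | od -An -tx1 | tr -d ' \n')

if [ "$magic" = "1f8b" ]; then
    # Read from STDIN so the missing .gz suffix does not matter;
    # -c writes the decompressed data to STDOUT and leaves the original intact.
    gunzip -c < "$workdir/page.body" > "$workdir/page.html"
else
    # Not compressed: keep the body as it is.
    cp "$workdir/page.body" "$workdir/page.html"
fi

cat "$workdir/page.html"
```

Keeping the original compressed file untouched preserves the sample exactly as the server delivered it.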
There you have it--a few tips for using Curl (and Wget) for retrieving files from potentially malicious websites. What do you think?