Last Updated: 2010-05-28 02:09:02 UTC
by Kevin Liston (Version: 1)
A Diary Entry that “Writes Itself”
On my last shift, a reader asked: “How do I report Malicious Websites?” (http://isc.sans.org/diary.html?storyid=8719) I provided three ways one could report malicious URLs, IP addresses or hosts and requested your comments. There were a lot of suggestions, so I wanted to do a quick round up on this shift.
Unfortunately it Became Complex.
There was a long list of sites where you could submit a URL to a particular product, some that focused on particular service-providers, others that focused on certain types of malware (e.g. Zeus) or crime (e.g. phishing).
There was no simple one-stop-shop for the end customer to use. Some browsers and add-ons offer something resembling that functionality, but each is limited to protecting the users of that tool.
Upon reflection, I realize why a one-stop-shop doesn't exist. A single collection and repository of information is not the correct model. It wouldn't scale, it wouldn't be resilient, and it would be expensive. What I suggest is a framework for exchanging this information.
A Diversity of Clients
The ultimate client is the end-user. We all know how diverse this population is, especially with respect to technical skills and security awareness. This requires a diversity of solutions to serve this population: browser add-ins, client software, proxy-servers, specialized DNS clients, etc.
A Diversity of Sources
The intelligence comes from a similarly diverse collection of sources: end-users, help-desk technicians, incident-handlers, malware-researchers, etc. The accuracy and reliability of this information are similarly diverse; I'm stealing from the old saying: Timely, Accurate, Cheap: pick two.
Consumers Define the Requirements
I consume a lot of malware-related IP addresses, domains and URLs each day. This information comes in from a lot of sources: mailing lists, blogs, sandbox analysis reports, online repositories, etc. My focus is on protecting my users, so I look at this information in a certain light. For most users, a simple bad vs. good determination is good enough. I use the following classifications:
- Suspicious – this is the state that all reports start off with; it looks a little better than “Unknown.”
- Exploit Site – this is for links to exploit kits or sites that launch attacks
- Download – for URLs where downloaders or exploit-sites pull secondary payloads
- Phone-Home/Command-and-Control – this is for tracking the requests made by malware after it's installed
- Redirect/Compromised Site – some systems get owned and get included in the long lists of intelligence that circulate
These classifications are important when an analyst is looking through alerts generated from this watchlist. For example, if a user hits what is classified as a Redirect/Compromised site but the Exploit Site is blocked by the proxies, you don't have an incident. On the other hand, if a system is consistently probing out to a Phone-Home site, you do have an incident, even though your proxies are blocking the requests.
For my purposes, the redirect/compromised site list is low priority. Now, if I were a hosting provider, that list would be of greater importance, but only if the entries were in my network. It is precisely for this reason that I avoid having a “risk” or “severity” rating associated with these entries.
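To make the triage rule concrete, here is a minimal Python sketch of how an analyst's tooling might apply it. The names Classification and is_incident, the (classification, blocked) tuple layout, and the min_phone_home_hits threshold are all hypothetical and not part of any existing tool. Treating an unblocked exploit or download hit as an incident is my assumption, inferred from the blocked-exploit case.

```python
from enum import Enum


class Classification(Enum):
    """The five watchlist classifications (names are illustrative)."""
    SUSPICIOUS = "Suspicious"
    EXPLOIT_SITE = "Exploit Site"
    DOWNLOAD = "Download"
    PHONE_HOME = "Phone-Home/Command-and-Control"
    REDIRECT = "Redirect/Compromised Site"


def is_incident(hits, min_phone_home_hits=2):
    """Decide whether one internal host's watchlist hits amount to an incident.

    hits is a list of (Classification, blocked_by_proxy) tuples.
    min_phone_home_hits is an assumed stand-in for "consistently probing".
    """
    phone_home_hits = 0
    for classification, blocked in hits:
        if classification is Classification.PHONE_HOME:
            # Counts even when blocked: the malware is already installed.
            phone_home_hits += 1
        elif classification in (Classification.EXPLOIT_SITE,
                                Classification.DOWNLOAD) and not blocked:
            # Assumption: an exploit or payload that got through is an incident.
            return True
    return phone_home_hits >= min_phone_home_hits


# A redirect followed by a blocked exploit site: no incident.
redirect_then_blocked = [(Classification.REDIRECT, False),
                         (Classification.EXPLOIT_SITE, True)]
# Repeated, blocked phone-home attempts: incident.
beaconing = [(Classification.PHONE_HOME, True)] * 3
print(is_incident(redirect_then_blocked), is_incident(beaconing))
```

Note that there is no severity score here; the decision depends on the classification and on what the consumer's own controls did with the request.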
What should it record? How should the records be organized? In my database I track based on individual IP or domain, because it's easy to search proxy and firewall logs via hostname, or IP address. I link the more verbose URL to the domain. In the framework that I propose, URLs would be classified as Suspicious, Exploit, Downloader, etc. while IP addresses, hostnames, and domain names would be their own records that link to these URLs.
For example, consider this fictitious exploit URL: hxxp://abcd.efghijkl.ab/invoice.pdf. In our data set we could classify this URL as an Exploit URL. If we had better analysis we could tack on a sub-classification of the particular CVE that this exploit leverages. The URL would then link to the hostname of abcd.efghijkl.ab and the domain of efghijkl.ab. At the time of the report, abcd.efghijkl.ab resolved to three IP addresses (220.127.116.11, 18.104.22.168, and 22.214.171.124), and these may further link to a particular ASN.
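As a rough illustration of that linkage, the following Python sketch builds one record for the fictitious URL and an index from hostname, domain, and IP back to the URL. UrlRecord, build_record, and index_by_indicator are hypothetical names. The domain is derived naively from the last two labels of the hostname, which is an assumption that fails for suffixes such as co.uk; a real implementation would consult a public-suffix list. The IP addresses are supplied by the reporter rather than resolved here.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class UrlRecord:
    url: str                      # stored in the (defanged) form it was reported
    classification: str
    hostname: str
    domain: str
    ips: List[str] = field(default_factory=list)
    cve: Optional[str] = None     # optional sub-classification


def build_record(reported_url, classification, resolved_ips, cve=None):
    """Split a reported URL into the indicators it should link to."""
    # Re-arm a defanged scheme only for parsing; nothing is fetched.
    parseable = reported_url.replace("hxxp", "http", 1)
    hostname = urlsplit(parseable).hostname or ""
    # Naive assumption: the registered domain is the last two labels.
    domain = ".".join(hostname.split(".")[-2:])
    return UrlRecord(reported_url, classification, hostname, domain,
                     list(resolved_ips), cve)


def index_by_indicator(records):
    """Map each hostname, domain, and IP to the URLs that reference it."""
    index = {}
    for rec in records:
        # dict.fromkeys drops duplicates (e.g. hostname equal to domain).
        for key in dict.fromkeys([rec.hostname, rec.domain, *rec.ips]):
            index.setdefault(key, []).append(rec.url)
    return index


record = build_record("hxxp://abcd.efghijkl.ab/invoice.pdf", "Exploit",
                      ["220.127.116.11", "18.104.22.168", "22.214.171.124"])
index = index_by_indicator([record])
print(record.hostname, record.domain, sorted(index))
```

With an index like this, a hostname or IP address pulled from a proxy or firewall log leads straight back to the classified URLs behind it.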
Belief and Feedback
Just like in the IDS and AV worlds, this information has its fair share of false-positives. These come mostly from automated sources, simply because they don't know better. For example, a bot-client might reach out to myip.ru while another may make a Google search using a direct IP address call. Another pain-point is how advertisers redirect requests: examining the network trace of a web-exploit can sometimes lead an analyst down the rabbit-hole of researching the complexities of one of Doubleclick's competitors.
For this reason the framework would have to support multiple reports per URL, and cluster the URLs to account for unique elements in the URL. Additionally, reports would have to identify their sources so consumers could rate sources, or filter out unwanted sources.
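One way to picture this is the Python sketch below, which groups several reports under a shared cluster key and lets the consumer keep only the sources it trusts. The function names, the (source, url) report layout, and the source labels are hypothetical. The clustering rule is also an assumption: it treats the query string and fragment as the "unique elements" and keys on hostname plus path, which a real framework would likely need to refine.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cluster_key(url):
    """Reduce a URL to hostname + path, dropping query string and fragment."""
    parts = urlsplit(url)
    return (parts.hostname or "") + parts.path


def cluster_reports(reports, trusted_sources=None):
    """Group (source, url) reports by cluster key.

    If trusted_sources is given, reports from any other source are skipped.
    """
    clusters = defaultdict(list)
    for source, url in reports:
        if trusted_sources is not None and source not in trusted_sources:
            continue
        clusters[cluster_key(url)].append((source, url))
    return dict(clusters)


reports = [
    ("sandbox-a", "http://abcd.efghijkl.ab/invoice.pdf?id=111"),
    ("list-b", "http://abcd.efghijkl.ab/invoice.pdf?id=222"),
    ("bot-c", "http://abcd.efghijkl.ab/other"),
]
for key, members in cluster_reports(reports).items():
    print(key, [source for source, _ in members])
```

Because every report keeps its source, a consumer can count independent confirmations per cluster or pass trusted_sources to drop feeds it has found unreliable.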
Why a Framework and Not a Centralized Repository?
Although the aim is interoperability, I understand that not everyone wants to share everything with everyone, so I imagine this resembling a number of diverse feeds that are consumed and transformed by vendors and end-users. Some services may evolve that correlate and fact-check a large number of feeds to provide a stable and reliable source of good versus bad decisions for end-users, while other vendors may pick and choose their sources to craft a unique solution for their market. Enclaves of researchers would form their own webs of trust via the feeds that they subscribe to and produce themselves.
I'm going to noodle a bit more on this, and I welcome your feedback.