
SANS ISC: Scripting Web Categorization SANS ISC InfoSec Forums


Scripting Web Categorization

When you are dealing with a huge amount of data, it can be very useful to enrich it with additional context. Examples:

  • Geolocation of IP addresses
  • Getting the DShield score of an IP address
  • Looking up domain names in lists of malicious domains
  • ...
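
As an illustration of the last kind of enrichment, here is a minimal sketch that tags records with a "malicious" flag based on a local domain list. The record layout (a dict with a "host" key), the function names, and the sample domains are all assumptions made for this example; they do not come from any particular tool.

```python
def load_domain_list(lines):
    """Build a set of lowercase domains, ignoring blank lines and '#' comments."""
    domains = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            domains.add(entry)
    return domains


def is_listed(hostname, domains):
    """Return True if the hostname or one of its parent domains is in the set."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.example", then "b.example", then "example".
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


def enrich(records, domains):
    """Yield a copy of each record with a boolean 'malicious' field added."""
    for record in records:
        tagged = dict(record)
        tagged["malicious"] = is_listed(record["host"], domains)
        yield tagged


# Hypothetical sample data, for illustration only.
blocklist = load_domain_list(["# sample list", "bad.example", ""])
events = [{"host": "www.bad.example"}, {"host": "good.example"}]
results = list(enrich(events, blocklist))
```

Matching on parent domains means that a listed domain also flags its subdomains, which is usually what you want for this kind of list.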

When you are processing many URLs during a security incident investigation, or while extracting IOCs from a malware sample or logs, it can also be very interesting to categorize them. Categorization tags a URL with a label such as the classic "Adult Content", "Government", "Forums", etc. Many commercial solutions offer this feature, and it can be very powerful: for example, you can configure your firewall to deny access to non-business categories. But because the feature is integrated into closed solutions, it's not easy to reuse this information in your own scripts. For years, Bluecoat has had a product called "K9" that helps to protect kids surfing the web. It's free: you just need to get a license key and install the tool or... use the online API! I had to categorize a bunch of URLs, so I decided to take some time to write a few lines of Python to automate this task.

My script webcat.py fetches the defined categories at regular intervals (every two hours) and performs a lookup for each URL passed as an argument:

$ ./webcat.py isc.sans.org
isc.sans.org,Education
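
The two-hour refresh can be sketched as a simple check on the age of the cache file. This is not the code from webcat.py, only one way to do it; the function name and the `now` parameter (added to make the check testable) are assumptions.

```python
import os
import time

MAX_AGE_SECONDS = 2 * 60 * 60  # two hours, the refresh interval described above


def cache_is_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the cache file is missing or older than max_age seconds."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # No readable cache file: the categories must be fetched.
        return True
    return (now - mtime) > max_age
```

A caller would fetch the categories again whenever `cache_is_stale()` returns True, and read the local file otherwise.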

Multiple URLs can be passed on the same command line or the script can be fed via STDIN if you use "-" as parameter:

$ ./webcat.py isc.sans.org blog.rootshell.be
isc.sans.edu,Education
blog.rootshell.be,Technology/Internet
$ cat suspicious-urls.tmp | ./webcat.py -
getmooresuccess.com,Business/Economy
weddingme.net,Business/Economy
riverbird.usa.cc,Malicious Outbound Data/Botnets
1ntershipping.co,Malicious Outbound Data/Botnets
secureemail.bz,Malicious Sources/Malnets
vsreviewsa.com,Malicious Sources/Malnets
felceconserve.com,Malicious Outbound Data/Botnets
flashsync.cf,Uncategorized
cy-m0ld.com,Malicious Outbound Data/Botnets
berettitdint.ru,Malicious Outbound Data/Botnets
vehanmace.ru,Malicious Outbound Data/Botnets
redderbest.gq,Uncategorized
googlemails.ga,Uncategorized
msportf1.com,Sports/Recreation
www.vai-t.com,Malicious Sources/Malnets
duotthenaning.ru,Malicious Sources/Malnets
duotthenaning.ru,Malicious Sources/Malnets
littrecdintoft.ru,Malicious Sources/Malnets
vsreviewsa.com,Malicious Sources/Malnets
doncglobal.com,Malicious Outbound Data/Botnets
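
Accepting URLs either as arguments or via STDIN can be sketched as a small generator. This is an illustration of the "-" convention shown above, not an extract from webcat.py; the name `iter_urls` and the choice to skip empty lines are assumptions.

```python
import sys


def iter_urls(args, stdin=None):
    """Yield URLs from the argument list; "-" reads one URL per line from stdin."""
    if stdin is None:
        stdin = sys.stdin
    for arg in args:
        if arg == "-":
            for line in stdin:
                url = line.strip()
                if url:  # skip empty lines
                    yield url
        else:
            yield arg
```

Because the function is a generator, a long list piped through STDIN is processed line by line instead of being loaded into memory first.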

The API returns a hexadecimal code corresponding to the web category. That's why the script fetches the category definitions at regular intervals and stores them in a local file:

$ ./webcat.py -h
usage: webcat.py [-h] [-f CACHEFILE] [-F] [URL [URL ...]]

Categorize URL using BlueCoat K9

positional arguments:
  URL                   the URL(s) to check. Format: fqdn[:port]

optional arguments:
  -h, --help            show this help message and exit
  -f CACHEFILE, --file CACHEFILE
                        Categories local cache file (default:
                        /var/tmp/categories.txt)
  -F, --force           force a fetch of categories
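
For readers who want to build something similar, the interface above and the code-to-label translation can be sketched as follows. The option names and the default path are taken from the help output; everything else is an assumption, in particular the `category_name` helper, the "Unknown" fallback, and the sample code value, since the actual format returned by the API is not shown here.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help output above."""
    parser = argparse.ArgumentParser(description="Categorize URL using BlueCoat K9")
    parser.add_argument("urls", metavar="URL", nargs="*",
                        help="the URL(s) to check. Format: fqdn[:port]")
    parser.add_argument("-f", "--file", dest="cacheFile", metavar="CACHEFILE",
                        default="/var/tmp/categories.txt",
                        help="Categories local cache file")
    parser.add_argument("-F", "--force", action="store_true",
                        help="force a fetch of categories")
    return parser


def category_name(hex_code, categories):
    """Translate a hexadecimal code string into a label (hypothetical helper)."""
    code = int(hex_code, 16)  # base 16 accepts "1B" as well as "0x1b"
    return categories.get(code, "Unknown (0x%X)" % code)


# Hypothetical mapping: the code 0x1B is made up for this example.
sample_categories = {0x1B: "Education"}
args = build_parser().parse_args(["-F", "isc.sans.org"])
label = category_name("0x1b", sample_categories)
```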

Before using the script, you have to register to get your K9 license and add it to the script (line 30).

Note: I'm not aware of any rate limit in place when querying the API. During my investigations, I was never blocked.

Xavier Mertens
ISC Handler - Freelance Security Consultant

I tried the script (webcat.py) and I'm getting an error like "No JSON object could be decoded". Any help is appreciated. (Windows 8, Python 2.7)

C:\Python27>python webcat.py -F www.dshield.org
Traceback (most recent call last):
  File "webcat.py", line 133, in <module>
    main()
  File "webcat.py", line 107, in main
    webCats = fetchCategories(args.cacheFile)
  File "webcat.py", line 43, in fetchCategories
    data = json.load(r)
  File "C:\Python27\lib\json\__init__.py", line 290, in load
    **kw)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Anonymous
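
The traceback above ends in json.load() at line 43 of fetchCategories, so the data read while fetching the categories was not valid JSON; an empty body or an HTML error page would both fail this way. The traceback alone does not say which one it was. A minimal sketch of a more talkative decoder that shows the start of the body when decoding fails (the name `parse_json_body` is hypothetical, and how the body is obtained is left to the caller):

```python
import json


def parse_json_body(body):
    """Decode a response body as JSON, with a more descriptive error on failure."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    try:
        return json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        preview = body[:200].strip() or "<empty body>"
        raise ValueError(
            "Response is not valid JSON (%s). Body starts with: %r" % (exc, preview)
        )
```

Seeing the first characters of the body is usually enough to tell an empty reply from an error page.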
