Scripting Web Categorization
When you are dealing with a huge amount of data, it can be very useful to enhance them by adding more valuable content. Example:
- Geolocalization for IP addresses
- Get an IP address DShield score
- Lookup domain names in list of malicious domains
- ...
When you are processing many URLs during a security incident investigation or while extracting IOC's from a malware sample or logs, it can also be very interesting to categorize them. The process of categorization helps to tag an URL with a label like the classic "Adult Content", "Government", "Forums", etc. Many commercial solutions offer this feature. It can be very powerful to configure your firewall to deny access to non-business categories. But, integrated in closed solutions, it's not easy to re-use them to benefit of this information in your own scripts. For years, Bluecoat has a product called "K9" that helps to protect kids surfing the web. It's free, you just can get a license key and install the tool or... use the online API! I had to categorize a bunch of URLs , so I decided to take some time to write a few lines of Python to automate this task.
My script webcat.py fetches the defined categories at regular interval (every two hours) and perform a lookup for each URL passed as argument:
$ ./webcat.py isc.sans.org isc.sans.org,Education
Multiple URLs can be passed on the same command line or the script can be fed via STDIN if you use "-" as parameter:
$ ./webcat.py isc.sans.org blog.rootshell.be isc.sans.edu,Education blog.rootshell.be,Technology/Internet $ cat suspicious-urls.tmp | ./webcat.py - getmooresuccess.com,Business/Economy weddingme.net,Business/Economy riverbird.usa.cc,Malicious Outbound Data/Botnets 1ntershipping.co,Malicious Outbound Data/Botnets secureemail.bz,Malicious Sources/Malnets vsreviewsa.com,Malicious Sources/Malnets felceconserve.com,Malicious Outbound Data/Botnets flashsync.cf,Uncategorized cy-m0ld.com,Malicious Outbound Data/Botnets berettitdint.ru,Malicious Outbound Data/Botnets vehanmace.ru,Malicious Outbound Data/Botnets redderbest.gq,Uncategorized googlemails.ga,Uncategorized msportf1.com,Sports/Recreation www.vai-t.com,Malicious Sources/Malnets duotthenaning.ru,Malicious Sources/Malnets duotthenaning.ru,Malicious Sources/Malnets littrecdintoft.ru,Malicious Sources/Malnets vsreviewsa.com,Malicious Sources/Malnets doncglobal.com,Malicious Outbound Data/Botnets
The API returns an hexadecimal code corresponding to the web category. That's why the script fetches them at regular interval and store them in a local file:
$ ./webcat.py -h usage: webcat.py [-h] [-f CACHEFILE] [-F] [URL [URL ...]] Categorize URL using BlueCoat K9 positional arguments: URL the URL(s) to check. Format: fqdn[:port] optional arguments: -h, --help show this help message and exit -f CACHEFILE, --file CACHEFILE Categories local cache file (default: /var/tmp/categories.txt) -F, --force force a fetch of categories
Before using the script, you have to register to get your K9 license, add it to the script (line 30).
Note: I'm not aware of any rate-limit in place while querying the API. During my investigations, I was never blocked.
Xavier Mertens
ISC Handler - Freelance Security Consultant
PGP Key
Comments
www
Nov 17th 2022
6 months ago
EEW
Nov 17th 2022
6 months ago
qwq
Nov 17th 2022
6 months ago
mashood
Nov 17th 2022
6 months ago
isc.sans.edu
Nov 23rd 2022
6 months ago
isc.sans.edu
Nov 23rd 2022
6 months ago
isc.sans.edu
Dec 3rd 2022
6 months ago
isc.sans.edu
Dec 3rd 2022
6 months ago
<a hreaf="https://technolytical.com/">the social network</a> is described as follows because they respect your privacy and keep your data secure. The social networks are not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go.
<a hreaf="https://technolytical.com/">the social network</a> is not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go. The social networks only collect the minimum amount of information required for the service that they provide. Your personal information is kept private, and is never shared with other companies without your permission
isc.sans.edu
Dec 26th 2022
5 months ago
isc.sans.edu
Dec 26th 2022
5 months ago