SANS Internet Storm Center (SANS ISC)

Search engines that are no search engines
The DShield database was running a bit "hot" earlier today, so I took a closer look at the web log and found that one particular "search engine" was indexing the site rather aggressively:

a.b.c.d - - [09/Nov/2007:15:24:35 +0000] "GET /portreportascii.html?date=2007-11-09 HTTP/1.0" 200 500572 "-" "gsa-crawler (Enterprise; S5-FTNF3BWZPUJAS;" "-"

At first I thought, "oh well, it's Google." But a closer look at the user agent string reveals some subtle differences: this is a Google Search Appliance, not the uber-Googlebot we all love. The regular Googlebot looks like this:

- - [09/Nov/2007:15:24:37 +0000] "GET /date.html?port=47109&date=2007-10-25 HTTP/1.1" 200 7538 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +" "-"
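To make the difference between the two user agent strings concrete, here is a minimal sketch that pulls the client and user agent out of a log line and labels the agent. The regular expression is only an assumption fitted to the two sample lines above (a combined-style format with one extra trailing quoted field), and the names `LOG_RE`, `classify_agent`, and `classify_line` are made up for this illustration. It looks only at what the client claims to be, so a "claims-googlebot" result says nothing about who actually sent the request.

```python
import re

# Assumed layout, fitted to the sample lines above:
# host ident user [time] "request" status size "referer" "agent" ...
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify_agent(agent):
    """Label a user agent string by what it claims to be."""
    lowered = agent.lower()
    if lowered.startswith("gsa-crawler"):
        # Google Search Appliance: a privately operated crawler.
        return "google-search-appliance"
    if "googlebot/" in lowered:
        # Only a claim; the string is trivially spoofed.
        return "claims-googlebot"
    return "other"


def classify_line(line):
    """Return (host, label) for a log line, or None if it does not parse."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    return match.group("host"), classify_agent(match.group("agent"))
```

Running `classify_line` over a day's log and counting the labels per host is a quick way to see which crawler is responsible for the load.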

I have seen similar cases a few times now. While this one was not malicious, in some cases attackers have used the user agent strings of Google or other search engines. I can only assume that this is an attempt to blend in better, and maybe to retrieve the version of the page that is served to search engines. If anybody knows a good reference for the IP address ranges used by particular search engines, let us know.
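Without a list of address ranges, one check that does not depend on one is a forward-confirmed reverse DNS lookup: resolve the client IP to a hostname, check the domain, then resolve that hostname back and confirm it returns the same IP. The sketch below assumes that genuine Googlebot hosts reverse-resolve to names under googlebot.com or google.com. Treat that suffix list as an assumption to verify against Google's current guidance. The function name `is_verified_googlebot` and its injectable `reverse` and `forward` parameters are hypothetical, and the default forward lookup handles IPv4 only.

```python
import socket

# Assumption: hostname suffixes used by genuine Googlebot hosts.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        # Step 1: PTR lookup, IP -> hostname.
        hostname = reverse(ip)[0]
    except OSError:
        return False
    # Step 2: the hostname must sit under an expected domain.
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # Step 3: forward lookup, hostname -> list of addresses.
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # Step 4: the forward lookup must include the original IP; otherwise
    # whoever controls the PTR record could claim any hostname.
    return ip in addresses
```

The resolver functions are parameters so the check can be exercised with canned data. In production the results should be cached, since two DNS lookups per request would be far too slow to run inline.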

(And by the way: if you need bulk access to DShield data, please ask. Spidering the site is just not very efficient, and you will run into some anti-harvesting traps that send you in circles.)

Johannes B. Ullrich
Chief Research Officer, SANS Technology Institute
Nov 9th 2007
