What pages do bad bots look for?
I’ve been wondering for some time now about what pages and paths are visited the most by “bad” bots – scrapers, data harvesters and other automated scanners which disregards the exclusions set in robots.txt[1]. To determine this, I’ve set up a little experiment – I placed robots.txt on one of my domains, which disallowed access to commonly used paths and PHP pages which might of interest to bots (login.php, /wp-admin/, etc.), configured the server to provide HTTP 200 response for these paths and pages and started logging details about requests sent to them.
To avoid as much legitimate or manually generated traffic as possible, I’ve done this on a domain which pointed to a server on which none of the common content management systems was used.
The captured requests were a mixed bag, as one might expect. Some of them were simple one-shot HTTP GET requests while others were part of multi-request scans, some had no parameters set, while others carried generic SQL injection or XSS payloads or tried to “blindly” exploit vulnerabilities specific to common content management systems.
For our purposes, however, this is beside the point as we’re more interested in finding out which pages were looked for the most. I went over the logs and put the “top 10” most commonly requested pages for the past 12 months in the following table, along with the number of times each path or page was hit.
Path | Count |
---|---|
/wp-login.php | 1140 |
/admin/ | 189 |
/administrator/ | 104 |
/wp-admin/install.php | 82 |
/login.php | 48 |
/administrator/index.php | 26 |
/admin.php | 24 |
/wp-admin/setup-config.php | 24 |
/admin/index.php | 23 |
/wp-links-opml.php | 20 |
Although finding wp-login.php in the first place is hardly surprising, the results are interesting. Given the fairly large early drop in a number of requests it seems that one might be able to catch a significant portion of interesting “bad” bot behavior with just a single-page (or four or five-page) honeypot... In other words, if you’ve ever wondered where to place a “honeypage” on your server in order for it to be effective, the top paths mentioned in the table above might probably be a good start.
Comments
www
Nov 17th 2022
6 months ago
EEW
Nov 17th 2022
6 months ago
qwq
Nov 17th 2022
6 months ago
mashood
Nov 17th 2022
6 months ago
isc.sans.edu
Nov 23rd 2022
6 months ago
isc.sans.edu
Nov 23rd 2022
6 months ago
isc.sans.edu
Dec 3rd 2022
6 months ago
isc.sans.edu
Dec 3rd 2022
6 months ago
<a hreaf="https://technolytical.com/">the social network</a> is described as follows because they respect your privacy and keep your data secure. The social networks are not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go.
<a hreaf="https://technolytical.com/">the social network</a> is not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go. The social networks only collect the minimum amount of information required for the service that they provide. Your personal information is kept private, and is never shared with other companies without your permission
isc.sans.edu
Dec 26th 2022
5 months ago
isc.sans.edu
Dec 26th 2022
5 months ago