What is the origin of passwords submitted to honeypots?

Published: 2023-09-02. Last Updated: 2023-09-02 00:13:36 UTC
by Jesse La Grew (Version: 1)

We use passwords just about everywhere in our daily lives. It's difficult to think of an online service where we don't have a need to enter some kind of credentials to access our content. DShield honeypots collect a variety of data, including passwords, that are submitted from SSH and telnet attacks.

Figure 1: Snapshot on 9/1/2023 of DShield submitted usernames and passwords [1]

The passwords in the above image are very common weak passwords. This is only a small sample of the passwords submitted to honeypots, and it made me curious about where the submitted passwords might originate:

Default system passwords
Data breach passwords
Randomly generated passwords [2]

As a starting point, I compared the almost 250,000 unique passwords submitted to my honeypot with some publicly available sources:

Rockyou [3]
HaveIBeenPwned Passwords [4]

Extracting Honeypot Passwords

There are many ways to get the passwords out of a DShield honeypot, especially if external logging of the cowrie data is set up. The method used in this case was to pull it out of the local JSON logs I regularly archive.

# read all cowrie JSON logs
# cat /logs/cowrie.json.*
#
# select logs with the .password key present
# jq 'select(.password)'
#
# query the value in the password key and return in raw format (without surrounding quotes)
# jq -r .password
#
# sort the values alphabetically
# sort
#
# return only unique values and output to a text file
# uniq > 2023-08-15_unique_passwords_raw.txt

cat /logs/cowrie.json.* | jq 'select(.password)' | jq -r .password | sort | uniq > 2023-08-15_unique_passwords_raw.txt
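For anyone who would rather do this step in Python, the pipeline above can be approximated as follows. This is only a sketch, not the method used for this diary: the function name unique_passwords is made up for illustration, and it assumes the cowrie logs hold one JSON object per line. The sort order may also differ from the sort command, which depends on locale.

```python
import json


def unique_passwords(paths):
    """Collect the unique 'password' values from JSON-lines log files."""
    seen = set()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or corrupt lines
                if not isinstance(event, dict):
                    continue
                password = event.get("password")
                # Like jq's select(.password): drop missing, null and false values
                if password is None or password is False:
                    continue
                seen.add(str(password))
    return sorted(seen)
```

Calling unique_passwords(glob.glob("/logs/cowrie.json.*")) and writing one value per line would give a file comparable to the one produced by the shell command.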

Comparing Password Data

The data available from the three sources came in different formats and needed to be converted for comparison.

Data Source	Starting Format	Converted format
Honeypot passwords	utf-8 strings	SHA1 Hash
Rockyou passwords	latin-1 strings	SHA1 Hash
HaveIBeenPwned passwords	SHA1 hash with frequency count	SHA1 Hash

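Since a hash cannot be reversed, the HaveIBeenPwned hashes could not be converted back to plain text. Instead, the passwords submitted to the honeypot and those in the RockYou list were hashed with SHA1 so that all three sources could be compared as hashes. This made the process easier, since little processing was needed for the HaveIBeenPwned password list, which was around 36GB in size.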
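A minimal sketch of that conversion is below. The helper names are hypothetical, and two details are assumptions on my part rather than something stated above: that each line of the HaveIBeenPwned download looks like HASH:count, and that the hashes are uppercase hexadecimal.

```python
import hashlib


def sha1_hex_upper(password: str, encoding: str = "utf-8") -> str:
    """Hash a plain-text password into uppercase hexadecimal SHA1."""
    return hashlib.sha1(password.encode(encoding)).hexdigest().upper()


def parse_breach_line(line: str) -> str:
    """Keep only the hash from an assumed 'HASH:count' line."""
    return line.strip().split(":", 1)[0].upper()
```

Normalizing both sides to the same case matters, because a set lookup on hex strings is case-sensitive.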

Seconds to process:	1800.039571
Total honeypot hashes:	247799
Total HaveIBeenPwned hashes:	865964448
Total RockYou hashes:	14343758
RockYou Matched Hashes:	78235
RockYou ONLY Matched Hashes:	15
HaveIBeenPwned Matched Hashes:	164048
Percentage of honeypot passwords found in HaveIBeenPwned breach data:	66.2%
Percentage of honeypot passwords found in RockYou data:	31.57%
Percentage of honeypot passwords found ONLY in RockYou data:	0.01%
Average processing pace:	481080.78 hashes per second

Something learned from this process was that checking membership in a Python set() is much faster than checking membership in a Python list. Nothing makes this more evident than processing a 36GB text file. Since the values were unique within each data set, a Python set() worked very well.
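The reason is that a set lookup takes roughly constant time on average, while a list lookup scans the list element by element. A hedged sketch of the comparison loop is below. The function name matched_hashes and the HASH:count line layout are assumptions, not the actual script behind the numbers above.

```python
from typing import Iterable, Set


def matched_hashes(honeypot_hashes: Set[str], breach_lines: Iterable[str]) -> Set[str]:
    """Stream breach lines and return the hashes that also appear in the honeypot set."""
    matches = set()
    for line in breach_lines:
        # Assumed layout: 'HASH:count'; keep only the hash portion
        digest = line.split(":", 1)[0].strip().upper()
        if digest in honeypot_hashes:  # average O(1) for a set
            matches.add(digest)
    return matches
```

Passing an open file object as breach_lines reads the large file one line at a time, so only the much smaller honeypot set has to fit in memory.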

Also, latin-1 strings were used with the Rockyou list due to issues with attempting utf-8 encoding.
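The kind of issue this avoids can be shown with a single byte. The example value below is made up and is not taken from the Rockyou list. Note that I am only illustrating the decoding behavior here. Which encoding is applied before hashing is a separate choice, and it changes the resulting SHA1.

```python
raw = b"caf\xe9"  # hypothetical entry stored as Latin-1 bytes

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0xE9 starts a multi-byte UTF-8 sequence that never completes

# Latin-1 maps every byte value to a character, so decoding cannot fail
text = raw.decode("latin-1")

# Re-encoding as Latin-1 restores the original bytes; UTF-8 yields different bytes
same_bytes = text.encode("latin-1") == raw
different_bytes = text.encode("utf-8") != raw
```

Because the two encodings produce different bytes for non-ASCII characters, a password hashed from its latin-1 bytes will not match the same password hashed from its utf-8 bytes.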

Data Comparisons

Since I regularly look at cowrie attacks from the DShield honeypot, I knew that there were going to be some unusual results. Rather than filter those out ahead of time, I decided to look at the information visually by comparing password length frequencies.

Figure 2: Password length frequencies from honeypot submissions

The data shows that the most common password length is 8 characters, but there are a lot of passwords with much greater length and lower frequencies. The longest password that had a match in the HaveIBeenPwned data was 48 characters.
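A length frequency table like the one behind Figure 2 can be built with a Counter. This is a sketch with a hypothetical function name, and it counts characters rather than bytes, which is an assumption about how length was measured.

```python
from collections import Counter


def length_frequencies(passwords):
    """Map each password length to the number of passwords with that length."""
    return Counter(len(password) for password in passwords)
```

The most_common() method of the returned Counter lists lengths from most to least frequent, which is convenient for plotting.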

Figure 3: Longest password matching HaveIBeenPwned data was 48 characters in length

So, what are these longer passwords? In most cases, the data is most likely not a password, but another part of an attack such as a terminal command or even data meant to be sent to another protocol, such as HTTP.

Figure 4: Examples of data that were not likely meant for password submissions

As the passwords get longer, these commands stand out even more. When filtering out passwords longer than 48 characters, there is not a large difference in the match percentages. It turns out that there are only a few hundred of these passwords out of almost 250,000.

Metric	Count	Percentage
HaveIBeenPwned Matches	164041	66.28%
RockYou Matches	78233	31.61%
Total Hashes	247482	
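The percentages in this table follow directly from the counts, and the difference between the two totals (247799 - 247482 = 317) is the "few hundred" longer values that were filtered out. A small sketch of the filter and the arithmetic, using hypothetical helper names:

```python
def within_length(passwords, max_length=48):
    """Keep only the values that are at most max_length characters long."""
    return [password for password in passwords if len(password) <= max_length]


def match_percentage(matched: int, total: int) -> float:
    """Share of matched hashes as a percentage, rounded to two decimals."""
    return round(matched / total * 100, 2)


haveibeenpwned_pct = match_percentage(164041, 247482)
rockyou_pct = match_percentage(78233, 247482)
```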

Passwords Without Matches

Approximately 2/3 of the passwords used to attack my honeypot were available in HaveIBeenPwned password data. What about the other 1/3 of the passwords? I pulled out one specific password example since it had no matches within the breach data used, but was also one of the top 20 passwords attempted this year [5].

Figure 5: Password example with no matches in breach data, but frequently seen

There are a variety of search results in Google when searching for this value. From the search results I was unable to find a source, but many of the results came from honeypot data. The password below the one identified also came up in a variety of articles. Within GitHub, that password was available in other honeypot data. This left me with some other questions:

How do write-ups about specific passwords impact those passwords being used in attacks?
How often is reported information security data used to perpetuate attacks?
What is the source of these other "unmatched" passwords? Are they generated or just from breach data not as freely available?

Takeaways

Password breach data is commonly used in credential stuffing attacks.

Use a password manager (could even be a notebook in a locked drawer)
Use unique passwords in combination with Multifactor Authentication (MFA)
Check sites like HaveIBeenPwned [6] to see if your email has been part of a reported breach
Use password breach data to disallow the use of those passwords
If you find that a password you use is publicly available, change it
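As one way to act on the breach data takeaway, a password change form could reject any candidate whose SHA1 hash appears in a locally loaded set of breached hashes. This is a minimal sketch: the function name is hypothetical, and it assumes the breached hashes are uppercase hexadecimal SHA1 values already held in a set.

```python
import hashlib


def is_breached(candidate: str, breached_hashes: set) -> bool:
    """Return True when the candidate password's SHA1 is in the breached set."""
    digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest().upper()
    return digest in breached_hashes
```

A check like this only catches exact matches, so it complements rather than replaces unique passwords and MFA.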

[1] https://isc.sans.edu/data/ssh.html
[2] https://isc.sans.edu/diary/How+I+made+a+qwerty+keyboard+walk+password+generator+with+ChatGPT+Guest+Diary/30152/
[3] https://github.com/danielmiessler/SecLists/blob/master/Passwords/Leaked-Databases/rockyou.txt.tar.gz
[4] https://haveibeenpwned.com/Passwords
[5] https://isc.sans.edu/ssh_passwords.html
[6] https://haveibeenpwned.com/

--
Jesse La Grew
Handler

Keywords: breach dshield haveibeenpwned honeypot passwords rockyou

2 comment(s)

Comments

Great post! I'm looking at a lot of brute force attempts every day and have been curious regarding the user/pass combos.

I'd love to see the accompanying information regarding usernames

Thanks for reading and the suggestion! Based on your comment I just published a bit more information, but focusing a bit more on usernames that have been submitted.

ogd

Sep 3rd 2023

Jesse

Sep 5th 2023