An analysis of the Yahoo! passwords

Published: 2012-07-16
Last Updated: 2012-07-17 02:19:00 UTC
by Jim Clausing (Version: 1)
13 comment(s)

Last month the biggest security news in the mainstream press was about the password (hash) "breaches" at LinkedIn, eHarmony, and Last.fm.  Last week, it was a bunch of passwords that were leaked via a Yahoo! service.  These passwords were for a particular Yahoo! service, but the e-mail addresses being used were for quite a few domains.  There has been some discussion of whether, for example, the passwords for Google accounts were also exposed.  The short answer is that if the user committed one of the cardinal sins of passwords and reused the same one for multiple accounts, then, yes, some Google (or other) passwords may also have been exposed.  Having said all of that, it isn't primarily what I wanted to look at today.  I also don't plan to spend too much time on the password policy (or lack thereof) or the fact that the passwords were apparently stored in the clear, both of which most security folks would probably agree are bad ideas.

The domains

First, I did a quick analysis of the domains.  I should note that some of the e-mail addresses were clearly invalid (misspelled domains, etc.).  There were a total of 35008 domains represented.  The top 20 domains (after converting all to lower case) are shown in the table below.
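
A domain tally of this kind can be sketched in a few lines of Python.  This is only an illustration, not the script used for this diary; the function name and the sample addresses are made up, and entries without an "@" are simply skipped.

```python
from collections import Counter

def top_domains(addresses, n=20):
    """Count e-mail domains (converted to lower case) and return the n most common."""
    counts = Counter()
    for addr in addresses:
        # Keep everything after the last '@'; skip entries with no '@' or an empty domain.
        _, sep, domain = addr.strip().rpartition("@")
        if sep and domain:
            counts[domain.lower()] += 1
    return counts.most_common(n)

# Hypothetical sample data, not taken from the leaked file.
sample = ["alice@Yahoo.com", "bob@yahoo.com", "carol@gmail.com", "not-an-address"]
print(top_domains(sample))
```

Note that this does nothing about misspelled domains; they are counted as whatever was typed.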


The passwords

I saw an interesting analysis of the eHarmony passwords by Mike Kelly at the Trustwave SpiderLabs blog and thought I'd do a similar analysis of the Yahoo! passwords (and I didn't even need to crack them myself, since the Yahoo! ones were posted in the clear).  I pulled out my trusty install of pipal and went to work.  As an aside, pipal is an interesting tool for those of you who haven't tried it.  As I was preparing this diary, I noted that Mike says the Trustwave folks used PTJ, so I may have to take a look at that one, too.

The first thing to note is that of the 442,836 passwords, there were 342,508 unique passwords, so over 100,000 of them were duplicates.

Looking at the top 10 passwords and the top 10 base words, we note that some of the worst possible passwords are right there at the top of the list.  123456 and password are always among the first passwords that the bad guys guess because for some reason we haven't trained our users well enough to get them to stop using them.  It is interesting to note that the base words in the eHarmony list seemed to be somewhat related to the purpose of the site (e.g., love, sex, luv, ...).  I'm not sure what the significance of ninja, sunshine, or princess is in the list below.

Top 10 passwords
123456 = 1667 (0.38%)
password = 780 (0.18%)
welcome = 437 (0.1%)
ninja = 333 (0.08%)
abc123 = 250 (0.06%)
123456789 = 222 (0.05%)
12345678 = 208 (0.05%)
sunshine = 205 (0.05%)
princess = 202 (0.05%)
qwerty = 172 (0.04%)
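
The total, unique, and top-10 figures above are what a plain frequency count produces.  A minimal sketch, assuming one password per list entry (the function name and sample data are hypothetical, and this is not pipal's own code):

```python
from collections import Counter

def password_summary(passwords, n=10):
    """Return (total, unique, top) where top is a list of (password, count, percent)."""
    counts = Counter(passwords)
    total = len(passwords)
    top = [(pw, c, round(100.0 * c / total, 2)) for pw, c in counts.most_common(n)]
    return total, len(counts), top

# Hypothetical sample data.
sample = ["123456", "password", "123456", "ninja"]
total, unique, top = password_summary(sample)
print(total, unique)
for pw, count, pct in top:
    print(f"{pw} = {count} ({pct}%)")
```

The difference between total and unique is the number of entries that repeat a password already seen.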

Top 10 base words
password = 1374 (0.31%)
welcome = 535 (0.12%)
qwerty = 464 (0.1%)
monkey = 430 (0.1%)
jesus = 429 (0.1%)
love = 421 (0.1%)
money = 407 (0.09%)
freedom = 385 (0.09%)
ninja = 380 (0.09%)
sunshine = 367 (0.08%)
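
For readers wondering how a base word differs from a password, here is a sketch under one stated assumption: a base word is the password with any non-letter characters stripped from both ends and the result converted to lower case.  That definition is my assumption for illustration and may not match exactly what pipal does; the names below are hypothetical.

```python
import re
from collections import Counter

# Assumed definition: strip non-letters from the start and end only.
_EDGES = re.compile(r"^[^A-Za-z]+|[^A-Za-z]+$")

def base_word(password):
    """Strip leading/trailing non-letters and convert to lower case."""
    return _EDGES.sub("", password).lower()

def top_base_words(passwords, n=10):
    """Count base words, ignoring passwords that contain no letters at all."""
    counts = Counter()
    for pw in passwords:
        word = base_word(pw)
        if word:
            counts[word] += 1
    return counts.most_common(n)

# Hypothetical sample data.
sample = ["Password1", "password", "123456", "ninja99", "2ninja"]
print(top_base_words(sample))
```

Under this definition "Password1" and "password" collapse to the same base word, which is why password scores higher in the base word list than in the raw password list.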

Next, I looked at the lengths of the passwords.  They ranged from 1 (117 users) to 30 (2 users).  Who thought allowing 1 character passwords was a good idea?

Password length (count ordered)
8 = 119135 (26.9%)
6 = 79629 (17.98%)
9 = 65964 (14.9%)
7 = 65611 (14.82%)
10 = 54760 (12.37%)
12 = 21730 (4.91%)
11 = 21220 (4.79%)
5 = 5325 (1.2%)
4 = 2749 (0.62%)
13 = 2658 (0.6%)
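
A count-ordered length table like the one above can be produced with a short sketch such as this (function name and sample data are hypothetical):

```python
from collections import Counter

def length_distribution(passwords):
    """Return (length, count, percent) tuples, most common length first."""
    counts = Counter(len(pw) for pw in passwords)
    total = len(passwords)
    return [(length, c, round(100.0 * c / total, 2))
            for length, c in counts.most_common()]

# Hypothetical sample data.
sample = ["abcdef", "password", "12345678", "a"]
for length, count, pct in length_distribution(sample):
    print(f"{length} = {count} ({pct}%)")
```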

We security folks have long preached (and rightly so) the virtues of a "complex" password.  By increasing the size of the alphabet and the length of the password, we increase the work the bad guys must do to guess or crack the passwords.  We've gotten in the habit of telling users that a "good" password consists of [lower case, upper case, digits, special characters] (choose 3).  Unfortunately, if that is all the guidance we give, users, being human and, by nature, somewhat lazy, will apply those rules in the easiest way.

First capital last symbol = 1259 (0.28%)
First capital last number = 17467 (3.94%)
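
The two patterns above are easy to test for with regular expressions.  The sketch below assumes "first capital" means an ASCII upper-case first character, "last number" means a final ASCII digit, and "last symbol" means a final character that is neither a letter nor a digit; those definitions are my assumptions and may differ from the tool's.

```python
import re

# Assumed pattern definitions (ASCII only).
FIRST_CAP_LAST_NUMBER = re.compile(r"[A-Z].*[0-9]")
FIRST_CAP_LAST_SYMBOL = re.compile(r"[A-Z].*[^A-Za-z0-9]")

def count_lazy_patterns(passwords):
    """Count passwords that satisfy complexity rules in the most predictable way."""
    counts = {"first capital last number": 0, "first capital last symbol": 0}
    for pw in passwords:
        # fullmatch requires the whole password to fit the pattern.
        if FIRST_CAP_LAST_NUMBER.fullmatch(pw):
            counts["first capital last number"] += 1
        if FIRST_CAP_LAST_SYMBOL.fullmatch(pw):
            counts["first capital last symbol"] += 1
    return counts

# Hypothetical sample data.
sample = ["Password1", "Welcome!", "password1", "P1", "A"]
print(count_lazy_patterns(sample))
```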

On the other hand, if we don't enforce at least that much, users won't bother.

Only lowercase alpha = 146516 (33.09%)
Only uppercase alpha = 1778 (0.4%)
Only alpha = 148294 (33.49%)
Only numeric = 26081 (5.89%)
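
A sketch of these character-class checks follows.  One caveat: here "only alpha" includes mixed-case passwords such as PassWord, whereas the figure reported above is exactly the sum of the lowercase-only and uppercase-only rows, so the tool may be defining that category differently.  The names and sample data are hypothetical.

```python
import re

# ASCII-only character classes; "only alpha" here includes mixed case (an assumption).
PATTERNS = {
    "only lowercase alpha": re.compile(r"[a-z]+"),
    "only uppercase alpha": re.compile(r"[A-Z]+"),
    "only alpha": re.compile(r"[A-Za-z]+"),
    "only numeric": re.compile(r"[0-9]+"),
}

def charset_counts(passwords):
    """Count how many passwords consist entirely of each character class."""
    counts = {name: 0 for name in PATTERNS}
    for pw in passwords:
        for name, pattern in PATTERNS.items():
            if pattern.fullmatch(pw):
                counts[name] += 1
    return counts

# Hypothetical sample data.
sample = ["password", "NINJA", "PassWord", "123456", "abc123"]
print(charset_counts(sample))
```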

I thought it was also interesting looking at the passwords that contained a year:

Years (Top 10)
2008 = 1145 (0.26%)
2009 = 1052 (0.24%)
2007 = 765 (0.17%)
2000 = 617 (0.14%)
2006 = 572 (0.13%)
2005 = 496 (0.11%)
2004 = 424 (0.1%)
1987 = 413 (0.09%)
2001 = 404 (0.09%)
2002 = 404 (0.09%)
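
A year tally can be sketched as a substring search for each candidate year.  The range 1950 to 2012 is an arbitrary choice for illustration, not necessarily the range the tool checks, and a password containing the same year twice is counted once.

```python
from collections import Counter

def year_counts(passwords, first=1950, last=2012):
    """Count passwords containing each four-digit year in [first, last]."""
    years = [str(y) for y in range(first, last + 1)]
    counts = Counter()
    for pw in passwords:
        for year in years:
            if year in pw:
                counts[year] += 1
    return counts

# Hypothetical sample data.
sample = ["summer2008", "2008ninja", "jim1987", "password"]
print(year_counts(sample).most_common(10))
```

A plain substring match will also count digit runs that merely happen to contain a year (for example 120081), so the numbers are best read as upper bounds.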

What is the significance of 1987, and why nothing more recent than 2009?  When I analyzed some other passwords, I'd see either the current year, or the year the account was created, or the year the user was born.  And finally, some statistics inspired by the Trustwave analysis:

Months (abbr.) = 10585 (2.39%)
Days of the week (abbr.) = 6769 (1.53%)
Containing any of the top 100 boys names of 2011 = 18504 (4.18%)
Containing any of the top 100 girls names of 2011 = 10899 (2.46%)
Containing any of the top 100 dog names of 2011 = 17941 (4.05%)
Containing any of the top 25 worst passwords of 2011 = 11124 (2.51%)
Containing any NFL team names = 1066 (0.24%)
Containing any NHL team names = 863 (0.19%)
Containing any MLB team names = 1285 (0.29%)
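
The "containing any of ..." statistics boil down to a case-insensitive substring check against a word list.  A minimal sketch, with a made-up word list standing in for the real name lists (which are not reproduced here):

```python
def count_containing_any(passwords, words):
    """Count passwords that contain at least one of the words, ignoring case."""
    needles = [w.lower() for w in words if w]
    total = 0
    for pw in passwords:
        lowered = pw.lower()
        if any(needle in lowered for needle in needles):
            total += 1
    return total

# Hypothetical sample data and word list.
sample = ["Jacob2011", "bella!", "qwerty", "maxpower"]
names = ["jacob", "max"]
print(count_containing_any(sample, names))
```

Short words will match inside unrelated longer words ("max" also matches "maximum"), so counts produced this way can overstate how many users actually had the name in mind.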

I wish I had their list of curse words to test. :)


So, what conclusions can we draw from all of this?  Well, the obvious one is that without any direction, most users will not choose particularly strong passwords, and the bad guys know this.  What constitutes a good password?  What constitutes a good password policy?  Personally, I think the longer, the better, and I actually recommend [lower case, upper case, digit, special character] (choose at least one of each).  Hopefully none of these users were using the same password here as on their banking sites.  What do you, our faithful readers, think?

Jim Clausing, GIAC GSE #26
jclausing --at-- isc [dot] sans (dot) edu

The opinions expressed here are strictly those of the author and do not represent those of SANS, the Internet Storm Center, the author's spouse, kids, or pets.

13 comment(s)


I suspect the reason you didn't see any years after 2009 is because these are older passwords. As near as I can tell from the articles, these are passwords from an old file that predate the "Associated Content" acquisition in 2010.
Obviously, spammers also use Yahoo! as a return address, and having a real account might get them through spam filters more than a non-existent account would. The spammers do not care whether the account gets hacked, since they are never going to read the unsubscribe request sent there anyway, so they knowingly use bad passwords.

As far as real users, 2 factor id is the easiest way to go that is easy to implement and relatively easy to use. Some kind of physical token, such as a usb stick, cd, or other memory device could easily hold a certificate issued by the service to id the user; it doesn't have to be a 1-time pad from you know who. ;-) ( But that works pretty well also! )
Oh yes, many people also use "throw away" email accounts when making a purchase, subscribing to something, giving an email address to post a comment such as this one, etc. They too do not care about the security of the password, and a 1 char password works just fine for that purpose.
I have been an advocate of secure passwords for many years, and in the industry in which I work, security is a critical part. As such I am also an advocate of intelligent implementation and enforcement policies (non-MS approach).

The problem and solutions for my world never came down to complexity but education and implementation. We, as IT professionals, are schooled (or we would like to think we all are) in these tenets. But how do you educate those individuals who are not? It is never as easy as it seems. But it can be...

I have used a two finger complex password solution for about 7 years now that can be taught easily, but once you sell it to the masses, you essentially create the template for "others" to work diligently at breaking.

This is my example, and since I know it will be eons until the masses even comprehend the significance or importance, I do not worry. This password I have used for many years. I no longer do, so please feel free to use it against any algorithm to test.
I know exactly what it is, but to some systems (MS) this password has become NOT complex enough. However, I can do a two finger, two keyboard key (not counting shift), 16 character password that passes every time. Not only does it work, but all I need to do to come up with another password when the time period expires is to move up the keyboard one letter. So here it is.

16 character complex, two finger salute to security
qq11QQ!!qq11QQ!! and to reset ww22WW@@ww22WW@@ and so on and so on.

Enjoy the unendingly complex world of "BUT I didn't know I was supposed to do that." IT Security.
Maybe these providers could build in an account lockout feature - quarantine the source ip address if someone fails to login after 15 attempts in a 5 minute period?
Thanks for posting this, it's super timely. I'm writing some guidance on passwords for my company and it's nice to have another study of passwords to point to.
I have been a Yahoo! mail user for, oh, I don't know, probably 17 years or so. It irritates and bothers me that they do not use a second factor or even SSL. Nonetheless, about passwords: I hate them, but not for the obvious reasons.
1) They are just too fallible and rely on people to remember them.
2) They are easily compromised.
3) Once you reach a critical mass of them (I stand now at about 150 user account and password pairs), repeating them is very hard NOT to do.
The only solution I have found to date that is even worth a damn is the LastPass solution. You can use two factor authentication and extremely long and complex passwords, and the best part is that you have to remember 1 password and you're done. I do not work for them but have studied them in some detail and listened to the Steve Gibson podcast on them. If I have to use a password, then I use LastPass to help me come up with a unique, non repeating gibberish password that I do not have to remember.
There are two threats to login credentials:

1. A guessable password that can be found before an account locks out.

2. A poor application that does not protect stored credentials.

#1 means you don't need a complex hard-to-remember password. You just need one that is not easily guessable in a few tries.

#2 means that no matter what your password looks like, it's completely irrelevant if the application does not protect it.

If the application lets people abuse "I forgot my user name" or "I forgot my password" functionality, it doesn't matter what you choose. If the application does not securely store the credentials, it doesn't matter what you choose. If the application does not include brute-force protection, it doesn't matter what you choose because eventually it will be figured out. If you get a keylogger on your computer or you get man-in-the-middled, it doesn't matter what you choose.

"Lousy" passwords are almost completely irrelevant when considered next to HOW passwords get lost: a bad application. The fact is that almost all of the "lousy" passwords provided the needed security until the application coughed them up.
Passwords are a dated and flawed authentication method; certificates are a better method. Sites should allow you to give them a public key, and you can then control your own passwords and your private key.
JJ, I don't disagree; HOW is at least as big a problem.
whitetaco, certificates just introduce a whole new set of issues. Look at the issues with rogue CAs or breached CAs over the years, or the recent certificate revocation issues in browsers and with app-signing certs.
