A Really Good Article on How Easy it Is to Crack Passwords
schneier.com
Remember, security against cracking is a combination of password strength and key derivation function strength. Nothing will save you if your password is "password". Not much will save you if your password is hashed with MD5.
But scrypt can be over 100,000,000 times stronger than MD5 -- so if you're using scrypt you can afford to use a password which is 100,000,000 times weaker. "jdtwbv" hashed using scrypt is stronger than "H.*W8Jz&r3" hashed using MD5.
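To make the two sides of that comparison concrete, here is a minimal Python sketch using only the standard library's hashlib. The cost parameters (n=2**14, r=8, p=1), the 16-byte salt, and the helper names are illustrative assumptions, not values taken from the comment or from the scrypt paper.

```python
import hashlib
import hmac
import os

def md5_hash(password: str) -> str:
    # A single fast, unsalted MD5 call: cheap for an attacker to repeat billions of times.
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def scrypt_hash(password: str, salt: bytes) -> bytes:
    # Memory-hard key derivation. n, r and p are assumed example costs.
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=64)

def scrypt_verify(password: str, salt: bytes, expected: bytes) -> bool:
    # Constant-time comparison of the freshly derived key with the stored one.
    return hmac.compare_digest(scrypt_hash(password, salt), expected)

salt = os.urandom(16)
stored = scrypt_hash("jdtwbv", salt)
print(scrypt_verify("jdtwbv", salt, stored))
```

Each scrypt guess costs the attacker the same memory and time it costs the server, which is where the claimed strength factor would come from. It does nothing for a password that is first in every dictionary.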
""jdtwbv" hashed using scrypt is stronger than "H.*W8Jz&r3" hashed using MD5"
Is it? I'm not sure.
For the first one you're using lowercase letters (and digits; I'm giving you those for free).
For the first one we have 36^6. For the second one (all printables), 100^9.
Ratio between them: ~459,393,658. If you're saying scrypt is 100M times better, then in this case the second one is safer.
And the ratio matters, but less so as computers get faster. Option B may take 1M times as long as Option A, but if Option A takes 1 microsecond, there goes your Option B as well.
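The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch that takes the comment's own figures at face value (36 symbols over 6 positions, about 100 printable symbols over 9 positions); the variable names are made up for illustration.

```python
# Key-space sizes as stated in the comment, not independently derived.
weak_space = 36 ** 6       # lowercase letters plus digits, 6 characters
strong_space = 100 ** 9    # roughly 100 printable symbols, 9 characters

ratio = strong_space // weak_space
kdf_factor = 100_000_000   # the "100M times stronger" figure under discussion

print(f"key-space ratio: {ratio:,}")
print("stronger password wins:", ratio > kdf_factor)
```

With a factor of 100M the longer password still comes out ahead, which is the objection being raised.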
> If you're saying scrypt is 100M times better
Oops, you're right, I got the math wrong when I looked at the table in my paper. I should have said that scrypt can be over 100,000,000,000 times stronger. ;-)
People have built huge rainbow tables of MD5 hashes.
I don't really keep up with that game (like WoW, it seems like a fun game, but only if you are willing to put in a lot of your time), but I think the current limit is somewhere around 8 or 9 characters if you are pulling from all printables, meaning that "H.*W8Jz&r3" with MD5 is probably not breakable right now.
Take off two characters, or wait 3 years, and it probably will be.
My understanding, and I'm sure that someone else will correct me, is that with MD5 rainbow tables it's not so much that someone will get your password as they will get something that hashes to the same value. More than likely this will be your password, but sometimes not. The point is that it doesn't matter if your password is 25 characters long.. if there's a 5 character password that hashes to the same value they could log in with it.
While it's possible to make MD5 collisions, finding something that hashes to the same thing as a hash of my short password is essentially impossible.
In fact, collisions on short passwords are harder than collisions on long passwords. The space of all MD5 outputs is way bigger than the space of 12-character passwords.
> "jdtwbv" hashed using scrypt is stronger than "H.*W8Jz&r3" hashed using MD5
But "password" is still grossly insecure in either case; it'll still be the first thing that someone performing a dictionary attack will try. Never tell people how good your key derivation function is, lest they misunderstand and think it means they don't have to choose a non-obvious password/passphrase.
Just use bcrypt :-p
bcrypt is not bad and you're definitely better off with that than with MD5, but scrypt performs better for this sort of thing. There was an article on HN a week or so ago about this.
One of the problems of scrypt is a lack of language bindings. There's no officially blessed language binding for PHP, and there's a ruby gem that only works on MRI but doesn't support jruby. Bcrypt, on the other hand, is widely supported; simple and easy-to-use implementations exist for devise and activerecord, for example. I'd pick the slightly worse but widely supported algorithm and tune the work factor rather than be stuck with an extension that might or might not be supported depending on the author's availability of time and resources.
This all can change, but that's the current state of things. Sorry scrypt.
> there's a ruby gem that only works on MRI but doesn't support jruby
Works as well as any other MRI extension, which is to say pretty well:
    jruby-1.7.4 :001 > require 'scrypt'
     => true
    jruby-1.7.4 :002 > SCrypt::Password.create("bla")
     => "400$8$2d$096ac4e8a120a4f9$d1e13bbfa387196d68f116d76ae23d0d3ffa39c891192a56832db0a1d8f6a8ec"
Yay for ffi.
Well, C-Extension support in jruby is wonky at best. It works for some and doesn't for others and is sometimes scheduled to be removed. Granted, this one works, I stand corrected. There's a full java implementation for scrypt as well. However, my point still stands: There's no integration in devise, none in rails. No PHP implementation.
It's all fairly easy to change, but nobody has done so :)
Devise: https://github.com/capita/devise-scrypt PHP: http://pecl.php.net/package/scrypt
So not quite "none", though granted being some random third party addon might as well be for many people (and possibly a good attitude to take for something security-critical).
ok, my original statement regarding PHP was:
> There's no officially blessed language binding for PHP.
And judging from https://github.com/DomBlack/php-scrypt/issues/9 that's going to stay like that a while. There's a pure PHP implementation that falls back to the pecl package, so that's probably your best bet atm.
I think that since bcrypt is more widely available I will continue recommending it, and using anything else should be based on a risk assessment. I do not think people who don’t do a risk assessment on password storage will care enough to spend time on something that is not easy to set up when it “only” affects security.
That article was simply wrong. The chart at the top of it was added after it was pointed out that PBKDF2 is worse than bcrypt as a password hash, and the chart refutes the article.
Great article. Old discussion here
> "This is an answer to the batteryhorsestaple thing."
Steube misunderstands the xkcd comic [1]. There's a really good comment which explains it: "It could be argued that Randall's example of 4 words is too short -- and indeed, for some applications, it is. However for a typical dictionary size, and genuinely random selection, it is massively stronger than "typical" passwords and in fact easily adequate to defeat the above-mentioned attacks." [2]
Emphasis on "genuinely random selection."
[2] http://www.schneier.com/blog/archives/2013/06/a_really_good_...
What makes you think he misunderstands it? For the cracker it's not about entropy per se, it's a game to come up with algorithms that crack more passwords for less compute power. The XKCD comic got a lot of mindshare so it makes sense to target algorithms towards that type of password.
I think Schneier's suggestion of reducing it to the first letter of each word is vastly preferable because it packs the majority of entropy from random word selection into the least amount of typing.
The algorithm is not targeted against the type of password which the XKCD comic suggests. The algorithm is designed to exploit common human behavior, which is similar to the XKCD method but not identical. The significant difference is that human behavior in picking words is not random, while the XKCD method requires the word selection process to be truly random. The "iloveyousomuch" example by Steube is unlikely to be picked randomly.
salmonellaeater is right, Steube misunderstands the comic. The idea of the comic is to pick a small random selection from the 250,000 distinct words in an Oxford dictionary, rather than 8 of the 95 ASCII printable characters. A selection of 3 words then has higher entropy than 8 random characters, because 250,000^3 is a bigger number than 95^8. The question then is, will 3 random words really be easier to remember than 8 ASCII printable characters?
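The comparison is easier to read in bits. A small sketch, assuming uniformly random selection in both cases and using the 250,000-word and 95-character figures from the comment (the function name is just for illustration):

```python
import math

def entropy_bits(choices: int, picks: int) -> float:
    # Bits of entropy when each of `picks` items is drawn uniformly at random
    # from `choices` possibilities.
    return picks * math.log2(choices)

three_words = entropy_bits(250_000, 3)
eight_chars = entropy_bits(95, 8)

print(f"3 random words:      {three_words:.1f} bits")
print(f"8 random characters: {eight_chars:.1f} bits")
```

The gap is only about one bit, so the argument rests mostly on memorability rather than on raw strength.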
The downside to the Schneier scheme is that each password starts from a common sentence (low entropy), with a chosen transformation algorithm applied on top. Thus the quality of the password depends on the number of transformation algorithms and the quality of each one. If we use the one first described to create "tlpWENT2m", we get a password strength like this:
Using strictly the first letter of each word gives only a 2x linear increase in entropy over just searching for common sentences. Replacing words with common number substitutes adds a 0-2x increase. Writing one of the words in all caps gives a 6x increase. Combined, tlpWENT2m is slightly less secure than "This little piggy went to market" plus two [random number below 10] or a single letter at the end.
Where are you guys getting this? All I read was this:
> Steube was able to crack "momof3g8kids" because he had "momof3g" in his 111 million dict and "8kids" in a smaller dict.
> "The combinator attack got it! It's cool," he said. Then referring to the oft-cited xkcd comic, he added: "This is an answer to the batteryhorsestaple thing."
It sounds to me like he's combining words randomly, not "exploiting common human behavior".
He found a password by combining 2 words from two dictionaries of different sizes, so he only had m * n combinations to choose from, and his n is a lot smaller than m.
Whereas the xkcd approach is more like m * m * m * m.
In other words, exponentiation > multiplication.
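For readers who have not seen one, a combinator attack can be sketched in a few lines of Python. The two tiny word lists are hypothetical stand-ins for the 111-million-entry dictionary and the smaller one mentioned in the article, and the function names are made up; real tools work very differently under the hood.

```python
import hashlib
from itertools import product
from typing import Iterable, Optional

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def combinator_attack(target: str, left: Iterable[str], right: Iterable[str]) -> Optional[str]:
    # Try every left+right concatenation: len(left) * len(right) guesses in total.
    for a, b in product(left, right):
        candidate = a + b
        if md5_hex(candidate) == target:
            return candidate
    return None

# Hypothetical miniature dictionaries, for illustration only.
big_dict = ["password", "momof3g", "letmein"]
small_dict = ["123", "8kids", "!"]

target_hash = md5_hex("momof3g8kids")
found = combinator_attack(target_hash, big_dict, small_dict)
print(found)
```

The work grows as m * n, whereas four words drawn at random from one list of size m would need m^4 guesses.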
Correct. What I meant by "exploiting common human behavior" is that the dictionaries the attacker used are built from lists of old passwords found in previous attacks. Those dictionaries will be orders of magnitude smaller than a dictionary of the English language, but attackers know that people tend to pick passwords (or in this case, compilations of passwords) that someone else has already thought of. It's a simple observed behavior: people in general tend to think alike, and simply do not think randomly even if, individually, it "feels" random.
> The question then is, will 3 random words really be easier to remember than 8 ASCII printable characters?
In a sense, yes. The xkcd comic also illustrates this. A common technique to remember a sequence of arbitrary things is to transform the things into concepts or objects, and transform this sequence into a ridiculous story or visual image (the crazier it is, the better it sticks in the mind, plus it's more fun).
If you use words instead of random characters, you get to skip the "transform into concepts or objects" step, and you don't need to string as many of them together into a crazy but coherent picture/story.
Of course it's important to build the picture after the words, not the other way around, because then you'd probably lose some entropy again.
The entropy Randall calculated for "correcthorsestaplebattery" was a lower bound, meaning that if the attacker knows that you made your password out of 4 dictionary words, it still has tons of entropy. If the attacker doesn't know how you came up with your password, it'll take them even longer.
How would something attack Diceware?
There's a list of 7776 words, and everyone knows what words are on the list. I suspect that sometimes people re-roll because they don't like a word or don't think they'll remember it. But I don't think that makes much difference.
> I suspect that sometimes people re-roll because they don't like a word or don't think they'll remember it.
This is strongly discouraged, and it does matter. The words are supposed to be random, but re-rolling makes them not random.
What password length would you need to get away with a plain-old grammatical english sentence (i.e. very much non-random selection)?
For example: "and in the swept plains of winter's vale, our hero did beseech the emperor to send for his forces" -- what would be the difficulty in cracking that, given that this isn't a quote from a book or anything, but just a sentence that popped into my mind and seems easy enough to remember?
Almost 20 years ago I saw a great password-picking article that still holds today. http://world.std.com/~reinhold/diceware.html
Take a list of 6^5 words. Roll 5 dice. Take that word from the list. Do this 4 more times. You now have a five-word passphrase like "moire fraud 80 row bernet".
Even if someone knew the exact method and list you did to get that passphrase, there are 28430288029929701376 combinations, giving you over 64 bits of entropy.
Someone has probably tried to rainbow table all those results for MD5. If a core can do 1 billion hashes per second, it would take 900 core-years to build a complete list of all those combinations, which is probably feasible for a small group to put together, but messing with the list just a little bit or adding a 6th word would likely put you past that even for a crappy MD5 hashing.
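The figures above are easy to reproduce. The sketch below is an assumption-laden illustration: it substitutes Python's secrets module for physical dice, uses a placeholder word list where a real Diceware list would go, and takes the 1 billion hashes per second per core rate from the comment.

```python
import math
import secrets
from typing import Sequence

WORDS_PER_LIST = 6 ** 5              # 7776, one word per five-dice outcome
combinations = WORDS_PER_LIST ** 5   # five-word passphrases
bits = math.log2(combinations)

def roll_word_index() -> int:
    # Five fair dice read as one base-6 number in the range 0..7775.
    index = 0
    for _ in range(5):
        index = index * 6 + secrets.randbelow(6)
    return index

def passphrase(wordlist: Sequence[str], words: int = 5) -> str:
    if len(wordlist) != WORDS_PER_LIST:
        raise ValueError("expected a 7776-word list")
    return " ".join(wordlist[roll_word_index()] for _ in range(words))

# Placeholder list so the sketch runs; a real Diceware list would go here.
demo_list = [f"word{i}" for i in range(WORDS_PER_LIST)]

HASHES_PER_SECOND = 1_000_000_000    # per-core rate assumed in the comment
core_years = combinations / HASHES_PER_SECOND / (365.25 * 24 * 3600)

print(f"{combinations:,} combinations, {bits:.1f} bits, about {core_years:.0f} core-years")
print(passphrase(demo_list))
```

Adding a sixth word multiplies the work by another 7776, which is the point about getting past what a small group could precompute.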
Shannon did an experiment that found the entropy of English text is about 1.6 bits per character. This is probably a high estimate, since the kinds of sentences you might think up for a password probably have lower entropy than if you used a source of random bits to generate valid sentences.
My God, are you going to type all of that, or will you need a script to do it for you? Watch out for those touch-screen thingies people are touting around.
Some things you don't always need to use from those touch-screen thingies
That's a funny choice for the name. Is it kee-pass or keep-* ?
With Swype and similar programs, passphrases are pretty easy to enter.
I know there are tools & password vaults but what %-age uses them? Secondly, those password managers are introducing another possible vulnerability where you don't have control.
Swype is a text-entry interface.
'Also included in the list: "all of the lights" (yes, spaces are allowed on many sites), "i hate hackers," "allineedislove," "ilovemySister31," "iloveyousomuch," "Philippians4:13," "Philippians4:6-7," and "qeadzcwrsfxv1331." "gonefishing1125" was another password Steube saw appear on his computer screen. Seconds after it was cracked, he noted, "You won't ever find it using brute force."'
If you won't ever find "gonefishing1125" using brute force, how on earth did they find "qeadzcwrsfxv1331"?
Have you looked at the keyboard pattern for qeadzcwrsfxv1331?
I imagine there are a whole bunch of these geometric patterns, and different combos of them are tried.
Passwords are broken and I really wish we would all move away from them. Persona is a nice idea with regards to privacy and control, but it's still a password that you need to remember, which can be cracked. Also, people generally don't use strong passwords.
What irks me is that every OS in use today has support for strong cryptography and browser vendors could easily integrate that. We would no longer register for a website, we would simply upload our "Online Identity" or whatever we called it. This of course is just an id_rsa.pub with maybe name and email in the comment. The remote site stores the public key and the browser authenticates using the private key, stored securely in the keychain.
This has the potential to be invisible to users, and thus used by default, and highly secure since the local keychain can generate incredibly strong keys, all behind the scenes.
Persona doesn't require a password. You could authenticate with an SSL certificate, a Yubikey or whatever else you want. I wrote my own, hosted identity provider (https://www.persowna.net/) which includes 2FA now, and I plan to add more of these types of authentication in the future.
> What irks me is that every OS in use today has support for strong cryptography and browser vendors could easily integrate that. We would no longer register for a website, we would simply upload our "Online Identity" or whatever we called it. This of course is just an id_rsa.pub with maybe name and email in the comment. The remote site stores the public key and the browser authenticates using the private key, stored securely in the keychain.
Like SSL client certificates?
I agree. Do we have to leave this initiative up to the browser developers though? As a website developer why can't I just replace the traditional password form field with a textarea form field, requiring the user to copy and paste their RSA private key (for my site) into the field, which would then be validated against their public key kept in the website user table? For additional security the private/public key pair could also be password locked. As long as my site(s) are using SSL, and other best practices, isn't the biggest risk one of the user losing their private key or having nefarious hands otherwise getting a hold of it?
And how do you access your identity from a device that isn't your own?
I'm 100% with you, it would be a major step forward - but it's too inflexible for Joe & Jane.
I wouldn't mind it if I had to have my phone to access the identity. It would be a simple matter of integration to use the phone to grant a temporary authorization to an unknown device.
and also it kinda destroys the ubiquity of the service. you have to admit, the ability to access your account from any device anywhere is pretty cool (and very critical in some cases)
It certainly is a difficult sell to the average user. For most Internet Banking, it's already implemented, but try to get users to accept that when using Facebook or access to their mail.
In Denmark we have a public system called "NemID". It is a 2-factor authentication system that relies on a card with one-time codes or, eventually, a physical key generator. It is used for anything related to Internet banking or access to public services on the internet, such as applying to university, changing a tax return, and the like. Unless you can incorporate such a system, which ensures that most users already have the needed physical token, I'm not convinced you can pull it off.
I like schemes that have an explicit input of n random bits (or where you can at least have a good estimate on the entropy.) With the Schneier Scheme I can not be sure of the actual entropy of my password. Maybe my brain only generates a relatively small set of sentences which can be reverse-engineered from my comments on HN? :-)
A good algorithm would take n bits and map them uniquely to a set of strings that are easy to remember for a human. The apg utility does something like that.
Why not force the user to have strong login credentials?
I'm creating an online system that will store users' sensitive financial data. When setting up an account, the user will have to choose a password as normal, but will also be given a passphrase of the form "correct horse battery staple" that they must write down. To log in, the user will need to enter (a) username; (b) password; and (c) passphrase.
It is effectively a poor man's two-factor authentication - the second factor being the piece of paper containing the passphrase. I think it strikes a good balance between security, convenience and cost.
What do others think of this approach?
That's not TFA since the piece of paper with the passphrase is not "a thing you have". It's just "another thing you know" and thus brute-forceable. It's the same as not allowing the user to choose a password and instead generating 12-character random passwords with special chars.
Authentication devices for TFA are designed so that you really have to have the device close to you when you log in.
Another issue with this is it breaks password managers, including the built-in browser password storage. While you might say that's a Good Thing for security, it's not something you could easily pull off as a startup.
Due to lock-in effects, people have to deal with all manner of usability hell from their bank, but the same logic doesn't apply to startups. Not that your idea is usability hell, but you probably don't want to make it any harder than it needs to be.
I think adding a few characters to the minimum password would be equally secure and more consistent with tooling, as well as a more familiar model for users.
Also, 2FA might be easier than you think using a service like Twilio. Another way to do it would be to let the user connect via a service that does support 2FA (e.g. Google or Twitter; and maybe add your own password if you want to harden that).
I recently added 2FA (OATH/Google Authenticator) support to Persowna[1], and it only took about two hours, 1:55 of which was spent on the UI. It's really not very hard.
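For a sense of why the server side is small, here is a minimal sketch of the time-based one-time password scheme (RFC 6238, built on HOTP from RFC 4226) that Google Authenticator-style apps implement. It is not Persowna's code; the function names, the 30-second step and the one-step drift window are assumptions for illustration.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter, then dynamic truncation.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    # RFC 6238: the counter is the number of `step`-second intervals since the epoch.
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)

def verify(secret: bytes, submitted: str, at: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    # Accept codes from adjacent time steps to tolerate some clock drift.
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + i * step, step), submitted)
        for i in range(-window, window + 1)
    )

# Secret and timestamp taken from the RFC 6238 test vectors.
print(totp(b"12345678901234567890", at=59, digits=8))
```

The shared secret still has to be stored server-side, so a real deployment would also need to protect that secret and rate-limit code attempts.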
> I think adding a few characters to the minimum password would be equally secure
Do you mean saying to the user, "your password must be at least 12 characters long"? That would just result in the user adding "12345" to the end of their standard password. Still seems much easier to crack than 4 random words.
> Also, 2FA might be easier than you think using a service like Twilio.
It might be easy for me to set up, but for my users (who are mostly non-technical) it is still relatively painful to install and set up a two-factor authentication app. I think most of my users would prefer the write-down-four-words option, even if it is a little less secure.
> you probably don't want to make it any harder than it needs to be
OK, so the question is, "does it need to be harder than the standard login form of username and password field?". Since my system deals with sensitive financial data, and given the problems with allowing users to pick their own password, I would say the answer is "Yes"
I think you need to analyze the risks more specifically.
You should rate-limit login attempts on the live site. Even allowing only one login attempt per second kills any brute-forcing attack if your passwords have even mediocre complexity.
Password cracking is only really a threat if the bad guys get your database. And if they do, it's not much more difficult to crack two passwords than one.
The point of true two-factor is that the second "password" which comes from the device is never stored in your DB, so it cannot be cracked. That is not true of your approach.
Yes, you are right that it is not 'true' two-factor authentication. It would certainly be more secure if all my users were able and willing to use something like Google Authenticator. However, I suspect that most of my users (who are not especially computer literate) would prefer the simplicity of writing down 4 words over having to install and configure a two-factor app on their phone.
You say, "it's not much more difficult to crack two passwords than one" but I don't see how that is the case if the second password is four words chosen at random from a dictionary of say 5000 words. Such a password is far more difficult to crack than the average password chosen by the average user. Having a second passphrase generated by a computer also eliminates the problem of users re-using the same password between sites, or choosing "letmein" or "password1" as a password.
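Both sides of this disagreement can be put into numbers. The sketch below uses the 5000-word, four-word figures from the comment; the two guess rates (one per second for a rate-limited login form, one billion per second for an offline attack on a fast hash) are assumptions chosen only to show the spread.

```python
import math

DICTIONARY_SIZE = 5000   # figure from the comment above
WORDS = 4

combinations = DICTIONARY_SIZE ** WORDS
bits = math.log2(combinations)

def years_to_exhaust(guesses_per_second: float) -> float:
    # Worst-case time to try every combination at a fixed guess rate.
    return combinations / guesses_per_second / (365.25 * 24 * 3600)

online_years = years_to_exhaust(1)      # rate-limited login form
offline_years = years_to_exhaust(1e9)   # assumed offline rate against a fast hash

print(f"{combinations:,} combinations ({bits:.1f} bits)")
print(f"online:  {online_years:,.0f} years")
print(f"offline: {offline_years * 365.25:.1f} days")
```

So the passphrase is far out of reach for online guessing, but if the database leaks and the hash is fast, the margin shrinks to days, which is why the choice of key derivation function keeps coming up in this thread.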
Why not just require your users to set a 4-word passphrase as their password? You'll capture more variations than you would working from a fixed 5,000 word dictionary, and your users can still choose to write the words down if they want--or they can use the password management features of their browsers if they want. Plus it would be simpler to build and maintain, which is a plus when it comes to security.
The problem is that the average user is really bad at choosing a password. If the system requires a four-word passphrase then the user will choose easy-to-crack passphrases such as "use the force luke" or "john paul george ringo".
If the system randomly chooses the four words then you force the user to exchange convenience for security.
Then why let users choose a password at all? Why not just assign them one that they have to write down? And if they're going to write it down anyway, why make it words? Why not generate a 20 character random string?
I obviously don't have the whole picture of your effort, but from your description so far, I think you are over-emphasizing the importance of clever password schemes. As Colin points out in the top comment, hashing with scrypt will make even mediocre passwords uncrackable. So it would be a better use of your time to implement scrypt or bcrypt with just one password.
And high-speed cracking is only a problem if the bad guys get your password table. To do that they will have to get into your application...and if that happens there are all sorts of other problems. So I'd argue that spending more time testing and proving the overall security of your app is also a better use of your time.
> Then why let users choose a password at all? Why not just assign them one that they have to write down?
Because then anyone who reads what is written down gains full access to the user's data. A password (kept in human memory) plus passphrase (written down) is more secure. I would agree this comes at the cost of convenience, but I think the trade-off is worth it.
> Why not generate a 20 character random string?
Because it would be a real pain to type each time the user logs in. In this case I don't think the security/convenience trade-off is worth it.
> As Colin points out in the top comment, hashing with scrypt will make even mediocre passwords uncrackable.
True, but what percentage of users choose poor passwords - not mediocre ones? Scrypt will not be much good if the user chooses a password from the dictionary, or a word that appears in a list of the top 10,000 most common passwords. (Edit: According to Mark Burnett [2] such passwords are chosen by 99.8% of users)
> And high-speed cracking is only a problem if the bad guys get your password table.
The password + passphrase model also protects users who choose the same password for different online systems. A weakness in some other website (or something more evil [1]) will not compromise the security of my online system.
Edit: Low-speed cracking might also be a problem. Mark Burnett says 14% of users have a password from the top 10 password list [2].
You are presuming that your users would not write down the password they choose for themselves as well, perhaps on the same sheet of paper.
I'll leave this final thought--how many other websites have implemented a double password system like the one you're proposing? I don't know of any.
Is that because you have come up with a more secure solution that no one else has thought of? Or that your approach does not confer the security advantages that you think it does? Which is the more likely explanation?
It's certainly better than just a password, and as you say is a nice balance between usability and strong passwords.
However, I'd be careful about thinking of it as any sort of 2-factor authentication and wouldn't bestow any of the advantages of 2-factor auth on your scheme.
A static secret, no matter how complex, doesn't really prove ownership because multiple people can trivially have a copy of the secret at the same time. So you don't have a knowledge and a physical factor, just a convoluted knowledge factor.
Better than just a password, but don't let it give you a false sense of security.
The passphrase looks to be very weak if it is just something like 4 English words. And it fails the convenience test.
Correct horse battery staple comes from https://xkcd.com/936/
Except everyone will lose their passphrase. Everyone.
People seem to forget this important fact - That hashes get leaked. Without a hash corresponding to a user account it's quite hard to break in to a given account with a moderately reasonable password, even if the hash can be 'broken' in milliseconds.
One benefit of being an Indian-language speaker (or a speaker of any other language not in hackers' dictionaries) is that we can easily choose reasonably secure passwords that are easy to remember by simply using native-language phrases (combined with numbers and mixed caps).
Assuming there aren't any Indians writing password-cracking software...
The Ars article seemed totally irrelevant to me since it used MD5?
Microsoft Active Directory servers (used in big business and government all over the world) use one round of MD4 (no salt). That's a 4, not a 5.
The cracking technique discussed is dictionary plus some common substitutions. So the hashing algorithm is not very important. You would lose some factor of speed, but the 1000 most common passwords times 10 common substitutions, perhaps with 100 postfixes, is still only 1 million hashes. And with these you would crack some non-negligible fraction of the passwords in an unsalted database in probably under a minute. (If the passwords are salted in the db, then you need a minute per hash, so assuming that you crack a few percent of the hashes you try, you expect one password in under two hours even with modern password hashes.)
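The candidate count in that estimate can be reproduced with a short generator. Everything here is a hypothetical stand-in: the word list is a placeholder for "the 1000 most common passwords", and the ten rules and hundred postfixes are arbitrary examples of the kind of mangling described.

```python
from itertools import product
from typing import Callable, Iterator, Sequence

Rule = Callable[[str], str]

def candidates(words: Sequence[str], rules: Sequence[Rule],
               postfixes: Sequence[str]) -> Iterator[str]:
    # Every word, under every substitution rule, with every postfix appended.
    for word, rule, postfix in product(words, rules, postfixes):
        yield rule(word) + postfix

# Hypothetical inputs sized to match the comment's estimate.
words = [f"word{i}" for i in range(1000)]
rules = [
    lambda w: w, str.capitalize, str.upper,
    lambda w: w.replace("o", "0"), lambda w: w.replace("a", "@"),
    lambda w: w.replace("e", "3"), lambda w: w.replace("i", "1"),
    lambda w: w.replace("s", "$"), lambda w: w[::-1], lambda w: w + w,
]
postfixes = [f"{n:02d}" for n in range(100)]

total = len(words) * len(rules) * len(postfixes)
print(f"{total:,} candidates per salt")
```

With a salted database the whole list has to be hashed again for every account, which is where the "a minute per hash" figure comes from.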
I don't understand the difference between "momof3g8kids" and "tlpWENT2m". Why would the latter be more secure?
Actually, it's hard for a 9-character password to beat a 12-character password even though the latter has a larger alphabet/key-space (unless I've completely blown the analysis below, which was done before coffee :).
The first has a key-space of 36^12 (36 possible characters in each of 12 positions), or about 4.7e18. The second has a key-space of 62^9 (62 upper/lower case letters and digits in each of 9 possible locations), or about 1.4e16.
If, in addition to adding the uppercase letters, you added the possibility of needing to test symbols, such as ~`.,/:;!@#$%^&*-=_+ (another 19 symbols), and changed the latter password to "tlpW#NT2m", then the searchable key-space for all 9-character passwords becomes 81^9, or about 1.5e17.
RE-EDIT: Sorry. I should have read the article first. I'm not sure why the latter would be more secure. Obviously "WENT" would be in a dictionary, so I'd think that "tlpWENT2m" would fall to a combinator attack very quickly, too.
The former is common enough for multiple people to have it as their username, for a start.