A salt-free diet is bad for your security

I am not giving anyone health advice. Instead, I’m going to use the example of the recent LinkedIn breach to talk about hashes and salt. Not the food, but the cryptology. Before you dive into this article, you should certainly review the practical advice that Kelly has posted first. Also Kelly’s article has more information about the specific incident.

I’m writing for people who saw security people talking about “salt” and want to know more.
You may have seen things like this, that appeared in an article at The Verge.

It’s worth noting that the passwords are stored as unsalted SHA-1 hashes. SHA-1 is a secure algorithm, but is not foolproof. LinkedIn could have made the passwords more secure by ‘salting’ the hashes.

If you would like to know what that means, read on.

What we know

We know that a data set of about 6 million password hashes has been released. We also know that this does include LinkedIn passwords (more on how we know that later). The data made public does not include the usernames (email addresses), but it is almost certain that the people who got this data from LinkedIn have that. By the end of this article, you will understand why people who broke in would release only the hashes.

Hashing passwords

Websites and most login systems hash passwords so that they only have to store the hash and not the password itself. I have been writing about the importance of hash functions for a forthcoming article, but here I will try to keep things simple. A hash function such as SHA-1 converts a password like Password123 to “b2e98ad6f6eb8508dd6a14cfa704bad7f05f6fb1”. A good hash function will make it unfeasible to calculate what the password was if you only know the hash, but the hash function has to make easy to go from the password to the hash.

A hash function always produces the same hash from the same input. It will always convert “Password1” to that same thing. If both Alice and Bob use Password123, their passwords will be hashed to the same thing.

Let’s put rainbows on the table

Even if Bob and Alice use the same password (and so have the same hash of their passwords) they don’t have to worry about each other. Who they need to worry about is Charlie the Cracker. Charlie may have spent months running software that picks passwords and generates the hashes of those passwords. He will store those millions of passwords and their hashes in a database called a “Rainbow Table.”

A small portion of Charlie’s table may look like this

Password SHA-1 Hash
123456 7c4a8d09ca3762af61e59520943dc26494f8941b
abc123 6367c48dd193d56ea7b0baad25b19455e529f5ee
Password123 b2e98ad6f6eb8508dd6a14cfa704bad7f05f6fb1

(Rainbow tables are structured to allow for more efficient storing and lookup for large databases. They don’t actually look anything like my example.)

When the hashed passwords for millions of accounts are leaked, Charlie can simply look up the hashes from the site and see which ones match what he has in his tables. This way he can instantly discover the passwords for any hash that is in his database.

Of course, if you have been using 1Password’s Strong Password Generator to create your passwords for each site, then it is very unlikely that the hash of your password would be in Charlie’s table. But we know that not everyone does that.

Charlie also has a lot of friends (well, maybe not a lot, but he does have some) who have built up their own tables using different ways of generating passwords. This is why Charlie, who has both the usernames and the hashed passwords, may wish to just circulate the hashes. He is asking his friends to help lookup some of these hashes in their own tables.

I also promised to tell you how we know that the leaked hashes are indeed of LinkedIn passwords. If someone has a strong unique password for their LinkedIn account it is very unlikely that it was ever used elsewhere. So if the hash for that strong and unique password turns up on the leaked list, we can know it is from LinkedIn. This is presumably what Robert Graham did when determined that this is the LinkedIn list.

Salting the hash

More than 30 years ago, people realized the problem of pre-computed lookup tables, and so they developed a solution. The solution was to add some salt to password.

Here is how salting can work. Before the system creates a hash of the password it adds some random stuff (called “salt”) to the beginning of password. Let’s say it adds four characters. So when Alice uses the Password123 the system might add “MqZz” as the salt to the beginning to make the password MqZzPassword123. We then calculate the hash of that more unique password. When storing the hash, we also use store the salt with it. The salt is not secret, it just has to be random. Using this method, the salted hash that would be stored for Alice’s password would be “MqZz1b504173d594fd43c0b2e70022886501f30aee16”.

Bob’s password will get a different random salt, say “fgNZ”, which will make the salted hash of his password “fgNZ2ec6fa506fa9048d231b765559e2f3c79bdee5a1”. This is completely different than Alice’s, and – more importantly – it is completely different from anything Charlie has in his rainbow tables. Charlie can’t just build a table with passwords like Password123, instead he would have to build a table that contained Password123 prefixed by all of the two million possible salts we get using a four character salt.

Beyond salt

Salting is an old technology, yet a surprising number of web services don’t seem to be using it. This is probably because many of the tool kits for building websites didn’t include salting by default. The situation should improve as newer toolkits encourage more secure design, and also as these issues make the news.

But just as we are getting people up to speed with a 30 year old technology, salting may no longer be enough. And mere salting certainly isn’t good for the passwords that require the most security. To defend against determined and resourceful password crackers you should use both strong passwords and a password based key derivation function like
PBKDF2, which 1Password does use when encryption your Master Password.

Cipher of Advanced Encryption Substitution And Rotation

In the rapidly changing world of eternal mathematical truths, we need to innovativize for the sustanation of our competitive advantage. So instead of serving our customer base with the same old encryption algorithms that are used by governments, military, and most of the Internet, we want to provide the world with new, proprietary systems that should clearly set us apart in the marketplace and among the expert community.

With great pleasure, I’d like to introduce our forthcoming encryption system: Cipher of Advanced Encryption Substitution And Rotation (CAESAR).

Rationale for CAESAR

No salt but the finest

Himalayan Rock SaltUsing block equality (described below) largely obviates the problem of obtaining a good salt or nonce in our cryptography. But because we will be transitioning to our new system, there are cases where we still need to find a good salt. Most cryptographic implementations have been written in the C programming language, and so for past decades the fashion as been for C-salt. C-salt, of course, was an improvement on table salt, which entirely failed to discourage the use of tables in cracking passwords.

Times have changed, and C-salt is now passé. Instead people have been discovering the virtues of Himalayan salt. Thus we sent off an expedition to Pakistan, Afghanistan, Nepal, India, and Bhutan to find and bring back the finest and rarest of salts for our encryption recipes. To protect our interests, this salt and its origins must remain secret.

Rotation, modularity, and alphabets

Pretty much all computer ciphers are based on the theory of finite groups. CAESAR is no different. Indeed, despite the fact that we just came up with the system recently, the use of groups and rotation in CAESAR may fairly be said to be precedent setting. Yeah, history can be weird that way.

We are faced with one difficulty in our approach. Our expedition to far off lands described above discovered something that completely shocked us in addition returning with a source of salt. They discovered that not everyone speaks English. Not only did this astonishing discovery lead to our researchers submitting a number of papers to various academic journals (oddly, we have not heard back from them), and caused considerable delays and expense for the expedition itself; it also means that our original rotation would not work as effectively in alphabets that use more than the English letters A through Z.

After careful consideration of this issue, we realized that requiring such people to use English would actually increase their security. They would be using a language that those around them may not be able to read, thus giving them a second layer of encryption. This showed how forward thinking our design was. It automatically provided higher security in some unanticipated cases.

All blocks are equal!

Blockrights designOne of the problem with the kinds of block ciphers is their lack of equality when used in CBC or CTR modes. We, however, want to return the spirit of ECB mode in which all blocks are equal (Credit for this particular insight goes to the people at 0-Day Clothing).

In addition to the fairness issues in support of the notion that all blocks should be treated equally, it must be noted that mathematics is built on the notion of a mathematical function. Ensuring that a given input produces a unique value, turning to something like ECB mode helps restore us to mathematical purity.

Where to from here?

I have provided only a few of the highlights of CAESAR and our new thinking in cryptography. This custom made, bleeding-edge scheme will doubtless place us in a unique position in the security community. Look for more new projects in the future, possibly a year from the date of this article’s publication, April 1. One we are looking at is System of Natural Additive Key Encryption—Overloaded In Layers.

Do you know where your software comes from? Gatekeeper will help

Mountain LionYou trust us to provide you tools that keeps some of your very valuable secrets safe. Part of that trust means that, when you install or update 1Password or Knox, you know the app you are getting comes from us. After all, if a bad guy produces a modified version of 1Password, it could do bad things. So far there have been no such modified versions “out there” and we want to keep it that way. In addition to all of the things that we do to ensure that you get the genuine article, Apple is working to make it even easier to keep your Mac free of malicious software.

Apple has just announced that Mountain Lion (to be released in the summer of 2012) will include something called Gatekeeper. This is a core OS X feature that I and others have been anticipating for a while. (surprisingly, almost all of its components are actually built into the latest version of Lion). Roughly speaking, Gatekeeper will allow you to control which apps to run depending on where the software comes from.

The question then is: how does your Mac know where your software comes from?

The Magic of Digital Signatures

I would love nothing more than to explain the mathematics behind digital signatures. But for today’s purposes, let’s just say it is magic (even when you know the math, it feels like magic). When you connect over HTTPS to a secure website, that website proves who it is because it knows a particular secret (called a “private key” or “secret key”). The corresponding “public key” is not kept secret.

The magic is that the website doesn’t have to reveal the secret to prove that it knows it.


Evilgrade Interface

Instead your computer system can use the non-secret public key to construct a mathematical puzzle that only someone who knows the secret key can solve; anyone with the public key can check that the solution to the puzzle is correct, but they can’t figure out what the secret key is. This can prevent someone hijacking the download process with a tool like evilgrade.

In the same way that a secure website can prove who it is without revealing any secrets, a digital signature on a file (or a group of files) can prove who made it. If someone makes even the smallest change to the signed file, the signature won’t verify.

Three Kinds of Apps

Applications that you install through the Mac App Store (MAS) are all digitally signed this way. But for years, Apple has been encouraging developers to digitally sign applications even if they aren’t sold through the MAS. So on your Mac today there are probably three kinds of applications:

  1. Those that came from the MAS
  2. Signed applications that did not come through the MAS
  3. Applications that aren’t signed

Gatekeeper will allow you to decide which of these categories of applications may run on your machine.

If you are running 1Password 3.9, then that came signed through the MAS. But if you are running 1Password 3.8 or Knox 2 from our website, they are still signed by us and will fall into the second category.

Verifying a signature today

When you install an application from the Mac App Store, the installation process checks the signature. It won’t install the app if it isn’t signed or if the signature doesn’t verify (which is more likely to happen through a damaged download than through a malicious attack, but both can happen). When you update the non-MAS version of 1Password, our updater runs a code signing signature verification as one of the three checks we use to ensure that you are getting the genuine 1Password from us. For those who are curious, our other two verification mechanisms are (1) fetching from a secure web server and verifying the server signature, and (2) checking a cryptographic checksum for the download which we fetch from a separate secure server.

But suppose you wanted to check the version of 1Password that you currently are running. All of those behind-the-scenes checks on the download and installation processes won’t help you do that. Well, the way to check now is hard, which involves running a complicated command in a Terminal window. Here it is for the non-MAS version of 1Password

codesign -vvv -R="identifier ws.agile and anchor trusted" \

The output should be something like

/Applications/ valid on disk
/Applications/ satisfies its Designated Requirement
/Applications/ explicit requirement satisfied

Clearly we don’t expect users to run these sorts of commands.

codesign in Terminal

We have been using Apple’s code signing mechanism for years because we wanted to be able to direct concerned users to this kind of command if they specifically ask. We’ve also been using it for additional security in our own updater. But another reason that we’ve been doing this for a while is because we’ve been anticipating either Gatekeeper or something similar.

Verifying a signature tomorrow

Gatekeeper will perform the codesign verification when an application is launched. This adds a great level of additional security beyond verifying the download source when the application is downloaded and installed.

A mathematically valid signature is the easy part

Apple Developer IDThe mathematics (the magic) makes all of the above simple. The hard part of Gatekeeper is the trustworthiness of the signatures. I can sit at a my computer and create a public/private key pair that says that it belongs to Alan Turing. Since Turing has been dead for more than half a century, few people would think that it actually belongs to that great mathematician and codebreaker. But what if I picked the name of a trusted person or institution that is around today?

The answer is that some trusted third party digitally signs my public key after verifying it belongs to who it says it belongs to. I’ve discussed how this system works (and how it can break down) when it comes to web server certificates, so I won’t repeat that here; the concepts are all the same. In the case of codesigning certificates for Mac developers, Apple does that checking and signing.

We changed our name a while back, so at some point before Gatekeeper is in common use, we will have to update our codesigning certificate identifier from “ws.agile” to “com.agilebits”. But for the time being, when you see “ws.agile” as the entity behind the digital signature on 1Password and Knox, you should know that that is us.

Other than getting a new certificate with our new name, we have been ready and waiting for years to get on board with the new security provided by Gatekeeper.

[Update: As of 1Password 3.8.19 Beta 1, 1Password is now signed with our new Apple Developer ID, AgileBits Inc.]

AES Encryption isn't Cracked

An otherwise excellent article over at The Inquirer has a very unfortunate title: AES encryption is cracked. AES is the Advanced Encryption Standard and is at the heart of so much encryption used today by governments, militaries, banks, and all of us. It is used by 1Password and less directly by Knox for Mac. It is the work horse of modern cryptography, and modern computer chips even have components built is to allow AES to be used efficiently. If AES were to be found weakened in any meaningful way, it would be very bad news for a lot of people.

Before I get into what has happened, I’d like to quote from the research paper itself: “As our attacks are of high computational complexity, they do not threaten the practical use of AES in any way.

And quoting the Inquirer’s interview with Andrey Bogdanov, one of the researchers, we learn

“To put this into perspective: on a trillion machines, that each could test a billion keys per second, it would take more than two billion years to recover an AES-128 key,” the Leuven University researcher added. “Because of these huge complexities, the attack has no practical implications on the security of user data.” Andrey Bogdanov told The INQUIRER that a “practical” AES crack is still far off but added that the work uncovered more about the standard than was known before.

“Indeed, we are even not close to a practical break of AES at the moment.”

What’s the news

I’ve been trying to work through the actual paper and presentation slides by Andrey Bogdanov, Dmitry Khovratovich, and Christian Rechberger who were visiting Microsoft Research when they developed this. And although this research is far from having any practical influence on the use of AES it is actually fairly big news.

Cryptographers use the word “broken” in a very special way. If an attack on an algorithm can be computed with fewer computations than is required to check every single possible key, then the system is “broken”. Even if the improvement in the number of computations is negligible (as in this case) and even if other resources needed to get that small advantage are outrageously huge (again as in this case) it still gets called “broken”. But in this very specialized sense of the word “broken” the new research represents the first break of the full AES. It also displays the power of a technique developed earlier by the authors.

Impracticality #1: Impossible amounts of data

The authors calculate the best attack using their technique on AES with a 128 bit key requires storing 288 bits of data. That works out to about 38 trillion terabytes of data. Although estimates are hard to pin down, this is more than all the data stored on all the computers on the planet.

Impracticality #2: All for a two-bit gain

The number of encryptions that have to be performed to recover a 128 bit AES key is 2126.1 instead of the 2128 encryptions it would take to try all of the possible keys. This is a very small gain, as a 126-bit key (instead of 128-bits) would still take billions of years.

Impracticality #3: Lots of known plaintext needed

I may be misreading the research, but I believe that to discover an AES key, an attacker needs an enormous amount of known plaintext. That is, the attacker needs to already have a huge amount of information in both decrypted and encrypted form. I don’t know exactly how “huge” this will be, but I expect it to be far larger than any data anyone would or could encrypt using 1Password. I’m speculating here until I get a better grasp of things. Indeed, the amount is almost certainly related to the amount of data needed in “Impracticality #1”.

So this all doesn’t represent any threat to the practical use of AES for any purpose it is used for. An unfeasible amount of data needs to be stored in order to gain an insignificant improvement over just trying every key. But what it does do is highlight features of AES that make it subject to this kind of attack. Whether attacks based on this ever become any kind of real threat, we can bet that successors of AES will incorporate mechanisms to thwart them.

Where’s the meat? It’s in the middle

From here on out, I will try to explain some of what I understand from the new attack. There is much that I don’t understand of this, but I will give a broad outline and then wave my hands a bit. This part gets very technical, and I won’t be the slightest bit hurt if you stop reading here.

You may have heard of 3DES (Triple DES) which was used in many places before AES was settled upon as a replacement. The old Data Encryption Standard (DES) uses 56 bit keys. By the time we got into the 1980s it was absolutely clear that 56 bits was no longer enough for a key size. One could imagine (as many people did) taking two DES keys and just encrypting your data twice, first with one DES key and then taking that output and encrypting that with the second DES key. This, you might think, would get you the strength of a 112 bit key. It doesn’t.

It turns out that if you have an sample plaintext and ciphertext pair what you can do is try everyone one of the 256 possible keys on the plaintext and also try everyone of the possible keys on the ciphertext as well. You will then find that there is an overlap of results. Some things that the plaintext encrypts into with one key will be the same as what the ciphertext decrypts into. This will give you (pretty much) the two 56 bit keys. This looking for where the output of one can meet up with the input of the other leads to this being called a “meet-in-the-middle” attack (not to be confused with a “man-in-the-middle” attack which is something else entirely). Note that in doing this, we “only” had to work through 256 keys twice. That is the same as working through 257 once. So double encrypting with DES only improved the security by one bit. This is why to get 112 bit strength out of DES we need to go through it three times, and so even though it allows for double the number of key bits, it is actually Triple DES.

Meet-in-the-middle diagram

Now back to AES. Ciphers like AES go through multiple rounds of scrambling and manipulating the data. They also have various internal states as they process a block of data with a key. If we find an internal variable that allows us to break the encipherment into two halves then it is possible to do a meet-in-the-middle attack on that. AES, along with every modern cipher, is designed with this in mind. It is designed with enough rounds and interactions among them so that a standard meet-in-the-middle attack will not be quicker than simply trying every key.

Instead of doing the traditional meet-in-the-middle attack, the new attack constructs entities that group internal states, potential keys which complement each other in specific ways, and ciphertext into what they call “bicliques”. By using these more abstract entities instead of an intermediate variable, the attack can avoid some of the limitations on meet-in-the-middle attacks and be effective over a greater number of rounds. By carefully selecting which potential keys go into which biclique, some computation can be reduced by avoiding any duplication of effort. I still haven’t managed to understand, even in overview, how and why these bicliques do what they do, so I can’t say much more.

Thanks for joining me

If you’ve read this far (including the last section) then I thank you for joining me through my process of trying to understand this new attack on AES. Even though it has no practical implications, I find this stuff oddly fascinating. If you’ve just skipped right to the bottom (not an unreasonable thing to do at all) then let me say again everyone who has studied this, including the authors of the attack, state that this has no implications whatsoever for the practical use of AES.