### Posts

This is going to be a long and technical article, but the point can be stated more simply:

The kinds of security architectures in which it is easy to insert a back door are typically less secure than the security architectures in which it is hard to insert a back door. The back doors that have been recently been disclosed in Juniper Networks’ ScreenOS firewalls exemplify this point dramatically. It is also possible that the back door technology developed by the NSA is being used by some entity other than the NSA.

Perhaps the Economist put it better in When back doors backfire:

The problem with back doors is that, though they make life easier for spooks, they also make the internet less secure for everyone else. Recent revelations involving Juniper, an American maker of networking hardware and software, vividly demonstrate how. Juniper disclosed in December that a back door, dating to 2012, let anyone with knowledge of it read traffic encrypted by its “virtual private network” software, which is used by companies and government agencies worldwide to connect different offices via the public internet. It is unclear who is responsible, but the flaw may have arisen when one intelligence agency installed a back door which was then secretly modified by another.

## Two kinds of back doors

There were two back doors in Juniper’s ScreenOS that were fixed by Juniper, the mundane one and the interesting one. The mundane one was a back door into the authentication system, allowing someone with knowledge of it to simply log in and administer the devices. Back doors in authentication systems are easy to create, and are all too common. This, by the way, is why we try to build 1Password’s security on encryption rather than authentication. As I’ve argued before: Architectures that allow for back doors are bad for security.

The more interesting – and for my soapbox, more instructive – back door is cryptographic. It also presents a greater threat. It allows the holder of the back door secret to decrypt certain kinds of traffic that were supposed to be encrypted by the ScreenOS (VPN) server. The attacker only has to listen to network traffic that is encrypted as far as everyone else is concerned, but she will be able to decrypt that traffic with the knowledge of the back door secret.

Another difference between the two is that the mundane one is a simple password (disguised as debugging code) that is now known to the world and so any unpatched systems can now be logged into by anyone. That secret is now out. The cryptographic back door, on the other hand, is a very different beast. We know it’s there, and we know how it works, but we do not have the key. The secret needed to make use of the back door is still a secret.

I don’t want to suggest that the back door in the ScreenOS authentication system isn’t important. It is very worrisome on its own, given where ScreenOS is deployed. But we all know that it is easy to add a back door to something that is merely protected by authentication (no matter how many authentication factors one has in place).

Politically the questions about how that authentication back door got put into place in such an important product and who may have been using it is huge. But it offers nothing new from a technological point of view.

## The cryptographic back door

Oh. I want to work for the NSA! Evil mathematicians just like Professor Moriarty!

—My daughter, upon learning about the Dual EC backdoor

The cryptographic back door key all depends on how something called Q was created. Q is not a number (nor is it a free man), instead it is a point on a special sort of graph. If Q was created randomly then there is no back door key and all is well and good. But if Q was created by adding P (another point on the graph) to itself some large number of times then knowing how many times P was added to itself becomes your back door secret.

You are going to see the equation

$Q = dP$

many times. Neither Q nor P are secret. But d is secret and it is the back door key.

No doubt you are thinking something like, “well if P and Q aren’t secret, can’t we all figure out what d is by dividing Q by P?” Shouldn’t

$d = Q/P$

Well, d kinda sorta does equal Q/P, but there just isn’t enough computer power around to perform the relevant calculation. I’ll save that bit of fun math to (much) further below.

## Necessary historical background

I’m one of those people who when asked a simple question go into more historical background than many people appreciate. But here it really is necessary. Really.

ScreenOS (and others) used a cryptographic algorithm that was deliberately designed to have a back door that only the United States National Security Agency (NSA) was supposed to know about. The cryptographic algorithm is Dual Elliptic Curve Deterministic Random Bit Generator (Dual_EC_DRBG). I will refer to that simply as “Dual EC” from here on out.

Dual EC was one of four DRBGs specified in NIST SP 800–90A, and leaving the back door aside it is still technically inferior to the other three listed in that document. The standard specified a P and Q but did not explain how they were chosen. In 2007 Dan Shumow and Niels Ferguson of Microsoft showed that if Q was created with something like $Q = dP$, then whoever had hold of d could decrypt traffic encrypted with Dual_EC.

In 2014, a group of cryptographers presented a great explanation of how knowledge of d can be exploited, and even presented a proof of concept (with their own Q) of doing this.

Back in 2007 nobody (well, nobody who was going to say anything) knew whether or not Q was created so as to have the back door. In September 2013 it became clear from some of the documents revealed by Edward Snowden Dual EC had been designed with the back door. The Wikipedia page on Dual EC has a useful timeline.

For ’tis the sport to have the engineer
Hoist with his own petard
Hamlet Act 3, Scene 4

So far the story I’ve told would merely suggest that Juniper has some explaining to do about why it used a DRBG in the way that it did and why they chose one that was inferior to alternatives and was suspected of being backdoored. Juniper Networks are (sadly) not the only one in that position.

But I’ve only told part of the story. ScreenOS did not use the NIST Q. It used two custom Qs over time, and where they came from and who knows the back door keys associated with those Qs is something we simply don’t know. The back door key associated with the Q from the NIST standard is clearly held by the NSA. But what about the two used (at different times) in ScreenOS?

Juniper explained that they didn’t use the NIST Q in their Dual EC implementation but used a Q that was created randomly. However, they did not provide any proof that it was created randomly. (There are ways to demonstrably create such things randomly.) So we don’t know if that “official Juniper Q” was created in a way to give its creator a back door key. Nor do we know who might hold that key.

But there is a third Q that was, according to Juniper, inserted by parties unknown in 2012. That is someone simply changed the Q that is used in Dual EC in ScreenOS. We can only assume that this third Q is indeed backdoored, as it was something surreptitiously added to the code. We have no idea of who did this.

So even if we take Juniper at their word about their choice of Dual EC and other related design choices, the fact that the chose to use a system that was designed so that it could have a back door means that someone – we don’t know who – has had a back door key for decrypting VPN traffic between NetScreen devices for three years.

I have said it before and I will say it again, systems that are designed to have back doors (even if the back doors aren’t used) are inherently weaker than systems that are designed to make it difficult to have a back door.

## Thar be math

The rest of this article tries to explain why it is easy to calculate Q from d and P but hard to calculate d from P and Q. Unless you have a great deal of patience and would like to learn more about the math, you can stop reading here.

Here is a picture of Patty and Morgan enjoying the pleasures of warm blankets on the sofa. Morgan also has no patience for math.

Patty and Morgan relaxing on couch. Not doing math.

### Hard math, simple math

The actual math isn’t all that complicated, but it does rely on a number of unfamiliar notions. Some of these notions are hard to grasp not because they are complicated, but because they are simple. They are highly abstract. Indeed, you might even have to unlearn some things you know about addition.

I said earlier that even though P and Q are public, we can’t figure out what d is in $Q = dP$, but someone with d and P can easily construct Q. I’m not going to fully explain all of this, but I’m going to make an attempt to try to help people get a better sense of how something like this might be true. Along the way, I’m going to tell a few lies and omit lots of stuff. I need to lie to stand a reasonable chance of this article being useful to people who don’t already know the math.

One thing I should tell you right away is that P and Q are not numbers, while d is a number. I will get around to explaining what P and Q actually are below, but first we need to understand what it means to add two things together.

I am not going to try to explain how knowledge of d provides a back door. That is something that is difficult enough to explain in the normal case and is even harder given how Dual EC is actually used within ScreenOS. (See Ralf-Philipp Weinmann’s “Some Analysis of the Backdoored Backdoor” for some of those details.)

Instead, I’m focusing on a narrow point, allowing me to go into more depth about something that is central to modern cryptography.

Most of us have a pretty good sense of how addition works. We all pretty much know that the “+” in “3 + 5 = 8” means. We may not be able to define it precisely, but it is a basic concept that we understand. Well mathematicians like to define things precisely and more importantly they like to distill things down to their simplest essence. Let’s see what the essential properties of addition are.

To define addition we need to talk about what things are going to be added to what things. The expression “3 + blue” doesn’t mean anything because “blue” and “3” are not in the same set of things we might add. We might want to talk about adding colors, and so could reasonably say “red + blue = purple”.

Addition can apply to more than just numbers.

Sometimes we add numbers; sometimes we add colors; but we need to have a set of things (numbers perhaps, or maybe colors) that we will be adding.

So let’s call the set of things that we will be adding E for an ensemble of things. When we are adding numbers, “blue” won’t be in E, but when we are adding colors, “blue” may very well be in E. When we are adding colors, 3 won’t be in E but it will be when we are adding whole numbers.

For the addition that most people are used to, there are an infinite number of things in E, but that doesn’t have to be the case. When we add hours to know what time it will be 15 hours after 9 o’clock, we use only 12 (or 24) of course. Remember, we are trying to distill addition down to its essence, so we want it to work both when E has a finite number of things in it as well as when it doesn’t.

If we are adding numbers, we don’t want 3 + 5 to equal “blue”. We want 3 + 5 to result in another number. If we are adding colors, we want the result of addition to be some color.

So we will say that

If a and b are both in E then a + b is also in E.

The fancy name for this is “closure”, or “E is closed under addition”.

#### You can switch things around

We like addition to work out so that “$a + b$” will yield the same result as “$b + a$”, so we say that

If a and b are both in E then $a + b = b + a$

We say that addition is “commutative”.

#### Any groupings you want

When you mix blue and red and white, it doesn’t matter if you mix the blue and the red first or the red and the white first. Likewise if you add 3 + 2 + 10, it shouldn’t matter whether you add the 3 and the 2 first or the 2 and the 10 first.

If a, b and c are all in E then $(a + b) + c = a + (b + c)$

And so we say that addition is “associative”.

#### We need a way to do nothing.

I have no problem grasping the importance of doing nothing, but not everyone seems to agree with me (Hi boss!). Anyway, we need a way to do nothing with addition.

To put that more mathy, we want E to contain something that behaves like zero. If you add zero to anything, you end up with what you started with.

If a is in E and 0 is the zero element then $a + 0 = 0 + a = a$

For the color adding analogy, we might add the special color “transparent” as our zero element so that blue + transparent = blue.

The fancy way to talk about zero is to call it “the additive identity”.

Mathematicians are happy to call anything that meets those four conditions “addition”.

## Getting to the points

So we can add numbers and we can add colors. But let’s talk about points on a peculiar geometric shape called an elliptic curve. Here is a portion of an elliptic curve, and it has two points on it, P and Q. (These aren’t the same as P and Q used in Dual EC.)

Elliptic curve with points P and Q

There are some interesting properties of these odd sorts of shapes. One of them is that if a (non-vertical) straight line crosses the curve at all, then it will cross it in exactly three places.

For example, if we draw a line that goes through both P and Q then that line will go through the curve at exactly one other point.

From that third point where the line crosses the curve we can find the point on the opposite side of the curve by drawing a vertical line up or down (as needed) from that point of intersection:

Now as strange as it might seem, if we define addition of points on an elliptic curve this way, it will meet all the properties of addition that we want. (I’m leaving out how the zero element is defined, and have constructed my examples so that I can get away with that.)

Now let is return to the problem we started out with. If $Q = dP$ how come it is easy to calculate Q if we know d and P but not d from Q and P? P and Q are points on the curve, while d is just an ordinary (though very large) number.

Well, we’ve defined what addition means for points, but what does it mean to say, for example, “$4P$”? “$4P$” is just shorthand for writing “$P + P + P + P$”. (Note that we can treat “$4P$” kind of like multiplication, but only because 4 is an ordinary number and we can just say that it is repeated addition of P. We haven’t defined any way to multiply two points together.)

Let’s work through a small example and try to find where $4P$ is. The first thing we need to do is add P to itself to get $2P$.

We’ve already stepped through how to add two different points, but how do we add P to itself? We take as our line going through P and P to be the tangent line at point P. This is the line that touches point P but does not cross the curve at P:

That line crosses the curve at some other point. And just as we did when we added P and Q earlier, we draw a vertical line through that point of intersection to get our sum:

Now we have point P, which we started with, and we also have point P + P, or $2P$. So if we want to find $3P$ we can just add P and $2P$ together.

This gives us point $3P$ and we can add $3P$ and P together to get to our goal of $4P$:

### A shortcut

It took us three additions to calculate 4P from P. But we could have saved ourselves one of those. We never really needed to calculate 3P. Instead once we had the point 2P we could have just added 2P to itself to get 4P. Because point addition is associative, we can think of 4P as $(P + P) + (P + P)$, which is $2P + 2P$.

With this shortcut, we have been able to calculate 4P from P with only two additions: The first addition was adding P + P to get 2P; the second addition was 2P + 2P to get 4P.

### The shortcut offers huge savings

When calculating 4P our shortcut saved us only one addition, but the savings get bigger the bigger d is. Suppose we wanted to calculate 25P. The slow way would require 24 additions, but using this shortcut, it would only take us six additions.

1. $2P = P + P$
2. $4P = 2P + 2P$
3. $8P = 4P + 4P$
4. $16P = 8P + 8P$
5. $24P = 16P + 8P$
6. $25P = 24P + P$

So having to only do six of these additions versus 24 by the slow way is a big savings. When the numbers get really big, the savings get much bigger. Suppose we wanted to calculate the result of adding P to itself one billion times. 1000000000P would take almost a billion additions the slow way, but using the short cut we can do it with only about 45 additions.

The number of point additions it takes to calculate $dP$ using this shortcut is roughly proportional to the number of bits in d.

### No shortcut to d

So we’ve got a really efficient way to compute Q from d and P, but there is no efficient shortcut for finding d from P and Q. Indeed, this asymmetry between calculating d versus finding d is one of the things that makes elliptic curve cryptography work.

#### When numbers get big

When we start talking about the huge numbers that cryptographers actually use for these sorts to secrets we get truly massive savings. Those savings are so big that using the shortcut means that we can calculate the result in a fraction of a second, while doing it the slow way would take all of the computers on earth many times the age of the universe to compute.

The number of point additions needed to calculate $Q=dP$ is roughly proportional to the number of bits in d, while the number of additions needed to find d from P and Q using the slow way is roughly d. As d gets big (say a 256-bit number), the difference between d and the number of bits in d is enormous. For a 128 bit number, calculating Q takes less than 256 point additions, but calculating d takes a number so large that I don’t know how to write it out. It would have 77 digits. To get a sense of just how big numbers like that are, take a look at “Guess why we’re moving to 256-bit keys

And if I may introduce more terminology, that asymmetry is an instance of what is called the “Generalized Discrete Logarithm Problem” (GDLP).

### What is the GDLP good for?

To give you a hint of how this sort of stuff can be used for keeping secrets, suppose that two of my dogs, Molly and Patty, wish to set up secret communication without Mr. Talk (the neighbor’s cat) listening in. The dogs will need to establish between them a secret key that only they know, but how can they do this if Mr Talk is listening in to their communication?

First Patty and Molly will agree on an elliptic curve and some other non-secret parameters. Among those parameters there will be a starting point, that I will call G. Let’s let Patty pick a secret number that only she knows and we will call that little p. Patty will calculate the point pG. Let’s call the point that Patty calculates big P.

Patty can send P to Molly, but neither Molly nor Mr Talk (who is eavesdropping) will be able to calculate little p, even though they both know G and P. Big P is Patty’s public key, while small p is Patty’s private key.

Molly can do the same. She calculates $M = mG$, where little m is Molly’s secret, and sends big M to Patty.

Now here is where it gets fun. Patty can calculate point $K_p$ by multiplying M by little p. That is

$K_p = pM$

Molly can also calculate $K_m$ by multiplying Patty’s public key, P, by Molly’s own secret, m. $K_m = mP$.

The Ks that both calculate will be the same. This is because point addition is commutative and associative. Remember that $M = mG$. So

$K_p = pM = pmG$.

We can look at what Molly calculates from Patty’s public key the same way: $P = pG$. When we look at what both Molly and Patty calculate with their own secrets and the other’s public point, we get

$K_m = mP = mpG = pmG = pM = K_p$

Let’s look at each end of that long chain of “=”s, but drop out all of the middle parts.

$K_m = K_p$

Both Molly and Patty calculate the same secret K even though neither of them have revealed any secret in their communication. And Mr Talk, who has listened to everything Molly and Patty sent to each other, cannot calculate K because there is no feasible way for him to learn either Patty’s secret p or Molly’s secret m.

This is magic with mathematics.

## Lies and omissions

There are a lot of things that I left out of the explanation I gave here. And in a few cases I even told a couple of lies to make the explanation simpler. I’m only going to mention some of them here.

• The shortcut that I described for computing Q from p and D is not the most efficient shortcut, but it is an efficient shortcut that is easiest to explain.
• There are shortcuts for calculating d, but there are thought to be no efficient non-quantum ones.
• I have not explained what I mean by “efficient”. (It is a technical term.)
• I haven’t told you the formulas for doing addition of elliptic curves, I only did it with pictures.

So my daughter texted me a picture of a science fair project she saw. I texted back.

It looks like someone at my daughter’s high school did a science fair project on finding good algorithms for point doubling. Thus forcing me to reveal just how much of a pushy and intrusive parent I am.

• To make addition on elliptic curves work as an instance of the GDLP, we need to make it have only a finite amount of elements. That involves, among, other thing, making all of the normal arithmetic in the formulas done modulo some large prime number.
• I never said what “=” means when I defined addition.
• I kinda sorta misused the terms “public key” and “private key” when applying those terms to ECDHE.
• I didn’t say anything at all about the existence of additive inverses.
• The addition of colors analogy stops working if we were to consider additive inverses.
• I implied that all and only the properties I listed for addition are what is needed for something to be called “addition”. The properties that I listed plus the existence of additive inverses are what is needed for the specific type of addition we need for this cryptography. Addition of numbers requires something more because numbers – unlike points on an elliptic curve or colors – line up in an order.

## The after math

I’ve been able to use an example of a rare cryptographic back door to give me the chance to talk about the math of elliptic curves cryptography. But if you are asking what this has to do with AgileBits and our products (after all, we don’t use elliptic curves in our products), then we must return to the original point: If you build a system into which it is easy to add a back door, you shouldn’t be all too surprised when a back door is added. So let’s not build things that way.

I will close with another quote from that Economist article:

Calls for the mandatory inclusion of back doors should therefore be resisted. Their potential use by criminals weakens overall internet security, on which billions of people rely for banking and payments. Their existence also undermines confidence in technology companies and makes it hard for Western governments to criticise authoritarian regimes for interfering with the internet. And their imposition would be futile in any case: high-powered encryption software, with no back doors, is available free online to anyone who wants it.

## Bcrypt is great, but is password cracking “infeasible”?

There are a lot of technical terms that mean something very specific to cryptographers but often mean something else to everyone else, including security professionals. Years ago I wrote about what it means to say that a cipher is “broken”. Today’s word is “infeasible”.

The news that sparked this lesson is the use of “computationally infeasible” in an announcement by Slack. Slack has announced that their hashed password database had been compromised, and their message was excellent: They clearly described what was available to attackers (usernames, email address, hashed passwords, and possibly phone numbers and contact information users may have added); they offered clear and useful instructions on what users should do (change passwords, enable two-step verification), and described what they have done and what they will be doing. And – most relevant for the technical discussion here – they have told us how the passwords were hashed.

In this case they said:

Slack’s hashing function is bcrypt with a randomly generated salt per-password which makes it computationally infeasible that your password could be recreated from the hashed form.

It is terrific that they chose to use bcyrpt for password hashing. bcrypt is among the three password hashing schemes that we recommend for sites and services that must store hashed passwords. The other two are PBKDF2 and scrypt. But Slack’s use of the term “computationally infeasible” here illustrates that one must be very careful when using cryptographic technical terms.

If you have a weak or reused password for Slack, change it immediately. Here is a guide to using 1Password for changing a password. And because the Slack app on iOS makes use of the 1Password App Extension, it is easy to use a strong and unique password for Slack.

If you would like to see how to use Slack’s two-step verification with 1Password take a look at our handy guide on doing just that.

But now back to what is feasible with password hashing.

## One way hashing

When services that you log into store your password they should never store those as unencrypted “plaintext”. If they are stored as plaintext it means that anyone who can get their hands on that data file can learn everyone’s passwords. For example, Molly (one of my dogs) uses the same password on Poop Deck as she does on Barkbook. So if Patty (my other dog) learns Molly’s Poop Deck password, she can use it to break into Molly’s Barkbook account as well. This is why it is important not to reuse passwords.

Now suppose that Molly uses the password “rabbit” on Barkbook. (Have I mentioned that Molly is not the smartest dog in the pack?) Barkbook shouldn’t store just “rabbit”, but instead should store a one way hash of rabbit. A cryptographic hash function will transform something like “rabbit” into something like “bQ67vc4yR024FB0j0sAb2WKNbl8=” (base64 encoded).

One of the features of a cryptographic hash function is that it should be quick and easy to compute the hash from the original, but that it should be infeasible to perform the computation in the other direction. That is it should be pretty much impossible to go from “bQ67vc4yR024FB0j0sAb2WKNbl8=” back to “rabbit”. And it is.

### Guessing versus reversing

With any halfway decent cryptographic hash function is it infeasible to compute the original from its hash if the original is chosen at random! But if you can make some reasonable guesses about the original then you can use the hash to check your guesses. Because passwords created by humans are not chosen at random, then it does become computationally feasible (and often quite practical) to discover the original based on the hash.

The actual formal definition of “one-way” for a cryptographic hash function, H(x), includes the requirement that x be the output of a uniformly distributed sampling of the domain of H. That is, considering all of the things that you can hash (under some set length), you need to pick something at random.  Otherwise a hash function might be invertible. Human created passwords do not meet that requirement and so the “computational infeasibility” of inverting a one way function isn’t applicable when its input is not chosen at random.

So now let’s correct Slack’s statement:

Slack’s hashing function is bcrypt with a randomly generated salt per-password which makes it computationally infeasible that a randomly created password could be recreated from the hashed form.

Modified Slack statement.

This, of course, is why you should use 1Password’s Strong Password Generator for creating your passwords. When your password is chosen randomly with a large set of possibilities, then it really is computationally infeasible to discover the password from the cryptographic hash.

### Slowing down guessing

I mentioned that (for now) bcrypt, scrypt, and PBKDF2 are good choices for password hashing. Once the final results are in from the Password Hashing Competition and the dust has settled, we will probably have a good successor to those three. These are built upon cryptographic hash functions, but are designed for hashing specifically for when their input is not selected randomly.

Because cryptographic hashing is something that we have computers do a lot of, one of the things that we want is that it be fast. We want to be able to perform lots and lots of SHA-256 hashes per second without straining a computer’s memory. But if an attacker is going to be guessing passwords to see if they produce the right hash, we want to slow down the hashing. PBKDF2, scrypt, and bcrypt are all designed to require much more computation than a regular hash function to compute a hash. This can slow down an attacker from performing millions of computations per second to just thousands. The actual speed depends on many things, including the hardware that the attacker brings to bear on the system. scrypt, additionally, places specific demands on memory.

So the use of bcrypt means that attackers will need to do more work than they otherwise would to guess passwords stolen from Slack. That is a good thing, but it is not an “infeasible” amount of work.

## What’s infeasible?

I started out by saying that I was going to talk about the word “infeasible”, but so far I have just been using it a lot. This is because its definition is abstract, subtle, and hard. I am not going to give a full definition, but I am going to try to get reasonably close. The discussion that follows is inherently technical, and nobody will blame you if instead of reading further you just wish to watch us pour ice water over ourselves. (Remember, that was a thing just last year.)

Welcome back to this article. It get’s progressively more arcane from this point onward.

The notion of infeasible depends on the relationship between the amount of work the defender has to do to secure the system compared to the amount of work that the attacker has to do to break it. A bank vault may take a minute to unlock if you know the combination, but it may take days to break through if you don’t. With cryptographic systems it can take just a fraction of a second to decrypt data if you have a key, but many times the age of the universe to do so if you don’t have the key.

### Security parameters

What we want is the amount of work the attacker has to do to be vastly disproportionate to the work that the defender must do. It turns out that this can be stated mathematically, but first we need to introduce the notion of “security parameter” if we want our definition to stand the test of time instead of depending on the speed and power of current computers. So we will talk about how much work the defender and the attacker have to do in proportion to some security parameter.

Let’s pick, for purposes of exposition, an encryption system that operates at a security parameter of 56. The amount of computation that the the defender has to do to decrypt some data with the key is proportional to 56, but the amount of work that the attacker has to do to decrypt the data without the key is proportional to 2⁵⁶. Fifty-six is much much smaller than 2 raised to the 56th power, but today even 2⁵⁶ operations is within the reach of many attackers. Thirty years ago it was within the reach of just a few.

So now let’s suppose that we want to double this security parameter to 112. How much of a work increase might this cause the defender? You might be thinking that it doubles the cost to the defender, but the system I’m thinking of actually tripled the cost to the defender. Tripling the cost for double the security parameter may not seem like a good deal, but doubling the security parameter increased the work of the attacker by another 2⁵⁶, for a total of 2¹¹². This puts it well outside the reach of even the most resourceful attacker for a long time to come.

When we doubled the security parameter in that example, the work to the defender increased linearly while the work to the attacker increased exponentially. We want the work required of the attacker to increase exponentially with the security parameter while for the defender we increase it linearly or polynomially.

### Doing time, time, time in an exponential rhyme

If the security parameter is n, we will tolerate it if the amount of work the defender must do is proportional to na for any a > 1. That’s what we mean when we say the work is “polynomial in n“. So if the work goes up with the square or cube of n we might grumble and seek more practical systems, but no matter how big the power that n is raised to gets, this is still a polynomial expression. An algorithm that works this way is called a “polynomial time algorithm”.

For the attacker we want the number of computations needed to be proptional to an expression in which n is in the exponent. So if the work to the attacker is proportional to b for any b > 1, so that the work is exponential in n. (Those of you who know this stuff, know that I’m leaving some things out and am taking some shortcuts.)

It might seem that a “big” polynomial get us bigger numbers than a “small” exponential, but no matter how much a polynomial function starts out ahead of an exponential, the exponential will always catch up. Let’s compare the exponential  y=1.1ˣ with the polynomial y=x⁶ + 2. For values of x below a few hundred, it looks like the polynomial is the runaway winner.But we inevitably reach a point where the exponential function catches up. For the particularly examples I’ve given, the exponential catches up with the polynomial when x is about 372.73.

Finally, if we go just a bit beyond the point where the exponential overtakes the polynomial, we see that the exponential completely flattens the polynomial.

Some computations will take a number of steps that are polynomial in n (“polynomial time algorithms”), and others will be exponential (“exponential time algorithms”). We say that a task is infeasible if there is no polynomial time algorithm to complete it with a non-negligible chance of success. I have not defined what a non-negligible chance of success is, but as the article appears to be growing in length exponentially, I will leave that discussion for our forums.

When we have this sort of asymmetry, where the work done by the attacker grows exponentially with the security parameter, but grows at most polynomially for the defender, there will always be some security factor beyond which the work to be done by the attacker is so enormously larger than what the defender must do as to just not be an option for any attacker.

### Quibbling over terminology

Now that we have a workable definition of “infeasible” and a better understanding of what cryptographic hash functions do, we can take a closer look at Slack’s statement. First let me repeat that their overall statement was excellent, and I fully sympathize with the difficulty involved in writing something about security that is correct, clear, and usable. I’ve taken some shortcuts in my exposition on any number of occasions, and I’ve made my share of errors as well. My point here is not to criticize but instead to use this as an opportunity to explain.

Given what we believe about cryptographic hash functions it is infeasible to discover x if you are only given the hash of x but only if x is chosen at random. Furthermore this is true of any (decent) cryptographic hash function and is not limited to the slow functions that are recommended for password hashing. That is, we don’t need bcrypt or PBKDF2 for that property to hold.

### The limits of slow hashes

Slow hashes – specifically designed for password hashing – are built because we know that passwords are not chosen at random and so are subject to guessing attacks. But slow hashes have their limits, and with the notions that have been introduced above, we can now talk about them more clearly. Using a slow hash like PBKDF2 slows things down for both the attacker and for the defender. And the amount of slow-down is roughly the same for both the attacker and for the defender.

If we increase the security parameter (number of iterations) for PBKDF2 the computational cost rises linearly for both the attacker and for the defender. This is unlike the security parameters we use elsewhere in cryptography, where we would like a small (linear or perhaps polynomial) increase in cost to the defender to create a large (exponential) increase for the attacker.

Let’s see how that works out with a concrete, if hypothetical, example. Suppose it is using 40,000 PBKDF2 iterations. Now suppose that you add a really randomly chosen digit to the end of your Master Password. Adding a single random digit will make an attacker do 10 times the amount of work that they would have to do to crack the original password. Adding two digits would make the attacker have to do 100 times the work of the original. Making a password just a little bit longer (with random stuff) makes the work required by the attacker increase exponentially. That is the kind of proportion of work that we like.

Now suppose 1Password uses 40,000 PBKDF2 iterations in deriving your Master Password. To get the same additional work as adding a single digit to your password, you would need to increase the number of PBKDF2 iterations to 400,000. And to get the equivalent of adding two digits, you would need to increase the number of iterations to 4,000,000. Once we have a goodly amount of PBKDF2 iterations, there isn’t that much gained by increasing it by an additional ten or twenty thousand. But there is much to be gained by even a small improvement in a Master Password.

PBKDF2 is terrific, and it is an important part of the defense that 1Password offers if an attacker gets a hold of your encrypted data. But you must still pick a good Master Password because the security parameter is linear for both the defender and the attacker. Unless there is a breakthrough in the slow hashing competition, a strong Master Password will always be required in order to ensure your security can withstand the test of time.

## Large even prime number discovered

You have probably been taught that two is the only even prime number. But today mathematicians at the University of Southern North Dakota at Hoople have discovered a new, large, even prime. It is more than a million digits long and is equal to the value of 3²²³⁷⁵⁶¹+3¹¹¹⁸⁷⁸¹.

Many people are under the erroneous belief that two is the only even prime number, but as Professor Paul Forester explains, “tings get really meshuga vhen numbers get large.” For example, when some number n gets very large, it becomes approximately the same as its successor. Because:

$\displaystyle\lim_{n \to \infty} \frac{1}{n} = \frac{1}{n+1}$

we can see that n must get closer and closer to n+1 when n is very large. So when numbers are pretty much the same as their neighbors at these large values, the notion of odd and even don’t hold in the traditional sense.

## What does this mean for cryptography

First of all, this surprising mathematical discovery has no (immediate) bearing on the security of 1Password, as 1Password does not use the kind of cryptography that depends heavily on the theory of prime numbers. But this might have some implications for cryptography. At the moment, the only immediately visible impact is that it should make some of the slowest cryptographic computations quicker and more efficient.

In some cryptographic systems (though not 1Password), the software must generate large, randomly chosen prime numbers. This is a very time consuming process, and it works by first picking large random numbers, then checking whether they are prime through a series of tests. Almost all software implementations of this will only pick odd numbers by setting the least significant bit of the random number of 1. But this excludes half of the numbers it could pick, thus failing to find any of the even large primes.

### Testing for primes

Once a random number is picked in the appropriate range it needs to be tested for primality. Many of the tests result in answers that aren’t quite definitive. Indeed, a number of tests produce results of either “definitely not prime” and “possibly prime” and each of these tests may different amounts of time to run. The general strategy is to run the quickest tests first on your candidate number, and only then run the more expensive tests. If your candidate number passes a sufficient number of those tests, then you can determine with sufficiently high probability that the number really is prime.

There is a way, of course, to definitively test whether a number, N, is prime. And that is to attempt to divide by every prime number less than or equal to the square root of N. But while that approach if definitive, it is simply far too many divisions to actually test.

### The prime numbers in cryptography

The prime numbers used in cryptographic systems are typically 1024 bits (about 308 digits) long. Pairs of these are generated and multiplied together to produce 2048 bit (about 616 digit) products. Note that when you multiply, say, a five digit number by a three digit number you usually end up with an eight (five plus three) digit number. This holds when using bits instead of decimal numbers. So the product of two 1024 bit numbers will typically be a 2048 bit number.

Even for 300 digit numbers, which are far, far smaller than the million digit prime announced Saturday, it isn’t feasible to run definitive primality tests in the time we need when picking prime numbers. Indeed, it is probably near the edge of the NSA’s capability to factor 1024 bit products of 512 bit primes. This is why it is no longer recommended to use 1024 bit RSA keys.

### A note on key sizes

If I am saying that 1024 bit keys aren’t safe, why does 1Password “only” use 256 bit keys? This is because different kinds of encryption systems have different kinds of keys. Keys used for the AES algorithm are completely random numbers. Guessing the key means trying every single 256 bit key until you find the one that works. That just isn’t possible even for a 128 bit key. But for public key encryption systems, not just any public key will do. Not just any 2048-bit numbers can be an Rivest-Shamir-Adleman (RSA) public key. Instead, it must (essentially) be the product of two 1024-bit prime numbers (which are, in essence, the private key).

I say “essentially” in there because if two prime numbers are p and q, then the actually public key isn’t p times q, pq, but is in fact Φ(p)Φ(q), which works out to (p-1)(q-1) in this case. The Φ function is known of as Euler’s totient function. For quite some time, I believed that there was a mathematician whose name sounded like “Oiler” who worked on similar stuff as the mathematician I’d read about, whose name I pronounced “Yuler”. Along the same lines, it was only when someone read the Little Prince aloud that I realized that the word I’d heard as “yu-neek” was the same as the one that I pronounce “un-ee-cue”. I still think of the Prince as “un-ee-cue in all the world.”

Let’s get back to key sizes. Not every public key system uses the RSA algorithm. The Diffie-Hellman (DH) system uses different mathematics, but has key length requirements similar to RSA. 1024 bits is no longer considered secure against the likes of the NSA. The third kind of public key algorithm in use is based on elliptical curves, and is sometimes called ECDH because it is actually based on the same logic as Diffie-Hellman at its heart, though it works through different mathematical operations. One advantage of ECDH is that it works with much smaller keys. So a 256-bit ECDH key is perfectly reasonable.

This article was posted on April 1, 2014. The claim that an even prime number other than two has been found is bogus. The notion of odd and even holds for all integers, no matter how large. The fictitious University of Southern North Dakota at Hoople is the creation of the real Peter Schickele. The fictitious mathematician Paul Forester is my resurrection of the great 20th century mathematician, Pál Erdős. Everything else here is actually meant to be reliable information. Including those bits that are un-ee-cue in all the world.

## Guess why we’re moving to 256-bit AES keys

1Password is moving to using 256-bit AES keys instead of 128-bit keys. We already started this within the browser extensions in the summer of 2011, and the new Cloud Keychain Format also uses 256-bit keys.

Why do you think we are making this move? If your answer is because AES 256 is stronger than AES 128, you’d be wrong. There is a technical sense in which AES 256 is enormously stronger than AES 128, but in every sense that actually matters for security there is no difference. Let me explain.

### AES? Keys?

AES (the Advanced Encryption Standard) is a fundamental building block of the encryption within 1Password and most everything else that uses encryption in the modern world. It takes a key and some data (plaintext) as input and transforms that data into something that looks entirely random (ciphertext). The only way to get meaning out of the ciphertext is to use AES and the same key to transform it back into the plaintext. A key is just a number, and AES can work with keys of three different sizes, 128 bits, 192 bits, and 256 bits.

AES, by the way, is always a 128-bit cipher operating on 128-bit chunks of data (blocks) at a time; so when I use expressions like “AES256” or “256-bit AES” in what follows, I’m just talking about key size.

If you’ve been curious about why 1Password didn’t jump on the 256-bit key bandwagon earlier or why we seem to be doing so now, read on. Even if those particular questions never crossed your mind, this article may give you some insight into what sorts of things go into security choices.

The numbers that we need to talk about are just too big to write out normally. When we are dealing with numbers like 65536, I can opt whether to express it as “65536” or “216“, depending on what is most useful in the context. And maybe when dealing with a number like 232 I can say things like “4.3 billion”.

#### 2128 in words

“three hundred forty undecillion, two hundred eighty-two decillion, three hundred sixty-six nonillion, nine hundred twenty octillion, nine hundred thirty-eight septillion, four hundred sixty-three sextillion, four hundred sixty-three quintillion, three hundred seventy-four quadrillion, six hundred seven trillion, four hundred thirty-one billion, seven hundred sixty-eight million, two hundred eleven thousand, four hundred fifty-six”

But the numbers we deal with in cryptography are so big that I have to write them in exponential form. The number of possible keys that a 128-bit key allows is just too enormous to write otherwise. Sure, I could write out 2128 in words with the help of a numbers to words converter, but it is neither informative nor manageable. Nor would it be useful for me to write out the number in decimal, as it would be 39 digits long.

And one more thing about writing numbers in words: when I do so here, I will be using the US English, short scale, conventions; “billion” means 109, not 1012.

### Searching for keys is harder than digging up bones

Molly (one of my dogs) doesn’t really enjoy dog toys that much, but she will certainly not allow Patty (the other dog) to play with any toys. Molly, then, steals and hide Patty’s toys. Suppose that Molly has a possible 2128 sniff-proof hiding places she can hide them in. Also suppose that Patty knows about all of those hiding places, but she doesn’t know which one Molly has used. Patty might try to look in each one until she finds her toys. Searching each and every one of those 2128 hiding places until she finds the one with the toy is what we’ll call a brute force attack.

On average, Patty will find the right one after searching about half way through all of the hiding places. This means that, on average, she’ll have to try 2127 hiding places before she finds her toys. If you thought it was going to be 264, pause for a moment. In fact, 2128 divided by 2 is 2127. Each additional power of two doubles the number, so halving the number means just taking one off of the exponent.

Molly might imagine that, to be extra secure instead of using “only” 2128 possible hiding places, she might use 2256 possible hiding places. 2256 is enormously bigger than 2128. Hugely bigger. Unimaginably bigger. Mind-boggingly bigger, though to be honest, the number 2 is enough to boggle Molly’s mind. In fact, 2256 is 2128 times bigger than 2128.

Now, I just said that moving to  2256 hiding spaces makes the number of places that Patty would need to search unbelievably, enormously, mind-bogglingly bigger. But Molly would be wrong to think that this made it more secure. Why? Because searching through “only” 2128 hiding spaces is already so mind-bogglingly, amazingly and unimaginably hard that there is no gain in making it harder.

### How long is long?

Patty is a very fast dog – well, at least in her youth she was. Even today, over short distances, she can outrun Molly, who is ten years her junior. So let’s imagine that Patty could search hiding spaces as quickly as a super computer can add two numbers. Actually, let’s suppose that she gets together with a billion other dogs, each of which can search a hiding place as quickly as it takes a super computer to add two numbers. Working at this unimaginable speed, these billion super fast dogs working under Patty’s direction might be able to search 250 hiding spaces per second, which is about one quadrillion hiding spaces per second. There are about 31557600 seconds per year, so working like a billion super computers, Patty with her friends could check about 275, or 10 septillion, hiding places per year.

At that rate it would take 253 years (10 quadrillion years) to work through half of the 2128 hiding spaces. If we take the universe to be about 15 billion years old, then the amount of time it would take Patty, working faster than the combined power of a billion super computers, would be more than 600,000 times the age of the universe.

In case my analogy has gone too far astray, I’m estimating that, as an extremely fast estimate, all of the computing power on Earth turned to trying AES keys couldn’t check more than 275 keys per year (and really that is a very very high estimate). At that rate, it would take more than half a million times the age of the universe to go through half of the 2128 possible AES keys.

Now, single-minded Molly, who will spend an entire day barking at a squirrel up a tree, may think that half a million universes ages isn’t too long to wait. But nobody else would even consider trying such a brute force attack. Patty is a clever dog, and so she wouldn’t even consider trying a brute force attack on 2128 hiding spaces.

Patty might try other attacks. She might figure that Molly didn’t pick the hiding place in a truly random fashion, and so Patty might know which sorts of places to search first. Or Patty might try to secretly follow Molly to the hiding place. Or maybe there is a way that Patty can trick Molly into bringing her the toys. Those are the kinds of attacks that Molly needs to defend against. But she gains nothing by increasing the number of possible hiding places, because even if Patty had all of the resources on Earth searching Molly’s hiding places, Patty couldn’t even make a dent before the universe comes to an end.

### The difference between zero and zero is zero

The chances of Patty and all of her super fast friends finding Molly’s hiding spot is as close to zero as we could possibly want. Let’s call Patty’s chances in this case ϵ1 (epsilon 1), a really small number.  If Molly uses 2256 possible hiding spaces instead of 2128, the chances of Patty and her friends finding the one with the toys is another number as close to zero as we could possible want. We’ll call these chances  ϵ2 (epsilon 2). Sure, ϵ2 is many times smaller than ϵ1, but both ϵ1 and ϵ2 are already as close to zero as we could possibly want. Molly’s practical security gain in using the larger number of hiding spaces is pretty much the difference between ϵ1 and ϵ2. That difference, for all meaningful purposes, is zero.

### It takes a lot of dog food to keep Patty searching

We all know that dogs like to eat. And we all know that computers consume electricity. As it happens, computation (and inspecting hiding places) has to consume energy. It’s actually the destruction (or overwriting) of information that necessarily consumes energy, but that happens when Patty forgets about a previous hiding place so she can think about the next one. If Patty and her friends could move on to checking a new possible key using the absolute theoretical minimum energy for a single computation, 2.85 × 10-21 J, she and her pack of billion super fast (and now unfathomablely  efficient) of dogs would require about  1/100th of the total amount of energy humanity uses in a year to work through half of the 2128 hiding spaces.

### The answers to some questions remain TOP SECRET

I have tried to explain all this to Molly countless times, but she just stares blankly as if to ask, “Well, then why does the US government require 256-bit AES keys for TOP SECRET material?”  Actually, all Molly says with her stares is, “Huh?”. I tend to read a bit more into these than is really there. My only answer to her is that it is the same reason that she likes being blow dried after a bath on her left side, but hates it on her right side. Some things, in the mind of Molly and in government, remain a mystery.

I have some reasonably charitable speculation for why those requirements are there, but it remains speculation, and we can continue that discussion in our discussion forums.

### Are 256-bit keys less secure than 128-bit keys?

[F]or new applications I suggest that people don’t use AES-256. AES-128 provides more than enough security margin for the [foreseeable] future. But if you’re already using AES-256, there’s no reason to change.

people need to pay attention. But paying attention and evaluating doesn’t always mean agreeing.

Briefly, there is a long-known problem with how AES deals with 256-bit AES keys. (Of course in this business a “long-known problem” means about 10 years old.) AES does multiple rounds of transforming each chunk of data, and it uses different portions of the key in these different rounds. The specification for which portions of the key get used when is called the “key schedule”. The key schedule for 256-bit keys is not as well designed as the key schedule for 128-bit keys. And in recent years there has been substantial progress in turning those design problems into potential attacks on AES 256. This is the basis for Bruce Schneier’s advice on key choice.

One of the two reasons why I reject Schneier’s advice is that the issue with the AES 256-bit key schedule only opens up the possibility of a related key attack. Related key attacks depend on things being encrypted with keys that are related to each other in specific ways. Imagine if a system encrypts some stuff with a key, k1 and encrypts some other stuff with a different key, k2. The attacker doesn’t know what either k1 or k2 are, but she does know the difference between those two keys are.  If knowing the relationship between keys (without knowing the keys) gives the attacker some advantage in discovering the keys or decrypting material encrypted with those keys, then we have a related key attack.

When cryptographic systems are properly designed, related key attacks should not be relevant because good crypto systems shouldn’t use or create related keys. Cryptographers worry about related key attacks on AES because they know that some systems will be poorly designed, so it is still important to build ciphers that aren’t vulnerable to related key attacks. A spectacular case of using related keys with a cipher (RC4) for which related key attacks were easy was probably the design WEP WiFi encryption standard. This is one of several reasons why it is possible to discover a WEP Wi-Fi key after just a few minutes (though remember: Just because it’s easy doesn’t mean it is right or legal).

Each and every encryption key used in 1Password is selected independently using a cryptographically appropriate random number generator. This means that there is no way for an attacker to know of any relationship among keys used or generated by 1Password. There is no relationship among keys.

### So why 256 bits now?

I hope I’ve persuaded you that 256-bit AES does not reduce any meaningful threat. Essentially it reduces the chance of a successful brute force attack from effectively zero to effectively zero.

So why are we moving to 256-bit AES keys?

#### 1. There is no longer any reason not to move to 256-bit keys

1Password data needed to be encrypted and decrypted on first generation iPhones. Lots of encryption operations using 256-bit keys would have been slow and would drained batteries faster. On desktop computers, we were able to move to 256-bit keys within our 1Password browser extension. But for our principal data format – the one that is used across platforms – we needed to consider the minimal hardware it would run on.

1Password 4 for iOS requires iOS version 6 (which includes development features that allows for our awesome new Web Mode). This means all the devices 1Password 4 will run on are sufficiently powerful that we no longer need to be concerned about performance issues with 256-bit keys.  The performance concerns that we had in the past—both speed and power consumption—are no longer a concern today.

#### 2. Tougher key derivation

This one is subtle. And I’d like to thank Solar Designer of the Openwall Project for drawing my attention to this. It turns out that a side effect of using 256-bit keys in 1Password can make things even harder for automated Master Password guessing programs, not because such keys are harder to attack, but through a more convoluted chain. Don’t worry if you find this section confusing.

1Password uses PBKDF2 to slow down password crackers that could be used to automate guessing your Master Password if someone gets hold of your data. PBKDF2 is a Key Derivation Function – a system that churns your Master Password into a number that can be used as an encryption key. With our new Cloud Keychain Format, we use PBKDF2 to turn your Master Password into two 256-bit keys. One is an HMAC key, used for an integrity check on the data; and the other is a key used to actually decrypt the master key. To derive that total of 512-bits from your Master Password, 1Password uses HMAC-SHA512 within PBKDF2 in the Cloud Keychain format.

Password cracking systems, like hashcat, can speed up their operations by using GPUs (Graphic Processing Units) which can perform some kinds of computations blindingly fast, but there are some computation artifacts of SHA-512 that make this harder on GPUs. Solar Designer mentions this in his discussion of the future of password hashing (slide 35 and elsewhere).

I did warn you at the beginning of this section that this particular reason is convoluted and subtle. The short version is that a side-effect of using 256-bit AES keys is that it makes PBKDF2 more effective in certain circumstances.

#### 3. People (and Molly) feel better about 256-bit keys

If Molly feels that 128-bit keys aren’t sufficiently secure, she may incorrectly reject systems that use 128-bit keys instead of 256-bit keys. In doing so, she may make choices that actually weaken her security overall. I might not agree with her reasoning, but we do have to recognize that her feelings do matter to her choices. And I certainly want to keep Molly secure. So if by offering 256-bit keys we enable Molly to make better security choices (even if for the wrong reasons), that is a good thing.

As long as there is no reason not to use 256-bit AES keys, it makes sense to use them. We have now reached the point where there is no harm in using 256-bit keys, and so the reassurance that comes with using them is worthwhile.

Now, security is a tough business. And tough people in a tough business can talk tough. When I threatened to shoot somebody if we used the expression “military grade” to describe our use of 256-bit AES keys, I wasn’t expecting that I’d have to shoot the guy who signs the checks. But a promise is a promise. So, Dave Teare, the gauntlet is down. Water pistols at dawn!

### In conclusion

If there is any overall lesson here, beyond explaining why we’ve made the choices we have about key size for 1Password, it’s that seemingly simple questions about security and cryptography rarely have simple answers and explanations. On the one hand, we want people to understand what goes on under the hood and the thinking that goes into various design elements; but we also want to make it easy for people to behave securely without requiring that they understand what’s going on at the deeper levels.

### A quantum of bits [Update: March 20, 2013]

I reached out to the cryptographic community for any insight into Molly’s question about why the NSA insists that TOP SECRET material be encrypted using 256-bit keys. The answer came from Steven Bellovin of Columbia University:

@jpgoldberg @marshray Just heard that during the AES competition, NSA said in the open meetings it was for defense against quantum computing

Quantum computers, if they are every made practical, will be able to do amazing things. They will certainly change how we design cryptographic systems. It’s not that quantum computers will be faster or more powerful. Indeed, in some very important respects they will be less powerful than current computers. But there are some things that they will be able to do in less “time”.  I put “time” in scare quotes because it has a different meaning in this context from the ordinary use of the word. Oh, what a big difference it is. In this context it means the number of distinct steps an algorithm must take in performing some computation.

Searching through 2128 keys (on a classical, non-quantum, computer) takes a number of steps that is proportional to 2128. But for a quantum computer it takes a number of steps proportional to the square root of that number, 264. If a quantum computer is ever built capable of performing that task, we don’t know how the actual speed of each individual step will compare to those of current computers, but the NSA is taking no chances. Something with the effective strength of a 64-bit key isn’t strong enough. A 256-bit key against a quantum brute force attack would have the effective strength of a 128 bit key against a classical brute force attack.

I very much doubt that we will see a quantum computer actually capable of handing such things within the next thirty years. But if the past is any guide, my predictions about the future should be taken with a large grain of salt.

## Authenticated Encryption and how not to get caught chasing a coyote

I introduced HMAC (Hash-based Message Authentication Code) through the back door when talking about the Time-based One Time Password (TOTP) of Dropbox’s two-step verification. But TOTP is actually a peculiar way to use HMAC. Let’s explore what what Message Authentication Codes (MACs) are normally used for and why they play such an important role in the future of 1Password. If you guessed that MACs are for authenticating messages, you’re on the right track.

In a sense (but this is a lie), you can think of MACs as kind of like encrypting things that aren’t secret.  The recipient doesn’t decrypt the data, instead the recipient verifies that it really was the data sent by the sender and that the data hasn’t been tampered with. MACs work with the sender and the recipient both sharing a secret key. (This is the key difference between MACs and digital signatures. With MACs, the sender and the recipient share the same secret key.)

Only someone with knowledge of the secret MAC key could have created the MAC for some data, and (unlike the case of digital signatures) only someone with knowledge of the secret MAC key can verify that the MAC is valid for the data.

### Why use a MAC?

Suppose my dog Patty (the clever one) leaves a warning for Molly (a simple dog) that says, “There’s a coyote in the back yard”. The message isn’t secret. Neither Patty nor Molly care who reads it. But now suppose that the neighbor’s cat, Mr Talk, in his attempt to get rid of Molly, changes the message to “There’s a squirrel in the back yard”.  This would not be good news for Molly, who would blindly run behind the house, chase a squirrel up a tree and then bark at it for the next thirty minutes. Coyotes, however, do not climb trees; and so Molly would have an entirely different experience if she tried the same action against a coyote.

To avoid tampering with the message, and to prevent Mr Talk from sending a counterfeit message in Patty’s name, Patty and Molly can share a secret key which is used to create a MAC of the message. Suppose that Patty and Molly, ignoring earlier advice on creating passwords, have secretly agreed on the key (well, a password from which a key is derived)—”Kitty Smells”—for their MACs. When Patty constructs the message, she will calculate the MAC (and if she is using HMAC with SHA1 as the hashing algorithm) with ‘Kitty Smells’ as the password, the HMAC should come out as:

a3b3a8b79c135ff5b9a0aa6f5c304b411f07f90c.

Patty will leave the message, “There’s a coyote behind the house”. along with the MAC.  When Molly sees a message, she should verify the MAC before even looking at the contents of the message. If Mr Talk has changed the message to say “squirrel” instead of “coyote,” the MACs won’t match up. Mr Talk cannot create a valid MAC because he doesn’t know the shared secret password used to create the secret key that Molly and Patty have.

Sadly, Mr Talk’s trick will still work. That is because as soon as Molly encounters the word “squirrel” she will react. All thoughts of verifying the MAC will be pushed out of her brain, which will now be entirely occupied by the single thought, “squirrel”. Indeed, if I were reading this aloud, Molly would have run out the back in blind excitement.  This is why it is very important to not even look at the contents of a message before verifying its authenticity.

### MACs on encrypted data

In this example, the original message isn’t secret and so didn’t need to be encrypted. But with authenticated encryption, we first encrypt the message with an encryption key and then compute a MAC of the encrypted text using an authentication key. Just as it was important for Molly to not do anything with the message until she verified the MAC, it is vital that we don’t try to decrypt an encrypted message before verifying the MAC. This system is called “Encrypt-then-MAC”, but the emphasis should be put on the other end of the process. I like calling it “Verify-then-decrypt”.

A scheme like Encrypt-then-MAC that both encrypts a message and provides authentication (proof of who it came from) is called “authenticated encryption”. Encrypt-then-MAC isn’t the only secure authenticated encryption scheme out there, but it is the one that we use in the 1Password 4 Cloud Keychain format.

### It’s already encrypted with a shared secret. Why verify?

You might think that there is no reason to authenticate encrypted data. After all, the data was encrypted with a secret key, so if you can decrypt the message with that secret key, then you know it only could have been encrypted by someone with knowledge of the secret key. Many people have thought that, but they were wrong.

Suppose that Molly sends an encrypted message to Patty, but doesn’t use authenticated encryption. Now when Patty gets a message from Molly she decrypts it. If the decrypted message is garbled in a specific way, Patty tells Molly that it didn’t decrypt properly and that Molly should send it again. If it isn’t garbled in a particular way, Patty will just let Molly know she got the message.

Mr Talk can listen to this exchange between Patty and Molly. Without the secret encryption key, Mr Talk won’t be able to figure out what is in the message that Molly sent to Patty. But now suppose that Mr Talk is able to send encrypted messages (seemingly from Molly) to Patty. Mr Talk can send Patty modified forms of the message that Molly sent and find out whether Patty got a garbled message when she decrypted it. Mr Talk makes small changes to the original encrypted message to create a bunch of new slightly different encrypted messages. By finding out which ones are garbled and which ones aren’t, Mr Talk can actually decrypt the original message. This is a type of “Chosen Ciphertext Attack (CCA)”. It is called this because the attacker is able to choose ciphertext for the recipient to attempt to decrypt.

Now, the particular attack that Mr Talk used depends on the precise details of how Patty determines whether a message was garbled. That means changing those details can defend against this particular attack. Those who are familiar with all of this will know that I’m talking about the “padding oracle attack”, and will know that it can be defended against by using different padding or using a different encryption mode. But such a defense only addresses this particular attack. Is there a way to defend against all CCAs, even those that haven’t been invented yet?

The good news is that it is possible to defend against all Chosen Ciphertext Attacks. The way to do that is to properly use authenticated encryption. Padding or other “oracles” are not a particular threat to 1Password, as there is no back-and-forth exchange in normal operation. These sorts of attacks are practical when there is an automated oracle on the network or in a specific device that will attempt to decrypt ciphertext on demand. In 1Password, there is no opportunity for an attacker to set up the kind of dialogue needed for this kind of attack.  But we also know that theoretical weaknesses have a habit of turning into exploitable weakness over time. So we look ahead, and build authenticated encryption into 1Password now.

#### What happened to Molly?

I am pleased to say that no harm came to Molly or the coyote.  She was on her leash, and you can see in this staged and contrived photo that I was – just barely – able to restrain her from running to her doom after a coyote. However, our attempts to teach her that she must verify messages before she does any other processing of said messages have not gone well.

## Doing the two-step until the end of time

In my discussion of Dropbox’s new two-step authentication, I skimped on the cryptography. Because we had to move quickly, I wanted to focus at the time just on our recommendations, so I told a few fibs about how the way the six digit codes “get” to your phone. Now I want to explain how it really works.

Not only that, but I will sneak in a little introduction to Message Authentication Codes (MAC), which plays a major role in our newest version of the 1Password data format. This topic is also worth revisiting because our new release, 1Password4 for iOS, works well with Dropbox’s two-step verification.

Speaking of, let’s start with Dropbox’s two-step authentication system. I did try to warn readers that I was being less than forthcoming about the full truth when I suggested that a six digit code is sent from Dropbox (or Google) to your phone:

There are also some really cool things about how the protocols for two-factor authentication work, but I will bite my tongue and leave that discussion for another day. What this means, however, is that a great deal of what I say in describing the system below is a pack of lies.

Even my word “protocol” could be confusing, as it might imply some network activity. The magic of the system is that anything using this type of Time-based One Time Password (TOTP) tool will compute the same six digit code at a particular time with the initial set-up secret. Dropbox’s login system will calculate the six digit code on its own; and a tool that you use, such as Google Authenticator, will also calculate the six digit code on its own. No network connection is needed after the first time setup.  In my example below, I’ll use Google Authenticator, but it isn’t the only TOTP tool out there.

### Initial set up

When you first set up something like the Google Authenticator you scan in a QR code. It might look something like this:

The code contains a label that will typically be “Dropbox:your-email@example.com”, and it contains a secret that is randomly generated and unique for each account. The secret might look like “MQZDKZRZGBRWMMZXMI4TCMZUMYYDKYTC”. Putting this inside of a QR code just saves you a lot of typing. If you don’t have a camera that can be used to scan the code, there is even a link for getting the information that you should type in. Scanning this in is the only time that information will be transmitted (in this case, transmitted via your phone’s camera) from Dropbox to Google Authenticator.

Google Authenticator on your phone will keep a copy of the secret, and so will the Dropbox servers. That shared secret allows both Google Authenticator and Dropbox to calculate the same six digit codes when needed.

### Counting on time

When you log into Dropbox with your username and password you will then be prompted for the six digit code if you have enabled two-step verification. You will then open Google Authenticator on your phone and you will see six digits. Those six digits are computed from a combination of the the shared secret and the current time. The current time is the number of seconds since the first instant of 1970. It is rounded down to the nearest half minute. This is why the number changes every thirty seconds.

When you enter the six digit code during Dropbox’s login process, Dropbox will perform the same calculation. It has a copy of the secret that was first shared, and it too knows the current time. If what you enter matches what it has calculated, you’re in.

Your phone will not need any network connection as long as its clock is reasonably accurate. Fun fact: your phone actually makes minor adjustments to its clock pretty much every time it connects to any kind of network that allows it to check a time server on the internet. Today, most networked computers and devices know the current time to within less than one 10th of a second.

Because the code depends on both the time and on the shared secret, we end up with a different code during each 30 second period. This makes it a one time password.

### Beyond the end of the world (January 19, 2038)

Ancient eunuchs foretold global catastrophe on January 19, 2038, as their long count calendar comes to an end and starts a new cycle from zero

—Anonymous

Replica of the Aztec sun stone. This has nothing to to with Unix or Mayan time keeping.

The number of seconds since the very beginning of 1970, known as Unix time, is often maintained in a single variable in the computer’s operating system. When Unix was first designed, this number was stored in 32 bit variable. That means that the number could range from 0 to 232. Zero corresponds to the midnight January 1, 1970 (UTC). So what time does 232 correspond to? That will be 3:14:07 (UTC) on January 19, 2038. Bad things will happen then to computers that still are still using 32 bit integers to store Unix time.

So will Google Authenticator stop working in 2038? No, it should be fine. Even though iOS devices – based on 32-bit ARM chips – do just use 32 bit “long” integers, Google Authentication doesn’t rely on that. It uses NSDate to get Unix time on iOS.

Indeed, the actual standard defining TOTP states:

The implementation of this algorithm MUST support a time value T larger than a 32-bit integer when it is beyond the year 2038.

#### Another wrinkle in time

Unix Time really is the number of seconds since the very beginning of 1970, but that number ignores leap seconds. Leap seconds are added (or subtracted) on occasion to account for the fact that the speed of the Earth’s rotation can change slightly due to earthquakes, other seismic activity, and even tidal activity (not only do I get to talk about a calendar system reaching its end and resetting, I get to talk about earthquakes and tidal waves in the same post!). A leap second was added at the end of June 2012, so noon (leap second adjusted) on July 1 was actually only 86399 seconds later (by Unix time) than June 30 instead of 86400 seconds later as you would normally get between two days.

The TOTP standard requires the use of Unix time, which is defined to ignore leap seconds. This way, everyone who follows the standard will be using the same clock and calendar. Also, keep in mind that Unix time isn’t just for Unix-based operating system like OS X, iOS, and Android. Windows has a similarly defined FILETIME, which differs in its start time and that it counts in nanoseconds instead of seconds, but it can be converted to Unix time easily enough for use in the TOTP protocol.

#### Time to meet MAC

Earlier, I said that the code, or one time password, is computed from the secret key and the time, but not just any old computation will do. For the system to work securely, we need the computation to meet some requirements which include:

1. It must be easy to calculate the code from the key and the time, but it must be completely unfeasible to calculate the key from the code and the time.
2. It must be unfeasible to predict without knowledge of the key what the code will be at some particular time even if you have observed what the code is at many other times.
3. The calculation will always give the same result if given the same key and time (it is a function).

These look similar to some of the requirements we wanted for a good cryptographic hash function. And a cryptographic hash function will play a central role in how this is all done.

This also looks as if we are using a shared secret key to create a digital signature on the time. Digital signatures also involve hash functions. But “digital signature” isn’t really the right term here because those are based off of public/private key systems. With TOTP, we have a shared secret.

In place of a digital signature, we have a Message Authentication Code (MAC). This is not to be confused with “MAC” of “MAC address” that you see as hardware addresses for networking equipment, and certainly not to be confused with “Mac” (Apple’s family of computers) or “mac” (the mackintosh raincoat). Maybe this will help keep things clear:

A lowercase mac, for when you need wet wear
And an all-caps MAC is made by software
You’d be just as cool as the great Ry Cooder
If  you never confound these with a Mac computer

One of the ways to use a cryptographic hash to create a MAC is the HMAC. You will hear more about HMAC in the not-so-distant future.

#### Keeping time, time, time

One consequence of this sort of system is that it makes the computers’ knowledge of the time part of the security system. This isn’t anything new; this requirement has been part of the Kerberos system for decades. Indeed, one of my first roles in system administration was keeping clocks in sync with each other, specifically for Kerberos.

Unfortunately, this means that if someone can tamper with the time signals a computer receives from outside, then they can do damage to other aspects of security. We need systems to verify that the messages they get about the time are authentic, but the less-than-ideal state of secure time synchronization could be the subject for a new series of rant posts. Fortunately, I’ll spare you.

It is also not clear at this point what forms of time travel this or other security protocols can resist. I believe that there is a research paper in this question somewhere for an adventurous student and a flexible professor.

### Six digits from 160 bits

Let’s now put all of these pieces together. Dropbox and Google Authenticator each have the shared secret from when you set up your two-step verification. And each know the correct time at the moment. So when each calculate the HMAC of the current time, using the shared secret as a key, they will calculate the same number. If they use SHA-1 for the hash function (as they do in the current system) the number that they calculate will 160 bits long, or roughly 48 digits. The final step is to compute a 6 digit number from that 160 bit number. But let’s save time and skip those final details.

### Time for closing remarks

Dropbox’s two-step authentication is a great thing, and 1Password for iOS now works more smoothly with it. But it does the most good for people who are using weak or re-used passwords to log into Dropbox. Thankfully, 1Password users don’t really need to worry about that problem.

## Alan Turing’s contribution can’t be computed

Alan Turing was born a hundred years ago this year and his most important paper was published seventy-six years ago (November 1936). It is close to impossible to overstate the influence that Turing has had on the modern world. It is something well worth celebrating his life throughout this centennial year. Although any celebration must be tempered by reflection on the circumstances of his death, I would very much like to tell him that “it got better.”

Since others with better knowledge of history than I are writing this year about many aspects of Turing’s life and influence, I’ll discuss something that doesn’t make the rounds often: Alan Turing and randomness. Turing knew a great deal about randomness and talked about what kinds of things can and can’t be computed. If you would like to know why true randomness isn’t “computable”, please read on. Discussion of randomness in 1Password and cryptography will be in a later article.

### The Millennium of the Algorithm

I like to call the second millennium “the millennium of the algorithm”. An algorithm is a finite list of step-by-step instructions that, if followed, will get you a result. When we learned how to add multi-digit numbers in grade school, we learned an algorithm. The same is true for multiplication and division. These algorithms for arithmetic were developed and described by Muḥammad ibn Mūsā al-Khwārizmī in about 825CE and translated to Latin in the 12th century. It is from al-Khwārizmī’s name that we get the word “algorithm”.  We also get the word “algebra” from the title of one of his books.

In the final century of the millennium, Alan Turing found a way to treat algorithms as mathematical objects. This involved inventing a computer programming language and describing a physical machine that it would run on. The particular machine wouldn’t be very practical because Turing needed it to be the simplest thing possible that would compute, in principle, anything that a more complicated physical device could do. It would have been horrendously inefficient if actually built.

The full title of Turing’s 1936 paper is “On computable numbers, with an application to the Entscheidungsproblem“, but I will only be talking about the “On computable numbers”. Although utterly fascinating and the goal of his paper, let’s set aside the Entscheidungsproblem for another post or perhaps a few beers.

### Different kinds of numbers

There are lots of numbers between 0 and 1. We call all the numbers in that range real numbers. Among the real numbers are rational numbers (numbers that are a fraction of whole numbers) like 1/3, but there are also numbers that can’t be expressed as fractions of whole numbers, like the square root of 2. This fact about the square root of two was so disturbing to the Pythagoreans view of the universe that they attempted to keep it secret. Legend has it that the person who revealed the secret got tossed overboard and drowned as punishment. Anyway, numbers that can’t be expressed as ratios of whole numbers are called irrational numbers.

It turns out that there are as many even numbers as there are whole numbers. This is because for any whole number that you care to name, I can name an even number that matches it. If you say “33”, I’ll just say “66”. This may seem strange, but if we can find a way to pair up the members of one set (in this case whole numbers) with the members of another set (in this case even numbers), we say the sets are the same size. It actually gives us a useful way to talk about infinities.

Since there are as many whole numbers as even numbers, you might reasonably think at this point that all infinities are the same. You’d be wrong, but it  certainly wouldn’t be unreasonable to have that incorrect intuition. There are, in fact, more real (irrational) numbers than there are rational numbers. The proof of this is one of the most beautiful things ever invented, but I won’t go into it.

So we’ve got a different, bigger infinity of irrational numbers then we have of rational numbers. When a set has as many elements as there are counting numbers, we call that set “countable”. It’s infinite, but we can match up elements to the counting numbers. The counting numbers are countable as are the rational numbers. The set of real numbers, however, is not countable. This means that we cannot pair up each real number with some counting number.

#### How irrational can we get?

There are different types of irrational numbers. Things like the square root of 2 are called algebraic numbers for reasons I won’t explain. But there are irrational numbers that go beyond, or transcend, the algebraic numbers, and these are called transcendental numbers. The most famous transcendental number is π, the ratio of the circumference of a circle to its diameter. Of course, π has been known about for a very long time, but it was only proved to be transcendental in the 19th century. The trigonometric functions like sine and cosine typically yield transcendental results.

There are algorithms to compute any algebraic numbers (things like the square root of 2) to any precision that we need. There are also algorithms to calculate the kinds of transcendental numbers that we tend to use, like π or the sine of 10.

#### On Computable Numbers

Imagine a hotel – let’s call in the Hilbert Hotel – that has a countably infinite number of rooms, each of which is big enough for only one guest. Suppose you have every room filled, and a countably infinite number of new guests arrive. You can still fit them all in by having each current guest move to double their current room number. If Alice is staying in room 1, she will move to room 2. Barbara, who has been in room 2, will be moving to room 4. The guest in room 33 will be moving to room 66. After this move, then all of the odd numbered rooms will be vacant, and you can then give out those rooms to your new guest.

If you have an infinite number of guests who would like to stay, and you can find a way to assign each to her own room, then you have countably infinite guests. But if there is no way to give each guest her own room, then you have an uncountable number of guests.

By carefully defining what it means to compute something, Turing found a way to give every possible algorithm a room to itself in the Hilbert’s countably infinite hotel. This means that there are “only” a countable number of algorithms. At the same time, we know that there are uncountably many real numbers. There are real numbers for which there is no algorithm to produce it. Some things are uncomputable.

Plenty of irrational numbers are computable. We do have algorithms for computing the square root of 2 or the sine of 50. Turing’s result here (remember, he actually just used all of this as a building block to settle a bigger question in mathematics) tells us, among other things, that that while there are countably infinite things that we can compute, there are uncountable infinite things that we can’t compute. The overwhelming portion of numbers are things that we can’t compute.

#### Computing Machines

Turing was able to define the notion of “computable” by imagining the most simple device with the most simple of mechanisms that could do everything that we think of as computation. But, there is some confusion about the term “Turing Machine”. Although it can be used to refer to the imaginary physical device described in “On Computable Numbers”, it often refers to a program for that device (Turing called his programs “machines”). There is a special kind of Turing Machine (program) which can read and execute any Turing Machine. We call such programs or systems Universal (Turing) Machine.

One important fact about a Turing Machine (the imaginary device or individual programs) is that it always produced the same result with the same input. The result of each step in an algorithm depends entirely on what it has to work with and the step itself. If an algorithm has a step that adds two digits the results must always be the same given the same starting digits.
Algorithms cannot have steps in them like “flip a coin, if it comes up heads do X, otherwise do Y.” There is no coin flipper inside a Turing Machine.

### Pick a number, any number

If we pick a number at random from the real numbers between 0 and 1, it will almost certainly be a non-computable number, as the overwhelming majority of numbers aren’t computable. So can we pick a random number that way? Not with a Turing Machine. Each step in an algorithm should get you to a single specific result based on where you start. True randomness is not computable.

Turing knew that Turing machines could not generate true randomness, and Turing knew a great deal about randomness. His 1934 King’s College (Cambridge) Fellowship Dissertation had been on one of the most fundamental theorems in statistics. His now famous work as a code breaker at Bletchley Park during the second world war involved an innovative application of Bayes’ Rule to cryptanalysis.

When Turing got to play with real early computers for his own academic interests, he knew that if he wanted anything with truly random numbers, it would have to go beyond the computable. He had a hardware random number generated included in the design of the Mark I at Manchester University in 1949. A paper by Martin Campbell-Kelly describes what was almost certainly the first attempt at a hardware random number generator in an electronic computer:

At the request of Turing an instruction to generate a random number from a noise source was provided. Unfortunately, the numbers turned out to be not particularly random, and debugging was difficult because programmes were not repeatable; consequently, the instruction eventually fell into disuse.

The lesson in all of this history and theory is that randomness has been a difficult problem since the very beginnings of computing.

### More randomness to come

1Password, like pretty much all cryptographic software, needs cryptographically secure random numbers to do its stuff securely. What it means for a number  to be cryptographically secure, why 1Password needs such numbers, and where it gets those from will be the subject of a future article.

If you would like to look to the past then I very strongly recommend Andrew Hodges’ Alan Turing: the enigma as the definitive biography, and for a remarkably accessible yet complete discussion of On Computable Numbers, I enthusiastically recommend Charles Petzold’s The Annotated Turing.

## Hashing fast and slow: GPUs and 1Password

The net is atwitter with discussion of Jeremi Gosney’s specially crafted machine with 25 GPUs that can test hundreds of billions of passwords per second using hashcat, a password cracking system. Password crackers, like hashcat, look at the cryptographic hashes of user passwords and repeatedly make guesses to try to find a password that works. 1Password has been designed to resist automated password cracking attempts exactly because we anticipated developments like this.

### Don’t get freaked out by the numbers

First and foremost the reports of 348 billion passwords per second is specifically about passwords stored in the LM format used on Windows XP, exploiting the notorious design flaws and limitations LM hashes. (If you are still running Windows XP; and particularly if you are running NT, 2000 or 2003 domain controllers, please, please upgrade.) The hundreds of billions of  passwords per second reached against those hashes does not mean that passwords stored in other formats can be cracked that quickly.

By the way, this isn’t the only important news about password security to come out this year’s excellent Passwords12 conference. I couldn’t quite make it there this year, but I will talk about other developments from it in coming weeks. Today, all eyes are on GPUs guessing billions of passwords per second.

### Slow and Fast Hashing

Typically when a password is “stored” on a system that you log into it is not the password itself, but instead it is a cryptographic hash of the password. I’ve written about some of the ways that password  can be hashed before. These can roughly be divided into two categories: slow hashes and fast hashes. Slow hashes are specifically designed to slow down the kinds of attacks we are discussing.

### Defending against future threats

There are several lessons from this. Gosney’s work does reflect real innovation and a breakthrough, but it isn’t an unexpected breakthrough. People who keep an eye on these things – and we do keep an eye on these things – expected something like this within the next couple of years.

We need to design systems that work against plausible future threats, not just against current threats. This is what we have always tried to do with 1Password.

### Lessons and actions

1. Your 1Password Master Password and your 1Password data remain safe, we designed the Agile Keychain format from the beginning to resist crackers like this. But it is also important for people to select strong, memorable, Master Passwords.
2. It is more important than ever to have unique passwords for each site and service. As password cracking gets easier, the risks of using the same password on multiple sites increases. This is because if password hashes are stolen from one site, attackers have a better chance of discovering the password from the hash. Once they have that, they can try the same password on other sites.
3. When using 1Password’s Strong Password Generator, try to create passwords that are at least 20 characters long.

### Back to the Future

I’ve talked (well even boasted, I suppose) about how our earlier design decisions are protecting 1Password users today. But we have to look at what design decisions we make today will do for 1Password users half a decade from now.

Gosney’s machine can also be used against slow hashes, including PBKDF2 passwords. You can read more (and see cool pictures) of the design of Grosney’s hashcatting machine in the conference presentation slides (PDF).

Furthermore PBKDF2 was not designed to specifically impair parallel processing. But because GPUs have unusual and restricted ways of addressing memory, it is possible to design systems that make parallel processing using GPUs slower. This leaves a number of questions that we continue to look at.

1. Do we need to change anything at all in anticipation of even more powerful machines tuned toward PBKDF2? (We don’t yet have estimates on how many passwords per second this system could try against a 1Password data file.)
2. If we do need to change things, when do we need those changes need to be in place?
3. Should we look at more parallel and GPU resistant alternatives to PBKDF2, such as scrypt?
4. Should we look at tinkering with options within PBKDF2 to make it more resistant to GPUs working in parallel?

These are not new questions. We are always asking ourselves these and other questions in order to keep your 1Password data secure and protected, both now and in the future.

[Updated 2012-12-06 15:50 UTC. Updated to correctly explain that Gosney’s system is not limited to LM hashes. Thanks to Jeremi Grosney, Solar Designer, and others who pointed out my error. I have also taken the opportunity to add more clarifications and links to background throughout.]

## Credit card numbers, checksums, and hashes. The story of a robocall scam

As  Lívia and I were out walking Molly and Patty on Monday evening, I received a telephone call from an unknown number.

I decided to answer the phone anyway, and I was greeted by a recorded voice telling me that my Bank of America debit card beginning with 4217 has been limited and whether I would like to revalidate it.

I don’t have a Bank of America debit card (though I did many years ago), and I know US rules bar most calls that begin with recorded messages, otherwise known as “robocalls.”  I also know that the first few digits of credit card and debit card numbers indicate the card issuer and so are not secret if you know that it is a card issued by a particular bank. It would be easy for a scammer to appear to know my card number. Once I got home, I was able to confirm that 4217 is, indeed, how most Bank of America debit card numbers begin.

Naturally I pressed 1 to “revalidate” my card (not-so-sidenote: you should not try this at home. Indeed, you really should just hang up when you get such calls). I wanted to see how this scam proceeded. At this point I was told a few more details explaining that my card beginning with 4217 needed to be revalidated because of a internal security failure and that I should press 1 to continue. I pressed 1 again.

I was then asked to enter my Bank of America debit card number. Unfortunately, I was not able to construct a number off of the top of my head that would pass their validation check.
While I knew it should begin with 4127 (thanks to the robocall), I did not know at the time that the next two digits should be between 64, 65, or 66 for Bank of America. I did know that the last digit of the sixteen digit number should be calculable from the preceding 15 digits by the Luhn checksum algorithm. But I couldn’t recall the details of the algorithm at the time, and I certainly would not have been able to construct something on the fly quickly enough to enter a “valid” number.

As a consequence, I wasn’t able to get past the “enter your debit card number” step, and so never learned what other information they would ask for, such as my Social Security Number, billing address, or PIN. But it is a safe bet that those would have been requested at some point. Those of course, would have been easier to fabricate as they have no quick validation system.

### Checksums

The Luhn checksum is not some super secret security feature, it was a simple system designed to identify errors when exchanging credit card numbers in common use. At best, it can play a minor role in basic security checks to weed out those who just try random numbers.

The system is designed to catch common errors. If one digit of the 15 digits is entered incorrectly the checksum will fail. For example if the correct number is 4217 6601 2345 6784 (here 4 is the checksum) then if the number is given as 4217 6681 2345 6784, where the “0” has been mistyped as an” 8″, the check digit will be 7, and so  4217 6681 2345 6784, with a “4” as the last digit, will not validate.

Some people accidentally type “2435” instead of “2345”, swapping the “3” and the “4”. It happens to the best of us, and it’s another example of the the type of errors the Luhn system can catch.

#### Not all checksums are cryptographic hashes

Check digits of this sort, where the checksum is just one digit long, have properties that make them poorly suited for cryptographic purposes. The most obvious is that they are too small. That is, the check digit can only be one of ten possibilities—0 through 9. Put another way, there is a pretty much a 1 in 10 chance that two numbers will have the same check digit.

When two numbers result in the same checksum, we have a “collision”. If it is feasible to create two numbers that have the same check digit, the system is not collision resistant.

If we also have one number that has a particular checksum then we can easily construct another number that yields the same checksum. So the Luhn algorithm isn’t second-preimage resistant. The difference between collision resistance and second-preimage resistance is subtle. For collision resistance, the problem is “find two numbers whose checksums collide.” For second-preimage resistance, the problem is “here is a number m, now go and find another number that has the same checksum as m.”

It is also the case that if we have a checksum, we can easily construct a number that shares the same checksum. If you give me a checksum of “5”, I can construct a 15 digit number that has 5 as its checksum. That is, it is possible to work backwards from the checksum to an original. Hence, this checksum system is not preimage resistant. Functions that are preimage resistant are sometimes called “one way functions”.

The obvious reason why the Luhn algorithm fails to be preimage resistant, second-preimage resistant, and collision resistant is because it is just one digit. There just isn’t anything you can do when you only have one digit (about 3 bits) with which to work. But there are other reasons as well. I won’t discuss them at all, other than to say that the Luhn algorithm doesn’t have those three properties because it was never designed to.

### Resistance is necessary

Cryptographic hash functions, however, are used for more than just checking for accidental errors in data entry or data transmission. This means they do need to be one way functions (preimage resistant); they need to be second-preimage resistant; and they really should be collision resistant.

We’ve discussed how preimage resistance is one of several essential properties of proper treatment of passwords on servers. We’ve also discussed how second-preimage resistance plays a vital role in network security. Over the next few posts I will be talking about how hash functions are used for data integrity checks and specifically for authenticated encryption. Authenticated encryption will play a big role in our next data format.

### A fine night for a scam

I am relieved to say that both Patty and Molly did not really mind that I spent the remainder of our walk talking about the nature of the scam; how the criminals’ knowledge of the issuer number on a card can easily fool a target; and, of course, on the difference between checksums and cryptographic hashes. Lívia did not seem quite as complacent as the dogs. “The difference between us, Jeff,” she said, “is that I get really annoyed when something like that happens; but you seem to be really enjoying yourself.”

### A few notes about robocalls

Robocalls are a lot like email spam:

1. The caller ID number is easily counterfeited. You cannot rely on it to trace the source of the call
2. “Press N to be removed from our list” is a lot like the “unsubscribe” button in email spam. Sure there may the the rare case where it is genuine, but usually it is just used to confirm that the spam/call reaches a real person who, to at least some extent, trusts the message. It’s a great way to get yourself added to more lists

As I write this, the US Federal Trade Commission is hosting a summit on robocalls. There should be more information at that site in coming days and weeks.

Correction October 22, 2012: I incorrectly suggested above that the Luhn algorithm will catch all single transcription errors involving the replacement of one digit with another and all cases where adjacent distinct digits are transposed. I was wrong. The ISBN checksum has these properties, but the Luhn Algorithm used for credit cards does not. The crucial difference revolves around the fact that the ISBN-10 system uses a prime modulus (11), while the Luhn system uses a composite modulus (10). This is also why the check digit for ISBN numbers can sometimes be “X”. The check digit there works out to be a number from 0 through 10, but “X” is used to indicate 10.

I should have spotted this instantly when I looked at the Luhn algorithm, but it required a different path for me to learn of my error. Today is a student holiday in our school district, and so I assigned my daughter the task of proving that the Luhn system will have the properties that I claimed it had. She quickly constructed some counter-examples, where changing an individual digit leads to the same check digit.

## Flames and collisions

Having a Microsoft code signing certificate is the Holy Grail of malware writers. This has now happened.Mikko Hypponen

Unless you are a system administrator for a government institution in or around the Middle East you do not need to worry about Flame infecting your computer. Flame (also known as “Flamer” and “skywiper”) itself is not a security concern except to a very narrow, targeted group. Quite simply you don’t need to worry about being infected by Flame, and antivirus vendors who suggest otherwise may be engaging in fear mongering.

With so few people in danger of Flame, why am I writing about it? Good question. I’m writing about it because one of the methods used in Flame has the potential of undermining a crucial part of computer security. The authors of Flame have the ability to subvert the Windows Update process. Whatever Flame itself does or doesn’t do, the fact that its authors acquired the capability to distribute fake updates to Microsoft Windows is cause for serious concern.

### Software updates and chains of trust

I have previously written about how an important part of computer security is ensuring that your software updates come from the right place. You don’t want someone who pretends to be AgileBits giving you malicious updates to 1Password. And you don’t want someone who pretends to be Microsoft giving you malicious Windows Updates. The methods used for digitally signing downloads and updates involves some mathematical magic and a Chain of Trust. In the summer 2011, we saw, in the example of DigiNotar, what can happen when someone finds a way to insert themselves into the chain of trust.

These two articles, “Who do you trust to tell you who to trust?” and “A peek over the Gatekeeper” explain the security infrastructure I’ll be writing about here. You will see terms like Certificate Authority or Man in the Middle attack in this post, but they are more fully explained and illustrated in those other posts.

### Putting trust to the flame

The authors of Flame have acquired the ability to digitally sign their own updates to Microsoft Windows as if they were (almost) Microsoft. They have been able to create digital signatures in a way that will successfully fool the Windows Update process. Microsoft made a series of mostly innocent blunders in creating some intermediate Certification Authorities that are used by customers to creating individual licenses for a particular product. However, in combination those blunders on something that was not supposed to be part of the critical system turned out the have major consequences.

The details are subtle, and much of what we know comes from announcements by Microsoft about their recent urgent security updates. With the sketchiness of the details and the fluidity of the situation, it is safe to assume that anything I write about the mechanism used to create the bogus signatures will be out of date by the time I actually post this article. A great source for information about this is Matthew Green’s post on his Cryptographic Engineering blog. The technical details of Flame (PDF) itself are coming from research at the Budapest University of Technology and Economics.

Microsoft has issued a security update, which you get via – you guessed it – Windows Update. It is not clear (to me, at this point) whether these updates fix the long term problem or just fix the Flame-specific signatures. In a notice on June 4 they state that the June 3 update was the first of a series of actions in a phased mitigation strategy. So there will be more to come.

But the most interesting thing (to me) in their notice is

The Flame malware used a cryptographic collision attack in combination with the terminal server licensing service certificates to sign code as if it came from Microsoft. [Emphasis added]

And this gives me the opportunity to discuss an important concept in cryptography that I haven’t talked about before.

### Hashes and Collisions

In the article about Gatekeeper I talked a bit about some of the mathematical magic behind digital signatures, but I left many pieces out. The digital signature of a file is not actually a signature of the file directly. For a number of technical reasons it is too unwieldily to directly construct a signature of a large chunk of data; the file itself would need to be treated is a single enormous number that is then used as an exponent of some other large number. If the file were just 10 kilobytes, its number would be on the order of 281920 (about 25,000 digits long). We deal with some really big numbers in cryptography, but we would like to just deal with numbers about the size of 2256 (around 77 digits).

To create a digital signature of a file we first need to do is reduce the file (no matter how big it is) to a number in the range we can deal with. We create a cryptographic hash (sometimes called a message digest of a file) and then digitally sign that hash. If a hash function (something that takes all the data in a file and produces a smaller number) is to be useful for digital signatures it must have a number of security properties. The security property we are interested in today is called (strong) collision resistance.

Collision resistance
It must be infeasible to find or create two distinct files that have the same hash.

Any time you have two distinct files with the same hash you have a collision. In principle collisions are inevitable because there are more possible files than there are hashes, after all, the point of the hash to turn a really big chunk of data (a file) into a smaller (though still big) number. Instead the security of hash functions relies on how difficult it is to find or create collisions.

In a recent article, I talked about another security requirement of a cryptographic hash. It must be infeasible to calculate anything about the original file from the hash. That is called pre-image resistance, but today’s topic is collision resistance.

Now let’s look at why collision resistance is important for digital signatures. Suppose that Patty, one of my dogs, creates two files. One of them says, “Molly is the cleverest and prettiest dog ever, and Molly gets to rule.” The other file that Patty creates says, “Patty will get all of Molly’s dog treats.” Patty, of course, is the clever one. She is able to create these two files so that they produce the same hash. Patty has created these to files in such as way that their hashes collide. Patty asks Molly to digitally sign the “Molly rules” file, and Molly is happy to comply. Molly, using her secret key, creates a signature for the “Molly rules” file. The digital signature is actually a signature of the hash of the file, and so that signature will also work as a signature for the “Patty gets all the treats” file.

Patty will then bring me the “Patty gets all of Molly’s treats” file along with Molly’s signature. I calculate the hash of the file and verify that the hash really is signed using Molly’s secret key. Nobody but Molly knows that secret key, and so I figure that Molly must have signed that file. I give all of Molly’s treats to Patty. Molly may like to collide with Patty on walks, but this is one collision that Molly is not at all happy about.

It’s more than just dog treats that are at stake here. As I described in the Gatekeeper article, it’s not just ordinary files that can be signed (or more correctly, have their hashes signed), but certificates and Certificate Authorities are also just data that can be signed. The signature on a certificate helps our computer determine whether it can trust that certificate. If Molly controls a Certification Authority (CA) Patty can ask Molly’s CA to sign a certificate for Patty that Molly is happy to sign. But if Patty has another, trickier, certificate that produces the same hash as the one that Molly signed, Patty can simply add Molly’s signature to the bogus certificate.

The solution is to make sure that the hash function that is used for signing certificates (or files related to dog treats) is collision resistant. In particular this means that the MD5 hash algorithm should not be used. Unfortunately some of Microsoft’s Certification Authorities use the MD5 hash system. The MD5 hash system is badly broken, and it is very possible to generate collisions.

#### The downfall of MD5

Cryptographers don’t talk about “impossible” or “possible”; they talk about “feasible” and “infeasible”. An infeasible task usually means something like “if you used all the computing power on Earth now and for the next decades, would still have only a negligible chance of successfully completing the task.” With MD5 we’ve witnessed the progression from “secure” to “weaknesses that are infeasible to exploit” and eventually to “practical exploits” in the space of about 10 years.

MD5 (Message Digest 5) was shown in 1995 to have theoretical weaknesses. From there the history is remarkably interesting (if you are into that sort of thing). There was one group of people who were saying “don’t use MD5 any new software or systems; use SHA1 instead.” There was another group saying that the weaknesses in MD5 don’t pose an actual threat to how it is actually used. By 2005 it became absolutely clear that the weaknesses in MD5 had been expanded and could be leveraged into real attacks, and in 2008 there was a spectacular demonstration to create a rogue Certification Authority. An important side note is that while the advice to use SHA1 instead of MD5 was very good advice in 1995; today that advice is to use SHA2 instead of SHA1.

As we all watched MD5 go from secure to completely broken with respect to collision resistance in the space of ten years, we also saw that its use continued. I think that many of us failed to understand just how easily collisions could be exploited. It was tough to steer a course between people panicking on one side who were saying that anything involving MD5 was tainted and those on the other side who always seemed to think, “sure MD5 has its weaknesses, but those weaknesses don’t affect my application.”

#### Microsoft using MD5 in 2009

People continued to use MD5 for a number of purposes that they didn’t see as critical. The intermediate certification authority that Microsoft set up for signing Terminal Services licenses was, presumably, thought to be not critical enough to worry about. The worst (they thought) that could happen is that customers could create a few extra licenses for themselves if they went through the effort to create collisions. So as even as late as 2009 Microsoft created CAs that used MD5 as its hashing algorithm. They turned out to be wrong about “the worst that could happen”. Other blunders allowed a weak CA created in 2007 to sign certificates for other CAs.

Using the same trick that Patty pulled on Molly, an attacker could get a signature for an acceptable CA to work on a malicious one. The malicious one could then be used to sign false updates to Windows. Thus, the attacker would have the ability to sign things that would look like legitimate updates to Windows.

Microsoft has recently (June 6) provided more details, which confirm that what I’ve described above did play a role in the creation of the signatures for the bogus updates.

### Lessons

When we, the technology community, were advised in 1995 to stop relying on MD5 we should have been quicker to make the transition to SHA-1, even though at the time it wasn’t clear how those weaknesses could be exploited. Likewise, we should follow the same warnings about SHA-1 today. We shouldn’t panic every time a theoretical weakness is found, but we should remember that these can often be leveraged in ways we can’t predict.

Hash algorithms are hard. MD5, as you’ve seen deteriorated rapidly. SHA-1 has been shown to be less than ideal (not yet exploitable in practice), and even its replacement, SHA-2 isn’t all we would hope it to be. Cryptographers and the NIST are working on finding a system to become SHA3. If you have a high tolerance for geeks singing out of key, you may wish to sing along to the SHA-3 song.
The refrain is

SHA-2 will soon retire
Because NIST is learning and SHA-1 is burning
SHA-2 will soon retire
No we didn’t light it but tried to fight it.

#### The trust infrastructure is fragile

There are a large number of Root Certification Authorities and an even larger number of intermediate CAs. As we’ve seen here, the chain of trust can be compromised through some poorly designed CAs. As we saw last summer, in the case of DigiNotar, a Root CA can have their computer systems compromised allowing an attacker to gain accesses to the secret key needed to sign certificates. A third possibility is that the operators of CA may simply go rogue.

Charles Dudley once noted that “everybody complains about the weather, but nobody does anything about it.” The same can be said about the trust infrastructure that so much of our security relies on. Fortunately there are a few talented people who are working on serious proposals that, while they won’t completely fix the system, will mitigate some of the problems. I won’t review them here.

#### Are governments friends or foes of computer security?

Flame is almost certainly the creation of a “Western intelligence agency”. A few weeks ago, I would have guessed that it was an Israeli creation, but recent news about the creation of Stuxnet leads me to suspect that Flame, like Stuxnet, was primarily created by the United States. Parts of the US government have done and continue to do a great deal of important work in promoting good security. But we now have an example where the government has undermined a crucial part of computer security.

So far the victims of state-sponsored attacks on the certificate trust infrastructure have been in Iran. Last summer the government of Iran used certificates from a compromised Root Certification Authority, DigiNotar, to interest “secure” connections between individuals in Iran and Gmail. Now it looks like like the US (or possibly Israeli) government has been using poorly constructed Microsoft Intermediate CAs to target specific entities in Iran.

The good news, I suppose, is that these governments still had to exploit vulnerabilities instead of, say, compelling Microsoft to sign bogus updates. Of course, we cannot rule out that government agencies asked Microsoft to include those vulnerabilities. I suspect that Microsoft’s errors were innocent because they are the kinds of errors that people and organizations do make, but I can’t entirely rule out the more paranoid interpretation.

#### Security has a lot of moving parts

Each of the mistakes that Microsoft made were probably harmless on their own. No doubt they were made by different people who weren’t aware of the decisions that others had made. In combination, the mistakes add up to several ways of generating “good” signatures for bad Microsoft Updates, something with potentially catastrophic consequences.

### A few closing remarks

#### Truth and speculation

I have speculated about how Microsoft made the errors that they did, and I have speculated about what the creators of Flame have done with that. What we do know is the bogus certificates for signing Windows Updates were created, and we know what Microsoft has said about it. We know that CAs using MD5 in their digital signatures are vulnerable in the way discussed, and we know that that the bogus certificates were signed by those weak CAs.

As it turns out, some of my initial speculation has been confirmed since I first drafted this. Microsoft has published an outstanding and detailed report of their analysis.

We also have a rapidly changing situation. It shouldn’t be too surprising if much of what I say about the specifics of the case today is out of date by tomorrow. Still, it gave me the chance to talk about hash functions and attacks based on collisions, along with the problems of using MD5 for digital signatures. Furthermore, Patty and Molly are always seeking attention, so they appreciate that I’ve mentioned them here.

#### Continuing the discussion

If you would like to comment or continue this discussion, please do so in the Flames and Collisions forum topic I’ve created for the purpose.

This article was out of date before it was finished, and I expected that there would be more news to come. And so to avoid a continues series of updates, I will be adding updates to the forum thread set up for the purpose.

However, the news that came out right after this article was posted is so jaw dropping that I will repeat it here.

The cryptanalytic technique used to create the MD5 collision is new. It isn’t radically different than previous known techniques, but this is using a technique that would have taken a great deal of expertise to develop.

People have always speculated about how far ahead in cryptanalysis agencies like the NSA or GCHQ are compared to what is known by the academic community. The assumption is that the gap has been narrowing over the decades as there is more open work in cryptography done outside of intelligence agencies. We don’t often get data to help pin anything down with our speculation, but this definitely is interesting.

The Ars Technica article linked to above has a terrific diagram outlining the nature of the general technique used to create MD5 collisions.