When back doors go bad: Mind your Ps and Qs

This is going to be a long and technical article, but the point can be stated more simply:

The kinds of security architectures in which it is easy to insert a back door are typically less secure than the security architectures in which it is hard to insert a back door. The back doors that have been recently been disclosed in Juniper Networks’ ScreenOS firewalls exemplify this point dramatically. It is also possible that the back door technology developed by the NSA is being used by some entity other than the NSA.

Perhaps the Economist put it better in When back doors backfire:

The problem with back doors is that, though they make life easier for spooks, they also make the internet less secure for everyone else. Recent revelations involving Juniper, an American maker of networking hardware and software, vividly demonstrate how. Juniper disclosed in December that a back door, dating to 2012, let anyone with knowledge of it read traffic encrypted by its “virtual private network” software, which is used by companies and government agencies worldwide to connect different offices via the public internet. It is unclear who is responsible, but the flaw may have arisen when one intelligence agency installed a back door which was then secretly modified by another.

Two kinds of back doors

There were two back doors in Juniper’s ScreenOS that were fixed by Juniper, the mundane one and the interesting one. The mundane one was a back door into the authentication system, allowing someone with knowledge of it to simply log in and administer the devices. Back doors in authentication systems are easy to create, and are all too common. This, by the way, is why we try to build 1Password’s security on encryption rather than authentication. As I’ve argued before: Architectures that allow for back doors are bad for security. The more interesting – and for my soapbox, more instructive – back door is cryptographic. It also presents a greater threat. It allows the holder of the back door secret to decrypt certain kinds of traffic that were supposed to be encrypted by the ScreenOS (VPN) server. The attacker only has to listen to network traffic that is encrypted as far as everyone else is concerned, but she will be able to decrypt that traffic with the knowledge of the back door secret. Another difference between the two is that the mundane one is a simple password (disguised as debugging code) that is now known to the world and so any unpatched systems can now be logged into by anyone. That secret is now out. The cryptographic back door, on the other hand, is a very different beast. We know it’s there, and we know how it works, but we do not have the key. The secret needed to make use of the back door is still a secret. I don’t want to suggest that the back door in the ScreenOS authentication system isn’t important. It is very worrisome on its own, given where ScreenOS is deployed. But we all know that it is easy to add a back door to something that is merely protected by authentication (no matter how many authentication factors one has in place). Politically the questions about how that authentication back door got put into place in such an important product and who may have been using it is huge. But it offers nothing new from a technological point of view.

The cryptographic back door

Oh. I want to work for the NSA! Evil mathematicians just like Professor Moriarty! —My daughter, upon learning about the Dual EC backdoor

The cryptographic back door key all depends on how something called Q was created. Q is not a number (nor is it a free man), instead it is a point on a special sort of graph. If Q was created randomly then there is no back door key and all is well and good. But if Q was created by adding P (another point on the graph) to itself some large number of times then knowing how many times P was added to itself becomes your back door secret. You are going to see the equation [latex display=“true”]Q = dP[/latex] many times. Neither Q nor P are secret. But d is secret and it is the back door key. No doubt you are thinking something like, “well if P and Q aren’t secret, can’t we all figure out what d is by dividing Q by P?" Shouldn’t [latex display=“true”]d = Q/P[/latex] Well, d kinda sorta does equal Q/P, but there just isn’t enough computer power around to perform the relevant calculation. I’ll save that bit of fun math to (much) further below.

Necessary historical background

I’m one of those people who when asked a simple question go into more historical background than many people appreciate. But here it really is necessary. Really. ScreenOS (and others) used a cryptographic algorithm that was deliberately designed to have a back door that only the United States National Security Agency (NSA) was supposed to know about. The cryptographic algorithm is Dual Elliptic Curve Deterministic Random Bit Generator (Dual_EC_DRBG). I will refer to that simply as “Dual EC” from here on out. Dual EC was one of four DRBGs specified in NIST SP 800–90A, and leaving the back door aside it is still technically inferior to the other three listed in that document. The standard specified a P and Q but did not explain how they were chosen. In 2007 Dan Shumow and Niels Ferguson of Microsoft showed that if Q was created with something like [latex]Q = dP[/latex], then whoever had hold of d could decrypt traffic encrypted with Dual_EC. In 2014, a group of cryptographers presented a great explanation of how knowledge of d can be exploited, and even presented a proof of concept (with their own Q) of doing this. Back in 2007 nobody (well, nobody who was going to say anything) knew whether or not Q was created so as to have the back door. In September 2013 it became clear from some of the documents revealed by Edward Snowden Dual EC had been designed with the back door. The Wikipedia page on Dual EC has a useful timeline.

How many bad Qs?

For ’tis the sport to have the engineer Hoist with his own petardHamlet Act 3, Scene 4.

So far the story I’ve told would merely suggest that Juniper has some explaining to do about why it used a DRBG in the way that it did and why they chose one that was inferior to alternatives and was suspected of being backdoored. Juniper Networks are (sadly) not the only one in that position.

But I’ve only told part of the story. ScreenOS did not use the NIST Q. It used two custom Qs over time, and where they came from and who knows the back door keys associated with those Qs is something we simply don’t know. The back door key associated with the Q from the NIST standard is clearly held by the NSA. But what about the two used (at different times) in ScreenOS?

Juniper explained that they didn’t use the NIST Q in their Dual EC implementation but used a Q that was created randomly. However, they did not provide any proof that it was created randomly. (There are ways to demonstrably create such things randomly.) So we don’t know if that “official Juniper Q” was created in a way to give its creator a back door key. Nor do we know who might hold that key.

But there is a third Q that was, according to Juniper, inserted by parties unknown in 2012. That is someone simply changed the Q that is used in Dual EC in ScreenOS. We can only assume that this third Q is indeed backdoored, as it was something surreptitiously added to the code. We have no idea of who did this.

So even if we take Juniper at their word about their choice of Dual EC and other related design choices, the fact that the chose to use a system that was designed so that it could have a back door means that someone – we don’t know who – has had a back door key for decrypting VPN traffic between NetScreen devices for three years.

I have said it before and I will say it again, systems that are designed to have back doors (even if the back doors aren’t used) are inherently weaker than systems that are designed to make it difficult to have a back door.

Thar be math

The rest of this article tries to explain why it is easy to calculate Q from d and P but hard to calculate d from P and Q. Unless you have a great deal of patience and would like to learn more about the math, you can stop reading here.

Here is a picture of Patty and Morgan enjoying the pleasures of warm blankets on the sofa. Morgan also has no patience for math.

Patty and Morgan relaxing on couch. Not doing math.

Hard math, simple math

The actual math isn’t all that complicated, but it does rely on a number of unfamiliar notions. Some of these notions are hard to grasp not because they are complicated, but because they are simple. They are highly abstract. Indeed, you might even have to unlearn some things you know about addition.

I said earlier that even though P and Q are public, we can’t figure out what d is in Q = dP, but someone with d and P can easily construct Q. I’m not going to fully explain all of this, but I’m going to make an attempt to try to help people get a better sense of how something like this might be true. Along the way, I’m going to tell a few lies and omit lots of stuff. I need to lie to stand a reasonable chance of this article being useful to people who don’t already know the math.

One thing I should tell you right away is that P and Q are not numbers, while d is a number. I will get around to explaining what P and Q actually are below, but first we need to understand what it means to add two things together.

I am not going to try to explain how knowledge of d provides a back door. That is something that is difficult enough to explain in the normal case and is even harder given how Dual EC is actually used within ScreenOS. (See Ralf-Philipp Weinmann’s “Some Analysis of the Backdoored Backdoor” for some of those details.)

Instead, I’m focusing on a narrow point, allowing me to go into more depth about something that is central to modern cryptography.

Simple addition

Most of us have a pretty good sense of how addition works. We all pretty much know that the “+” in “3 + 5 = 8” means. We may not be able to define it precisely, but it is a basic concept that we understand. Well mathematicians like to define things precisely and more importantly they like to distill things down to their simplest essence. Let’s see what the essential properties of addition are.

What are we adding?

To define addition we need to talk about what things are going to be added to what things. The expression “3 + blue” doesn’t mean anything because “blue” and “3” are not in the same set of things we might add. We might want to talk about adding colors, and so could reasonably say “red + blue = purple”.

Addition can apply to more than just numbers.

Sometimes we add numbers; sometimes we add colors; but we need to have a set of things (numbers perhaps, or maybe colors) that we will be adding.

So let’s call the set of things that we will be adding E for an ensemble of things. When we are adding numbers, “blue” won’t be in E, but when we are adding colors, “blue” may very well be in E. When we are adding colors, 3 won’t be in E but it will be when we are adding whole numbers.

For the addition that most people are used to, there are an infinite number of things in E, but that doesn’t have to be the case. When we add hours to know what time it will be 15 hours after 9 o’clock, we use only 12 (or 24) of course. Remember, we are trying to distill addition down to its essence, so we want it to work both when E has a finite number of things in it as well as when it doesn’t.

Addition offers no escape If we are adding numbers, we don’t want 3 + 5 to equal “blue”. We want 3 + 5 to result in another number. If we are adding colors, we want the result of addition to be some color.

So we will say that

If a and b are both in E then a + b is also in E.

The fancy name for this is “closure”, or “E is closed under addition”.

You can switch things around

We like addition to work out so that “a + b” will yield the same result as “b + a”, so we say that

If a and b are both in E then a + b = b + a

We say that addition is “commutative”.

Any groupings you want

When you mix blue and red and white, it doesn’t matter if you mix the blue and the red first or the red and the white first. Likewise if you add 3 + 2 + 10, it shouldn’t matter whether you add the 3 and the 2 first or the 2 and the 10 first.

If a, b and c are all in E then (a + b) + c = a + (b + c)

And so we say that addition is “associative”.

We need a way to do nothing.

I have no problem grasping the importance of doing nothing, but not everyone seems to agree with me (Hi boss!). Anyway, we need a way to do nothing with addition.

To put that more mathy, we want E to contain something that behaves like zero. If you add zero to anything, you end up with what you started with.

If a is in E and 0 is the zero element then a + 0 = 0 + a = a

For the color adding analogy, we might add the special color “transparent” as our zero element so that blue + transparent = blue.

The fancy way to talk about zero is to call it “the additive identity”.

Adding it all up

Mathematicians are happy to call anything that meets those four conditions “addition”.

Getting to the points

So we can add numbers and we can add colors. But let’s talk about points on a peculiar geometric shape called an elliptic curve. Here is a portion of an elliptic curve, and it has two points on it, P and Q. (These aren’t the same as P and Q used in Dual EC.)

Elliptic curve with points P and Q

There are some interesting properties of these odd sorts of shapes. One of them is that if a (non-vertical) straight line crosses the curve at all, then it will cross it in exactly three places.

For example, if we draw a line that goes through both P and Q then that line will go through the curve at exactly one other point.

The first step in adding points P and Q is drawing the line connecting them and seeing where else that line crosses the curve

From that third point where the line crosses the curve we can find the point on the opposite side of the curve by drawing a vertical line up or down (as needed) from that point of intersection:

Draw a vertical line from where the line between P and Q crosses curve to get point P + Q

Now as strange as it might seem, if we define addition of points on an elliptic curve this way, it will meet all the properties of addition that we want. (I’m leaving out how the zero element is defined, and have constructed my examples so that I can get away with that.)

Repeated addition

Now let is return to the problem we started out with. If Q = dP how come it is easy to calculate Q if we know d and P but not d from Q and P? P and Q are points on the curve, while d is just an ordinary (though very large) number.

Well, we’ve defined what addition means for points, but what does it mean to say, for example, “4P”? “4P” is just shorthand for writing “P + P + P + P”. (Note that we can treat “4P” kind of like multiplication, but only because 4 is an ordinary number and we can just say that it is repeated addition of P. We haven’t defined any way to multiply two points together.)

Let’s work through a small example and try to find where 4P is. The first thing we need to do is add P to itself to get 2P.

We’ve already stepped through how to add two different points, but how do we add P to itself? We take as our line going through P and P to be the tangent line at point P. This is the line that touches point P but does not cross the curve at P:

The tangent line at P crosses the curve in exactly one other place.

That line crosses the curve at some other point. And just as we did when we added P and Q earlier, we draw a vertical line through that point of intersection to get our sum:

When adding a point to itself, we take the tangent at the point to construct the P,P line

Now we have point P, which we started with, and we also have point P + P, or 2P. So if we want to find 3P we can just add P and 2P together.

Add 2P to P to find point 3P on the curve.

This gives us point 3P and we can add 3P and P together to get to our goal of 4P:

To get point 4P, we add 3P and P.

A shortcut

It took us three additions to calculate 4P from P. But we could have saved ourselves one of those. We never really needed to calculate 3P. Instead once we had the point 2P we could have just added 2P to itself to get 4P. Because point addition is associative, we can think of 4P as (P + P) + (P + P), which is 2P + 2P.

Adding 2P to itself is another way to get 4P

With this shortcut, we have been able to calculate 4P from P with only two additions: The first addition was adding P + P to get 2P; the second addition was 2P + 2P to get 4P.

The shortcut offers huge savings

When calculating 4P our shortcut saved us only one addition, but the savings get bigger the bigger d is. Suppose we wanted to calculate 25P. The slow way would require 24 additions, but using this shortcut, it would only take us six additions.

2P = P + P
4P = 2P + 2P
8P = 4P + 4P
16P = 8P + 8P
24P = 16P + 8P
25P = 24P + P

So having to only do six of these additions versus 24 by the slow way is a big savings. When the numbers get really big, the savings get much bigger. Suppose we wanted to calculate the result of adding P to itself one billion times. 1000000000P would take almost a billion additions the slow way, but using the short cut we can do it with only about 45 additions.

The number of point additions it takes to calculate dP using this shortcut is roughly proportional to the number of bits in d.

No shortcut to d

So we’ve got a really efficient way to compute Q from d and P, but there is no efficient shortcut for finding d from P and Q. Indeed, this asymmetry between calculating d versus finding d is one of the things that makes elliptic curve cryptography work.

When numbers get big

When we start talking about the huge numbers that cryptographers actually use for these sorts to secrets we get truly massive savings. Those savings are so big that using the shortcut means that we can calculate the result in a fraction of a second, while doing it the slow way would take all of the computers on earth many times the age of the universe to compute.

The number of point additions needed to calculate Q=dP is roughly proportional to the number of bits in d, while the number of additions needed to find d from P and Q using the slow way is roughly d. As d gets big (say a 256-bit number), the difference between d and the number of bits in d is enormous. For a 128 bit number, calculating Q takes less than 256 point additions, but calculating d takes a number so large that I don’t know how to write it out. It would have 77 digits.

And if I may introduce more terminology, that asymmetry is an instance of what is called the “Generalized Discrete Logarithm Problem” (GDLP).

What is the GDLP good for?

To give you a hint of how this sort of stuff can be used for keeping secrets, suppose that two of my dogs, Molly and Patty, wish to set up secret communication without Mr. Talk (the neighbor’s cat) listening in. The dogs will need to establish between them a secret key that only they know, but how can they do this if Mr Talk is listening in to their communication?

First Patty and Molly will agree on an elliptic curve and some other non-secret parameters. Among those parameters there will be a starting point, that I will call G. Let’s let Patty pick a secret number that only she knows and we will call that little p. Patty will calculate the point pG. Let’s call the point that Patty calculates big P.

Patty can send P to Molly, but neither Molly nor Mr Talk (who is eavesdropping) will be able to calculate little p, even though they both know G and P. Big P is Patty’s public key, while small p is Patty’s private key.

Molly can do the same. She calculates M = mG, where little m is Molly’s secret, and sends big M to Patty.

Now here is where it gets fun. Patty can calculate point K_p by multiplying M by little p. That is

K_p = pM Molly can also calculate K_m by multiplying Patty’s public key, P, by Molly’s own secret, m. K_m = mP.

The Ks that both calculate will be the same. This is because point addition is commutative and associative. Remember that M = mG. So

K_p = pM = pmG.

We can look at what Molly calculates from Patty’s public key the same way: P = pG. When we look at what both Molly and Patty calculate with their own secrets and the other’s public point, we get

K_m = mP = mpG = pmG = pM = K_p Let’s look at each end of that long chain of “=”s, but drop out all of the middle parts.

K_m = K_p Both Molly and Patty calculate the same secret K even though neither of them have revealed any secret in their communication. And Mr Talk, who has listened to everything Molly and Patty sent to each other, cannot calculate K because there is no feasible way for him to learn either Patty’s secret p or Molly’s secret m.

This is magic with mathematics.

Lies and omissions

There are a lot of things that I left out of the explanation I gave here. And in a few cases I even told a couple of lies to make the explanation simpler. I’m only going to mention some of them here.

The shortcut that I described for computing Q from p and D is not the most efficient shortcut, but it is an efficient shortcut that is easiest to explain.
There are shortcuts for calculating d, but there are thought to be no efficient non-quantum ones.
I have not explained what I mean by “efficient”. (It is a technical term.)
I haven’t told you the formulas for doing addition of elliptic curves, I only did it with pictures.

It looks like someone at my daughter’s high school did a science fair project on finding good algorithms for point doubling. Thus forcing me to reveal just how much of a pushy and intrusive parent I am.

To make addition on elliptic curves work as an instance of the GDLP, we need to make it have only a finite amount of elements. That involves, among, other thing, making all of the normal arithmetic in the formulas done modulo some large prime number.
I didn’t talk about the additive identity (zero element) for addition on elliptic curves.
I never said what “=” means when I defined addition.
I kinda sorta misused the terms “public key” and “private key” when applying those terms to ECDHE.
I didn’t say anything at all about the existence of additive inverses.
The addition of colors analogy stops working if we were to consider additive inverses.
I implied that all and only the properties I listed for addition are what is needed for something to be called “addition”. The properties that I listed plus the existence of additive inverses are what is needed for the specific type of addition we need for this cryptography. Addition of numbers requires something more because numbers – unlike points on an elliptic curve or colors – line up in an order.

The after math

I’ve been able to use an example of a rare cryptographic back door to give me the chance to talk about the math of elliptic curves cryptography. But if you are asking what this has to do with AgileBits and our products (after all, we don’t use elliptic curves in our products), then we must return to the original point: If you build a system into which it is easy to add a back door, you shouldn’t be all too surprised when a back door is added. So let’s not build things that way.

I will close with another quote from that Economist article:

Calls for the mandatory inclusion of back doors should therefore be resisted. Their potential use by criminals weakens overall internet security, on which billions of people rely for banking and payments. Their existence also undermines confidence in technology companies and makes it hard for Western governments to criticise authoritarian regimes for interfering with the internet. And their imposition would be futile in any case: high-powered encryption software, with no back doors, is available free online to anyone who wants it.

Jeffrey Goldberg

Principal Security Architect