|Perl: the Markov chain saw|
Identity Based Encryption using Pairings and Crypt::PBCby jettero (Monsignor)
|on Dec 12, 2006 at 14:11 UTC||Need Help??|
[Note: My original intention was to publish this as a tutorial. However, I had mixed feelings about publishing it as such, since it had not changed much in RFC. I finally decided to just leave it in meditations.]
This tutorial covers (very basically) the concept of Identity Based Encryption (IBE) using Pairings Based Cryptography (PBC). It gives examples of implementing IBE schemes in perl and discusses some of the features and drawbacks of IBE schemes. I am writing this to try to spread the word about how much fun this stuff can be.
IBE allows us to use the identity of a person, computer, NIC, email account, or some other identifier as a public key for encryption and authentication. It is frequently said to be "self-authenticating" since every party in an IBE network can (indeed, must) generate the public key for any other party using their knowledge of the target identity .
[This entire post runs as one giant program using the download link.]
Shamir is usually credited as the first person to cook up this whole idea, but the important paper, cited everywhere is Boneh & Franklin: . These technologies have been maturing at a nice pace, but Elliptic Curve Cryptography (ECC), PBC, and IBE are all very young fields and there are a lot of unsolved problems (like what to do with it). An interesting side effect of using ECC/PBC to implement IBE is that you get the same crypto-protection from 160bit keys in ECC that you would from 1024bits of in RSA .
Despite how cool IBE sounds at first, it has some major drawbacks. Consider the role of the Certificate Authority in SSL/TLS. You need to get signed before people believe you. That's not so bad. In IBE, it's slightly worse. You have to have a TA (Trusted Authority) that generates all the private keys in the system. And that does mean the TA can decrypt everything in the IBE system! The TA is sometimes called a TTP (Trusted Third Party) or a KGC (Key Generating Center). I tend to despise the term KGC for crypto because of my KGC, but that's a personal problem...
Actually, the TA picks a master-secret s (a value in a cyclic group) that it uses for key generation. It also picks a bi-linear pairing (usually called ê), a generator P (an element in a group that can generate the whole group), and various other things . For example, since your identity never expires, it's typical to see researchers put other things in there with the identity, like version numbers and expiration dates. As described by Hoeper and Gong, that'd be Qi = H1(IDi, expiration, version) . H1() is a function that maps real life values into the pairing, so Qi just describes a point (or points) on a curve.
Many other documents cover DH in much more detail than I intend to here. Please visit the Wikipedia page on the subject of Diffie-Hellman for the full skinny. Basically, the idea is that the two sides negotiate to form a secret that they never transmit over the untrusted network.
How does DH relate to IBE? It doesn't, but I'm in no position to explain the inner workings of ECC or PBC, so instead I'm going to compare it to DH. ECC and PBC math is based on Group Theory and other abstract algebra. It is a bit over my head. DH serves as an example though, because in the PBC model of IBE, PBC serves the role of a harder version of the computational Diffie-Hellman problem .
Since this site is all about code, I won't bother explaining the protocol in English. Here's a by-hand implementation to read. It does not differ greatly from Crypt::DH except that the following isn't directly usable. Also, I'm very serious about this, don't use rand() for crypto.
The reason I included the above is to convey just how simple these crypto systems really are. They sound massively complex — and the math certainly can be for PBC — but mostly they just depend on math problems that are really hard to do backwards (the discrete log problem in this case).
The reason you can calculate the secret so fast in DH is because they're in a cyclic group (which also makes them hard to solve). Rather than calculating $a ** $p and $b ** $p and then taking the modulo, you take the modulo at each iteration of the pow() loop. bmodpow() is a special super-fast GMP call for that very purpose.
The PBC Library @ StanfordSadly, GMP can't handle the esoteric matrix operations required to do parings calculations on a curve. Fortunately for us, Ben Lynn started up this project: http://crypto.stanford.edu/pbc/. It is a generic PBC library, so it is capable of a lot more than just IBE, which is only one of many areas where PBC can be used. I have ported libpbc to perl in the form of Crypt::PBC, which is part of the reason I'm writing this document. Since PBC and IBE are very young, it isn't always clear how to approach the problem. There are various ways to do things and no general agreement on The One True Way.
Let's get started. First, we'll select a pairing. There are a staggering number of choices — in fact, they may be infinite for all I know. They're divided into broad categories (by Lynn at least and possibly others): a, a1, d, e and f. They represent different types of curves and are documented reasonably well in the libpbc manual.
Since, I think this stuff is too young to actually be used by anyone that can't double check the math on their own (i.e., not me), I tend to just use one of the example pairing parameter files that comes with the libpbc distribution. I'd guess Lynn usually chooses d-type pairings, since he provides a big .zip file full of choices on the libpbc website, but I tend to use a-type pairings since they're symmetric (meaning G1 and G2 elements can be interchanged because they are "algebraically similar").
Hrm, I should back up a little. In our DH example above, we had one cyclic group generated by $g and $p. In PBC we'll have elements in G1, G2, Gt and Zr — all are cyclic groups having something to do with the bi-linear pairing we selected in the parameters file. I'm happy to say we do not need to care overmuch what they represent, but if you are curious there are many works on the subject — choose google and have fun.
Anyway, let's select a curve. I have included an a-type pairing, a d-type pairing and an e-type pairing in the Crypt::PBC module distribution, but it doesn't install them anywhere; so, you'll have to snag them from your .cpan/build/Crypt-PBC dir (they're the *.txt files). While we're at it, let's also build our Trusted Authority values. Again, the TA is the master-secret holder and the entity that builds all the secrets in the system.
Hey, that wasn't so hard! Note that in  the public key is listed as sP. In group theory what looks like multiplication really refers to some operation, not necessarily multiplication. In these IBE examples we'll mostly use pow_zn() and e_hat() to mix the elements. From now on, I will be referring to pow_zn($P, $s) as $s$P.
Next, let's build a public key for an identity. Note that any node in the network will have $pairing, $P, $P_pub, and params_d.txt. The only thing that's secret is $s. Suppose my identity is jettero, which I'll encode as node=16186 and then map to an element in G1:
According to Lynn, it's pretty meaningless to try to pass more than around 160 bits to set_to_hash(). Happily, sha1() returns precisely 160 bits. Note that all nodes have their own $pairing and all nodes generate $Q_i on their own. The only keys and secrets transmitted (or negotiated) are the private keys. The node identified by $ID_i has to ask the TA for their private key $s$Q_i which is generated like this:
There exist various strategies for transmitting the private key over open networks. I might have chosen SSL/TLS or Crypt::DH to exchange the secret had Hoper and Gong not referred to a "blinding technique"  they cite in . It can instead be elegantly achieved (as in ) fairly easily, but we'll come back to the secure distribution of $d_i in a bit. We have a few more things to go over first.
What do you do with $d_i and $Q_i to encrypt and decrypt things? According to , there are a few properties we can use. First, we construct $g_i = ê( $Q_i, $P_pub ) in G2, a random number $r in Zr, and calculate $r$P. We then use $r$g_i as a shared secret that $ID_i can recover with $r$g_i = ê( $d_i, $r$P ).
There are a few interesting things going on here. First, notice that the encryption is symmetric — both sides have the secret. The sender could even be coerced into decrypting the message unless she carefully throws away the secret $W1 and the random nonce $r after all encrypt/decrypt operations are complete. Second, this communication is repudiable (i.e., the sender can claim they never sent it). And third, the receiver has no way to know who this message came from, meaning it is un-authenticated.
Most applications I'm familiar with will have fewer uses for un-authenticated (but secure) channels than for authentic ones. It is a good thing then, that every node in the IBE system can compute a non-negotiated self-authenticating shared secret that they can use as a key for a symmetric cipher or as a secret in a MAC (message authenticating code). That secret is $K_ij = ê( $d_i, $Q_j ) = ê( $Q_i, $d_j ) -.
Before we can get started though, we'll have to select a new pairing. It's required that e_hat() be called in this fashion: GT = e_hat(G1, G2). In the d-type pairing we selected above (an asymmetric one), G1 is represented as a 1x2 ([GMP, GMP]); G2 by a 2x3 ([[GMP, GMP, GMP], [GMP, GMP, GMP]]). e_hat() will take (G2, G1), but only if the arguments are "algebraically similar." If we switch to an a-type paring, then G1 and G2 are both represented by a 1x2. You can make identical elements for $Q_i in G1 and in G2 and make identical elements for $d_i in G1 and in G2 just so you can swap the argument order in e_hat(), but I find that it's simpler to just make a symmetric pairing. Lastly, you can't reliably mix elements from different pairings. It might work here and there (as a quirk of memory management accidents), but I'm reasonably certain it will cause trouble eventually.
In my experience, $K_10 is always the same as $K_01, but I have never seen a proof of that in a paper — at least that I understood was proof of it. In fact, the proof may be obvious by inspection to people familiar with Abstract Algebra and Group Theory! In any case, I tend to treat $K_10 and $K_01 as if they were different and choose to select the node with the lowest ASCII sort $ID for the left hand $Q. $ID_0 sorts before $ID_1 in the example above, so I would use $K_01.
The features of the $K_ij secret are fascinating. First, no negotiation is required . Tye and Merlyn can each calculate that secret on their own. That could be really helpful for networks where low network load is really important — like wireless nodes with very limited battery supplies . Compare this with Crypt::DH, where the sender and receiver have to exchange several messages to build their shared secret. The only thing Tye and Merlyn can't calculate on their own is their private key (distribution of which is the very next section). Second, the secret and any messages ciphered with it are pre-authenticated. Only the two nodes involved can make that secret — not counting the TA, which can calculate all secrets.
We can use the secrets in one of two ways:
One thing we can't do (if we're $ID_1) is prove to a third party that $ID_0 produced the message. The verifiable message is therefore said to be repudiable, which just means you can take back what you said if someone tries to hold it against you. There are schemes to sign messages (e.g., BLS Signatures), but they're not covered here.
Secure Key Distribution
Since $d_i can be used to decrypt anything intended for $ID_i, we need a scheme to get the key from the TA to $ID_i in a secure fashion. We can use more equations with ê() to solve this problem. First, we need a random nonce $x_i in Zr with which we calculate $X_i = $x_i$P [Note: Xx differ only by height in my browser]. Next, we send $X_i to the TA along with our request for $d_i. We can now rely on the fact that ê($s$X_i, $P_pub) = ê($P_pub, $P_pub)$x to calculate a shared secret with which we can construct an authentic secure channel .
The same paper that talks about secure key distribution () talks about master-secret sharing to protect privacy. In fact, in the paper, the two concepts are completely tied together. It should be reasonable clear how to recombine them, so I'll keep this example as simple as possible.
Their idea is that there's an inherent "key escrow" problem in IBE. They see the same problem I do: that the TA can decrypt everything. So they consider the idea of having privacy partners that keep the master-secret safe. The idea is that as long as at least one of the Privacy Authority (PA) entities is honest, the others cannot maliciously build any of the private keys ($d_i)in the system.
The biggest difference here is that instead of using a single secret $s, we have a list of secrets @s — one for each PA (plus the TA). The public key is called $Y, rather than $P_pub, and it combines all the secrets together: $Y = $s0$s1$s2 ... $sn$P. Of course, none of the PAs will want to hand their $s[$i] over to their neighbor. They instead pass $s[$i]$P[$i-1] to their next-in-line. Next-in-line? Yes. Order matters building $Y and $d_i!
Some Closing Thoughts
Please consider that this document is exceedingly myopic in it's coverage of Pairings Based Cryptography — in that it only covers Identity-Based Encryption and not very thoroughly at that. Some content that I considered including was omitted to help keep the article focused. For example, every scheme above is repudiable, since I did not cover signature schemes at all. If this article is well received, I may write a second article on PBC and try to cover more PBC and IBE concepts.
Although I love IBE and I hope it really takes hold somewhere ... eventually. I hope it never pops up in email systems. It seems like a natural fit at first glace, but compare it to GnuPG/OpenPGP and it seems to be lacking a few things. Having no central authority is certainly a feature I'm not willing to live without. The addition of the Privacy Authorities helps a little, but use your imagination: What if the privacy authorities were The Whitehouse, the FBI, Verisign, and Microsoft? Do you feel secure that they won't try to read your mail?
That said, it does seem like it'd be a good fit at work, where the TA could be your mail administrator and your PAs could be a couple of Industry Vice Admirals or some HR managers or something. It is probably appropriate under most circumstances that your employer should be able to read your email — well, work emails anyway.
My primary goal in writing this is the hope that a genius or a lucky person might read it and come up with the most clever use of all ... whatever that is ...