Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Experimenting with Shared Generation of RSA Keys Michael Malkin Thomas Wu Dan Boneh Stanford University *Supported by DARPA Why Share Keys? CA CA d d1 d2 d3 The private key is never reconstructed! Who generates the shared key? 1/4/99 2 Trusted Dealers CA d1 d2 Trusted Dealer d3 Drawbacks: • Single point of failure • May have to destroy dealer afterwards 1/4/99 3 Distributed Generation Advantages: • Nobody ever knows the entire key • No single point of failure Step 1 Step 2 Step 3 d1 d2 d3 1/4/99 N e 4 RSA Keys N An n-bit modulus, N = pq e The encryption (public) key d The decryption (private) key Sharing of d : d = d1 + d2 + d3 Can apply key without reconstructing d d is the secret p or q d 1/4/99 5 Distributed Generation* 1 p1 q1 p2 q2 3 p3 q3 Biprimality Test pi, qi are n/2 bit integers 2 p1 q1 p2 q2 p3 q3 N 4 p1,q1 d1 p2,q2 d2 p3,q3 d3 N = (p1 + p2 + p3)·(q1 + q2 + q3) = pq Nobody ever knows p or q! 1/4/99 (*Boneh-Franklin) 6 How Do They Compare? Non-Distributed: • Pick prime p • Pick prime q • Multiply Distributed: • Pick N • Hope N = pq is an RSA modulus • Can’t test p and q separately Distributed generation takes more iterations 1/4/99 7 Main Results Initial time: 2.5 hours (1024-bit key) • Distributed Sieving × 10 • Multithreading ×6 • Load Balancing × 1.3 • Parallel Trial Division × 1.3 Final time: 1.5 minutes 1/4/99 8 Minding Your p’s and q’s • Bad N probably divisible by 3 or 5 or 7 or … • Idea: Ensure that N isn’t divisible by any small primes Distributed Sieving • Can pick pi, qi so that p, q are not divisible by small primes … But nobody actually knows p or q! 1/4/99 9 Using Idle Time • Synchronous algorithm synchronization delays • Under-utilizing CPU — idle 80% of time Multithreading • 6 threads optimal for 1024-bit key • Almost 6 times faster! (On 300Mhz Pentium II’s running Solaris 2.6) 1/4/99 10 Costly Biprimality Test • Biprimality test involves time-consuming calculation • Idea: Only one server needs to do this Load Balancing • A different server does test for each iteration • Probabilistic load balancing 1/4/99 11 More Small Primes • What about small primes not covered by sieving? • Trial division on N by small primes Parallel Trial Division • Each server does trial division on different small primes 1/4/99 12 Private Key Generation • Implemented method for small e • In RSA usually use a small e • After N is found, generate d1, d2, and d3 so: d1 + d2 + d3 = d … But do this so that nobody ever knows d • There is an additional way to share d • Only a fraction of servers need to be active 1/4/99 13 Implementation: Config File Num_Servers: Key_Length: Threads: 3 Normal 2 TrialDiv_End: Sieve: Test_Mode: Sequence_Numbers: Transport: 10000 True True True sslv3 Share_IP_Port_0: Server_IP_Addr_0: Server_Sequence_File_0: Server_Cert_0: Server_Key_0: 1/4/99 8080 ittc.stanford.edu com_security/seq0 com_security/cert_s0.pem com_security/key_s0.pem 14 Implementation: COM • Abstraction layer • Fault tolerance - non-blocking I/O • Private, authenticated channels • Based on SSLeay • Authenticates share servers using a server certificate: /C=US/ST=California/O=Stanford University/ OU=ITTC Project/CN=[SERVER 0] 1/4/99 15 Shared Key Storage • Stored as PEM-encoded ASN.1 format Data Type Integer Integer Integer Integer Integer M Integer 1/4/99 Field Version N e k d1 M dk 16 Performance Primality Network Key Size Threads Tests Iterations Total Time Traffic 512 bit 2 36 119 0.15 min 0.18 Mb 1024 bit 6 49 130 1.5 min 1.16 Mb 2048 bit 6 234 495 18 min 7.48 Mb On three 300Mhz Pentium II’s running Solaris 2.6 • Network bandwidth is reasonable • 1024-bit works well • 2048-bit is reasonable 1/4/99 17 Effect of Number of Servers Minutes Time to generate a 1024-bit RSA key 6 5 4 3 2 1 0 3 4 5 WAN 1 CPU Number of Servers WAN: • Two servers at Stanford • One server at University of Wisconsin at Madison • Difficult to find PC’s running Solaris 1/4/99 18 1200 7 1000 6 800 5 Minutes Iterations per Thread Effect of Threads 600 400 4 3 2 200 1 0 0 0 2 4 6 8 0 2 Threads 4 6 8 Threads • Synchronization/CPU tradeoff • Minimize time with 6 threads *Generating a 1024-bit RSA key 1/4/99 19 12000 1200 10000 1000 Seconds Iterations Effect of Distributed Sieving 8000 6000 4000 2000 800 600 400 200 0 0 50 150 0 0 Sieve Bound 50 150 Sieve Bound • Sieve bound is largest prime sieved • Larger sieve fewer iterations • Diminishing returns *Generating a 512-bit RSA key 1/4/99 20 Conclusions key generation is practical: Distributed • 1.5 minutes for 1024-bit key Several practical improvements to algorithm • Distributed Sieving • Multithreading • Load Balancing • Parallel Trial Division Optimized cryptographic algorithm • Requires security proofs http://www.stanford.edu/~dabo/ITTC 1/4/99 21