PRIMALITY TESTING – its importance for cryptography
Lynn Margaret Batten, Deakin University
Talk at RMIT, May 2003

Prime numbers have attracted much attention from mathematicians for many centuries. Questions such as

• how many are there?
• is there a formula for generating them?
• how do you tell if a given number is a prime?

have fascinated people for years. However, the first actual use of prime numbers in an important area outside of the theory of numbers was discovered only in the mid to late 1900s. This was in the establishment of a technical system to be used in maintaining the secrecy of electronic communications.

[Diagram: a conventional cryptosystem. The transmitter encrypts message M as C = E_K1(M) using the random key K1 produced by key source #1, and sends C over the communication channel, where a cryptanalyst may intercept it. The receiver decrypts with M = D_K2(C), using the key K2 determined from K1 and delivered over a key channel. The key channel must be secure.]

The Diffie-Hellman scheme, proposed in 1976, was a radical departure from what had until then been essentially 'private key' schemes. The idea was that everyone would own both a 'private key' and a 'public key'. The public key would be published in a directory, like a telephone book. If A wanted to send B an encrypted message, A simply looked up B's public key, applied it and sent the message. Only B knew B's private key and could use it to decrypt the message.

PROBLEM? Diffie and Hellman had no concrete example of an encryption/decryption pair which could pull this off!

Then along came the Rivest, Shamir, Adleman (RSA) solution in 1977:

Public information:
• n, an integer which is a product of two large primes (p and q, kept secret), and
• e, a positive integer less than (p−1)(q−1) with gcd(e, (p−1)(q−1)) = 1.

Secret information:
• the two primes p and q such that n = pq, and
• d such that ed ≡ 1 (mod (p−1)(q−1)).

To encrypt the message/number m: c ≡ m^e (mod n).
To decrypt c: c^d ≡ m^(ed) ≡ m (mod n).

Example.
Let n = 101 × 107 = 10807 and e = 7. Note 7d ≡ 1 (mod 100 × 106), i.e. 7d ≡ 1 (mod 10600), so d = 4543. To encrypt the message m = 109 we find c = 109^7 (mod 10807) = 4836. To decrypt, find c^d = 4836^4543 ≡ 109 (mod 10807).

The security of this scheme depends on the difficulty of factoring n. In fact, it is easy to show that knowing d is equivalent to factoring n. No way of breaking RSA is known other than finding the secret information. Thus the RSA scheme leads to the following two problems:

1. Find a large pool of big (> 100 digit) primes. (If very few of these are available, Oscar will easily be able to get his hands on the list and simply try them all in order to break the scheme.)
2. Find a quick (polynomial-time) algorithm to factor integers. (There is no known deterministic, polynomial-time algorithm for factoring integers.)

We take a look at problem 1. The primes p and q must be of sufficient size that factorization of their product is beyond computational reach. Moreover, they should be random primes in the sense that they are chosen as a function of a random input which defines a pool of candidates of sufficient cardinality that an exhaustive attack is infeasible. In practice, the resulting primes must also be of a pre-determined bitlength, to meet system specifications.

Since finding large primes is very difficult, and since the known primes are usually available in some library or on some website, one 'solution' to problem 1 has been to investigate numbers that are not primes but simply act like primes. Generally speaking, we say that a composite integer N is a pseudoprime if it satisfies some condition that a prime must always satisfy. One result for primes is the well-known:

FERMAT'S LITTLE THEOREM Let p be a prime, and gcd(a, p) = 1. Then a^(p−1) ≡ 1 (mod p). [Try a = 2 and p = 7.]

The converse of Fermat's theorem is false, as we see by the following example: let N = 2701 = 37 × 73. Then 2^2700 ≡ 1 (mod 2701).
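The worked RSA example and the Fermat counterexample above can be checked directly with Python's three-argument pow; this is a verification sketch (the modular-inverse form pow(e, -1, phi) needs Python 3.8+).

```python
p, q = 101, 107
n = p * q                      # 10807, the public modulus
e = 7                          # public exponent, gcd(e, (p-1)(q-1)) = 1
phi = (p - 1) * (q - 1)        # 10600
d = pow(e, -1, phi)            # private exponent: e*d ≡ 1 (mod phi)
assert d == 4543

m = 109                        # the message
c = pow(m, e, n)               # encryption: c = m^e mod n
assert c == 4836
assert pow(c, d, n) == m       # decryption recovers m

# Fermat's little theorem, and its false converse (2701 = 37*73 is composite):
assert pow(2, 6, 7) == 1       # a = 2, p = 7: 2^(7-1) ≡ 1 (mod 7)
assert pow(2, 2700, 2701) == 1 # yet 2701 is not prime
```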
Now consider the following:

Definition We say that the composite integer N is a base b pseudoprime (written b-psp) if

b^(N−1) ≡ 1 (mod N). (*)

Thus a b-psp acts like a prime with respect to Fermat's theorem, but it is not a prime. If there were only a few such numbers, this would not improve our situation, but as early as 1903 Malo showed that there exists an infinite number of composite N satisfying (*). There exists an infinite number of base b pseudoprimes because:

Theorem If p is an odd prime, p does not divide b(b² − 1), and N = (b^(2p) − 1)/(b² − 1), then N is a b-psp.

The existence of so many pseudoprimes indicates that the question of deciding whether a given number is prime or composite is a difficult one. This leads us back to RSA and its second problem (factoring), which we now approach from a different angle – that of primality testing. It was simply very difficult (if not impossible) to prove that a randomly selected 100-digit number was a prime back in 1978. Furthermore, the primality proving methods that were available did not lend themselves to easy implementation in hardware, a necessary condition for RSA to become widely useable. A result of this situation was the refinement and further development of what are called probabilistic primality tests.

Probabilistic methods

Let S be any set. A Monte Carlo algorithm for S is an algorithm which, given x and a source of random numbers, returns "yes" or "no" with the properties that:

• if x ∉ S, then the answer is always "no";
• if x ∈ S, then the answer is "yes" with probability at least ½.

Solovay-Strassen test

The Solovay-Strassen probabilistic primality test (1977) was the first such test popularized by the advent of public-key cryptography. There is no longer any reason to use this test, because an alternative is available, the Miller-Rabin test, which is both more efficient and always at least as correct.
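Malo's construction in the theorem above can be sketched in a few lines of Python; the function name malo_pseudoprime is ours, chosen for illustration. With b = 2 and p = 5 it produces 341 = 11 × 31, the smallest base-2 pseudoprime.

```python
def malo_pseudoprime(b, p):
    """Malo's construction: for an odd prime p not dividing b*(b^2 - 1),
    N = (b^(2p) - 1) / (b^2 - 1) is a base-b pseudoprime."""
    assert p % 2 == 1 and (b * (b * b - 1)) % p != 0
    return (b ** (2 * p) - 1) // (b * b - 1)

N = malo_pseudoprime(2, 5)     # (2^10 - 1)/3
assert N == 341
assert N == 11 * 31            # composite ...
assert pow(2, N - 1, N) == 1   # ... yet satisfies Fermat's condition to base 2
```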
Miller-Rabin Test

The probabilistic primality test used most in practice today is the Miller-Rabin test (1980), also known as the strong pseudoprime test. The test is based on a more refined version of Fermat's Little Theorem: a^(p−1) ≡ 1 (mod p), i.e. a^(p−1) − 1 ≡ 0 (mod p), for p prime and gcd(a, p) = 1. For p odd, of course p − 1 = 2r is even. Then a^(p−1) − 1 = a^(2r) − 1 = (a^r − 1)(a^r + 1). So a^(p−1) − 1 ≡ 0 (mod p) implies that the prime p divides a^r − 1 or a^r + 1, and consequently a^r ≡ 1 (mod p) or a^r ≡ −1 (mod p). This can be taken even further, by taking all powers of 2 out of p − 1, to obtain the following fact.

Fact 1 Let n be an odd prime, and let n − 1 = 2^s r where r is odd. Let a be any integer such that gcd(a, n) = 1. Then either a^r ≡ 1 (mod n) or a^(2^j r) ≡ −1 (mod n) for some j, 0 ≤ j ≤ s − 1.

Definitions Let n be an odd composite integer and let n − 1 = 2^s r where r is odd. Let a be an integer in the interval [1, n − 1] relatively prime to n.
(i) If a^r ≢ 1 (mod n) and a^(2^j r) ≢ −1 (mod n) for all j, 0 ≤ j ≤ s − 1, then a is called a strong witness (to compositeness) for n.
(ii) Otherwise, n is said to be a strong pseudoprime to the base a, and the integer a is called a strong liar (to primality) for n.

Example (strong pseudoprime) Consider the composite integer n = 91 = 7 × 13. Try a = 9. Since 91 − 1 = 90 = 2 × 45, s = 1 and r = 45. Since 9^r = 9^45 ≡ 1 (mod 91), 91 is a strong pseudoprime to the base 9. The set of all strong liars for 91 is {1, 9, 10, 12, 16, 17, 22, 29, 38, 53, 62, 69, 74, 75, 79, 81, 82, 90}. Notice that the number of strong liars for 91 is less than 90/4.

Fact 1 can be used as the basis for a probabilistic primality test due to the following result.

Fact 2 If n is an odd composite integer, then at most 1/4 of all the numbers a, 1 ≤ a ≤ n − 1, are strong liars for n.

Algorithm Miller-Rabin probabilistic primality test

MILLER-RABIN(n, t)
INPUT: an odd integer n ≥ 3 and security parameter t ≥ 1.
OUTPUT: an answer "prime" or "composite".
1. Write n − 1 = 2^s r such that r is odd.
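The strong-liar definitions and the n = 91 example above can be verified with a short Python sketch; is_strong_liar is an illustrative name, not a standard library function.

```python
from math import gcd

def is_strong_liar(a, n):
    """Check whether a is a strong liar for the odd composite n, i.e.
    whether n is a strong pseudoprime to the base a."""
    s, r = 0, n - 1
    while r % 2 == 0:              # write n - 1 = 2^s * r with r odd
        s, r = s + 1, r // 2
    if pow(a, r, n) == 1:          # a^r ≡ 1 (mod n)
        return True
    # a^(2^j * r) ≡ -1 (mod n) for some j, 0 <= j <= s - 1
    return any(pow(a, 2**j * r, n) == n - 1 for j in range(s))

n = 91  # = 7 * 13, with n - 1 = 90 = 2 * 45, so s = 1 and r = 45
liars = [a for a in range(1, n) if gcd(a, n) == 1 and is_strong_liar(a, n)]
assert liars == [1, 9, 10, 12, 16, 17, 22, 29, 38,
                 53, 62, 69, 74, 75, 79, 81, 82, 90]
assert len(liars) < (n - 1) / 4    # consistent with Fact 2
```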
2. For i from 1 to t do the following:
   2.1 Choose a random integer a, 2 ≤ a ≤ n − 2.
   2.2 Compute y = a^r mod n.
   2.3 If y ≠ 1 and y ≠ n − 1 then do the following:
       j ← 1.
       While j ≤ s − 1 and y ≠ n − 1 do the following:
           Compute y ← y² mod n.
           If y = 1 then return ("composite").
           j ← j + 1.
       If y ≠ n − 1 then return ("composite").
3. Return ("prime").

If n is actually prime, this algorithm will always declare "prime". However, if n is composite, Fact 2 can be used to deduce the following probability of the algorithm erroneously declaring "prime".

FACT 3 (Miller-Rabin error-probability bound) For any odd composite integer n, the probability that MILLER-RABIN(n, t) incorrectly declares n to be "prime" is less than (1/4)^t.

To perform the Miller-Rabin test on n to base a, we need no more than log₂(n) (the number of bits in the binary representation of n) modular multiplications, each using O(log₂² n) bit operations. Hence, the Miller-Rabin test to base a takes O(log₂³ n) bit operations. Since 2 ≤ a ≤ n − 2, we can run this with up to n − 3 different bases, but the more values of a we run, the slower the algorithm.

In 1983, Adleman, Pomerance and Rumely gave the first deterministic algorithm for primality testing that runs in less than exponential time. For n the number being tested, the time needed is (log n)^O(log log log n). In 1986, two independent algorithms were developed by Goldwasser and Kilian and by Atkin which, under certain assumptions, would guarantee primality (but not necessarily compositeness) in polynomial time. Then in August 2002, Agrawal, Kayal and Saxena made public their unconditional, deterministic, polynomial-time algorithm for primality testing. For n the number being tested, this algorithm runs in O((log n)^12) time. The proof that the algorithm works uses relatively basic mathematics, and we shall outline it here.

The AKS algorithm is based on the following identity for prime numbers:

(x − a)^p ≡ x^p − a (mod p) for any a such that gcd(a, p) = 1.
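The MILLER-RABIN(n, t) algorithm above translates almost line for line into Python; this is a sketch for odd n ≥ 5, using the built-in pow for modular exponentiation.

```python
import random

def miller_rabin(n, t):
    """MILLER-RABIN(n, t) as in the text: n an odd integer >= 5,
    t >= 1 a security parameter; error probability below (1/4)^t."""
    # Step 1: write n - 1 = 2^s * r with r odd.
    s, r = 0, n - 1
    while r % 2 == 0:
        s, r = s + 1, r // 2
    # Step 2: t independent rounds.
    for _ in range(t):
        a = random.randint(2, n - 2)       # step 2.1
        y = pow(a, r, n)                   # step 2.2
        if y != 1 and y != n - 1:          # step 2.3
            j = 1
            while j <= s - 1 and y != n - 1:
                y = pow(y, 2, n)
                if y == 1:
                    return "composite"
                j += 1
            if y != n - 1:
                return "composite"
    # Step 3: no round found a strong witness.
    return "prime"

assert miller_rabin(97, 10) == "prime"         # a prime is always declared prime
assert miller_rabin(91, 25) == "composite"     # 91 = 7*13, exposed w.h.p.
assert miller_rabin(10807, 25) == "composite"  # the RSA modulus from the example
```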
We expand the difference between the polynomials. Thus, for 0 < i < p, the coefficient of x^i in ((x − a)^p − (x^p − a)) is (−1)^(p−i) (p choose i) a^(p−i). If p is prime, (p choose i) is divisible by p for all 0 < i < p, so the difference is zero modulo p. If p is not prime, let q be a prime divisor of p and suppose q^k || p (that is, q^k divides p but q^(k+1) does not). Then q^k does not divide (p choose q) and is relatively prime to a^(p−q), so the coefficient of x^q survives. In this case ((x − a)^p − (x^p − a)) is not zero modulo p.

So, given p to test, one could choose a value for a and test (x − a)^p as above. We would need to evaluate about p coefficients, however, in the worst case, which is too slow. The trick used to reduce the run time is a standard one in algebra: we 'mod out' by a polynomial x^r − 1, still working modulo p, to obtain

(x − a)^p ≡ x^p − a (mod x^r − 1, p). ( * )

How is x^r − 1 chosen? Will this work? In fact, all primes p satisfy ( * ) for any choice of r ≥ 1 and of a. Unfortunately, some composites may also satisfy ( * ) for some choices of the pair (a, r). Congruence ( * ) takes O(r log² p) time to check if Fast Fourier multiplication (Knuth, 1998) is used. The authors show that a suitable choice of r is a prime of order O(log^6 p) such that r − 1 contains a prime factor of a certain size. They then verify the congruence for a small (O(√r log p)) number of a's.

The algorithm
_______________________________________________
Input: integer n > 1
1.  if (n is of the form a^b, b > 1) output COMPOSITE;
2.  r ← 2;
3.  while (r < n) {
4.      if (gcd(n, r) ≠ 1) output COMPOSITE;
5.      if (r is prime)
6.          let q be the largest prime factor of r − 1;
7.          if (q ≥ 4√r log n) and (n^((r−1)/q) ≢ 1 (mod r))
8.              break;
9.      r ← r + 1;
10. }
11. for a = 1 to 2√r log n
12.     if ((x − a)^n ≢ (x^n − a) (mod x^r − 1, n)) output COMPOSITE;
13. output PRIME;
_______________________________________________

The first loop in the algorithm tries to find a prime r such that r − 1 has a large prime factor q ≥ 4√r log n. The authors show that an r as described in line 7 of the algorithm must exist, and they are even able to establish bounds on it.
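The congruence check in line 12 can be sketched in Python by square-and-multiply on coefficient vectors reduced modulo x^r − 1 and n. The function names are ours, and this uses schoolbook polynomial multiplication rather than the FFT multiplication the running-time analysis assumes.

```python
def polymul_mod(f, g, r, n):
    """Multiply polynomials f, g (coefficient lists, index = degree)
    modulo x^r - 1 and modulo n."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                h[(i + j) % r] = (h[(i + j) % r] + fi * gj) % n
    return h

def aks_congruence_holds(a, n, r):
    """Check (x - a)^n ≡ x^n - a (mod x^r - 1, n) by square-and-multiply."""
    base = [(-a) % n, 1] + [0] * (r - 2)   # x - a as a coefficient list
    result = [1] + [0] * (r - 1)           # the constant polynomial 1
    e = n
    while e:                               # binary exponentiation
        if e & 1:
            result = polymul_mod(result, base, r, n)
        base = polymul_mod(base, base, r, n)
        e >>= 1
    rhs = [0] * r                          # build x^n - a mod (x^r - 1, n)
    rhs[0] = (-a) % n
    rhs[n % r] = (rhs[n % r] + 1) % n      # x^n reduces to x^(n mod r)
    return result == rhs

assert aks_congruence_holds(2, 7, 5)       # 7 is prime: ( * ) holds for any r, a
assert not aks_congruence_holds(2, 9, 5)   # 9 is composite: this (a, r) exposes it
```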
They then use these bounds to establish that if n is prime, the algorithm returns PRIME. In order to show that if n is composite the algorithm returns COMPOSITE, the following set is constructed:

I_g = { m | g(x)^m ≡ g(x^m) (mod x^r − 1) },

where g(x) is a polynomial of the type on line 12 of the algorithm. There are 2√r log n such polynomials. Thus, if the algorithm falsely declares PRIME, every one of the 2√r log n incongruences in line 12 must be false. It follows that n ∈ I_g, and the authors show that this leads to a contradiction.

Time complexity
_______________________________________________
Input: integer n > 1
1.  if (n is of the form a^b, b > 1) output COMPOSITE;            O(log³ n)
2.  r ← 2;
3.  while (r < n) {
4.      if (gcd(n, r) ≠ 1) output COMPOSITE;                      poly(log log r)
5.      if (r is prime)                                           poly(log log n)
6.          let q be the largest prime factor of r − 1;
7.      if (q ≥ 4√r log n) and (n^((r−1)/q) ≢ 1 (mod r))          poly(log log n)
8.          break;
9.      r ← r + 1;
10. }                                                             O(log⁶ n) iterations
                                Total: O(log⁶ n · √r · poly(log n)) = O(log⁹ n · poly(log n))
11. for a = 1 to 2√r log n                                        O(√r log n) = O(log⁴ n) iterations
12.     if ((x − a)^n ≢ (x^n − a) (mod x^r − 1, n))
            output COMPOSITE;                                     O(r log² n) per check
13. output PRIME;
                                Total: O(r^(3/2) log³ n) = O(log¹² n), up to poly(log log n) factors
_______________________________________________

Implications for future work: there is a good chance that people are already looking at implementing the new idea of using modulus by a polynomial to find a polynomial algorithm for factoring.

REFERENCES
M. Agrawal, N. Kayal, N. Saxena, 'PRIMES is in P', 2002.
R. Crandall and C. Pomerance, 'Prime Numbers: A Computational Perspective', Springer, 2001.
D. Knuth, 'The Art of Computer Programming', Vol. 2, Addison-Wesley, 1998.
H. Williams, 'Édouard Lucas and Primality Testing', CMS Monographs, Wiley, 1998.