Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
A Prime Example Lecture 20 CS 15-251 A positive integer p 2 is prime if the only positive integers that divide p are 1 and p itself. Positive integers n 2 which are not prime are called composite. The prime factorization of a positive integer n is an expression of n as a product of primes n = p1 p2 p3 …pk Theorem: (Fundamental Theorem of Arithmetic) Every positive integer n 2 has a unique prime factorization (up to order of the prime factors). Proof. Existence: Suppose m is the smallest positive integer 2 with no prime factorization. m is prime m itself is a prime factorization m is composite m = m1m2 for 1 < m1 < m and 1 < m2 < m m1 and m2 have prime factorizations so m1m2 yields a prime factorization for m Uniqueness: Suppose m is the smallest positive integer 2 with two different prime factorizations. m = p1p2…pi m = p1’p2’…pk’ p1 | p1’p2’…pk’ p1 | pj’ for some 1 j k p1 = pj’ (since both are prime) Remove p1 from p1p2…pi and pj’ from p1’p2’…pk’ to get m’ with two different prime factorizations: m’ = p2…pi = p1’p2’…pj-1’pj+1’…pk’ But m’ < m, so m’ must have a unique prime factorization. Three Classic Problems 1. Density How many primes are in {1…n}? 2. Generation List all the primes in {1…n}. 3. Testing Given a positive integer n, is n prime? These problems go back to the ancient Greeks! …And a Modern Problem 4. Random choice Pick a random prime number in {1...n}. This problem arises in cryptographic algorithms (such as RSA) that need large prime numbers to make cryptographic keys. We’ll learn more about these algorithms in a future lecture. 1. Density of Primes Let p(n) the number of primes in {1…n} p(10) = 4 p(20) = 8 (2,3,5,7) (2,3,5,7,11,13,17,19) Theorem (Euclid): The set of prime numbers is infinite. Proof: Suppose the primes are the finite set {p1, p2, …, pk } Let m = p1 p2 … pk + 1 m is not divisible by any prime pi, so m must be a multiple of a prime that is not in the “set of all primes”. Thanks to Euclid, we know p(n) as n . But how thickly distributed are the primes? That is, if we pick a random number in {1…n}, what’s the probability of getting a prime? We’ll see a better characterization of p(n) shortly. 2. Generating Primes Give an algorithm that lists all the primes in {1...n}. Sieve of Eratosthenes set prime [2..n] = 1 for p = 2 to n do if prime [p] = 1 then print “p is prime” for m = 2 to n/p do prime [mp] = 0 For each prime p, the Sieve eliminates all multiples of p. No prime will ever be eliminated, and every composite (which must have a prime factor smaller than itself) is guaranteed to be eliminated before the outer loop reaches it. Running time: O(p(n)n (logn)2 + n) if multiplication is O((logn)2) 3. Testing Primality Give an algorithm for deciding whether a number n is prime. A) Trial division for k = 2 to n do if k | n then return “n is not prime” otherwise return “n is prime” O(n (logn)2) time if division is O((logn)2) B) Sieve method Run the Sieve of Eratosthenes on {1...n} O(p(n)n (logn)2 + n) time C) Trial division up to n for k = 2 to n do if k |n then return “n is not prime” otherwise return “n is prime” O(n (logn)2) time if division is O((logn)2) Claim: if n is composite, then n has a prime factor p n. Proof: By contradiction. Suppose some composite n has a prime factorization n = p1 p2 … pk where all pi > n. But then n = p1 p2 … pk > (n )k which is a contradiction unless k < 2, that is, unless n is prime. The trial division algorithm can be easily adapted to find at least one factor of n: for k = 2 to n do if k |n then return “k is a factor of n” otherwise return “n is prime” This algorithm runs in O(n (logn)2) time, which is O(n) (in fact, sublinear). So why do we think factoring is so hard? Why are banks and governments willing to trust their secrets to a number that can be factored in O(n) time? Trial division is exponential as a function of the length of its input! The input number n can be represented by k=O(logn) bits. As a function of k, the trial division algorithm runs in O(n) = 2O(k) time 4. Random Primes Give an algorithm to pick a random prime number in {1..n}. Brute force algorithm Generate all the primes in {1…n} Pick one at random Randomized algorithm Pick a random number m {1…n} Test if m is prime If m is prime, return it Otherwise try again What is the expected number of tries before finding a prime? Let p = probability of picking a prime on one try p = p(n)/n Expected number of tries to get a prime is 1/p = n/p(n) The randomized algorithm motivates us to look for better solutions to the classic problems in prime numbers. Density of primes We need some assurance that primes are numerous enough that the expected number of tries n/p(n) is about O(logn). Testing primality The O(n) trial division algorithm is intractably slow. We need something more like O(logn). Solving these two problems will consume the rest of the lecture. Density of Primes (revisited) Prime Density Theorem p ( n) lim 1 n n / ln n In other words, as n , p(n) n / lnn. This deep and famous result was conjectured by Euler around 1750, but not proved until 150 years later, by Hadamard. The proof is hard. Corollary: The density of primes p(n)/n 1/lnn as n . Example: How many primes are in {1 … 1010}? p(1010) 1010 / ln 1010 434,000,000 Example: What is the probability that a random 100-digit number is prime? p(10100)/ 10100 1 / ln 10100 1 in 230 For our purposes, we don’t need to prove the Prime Density Theorem. All we need is the weaker statement: p(n) = W (n /logn) If this statement is true, we can put a lower bound on the probability that a randomly chosen number in {1..n} is prime: p(n)/n = W (1/logn) So picking randomly from {1…n}, we need only expected O(logn) tries to find a prime number. Theorem: p(n) = W (n /logn) Proof: For a prime p and a positive integer n, define the multiplicity of p in n as the number of times p occurs in the prime factorization of n. 500 = 5×5×5×2×2 multiplicity of 5 in 500 = 3 multiplicity of 2 in 500 = 2 multiplicity of 11 in 500 = 0 Claim: for any prime p and positive integer n, the multiplicity of p in n ! is n pi i 1 Look at n ! = 1×2×3××××n This product contains n/p multiples of p n/p2 multiples of p2 n/pi multiples of pi So the formula above counts the multiples of p, p2,… such that a multiple of pi (but not pi+1) is counted exactly i times. Since only multiples of p can contribute p factors, the formula counts the number of times p appears in the prime factorization of n ! Let r (p) be the natural number such that pr (p) 2n < pr (p)+1 Claim: for any prime p and positive integer n, the 2n multiplicity of p in n is at most r (p). 2n (2n)! 2n(2n 1)( 2n 2) (n 1) n(n 1)( n 2) 1 n n!n! Each factor p in the denominator cancels out a factor p 2n in the numerator. So the multiplicity of p in n is 2n n 2n n pi 2 pi pi 2 pi i 1 i 1 i 1 Lemma: a - a < 1 2a - 2 a < 2 2a - 2 a < 2 2a - 2 a 1 2n n i 2 i since p r (p)+1 >2n i 1 p p by lemma at left r ( p) r ( p) 2n n The prime factorization of is the product of its prime factors raised to the power of their multiplicities. Since the multiplicity of any prime 2n factor p in n is at most r (p), we get: 2n r ( p) p n prime factors p of 2 n n p r ( p) prime p 2 n 2n prime p 2 n ( 2n) p ( 2 n ) since the prime factors of 2n are at most 2n n since p r (p) 2n by definition 2n ( 2n) p ( 2 n ) n Taking logs of both sides and rearranging terms gives 2n log n p ( 2n) log 2n 2n 2n(2n 1)(2n 2) (n 1) Since 2n , n(n 1)(n 2)1 n n p ( 2n) log 2n So we have the desired result: n p ( n) 2 log n A quick detour... The prime density theorem states that n/lnn is a good estimate for p(n). But how large is the error? The Riemann Hypothesis lim n p (n) n / ln n n 0 Riemann proposed this conjecture in 1859. It is one of the most famous open problems in mathematics. Fermat’s Little Theorem For all primes p and integers a such that 1 a p -1, a p -1 = 1 (mod p) For example: a = 5, p = 7 56 = 15,625 = 2,232×7 + 1 = 1 (mod 7) What is 238 (mod 7)? 238 = (26)6×22 = 16×22 = 4 (mod 7) Recall From Group Theory A group (G, ×) is a set G and a binary operation × that has closure, associativity, inverses, and identity. The set Zn* = { x | 1 x < n and gcd(x,n) = 1} is a group with respect to multiplication mod n. In particular, for prime p, Zp* = {1 ,…, p -1} is a group with multiplication mod p. Lagrange: If G is a finite group and S is a subgroup of G, then the size of S divides the size of G. Theorem. (Fermat) For all primes p and integers a such that 1 a p -1, a p -1 = 1 (mod p) Proof: Given a Zp* Let aZp* = { ax | x Zp*} So aZp* = { a, 2a, 3a, …, (p-1)a } aZp* = Zp* because ax aZp* ax Zp* and x Zp* x = a (a -1 x) aZp* Multiply all the elements in each set (mod p): a×2a×3a× × × (p -1)a = 1×2×3× × × (p -1) (mod p) a p -1 (1×2×3× × × (p -1)) = 1×2×3× × × (p -1) (mod p) p -1 Testing Primality (revisited) A simple randomized primality test is based on Fermat’s Little Theorem. To test if n is prime: pick a {1…n -1} at random if a n -1 1 (mod n), return “n is not prime” else O(log n) return “n may be prime” multiplies by using repeated squaring Hmm... What if n = 341, a = 2? 2340 = (210)34 = 102434 = (3×341 + 1)34 = 1 (mod 341) But 341 is not prime. 341 = 11×31 Fermat’s Little Theorem doesn’t work in both directions. Just because some numbers n and a satisfy the Fermat property, that doesn’t necessarily mean that n is prime! The situation is even worse than that. Some composite numbers (called Carmichael numbers) actually satisfy Fermat’s Little Theorem for all 1 a n-1. Our simple primality test cannot tell Carmichael numbers from prime numbers. First three Carmichael numbers: 561 1105 1729 Fortunately Carmichael numbers are extremely rare -- only 255 of them occur in the first 100,000,000 integers. We will ignore them for now. If n is not a Carmichael number, we can bound the probability that the primality test goes wrong. Define Kn* = { a Zn* | an-1 = 1 (mod n)} So Kp* = Zp* for primes p (by Fermat’s Theorem) K341* contains 2 Kk* = Zk* for Carmichael numbers k (by definition) If our choice of a for the test happens to fall in Kn*, then the test would give the wrong answer. We want Kn* to be small relative to {1…n-1}, so that a random a {1…n-1} has a good chance of missing Kn*. Theorem. If n is composite but not a Carmichael number, then Pr (a Kn*) 1/2. Proof: Recall Kn* = { a Zn* | an-1 = 1 (mod n)} Claim: Kn* is a subgroup of Zn* Closure: an-1 = 1 (mod n), bn-1 = 1 (mod n) (ab)n-1 = 1 (mod n) Identity: 1n-1 = 1 (mod n) Associativity: multiplication is associative Inverses: an-1 = 1 (an-1)-1 = 1 (a -1)n-1 = 1 (mod n) Since n is neither prime nor Carmichael, Kn* Zn* |Kn*| < |Zn*| But |Kn*| divides |Zn*| (Lagrange’s Theorem) So |Kn*| 1/2 |Zn*| 1/2 (n-1). We can boost the probability of the randomized primality test as high as we like by repeating it. Randomized primality test: repeat k times pick a {1…n -1} at random if a n -1 1 (mod n) then return “n is not prime” return “n is probably prime or Carmichael” (Chance of mistake < 1/2k as we vary over the possible coin tosses of the algorithm) A slightly more complicated version of this algorithm (the “Miller-Rabin” algorithm) detects and eliminates Carmichael numbers with probability 1 - 1/2k.