CSE 525 Randomized Algorithms & Probabilistic Analysis          Winter 2008

Lecture 2: January 09
Lecturer: James R. Lee        Scribes: Elisa Celis and Andrey Kolobov

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

2.1 Random measurements

By cleverly measuring (i.e., viewing data) at random, it is sometimes possible to give a correct answer with high probability while reducing the amount of time or storage needed for computation. Random measurements have applications in dimensionality reduction, nearest-neighbor search, compressed sensing, and other problems.

2.1.1 String Equality

Suppose Alice has a string a = a_0, ..., a_{n-1}, and Bob has a string b = b_0, ..., b_{n-1}. How can they determine whether a equals b? A trivial solution would have Alice send a to Bob, and have Bob compare it to b. While this is correct, it is non-optimal because n bits need to be transmitted. We want to do better.

The fingerprint method uses a random measurement as follows. Without loss of generality, assume a_i, b_i ∈ {0, 1}, so a is the binary representation of some integer t. Now, let Alice choose a uniformly random prime number p ∈ {2, ..., T}, where t ≤ T. (In Section 2.3, we give a good way to choose p.) Define the fingerprint of a as F_p(a) = a mod p. Alice sends F_p(a) and p to Bob, and Bob computes F_p(b). If Bob sees F_p(a) = F_p(b), we output YES; otherwise we output NO.

The error for this solution is one-sided, since F_p(a) = F_p(b) whenever a = b, so there are no false negatives. However, if a ≠ b we can still have F_p(a) = F_p(b), so there may be false positives. We wish to show that Pr[error] is small when a ≠ b. Consider the prime-counting function π(x) = |{p : p is prime, p ≤ x}|. If F_p(a) = F_p(b), then a ≡ b (mod p), so p divides |a − b|. We can now use the following proposition.

Proposition 2.1.
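The fingerprint protocol can be sketched in a few lines of Python. This is an illustrative sketch, not from the notes: the helper names are our own, and `random_prime` uses simple trial division plus rejection sampling, which is fine for the small T used here.

```python
import random

def is_prime(n):
    """Trial division; adequate for the small T in this sketch."""
    if n < 2:
        return False
    k = 2
    while k * k <= n:
        if n % k == 0:
            return False
        k += 1
    return True

def random_prime(T):
    """Sample a uniformly random prime in {2, ..., T} by rejection."""
    while True:
        p = random.randint(2, T)
        if is_prime(p):
            return p

def fingerprint(bits, p):
    """F_p(a) = a mod p, reading the bit string as a binary integer."""
    return int(bits, 2) % p

def equal_with_high_probability(a, b, T):
    """Alice sends (p, F_p(a)); Bob compares against F_p(b)."""
    p = random_prime(T)
    return fingerprint(a, p) == fingerprint(b, p)
```

If a = b this always answers True (no false negatives); if a ≠ b it may rarely answer True, with probability bounded as derived below.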
A nonzero n-bit integer has at most n distinct prime divisors.

Proof. Each distinct prime divisor is at least 2, and the integer itself is at most 2^n − 1. Therefore, by the fundamental theorem of arithmetic, there are no more than n prime divisors.

Since |a − b| is a nonzero n-bit integer, there are at most n primes that divide |a − b|. Therefore,

    Pr[error] ≤ n / π(T).

Additionally, recall the following theorem.

Theorem 2.2 (Prime Number Theorem). π(x) ∼ x/ln(x) as x → ∞. In particular, for x ≥ 17,

    x/ln(x) ≤ π(x) ≤ 1.26x/ln(x).

Hence, Pr[error] ≤ n ln(T)/T. If we want this to be small, we can simply choose T = cn ln(n). Thus,

    Pr[error] ≤ n ln(cn ln(n)) / (cn ln(n)) = (1/c)(1 + ln(c ln(n))/ln(n)).

Since ln(c ln(n))/ln(n) = o(1), we have

    Pr[error] = (1/c)(1 + o(1)),

so we can make Pr[error] arbitrarily small with the appropriate choice of c. Since p ≤ T, we know p will use at most ln(cn ln(n)) bits. To improve this, we take note of the following.

Fact 2.3. A nonzero n-bit number has at most π(n) prime factors.

Thus, Pr[error] ≤ π(n)/π(T), and we can choose T = cn to get

    Pr[error] ≤ (1.26n/ln(n)) · (ln(cn)/(cn)) = (1.26/c) · (ln(cn)/ln(n)).

Hence

    Pr[error] = (1.26/c) · (1 + o(1)).

This way we can make Pr[error] arbitrarily small with a prime p of length at most ln(cn) bits. Thus, both p and F_p(a) use O(ln(n)) bits, and Alice sends Bob O(ln(n)) bits. To give a concrete example, if n = 1 MB (approximately 2^23 bits) and T = 2^32 (a 32-bit fingerprint), then Pr[error] ≈ 0.0035.

2.1.2 Pattern Matching

Suppose we have two input strings, X = x_0, ..., x_{n-1} and Y = y_0, ..., y_{m-1}, where m < n. How do we determine whether Y is a contiguous substring of X? Let X(j) = x_j, x_{j+1}, ..., x_{j+m-1}. We can now ask whether X(j) = Y for some j ∈ {0, 1, ..., n − m}. The most trivial deterministic algorithm (explicitly comparing every contiguous substring of X to Y) takes O(mn) time.
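The trivial O(mn) baseline can be written as a short sketch (the function name is ours; each window comparison itself costs O(m), giving O(mn) overall):

```python
def naive_match(X, Y):
    """Compare Y against every contiguous window X(j); O(mn) time.

    Returns the first index j with X[j:j+m] == Y, or -1 if no match.
    """
    n, m = len(X), len(Y)
    for j in range(n - m + 1):
        if X[j:j + m] == Y:   # O(m) comparison per window
            return j
    return -1
```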
There is a more efficient deterministic algorithm that works in O(m + n) time, but it is hard to implement, has large overhead, and does not generalize well to similar problems. We will give a randomized approach that also runs in O(m + n) time, but is simpler and easier to extend.

As before, let us treat X(j) and Y as binary integers. Choose a random prime p ∈ {2, ..., T}, and compute F_p(Y) = Y mod p and F_p(X(j)) for all j ∈ {0, 1, ..., n − m}. If there exists some j for which F_p(X(j)) = F_p(Y), output MATCH; otherwise output NO MATCH.

The error is one-sided since there are no false negatives, but there may be false positives. If X(j) ≠ Y for every j ∈ {0, 1, ..., n − m}, then by the union bound Pr[error] ≤ n π(m)/π(T). To get a tighter bound, recall that if F_p(X(j)) = F_p(Y), then p divides |X(j) − Y|. Thus, if there is an error, p divides the product

    ∏_{j=0}^{n−m} |X(j) − Y|.

Since each |X(j) − Y| is an m-bit number, and we multiply n − m of them together, this product is at most an nm-bit integer. Thus, Pr[error] ≤ π(mn)/π(T), and if we choose T = cmn we have

    Pr[error] ≤ (1.26mn/ln(mn)) · (ln(cmn)/(cmn)) = (1.26/c) · (1 + o(1)).

Hence we can make the error arbitrarily small using a prime p with no more than O(log(mn)) = O(log(n^2)) = O(log(n)) bits.

A trivial bound on the runtime of this algorithm is O(mn), since we must compute F_p(X(j)) (in time O(m)) for n − m distinct values of j, giving a runtime of O(m(n − m)) = O(nm). This is worse than the best deterministic algorithm. However, Karp and Rabin improved the runtime to O(n + m) in 1981 by making the following observation: if we know F_p(X(j)), then we can compute F_p(X(j+1)) in O(1) steps. Specifically (under a binary representation, with the most significant bits to the left), we know that

    X(j+1) = 2(X(j) − 2^{m−1} x_j) + x_{j+m}.

In other words, we need only "slide" the m-bit window one bit along the string, as depicted below.
By performing all the arithmetic modulo p, we see that

    F_p(X(j+1)) = F_p(2(F_p(X(j)) − F_p(2^{m−1}) x_j) + x_{j+m}).

This takes O(1) time (assuming that p fits into a standard integer variable, a fact we assumed implicitly in the previous runtime analysis). Therefore, the runtime becomes O(n + m) = O(n). To give a concrete example, if we are looking for a substring of length m = 2^8 in a DNA string of length n = 2^14, then picking T = 2^32 (so p is a 32-bit integer) yields Pr[error] ≤ 0.002.

2.2 Types of Randomized Algorithms

There are different kinds of randomized algorithms. In a Monte Carlo algorithm, the output for a single instance may differ from run to run and may be incorrect. A Las Vegas algorithm, in contrast, is never incorrect (it always returns a correct result or reports a failure) and has a potentially unbounded worst-case runtime, but a small expected runtime. The algorithms we saw above were Monte Carlo algorithms.

Every Las Vegas algorithm can be turned into a Monte Carlo algorithm by running it until the expected runtime, and guessing if no answer has been found. It is unknown whether every Monte Carlo algorithm can be converted into a Las Vegas algorithm. However, we can convert the above pattern-matching algorithm into a Las Vegas algorithm as follows. Since the only error that can occur is a false positive, we need only check the cases where F_p(X(j)) = F_p(Y) to make sure each is a true positive. Check each such case explicitly using bit-by-bit comparison. If we are very unlucky, we will need to do this n − m times, so the worst-case runtime is O(mn). However, this is extremely unlikely, and it can be shown that the expected running time is O(n) and is highly concentrated.

2.3 Primality Testing

To implement the aforementioned algorithms, we need to be able to find prime numbers. In practice, we obtain a prime p ∈ {2, ..., T} by choosing an integer m ∈ {2, ..., T} uniformly at random, and checking if m is a prime.
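The rolling hash together with the Las Vegas verification step can be sketched as follows. This is an illustrative sketch, not the notes' code: the function name is ours, p is passed in as a parameter, and a real implementation would sample p at random as described in Section 2.3.

```python
def karp_rabin(X, Y, p):
    """Return the first index j with X[j:j+m] == Y, or -1.

    X and Y are strings of '0'/'1' characters. Candidate matches found
    via the fingerprint F_p are verified explicitly, so the algorithm
    is Las Vegas: it never returns a wrong answer.
    """
    n, m = len(X), len(Y)
    if m > n:
        return -1
    target = int(Y, 2) % p          # F_p(Y)
    window = int(X[:m], 2) % p      # F_p(X(0))
    top = pow(2, m - 1, p)          # F_p(2^(m-1)), precomputed once
    for j in range(n - m + 1):
        # Verification step rules out false positives (Las Vegas).
        if window == target and X[j:j + m] == Y:
            return j
        if j + m < n:
            # Slide the window: X(j+1) = 2*(X(j) - 2^(m-1)*x_j) + x_{j+m}
            window = (2 * (window - top * int(X[j])) + int(X[j + m])) % p
    return -1
```

Each update is O(1) arithmetic operations, so the whole scan is O(n + m) plus the (rare) explicit verifications.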
Theorem 2.2 says we only need approximately ln(T) attempts before we find a prime number. Thus, the only remaining question is how to determine if an integer is prime.

Establishing this deterministically is a hard problem. A naive solution, trying to divide m by every k = 2, 3, ..., ⌊√m⌋, has runtime exponential in the size of the input, while the best deterministic algorithm (Agrawal, Kayal and Saxena, 2002) runs in O(n^6) time and is impractical for many applications. We wish to develop a randomized algorithm for primality testing based on the following theorem.

Theorem 2.4 (Fermat's Little Theorem). If p is a prime number and p does not divide a, then a^{p−1} ≡ 1 (mod p).

To determine if m is prime, choose a ∈ {2, 3, ..., m − 1} uniformly at random. If gcd(a, m) ≠ 1, output NOT PRIME. Additionally, if a^{m−1} ≢ 1 (mod m), output NOT PRIME. If m passes these two tests, return PRIME.

We can compute gcd(a, m) in O(log^2 m) time with Euclid's algorithm. Additionally, a^{m−1} mod m can be computed in O(log^2 m) time by modular exponentiation (compute a^2 mod m, then a^4 mod m, etc.). Thus, the total runtime is O(log^2 m), polynomial in the size of the input.

The error is once again one-sided, since we can only get false positives. However, this test is far from perfect, since there are infinitely many Carmichael numbers: composite numbers m such that a^{m−1} ≡ 1 (mod m) for all a with gcd(a, m) = 1. It is known that Pr[m is Carmichael] → 0 as b → ∞, where b is the number of bits in m. Thus, we are unlikely to encounter a Carmichael number. However, even if we do not encounter one, this does not necessarily mean we are safe; a non-Carmichael composite m may still satisfy a^{m−1} ≡ 1 (mod m) for most a. We will consider this problem in the next lecture, and introduce the Miller-Rabin test, which circumvents the problem of Carmichael numbers altogether.
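The Fermat test above can be sketched as follows. This is a sketch under our own conventions: repeating the test `trials` times is our addition (the notes describe a single random a), and Python's three-argument `pow` performs the modular exponentiation by repeated squaring.

```python
import math
import random

def fermat_is_probably_prime(m, trials=20):
    """Fermat primality test from the notes, repeated `trials` times.

    A return value of False (NOT PRIME) is always correct; True (PRIME)
    may be a false positive, e.g. on Carmichael numbers such as 561.
    """
    if m < 4:
        return m in (2, 3)
    for _ in range(trials):
        a = random.randint(2, m - 1)
        if math.gcd(a, m) != 1:       # a shares a factor with m
            return False
        if pow(a, m - 1, m) != 1:     # Fermat witness: m is composite
            return False
    return True
```

For a prime m, every a passes both tests, so the answer is always True; for most composites a single random a already exposes compositeness with good probability, and repetition drives the error down further.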