Randomized Algorithms I, Spring 2016, Department of Computer Science, University of Helsinki

Homework 2: Solutions (Discussed February 12, 2016)

1. Exercise 3.18: Show that, for a random variable $X$ with standard deviation $\sigma[X]$ and any positive real number $t$:

a) $\Pr(X - E[X] \ge t\sigma[X]) \le \frac{1}{1+t^2}$.

First note that if $\sigma[X] = 0$, then $X = E[X]$ with probability 1 and thus $\Pr(X - E[X] \ge t\sigma[X]) = 1$ regardless of $t$. Hence, if $\sigma[X] = 0$, the claim does not hold for any $t > 0$. Now assume that $\sigma[X] > 0$ and define the normalized random variable
\[ Y = \frac{X - E[X]}{\sigma[X]}, \]
for which we have
\[ E[Y] = \frac{E[X] - E[X]}{\sigma[X]} = 0 \quad\text{and}\quad E[Y^2] = \frac{E[(X - E[X])^2]}{\sigma^2[X]} = \frac{\sigma^2[X]}{\sigma^2[X]} = 1. \]
Since $t > 0$, the following statements are equivalent:
\[ X - E[X] \ge t\sigma[X] \iff Y \ge t \iff tY \ge t^2 \iff 1 + tY \ge 1 + t^2. \]
Therefore we get
\begin{align*}
\Pr(X - E[X] \ge t\sigma[X]) &= \Pr\left(1 + tY \ge 1 + t^2\right) \\
&\le \Pr\left(|1 + tY| \ge 1 + t^2\right) \\
&= \Pr\left((1 + tY)^2 \ge (1 + t^2)^2\right) \\
&\le \frac{E\left[(1 + tY)^2\right]}{(1 + t^2)^2} \\
&= \frac{E\left[1 + 2tY + t^2 Y^2\right]}{(1 + t^2)^2} \\
&= \frac{1 + 2tE[Y] + t^2 E[Y^2]}{(1 + t^2)^2} \\
&= \frac{1 + t^2}{(1 + t^2)^2} = \frac{1}{1 + t^2},
\end{align*}
where the second inequality follows from Markov's inequality.

b) $\Pr(|X - E[X]| \ge t\sigma[X]) \le \frac{2}{1+t^2}$.

By splitting the event $|X - E[X]| \ge t\sigma[X]$ into two disjoint events and applying the result from a) to both random variables $X$ and $-X$, we get
\begin{align*}
\Pr(|X - E[X]| \ge t\sigma[X]) &= \Pr\left(X - E[X] \ge t\sigma[X] \,\cup\, {-X} + E[X] \ge t\sigma[X]\right) \\
&= \Pr\left(X - E[X] \ge t\sigma[X]\right) + \Pr\left(-X + E[X] \ge t\sigma[X]\right) \\
&= \Pr\left(X - E[X] \ge t\sigma[X]\right) + \Pr\left(-X - E[-X] \ge t\sigma[X]\right) \\
&\le \frac{1}{1+t^2} + \frac{1}{1+t^2} = \frac{2}{1+t^2}.
\end{align*}

2. Exercise 3.21: A fixed point of a permutation $\pi : [1, n] \to [1, n]$ is a value for which $\pi(x) = x$. Find the variance in the number of fixed points of a permutation chosen uniformly at random from all permutations.

Let $X_i = 1$ if $\pi(i) = i$ and $X_i = 0$ otherwise. The number of fixed points is $X = \sum_{i=1}^n X_i$. A permutation with fixed point $i$ can be obtained by fixing $\pi(i) = i$ and permuting the remaining elements in one of $(n-1)!$ orders. Thus
\[ \Pr(X_i = 1) = \frac{(n-1)!}{n!} = \frac{1}{n}. \]
Furthermore, if $i \ne j$, then
\[ \Pr(X_i = 1 \cap X_j = 1) = \frac{(n-2)!}{n!} = \frac{1}{n(n-1)}. \]
Using the above equations we can compute the first two moments of $X$:
\[ E[X] = \sum_{i=1}^n E[X_i] = \sum_{i=1}^n \Pr(X_i = 1) = n \cdot \frac{1}{n} = 1 \]
and
\begin{align*}
E[X^2] &= E\left[\left(\sum_{i=1}^n X_i\right)^2\right] = E\left[\sum_{i=1}^n X_i^2 + \sum_{i \ne j} X_i X_j\right] \\
&= \sum_{i=1}^n E[X_i^2] + \sum_{i \ne j} E[X_i X_j] \\
&= \sum_{i=1}^n \Pr(X_i = 1) + \sum_{i \ne j} \Pr(X_i = 1 \cap X_j = 1) \\
&= n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n(n-1)} = 1 + 1 = 2,
\end{align*}
where we used the fact that $X_i^2 = X_i$ since each $X_i$ is either 0 or 1. Therefore, we get
\[ \mathrm{Var}[X] = E[X^2] - E[X]^2 = 2 - 1 = 1. \]

3. Exercise 3.24: Generalize the median-finding algorithm to find the $k$th largest item in a set of $n$ items for any given value of $k$. Prove that your resulting algorithm is correct, and bound its running time.

Let $S_{(i)}$ denote the $i$th smallest element of the set $S$. We modify Algorithm 3.1 as follows. The input consists of a set $S$ (whose size is denoted by $|S| = n$) and a positive integer $k$. The following steps are changed:

3. Let $a = \lfloor \frac{k}{n} n^{3/4} - \sqrt{n} \rfloor$. Set $d = R_{(a)}$ if $a \ge 1$ and $d = S_{(1)}$ otherwise.
4. Let $b = \lceil \frac{k}{n} n^{3/4} + \sqrt{n} \rceil$. Set $u = R_{(b)}$ if $b \le n^{3/4}$ and $u = S_{(n)}$ otherwise.
6. If $\ell_d \ge k$ or $\ell_u > n - k$, then FAIL.
8. Output $C_{(k - \ell_d)}$.

As in the original algorithm, steps 3 and 4 choose the lower and upper bounds for the elements in $C$. If the condition in step 6 is true, then $C$ does not contain $S_{(k)}$ and the algorithm fails. Otherwise, if $C$ is small enough, it is sorted and the correct element is returned in step 8. The analysis of Theorem 3.9 in the book holds also for this modified algorithm, since the size of $C$ is guaranteed to be small enough by step 7, and step 6 guarantees that $C$ always contains $S_{(k)}$. Also, in steps 3 and 4, finding the minimum and maximum elements $S_{(1)}$ and $S_{(n)}$ takes linear time. Thus, the algorithm either returns the correct element or fails, and always stops in linear time. To be precise, the above proof of Theorem 3.9 for the modified algorithm was all that was asked in the exercise description.
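The modified selection algorithm above can be sketched in Python. This is a minimal sketch rather than the book's pseudocode verbatim: the sample size $n^{3/4}$, the step numbering from Algorithm 3.1, and the assumption of distinct elements are carried over; the function name and retry-on-FAIL usage are ours.

```python
import math
import random

def randomized_select(S, k):
    """Sketch of the modified Algorithm 3.1: find the k-th smallest of S
    (distinct elements assumed). Returns the element, or None on FAIL."""
    n = len(S)
    # Steps 1-2: sample ~n^(3/4) elements with replacement and sort them.
    m = int(n ** 0.75)
    R = sorted(random.choice(S) for _ in range(m))
    # Steps 3-4: fences around the expected position of S_(k) in the sample.
    a = math.floor(k / n * m - math.sqrt(n))
    b = math.ceil(k / n * m + math.sqrt(n))
    d = R[a - 1] if a >= 1 else min(S)
    u = R[b - 1] if b <= m else max(S)
    # Step 5: count elements below d and above u; collect the middle part C.
    C = [x for x in S if d <= x <= u]
    ld = sum(1 for x in S if x < d)
    lu = sum(1 for x in S if x > u)
    # Step 6: FAIL if S_(k) cannot lie in C.
    if ld >= k or lu > n - k:
        return None
    # Step 7: FAIL if C is too large to keep the total work linear.
    if len(C) > 4 * m:
        return None
    # Step 8: sort C and output its (k - ld)-th smallest element.
    C.sort()
    return C[k - ld - 1]
```

Since the algorithm fails only with probability $O(n^{-1/4})$, a caller would simply retry on a None result.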
However, the same upper bound $O(n^{-1/4})$ for the failure probability should hold for this modified algorithm. An outline of the proof follows. If $a < 1$, then $d = S_{(1)}$ and $\ell_d = 0 < k$. Likewise, if $b > n^{3/4}$, then $u = S_{(n)}$ and $\ell_u = 0 \le n - k$. These events cannot make the algorithm fail in step 6. Thus, as in the original analysis, the algorithm fails only if at least one of the following three events occurs:
\begin{align*}
\mathcal{E}_1 &: Y_1 = |\{r \in R \mid r \le S_{(k)}\}| < k n^{-1/4} - \sqrt{n} \\
\mathcal{E}_2 &: Y_2 = |\{r \in R \mid r \ge S_{(k)}\}| < (n - k) n^{-1/4} - \sqrt{n} \\
\mathcal{E}_3 &: |C| > 4 n^{3/4}
\end{align*}
The proof of Lemma 3.11 requires only small changes. Let $X_i$ be an indicator for the event that the $i$th sample is less than or equal to $S_{(k)}$. Then
\[ \Pr(X_i = 1) = \frac{k}{n}. \]
For $Y_1 = \sum_{i=1}^{n^{3/4}} X_i$ we obtain
\[ E[Y_1] = n^{3/4} \cdot \frac{k}{n} = k n^{-1/4} \]
and
\[ \mathrm{Var}[Y_1] = n^{3/4} \cdot \frac{k}{n}\left(1 - \frac{k}{n}\right) \le \frac{1}{4} n^{3/4}. \]
Using Chebyshev's inequality we then get
\begin{align*}
\Pr(\mathcal{E}_1) &= \Pr\left(Y_1 - k n^{-1/4} < -\sqrt{n}\right) \\
&\le \Pr\left(|Y_1 - E[Y_1]| > \sqrt{n}\right) \\
&\le \frac{\mathrm{Var}[Y_1]}{n} \le \frac{1}{4} n^{-1/4}.
\end{align*}
The same bound can be obtained similarly for the probability of $\mathcal{E}_2$. Likewise, the analysis of event $\mathcal{E}_3$ is similar to the one in the book; we just need to replace "median" with "$S_{(k)}$" and "$\frac{1}{2}$" with $\frac{k}{n}$ or $1 - \frac{k}{n}$ in most places. As a result, Theorem 3.13 holds also for the modified algorithm, and the probability that the algorithm fails is $O(n^{-1/4})$.

4. Exercise 4.3: a) Determine the moment generating function for the binomial random variable $B(n, p)$.

Using the definition of $M_X(t)$ we get
\begin{align*}
M_X(t) = E[e^{tX}] &= \sum_{k=0}^n e^{tk} \binom{n}{k} p^k (1-p)^{n-k} \\
&= \sum_{k=0}^n \binom{n}{k} (e^t p)^k (1-p)^{n-k} \\
&= (e^t p + 1 - p)^n,
\end{align*}
where the last equality follows from the binomial theorem:
\[ \sum_{k=0}^n \binom{n}{k} x^k y^{n-k} = (x + y)^n. \]

b) Let $X$ be a $B(n, p)$ random variable and $Y$ a $B(m, p)$ random variable, where $X$ and $Y$ are independent. Use part (a) to determine the moment generating function of $X + Y$.

Since $X$ and $Y$ are independent, using Theorem 4.3 from the book, we have
\[ M_{X+Y}(t) = M_X(t) M_Y(t) = (e^t p + 1 - p)^n (e^t p + 1 - p)^m = (e^t p + 1 - p)^{n+m}. \]
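The closed form of the binomial MGF, and the factorization $M_{X+Y}(t) = M_X(t) M_Y(t)$, can be checked numerically against the defining sum (helper names are ours):

```python
import math

def binomial_mgf_direct(n, p, t):
    """E[e^(tX)] for X ~ B(n, p), computed straight from the definition."""
    return sum(math.exp(t * k) * math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

def binomial_mgf_closed(n, p, t):
    """The closed form (p e^t + 1 - p)^n derived above."""
    return (p * math.exp(t) + 1 - p) ** n

# The two expressions agree, and the product of the MGFs of independent
# X ~ B(n, p) and Y ~ B(m, p) is the closed form with exponent n + m.
n, m, p, t = 10, 7, 0.3, 0.5
assert abs(binomial_mgf_direct(n, p, t) - binomial_mgf_closed(n, p, t)) < 1e-9
assert abs(binomial_mgf_direct(n, p, t) * binomial_mgf_direct(m, p, t)
           - binomial_mgf_closed(n + m, p, t)) < 1e-9
```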
c) What can we conclude from the form of the moment generating function of $X + Y$?

As the moment generating function uniquely defines the distribution of a random variable (Theorem 4.2 in the book), we see that $X + Y \sim B(n + m, p)$. Thus, we can conclude that a sum of independent binomially distributed random variables (with the same success probability) is also binomially distributed.

5. Exercise 4.4: Determine the probability of obtaining 55 or more heads when flipping a fair coin 100 times by an explicit calculation, and compare this with the Chernoff bound. Do the same for 550 or more heads in 1000 flips.

Let $X \sim B(100, 1/2)$ be the number of heads in 100 flips. The expectation of $X$ is then $\mu = E[X] = 50$. The exact probability of getting 55 or more heads is
\[ \Pr(X \ge 55) = \sum_{k=55}^{100} \binom{100}{k} \left(\frac{1}{2}\right)^k \left(\frac{1}{2}\right)^{100-k} \approx 0.1841. \]
We can obtain several different Chernoff bounds for the same distribution. Using the first bound given by Theorem 4.4 in the book gives us
\[ \Pr(X \ge 55) = \Pr(X \ge (1 + 0.1) \cdot 50) \le \left(\frac{e^{0.1}}{(1 + 0.1)^{1+0.1}}\right)^{50} \approx 0.7850. \]
On the other hand, Corollary 4.9 from the book gives a somewhat better bound:
\[ \Pr(X \ge 55) = \Pr(X \ge (1 + 0.1) \cdot 50) \le e^{-0.1^2 \cdot 50} \approx 0.6065. \]
Comparing to the exact value, we observe that the first bound is about 4.3 times larger and the second bound about 3.3 times larger.

Similarly, for 1000 flips, the number of heads is $X \sim B(1000, 1/2)$, and the exact probability is
\[ \Pr(X \ge 550) = \sum_{k=550}^{1000} \binom{1000}{k} \left(\frac{1}{2}\right)^k \left(\frac{1}{2}\right)^{1000-k} \approx 8.6527 \cdot 10^{-4}. \]
The first bound gives
\[ \Pr(X \ge 550) = \Pr(X \ge (1 + 0.1) \cdot 500) \le \left(\frac{e^{0.1}}{(1 + 0.1)^{1+0.1}}\right)^{500} \approx 0.0889, \]
and the second gives
\[ \Pr(X \ge 550) = \Pr(X \ge (1 + 0.1) \cdot 500) \le e^{-0.1^2 \cdot 500} \approx 0.0067. \]
This time the first bound is about 103 times larger than the exact value and the second bound about 7.8 times larger. Thus, the relative error of the bounds seems to grow as the number of flips increases.

6. Exercise 4.8: We show how to construct a random permutation $\pi$ on $[1, n]$, given a black box that outputs numbers independently and uniformly at random from $[1, k]$, where $k \ge n$.
If we compute a function $f : [1, n] \to [1, k]$ with $f(i) \ne f(j)$ for $i \ne j$, this yields a permutation; simply output the numbers $[1, n]$ according to the order of the $f(i)$ values. To construct such a function $f$, do the following for $j = 1, \ldots, n$: choose $f(j)$ by repeatedly obtaining numbers from the black box and setting $f(j)$ to the first number found such that $f(j) \ne f(i)$ for $i < j$. Prove that this approach gives a permutation chosen uniformly at random from all permutations. Find the expected number of calls to the black box that are needed when $k = n$ and $k = 2n$. For the case $k = 2n$, argue that the probability that each call to the black box assigns a value of $f(j)$ to some $j$ is at least $1/2$. Based on this, use a Chernoff bound to bound the probability that the number of calls to the black box is at least $4n$.

The process produces a sequence $(f(1), f(2), \ldots, f(n))$, where $f(i) \in \{1, \ldots, k\}$ and $f(i) \ne f(j)$ for $i \ne j$. There are $\binom{k}{n} n!$ such sequences, as the set $\{f(1), f(2), \ldots, f(n)\}$ can be selected in $\binom{k}{n}$ ways out of $k$ elements, and the elements of the set can be ordered in $n!$ ways. Clearly each such sequence has an equal probability, $1 / \left(\binom{k}{n} n!\right)$. The produced sequence maps to a permutation $\pi$. For a fixed $\pi$, only one ordering of the elements of a set $\{f(1), f(2), \ldots, f(n)\}$ produces $\pi$. Thus, there are $\binom{k}{n}$ different sequences that produce $\pi$, and the probability of $\pi$ is
\[ \Pr(\pi) = \binom{k}{n} \cdot \frac{1}{\binom{k}{n}\, n!} = \frac{1}{n!}, \]
which is the same for all permutations $\pi$. So permutations are chosen uniformly at random.

Next, we want to find the expected number of calls needed to construct $f$. Denote this by $X$, and let $X_i$ be the number of calls required to choose $f(i)$ when $f(1), \ldots, f(i-1)$ have already been chosen. Thus, we have $X = \sum_{i=1}^n X_i$ and, by linearity of expectation, $E[X] = \sum_{i=1}^n E[X_i]$.
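The construction and its uniformity can be simulated. This is a sketch with names of our choosing; it interprets "output $[1, n]$ according to the order of the $f(i)$ values" as assigning each position its rank among the $f$-values.

```python
import random
from collections import Counter

def random_permutation(n, k, rng):
    """Build f: [1,n] -> [1,k] by rejecting previously used values, then
    read off the permutation from the order of the f(i) values."""
    f, used, calls = [], set(), 0
    for _ in range(n):
        while True:
            calls += 1
            v = rng.randint(1, k)          # one call to the black box
            if v not in used:
                break
        used.add(v)
        f.append(v)
    # Position i receives the rank of f(i) among all f-values.
    order = sorted(range(n), key=lambda i: f[i])
    perm = [0] * n
    for rank, i in enumerate(order, start=1):
        perm[i] = rank
    return tuple(perm), calls

# Empirically, all n! permutations appear with roughly equal frequency.
rng = random.Random(0)
counts = Counter(random_permutation(3, 6, rng)[0] for _ in range(60000))
assert len(counts) == 6                        # all 3! permutations occur
assert all(abs(c - 10000) < 600 for c in counts.values())
```

The tolerance 600 is about six standard deviations for 60000 draws, so the check is loose enough to pass reliably while still detecting a non-uniform construction.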
To find $E[X_i]$ we note that when choosing $f(i)$, there are $k - i + 1$ free numbers left, and thus each call to the black box has probability $(k - i + 1)/k$ of choosing a free number. Hence, $X_i$ is geometrically distributed and its expectation is $k/(k - i + 1)$. The expected number of calls to the black box is then
\[ E[X] = \sum_{i=1}^n \frac{k}{k - i + 1} = k\left(H(k) - H(k - n)\right), \]
where $H(i)$ is the $i$th harmonic number. For the case $k = n$, this becomes $nH(n) = \Theta(n \ln n)$; this case is actually identical to the coupon collector's problem. When $k = 2n$, the expectation is
\[ 2n\left(H(2n) - H(n)\right) = 2n\left(\ln(2n) - \ln n + O(1/n)\right) = 2n \ln 2 + O(1) = \Theta(n). \]
Thus, a greater $k$ seems to make the algorithm faster, which is of course intuitive.

Now, when $k = 2n$, for any $j = 1, \ldots, n$, the probability that a call to the black box assigns a value to $f(j)$ is
\[ \frac{2n - j + 1}{2n} \ge \frac{2n - n + 1}{2n} \ge \frac{1}{2}. \]
We want to use this fact together with a Chernoff bound to bound the probability that the number of calls to the black box is at least $4n$. The problem is that the bounds given in the book this far are for sums of independent Poisson trials, but the $X_i$ are geometric random variables. Therefore, we introduce a random variable $Y_i$, which is 1 if the $i$th call to the black box is successful and 0 otherwise. Now, for $k$ calls, the number of successful calls is $Y_{(k)} = \sum_{i=1}^k Y_i$. The algorithm stops once there are $n$ successful calls. At least $4n$ calls are needed if and only if $Y_{(4n-1)} \le n - 1$. Thus, we want to bound $\Pr(Y_{(4n-1)} \le n - 1)$. Since the $Y_i$ are Poisson trials, we are a step closer to using a Chernoff bound on this probability. The remaining problem is that the $Y_i$ are not independent, as the probability of the event $Y_i = 1$ depends on the results of the previous $i - 1$ calls. To fix this, we use the above observation that this probability is always at least $1/2$. Let $Z_i$ be a random variable that is 1 with probability $1/2$ and 0 with probability $1/2$, and let $Z_{(k)} = \sum_{i=1}^k Z_i$.
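Before continuing, the closed forms for the expected number of calls derived above can be verified exactly with rational arithmetic (function names are ours):

```python
from fractions import Fraction

def expected_calls(n, k):
    """Exact E[X] = sum_{i=1}^{n} k / (k - i + 1)."""
    return sum(Fraction(k, k - i + 1) for i in range(1, n + 1))

def harmonic(m):
    """The m-th harmonic number H(m), as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, m + 1))

n = 20
# k = n gives the coupon collector's n * H(n); k = 2n gives Theta(n).
assert expected_calls(n, n) == n * harmonic(n)
assert expected_calls(n, 2 * n) == 2 * n * (harmonic(2 * n) - harmonic(n))
assert expected_calls(n, 2 * n) < expected_calls(n, n)
```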
Now, since the $Y_i$ are 1 at least as often as the $Z_i$, it is intuitively clear that $\Pr(Y_{(k)} \le n - 1) \le \Pr(Z_{(k)} \le n - 1)$ for any $k$. To actually prove this, we can note that since $\Pr(Z_i = 0) \ge \Pr(Y_i = 0)$ for all $i$, we can define the $Z_i$ such that $Z_i = 0$ always when $Y_i = 0$. With such $Z_i$, we always have $Z_i \le Y_i$, and thus $Z_{(k)} \le Y_{(k)}$ for all $k$, and as a result $\Pr(Y_{(k)} \le n - 1) \le \Pr(Z_{(k)} \le n - 1)$. Now, since the $Z_i$ are independent Bernoulli trials, we can finally use the Chernoff bound of Corollary 4.10 from the book. Since $E[Z_{(k)}] = kE[Z_i] = k/2$, for $0 < h < k/2$ we get
\begin{align*}
\Pr(Z_{(k)} \le h) &= \Pr\left(Z_{(k)} \le E[Z_{(k)}] - \frac{k - 2h}{2}\right) \\
&\le \exp\left(-\frac{2}{k}\left(\frac{k - 2h}{2}\right)^2\right) = \exp\left(-\frac{(k - 2h)^2}{2k}\right).
\end{align*}
By setting $k = 4n - 1$ and $h = n - 1$ we then have
\[ \Pr(Z_{(4n-1)} \le n - 1) \le \exp\left(-\frac{(2n + 1)^2}{8n - 2}\right). \]
Since $(2n + 1)^2/(8n - 2) \ge n/2$, we get the simplified bound
\[ \Pr(Z_{(4n-1)} \le n - 1) \le \exp(-n/2) \approx 0.61^n. \]

—————

Instead of introducing independent Bernoulli random variables $Z_i$, we could have derived a Chernoff bound directly for the sum $X = \sum_{i=1}^n X_i$ of the geometrically distributed random variables $X_i$. Using Markov's inequality, we get
\[ \Pr(X \ge 4n) = \Pr\left(e^{tX} \ge e^{4tn}\right) \le \frac{E\left[e^{tX}\right]}{e^{4tn}} \]
for any $t > 0$. Since the $X_i$ are independent, the moment generating function of $X$ decomposes into a product: $E[e^{tX}] = \prod_{i=1}^n E[e^{tX_i}]$. Now, as $X_i \sim \mathrm{Geom}(p_i)$ where $p_i \ge 1/2$ (as we saw earlier), the moment generating function of $X_i$ is
\[ E\left[e^{tX_i}\right] = \sum_{k=1}^\infty e^{tk} p_i (1 - p_i)^{k-1} = p_i e^t \sum_{k=0}^\infty \left(e^t (1 - p_i)\right)^k = \frac{p_i e^t}{1 - (1 - p_i) e^t}, \]
where in the last step we just simplified the geometric sum. Note that the above geometric sum, and thus $E[e^{tX_i}]$, is only defined for $t < -\ln(1 - p_i)$. In particular, as $p_i \ge 1/2$, the values $0 < t < \ln 2$ are valid for all $p_i$. Taking the derivative with respect to $p_i$ we get
\[ \frac{d}{dp_i} E\left[e^{tX_i}\right] = \frac{d}{dp_i} \frac{p_i e^t}{1 - (1 - p_i) e^t} = \frac{e^t (1 - e^t)}{\left(1 - (1 - p_i) e^t\right)^2}, \]
which is always negative since $t > 0$. Thus, $E[e^{tX_i}]$ is decreasing with respect to $p_i$, and since $p_i \ge 1/2$, we get an upper bound for $E[e^{tX_i}]$ by setting $p_i = 1/2$.
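The geometric MGF formula and its monotonicity in $p_i$ can be sanity-checked numerically (a sketch; names are ours — the partial sum is written with the common ratio factored out to avoid overflow):

```python
import math

def geom_mgf_direct(p, t, terms=2000):
    """Partial sum of E[e^(tX)] = sum_{k>=1} p e^t (e^t (1-p))^(k-1)."""
    r = math.exp(t) * (1 - p)        # common ratio, < 1 when t < -ln(1-p)
    return sum(p * math.exp(t) * r ** (k - 1) for k in range(1, terms + 1))

def geom_mgf_closed(p, t):
    """Closed form p e^t / (1 - (1-p) e^t), valid for t < -ln(1-p)."""
    return p * math.exp(t) / (1 - (1 - p) * math.exp(t))

# The series matches the closed form, and the closed form is decreasing
# in p, so p = 1/2 yields an upper bound for all p >= 1/2.
t = 0.3                              # 0 < 0.3 < ln 2, valid for all p >= 1/2
for p in (0.5, 0.6, 0.9):
    assert abs(geom_mgf_direct(p, t) - geom_mgf_closed(p, t)) < 1e-9
assert geom_mgf_closed(0.5, t) >= geom_mgf_closed(0.8, t)
```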
That is, we have
\[ E\left[e^{tX_i}\right] \le \frac{e^t/2}{1 - e^t/2}, \]
and further, combining this with the above observations, we have
\[ \Pr(X \ge 4n) \le \frac{E\left[e^{tX}\right]}{e^{4tn}} = \frac{\prod_{i=1}^n E\left[e^{tX_i}\right]}{e^{4tn}} \le \frac{\left(\frac{e^t/2}{1 - e^t/2}\right)^n}{e^{4tn}} = \left(\frac{1}{2(1 - e^t/2)\, e^{3t}}\right)^n \]
for all $0 < t < \ln 2$. Now we want to minimize the right-hand side with respect to $t$ to get a tight bound. As the logarithm is an increasing function, we can instead minimize its logarithm, that is,
\[ n \ln \frac{1}{2(1 - e^t/2)\, e^{3t}} = -n\left(\ln 2 + \ln\left(1 - \frac{e^t}{2}\right) + 3t\right). \]
To find the minimum we find the root of the derivative with respect to $t$:
\begin{align*}
\frac{d}{dt}\left(-n\left(\ln 2 + \ln\left(1 - \frac{e^t}{2}\right) + 3t\right)\right) = 0
\quad&\iff\quad -n\left(-\frac{e^t}{2 - e^t} + 3\right) = 0 \\
&\iff\quad t = \ln\frac{3}{2}.
\end{align*}
This is a valid value of $t$, since $0 < \ln\frac{3}{2} < \ln 2$. By plugging this value into the obtained upper bound, we get
\[ \Pr(X \ge 4n) \le \left(\frac{1}{2\left(1 - \frac{3/2}{2}\right)\left(\frac{3}{2}\right)^3}\right)^n = \left(\frac{16}{27}\right)^n \approx 0.59^n, \]
which is a slightly tighter bound than the one obtained with the first method.

7. Exercise 4.13: Let $X_1, \ldots, X_n$ be independent Poisson trials such that $\Pr(X_i = 1) = p$. Let $X = \sum_{i=1}^n X_i$, so that $E[X] = pn$. Let $F(x, p) = x \ln(x/p) + (1 - x)\ln((1 - x)/(1 - p))$.

a) Show that, for $1 \ge x > p$,
\[ \Pr(X \ge xn) \le e^{-nF(x, p)}. \]

Using Markov's inequality, we have for all $t > 0$ that
\[ \Pr(X \ge xn) = \Pr\left(e^{tX} \ge e^{txn}\right) \le \frac{E\left[e^{tX}\right]}{e^{txn}}. \]
Since $X$ is a sum of independent and identically distributed Bernoulli trials $X_i$, its moment generating function is
\[ E\left[e^{tX}\right] = \prod_{i=1}^n E\left[e^{tX_i}\right] = \prod_{i=1}^n \left((1 - p) + pe^t\right) = \left(1 - p + pe^t\right)^n, \]
and therefore
\begin{align*}
\Pr(X \ge xn) &\le \frac{\left(1 - p + pe^t\right)^n}{e^{txn}} = \left(\frac{1 - p + pe^t}{e^{tx}}\right)^n \\
&= \exp\left(-n\left(tx - \ln\left(1 - p + pe^t\right)\right)\right)
\end{align*}
for all $t > 0$. This is of the required form. Let $g(x, p, t) = tx - \ln(1 - p + pe^t)$. Now we just need to show that for some choice of $t$ we have $g(x, p, t) = F(x, p)$, and the proof is complete. This is obtained by choosing
\[ t = \ln\frac{(1 - p)x}{(1 - x)p} > \ln\frac{(1 - p)p}{(1 - p)p} = 0, \]
as then we have
\begin{align*}
g(x, p, t) &= x \ln\frac{(1 - p)x}{(1 - x)p} - \ln\left(1 - p + p \cdot \frac{(1 - p)x}{(1 - x)p}\right) \\
&= x \ln\frac{(1 - p)x}{(1 - x)p} - \ln\frac{1 - p}{1 - x} \\
&= x \ln\frac{x}{p} + (1 - x)\ln\frac{1 - x}{1 - p} \\
&= F(x, p).
\end{align*}
The above choice of $t$ can be obtained by making the bound $\exp\left(-n\, g(x, p, t)\right)$ as tight as possible, that is, by maximizing $g(x, p, t)$ with respect to $t$. As usual, the maximum is found at a root of $\frac{d}{dt} g(x, p, t)$:
\begin{align*}
\frac{d}{dt} g(x, p, t) = 0
\quad&\iff\quad x - \frac{pe^t}{1 - p + pe^t} = 0 \\
&\iff\quad pe^t = x - xp + xpe^t \\
&\iff\quad (1 - x)pe^t = (1 - p)x \\
&\iff\quad e^t = \frac{(1 - p)x}{(1 - x)p} \\
&\iff\quad t = \ln\frac{(1 - p)x}{(1 - x)p}.
\end{align*}

b) Show that, when $0 < x, p < 1$, we have $F(x, p) - 2(x - p)^2 \ge 0$. (Hint: Take the second derivative of $F(x, p) - 2(x - p)^2$ with respect to $x$.)

Let $f(x, p) = F(x, p) - 2(x - p)^2$. Now the first derivative of $f$ with respect to $x$ is
\begin{align*}
\frac{d}{dx} f(x, p) &= \frac{d}{dx}\left(x \ln\frac{x}{p} + (1 - x)\ln\frac{1 - x}{1 - p} - 2(x - p)^2\right) \\
&= \ln\frac{x}{p} + x \cdot \frac{1}{x} - \ln\frac{1 - x}{1 - p} - (1 - x) \cdot \frac{1}{1 - x} - 4(x - p) \\
&= \ln\frac{x}{p} - \ln\frac{1 - x}{1 - p} - 4(x - p),
\end{align*}
and the second derivative is
\begin{align*}
\frac{d^2}{dx^2} f(x, p) &= \frac{d}{dx}\left(\ln\frac{x}{p} - \ln\frac{1 - x}{1 - p} - 4(x - p)\right) \\
&= \frac{1}{x} + \frac{1}{1 - x} - 4 \\
&= \frac{(1 - x) + x - 4x(1 - x)}{x(1 - x)} = \frac{(2x - 1)^2}{x(1 - x)},
\end{align*}
which is nonnegative for $0 < x < 1$. Thus, $f(x, p)$ is convex with respect to $x$. By setting $x = p$ the first derivative becomes 0, so this must be the point where $f(x, p)$ is minimized with respect to $x$. As $f(p, p) = 0$, we can conclude that $f(x, p) \ge 0$ for all $0 < x, p < 1$.

c) Using parts (a) and (b), argue that
\[ \Pr(X \ge (p + \varepsilon)n) \le e^{-2n\varepsilon^2}. \]

Assume that $0 < \varepsilon \le 1 - p$. Using first the result from (a) and then the result from (b), we get
\[ \Pr(X \ge (p + \varepsilon)n) \le e^{-nF(p + \varepsilon,\, p)} \le e^{-2n(p + \varepsilon - p)^2} = e^{-2n\varepsilon^2}. \]

d) Use symmetry to argue that
\[ \Pr(X \le (p - \varepsilon)n) \le e^{-2n\varepsilon^2}, \]
and conclude that
\[ \Pr(|X - pn| \ge \varepsilon n) \le 2e^{-2n\varepsilon^2}. \]

Let $Y_i = 1 - X_i$ and $Y = \sum_{i=1}^n Y_i = n - X$. Thus, the $Y_i$ are independent Poisson trials such that $\Pr(Y_i = 1) = 1 - p$ and $E[Y] = (1 - p)n$. Now, by applying the result from (c) to $Y$, we get
\[ \Pr(X \le (p - \varepsilon)n) = \Pr(Y \ge ((1 - p) + \varepsilon)n) \le e^{-2n\varepsilon^2}. \]
Finally, using the above and (c) again, we obtain
\begin{align*}
\Pr(|X - pn| \ge \varepsilon n) &= \Pr\left(X \ge (p + \varepsilon)n \,\cup\, X \le (p - \varepsilon)n\right) \\
&= \Pr(X \ge (p + \varepsilon)n) + \Pr(X \le (p - \varepsilon)n) \\
&\le 2e^{-2n\varepsilon^2}.
\end{align*}
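The final two-sided bound can be compared against exact binomial tail probabilities, reusing the numbers of Exercise 4.4 (a quick check; function names are ours):

```python
import math

def binom_tail_ge(n, p, m):
    """Exact Pr(X >= m) for X ~ B(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(m, n + 1))

# Two-sided bound from part d): Pr(|X - pn| >= eps*n) <= 2 exp(-2 n eps^2).
# Here p = 1/2, n = 100, eps = 0.05, i.e. |X - 50| >= 5 -- the
# "55 or more heads in 100 flips" setting of Exercise 4.4.
n, p, eps = 100, 0.5, 0.05
upper = binom_tail_ge(n, p, 55)
lower = binom_tail_ge(n, 1 - p, 55)        # Pr(X <= 45) by symmetry
assert upper + lower <= 2 * math.exp(-2 * n * eps**2)
assert abs(upper - 0.1841) < 5e-4          # matches the value in Exercise 4.4
```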