Randomized Algorithms I, Spring 2016, Department of Computer Science, University of Helsinki
Homework 2: Solutions (Discussed February 12, 2016)
1. Exercise 3.18: Show that, for a random variable X with standard deviation σ[X] and any positive real
number t:

a)
\[ \Pr(X - E[X] \ge t\,\sigma[X]) \le \frac{1}{1 + t^2}; \]
First note that if σ[X] = 0, then X = E[X] with probability 1 and thus Pr(X − E[X] ≥ tσ[X]) = 1 regardless of t. Hence, if σ[X] = 0, the claim does not hold for any t > 0.
Now assume that σ[X] > 0. Let us define the normalized random variable
\[ Y = \frac{X - E[X]}{\sigma[X]}, \]
for which we have
\[ E[Y] = \frac{E[X] - E[X]}{\sigma[X]} = 0 \]
and
\[ E[Y^2] = \frac{E[(X - E[X])^2]}{\sigma^2[X]} = \frac{\sigma^2[X]}{\sigma^2[X]} = 1. \]
Now, since t > 0, the following statements are equivalent:
\[ X - E[X] \ge t\,\sigma[X]
\;\Leftrightarrow\; Y \ge t
\;\Leftrightarrow\; tY \ge t^2
\;\Leftrightarrow\; 1 + tY \ge 1 + t^2. \]
Therefore, we get
\begin{align*}
\Pr(X - E[X] \ge t\,\sigma[X]) &= \Pr(1 + tY \ge 1 + t^2) \\
&\le \Pr(|1 + tY| \ge 1 + t^2) \\
&= \Pr\big((1 + tY)^2 \ge (1 + t^2)^2\big) \\
&\le \frac{E[(1 + tY)^2]}{(1 + t^2)^2} \\
&= \frac{E[1 + 2tY + t^2 Y^2]}{(1 + t^2)^2} \\
&= \frac{1 + 2t\,E[Y] + t^2 E[Y^2]}{(1 + t^2)^2} \\
&= \frac{1 + t^2}{(1 + t^2)^2} = \frac{1}{1 + t^2},
\end{align*}
where the second inequality follows from Markov's inequality.
b)
\[ \Pr(|X - E[X]| \ge t\,\sigma[X]) \le \frac{2}{1 + t^2}. \]
By dividing the event |X − E[X]| ≥ tσ[X] into two disjoint events and using the result from a) for both random variables X and −X (note that σ[−X] = σ[X]), we get
\begin{align*}
\Pr(|X - E[X]| \ge t\,\sigma[X]) &= \Pr\big(X - E[X] \ge t\,\sigma[X] \,\cup\, {-X} + E[X] \ge t\,\sigma[X]\big) \\
&= \Pr(X - E[X] \ge t\,\sigma[X]) + \Pr(-X + E[X] \ge t\,\sigma[X]) \\
&= \Pr(X - E[X] \ge t\,\sigma[X]) + \Pr(-X - E[-X] \ge t\,\sigma[X]) \\
&\le \frac{1}{1 + t^2} + \frac{1}{1 + t^2} = \frac{2}{1 + t^2}.
\end{align*}
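Both bounds are easy to sanity-check numerically. The following Python sketch (not part of the original solution) estimates the two tail probabilities by Monte Carlo for an arbitrarily chosen test distribution, Exponential(1); the distribution and sample size are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary test distribution (assumption): Exponential(1), so E[X] = 1, sigma[X] = 1.
samples = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = samples.mean(), samples.std()

for t in (0.5, 1.0, 2.0):
    one_sided = np.mean(samples - mu >= t * sigma)           # Pr(X - E[X] >= t sigma)
    two_sided = np.mean(np.abs(samples - mu) >= t * sigma)   # Pr(|X - E[X]| >= t sigma)
    print(f"t={t}: {one_sided:.4f} <= {1/(1+t**2):.4f},  "
          f"{two_sided:.4f} <= {2/(1+t**2):.4f}")
```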
2. Exercise 3.21: A fixed point of a permutation π : [1, n] → [1, n] is a value for which π(x) = x. Find the
variance in the number of fixed points of a permutation chosen uniformly at random from all permutations.
Let Xi = 1 if π(i) = i and Xi = 0 otherwise. The number of fixed points is X = ∑_{i=1}^n Xi. A permutation with fixed point i can be obtained by fixing π(i) = i and permuting the remaining elements in one of (n − 1)! orders. Thus
\[ \Pr(X_i = 1) = \frac{(n-1)!}{n!} = \frac{1}{n}. \]
Furthermore, if i ≠ j, then
\[ \Pr(X_i = 1 \cap X_j = 1) = \frac{(n-2)!}{n!} = \frac{1}{n(n-1)}. \]
Using the above equations we can compute the first two moments of X:
\[ E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \Pr(X_i = 1) = n \cdot \frac{1}{n} = 1 \]
and
\begin{align*}
E[X^2] &= E\left[ \left( \sum_{i=1}^{n} X_i \right)^2 \right]
        = E\left[ \sum_{i=1}^{n} X_i^2 + \sum_{i \ne j} X_i X_j \right] \\
&= \sum_{i=1}^{n} E[X_i^2] + \sum_{i \ne j} E[X_i X_j]
 = \sum_{i=1}^{n} \Pr(X_i = 1) + \sum_{i \ne j} \Pr(X_i = 1 \cap X_j = 1) \\
&= n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n(n-1)} = 1 + 1 = 2,
\end{align*}
where the fourth equality follows from the fact that Xi is either 0 or 1. Therefore, we get
\[ \operatorname{Var}[X] = E[X^2] - E[X]^2 = 2 - 1 = 1. \]
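As a quick empirical check (not part of the original solution), the sketch below estimates the mean and the variance of the number of fixed points of a uniformly random permutation; the values of n and the number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 100_000                      # arbitrary sizes (assumption)

# Random permutations of {0, ..., n-1}; counting positions i with pi(i) = i is equivalent.
perms = np.array([rng.permutation(n) for _ in range(trials)])
fixed_points = (perms == np.arange(n)).sum(axis=1)

print(fixed_points.mean(), fixed_points.var())   # both close to 1
```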
3. Exercise 3.24: Generalize the median-finding algorithm to find the kth largest item in a set of n items for
any given value of k. Prove that your resulting algorithm is correct, and bound its running time.
Let S(i) denote the ith smallest element of the set S. We modify Algorithm 3.1 as follows. The input consists of the set S (whose size is denoted by |S| = n) and a positive integer k. The following steps are changed:

3. Let a = ⌊(k/n)n^{3/4} − √n⌋. Set d = R(a) if a ≥ 1 and d = S(1) otherwise.

4. Let b = ⌈(k/n)n^{3/4} + √n⌉. Set u = R(b) if b ≤ n^{3/4} and u = S(n) otherwise.

6. If ℓd ≥ k or ℓu > n − k, then FAIL.

8. Output C(k−ℓd).
As in the original algorithm, steps 3 and 4 choose the lower and upper bounds for the elements in C. If the condition in step 6 is true, then C does not contain S(k) and the algorithm fails. Otherwise, if C is small enough, it is sorted and the correct element is returned in step 8.

The analysis of Theorem 3.9 in the book also holds for this modified algorithm, since the size of C is guaranteed to be small enough by step 7, and step 6 guarantees that C contains S(k) whenever the algorithm does not fail. Also, in steps 3 and 4, finding the minimum and maximum elements S(1) and S(n) takes linear time. Thus, the algorithm either returns the correct element or fails, and it always stops in linear time.
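For concreteness, here is a minimal Python sketch of the modified selection algorithm (not part of the original solution; the function and variable names are ours, and duplicate elements are not handled carefully). It returns the kth smallest element of S, or None on FAIL.

```python
import math
import random

def randomized_select(S, k):
    """Return the kth smallest element of S (1-indexed), or None on FAIL."""
    n = len(S)
    r = int(n ** 0.75)                       # sample size, roughly n^(3/4)
    R = sorted(random.choices(S, k=r))       # sample with replacement, then sort

    a = math.floor(k / n * n ** 0.75 - math.sqrt(n))
    b = math.ceil(k / n * n ** 0.75 + math.sqrt(n))
    d = R[a - 1] if a >= 1 else min(S)       # step 3
    u = R[b - 1] if b <= r else max(S)       # step 4

    C  = [x for x in S if d <= x <= u]       # step 5
    ld = sum(x < d for x in S)
    lu = sum(x > u for x in S)

    if ld >= k or lu > n - k:                # step 6: S_(k) is not in C
        return None
    if len(C) > 4 * n ** 0.75:               # step 7: C is too large
        return None
    return sorted(C)[k - ld - 1]             # step 8: output C_(k - ld)

# Example: randomized_select(list(range(1, 10001)), 42) usually returns 42,
# and returns None (FAIL) only with small probability.
```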
To be precise, the above proof (the analogue of Theorem 3.9 for the modified algorithm) was all that was asked for in the exercise description. However, the same upper bound O(n^{−1/4}) for the failure probability should also hold for this modified algorithm. An outline of the proof follows.

If a < 1, then d = S(1) and ℓd = 0 < k. Likewise, if b > n^{3/4}, then u = S(n) and ℓu = 0 ≤ n − k. These events cannot make the algorithm fail in step 6. Thus, as in the original analysis, the algorithm fails only if at least one of the following three events occurs:
\[ E_1: \; Y_1 = |\{ r \in R \mid r \le S_{(k)} \}| < k n^{-1/4} - \sqrt{n} \]
\[ E_2: \; Y_2 = |\{ r \in R \mid r \ge S_{(k)} \}| < (n - k) n^{-1/4} - \sqrt{n} \]
\[ E_3: \; |C| > 4 n^{3/4} \]
The proof of Lemma 3.11 requires only small changes. Let Xi be an indicator for the event that the ith sample is less than or equal to S(k). Then
\[ \Pr(X_i = 1) = \frac{k}{n}. \]
For Y1 = ∑_{i=1}^{n^{3/4}} Xi we obtain
\[ E[Y_1] = n^{3/4} \cdot \frac{k}{n} = k n^{-1/4} \]
and
\[ \operatorname{Var}[Y_1] = n^{3/4} \cdot \frac{k}{n}\left(1 - \frac{k}{n}\right) \le \frac{1}{4} n^{3/4}. \]
Using Chebyshev's inequality we then get
\begin{align*}
\Pr(E_1) &= \Pr(Y_1 - k n^{-1/4} < -\sqrt{n}) \\
&\le \Pr(|Y_1 - E[Y_1]| > \sqrt{n}) \\
&\le \frac{\operatorname{Var}[Y_1]}{n} \le \frac{1}{4} n^{-1/4}.
\end{align*}
The same bound can be obtained similarly for the probability of E2 .
As above, the analysis of the event E3 is similar to the one in the book. We just need to replace "median" with "S(k)" and "1/2" with k/n or 1 − k/n in most places. As a result, Theorem 3.13 also holds for the modified algorithm, and the probability that the algorithm fails is O(n^{−1/4}).
4. Exercise 4.3:
a) Determine the moment generating function for the binomial random variable B(n, p).
Using the definition of MX(t) we get
\begin{align*}
M_X(t) = E[e^{tX}] &= \sum_{k=0}^{n} e^{tk} \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} (e^t p)^k (1 - p)^{n-k} \\
&= (e^t p + 1 - p)^n,
\end{align*}
where the last equality follows from the binomial theorem:
\[ \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} = (x + y)^n. \]
b) Let X be a B(n, p) random variable and Y a B(m, p) random variable, where X and Y are independent. Use part (a) to determine the moment generating function of X + Y.

Since X and Y are independent, using Theorem 4.3 from the book we have
\begin{align*}
M_{X+Y}(t) &= M_X(t) M_Y(t) \\
&= (e^t p + 1 - p)^n (e^t p + 1 - p)^m = (e^t p + 1 - p)^{n+m}.
\end{align*}
c) What can we conclude from the form of the moment generating function of X + Y?

As the moment generating function uniquely determines the distribution of a random variable (Theorem 4.2 in the book), we see that X + Y ∼ B(n + m, p). Thus, we can conclude that a sum of independent binomially distributed random variables with the same success probability p is also binomially distributed.
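This conclusion is easy to verify numerically. The snippet below (an illustration only; the parameters and the use of SciPy are our assumptions) convolves the probability mass functions of B(n, p) and B(m, p) and compares the result with the pmf of B(n + m, p).

```python
import numpy as np
from scipy.stats import binom

n, m, p = 12, 7, 0.3                   # arbitrary parameters chosen for illustration

# The pmf of X + Y for independent X and Y is the convolution of their pmfs ...
pmf_sum = np.convolve(binom.pmf(np.arange(n + 1), n, p),
                      binom.pmf(np.arange(m + 1), m, p))

# ... and it coincides with the pmf of B(n + m, p).
print(np.allclose(pmf_sum, binom.pmf(np.arange(n + m + 1), n + m, p)))  # True
```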
5. Exercise 4.4: Determine the probability of obtaining 55 or more heads when flipping a fair coin 100 times
by an explicit calculation, and compare this with the Chernoff bound. Do the same for 550 or more heads
in 1000 flips.
Let X ∼ Bin(100, 1/2) be the number of heads for 100 flips. The expectation of X is then µ = E[X] = 50.
The exact probability of getting 55 or more heads is
\[ \Pr(X \ge 55) = \sum_{k=55}^{100} \binom{100}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{100-k} \approx 0.1841. \]
We can obtain several different Chernoff bounds for the same distribution. Using the first bound given by Theorem 4.4 in the book gives us
\[ \Pr(X \ge 55) = \Pr(X \ge (1 + 0.1) \cdot 50) \le \left( \frac{e^{0.1}}{(1 + 0.1)^{1 + 0.1}} \right)^{50} \approx 0.7850. \]
On the other hand, Corollary 4.9 from the book gives a slightly better bound:
\[ \Pr(X \ge 55) = \Pr(X \ge (1 + 0.1) \cdot 50) \le e^{-0.1^2 \cdot 50} \approx 0.6065. \]
Comparing with the exact value, we observe that the first bound is about 4.3 times larger and the second bound is about 3.3 times larger.
Similarly, for 1000 flips the number of heads is X ∼ Bin(1000, 1/2), and the exact probability is
\[ \Pr(X \ge 550) = \sum_{k=550}^{1000} \binom{1000}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{1000-k} \approx 8.6527 \cdot 10^{-4}. \]
The first bound gives
\[ \Pr(X \ge 550) = \Pr(X \ge (1 + 0.1) \cdot 500) \le \left( \frac{e^{0.1}}{(1 + 0.1)^{1 + 0.1}} \right)^{500} \approx 0.0889, \]
and the second gives
\[ \Pr(X \ge 550) = \Pr(X \ge (1 + 0.1) \cdot 500) \le e^{-0.1^2 \cdot 500} \approx 0.0067. \]
This time the first bound is about 103 times larger than the exact value and the second bound is about 7.8 times larger. Thus, the relative error of the bounds seems to grow as the number of flips increases.
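The numbers above can be reproduced with a few lines of Python. The script below (an illustration only, assuming Python 3.8+ for math.comb) computes the exact tails and the two bounds in exactly the form used above.

```python
from math import comb, exp

def exact_tail(n, threshold):
    """Pr(X >= threshold) for X ~ Bin(n, 1/2), computed directly."""
    return sum(comb(n, k) for k in range(threshold, n + 1)) / 2 ** n

for n, threshold in [(100, 55), (1000, 550)]:
    mu, delta = n / 2, 0.1
    bound1 = (exp(delta) / (1 + delta) ** (1 + delta)) ** mu   # first Chernoff bound
    bound2 = exp(-delta ** 2 * mu)                             # second bound used above
    print(n, exact_tail(n, threshold), bound1, bound2)
```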
6. Exercise 4.8: We show how to construct a random permutation π on [1, n], given a black box that outputs numbers independently and uniformly at random from [1, k] where k ≥ n. If we compute a function f : [1, n] → [1, k] with f(i) ≠ f(j) for i ≠ j, this yields a permutation; simply output the numbers [1, n] according to the order of the f(i) values. To construct such a function f, do the following for j = 1, . . . , n: choose f(j) by repeatedly obtaining numbers from the black box and setting f(j) to the first number found such that f(j) ≠ f(i) for i < j.

Prove that this approach gives a permutation chosen uniformly at random from all permutations. Find the expected number of calls to the black box that are needed when k = n and k = 2n. For the case k = 2n, argue that the probability that each call to the black box assigns a value of f(j) to some j is at least 1/2. Based on this, use a Chernoff bound to bound the probability that the number of calls to the black box is at least 4n.
The process produces a sequence (f(1), f(2), . . . , f(n)), where f(i) ∈ {1, . . . , k} and f(i) ≠ f(j) for i ≠ j. There are $\binom{k}{n} n!$ such sequences, as the set {f(1), f(2), . . . , f(n)} can be selected in $\binom{k}{n}$ ways out of k elements and the elements in the set can be ordered in n! ways. Clearly each such sequence has equal probability, $1 / \big( \binom{k}{n} n! \big)$. The produced sequence maps to a permutation π. For a fixed π, only one ordering of the elements of each possible set {f(1), f(2), . . . , f(n)} produces π. Thus, there are $\binom{k}{n}$ different sequences that produce π, and the probability of π is
\[ \Pr(\pi) = \frac{\binom{k}{n}}{\binom{k}{n}\, n!} = \frac{1}{n!}, \]
which is the same for all permutations π. So permutations are chosen uniformly at random.
Next, we want to find the expected number of calls needed to construct f. Denote this by X and let Xi be the number of calls required to choose f(i) when f(1), . . . , f(i − 1) have already been chosen. Thus, we have X = ∑_{i=1}^n Xi, and by linearity of expectation E[X] = ∑_{i=1}^n E[Xi]. To find E[Xi] we note that when choosing f(i), there are k − i + 1 free numbers left and thus each call to the black box has probability (k − i + 1)/k of choosing a free number. Hence, Xi is geometrically distributed and its expectation is k/(k − i + 1). The expected number of calls to the black box is then
\[ E[X] = \sum_{i=1}^{n} \frac{k}{k - i + 1} = k \big( H(k) - H(k - n) \big), \]
where H(i) is the ith harmonic number. For the case k = n, this becomes
\[ E[X] = n H(n) = \Theta(n \ln n), \]
which is exactly the coupon collector's problem. When k = 2n, the expectation is
\[ E[X] = 2n \big( H(2n) - H(n) \big) = 2n \big( \ln(2n) - \ln n + O(1/n) \big) = 2n \ln 2 + O(1) = \Theta(n). \]
Thus, a larger k seems to make the algorithm faster, which is of course intuitive.
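To make the comparison concrete, here is a small Monte Carlo sketch (not part of the original solution; the sizes are arbitrary) that runs the rejection-sampling construction and compares the average number of black-box calls with k(H(k) − H(k − n)).

```python
import random

def black_box_calls(n, k):
    """Run the construction of f and count the calls to the black box."""
    used, calls = set(), 0
    for _ in range(n):
        while True:
            calls += 1
            v = random.randint(1, k)             # one call to the black box
            if v not in used:
                used.add(v)
                break
    return calls

n, trials = 200, 2000                            # arbitrary sizes (assumption)
for k in (n, 2 * n):
    avg = sum(black_box_calls(n, k) for _ in range(trials)) / trials
    expected = k * sum(1 / i for i in range(k - n + 1, k + 1))   # k (H(k) - H(k - n))
    print(f"k={k}: simulated {avg:.1f}, formula {expected:.1f}")
```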
Now, when k = 2n, for any j = 1, . . . , n the probability that a call to the black box assigns a value to f(j) is
\[ \frac{2n - j + 1}{2n} \ge \frac{2n - n + 1}{2n} \ge \frac{1}{2}. \]
We want to use this fact together with a Chernoff bound to bound the probability that the number of calls to the black box is at least 4n. The problem is that the bounds given in the book so far are for sums of independent Poisson trials, but the Xi are geometric random variables. Therefore, we introduce a random variable Yi, which is 1 if the ith call to the black box is successful and 0 otherwise. Now, for k calls the number of successful calls is Y(k) = ∑_{i=1}^k Yi. The algorithm stops once there are n successful calls. At least 4n calls are needed if and only if Y(4n−1) ≤ n − 1. Thus, we want to bound Pr(Y(4n−1) ≤ n − 1). Since the Yi are Poisson trials, we are a step closer to using a Chernoff bound on this probability.

The remaining problem is that the Yi are not independent, as the probability of the event Yi = 1 depends on the results of the previous i − 1 calls. To fix this, we use the above observation that this probability is always at least 1/2. Let Zi be a random variable that is 1 with probability 1/2 and 0 with probability 1/2, and let Z(k) = ∑_{i=1}^k Zi. Now, since the Yi are 1 at least as often as the Zi, it is intuitively clear that Pr(Y(k) ≤ n − 1) ≤ Pr(Z(k) ≤ n − 1) for any k. To actually prove this, we note that since Pr(Zi = 0) ≥ Pr(Yi = 0) for all i, we can define the Zi so that Zi = 0 whenever Yi = 0. With such Zi we always have Zi ≤ Yi, and thus Z(k) ≤ Y(k) for all k, and as a result Pr(Y(k) ≤ n − 1) ≤ Pr(Z(k) ≤ n − 1).
Now, since the Zi are independent Bernoulli trials, we can finally use the Chernoff bound of Corollary 4.10 from the book. Since E[Z(k)] = k E[Zi] = k/2, for 0 ≤ h < k/2 we get
\begin{align*}
\Pr(Z_{(k)} \le h) &= \Pr\!\left( Z_{(k)} \le E[Z_{(k)}] - \frac{k - 2h}{2} \right) \\
&\le \exp\!\left( -\frac{2}{k} \left( \frac{k - 2h}{2} \right)^{2} \right)
 = \exp\!\left( -\frac{(k - 2h)^2}{2k} \right).
\end{align*}
By setting k = 4n − 1 and h = n − 1 we then have
\[ \Pr(Z_{(4n-1)} \le n - 1) \le \exp\!\left( -\frac{(2n + 1)^2}{8n - 2} \right). \]
Since (2n + 1)²/(8n − 2) ≥ n/2, we get the simplified bound Pr(Z(4n−1) ≤ n − 1) ≤ exp(−n/2) ≈ 0.61^n.
—————
Instead of introducing independent Bernoulli random variables Zi, we could have derived a Chernoff bound directly for the sum X of the geometrically distributed random variables Xi. Using Markov's inequality, we get
\[ \Pr(X \ge 4n) = \Pr(e^{tX} \ge e^{4tn}) \le \frac{E[e^{tX}]}{e^{4tn}} \]
for any t > 0. Since X = ∑_{i=1}^n Xi and the Xi are independent, the moment generating function of X decomposes into a product, E[e^{tX}] = ∏_{i=1}^n E[e^{tXi}]. Now, as Xi ∼ Geom(pi) where pi ≥ 1/2 (as we saw earlier), the moment generating function of Xi is
\[ E[e^{tX_i}] = \sum_{k=1}^{\infty} e^{tk} p_i (1 - p_i)^{k-1}
 = p_i e^{t} \sum_{k=0}^{\infty} \big( e^{t} (1 - p_i) \big)^{k}
 = \frac{p_i e^{t}}{1 - (1 - p_i) e^{t}}, \]
where in the last step we simply evaluated the geometric sum. Note that the above geometric sum, and thus E[e^{tXi}], converges only for t < −ln(1 − pi). In particular, as pi ≥ 1/2, all values 0 < t < ln 2 are valid for every pi.
Taking the derivative with respect to pi we get
\[ \frac{d}{d p_i} E[e^{tX_i}] = \frac{d}{d p_i} \frac{p_i e^{t}}{1 - (1 - p_i) e^{t}}
 = \frac{e^{t} (1 - e^{t})}{\big( 1 - (1 - p_i) e^{t} \big)^{2}}, \]
which is always negative since t > 0. Thus, E[e^{tXi}] is decreasing with respect to pi, and since pi ≥ 1/2, we get an upper bound for E[e^{tXi}] by setting pi = 1/2. That is, we have
\[ E[e^{tX_i}] \le \frac{e^{t}/2}{1 - e^{t}/2}, \]
and further, combining this with the above observations, we have
\[ \Pr(X \ge 4n) \le \frac{E[e^{tX}]}{e^{4tn}}
 = \frac{\prod_{i=1}^{n} E[e^{tX_i}]}{e^{4tn}}
 \le \frac{\left( \frac{e^{t}/2}{1 - e^{t}/2} \right)^{n}}{e^{4tn}}
 = \left( \frac{1}{2 (1 - e^{t}/2)\, e^{3t}} \right)^{n}, \]
for all 0 < t < ln 2. Now we want to minimize the right-hand side with respect to t to get a tight bound. As the logarithm is an increasing function, we can instead minimize its logarithm, that is,
\[ \ln \left( \frac{1}{2 (1 - e^{t}/2)\, e^{3t}} \right)^{n}
 = -n \left( \ln 2 + \ln\!\left( 1 - \frac{e^{t}}{2} \right) + 3t \right). \]
To find the minimum we find the root of the derivative with respect to t:
\begin{align*}
\frac{d}{dt} \left[ -n \left( \ln 2 + \ln\!\left( 1 - \frac{e^{t}}{2} \right) + 3t \right) \right] = 0
\quad &\Leftrightarrow \quad -n \left( -\frac{e^{t}}{2 - e^{t}} + 3 \right) = 0 \\
&\Leftrightarrow \quad t = \ln \frac{3}{2}.
\end{align*}
This is a valid value of t since 0 < ln(3/2) < ln 2. By plugging this value into the obtained upper bound, we get
\[ \Pr(X \ge 4n) \le \left( \frac{1}{2 \big(1 - (3/2)/2\big) (3/2)^{3}} \right)^{n} = \left( \frac{16}{27} \right)^{n} \approx 0.59^{n}, \]
which is a slightly tighter bound than the one obtained with the first method.
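For a rough numerical comparison (not part of the original solution; n and the number of trials are arbitrary), one can estimate Pr(X ≥ 4n) directly by simulating the k = 2n construction and print the estimate next to the two analytic bounds exp(−n/2) and (16/27)^n.

```python
import random
from math import exp

def black_box_calls(n):
    """Black-box calls needed to build f when k = 2n."""
    used, calls = set(), 0
    for _ in range(n):
        while True:
            calls += 1
            v = random.randint(1, 2 * n)
            if v not in used:
                used.add(v)
                break
    return calls

n, trials = 5, 200_000                         # small n so the rare event is visible
estimate = sum(black_box_calls(n) >= 4 * n for _ in range(trials)) / trials
print(estimate, exp(-n / 2), (16 / 27) ** n)   # the estimate is below both bounds
```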
7. Exercise 4.13: Let X1, . . . , Xn be independent Poisson trials such that Pr(Xi = 1) = p. Let X = ∑_{i=1}^n Xi, so that E[X] = pn. Let
\[ F(x, p) = x \ln\frac{x}{p} + (1 - x) \ln\frac{1 - x}{1 - p}. \]
a) Show that, for 1 ≥ x > p,
\[ \Pr(X \ge xn) \le e^{-n F(x, p)}. \]
Using Markov's inequality, we have for all t > 0 that
\[ \Pr(X \ge xn) = \Pr(e^{tX} \ge e^{txn}) \le \frac{E[e^{tX}]}{e^{txn}}. \]
Since X is a sum of independent and identically distributed Bernoulli trials Xi, its moment generating function is
\[ E[e^{tX}] = \prod_{i=1}^{n} E[e^{tX_i}] = \prod_{i=1}^{n} \big( (1 - p) + p e^{t} \big) = \big( 1 - p + p e^{t} \big)^{n}, \]
and therefore
\begin{align*}
\Pr(X \ge xn) &\le \frac{(1 - p + p e^{t})^{n}}{e^{txn}}
 = \left( \frac{1 - p + p e^{t}}{e^{tx}} \right)^{n} \\
&= \exp\!\left( n \ln \frac{1 - p + p e^{t}}{e^{tx}} \right)
 = \exp\!\left( -n \big( tx - \ln(1 - p + p e^{t}) \big) \right),
\end{align*}
for all t > 0. This is of the required form. Let g(x, p, t) = tx − ln(1 − p + pe^t). Now we just need to show that for some choice of t we have g(x, p, t) = F(x, p), and the proof is complete. This is obtained by choosing
\[ t = \ln \frac{(1 - p)x}{(1 - x)p} > \ln \frac{(1 - p)p}{(1 - p)p} = 0 \]
(the inequality holds because x > p and 1 − x < 1 − p),
as then we have
\begin{align*}
g(x, p, t) &= x \ln \frac{(1 - p)x}{(1 - x)p} - \ln\!\left( 1 - p + p \cdot \frac{(1 - p)x}{(1 - x)p} \right) \\
&= x \ln \frac{(1 - p)x}{(1 - x)p} - \ln \frac{1 - p}{1 - x} \\
&= x \ln \frac{x}{p} + (1 - x) \ln \frac{1 - x}{1 - p} = F(x, p).
\end{align*}
The above choice of t can be obtained by making the bound exp(−n g(x, p, t)) as tight as possible, that is, by maximizing g(x, p, t) with respect to t. As usual, the maximum is found by finding a root of the derivative of g(x, p, t) with respect to t:
\begin{align*}
\frac{d}{dt} g(x, p, t) = 0
\quad &\Leftrightarrow \quad x - \frac{p e^{t}}{1 - p + p e^{t}} = 0 \\
&\Leftrightarrow \quad p e^{t} = x - xp + xp e^{t} \\
&\Leftrightarrow \quad (1 - x) p e^{t} = (1 - p) x \\
&\Leftrightarrow \quad e^{t} = \frac{(1 - p)x}{(1 - x)p}
\quad \Leftrightarrow \quad t = \ln \frac{(1 - p)x}{(1 - x)p}.
\end{align*}
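As a quick numerical spot-check of this identity and of the maximizing choice of t (not part of the original solution; the values of x and p are arbitrary):

```python
from math import log, exp

def g(x, p, t):
    """The exponent t*x - ln(1 - p + p*e^t) from the Markov step above."""
    return t * x - log(1 - p + p * exp(t))

def F(x, p):
    """The function F(x, p) from the exercise."""
    return x * log(x / p) + (1 - x) * log((1 - x) / (1 - p))

x, p = 0.7, 0.2                              # arbitrary values with 1 >= x > p
t_star = log((1 - p) * x / ((1 - x) * p))    # the maximizing t derived above

print(g(x, p, t_star), F(x, p))              # the two values agree up to rounding
print(g(x, p, t_star - 0.1) < g(x, p, t_star) > g(x, p, t_star + 0.1))  # True
```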
b) Show that, when 0 < x, p < 1, we have F(x, p) − 2(x − p)² ≥ 0. (Hint: Take the second derivative of F(x, p) − 2(x − p)² with respect to x.)
Let f(x, p) = F(x, p) − 2(x − p)². Now the first derivative of f with respect to x is
\begin{align*}
\frac{d}{dx} f(x, p) &= \frac{d}{dx} \left[ x \ln \frac{x}{p} + (1 - x) \ln \frac{1 - x}{1 - p} - 2(x - p)^2 \right] \\
&= \ln \frac{x}{p} + x \cdot \frac{1}{x} - \ln \frac{1 - x}{1 - p} - (1 - x) \cdot \frac{1}{1 - x} - 4(x - p) \\
&= \ln \frac{x}{p} - \ln \frac{1 - x}{1 - p} - 4(x - p),
\end{align*}
and the second derivative is
\begin{align*}
\frac{d^2}{dx^2} f(x, p) &= \frac{d}{dx} \left[ \ln \frac{x}{p} - \ln \frac{1 - x}{1 - p} - 4(x - p) \right] \\
&= \frac{1}{x} + \frac{1}{1 - x} - 4 \\
&= \frac{(1 - x) + x - 4x(1 - x)}{x(1 - x)} = \frac{(2x - 1)^2}{x(1 - x)},
\end{align*}
which is nonnegative for 0 < x < 1. Thus, f(x, p) is convex with respect to x. Setting x = p makes the first derivative 0, so this must be the point where f(x, p) is minimized with respect to x. As f(p, p) = 0, we can conclude that f(x, p) ≥ 0 for all 0 < x, p < 1.
c) Using parts (a) and (b), argue that
\[ \Pr(X \ge (p + \varepsilon)n) \le e^{-2n\varepsilon^2}. \]
Assume that 0 < ε ≤ 1 − p (if ε > 1 − p, the left-hand side is 0 and the bound is trivial). Using first the result from (a) and then the result from (b), we get
\[ \Pr(X \ge (p + \varepsilon)n) \le e^{-n F(p + \varepsilon,\, p)} \le e^{-2n(p + \varepsilon - p)^2} = e^{-2n\varepsilon^2}. \]
d) Use symmetry to argue that
\[ \Pr(X \le (p - \varepsilon)n) \le e^{-2n\varepsilon^2}, \]
and conclude that
\[ \Pr(|X - pn| \ge \varepsilon n) \le 2 e^{-2n\varepsilon^2}. \]
Let Yi = 1 − Xi and Y = ∑_{i=1}^n Yi = n − X. Then the Yi are independent Poisson trials such that Pr(Yi = 1) = 1 − p and E[Y] = (1 − p)n. Now, by applying the result from (c) to Y, we get
\[ \Pr(X \le (p - \varepsilon)n) = \Pr(Y \ge ((1 - p) + \varepsilon)n) \le e^{-2n\varepsilon^2}. \]
Finally, using the above and (c) again, we obtain
\begin{align*}
\Pr(|X - pn| \ge \varepsilon n) &= \Pr(X \ge (p + \varepsilon)n \,\cup\, X \le (p - \varepsilon)n) \\
&= \Pr(X \ge (p + \varepsilon)n) + \Pr(X \le (p - \varepsilon)n) \\
&\le 2 e^{-2n\varepsilon^2}.
\end{align*}
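Finally, a short numerical illustration of the bounds in (c) and (d), not part of the original solution; the parameters n, p, and ε are arbitrary, and Python 3.8+ is assumed for math.comb.

```python
from math import comb, exp

def tail_ge(n, p, m):
    """Pr(X >= m) for X ~ Bin(n, p), computed directly from the pmf."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(m, n + 1))

n, p, eps = 100, 0.3, 0.1                            # arbitrary parameters (assumption)
upper = tail_ge(n, p, round((p + eps) * n))          # Pr(X >= (p + eps) n)
lower = 1 - tail_ge(n, p, round((p - eps) * n) + 1)  # Pr(X <= (p - eps) n)

print(upper, exp(-2 * n * eps ** 2))                 # one-sided: left <= right
print(upper + lower, 2 * exp(-2 * n * eps ** 2))     # two-sided: left <= right
```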