Chapter 2
Discrete Random Variables and Expectation
RandomQuickSort(S)
// a finite S ⊆ N.
1. if |S| = 0 then return the empty list [ ].
2. Choose a pivot a ∈ S uniformly at random.
3. S1 ← {b | b ∈ S and b < a}.
4. L1 ← RandomQuickSort(S1 ).
5. S2 ← {b | b ∈ S and b > a}.
6. L2 ← RandomQuickSort(S2 ).
7. return merge(L1 , a, L2 ).
Here, merge(L1 , a, L2 ) concatenates the two lists into one, placing the pivot a between the elements of L1 and those of L2.
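For concreteness, here is a minimal Python sketch of this procedure (the names randomized_quicksort, smaller, and larger are ours, not part of the pseudocode above).

import random

def randomized_quicksort(S):
    # S is a collection of distinct numbers; returns them as a sorted list.
    S = list(S)
    if len(S) == 0:
        return []
    pivot = random.choice(S)                 # step 2: pivot chosen uniformly at random
    smaller = [b for b in S if b < pivot]    # step 3
    larger = [b for b in S if b > pivot]     # step 5
    return randomized_quicksort(smaller) + [pivot] + randomized_quicksort(larger)  # step 7

print(randomized_quicksort({5, 1, 4, 2, 3}))   # [1, 2, 3, 4, 5]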
The main theorem we will prove in this chapter is:
The average running time of RandomQuickSort(S) is
2 · n · ln n + Θ(n),
where n = |S|.
But first we need more probability theory, at least to give a solid interpretation of “average running time”.
2.1. Random Variables and Expectation
Definition 2.1.1. A random variable X on a sample space Ω is a real-valued function on Ω, that is,
X : Ω → R. A discrete random variable is a random variable that takes on only a finite or countably
infinite number of values, i.e., the set
{X(a) | a ∈ Ω}
is finite or countably infinite.
Remark 2.1.2. In the following, unless stated otherwise, all random variables are discrete. Recall
that we always assume the probability space is discrete.
Definition 2.1.3. Let (Ω, F , Prob) be a discrete probability space and X a random variable on Ω. For
every e ∈ R, let
Prob_{a∈Ω}[X = e] := Σ_{a ∈ Ω with X(a)=e} Prob({a}).
Sometimes, we write Prob[X = e] if the sample space Ω is clear from the context.
Definition 2.1.4. Two random variables X and Y are independent if and only if
Prob[(X = e) ∩ (Y = f)] = Prob[X = e] · Prob[Y = f]
for all e, f ∈ R. Similarly, random variables X1 , X2 , . . . , Xn are mutually independent if and only if, for
any subset I ⊆ [n] and any values ei with i ∈ I, we have
Prob[⋂_{i∈I} (Xi = ei)] = ∏_{i∈I} Prob[Xi = ei].
Definition 2.1.5. The expectation of a discrete random variable X, denoted by E[X], is given by
E[X] = Σ_e e · Prob[X = e],
where e ranges over all values in the range of X. (This is well-defined, since X is discrete.) The
expectation is finite if Σ_e |e| · Prob[X = e] converges; otherwise, it is unbounded.
Lemma 2.1.6.
E[X] = Σ_{a∈Ω} X(a) · Prob({a}).
Proof:
E[X] = Σ_e e · Prob[X = e] = Σ_e e · Σ_{a ∈ Ω with X(a)=e} Prob({a})
     = Σ_e Σ_{a ∈ Ω with X(a)=e} X(a) · Prob({a}) = Σ_{a∈Ω} X(a) · Prob({a}). □
2.1.1. Linearity of Expectations.
Theorem 2.1.7 (Linearity of Expectations). For any finite number of discrete random variables X1 , . . . , Xn
on a probability space, all with finite expectations,
E[Σ_{i∈[n]} Xi] = Σ_{i∈[n]} E[Xi].
Proof: We give the proof for two random variables X1 and X2; the general case follows by induction on n.
E[X1 + X2] = Σ_{e∈R} e · Prob[X1 + X2 = e]
 = Σ_{e1,e2∈R} (e1 + e2) · Prob[(X1 = e1) ∩ (X2 = e2)]
 = Σ_{e1∈R} Σ_{e2∈R} (e1 + e2) · Prob[(X1 = e1) ∩ (X2 = e2)]
 = Σ_{e1∈R} Σ_{e2∈R} e1 · Prob[(X1 = e1) ∩ (X2 = e2)] + Σ_{e1∈R} Σ_{e2∈R} e2 · Prob[(X1 = e1) ∩ (X2 = e2)]
 = Σ_{e1∈R} e1 · Σ_{e2∈R} Prob[(X1 = e1) ∩ (X2 = e2)] + Σ_{e2∈R} e2 · Σ_{e1∈R} Prob[(X1 = e1) ∩ (X2 = e2)]
 = Σ_{e1∈R} e1 · Prob[X1 = e1] + Σ_{e2∈R} e2 · Prob[X2 = e2]    (by the law of total probability)
 = E[X1] + E[X2]. □
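Note that the theorem requires no independence among the Xi. As a small illustration (our own, not from the text), the following sketch estimates E[X1 + X2] where X1 is a fair die roll and X2 = X1 is a fully dependent copy; the empirical average still matches E[X1] + E[X2] = 7.

import random

trials = 100_000
total = 0
for _ in range(trials):
    x1 = random.randint(1, 6)   # a fair die roll
    x2 = x1                     # X2 is completely dependent on X1
    total += x1 + x2
print(total / trials)           # close to E[X1] + E[X2] = 3.5 + 3.5 = 7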
Lemma 2.1.8. Let c ∈ R and X be a discrete random variable. Then
E[c · X] = c · E[X].
Proof: We may assume c ≠ 0, since the case c = 0 is trivial. Then
E[c · X] = Σ_{e∈R} e · Prob[c · X = e]
 = c · Σ_{e∈R} (e/c) · Prob[X = e/c]
 = c · Σ_{e∈R} e · Prob[X = e]
 = c · E[X]. □
Combining Theorem 2.1.7 and Lemma 2.1.8 we have the following.
Corollary 2.1.9. Let ℓ : R → R be a linear function, i.e., for some a, b ∈ R, we have
ℓ(x) = a · x + b.
Then E[ℓ(X)] = ℓ(E[X]).
2.1.2. Jensen’s Inequality.
Definition 2.1.10. A function f : R → R is said to be convex if for any x1 , x2 ∈ R and 0 ≤ λ ≤ 1,
f (λ · x1 + (1 − λ) · x2 ) ≤ λ · f (x1 ) + (1 − λ) · f (x2 ).
Remark 2.1.11. Usually one explicitly requires a convex function to be continuous. In Definition 2.1.10, however, the fact that the domain of f is all of R already implies that f is continuous.
Lemma 2.1.12. Let f : R → R be a function. Then f is convex if and only if for all a ∈ R, there exists a
linear function ℓa such that
- f (a) = ℓa (a), and
- for all b ∈ R we have f (b) ≥ ℓa (b).
Theorem 2.1.13 (Jensen’s Inequality). Let f : R → R be a convex function. Then
E[f (X)] ≥ f (E[X]).
Proof: Let a := E[X] and let ℓa be the linear function given by Lemma 2.1.12. Then
E[f (X)] ≥ E[ℓa (X)] = ℓa (E[X]) = ℓa (a) = f (a) = f (E[X]).
The inequality follows from Lemma 2.1.12 and the monotonicity of the expectation (i.e., if X(a) ≤ Y (a)
for all a ∈ Ω, then E[X] ≤ E[Y ]), while the first equality follows from Corollary 2.1.9. □
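As a small sanity check (our own illustration, not part of the text), take the convex function f(x) = x² and let X be a fair die roll; Jensen's inequality predicts E[X²] ≥ (E[X])².

values = [1, 2, 3, 4, 5, 6]                         # a fair die, each value with probability 1/6
prob = 1 / len(values)
mean = sum(v * prob for v in values)                # E[X] = 3.5
mean_square = sum(v * v * prob for v in values)     # E[X^2] = 91/6 ≈ 15.17
print(mean_square >= mean ** 2)                     # True: 15.17 ≥ 12.25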
2.2. The Bernoulli and Binomial Random Variables
Suppose that we run an experiment that succeeds with probability p and fails with probability 1 − p.
Let Y be a random variable such that
Y = { 1, if the experiment succeeds,
      0, otherwise.
The variable Y is called a Bernoulli or an indicator random variable. Note that, for a Bernoulli random
variable,
E[Y ] = 1 · p + 0 · (1 − p) = p = Prob[Y = 1].
Definition 2.2.1. A binomial random variable X with parameters n and p, denoted by B(n, p), is defined
by the following probability distribution on j = 0, 1, . . . , n:
Prob[X = j] = C(n, j) · p^j · (1 − p)^(n−j),
where C(n, j) = n!/(j! · (n − j)!) denotes the binomial coefficient.
For a binomial random variable X with parameters n and p, we have
E[X] = Σ_{j=0}^{n} j · C(n, j) · p^j · (1 − p)^(n−j)
 = Σ_{j=0}^{n} j · (n! / (j! · (n − j)!)) · p^j · (1 − p)^(n−j)
 = Σ_{j=1}^{n} (n! / ((j − 1)! · (n − j)!)) · p^j · (1 − p)^(n−j)
 = n · p · Σ_{j=1}^{n} ((n − 1)! / ((j − 1)! · ((n − 1) − (j − 1))!)) · p^(j−1) · (1 − p)^((n−1)−(j−1))
 = n · p · Σ_{k=0}^{n−1} ((n − 1)! / (k! · ((n − 1) − k)!)) · p^k · (1 − p)^((n−1)−k)
 = n · p · Σ_{k=0}^{n−1} C(n − 1, k) · p^k · (1 − p)^((n−1)−k)
 = n · p,
where the last step uses the binomial theorem: the final sum equals (p + (1 − p))^(n−1) = 1.
Indeed, there is a much simpler proof for the above fact based on the linearity of expectations. Define
n indicator random variables X1 , . . . , Xn , where for i ∈ [n]
Xi = { 1, if the i-th trial is successful (with probability p),
       0, otherwise.
Now it is easy to see that
X = Σ_{i∈[n]} Xi,
and E[Xi] = p for every i ∈ [n]. Hence
E[X] = E[Σ_{i∈[n]} Xi] = Σ_{i∈[n]} E[Xi] = n · p.
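The indicator-variable argument translates directly into a simulation. The sketch below (the names binomial_sample and empirical_mean are ours, not from the text) draws B(n, p) samples as sums of n Bernoulli trials and compares the empirical mean with n · p.

import random

def binomial_sample(n, p):
    # Sum of n independent indicator variables X_1, ..., X_n.
    return sum(1 for _ in range(n) if random.random() < p)

def empirical_mean(n, p, trials=100_000):
    return sum(binomial_sample(n, p) for _ in range(trials)) / trials

print(empirical_mean(20, 0.3))   # close to n * p = 6.0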
2.3. Conditional Expectation
Definition 2.3.1. For random variables Y and Z and a value z with Prob[Z = z] > 0, the conditional
expectation of Y given Z = z is
E[Y | Z = z] = Σ_y y · Prob[Y = y | Z = z],
where the summation is over all y in the range of Y .
Lemma 2.3.2. Let X and Y be two random variables.
Then
E[X] = Σ_y Prob[Y = y] · E[X | Y = y],
where the sum is over all values y in the range of Y and all of the expectations exist.
Proof:
Σ_y Prob[Y = y] · E[X | Y = y] = Σ_y Prob[Y = y] · Σ_x x · Prob[X = x | Y = y]
 = Σ_y Σ_x x · Prob[X = x | Y = y] · Prob[Y = y]
 = Σ_x Σ_y x · Prob[(X = x) ∩ (Y = y)]
 = Σ_x x · Prob[X = x]
 = E[X]. □
Lemma 2.3.3. Let X1 , X2 , . . . , Xn be discrete random variables with finite expectations and Y a random
variable. Then, for every value y in the range of Y ,
E[Σ_{i∈[n]} Xi | Y = y] = Σ_{i∈[n]} E[Xi | Y = y].
The proof is left as an exercise.
Definition 2.3.4. The expression E[Y | Z] is a random variable X with
X(a) = E[Y | Z = Z(a)] = Σ_y y · Prob[Y = y | Z = Z(a)],
for every a ∈ Ω.
Theorem 2.3.5.
E[Y ] = E[E[Y | Z]].
Proof: Let X be the random variable E[Y | Z]. Then
E[E[Y | Z]] = Σ_{a∈Ω} X(a) · Prob({a})                                   (by Lemma 2.1.6)
 = Σ_{a∈Ω} E[Y | Z = Z(a)] · Prob({a})                                   (by Definition 2.3.4, the definition of E[Y | Z])
 = Σ_z E[Y | Z = z] · Σ_{a ∈ Ω with Z(a)=z} Prob({a})
 = Σ_z E[Y | Z = z] · Prob[Z = z]
 = E[Y ]                                                                  (by Lemma 2.3.2). □
Example 2.3.6. Consider the following recursive procedure.
RandomSpawn(n, p, i)
// n ∈ N, 0 ≤ p ≤ 1, and i ≥ 0
1. ℓ ← 0.
2. while ℓ < n do
3.    Toss a biased coin with heads probability p.
4.    if the outcome is “heads” then spawn a new call RandomSpawn(n, p, i + 1).
5.    ℓ ← ℓ + 1.
6. return.
Now consider the run (possibly infinite) of RandomSpawn(n, p, 0). Let Y0 = 1, and for i ≥ 1 let
Yi = the number of calls of RandomSpawn(n, p, i) spawned by all calls of RandomSpawn(n, p, i − 1).
Note that all Yi are random variables; in particular, Y0 is viewed as a constant function.
Thus, E[Y0 ] = 1, and since Y1 is a binomial random variable with parameters n and p,
E[Y1 ] = n · p.
Now suppose we know that Yi−1 = yi−1 (recall, Yi−1 is a random variable). For k ∈ [yi−1 ], let Zk be the
number of calls of RandomSpawn(n, p, i) spawned by the k-th call of RandomSpawn(n, p, i − 1). It is
easy to see that Zk is a binomial random variable with parameters n and p. Then
E[Yi | Yi−1 = yi−1] = E[Σ_{k∈[yi−1]} Zk | Yi−1 = yi−1]
 = Σ_{j≥0} j · Prob[Σ_{k∈[yi−1]} Zk = j | Yi−1 = yi−1]
 = Σ_{j≥0} j · Prob[Σ_{k∈[yi−1]} Zk = j]
 = E[Σ_{k∈[yi−1]} Zk]
 = Σ_{k∈[yi−1]} E[Zk]
 = yi−1 · n · p.
That is to say, E[Yi | Yi−1 ] is the linear function
y ↦ n · p · y
composed with Yi−1 . By Theorem 2.3.5, we conclude
E[Yi ] = E[E[Yi | Yi−1 ]] = E[Yi−1 · n · p] = n · p · E[Yi−1 ].
Using this recurrence, we get
E[Yi ] = (n · p)^i
for all i ≥ 0.
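The recurrence can be checked by simulating the branching process level by level (a sketch of our own; it tracks only the counts Yi rather than actually spawning recursive calls).

import random

def level_sizes(n, p, depth, trials=10_000):
    # Estimates E[Y_0], ..., E[Y_depth]: each call at level i-1 spawns one call at
    # level i for every "heads" among its n biased coin tosses.
    totals = [0] * (depth + 1)
    for _ in range(trials):
        y = 1                                          # Y_0 = 1
        for i in range(depth + 1):
            totals[i] += y
            y = sum(1 for _ in range(y * n) if random.random() < p)
    return [t / trials for t in totals]

print(level_sizes(3, 0.4, 4))    # ≈ [(3 * 0.4) ** i for i in range(5)] = [1, 1.2, 1.44, ...]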
2.4. The Geometric Distribution
Again we are tossing a biased coin with heads probability p repeatedly, and we do not stop until it lands
on heads. What is the distribution of the number of flips? A similar question: if we do an experiment with
success probability p, how many times do we need to repeat the experiment until we succeed?
Definition 2.4.1. A geometric random variable X with parameter p is given by the following probability
distribution on n ∈ N:
Prob[X = n] = (1 − p)^(n−1) · p.
That is, for the geometric random variable X to equal n, there must be n−1 failures, followed by a success.
Lemma 2.4.2. Let X be a geometric random variable with parameter p, and let n ∈ N and k ≥ 0 be an integer. Then
Prob[X = n + k | X > k] = Prob[X = n].
Proof:
Prob[X = n + k | X > k] = Prob[(X = n + k) ∩ (X > k)] / Prob[X > k]
 = Prob[X = n + k] / Prob[X > k]
 = ((1 − p)^(n+k−1) · p) / (Σ_{i=k}^{∞} (1 − p)^i · p)
 = ((1 − p)^(n+k−1) · p) / (1 − p)^k
 = (1 − p)^(n−1) · p
 = Prob[X = n]. □
Lemma 2.4.3. Let X be a discrete random variable that takes on only nonnegative integer values. Then
E[X] = Σ_{i=1}^{∞} Prob[X ≥ i].
Proof:
Σ_{i=1}^{∞} Prob[X ≥ i] = Σ_{i=1}^{∞} Σ_{j=i}^{∞} Prob[X = j]
 = Σ_{j=1}^{∞} Σ_{i=1}^{j} Prob[X = j]    (swapping the order of summation, which is valid as all terms are nonnegative)
 = Σ_{j=1}^{∞} j · Prob[X = j]
 = E[X]. □
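For instance (a small check of our own), for a fair die X with values 1, . . . , 6, the tail probabilities Prob[X ≥ i] sum to E[X] = 3.5.

values = [1, 2, 3, 4, 5, 6]
prob = 1 / len(values)
expectation = sum(v * prob for v in values)                                # E[X] = 3.5
tail_sum = sum(sum(prob for v in values if v >= i) for i in range(1, 7))   # sum of Prob[X >= i]
print(expectation, tail_sum)                                               # both 3.5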
Lemma 2.4.4. Let X be a geometric random variable with parameter p. Then
E[X] = 1/p.
Proof:
Prob[X ≥ i] = Σ_{n=i}^{∞} (1 − p)^(n−1) · p = (1 − p)^(i−1).
Hence,
E[X] = Σ_{i=1}^{∞} Prob[X ≥ i] = Σ_{i=1}^{∞} (1 − p)^(i−1) = 1 / (1 − (1 − p)) = 1/p. □
Remark 2.4.5. Alternatively, E[X] = 1/p can be derived with conditional expectation. Let Y be the indicator
random variable of the event that the first flip comes up heads. By Lemma 2.3.2 and the memorylessness of X (Lemma 2.4.2),
E[X] = Prob[Y = 0] · E[X | Y = 0] + Prob[Y = 1] · E[X | Y = 1]
 = (1 − p) · (1 + E[X]) + p · 1 = (1 − p) · E[X] + 1.
Hence E[X] = 1/p.
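The value 1/p is also easy to confirm empirically (our own sketch): flip a p-biased coin until it lands heads and average the number of flips over many runs.

import random

def geometric_sample(p):
    flips = 1
    while random.random() >= p:   # tails: flip again
        flips += 1
    return flips

p, trials = 0.2, 100_000
print(sum(geometric_sample(p) for _ in range(trials)) / trials)   # close to 1/p = 5.0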
2.4.1. Coupon Collector’s Problem. The coupon collector’s problem arises from the following scenario:
Suppose that each box of cereal contains one of n different coupons. Once you obtain one of
every type of coupon, you can send in for a prize. Assuming that the coupon in each box is
chosen independently and uniformly at random from the n possibilities and that you do not
collaborate with others to collect coupons, how many boxes of cereal must you buy before you
obtain at least one of every type of coupon?
Let X be the number of boxes bought until at least one of every type of coupon is obtained. Note that X is
a random variable. Moreover, let Xi be the number of boxes bought while you had exactly i − 1 different
coupons, i.e., after you have i − 1 different coupons, you have to buy another Xi boxes to get a new type of
coupon. Clearly
X = Σ_{i∈[n]} Xi.
When exactly i − 1 coupons have been found, the probability of obtaining a new coupon is
pi = 1 − (i − 1)/n.
Hence, Xi is a geometric random variable with parameter pi , and
E[Xi ] = 1/pi = n/(n − i + 1).
Thus
E[X] = E[Σ_{i∈[n]} Xi] = Σ_{i∈[n]} E[Xi] = Σ_{i∈[n]} n/(n − i + 1) = n · Σ_{i∈[n]} 1/i = n · ln n + Θ(n)
(see the next lemma).
Lemma 2.4.6. The harmonic number H(n) = Σ_{i∈[n]} 1/i satisfies
H(n) = ln n + Θ(1).
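A short simulation of the coupon collector's problem (our own sketch; the helper boxes_until_complete is not from the text) shows the empirical average agreeing with n · H(n).

import random

def boxes_until_complete(n):
    seen = set()
    boxes = 0
    while len(seen) < n:
        boxes += 1
        seen.add(random.randrange(n))   # each box holds a uniformly random coupon type
    return boxes

n, trials = 50, 10_000
empirical = sum(boxes_until_complete(n) for _ in range(trials)) / trials
harmonic = sum(1 / i for i in range(1, n + 1))
print(empirical, n * harmonic)          # the two values should be close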
2.5. Application: The Expected Running Time of RandomQuickSort
Theorem 2.5.1. Let S ⊆ N with |S| = n. Then the expected number of comparisons made by RandomQuickSort(S)
is
2 · n · ln n + Θ(n).
Proof: Let S = {y1 , y2 , . . . , yn } with
y1 < y2 < . . . < yn .
Consider the following random variables: for 1 ≤ i < j ≤ n,
Xij = { 1, if yi and yj are compared in RandomQuickSort(S),
        0, otherwise.
Then the total number of comparisons X, a random variable, satisfies
X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Xij.
Hence,
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[Xij].
Let Yij := {yi , . . . , yj }. It is easy to see that
Xij = 1  ⟺  the first time we choose a pivot from Yij , it is either yi or yj .
Since that pivot is uniformly distributed over the j − i + 1 elements of Yij , and Xij is an indicator variable, we have
E[Xij ] = Prob[Xij = 1] = 2/(j − i + 1).
We conclude
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
 = Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 2/k
 = Σ_{k=2}^{n} (n + 1 − k) · (2/k)
 = (2 · n + 2) · Σ_{k∈[n]} 1/k − 4 · n
 = (2 · n + 2) · H(n) − 4 · n
 = 2 · n · ln n + Θ(n). □
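Theorem 2.5.1 can also be observed empirically. The sketch below (our own; count_comparisons is not from the text) counts the comparisons made by one run of randomized quicksort, using the fact that partitioning compares the pivot with every other element, and averages over many runs.

import math
import random

def count_comparisons(S):
    # Number of element comparisons made by one run of randomized quicksort on the list S.
    if len(S) <= 1:
        return 0
    pivot = random.choice(S)
    smaller = [b for b in S if b < pivot]
    larger = [b for b in S if b > pivot]
    # Partitioning compares the pivot with each of the other len(S) - 1 elements.
    return (len(S) - 1) + count_comparisons(smaller) + count_comparisons(larger)

n, trials = 200, 1_000
average = sum(count_comparisons(list(range(n))) for _ in range(trials)) / trials
print(average, 2 * n * math.log(n))     # the empirical average is close to 2 n ln n (up to Θ(n))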
Theorem 2.5.2. The expected number of comparisons made by QuickSort(S) for a set S ⊆ N chosen
uniformly at random is
2 · n · ln n + Θ(n),
where |S| = n.