Chapter 2
Discrete Random Variables and Expectation
RandomQuickSort(S)
// a finite S ⊆ N.
1. if |S| = 0 then return the empty list [ ].
2. Choose a pivot a ∈ S uniformly at random.
3. S1 ← {b | b ∈ S and b < a}.
4. L1 ← RandomQuickSort(S1 ).
5. S2 ← {b | b ∈ S and b > a}.
6. L2 ← RandomQuickSort(S2 ).
7. return merge(L1 , a, L2 ).
Here, merge(L1 , a, L2 ) concatenates the two lists into one, placing the pivot a between the elements of L1 and those of L2.
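For concreteness, here is a minimal Python sketch of this procedure (the names randomized_quicksort, smaller, and larger are ours, not part of the pseudocode above).

import random

def randomized_quicksort(S):
    # S is a collection of distinct numbers; returns them as a sorted list.
    S = list(S)
    if len(S) == 0:
        return []
    pivot = random.choice(S)                 # step 2: pivot chosen uniformly at random
    smaller = [b for b in S if b < pivot]    # step 3
    larger = [b for b in S if b > pivot]     # step 5
    return randomized_quicksort(smaller) + [pivot] + randomized_quicksort(larger)  # step 7

print(randomized_quicksort({5, 1, 4, 2, 3}))   # [1, 2, 3, 4, 5]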
The main theorem we will prove in this chapter is:
The average running time of RandomQuickSort(S) is
2 · n · ln n + Θ(n),
where n = |S|.
But first we need more probability theory, at least to give a solid interpretation of “average running time”.
2.1. Random Variables and Expectation
Definition 2.1.1. A random variable X on a sample space Ω is a real-valued function on Ω, that is,
X : Ω → R. A discrete random variable is a random variable that takes on only a finite or countably
infinite number of values, i.e., the set
{X(a) | a ∈ Ω}
is finite or countably infinite.
Remark 2.1.2. In the following, unless stated otherwise, all random variables are discrete. Recall
that we always assume the probability space is discrete.
Definition 2.1.3. Let (Ω, F , Prob) be a discrete probability space and X a random variable on Ω. For
every e ∈ R, let
Prob_{a∈Ω}[X = e] := Σ_{a ∈ Ω with X(a)=e} Prob({a}).
Sometimes, we write Prob[X = e] if the sample space Ω is clear from the context.
Definition 2.1.4. Two random variables X and Y are independent if and only if
Prob[(X = e) ∩ (Y = f)] = Prob[X = e] · Prob[Y = f]
for all e, f ∈ R. Similarly, random variables X1 , X2 , . . . , Xn are mutually independent if and only if, for
any subset I ⊆ [n] and any values ei with i ∈ I, we have
Prob[⋂_{i∈I} (Xi = ei)] = ∏_{i∈I} Prob[Xi = ei].
Definition 2.1.5. The expectation of a discrete random variable X, denoted by E[X], is given by
E[X] = Σ_e e · Prob[X = e],
where e ranges over all values in the range of X. (This is well-defined, since X is discrete.) The
expectation is finite if Σ_e |e| · Prob[X = e] converges; otherwise, it is unbounded.
Lemma 2.1.6.
E[X] = Σ_{a∈Ω} X(a) · Prob({a}).
Proof:
E[X] = Σ_e e · Prob[X = e] = Σ_e e · Σ_{a ∈ Ω with X(a)=e} Prob({a})
     = Σ_e Σ_{a ∈ Ω with X(a)=e} X(a) · Prob({a}) = Σ_{a∈Ω} X(a) · Prob({a}). □
2.1.1. Linearity of Expectations.
Theorem 2.1.7 (Linearity of Expectations). For any finite number of discrete random variables X1 , . . . , Xn
on a probability space, all with finite expectations,
E[Σ_{i∈[n]} Xi] = Σ_{i∈[n]} E[Xi].
Proof: We give the proof for two random variables X1 and X2; the general case follows by induction on n.
E[X1 + X2] = Σ_{e∈R} e · Prob[X1 + X2 = e]
 = Σ_{e1,e2∈R} (e1 + e2) · Prob[(X1 = e1) ∩ (X2 = e2)]
 = Σ_{e1∈R} Σ_{e2∈R} (e1 + e2) · Prob[(X1 = e1) ∩ (X2 = e2)]
 = Σ_{e1∈R} Σ_{e2∈R} e1 · Prob[(X1 = e1) ∩ (X2 = e2)] + Σ_{e1∈R} Σ_{e2∈R} e2 · Prob[(X1 = e1) ∩ (X2 = e2)]
 = Σ_{e1∈R} e1 · Σ_{e2∈R} Prob[(X1 = e1) ∩ (X2 = e2)] + Σ_{e2∈R} e2 · Σ_{e1∈R} Prob[(X1 = e1) ∩ (X2 = e2)]
 = Σ_{e1∈R} e1 · Prob[X1 = e1] + Σ_{e2∈R} e2 · Prob[X2 = e2]    (by the law of total probability)
 = E[X1] + E[X2]. □
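Note that the theorem requires no independence among the Xi. As a small illustration (our own, not from the text), the following sketch estimates E[X1 + X2] where X1 is a fair die roll and X2 = X1 is a fully dependent copy; the empirical average still matches E[X1] + E[X2] = 7.

import random

trials = 100_000
total = 0
for _ in range(trials):
    x1 = random.randint(1, 6)   # a fair die roll
    x2 = x1                     # X2 is completely dependent on X1
    total += x1 + x2
print(total / trials)           # close to E[X1] + E[X2] = 3.5 + 3.5 = 7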
Lemma 2.1.8. Let c ∈ R and X be a discrete random variable. Then
E[c · X] = c · E[X].
Proof: We may assume c ≠ 0, since the case c = 0 is trivial. Then
E[c · X] = Σ_{e∈R} e · Prob[c · X = e]
 = c · Σ_{e∈R} (e/c) · Prob[X = e/c]
 = c · Σ_{e∈R} e · Prob[X = e]
 = c · E[X]. □
Combining Theorem 2.1.7 and Lemma 2.1.8 we have the following.
Corollary 2.1.9. Let ℓ : R → R be a linear function, i.e., for some a, b ∈ R, we have
ℓ(x) = a · x + b.
Then E[ℓ(X)] = ℓ(E[X]).
2.1.2. Jensen’s Inequality.
Definition 2.1.10. A function f : R → R is said to be convex if for any x1 , x2 ∈ R and 0 ≤ λ ≤ 1,
f (λ · x1 + (1 − λ) · x2 ) ≤ λ · f (x1 ) + (1 − λ) · f (x2 ).
Remark 2.1.11. Usually one explicitly requires a convex function to be continuous. In Definition 2.1.10, however, the fact that the domain of f is all of R already implies that f is continuous.
Lemma 2.1.12. Let f : R → R be a function. Then f is convex if and only if for all a ∈ R, there exists a
linear function ℓa such that
- f (a) = ℓa (a), and
- for all b ∈ R we have f (b) ≥ ℓa (b).
Theorem 2.1.13 (Jensen’s Inequality). Let f : R → R be a convex function. Then
E[f (X)] ≥ f (E[X]).
Proof: Let a := E[X] and let ℓa be the linear function given by Lemma 2.1.12. Then
E[f (X)] ≥ E[ℓa (X)] = ℓa (E[X]) = ℓa (a) = f (a) = f (E[X]).
The inequality follows from Lemma 2.1.12 and the monotonicity of the expectation (i.e., if X(a) ≤ Y (a)
for all a ∈ Ω, then E[X] ≤ E[Y ]), while the first equality follows from Corollary 2.1.9. □
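As a small sanity check (our own illustration, not part of the text), take the convex function f(x) = x² and let X be a fair die roll; Jensen's inequality predicts E[X²] ≥ (E[X])².

values = [1, 2, 3, 4, 5, 6]                         # a fair die, each value with probability 1/6
prob = 1 / len(values)
mean = sum(v * prob for v in values)                # E[X] = 3.5
mean_square = sum(v * v * prob for v in values)     # E[X^2] = 91/6 ≈ 15.17
print(mean_square >= mean ** 2)                     # True: 15.17 ≥ 12.25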
2.2. The Bernoulli and Binomial Random Variables
Suppose that we run an experiment that succeeds with probability p and fails with probability 1 − p.
Let Y be a random variable such that
Y = { 1, if the experiment succeeds,
      0, otherwise.
The variable Y is called a Bernoulli or an indicator random variable. Note that, for a Bernoulli random
variable,
E[Y ] = 1 · p + 0 · (1 − p) = p = Prob[Y = 1].
Definition 2.2.1. A binomial random variable X with parameters n and p, denoted by B(n, p), is defined
by the following probability distribution on j = 0, 1, . . . , n:
Prob[X = j] = C(n, j) · p^j · (1 − p)^(n−j),
where C(n, j) = n!/(j! · (n − j)!) denotes the binomial coefficient.
For a binomial random variable X with parameters n and p, we have
E[X] = Σ_{j=0}^{n} j · C(n, j) · p^j · (1 − p)^(n−j)
 = Σ_{j=0}^{n} j · (n! / (j! · (n − j)!)) · p^j · (1 − p)^(n−j)
 = Σ_{j=1}^{n} (n! / ((j − 1)! · (n − j)!)) · p^j · (1 − p)^(n−j)
 = n · p · Σ_{j=1}^{n} ((n − 1)! / ((j − 1)! · ((n − 1) − (j − 1))!)) · p^(j−1) · (1 − p)^((n−1)−(j−1))
 = n · p · Σ_{k=0}^{n−1} ((n − 1)! / (k! · ((n − 1) − k)!)) · p^k · (1 − p)^((n−1)−k)
 = n · p · Σ_{k=0}^{n−1} C(n − 1, k) · p^k · (1 − p)^((n−1)−k)
 = n · p,
where the last step uses the binomial theorem: the final sum equals (p + (1 − p))^(n−1) = 1.
Indeed, there is a much simpler proof for the above fact based on the linearity of expectations. Define
n indicator random variables X1 , . . . , Xn , where for i ∈ [n]
Xi = { 1, if the i-th trial is successful (with probability p),
       0, otherwise.
Now it is easy to see that
X = Σ_{i∈[n]} Xi,
and E[Xi] = p for every i ∈ [n]. Hence
E[X] = E[Σ_{i∈[n]} Xi] = Σ_{i∈[n]} E[Xi] = n · p.
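The indicator-variable argument translates directly into a simulation. The sketch below (the names binomial_sample and empirical_mean are ours, not from the text) draws B(n, p) samples as sums of n Bernoulli trials and compares the empirical mean with n · p.

import random

def binomial_sample(n, p):
    # Sum of n independent indicator variables X_1, ..., X_n.
    return sum(1 for _ in range(n) if random.random() < p)

def empirical_mean(n, p, trials=100_000):
    return sum(binomial_sample(n, p) for _ in range(trials)) / trials

print(empirical_mean(20, 0.3))   # close to n * p = 6.0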
2.3. Conditional Expectation
Definition 2.3.1. For random variables Y and Z and a value z with Prob[Z = z] > 0, the conditional
expectation of Y given Z = z is
E[Y | Z = z] = Σ_y y · Prob[Y = y | Z = z],
where the summation is over all y in the range of Y .
Lemma 2.3.2. Let X and Y be two random variables.
Then
E[X] = Σ_y Prob[Y = y] · E[X | Y = y],
where the sum is over all values y in the range of Y and all of the expectations exist.
Proof:
Σ_y Prob[Y = y] · E[X | Y = y] = Σ_y Prob[Y = y] · Σ_x x · Prob[X = x | Y = y]
 = Σ_y Σ_x x · Prob[X = x | Y = y] · Prob[Y = y]
 = Σ_x Σ_y x · Prob[(X = x) ∩ (Y = y)]
 = Σ_x x · Prob[X = x]
 = E[X]. □
Lemma 2.3.3. Let X1 , X2 , . . . , Xn be discrete random variables with finite expectations and Y a random
variable. Then, for every value y in the range of Y ,
E[Σ_{i∈[n]} Xi | Y = y] = Σ_{i∈[n]} E[Xi | Y = y].
The proof is left as an exercise.
Definition 2.3.4. The expression E[Y | Z] is a random variable X with
X(a) = E[Y | Z = Z(a)] = Σ_y y · Prob[Y = y | Z = Z(a)],
for every a ∈ Ω.
Theorem 2.3.5.
E[Y ] = E[E[Y | Z]].
Proof: Let X be the random variable E[Y | Z]. Then
E[E[Y | Z]] = Σ_{a∈Ω} X(a) · Prob({a})                                   (by Lemma 2.1.6)
 = Σ_{a∈Ω} E[Y | Z = Z(a)] · Prob({a})                                   (by Definition 2.3.4, the definition of E[Y | Z])
 = Σ_z E[Y | Z = z] · Σ_{a ∈ Ω with Z(a)=z} Prob({a})
 = Σ_z E[Y | Z = z] · Prob[Z = z]
 = E[Y ]                                                                  (by Lemma 2.3.2). □
Example 2.3.6. Consider the following recursive procedure.
RandomSpawn(n, p, i)
// n ∈ N, 0 ≤ p ≤ 1, and i ≥ 0
1. ℓ ← 0.
2. while ℓ < n do
3.    Toss a biased coin with heads probability p.
4.    if the outcome is “heads” then spawn a new call RandomSpawn(n, p, i + 1).
5.    ℓ ← ℓ + 1.
6. return.
Now consider the run (possibly infinite) of RandomSpawn(n, p, 0). Let Y0 = 1, and for i ≥ 1 let
Yi = the number of calls of RandomSpawn(n, p, i) spawned by all calls of RandomSpawn(n, p, i − 1).
Note that all Yi are random variables; in particular, Y0 is viewed as a constant function.
Thus, E[Y0 ] = 1, and since Y1 is a binomial random variable with parameters n and p,
E[Y1 ] = n · p.
Now suppose we know that Yi−1 = yi−1 (recall, Yi−1 is a random variable). For k ∈ [yi−1 ], let Zk be the
number of calls of RandomSpawn(n, p, i) spawned by the k-th call of RandomSpawn(n, p, i − 1). It is
easy to see that Zk is a binomial random variable with parameters n and p. Then
E[Yi | Yi−1 = yi−1] = E[Σ_{k∈[yi−1]} Zk | Yi−1 = yi−1]
 = Σ_{j≥0} j · Prob[Σ_{k∈[yi−1]} Zk = j | Yi−1 = yi−1]
 = Σ_{j≥0} j · Prob[Σ_{k∈[yi−1]} Zk = j]
 = E[Σ_{k∈[yi−1]} Zk]
 = Σ_{k∈[yi−1]} E[Zk]
 = yi−1 · n · p.
That is to say, E[Yi | Yi−1 ] is the linear function
y ↦ n · p · y
composed with Yi−1 . By Theorem 2.3.5, we conclude
E[Yi ] = E[E[Yi | Yi−1 ]] = E[Yi−1 · n · p] = n · p · E[Yi−1 ].
Using this recurrence, we get
E[Yi ] = (n · p)^i
for all i ≥ 0.
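The recurrence can be checked by simulating the branching process level by level (a sketch of our own; it tracks only the counts Yi rather than actually spawning recursive calls).

import random

def level_sizes(n, p, depth, trials=10_000):
    # Estimates E[Y_0], ..., E[Y_depth]: each call at level i-1 spawns one call at
    # level i for every "heads" among its n biased coin tosses.
    totals = [0] * (depth + 1)
    for _ in range(trials):
        y = 1                                          # Y_0 = 1
        for i in range(depth + 1):
            totals[i] += y
            y = sum(1 for _ in range(y * n) if random.random() < p)
    return [t / trials for t in totals]

print(level_sizes(3, 0.4, 4))    # ≈ [(3 * 0.4) ** i for i in range(5)] = [1, 1.2, 1.44, ...]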
2.4. The Geometric Distribution
Again we are tossing a biased coin with heads probability p repeatedly, and we do not stop until it lands
on heads. What is the distribution of the number of flips? A similar question: if we do an experiment with
success probability p, how many times do we need to repeat the experiment until we succeed?
Definition 2.4.1. A geometric random variable X with parameter p is given by the following probability
distribution on n ∈ N:
Prob[X = n] = (1 − p)^(n−1) · p.
That is, for the geometric random variable X to equal n, there must be n−1 failures, followed by a success.
Lemma 2.4.2. Let X be a geometric random variable with parameter p, and let n ∈ N and k ≥ 0 be an integer. Then
Prob[X = n + k | X > k] = Prob[X = n].
Proof:
Prob[X = n + k | X > k] = Prob[(X = n + k) ∩ (X > k)] / Prob[X > k]
 = Prob[X = n + k] / Prob[X > k]
 = ((1 − p)^(n+k−1) · p) / (Σ_{i=k}^{∞} (1 − p)^i · p)
 = ((1 − p)^(n+k−1) · p) / (1 − p)^k
 = (1 − p)^(n−1) · p
 = Prob[X = n]. □
Lemma 2.4.3. Let X be a discrete random variable that takes on only nonnegative integer values. Then
E[X] = Σ_{i=1}^{∞} Prob[X ≥ i].
Proof:
Σ_{i=1}^{∞} Prob[X ≥ i] = Σ_{i=1}^{∞} Σ_{j=i}^{∞} Prob[X = j]
 = Σ_{j=1}^{∞} Σ_{i=1}^{j} Prob[X = j]    (swapping the order of summation, which is valid as all terms are nonnegative)
 = Σ_{j=1}^{∞} j · Prob[X = j]
 = E[X]. □
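For instance (a small check of our own), for a fair die X with values 1, . . . , 6, the tail probabilities Prob[X ≥ i] sum to E[X] = 3.5.

values = [1, 2, 3, 4, 5, 6]
prob = 1 / len(values)
expectation = sum(v * prob for v in values)                                # E[X] = 3.5
tail_sum = sum(sum(prob for v in values if v >= i) for i in range(1, 7))   # sum of Prob[X >= i]
print(expectation, tail_sum)                                               # both 3.5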
Lemma 2.4.4. Let X be a geometric random variable with parameter p. Then
E[X] = 1/p.
Proof:
Prob[X ≥ i] = Σ_{n=i}^{∞} (1 − p)^(n−1) · p = (1 − p)^(i−1).
Hence,
E[X] = Σ_{i=1}^{∞} Prob[X ≥ i] = Σ_{i=1}^{∞} (1 − p)^(i−1) = 1 / (1 − (1 − p)) = 1/p. □
Remark 2.4.5. Alternatively, E[X] = 1/p can be derived with conditional expectation. Let Y be the indicator
random variable of the event that the first flip comes up heads. By Lemma 2.3.2 and the memorylessness of X (Lemma 2.4.2),
E[X] = Prob[Y = 0] · E[X | Y = 0] + Prob[Y = 1] · E[X | Y = 1]
 = (1 − p) · (1 + E[X]) + p · 1 = (1 − p) · E[X] + 1.
Hence E[X] = 1/p.
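The value 1/p is also easy to confirm empirically (our own sketch): flip a p-biased coin until it lands heads and average the number of flips over many runs.

import random

def geometric_sample(p):
    flips = 1
    while random.random() >= p:   # tails: flip again
        flips += 1
    return flips

p, trials = 0.2, 100_000
print(sum(geometric_sample(p) for _ in range(trials)) / trials)   # close to 1/p = 5.0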
2.4.1. Coupon Collector’s Problem. The coupon collector’s problem arises from the following scenario:
Suppose that each box of cereal contains one of n different coupons. Once you obtain one of
every type of coupon, you can send in for a prize. Assuming that the coupon in each box is
chosen independently and uniformly at random from the n possibilities and that you do not
collaborate with others to collect coupons, how many boxes of cereal must you buy before you
obtain at least one of every type of coupon?
Let X be the number of boxes bought until at least one of every type of coupon is obtained. Note that X is
a random variable. Moreover, let Xi be the number of boxes bought while you had exactly i − 1 different
coupons, i.e., after you have i − 1 different coupons, you have to buy another Xi boxes to get a new type of
coupon. Clearly
X = Σ_{i∈[n]} Xi.
When exactly i − 1 coupons have been found, the probability of obtaining a new coupon is
pi = 1 − (i − 1)/n.
Hence, Xi is a geometric random variable with parameter pi , and
E[Xi ] = 1/pi = n/(n − i + 1).
Thus
E[X] = E[Σ_{i∈[n]} Xi] = Σ_{i∈[n]} E[Xi] = Σ_{i∈[n]} n/(n − i + 1) = n · Σ_{i∈[n]} 1/i = n · ln n + Θ(n)
(see the next lemma).
Lemma 2.4.6. The harmonic number H(n) = Σ_{i∈[n]} 1/i satisfies
H(n) = ln n + Θ(1).
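A short simulation of the coupon collector's problem (our own sketch; the helper boxes_until_complete is not from the text) shows the empirical average agreeing with n · H(n).

import random

def boxes_until_complete(n):
    seen = set()
    boxes = 0
    while len(seen) < n:
        boxes += 1
        seen.add(random.randrange(n))   # each box holds a uniformly random coupon type
    return boxes

n, trials = 50, 10_000
empirical = sum(boxes_until_complete(n) for _ in range(trials)) / trials
harmonic = sum(1 / i for i in range(1, n + 1))
print(empirical, n * harmonic)          # the two values should be close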
2.5. Application: The Expected Running Time of RandomQuickSort
Theorem 2.5.1. Let S ⊆ N with |S| = n. Then the expected number of comparisons made by RandomQuickSort(S)
is
2 · n · ln n + Θ(n).
Proof: Let S = {y1 , y2 , . . . , yn } with
y1 < y2 < . . . < yn .
Consider the following random variables: for 1 ≤ i < j ≤ n,
Xij = { 1, if yi and yj are compared in RandomQuickSort(S),
        0, otherwise.
Then the total number of comparisons X, a random variable, satisfies
X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Xij.
Hence,
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[Xij].
Let Yij := {yi , . . . , yj }. It is easy to see that
Xij = 1  ⟺  the first time we choose a pivot from Yij , it is either yi or yj .
Since that pivot is uniformly distributed over the j − i + 1 elements of Yij , and Xij is an indicator variable, we have
E[Xij ] = Prob[Xij = 1] = 2/(j − i + 1).
We conclude
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
 = Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 2/k
 = Σ_{k=2}^{n} (n + 1 − k) · (2/k)
 = (2 · n + 2) · Σ_{k∈[n]} 1/k − 4 · n
 = (2 · n + 2) · H(n) − 4 · n
 = 2 · n · ln n + Θ(n). □
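Theorem 2.5.1 can also be observed empirically. The sketch below (our own; count_comparisons is not from the text) counts the comparisons made by one run of randomized quicksort, using the fact that partitioning compares the pivot with every other element, and averages over many runs.

import math
import random

def count_comparisons(S):
    # Number of element comparisons made by one run of randomized quicksort on the list S.
    if len(S) <= 1:
        return 0
    pivot = random.choice(S)
    smaller = [b for b in S if b < pivot]
    larger = [b for b in S if b > pivot]
    # Partitioning compares the pivot with each of the other len(S) - 1 elements.
    return (len(S) - 1) + count_comparisons(smaller) + count_comparisons(larger)

n, trials = 200, 1_000
average = sum(count_comparisons(list(range(n))) for _ in range(trials)) / trials
print(average, 2 * n * math.log(n))     # the empirical average is close to 2 n ln n (up to Θ(n))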
Theorem 2.5.2. The expected number of comparisons made by QuickSort(S) for a set S ⊆ N chosen
uniformly at random is
2 · n · ln n + Θ(n),
where |S| = n.