Lecture 3
1 Weak Law of Large Numbers
Previously, we have shown how to construct an infinite sequence of independent random
variables on a common probability space (Ω, F, P). We now study a sequence of independent
and identically distributed (i.i.d.) real-valued random variables (Xn )n∈N . For instance,
(Xn )n∈N could be the winnings of a gambler betting on a sequence of coin tosses: the gambler wins $1 and sets Xn := 1 if the n-th coin toss shows heads, and otherwise loses $1 and sets Xn := −1. The probability P(Xn = 1) = 1 − P(Xn = −1) = p ∈ [0, 1] is the bias of
the coin, with p > 1/2 being favorable to the gambler, p < 1/2 unfavorable, and p = 1/2
fair. A natural quantity to consider then is the aggregate winnings,
being fair. A natural quantity to consider then is the aggregate winnings,
Sn :=
n
X
Xi ,
i=1
and its average Sn /n (also called empirical average if we interpret X1 , X2 , . . . as empirical
observations of a sequence of experiments). The law of large numbers (LLN) refers to
the phenomenon that, under the i.i.d. assumption on (Xn )n∈N , as n → ∞, Sn /n converges to
µ := E[X1 ]. Since Sn /n is a sequence of random variables defined on the probability space
(Ω, F, P), and µ is a constant which can also be regarded as a random variable, we can have
two different notions of convergence. The first is convergence of Sn /n → µ in probability, i.e.,
$$\forall\, \varepsilon > 0, \qquad P\Big(\Big|\frac{S_n}{n} - \mu\Big| > \varepsilon\Big) \to 0 \quad \text{as } n \to \infty,$$
which is called the weak law of large numbers (WLLN); the second is almost sure convergence, i.e.,
$$\frac{S_n}{n} = \frac{S_n(\omega)}{n} \to \mu \quad \text{as } n \to \infty, \quad \text{for } P\text{-almost every } \omega \in \Omega,$$
which is called the strong law of large numbers (SLLN).
Exercise 1.1 In the coin toss example above, $\tilde X_n := (X_n + 1)/2$ equals 1 if the n-th coin toss shows heads and 0 otherwise. Use the binomial expansion to prove the WLLN for $\tilde S_n := \sum_{i=1}^n \tilde X_i$, which counts the number of heads. Then deduce the WLLN for Sn .
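For illustration, here is a minimal NumPy simulation sketch of the coin toss example; the bias p, the tolerance eps, and the sample sizes are arbitrary choices. It estimates P(|Sn /n − µ| > ε) by repeated sampling, and the estimate should shrink as n grows, as the WLLN predicts.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

p = 0.6            # assumed bias of the coin (illustrative choice)
eps = 0.05         # the epsilon in the WLLN statement
mu = 2 * p - 1     # E[X_1] = p * 1 + (1 - p) * (-1)
trials = 2_000     # independent repetitions used to estimate the probability

for n in [100, 1_000, 10_000]:
    # Each row is one realization of (X_1, ..., X_n) with X_i in {+1, -1}.
    X = rng.choice([1.0, -1.0], size=(trials, n), p=[p, 1 - p])
    avg = X.mean(axis=1)                       # S_n / n for each trial
    print(n, (np.abs(avg - mu) > eps).mean())  # estimate of P(|S_n/n - mu| > eps)
```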
We now prove the WLLN under a finite second moment restriction.
Theorem 1.2 [L² Weak Law of Large Numbers] Let (Xn )n∈N be a sequence of i.i.d. R-valued random variables defined on the probability space (Ω, F, P). Assume that E[X1 ] = µ and Var(X1 ) := E[X1²] − E[X1 ]² = σ² < ∞. Then $S_n/n := \frac{1}{n}\sum_{i=1}^n X_i$ converges in probability to µ as n → ∞.
Proof. The proof relies on controlling the variance of Sn and applying Markov’s inequality. Denote Yn := Sn /n − µ. Note that E[Yn ] = 0. To show Yn → E[Yn ] = 0 in probability, it suffices to show that the variance of Yn , E[Yn²] − E[Yn ]² = E[Yn²], tends to 0. Indeed, by
Markov’s inequality, for any ε > 0,
$$P(|Y_n| > \varepsilon) = E\big[\mathbf{1}_{\{|Y_n|>\varepsilon\}}\big] \le E\big[|Y_n|^2 \varepsilon^{-2}\, \mathbf{1}_{\{|Y_n|>\varepsilon\}}\big] \le E[Y_n^2]\, \varepsilon^{-2},$$
which tends to 0 if E[Yn²] → 0 as n → ∞.
Under the finite second moment assumption, E[Yn²] is easy to evaluate:
$$E[Y_n^2] = E\Big[\Big(\frac{\sum_{i=1}^n X_i}{n} - \mu\Big)^2\Big] = E\Big[\Big(\frac{\sum_{i=1}^n (X_i - \mu)}{n}\Big)^2\Big] = \frac{1}{n^2}\sum_{i,j=1}^n E[(X_i - \mu)(X_j - \mu)] = \frac{1}{n^2}\sum_{i=1}^n E[(X_i - \mu)^2] = \frac{\sigma^2}{n},$$
where the cross terms with i ≠ j vanish by independence. This tends to 0 as n → ∞, and hence Yn → 0 in probability.
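As a numerical sanity check of the proof, the following sketch compares the empirical value of P(|Sn /n − µ| > ε) with the bound σ²/(nε²) obtained above; standard normal samples (µ = 0, σ² = 1) are an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

eps, sigma2, trials = 0.1, 1.0, 2_000           # illustrative parameters

for n in [100, 1_000, 10_000]:
    X = rng.normal(0.0, 1.0, size=(trials, n))   # i.i.d. N(0, 1), so mu = 0
    tail = (np.abs(X.mean(axis=1)) > eps).mean() # empirical P(|S_n/n| > eps)
    print(n, tail, sigma2 / (n * eps**2))        # vs. the bound sigma^2 / (n eps^2)
```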
We now extend Theorem 1.2 to a WLLN requiring only E[|X1 |] < ∞.
Theorem 1.3 [L¹ Weak Law of Large Numbers] Let (Xn )n∈N be a sequence of i.i.d. random variables with E[X1 ] = µ ∈ R. Then $S_n/n := \frac{1}{n}\sum_{i=1}^n X_i$ converges in probability to µ as n → ∞.
Proof. Since E[X1²] may be infinite, we can no longer use second moment calculations to bound P(|Sn /n − µ| > ε). Instead, we will first truncate each Xi , i.e., replace each Xi by $X_i^M$, with $X_i^M := X_i$ if |Xi | ≤ M , and $X_i^M := 0$ if |Xi | > M . Since the sequence $(X_i^M)_{i\in\mathbb N}$ is i.i.d. with $\mu_M := E[X_1^M] \in \mathbb R$ and $\mathrm{Var}(X_1^M) < \infty$, we can apply the L² WLLN to $S_n^M := \sum_{i=1}^n X_i^M$ to conclude that $S_n^M/n \to \mu_M$ in probability. Note that for any ε > 0,
$$P\Big(\Big|\frac{S_n}{n} - \mu\Big| > \varepsilon\Big) = P\Big(\Big|\frac{S_n}{n} - \frac{S_n^M}{n} + \frac{S_n^M}{n} - \mu_M + \mu_M - \mu\Big| > \varepsilon\Big) \qquad (1.1)$$
$$\le P\Big(\Big|\frac{S_n}{n} - \frac{S_n^M}{n}\Big| > \frac{\varepsilon}{3}\Big) + P\Big(\Big|\frac{S_n^M}{n} - \mu_M\Big| > \frac{\varepsilon}{3}\Big) + P\Big(|\mu_M - \mu| > \frac{\varepsilon}{3}\Big),$$
where the middle term tends to 0 as n → ∞ for each M > 0. Therefore to show that the
right hand side (RHS) tends to 0 as n → ∞, it suffices to show that
$$\forall\, \delta > 0,\ \exists\, M_0 > 0 \text{ such that for all } M > M_0:\quad \begin{cases} \limsup\limits_{n\to\infty} P\Big(\Big|\frac{S_n}{n} - \frac{S_n^M}{n}\Big| > \frac{\varepsilon}{3}\Big) \le \delta, \\[6pt] P\big(|\mu_M - \mu| > \frac{\varepsilon}{3}\big) \le \delta. \end{cases} \qquad (1.2)$$
Note that by Markov’s inequality,
$$P\Big(\Big|\frac{S_n}{n} - \frac{S_n^M}{n}\Big| > \frac{\varepsilon}{3}\Big) \le \frac{3}{\varepsilon}\, E\Big[\Big|\frac{S_n}{n} - \frac{S_n^M}{n}\Big|\Big] = \frac{3}{\varepsilon n}\, E\Big[\Big|\sum_{i=1}^n (X_i - X_i^M)\Big|\Big] \le \frac{3}{\varepsilon}\, E\big[|X_1 - X_1^M|\big], \qquad (1.3)$$
and
$$P\big(|\mu_M - \mu| > \tfrac{\varepsilon}{3}\big) = \mathbf{1}_{\{|\mu_M - \mu| > \varepsilon/3\}} = \mathbf{1}_{\{|E[X_1^M - X_1]| > \varepsilon/3\}} \le \mathbf{1}_{\{E[|X_1 - X_1^M|] > \varepsilon/3\}}.$$
Therefore to prove (1.2), it suffices to show that
$$\lim_{M\to\infty} E\big[|X_1 - X_1^M|\big] = \lim_{M\to\infty} E\big[|X_1|\, \mathbf{1}_{\{|X_1|>M\}}\big] = 0. \qquad (1.4)$$
Since E[|X1 |] < ∞ and P(|X1 | > M ) ≤ E[|X1 |]/M → 0 as M → ∞, the limit (1.4) holds by the Dominated Convergence Theorem (see also Exercise 1.4 below); hence (1.2) holds, and Sn /n → µ in probability.
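The truncation error in (1.4) can also be checked numerically. The sketch below uses a Pareto distribution with tail index α = 1.5 as an arbitrary example with finite mean but infinite variance, so Theorem 1.2 does not apply but Theorem 1.3 does; the Monte Carlo estimate of E[|X1 |1{|X1 |>M } ] should decay to 0 as M grows.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Pareto with tail index alpha = 1.5: finite mean (= 3), infinite variance.
# numpy's pareto draws from the Lomax distribution; adding 1 gives the
# classical Pareto supported on [1, infinity).
alpha = 1.5
X = rng.pareto(alpha, size=1_000_000) + 1.0

for M in [10, 100, 1_000, 10_000]:
    # Monte Carlo estimate of the truncation error E[|X_1| 1_{|X_1| > M}] in (1.4).
    print(M, np.where(X > M, X, 0.0).mean())
```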
Exercise 1.4 Let X : (Ω, F, P) → (R, B) satisfy E[|X|] < ∞. If An ∈ F is a sequence of sets
with limn→∞ P(An ) = 0, then prove that limn→∞ E[|X|1An ] = 0.
We now make a further extension of the WLLN that does not even assume E[|X1 |] < ∞.
Theorem 1.5 [Weak Law of Large Numbers] Let (Xn )n∈N be a sequence of i.i.d. random variables with limx→∞ xP(|X1 | > x) = 0. Let $S_n/n := \frac{1}{n}\sum_{i=1}^n X_i$ and µn := E[X1 1{|X1 |≤n} ]. Then Sn /n − µn → 0 in probability as n → ∞.
Remark. The condition limx→∞ xP(|X1 | > x) = 0 is in fact necessary for the WLLN. See
Feller [1, Section VII.7] for a proof.
Proof. Since µn is the mean of X1 truncated at level n, we choose the truncation level M in the proof of Theorem 1.3 to be n. Writing $X_i^n := X_i \mathbf{1}_{\{|X_i|\le n\}}$ and $S_n^n := \sum_{i=1}^n X_i^n$, and arguing analogously to (1.1), we have
$$P\Big(\Big|\frac{S_n}{n} - \mu_n\Big| > \varepsilon\Big) \le P\Big(\Big|\frac{S_n}{n} - \frac{S_n^n}{n}\Big| > \frac{\varepsilon}{2}\Big) + P\Big(\Big|\frac{S_n^n}{n} - \mu_n\Big| > \frac{\varepsilon}{2}\Big). \qquad (1.5)$$
We cannot apply Markov’s inequality to the first term because of the lack of integrability. Instead, we note that $(S_n - S_n^n)/n = \frac{1}{n}\sum_{i=1}^n (X_i - X_i^n) = \frac{1}{n}\sum_{i=1}^n X_i \mathbf{1}_{\{|X_i|>n\}}$, and hence the event $|S_n/n - S_n^n/n| > \varepsilon/2$ occurs only if |Xi | > n for some 1 ≤ i ≤ n. Therefore by a union bound,
$$P\Big(\Big|\frac{S_n}{n} - \frac{S_n^n}{n}\Big| > \frac{\varepsilon}{2}\Big) \le \sum_{i=1}^n P(|X_i| > n) = n\, P(|X_1| > n),$$
which tends to 0 as n → ∞ by our assumption.
To bound the second term in (1.5), we just apply Markov’s inequality with an L² bound:
$$P\Big(\Big|\frac{S_n^n}{n} - \mu_n\Big| > \frac{\varepsilon}{2}\Big) \le \frac{4}{\varepsilon^2}\, E\Big[\Big|\frac{S_n^n}{n} - \mu_n\Big|^2\Big] = \frac{4}{\varepsilon^2 n}\, \mathrm{Var}(X_1^n) \le \frac{4}{\varepsilon^2 n}\, E\big[(X_1^n)^2\big]. \qquad (1.6)$$
By Exercise 1.6 below, followed by the change of variables y = nt,
$$\frac{1}{n} E\big[(X_1^n)^2\big] = \frac{1}{n}\int_0^\infty 2y\, P(|X_1^n| > y)\, dy = \frac{1}{n}\int_0^n 2y\, P(n \ge |X_1| > y)\, dy \le \int_0^1 2nt\, P(|X_1| > nt)\, dt.$$
Since limx→∞ xP(|X1 | > x) = 0, for any δ > 0 we can find K > 0 such that xP(|X1 | > x) ≤ δ for all x ≥ K. We can then bound
$$\int_0^1 2nt\, P(|X_1| > nt)\, dt \le \int_0^{K/n} 2nt\, dt + \int_{K/n}^1 2nt\, P(|X_1| > nt)\, dt \le \frac{K^2}{n} + 2\delta,$$
the limit of which (as n → ∞) equals 2δ, which can be made arbitrarily small by choosing K large enough that the corresponding δ is small. Therefore the bound in (1.6) tends to 0 as n → ∞, which substituted back into (1.5) implies that Sn /n − µn → 0 in probability.
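To see the necessity of a tail condition concretely, the standard Cauchy distribution has x·P(|X1 | > x) → 1/π ≠ 0, so the hypothesis of Theorem 1.5 fails; the following illustrative sketch tracks the running average along one sample path, which keeps fluctuating instead of stabilizing.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Standard Cauchy: x * P(|X_1| > x) -> 1/pi != 0, so the hypothesis of
# Theorem 1.5 (and the WLLN itself) fails for this distribution.
X = rng.standard_cauchy(size=1_000_000)
running_avg = np.cumsum(X) / np.arange(1, X.size + 1)

for n in [10**k for k in range(2, 7)]:
    print(n, running_avg[n - 1])   # keeps jumping; no stabilization
```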
Exercise 1.6 [Representation of Moments] Let Y be a non-negative random variable. Show that $E[Y] = \int_0^\infty P(Y > y)\, dy$. Furthermore, for any p > 0, $E[Y^p] = \int_0^\infty p\, y^{p-1} P(Y > y)\, dy$.
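The identity in Exercise 1.6 is easy to verify numerically. As an illustrative sketch, the following compares a direct Monte Carlo estimate of E[Y^p] with a Riemann sum approximation of the tail integral, for Y ~ Exp(1) and p = 2 (arbitrary choices, with exact value E[Y²] = 2).

```python
import numpy as np

rng = np.random.default_rng(seed=4)

Y = rng.exponential(scale=1.0, size=1_000_000)  # non-negative; P(Y > y) = e^{-y}
p = 2.0

direct = (Y**p).mean()                          # Monte Carlo estimate of E[Y^p]

# Riemann sum approximation of int_0^infty p y^{p-1} P(Y > y) dy.
y = np.linspace(0.0, 50.0, 200_000)
integrand = p * y**(p - 1) * np.exp(-y)         # p y^{p-1} P(Y > y) for Exp(1)
integral = (integrand[:-1] * np.diff(y)).sum()

print(direct, integral)                         # both should be close to 2
```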
2 Strong Law of Large Numbers
We now prove the strong law of large numbers (SLLN), i.e., Sn /n → µ almost surely, under suitable conditions on the i.i.d. sequence (Xn )n∈N . We first introduce a fundamental lemma which is the standard tool for proving almost sure statements.
Lemma 2.1 [Borel-Cantelli] Let (Ω, F, P) be a probability space, and An ∈ F for n ∈ N.
(i) If $\sum_{n=1}^\infty P(A_n) < \infty$, then (An )n∈N occurs infinitely often with probability 0, i.e., almost surely $\sum_{n=1}^\infty \mathbf{1}_{A_n}(\omega) < \infty$.

(ii) If (An )n∈N is an independent collection of events with $\sum_{n=1}^\infty P(A_n) = \infty$, then (An )n∈N occurs infinitely often with probability 1, i.e., almost surely $\sum_{n=1}^\infty \mathbf{1}_{A_n}(\omega) = \infty$.
Proof. For (i), we note that
$$\sum_{n=1}^\infty P(A_n) = \sum_{n=1}^\infty E[\mathbf{1}_{A_n}] = E\Big[\sum_{n=1}^\infty \mathbf{1}_{A_n}\Big] < \infty,$$
where we used Tonelli’s Theorem to interchange $\sum_{n=1}^\infty$ with E[·]. Therefore the non-negative random variable $\sum_{n=1}^\infty \mathbf{1}_{A_n}(\omega)$ must be finite almost surely.
For (ii), we note that $\sum_{n=1}^\infty \mathbf{1}_{A_n}(\omega) < \infty$ if and only if $\limsup_{n\to\infty} \mathbf{1}_{A_n}(\omega) = 0$, i.e., $\omega \in \cup_{n=1}^\infty \cap_{m\ge n} A_m^c$. Note that $\cap_{m\ge n} A_m^c$ is increasing in n. Therefore
$$P\Big(\sum_{n=1}^\infty \mathbf{1}_{A_n}(\omega) < \infty\Big) = P\big(\cup_{n=1}^\infty \cap_{m\ge n} A_m^c\big) = \lim_{n\to\infty} P\big(\cap_{m\ge n} A_m^c\big) = \lim_{n\to\infty}\lim_{N\to\infty} P\big(\cap_{m=n}^N A_m^c\big)$$
$$= \lim_{n\to\infty}\lim_{N\to\infty} \prod_{m=n}^N \big(1 - P(A_m)\big) \le \lim_{n\to\infty}\lim_{N\to\infty} \prod_{m=n}^N e^{-P(A_m)} = \lim_{n\to\infty} e^{-\sum_{m=n}^\infty P(A_m)} = 0,$$
where in the second and third equalities we used the countable additivity of P, in the fourth equality we used the independence of (An )n∈N , in the inequality we used that 1 − x ≤ e^{−x} for all x ∈ R, and in the last equality we used the assumption $\sum_{n=1}^\infty P(A_n) = \infty$.
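A small simulation makes the dichotomy in the Borel-Cantelli Lemma concrete. In the sketch below, the events An have probabilities 1/n² (summable, so part (i) predicts finitely many occurrences) and 1/n (non-summable and independent, so part (ii) predicts infinitely many, although up to a finite horizon the count grows only like the harmonic sum).

```python
import numpy as np

rng = np.random.default_rng(seed=5)

N = 1_000_000
n = np.arange(1, N + 1)
U = rng.random(N)              # independent uniforms driving the events A_n

# Part (i): sum of 1/n^2 converges, so only finitely many A_n occur.
print((U < 1.0 / n**2).sum())  # typically just a handful of occurrences

# Part (ii): sum of 1/n diverges, so infinitely many A_n occur; up to N
# the expected count is the harmonic sum, about log(N) + 0.577 ~ 14 here.
print((U < 1.0 / n).sum())
```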
Here is a useful extension of Lemma 2.1 (ii), which replaces the independence of (An )n∈N
by controls on their pairwise correlations.
Lemma 2.2 [Kochen-Stone] Let (Ω, F, P) be a probability space, and An ∈ F for n ∈ N.
If $\sum_{n=1}^\infty P(A_n) = \infty$, then
$$P\Big(\sum_{n=1}^\infty \mathbf{1}_{A_n}(\omega) = \infty\Big) \ge \limsup_{k\to\infty} \frac{\Big(\sum_{n=1}^k P(A_n)\Big)^2}{\sum_{m=1}^k \sum_{n=1}^k P(A_m \cap A_n)}.$$
Note that if (An )n∈N are pairwise independent, then almost surely An occurs infinitely often.
Exercise 2.3 Recall the Paley-Zygmund inequality: if X ≥ 0 and E[X²] < ∞, then for 0 ≤ a < E[X], P(X > a) ≥ (E[X] − a)²/E[X²]. Adapt the proof of the Paley-Zygmund inequality to prove the Kochen-Stone Lemma.
As an easy corollary of the Borel-Cantelli Lemma, we prove a version of the Strong Law
of Large Numbers (SLLN) with a finite 4-th moment assumption.
Theorem 2.4 Let (Xn )n∈N be a sequence of i.i.d. random variables with $E[X_1^4] < \infty$. Then
$$\lim_{n\to\infty} \frac{S_n}{n} := \lim_{n\to\infty} \frac{\sum_{i=1}^n X_i}{n} = E[X_1] \quad \text{almost surely.} \qquad (2.7)$$
Proof. Without loss of generality (w.l.o.g.), we may assume E[X1 ] = 0, since otherwise we can just replace Xi with $\tilde X_i := X_i - E[X_i]$. For any ε > 0, by Markov’s inequality, we have
$$P\Big(\Big|\frac{S_n}{n}\Big| > \varepsilon\Big) \le \varepsilon^{-4}\, E\Big[\Big|\frac{S_n}{n}\Big|^4\Big] = \frac{1}{\varepsilon^4 n^4}\, E\Big[\Big(\sum_{i=1}^n X_i\Big)^4\Big].$$
When we expand $\big(\sum_{i=1}^n X_i\big)^4$ and take the expectation, the only terms with $E[X_{i_1} X_{i_2} X_{i_3} X_{i_4}] \neq 0$ are the ones where either i1 , . . . , i4 are all equal, or they take on two distinct values with each value repeated twice among i1 , . . . , i4 . Therefore uniformly in n ∈ N, we have
$$P\Big(\Big|\frac{S_n}{n}\Big| > \varepsilon\Big) \le \frac{E[X_1^4]}{\varepsilon^4 n^3} + \frac{3n(n-1)\, E[X_1^2]^2}{\varepsilon^4 n^4} \le \frac{C}{n^2} \quad \text{for some } C > 0.$$
Clearly $\sum_{n=1}^\infty P\big(|S_n/n| > \varepsilon\big) < \infty$, and hence by Borel-Cantelli, almost surely the events $\{|S_n/n| > \varepsilon\}_{n\in\mathbb N}$ can occur only finitely many times. In other words, a.s. $\limsup_{n\to\infty} |S_n/n| \le \varepsilon$. Since ε > 0 is arbitrary, (2.7) follows.
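To see the almost sure convergence in Theorem 2.4 along a single sample path, the sketch below tracks the running average Sn /n for i.i.d. Uniform[−1, 1] samples (an arbitrary choice with E[X1⁴] < ∞ and mean 0).

```python
import numpy as np

rng = np.random.default_rng(seed=6)

# One fixed sample path omega: Uniform[-1, 1] has E[X_1^4] < infinity,
# so Theorem 2.4 gives S_n/n -> 0 along almost every path.
X = rng.uniform(-1.0, 1.0, size=1_000_000)
running_avg = np.cumsum(X) / np.arange(1, X.size + 1)

for n in [10**k for k in range(2, 7)]:
    print(n, running_avg[n - 1])   # should shrink toward 0 as n grows
```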
3 Kolmogorov’s 0-1 Law
Before extending the SLLN to the minimal assumption E[|X1 |] < ∞, we show here that either
limn→∞ Sn /n almost surely does not exist, or the limit equals a non-random constant a.s. This
is based on Kolmogorov’s 0-1 law.
Let X1 , X2 , . . . be a sequence of independent (not necessarily identically distributed) random variables defined on a probability space (Ω, F, P). Kolmogorov’s 0-1 law states that events which do not depend on the values of any finite number of the Xi ’s can only have probability either 0 or 1. Examples of such events include $\{\lim_{n\to\infty} \sum_{i=1}^n X_i/a_n \in [c, d]\}$ or $\{\limsup_{n\to\infty} \sum_{i=1}^n X_i/a_n > c\}$, for any sequence $a_n \uparrow \infty$, since $\limsup_{n\to\infty} \sum_{i=1}^n X_i/a_n$ does not change if only a finite number of Xi ’s are modified. To be more precise, we need to introduce the notion of tail σ-algebras.
For m ≤ n ∈ N ∪ {∞}, let $\mathcal F_m^n := \sigma(X_m, X_{m+1}, \ldots, X_n)$ denote the σ-algebra on Ω generated by Xm , . . . , Xn , i.e., it is the σ-algebra generated by events of the form $X_i^{-1}((a, b])$ for m ≤ i ≤ n and a < b.
Definition 3.1 [Tail σ-algebra] The σ-algebra $\mathcal T := \cap_{n\in\mathbb N} \mathcal F_n^\infty$ is called the tail σ-algebra.
Intuitively, T consists of events which do not depend on the values of any finite collection of
Xi ’s, or in other words, depend only on the infinite right tail of the sequence (X1 , X2 , . . .).
Theorem 3.2 [Kolmogorov’s 0-1 Law] If X1 , X2 , . . . are independent random variables, then the tail σ-field T is trivial in the sense that P(A) ∈ {0, 1} for all A ∈ T .
Proof. We will show that every event A ∈ T is independent of itself, and hence P(A) = P(A ∩ A) = P(A)², which implies P(A) ∈ {0, 1}.
We note that by the independence of (X1 , X2 , . . .), $\mathcal F_m^n$ is independent of $\mathcal F_k^l$ if n < k. Since $\mathcal T := \cap_{j\in\mathbb N} \mathcal F_j^\infty \subset \mathcal F_{n+1}^\infty$, T is independent of $\mathcal F_1^n$ for any n ∈ N. Since $\mathcal F_1^\infty$ is the σ-algebra generated by $\mathcal F_1^n$, n ∈ N, it follows that T is also independent of $\mathcal F_1^\infty$. On the other hand $\mathcal T \subset \mathcal F_1^\infty$, and hence T must be independent of itself, which concludes the proof.
Note that for i.i.d. random variables (Xi )i∈N , $\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n X_i$ is a random variable measurable with respect to T , and hence must be trivial, i.e., equal to a constant with probability 1. The same applies to $\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^n X_i$. Therefore either a.s. $\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n X_i$ does not exist, or a.s. it exists and equals a constant.
References
[1] W. Feller. An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley & Sons, Inc., 1971. (Note: some parts of the 1971 edition are quite different from the 1967 edition.)