Lecture 3

1 Weak Law of Large Numbers

Previously, we have shown how to construct an infinite sequence of independent random variables on a common probability space (Ω, F, P). We now study a sequence of independent and identically distributed (i.i.d.) real-valued random variables (X_n)_{n∈N}. For instance, (X_n)_{n∈N} could be the result of a gambler betting on a sequence of coin tosses, where the gambler wins $1 and sets X_n := 1 if the n-th coin toss shows head up, or otherwise loses $1 and sets X_n := −1. The probability P(X_n = 1) = 1 − P(X_n = −1) = p ∈ [0, 1] is the bias of the coin, with p > 1/2 being favorable to the gambler, p < 1/2 unfavorable, and p = 1/2 fair.

A natural quantity to consider is then the aggregate winnings

    S_n := Σ_{i=1}^n X_i,

and its average S_n/n (also called the empirical average if we interpret X_1, X_2, . . . as empirical observations of a sequence of experiments). The law of large numbers (LLN) refers to the phenomenon that, under the i.i.d. assumption on (X_n)_{n∈N}, as n → ∞, S_n/n converges to µ := E[X_1]. Since S_n/n is a sequence of random variables defined on the probability space (Ω, F, P), and µ is a constant which can also be regarded as a random variable, we have two different notions of convergence. The first is convergence of S_n/n → µ in probability, i.e.,

    ∀ ε > 0,   P(|S_n/n − µ| > ε) → 0   as n → ∞,

which is called the weak law of large numbers (WLLN). The second is almost sure convergence,

    S_n(ω)/n → µ   as n → ∞   for P-almost every ω,

which is called the strong law of large numbers (SLLN).

Exercise 1.1 In the coin toss example above, X̃_n := (X_n + 1)/2 equals 1 if the n-th coin toss shows head up and 0 otherwise. Use binomial expansion to prove the WLLN for S̃_n := Σ_{i=1}^n X̃_i, which counts the number of heads. Then deduce the WLLN for S_n.

We now prove the WLLN under a finite second moment restriction.

Theorem 1.2 [L2 Weak Law of Large Numbers] Let (X_n)_{n∈N} be a sequence of i.i.d. R-valued random variables defined on the probability space (Ω, F, P).
Assume that E[X_1] = µ and Var(X_1) := E[X_1^2] − E[X_1]^2 = σ^2 < ∞. Then S_n/n := Σ_{i=1}^n X_i/n converges in probability to µ as n → ∞.

Proof. The proof relies on control of the variance of S_n and an application of Markov's inequality. Denote Y_n := S_n/n − µ. Note that E[Y_n] = 0. To show Y_n → E[Y_n] = 0 in probability, it suffices to show that the variance of Y_n, E[Y_n^2] − E[Y_n]^2 = E[Y_n^2], tends to 0. Indeed, by Markov's inequality, for any ε > 0,

    P(|Y_n| > ε) = E[1_{|Y_n|>ε}] ≤ E[|Y_n|^2 ε^{−2} 1_{|Y_n|>ε}] ≤ E[Y_n^2] ε^{−2},

which tends to 0 if E[Y_n^2] → 0 as n → ∞. Under the finite second moment assumption, E[Y_n^2] is easy to evaluate:

    E[Y_n^2] = E[(Σ_{i=1}^n X_i/n − µ)^2] = E[(Σ_{i=1}^n (X_i − µ)/n)^2]
             = (1/n^2) Σ_{i,j=1}^n E[(X_i − µ)(X_j − µ)] = (1/n^2) Σ_{i=1}^n E[(X_i − µ)^2] = σ^2/n,

which tends to 0 as n → ∞, and hence Y_n → 0 in probability.

We now extend Theorem 1.2 to a WLLN requiring only E[|X_1|] < ∞.

Theorem 1.3 [L1 Weak Law of Large Numbers] Let (X_n)_{n∈N} be a sequence of i.i.d. random variables with E[X_1] = µ ∈ R. Then S_n/n := Σ_{i=1}^n X_i/n converges in probability to µ as n → ∞.

Proof. Since E[X_1^2] may be infinite, we can no longer use second moment calculations to bound P(|S_n/n − µ| > ε). Instead, we will first truncate each X_i, i.e., replace each X_i by X_i^M, with X_i^M := X_i if |X_i| ≤ M, and X_i^M := 0 if |X_i| > M. Since the sequence (X_i^M)_{i∈N} is i.i.d. with µ_M := E[X_1^M] ∈ R and Var(X_1^M) < ∞, we can apply the L2 WLLN to S_n^M := Σ_{i=1}^n X_i^M to conclude that S_n^M/n → µ_M in probability. Note that for any ε > 0,

    P(|S_n/n − µ| > ε) = P(|S_n/n − S_n^M/n + S_n^M/n − µ_M + µ_M − µ| > ε)
        ≤ P(|S_n/n − S_n^M/n| > ε/3) + P(|S_n^M/n − µ_M| > ε/3) + P(|µ_M − µ| > ε/3),   (1.1)

where the middle term tends to 0 as n → ∞ for each M > 0. Therefore, to show that the right-hand side (RHS) tends to 0 as n → ∞, it suffices to show that

    ∀ δ > 0, ∃ M_0 > 0, such that for all M > M_0:
        lim sup_{n→∞} P(|S_n/n − S_n^M/n| > ε/3) ≤ δ   and   P(|µ_M − µ| > ε/3) ≤ δ.   (1.2)
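As an aside, the effect of the truncation can be checked numerically: for an integrable X_1, both µ_M → µ and the truncation error E[|X_1 − X_1^M|] = E[|X_1| 1_{|X_1|>M}] → 0 as M → ∞. A minimal sketch, with Exponential(1) as an arbitrary choice of distribution (for which µ = 1 and E[X 1_{X>M}] = (M+1)e^{−M} exactly):

```python
import math
import random

# Truncation as in the proof of Theorem 1.3, illustrated for X ~ Exponential(1):
# mu = E[X] = 1, and the truncation error E[X 1_{X > M}] = (M + 1) e^{-M}.
random.seed(0)
samples = [random.expovariate(1.0) for _ in range(200_000)]
n = len(samples)

for M in [1.0, 2.0, 5.0, 10.0]:
    mu_M = sum(x for x in samples if x <= M) / n   # Monte Carlo estimate of E[X^M]
    err = sum(x for x in samples if x > M) / n     # Monte Carlo estimate of E[|X - X^M|]
    exact = (M + 1.0) * math.exp(-M)               # exact truncation error
    print(f"M={M:4.1f}  mu_M ~ {mu_M:.4f}  E[|X - X^M|] ~ {err:.4f}  (exact {exact:.4f})")
```

As M grows, mu_M approaches 1 and the Monte Carlo truncation error tracks the exact value (M+1)e^{−M}, consistent with (1.4) below.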
Note that by Markov's inequality,

    P(|S_n/n − S_n^M/n| > ε/3) ≤ (3/ε) E[|S_n/n − S_n^M/n|] = (3/(εn)) E[|Σ_{i=1}^n (X_i − X_i^M)|] ≤ (3/ε) E[|X_1 − X_1^M|],   (1.3)

and

    P(|µ_M − µ| > ε/3) = 1_{|µ_M − µ| > ε/3} = 1_{|E[X_1^M − X_1]| > ε/3} ≤ 1_{E[|X_1 − X_1^M|] > ε/3}.

Therefore to prove (1.2), it suffices to show that

    lim_{M→∞} E[|X_1 − X_1^M|] = lim_{M→∞} E[|X_1| 1_{|X_1|>M}] = 0.   (1.4)

Since E[|X_1|] < ∞, and P(|X_1| > M) ≤ E[|X_1|]/M → 0 as M → ∞, the above limit holds by the Dominated Convergence Theorem (see also Exercise 1.4 below), and so does (1.2), and hence S_n/n → µ in probability.

Exercise 1.4 Let X : (Ω, F, P) → (R, B) satisfy E[|X|] < ∞. If A_n ∈ F is a sequence of sets with lim_{n→∞} P(A_n) = 0, then prove that lim_{n→∞} E[|X| 1_{A_n}] = 0.

We now make a further extension of the WLLN that does not even assume E[|X_1|] < ∞.

Theorem 1.5 [Weak Law of Large Numbers] Let (X_n)_{n∈N} be a sequence of i.i.d. random variables with lim_{x→∞} x P(|X_1| > x) = 0. Let S_n := Σ_{i=1}^n X_i and µ_n := E[X_1 1_{|X_1|≤n}]. Then S_n/n − µ_n → 0 in probability as n → ∞.

Remark. The condition lim_{x→∞} x P(|X_1| > x) = 0 is in fact necessary for the WLLN. See Feller [1, Section VII.7] for a proof.

Proof. Since µ_n is the mean of X_1 truncated at level n, we choose the truncation level M in the proof of Theorem 1.3 to be n, i.e., X_i^n := X_i 1_{|X_i|≤n} and S_n^n := Σ_{i=1}^n X_i^n. Analogous to (1.1), we have

    P(|S_n/n − µ_n| > ε) ≤ P(|S_n/n − S_n^n/n| > ε/2) + P(|S_n^n/n − µ_n| > ε/2).   (1.5)

We cannot apply Markov's inequality to the first term because of the lack of integrability. Instead, we note that (S_n − S_n^n)/n = Σ_{i=1}^n (X_i − X_i^n)/n = Σ_{i=1}^n X_i 1_{|X_i|>n}/n, and hence the event {|S_n/n − S_n^n/n| > ε/2} occurs only if |X_i| > n for some 1 ≤ i ≤ n. Therefore by a union bound,

    P(|S_n/n − S_n^n/n| > ε/2) ≤ Σ_{i=1}^n P(|X_i| > n) = n P(|X_1| > n),

which tends to 0 as n → ∞ by our assumption. To bound the second term in (1.5), we apply Markov's inequality with an L2 bound:

    P(|S_n^n/n − µ_n| > ε/2) ≤ (4/ε^2) E[(S_n^n/n − µ_n)^2] = (4/(ε^2 n)) Var(X_1^n) ≤ (4/(ε^2 n)) E[(X_1^n)^2].   (1.6)
By Exercise 1.6 below,

    (1/n) E[(X_1^n)^2] = (1/n) ∫_0^∞ 2y P(|X_1^n| > y) dy = (1/n) ∫_0^n 2y P(n ≥ |X_1| > y) dy ≤ ∫_0^1 2nt P(|X_1| > nt) dt,

where the last step substitutes y = nt and bounds P(n ≥ |X_1| > nt) ≤ P(|X_1| > nt). Since lim_{x→∞} x P(|X_1| > x) = 0, for any δ > 0 we can find K > 0 such that x P(|X_1| > x) ≤ δ for all x ≥ K. We can then bound

    ∫_0^1 2nt P(|X_1| > nt) dt ≤ ∫_0^{K/n} 2nt dt + ∫_{K/n}^1 2nt P(|X_1| > nt) dt ≤ K^2/n + 2δ,

whose limit as n → ∞ is at most 2δ, which can be made arbitrarily small since δ > 0 is arbitrary. Therefore the bound in (1.6) tends to 0 as n → ∞, which substituted back into (1.5) implies that S_n/n − µ_n → 0 in probability.

Exercise 1.6 [Representation of Moments] Let Y be a non-negative random variable. Show that E[Y] = ∫_0^∞ P(Y > y) dy. Furthermore, for any p > 0, E[Y^p] = ∫_0^∞ p y^{p−1} P(Y > y) dy.

2 Strong Law of Large Numbers

We now prove the strong law of large numbers (SLLN), i.e., S_n/n → µ almost surely, under suitable conditions on the i.i.d. sequence (X_n)_{n∈N}. We first need to introduce a fundamental lemma which is the standard tool for making almost sure statements.

Lemma 2.1 [Borel-Cantelli] Let (Ω, F, P) be a probability space, and A_n ∈ F for n ∈ N.

(i) If Σ_{n=1}^∞ P(A_n) < ∞, then (A_n)_{n∈N} occurs infinitely often with probability 0, i.e., almost surely Σ_{n=1}^∞ 1_{A_n}(ω) < ∞.

(ii) If (A_n)_{n∈N} is an independent collection of events with Σ_{n=1}^∞ P(A_n) = ∞, then (A_n)_{n∈N} occurs infinitely often with probability 1, i.e., almost surely Σ_{n=1}^∞ 1_{A_n}(ω) = ∞.

Proof. For (i), we note that

    Σ_{n=1}^∞ P(A_n) = Σ_{n=1}^∞ E[1_{A_n}] = E[Σ_{n=1}^∞ 1_{A_n}] < ∞,

where we used Tonelli's Theorem to interchange Σ_{n=1}^∞ with E[·]. Therefore the non-negative random variable Σ_{n=1}^∞ 1_{A_n}(ω) must be finite almost surely.

For (ii), we note that Σ_{n=1}^∞ 1_{A_n}(ω) < ∞ if and only if lim sup_{n→∞} 1_{A_n}(ω) = 0, i.e., ω ∈ ∪_{n=1}^∞ ∩_{m≥n} A_m^c. Note that ∩_{m≥n} A_m^c is increasing in n.
Therefore

    P(Σ_{n=1}^∞ 1_{A_n}(ω) < ∞) = P(∪_{n=1}^∞ ∩_{m≥n} A_m^c) = lim_{n→∞} P(∩_{m≥n} A_m^c) = lim_{n→∞} lim_{N→∞} P(∩_{m=n}^N A_m^c)
        = lim_{n→∞} lim_{N→∞} Π_{m=n}^N (1 − P(A_m)) ≤ lim_{n→∞} lim_{N→∞} Π_{m=n}^N e^{−P(A_m)} = lim_{n→∞} e^{−Σ_{m=n}^∞ P(A_m)} = 0,

where in the second and third equalities we used the continuity of P along monotone limits (a consequence of countable additivity), in the fourth equality we used the independence of (A_n)_{n∈N}, in the inequality we used that 1 − x ≤ e^{−x} for all x ∈ R, and in the last equality we used the assumption Σ_{n=1}^∞ P(A_n) = ∞.

Here is a useful extension of Lemma 2.1 (ii), which replaces the independence of (A_n)_{n∈N} by control on their pairwise correlations.

Lemma 2.2 [Kochen-Stone] Let (Ω, F, P) be a probability space, and A_n ∈ F for n ∈ N. If Σ_{n=1}^∞ P(A_n) = ∞, then

    P(Σ_{n=1}^∞ 1_{A_n}(ω) = ∞) ≥ lim sup_{k→∞} (Σ_{n=1}^k P(A_n))^2 / (Σ_{m=1}^k Σ_{n=1}^k P(A_m ∩ A_n)).

Note that if (A_n)_{n∈N} are pairwise independent (and Σ_{n=1}^∞ P(A_n) = ∞), then almost surely A_n occurs infinitely often.

Exercise 2.3 Recall the Paley-Zygmund inequality: If X ≥ 0 and E[X^2] < ∞, then for 0 ≤ a < E[X], P(X > a) ≥ (E[X] − a)^2 / E[X^2]. Apply the same proof as the one for the Paley-Zygmund inequality to prove the Kochen-Stone Lemma.

As an easy corollary of the Borel-Cantelli Lemma, we prove a version of the Strong Law of Large Numbers (SLLN) under a finite 4-th moment assumption.

Theorem 2.4 Let (X_n)_{n∈N} be a sequence of i.i.d. random variables with E[X_1^4] < ∞. Then

    lim_{n→∞} S_n/n := lim_{n→∞} (Σ_{i=1}^n X_i)/n = E[X_1]   almost surely.   (2.7)

Proof. Without loss of generality (w.l.o.g.), we may assume E[X_1] = 0, since otherwise we can just replace X_i with X̃_i := X_i − E[X_i]. For any ε > 0, by Markov's inequality, we have

    P(|S_n/n| > ε) ≤ ε^{−4} E[|S_n/n|^4] = (1/(ε^4 n^4)) E[(Σ_{i=1}^n X_i)^4].

When we expand (Σ_{i=1}^n X_i)^4 and take expectation, the only terms with E[X_{i_1} X_{i_2} X_{i_3} X_{i_4}] ≠ 0 are the ones where either i_1, . . . , i_4 are all equal, or they take on two distinct values with each value repeated twice among i_1, . . . , i_4.
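The combinatorial count just described can be verified by brute force: among the n^4 index quadruples, exactly n + 3n(n−1) have no index appearing exactly once (all four equal, or two distinct pairs), and these are the only quadruples that can contribute when E[X_1] = 0. A quick check (the function name is mine):

```python
from collections import Counter
from itertools import product

def nonvanishing_terms(n):
    """Count quadruples (i1, i2, i3, i4) in {0, ..., n-1}^4 for which
    E[X_{i1} X_{i2} X_{i3} X_{i4}] can be nonzero when the X_i are i.i.d.
    with mean zero: no index may appear exactly once, leaving only the
    patterns 'all equal' and 'two distinct pairs'."""
    return sum(
        1
        for idx in product(range(n), repeat=4)
        if all(m >= 2 for m in Counter(idx).values())
    )

for n in [2, 3, 5, 8]:
    assert nonvanishing_terms(n) == n + 3 * n * (n - 1)
print("non-vanishing terms: n + 3n(n-1), as used in the proof of Theorem 2.4")
```

Note that a pattern with one index appearing three times forces another to appear once, so excluding singletons is equivalent to keeping only the two patterns named in the proof.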
Therefore, uniformly in n ∈ N, we have

    P(|S_n/n| > ε) ≤ E[X_1^4]/(ε^4 n^3) + 3n(n−1) E[X_1^2]^2/(ε^4 n^4) ≤ C/n^2   for some C > 0.

Clearly Σ_{n=1}^∞ P(|S_n/n| > ε) < ∞, and hence by Borel-Cantelli, the events {|S_n/n| > ε}_{n∈N} almost surely can only occur finitely many times. In other words, a.s. lim sup_{n→∞} |S_n/n| ≤ ε. Since ε > 0 is arbitrary (take ε = 1/k and intersect the corresponding almost sure events over k ∈ N), (2.7) follows.

3 Kolmogorov's 0-1 Law

Before extending the SLLN to the minimal assumption E[|X_1|] < ∞, we show here that either lim_{n→∞} S_n/n almost surely does not exist, or the limit equals a non-random constant a.s. This is based on Kolmogorov's 0-1 law.

Let X_1, X_2, . . . be a sequence of independent (not necessarily identically distributed) random variables defined on a probability space (Ω, F, P). Kolmogorov's 0-1 law states that events which do not depend on the values of any finite number of the X_i's can only have probability either 0 or 1. Examples of such events include {lim_{n→∞} Σ_{i=1}^n X_i/a_n ∈ [c, d]} or {lim sup_{n→∞} Σ_{i=1}^n X_i/a_n > c}, for any sequence a_n ↑ ∞, since lim sup_{n→∞} Σ_{i=1}^n X_i/a_n does not change if only a finite number of X_i's are modified. To be more precise, we need to introduce the notion of tail σ-algebras.

For m ≤ n ∈ N ∪ {∞}, let F_m^n := σ(X_m, X_{m+1}, . . . , X_n) denote the σ-algebra on Ω generated by X_m, . . . , X_n, i.e., the σ-algebra generated by events of the form X_i^{−1}((a, b]) for m ≤ i ≤ n and a < b.

Definition 3.1 [Tail σ-algebra] The σ-algebra T := ∩_{n∈N} F_n^∞ is called the tail σ-algebra.

Intuitively, T consists of events which do not depend on the values of any finite collection of X_i's, or in other words, depend only on the infinite right tail of the sequence (X_1, X_2, . . .).

Theorem 3.2 [Kolmogorov's 0-1 Law] If X_1, X_2, . . . are independent random variables, then the tail σ-algebra T is trivial in the sense that P(A) ∈ {0, 1} for all A ∈ T.

Proof. We will show that every event A ∈ T is independent of itself, and hence P(A) = P(A ∩ A) = P(A)^2, which implies P(A) ∈ {0, 1}.
We note that by the independence of (X_1, X_2, . . .), F_m^n is independent of F_k^l if n < k. Since T := ∩_{n∈N} F_n^∞ ⊂ F_{n+1}^∞, T is independent of F_1^n for any n ∈ N. Since F_1^∞ is the σ-algebra generated by ∪_{n∈N} F_1^n, it follows (by a π-system argument) that T is also independent of F_1^∞. On the other hand, T ⊂ F_1^∞, and hence T must be independent of itself, which concludes the proof.

Note that for i.i.d. random variables (X_i)_{i∈N}, lim sup_{n→∞} (1/n) Σ_{i=1}^n X_i is a random variable measurable with respect to T, and hence must be trivial, i.e., equal to a constant with probability 1. The same applies to lim inf_{n→∞} (1/n) Σ_{i=1}^n X_i. Therefore either a.s. lim_{n→∞} (1/n) Σ_{i=1}^n X_i does not exist, or a.s. it exists and equals a constant.

References

[1] W. Feller. An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley & Sons, Inc., 1971. (Note: some parts of the 1971 version are quite different from the 1967 version.)
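As a numerical coda, the convergence in Theorems 1.2 and 2.4, together with the 0-1 law's conclusion that the limit of S_n/n is non-random, can be illustrated by simulation: independent runs of the empirical average all settle near the same constant µ. A sketch with Uniform(−1, 2), an arbitrary choice with µ = 1/2 and all moments finite:

```python
import random

# Several independent runs of the empirical average S_n/n for i.i.d.
# Uniform(-1, 2) samples (mu = 0.5).  By the SLLN each run converges to mu,
# and by Kolmogorov's 0-1 law the limit is the same non-random constant.
random.seed(1)
mu, n = 0.5, 100_000

averages = []
for run in range(5):
    s = sum(random.uniform(-1.0, 2.0) for _ in range(n))
    averages.append(s / n)

print("S_n/n across independent runs:", [round(a, 4) for a in averages])
```

Each run uses fresh samples, yet all five averages cluster around µ = 0.5; the run-to-run spread shrinks at the rate σ/√n given by the variance computation in the proof of Theorem 1.2.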