The Lindeberg central limit theorem

Jordan Bell
[email protected]
Department of Mathematics, University of Toronto
May 29, 2015

1 Convergence in distribution

We denote by $\mathscr{P}(\mathbb{R}^d)$ the collection of Borel probability measures on $\mathbb{R}^d$. Unless we say otherwise, we use the narrow topology on $\mathscr{P}(\mathbb{R}^d)$: the coarsest topology such that for each $f \in C_b(\mathbb{R}^d)$, the map
$$\mu \mapsto \int_{\mathbb{R}^d} f \, d\mu$$
is continuous $\mathscr{P}(\mathbb{R}^d) \to \mathbb{C}$. Because $\mathbb{R}^d$ is a Polish space, it follows that $\mathscr{P}(\mathbb{R}^d)$ is a Polish space.¹ (In fact, its topology is induced by the Prokhorov metric.²)

¹ Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker's Guide, third ed., p. 515, Theorem 15.15; http://individual.utoronto.ca/jordanbell/notes/narrow.pdf
² Onno van Gaans, Probability measures on metric spaces, http://www.math.leidenuniv.nl/~vangaans/jancol1.pdf; Bert Fristedt and Lawrence Gray, A Modern Approach to Probability Theory, p. 365, Theorem 25.

2 Characteristic functions

For $\mu \in \mathscr{P}(\mathbb{R}^d)$, we define its characteristic function $\tilde{\mu} : \mathbb{R}^d \to \mathbb{C}$ by
$$\tilde{\mu}(u) = \int_{\mathbb{R}^d} e^{iu \cdot x} \, d\mu(x).$$

Theorem 1. If $\mu \in \mathscr{P}(\mathbb{R})$ has finite $k$th moment, $k \geq 0$, then, writing $\phi = \tilde{\mu}$:
1. $\phi \in C^k(\mathbb{R})$.
2. $\phi^{(k)}(v) = i^k \int_{\mathbb{R}} x^k e^{ivx} \, d\mu(x)$.
3. $\phi^{(k)}$ is uniformly continuous.
4. $|\phi^{(k)}(v)| \leq \int_{\mathbb{R}} |x|^k \, d\mu(x)$.

Proof. For $0 \leq l \leq k$, define $f_l : \mathbb{R} \to \mathbb{C}$ by
$$f_l(v) = \int_{\mathbb{R}} x^l e^{ivx} \, d\mu(x).$$
For $h \neq 0$,
$$\left| x^l e^{ivx} \cdot \frac{e^{ihx}-1}{h} \right| \leq |x^l \cdot x| = |x|^{l+1},$$
so by the dominated convergence theorem we have, for $0 \leq l \leq k-1$,
$$\lim_{h \to 0} \frac{f_l(v+h)-f_l(v)}{h} = \lim_{h\to 0} \int_{\mathbb{R}} x^l e^{ivx} \, \frac{e^{ihx}-1}{h} \, d\mu(x) = \int_{\mathbb{R}} x^l e^{ivx} \lim_{h\to 0} \frac{e^{ihx}-1}{h} \, d\mu(x) = \int_{\mathbb{R}} i x^{l+1} e^{ivx} \, d\mu(x).$$
That is, $f_l' = i f_{l+1}$. And, by dominated convergence, for $\epsilon > 0$ there is some $\delta > 0$ such that if $|w| < \delta$ then
$$\int_{\mathbb{R}} |x|^k |e^{iwx}-1| \, d\mu(x) < \epsilon,$$
hence if $|v-u| < \delta$ then
$$|f_k(v) - f_k(u)| = \left| \int_{\mathbb{R}} x^k e^{iux}\big(e^{i(v-u)x}-1\big) \, d\mu(x) \right| \leq \int_{\mathbb{R}} |x|^k |e^{i(v-u)x}-1| \, d\mu(x) < \epsilon,$$
showing that $f_k$ is uniformly continuous. As well,
$$|f_k(v)| \leq \int_{\mathbb{R}} |x|^k \, d\mu(x).$$
But $\phi = f_0$, i.e.
$\phi^{(0)} = f_0$, so
$$\phi^{(1)} = f_0' = i f_1, \quad \phi^{(2)} = (i f_1)' = i^2 f_2, \quad \ldots, \quad \phi^{(k)} = i^k f_k.$$

If $\phi \in C^k(\mathbb{R})$, Taylor's theorem tells us that for each $x \in \mathbb{R}$,
$$\phi(x) = \sum_{l=0}^{k-1} \frac{\phi^{(l)}(0)}{l!} x^l + \int_0^x \frac{(x-t)^{k-1}}{(k-1)!} \phi^{(k)}(t) \, dt = \sum_{l=0}^{k} \frac{\phi^{(l)}(0)}{l!} x^l + \int_0^x \frac{(x-t)^{k-1}}{(k-1)!} \big(\phi^{(k)}(t) - \phi^{(k)}(0)\big) \, dt = \sum_{l=0}^{k} \frac{\phi^{(l)}(0)}{l!} x^l + R_k(x),$$
and $R_k(x)$ satisfies
$$|R_k(x)| \leq \frac{|x|^k}{k!} \sup_{0 \leq u \leq 1} \big|\phi^{(k)}(ux) - \phi^{(k)}(0)\big|.$$
Define $\theta_k : \mathbb{R} \to \mathbb{C}$ by $\theta_k(0) = 0$ and, for $x \neq 0$,
$$\theta_k(x) = \frac{k!}{x^k} R_k(x),$$
with which, for all $x \in \mathbb{R}$,
$$\phi(x) = \sum_{l=0}^{k} \frac{\phi^{(l)}(0)}{l!} x^l + \frac{1}{k!} \theta_k(x) x^k.$$
Because $R_k$ is continuous on $\mathbb{R}$, $\theta_k$ is continuous at each $x \neq 0$. Moreover,
$$|\theta_k(x)| \leq \sup_{0 \leq u \leq 1} \big|\phi^{(k)}(ux) - \phi^{(k)}(0)\big|,$$
and as $\phi^{(k)}$ is continuous it follows that $\theta_k$ is continuous at $0$. Thus $\theta_k$ is continuous on $\mathbb{R}$.

Lemma 2. If $\mu \in \mathscr{P}(\mathbb{R})$ has finite $k$th moment, $k \geq 0$, and for $0 \leq l \leq k$,
$$M_l = \int_{\mathbb{R}} x^l \, d\mu(x),$$
then there is a continuous function $\theta : \mathbb{R} \to \mathbb{C}$ for which
$$\tilde{\mu}(x) = \sum_{l=0}^{k} \frac{i^l M_l}{l!} x^l + \frac{1}{k!} \theta(x) x^k.$$
The function $\theta$ satisfies
$$|\theta(x)| \leq \sup_{0 \leq u \leq 1} \big|\tilde{\mu}^{(k)}(ux) - \tilde{\mu}^{(k)}(0)\big|.$$

Proof. From Theorem 1, $\tilde{\mu} \in C^k(\mathbb{R})$ and
$$\tilde{\mu}^{(l)}(0) = i^l \int_{\mathbb{R}} x^l \, d\mu(x) = i^l M_l.$$
Thus from what we worked out above with Taylor's theorem,
$$\tilde{\mu}(x) = \sum_{l=0}^{k} \frac{i^l M_l}{l!} x^l + \frac{1}{k!} \theta_k(x) x^k,$$
for which
$$|\theta_k(x)| \leq \sup_{0 \leq u \leq 1} \big|\tilde{\mu}^{(k)}(ux) - \tilde{\mu}^{(k)}(0)\big|.$$

For $a \in \mathbb{R}$ and $\sigma > 0$, let
$$p(t, a, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-a)^2}{2\sigma^2}\right), \quad t \in \mathbb{R}.$$
Let $\gamma_{a,\sigma^2}$ be the measure on $\mathbb{R}$ whose density with respect to Lebesgue measure is $p(\cdot, a, \sigma^2)$. We call $\gamma_{a,\sigma^2}$ a Gaussian measure. We calculate that the first moment of $\gamma_{a,\sigma^2}$ is $a$ and that its variance is $\sigma^2$. We also calculate that
$$\tilde{\gamma}_{a,\sigma^2}(x) = \exp\left(iax - \frac{1}{2}\sigma^2 x^2\right).$$

Lévy's continuity theorem is the following.³

Theorem 3 (Lévy's continuity theorem). Let $\mu_n$ be a sequence in $\mathscr{P}(\mathbb{R}^d)$.
1. If $\mu \in \mathscr{P}(\mathbb{R}^d)$ and $\mu_n \to \mu$, then $\tilde{\mu}_n$ converges to $\tilde{\mu}$ pointwise.
2. If there is some function $\phi : \mathbb{R}^d \to \mathbb{C}$ to which $\tilde{\mu}_n$ converges pointwise and $\phi$ is continuous at $0$, then there is some $\mu \in \mathscr{P}(\mathbb{R}^d)$ such that $\phi = \tilde{\mu}$ and such that $\mu_n \to \mu$.

³ http://individual.utoronto.ca/jordanbell/notes/martingaleCLT.pdf, p. 19, Theorem 15.
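The formula $\tilde{\gamma}_{a,\sigma^2}(x) = \exp(iax - \tfrac{1}{2}\sigma^2 x^2)$ can be checked numerically: the empirical characteristic function $\frac{1}{N}\sum_{k} e^{ixX_k}$ of a large Gaussian sample should agree with it up to Monte Carlo error. The following sketch (the function names `empirical_cf` and `gaussian_cf` are ours, introduced only for this illustration) does so with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_cf(samples, x):
    """Empirical characteristic function: average of exp(i*x*X) over the sample."""
    return np.mean(np.exp(1j * x * samples))

def gaussian_cf(x, a, sigma2):
    """Characteristic function of the Gaussian measure gamma_{a, sigma^2}."""
    return np.exp(1j * a * x - 0.5 * sigma2 * x ** 2)

a, sigma = 1.0, 2.0
samples = rng.normal(a, sigma, size=200_000)

# Monte Carlo error is O(1/sqrt(N)), so agreement to ~0.01 is expected here.
for x in (0.0, 0.5, 1.0):
    assert abs(empirical_cf(samples, x) - gaussian_cf(x, a, sigma ** 2)) < 0.01
```

This is also a small instance of Lévy's continuity theorem in action: empirical measures of growing samples converge narrowly to $\gamma_{a,\sigma^2}$, and their characteristic functions converge pointwise to $\tilde{\gamma}_{a,\sigma^2}$.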
3 The Lindeberg condition, the Lyapunov condition, the Feller condition, and asymptotic negligibility

Let $(\Omega, \mathscr{F}, P)$ be a probability space and let $X_n$, $n \geq 1$, be independent $L^2$ random variables. We specify when we impose other hypotheses on them; in particular, we specify if we suppose them to be identically distributed or to belong to $L^p$ for $p > 2$. For a random variable $X$, write
$$\sigma(X) = \sqrt{\operatorname{Var}(X)} = \sqrt{E(|X - E(X)|^2)}.$$
Write $\sigma_n = \sigma(X_n)$, and, using that the $X_n$ are independent,
$$s_n = \sigma\left(\sum_{j=1}^n X_j\right) = \left(\sum_{j=1}^n \sigma_j^2\right)^{1/2},$$
and $\eta_n = E(X_n)$. For $n \geq 1$ and $\epsilon > 0$, define
$$L_n(\epsilon) = \frac{1}{s_n^2} \sum_{j=1}^n E\big((X_j - \eta_j)^2 \cdot \mathbf{1}_{\{|X_j - \eta_j| \geq \epsilon s_n\}}\big) = \frac{1}{s_n^2} \sum_{j=1}^n \int_{|x - \eta_j| \geq \epsilon s_n} (x - \eta_j)^2 \, d(X_{j*}P)(x).$$
We say that the sequence $X_n$ satisfies the Lindeberg condition if for each $\epsilon > 0$,
$$\lim_{n\to\infty} L_n(\epsilon) = 0.$$
For example, if the sequence $X_n$ is identically distributed, then $s_n^2 = n\sigma_1^2$, so
$$L_n(\epsilon) = \frac{1}{n\sigma_1^2} \sum_{j=1}^n \int_{|x-\eta_1| \geq \epsilon n^{1/2}\sigma_1} (x-\eta_1)^2 \, d(X_{1*}P)(x) = \frac{1}{\sigma_1^2} \int_{|x-\eta_1| \geq \epsilon n^{1/2}\sigma_1} (x-\eta_1)^2 \, d(X_{1*}P)(x).$$
But if $\mu$ is a Borel probability measure on $\mathbb{R}$, $f \in L^1(\mu)$, and $K_n$ is a sequence of compact sets that exhaust $\mathbb{R}$, then⁴
$$\int_{\mathbb{R} \setminus K_n} |f| \, d\mu \to 0, \quad n \to \infty.$$
Hence $L_n(\epsilon) \to 0$ as $n \to \infty$, showing that $X_n$ satisfies the Lindeberg condition.

⁴ V. I. Bogachev, Measure Theory, volume I, p. 125, Proposition 2.6.2.

We say that the sequence $X_n$ satisfies the Lyapunov condition if there is some $\delta > 0$ such that the $X_n$ are $L^{2+\delta}$ and
$$\lim_{n\to\infty} \frac{1}{s_n^{2+\delta}} \sum_{j=1}^n E(|X_j - \eta_j|^{2+\delta}) = 0.$$
In this case, for $\epsilon > 0$, $|x - \eta_j| \geq \epsilon s_n$ implies $|x-\eta_j|^{2+\delta} \geq |x-\eta_j|^2 (\epsilon s_n)^\delta$, and hence
$$L_n(\epsilon) \leq \frac{1}{s_n^2} \sum_{j=1}^n \int_{|x-\eta_j| \geq \epsilon s_n} \frac{|x-\eta_j|^{2+\delta}}{(\epsilon s_n)^\delta} \, d(X_{j*}P)(x) = \frac{1}{\epsilon^\delta s_n^{2+\delta}} \sum_{j=1}^n \int_{|X_j-\eta_j| \geq \epsilon s_n} |X_j-\eta_j|^{2+\delta} \, dP \leq \frac{1}{\epsilon^\delta s_n^{2+\delta}} \sum_{j=1}^n E(|X_j - \eta_j|^{2+\delta}) \to 0.$$
This is true for each $\epsilon > 0$, showing that if $X_n$ satisfies the Lyapunov condition then it satisfies the Lindeberg condition.
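The quantity $L_n(\epsilon)$ can be estimated by Monte Carlo. The sketch below (the helper `lindeberg_Ln` is ours, for illustration only) estimates it for centered variables; for bounded variables such as Uniform$(-1,1)$, the truncation region $\{|x| \geq \epsilon s_n\}$ becomes empty once $\epsilon s_n$ exceeds the bound, so $L_n(\epsilon)$ is exactly $0$ for large $n$:

```python
import numpy as np

rng = np.random.default_rng(1)

def lindeberg_Ln(samples_by_j, eps):
    """Monte Carlo estimate of L_n(eps) for independent centered X_1, ..., X_n.

    samples_by_j: list of 1-D arrays, each holding draws from one X_j
    (assumed to have mean 0, so eta_j = 0)."""
    sn = np.sqrt(sum(s.var() for s in samples_by_j))
    # E(X_j^2; |X_j| >= eps*s_n), estimated as a sample average with an indicator
    total = sum(np.mean(s ** 2 * (np.abs(s) >= eps * sn)) for s in samples_by_j)
    return total / sn ** 2

# n = 100 Uniform(-1,1) variables: s_n ~ sqrt(n/3) ~ 5.77, so eps*s_n ~ 2.9 > 1
# and no sample falls in the truncation region: L_n(eps) is exactly 0.
n = 100
draws = [rng.uniform(-1, 1, size=10_000) for _ in range(n)]
assert lindeberg_Ln(draws, 0.5) == 0.0
```

Heavier-tailed summands, or a single summand whose variance dominates $s_n^2$, would instead keep $L_n(\epsilon)$ bounded away from $0$.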
For example, if the $X_n$ are identically distributed and $L^{2+\delta}$, then
$$\frac{1}{s_n^{2+\delta}} \sum_{j=1}^n E(|X_j - \eta_j|^{2+\delta}) = \frac{1}{n^{\delta/2} \sigma_1^{2+\delta}} E(|X_1 - \eta_1|^{2+\delta}) \to 0,$$
showing that $X_n$ satisfies the Lyapunov condition.

Another example: suppose that the sequence $X_n$ is bounded by $M$ almost surely and that $s_n \to \infty$. $|X_n| \leq M$ almost surely implies that
$$|\eta_n| = |E(X_n)| \leq E(|X_n|) \leq M.$$
Therefore $|X_n - \eta_n| \leq |X_n| + |\eta_n| \leq 2M$ almost surely. Let $\delta > 0$. Then
$$\frac{1}{s_n^{2+\delta}} \sum_{j=1}^n E(|X_j - \eta_j|^{2+\delta}) \leq \frac{(2M)^\delta}{s_n^{2+\delta}} \sum_{j=1}^n E(|X_j - \eta_j|^2) = \frac{(2M)^\delta}{s_n^\delta} \to 0,$$
showing that $X_n$ satisfies the Lyapunov condition.

We say that a sequence of random variables $X_n$ satisfies the Feller condition when
$$\lim_{n\to\infty} \max_{1 \leq j \leq n} \frac{\sigma_j}{s_n} = 0,$$
where $\sigma_j = \sigma(X_j) = \sqrt{\operatorname{Var}(X_j)}$ and $s_n = \left(\sum_{j=1}^n \sigma_j^2\right)^{1/2}$.

We prove that if a sequence satisfies the Lindeberg condition then it satisfies the Feller condition.⁵

Lemma 4. If a sequence of random variables $X_n$ satisfies the Lindeberg condition, then it satisfies the Feller condition.

Proof. Let $\epsilon > 0$. For $n \geq 1$ and $1 \leq k \leq n$, we calculate
$$\sigma_k^2 = \int_{\mathbb{R}} (x-\eta_k)^2 \, d(X_{k*}P)(x) = \int_{|x-\eta_k| < \epsilon s_n} (x-\eta_k)^2 \, d(X_{k*}P)(x) + \int_{|x-\eta_k| \geq \epsilon s_n} (x-\eta_k)^2 \, d(X_{k*}P)(x) \leq (\epsilon s_n)^2 + \sum_{j=1}^n \int_{|x-\eta_j| \geq \epsilon s_n} (x-\eta_j)^2 \, d(X_{j*}P)(x) = \epsilon^2 s_n^2 + s_n^2 L_n(\epsilon).$$
Hence
$$\max_{1 \leq k \leq n} \left(\frac{\sigma_k}{s_n}\right)^2 \leq \epsilon^2 + L_n(\epsilon),$$
and so, because the $X_n$ satisfy the Lindeberg condition,
$$\limsup_{n\to\infty} \max_{1 \leq k \leq n} \left(\frac{\sigma_k}{s_n}\right)^2 \leq \epsilon^2.$$
This is true for all $\epsilon > 0$, which yields
$$\lim_{n\to\infty} \max_{1 \leq k \leq n} \left(\frac{\sigma_k}{s_n}\right)^2 = 0,$$
namely, that the $X_n$ satisfy the Feller condition.

⁵ Heinz Bauer, Probability Theory, p. 235, Lemma 28.2.

We do not use the following idea of an asymptotically negligible family of random variables elsewhere, and merely take this as an excuse to write out what it means. A family of random variables $X_{n,j}$, $n \geq 1$, $1 \leq j \leq k_n$, is called asymptotically negligible⁶ if for each $\epsilon > 0$,
$$\lim_{n\to\infty} \max_{1 \leq j \leq k_n} P(|X_{n,j}| \geq \epsilon) = 0.$$
A sequence of random variables $X_n$ converging in probability to $0$ is equivalent to it being asymptotically negligible, with $k_n = 1$ for each $n$.
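The Feller quantity $\max_{1\leq j\leq n} \sigma_j/s_n$ is a one-line computation once the standard deviations are known. The sketch below (the helper `feller_max_ratio` is ours, for illustration) contrasts a case where the condition holds with one where a single variance dominates $s_n^2$ and it fails:

```python
import numpy as np

def feller_max_ratio(sigmas):
    """max_{1<=j<=n} sigma_j / s_n, where s_n^2 = sum of the sigma_j^2."""
    sigmas = np.asarray(sigmas, dtype=float)
    sn = np.sqrt(np.sum(sigmas ** 2))
    return sigmas.max() / sn

# Equal variances (the iid-type case): the ratio is 1/sqrt(n) -> 0, Feller holds.
assert abs(feller_max_ratio(np.ones(100)) - 0.1) < 1e-12

# Geometrically growing sigma_j = 2^j: the last term dominates s_n^2, the ratio
# stays near sqrt(3)/2 ~ 0.866 for every n, and the Feller condition fails.
assert feller_max_ratio(2.0 ** np.arange(20)) > 0.8
```

By Lemma 4, the second sequence therefore cannot satisfy the Lindeberg condition either.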
For example, suppose that the $X_{n,j}$ are $L^2$ random variables each with $E(X_{n,j}) = 0$ and that they satisfy
$$\lim_{n\to\infty} \max_{1 \leq j \leq k_n} \operatorname{Var}(X_{n,j}) = 0.$$
For $\epsilon > 0$, by Chebyshev's inequality,
$$P(|X_{n,j}| \geq \epsilon) \leq \frac{1}{\epsilon^2} E(|X_{n,j}|^2) = \frac{1}{\epsilon^2} \operatorname{Var}(X_{n,j}),$$
whence
$$\lim_{n\to\infty} \max_{1 \leq j \leq k_n} P(|X_{n,j}| \geq \epsilon) \leq \limsup_{n\to\infty} \max_{1 \leq j \leq k_n} \frac{1}{\epsilon^2}\operatorname{Var}(X_{n,j}) = 0,$$
and so the random variables $X_{n,j}$ are asymptotically negligible.

Another example: suppose that the random variables $X_{n,j}$ are identically distributed, with $\mu = X_{n,j*}P$. For $\epsilon > 0$,
$$P\left(\left|\frac{X_{n,j}}{n}\right| \geq \epsilon\right) = P(|X_{n,j}| \geq \epsilon n) = \mu(A_n),$$
where $A_n = \{x \in \mathbb{R} : |x| \geq \epsilon n\}$. As $A_n \downarrow \emptyset$, $\lim_{n\to\infty} \mu(A_n) = 0$. Hence the random variables $\frac{X_{n,j}}{n}$ are asymptotically negligible.

The following is a statement about the characteristic functions of an asymptotically negligible family of random variables.⁷

Lemma 5. Suppose that a family $X_{n,j}$, $n \geq 1$, $1 \leq j \leq k_n$, of random variables is asymptotically negligible, and write $\mu_{n,j} = X_{n,j*}P$ and $\phi_{n,j} = \tilde{\mu}_{n,j}$. For each $x \in \mathbb{R}$,
$$\lim_{n\to\infty} \max_{1 \leq j \leq k_n} |\phi_{n,j}(x) - 1| = 0.$$

Proof. For any real $t$, $|e^{it} - 1| \leq |t|$. For $x \in \mathbb{R}$, $\epsilon > 0$, $n \geq 1$, and $1 \leq j \leq k_n$,
$$|\phi_{n,j}(x) - 1| = \left|\int_{\mathbb{R}} (e^{ixy} - 1) \, d\mu_{n,j}(y)\right| \leq \int_{|y|<\epsilon} |e^{ixy}-1| \, d\mu_{n,j}(y) + \int_{|y|\geq\epsilon} |e^{ixy}-1| \, d\mu_{n,j}(y) \leq \int_{|y|<\epsilon} |xy| \, d\mu_{n,j}(y) + \int_{|y|\geq\epsilon} 2 \, d\mu_{n,j}(y) \leq \epsilon|x| + 2P(|X_{n,j}| \geq \epsilon).$$
Hence
$$\max_{1 \leq j \leq k_n} |\phi_{n,j}(x) - 1| \leq \epsilon|x| + 2\max_{1 \leq j \leq k_n} P(|X_{n,j}| \geq \epsilon).$$
Using that the family $X_{n,j}$ is asymptotically negligible,
$$\limsup_{n\to\infty} \max_{1 \leq j \leq k_n} |\phi_{n,j}(x) - 1| \leq \epsilon|x|.$$
But this is true for all $\epsilon > 0$, so
$$\limsup_{n\to\infty} \max_{1 \leq j \leq k_n} |\phi_{n,j}(x) - 1| = 0,$$
proving the claim.

⁶ Heinz Bauer, Probability Theory, p. 225, §27.2.
⁷ Heinz Bauer, Probability Theory, p. 227, Lemma 27.3.

4 The Lindeberg central limit theorem

We now prove the Lindeberg central limit theorem.⁸

Theorem 6 (Lindeberg central limit theorem). If $X_n$ is a sequence of independent $L^2$ random variables that satisfy the Lindeberg condition, then $S_{n*}P \to \gamma_{0,1}$, where
$$S_n = \frac{1}{s_n}\sum_{j=1}^n (X_j - \eta_j) = \frac{\sum_{j=1}^n (X_j - E(X_j))}{\sigma(X_1 + \cdots + X_n)}.$$

⁸ Heinz Bauer, Probability Theory, p. 235, Theorem 28.3.

Proof.
The sequence Yn = Xn − E(Xn ) are independent L2 random variables that satisfy the Lindeberg condition and σ(Yn ) = σ(Xn ). Proving the claim for the sequence Yn will prove the claim for the sequence Xn , and thus it suffices to prove the claim when E(Xn ) = 0, i.e. ηn = 0. For n ≥ 1 and 1 ≤ j ≤ n, let Xj σj µn,j = P and τn,j = . sn ∗ sn The first moment of µn,j is Z Z Xj Xj 1 P (x) = xd dP = E(Xj ) = 0, sn ∗ sn R Ω sn and the second moment of µn,j is Z R 8 Heinz x2 d Xj sn P ∗ Z (x) = Ω Xj sn 2 dP = σj2 1 2 2 E(X ) = = τn,j , j s2n s2n Bauer, Probability Theory, p. 235, Theorem 28.3. 9 for which n X n 1 X 2 σ = 1. = 2 sn j=1 j j=1 R R For µ ∈ P(R) with first moment R xdµ(x) = 0 and second moment R x2 dµ(x) = σ 2 < ∞, Lemma 2 tells us that µ̃(x) = M0 + iM1 x − 2 τn,j σ2 2 1 M2 2 1 x + θ2 (x)x2 = 1 − x + θ(x)x2 , 2 2 2 2 with |θ(x)| ≤ sup |µ̃00 (ux) − µ̃00 (0)|. 0≤u≤1 But by Lemma 1, µ̃00 (ux) = − Z y 2 eiuxy dµ(y), R so Z 2 iuxy + 1)dµ(y) |θ(x)| ≤ sup y (−e 0≤u≤1 R Z ≤ sup y 2 |eiuxy − 1|dµ(y). 0≤u≤1 R For 0 ≤nu ≤ o1, |eiuxy − 1| ≤ |uxy| ≤ |xy|, so for x ∈ R and > 0, with , when |y| < δ and 0 ≤ u ≤ 1 we have |eiuxy − 1| < . Thus δ = min , |x| Z Z y 2 |eiuxy − 1|dµ(y) y 2 |eiuxy − 1|dµ(y) + sup |θ(x)| ≤ sup 0≤u≤1 |y|≥δ 0≤u≤1 |y|<δ Z Z ≤ y 2 dµ(y) + 2 y 2 dµ(y) |y|<δ |y|≥δ Z 2 2 ≤ σ + 2 y dµ(y). |y|≥δ n o Let x ∈ R and > 0, and take δ = min , |x| . On the one hand, for n ≥ 1 and 1 ≤ j ≤ n, because the first moment of µn,j is 0 and its second moment is 2 τn,j , 2 τn,j 1 µ̃n,j (x) = 1 − x2 + θn,j (x)x2 , 2 2 with, from the above, Z 2 |θn,j (x)| ≤ τn,j + 2 y 2 dµn,j (y). |y|≥δ 2 On the other hand, the first moment of the Gaussian measure γ0,τn,j is 0 and 2 its second moment is τn,j . Its characteristic function is ! 2 2 τn,j τn,j 1 2 2 (x) = exp γ̃0,τn,j − x =1− x2 + ψn,j (x)x2 , 2 2 2 10 with, from the above, 2 |ψn,j (x)| ≤ τn,j +2 Z |y|≥δ 2 (x). y 2 dγ0,τn,j In particular, for all x ∈ R, 2 (x) = µ̃n,j (x) − γ̃0,τn,j x2 (θn,j (x) − ψn,j (x)) . 
For $k \geq 1$ and $a_l, b_l \in \mathbb{C}$, $1 \leq l \leq k$,
$$\prod_{l=1}^k a_l - \prod_{l=1}^k b_l = \sum_{l=1}^k b_1 \cdots b_{l-1}(a_l - b_l)a_{l+1} \cdots a_k.$$
If further $|a_l| \leq 1$, $|b_l| \leq 1$, then
$$\left|\prod_{l=1}^k a_l - \prod_{l=1}^k b_l\right| \leq \sum_{l=1}^k |a_l - b_l|. \tag{1}$$
Because the $X_n$ are independent, the distribution of
$$S_n = \sum_{j=1}^n \frac{X_j}{s_n}$$
is the convolution of the distributions of the summands: $\mu_{n,1} * \cdots * \mu_{n,n}$, whose characteristic function is
$$\phi_n = \prod_{j=1}^n \tilde{\mu}_{n,j},$$
since the characteristic function of a convolution of measures is the product of the characteristic functions of the measures. Using $\sum_{j=1}^n \tau_{n,j}^2 = 1$ and (1), for $x \in \mathbb{R}$ we have
$$\left|\phi_n(x) - e^{-\frac{x^2}{2}}\right| = \left|\prod_{j=1}^n \tilde{\mu}_{n,j}(x) - \prod_{j=1}^n e^{-\frac{1}{2}\tau_{n,j}^2 x^2}\right| \leq \sum_{j=1}^n \left|\tilde{\mu}_{n,j}(x) - e^{-\frac{1}{2}\tau_{n,j}^2 x^2}\right| = \sum_{j=1}^n \left|\tilde{\mu}_{n,j}(x) - \tilde{\gamma}_{0,\tau_{n,j}^2}(x)\right| = \frac{x^2}{2}\sum_{j=1}^n |\theta_{n,j}(x) - \psi_{n,j}(x)|.$$
Therefore, for $x \in \mathbb{R}$, $\epsilon > 0$, and $\delta = \min\left\{\epsilon, \frac{\epsilon}{|x|}\right\}$,
$$\left|\phi_n(x) - e^{-\frac{x^2}{2}}\right| \leq \frac{x^2}{2}\sum_{j=1}^n \left(\epsilon\tau_{n,j}^2 + 2\int_{|y|\geq\delta} y^2 \, d\mu_{n,j}(y) + \epsilon\tau_{n,j}^2 + 2\int_{|y|\geq\delta} y^2 \, d\gamma_{0,\tau_{n,j}^2}(y)\right) = \epsilon x^2 + x^2 \sum_{j=1}^n \int_{|y|\geq\delta} y^2 \, d\mu_{n,j}(y) + x^2 \sum_{j=1}^n \int_{|y|\geq\delta} y^2 \, d\gamma_{0,\tau_{n,j}^2}(y).$$
We calculate
$$L_n(\delta) = \frac{1}{s_n^2}\sum_{j=1}^n \int_{|y|\geq\delta s_n} y^2 \, d(X_{j*}P)(y) = \frac{1}{s_n^2}\sum_{j=1}^n \int_{|X_j|\geq\delta s_n} X_j^2 \, dP = \sum_{j=1}^n \int_{\left|\frac{X_j}{s_n}\right|\geq\delta} \left(\frac{X_j}{s_n}\right)^2 dP = \sum_{j=1}^n \int_{|y|\geq\delta} y^2 \, d\mu_{n,j}(y).$$
Hence, the fact that the $X_n$ satisfy the Lindeberg condition yields
$$\limsup_{n\to\infty} \left|\phi_n(x) - e^{-\frac{x^2}{2}}\right| \leq \epsilon x^2 + x^2 \limsup_{n\to\infty} \sum_{j=1}^n \int_{|y|\geq\delta} y^2 \, d\gamma_{0,\tau_{n,j}^2}(y). \tag{2}$$
Write
$$\alpha_n = \max_{1\leq j\leq n} \tau_{n,j} = \max_{1\leq j\leq n} \frac{\sigma_j}{s_n}.$$
We calculate
$$\sum_{j=1}^n \int_{|y|\geq\delta} y^2 \, d\gamma_{0,\tau_{n,j}^2}(y) = \sum_{j=1}^n \int_{|y|\geq\delta} y^2 \, \frac{1}{\tau_{n,j}\sqrt{2\pi}} \exp\left(-\frac{y^2}{2\tau_{n,j}^2}\right) dy = \sum_{j=1}^n \tau_{n,j}^2 \int_{|u|\geq\delta/\tau_{n,j}} u^2 \, d\gamma_{0,1}(u) \leq \sum_{j=1}^n \tau_{n,j}^2 \int_{|u|\geq\delta/\alpha_n} u^2 \, d\gamma_{0,1}(u) = \int_{|u|\geq\delta/\alpha_n} u^2 \, d\gamma_{0,1}(u).$$
Because the sequence $X_n$ satisfies the Lindeberg condition, by Lemma 4 it satisfies the Feller condition, which means that $\alpha_n \to 0$ as $n \to \infty$. Because $\alpha_n \to 0$, $\delta/\alpha_n \to \infty$ as $n \to \infty$, hence
$$\int_{|u|\geq\delta/\alpha_n} u^2 \, d\gamma_{0,1}(u) \to 0$$
as $n \to \infty$. Thus we get
$$\sum_{j=1}^n \int_{|y|\geq\delta} y^2 \, d\gamma_{0,\tau_{n,j}^2}(y) \to 0$$
as $n \to \infty$.
Using this with (2) yields
$$\limsup_{n\to\infty} \left|\phi_n(x) - e^{-\frac{x^2}{2}}\right| \leq \epsilon x^2.$$
This is true for all $\epsilon > 0$, so
$$\lim_{n\to\infty} \left|\phi_n(x) - e^{-\frac{x^2}{2}}\right| = 0,$$
namely, $\phi_n$ (the characteristic function of $S_{n*}P$) converges pointwise to $e^{-\frac{x^2}{2}}$. Moreover, $e^{-\frac{x^2}{2}}$ is indeed continuous at $0$, and $e^{-\frac{x^2}{2}} = \tilde{\gamma}_{0,1}(x)$. Therefore, Lévy's continuity theorem (Theorem 3) tells us that $S_{n*}P$ converges narrowly to $\gamma_{0,1}$, which is the claim.
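Theorem 6 applies to independent summands that are not identically distributed. As a numerical illustration (the scales $c_j$ below are an arbitrary choice of ours, not from the text), take $X_j$ uniform on $[-c_j, c_j]$ with slowly varying $c_j$: these are bounded with $s_n \to \infty$, so by the boundedness example in §3 they satisfy the Lyapunov, hence Lindeberg, condition, and the distribution of $S_n$ should be close to $\gamma_{0,1}$:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

n, trials = 500, 50_000
c = 1.0 + 0.5 * np.sin(np.arange(1, n + 1))   # scales c_j of Uniform(-c_j, c_j)
sigma2 = c ** 2 / 3.0                         # Var of Uniform(-c, c) is c^2 / 3
sn = np.sqrt(sigma2.sum())                    # s_n

# trials independent copies of S_n = (X_1 + ... + X_n) / s_n (each X_j has mean 0)
X = rng.uniform(-1.0, 1.0, size=(trials, n)) * c
Sn = X.sum(axis=1) / sn

# Compare the empirical CDF of S_n with the standard normal CDF Phi
Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))
for t in (-1.0, 0.0, 1.0):
    assert abs(np.mean(Sn <= t) - Phi(t)) < 0.01
```

Since narrow convergence to $\gamma_{0,1}$ implies pointwise convergence of distribution functions (the normal distribution function being continuous everywhere), the empirical CDF of $S_n$ agreeing with $\Phi$ up to Monte Carlo error is exactly what the theorem predicts.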