The Lindeberg central limit theorem
Jordan Bell
[email protected]
Department of Mathematics, University of Toronto
May 29, 2015
1  Convergence in distribution
We denote by $\mathcal{P}(\mathbb{R}^d)$ the collection of Borel probability measures on $\mathbb{R}^d$. Unless we say otherwise, we use the narrow topology on $\mathcal{P}(\mathbb{R}^d)$: the coarsest topology such that for each $f \in C_b(\mathbb{R}^d)$, the map
\[
\mu \mapsto \int_{\mathbb{R}^d} f\, d\mu
\]
is continuous $\mathcal{P}(\mathbb{R}^d) \to \mathbb{C}$. Because $\mathbb{R}^d$ is a Polish space, it follows that $\mathcal{P}(\mathbb{R}^d)$ is a Polish space.^1 (In fact, its topology is induced by the Prokhorov metric.^2)
2  Characteristic functions
For $\mu \in \mathcal{P}(\mathbb{R}^d)$, we define its characteristic function $\tilde{\mu} : \mathbb{R}^d \to \mathbb{C}$ by
\[
\tilde{\mu}(u) = \int_{\mathbb{R}^d} e^{iu \cdot x}\, d\mu(x).
\]

Theorem 1. If $\mu \in \mathcal{P}(\mathbb{R})$ has finite $k$th moment, $k \geq 0$, then, writing $\phi = \tilde{\mu}$:

1. $\phi \in C^k(\mathbb{R})$.

2. $\phi^{(k)}(v) = i^k \int_{\mathbb{R}} x^k e^{ivx}\, d\mu(x)$.

3. $\phi^{(k)}$ is uniformly continuous.

4. $|\phi^{(k)}(v)| \leq \int_{\mathbb{R}} |x|^k\, d\mu(x)$.
^1 Charalambos D. Aliprantis and Kim C. Border, Infinite Dimensional Analysis: A Hitchhiker's Guide, third ed., p. 515, Theorem 15.15; http://individual.utoronto.ca/jordanbell/notes/narrow.pdf
^2 Onno van Gaans, Probability measures on metric spaces, http://www.math.leidenuniv.nl/~vangaans/jancol1.pdf; Bert Fristedt and Lawrence Gray, A Modern Approach to Probability Theory, p. 365, Theorem 25.
Proof. For $0 \leq l \leq k$, define $f_l : \mathbb{R} \to \mathbb{C}$ by
\[
f_l(v) = \int_{\mathbb{R}} x^l e^{ivx}\, d\mu(x).
\]
For $h \neq 0$,
\[
\left| x^l e^{ivx} \cdot \frac{e^{ihx} - 1}{h} \right| \leq |x^l \cdot x| = |x|^{l+1},
\]
so by the dominated convergence theorem we have, for $0 \leq l \leq k-1$,
\[
\lim_{h \to 0} \frac{f_l(v+h) - f_l(v)}{h}
= \lim_{h \to 0} \int_{\mathbb{R}} x^l e^{ivx} \frac{e^{ihx} - 1}{h}\, d\mu(x)
= \int_{\mathbb{R}} x^l e^{ivx} \lim_{h \to 0} \frac{e^{ihx} - 1}{h}\, d\mu(x)
= \int_{\mathbb{R}} i x^{l+1} e^{ivx}\, d\mu(x).
\]
That is,
\[
f_l' = i f_{l+1}.
\]
And, by dominated convergence, for $\epsilon > 0$ there is some $\delta > 0$ such that if $|w| < \delta$ then
\[
\int_{\mathbb{R}} |x|^k |e^{iwx} - 1|\, d\mu(x) < \epsilon,
\]
hence if $|v - u| < \delta$ then
\[
|f_k(v) - f_k(u)| = \left| \int_{\mathbb{R}} x^k e^{iux} (e^{i(v-u)x} - 1)\, d\mu(x) \right|
\leq \int_{\mathbb{R}} |x|^k |e^{i(v-u)x} - 1|\, d\mu(x)
< \epsilon,
\]
showing that $f_k$ is uniformly continuous. As well,
\[
|f_k(v)| \leq \int_{\mathbb{R}} |x|^k\, d\mu(x).
\]
But $\phi = f_0$, i.e. $\phi^{(0)} = f_0$, so
\[
\phi^{(1)} = f_0' = i f_1, \qquad \phi^{(2)} = (i f_1)' = i^2 f_2, \qquad \ldots, \qquad \phi^{(k)} = i^k f_k.
\]
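As a quick numerical illustration of item 2, here is a minimal sketch for the Exp(1) distribution, whose $k$th moment is $M_k = k!$ and whose characteristic function is $\phi(v) = 1/(1 - iv)$; the finite-difference step is an arbitrary choice.

```python
# Sketch: check phi^(k)(0) = i^k * M_k (Theorem 1, item 2) for the Exp(1)
# distribution, with moments M_1 = 1, M_2 = 2 and phi(v) = 1 / (1 - i v).
def phi(v):
    return 1.0 / (1.0 - 1j * v)

h = 1e-4                                            # finite-difference step (arbitrary)
d1 = (phi(h) - phi(-h)) / (2 * h)                   # approximates phi'(0)
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2     # approximates phi''(0)

print(d1, 1j * 1.0)         # phi'(0)  should be close to i^1 * M_1 = i
print(d2, (1j) ** 2 * 2.0)  # phi''(0) should be close to i^2 * M_2 = -2
```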
If $\phi \in C^k(\mathbb{R})$, Taylor's theorem tells us that for each $x \in \mathbb{R}$,
\[
\begin{aligned}
\phi(x) &= \sum_{l=0}^{k-1} \frac{\phi^{(l)}(0)}{l!} x^l + \int_0^x \frac{(x-t)^{k-1}}{(k-1)!} \phi^{(k)}(t)\, dt \\
&= \sum_{l=0}^{k} \frac{\phi^{(l)}(0)}{l!} x^l + \int_0^x \frac{(x-t)^{k-1}}{(k-1)!} \left( \phi^{(k)}(t) - \phi^{(k)}(0) \right) dt \\
&= \sum_{l=0}^{k} \frac{\phi^{(l)}(0)}{l!} x^l + R_k(x),
\end{aligned}
\]
and $R_k(x)$ satisfies
\[
|R_k(x)| \leq \sup_{0 \leq u \leq 1} |\phi^{(k)}(ux) - \phi^{(k)}(0)| \cdot \frac{|x|^k}{k!}.
\]
Define $\theta_k : \mathbb{R} \to \mathbb{C}$ by $\theta_k(0) = 0$ and, for $x \neq 0$,
\[
\theta_k(x) = \frac{k!}{x^k} \cdot R_k(x),
\]
with which, for all $x \in \mathbb{R}$,
\[
\phi(x) = \sum_{l=0}^{k} \frac{\phi^{(l)}(0)}{l!} x^l + \frac{1}{k!} \theta_k(x) x^k.
\]
Because $R_k$ is continuous on $\mathbb{R}$, $\theta_k$ is continuous at each $x \neq 0$. Moreover,
\[
|\theta_k(x)| \leq \sup_{0 \leq u \leq 1} |\phi^{(k)}(ux) - \phi^{(k)}(0)|,
\]
and as $\phi^{(k)}$ is continuous it follows that $\theta_k$ is continuous at $0$. Thus $\theta_k$ is continuous on $\mathbb{R}$.
Lemma 2. If $\mu \in \mathcal{P}(\mathbb{R})$ has finite $k$th moment, $k \geq 0$, and for $0 \leq l \leq k$,
\[
M_l = \int_{\mathbb{R}} x^l\, d\mu(x),
\]
then there is a continuous function $\theta : \mathbb{R} \to \mathbb{C}$ for which
\[
\tilde{\mu}(x) = \sum_{l=0}^{k} \frac{i^l M_l}{l!} x^l + \frac{1}{k!} \theta(x) x^k.
\]
The function $\theta$ satisfies
\[
|\theta(x)| \leq \sup_{0 \leq u \leq 1} |\tilde{\mu}^{(k)}(ux) - \tilde{\mu}^{(k)}(0)|.
\]

Proof. From Theorem 1, $\tilde{\mu} \in C^k(\mathbb{R})$ and
\[
\tilde{\mu}^{(l)}(0) = i^l \int_{\mathbb{R}} x^l\, d\mu(x) = i^l M_l.
\]
Thus, from what we worked out above with Taylor's theorem,
\[
\tilde{\mu}(x) = \sum_{l=0}^{k} \frac{i^l M_l}{l!} x^l + \frac{1}{k!} \theta_k(x) x^k,
\]
for which
\[
|\theta_k(x)| \leq \sup_{0 \leq u \leq 1} |\tilde{\mu}^{(k)}(ux) - \tilde{\mu}^{(k)}(0)|.
\]
For $a \in \mathbb{R}$ and $\sigma > 0$, let
\[
p(t, a, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(t-a)^2}{2\sigma^2} \right), \qquad t \in \mathbb{R}.
\]
Let $\gamma_{a,\sigma^2}$ be the measure on $\mathbb{R}$ whose density with respect to Lebesgue measure is $p(\cdot, a, \sigma^2)$. We call $\gamma_{a,\sigma^2}$ a Gaussian measure. We calculate that the first moment of $\gamma_{a,\sigma^2}$ is $a$ and that its variance is $\sigma^2$. We also calculate that
\[
\tilde{\gamma}_{a,\sigma^2}(x) = \exp\left( iax - \frac{1}{2}\sigma^2 x^2 \right).
\]
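As a sanity check of this formula, the following sketch compares a Monte Carlo estimate of $\tilde{\gamma}_{a,\sigma^2}(x)$ with $\exp(iax - \frac{1}{2}\sigma^2 x^2)$; the values of $a$ and $\sigma$, the sample size, and the seed are arbitrary choices.

```python
import numpy as np

# Sketch: Monte Carlo check of the Gaussian characteristic function
# gamma~_{a,sigma^2}(x) = exp(i a x - sigma^2 x^2 / 2).
rng = np.random.default_rng(0)
a, sigma = 1.5, 2.0                                   # arbitrary parameters
samples = rng.normal(loc=a, scale=sigma, size=1_000_000)

for x in [0.3, 1.0, 2.0]:
    empirical = np.mean(np.exp(1j * x * samples))     # estimate of the defining integral
    exact = np.exp(1j * a * x - 0.5 * sigma**2 * x**2)
    print(x, empirical, exact)                        # agree to about 1e-3 with 10^6 samples
```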
Lévy's continuity theorem is the following.^3

Theorem 3 (Lévy's continuity theorem). Let $\mu_n$ be a sequence in $\mathcal{P}(\mathbb{R}^d)$.

1. If $\mu \in \mathcal{P}(\mathbb{R}^d)$ and $\mu_n \to \mu$, then $\tilde{\mu}_n$ converges to $\tilde{\mu}$ pointwise.

2. If there is some function $\phi : \mathbb{R}^d \to \mathbb{C}$ to which $\tilde{\mu}_n$ converges pointwise and $\phi$ is continuous at $0$, then there is some $\mu \in \mathcal{P}(\mathbb{R}^d)$ such that $\phi = \tilde{\mu}$ and such that $\mu_n \to \mu$.

^3 http://individual.utoronto.ca/jordanbell/notes/martingaleCLT.pdf, p. 19, Theorem 15.
3  The Lindeberg condition, the Lyapunov condition, the Feller condition, and asymptotic negligibility

Let $(\Omega, \mathscr{F}, P)$ be a probability space and let $X_n$, $n \geq 1$, be independent $L^2$ random variables. We specify when we impose other hypotheses on them; in particular, we specify if we suppose them to be identically distributed or to belong to $L^p$ for $p > 2$.
For a random variable $X$, write
\[
\sigma(X) = \sqrt{\operatorname{Var}(X)} = \sqrt{E(|X - E(X)|^2)}.
\]
Write
\[
\sigma_n = \sigma(X_n),
\]
and, using that the $X_n$ are independent,
\[
s_n = \sigma\left( \sum_{j=1}^{n} X_j \right) = \left( \sum_{j=1}^{n} \sigma_j^2 \right)^{1/2},
\]
and
\[
\eta_n = E(X_n).
\]
For $n \geq 1$ and $\epsilon > 0$, define
\[
L_n(\epsilon) = \frac{1}{s_n^2} \sum_{j=1}^{n} E\left( (X_j - \eta_j)^2 \, 1_{|X_j - \eta_j| \geq \epsilon s_n} \right)
= \frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|x - \eta_j| \geq \epsilon s_n} (x - \eta_j)^2\, d((X_j)_* P)(x).
\]
We say that the sequence $X_n$ satisfies the Lindeberg condition if for each $\epsilon > 0$,
\[
\lim_{n \to \infty} L_n(\epsilon) = 0.
\]
For example, if the sequence $X_n$ is identically distributed, then $s_n^2 = n\sigma_1^2$, so
\[
L_n(\epsilon) = \frac{1}{n\sigma_1^2} \sum_{j=1}^{n} \int_{|x - \eta_1| \geq \epsilon n^{1/2} \sigma_1} (x - \eta_1)^2\, d((X_1)_* P)(x)
= \frac{1}{\sigma_1^2} \int_{|x - \eta_1| \geq \epsilon n^{1/2} \sigma_1} (x - \eta_1)^2\, d((X_1)_* P)(x).
\]
But if $\mu$ is a Borel probability measure on $\mathbb{R}$ and $f \in L^1(\mu)$ and $K_n$ is a sequence of compact sets that exhaust $\mathbb{R}$, then^4
\[
\int_{\mathbb{R} \setminus K_n} |f|\, d\mu \to 0, \qquad n \to \infty.
\]
Hence $L_n(\epsilon) \to 0$ as $n \to \infty$, showing that $X_n$ satisfies the Lindeberg condition.

^4 V. I. Bogachev, Measure Theory, volume I, p. 125, Proposition 2.6.2.
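The following sketch estimates $L_n(\epsilon)$ by Monte Carlo for i.i.d. Exp(1) variables; the distribution, the value of $\epsilon$, and the sample size are arbitrary choices, and the printed values shrink to $0$ as $n$ grows.

```python
import numpy as np

# Sketch: Monte Carlo estimate of L_n(eps) for i.i.d. Exp(1) variables,
# for which eta_1 = 1, sigma_1^2 = 1, and s_n^2 = n.  For i.i.d. variables the
# n identical summands cancel one factor of n, leaving
#     L_n(eps) = E((X_1 - eta_1)^2 ; |X_1 - eta_1| >= eps * sqrt(n)) / sigma_1^2.
rng = np.random.default_rng(1)
eps = 0.5                                        # arbitrary choice of eps
x = rng.exponential(scale=1.0, size=1_000_000)   # samples of X_1
eta, sigma2 = 1.0, 1.0

for n in [10, 100, 1000, 10000]:
    s_n = np.sqrt(n * sigma2)
    tail = (x - eta) ** 2 * (np.abs(x - eta) >= eps * s_n)
    print(n, tail.mean() / sigma2)               # tends to 0, as required
```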
We say that the sequence $X_n$ satisfies the Lyapunov condition if there is some $\delta > 0$ such that the $X_n$ are $L^{2+\delta}$ and
\[
\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{j=1}^{n} E(|X_j - \eta_j|^{2+\delta}) = 0.
\]
In this case, for $\epsilon > 0$, $|x - \eta_j| \geq \epsilon s_n$ implies $|x - \eta_j|^{2+\delta} \geq |x - \eta_j|^2 (\epsilon s_n)^\delta$, and hence
\[
\begin{aligned}
L_n(\epsilon) &\leq \frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|x - \eta_j| \geq \epsilon s_n} \frac{|x - \eta_j|^{2+\delta}}{(\epsilon s_n)^\delta}\, d((X_j)_* P)(x) \\
&= \frac{1}{\epsilon^\delta s_n^{2+\delta}} \sum_{j=1}^{n} \int_{|x - \eta_j| \geq \epsilon s_n} |x - \eta_j|^{2+\delta}\, d((X_j)_* P)(x) \\
&= \frac{1}{\epsilon^\delta s_n^{2+\delta}} \sum_{j=1}^{n} \int_{|X_j - \eta_j| \geq \epsilon s_n} |X_j - \eta_j|^{2+\delta}\, dP \\
&\leq \frac{1}{\epsilon^\delta s_n^{2+\delta}} \sum_{j=1}^{n} E(|X_j - \eta_j|^{2+\delta}) \to 0.
\end{aligned}
\]
This is true for each $\epsilon > 0$, showing that if $X_n$ satisfies the Lyapunov condition then it satisfies the Lindeberg condition.
For example, if the $X_n$ are identically distributed and $L^{2+\delta}$, then
\[
\frac{1}{s_n^{2+\delta}} \sum_{j=1}^{n} E(|X_j - \eta_j|^{2+\delta}) = \frac{1}{n^{\delta/2} \sigma_1^{2+\delta}} E(|X_1 - \eta_1|^{2+\delta}) \to 0,
\]
showing that $X_n$ satisfies the Lyapunov condition.
Another example: suppose that the sequence $X_n$ is bounded by $M$ almost surely and that $s_n \to \infty$. $|X_n| \leq M$ almost surely implies that
\[
|\eta_n| = |E(X_n)| \leq E(|X_n|) \leq E(M) = M.
\]
Therefore $|X_n - \eta_n| \leq |X_n| + |\eta_n| \leq 2M$ almost surely. Let $\delta > 0$. Then
\[
\frac{1}{s_n^{2+\delta}} \sum_{j=1}^{n} E(|X_j - \eta_j|^{2+\delta})
\leq \frac{1}{s_n^{2+\delta}} \sum_{j=1}^{n} E(|X_j - \eta_j|^2)(2M)^\delta
= \frac{(2M)^\delta}{s_n^\delta} \to 0,
\]
showing that $X_n$ satisfies the Lyapunov condition.
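The following sketch evaluates the Lyapunov ratio for one such bounded, non-identically distributed family, with $X_j$ uniform on $[-c_j, c_j]$, $c_j = 1 + 1/j$, and $\delta = 1$; these choices are arbitrary and only serve to show the decay $(2M)^\delta / s_n^\delta$.

```python
import numpy as np

# Sketch: the Lyapunov ratio for bounded summands.  Take X_j uniform on
# [-c_j, c_j] with c_j = 1 + 1/j (so |X_j| <= 2, eta_j = 0, and the X_j are not
# identically distributed), and delta = 1.  Then s_n^2 = sum c_j^2 / 3 -> infinity.
delta = 1.0

def lyapunov_ratio(n):
    c = 1.0 + 1.0 / np.arange(1, n + 1)
    s_n = np.sqrt((c ** 2 / 3.0).sum())                  # s_n for Uniform[-c_j, c_j]
    moments = c ** (2 + delta) / (3 + delta)             # E|X_j|^{2+delta}
    return moments.sum() / s_n ** (2 + delta)

for n in [10, 100, 1000, 10000]:
    print(n, lyapunov_ratio(n))                          # decays roughly like n^{-1/2}
```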
We say that a sequence of random variables $X_n$ satisfies the Feller condition when
\[
\lim_{n \to \infty} \max_{1 \leq j \leq n} \frac{\sigma_j}{s_n} = 0,
\]
where $\sigma_j = \sigma(X_j) = \sqrt{\operatorname{Var}(X_j)}$ and
\[
s_n = \left( \sum_{j=1}^{n} \sigma_j^2 \right)^{1/2}.
\]
We prove that if a sequence satisfies the Lindeberg condition then it satisfies the Feller condition.^5

^5 Heinz Bauer, Probability Theory, p. 235, Lemma 28.2.
Lemma 4. If a sequence of random variables $X_n$ satisfies the Lindeberg condition, then it satisfies the Feller condition.

Proof. Let $\epsilon > 0$. For $n \geq 1$ and $1 \leq k \leq n$, we calculate
\[
\begin{aligned}
\sigma_k^2 &= \int_{\mathbb{R}} (x - \eta_k)^2\, d((X_k)_* P)(x) \\
&= \int_{|x - \eta_k| < \epsilon s_n} (x - \eta_k)^2\, d((X_k)_* P)(x) + \int_{|x - \eta_k| \geq \epsilon s_n} (x - \eta_k)^2\, d((X_k)_* P)(x) \\
&\leq (\epsilon s_n)^2 + \sum_{j=1}^{n} \int_{|x - \eta_j| \geq \epsilon s_n} (x - \eta_j)^2\, d((X_j)_* P)(x) \\
&= \epsilon^2 s_n^2 + s_n^2 L_n(\epsilon).
\end{aligned}
\]
Hence
\[
\max_{1 \leq k \leq n} \left( \frac{\sigma_k}{s_n} \right)^2 \leq \epsilon^2 + L_n(\epsilon),
\]
and so, because the $X_n$ satisfy the Lindeberg condition,
\[
\limsup_{n \to \infty} \max_{1 \leq k \leq n} \left( \frac{\sigma_k}{s_n} \right)^2 \leq \epsilon^2.
\]
This is true for all $\epsilon > 0$, which yields
\[
\lim_{n \to \infty} \max_{1 \leq k \leq n} \left( \frac{\sigma_k}{s_n} \right)^2 = 0,
\]
namely, that the $X_n$ satisfy the Feller condition.
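For the same kind of family used above, with $X_j$ uniform on $[-c_j, c_j]$ and $c_j = 1 + 1/j$ (an arbitrary choice), the following sketch shows the Feller ratio $\max_{1 \leq j \leq n} \sigma_j / s_n$ tending to $0$.

```python
import numpy as np

# Sketch: the Feller ratio max_j sigma_j / s_n for X_j uniform on [-c_j, c_j],
# c_j = 1 + 1/j, an arbitrary example satisfying the Lindeberg condition.
for n in [10, 100, 1000, 10000]:
    c = 1.0 + 1.0 / np.arange(1, n + 1)
    sigma = c / np.sqrt(3.0)                 # standard deviation of Uniform[-c_j, c_j]
    s_n = np.sqrt((sigma ** 2).sum())
    print(n, sigma.max() / s_n)              # tends to 0: the Feller condition
```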
We do not use the following idea of an asymptotically negligible family of random variables elsewhere, and merely take this as an excuse to write out what it means. A family of random variables $X_{n,j}$, $n \geq 1$, $1 \leq j \leq k_n$, is called asymptotically negligible^6 if for each $\epsilon > 0$,
\[
\lim_{n \to \infty} \max_{1 \leq j \leq k_n} P(|X_{n,j}| \geq \epsilon) = 0.
\]
A sequence of random variables $X_n$ converges in probability to $0$ if and only if it is asymptotically negligible as a family with $k_n = 1$ for each $n$.

^6 Heinz Bauer, Probability Theory, p. 225, §27.2.
For example, suppose that the $X_{n,j}$ are $L^2$ random variables each with $E(X_{n,j}) = 0$ and that they satisfy
\[
\lim_{n \to \infty} \max_{1 \leq j \leq k_n} \operatorname{Var}(X_{n,j}) = 0.
\]
For $\epsilon > 0$, by Chebyshev's inequality,
\[
P(|X_{n,j}| \geq \epsilon) \leq \frac{1}{\epsilon^2} E(|X_{n,j}|^2) = \frac{1}{\epsilon^2} \operatorname{Var}(X_{n,j}),
\]
whence
\[
\lim_{n \to \infty} \max_{1 \leq j \leq k_n} P(|X_{n,j}| \geq \epsilon) \leq \limsup_{n \to \infty} \max_{1 \leq j \leq k_n} \frac{1}{\epsilon^2} \operatorname{Var}(X_{n,j}) = 0,
\]
and so the random variables $X_{n,j}$ are asymptotically negligible.
Another example: suppose that the random variables $X_{n,j}$ are identically distributed, with $\mu = (X_{n,j})_* P$. For $\epsilon > 0$,
\[
P\left( \left| \frac{X_{n,j}}{n} \right| \geq \epsilon \right) = P(|X_{n,j}| \geq \epsilon n) = \mu(A_n),
\]
where $A_n = \{x \in \mathbb{R} : |x| \geq \epsilon n\}$. As $A_n \downarrow \emptyset$, $\lim_{n \to \infty} \mu(A_n) = 0$. Hence the random variables $\frac{X_{n,j}}{n}$ are asymptotically negligible.
The following is a statement about the characteristic functions of an asymptotically negligible family of random variables.^7

Lemma 5. Suppose that a family $X_{n,j}$, $n \geq 1$, $1 \leq j \leq k_n$, of random variables is asymptotically negligible, and write $\mu_{n,j} = (X_{n,j})_* P$ and $\phi_{n,j} = \tilde{\mu}_{n,j}$. For each $x \in \mathbb{R}$,
\[
\lim_{n \to \infty} \max_{1 \leq j \leq k_n} |\phi_{n,j}(x) - 1| = 0.
\]

Proof. For any real $t$, $|e^{it} - 1| \leq |t|$. For $x \in \mathbb{R}$, $\epsilon > 0$, $n \geq 1$, and $1 \leq j \leq k_n$,
\[
\begin{aligned}
|\phi_{n,j}(x) - 1| &= \left| \int_{\mathbb{R}} (e^{ixy} - 1)\, d\mu_{n,j}(y) \right| \\
&\leq \int_{|y| < \epsilon} |e^{ixy} - 1|\, d\mu_{n,j}(y) + \int_{|y| \geq \epsilon} |e^{ixy} - 1|\, d\mu_{n,j}(y) \\
&\leq \int_{|y| < \epsilon} |xy|\, d\mu_{n,j}(y) + \int_{|y| \geq \epsilon} 2\, d\mu_{n,j}(y) \\
&\leq \epsilon |x| + 2 P(|X_{n,j}| \geq \epsilon).
\end{aligned}
\]
Hence
\[
\max_{1 \leq j \leq k_n} |\phi_{n,j}(x) - 1| \leq \epsilon |x| + 2 \max_{1 \leq j \leq k_n} P(|X_{n,j}| \geq \epsilon).
\]
Using that the family $X_{n,j}$ is asymptotically negligible,
\[
\limsup_{n \to \infty} \max_{1 \leq j \leq k_n} |\phi_{n,j}(x) - 1| \leq \epsilon |x|.
\]
But this is true for all $\epsilon > 0$, so
\[
\limsup_{n \to \infty} \max_{1 \leq j \leq k_n} |\phi_{n,j}(x) - 1| = 0,
\]
proving the claim.

^7 Heinz Bauer, Probability Theory, p. 227, Lemma 27.3.
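As a concrete instance of Lemma 5, the following sketch takes the asymptotically negligible family $X_{n,j} = Z_j/\sqrt{n}$, $1 \leq j \leq n$, with $Z_j$ standard normal (an arbitrary choice); then $\phi_{n,j}(x) = e^{-x^2/(2n)}$, and $\max_j |\phi_{n,j}(x) - 1| \to 0$.

```python
import numpy as np

# Sketch: for the asymptotically negligible family X_{n,j} = Z_j / sqrt(n) with
# Z_j standard normal, each phi_{n,j}(x) = exp(-x^2 / (2n)), independent of j,
# so max_j |phi_{n,j}(x) - 1| -> 0 as n -> infinity (Lemma 5).
x = 2.0                                      # an arbitrary fixed argument
for n in [1, 10, 100, 1000, 10000]:
    phi = np.exp(-x ** 2 / (2.0 * n))        # characteristic function of N(0, 1/n) at x
    print(n, abs(phi - 1.0))                 # decreases to 0
```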
4  The Lindeberg central limit theorem

We now prove the Lindeberg central limit theorem.^8

^8 Heinz Bauer, Probability Theory, p. 235, Theorem 28.3.
Theorem 6 (Lindeberg central limit theorem). If $X_n$ is a sequence of independent $L^2$ random variables that satisfy the Lindeberg condition, then
\[
(S_n)_* P \to \gamma_{0,1},
\]
where
\[
S_n = \frac{1}{s_n} \sum_{j=1}^{n} (X_j - \eta_j) = \frac{\sum_{j=1}^{n} (X_j - E(X_j))}{\sigma(X_1 + \cdots + X_n)}.
\]
Proof. The random variables $Y_n = X_n - E(X_n)$ are independent $L^2$ random variables that satisfy the Lindeberg condition, and $\sigma(Y_n) = \sigma(X_n)$. Proving the claim for the sequence $Y_n$ will prove the claim for the sequence $X_n$, and thus it suffices to prove the claim when $E(X_n) = 0$, i.e. $\eta_n = 0$.
For $n \geq 1$ and $1 \leq j \leq n$, let
\[
\mu_{n,j} = \left( \frac{X_j}{s_n} \right)_* P \qquad \text{and} \qquad \tau_{n,j} = \frac{\sigma_j}{s_n}.
\]
The first moment of $\mu_{n,j}$ is
\[
\int_{\mathbb{R}} x\, d\left( \frac{X_j}{s_n} \right)_* P(x) = \int_{\Omega} \frac{X_j}{s_n}\, dP = \frac{1}{s_n} E(X_j) = 0,
\]
and the second moment of $\mu_{n,j}$ is
\[
\int_{\mathbb{R}} x^2\, d\left( \frac{X_j}{s_n} \right)_* P(x) = \int_{\Omega} \left( \frac{X_j}{s_n} \right)^2 dP = \frac{1}{s_n^2} E(X_j^2) = \frac{\sigma_j^2}{s_n^2} = \tau_{n,j}^2,
\]
for which
\[
\sum_{j=1}^{n} \tau_{n,j}^2 = \frac{1}{s_n^2} \sum_{j=1}^{n} \sigma_j^2 = 1.
\]
For $\mu \in \mathcal{P}(\mathbb{R})$ with first moment $\int_{\mathbb{R}} x\, d\mu(x) = 0$ and second moment $\int_{\mathbb{R}} x^2\, d\mu(x) = \sigma^2 < \infty$, Lemma 2 tells us that
\[
\tilde{\mu}(x) = M_0 + i M_1 x - \frac{M_2}{2} x^2 + \frac{1}{2} \theta_2(x) x^2 = 1 - \frac{\sigma^2}{2} x^2 + \frac{1}{2} \theta(x) x^2,
\]
with
\[
|\theta(x)| \leq \sup_{0 \leq u \leq 1} |\tilde{\mu}''(ux) - \tilde{\mu}''(0)|.
\]
But by Theorem 1,
\[
\tilde{\mu}''(ux) = - \int_{\mathbb{R}} y^2 e^{iuxy}\, d\mu(y),
\]
so
\[
|\theta(x)| \leq \sup_{0 \leq u \leq 1} \left| \int_{\mathbb{R}} y^2 (-e^{iuxy} + 1)\, d\mu(y) \right|
\leq \sup_{0 \leq u \leq 1} \int_{\mathbb{R}} y^2 |e^{iuxy} - 1|\, d\mu(y).
\]
For $0 \leq u \leq 1$, $|e^{iuxy} - 1| \leq |uxy| \leq |xy|$, so for $x \in \mathbb{R}$ and $\epsilon > 0$, with $\delta = \min\left\{ \epsilon, \frac{\epsilon}{|x|} \right\}$, when $|y| < \delta$ and $0 \leq u \leq 1$ we have $|e^{iuxy} - 1| < \epsilon$. Thus
\[
\begin{aligned}
|\theta(x)| &\leq \sup_{0 \leq u \leq 1} \int_{|y| < \delta} y^2 |e^{iuxy} - 1|\, d\mu(y) + \sup_{0 \leq u \leq 1} \int_{|y| \geq \delta} y^2 |e^{iuxy} - 1|\, d\mu(y) \\
&\leq \epsilon \int_{|y| < \delta} y^2\, d\mu(y) + 2 \int_{|y| \geq \delta} y^2\, d\mu(y) \\
&\leq \epsilon \sigma^2 + 2 \int_{|y| \geq \delta} y^2\, d\mu(y).
\end{aligned}
\]
Let $x \in \mathbb{R}$ and $\epsilon > 0$, and take $\delta = \min\left\{ \epsilon, \frac{\epsilon}{|x|} \right\}$. On the one hand, for $n \geq 1$ and $1 \leq j \leq n$, because the first moment of $\mu_{n,j}$ is $0$ and its second moment is $\tau_{n,j}^2$,
\[
\tilde{\mu}_{n,j}(x) = 1 - \frac{\tau_{n,j}^2}{2} x^2 + \frac{1}{2} \theta_{n,j}(x) x^2,
\]
with, from the above,
\[
|\theta_{n,j}(x)| \leq \epsilon \tau_{n,j}^2 + 2 \int_{|y| \geq \delta} y^2\, d\mu_{n,j}(y).
\]
On the other hand, the first moment of the Gaussian measure $\gamma_{0,\tau_{n,j}^2}$ is $0$ and its second moment is $\tau_{n,j}^2$. Its characteristic function is
\[
\tilde{\gamma}_{0,\tau_{n,j}^2}(x) = \exp\left( -\frac{\tau_{n,j}^2}{2} x^2 \right) = 1 - \frac{\tau_{n,j}^2}{2} x^2 + \frac{1}{2} \psi_{n,j}(x) x^2,
\]
with, from the above,
\[
|\psi_{n,j}(x)| \leq \epsilon \tau_{n,j}^2 + 2 \int_{|y| \geq \delta} y^2\, d\gamma_{0,\tau_{n,j}^2}(y).
\]
In particular, for all $x \in \mathbb{R}$,
\[
\tilde{\mu}_{n,j}(x) - \tilde{\gamma}_{0,\tau_{n,j}^2}(x) = \frac{x^2}{2} \left( \theta_{n,j}(x) - \psi_{n,j}(x) \right).
\]
For $k \geq 1$ and for $a_l, b_l \in \mathbb{C}$, $1 \leq l \leq k$,
\[
\prod_{l=1}^{k} a_l - \prod_{l=1}^{k} b_l = \sum_{l=1}^{k} b_1 \cdots b_{l-1} (a_l - b_l) a_{l+1} \cdots a_k.
\]
If further $|a_l| \leq 1$ and $|b_l| \leq 1$, then
\[
\left| \prod_{l=1}^{k} a_l - \prod_{l=1}^{k} b_l \right| \leq \sum_{l=1}^{k} |a_l - b_l|. \tag{1}
\]
Because the $X_n$ are independent, the distribution of
\[
S_n = \sum_{j=1}^{n} \frac{X_j}{s_n}
\]
is the convolution of the distributions of the summands,
\[
\mu_{n,1} * \cdots * \mu_{n,n},
\]
whose characteristic function is
\[
\phi_n = \prod_{j=1}^{n} \tilde{\mu}_{n,j},
\]
since the characteristic function of a convolution of measures is the product of the characteristic functions of the measures. Using $\sum_{j=1}^{n} \tau_{n,j}^2 = 1$ and (1), for $x \in \mathbb{R}$ we have
\[
\begin{aligned}
\left| \phi_n(x) - e^{-\frac{x^2}{2}} \right| &= \left| \prod_{j=1}^{n} \tilde{\mu}_{n,j}(x) - \prod_{j=1}^{n} e^{-\frac{1}{2}\tau_{n,j}^2 x^2} \right| \\
&\leq \sum_{j=1}^{n} \left| \tilde{\mu}_{n,j}(x) - e^{-\frac{1}{2}\tau_{n,j}^2 x^2} \right| \\
&= \sum_{j=1}^{n} \left| \tilde{\mu}_{n,j}(x) - \tilde{\gamma}_{0,\tau_{n,j}^2}(x) \right| \\
&= \frac{x^2}{2} \sum_{j=1}^{n} |\theta_{n,j}(x) - \psi_{n,j}(x)|.
\end{aligned}
\]
Therefore, for $x \in \mathbb{R}$, $\epsilon > 0$, and $\delta = \min\left\{ \epsilon, \frac{\epsilon}{|x|} \right\}$,
\[
\begin{aligned}
\left| \phi_n(x) - e^{-\frac{x^2}{2}} \right| &\leq \frac{x^2}{2} \sum_{j=1}^{n} \left( \epsilon \tau_{n,j}^2 + 2 \int_{|y| \geq \delta} y^2\, d\mu_{n,j}(y) + \epsilon \tau_{n,j}^2 + 2 \int_{|y| \geq \delta} y^2\, d\gamma_{0,\tau_{n,j}^2}(y) \right) \\
&= \epsilon x^2 + x^2 \sum_{j=1}^{n} \int_{|y| \geq \delta} y^2\, d\mu_{n,j}(y) + x^2 \sum_{j=1}^{n} \int_{|y| \geq \delta} y^2\, d\gamma_{0,\tau_{n,j}^2}(y).
\end{aligned}
\]
We calculate
\[
\begin{aligned}
L_n(\delta) &= \frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|y| \geq \delta s_n} y^2\, d((X_j)_* P)(y) \\
&= \frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|X_j| \geq \delta s_n} X_j^2\, dP \\
&= \sum_{j=1}^{n} \int_{\left|\frac{X_j}{s_n}\right| \geq \delta} \left( \frac{X_j}{s_n} \right)^2 dP \\
&= \sum_{j=1}^{n} \int_{|y| \geq \delta} y^2\, d\left( \frac{X_j}{s_n} \right)_* P(y) \\
&= \sum_{j=1}^{n} \int_{|y| \geq \delta} y^2\, d\mu_{n,j}(y).
\end{aligned}
\]
Hence, the fact that the $X_n$ satisfy the Lindeberg condition yields
\[
\limsup_{n \to \infty} \left| \phi_n(x) - e^{-\frac{x^2}{2}} \right| \leq \epsilon x^2 + x^2 \limsup_{n \to \infty} \sum_{j=1}^{n} \int_{|y| \geq \delta} y^2\, d\gamma_{0,\tau_{n,j}^2}(y). \tag{2}
\]
Write
\[
\alpha_n = \max_{1 \leq j \leq n} \tau_{n,j} = \max_{1 \leq j \leq n} \frac{\sigma_j}{s_n}.
\]
We calculate
\[
\begin{aligned}
\sum_{j=1}^{n} \int_{|y| \geq \delta} y^2\, d\gamma_{0,\tau_{n,j}^2}(y) &= \sum_{j=1}^{n} \int_{|y| \geq \delta} y^2 \frac{1}{\tau_{n,j}\sqrt{2\pi}} \exp\left( -\frac{y^2}{2\tau_{n,j}^2} \right) dy \\
&= \sum_{j=1}^{n} \tau_{n,j}^2 \int_{|u| \geq \delta/\tau_{n,j}} u^2\, d\gamma_{0,1}(u) \\
&\leq \sum_{j=1}^{n} \tau_{n,j}^2 \int_{|u| \geq \delta/\alpha_n} u^2\, d\gamma_{0,1}(u) \\
&= \int_{|u| \geq \delta/\alpha_n} u^2\, d\gamma_{0,1}(u).
\end{aligned}
\]
Because the sequence $X_n$ satisfies the Lindeberg condition, by Lemma 4 it satisfies the Feller condition, which means that $\alpha_n \to 0$ as $n \to \infty$. Because $\alpha_n \to 0$ as $n \to \infty$, $\delta/\alpha_n \to \infty$ as $n \to \infty$, hence
\[
\int_{|u| \geq \delta/\alpha_n} u^2\, d\gamma_{0,1}(u) \to 0
\]
as $n \to \infty$. Thus we get
\[
\sum_{j=1}^{n} \int_{|y| \geq \delta} y^2\, d\gamma_{0,\tau_{n,j}^2}(y) \to 0
\]
as $n \to \infty$. Using this with (2) yields
\[
\limsup_{n \to \infty} \left| \phi_n(x) - e^{-\frac{x^2}{2}} \right| \leq \epsilon x^2.
\]
This is true for all $\epsilon > 0$, so
\[
\lim_{n \to \infty} \left| \phi_n(x) - e^{-\frac{x^2}{2}} \right| = 0,
\]
namely, $\phi_n$ (the characteristic function of $(S_n)_* P$) converges pointwise to $e^{-\frac{x^2}{2}}$. Moreover, $e^{-\frac{x^2}{2}}$ is indeed continuous at $0$, and $e^{-\frac{x^2}{2}} = \tilde{\gamma}_{0,1}(x)$. Therefore, Lévy's continuity theorem (Theorem 3) tells us that $(S_n)_* P$ converges narrowly to $\gamma_{0,1}$, which is the claim.
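As a numerical illustration of the theorem, the following simulation sketch uses independent summands $X_j$ uniform on $[-c_j, c_j]$ with $c_j = 1 + 1/j$ (an arbitrary choice of bounded, non-identically distributed variables with $s_n \to \infty$, hence satisfying the Lyapunov and thus the Lindeberg condition), and compares the empirical characteristic function of $S_n$ with $e^{-x^2/2}$.

```python
import numpy as np

# Simulation sketch of the Lindeberg CLT for independent, non-identically
# distributed summands X_j ~ Uniform[-c_j, c_j], c_j = 1 + 1/j (eta_j = 0).
rng = np.random.default_rng(2)
n, trials = 200, 20_000                                  # arbitrary sizes
c = 1.0 + 1.0 / np.arange(1, n + 1)
s_n = np.sqrt((c ** 2 / 3.0).sum())                      # standard deviation of X_1 + ... + X_n

X = rng.uniform(-c, c, size=(trials, n))                 # each row: one realization of X_1, ..., X_n
S = X.sum(axis=1) / s_n                                  # the normalized sums S_n

for x in [0.5, 1.0, 2.0]:
    empirical = np.mean(np.exp(1j * x * S))              # empirical characteristic function of S_n
    print(x, empirical, np.exp(-x ** 2 / 2.0))           # close to the N(0,1) value e^{-x^2/2}
```

With these sizes the printed pairs typically agree to within about $10^{-2}$, which is the Monte Carlo error of order $1/\sqrt{\text{trials}}$.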