Chapter 5
Weak convergence
We will see later that if the $X_i$ are i.i.d. with mean zero and variance one, then $S_n/\sqrt{n}$ converges in the sense
$$P(S_n/\sqrt{n} \in [a,b]) \to P(Z \in [a,b]),$$
where $Z$ is a standard normal. If $S_n/\sqrt{n}$ converged in probability or almost surely, then by the Kolmogorov zero-one law it would converge to a constant, contradicting the above. We want to generalize the above type of convergence.
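As a purely numerical illustration (not part of the text), one can estimate $P(S_n/\sqrt{n} \in [a,b])$ by Monte Carlo and compare it with the normal probability; the sketch below is hypothetical and its function names are invented.

```python
# A hypothetical Monte Carlo sketch (names invented for illustration):
# estimate P(S_n / sqrt(n) in [a, b]) for i.i.d. X_i with mean 0, variance 1,
# and compare with P(Z in [a, b]) for a standard normal Z.
import math
import random

def empirical_clt_probability(n, a, b, trials=10000, seed=0):
    """Empirical P(S_n / sqrt(n) in [a, b]), with X_i uniform on
    [-sqrt(3), sqrt(3)], which has mean 0 and variance 1."""
    rng = random.Random(seed)
    r = math.sqrt(3.0)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(-r, r) for _ in range(n))
        if a <= s / math.sqrt(n) <= b:
            hits += 1
    return hits / trials

def normal_probability(a, b):
    """P(Z in [a, b]) for standard normal Z, via the error function."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return Phi(b) - Phi(a)

approx = empirical_clt_probability(200, -1.0, 1.0)
exact = normal_probability(-1.0, 1.0)  # about 0.6827
```

For moderate $n$ the empirical probability is already close to the normal one, even though, as noted above, $S_n/\sqrt{n}$ does not converge in probability to anything.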
We say $F_n$ converges weakly to $F$ if $F_n(x) \to F(x)$ for all $x$ at which $F$ is continuous. Here $F_n$ and $F$ are distribution functions. We say $X_n$ converges weakly to $X$ if $F_{X_n}$ converges weakly to $F_X$. We sometimes say $X_n$ converges in distribution or converges in law to $X$. Probabilities $\mu_n$ converge weakly if their corresponding distribution functions converge, that is, if $F_{\mu_n}(x) = \mu_n((-\infty, x])$ converges weakly.
An example that illustrates why we restrict the convergence to continuity points of $F$ is the following. Let $X_n = 1/n$ with probability one, and $X = 0$ with probability one. $F_{X_n}(x)$ is 0 if $x < 1/n$ and 1 otherwise. $F_{X_n}(x)$ converges to $F_X(x)$ for all $x$ except $x = 0$.
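The distribution functions in this example are simple enough to write down directly; here is a small sketch (the helper names are ours, not the text's):

```python
# Distribution functions for the example above: X_n = 1/n a.s. and X = 0 a.s.
# (function names invented for this sketch).

def F_Xn(x, n):
    """F_{X_n}(x) = P(X_n <= x) when X_n = 1/n with probability one."""
    return 1.0 if x >= 1.0 / n else 0.0

def F_X(x):
    """F_X(x) = P(X <= x) when X = 0 with probability one."""
    return 1.0 if x >= 0.0 else 0.0

# At the discontinuity x = 0: F_Xn(0, n) = 0 for every n, yet F_X(0) = 1,
# so pointwise convergence fails there.  At any x != 0 the limit matches F_X(x).
```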
Proposition 5.1 $X_n$ converges weakly to $X$ if and only if $E g(X_n) \to E g(X)$ for all $g$ bounded and continuous.
The idea that $E g(X_n)$ converges to $E g(X)$ for all $g$ bounded and continuous makes sense for any metric space and is used as a definition of weak convergence for $X_n$ taking values in general metric spaces.
Proof. First suppose $E g(X_n)$ converges to $E g(X)$. Let $x$ be a continuity point of $F$, let $\varepsilon > 0$, and choose $\delta$ such that $|F(y) - F(x)| < \varepsilon$ if $|y - x| < \delta$. Choose $g$ continuous such that $g$ is one on $(-\infty, x]$, takes values between 0 and 1, and is 0 on $[x + \delta, \infty)$. Then
$$F_{X_n}(x) \le E g(X_n) \to E g(X) \le F_X(x + \delta) \le F(x) + \varepsilon.$$
Similarly, if $h$ is a continuous function taking values between 0 and 1 that is 1 on $(-\infty, x - \delta]$ and 0 on $[x, \infty)$, then
$$F_{X_n}(x) \ge E h(X_n) \to E h(X) \ge F_X(x - \delta) \ge F(x) - \varepsilon.$$
Since $\varepsilon$ is arbitrary, $F_{X_n}(x) \to F_X(x)$.
Now suppose Xn converges weakly to X. We start by making some observations. First, if we have a distribution function, it is increasing and so
the number of points at which it has a discontinuity is at most countable.
Second, if g is a continuous function on a closed bounded interval, it can be
approximated uniformly on the interval by step functions. Using the uniform
continuity of g on the interval, we may even choose the step function so that
the places where it jumps are not in some pre-specified countable set. Our
third observation is that
$$P(X < x) = \lim_{k\to\infty} P(X \le x - 1/k) = \lim_{y \uparrow x} F_X(y);$$
thus if $F_X$ is continuous at $x$, then $P(X = x) = F_X(x) - P(X < x) = 0$.
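The third observation says that the jump of $F_X$ at $x$ equals $P(X = x)$. A quick numerical sketch with a simple discrete $X$ (the random variable and helper names are ours):

```python
# Hypothetical check of the third observation: for X with
# P(X = 0) = P(X = 1) = 1/2, the jump F_X(x) - F_X(x-) equals P(X = x):
# it is 1/2 at x = 1, and 0 at a continuity point such as x = 0.5.

def F(x):
    """Distribution function of X with P(X = 0) = P(X = 1) = 1/2."""
    if x < 0.0:
        return 0.0
    if x < 1.0:
        return 0.5
    return 1.0

def jump(F, x, h=1e-9):
    """Approximate F(x) - F(x-), i.e. P(X = x)."""
    return F(x) - F(x - h)

jump_at_1 = jump(F, 1.0)     # 0.5 = P(X = 1)
jump_at_half = jump(F, 0.5)  # 0.0, a continuity point
```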
Suppose $g$ is bounded and continuous, and we want to prove that $E g(X_n) \to E g(X)$. By multiplying by a constant, we may suppose that $|g|$ is bounded by 1. Let $\varepsilon > 0$ and choose $M$ such that $F_X(M) > 1 - \varepsilon$ and $F_X(-M) < \varepsilon$ and so that $M$ and $-M$ are continuity points of $F_X$. We see that
$$P(X_n \le -M) = F_{X_n}(-M) \to F_X(-M) < \varepsilon,$$
and so for large enough $n$, we have $P(X_n \le -M) \le 2\varepsilon$. Similarly, for large enough $n$, $P(X_n > M) \le 2\varepsilon$. Therefore for large enough $n$, $E g(X_n)$ differs from $E (g 1_{[-M,M]})(X_n)$ by at most $4\varepsilon$. Also, $E g(X)$ differs from $E (g 1_{[-M,M]})(X)$ by at most $2\varepsilon$.
Let $h$ be a step function such that $\sup_{|x| \le M} |h(x) - g(x)| < \varepsilon$ and $h$ is 0 outside of $[-M, M]$. We choose $h$ so that the places where $h$ jumps are continuity points of $F$ and of all the $F_n$. Then $E (g 1_{[-M,M]})(X_n)$ differs from $E h(X_n)$ by at most $\varepsilon$, and the same when $X_n$ is replaced by $X$.
If we show $E h(X_n) \to E h(X)$, then
$$\limsup_{n\to\infty} |E g(X_n) - E g(X)| \le 8\varepsilon,$$
and since $\varepsilon$ is arbitrary, we will be done.
$h$ is of the form $\sum_{i=1}^m c_i 1_{I_i}$, where each $I_i$ is an interval, so by linearity it is enough to show
$$E 1_I(X_n) \to E 1_I(X)$$
when $I$ is an interval whose endpoints are continuity points of all the $F_n$ and of $F$. If the endpoints of $I$ are $a < b$, then by our third observation above, $P(X_n = a) = 0$, and the same when $X_n$ is replaced by $X$ and when $a$ is replaced by $b$. We then have
$$E 1_I(X_n) = P(a < X_n \le b) = F_{X_n}(b) - F_{X_n}(a) \to F_X(b) - F_X(a) = P(a < X \le b) = E 1_I(X),$$
as required.
Let us examine the relationship between weak convergence and convergence in probability. The example of $S_n/\sqrt{n}$ shows that one can have weak convergence without convergence in probability.
Proposition 5.2 (a) If $X_n$ converges to $X$ in probability, then it converges weakly.
(b) If $X_n$ converges weakly to a constant, it converges in probability.
(c) (Slutsky's theorem) If $X_n$ converges weakly to $X$ and $Y_n$ converges weakly to a constant $c$, then $X_n + Y_n$ converges weakly to $X + c$ and $X_n Y_n$ converges weakly to $cX$.
Proof. To prove (a), let $g$ be a bounded and continuous function. If $n_j$ is any subsequence, then there exists a further subsequence $n_{j_k}$ such that $X_{n_{j_k}}$ converges almost surely to $X$. Then by dominated convergence, $E g(X_{n_{j_k}}) \to E g(X)$. That suffices to show $E g(X_n)$ converges to $E g(X)$.
For (b), if $X_n$ converges weakly to $c$,
$$P(X_n - c > \varepsilon) = P(X_n > c + \varepsilon) = 1 - P(X_n \le c + \varepsilon) \to 1 - P(c \le c + \varepsilon) = 0.$$
We use the fact that if $Y \equiv c$, then $c + \varepsilon$ is a point of continuity for $F_Y$. A similar equation shows $P(X_n - c \le -\varepsilon) \to 0$, so $P(|X_n - c| > \varepsilon) \to 0$.
We now prove the first part of (c), leaving the second part for the reader. Let $x$ be a point such that $x - c$ is a continuity point of $F_X$. Choose $\varepsilon$ so that $x - c + \varepsilon$ is again a continuity point. Then
$$P(X_n + Y_n \le x) \le P(X_n + c \le x + \varepsilon) + P(|Y_n - c| > \varepsilon) \to P(X \le x - c + \varepsilon).$$
So $\limsup P(X_n + Y_n \le x) \le P(X + c \le x + \varepsilon)$. Since $\varepsilon$ can be as small as we like and $x - c$ is a continuity point of $F_X$, then $\limsup P(X_n + Y_n \le x) \le P(X + c \le x)$. The $\liminf$ is done similarly.
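Slutsky's theorem is easy to observe numerically. The following hypothetical simulation (all names invented) takes $X_n$ to be a normalized sum of uniforms, weakly convergent to a standard normal $X$ by the CLT, and $Y_n = c + 1/n$ convergent to the constant $c$, and compares the empirical distribution function of $X_n + Y_n$ with $P(X + c \le x)$:

```python
# Hypothetical simulation of Slutsky's theorem: X_n => X standard normal,
# Y_n => c, so X_n + Y_n should be weakly convergent to X + c.
import math
import random

def sample_Xn_plus_Yn(n, c, rng):
    """One draw of X_n + Y_n with X_n = S_n / sqrt(n) and Y_n = c + 1/n."""
    r = math.sqrt(3.0)  # uniform on [-r, r] has mean 0 and variance 1
    s = sum(rng.uniform(-r, r) for _ in range(n))
    return s / math.sqrt(n) + c + 1.0 / n

rng = random.Random(1)
n, c, trials, x = 100, 2.0, 10000, 2.5
samples = [sample_Xn_plus_Yn(n, c, rng) for _ in range(trials)]
empirical = sum(1 for s in samples if s <= x) / trials

Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
limit = Phi(x - c)  # P(X + c <= x) for X standard normal; about 0.69 here
```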
Here is an example where $X_n$ converges weakly but not in probability. Let $X_1, X_2, \ldots$ be an i.i.d. sequence with $P(X_1 = 1) = \tfrac12$ and $P(X_1 = 0) = \tfrac12$. Since the $F_{X_n}$ are all equal, we have weak convergence.
We claim the $X_n$ do not converge in probability. If they did, we could find a subsequence $\{n_j\}$ such that $X_{n_j}$ converges a.s. Let $A_j = (X_{n_j} = 1)$. These are independent sets with $P(A_j) = \tfrac12$, so $\sum_j P(A_j) = \infty$. By the Borel-Cantelli lemma, $P(A_j \text{ i.o.}) = 1$, which means that $X_{n_j}$ is equal to 1 infinitely often with probability one. The same argument also shows that $X_{n_j}$ is equal to 0 infinitely often with probability one, which implies that $X_{n_j}$ does not converge almost surely, a contradiction.
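One can also see numerically why no subsequence can work: for $n \ne m$ the variables $X_n$ and $X_m$ are independent Bernoulli$(\tfrac12)$, so $P(|X_n - X_m| \ge 1) = \tfrac12$ never becomes small. A hypothetical sketch:

```python
# Hypothetical check: X_n and X_m are independent Bernoulli(1/2) for n != m,
# so the fraction of trials on which they disagree stays near 1/2 -- the
# sequence cannot be Cauchy in probability.
import random

rng = random.Random(2)
trials = 20000
disagree = sum(1 for _ in range(trials)
               if int(rng.random() < 0.5) != int(rng.random() < 0.5))
frac = disagree / trials  # close to 1/2 = P(X_n != X_m)
```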
We say a sequence of distribution functions $\{F_n\}$ is tight if for each $\varepsilon > 0$ there exists $M$ such that $F_n(M) \ge 1 - \varepsilon$ and $F_n(-M) \le \varepsilon$ for all $n$. A sequence of r.v.s is tight if the corresponding distribution functions are tight; this is equivalent to $P(|X_n| \ge M) \le \varepsilon$ for all $n$.
We give an easily checked criterion for tightness.
Proposition 5.3 Suppose there exists $\varphi : [0, \infty) \to [0, \infty)$ that is increasing and $\varphi(x) \to \infty$ as $x \to \infty$. If $c = \sup_n E \varphi(|X_n|) < \infty$, then the $X_n$ are tight.
Proof. Let $\varepsilon > 0$. Choose $M$ such that $\varphi(x) \ge c/\varepsilon$ if $x > M$. Then
$$P(|X_n| > M) \le \int \frac{\varphi(|X_n|)}{c/\varepsilon} 1_{(|X_n| > M)} \, dP \le \frac{\varepsilon}{c}\, E \varphi(|X_n|) \le \varepsilon.$$
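With $\varphi(x) = x^2$, this is the familiar Chebyshev-style bound: a uniform second-moment bound gives a single $M$ that works for every $n$. A small sketch (the function name is ours):

```python
# Hypothetical illustration of Proposition 5.3 with phi(x) = x^2: if
# sup_n E X_n^2 <= c, then P(|X_n| > M) <= c / M^2 by the argument above,
# so M = sqrt(c / eps) gives P(|X_n| > M) <= eps uniformly in n.
import math

def tightness_level(c, eps):
    """An M for which P(|X_n| > M) <= eps whenever sup_n E X_n^2 <= c."""
    return math.sqrt(c / eps)

M = tightness_level(4.0, 0.01)  # M = 20.0
bound = 4.0 / M ** 2            # c / M^2 = 0.01 <= eps
```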
Theorem 5.4 (Helly's theorem) Let $F_n$ be a sequence of distribution functions that is tight. There exists a subsequence $n_j$ and a distribution function $F$ such that $F_{n_j}$ converges weakly to $F$.
Without tightness, what could happen is that $X_n = n$, so that $F_{X_n}(x) \to 0$ for every $x$; the tightness precludes this.
Proof. Let $q_k$ be an enumeration of the rationals. Since $F_n(q_k) \in [0, 1]$, any subsequence has a further subsequence that converges. Use the diagonalization procedure so that $F_{n_j}(q_k)$ converges for each $q_k$ and call the limit $F(q_k)$. $F$ is nondecreasing on the rationals; define $F(x) = \inf_{q_k > x} F(q_k)$. So $F$ is right continuous and nondecreasing.
If $x$ is a point of continuity of $F$ and $\varepsilon > 0$, then there exist rationals $r$ and $s$ such that $r < x < s$, $F(s) - F(x) < \varepsilon$, and $F(x) - F(r) < \varepsilon$. Then
$$F_{n_j}(x) \ge F_{n_j}(r) \to F(r) > F(x) - \varepsilon$$
and
$$F_{n_j}(x) \le F_{n_j}(s) \to F(s) < F(x) + \varepsilon.$$
Since $\varepsilon$ is arbitrary, $F_{n_j}(x) \to F(x)$.
Since the $F_n$ are tight, for each $\varepsilon$ there exists $M$ such that $F_n(-M) < \varepsilon$ for all $n$. Then $F(-M) \le \varepsilon$, which implies $\lim_{x \to -\infty} F(x) = 0$. Showing $\lim_{x \to \infty} F(x) = 1$ is similar. Therefore $F$ is in fact a distribution function.