Chapter 5. Weak convergence

We will see later that if the $X_i$ are i.i.d. with mean zero and variance one, then $S_n/\sqrt{n}$ converges in the sense
\[ P(S_n/\sqrt{n} \in [a,b]) \to P(Z \in [a,b]), \]
where $Z$ is a standard normal. If $S_n/\sqrt{n}$ converged in probability or almost surely, then by the Kolmogorov zero-one law it would converge to a constant, contradicting the above. We want to generalize the above type of convergence.

We say $F_n$ converges weakly to $F$ if $F_n(x) \to F(x)$ for all $x$ at which $F$ is continuous. Here $F_n$ and $F$ are distribution functions. We say $X_n$ converges weakly to $X$ if $F_{X_n}$ converges weakly to $F_X$. We sometimes say $X_n$ converges in distribution or converges in law to $X$. Probabilities $\mu_n$ converge weakly if their corresponding distribution functions converge, that is, if $F_{\mu_n}(x) = \mu_n((-\infty, x])$ converges weakly.

An example that illustrates why we restrict the convergence to continuity points of $F$ is the following. Let $X_n = 1/n$ with probability one, and $X = 0$ with probability one. Then $F_{X_n}(x)$ is 0 if $x < 1/n$ and 1 otherwise, so $F_{X_n}(x)$ converges to $F_X(x)$ for all $x$ except $x = 0$.

Proposition 5.1. $X_n$ converges weakly to $X$ if and only if $E g(X_n) \to E g(X)$ for all $g$ bounded and continuous.

The condition that $E g(X_n)$ converges to $E g(X)$ for all $g$ bounded and continuous makes sense for any metric space and is used as the definition of weak convergence for $X_n$ taking values in general metric spaces.

Proof. First suppose $E g(X_n)$ converges to $E g(X)$. Let $x$ be a continuity point of $F$, let $\varepsilon > 0$, and choose $\delta$ such that $|F(y) - F(x)| < \varepsilon$ if $|y - x| < \delta$. Choose $g$ continuous such that $g$ is one on $(-\infty, x]$, takes values between 0 and 1, and is 0 on $[x+\delta, \infty)$. Then
\[ F_{X_n}(x) \le E g(X_n) \to E g(X) \le F_X(x+\delta) \le F(x) + \varepsilon. \]
Similarly, if $h$ is a continuous function taking values between 0 and 1 that is 1 on $(-\infty, x-\delta]$ and 0 on $[x, \infty)$,
\[ F_{X_n}(x) \ge E h(X_n) \to E h(X) \ge F_X(x-\delta) \ge F(x) - \varepsilon. \]
Since $\varepsilon$ is arbitrary, $F_{X_n}(x) \to F_X(x)$.

Now suppose $X_n$ converges weakly to $X$. We start by making some observations.
First, a distribution function is increasing, and so the number of points at which it has a discontinuity is at most countable. Second, if $g$ is a continuous function on a closed bounded interval, it can be approximated uniformly on the interval by step functions. Using the uniform continuity of $g$ on the interval, we may even choose the step function so that the places where it jumps are not in some pre-specified countable set. Our third observation is that
\[ P(X < x) = \lim_{k \to \infty} P\bigl(X \le x - \tfrac{1}{k}\bigr) = \lim_{y \uparrow x} F_X(y); \]
thus if $F_X$ is continuous at $x$, then $P(X = x) = F_X(x) - P(X < x) = 0$.

Suppose $g$ is bounded and continuous, and we want to prove that $E g(X_n) \to E g(X)$. By multiplying by a constant, we may suppose that $|g|$ is bounded by 1. Let $\varepsilon > 0$ and choose $M$ such that $F_X(M) > 1 - \varepsilon$ and $F_X(-M) < \varepsilon$, and so that $M$ and $-M$ are continuity points of $F_X$. We see that $P(X_n \le -M) = F_{X_n}(-M) \to F_X(-M) < \varepsilon$, and so for large enough $n$ we have $P(X_n \le -M) \le 2\varepsilon$. Similarly, for large enough $n$, $P(X_n > M) \le 2\varepsilon$. Therefore for large enough $n$, $E g(X_n)$ differs from $E (g 1_{[-M,M]})(X_n)$ by at most $4\varepsilon$. Also, $E g(X)$ differs from $E (g 1_{[-M,M]})(X)$ by at most $2\varepsilon$.

Let $h$ be a step function such that $\sup_{|x| \le M} |h(x) - g(x)| < \varepsilon$ and $h$ is 0 outside of $[-M, M]$. We choose $h$ so that the places where $h$ jumps are continuity points of $F$ and of all the $F_n$. Then $E (g 1_{[-M,M]})(X_n)$ differs from $E h(X_n)$ by at most $\varepsilon$, and the same when $X_n$ is replaced by $X$.

If we show $E h(X_n) \to E h(X)$, then
\[ \limsup_{n \to \infty} |E g(X_n) - E g(X)| \le 8\varepsilon, \]
and since $\varepsilon$ is arbitrary, we will be done.

$h$ is of the form $\sum_{i=1}^m c_i 1_{I_i}$, where each $I_i$ is an interval, so by linearity it is enough to show $E 1_I(X_n) \to E 1_I(X)$ when $I$ is an interval whose endpoints are continuity points of all the $F_n$ and of $F$. If the endpoints of $I$ are $a < b$, then by our third observation above, $P(X_n = a) = 0$, and the same when $X_n$ is replaced by $X$ and when $a$ is replaced by $b$. We then have
\[ E 1_I(X_n) = P(a < X_n \le b) = F_{X_n}(b) - F_{X_n}(a) \to F_X(b) - F_X(a) = P(a < X \le b) = E 1_I(X), \]
as required.

Let us examine the relationship between weak convergence and convergence in probability. The example of $S_n/\sqrt{n}$ shows that one can have weak convergence without convergence in probability.

Proposition 5.2. (a) If $X_n$ converges to $X$ in probability, then it converges weakly.
(b) If $X_n$ converges weakly to a constant, it converges in probability.
(c) (Slutsky's theorem) If $X_n$ converges weakly to $X$ and $Y_n$ converges weakly to a constant $c$, then $X_n + Y_n$ converges weakly to $X + c$ and $X_n Y_n$ converges weakly to $cX$.

Proof. To prove (a), let $g$ be a bounded and continuous function. If $\{n_j\}$ is any subsequence, then there exists a further subsequence $\{n_{j_k}\}$ such that $X_{n_{j_k}}$ converges almost surely to $X$. Then by dominated convergence, $E g(X_{n_{j_k}}) \to E g(X)$. Since every subsequence of $\{E g(X_n)\}$ has a further subsequence converging to $E g(X)$, this suffices to show that $E g(X_n)$ converges to $E g(X)$.

For (b), if $X_n$ converges weakly to $c$,
\[ P(X_n - c > \varepsilon) = P(X_n > c + \varepsilon) = 1 - P(X_n \le c + \varepsilon) \to 1 - P(c \le c + \varepsilon) = 0. \]
We use the fact that if $Y \equiv c$, then $c + \varepsilon$ is a point of continuity for $F_Y$. A similar equation shows $P(X_n - c \le -\varepsilon) \to 0$, so $P(|X_n - c| > \varepsilon) \to 0$.

We now prove the first part of (c), leaving the second part for the reader. Let $x$ be a point such that $x - c$ is a continuity point of $F_X$. Choose $\varepsilon$ so that $x - c + \varepsilon$ is again a continuity point. Then
\[ P(X_n + Y_n \le x) \le P(X_n + c \le x + \varepsilon) + P(|Y_n - c| > \varepsilon) \to P(X \le x - c + \varepsilon). \]
So $\limsup_n P(X_n + Y_n \le x) \le P(X + c \le x + \varepsilon)$. Since $\varepsilon$ can be as small as we like and $x - c$ is a continuity point of $F_X$, then $\limsup_n P(X_n + Y_n \le x) \le P(X + c \le x)$. The $\liminf$ is done similarly.

Here is an example where $X_n$ converges weakly but not in probability. Let $X_1, X_2, \ldots$ be an i.i.d. sequence with $P(X_1 = 1) = \frac{1}{2}$ and $P(X_1 = 0) = \frac{1}{2}$. Since the $F_{X_n}$ are all equal, we have weak convergence. We claim the $X_n$ do not converge in probability. If they did, we could find a subsequence $\{n_j\}$ such that $X_{n_j}$ converges a.s. Let $A_j = (X_{n_j} = 1)$. These are independent sets with $P(A_j) = \frac{1}{2}$, so $\sum_j P(A_j) = \infty$.
By the Borel–Cantelli lemma, $P(A_j \text{ i.o.}) = 1$, which means that $X_{n_j}$ is equal to 1 infinitely often with probability one. The same argument shows that $X_{n_j}$ is equal to 0 infinitely often with probability one, which implies that $X_{n_j}$ does not converge almost surely, a contradiction.

We say a sequence of distribution functions $\{F_n\}$ is tight if for each $\varepsilon > 0$ there exists $M$ such that $F_n(M) \ge 1 - \varepsilon$ and $F_n(-M) \le \varepsilon$ for all $n$. A sequence of r.v.s is tight if the corresponding distribution functions are tight; this is equivalent to $P(|X_n| \ge M) \le \varepsilon$ for all $n$. We give an easily checked criterion for tightness.

Proposition 5.3. Suppose there exists $\varphi : [0,\infty) \to [0,\infty)$ that is increasing and $\varphi(x) \to \infty$ as $x \to \infty$. If $c = \sup_n E \varphi(|X_n|) < \infty$, then the $X_n$ are tight.

Proof. Let $\varepsilon > 0$. Choose $M$ such that $\varphi(x) \ge c/\varepsilon$ if $x > M$. Then
\[ P(|X_n| > M) \le \int \frac{\varphi(|X_n|)}{c/\varepsilon}\, 1_{(|X_n| > M)}\, dP \le \frac{\varepsilon}{c}\, E \varphi(|X_n|) \le \varepsilon. \]

Theorem 5.4 (Helly's theorem). Let $F_n$ be a sequence of distribution functions that is tight. There exists a subsequence $\{n_j\}$ and a distribution function $F$ such that $F_{n_j}$ converges weakly to $F$.

Without tightness, mass can escape to infinity: if $X_n = n$, then $F_{X_n}(x) \to 0$ for every $x$, and the limit is not a distribution function. The tightness precludes this.

Proof. Let $\{q_k\}$ be an enumeration of the rationals. Since $F_n(q_k) \in [0,1]$, any subsequence has a further subsequence that converges. Use the diagonalization procedure so that $F_{n_j}(q_k)$ converges for each $q_k$, and call the limit $F(q_k)$. $F$ is nondecreasing on the rationals, and we define $F(x) = \inf_{q_k > x} F(q_k)$. Then $F$ is right continuous and nondecreasing.

If $x$ is a point of continuity of $F$ and $\varepsilon > 0$, then there exist rationals $r$ and $s$ with $r < x < s$ such that $F(s) - F(x) < \varepsilon$ and $F(x) - F(r) < \varepsilon$. Then
\[ F_{n_j}(x) \ge F_{n_j}(r) \to F(r) > F(x) - \varepsilon \]
and
\[ F_{n_j}(x) \le F_{n_j}(s) \to F(s) < F(x) + \varepsilon. \]
Since $\varepsilon$ is arbitrary, $F_{n_j}(x) \to F(x)$.

Since the $F_n$ are tight, there exists $M$ such that $F_n(-M) < \varepsilon$ for all $n$. Then $F(-M) \le \varepsilon$, which implies $\lim_{x \to -\infty} F(x) = 0$. Showing $\lim_{x \to \infty} F(x) = 1$ is similar. Therefore $F$ is in fact a distribution function.
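The two phenomena of this chapter can be checked numerically. The following is a minimal sketch, not part of the text: the function names `empirical_prob` and `std_normal_cdf` are my own, and Uniform$[-\sqrt{3}, \sqrt{3}]$ summands stand in for an arbitrary mean-zero, variance-one distribution. It estimates $P(S_n/\sqrt{n} \in [-1,1])$ and compares it with $P(Z \in [-1,1])$, then checks that the i.i.d. coin flips from the example, although identically distributed, differ from each other with probability about $\frac{1}{2}$, so no subsequence can converge in probability.

```python
import math
import random

def std_normal_cdf(x):
    # Phi(x), written in terms of the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def empirical_prob(a, b, n, trials, rng):
    # Estimate P(S_n / sqrt(n) in [a, b]) for i.i.d. Uniform[-sqrt(3), sqrt(3)]
    # summands, which have mean zero and variance one.
    half_width = math.sqrt(3.0)
    hits = 0
    for _ in range(trials):
        s = sum(rng.uniform(-half_width, half_width) for _ in range(n))
        if a <= s / math.sqrt(n) <= b:
            hits += 1
    return hits / trials

rng = random.Random(0)
a, b = -1.0, 1.0
est = empirical_prob(a, b, n=200, trials=5000, rng=rng)
target = std_normal_cdf(b) - std_normal_cdf(a)  # P(Z in [-1, 1]), about 0.683
print(abs(est - target) < 0.03)  # the two probabilities agree closely

# Weak convergence without convergence in probability: the coin flips X_n all
# share one distribution function, yet for n != m, P(|X_n - X_m| > 1/2) = 1/2.
pairs = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(5000)]
frac_far = sum(1 for x, y in pairs if abs(x - y) > 0.5) / len(pairs)
print(abs(frac_far - 0.5) < 0.05)
```

With 5000 trials the Monte Carlo standard error is under 0.01, so the tolerances above leave a comfortable margin.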