Convergence of sequences of random variables

October 11, 2013

Contents: the weak law of large numbers; convergence in probability; convergence in distribution; convergence in mean square; almost sure convergence; the strong law of large numbers; the Borel-Cantelli lemmas.

The weak law of large numbers

Theorem. Let $X_1, X_2, \dots, X_k, \dots$ be a sequence of independent and identically distributed random variables with finite expectation $m$ and finite variance. Set
$$\bar X_n = \frac{X_1 + X_2 + \cdots + X_n}{n}, \qquad n \ge 1.$$
Then, for all $\varepsilon > 0$,
$$\lim_{n \to \infty} P\big(|\bar X_n - m| \ge \varepsilon\big) = 0.$$

The random variable $\bar X_n$ has mean $m$:
$$E(\bar X_n) = E\Big(\frac{X_1 + X_2 + \cdots + X_n}{n}\Big) = \frac{E(X_1) + E(X_2) + \cdots + E(X_n)}{n} = \frac{nm}{n} = m.$$

If $\operatorname{Var}(X_k) = \sigma^2$, then the variance of $\bar X_n$ is $\sigma^2/n$:
$$\operatorname{Var}(\bar X_n) = E\big((\bar X_n - m)^2\big) = \frac{1}{n^2}\, E\Big(\Big(\sum_{k=1}^{n} (X_k - m)\Big)^{2}\Big) = \frac{1}{n^2}\Big(\sum_{k=1}^{n} E\big((X_k - m)^2\big) + 2 \sum_{1 \le i < j \le n} \underbrace{E\big((X_i - m)(X_j - m)\big)}_{=\,0}\Big) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},$$
where the cross terms vanish because the $X_i$ are independent.

Applying Chebyshev's inequality,
$$P\big(|\bar X_n - m| \ge \varepsilon\big) \le \frac{\sigma^2/n}{\varepsilon^2}.$$
Therefore
$$P\big(|\bar X_n - m| \ge \varepsilon\big) \to 0 \qquad \text{as } n \to \infty.$$

Probability as the limit of the relative frequency

Let $A$ be an event with probability $P(A)$. In each of $n$ independent repetitions of the random experiment, we observe whether or not $A$ occurs. More precisely, for $1 \le k \le n$, let $X_k$ be the indicator random variable of the event "$A$ happens in the $k$-th repetition". Hence, $E(X_k) = P(A)$. Moreover,
$$\bar X_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = f_n(A)$$
is the relative frequency of the event $A$.

Therefore, for all $\varepsilon > 0$,
$$P\big(|f_n(A) - P(A)| \ge \varepsilon\big) \to 0 \qquad \text{as } n \to \infty.$$
Equivalently,
$$P\big(|f_n(A) - P(A)| < \varepsilon\big) \to 1 \qquad \text{as } n \to \infty.$$
In a certain sense, the relative frequency of $A$ converges to its probability $P(A)$.

Convergence in probability

Definition. The sequence $X_1, X_2, \dots, X_n, \dots$ converges in probability to the random variable $X$ as $n \to \infty$ if, for all $\varepsilon > 0$,
$$P\big(|X_n - X| \ge \varepsilon\big) \to 0 \qquad \text{as } n \to \infty.$$
Notation: $X_n \xrightarrow{P} X$.

- The sequence of sample means $\bar X_n$ converges in probability to the common expected value $m$: $\bar X_n \xrightarrow{P} m$.
- The relative frequency $f_n(A)$ converges in probability to $P(A)$: $f_n(A) \xrightarrow{P} P(A)$.

Example

Let $X_1, X_2, \dots, X_n, \dots$ be random variables such that
$$P(X_n = 0) = 1 - \frac{1}{n}, \qquad P(X_n = -1) = P(X_n = 1) = \frac{1}{2n}.$$
Let us prove that $X_n \xrightarrow{P} 0$. We have to prove that $P(|X_n| < \varepsilon) \to 1$ as $n \to \infty$.

- If $\varepsilon > 1$, then $P(|X_n| < \varepsilon) = 1$ for all $n$.
- Otherwise, if $\varepsilon \le 1$, then $P(|X_n| < \varepsilon) = P(X_n = 0) = 1 - \frac{1}{n} \to 1$.

The characteristic function of $\bar X_n$

Let $M_n(\omega)$ be the characteristic function of $\bar X_n$:
$$M_n(\omega) = E\big(e^{i\omega(X_1 + X_2 + \cdots + X_n)/n}\big) = E\big(e^{i\frac{\omega}{n} X_1}\, e^{i\frac{\omega}{n} X_2} \cdots e^{i\frac{\omega}{n} X_n}\big) = E\big(e^{i\frac{\omega}{n} X_1}\big)\, E\big(e^{i\frac{\omega}{n} X_2}\big) \cdots E\big(e^{i\frac{\omega}{n} X_n}\big) = \Big(M_X\Big(\frac{\omega}{n}\Big)\Big)^{n},$$
where $M_X$ is the common characteristic function of the $X_k$.

Expanding $M_X(u)$ as a power series in $u$:
$$M_X(u) = 1 + imu + \frac{i^2 m_2}{2}\, u^2 + \cdots$$
Therefore
$$M_n(\omega) = \Big(M_X\Big(\frac{\omega}{n}\Big)\Big)^{n} = \Big(1 + im\,\frac{\omega}{n} + \frac{i^2}{2}\, m_2 \Big(\frac{\omega}{n}\Big)^{2} + \cdots\Big)^{n}.$$

Taking logarithms:
$$\ln M_n(\omega) = n \ln\Big(1 + im\,\frac{\omega}{n} + \frac{i^2}{2}\, m_2 \Big(\frac{\omega}{n}\Big)^{2} + \cdots\Big) = n \Big(im\,\frac{\omega}{n} + o\Big(\frac{1}{n}\Big)\Big) = im\omega + \frac{o(1/n)}{1/n} = im\omega + o(1).$$
Then $\ln M_n(\omega) \to im\omega$ as $n \to \infty$, and therefore
$$M_n(\omega) \to e^{im\omega} \qquad \text{as } n \to \infty.$$
This limit function is the characteristic function of a "constant" random variable $Y$ such that $P(Y = m) = 1$.
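As a minimal numerical sketch of this limit, assume the $X_k$ are uniform on $[0,1]$, so that $m = 1/2$ and $M_X(\omega) = (e^{i\omega} - 1)/(i\omega)$; the test point $\omega = 1$ and the values of $n$ are arbitrary choices:

    mx[w_] := (Exp[I w] - 1)/(I w)  (* characteristic function of the uniform distribution on [0, 1] *)
    Table[{n, N[mx[1./n]^n]}, {n, {10, 100, 1000}}]
    N[Exp[I/2]]  (* the limit e^(i m w) at w = 1, m = 1/2 *)

The computed powers approach the limit value $e^{i/2} \approx 0.8776 + 0.4794\,i$.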
Continuity theorem for characteristic functions

We say that the sequence $F_1, F_2, \dots, F_n, \dots$ of distribution functions converges to the distribution function $F$, written $F_n \to F$, if $F_n(x) \to F(x)$ at each point $x$ where $F$ is continuous.

Theorem. Suppose that $F_1, F_2, \dots, F_n, \dots$ is a sequence of distribution functions with corresponding characteristic functions $M_1, M_2, \dots, M_n, \dots$. If $F_n \to F$ for some distribution function $F$ with characteristic function $M$, then $M_n(\omega) \to M(\omega)$ for all $\omega$. Conversely, if $M(\omega) = \lim_{n\to\infty} M_n(\omega)$ exists and is continuous at $\omega = 0$, then $M$ is the characteristic function of some distribution function $F$, and $F_n \to F$.

Convergence of the distribution functions of $\bar X_n$

Applying the continuity theorem, we can give another interpretation of the convergence to $m$ of the sequence of sample means. Let $F_n$ be the distribution function of $\bar X_n$ and let
$$F(x) = \begin{cases} 0, & x < m \\ 1, & x \ge m \end{cases}$$
be the distribution function of a random variable $Y$ with characteristic function $M(\omega) = e^{im\omega}$. (Notice that $Y = m$ with probability 1.) Then
$$F_n(x) \to F(x) \qquad \text{for all } x \ne m.$$

Convergence in distribution

Definition. The sequence $X_1, X_2, \dots, X_n, \dots$ converges in distribution to the random variable $X$ as $n \to \infty$ if $F_{X_n} \to F_X$. That is, if
$$F_{X_n}(x) \to F_X(x) \qquad \text{for all } x \in C(F_X),$$
where $C(F_X) = \{x \in \mathbb{R} : F_X \text{ is continuous at } x\}$. Notation: $X_n \xrightarrow{d} X$.

Example: $\bar X_n \xrightarrow{d} m$.

The central limit theorem

Theorem. Let $X_1, X_2, \dots, X_k, \dots$ be a sequence of independent and identically distributed random variables with $E(X_k) = m$ and $\operatorname{Var}(X_k) = \sigma^2$. Let
$$S_n^* = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \frac{X_k - m}{\sigma}.$$
Then, for all $x \in \mathbb{R}$,
$$\lim_{n\to\infty} F_{S_n^*}(x) = F_{N(0,1)}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$
That is, $S_n^* \xrightarrow{d} N(0,1)$.

We have $E(S_n^*) = 0$ and $\operatorname{Var}(S_n^*) = 1$. Notice that $S_n^*$ is the normalized sample mean $\bar X_n$:
$$S_n^* = \frac{\bar X_n - m}{\sigma/\sqrt{n}}, \qquad \bar X_n = \frac{\sigma}{\sqrt{n}}\, S_n^* + m.$$

Let $M_n$ be the characteristic function of $S_n^*$. Then
$$M_n(\omega) = E\big(e^{i\omega S_n^*}\big) = E\Big(e^{i\frac{\omega}{\sqrt n} \sum_{k=1}^{n} \frac{X_k - m}{\sigma}}\Big) = E\Big(\prod_{k=1}^{n} e^{i\frac{\omega}{\sqrt n} \frac{X_k - m}{\sigma}}\Big) = \prod_{k=1}^{n} E\Big(e^{i\frac{\omega}{\sqrt n} \frac{X_k - m}{\sigma}}\Big) = \Big(M_Z\Big(\frac{\omega}{\sqrt n}\Big)\Big)^{n},$$
where $M_Z$ is the characteristic function of
$$Z = \frac{X_1 - m}{\sigma}.$$

Since $m_1 = E(Z) = 0$ and $m_2 = E(Z^2) = 1$, the first terms of the series expansion of $M_Z$ are
$$M_Z(u) = 1 + im_1 u + \frac{i^2 m_2}{2}\, u^2 + \cdots = 1 - \frac{1}{2}\, u^2 + \cdots$$
Therefore
$$M_n(\omega) = \Big(M_Z\Big(\frac{\omega}{\sqrt n}\Big)\Big)^{n} = \Big(1 - \frac{1}{2}\Big(\frac{\omega}{\sqrt n}\Big)^{2} + o\Big(\Big(\frac{\omega}{\sqrt n}\Big)^{2}\Big)\Big)^{n}.$$

Taking logarithms,
$$\ln M_n(\omega) = n \ln\Big(1 - \frac{1}{2}\Big(\frac{\omega}{\sqrt n}\Big)^{2} + o\Big(\frac{1}{n}\Big)\Big) = n\Big(-\frac{1}{2}\Big(\frac{\omega}{\sqrt n}\Big)^{2} + o\Big(\frac{1}{n}\Big)\Big) \to -\frac{\omega^2}{2}.$$
Hence,
$$M_n(\omega) \to e^{-\omega^2/2},$$
which is the characteristic function of a standard normal random variable. Then, by the continuity theorem,
$$S_n^* \xrightarrow{d} N(0,1).$$

WLLN versus CLT

Example: let $X_1, X_2, \dots, X_n, \dots$ be a sequence of independent random variables, uniformly distributed on $[0,1]$.

By the weak law of large numbers we have that
$$P\Big(\Big|\bar X_n - \frac{1}{2}\Big| > \frac{1}{10}\Big) \to 0 \qquad \text{as } n \to \infty.$$

On the other hand, by the CLT:
$$P\Big(\Big|\bar X_n - \frac{1}{2}\Big| > \frac{1}{10}\Big) = 1 - P\Big(\Big|\bar X_n - \frac{1}{2}\Big| \le \frac{1}{10}\Big) = 1 - P\Big(-\frac{\sqrt{12n}}{10} \le \frac{\bar X_n - 1/2}{1/\sqrt{12n}} \le \frac{\sqrt{12n}}{10}\Big) \approx 2\Big(1 - F_{N(0,1)}\Big(\frac{\sqrt{12n}}{10}\Big)\Big).$$
We have taken into account that $(\bar X_n - 1/2)\big/(1/\sqrt{12n})$ converges in distribution to a standard normal. Thus, its distribution function can be approximated by $F_{N(0,1)}$.
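As a minimal sketch, the CLT approximation above can be evaluated numerically; the sample sizes below are arbitrary choices:

    clt[n_] := 2 (1 - CDF[NormalDistribution[0, 1], Sqrt[12 n]/10])
    N[clt /@ {10, 50, 100, 500}]  (* the values decrease toward 0, as the WLLN predicts *)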
De Moivre-Laplace theorem

If each $X_k$ is a Bernoulli random variable with probability $p$, then
$$S_n = X_1 + X_2 + \cdots + X_n \sim \operatorname{Bin}(n, p),$$
with expected value $np$ and variance $npq$, where $q = 1 - p$. Hence, the CLT implies:

Theorem.
$$\frac{S_n - np}{\sqrt{npq}} \xrightarrow{d} N(0,1).$$

For instance, writing $F_n$ for the distribution function of $(S_n - np)/\sqrt{npq}$,
$$P(a \le S_n \le b) = P(a - 1 < S_n \le b) = P\Big(\frac{a - np - 1}{\sqrt{npq}} < \frac{S_n - np}{\sqrt{npq}} \le \frac{b - np}{\sqrt{npq}}\Big) = F_n\Big(\frac{b - np}{\sqrt{npq}}\Big) - F_n\Big(\frac{a - np - 1}{\sqrt{npq}}\Big) \approx F_{N(0,1)}\Big(\frac{b - np + 1/2}{\sqrt{npq}}\Big) - F_{N(0,1)}\Big(\frac{a - np - 1/2}{\sqrt{npq}}\Big).$$

The following Mathematica code compares the distribution function of the standardized $\operatorname{Bin}(6, 0.5)$ with the standard normal distribution function:

    << Statistics`DiscreteDistributions`
    << Statistics`ContinuousDistributions`
    (* In current Mathematica versions these packages are obsolete:
       the distributions are built in and the two loads can be omitted. *)

    b6[x_] := CDF[BinomialDistribution[6, 0.5], x]

    (* distribution function of (S6 - 3)/Sqrt[6*0.25], with S6 ~ Bin(6, 0.5) *)
    f6[x_] := Which[
      x < -3/Sqrt[6*0.25], 0,
      x < -2/Sqrt[6*0.25], b6[0],
      x < -1/Sqrt[6*0.25], b6[1],
      x < 0, b6[2],
      x < 1/Sqrt[6*0.25], b6[3],
      x < 2/Sqrt[6*0.25], b6[4],
      x < 3/Sqrt[6*0.25], b6[5],
      x >= 3/Sqrt[6*0.25], 1]

    Plot[{f6[x], CDF[NormalDistribution[0, 1], x]}, {x, -3, 3}]

[Plot: the step distribution function of the standardized Bin(6, 0.5) together with the standard normal distribution function.]

A local version of the CLT

Theorem. Let $X_1, X_2, \dots, X_k, \dots$ be independent identically distributed random variables with zero mean and unit variance, and suppose further that their common characteristic function $M_X$ satisfies
$$\int_{-\infty}^{\infty} |M_X(\omega)|^{r}\, d\omega < \infty$$
for some integer $r \ge 1$. Then the density $f_n$ of $U_n = (X_1 + X_2 + \cdots + X_n)/\sqrt{n}$ exists for $n \ge r$, and
$$f_n(u) \to \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \qquad \text{as } n \to \infty, \text{ uniformly in } u \in \mathbb{R}.$$

Example: sum of uniform r.v.

Suppose that each $X_i$ is uniform on $[-\sqrt{3}, \sqrt{3}]$. Hence, $E(X_i) = 0$ and $\operatorname{Var}(X_i) = 1$. Their common characteristic function is
$$M_X(\omega) = \int_{-\infty}^{\infty} e^{i\omega x} f_X(x)\, dx = \frac{1}{2\sqrt{3}} \int_{-\sqrt{3}}^{\sqrt{3}} e^{i\omega x}\, dx = \frac{\sin(\sqrt{3}\,\omega)}{\sqrt{3}\,\omega}.$$

We have
$$\int_{-\infty}^{\infty} |M_X(\omega)|^{2}\, d\omega = \int_{-\infty}^{\infty} \Big(\frac{\sin(\sqrt{3}\,\omega)}{\sqrt{3}\,\omega}\Big)^{2} d\omega = \frac{\pi}{\sqrt{3}} < \infty.$$
Hence, the sufficient condition of the theorem holds for $r = 2$. Thus, the density $f_n$ of $U_n = (X_1 + X_2 + \cdots + X_n)/\sqrt{n}$ exists for all $n \ge 1$, and
$$f_n(u) \to \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}.$$

For instance, let
$$U_3 = \frac{X_1 + X_2 + X_3}{\sqrt{3}}.$$
If $f(s) = f_X(s) * f_X(s) * f_X(s)$ (a triple convolution), then $f_3(u) = \sqrt{3}\, f(\sqrt{3}\, u)$.

The calculation of $f_3$ gives
$$f_3(u) = \begin{cases} 0, & u < -3 \\ (u + 3)^2/16, & -3 \le u < -1 \\ (3 - u^2)/8, & -1 \le u < 1 \\ (3 - u)^2/16, & 1 \le u < 3 \\ 0, & u \ge 3 \end{cases}$$

In Mathematica (the comment in the original is in Catalan: "sum of 3 independent uniform r.v. on $[-\sqrt{3}, \sqrt{3}]$"):

    (* Sum of 3 independent uniform r.v. on [-Sqrt[3], Sqrt[3]] *)
    f[t_] := Which[
      t <= -3 Sqrt[3], 0,
      t <= -Sqrt[3], (3 Sqrt[3] + t)^2/(48 Sqrt[3]),
      t <= Sqrt[3], (9 - t^2)/(24 Sqrt[3]),
      t <= 3 Sqrt[3], (3 Sqrt[3] - t)^2/(48 Sqrt[3]),
      True, 0]

    g[t_] := Sqrt[3] f[Sqrt[3] t]  (* g = f3, the density of U3 *)

    Plot[{1/Sqrt[2 Pi] Exp[-1/2 t^2], g[t]}, {t, -6, 6},
      PlotRange -> All,
      PlotStyle -> {{RGBColor[1, 0, 0]}, {RGBColor[0, 1, 0]}}]

[Plot: the density $f_3$ of $U_3$ together with the standard normal density.]

Poisson's theorem

Theorem. If $X_n \sim \operatorname{Bin}(n, \lambda/n)$, $\lambda > 0$, then
$$X_n \xrightarrow{d} \operatorname{Poiss}(\lambda).$$

Proof:
$$M_{X_n}(\omega) = \Big(1 + \frac{\lambda}{n}\big(e^{i\omega} - 1\big)\Big)^{n} \to e^{\lambda(e^{i\omega} - 1)} \qquad \text{as } n \to \infty,$$
and the limit is the characteristic function of the Poisson distribution with parameter $\lambda$.

Convergence in distribution: a result

Theorem. Let $X_1, X_2, \dots, X_n, \dots$ and $X$ be random variables taking nonnegative integer values. A necessary and sufficient condition for $X_n \xrightarrow{d} X$ is
$$\lim_{n\to\infty} P(X_n = k) = P(X = k) \qquad \text{for all } k \ge 0.$$

For example, by Poisson's theorem and this result, we have that
$$\binom{n}{k} \Big(\frac{\lambda}{n}\Big)^{k} \Big(1 - \frac{\lambda}{n}\Big)^{n-k} \to \frac{\lambda^k}{k!}\, e^{-\lambda}.$$
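A minimal numerical sketch of this limit, with the arbitrary choices $\lambda = 2$ and $n = 100$:

    (* compare the pmf of Bin(n, λ/n) with the pmf of Poiss(λ) *)
    With[{lambda = 2, n = 100},
      Table[{k,
        N[PDF[BinomialDistribution[n, lambda/n], k]],
        N[PDF[PoissonDistribution[lambda], k]]},
       {k, 0, 5}]]

Already for $n = 100$ the two probability columns agree to about two decimal places.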
Convergence in mean square

Given a probability space $(\Omega, \mathcal{F}, P)$, let us consider the vector space $H$ whose elements are the random variables $X$ with a finite second-order moment, $E(X^2) < \infty$. We define an inner product on $H$ by
$$\langle X, Y \rangle = E(XY).$$

- The norm induced by this inner product is
$$\|X\| = \sqrt{\langle X, X \rangle} = \sqrt{E(X^2)}.$$
- Moreover, a distance between random variables of $H$ can be considered:
$$d(X, Y) = \|X - Y\| = \sqrt{E\big((X - Y)^2\big)}.$$

Definition. The sequence $X_1, X_2, \dots, X_n, \dots$ converges in mean square to the random variable $X$ as $n \to \infty$ if
$$d(X_n, X) \to 0 \qquad \text{as } n \to \infty.$$
Equivalently,
$$E\big((X_n - X)^2\big) \to 0 \qquad \text{as } n \to \infty.$$
Notation: $X_n \xrightarrow{2} X$.

Example

The sequence of sample means $\bar X_n$ converges in mean square to the expected value $m$:
$$E\big((\bar X_n - m)^2\big) = \operatorname{Var}(\bar X_n) = \frac{\sigma^2}{n} \to 0.$$
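A minimal simulation sketch of this fact, assuming uniform $[0,1]$ samples (so $m = 1/2$ and $\sigma^2 = 1/12$); the number of replications, 10000, is an arbitrary choice:

    (* estimate E((X̄n - 1/2)^2) from 10000 simulated sample means
       and compare with σ²/n = 1/(12 n) *)
    msError[n_] := With[
      {means = Mean /@ RandomVariate[UniformDistribution[{0, 1}], {10000, n}]},
      {n, N[Mean[(means - 1/2)^2]], N[1/(12 n)]}]
    msError /@ {10, 100, 1000}

The empirical mean-square error tracks $1/(12n)$ and tends to 0 as $n$ grows.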
Convergence in $r$-mean

More generally:

Definition. The sequence $X_1, X_2, \dots, X_n, \dots$ converges in $r$-mean to the random variable $X$ as $n \to \infty$ if
$$E\big(|X_n - X|^r\big) \to 0 \qquad \text{as } n \to \infty.$$
Notation: $X_n \xrightarrow{r} X$.

Example

Consider the sequence $X_1, X_2, \dots, X_n, \dots$ such that
$$P(X_n = 0) = 1 - \frac{1}{n}, \qquad P(X_n = -1) = P(X_n = 1) = \frac{1}{2n}, \qquad n = 1, 2, 3, \dots$$
Let us prove that, for any $r > 0$, $X_n \xrightarrow{r} 0$.

Proof: we have to prove that $E(|X_n|^r) \to 0$ as $n \to \infty$. Indeed,
$$E(|X_n|^r) = 0 \cdot P(X_n = 0) + 1 \cdot \big(P(X_n = -1) + P(X_n = 1)\big) = \frac{1}{2n} + \frac{1}{2n} = \frac{1}{n} \to 0.$$

Almost sure convergence

Definition. The sequence $X_1, X_2, \dots, X_n, \dots$ converges almost surely to the random variable $X$ as $n \to \infty$ if
$$P\big(\{\omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}\big) = 1.$$
Notation: $X_n \xrightarrow{\text{a.s.}} X$. We also say that $X_n$ converges to $X$ with probability 1.

Uniqueness

Theorem. Let $X_1, X_2, \dots, X_n, \dots$ be a sequence of random variables. If the sequence converges

- almost surely,
- in probability,
- in $r$-mean,
- or in distribution,

then the limiting random variable (distribution) is unique.

For example, let us prove the uniqueness of the limit in the case of almost sure convergence. Suppose that $X_n \xrightarrow{\text{a.s.}} X$ and $X_n \xrightarrow{\text{a.s.}} Y$, and let
$$N_X = \{\omega : X_n(\omega) \not\to X(\omega) \text{ as } n \to \infty\}, \qquad N_Y = \{\omega : X_n(\omega) \not\to Y(\omega) \text{ as } n \to \infty\}.$$
So, we have that $P(N_X) = P(N_Y) = 0$.

Let $\omega \in \overline{N_X} \cap \overline{N_Y} = \overline{N_X \cup N_Y}$. Then
$$|X(\omega) - Y(\omega)| \le |X(\omega) - X_n(\omega)| + |X_n(\omega) - Y(\omega)| \to 0.$$
So, if $\omega \notin N_X \cup N_Y$, then $X(\omega) = Y(\omega)$; hence, if $X(\omega) \ne Y(\omega)$, then $\omega \in N_X \cup N_Y$. Thus
$$P(X \ne Y) \le P(N_X \cup N_Y) \le P(N_X) + P(N_Y) = 0.$$
That is, $X = Y$ with probability 1.

Relations between the convergence concepts

Theorem. The following implications hold:

- $\big(X_n \xrightarrow{\text{a.s.}} X\big) \implies \big(X_n \xrightarrow{P} X\big)$
- $\big(X_n \xrightarrow{r} X\big) \implies \big(X_n \xrightarrow{P} X\big)$ for any $r \ge 1$
- $\big(X_n \xrightarrow{P} X\big) \implies \big(X_n \xrightarrow{d} X\big)$
- If $r > s \ge 1$, then $\big(X_n \xrightarrow{r} X\big) \implies \big(X_n \xrightarrow{s} X\big)$

All implications are strict.

With additional hypotheses, some converse implications hold.

Theorem.

- If $c$ is a constant, then $\big(X_n \xrightarrow{d} c\big) \implies \big(X_n \xrightarrow{P} c\big)$.
- If $X_n \xrightarrow{P} X$ and there exists a constant $C$ such that $P(|X_n| \le C) = 1$ for all $n$, then $X_n \xrightarrow{r} X$ for all $r \ge 1$.
- If $P_n(\varepsilon) = P(|X_n - X| > \varepsilon)$ satisfies $\sum_n P_n(\varepsilon) < \infty$ for all $\varepsilon > 0$, then $X_n \xrightarrow{\text{a.s.}} X$.

For instance, let us consider the proof of the implication $\big(X_n \xrightarrow{d} c\big) \implies \big(X_n \xrightarrow{P} c\big)$. Hence, assume that $X_n \xrightarrow{d} X$, where $X = c$ is a constant random variable with distribution function
$$F_X(x) = \begin{cases} 0, & x < c \\ 1, & x \ge c \end{cases}$$

If $\varepsilon > 0$ is a fixed number, we have that
$$P(|X_n - c| > \varepsilon) = 1 - P(c - \varepsilon \le X_n \le c + \varepsilon) = 1 - \big(F_{X_n}(c + \varepsilon) - F_{X_n}(c - \varepsilon) + P(X_n = c - \varepsilon)\big) \le 1 - F_{X_n}(c + \varepsilon) + F_{X_n}(c - \varepsilon) \to 0,$$
because
$$F_{X_n}(c + \varepsilon) \to F_X(c + \varepsilon) = 1 \qquad \text{and} \qquad F_{X_n}(c - \varepsilon) \to F_X(c - \varepsilon) = 0.$$
Therefore $X_n \xrightarrow{P} c$.

Example

Consider the sequence $X_1, X_2, \dots, X_n, \dots$ such that
$$P(X_1 = 1) = 1, \qquad P(X_n = 1) = 1 - \frac{1}{n^2}, \quad P(X_n = n) = \frac{1}{n^2}, \qquad n \ge 2.$$
Let us prove that the sequence converges almost surely to 1. For every $\varepsilon > 0$ we have $P_1(\varepsilon) = 0$ and, for $n \ge 2$,
$$P_n(\varepsilon) = P(|X_n - 1| > \varepsilon) \le \frac{1}{n^2}.$$
Therefore
$$\sum_{n=1}^{\infty} P_n(\varepsilon) \le \sum_{n=2}^{\infty} \frac{1}{n^2} < \infty.$$
Hence, by the last criterion of the preceding theorem,
$$X_n \xrightarrow{\text{a.s.}} 1.$$

Strong laws of large numbers

Theorem. Let $X_1, X_2, \dots, X_k, \dots$ be a sequence of independent and identically distributed random variables with finite expectation $m$ and finite variance. Set
$$\bar X_n = \frac{X_1 + X_2 + \cdots + X_n}{n}, \qquad n \ge 1.$$
Then
$$\bar X_n \xrightarrow{\text{a.s.}} m \qquad \text{as } n \to \infty.$$

Theorem. Let $X_1, X_2, \dots, X_k, \dots$ be a sequence of independent and identically distributed random variables and set
$$\bar X_n = \frac{X_1 + X_2 + \cdots + X_n}{n}, \qquad n \ge 1.$$
Then
$$\bar X_n \xrightarrow{\text{a.s.}} \mu \qquad \text{as } n \to \infty$$
for some constant $\mu$, if and only if $E(|X_1|) < \infty$. In this case, $\mu = E(X_1)$.

Borel-Cantelli lemmas

Given a sequence of events $A_1, A_2, \dots, A_n, \dots$, let
$$B_n = \bigcup_{k=n}^{\infty} A_k.$$
Notice that $B_1 \supseteq B_2 \supseteq \cdots \supseteq B_n \supseteq \cdots$ is a decreasing sequence of events. The limit of the sequence $B_1, B_2, \dots, B_n, \dots$ is
$$A^\star = \lim_{n\to\infty} B_n = \bigcap_{n=1}^{\infty} B_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$$
The event $A^\star$ is called the limit superior of the sequence $A_1, A_2, \dots, A_n, \dots$, and it is denoted by $A^\star = \limsup_n A_n$.

- Notice that $\omega \in A^\star$ if and only if $\omega$ belongs to infinitely many of the $A_n$.
- That is, $A^\star$ is the event "infinitely many of the $A_n$ occur".

Theorem (Borel-Cantelli lemmas). Let $A_1, A_2, \dots, A_n, \dots$ be a sequence of events and $A^\star$ its limit superior. Then:

- $P(A^\star) = 0$ if $\sum_{n=1}^{\infty} P(A_n) < \infty$;
- $P(A^\star) = 1$ if $\sum_{n=1}^{\infty} P(A_n) = \infty$ and the events $A_1, A_2, \dots$ are independent.

For example, the proof of the first Borel-Cantelli lemma is:
$$P(A^\star) = P\big(\lim_{n\to\infty} B_n\big) = \lim_{n\to\infty} P(B_n) = \lim_{n\to\infty} P\Big(\bigcup_{k=n}^{\infty} A_k\Big) \le \lim_{n\to\infty} \sum_{k=n}^{\infty} P(A_k) = 0,$$
because $\sum_n P(A_n)$ converges by hypothesis, so its tails tend to 0.

Zero-one law

Corollary (zero-one law). Let $A_1, A_2, \dots, A_n, \dots$ be a sequence of independent events and let $A^\star$ be its limit superior. Then either $P(A^\star) = 0$ or $P(A^\star) = 1$, according as $\sum_{n=1}^{\infty} P(A_n)$ converges or diverges, respectively.

Example

Let $X_1, X_2, \dots, X_n, \dots$ be a sequence of random variables such that
$$P(X_n = 0) = \frac{1}{n^2}, \qquad P(X_n = 1) = 1 - \frac{1}{n^2}, \qquad n \ge 1.$$
Let us consider the events $A_n = \{X_n = 0\}$, $n \ge 1$.

- Since $\sum_n P(A_n) < \infty$, the first Borel-Cantelli lemma implies that $P(A^\star) = 0$. Thus, there is zero probability that $\{X_n = 0\}$ happens infinitely often.
- Therefore $P(X_n = 1 \text{ for all } n \text{ sufficiently large}) = 1$. Thus we have proved:
$$\lim_{n\to\infty} X_n = 1 \qquad \text{with probability } 1.$$

Example

Now, let $X_1, X_2, \dots, X_n, \dots$ be independent random variables such that
$$P(X_n = 0) = \frac{1}{n}, \qquad P(X_n = 1) = 1 - \frac{1}{n}, \qquad n \ge 1,$$
and let $A_n = \{X_n = 0\}$, $n \ge 1$.

- Since $\sum_n P(A_n) = \sum_n 1/n = \infty$, the second Borel-Cantelli lemma implies that, with probability 1, infinitely many of the events $A_n = \{X_n = 0\}$ occur.
- Analogously, $\sum_n P(A'_n) = \infty$ for the events $A'_n = \{X_n = 1\}$. Thus, the probability that infinitely many of the events $A'_n = \{X_n = 1\}$ occur is also 1.
- Hence, with probability 1, $\lim_{n\to\infty} X_n$ does not exist.
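To close, a minimal simulation sketch of these two examples; the horizon $n \le 10000$ and the random seed are arbitrary choices. With $P(X_n = 0) = 1/n^2$ a simulated trajectory typically contains only a few zeros, all with small indices, while with $P(X_n = 0) = 1/n$ the zeros keep turning up:

    (* indices n <= nmax at which a simulated trajectory takes the value 0,
       when P(Xn = 0) = p[n] *)
    zeroIndices[p_, nmax_] :=
      Flatten[Position[Table[If[RandomReal[] < p[n], 0, 1], {n, 1, nmax}], 0]]
    SeedRandom[1];
    zeroIndices[1/#^2 &, 10000]  (* first example: typically very few zeros, all early *)
    zeroIndices[1/# &, 10000]    (* second example: zeros appear all along the trajectory *)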