Convergence of sequences of random variables

October 11, 2013

Outline:
- The weak law of large numbers
- Convergence in probability
- Convergence in distribution
- Convergence in mean square
- Almost sure convergence
- The strong law of large numbers
- Borel-Cantelli lemmas
The weak law of large numbers
Theorem
Let $X_1, X_2, \ldots, X_k, \ldots$ be a sequence of independent and identically distributed random variables with finite expectation $m$ and finite variance. Set

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

Then, for all $\epsilon > 0$,

$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - m\right| \geq \epsilon\right) = 0.$$

The random variable $\bar{X}_n$ has mean $m$:

$$E\left(\bar{X}_n\right) = E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{E(X_1) + E(X_2) + \cdots + E(X_n)}{n} = \frac{nm}{n} = m.$$
The weak law of large numbers

If $\operatorname{Var}(X_k) = \sigma^2$, then the variance of $\bar{X}_n$ is

$$\operatorname{Var}\left(\bar{X}_n\right) = E\left(\left(\bar{X}_n - m\right)^2\right) = \frac{1}{n^2}\, E\left(\left(\sum_{k=1}^{n} (X_k - m)\right)^2\right)$$

$$= \frac{1}{n^2}\left(\sum_{k=1}^{n} E\left((X_k - m)^2\right) + 2 \sum_{1 \leq i < j \leq n} \underbrace{E\left((X_i - m)(X_j - m)\right)}_{0}\right)$$

$$= \frac{1}{n^2} \sum_{k=1}^{n} E\left((X_k - m)^2\right) = \frac{n\sigma^2}{n^2} = \sigma^2/n.$$

Applying Chebyshev's inequality,

$$P\left(\left|\bar{X}_n - m\right| \geq \epsilon\right) \leq \frac{\sigma^2/n}{\epsilon^2}.$$

Therefore

$$P\left(\left|\bar{X}_n - m\right| \geq \epsilon\right) \to 0 \quad \text{as } n \to \infty.$$
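A minimal simulation sketch (not from the slides; it assumes uniform $[0,1]$ variables, for which $m = 1/2$) illustrates the statement: the sample means settle near $1/2$ as $n$ grows.

(* Sketch: sample means of n independent U[0,1] variables approach m = 1/2;
   the spread around 1/2 shrinks at the Var = 1/(12n) rate. *)
SeedRandom[1];
Table[{n, Mean[RandomReal[{0, 1}, n]]}, {n, {10, 100, 1000, 10000}}]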
Probability as the limit of the relative frequency

Let $A$ be an event with probability $P(A)$.

In each of $n$ independent repetitions of the random experiment, we observe whether or not $A$ occurs. More precisely, for $1 \leq k \leq n$, let $X_k$ be the indicator random variable of the event "$A$ happens in the $k$-th repetition".

Hence, $E(X_k) = P(A)$. Moreover,

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = f_n(A)$$

is the relative frequency of the event $A$.

Probability as the limit of the relative frequency

Therefore, for all $\epsilon > 0$,

$$P\left(|f_n(A) - P(A)| \geq \epsilon\right) \to 0 \quad \text{as } n \to \infty.$$

Equivalently,

$$P\left(|f_n(A) - P(A)| < \epsilon\right) \to 1 \quad \text{as } n \to \infty.$$

In a certain sense, the relative frequency of $A$ converges to its probability $P(A)$.
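As a hypothetical illustration (my addition, with the helper name fn chosen for this sketch), one can simulate the relative frequency of the event $A$ = "a fair die shows 6", whose probability is $1/6$:

(* Sketch: f_n(A) for A = "a fair die shows 6"; the frequency tends to 1/6. *)
SeedRandom[2];
fn[n_] := N[Count[RandomInteger[{1, 6}, n], 6]/n]
Table[{n, fn[n]}, {n, {60, 600, 6000, 60000}}]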
Convergence in probability

Definition
The sequence $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to the random variable $X$ as $n \to \infty$ if, for all $\epsilon > 0$,

$$P(|X_n - X| \geq \epsilon) \to 0 \quad \text{as } n \to \infty$$

Notation: $X_n \xrightarrow{P} X$

Convergence in probability

- The sequence of sample means $\bar{X}_n$ converges in probability to the common expected value $m$: $\bar{X}_n \xrightarrow{P} m$
- The relative frequency $f_n(A)$ converges in probability to $P(A)$: $f_n(A) \xrightarrow{P} P(A)$
Example

Let $X_1, X_2, \ldots, X_n, \ldots$ be random variables such that

$$P(X_n = 0) = 1 - \frac{1}{n}, \qquad P(X_n = -1) = P(X_n = 1) = \frac{1}{2n}, \qquad n \geq 1$$

Let us prove that $X_n \xrightarrow{P} 0$.

Example

We have to prove that

$$P(|X_n| < \epsilon) \to 1 \quad \text{as } n \to \infty$$

- If $\epsilon > 1$, then $P(|X_n| < \epsilon) = 1$ for all $n \geq 1$.
- Otherwise, if $\epsilon \leq 1$, then

$$P(|X_n| < \epsilon) = P(X_n = 0) = 1 - \frac{1}{n} \to 1$$
The characteristic function of $\bar{X}_n$

Let $M_n(\omega)$ be the characteristic function of $\bar{X}_n$:

$$M_n(\omega) = E\left(e^{i\omega(X_1 + X_2 + \cdots + X_n)/n}\right) = E\left(e^{i\frac{\omega}{n}X_1} e^{i\frac{\omega}{n}X_2} \cdots e^{i\frac{\omega}{n}X_n}\right)$$

$$= E\left(e^{i\frac{\omega}{n}X_1}\right) E\left(e^{i\frac{\omega}{n}X_2}\right) \cdots E\left(e^{i\frac{\omega}{n}X_n}\right) = \left(M_X\left(\frac{\omega}{n}\right)\right)^n$$

The characteristic function of $\bar{X}_n$

Expanding $M_X(u)$ as a power series in $u$:

$$M_X(u) = 1 + imu + \frac{i^2}{2} m_2 u^2 + \cdots$$

Therefore

$$M_n(\omega) = \left(M_X\left(\frac{\omega}{n}\right)\right)^n = \left(1 + im\frac{\omega}{n} + \frac{i^2}{2} m_2 \left(\frac{\omega}{n}\right)^2 + \cdots\right)^n$$
The characteristic function of $\bar{X}_n$

Taking logarithms:

$$\ln(M_n(\omega)) = n \ln\left(1 + im\frac{\omega}{n} + \frac{i^2}{2} m_2 \left(\frac{\omega}{n}\right)^2 + \cdots\right) = n\left(im\frac{\omega}{n} + o\left(\frac{1}{n}\right)\right) = im\omega + \frac{o(1/n)}{1/n}.$$

Then

$$\ln(M_n(\omega)) \to im\omega \quad \text{as } n \to \infty$$

The characteristic function of $\bar{X}_n$

Therefore

$$M_n(\omega) \to e^{im\omega} \quad \text{as } n \to \infty$$

This limit function is the characteristic function of a "constant" random variable $Y$ such that

$$P(Y = m) = 1$$
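A hedged numeric check (not in the original slides; it assumes $X \sim U[0,1]$, so $m = 1/2$ and $M_X(\omega) = (e^{i\omega} - 1)/(i\omega)$, with mX a helper name of mine): the powers $(M_X(\omega/n))^n$ indeed approach $e^{im\omega}$, shown here at $\omega = 1$.

(* Sketch: for X ~ U[0,1], (M_X(ω/n))^n approaches e^{i ω/2} as n grows. *)
mX[ω_] := (Exp[I ω] - 1)/(I ω)
Table[{n, mX[1./n]^n}, {n, {10, 100, 1000}}]
Exp[I/2.]  (* limit value ≈ 0.8776 + 0.4794 I *)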
Continuity theorem for characteristic functions

We say that the sequence $F_1, F_2, \ldots, F_n, \ldots$ of distribution functions converges to the distribution function $F$, written

$$F_n \to F$$

if

$$F_n(x) \to F(x)$$

at each point $x$ where $F$ is continuous.

Continuity theorem for characteristic functions

Theorem
Suppose that $F_1, F_2, \ldots, F_n, \ldots$ is a sequence of distribution functions with corresponding characteristic functions $M_1, M_2, \ldots, M_n, \ldots$

- If $F_n \to F$ for some distribution function $F$ with characteristic function $M$, then $M_n(\omega) \to M(\omega)$ for all $\omega$.
- Conversely, if $M(\omega) = \lim_{n\to\infty} M_n(\omega)$ exists and is continuous at $\omega = 0$, then $M$ is the characteristic function of some distribution function $F$, and $F_n \to F$.
Convergence of the distribution functions of $\bar{X}_n$

Applying the continuity theorem we can give another interpretation of the convergence to $m$ of the sequence of sample means.

Let $F_n$ be the distribution function of $\bar{X}_n$ and let

$$F(x) = \begin{cases} 0, & x < m \\ 1, & x \geq m \end{cases}$$

be the distribution function of a random variable $Y$ with characteristic function $M(\omega) = e^{im\omega}$. (Notice that $Y = m$ with probability 1.)

Then

$$F_n(x) \to F(x) \quad \text{for all } x \neq m$$

Convergence in distribution

Definition
The sequence $X_1, X_2, \ldots, X_n, \ldots$ converges in distribution to the random variable $X$ as $n \to \infty$ if $F_{X_n} \to F_X$. That is, if

$$F_{X_n}(x) \to F_X(x) \quad \text{for all } x \in C(F_X),$$

where

$$C(F_X) = \{x \in \mathbb{R} : F_X \text{ is continuous at } x\}$$

Notation: $X_n \xrightarrow{d} X$

Example: $\bar{X}_n \xrightarrow{d} m$
The central limit theorem

Theorem
Let $X_1, X_2, \ldots, X_k, \ldots$ be a sequence of independent and identically distributed r.v. with $E(X_k) = m$ and $\operatorname{Var}(X_k) = \sigma^2$. Let

$$S_n^* = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \frac{X_k - m}{\sigma}.$$

Then

$$S_n^* \xrightarrow{d} N(0, 1)$$

That is, for all $x \in \mathbb{R}$,

$$\lim_{n\to\infty} F_{S_n^*}(x) = F_{N(0,1)}(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$$

The central limit theorem

We have

- $E(S_n^*) = 0$,
- $\operatorname{Var}(S_n^*) = 1$

Notice that $S_n^*$ is the normalized sample mean $\bar{X}_n$:

$$S_n^* = \frac{\bar{X}_n - m}{\sigma/\sqrt{n}}, \qquad \bar{X}_n = \frac{\sigma}{\sqrt{n}}\, S_n^* + m$$
The central limit theorem

Let $M_n$ be the characteristic function of $S_n^*$. Therefore

$$M_n(\omega) = E\left(e^{i\omega S_n^*}\right) = E\left(e^{i\omega \frac{1}{\sqrt{n}} \sum_{k=1}^{n} \frac{X_k - m}{\sigma}}\right) = E\left(\prod_{k=1}^{n} e^{i\frac{\omega}{\sqrt{n}} \frac{X_k - m}{\sigma}}\right)$$

$$= \prod_{k=1}^{n} E\left(e^{i\frac{\omega}{\sqrt{n}} \frac{X_k - m}{\sigma}}\right) = \left(M_Z\left(\frac{\omega}{\sqrt{n}}\right)\right)^n$$

where $M_Z$ is the characteristic function of

$$Z = \frac{X_1 - m}{\sigma}$$

The central limit theorem

Since

$$m_1 = E(Z) = 0, \qquad m_2 = E(Z^2) = 1,$$

the first terms of the series expansion of $M_Z$ are:

$$M_Z(u) = 1 + im_1 u + \frac{i^2}{2} m_2 u^2 + \cdots = 1 - \frac{1}{2} u^2 + \cdots$$

Therefore

$$M_n(\omega) = \left(M_Z\left(\frac{\omega}{\sqrt{n}}\right)\right)^n = \left(1 - \frac{1}{2}\left(\frac{\omega}{\sqrt{n}}\right)^2 + o\left(\left(\frac{\omega}{\sqrt{n}}\right)^2\right)\right)^n$$
The central limit theorem

Taking logarithms,

$$\ln M_n(\omega) = n \ln\left(1 - \frac{1}{2}\left(\frac{\omega}{\sqrt{n}}\right)^2 + o\left(\frac{1}{n}\right)\right) = n\left(-\frac{1}{2}\left(\frac{\omega}{\sqrt{n}}\right)^2 + o\left(\left(\frac{\omega}{\sqrt{n}}\right)^2\right)\right) \to -\frac{1}{2}\omega^2,$$

The central limit theorem

Hence,

$$M_n(\omega) \to e^{-\omega^2/2}$$

This is the characteristic function of a standard normal random variable. Then, by the continuity theorem,

$$S_n^* \xrightarrow{d} N(0, 1)$$
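A small simulation sketch (not part of the original slides; it assumes $X_k \sim U[0,1]$, for which $m = 1/2$ and $\sigma = 1/\sqrt{12}$, with sn a helper name of mine) compares a histogram of $S_n^*$ with the standard normal density:

(* Sketch: 5000 samples of S_n* = (mean − 1/2)·Sqrt[12 n] for n = 100
   uniforms, overlaid with the N(0,1) density. *)
SeedRandom[3];
sn := (Mean[RandomReal[{0, 1}, 100]] - 1/2) Sqrt[12*100];
Show[Histogram[Table[sn, {5000}], Automatic, "PDF"],
 Plot[PDF[NormalDistribution[0, 1], x], {x, -4, 4}]]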
WLLN versus CLT

Example: Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of independent random variables, uniformly distributed on $[0, 1]$.

By the weak law of large numbers we have that

$$P\left(\left|\bar{X}_n - \frac{1}{2}\right| > \frac{1}{10}\right) \to 0 \quad \text{as } n \to \infty$$

WLLN versus CLT

On the other hand, by the CLT:

$$P\left(\left|\bar{X}_n - \frac{1}{2}\right| > \frac{1}{10}\right) = 1 - P\left(-\frac{1}{10} \leq \bar{X}_n - \frac{1}{2} \leq \frac{1}{10}\right)$$

$$= 1 - P\left(-\frac{\sqrt{12n}}{10} \leq \frac{\bar{X}_n - 1/2}{1/\sqrt{12n}} \leq \frac{\sqrt{12n}}{10}\right) \approx 2\left(1 - F_{N(0,1)}\left(\frac{\sqrt{12n}}{10}\right)\right)$$

We have taken into account that $\left(\bar{X}_n - 1/2\right)/\left(1/\sqrt{12n}\right)$ converges in distribution to a standard normal. Thus, its distribution function can be approximated by $F_{N(0,1)}$.
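A hedged numeric illustration (mine, not the slides'): already at $n = 100$ the CLT approximation gives a very small deviation probability.

(* Sketch: 2(1 − Φ(Sqrt[12 n]/10)) for n = 100. *)
n = 100;
N[2 (1 - CDF[NormalDistribution[0, 1], Sqrt[12 n]/10])]  (* ≈ 0.00053 *)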
De Moivre-Laplace Theorem

If each $X_k$ is a Bernoulli random variable with probability $p$, then

$$S_n = X_1 + X_2 + \cdots + X_n \sim \operatorname{Bin}(n, p)$$

with expected value $np$ and variance $npq$.

Hence, the CLT implies

Theorem

$$\frac{S_n - np}{\sqrt{npq}} \xrightarrow{d} N(0, 1)$$

De Moivre-Laplace Theorem

For instance, with $F_n$ denoting the distribution function of $(S_n - np)/\sqrt{npq}$ and $a \leq b$ integers,

$$P(a \leq S_n \leq b) = P(a - 1 < S_n \leq b)$$

$$= P\left(\frac{a - np - 1}{\sqrt{npq}} < \frac{S_n - np}{\sqrt{npq}} \leq \frac{b - np}{\sqrt{npq}}\right) = F_n\left(\frac{b - np}{\sqrt{npq}}\right) - F_n\left(\frac{a - np - 1}{\sqrt{npq}}\right)$$

$$\approx F_{N(0,1)}\left(\frac{b - np + 1/2}{\sqrt{npq}}\right) - F_{N(0,1)}\left(\frac{a - np - 1/2}{\sqrt{npq}}\right)$$
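As a hedged check (my example, choosing $n = 100$, $p = 1/2$, $a = 45$, $b = 55$), the normal approximation with the $\pm 1/2$ continuity correction matches the exact binomial probability to about four decimal places:

(* Sketch: exact Bin(100, 0.5) probability vs. the corrected normal value. *)
n = 100; p = 0.5; q = 1 - p; a = 45; b = 55;
exact = Sum[PDF[BinomialDistribution[n, p], k], {k, a, b}];
approx = CDF[NormalDistribution[0, 1], (b - n p + 1/2)/Sqrt[n p q]] -
   CDF[NormalDistribution[0, 1], (a - n p - 1/2)/Sqrt[n p q]];
N[{exact, approx}]  (* both ≈ 0.7287 *)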
In Mathematica, the distribution function of the standardized $\operatorname{Bin}(6, 0.5)$ can be compared with the standard normal one:

<< Statistics`DiscreteDistributions`
<< Statistics`ContinuousDistributions`

b6[x_] := CDF[BinomialDistribution[6, 0.5], x]
f6[x_] := Which[x < -3/Sqrt[6*0.25], 0, x < -2/Sqrt[6*0.25], b6[0],
  x < -1/Sqrt[6*0.25], b6[1], x < 0, b6[2], x < 1/Sqrt[6*0.25], b6[3],
  x < 2/Sqrt[6*0.25], b6[4], x < 3/Sqrt[6*0.25], b6[5], x >= 3/Sqrt[6*0.25], 1]

Plot[{f6[x], CDF[NormalDistribution[0, 1], x]}, {x, -3, 3}]

[Plot: the step-shaped CDF of the standardized Bin(6, 0.5) against the standard normal CDF on [-3, 3].]

A local version of the CLT

Theorem
Let $X_1, X_2, \ldots, X_k, \ldots$ be independent identically distributed random variables with zero mean and unit variance, and suppose further that their common characteristic function $M_X$ satisfies

$$\int_{-\infty}^{\infty} |M_X(\omega)|^r\, d\omega < \infty$$

for some integer $r \geq 1$. Then the density $f_n$ of $U_n = (X_1 + X_2 + \cdots + X_n)/\sqrt{n}$ exists for $n \geq r$, and

$$f_n(u) \to \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \quad \text{as } n \to \infty, \text{ uniformly in } \mathbb{R}$$
Example: sum of uniform r.v.

Suppose that each $X_i$ is uniform on $[-\sqrt{3}, \sqrt{3}]$. Hence, $E(X_i) = 0$ and $\operatorname{Var}(X_i) = 1$.

Their common characteristic function is

$$M_X(\omega) = \int_{-\infty}^{\infty} e^{i\omega x} f_X(x)\, dx = \frac{1}{2\sqrt{3}} \int_{-\sqrt{3}}^{\sqrt{3}} e^{i\omega x}\, dx = \frac{\sin\left(\sqrt{3}\,\omega\right)}{\sqrt{3}\,\omega}$$

Example: sum of uniform r.v.

We have

$$\int_{-\infty}^{\infty} |M_X(\omega)|^2\, d\omega = \int_{-\infty}^{\infty} \frac{\sin^2\left(\sqrt{3}\,\omega\right)}{3\omega^2}\, d\omega = \frac{\pi}{\sqrt{3}} < \infty.$$

Hence, the sufficient condition of the theorem holds for $r = 2$.

Thus, the density $f_n$ of $U_n = (X_1 + X_2 + \cdots + X_n)/\sqrt{n}$ exists for all $n \geq 1$ and

$$f_n(u) \to \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$$
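A hedged numeric verification of the integrability condition (my addition, not from the slides):

(* Sketch: numeric check that ∫ |M_X(ω)|² dω equals Pi/Sqrt[3] ≈ 1.8138. *)
NIntegrate[Sin[Sqrt[3] ω]^2/(3 ω^2), {ω, -Infinity, 0, Infinity}]
N[Pi/Sqrt[3]]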
Example: sum of uniform r.v.

For instance, let

$$U_3 = \frac{X_1 + X_2 + X_3}{\sqrt{3}}$$

If

$$f(s) = f_X(s) \ast f_X(s) \ast f_X(s)$$

then

$$f_3(u) = \sqrt{3}\, f\left(\sqrt{3}\, u\right)$$

Example: sum of uniform r.v.

The calculation of $f_3$ gives

$$f_3(u) = \begin{cases} 0, & u < -3 \\ (u + 3)^2/16, & -3 \leq u < -1 \\ (3 - u^2)/8, & -1 \leq u < 1 \\ (3 - u)^2/16, & 1 \leq u < 3 \\ 0, & u \geq 3 \end{cases}$$
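Two quick sanity checks one can run on this formula (my addition; the Piecewise definition below just transcribes the cases above): the density integrates to 1, and its value at 0, namely $3/8 = 0.375$, is already close to the normal density value $1/\sqrt{2\pi} \approx 0.3989$.

(* Sketch: f_3 as given above; its total mass is 1 and f_3(0) = 3/8. *)
f3[u_] := Piecewise[{{(u + 3)^2/16, -3 <= u < -1}, {(3 - u^2)/8, -1 <= u < 1},
    {(3 - u)^2/16, 1 <= u < 3}}, 0]
{Integrate[f3[u], {u, -3, 3}], f3[0]}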
Example: sum of uniform r.v.

Sum of 3 independent uniform r.v. on [-Sqrt[3], Sqrt[3]]:

f[t_] := Which[t <= -3 Sqrt[3], 0, t <= -Sqrt[3],
  (3 Sqrt[3] + t)^2/(48 Sqrt[3]), t <= Sqrt[3], (9 - t^2)/(24 Sqrt[3]),
  t <= 3 Sqrt[3], (3 Sqrt[3] - t)^2/(48 Sqrt[3]), True, 0]

g[t_] := Sqrt[3] f[Sqrt[3] t]

Plot[{1/Sqrt[2 Pi] Exp[-1/2 t^2], g[t]}, {t, -6, 6},
 PlotRange -> All, PlotStyle -> {{RGBColor[1, 0, 0]}, {RGBColor[0, 1, 0]}}]

[Plot: the density f_3 (green) against the standard normal density (red) on [-6, 6].]

Poisson's theorem

Theorem
If $X_n \sim \operatorname{Bin}(n, \lambda/n)$, $\lambda > 0$, then

$$X_n \xrightarrow{d} \operatorname{Poiss}(\lambda)$$

Proof:

$$M_{X_n}(\omega) = \left(1 - \frac{\lambda}{n} + \frac{\lambda}{n} e^{i\omega}\right)^n = \left(1 + \frac{\lambda(e^{i\omega} - 1)}{n}\right)^n \to e^{\lambda(e^{i\omega} - 1)} \quad \text{as } n \to \infty$$
Convergence in distribution: a result

Theorem
Let $X_1, X_2, \ldots, X_n, \ldots$ and $X$ be random variables taking nonnegative integer values. A necessary and sufficient condition for $X_n \xrightarrow{d} X$ is

$$\lim_{n\to\infty} P(X_n = k) = P(X = k) \quad \text{for all } k \geq 0$$

For example, by Poisson's theorem and this result, we have that

$$\binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \to e^{-\lambda}\, \frac{\lambda^k}{k!}$$
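A hedged numeric illustration of the displayed limit (my choice of $\lambda = 2$, $k = 3$):

(* Sketch: Bin(n, 2/n) probabilities at k = 3 approach the Poisson(2) value. *)
λ = 2; k = 3;
Table[N[PDF[BinomialDistribution[n, λ/n], k]], {n, {10, 100, 1000}}]
N[PDF[PoissonDistribution[λ], k]]  (* ≈ 0.1804 *)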
Convergence in mean square

Given a probability space $(\Omega, \mathcal{F}, P)$, let us consider the vector space $H$ whose elements are the random variables $X$ with a finite second order moment, $E(X^2) < \infty$.

We define an inner product in $H$ by

$$\langle X, Y \rangle = E(XY)$$

Convergence in mean square

- The norm induced by this inner product is:

$$\|X\| = \sqrt{\langle X, X \rangle} = \sqrt{E(X^2)}$$

- Moreover, a distance between random variables of $H$ can be considered:

$$d(X, Y) = \|X - Y\| = \sqrt{E\left((X - Y)^2\right)}$$

Convergence in mean square

Definition
The sequence $X_1, X_2, \ldots, X_n, \ldots$ converges in mean square to the random variable $X$ as $n \to \infty$ if

$$d(X_n, X) \to 0 \quad \text{as } n \to \infty$$

Equivalently,

$$E\left((X_n - X)^2\right) \to 0 \quad \text{as } n \to \infty$$

Notation: $X_n \xrightarrow{2} X$
Example

The sequence of sample means $\bar{X}_n$ converges in mean square to the expected value $m$:

$$E\left(\left(\bar{X}_n - m\right)^2\right) = \operatorname{Var}\left(\bar{X}_n\right) = \frac{\sigma^2}{n} \to 0$$

Convergence in $r$-mean

More generally,

Definition
The sequence $X_1, X_2, \ldots, X_n, \ldots$ converges in $r$-mean to the random variable $X$ as $n \to \infty$ if

$$E\left(|X_n - X|^r\right) \to 0 \quad \text{as } n \to \infty$$

Notation: $X_n \xrightarrow{r} X$
44 / 65
Example
Example
Proof: We have to prove that
Consider the sequence X1 , X2 , . . . , Xn , . . . such that
8
>
< P(Xn = 0) = 1 1
n
n = 1, 2, 3, . . .
1
>
: P(Xn = 1) = P(Xn = 1) =
2n
E (|Xn |r ) ! 0
as n ! 1
Indeed,
E(|Xn |r ) = 0 · P(Xn = 0) + 1 · (P(Xn =
Let us prove that, for any r > 0,
r
Xn ! 0
=
1
1
+
2n 2n
1) + P(Xn = 1))
!0
45 / 65
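A quick hedged check of this computation (my addition, with e a helper name for the expectation of $|X_n|^r$ under the three-point law above), evaluated at $r = 2$:

(* Sketch: E|X_n|^r equals 1/n for any r > 0; here r = 2. *)
e[n_, r_] := 0^r (1 - 1/n) + Abs[-1]^r/(2 n) + 1^r/(2 n)
Table[e[n, 2], {n, {1, 10, 100}}]  (* {1, 1/10, 1/100} *)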
Almost sure convergence

Definition
The sequence $X_1, X_2, \ldots, X_n, \ldots$ converges almost surely to the random variable $X$ as $n \to \infty$ if

$$P\left(\{\omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\}\right) = 1$$

Notation: $X_n \xrightarrow{a.s.} X$

We also say that $X_n$ converges to $X$ with probability 1.

Uniqueness

Theorem
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables. If the sequence converges

- almost surely,
- in probability,
- in $r$-mean,
- or in distribution,

then the limiting random variable (distribution) is unique.
Uniqueness

For example, let us prove the uniqueness of the limit in the case of almost sure convergence.

Suppose that $X_n \xrightarrow{a.s.} X$ and $X_n \xrightarrow{a.s.} Y$ and let

$$N_X = \{\omega : X_n(\omega) \not\to X(\omega) \text{ as } n \to \infty\}$$

$$N_Y = \{\omega : X_n(\omega) \not\to Y(\omega) \text{ as } n \to \infty\}$$

So, we have that

$$P(N_X) = P(N_Y) = 0$$

Uniqueness

Let $\omega \in \overline{N_X} \cap \overline{N_Y} = \overline{N_X \cup N_Y}$. Then

$$|X(\omega) - Y(\omega)| \leq |X(\omega) - X_n(\omega)| + |X_n(\omega) - Y(\omega)| \to 0$$

So, if $\omega \notin N_X \cup N_Y$, then $X(\omega) = Y(\omega)$; hence, if $X(\omega) \neq Y(\omega)$, then $\omega \in N_X \cup N_Y$. Thus

$$P(X \neq Y) \leq P(N_X \cup N_Y) \leq P(N_X) + P(N_Y) = 0$$

That is,

$$X = Y \quad \text{with probability 1}$$
Relations between the convergence concepts

Theorem
The following implications hold:

- $\left(X_n \xrightarrow{a.s.} X\right) \Longrightarrow \left(X_n \xrightarrow{P} X\right)$
- $\left(X_n \xrightarrow{r} X\right) \Longrightarrow \left(X_n \xrightarrow{P} X\right)$ for any $r \geq 1$.
- $\left(X_n \xrightarrow{P} X\right) \Longrightarrow \left(X_n \xrightarrow{d} X\right)$
- If $r > s \geq 1$, then $\left(X_n \xrightarrow{r} X\right) \Longrightarrow \left(X_n \xrightarrow{s} X\right)$

All implications are strict.

Relations between the convergence concepts

With additional hypotheses some converse implications hold.

Theorem

- If $c$ is a constant, then $\left(X_n \xrightarrow{d} c\right) \Longrightarrow \left(X_n \xrightarrow{P} c\right)$
- If $X_n \xrightarrow{P} X$ and there exists a constant $C$ such that $P(|X_n| \leq C) = 1$ for all $n$, then $X_n \xrightarrow{r} X$ for all $r \geq 1$.
- If $P_n(\epsilon) = P(|X_n - X| > \epsilon)$ satisfies $\sum_n P_n(\epsilon) < \infty$ for all $\epsilon > 0$, then $X_n \xrightarrow{a.s.} X$
Relations between the convergence concepts

For instance, let us consider the proof of the implication

$$\left(X_n \xrightarrow{d} c\right) \Longrightarrow \left(X_n \xrightarrow{P} c\right)$$

Hence, assume that $X_n \xrightarrow{d} X$ where $X = c$ is a constant r.v. with distribution function

$$F_X(x) = \begin{cases} 0, & x < c \\ 1, & x \geq c \end{cases}$$

Relations between the convergence concepts

If $\epsilon > 0$ is a fixed number, we have that

$$P(|X_n - c| > \epsilon) = 1 - P(c - \epsilon \leq X_n \leq c + \epsilon)$$

$$= 1 - \left(F_{X_n}(c + \epsilon) - F_{X_n}(c - \epsilon) + P(X_n = c - \epsilon)\right)$$

$$\leq 1 - F_{X_n}(c + \epsilon) + F_{X_n}(c - \epsilon) \to 0$$

because

$$F_{X_n}(c + \epsilon) \to F_X(c + \epsilon) = 1, \qquad F_{X_n}(c - \epsilon) \to F_X(c - \epsilon) = 0$$

Therefore

$$X_n \xrightarrow{P} c$$
Example

Consider the sequence $X_1, X_2, \ldots, X_n, \ldots$ such that

$$P(X_1 = 1) = 1, \qquad P(X_n = 1) = 1 - \frac{1}{n^2}, \quad P(X_n = n) = \frac{1}{n^2}, \qquad n \geq 2$$

Let us prove that the sequence converges almost surely to 1.

Example

We have that

$$P_n(\epsilon) = P(|X_n - 1| > \epsilon) = \begin{cases} 0, & n = 1 \\ 1/n^2, & n \geq 2 \end{cases}$$

Therefore

$$\sum_{n=1}^{\infty} P_n(\epsilon) = \sum_{n=2}^{\infty} \frac{1}{n^2} < \infty$$

Hence,

$$X_n \xrightarrow{a.s.} 1$$
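A simulation sketch (my addition) of one path of this sequence: the indices with $X_n = n$ are typically few and small, matching $\sum 1/n^2 < \infty$.

(* Sketch: indices n ≤ 10000 where X_n = n occurred in one sampled path. *)
SeedRandom[4];
Select[Range[2, 10000], RandomReal[] < 1/#^2 &]  (* usually a very short list *)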
Strong laws of large numbers

Theorem
Let $X_1, X_2, \ldots, X_k, \ldots$ be a sequence of independent and identically distributed random variables with finite expectation $m$ and finite variance. Set

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}, \qquad n \geq 1$$

Then,

$$\bar{X}_n \xrightarrow{a.s.} m \quad \text{as } n \to \infty$$

Strong laws of large numbers

Theorem
Let $X_1, X_2, \ldots, X_k, \ldots$ be a sequence of independent and identically distributed random variables and set

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}, \qquad n \geq 1$$

Then

$$\bar{X}_n \xrightarrow{a.s.} \mu \quad \text{as } n \to \infty$$

for some constant $\mu$, if and only if $E(|X_1|) < \infty$. In this case, $\mu = E(X_1)$.
Borel-Cantelli lemmas

Given a sequence of events $A_1, A_2, \ldots, A_n, \ldots$, let

$$B_n = \bigcup_{k=n}^{\infty} A_k, \qquad n \geq 1$$

Notice that

$$B_1 \supseteq B_2 \supseteq \cdots \supseteq B_n \supseteq \cdots$$

is a decreasing sequence of events.

Borel-Cantelli lemmas

The limit of the sequence $B_1, B_2, \ldots, B_n, \ldots$ is

$$A^\star = \lim_{n\to\infty} B_n = \bigcap_{n=1}^{\infty} B_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$$

The event $A^\star$ is called the limit superior of the sequence $A_1, A_2, \ldots, A_n, \ldots$, and it is denoted by $A^\star = \limsup_n A_n$.

- Notice that $\omega \in A^\star$ if and only if $\omega$ belongs to infinitely many of the $A_n$.
- That is, $A^\star$ is the event "infinitely many of the $A_n$ occur".
Borel-Cantelli lemmas

Theorem (Borel-Cantelli lemmas)
Let $A_1, A_2, \ldots, A_n, \ldots$ be a sequence of events and $A^\star$ its limit superior. Then:

- $P(A^\star) = 0$ if $\sum_{n=1}^{\infty} P(A_n) < \infty$,
- $P(A^\star) = 1$ if $\sum_{n=1}^{\infty} P(A_n) = \infty$ and the events $A_1, A_2, \ldots$ are independent.

Borel-Cantelli lemmas

For example, the proof of the first Borel-Cantelli lemma is:

$$P(A^\star) = P\left(\lim_{n\to\infty} B_n\right) = \lim_{n\to\infty} P(B_n) = \lim_{n\to\infty} P\left(\bigcup_{k=n}^{\infty} A_k\right) \leq \lim_{n\to\infty} \sum_{k=n}^{\infty} P(A_k) = 0,$$

because $\sum_n P(A_n)$ converges by hypothesis, so its tails $\sum_{k=n}^{\infty} P(A_k)$ tend to 0.
Zero-one law

Corollary (zero-one law)
Let $A_1, A_2, \ldots, A_n, \ldots$ be a sequence of independent events and let $A^\star$ be its limit superior. Then either $P(A^\star) = 0$ or $P(A^\star) = 1$ according as $\sum_{n=1}^{\infty} P(A_n)$ converges or diverges respectively.

Example

Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that

$$P(X_n = 0) = \frac{1}{n^2}, \qquad P(X_n = 1) = 1 - \frac{1}{n^2}, \qquad n \geq 1$$

Let us consider the events $A_n = \{X_n = 0\}$, $n \geq 1$.

- Since $\sum_n P(A_n) < \infty$, the first Borel-Cantelli lemma implies that $P(A^\star) = 0$. Thus, there is a 0 probability that $\{X_n = 0\}$ happens infinitely often.
- Therefore $P(X_n = 1 \text{ for all } n \text{ sufficiently large}) = 1$.

Thus we have proved:

$$\lim_{n\to\infty} X_n = 1 \quad \text{with probability 1}$$
Example

Now, let $X_1, X_2, \ldots, X_n, \ldots$ be independent random variables such that

$$P(X_n = 0) = \frac{1}{n}, \qquad P(X_n = 1) = 1 - \frac{1}{n}, \qquad n \geq 1$$

and let $A_n = \{X_n = 0\}$, $n \geq 1$.

- Since $\sum_n P(A_n) = \infty$, the second Borel-Cantelli lemma implies that, with probability 1, infinitely many of the events $A_n = \{X_n = 0\}$ occur.
- Analogously, $\sum_n P(\overline{A_n}) = \infty$. Thus, the probability that infinitely many of the events $\overline{A_n} = \{X_n = 1\}$ occur is also 1.
- Hence, with probability 1, $\lim_{n\to\infty} X_n$ does not exist.
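A closing simulation sketch (my addition): sampling one path with $P(X_n = 0) = 1/n$, zeros keep appearing however far out one looks, as the second lemma predicts.

(* Sketch: indices n ≤ 10000 with X_n = 0 in one sampled path; their count
   grows roughly like Log[n], so zeros recur indefinitely. *)
SeedRandom[5];
zeros = Select[Range[10000], RandomReal[] < 1/# &]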