Limits and the Law of Large Numbers
Lecture XIII
Almost Sure Convergence
• Let ω represent the entire random sequence {Z_t}. As discussed last time, our interest typically centers on the averages of this sequence:
$$b_n = \frac{1}{n} \sum_{t=1}^{n} Z_t$$
• Definition 2.9: Let {b_n(ω)} be a sequence of real-valued random variables. We say that b_n(ω) converges almost surely to b, written
$$b_n(\omega) \xrightarrow{a.s.} b,$$
if and only if there exists a real number b such that
$$P[\omega : b_n(\omega) \to b] = 1.$$
• The probability measure P describes the distribution of ω and determines the joint distribution function for the entire sequence {Z_t}.
• Other common terminology is that b_n(ω) converges to b with probability 1 (w.p.1) or that b_n(ω) is strongly consistent for b.
• Example 2.10: Let
$$\bar{Z}_n = \frac{1}{n} \sum_{t=1}^{n} Z_t,$$
where {Z_t} is a sequence of independently and identically distributed (i.i.d.) random variables with E(Z_t) = μ < ∞. Then
$$\bar{Z}_n \xrightarrow{a.s.} \mu$$
by the Kolmogorov strong law of large numbers (Theorem 3.1).
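To see the strong law at work numerically, here is a minimal simulation sketch (plain NumPy; the exponential distribution, seed, and sample sizes are arbitrary illustration choices, not from the lecture): the running average of i.i.d. draws settles at μ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0                                      # true mean E(Z_t)
z = rng.exponential(scale=mu, size=100_000)   # i.i.d. draws with E(Z_t) = mu

# Running sample mean Z_bar_n = (1/n) * sum_{t=1}^{n} Z_t
running_mean = np.cumsum(z) / np.arange(1, z.size + 1)

for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: Z_bar_n = {running_mean[n - 1]:.4f} (mu = {mu})")
```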
• Proposition 2.11: Given g: R^k → R^l (k, l < ∞) and any sequence {b_n} such that
$$b_n \xrightarrow{a.s.} b,$$
where b_n and b are k × 1 vectors, if g is continuous at b, then
$$g(b_n) \xrightarrow{a.s.} g(b).$$
• Theorem 2.12: Suppose
– (i) y = Xb_0 + e;
– (ii) $X'e/n \xrightarrow{a.s.} 0$;
– (iii) $X'X/n \xrightarrow{a.s.} M$, finite and positive definite.
• Then $\hat{b}_n$ exists a.s. for all n sufficiently large, and $\hat{b}_n \xrightarrow{a.s.} b_0$.
• Proof: Since $X'X/n \xrightarrow{a.s.} M$, it follows from Proposition 2.11 that $\det(X'X/n) \xrightarrow{a.s.} \det(M)$. Because M is positive definite by (iii), det(M) > 0. It follows that det(X'X/n) > 0 a.s. for all n sufficiently large, so (X'X/n)^{-1} exists a.s. for all n sufficiently large. Hence
$$\hat{b}_n = \left( \frac{X'X}{n} \right)^{-1} \frac{X'y}{n}.$$
• In addition,
$$\hat{b}_n = b_0 + \left( \frac{X'X}{n} \right)^{-1} \frac{X'e}{n}.$$
• It follows from Proposition 2.11 that
$$\hat{b}_n \xrightarrow{a.s.} b_0 + M^{-1} \cdot 0 = b_0.$$
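A small simulation illustrates Theorem 2.12. This sketch uses a hypothetical design (Gaussian regressor and errors, chosen only so that the conditions on X'e/n and X'X/n plausibly hold); the OLS estimate drifts toward b_0 as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
b0 = np.array([1.0, -0.5])                                 # true coefficients b_0

for n in (100, 10_000, 1_000_000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # X'X/n -> M, p.d.
    e = rng.normal(size=n)                                 # X'e/n -> 0 a.s.
    y = X @ b0 + e
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)              # (X'X)^{-1} X'y
    print(f"n = {n:>9}: b_hat = {np.round(b_hat, 4)}")
```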
Convergence in Probability
• A weaker stochastic convergence concept is that of convergence in probability.
• Definition 2.23: Let {b_n(ω)} be a sequence of real-valued random variables. If there exists a real number b such that for every ε > 0,
$$P[\omega : |b_n(\omega) - b| < \varepsilon] \to 1$$
as n → ∞, then b_n(ω) converges in probability to b.
• Almost sure convergence takes into account the joint distribution of the entire sequence {Z_t}; with convergence in probability, we only need to be concerned with the joint distribution of those elements that appear in b_n(ω).
• Convergence in probability is also referred to as weak consistency.
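The definition is easy to check by simulation. The sketch below (uniform(0,1) draws and an arbitrary ε, purely illustrative) estimates P(|Z̄_n − μ| ≥ ε) by Monte Carlo and shows it shrinking toward zero, as Definition 2.23 requires.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, eps, reps = 0.5, 0.05, 1_000     # uniform(0,1) mean and arbitrary tolerance

# Monte Carlo estimate of P(|Z_bar_n - mu| >= eps) for growing n
for n in (10, 100, 1_000, 10_000):
    zbar = rng.uniform(size=(reps, n)).mean(axis=1)
    p = (np.abs(zbar - mu) >= eps).mean()
    print(f"n = {n:>6}: P(|Z_bar_n - mu| >= {eps}) ~ {p:.3f}")
```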
• Theorem 2.24: Let {b_n(ω)} be a sequence of random variables. If
$$b_n \xrightarrow{a.s.} b, \text{ then } b_n \xrightarrow{p} b.$$
If b_n converges in probability to b, then there exists a subsequence {b_{n_j}} such that
$$b_{n_j} \xrightarrow{a.s.} b.$$
Convergence in the rth Mean
• Definition 2.37: Let {b_n(ω)} be a sequence of real-valued random variables. If there exists a real number b such that
$$E|b_n(\omega) - b|^r \to 0$$
as n → ∞ for some r > 0, then b_n(ω) converges in the rth mean to b, written
$$b_n(\omega) \xrightarrow{r.m.} b.$$
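As a quick illustration of Definition 2.37 with r = 2 (convergence in quadratic mean), this sketch estimates E|Z̄_n − μ|² by Monte Carlo; the uniform distribution and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(9)
mu, r, reps = 0.5, 2.0, 2_000        # uniform(0,1) mean; r = 2: quadratic mean

# E|Z_bar_n - mu|^r should shrink toward 0 as n grows
for n in (10, 100, 1_000):
    zbar = rng.uniform(size=(reps, n)).mean(axis=1)
    moment = (np.abs(zbar - mu) ** r).mean()
    print(f"n = {n:>5}: E|Z_bar_n - mu|^{r:g} ~ {moment:.6f}")
```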
• Proposition 2.38 (Jensen's inequality): Let g: R^1 → R^1 be a convex function on an interval B ⊂ R^1 and let Z be a random variable such that P[Z ∈ B] = 1. Then g(E(Z)) ≤ E(g(Z)). If g is concave on B, then g(E(Z)) ≥ E(g(Z)).
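A one-line numeric check of Jensen's inequality, taking the convex choice g = exp (the normal distribution here is arbitrary; any integrable Z works):

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=100_000)

# Convex g = exp: g(E[Z]) should not exceed E[g(Z)]
print(np.exp(z.mean()), "<=", np.exp(z).mean())
```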
• Proposition 2.41 (Generalized Chebyshev Inequality): Let Z be a random variable such that E|Z|^r < ∞, r > 0. Then for every ε > 0,
$$P[|Z| \geq \varepsilon] \leq \frac{E|Z|^r}{\varepsilon^r}.$$
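The bound is straightforward to verify by simulation. This sketch (normal draws with arbitrary r and ε) compares the empirical tail probability with E|Z|^r / ε^r:

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.normal(size=1_000_000)
r, eps = 2.0, 1.5                    # arbitrary r > 0 and eps > 0

empirical = (np.abs(z) >= eps).mean()
bound = (np.abs(z) ** r).mean() / eps ** r   # E|Z|^r / eps^r
print(f"P(|Z| >= {eps}) ~ {empirical:.4f} <= bound {bound:.4f}")
```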
• Theorem 2.42: If $b_n(\omega) \xrightarrow{r.m.} b$ for some r > 0, then $b_n(\omega) \xrightarrow{p} b$.
Laws of Large Numbers
• Proposition 3.0: Given restrictions on the dependence, heterogeneity, and moments of a sequence of random variables {Z_t},
$$\bar{Z}_n - \mu_n \xrightarrow{a.s.} 0,$$
where
$$\bar{Z}_n = \frac{1}{n} \sum_{t=1}^{n} Z_t \quad \text{and} \quad \mu_n = E(\bar{Z}_n).$$
Independent and Identically Distributed Observations
• Theorem 3.1 (Kolmogorov): Let {Z_t} be a sequence of i.i.d. random variables. Then
$$\bar{Z}_n \xrightarrow{a.s.} \mu$$
if and only if E|Z_t| < ∞ and E(Z_t) = μ.
• This result is consistent with Theorem 6.2.1 (Khinchine): Let {X_i} be independent and identically distributed (i.i.d.) with E[X_i] = μ. Then
$$\bar{X}_n \xrightarrow{p} \mu.$$
• Proposition 3.4 (Hölder's Inequality): If p > 1 and 1/p + 1/q = 1, and if E|Y|^p < ∞ and E|Z|^q < ∞, then
$$E|YZ| \leq [E|Y|^p]^{1/p} [E|Z|^q]^{1/q}.$$
• If p = q = 2, we have the Cauchy-Schwarz inequality
$$E|YZ| \leq [E(Y^2)]^{1/2} [E(Z^2)]^{1/2}.$$
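A quick Monte Carlo check of the Cauchy-Schwarz case (the distributions chosen for Y and Z are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(size=100_000)
z = rng.exponential(size=100_000)

# Cauchy-Schwarz (Hoelder with p = q = 2): E|YZ| <= sqrt(E[Y^2] E[Z^2])
lhs = np.abs(y * z).mean()
rhs = np.sqrt((y ** 2).mean() * (z ** 2).mean())
print(f"E|YZ| ~ {lhs:.4f} <= {rhs:.4f}")
```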
Asymptotic Normality
• Under the traditional assumptions of the linear model (fixed regressors and normally distributed error terms), $\hat{b}_n$ is distributed multivariate normal with
$$E(\hat{b}_n) = b_0$$
$$V(\hat{b}_n) = \sigma_0^2 (X'X)^{-1}$$
for any sample size n.
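This finite-sample claim can be checked by simulation. The sketch below fixes a hypothetical regressor matrix and normal errors, then compares the Monte Carlo mean and covariance of $\hat{b}_n$ with b_0 and σ_0²(X'X)^{-1}:

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps, sigma0 = 30, 20_000, 1.5
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])  # fixed regressors
b0 = np.array([2.0, -1.0])

# With fixed X and normal errors, b_hat is exactly normal: its mean is b0
# and its covariance is sigma0^2 (X'X)^{-1} at any sample size.
E = rng.normal(0.0, sigma0, size=(reps, n))
Y = X @ b0 + E                                   # (reps, n) responses
B = np.linalg.solve(X.T @ X, X.T @ Y.T).T        # (reps, 2) OLS estimates
print("mean of b_hat:", np.round(B.mean(axis=0), 3))
print("cov of b_hat:\n", np.round(np.cov(B.T), 4))
print("sigma0^2 (X'X)^{-1}:\n", np.round(sigma0 ** 2 * np.linalg.inv(X.T @ X), 4))
```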
• However, when the sample size becomes large, the distribution of $\hat{b}_n$ is approximately normal under some general conditions.
• Definition 4.1: Let {b_n} be a sequence of random finite-dimensional vectors with joint distribution functions {F_n}. If F_n(z) → F(z) as n → ∞ for every continuity point z, where F is the distribution function of a random variable Z, then b_n converges in distribution to the random variable Z, denoted
$$b_n \xrightarrow{d} Z.$$
• Other ways of stating this concept are that b_n converges in law to Z,
$$b_n \xrightarrow{L} Z,$$
or that b_n is asymptotically distributed as F,
$$b_n \stackrel{A}{\sim} F.$$
In this case, F is called the limiting distribution of b_n.
• Example 4.3: Let {Z_t} be an i.i.d. sequence of random variables with mean μ and variance σ² < ∞. Define
$$b_n = \frac{\bar{Z}_n - E(\bar{Z}_n)}{\sqrt{V(\bar{Z}_n)}} = \left( \frac{\sigma^2}{n} \right)^{-1/2} \frac{1}{n} \sum_{t=1}^{n} (Z_t - \mu) = \sigma^{-1} n^{-1/2} \sum_{t=1}^{n} (Z_t - \mu).$$
Then by the Lindeberg-Lévy central limit theorem (Theorem 6.2.2),
$$b_n \stackrel{A}{\sim} N(0,1).$$
• Theorem 6.2.2 (Lindeberg-Lévy): Let {X_i} be i.i.d. with E[X_i] = μ and V(X_i) = σ². Then
$$Z_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0,1).$$
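A simulation sketch of Theorem 6.2.2 (exponential draws; n, the replication count, and the distribution are arbitrary illustration choices): the standardized sample means should behave like N(0,1) draws.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 20_000
mu, sigma = 1.0, 1.0                 # exponential(1): mean 1, sd 1

# Standardized sample means Z_n = sqrt(n)(X_bar_n - mu)/sigma
x = rng.exponential(scale=mu, size=(reps, n))
zn = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma
print("mean ~", round(float(zn.mean()), 3), " var ~", round(float(zn.var()), 3))
print("P(Z_n <= 1.96) ~", round(float((zn <= 1.96).mean()), 3), "(normal: 0.975)")
```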
• Definition 4.8: Let Z be a k × 1 random vector with distribution function F. The characteristic function of Z is defined as
$$f(\lambda) = E[\exp(i\lambda' Z)],$$
where i² = −1 and λ is a k × 1 real vector.
• Example 4.10: Let Z ~ N(μ, σ²). Then
$$f(\lambda) = \exp\left( i\lambda\mu - \frac{\lambda^2 \sigma^2}{2} \right).$$
• The proof follows from the derivation of the moment generating function in Lecture VII.
• Specifically, note the similarity between the definition of the moment generating function and the characteristic function:
$$M_X(t) = E[\exp(tX)]$$
$$f(\lambda) = E[\exp(i\lambda Z)]$$
• Theorem 4.11 (Uniqueness Theorem): Two distribution functions are identical if and only if their characteristic functions are identical.
• Note that we have a similar theorem for moment generating functions.
• Proof of Lindeberg-Lévy:
– First define f(λ) as the characteristic function of the standardized variable (Z_t − μ)/σ, and let f_n(λ) be the characteristic function of
$$\sqrt{n} \left( \frac{\bar{Z}_n - \mu}{\sigma} \right) = \sigma^{-1} n^{-1/2} \sum_{t=1}^{n} (Z_t - \mu).$$
– By the structure of the characteristic function of a sum of independent terms, we have
$$f_n(\lambda) = \left[ f\left( \frac{\lambda}{\sqrt{n}} \right) \right]^n$$
$$\ln f_n(\lambda) = n \ln f\left( \frac{\lambda}{\sqrt{n}} \right).$$
– Taking a second-order Taylor series expansion of f(λ) around λ = 0 gives
$$f(\lambda) = 1 - \frac{\lambda^2}{2} + o(\lambda^2).$$
Thus,
$$\ln f_n(\lambda) = n \ln\left( 1 - \frac{\lambda^2}{2n} + o\left( \frac{\lambda^2}{n} \right) \right) \to -\frac{\lambda^2}{2} \text{ as } n \to \infty.$$
• Thus f_n(λ) → exp(−λ²/2), which is the characteristic function of the standard normal; by the Uniqueness Theorem, the limiting distribution of b_n is N(0,1).
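Numerically, the convergence of f_n(λ) to exp(−λ²/2) shows up even for modest n. A sketch using uniform(0,1) draws (so μ = 1/2, σ² = 1/12; the evaluation point λ and the replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
lam, reps = 1.2, 50_000              # arbitrary evaluation point and replications
mu, sigma = 0.5, np.sqrt(1 / 12)     # uniform(0,1) mean and standard deviation

# f_n(lam) for b_n = sqrt(n)(Z_bar_n - mu)/sigma should approach exp(-lam^2/2)
for n in (2, 10, 200):
    zbar = rng.uniform(size=(reps, n)).mean(axis=1)
    b = np.sqrt(n) * (zbar - mu) / sigma
    fn = np.exp(1j * lam * b).mean()
    print(f"n = {n:>4}: f_n({lam}) ~ {fn:.4f}  target {np.exp(-lam ** 2 / 2):.4f}")
```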