Limits and the Law of Large Numbers

Lecture XIV
• Let $\omega$ represent the entire random sequence $\{Z_t\}$. As discussed last time, our interest typically centers around the averages of this sequence:
$$b_n(\omega) = \frac{1}{n}\sum_{t=1}^{n} Z_t.$$
• Definition 2.9: Let $\{b_n(\omega)\}$ be a sequence of real-valued random variables. We say that $b_n(\omega)$ converges almost surely to $b$, written
$$b_n(\omega) \xrightarrow{a.s.} b,$$
if and only if there exists a real number $b$ such that
$$P[\omega : b_n(\omega) \to b] = 1.$$
• The probability measure $P$ describes the distribution of $\omega$ and determines the joint distribution function for the entire sequence $\{Z_t\}$.
• Other common terminology is that $b_n(\omega)$ converges to $b$ with probability 1 (w.p.1) or that $b_n(\omega)$ is strongly consistent for $b$.
• Example 2.10: Let
$$\bar{Z}_n = \frac{1}{n}\sum_{t=1}^{n} Z_t,$$
where $\{Z_t\}$ is a sequence of independently and identically distributed (i.i.d.) random variables with $E(Z_t) = \mu < \infty$. Then
$$\bar{Z}_n \xrightarrow{a.s.} \mu$$
by the Kolmogorov strong law of large numbers (Theorem 3.1).
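As an illustration (not part of the original example), the following Python sketch simulates one realization of an i.i.d. sequence, here assumed to be exponential draws with mean $\mu = 2$, and tracks the running sample mean $\bar{Z}_n$ along that single path:

```python
import numpy as np

rng = np.random.default_rng(42)

# One realization (one "omega") of an i.i.d. sequence with E(Z_t) = mu = 2.0.
mu = 2.0
z = rng.exponential(scale=mu, size=100_000)

# Running sample means Z-bar_n = (1/n) * sum_{t=1}^n Z_t along this realization.
running_means = np.cumsum(z) / np.arange(1, z.size + 1)

# Almost sure convergence: along (almost) every realization, the path of
# sample means settles down to mu.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: Z-bar_n = {running_means[n - 1]:.4f} (mu = {mu})")
```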
• Proposition 2.11: Given $g: \mathbb{R}^k \to \mathbb{R}^l$ ($k, l < \infty$) and any sequence $\{b_n\}$ such that
$$b_n \xrightarrow{a.s.} b,$$
where $b_n$ and $b$ are $k \times 1$ vectors, if $g$ is continuous at $b$, then
$$g(b_n) \xrightarrow{a.s.} g(b).$$
• Theorem 2.12: Suppose
(i) $y = Xb_0 + e$;
(ii) $X'e/n \xrightarrow{a.s.} 0$;
(iii) $X'X/n \xrightarrow{a.s.} M$, finite and positive definite.
Then $\hat{b}_n$ exists a.s. for all $n$ sufficiently large, and $\hat{b}_n \xrightarrow{a.s.} b_0$.
• Proof: Since $X'X/n \xrightarrow{a.s.} M$, it follows from Proposition 2.11 that $\det(X'X/n) \xrightarrow{a.s.} \det(M)$. Because $M$ is positive definite by (iii), $\det(M) > 0$. It follows that $\det(X'X/n) > 0$ a.s. for all $n$ sufficiently large, so $(X'X/n)^{-1}$ exists a.s. for all $n$ sufficiently large. Hence
$$\hat{b}_n = \left(\frac{X'X}{n}\right)^{-1}\frac{X'y}{n}.$$
In addition,
$$\hat{b}_n = b_0 + \left(\frac{X'X}{n}\right)^{-1}\frac{X'e}{n}.$$
It follows from Proposition 2.11 that
$$\hat{b}_n \xrightarrow{a.s.} b_0 + M^{-1}\cdot 0 = b_0.$$
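A minimal simulation sketch of Theorem 2.12, assuming a two-regressor design with normal errors (the specific distributions are illustrative choices, not part of the theorem), shows $\hat{b}_n$ settling at $b_0$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
b0 = np.array([1.0, -2.0])  # true coefficient vector b_0

for n in (100, 10_000, 1_000_000):
    # Regressors: intercept plus one random regressor, so X'X/n -> M, a
    # finite positive definite matrix; errors have mean zero, so X'e/n -> 0.
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    e = rng.normal(scale=0.5, size=n)
    y = X @ b0 + e

    # OLS estimator b-hat_n = (X'X/n)^{-1} (X'y/n).
    b_hat = np.linalg.solve(X.T @ X / n, X.T @ y / n)
    print(f"n = {n:>9}: b-hat_n = {b_hat.round(4)}")
```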
• A weaker stochastic convergence concept is that of convergence in probability.
• Definition 2.23: Let $\{b_n(\omega)\}$ be a sequence of real-valued random variables. If there exists a real number $b$ such that for every $\epsilon > 0$,
$$P[\omega : |b_n(\omega) - b| < \epsilon] \to 1$$
as $n \to \infty$, then $b_n(\omega)$ converges in probability to $b$.
• The almost sure measure of probability takes into account the joint distribution of the entire sequence $\{Z_t\}$, but with convergence in probability we only need to be concerned with the joint distribution of those elements that appear in $b_n(\omega)$.
• Convergence in probability is also referred to as weak consistency.
• Theorem 2.24: Let $\{b_n(\omega)\}$ be a sequence of random variables. If $b_n \xrightarrow{a.s.} b$, then $b_n \xrightarrow{p} b$. If $b_n$ converges in probability to $b$, then there exists a subsequence $\{b_{n_j}\}$ such that
$$b_{n_j} \xrightarrow{a.s.} b.$$
• Definition 2.37: Let $\{b_n(\omega)\}$ be a sequence of real-valued random variables. If there exists a real number $b$ such that
$$E\left(\left|b_n(\omega) - b\right|^r\right) \to 0$$
as $n \to \infty$ for some $r > 0$, then $b_n(\omega)$ converges in the $r$th mean to $b$, written
$$b_n(\omega) \xrightarrow{r.m.} b.$$
• Proposition 2.38 (Jensen's inequality): Let $g: \mathbb{R}^1 \to \mathbb{R}^1$ be a convex function on an interval $B \subset \mathbb{R}^1$ and let $Z$ be a random variable such that $P[Z \in B] = 1$. Then $g(E(Z)) \le E(g(Z))$. If $g$ is concave on $B$, then $g(E(Z)) \ge E(g(Z))$.
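A quick numerical check of both directions of the inequality, assuming Exp(1) draws with $g(z) = z^2$ (convex) and $g(z) = \sqrt{z}$ (concave):

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.exponential(scale=1.0, size=1_000_000)

# Convex g(z) = z^2: Jensen says g(E(Z)) <= E(g(Z)).
print("g(E(Z)) =", np.mean(z) ** 2)      # approx 1.0
print("E(g(Z)) =", np.mean(z ** 2))      # approx 2.0 for Exp(1)

# Concave g(z) = sqrt(z): the inequality reverses.
print("g(E(Z)) =", np.sqrt(np.mean(z)))  # approx 1.0
print("E(g(Z)) =", np.mean(np.sqrt(z)))  # approx 0.886
```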
• Proposition 2.41 (Generalized Chebyshev Inequality): Let $Z$ be a random variable such that $E|Z|^r < \infty$, $r > 0$. Then for every $\epsilon > 0$,
$$P[|Z| \ge \epsilon] \le \frac{E|Z|^r}{\epsilon^r}.$$
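The inequality can be verified by simulation. The sketch below, assuming standard normal draws, compares the empirical tail probability with the bound for several values of $r$:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=1_000_000)
eps = 2.0

p_tail = np.mean(np.abs(z) >= eps)  # empirical P[|Z| >= eps]
for r in (1.0, 2.0, 4.0):
    bound = np.mean(np.abs(z) ** r) / eps ** r  # E|Z|^r / eps^r
    print(f"r = {r}: P[|Z| >= {eps}] = {p_tail:.4f} <= bound = {bound:.4f}")
```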
• Theorem 2.42: If $b_n(\omega) \xrightarrow{r.m.} b$ for some $r > 0$, then $b_n(\omega) \xrightarrow{p} b$.
• Proposition 3.0: Given restrictions on the dependence, heterogeneity, and moments of a sequence of random variables $\{Z_t\}$,
$$\bar{Z}_n - \bar{\mu}_n \xrightarrow{a.s.} 0,$$
where
$$\bar{Z}_n = \frac{1}{n}\sum_{t=1}^{n} Z_t \quad \text{and} \quad \bar{\mu}_n = E(\bar{Z}_n).$$
• Theorem 3.1 (Kolmogorov): Let $\{Z_t\}$ be a sequence of i.i.d. random variables. Then
$$\bar{Z}_n \xrightarrow{a.s.} \mu$$
if and only if $E|Z_t| < \infty$ and $E(Z_t) = \mu$.
• This result is consistent with Theorem 6.2.1 (Khinchine): Let $\{X_i\}$ be independent and identically distributed (i.i.d.) with $E[X_i] = \mu$. Then
$$\bar{X}_n \xrightarrow{p} \mu.$$
• Proposition 3.4 (Hölder's Inequality): If $p > 1$ and $1/p + 1/q = 1$, and if $E|Y|^p < \infty$ and $E|Z|^q < \infty$, then $E|YZ| \le [E|Y|^p]^{1/p}[E|Z|^q]^{1/q}$.
• If $p = q = 2$, we have the Cauchy-Schwarz inequality
$$E|YZ| \le [E(Y^2)]^{1/2}[E(Z^2)]^{1/2}.$$
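A simulation check of both inequalities, with an assumed pair of correlated normal variables:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
y = rng.normal(size=n)
z = 0.6 * y + rng.normal(size=n)   # Z correlated with Y

# Hoelder with p = 3, q = 3/2 (so that 1/p + 1/q = 1).
p, q = 3.0, 1.5
lhs = np.mean(np.abs(y * z))
rhs = np.mean(np.abs(y) ** p) ** (1 / p) * np.mean(np.abs(z) ** q) ** (1 / q)
print(f"Hoelder:        E|YZ| = {lhs:.4f} <= {rhs:.4f}")

# Cauchy-Schwarz is the p = q = 2 special case.
cs = np.sqrt(np.mean(y ** 2) * np.mean(z ** 2))
print(f"Cauchy-Schwarz: E|YZ| = {lhs:.4f} <= {cs:.4f}")
```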
• Under the traditional assumptions of the linear model (fixed regressors and normally distributed error terms), $\hat{b}_n$ is distributed multivariate normal with
$$E(\hat{b}_n) = b_0, \qquad V(\hat{b}_n) = \sigma_0^2 (X'X)^{-1}$$
for any sample size $n$.
• However, when the sample size becomes large, the distribution of $\hat{b}_n$ is approximately normal under some general conditions.
• Definition 4.1: Let $\{b_n\}$ be a sequence of random finite-dimensional vectors with joint distribution functions $\{F_n\}$. If $F_n(z) \to F(z)$ as $n \to \infty$ for every continuity point $z$, where $F$ is the distribution function of a random variable $Z$, then $b_n$ converges in distribution to the random variable $Z$, denoted
$$b_n \xrightarrow{d} Z.$$
• Other ways of stating this concept are that $b_n$ converges in law to $Z$,
$$b_n \xrightarrow{L} Z,$$
or that $b_n$ is asymptotically distributed as $F$,
$$b_n \stackrel{A}{\sim} F.$$
In this case, $F$ is called the limiting distribution of $b_n$.
• Example 4.3: Let $\{Z_t\}$ be an i.i.d. sequence of random variables with mean $\mu$ and variance $\sigma^2 < \infty$. Define
$$b_n = \frac{\bar{Z}_n - E(\bar{Z}_n)}{\left[V(\bar{Z}_n)\right]^{1/2}} = \frac{1}{\sigma n^{1/2}}\sum_{t=1}^{n}(Z_t - \mu).$$
Then by the Lindeberg-Levy central limit theorem (Theorem 6.2.2),
$$b_n \stackrel{A}{\sim} N(0,1).$$
• Theorem 6.2.2 (Lindeberg-Levy): Let $\{X_i\}$ be i.i.d. with $E[X_i] = \mu$ and $V(X_i) = \sigma^2$. Then the standardized mean $Z_n = n^{1/2}(\bar{X}_n - \mu)/\sigma \xrightarrow{d} N(0,1)$.
• Definition 4.8: Let $Z$ be a $k \times 1$ random vector with distribution function $F$. The characteristic function of $Z$ is defined as
$$f(\lambda) = E[\exp(i\lambda' Z)],$$
where $i^2 = -1$ and $\lambda$ is a $k \times 1$ real vector.
• Example 4.10: Let $Z \sim N(\mu, \sigma^2)$. Then
$$f(\lambda) = \exp\left(i\lambda\mu - \frac{\lambda^2\sigma^2}{2}\right).$$
• This proof follows from the derivation of the moment generating function in Lecture VII.
• Specifically, note the similarity between the definition of the moment generating function and the characteristic function:
$$M_X(t) = E[\exp(tX)], \qquad f(\lambda) = E[\exp(i\lambda Z)].$$
• Theorem 4.11 (Uniqueness Theorem): Two distribution functions are identical if and only if their characteristic functions are identical.
• Note that we have a similar theorem for moment generating functions.
• Proof of Lindeberg-Levy: First define $f(\lambda)$ as the characteristic function of $Z_t - \mu$ and let $f_n(\lambda)$ be the characteristic function of
$$n^{1/2}\,\frac{\bar{Z}_n - \mu}{\sigma} = \frac{1}{\sigma n^{1/2}}\sum_{t=1}^{n}(Z_t - \mu).$$
By the structure of the characteristic function we have
$$f_n(\lambda) = \left[f\left(\frac{\lambda}{\sigma\sqrt{n}}\right)\right]^n,$$
$$\ln f_n(\lambda) = n \ln f\left(\frac{\lambda}{\sigma\sqrt{n}}\right).$$
Taking a second-order Taylor series expansion of $f(\lambda)$ around $\lambda = 0$ gives
$$f(\lambda) = 1 - \frac{\sigma^2\lambda^2}{2} + o(\lambda^2).$$
Thus,
$$\ln f_n(\lambda) = n \ln\left(1 - \frac{\lambda^2}{2n} + o\left(\frac{\lambda^2}{n}\right)\right) \to -\frac{\lambda^2}{2} \text{ as } n \to \infty.$$
• Thus $f_n(\lambda) \to \exp(-\lambda^2/2)$, the characteristic function of the standard normal, so by the Uniqueness Theorem the distribution of the standardized sample mean approaches the standard normal.
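The limit in the last display can be checked numerically. The sketch below assumes $Z_t - \mu$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (mean 0, variance 1), whose characteristic function has the closed form $\sin(\sqrt{3}\lambda)/(\sqrt{3}\lambda)$, and evaluates $n \ln f(\lambda/\sqrt{n})$:

```python
import numpy as np

def f(lam):
    """Characteristic function of Uniform[-sqrt(3), sqrt(3)] (mean 0, variance 1)."""
    x = np.sqrt(3.0) * lam
    return np.sin(x) / x

lam = 1.5
for n in (10, 100, 10_000, 1_000_000):
    # ln f_n(lambda) = n ln f(lambda / sqrt(n)), with sigma = 1 here.
    print(f"n = {n:>9}: n ln f(lam/sqrt(n)) = {n * np.log(f(lam / np.sqrt(n))):.6f}")

print("limit -lam^2/2 =", -lam**2 / 2)  # -1.125
```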