A. Melino
ECO 327 - Lecture Notes
Large Sample Theory
So far we have studied the classical normal linear (CNL) model under whose assumptions we can compute
the sampling distribution of the least squares estimators and invoke exact tests of linear restrictions for any sample size
n.
Our results are based on the following assumptions:
(i)   The model is known to be y = Xβ + e, with β < ∞
(ii)  X is a nonstochastic and finite n × k matrix
(iii) X has full column rank (⇔ rank(X′X) = k)
(iv)  E(e) = 0
(v)   V(e) = σ²In
(vi)  e ~ Nn
Let bn denote the OLS estimator from a sample of size n. We have established the following results:
(a) (Uniqueness) Given (iii), bn is unique
(b) (Unbiasedness) Given (i)-(iv), E(bn) = β
(c) (Normality) Given (i)-(vi), bn ~ Nk(β, σ²(X′X)⁻¹)
(d) (Efficiency) Given (i)-(v), bn is BLUE
    Given (i)-(vi), bn has minimum variance in the class of all (not just linear) unbiased estimators.
In the second part of the course, we shall deal with what happens if assumptions (ii), (v) or (vi) are not
satisfied. Failure of (iii) gives perfect multicollinearity, which has already been discussed. Violation of (iv) leads to
specification error.
In general, failure to satisfy the assumptions above leads to estimators with very complicated sampling
distributions. It turns out, however, that we can often derive useful approximations by considering what happens as
n → ∞.
If assumption (i) is violated, least squares estimators may be inappropriate. The maximum
likelihood principle is a powerful idea that can be applied to a very wide class of models.
2. Convergence of a sequence of random variables to a constant
Let {cn} be a sequence of real numbers. We say that c is the limit of the sequence (as n gets large) if for
every real ε > 0 there exists an integer n*(ε) such that
    sup_{n ≥ n*(ε)} |cn − c| < ε
Graphically
[Figure: the terms cn eventually enter the band (c − ε, c + ε) at n*(ε) and stay inside it for all larger n.]
Examples
(i)   cn = 7 + 1/n                        cn → 7
(ii)  cn = (2n² + 5n − 3)/(n² + n − 1)    cn → 2
(iii) cn = (−1)ⁿ                          cn has no limit
Application to random variables
In statistical applications, we usually have a sample of observations, say ((y1,X1),(y2,X2),...,(yn,Xn)). Suppose
we think of this as just the first n observations from an infinite sample which we call ω. [The set of all possible samples
is called the sample space and is denoted by S.] We are interested in a statistic which is a function of the sample, say
the sample mean or a set of regression coefficients, which we can compute for any sample size n. This gives us a
sequence of numbers which we can denote {bn(ω)}. For any given infinite sample (specific ω) we can ask if bn(ω)
has a limit. However, we are interested in the sampling distribution, i.e. what happens if we draw many infinite
samples. For some samples, bn(ω) might converge whereas for others it won't. We want some way to express the idea
that it should converge for most samples. There are three related notions that are useful.
Definition. Let {bn} be a sequence of random variables. We say that bn converges almost surely to b if for every
real ε > 0 there exists an integer n*(ε) such that
    P(ω: sup_{n ≥ n*(ε)} |bn(ω) − b| < ε) = 1
or, equivalently,
    P(ω: bn(ω) → b) = 1
Rks: We often write bn →a.s. b
Almost sure convergence is closely related to the usual idea of limit. It says that the probability of drawing
a sample where the statistic converges in the usual sense is one. There are many other limiting concepts available for
sequences of r.v.'s. For our purposes, we can use a much simpler concept.
Definition. Let {bn(ω)} be a sequence of random variables. We say that bn converges in probability to b if for
every ε > 0
    P(ω: |bn(ω) − b| < ε) → 1
Rks: We often write bn →p b or plim bn = b.
: We will always use convergence in probability unless specifically stated otherwise.
: Almost sure convergence is a stronger property, i.e.
    bn →a.s. b  ⇒  bn →p b
but the converse is not generally true.
Finding the plim of an estimator
A simple and often convenient way to prove that an estimator converges to the true parameter value is to show
that its bias and variance both go to zero (this is called convergence in mean square!)
The proof is based on Chebychev's inequality. See Econometrics by G. Chow (McGraw-Hill, 1983) for details.
Illustration
Let s² be the OLS estimator of the variance σ². Under the standard assumptions of the CNL model
    E(s²) = σ²  and  V(s²) = 2σ⁴/(n−k).
Since the bias is 0 for any sample size and V(s²) → 0, we conclude that plim s² = σ².
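The mean-square argument can be checked numerically. Below is a minimal simulation sketch (Python/NumPy), with a three-regressor design, β, and σ² = 4 chosen purely for illustration: as n grows, the Monte Carlo mean of s² stays near σ² while its sampling variance collapses, consistent with plim s² = σ².
```python
import numpy as np

rng = np.random.default_rng(0)
k, sigma2, beta = 3, 4.0, np.array([1.0, -2.0, 0.5])   # illustrative values, not from the notes

def s2_ols(n):
    """Draw one CNL sample of size n and return the OLS variance estimator s^2."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e_hat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return e_hat @ e_hat / (n - k)

for n in [25, 100, 400, 1600, 6400]:
    draws = np.array([s2_ols(n) for _ in range(500)])
    # Bias stays near 0 while the sampling variance shrinks like 2*sigma^4/(n-k),
    # so s^2 concentrates around sigma^2 = 4 as n grows.
    print(n, round(draws.mean(), 3), round(draws.var(), 4))
```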
Definition. A theorem which provides sufficient conditions for a sequence of random variables to converge to a
constant in probability (almost surely) is called a weak (strong) law of large numbers.
Rks: The result above is a particularly simple-to-use WLLN. However, it is unnecessarily restrictive, since many
estimators can be shown to be consistent (i.e. to converge in probability to the true parameter value) even though the
variance (or even the mean) is infinite for any sample size n. Another example of a law of large numbers shows some
of the possible variety.
Theorem. (Markov's Law of Large Numbers) Let {xt} be a sequence of independent random variables such that
E|xt|^(1+δ) is finite for some δ > 0 and all t. Then
    x̄n − E(x̄n) →a.s. 0,   where x̄n = (1/n) Σ_{t=1}^n xt
Rks: If {xt} is i.i.d. we can take δ = 0, and this is a necessary and sufficient condition for a.s. convergence
(Kolmogorov's Strong Law of Large Numbers).
: If {xt} is not independent, we can still establish a law of large numbers as long as observations far apart are nearly
independent. In this case, however, we need E|xt|^(1+δ) to be finite for a δ which depends on the degree of dependence.
More dependence requires a larger value for δ and, therefore, a smaller probability of getting an observation that is
far from the mean.
Laws of large numbers are the basic tools for proving convergence. However, often we are told that a sequence
converges, but we are interested in some function of the sequence. The following theorem is one of the most used in
econometrics.
Theorem (Slutsky) Let bn →p b. If g is a function which is continuous at b, then g(bn) →p g(b).
Rks: This theorem also holds for vectors and matrices of functions.
Applications of Slutsky's Theorem. Assume that plim bn, plim b1n, plim b2n, etc., exist. Then
(i)   plim (bn)² = (plim bn)²
(ii)  plim (bn⁻¹) = (plim bn)⁻¹, provided plim bn ≠ 0
(iii) plim (b1n + b2n) = plim b1n + plim b2n
(iv)  plim (b1n·b2n) = plim b1n · plim b2n
(v)   plim (b1n/b2n) = plim b1n / plim b2n, provided plim b2n ≠ 0
Rk: E(b1n/b2n) = E(b1n)·E(1/b2n) if b1n and b2n are independent, but not in general; and E(1/b2n) = 1/E(b2n) only if b2n is
a constant.
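The contrast between plim and expectation in this remark can be seen in a small simulation sketch (Python/NumPy; the choice of b2n as the mean of n Exp(1) draws is purely illustrative). Slutsky's theorem gives plim (1/b2n) = 1/plim b2n = 1, but in finite samples E(1/b2n) = n/(n−1) > 1 = 1/E(b2n).
```python
import numpy as np

rng = np.random.default_rng(1)

def inv_mean(n, reps=20000):
    """Monte Carlo average of 1/b2n, where b2n is the mean of n Exp(1) draws."""
    b2n = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    return (1.0 / b2n).mean()

# plim b2n = 1, so plim (1/b2n) = 1 by Slutsky; yet E(1/b2n) = n/(n-1) exceeds 1/E(b2n) = 1
# for every finite n. The gap disappears only as n grows.
for n in [5, 20, 100, 1000]:
    print(n, round(inv_mean(n), 4))
```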
Also if An and Bn are matrices of random variables, for which the plim of each element exists, then
(vi)  plim (An·Bn) = plim An · plim Bn
(vii) plim (An⁻¹) = (plim An)⁻¹, provided plim An is nonsingular
Illustration (Consistency of the OLS Estimator)
Assume
(i)   y = Xβ + e, with β < ∞
(ii)  E(Xt′et) = 0
(iii) E(X′X/n) → Σ
where Σ is a positive definite symmetric matrix. If the matrices in (ii) and (iii) satisfy a LLN, then bn is consistent.
We can write the OLS estimator as
    bn = β + (X′X/n)⁻¹ (X′e/n)
∴   plim bn = β + plim(X′X/n)⁻¹ · plim(X′e/n),   provided both plims exist
Notice that if we could replace the matrices in the first expression with their expectations, we would get
plim bn = β + Σ⁻¹·0 = β. However, laws of large numbers allow us to do exactly that, i.e. to show that under some weak
assumptions sample averages converge to their expected values (in probability). A typical element of X′X/n is
n⁻¹ Σt XitXjt, which is an average of a sequence of numbers ctij = XitXjt. If ctij satisfies the requirements for a law of
large numbers (for all i,j) then
    plim (X′X/n) = Σ  ⇒  plim (X′X/n)⁻¹ = Σ⁻¹.
Similarly, a typical element of X′e/n is n⁻¹ Σt Xitet, which again is an average of a sequence of numbers which we can
define as dit = Xitet. If dit satisfies conditions for a LLN, then plim X′e/n = 0.
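A minimal simulation sketch of this consistency result (Python/NumPy; the two-regressor design, true β, and t-distributed errors are arbitrary illustrative choices, and the errors need not be normal): as n grows, the OLS estimates pile up ever closer to β.
```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([1.0, 0.5])   # true coefficients (illustrative)

def ols(n):
    """One draw of (y, X) satisfying E(Xt'et) = 0 with a nonsingular plim X'X/n; returns bn."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    e = rng.standard_t(df=5, size=n)        # non-normal errors are fine for consistency
    y = X @ beta + e
    return np.linalg.solve(X.T @ X, X.T @ y)

for n in [50, 500, 5000, 50000]:
    b = np.array([ols(n) for _ in range(200)])
    # Average squared distance of bn from beta shrinks toward zero as n -> infinity.
    print(n, round(np.mean(np.sum((b - beta) ** 2, axis=1)), 5))
```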
3. Convergence in Distribution
Definition. Let {zn(ω)} be a sequence of random variables with distribution functions {Fn}. If Fn(z) → F(z) (as
n → ∞) at every continuity point of the latter, where F is the distribution function of a r.v. z, then we say that zn converges
in distribution to the r.v. z, denoted zn →d z.
Synonyms
. zn →L z                      "converges in law to z"
. zn →D F(z) or zn ~a F(z)     "is asymptotically distributed as F"
Example
Let {zn} be a sequence of i.i.d. r.v. with distribution function F. Then zn →D F. [Note that convergence in
distribution by itself says nothing about convergence of the sequence {zn}].
A Central Limit Theorem
Theorem. (Liapounov) Let {xt} be a sequence of independent r.v. with E(xt) = µt, var(xt) = σt² where σt² ≠ 0,
and E|xt − µt|^(2+δ) is finite for some δ > 0. Define
    x̄n = (1/n) Σ_{t=1}^n xt,   µ̄n = E(x̄n),   σ̄n²/n = var(x̄n)
If σ̄n² > δ′ > 0 for some δ′ and all n sufficiently large, then
    zn ≡ √n (x̄n − µ̄n) / σ̄n →D N(0,1)
Rks: If {xt} is i.i.d., we can set δ = 0 [This is called the Lindeberg-Levy Central Limit Theorem.] Note that we
then have µ̄n = µ and σ̄n² = σ², and the requirements for asymptotic normality reduce to 0 < σ² < ∞.
: If {xt} is a dependent sequence, we need more stringent moment conditions plus a few other regularity
requirements.
: A shorthand notation for the result above is to write x̄n ~ AN(µ̄n, σ̄n²/n)
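A quick simulation sketch of the Lindeberg-Levy case (Python/NumPy; the skewed Exp(1) parent distribution is an arbitrary illustrative choice): the standardized mean zn = √n(x̄n − µ)/σ behaves like a N(0,1) draw once n is moderately large, even though each xt is far from normal.
```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.0, 1.0        # mean and s.d. of the Exp(1) parent distribution

def standardized_means(n, reps=20000):
    """Return reps draws of z_n = sqrt(n) * (xbar_n - mu) / sigma."""
    x = rng.exponential(scale=1.0, size=(reps, n))
    return np.sqrt(n) * (x.mean(axis=1) - mu) / sigma

for n in [2, 10, 50, 500]:
    z = standardized_means(n)
    # P(z_n <= 1.645) should approach the standard normal value 0.95 as n grows.
    print(n, round(np.mean(z <= 1.645), 3))
```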
Some theorems involving convergence in probability and distribution
D(i)  If xn →p x, where x is a constant, and yn →D y, then xn·yn →D x·y
D(ii) If xn − yn →p 0 and yn →D y, then xn →D y
Application
We will use D(i) to establish the asymptotic normality of bn. D(ii) is also extremely useful. For example, we often
will be confronted with the situation where we have an estimator θ̂ that has the property
    θ̂ ~ AN(θ, V/n)
where the avar (variance of the asymptotic distribution) V may depend upon unknown parameters. Suppose we have
a consistent estimator V̂n. Since V^(-1/2) √n (θ̂ − θ) →D N(0,1) and
    V̂n^(-1/2) √n (θ̂ − θ) − V^(-1/2) √n (θ̂ − θ) = (V̂n^(-1/2) V^(1/2) − 1)(V^(-1/2) √n (θ̂ − θ)) →p 0
[the first term on the right-hand side →p 0, and the second term →D N(0,1)], D(ii) gives V̂n^(-1/2) √n (θ̂ − θ) →D N(0,1), or
    θ̂ ~ AN(θ, V̂n/n)
n
Rk: Note that the implication of this result is that replacing the avar by a consistent estimator will often not affect the
asymptotic properties of test statistics!
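The remark can be illustrated in the simplest case, θ̂ = x̄n and V = σ². In the sketch below (Python/NumPy; the chi-square(3) parent distribution is an arbitrary illustrative choice), replacing the unknown σ by the consistent estimator s leaves the coverage of the usual ±1.96 interval essentially unchanged.
```python
import numpy as np

rng = np.random.default_rng(4)
df = 3                              # chi-square(3) parent: mu = 3, sigma^2 = 6 (illustrative)
mu, sigma = 3.0, np.sqrt(6.0)

def coverage(n, reps=20000):
    """Coverage of the nominal 95% interval for mu, using the true sigma and the estimated s."""
    x = rng.chisquare(df, size=(reps, n))
    xbar, s = x.mean(axis=1), x.std(axis=1, ddof=1)
    true_sigma = np.abs(np.sqrt(n) * (xbar - mu) / sigma) <= 1.96
    est_sigma = np.abs(np.sqrt(n) * (xbar - mu) / s) <= 1.96
    return true_sigma.mean(), est_sigma.mean()

# Both coverage rates approach 0.95; the estimated avar costs essentially nothing asymptotically.
for n in [20, 100, 1000]:
    print(n, [round(c, 3) for c in coverage(n)])
```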
Some Asymptotic Results for the k-variable Linear Model
Let's assume that
(i)   y = Xβ + e
(ii)  plim (n⁻¹X′X) = Σ, a finite positive definite matrix
(iii) n^(-1/2) (X′e) →D Nk(0, σ²Σ)
(Σ must be symmetric)
Using a vector generalization of theorem D(i) and Cramer's Theorem we obtain
    √n (bn − β) = (X′X/n)⁻¹ (X′e/√n) →D Nk(0, σ²Σ⁻¹)
or in shorthand bn ~ AN(β, σ²Σ⁻¹/n).
To obtain (ii) we need the elements of X′X/n (with typical element n⁻¹ Σ_{1≤t≤n} XitXjt) to satisfy a law of large numbers;
to obtain (iii) we need the elements of n^(-1/2) X′e (with typical element n^(-1/2) Σ_{1≤t≤n} Xitet) to satisfy a central limit theorem.
Rk: Constants can be viewed as independent random variables. However, we can establish consistency even if X is
random or var(e) is infinite!
: The assumptions we have made rule out time trends (because n⁻¹X′X goes to infinity) and dummy variables for
subsamples (because n⁻¹X′X will tend to a matrix of less than full rank). It's really only the second problem that
interferes with proving consistency.
Some Extensions
We are often in a position where we can establish that a statistic is asymptotically normal, but we are
interested in the asymptotic distribution of some function of the statistic. The following two theorems are extremely
useful.
Theorem. (Rao) If xn →D x and g is a continuous function, then
g(xn) →D g(x)
Application. Suppose that √n (bn − β) →D Nk(0,V) for some nonsingular matrix V; then n (bn − β)′ V⁻¹ (bn − β) →D χ²(k)
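A simulation sketch of this application (Python/NumPy; a two-regressor design chosen for illustration, with V estimated by the usual s²(X′X/n)⁻¹): the quadratic form behaves like a χ²(k) variate, whose mean is k and whose variance is 2k.
```python
import numpy as np

rng = np.random.default_rng(6)
beta, k, n, reps = np.array([1.0, -0.5]), 2, 400, 5000

def wald():
    """One draw of the Wald statistic n (bn - beta)' Vhat^{-1} (bn - beta)."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    bn = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ bn) ** 2) / (n - k)
    V_hat = s2 * np.linalg.inv(X.T @ X / n)      # consistent estimate of the asymptotic variance V
    d = bn - beta
    return n * d @ np.linalg.solve(V_hat, d)

w = np.array([wald() for _ in range(reps)])
# Compare with the chi-square(k) reference: mean k = 2 and variance 2k = 4.
print(round(w.mean(), 2), round(w.var(), 2))
```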
Theorem. (Cramer) Suppose √n (xn − µ) →D N(0,v) and g is a function with continuous first derivative g′ in a
neighborhood of µ. If g′(µ) ≠ 0, then √n (g(xn) − g(µ)) →D N(0, g′(µ)² v)
Cramer's theorem also holds in the vector case
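As a numerical check on Cramer's theorem, take g(x) = x², so that g′(µ) = 2µ. The sketch below (Python/NumPy; Poisson data with µ = 2 is an arbitrary illustrative choice, so v = 2) compares the Monte Carlo variance of √n(g(x̄n) − g(µ)) with the delta-method value g′(µ)²v = (2µ)²v.
```python
import numpy as np

rng = np.random.default_rng(5)
mu, v = 2.0, 2.0                    # Poisson(2) parent: mean mu = 2, variance v = 2 (illustrative)
delta_var = (2 * mu) ** 2 * v       # g(x) = x^2, so g'(mu)^2 * v = (2*mu)^2 * v

def scaled_g(n, reps=20000):
    """Draws of sqrt(n) * (g(xbar_n) - g(mu)) with g(x) = x^2."""
    xbar = rng.poisson(lam=mu, size=(reps, n)).mean(axis=1)
    return np.sqrt(n) * (xbar ** 2 - mu ** 2)

# The Monte Carlo variance approaches the delta-method asymptotic variance as n grows.
for n in [20, 200, 2000]:
    print(n, round(scaled_g(n).var(), 2), "vs delta-method", delta_var)
```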