Working Paper
Department of Economics
TWO-STEP SERIES ESTIMATION OF SAMPLE SELECTION MODELS
Whitney K. Newey
No. 99-04
February, 1999
Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139
TWO STEP SERIES ESTIMATION OF SAMPLE SELECTION MODELS
by
Whitney K. Newey
Department of Economics
MIT, E52-262D
Cambridge, MA 02139
April 1988
Latest Revision, January 1999
Abstract
Sample selection models are important for correcting for the effects of nonrandom
sampling in microeconomic data. This note is about semiparametric estimation using a
series approximation to the selection correction term. Regression spline and power
series approximations are considered. Consistency and asymptotic normality are shown, as
well as consistency of an asymptotic variance estimator.
JEL Classification: C14, C24
Keywords: Sample selection models, semiparametric estimation, series estimation, two-step
estimation.
Presented at the 1988 European meeting of the Econometric Society. Helpful comments
were provided by D.W.K. Andrews, G. Chamberlain, A. Gregory, J. Ham, J. MacKinnon, D.
McFadden, and J. Powell. The NSF and the Sloan Foundation provided financial support.
Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/twostepseriesestOOnewe
1. Introduction

Sample selection models provide an approach to correcting for nonrandom sampling that is important in econometrics. Pioneering work in this area includes Gronau (1973) and Heckman (1974). This paper is about two-step estimation of these models without restricting the functional form of the selection correction. The estimators are particularly simple, using polynomial or spline approximations to correct for selection. Asymptotic normality and consistency of an asymptotic variance estimator are shown.

Some of the estimators considered here are similar to two-step least squares estimators with flexible correction terms previously proposed by Lee (1982) and Heckman and Robb (1987). The theory here allows the functional form of the correction to be entirely unknown, with the number of approximating functions growing with the sample size to achieve $\sqrt{n}$-consistency and asymptotic normality. Also, this paper adds to the menu of approximations by considering new types of power series, along with regression splines that are important in statistical approximation theory (e.g. Stone, 1985).

Early work on semiparametric estimation of sample selection models includes Cosslett (1991) and Gallant and Nychka (1987). These papers do not have asymptotic normality results. Powell (1987) and Ahn and Powell (1993) give distribution theory for density weighted kernel estimators. The series estimators analyzed here have the virtue of being extremely easy to implement. Also, some of the estimators are new, including the regression splines. Practical experience with these estimators is given in Newey, Powell, and Walker (1990).

Section 2 of the paper presents the model and discusses identification. The estimators are described in Section 3, and Section 4 gives the asymptotic theory.
2. The Model and Identification

The selection model considered here is

(2.1)  $y = x'\beta_0 + \varepsilon$,  $y$ only observed if $d = 1$,  $d \in \{0, 1\}$,
       $E[\varepsilon \mid w, d=1] = E[\varepsilon \mid v(w,\alpha_0), d=1]$,  $\mathrm{Prob}(d = 1 \mid w) = \pi(v(w,\alpha_0))$,  $x \subseteq w$.

Here the conditional mean of the disturbance, given selection and $w$, depends only on the index $v = v(w,\alpha_0)$. This restriction is implied by other familiar conditions, such as independence of disturbances and regressors, see Powell (1994). A basic implication of this model is that

(2.2)  $E[y \mid w, d=1] = x'\beta_0 + h_0(v)$,  $h_0(v) = E[\varepsilon \mid w, d=1]$.

The function $h_0(v)$ is a selection correction that is familiar. For example, if $d = 1(v + \xi \ge 0)$, $(\varepsilon, \xi)$ is independent of $w$, $E[\varepsilon \mid \xi]$ is linear in $\xi$, and $\xi$ has a standard normal distribution, then $h_0(v) = \phi(v)/\Phi(v)$ up to a scale factor, where $\Phi(v)$ and $\phi(v)$ are the standard normal CDF and p.d.f. respectively. This term is the correction term considered by Heckman (1976). In this paper we allow $h_0(v)$ to have an unknown functional form.

Equation (2.2) is an additive semiparametric regression like that considered by Robinson (1988), except that the variable $v = v(w,\alpha_0)$ depends on unknown parameters. Making use of this information is important for identification. Ignoring the structure implied by equation (2.1), and regarding $h_0$ as an unknown function of variables in $w$, would mean that any component of $x$ that is included in those variables would not be identified.

The identification condition for this paper is

Assumption 1: $M = E[d(x - E[x \mid v, d=1])(x - E[x \mid v, d=1])']$ is nonsingular, i.e. for any $\lambda \ne 0$ there is no measurable function $f(v)$ such that $x'\lambda = f(v)$ when $d = 1$.

This condition was imposed by Cosslett (1991), and is the selection model version of Robinson's (1988) identification condition for additive semiparametric regression. As shown by Chamberlain (1986), this condition is not necessary for identification, but it is necessary for existence of a (regular) $\sqrt{n}$-consistent estimator. It is important to note that this condition does not allow for a constant term in $x$, because it is not separately identified from $h_0(v)$. More primitive conditions for Assumption 1 are available in some cases. A simple sufficient condition is that $\mathrm{Var}(x)$ is nonsingular and the conditional distribution of $v$ given $x$ has an absolutely continuous component with conditional density that is positive on the entire real line for almost all $x$. An obvious necessary condition is that $v$ not be a linear combination of $x$, requiring that something in $v$ be excluded from $x$. Such an exclusion restriction is implied by many economic models, where $d$ is a choice variable and $v$ includes a price variable for another choice.

Identification of $\beta_0$ from equation (2.2) also requires identification of $\alpha_0$. Here no specific assumptions will be imposed, in order to allow flexibility in the choice of an estimator of $\alpha_0$. Of course, consistency of $\hat\alpha$ will imply identification of $\alpha_0$, but different consistent estimators may correspond to different identifying assumptions. For brevity, a menu of different assumptions is not discussed here.
3. Estimation

The type of estimator we consider is a two-step estimator, where the first step is a semiparametric estimator $\hat\alpha$ of the selection parameters $\alpha_0$ and the second step is least squares regression on $x$ and approximating functions of $\hat v = v(w,\hat\alpha)$ in the selected data. These estimators are analogous to Heckman's (1976) two-step procedure for the Gaussian disturbances case. The difference is that $\alpha$ is estimated by a distribution-free method rather than by probit and a nonparametric approximation to $h(v)$ is used in the second step regression rather than the inverse Mills ratio.
There are many distribution free estimates that are available for the first step, including those of Manski (1975), Cosslett (1983), and Ruud (1986). The first step will need to be $\sqrt{n}$-consistent, like the estimator of Powell, Stock, and Stoker (1989), Ichimura (1993), and Cavanagh and Sherman (1997). Also, the asymptotic variance of $\hat\beta$ will be an increasing function of the asymptotic variance of $\hat\alpha$, so an efficient estimator like that of Klein and Spady (1993) may be useful.

The second step consists of a linear regression of $y$ on $x$ and functions of $\hat v$ that can approximate $h_0(v)$. To describe the estimator let $\tau(v,\eta)$ denote some strictly monotonic transformation of $v$, depending on parameters $\eta$. This transformation is useful for adjusting the location and scale of $v$, as discussed below. Let $p^K(\tau) = (p_{1K}(\tau), \ldots, p_{KK}(\tau))'$ be a vector of functions with the property that for large $K$ a linear combination of $p^K(\tau)$ can approximate an unknown function of $\tau$. Suppose that the data are $z_i = (d_i, w_i, d_i y_i)$, $(i = 1, \ldots, n)$, assumed throughout to be i.i.d. Let $\hat\eta$ denote an estimator of $\eta$, $\hat v_i = v(w_i,\hat\alpha)$, $\hat\tau_i = \tau(\hat v_i,\hat\eta)$, and $\hat p_i = p^K(\hat\tau_i)$, where a $K$ superscript for $\hat p_i$ is suppressed for notational convenience. For $y = (d_1 y_1, \ldots, d_n y_n)'$, $x = [d_1 x_1, \ldots, d_n x_n]'$, $\hat P = [d_1 \hat p_1, \ldots, d_n \hat p_n]'$, and $\hat Q = \hat P(\hat P'\hat P)^{-1}\hat P'$ the estimator is

(3.1)  $\hat\beta = \hat M^{-1} x'(I - \hat Q)y/n$,  $\hat M = x'(I - \hat Q)x/n$,

where the inverses will exist in large samples under conditions discussed below. The estimator $\hat\beta$ is the coefficient of $x_i$ from the regression of $y_i$ on $x_i$ and $\hat p_i$ in the selected data.
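The second step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `solve` and `second_step` are hypothetical, the transformed index values `tau` are taken as given (the first step is not implemented), and a plain power series in `tau` is used for the approximating functions. The coefficient on `x` is obtained from one least squares regression on `x` and the series terms in the selected sample, which by the Frisch-Waugh partialling-out argument coincides with the formula in (3.1).

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def second_step(y, x, tau, d, K):
    """Least squares of y on x and K power series terms in tau, selected sample only.

    y, tau, d are lists of length n; x is a list of regressor rows (lists).
    Returns (beta_hat, gamma_hat). Hypothetical helper, not from the paper.
    """
    rows, ys = [], []
    for yi, xi, ti, di in zip(y, x, tau, d):
        if di == 1:  # only selected observations enter the regression
            rows.append(list(xi) + [ti ** k for k in range(K)])  # tau^0, ..., tau^(K-1)
            ys.append(yi)
    q = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(q)] for a in range(q)]
    xty = [sum(r[a] * yv for r, yv in zip(rows, ys)) for a in range(q)]
    coef = solve(xtx, xty)
    nx = len(x[0])
    return coef[:nx], coef[nx:]


# Noise-free illustration: the correction is a quadratic in tau, so K = 3 fits it exactly.
n = 30
tau = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
x = [[math.sin(7.0 * i)] for i in range(n)]
d = [0 if i % 3 == 0 else 1 for i in range(n)]
y = [2.0 * x[i][0] + 1.0 + 0.5 * tau[i] - tau[i] ** 2 if d[i] else 0.0 for i in range(n)]
beta_hat, gamma_hat = second_step(y, x, tau, d, 3)
```

Note that the series terms include the constant, which is consistent with the requirement that $x$ itself contain no constant term.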
This estimator depends on the choice of approximating functions and transformation. Here we consider two kinds of approximating functions, power series and splines. For power series the approximating functions are given by

(3.2)  $p_{kK}(\tau) = \tau^{k-1}$.

Depending on the transformation $\tau(v,\eta)$, this power series can lead to several different types of sample selection corrections. Three examples are a power series in the index $v$, in the inverse Mills ratio $\phi(\cdot)/\Phi(\cdot)$, or in the normal CDF $\Phi(\cdot)$. When a nonlinear transformation of $v$ is used (e.g. for a power series in $\Phi$), it may be appropriate to undo a location and scale normalization imposed on most semiparametric estimators of $v(w,\alpha)$. To this end let $\hat\eta = (\hat\eta_1, \hat\eta_2)'$ be the coefficients from probit estimation with regressors $(1, \hat v_i)$, where we do not impose normality (but will require that $\hat\eta$ be a $\sqrt{n}$-consistent estimator of some population parameter). Then the transformed observations for the three examples will be

(3.3a)  $\hat\tau_i = \hat v_i$,
(3.3b)  $\hat\tau_i = \phi(\hat\eta_1 + \hat\eta_2 \hat v_i)/\Phi(\hat\eta_1 + \hat\eta_2 \hat v_i)$,
(3.3c)  $\hat\tau_i = \Phi(\hat\eta_1 + \hat\eta_2 \hat v_i)$.

The power series in equation (3.3a) will have as a leading term the index itself. The one from equation (3.3b) will have leading term given by the inverse Mills ratio, so that the first term is the Heckman (1976) correction. The last example will correspond to a power series in the selection probability for Gaussian $\xi$. This one also has approximating functions that preserve a shape property of $h_0(v)$ that holds when $d = 1(v + \xi \ge 0)$ and $(\varepsilon, \xi)$ are independent of $v$, that $h_0(v)$ goes to zero as $v$ gets large.

Replacing power series by corresponding polynomials that are orthogonal with respect to some weight function may help avoid multicollinearity. For example, for $\hat\tau_u = \max_{i \le n, d_i = 1}\{\tau(\hat v_i,\hat\eta)\}$ and $\hat\tau_l = \min_{i \le n, d_i = 1}\{\tau(\hat v_i,\hat\eta)\}$, one could replace $\tau^{k-1}$ by a polynomial of order $k$ that is orthogonal for the uniform weight on $[-1,1]$, evaluated at $\hat\tau_i = [2\tau(\hat v_i,\hat\eta) - \hat\tau_u - \hat\tau_l]/(\hat\tau_u - \hat\tau_l)$. Of course, $\hat\beta$ is not affected by such a replacement, since it is just a nonsingular linear transformation of the power series.

An alternative approximation that is better in several respects than power series is splines, that are piecewise polynomials. Splines are less sensitive to outliers and to singularities in the function being approximated. Also, as discussed below, asymptotic normality holds under weaker conditions for splines than power series. For theoretical convenience attention is limited to splines with evenly spaced knots on $[-1,1]$. For $b_+ = 1(b > 0) \cdot b$, a spline of degree $m$ in $\tau$ with $L$ evenly spaced knots on $[-1,1]$ can be based on

(3.4)  $p_{kK}(\tau) = \tau^{k-1}$,  $1 \le k \le m+1$,
       $p_{kK}(\tau) = \{[\tau + 1 - 2(k-m-1)/(L+1)]_+\}^m$,  $m+2 \le k \le m+1+L = K$.

An alternative, equivalent series that is less subject to multicollinearity problems is B-splines; e.g. see Powell (1981).
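The approximating functions discussed in this section are straightforward to compute. The sketch below is illustrative only: the function names are hypothetical, the probit coefficients `eta1` and `eta2` are taken as given, and the spline basis assumes the truncated power form with knots at $-1 + 2j/(L+1)$, $j = 1, \ldots, L$, which is how equation (3.4) is read here.

```python
import math


def norm_pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def norm_cdf(a):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def transform(v, eta1, eta2, kind):
    """Transformed index for the three power series examples (3.3a)-(3.3c)."""
    a = eta1 + eta2 * v
    if kind == "index":   # (3.3a): the index itself
        return v
    if kind == "mills":   # (3.3b): inverse Mills ratio (unstable for very negative a)
        return norm_pdf(a) / norm_cdf(a)
    if kind == "cdf":     # (3.3c): normal CDF
        return norm_cdf(a)
    raise ValueError("kind must be 'index', 'mills', or 'cdf'")


def rescale(taus):
    """Map values linearly onto [-1, 1] using the sample minimum and maximum."""
    lo, hi = min(taus), max(taus)
    return [(2.0 * t - hi - lo) / (hi - lo) for t in taus]


def power_basis(t, K):
    """Power series terms t^0, ..., t^(K-1)."""
    return [t ** k for k in range(K)]


def spline_basis(t, m, L):
    """Truncated power spline of degree m >= 1 with L evenly spaced knots on [-1, 1].

    Returns K = m + 1 + L terms: the polynomial part followed by one
    truncated power term per interior knot.
    """
    if m < 1:
        raise ValueError("degree m must be at least 1")
    poly = [t ** k for k in range(m + 1)]
    knots = [-1.0 + 2.0 * j / (L + 1) for j in range(1, L + 1)]
    trunc = [max(t - knot, 0.0) ** m for knot in knots]
    return poly + trunc
```

For instance, `spline_basis(t, 3, 3)` gives a cubic spline with knots at -0.5, 0, and 0.5, so each observation contributes seven regressors; in practice `t` would be the rescaled value returned by `rescale`.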
Fixed, evenly spaced knots are restrictive, and are motivated by theoretical convenience. Allowing the knots to be estimated may improve the approximation, but would make computation more difficult and require substantial modification to the theory of Section 4, which relies on linear in parameter approximations.

For inference it is important to have a consistent estimator of the asymptotic variance of $\hat\beta$. This can be formed by treating the approximation as if it were exact and using formulae for parametric two-step estimators such as those of Newey (1984). The estimator will depend on a consistent estimator $\hat V(\hat\alpha)$ of the asymptotic variance of $\sqrt{n}(\hat\alpha - \alpha_0)$. Let $\hat\gamma$ be the estimates from the regression of $d_i y_i$ on $d_i x_i$ and $d_i \hat p_i$, $\hat\varepsilon_i = d_i(y_i - x_i'\hat\beta - \hat p_i'\hat\gamma)$ the corresponding residual, and $\hat h(v) = p^K(\tau(v,\hat\eta))'\hat\gamma$ the estimate of $h(v)$ obtained from this regression. Define $\hat u = (I - \hat Q)x$ to be the matrix of residuals from the regression of $d_i x_i$ on $d_i \hat p_i$, so that $x'(I - \hat Q)x = \hat u'\hat u$, and let

(3.5)  $\hat V(\hat\beta) = \hat M^{-1}\left[\sum_{i=1}^n \hat u_i \hat u_i' \hat\varepsilon_i^2/n + \hat H \hat V(\hat\alpha) \hat H'\right]\hat M^{-1}$,
       $\hat H = \sum_{i=1}^n \hat u_i [\partial \hat h(\hat v_i)/\partial v]\, \partial v(w_i,\hat\alpha)/\partial\alpha'/n$.

This estimator is the sum of two terms, the first of which is the White (1980) specification robust variance estimator for the second step regression and the second a term that accounts for the first-stage estimation of the parameters of the selection equation. It can also be interpreted as the block of a joint variance estimator for $\hat\beta$ and $\hat\gamma$ corresponding to $\hat\beta$, where the joint estimator is formed as in Newey (1984).

This estimator will be consistent for the asymptotic variance of $\sqrt{n}(\hat\beta - \beta_0)$ under the conditions of Section 4. For example, a 95 percent asymptotic confidence interval for $\beta_{0j}$ is $[\hat\beta_j - \hat V(\hat\beta)_{jj}^{1/2}\, 1.96/\sqrt{n},\ \hat\beta_j + \hat V(\hat\beta)_{jj}^{1/2}\, 1.96/\sqrt{n}]$. Note here the normalization by the total sample size $n$ rather than the number of observations in the selected sample.
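The normalization in the confidence interval is easy to get wrong, so a small numerical sketch may help. The function name `conf_interval` and the input values are hypothetical; the only point illustrated is that the diagonal element of the estimated asymptotic variance is scaled by the total sample size `n`, not by the number of selected observations.

```python
import math


def conf_interval(beta_j, v_jj, n, z=1.96):
    """Asymptotic confidence interval for one coefficient.

    beta_j : estimated coefficient
    v_jj   : j-th diagonal element of the estimated asymptotic variance
             of sqrt(n) * (beta_hat - beta_0)
    n      : total sample size (selected and unselected observations)
    z      : normal critical value, 1.96 for a 95 percent interval
    """
    half_width = z * math.sqrt(v_jj) / math.sqrt(n)
    return beta_j - half_width, beta_j + half_width


# Hypothetical inputs, for illustration only.
lower, upper = conf_interval(1.0, 4.0, 400)
```

If the variance estimate were instead computed with averages over the selected observations only, the divisor would have to change accordingly; the form above matches the averages over all $n$ observations used in (3.5).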
4. Asymptotic Normality

Some regularity conditions will be used to show consistency and asymptotic normality. The first condition is about the first stage estimator.

Assumption 2: There exists $\psi(w,d)$ such that for $\psi_i = \psi(w_i,d_i)$, $\sqrt{n}(\hat\alpha - \alpha_0) = \sum_{i=1}^n \psi_i/\sqrt{n} + o_p(1)$, $E[\psi_i] = 0$, and $E[\psi_i\psi_i']$ exists and is nonsingular. Also, for $V(\alpha) = E[\psi_i\psi_i']$, $\hat V(\hat\alpha) \xrightarrow{p} V(\alpha)$.

This condition requires that $\hat\alpha$ be asymptotically equivalent to a sample average that depends only on $w$ and $d$. It is satisfied by many semiparametric estimators of binary choice models, such as that of Klein and Spady (1993).
The next condition imposes some moment conditions on the second stage.
Assumption 3: For some $\delta > 0$, $E[d\|x\|^{2+\delta}] < \infty$, $\mathrm{Var}(x \mid v, d=1)$ is bounded, and for $\varepsilon = d(y - x'\beta_0 - h_0(v))$, $E[\varepsilon^2 \mid v, d=1]$ is bounded.

The bounded conditional variance assumptions are standard in the literature, and will not be very restrictive here because $v$ will also be assumed to be bounded.
To control the bias of the estimator it is essential to impose some smoothness conditions on functions of $v$.

Assumption 4: $h_0(v)$ and $E[x \mid v, d=1]$ are continuously differentiable in $v$, of orders $s$ and $t$ respectively.

We also require that the transformation $\tau$ satisfy some properties.

Assumption 5: There is $\eta_0$ with $\sqrt{n}(\hat\eta - \eta_0) = O_p(1)$, the distribution of $\tau(v(w,\alpha_0),\eta_0)$ has an absolutely continuous component with p.d.f. bounded away from zero on its support, which is compact. Also, the first and second partial derivatives of $\tau(v,\eta)$ and $v(w,\alpha)$ with respect to $v$, $\eta$, and $\alpha$ are bounded for $\eta$ and $\alpha$ in a neighborhood of $\eta_0$ and $\alpha_0$ respectively.

The first condition of this assumption means that the density of $\tau$ is bounded away from zero, which is useful for series estimation, but is restrictive. For example, if $v = x_1 + x_2$, where $x_1$ and $x_2$ are continuously distributed and independent, then the density of $v$, which is a convolution of the densities of $x_1$ and $x_2$, will be everywhere continuous, and hence cannot be bounded away from zero. It would be useful to weaken this condition, but this would be difficult and is beyond the scope of this paper.
The next assumption imposes growth rate conditions for the number of approximating terms.

Assumption 6: $K = K_n$ such that $\sqrt{n}K^{-s-t+1} \to 0$; and a) $p^K(\tau)$ is a power series, $s \ge 5$, and $K^7/n \to 0$; or b) $p^K(\tau)$ is a spline with $m \ge t-1$, $s \ge 3$, and $K^4/n \to 0$.

Here, splines require the minimum smoothness conditions and the least stringent growth rate for the number of terms, with $h_0(v)$ only required to be three times continuously differentiable. It is also of note that this assumption does not require undersmoothing. The presence of $t$ in the rate conditions means that smoothness in $E[x \mid v, d=1]$ can compensate for lack of smoothness in $h_0(v)$, so that the bias of $\hat h(v)$ does not have to go to zero faster than the variance. This absence of an undersmoothing requirement is a feature of series estimators of semiparametric regression models that has been previously noted in Donald and Newey (1994).
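To see what the growth conditions permit, it can help to work through a polynomial rate for $K$. The following is a sketch that assumes the bias condition reads $\sqrt{n}K^{-s-t+1} \to 0$, that $t \ge 1$, and that $K$ grows like $n^{\gamma}$; the exponent $\gamma$ is introduced here only for illustration.

```latex
% Assume K = C n^{\gamma} for constants C > 0 and \gamma > 0 (illustrative).
\begin{align*}
\sqrt{n}\,K^{-(s+t-1)} \to 0
  &\iff \tfrac12 - \gamma (s+t-1) < 0
  \iff \gamma > \frac{1}{2(s+t-1)}, \\
\text{splines: } K^{4}/n \to 0
  &\iff \gamma < \tfrac14, \\
\text{power series: } K^{7}/n \to 0
  &\iff \gamma < \tfrac17 .
\end{align*}
% Splines: an admissible \gamma exists iff s + t - 1 > 2, which holds when
% s >= 3 and t >= 1; e.g. s = 3, t = 1 gives 1/6 < \gamma < 1/4.
% Power series: an admissible \gamma exists iff s + t - 1 > 7/2, which holds
% when s >= 5 and t >= 1; e.g. s = 5, t = 1 gives 1/10 < \gamma < 1/7.
```

Under these assumptions a larger $t$ lowers the admissible lower bound on $\gamma$, which is one way to read the remark that smoothness in $E[x \mid v, d=1]$ can compensate for lack of smoothness in $h_0(v)$.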
Asymptotic normality of the two-step least squares estimator and consistency of the estimator of its asymptotic covariance matrix follow from the previous conditions.

Theorem 1: If Assumptions 1-6 are satisfied and $\Omega$ is nonsingular then for $\Omega = E[\varepsilon_i^2 u_i u_i']$, $u_i = d_i\{x_i - E[x_i \mid v_i, d_i = 1]\}$, $H = E[u_i\{dh_0(v_i)/dv\}\,\partial v(w_i,\alpha_0)/\partial\alpha']$, and $V(\hat\beta) = M^{-1}(\Omega + HV(\alpha)H')M^{-1}$,

$\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, V(\hat\beta))$,  $\hat V(\hat\beta) \xrightarrow{p} V(\hat\beta)$.
This result gives $\sqrt{n}$-consistency and asymptotic normality of the series estimators considered in this paper, that are useful for large sample inference. It would also be useful to have a way of choosing the number of functions in practice. A $K$ that minimizes a goodness of fit criterion for the selection correction, such as cross-validation on the equation of interest, should satisfy the rate conditions of Assumption 6. In Newey, Powell and Walker (1990) such a criterion was used and gave reasonable results. However, the results of Donald and Newey (1994) and Linton (1995) for the partially linear model suggest that it may be optimal for estimation of $\beta$ to undersmooth, meaning $K$ should be larger than the minimum of a goodness of fit criterion. Such results are beyond the scope of this paper, but remain an important topic for future research.
Appendix: Proof of Theorem 1
Throughout the Appendix
we
Also,
different uses.
C
will denote a positive constant that
Y
To begin the proof, note that by
i
from zero, both
=
I
Also, by the density of
x.
—=-*
Y
a,
max.x.
£
boundary points of
will be Vn-consistent for the
and
min.x.
11
and max.|x.-x.| =
1
l
6,
Newey
as in
l
~K
transformation of
p
111
E[d.p
(A.l)
K
^ (K)K
(x.)p
K
->
/vrT
p
=
(x.)']
1/2
K
of
(x)
l
l
HAH = tr(A'A)
(1997) that for
(x)
max.x..
Therefore, by a location
11
there
,
K
S
lld
.
|x
q(K)K~
0,
Now,
(1/vri).
p
1/2
it
it
can
follows from Assumption
a nonsingular linear
is
such that
sup,
I,
and by
bounded away
and scale transformation for power series, which will not change the regression,
|x.|
0.
i
l
be assumed that
in
for a
bounded and vri-consistency of
(1/vrT).
and hence so will
x.,
—=-»
]
then
,
p
i
and
min.x.
the support of
i
X
and conditioning sets
9v(w,oc)/9a
max. |t.-t.
bounded,
can be different
E[Y |X
will use repeatedly the result that if
sequence of positive random variables
St(v,ti)/Sv
1
|
p
si
S+1
->
S
(x)/dx
s
ll
C,
^s
(K),
0,
1
C,
(K) =
CK
for splines,
CK
^ (K) =
Since a nonsingular transformation does not change
K
p
= p
K
.
Then, as
Newey
in
mean value theorem,
max.
IIP.
HP'P/n
(1997),
-P.
II
<
C(K)max.
-
III
p,
=
|x.-x.
1
for power series.
(C
|
will be convenient to just let
it
(K)K
1/2
/vrT)
= O (C,(K)/vrT),
P
-^
so that
pi
,2
.,1/2.
A „..
^ ,>. ,„,2
P'P/nll s HP-PII/n + IIPIMIP-PII/n = ) (C(K) /n + K
CAK)/VR) -?-* 0.
1
P
,
,
0.
,
pi
(
Also, by the
HP'P/n
Hence, by
J the
triangle inequality,
HP'P/n
(A. 2)
It
follows, as in
where
A(A)
-
III
Newey
-^
0.
(1997),
that
A(P'P/n) a C
with probability approaching one,
denotes the smallest eigenvalue of a symmetric matrix
Next, since
x(v,7j
)
is
one-to-one, conditioning on
v
is
A.
equivalent to
conditioning on
so that, for example, h
x,
can be regarded as a function of
(v)
-
Let
n.
By
tr(x'Qx-2x'Qfi+fi'(j)/n.
moment
u =
= d.E[x. |x.,d.=l],
of
for
x.,
,...,jjl
[jjl
A(F"P/n) ^
A = P(P'P)"
and
],
2
So that
II(j-/jII
/n =
and existence of the second
idempotent,
Q
C,
= Qx.
ju
x.
1
1
2
llx'AII
= tr(x'AA'x) <
(l)tr(x'x/n) =
Qx/n) ^
(l)tr(x'
P
It
follows similarly that
=
llx'AII
for
(1)
O
(1).
P
P
A = P(P'P)"
1
Also,
.
llx'
£
(P-P)/nll
P
IIP-PH/n =
llxll
(C(K)/Vn)
P
(A. 3)
llx'
_1
-^
Q = P(P'P)
so that for
P',
1
Qx/n
<
- x'Qx/nll
llx'
(P-P)A'x/nll +
A(P' P-P' P)A' x/nll +
llx'
< llx'(P-P)/nll(IIA'xll + IIA'xll) + llx'AIIII(P'P-P'P)/nllllA'xll
It
x'Qu/n
follows similarly that
2
(A. 4)
ll/j-fill
T =
For
(x
- x'Q]Li/n
In
1111111
,x
)'
D =
and
(d,,...,d
E[u.u. |x.,x .,d.,d
i
J
i
J
i
.]
= E[u.E[u .|u.,x.,x
i
J
J
ijjjijij
E[u.E[u.|x
C.
.]
=
i
i
.]
j
|x.,x .,d.,d.] =
i
J
Also, by
0.
(1).
E[u.u.|T,D] =
Therefore,
0.
i
.,d.,d
J
= tr(u' Qu+n' (I-Q)]j)/n + o
(1)
J
i
J
Assumption
E[u'.u.|T,D] = E[u'.u. |x.,d.] £
3,
1111
ii
Therefore, with probability one,
(A.5)
It
.,d.] |x.,x .,d.,d
i
0.
by independence of the observations,
)',
n
E[u.|T,D] = E[d.(x.-E[x.|x.,d.=l])|x.,d.] =
-^
A(P-P)' x/nl
Therefore,
0.
/n = tr(x'Qx-2x'Qfi+/j'fi)/n + o
1
l
—^>
llx'
E[uu' |T,D] < CI.
follows that
Also, by
E[tr(u'Qu)/n|T,D] < Ctr(Q)/n = CK/n
tr(u'Qu)/n
so that
-h> 0,
-^
0.
Assumption 4 and standard approximation theory results for power series and
splines (e.g. see Newey,
there exists
TI..
1997 for references), and by
such that
E[tr((j' (I-Q)(j)]/n
(I-Q)P =
= E[tr((fi-PIT'
)'
and
(I-Q)(fi-PIT' ))]/n £
K.
K.
is.
K
K
iiisiiK.1
E[tr((/n-PIT')'(M-Pn'))]/n = E[d.{/i.-n v p
is
is
(x.)}'
{/_t.-TT
results with equation (A. 4) gives
p
(x.)}]
I-Q idempotent,
-^
0.
Combining these
2
(A. 6)
ll/J-/j|l
/n
M
This implies that
-
—
u'u/n
In
e = (e, ,...,£
Next, let
E[ee' |W,D] £
that
E[llx' (Q-Q)e/Vnll
2
M.
>
and
Q
Q
|W,D] = tr{x'(Q-Q)E[ee'
follows similarly to equation
It
that
(A. 3)
|
It
follows similarly to eq.
W
are functions of
W,D](Q-Q)x}/n
(Q-Q)Qx/n
x'
-^
and
and
D,
Ctr(x' (Q-Q)(Q-Q)x)/n.
==
x'
(Q-Q)Qx/n
x' (I-Q)e/Vn = x' (I-Q)e/Vn + o
and hence
so that llx'tQ-Qje/Vnll -£-* 0,
follows by the law of large
= [w' ...,w']'.
Then, since
CI.
—
M
M
=>
In
W
and
)',
—
u'u/n
while
0,
>
The triangle inequality then gives
numbers.
(A. 5)
-£-» 0.
(1).
-2-» 0,
follows
It
P
Newey
as in Donald and
x'(I-Q)e/Vn =
(A. 7)
(1994) that
c/Vn
u'
+ o
(1).
P
For both power series and splines
follows as
it
in
Newey
(1997) that there are
and
y
K.
such that for h
tt
sup
(A.8)
sup
Let
h.
= ^
H
i
x
I
^j_
all
K.1
n
= (a', 77')',
and
(x)'y
<
(x)|
M-CtJ-Mj^Cx)
let
£
1
h
K
CK~
CK
S+1
K
h
(x.),
K
expressions without the
Then
,
|dh (x)/dx-dh
Q
x'
= h
.
i
K
(x.),
n.
(x)/dx|
and
v
and
E[u.li
that
.]
0i
E[u.a(v.)] =
h.
=
M Ri =
jxh:.),
(x-jl,)'
= Sh(x(w.,e n ))/5e'
U
x'(I-Q)h /n
for any function
S+1
,
^(xJ,
n
.
,0]
a(v.)
= [H,0].
=
Let
K.
Since
ax(w,e n )/3T)
with finite
It
follows similarly
i
i
—^-»
CK~
(I-QHh-Lj/Vn.
1
= E[u.{dh.(v.)/dv}3v(w.,a )/3a'
l
==
subscript denote corresponding
(I-Q)lWn = x' (I-Q)(h-h)/Vn +
x(w,e) = x(v(w,a),77),
v/n-consistency of
K
K.
l
M ——» M
(x)'ti
j\.n
depends only on
to
K
sup
,
01
mean-square,
= p
(x)
.
= h
.
(J
observations multiplied by selection indicators, e.g.
[dJL,,,...,d jL. ]'.
i
K
= h(x.),
h.
and
(x.),
I
K
= p
(x)-h
|h
= h(x.),
matrices over
(t)
E[u.h
'.
].
Then by a second-order expansion and
0,
8
Op
x'U-QMh-hWn
(A. 9)
HVn(a-0
=
= -[x' (I-Q)ho /n]v^(0-e.) + o
9
+ o
Op
= -E[u.h .Wn(9-eJ + o
(1)
8i
i
(1)
(1).
p
Also, by eq.
and
(A.8)
idempotent,
I-Q
)'
(£-£,.
lii-nY [I-QUh-h v )/Vn =
(K~
-^-> O.
Also,
-H-> 0.
(<T(K)K~
pi
— 0, so that
)
and
2
Also, E[llu'Q<=
ll
that for
—
u'Qe^/Vn
x'(I-Q)h/Vn = (x-£
Combining equations
and
11111111
]
=
E[u.E[e.|w.,d.]i/»'.
=
]
-^
0.
n
=
(1)
V.
i = A'(y-xjS).
(1/Vn),
II
A' (h-fi)
11
<
sup.
|
= O (K
-s+1
"
).
=
(l)ll(h-h)//nll
l
(A. 3),
IIA'x(J3-|3
and
(1/Vn),
)ll
=
(1/Vn).
£
(l)llx(/3-0
IIA' (h-P-y^)ll
K
P
Then by
(K/n).
K.
triangle inequality,
lly-3f„ll
=
((K/n)
K.
+
sup
<
(1)
II
Also,
=
P
S
S
S
K
|d [p
(C (K)[(K/n)
s
(K
(T)'3-
1/2
+K
K
so that
HA'
+ A' e + A' (h-h) +
)
(T)/dT
S
|
S
S
]/dT - d h
< SU
(h-P>., Willi
K
2
ell
).
Then for
S
S
|
|[d p
l)
= o
(1).
p
K
=
s
1
A'(h-P^) and
or
2,
S
(T)/dT ]'(y-r
< C (K)lly-y
s
_S+1
= e'AA'e
K.
P|T|£l
(x)/dx
Willi =
p
p
|d h(T)/dT -d h
|T|£l
+
)
p
S
P|T|£l
= A'x(p-|3 n
y-y
p
SU
E[e'Qe|D,W] ^ CK,
it
l
dx(v.,T})/dv-dx(v.,T))/dv|
Similarly to previous results,
p
(A. 12)
theorem and
-
(De'Qc/n =
=
limit
(1).
p
dh(v.)/dv = [dh(x.)/dx]dx(v.,7i)/dv, and
Similarly to eq.
P
P
+ Hip.)/Vn + o
0.
follows from the Assumption 5 that
(t)'£,
ill
,(u.e.
^i=l
l
K
K.
we obtain
(A. 10),
To show the second conclusion, note that
h(x) = p
)
(Q-Q)e^/Vn =
u'
is.
from the Lindberg-Levy central
first conclusion then follows
E[u. c.dj'.
_S+1
K.K.K.K.K.
p
The
= h-h.,,
is.
x'(I-Q)(e+h)/Vn = u' c/Vn + HVn(a-a_) + o
(A. 11)
K
(K
p
/n|T,D] = e'QE[uu' |T,D]Qe.,/n £ Ce'Qe^/n ^ e'e^/n
(I-Q)(fi-fL)/V3
)'
(A. 9),
(A. 7),
=
The triangle inequality then gives
0.
>
e
-£-> o.
)
K
ix
(A. 4)
1
1
p
u' [l-Q)(h-h v -h+hT .)/Vn
K.
>
(A. 10)
-B->
)
follows similarly to eq.
it
S+1
S+1
p
is.
" "
(v/nK
R
IS.
Also,
-5-1
(I-QHh-fuJ/Vn =
K
ll
+
0(K
K
)l
-S+1
)
the
It
max.
follows that
max.
l
Also,
max.
implying
-?-> 0,
|dh„(T.)/dT-dh_(T.)/dx|
i<n
-^-» 0.
|
since the conditions require
1
1
be at least twice differentiate with bounded derivative,
h (x)
that
dh(x.)/dx-dh,Jx.)/dx
|
i£n
i^n
i
9v(w.,a)/da,
Then, by
J boundedness of
H =
for
n
^ tr(u'u/n)
dh(v.)/dv-dh^(v.)/dv
-^
|
11
^1=1
l
n
2
1/2
liaT(w.,a)/aall /n)
max.
dh^(x.)/dT-dh^(x.)/dT
i^n
^i=l
l
l
1/2
(r.
0.
l
l
,u.[5x(w.,a)/aa' ]dh„(v.)/dv,
Y.
l
IIH-HII
I
|
i
-^
|
0.
l
It
also follows by eq.
H =
that for
(A. 3)
,u.[aT(w.,a^)/aa' ]dhjv.)/dv,
^i=l l
l
n
IIH-HI
Y.
l
H —!—> H
Then since
0.
H —^-> H
by the law of large numbers,
follows by the triangle
inequality.
Now,
let
=
A.
x'.
l
max.
l^n
max.
|h(x.)-h.(x.)|
i
Nx.l
i^n
+
11
max.
.
II
Etle. ||W,D] £ C,
=
(1),
and hence
p
n
Y.
Ilu.ll
2
,
llu.ll
^i=l
|
e.
|/n =
e.
|
i^n
.
n
.
|x'.
>
(/3-/3J
n
1/(2+5),,
P
P
Furthermore, by Assumption
0.
2
|/n W,D] = £."
E[
llu.ll
|
|
c.
|
|
2
llu.ll
Therefore,
(1).
\\Y
,
2
n
llu.ll
|e.|/n)max.
A.
l
i^n
i
^i=l
2(J].
I
1
+ (Y.
|
Also, note that
|
11
11
/n
-^
0.
It
the law of large numbers,
inequality,
2
/n)max.
|a.|
i^n
l
-?h>0.
l
|
2
,lle.u.-e.u.ll
2
,llu.ll
ill
2
n 2
Z
EIY." Ile.u.-e.u.ll /n |W,D] = E[Y. ,c \\ii.-^.\\ /n W,D] <
^1=1
11 11
^i=l l
l
l
^.^Ele^lW.Dlll/l.-iJ.I^/n £ C£."llji.-fi.ll /n
n
n
^i=l
l
V(a)
n
0.
3,
W,D]/n < C£.|\
i
^i=l
^
.
P
mri
(1)0 ,,
(1/vn) —
2
2
2 2
n
n
n
2
2 2
n
2
,
,u.u'.£ /n - £. i u.u .e /nll ^ v
llG.II
|e -e |/n = V.
llu.ll
(e.-A.) -e |/n
^1=1
^i=l i
i
i
^i=l l l i
i
i
^i=l
i
(A.13)
y.
<
I
l
p
l
l
—
|A.|
2
E[£."
so that
..
i
p
l
max.^
Then by the triangle inequality
l
(1/vn) = n
/n)
llx.ll
<
|h.-h.|
max.
Also,
0.
..2+5. .1/(2+5)^
„
,
^i=l
-^
|
i^n
l
l
1/(2+5). „n
^
< n
().
l
max.
(A.12),
eq.
h_(x.)-h.(x.)
|
l^n
l
,S
18-R
By
(/3-/3J + h.-h..
l
£._ u.u'.e./n
follows that
Y.
—
>
e./n
ill
,u.u'.
^1=1
Q.
A
by equation
n
r.
2
,u.u'.c
^i=l
-^-»
l
l
l
/n
- Y.
n
Therefore,
2
ill /n -^
,u.u'.e
^i=l
Etu.u'.c] = Q,
ill
(A. 6).
0.
Then byJ
so by the triangle
The second conclusion then follows by consistency of
and the Slutzky theorem.
/n
References

Ahn, H. and J.L. Powell (1993): "Semiparametric Estimation of Censored Selection Models with a Nonparametric Selection Mechanism," Journal of Econometrics 58, 3-29.

Chamberlain, G. (1986): "Asymptotic Efficiency in Semiparametric Models with Censoring," Journal of Econometrics 32, 189-218.

Cosslett, S.R. (1983): "Distribution-Free Maximum Likelihood Estimator of the Binary Choice Model," Econometrica 51, 765-782.

Cosslett, S.R. (1991): "Distribution-Free Estimator of a Regression Model With Sample Selectivity," in W.A. Barnett, J.L. Powell and G. Tauchen, eds., Nonparametric and Semiparametric Methods in Econometrics and Statistics. Cambridge, Cambridge University Press.

Donald, S.G. and W. Newey (1994): "Series Estimation of Semilinear Models," Journal of Multivariate Analysis 50, 30-40.

Gallant, A.R. and D.W. Nychka (1987): "Semi-nonparametric Maximum Likelihood Estimation," Econometrica 55, 363-390.

Gronau, R. (1973): "The Effects of Children on the Housewife's Value of Time," Journal of Political Economy 81, S168-S199.

Heckman, J.J. (1974): "Shadow Prices, Market Wages, and Labor Supply," Econometrica 42, 679-693.

Heckman, J.J. (1976): "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement 5, 475-492.

Heckman, J.J. and R. Robb (1987): "Alternative Methods for Evaluating the Impact of Interventions," Ch. 4 of Longitudinal Analysis of Labor Market Data, J.J. Heckman and B. Singer eds., Cambridge, UK: Cambridge University Press.

Ichimura, H. (1993): "Estimation of Single Index Models," Journal of Econometrics 58, 71-120.

Klein, R.W. and R.S. Spady (1993): "An Efficient Semiparametric Estimator for Discrete Choice Models," Econometrica 61, 387-421.

Lee, L.F. (1982): "Some Approaches to the Correction of Selectivity Bias," Review of Economic Studies 49, 355-372.

Linton, O. (1995): "Second Order Approximation in a Partially Linear Regression Model," Econometrica 63, 1079-1112.

Manski, C. (1975): "Maximum Score Estimation of the Stochastic Utility Model of Choice," Journal of Econometrics 3, 205-228.

Newey, W.K. (1997): "Convergence Rates and Asymptotic Normality for Series Estimators," Journal of Econometrics 79, 147-168.

Newey, W.K. and J.L. Powell (1993): "Efficiency Bounds for Semiparametric Selection Models," Journal of Econometrics 58, 169-184.

Newey, W.K., J.L. Powell, and J.R. Walker (1990): "Semiparametric Estimation of Selection Models: Some Empirical Results," American Economic Review Papers and Proceedings, May.

Powell, J.L. (1994): "Estimation of Semiparametric Models," in R.F. Engle and D. McFadden, eds., Handbook of Econometrics: Volume 4, New York: North-Holland.

Powell, J.L., J.H. Stock, and T.M. Stoker (1989): "Semiparametric Estimation of Index Coefficients," Econometrica 57, 1403-1430.

Powell, J.L. (1987): "Semiparametric Estimation of Bivariate Limited Dependent Variable Models," manuscript, University of California, Berkeley.

Powell, M.J.D. (1981): Approximation Theory and Methods, Cambridge, UK, Cambridge University Press.

Robinson, P. (1988): "Root-N-Consistent Semiparametric Regression," Econometrica 56, 931-954.

Ruud, P.A. (1986): "Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution," Journal of Econometrics 32, 157-187.

Stone, C.J. (1985): "Additive Regression and Other Nonparametric Models," Annals of Statistics 13, 689-705.

White, H. (1980): "Using Least Squares to Approximate Unknown Regression Functions," International Economic Review 21, 149-170.