Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/convergenceratesOOnewe
Working Paper
Department of Economics
Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139

CONVERGENCE RATES FOR SERIES ESTIMATORS
Whitney K. Newey
No. 93-10
July 1993
CONVERGENCE RATES FOR SERIES ESTIMATORS
Whitney K. Newey
MIT Department of Economics
July, 1993
This paper consists of part of one originally titled "Consistency and Asymptotic
Normality of Nonparametric Projection Estimators." Helpful comments were provided by
Andreas Buja and financial support by the NSF and the Sloan Foundation.
Abstract
Least squares projections are a useful way of describing the relationship between random variables. These include conditional expectations and projections on additive functions. Series estimators, i.e. regressions on a finite dimensional vector where the dimension grows with sample size, provide a convenient way of estimating such projections. This paper gives convergence rates for these estimators. General results are derived, and primitive regularity conditions are given for power series and splines.

Keywords: Nonparametric regression, additive interactive models, random coefficients, polynomials, splines, convergence rates.
1. Introduction
Least squares projections of a random variable $y$ on functions of a random vector $x$ provide a useful way of describing the relationship between $y$ and $x$. The simplest example is linear regression, the least squares projection on the set of linear combinations of $x$, as exemplified in Rao (1973, Chapter 4). An interesting nonparametric example is the conditional expectation, the projection on the set of all functions of $x$ with finite mean square. There are also a variety of projections that fall in between these two polar cases, where the set of functions is larger than all linear combinations but smaller than all functions. One example is an additive regression, the projection on functions that are additive in the different elements of $x$. This case is motivated partly by the difficulty of estimating conditional expectations when $x$ has many components: see Breiman and Stone (1978), Breiman and Friedman (1985), Friedman and Stuetzle (1981), Stone (1985), and Zeldin and Thomas (1977). A generalization that includes some interaction terms is the projection on functions that are additive in some subvectors of $x$. Another example is random linear combinations of functions of $x$, as suggested by Riedel (1992) for growth curve estimation.
One simple way to estimate nonparametric projections is by regression on a finite dimensional subset, with dimension allowed to grow with the sample size, e.g. as in Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), and Andrews (1991), which will be referred to here as series estimation. This type of estimator may not be good at recovering the "fine structure" of the projection relative to other smoothers, e.g. see Buja, Hastie, and Tibshirani (1989), but is computationally simple. Also, projections often show up as nuisance functions in semiparametric estimation, where the fine structure is less important.
This paper derives convergence rates for series estimators of projections. Convergence rates are important because they show how dimension affects the asymptotic accuracy of the estimators (e.g. Stone 1982, 1985). Also, they are useful for the theory of semiparametric estimators that depend on projection estimates (e.g. Newey 1993a). The paper gives mean-square rates for estimation of the projection and uniform convergence rates for estimation of functions and derivatives. Fully primitive regularity conditions are given for power series and regression splines, as well as more general conditions that may apply to other types of series.
Previous work on convergence rates for series estimates includes Agarwal and Studden (1980), Stone (1985, 1990), Cox (1988), and Andrews and Whang (1990). This paper improves on many previous results in the convergence rate or generality of regularity conditions. Uniform convergence rates for functions and their derivatives are given, and some of the results allow for a data-based number of approximating terms, unlike all but Cox (1988). Also, the projection does not have to equal the conditional expectation, as in Stone (1985, 1990) but not the others.
2. Series Estimators
The results of this paper concern estimators of least squares projections that can be described as follows. Let $z$ denote a data observation, and $y$ and $x$ (measurable) functions of $z$, with $x$ having dimension $r$. Let $\mathcal{G}$ denote a mean-squared closed, linear subspace of the set of all functions of $x$ with finite mean-square. The projection of $y$ on $\mathcal{G}$ is

$$g_0(x) = \arg\min_{g \in \mathcal{G}} E[\{y - g(x)\}^2]. \qquad (2.1)$$

An example is the conditional expectation, $g_0(x) = E[y|x]$, where $\mathcal{G}$ is the set of all measurable functions of $x$ with finite mean-square. Two further examples will be used as illustrations, and are of interest in their own right.
Additive-Interactive Projections: When $x$ has more than a few distinct components it is difficult to estimate $E[y|x]$, a feature often referred to as the "curse of dimensionality." This problem motivates projections that are additive in functions of subvectors of $x$, so that the individual components have smaller dimension than $x$. One way to describe these is to let $x_l$ $(l = 1, \ldots, L)$ be distinct subvectors of $x$, and specify the space of functions as

$$\mathcal{G} = \Big\{\sum_{l=1}^{L} g_l(x_l) : E[g_l(x_l)^2] < \infty,\ l = 1, \ldots, L\Big\}. \qquad (2.2)$$

For example, if each $x_l$ is just a component of $x$, the set $\mathcal{G}$ consists of additive functions. The projection on this set generalizes linear regression to allow for nonparametric nonlinearities in individual regressors. The set of equation (2.2) is a further generalization that allows for nonlinear interactive terms. For example, if each $x_l$ is just one or two dimensional, then this set would allow for just pairwise interactions.

Covariate Interactive Projections: As discussed in Riedel (1992), problems in growth curve estimation motivate considering projections that are random linear combinations of functions. To describe these, suppose $x = (w, u)$, where $w = (w_1, \ldots, w_L)'$ is a vector of covariates, and let $\mathcal{H}_l$ $(l = 1, \ldots, L)$ be sets of functions of $u$. Consider the set of functions

$$\mathcal{G} = \Big\{\sum_{l=1}^{L} w_l h_l(u) : h_l \in \mathcal{H}_l\Big\}, \qquad (2.3)$$

where $E[ww'|u]$ is nonsingular with probability one. In a growth curve application $u$ represents time, so that each $h_l(u)$ represents a covariate coefficient that is allowed to vary over time in a general way.

The estimators of $g_0(x)$ considered here are sample projections on a finite dimensional subspace of $\mathcal{G}$, which can be described as follows. Let $p^K(x) = (p_{1K}(x), \ldots, p_{KK}(x))'$ be a vector of functions, each of which is an element of $\mathcal{G}$. Denote the data observations by $y_i$ and $x_i$ $(i = 1, 2, \ldots)$, and for sample size $n$ let $y = (y_1, \ldots, y_n)'$ and $p^K = [p^K(x_1), \ldots, p^K(x_n)]'$. An estimator of $g_0(x)$ is

$$\hat g(x) = p^K(x)'\hat\pi, \qquad \hat\pi = (p^{K\prime} p^K)^{-} p^{K\prime} y, \qquad (2.4)$$

where $(\cdot)^-$ denotes a generalized inverse, and $K$ subscripts for $\hat g(x)$ and $\hat\pi$ have been suppressed for notational convenience. The matrix $p^{K\prime}p^K$ will be asymptotically nonsingular under conditions given below, making the choice of generalized inverse asymptotically irrelevant.
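A minimal sketch of the estimator (2.4) in Python with NumPy, for illustration only. It assumes the Moore-Penrose pseudo-inverse as the generalized inverse and uses raw powers of a scalar $x$ as an illustrative $p^K(x)$; the function names `series_fit` and `series_predict` and the simulated design are hypothetical.

```python
import numpy as np

def series_fit(P, y):
    """pi_hat = (P'P)^- P'y, with the Moore-Penrose inverse as the generalized inverse."""
    return np.linalg.pinv(P.T @ P) @ (P.T @ y)

def series_predict(P_new, pi_hat):
    """g_hat(x) = p^K(x)' pi_hat for each row p^K(x)' of P_new."""
    return P_new @ pi_hat

rng = np.random.default_rng(0)
n, K = 200, 4
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)

P = np.vander(x, K, increasing=True)   # rows p^K(x_i)' = (1, x_i, x_i^2, x_i^3)
pi_hat = series_fit(P, y)
g_hat = series_predict(P, pi_hat)

# When P'P is nonsingular the pseudo-inverse is the ordinary inverse,
# so pi_hat agrees with the usual least squares solution.
pi_ols = np.linalg.solve(P.T @ P, P.T @ y)
```

The fitted values $p^K\hat\pi$ are the projection of $y$ on the column space of $p^K$, so they do not depend on which generalized inverse is used even when $p^{K\prime}p^K$ is singular.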
The idea of sample projection estimators is that they should approximate $g_0(x)$ if $K$ is allowed to grow with the sample size. The two key features of this approximation are that 1) each component of $p^K(x)$ is an element of $\mathcal{G}$, and 2) $p^K(x)$ "spans" $\mathcal{G}$ as $K$ grows (i.e. for any function in $\mathcal{G}$, $K$ can be chosen big enough that there is a linear combination of $p^K(x)$ that approximates it arbitrarily closely in mean square). Under 1), $y - g_0(x)$ is orthogonal to each element of $p^K(x)$, so that $E[p^K(x)y] = E[p^K(x)g_0(x)]$ and $\hat\pi$ estimates

$$\pi_K = (E[p^K(x)p^K(x)'])^{-1}E[p^K(x)g_0(x)],$$

the coefficients of the projection of $g_0(x)$ on $p^K(x)$. Under 2), $p^K(x)'\pi_K$ will approximate $g_0(x)$. Consequently, when the estimation error in $\hat\pi$ is small, $\hat g(x)$ should approximate $g_0(x)$.

Two types of approximating functions will be considered in detail. They are power series and splines.

Power Series: Let $\lambda = (\lambda_1, \ldots, \lambda_r)'$ denote an $r$-dimensional vector of nonnegative integers, i.e. a multi-index, with norm $|\lambda| = \sum_{j=1}^r \lambda_j$, and let $x^\lambda = \prod_{j=1}^r x_j^{\lambda_j}$. For a sequence $(\lambda(k))_{k=1}^\infty$ of distinct such vectors, a power series approximation corresponds to

$$p_{kK}(x) = x^{\lambda(k)}, \qquad (k = 1, 2, \ldots). \qquad (2.5)$$

Throughout the paper it will be assumed that the $\lambda(k)$ are ordered so that $|\lambda(k)|$ is monotonically nondecreasing. For estimating the conditional expectation $E[y|x]$, it will also be required that $(\lambda(k))_{k=1}^\infty$ include all distinct multi-indices. This requirement is imposed so that $E[y|x]$ can be approximated by a power series. Additive-interactive projections can be estimated by restricting the multi-indices so that each term $p_{kK}(x)$ is an element of $\mathcal{G}$. This can be accomplished by requiring that the only $\lambda(k)$ that are included are those where the indices of nonzero elements are among the indices of a subvector $x_l$ for some $l$. In addition, covariate interactive terms can be estimated by taking the multi-indices to have the same dimension as $u$ and specifying the approximating functions to be $p_{kK}(x) = w_{l(k)}u^{\lambda(k)}$, where $l(k)$ is an integer that selects a component of $w$.

Power series have a potential drawback of being sensitive to outliers. It may be possible to make them less sensitive by using power series in a bounded, one-to-one transformation of the original data. An example would be to replace each component $x_j$ of $x$ by the logit transformation $1/(1 + e^{-x_j})$.
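The ordering and the additive-interactive restriction on the multi-indices can be sketched as follows. This is an illustrative helper under stated assumptions, not code from the paper: `multi_indices` lists every $\lambda$ with $|\lambda|$ up to a chosen degree in nondecreasing order of $|\lambda|$, and the hypothetical `groups` argument (the coordinate indices of each subvector $x_l$) keeps only terms whose nonzero coordinates lie inside a single $x_l$.

```python
import itertools
import numpy as np

def multi_indices(r, max_degree, groups=None):
    """All r-dimensional multi-indices with |lam| <= max_degree, in nondecreasing |lam|.

    If groups (a list of tuples of coordinate indices, one per subvector x_l) is given,
    keep only multi-indices whose nonzero coordinates all lie in one group.
    """
    out = []
    for deg in range(max_degree + 1):
        for lam in itertools.product(range(deg + 1), repeat=r):
            if sum(lam) != deg:
                continue
            support = {j for j, a in enumerate(lam) if a > 0}
            if groups is None or any(support <= set(g) for g in groups):
                out.append(lam)
    return out

def power_series(x, lams):
    """Columns p_kK(x) = x^{lam(k)} = prod_j x_j^{lam_j(k)}, one row per observation."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return np.column_stack([np.prod(x ** np.array(lam), axis=1) for lam in lams])

# x = (x_1, x_2, x_3) with subvectors (x_1, x_2) and x_3 (0-based coordinates here).
lams = multi_indices(3, 2, groups=[(0, 1), (2,)])
row = power_series([[2.0, 3.0, 5.0]], lams)
```

The empty support of the zero multi-index is contained in every group, so the constant term is always kept. A logit transformation could be applied to `x` before calling `power_series`.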
The theory to follow uses orthogonal polynomials, which may help alleviate the well known multicollinearity problem for power series. If each $x^{\lambda(k)}$ is replaced with the product of orthogonal polynomials of order corresponding to components of $\lambda(k)$, with respect to some weight function on the range of $x$, and the distribution of $x$ is similar to this weight, then there should be little collinearity among the different terms. The estimator will be numerically invariant to such a replacement (because $|\lambda(k)|$ is monotonically nondecreasing), but the replacement may alleviate multicollinearity in computation.
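The invariance claim and the collinearity benefit can be checked numerically in the one-regressor case. This sketch assumes $x$ is uniform on $[-1,1]$, which matches the Legendre weight; the simulated design and the helper `fitted` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(1)
n, K = 500, 10
x = rng.uniform(-1.0, 1.0, size=n)       # distribution matches the Legendre weight on [-1, 1]
y = np.exp(x) + 0.1 * rng.standard_normal(n)

raw = np.vander(x, K, increasing=True)   # 1, x, ..., x^{K-1}
leg = legendre.legvander(x, K - 1)       # P_0(x), ..., P_{K-1}(x): same span as the raw powers

def fitted(P, y):
    """Least squares fitted values from regressing y on the columns of P."""
    coef, *_ = np.linalg.lstsq(P, y, rcond=None)
    return P @ coef

fit_raw, fit_leg = fitted(raw, y), fitted(leg, y)
cond_raw = np.linalg.cond(raw.T @ raw)   # second moment matrix of raw powers
cond_leg = np.linalg.cond(leg.T @ leg)   # nearly diagonal for Legendre terms
```

The fitted values coincide because both bases span the same polynomials, while the second moment matrix of the Legendre terms is far better conditioned.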
Regression Splines: A regression spline is a series estimator where the approximating function is a smooth piecewise polynomial with fixed knots (join points). Regression splines have some attractive features relative to power series, including being less sensitive to singularities in the function being approximated and less oscillatory. A disadvantage is that the theory requires that knots be placed in the support and be nonrandom (as in Stone, 1985), so that the support must be known. The power series theory does not require a known support.
To describe regression splines it is convenient to begin with the case of a one-dimensional $x$. For convenience, suppose that the support of $x$ is $[-1,1]$ (it can always be normalized to take this form) and that the knots are evenly spaced. Let $(\cdot)_+ = 1(\cdot > 0)(\cdot)$. An $m$ degree spline with $L+1$ evenly spaced knots on $[-1,1]$ is a linear combination of

$$p_{kL}(v) = \begin{cases} v^k, & 0 \le k \le m, \\ \{[v + 1 - 2(k-m)/(L+1)]_+\}^m, & m+1 \le k \le m+L. \end{cases} \qquad (2.6)$$
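A direct transcription of (2.6) into NumPy, as a sketch. The function name `truncated_power_basis` is hypothetical, and the sketch assumes $m \ge 1$, since for $m = 0$ the truncated term should be the indicator $1(\cdot > 0)$ rather than $0^0 = 1$.

```python
import numpy as np

def truncated_power_basis(v, m, L):
    """Evaluate the m + L + 1 terms of (2.6) at the points v in [-1, 1].

    Columns are v^k for 0 <= k <= m, followed by {[v + 1 - 2(k - m)/(L + 1)]_+}^m
    for m + 1 <= k <= m + L, i.e. truncated powers at the knots -1 + 2j/(L + 1), j = 1..L.
    """
    if m < 1:
        raise ValueError("this sketch assumes degree m >= 1")
    v = np.asarray(v, dtype=float)
    cols = [v ** k for k in range(m + 1)]
    for k in range(m + 1, m + L + 1):
        cols.append(np.maximum(v + 1.0 - 2.0 * (k - m) / (L + 1), 0.0) ** m)
    return np.column_stack(cols)

# Linear spline (m = 1) with one interior knot at 0 (L = 1): columns 1, v, max(v, 0).
B = truncated_power_basis([-1.0, 0.0, 0.5, 1.0], m=1, L=1)
```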
Multivariate spline terms can be formed by interacting univariate ones for different components of $x$. For a set of multi-indices $\{\lambda(k)\}$, with $\lambda_j(k) \le m + L_j$ for each $j$ and $k$, where $L_j$ is the number of interior knots for the $j$th component, the approximating functions will be products of univariate splines, i.e.

$$p_{kK}(x) = \prod_{j=1}^{r} p_{\lambda_j(k), L_j}(x_j), \qquad (k = 1, \ldots, K). \qquad (2.7)$$

Note that corresponding to each $K$ there is a number of knots for each component of $x$ and a choice of which multiplicative components to include. Throughout the paper it will be assumed that each ratio of numbers of knots for a pair of elements of $x$ is bounded above and below. For estimating the conditional expectation $E[y|x]$, it will also be required that $(\lambda(k))_{k=1}^\infty$ include all distinct multi-indices. This requirement is imposed so that $E[y|x]$ can be approximated by interactive splines. Additive-interactive projections can be estimated by restricting the multi-indices in the same way as for power series. Also, covariate interactive terms can be estimated by forming the approximating functions as products of elements of $w$ with splines in $u$, analogously to the power series case.
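Forming the products in (2.7) from univariate evaluations can be sketched as below. The helper `tensor_product_terms` and the toy inputs are hypothetical; the sketch assumes column $q$ of each univariate matrix holds $p_{q,L_j}(x_j)$, so column 0 is the constant $v^0 = 1$ and a zero entry of $\lambda(k)$ leaves that component out of the product.

```python
import numpy as np

def tensor_product_terms(univariate, lams):
    """Products of univariate terms as in (2.7).

    univariate[j] is an n x (m + L_j + 1) matrix whose column q holds p_{q,L_j}(x_j);
    each multi-index lam selects column lam[j] from univariate[j].
    """
    n = univariate[0].shape[0]
    out = np.ones((n, len(lams)))
    for k, lam in enumerate(lams):
        for j, q in enumerate(lam):
            out[:, k] *= univariate[j][:, q]
    return out

# Two components with toy univariate evaluations (constant column first).
U1 = np.array([[1.0, 2.0], [1.0, 3.0]])
U2 = np.array([[1.0, 5.0], [1.0, 7.0]])
T = tensor_product_terms([U1, U2], [(0, 0), (1, 0), (0, 1), (1, 1)])
```

Restricting `lams` to multi-indices with at most one nonzero entry would give an additive spline series, as described above.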
The theory to follow uses B-splines, which are a linear transformation of the above basis that is nonsingular on $[-1,1]$ and has low multicollinearity. The low multicollinearity of B-splines and a recursive formula for calculation also lead to computational advantages; e.g. see Powell (1981).
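A sketch of the B-spline alternative using the standard Cox-de Boor recursion, written from scratch rather than taken from any library. It assumes the end knots $-1$ and $1$ are repeated $m+1$ times; `bspline_basis` is a hypothetical name, and the truncated power basis of (2.6) is redefined so the block runs on its own. The check confirms that the two bases span the same space while the B-spline matrix is much better conditioned.

```python
import numpy as np

def bspline_basis(v, m, L):
    """Degree-m B-splines on [-1, 1] with interior knots -1 + 2j/(L + 1), j = 1..L.

    Cox-de Boor recursion with the end knots repeated m + 1 times; returns an
    n x (m + L + 1) matrix whose rows sum to one on [-1, 1].
    """
    v = np.asarray(v, dtype=float)
    interior = -1.0 + 2.0 * np.arange(1, L + 1) / (L + 1)
    t = np.concatenate([np.full(m + 1, -1.0), interior, np.full(m + 1, 1.0)])
    # Degree 0: indicators of [t_i, t_{i+1}); v = 1 goes to the last nonempty interval.
    B = np.zeros((v.size, t.size - 1))
    for i in range(t.size - 1):
        if t[i] < t[i + 1]:
            B[:, i] = (v >= t[i]) & (v < t[i + 1])
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[v == t[-1], last] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, m + 1):
        B_next = np.zeros((v.size, t.size - 1 - d))
        for i in range(t.size - 1 - d):
            if t[i + d] > t[i]:
                B_next[:, i] += (v - t[i]) / (t[i + d] - t[i]) * B[:, i]
            if t[i + d + 1] > t[i + 1]:
                B_next[:, i] += (t[i + d + 1] - v) / (t[i + d + 1] - t[i + 1]) * B[:, i + 1]
        B = B_next
    return B

def truncated_power_basis(v, m, L):
    """The basis (2.6): v^k for 0 <= k <= m, then truncated powers at the same interior knots."""
    v = np.asarray(v, dtype=float)
    cols = [v ** k for k in range(m + 1)]
    cols += [np.maximum(v + 1.0 - 2.0 * (k - m) / (L + 1), 0.0) ** m for k in range(m + 1, m + L + 1)]
    return np.column_stack(cols)

v = np.linspace(-1.0, 1.0, 201)
Bs = bspline_basis(v, m=3, L=4)
Tp = truncated_power_basis(v, m=3, L=4)
# Express each truncated-power column in the B-spline basis: the spans coincide.
A, *_ = np.linalg.lstsq(Bs, Tp, rcond=None)
```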
Series estimates depend on the choice of the number of terms $K$, so that it is desirable to choose $K$ based on the data. With a data-based choice of $K$, these estimates have the flexibility to adjust to conditions in the data. For example, one might choose $K$ by delete one cross validation, by minimizing the sum of squared residuals $\sum_{i=1}^n [y_i - \hat g_{-i,K}(x_i)]^2$, where $\hat g_{-i,K}(x_i)$ is the estimate of the regression function computed from all the observations but the $i$th. Some of the results to follow will allow for a data based $K$.
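A sketch of delete-one cross validation for $K$. It relies on the standard least squares identity $y_i - \hat g_{-i,K}(x_i) = [y_i - \hat g_K(x_i)]/(1 - W_{ii})$, with $W$ the hat matrix, which avoids refitting $n$ times; the identity is a general least squares fact rather than something stated in the paper. The names `loo_cv` and `choose_K`, the Legendre series, and the simulated data are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def loo_cv(P, y):
    """Delete-one CV criterion sum_i [y_i - g_{-i}(x_i)]^2 for least squares on the columns of P.

    Uses y_i - g_{-i}(x_i) = [y_i - g(x_i)] / (1 - W_ii), where W = P (P'P)^- P'
    is the hat matrix; this requires every leverage W_ii < 1.
    """
    W = P @ np.linalg.pinv(P.T @ P) @ P.T
    resid = y - W @ y
    return float(np.sum((resid / (1.0 - np.diag(W))) ** 2))

def choose_K(x, y, K_max):
    """Return the K in 1..K_max minimizing the CV criterion for a Legendre series in scalar x."""
    scores = [loo_cv(legendre.legvander(x, K - 1), y) for K in range(1, K_max + 1)]
    return int(np.argmin(scores)) + 1

rng = np.random.default_rng(2)
n = 60
x = rng.uniform(-1.0, 1.0, size=n)
y = np.cos(2.0 * x) + 0.2 * rng.standard_normal(n)

# Check the shortcut against literally refitting without observation i.
P = legendre.legvander(x, 4)
brute = 0.0
for i in range(n):
    keep = np.arange(n) != i
    coef, *_ = np.linalg.lstsq(P[keep], y[keep], rcond=None)
    brute += (y[i] - P[i] @ coef) ** 2
shortcut = loo_cv(P, y)
K_hat = choose_K(x, y, K_max=8)
```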
3. General Convergence Rates
This section derives some convergence rates for general series estimators. To do this it is useful to introduce some conditions. Let $u = y - g_0(x)$ and $u_i = y_i - g_0(x_i)$. Also, for a matrix $D$ let $\|D\| = [\mathrm{trace}(D'D)]^{1/2}$, and for a random matrix $Y$ let $\|Y\|_v = \{E[\|Y\|^v]\}^{1/v}$ for $v < \infty$, and let $\|Y\|_\infty$ be the infimum of constants $C$ such that $\mathrm{Prob}(\|Y\| < C) = 1$.
Assumption 3.1: $\{(y_i, x_i)\}$ is i.i.d. and $E[u^2|x]$ is bounded on the support of $x_i$.

The bounded second conditional moment assumption is quite common in the literature (e.g. Stone, 1985). Apparently it can be relaxed only at the expense of affecting the convergence rates, so to avoid further complication this assumption is retained.
The next Assumption is useful for controlling the second moment matrix of the series terms.

Assumption 3.2: For each $K$ there is a constant, nonsingular matrix $A$ such that for $P^K(x) = Ap^K(x)$, the smallest eigenvalue of $E[P^K(x)P^K(x)']$ is bounded away from zero uniformly in $K$.

Since the estimator is invariant to nonsingular linear transformations, there is really no need to distinguish between $p^K(x)$ and $P^K(x)$ at this point. An explicit transformation $A$ is allowed for in order to emphasize that Assumption 3.2 is only needed for some transformation. For example, Assumption 3.2 will not apply to power series, but will apply to orthonormal polynomials. Assumption 3.2 is a normalization that leads to the series terms having specific magnitudes. The regularity conditions will also require that the magnitude of $P^K(x)$ not grow too fast with the sample size. The size of $P^K(x)$ will be quantified by

$$\zeta_d(K) = \sup_{|\lambda| = d,\; x \in \mathcal{X}} \|\partial^\lambda P^K(x)\|, \qquad (3.1)$$

where $\mathcal{X}$ is the support of $x$, $\|D\| = [\mathrm{trace}(D'D)]^{1/2}$ for a matrix $D$, $\lambda$ denotes a vector of nonnegative integers, $|\lambda| = \sum_{j=1}^r \lambda_j$, and $\partial^\lambda P^K(x) = \partial^{|\lambda|}P^K(x)/\partial x_1^{\lambda_1}\cdots\partial x_r^{\lambda_r}$. That is, $\zeta_d(K)$ is the supremum of the norms of derivatives of order $d$.
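As a concrete instance of Assumption 3.2 and (3.1), consider Legendre polynomials rescaled to be orthonormal when $x$ is uniform on $[-1,1]$ (an illustrative assumption). Then $E[P^K(x)P^K(x)'] = I$, and because $|P_k(x)| \le 1$ with equality at $x = \pm 1$, $\zeta_0(K)^2 = \sum_{k=0}^{K-1}(2k+1) = K^2$, i.e. $\zeta_0(K) = K$. The sketch below checks both facts numerically; `orthonormal_legendre` and `zeta0` are hypothetical names.

```python
import numpy as np
from numpy.polynomial import legendre

def orthonormal_legendre(x, K):
    """P^K(x): Legendre polynomials scaled to be orthonormal when x ~ Uniform[-1, 1]."""
    return legendre.legvander(x, K - 1) * np.sqrt(2 * np.arange(K) + 1)

def zeta0(K, grid_size=2001):
    """Grid approximation to zeta_0(K) = sup_x ||P^K(x)|| over the support [-1, 1]."""
    grid = np.linspace(-1.0, 1.0, grid_size)
    return np.max(np.linalg.norm(orthonormal_legendre(grid, K), axis=1))

# E[P^K(x) P^K(x)'] under the uniform density 1/2, by Gauss-Legendre quadrature
# (exact here because the integrands are polynomials of low degree).
nodes, weights = legendre.leggauss(20)
Pq = orthonormal_legendre(nodes, 8)
second_moment = (Pq * (weights / 2.0)[:, None]).T @ Pq
```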
The following condition places some limits on the growth of the series magnitude. Also, it allows for a data based choice of $K$, at the expense of imposing that the series terms are nested.

Assumption 3.3: There are $\underline{K}(n)$ and $\bar K(n)$ such that $\underline{K}(n) \le K \le \bar K(n)$ with probability approaching one, and either a) $p^K(x)$ is a subvector of $p^{K+1}(x)$ for all $K$ with $\underline{K}(n) \le K \le \bar K(n) - 1$, and $\sum_{K=\underline{K}(n)}^{\bar K(n)} \zeta_0(K)^4/n \to 0$; or b) $P^K(x)$ is a subvector of $P^{K+1}(x)$ for all $K$ with $\underline{K}(n) \le K \le \bar K(n) - 1$, and $\bar K(n)\zeta_0(\bar K(n))^2/n \to 0$.

As previously noted, a series estimate is invariant to nonsingular linear transformations of $p^K(x)$, so that in part a) it suffices that any such transformation form a nested sequence of vectors. Part b) is more restrictive, in requiring that the $\{P^K(x)\}$ from Assumption 3.2 be nested, but imposes a less stringent requirement on the growth rate of $K$. Also, if $K$ is nonrandom, so that $K = \underline{K}(n) = \bar K(n)$, the nested sequence requirement of both parts a) and b) will be satisfied, because that requirement is vacuous when $\underline{K}(n) = \bar K(n)$.
In order to specify primitive hypotheses for Assumptions 3.2 and 3.3 one needs explicit bounds on series terms whose second moment matrix has eigenvalues bounded away from zero. That is, it must be possible to find $P^K(x)$ satisfying the eigenvalue condition, and having known values for, or bounds on, $\zeta_0(K)$. It is possible to derive such bounds for both power series and regression splines, when $x$ is continuously distributed with a density bounded away from zero. These bounds lead to the requirements that $K^4/n \to 0$ for power series and $K^2/n \to 0$ for regression splines with nonrandom $K$. These results are described in Sections 5 and 6. It is also possible to derive such results for Fourier series, but this is not done here because they are most suitable for approximation of periodic functions, which have fewer applications. It may also be possible to derive results for Gallant's (1981) Fourier flexible form, although this is more difficult, as described in Gallant and Souza (1991). In terms of this paper, the problem with the Fourier flexible form is that the linear and quadratic terms can be approximated extremely quickly by the Fourier terms, leading to a multicollinearity problem so severe that simultaneous satisfaction of Assumptions 3.2 and 3.3 would impose very slow growth rates on $K$.

Assumptions 3.1 - 3.3 are useful for controlling the variance of a series estimator. The bias is the error from the finite dimensional approximation. A supremum Sobolev norm will be used to quantify this approximation. For a measurable function $f(x)$ defined on $\mathcal{X}$ and a nonnegative integer $d$, let

$$|f|_d = \max_{|\lambda| \le d}\ \sup_{x \in \mathcal{X}} |\partial^\lambda f(x)|,$$

with $|f|_d$ equal to infinity if $\partial^\lambda f(x)$ does not exist for some $|\lambda| \le d$ and $x \in \mathcal{X}$. Many of the results will be based on the following polynomial approximation rate condition.

Assumption 3.4: There is a nonnegative integer $d$ and constants $C, \alpha > 0$ such that for all $K$ there is $\pi$ with $|g_0 - p^{K\prime}\pi|_d \le CK^{-\alpha}$.

This condition is not primitive, but is known to be satisfied in many cases. Typically, the higher the degree of derivative of $g_0(x)$ that exists, the bigger $\alpha$ and/or $d$ can be chosen. This type of primitive condition will be explicitly discussed for power series in Section 5 and for splines in Section 6. It is also possible to obtain results when the approximation rate is for an $L_2$ norm, rather than the sup norm. However, this generalization leads to much more complicated results, and so is not given here.
These assumptions will imply both mean square and uniform convergence rates for the series estimate. The first result gives mean-square rates. Let $F(x)$ denote the CDF of $x$.

Theorem 3.1: If Assumptions 3.1 - 3.4 are satisfied for $d = 0$, then

$$\sum_{i=1}^n [\hat g(x_i) - g_0(x_i)]^2/n = O_p(K/n + K^{-2\alpha}), \qquad \int [\hat g(x) - g_0(x)]^2\, dF(x) = O_p(K/n + K^{-2\alpha}).$$

The two terms in the convergence rate essentially correspond to variance and bias. The first conclusion, on sample mean square error, is similar to those of Andrews and Whang (1991) and Newey (1993b), but the hypotheses are different. Here the number of terms $K$ is allowed to depend on the data, and the projection residual $u$ need not satisfy $E[u|x] = 0$, at the expense of requiring Assumptions 3.2 and 3.3, which were not imposed in these other papers. Also, the second conclusion, on integrated mean square error, has not been previously given at this level of generality, although Stone (1985) gave specific results for spline estimation of an additive projection.
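As a worked illustration of how the two terms in Theorem 3.1 trade off (a standard calculation, not one stated in the paper), let $K$ grow as a power of $n$ and equate the variance and bias terms:

```latex
\frac{K}{n} \asymp K^{-2\alpha}
\iff K \asymp n^{1/(1+2\alpha)},
\qquad\text{so that}\qquad
\frac{K}{n} + K^{-2\alpha} \asymp n^{-2\alpha/(1+2\alpha)} .
```

With $K = n^{\gamma}$ the bound is of order $n^{\gamma - 1} + n^{-2\alpha\gamma}$, and the larger of the two exponents is smallest when $\gamma - 1 = -2\alpha\gamma$. Slower growth of $K$ leaves the bias term dominant and faster growth leaves the variance term dominant. Such a choice must still satisfy the growth conditions of Assumption 3.3.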
The next result gives uniform convergence rates.

Theorem 3.2: If Assumptions 3.1, 3.2, 3.3 b), and 3.4 are satisfied for a nonnegative integer $d$, then

$$|\hat g - g_0|_d = O_p(\zeta_d(K)[(K/n)^{1/2} + K^{-\alpha}]).$$

There do not seem to be any previous uniform convergence results in the literature that cover derivatives and general series in the way this one does. Furthermore, for the univariate power series case, the convergence rate implied by this result improves on that of Cox (1988), as further discussed in Section 4. These uniform rates do not attain Stone's (1982) bounds, although they do appear to improve on previously known rates.

For specific classes of functions $\mathcal{G}$ and series approximations, more primitive conditions for Assumptions 3.2 - 3.4 can be specified in order to derive convergence rates for the estimators. These results are illustrated in the next two Sections, where convergence rates are derived for power series and regression spline estimators of additive interactive and covariate interactive functions.
4. Additive Interactive Projections
This Section gives convergence rates for power series and regression spline estimators of additive interactive functions. The first regularity condition restricts $x$ to be continuously distributed.

Assumption 4.1: $x$ is continuously distributed with a support that is a cartesian product of compact intervals, and bounded density that is also bounded away from zero.

This assumption is useful for showing that the set of additive-interactive functions is closed. Also, this condition leads to Assumptions 3.2 and 3.3 being satisfied with explicit formulae for $\zeta_0(K)$. For power series it is possible to generalize this condition, so that the density goes to zero on the boundary of the support. For simplicity this generalization is not given here, although the Lemmas given in the appendix can be used to verify the Section 3 conditions in this case. It is also possible to allow for a discrete regressor with finite support, by including dummy variables for all points of support of the regressor, and all interactions. Because such a regressor is essentially parametric, and allowing for it does not change any of the convergence rate results, this generalization will not be considered here.
Under Assumption 4.1 the following condition will suffice for Assumptions 3.2 and 3.3.

Assumption 4.2: Either a) $p_{kK}(x)$ is a power series with $\bar K(n)^4/n \to 0$, or b) $p_{kK}(x)$ are splines, the support of $x$ is $[-1,1]^r$, $\underline{K}(n) = \bar K(n) = K$, and $K^2/n \to 0$.

It is possible to allow for a data based $K$ for splines and obtain similar mean-square convergence rates to those given below. This generalization is not given here because it would further complicate the statement of results.
A primitive condition for Assumption 3.4 is the following one.

Assumption 4.3: Each of the components $g_l(x_l)$, $(l = 1, \ldots, L)$, is continuously differentiable of order $s$ on the support of $x_l$.

Let $\ell$ denote the maximum dimension of the components of the additive interactive function. This condition can be combined with known results on approximation rates for power series and splines to show that Assumption 3.4 is satisfied for $d = 0$ and $\alpha = s/\ell$, and with $\alpha = s - d$ when $\ell = 1$. The details are given in the appendix.
These conditions lead to the following result on mean-square convergence.

Theorem 4.1: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, then

$$\sum_{i=1}^n [\hat g(x_i) - g_0(x_i)]^2/n = O_p(K/n + K^{-2s/\ell}), \qquad \int [\hat g(x) - g_0(x)]^2\, dF(x) = O_p(K/n + K^{-2s/\ell}).$$

The integrated mean square error result for splines given here has previously been derived by Stone (1990). The rest of this result is new, although Andrews and Whang (1990) give the same conclusion for the sample mean square error of power series under different hypotheses. An implication of Theorem 4.1 is that if the number of terms is chosen randomly between certain bounds, i.e. if there are constants $C \ge c > 0$ such that $\underline{K}(n) = cn^\gamma$ and $\bar K(n) = Cn^\gamma$ with $\gamma = \ell/(2s + \ell)$, then for power series with $s > 3\ell/2$ the mean-square convergence rate attains Stone's (1982) bound $n^{-2s/(2s+\ell)}$. The side condition that $s > 3\ell/2$ is needed to ensure that $\bar K(n)$ satisfies Assumption 4.2. A similar side condition is present for the spline version of Stone (1990), but it has the less stringent form of $s > \ell/2$.
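The exponent arithmetic behind this implication can be checked with exact fractions. This sketch assumes $K$ proportional to $n^\gamma$ and integer $s$ and $\ell$; the names `stone_rate` and `side_conditions` are hypothetical. It confirms that $K/n$ and $K^{-2s/\ell}$ then have the same order, and that the growth conditions of Assumption 4.2 reduce to $s > 3\ell/2$ for power series and $s > \ell/2$ for splines.

```python
from fractions import Fraction

def stone_rate(s, ell):
    """For K proportional to n**gamma with gamma = ell/(2s + ell), return (gamma, rate),
    where K/n + K**(-2s/ell) is of order n**(-rate)."""
    gamma = Fraction(ell, 2 * s + ell)
    rate = Fraction(2 * s, 2 * s + ell)
    return gamma, rate

def side_conditions(s, ell):
    """Growth conditions of Assumption 4.2 for K proportional to n**gamma:
    K**4/n -> 0 for power series and K**2/n -> 0 for splines."""
    gamma, _ = stone_rate(s, ell)
    return {"power_series": 4 * gamma < 1, "splines": 2 * gamma < 1}

g, r = stone_rate(2, 1)   # twice differentiable additive components (ell = 1)
```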
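Theorem 3.2 can be specialized to obtain uniform convergence rates for power series and spline estimators.

Theorem 4.2: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, then for power series

$$|\hat g - g_0|_0 = O_p(K[(K/n)^{1/2} + K^{-s/\ell}]),$$

and for regression splines,

$$|\hat g - g_0|_0 = O_p(K^{1/2}[(K/n)^{1/2} + K^{-s/\ell}]).$$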
Obtaining uniform convergence rates for derivatives is more difficult, because approximation rates are difficult to find in the literature. When the argument of each function is only one dimensional, an approximation rate follows by a simple integration argument (e.g. see Lemma A.12 in the Appendix). This approach leads to the following convergence rate for the one-dimensional (i.e. additive model) case.

Theorem 4.3: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, $p^K(x)$ is a power series or a regression spline with $m \ge s$, $\ell = 1$, and $d < s$, then for power series

$$|\hat g - g_0|_d = O_p(K^{1+2d}\{[K/n]^{1/2} + K^{-s+d}\}),$$

and for splines

$$|\hat g - g_0|_d = O_p(K^{1/2+d}\{[K/n]^{1/2} + K^{-s+d}\}).$$

In the case of power series, it is possible to obtain an approximation rate by a Taylor expansion argument when the derivatives do not grow too fast with their order. The rate is faster than any power of $K$, leading to the following result.

Theorem 4.4: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, $p^K(x)$ is a power series, and there is a constant $C$ such that for each multi-index $\lambda$, the $\lambda$ partial derivative of each additive component of $g_0(x)$ exists and is bounded by $C^{|\lambda|}$, then for any positive integers $\alpha$ and $d$,

$$|\hat g - g_0|_d = O_p(K^{1+2d}\{[K/n]^{1/2} + K^{-\alpha}\}).$$
The uniform convergence rates are not optimal in the sense of Stone (1982), but they improve on existing results. For the one regressor, power series case, Theorem 4.2 improves on the rate of Cox (1988). For the other cases there do not seem to be any existing results in the literature, so that Theorems 4.2 - 4.4 give the only uniform convergence rates available. It would be interesting to obtain further improvements on these results, and to investigate the possibility of attaining optimal uniform convergence rates for series estimators of additive interactive models.
5. Covariate Interactive Projections
Estimation of random coefficient projections provides a second example of how the general results of Section 3 can be applied to specific estimators. This Section gives convergence rates for power series and regression spline estimators of projections on the set $\mathcal{G}$ described in equation (2.3). For simplicity, results will be restricted to mean-square and uniform convergence rates for the function, but not for its derivatives. Also, the $\mathcal{H}_l$ in equation (2.3) will each be taken equal to the set of all functions of $u$ with finite mean-square. Convergence rates can be derived under the following analog to the conditions of Section 4.
Assumption 5.1: i) $u$ is continuously distributed with a support that is a cartesian product of compact intervals, and bounded density that is also bounded away from zero; ii) $p^K(x) = w \otimes p^{K/L}(u)$, with $K$ restricted to be a multiple of $L$, where either a) $p_{kK}(u)$ is a power series with $\bar K(n)^4/n \to 0$, or b) $p_{kK}(u)$ are splines, the support of $u$ is $[-1,1]^r$, $\underline{K}(n) = \bar K(n) = K$, and $K^2/n \to 0$; iii) $w$ is bounded, and $E[ww'|u]$ has smallest eigenvalue that is bounded away from zero on the support of $u$; iv) each of the components $h_l(u)$, $(l = 1, \ldots, L)$, is continuously differentiable of order $s$ on the support of $u$.

In this Section $r$ denotes the dimension of $u$.
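The Kronecker-product terms $w \otimes p^{K/L}(u)$ of Assumption 5.1 ii) can be built and fitted as sketched below. The sketch assumes a scalar $u$ and uses a Legendre series in $u$ for $p^{K/L}(u)$; `covariate_interactive_basis`, the coefficient functions, and the simulated design are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def covariate_interactive_basis(w, u, K):
    """Rows p^K(x_i) = w_i (Kronecker) p^{K/L}(u_i) for x = (w, u) with scalar u in [-1, 1].

    w is n x L, u has length n, and K must be a multiple of L.
    """
    n, L = w.shape
    if K % L != 0:
        raise ValueError("K must be a multiple of L")
    pu = legendre.legvander(u, K // L - 1)                 # n x (K/L) terms in u
    return np.einsum("il,ij->ilj", w, pu).reshape(n, K)    # row i equals np.kron(w[i], pu[i])

rng = np.random.default_rng(3)
n = 400
u = rng.uniform(-1.0, 1.0, size=n)
w = np.column_stack([np.ones(n), rng.standard_normal(n)])  # L = 2: intercept and one covariate
g0 = np.sin(np.pi * u) + w[:, 1] * u ** 2                   # h_1(u) = sin(pi u), h_2(u) = u^2
y = g0 + 0.1 * rng.standard_normal(n)

P = covariate_interactive_basis(w, u, K=16)
coef, *_ = np.linalg.lstsq(P, y, rcond=None)
mse = np.mean((P @ coef - g0) ** 2)                        # sample mean square error
```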
These conditions lead to the following result on mean-square convergence.

Theorem 5.1: If Assumptions 3.1 and 5.1 are satisfied, then

$$\sum_{i=1}^n [\hat g(x_i) - g_0(x_i)]^2/n = O_p(K/n + K^{-2s/r}), \qquad \int [\hat g(x) - g_0(x)]^2\, dF(x) = O_p(K/n + K^{-2s/r}).$$

Also, for power series and splines respectively,

$$|\hat g - g_0|_0 = O_p(K[(K/n)^{1/2} + K^{-s/r}]), \qquad |\hat g - g_0|_0 = O_p(K^{1/2}[(K/n)^{1/2} + K^{-s/r}]).$$

An important feature of this result is that the convergence rate does not depend on $L$, but is controlled by the dimension of the coefficient functions and their degree of smoothness. This feature is to be expected, since the nonparametric part of the projection is the coefficient functions.
Appendix: Proofs of Theorems
Throughout, let $C$ be a generic positive constant, and let $\lambda_{\min}(B)$ and $\lambda_{\max}(B)$ be the minimum and maximum eigenvalues of a symmetric matrix $B$. A number of lemmas will be useful in proving the results. First, some Lemmas on mean-square closure of certain spaces of functions are given.

Lemma A.1: If $\mathcal{H}$ is linear and closed and $E[\|w\|^2] < \infty$, then $\{w'\alpha + h(x) : h \in \mathcal{H}\}$ is closed.

Proof: Let $u = w - P(w|\mathcal{H})$, where $P(\cdot|\mathcal{H})$ denotes the mean-square projection on $\mathcal{H}$. Then $w'\alpha + h(x) = u'\alpha + h(x) + P(w|\mathcal{H})'\alpha$, with $h(x) + P(w|\mathcal{H})'\alpha \in \mathcal{H}$, so it suffices to assume that $w$ is orthogonal to $\mathcal{H}$. It is well known that finite dimensional spaces are closed, and that direct sums of closed orthogonal subspaces are closed, giving the conclusion. QED.
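Lemma A.2: Consider sets $\mathcal{H}_j$ $(j = 1, \ldots, J)$ of functions of a random vector $x$. If each $\mathcal{H}_j$ is closed and $w$ is a $J \times 1$ random vector such that $E[ww'|x]$ is bounded and has smallest eigenvalue bounded away from zero, then $\{\sum_{j=1}^J w_j h_j(x) : h_j \in \mathcal{H}_j\}$ is closed.

Proof: Let $h(x) = (h_1(x), \ldots, h_J(x))'$ and $\Omega(x) = E[ww'|x]$. By iterated expectations,

$$C^{-1}E[h(x)'h(x)] \le E[\{w'h(x)\}^2] = E[h(x)'\Omega(x)h(x)] \le CE[h(x)'h(x)].$$

Hence a sequence $w'h_n(x)$ is Cauchy in mean square if and only if each $h_{jn}(x)$ is, and the closedness of each $\mathcal{H}_j$ gives the conclusion. QED.

Lemma A.3: Suppose that i) the $x_l$ $(l = 1, \ldots, L)$ are distinct subvectors of $x$, i.e. $x_l = x_{l'}$ only if $l = l'$; and ii) there exists a constant $c > 1$ such that for each $l$ and any $a(x) \ge 0$,

$$c^{-1}\int a(x)\, d[F(x_l)\cdot F(x_l^c)] \le E[a(x)] \le c\int a(x)\, d[F(x_l)\cdot F(x_l^c)],$$

where $x_l^c$ is the vector of components of $x$ that are not in $x_l$, with the partitioning $x = (x_l', x_l^{c\prime})'$. Then $\{\sum_{l=1}^L h_l(x_l) : E[h_l(x_l)^2] < \infty,\ l = 1, \ldots, L\}$ is closed in mean-square.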
Proof: Let $\mathcal{H} = \{\sum_{l=1}^L h_l(x_l)\}$ and $\|a\|_2 = [\int a(x)^2 dF(x)]^{1/2}$. By Proposition 2 of Section 4 of the Appendix of Bickel, Klaassen, Ritov, and Wellner (1993), $\mathcal{H}$ is closed if and only if there is a constant $C$ such that each $h \in \mathcal{H}$ has a decomposition $h = \sum_l h_l(x_l)$ with $\max_l \|h_l\|_2 \le C\|h\|_2$ (note that the $h_l$ need not be unique). Following Stone (1990, Lemma 1), suppose that the maximal dimension of the $x_l$ is $r$, and suppose that this property holds whenever the maximal dimension of the $x_l$ is $r-1$ or less. Then there is a unique decomposition $h = \sum_l h_l(x_l)$ such that for each maximal $x_k$, $E[h_k(x_k)\delta(\tilde x)] = 0$ for all measurable functions $\delta$ with finite mean-square of any strict subvector $\tilde x$ of $x_k$. Consequently, to show this property it suffices to show that for any "maximal" $x_k$, i.e. one that is not a proper subvector of any other $x_l$, there is a constant $c$ such that $E[h(x)^2] \ge c^{-1}E[h_k(x_k)^2]$. Holding fixed the vector $x_k^c$ of components of $x$ that are not components of $x_k$, each $h_l(x_l)$ with $l \ne k$ is a function of a strict subvector of $x_k$, and so is orthogonal to $h_k(x_k)$ under $F(x_k)$. Then, by condition ii),

$$E[h(x)^2] \ge c^{-1}\int\Big[\int\Big\{h_k(x_k) + \sum_{l \ne k} h_l(x_l)\Big\}^2 dF(x_k)\Big]dF(x_k^c) \ge c^{-1}\int\Big[\int h_k(x_k)^2\, dF(x_k)\Big]dF(x_k^c) = c^{-1}E[h_k(x_k)^2].$$

QED.
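The next few Lemmas consist of useful convergence results for random matrices with dimension that can depend on sample size. Let $\hat\Sigma$ and $\Sigma$ denote symmetric matrices, with $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ the smallest and largest eigenvalues respectively.

Lemma A.4: If $\lambda_{\min}(\Sigma) \ge C$ and $\|\hat\Sigma - \Sigma\| = o_p(1)$, then $\lambda_{\min}(\hat\Sigma) \ge C/2$ with probability approaching one (w.p.a.1). Similarly, if $\lambda_{\max}(\Sigma) \le C$ and $\|\hat\Sigma - \Sigma\| = o_p(1)$, then $\lambda_{\max}(\hat\Sigma) \le 2C$ w.p.a.1.

Proof: For a conformable vector $\mu$,

$$\lambda_{\min}(\hat\Sigma) = \min_{\|\mu\|=1}\{\mu'\Sigma\mu + \mu'(\hat\Sigma - \Sigma)\mu\} \ge \lambda_{\min}(\Sigma) - \|\hat\Sigma - \Sigma\| \ge C - o_p(1).$$

Therefore, $\lambda_{\min}(\hat\Sigma) \ge C/2$ w.p.a.1. The result for $\lambda_{\max}$ follows similarly from $\lambda_{\max}(\hat\Sigma) \le \lambda_{\max}(\Sigma) + \|\hat\Sigma - \Sigma\| \le C + o_p(1)$. QED.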
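The inequality used in the proof of Lemma A.4, that eigenvalues of symmetric matrices move by at most $\|\hat\Sigma - \Sigma\|$ (a consequence of Weyl's inequality, since the Frobenius norm bounds the spectral norm), can be checked numerically. The helper `eigen_shift` and the random matrices below are illustrative.

```python
import numpy as np

def eigen_shift(Sigma, Sigma_hat):
    """Return (|change in smallest eigenvalue|, |change in largest eigenvalue|, ||Sigma_hat - Sigma||)."""
    ev = np.linalg.eigvalsh(Sigma)              # ascending order
    ev_hat = np.linalg.eigvalsh(Sigma_hat)
    frob = np.linalg.norm(Sigma_hat - Sigma)    # [trace(D'D)]^{1/2}
    return abs(ev_hat[0] - ev[0]), abs(ev_hat[-1] - ev[-1]), frob

rng = np.random.default_rng(4)
K = 6
A = rng.standard_normal((K, K))
Sigma = A @ A.T + np.eye(K)                     # symmetric with lambda_min(Sigma) >= 1
E = 0.05 * rng.standard_normal((K, K))
Sigma_hat = Sigma + (E + E.T) / 2               # symmetric perturbation
d_min, d_max, frob = eigen_shift(Sigma, Sigma_hat)
```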
Lemma A.5: If $\lambda_{\min}(\Sigma) \ge C$, $\|\hat\Sigma - \Sigma\| = o_p(1)$, and $D_n$ is a random matrix such that $\|\Sigma^{-1/2}D_n\| = O_p(\epsilon_n)$ for some $\epsilon_n$, then $\|\hat\Sigma^{-1/2}D_n\| = O_p(\epsilon_n)$.

Proof: Let $\operatorname{tr}(A)$ denote the trace of a square matrix $A$, and for a positive semi-definite $A$ let $A^{1/2} = U\Lambda^{1/2}U'$ be its symmetric square root, where $A = U\Lambda U'$, $U$ is an orthogonal matrix, and $\Lambda^{1/2}$ is the diagonal matrix consisting of the square roots of the eigenvalues of $A$. It is easy to show that for any conformable matrices $A$ and $B$, $\|AB\| \le \|A\|\,\|B\|$, and that if $B$ is positive semi-definite, then $\|BA\| \le \lambda_{\max}(B)\|A\|$ and $\operatorname{tr}(A'BA) \le \|A\|^2\lambda_{\max}(B)$.

By Lemma A.4, $\hat\Sigma$ is positive definite w.p.a.1. Let $M = \Sigma^{-1/2}\hat\Sigma\Sigma^{-1/2}$, so that $\hat\Sigma^{-1} = \Sigma^{-1/2}M^{-1}\Sigma^{-1/2}$ w.p.a.1. Applying the inequality for $\|BA\|$ on each side,

$$\|M - I\| = \|\Sigma^{-1/2}(\hat\Sigma-\Sigma)\Sigma^{-1/2}\| \le \lambda_{\max}(\Sigma^{-1})\|\hat\Sigma-\Sigma\| \le C^{-1}o_p(1) = o_p(1),$$

so by Lemma A.4 applied to $M$ and $I$, $\lambda_{\min}(M) \ge 1/2$ w.p.a.1. Then, w.p.a.1,

$$\|\hat\Sigma^{-1/2}D_n\|^2 = \operatorname{tr}(D_n'\Sigma^{-1/2}M^{-1}\Sigma^{-1/2}D_n) \le \|\Sigma^{-1/2}D_n\|^2\lambda_{\max}(M^{-1}) \le 2\|\Sigma^{-1/2}D_n\|^2 = O_p(\epsilon_n^2).$$

QED.
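The key step of the proof of Lemma A.5, $\|\hat\Sigma^{-1/2}D\|^2 \le \|\Sigma^{-1/2}D\|^2/\lambda_{\min}(\Sigma^{-1/2}\hat\Sigma\Sigma^{-1/2})$, can be checked numerically. This is a sketch with arbitrary random matrices; `inv_sqrt` is a hypothetical helper computing the symmetric inverse square root from an eigendecomposition, as in the definition of $A^{1/2}$.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root U diag(w^{-1/2}) U' of a positive definite S."""
    w, U = np.linalg.eigh(S)
    return U @ np.diag(w ** -0.5) @ U.T

rng = np.random.default_rng(2)
K, J = 5, 3
A = rng.normal(size=(K, K))
Sigma = A @ A.T + np.eye(K)                 # lambda_min(Sigma) >= 1
E = rng.normal(scale=0.05, size=(K, K))
Sigma_hat = Sigma + (E + E.T) / 2           # small symmetric perturbation
D = rng.normal(size=(K, J))

S_m12 = inv_sqrt(Sigma)
M = S_m12 @ Sigma_hat @ S_m12               # M = Sigma^{-1/2} Sigma_hat Sigma^{-1/2}
lhs = np.linalg.norm(inv_sqrt(Sigma_hat) @ D) ** 2
rhs = np.linalg.norm(S_m12 @ D) ** 2 / np.linalg.eigvalsh(M).min()
print(lhs <= rhs)
```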
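Let $Y$ and $G$ denote random matrices with the same number of columns and $n$ rows, and let $u = Y - G$.

Lemma A.6: Suppose $\lambda_{\min}(\Sigma) \ge C$, $P$ is an $n \times K$ random matrix such that $\|P'P/n - \Sigma\| = o_p(1)$ and $\|\Sigma^{-1/2}P'u/n\| = O_p(\epsilon_n)$, and $p = PA$ for some matrix $A$. Then $\operatorname{tr}(u'p(p'p)^-p'u/n) = O_p(\epsilon_n^2)$.

Proof: Let $\hat\Sigma = P'P/n$, and let $W_P = P(P'P)^-P'$ and $W = p(p'p)^-p'$ be the orthogonal projection operators for the linear spaces spanned by the columns of $P$ and $p$, respectively. Since the space spanned by $p$ is a subset of the space spanned by $P$, $W_P - W$ is positive semi-definite. Also, by Lemma A.4, $P'P$ is nonsingular w.p.a.1, in which case $u'W_Pu/n = (P'u/n)'\hat\Sigma^{-1}(P'u/n)$. Then by Lemma A.5 with $D_n = P'u/n$,

$$\operatorname{tr}(u'Wu/n) \le \operatorname{tr}(u'W_Pu/n) = \|\hat\Sigma^{-1/2}P'u/n\|^2 = O_p(\epsilon_n^2).$$

QED.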
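A numerical sketch of the two facts used in the proof of Lemma A.6: projecting onto a subspace of the column space of $P$ cannot increase $\operatorname{tr}(u'Wu)$, and $\operatorname{tr}(u'W_Pu)/n$ equals the quadratic form $(P'u/n)'\hat\Sigma^{-1}(P'u/n)$. The data are arbitrary random draws, and `proj` is a hypothetical helper.

```python
import numpy as np

def proj(X):
    """Orthogonal projection onto the column space of X (pseudo-inverse form)."""
    return X @ np.linalg.pinv(X)

rng = np.random.default_rng(3)
n, K = 200, 6
P = rng.normal(size=(n, K))
A = rng.normal(size=(K, 3))
p = P @ A                                   # p = PA spans a subspace of col(P)
u = rng.normal(size=(n, 2))

W_P, W = proj(P), proj(p)
t_small = np.trace(u.T @ W @ u) / n
t_big = np.trace(u.T @ W_P @ u) / n
Sigma_hat = P.T @ P / n
quad = np.trace((P.T @ u / n).T @ np.linalg.solve(Sigma_hat, P.T @ u / n))
print(t_small <= t_big, np.isclose(t_big, quad))
```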
For a matrix $p$ with $n$ rows, let $\hat\pi = (p'p)^-p'Y$ and $\hat G = p\hat\pi$.
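Lemma A.7: If $\operatorname{tr}(u'p(p'p)^-p'u/n) = O_p(\epsilon_n^2)$, then for any conformable matrix $\pi$,

$$\|\hat G - G\|^2/n \le O_p(\epsilon_n^2) + \|G - p\pi\|^2/n.$$

Proof: For $W$ as in the proof of Lemma A.6, $\hat G = WY$. By $W$ idempotent, $Y = G + u$, and $(I-W)p = 0$,

$$\|\hat G - G\|^2/n = \operatorname{tr}[Y'WY - Y'WG - G'WY + G'G]/n = \operatorname{tr}[u'Wu + G'(I-W)G]/n$$
$$= \operatorname{tr}[u'Wu + (G-p\pi)'(I-W)(G-p\pi)]/n \le O_p(\epsilon_n^2) + \|G-p\pi\|^2/n,$$

where the inequality uses the hypothesis and $\operatorname{tr}[B'(I-W)B] \le \|B\|^2$, since $I - W$ is idempotent. QED.

Lemma A.8: If $\lambda_{\min}(\Sigma) \ge C$, $\|p'p/n - \Sigma\| = o_p(1)$, and $\operatorname{tr}(u'p(p'p)^-p'u/n) = O_p(\epsilon_n^2)$, then for any conformable matrix $\pi$,

$$\|\hat\pi - \pi\|^2 \le O_p(\epsilon_n^2) + O_p(1)\|G-p\pi\|^2/n,$$
$$\operatorname{tr}[(\hat\pi-\pi)'\Sigma(\hat\pi-\pi)] \le O_p(\epsilon_n^2) + O_p(1)\|G-p\pi\|^2/n.$$

Proof: By Lemma A.4, $\lambda_{\min}(p'p/n) \ge C/2$ w.p.a.1, so $\lambda_{\max}((p'p/n)^{-1}) = O_p(1)$. Therefore, since $\hat G - p\pi = p(\hat\pi - \pi)$, by the triangle inequality and Lemma A.7,

$$\|\hat\pi-\pi\|^2 \le \lambda_{\max}((p'p/n)^{-1})\operatorname{tr}[(\hat\pi-\pi)'(p'p/n)(\hat\pi-\pi)] = O_p(1)\|\hat G - p\pi\|^2/n$$
$$\le O_p(1)[2\|\hat G - G\|^2/n + 2\|G-p\pi\|^2/n] \le O_p(\epsilon_n^2) + O_p(1)\|G-p\pi\|^2/n.$$

To prove the second conclusion, note that by the triangle inequality and the same arguments as for the previous equation,

$$\operatorname{tr}[(\hat\pi-\pi)'\Sigma(\hat\pi-\pi)] \le |\operatorname{tr}[(\hat\pi-\pi)'(\Sigma - p'p/n)(\hat\pi-\pi)]| + \operatorname{tr}[(\hat\pi-\pi)'(p'p/n)(\hat\pi-\pi)]$$
$$\le \|\hat\pi-\pi\|^2\|\Sigma - p'p/n\| + O_p(\epsilon_n^2) + O_p(1)\|G-p\pi\|^2/n = O_p(\epsilon_n^2) + O_p(1)\|G-p\pi\|^2/n.$$

QED.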
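The identity at the heart of Lemma A.7, $\|\hat G - G\|^2 = \operatorname{tr}(u'Wu) + \operatorname{tr}[G'(I-W)G]$, holds exactly in any sample. This sketch checks it for a power-series regression; the design ($x$ uniform, $G = \sin(3x)$, normal errors) is an illustrative assumption, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 300, 5
x = rng.uniform(-1, 1, size=n)
p = np.vander(x, K, increasing=True)           # regressors 1, x, ..., x^{K-1}
G = np.sin(3 * x)[:, None]                     # "true" values, one column
u = 0.3 * rng.normal(size=(n, 1))
Y = G + u

pi_hat = np.linalg.lstsq(p, Y, rcond=None)[0]  # (p'p)^- p'Y
G_hat = p @ pi_hat
W = p @ np.linalg.pinv(p)                      # projection onto col(p)

lhs = np.linalg.norm(G_hat - G) ** 2 / n
rhs = (np.trace(u.T @ W @ u) + np.trace(G.T @ (np.eye(n) - W) @ G)) / n
pi_any = rng.normal(size=(K, 1))               # any conformable pi
bound = np.trace(u.T @ W @ u) / n + np.linalg.norm(G - p @ pi_any) ** 2 / n
print(np.isclose(lhs, rhs), lhs <= bound)
```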
Lemma A.9: If $z_1,\ldots,z_n$ are i.i.d., then for any vector of functions $a^K(z) = (a_{1K}(z),\ldots,a_{KK}(z))'$ with $K = K(n)$,

$$\Big\|\sum_{i=1}^n a^K(z_i)/n - E[a^K(z)]\Big\| = O_p\big(\{E[\|a^K(z)\|^2]/n\}^{1/2}\big).$$

Proof: By the Cauchy-Schwarz inequality and independence,

$$E\Big[\Big\|\sum_{i=1}^n a^K(z_i)/n - E[a^K(z)]\Big\|\Big] \le \Big\{E\Big[\Big\|\sum_{i=1}^n a^K(z_i)/n - E[a^K(z)]\Big\|^2\Big]\Big\}^{1/2} \le \{E[\|a^K(z)\|^2]/n\}^{1/2},$$

so the conclusion follows by the Markov inequality. QED.
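A Monte Carlo sketch of the bound in the proof of Lemma A.9, assuming for illustration $z \sim U(0,1)$ and $a^K(z) = (z, z^2, \ldots, z^K)'$. For i.i.d. data, $E\|\bar a - E[a]\|^2 = (E\|a\|^2 - \|E[a]\|^2)/n$, which is at most $E\|a\|^2/n$.

```python
import numpy as np

rng = np.random.default_rng(5)
K, n, reps = 5, 50, 4000
k = np.arange(1, K + 1)
mean_a = 1.0 / (k + 1)                     # E[z^k] for z ~ U(0, 1)
second = np.sum(1.0 / (2 * k + 1))         # E||a^K(z)||^2 = sum_k E[z^{2k}]

z = rng.uniform(size=(reps, n))
a = z[:, :, None] ** k                     # a^K(z_i), shape (reps, n, K)
dev = a.mean(axis=1) - mean_a              # sample mean minus population mean
mc = np.mean(np.sum(dev ** 2, axis=1))     # Monte Carlo estimate of E||mean - E a||^2

exact = (second - np.sum(mean_a ** 2)) / n # exact value for i.i.d. data
print(mc <= second / n, abs(mc - exact) / exact < 0.1)
```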
Now let $\hat\Sigma = \sum_{i=1}^n P^K(x_i)P^K(x_i)'/n$ and $\Sigma = \int P^K(x)P^K(x)'dF(x)$.

Lemma A.10: Suppose that Assumptions 3.1 - 3.3 are satisfied. If Assumption 3.3 a) holds then $\|\hat\Sigma - \Sigma\| = o_p(1)$. If Assumption 3.3 b) holds then $\|\hat\Sigma - \Sigma\| = O_p([\zeta_0(K)^4/n]^{1/2}) = o_p(1)$.

Proof: For each $K$, let $\hat\Sigma_K = \sum_{i=1}^n P^K(x_i)P^K(x_i)'/n$, $\Sigma_K = \int P^K(x)P^K(x)'dF(x)$, and $a^K(x) = P^K(x)\otimes P^K(x)$, so that $\|\hat\Sigma_K - \Sigma_K\| = \|\sum_{i=1}^n a^K(x_i)/n - E[a^K(x)]\|$ and $\|a^K(x)\|^2 = \|P^K(x)\|^4$. To show the first conclusion, note that by the Cauchy-Schwarz inequality, as in the proof of Lemma A.9,

$$E\Big[\max_{\underline K\le K\le\bar K}\|\hat\Sigma_K - \Sigma_K\|\Big] \le \Big\{\sum_{\underline K\le K\le\bar K}E[\|P^K(x)\|^4]/n\Big\}^{1/2},$$

and that, w.p.a.1, $\|\hat\Sigma - \Sigma\| \le \max_{\underline K\le K\le\bar K}\|\hat\Sigma_K - \Sigma_K\|$. The first conclusion then follows by the Markov inequality and Assumption 3.3 a). To show the second conclusion, note that $E[\|a^K(x)\|^2] = E[\|P^K(x)\|^4] \le \zeta_0(K)^4$, so the conclusion follows by Lemma A.9 and Assumption 3.3 b). QED.
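A simulation sketch of Lemma A.10 under illustrative assumptions: $x \sim U(-1,1)$ and $P^K(x)$ the Legendre polynomials of degree $0, \ldots, K-1$ scaled by $\sqrt{2k+1}$, so that $\Sigma = I$. For this basis $\zeta_0(K) = \sup_x\|P^K(x)\| = K$, attained at $x = 1$, and $\|\hat\Sigma - \Sigma\|$ shrinks as $n$ grows. The helper `legendre_basis` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_basis(x, K):
    """P^K(x): Legendre polynomials of degree 0..K-1 scaled by sqrt(2k+1),
    which makes them orthonormal when x ~ U(-1, 1)."""
    V = legendre.legvander(x, K - 1)        # columns P_0(x), ..., P_{K-1}(x)
    return V * np.sqrt(2 * np.arange(K) + 1)

K = 6
zeta0 = np.linalg.norm(legendre_basis(np.array([1.0]), K))  # |P_k(x)| <= P_k(1) = 1
rng = np.random.default_rng(6)
errs = {}
for n in (200, 20000):
    P = legendre_basis(rng.uniform(-1, 1, size=n), K)
    Sigma_hat = P.T @ P / n
    errs[n] = np.linalg.norm(Sigma_hat - np.eye(K))         # Sigma = I here
print(zeta0, errs)
```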
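Let $y = (y_1,\ldots,y_n)'$, $g = (g_0(x_1),\ldots,g_0(x_n))'$, and $p = [p^K(x_1),\ldots,p^K(x_n)]'$.

Lemma A.11: If Assumptions 3.1 - 3.3 are satisfied, then $(y-g)'p(p'p)^-p'(y-g)/n = O_p(K/n)$.

Proof: Let $u = y - g$, $P_i = P^K(x_i)$, $P = [P_1,\ldots,P_n]'$, and $\Sigma = E[P'P]/n = E[P_iP_i']$. By Assumption 3.2, $P^K(x)$ is a nonsingular linear transformation of $p^K(x)$, so that $p = PA$ for a nonsingular matrix $A$, and $\lambda_{\min}(\Sigma) \ge C$. By Assumption 3.1 and the data i.i.d., $E[P_iu_i] = 0$ and $E[P_iP_i'u_i^2] = E[P_iP_i'E[u_i^2|x_i]] \le C\Sigma$ in the positive semi-definite sense, so that

$$E[\|\Sigma^{-1/2}P'u/n\|^2] = \operatorname{tr}\Big(\Sigma^{-1/2}\sum_{i=1}^n\sum_{j=1}^n E[P_iu_iP_j'u_j]\Sigma^{-1/2}\Big)/n^2 = \operatorname{tr}(\Sigma^{-1/2}E[P_iP_i'u_i^2]\Sigma^{-1/2})/n \le \operatorname{tr}(CI_K)/n = CK/n.$$

Therefore, by the Markov inequality, $\|\Sigma^{-1/2}P'u/n\| = O_p((K/n)^{1/2})$. Also, by Lemma A.10, $\|P'P/n - \Sigma\| = o_p(1)$. The conclusion then follows by Lemma A.6 with $\epsilon_n = (K/n)^{1/2}$. QED.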
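With homoskedastic errors, $E[u'Wu \mid x] = \sigma^2\operatorname{tr}(W) = \sigma^2K$, so the quantity in Lemma A.11 is of order $K/n$. This Monte Carlo sketch assumes $x \sim U(-1,1)$, a Legendre basis with $K$ terms, and normal errors with $\sigma = 0.5$; these choices are illustrative, not taken from the text.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(7)
n, K, sigma, reps = 400, 8, 0.5, 1000
x = rng.uniform(-1, 1, size=n)
p = legendre.legvander(x, K - 1)            # n x K regressor matrix, held fixed
stats = []
for _ in range(reps):
    u = sigma * rng.normal(size=n)          # E[u | x] = 0, Var(u | x) = sigma^2
    fitted = p @ np.linalg.lstsq(p, u, rcond=None)[0]   # W u
    stats.append(u @ fitted / n)            # u' W u / n
mean_stat = np.mean(stats)
print(mean_stat, sigma ** 2 * K / n)        # Monte Carlo average vs. sigma^2 K / n
```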
The next few lemmas give approximation rate results for power series and splines.

Lemma A.12: For power series, if the support $\mathcal X$ of $x$ is a compact box in $\mathbb R^r$ and $f(x)$ is continuously differentiable of order $\ell$, then there are $C$ and, for each $K$, $\pi_K$ such that $|f - p^{K\prime}\pi_K|_d \le CK^{-\alpha}$, where $\alpha = \ell/r$ for $d = 0$, and $\alpha = \ell - d$ when $r = 1$.

Proof: For the first conclusion, note that by $|\lambda(K)|$ monotonic increasing, the set of all linear combinations of $p^K(x)$ will include the set of all polynomials of degree $CK^{1/r}$ or less, for some $C$ small enough, so Theorem 8 of Lorentz (1986) applies. For the second conclusion, note that when $r = 1$, $d^dp^K(x)/dx^d$ is a spanning vector for power series up to order $K - d$, and $d^df(x)/dx^d$ is continuously differentiable of order $\ell - d$, so by the first conclusion there exists $\pi_K$ such that, for $f_K(x) = p^K(x)'\pi_K$,

$$\sup_{x\in\mathcal X}|d^df(x)/dx^d - d^df_K(x)/dx^d| \le CK^{-(\ell-d)}.$$

The second conclusion then follows by integration and boundedness of the support. For example, for $d = 1$, with the constant coefficient chosen so that $f_K(x)$ equals $f(x)$ at the minimum of the support, for all $x \in \mathcal X$,

$$|f(x) - f_K(x)| \le \int_{\mathcal X}|df(t)/dt - df_K(t)/dt|\,dt \le CK^{-(\ell-1)}.$$

QED.
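A numerical sketch of the power-series approximation rates in Lemma A.12 for $r = 1$, assuming the example $f(x) = |x|^3$ on $[-1,1]$, which is continuously differentiable of order $\ell = 2$. A least-squares fit on a dense grid stands in for the approximating $\pi_K$; the Chebyshev basis spans the same functions as $1, x, \ldots, x^{K-1}$. The sup error falls as $K$ grows.

```python
import numpy as np
from numpy.polynomial import Chebyshev

x = np.linspace(-1.0, 1.0, 4001)
f = np.abs(x) ** 3                          # continuously differentiable of order 2

def sup_error(K):
    """Sup-norm error on the grid of a least-squares fit by polynomials of degree K-1."""
    fit = Chebyshev.fit(x, f, deg=K - 1)    # same span as 1, x, ..., x^{K-1}
    return np.max(np.abs(fit(x) - f))

errors = {K: sup_error(K) for K in (5, 11, 21, 41)}
for K, e in errors.items():
    print(K, e)
```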
Lemma A.13: For power series, if the support $\mathcal X$ of $x$ is star-shaped and $f(x)$ is differentiable of all orders, with a constant $C$ such that for all multi-indices $\lambda$, $\max_{x\in\mathcal X}|\partial^\lambda f(x)| \le C^{|\lambda|}$, then for all $d$ and all $\alpha > 0$ there are $C_\alpha$ and $\pi_K$ such that $|f - p^{K\prime}\pi_K|_d \le C_\alpha K^{-\alpha}$.

Proof: For a function $f(x)$, let $P(f,m,x)$ denote the Taylor series up to order $m$ for an expansion around a point $\bar x$ with respect to which $\mathcal X$ is star-shaped, and let $m(K)$ be the largest integer such that $P(f,m(K),x)$ is a linear combination of $p^K(x)$. By the "natural ordering" hypothesis, there are constants $C_1, C_2 > 0$ such that $C_1m(K)^r \le K \le C_2m(K)^r$. Note that $\partial P(f,m,x)/\partial x_j = P(\partial f/\partial x_j, m-1, x)$, so that by induction $\partial^\lambda P(f,m,x) = P(\partial^\lambda f, m-|\lambda|, x)$. Since $\mathcal X$ is star-shaped, $\xi\bar x + (1-\xi)x \in \mathcal X$ for all $x \in \mathcal X$ and $0 \le \xi \le 1$, so that by the intermediate value form of the remainder, for $|\lambda| \le d \le m$ and a constant $C$ not depending on $m$,

$$\sup_{x\in\mathcal X}|\partial^\lambda f(x) - P(\partial^\lambda f, m-|\lambda|, x)| \le C^m/[(m-d)!].$$

Let $f_K(x) = P(f,m(K),x) = p^K(x)'\pi_K$. Then

$$|f - f_K|_d \le C^{m(K)}/[(m(K)-d)!] \le C_\alpha K^{-\alpha}$$

for every $\alpha > 0$, because $C^m/[(m-d)!]$ declines faster than any power of $m$ and $K \le C_2m(K)^r$. QED.

Lemma A.14: For splines, if $\mathcal X$ is a compact box in $\mathbb R^r$ and $f(x)$ is continuously differentiable of order $\ell$, then there are $C$ and $\pi_K$ such that, for $p^K(x)$ a spanning vector for splines of degree $m$, $|f - p^{K\prime}\pi_K|_d \le CK^{-\alpha}$, where $\alpha = \ell/r$ for $d = 0$, and $\alpha = \ell - d$ for $r = 1$ and $d \le m - 1$.

Proof: The result for $d = 0$ follows by Theorem 12.8 of Schumaker (1981). For the other case, $r = 1$ and $d \le m - 1$, note that $d^dp^K(x)/dx^d$ is a spanning vector for splines of degree $m - d$ with knot spacing bounded by $CK^{-1}$. Therefore, by Powell (1981), for $K$ large enough and some $C$ there exists $\pi_K$ such that, for $f_K(x) = p^K(x)'\pi_K$,

$$\sup_{x\in\mathcal X}|d^df(x)/dx^d - d^df_K(x)/dx^d| \le CK^{-(\ell-d)}.$$

The conclusion then follows by integration, similarly to the proof of Lemma A.12. QED.
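A one-dimensional sketch of the Taylor-series argument in Lemma A.13, assuming the example $f(x) = \exp(x-1)$ on $[0,1]$ (star-shaped around $\bar x = 0$), for which every derivative is bounded by $1 = C^{|\lambda|}$ with $C = 1$. The Lagrange remainder gives a sup error of at most $1/(m+1)!$, which falls faster than any power of $K = m + 1$; the last printed column multiplies the error by $K^3$ to show this.

```python
import math

def f(x):
    # exp(x - 1) on [0, 1]: every derivative is bounded by 1
    return math.exp(x - 1.0)

def taylor(x, m):
    """P(f, m, x): Taylor polynomial of order m of f around xbar = 0."""
    return sum(math.exp(-1.0) * x ** j / math.factorial(j) for j in range(m + 1))

def sup_error(m, grid=2001):
    return max(abs(f(i / (grid - 1)) - taylor(i / (grid - 1), m)) for i in range(grid))

results = {}
for m in (2, 4, 8, 12):
    err = sup_error(m)
    results[m] = err
    # Lagrange remainder: |f^{(m+1)}(xi)| x^{m+1} / (m+1)! <= 1 / (m+1)!
    print(m, err, err <= 1.0 / math.factorial(m + 1), (m + 1) ** 3 * err)
```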
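The next two Lemmas show that for power series and splines there exists $P^K(x)$ such that Assumption 3.2 is satisfied, and give explicit bounds on the series and their derivatives.

Lemma A.15: For power series, if the support of $x$ is a Cartesian product $\prod_{j=1}^r[\underline x_j, \bar x_j]$ of compact intervals, with density bounded below by $C\prod_{j=1}^r[(\bar x_j - x_j)(x_j - \underline x_j)]^\nu$, then Assumption 3.2 and equation (3.1) are satisfied, with $\zeta_d(K) \le CK^{1+\nu+2d}$.

Proof: Following the definitions in Abramowitz and Stegun (1972, Ch. 22), for exponent $a$ let $C_k^{(a)}(x)$ denote the ultraspherical polynomial of order $k$, let

$$h_k^{(a)} = \pi2^{1-2a}\Gamma(k+2a)/\{k!(k+a)[\Gamma(a)]^2\},$$

and define $p_k^{(a)}(x) = [h_k^{(a)}]^{-1/2}C_k^{(a)}(x)$, which are orthonormal on $[-1,1]$ with respect to the weight $(1-x^2)^{a-1/2}$. Also, let $\kappa_j(x_j) = (2x_j - \underline x_j - \bar x_j)/(\bar x_j - \underline x_j)$ and define

$$P_{kK}(x) = \prod_{j=1}^r p^{(\nu+.5)}_{\lambda_j(k)}(\kappa_j(x_j)),$$

where $\lambda(k) = (\lambda_1(k),\ldots,\lambda_r(k))'$ is the multi-index of the $k$th element of $p^K(x)$, and $P^K(x) = (P_{1K}(x),\ldots,P_{KK}(x))'$. By the "natural ordering" assumption (i.e. $|\lambda(k)|$ monotonic increasing), $P^K(x)$ is a nonsingular linear combination of $p^K(x)$. Also, for $\tilde F$ the distribution with pdf proportional to $\prod_{j=1}^r[(\bar x_j - x_j)(x_j - \underline x_j)]^\nu$ on $\prod_{j=1}^r[\underline x_j, \bar x_j]$,

$$\lambda_{\min}\Big(\int P^K(x)P^K(x)'dF(x)\Big) \ge C\lambda_{\min}\Big(\int P^K(x)P^K(x)'d\tilde F(x)\Big) = C,$$

where the inequality follows by the density of $x$ bounded below, and the equality by the change of variables $x \to \kappa(x)$ and the orthonormality of the ultraspherical polynomials.

Next, $dC_k^{(a)}(x)/dx = 2aC_{k-1}^{(a+1)}(x)$, so for $d \le k$ the $d$th derivative of $C_k^{(a)}(x)$ is a constant multiple of $C_{k-d}^{(a+d)}(x)$, and by 22.14.2 of Abramowitz and Stegun, $\max_{|x|\le1}|C_k^{(a)}(x)| \le \Gamma(k+2a)/[k!\Gamma(2a)]$. Combining these facts with the form of $h_k^{(a)}$ gives, for $a = \nu + .5$,

$$\max_{|x|\le1}|d^dp_k^{(\nu+.5)}(x)/dx^d| \le C(1+k)^{.5+\nu+2d}.$$

Since each element of $P^K(x)$ is a product of such terms, the derivatives of each $\kappa_j$ are constant, and by the natural ordering assumption $|\lambda(k)| \le CK^{1/r}$ for $k \le K$, it follows that $\sup_x|\partial^\mu P_{kK}(x)| \le CK^{.5+\nu+2|\mu|}$ for all $k \le K$. Therefore $\zeta_d(K) \le [K\cdot CK^{1+2\nu+4d}]^{1/2} = CK^{1+\nu+2d}$. QED.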
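A check of Lemma A.15 in its simplest case, assuming $r = 1$, $\nu = 0$ (so $a = 1/2$ and the ultraspherical polynomials are Legendre polynomials) and $x \sim U(-1,1)$. Orthonormality is verified by Gauss-Legendre quadrature, and $\zeta_0(K) = K$, $\zeta_1(K)$ are computed at $x = 1$, where every $|d^dP_k(x)/dx^d|$ attains its maximum. The helper `zeta` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

K = 8
scale = np.sqrt(2 * np.arange(K) + 1)       # p_k = sqrt(2k+1) P_k: orthonormal under U(-1, 1)

# Orthonormality via Gauss-Legendre quadrature (exact for these degrees).
nodes, weights = legendre.leggauss(K + 2)
V = legendre.legvander(nodes, K - 1) * scale
gram = V.T @ (V * (weights / 2)[:, None])   # E[P^K(x) P^K(x)'] for x ~ U(-1, 1)

def zeta(d):
    """||d^d P^K(x)/dx^d|| evaluated at x = 1."""
    vals = []
    for k in range(K):
        c = np.zeros(k + 1)
        c[k] = 1.0                          # Legendre coefficients of P_k
        vals.append(scale[k] * legendre.legval(1.0, legendre.legder(c, d)))
    return np.linalg.norm(vals)

print(np.allclose(gram, np.eye(K)), zeta(0), zeta(1))
```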
Lemma A.16: For splines, if Assumption 4.1 is satisfied, then Assumption 3.2 and equation (3.1) are satisfied, with $\zeta_d(K) \le CK^{(1/2)+d}$.

Proof: First, consider the case of a scalar argument with support $[-1,1]$. For a positive integer $L$, let the knots be $-1 + 2j/[L+1]$, $j = 0,\ldots,L+1$, with the end knots repeated, and let $B_{\ell L}(x)$, $\ell = 1,\ldots,L+m+1$, be the so-called normalized B-splines of degree $m$ for this evenly spaced knot sequence. Let $P_{\ell L}(x) = L^{1/2}B_{\ell L}(x)$ and $P_L(x) = (P_{1L}(x),\ldots,P_{L+m+1,L}(x))'$, and for general $r$ let the elements of $P^K(x)$ be products of $P_{\ell L}(\cdot)$ over the components of $x$, each component rescaled to $[-1,1]$. Then existence of a nonsingular matrix $A$ such that $P^K(x) = Ap^K(x)$ follows by inclusion of all multiplicative interactions of splines for components of $x$ corresponding to components of $h(x)$, and the usual basis result for B-splines (e.g. Theorem 19.2 of Powell, 1981). Also, the boundedness away from zero of the smallest eigenvalue follows by the argument of Burman and Chen (1989, p. 1587) that $\lambda_{\min}(\int_{-1}^1P_L(x)P_L(x)'dx) \ge C$ for all $L$, since changing the even knot spacing is equivalent to rescaling the argument of the B-splines, together with the density of $x$ bounded away from zero, analogously to the proof of Lemma A.15.

Next, a well known property of B-splines is that for each $x$ at most $m+1$ elements of $P_L(x)$ are nonzero, and the nonzero elements are bounded by $CL^{1/2}$, uniformly in $L$. Also, $\sup_x|d^dB_{\ell L}(x)/dx^d| \le CL^d$. Since each element of $P^K(x)$ is a product of at most $r$ such terms, the number of nonzero elements of $\partial^\mu P^K(x)$ is bounded uniformly in $x$ and $K$, and each is bounded by $CK^{(1/2)+|\mu|}$, implying the bounds on derivatives given in the conclusion. QED.
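A sketch of the B-spline facts used in Lemma A.16, assuming cubic splines ($m = 3$) on $[-1,1]$ with evenly spaced interior knots $-1 + 2j/(L+1)$ and clamped end knots, computed with the Cox-de Boor recursion. It checks that there are $L + m + 1$ basis functions forming a partition of unity, that at most $m + 1$ are nonzero at any point, and that after an illustrative rescaling by $(L+1)^{1/2}$ the smallest eigenvalue of the Gram matrix under $U(-1,1)$ stays roughly constant in $L$. The function `bspline_basis` is a hypothetical helper.

```python
import numpy as np

def bspline_basis(x, degree, L):
    """Clamped B-splines of the given degree on [-1, 1] with L evenly spaced interior
    knots -1 + 2j/(L+1), j = 1..L, via the Cox-de Boor recursion.
    Returns an array of shape (len(x), L + degree + 1)."""
    interior = -1.0 + 2.0 * np.arange(1, L + 1) / (L + 1)
    t = np.concatenate([np.full(degree + 1, -1.0), interior, np.full(degree + 1, 1.0)])
    x = np.asarray(x, dtype=float)
    nint = len(t) - 1
    B = np.zeros((x.size, nint))
    for i in range(nint):                    # degree-0 indicators of [t_i, t_{i+1})
        if t[i] < t[i + 1]:
            B[:, i] = (x >= t[i]) & (x < t[i + 1])
    last = max(i for i in range(nint) if t[i] < t[i + 1])
    B[x == t[-1], last] = 1.0                # close the last interval at x = 1
    for k in range(1, degree + 1):
        Bk = np.zeros((x.size, nint - k))
        for i in range(nint - k):
            if t[i + k] > t[i]:
                Bk[:, i] += (x - t[i]) / (t[i + k] - t[i]) * B[:, i]
            if t[i + k + 1] > t[i + 1]:
                Bk[:, i] += (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * B[:, i + 1]
        B = Bk
    return B

x = np.linspace(-1.0, 1.0, 40001)
degree = 3
checks = {}
for L in (10, 40):
    B = bspline_basis(x, degree, L)
    P = np.sqrt(L + 1) * B                   # illustrative rescaling
    lam = np.linalg.eigvalsh(P.T @ P / x.size).min()
    checks[L] = (B.shape[1],                           # L + degree + 1 functions
                 np.allclose(B.sum(axis=1), 1.0),      # partition of unity
                 int((B > 1e-12).sum(axis=1).max()),   # nonzero elements per point
                 lam)                                  # smallest Gram eigenvalue
    print(L, checks[L])
```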
Proof of Theorem 3.1: Because $P^K(x)$ is a constant nonsingular linear transformation of $p^K(x)$, the estimator is unchanged by replacing $p^K(x)$ with $P^K(x)$, and Assumption 3.4 will be satisfied for $P^K(x)$ replacing $p^K(x)$. Let $\pi$ be that from Assumption 3.4 with $d = 0$. Then

$$\sum_{i=1}^n[g_0(x_i) - P^K(x_i)'\pi]^2/n \le \sup_{x\in\mathcal X}|g_0(x) - P^K(x)'\pi|^2 = O(K^{-2\alpha}). \qquad (A.2)$$

By Lemma A.11, the hypothesis of Lemma A.7 is satisfied with $\epsilon_n = (K/n)^{1/2}$, $Y = y$, $G = g$, and $p = [P^K(x_1),\ldots,P^K(x_n)]'$. The first conclusion then follows by the conclusion of Lemma A.7 and eq. (A.2). Also, by Assumption 3.2 and Lemmas A.10 and A.11, the hypotheses of Lemma A.8 are satisfied with the same $\epsilon_n$ and $\Sigma = \int P^K(x)P^K(x)'dF(x)$. Then by the second conclusion of Lemma A.8 and eq. (A.2),

$$\int[g_0(x) - \hat g(x)]^2dF(x) \le 2\int[g_0(x) - P^K(x)'\pi]^2dF(x) + 2(\hat\pi-\pi)'\Sigma(\hat\pi-\pi) = O_p(K/n + K^{-2\alpha}). \qquad (A.3)$$

QED.

Proof of Theorem 3.2: Let $\pi$ be that from Assumption 3.4 for the given $d$. Since $|\cdot|_0 \le |\cdot|_d$, it follows as in the proof of Theorem 3.1 that eq. (A.2) and the hypotheses of Lemma A.8 are satisfied. Then by the first conclusion of Lemma A.8, $\|\hat\pi - \pi\| = O_p((K/n)^{1/2} + K^{-\alpha})$, and by the triangle inequality and the Cauchy-Schwarz inequality,

$$|\hat g - g_0|_d \le |g_0 - P^{K\prime}\pi|_d + |P^{K\prime}(\hat\pi-\pi)|_d \le O(K^{-\alpha}) + \zeta_d(K)\|\hat\pi-\pi\| = O_p(\zeta_d(K)[(K/n)^{1/2} + K^{-\alpha}]), \qquad (A.4)$$

where the last equality uses $\zeta_d(K)^2 \ge \zeta_0(K)^2 \ge \operatorname{tr}(\Sigma) \ge CK$, so that $\zeta_d(K)$ is bounded away from zero. QED.
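A simulation sketch of the series estimator covered by Theorems 3.1 and 3.2, under illustrative assumptions not taken from the text: $x \sim U(-1,1)$, $g_0(x) = \sin(\pi x)$, normal errors with $\sigma = 0.5$, $n = 500$, and a Legendre basis (same span as power series of order $K$). It reports the in-sample mean squared error and the sup-norm error on a grid for several $K$; both fall sharply once $K$ is large enough to capture $g_0$.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(8)
n, sigma = 500, 0.5
x = rng.uniform(-1, 1, size=n)
g0 = np.sin(np.pi * x)
y = g0 + sigma * rng.normal(size=n)

grid = np.linspace(-1, 1, 2001)
results = {}
for K in (1, 2, 4, 8, 16):
    p = legendre.legvander(x, K - 1)                     # n x K regressors
    pi_hat = np.linalg.lstsq(p, y, rcond=None)[0]        # least squares coefficients
    in_sample = np.mean((p @ pi_hat - g0) ** 2)          # sum_i [ghat(x_i) - g0(x_i)]^2 / n
    g_hat_grid = legendre.legvander(grid, K - 1) @ pi_hat
    sup_err = np.max(np.abs(g_hat_grid - np.sin(np.pi * grid)))  # |ghat - g0|_0 on grid
    results[K] = (in_sample, sup_err)
    print(K, in_sample, sup_err)
```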
Proof of Theorem 4.1: By Assumption 4.1 and Lemma A.3 there exists a representation $g_0(x) = \sum_t g_t(x_t)$, where for each $t$ the dimension of $x_t$ is less than or equal to $a$. Then by Lemmas A.12 and A.14 it follows that for each $t$ there is $\pi_t$ such that $|g_t - p^{K\prime}\pi_t|_0 \le CK^{-\alpha}$ with $\alpha = \ell/a$. Then by the triangle inequality, for $\pi = \sum_t\pi_t$,

$$|g_0 - p^{K\prime}\pi|_0 \le \sum_t|g_t - p^{K\prime}\pi_t|_0 \le CK^{-\alpha},$$

so that Assumption 3.4 is satisfied. Also, by Lemma A.15 for power series and Lemma A.16 for splines, Assumption 3.2 and equation (3.1) are satisfied, with $\zeta_0(K) = K$ for power series and $\zeta_0(K) = K^{1/2}$ for splines, and Assumption 4.2 implies that Assumption 3.3 holds. Then the conclusion follows by Theorem 3.1. QED.
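An arithmetic sketch of why the dimension $a$ in $\alpha = \ell/a$ matters, assuming the mean-square bound takes the form $K/n + K^{-2\alpha}$ with all constants set to one. Choosing $K = n^{1/(2\alpha+1)}$ balances the two terms, giving the rate $2n^{-2\alpha/(2\alpha+1)}$. The numbers ($n = 10{,}000$, $\ell = 2$, four regressors, additive components of dimension one) are hypothetical.

```python
def optimal_rate(n, ell, dim):
    """Balance K/n against K^(-2 alpha), alpha = ell/dim, by K = n^(1/(2 alpha + 1))."""
    alpha = ell / dim
    K = n ** (1.0 / (2.0 * alpha + 1.0))
    return K, K / n + K ** (-2.0 * alpha)

n, ell, r, a = 10_000, 2, 4, 1
K_full, rate_full = optimal_rate(n, ell, r)  # unrestricted: alpha = ell / r
K_add, rate_add = optimal_rate(n, ell, a)    # additive, a = 1: alpha = ell / a
print(K_full, rate_full)
print(K_add, rate_add)
```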
Proof of Theorem 4.2: It follows as in the proof of Theorem 4.1 that Assumptions 3.1 - 3.4 are satisfied, with $\zeta_0(K) = K$ for power series and $\zeta_0(K) = K^{1/2}$ for splines. The conclusion then follows by Theorem 3.2. QED.
Proof of Theorem 4.3: Follows as in the proof of Theorem 4.2, except that Assumption 3.4 is now satisfied with $\alpha = \ell - d$, by Lemmas A.12 and A.14, and Assumption 3.2 and equation (3.1) are now satisfied with $\zeta_d(K)$ as given in Lemma A.15 for power series and $\zeta_d(K) = K^{(1/2)+d}$ for splines, by Lemma A.16. QED.

Proof of Theorem 4.4: Follows as in the proof of Theorem 4.3, except that Lemma A.13 is applied to show that Assumption 3.4 holds for any $\alpha > 0$. QED.

Proof of Theorem 5.1: The proof is similar to that of Theorems 4.1 and 4.2. Let $P^K(x) = w\otimes\tilde P^K(u)$, where $\tilde P^K(u)$ is equal to the vector from the conclusion of Lemma A.15 or A.16, for power series and splines respectively, with $u$ replacing $x$. By Lemmas A.12 and A.14, Assumption 3.4 is satisfied with $\alpha = \ell/r$. Also, since the smallest eigenvalue of $E[ww'|u]$ is bounded away from zero,

$$E[P^K(x)P^K(x)'] = E[E[ww'|u]\otimes\tilde P^K(u)\tilde P^K(u)'] \ge CE[I\otimes\tilde P^K(u)\tilde P^K(u)']$$

in the positive semi-definite sense, so the smallest eigenvalue of $E[P^K(x)P^K(x)']$ is bounded away from zero. Also, since $w$ is bounded, bounds on elements of $P^K(x)$ are the same, up to a constant multiple, as bounds on elements of $\tilde P^K(u)$, so that Assumption 3.3 will hold. The conclusion then follows by the conclusions to Theorems 3.1 and 3.2. QED.
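A numerical sketch of the positive semi-definite ordering used in the proof of Theorem 5.1, assuming for illustration that $u$ takes $M$ equally likely values, with random $E[ww' \mid u] \ge cI$ and random vectors standing in for $\tilde P^K(u)$. The difference $E[E[ww'|u]\otimes PP'] - cE[I\otimes PP']$ is a mixture of Kronecker products of p.s.d. matrices, hence p.s.d., and $\lambda_{\min}(I\otimes B) = \lambda_{\min}(B)$.

```python
import numpy as np

rng = np.random.default_rng(9)
J, K, M, c = 2, 4, 50, 0.3
probs = np.full(M, 1.0 / M)                  # u takes M equally likely values
lhs = np.zeros((J * K, J * K))               # E[ E[ww'|u] (x) P(u)P(u)' ]
rhs = np.zeros((J * K, J * K))               # c E[ I (x) P(u)P(u)' ]
EPP = np.zeros((K, K))                       # E[ P(u)P(u)' ]
for m in range(M):
    B = rng.normal(size=(J, J))
    S = B @ B.T + c * np.eye(J)              # E[ww' | u = u_m], smallest eigenvalue >= c
    Pm = rng.normal(size=(K, 1))             # stand-in for P(u_m)
    lhs += probs[m] * np.kron(S, Pm @ Pm.T)
    rhs += probs[m] * c * np.kron(np.eye(J), Pm @ Pm.T)
    EPP += probs[m] * (Pm @ Pm.T)

gap = np.linalg.eigvalsh(lhs - rhs).min()    # >= 0 up to rounding
lam_lhs = np.linalg.eigvalsh(lhs).min()
lam_rhs = np.linalg.eigvalsh(rhs).min()
print(gap >= -1e-10, lam_lhs >= lam_rhs - 1e-10)
```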
References

Abramowitz, M. and Stegun, I.A., eds. (1972). Handbook of Mathematical Functions. Washington, D.C.: Commerce Department.

Agarwal, G. and Studden, W. (1980). Asymptotic integrated mean square error using least squares and bias minimizing splines. Annals of Statistics. 8 1307-1325.

Andrews, D.W.K. (1991). Asymptotic normality of series estimators for various nonparametric and semiparametric models. Econometrica. 59 307-345.

Andrews, D.W.K. and Whang, Y.J. (1990). Additive interactive regression models: Circumvention of the curse of dimensionality. Econometric Theory. 6 466-479.

Bickel, P., Klaassen, C.A.J., Ritov, Y., and Wellner, J.A. (1993). Efficient and adaptive inference in semiparametric models. Monograph, forthcoming.

Breiman, L. and Friedman, J.H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association. 80 580-598.

Breiman, L. and Stone, C.J. (1978). Nonlinear additive regression. Note.

Buja, A., Hastie, T., and Tibshirani, R. (1989). Linear smoothers and additive models. Annals of Statistics. 17 453-510.

Burman, P. and Chen, K.W. (1989). Nonparametric estimation of a regression function. Annals of Statistics. 17 1567-1596.

Cox, D.D. (1988). Approximation of least squares regression on nested subspaces. Annals of Statistics. 16 713-732.

Friedman, J. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association. 76 817-823.

Gallant, A.R. (1981). On the bias in flexible functional forms and an essentially unbiased form: The Fourier flexible form. Journal of Econometrics. 15 211-245.

Gallant, A.R. and Souza, G. (1991). On the asymptotic normality of Fourier flexible form estimates. Journal of Econometrics. 50 329-353.

Lorentz, G.G. (1986). Approximation of Functions. New York: Chelsea Publishing Company.

Newey, W.K. (1988). Adaptive estimation of regression models via moment restrictions. Journal of Econometrics. 38 301-339.

Newey, W.K. (1993a). The asymptotic variance of semiparametric estimators. Preprint, MIT Department of Economics.

Newey, W.K. (1993b). Series estimation of regression functionals. Econometric Theory, forthcoming.

Powell, M.J.D. (1981). Approximation Theory and Methods. Cambridge, England: Cambridge University Press.

Rao, C.R. (1973). Linear Statistical Inference and Its Applications. New York: Wiley.

Riedel, K.S. (1992). Smoothing spline growth curves with covariates. Preprint, Courant Institute, New York University.

Schumaker, L.L. (1981). Spline Functions: Basic Theory. New York: Wiley.

Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Annals of Statistics. 10 1040-1053.

Stone, C.J. (1985). Additive regression and other nonparametric models. Annals of Statistics. 13 689-705.

Stone, C.J. (1990). $L_2$ rate of convergence for interaction spline regression. Tech. Rep. No. 268, Berkeley.

Wahba, G. (1984). Cross-validated spline methods for the estimation of multivariate functions from data on functionals. In Statistics: An Appraisal, Proceedings 50th Anniversary Conference Iowa State Statistical Laboratory (H.A. David and H.T. David, eds.) 205-235. Ames, Iowa: Iowa State University Press.

Zeldin, M.D. and Thomas, D.M. (1975). Ozone trends in the Eastern Los Angeles basin corrected for meteorological variations. Proceedings International Conference on Environmental Sensing and Assessment, 2, held September 14-19, 1975, in Las Vegas, Nevada.