working paper
Department of Economics
Massachusetts Institute of Technology
50 Memorial Drive, Cambridge, Mass. 02139

CONVERGENCE RATES FOR SERIES ESTIMATORS

Whitney K. Newey
MIT Department of Economics
Working Paper No. 93-10, July 1993

This paper consists of part of one originally titled "Consistency and Asymptotic Normality of Nonparametric Projection Estimators." Helpful comments were provided by Andreas Buja and financial support by the NSF and the Sloan Foundation.

Abstract

Least squares projections are a useful way of describing the relationship between random variables. These include conditional expectations and projections on additive functions. Series estimators, i.e. regressions on a finite dimensional vector whose dimension grows with the sample size, provide a convenient way of estimating such projections. This paper gives convergence rates for these estimators. General results are derived, and primitive regularity conditions are given for power series and splines.

Keywords: Nonparametric regression, additive interactive models, random coefficients, polynomials, splines, convergence rates.

1. Introduction

Least squares projections of a random variable y on functions of a random vector x provide a useful way of describing the relationship between y and x. The simplest example is linear regression, the least squares projection on the set of linear combinations of x, as exemplified in Rao (1973, Chapter 4). An interesting nonparametric example is the conditional expectation, the projection on the set of all functions of x with finite mean square. There are also a variety of projections that fall in between these two polar cases, where the set of functions is larger than all linear combinations but smaller than all functions.
One example is an additive regression, the projection on functions that are additive in the different elements of x. This case is motivated partly by the difficulty of estimating conditional expectations when x has many components; see Breiman and Friedman (1985), Breiman and Stone (1978), Friedman and Stuetzle (1981), Stone (1985), and Zeldin and Thomas (1977). A generalization that includes some interaction terms is the projection on functions that are additive in some subvectors of x. Another example is random linear combinations of functions of x, as suggested by Riedel (1992) for growth curve estimation.

One simple way to estimate nonparametric projections is by regression on a finite dimensional subset, with dimension allowed to grow with the sample size, e.g. as in Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), and Andrews (1991), which will be referred to here as series estimation. This type of estimator may not be good at recovering the "fine structure" of the projection relative to other smoothers, e.g. see Buja, Hastie, and Tibshirani (1989), but it is computationally simple. Also, projections often show up as nuisance functions in semiparametric estimation, where the fine structure is less important.

This paper derives convergence rates for series estimators of projections. Convergence rates are important because they show how dimension affects the asymptotic accuracy of the estimators (e.g. Stone 1982, 1985). Also, they are useful for the theory of semiparametric estimators that depend on projection estimates (e.g. Newey 1993a). The paper gives mean-square rates for estimation of the projection and uniform convergence rates for estimation of functions and derivatives. Fully primitive regularity conditions are given for power series and regression splines, as well as more general conditions that may apply to other types of series.
Previous work on convergence rates for series estimates includes Agarwal and Studden (1980), Cox (1988), Stone (1985, 1990), and Andrews and Whang (1990). This paper improves on many previous results in the convergence rate or generality of regularity conditions. Uniform convergence rates for functions and their derivatives are given, and some of the results allow for a data-based number of approximating terms, unlike all but Cox (1988). Also, the projection does not have to equal the conditional expectation, as in Stone (1985, 1990) but not the others.

2. Series Estimators

The results of this paper concern estimators of least squares projections that can be described as follows. Let z denote a data observation, let y and x be functions of z, with x having dimension r, and let 𝒢 denote a mean-squared closed, linear subspace of the set of all (measurable) functions of x with finite mean-square. The projection of y on 𝒢 is

(2.1)  g₀(x) = argmin_{g∈𝒢} E[{y − g(x)}²].

An example is the conditional expectation g₀(x) = E[y|x], where 𝒢 is the set of all measurable functions of x with finite mean-square. Two further examples will be used as illustrations, and are of interest in their own right.

Additive-Interactive Projections: When x has more than a few distinct components it is difficult to estimate E[y|x], a feature often referred to as the "curse of dimensionality." This problem motivates projections that are additive in functions of subvectors of x, so that the individual components have smaller dimension than x. One general way to describe these is to let x_ℓ (ℓ = 1, ..., L) be distinct subvectors of x and specify the space of functions as

(2.2)  𝒢 = {Σ_{ℓ=1}^L g_ℓ(x_ℓ) : E[g_ℓ(x_ℓ)²] < ∞}.

For example, if each x_ℓ is just a component of x, then 𝒢 consists of additive functions. The projection on this set generalizes linear regression to allow for nonparametric nonlinearities in individual regressors. The set of equation (2.2) is a further generalization that allows for nonlinear interactive terms. For example, if each x_ℓ is one or two dimensional, then this set would allow for just pairwise interactions.
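As a concrete illustration of the additive class in equation (2.2), the following sketch (hypothetical data and function names, not from the paper) builds a power-series design matrix in which every column depends on a single one-dimensional subvector, so that any fitted linear combination is automatically additive:

```python
import numpy as np

# Hypothetical sketch: an additive power-series basis for the set in
# equation (2.2) with L = 2 one-dimensional subvectors x1 and x2.
# Each approximating term depends on one subvector only, so every
# linear combination of the columns lies in the additive class.
def additive_basis(x1, x2, degree):
    cols = [np.ones_like(x1)]          # constant term
    for j in range(1, degree + 1):
        cols.append(x1 ** j)           # terms in x1 alone
        cols.append(x2 ** j)           # terms in x2 alone
    return np.column_stack(cols)       # no cross terms like x1 * x2

x1 = np.linspace(-1.0, 1.0, 5)
x2 = np.linspace(0.0, 1.0, 5)
B = additive_basis(x1, x2, degree=3)   # shape (5, 1 + 2 * 3)
```

Allowing two-dimensional subvectors would simply add columns in products of pairs of regressors, giving the pairwise-interaction class mentioned above.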
Covariate Interactive Projections: As discussed in Riedel (1992), problems in growth curve estimation motivate considering projections that are random linear combinations of functions. To describe these, suppose x = (w,u), where w = (w₁, ..., w_L)' is a vector of covariates, and let H_ℓ (ℓ = 1, ..., L) be sets of functions of u. Consider the set of functions

(2.3)  𝒢 = {Σ_{ℓ=1}^L w_ℓ h_ℓ(u) : h_ℓ ∈ H_ℓ},

where E[ww'|u] is nonsingular with probability one. In a growth curve application u represents time, so that each h_ℓ(u) represents a covariate coefficient that is allowed to vary over time in a general way.

The estimators of g₀(x) considered here are sample projections on a finite dimensional subspace of 𝒢, which can be described as follows. Let p^K(x) = (p_{1K}(x), ..., p_{KK}(x))' be a vector of functions, each of which is an element of 𝒢. Denote the data observations by y_i and x_i (i = 1, 2, ...), and for sample size n let y = (y₁, ..., y_n)' and p^K = [p^K(x₁), ..., p^K(x_n)]'. An estimator of g₀(x) is

(2.4)  ĝ(x) = p^K(x)'π̂,   π̂ = (p^K'p^K)⁻p^K'y,

where (·)⁻ denotes a generalized inverse, and K subscripts for π̂ and ĝ(x) have been suppressed for notational convenience. The matrix p^K'p^K will be asymptotically nonsingular under conditions given below, making the choice of generalized inverse asymptotically irrelevant.

The idea of sample projection estimators is that 1) each component of p^K(x) is an element of 𝒢, and 2) p^K(x) "spans" 𝒢 as K grows, i.e. linear combinations of p^K(x) can approximate any function in 𝒢 arbitrarily closely (in mean square). Under 1), π̂ estimates π_K = (E[p^K(x)p^K(x)'])⁻¹E[p^K(x)y] = (E[p^K(x)p^K(x)'])⁻¹E[p^K(x)g₀(x)], the coefficients of the projection of g₀(x) on p^K(x). Thus, under 2), K can be chosen big enough that p^K(x)'π_K approximates g₀(x), and if this approximation error and the estimation error in π̂ are small, ĝ(x) should approximate g₀(x). Two types of approximating functions will be considered in detail. They are power series and splines.
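The sample projection in equation (2.4) is just least squares on the K series terms, with a generalized inverse guarding against singularity. A minimal sketch in Python (the data-generating process, sample size, and choice of K here are illustrative assumptions, not from the paper):

```python
import numpy as np

# Minimal sketch of the series estimator in equation (2.4):
#   ghat(x) = p^K(x)' pihat,  pihat = (p'p)^- p'y,
# using a power-series basis in one regressor and hypothetical data.
rng = np.random.default_rng(0)
n, K = 200, 4
x = rng.uniform(-1.0, 1.0, n)
y = np.exp(x) + 0.1 * rng.standard_normal(n)   # g0(x) = exp(x) here

# p^K(x_i) = (1, x_i, x_i^2, x_i^3); in the asymptotic theory the
# dimension K would grow with n.
P = np.vander(x, K, increasing=True)

# A generalized inverse (pinv) makes the choice of inverse irrelevant
# when P'P is nearly singular, as noted in the text.
pihat = np.linalg.pinv(P.T @ P) @ P.T @ y
ghat = P @ pihat

# Sample mean square error relative to the true projection.
mse = np.mean((ghat - np.exp(x)) ** 2)
```

With a smooth target such as exp(x), even a small K gives a small sample mean square error, anticipating the bias/variance trade-off quantified in Section 3.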
Power Series: Let λ = (λ₁, ..., λ_r)' denote an r-dimensional vector of nonnegative integers, i.e. a multi-index, with norm |λ| = Σ_{j=1}^r λ_j, and let x^λ ≡ Π_{j=1}^r x_j^{λ_j}. For a sequence (λ(k))_{k=1}^∞ of distinct such vectors, a power series approximation corresponds to

(2.5)  p_{kK}(x) = x^{λ(k)}   (k = 1, 2, ...).

Throughout the paper it will be assumed that the λ(k) are ordered so that |λ(k)| is monotonically increasing. For estimating the conditional expectation E[y|x], it will also be required that (λ(k))_{k=1}^∞ include all distinct multi-indices. This requirement is imposed so that E[y|x] can be approximated by a power series. Additive-interactive projections can be estimated by restricting the multi-indices so that each term is an element of 𝒢. This can be accomplished by requiring that the only λ(k) that are included are those where the indices of nonzero elements are the same as the indices of a subvector x_ℓ for some ℓ. In addition, covariate interactive terms can be estimated by taking the multi-indices to have the same dimension as u and specifying the approximating functions to be p_{kK}(x) = w_{ℓ(k)} u^{λ(k)}, where ℓ(k) is an integer that selects a component of w.

Power series have a potential drawback of being sensitive to outliers. It may be possible to make them less sensitive by using power series in a bounded, one-to-one transformation of the original data. An example would be to replace each component of x by a logit transformation 1/(1+e^{−x_j}). The theory to follow uses orthogonal polynomials, which may help alleviate the well known multicollinearity problem for power series.
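The conditioning benefit of orthogonal polynomials can be illustrated numerically. The sketch below (hypothetical simulated data) compares the condition number of the sample second moment matrix for a raw power basis with that for Legendre polynomials, which are orthogonal with respect to the uniform weight on [−1,1]:

```python
import numpy as np
from numpy.polynomial import legendre

# Sketch of the multicollinearity point: for x roughly uniform on
# [-1, 1], the raw powers (1, x, ..., x^{K-1}) give a badly
# conditioned second moment matrix, while the same span expressed in
# Legendre polynomials is well conditioned.  Data are hypothetical.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
K = 10

P_raw = np.vander(x, K, increasing=True)      # raw power basis
P_leg = legendre.legvander(x, K - 1)          # Legendre basis, same span

cond_raw = np.linalg.cond(P_raw.T @ P_raw / len(x))
cond_leg = np.linalg.cond(P_leg.T @ P_leg / len(x))
```

Because the two bases span the same space, the fitted values are numerically invariant to the choice; only the conditioning of the least squares problem changes, which is exactly the point made in the next paragraph.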
If each x^{λ(k)} is replaced with the product of orthogonal polynomials of orders corresponding to the components of λ(k), orthogonal with respect to some weight function on the range of x, and the distribution of x is similar to this weight, then there should be little collinearity among the different terms. The estimator will be numerically invariant to such a replacement (because |λ(k)| is monotonically increasing), but the replacement may alleviate the well known multicollinearity problem for power series.

Regression Splines: A regression spline is a series estimator where the approximating function is a smooth piecewise polynomial with fixed knots (join points). Splines have some attractive features relative to power series, including being less sensitive to outliers and to singularities in the function being approximated, and being less oscillatory. A disadvantage is that the theory requires that the knots be placed in the support and be nonrandom (as in Stone, 1985), so that the support must be known. The power series theory does not require a known support.

To describe regression splines it is convenient to begin with the one-dimensional case. For convenience, suppose that the support of x is [-1,1] (it can always be normalized to take this form) and that the knots are evenly spaced. Let (v)₊ = 1(v > 0)·v. An m-th degree spline with evenly spaced interior knots t_j = −1 + 2j/(L+1), (j = 1, ..., L), on [-1,1] is a linear combination of

(2.6)  p_{kL}(v) = v^{k−1} for 1 ≤ k ≤ m+1,   p_{kL}(v) = {[v − t_{k−m−1}]₊}^m for m+2 ≤ k ≤ m+L+1.

Multivariate spline terms can be formed by interacting univariate ones for different components of x. For a set of multi-indices {λ(k)}, with λ_j(k) ≤ m+L+1 for each j and k, the approximating functions will be products of univariate splines,

(2.7)  p_{kK}(x) = Π_{j: λ_j(k)≥1} p_{λ_j(k),L}(x_j)   (k = 1, ..., K).
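A minimal implementation of the one-dimensional basis in equation (2.6), with the polynomial part plus truncated powers at evenly spaced interior knots (the exact knot layout is one concrete choice consistent with the description above, and the data are hypothetical):

```python
import numpy as np

# Sketch of the one-dimensional spline basis of equation (2.6):
# powers v^0, ..., v^m plus truncated powers [v - t_j]_+^m at L
# evenly spaced interior knots t_j on [-1, 1].
def spline_basis(v, m, L):
    knots = -1.0 + 2.0 * np.arange(1, L + 1) / (L + 1)  # interior knots
    cols = [v ** k for k in range(m + 1)]               # polynomial part
    for t in knots:
        cols.append(np.maximum(v - t, 0.0) ** m)        # truncated powers
    return np.column_stack(cols)                        # n x (m + L + 1)

v = np.linspace(-1.0, 1.0, 50)
B = spline_basis(v, m=3, L=4)
```

Each truncated power is m−1 times continuously differentiable at its knot, which is what makes the linear combinations smooth piecewise polynomials.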
Note that corresponding to each k there is a number of knots for each component of x and a choice of which multiplicative components to include. Throughout the paper it will be assumed that the ratio of the numbers of knots for each pair of components of x is bounded above and below. For estimating the conditional expectation E[y|x], it will also be required that (λ(k))_{k=1}^∞ include all distinct multi-indices. This requirement is imposed so that E[y|x] can be approximated by interactive splines. Additive-interactive projections can be estimated by restricting the multi-indices in the same way as for power series. Also, covariate interactive terms can be estimated by forming the approximating functions as products of elements of w with splines in u, analogously to the power series case.

The theory to follow uses B-splines, which are a linear transformation of the above basis that is nonsingular on [-1,1] and has low multicollinearity. The low multicollinearity of B-splines and a recursive formula for their calculation also lead to computational advantages; e.g. see Powell (1981).

Series estimates depend on the choice of the number of terms K, and it is desirable to choose K based on the data. With a data-based choice of K, these estimates have the flexibility to adjust to conditions in the data. For example, one might choose K by delete-one cross validation, i.e. by minimizing the sum of squared residuals Σ_{i=1}^n [y_i − ĝ_{−i,K}(x_i)]², where ĝ_{−i,K}(x_i) is the estimate of the regression function computed from all the observations but the i-th. Some of the results to follow will allow for such data-based K.

3. General Convergence Rates

This section derives some convergence rates for general series estimators. To do this it is useful to introduce some conditions. Let u = y − g₀(x) and u_i = y_i − g₀(x_i). Also, for a matrix D let ||D|| = [trace(D'D)]^{1/2}, and for a random matrix Y let ||Y||_v = {E[||Y||^v]}^{1/v} for v < ∞, and let ||Y||_∞ be the infimum of constants C such that Prob(||Y|| < C) = 1.

Assumption 3.1: {(y_i, x_i)} is i.i.d. and E[u²|x] is bounded on the support of x.

The bounded second conditional moment assumption is quite common in the literature (e.g. Stone, 1985). Apparently it can be relaxed only at the expense of affecting the convergence rates, so to avoid further complication this assumption is retained.
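The delete-one cross-validation rule for choosing K described in Section 2 can be sketched as follows, using the standard leave-one-out identity for linear smoothers so that the n refits are never actually computed (data and the candidate set for K are hypothetical):

```python
import numpy as np

# Hypothetical data for illustrating data-based choice of K.
rng = np.random.default_rng(1)
n = 150
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(2.0 * x) + 0.2 * rng.standard_normal(n)

def loo_cv(K):
    # Delete-one cross-validation score for a K-term power series.
    # For a linear smoother with hat matrix H, the leave-one-out
    # residual is e_i / (1 - H_ii), so no refitting is needed.
    P = np.vander(x, K, increasing=True)
    H = P @ np.linalg.pinv(P.T @ P) @ P.T
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

scores = {K: loo_cv(K) for K in range(1, 9)}
K_hat = min(scores, key=scores.get)   # data-based number of terms
```

The random K_hat produced this way is exactly the kind of data-based number of terms that Assumption 3.3 below is designed to accommodate.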
The next assumption is useful for controlling the second moment matrix of the series terms.

Assumption 3.2: For each K there is a constant, nonsingular matrix A such that, for P^K(x) = Ap^K(x), the smallest eigenvalue of E[P^K(x)P^K(x)'] is bounded away from zero uniformly in K.

Since the estimator ĝ(x) is invariant to nonsingular linear transformations, there is really no need to distinguish between p^K(x) and P^K(x) at this point. An explicit transformation A is allowed for in order to emphasize that Assumption 3.2 is only needed for some transformation. For example, Assumption 3.2 will not apply to power series, but will apply to orthonormal polynomials. Assumption 3.2 is a normalization that leads to the series terms having specific magnitudes. The regularity conditions will also require that the magnitude of P^K(x) not grow too fast with the sample size. The size of P^K(x) will be quantified by

(3.1)  ζ_d(K) = sup_{|λ|≤d, x∈𝒳} ||∂^λ P^K(x)||,

where 𝒳 is the support of x, λ denotes a vector of nonnegative integers, ∂^λ P^K(x) = ∂^{|λ|}P^K(x)/∂x₁^{λ₁}···∂x_r^{λ_r}, and ||D|| = [trace(D'D)]^{1/2} for a matrix D. That is, ζ_d(K) is the supremum of the norms of the derivatives of order up to d. The following condition places some limits on the growth of the series magnitude. Also, it allows for data-based choice of K, at the expense of imposing that series terms are nested.

Assumption 3.3: There are K̲(n) and K̄(n) such that K̲(n) ≤ K ≤ K̄(n) with probability approaching one, and either: a) p^K(x) is a subvector of p^{K+1}(x) for all K, and ζ₀(K̄(n))⁴/n → 0; or b) the P^K(x) of Assumption 3.2 are such that P^K(x) is a subvector of P^{K+1}(x) for all K, and ζ₀(K̄(n))²K̄(n)/n → 0.
As previously noted, a series estimate is invariant to nonsingular linear transformations, so that in part a) it suffices that any such transformation form a nested sequence of vectors. Part b) is more restrictive, in requiring that the {P^K(x)} from Assumption 3.2 be nested, but imposes a less stringent requirement on the growth rate of K̄(n). Also, if K is nonrandom, so that K = K̲(n) = K̄(n), the nested sequence requirement of both part a) and part b) will be satisfied, because that requirement is vacuous when K̲(n) = K̄(n).

In order to specify primitive hypotheses for Assumptions 3.2 and 3.3, it must be possible to find P^K(x) satisfying the eigenvalue condition and to derive explicit values for, or bounds on, ζ₀(K). That is, one needs explicit bounds on series terms whose second moment matrix has eigenvalues bounded away from zero. It is possible to derive such bounds for both power series and regression splines when x is continuously distributed with a density bounded away from zero. These bounds lead to the requirements that K²/n → 0 for regression splines with nonrandom K and K⁴/n → 0 for power series, and the results are described in Sections 5 and 6. It is also possible to derive such results for Fourier series, but this is not done here, because they are most suitable for approximation of periodic functions, which have fewer applications. It may also be possible to derive results for Gallant's (1981) Fourier flexible form, although this is more difficult, as described in Gallant and Souza (1991). In terms of this paper, the problem with the Fourier flexible form is that the linear and quadratic terms can be approximated extremely quickly by the Fourier terms, leading to a multicollinearity problem so severe that simultaneous satisfaction of Assumptions 3.2 and 3.3 would impose very slow growth rates on K.

Assumptions 3.1 - 3.3 are useful for controlling the variance of a series estimator. The bias is the error from the finite dimensional approximation.
A supremum Sobolev norm will be used to quantify this approximation. For a measurable function f(x) defined on 𝒳 and a nonnegative integer d, let |f|_d = max_{|λ|≤d} sup_{x∈𝒳} |∂^λ f(x)|, equal to infinity if ∂^λ f(x) does not exist for some |λ| ≤ d and x ∈ 𝒳. Many of the results will be based on the following polynomial approximation rate condition.

Assumption 3.4: There is a nonnegative integer d̄ and constants C, α > 0 such that for all K there is π with |g₀ − p^K'π|_{d̄} ≤ CK^{−α}.

This condition is not primitive, but is known to be satisfied in many cases. Typically, the higher the degree of derivative of g₀(x) that exists, the bigger α and/or d̄ can be chosen. This type of primitive condition will be explicitly discussed for power series in Section 5 and for splines in Section 6. It is also possible to obtain results when the approximation rate is for an L² norm, rather than the sup norm. However, this generalization leads to much more complicated results, and so is not given here.

These assumptions will imply both mean-square and uniform convergence rates for the series estimate. The first result gives mean-square rates. Let F(x) denote the CDF of x.

Theorem 3.1: If Assumptions 3.1 - 3.4 are satisfied for d̄ = 0, then

Σ_{i=1}^n [ĝ(x_i)−g₀(x_i)]²/n = O_p(K/n + K^{−2α}),   ∫[ĝ(x)−g₀(x)]² dF(x) = O_p(K/n + K^{−2α}).

The two terms in the convergence rate essentially correspond to variance and bias. The first conclusion, on sample mean square error, is similar to those of Andrews and Whang (1990) and Newey (1993b), but the hypotheses are different. Here the number of terms is allowed to depend on the data, and the projection residual need not satisfy E[u|x] = 0, at the expense of requiring Assumptions 3.2 and 3.3, which were not imposed in these other papers. Also, the second conclusion, on integrated mean square error, has not been previously given at this level of generality, although Stone (1985) gave specific results for spline estimation of an additive projection.
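The variance and bias terms in the rate of Theorem 3.1 can be balanced in the usual way; the following calculation is a standard consequence of the theorem's rate (not a separate result in the paper), and shows the implied order of K:

```latex
\frac{K}{n} = K^{-2\alpha}
\;\Longleftrightarrow\; K^{2\alpha+1} = n
\;\Longleftrightarrow\; K = n^{1/(2\alpha+1)},
\qquad\text{giving}\qquad
\frac{K}{n} + K^{-2\alpha} = 2\,n^{-2\alpha/(2\alpha+1)}.
```

This is the same balancing that underlies the optimal choice of K discussed after Theorem 4.1 below, where α takes the specific form dictated by the smoothness and dimension of the additive components.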
The next result gives uniform convergence rates.

Theorem 3.2: If Assumptions 3.1, 3.2, 3.3 b), and 3.4 are satisfied for a nonnegative integer d, then

|ĝ − g₀|_d = O_p(ζ_d(K)[(K/n)^{1/2} + K^{−α}]).

There do not seem to be any previous uniform convergence results in the literature that cover derivatives and general series in the way this one does. Furthermore, for the univariate power series case, the convergence rate implied by this result improves on that of Cox (1988), as further discussed in Section 4. These uniform rates do not attain Stone's (1982) bounds, although they do appear to improve on previously known rates.

For specific classes of functions 𝒢 and series approximations, more primitive conditions for Assumptions 3.2 - 3.4 can be specified in order to derive convergence rates for the estimators. These results are illustrated in the next two Sections, where convergence rates are derived for power series and regression spline estimators of additive interactive and covariate interactive functions.

4. Additive Interactive Projections

This Section gives convergence rates for power series and regression spline estimators of additive interactive functions. The first regularity condition restricts x to be continuously distributed.

Assumption 4.1: x is continuously distributed with a support that is a cartesian product of compact intervals, and a bounded density that is also bounded away from zero.

This assumption is useful for showing that the set of additive-interactive functions is closed. Also, this condition leads to Assumptions 3.2 and 3.3 being satisfied, with explicit formulae for ζ_d(K). For power series it is possible to generalize this condition, so that the density goes to zero on the boundary of the support. For simplicity this generalization is not given here, although the Lemmas given in the appendix can be used to verify the Section 3 conditions in this case.
It is also possible to allow for a discrete regressor with finite support, by including dummy variables for all points of support of the regressor, and all interactions. Because such a regressor is essentially parametric, and allowing for it does not change any of the convergence rate results, this generalization will not be considered here.

Under Assumption 4.1 the following condition will suffice for Assumptions 3.2 and 3.3.

Assumption 4.2: Either a) p_{kK}(x) is a power series with K⁴/n → 0, or b) the p_{kK}(x) are splines, the support of x is [-1,1]^r, and K²/n → 0.

It is possible to allow for data-based K (rather than K̲(n) = K̄(n) = K) for splines and obtain mean-square convergence rates similar to those given below. This generalization is not given here because it would further complicate the statement of results.

A primitive condition for Assumption 3.4 is the following one.

Assumption 4.3: Each of the components g_ℓ(x_ℓ), (ℓ = 1, ..., L), is continuously differentiable of order Δ on the support of x_ℓ.

Let 𝔯 denote the maximum dimension of the components of the additive interactive function. This condition can be combined with known results on approximation rates for power series and splines to show that Assumption 3.4 is satisfied for d̄ = 0 and α = Δ/𝔯, and with α = (Δ−d̄)/𝔯 when d̄ ≥ 1. The details are given in the appendix.

These conditions lead to the following result on mean-square convergence.

Theorem 4.1: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, then

Σ_{i=1}^n [ĝ(x_i)−g₀(x_i)]²/n = O_p(K/n + K^{−2Δ/𝔯}),   ∫[ĝ(x)−g₀(x)]² dF(x) = O_p(K/n + K^{−2Δ/𝔯}).

The integrated mean square error result for splines given here has previously been derived by Stone (1990). Andrews and Whang (1990) give the same conclusion for the sample mean square error of power series under different hypotheses. The rest of this result is new.
An implication of Theorem 4.1 is that if there are constants C > c > 0 such that the number of terms satisfies cn^{𝔯/(2Δ+𝔯)} ≤ K ≤ Cn^{𝔯/(2Δ+𝔯)}, even when K is chosen randomly, then the mean-square convergence rate is n^{−2Δ/(2Δ+𝔯)}, which attains Stone's (1982) bound for the optimal integrated mean-square convergence rate. For power series, the side condition Δ > 3𝔯/2 is needed to ensure that K satisfies Assumption 4.2. A similar side condition is present for the spline version of Stone (1990), but it has the less stringent form Δ > 𝔯/2.

Theorem 3.2 can be specialized to obtain uniform convergence rates for power series and spline estimators.

Theorem 4.2: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, then for power series

|ĝ − g₀|₀ = O_p(K[(K/n)^{1/2} + K^{−Δ/𝔯}]),

and for regression splines,

|ĝ − g₀|₀ = O_p(K^{1/2}[(K/n)^{1/2} + K^{−Δ/𝔯}]).

Obtaining uniform convergence rates for derivatives is more difficult, because approximation rates for derivatives are difficult to find in the literature. When the argument of each additive component is only one dimensional, an approximation rate follows by a simple integration argument (e.g. see Lemma A.12 in the Appendix). This approach leads to the following convergence rate for the one-dimensional (i.e. additive model) case.

Theorem 4.3: If Assumptions 3.1 and 4.1-4.3 are satisfied, each x_ℓ is one dimensional (ℓ = 1, ..., L), p_{kK}(x) is a power series or a regression spline of degree m ≥ d, and d < Δ, then for power series,

|ĝ − g₀|_d = O_p(K^{1+2d}{[K/n]^{1/2} + K^{−Δ+d}}),

and for splines,

|ĝ − g₀|_d = O_p(K^{1/2+d}{[K/n]^{1/2} + K^{−Δ+d}}).

In the case of power series, it is possible to obtain an approximation rate by a Taylor expansion argument when the derivatives do not grow too fast with their order. The rate is faster than any power of K, leading to the following result.

Theorem 4.4: If Assumptions 3.1 and 4.1-4.3 are satisfied, p_{kK}(x) is a power series, and there is a constant C such that for each multi-index λ the partial derivative ∂^λ of each additive component of g(x) exists and is bounded by C, then for any positive integers d and α,

|ĝ − g₀|_d = O_p(K^{1+2d}{[K/n]^{1/2} + K^{−α}}).
The uniform convergence rates are not optimal in the sense of Stone (1982), but they do improve on existing results. For the one regressor, power series case, Theorem 4.2 improves on the corresponding rate of Cox (1988). For the other cases there do not seem to be any existing results in the literature, so that Theorems 4.2 - 4.4 give the only uniform convergence rates available. It would be interesting to obtain further improvements on these results, and to investigate the possibility of attaining optimal uniform convergence rates for series estimators of additive interactive models.

5. Covariate Interactive Projections

Estimation of random coefficient projections provides a second example of how the general results of Section 3 can be applied to specific estimators. This Section gives convergence rates for power series and regression spline estimators of projections on the set 𝒢 described in equation (2.3). For simplicity, results will be restricted to mean-square and uniform convergence rates for the function, but not for its derivatives. Also, the H_ℓ in equation (2.3) will each be taken equal to the set of all functions of u with finite mean-square. Convergence rates can be derived under the following analog to the conditions of Section 4.

Assumption 5.1: i) u is continuously distributed with a support that is a cartesian product of compact intervals, and a bounded density that is also bounded away from zero; ii) K is restricted to be a multiple of L, K̲(n) = K̄(n) = K, and p^K(x) = w ⊗ p^{K/L}(u), where either a) p_{kK}(u) is a power series with K⁴/n → 0, or b) the p_{kK}(u) are splines, the support of u is a product of [-1,1] intervals, and K²/n → 0; iii) each of the components h_ℓ(u), (ℓ = 1, ..., L), is continuously differentiable of order Δ on the support of u; iv) E[ww'|u] is bounded and has smallest eigenvalue bounded away from zero on the support of u.
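The basis p^K(x) = w ⊗ p^{K/L}(u) of Assumption 5.1 ii) can be formed observation by observation as a Kronecker product, so that the fitted coefficient on each covariate varies with u. A small sketch with hypothetical data:

```python
import numpy as np

# Hypothetical sketch of the covariate-interactive basis of Section 5:
# each observation's basis row is the Kronecker product of its
# covariate vector w_i with power-series terms in u_i.
rng = np.random.default_rng(2)
n, L, J = 100, 2, 4                    # J = K/L terms per coefficient
w = np.column_stack([np.ones(n), rng.normal(size=n)])   # covariates
u = rng.uniform(-1.0, 1.0, n)
Pu = np.vander(u, J, increasing=True)                   # p^{K/L}(u_i)

# Row-wise Kronecker product: row i of P equals kron(w[i], Pu[i]),
# giving an n x (L * J) design matrix for K = L * J terms.
P = np.einsum("il,ij->ilj", w, Pu).reshape(n, L * J)
```

Regressing y on P then yields, for each covariate, a J-term series approximation to its time-varying coefficient h_ℓ(u), which is exactly the growth-curve structure discussed in Section 2.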
These conditions lead to the following result on mean-square convergence.

Theorem 5.1: If Assumptions 3.1 and 5.1 are satisfied, then

Σ_{i=1}^n [ĝ(x_i)−g₀(x_i)]²/n = O_p(K/n + K^{−2Δ/r}),   ∫[ĝ(x)−g₀(x)]² dF(x) = O_p(K/n + K^{−2Δ/r}).

Also, for power series and splines respectively,

|ĝ − g₀|₀ = O_p(K[(K/n)^{1/2} + K^{−Δ/r}]),   |ĝ − g₀|₀ = O_p(K^{1/2}[(K/n)^{1/2} + K^{−Δ/r}]).

An important feature of this result is that the convergence rate does not depend on L, but is controlled by the dimension of the coefficient functions and their degree of smoothness. This feature is to be expected, since the nonparametric part of the projection is the coefficient functions.

Appendix: Proofs of Theorems

Throughout, let C be a generic positive constant and let λ_min(B) and λ_max(B) be the minimum and maximum eigenvalues of a symmetric matrix B. A number of lemmas will be useful in proving the results. First, some lemmas on mean-square closure of certain spaces of functions are given.

Lemma A.1: If H is linear and closed and E[||w||²] < ∞, then {w'α + h(x) : h ∈ H} is closed.

Proof: Let u = w − P(w|H), so that w'α + h(x) = u'α + [h(x) + P(w|H)'α]. Therefore, it suffices to assume that w is orthogonal to H. It is well known that finite dimensional spaces are closed, and that direct sums of closed orthogonal subspaces are closed, giving the conclusion. QED.

Lemma A.2: Consider sets H_j (j = 1, ..., J) of functions of a random vector x, and let w be a random vector such that Ω(x) = E[ww'|x] is bounded and has smallest eigenvalue bounded away from zero. If each H_j is closed, then {Σ_{j=1}^J w_j h_j(x) : h_j ∈ H_j} is closed.

Proof: By iterated expectations, C⁻¹E[h(x)'h(x)] ≤ E[{w'h(x)}²] = E[h(x)'Ω(x)h(x)] ≤ CE[h(x)'h(x)], so that mean-square convergence of w'h(x) is equivalent to mean-square convergence of h(x), and the conclusion follows from closedness of each H_j. QED.

Lemma A.3: Suppose that for each ℓ (ℓ = 1, ..., L), x_ℓ is a subvector of x, with the partitioning x = (x_ℓ', x_{ℓc}')', and there exists a constant c > 1 such that for any a(x) ≥ 0,

c⁻¹∫a(x) d[F(x_ℓ)·F(x_{ℓc})] ≤ E[a(x)] ≤ c∫a(x) d[F(x_ℓ)·F(x_{ℓc})].

Then {Σ_{ℓ=1}^L h_ℓ(x_ℓ) : E[h_ℓ(x_ℓ)²] < ∞} is closed in mean-square.
Proof: Let H = {Σ_{ℓ=1}^L h_ℓ(x_ℓ)} and ||a|| = [∫a(x)² dF(x)]^{1/2}. By Proposition 2 of Section 4 of the Appendix of Bickel, Klaassen, Ritov, and Wellner (1993), H is closed if and only if there is a constant C such that each h ∈ H has a representation h = Σ_ℓ h_ℓ(x_ℓ) with max_ℓ ||h_ℓ|| ≤ C||h|| (note that the h_ℓ need not be unique). Following Stone (1990, Lemma 1), suppose that the maximal dimension of the x_ℓ is 𝔯, and suppose that this property holds whenever the maximal dimension of the x_ℓ is 𝔯−1 or less. It suffices to show that for any "maximal" x_k, that is not a proper subvector of any other x_ℓ, there is a constant c such that E[h(x)²] ≥ c⁻¹E[h_k(x_k)²]. To show this property, note that there is a decomposition h = Σ_ℓ h_ℓ(x_ℓ) in which each h_ℓ is orthogonal to all functions of strict subvectors of x_ℓ. Then, holding fixed the vector x_{kc} of components of x that are not components of x_k, and noting that h̃(x_k) = ∫Σ_{ℓ≠k} h_ℓ(x_ℓ) dF(x_{kc}) is a function of a strict subvector of x_k,

E[h(x)²] ≥ c⁻¹∫[∫{h_k(x_k) + Σ_{ℓ≠k} h_ℓ(x_ℓ)}² dF(x_{kc})] dF(x_k)
 ≥ c⁻¹∫{h_k(x_k) + h̃(x_k)}² dF(x_k)
 = c⁻¹∫h_k(x_k)² dF(x_k) + c⁻¹∫h̃(x_k)² dF(x_k)
 ≥ c⁻¹∫h_k(x_k)² dF(x_k) ≥ c⁻²E[h_k(x_k)²],

where the second inequality follows by Jensen's inequality, the equality by orthogonality of h_k and h̃, and the final inequality by the hypothesis of the lemma. QED.

The next few lemmas consist of useful convergence results for random matrices with dimension that can depend on the sample size. Let Z̄ and Z denote symmetric matrices, and λ_min(·) and λ_max(·) the smallest and largest eigenvalues respectively.

Lemma A.4: If λ_min(Z) ≥ C with probability approaching one (w.p.a.1) and ||Z̄ − Z|| = o_p(1), then λ_min(Z̄) ≥ C/2 w.p.a.1.

Proof: For any conformable vector μ with ||μ|| = 1, μ'Z̄μ ≥ μ'Zμ − |μ'(Z̄−Z)μ| ≥ λ_min(Z) − ||Z̄−Z||. Taking the infimum over ||μ|| = 1, λ_min(Z̄) ≥ λ_min(Z) − ||Z̄−Z|| ≥ C/2 w.p.a.1. QED.
D (e n p HA' BAH £ IIBIIoHA'AII, s IIABII II Ail which Z A = o Ill-Ill D and (1), p = II w.p.a.l, n for some ) e n --\/7 HZ D then , easy to show that for any conformable matrices is It £ C mm (Z) max and that and (B) (e p A and Z Let (B). B, —\/y HZ" (A.l) 1/2 s HZ" s tr(A) D 2 (e (Z 2 (l ll n + HZ~ 1/2 Z \y? —i max )] Also by J . = tHD'lZ^-Z^lD n n II n D = [\ ) 1/2 p Let (Z max II A • II II B II, max (B), be the symmetric square root of A -1 A s IIABII -1/? an orthogonal matrix and is consisting of the square roots of the eigenvalues of definite and ). n positive semi-definite, tr(A'BA) s HAH A max U where UAU' equal to is is s KAMA IIBAII = II n 2 B if conformable matrix is a n Z Note that Lemma a diagonal matrix -1/2 is —i \ A.4, max (Z ) positive = Then (1). p ) [Z-Z]Z" -1 1/2 + ll(Z-Z)Z II) D 2 ll n 2 1/2 2 2 (Z" )[l + o (1)0 (1) + HZ-ZH X (1)] = n max P P p -1 A max (e ) A denote the trace of a square matrix and p (Z )] 2 n QED ). a random matrix with u n rows. Lemma A.6: ^ Suppose \ . nun (Z) a C, P is a K x n random matrix such that HP'P/n - _i /y o and (1) P'u/Vnll = HZ P (e P tr(u'p(p'pfp'u/n) = Proof: Let W (e n ), and p = PA Let semi-definite. is and W = p(p'p)~p' (€ P Let Y and rows, and let G 2 ). n random matrix. P and Then by Lemma respectively. p P, W-W Since the is positive A.5, tr(u'Wu/n) s tr(u'Wu/n) = QED. denote random matrices with the same number of columns and u = Y-G. Then be the orthogonal projection operators a subset of the space spanned by Z = P'P/n. HZ'^P'iWnll 2 = is a ). for the linear spaces spanned by the columns of p A 2 = P(P'P)~P' space spanned by where For a matrix p 19 let it = (p'p) p'Y and G = pit. n Zl Lemma IIG-GII 2 /n s For Proof: tr(u'p(p'p) p'u/n) = If A.7: p 2 2 + IIG-pnll /n. ) n W and (€ IIG-GII 2 W (e 2 Lemma as in the proof of Y'WG /n = trfY'WY - Then for any conformable matrix ). G'WY - Lemma A.8: X If . 
Lemma A.7: If tr(u'p(p'p)⁻p'u)/n = O_p(ε_n²), then for any conformable matrix π, ‖Ĝ − G‖²/n = O_p(ε_n²) + O_p(1)‖G − pπ‖²/n.

Proof: Let W = p(p'p)⁻p' denote the orthogonal projection operator for the linear space spanned by the columns of p, so that W is idempotent, Ĝ = WY, and Wp = p. Then for any conformable matrix π,

‖Ĝ−G‖²/n = tr[Y'WY − Y'WG − G'WY + G'G]/n = tr[u'Wu + G'(I−W)G]/n
= tr[u'Wu + (G−pπ)'(I−W)(G−pπ)]/n ≤ tr(u'Wu)/n + ‖G−pπ‖²/n = O_p(ε_n²) + O_p(1)‖G−pπ‖²/n. QED.

Lemma A.8: If λ_min(Z) ≥ C, ‖p'p/n − Z‖ = o_p(1), and tr(u'p(p'p)⁻p'u)/n = O_p(ε_n²), then for any conformable matrix π,

‖π̂ − π‖² = O_p(ε_n² + ‖G−pπ‖²/n)   and   tr[(π̂−π)'Z(π̂−π)] = O_p(ε_n² + ‖G−pπ‖²/n).

Proof: By Lemma A.4, λ_min(p'p/n) ≥ C/2 w.p.a.1, so that λ_min(p'p/n)^{−1} = O_p(1). Therefore,

‖π̂−π‖² ≤ λ_min(p'p/n)^{−1} tr[(π̂−π)'(p'p/n)(π̂−π)] = O_p(1)‖Ĝ − pπ‖²/n
≤ O_p(1)[‖Ĝ−G‖²/n + ‖G−pπ‖²/n] = O_p(ε_n² + ‖G−pπ‖²/n),

where the last equality follows by Lemma A.7. To prove the second conclusion, note that by the triangle inequality and the same arguments as for the previous equation,

tr[(π̂−π)'Z(π̂−π)] ≤ ‖π̂−π‖²·‖Z − p'p/n‖ + tr[(π̂−π)'(p'p/n)(π̂−π)] = O_p(ε_n² + ‖G−pπ‖²/n). QED.

Lemma A.9: If z_1, ..., z_n are i.i.d., then for any vector of functions a^K(z) = (a_{1K}(z), ..., a_{KK}(z))' with K = K(n),

‖Σ_{i=1}^n a^K(z_i)/n − E[a^K(z)]‖ = O_p({E[‖a^K(z)‖²]/n}^{1/2}).

Proof: By the Cauchy-Schwarz inequality,

E[‖Σ_{i=1}^n a^K(z_i)/n − E[a^K(z)]‖] ≤ {E[‖Σ_{i=1}^n a^K(z_i)/n − E[a^K(z)]‖²]}^{1/2} ≤ {E[‖a^K(z)‖²]/n}^{1/2},

so the conclusion follows by the Markov inequality. QED.

Now let Ẑ = Σ_{i=1}^n P^K(x_i)P^K(x_i)'/n and Z = ∫ P^K(x)P^K(x)'dF(x).

Lemma A.10: Suppose that Assumptions 3.1 - 3.3 are satisfied. If Assumption 3.3 a) is also satisfied then ‖Ẑ − Z‖ = O_p(ζ₀(K)(K/n)^{1/2}) = o_p(1). If Assumption 3.3 b) is also satisfied then ‖Ẑ − Z‖ = O_p([ζ₀(K)⁴/n]^{1/2}) = o_p(1).

Proof: Apply Lemma A.9 with a^K(x) = vec(P^K(x)P^K(x)'). Since ‖P^K(x)P^K(x)'‖ = ‖P^K(x)‖², for the first conclusion note that E[‖a^K(x)‖²] ≤ ζ₀(K)²E[‖P^K(x)‖²] = ζ₀(K)²tr(Z) ≤ Cζ₀(K)²K, so that ‖Ẑ−Z‖ = O_p(ζ₀(K)(K/n)^{1/2}), which is o_p(1) under Assumption 3.3 a). For the second conclusion note that E[‖a^K(x)‖²] ≤ sup_x ‖P^K(x)‖⁴ ≤ ζ₀(K)⁴, so that ‖Ẑ−Z‖ = O_p([ζ₀(K)⁴/n]^{1/2}). Here Ẑ_K and Z_K are submatrices of Ẑ_{K̄} and Z_{K̄} for K ≤ K̄, whence ‖Ẑ−Z‖ ≤ ‖Ẑ_{K̄}−Z_{K̄}‖ w.p.a.1 and the same bounds apply with K̄ in place of K. QED.

Let y = (y_1, ..., y_n)', g = (g₀(x_1), ..., g₀(x_n))', and P = [P^K(x_1), ..., P^K(x_n)]'.

Lemma A.11: If Assumptions 3.1 - 3.3 are satisfied, then (y−g)'P(P'P)⁻P'(y−g)/n = O_p(K/n).

Proof: Let u = y − g. By Assumption 3.1 the data are i.i.d., E[u_i|x_i] = 0, and E[u_i²|x_i] is bounded, so that E[P_iu_i] = 0 and E[P_iP_i'u_i²] = E[P_iP_i'E[u_i²|x_i]] ≤ CZ. Then

E[‖Z^{−1/2}P'u/n‖²] = tr(Z^{−1/2}(Σ_{i=1}^n Σ_{j=1}^n E[P_iu_iP_j'u_j])Z^{−1/2})/n² = tr(Z^{−1/2}E[P_iP_i'u_i²]Z^{−1/2})/n ≤ tr(CI_K)/n ≤ CK/n.

Therefore, by the Markov inequality, ‖Z^{−1/2}P'u/n‖ = O_p((K/n)^{1/2}). Also, by Lemma A.10, ‖P'P/n − Z‖ = o_p(1). The conclusion then follows by Lemma A.6. QED.
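The O_p(K/n) bound of Lemma A.11 can be seen numerically: with homoskedastic unit-variance errors and a full-column-rank regressor matrix P, E[u'P(P'P)⁻P'u/n] = tr(P(P'P)⁻P')/n = K/n exactly. The simulation design below (uniform x, raw power-series regressors, a particular n and K) is an illustrative choice, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 2000, 8

# Illustrative design: uniform x, power-series regressors, E[u|x] = 0.
x = rng.uniform(-1.0, 1.0, n)
P = np.vander(x, K, increasing=True)   # columns 1, x, ..., x^(K-1)
u = rng.standard_normal(n)             # Var(u|x) = 1

# Projection matrix on the column space of P; tr(W) = K when P has
# full column rank, so E[u'Wu/n] = K/n exactly under Var(u|x) = 1.
W = P @ np.linalg.pinv(P.T @ P) @ P.T
stat = float(u @ W @ u) / n

assert abs(np.trace(W) - K) < 1e-4
assert 0.0 < stat < 10 * K / n         # of order K/n, as in Lemma A.11
```

The statistic concentrates tightly around K/n because u'Wu is a sum of K squared independent projections of the noise.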
The next few lemmas give approximation rate results for power series and splines.

Lemma A.12: For power series, if the support of x is a compact box and f(x) is continuously differentiable of order s, then there are C and π_K such that, for α = s/r and d = 0, 1, |f − p^K'π_K|_d ≤ CK^{−α+d}.

Proof: For the first conclusion, note that by the "natural ordering" hypothesis (|λ(k)| monotonic increasing), the set of all linear combinations of p^K(x) will include the set of all polynomials of degree ℓ whenever ℓ^r ≤ cK for c small enough, so Theorem 8 of Lorentz (1986) applies, giving an approximation error bounded by Cℓ^{−(s−d)} ≤ CK^{−α+d}. The second conclusion then follows by integration and boundedness of the support. For example, for r = 1, let f_K(x) = p^K(x)'π_K be chosen so that ∂f_K(x)/∂x approximates ∂f(x)/∂x and the constant coefficient is chosen so that f_K equals f at the minimum x̲ of the support. Then for all x,

|f(x) − f_K(x)| = |∫_{x̲}^x [∂f(t)/∂t − ∂f_K(t)/∂t]dt| ≤ ∫ |∂f(x)/∂x − ∂f_K(x)/∂x| dx ≤ CK^{−α+1}. QED.

Lemma A.13: For power series, if I is star-shaped and f(x) is continuously differentiable of all orders, with max_{|λ|≤m} sup_{x∈I} |∂^λf(x)| ≤ C^m for all m, then for every α > 0 and every d there is C such that for all K, |f − p^K'π_K|_d ≤ CK^{−α}.

Proof: Since I is star-shaped, there exists x̄ such that θx + (1−θ)x̄ ∈ I for all x ∈ I and 0 ≤ θ ≤ 1. For a function f(x), let P(f,m,x) denote the Taylor series up to order m for an expansion around x̄, which is a linear combination of polynomials of degree m or less. By induction, for any multi-index λ, ∂^λP(f,m,x) = P(∂^λf, m−|λ|, x), so that by the Lagrange form of the remainder there is C such that for all d ≤ m−1,

sup_{x∈I} max_{|λ|≤d} |∂^λf(x) − ∂^λP(f,m,x)| ≤ C^m/[(m−d)!].

Next, let m(K) be the largest integer such that the linear combinations of p^K(x) include all polynomials of degree m(K); by the natural ordering hypothesis there are constants with c·m(K)^r ≤ K ≤ C·m(K)^r. Taking π_K so that p^K(x)'π_K = P(f, m(K), x), the conclusion follows for any α > 0 because C^m/[(m−d)!] vanishes faster than any polynomial rate in m. QED.

Lemma A.14: For splines, if the support I is a compact box and f(x) is continuously differentiable of order s, then for α = s/r there are C and π_K such that for all K and all d ≤ m−1, |f − p^K'π_K|_d ≤ CK^{−α+d}.

Proof: First suppose r = 1, so that p^K(x) is a spanning vector for splines of degree m with knot spacing bounded by CK^{−1}. By Theorem 12.8 of Schumaker (1981) and the derivative bounds for splines in Powell (1981), for K large enough there exists f_K(x) = p^K(x)'π_K such that

sup_x |∂^d f(x)/∂x^d − ∂^d f_K(x)/∂x^d| ≤ CK^{−s+d},   d ≤ m−1.

The conclusion then follows by integration, similarly to the proof of Lemma A.12. QED.

The next two lemmas show that for power series and splines there exists P^K(x) such that Assumption 3.2 is satisfied, and give explicit bounds on the series and their derivatives.

Lemma A.15: For power series, if the support of x is a Cartesian product of compact intervals and the density of x is bounded below on its support, then Assumptions 3.2 and equation (3.1) are satisfied, with ζ_d(K) = CK^{1+2d} and P^K(x) a subvector of P^{K̃}(x) for all K ≤ K̃.

Proof: Following the definitions in Abramowitz and Stegun (1972, Ch. 22), let C_k^{(α)}(·) denote the ultraspherical polynomial of order k, which satisfies

∫_{−1}^1 (1−u²)^{α−1/2}[C_k^{(α)}(u)]² du = π 2^{1−2α}Γ(k+2α)/{k!(k+α)[Γ(α)]²}.

After the change of variables u_j = (2x_j − x̄_j − x̲_j)/(x̄_j − x̲_j), mapping each component into [−1,1], let p_k^{(ν+.5)}(u) denote the corresponding normalized polynomial for an exponent ν, and define the elements of P^K(x) as the tensor products Π_{j=1}^r p_{λ_j(k)}^{(ν+.5)}(u_j), ordered so that |λ(k)| is monotonic increasing (so that P^K(x) is a nonsingular combination of p^K(x) and K ≤ K̃ gives nesting). Since the density of x is bounded below by a constant multiple of the weight function Π_j(1−u_j²)^ν, orthonormality gives

λ_min(∫ P^K(x)P^K(x)'dP(x)) ≥ C·λ_min(∫ [⊗_j p^{(ν+.5)}(u_j)][⊗_j p^{(ν+.5)}(u_j)]' Π_j w(u_j)du) = C > 0.

Next, by differentiating equation 22.5.37 of Abramowitz and Stegun and solving, derivatives of ultraspherical polynomials are ultraspherical polynomials of higher order, so that by 22.14.2 of Abramowitz and Stegun, for |λ| ≤ d, sup_u |∂^λ p_k(u)| ≤ Ck^{.5+ν+2|λ|}. Therefore

ζ_d(K) = sup_x max_{|λ|≤d} ‖∂^λP^K(x)‖ ≤ [Σ_{k=1}^K C k^{1+2ν+4d}]^{1/2} ≤ CK^{1+ν+2d},

which for ν = 0 gives ζ_d(K) ≤ CK^{1+2d}. QED.

Lemma A.16: For splines, if the support of x is a Cartesian product of compact intervals, the density of x is bounded below on its support, and the knots are evenly spaced, then Assumptions 3.2 and equation (3.1) are satisfied, with ζ_d(K) = CK^{(1/2)+d}.
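The K^{−α} approximation rates of Lemmas A.12 - A.14 are easy to see numerically: for a smooth univariate function, the sup-norm error of a least-squares polynomial fit falls rapidly as the number of terms grows. The target function and evaluation grid below are illustrative choices only, not from the paper.

```python
import numpy as np

# Sup-norm error of least-squares polynomial fits to a smooth function,
# shrinking as the number of terms grows (the K^{-alpha} phenomenon).
x = np.linspace(-1.0, 1.0, 2001)
f = np.exp(x) * np.sin(2.0 * x)        # illustrative smooth target

def sup_err(deg):
    """Sup-norm error of the degree-`deg` least-squares Chebyshev fit."""
    cheb = np.polynomial.chebyshev.Chebyshev.fit(x, f, deg)
    return float(np.max(np.abs(f - cheb(x))))

errs = [sup_err(deg) for deg in (2, 4, 8, 16)]
assert all(a > b for a, b in zip(errs, errs[1:]))  # strictly improving
assert errs[-1] < 1e-6                 # very accurate for an analytic f
```

Because this target is analytic, the error decays faster than any polynomial rate in the degree, which is the regime covered by Lemma A.13.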
Proof: First consider the case x = x₁, r = 1, with support normalized to [−1,1]. For the evenly spaced knot sequence t_j = −1 + 2j/(L+1), let B_k(x) be the B-spline of order m+1 with left end-knot t_{k−m−1}, and let P_k(x) = [(L+1)/2]^{1/2}B_k(x), the so-called normalized B-splines with evenly spaced knots. Boundedness away from zero of the smallest eigenvalue of ∫ P^K(x)P^K(x)'dP(x) follows by the argument of Burman and Chen (1989, p. 1587), since the density of x is bounded below and λ_min(∫ B(x)B(x)'dx) ≥ C/L for evenly spaced knots (changing to even knot spacing only rescales by a bounded amount). Also, it is a well-known property of B-splines (e.g. Powell, 1981) that sup_x |∂^d B_k(x)/∂x^d| ≤ CL^d and that the number of elements of P^K(x) that are nonzero at any given x is bounded, uniformly in K; since L ≤ CK, it follows that ζ_d(K) ≤ CK^{(1/2)+d}. For r > 1, let P^K(x) consist of the tensor products ⊗_j P_{λ_j(k)}(x_j) of univariate normalized B-splines corresponding to components of x. The smallest-eigenvalue bound follows analogously to the proof of Lemma A.15, treating the components as if they were uniform random variables and using the density lower bound, and the existence of a nonsingular matrix A such that p^K(x) = AP^K(x) for x ∈ I follows by inclusion of all multiplicative interactions of splines for components of x and the usual basis result for B-splines (e.g. Theorem 19.2 of Powell, 1981). QED.

Proof of Theorem 3.1: Let π be as in Assumption 3.4, so that sup_{x∈I} |g₀(x) − P^K(x)'π| ≤ CK^{−α}, and hence Σ_{i=1}^n [g₀(x_i) − P^K(x_i)'π]²/n = O(K^{−2α}) and ∫[g₀(x) − P^K(x)'π]²dF(x) = O(K^{−2α}). By Assumption 3.2 and Lemmas A.10 and A.11, the hypotheses of Lemma A.8 are satisfied with ε_n = (K/n)^{1/2} and P^K(x) replacing p(x). Then by the first conclusion of Lemma A.8,

(A.2)   Σ_{i=1}^n [g₀(x_i) − ĝ(x_i)]²/n = O_p(K/n + K^{−2α}),

and by the second conclusion of Lemma A.8,

(A.3)   ∫[g₀(x) − ĝ(x)]²dF(x) ≤ 2∫[g₀(x) − P^K(x)'π]²dF(x) + 2(π̂−π)'Z(π̂−π) = O_p(K/n + K^{−2α}). QED.

Proof of Theorem 3.2: For each d, it follows as in the proof of Theorem 3.1 that the hypotheses of Lemma A.8 are satisfied. Then by Lemma A.8 and the triangle inequality,

(A.4)   |ĝ − g₀|_d ≤ |g₀ − P^K'π|_d + |P^K'(π̂−π)|_d ≤ O(K^{−α}) + ζ_d(K)‖π̂−π‖ = O_p(ζ_d(K)[(K/n)^{1/2} + K^{−α}]). QED.

Proof of Theorem 4.1: Because replacing P^K(x) by a nonsingular linear transformation AP^K(x) leaves ĝ unchanged, it suffices to verify the hypotheses of Theorem 3.1 for the P^K(x) of Lemma A.15 or A.16. By Lemma A.3 there exists a representation g₀(x) = Σ_j g_j(x_j), so by Lemmas A.12 and A.14 and the triangle inequality, Assumption 3.4 is satisfied with α = s/r. By Lemma A.15 for power series and Lemma A.16 for splines, Assumptions 3.2 and equation (3.1) are satisfied with ζ₀(K) = CK and ζ₀(K) = CK^{1/2} respectively, and Assumption 4.2 then implies that Assumption 3.3 holds. The conclusion then follows by Theorem 3.1. QED.
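Theorem 3.1's K/n + K^{−2α} rate implies that, with a smooth regression function and a moderate number of series terms, the sample mean-squared error of the series least-squares estimator should be far below the noise variance. A Monte Carlo sketch of this, in which the design, sample size, truncation level, and target function are all illustrative assumptions rather than anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 5000, 6                          # illustrative sample size and K

# Simulated regression y = g0(x) + u with smooth g0; the series
# least-squares estimator should have sample MSE of order
# K/n + K^{-2*alpha}, far below the noise variance 0.25 here.
x = rng.uniform(-1.0, 1.0, n)
g0 = np.sin(np.pi * x)
y = g0 + 0.5 * rng.standard_normal(n)

P = np.vander(x, K, increasing=True)    # power-series regressors
pihat, *_ = np.linalg.lstsq(P, y, rcond=None)
ghat = P @ pihat

mse = float(np.mean((ghat - g0) ** 2))
assert mse < 0.05                       # tiny relative to Var(u) = 0.25
```

Both error sources are visible in the bound: raising K shrinks the K^{−2α} bias term but inflates the K/n variance term, which is the trade-off the rate conditions in Section 4 balance.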
Proof of Theorem 4.2: It follows as in the proof of Theorem 4.1 that Assumptions 3.1 - 3.4 are satisfied, with ζ₀(K) = CK for power series and ζ₀(K) = CK^{1/2} for splines. The conclusion then follows by Theorem 3.2. QED.

Proof of Theorem 4.3: The proof is similar to that of Theorems 4.1 and 4.2, except that Assumption 3.4 is now satisfied with the rate K^{−α+d} for the d-th derivatives, by Lemmas A.12 and A.14, and Assumptions 3.2 and equation (3.1) are now satisfied with ζ_d(K) = CK^{1+2d} for power series, by Lemma A.15, and ζ_d(K) = CK^{(1/2)+d} for splines, by Lemma A.16. QED.

Proof of Theorem 4.4: Follows as in the proof of Theorem 4.3, except that Lemma A.13 is applied in place of Lemma A.12, to show that Assumption 3.4 holds for any α > 0. QED.

Proof of Theorem 5.1: Let w be the vector of products of the elements of P(u) and P(x), where P(u) and P(x) are as in Lemma A.15 or A.16, for power series and splines respectively. Then E[ww'] = E[P(u)P(u)' ⊗ E[P(x)P(x)'|u]], and with the smallest eigenvalue of E[P(x)P(x)'|u] bounded away from zero, E[ww'] ≥ C·E[P(u)P(u)'] ⊗ I in the positive semi-definite sense, so the smallest eigenvalue of E[ww'] is bounded away from zero. Also, bounds on the elements of w are the same, up to a constant multiple, as bounds on the elements of P(u) and P(x), so that Assumption 3.3 will hold. The conclusion then follows by the conclusions of Theorems 3.1 and 3.2. QED.

References

Abramowitz, M. and Stegun, I.A., eds. (1972). Handbook of Mathematical Functions. Washington, D.C.: Commerce Department.

Agarwal, G. and Studden, W. (1980). Asymptotic integrated mean square error using least squares and bias minimizing splines. Annals of Statistics 8, 1307-1325.

Andrews, D.W.K. (1991). Asymptotic normality of series estimators for various nonparametric and semiparametric models. Econometrica 59, 307-345.

Andrews, D.W.K. and Whang, Y.J. (1990).
Additive interactive regression models: circumvention of the curse of dimensionality. Econometric Theory 6, 466-479.

Bickel, P., Klaassen, C.A.J., Ritov, Y., and Wellner, J.A. (1993). Efficient and Adaptive Inference in Semiparametric Models. Monograph, forthcoming.

Breiman, L. and Friedman, J.H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80, 580-598.

Breiman, L. and Stone, C.J. (1978). Nonlinear additive regression. Note.

Buja, A., Hastie, T., and Tibshirani, R. (1989). Linear smoothers and additive models. Annals of Statistics 17, 453-510.

Burman, P. and Chen, K.W. (1989). Nonparametric estimation of a regression function. Annals of Statistics 17, 1567-1596.

Cox, D.D. (1988). Approximation of least squares regression on nested subspaces. Annals of Statistics 16, 713-732.

Friedman, J. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association 76, 817-823.

Gallant, A.R. (1981). On the bias in flexible functional forms and an essentially unbiased form: the Fourier flexible form. Journal of Econometrics 15, 211-245.

Gallant, A.R. and Souza, G. (1991). On the asymptotic normality of Fourier flexible form estimates. Journal of Econometrics 50, 329-353.

Lorentz, G.G. (1986). Approximation of Functions. New York: Chelsea Publishing Company.

Newey, W.K. (1988). Adaptive estimation of regression models via moment restrictions. Journal of Econometrics 38, 301-339.

Newey, W.K. (1993a). The asymptotic variance of semiparametric estimators. Preprint, MIT Department of Economics.

Newey, W.K. (1993b). Series estimation of regression functionals. Econometric Theory, forthcoming.

Powell, M.J.D. (1981). Approximation Theory and Methods. Cambridge, England: Cambridge University Press.

Rao, C.R. (1973). Linear Statistical Inference and Its Applications. New York: Wiley.

Riedel, K.S. (1992). Smoothing spline growth curves with covariates. Preprint, Courant Institute, New York University.

Schumaker, L.L. (1981). Spline Functions: Basic Theory. New York: Wiley.

Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Annals of Statistics 10, 1040-1053.

Stone, C.J. (1985). Additive regression and other nonparametric models. Annals of Statistics 13, 689-705.

Stone, C.J. (1990). L₂ rate of convergence for interaction spline regression. Tech. Rep. No. 268, Berkeley.

Wahba, G. (1984). Cross-validated spline methods for the estimation of multivariate functions from data on functionals. In Statistics: An Appraisal, Proceedings 50th Anniversary Conference Iowa State Statistical Laboratory (H.A. David and H.T. David, eds.), 205-235. Ames, Iowa: Iowa State University Press.

Zeldin, M.D. and Thomas, D.M. (1975). Ozone trends in the Eastern Los Angeles basin corrected for meteorological variations. Proceedings International Conference on Environmental Sensing and Assessment, 2, held September 14-19, 1975, in Las Vegas, Nevada.