Proceedings of the International Congress of Mathematicians
Berkeley, California, USA, 1986

A Nonparametric Framework for Statistical Modelling

CHARLES J. STONE

This research was supported in part by National Science Foundation Grant DMS-8600409.
© 1987 International Congress of Mathematicians 1986

1. Introduction. Much of mathematical statistics deals with inference concerning the unknowns in a stochastic model for a random phenomenon. In the parametric approach the unknowns are a specific finite number of real parameters. In the nonparametric approach they are functions, perhaps subject to smoothness or other regularity conditions. These functions can be approximated by means of a flexible finite-dimensional function space. To some extent, this reduces the nonparametric approach to the parametric approach. But the asymptotic theory is different when the error of approximation is taken into account and the dimension of the approximating function space is allowed to tend to infinity with the sample size. We will illustrate this theory by means of three examples: density estimation, logistic regression, and additive logistic regression.

2. Density estimation. Let $Y$ be a $d$-dimensional random variable taking on values from a known compact subset $C$ of $\mathbb{R}^d$. It is assumed that the distribution of $Y$ has a density function $f$ which is continuous and positive on $C$. By definition $\int f = 1$. Set $g = \log f$. Let $S_n$ denote a $p_n$-dimensional vector space of functions on $C$ having basis $B_{nj}$, $1 \le j \le p_n$. It is assumed that $\sum_j B_{nj} = 1$ on $C$ and that no nontrivial linear combination of $B_{nj}$, $1 \le j \le p_n$, is almost everywhere equal to zero on $C$. Given $\theta \in \Theta_n$, the space of $p_n$-dimensional vectors, set

$$C_n(\theta) = \log \int \exp\Big(\sum_j \theta_j B_{nj}\Big) \quad\text{and}\quad f_n(\,\cdot\,;\theta) = \exp\Big(\sum_j \theta_j B_{nj} - C_n(\theta)\Big).$$

Then $\int f_n(\,\cdot\,;\theta) = 1$ for $\theta \in \Theta_n$. Observe that $f_n(\,\cdot\,;\theta)$, $\theta \in \Theta_n$, is an exponential family in canonical form. Let $\Theta_{n0}$ denote the $(p_n - 1)$-dimensional space consisting of those $\theta \in \Theta_n$, the sum of whose elements is zero. Let $\theta_n$ denote the unique value of $\theta \in \Theta_{n0}$ that maximizes the expected log-likelihood function $\Lambda_n(\cdot)$, defined by

$$\Lambda_n(\theta) = E\Big[\sum_j \theta_j B_{nj}(Y) - C_n(\theta)\Big] = \sum_j \theta_j \int B_{nj} f - C_n(\theta), \qquad \theta \in \Theta_{n0}.$$

Consider the loglinear density approximation $f_n = f_n(\,\cdot\,;\theta_n)$ to $f$.

Let $Y_1, \ldots, Y_n$ be independent random variables each having density $f$. The log-likelihood function for the parametric model is given by

$$l_n(\theta) = \sum_j \theta_j \sum_i B_{nj}(Y_i) - n\,C_n(\theta), \qquad \theta \in \Theta_{n0}.$$

The maximum likelihood estimator (MLE) $\hat\theta_n$ is the value of $\theta \in \Theta_{n0}$ that maximizes $l_n(\cdot)$. Since $l_n(\cdot)$ is a strictly concave function on $\Theta_{n0}$, the MLE is unique if it exists. The corresponding estimate $\hat f_n = f_n(\,\cdot\,;\hat\theta_n)$ of $f$ is called a loglinear density estimate, since $\log \hat f_n \in S_n$.

Let $\|\cdot\|_2$ and $\|\cdot\|_\infty$ denote the usual $L_2$ and $L_\infty$ norms of functions on $C$. Let $\|\cdot\|_{\infty,A}$ denote the $L_\infty$ norm of functions on $A$. It is assumed that $p_n \to \infty$ as $n \to \infty$, that $p_n = o(n^{.5-\varepsilon})$ for some $\varepsilon > 0$, and that $\lim_{n\to\infty} \inf_{s \in S_n} \|s - h\|_\infty = 0$ if $h$ is continuous on $C$. Let $\Pi_n h$ denote the orthogonal projection of $h$ onto $S_n$ with respect to $L_2(C)$. It is assumed that there is a positive constant $M$ such that $\|\Pi_n h\|_\infty \le M \|h\|_\infty$ for $n \ge 1$ and $h \in L_\infty(C)$; $|B_{nj}| \le M$ on $C$ for $n \ge 1$ and $1 \le j \le p_n$; for $1 \le j \le p_n$, $B_{nj} = 0$ outside a set $C_{nj}$ having diameter at most $M p_n^{-1/d}$, and $C_{nj} \cap C_{nk}$ is nonempty for at most $M$ values of $k$; and

$$M^{-1} \theta_j^2 \le \Big\| \sum_k \theta_k B_{nk} \Big\|_{\infty, C_{nj}}^2 \le M p_n \int_{C_{nj}} \Big(\sum_k \theta_k B_{nk}\Big)^2$$

for $n \ge 1$, $\theta \in \Theta_n$, and $1 \le j \le p_n$.

These properties can be satisfied with $C_{nj}$, $1 \le j \le p_n$, a partition of $C$ and $B_{nj}$ the indicator function for $C_{nj}$. Here $\hat f_n$ is the corresponding histogram density estimate. They can also be satisfied with $d = 1$, $S_n$ a space of splines and $B_{nj}$, $1 \le j \le p_n$, a basis consisting of $B$-splines; see de Boor [1, 2] or Stone [7, 8, 9] for details.
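The histogram case can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not the paper's algorithm: $C = [0, 1]$, the cells $C_{nj}$ are $p$ equal-width intervals, and the function names (`log_normalizer`, `fit_histogram_loglinear`, `density`) are hypothetical. With indicator basis functions the likelihood equations can be solved in closed form, so no iterative maximization is needed; the sketch also shows why the MLE can fail to exist (an empty cell).

```python
import math


def log_normalizer(theta, width):
    """C_n(theta): log of the integral over [0, 1] of exp(sum_j theta_j B_nj),
    where B_nj is the indicator of the j-th equal-width cell."""
    return math.log(sum(width * math.exp(t) for t in theta))


def cell_index(y, p):
    """Index of the cell containing y; y == 1.0 is placed in the last cell."""
    return min(int(y * p), p - 1)


def fit_histogram_loglinear(sample, p):
    """Return the MLE theta-hat in the sum-zero space, or None if it does not exist."""
    n = len(sample)
    width = 1.0 / p
    counts = [0] * p
    for y in sample:
        counts[cell_index(y, p)] += 1
    if min(counts) == 0:
        # An empty cell would push the corresponding theta_j to minus infinity.
        return None
    # Log of the histogram heights N_j / (n * |C_nj|).
    log_heights = [math.log(c / (n * width)) for c in counts]
    # Center so that the coefficients sum to zero.
    mean = sum(log_heights) / p
    return [h - mean for h in log_heights]


def density(theta, y):
    """Evaluate f_n(y; theta) = exp(theta_j - C_n(theta)) for y in cell j."""
    p = len(theta)
    return math.exp(theta[cell_index(y, p)] - log_normalizer(theta, 1.0 / p))
```

Calling `fit_histogram_loglinear(sample, p)` and then `density(theta, y)` reproduces the usual histogram heights, which is the sense in which the loglinear density estimate reduces to the histogram density estimate for this basis.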
Presumably, these basis properties can also be satisfied with tensor product spaces of splines and with spaces of the type that arise in the use of the finite element method (see arguments in Descloux [3] and de Boor [1]).

Some asymptotic properties of loglinear density estimation will now be summarized. The proofs follow from arguments in Stone [9]. Set $\delta_n = \inf_{s \in S_n} \|s - g\|_\infty$.

THEOREM 1. (i) $\|f_n - f\|_\infty = O(\delta_n)$; (ii) $\hat f_n$ exists except on an event whose probability tends to zero with $n$; (iii) $\|\hat f_n - f_n\|_2 = O_{pr}((n^{-1} p_n)^{1/2})$; and (iv) $\|\hat f_n - f_n\|_\infty = O_{pr}((n^{-1} p_n \log(p_n))^{1/2})$.

Write $\hat f_n - f = (f_n - f) + (\hat f_n - f_n)$. The quantity $f_n - f$ is a "bias" term, while $\hat f_n - f_n$ is a "noise" term whose magnitude is indicated by its asymptotic variance. Under typical smoothness assumptions on $g$, $\delta_n = O(p_n^{-q/d})$ for some positive number $q$ (this holds with $q = m$ if $g$ has a bounded $m$th derivative). Set $\gamma = 1/(2q + d)$ and $r = q\gamma$. Suppose that $\gamma d < 1/2$ or, equivalently, that $q > d/2$. To get the optimal rate of convergence of $\|\hat f_n - f\|_2$ to zero, choose $p_n \sim n^{\gamma d}$. Then $\delta_n^2 \sim n^{-1} p_n \sim n^{-2r}$ and hence

$$\|\hat f_n - f\|_2 = O_{pr}(n^{-r}).$$

To get the optimal rate of convergence of $\|\hat f_n - f\|_\infty$ to zero, choose $p_n \sim (n/\log(n))^{\gamma d}$. Then $\delta_n^2 \sim n^{-1} p_n \log(p_n) \sim (n^{-1} \log(n))^{2r}$ and hence

$$\|\hat f_n - f\|_\infty = O_{pr}((n^{-1} \log(n))^{r}).$$

(See Stone [5] for a precise definition of optimal rate of convergence.)

Let $I_n(\cdot)$ denote the information function based on the random sample of size $n$. Then $I_n(\theta)$ is the Hessian matrix of $n C_n(\cdot)$ at $\theta$; that is, the $p_n \times p_n$ matrix whose $(j, k)$th element is $n\,\partial^2 C_n(\theta)/\partial\theta_j\,\partial\theta_k$. Let $I_n^{-1}(\theta)$ denote the inverse to $I_n(\theta)$ viewed as a linear transformation of $\Theta_{n0}$. Set $I_n^{-1} = I_n^{-1}(\theta_n)$ and $\hat I_n^{-1} = I_n^{-1}(\hat\theta_n)$. Let $G_n(y), \hat G_n(y) \in \Theta_{n0}$ denote the $p_n$-dimensional vectors having elements

$$G_{nj}(y) = B_{nj}(y) - \frac{\partial C_n}{\partial \theta_j}(\theta_n) \quad\text{and}\quad \hat G_{nj}(y) = B_{nj}(y) - \frac{\partial C_n}{\partial \theta_j}(\hat\theta_n),$$

respectively. Set

$$SE(\hat f_n(y)) = f_n(y)\,\big(G_n(y)'\, I_n^{-1}\, G_n(y)\big)^{1/2} \quad\text{and}\quad \widehat{SE}(\hat f_n(y)) = \hat f_n(y)\,\big(\hat G_n(y)'\, \hat I_n^{-1}\, \hat G_n(y)\big)^{1/2}.$$

THEOREM 2. Uniformly in $y \in C$,

$$SE(\hat f_n(y)) \sim (n^{-1} p_n)^{1/2}, \qquad \widehat{SE}(\hat f_n(y)) / SE(\hat f_n(y)) = 1 + o_{pr}(1),$$

and

$$\frac{\hat f_n(y) - f_n(y)}{SE(\hat f_n(y))} \xrightarrow{\;d\;} N(0, 1).$$

It follows from Theorem 2 that $\hat f_n(y) \pm z_{1-\alpha/2}\, \widehat{SE}(\hat f_n(y))$ is an asymptotic $(1 - \alpha)$-level confidence interval for $f_n(y)$; if $\delta_n = o((n^{-1} p_n)^{1/2})$, it is also an asymptotic $(1 - \alpha)$-level confidence interval for $f(y)$. Here $\Phi(z_q) = q$, $\Phi$ being the standard normal distribution function.

Let $P$ denote the distribution corresponding to $f$, defined by $P(A) = \int_A f$, and let $P_n$ and $\hat P_n$ be defined similarly in terms of $f_n$ and $\hat f_n$. Let $\mathcal{A}$ denote a class of subsets of $C$. Given distributions $Q_1$ and $Q_2$ on $C$, set $\|Q_1 - Q_2\|_\infty = \sup_{A \in \mathcal{A}} |Q_1(A) - Q_2(A)|$. Under reasonable conditions on $\mathcal{A}$, $S_n$, and $f$,

$$\|P_n - P\|_\infty = O(p_n^{-1/d}\, \delta_n)$$

and [...]. It was shown in Stone [9] for the special case of $d = 1$, bases consisting of $B$-splines, and $\mathcal{A}$ the collection of subintervals of a compact interval $C$, that $\|\hat P_n - P_n\|_\infty = O_{pr}(n^{-1/2})$. What is a corresponding result in the more general context of the present paper?

3. Logistic regression. Let $X, Y$ be a pair of random variables such that $X$ ranges over a known compact subset $G$ of $\mathbb{R}^d$ and $Y$ takes on only two values, 0 and 1. It is assumed that the distribution of $X$ is absolutely continuous and that its density is bounded away from zero and infinity on $G$. Let $f$ be the regression function, defined on $G$ by $f(x) = \Pr(Y = 1 \mid X = x)$. It is assumed that $f$ is continuous and that $0 < f < 1$. Let $g$ denote the corresponding logistic regression function, defined by $g = \operatorname{logit}(f) = \log(f/(1 - f))$, so that $f = \exp(g)/(1 + \exp(g))$.

We can approximate $g$ by a member of a $p_n$-dimensional vector space $S_n$ of functions on $G$. Let $B_{nj}$, $1 \le j \le p_n$, denote a basis of $S_n$. Then the expected log-likelihood function $\Lambda_n(\cdot)$ is defined by

$$\Lambda_n(\theta) = E\Big[\sum_j \theta_j B_{nj}(X)\, Y - \log\Big(1 + \exp\Big(\sum_j \theta_j B_{nj}(X)\Big)\Big)\Big], \qquad \theta \in \Theta_n.$$

Let $\theta_n$ be the unique $\theta \in \Theta_n$ that maximizes $\Lambda_n(\theta)$ and set

$$g_n = \sum_j \theta_{nj} B_{nj} \quad\text{and}\quad f_n = \exp(g_n)/(1 + \exp(g_n)).$$

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be independent random pairs, each having the same distribution as $(X, Y)$. The log-likelihood function for the parametric model is given by

$$l_n(\theta) = \sum_j \theta_j \sum_i Y_i B_{nj}(X_i) - \sum_i \log\Big(1 + \exp\Big(\sum_j \theta_j B_{nj}(X_i)\Big)\Big),$$

which corresponds to an exponential family in canonical form. Let $\hat\theta_n$ denote the MLE of $\theta$ and set $\hat g_n = \sum_j \hat\theta_{nj} B_{nj}$ and $\hat f_n = \exp(\hat g_n)/(1 + \exp(\hat g_n))$. Under appropriate regularity conditions, analogs of Theorems 1 and 2 of §2 should hold.

4. Additive logistic regression. Let $G$ be a rectangle, say, $G = [0, 1]^d$. It is then useful in practice to assume that $g$ is additive or, more generally, to replace $g$ by its best additive approximation $g^*$; this is defined to be the unique additive function $h$ on $G$ that maximizes the expected log-likelihood

$$E\big[h(X)\,Y - \log(1 + e^{h(X)})\big].$$

Set $f^* = \exp(g^*)/(1 + \exp(g^*))$. If $g$ itself is additive, then $g^* = g$ and $f^* = f$.

To obtain a $p_n$-dimensional space of additive approximations to $g^*$, we consider $p_{nk}$-dimensional vector spaces $S_{nk}$ of functions on $[0, 1]$ for $1 \le k \le d$, each containing the constant functions, and let $S_n$ be the collection of all functions of the form $s(x_1, \ldots, x_d) = \sum_k s_k(x_k)$, where $s_k \in S_{nk}$ for $1 \le k \le d$. Then $p_n = 1 + \sum_k (p_{nk} - 1)$, since the constant functions are shared by all $d$ component spaces. Analogs of Theorem 1 of §2 and its consequences for optimal rates of convergence should hold with $f$ replaced by $f^*$, $g$ replaced by $g^*$, and $d$ replaced by 1; see Stone [7, 8] for what has been rigorously verified to date. An analog to Theorem 2 should also hold if $g$ itself is additive. Otherwise, a more complicated standard error formula would be required since $\Pr(Y = 1 \mid X = x)$ would not be exactly equal to $f^*(x)$.

REFERENCES

1. C. de Boor, A bound on the $L_\infty$-norm of the $L_2$-approximation by splines in terms of a global mesh ratio, Math. Comp. 30 (1976), 765-771.
2. C. de Boor, A practical guide to splines, Springer-Verlag, New York, 1978.
3. J. Descloux, On finite element matrices, SIAM J. Numer. Anal. 9 (1972), 260-265.
4. C. J. Stone, Optimal rates of convergence for nonparametric estimators, Ann. Statist. 8 (1980), 1348-1360.
5. C. J. Stone, Optimal global rates of convergence for nonparametric regression, Ann. Statist. 10 (1982), 1040-1053.
6. C. J. Stone, Optimal uniform rate of convergence for nonparametric estimators of a density function or its derivatives, Recent Advances in Statistics: Papers in Honor of Herman Chernoff on his Sixtieth Birthday (M. H. Rizvi, J. S. Rustagi, and D. Siegmund, editors), Academic Press, New York, 1983, pp. 393-406.
7. C. J. Stone, Additive regression and other nonparametric models, Ann. Statist. 13 (1985), 689-705.
8. C. J. Stone, The dimensionality reduction principle for generalized additive models, Ann. Statist. 14 (1986), 590-606.
9. C. J. Stone, Asymptotic properties of logspline density estimation, Technical Report No. 69, Department of Statistics, Univ. of California, Berkeley, 1986.

UNIVERSITY OF CALIFORNIA, BERKELEY, CALIFORNIA 94720, USA