The Canadian Journal of Statistics / La Revue Canadienne de Statistique
Vol. 19, No. 3, 1991, Pages 307-321

Multivariate τ-estimators for location and scatter*

Hendrik P. LOPUHAÄ
Delft University of Technology

Key words and phrases: τ-estimators, high breakdown point, bounded influence, high efficiency.
AMS 1985 subject classifications: 62F35, 62H12.

ABSTRACT

We discuss the robustness and asymptotic behaviour of τ-estimators for multivariate location and scatter. We show that τ-estimators correspond to multivariate M-estimators defined by a weighted average of redescending ψ-functions, where the weights are adaptive. We prove consistency and asymptotic normality under weak assumptions on the underlying distribution, show that τ-estimators have a high breakdown point, and obtain the influence function at general distributions. In the special case of a location-scatter family, τ-estimators are asymptotically equivalent to multivariate S-estimators defined by means of a weighted ρ-function. This enables us to combine a high breakdown point and bounded influence with good asymptotic efficiency for the location and covariance estimator.
1. INTRODUCTION

The minimum-volume ellipsoid (MVE) estimator is defined as the center and scatter matrix of the smallest ellipsoid containing at least half of the observations (Rousseeuw 1983). This estimator is known to have good robustness properties, but its limiting behaviour is poor, as it converges with the slow rate n^{1/3} towards a nonnormal limiting distribution (Kim and Pollard 1990, Davies 1989). To retain the robustness and to improve the asymptotic properties one can smooth the condition of covering half of the observations. This may result in multivariate S-estimators, defined as the center and scatter matrix of the smallest ellipsoid that satisfies a condition on the average of smoothly transformed Mahalanobis distances (Davies 1987, Lopuhaä 1989). In the univariate case this is equivalent to computing an M-estimator of scale as a function of the location parameter μ and minimizing it over μ. The S-estimators converge with rate √n towards a normal distribution. However, there is a tradeoff between robustness and asymptotic efficiency: a high breakdown point corresponds to a low efficiency and vice versa.

Yohai and Zamar (1988) investigated an extension of regression S-estimators, which retains the robustness and improves the asymptotic efficiency. In the special case of estimating univariate location and scale, their proposal amounts to the following. To make the M-estimator of scale more efficient, they consider an adaptive multiple of it, which they call a τ-estimator of scale, and minimize this as a function of the location parameter. Regression τ-estimators were studied under the assumption of the usual parametric regression model with random carriers independent of the error terms.

*This research is financially supported by NWO under Grant 10-62-10.
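In the univariate case the construction just described is easy to mimic numerically. The following is a minimal illustrative sketch (not the authors' algorithm), assuming NumPy; the functions ρ₁, ρ₂ and the constant b₁ are supplied by the user, and the M-scale is found by a naive fixed-point iteration:

```python
import numpy as np

def m_scale(r, rho1, b1, n_iter=100):
    """M-estimator of scale: solve mean(rho1(r/s)) = b1 for s by the
    fixed-point iteration s <- s * sqrt(mean(rho1(r/s)) / b1)."""
    s = float(np.median(np.abs(r))) or 1.0
    for _ in range(n_iter):
        s = s * np.sqrt(np.mean(rho1(r / s)) / b1)
    return s

def tau_scale(x, t, rho1, rho2, b1):
    """Univariate tau-scale at location t: an adaptive multiple of the
    M-scale, tau(t)^2 = s(t)^2 * mean(rho2(r/s(t))) with r = x - t.
    Minimizing tau_scale over t gives the tau-estimator of location."""
    r = x - t
    s = m_scale(r, rho1, b1)
    return s * np.sqrt(np.mean(rho2(r / s)))
```

With ρ₁(y) = ρ₂(y) = y² and b₁ = 1 this reduces to the (nonrobust) root-mean-square of the residuals, illustrating how the least-squares case is recovered.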
In this paper we study the robustness and asymptotic behaviour of τ-estimators for multivariate location and scatter under weak conditions on the underlying distribution. In Section 2 we give the definition of multivariate τ-functionals and give sufficient conditions for their existence. Continuity of these functionals, and hence consistency of the τ-estimators, is shown in Section 3. In Section 4 we show that multivariate τ-estimators relate to multivariate M-estimators as defined in Huber (1981). The location τ-estimator is shown to be equivalent to a location M-estimator defined by an adaptively weighted average of redescending ψ-functions; for the covariance τ-estimator something similar holds. The corresponding M-estimator type of score equations therefore become too complicated to obtain a limit theorem by means of Huber's results (Huber 1967). Instead we will use empirical process theory (Pollard 1984) to obtain the simultaneous limiting distribution for τ-estimators of location and scatter. The robustness of these estimators will be measured by means of the finite-sample breakdown point and the influence function. In Section 5 we show that τ-estimators have the same high breakdown point as S-estimators, and we obtain the general expression for the influence function. In Section 6 we consider a parametric location-scatter family as a special case. It turns out that in this case the limiting normal distribution and the influence function of τ-estimators are the same as those of multivariate S-estimators that are defined by means of a weighted ρ-function. This enables us to combine a high breakdown point and a bounded influence function with good asymptotic efficiency. For the univariate case our results concerning the limiting distribution of the location τ-estimator coincide with the corresponding results for the τ-estimator of the regression coefficient in Yohai and Zamar (1988).
The high efficiency of the τ-estimator of scale, which was stated but not proved in their paper, will be an immediate consequence of our results for the covariance τ-estimator.

2. DEFINITION AND EXISTENCE

2.1. Definition.

We will first define τ-functionals and then consider the τ-estimators as the image of the empirical distribution under these functionals. Denote by |M| the determinant of a p × p matrix M, and by λ_p(M) ≤ ⋯ ≤ λ₁(M) the eigenvalues of M. Let ρ₁ and ρ₂ be nonnegative functions on ℝ, and let b₁ and b₂ be positive constants. We define τ-functionals for location and scatter as follows. Let t(P) and C(P) be the vector and the positive definite symmetric p × p matrix that minimize

  φ(t, C) = |C| ( ∫ ρ₂({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) )ᵖ  (2.1)

subject to

  ∫ ρ₁({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) = b₁.  (2.2)

Denote this minimization problem by (P_P). Call t(P) the location τ-functional, and define the covariance τ-functional as

  V(P) = b₂⁻¹ C(P) ∫ ρ₂({(x − t(P))ᵀC(P)⁻¹(x − t(P))}^{1/2}) dP(x).

Let x₁, …, x_n be n observations in ℝᵖ, and denote by P_n the corresponding empirical distribution. Multivariate τ-estimators are defined as the vector t_n = t(P_n) and the matrix

  V_n = V(P_n) = b₂⁻¹ C_n (1/n) Σᵢ₌₁ⁿ ρ₂({(xᵢ − t_n)ᵀC_n⁻¹(xᵢ − t_n)}^{1/2}),

where t_n and C_n minimize

  |C| [ (1/n) Σᵢ₌₁ⁿ ρ₂({(xᵢ − t)ᵀC⁻¹(xᵢ − t)}^{1/2}) ]ᵖ

subject to

  (1/n) Σᵢ₌₁ⁿ ρ₁({(xᵢ − t)ᵀC⁻¹(xᵢ − t)}^{1/2}) = b₁.  (2.3)

Compare the definition of τ_n = (t_n, V_n) with Definition 2.1 of S-estimators on p. 1664 in Lopuhaä (1989). Note that τ-estimators are an extension of S-estimators, which are defined by minimizing |C| subject to (2.3). If we choose ρ₁ = ρ₂ and b₁ = b₂, then t_n and V_n = C_n are just the ordinary S-estimators. The least-squares estimators can also be obtained as a special case, namely with ρ₁(y) = ρ₂(y) = y² and b₁ = b₂ = p; likewise the MVE estimators, with ρ₁ = ρ₂ an indicator function and b₁ = b₂ essentially ½.
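The defining empirical minimization is easy to evaluate for a candidate pair (t, C). A minimal sketch (assuming NumPy; ρ₁ and ρ₂ are user-supplied) computes the objective and constraint values; any off-the-shelf constrained optimizer could in principle search over (t, C) with these, although this ignores the nonconvexity of the problem:

```python
import numpy as np

def mahalanobis(X, t, C):
    """Distances d_i = sqrt((x_i - t)' C^{-1} (x_i - t)) for the rows of X."""
    diff = X - t
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(C), diff))

def tau_objective(X, t, C, rho2):
    """Empirical tau-objective |C| * (mean rho2(d_i))^p from (2.1)."""
    d = mahalanobis(X, t, C)
    return np.linalg.det(C) * np.mean(rho2(d)) ** X.shape[1]

def tau_constraint(X, t, C, rho1):
    """Empirical constraint value mean rho1(d_i); (2.3) requires this = b1."""
    return np.mean(rho1(mahalanobis(X, t, C)))
```

With ρ₁(y) = y², t the sample mean, and C the (biased) sample covariance, the constraint value equals p, in line with the least-squares special case b₁ = b₂ = p mentioned above.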
To get the good robustness from the MVE estimators and the good limiting properties from the least-squares estimators, we will take functions ρ₁ and ρ₂ that are, so to speak, "in between" these two cases. Throughout the paper we will assume that ρ₁ and ρ₂ both satisfy the following conditions:

(R1) ρ_k(0) = 0, and ρ_k is twice continuously differentiable. Denote by ψ_k the derivative of ρ_k.

(R2) There exists a finite constant c_k > 0 such that ρ_k is strictly increasing on [0, c_k] and constant on [c_k, ∞). Write a_k = ρ_k(c_k).

In addition we impose the following condition only on the function ρ₂:

(A) 2ρ₂(y) − ψ₂(y)y > 0 for all y > 0.

It will guarantee that the loss function in (2.1) is a strictly increasing function of the magnitude of C (see Remark 2.1). Together with the boundedness condition in (R2), i.e., a_k = sup ρ_k < ∞, this provides the good breakdown properties of the τ-estimators. To guarantee the existence of solutions of (P_P), the constant b₁ in (2.2) must be chosen such that 0 < b₁ < a₁. A typical function ρ that satisfies all conditions above is Tukey's biweight function ρ_B(y; c) (Example 2.2 in Lopuhaä 1989). The breakdown point of the τ-estimators turns out to be a function of b₁/ρ₁(c₁). However, when x₁, …, x_n are assumed to be a sample from an elliptical distribution with density |B|⁻¹f(‖B⁻¹(x − μ)‖), where BBᵀ = Σ, one must choose b₁ = ∫ ρ₁(‖x‖)f(‖x‖) dx for consistency. In this case the breakdown point will be a function only of c₁; small values of c₁ correspond with a high breakdown point and vice versa. The smoothness conditions on ρ₁ and ρ₂ are needed to obtain asymptotic normality and a bounded influence function. The constant b₂ > 0 is only a normalizing constant to obtain consistency of V_n for the "true" scatter parameter. In the case of elliptically distributed observations one should choose b₂ = ∫ ρ₂(‖x‖)f(‖x‖) dx for V_n to be consistent for Σ. In this case the limiting variances of the τ-estimators turn out to depend on both c₁ and c₂.
However, for any c₁ fixed and c₂ large these variances will be close to those of the sample mean and the sample covariance. This enables us to combine a high breakdown point and bounded influence with a good efficiency for both t_n and V_n, for instance at the normal distribution. Possible choices for ρ₁ and ρ₂ are the biweight functions ρ₁(y) = ρ_B(y; c₁) and ρ₂(y) = ρ_B(y; c₂).

REMARK 2.1. When the distribution P does not have all its mass concentrated at one point, then any pair (t(P), C(P)) that is a solution of minimization problem (P_P) will also be a solution to the problem of minimizing φ(t, C) subject to

  ∫ ρ₁({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≤ b₁.  (2.4)

It is more convenient to deal with (2.4) than with (2.2). To see this, consider the function

  h(s) = φ(t, sC)  (2.5)

for s > 0. Note that |sC| = sᵖ|C| and that the derivative of sρ₂(ys^{-1/2}) with respect to s is ρ₂(ys^{-1/2}) − ½ψ₂(ys^{-1/2})ys^{-1/2}. Since P cannot have all its mass at t, condition (A) implies that h′(s) > 0, so that h is strictly increasing in s > 0. By means of a standard argument it follows that any solution of (P_P) will be a solution of the minimization problem with (2.4) instead of (2.2).

2.2. Existence.

Denote by PDS(p) the class of all positive definite symmetric p × p matrices, and let Θ be the parameter space ℝᵖ × PDS(p), an open subset of ℝ^{p + p(p+1)/2}. Solutions of (P_P) in Θ exist when P does not have too much mass concentrated at some hyperplane of dimension ≤ p − 1, that is, when P satisfies the following property for small enough ε:

(H_ε) For every hyperplane H with dim(H) ≤ p − 1, it holds that P(H) ≤ ε.

THEOREM 2.1. If P satisfies property (H_ε) for some 0 < ε ≤ 1 − b₁/a₁, then (P_P) has at least one solution.

Before we prove Theorem 2.1, we show two lemmas. They will imply that all possible solutions of (P_P) are contained in a compact subset of Θ. We will denote ellipsoids {x : (x − t)ᵀC⁻¹(x − t) ≤ c²} by E(t, C, c).

LEMMA 2.1. Suppose that (t, C) ∈ Θ satisfies the constraint (2.2) and that b₁ < a₁.
Then there exists a constant η > 0, which only depends on the functions ρ₁ and ρ₂ and on the constant b₁, such that

  ∫ ρ₂({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≥ η.

Proof. Consider the set B = {x : {(x − t)ᵀC⁻¹(x − t)}^{1/2} > ρ₁⁻¹(b₁/2)}. Then

  b₁ = ∫ ρ₁({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≤ (b₁/2){1 − P(B)} + a₁P(B).

Since b₁ satisfies 0 < b₁ < a₁, it follows that P(B) ≥ b₁/(2a₁ − b₁) > 0. This means that ∫ ρ₂({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≥ ρ₂(ρ₁⁻¹(b₁/2))b₁/(2a₁ − b₁) = η > 0. Q.E.D.

LEMMA 2.2.
(i) If P satisfies (H_ε) and if P(E(t, C, c₁)) ≥ ε, then there exists a constant k₁ > 0, which depends only on ε, P, and c₁, such that λ_p(C) ≥ k₁.
(ii) Suppose that ∫ ρ₁(‖x‖/m₁) dP(x) ≤ b₁ and that λ_p(C) ≥ k₁ > 0. Then there exists a constant k₂ < ∞, which depends only on k₁, η, ρ₁, ρ₂, and b₁, such that if λ₁(C) > k₂, the pair (t, C) cannot be a solution of (P_P).
(iii) Assume that P satisfies (H_ε), that P(E(t, C, c₁)) ≥ ε, and that 0 < k₁ ≤ λ_p(C) ≤ λ₁(C) ≤ k₂ < ∞. Then there exists a compact set K ⊂ Θ, which only depends on ε, P, c₁, k₁, and k₂, such that (t, C) is contained in K.

Proof. The proof is similar to the proof of Lemma 3.1 in Lopuhaä (1989); the condition (H_ε) can easily be seen to be equivalent with the condition (C_ε) that was used there. The proofs of (i) and (iii) remain the same. For (ii) note that, according to Lemma 2.1, η ≤ ∫ ρ₂({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≤ a₂. Therefore, since (0, m₁²I) satisfies the constraint (2.4), according to Remark 2.1 every possible solution of (P_P) must satisfy |C| ≤ (m₁²a₂/η)ᵖ, which means that λ₁(C) ≤ (m₁²a₂/η)ᵖ/k₁^{p−1} < ∞. Q.E.D.

Proof of Theorem 2.1. Along the lines of the proof of Theorem 3.1 in Lopuhaä (1989), it follows with Lemma 2.2 that there exists a compact subset K ⊂ Θ to which we can restrict ourselves for solving (P_P). Since the loss function in (P_P) is a continuous function of t and C, it must attain a minimum on K. Q.E.D.
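A concrete ρ satisfying (R1) and (R2) is Tukey's biweight, mentioned above. A small sketch (assuming NumPy; the closed form is the standard one, with a = ρ_B(c; c) = c²/6):

```python
import numpy as np

def rho_biweight(y, c):
    """Tukey's biweight rho_B(y; c): zero at 0, strictly increasing on
    [0, c], constant equal to a = c**2/6 on [c, inf); twice cont. diff."""
    z = np.minimum(np.abs(y), c)
    return z**2 / 2 - z**4 / (2 * c**2) + z**6 / (6 * c**4)

def psi_biweight(y, c):
    """Derivative psi_B(y; c) = y * (1 - (y/c)**2)**2 on [-c, c], zero
    outside: a redescending psi-function."""
    return np.where(np.abs(y) <= c, y * (1 - (y / c) ** 2) ** 2, 0.0)
```

The pair (ρ_B(·; c₁), ρ_B(·; c₂)) with c₁ small and c₂ large is exactly the kind of choice discussed around Theorem 2.2: c₁ governs the breakdown point, c₂ the efficiency.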
The finite-sample situation is a special case of Theorem 2.1. Let k_n be the maximum number of xᵢ's that are contained in a hyperplane of dimension p − 1. Obviously, k_n ≥ p, and k_n = p if x₁, …, x_n are in general position, i.e., no p + 1 points lie in some lower-dimensional hyperplane. An immediate consequence of Theorem 2.1 is that if n(1 − b₁/a₁) ≥ k_n + 1, the problem (P_{P_n}) has at least one solution (t_n, C_n).

To show that every solution (t_n, C_n) of (P_{P_n}) converges to a solution (t(P), C(P)) of (P_P) we shall need that (t(P), C(P)) is uniquely defined. This will be the case, for instance, for any elliptical distribution P_{μ,Σ} which satisfies the following condition:

(F) f is nonincreasing and has at least one point of decrease on [0, min(c₁, c₂)].

Note that P_{μ,Σ} satisfies property (H_ε) for every 0 < ε ≤ 1, so that according to Theorem 2.1 at least one solution of (P_{P_{μ,Σ}}) exists.

THEOREM 2.2. Let P_{μ,Σ} be an elliptical distribution that satisfies (F). Choose

  b₁ = ∫ ρ₁(‖x‖)f(‖x‖) dx  (2.6)

in (2.2). Then (P_{P_{μ,Σ}}) has a unique solution (μ, Σ).

Proof. First note that by means of a suitable rescaling it is sufficient to consider the problem (P′): find a vector t in ℝᵖ and a diagonal matrix Λ = diag(λ₁, …, λ_p) with all λᵢ > 0 that minimize φ(t, Λ) subject to (2.2) with P = P_{0,I}. To show that (P_{P_{μ,Σ}}) has a unique solution (μ, Σ) it is equivalent to show that (P′) has the unique solution (0, I). The proof of this is a subtle variation on the proof of Theorem 1 of Davies (1987). He shows that the ordinary S-minimization problem (P″) of minimizing ∏ᵢ₌₁ᵖ λᵢ over all t ∈ ℝᵖ and positive definite diagonal matrices Λ satisfying

  ∫ ρ({(x − t)ᵀΛ⁻¹(x − t)}^{1/2})f(‖x‖) dx = b  (2.7)

has the unique solution (0, I). This holds under conditions on the function ρ in (2.7) which are weaker than (R1)–(R2) and with f nonincreasing with at least one common point of decrease with the function −ρ. Therefore, under condition (F), Davies's Theorem 1 applies to the S-minimization problems with the function ρ₁ or ρ₂ in (2.7).
Note that because of the choice of b₁ in (2.6), the constraint (2.2) of the problem (P′) is exactly the same as the constraint (2.7) of the S-minimization problem (P″) with ρ = ρ₁. Since this minimization problem has the unique solution (0, I), every (t, Λ) satisfying the constraint (2.2) must have ∏ᵢ₌₁ᵖ λᵢ ≥ 1. Define the sets

  A = {(t, Λ) : (t, Λ) satisfies the constraint (2.2)} and B = {(t, Λ) : ∏ᵢ₌₁ᵖ λᵢ = 1}.

We are left showing that the problem

  min_{(t,Λ)∈A} φ(t, Λ)  (2.8)

has a unique solution (0, I). We shall first show this for B instead of A. Since ∏ᵢ₌₁ᵖ λᵢ = 1 for (t, Λ) ∈ B, it follows that minimizing φ over B is equivalent to

  min_{(t,Λ)∈B} ∫ ρ₂({(x − t)ᵀΛ⁻¹(x − t)}^{1/2})f(‖x‖) dx.  (2.9)

The key observation is that this minimization problem is exactly the transformed maximization problem considered by Davies (1987, p. 1275). It is derived from the original S-minimization problem with the function ρ₂, using that this S-minimization problem has solution (t*, Λ*) = (0, I). According to the proof of Theorem 1 of Davies (1987), the transformed problem has the unique solution (0, I); hence the problem (2.9) has a unique solution (0, I). However, (0, I) is also an element of the set A, so that minimizing φ over (t, Λ) ∈ A ∩ B also has the unique solution (0, I). Therefore, for showing that the minimization problem (2.8) has a unique solution (0, I), we are left with showing that φ(0, I) < φ(t, Λ) for all (t, Λ) ∈ A \ B. Suppose there were a pair (t̃, Λ̃) ∈ A \ B with φ(t̃, Λ̃) ≤ φ(0, I). Then for some 0 < s < 1 the pair (t̃, sΛ̃) ∈ B. The function φ(t̃, sΛ̃) is equal to the function h(s) in (2.5) with (t, C) = (t̃, Λ̃) and P spherically symmetric. This function was already shown to be strictly increasing for s > 0. Therefore we would find φ(t̃, sΛ̃) < φ(t̃, Λ̃) ≤ φ(0, I). But this would be in contradiction with the fact that (0, I) minimizes φ over B. Q.E.D.
3. CONTINUITY OF τ-FUNCTIONALS

Denote by θ(P) = (t(P), C(P)) a solution of (P_P). For a distribution P and a function g : ℝᵖ → ℝ we shall write Pg(·) = ∫ g(x) dP(x). Finally, for θ = (t, C) write

  d(x, θ) = {(x − t)ᵀC⁻¹(x − t)}^{1/2}.  (3.1)

We first show continuity of the functional θ(·).

THEOREM 3.1. Let P_k, k ≥ 0, be a sequence of distributions that converges weakly to P. Let C be the class of all measurable convex subsets of ℝᵖ, and suppose that every C ∈ C is a P-continuity set, i.e., P(∂C) = 0. Suppose that P satisfies (H_ε) for some 0 < ε < 1 − b₁/a₁ and that θ(P) = (t(P), C(P)) is uniquely defined. Then for k sufficiently large (P_{P_k}) has at least one solution θ(P_k), and for any sequence of solutions θ(P_k), k ≥ 0, it holds that lim_{k→∞} θ(P_k) = θ(P).

Proof. The proof runs along the lines of the proof of Theorem 3.2 in Lopuhaä (1989), so that a brief sketch suffices. Without loss of generality we may assume that θ(P) = (0, I). By means of Theorem 4.2 in Rao (1962) it follows that for k sufficiently large P_k satisfies (H_{1−b₁/a₁}), so that according to Theorem 2.1 at least one solution θ(P_k) = θ_k = (t_k, C_k) exists. By using that θ_k satisfies the constraint (2.2), one can show that P_k(E(t_k, C_k, c₁)) ≥ 1 − b₁/a₁ > ε and conclude with Theorem 4.2 in Rao (1962) that for k sufficiently large, P(E(t_k, C_k, c₁)) ≥ ε. According to Lemma 2.2(i) this means that there exists a constant k₁ > 0 such that λ_p(C_k) ≥ k₁ eventually. By using that ρ₁ is strictly increasing on [0, c₁] and that P_k → P weakly, it follows that for each η > 0 and k sufficiently large, P_kρ₁(‖·‖/(1 + η)) ≤ b₁. This means that the pair (0, (1 + η)²I) satisfies (2.4) for k sufficiently large.
Using that this holds for η arbitrarily close to 0, it follows from Remark 2.1 that

  lim sup_{k→∞} |C_k| {P_kρ₂(d(·, θ_k))}ᵖ ≤ {Pρ₂(‖·‖)}ᵖ.  (3.2)

Since λ_p(C_k) ≥ k₁, we find by Lemma 2.1 that λ₁(C_k) is uniformly bounded above, so that by Lemma 2.2(iii) it follows that there exists a compact subset K of Θ such that for k sufficiently large, θ_k will be in K. Therefore it suffices to show that every convergent subsequence {θ_{k_j}} has limit (0, I). Let θ_{k_j}, j = 1, 2, …, be a subsequence for which lim_{j→∞} θ_{k_j} = θ_L. According to Lemma 3.2 in Lopuhaä (1989) it holds that b₁ = lim_{j→∞} P_{k_j}ρ₁(d(·, θ_{k_j})) = Pρ₁(d(·, θ_L)). This means that θ_L satisfies the constraint (2.2) of (P_P). Since this problem has solution (0, I), we must have |C_L| {Pρ₂(d(·, θ_L))}ᵖ ≥ {Pρ₂(‖·‖)}ᵖ. Then from (3.2) it follows that equality must hold, so that θ_L is itself a solution of (P_P). However, (0, I) is the unique solution of (P_P), so that we conclude that θ_L = (0, I). Q.E.D.

Continuity of the location τ-functional t(·) is contained in Theorem 3.1. For the covariance τ-functional V(·), we have:

COROLLARY 3.1. Under the conditions of Theorem 3.1, lim_{k→∞} V(P_k) = V(P).

Proof. By definition we have V(P_k) = b₂⁻¹C(P_k)P_kρ₂(d(·, θ(P_k))). According to Lemma 3.2 in Lopuhaä (1989) it holds that P_kρ₂(d(·, θ(P_k))) → Pρ₂(d(·, θ(P))), so that the corollary immediately follows from Theorem 3.1. Q.E.D.

Consistency of the τ-estimators (t_n, V_n) is a consequence of the continuity of the functionals t(·) and V(·). Let X₁, X₂, … be a sequence of independent random vectors in ℝᵖ with a distribution P. From now on denote by P_n the empirical distribution corresponding with X₁, …, X_n.

COROLLARY 3.2. If the distribution P satisfies the conditions of Theorem 3.1, then lim_{n→∞} (t_n, V_n) = (t(P), V(P)) with probability one.

The condition that every convex set is a P-continuity set is in fact not needed in Corollary 3.2. It was needed in the proof of Theorem 3.1 merely to guarantee that the class E of all ellipsoids in ℝᵖ satisfies sup_{E∈E} |P_k(E) − P(E)| → 0.
Since E has polynomial discrimination (Pollard 1984, p. 17), for the empirical distribution P_n this property is a consequence of Theorem II.14 in Pollard (1984).

When P_{μ,Σ} is an elliptical distribution that satisfies (F), and b₁ is chosen as in Theorem 2.2, then the conditions of Theorem 3.1 are satisfied, and hence (t_n, C_n) → (μ, Σ) with probability one. Therefore, if we want V_n to be consistent for Σ, we must choose b₂ = ∫ ρ₂(‖x‖)f(‖x‖) dx. In general, if C(P) is considered to be the true scatter parameter to be estimated, one should choose

  b₂ = ∫ ρ₂(d(x, θ(P))) dP(x)  (3.3)

for V_n to be consistent for C(P).

4. LIMITING DISTRIBUTION

We first investigate the asymptotic behaviour of (t_n, C_n), from which the limiting distribution of the τ-estimators (t_n, V_n) will follow. We assume that P satisfies property (H_ε) for some 0 < ε < 1 − b₁/a₁ and that the minimization problem (P_P) has a unique solution θ₀ = (t₀, C₀), and we choose b₂ by (3.3). It will be more convenient to consider (t_n, C_n) as solutions to the problem of finding a vector t and a positive definite symmetric matrix C that minimize

  log|C| + p log( (1/n) Σᵢ₌₁ⁿ ρ₂(dᵢ) )

subject to (2.3), where dᵢ = {(xᵢ − t)ᵀC⁻¹(xᵢ − t)}^{1/2}. This problem is equivalent to the problem (P_{P_n}) of Section 2.1.

4.1. Relation to M-Estimators.

The relation with multivariate M-estimators can be obtained along the lines of Lopuhaä (1989). The Lagrangian corresponding to the minimization problem (P_{P_n}) is

  L_n(t, C, λ) = log|C| + p log( (1/n) Σᵢ₌₁ⁿ ρ₂(dᵢ) ) + λ( (1/n) Σᵢ₌₁ⁿ ρ₁(dᵢ) − b₁ ).

Every solution θ_n = (t_n, C_n) of (P_{P_n}) must be a zero of all partial derivatives of L_n. Therefore, besides satisfying the constraint (2.3), θ_n must also be a solution of the simultaneous equations consisting of

  Σᵢ₌₁ⁿ { p ( Σⱼ₌₁ⁿ ρ₂(dⱼ) )⁻¹ ψ₂(dᵢ) + λψ₁(dᵢ) } (xᵢ − t)/dᵢ = 0

and the corresponding (matrix) equation obtained from the partial derivatives with respect to C. After solving for λ in the second (matrix) equation, we get two simultaneous equations in t and C.
To keep things tidy we introduce the functions

  a(x, θ) = 2ρ₂(d(x, θ)) − ψ₂(d(x, θ))d(x, θ),
  b(x, θ) = ψ₁(d(x, θ))d(x, θ),

with d(x, θ) as in (3.1), and define A_n(θ) = P_n a(·, θ) and B_n(θ) = P_n b(·, θ). The simultaneous equations that arise are perhaps described most conveniently with the function

  ψ_n(·, θ) = A_n(θ)ψ₁(·) + B_n(θ)ψ₂(·),

which is an adaptively weighted average of the functions ψ₁ and ψ₂. We obtain a location equation, in which each observation Xᵢ enters through ψ_n(dᵢ, θ)(Xᵢ − t)/dᵢ, and a covariance equation. This is of course a system of linearly dependent equations. However, by adding a suitable multiple of the constraint (2.3) to the second equation we can avoid the linear dependence. It follows that every solution θ_n of (P_{P_n}) must be a solution of the simultaneous equations

  (1/n) Σᵢ₌₁ⁿ ψ_n(dᵢ, θ)(Xᵢ − t)/dᵢ = 0, together with the corresponding matrix equation for C.  (4.1)

These equations look like the M-estimator-type score equations as defined in Huber (1981), except that the function ψ_n(·, θ) is a weighted average of two ψ-functions, in which the weights depend on the sample X₁, …, X_n.

4.2. Asymptotic Normality.

Although θ_n is a solution of the equations (4.1) defined with the function ψ_n(·, θ), only the limiting expression of ψ_n(·, θ_n) is of importance for the asymptotic behaviour of θ_n. It turns out that (Lemma 4.3 in Lopuhaä 1990)

  lim_{n→∞} A_n(θ_n) = A₀ and lim_{n→∞} B_n(θ_n) = B₀  (4.2)

with probability one, where A₀ = Pa(·, θ₀) and B₀ = Pb(·, θ₀). This means that the function ψ_n(·, θ_n) converges pointwise with probability one to the function

  ψ₀(·) = A₀ψ₁(·) + B₀ψ₂(·).  (4.3)

Note that because P satisfies (H_ε), we have that A₀ > 0, and since θ₀ satisfies (2.2), the ellipsoid E(t₀, C₀, c₁) must have positive probability, which means that B₀ > 0. The limiting distribution of (t_n, C_n) will be shown to be the same as that of the multivariate M-estimators that solve (4.1) with ψ₀(·) instead of ψ_n(·, θ). We can write the equations (4.1) briefly as

  (1/n) Σᵢ₌₁ⁿ W_n(Xᵢ, θ) = 0.  (4.4)

The function W_n is a weighted average

  W_n(x, θ) = A_n(θ)W₁(x, θ) + B_n(θ)W₂(x, θ) + 2b₂R(x, θ)

of the functions W_k = (W_{k,loc}, W_{k,cov}), k = 1, 2, whose location components are ψ_k(d(x, θ))(x − t)/d(x, θ) and whose covariance components are the corresponding matrix-valued scores, and of the function R = (0, R_cov), where R_cov(x, θ) = {ρ₁(d(x, θ)) − b₁}C.
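The adaptive weights can be computed directly from the Mahalanobis distances. A minimal sketch (assuming NumPy, and assuming the weight functions take the forms A_n = mean{2ρ₂(dᵢ) − ψ₂(dᵢ)dᵢ} and B_n = mean{ψ₁(dᵢ)dᵢ}, in the style of the Yohai–Zamar construction):

```python
import numpy as np

def adaptive_weights(d, rho2, psi1, psi2):
    """Empirical weights (A_n, B_n) of the adaptive psi-function
    psi_n(y) = A_n * psi1(y) + B_n * psi2(y), computed from the
    Mahalanobis distances d_i of the sample."""
    A_n = np.mean(2 * rho2(d) - psi2(d) * d)
    B_n = np.mean(psi1(d) * d)
    return A_n, B_n

def psi_adaptive(y, A_n, B_n, psi1, psi2):
    """The adaptively weighted psi-function of Section 4.1."""
    return A_n * psi1(y) + B_n * psi2(y)
```

Note that with ρ₂(y) = y² (so ψ₂(y) = 2y) the weight A_n vanishes identically, consistent with the least-squares case carrying no redescending part.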
Let W₀ = (W_{0,loc}, W_{0,cov}) be the function W_n, except that ψ₀(·) [see (4.3)] replaces ψ_n(·, θ):

  W₀(x, θ) = A₀W₁(x, θ) + B₀W₂(x, θ) + 2b₂R(x, θ).  (4.5)

We shall use a tightness property from Pollard (1984) for empirical processes (P_n − P)φ indexed by functions φ in a class F. It is a combination of the approximation lemma (p. 27), Lemma II.36 (p. 36), and the equicontinuity lemma (p. 150):

LEMMA 4.1. Let F be a permissible class of real-valued functions with envelope F, and suppose that 0 < PF² < ∞. If the class of graphs of functions in F has polynomial discrimination, then for each η > 0 and ε > 0 there exists a δ > 0 for which

  lim sup_{n→∞} P( sup_{[δ]} √n |(P_n − P)(φ₁ − φ₂)| > η ) < ε,

where [δ] = {(φ₁, φ₂) : φ₁, φ₂ ∈ F and P(φ₁ − φ₂)² ≤ δ²}.

By the envelope F of F is meant a function F for which |φ| ≤ F for every φ ∈ F. The classes of functions that we shall encounter will be indexed by θ, which means that these classes will always be permissible in the sense of Pollard (1984, Appendix C). For the concept of polynomial discrimination we refer to Pollard (1984, p. 17).

Application of Lemma 4.1 involves some technicalities. For most of these we refer to Lopuhaä (1990), and we restrict ourselves to illustrating how Lemma 4.1 can be used to obtain the limiting distribution of √n(τ_n − θ₀). As a consequence of Lemma 4.1 and the consistency of θ_n (Theorem 3.1) we have the following properties (Lemma 4.5 in Lopuhaä 1990):

  (P_n − P){W_k(·, θ_n) − W_k(·, θ₀)} = o_P(1/√n), k = 1, 2, and similarly with R instead of W_k.  (4.6)

The proof of (4.6) boils down to showing that each real-valued component of W₁, W₂, and R is a linear combination of functions of the type g(d(x, θ)), h(d(x, θ))xⱼ, and k(d(x, θ))xᵢxⱼ, with g, h, and k continuous and of bounded variation. For this type of functions it can be shown (Lemmas 4.2 and 4.4 in Lopuhaä 1990) that the classes of graphs that correspond to the classes of functions G = {g(d(x, θ)) : θ ∈ Θ}, H_j = {h(d(x, θ))xⱼ : θ ∈ Θ}, and K_ij = {k(d(x, θ))xᵢxⱼ : θ ∈ Θ}, respectively, have polynomial discrimination.
If in addition E‖X₁‖⁴ < ∞, then the classes G, H_j, and K_ij have square-integrable envelopes, and hence we can apply Lemma 4.1 to the different functions that form each such real-valued component. After we put all components together, (4.6) follows.

The property (4.6) is basically all we need to determine the limiting behaviour of the solutions θ_n of the problem (P_{P_n}). This turns out to be the same as that of the multivariate M-estimator defined by Σᵢ₌₁ⁿ W₀(Xᵢ, θ) = 0, where W₀ is given in (4.5).

THEOREM 4.1. Suppose that E‖X₁‖⁴ < ∞, that PW₀(·, θ) has a nonsingular derivative Λ₀ at θ₀, and that PW_k(·, θ₀) = 0 for k = 1, 2. Then for solutions θ_n = (t_n, C_n) of (P_{P_n}),

  √n(θ_n − θ₀) = −Λ₀⁻¹ √n (P_n − P)W₀(·, θ₀) + o_P(1).

Proof. We give a brief sketch of the proof to illustrate the use of (4.6). For details we refer to Lopuhaä (1990). Since θ_n is a solution of (4.4), it follows from (4.2) that

  P_nW₀(·, θ_n) + {A_n(θ_n) − A₀}P_nW₁(·, θ_n) + {B_n(θ_n) − B₀}P_nW₂(·, θ_n) = 0.  (4.7)

By (4.6), P_nW₀(·, θ_n) = PW₀(·, θ_n) + (P_n − P)W₀(·, θ₀) + o_P(1/√n). Because θ₀ satisfies (2.2) and PW_k(·, θ₀) = 0 for k = 1, 2, we have that PW₀(·, θ₀) = 0. Therefore, by Taylor's formula it follows that PW₀(·, θ_n) = Λ₀(θ_n − θ₀) + o_P(‖θ_n − θ₀‖). If we treat the other two terms in (4.7) accordingly, they reduce to o_P(‖θ_n − θ₀‖) + o_P(1/√n), because (P_n − P)W_k(·, θ₀) = O_P(1/√n) according to the central limit theorem. We find that

  Λ₀(θ_n − θ₀) + (P_n − P)W₀(·, θ₀) = o_P(‖θ_n − θ₀‖) + o_P(1/√n).  (4.8)

Since Λ₀ is nonsingular, it follows that θ_n − θ₀ = O_P(1/√n), and after we put this into (4.8) the theorem follows. Q.E.D.

4.3. Asymptotic Normality of the τ-Estimators.

The limiting behaviour of τ_n = (t_n, V_n) may now be obtained from Theorem 4.1 by expressing V_n in terms of θ_n − θ₀. Let R₂(x, θ) = ρ₂(d(x, θ)) − b₂, and denote by D₂ the derivative of PR₂(·, θ) at θ₀. Similarly to (4.6) one may first show (P_n − P){R₂(·, θ_n) − R₂(·, θ₀)} = o_P(1/√n) and then obtain (Lemma 4.6 in Lopuhaä 1990)

  V_n − C₀ = (C_n − C₀) + b₂⁻¹C₀{(P_n − P)R₂(·, θ₀) + D₂(θ_n − θ₀)} + o_P(1/√n).  (4.9)

THEOREM 4.2. Let R₁₂(x, θ) = α₁{ρ₂(d(x, θ)) − b₂} − α₂{ρ₁(d(x, θ)) − b₁}, where α_k = Eψ_k(d(X₁, θ₀)).
Under the conditions of Theorem 4.1, √n(τ_n − θ₀) has a limiting normal distribution with zero mean and covariance matrix Λ₀⁻¹MΛ₀⁻¹ᵀ, where M is the covariance matrix of T(X₁, θ₀), with T(x, θ) = W₀(x, θ) − (b₂α₁)⁻¹R₁₂(x, θ)Λ₀(0, C₀).

Proof. With the expressions for the derivatives D₂ and Λ₀ (Lemma 4.7 in Lopuhaä 1990) and the covariance part of Theorem 4.1, one can obtain an expansion of D₂(θ_n − θ₀) in terms of (P_n − P)ρ₁(d(·, θ₀)) and (P_n − P)ρ₂(d(·, θ₀)). By (4.9) it follows that √n(τ_n − θ₀) can then be expressed in √n(θ_n − θ₀) and these empirical processes, and hence, after we apply the linear map Λ₀ to both sides, it follows from Theorem 4.1 that

  √n Λ₀(τ_n − θ₀) = −√n (P_n − P)T(·, θ₀) + o_P(1).  (4.10)

Since T(x, θ₀) is bounded, the theorem follows from the central limit theorem. Q.E.D.

5. ROBUSTNESS

5.1. Breakdown Point.

The finite-sample breakdown point (Donoho and Huber 1983) of a location estimator t_n at a collection X = (x₁, …, x_n) is defined as the smallest fraction m/n of outliers that can take the estimator over all bounds,

  ε*(t_n, X) = min_{1≤m≤n} { m/n : sup_{Y_m} ‖t_n(X) − t_n(Y_m)‖ = ∞ },  (5.1)

where the supremum is taken over all possible corrupted collections Y_m that can be obtained from X by replacing m points of X with arbitrary values. The breakdown point of a covariance estimator C_n at a collection X is defined as the smallest fraction m/n of outliers that can either take the largest eigenvalue λ₁(C_n) over all bounds, or take the smallest eigenvalue λ_p(C_n) arbitrarily close to zero:

  ε*(C_n, X) = min_{1≤m≤n} { m/n : sup_{Y_m} D(C_n(X), C_n(Y_m)) = ∞ },

where the supremum is taken over the same corrupted collections Y_m as in (5.1), and where D(A, B) = max{|λ₁(A) − λ₁(B)|, |λ_p(A)⁻¹ − λ_p(B)⁻¹|}.

THEOREM 5.1. Let X be a collection of n ≥ p + 1 points in ℝᵖ in general position. If b₁/a₁ ≤ (n − p)/(2n), then the τ-estimators (t_n, V_n) have breakdown point

  ε*(t_n, X) = ε*(V_n, X) = ⌈nb₁/a₁⌉/n,

where ⌈y⌉ denotes the smallest integer ≥ y.

Proof. As we can always rescale the functions ρ_k, we may assume that a₁ = 1.
According to Lemma 2.1 there exists a constant η > 0, which only depends on ρ₁, ρ₂, and b₁, such that

  (1/n) Σᵢ₌₁ⁿ ρ₂({(xᵢ − t_n)ᵀC_n⁻¹(xᵢ − t_n)}^{1/2}) ≥ η.

Therefore V_n breaks down whenever C_n does, and it suffices to consider breakdown of t_n and C_n. The rest of the proof runs along the lines of the proof of Theorem 3.2 in Lopuhaä and Rousseeuw (1991). See Lopuhaä (1990) for details. Q.E.D.

The optimal value (1/n)⌊(n − p + 1)/2⌋ for the breakdown point is obtained by choosing b₁/a₁ = (n − p)/(2n). Note that the breakdown point of the τ-estimators depends only on the constant b₁/a₁, or only on the constant c₁ if b₁ is chosen as in Theorem 2.2. This means that ρ₂ can be varied without changing the value of the breakdown point.

5.2. Influence Function.

Whereas the breakdown point measures the global sensitivity of an estimator under large perturbations, the local sensitivity may be measured by the influence function (Hampel 1974), which describes the influence of one single outlier. For the τ-functional τ(·) = (t(·), V(·)) it is defined as

  IF(x; τ, P) = lim_{h↓0} [ τ((1 − h)P + hδ_x) − τ(P) ] / h,

where δ_x denotes the Dirac measure concentrated in x ∈ ℝᵖ. We assume that P satisfies property (H_ε) for some 0 < ε < 1 − b₁/a₁ and that the minimization problem (P_P) has a unique solution θ₀ = (t₀, C₀), and we take b₂ as in (3.3). For x ∈ ℝᵖ and 0 ≤ h ≤ 1 write P_{h,x} = (1 − h)P + hδ_x. For h ↓ 0 the distribution P_{h,x} converges weakly to P. According to Theorem 3.1 this means that at least one solution to the problem (P_{P_{h,x}}) exists for h sufficiently small and that θ(P_{h,x}) → θ₀.

THEOREM 5.2. Under the conditions of Theorem 4.1 the τ-functional has influence function IF(x; τ, P) = −Λ₀⁻¹T(x, θ₀), where the function T(x, θ) is defined in Theorem 4.2.

If one reads P_{h,x} instead of P_n, the proof runs along the lines of the proof of Theorem 4.2. We refer to Lopuhaä (1990) for details.
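The defining limit can be approximated numerically for any functional that accepts weighted samples. The following illustrative sketch (assuming NumPy; the sample-mean functional is used only as an example, since its influence function at P is known to be x − μ) replaces P by an empirical sample and δ_x by an extra mass point:

```python
import numpy as np

def influence_fd(T, sample, x0, h=1e-5):
    """Finite-difference approximation of the influence function:
    [T((1-h)P + h*delta_{x0}) - T(P)] / h, with P represented by
    `sample` and T a functional of (points, weights)."""
    n = len(sample)
    pts = np.vstack([sample, x0])
    w_contam = np.append(np.full(n, (1 - h) / n), h)
    w_clean = np.append(np.full(n, 1.0 / n), 0.0)
    return (T(pts, w_contam) - T(pts, w_clean)) / h

def weighted_mean(pts, w):
    # Sample-mean functional; since it is linear in P, the finite
    # difference recovers IF(x) = x - mean exactly for any h.
    return w @ pts
```

For the robust functionals of this paper the same scheme applies in principle, with T a weighted version of the τ-estimator, but then a small h only approximates the limit.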
It follows immediately from the expression for $\Psi(x, \theta_0)$ given in Theorem 4.2 that $\mathrm{IF}(x; T, P)$ is bounded. A more explicit expression for $\mathrm{IF}(x; T, P)$ can be obtained at elliptical distributions. This will be done in the next section.

6. ELLIPTICAL DISTRIBUTIONS

As a special case we consider elliptical distributions. In this case one may show that $\Psi(x, \theta_0) = S_0(x, \theta_0)$, where $S_0 = (S_{0,\mathrm{loc}}, S_{0,\mathrm{cov}})$ is the function corresponding to the S-estimating equations defined by means of $\rho_0$, where $\rho_0$ is a weighted average of the functions $\rho_1$ and $\rho_2$ [see (6.2)], $b_0 = A_0 b_1 + B_0 b_2 = E\rho_0(d(X_1, \theta_0))$, and $\psi_0$ is defined in (4.3). Furthermore, the derivative of $P S_0(\cdot, \theta)$ at $\theta_0$ is equal to $A_0$. The function $S_0$ is exactly the $\Theta$-valued function in the equation $\sum_{i=1}^{n} S_0(X_i, \theta) = 0$, of which the multivariate S-estimator $\theta_n^S$ defined by the function $\rho_0$ is a solution [see (2.7) on p. 1666 in Lopuhaä (1989)]. Since $T_n$ behaves like the solutions of $\sum_{i=1}^{n} \Psi(X_i, \theta) = 0$ [see (4.10)], its asymptotic properties must be the same as those of the multivariate S-estimator $\theta_n^S$ (see Lopuhaä 1990 for a more rigorous argument). The exact expressions for the limiting distribution and the influence function can thus be read from Corollaries 5.1 and 5.2 in Lopuhaä (1989).

Let $\mathrm{vec}(A)$ be the $p^2$-vector that stacks the columns of a $p \times p$ matrix $A$ on top of each other, and let $K_{p,p}$ be the $p^2 \times p^2$ permutation matrix uniquely defined by the property $K_{p,p}\,\mathrm{vec}(A) = \mathrm{vec}(A^T)$ for all $A$. Denote by $A \otimes B$ the Kronecker product, which is a $p^2 \times p^2$ block matrix with the $(i,j)$th block equal to $a_{ij}B$.

COROLLARY 6.1. Let $P$ be an elliptical distribution with parameter $\theta_0 = (\mu, \Sigma)$, let $d_0 = \{(X_1 - \mu)^T \Sigma^{-1}(X_1 - \mu)\}^{1/2}$, and let $\rho_0$ and $\psi_0 = \rho_0'$ be defined by (6.2) and (4.3). Suppose that the conditions of Theorem 2.2 hold and that $P$ has a finite fourth moment. Then $\sqrt{n}(T_n - \theta_0)$ has a limiting normal distribution with zero mean, and $t_n$ and $V_n$ are asymptotically independent.
The covariance of the limiting distribution of $\sqrt{n}(t_n - \mu)$ is given by $(\alpha_0/\beta_0^2)\Sigma$, where $\alpha_0 = p^{-1} E\psi_0^2(d_0)$ and $\beta_0 = p^{-1} E\{(p - 1)\psi_0(d_0)/d_0 + \psi_0'(d_0)\}$. The covariance matrix of the limiting distribution of the matrix $\sqrt{n}(V_n - \Sigma)$ is given by
$$\sigma_{01}(I + K_{p,p})(\Sigma \otimes \Sigma) + \sigma_{02}\,\mathrm{vec}(\Sigma)\,\mathrm{vec}(\Sigma)^T,$$
where $\sigma_{01} = \{p(p + 2)\}^{-1}\gamma_0^{-2} E\psi_0^2(d_0)\,d_0^2$ and $\sigma_{02} = -(2/p)\sigma_{01} + 4\omega_0^{-2} E\{\rho_0(d_0) - b_0\}^2$, with $\gamma_0$ defined in (6.3) and $\omega_0 = E\psi_0(d_0)\,d_0$.

Note that because $Eg(d_0) = \int g(\|x\|) f(\|x\|)\,dx$, the scalars in Corollary 6.1 do not depend on $(\mu, \Sigma)$. When $\rho_2(y)$ tends to some multiple of $y^2$ as $c_2 \to \infty$, for instance when $\rho_2(y)$ is the biweight function $\rho_B(y; c_2)$, then $\alpha_0/\beta_0^2$, $\sigma_{01}$, and $\sigma_{02}$ will tend to the corresponding values for the sample mean and the sample covariance. This means that for large values of $c_2$ one has good asymptotic efficiency relative to the sample mean and sample covariance. This is true for any fixed value of $c_1$. Hence, we can choose $c_1$ such that $t_n$ and $V_n$ have a high breakdown point (Theorem 5.1) and then vary $c_2$ to obtain good efficiency (for instance) at the normal distribution.

For the influence function we only give the expression at spherically symmetric distributions. The expressions at general elliptical distributions can be found by using affine equivariance.

COROLLARY 6.2. Let $P$ be spherically symmetric. Under the conditions of Corollary 6.1 it holds that the location τ-functional has influence function
$$\mathrm{IF}\big(x; t(\cdot), P\big) = \beta_0^{-1}\,\psi_0(\|x\|)\,\frac{x}{\|x\|},$$
where $\beta_0$ is defined in Corollary 6.1. The covariance τ-functional has influence function
$$\mathrm{IF}\big(x; V(\cdot), P\big) = \gamma_0^{-1}\,\psi_0(\|x\|)\,\|x\|\,\Big(\frac{xx^T}{\|x\|^2} - \frac{1}{p}I\Big) + 2\omega_0^{-1}\big\{\rho_0(\|x\|) - b_0\big\}\,I,$$
where $\gamma_0$ and $\omega_0$ are defined in Corollary 6.1.

ACKNOWLEDGEMENT

I thank Werner Stahel and Rudolf Grübel for helpful suggestions and remarks.

REFERENCES

Davies, P.L. (1987). Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann. Statist., 15, 1269-1292.
Davies, P.L. (1989). Improving S-estimators by means of k-step M-estimators. Technical Report, GHS-Essen.
Donoho, D.L., and Huber, P.J. (1983). The notion of breakdown point.
A Festschrift for Erich L. Lehmann (P.J. Bickel, K.A. Doksum, and J.L. Hodges, Jr., eds.), Wadsworth, Belmont, Calif., 157-184.
Hampel, F.R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc., 69, 383-393.
Huber, P.J. (1967). The behaviour of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (L. Le Cam and J. Neyman, eds.), Univ. of California Press, Berkeley, 221-233.
Huber, P.J. (1981). Robust Statistics. Wiley, New York.
Kim, J., and Pollard, D. (1990). Cube root asymptotics. Ann. Statist., 18, 191-219.
Lopuhaä, H.P. (1989). On the relation between S-estimators and M-estimators of multivariate location and covariance. Ann. Statist., 17, 1662-1683.
Lopuhaä, H.P. (1990). Multivariate τ-estimators for location and scatter. Technical Report 90-04, Delft University of Technology.
Lopuhaä, H.P., and Rousseeuw, P.J. (1991). Breakdown properties of affine equivariant estimators of multivariate location and covariance matrices. Ann. Statist., 19, 229-248.
Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.
Rao, R.R. (1962). Relations between weak and uniform convergence of measures with applications. Ann. Math. Statist., 33, 659-680.
Rousseeuw, P.J. (1983). Multivariate estimation with high breakdown point. Presented at the Fourth Pannonian Symposium on Mathematical Statistics and Probability, Bad Tatzmannsdorf, Austria, 4-9 September 1983. Mathematical Statistics and Applications (1985) (W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, eds.), Reidel, Dordrecht, 283-297.
Yohai, V.J., and Zamar, R. (1988). High breakdown-point estimates of regression by means of the minimization of an efficient scale. J. Amer. Statist. Assoc., 83, 406-413.
Received 22 January 1990
Revised 19 July 1990
Accepted 18 October 1990

Department of Mathematics
Delft University of Technology
Julianalaan 132
2628 BL Delft
The Netherlands