The Canadian Journal of Statistics
Vol. 19, No. 3, 1991, Pages 307-321
La Revue Canadienne de Statistique
Multivariate τ-estimators for
location and scatter*
Hendrik P. LOPUHAÄ
Delft University of Technology
Key words and phrases: τ-estimators, high breakdown point, bounded influence, high
efficiency.
AMS 1985 subject classifications: 62F35, 62H12.
ABSTRACT
We discuss the robustness and asymptotic behaviour of τ-estimators for multivariate location and
scatter. We show that τ-estimators correspond to multivariate M-estimators defined by a weighted
average of redescending ψ-functions, where the weights are adaptive. We prove consistency and
asymptotic normality under weak assumptions on the underlying distribution, show that τ-estimators
have a high breakdown point, and obtain the influence function at general distributions. In the
special case of a location-scatter family, τ-estimators are asymptotically equivalent to multivariate
S-estimators defined by means of a weighted ρ-function. This enables us to combine a high
breakdown point and bounded influence with good asymptotic efficiency for the location and
covariance estimator.
RÉSUMÉ
We discuss the robustness and the asymptotic behaviour of τ-estimators of multivariate location
and scatter parameters. We show that τ-estimators correspond to multivariate M-estimators defined
by a weighted average of redescending ψ-functions, the weights being adaptive. We prove consistency
and asymptotic normality under weak assumptions on the underlying distribution. We show that
τ-estimators have a high breakdown point and we obtain the influence function at general
distributions. In the special case of a location-scatter family, τ-estimators are asymptotically
equivalent to multivariate S-estimators defined by means of a weighted ρ-function. This makes it
possible to combine a high breakdown point and bounded influence with good asymptotic efficiency
for the location and covariance estimators.
1. INTRODUCTION
The minimum-volume ellipsoid (MVE) estimator is defined as the center and scatter
matrix of the smallest ellipsoid containing at least half of the observations (Rousseeuw
1983). This estimator is known to have good robustness properties, but its limiting
behaviour is poor, as it converges with rate n^{1/3} towards a nonnormal limiting distribution
(Kim and Pollard 1990, Davies 1989). To retain the robustness and to improve the
asymptotic properties one can smooth the condition of covering half of the observations.
This may result in multivariate S-estimators, defined as the center and scatter matrix of
the smallest ellipsoid that satisfies a condition on the average of smoothly transformed
Mahalanobis distances (Davies 1987, Lopuhaä 1989). In the univariate case this is
*This research is financially supported by NWO under Grant 10-62-10.
equivalent to computing an M-estimator of scale as a function of the location parameter
μ and minimizing it over μ. The S-estimators converge with rate √n towards a normal
distribution. However, there is a tradeoff between robustness and asymptotic efficiency:
a high breakdown point corresponds to a low efficiency and vice versa.
Yohai and Zamar (1988) investigated an extension of regression S-estimators, which
retains the robustness and improves the asymptotic efficiency. In the special case of
estimating univariate location and scale, their proposal amounts to the following. To
make the M-estimator of scale more efficient, they consider an adaptive multiple of it,
which they call a τ-estimator of scale, and minimize this as a function of the location
parameter. Regression τ-estimators were studied under the assumption of the usual
parametric regression model with random carriers independent of the error terms.
In this paper we study the robustness and asymptotic behaviour of τ-estimators for
multivariate location and scatter under weak conditions on the underlying distribution.
In Section 2 we give the definition of multivariate τ-functionals and give sufficient
conditions for their existence. Continuity of these functionals, and hence consistency
of the τ-estimators, is shown in Section 3. In Section 4 we show that multivariate
τ-estimators relate to multivariate M-estimators as defined in Huber (1981). The location
τ-estimator is shown to be equivalent to a location M-estimator, defined by an adaptively
weighted average of redescending ψ-functions; for the covariance τ-estimator something
similar holds. The corresponding M-estimator type of score equations therefore become
too complicated to obtain a limit theorem by means of Huber's results (Huber 1967).
Instead we will use empirical process theory (Pollard 1984) to obtain the simultaneous
limiting distribution for τ-estimators of location and scatter.
The robustness of these estimators will be measured by means of the finite-sample
breakdown point and the influence function. In Section 5 we show that τ-estimators have
the same high breakdown point as S-estimators, and we obtain the general expression
for the influence function. In Section 6 we consider a parametric location-scatter family
as a special case. It turns out that in this case the limiting normal distribution and the
influence function of τ-estimators are the same as those of multivariate S-estimators
that are defined by means of a weighted ρ-function. This enables us to combine a high
breakdown point and a bounded influence function with good asymptotic efficiency.
For the univariate case our results concerning the limiting distribution of the location
τ-estimator coincide with the corresponding results for the τ-estimator of the regression
coefficient in Yohai and Zamar (1988). The high efficiency of the τ-estimator of scale,
which was stated but not proved in their paper, will be an immediate consequence of our
results for the covariance τ-estimator.
2. DEFINITION AND EXISTENCE
2.1. Definition.
We will first define τ-functionals and then consider the τ-estimators as the image
of the empirical distribution under these functionals. Denote by |M| the determinant
of a p × p matrix M, and by λ_p(M) ≤ ··· ≤ λ₁(M) the eigenvalues of M. Let ρ₁ and
ρ₂ be nonnegative functions on ℝ, and let b₁ and b₂ be positive constants. We define
τ-functionals for location and scatter as follows.
Let t(P) and C(P) be the vector and the positive definite symmetric p × p matrix that
minimize
   φ(t, C) = |C| ( ∫ ρ₂({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) )^p    (2.1)
subject to
   ∫ ρ₁({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) = b₁.    (2.2)
Denote this minimization problem by (T_P). Call t(P) the location τ-functional, and define
the covariance τ-functional as
   V(P) = b₂⁻¹ C(P) ∫ ρ₂({(x − t(P))ᵀC(P)⁻¹(x − t(P))}^{1/2}) dP(x).
Let x₁, ..., x_n be n observations in ℝ^p, and denote by P_n the corresponding empirical
distribution. Multivariate τ-estimators are defined as the vector t_n = t(P_n) and the matrix
   V_n = V(P_n) = b₂⁻¹ C_n (1/n) Σ_{i=1}^n ρ₂({(x_i − t_n)ᵀC_n⁻¹(x_i − t_n)}^{1/2}),
where t_n and C_n minimize
   |C| [ (1/n) Σ_{i=1}^n ρ₂({(x_i − t)ᵀC⁻¹(x_i − t)}^{1/2}) ]^p
subject to
   (1/n) Σ_{i=1}^n ρ₁({(x_i − t)ᵀC⁻¹(x_i − t)}^{1/2}) = b₁.    (2.3)
Compare the definition of τ_n = (t_n, V_n) with Definition 2.1 of S-estimators on p. 1664
in Lopuhaä (1989). Note that τ-estimators are an extension of S-estimators, which are
defined by minimizing |C| subject to (2.3). If we choose ρ₁ = ρ₂ and b₁ = b₂, then t_n
and V_n = C_n are just the ordinary S-estimators.
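To make the finite-sample definition concrete, the following is a minimal numerical sketch (ours, not from the paper) that evaluates the objective |C|[(1/n)Σρ₂(d_i)]^p and the constraint for a candidate pair (t, C); it assumes NumPy, and the names rho1 and rho2 stand for user-supplied functions ρ₁ and ρ₂.

    import numpy as np

    def mahalanobis(X, t, C):
        """d_i = {(x_i - t)' C^{-1} (x_i - t)}^{1/2} for the rows of X."""
        diff = X - t
        sol = np.linalg.solve(C, diff.T).T      # rows of C^{-1} (x_i - t)
        return np.sqrt(np.sum(diff * sol, axis=1))

    def tau_objective(X, t, C, rho2):
        """|C| * ((1/n) sum rho2(d_i))^p, the quantity minimized over (t, C)."""
        n, p = X.shape
        d = mahalanobis(X, t, C)
        return np.linalg.det(C) * np.mean(rho2(d)) ** p

    def constraint_gap(X, t, C, rho1, b1):
        """(1/n) sum rho1(d_i) - b1; a candidate (t, C) is feasible when this is 0."""
        return np.mean(rho1(mahalanobis(X, t, C))) - b1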
The least-squares estimators can also be obtained as a special case, namely with
ρ₁(y) = ρ₂(y) = y² and b₁ = b₂ = p; likewise the MVE estimators with ρ₁ = ρ₂ an
indicator function and b₁ = b₂ essentially ½. To get the good robustness from the MVE
estimators and the good limiting properties from the least-squares estimators, we will
take functions ρ₁ and ρ₂ that are, so to speak, "in between" these two cases. Throughout
the paper we will assume that ρ₁ and ρ₂ both satisfy the following conditions:
(R1) ρ_k(0) = 0, and ρ_k is twice continuously differentiable. Denote by ψ_k the
derivative of ρ_k.
(R2) There exists a finite constant c_k > 0 such that ρ_k is strictly increasing on [0, c_k]
and constant on [c_k, ∞). Write a_k = ρ_k(c_k).
In addition we impose the following condition only on the function ρ₂:
(A) 2ρ₂(y) − ψ₂(y)y ≥ 0 for all y ≥ 0.
It will guarantee that the loss function in (2.1) is a strictly increasing function of the
magnitude of C (see Remark 2.1). Together with the boundedness condition in (R2),
i.e., a_k = sup ρ_k < ∞, this provides the good breakdown properties of the τ-estimators.
To guarantee the existence of solutions of (T_P), the constant b₁ in (2.2) must be chosen
such that 0 < b₁ < a₁. A typical function ρ that satisfies all conditions above is Tukey's
biweight function ρ_B(y; c) (Example 2.2 in Lopuhaä 1989).
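For reference, the biweight functions mentioned above can be coded as follows; this is a sketch under the usual parameterization of Tukey's biweight (our normalization, with ρ_B constant equal to c²/6 beyond c), and ψ_B denotes its derivative.

    import numpy as np

    def rho_biweight(y, c):
        """Tukey's biweight rho_B(y; c): strictly increasing on [0, c], constant c^2/6 beyond c."""
        y = np.abs(np.asarray(y, dtype=float))
        body = y**2 / 2 - y**4 / (2 * c**2) + y**6 / (6 * c**4)
        return np.where(y <= c, body, c**2 / 6)

    def psi_biweight(y, c):
        """psi_B = rho_B', i.e. y (1 - (y/c)^2)^2 on [-c, c] and 0 beyond."""
        y = np.asarray(y, dtype=float)
        return np.where(np.abs(y) <= c, y * (1 - (y / c)**2)**2, 0.0)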
The breakdown point of the τ-estimators turns out to be a function of b₁/ρ₁(c₁). However,
when x₁, ..., x_n are assumed to be a sample from an elliptical distribution with density
|B|⁻¹ f(‖B⁻¹(x − μ)‖), where BBᵀ = Σ, one must choose b₁ = ∫ ρ₁(‖x‖) f(‖x‖) dx
for consistency. In this case the breakdown point will be a function only of c₁; small
values of c₁ correspond with a high breakdown point and vice versa. The smoothness
conditions on ρ₁ and ρ₂ are needed to obtain asymptotic normality and a bounded influence function. The constant b₂ > 0 is only a normalizing constant to obtain consistency
of V_n for the "true" scatter parameter. In the case of elliptically distributed observations
one should choose b₂ = ∫ ρ₂(‖x‖) f(‖x‖) dx for V_n to be consistent for Σ. In this case
the limiting variances of the τ-estimators turn out to depend on both c₁ and c₂. However,
for any c₁ fixed and c₂ large these variances will be close to those of the sample mean
and the sample covariance. This enables us to combine a high breakdown point and
bounded influence with a good efficiency for both t_n and V_n, for instance at the normal
distribution. Possible choices for ρ₁ and ρ₂ are the biweight functions ρ₁(y) = ρ_B(y; c₁)
and ρ₂(y) = ρ_B(y; c₂).
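As an illustration of how b₁ and b₂ can be obtained in practice at the p-variate standard normal (where ‖X‖ has a χ_p distribution), the following sketch (ours) computes b_k = Eρ_B(‖X‖; c_k) by numerical integration; it assumes SciPy, and the tuning constants c₁ and c₂ are illustrative only.

    import numpy as np
    from scipy import integrate, stats

    def rho_biweight(y, c):
        y = np.abs(y)
        return np.where(y <= c, y**2 / 2 - y**4 / (2 * c**2) + y**6 / (6 * c**4), c**2 / 6)

    def consistency_constant(c, p):
        """b = E rho_B(||X||; c) under the p-variate standard normal, ||X|| ~ chi_p."""
        chi_pdf = stats.chi(df=p).pdf
        val, _ = integrate.quad(lambda r: rho_biweight(r, c) * chi_pdf(r), 0.0, np.inf)
        return val

    p = 2
    b1 = consistency_constant(c=1.55, p=p)      # small c1: high breakdown point
    b2 = consistency_constant(c=4.0, p=p)       # large c2: high efficiency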
REMARK 2.1. When the distribution P does not have all its mass concentrated at one
point, then any pair (t(P), C(P)) that is a solution of minimization problem (T_P) will
also be a solution to the problem of minimizing φ(t, C) subject to
   ∫ ρ₁({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≤ b₁.    (2.4)
It is more convenient to deal with (2.4) than with (2.2). To see this, consider the function
   h(s) = φ(t, sC) = |sC| ( ∫ ρ₂({(x − t)ᵀ(sC)⁻¹(x − t)}^{1/2}) dP(x) )^p    (2.5)
for s > 0. Note that |sC| = s^p|C| and that the derivative of sρ₂(ys^{−1/2}) with respect to s is
ρ₂(ys^{−1/2}) − ½ψ₂(ys^{−1/2})ys^{−1/2}. Since P cannot have all its mass at t, condition (A) implies
that h′(s) > 0, so that h is strictly increasing in s > 0. By means of a standard argument
it follows that any solution of (T_P) will be a solution of the minimization problem with
(2.4) instead of (2.2).
2.2. Existence.
Denote by PDS(p) the class of all positive definite symmetric p × p matrices, and let
Θ be the parameter space ℝ^p × PDS(p), an open subset of ℝ^{p + p(p+1)/2}. Solutions of (T_P)
in Θ exist when P does not have too much mass concentrated at some hyperplane of
dimension ≤ p − 1, that is, when P satisfies the following property for small enough ε:
(H_ε) For every hyperplane H with dim(H) ≤ p − 1, it holds that P(H) ≤ ε.
THEOREM 2.1. If P satisfies property (H_ε) for some 0 < ε ≤ 1 − b₁/a₁, then (T_P) has at
least one solution.
Before we prove Theorem 2.1, we show two lemmas. They will imply that all possible
solutions of (T_P) are contained in a compact subset of Θ. We will denote ellipsoids
{x : (x − t)ᵀC⁻¹(x − t) ≤ c²} by E(t, C, c).
LEMMA 2.1. Suppose that (t, C) ∈ Θ satisfies the constraint (2.2) and that b₁ < a₁. Then
there exists a constant q > 0, which only depends on the functions ρ₁ and ρ₂ and on the
constant b₁, such that ∫ ρ₂({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≥ q.
Proof. Consider the set B = {x : {(x − t)ᵀC⁻¹(x − t)}^{1/2} > ρ₁⁻¹(b₁/2)}. Then
   b₁ = ∫ ρ₁({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≤ (b₁/2){1 − P(B)} + a₁P(B).
Since b₁ satisfies 0 < b₁ < a₁, it follows that P(B) ≥ b₁/(2a₁ − b₁) > 0. This means that
∫ ρ₂({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≥ ρ₂(ρ₁⁻¹(b₁/2)) b₁/(2a₁ − b₁) = q > 0. Q.E.D.
LEMMA 2.2.
(i) If P satisfies (H_ε) and if P(E(t, C, c₁)) ≥ ε, then there exists a constant k₁ > 0,
which depends only on ε, P, and c₁, such that λ_p(C) ≥ k₁.
(ii) Suppose that ∫ ρ₁(‖x‖/m) dP(x) ≤ b₁ and that λ_p(C) ≥ k₁ > 0. Then there exists
a constant k₂ < ∞, which depends only on k₁, q, ρ₁, ρ₂, and b₁, such that if λ₁(C) > k₂,
the pair (t, C) cannot be a solution of (T_P).
(iii) Assume that P satisfies (H_ε), that P(E(t, C, c₁)) ≥ ε, and that 0 < k₁ ≤ λ_p(C) ≤
λ₁(C) ≤ k₂ < ∞. Then there exists a compact set K ⊂ Θ, which only depends on
ε, P, c₁, k₁, and k₂, such that (t, C) is contained in K.
Proof. The proof is similar to the proof of Lemma 3.1 in Lopuhaä (1989); the condition
(H_ε) can easily be seen to be equivalent with the condition (C_ε) that was used there.
The proofs of (i) and (iii) remain the same. For (ii) note that, according to Lemma 2.1,
q ≤ ∫ ρ₂({(x − t)ᵀC⁻¹(x − t)}^{1/2}) dP(x) ≤ a₂. Therefore, since (0, m²I) satisfies the
constraint (2.4), according to Remark 2.1 every possible solution of (T_P) must satisfy
|C| ≤ (m²a₂/q)^p, which means that λ₁(C) ≤ (m²a₂/q)^p/k₁^{p−1} < ∞. Q.E.D.
Proof of Theorem 2.1. Along the lines of the proof of Theorem 3.1 in Lopuhaä (1989),
it follows with Lemma 2.2 that there exists a compact subset K ⊂ Θ to which we can
restrict ourselves for solving (T_P). Since the loss function in (T_P) is a continuous function
of t and C, it must attain a minimum on K. Q.E.D.
The finite-sample situation is a special case of Theorem 2.1. Let k_n be the maximum
number of x_i's that are contained in a hyperplane of dimension p − 1. Obviously, k_n ≥ p,
and k_n = p if x₁, ..., x_n are in general position, i.e., no p + 1 points lie in some
lower-dimensional hyperplane. An immediate consequence of Theorem 2.1 is that if
n(1 − b₁/a₁) ≥ k_n + 1, problem (T_{P_n}) has at least one solution (t_n, C_n). To show that
every solution (t_n, C_n) of (T_{P_n}) converges to a solution (t(P), C(P)) of (T_P) we shall need
that (t(P), C(P)) is uniquely defined. This will be the case, for instance, for any elliptical
distribution P_{μ,Σ} which satisfies the following condition.
(F) f is nonincreasing and has at least one point of decrease on [0, min(c₁, c₂)].
Note that P_{μ,Σ} satisfies property (H_ε) for every 0 < ε ≤ 1, so that according to Theorem
2.1 at least one solution of (T_{P_{μ,Σ}}) exists.
THEOREM 2.2. Let P_{μ,Σ} be an elliptical distribution that satisfies (F). Choose
   b₁ = ∫ ρ₁(‖x‖) f(‖x‖) dx    (2.6)
in (2.2). Then (T_{P_{μ,Σ}}) has a unique solution (μ, Σ).
Proof. First note that by means of a suitable rescaling it is sufficient to consider the
problem (T_{0,I}): find a vector t in ℝ^p and a diagonal matrix Λ = diag(λ₁, ..., λ_p) with
all λ_i > 0 that minimize φ(t, Λ) subject to
   ∫ ρ₁({(x − t)ᵀΛ⁻¹(x − t)}^{1/2}) f(‖x‖) dx = b₁.
To show that (T_{P_{μ,Σ}}) has a unique solution (μ, Σ) it is equivalent to show that (T_{0,I})
has the unique solution (0, I). The proof of this is a subtle variation on the proof of
Theorem 1 of Davies (1987). He shows that the ordinary S-minimization problem (P_{0,I}) of
minimizing Π_{i=1}^p λ_i over all t ∈ ℝ^p and positive definite diagonal matrices Λ satisfying
   ∫ ρ({(x − t)ᵀΛ⁻¹(x − t)}^{1/2}) f(‖x‖) dx = ∫ ρ(‖x‖) f(‖x‖) dx    (2.7)
has the unique solution (0, I). This holds under conditions on the function ρ in (2.7)
which are weaker than (R1)-(R2) and with f nonincreasing with at least one common
point of decrease with the function −ρ. Therefore, under condition (F), Davies's Theorem
1 applies to the S-minimization problems with the function ρ₁ or ρ₂ in (2.7). Note that
because of the choice of b₁ in (2.6), the constraint (2.2) of the problem (T_{0,I}) is exactly
the same as the constraint (2.7) of the S-minimization problem (P_{0,I}) with ρ = ρ₁. Since
this minimization problem has a unique solution (0, I), we have Π_{i=1}^p λ_i ≥ |I| = 1.
Define the sets A = {(t, Λ) : Π_{i=1}^p λ_i ≥ 1 and (t, Λ) satisfies the constraint (2.2)} and
B = {(t, Λ) : Π_{i=1}^p λ_i = 1}. We are left with showing that the problem
   min_{(t,Λ)∈A} φ(t, Λ)    (2.8)
has a unique solution (0, I). We shall first show this for B instead of A. Since Π_{i=1}^p λ_i = 1
for (t, Λ) ∈ B, it follows that minimizing φ over B is equivalent to
   min_{(t,Λ)∈B} ∫ ρ₂({(x − t)ᵀΛ⁻¹(x − t)}^{1/2}) f(‖x‖) dx.    (2.9)
The key observation is that this minimization problem is exactly the transformed maximization problem considered by Davies (1987, p. 1275). It is derived from the original
S-minimization problem with the function ρ₂ using that this S-minimization problem has
solution (t*, Λ*) = (0, I). According to the proof of Theorem 1 of Davies (1987), the
transformed problem has the unique solution (0, I); hence the problem (2.9) has a unique
solution (0, I). However, (0, I) is also an element of the set A, so that minimizing φ over
(t, Λ) ∈ A ∩ B also has the unique solution (0, I).
Therefore, for showing that the minimization problem (2.8) has a unique solution
(0, I), we are left with showing that φ(0, I) < φ(t, Λ) for all (t, Λ) ∈ A \ B. Suppose
there were a pair (t̃, Λ̃) ∈ A \ B with φ(t̃, Λ̃) ≤ φ(0, I). Then for some 0 < s < 1
the pair (t̃, sΛ̃) ∈ B. The function φ(t̃, sΛ̃) is equal to the function h(s) in (2.5) with
(t, C) = (t̃, Λ̃) and P spherically symmetric. This function was already shown to be
strictly increasing for s > 0. Therefore we would find φ(t̃, sΛ̃) < φ(t̃, Λ̃) ≤ φ(0, I). But
this would be in contradiction with the fact that (0, I) minimizes φ over B. Q.E.D.
3. CONTINUITY OF τ-FUNCTIONALS
Denote by θ(P) = (t(P), C(P)) a solution of (T_P). For a distribution P and a function
g : ℝ^p → ℝ we shall write Pg(·) = ∫ g(x) dP(x). Finally, for θ = (t, C) write
   d(x, θ) = {(x − t)ᵀC⁻¹(x − t)}^{1/2}.    (3.1)
We first show continuity of the functional θ(·).
THEOREM 3.1. Let P_k, k ≥ 1, be a sequence of distributions that converges weakly to P. Let
C be the class of all measurable convex subsets of ℝ^p, and suppose that every C ∈ C is
a P-continuity set, i.e., P(∂C) = 0. Suppose that P satisfies (H_ε) for some 0 < ε < 1 − b₁/a₁
and that θ(P) = (t(P), C(P)) is uniquely defined. Then for k sufficiently large (T_{P_k}) has
at least one solution θ(P_k), and for any sequence of solutions θ(P_k), k ≥ 1, it holds that
lim_{k→∞} θ(P_k) = θ(P).
Proof. The proof runs along the lines of the proof of Theorem 3.2 in Lopuhaä (1989), so
that a brief sketch suffices. Without loss of generality we may assume that θ(P) = (0, I).
By means of Theorem 4.2 in Rao (1962) it follows that for k sufficiently large P_k
satisfies (H_{1−b₁/a₁}), so that according to Theorem 2.1 at least one solution θ(P_k) =
θ_k = (t_k, C_k) exists. By using that θ_k satisfies the constraint (2.2), one can show that
P_k(E(t_k, C_k, c₁)) ≥ 1 − b₁/a₁ > ε and conclude with Theorem 4.2 in Rao (1962) that
for k sufficiently large, P(E(t_k, C_k, c₁)) ≥ ε. According to Lemma 2.2(i) this means that
there exists a constant k₁ > 0 such that λ_p(C_k) ≥ k₁ eventually. By using that ρ₁ is
strictly increasing on [0, c₁] and that P_k → P weakly, it follows that for each η > 0 and k
sufficiently large, P_kρ₁(‖·‖/(1 + η)) ≤ b₁. This means that the pair (0, (1 + η)²I) satisfies
(2.4) for k sufficiently large. Using that this holds for η arbitrarily close to 0, it follows
from Remark 2.1 that
   lim sup_{k→∞} |C_k| {P_kρ₂(d(·, θ_k))}^p ≤ {Pρ₂(‖·‖)}^p.    (3.2)
Since λ_p(C_k) ≥ k₁, we find by Lemma 2.1 that λ₁(C_k) is uniformly bounded above, so
that by Lemma 2.2(iii) it follows that there exists a compact subset K of Θ such that for
k sufficiently large, θ_k will be in K. Therefore it suffices to show that every convergent
subsequence {θ_{k_j}} has limit (0, I).
Let θ_{k_j}, j = 1, 2, ..., be a subsequence for which lim_{j→∞} θ_{k_j} = θ_L. According to
Lemma 3.2 in Lopuhaä (1989) it holds that b₁ = lim_{j→∞} P_{k_j}ρ₁(d(·, θ_{k_j})) = Pρ₁(d(·, θ_L)).
This means that θ_L satisfies the constraint (2.2) of (T_P). Since this problem has solution
(0, I), we must have |C_L| {Pρ₂(d(·, θ_L))}^p ≥ {Pρ₂(‖·‖)}^p. Then from (3.2) it follows that
   |C_L| {Pρ₂(d(·, θ_L))}^p = {Pρ₂(‖·‖)}^p,
so that θ_L is also a solution of (T_P). However, (0, I) is the unique solution of (T_P), so
that we conclude that θ_L = (0, I). Q.E.D.
Continuity of the location τ-functional t(·) is contained in Theorem 3.1. For the
covariance τ-functional V(·), we have:
COROLLARY 3.1. Under the conditions of Theorem 3.1, lim_{k→∞} V(P_k) = V(P).
Proof. By definition we have V(P_k) = b₂⁻¹ C(P_k) P_kρ₂(d(·, θ(P_k))). According to Lemma
3.2 in Lopuhaä (1989) it holds that P_kρ₂(d(·, θ(P_k))) → Pρ₂(d(·, θ(P))), so that the
corollary immediately follows from Theorem 3.1. Q.E.D.
Consistency of the τ-estimators (t_n, V_n) is a consequence of the continuity of the
functionals t(·) and V(·). Let X₁, X₂, ... be a sequence of independent random vectors
in ℝ^p with a distribution P. From now on denote by P_n the empirical distribution
corresponding with X₁, ..., X_n.
COROLLARY 3.2. If the distribution P satisfies the conditions of Theorem 3.1, then
lim_{n→∞} (t_n, V_n) = (t(P), V(P)) with probability one.
The condition that every convex set is a P-continuity set is in fact not needed in
Corollary 3.2. It was needed in the proof of Theorem 3.1 merely to guarantee that the
class E of all ellipsoids in ℝ^p satisfies sup_{E∈E} |P_k(E) − P(E)| → 0. Since E has polynomial
discrimination (Pollard 1984, p. 17), for the empirical distribution P_n this property is a
consequence of Theorem II.14 in Pollard (1984).
When P_{μ,Σ} is an elliptical distribution that satisfies (F), and b₁ is chosen as in Theorem
2.2, then the conditions of Theorem 3.1 are satisfied, and hence (t_n, C_n) → (μ, Σ) with
probability one. Therefore, if we want V_n to be consistent for Σ, we must choose b₂ =
∫ ρ₂(‖x‖) f(‖x‖) dx. In general, if C(P) is considered to be the true scatter parameter to
be estimated, one should choose
   b₂ = ∫ ρ₂(d(x, θ(P))) dP(x)    (3.3)
for V_n to be consistent for C(P).
4. LIMITING DISTRIBUTION
We first investigate the asymptotic behaviour of (t_n, C_n), from which the limiting
distribution of the τ-estimators (t_n, V_n) will follow. We assume that P satisfies property
(H_ε) for some 0 < ε < 1 − b₁/a₁ and that the minimization problem (T_P) has a unique
solution θ₀ = (t₀, C₀), and we choose b₂ by (3.3).
It will be more convenient to consider (t_n, C_n) as solutions to the problem of finding
a vector t and a positive definite symmetric matrix C that minimize
   |C| ( (1/n) Σ_{i=1}^n ρ₂(d_i) )^p
subject to (2.3), where d_i = {(x_i − t)ᵀC⁻¹(x_i − t)}^{1/2}. This problem is equivalent to
problem (T_{P_n}) of Section 2.1.
4.1. Relation to M-Estimators.
The relation with multivariate M-estimators can be obtained along the lines of Lopuhaä
(1989). The Lagrangian corresponding to the minimization problem (T_{P_n}) is
   L_n(t, C, λ) = log|C| + p log( (1/n) Σ_{i=1}^n ρ₂(d_i) ) + λ( (1/n) Σ_{i=1}^n ρ₁(d_i) − b₁ ).
Every solution θ_n = (t_n, C_n) of (T_{P_n}) must be a zero of all partial derivatives of L_n.
Therefore, besides satisfying the constraint (2.3), θ_n must also be a solution of the
simultaneous equations
   Σ_{i=1}^n { ( (1/n) Σ_{j=1}^n ρ₂(d_j) )⁻¹ p ψ₂(d_i)/d_i + λ ψ₁(d_i)/d_i } (X_i − t) = 0,
   Σ_{i=1}^n { ( (1/n) Σ_{j=1}^n ρ₂(d_j) )⁻¹ p ψ₂(d_i)/d_i + λ ψ₁(d_i)/d_i } (X_i − t)(X_i − t)ᵀ = 2nC.
After solving for λ in the second (matrix) equation, we get two simultaneous equations
in t and C. To keep things tidy we introduce the functions
   a(x, θ) = 2ρ₂(d(x, θ)) − ψ₂(d(x, θ))d(x, θ),   b(x, θ) = ψ₁(d(x, θ))d(x, θ),
with d(x, θ) as in (3.1), and define A_n(θ) = P_n a(·, θ) and B_n(θ) = P_n b(·, θ). The
simultaneous equations that arise are perhaps described most conveniently with the
function
   ψ_n(·, θ) = A_n(θ)ψ₁(·) + B_n(θ)ψ₂(·),
which is an adaptively weighted average of the functions ψ₁ and ψ₂. We obtain the
equations
   (1/n) Σ_{i=1}^n ψ_n(d_i, θ)(X_i − t)/d_i = 0,
   (1/n) Σ_{i=1}^n { p ψ_n(d_i, θ)(X_i − t)(X_i − t)ᵀ/d_i − ψ_n(d_i, θ)d_i C } = 0.
This is of course a system of linearly dependent equations. However, by adding a suitable
multiple of the constraint (2.3) to the second equation we can avoid the linear dependence.
It follows that every solution θ_n of (T_{P_n}) must be a solution of the simultaneous equations
   (1/n) Σ_{i=1}^n ψ_n(d_i, θ)(X_i − t)/d_i = 0,
   (1/n) Σ_{i=1}^n { p ψ_n(d_i, θ)(X_i − t)(X_i − t)ᵀ/d_i − ψ_n(d_i, θ)d_i C + 2b₂(ρ₁(d_i) − b₁)C } = 0.    (4.1)
These equations look like the M-estimator-type score equations as defined in Huber
(1981), except that the function ψ_n(·, θ) is a weighted average of two ψ-functions, in
which the weights depend on the sample X₁, ..., X_n.
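As a computational sketch (ours, not from the paper) of the adaptive weighting just described, the following evaluates A_n(θ), B_n(θ), and the location part of (4.1) for given distances d_i; it uses the weight functions a and b as displayed above, and rho2, psi1, psi2 stand for ρ₂, ψ₁, ψ₂.

    import numpy as np

    def adaptive_weights(d, rho2, psi1, psi2):
        A_n = np.mean(2 * rho2(d) - psi2(d) * d)    # A_n(theta) = P_n a(., theta)
        B_n = np.mean(psi1(d) * d)                  # B_n(theta) = P_n b(., theta)
        return A_n, B_n

    def location_score(X, t, d, rho2, psi1, psi2):
        """(1/n) sum psi_n(d_i, theta) (x_i - t)/d_i, the location equation in (4.1)."""
        A_n, B_n = adaptive_weights(d, rho2, psi1, psi2)
        psi_n = A_n * psi1(d) + B_n * psi2(d)
        w = np.where(d > 0, psi_n / np.where(d > 0, d, 1.0), 0.0)   # guard d_i = 0
        return np.mean(w[:, None] * (X - t), axis=0)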
4.2. Asymptotic Normality.
Although θ_n is a solution of the equations (4.1) defined with the function ψ_n(·, θ),
only the limiting expression of ψ_n(·, θ_n) is of importance for the asymptotic behaviour
of θ_n. It turns out that (Lemma 4.3 in Lopuhaä 1990)
   lim_{n→∞} A_n(θ_n) = A₀ and lim_{n→∞} B_n(θ_n) = B₀    (4.2)
with probability one, where A₀ = Pa(·, θ₀) and B₀ = Pb(·, θ₀). This means that the
function ψ_n(·, θ_n) converges pointwise with probability one to the function
   ψ₀(·) = A₀ψ₁(·) + B₀ψ₂(·).    (4.3)
Note that because P satisfies (H_ε), we have that A₀ > 0, and since θ₀ satisfies (2.2),
the ellipsoid E(t₀, C₀, c₁) must have positive probability, which means that B₀ > 0. The
limiting distribution of (t_n, C_n) will be shown to be the same as that of the multivariate
M-estimators that solve (4.1) with ψ₀(·) instead of ψ_n(·, θ).
We can write the equations (4.1) briefly as
   (1/n) Σ_{i=1}^n Ψ_n(X_i, θ) = 0.    (4.4)
The function Ψ_n is a weighted average A_n(θ)W₁(x, θ) + B_n(θ)W₂(x, θ) + 2b₂R(x, θ) of
the functions W_k = (W_{k,loc}, W_{k,cov}), where
   W_{k,loc}(x, θ) = ψ_k(d(x, θ))(x − t)/d(x, θ),
   W_{k,cov}(x, θ) = p ψ_k(d(x, θ))(x − t)(x − t)ᵀ/d(x, θ) − ψ_k(d(x, θ))d(x, θ)C,
for k = 1, 2, and of the function R = (0, R_cov), where R_cov(x, θ) = {ρ₁(d(x, θ)) − b₁}C.
Let W₀ = (W_{0,loc}, W_{0,cov}) be the function Ψ_n, except that ψ₀(·) [see (4.3)] replaces
ψ_n(·, θ):
   W₀(x, θ) = A₀W₁(x, θ) + B₀W₂(x, θ) + 2b₂R(x, θ).    (4.5)
We shall use a tightness property from Pollard (1984) for empirical processes (P_n − P)φ
indexed by functions φ in a class F. It is a combination of the approximation lemma
(p. 27), Lemma II.36 (p. 36), and the equicontinuity lemma (p. 150):
LEMMA 4.1. Let F be a permissible class of real-valued functions with envelope F, and
suppose that 0 < PF² < ∞. If the class of graphs of functions in F has polynomial
discrimination, then for each η > 0 and ε > 0 there exists a δ > 0 for which
   lim sup_{n→∞} P{ sup_{[δ]} √n |(P_n − P)(φ₁ − φ₂)| > η } < ε,
where [δ] = {(φ₁, φ₂) : φ₁, φ₂ ∈ F and P(φ₁ − φ₂)² ≤ δ²}.
By the envelope F of F is meant a function F for which |φ| ≤ F for every φ ∈ F.
The classes of functions that we shall encounter will be indexed by θ, which means that
these classes will always be permissible in the sense of Pollard (1984, Appendix C). For
the concept of polynomial discrimination we refer to Pollard (1984, p. 17). Application of
Lemma 4.1 involves some technicalities. For most of these we refer to Lopuhaä (1990),
and we restrict ourselves to illustrating how Lemma 4.1 can be used to obtain the limiting
distribution of √n(τ_n − θ₀).
As a consequence of Lemma 4.1 and the consistency of θ_n (Theorem 3.1) we have the
following properties (Lemma 4.5 in Lopuhaä 1990):
   √n (P_n − P){ W_k(·, θ_n) − W_k(·, θ₀) } = o_P(1)    (4.6)
for k = 1, 2. The proof of (4.6) boils down to showing that each real-valued component of
the functions W₁, W₂, and R is a linear combination of functions of the type g(d(x, θ)),
h(d(x, θ))x_j, and k(d(x, θ))x_i x_j, with g, h, and k continuous and of bounded variation. For
this type of function it can be shown (Lemmas 4.2 and 4.4 in Lopuhaä 1990) that the
classes of graphs that correspond to the classes of functions G = {g(d(x, θ)) : θ ∈ Θ},
H_j = {h(d(x, θ))x_j : θ ∈ Θ}, and K_ij = {k(d(x, θ))x_i x_j : θ ∈ Θ}, respectively, have
polynomial discrimination. If in addition E‖X₁‖⁴ < ∞, then the classes G, H_j, and K_ij
have bounded envelopes, and hence we can apply Lemma 4.1 to the different functions
that form each such real-valued component. After we put all components together, (4.6)
follows.
The property (4.6) is basically all we need to determine the limiting behaviour of the
solutions θ_n of the problem (T_{P_n}). This turns out to be the same as that of the multivariate
M-estimator defined by Σ_{i=1}^n W₀(X_i, θ) = 0, where W₀ is given in (4.5).
THEOREM 4.1. Suppose that E‖X₁‖⁴ < ∞, that PW₀(·, θ) has a nonsingular derivative Λ₀
at θ₀, and that PW_k(·, θ₀) = 0 for k = 1, 2. Then for solutions θ_n = (t_n, C_n) of (T_{P_n}),
   √n (θ_n − θ₀) = −Λ₀⁻¹ √n P_nW₀(·, θ₀) + o_P(1).
Proof. We give a brief sketch of the proof to illustrate the use of (4.6). For details we
refer to Lopuhaä (1990). Since θ_n is a solution of (4.4), it follows from (4.2) that
   P_nW₀(·, θ_n) + o_P(1) P_nW₁(·, θ_n) + o_P(1) P_nW₂(·, θ_n) = 0.    (4.7)
By (4.6), P_nW₀(·, θ_n) = PW₀(·, θ_n) + (P_n − P)W₀(·, θ₀) + o_P(1/√n). Because θ₀ satisfies
(2.2) and PW_k(·, θ₀) = 0 for k = 1, 2, we have that PW₀(·, θ₀) = 0. Therefore, by
Taylor's formula it follows that PW₀(·, θ_n) = Λ₀(θ_n − θ₀) + o_P(‖θ_n − θ₀‖). If we treat the
other two terms in (4.7) accordingly, they reduce to o_P(‖θ_n − θ₀‖) + o_P(1/√n), because
(P_n − P)W_k(·, θ₀) = O_P(1/√n) according to the central limit theorem. We find that
   Λ₀(θ_n − θ₀) + (P_n − P)W₀(·, θ₀) = o_P(‖θ_n − θ₀‖) + o_P(1/√n).    (4.8)
Since Λ₀ is nonsingular, it follows that θ_n − θ₀ = O_P(1/√n), and after we put this into
(4.8) the theorem follows. Q.E.D.
4.3. Asymptotic Normality of the τ-Estimators.
The limiting behaviour of τ_n = (t_n, V_n) may now be obtained from Theorem 4.1 by
expressing V_n in terms of θ_n − θ₀. Let R₂(x, θ) = ρ₂(d(x, θ)) − b₂, and denote by D₂ the
derivative of PR₂(·, θ) at θ₀. Similarly to (4.6) one may first show √n(P_n − P){R₂(·, θ_n) −
R₂(·, θ₀)} = o_P(1) and then obtain (Lemma 4.6 in Lopuhaä 1990)
   √n (V_n − C₀) = √n (C_n − C₀) + b₂⁻¹{ D₂ √n (θ_n − θ₀) + √n P_nR₂(·, θ₀) } C₀ + o_P(1).    (4.9)
THEOREM 4.2. Let R₁₂(x, θ) = α₁{ρ₂(d(x, θ)) − b₂} − α₂{ρ₁(d(x, θ)) − b₁}, where α_k =
Eψ_k(d(X₁, θ₀)). Under the conditions of Theorem 4.1, √n(τ_n − θ₀) has a limiting normal distribution with zero mean and covariance matrix Λ₀⁻¹MΛ₀⁻¹ᵀ, where M is the
covariance matrix of T(X₁, θ₀), with T(x, θ) = W₀(x, θ) − (b₂α₁)⁻¹R₁₂(x, θ)Λ₀(0, C₀).
Proof. With the expressions for the derivatives D₂ and Λ₀ (Lemma 4.7 in Lopuhaä 1990)
and the covariance part of Theorem 4.1, one can obtain the expansion
   b₂⁻¹{ D₂ √n (θ_n − θ₀) + √n P_nR₂(·, θ₀) } = (b₂α₁)⁻¹ √n P_nR₁₂(·, θ₀) + o_P(1).
By (4.9) it follows that
   √n (V_n − C₀) = √n (C_n − C₀) + (b₂α₁)⁻¹ √n P_nR₁₂(·, θ₀) C₀ + o_P(1),
and hence
   √n (τ_n − θ₀) = √n (θ_n − θ₀) + (b₂α₁)⁻¹ √n P_nR₁₂(·, θ₀) (0, C₀) + o_P(1).
After we apply the linear map Λ₀ to both sides, it follows from Theorem 4.1 that
   Λ₀ √n (τ_n − θ₀) = −√n P_nT(·, θ₀) + o_P(1).    (4.10)
Since T(x, θ₀) is bounded, the theorem follows from the central limit theorem. Q.E.D.
5. ROBUSTNESS
5.1. Breakdown Point.
The finite-sample breakdown point (Donoho and Huber 1983) of a location estimator
t_n at a collection X = (x₁, ..., x_n) is defined as the smallest fraction m/n of outliers that
can take the estimator over all bounds,
   ε*(t_n, X) = min_{1≤m≤n} { m/n : sup_{Y_m} ‖t_n(X) − t_n(Y_m)‖ = ∞ },    (5.1)
where the supremum is taken over all possible corrupted collections Y_m that can be
obtained from X by replacing m points of X with arbitrary values. The breakdown point
of a covariance estimator C_n at a collection X is defined as the smallest fraction m/n
of outliers that can either take the largest eigenvalue λ₁(C_n) over all bounds, or take the
smallest eigenvalue λ_p(C_n) arbitrarily close to zero:
   ε*(C_n, X) = min_{1≤m≤n} { m/n : sup_{Y_m} D(C_n(X), C_n(Y_m)) = ∞ },
where the supremum is taken over the same corrupted collections Y_m as in (5.1), and
where D(A, B) = max{ |λ₁(A) − λ₁(B)|, |λ_p(A)⁻¹ − λ_p(B)⁻¹| }.
THEOREM 5.1. Let X be a collection of n ≥ p + 1 points in ℝ^p in general position. If
b₁/a₁ ≤ (n − p)/(2n), then the τ-estimators (t_n, V_n) have breakdown point ε*(t_n, X) =
ε*(V_n, X) = ⌈nb₁/a₁⌉/n, where ⌈y⌉ denotes the smallest integer ≥ y.
Proof. As we can always rescale the function ρ₁, we may assume that a₁ = 1. According
to Lemma 2.1 there exists a constant q > 0 which only depends on ρ₁, ρ₂, and b₁, such
that
   (1/n) Σ_{i=1}^n ρ₂({(x_i − t_n)ᵀC_n⁻¹(x_i − t_n)}^{1/2}) ≥ q.
Therefore V_n breaks down whenever C_n does, and it suffices to consider breakdown of
t_n and C_n. The rest of the proof runs along the lines of the proof of Theorem 3.2 in
Lopuhaä and Rousseeuw (1991). See Lopuhaä (1990) for details. Q.E.D.
The optimal value (1/n)⌊(n − p + 1)/2⌋ for the breakdown point is obtained by choosing
b₁/a₁ = (n − p)/(2n). Note that the breakdown point of the τ-estimators depends only
on the constant b₁/a₁, or only on the constant c₁ if b₁ is chosen as in Theorem 2.2. This
means that ρ₂ can be varied without changing the value of the breakdown point.
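A small worked example (ours) of Theorem 5.1 and of the optimal choice b₁/a₁ = (n − p)/(2n):

    import math

    n, p = 100, 2
    ratio = (n - p) / (2 * n)                   # b1/a1 at the optimal choice: 0.49
    eps_star = math.ceil(n * ratio) / n         # breakdown point ceil(n b1/a1)/n = 0.49
    optimal = math.floor((n - p + 1) / 2) / n   # (1/n) floor((n - p + 1)/2) = 0.49
    assert eps_star == optimal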
5.2. Influence Function.
Whereas the breakdown point measures the global sensitivity of an estimator under
large perturbations, the local sensitivity may be measured by the influence function
(Hampel 1974), which describes the influence of one single outlier. For the τ-functional
(t(·), V(·)) it is defined as
   IF(x; τ, P) = lim_{h↓0} { τ((1 − h)P + hδ_x) − τ(P) } / h,
where δ_x denotes the Dirac measure concentrated in x ∈ ℝ^p.
We assume that P satisfies property (H_ε) for some 0 < ε < 1 − b₁/a₁ and that the
minimization problem (T_P) has a unique solution θ₀ = (t₀, C₀), and we take b₂ as in
(3.3). For x ∈ ℝ^p and 0 ≤ h ≤ 1 write P_{h,x} = (1 − h)P + hδ_x. For h ↓ 0 the distribution
P_{h,x} converges weakly to P. According to Theorem 3.1 this means that at least one
solution to the problem (T_{P_{h,x}}) exists for h sufficiently small and that θ(P_{h,x}) → θ₀.
THEOREM 5.2. Under the conditions of Theorem 4.1 the τ-functional has influence function
IF(x; τ, P) = −Λ₀⁻¹T(x, θ₀), where the function T(x, θ) is defined in Theorem 4.2.
If one reads P_{h,x} instead of P_n, the proof runs along the lines of the proof of Theorem
4.2. We refer to Lopuhaä (1990) for details. It follows immediately from the expression
for T(x, θ₀) given in Theorem 4.2 that IF(x; τ, P) is bounded. A more explicit expression
for IF(x; τ, P) can be obtained at elliptical distributions. This will be done in the next
section.
6. ELLIPTICAL DISTRIBUTIONS
As a special case we consider elliptical distributions. In this case one may show that
T(x, θ₀) = S₀(x, θ₀), where S₀ = (S_{0,loc}, S_{0,cov}) is the score function of the multivariate
S-estimator defined by means of the function ρ₀, where ρ₀ is a weighted average of the
functions ρ₁ and ρ₂,
   ρ₀(y) = A₀ρ₁(y) + B₀ρ₂(y),    (6.2)
b₀ = A₀b₁ + B₀b₂ = Eρ₀(d(X₁, θ₀)), and ψ₀ = ρ₀′ is defined in (4.3). Furthermore, the
derivative of PS₀(·, θ) at θ₀ is equal to Λ₀. The function S₀ is exactly the Θ-valued function
in the equation Σ_{i=1}^n S₀(X_i, θ) = 0, of which the multivariate S-estimator defined by
the function ρ₀ is a solution [see (2.7) on p. 1666 in Lopuhaä (1989)]. Since τ_n behaves
like the solutions of Σ_{i=1}^n T(X_i, θ) = 0 [see (4.10)], its asymptotic properties must be
the same as those of this multivariate S-estimator (see Lopuhaä 1990 for a more rigorous
argument). The exact expressions for the limiting distribution and the influence function
can thus be read from Corollaries 5.1 and 5.2 in Lopuhaä (1989).
Let vec(A) be the p²-vector that stacks the columns of a p × p matrix A on top of each
other, and let K_{p,p} be the p² × p² permutation matrix uniquely defined by the property
K_{p,p} vec(A) = vec(Aᵀ) for all A. Denote by A ⊗ B the Kronecker product, which is a
p² × p² block matrix with the (i,j)th block equal to a_ij B.
COROLLARY 6.1. Let P be an elliptical distribution with parameter θ₀ = (μ, Σ), let d₀ =
{(X₁ − μ)ᵀΣ⁻¹(X₁ − μ)}^{1/2}, and let ρ₀ and ψ₀ = ρ₀′ be defined by (6.2) and (4.3). Suppose
that the conditions of Theorem 2.2 hold, that P has a finite fourth moment, and that the
constants β₀, γ₀, and ω₀ below are nonzero. Then √n(τ_n − θ₀) has a limiting normal
distribution with zero mean, and t_n and V_n are asymptotically independent. The covariance
of the limiting distribution of √n(t_n − μ) is given by (α₀/β₀²)Σ, where α₀ = p⁻¹Eψ₀²(d₀)
and β₀ = p⁻¹E{(p − 1)ψ₀(d₀)/d₀ + ψ₀′(d₀)}.
The covariance matrix of the limiting distribution of the matrix √n(V_n − Σ) is given by
   σ₀₁(I + K_{p,p})(Σ ⊗ Σ) + σ₀₂ vec(Σ)vec(Σ)ᵀ,
where σ₀₁ = p(p + 2)⁻¹γ₀⁻²Eψ₀²(d₀)d₀² and σ₀₂ = −(2/p)σ₀₁ + 4ω₀⁻²E{ρ₀(d₀) − b₀}²,
with γ₀ defined in (6.3) and ω₀ = Eψ₀(d₀)d₀.
Note that because Eg(d₀) = ∫ g(‖x‖) f(‖x‖) dx, the scalars in Corollary 6.1 do not
depend on (μ, Σ). When ρ₂(y) tends to some multiple of y² as c₂ → ∞, for instance
when ρ₂(y) is the biweight function ρ_B(y; c₂), then α₀/β₀², σ₀₁, and σ₀₂ will tend to the
corresponding values for the sample mean and the sample covariance. This means that
for large values of c₂ one has good asymptotic efficiency relative to the sample mean and
sample covariance. This is true for any fixed value of c₁. Hence, we can choose c₁ such
that t_n and V_n have a high breakdown point (Theorem 5.1) and then vary c₂ to obtain
good efficiency (for instance) at the normal distribution.
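To make the efficiency trade-off concrete, the following sketch (ours) evaluates the Corollary 6.1 location constants α₀ = p⁻¹Eψ₀²(d₀) and β₀ = p⁻¹E{(p − 1)ψ₀(d₀)/d₀ + ψ₀′(d₀)} at the p-variate standard normal, where d₀ ~ χ_p; here psi0 and dpsi0 stand for a candidate ψ₀ and its derivative, and the weights A₀ and B₀ entering ψ₀ are not computed in this snippet.

    import numpy as np
    from scipy import integrate, stats

    def location_variance_factor(psi0, dpsi0, p):
        """alpha0/beta0^2, the factor multiplying Sigma; equals 1 for the sample mean."""
        chi_pdf = stats.chi(df=p).pdf
        alpha0, _ = integrate.quad(lambda r: psi0(r)**2 * chi_pdf(r), 0.0, np.inf)
        beta0, _ = integrate.quad(
            lambda r: ((p - 1) * psi0(r) / r + dpsi0(r)) * chi_pdf(r), 0.0, np.inf)
        return (alpha0 / p) / (beta0 / p)**2

    # Check: psi0(r) = r (the least-squares case) recovers the sample-mean value 1.
    print(location_variance_factor(lambda r: r, lambda r: 1.0, p=3))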
For the influence function we only give the expression at spherically symmetric distributions. The expressions at general elliptical distributions can be found by using affine
equivariance.
COROLLARY 6.2. Let P be spherically symmetric. Under the conditions of Corollary 6.1
it holds that the location τ-functional has influence function
   IF(x; t, P) = β₀⁻¹ ψ₀(‖x‖) x/‖x‖,
where β₀ is defined in Corollary 6.1. The covariance τ-functional has influence function
   IF(x; V, P) = p γ₀⁻¹ ψ₀(‖x‖)‖x‖ { xxᵀ/‖x‖² − p⁻¹I } + 2ω₀⁻¹{ ρ₀(‖x‖) − b₀ } I,
where γ₀ and ω₀ are defined in Corollary 6.1.
ACKNOWLEDGEMENT
I thank Werner Stahel and Rudolf Grübel for helpful suggestions and remarks.
REFERENCES
Davies, P.L. (1987). Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion
matrices. Ann. Statist., 15, 1269-1292.
Davies, P.L. (1989). Improving S-estimators by means of k-step M-estimators. Technical Report, GHS-Essen.
Donoho, D.L., and Huber, P.J. (1983). The notion of breakdown point. A Festschrift for Erich L. Lehmann (P.J.
Bickel, K.A. Doksum, and J.L. Hodges, Jr., eds.), Wadsworth, Belmont, Calif., 157-184.
Hampel, F.R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc., 69, 383-393.
Huber, P.J. (1967). The behaviour of maximum likelihood estimates under nonstandard conditions. Proceedings
of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (L. Le Cam and J. Neyman,
eds.), Univ. of California Press, Berkeley, 221-233.
Huber, P.J. (1981). Robust Statistics. Wiley, New York.
Kim, J., and Pollard, D. (1990). Cube root asymptotics. Ann. Statist., 18, 191-219.
Lopuhaä, H.P. (1989). On the relation between S-estimators and M-estimators of multivariate location and
covariance. Ann. Statist., 17, 1662-1683.
Lopuhaä, H.P. (1990). Multivariate τ-estimators for location and scatter. Technical Report 90-04, Delft
University of Technology.
Lopuhaä, H.P., and Rousseeuw, P.J. (1991). Breakdown properties of affine equivariant estimators of multivariate location and covariance matrices. Ann. Statist., 19, 229-248.
Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.
Rao, R.R. (1962). Relations between weak and uniform convergence of measures with applications. Ann. Math.
Statist., 33, 659-680.
Rousseeuw, P.J. (1983). Multivariate estimation with high breakdown point. Presented at the Fourth Pannonian
Symposium on Mathematical Statistics and Probability, Bad Tatzmannsdorf, Austria, 4-9 September 1983.
Mathematical Statistics and Applications (1985) (W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, eds.),
Reidel, Dordrecht, 283-297.
Yohai, V.J., and Zamar, R. (1988). High breakdown-point estimates of regression by means of the
minimization of an efficient scale. J. Amer. Statist. Assoc., 83, 406-413.
Received 22 January 1990
Revised 19 July 1990
Accepted 18 October 1990
Department of Mathematics
Delft University of Technology
Julianalaan 132
2628 BL Delft
The Netherlands