ESTIMATION IN LINEAR MODELS USING DIRECTIONALLY MINIMAX MEAN SQUARED ERROR

by

THOMAS M. GERIG and GUILLERMO P. ZARATE-DE-LARA

Institute of Statistics Mimeograph Series No. 1084
Raleigh, N.C. - August 1976

1. INTRODUCTION

The most commonly used estimators of the regression coefficients in general linear models are the best linear unbiased estimators (or ordinary least squares estimators). By dropping the unbiasedness criterion, linear estimators can be obtained that have smaller variances. In this situation, it is common to adopt some type of mean squared error criterion which balances increased bias against reduced variance. Unfortunately, estimators derived from this type of criterion invariably lead to expressions that involve the parameters themselves and, as such, are useless in practice.

Several authors have proposed classes of biased regression estimators which can be used as alternatives to the ordinary least squares (OLS) estimator. These estimators are studied and compared to OLS estimators with respect to mean squared error. Typically, they are not uniformly better in this respect, but in some situations they can be dramatically superior. Some of the common methods of biased regression estimation include Hoerl and Kennard's [3] Ridge regression, Marquardt's [4] Generalized Inverse regression, Mayer and Willke's [5] Shrunken OLS regression, and Toro and Wallace's [7] False Restrictions regression. Many others have proposed modifications and extensions to these.

The purpose of this paper is to develop a meaningful criterion which can be used to derive possibly biased estimators which will be superior to OLS with respect to mean squared error. In Section 3, a criterion is proposed which is based on a minimax argument. In the formula for the trace of the mean squared error matrix, the value of the regression parameters which is, in a sense, least favorable to estimation (that is, which results in the greatest bias) is substituted for the unknown parameters. By this, it is hoped that the derived estimators will protect the user from the worst case that nature can produce.

In Section 4, this criterion is applied to the class of estimators obtained by computing OLS estimators subject to possibly false restrictions. The criterion is minimized by choice of a prespecified number of independent restrictions. A statistical test is given which can be used to decide whether the resulting estimator is superior in mean squared error to OLS. The resulting estimators turn out to be equivalent to Marquardt's [4] Generalized Inverse estimators. Section 5 is devoted to the study of the class of Shrinkage estimators defined by Goldstein and Smith [2]. Within this class the optimum estimator turns out to be a member of the subclass of Shrunken OLS estimators defined by Mayer and Willke [5].

2. NOTATION

Consider the linear model

(2.1)    y = Xβ + e ,

where y is an (n x 1) vector of observations, X is an (n x m) matrix of known constants of rank m (m ≤ n), β is an (m x 1) vector of unknown regression parameters, and e is an (n x 1) vector of unobservable random variables with E(e) = 0 and E(ee') = Σσ². Define

(2.2)    S = X'X

and the ordinary least squares (OLS) estimator

(2.3)    β̂ = S^{-1}X'y ,

and write the singular value decomposition of X as

(2.4)    X = UΛ^{1/2}V' ,

where U is (n x m), Λ^{1/2} = diag(λ_1^{1/2}, ..., λ_m^{1/2}) is (m x m) with λ_1 ≥ λ_2 ≥ ... ≥ λ_m > 0, V is (m x m), U'U = I_m, and V'V = VV' = I_m, I_m being the (m x m) identity matrix. From this we write S as

(2.5)    S = VΛV' .
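The notation above can be checked numerically. The following sketch is purely illustrative and not part of the paper: it builds a small, hypothetical design matrix, obtains U, Λ^{1/2}, and V from NumPy's singular value decomposition, and verifies (2.4) and (2.5); all names in it are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 4
X = rng.normal(size=(n, m))                 # hypothetical full-rank design matrix

# Thin SVD: X = U diag(lam)^{1/2} V'  as in (2.4)
U, sing_vals, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
lam = sing_vals ** 2                        # characteristic roots of S = X'X, descending order

S = X.T @ X
assert np.allclose(X, U @ np.diag(np.sqrt(lam)) @ V.T)   # (2.4)
assert np.allclose(S, V @ np.diag(lam) @ V.T)            # (2.5)
assert np.allclose(U.T @ U, np.eye(m))                   # U'U = I_m
assert np.allclose(V.T @ V, np.eye(m))                   # V'V = VV' = I_m
```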
Define the partitioned matrices

(2.6)    U = (U_1 : U_2) ,

(2.7)    V = (V_1 : V_2) ,

where U_1 is (n x m-u), U_2 is (n x u), V_1 is (m x m-u), and V_2 is (m x u), for some u. Similarly, define

(2.8)    Λ^{1/2} = diag(Λ_1^{1/2}, Λ_2^{1/2}) ,

where Λ_1^{1/2} = diag(λ_1^{1/2}, ..., λ_{m-u}^{1/2}) is [(m-u) x (m-u)] and Λ_2^{1/2} = diag(λ_{m-u+1}^{1/2}, ..., λ_m^{1/2}) is (u x u), for the same u.

We shall be concerned with the simultaneous estimation of the regression parameters, β, using linear functions of the observations, Ay + b, where A is some (m x n) matrix and b an (m x 1) vector. For this we shall need the mean squared error matrix

M = M(Ay + b, β) = E[(Ay + b - β)(Ay + b - β)'] = σ²AΣA' + (b - Fβ)(b - Fβ)' ,

where F = I - AX. Also we shall need the trace of M,

T(Ay + b, β) = tr[M(Ay + b, β)] .

Finally, define

(i) ||x||² = x'x , for any vector x;
(ii) Ch_M(A) is the largest characteristic root of the matrix A;
(iii) C(A) is the vector space generated by the columns of A.

3. DIRECTIONALLY MINIMAX TRACE MEAN SQUARED ERROR ESTIMATORS

The goal of this work is to obtain an estimator of β which does not depend on the parameters themselves and which, in some sense, minimizes the trace mean squared error, T(Ay + b, β). Adopting a minimax philosophy leads to attempting to replace β by the value which maximizes T(Ay + b, β). This, of course, is not helpful, as T is unboundedly increasing with ||β||. The modification adopted in this work is to express the parameters in the form β = kδ, where δ is the vector of direction cosines of β and k is its length. Then, for a fixed value of k, the expression T(Ay + b, kδ) can be maximized by choice of δ. By Theorem 3.1 below, the ultimate choice of δ is independent of the value of k. This fact indicates that, emanating from the origin, there is one direction corresponding to the worst choice of β with respect to mean squared error (or, equivalently, squared bias). It is from along this ray that the value of β is chosen to maximize T. The exact location on the ray is set by choice of k.

Definition 3.1: A linear estimator β̃ = Ay + b is said to be a directionally minimax trace mean squared error (DMTMSE) estimator of β if A and b are such that they minimize

ST(A, b, k) = Sup_{β ∈ B_k} T(Ay + b, β) = σ² tr(AΣA') + Sup_{β ∈ B_k} (b - Fβ)'(b - Fβ) ,

where B_k = {β | β = kδ, ||δ|| = 1}.

Theorem 3.1: In the linear model (y, Xβ, Σσ²), ST(A, b, k) ≥ ST(A, 0, k) for any A and k; in particular, the DMTMSE estimator is of the form Ay.

Proof: Fix β ∈ B_k. Either ||b - Fβ||² ≥ ||Fβ||² or ||b - Fβ||² < ||Fβ||². Suppose the latter holds. Then ||b - Fβ||² = ||Fβ||² + ||b||² - 2b'Fβ < ||Fβ||² implies 2b'Fβ > ||b||², and hence

||b + Fβ||² = ||Fβ||² + ||b||² + 2b'Fβ ≥ ||Fβ||² + 2||b||² ≥ ||Fβ||² .

Since -β ∈ B_k whenever β ∈ B_k,

Sup_{β ∈ B_k} ||b - Fβ||² = Sup_{β ∈ B_k} max[||b - Fβ||², ||b + Fβ||²] ≥ Sup_{β ∈ B_k} ||Fβ||² ,

with equality when b = 0. Therefore, ST(A, b, k) ≥ ST(A, 0, k) for all A and k.

In view of Theorem 3.1, the criterion can be reduced to

(3.1)    ST(Ay, k) = σ² tr(AΣA') + Sup_{β ∈ B_k} β'F'Fβ ,

where F = I - AX.
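To make criterion (3.1) concrete, here is a minimal sketch (illustrative only, with Σ = I and a hypothetical design matrix) that evaluates ST(Ay, k) for a supplied estimator matrix A, using the fact that Sup_{β ∈ B_k} β'F'Fβ = k² Ch_M(F'F).

```python
import numpy as np

def dmtmse_criterion(A, X, sigma2, k):
    """ST(Ay, k) of (3.1) with Sigma = I:
    sigma^2 tr(AA') + k^2 Ch_M(F'F), where F = I - AX."""
    m = X.shape[1]
    F = np.eye(m) - A @ X
    ch_max = np.linalg.eigvalsh(F.T @ F).max()   # largest characteristic root of F'F
    return sigma2 * np.trace(A @ A.T) + k ** 2 * ch_max

# For the OLS estimator A = S^{-1}X' we have F = 0, so ST reduces to sigma^2 tr(S^{-1}).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
S = X.T @ X
A_ols = np.linalg.solve(S, X.T)
print(dmtmse_criterion(A_ols, X, sigma2=1.0, k=2.0))
print(np.trace(np.linalg.inv(S)))           # same value
```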
4. DMTMSE ESTIMATION IN THE CLASS OF OLS ESTIMATORS COMPUTED SUBJECT TO FALSE RESTRICTIONS

One class of biased linear estimators of regression coefficients that has been proposed is that obtained by computing least squares estimates under sets of false restrictions. These estimators were studied by Toro and Wallace [7]. Their work is primarily concerned with testing whether or not a particular set of false restrictions will lead to estimators with smaller mean squared error. They leave the choice of restrictions up to the experimenter.

For this case, we shall restrict our attention to the linear model (y, Xβ, Iσ²) and assume that X is of full rank (rank(X) = m). To obtain the class of estimators we shall impose u possibly false restrictions on the parameters:

Rβ = h ,

where R is a (u x m) matrix of rank u (u < m) and h is a (u x 1) vector. So that the equations are consistent, we shall assume that h ∈ C(R). It is well known (see, for example, Pringle and Rayner [6]) that under this setup the estimator is of the form

β̃ = β̂ - S^{-1}R'(RS^{-1}R')^{-1}(Rβ̂ - h) ,

where S is given by (2.2) and β̂ by (2.3).

Theorem 4.1: In the class of least squares estimators computed subject to u possibly false restrictions Rβ = h for the setup (y, Xβ, Iσ²), restrictions with h = 0 are preferred over those with h ≠ 0 with respect to DMTMSE estimation.

Proof: The theorem follows from Theorem 3.1 since, for Rβ = h, the estimator is of the form Ay + b, where A is independent of h and b equals 0 whenever h = 0.

In view of Theorem 4.1, attention will be limited to the class of least squares estimators obtained subject to Rβ = 0.

Theorem 4.2: The class of estimators, β̃(R), obtained by least squares subject to constraints Rβ = 0, where R ∈ {R | R is (u x m) of rank u}, is equivalent to the class obtained when R ∈ {R | R is (u x m) of rank u and RS^{-1}R' = I_u}, where S is given by (2.2).

Proof: Since Λ^{1/2} and V given in (2.4) are nonsingular, the rows of Λ^{1/2}V' form a basis for m-dimensional Euclidean space and, hence, R can be written as R = BΛ^{1/2}V' for some (u x m) matrix B. Since u = rank(R) ≤ rank(B) ≤ u, B must have full row rank. Now RS^{-1}R' = BB' is positive definite. Thus, if we let GG' = BB' with |G| ≠ 0, it is easily seen that β̃(R) = β̃(G^{-1}R). Also, if we let R̃ = G^{-1}R, we have R̃S^{-1}R̃' = I_u. Thus, for every β̃(R) there exists a corresponding R̃ such that R̃S^{-1}R̃' = I_u and β̃(R) = β̃(R̃).

Using Theorem 4.2, we can respecify the class of estimators of interest as

β̃(R) = (I - S^{-1}R'R)β̂ , for all R such that RS^{-1}R' = I_u .

Within this class we wish to find the optimum with respect to the DMTMSE criterion set out in Section 3. It should be noted that the value of u is assumed to be given and fixed. Also, because of the assumption that R is of full row rank, the ordinary least squares (OLS) estimator is not in the class. However, there exist members of the class that are arbitrarily "close" to the OLS estimator. Subsections 4.1 and 4.2 below deal with deriving the optimum estimator for u = 1 and for general u, respectively, and subsection 4.3 examines some properties of the resulting estimators.
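The following is a minimal numerical sketch of this class under simulated (hypothetical) data, not an implementation from the paper: it normalizes an arbitrary full-row-rank restriction matrix as in Theorem 4.2 so that RS^{-1}R' = I_u, and checks that the restricted least squares estimator in the Pringle-Rayner form then agrees with (I - S^{-1}R'R)β̂.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, u = 30, 5, 2
X = rng.normal(size=(n, m))
y = rng.normal(size=n)                      # hypothetical response
S = X.T @ X
beta_ols = np.linalg.solve(S, X.T @ y)

R0 = rng.normal(size=(u, m))                # arbitrary full-row-rank restriction matrix

# Normalization of Theorem 4.2: R = G^{-1} R0 with GG' = R0 S^{-1} R0', so R S^{-1} R' = I_u
G = np.linalg.cholesky(R0 @ np.linalg.solve(S, R0.T))
R = np.linalg.solve(G, R0)
assert np.allclose(R @ np.linalg.solve(S, R.T), np.eye(u))

# Least squares subject to R0 beta = 0 (Pringle-Rayner form) ...
beta_r = beta_ols - np.linalg.solve(S, R0.T) @ np.linalg.solve(
    R0 @ np.linalg.solve(S, R0.T), R0 @ beta_ols)

# ... agrees with (I - S^{-1} R'R) beta_hat once R is normalized
beta_r2 = beta_ols - np.linalg.solve(S, R.T @ (R @ beta_ols))
assert np.allclose(beta_r, beta_r2)
```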
4.1 The Single Constraint Case

The first case we shall consider is when u = 1. In this case the restriction being imposed is of the form r'β = 0, where r is an (m x 1) vector satisfying r'S^{-1}r = 1. By putting A = (I - S^{-1}rr')S^{-1}X' in (3.1), and since Sup_{β ∈ B_k} β'F'Fβ = k² Ch_M(F'F) with F = I - AX = S^{-1}rr' and

Ch_M(F'F) = Ch_M(rr'S^{-2}rr') = r'r · r'S^{-2}r ,

in this case we get

ST(r, k) = σ² tr[S^{-1} - S^{-1}rr'S^{-1} - S^{-1}rr'S^{-1} + S^{-1}rr'S^{-1}rr'S^{-1}] + k² r'r · r'S^{-2}r
         = σ² tr[S^{-1}] + (k² r'r - σ²) r'S^{-2}r
         = σ² Σ_{i=1}^{m} λ_i^{-1} + (k² r'r - σ²) r'S^{-2}r ,

where Λ = diag(λ_1, λ_2, ..., λ_m) is defined in (2.4) and (2.5). The following theorem gives the vector r which minimizes ST(r, k).

Theorem 4.3: ST(r, k) ≥ σ² Σ_{i=1}^{m-1} λ_i^{-1} + k², with equality when r = λ_m^{1/2} V_m, where λ_m and V_m (the last column of V) are defined in (2.4) and (2.5). That is, the DMTMSE estimator in the class of least squares estimators computed subject to one possibly incorrect restriction is

β̃ = (I - V_m V_m')S^{-1}X'y .

Proof: Let w = Λ^{-1/2}V'r, so that r = VΛ^{1/2}w. The condition r'S^{-1}r = 1 implies w'w = 1. Also, r'r = Σ_{i=1}^{m} w_i²λ_i and r'S^{-2}r = Σ_{i=1}^{m} w_i²λ_i^{-1}. Now

(4.1)    ST(r, k) = σ² Σ_{i=1}^{m} λ_i^{-1} - σ² Σ_{i=1}^{m} w_i²λ_i^{-1} + k² (Σ_{i=1}^{m} w_i²λ_i)(Σ_{i=1}^{m} w_i²λ_i^{-1}) .

Since Σ_{i=1}^{m} w_i² = 1 and since the harmonic mean is always less than or equal to the arithmetic mean,

(4.2)    (Σ_{i=1}^{m} w_i²λ_i)(Σ_{i=1}^{m} w_i²λ_i^{-1}) ≥ 1 .

Furthermore, since the value of a weighted arithmetic mean is always less than or equal to the largest value being averaged,

(4.3)    Σ_{i=1}^{m} w_i²λ_i^{-1} ≤ λ_m^{-1} .

Using (4.2) and (4.3) in (4.1) we get

ST(r, k) ≥ σ² Σ_{i=1}^{m} λ_i^{-1} + k² - σ² λ_m^{-1} = σ² Σ_{i=1}^{m-1} λ_i^{-1} + k² .

Noting that equality holds in (4.2) whenever the λ_i are equal for all values of i for which w_i > 0, and in (4.3) whenever w_i = 0 for all values of i for which λ_i^{-1} < λ_m^{-1}, we see that both equalities hold when w = (0, 0, ..., 0, 1)', in which case r = VΛ^{1/2}w = λ_m^{1/2}V_m. Thus, ST(r, k) = σ² Σ_{i=1}^{m-1} λ_i^{-1} + k², and the resulting estimator becomes β̃ = (I - V_m V_m')S^{-1}X'y.

The estimator thus derived can be seen to be of the form β̃ = Pβ̂, where P is an orthogonal projection matrix and β̂ is the OLS estimator. The estimator can be obtained by projecting β̂ into the space orthogonal to the characteristic vector corresponding to the smallest characteristic root of X'X. Another interesting feature of this result is the fact that the estimator does not depend on the value of k and, hence, k need not be specified beforehand. Of course, the resulting mean squared errors will involve k.

4.2 The General Case

For the general case we shall assume that u is a fixed number (1 ≤ u ≤ m). The solution has the appearance of that for u = 1, but the proof of the theorem is more involved.

Theorem 4.4: For the class of least squares estimates computed subject to a set of u independent, possibly incorrect restrictions,

ST(R, k) ≥ σ² Σ_{i=1}^{m-u} λ_i^{-1} + k² ,

with equality holding whenever R = Λ_2^{1/2}V_2', where Λ_2 and V_2 are given in (2.7) and (2.8). That is, for this class of estimators the DMTMSE estimator is

β̃ = (I - V_2V_2')β̂ .

Proof: We can write R = (BB')^{-1/2}BΛ^{1/2}V', where B is any (u x m) matrix of rank u. Using RS^{-1}R' = I_u, the trace term of (3.1) for this estimator is σ² tr(AA') = σ² tr[S^{-1} - S^{-1}R'RS^{-1}], and we have that

tr[S^{-1} - S^{-1}R'RS^{-1}] = Σ_{i=1}^{m} λ_i^{-1} - Σ_{i=1}^{m} p_ii λ_i^{-1} ,

where P = B'(BB')^{-1}B = ((p_ij)). Since P is symmetric and idempotent, Σ_{j=1}^{m} p_ij² = p_ii, which implies p_ii ≥ 0. Further, if p_ii > 0 we have 0 ≤ Σ_{j≠i} p_ij² = p_ii - p_ii², so that 0 ≤ p_ii ≤ 1. It is also well known that Σ_{i=1}^{m} p_ii = u. Therefore, Σ_{i=1}^{m} p_ii λ_i^{-1} is maximized subject to 0 ≤ p_ii ≤ 1 and Σ_{i=1}^{m} p_ii = u by choosing p_11 = ... = p_(m-u)(m-u) = 0 and p_(m-u+1)(m-u+1) = ... = p_mm = 1, giving

Σ_{i=1}^{m} p_ii λ_i^{-1} = Σ_{i=m-u+1}^{m} λ_i^{-1} .

This corresponds to taking B = (0 : I_u). Using this we have

(4.4)    tr[S^{-1} - S^{-1}R'RS^{-1}] ≥ Σ_{i=1}^{m-u} λ_i^{-1} .

Next, for this estimator F = I - AX = S^{-1}R'R = VΛ^{-1/2}PΛ^{1/2}V', so that Ch_M(F'F) = Ch_M(FF') = Ch_M(Q'Q), where Q = Λ^{1/2}PΛ^{-1/2}. Note that Q² = Q, but Q is not symmetric. Furthermore, if δ* ∈ C(Q), then δ* = Qδ for some δ and, hence, Qδ* = Q²δ = Qδ = δ*, so that ||Qδ*||² = ||δ*||². Choosing δ* ∈ C(Q) with ||δ*|| = 1, we have

Ch_M(Q'Q) = Sup_{x: x'x = 1} ||Qx||² ≥ 1 ,

and therefore

(4.5)    Sup_{β ∈ B_k} β'F'Fβ = k² Ch_M(F'F) ≥ k² .

Using (4.4) and (4.5) we may conclude that

(4.6)    ST(R, k) ≥ σ² Σ_{i=1}^{m-u} λ_i^{-1} + k² .

It remains to demonstrate that the lower bound is attainable. Putting B = (0 : I_u) we get R = (BB')^{-1/2}BΛ^{1/2}V' = Λ_2^{1/2}V_2', and substituting this into the left hand side of (4.6) gives σ² Σ_{i=1}^{m-u} λ_i^{-1} + k², which is the right hand side of (4.6). Finally, the estimator can take various forms:

β̃ = (I - S^{-1}R'R)β̂ = (I - V_2V_2')β̂ = V_1V_1'β̂ = V_1Λ_1^{-1}V_1'X'y ,

where β̂ is the OLS estimator.
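The following sketch (hypothetical simulated data, not from the paper) illustrates Theorem 4.4 and the forms listed above: the estimator (I - V_2V_2')β̂ coincides with V_1Λ_1^{-1}V_1'X'y, and its criterion value equals the lower bound σ² Σ_{i=1}^{m-u} λ_i^{-1} + k².

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, u = 30, 5, 2
X = rng.normal(size=(n, m))
y = rng.normal(size=n)
sigma2, k = 1.0, 2.0

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
V, lam = Vt.T, sv ** 2                      # descending characteristic roots of X'X
V1, V2 = V[:, : m - u], V[:, m - u :]
lam1 = lam[: m - u]

S = X.T @ X
beta_ols = np.linalg.solve(S, X.T @ y)

# Two equivalent forms of the DMTMSE estimator (Theorems 4.4 and 4.5)
beta_dm = (np.eye(m) - V2 @ V2.T) @ beta_ols
beta_gi = V1 @ np.diag(1.0 / lam1) @ V1.T @ X.T @ y      # Marquardt's Generalized Inverse form
assert np.allclose(beta_dm, beta_gi)

# The criterion value attains the lower bound of Theorem 4.4
A = V1 @ np.diag(1.0 / lam1) @ V1.T @ X.T
F = np.eye(m) - A @ X
ST = sigma2 * np.trace(A @ A.T) + k ** 2 * np.linalg.eigvalsh(F.T @ F).max()
assert np.isclose(ST, sigma2 * np.sum(1.0 / lam1) + k ** 2)
```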
4.3 Some Properties of the Estimator

Marquardt [4] proposed a class of regression estimators he called Generalized Inverse estimators. He argued that, as an alternative to using precise computing methods for obtaining "exact" solutions in situations where the X'X matrix has some small but positive characteristic roots, it might be preferable to assign a lower rank to X'X and obtain a solution under this assumption. In our notation, his estimator can be written as

β̃ = V_1Λ_1^{-1}V_1'X'y ,

where m - u is the assigned rank of X'X and V_1 and Λ_1 are defined in (2.7) and (2.8).

Theorem 4.5: The DMTMSE estimator for the class of least squares estimators computed subject to u possibly false restrictions is equivalent to Marquardt's Generalized Inverse estimator when the assigned rank of X'X equals m - u.

Proof: Since we are assuming that X'X is of full rank, V_1Λ_1^{-1}V_1'X'y = V_1V_1'VΛ^{-1}V'X'y = (I - V_2V_2')S^{-1}X'y = (I - V_2V_2')β̂, which is the form given in Theorem 4.4.

Some further results which follow easily from previous results are summarized without proof in the following theorem.

Theorem 4.6: For any (m x 1) vector ℓ:

(i) Var(ℓ'β̃) ≤ Var(ℓ'β̂), with equality if, and only if, ℓ ∈ C(V_1), in which case ℓ'β̃ = ℓ'β̂;
(ii) E(ℓ'β̃) = ℓ'β - ℓ'V_2V_2'β, and the second term (the bias) vanishes if, and only if, ℓ ∈ C(V_1) or β ∈ C(V_1);
(iii) the estimator β̃ is "shorter" than β̂, i.e., β̃'β̃ ≤ β̂'β̂;
(iv) for the more general model (y, Xβ, Σσ²), where Σ ≠ I, the corresponding estimator is of the form β̃ = Ṽ_1Ṽ_1'(X'Σ^{-1}X)^{-1}X'Σ^{-1}y, where X'Σ^{-1}X = ṼΛ̃Ṽ', Λ̃ is diagonal, Ṽ'Ṽ = ṼṼ' = I, and Ṽ_1 is an (m x m-u) matrix.

Toro and Wallace [7] give a statistical test for determining whether imposing a given set of possibly false restrictions, Rβ = 0, will result in a reduced mean squared error. This test is appropriate for use in this setting provided we make the assumption that the errors in the model are normally distributed. In this case, the form of the test is: reject H_0 if W ≥ W_α, where

W = [y'U_2U_2'y/u] / [y'(I - UU')y/(n - m)]

and U and U_2 are defined in (2.4) and (2.6). Note that the hypothesis H_0 states that β̃ is superior to the OLS estimator, β̂, with respect to mean squared error, and accepting H_0 suggests that β̃ is preferred over β̂. It can be seen that W is a noncentral F random variable with u and (n - m) degrees of freedom and noncentrality δ = β'V_2Λ_2V_2'β/2σ², where Λ_2 is defined in (2.7) and (2.8). It can be seen also that δ ≤ λ_{m-u+1} β'V_2V_2'β/2σ², where λ_{m-u+1} is the largest characteristic root appearing in Λ_2. Therefore, δ is small whenever the projection of β on C(V_2) is small or the roots in Λ_2 are small. The latter condition is precisely that for which the Generalized Inverse estimators are intended. Some critical points and power computations are given in [7].
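A minimal sketch of the test statistic W on simulated normal data follows. The comparison with a central F quantile from scipy.stats is only an illustrative assumption of this example; the appropriate critical points for this noncentral-F test are those tabulated by Toro and Wallace [7].

```python
import numpy as np
from scipy import stats                     # assumed available; used only for an F quantile

rng = np.random.default_rng(4)
n, m, u = 40, 5, 2
X = rng.normal(size=(n, m))
beta_true = rng.normal(size=m)              # hypothetical parameter, simulation only
y = X @ beta_true + rng.normal(size=n)      # normal errors, as the test requires

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
U2 = U[:, m - u :]                          # columns paired with the u smallest roots

num = (y @ U2 @ (U2.T @ y)) / u             # y'U2 U2'y / u
den = (y @ (y - U @ (U.T @ y))) / (n - m)   # y'(I - UU')y / (n - m)
W = num / den

# Illustrative comparison point only; see [7] for the proper critical values.
alpha = 0.05
print(W, stats.f.ppf(1 - alpha, u, n - m))
```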
5. THE DMTMSE ESTIMATOR FOR THE CLASS OF SHRINKAGE ESTIMATORS

Many regression estimators have been proposed that have the form β̃ = Ay = VΓU'y, where U and V are defined by (2.4) and Γ = diag(γ_1, ..., γ_m). Included in this class are OLS (γ_j = λ_j^{-1/2}, j = 1, ..., m), Generalized Inverse regression (γ_j = λ_j^{-1/2}, j = 1, ..., m-u; γ_j = 0, j = m-u+1, ..., m), and Ridge regression (γ_j = λ_j^{1/2}/(λ_j + k²) for a ridge constant k², j = 1, ..., m), as well as others. This class is discussed by Goldstein and Smith [2]. In the following theorem, the DMTMSE estimator for this class is derived.

Theorem 5.1: In the linear model (y, Xβ, Iσ²), the DMTMSE estimator from the class of estimators of the form β̃ = VΓU'y, where Γ = diag(γ_1, ..., γ_m) and U and V are defined by (2.4), is β̃ = tβ̂, where β̂ is the OLS estimator,

t = c²/(Σ_{j=1}^{m} λ_j^{-1} + c²)  and  c² = k²/σ² .

Proof: First,

ST(VΓU'y, k) = σ² Σ_{j=1}^{m} γ_j² + k² max[(1 - λ_j^{1/2}γ_j)², j = 1, 2, ..., m] .

Suppose that the value of the second maximum is p², that is, p = max[|1 - λ_j^{1/2}γ_j|, j = 1, ..., m]. Then any set of γ_j's that minimizes ST must satisfy |1 - λ_j^{1/2}γ_j| ≤ p for j = 1, ..., m and, hence, γ_j ≥ (1 - p)λ_j^{-1/2}, j = 1, ..., m. Since ST is at a minimum, γ_j must equal (1 - p)λ_j^{-1/2}, its smallest possible value, so as to make Σ_{j=1}^{m} γ_j² as small as possible. This implies that

ST/σ² = (1 - p)² Σ_{j=1}^{m} λ_j^{-1} + p²c² .

Since this is convex in p, the minimum can be found by setting the first derivative equal to zero, resulting in p = Σ_{j=1}^{m} λ_j^{-1} / (Σ_{j=1}^{m} λ_j^{-1} + c²). Thus,

γ_j = [c²/(Σ_{j=1}^{m} λ_j^{-1} + c²)] λ_j^{-1/2} = t λ_j^{-1/2} , j = 1, ..., m ,

where c² = k²/σ², and β̃ = VΓU'y = t VΛ^{-1/2}U'y = tβ̂.

The estimator obtained in Theorem 5.1 is a deterministically shrunken estimator as defined and studied in Mayer and Willke [5]. Its form is simply a scalar times the OLS estimator. The scalar factor is between 0 and 1 and, as such, has the effect of shortening the length of the vector β̂.

6. THE DMTMSE ESTIMATOR FOR THE CLASS OF GENERAL LINEAR FUNCTIONS

The model

y = Xβ + e = UΛ^{1/2}V'β + e

can be decomposed as follows. Let Ũ be an (n x n-m) matrix of rank n - m which satisfies Ũ'U = 0 and Ũ'Ũ = I_{n-m}; that is, the columns of Ũ span the orthogonal complement of C(U). Then

y_1 = U'y = Λ^{1/2}V'β + e_1 ,    y_2 = Ũ'y = e_2 ,

where e_1 = U'e, e_2 = Ũ'e, and E(e_i) = 0, i = 1, 2. Thus, it seems sensible to base the estimation of β solely upon y_1. For purposes of estimating β, then, we shall rewrite the model as

(6.1)    y_1 = β_1 + e_1 ,  where β_1 = Λ^{1/2}V'β .

Theorem 6.1: The DMTMSE linear estimator of β_1 in (6.1) is given by β̃_1 = ty_1, where t = c²/(m + c²) (Cohen [1]).

Proof: Since the design matrix in (6.1) is the identity, for an estimator Ay_1 we have F = I - A and

ST(Ay_1, β_1) = σ² tr(AA') + k² Ch_M[(I - A)(I - A')] .

Suppose that A is not symmetric; then define A* = I - [(I - A)(I - A')]^{1/2}, which is symmetric. Note first that Ch_M[(I - A*)(I - A*')] = Ch_M[(I - A)(I - A')]. Also, tr{[(I - A)(I - A')]^{1/2}} ≥ tr(I - A), which implies tr(A*) ≤ tr(A). But tr[(I - A*)(I - A*')] = tr[(I - A)(I - A')], and expanding both sides gives

tr(A*A*') = tr(AA') - 2[tr(A) - tr(A*)] ≤ tr(AA') .

Therefore, ST(A*y_1, β_1) ≤ ST(Ay_1, β_1), and attention may be restricted to symmetric A. Assuming that A is symmetric, we write A = GΓG', where G is orthogonal and Γ = diag(γ_1, ..., γ_m), and

ST(Ay_1, β_1) = σ² Σ_{j=1}^{m} γ_j² + k² max[(1 - γ_j)², j = 1, ..., m] .

Using the identical argument of Theorem 5.1, the minimizing values are γ_j = c²/(m + c²), j = 1, ..., m. The choice of G is arbitrary.

Using the result of Theorem 6.1, an estimator for β can be obtained in terms of β̃_1:

β̃ = VΛ^{-1/2}β̃_1 = t VΛ^{-1/2}U'y = tβ̂ ,  where t = c²/(m + c²) .

This is a solution for the class of general linear estimators in the decomposed case given by (6.1). However, because the DMTMSE estimator of Cβ is not necessarily equal to C times that of β alone, it is not the solution for the original model. Indeed, in the present case the estimator of β takes the form VΛ^{-1/2}HU'y for some matrix H.
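To illustrate Theorems 5.1 and 6.1, the sketch below (simulated data; not part of the paper) computes the two shrinkage factors and the corresponding estimators. The constant c² = k²/σ² is treated as known here, which in practice it is not; its choice is left to the user.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 30, 4
X = rng.normal(size=(n, m))
y = rng.normal(size=n)
c2 = 4.0                                    # c^2 = k^2 / sigma^2, assumed known for illustration

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
lam = sv ** 2
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

t_51 = c2 / (np.sum(1.0 / lam) + c2)        # Theorem 5.1: class beta = V Gamma U'y
t_61 = c2 / (m + c2)                        # Theorem 6.1: general linear functions of y1 = U'y
beta_51 = t_51 * beta_ols                   # deterministically shrunken OLS (Mayer and Willke [5])
beta_61 = t_61 * beta_ols                   # V Lambda^{-1/2} (t y1) = t beta_hat
print(t_51, t_61)
```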
REFERENCES

[1] Cohen, Arthur, "All admissible linear estimators of the mean vector," Annals of Mathematical Statistics, 37 (April 1966), 458-463.

[2] Goldstein, M. and Smith, A. F. M., "Ridge-type estimators for regression analysis," Journal of the Royal Statistical Society, Ser. B, 36 (June 1974), 284-291.

[3] Hoerl, Arthur E. and Kennard, Robert W., "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, 12 (February 1970), 55-67.

[4] Marquardt, D. W., "Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation," Technometrics, 12 (August 1970), 591-612.

[5] Mayer, Lawrence S. and Willke, Thomas A., "On biased estimation in linear models," Technometrics, 15 (August 1973), 497-508.

[6] Pringle, R. M. and Rayner, A. A., Generalized Inverse Matrices With Applications to Statistics, New York: Hafner Publishing Co., 1971.

[7] Toro-Vizcarrondo, Carlos and Wallace, T. D., "A test of the mean square error criterion for restrictions in linear regression," Journal of the American Statistical Association, 63 (June 1968), 558-572.

APPENDIX

Alternative proof of Theorem 6.1: Since the design matrix in (6.1) is the identity,

(1)    ST(Ay_1, β_1) = σ² tr(AA') + k² Ch_M[(I - A)(I - A')] .

Write A = GΓH', where G and H are unitary matrices and Γ = diag(γ_1, ..., γ_m); therefore

tr(AA') = tr[GΓH'HΓG'] = tr[Γ²] = Σ_{i=1}^{m} γ_i² ,

so that the trace term is independent of G and H. Note now that

(2)    Ch_M[(I - A)(I - A')] = Ch_M[(I - GΓH')(I - HΓG')] = Ch_M[(G'H - Γ)(G'H - Γ)'] = ||U - Γ||² = ||I - U'Γ||² ,

where U = G'H and ||·|| here denotes the matrix norm induced by the Euclidean norm (known as the spectral norm; see Lancaster, Peter, Theory of Matrices, Academic Press, New York and London, page 210).

Take i such that |1 - γ_i| = max_j |1 - γ_j|, and let e_i = [0, ..., 0, 1, 0, ..., 0]' with the 1 in the i-th position. Since Γe_i = γ_i e_i, it follows that

(I - U'Γ)e_i = e_i - γ_i U'e_i ,  where U'e_i = α e_i + c ,  α² + ||c||² = 1 ,  and c is orthogonal to e_i .

Hence, using γ_i ≥ 0 (the γ_j are singular values of A),

(3)    ||(I - U'Γ)e_i||² = (1 - γ_i α)² + γ_i²||c||² = 1 - 2γ_i α + γ_i² ≥ (1 - γ_i)² .

By applying (3) in (2), and since ||I - U'Γ||² ≥ ||(I - U'Γ)e_i||² for the unit vector e_i, we conclude that

Ch_M[(U - Γ)'(U - Γ)] ≥ Ch_M[(I - Γ)'(I - Γ)] = max[(1 - γ_j)², j = 1, ..., m] ,

and the equality is attained by letting G = H, so that U = G'H = I. Hence, we write A = GΓG' and

ST(Ay_1, β_1) = σ² Σ_{j=1}^{m} γ_j² + k² max[(1 - γ_j)², j = 1, ..., m] ,

which implies, using the identical argument of Theorem 5.1, that the minimizing values are

γ_j = c²/(m + c²) , j = 1, ..., m .
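As a purely numerical sanity check of the key inequality used in the appendix (with an arbitrary, hypothetical Γ), the sketch below draws random orthogonal matrices U and verifies that ||U - Γ||² ≥ max_j (1 - γ_j)² for nonnegative γ_j, with equality at U = I.

```python
import numpy as np

rng = np.random.default_rng(6)
m = 5
gamma = rng.uniform(0.0, 1.5, size=m)       # nonnegative "singular values" of A
Gam = np.diag(gamma)
rhs = np.max((1.0 - gamma) ** 2)            # Ch_M[(I - Gamma)'(I - Gamma)]

# Equality at U = I ...
lhs_eye = np.linalg.eigvalsh((np.eye(m) - Gam).T @ (np.eye(m) - Gam)).max()
assert np.isclose(lhs_eye, rhs)

# ... and the inequality for random orthogonal U
for _ in range(1000):
    Q, _r = np.linalg.qr(rng.normal(size=(m, m)))
    lhs = np.linalg.eigvalsh((Q - Gam).T @ (Q - Gam)).max()
    assert lhs >= rhs - 1e-10
```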