COMMENTS ON COMPUTING MINIMUM ABSOLUTE DEVIATIONS REGRESSIONS BY ITERATIVE LEAST SQUARES REGRESSIONS AND BY LINEAR PROGRAMMING

by A. Ronald Gallant and Thomas M. Gerig 1/

Institute of Statistics Mimeograph Series No. 911
Raleigh - February 1974

1/ Assistant Professor of Statistics and Economics and Assistant Professor of Statistics, respectively. Department of Statistics, North Carolina State University, Raleigh, North Carolina 27607.

ABSTRACT

This note considers some aspects of the computational problem of fitting the regression model

    y_t = \sum_{i=1}^{k} x_{it} \beta_i + e_t        (t = 1, 2, ..., n)

by minimizing the sum of absolute deviations

    \sum_{t=1}^{n} \left| y_t - \sum_{i=1}^{k} x_{it} \beta_i \right| .

The iterative method recently proposed by Schlossmacher (1973) is shown to have undesirable features under certain conditions. The linear programming approach using the simplex method as suggested by Fisher (1961) requires that the simplex tableau contain a submatrix of order n by 2k + n, which restricts the method to relatively small problems. We show that an n by k matrix and a 2k + n vector are sufficient to represent this submatrix. This representation improves the efficiency of the simplex method and allows its use in a large proportion of the problems which occur in applications.

1. INTRODUCTION

Consider the linear model

    y = X \beta + e

where y is the (n x 1) vector of responses, X the (n x k) matrix of inputs, \beta the (k x 1) vector of unknown parameters, and e the (n x 1) vector of unobservable errors. The parameter \beta is to be estimated by minimizing

    \sum_{t=1}^{n} \left| y_t - \sum_{i=1}^{k} x_{it} \beta_i \right| .

An iterative technique was recently proposed by Schlossmacher (1973) for solving this problem.
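For intuition, consider the simplest instance of this criterion: the location model, where X is a column of ones. There the sum of absolute deviations is minimized by a sample median, while least squares yields the mean. A minimal sketch (illustrative data; pure Python):

```python
# Sum of absolute deviations for the location model y_t = beta + e_t.
# Illustrative data; any sample median minimizes the criterion.
y = [0.0, 1.0, 10.0]

def sad(beta, y):
    """Sum of absolute deviations at a candidate location beta."""
    return sum(abs(yt - beta) for yt in y)

median = 1.0
mean = sum(y) / len(y)
print(sad(median, y))   # 10.0
print(sad(mean, y))     # larger: the least squares estimate is not LAD-optimal
```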
The method consists of doing weighted least squares regressions using diagonal weights equal to the reciprocal of the absolute residual for each observation from the previous iterative step. When a residual becomes small relative to the others, the weight for the corresponding observation is set to zero. The procedure terminates when residuals from successive steps differ by only a small amount. We have found this method unsatisfactory in some cases. An example is given where there exists a serious lack of stability at the answer and another where the procedure converges to the wrong answer.

The objection to the use of linear programming and the simplex method of Dantzig as suggested by Fisher (1961) is the excessive size of the simplex tableau, which limits the method to small problems. We show that a submatrix of the simplex tableau may be represented by a considerably smaller matrix, which both reduces storage requirements and improves the efficiency of the simplex method. We call attention to the paper by Wagner (1959), which gives an alternative method of achieving storage compression by recasting the primal linear program as a bounded-variables dual linear program. We think that the reader will find the simplex method with storage compression as suggested in this note to be the simpler approach.

2. COMMENTS ON ITERATIVELY REWEIGHTED LEAST SQUARES

The following example illustrates the lack of stability of Schlossmacher's (1973) iteratively reweighted regression procedure. Suppose we wish to estimate the location, \beta, of a population from the following data: -1, -1, 0, 0, 2. The starting estimate (the sample mean) is \hat\beta^{(1)} = 0, yielding associated residuals -1, -1, 0, 0, 2 and weights 1, 1, 0, 0, 1/2. The second step yields an estimate

    \hat\beta^{(2)} = \tfrac{2}{5} [(-1)(1) + (-1)(1) + (0)(0) + (0)(0) + (2)(\tfrac{1}{2})] = -\tfrac{2}{5} .

Thus, the first step estimate is the desired answer (the median) while the second step is a significant step away from the answer.
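The instability can be reproduced with a short sketch of the reweighting step described above (a hypothetical implementation for the location model only; the tolerance `delta` is an assumption, since the note does not fix a test value):

```python
def irls_step(y, beta, delta=1e-8):
    """One iteratively reweighted least squares step for the location model:
    weights are reciprocals of absolute residuals, with near-zero residuals
    dropped (weight 0), as in Schlossmacher's scheme."""
    r = [yt - beta for yt in y]
    w = [1.0 / abs(rt) if abs(rt) > delta else 0.0 for rt in r]
    return sum(wt * yt for wt, yt in zip(w, y)) / sum(w)

y = [-1.0, -1.0, 0.0, 0.0, 2.0]
beta = sum(y) / len(y)        # starting estimate: the sample mean, 0.0 (the median)
beta2 = irls_step(y, beta)    # second step: -2/5, a step away from the answer
print(beta, beta2)            # 0.0 -0.4
```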
To see why this happened, let r^{(j)} = y - X \hat\beta^{(j)} be the residuals from the j-th step, let W^{(j)} = diag(w_1, w_2, ..., w_n) with

    w_i = |r_i^{(j)}|^{-1}  if  |r_i^{(j)}| > \delta ,    w_i = 0  if  |r_i^{(j)}| \le \delta ,

and let s = (s_1, s_2, ..., s_n)' (n x 1) with s_i = sgn(r_i^{(j)}) if |r_i^{(j)}| > \delta and s_i = 0 otherwise. The next estimate satisfies the weighted normal equations

    X' W^{(j)} (y - X \hat\beta^{(j+1)}) = 0 .

The condition for stopping, \hat\beta^{(j+1)} = \hat\beta^{(j)}, therefore implies X' W^{(j)} r^{(j)} = 0; since w_i r_i^{(j)} = sgn(r_i^{(j)}) whenever |r_i^{(j)}| > \delta, this reduces to

    X' s = 0 .

Thus, at the solution, X's = 0 is a necessary condition for the residuals from the "next step" to be identical to those at the answer. No such s vector exists for the example given above, and one can be shown not to exist in many situations. In view of this, it is not surprising that the method exhibits the observed instability.

The following example illustrates an instance where the process converges to the wrong answer. Suppose we wish to estimate the location, \beta, of a population from the following sample: -1, 3t, 1, where 0 < t < 1/3 and 2t is less than the test value \delta for setting weights to zero. Then the first step estimate (the sample mean) is \hat\beta^{(1)} = t with residuals -(1+t), 2t, (1-t) and associated weights (1+t)^{-1}, 0, (1-t)^{-1}. The second step estimate is

    \hat\beta^{(2)} = [(1+t)^{-1} + (1-t)^{-1}]^{-1} [-(1+t)^{-1} + (1-t)^{-1}] = t

with residuals again -(1+t), 2t, (1-t). That is, the residuals are unchanged and the procedure has converged, but to t, not the correct answer 3t (the sample median).

3. THE PROBLEM AS A LINEAR PROGRAM

The problem of fitting the regression model y = X\beta + e by minimizing the sum of absolute deviations may be put in the form of a linear programming problem as follows. Decompose the vector \beta into its positive and negative parts \beta^+ and \beta^- so that \beta = \beta^+ - \beta^-. Similarly decompose the residuals e = y - X\beta into e = e^+ - e^- where e^+ \ge 0 and e^- \ge 0. Then the problem is equivalent to the linear program

    min  1'e^+ + 1'e^-
    s.t. X\beta^+ - X\beta^- + e^+ - e^- = y ,
         \beta^+ \ge 0 ,  \beta^- \ge 0 ,  e^+ \ge 0 ,  e^- \ge 0 ,

where 1' is a (1 x n) row vector whose elements are ones.
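The equivalence rests on the identity |e_t| = e_t^+ + e_t^- when e^+ and e^- are the positive and negative parts of e, so the LP objective equals the sum of absolute deviations. A quick numerical check of the decomposition (illustrative values):

```python
# Check that splitting e = e+ - e- with e+, e- >= 0 (positive/negative parts)
# makes the LP objective 1'e+ + 1'e- equal the sum of absolute deviations.
e = [1.5, -2.0, 0.0, 3.25, -0.5]
e_pos = [max(et, 0.0) for et in e]      # e+
e_neg = [max(-et, 0.0) for et in e]     # e-

assert all(ep - en == et for ep, en, et in zip(e_pos, e_neg, e))   # e = e+ - e-
assert sum(e_pos) + sum(e_neg) == sum(abs(et) for et in e)         # objectives agree
print(sum(e_pos) + sum(e_neg))  # 7.25
```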
The most convenient form of this linear program for our purposes is obtained by eliminating e^+ = y - X\beta^+ + X\beta^- + e^-:

    max  -\delta = 1'X\beta^+ - 1'X\beta^- - 2 \cdot 1'e^- - 1'y
    s.t. X\beta^+ - X\beta^- - e^- \le y ,
         \beta^+ \ge 0 ,  \beta^- \ge 0 ,  e^- \ge 0 .

Set c' = (1'X, -1'X, -2 \cdot 1') (1 x 2k + n), A = (X, -X, -I) (n x 2k + n), b = y (n x 1), and \rho = (\beta^{+'}, \beta^{-'}, e^{-'})' (2k + n x 1). The matrix formulation of the linear program is

    max  -\delta = c'\rho - 1'y    s.t.  A\rho \le b ,  \rho \ge 0 .

When the program has been solved to obtain \hat\rho' = (\hat\beta^{+'}, \hat\beta^{-'}, \hat{e}^{-'}) and the associated minimum -max(-\delta), the estimate \hat\beta of the parameter \beta is recovered from \hat\rho by setting \hat\beta = \hat\beta^+ - \hat\beta^-.

4. THE SIMPLEX ALGORITHM AND THE SUBMATRIX A

Our discussion of the simplex algorithm is heavily dependent on the tabular representation of the problem and the description of the method contained in Chapter III of Owen (1968). It is recommended that the reader have this reference in hand while reading this section. The problem is represented initially by the tableau

    T = [ A   b ]
        [ c'  d ]

where the entry d = -1'y. The elementary step of the simplex algorithm is a pivot on some non-zero element a_{\alpha\beta} of the submatrix A to obtain the tableau T-hat. The submatrix A-hat is obtained from A by pivoting on a_{\alpha\beta} as follows:

    i)   \hat{a}_{\alpha\beta} = a_{\alpha\beta}^{-1} ,
    ii)  \hat{a}_{\alpha\beta'} = a_{\alpha\beta}^{-1} a_{\alpha\beta'} ,                                  \beta' \ne \beta ,
    iii) \hat{a}_{\alpha'\beta} = -a_{\alpha'\beta} a_{\alpha\beta}^{-1} ,                                \alpha' \ne \alpha ,
    iv)  \hat{a}_{\alpha'\beta'} = a_{\alpha'\beta'} - a_{\alpha'\beta} a_{\alpha\beta}^{-1} a_{\alpha\beta'} ,   \alpha' \ne \alpha ,  \beta' \ne \beta .

The simplex algorithm is a set of rules for choosing these pivots a_{\alpha\beta}. These rules are explained in complete detail by Owen (1968). Our concern is with their effect on the submatrix A.

Consider the Figure. Initially the submatrix A can be represented by a copy of X, a permutation vector j_\beta (\beta = 1, 2, ..., 2k + n) initialized at j_\beta = \beta, and these rules:

    i)   if j_\beta \le k        then  a_{\alpha\beta} = x_{\alpha j_\beta} ,
    ii)  if k < j_\beta \le 2k   then  a_{\alpha\beta} = -x_{\alpha(j_\beta - k)} ,
    iii) if 2k < j_\beta         then  a_{\alpha\beta} = -1 if j_\beta = 2k + \alpha and a_{\alpha\beta} = 0 otherwise.

We call this set of rules the (X, j) representation of A.
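The lookup rules above can be sketched directly (a small pure-Python illustration with arbitrary data; the function name is ours, and indices follow the 1-based convention of the text):

```python
def a_entry(X, j, alpha, beta, k):
    """Entry a[alpha][beta] of A recovered from the (X, j) representation.
    alpha and beta are 1-based, matching the text."""
    jb = j[beta - 1]
    if jb <= k:                      # rule i): a column of X
        return X[alpha - 1][jb - 1]
    if jb <= 2 * k:                  # rule ii): a negated column of X
        return -X[alpha - 1][jb - k - 1]
    return -1.0 if jb == 2 * k + alpha else 0.0   # rule iii): -e vectors

n, k = 3, 2
X = [[2.0, 1.0], [1.0, 3.0], [4.0, 1.0]]
j = list(range(1, 2 * k + n + 1))    # initial permutation j_beta = beta

# Initially the representation must reproduce A = [X  -X  -I].
A = [[2.0, 1.0, -2.0, -1.0, -1.0, 0.0, 0.0],
     [1.0, 3.0, -1.0, -3.0, 0.0, -1.0, 0.0],
     [4.0, 1.0, -4.0, -1.0, 0.0, 0.0, -1.0]]
assert all(a_entry(X, j, a, b, k) == A[a - 1][b - 1]
           for a in range(1, n + 1) for b in range(1, 2 * k + n + 1))
print("(X, j) representation reproduces A")
```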
Again consider the Figure. It is quite clear from the three examples given that any number of pivots does not destroy the structure of A: there will always be k columns with arbitrary entries, k columns whose entries are their negatives, and n negative elementary vectors. A pivot on a_{\alpha\beta} amounts to no more than an alteration of the entries in k columns of A and a permutation of two columns. Thus, it is possible to obtain a representation of A-hat via the rules of the previous paragraph by suitably altering (X, j).

Three possibilities must be taken into account when modifying (X, j) to reflect the effect of a pivot on a_{\alpha\beta}. These are:

    i)   a pivot occurs on a_{\alpha\beta} such that j_\beta \le k ,
    ii)  a pivot occurs on a_{\alpha\beta} such that k < j_\beta \le 2k ,
    iii) a pivot occurs on a_{\alpha\beta} such that 2k < j_\beta, in which case j_\beta = 2k + \alpha .

In the first case we have a_{\alpha\beta} = x_{\alpha j_\beta}. Alter (X, j) to obtain (X-hat, j-hat) as follows:

    i)   Pivot on x_{\alpha j_\beta} to obtain X-hat using the pivoting rules given above with X replacing A.
    ii)  If j_{\beta'} = j_\beta + k set \hat{j}_{\beta'} = 2k + \alpha .
    iii) If j_{\beta'} = 2k + \alpha set \hat{j}_{\beta'} = j_\beta + k .

In the second case we have a_{\alpha\beta} = -x_{\alpha(j_\beta - k)}. Alter (X, j) as follows:

    i)   Pivot on x_{\alpha(j_\beta - k)} to obtain X-hat using the pivoting rules given above with X replacing A, then change the sign of each element in row \alpha of X-hat.
    ii)  Set \hat{j}_\beta = j_\beta - k. If j_{\beta'} = j_\beta - k set \hat{j}_{\beta'} = 2k + \alpha .
    iii) If j_{\beta'} = 2k + \alpha set \hat{j}_{\beta'} = j_\beta .

In the third case we have a_{\alpha\beta} = -1 with j_\beta = 2k + \alpha. Alter (X, j) as follows:

    i)   Do not pivot.
    ii)  Change the sign of each element in row \alpha of X.
    iii) Do not alter j.

Clearly the above alterations of (X, j) to obtain (X-hat, j-hat) may be performed in place and no additional storage is needed. One word of caution: modify the vector b, the vector c, and the scalar d of the tableau before altering (X, j).
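The compressed updates can be checked numerically against a full pivot on A. Below is a sketch for cases (i) and (iii) with arbitrary data; the helper names `pivot` and `expand` are ours, not the paper's, and indices here are 0-based:

```python
# Numerical check that the (X, j) update rules reproduce a full simplex pivot on A.

def pivot(M, a, b):
    """Pivot M on entry (a, b) using rules i)-iv) of Section 4."""
    p = M[a][b]
    out = [row[:] for row in M]
    for r in range(len(M)):
        for c in range(len(M[0])):
            if r == a and c == b:
                out[r][c] = 1.0 / p
            elif r == a:
                out[r][c] = M[a][c] / p
            elif c == b:
                out[r][c] = -M[r][b] / p
            else:
                out[r][c] = M[r][c] - M[r][b] * M[a][c] / p
    return out

def expand(X, j, n, k):
    """Rebuild the n x (2k + n) submatrix A from its (X, j) representation."""
    return [[X[r][jb - 1] if jb <= k
             else -X[r][jb - k - 1] if jb <= 2 * k
             else (-1.0 if jb == 2 * k + r + 1 else 0.0)
             for jb in j]
            for r in range(n)]

n, k = 3, 2
X = [[2.0, 1.0], [1.0, 3.0], [4.0, 1.0]]
j = list(range(1, 2 * k + n + 1))
A = expand(X, j, n, k)

# Case (i): the pivot column beta has j_beta <= k.
alpha, beta = 1, 0
full = pivot(A, alpha, beta)                   # pivot the whole n x (2k+n) submatrix
Xh = pivot(X, alpha, j[beta] - 1)              # pivot only the n x k matrix
jh = j[:]
jh[j.index(j[beta] + k)] = 2 * k + alpha + 1   # partner column becomes -e_alpha
jh[j.index(2 * k + alpha + 1)] = j[beta] + k   # -e_alpha becomes the new negated column
case1_ok = all(abs(full[r][c] - expand(Xh, jh, n, k)[r][c]) < 1e-9
               for r in range(n) for c in range(2 * k + n))

# Case (iii): pivot on the -1 entry of a negative elementary vector;
# only the sign of row alpha of X changes, and j is untouched.
alpha3 = 0
beta3 = j.index(2 * k + alpha3 + 1)
full3 = pivot(A, alpha3, beta3)
X3 = [row[:] for row in X]
X3[alpha3] = [-x for x in X3[alpha3]]
case3_ok = all(abs(full3[r][c] - expand(X3, j, n, k)[r][c]) < 1e-9
               for r in range(n) for c in range(2 * k + n))

print(case1_ok, case3_ok)  # True True
```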
5. STORAGE AND EFFICIENCY

Assuming double word storage for floating point variables and single word storage for integers, the (X, j) representation of A requires 2nk + 2k + n words rather than the 4nk + 2n^2 words required to store A itself. The elimination of the 2n^2 term extends the simplex method of solving the minimum absolute deviations estimation problem to a considerable proportion of the regression problems occurring in applications.

A discussion of the relative efficiency of a pivot via the (X, j) representation as opposed to operating directly on A is probably irrelevant: for small problems it doesn't matter, and for moderate sized problems storage of A is infeasible. Nevertheless, in cases (i) and (ii) a pivot on A using (X, j) requires nk arithmetic operations as opposed to n^2 + 2nk operating directly on A, and in case (iii) it requires k sign changes as opposed to n + 2k. The increased overhead required to index a_{\alpha\beta} indirectly in other portions of the program will be more than offset by these savings.

[Figure. CONFIGURATIONS OF A SUBMATRIX OF THE SIMPLEX TABLEAU: the initial configuration A = [X  -X  -I] and three possibilities after pivoting, each consisting of k transformed columns, their negatives, and n negative elementary vectors.]

REFERENCES

[1] Fisher, W. D. (1961), "A note on curve fitting with minimum deviations by linear programming," Journal of the American Statistical Association, 56, pp. 359-362.

[2] Owen, G. (1968), Game Theory. Philadelphia: W. B. Saunders Company.
[3] Schlossmacher, E. J. (1973), "An iterative technique for absolute deviations curve fitting," Journal of the American Statistical Association, 68, pp. 857-859.

[4] Wagner, H. M. (1959), "Linear programming techniques for regression analysis," Journal of the American Statistical Association, 54, pp. 206-212.