COMMENTS ON COMPUTING MINIMUM ABSOLUTE DEVIATIONS REGRESSIONS BY
ITERATIVE LEAST SQUARES REGRESSIONS AND BY LINEAR PROGRAMMING
by
A. RONALD GALLANT
and
THOMAS M. GERIG
Institute of Statistics
Mimeograph Series No. 911
Raleigh - February 1974
COMMENTS ON COMPUTING MINIMUM ABSOLUTE DEVIATIONS REGRESSIONS BY
ITERATIVE LEAST SQUARES REGRESSIONS AND BY LINEAR PROGRAMMING
by
A. Ronald Gallant
and
Thomas M. Gerig 1/

1/ Assistant Professor of Statistics and Economics and Assistant Professor of Statistics, respectively; Department of Statistics, North Carolina State University, Raleigh, North Carolina.
ABSTRACT
This note considers some aspects of the computational problem of fitting the regression model
$$ y_t = \sum_{i=1}^{k} x_{it}\beta_i + e_t \qquad (t = 1, 2, \ldots, n) $$
by minimizing the sum of absolute deviations
$$ \sum_{t=1}^{n} \Bigl| y_t - \sum_{i=1}^{k} x_{it}\beta_i \Bigr| . $$
The iterative method recently proposed by Schlossmacher (1973) is shown to have undesirable features under certain conditions. The linear programming approach using the simplex method as suggested by Fisher (1961) requires that the simplex tableau contain a submatrix of order $n$ by $2k + n$, which restricts the method to relatively small problems. We show that an $n$ by $k$ matrix and a $2k + n$ vector are sufficient to represent this submatrix. This representation improves the efficiency of the simplex method and allows its use in a large proportion of the problems which occur in applications.
1. INTRODUCTION
Consider the linear model
$$ y = X\beta + e $$
where $y$ is the $(n \times 1)$ vector of responses, $X$ the $(n \times k)$ matrix of inputs, $\beta$ the $(k \times 1)$ vector of unknown parameters, and $e$ the $(n \times 1)$ vector of unobservable errors. The parameter $\beta$ is to be estimated by minimizing
$$ \sum_{t=1}^{n} \Bigl| y_t - \sum_{i=1}^{k} x_{it}\beta_i \Bigr| . $$
An iterative technique was recently proposed by Schlossmacher (1973) for solving this problem. The method consists of doing weighted regressions using diagonal weights equal to the reciprocal of the absolute residual for each observation from the previous iterative step. When a residual becomes small relative to the others, the weight for the corresponding observation is set to zero. The procedure terminates when residuals from successive steps differ by only a small amount.
We have found that this method is unsatisfactory in some cases. An example is given where there exists a serious lack of stability at the answer, and another where the procedure converges to the wrong answer.
The objection to the use of linear programming and the simplex method of Dantzig as suggested by Fisher (1961) is the excessive size of the simplex tableau, which limits the method to small problems. We show that a submatrix of the simplex tableau may be represented by a considerably smaller matrix, which both reduces storage requirements and improves the efficiency of the simplex method.
We call attention to the paper by Wagner (1959), which gives an alternative method of achieving storage compression by recasting the primal linear program as a bounded-variables dual linear program. We think that the reader will find the simplex method with storage compression as suggested in this note to be the simpler approach.
2. COMMENTS ON ITERATIVELY REWEIGHTED LEAST SQUARES
The following example illustrates the lack of stability of Schlossmacher's (1973) iteratively reweighted regression procedure. Suppose we wish to estimate the location, $\mu$, of a population from the following data: $-1, -1, 0, 0, 2$. The starting estimate (the sample mean) is $\hat\mu^{(1)} = 0$, yielding associated residuals $-1, -1, 0, 0, 2$ and weights $1, 1, 0, 0, 1/2$. The second step yields an estimate
$$ \hat\mu^{(2)} = \tfrac{2}{5}\bigl[(-1)(1) + (-1)(1) + (0)(0) + (0)(0) + (2)(\tfrac{1}{2})\bigr] = -\tfrac{2}{5} . $$
Thus, the first step estimate is the desired answer (the median) while the second step is a significant step away from the answer.
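The arithmetic of this example can be reproduced with a short script (an illustrative sketch, not the authors' code; the location-only model and the tolerance `eps` are our simplifications):

```python
# One step of Schlossmacher-style reweighted least squares for a
# location model: weights are reciprocals of the previous absolute
# residuals, with near-zero residuals given weight zero.
def irls_step(y, mu, eps=1e-8):
    weights = [1.0 / abs(yi - mu) if abs(yi - mu) > eps else 0.0 for yi in y]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

y = [-1.0, -1.0, 0.0, 0.0, 2.0]
mu0 = sum(y) / len(y)      # sample mean, 0, which is also the median here
mu1 = irls_step(y, mu0)    # moves to -2/5, a step away from the LAD answer
```

Starting exactly at the minimum-absolute-deviations answer, one reweighting step moves the estimate away from it, which is the instability described above.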
To see why this happened, let $\hat\beta^{(j)}$ be the $(k \times 1)$ estimate from the $j$th step, $r^{(j)} = y - X\hat\beta^{(j)}$,
$$ w_i = |r_i^{(j)}|^{-1} \text{ if } |r_i^{(j)}| > \delta, \qquad w_i = 0 \text{ if } |r_i^{(j)}| \le \delta , $$
$W = \mathrm{diag}(w_1, w_2, \ldots, w_n)$, and $s = (s_1, s_2, \ldots, s_n)'$ with $s_i = \mathrm{sgn}(r_i^{(j)})$ if $|r_i^{(j)}| > \delta$ and $s_i = 0$ otherwise. Then the condition for stopping is
$$ y - X\hat\beta^{(j+1)} = y - X\hat\beta^{(j)} . $$
Since $\hat\beta^{(j+1)} = (X'WX)^{-1}X'Wy$ and $Wr^{(j)} = s$,
$$ y - X\hat\beta^{(j+1)} = (y - X\hat\beta^{(j)}) - X(X'WX)^{-1}X's . $$
Thus, at the solution, $X's = 0$ is a necessary condition for the residuals from the "next step" to be identical to those at the answer. No such $s$ vector exists for the example given above, and one can be shown not to exist in many situations. In view of this, it is not surprising that the method exhibits the observed instability.
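For the location example above, $X$ is a column of ones, so $X's$ reduces to $\sum_i s_i$; a two-line check (our illustration) shows that the sum is $-1$, not $0$:

```python
# Signs of the residuals at the LAD answer (the median, 0) for the
# data of the first example.  X is a column of ones, so X's = sum(s).
y = [-1.0, -1.0, 0.0, 0.0, 2.0]
mu = 0.0
s = [(r > 0) - (r < 0) for r in (yi - mu for yi in y)]  # sgn of each residual
total = sum(s)                                          # X's = -1, not 0
```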
The following example illustrates an instance when the process converges to the wrong answer. Suppose we wish to estimate the location, $\beta$, of a population from the following sample: $-1, 3t, 1$, where $0 < t < 1/3$ and $2t$ is less than the test value for setting weights to zero. Then the first step estimate (the sample mean) is
$$ \hat\beta^{(1)} = \tfrac{1}{3}[-1 + 3t + 1] = t $$
with residuals $(-1-t), 2t, (1-t)$ and associated weights $(1+t)^{-1}, 0, (1-t)^{-1}$. The second step estimate is
$$ \hat\beta^{(2)} = \bigl[(1+t)^{-1} + (1-t)^{-1}\bigr]^{-1}\bigl[-(1+t)^{-1} + 0 + (1-t)^{-1}\bigr] = t $$
with residuals $(-1-t), 2t, (1-t)$. That is, the residuals are unchanged; the procedure has converged, but to $t$, not the correct answer $3t$.
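The fixed point in this example can be verified numerically (our sketch; $t = 0.1$ and the threshold `eps = 0.25` are illustrative choices satisfying $2t <$ eps):

```python
# A fixed point of the reweighting iteration that is not the LAD
# solution.  With t = 0.1 the sample is (-1, 3t, 1) = (-1, 0.3, 1);
# the LAD answer is the median 3t = 0.3, but t is a fixed point.
def irls_step(y, mu, eps):
    w = [1.0 / abs(yi - mu) if abs(yi - mu) > eps else 0.0 for yi in y]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

t = 0.1
y = [-1.0, 3 * t, 1.0]
mu = irls_step(y, t, eps=0.25)  # residual 2t = 0.2 falls below eps
```

One step from $t$ returns $t$, so the procedure stops there even though the minimum-absolute-deviations estimate is the median $3t$.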
3. THE PROBLEM AS A LINEAR PROGRAM
The problem of fitting the regression model $y = X\beta + e$ by minimizing the sum of absolute deviations may be put in the form of a linear programming problem as follows. Decompose the vector $\beta$ into its positive and negative parts $\beta^+$ and $\beta^-$ so that $\beta = \beta^+ - \beta^-$, $\beta^+ \ge 0$, $\beta^- \ge 0$. Similarly decompose the residuals $e = y - X\beta$ into $e = e^+ - e^-$ where $e^+ \ge 0$ and $e^- \ge 0$. Then the problem is equivalent to the linear program
$$ \min \; 1'e^+ + 1'e^- \quad \text{s.t.} \quad X\beta^+ - X\beta^- + e^+ - e^- = y; \qquad \beta^+, \beta^-, e^+, e^- \ge 0 , $$
where $1'$ is a $(1 \times n)$ row vector whose elements are ones. The most convenient form of this linear program for our purposes is
$$ \max \; -\delta = 1'X\beta^+ - 1'X\beta^- - 2 \cdot 1'e^- - 1'y \quad \text{s.t.} \quad X\beta^+ - X\beta^- - e^- \le y; \qquad \beta^+, \beta^-, e^- \ge 0 . $$
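The decomposition above can be checked numerically; the following sketch (our illustration, with made-up data) verifies that $e = e^+ - e^-$ with both parts nonnegative and that the LP objective $1'e^+ + 1'e^-$ equals the sum of absolute deviations:

```python
import numpy as np

# Made-up data for illustration only.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
beta = np.array([0.3, -0.7])
y = np.array([1.0, -2.0, 0.25])

e = y - X @ beta                                    # residuals
e_pos, e_neg = np.maximum(e, 0), np.maximum(-e, 0)  # e+ and e-

obj = e_pos.sum() + e_neg.sum()  # LP objective 1'e+ + 1'e-
sad = np.abs(e).sum()            # sum of absolute deviations
```

At the LP optimum at most one of $e_t^+, e_t^-$ is nonzero for each $t$, which is exactly the split shown here.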
Set
$$ A = (X, \; -X, \; -I) \quad (n \times 2k + n), \qquad c' = (1'X, \; -1'X, \; -2 \cdot 1') \quad (1 \times 2k + n), $$
$$ b = y \quad (n \times 1), \qquad z = (\beta^{+\prime}, \beta^{-\prime}, e^{-\prime})' \quad (2k + n \times 1). $$
The matrix formulation of the linear program is
$$ \max \; c'z - 1'y \quad \text{s.t.} \quad Az \le b, \; z \ge 0 . $$
When the program has been solved to obtain $\hat z = (\hat\beta^{+\prime}, \hat\beta^{-\prime}, \hat e^{-\prime})'$ and the associated minimum $= -\max(-\delta)$, the estimate of the parameter $\beta$ is recovered from $\hat z$ by setting $\hat\beta = \hat\beta^+ - \hat\beta^-$.
4. THE SIMPLEX ALGORITHM AND THE SUBMATRIX A
Our discussion of the simplex algorithm is heavily dependent on the tabular representation of the problem and description of the method contained in Chapter III of Owen (1968). It is recommended that the reader have this reference in hand while reading this section.
The problem is represented initially by the tableau
$$ T = \begin{pmatrix} A & b \\ c' & d \end{pmatrix} $$
where the entry $d = -1'y$. The elementary step of the simplex algorithm is a pivot on some non-zero element $a_{\alpha\beta}$ of the submatrix $A$ to obtain the tableau
$$ \bar T = \begin{pmatrix} \bar A & \bar b \\ \bar c' & \bar d \end{pmatrix} . $$
The submatrix $\bar A$ is obtained from $A$ by pivoting on $a_{\alpha\beta}$ as follows:
i) $\bar a_{\alpha\beta} = a_{\alpha\beta}^{-1}$,
ii) $\bar a_{\alpha\beta'} = a_{\alpha\beta}^{-1} a_{\alpha\beta'}$, $\beta' \ne \beta$,
iii) $\bar a_{\alpha'\beta} = -a_{\alpha\beta}^{-1} a_{\alpha'\beta}$, $\alpha' \ne \alpha$,
iv) $\bar a_{\alpha'\beta'} = a_{\alpha'\beta'} - a_{\alpha\beta}^{-1} a_{\alpha'\beta} a_{\alpha\beta'}$, $\alpha' \ne \alpha$, $\beta' \ne \beta$.
The simplex algorithm is a set of rules for choosing these pivots $a_{\alpha\beta}$. These rules are explained in complete detail by Owen (1968). Our concern is with their effect on the submatrix $A$.
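The pivot rules i)-iv) transcribe directly into code; since pivoting twice on the same element restores the original matrix, that involution gives a quick consistency check (our sketch, with zero-based indices and illustrative data):

```python
import numpy as np

def pivot(A, a, b):
    """Apply rules i)-iv) for a pivot on the nonzero element A[a, b]."""
    p = A[a, b]
    B = A - np.outer(A[:, b], A[a, :]) / p  # rule iv), applied everywhere
    B[a, :] = A[a, :] / p                   # rule ii): the pivot row
    B[:, b] = -A[:, b] / p                  # rule iii): the pivot column
    B[a, b] = 1.0 / p                       # rule i): the pivot element
    return B

A = np.array([[2.0, 1.0, -1.0], [1.0, 3.0, 0.5]])
B = pivot(A, 0, 0)
same = np.allclose(pivot(B, 0, 0), A)  # pivoting twice is the identity
```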
Consider the Figure. Initially the submatrix $A$ can be represented by a copy of $X$, a permutation vector $j = (j_1, j_2, \ldots, j_{2k+n})$ initialized at $j_\beta = \beta$ $(\beta = 1, 2, \ldots, 2k + n)$, and these rules:
i) if $j_\beta \le k$ then $a_{\alpha\beta} = x_{\alpha j_\beta}$,
ii) if $k < j_\beta \le 2k$ then $a_{\alpha\beta} = -x_{\alpha(j_\beta - k)}$,
iii) if $2k < j_\beta$ then $a_{\alpha\beta} = -1$ if $j_\beta = 2k + \alpha$ and $a_{\alpha\beta} = 0$ otherwise.
We call this set of rules the $(X, j)$ representation of $A$.
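The representation can be decoded by a small function (our sketch, zero-based: `j[col] < k` selects a column of $X$, `k <= j[col] < 2k` its negative, and `j[col] >= 2k` a negative unit column); initially it reproduces $A = (X, -X, -I)$:

```python
import numpy as np

def expand(X, j):
    """Reconstruct the n x (2k+n) submatrix A from its (X, j) representation."""
    n, k = X.shape
    A = np.zeros((n, 2 * k + n))
    for col, m in enumerate(j):
        if m < k:
            A[:, col] = X[:, m]          # a column of X
        elif m < 2 * k:
            A[:, col] = -X[:, m - k]     # the negative of a column of X
        else:
            A[m - 2 * k, col] = -1.0     # a negative elementary vector
    return A

X = np.array([[2.0, 1.0], [1.0, 3.0], [0.5, 1.0]])  # illustrative data
j = list(range(2 * 2 + 3))                           # initial permutation vector
A0 = expand(X, j)                                    # equals (X, -X, -I)
```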
Again consider the Figure. It is quite clear from the three examples given that any number of pivots does not destroy the structure of $A$. There will always be $k$ columns with arbitrary entries, $k$ columns whose entries are their negatives, and $n$ negative elementary vectors. A pivot on $a_{\alpha\beta}$ amounts to no more than an alteration of the entries in $k$ columns of $A$ and permuting two columns. Thus, it is possible to obtain a representation of $\bar A$ via the rules of the previous paragraph by suitably altering $(X, j)$.
Three possibilities must be taken into account when modifying $(X, j)$ to reflect the effect of a pivot on $a_{\alpha\beta}$. These are:
i) a pivot occurs on $a_{\alpha\beta}$ such that $j_\beta \le k$,
ii) a pivot occurs on $a_{\alpha\beta}$ such that $k < j_\beta \le 2k$,
iii) a pivot occurs on $a_{\alpha\beta}$ such that $2k < j_\beta$, in which case $j_\beta = 2k + \alpha$.
In the first case we have $a_{\alpha\beta} = x_{\alpha j_\beta}$. Alter $(X, j)$ as follows:
i) Pivot on $x_{\alpha j_\beta}$ to obtain $\bar X$ using the pivoting rules given above with $X$ replacing $A$.
ii) If $j_{\beta'} = j_\beta + k$, set $j_{\beta'} = 2k + \alpha$.
iii) If $j_{\beta'} = 2k + \alpha$, set $j_{\beta'} = j_\beta + k$.
In the second case we have $a_{\alpha\beta} = -x_{\alpha(j_\beta - k)}$. Alter $(X, j)$ as follows:
i) Pivot on $x_{\alpha(j_\beta - k)}$ to obtain $\bar X$ using the pivoting rules given above with $X$ replacing $A$.
ii) Change the sign on each element in row $\alpha$ of $\bar X$ except $\bar x_{\alpha(j_\beta - k)}$.
iii) Change the sign on each element in column $j_\beta - k$ of $\bar X$ except $\bar x_{\alpha(j_\beta - k)}$.
iv) If $j_{\beta'} = j_\beta - k$, set $j_{\beta'} = 2k + \alpha$; if $j_{\beta'} = 2k + \alpha$, set $j_{\beta'} = j_\beta - k$.
In the third case we have $a_{\alpha\beta} = -1$. Alter $(X, j)$ as follows:
i) Do not pivot.
ii) Change the sign on each element in row $\alpha$ of $X$.
iii) Do not alter $j$.
Clearly the above alterations of $(X, j)$ to obtain $(\bar X, \bar j)$ may be performed in place, and no additional storage is needed. One word of caution: modify the vectors $b$ and $c$ and the scalar $d$ of the tableau before altering $(X, j)$.
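As a numerical check on the first case, updating $(X, j)$ as described and then expanding should agree with pivoting $A$ directly (our sketch, zero-based, with illustrative data; `pivot` and `expand` implement the rules of this and the preceding sections):

```python
import numpy as np

def pivot(A, a, b):
    """Rules i)-iv) for a pivot on the nonzero element A[a, b]."""
    p = A[a, b]
    B = A - np.outer(A[:, b], A[a, :]) / p
    B[a, :] = A[a, :] / p
    B[:, b] = -A[:, b] / p
    B[a, b] = 1.0 / p
    return B

def expand(X, j):
    """Reconstruct the submatrix A from its (X, j) representation."""
    n, k = X.shape
    A = np.zeros((n, 2 * k + n))
    for col, m in enumerate(j):
        if m < k:
            A[:, col] = X[:, m]
        elif m < 2 * k:
            A[:, col] = -X[:, m - k]
        else:
            A[m - 2 * k, col] = -1.0
    return A

def case1_update(X, j, a, b):
    """First case (j[b] < k): pivot X and swap two column labels."""
    n, k = X.shape
    m = j[b]
    Xn = pivot(X, a, m)
    jn = list(j)
    jn[j.index(m + k)] = 2 * k + a  # the -X column becomes -e_a
    jn[j.index(2 * k + a)] = m + k  # the -e_a column becomes a -Xbar column
    return Xn, jn

X = np.array([[2.0, 1.0], [1.0, 3.0], [0.5, 1.0]])
j = list(range(2 * 2 + 3))
a, b = 0, 0                      # pivot element is x_00; j[0] = 0 < k
Xn, jn = case1_update(X, j, a, b)
same = np.allclose(expand(Xn, jn), pivot(expand(X, j), a, b))
```

The comparison `same` confirms that, for this pivot, altering the compact representation reproduces the directly pivoted submatrix.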
5. STORAGE AND EFFICIENCY
Assuming double word storage for floating point variables and single word storage for integers, the $(X, j)$ representation of $A$ requires $2nk + 2k + n$ words rather than the $4nk + 2n^2$ words required to store $A$ itself. The elimination of the $2n^2$ term extends the simplex method of solving the minimum absolute deviations estimation problem to a considerable proportion of the regression problems occurring in applications.
A discussion of the relative efficiency of a pivot via the $(X, j)$ representation as opposed to operating directly on $A$ is probably irrelevant--for small problems it doesn't matter, and for moderate sized problems storage of $A$ is infeasible. Nevertheless, in cases (i) and (ii) a pivot on $A$ via $(X, j)$ requires $nk$ arithmetic operations as opposed to $2nk + n^2$ using $A$. For case (iii), $(X, j)$ requires $k$ arithmetic operations as opposed to $2nk + n^2$ using $A$. The increased overhead required to index $a_{\alpha\beta}$ indirectly in other portions of the program will be more than offset by these savings.
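For a representative problem size the word counts above are easy to tabulate (our arithmetic; $n = 200$, $k = 5$ is an illustrative choice, not a figure from the text):

```python
# Words of storage for n observations and k parameters, assuming
# two words per floating point number and one word per integer.
def words_xj(n, k):
    return 2 * n * k + 2 * k + n   # X as doubles plus the integer vector j

def words_A(n, k):
    return 2 * n * (2 * k + n)     # the full n x (2k + n) submatrix A

n, k = 200, 5
ratio = words_A(n, k) / words_xj(n, k)  # the 2*n**2 term dominates
```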
Figure. CONFIGURATIONS OF A SUBMATRIX OF THE SIMPLEX TABLEAU
Initial: $A = (X, \; -X, \; -I)$.
[The figure shows three examples of the submatrix after pivoting: in each, the submatrix again consists of $k$ columns with arbitrary entries, $k$ columns whose entries are their negatives, and $n$ negative elementary vectors, with two columns permuted.]
REFERENCES
[1] Fisher, W. D. (1961), "A note on curve fitting with minimum deviations by linear programming," Journal of the American Statistical Association, 56, pp. 359-362.
[2] Owen, G. (1968), Game Theory. Philadelphia: W. B. Saunders Company.
[3] Schlossmacher, E. J. (1973), "An iterative technique for absolute deviation curve fitting," Journal of the American Statistical Association, 68, pp. 857-859.
[4] Wagner, H. M. (1959), "Linear programming techniques for regression analysis," Journal of the American Statistical Association, pp. 206-212.