L. Vandenberghe
EE133A (Spring 2016)
10. Constrained least squares
• least-norm problem
• least squares problem with equality constraints
10-1
Least-norm problem

    minimize    ‖x‖²
    subject to  Cx = d

• C is a p × n matrix; d is a p-vector
• in most applications p < n and the equation Cx = d is underdetermined
• the goal is to find the solution of Cx = d with smallest norm

we will assume that C is right invertible (has linearly independent rows)

• the equation Cx = d has at least one solution for every d
• C is wide or square (p ≤ n)
• if p < n there are infinitely many solutions

Constrained least squares    10-2
Example

example of page 1-27

[Figure: piecewise-constant force F(t) applied to a unit mass, taking the value x_j on the interval [j − 1, j) for j = 1, …, 10; t runs from 0 to 10]

• unit mass, with zero initial position and velocity
• piecewise-constant force F(t) = x_j for t ∈ [j − 1, j), j = 1, …, 10
• position and velocity at t = 10 are given by y = Cx where

    C = [ 19/2  17/2  15/2  ···  1/2 ]
        [   1     1     1   ···   1  ]

Constrained least squares    10-3
Example

forces that move the mass over a unit distance with zero final velocity satisfy

    [ 19/2  17/2  15/2  ···  1/2 ]       [ 1 ]
    [   1     1     1   ···   1  ] x  =  [ 0 ]

some interesting solutions:

• solutions with only two nonzero elements:

    x = (1, −1, 0, …, 0),    x = (0, 1, −1, …, 0),    …

• least-norm solution: minimizes

    ∫₀¹⁰ F(t)² dt = x_1² + x_2² + ··· + x_10²

Constrained least squares    10-4
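As an illustrative numerical sketch (not part of the original slides), the least-norm force program can be computed with NumPy from the matrix C of page 10-3, whose first row holds the position contributions 10 − j + 1/2 and whose second row holds the velocity contributions:

```python
import numpy as np

# Build the 2x10 matrix C from page 10-3: row 1 is the contribution of
# force x_j to the final position (10 - j + 1/2); row 2 is its
# contribution to the final velocity (always 1).
j = np.arange(1, 11)
C = np.vstack([10 - j + 0.5, np.ones(10)])
d = np.array([1.0, 0.0])      # unit final position, zero final velocity

# least-norm force program: x̂ = C^T (C C^T)^{-1} d
xhat = C.T @ np.linalg.solve(C @ C.T, d)

# the two-impulse solution x = (1, -1, 0, ..., 0) is also feasible
x2 = np.zeros(10)
x2[0], x2[1] = 1.0, -1.0
```

Both x̂ and x2 satisfy Cx = d, but x̂ has a smaller value of ∫₀¹⁰ F(t)² dt = ‖x‖², as the next slide's plots show.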
Example

[Figure: force and position as functions of time (t from 0 to 10), for the two-impulse solution x = (1, −1, 0, …, 0) and for the least-norm solution; the least-norm forces are much smaller in magnitude]

Constrained least squares    10-5
Least-distance solution

as a variation, we can minimize the distance to a given point a ≠ 0:

    minimize    ‖x − a‖²
    subject to  Cx = d

• reduces to a least-norm problem by the change of variables y = x − a:

    minimize    ‖y‖²
    subject to  Cy = d − Ca

• from the least-norm solution y, we obtain the solution x = y + a of the first problem

Constrained least squares    10-6
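A minimal NumPy sketch of this change of variables, on randomly generated data (the particular C, d, and a are assumptions for illustration, not from the slides):

```python
import numpy as np

# Illustrative assumed data: a wide C with (generically) independent rows,
# a right-hand side d, and a prior point a.
rng = np.random.default_rng(0)
C = rng.standard_normal((3, 6))
d = rng.standard_normal(3)
a = rng.standard_normal(6)

# least-norm solution of C y = d - C a, then shift back: x = y + a
y = C.T @ np.linalg.solve(C @ C.T, d - C @ a)
x = y + a
```

The resulting x satisfies Cx = d and is at least as close to a as any other feasible point.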
Tomography

reconstruction of an unknown image from line integrals (see page 2-39)

[Figure: ray i passing through pixel j of the image, giving matrix entry C_ij]

• x is the unknown image with n pixels
• one linear equation per line
• in practice, very underdetermined

to find the estimate x̂, solve

    minimize    ‖x − a‖²
    subject to  Cx = d

• a is a prior guess (for example, uniform intensity)
• goal is to find x̂ consistent with the measurements, close to the prior guess

Constrained least squares    10-7
Solution of least-norm problem

if C has linearly independent rows (is right invertible), then

    x̂ = C^T (C C^T)^{−1} d

is the unique solution of the least-norm problem

    minimize    ‖x‖²
    subject to  Cx = d

• in other words, if Cx = d and x ≠ x̂, then ‖x‖ > ‖x̂‖
• recall from page 4-26 that

    C^T (C C^T)^{−1} = C†

is the pseudoinverse of a right invertible matrix C

Constrained least squares    10-8
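A small NumPy check of this formula on arbitrary assumed data: x̂ agrees with the pseudoinverse solution, and perturbing x̂ along the nullspace of C keeps it feasible but increases the norm:

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((2, 5))   # right invertible for generic random data
d = rng.standard_normal(2)

xhat = C.T @ np.linalg.solve(C @ C.T, d)     # x̂ = C^T (C C^T)^{-1} d

# another solution of Cx = d: add a nullspace direction of C
v = (np.eye(5) - np.linalg.pinv(C) @ C) @ rng.standard_normal(5)
x_other = xhat + v
```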
Proof

1. we first verify that x̂ satisfies the equation:

    C x̂ = C C^T (C C^T)^{−1} d = d

2. next we show that ‖x‖ > ‖x̂‖ if Cx = d and x ≠ x̂:

    ‖x‖² = ‖x̂ + x − x̂‖²
         = ‖x̂‖² + 2 x̂^T (x − x̂) + ‖x − x̂‖²
         = ‖x̂‖² + ‖x − x̂‖²
         ≥ ‖x̂‖²

with equality only if x = x̂

on line 3 we use Cx = C x̂ = d in

    x̂^T (x − x̂) = d^T (C C^T)^{−1} C (x − x̂) = 0

Constrained least squares    10-9
QR factorization method

use the QR factorization C^T = QR of the left invertible matrix C^T:

    x̂ = C^T (C C^T)^{−1} d
       = QR (R^T Q^T Q R)^{−1} d
       = QR (R^T R)^{−1} d
       = Q R^{−T} d

Algorithm
1. compute the QR factorization C^T = QR (2p²n flops)
2. solve R^T z = d by forward substitution (p² flops)
3. matrix-vector product x̂ = Qz (2pn flops)

complexity: 2p²n flops

Constrained least squares    10-10
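The three steps can be sketched with NumPy's thin QR (random data assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.standard_normal((3, 7))   # right invertible for generic random data
d = rng.standard_normal(3)

Q, R = np.linalg.qr(C.T)          # step 1: C^T = QR (Q is 7x3, R is 3x3)
z = np.linalg.solve(R.T, d)       # step 2: solve R^T z = d
xhat = Q @ z                      # step 3: x̂ = Qz
```

Note that np.linalg.solve uses a general solver; a triangular solver (e.g. scipy.linalg.solve_triangular) would realize the p²-flop forward substitution of step 2.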
Example

    C = [ 1  −1   1    1  ]        d = [ 0 ]
        [ 1   0  1/2  1/2 ],           [ 1 ]

• QR factorization C^T = QR:

    [  1    1  ]   [  1/2  1/√2 ]
    [ −1    0  ] = [ −1/2  1/√2 ] [ 2    1   ]
    [  1   1/2 ]   [  1/2   0   ] [ 0  1/√2  ]
    [  1   1/2 ]   [  1/2   0   ]

• solve R^T z = d:

    [ 2    0   ] [ z1 ]   [ 0 ]
    [ 1  1/√2  ] [ z2 ] = [ 1 ]        ⟹    z1 = 0,   z2 = √2

• evaluate x̂ = Qz = (1, 1, 0, 0)

Constrained least squares    10-11
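The worked example can be checked numerically; NumPy's qr may flip signs in Q and R relative to the slide, but the product x̂ = Qz is unaffected:

```python
import numpy as np

# the C and d of the example above
C = np.array([[1.0, -1.0, 1.0, 1.0],
              [1.0,  0.0, 0.5, 0.5]])
d = np.array([0.0, 1.0])

Q, R = np.linalg.qr(C.T)          # step 1
z = np.linalg.solve(R.T, d)       # step 2
xhat = Q @ z                      # step 3: should give (1, 1, 0, 0)
```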
Outline
• least-norm problem
• least squares problem with equality constraints
Least squares with equality constraints

    minimize    ‖Ax − b‖²
    subject to  Cx = d

• A is an m × n matrix, C is p × n, b is an m-vector, d is a p-vector
• in most applications p < n, so the equality constraints are underdetermined
• the goal is to find the solution with the smallest value of ‖Ax − b‖
• we do not assume that A is tall (let alone left invertible)

Special cases: this extends the least squares and least-norm problems
• the least squares problem is a special case with p = 0
• the least-norm problem is a special case with A = I, b = 0

Constrained least squares    10-12
Example

• fit two polynomials f(x), g(x) to points (x_i, y_i):

    f(x_i) ≈ y_i  for points x_i ≤ a,        g(x_i) ≈ y_i  for points x_i > a

• make the values and derivatives continuous at the boundary point a:

    f(a) = g(a),        f′(a) = g′(a)

[Figure: two polynomials of degree 4, f(x) fitted to the data left of a and g(x) to the data right of a, joined smoothly at a]

Constrained least squares    10-13
Assumptions

    minimize    ‖Ax − b‖²
    subject to  Cx = d

we will make the following two assumptions:

1. the stacked (m + p) × n matrix

    [ A ]
    [ C ]

has linearly independent columns (is left invertible)

2. C has linearly independent rows (is right invertible)

• note that assumption 1 is weaker than left invertibility of A
• the assumptions imply that p ≤ n ≤ m + p

Constrained least squares    10-14
Optimality conditions

x̂ solves the constrained LS problem if and only if there exists a z such that

    [ A^T A   C^T ] [ x̂ ]   [ A^T b ]
    [   C      0  ] [ z  ] = [   d   ]

(proof on next page)

• this is a set of n + p linear equations in n + p variables
• we'll see that the matrix on the left-hand side is nonsingular

Special cases
• least squares: when p = 0, reduces to the normal equations A^T A x̂ = A^T b
• least-norm: when A = I, b = 0, reduces to C x̂ = d and x̂ + C^T z = 0

Constrained least squares    10-15
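On assumed random data, the optimality conditions can be assembled and solved directly as one (n + p) × (n + p) linear system:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 8, 5, 2
A = rng.standard_normal((m, n)); b = rng.standard_normal(m)
C = rng.standard_normal((p, n)); d = rng.standard_normal(p)

# assemble the block coefficient matrix and right-hand side
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
sol = np.linalg.solve(K, np.concatenate([A.T @ b, d]))
xhat, z = sol[:n], sol[n:]
```

The solution x̂ is feasible and achieves a residual no larger than any other feasible point's.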
Proof

suppose x satisfies Cx = d, and (x̂, z) satisfies the equation on page 10-15:

    ‖Ax − b‖² = ‖A(x − x̂) + A x̂ − b‖²
              = ‖A(x − x̂)‖² + ‖A x̂ − b‖² + 2 (x − x̂)^T A^T (A x̂ − b)
              = ‖A(x − x̂)‖² + ‖A x̂ − b‖² − 2 (x − x̂)^T C^T z
              = ‖A(x − x̂)‖² + ‖A x̂ − b‖²
              ≥ ‖A x̂ − b‖²

• on line 3 we use A^T A x̂ + C^T z = A^T b; on line 4, Cx = C x̂ = d
• the inequality shows that x̂ is optimal
• x̂ is the unique optimum because equality holds only if

    A(x − x̂) = 0,   C(x − x̂) = 0     ⟹    x = x̂

by the first assumption on page 10-14

Constrained least squares    10-16
Nonsingularity

if the two assumptions hold, then the matrix

    [ A^T A   C^T ]
    [   C      0  ]

is nonsingular

Proof.

    [ A^T A   C^T ] [ x ]
    [   C      0  ] [ z ] = 0   ⟹   x^T (A^T A x + C^T z) = 0,   Cx = 0
                             ⟹   ‖Ax‖² = 0,   Cx = 0
                             ⟹   Ax = 0,   Cx = 0
                             ⟹   x = 0   by assumption 1

if x = 0, we have C^T z = −A^T A x = 0; hence also z = 0 by assumption 2

Constrained least squares    10-17
Nonsingularity

if the assumptions do not hold, then the matrix

    [ A^T A   C^T ]
    [   C      0  ]

is singular

• if assumption 1 does not hold, there exists an x ≠ 0 with Ax = 0 and Cx = 0; then

    [ A^T A   C^T ] [ x ]
    [   C      0  ] [ 0 ] = 0

• if assumption 2 does not hold, there exists a z ≠ 0 with C^T z = 0; then

    [ A^T A   C^T ] [ 0 ]
    [   C      0  ] [ z ] = 0

in both cases, this shows that the matrix is singular

Constrained least squares    10-18
Solution by LU factorization

    [ A^T A   C^T ] [ x̂ ]   [ A^T b ]
    [   C      0  ] [ z  ] = [   d   ]

Algorithm
1. compute H = A^T A (mn² flops)
2. compute c = A^T b (2mn flops)
3. solve the linear equation

    [ H   C^T ] [ x̂ ]   [ c ]
    [ C    0  ] [ z  ] = [ d ]

by the LU factorization ((2/3)(p + n)³ flops)

complexity: mn² + (2/3)(p + n)³ flops

Constrained least squares    10-19
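A sketch of the three steps using SciPy's LU routines on assumed random data (lu_factor/lu_solve carry out step 3):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(4)
m, n, p = 10, 6, 2
A = rng.standard_normal((m, n)); b = rng.standard_normal(m)
C = rng.standard_normal((p, n)); d = rng.standard_normal(p)

H = A.T @ A                                    # step 1: Gram matrix
c = A.T @ b                                    # step 2
K = np.block([[H, C.T], [C, np.zeros((p, p))]])
sol = lu_solve(lu_factor(K), np.concatenate([c, d]))   # step 3: LU solve
xhat, z = sol[:n], sol[n:]
```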
Solution by QR factorization

we derive one of several possible methods based on the QR factorization

    [ A^T A   C^T ] [ x̂ ]   [ A^T b ]
    [   C      0  ] [ z  ] = [   d   ]

• if we define w = z − d, the equation can be written equivalently as

    [ A^T A + C^T C   C^T ] [ x̂ ]   [ A^T b ]
    [       C          0  ] [ w  ] = [   d   ]

• assumption 1 guarantees that the QR factorization of the stacked matrix exists:

    [ A ]          [ Q1 ]
    [ C ]  = QR =  [ Q2 ] R

Constrained least squares    10-20
Solution by QR factorization

substituting the QR factorization gives the equation

    [ R^T R    R^T Q2^T ] [ x̂ ]   [ R^T Q1^T b ]
    [ Q2 R         0    ] [ w  ] = [     d      ]

• multiply the first equation with R^{−T} and make the change of variables y = R x̂:

    [ I    Q2^T ] [ y ]   [ Q1^T b ]
    [ Q2     0  ] [ w ] = [    d   ]

• next we note that the matrix Q2 = C R^{−1} has linearly independent rows:

    Q2^T u = R^{−T} C^T u = 0    ⟹    C^T u = 0    ⟹    u = 0

because C has linearly independent rows (assumption 2)

Constrained least squares    10-21
Solution by QR factorization

we use the QR factorization of Q2^T to solve

    [ I    Q2^T ] [ y ]   [ Q1^T b ]
    [ Q2     0  ] [ w ] = [    d   ]

• from the 1st block row, y = Q1^T b − Q2^T w; substitute this in the 2nd row:

    Q2 Q2^T w = Q2 Q1^T b − d

• we solve this equation for w using the QR factorization Q2^T = Q̃ R̃:

    R̃^T R̃ w = R̃^T Q̃^T Q1^T b − d

which can be simplified to

    R̃ w = Q̃^T Q1^T b − R̃^{−T} d

Constrained least squares    10-22
Summary of QR factorization method

    [ A^T A + C^T C   C^T ] [ x̂ ]   [ A^T b ]
    [       C          0  ] [ w  ] = [   d   ]

Algorithm
1. compute the two QR factorizations

    [ A ]   [ Q1 ]
    [ C ] = [ Q2 ] R,        Q2^T = Q̃ R̃

2. solve R̃^T u = d by forward substitution and compute c = Q̃^T Q1^T b − u
3. solve R̃ w = c by back substitution and compute y = Q1^T b − Q2^T w
4. solve R x̂ = y by back substitution

complexity: 2(p + m)n² + 2np² flops for the QR factorizations

Constrained least squares    10-23
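The four steps, sketched with NumPy's thin QR on assumed random data; the result agrees with solving the optimality conditions directly:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, p = 10, 6, 2
A = rng.standard_normal((m, n)); b = rng.standard_normal(m)
C = rng.standard_normal((p, n)); d = rng.standard_normal(p)

# step 1: two QR factorizations
Q, R = np.linalg.qr(np.vstack([A, C]))       # stacked matrix = [Q1; Q2] R
Q1, Q2 = Q[:m, :], Q[m:, :]
Qt, Rt = np.linalg.qr(Q2.T)                  # Q2^T = Q̃ R̃

u = np.linalg.solve(Rt.T, d)                 # step 2: forward substitution
c = Qt.T @ (Q1.T @ b) - u
w = np.linalg.solve(Rt, c)                   # step 3: back substitution
y = Q1.T @ b - Q2.T @ w
xhat = np.linalg.solve(R, y)                 # step 4: back substitution
```

(np.linalg.solve uses a general solver; a triangular solver would realize the stated flop counts.)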
Comparison of the two methods

Complexity: roughly the same

• LU factorization:

    mn² + (2/3)(p + n)³  ≤  mn² + (16/3)n³  flops

• QR factorization:

    2(p + m)n² + 2np²  ≤  2mn² + 4n³  flops

the upper bounds follow from p ≤ n (assumption 2)

Stability: the 2nd method avoids calculation of the Gram matrix A^T A

Constrained least squares    10-24
Exercise

A has linearly independent rows; f(x̂) denotes the solution of

    minimize    ‖x − x̂‖²
    subject to  Ax = b

1. show that f(x̂) = A† b + (I − A† A) x̂

2. consider one step of the Kaczmarz algorithm (with ‖a_i‖ = 1):

    x^(j+1) = x^(j) + (b_i − a_i^T x^(j)) a_i

   show that f(x^(j+1)) = f(x^(j))

3. let y be any point satisfying Ay = b; show that

    ‖x^(j+1) − y‖² = ‖x^(j) − y‖² − (b_i − a_i^T x^(j))²

Constrained least squares    10-25
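Part 1 can at least be checked numerically on assumed random data (this sketch does not replace the proof the exercise asks for):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 6))   # linearly independent rows (generic)
b = rng.standard_normal(3)
x0 = rng.standard_normal(6)       # the point x̂ being projected

Ap = np.linalg.pinv(A)            # A† = A^T (A A^T)^{-1} for right invertible A
f = Ap @ b + (np.eye(6) - Ap @ A) @ x0

# same answer via the change of variables y = x - x̂ (a least-norm problem)
y = Ap @ (b - A @ x0)
```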