L. Vandenberghe EE133A (Spring 2016)

10. Constrained least squares

• least-norm problem
• least squares problem with equality constraints


Least-norm problem

    minimize    ‖x‖^2
    subject to  Cx = d

• C is a p × n matrix; d is a p-vector
• in most applications p < n and the equation Cx = d is underdetermined
• the goal is to find the solution of Cx = d with smallest norm

we will assume that C is right invertible (has linearly independent rows)

• the equation Cx = d has at least one solution for every d
• C is wide or square (p ≤ n)
• if p < n there are infinitely many solutions


Example

(the example of page 1-27)

[figure: unit mass with piecewise-constant force F(t) = x_j applied on each interval [j − 1, j), for t from 0 to 10]

• unit mass, with zero initial position and velocity
• piecewise-constant force F(t) = x_j for t ∈ [j − 1, j), j = 1, ..., 10
• position and velocity at t = 10 are given by y = Cx where

    C = [ 19/2  17/2  15/2  ···  1/2 ]
        [   1     1     1   ···   1  ]


Example

forces that move the mass over a unit distance with zero final velocity satisfy

    [ 19/2  17/2  15/2  ···  1/2 ]       [ 1 ]
    [   1     1     1   ···   1  ]  x  = [ 0 ]

some interesting solutions:

• solutions with only two nonzero elements:

    x = (1, −1, 0, ..., 0),   x = (0, 1, −1, 0, ..., 0),   ...

• least-norm solution: minimizes

    ∫₀¹⁰ F(t)^2 dt = x_1^2 + x_2^2 + ··· + x_{10}^2

[figures: force F(t) and position versus time, for the two-impulse solution x = (1, −1, 0, ..., 0) and for the least-norm solution]


Least-distance solution

as a variation, we can minimize the distance to a given point a ≠ 0:

    minimize    ‖x − a‖^2
    subject to  Cx = d

• reduces to a least-norm problem by the change of variables y = x − a:

    minimize    ‖y‖^2
    subject to  Cy = d − Ca

• from the least-norm solution y, we obtain the solution x = y + a of the first problem


Tomography

reconstruction of an unknown image from line integrals (see page 2-39)

[figure: ray i crossing pixel j, with coefficient C_ij]

• x is the unknown image with n pixels
• one linear equation per line
• in practice, very underdetermined

to find the estimate x̂, solve

    minimize    ‖x − a‖^2
    subject to  Cx = d

• a is a prior guess (for example, uniform intensity)
• the goal is to find x̂ consistent with the measurements, close to the prior guess


Solution of least-norm problem

if C has linearly independent rows (is right invertible), then

    x̂ = C^T (CC^T)^{−1} d

is the unique solution of the least-norm problem

    minimize    ‖x‖^2
    subject to  Cx = d

• in other words, if Cx = d and x ≠ x̂, then ‖x‖ > ‖x̂‖
• recall from page 4-26 that C^T (CC^T)^{−1} = C^† is the pseudoinverse of a right invertible matrix C


Proof

1. we first verify that x̂ satisfies the equation:

    C x̂ = CC^T (CC^T)^{−1} d = d

2. next we show that ‖x‖ > ‖x̂‖ if Cx = d and x ≠ x̂:

    ‖x‖^2 = ‖x̂ + (x − x̂)‖^2
          = ‖x̂‖^2 + 2 x̂^T (x − x̂) + ‖x − x̂‖^2
          = ‖x̂‖^2 + ‖x − x̂‖^2
          ≥ ‖x̂‖^2

with equality only if x = x̂; on line 3 we use Cx = C x̂ = d in

    x̂^T (x − x̂) = d^T (CC^T)^{−1} C (x − x̂) = 0


QR factorization method

use the QR factorization C^T = QR of the left invertible matrix C^T:

    x̂ = C^T (CC^T)^{−1} d
       = QR (R^T Q^T QR)^{−1} d
       = QR (R^T R)^{−1} d
       = Q R^{−T} d

Algorithm

1. compute the QR factorization C^T = QR (2p^2 n flops)
2. solve R^T z = d by forward substitution (p^2 flops)
3.
matrix-vector product x̂ = Qz (2pn flops)

complexity: 2p^2 n flops


Example

    C = [ 1  −1   1    1  ]        d = [ 0 ]
        [ 1   0  1/2  1/2 ],           [ 1 ]

• QR factorization C^T = QR:

    [  1    1  ]   [  1/2  1/√2 ]
    [ −1    0  ]   [ −1/2  1/√2 ]  [ 2    1   ]
    [  1   1/2 ] = [  1/2   0   ]  [ 0  1/√2  ]
    [  1   1/2 ]   [  1/2   0   ]

• solve R^T z = d:

    [ 2    0   ] [ z_1 ]   [ 0 ]
    [ 1  1/√2  ] [ z_2 ] = [ 1 ]

gives z_1 = 0, z_2 = √2

• evaluate x̂ = Qz = (1, 1, 0, 0)


Outline

• least-norm problem
• least squares problem with equality constraints


Least squares with equality constraints

    minimize    ‖Ax − b‖^2
    subject to  Cx = d

• A is an m × n matrix, C is p × n, b is an m-vector, d is a p-vector
• in most applications p < n, so the equality constraints are underdetermined
• the goal is to find the solution with smallest value of ‖Ax − b‖
• we do not assume that A is tall (let alone left invertible)

Special cases: this extends the least squares and least-norm problems

• the least squares problem is a special case with p = 0
• the least-norm problem is a special case with A = I, b = 0


Example

• fit two polynomials f(x), g(x) to points (x_i, y_i):

    f(x_i) ≈ y_i for points x_i ≤ a,    g(x_i) ≈ y_i for points x_i > a

• make the values and derivatives continuous at the boundary point a:

    f(a) = g(a),    f′(a) = g′(a)

[figure: two polynomials of degree 4, f(x) to the left of the boundary point a and g(x) to the right]


Assumptions

    minimize    ‖Ax − b‖^2
    subject to  Cx = d

we will make the following two assumptions:

1. the stacked (m + p) × n matrix

    [ A ]
    [ C ]

has linearly independent columns (is left invertible)

2.
C has linearly independent rows (is right invertible)

• note that assumption 1 is weaker than left invertibility of A
• the assumptions imply that p ≤ n ≤ m + p


Optimality conditions

x̂ solves the constrained least squares problem if and only if there exists a z such that

    [ A^T A  C^T ] [ x̂ ]   [ A^T b ]
    [  C      0  ] [ z  ] = [   d   ]

(proof on the next page)

• this is a set of n + p linear equations in n + p variables
• we'll see that the matrix on the left-hand side is nonsingular

Special cases

• least squares: when p = 0, reduces to the normal equations A^T A x̂ = A^T b
• least-norm: when A = I, b = 0, reduces to C x̂ = d and x̂ + C^T z = 0


Proof

suppose x satisfies Cx = d, and (x̂, z) satisfies the optimality conditions above:

    ‖Ax − b‖^2 = ‖A(x − x̂) + A x̂ − b‖^2
               = ‖A(x − x̂)‖^2 + ‖A x̂ − b‖^2 + 2 (x − x̂)^T A^T (A x̂ − b)
               = ‖A(x − x̂)‖^2 + ‖A x̂ − b‖^2 − 2 (x − x̂)^T C^T z
               = ‖A(x − x̂)‖^2 + ‖A x̂ − b‖^2
               ≥ ‖A x̂ − b‖^2

• on line 3 we use A^T A x̂ + C^T z = A^T b; on line 4, Cx = C x̂ = d
• the inequality shows that x̂ is optimal
• x̂ is the unique optimum because equality holds only if

    A(x − x̂) = 0,  C(x − x̂) = 0   =⇒   x = x̂

by assumption 1


Nonsingularity

if the two assumptions hold, then the matrix

    [ A^T A  C^T ]
    [  C      0  ]

is nonsingular

Proof.

    [ A^T A  C^T ] [ x ]
    [  C      0  ] [ z ] = 0   =⇒   x^T (A^T A x + C^T z) = 0,  Cx = 0
                               =⇒   ‖Ax‖^2 = 0,  Cx = 0
                               =⇒   Ax = 0,  Cx = 0
                               =⇒   x = 0 by assumption 1

if x = 0, we have C^T z = −A^T A x = 0; hence also z = 0 by assumption 2


Nonsingularity

if the assumptions do not hold, then the matrix

    [ A^T A  C^T ]
    [  C      0  ]

is singular

• if assumption 1 does not hold, there exists an x ≠ 0 with Ax = 0 and Cx = 0; then

    [ A^T A  C^T ] [ x ]
    [  C      0  ] [ 0 ] = 0

• if assumption 2 does not hold, there exists a z ≠ 0 with C^T z = 0; then

    [ A^T A  C^T ] [ 0 ]
    [  C      0  ] [ z ] = 0

in both cases, this shows that the matrix is singular


Solution by LU factorization

    [ A^T A  C^T ] [ x̂ ]   [ A^T b ]
    [  C      0  ] [ z  ] = [   d   ]

Algorithm

1. compute H = A^T A (mn^2 flops)
2. compute c = A^T b (2mn flops)
3.
solve the linear equation

    [ H  C^T ] [ x̂ ]   [ c ]
    [ C   0  ] [ z  ] = [ d ]

by the LU factorization ((2/3)(p + n)^3 flops)

complexity: mn^2 + (2/3)(p + n)^3 flops


Solution by QR factorization

we derive one of several possible methods based on the QR factorization

    [ A^T A  C^T ] [ x̂ ]   [ A^T b ]
    [  C      0  ] [ z  ] = [   d   ]

• if we define w = z − d, the equation can be written equivalently as

    [ A^T A + C^T C   C^T ] [ x̂ ]   [ A^T b ]
    [       C          0  ] [ w  ] = [   d   ]

• assumption 1 guarantees that the QR factorization of the stacked matrix exists:

    [ A ]        [ Q_1 ]
    [ C ] = QR = [ Q_2 ] R


Solution by QR factorization

substituting the QR factorization gives the equation

    [ R^T R   R^T Q_2^T ] [ x̂ ]   [ R^T Q_1^T b ]
    [ Q_2 R       0     ] [ w  ] = [      d      ]

• multiply the first equation with R^{−T} and make the change of variables y = R x̂:

    [  I    Q_2^T ] [ y ]   [ Q_1^T b ]
    [ Q_2     0   ] [ w ] = [    d    ]

• next we note that the matrix Q_2 = C R^{−1} has linearly independent rows:

    Q_2^T u = R^{−T} C^T u = 0   =⇒   C^T u = 0   =⇒   u = 0

because C has linearly independent rows (assumption 2)


Solution by QR factorization

we use the QR factorization of Q_2^T to solve

    [  I    Q_2^T ] [ y ]   [ Q_1^T b ]
    [ Q_2     0   ] [ w ] = [    d    ]

• from the 1st block row, y = Q_1^T b − Q_2^T w; substitute this in the 2nd row:

    Q_2 Q_2^T w = Q_2 Q_1^T b − d

• we solve this equation for w using the QR factorization Q_2^T = Q̃R̃:

    R̃^T R̃ w = R̃^T Q̃^T Q_1^T b − d

which can be simplified to

    R̃ w = Q̃^T Q_1^T b − R̃^{−T} d


Summary of QR factorization method

    [ A^T A + C^T C   C^T ] [ x̂ ]   [ A^T b ]
    [       C          0  ] [ w  ] = [   d   ]

Algorithm

1. compute the two QR factorizations

    [ A ]   [ Q_1 ]
    [ C ] = [ Q_2 ] R,        Q_2^T = Q̃R̃

2. solve R̃^T u = d by forward substitution and compute c = Q̃^T Q_1^T b − u
3. solve R̃ w = c by back substitution and compute y = Q_1^T b − Q_2^T w
4.
solve R x̂ = y by back substitution

complexity: 2(p + m)n^2 + 2np^2 flops for the QR factorizations


Comparison of the two methods

Complexity: roughly the same

• LU factorization:

    mn^2 + (2/3)(p + n)^3 ≤ mn^2 + (16/3)n^3 flops

• QR factorization:

    2(p + m)n^2 + 2np^2 ≤ 2mn^2 + 4n^3 flops

the upper bounds follow from p ≤ n (assumption 2)

Stability: the 2nd method avoids calculation of the Gram matrix A^T A


Exercise

A has linearly independent rows; f(x̂) denotes the solution of

    minimize    ‖x − x̂‖^2
    subject to  Ax = b

1. show that

    f(x̂) = A^† b + (I − A^† A) x̂

2. consider one step of the Kaczmarz algorithm (with ‖a_i‖ = 1):

    x^(j+1) = x^(j) + (b_i − a_i^T x^(j)) a_i

show that f(x^(j+1)) = f(x^(j))

3. let y be any point satisfying Ay = b; show that

    ‖x^(j+1) − y‖^2 = ‖x^(j) − y‖^2 − (b_i − a_i^T x^(j))^2
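The least-norm force in the mass example earlier can be computed numerically from the pseudoinverse formula x̂ = C^T (CC^T)^{−1} d. The sketch below is a minimal illustration, assuming NumPy (the slides do not prescribe a language):

```python
import numpy as np

# position/velocity map y = Cx from the mass example:
# row 1 gives the final position, row 2 the final velocity
j = np.arange(1, 11)
C = np.vstack([(21 - 2 * j) / 2,    # 19/2, 17/2, ..., 1/2
               np.ones(10)])
d = np.array([1.0, 0.0])            # unit distance, zero final velocity

# least-norm solution x_hat = C^T (C C^T)^{-1} d
x_hat = C.T @ np.linalg.solve(C @ C.T, d)

# x_hat also moves the mass over a unit distance with zero final
# velocity, but with smaller norm than the two-impulse solution
x2 = np.zeros(10)
x2[0], x2[1] = 1.0, -1.0
print(np.linalg.norm(x_hat) < np.linalg.norm(x2))   # True
```

Solving `C @ C.T` against `d` instead of forming the inverse explicitly is the usual numerical practice; for larger problems the QR method of the slides is preferred.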
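The QR factorization method for the least-norm problem can be checked on the 2 × 4 example worked out on the slides. A sketch using NumPy's Householder-based `numpy.linalg.qr` (its sign conventions may differ from the Gram–Schmidt factors shown, but x̂ is unaffected):

```python
import numpy as np

# example data from the slides: C is 2 x 4, right invertible
C = np.array([[1.0, -1.0, 1.0, 1.0],
              [1.0,  0.0, 0.5, 0.5]])
d = np.array([0.0, 1.0])

# 1. QR factorization C^T = QR          (2 p^2 n flops)
Q, R = np.linalg.qr(C.T)
# 2. solve R^T z = d                    (forward substitution, p^2 flops)
z = np.linalg.solve(R.T, d)
# 3. matrix-vector product x_hat = Q z  (2 p n flops)
x_hat = Q @ z

print(x_hat)   # close to (1, 1, 0, 0), the solution on the slide
```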
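For the equality-constrained least squares problem, the optimality conditions can be solved directly as an (n + p) × (n + p) linear system, as in the LU-based method. A minimal sketch with made-up random data (the sizes m, n, p are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 8, 5, 2                    # p <= n <= m + p
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
C = rng.standard_normal((p, n))      # rows independent with probability 1
d = rng.standard_normal(p)

# KKT system:  [A^T A  C^T] [x]   [A^T b]
#              [ C      0 ] [z] = [  d  ]
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(K, rhs)        # dense LU under the hood
x_hat, z = sol[:n], sol[n:]

# check: constraint and stationarity condition both hold
print(np.allclose(C @ x_hat, d),
      np.allclose(A.T @ (A @ x_hat - b) + C.T @ z, 0.0))
```

Forming A^T A squares the condition number, which is why the slides note that the QR-based method is preferred for stability.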