Projection Operators and the Least Squares Method

Let $S$ and $Q$ be subspaces of a vector space $V$. Recall that $S + Q$ is the subspace of all vectors $x$ that can be written as $x = s + q$ with $s \in S$ and $q \in Q$. We say that $V$ is the sum of $S$ and $Q$ if $V = S + Q$. If, in addition, $S \cap Q = \{0\}$, then we say that $V$ is the direct sum of $S$ and $Q$, and write $V = S \oplus Q$.

Theorem. If $V$ is a vector space and $V = S \oplus Q$, then each vector $x \in V$ can be expressed in one and only one way as $x = s + q$ with $s \in S$ and $q \in Q$. Moreover, the function $P : x \mapsto s$ is a linear transformation, called the projection on $S$ along $Q$.

Usually we are dealing with the situation where $V = \mathbb{R}^n$ and $Q$ is the orthogonal complement of $S$: $Q = S^\perp = \{q \in \mathbb{R}^n : q \perp s \text{ for all } s \in S\}$. Note that if $S$ is a subspace of $\mathbb{R}^n$, then $\mathbb{R}^n = S \oplus S^\perp$.

Theorem. Let $S$ be a subspace of $\mathbb{R}^n$ and let $S^\perp$ be its orthogonal complement. For each vector $x \in \mathbb{R}^n$ there exist unique vectors $s \in S$ and $q \in S^\perp$ such that $x = s + q$. Moreover, there exist $n \times n$ matrices $P_S$ and $P_Q$ such that $s = P_S x$ and $q = P_Q x$.

Example 1. Let $x = (1, 2, 3)^T$ and $S := \operatorname{Span}\{x\} \subset \mathbb{R}^3$. Then $S^\perp = \{(x, y, z)^T : x + 2y + 3z = 0\}$. The projection of a vector $u = (a, b, c)^T$ onto $S$ is given by
\[
\frac{x \cdot u}{x \cdot x}\, x = (x^T x)^{-1} x\,(x^T u) = \bigl((x^T x)^{-1} x x^T\bigr) u
= \frac{1}{14}\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix}.
\]
(Recall that matrix multiplication is associative and that scalar factors commute with any other factor.) Note that if $P$ is the projection matrix onto a subspace $S$, then $I - P$ must be the projection matrix onto the subspace $S^\perp$. In particular, the matrix that projects onto the subspace $S^\perp$ of this example is
\[
\frac{1}{14}\begin{pmatrix} 13 & -2 & -3 \\ -2 & 10 & -6 \\ -3 & -6 & 5 \end{pmatrix}. \tag{1}
\]

Proof of the Theorem. We can prove this as follows. Let $\{a_1, a_2, \dots, a_r\}$ be a basis for $S$ and let $A := (a_1, a_2, \dots, a_r)$. This is an $n \times r$ matrix and its null space $N(A) = \{0\}$, since the columns are linearly independent. The square matrix $A^T A$ is nonsingular. To see this, it suffices to show that its null space is trivial: suppose $x \in N(A^T A)$; then $x^T A^T A x = 0$, which is the same as saying that $\|Ax\|^2 = 0$, so $Ax = 0$. But since $N(A)$ is trivial, this means $x = 0$, implying that the null space of $A^T A$ is trivial, and so $A^T A$ must be a nonsingular $r \times r$ matrix.

We next claim that $P_S = A(A^T A)^{-1} A^T$ and $P_Q = I - P_S$. Since $P_S x = Ay$ with $y = (A^T A)^{-1} A^T x$, we see that $P_S x$ lies in $S$, the column space of $A$. Obviously $x = P_S x + P_Q x$, so the only thing that remains is to show that $P_Q x$ is in $S^\perp = R(A)^\perp = N(A^T)$:
\[
A^T P_Q x = A^T\bigl[x - A(A^T A)^{-1} A^T x\bigr] = A^T x - (A^T A)(A^T A)^{-1} A^T x = 0.
\]

Application. Let $S$ be the span of a single vector $a$; then the projection of a vector $x$ onto $S$ should be the same as the projection of $x$ along the vector $a$. Let's see:
\[
P_S x = a (a^T a)^{-1} a^T x = \frac{a^T x}{a^T a}\, a = \frac{a \cdot x}{\|a\|^2}\, a.
\]
So we get the usual formula for the projection of a vector $x$ along a vector $a$.

Exercise. Consider Example 1 again. Note that the two vectors $(2, -1, 0)^T$ and $(3, 0, -1)^T$ are orthogonal to $(1, 2, 3)^T$ and therefore form a basis for $S^\perp$. Use the formula given by the theorem to compute the projection matrix for $S^\perp$ and verify that it agrees with that given by equation (1).
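The computations in Example 1 and the Exercise are easy to check numerically. The following is a small NumPy sketch (not part of the original notes; the variable names are ours): it builds $P_S = A(A^T A)^{-1} A^T$ from a basis matrix $A$, forms $I - P_S$, and compares it with the projector obtained directly from the basis $\{(2, -1, 0)^T, (3, 0, -1)^T\}$ of $S^\perp$, as in equation (1).

```python
import numpy as np

# Sketch (not from the notes): projection onto S = Span{(1, 2, 3)^T}, as in Example 1.
A = np.array([[1.0], [2.0], [3.0]])                 # basis of S as the single column of A
P_S = A @ np.linalg.inv(A.T @ A) @ A.T              # P_S = A (A^T A)^{-1} A^T
P_Q = np.eye(3) - P_S                               # projection onto S-perp, equation (1)

# Exercise: the same projector built from the basis {(2, -1, 0)^T, (3, 0, -1)^T} of S-perp.
B = np.array([[2.0, 3.0],
              [-1.0, 0.0],
              [0.0, -1.0]])
P_Q_direct = B @ np.linalg.inv(B.T @ B) @ B.T

print(np.allclose(14 * P_S, [[1, 2, 3], [2, 4, 6], [3, 6, 9]]))   # matches Example 1
print(np.allclose(P_Q, P_Q_direct))                               # matches equation (1)
```

Both checks should print True, which is exactly the content of the Exercise.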
Application to the Least Squares Approximation. Consider the problem $Ax = b$, where $A$ is an $n \times r$ matrix of rank $r$ (so $r \le n$ and the columns of $A$ form a basis for its column space $R(A)$). This problem has a solution only if $b \in R(A)$. We now consider the problem with $b \notin R(A)$. Although the problem then does not have a solution, we can ask for a vector $\hat{x}$ which comes closest to being a solution, that is, a vector $\hat{x}$ such that $\|A\hat{x} - b\|$ is minimized. This happens when $A\hat{x}$ is the vector in $S := R(A)$ that is closest to $b$; that is, we need $A\hat{x} = P_S b$. Using the explicit formula that we have for $P_S$, this means that we need to solve $A\hat{x} = A(A^T A)^{-1} A^T b$. Multiplying this equation on the left by $A^T$ gives $A^T A \hat{x} = (A^T A)(A^T A)^{-1} A^T b = A^T b$. This yields the so-called normal equation for the least squares solution $\hat{x}$:
\[
A^T A \hat{x} = A^T b.
\]
Note that since $A^T A$ is nonsingular, this equation has a unique solution. One way to obtain this solution is by row reduction:
\[
\bigl(\,A^T A \mid A^T b\,\bigr) \longrightarrow \bigl(\,I \mid \hat{x}\,\bigr).
\]

Least Squares Data Fit. The best known application of least squares solutions is finding the "best" straight line approximation to a set of data. Given a set of data points $(x_1, y_1), (x_2, y_2), \dots, (x_r, y_r)$, we try to find constants $m$ and $b$ such that $m x_i + b = y_i$ for each $i = 1, 2, \dots, r$. Of course, usually that is impossible, since we have $r$ equations for the 2 unknowns $m$ and $b$. We can write these equations in matrix form:
\[
\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_r & 1 \end{pmatrix}
\begin{pmatrix} m \\ b \end{pmatrix}
= \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{pmatrix}.
\]

Special case with an orthonormal basis. This is a very common situation. Suppose that the basis $\{a_1, a_2, \dots, a_r\}$ for $S$ is an orthonormal basis, that is, a basis of mutually orthogonal unit vectors. That means $a_i^T a_j = \delta_{ij}$. Then $A^T A = I_r$, the $r \times r$ identity matrix, and therefore $P = A A^T$, and we have, very simply,
\[
P x = \sum_{i=1}^{r} \bigl(a_i^T x\bigr)\, a_i.
\]

Example 2. The vectors $u := (1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})^T$ and $v := (1/\sqrt{2}, -1/\sqrt{2}, 0)^T$ form an orthonormal basis for a two-dimensional subspace of $\mathbb{R}^3$. The projection operator onto this subspace is simply
\[
(u, v)(u, v)^T =
\begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & 0 \end{pmatrix}
\begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \end{pmatrix}
=
\begin{pmatrix} \frac{5}{6} & -\frac{1}{6} & \frac{1}{3} \\ -\frac{1}{6} & \frac{5}{6} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{pmatrix}.
\]

General case: using a bi-orthogonal system. This is more advanced and probably beyond what you are likely to come across. But for those of you who are interested, here is a way to construct a projection operator for the general case of a projection onto a subspace $S \subset \mathbb{R}^n$ along a subspace $Q$, where $Q$ and $S$ are not necessarily orthogonal to each other, but $\mathbb{R}^n = S \oplus Q$. In this case we can find a basis $\{a_1, a_2, \dots, a_r\}$ for $S$ and a basis $\{a_{r+1}, a_{r+2}, \dots, a_n\}$ for $Q$. Then $\{a_1, a_2, \dots, a_n\}$ is a basis for all of $\mathbb{R}^n$. Let $U_i$ be the $(n-1)$-dimensional space spanned by all of the basis vectors except $a_i$, that is, $U_i = \operatorname{Span}\{a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n\}$. Of course $U_i^\perp$ is a one-dimensional subspace. Let $b_i \in U_i^\perp$ be such that $a_i^T b_i = 1$; we can write this as $a_j^T b_i = \delta_{ij}$. It can now be verified that
\[
x = \sum_{i=1}^{n} \bigl(b_i^T x\bigr)\, a_i, \qquad
\sum_{i=1}^{r} \bigl(b_i^T x\bigr)\, a_i \in S, \qquad
\sum_{i=r+1}^{n} \bigl(b_i^T x\bigr)\, a_i \in Q,
\]
and so the projection matrix for the projection onto $S$ along $Q$ is
\[
P = \sum_{i=1}^{r} a_i b_i^T.
\]
Note that each term $a_i b_i^T$ is an $n \times n$ matrix of rank 1! Its column space is spanned by $a_i$ and its row space is spanned by $b_i^T$. For example, if $n = 2$ and we write $a = (a_1, a_2)^T$, $b = (b_1, b_2)^T$, then
\[
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}\begin{pmatrix} b_1 & b_2 \end{pmatrix}
= \begin{pmatrix} a_1 b_1 & a_1 b_2 \\ a_2 b_1 & a_2 b_2 \end{pmatrix}.
\]
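To close, here is a short NumPy sketch (again ours, not part of the original notes) illustrating two of the computations above: the straight-line fit obtained from the normal equation $A^T A \hat{x} = A^T b$, and the bi-orthogonal construction $P = \sum_i a_i b_i^T$ for a pair of non-orthogonal one-dimensional subspaces of $\mathbb{R}^2$. The data points and basis vectors are invented for the illustration.

```python
import numpy as np

# Sketch (not from the notes): least squares line fit m*x + b ~ y via the normal equation.
xs = np.array([0.0, 1.0, 2.0, 3.0])                   # invented data points (x_i, y_i)
ys = np.array([1.1, 1.9, 3.2, 3.8])
A = np.column_stack([xs, np.ones_like(xs)])           # rows (x_i, 1); unknowns are (m, b)
m, b = np.linalg.solve(A.T @ A, A.T @ ys)             # normal equation A^T A xhat = A^T ys
m_ref, b_ref = np.linalg.lstsq(A, ys, rcond=None)[0]  # NumPy's own least squares solver
print(np.allclose([m, b], [m_ref, b_ref]))            # both give the same fit

# Projection onto S = Span{a1} along Q = Span{a2} in R^2, with S and Q not orthogonal.
a1, a2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
B = np.linalg.inv(np.column_stack([a1, a2])).T        # columns b1, b2 satisfy a_j^T b_i = delta_ij
P = np.outer(a1, B[:, 0])                             # P = a1 b1^T
print(np.allclose(P @ P, P), np.allclose(P @ a2, 0))  # idempotent, and annihilates Q
```

Here the dual vectors $b_i$ are obtained as the columns of $(A^{-1})^T$, which is just another way of writing the condition $a_j^T b_i = \delta_{ij}$ used above.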