Projection Operators and the Least Squares Method
Let S and Q be subspaces of a vector space V. Recall that S + Q is the subspace of all vectors x that
can be written as x = s + q with s ∈ S and q ∈ Q. We say that V is the sum of S and Q if V = S + Q.
If, in addition, S ∩ Q = {0}, then we say that V is the direct sum of S and Q, and write V = S ⊕ Q.
Theorem. If V is a vector space and V = S ⊕ Q, then each vector x ∈ V can be expressed in one
and only one way as x = s + q with s ∈ S and q ∈ Q. Moreover, the function P : x → s is a linear
transformation, called the projection onto S along Q.
Usually we are dealing with the situation where V = Rn and Q is the orthogonal complement of S:
Q = S⊥ = {q ∈ Rn : q ⊥ s for all s ∈ S}.
Note that if S is a subspace of Rn then
Rn = S ⊕ S ⊥ .
Theorem. Let S be a subspace of Rn and let S ⊥ be its orthogonal complement. For each vector x ∈ Rn
there exist unique vectors s ∈ S and q ∈ S ⊥ such that
x = s + q.
Moreover there exist n × n matrices PS and PQ such that
s = PS x and q = PQ x.
Example 1. Let x = (1, 2, 3)T and S := Span{x} ⊂ R3 . Then S ⊥ = {(x, y, z)T : x + 2y + 3z = 0}.
The projection of a vector u = (a, b, c)T onto S is given by

\frac{x \cdot u}{x \cdot x}\, x = x (x^T x)^{-1} x^T u = (x^T x)^{-1} x x^T u = \frac{1}{14} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix}.
Note that if P is the projection matrix onto a subspace S, then I − P must be the projection matrix
onto the subspace S⊥. In particular, the matrix that projects onto the subspace S⊥ of this example is

\frac{1}{14} \begin{pmatrix} 13 & -2 & -3 \\ -2 & 10 & -6 \\ -3 & -6 & 5 \end{pmatrix}.    (1)

Recall that matrix multiplication is associative and that scalar factors commute with any other factor.
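As a quick numerical check of Example 1 (a minimal sketch, assuming NumPy is available; the variable names are only for illustration), one can build P = xxT/(xT x) for x = (1, 2, 3)T and confirm that I − P reproduces the matrix in equation (1):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])

    # Projection onto S = span{x}: P = x x^T / (x^T x), with x^T x = 14 here.
    P = np.outer(x, x) / (x @ x)
    # Projection onto the orthogonal complement S-perp.
    Q = np.eye(3) - P

    print(np.round(14 * P))   # 14*P has entries [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
    print(np.round(14 * Q))   # 14*Q has entries [[13, -2, -3], [-2, 10, -6], [-3, -6, 5]]

    # Both matrices are idempotent, as projection matrices must be.
    assert np.allclose(P @ P, P) and np.allclose(Q @ Q, Q)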
Proof of the Theorem. We can prove this as follows. Let {a1, a2, · · · , ar} be a basis for S. Now let
A := (a1, a2, · · · , ar). This is an n × r matrix and its null space N(A) = {0} since the columns are
linearly independent. The square matrix AT A is nonsingular. To see this, it suffices to show that its
null space is trivial: suppose x ∈ N(AT A); then xT AT Ax = 0. However, that is the same as saying that
∥Ax∥² = 0. Therefore Ax = 0. But since N(A) is trivial, this means x = 0, implying that the null
space of AT A is trivial, and so AT A must be a nonsingular r × r matrix. We next claim that
PS = A(AT A)−1 AT and PQ = I − PS.
Since PS x = Ay with y = (AT A)−1 AT x, we see that PS x must be in S, the column space of A. Obviously
x = PS x + PQ x, and so the only thing that remains is to show that PQ x is in S⊥ = R(A)⊥ = N(AT):
AT PQ x = AT [x − A(AT A)−1 AT x] = AT x − (AT A)(AT A)−1 AT x = 0.
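Here is a minimal NumPy sketch of this construction; the particular 3 × 2 matrix A below is just an illustrative choice of basis vectors for S, stacked as columns:

    import numpy as np

    # Columns of A form a basis {a1, a2} for S inside R^3 (illustrative values).
    A = np.array([[1.0, 0.0],
                  [2.0, 1.0],
                  [3.0, 1.0]])

    # P_S = A (A^T A)^{-1} A^T   and   P_Q = I - P_S.
    PS = A @ np.linalg.inv(A.T @ A) @ A.T
    PQ = np.eye(A.shape[0]) - PS

    s = A @ np.array([2.0, -1.0])      # an arbitrary vector in S
    assert np.allclose(PS @ s, s)      # P_S fixes vectors in S
    assert np.allclose(PQ @ s, 0.0)    # P_Q annihilates them
    assert np.allclose(PS @ PS, PS)    # P_S is idempotent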
Application. Let S be the span of a single vector a; then the projection of a vector x onto S should
be the same as the projection of x along the vector a. Let’s see:
P_S x = a (a^T a)^{-1} a^T x = a\, \frac{a^T x}{a^T a} = \frac{a \cdot x}{\|a\|^2}\, a.
So we recover the usual formula for the projection of a vector x along a vector a.
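The same computation in NumPy, with an arbitrary a and x chosen only to confirm that the two expressions agree:

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    x = np.array([4.0, -1.0, 2.0])

    via_matrix  = np.outer(a, a) / (a @ a) @ x   # a (a^T a)^{-1} a^T x
    via_formula = (a @ x) / (a @ a) * a          # (a . x / ||a||^2) a
    assert np.allclose(via_matrix, via_formula)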
Exercise. Consider Example 1 again. Note that the two vectors (2, −1, 0)T and (3, 0, −1)T are orthogonal to (1, 2, 3)T and therefore form a basis for S ⊥ . Use the formula given by the theorem to compute
the projection matrix for S ⊥ and verify that it agrees with that given by equation (1).
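If you want to check your answer numerically, a sketch along the following lines (again assuming NumPy) applies the formula from the theorem to that basis and compares the result with I − P from equation (1):

    import numpy as np

    # Basis for S-perp from the exercise, as columns of A.
    A = np.array([[ 2.0,  3.0],
                  [-1.0,  0.0],
                  [ 0.0, -1.0]])
    P_perp = A @ np.linalg.inv(A.T @ A) @ A.T

    x = np.array([1.0, 2.0, 3.0])
    assert np.allclose(P_perp, np.eye(3) - np.outer(x, x) / (x @ x))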
Application to the Least Squares Approximation. Consider the problem
Ax = b
where A is an n × r matrix of rank r (so r ≤ n and the columns of A form a basis for its column space
R(A)). This problem has a solution only if b ∈ R(A). We now consider the problem with b ∉ R(A).
Although the problem then does not have a solution, we can ask for a vector x̂ which comes closest to
being a solution. We ask for a vector x̂ such that ∥Ax̂ − b∥ is minimized. This happens when the vector
Ax̂ is the vector in S := R(A) that is closest to b. That is, we need
Ax̂ = PS b.
Using the explicit formula that we have for PS , this means that we need to solve
Ax̂ = A(AT A)−1 AT b.
Multiplying this equation on the left by AT :
AT Ax̂ = (AT A)(AT A)−1 AT b = AT b.
This yields the so-called normal equation for the least squares solution x̂:
AT Ax̂ = AT b.
Note that since AT A is nonsingular this equation has a unique solution. One way to obtain this solution
is by row reduction:
(AT A | AT b) −→ (I | x̂).
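In code the normal equation can be solved directly. The sketch below (NumPy, with an arbitrary A and b chosen so that b ∉ R(A)) also compares the result with numpy.linalg.lstsq, which computes the same least squares solution by a more numerically stable route:

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])        # n = 3, r = 2, full column rank
    b = np.array([1.0, 2.0, 2.0])     # not in the column space of A

    # Solve the normal equation A^T A x_hat = A^T b.
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)

    # The library least squares routine gives the same answer.
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
    assert np.allclose(x_hat, x_ls)

    # A x_hat is exactly the projection of b onto R(A).
    P = A @ np.linalg.inv(A.T @ A) @ A.T
    assert np.allclose(A @ x_hat, P @ b)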
Least Squares Data Fit. The best-known application of least squares solutions is finding the "best"
straight line approximation to a set of data. Given a set of data points (x1, y1), (x2, y2), · · · , (xr, yr), we
try to find constants m and b such that mxi + b = yi for each i = 1, 2, · · · , r. Of course, usually that is
impossible since we have r equations for 2 unknowns, m and b. We can write these equations in matrix
form:

\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_r & 1 \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{pmatrix}.
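A short NumPy sketch of the straight-line fit, with made-up data points purely for illustration:

    import numpy as np

    # Hypothetical data points (x_i, y_i).
    xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    ys = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

    # Design matrix with a column of x_i and a column of ones, as above.
    A = np.column_stack([xs, np.ones_like(xs)])

    # Slope m and intercept b from the normal equation A^T A (m, b)^T = A^T y.
    m, b = np.linalg.solve(A.T @ A, A.T @ ys)
    print(m, b)   # best-fit line y = m x + b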
Special case with an orthonormal basis. This is a very common situation. Suppose that the basis
for S, {a1, a2, · · · , ar}, is an orthonormal basis, that is, a basis of mutually orthogonal unit vectors. That
means ai^T aj = δij. Then AT A = Ir, the r × r identity matrix, and therefore P = AAT and we have, very
simply,

P x = \sum_{i=1}^{r} (a_i^T x)\, a_i.
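A sketch of this special case in NumPy; the orthonormal basis below comes from a QR factorization of a random matrix, which is just one convenient way to manufacture one:

    import numpy as np

    rng = np.random.default_rng(0)
    # Orthonormal basis (columns of A) for a 2-dimensional subspace of R^4.
    A, _ = np.linalg.qr(rng.standard_normal((4, 2)))

    x = rng.standard_normal(4)

    # P x = A A^T x = sum_i (a_i^T x) a_i
    Px_matrix = A @ A.T @ x
    Px_sum = sum((A[:, i] @ x) * A[:, i] for i in range(A.shape[1]))
    assert np.allclose(Px_matrix, Px_sum)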
Example 2. The vectors u := (1/√3, 1/√3, 1/√3)T and v := (1/√2, −1/√2, 0)T form an orthonormal
basis for a two-dimensional subspace of R3. The projection operator onto this subspace is simply

(u, v)(u, v)^T = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{2} \\ 1/\sqrt{3} & -1/\sqrt{2} \\ 1/\sqrt{3} & 0 \end{pmatrix} \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \end{pmatrix} = \begin{pmatrix} 5/6 & -1/6 & 1/3 \\ -1/6 & 5/6 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}.
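As a check on Example 2 (a small sketch assuming NumPy), one can rebuild the projection matrix from u and v and compare it with the matrix above:

    import numpy as np

    u = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
    v = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)

    A = np.column_stack([u, v])
    P = A @ A.T                     # valid because {u, v} is orthonormal

    expected = np.array([[ 5.0, -1.0, 2.0],
                         [-1.0,  5.0, 2.0],
                         [ 2.0,  2.0, 2.0]]) / 6.0
    assert np.allclose(P, expected)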
General case - using a bi-orthogonal system. This is more advanced and probably beyond what
you are likely to come across. But for those of you who are interested enough, here is a way to construct
a projection operator for the general case of a projection onto the subspace S ⊂ Rn, along the subspace
Q, where Q and S are not necessarily orthogonal to each other, but Rn = S ⊕ Q. In this case we can
find a basis {a1, a2, · · · , ar} for S and a basis {ar+1, ar+2, · · · , an} for Q. Then {a1, a2, · · · , an} is a basis for
all of Rn. Let Ui be the (n − 1)-dimensional space spanned by all of the basis vectors except ai. That is to
say, Ui = Span{a1, · · · , ai−1, ai+1, · · · , an}. Of course Ui⊥ is a one-dimensional subspace. Let bi ∈ Ui⊥
be such that ai^T bi = 1. We can write this as aj^T bi = δij. It can now be verified that
x = \sum_{i=1}^{n} (b_i^T x)\, a_i, \qquad \sum_{i=1}^{r} (b_i^T x)\, a_i \in S, \qquad \sum_{i=r+1}^{n} (b_i^T x)\, a_i \in Q,
and so the projection matrix for the projection onto S along Q is
P = \sum_{i=1}^{r} a_i b_i^T.
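A sketch of this construction in NumPy; the bases for S and Q below are arbitrary illustrative choices. Since bi^T aj = δij says exactly that the bi^T are the rows of the inverse of the matrix whose columns are a1, · · · , an, the dual vectors can be read off from that inverse:

    import numpy as np

    # Bases for S (r = 1) and Q (n - r = 2) in R^3; S and Q are not orthogonal here.
    S_basis = np.array([[1.0], [1.0], [0.0]])
    Q_basis = np.array([[1.0, 0.0],
                        [0.0, 0.0],
                        [0.0, 1.0]])

    A = np.hstack([S_basis, Q_basis])   # columns a_1, ..., a_n
    B = np.linalg.inv(A).T              # columns b_1, ..., b_n with b_i^T a_j = delta_ij

    r = S_basis.shape[1]
    P = sum(np.outer(A[:, i], B[:, i]) for i in range(r))   # P = sum_i a_i b_i^T

    assert np.allclose(P @ P, P)              # P is a projection
    assert np.allclose(P @ S_basis, S_basis)  # P fixes S
    assert np.allclose(P @ Q_basis, 0.0)      # P annihilates Q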
Note that each term ai bi^T is an n × n matrix of rank 1! Its column space is spanned by ai and its row
space is spanned by bi^T. For example, if n = 2 then
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \begin{pmatrix} b_1 & b_2 \end{pmatrix} = \begin{pmatrix} a_1 b_1 & a_1 b_2 \\ a_2 b_1 & a_2 b_2 \end{pmatrix}.
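A one-line check of the rank-one claim (NumPy, arbitrary vectors):

    import numpy as np

    a = np.array([1.0, 2.0])
    b = np.array([3.0, 4.0])
    M = np.outer(a, b)                       # the 2 x 2 matrix a b^T shown above
    assert np.linalg.matrix_rank(M) == 1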