NOTES ON MATRIX THEORY
BY
CHRISTOPHER BEATTIE
JOHN ROSSI
Department of Mathematics
Virginia Tech
© Christopher Beattie and John Rossi, 2003
Contents

1  Linear equations and elementary matrix algebra - a review
   1.1  Vectors in R^n and C^n
   1.2  Matrix Operations
   1.3  Matrix Inverses
   1.4  The LU Decomposition

2  Vector Spaces and Linear Transformations
   2.1  A Model for General Vector Spaces
   2.2  The Basics of Bases
        2.2.1  Spanning Sets and Linear Independence
        2.2.2  Basis and Dimension
        2.2.3  Change of Basis
   2.3  Linear Transformations and their Representation
        2.3.1  Matrix Representations
        2.3.2  Similarity of Matrices
   2.4  Determinants

3  Inner Products and Best Approximations
   3.1  Inner Products
   3.2  Best approximation and projections
   3.3  Pseudoinverses
   3.4  Orthonormal Bases and the QR Decomposition
   3.5  Unitary Transformations and the Singular Value Decomposition

4  The Eigenvalue Problem
   4.1  Eigenvalues and Eigenvectors
        4.1.1  Eigenvalue Basics
        4.1.2  The Minimal Polynomial
   4.2  Invariant Subspaces and Jordan Forms
        4.2.1  Invariant Subspaces
        4.2.2  Jordan Forms
   4.3  Diagonalization
   4.4  The Schur Decomposition
   4.5  Hermitian and other Normal Matrices
        4.5.1  Hermitian matrices
        4.5.2  Normal Matrices
        4.5.3  Positive Definite Matrices
        4.5.4  Revisiting the Singular Value Decomposition
Chapter 1
Linear equations and elementary matrix algebra - a review

Prerequisites:
None, aside from being a careful reader.
Advanced Prerequisites:
Familiarity with row reduction for solving linear systems.
Learning Objectives:
Review of row reduction and Gauss elimination for solving systems of linear equations.
Introduction of matrices as basic bookkeeping devices to organize row reduction.
Identification of all possible solutions for a system of linear equations and recognition of when a system fails to have any solution.
The set of equations

  x + 2y + 2z = 3
  3x + 5y + 4z = 5
  2x + y - 3z = -7          (1.1)

is an example of a system of linear equations. The variables x, y, and z appear "linearly" in the sense that they each have unit exponents (that is, they're not raised to a power other than one); they don't multiply one another (for example, no terms like "6xy"); and they don't otherwise participate as arguments of more complicated functions like sin(x). A solution of the system (1.1) is a choice of values for each of x, y, and z which, upon substitution, will satisfy all equations simultaneously. The solution set is then simply the set of all possible solutions. The usual strategy in solving a linear system such as this is to transform it into a simpler linear system that may then be easily solved. "Simpler" in this context means that each equation of the simpler linear system has only one variable with a nonzero coefficient for which a value has not yet been determined (for which we can then directly solve). For example, the linear system
  2x = 3
  3x + y = 5
  2x + y + z = 7
is simple in this sense, since the first equation can be solved directly for x; then, knowing x, the second equation can be solved directly for y; and then, knowing both x and y, the final equation can be solved for z. Notice that the feature that makes this particular system of equations simple is that the first equation has zero as a coefficient for both y and z, and the second equation has zero as a coefficient for z.
The process of transforming the original linear system into a simpler one then involves systematically introducing zeros as coefficients of certain variables in certain equations. However we might conceive of ways of fiddling with the coefficients, of paramount importance is that we not change the set of solutions in the course of our fiddling. There are many types of transformations that we may apply to a linear system that will not change the set of solutions and that have the additional desired potential for yielding simplified linear systems. We list three here that suffice for all our purposes. These are called elementary transformations:
Type 1: Replace an equation with the sum of the same equation and a multiple of another equation;
Type 2: Interchange two equations;
Type 3: Multiply an equation by a nonzero number.
It should be obvious that Type 2 and Type 3 operations will not change
the set of solutions to the linear system in any way, since (for Type 2 ) the
set of solutions must be independent of the order in which the equations are
presented and (for Type 3 ) multiplying both sides of a given equation by a
nonzero number cannot change the set of values that satisfy that particular
equation. It is less obvious perhaps that Type 1 operations cannot change the
solution set of the system. Let's consider this in more detail. It is apparent
that the set of solutions to a linear system cannot be diminished by a Type 1 operation, since any set of values that satisfy all equations of the original system before the Type 1 operation will satisfy all equations of the modified system after the Type 1 operation as well. But can a Type 1 operation expand the set of solutions to a linear system? That is, could there be solutions to the modified system (after the Type 1 operation) that were not solutions to the original system (before the Type 1 operation)? The key is in observing that the effect of any Type 1 operation can be reversed with another Type 1 operation. For example, the Type 1 operation
  (equation i) + (equation j)  ->  (equation i)
can be undone by another Type 1 operation,
  (equation i) - (equation j)  ->  (equation i).
Then, the original system could be viewed as resulting from a Type 1 operation applied to the modified system, and so (using the reasoning above) every solution of the original system must also be a solution to the modified system. We conclude that the original and modified systems have precisely the same solutions, and Type 1 operations leave the solution set for a system unchanged.
Brief reflection will make it clear that these three types of elementary transformations only permit summing, subtracting, scaling, and exchanging of equation coefficients with corresponding coefficients of other equations, and that these transformations never allow the coefficients of any given variable to be summed or exchanged with coefficients of a different variable. For this reason, it is possible to introduce a "bookkeeping" mechanism to streamline the solution process. Recognizing that only the relative position of the equation coefficients need be known to apply any of the three elementary transformations, one typically defines an augmented matrix by stripping away all extraneous symbols and letters, leaving only the coefficients and right-hand side constants in place. This represents a linear system of equations without the clutter of symbols, e.g.,
  [ 1  2  2 |  3 ]                      x + 2y + 2z = 3
  [ 3  5  4 |  5 ]     represents      3x + 5y + 4z = 5
  [ 2  1 -3 | -7 ]                      2x + y - 3z = -7
The elementary equation transformations described before now correspond
to operations on rows in the augmented matrix and are known as elementary
row operations :
Type 1 : Replace a row with the sum of the same row and a multiple of another
row in the augmented matrix;
Type 2 : Interchange two rows; and
Type 3 : Multiply a row by a nonzero number.
We proceed to simplify the system by introducing zeros as follows:

  [ 1  2  2 |  3 ]       x + 2y + 2z = 3
  [ 3  5  4 |  5 ]       3x + 5y + 4z = 5
  [ 2  1 -3 | -7 ]       2x + y - 3z = -7

      |  Type 1:  (Row/Eqn 2) + (-3)(Row/Eqn 1) -> (Row/Eqn 2)
      v           (Row/Eqn 3) + (-2)(Row/Eqn 1) -> (Row/Eqn 3)

  [ 1  2  2 |   3 ]      x + 2y + 2z =   3
  [ 0 -1 -2 |  -4 ]          -y - 2z =  -4
  [ 0 -3 -7 | -13 ]         -3y - 7z = -13

      |  Type 1:  (Row/Eqn 3) + (-3)(Row/Eqn 2) -> (Row/Eqn 3)
      v

  [ 1  2  2 |  3 ]       x + 2y + 2z =  3
  [ 0 -1 -2 | -4 ]           -y - 2z = -4
  [ 0  0 -1 | -1 ]               - z = -1

The final linear system is "simpler" and straightforward to solve:
  z = 1      (solving the third equation for z)
  y = 2      (solving the second equation for y)
  x = -3     (solving the first equation for x).
We were able to do the reduction to a simpler system here using only Type 1 elementary row operations and then followed this with "back substitution". By using Type 1 and Type 2 elementary operations, one can always achieve a rough "triangular" form similar to this, called the row echelon form. The defining feature of the row echelon form is that the first nonzero entry of each row (called the "pivot" for that row) must occur below and to the right of the pivot of each of the preceding rows above it. Rows consisting of all zeros occur at the bottom of the matrix. Type 1 operations do the work of introducing zeros, while Type 2 operations do the necessary shuffling of rows to get a triangular form. This is an important enough observation to state as

Theorem 1.1. Every m x n matrix can be reduced to row echelon form using only Type 1 and Type 2 elementary operations.

Although terminology can vary, the composite process first of using Type 1 and Type 2 elementary operations to reduce to a row echelon form that represents a "triangular" system, and then of solving the triangular system by back substitution, is called Gauss elimination. This isn't the only strategy available to us. Instead of using back substitution after achieving a row echelon form as we did above, we could have continued the reduction process with elementary operations by next attacking nonzero entries in the upper triangular portion of the augmented matrix, and then finally dividing each row by its first nonzero entry. ... continuing with the reduction:
      ...
      |  Type 1:  (Row/Eqn 1) + 2(Row/Eqn 2) -> (Row/Eqn 1)
      v

  [ 1  0 -2 | -5 ]       x       - 2z = -5
  [ 0 -1 -2 | -4 ]           -y  - 2z = -4
  [ 0  0 -1 | -1 ]                - z = -1

      |  Type 1:  (Row/Eqn 2) + (-2)(Row/Eqn 3) -> (Row/Eqn 2)
      |           (Row/Eqn 1) + (-2)(Row/Eqn 3) -> (Row/Eqn 1)
      |  Type 3:  (-1)(Row/Eqn 2) -> (Row/Eqn 2)
      v           (-1)(Row/Eqn 3) -> (Row/Eqn 3)

  [ 1  0  0 | -3 ]       x = -3
  [ 0  1  0 |  2 ]       y =  2
  [ 0  0  1 |  1 ]       z =  1
(the same result as before). The additional elementary operations had the effect of the back substitution phase of Gauss elimination. By using all three elementary operations in this way we can always achieve a reduced triangular form called reduced row echelon form. The defining features of a reduced row echelon form are that it be a row echelon form matrix for which, additionally, the first nonzero entry in each row is a "1" (a "leading one") and each column with a leading one has no other nonzero entries. The process of using all the elementary operations in this way to achieve a reduced row echelon form, from which the solution is then available by inspection, is generally called Gauss-Jordan elimination. Although Gauss-Jordan elimination is useful in some circumstances (typically involving hand calculations), Gauss elimination based on what was described above, or in the equivalent "LU" form described in Section 1.4, is somewhat more efficient and predominates in computer implementations.
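To make the procedure concrete, here is a small NumPy sketch (not part of the original notes) that carries out the reduction with Type 1 operations and then back-substitutes, checked against numpy.linalg.solve on system (1.1). It assumes no zero pivots are encountered, so no Type 2 interchanges are needed.

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gauss elimination followed by back substitution.
    Minimal sketch: assumes A is square and no zero pivot appears."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    n = len(b)
    # Forward elimination (Type 1 operations only).
    for j in range(n - 1):
        for i in range(j + 1, n):
            m = M[i, j] / M[j, j]           # multiplier
            M[i, :] -= m * M[j, :]          # (row i) + (-m)(row j) -> (row i)
    # Back substitution on the resulting triangular system.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:n]) / M[i, i]
    return x

A = np.array([[1, 2, 2], [3, 5, 4], [2, 1, -3]])
b = np.array([3, 5, -7])
print(gauss_solve(A, b))          # [-3.  2.  1.]
print(np.linalg.solve(A, b))      # same answer
```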
Although this collection of methods for solving linear systems usually carries some reference to Gauss in its name, the same basic idea of transforming systems of linear equations systematically to a simpler form to extract a solution was known in the early Han Dynasty in China nearly 2,000 years earlier. Gauss rediscovered this elimination method at the beginning of the 19th century and recommended it as a replacement for the then commonly used Cramer's Rule (footnote 1) for practical computation. Gauss and the geodesist Wilhelm Jordan (who later in the century added the refinements useful in hand calculations we described above) used these methods to solve least squares problems in celestial mechanics and surveying calculations, respectively.

Footnote 1: Against all odds, Cramer's Rule still survives in linear algebra pedagogy today despite being more difficult to understand, unable to provide any result for systems with multiple solutions, and immensely more expensive even for systems of modest size than either Gauss or Gauss-Jordan elimination. Go figure ...

A linear system of equations is said to be consistent if there exists at least one choice of variable values that satisfy all equations simultaneously, that is, if there is at least one solution to the system. If no such choice exists, the system is called inconsistent. A linear system of equations is called homogeneous if the constants on the right hand side of the equations are each equal to zero. If one
or more of these constants are nonzero, then the system is nonhomogeneous.
A homogeneous system is always consistent since the trivial or zero solution
consisting of all variables set equal to zero always is one possible solution.
Consider the linear system of equations
  x1 - 2x2 + 2x3 + x4 = 1
  3x1 - 6x2 + 7x3 + 3x4 + x5 + 3x6 = 4
  2x1 - 4x2 + 7x3 + 2x4 + 4x5 + 9x6 = 7
  -2x1 + 4x2 - 2x3 - 2x4 + 3x5 + 6x6 = 2
with its associated augmented matrix representation
  [  1 -2  2  1  0  0 |  1 ]
  [  3 -6  7  3  1  3 |  4 ]
  [  2 -4  7  2  4  9 |  7 ]
  [ -2  4 -2 -2  3  6 |  2 ].
Using Gauss elimination, the final augmented matrix in row echelon form and the linear system of equations it represents are found to be
  [ 1 -2  2  1  0  0 | 1 ]        x1 - 2x2 + 2x3 + x4 = 1
  [ 0  0  1  0  1  3 | 1 ]              x3 + x5 + 3x6 = 1
  [ 0  0  0  0  1  0 | 2 ]                         x5 = 2
  [ 0  0  0  0  0  0 | 0 ]                        (0 = 0)
As we solve backwards starting with x5, notice that x6, x4, and x2 are completely unconstrained, to the extent that whatever values we might choose for them will lead to a valid solution of the linear system, provided that values for the remaining variables are chosen consistently with the remaining equations. If we label the values of x6, x4, and x2 respectively with the free parameters r, s, and t, say, then we obtain x5 = 2, x6 = r (free), x3 = -1 - 3r, x4 = s (free), x2 = t (free), and x1 = 3 + 6r - s + 2t. Note that the system has infinitely many solutions since we have an infinite number of possible choices for assigning values to r, s, and t.
This example represents a situation occurring for a large class of linear systems, as described in the following theorem.

Theorem 1.2. A consistent system of linear equations with more unknowns than equations always has infinitely many solutions. As an important special case, a homogeneous system of linear equations with more unknowns than equations always has infinitely many nontrivial solutions.
The idea underlying Theorem 1.2 involves first noticing that the pivot in each row is associated with a variable that is solved for in terms of remaining variables and the right hand side constants. Since the maximum number of these leading nonzero entries can be no larger than the total number of equations, which by hypothesis is strictly smaller than the total number of variables, there must be "left-over" variables that are associated with free parameters and that therefore could take on an infinite number of possible values. Simply stated, there are too few equations to completely specify the unknowns, so there will always be at least one free parameter able to take arbitrary values. Notice that a linear system with more unknowns than equations might have no solutions, and that a linear system having an infinite number of solutions might have fewer unknowns than equations (e.g., consider
  x1 - 2x2 = 1
  -2x1 + 4x2 = -2
  3x1 - 6x2 = 3
which has the family of solutions (1 + 2s, s) with s varying over R).
If we have multiple linear systems of equations all having the same left hand side coefficients but differing only in the right hand side constants, we do not have to re-solve each system separately from scratch. Some advantage can be taken from the fact that the elementary row operations do not combine values across columns. For example, suppose we have two linear systems represented in terms of augmented matrices as

  [ 1 3 1 |  3 ]        [ 1 3 1 | 2 ]
  [ 2 5 3 | -4 ]        [ 2 5 3 | 2 ]
  [ 3 9 4 |  1 ]        [ 3 9 4 | 3 ].

We can represent both linear systems together in a single "fat" augmented matrix as

  [ 1 3 1 |  3  2 ]
  [ 2 5 3 | -4  2 ]
  [ 3 9 4 |  1  3 ]

and proceed with just a single reduction task instead of the original two that we had faced:

      |  (Type 1)
      v
  [ 1  3  1 |   3   2 ]
  [ 0 -1  1 | -10  -2 ]
  [ 0  0  1 |  -8  -3 ]
      |  (Type 1, Type 3)
      v
  [ 1  0  4 | -27  -4 ]
  [ 0  1 -1 |  10   2 ]
  [ 0  0  1 |  -8  -3 ]
      |  (Type 1)
      v
  [ 1  0  0 |   5   8 ]
  [ 0  1  0 |   2  -1 ]
  [ 0  0  1 |  -8  -3 ]

Thus the solution set to the first system is (5, 2, -8) and to the second system is (8, -1, -3).
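As a quick numerical check (an illustrative NumPy sketch, not part of the notes), both right-hand sides from the example above can be handed to the solver at once as the columns of a single matrix:

```python
import numpy as np

A = np.array([[1., 3., 1.],
              [2., 5., 3.],
              [3., 9., 4.]])
B = np.array([[ 3., 2.],
              [-4., 2.],
              [ 1., 3.]])    # the two right-hand sides, stored as columns

X = np.linalg.solve(A, B)    # one reduction, both systems solved
print(X[:, 0])               # [ 5.  2. -8.]
print(X[:, 1])               # [ 8. -1. -3.]
```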
In exercise problems 1.1 - 1.4, find the solution sets for the given systems.
x1 + x2 + 2x3 = 8
x1 + x2 + 2x3 = 4
Exercise 1.1. 2x1 9x2 + 7x3 = 11
2x1 9x2 + 7x3 = 4
x1 2x2 + 3x3 = 1
x1 2x2 + 3x3 = 9
Exercise 1.2. x1 x2 = 2
x1 x2 = 7
2x1 + x2 = 1
2x1 + x2 = 5
3x1 + 2x2 = 1
3x1 + 2x2 = 6
Exercise 1.3. x1 2x2 + x3 4x4 = 1
2x1 + x2 + 8x3 2x4 = 3
2x1 9x2 4x3 14x4 = 1
Exercise 1.4.
2x1 + 2x2 x3 + x5 = 0
x1 x2 + 2x3 3x4 + x5 = 0
x1 + x2 2x3 x5 = 0
x3 + x4 + x5 = 0
Problem 1.1. How many arithmetic operations should you expect to be necessary to solve a system of m linear equations in n unknowns using Gauss elimination?

1. Give an explicit expression for the leading terms with respect to m and n. Suppose first that the total number of arithmetic operations has the form
     a n^3 + b n^2 m + c n m^2 + d m^3 + ...
   where "..." includes all terms of lower order, like those that are pure second order or less in m or n. Find a, b, c, and d. For convenience, suppose that only Type 1 operations are necessary to reduce the matrix to row echelon form and that every Type 1 operation introduces only a single zero into the modified matrix. Assume that only the basic solution (all free parameters set to zero) is calculated. It might be useful to recall the elementary formulas
     sum_{k=1}^{n} k = n(n+1)/2    and    sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6.

2. Write a Matlab routine that accepts an augmented matrix of dimension m x (n+1) representing the system of equations; performs Gauss elimination with the assumptions made in part 1; and returns the basic solution. Using the flops command to count arithmetic operations, plot the number of arithmetic operations against
     m, for values of m ranging from 5, ..., 15 and with fixed n = 10;
     n, for values of n ranging from 5, ..., 15 and with fixed m = 10.
   How do your plots compare with your predictions from part 1?

3. How would the number of arithmetic operations change if p > 1 linear systems associated with different right-hand sides but the same coefficient matrix were solved simultaneously? Justify your answer either analytically or experimentally.
1.1  Vectors in R^n and C^n

Prerequisites:
Basic knowledge of vectors and Euclidean length.
Familiarity with the arithmetic of complex numbers.
Learning Objectives:
Familiarity with vector notation.
Ability to calculate the Euclidean norm, dot product, angle, and Euclidean distance between vectors in R^n and C^n.

You should already have some idea of the basic algebra of vectors. For completeness, here are some basic definitions.
Definition 1.3 (R^n). We define R^n to be the set of ordered n-tuples of real numbers

      [ x1 ]
  x = [ x2 ]
      [ .. ]
      [ xn ].

We usually call elements of R^n vectors and use boldface, lowercase, Latin letters to represent them.
We assume the usual rules for vector addition

  [ x1 ]   [ y1 ]      [ x1 + y1 ]
  [ x2 ] + [ y2 ]  :=  [ x2 + y2 ]
  [ .. ]   [ .. ]      [   ..    ]
  [ xn ]   [ yn ]      [ xn + yn ]

and scalar multiplication

     [ x1 ]      [ αx1 ]
  α  [ x2 ]  :=  [ αx2 ]
     [ .. ]      [  ..  ]
     [ xn ]      [ αxn ]

for any α in R.

Remark 1.4. Some texts use one symbol for n-dimensional columns of numbers and a different symbol for n-dimensional rows of numbers. We will follow the more common practice of letting R^n refer to both sets, with the obvious map (the transpose) between the two types of objects.
Remark 1.5. The space R^n satisfies the definition of an abstract vector space, which is described elsewhere in the series.

Example 1.6. Let x = (1, -1, 2, 3) and y = (-4, 1, 0, 2). Then
  x + y = (1 + (-4), -1 + 1, 2 + 0, 3 + 2) = (-3, 0, 2, 5)
  x - y = (1 - (-4), -1 - 1, 2 - 0, 3 - 2) = (5, -2, 2, 1)
  3y = (3(-4), 3(1), 3(0), 3(2)) = (-12, 3, 0, 6).
The basic geometric ideas of vectors are contained in the following definitions:

Definition 1.7. We equip R^n with the Euclidean norm
  ||x|| := sqrt( x1^2 + x2^2 + ... + xn^2 )
and the dot product
  x · y := x1 y1 + x2 y2 + ... + xn yn.
Note that
  ||x||^2 = x · x.
The angle θ between two nonzero vectors is defined by the identity
  cos θ := (x · y) / ( ||x|| ||y|| ).
The distance between two vectors x and y is the norm of the difference
  ||x - y||.
We define the open ball of radius r in R about the vector x to be
  B_r(x) := { y in R^n : ||x - y|| < r }.
In R^2 we designate two special vectors of unit length, i = (1, 0) and j = (0, 1). In R^3 we write i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1).
Example 1.8. Let x = (1, -1, 2, 3) and y = (-4, 1, 0, 2). Then
  ||x|| = sqrt( 1^2 + (-1)^2 + 2^2 + 3^2 ) = sqrt(15),
  ||y|| = sqrt( (-4)^2 + 1^2 + 0^2 + 2^2 ) = sqrt(21),
  x · y = (1)(-4) + (-1)(1) + (2)(0) + (3)(2) = 1.
The distance between the two vectors is given by
  ||x - y|| = sqrt( (1-(-4))^2 + (-1-1)^2 + (2-0)^2 + (3-2)^2 ) = sqrt(34).
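These quantities map directly onto NumPy calls; a brief sketch (added for illustration) reproducing Example 1.8:

```python
import numpy as np

x = np.array([1, -1, 2, 3])
y = np.array([-4, 1, 0, 2])

print(np.linalg.norm(x))          # sqrt(15)
print(np.linalg.norm(y))          # sqrt(21)
print(np.dot(x, y))               # 1
print(np.linalg.norm(x - y))      # sqrt(34)

# angle between x and y from the cosine identity, in radians
theta = np.arccos(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
print(theta)
```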
Problem 1.2. Compute the distance between (3, 2) and (7, 4). Compute the angle between them.
The following two theorems state simply that the Euclidean norm and dot product obey more abstract definitions of a norm and "inner product" discussed elsewhere in the series.

Theorem 1.9. The Euclidean norm in R^n satisfies the following properties.
1. For any x in R^n and any scalar α in R we have
     ||αx|| = |α| ||x||.
   (Note that |α| denotes the absolute value of α, while ||x|| denotes the Euclidean norm of x.)
2. For any x in R^n we have ||x|| >= 0, and equality holds if and only if x = 0.
3. For any x in R^n and y in R^n we have the triangle inequality
     ||x + y|| <= ||x|| + ||y||.

Theorem 1.10. The dot product in R^n satisfies the following properties:
1. For any x, y, z in R^n and any scalars α in R and β in R we have
     (αx + βy) · z = α(x · z) + β(y · z).
2. For any x in R^n and y in R^n,
     x · y = y · x.
3. For any x in R^n we have x · x >= 0, and equality holds if and only if x = 0.
Definition 1.11 (C^n). The space C^n of ordered n-tuples of complex numbers is defined in much the same way as R^n except that scalars are complex numbers. The Euclidean norm of z in C^n is defined by
  ||z|| := sqrt( |z1|^2 + |z2|^2 + ... + |zn|^2 )
and the dot product by
  z · w := z̄1 w1 + z̄2 w2 + ... + z̄n wn.
(Here |zi| is the modulus and z̄i is the complex conjugate of the complex number zi.) Note that
  ||z||^2 = z · z.
Problem 1.3. Show that the Euclidean norm in C^n satisfies the following properties.
1. For any z in C^n and any scalar α in C we have
     ||αz|| = |α| ||z||.
   (Note that |α| denotes the modulus of the complex number α, while ||z|| denotes the Euclidean norm of z.)
2. For any z in C^n we have ||z|| >= 0, and equality holds if and only if z = 0.
3. For any z in C^n and w in C^n we have the triangle inequality
     ||z + w|| <= ||z|| + ||w||.

Problem 1.4. Show that the dot product in C^n satisfies the following properties.
1. For any z, w, y in C^n and any scalars α in C and β in C we have
     z · (αw + βy) = α(z · w) + β(z · y).
2. For any z in C^n and w in C^n, z · w is the complex conjugate of w · z.
3. For any z in C^n, w in C^n, and any scalar α in C we have
     (αz) · w = ᾱ (z · w).
4. For any z in C^n we have z · z >= 0, and equality holds if and only if z = 0.

Remark 1.12. Note that we have defined the complex dot product so that only the first vector sees conjugation and no conjugation occurs on the entries of the second vector. Many texts do exactly the opposite, so you should watch to see which convention is used.
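As an aside (an illustrative sketch, not from the notes): NumPy's vdot happens to follow the same convention, conjugating its first argument.

```python
import numpy as np

z = np.array([1 + 2j, 3 - 1j])
w = np.array([2 - 1j, 1j])

# np.vdot conjugates its FIRST argument, matching the convention used here.
print(np.vdot(z, w))                  # z-bar dotted with w
print(np.sum(np.conj(z) * w))         # same value, written out explicitly
print(np.vdot(z, z).real)             # ||z||^2, always real and nonnegative
```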
1.2  Matrix Operations

Prerequisites:
Familiarity with elementary matrix arithmetic.
Learning Objectives:
Familiarity with matrix notation.
Review of basic operations such as addition, scalar multiplication, multiplication, and transposition of matrices.
Familiarity with block matrix operations.
The ability to express elementary row operations as matrix operations.
Recall that an m x n matrix A is a rectangular array of real or complex numbers having m rows and n columns. We denote the entry (or component) of A in the ith row and jth column by aij and write the matrix A represented in terms of its components as [aij].
To add two matrices A = [aij] and B = [bij], both must be of the same size, and we define the sum C = [cij] as
  A + B = C = [aij + bij],
indicating that matrices are added componentwise and that cij = aij + bij for each i and j.
We denote the m x n matrix all of whose entries are zero by 0_{mn}, or simply 0 if the context makes the matrix size unambiguous. Clearly,
  A + 0 = 0 + A = A.
Scalar multiplication of a matrix A by a (real or complex) scalar α is defined by
  αA = [α aij].
If A = [aij] is an m x p matrix and B = [bij] is a p x n matrix, we define the matrix product AB as an m x n matrix C = [cij] with
  cij = sum over l = 1, ..., p of  a_{il} b_{lj}.
Note that cij is just the dot product of the ith row of A with the jth column of B, each considered as a vector in R^p. Recall that the matrix product of two matrices is usually not commutative, that is, AB is not usually equal to BA, even when both products are defined. However, matrix products are always associative, that is, (AB)C = A(BC).
We define the square n x n identity matrix In = [δij] so that δij = 0 if i is not equal to j and δii = 1. For example,
  I2 = [ 1 0 ]       I3 = [ 1 0 0 ]
       [ 0 1 ],           [ 0 1 0 ]
                          [ 0 0 1 ],   etc.
The Greek letter δ is generally used for the elements of In for historical reasons, but this use does serve to avoid confusion since the expected letter i is so often used as a subscript. In this context, δij is also called the "Kronecker delta." If the matrix dimension is unambiguous then we simply write the identity matrix as I. It is easy to see that if A is an m x n matrix then
  A In = Im A = A.
We will use the convention throughout these notes that n x 1 matrices typically will be denoted by lower case boldface letters and will usually be referred to as vectors. Thus, defining A, B, and x as
  A = [ 1 2  2 ]       B = [  3 ]       x = [ x ]
      [ 3 5  4 ]           [  5 ]           [ y ]
      [ 2 1 -3 ],          [ -7 ],          [ z ],
our original system (1.1) can then be written compactly with the aid of matrix multiplication as
  Ax = B.
It is occasionally useful to view the matrix product of two matrices broken down in terms of matrix products of smaller submatrices. Consider the pair of matrices
  A = [ 1 2 3 1 ]       B = [ 3 2 ]
      [ 3 4 1 2 ]           [ 1 1 ]
                            [ 2 1 ]
                            [ 2 3 ]
and label the submatrices
  A11 = [ 1 2 ]    A12 = [ 3 1 ]    B11 = [ 3 2 ]    B21 = [ 2 1 ]
        [ 3 4 ],         [ 1 2 ],         [ 1 1 ],         [ 2 3 ].
A simple calculation will verify that
  AB = [ 13 10 ] = A11 B11 + A12 B21.
       [ 19 17 ]
The right-hand expression is an example of partitioned matrix multiplication.
More generally, suppose that the matrices A and B can be partitioned as
  A = [ A11 A12 ... A1p ]        B = [ B11 B12 ... B1r ]
      [ A21 A22 ... A2p ]            [ B21 B22 ... B2r ]
      [  :    :        : ]            [  :    :        : ]
      [ Aq1 Aq2 ... Aqp ]            [ Bp1 Bp2 ... Bpr ]
where the Aij and Bij are k x k matrices (so that, in particular, both row and column dimensions of each matrix are divisible by k). Focus on an entry of the matrix product C = AB, say, the (1,2) entry. This is the dot product of the first row of A with the second column of B, which can be broken down into a sum of dot products
  sum over i = 1, ..., p of  (first row of A1i) · (second column of Bi1).
Furthermore, this entry is exactly the (1,2) entry of the matrix
  C11 = sum over i = 1, ..., p of  A1i Bi1,
and in fact, the entire matrix product C = AB can be partitioned as
  C = [ C11 C12 ... C1r ]
      [ C21 C22 ... C2r ]
      [  :    :        : ]
      [ Cq1 Cq2 ... Cqr ]
with
  Cij = sum over l = 1, ..., p of  Ail Blj.
This idea can be extended without difficulty to situations where the submatrices are rectangular and of different sizes, provided only that the required operations on submatrices are all well-defined. A particularly useful, albeit special, example of this arises when C and B are partitioned by columns:
  C = [ C1 C2 ... Cn ]       B = [ B1 B2 ... Bn ].
Then the columns of C = AB can be directly expressed in terms of the columns of B as Ci = A Bi for each i = 1, ..., n.
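A small NumPy sketch (added for illustration) confirms both the block formula and the column-wise special case on arbitrary conformally partitioned matrices:

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4).astype(float)
B = np.arange(16, 0, -1).reshape(4, 4).astype(float)

# Partition each matrix into 2 x 2 blocks.
A11, A12 = A[:2, :2], A[:2, 2:]
B11, B21 = B[:2, :2], B[2:, :2]

C = A @ B
C11 = A11 @ B11 + A12 @ B21               # block formula for the (1,1) block
print(np.allclose(C[:2, :2], C11))        # True

# Columns of C are A times the corresponding columns of B.
print(np.allclose(C[:, 1], A @ B[:, 1]))  # True
```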
The transpose of an m x n matrix A may be defined as the n x m matrix, T, obtained by interchanging rows with columns in A, in effect "twirling" A a half turn about its main diagonal. For example, if
  A = [ 3 2 1 ]       then       T = [ 3 0 ]
      [ 0 4 7 ]                      [ 2 4 ]
                                     [ 1 7 ].
The matrix T is typically written as A^t.
An alternate, though occasionally more useful, definition uses the dot product u · v = sum_i ui vi and gives the transpose matrix T = [tij] of A as that particular n x m matrix such that (Tx) · y = x · (Ay) for all x in R^m and y in R^n. Taking this alternate definition as a starting point, notice then that
  (Tx) · y = sum_{i=1}^{n} ( sum_{j=1}^{m} tij xj ) yi    and    x · (Ay) = sum_{j=1}^{m} xj ( sum_{i=1}^{n} aji yi ),
implying that
  sum_{i=1}^{n} sum_{j=1}^{m} ( tij - aji ) xj yi = 0.
Since we ask that this be true for all x in R^m and y in R^n, it must be true that tij = aji, which is equivalent to the first (usual) definition of the transpose.
The following properties of the transpose are easily proved:
1. (A^t)^t = A.
2. (A + B)^t = A^t + B^t.
3. (αA)^t = α A^t, for α in R.
4. (AB)^t = B^t A^t.
Notice that the dot product itself can be written as x · y = x^t y.
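Both the product rule (AB)^t = B^t A^t and the defining identity (A^t x) · y = x · (A y) are easy to spot-check numerically; the following sketch (added for illustration) does so for the 2 x 3 example above:

```python
import numpy as np

A = np.array([[3., 2., 1.],
              [0., 4., 7.]])
B = np.array([[1., 0.],
              [2., -1.],
              [0., 3.]])

print(np.allclose((A @ B).T, B.T @ A.T))        # (AB)^t = B^t A^t

x = np.array([1., -2.])        # x in R^m with m = 2
y = np.array([0., 5., 1.])     # y in R^n with n = 3
print(np.isclose((A.T @ x) @ y, x @ (A @ y)))   # (A^t x).y = x.(A y)
```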
Transposition of partitioned matrices is straightforward:
  [ C11 C12 ... C1r ]^t     [ C11^t C21^t ... Cq1^t ]
  [ C21 C22 ... C2r ]    =  [ C12^t C22^t ... Cq2^t ]
  [  :    :        : ]      [   :     :          :  ]
  [ Cq1 Cq2 ... Cqr ]       [ C1r^t C2r^t ... Cqr^t ].
Because of the observation that
  Row k of the product (BA) = (Row k of B) · A,
it immediately follows that elementary row operations satisfy a type of associativity with multiplication that can be expressed as
  rowop(BA) = rowop(B) · A,
where "rowop(B)", for example, indicates the outcome of an elementary row operation applied to B. If we take B = I in particular, then for any matrix A, rowop(A) = rowop(I) · A. From this, one can see that any of the three types of elementary row operations can be applied to a given matrix through premultiplication by an appropriate elementary matrix (given by rowop(I) above). For example, if A is a matrix with four rows, consider the following:

Type 1 example: "Add α(row 2) to (row 4), replacing (row 4) with the result" can be accomplished by premultiplication with
  [ 1 0 0 0 ]
  [ 0 1 0 0 ]
  [ 0 0 1 0 ]
  [ 0 α 0 1 ]

Type 2 example: "Interchange (row 1) and (row 3)" can be accomplished by premultiplication with
  [ 0 0 1 0 ]
  [ 0 1 0 0 ]
  [ 1 0 0 0 ]
  [ 0 0 0 1 ]

Type 3 example: "Multiply (row 3) by α" can be accomplished by premultiplication with
  [ 1 0 0 0 ]
  [ 0 1 0 0 ]
  [ 0 0 α 0 ]
  [ 0 0 0 1 ]

The reduction phase of Gauss elimination on an augmented matrix "A" then is expressible in terms of a sequence of premultiplications by elementary matrices:
  Ek Ek-1 ... E2 E1 A = R,
where Ei is the elementary matrix that does the ith elementary operation during the reduction. Here, k is the total number of elementary row operations that were necessary and R is the final row echelon matrix.
What happens if we postmultiply by elementary matrices? A moment's reflection should reveal that now columns are combined with one another via analogous elementary operations. Running through the previous examples now for a matrix A with four columns:

Type 1 example: "Add α(column 4) to (column 2), replacing (column 2) with the result" can be accomplished by postmultiplication of A with
  [ 1 0 0 0 ]
  [ 0 1 0 0 ]
  [ 0 0 1 0 ]
  [ 0 α 0 1 ]

Type 2 example: "Interchange (column 1) and (column 3)" can be accomplished by postmultiplication of A with
  [ 0 0 1 0 ]
  [ 0 1 0 0 ]
  [ 1 0 0 0 ]
  [ 0 0 0 1 ]

Type 3 example: "Multiply (column 3) by α" can be accomplished by postmultiplication of A with
  [ 1 0 0 0 ]
  [ 0 1 0 0 ]
  [ 0 0 α 0 ]
  [ 0 0 0 1 ]
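The rowop(A) = rowop(I)·A recipe, and its column-wise counterpart under postmultiplication, can be verified directly; the sketch below (added for illustration, with the multiplier α taken to be 2) does exactly that:

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4).astype(float)
alpha = 2.0

# Type 1: "add alpha*(row 2) to (row 4)" as premultiplication by rowop(I).
E1 = np.eye(4)
E1[3, 1] = alpha
A_row = A.copy()
A_row[3, :] += alpha * A[1, :]
print(np.allclose(E1 @ A, A_row))                 # True

# Type 2: interchange rows 1 and 3 (0-based indices 0 and 2).
E2 = np.eye(4)[[2, 1, 0, 3], :]
print(np.allclose(E2 @ A, A[[2, 1, 0, 3], :]))    # True

# Postmultiplying by E1 instead combines COLUMNS:
# add alpha*(column 4) to (column 2).
A_col = A.copy()
A_col[:, 1] += alpha * A[:, 3]
print(np.allclose(A @ E1, A_col))                 # True
```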
Problem 1.5. How would you effect the following operations on a 3 x 4 matrix A using only matrix multiplication? In each case give the matrix E that is needed and whether AE or EA is to be computed. Notice that only the first two correspond to elementary row operations.
1. Add 3 times row 2 to row 3.
2. Interchange rows 1 and 3.
3. Multiply column 2 by 6.
4. Delete row 2 to obtain a 2 x 4 matrix.
Now give the result of the matrix operations described above when performed in the order given if A is defined as
  A = [ 1 2 3 4 ]
      [ 4 3 2 1 ]
      [ 2 4 3 1 ].
Write the result also in the form E1 A E2, giving both E1 and E2.
1.3  Matrix Inverses

Prerequisites:
Linear systems of equations
Matrix arithmetic
Gauss elimination
Elementary matrices
Learning Objectives:
Familiarity with the definition of left and right inverses of a matrix.
Familiarity with the inverses of elementary matrices.
Ability to determine whether a matrix has a left inverse or a right inverse and to compute those matrices when they exist.
Ability to determine whether a matrix is invertible using several techniques.
Ability to compute the inverse of an invertible matrix.
The notion of a matrix inverse arises naturally in the very special context of the "reversibility" of elementary row operations as discussed in Section 1 and Section 1.2. In particular, if E is an elementary matrix associated with some elementary row operation and Ê is the elementary matrix associated with the elementary row operation that undoes the first, then the product ÊE = I, since the successive application of the two elementary row operations should have no net effect: the second reverses the effect of the first. An exactly analogous argument yields that EÊ = I, too. For example, the elementary row operation "Add α(row 2) to (row 4), replacing (row 4) with the result" for a matrix with four rows is associated with the elementary matrix
  E = [ 1 0 0 0 ]
      [ 0 1 0 0 ]
      [ 0 0 1 0 ]
      [ 0 α 0 1 ].
The "inverse" elementary row operation, "Subtract α(row 2) from (row 4), replacing (row 4) with the result," is then associated with the elementary matrix
  Ê = [ 1  0 0 0 ]
      [ 0  1 0 0 ]
      [ 0  0 1 0 ]
      [ 0 -α 0 1 ].
One can multiply out to verify that ÊE = EÊ = I.
What about "matrix inverses" for matrices that are not elementary matrices, or maybe not even square? This could be handy, since then matrix inverses could play a role in the solution of linear systems analogous to the role reciprocals play in the solution of scalar equations. In the scalar case, reciprocals are defined so that the product of a scalar and its reciprocal is 1. Then to solve the scalar equation αx = y for x, multiply both sides of the equation by the reciprocal of α, which we will call β (= 1/α), to get
  β(αx) = βy,   so that   1 · x = x = βy.
But how then to define "reciprocals" for matrix multiplication in general?
Inasmuch as the identity matrix is the matrix analog of the scalar "1", and following on with what we observed for elementary matrices, we might try requiring that the product of a matrix A with its "reciprocal" matrix B be the identity matrix, I. However, since AB is typically not equal to BA, either of the plausible requirements AB = I or BA = I should be considered independently of the other.
Suppose we have a matrix A in R^{m x n} and a matrix BR defined in such a way that A BR = Im. BR is called a right inverse of A. Whenever an A in R^{m x n} has a right inverse BR, then BR is in R^{n x m} and the system of equations Ax = c will be consistent for every possible right hand side c, since BR c itself will always be one possible solution:
  A(BR c) = (A BR) c = I c = c.
There may be other solutions to the linear system as well, and indeed, there could be more than one right inverse to a matrix or, in the other extreme, none at all. For example, with A given by
  A = [ 1 3 -1 ]
      [ 2 5 -1 ],
both matrices
  [  3  1 ]         [  7 -3 ]
  [ -2  0 ]   and   [ -4  2 ]
  [ -4  1 ]         [ -6  3 ]
are right inverses for A.
Tall, skinny matrices (more rows than columns) can never have right inverses. Indeed, if A in R^{m x n} with m > n has a right inverse BR, then BR in R^{n x m} is short and fat (more columns than rows), and then we are assured from Theorem 1.2 that the homogeneous system of equations BR x = 0 must have a nontrivial solution x̂ not equal to 0. But then we find ourselves in a dilemma, since now
  x̂ = I x̂ = (A BR) x̂ = A (BR x̂) = A 0 = 0.
So it can't happen that an A in R^{m x n} with m > n has a right inverse!
How do we go about calculating right inverses? Partition BR and Im by columns:
  BR = [ b1 b2 ... bm ]       Im = [ e1 e2 ... em ].
Then the condition A BR = Im means that the columns of BR are solutions to the multiple linear systems
  A bi = ei
for i = 1, ..., m. We can solve these systems simultaneously with Gauss-Jordan elimination by forming the fat augmented matrix
  [ A | e1 e2 ... em ] = [ A | Im ]
and then reducing to reduced row echelon form; for each choice of the free parameters that arise in the final stage of Gauss-Jordan elimination a set of solutions is obtained, which then leads to a right inverse. Using as an example the matrix A above, we find
  [ A | Im ] = [ 1 3 -1 | 1 0 ]
               [ 2 5 -1 | 0 1 ]
which reduces to
  [ 1 0  2 | -5  3 ]
  [ 0 1 -1 |  2 -1 ]      (reduced row echelon form).
So, labeling the free parameter as s and t for the first and second right-hand side columns respectively, all right inverses are given by
  BR = [ -5 - 2s    3 - 2t ]
       [  2 + s    -1 + t  ]
       [    s         t    ]
for all the various choices of s and t.
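Numerically, one convenient right inverse of a full-row-rank matrix is its Moore-Penrose pseudoinverse; the sketch below (an illustration, not part of the notes) checks this for the matrix A above and also verifies one member of the parameterized family (s = -4, t = 1):

```python
import numpy as np

A = np.array([[1., 3., -1.],
              [2., 5., -1.]])

# For a full-row-rank matrix, pinv(A) is one particular right inverse.
BR = np.linalg.pinv(A)
print(np.allclose(A @ BR, np.eye(2)))      # True

# A member of the family above, with s = -4 and t = 1:
BR2 = np.array([[ 3.,  1.],
                [-2.,  0.],
                [-4.,  1.]])
print(A @ BR2)                             # the 2 x 2 identity again
```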
Theorem 1.13. The following statements are equivalent (whenever one is true, they all must be true; whenever one is false, they all must be false):
A matrix A has a right inverse.
The reduced row echelon form for A has a leading one in each row.
The system of equations Ax = b is consistent for each possible right-hand side b.
Proof. We proceed by showing that the first statement implies the second, which in turn implies the third, which finally comes around and implies the first.
Suppose that A has a right inverse (call it B) but that somehow the reduced row echelon form for A fails to have a leading one in each row and so, instead, must have a bottom row of zeros (and maybe other rows of zeros too). This means that there are m x m elementary matrices E1, ..., Ek so that
  Ek Ek-1 ... E2 E1 A = [ R ]
                        [ 0 ].
Now postmultiply by the hypothesized right inverse of A to get
  Ek Ek-1 ... E2 E1 (A B) = Ek Ek-1 ... E2 E1,   so that   Ek Ek-1 ... E2 E1 = [ R ] B = [ R B ]
                                                                                [ 0 ]     [  0  ].
This amounts to saying that the product of the elementary matrices E1, ..., Ek has a bottom row of all zeros! But this can't happen, since if it were true, and Ê1, ..., Êk denoted the elementary matrices that are inverse to E1, ..., Ek, then postmultiplication in turn by Ê1, Ê2, ..., Êk-1, and Êk yields
  I = [ R B ] Ê1 Ê2 ... Êk-1 Êk = [ R B Ê1 Ê2 ... Êk-1 Êk ]
      [  0  ]                      [           0           ],
which can't be true, since the identity matrix has no rows of zeros. Thus, anytime a matrix A has a right inverse, it must also have a reduced row echelon form with a leading one in each row.
Now suppose the reduced row echelon form for A in R^{m x n} has a leading one in each row. This means, in particular, that m <= n and that every system of equations of the form Ax = b is consistent for any b that might be chosen (since after row reduction every equation is associated with a different variable, and possibly some free parameters). Consistency for all right hand sides, in turn, implies that there are solutions to each of the multiple linear systems
  A bi = ei
for i = 1, ..., m, and the condition A BR = Im can be satisfied column by column. Thus, anytime a matrix A has a reduced row echelon form with a leading one in each row, Ax = b is consistent for every b; and if a matrix A is such that Ax = b is consistent for every b, then A must have a right inverse.
Working now from the other side, suppose instead we have a matrix BL so that BL A = I. Appropriately enough, BL is called a left inverse of A. If A in R^{m x n} has a left inverse BL, then BL is in R^{n x m} and, whenever the system of equations Ax = b is consistent (it might not be), there can only be one unique solution to the system. Indeed, suppose there were two solutions x1 and x2 to the linear system Ax = b. Then using BL A = I we find
  x1 = I x1 = BL A x1 = BL b = BL A x2 = I x2 = x2
(the two solutions are the same!).

Theorem 1.14. The following statements are equivalent (whenever one is true, they all must be true; whenever one is false, they all must be false):
A matrix A has a left inverse.
The reduced row echelon form for A has a leading one in each column.
Whenever the system of equations Ax = b is consistent, there can only be one unique solution to the system.
Proof. We will proceed by showing that the first two statements are equivalent and then showing that the second and third statements are equivalent.
Suppose the reduced row echelon form for A in R^{m x n} with m >= n has a leading one in each column. Then there are m x m elementary matrices E1, ..., Ek so that
  Ek Ek-1 ... E2 E1 A = [ In ]
                        [ 0  ].
Define BL = [ In  Z ] Ek Ek-1 ... E2 E1 for any Z in R^{n x (m-n)}. Then direct substitution verifies that BL A = In, and we have exhibited a left inverse for A.
Conversely, suppose that A does have a left inverse BL in R^{n x m} but that nonetheless it somehow fails to have a reduced row echelon form with a leading one in each column. Then Gauss-Jordan elimination on the augmented matrix [A | 0] representing the homogeneous system of equations Ax = 0 yields
  [ R 0 ]
  [ 0 0 ],
where R contains the nontrivial rows (and whatever nonzero entries remain) of the reduced row echelon form. Since R contains fewer than n leading ones, it must be short and fat, and by Theorem 1.2 there will be a nontrivial solution x̂ not equal to 0 to Rx = 0 and hence to Ax = 0. A conflict arises, since then
  x̂ = I x̂ = BL (A x̂) = BL 0 = 0.
So whenever A has a left inverse, its reduced row echelon form has a leading one in each column.
To show that the second and third statements are equivalent, observe that the reduced row echelon form of the augmented matrix [A | b] will allow no free parameters exactly when the reduced row echelon form for A has leading ones in each column. No free parameters in the reduced row echelon form of the augmented matrix [A | b] amounts to the assertion that the solution to Ax = b is unique, if it exists (the system might still be inconsistent, after all).
Just as for right inverses, a matrix could have more than one left inverse, or none at all. Short fat matrices cannot have left inverses. Similar to the case of right inverses (but not in exactly the same way), all left inverses of a matrix can be revealed through Gauss elimination. Notice first that if a matrix A has a left inverse BL, then BL^t is a right inverse to A^t. So to compute a left inverse, we go through the process described above to calculate the right inverse of the transposed matrix and transpose the result. For example, the set of left inverses to the matrix A defined as
  A = [ 1 1 ]
      [ 1 1 ]
      [ 2 3 ]
can be obtained from
  [ A^t | Im ] = [ 1 1 2 | 1 0 ]
                 [ 1 1 3 | 0 1 ]
which reduces to
  [ 1 1 0 |  3 -2 ]
  [ 0 0 1 | -1  1 ]      (reduced row echelon form).
Labeling the free parameter as s and t for the first and second right-hand side columns respectively, all left inverses are given by
  BL = [  3 - s    s   -1 ]
       [ -2 - t    t    1 ]
for all the various choices of s and t.
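The same kind of numerical check works for left inverses; the sketch below (added for illustration) verifies the member of the family above with s = t = 0 and, separately, that the pseudoinverse of a full-column-rank matrix is a left inverse:

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.],
              [2., 3.]])

# A left inverse from the family above, taking s = 0 and t = 0:
BL = np.array([[ 3., 0., -1.],
               [-2., 0.,  1.]])
print(BL @ A)                                           # the 2 x 2 identity

# pinv gives another left inverse when A has full column rank.
print(np.allclose(np.linalg.pinv(A) @ A, np.eye(2)))    # True
```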
Note that a solution to the scalar equation αx = y exists and is unique for each right-hand side y if and only if α is nonzero, and this is precisely the condition under which α has a reciprocal β = 1/α. By analogy, then, it is natural to take the circumstances under which a solution x̂ to a system of linear equations Ax = b is guaranteed both to exist and to be unique for each right-hand side b to be exactly the circumstances under which we consider A to be "invertible." From the above discussion, such a situation implies that both a left and a right inverse exist for A. Notice this implies that m = n, that is, that A is a square matrix. Furthermore, if both BR and BL exist then in fact
  BL = BL I = BL A BR = I BR = BR
(BL and BR must be equal!). This leads us to the following definition.
An n x n matrix A is said to be invertible (or equivalently, nonsingular) if there exists an n x n matrix B such that AB = I and BA = I. The matrix B is called the inverse of A and is denoted by B = A^{-1}. If no such B exists that satisfies both AB = I and BA = I, we say that A is noninvertible or singular. Again, as we have defined it, only square matrices are candidates for invertibility (though left and right inverses can serve as useful generalizations to rectangular matrices).
Theorem 1.15. The following statements are equivalent (whenever one is true, they all must be true; whenever one is false, they all must be false):
A matrix A is invertible.
The matrix A is square and, whenever the system of equations Ax = b is consistent, there can only be one unique solution to the system.
The reduced row echelon form for A is the identity matrix, In.
The matrix A is square and the system of equations Ax = b is consistent for each possible right-hand side b.
This result has as a pleasant consequence that a square matrix A has a left inverse if and only if it has a right inverse as well. Furthermore, if A(1) and A(2) are two invertible matrices of the same size, then the product A(1) A(2) is also invertible and
  ( A(1) A(2) )^{-1} = ( A(2) )^{-1} ( A(1) )^{-1},
that is, the inverse of a product is the product of the inverses taken in reverse order. This is easy to see by writing down
  ( A(1) A(2) ) ( A(2) )^{-1} ( A(1) )^{-1} = A(1) ( A(2) ( A(2) )^{-1} ) ( A(1) )^{-1} = A(1) I ( A(1) )^{-1} = I.
Thus, ( A(2) )^{-1} ( A(1) )^{-1} is a (square) right inverse for A(1) A(2) and hence is also its left inverse.
One can check (by multiplying out) that the matrix
  A = [ 1 2 3 ]
      [ 2 5 3 ]
      [ 1 0 8 ]
is invertible with inverse given by
  A^{-1} = [ -40  16   9 ]
           [  13  -5  -3 ]
           [   5  -2  -1 ].
Given A, we compute its inverse by solving the system AB = I, as above. This leads us to the following procedure. Write the fat augmented matrix
  [ 1 2 3 | 1 0 0 ]
  [ 2 5 3 | 0 1 0 ]
  [ 1 0 8 | 0 0 1 ]
and reduce the left-hand 3 x 3 submatrix to reduced row echelon form using elementary row operations. The resulting right-hand 3 x 3 matrix will be the inverse, if the resulting reduced row echelon form is the identity matrix. We pick up the process from the time the left matrix is in row echelon form:
  [ 1 2  3 |  1  0  0 ]
  [ 0 1 -3 | -2  1  0 ]
  [ 0 0  1 |  5 -2 -1 ]
      |
      v
  [ 1 2 0 | -14   6   3 ]
  [ 0 1 0 |  13  -5  -3 ]
  [ 0 0 1 |   5  -2  -1 ]
      |
      v
  [ 1 0 0 | -40  16   9 ]
  [ 0 1 0 |  13  -5  -3 ]
  [ 0 0 1 |   5  -2  -1 ]
Example 1.16. Solve the system
  x1 + 2x2 + 3x3 = 1
  2x1 + 5x2 + 3x3 = 0
  x1 + 8x3 = 1
Writing the system as Ax = b, we obtain
  x = A^{-1} b = [ -40  16   9 ] [ 1 ]
                 [  13  -5  -3 ] [ 0 ]
                 [   5  -2  -1 ] [ 1 ].
Multiplying out, the final solution is x1 = -31, x2 = 10, and x3 = 4.
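A short NumPy check of this example (added for illustration); note that in practice one usually calls solve rather than forming the inverse explicitly:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 5., 3.],
              [1., 0., 8.]])
b = np.array([1., 0., 1.])

Ainv = np.linalg.inv(A)
print(Ainv)                     # [[-40, 16, 9], [13, -5, -3], [5, -2, -1]]
print(Ainv @ b)                 # [-31.  10.   4.]
print(np.linalg.solve(A, b))    # same solution, without forming the inverse
```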
If a matrix is not invertible, the procedure used for computing inverses will exhibit this, just as Gauss elimination applied to an inconsistent system will reveal that inconsistency. For example, let
  A = [ 1 1 1 ]
      [ 1 2 0 ]
      [ 2 3 1 ].
At one point in the reduction procedure we arrive at the augmented matrix
  [ 1 0  2 |  2 -1 0 ]
  [ 0 1 -1 | -1  1 0 ]
  [ 0 0  0 | -1 -1 1 ].
We can stop here, since the left 3 x 3 matrix is not I but is in reduced row echelon form. Our conclusion is that A has no inverse.
In Problems 1.6 and 1.7, determine which matrices are invertible. If the matrix is invertible, find its inverse.

Problem 1.6.
  [ 1 0 2 ]
  [ 3 1 3 ]
  [ 2 3 6 ]

Problem 1.7.
  [ 2  4 1 ]
  [ 1  4 3 ]
  [ 4 12 5 ]
Problem 1.8.
1. A matrix D = [dij] is called diagonal if all entries off the main diagonal are zero: dij = 0 whenever i is not equal to j. Show that a square diagonal matrix with no zero entries on the main diagonal (dii nonzero) is invertible and find its inverse.
   Suppose A = (aij) is n x n. If a_{1n}, a_{2,n-1}, ..., a_{n1} are all nonzero while all the other entries of A are zero, show that A is invertible and find its inverse.
2. A matrix T = [tij] is called upper triangular if all entries below the main diagonal are zero: tij = 0 whenever i > j. Show that a square upper triangular matrix with no zero entries on the main diagonal (tii nonzero) is invertible and show that its inverse is also upper triangular.

Problem 1.9.
1. Explain carefully why short fat matrices A in R^{m x n} with m < n cannot have left inverses.
2. Suppose A(1) and A(2) are two matrices dimensioned so that the product A(1) A(2) is well-defined. Show that if A(1) and A(2) each have left inverses, B(1)_L and B(2)_L respectively, then the product A(1) A(2) also has a left inverse, given by B(2)_L B(1)_L.
3. Give an example of a matrix that has neither a left nor a right inverse.
4. If a matrix A in R^{m x n} with m < n has a right inverse, how many free parameters does the family of all right inverses for A have?
1.4  The LU Decomposition

Prerequisites:
Linear systems of equations
Matrix arithmetic
Gauss elimination
Elementary matrices
Matrix inverses
Learning Objectives:
Familiarity with the definition of the LU decomposition of a matrix.
Ability to use an LU decomposition to solve a linear system.
Detailed understanding of the representation of Gauss elimination using elementary matrices.
Ability to compute the permuted LU decomposition of a matrix.
Matrix decompositions play a fundamental role in modern matrix theory, as they often can reveal structural features of the transformations the matrices represent. The LU decomposition is perhaps the most basic of these decompositions and is intimately related to Gaussian elimination. Without yet saying exactly how this would be done from scratch, notice that the original coefficient matrix of (1.1),
  A = [ 1 2  2 ]
      [ 3 5  4 ]
      [ 2 1 -3 ],
can be factored into the product of two matrices LU, where U is an upper triangular matrix (that is, nonzero entries are on or above the main diagonal) and L is a unit lower triangular matrix (all nonzero entries are on or below the main diagonal, with ones on the diagonal). In particular, one can multiply out to check that A = LU with
  L = [ 1 0 0 ]        U = [ 1  2  2 ]
      [ 3 1 0 ]            [ 0 -1 -2 ]
      [ 2 3 1 ],           [ 0  0 -1 ].
Assuming that such a decomposition is known, consider the following alternate approach to solving the system of equations Ax = b. Write Ax = b as L(Ux) = b and define y = Ux. Then solve in sequence the two (simple) triangular systems. First, solve Ly = b for y, or explicitly for (1.1),
  y1 = 3                              y1 =  3
  3y1 + y2 = 5              =>        y2 = -4
  2y1 + 3y2 + y3 = -7                 y3 = -1
then solve Ux = y for x, or explicitly for (1.1),
  x1 + 2x2 + 2x3 =  3  (= y1)         x1 = -3
      -x2  - 2x3 = -4  (= y2)   =>    x2 =  2
             -x3 = -1  (= y3)         x3 =  1
A matrix A has an LU decomposition if A = LU with L a unit lower triangular matrix and with U an upper triangular matrix. Not every matrix has an LU decomposition; for example, one can attempt to complete the unknown entries of
  A = [ 0 1 ] = [ 1 0 ] [ * * ]
      [ 1 1 ]   [ * 1 ] [ 0 * ]
to see that there is no choice that would satisfy the condition A = LU.
To see how this decomposition could be computed in circumstances where it does exist, let us consider the row reduction phase of Gauss elimination on the above matrix. In this case, three elementary row operations, all of Type 1, are necessary to reduce A to row echelon form:
  Zero out the (2,1) entry by adding -3(row 1) to (row 2).
  Zero out the (3,1) entry by adding -2(row 1) to (row 3).
  Zero out the (3,2) entry by adding -3(row 2) to (row 3).
When manifested as matrix multiplications this appears as:
  [  1 0 0 ] [ 1 2  2 ]   [ 1  2  2 ]
  [ -3 1 0 ] [ 3 5  4 ] = [ 0 -1 -2 ]
  [  0 0 1 ] [ 2 1 -3 ]   [ 2  1 -3 ]

  [  1 0 0 ] [ 1  2  2 ]   [ 1  2  2 ]
  [  0 1 0 ] [ 0 -1 -2 ] = [ 0 -1 -2 ]
  [ -2 0 1 ] [ 2  1 -3 ]   [ 0 -3 -7 ]

  [ 1  0 0 ] [ 1  2  2 ]   [ 1  2  2 ]
  [ 0  1 0 ] [ 0 -1 -2 ] = [ 0 -1 -2 ]
  [ 0 -3 1 ] [ 0 -3 -7 ]   [ 0  0 -1 ]
Or, written altogether,
  E3,2 E3,1 E2,1 A = R,
where we have denoted by Ei,j the Type 1 elementary matrix that introduces a zero at the (i,j) location of A.
Observe that
  E3,2 E3,1 E2,1 = [  1  0  0 ]
                   [ -3  1  0 ]
                   [  7 -3  1 ],
whereas
  ( E3,2 E3,1 E2,1 )^{-1} = [ 1 0 0 ]
                            [ 3 1 0 ]
                            [ 2 3 1 ].
This is exactly the LU decomposition we gave above:
  A = ( E3,2 E3,1 E2,1 )^{-1} R = [ 1 0 0 ] [ 1  2  2 ]
                                  [ 3 1 0 ] [ 0 -1 -2 ] = LU.
                                  [ 2 3 1 ] [ 0  0 -1 ]
Notice that the multipliers (with changed sign) are placed at the locations in L precisely where those zeros were in the matrix that the multipliers had a role in introducing.
To see how this will work in general, consider the simplest case first, where only Type 1 operations (i.e., adding multiples of rows to one another) suffice to complete reduction to row echelon form.
As before, let Ei,j denote the Type 1 elementary matrix that introduces a zero into the (i,j) location of the matrix being reduced, and suppose specifically that this is done by adding μij (row j) to (row i). By writing it out, one may verify that Ei,j has a convenient representation:
  Ei,j = I + μij ei ej^t.
Since we only introduce zeros below the pivot entry in a given column, we can always assume that Ei,j is defined just for i > j.
Let's group together the Type 1 elementary matrices that work on the same column of A. Define the matrices
  M1 = En,1 En-1,1 ... E2,1
  M2 = En,2 En-1,2 ... E3,2
and in general
  Mj = En,j En-1,j ... Ej+1,j.
Each of the matrices Mj effects the row reduction that introduces zeros into the entries j+1, ..., n-1, n of (column j) by adding multiples of (row j). The matrix Mj is called a Gauss transformation and can be represented in the following useful way:
  Mj = I - lj ej^t,
where
  lj = [ 0, 0, ..., 0, -μ_{j+1,j}, ..., -μ_{n-1,j}, -μ_{nj} ]^t
(the first j entries are zero). This can be seen directly by multiplying out
  En,j En-1,j ... Ej+1,j = I + ( μ_{nj} en + μ_{n-1,j} en-1 + ... + μ_{j+1,j} ej+1 ) ej^t
(all other products in the expansion are 0).
Then to complete the reduction to row echelon form, we have
  Mn-1 ... M2 M1 A = R
and
  A = ( M1^{-1} M2^{-1} M3^{-1} ... Mn-1^{-1} ) R.
Now, note that
1. For each j = 1, ..., n-1, we have Mj^{-1} = I + lj ej^t (just multiply it out to check!).
2. For each i < j,
     Mi^{-1} Mj^{-1} = I + li ei^t + lj ej^t + li (ei^t lj) ej^t = I + li ei^t + lj ej^t,
   since ei^t lj = 0.
But then
  M1^{-1} ... Mn-1^{-1} = (I + l1 e1^t)(I + l2 e2^t)(I + l3 e3^t) ... (I + ln-1 en-1^t)
                        = (I + l1 e1^t + l2 e2^t)(I + l3 e3^t) ... (I + ln-1 en-1^t)
                        = (I + l1 e1^t + l2 e2^t + l3 e3^t) ... (I + ln-1 en-1^t)
                          ...
                        = I + sum_{j=1}^{n-1} lj ej^t,
which is our unit lower triangular matrix L: its subdiagonal (i,j) entry is -μij, the multiplier with changed sign, and its diagonal entries are ones. This is exactly the pattern of multipliers we saw in the first example.
The general case, where A is an n x m matrix, is somewhat more complicated. We commented in Section 1 that it is always possible to achieve row echelon form using only Type 1 and Type 2 elementary operations. But Type 2 operations play a role only when a row interchange is necessary to bring a nonzero entry into the pivot position, in order then to zero out all nonzero entries below it in the same column using subsequent Type 1 elementary operations. For example, the matrix
  [ 1 2  2 ]
  [ 3 6  4 ]
  [ 2 1 -3 ]
cannot be brought into row echelon form using only Type 1 operations. After two steps we find
  [  1 0 0 ] [  1 0 0 ] [ 1 2  2 ]   [ 1  2  2 ]
  [  0 1 0 ] [ -3 1 0 ] [ 3 6  4 ] = [ 0  0 -2 ]
  [ -2 0 1 ] [  0 0 1 ] [ 2 1 -3 ]   [ 0 -3 -7 ].
However, a row interchange at this point leaves us in row echelon form:
  [ 1 0 0 ] [  1 0 0 ] [  1 0 0 ] [ 1 2  2 ]   [ 1  2  2 ]
  [ 0 0 1 ] [  0 1 0 ] [ -3 1 0 ] [ 3 6  4 ] = [ 0 -3 -7 ]
  [ 0 1 0 ] [ -2 0 1 ] [  0 0 1 ] [ 2 1 -3 ]   [ 0  0 -2 ].
Thus, the following sequence of steps occurs in the course of row reduction in general:
1. (Possible) row interchange of (row 1) with a lower row.
2. Row reduction using (row 1) to introduce zeros into entries 2, 3, ..., n of (column 1).
3. (Possible) row interchange of (row 2) with a lower row.
4. Row reduction using (row 2) to introduce zeros into entries 3, 4, ..., n of (column 2).
5. ... and so on.
In order to represent this reduction process with matrix multiplication, let Πj denote the elementary matrix that effects a row interchange of (row j) with some appropriate lower row. The interleaving of Type 1 and Type 2 operations used to reduce to row echelon form as described above then appears as
  Mn-1 Πn-1 ... M2 Π2 M1 Π1 A = R.
The matrices Πi are elementary matrices associated with Type 2 (row interchange) operations; Πi is an example of a permutation matrix. In general, permutation matrices are obtained by permuting the columns (or rows) of the identity matrix. As a result, it's not hard to see that any product of permutation matrices is also a permutation matrix, and if P is a permutation matrix then P P^t = I.
Since products of permutation matrices are permutation matrices (how would
you explain that ?) and since products of Gauss transformations are unit lower
triangular matrices that lead in a simple way to our L, we'd like to separate out
the Type 1 and Type 2 operations and collect each together separately. The
key observation that lets us do this is that for i > j
c j i
i Mj = M
Mn 1n
1
2
2
1
1
34
Linear Equations { a Review
c j is a Gauss transformation of the same form as Mj but with two
where M
multipliers interchanged (ith and one with index bigger than i according to i ).
Putting this together,

    M_{n-1} Π_{n-1} ⋯ M_3 Π_3 M_2 Π_2 M_1 Π_1 A = R

becomes, after commuting the permutations past the Gauss transformations,

    (M_{n-1} M̂_{n-2} ⋯ M̂_3 M̂_2 M̂_1)(Π_{n-1} ⋯ Π_3 Π_2 Π_1) A = R

or, with some rearrangement,

    A = (Π_1^{-1} Π_2^{-1} Π_3^{-1} ⋯ Π_{n-1}^{-1})(M̂_1^{-1} M̂_2^{-1} M̂_3^{-1} ⋯ M̂_{n-2}^{-1} M_{n-1}^{-1}) R.

If we define

    P = Π_{n-1} ⋯ Π_3 Π_2 Π_1

and

    L = M̂_1^{-1} M̂_2^{-1} M̂_3^{-1} ⋯ M̂_{n-2}^{-1} M_{n-1}^{-1},

then we have our final conclusion:
Theorem 1.17. Any matrix A ∈ R^{m×n} may be decomposed as the product of
three matrices

    A = P^t L U

where

    P is an m × m permutation matrix,
    L is a unit lower triangular m × m matrix, and
    U is an upper triangular m × n matrix in row echelon form.
This is called the permuted LU decomposition. Notice that while not every
matrix has an LU decomposition, every matrix does have a permuted LU
decomposition. A somewhat surprising interpretation of this result is that if we
had known what row interchanges would occur in the course of Gauss elimination
and had performed them all at the outset, before any reduction occurred
(essentially forming PA in the process), we would then complete the reduction
with no further Type 2 operations, with exactly the same Type 1 operations,
and with exactly the same final R as the usual Gauss elimination with
interleaved Type 1 and Type 2 operations would produce.
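In computational practice the permuted LU decomposition is readily available in
standard libraries. The sketch below (an illustration added here, assuming NumPy
and SciPy are installed; the matrix is the small example from above) exercises
scipy.linalg.lu, whose returned permutation plays the role of P^t in Theorem 1.17.

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[1., 2., 2.],
                  [3., 6., 4.],
                  [2., 1., -3.]])

    # scipy.linalg.lu returns (P, L, U) with A = P @ L @ U, i.e. P.T @ A = L @ U.
    P, L, U = lu(A)
    print(P)                             # permutation matrix
    print(L)                             # unit lower triangular: multipliers below the diagonal
    print(U)                             # upper triangular (row echelon) factor
    print(np.allclose(A, P @ L @ U))     # True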
Chapter 2
Vector Spaces and
Linear Transformations
2.1
A Model for General Vector Spaces
Prerequisites:
Basic knowledge of vectors in R2 and R3 .
Matrix manipulation.
Basic calculus and multivariable calculus skills.
Skills in language and logic, techniques of proof.
Advanced Prerequisites:
Definitions of groups and fields.
Learning Objectives:
Understanding of the basic idea of a vector space.
Ability to prove elementary consequences of the vector space axioms.
Familiarity with several examples of vector spaces.
Familiarity with examples of sets of objects that are not vector spaces.
The concept of vector space permeates virtually all of mathematics and a
good chunk of those disciplines that call mathematics into service. Whence
comes such universality? Well-suited vocabulary, more or less. The most powerful
feature of linear algebra -- and by association, of matrix theory -- is the geometric
intuition that our daily three-dimensional experience can lend to much
more (sometimes unspeakably!) complicated settings. The language used to describe
vector spaces is a vehicle that carries many ideas and focuses thinking.
The first step in pushing past the boundaries of the "physical" vector spaces
of two and three dimensions begins with the Cartesian view of two and three
dimensional space as comprising ordered pairs of numbers (for two dimensional
space) and ordered triples (for three dimensional space).
Suppose n is a positive integer. An n-vector (usually just called a "vector"
if n is unambiguous or immaterial) is an ordered n-tuple¹ of n real numbers

        [ a_1 ]
    a = [ a_2 ]
        [  ⋮  ]
        [ a_n ]
Although we endeavor to write vectors always as columns (as opposed to rows),
the notation for matrix transpose allows us to write a also as a = (a_1, a_2, …, a_n)^t
without squandering quite as much space and without having to make silly
distinctions between "row vectors" and "column vectors". Out in the world one
sees vectors written (seemingly at random) either as rows or columns.
Let u = (u_1, u_2, …, u_n)^t and v = (v_1, v_2, …, v_n)^t be two n-vectors. We say
that u = v if and only if u_i = v_i for each i = 1, 2, …, n. We define

    u + v = (u_1 + v_1, u_2 + v_2, …, u_n + v_n)^t        {vector addition}

and

    ku = (ku_1, ku_2, …, ku_n)^t,  k ∈ R                   {scalar multiplication}.

We also define

    0 = (0, 0, …, 0)^t   and   -u = (-u_1, -u_2, …, -u_n)^t.
The set of all such ordered n-tuples, together with the two defined arithmetic
operations on elements of the set, vector addition and scalar multiplication, is
called Euclidean n-space and is denoted R^n. This vector space is the natural
generalization of the "physical" vector spaces R^2 and R^3 to higher dimension
and is the prototype for even more general vector spaces.
In general, a vector space is a set of objects (the "vectors"), a set of scalars
(usually either the real numbers, "R," or complex numbers, "C," but other
scalar fields are sometimes used), and a pair of operations, vector-vector "addition"
and scalar-vector "multiplication", that interact "sensibly" with the usual
notions of addition and multiplication for real or complex numbers.
¹ An odd expression extending the already clumsy terms: … quadruple (n = 4), quintuple
(n = 5), sextuple (n = 6), etc.
The following eight properties for "sensible" vector addition and scalar
multiplication should hold for all vectors u, v, and w in an (alleged) vector space
V and scalars α, β in R or C. These properties are easy to verify for vectors in
R^n and scalars α, β in R.
General properties of vector addition and scalar multiplication:

1. u + v = v + u                      {commutativity of vector addition}

2. u + (v + w) = (u + v) + w          {associativity of vector addition}

3. There exists a zero vector "0" such that u + 0 = 0 + u = u

4. For every vector u there exists a vector "-u" such that u + (-u) = 0

5. α(βu) = (αβ)u                      {associativity of scalar-scalar multiplication
                                        with scalar-vector multiplication}

6. α(u + v) = αu + αv                 {distributivity of scalar-vector multiplication
                                        over vector-vector addition}

7. (α + β)u = αu + βu                 {distributivity of scalar-vector multiplication
                                        over scalar-scalar addition}

8. 1·u = u
We can define C^n, complex n-space, in the same way except we (usually) take
the scalars to be complex numbers, elements of C. One can check that all
eight properties of addition and scalar multiplication still hold.
Now let V be a set. Suppose we have defined a "vector addition" u + v so
as always to produce a result that ends up in V for each u, v starting in V, and
a "scalar multiplication" αu so as to produce a result in V for each u ∈ V and
α ∈ R (or alternatively for each α ∈ C). (This is succinctly stated by saying that
the set V is closed under vector addition and scalar multiplication.) Further
suppose that the eight properties for vector addition and scalar multiplication
hold in V. Depending on whether real scalars or complex scalars are being used,
V is called a real or complex vector space.
Examples 2.1.

1. V = R^n is the model upon which the general definition of a vector space
   is based.

2. Let V be the set of vectors in R^3 of the form (x, x, y)^t with the addition
   and scalar multiplication it inherits from R^3. Clearly properties 1, 2, 5, 6, 7
   and 8 hold for V since they hold for R^3. Properties 3 and 4 are satisfied
   with 0 = (0, 0, 0)^t and -u = (-x, -x, -y)^t, where both vectors are in V.
   We only need to show that V is closed under vector addition and scalar
   multiplication. To see this just note that if (x, x, y)^t and (s, s, r)^t are in
   V, then (x + s, x + s, y + r)^t and α(x, x, y)^t = (αx, αx, αy)^t are both in V.
   Thus V is a vector space.
3. Let V = P_n be the set of all polynomials of degree less than or equal to
   n with real coefficients. If p = a_n z^n + ⋯ + a_0 and q = b_n z^n + ⋯ + b_0,
   we define

       p + q = (a_n + b_n) z^n + ⋯ + (a_0 + b_0)

   and

       αp = (αa_n) z^n + (αa_{n-1}) z^{n-1} + ⋯ + αa_0.
4. Let V be the set of m × n matrices with the usual definitions of matrix
   addition and scalar multiplication.

5. Let V be the set of all real-valued functions on an interval I. For f, g ∈ V,
   we define the functions f + g and αf by

       [f + g](x) = f(x) + g(x),    [αf](x) = αf(x).
Problem 2.1. Show that examples 2,3 and 4 are vector spaces.
Problem 2.2. Let V be the set of all polynomials of degree n. Is V a vector
space?
Problem 2.3. Show that in any vector space V with u in V and k real (or
complex) that

1. k0 = 0

2. 0u = 0

3. -u = (-1)u
A subset W of a vector space V is called a subspace of V , if W is itself a
vector space with the vector addition and scalar multiplication inherited from
V . Given a subset W of a vector space V with the vector addition and scalar
multiplication inherited from V , all that is necessary to check in order to assert
that W is a subspace is that W is closed under vector addition and scalar
multiplication. This follows from Problem 2.3 and the fact that vector addition
and scalar multiplication in W is the same as for V , and so all the general
properties of vector addition and scalar multiplication still hold in W . The
following problem asks you to prove this.
Problem 2.4. Show that a subset W of a vector space V , with the vector
addition and scalar multiplication inherited from V , is a subspace of V if and
only if W is closed under addition and scalar multiplication.
Example 2.2. Let W = {(x, y)^t ∈ R^2 : x = 3y}. Let u and v be in W. Then
u = (3s, s)^t and v = (3t, t)^t for some choice of scalars s and t. Then

    u + v = (3s + 3t, s + t)^t = (3(s + t), s + t)^t ∈ W

and

    αu = (α(3s), αs)^t = (3(αs), αs)^t ∈ W.

W is a subspace of R^2 since W is closed under vector addition and scalar
multiplication.
Problem 2.5.

1. Show W = {(x, y, z)^t ∈ R^3 : x + y - 2z = 0} is a subspace of R^3.

2. Show W = {(x, y)^t ∈ R^2 : x = y + 1} is not a subspace of R^2.
While there are an infinite variety of subspaces that can be considered, there
are two especially important subspaces associated with any given m × n matrix
A. Define the kernel of A and the range of A by

    Ker(A) = {x ∈ R^n : Ax = 0}

and

    Ran(A) = {y ∈ R^m : there exists an x ∈ R^n such that Ax = y}.

The kernel of a matrix is evidently the set of all possible solutions to the
homogeneous system Ax = 0 and the range of a matrix is the set of all right
hand sides, b, for which the linear system of equations Ax = b is consistent.
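Both subspaces can be explored numerically. The following sketch (an added
illustration assuming NumPy and SciPy; the matrix and right-hand side are arbitrary)
computes a basis for the kernel and tests whether a given b lies in the range.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1., 2., 0.],
                  [2., 4., 1.]])          # an illustrative 2x3 matrix

    K = null_space(A)                      # columns form an (orthonormal) basis of Ker(A)
    print(np.allclose(A @ K, 0))           # True: every column is mapped to 0

    # b lies in Ran(A) exactly when Ax = b is consistent; a least-squares solve
    # followed by a residual check tests this.
    b = np.array([1., 3.])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.allclose(A @ x, b))           # True here, so b is in Ran(A)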
Theorem 2.3. Let A be an m × n matrix. Then

1. Ker(A) is a subspace of R^n.

2. Ran(A) is a subspace of R^m.
Problem 2.6. Prove Theorem 2.3.
We end this section with what is probably the most fundamental way of
generating a subspace of a vector space. Let {v_1, v_2, …, v_r} be a set of vectors in a
vector space V. We say that a vector u is a linear combination of v_1, v_2, …, v_r
if there exist scalars α_1, α_2, …, α_r such that

    u = α_1 v_1 + α_2 v_2 + ⋯ + α_r v_r.

For example, in V = R^2 the vector u = (5, 4)^t is a linear combination of
v_1 = (1, 1)^t and v_2 = (3, 2)^t with α_1 = 2 and α_2 = 1.
The set of all possible linear combinations of a set of vectors v_1, v_2, …, v_r
is called the span of v_1, v_2, …, v_r.
Theorem 2.4. Let {v_1, v_2, …, v_r} be a set of vectors in a vector space V and
let W be the set of all linear combinations of v_1, v_2, …, v_r. Then W is a
subspace of V and furthermore W is the smallest subspace containing v_1, v_2, …, v_r
in the sense that if U is another subspace containing v_1, v_2, …, v_r then W ⊆ U.
The subspace W in Theorem 2.4 is denoted as

    W = span(v_1, v_2, …, v_r).
Proof: Since V is closed under vector addition and scalar multiplication,
every linear combination of v_1, v_2, …, v_r must also be in V, so that W is
certainly a subset of V. To show that W is a subspace, it is enough to show
that W itself is closed under vector addition and scalar multiplication. Let
u, v ∈ W. Then

    u = α_1 v_1 + α_2 v_2 + ⋯ + α_r v_r
    v = β_1 v_1 + β_2 v_2 + ⋯ + β_r v_r.

Hence

    u + v = (α_1 + β_1) v_1 + (α_2 + β_2) v_2 + ⋯ + (α_r + β_r) v_r,

which is a linear combination of v_1, v_2, …, v_r and hence in W. A similar proof
shows that W is closed under scalar multiplication.
Now suppose that U is some subspace of V containing the vectors
v_1, v_2, …, v_r. Since U is a subspace it must be closed under vector addition and
scalar multiplication and so must contain all vectors of the form

    α_1 v_1 + α_2 v_2 + ⋯ + α_r v_r,

which is to say, U must contain each vector of W. □
Problem 2.7. Determine whether u and v ∈ span(S) for the set of vectors S
given.

1. u = (1, 1, 1, 2)^t, v = (1, 0, 12, 5)^t and
   S = {(1, 1, 2, 0)^t, (0, 0, 1, 2)^t, (1, 1, 0, 4)^t}

2. u = (3, 3, 3)^t, v = (4, 2, 6)^t and
   S = {(1, 1, 3)^t, (2, 4, 0)^t}
2.2
The Basics of Bases
Prerequisites:
Vector spaces.
Subspaces.
Matrix manipulation, Gauss elimination.
Advanced Prerequisites:
Fundamental theorem of algebra
Learning Objectives:
Familiarity with the concepts of linear combination, linear independence,
span, basis and dimension.
Ability to determine whether a set of vectors is linearly independent.
Ability to determine whether a set of vectors spans a given subspace.
Ability to determine whether a set of vectors is a basis for a given subspace.
Ability to determine the dimension of a given subspace.
Ability to compute the coordinates of a vector with respect to a given
basis
Ability to compute the change of basis matrix between any two bases.
Vector bases are distinguished sets of vectors in a vector space with which we
may uniquely represent any vector in the vector space. Bases are important for
much the same reason any mechanism for representation is important -- different
representations of the same thing can bring out different features of the thing.
2.2.1 Spanning Sets and Linear Independence
Let {v_1, v_2, …, v_n} ⊆ V, for some vector space V. Since span({v_1, v_2, …, v_n})
is the smallest subspace containing {v_1, v_2, …, v_n}, we know that

    span{v_1, v_2, …, v_n} ⊆ V.

If, in fact, it happens that span{v_1, v_2, …, v_n} = V then we say that
{v_1, …, v_n} spans V. Notice that this means that every vector in V is expressible
as a linear combination of {v_1, v_2, …, v_n}.
Examples 2.5.

1. The standard example of a spanning set for R^3 is

       i = (1, 0, 0),  j = (0, 1, 0),  k = (0, 0, 1).

2. We show that

       S = {(1, 1, 1), (1, 1, 2), (1, 0, 0)}

   spans R^3. This is true if and only if the system of equations

       (b_1, b_2, b_3) = x_1 (1, 1, 1) + x_2 (1, 1, 2) + x_3 (1, 0, 0)

   is consistent for every vector (b_1, b_2, b_3). Solving the system by Gaussian
   elimination, we see that x_3 = b_1 - b_2, x_2 = b_3 - b_2, and x_1 = 2b_2 - b_3.
   Hence S spans R^3.

3. Let

       S = {(1, 0, 0), (1, 1, 1), (2, 1, 1)}.

   We show that S does not span R^3, since the system

       (b_1, b_2, b_3) = x_1 (1, 0, 0) + x_2 (1, 1, 1) + x_3 (2, 1, 1)

   is consistent only if b_3 - b_2 = 0. Thus, for instance, (1, 0, 1) is not in the
   span of S.
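A quick numerical check of whether a finite set spans R^n is to stack the vectors
as columns and compute the rank. The sketch below (an added illustration assuming
NumPy; the arrays reproduce the two sets of Examples 2.5) is one way to do this.

    import numpy as np

    S2 = np.column_stack([(1, 1, 1), (1, 1, 2), (1, 0, 0)])
    S3 = np.column_stack([(1, 0, 0), (1, 1, 1), (2, 1, 1)])

    # A set of vectors spans R^3 exactly when the matrix of columns has rank 3.
    print(np.linalg.matrix_rank(S2))   # 3 -> spans R^3
    print(np.linalg.matrix_rank(S3))   # 2 -> does not span R^3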
Problem 2.8. Determine whether the following sets of vectors span R^3.

1. {(1, 1, 1), (2, 2, 0), (3, 0, 0)}

2. {(1, 3, 3), (1, 3, 4), (1, 4, 3), (6, 2, 1)}
Suppose we have n vectors v_1, v_2, …, v_n which span a vector space V and
suppose v_n is a linear combination of v_1, v_2, …, v_{n-1}. Then the vector v_n is
unnecessary in so far as v_1, v_2, …, v_{n-1} still span V. Thus it is natural to look
for the smallest spanning set. This leads to the following definition.
We say that a set S = {v_1, v_2, …, v_r} is linearly independent if whenever

    c_1 v_1 + c_2 v_2 + ⋯ + c_r v_r = 0        {the "check condition"}

then

    c_1 = c_2 = ⋯ = c_r = 0.

Otherwise S is called linearly dependent.
Suppose that S = {v_1, v_2, …, v_r} is linearly dependent. Then

    c_1 v_1 + c_2 v_2 + ⋯ + c_r v_r = 0

has a nontrivial solution (c_1, c_2, …, c_r) ≠ 0. By re-ordering if necessary, we
may assume that c_r ≠ 0. Then

    v_r = -(c_1/c_r) v_1 - (c_2/c_r) v_2 - ⋯ - (c_{r-1}/c_r) v_{r-1}

and so v_r is a linear combination of v_1, v_2, …, v_{r-1}.

Problem 2.9. Prove the converse to the above argument. That is, show that
if v_r is a linear combination of v_1, v_2, …, v_{r-1}, then v_1, v_2, …, v_r is linearly
dependent.

We see by the problem and the preceding argument that a set of vectors is
linearly dependent precisely when some vector in the set is a linear combination
of the others.
Let

    S = { (1, 0)^t, (0, 1)^t, (1, 1)^t }.

Clearly S is a linearly dependent set that spans R^2. In fact, every vector of R^2
can be written as a linear combination of the vectors in S in more than one way.
For instance,

    (1, 1)^t = 1·(1, 1)^t = 1·(1, 0)^t + 1·(0, 1)^t.
This ambiguity disappears if a spanning set S is linearly independent.

Theorem 2.6. Let S = {v_1, v_2, …, v_r} be a linearly independent set spanning
a vector space V. Then every vector in V can be written in a unique way as a
linear combination of vectors in S.

Proof: Since S is a spanning set for V, every vector in V has at least one
representation as a linear combination of vectors in S. Suppose there were two:

    c_1 v_1 + c_2 v_2 + ⋯ + c_r v_r = u = k_1 v_1 + k_2 v_2 + ⋯ + k_r v_r.

Then subtracting left from right,

    0 = (c_1 - k_1) v_1 + (c_2 - k_2) v_2 + ⋯ + (c_r - k_r) v_r.

By linear independence we have that c_i - k_i = 0 for each i = 1, 2, …, r. Hence
the representation of u is unique. □
Example 2.7. To determine whether (1, 1, 1)^t, (1, 1, 0)^t, and (1, 0, 0)^t are
linearly independent vectors, consider the system of equations

    x_1 (1, 1, 1)^t + x_2 (1, 1, 0)^t + x_3 (1, 0, 0)^t = (0, 0, 0)^t

and find all possible solutions via Gaussian elimination. We see that x_1 = x_2 =
x_3 = 0. Thus by definition, the vectors are linearly independent.
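One way to carry out this check numerically is to put the vectors into the columns
of a matrix and inspect the rank or the null space. The sketch below (an added
illustration assuming NumPy and SciPy; the vectors are those of Example 2.7) does both.

    import numpy as np
    from scipy.linalg import null_space

    V = np.column_stack([(1, 1, 1), (1, 1, 0), (1, 0, 0)])

    # Independent exactly when the rank equals the number of vectors,
    # i.e. when the only solution of V c = 0 is c = 0.
    print(np.linalg.matrix_rank(V) == V.shape[1])   # True
    print(null_space(V).shape[1])                    # 0 -> only the trivial solution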
Problem 2.10. Determine whether the following sets of vectors are linearly
independent:

1. {(1, 2, 3)^t, (1, 2, 4)^t, (1, 2, 5)^t}

2. {(1, 1)^t, (2, 3)^t, (3, 2)^t}

3. {(1, 1, 1, 1)^t, (3, 1, 1, 1)^t, (1, 0, 0, 0)^t}
Problem 2.11. Show that any set S which contains the zero vector is linearly
dependent.
Problem 2.12. Suppose S consists of 4 vectors in R3 . Explain why S is linearly
dependent.
Problem 2.13. Show that if two vectors in R2 are linearly dependent, they lie
on the same line while three linearly dependent vectors in R3 lie in the same
plane.
2.2.2 Basis and Dimension
If S = {v_1, …, v_r} is a set of vectors in a vector space V, then S is called a
basis for V if S is a linearly independent set and spans V.

Examples 2.8.

1. The standard example of a basis for R^n or C^n is the set

       S = {e_1, e_2, e_3, …, e_n},

   where e_j is the vector in R^n whose j-th coordinate is 1 and whose other
   coordinates are all 0. It is called the natural or standard basis for R^n or
   C^n.
2. If S is a linearly independent set, then S is a basis for span(S).
Problem 2.14.
1. Prove that the natural basis is indeed a basis for R^n or C^n.

2. Find another basis for R^n and prove it is one.
Our immediate goal is to prove that any two bases of a vector space must
each have the same number of vectors. We first need the following theorem.

Theorem 2.9. Let

    S = {v_1, …, v_r}

be a basis for V and let

    T = {w_1, …, w_s} ⊆ V,

where s > r. Then T is a linearly dependent set.

Proof:
Since S is a basis for V, we may write

    w_1 = a_11 v_1 + ⋯ + a_r1 v_r
     ⋮                                                  (2.1)
    w_s = a_1s v_1 + ⋯ + a_rs v_r.

To prove that T is linearly dependent, set

    k_1 w_1 + ⋯ + k_s w_s = 0                            (2.2)

and substitute (2.1). After rearranging terms, we obtain

    (a_11 k_1 + ⋯ + a_1s k_s) v_1 + ⋯ + (a_r1 k_1 + ⋯ + a_rs k_s) v_r = 0.

Since S is a basis, it is linearly independent and so we obtain the following
system of equations in the variables k_i:

    a_11 k_1 + ⋯ + a_1s k_s = 0
     ⋮
    a_r1 k_1 + ⋯ + a_rs k_s = 0.

Since s > r, the above system has more unknowns than equations. Thus the
system has infinitely many solutions for k_1, …, k_s, and in particular, solutions
that are nontrivial. By equation (2.2), T must be linearly dependent. □
We easily obtain the following theorem from Theorem 2.9.
We easily obtain the following theorem from Theorem 2.9.
Theorem 2.10. If some basis of a vector space V contains n vectors, then
every basis of V contains n vectors.
Problem 2.15. Prove Theorem 2.10.
The number of vectors in a basis of a vector space V is called the dimension
of V and denoted by dim(V ). By Theorem 2.10 this number is independent
of the particular basis of V we may happen to have and so reflects a basic
structural feature of the vector space.
Remark: dim(Ran(A)) is called the rank of A. dim(Ker(A)) is called the
nullity of A.
Examples 2.11.

1. Since the natural basis for R^n contains n vectors, dim(R^n) = n.

2. The set S = {1, x, x^2, …, x^n} is a basis for P_n (defined in Section 2.1).
   Thus dim(P_n) = n + 1.
Example 2.12. We find a basis and the dimension of the space of solutions to
the system

    2x_1 + 2x_2 -  x_3        + x_5 = 0
    -x_1 -  x_2 + 2x_3 - 3x_4 + x_5 = 0
     x_1 +  x_2 - 2x_3        - x_5 = 0
                   x_3 +  x_4 + x_5 = 0

By Gaussian elimination, we find that

    x_1 = -s - t,  x_2 = s,  x_3 = -t,  x_4 = 0,  x_5 = t.

So the general solution is

    (-s - t, s, -t, 0, t) = s(-1, 1, 0, 0, 0) + t(-1, 0, -1, 0, 1).

Clearly S = {(-1, 1, 0, 0, 0), (-1, 0, -1, 0, 1)} spans the solution space. The set
S is also linearly independent since

    0 = s(-1, 1, 0, 0, 0) + t(-1, 0, -1, 0, 1) = (-s - t, s, -t, 0, t)

implies that s = 0 and t = 0. Hence S is a basis and the dimension of the space
spanned by S is 2.
The method we just used to find a basis for the solution space always yields a
spanning set S, since every solution is evidently a linear combination of vectors in
S with free parameters s and t functioning as coefficients. Linear independence
of S obtained in this way is also easy to see since each free parameter will be
fixed at zero as a consequence of the "check condition" for linear independence.
Our next theorem tells us that if we know a priori the dimension of a vector
space, V, we need only check either that the set spans V or that the set is
linearly independent to determine whether the set is a basis for V.
Theorem 2.13. Let V be a vector space of dimension n. If S is a set of n
vectors then S is linearly independent if and only if S spans V .
Proof:
Suppose the set of vectors

    S = {v_1, …, v_n}

is linearly independent. If S does not span V, then we can find a vector v_{n+1} in
V that is not in the span of S and, in particular, v_{n+1} is not a linear combination
of the vectors in S. Hence

    Ŝ = {v_1, …, v_n, v_{n+1}}

is linearly independent. But this contradicts Theorem 2.9 because the dimension
of V is n and hence it contains a basis of n vectors.
Now suppose S spans V. If S is not linearly independent, we may discard
vectors until we arrive at a linearly independent set Ŝ with m < n vectors. But
Ŝ still spans V since we've been casting out only redundant vectors, and so Ŝ is
a basis for V with m < n vectors, a contradiction to the initial hypothesis that
the dimension of V is n. □
Example 2.14. To check whether

    {(1, 2, 3), (4, 5, 6), (2, 2, 2)}

is a basis for R^3, we need only check linear independence since dim(R^3) = 3.
We note that this is a bit easier than checking whether the set spans. (Compare
Problem 2.8 with Problem 2.10.)

Problem 2.16. By inspection explain why {(1, 2), (0, 3), (2, 7)} is not a basis
for R^2 and why {(-1, 3, 2), (6, 1, 1)} is not a basis for R^3.

Problem 2.17. Determine whether {(2, 3, 1), (4, 1, 1), (0, 7, 1)} is a basis for
R^3.
Problem 2.18. Find a basis for the subspaces of R^3 consisting of solutions to
the following systems of linear equations.

1.   x_1 + x_2 - x_3 = 0
    2x_1 - x_2 + 2x_3 = 0
     x_1 + x_3 = 0

2.   x + y + z = 0
    3x + 2y - 2z = 0
    4x + 3y - z = 0
    6x + 5y + z = 0

3.   x + y + z = 0
         y + z = 0
         y - z = 0
Next we tackle the problem of finding a basis for the range of a matrix.
Suppose that S = {v_1, …, v_r} spans V but is not linearly independent. As
mentioned before, in this case it is possible to discard a vector which is a linear
combination of the other vectors and still have a set which spans V. Then after
discarding finitely many vectors we arrive at a subset of S which is a basis for
V. We now demonstrate a method which achieves this in the case V ⊆ R^n. Let
A be the matrix whose columns are the vectors in S. If y is in V, then

    y = x_1 v_1 + ⋯ + x_r v_r.

Thus letting x = (x_1, …, x_r)^t, we obtain

    Ax = y.

Remark: The range of a matrix is equal to the space spanned by its columns.

Let us put A in reduced row echelon form B. We note that if Ax = y, then
Bx = y_1 for some y_1 ∈ V. Hence by the remark above, the vector y is in the
range of A if and only if the vector y_1 is in the range of B. More importantly,
both of these vectors are images of the same vector x. By inspection it is easy
to see which columns of B are linear combinations of the other columns. Say
the i-th column is such a column. Ignore it! That is, given y_1 in the range of B,
we can find a vector x whose i-th coordinate is zero such that

    Bx = y_1.

Thus given any y in V we can find a vector x with i-th coordinate zero such that

    Ax = y.

Hence the i-th column of A may be discarded.
Example 2.15. Let

    S = {(1, 2, 1)^t, (2, 3, 3)^t, (0, 1, 1)^t, (2, 2, 2)^t, (1, 0, 1)^t}.

We find a subset of S which spans the same space as S but is linearly
independent. Let

        [ 1  2  0  2  1 ]
    A = [ 2  3  1  2  0 ].
        [ 1  3  1  2  1 ]

The reduced row echelon form for A is

        [ 1  0  0   0  -1 ]
    B = [ 0  1  0   1   1 ].
        [ 0  0  1  -1  -1 ]

Clearly columns 4 and 5 in B are linear combinations of columns 1, 2 and 3,
and so we discard columns 4 and 5 from A. The resulting linearly independent
spanning set is

    S1 = {(1, 2, 1)^t, (2, 3, 3)^t, (0, 1, 1)^t}.

By the remark above, S1 is a basis for the range of A.
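The reduced row echelon form and its pivot columns can be obtained mechanically.
The sketch below (an added illustration assuming the SymPy library; the matrix is
the A of Example 2.15) shows one way to read off which columns to keep.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, 2, 1],
                [2, 3, 1, 2, 0],
                [1, 3, 1, 2, 1]])

    B, pivots = A.rref()       # rref() returns the reduced form and the pivot column indices
    print(B)                   # matches the B displayed above
    print(pivots)              # (0, 1, 2): columns 1, 2, 3 of A form a basis for Ran(A)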
Problem 2.19. Find a basis for the range of the following matrices:

1.      [ 0  2  0  2 ]
    A = [ 2  1  1  0 ]
        [ 1  0  1  0 ]

2.      [ 2  2 ]
    A = [ 2  3 ]
        [ 2  4 ]

3.      [ 1  1  1  1  1 ]
    A = [ 2  3  4  5  6 ]
        [ 6  5  4  3  2 ]
Problem 2.20. Let A be an m × n matrix whose range has dimension r and
whose kernel has dimension s. Show that r + s = n. Hint: This is simple for a
matrix in reduced row echelon form. (This problem gives an alternate proof of
the rank-nullity theorem proved in the next section.)
If A is an m × n matrix then the range of A^t is the subspace spanned by
the rows of A. This is occasionally referred to as the row space of A. Since
elementary row operations only form linear combinations of the rows of A, the
row space of a matrix will not change in the course of Gauss elimination. So
if A is reduced via elementary operations to a reduced row echelon form, R,
the row space of A is identical to the row space of R. Since the nonzero rows
of either the row echelon form or of the reduced row echelon form are linearly
independent, the number of leading ones in the reduced row echelon form is
then, evidently, the dimension of the row space of A. But the leading ones mark
the location of the linearly independent columns of A and so the number of
leading ones is also rank(A), i.e., the dimension of the range of A. This leads
to

Remark: The rank of a matrix is equal to the dimension of its row space.
We end this section with a very important theorem.

Theorem 2.16. If S = {v_1, …, v_r} is a linearly independent set in a vector
space V of dimension n, then S can be enlarged to become a basis of V; that is,
there exists a basis of V which contains S as a subset.

Proof: If S spans V, then r = n and we are done. If not, there exists some vector
v_{r+1} ∉ span(S). The set

    S_1 = {v_1, …, v_r, v_{r+1}}

is clearly linearly independent. If S_1 spans V, then we are done. Otherwise there
exists a vector v_{r+2} ∉ span(S_1). We form S_2 = S_1 ∪ {v_{r+2}}. Since dim(V) = n,
this process must stop. When it does, we have the required basis of V. □
2.2.3 Change of Basis
Given a basis S = {u_1, …, u_n} of a vector space V, we have by Theorem 2.6
that every vector u in V can be written uniquely as

    u = k_1 u_1 + ⋯ + k_n u_n,

so there is a one-to-one correspondence between vectors u in V and n-tuples
(k_1, …, k_n)^t. We write this association as

    (u)_S := (k_1, …, k_n)^t.

The above representation defines the "S-coordinates" of u unambiguously. When
V is real or complex Euclidean n-space and S is the natural basis, we sometimes
do away with the subscript S and then write u with its usual coordinate
representation.
Examples 2.17. Let

    T = {(0, 0, 1), (1, 2, 3), (1, 0, 0)}.

1. To find (u)_T where u = (1, 2, 0), we need to solve the system

       (1, 2, 0) = c_1 (0, 0, 1) + c_2 (1, 2, 3) + c_3 (1, 0, 0)

   for c_1, c_2, c_3. We find that c_1 = -3, c_2 = 1, and c_3 = 0. So

       (u)_T = (-3, 1, 0).

2. To find what the vector (2, 4, 1)_T equals in the usual basis simply note
   that

       (2, 4, 1)_T = 2(0, 0, 1) + 4(1, 2, 3) + 1(1, 0, 0) = (5, 8, 14).
Problem 2.21. Let

    S = {(1, 2, 1), (0, 1, 0), (1, 1, 0)},   T = {(1, 1, 1), (1, 1, 0), (1, 0, 0)}.

1. Find (u)_S and (u)_T where u = (1, 2, 1).

2. Give the natural basis coordinate representation for (1, 5, 2)_S and (3, 0, 2)_T.

Problem 2.22. Suppose A ∈ R^{m×n} has linearly independent columns and let
the set of vectors A = {a_1, a_2, …, a_n} be constituted from the columns of A.
Explain the following statement: x̂ solves the linear system Ax = b if and only
if (b)_A = x̂.
Suppose we are given a vector in S coordinates and we wish to write it in T
coordinates; that is, we seek to find a relationship between (u)_S and (u)_T.
Suppose V is two-dimensional: S = {u_1, u_2} and T = {v_1, v_2} each are
bases for V. We first find (u_1)_T and (u_2)_T by solving the two linear systems

    u_1 = a v_1 + b v_2
    u_2 = c v_1 + d v_2.

Hence we will have found that

    (u_1)_T = (a, b)   and   (u_2)_T = (c, d).

Now let v be an arbitrary vector in V with (v)_S = (k_1, k_2). Then

    v = k_1 u_1 + k_2 u_2
      = k_1 (a v_1 + b v_2) + k_2 (c v_1 + d v_2)
      = (k_1 a + k_2 c) v_1 + (k_1 b + k_2 d) v_2.

So (v)_T = (k_1 a + k_2 c, k_1 b + k_2 d). It is easy to see from this that

    (v)_T = [ a  c ] (v)_S
            [ b  d ]
          = [(u_1)_T  (u_2)_T] (v)_S
          = B (v)_S                                   (2.3)

where

    B = [(u_1)_T  (u_2)_T],

with columns (u_1)_T and (u_2)_T, is called the change of basis matrix from S (the
given basis) to T (the new basis).
In general, if S = {u_1, …, u_n} and T = {v_1, …, v_n}, the change of basis
matrix B from S to T in equation (2.3) is given by

    B = [(u_1)_T  …  (u_n)_T].
Example 2.18. Let S be the natural basis {e_1, e_2} for R^2 and let T be the
set consisting of u = (1, 1) and v = (2, 0). To find the change of basis matrix
B from S to T, we first change the basis vectors of S to the T basis. By solving
an appropriate system, we see that

    e_1 = 0·u + (1/2)·v,     e_2 = 1·u - (1/2)·v.

Thus

    B = [(e_1)_T  (e_2)_T] = [  0     1  ]
                             [ 1/2  -1/2 ]

Let w = (1, 3)_S. Then

    (w)_T = B (w)_S = [  0     1  ] [ 1 ] = [  3 ]
                      [ 1/2  -1/2 ] [ 3 ]   [ -1 ]
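Change of basis matrices are also easy to compute with a single matrix solve: if the
basis vectors of S and T are stacked as columns of matrices U and V, then B = V^{-1}U.
The sketch below (an added illustration assuming NumPy; the data reproduce Example 2.18)
carries this out.

    import numpy as np

    U = np.eye(2)                                  # columns: the S basis (natural basis)
    V = np.column_stack([(1., 1.), (2., 0.)])      # columns: the T basis u, v

    B = np.linalg.solve(V, U)                      # columns of B are (e_1)_T and (e_2)_T
    print(B)                                       # [[0, 1], [0.5, -0.5]]
    print(B @ np.array([1., 3.]))                  # [3, -1], i.e. (w)_T for w = (1, 3)_S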
Problem 2.23. Show that if S = {(a_1, b_1, c_1), (a_2, b_2, c_2), (a_3, b_3, c_3)} and T is
the natural basis in R^3, then the change of basis matrix from S to T is

    [ a_1  a_2  a_3 ]
    [ b_1  b_2  b_3 ]
    [ c_1  c_2  c_3 ]
Suppose the change of basis matrix from S to T is B. Since the columns of
B are the coordinate representations of vectors in S, these columns are linearly
independent and thus B is invertible. Thus by equation (2.3) we have

    B^{-1} (v)_T = (v)_S.

But this means that B^{-1} is the change of basis matrix from T to S.
Problem 2.24. Let S = {(2, 4), (3, 8)} and T = {(1, 1), (0, 2)}. Find

1. (w)_T given that (w)_S = (1, 1)_S.

2. (w)_S given that (w)_T = (1, 1)_T.
Example 2.19. Suppose the vectors e_1 and e_2 are rotated (counterclockwise)
about the z-axis through θ degrees while e_3 is left fixed. A simple trigonometric
calculation tells us that the rotation transforms the natural basis S of R^3 into a
new basis

    T = { (cos θ, sin θ, 0)^t, (-sin θ, cos θ, 0)^t, (0, 0, 1)^t }.

The change of basis matrix from T to S is given by

        [ cos θ  -sin θ  0 ]
    B = [ sin θ   cos θ  0 ]
        [   0       0    1 ]

By rotating back θ degrees (clockwise), it is easy to see that the change of basis
matrix from S to T is simply B^t. One can verify with a separate calculation
that in this case B^t = B^{-1}.
2.3
Linear Transformations and
their Representation
Prerequisites:
Vector spaces.
Subspaces.
Matrix manipulation, Gaussian elimination.
Bases.
Learning Objectives:
Ability to identify linear transformations.
Familiarity with basic properties of linear transformations.
Familiarity with the kernel and range of a linear transformation.
Ability to determine the action of a linear transformation from its action
on a basis.
Familiarity with the \rank plus nullity" theorem.
Ability to compute the standard matrix representation for a linear transformation from Rn to Rm .
Ability to compute the matrix representation of a linear transformation
in any bases.
If a function L maps a vector space V to a vector space W, we denote it by

    L : V → W.

We call L a linear transformation if the following two properties hold:

1. L(u + v) = L(u) + L(v)

2. L(ku) = kL(u)

for all vectors u, v ∈ V and k real or complex.

Example 2.20. Let L : R^2 → R^3 be defined by

    L(x, y) = (x, x + y, x - y).

To see that L is linear simply note that

    L[(x, y) + (u, v)] = L(x + u, y + v)
                       = (x + u, x + u + y + v, x + u - y - v)
                       = (x, x + y, x - y) + (u, u + v, u - v)
                       = L(x, y) + L(u, v).
Problem 2.25. Prove that the following are linear transformations.

1. L : R^3 → R^2 defined by

       L(a, b, c) = (a, b)

2. L : R^2 → P_3 defined by

       L(a, b) = ax^2 + bx^2 + ax + (a + 2b)

3. L : R^n → R^m defined by

       L(u) = Au,

   where A is an m × n matrix.

4. L : V → W defined by

       L(u) = 0

   for all u in V. Here V and W are vector spaces. (The transformation L
   here is called the 0 transformation.)

The linear transformation in the third part of the above problem,

    L(u) = Au,

is called a matrix transformation. We shall see in the next section that all linear
transformations between (finite dimensional) vector spaces can be represented
as matrix transformations.

Problem 2.26. Determine whether the following functions from R^2 to R^3 are
linear.

1. L(x, y) = (0, 0, y)

2. L(x, y) = (√x, x, 0)

3. L(x, y) = (1, x, y)
We need the following elementary properties of linear transformations.

Theorem 2.21. Let L : V → W be a linear transformation. Then

1. L(0) = 0

2. L(-v) = -L(v)

Proof: By linearity

    L(0) = L(0 + 0) = L(0) + L(0).

Hence L(0) = 0. Also

    L(-v) = L((-1)v) = (-1)L(v) = -L(v).  □
Problem 2.27. Prove that if L is linear then

    L(v - w) = L(v) - L(w).
If L : V → W is a linear transformation, the set of vectors u in V such that
L(u) = 0 is called the kernel of L and denoted Ker(L). The set of vectors w
in W such that there exists u in V with L(u) = w is called the range of L and
denoted Ran(L).

Problem 2.28. Prove that the kernel and range of a linear transformation from
V to W are subspaces of V and W respectively.

We have already used the terms kernel and range for matrices. Indeed, let
L : R^n → R^m be defined by

    L(u) = Au,

where A is an m × n matrix. Here the kernel of L is precisely the kernel of A.
Similarly the range of L is the range of A.
Problem 2.29. Define

    L(u) = Au,

where

        [ 4  1  2  3 ]
    A = [ 2  1  1  4 ]
        [ 2  1  1  4 ]
        [ 6  0  9  9 ]

1. Find a basis for the kernel and range of L.

2. Determine which of the following vectors are in the range of L:

       (0, 0, 0, 6),  (1, 3, 0, 0),  (2, 4, 0, 1).

3. Determine which of the following vectors are in the kernel of L:

       (3, 9, 2, 0),  (0, 0, 0, 1),  (0, 4, 1, 0).
One of the most important facts about linear transformations is that once
we know the transformation on a basis, we know it completely. Indeed, let

    L : V → W

be linear and let S = {v_1, …, v_n} be a basis for V. Suppose we are given the
values of

    L(v_1), …, L(v_n).
Then if v ∈ V, we have

    v = a_1 v_1 + ⋯ + a_n v_n.

By linearity we obtain

    L(v) = L(a_1 v_1 + ⋯ + a_n v_n)
         = a_1 L(v_1) + ⋯ + a_n L(v_n).

Thus L is completely determined.

Problem 2.30. Let L : R^3 → R^2 be a linear transformation such that

    L(1, 0, 0) = (1, 1),   L(0, 1, 0) = (0, 2),   L(0, 0, 1) = (1, 2).

Find L(2, 1, 3).
If L : V → W is a linear transformation, the following terminology follows
naturally from the matrix case:

    the dimension of the range of L is the rank of L;
    the dimension of the kernel of L is the nullity of L.

We have the following theorem which relates rank and nullity.

Theorem 2.22. If L : V → W is a linear transformation, then

    rank(L) + nullity(L) = dim(V).
Proof: Let r = rank(L) and {w_1, w_2, …, w_r} be a basis for Ran(L). Likewise,
let ν = nullity(L) and {n_1, n_2, …, n_ν} be a basis for Ker(L). Then r
vectors exist, {v_1, v_2, …, v_r} ⊆ V, so that L(v_i) = w_i for each i = 1, 2, …, r.
Consider the composite set of vectors in V,

    S = {v_1, v_2, …, v_r, n_1, n_2, …, n_ν}.

Every vector in V can be represented as a linear combination of vectors in the
set S. Indeed, pick an arbitrary vector v ∈ V. Then for some choice of scalars
{α_1, α_2, …, α_r}

    L(v) = Σ_{i=1}^{r} α_i w_i = Σ_{i=1}^{r} α_i L(v_i) = L( Σ_{i=1}^{r} α_i v_i ),

so that L(v - Σ_{i=1}^{r} α_i v_i) = 0 and v - Σ_{i=1}^{r} α_i v_i ∈ Ker(L). But, then there
must be scalars {β_1, β_2, …, β_ν} so that

    v - Σ_{i=1}^{r} α_i v_i = Σ_{j=1}^{ν} β_j n_j

and so, v is a linear combination of vectors in S:

    v = Σ_{i=1}^{r} α_i v_i + Σ_{j=1}^{ν} β_j n_j.

The vectors of S are linearly independent and so form a basis for V. To see this
consider a linear combination of the vectors of S:

    Σ_{i=1}^{r} α_i v_i + Σ_{j=1}^{ν} β_j n_j = 0.

Applying L to both sides of the equation yields

    L( Σ_{i=1}^{r} α_i v_i + Σ_{j=1}^{ν} β_j n_j ) = 0
    Σ_{i=1}^{r} α_i L(v_i) + Σ_{j=1}^{ν} β_j L(n_j) = 0
    Σ_{i=1}^{r} α_i w_i = 0.

Since {w_1, w_2, …, w_r} is a basis for Ran(L), α_i = 0 for each i = 1, 2, …, r.
But then,

    Σ_{j=1}^{ν} β_j n_j = 0

and since {n_1, n_2, …, n_ν} is a basis for Ker(L), it must be that also β_j = 0 for
each j = 1, 2, …, ν. We conclude that S must be a linearly independent set of
vectors, so that dim(V) = r + ν, as asserted by the theorem. □
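For matrix transformations the theorem is easy to check numerically: if A is m × n,
then rank(A) + dim(Ker(A)) = n. A small sketch (an added illustration assuming NumPy
and SciPy; the matrix is arbitrary):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1., 2., 3., 4.],
                  [2., 4., 6., 8.],
                  [1., 0., 1., 0.]])      # 3x4, with one redundant row

    rank = np.linalg.matrix_rank(A)        # dimension of Ran(A)
    nullity = null_space(A).shape[1]       # dimension of Ker(A)
    print(rank, nullity, rank + nullity == A.shape[1])   # 2 2 True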
Problem 2.31. Let L : R^n → R^n such that

    L(v) = 3v

for all v ∈ R^n. Find the kernel and range of L.

Problem 2.32. Let

    v_1 = (1, 2, 3),  v_2 = (2, 5, 3),  v_3 = (1, 0, 10)

be a basis for R^3. Find a general formula for L(v), a linear transformation on
R^3, in terms of L(v_1), L(v_2) and L(v_3), given that

    L(v_1) = (1, 0),  L(v_2) = (1, 0),  L(v_3) = (0, 1).

Then find L(1, 1, 1).
2.3.1 Matrix Representations

Suppose L : R^3 → R^2 is a linear transformation such that

    L(1, 0, 0) = (a_11, a_21),   L(0, 1, 0) = (a_12, a_22),   L(0, 0, 1) = (a_13, a_23).

Consider the matrix

    A = [ a_11  a_12  a_13 ]
        [ a_21  a_22  a_23 ]

It is easy to see that Lx = Ax for all x in R^3. Indeed if x = (x_1, x_2, x_3) we
have by the linearity of L that

    L(x_1, x_2, x_3) = x_1 L(1, 0, 0) + x_2 L(0, 1, 0) + x_3 L(0, 0, 1)
                     = (x_1 a_11, x_1 a_21) + (x_2 a_12, x_2 a_22) + (x_3 a_13, x_3 a_23)
                     = (x_1 a_11 + x_2 a_12 + x_3 a_13, x_1 a_21 + x_2 a_22 + x_3 a_23).

Clearly

    Ax = [ x_1 a_11 + x_2 a_12 + x_3 a_13 ]
         [ x_1 a_21 + x_2 a_22 + x_3 a_23 ]

Thus we have expressed L in terms of the matrix A. Note that to find the
columns of A we have taken the natural basis vectors in R^3, applied L to them,
and expressed the resulting vectors in their natural basis coordinates in R^2. We
say then that A is the matrix of L with respect to the bases

    S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}   and   T = {(1, 0), (0, 1)}.

Since both bases are natural we also call A the standard or natural matrix of L.

Example 2.23. 1. Let L : R^3 → R^3 be defined by

    L(a_1, a_2, a_3) = (a_1, a_2 + a_3, 0).

Since

    L(1, 0, 0) = (1, 0, 0),  L(0, 1, 0) = (0, 1, 0),  L(0, 0, 1) = (0, 1, 0),

we have that

        [ 1  0  0 ]
    A = [ 0  1  1 ]
        [ 0  0  0 ]
2. Let L : R^2 → R^2 be defined by

       L(a_1, a_2) = (a_2, a_1).

   Since

       L(1, 0) = (0, 1),   L(0, 1) = (1, 0),

   we have that

       A = [ 0  1 ]
           [ 1  0 ]

Problem 2.33. In parts 1 and 2 find the standard matrix for the given linear
transformation L.

1. Let L : R^2 → R^4 be defined by

       L(x_1, x_2) = (-x_2, x_1, x_1 + 3x_2, x_1 + x_2).

2. Let L : R^4 → R^5 be defined by

       L(x_1, x_2, x_3, x_4) = (x_4, x_1, x_3, x_2, x_1 - x_3).

3. Let L : R^2 → R^2 map each vector into its symmetric image about the
   y-axis.
Suppose now that L : V → W is a linear transformation between two finite
dimensional vector spaces V and W with bases

    S = {v_1, …, v_n}   and   T = {w_1, …, w_m}

respectively. To find the matrix A of L with respect to S and T, we proceed as
before. If

    L(v_1) = y_1, …, L(v_n) = y_n,

we set the n columns of A to be

    (y_1)_T, …, (y_n)_T.

Then if

    L(v) = w,

we have that

    A(v)_S = (w)_T.

An easy way to remember how to find A is to write it in the form

    A = [(L(v_1))_T … (L(v_n))_T].

Warning! The relationship between L and A depends completely on the
bases S and T. Indeed for A to represent L with respect to S and T, it must
consider all its input vectors with respect to the S coordinate system and all its
output vectors with respect to the T coordinate system. We also remark that
when V = W and S = T, we sometimes call A the matrix of L with respect to
S.
Example 2.24. Let L : R^2 → R^2 be defined by

    L(x_1, x_2) = (x_1 + x_2, -2x_1 + 4x_2).

It is easy to see that the standard matrix for L with respect to the natural basis
is

    A = [  1  1 ]
        [ -2  4 ]

Suppose we want to find the matrix B of L with respect to

    S = T = {u_1, u_2},

where

    u_1 = (1, 1)   and   u_2 = (1, 2).

As before we find that

    B = [(L(u_1))_T  (L(u_2))_T].

Now

    L(u_1) = (2, 2) = (2, 0)_T   and   L(u_2) = (3, 6) = (0, 3)_T.

We obtain

    B = [ 2  0 ]
        [ 0  3 ]

One notes in the above example that although

    L(1, 0) = (1, -2),

we have that

    B [ 1 ] = [ 2 ]
      [ 0 ]   [ 0 ]

This is not a contradiction. Indeed B only inputs vectors in the S basis. Hence
the vector multiplied by B is actually the vector (1, 0)_S, which in the natural
basis is the vector (1, 1).
If L is a linear transformation from R^n to R^m and we want to find the kernel
and range of L, we should first find the standard matrix A for L and then find
the kernel and range of A.

Problem 2.34. 1. Let L : R^2 → R^3 be defined by

    L(x_1, x_2) = (x_1 - x_2, x_1, 0).

Find the matrix of L with respect to

    S = {(1, 3), (-2, 4)}   and   T = {(1, 1, 1), (2, 2, 0), (3, 0, 0)}.
2. Let L : R^3 → R^3 be defined by

       L(x_1, x_2, x_3) = (x_1 - x_2, x_2 - x_1, x_2 - x_3).

   Find the matrix of L with respect to

       S = {(1, 0, 1), (0, 1, 1), (1, 1, 0)}.

Problem 2.35. In Problem 2.34 parts 1 and 2 find a basis for the kernel and
range of L.
2.3.2 Similarity of Matrices

Let L : V → V be a linear transformation on a finite dimensional vector space.
Let R and S be two bases for V and let A and B be the matrix of L with respect
to R and S respectively. As the reader may have suspected, there is a special
relationship between A and B. Indeed, let P be the change of basis matrix from
S to R. Then if L(v) = w we have

    A(v)_R = (w)_R.

This implies that

    AP(v)_S = P(w)_S.

But also

    B(v)_S = (w)_S.

Since the last two equations are true for every v ∈ V, we must have that

    B = P^{-1} A P.                                    (2.4)

This leads to the following definition. If A and B are two n × n matrices related
by equation (2.4) we say that A is similar to B.

Problem 2.36. Show that if A is similar to B then B is similar to A.

If A is similar to B, the preceding problem allows us to say that A and B
are similar with no ambiguity.
We have just seen that if two n × n matrices A and B represent the same
linear transformation, then A and B are similar. Suppose A and B are similar.
We would like to show that they can be made to represent a common linear
transformation. To see this define

    L(v) = Av.

Then L is a linear transformation from R^n to itself and A is the standard matrix
for L. (Note: since we have not subscripted v with a basis on the right side of
the above equation, we are, in keeping with our convention, assuming it has
natural basis coordinates.) We define

    S = {Pe_1, …, Pe_n}

where e_1, …, e_n are the natural basis vectors for R^n. Since P is invertible, S is
a linearly independent set and hence a basis for R^n, and P is the change of basis
matrix from S to the natural basis.

Problem 2.37. Show that the claims made in the last sentence are true.

Hence B is the matrix of L with respect to S. Indeed if L(v) = w then

    B(v)_S = P^{-1} A P (v)_S
           = P^{-1} A v
           = P^{-1} w
           = (w)_S.
Example 2.25. Let L, S, A, and B be as in Example 2.24. We now know that
A and B are similar. In fact, from the above discussion,

    B = P^{-1} A P

where P is the change of basis matrix from S to the natural basis. Thus

    P = [ 1  1 ]
        [ 1  2 ]
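This relationship is easy to confirm numerically. The sketch below (an added
illustration assuming NumPy; the matrices are those of Examples 2.24 and 2.25)
checks that P^{-1} A P reproduces B.

    import numpy as np

    A = np.array([[1., 1.],
                  [-2., 4.]])       # standard matrix of L
    P = np.array([[1., 1.],
                  [1., 2.]])        # columns are u_1, u_2 in natural coordinates

    B = np.linalg.inv(P) @ A @ P
    print(B)                         # [[2, 0], [0, 3]], the matrix of L in the basis S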
2.4
Determinants
Prerequisites:
Vector spaces.
Subspaces.
Matrix manipulation, Gauss elimination.
Bases
Linear transformations
Learning Objectives:
Familiarity with the concept of the determinant as a measure of the distortion of a matrix transformation.
Ability to compute the determinant of a matrix by expressing the matrix
as a product of elementary matrices.
If A is a 2 × 2 matrix

    A = [ a  b ]
        [ c  d ],

the inverse of A is given by solving AB = I to get

    B = (1/(ad - bc)) [  d  -b ]
                      [ -c   a ].

The quantity ad - bc is called the determinant of A and is nonzero if and
only if A is invertible. By algebraically calculating the inverse of a general
square matrix A, determinants can be defined analogously by factoring out the
denominators of the resulting elements. Not surprisingly, determinants have
played an important historical role in the development of the theory of linear
systems; however, their current role is much diminished and restricted for the
most part to use as an analytical tool. Our goal in this section is to give
geometric motivation for the definition of the determinant of a square matrix
and to show the reader a reasonable way to compute it. We begin by proving
the following theorem.
Theorem 2.26. Let A ∈ R^{n×n}, and let Ω ⊆ R^n be a bounded open set with
smooth boundary. We let

    A(Ω) := {y ∈ R^n | y = Ax for some x ∈ Ω}

denote the image of Ω under the linear transformation A. Then the ratio

    vol(A(Ω)) / vol(Ω)

depends only on the matrix A. In particular, it is independent of the set Ω.
To prove this we use the following lemmas.

Lemma 2.27. Any linear transformation L : R^n → R^n maps

    parallel lines to parallel lines,
    parallelpipeds to parallelpipeds, and
    translated sets to translated sets.

Proof. Let ℓ_1 and ℓ_2 be parallel lines in R^n. Then there are vectors u_1, u_2, and
v such that

    ℓ_i = {u_i + tv | t ∈ R},   i = 1, 2.

Now by linearity

    L(ℓ_i) = {L(u_i + tv) = L(u_i) + tL(v) | t ∈ R},   i = 1, 2.

So the sets L(ℓ_i) are parallel lines as well. The second assertion follows directly
from the first. The third assertion is left as a problem.

Problem 2.38. Prove that any linear transformation L : R^n → R^n maps
translated sets to translated sets, i.e. let Ω ⊆ R^n and let Ω + v := {v + x | x ∈ Ω}.
Show that

    L(Ω + v) = L(Ω) + L(v).
Lemma 2.28. Let a_i > 0, i = 1, 2, …, n, and let

    P(a_1, a_2, …, a_n) := {(x_1, x_2, …, x_n) ∈ R^n | 0 ≤ x_i ≤ a_i, i = 1, 2, …, n}

denote a parallelpiped with sides of length a_i on the positive i-th coordinate axis.
We let P_1 := P(1, 1, …, 1) denote the unit parallelpiped. Then if A ∈ R^{n×n},

    vol(A(P(a_1, a_2, …, a_n))) = a_1 a_2 ⋯ a_n vol(A(P_1)).

Proof. We first note that by linearity and the properties of Euclidean length

    ‖A(αe_i)‖ = α‖A(e_i)‖

for any α > 0. Thus the length of a side of the deformed parallelpiped is
proportional to the length of the original side.
We now claim that

    vol(A(P(a_1, a_2, …, a_n))) = a_i vol(A(P(a_1, …, a_{i-1}, 1, a_{i+1}, …, a_n))),   i = 1, 2, …, n.

This is clear in two dimensions where the area of a parallelogram with sides a_1
and a_2 is given by a_1 a_2 sin θ, where θ is the acute angle between the two sides.
Thus the area scales with the length of any one side. It is also clear in three
dimensions where the volume of a parallelpiped with sides of lengths a_1, a_2,
and a_3 is given by a_1 a_2 a_3 sin θ_1 sin θ_2. Here θ_1 is the acute angle between the
sides a_1 and a_2, and θ_2 is the acute angle between the side a_3 and the plane
containing a_1 and a_2. The volume of a general parallelpiped in higher
dimensions is defined similarly, but requires a more general definition of
projection and angle, so we skip the proof for these cases.
The lemma follows directly from the assertion above.
Proof of Theorem 2.26. It follows immediately from Lemma 2.28 that

    vol(A(P(a_1, a_2, …, a_n))) / vol(P(a_1, a_2, …, a_n)) = vol(A(P_1))

for any choice of a_1, a_2, …, a_n. Note that by Lemma 2.27 this also holds for
translations of the parallelpiped P(a_1, a_2, …, a_n).
For a general region Ω ⊆ R^n, the volume is defined by integration theory.
That is, the volume of Ω is approximated by the sum of the volumes of small
parallelpipeds with disjoint interiors and sides along the coordinate axes. The
volume of A(Ω) can be approximated by the sum of volumes of the deformed
parallelpipeds. Each of the terms in the sum can be expressed as the product of
vol(A(P_1)) and the volume of the corresponding parallelpiped in Ω. Thus, for
every approximating partition of Ω and A(Ω) into a collection of parallelpipeds,
the ratio of the two approximations is always vol(A(P_1)). Taking the limit gives
us the final result.
Definition 2.29. We call the constant vol(A(P_1)) the distortion of A and
denote it by

    dist(A) := vol(A(P_1)).

Problem 2.39. In the following problem, all matrices are assumed to have real
entries.

1. Show that if E is a 2 × 2 elementary matrix of Type 1 or 2, then dist(E) = 1.

2. Show that if D is a 2 × 2 diagonal matrix, then dist(D) is the absolute
   value of the product of the diagonal entries.

3. Show dist(AB) = dist(A) dist(B).

4. Show dist(A) = 0 if and only if A is singular.
Problem 2.40. Show that the distortion of A can be computed by reducing it
to row echelon form using elementary matrices. Give an example of the process.

Remark 2.30. When integrating vector valued functions in the new coordinate
system in R^2 (for example using Green's Theorem), it is not enough to know
the distortion of A; we also need to know whether the orientation of the path
is preserved. Specifically, suppose we have a closed curve which is oriented
positively, that is, as we traverse the curve, the inside is to our left. If we map this
curve under A, does the resulting closed curve have the same property, or is the
inside now to the right? In the first case A is called orientation preserving
while in the second case, it is called orientation reversing. Without this
knowledge the integral is correct only up to sign.
Problem 2.41. The following exercises reveal the orientation properties of 2 × 2
elementary matrices.

1. Let E = [ 0  1 ]
           [ 1  0 ].
   Show that E (an elementary matrix of Type 2) is orientation reversing.

2. Show that any 2 × 2 Type 1 elementary matrix with real entries is
   orientation preserving.

3. Show that any 2 × 2 nonsingular diagonal matrix with real entries is
   orientation preserving if and only if the product of the diagonal elements is
   positive.
Remark 2.31. With the motivation of Problems 2.39 and 2.41, we would like to
define the determinant of A, denoted det(A)², to be a complex valued function
on n × n matrices with complex entries that measures distortion according to
the properties of Problem 2.39 and measures orientation according to the
properties of Problem 2.41.
Unfortunately, although the function dist(A) is well defined, it is not at all
clear whether the same would be true of det(A) given by the properties above.
Indeed, we would require that the determinant of a diagonal matrix D is the
product of its diagonal entries. However, if we factor such a matrix into two
matrices, say D = AB, can we be sure that det(A) det(B) is not the negative
of det(D)?
To avoid this problem, we will give a rather mysterious definition of the
determinant and then prove that it satisfies the properties suggested by Problems
2.39 and 2.41.
Definition 2.32 (Determinant). If (j_1, …, j_n) is an ordered n-tuple of integers
between 1 and n inclusive, define

    s(j_1, …, j_n) = Π_{p<q} sgn(j_q - j_p),

where Π denotes a product and sgn(x) is 1 if x is positive, -1 if x is negative
and 0 if x = 0. For an n × n matrix A ∈ C^{n×n} having entries [a_ij] we define

    det(A) := Σ s(j_1, …, j_n) a_{1,j_1} a_{2,j_2} ⋯ a_{n,j_n},

where the sum extends over all ordered n-tuples of integers (j_1, …, j_n) with
1 ≤ j_ℓ ≤ n.

² The notation |A| is sometimes used to denote the determinant but we will reserve this
notation to represent the matrix having as its elements the absolute values of the elements of
A.
Remark 2.33. Note that s(j_1, …, j_n) is either 1, -1 or 0 and it changes sign
if any two of the j's are interchanged. Because of this, each nonzero term in
the sum defining the determinant consists of a product of entries of A, one from
each row and each column. Since s(j_1, …, j_n) = 0 if any two of the j's are equal,
we only have n! terms in all.

Problem 2.42. Show that the formula for the determinant of a 3 × 3 matrix is

    a_11 a_22 a_33 - a_11 a_23 a_32 + a_12 a_23 a_31 - a_12 a_21 a_33 + a_13 a_21 a_32 - a_13 a_22 a_31.
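The definition can be turned directly into (admittedly inefficient) code. The sketch
below is an added illustration in plain Python (only itertools is used; the function
name perm_det is made up here) that sums over the n! permutations.

    from itertools import permutations

    def sgn(x):
        return (x > 0) - (x < 0)

    def perm_det(A):
        # det(A) = sum over n-tuples (j_1,...,j_n) of s(j_1,...,j_n) * a_{1,j_1} ... a_{n,j_n};
        # only tuples with distinct entries (permutations) contribute.
        n = len(A)
        total = 0
        for js in permutations(range(n)):
            s = 1
            for p in range(n):
                for q in range(p + 1, n):
                    s *= sgn(js[q] - js[p])
            term = s
            for i in range(n):
                term *= A[i][js[i]]
            total += term
        return total

    print(perm_det([[1, 2], [3, 4]]))                     # -2
    print(perm_det([[0, 1, 5], [3, -6, 9], [2, 6, 1]]))   # 165, agreeing with Example 2.36 below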
Theorem 2.34. The determinant has the following properties.

1. det(E) = 1 if E is an elementary matrix of Type 1.

2. det(E) = -1 if E is an elementary matrix of Type 2.

3. det(D) equals the product of the diagonal elements if D is diagonal.

4. det(AB) = det(A) det(B).

5. det(A) = 0 if and only if A is singular.

Problem 2.43. Use the definition of the determinant to show that the first
three properties in Theorem 2.34 hold.

It is a little trickier to prove that the fourth and fifth properties of Theorem
2.34 hold. They are consequences of the following lemma.

Lemma 2.35. For any j = 1, 2, …, n, the determinant is a linear function of
the j-th column if all the other columns are left fixed.

We omit the proof.³

Problem 2.44. Let E be an elementary matrix of Type 1 or 2. Prove, using
the definition of a determinant, that the determinant of E is equal to that of
its transpose. Then, using Theorem 2.34, show that this is true for any square
matrix.
Problem 2.45. Show that the determinant of an upper triangular matrix

        [ a_11  a_12  …  a_1n ]
    A = [   0   a_22  …  a_2n ]
        [             ⋱       ]
        [   0    0    …  a_nn ]

is the product of its diagonal entries.

³ See W. Rudin, Principles of Mathematical Analysis, pp. 232-234.
Example 2.36. We now compute a determinant by expressing it as the product
of elementary matrices. Note that we do not explicitly write down these
elementary matrices, but simply need to keep track of the effect they have on
the determinant as we row reduce the matrix to an upper triangular one (i.e.
put the matrix in row echelon form).

        [ 0   1  5 ]           [ 3  -6  9 ]
    det [ 3  -6  9 ]  =  - det [ 0   1  5 ]
        [ 2   6  1 ]           [ 2   6  1 ]

                               [ 1  -2  3 ]
                      =  -3 det[ 0   1  5 ]
                               [ 2   6  1 ]

                               [ 1  -2   3  ]
                      =  -3 det[ 0   1   5  ]  = 165
                               [ 0   0  -55 ]
Problem 2.46. Compute

        [ 3  6  9  3 ]
    det [ 1  0  1  0 ]
        [ 1  2  2  1 ]
        [ 1  3  2  1 ]

Problem 2.47. Compute

        [ 0  2  1  0 ]
    det [ 1  0  1  1 ]
        [ 2  1  3  1 ]
        [ 0  1  2  3 ]
Problem 2.48. Show that the system

    ax + by = e
    cx + dy = f

is consistent for any value of e and f if and only if det(A) ≠ 0, where A is the
coefficient matrix [ a  b ; c  d ].
Chapter 3
Inner Products and Best
Approximations
3.1
Inner Products
Prerequisites:
Vector spaces.
Complex arithmetic.
Advanced Prerequisites:
Function spaces.
Learning Objectives:
Familiarity with the definitions of inner products.
Familiarity with the basic examples of inner product spaces.
Familiarity with the Cauchy-Schwarz inequality.
Familiarity with the concept of orthogonality and the orthogonal complement in inner product spaces.
In R^2, one is able to calculate the angle between vectors using the dot product.
Indeed, if u = (u_1, u_2) and v = (v_1, v_2), the angle θ between u and v satisfies:

    cos(θ) = (u · v) / (‖u‖ ‖v‖),

where u · v = u_1 v_1 + u_2 v_2 is the dot product of u and v, and ‖u‖ = √(u · u) =
√(u_1² + u_2²) is the (usual) Euclidean length of a vector u. In order to extend the
geometric notions of "angle between vectors" and "length of a vector" to more
general vector spaces it would seem enough to have a suitable generalization
of the dot product to more general vector spaces. The inner product is such a
generalization.
Let V be a complex vector space. An inner product on V is a function that
maps pairs of vectors u, v ∈ V to a complex scalar ⟨u, v⟩ in such a way that
for all u, v, w ∈ V and all scalars α, β ∈ C:

1. ⟨u, u⟩ is real and nonnegative, and ⟨u, u⟩ = 0 if and only if u = 0.

2. ⟨u, v⟩ = conj(⟨v, u⟩), the complex conjugate of ⟨v, u⟩.

3. ⟨u, αv + βw⟩ = α⟨u, v⟩ + β⟨u, w⟩.

If V is a real vector space then for every pair of vectors u, v ∈ V, ⟨u, v⟩ is a real
scalar so that conditions (1) and (3) continue to hold; condition (2) becomes:

    ⟨u, v⟩ = ⟨v, u⟩.

A vector space V on which an inner product is defined is called an inner product
space. We define the norm of a vector in an inner product space as ‖u‖ = √⟨u, u⟩.
Examples of inner product spaces:
R2 { the set of real ordered pairs. For vectors u = (u1 ; u2) and v = (v1 ; v2 )
in R2 ,
hu; vi = u1 v1 + u2v2 = ut v
denes an inner product on R2 . This provides a model for generalizations
to other vector spaces.
Cn { the set of complex n-tuples. For vectors u = (u1 ; u2 ; : : : ; un) and
v = (v1 ; v2 ; : : : ; vn ) in C n ,
hu; vi =
n
X
i=1
ui vi = u v
denes an inner product on Cn . (The conjugate transpose of a matrix
A = [aij ] is dened elmentwise as A = B = [bij ] with bij = aji ).
Pn { the set of (complex) polynomials of degree n or less. Pick n + 1
distinct points fzi gni=0 in the complex plane. For polynomials p; q 2 Pn ,
hp; qi =
denes an inner product on Pn .
n
X
i=0
p(zi )q(zi )
C^{m×n}: the set of m×n matrices with complex entries. For any square matrix T ∈ C^{n×n} define trace(T) = Σ_{i=1}^{n} t_{ii}. Then for any A, B ∈ C^{m×n},
⟨A, B⟩ = trace(A*B)
defines an inner product on C^{m×n}. The associated norm, ‖A‖_F = √(trace(A*A)) = √(Σ_{i,j} |a_{ij}|²), is called variously the Frobenius norm, the Hilbert-Schmidt norm, or the Schatten 2-norm. We adhere to the name "Frobenius" and will hang an "F" on such matrix norms to distinguish them from others yet to come.

C[a, b]: the set of real-valued continuous functions on [a, b]. For any functions f, g ∈ C[a, b],
$$
\langle f, g\rangle = \int_a^b f(x)\,g(x)\,dx
$$
defines an inner product on C[a, b].
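As a small numerical aside (not part of the text; numpy is assumed), the trace form of the Frobenius inner product can be checked directly against the entrywise sum it represents.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))

ip_trace = np.trace(A.conj().T @ B)        # <A, B> = trace(A* B)
ip_sum   = np.sum(A.conj() * B)            # the same sum written entrywise
fro      = np.sqrt(np.trace(A.conj().T @ A).real)

print(np.isclose(ip_trace, ip_sum))                  # True
print(np.isclose(fro, np.linalg.norm(A, 'fro')))     # True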
Problem 3.1. Prove that in each case above, the function ⟨·, ·⟩ satisfies all the conditions of an inner product.

It is a subtle but fundamental observation that the defining properties of the inner product are sufficient to guarantee that |⟨u, v⟩|/(‖u‖‖v‖) ≤ 1. This allows us to extend sensibly the notion of angle between vectors in general vector spaces, so that the angle θ_{uv} between the vectors u, v ∈ V is defined so that
cos(θ_{uv}) = ⟨u, v⟩/(‖u‖‖v‖).
The following theorem guarantees that the right-hand quantity is bounded by 1 in magnitude and is known as the Cauchy-Schwarz inequality. (Note that the conditions given for equality translate into the observation that θ_{uv} is defined to be either 0 or π if and only if u and v are collinear.)
Theorem 3.1. Let u, v ∈ V, an inner product space. Then
|⟨u, v⟩|² ≤ ‖u‖²‖v‖².
Equality holds if and only if u = kv for some scalar k ∈ C.

Proof: Let a = ‖v‖², b = 2|⟨u, v⟩| and c = ‖u‖². First, if a = 0 then v = 0 (by property 1 of an inner product), which would then imply (by property 3 of an inner product) that b = 0. Clearly in this case the conclusion is true. Now, consider the case a > 0 and choose θ ∈ [0, 2π) so that e^{iθ}⟨u, v⟩ is a real nonnegative number. Then
e^{iθ}⟨u, v⟩ = |e^{iθ}⟨u, v⟩| = |⟨u, v⟩|.
Pick a real number t arbitrarily and define z = t e^{iθ}. One may calculate
$$
0 \le \|u + zv\|^2 = \langle u + zv,\, u + zv\rangle
= \|u\|^2 + z\langle u, v\rangle + \overline{z\langle u, v\rangle} + |z|^2\|v\|^2
= c + bt + at^2.   \qquad (3.1)
$$
Since this is true for all t ∈ R, the quadratic polynomial at² + bt + c is always nonnegative, so it cannot have two distinct real roots, and thus has a nonpositive discriminant b² − 4ac. Thus
b² ≤ 4ac,
and the conclusion follows since b² = 4|⟨u, v⟩|² and 4ac = 4‖u‖²‖v‖².
Notice that equality in the Cauchy-Schwarz inequality is equivalent to a zero discriminant (b² = 4ac), which means that equality can be attained in (3.1) with t = −b/(2a). But that means in turn that u + zv = 0, so that u is a scalar multiple of v. □
Let V be an inner product space. Two vectors u and v in V are called orthogonal (or perpendicular) if ⟨u, v⟩ = 0. A set of vectors {v₁, v₂, ..., v_r} ⊂ V is called orthogonal if
⟨v_i, v_j⟩ = 0, i ≠ j.
An orthogonal set is called orthonormal if furthermore
⟨v_i, v_i⟩ = ‖v_i‖² = 1, i = 1, 2, ..., r.
For example, the set {(0, −i/√2, (1+i)/2), (0, (1+i)/2, 1/√2)} is orthonormal.
Let W be a subspace of an inner product space V. The set of vectors
W^⊥ = {u ∈ V such that ⟨u, w⟩ = 0 for all w ∈ W}
is called the orthogonal complement of W.

Problem 3.2. Show that
W^⊥ is a subspace of V;
W ⊆ (W^⊥)^⊥;
W ∩ W^⊥ = {0}.

Problem 3.3. Show that if B is a set of orthogonal vectors, none of which is the zero vector, then B is linearly independent.

Problem 3.4. Show that if B = {w₁, w₂, ..., w_ℓ} is a set of linearly independent vectors, then the ℓ×ℓ matrix G = [⟨w_i, w_j⟩] is nonsingular. G is called the Gram matrix for B.

Problem 3.5. Prove the Pythagorean Theorem in inner product spaces: if u and v are orthogonal then ‖u + v‖² = ‖u‖² + ‖v‖².
3.2 Best approximation and projections
Prerequisites:
Inner product spaces.
Matrix algebra.
Learning Objectives:
Familiarity with the concept of (possibly skew) projections. The ability
to compute orthogonal projections.
Understanding of the role of orthogonal projections in solving best approximation problems.
Very often in application settings a particular subspace, say W, of an inner product space V may have some special properties that make it useful and interesting to approximate any given vector v ∈ V as well as possible with a corresponding vector w ∈ W. That is, find w* ∈ W that solves
$$
\min_{w\in W} \|w - v\| = \|w^* - v\|.   \qquad (3.2)
$$
We have the following characterization of the solution w*.

Theorem 3.2. Let W be a subspace of an inner product space V. The vector w* ∈ W is a solution to (3.2) if and only if w* − v ⊥ W. Furthermore, for any given v ∈ V, there can be no more than one such solution w*.
Proof: Suppose that w* ∈ W is a solution to (3.2) and pick an arbitrary vector w ∈ W. Choose θ ∈ [0, 2π) so that e^{iθ}⟨w* − v, w⟩ is real and nonnegative. Now, for any real ε > 0 define z = −ε e^{iθ} and notice that
$$
\|w^* - v\|^2 \le \|(w^* + zw) - v\|^2
= \|w^* - v\|^2 - 2\varepsilon|\langle w^* - v, w\rangle| + \varepsilon^2\|w\|^2.   \qquad (3.3)
$$
So for all ε > 0,
0 ≤ −2ε|⟨w* − v, w⟩| + ε²‖w‖²,
which means
0 ≤ −2|⟨w* − v, w⟩| + ε‖w‖².
Since we are free to make ε as small as we like, it must be that ⟨w* − v, w⟩ = 0, and this is true for each w ∈ W.
To prove the converse, suppose w* ∈ W satisfies w* − v ⊥ W. Then for any w ∈ W, (w − w*) ∈ W, and
‖w − v‖² = ‖(w − w*) + (w* − v)‖² = ‖w − w*‖² + ‖w* − v‖² ≥ ‖w* − v‖²,
so w* solves (3.2).
To prove uniqueness, suppose that there were two solutions to (3.2), say w₁* and w₂*. Then (w₁* − v) ∈ W^⊥ and (w₂* − v) ∈ W^⊥. Since W^⊥ is a subspace, we find
(w₁* − v) − (w₂* − v) = (w₁* − w₂*) ∈ W^⊥.
On the other hand, (w₁* − w₂*) ∈ W. Since the only vector both in W and in W^⊥ is 0 (see Problem 3.2), we find w₁* = w₂*. □
While this is a nice characterization of solutions to best approximation problems, the result leaves open the question of whether a solution w* to (3.2) always exists and, if so, how one might go about calculating it. The following theorem describes one way of obtaining a solution to (3.2).

Theorem 3.3. Let V be an inner product space. If W is a finite-dimensional subspace of V, then every vector v ∈ V can be expressed uniquely as
v = w* + w^⊥,
where w* ∈ W and w^⊥ ∈ W^⊥. w* is the unique solution to (3.2).
Proof: Let
B = {w₁, w₂, ..., w_r}
be a basis for W. Define [γ_{ij}] to be the r×r matrix inverse to the Gram matrix [⟨w_i, w_j⟩] (see Problem 3.4) and set
$$
w^* = \sum_{i,j=1}^{r} w_i\, \gamma_{ij}\, \langle w_j, v\rangle,
$$
and pick an arbitrary w ∈ W. Then
w = a₁w₁ + ··· + a_r w_r
where a_i ∈ C, i = 1, 2, ..., r. We need only prove that v − w* and w are orthogonal regardless of which w ∈ W was chosen, and then we can define w^⊥ = v − w*. Using properties of the inner product we see that
$$
\langle w, v - w^*\rangle = \sum_{k=1}^{r} \overline{a_k}\, \langle w_k, v - w^*\rangle
= \sum_{k=1}^{r} \overline{a_k}\bigl(\langle w_k, v\rangle - \langle w_k, w^*\rangle\bigr).
$$
Observe that
$$
\langle w_k, w^*\rangle = \sum_{i,j=1}^{r} \langle w_k, w_i\rangle\, \gamma_{ij}\, \langle w_j, v\rangle
= \sum_{j=1}^{r}\Bigl(\sum_{i=1}^{r} \langle w_k, w_i\rangle\,\gamma_{ij}\Bigr)\langle w_j, v\rangle = \langle w_k, v\rangle.
$$
Thus ⟨w, v − w*⟩ = 0. Since w* − v ∈ W^⊥, w* is the unique solution to (3.2). □
Corollary 3.4. W = (W^⊥)^⊥.
Proof: We saw in Problem 3.2 that W ⊆ (W^⊥)^⊥. Pick any vector v ∈ (W^⊥)^⊥. Then v = w* + w^⊥ and, since w* ∈ W ⊆ (W^⊥)^⊥, one finds that v − w* ∈ (W^⊥)^⊥. But v − w* = w^⊥ ∈ W^⊥; so v − w* = 0 and v = w* ∈ W. □
The vector w* in the above decomposition of v is called the orthogonal projection of v onto W. The vector w^⊥ is called the component of v orthogonal to W. From the proof of the theorem, one can see that best approximations from a subspace W can be found in a straightforward way if a basis for W is known. The mapping that carries the vector v to the vector w* that solves (3.2) is called an orthogonal projector, which we will denote as w* = P_W(v). P_W is a linear transformation, since evidently
$$
w^* = \sum_{i,j=1}^{r} w_i\, \gamma_{ij}\, \langle w_j, v\rangle
$$
inherits linearity with respect to v from the linearity of the inner product with respect to its second argument. If V = Cⁿ, we have the following convenient matrix representation of P_W in terms of {w₁, w₂, ..., w_r}:
P_W = W G^{-1} W*,
where W = [w₁, w₂, ..., w_r] and G = W*W is the Gram matrix for {w₁, w₂, ..., w_r}.
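For computation, the representation P_W = W G^{-1} W* translates directly into a few lines of numpy. The sketch below (real arithmetic for simplicity; the basis columns are illustrative choices, not from the text) builds the projector from a basis of W and checks the properties characterized in the next theorem.

import numpy as np

W = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])                    # columns: a basis for a 2-dim subspace of R^3
G = W.T @ W                                 # Gram matrix G = W* W
P = W @ np.linalg.solve(G, W.T)             # P_W = W G^{-1} W*

v = np.array([1., 2., 3.])
w_star = P @ v                              # orthogonal projection of v onto W
print(np.allclose(P @ P, P), np.allclose(P.T, P))   # idempotent, self-adjoint
print(np.allclose(W.T @ (v - w_star), 0))           # v - w* is orthogonal to W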
In general, what characterizes an orthogonal projector?

Theorem 3.5. P_W is an orthogonal projector onto a subspace W of Cⁿ if and only if:
1. Ran(P_W) = W
2. P_W² = P_W
3. P_W* = P_W

Proof: Suppose that P_W represents an orthogonal projector onto W, so that w* = P_W v solves (3.2) for each v ∈ Cⁿ. Then, in particular, for any vector w ∈ W taken as v, w* = w itself solves (3.2), so w = P_W w, and as a consequence Ran(P_W) = W. Furthermore, for any vector v ∈ Cⁿ, w* = P_W v ∈ W, so
P_W² v = P_W(P_W v) = P_W w* = w* = P_W v,
implying that P_W² = P_W. Finally, for any vectors u, v ∈ Cⁿ, u − P_W u ∈ W^⊥ and P_W v ∈ W, so
⟨u − P_W u, P_W v⟩ = 0
⟨u, P_W v⟩ − ⟨P_W u, P_W v⟩ = 0
⟨u, P_W v⟩ − ⟨u, P_W* P_W v⟩ = 0
⟨u, (P_W − P_W* P_W)v⟩ = 0.
Thus, P_W = P_W* P_W and as a consequence,
(P_W)* = (P_W* P_W)* = P_W* P_W = P_W.
Conversely, suppose that P_W is a matrix satisfying the three properties above. Then for any vector v ∈ Cⁿ and any vector w ∈ W, we find
⟨v − P_W v, w⟩ = ⟨v, w⟩ − ⟨P_W v, w⟩
= ⟨v, w⟩ − ⟨v, P_W* w⟩
= ⟨v, w⟩ − ⟨v, P_W w⟩
= ⟨v, w⟩ − ⟨v, w⟩ = 0.
Thus, P_W v solves (3.2) for each v ∈ Cⁿ and so represents an orthogonal projection. □
Problem 3.6. If P_W represents an orthogonal projection onto a subspace W of Cⁿ, show that I − P_W represents an orthogonal projection onto W^⊥.

Problem 3.7. Given a matrix C ∈ R^{n×r} such that Ker(C) = {0}, show that
P_{Ran(C)} = C(CᵗC)^{-1}Cᵗ
represents an orthogonal projection onto Ran(C). What role does the assumption on Ker(C) play here?

Problem 3.8. Given a matrix B ∈ R^{r×m} such that Ran(B) = R^r, show that
P_{Ker(B)} = I − Bᵗ(BBᵗ)^{-1}B
represents an orthogonal projection onto Ker(B). What role does the assumption on Ran(B) play here?
Any linear transformation Q_W that satisfies the two properties
1. Ran(Q_W) = W
2. Q_W² = Q_W
is called a skew projector (or just a projector) onto the subspace W.
Problem 3.9. Show that if Q_W is a projector (either skew or orthogonal), then
Ran(Q_W) = Ker(I − Q_W) and Ker(Q_W) = Ran(I − Q_W).

Problem 3.10. Suppose A ∈ R^{m×n} has a left inverse B_L. Prove that Q = AB_L is a (possibly skew) projector onto Ran(A).

Problem 3.11. Suppose A ∈ R^{m×n} has a right inverse B_R. Prove that Q = I − B_R A is a (possibly skew) projector onto Ker(A).
Let V = R³ and let W be spanned by (1, 0, 0) and (0, 1, 0); that is, W is the x-y plane. For a given u ∈ R³, the vectors u, w*, and w^⊥ form a right triangle with u the hypotenuse, w* in the x-y plane, and w^⊥ parallel to the z-axis.
Problem 3.12. Let W be the subspace of R³ spanned by the vectors
v₁ = (0, 1, 0),  v₂ = (−4/5, 0, 3/5).
Show that v₁ and v₂ form an orthonormal set and find P_W u, where u = (1, 2, 1).
3.3 Pseudoinverses
Prerequisites:
Vector spaces.
Inner products.
Matrix inverses.
Left and right inverses.
Projections.
Learning Objectives:
Familiarity with pseudosolutions and pseudoinverses.
Finding closest vectors out of subspaces can be used to extend the concepts of left and right inverses. Suppose that A ∈ C^{m×n} has a right inverse and m < n. We know nullity(A) > 0, so A cannot have a left inverse. Although Ax = b is consistent for any b, each right hand side b will be associated with an infinite family of solutions.

Problem 1: Find the smallest solution x̂ to Ax = b.

Now, suppose instead that A ∈ C^{m×n} has a left inverse and m > n. We know rank(A) = n < m, so A cannot have a right inverse and Ax = b will be inconsistent for some b.

Problem 2: Find a vector x̂ that brings Ax as close as possible to b.

In each of these cases, we seek vectors that are in some sense or other the best possible "solution" to Ax = b.

Given any (rectangular) matrix A ∈ C^{m×n}, the pseudoinverse of A is a matrix B ∈ C^{n×m} that provides, for each b ∈ C^m, a solution Bb = x* ∈ Cⁿ to the following best approximation problem, which is an aggregate of sorts of Problems 1 and 2:

Problem 3 (= 1 + 2): Find a vector x* that solves
$$
\min_{x\in C^n} \|Ax - b\|   \qquad (3.4)
$$
such that ‖x*‖ is minimal.
A variety of notations are found for the pseudoinverse; the most common appears to be B = "A^†". The definition (3.4) does little to give insight into what actions are taken to transform b into x*. For that reason, the following prescription for constructing x* may be more useful as a definition of A^†:

The Action of the Pseudoinverse
Define P to be the orthogonal projection onto Ran(A) and Q to be the orthogonal projection onto Ker(A)^⊥.
1. Find the component of b in Ran(A): Pb = y*.
2. Find any one solution, x̂, to the linear system Ax = y*.
3. Find the component of x̂ in Ker(A)^⊥: x* = Qx̂.

Two issues immediately emerge:
Is the construction above well-defined, to the extent that the final result x* is the same regardless of which intermediate result x̂ was picked?
Does x* solve (3.4)?
Theorem 3.6. The construction of x* specified above is well-defined and produces the unique solution to (3.4).
Proof: Step (1) defines y* uniquely as the solution to (3.2) with W = Ran(A) and v = b:
$$
\|y^* - b\| = \min_{y\in \mathrm{Ran}(A)} \|y - b\| = \min_{x\in C^n} \|Ax - b\|.
$$
Since y* ∈ Ran(A), the linear system Ax = y* of Step (2) must be consistent and has at least one solution, say x̂. Notice that x̂ is a solution to
min_{x∈Cⁿ} ‖Ax − b‖ = ‖Ax̂ − b‖,
and in fact any solution to Ax = y* will be a minimizer in the same sense. To show that the outcome of Step (3) is independent of which solution x̂ was picked in Step (2), suppose that two solutions, x̂₁ and x̂₂, were known that solve Ax = y*, and these happened to produce two outcomes in Step (3): x₁* = Qx̂₁ and x₂* = Qx̂₂. Since Ax̂₁ = y* = Ax̂₂, rearrangement gives A(x̂₁ − x̂₂) = 0, so that (x̂₁ − x̂₂) ∈ Ker(A). We find
x₁* − x₂* = Q(x̂₁ − x̂₂) = 0.
Thus, x₁* = x₂* and the outcome of Step (3) is uniquely determined regardless of which solution in Step (2) was used. Furthermore, any such solution x̂ can be decomposed as
x̂ = x* + n̂,
where n̂ ∈ Ker(A). By the Pythagorean Theorem,
‖x̂‖² = ‖x*‖² + ‖n̂‖²,
so that out of all possible solutions x̂, the minimal norm solution must occur when n̂ = 0, that is, when x̂ = x*. □

In some cases, the prescription for x* = A^†b can be used to find what amounts to a formula for A^†.
Theorem 3.7. If A ∈ C^{m×n} has the full rank factorization A = XY*, where X ∈ C^{m×p} and Y ∈ C^{n×p} are both of rank p, then the pseudoinverse is given by
A^† = Y(Y*Y)^{-1}(X*X)^{-1}X*.
Proof: We first construct the projections P and Q. Notice that Ran(A) = Ran(X) and that Ker(A)^⊥ = Ran(A*) = Ran(Y). Then, directly,
P = X(X*X)^{-1}X*,  Q = Y(Y*Y)^{-1}Y*.
We now seek a solution to the system of equations
Ax̂ = XY*x̂ = X(X*X)^{-1}X*b = Pb.
Noticing that rank(X) = p implies that Ker(X) = {0}, we can rearrange to find
0 = XY*x̂ − X(X*X)^{-1}X*b
0 = X(Y*x̂ − (X*X)^{-1}X*b)
0 = Y*x̂ − (X*X)^{-1}X*b.
While this does not yield a formula for x̂ itself (recall that there won't generally be just one solution), we may premultiply both sides of the final equation by Y(Y*Y)^{-1} to get
Y(Y*Y)^{-1}Y*x̂ = Y(Y*Y)^{-1}(X*X)^{-1}X*b, which implies
x* = Qx̂ = Y(Y*Y)^{-1}(X*X)^{-1}X*b.        (3.5)
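The formula of Theorem 3.7 is easy to try out numerically. The following sketch (illustrative factors, real arithmetic, numpy assumed) builds a rank-2 matrix from a full rank factorization A = XY* and compares the resulting formula with numpy's own pseudoinverse.

import numpy as np

X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])                    # m x p with rank p = 2
Y = np.array([[1., 1.],
              [0., 1.],
              [2., 0.]])                    # n x p with rank p = 2
A = X @ Y.T                                 # A = X Y*, a 3 x 3 matrix of rank 2

# A^dagger = Y (Y*Y)^{-1} (X*X)^{-1} X*
Adag = Y @ np.linalg.solve(Y.T @ Y, np.linalg.solve(X.T @ X, X.T))
print(np.allclose(Adag, np.linalg.pinv(A)))          # True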
Problem 3.13. Using the permuted LU factorization, show that for any matrix A ∈ C^{m×n}, if the rank of A is p then there are matrices X ∈ C^{m×p} and Y ∈ C^{n×p}, both of rank p, so that A = XY*.
3.4 Orthonormal Bases and the QR Decomposition
Prerequisites:
Vector spaces.
Inner products.
Matrix inverses.
Left and right inverses.
Projections.
Learning Objectives:
Ability to write vectors as linear combinations of an orthonormal basis.
Ability to use the Gram-Schmidt process to convert any basis to an orthonormal basis.
Ability to compute the QR decomposition of a matrix.
If S = {w₁, ..., w_r} is a basis for a subspace W of a vector space V and u is an arbitrary vector in W, then u can be uniquely expressed as
u = k₁w₁ + ··· + k_r w_r.
In particular, if W is a subspace of Cⁿ, then by lining up the vectors of S as columns of a matrix W = [w₁, w₂, ..., w_r] and placing the unknown coefficients into a vector k = [k₁, ..., k_r]ᵗ, the coefficients can be found directly by solving the system of equations Wk = u. If the system is inconsistent then u was not in the subspace W after all. The next theorem tells us that if the vectors of S are orthonormal then the coefficients are especially easy to find whenever u is in the subspace W, and the case that u is not in the subspace W is also especially easy to discover.
Theorem 3.8. If W = {w₁, w₂, ..., w_r} is an orthonormal basis for a subspace W of an inner product space V, then
$$
\sum_{i=1}^{r} |\langle w_i, u\rangle|^2 \le \|u\|^2   \qquad (\text{"Bessel's Inequality"})
$$
and u ∈ W if and only if
$$
\|u\|^2 = \sum_{i=1}^{r} |\langle w_i, u\rangle|^2   \qquad (\text{"Parseval Relation"})
$$
in which case,
u = ⟨w₁, u⟩w₁ + ⟨w₂, u⟩w₂ + ··· + ⟨w_r, u⟩w_r.
Proof: Set α_i = ⟨w_i, u⟩ and calculate
$$
0 \le \Bigl\|u - \sum_{i=1}^{r} \alpha_i w_i\Bigr\|^2
= \|u\|^2 - \Bigl\langle u, \sum_{j=1}^{r}\alpha_j w_j\Bigr\rangle - \Bigl\langle \sum_{i=1}^{r}\alpha_i w_i,\, u\Bigr\rangle + \sum_{i,j=1}^{r}\overline{\alpha_i}\,\alpha_j\langle w_i, w_j\rangle
$$
$$
= \|u\|^2 - \sum_{j=1}^{r}\alpha_j\langle u, w_j\rangle - \sum_{i=1}^{r}\overline{\alpha_i}\langle w_i, u\rangle + \sum_{j=1}^{r}|\alpha_j|^2
= \|u\|^2 - \sum_{j=1}^{r}|\alpha_j|^2.
$$
This establishes Bessel's Inequality. The Parseval relation holds if and only if ‖u‖² = Σ_{j=1}^{r}|α_j|², which in turn occurs precisely when u = Σ_{i=1}^{r} α_i w_i.
On the other hand, if u ∈ W then, since W is a basis for W, there exist constants k₁, ..., k_r such that
u = k₁w₁ + k₂w₂ + ··· + k_r w_r.
Taking the inner product against w₁, we obtain that
α₁ = ⟨w₁, u⟩ = ⟨w₁, k₁w₁ + k₂w₂ + ··· + k_r w_r⟩
= k₁⟨w₁, w₁⟩ + k₂⟨w₁, w₂⟩ + ··· + k_r⟨w₁, w_r⟩
= k₁·1 + k₂·0 + ··· + k_r·0
= k₁.
Similarly,
α₂ = ⟨w₂, u⟩ = k₂, ..., α_r = ⟨w_r, u⟩ = k_r.
Thus, u = Σ_{i=1}^{r} α_i w_i and the Parseval Relation must hold. □
Notice that any u ∈ V can be written as u = w* + w^⊥ with w* ∈ W and w^⊥ ⊥ W. Since
⟨w_i, u⟩ = ⟨w_i, w*⟩ + ⟨w_i, w^⊥⟩ = ⟨w_i, w*⟩,
we have that w* = ⟨w₁, u⟩w₁ + ⟨w₂, u⟩w₂ + ··· + ⟨w_r, u⟩w_r, and so through the calculation of the inner products ⟨w_i, u⟩ we've solved the best approximation problem
$$
\min_{w\in W}\|w - u\|^2 = \|w^* - u\|^2 = \|w^\perp\|^2 = \|u\|^2 - \sum_{i=1}^{r}|\langle w_i, u\rangle|^2.
$$
The degree of "tightness" in Bessel's inequality then indicates how well u can be approximated by vectors in the subspace W.
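A short numerical illustration (not from the text; numpy assumed, with the orthonormal vectors used in the nearby problems, signs as reconstructed here) shows Bessel's inequality and the error identity above in action.

import numpy as np

w1 = np.array([0., 1., 0.])
w2 = np.array([-4/5, 0., 3/5])              # w1, w2 form an orthonormal set
u  = np.array([1., 1., 1.])

c = np.array([w1 @ u, w2 @ u])              # coefficients <w_i, u>
w_star = c[0] * w1 + c[1] * w2              # best approximation to u from W
print(np.sum(c**2) <= u @ u)                # Bessel's inequality holds
print(np.isclose(np.linalg.norm(u - w_star)**2,
                 u @ u - np.sum(c**2)))     # ||u - w*||^2 = ||u||^2 - sum |<w_i, u>|^2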
Problem 3.14. Suppose that W = {(0, 1, 0), (−4/5, 0, 3/5)} is an orthonormal basis for W. Test Bessel's inequality and solve the best approximation problem for each of the following vectors:
a. (1, 1, 1)   b. (2, 1, 0)   c. (3, 2, 1).
Which of the vectors lies closest to W?
The preceding discussion shows how an orthonormal basis for a subspace can be used to good effect. But how do we go about finding such a basis? The next theorem demonstrates how to construct one starting from any given basis. The construction is called the Gram-Schmidt Orthogonalization Process.

Theorem 3.9. Let V be an inner product space with a basis
S = {u₁, u₂, ..., uₙ}.
Then V has an orthonormal basis
{q₁, q₂, ..., qₙ}
so that for each k = 1, ..., n,
span{u₁, u₂, ..., u_k} = span{q₁, q₂, ..., q_k}.
Proof: First we note that since S is a basis, u₁ ≠ 0. Thus we may set
q₁ = u₁/‖u₁‖.
Then ‖q₁‖ = 1 and span{u₁} = span{q₁}. To construct q₂, compute the component of u₂ orthogonal to span{q₁} as in Theorem 3.3 and divide by its length to produce a vector of length one. So we define
w₂ = u₂ − ⟨q₁, u₂⟩q₁.
To see that w₂ ≠ 0, note that if it were zero, then u₂ would be a scalar multiple of q₁, which is in turn a scalar multiple of u₁. This would mean that {u₁, u₂} is a linearly dependent set and could not be a part of a basis set for V, contradicting our initial hypothesis. So w₂ ≠ 0 and we define
q₂ = w₂/‖w₂‖.
Clearly q₂ has length 1, and since q₂ is orthogonal to span{q₁}, {q₁, q₂} is an orthonormal set. Furthermore, u₂ = r₁₂q₁ + r₂₂q₂ where r₁₂ = ⟨q₁, u₂⟩ and r₂₂ = ‖w₂‖, so that span{u₁, u₂} = span{q₁, q₂}.
Now we continue the construction inductively. Suppose for some k > 1 we've produced a set of k−1 orthonormal vectors {q₁, q₂, ..., q_{k−1}} so that for each j = 1, ..., k−1, span{u₁, u₂, ..., u_j} = span{q₁, q₂, ..., q_j}. (We've done this above for k = 3.) To construct q_k, we will compute the component of u_k orthogonal to span{q₁, q₂, ..., q_{k−1}} and then divide by its length, producing a vector of length one. Define
$$
w_k = u_k - \sum_{j=1}^{k-1} \langle q_j, u_k\rangle\, q_j.
$$
Again we must check that w_k ≠ 0. Were it so, then u_k would be in the span of {q₁, q₂, ..., q_{k−1}} and hence would be a linear combination of {u₁, u₂, ..., u_{k−1}}. This implies that {u₁, u₂, ..., u_k} is a linearly dependent set and could not be a part of a basis for V, contradicting our starting hypothesis. Thus w_k ≠ 0 and we define
q_k = w_k/‖w_k‖.
A quick calculation verifies, for each j = 1, ..., k−1,
$$
\langle q_j, q_k\rangle = \frac{1}{\|w_k\|}\langle q_j, w_k\rangle
= \frac{1}{\|w_k\|}\Bigl\langle q_j,\ u_k - \sum_{\ell=1}^{k-1}\langle q_\ell, u_k\rangle q_\ell\Bigr\rangle
= \frac{1}{\|w_k\|}\Bigl(\langle q_j, u_k\rangle - \sum_{\ell=1}^{k-1}\langle q_\ell, u_k\rangle\langle q_j, q_\ell\rangle\Bigr)
= \frac{1}{\|w_k\|}\bigl(\langle q_j, u_k\rangle - \langle q_j, u_k\rangle\bigr) = 0,   \qquad (3.6)
$$
so {q₁, q₂, ..., q_k} is an orthonormal set. Furthermore, u_k = Σ_{ℓ=1}^{k} r_{ℓk}q_ℓ, where r_{ℓk} = ⟨q_ℓ, u_k⟩ for ℓ = 1, ..., k−1 and r_{kk} = ‖w_k‖. Thus, u_k ∈ span{q₁, q₂, ..., q_k}. Since we already know that
span{u₁, u₂, ..., u_{k−1}} = span{q₁, q₂, ..., q_{k−1}} ⊆ span{q₁, q₂, ..., q_k},
we find that span{u₁, u₂, ..., u_j} = span{q₁, q₂, ..., q_j} for each j = 1, ..., k, which completes the induction step. □
Problem 3.15. Apply the Gram-Schmidt process to the vectors
(1, 1, 1), (0, 1, 1), (1, 2, 1).

Problem 3.16. Let
W = span{(−1, 0, 1, 2), (0, 1, 0, 1)}.
Find the solution to min_{w∈W} ‖w − v‖ where v = (−1, 2, 6, 0). (Hint: first use the Gram-Schmidt process.)
Notice that if the original vectors {u₁, u₂, ..., uₙ} are a basis for a subspace of C^m, then the conclusions of Theorem 3.9 can be interpreted as giving a matrix decomposition, the QR decomposition, for a matrix having {u₁, u₂, ..., uₙ} as columns.

Theorem 3.10. Let A ∈ C^{m×n} have rank n. Then there exist matrices Q ∈ C^{m×n} and R ∈ C^{n×n} so that Q*Q = I, R is upper triangular with strictly positive diagonal entries, and
A = QR.
Proof: The columns of A, {a₁, a₂, ..., aₙ}, form a basis for Ran(A). Applying the Gram-Schmidt process to {a₁, a₂, ..., aₙ} produces orthonormal vectors {q₁, q₂, ..., qₙ} so that for each k = 1, 2, ..., n,
$$
a_k = \sum_{j=1}^{k} r_{jk}\, q_j,
$$
where, in particular, r_{kk} = ‖w_k‖ > 0 as defined in the proof of Theorem 3.9. This is just a column-by-column description of A = QR with Q = [q₁, ..., qₙ]. Orthonormality of {q₁, q₂, ..., qₙ} is equivalent to Q*Q = I. □
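The proof above is already an algorithm. A minimal classical Gram-Schmidt sketch (numpy assumed; the test matrix is illustrative) computes Q and R column by column exactly as in Theorems 3.9 and 3.10.

import numpy as np

def gram_schmidt_qr(A):
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        w = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]     # r_jk = <q_j, u_k>
            w -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(w)         # r_kk = ||w_k|| > 0 when A has full column rank
        Q[:, k] = w / R[k, k]
    return Q, R

A = np.array([[1., 0., 1.],
              [1., 1., 2.],
              [1., 1., 1.]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q.T @ Q, np.eye(3)), np.allclose(Q @ R, A))   # True True

In floating point arithmetic the classical process can lose orthogonality; library routines such as numpy.linalg.qr use more careful variants, but the idea is the same.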
Problem 3.17. Modify the Gram-Schmidt process so that it will produce an orthonormal basis for an inner product space V, starting with any spanning set for V, {u₁, u₂, ..., uₙ} (not necessarily a basis). How does this change Theorem 3.10?
3.5 Unitary Transformations and the Singular Value Decomposition
Prerequisites:
Vector spaces.
Inner products.
Matrix inverses.
Left and right inverses.
Projections.
Learning Objectives:
Familiarity with unitary matrices.
Familiarity with the concepts of unitarily equivalent and unitarily similar
matrices.
Familiarity with the singular value decomposition of a matrix.
Consider a matrix U ∈ C^{n×n} satisfying any one of the properties defined below.

U preserves length if for all x ∈ Cⁿ,
‖Ux‖ = ‖x‖.
U preserves inner products if for all x and y in Cⁿ,
⟨Ux, Uy⟩ = ⟨x, y⟩.
U is a unitary matrix if U*U = I (that is, if the columns of U are orthonormal vectors in Cⁿ).

The first goal will be to show that if U has any one of these properties, it has the remaining two as well. Since the inner product of two vectors is proportional to the cosine of the angle between them, equivalence of these three properties amounts to the observation that a length preserving transformation also preserves angles, and that such a transformation is always associated with a unitary matrix. The action of this transformation is simply a rigid motion of the vectors of Cⁿ. Such a motion involves only rotations and reflections through coordinate planes.
We introduce some standard notation. Let w = u + iv be a complex number with u and v real. We call u and v the real and imaginary parts of w and denote them by
u = Re(w) and v = Im(w).
Note that
w + w̄ = 2Re(w),  w − w̄ = 2i Im(w),  Re(−iw) = Im(w),
and
w w̄ = Re(w)² + Im(w)² =: |w|².
Theorem 3.11. An n×n matrix U preserves length if and only if it preserves inner products, and if and only if it is unitary.

Proof: Suppose first that U preserves inner products. Then
‖Ux‖² = ⟨Ux, Ux⟩ = ⟨x, x⟩ = ‖x‖².
Hence U preserves length.
Suppose now that U preserves lengths. We need the following equality, whose proof we leave as a problem.

Problem 3.18. Show for all complex numbers a and b,
|a + b|² = |a|² + āb + a b̄ + |b|².

From the result of Problem 3.18, if x = (x₁, ..., xₙ) and y = (y₁, ..., yₙ), then
$$
\|x + y\|^2 = \|x\|^2 + \sum_{i=1}^{n}\bigl(\overline{x_i}\,y_i + x_i\,\overline{y_i}\bigr) + \|y\|^2
= \|x\|^2 + 2\,\mathrm{Re}\langle x, y\rangle + \|y\|^2.   \qquad (3.7)
$$
Substituting −iy for y in (3.7) gives
$$
\|x - iy\|^2 = \|x\|^2 + 2\,\mathrm{Re}\langle x, -iy\rangle + \|y\|^2
= \|x\|^2 + 2\,\mathrm{Re}[-i\langle x, y\rangle] + \|y\|^2
= \|x\|^2 + 2\,\mathrm{Im}\langle x, y\rangle + \|y\|^2.   \qquad (3.8)
$$
Applying U to (3.7) and (3.8) and using the fact that U preserves lengths gives
‖U(x + y)‖² = ‖Ux‖² + 2Re⟨Ux, Uy⟩ + ‖Uy‖²
= ‖x‖² + 2Re⟨Ux, Uy⟩ + ‖y‖²
and
‖U(x − iy)‖² = ‖Ux‖² + 2Im⟨Ux, Uy⟩ + ‖Uy‖²
= ‖x‖² + 2Im⟨Ux, Uy⟩ + ‖y‖².
Thus we see that the real and imaginary parts of ⟨x, y⟩ and ⟨Ux, Uy⟩ agree. Hence
⟨Ux, Uy⟩ = ⟨x, y⟩
and U preserves inner products.
If U is unitary, then it preserves length since
‖Ux‖² = ⟨Ux, Ux⟩ = ⟨U*Ux, x⟩ = ⟨Ix, x⟩ = ⟨x, x⟩ = ‖x‖².
Conversely, if U preserves length (and so, inner products) then for all x and y in Cⁿ,
⟨Ix, y⟩ = ⟨x, y⟩ = ⟨Ux, Uy⟩ = ⟨U*Ux, y⟩,
so that ⟨(U*U − I)x, y⟩ = 0 for all x and y in Cⁿ. But this is possible only if (U*U − I) = 0, which is to say, only if U is unitary. □
A matrix is unitary if and only if either the rows or the columns of the matrix form an orthonormal basis for Cⁿ.
Indeed, the columns of the unitary matrix U are the vectors
{Ue₁, ..., Ueₙ},
where e₁, ..., eₙ are the natural basis vectors for Cⁿ. By the previous theorem, U preserves both lengths and inner products, and so
⟨Ue_j, Ue_k⟩ = 0 if j ≠ k and 1 if j = k.
This says that the columns of U form an orthonormal basis for Cⁿ. We leave the converse to the reader.
Notice that U*U = I implies U^{-1} = U*, so that UU* = (U*)*U* = I and U* evidently is unitary as well. But by our preceding discussion this means that the columns of U* (which are the conjugates of the rows of U) form an orthonormal basis for Cⁿ. Hence the rows of U themselves form an orthonormal basis for Cⁿ.
Problem 3.19.
Show that the product of unitary matrices is a unitary matrix.
Show that U is a unitary matrix if and only if the larger partitioned matrix
$$
\begin{pmatrix} I & 0\\ 0 & U\end{pmatrix}
$$
is unitary.
Two matrices A, B ∈ C^{m×n} are said to be unitarily equivalent if there are unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n} such that B = U*AV. Likewise, two square matrices A, B ∈ C^{n×n} are said to be unitarily similar if there is a unitary transformation U ∈ C^{n×n} such that B = U*AU = U^{-1}AU.
Perhaps the single most useful matrix representation in matrix theory is the Singular Value Decomposition (SVD):

Theorem 3.12. Every matrix is unitarily equivalent to a diagonal matrix (of the same size) having nonnegative entries on the diagonal. In particular, suppose A ∈ C^{m×n} and rank(A) = r. There exist unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n} so that
A = UΣV*        (3.9)
where Σ = diag(σ₁, σ₂, ...) ∈ C^{m×n} with σ₁ ≥ σ₂ ≥ ··· ≥ σ_r > 0 and σ_{r+1} = ··· = σ_p = 0 for p = min{m, n}.

The columns of U = [u₁, u₂, ..., u_m] are called the left singular vectors; the columns of V = [v₁, v₂, ..., vₙ] are called the right singular vectors; and σ₁, σ₂, ... are the singular values of A. We'll prove this theorem while discussing some adjacent ideas.
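For reference, numpy computes this decomposition directly; the sketch below (illustrative matrix, not from the text) recovers A from the factors and reads off the rank from the number of nonzero singular values.

import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])               # rank 2: the third column is the sum of the first two
U, s, Vh = np.linalg.svd(A)                # s holds sigma_1 >= sigma_2 >= ...; Vh is V*
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)
print(np.allclose(U @ Sigma @ Vh, A))      # A = U Sigma V*
print(s, int(np.sum(s > 1e-12)))           # singular values; numerical rank is 2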
The Frobenius norm of a matrix is invariant under unitary equivalence since, for any unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n},
‖U*AV‖_F² = trace([U*AV]*U*AV)
= trace(V*A*UU*AV)
= trace(V*A*AV)
= trace(A*AVV*)
= trace(A*A) = ‖A‖_F².
If V is partitioned by columns as V = [v₁, v₂, ..., vₙ], notice that
$$
\|A\|_F^2 = \|AV\|_F^2 = \sum_{i=1}^{n} \|Av_i\|^2.
$$
While different choices of unitary V won't change the overall sum, they can affect the distribution of magnitudes among the summands. For a given matrix A, we will seek to collect the "mass" of the sum as close to the beginning of the summation as possible. In particular, this means we'll seek an orthonormal basis of Cⁿ (the columns of V),
{v₁, v₂, ..., vₙ},
that maximizes the sequence of quantities:
‖Av₁‖², ‖Av₁‖² + ‖Av₂‖², ‖Av₁‖² + ‖Av₂‖² + ‖Av₃‖², ...
Although at first blush this may seem hopelessly complicated, notice that the first quantity maximized depends only on v₁; the second depends (in effect) only on v₂, since we've already gotten the best v₁; the third quantity depends only on v₃ in the same sense; and so on. At each step we are only concerned with maximizing with respect to the next v_k in line, having already chosen the best values of all previous v's.
The first step proceeds as follows: define σ₁ = max_{‖v‖=1} ‖Av‖, let v₁ be the maximizing vector, and let u₁ = (1/σ₁)Av₁. Now starting with v₁, complete an orthonormal basis for Cⁿ and fill out the associated unitary matrix V₁ having v₁ as its first column. Likewise, starting with u₁, complete an orthonormal basis for C^m and fill out the associated unitary matrix U₁ having u₁ as its first column. Examining the partitioned matrix product U₁*AV₁ yields
$$
U_1^* A V_1 = \begin{pmatrix} \sigma_1 & w^*\\ 0 & \hat A_2\end{pmatrix}.
$$
The 0 in the (2,1) location comes from the orthogonality of u₁ (which is a multiple of Av₁) to all the remaining columns of U₁. We now will show that w = 0. Suppose we define the vectors
$$
x = \begin{pmatrix} \sigma_1\\ w\end{pmatrix}
\quad\text{and}\quad
v = V_1\,\frac{1}{\|x\|}\, x.
$$
Then we find
$$
Av = U_1\,\frac{1}{\|x\|}\begin{pmatrix} \sigma_1^2 + w^*w\\ \hat A_2 w\end{pmatrix}.
$$
In particular, since U₁ is unitary, we have that
$$
\|Av\| \ \ge\ \frac{\sigma_1^2 + w^*w}{\|x\|} = \sqrt{\sigma_1^2 + w^*w} \ \ge\ \sigma_1 = \max_{\|v\|=1}\|Av\|.
$$
But the last expression is the largest possible value that the previous expressions can attain, so in fact all inequalities are actually equalities, which in turn means that it must be that w = 0. At this point, we've shown that any matrix is unitarily equivalent to a matrix having first row and column zero except for a nonnegative diagonal entry.
Continuing to the next step, we go through the same construction on Â₂: define σ₂ = max_{‖v‖=1} ‖Â₂v‖, let v̂₂ be the maximizing vector, and let û₂ = (1/σ₂)Â₂v̂₂. Now starting with v̂₂, complete an orthonormal basis for C^{n−1} and fill out the associated unitary matrix V̂₂ having v̂₂ as its first column. Likewise, starting with û₂, complete an orthonormal basis for C^{m−1} and fill out the associated unitary matrix Û₂ having û₂ as its first column. Similar reasoning to that found in the first step above reveals
$$
\hat U_2^* \hat A_2 \hat V_2 = \begin{pmatrix} \sigma_2 & 0\\ 0 & \hat A_3\end{pmatrix}.
$$
Problem 3.20. Explain why σ₂ as defined above satisfies
σ₂ ≥ ‖Av‖ for all vectors v with ‖v‖ = 1 and ⟨v, v₁⟩ = 0.
We finish the second step by constructing
$$
V_2 = \begin{pmatrix} I & 0\\ 0 & \hat V_2\end{pmatrix}
\quad\text{and}\quad
U_2 = \begin{pmatrix} I & 0\\ 0 & \hat U_2\end{pmatrix}.
$$
Then one has
$$
(U_1U_2)^*\, A\, V_1V_2 = \begin{pmatrix} \sigma_1 & 0 & 0\\ 0 & \sigma_2 & 0\\ 0 & 0 & \hat A_3\end{pmatrix}.
$$
The construction of the SVD continues in this way, so that before we begin step k we have found an orthonormal set of vectors v₁, v₂, ..., v_{k−1} and a nonnegative scalar σ_k such that
σ_k ≥ ‖Av‖ for all vectors v with ‖v‖ = 1 and ⟨v, v_i⟩ = 0 for i = 1, 2, ..., k−1.
Step k continues by defining unit vectors v_k and u_k so that Av_k = σ_k u_k.
Notice that the singular values σ_i and (right) singular vectors v_i are constructed in such a way that, as well,
σ_k ≤ ‖Av‖ for all vectors v with ‖v‖ = 1 and v ∈ span{v₁, v₂, ..., v_k}.
This is easy to see since, under these circumstances,
$$
Av = U\Sigma V^* v = \sum_{i=1}^{n} \sigma_i (v_i^* v)\, u_i = \sum_{i=1}^{k} \sigma_i (v_i^* v)\, u_i,
$$
so
$$
\|Av\|^2 = \sum_{i=1}^{k} \sigma_i^2\, |v_i^* v|^2 \ \ge\ \sigma_k^2 \sum_{i=1}^{k} |v_i^* v|^2 = \sigma_k^2\,\|v\|^2.
$$
The last step comes from an application of the Parseval Relation.
The final set of ideas that we'll explore here centers on a result known as the Mirsky-Eckart-Young inequality:

Theorem 3.13. Let A be an m×n complex matrix with singular values σ₁, σ₂, ..., σ_p (p = min(m, n)), left singular vectors u₁, u₂, ..., u_m, and right singular vectors v₁, v₂, ..., vₙ. If B is any m×n complex matrix with rank(B) = k < p then
$$
\|A - B\|_F \ \ge\ \sqrt{\sum_{i=k+1}^{p} \sigma_i^2},
$$
with equality occurring for the rank k matrix defined by
$$
B_\star = \sum_{i=1}^{k} \sigma_i\, u_i v_i^*.
$$

Proof.
Notice first that
$$
\|A - B_\star\|_F = \|U^*(A - B_\star)V\|_F = \|\Sigma - \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)\|_F = \sqrt{\sum_{i=k+1}^{p}\sigma_i^2},
$$
so equality is achieved with B_⋆, and B_⋆ is evidently of rank k. Now let B be an arbitrary m×n complex matrix with rank(B) = k < p and observe that the "rank + nullity" theorem allows us to conclude that Ker(B) has dimension n − k. In the following lines we will generate a distinguished orthonormal basis for Ker(B) that we will label as {z_{k+1}, z_{k+2}, ..., zₙ}. Let's start by considering the sequence of subspaces:
Z_{k+1} = span{v₁, v₂, ..., v_{k+1}} ∩ Ker(B), an intersection of dimension at least 1 (the span is (k+1)-dimensional);
Z_{k+2} = span{v₁, v₂, ..., v_{k+2}} ∩ Ker(B), an intersection of dimension at least 2;
Z_{k+3} = span{v₁, v₂, ..., v_{k+3}} ∩ Ker(B), an intersection of dimension at least 3;
...
Z_{n−1} = span{v₁, v₂, ..., v_{n−1}} ∩ Ker(B), an intersection of dimension at least n − k − 1;
Zₙ = Ker(B), an (n − k)-dimensional subspace.
Notice that the subspaces are nested as
Z_{k+1} ⊆ Z_{k+2} ⊆ ··· ⊆ Z_{n−1} ⊆ Ker(B),
so we can pick a sequence of linearly independent vectors {z_{k+1}, z_{k+2}, ..., zₙ} so that
span{z_{k+1}} = Z_{k+1}
span{z_{k+1}, z_{k+2}} = Z_{k+2}
span{z_{k+1}, z_{k+2}, z_{k+3}} = Z_{k+3}
...
span{z_{k+1}, z_{k+2}, ..., zₙ} = Ker(B).
Although our selection of {z_i}_{i=k+1}^{n} might not be orthonormal, we can apply the Gram-Schmidt process to {z_i}_{i=k+1}^{n} to produce an orthonormal set of vectors with the same spanning properties given above. Thus, we can assume without loss of generality that {z_i}_{i=k+1}^{n} is an orthonormal set, an orthonormal basis for Ker(B), in fact. We continue by augmenting {z_i}_{i=k+1}^{n} with additional orthonormal vectors {z_i}_{i=1}^{k} that span Ker(B)^⊥ to create a full orthonormal basis for Cⁿ. The unitary invariance of the Frobenius norm leads to
$$
\|A - B\|_F^2 = \sum_{i=1}^{n}\|(A - B)z_i\|^2
\ \ge\ \sum_{i=k+1}^{n}\|(A - B)z_i\|^2
= \sum_{i=k+1}^{n}\|Az_i\|^2
\ \ge\ \sum_{i=k+1}^{p}\sigma_i^2,
$$
with the last step justified by the observation that z_i ∈ span{v₁, v₂, ..., v_i}, so σ_i ≤ ‖Az_i‖ for each i = k+1, k+2, ..., n. This proves the inequality. □
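The equality case of Theorem 3.13 can be checked numerically: truncating the SVD after k terms gives a rank-k matrix whose Frobenius distance from A is exactly the square root of the sum of the discarded squared singular values. A short sketch (random test matrix; numpy assumed):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]          # the rank-k truncated SVD
err = np.linalg.norm(A - B, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:]**2))))  # True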
Chapter 4
The Eigenvalue Problem
4.1 Eigenvalues and Eigenvectors
Prerequisites:
Matrix algebra.
Learning Objectives:
Familiarity with the basic definitions for the eigenvalue problem.
Familiarity with the characteristic polynomial and the minimal polynomial associated with a matrix.
The ability to compute eigenvalues by calculating roots of either the characteristic polynomial or the minimal polynomial.
Familiarity with cyclic subspaces and Krylov subspaces and understanding their relationship with the minimal polynomial.
Let A be any n×n complex matrix. The complex number λ is an eigenvalue of A if the equation
Au = λu        (4.1)
is satisfied with a nontrivial solution u ≠ 0. The vector u is called an eigenvector corresponding to λ. The pair (λ, u) may be referred to as an eigenpair. The set of all points in the complex plane that are eigenvalues for A is called the spectrum of A and commonly denoted as σ(A).
4.1.1 Eigenvalue Basics

If we think of A as a transformation taking a vector x ∈ Cⁿ to another vector Ax ∈ Cⁿ, then generally one will expect both the direction and magnitude of Ax to differ from that of the vector x. If x happens to be an eigenvector for A, then A leaves the direction of x unchanged, at least to the extent that Ax would be just a scalar multiple of x. After multiplying both sides of equation (4.1) by any nonzero scalar α, one may see immediately that if (λ, u) is an eigenpair for A then so is (λ, αu). From this one may conclude that the magnitude of an eigenvector is arbitrary (and so, irrelevant) and it is only the direction that is significant.
Aside from differences of scalar multiples, a given eigenvalue might have more than one eigenvector associated with it. A tiny bit of manipulation in (4.1) yields the equivalent
(A − λI)u = 0.        (4.2)
Thus, λ is an eigenvalue for A if and only if the system of equations (4.2) has a nontrivial solution u, which is to say, if and only if (A − λI) is singular. But this can happen, in turn, if and only if
det(A − λI) = 0.        (4.3)
We define the characteristic polynomial of A as
p_A(t) = det(tI − A).
By expanding the determinant, one may see that p_A(t) is an nth degree polynomial in t with the leading coefficient of tⁿ equal to 1 (i.e., it's a monic polynomial of degree n). From (4.3), one sees that the eigenvalues of A must occur exactly at the roots of p_A(t). Since the degree of p_A is n, this means (from the fundamental theorem of algebra) A has at least one and at most n distinct eigenvalues. Each distinct eigenvalue of a matrix has at least one eigenvector associated with it, since if λ satisfies det(A − λI) = 0 then there is at least one nontrivial solution to (A − λI)u = 0.
If p_A(t) = tⁿ + ··· + c₁t + c₀, then we can factor p_A such that
p_A(t) = (t − λ₁)^{m(λ₁)} (t − λ₂)^{m(λ₂)} ··· (t − λ_N)^{m(λ_N)},
where λ₁, λ₂, ..., λ_N are the distinct zeros of p_A (and hence the distinct eigenvalues of A) and m(λ₁), m(λ₂), ..., m(λ_N) are positive integers satisfying m(λ₁) + m(λ₂) + ··· + m(λ_N) = n. m(λ_i) is the algebraic multiplicity of the eigenvalue λ_i.
There are two conventions used in labeling eigenvalues. One may label the distinct eigenvalues of the matrix as we did above, or one may label the eigenvalues "counting multiplicity." When we label the eigenvalues of a matrix counting multiplicity, we list each eigenvalue as many times as its multiplicity. For example, if A is a 6×6 matrix with distinct eigenvalues
λ₁ with multiplicity one,
λ₂ with multiplicity three, and
λ₃ with multiplicity two,
we may label the eigenvalues of A counting multiplicity as {μ_i}_{i=1}^{6} with
μ₁ = λ₁;  μ₂ = μ₃ = μ₄ = λ₂;  and  μ₅ = μ₆ = λ₃.
If the matrix A has all real entries, then
p_A(t) = tⁿ + ··· + c₁t + c₀
is evidently a polynomial with real coefficients. Thus if λ is an eigenvalue of A then
p_A(λ) = λⁿ + ··· + c₁λ + c₀ = 0.
Taking the complex conjugate of both sides of the above equation and recalling that the c_i are all real gives that
λ̄ⁿ + ··· + c₁λ̄ + c₀ = 0.
Thus λ̄ is a zero of p_A and hence an eigenvalue of A.
Similarly, if A has all real entries and u is an eigenvector corresponding to λ, then conjugation yields
Āū = λ̄ū.
Using the fact that Ā = A gives that
Aū = λ̄ū.
Hence ū is an eigenvector of A corresponding to λ̄.
We have proved the following theorem.

Theorem 4.1. Let A be an n×n matrix with all real entries. Then if (λ, u) is an eigenpair for A, so is (λ̄, ū).
Example 4.2. Let
$$
A = \begin{pmatrix} 2 & 1\\ -5 & -2\end{pmatrix}.
$$
We find that det(A − λI) = λ² + 1. Thus the eigenvalues are i and −i. To find the eigenspace for i, we solve the system
$$
\begin{pmatrix} 2 - i & 1\\ -5 & -2 - i\end{pmatrix}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}
$$
and obtain that x₁ = −((i+2)/5)x₂. Thus the λ = i eigenspace for A is one-dimensional with basis vector (−(i+2)/5, 1). By Theorem 4.1 we have that the λ = −i eigenspace is one-dimensional with basis vector ((i−2)/5, 1).
Example 4.3. Let
$$
A = \begin{pmatrix} 3 & 2 & 0\\ 2 & 3 & 0\\ 0 & 0 & 5\end{pmatrix}.
$$
We find that
det(tI − A) = (t − 1)(t − 5)².
To compute the λ = 5 eigenspace we need to solve the system
$$
\begin{pmatrix} -2 & 2 & 0\\ 2 & -2 & 0\\ 0 & 0 & 0\end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\end{pmatrix}.
$$
The eigenspace is two-dimensional, with basis vectors (1, 1, 0) and (0, 0, 1).
The λ = 1 eigenspace is one-dimensional and we leave the calculation to the reader.
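A quick numerical check of this example (entries as reconstructed above; numpy assumed):

import numpy as np

A = np.array([[3., 2., 0.],
              [2., 3., 0.],
              [0., 0., 5.]])
evals, evecs = np.linalg.eig(A)
print(np.sort(evals.real))                        # approximately [1., 5., 5.]
print(3 - np.linalg.matrix_rank(A - 5*np.eye(3))) # nullity(A - 5I) = 2: the geometric multiplicity of 5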
Problem 4.1. Find bases for all the eigenspaces of the following matrices:
$$
1.\ \begin{pmatrix} 10 & -9\\ 4 & -2\end{pmatrix}
\qquad
2.\ \begin{pmatrix} 2 & 0 & 1\\ 6 & 2 & 0\\ 19 & 5 & 4\end{pmatrix}
$$
Notice that if λ is an eigenvalue of A then both (A − λI) and (A − λI)* are singular, so in particular, there are nontrivial solutions to (A − λI)*y = 0. Such a vector y is called a left eigenvector associated with λ, since it satisfies
y*A = λy*.

Problem 4.2. Show that if A is upper triangular then its eigenvalues are precisely its diagonal elements.
Problem 4.3. Show that an n×n matrix A is singular if and only if λ = 0 is an eigenvalue for A.
Problem 4.4. Show that the eigenvalues of a unitary matrix must have the form e^{iθ} for θ ∈ [0, 2π).
Looking back to (4.2) again, λ is an eigenvalue of A if and only if nullity(A − λI) > 0, and every nontrivial vector in Ker(A − λI) is an eigenvector associated with the eigenvalue λ. The subspace Ker(A − λI) is called the eigenspace associated with the eigenvalue λ, and nullity(A − λI) = dim Ker(A − λI) is called the geometric multiplicity of λ.
The geometric multiplicity of an eigenvalue can be different from its algebraic multiplicity. For example, the matrix
$$
A = \begin{pmatrix} 3 & 7\\ 0 & 3\end{pmatrix}
$$
has the characteristic polynomial p_A(t) = (t − 3)². So 3 is the (only) eigenvalue of A and it has algebraic multiplicity 2. But notice that
$$
\mathrm{nullity}(A - 3I) = \mathrm{nullity}\begin{pmatrix} 0 & 7\\ 0 & 0\end{pmatrix} = 1,
$$
so the geometric multiplicity of 3 is only 1.
Recall that two matrices A and B are similar if there exists a nonsingular matrix S so that A = S^{-1}BS.

Problem 4.5. Suppose S^{-1}BS = A where A and B are n×n matrices. If (λ, u) is an eigenpair for A, show that (λ, Su) is an eigenpair for B.

Theorem 4.4. Similar matrices have the same eigenvalues with the same geometric and algebraic multiplicities.
Proof: Suppose A and B are similar, and A = S^{-1}BS. Then
S^{-1}(B − λI)S = S^{-1}(BS − λIS)
= S^{-1}(BS − SλI)
= S^{-1}BS − λI
= A − λI.
Thus
det(A − λI) = det(S^{-1}(B − λI)S)
= det(S^{-1}) det(B − λI) det(S)
= det(B − λI).
Thus A and B have the same characteristic polynomial (having the same roots with the same multiplicities!).
To show the geometric multiplicities of the eigenvalues are the same, recall first that the ranks of similar matrices are the same. Then since
nullity(A − λI) = n − rank(A − λI) = n − rank(B − λI) = nullity(B − λI),
λ is an eigenvalue of both A and B with the same geometric multiplicity. □
The geometric multiplicity of an eigenvalue can never exceed the algebraic multiplicity. Indeed, suppose an eigenvalue λ of the matrix A has geometric multiplicity g(λ) = nullity(A − λI), and let the columns of the matrix S₁ ∈ C^{n×g(λ)} span the eigenspace for λ: Ker(A − λI). The columns of S₁ form a basis for the eigenspace associated with λ that may be extended to a basis for all of Cⁿ by augmenting S₁ with the columns of a second matrix S₂ ∈ C^{n×[n−g(λ)]}. The partitioned matrix S = [S₁ S₂] then is invertible and has columns that span Cⁿ, so in particular, there will be matrices Â₁₂ and Â₂₂ so that AS₂ = S₁Â₁₂ + S₂Â₂₂. Thus,
$$
AS = [\,AS_1\ \ AS_2\,] = [\,\lambda S_1\ \ S_1\hat A_{12} + S_2\hat A_{22}\,]
= [\,S_1\ \ S_2\,]\begin{pmatrix} \lambda I_{g(\lambda)} & \hat A_{12}\\ 0 & \hat A_{22}\end{pmatrix},
$$
and so,
$$
S^{-1}AS = \begin{pmatrix} \lambda I_{g(\lambda)} & \hat A_{12}\\ 0 & \hat A_{22}\end{pmatrix}.
$$
The characteristic polynomial of A then can be expressed as
p_A(t) = det(tI − A) = (t − λ)^{g(λ)} det(tI − Â₂₂).
Since det(tI − Â₂₂) is itself a monic polynomial in t that may have (t − λ) as a factor, p_A(t) has λ as a root with a multiplicity of at least g(λ).
4.1.2 The Minimal Polynomial

As polynomials play a fundamental role in our understanding of eigenvalues, we first recall a useful fact from algebra that will be important in what follows.

Proposition 4.5. If p and q are two polynomials with deg(p) ≥ deg(q), then there exist polynomials φ and ρ with deg(φ) = deg(p) − deg(q) and deg(ρ) < deg(q) such that
p(x) = φ(x)q(x) + ρ(x).
Example 4.6. Suppose p(x) = x³ − 2x² + 3x + 2 and q(x) = x² − x + 2. Then if we take φ(x) = ax + b and ρ(x) = cx + d, we find that
φ(x)q(x) + ρ(x) = (ax + b)(x² − x + 2) + (cx + d)
= ax³ + (b − a)x² + (2a − b + c)x + (2b + d)
= x³ − 2x² + 3x + 2
is satisfied with coefficients a, b, c, and d that solve
$$
\begin{pmatrix} 1 & 0 & 0 & 0\\ -1 & 1 & 0 & 0\\ 2 & -1 & 1 & 0\\ 0 & 2 & 0 & 1\end{pmatrix}
\begin{pmatrix} a\\ b\\ c\\ d\end{pmatrix}
= \begin{pmatrix} 1\\ -2\\ 3\\ 2\end{pmatrix}.
$$
One finds a = 1, b = −1, c = 0, and d = 4, and indeed one may multiply out to verify:
x³ − 2x² + 3x + 2 = (x − 1)(x² − x + 2) + 4.
Let q(t) = cₙtⁿ + ··· + c₁t + c₀ be a polynomial. The matrix q(A) is defined by
q(A) = cₙAⁿ + ··· + c₁A + c₀I.
We have the following theorem.

Theorem 4.7. Let (λ, x) be an eigenpair for A; then (q(λ), x) is an eigenpair for q(A).
Proof: Since (λ, x) is an eigenpair for A, we have Ax = λx. Thus
A²x = AAx = Aλx = λAx = λ²x.
Continuing, we see that for all k = 1, 2, ..., n,
A^k x = λ^k x.
Thus
q(A)x = cₙAⁿx + ··· + c₁Ax + c₀Ix
= cₙλⁿx + ··· + c₁λx + c₀x
= q(λ)x.
Thus (q(λ), x) is an eigenpair for q(A). □

Problem 4.6. Let U_λ denote the eigenspace of A associated with the eigenvalue λ and U_{q(λ)} the eigenspace of q(A) associated with q(λ). Show that U_λ ⊆ U_{q(λ)}, and give an example showing that U_λ ≠ U_{q(λ)} is possible.
Pick any vector v ∈ Cⁿ and consider the sequence of vectors {v, Av, A²v, ...}. Such a sequence of vectors is called a Krylov sequence generated by v. Since each vector of the sequence is in Cⁿ, not more than n vectors of any Krylov sequence can be linearly independent. Define the subspace M spanned by the full Krylov sequence:
M = span{v, Av, A²v, ...}.
M is called the cyclic subspace generated by v. The first s := dim(M) vectors of the Krylov sequence are linearly independent and so must constitute a basis for M. Indeed, if this were not the case then A^ℓv ∈ span{v, Av, ..., A^{ℓ−1}v} for some ℓ < s, which would imply for each successive power of A:
A^{ℓ+1}v ∈ span{Av, A²v, ..., A^ℓv} ⊆ span{v, Av, ..., A^{ℓ−1}v}
A^{ℓ+2}v ∈ span{A²v, A³v, ..., A^{ℓ+1}v} ⊆ span{v, Av, ..., A^{ℓ−1}v}
...
which would imply in turn that
M ⊆ span{v, Av, ..., A^{ℓ−1}v},
and that dim M ≤ ℓ < s, contradicting the fact that dim M = s.
Pick any vector v ∈ Cⁿ. From the discussion above we see that if s is the dimension of the cyclic subspace generated by v, then s ≤ n and A^s v ∈ span{v, Av, ..., A^{s−1}v}. Thus, for some set of coefficients {c₀, c₁, ..., c_{s−1}},
A^s v = c_{s−1}A^{s−1}v + ··· + c₁Av + c₀v,
or equivalently,
(A^s − c_{s−1}A^{s−1} − ··· − c₁A − c₀I)v = 0,  that is,  q(A)v = 0,
where q is a polynomial of degree s. q is called the minimal polynomial for v and is said to "annihilate" v. Any polynomial that annihilates v will have the minimal polynomial for v as a factor (which is why it's called "minimal"). To see this, suppose that a polynomial p annihilates v: p(A)v = 0. p has to have degree at least as big as s (the degree of p must be at least as big as that of q, otherwise the cyclic subspace that led us to q would have had a smaller dimension). But then we can divide q into p to get
p(t) = φ(t)q(t) + ρ(t),
where ρ(t) is the remainder, which must then have degree strictly less than s. This leads to trouble, since then
0 = p(A)v = φ(A)q(A)v + ρ(A)v = ρ(A)v
and ρ is a polynomial of degree strictly less than s = deg(q) that still annihilates v. But the same reasoning that led us to conclude that the degree of p had to be at least as big as s = deg(q) then forces us to a contradiction unless ρ(t) ≡ 0. Thus ρ = 0 and q is a factor of p.
From the fundamental theorem of algebra, any polynomial of degree s, and q(t) in particular, has precisely s roots (counting possible multiplicity) and may be factored as
q(t) = (t − μ₁)(t − μ₂) ··· (t − μ_s).
At least one of {μ₁, μ₂, ..., μ_s} must be an eigenvalue of A, since q(A)v = 0 implies that q(A) is singular, and since
q(A) = (A − μ₁I)(A − μ₂I) ··· (A − μ_sI),
at least one of the factors (A − μ_iI) must be singular; if not even one were singular then q(A) would be nonsingular, leading to a contradiction.
But in fact, they all have to be singular. Suppose to the contrary that one of the factors is nonsingular and relabel the μ's so that (A − μ_sI) is the nonsingular factor. Then we may premultiply q(A)v = 0 by (A − μ_sI)^{-1} (the factors, being polynomials in A, commute) to get
(A − μ₁I)(A − μ₂I) ··· (A − μ_{s−1}I)v = 0,
that is,
(A^{s−1} − ĉ_{s−2}A^{s−2} − ··· − ĉ₁A − ĉ₀I)v = 0
for some set of coefficients {ĉ₀, ĉ₁, ..., ĉ_{s−2}}. But this implies that A^{s−1}v ∈ span{v, Av, A²v, ..., A^{s−2}v} and that the cyclic subspace generated by v has dimension no bigger than s − 1, which then conflicts with the way that s was defined. This means that all the roots of the minimal polynomial for a vector v will always be eigenvalues of the matrix A.
Example 4.8. Let
$$
A = \begin{pmatrix} 3 & -1\\ 2 & 0\end{pmatrix}.
$$
Start with v = (1, 0)ᵗ and form
$$
K = [\,v\ \ Av\ \ A^2v\,] = \begin{pmatrix} 1 & 3 & 7\\ 0 & 2 & 6\end{pmatrix}.
$$
The reduced row echelon form for K is
$$
U = \begin{pmatrix} 1 & 0 & -2\\ 0 & 1 & 3\end{pmatrix}.
$$
Thus, A²v = (−2)v + 3Av, which we rearrange to get
(A² − 3A + 2I)v = 0.
Then we find that q(λ) = λ² − 3λ + 2 = (λ − 1)(λ − 2). Thus λ₁ = 1 and λ₂ = 2 are eigenvalues of A.
To find the eigenspace associated with λ₁ = 1, we need to find the solution set to the system
(A − 1·I)u = 0.
The solution set for this system is a one-dimensional subspace spanned by the vector (1, 2).
Similarly, the eigenspace associated with λ₂ = 2 is a one-dimensional subspace spanned by (1, 1).
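The computation in this example is mechanical enough to script. The sketch below (entries of A as reconstructed above; numpy assumed) builds the Krylov matrix, expresses A²v in terms of {v, Av}, and reads off the minimal polynomial of v and its roots.

import numpy as np

A = np.array([[3., -1.],
              [2.,  0.]])
v = np.array([1., 0.])

K = np.column_stack([v, A @ v, A @ A @ v])   # K = [v, Av, A^2 v]
c = np.linalg.solve(K[:, :2], K[:, 2])       # A^2 v = c[0] v + c[1] Av
print(c)                                     # [-2.  3.]
# Minimal polynomial of v: t^2 - c[1] t - c[0] = t^2 - 3t + 2 = (t - 1)(t - 2)
print(np.roots([1., -c[1], -c[0]]))          # roots 2 and 1 are eigenvalues of A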
Since there are an infinite number of possibilities for selecting the starting vector v, how many different eigenvalues of A could be generated? Well, since the minimal polynomial for any vector can't have degree any larger than n, there will be choices of v, say v̂ is one of them, for which the associated minimal polynomial q̂ has maximal degree m, so all other minimal polynomials associated with other vectors can have degree no larger than m.
In fact, all other minimal polynomials will be factors of q̂! To see that, pick a second vector u that is not a scalar multiple of v̂ and let q_u be the minimal polynomial for u. Define q_{uv̂} to be the lowest common multiple of q_u and q̂, the lowest order polynomial that contains both q_u and q̂ as factors. Then q_{uv̂} annihilates every vector in the two-dimensional subspace spanned by v̂ and u:
q_{uv̂}(A)(αv̂ + βu) = 0,
and so must contain the minimal polynomial of each vector in span{v̂, u} as a factor. How many distinct such minimal polynomials could there be? q_{uv̂} has degree no bigger than 2m; minimal polynomials will have degree no bigger than m, so we can count the number of ways one may pick m or fewer monomial factors out of a set of 2m possible choices. The exact maximum number of possibilities is not so important as the fact that there is a finite number of possibilities, and so there must be only a finite number of possible distinct minimal polynomials for vectors in span{v̂, u}. We can partition this two-dimensional subspace into a finite number of equivalence classes, each containing those vectors having the same minimal annihilating polynomial. These equivalence classes can't all be lines, since we can't cover a two-dimensional space with a finite number of lines. Thus it will have to happen that two linearly independent vectors x, y ∈ span{v̂, u} can be found that have the same minimal polynomial q_x = q_y. Now we can go through the previous argument and construct a lowest common multiple q_{xy} which annihilates all vectors in span{x, y} = span{v̂, u}. But now q_{xy} = q_x = q_y has degree no bigger than m; it annihilates v̂ and so contains q̂ as a factor; and since both q_{xy} and q̂ have degree m, it must be that q_{xy} = q̂. So q̂ annihilates every vector in span{v̂, u}. But since u was arbitrarily chosen, q̂ actually annihilates every vector in Cⁿ:
q̂(A)v = 0 for all v ∈ Cⁿ,
or equivalently,
q̂(A) = 0.
q̂ is called the minimal polynomial for A. Since q̂ is itself the minimal polynomial for a vector v̂, there can be no polynomial of lower degree that satisfies q(A) = 0. In particular, this means (via arguments similar to what we explored above) that any polynomial p that satisfies p(A) = 0 must contain q̂ as a factor.

Problem 4.7. Why is it necessary for the minimal polynomial of A to have each eigenvalue of A as a root?
4.2 Invariant Subspaces and Jordan Forms
Prerequisites:
Matrix algebra.
Vector spaces
Familiarity with minimal polynomials
Learning Objectives:
Familiarity with invariant subspaces
Familiarity with spectral projectors and their relationship with the minimal polynomial.
Familiarity with Jordan forms associated with a matrix.
Another (potentially less familiar) background fact from polynomial algebra that will be useful for us gives conditions sufficient to guarantee that two polynomials can be found that, when used as "coefficients" in a linear combination of a pair of given polynomials, make all powers but the constant term cancel out. This actually is an outcome of a deeper result giving conditions for the principal ideal generated by a pair of polynomials to be the full set of all polynomials (including the constant polynomial p(t) = c). A proof of this result is not offered here.

Proposition 4.9. If q₁ and q₂ are polynomials with no roots in common, then there exist polynomials p₁ and p₂ so that
p₁(x)q₁(x) + p₂(x)q₂(x) = 1.
Example 4.10. Suppose q₁(x) = x − 2 and q₂(x) = (x − 3)². Then if we try p₁(x) = ax + b and p₂(x) = c (constant), we find
1 = (ax + b)(x − 2) + c(x − 3)²        (4.4)
= (a + c)x² + (b − 2a − 6c)x + (−2b + 9c).        (4.5)
The coefficients a, b, and c satisfy the linear system
$$
\begin{pmatrix} 1 & 0 & 1\\ -2 & 1 & -6\\ 0 & -2 & 9\end{pmatrix}
\begin{pmatrix} a\\ b\\ c\end{pmatrix}
= \begin{pmatrix} 0\\ 0\\ 1\end{pmatrix}.
$$
This leads finally to a = −1, b = 4, and c = 1. One may verify that
(−x + 4)(x − 2) + (x − 3)² = 1.
4.2.1 Invariant Subspaces

A subspace U of Cⁿ is an invariant subspace for A if x ∈ U implies that Ax ∈ U, or more succinctly, if AU ⊆ U. A matrix A ∈ C^{n×n} will have a variety of invariant subspaces: the trivial subspace {0}, the whole space Cⁿ, and eigenspaces are all invariant subspaces. For any vector v ∈ Cⁿ, the cyclic subspace M generated by v is also an invariant subspace for A. Indeed, since any invariant subspace of A that contains v must also contain A^k v for each integer k > 0, M is the smallest invariant subspace of A that contains v.
A large variety of invariant subspaces can be associated with the minimal polynomial.
A large variety of invariant subspaces can be associated with the minimal
polynomial.
Theorem 4.11. If the minimal polynomial, qA , associated with the matrix A
is factored as qA (t) = q1 (t)q2 (t) so that q1 and q2 have no common factor then
the two subspaces U1 = Ker(q1 (A)) and U2 = Ker(q2 (A)) satisfy:
= U1 U2 which means C n = span(U1 ; U2 ) and U1 \ U2 = f0g
U1 and U2 are invariant subspaces for A
1. C n
2.
3. the polynomials q1 and q2 are the lowest order polynomials that annihilate
U1 and U2 , respectively.
Proof: Since q₁ and q₂ have no common factor, there exist polynomials p₁ and p₂ such that
p₁(t)q₁(t) + p₂(t)q₂(t) = 1,
which implies
p₁(A)q₁(A) + p₂(A)q₂(A) = I.
Notice that for any x ∈ Cⁿ,
x = p₁(A)q₁(A)x + p₂(A)q₂(A)x,
so defining x₁ = p₂(A)q₂(A)x and x₂ = p₁(A)q₁(A)x, observe
q₁(A)x₁ = q₁(A)p₂(A)q₂(A)x = p₂(A)q_A(A)x = 0;
thus x₁ ∈ U₁. A similar argument shows that x₂ ∈ U₂. This shows that every vector x ∈ Cⁿ may be written as
x = x₁ + x₂        (4.6)
for some x₁ ∈ U₁ and x₂ ∈ U₂. Suppose w ∈ U₁ ∩ U₂. One may write
w = p₁(A)q₁(A)w + p₂(A)q₂(A)w = 0;
thus U₁ ∩ U₂ = {0}.
To see that U₁ and U₂ are invariant subspaces for A, notice that x ∈ U₁ implies q₁(A)x = 0, which implies Aq₁(A)x = 0, which implies q₁(A)Ax = 0, which implies Ax ∈ U₁. A similar argument can be mustered for U₂.
Finally, suppose that r₁ is any polynomial that annihilates U₁. Then for each x ∈ Cⁿ, we have from (4.6)
r₁(A)q₂(A)x = q₂(A)r₁(A)x₁ + r₁(A)q₂(A)x₂ = 0,
and so r₁(A)q₂(A) = 0. But this implies that r₁(t)q₂(t) must have the minimal polynomial q_A(t) = q₁(t)q₂(t) as a factor, or equivalently, r₁(t) must have q₁(t) as a factor. The same may be done for any polynomial r₂ that annihilates U₂. □
A matrix P is a spectral projector for a matrix A if
1. P² = P
2. Ker(P) is an invariant subspace for A
3. Ran(P) is an invariant subspace for A

Theorem 4.12. The matrices E₁ = p₂(A)q₂(A) and E₂ = p₁(A)q₁(A) are spectral projectors onto the invariant subspaces U₁ and U₂, respectively.
Proof: Immediately, we have that Ker(E₁) = U₂ and Ker(E₂) = U₁, which we established as invariant subspaces for A in the previous theorem. We next show that E₁ and E₂ are idempotent. The previous theorem established that for any x ∈ Cⁿ,
x = E₁x + E₂x.
Applying E₁ to both sides yields
E₁x = E₁²x + E₁E₂x,
and
E₁E₂ = p₁(A)p₂(A)q₁(A)q₂(A) = p₁(A)p₂(A)q_A(A) = 0.
Thus E₁ = E₁², and a similar argument gives that E₂ = E₂². Now note that this means Ran(E₁) = Ker(E₂) = U₁ and Ran(E₂) = Ker(E₁) = U₂ are also invariant subspaces for A. □
The ascent¹, α(λ), of the eigenvalue λ is the smallest integer ν > 0 so that Ker(A − λI)^ν = Ker(A − λI)^{ν+1}.

Theorem 4.13. The minimal polynomial q_A associated with the matrix A can be represented as
$$
q_A(t) = \prod_{\lambda\in\sigma(A)} (t - \lambda)^{\alpha(\lambda)}.   \qquad (4.7)
$$

¹This is sometimes called the index of an eigenvalue, but we eschew the term as too nondescript.
Proof: The distinguishing feature of (4.7) is the ascent as the exponent in each factor (t − λ). Suppose λ is an eigenvalue of A and factor the minimal polynomial as
q_A(t) = q₁(t)q₂(t)
with q₁(t) = (t − λ)^m for some m and q₂(t) such that q₂(λ) ≠ 0. The ascent α(λ) is defined so that (t − λ)^{α(λ)} is the minimal order polynomial annihilating Ker(A − λI)^ℓ for all ℓ ≥ α(λ). Since q₁ is the minimal order polynomial that annihilates Ker(A − λI)^m, it cannot happen that m > α(λ). If it should happen that m < α(λ), then we could find z ∈ Ker(A − λI)^{m+1} such that z ∉ Ker(A − λI)^m. Now, we can use (4.6) to write z = x₁ + x₂ for some x₁ ∈ Ker(A − λI)^m and x₂ ∈ Ker(q₂(A)). Since (t − λ)^{m+1} annihilates both z and x₁, it must annihilate x₂ as well. But if p(t) = (t − λ)^{m+1} annihilates x₂, p(t) must contain the minimal polynomial for x₂, and so p(t) must share a factor with the minimal polynomial for any invariant subspace containing x₂; in particular, (t − λ) must be a factor of q₂(t). This conflicts with the original construction guaranteeing that q₂(λ) ≠ 0. So ultimately, it must be that m = α(λ). □
4.2.2 Jordan Forms
If $A$ has $N$ distinct eigenvalues labeled as $\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$, then (4.7) is manifested as
$$q_A(t) = (t - \lambda_1)^{\alpha(\lambda_1)} (t - \lambda_2)^{\alpha(\lambda_2)} \cdots (t - \lambda_N)^{\alpha(\lambda_N)}. \tag{4.8}$$
Since $\lambda_i \neq \lambda_j$ if $i \neq j$, the individual terms $(t - \lambda_i)^{\alpha(\lambda_i)}$ share no common
factors with any of the other terms $(t - \lambda_j)^{\alpha(\lambda_j)}$ when $i \neq j$. A straightforward
extension of Theorem 4.11 is evident:
Theorem 4.14. Let $A$ have a minimal polynomial given by (4.8) and define
the $N$ subspaces
$$U_i = \mathrm{Ker}(A - \lambda_i I)^{\alpha(\lambda_i)} \quad \text{for } i = 1, 2, \ldots, N.$$
Then
1. $\mathbb{C}^n = U_1 \oplus U_2 \oplus \cdots \oplus U_N$, which means $\mathbb{C}^n = \mathrm{span}(U_1, U_2, \ldots, U_N)$ and $U_i \cap U_j = \{0\}$ whenever $i \neq j$;
2. Each $U_i$ is an invariant subspace for $A$; and
3. $q_i(t) = (t - \lambda_i)^{\alpha(\lambda_i)}$ is the lowest order polynomial that annihilates $U_i$.
The proof is omitted but follows the same lines as the proof of Theorem
4.11. Theorem 4.14 leads immediately to a fundamental matrix representation
called the Jordan Form:
Theorem 4.15. There exists an invertible matrix $S$ such that
$$S^{-1}AS = \begin{bmatrix} J(\lambda_1) & 0 & \cdots & 0 \\ 0 & J(\lambda_2) & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J(\lambda_N) \end{bmatrix}$$
with
$$J(\lambda_i) = \begin{bmatrix} \lambda_i I & R_{12}^{(i)} & \cdots & R_{1\,\alpha(\lambda_i)}^{(i)} \\ 0 & \lambda_i I & \ddots & \vdots \\ \vdots & & \ddots & R_{\alpha(\lambda_i)-1,\,\alpha(\lambda_i)}^{(i)} \\ 0 & \cdots & 0 & \lambda_i I \end{bmatrix}$$
where each matrix $R_{\ell,\ell+1}^{(i)}$ on the superdiagonal has full column rank and
$$\mathrm{rank}\big(R_{\ell,\ell+1}^{(i)}\big) = \mathrm{nullity}(A - \lambda_i I)^{\ell+1} - \mathrm{nullity}(A - \lambda_i I)^{\ell}.$$
Proof: For each $i = 1, \ldots, N$, let $S_i$ be a matrix whose columns form
a basis for $\mathrm{Ker}(A - \lambda_i I)^{\alpha(\lambda_i)}$. Then $S = [S_1, S_2, \ldots, S_N]$ has columns that
collectively form a basis for $\mathbb{C}^n$, and so $S$ is invertible. Since each $S_i$ spans an
invariant subspace for $A$, there exist matrices, call them $J_i$ for the time being,
so that
$$AS_i = S_i J_i. \tag{4.9}$$
Each $J_i$ is a matrix representation of $A$ as a linear transformation of $U_i$ to $U_i$
with respect to the basis given by $S_i$. As a result, $J_i$ will always be a square
matrix with a dimension equal to $\mathrm{nullity}(A - \lambda_i I)^{\alpha(\lambda_i)}$. Notice that different
choices of bases for $\mathrm{Ker}(A - \lambda_i I)^{\alpha(\lambda_i)}$ produce different choices for $S_i$ which,
in turn, induce different choices for $J_i$. But for any such choice, (4.9) remains
valid and can be written collectively as
$$AS = S \begin{bmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_N \end{bmatrix}. \tag{4.10}$$
One natural approach to take in constructing a particular basis for $\mathrm{Ker}(A - \lambda_i I)^{\alpha(\lambda_i)}$
begins with an orthonormal basis for $\mathrm{Ker}(A - \lambda_i I)$ as columns of a
matrix $S_1^{(i)}$. Since $\mathrm{Ker}(A - \lambda_i I) \subseteq \mathrm{Ker}(A - \lambda_i I)^2$, one may extend the columns
of $S_1^{(i)}$ to an orthonormal basis for $\mathrm{Ker}(A - \lambda_i I)^2$ by appending a set of columns,
$S_2^{(i)}$. Notice then that $\mathrm{Ran}[(A - \lambda_i I)S_2^{(i)}] \subseteq \mathrm{Ker}(A - \lambda_i I)$, so there must be a
matrix $R_{12}^{(i)}$ such that $(A - \lambda_i I)S_2^{(i)} = S_1^{(i)} R_{12}^{(i)}$. Furthermore, $R_{12}^{(i)}$ must have
full column rank, for if it didn't there would be a nontrivial vector $z$ such that
$R_{12}^{(i)} z = 0$, which would then imply that $(A - \lambda_i I)S_2^{(i)} z = 0$. This identifies a
linear combination of columns of $S_2^{(i)}$ that lies in $\mathrm{Ker}(A - \lambda_i I)$, despite the fact
that we constructed $S_2^{(i)}$ so as to have all columns orthogonal to $\mathrm{Ker}(A - \lambda_i I)$.
Thus $R_{12}^{(i)}$ must have full column rank.
We may continue the process so that at step $\ell$, say, the columns of $[S_1^{(i)}, S_2^{(i)}, \ldots, S_\ell^{(i)}]$
form an orthonormal basis for $\mathrm{Ker}(A - \lambda_i I)^{\ell}$ and the columns of $S_\ell^{(i)}$ include only
those basis vectors for $\mathrm{Ker}(A - \lambda_i I)^{\ell}$ that are orthogonal to $\mathrm{Ker}(A - \lambda_i I)^{\ell-1}$.
Thus, for any $i = 1, \ldots, N$,
$$(A - \lambda_i I)S_1^{(i)} = 0$$
$$(A - \lambda_i I)S_2^{(i)} = S_1^{(i)} R_{12}^{(i)}$$
$$(A - \lambda_i I)S_3^{(i)} = S_2^{(i)} R_{23}^{(i)} + S_1^{(i)} R_{13}^{(i)}$$
$$\vdots$$
$$(A - \lambda_i I)S_{\alpha(\lambda_i)}^{(i)} = \sum_{j=1}^{\alpha(\lambda_i)-1} S_j^{(i)} R_{j\,\alpha(\lambda_i)}^{(i)}.$$
These expressions can be rearranged to discover
$$AS_i = \big[\, AS_1^{(i)},\ AS_2^{(i)},\ \ldots,\ AS_{\alpha(\lambda_i)}^{(i)} \,\big]
= \Big[\, \lambda_i S_1^{(i)},\ \lambda_i S_2^{(i)} + S_1^{(i)} R_{12}^{(i)},\ \ldots,\ \lambda_i S_{\alpha(\lambda_i)}^{(i)} + \sum_{j=1}^{\alpha(\lambda_i)-1} S_j^{(i)} R_{j\,\alpha(\lambda_i)}^{(i)} \,\Big]$$
$$= \big[\, S_1^{(i)},\ S_2^{(i)},\ \ldots,\ S_{\alpha(\lambda_i)}^{(i)} \,\big]
\begin{bmatrix} \lambda_i I & R_{12}^{(i)} & \cdots & R_{1\,\alpha(\lambda_i)}^{(i)} \\ 0 & \lambda_i I & \ddots & \vdots \\ \vdots & & \ddots & R_{\alpha(\lambda_i)-1,\,\alpha(\lambda_i)}^{(i)} \\ 0 & \cdots & 0 & \lambda_i I \end{bmatrix}
= S_i J(\lambda_i).$$
Thus, (4.10) holds with $J_i = J(\lambda_i)$. $\Box$
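The staircase construction in the proof can be carried out numerically. The sketch below uses a hypothetical $4 \times 4$ matrix with the single eigenvalue $\lambda = 2$ (two Jordan blocks of size 2, so $\alpha(\lambda) = 2$): it builds $S_1$, extends it to $S_2$, recovers $R_{12}$ from $(A - \lambda I)S_2 = S_1 R_{12}$, and checks the rank condition of Theorem 4.15.

```python
import numpy as np
from scipy.linalg import null_space, orth

# Sketch (hypothetical example): staircase bases for Ker(A - 2I) and Ker(A - 2I)^2.
A = np.array([[2., 1., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 2., 1.],
              [0., 0., 0., 2.]])
B = A - 2.0 * np.eye(4)

S1 = null_space(B)                       # orthonormal basis of Ker(A - 2I)
N2 = null_space(B @ B)                   # orthonormal basis of Ker(A - 2I)^2  (= C^4 here)
S2 = orth(N2 - S1 @ (S1.T @ N2))         # extension of S1, orthogonal to Ker(A - 2I)

R12 = S1.T @ (B @ S2)                    # (A - 2I) S2 = S1 R12, since Ran(B S2) lies in Ker(B)
assert np.allclose(B @ S2, S1 @ R12)
print(np.linalg.matrix_rank(R12))        # 2 = nullity(A-2I)^2 - nullity(A-2I): full column rank
```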
Theorem 4.16. Let $A$ be an $n \times n$ matrix with characteristic polynomial $p_A(t)$,
and let $m(\lambda_i)$ denote the algebraic multiplicity of $\lambda_i$ as an eigenvalue of $A$. Then
$$g(\lambda_i) + \alpha(\lambda_i) - 1 \;\le\; m(\lambda_i) \;\le\; g(\lambda_i)\,\alpha(\lambda_i).$$
Proof: If the dimension of $J(\lambda_i)$ is labeled as $d_i$, then $\det(tI - J(\lambda_i)) = (t - \lambda_i)^{d_i}$,
and Theorem 4.15 can be used to establish
$$p_A(t) = \prod_{i=1}^{N} \det(tI - J(\lambda_i)) = \prod_{i=1}^{N} (t - \lambda_i)^{d_i},$$
indicating that the algebraic multiplicity of $\lambda_i$ is $m(\lambda_i) = d_i$. Now, from the
form of $J(\lambda_i)$, one sees immediately that $g(\lambda_i)$ is the dimension of the $(1,1)$
block of $J(\lambda_i)$, the total number of diagonal blocks must be $\alpha(\lambda_i)$, and these blocks
cannot be increasing in dimension. So upper and lower bounds for $d_i$ in terms
of $g(\lambda_i)$ and $\alpha(\lambda_i)$ are evident by inspection. $\Box$
Corollary 4.17 (The "Cayley-Hamilton Theorem"). $p_A(A) = 0$.
Proof. Since $\alpha(\lambda_i) \le m(\lambda_i)$, $p_A(t)$ must contain the minimal polynomial of $A$
as a factor, and so must annihilate $A$. $\Box$
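Corollary 4.17 is easy to check numerically; the sketch below evaluates $p_A(A)$ by Horner's rule, using the characteristic polynomial coefficients returned by numpy.poly (the random test matrix is only an illustration).

```python
import numpy as np

# Sketch: numerically check the Cayley-Hamilton theorem p_A(A) = 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

coeffs = np.poly(A)            # coefficients of p_A(t) = det(tI - A), highest degree first
P = np.zeros_like(A)
for c in coeffs:               # Horner's rule: evaluate the polynomial at the matrix A
    P = A @ P + c * np.eye(4)
print(np.allclose(P, 0))       # True, up to roundoff
```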
4.3 Diagonalization
Prerequisites:
Basic definitions for the eigenvalue problem
Matrix algebra
Vector spaces
Learning Objectives:
Familiarity with conditions under which a matrix can be diagonalized.
Familiarity with conditions sufficient to guarantee linearly independent eigenvectors.
Let $A$ be an $n \times n$ matrix. If $A$ is similar to a diagonal matrix, then $A$ is
said to be diagonalizable.
Theorem 4.18. The matrix $A \in \mathbb{C}^{n \times n}$ is diagonalizable if and only if $A$ has
a set of $n$ linearly independent eigenvectors.
Proof: Suppose $x_1, \ldots, x_n$ are $n$ linearly independent eigenvectors of $A$.
Let $X$ be the $n \times n$ matrix whose $i$th column is $x_i$. Since the columns of $X$ are
linearly independent, we see that $X$ is invertible. Then
$$X^{-1}AX = X^{-1}A[x_1 \ \cdots \ x_n] = X^{-1}[Ax_1 \ \cdots \ Ax_n] = X^{-1}[\lambda_1 x_1 \ \cdots \ \lambda_n x_n]$$
$$= X^{-1}X \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.$$
Conversely, suppose that $A$ is similar to a diagonal matrix
$$D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.$$
Then there exists $X$ invertible such that
$$X^{-1}AX = D,$$
or equivalently,
$$AX = XD,$$
and so if we denote the columns of $X$ by $x_1, \ldots, x_n$, we have
$$A[x_1 \ \cdots \ x_n] = [x_1 \ \cdots \ x_n]\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.$$
This implies that for each $k = 1, 2, \ldots, n$,
$$Ax_k = \lambda_k x_k.$$
Thus $x_1, \ldots, x_n$ are eigenvectors for $A$. Since they are the columns of the
invertible matrix $X$, they are linearly independent. The theorem is proved. $\Box$
Corollary 4.19. The matrix $A$ is diagonalizable if and only if the ascent of
each eigenvalue is 1, which occurs if and only if each eigenvalue appears as a
simple root of the minimal polynomial of $A$.
Let $A$ be a diagonalizable matrix that is similar to the diagonal matrix $D$.
Clearly $A$ can be thought of as the matrix representation for a linear transformation $L$ with respect to the standard basis. As we see from the preceding
proof, $D$ is the matrix representation of $L$ with respect to the basis formed by
the $n$ independent eigenvectors of $A$.
Problem 4.8. Show that
$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$
is not diagonalizable.
Example 4.20. Let
$$A = \begin{bmatrix} 3 & 2 & 0 \\ 2 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}.$$
We find that $A$ has eigenvalues $\lambda_1 = 1$ with multiplicity one and $\lambda_2 = \lambda_3 = 5$
with multiplicity two. The $\lambda_1 = 1$ eigenspace has basis
$$\{(1, -1, 0)\}$$
and the $\lambda_2 = \lambda_3 = 5$ eigenspace has basis
$$\{(1, 1, 0),\ (0, 0, 1)\}.$$
Thus setting
$$X = \begin{bmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
we have
$$X^{-1}AX = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{bmatrix}.$$
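A short check of this example (the matrices are copied from above):

```python
import numpy as np

# Sketch: verify the diagonalization in Example 4.20.
A = np.array([[3., 2., 0.],
              [2., 3., 0.],
              [0., 0., 5.]])
X = np.array([[ 1., 1., 0.],
              [-1., 1., 0.],
              [ 0., 0., 1.]])
print(np.round(np.linalg.inv(X) @ A @ X, 10))   # diag(1, 5, 5)
# Alternatively, np.linalg.eig(A) returns eigenvalues and a matrix of eigenvectors
# directly (possibly in a different order and normalization).
```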
Problem 4.9. Diagonalize the following matrices. That is, find $X$ invertible
and $D$ diagonal such that $X^{-1}AX = D$.
2
1.
4
2
2.
4
2
3.
4
2
4.
6
6
4
19
25
17
9
11
9
6
9
4
2
0
0
0
5
3
1 0 0
3 4 0
3 1 3
0 0 0
0 0 0
3 0 1
3
5
3
5
0
2
0
0
0
0
3
1
0
0
0
3
3
7
7
5
Problem 4.10. Let $L : \mathbb{R}^2 \to \mathbb{R}^2$ where
$$L(x_1, x_2) = (3x_1 + 4x_2,\ 2x_1 + x_2).$$
Find a basis for $\mathbb{R}^2$ relative to which the matrix representation of $L$ is diagonal.
Theorem 4.21. Suppose $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$. If for each
$i = 1, \ldots, k$, $x_i$ is an eigenvector corresponding to $\lambda_i$, then the set
$$\{x_1, \ldots, x_k\}$$
is a linearly independent set.
In particular, Theorem 4.21 implies that if $S_i$ is a basis for the $\lambda_i$-eigenspace,
$i = 1, \ldots, k$, of a matrix $A$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, then $S_1 \cup \cdots \cup S_k$
is still a linearly independent set.
Proof: Suppose that the set $S = \{x_1, \ldots, x_k\}$ is linearly dependent. Suppose
further that the largest linearly independent subset of $S$ contains $0 < r < k$
vectors, so that by re-ordering if necessary we can assume that
$$\{x_1, \ldots, x_r\}$$
is linearly independent, while
$$\{x_1, \ldots, x_r, x_{r+1}\}$$
is linearly dependent. Then
$$k_1 x_1 + \cdots + k_r x_r + k_{r+1} x_{r+1} = 0 \tag{4.11}$$
has a nontrivial solution $k_1, \ldots, k_r, k_{r+1}$. Applying $A$ to both sides of (4.11)
gives
$$0 = A(k_1 x_1 + \cdots + k_r x_r + k_{r+1} x_{r+1}) = k_1 Ax_1 + \cdots + k_r Ax_r + k_{r+1} Ax_{r+1}
= k_1 \lambda_1 x_1 + \cdots + k_r \lambda_r x_r + k_{r+1} \lambda_{r+1} x_{r+1}.$$
Thus
$$k_1 \lambda_1 x_1 + \cdots + k_r \lambda_r x_r + k_{r+1} \lambda_{r+1} x_{r+1} = 0. \tag{4.12}$$
Since $x_1, \ldots, x_r$ are independent, we have by (4.11) that $k_{r+1} \neq 0$ and hence that
at least one of $k_1, \ldots, k_r$ is not zero. Multiplying (4.11) by $\lambda_{r+1}$ and subtracting from
(4.12), we obtain
$$k_1(\lambda_1 - \lambda_{r+1})x_1 + \cdots + k_r(\lambda_r - \lambda_{r+1})x_r = 0.$$
Since $x_1, \ldots, x_r$ are linearly independent, this forces
$$k_1(\lambda_1 - \lambda_{r+1}) = k_2(\lambda_2 - \lambda_{r+1}) = \cdots = k_r(\lambda_r - \lambda_{r+1}) = 0.$$
But since all the $\lambda_i$ are distinct, this implies that $k_1 = k_2 = \cdots = k_r = 0$, which
in turn forces $k_{r+1} = 0$. This contradicts our starting hypothesis, and so the
theorem is proved. $\Box$
Theorem 4.21 allows us to give a sufficient condition for a matrix to be
diagonalizable.
Theorem 4.22. If the $n \times n$ matrix $A$ has $n$ distinct eigenvalues, then $A$ is
diagonalizable.
Proof: Since each of the $n$ distinct eigenvalues gives at least one eigenvector, we have by Theorem 4.21 that the $n$ eigenvectors so obtained are linearly
independent. $\Box$
4.4 The Schur Decomposition
Prerequisites:
Basic definitions for the eigenvalue problem
Matrix algebra
Unitary matrices and orthogonal bases
Learning Objectives:
Familiarity with the construction of unitary triangularization.
Understanding in what sense nondiagonalizable matrices are always close to diagonalizable matrices.
We have seen in the previous two sections that not every square matrix
is similar to a matrix in diagonal form, although construction of the Jordan
form showed that every matrix is similar to a "block diagonal" form having
upper triangular blocks. We pursue here the possibility that every matrix could
be similar to some matrix in triangular form, though not necessarily "block
diagonal" as for the Jordan form. The following theorem is known as Schur's
Theorem.
Theorem 4.23. Given an $n \times n$ matrix $A$, there exists a unitary matrix $U$
such that
$$U^*AU = T$$
where $T$ is upper triangular.
Clearly Theorem 4.23 tells us that A is similar to T via the matrix U. Since
U is unitary we say that A and T are unitarily equivalent. Also the eigenvalues
of A are the main diagonal entries of T. Indeed, this follows since the eigenvalues
of an upper triangular matrix are the entries on the main diagonal and similar
matrices have the same eigenvalues.
Proof: We will prove this theorem by induction on the size $n$ of the matrix.
Suppose $A$ is a $2 \times 2$ matrix. Then $A$ has at least one eigenpair $(\lambda_1, u_1)$ with
$u_1$ a unit vector. If the first coordinate of $u_1$ is nonzero, let
$$S = \{u_1, e_2\}.$$
If the first coordinate is zero, then the second coordinate is not zero and we let
$$S = \{u_1, e_1\}.$$
In either case $S$ is a basis for $\mathbb{C}^2$. Use the Gram-Schmidt process to get an
orthonormal basis
$$\{u_1, v_2\}$$
of $\mathbb{C}^2$. Let $U$ be the unitary matrix with columns $u_1$ and $v_2$. Then $AU$ has
columns
$$Au_1 = \lambda_1 u_1 \quad \text{and} \quad Av_2.$$
Thus the matrix $U^*AU$ has first column
$$\lambda_1 U^* u_1 = \lambda_1 \begin{bmatrix} \langle u_1, u_1 \rangle \\ \langle v_2, u_1 \rangle \end{bmatrix} = \lambda_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
The matrix $U^*AU$ is upper triangular and we are done in the $n = 2$ case.
If $A$ is a $3 \times 3$ matrix, let $(\lambda_1, u_1)$ be an eigenpair as before. Again $u_1$ has
at least one nonzero coordinate. If the first coordinate is nonzero, let
$$S = \{u_1, e_2, e_3\}.$$
If the first coordinate is zero but the second is nonzero, replace $e_2$ in $S$ by $e_1$.
If the first two coordinates of $u_1$ are both zero, replace $e_3$ by $e_1$. In any case
$S$ is a basis for $\mathbb{C}^3$. Using the Gram-Schmidt process we obtain an orthonormal
basis
$$\{u_1, v_2, v_3\}.$$
Let $V$ be the unitary matrix with these vectors as columns. Then
$$V^*AV = \begin{bmatrix} \lambda_1 & * \\ 0 & B \end{bmatrix},$$
where $B$ is a $2 \times 2$ matrix (and $0$ here denotes a $2 \times 1$ block of zeros); the symbol $*$ denotes an entry which may change
from occurrence to occurrence, but is inconsequential in the proof of the theorem.
We know from the $2 \times 2$ case already proved that there exists a $2 \times 2$ unitary
matrix $W$ such that
$$W^*BW = T_1,$$
where $T_1$ is a $2 \times 2$ upper triangular matrix. Set
$$Y = \begin{bmatrix} 1 & 0 \\ 0 & W \end{bmatrix}.$$
We have
$$(VY)^*A(VY) = Y^*(V^*AV)Y = Y^* \begin{bmatrix} \lambda_1 & * \\ 0 & B \end{bmatrix} Y
= \begin{bmatrix} \lambda_1 & * \\ 0 & W^*BW \end{bmatrix}
= \begin{bmatrix} \lambda_1 & * \\ 0 & T_1 \end{bmatrix} = T.$$
Since the columns of $Y$ are orthonormal, $Y$ is unitary. Thus setting $U = VY$,
we have that $U$ is the product of two unitary matrices and is hence
unitary. Since
$$U^*AU = T,$$
we are done in the $3 \times 3$ case.
For $n = 4$ we proceed as we did in the $n = 3$ case, except that $B$ is $3 \times 3$, $Y$
has the vector $(1, 0, 0, 0)$ for its first row and column, and $W$ is the matrix that
"triangularizes" $B$. It should now be clear how to proceed for general $n$. $\Box$
Example 4.24. Let
$$A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 4 & 0 & 0 \end{bmatrix}.$$
We find that
$$p_A(t) = t(t-2)(t+2)$$
and that $(0, e_2)$ is an eigenpair for $A$. We set
$$S = \{e_2, e_1, e_3\},$$
which coincidentally is already orthonormal. Set
$$V = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
We find that
$$V^*AV = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 4 & 0 \end{bmatrix}.$$
We now triangularize the $2 \times 2$ matrix
$$B = \begin{bmatrix} 0 & 1 \\ 4 & 0 \end{bmatrix}.$$
We already know that the diagonal elements of the final $3 \times 3$ upper triangular
matrix are $0$, $2$, and $-2$. This means that $B$ must have eigenvalues $2$ and $-2$. We
find that
$$u_1 = (-1/\sqrt{5},\ 2/\sqrt{5})$$
is an eigenvector corresponding to $-2$. We now set $S = \{u_1, e_2\}$. Applying Gram-Schmidt we obtain the orthonormal basis
$$(-1/\sqrt{5},\ 2/\sqrt{5}),\ (2/\sqrt{5},\ 1/\sqrt{5}).$$
We set
$$W = \begin{bmatrix} -1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}$$
and
$$Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1/\sqrt{5} & 2/\sqrt{5} \\ 0 & 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}.$$
Finally we find that
$$U = VY = \begin{bmatrix} 0 & -1/\sqrt{5} & 2/\sqrt{5} \\ 1 & 0 & 0 \\ 0 & 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}$$
and that
$$U^*AU = T = \begin{bmatrix} 0 & 2/\sqrt{5} & 1/\sqrt{5} \\ 0 & -2 & 3 \\ 0 & 0 & 2 \end{bmatrix}.$$
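A quick numerical check of this example (the matrix $U$ is the one constructed above):

```python
import numpy as np

# Sketch: verify the unitary triangularization from Example 4.24.
A = np.array([[0., 0., 1.],
              [0., 0., 1.],
              [4., 0., 0.]])
s = np.sqrt(5.0)
U = np.array([[0., -1/s, 2/s],
              [1.,  0.,  0. ],
              [0.,  2/s, 1/s]])

assert np.allclose(U.conj().T @ U, np.eye(3))   # U is unitary
print(np.round(U.conj().T @ A @ U, 10))         # upper triangular with diagonal 0, -2, 2
# A library routine such as scipy.linalg.schur(A, output='complex') produces some
# unitary U and triangular T as well; the eigenvalues may appear on the diagonal
# in a different order.
```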
Problem 4.11. Show that given an $n \times n$ matrix $A$, there exists a unitary
matrix $U$ such that
$$U^*AU = L$$
where $L$ is lower triangular.
Problem 4.12. Find upper triangular matrices unitarily equivalent to the following matrices:
1. $\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$
2. $\begin{bmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ (Use the fact that $(0, e_1)$ is an eigenpair.)
Let $A = (a_{ij})$, $B = (b_{ij})$ be two $n \times n$ complex matrices. We define the
distance from $A$ to $B$, denoted by $d(A, B)$, to be
$$d(A, B) = \|A - B\|_F.$$
We say that a sequence of matrices $(A_k)$ converges to a matrix $A$ if $d(A_k, A) \to 0$
as $k \to \infty$.
Theorem 4.25. Let $A$ be an $n \times n$ matrix. There exist diagonalizable matrices
arbitrarily close to $A$.
Proof: It is enough to find for each integer $k$ a matrix $A_k$ with distinct eigenvalues such that $d(A, A_k) < 1/k$. By Schur's Theorem, we can find a unitary
matrix $U$ and an upper triangular matrix $T$ such that
$$A = UTU^* = U \begin{bmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} U^*.$$
Now define
$$A_k = UT_1U^* = U \begin{bmatrix} \lambda_1' & & * \\ & \ddots & \\ 0 & & \lambda_n' \end{bmatrix} U^*.$$
Here we mean that $T$ and $T_1$ differ only in their diagonal elements. It should
be clear that we can choose the $\lambda_i'$ distinct while making each $|\lambda_i - \lambda_i'|$ small enough
so that $d(A, A_k) < 1/k$. $\Box$
In effect, this amounts to asserting that the set of nondiagonalizable matrices
is "thin" in some sense.
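A numerical sketch of this idea: perturbing the diagonal of a (nondiagonalizable) Jordan block by arbitrarily small distinct amounts yields a nearby diagonalizable matrix. The matrix and tolerance below are chosen only for illustration.

```python
import numpy as np

# Sketch: a 3x3 Jordan block is not diagonalizable, but a tiny diagonal perturbation
# with distinct entries gives a diagonalizable matrix arbitrarily close by.
J = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [0., 0., 2.]])
eps = 1e-8
Jk = J + np.diag([0.0, eps, 2*eps])      # distinct eigenvalues 2, 2 + eps, 2 + 2*eps

w, X = np.linalg.eig(Jk)                 # eigenvector matrix of the perturbed matrix
print(np.linalg.norm(J - Jk, 'fro'))     # distance ~ eps*sqrt(5): arbitrarily small
print(np.linalg.cond(X))                 # X is invertible, so Jk is diagonalizable,
                                         # though increasingly ill conditioned as eps -> 0
```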
Problem 4.13. Recall that the trace of an $n \times n$ matrix $A = [a_{ij}]$ is defined by
$$\mathrm{trace}(A) = a_{11} + a_{22} + \cdots + a_{nn}.$$
Prove that similar matrices have the same trace. Let $A$ have eigenvalues
$\lambda_1, \ldots, \lambda_n$ counting multiplicity. Show that
$$\mathrm{trace}(A) = \lambda_1 + \cdots + \lambda_n.$$
(Comment: note that the characteristic polynomial of $A$, $p_A$, can be written
$p_A(t) = (t - \lambda_1)\cdots(t - \lambda_n)$. So the coefficient of $t^{n-1}$ in the characteristic polynomial is $-\mathrm{trace}(A)$.)
Problem 4.14. Show that if $A$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ counting multiplicity, then $\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n$ and the constant term in the characteristic
polynomial $p_A(t)$ is $(-1)^n \det(A)$.
4.5 Hermitian and other Normal Matrices
Prerequisites:
Schur decomposition
Matrix algebra
Learning Objectives:
Familiarity with Hermitian, skew-Hermitian, and positive definite matrices.
Familiarity with the characterization of unitarily diagonalizable matrices.
Understanding the relationship of the SVD of $A$ with the unitary diagonalization of $A^*A$.
The Schur decomposition reveals structure for many matrices that have some
sort of symmetry or other features that can be described using both the matrix
and its transpose.
4.5.1 Hermitian matrices
An $n \times n$ matrix $H$ is called Hermitian or self-adjoint if $H = H^*$. If $H$ is
Hermitian and has only real entries, then $H = H^t$, and we say that $H$ is
symmetric.
We say that an $n \times n$ matrix $A$ is unitarily diagonalizable if there exists a
unitary matrix $U$ and a diagonal matrix $D$ such that $U^*AU = D$.
As another application of Schur's Theorem we have the following important
result.
Theorem 4.26. Let $H$ be Hermitian. Then $H$ is unitarily diagonalizable and
has only real eigenvalues.
Proof: By Schur's Theorem, there exists a unitary matrix $U$ and an upper
triangular matrix $T$ such that
$$T = U^*HU.$$
Taking the adjoint of both sides of this equation and using the fact that $H$ is
Hermitian, we obtain
$$T^* = (U^*HU)^* = U^*H^*U = U^*HU = T.$$
Thus $T = T^*$. Since $T$ is upper triangular and $T^*$ is lower triangular, we see
that $T$ is diagonal with only real entries on the main diagonal. Since these
diagonal entries are precisely the eigenvalues of $H$, the theorem is proved. $\Box$
Theorem 4.27. Let $H$ be a Hermitian matrix. Eigenvectors belonging to different eigenspaces are orthogonal.
Proof: Suppose that $(\lambda, u)$ and $(\mu, v)$ are eigenpairs for $H$ with $\lambda \neq \mu$. Then
$$\langle Hu, v\rangle = \lambda \langle u, v\rangle$$
and
$$\langle u, Hv\rangle = \bar{\mu} \langle u, v\rangle.$$
But since $H$ is Hermitian,
$$\langle u, Hv\rangle = \langle u, H^*v\rangle = \langle Hu, v\rangle.$$
Hence $\lambda\langle u, v\rangle = \bar{\mu}\langle u, v\rangle$. This implies that
$$\lambda\langle u, v\rangle = \bar{\mu}\langle u, v\rangle = \mu\langle u, v\rangle,$$
since $\mu$ is real by Theorem 4.26. Since $\lambda \neq \mu$, it must be that $\langle u, v\rangle = 0$. The
theorem is proved. $\Box$
As a consequence of the previous two theorems, we can unitarily
diagonalize the Hermitian matrix $H$ as follows. Suppose $\lambda_1, \ldots, \lambda_N$ are the
distinct eigenvalues of $H$. Find a basis for each $\lambda_i$-eigenspace and orthonormalize
it via the Gram-Schmidt process. By Theorem 4.27, distinct eigenspaces are
orthogonal, and so we have produced $n$ orthonormal eigenvectors, which we use
for the columns of $U$.
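This recipe is essentially what a Hermitian eigensolver returns. A small sketch using numpy.linalg.eigh (the random Hermitian matrix is only an illustration):

```python
import numpy as np

# Sketch: unitary diagonalization of a Hermitian matrix, following the recipe above.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = B + B.conj().T                                    # H is Hermitian by construction

w, U = np.linalg.eigh(H)
assert np.all(np.isreal(w))                           # eigenvalues are real (Theorem 4.26)
assert np.allclose(U.conj().T @ U, np.eye(4))         # orthonormal eigenvectors (Theorem 4.27)
assert np.allclose(U.conj().T @ H @ U, np.diag(w))    # U* H U = D
```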
Problem 4.15. Find a unitary matrix $U$ that diagonalizes
1. $H = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$
2. H = 1 2 i 1 +3 i
3. $H = \begin{bmatrix} 3 & 1 & 0 & 0 & 0 \\ 1 & 3 & 0 & 0 & 0 \\ 0 & 0 & 2 & 1 & 1 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 1 & 2 \end{bmatrix}$ (Note that $p_A(t) = (t-4)^2(t-1)^2(t-2)$.)
4.5.2 Normal Matrices
Which matrices are unitarily diagonalizable?
Theorem 4.28. An $n \times n$ matrix $N$ is unitarily diagonalizable if and only if
$$N^*N = NN^*.$$
Such matrices are called normal.
Proof: If $N$ is unitarily diagonalizable then $N = UDU^*$ for some unitary
matrix $U$ and diagonal matrix $D$. Since $D^*D = DD^*$, $N$ must be normal.
Conversely, suppose $N$ is a normal matrix and let $N = UTU^*$ be the Schur
decomposition of $N$. $N^*N = NN^*$ implies that $T^*T = TT^*$. Now, calculate
the $(k,k)$ diagonal entry on each side of this equality:
$$\sum_{j=1}^{k} |t_{jk}|^2 = \sum_{j=k}^{n} |t_{kj}|^2.$$
For $k = 1$:
$$|t_{11}|^2 = |t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2$$
$$0 = |t_{12}|^2 + \cdots + |t_{1n}|^2,$$
which implies $t_{12} = \cdots = t_{1n} = 0$. For $k = 2$:
$$|t_{12}|^2 + |t_{22}|^2 = |t_{22}|^2 + |t_{23}|^2 + \cdots + |t_{2n}|^2$$
$$0 = |t_{23}|^2 + \cdots + |t_{2n}|^2,$$
which implies $t_{23} = t_{24} = \cdots = t_{2n} = 0$. For $k = 3$:
$$|t_{13}|^2 + |t_{23}|^2 + |t_{33}|^2 = |t_{33}|^2 + |t_{34}|^2 + \cdots + |t_{3n}|^2$$
$$0 = |t_{34}|^2 + \cdots + |t_{3n}|^2,$$
which implies $t_{34} = t_{35} = \cdots = t_{3n} = 0$. Thus the first three rows of $T$ have
only zero off-diagonal entries. One can continue in this way to find ultimately
that all rows have only zero off-diagonal entries, so that $T$ is, in fact, a diagonal
matrix. $\Box$
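A numerical sketch of Theorem 4.28: for a normal but non-Hermitian matrix (the $2 \times 2$ example below is hypothetical), the complex Schur form comes out diagonal.

```python
import numpy as np
from scipy.linalg import schur

# Sketch: the complex Schur form of a normal matrix is diagonal.
N = np.array([[1., -2.],
              [2.,  1.]])                        # eigenvalues 1 +/- 2i
assert np.allclose(N.conj().T @ N, N @ N.conj().T)   # N is normal

T, U = schur(N.astype(complex), output='complex')
print(np.round(T, 10))                           # off-diagonal entries are (numerically) zero
```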
Problem 4.16. A matrix $A$ is called skew-Hermitian if $A^* = -A$. Show that
skew-Hermitian matrices are normal. Show that if $A$ is skew-Hermitian then all
the entries on the main diagonal are pure imaginary and that all eigenvalues of
$A$ are pure imaginary.
Problem 4.17. Show that a unitary matrix is unitarily diagonalizable.
4.5.3 Positive Definite Matrices
An $n \times n$ Hermitian matrix $H$ is called positive definite if
$$x^*Hx > 0$$
for all nonzero $x$.
Theorem 4.29. A Hermitian matrix $H$ is positive definite if and only if all its
eigenvalues are positive.
Proof: Suppose $H$ is positive definite and $(\lambda, x)$ is an eigenpair. Then if $x$
is nonzero,
$$0 < x^*Hx = \lambda\, x^*x.$$
Since $x^*x > 0$, we have that $\lambda > 0$.
Conversely, assume that all the eigenvalues of $H$ are positive. Then there
exists $U$ unitary and $D$ diagonal such that
$$HU = UD.$$
Given a nonzero $x$ in $\mathbb{C}^n$, then $y = U^*x$ is nonzero and satisfies $Uy = x$. Let
$y = (y_1, \ldots, y_n)$. We have
$$\langle Hx, x\rangle = \langle HUy, Uy\rangle = \langle UDy, Uy\rangle = \langle U^*UDy, y\rangle = \langle Dy, y\rangle
= \lambda_1|y_1|^2 + \cdots + \lambda_n|y_n|^2 > 0.$$
The last inequality follows since all the $\lambda_i$ are positive and at least one of the
$y_i$ is nonzero. $\Box$
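In practice one checks positive definiteness numerically either through the eigenvalues (Theorem 4.29) or by attempting a Cholesky factorization; a sketch with an illustrative matrix:

```python
import numpy as np

# Sketch: two equivalent checks of positive definiteness for a Hermitian H.
H = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

print(np.linalg.eigvalsh(H))     # all eigenvalues positive => H is positive definite
np.linalg.cholesky(H)            # succeeds exactly when H is positive definite
                                 # (raises LinAlgError otherwise)
```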
Let $\{p_1, p_2, \ldots, p_r\}$ be a set of vectors in $\mathbb{C}^n$. The Gram matrix of $\{p_1, p_2, \ldots, p_r\}$
is the $r \times r$ matrix defined by $G = [p_i^* p_j]$.
Theorem 4.30. The Gram matrix $G = [p_i^* p_j]$ of a set of vectors $\{p_1, p_2, \ldots, p_r\}$
is positive definite if and only if the vectors are linearly independent.
Proof: If $p = \sum_{j=1}^{r} x_j p_j$, then
$$\|p\|^2 = \Big\langle \sum_{i=1}^{r} x_i p_i,\ \sum_{j=1}^{r} x_j p_j \Big\rangle = x^* G x.$$
Thus if $\{p_1, p_2, \ldots, p_r\}$ is linearly dependent, so that there is a nontrivial linear
combination that yields $p = 0$, then $G$ cannot be positive definite. The converse
just reverses the argument and is left as an exercise. $\Box$
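A small numerical illustration of Theorem 4.30 (the vectors below are hypothetical):

```python
import numpy as np

# Sketch: the Gram matrix of independent columns is positive definite; repeating
# a column makes the set dependent and the Gram matrix only positive semidefinite.
P = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.],
              [0., 0., 1.]])              # columns p_1, p_2, p_3 are independent
G = P.conj().T @ P                        # Gram matrix G = [p_i^* p_j]
print(np.linalg.eigvalsh(G))              # all positive

P_dep = np.column_stack([P, P[:, 0]])     # append a repeated column: dependent set
G_dep = P_dep.conj().T @ P_dep
print(np.linalg.eigvalsh(G_dep))          # smallest eigenvalue is (numerically) zero
```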
As another application where positive definite matrices arise, consider the
real-valued function $f$ defined on $\Omega$, a domain in $\mathbb{R}^2$. Suppose that the function
is differentiable with continuous second and third partials on $\Omega$. The Taylor
series for such a function at a point $(x_0, y_0) \in \Omega$ is given by
$$f(x_0+h,\, y_0+k) = f(x_0,y_0) + \frac{\partial f}{\partial x}\,h + \frac{\partial f}{\partial y}\,k
+ \frac{1}{2}\left(\frac{\partial^2 f}{\partial x^2}\,h^2 + 2\,\frac{\partial^2 f}{\partial x\,\partial y}\,hk + \frac{\partial^2 f}{\partial y^2}\,k^2\right) + R(x_0,y_0,h,k)$$
$$= f(x_0,y_0) + \nabla f \begin{bmatrix} h \\ k \end{bmatrix}
+ \frac{1}{2}\left\langle \nabla^2 f \begin{bmatrix} h \\ k \end{bmatrix}, \begin{bmatrix} h \\ k \end{bmatrix}\right\rangle + R(x_0,y_0,h,k),$$
where all partial derivatives are evaluated at $(x_0,y_0)$. Here
$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{bmatrix},
\qquad
\nabla^2 f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[4pt]
\dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix},$$
and $R$ is a remainder term such that $R/(h^2+k^2) \to 0$ as $(h,k) \to (0,0)$. The
$1 \times 2$ matrix $\nabla f$ is called the derivative of $f$ at $(x_0,y_0)$ and the $2 \times 2$ Hermitian
matrix $\nabla^2 f$ is called the Hessian of $f$ at $(x_0,y_0)$.
Recall that $(x_0,y_0)$ is called a critical point of $f$ if $\nabla f = 0$. The critical point
$(x_0,y_0)$ is called a strict local minimum for $f$ if
$$f(x_0+h,\, y_0+k) > f(x_0,y_0)$$
for all $(h,k)$ sufficiently close to the origin. Since $\nabla f = 0$ and $R$ is negligible,
we see that the critical point $(x_0,y_0)$ is a strict local minimum if (but not only if)
$$\left\langle \nabla^2 f \begin{bmatrix} h \\ k \end{bmatrix}, \begin{bmatrix} h \\ k \end{bmatrix}\right\rangle > 0 \tag{4.13}$$
for all $(h,k)$ sufficiently close to the origin. Writing $H = \nabla^2 f$, since for all real $\gamma$
$$\left\langle H \begin{bmatrix} \gamma h \\ \gamma k \end{bmatrix}, \begin{bmatrix} \gamma h \\ \gamma k \end{bmatrix}\right\rangle
= \gamma^2 \left\langle H \begin{bmatrix} h \\ k \end{bmatrix}, \begin{bmatrix} h \\ k \end{bmatrix}\right\rangle,$$
we see that $(x_0,y_0)$ is a strict local minimum if (4.13) holds for all nonzero $(h,k) \in \mathbb{R}^2$. Thus
the critical point $(x_0,y_0)$ is a strict local minimum if $H$ is positive definite.
Similarly, the critical point $(x_0,y_0)$ is a strict local maximum if $-H$ is positive
definite. We call such a matrix $H$ negative definite.
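As a sketch of the second-derivative test just described, take the hypothetical function $f(x,y) = x^2 + xy + 2y^2$, whose only critical point is the origin:

```python
import numpy as np

# Sketch: second-derivative test via the Hessian for f(x, y) = x**2 + x*y + 2*y**2,
# whose only critical point is (x0, y0) = (0, 0).  For this f the Hessian is constant:
# [[f_xx, f_xy], [f_xy, f_yy]].
H = np.array([[2., 1.],
              [1., 4.]])

print(np.linalg.eigvalsh(H))   # both eigenvalues positive, so H is positive definite
                               # and (0, 0) is a strict local minimum of f
```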
Problem 4.18. Show that the Hessian of $f$ at $(x_0, y_0)$ is positive definite if and
only if at the point $(x_0, y_0)$,
$$\frac{\partial^2 f}{\partial x^2} > 0 \quad\text{and}\quad
\left(\frac{\partial^2 f}{\partial x^2}\right)\left(\frac{\partial^2 f}{\partial y^2}\right) - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 > 0.$$
4.5.4 Revisiting the Singular Value Decomposition
We learned earlier in this section that a normal matrix is unitarily equivalent to
a diagonal matrix, or to reinterpret in terms of matrix representations: "Any
linear transformation representable by a normal matrix can be represented by
a diagonal matrix after an appropriate choice of an orthonormal basis for $\mathbb{C}^n$."
This is reminiscent of a more general matrix representation discussed at the end
of Chapter 3 known as the singular value decomposition.
We provide here a second derivation of the singular value decomposition
using the Schur decomposition.
Theorem 4.31. Suppose $A \in \mathbb{C}^{m \times n}$ and $\mathrm{rank}(A) = r$. There exist unitary
matrices $U \in \mathbb{C}^{m \times m}$ and $V \in \mathbb{C}^{n \times n}$ so that
$$A = U\Sigma V^* \tag{4.14}$$
where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots) \in \mathbb{C}^{m \times n}$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ and $\sigma_{r+1} =
\cdots = \sigma_p = 0$ for $p = \min\{m, n\}$.
Proof: Suppose first that $m \ge n$. The $n \times n$ matrix $A^*A$ is Hermitian and
positive semidefinite (i.e., $\langle A^*Ax, x\rangle \ge 0$ for all $x$) with $\mathrm{rank}(A^*A) = r$, and so
$A^*A$ has exactly $r$ nonzero eigenvalues, all of which are positive and so may be
written as squares of numbers $\sigma_i \ge 0$:
$$\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_r^2 > 0, \qquad \sigma_{r+1}^2 = \cdots = \sigma_n^2 = 0.$$
Likewise there exists a unitary $n \times n$ matrix $V$ that diagonalizes $A^*A$,
$$V^*(A^*A)V = \mathrm{diag}(\sigma_i^2) = \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix}.$$
Since the off-diagonal terms of $V^*(A^*A)V = (AV)^*(AV)$ are zero, the
columns of $AV$ are mutually orthogonal. Likewise, the norm of the $i$-th column
is $\sigma_i$. Thus there is an $m \times n$ matrix $U_n$ so that $U_n^*U_n = I$ and
$$U_n \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} = AV.$$
Now augment the columns of $U_n$ to fill out an orthonormal basis of $\mathbb{C}^m$:
$U = [U_n \ \ \hat{U}_n]$. Then defining
$$\Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix},$$
we find
$$U\Sigma = U_n \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} = AV$$
and $A = U\Sigma V^*$. We have decomposed $A$ into the product of an $m \times m$ unitary
matrix, a nonnegative diagonal matrix, and an $n \times n$ unitary matrix. The case
$m < n$ can be handled easily by considering $A^*$ instead in the derivation above
and interchanging the roles of $U$ and $V$. $\Box$
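The construction in the proof translates directly into a short computation; the sketch below (for a hypothetical $3 \times 2$ full-rank matrix, i.e. the case $m \ge n$) recovers the singular values and the thin factorization from the unitary diagonalization of $A^*A$.

```python
import numpy as np

# Sketch: build an SVD of A from the eigendecomposition of A*A, mirroring the proof.
A = np.array([[1., 0.],
              [1., 1.],
              [0., 1.]])                      # m = 3, n = 2, rank r = 2

w, V = np.linalg.eigh(A.conj().T @ A)         # A*A = V diag(sigma_i^2) V*
order = np.argsort(w)[::-1]                   # decreasing order, as in the theorem
w, V = w[order], V[:, order]
sigma = np.sqrt(np.maximum(w, 0.0))           # singular values sigma_i

Un = (A @ V) / sigma                          # normalize the (orthogonal) columns of AV
assert np.allclose(Un.conj().T @ Un, np.eye(2))
assert np.allclose(Un @ np.diag(sigma) @ V.conj().T, A)   # thin SVD; augmenting Un to a
                                                          # full unitary U only adds columns
                                                          # that multiply zero rows of Sigma
print(sigma, np.linalg.svd(A, compute_uv=False))          # matches a library SVD
```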