Linear Algebra Notes
1 Linear Equations

1.1 Introduction to Linear Systems
Definition 1. A system of linear equations is consistent if it has at least one solution. If it has no solutions
it is inconsistent.
1.2 Matrices, Vectors, and Gauss-Jordan Elimination
Definition 2. Given a system of linear equations
x − 3y = 8
4x + 9y = −2,
the coefficient matrix of the system contains the coefficients of the system:
\[ \begin{pmatrix} 1 & -3 \\ 4 & 9 \end{pmatrix}, \]
while the augmented matrix of the system contains the numbers to the right of the equals sign as well:
\[ \left(\begin{array}{cc|c} 1 & -3 & 8 \\ 4 & 9 & -2 \end{array}\right). \]
Definition 3. A matrix is in reduced row-echelon form (RREF) if it satisfies all of the following
conditions:
a) If a row has nonzero entries, then the first nonzero entry is a 1, called a leading 1.
b) If a column contains a leading 1, then all the other entries in that column are 0.
c) If a row contains a leading 1, then each row above it contains a leading 1 further to the left.
Definition 4. An elementary row operation (ERO) is one of the following types of operations:
a) Row swap: swap two rows.
b) Row division: divide a row by a (nonzero) scalar.
c) Row addition: add a multiple of a row to another row.
Theorem 5. For any matrix A, there is a unique matrix rref(A) in RREF which can be obtained by applying
a sequence of ERO’s to A.
Procedure 6. Gauss-Jordan elimination (GJE) is a procedure for putting a matrix A in RREF by
applying a sequence of ERO’s. It can be applied to the augmented matrix of a system of linear equations to
solve the system.
Imagine a cursor moving through the entries of the augmented matrix, starting in the upper-left corner.
The procedure ends when the cursor leaves the matrix. Note that steps 2, 3, and 4 consist of elementary
row operations.
1. If the cursor column contains all 0’s, move the cursor to the right and repeat step 1.
2. If the cursor entry is 0, swap the cursor row with a lower row so that the cursor entry becomes nonzero.
3. If the cursor entry is not 1, divide the cursor row by it to make it 1.
4. If the other entries in the cursor column are nonzero, make them 0 by adding the appropriate multiples
of the cursor row to the other rows.
5. Move the cursor down and to the right and go to step 1.
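The procedure translates into a short program. The following is a minimal Python/NumPy sketch (the helper name rref and the tolerance are our own choices, not part of the notes); a pivot row plays the role of the cursor.

```python
import numpy as np

def rref(M, tol=1e-12):
    """Gauss-Jordan elimination: return the reduced row-echelon form of M."""
    A = np.array(M, dtype=float)
    rows, cols = A.shape
    pivot_row = 0                                    # the "cursor" row
    for col in range(cols):                          # the cursor column sweeps left to right
        # Steps 1-2: find a row at or below the cursor with a nonzero entry in this column.
        nonzero = np.flatnonzero(np.abs(A[pivot_row:, col]) > tol)
        if nonzero.size == 0:
            continue                                 # column of zeros below the cursor: move right
        swap = pivot_row + nonzero[0]
        A[[pivot_row, swap]] = A[[swap, pivot_row]]  # row swap
        A[pivot_row] /= A[pivot_row, col]            # step 3: create a leading 1
        for r in range(rows):                        # step 4: clear the rest of the column
            if r != pivot_row:
                A[r] -= A[r, col] * A[pivot_row]
        pivot_row += 1                               # step 5: move the cursor down (and right)
        if pivot_row == rows:
            break
    return A

# Augmented matrix of the system in Definition 2: x - 3y = 8, 4x + 9y = -2.
print(rref([[1, -3, 8], [4, 9, -2]]))   # approximately [[1, 0, 22/7], [0, 1, -34/21]]
```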
1.3 On the Solutions of Linear Systems; Matrix Algebra
Definition 7. If a column of the RREF of a coefficient matrix contains a leading 1, the corresponding
variable of the linear system is called a leading variable. If a column does not contain a leading 1, the
corresponding variable is called a free variable.
Definition 8. A row of the form [ 0 0 · · · 0 | 1 ] in the RREF of an augmented matrix is called an
inconsistent row, since it signifies the inconsistent equation 0 = 1.
Theorem 9. The number of solutions of a linear system with augmented matrix [ A | b ] can be read from
rref[ A | b ]:

  rref[ A | b ]            no free variables    free variable(s)
  no inconsistent rows     1                    ∞
  inconsistent row         0                    0
Definition 10 (1.3.2). The rank of a matrix A, written rank(A), is the number of leading 1’s in rref(A).
Theorem 11. For an n × m matrix A, we have rank(A) ≤ n and rank(A) ≤ m.
Proof. Each of the n rows contains at most one leading 1, as does each of the m columns.
Theorem 12 (1.3.4, uniqueness of solution with n equations and n variables). A linear system of
n equations in n variables has a unique solution if and only if the rank of its coefficient matrix A is n, in
which case
\[ \operatorname{rref}(A) = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}. \]
Proof. A system Ax = b has a unique solution if and only if it has no free variables and rref A | b has
no inconsistent rows. This happens exactly when each row and each column of rref(A) contain leading 1’s,
which is equivalent to rank(A) = n. The matrix above is the only n × n matrix in RREF with rank n.
Notation 13. We denote the ijth entry of a matrix A by either aij or Aij or [A]ij . (The final notation is
convenient when working with a compound matrix such as A + B.)
Definition 14 (1.3.5). If A and B are n × m matrices, then their sum A + B is defined entry-by-entry:
[A + B]ij = Aij + Bij .
If k is a scalar, then the scalar multiple kA of A is also defined entry-by-entry:
[kA]ij = kAij .
Definition 15 (1.3.7, Row definition of matrix-vector multiplication). The product Ax of an n × m
matrix A and a vector x ∈ Rm is given by
\[ Ax = \begin{pmatrix} - & w_1 & - \\ & \vdots & \\ - & w_n & - \end{pmatrix} x = \begin{pmatrix} w_1 \cdot x \\ \vdots \\ w_n \cdot x \end{pmatrix}, \]
where w1 , . . . , wn are the rows of A.
Notation 16. We denote the ith component of a vector x by either xi or [x]i . (The latter notation is
convenient when working with a compound vector such as Ax.)
Theorem 17 (1.3.8, Column definition of matrix-vector multiplication). The product Ax of an n×m
matrix A and a vector x ∈ Rm is given by
\[ Ax = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix} = x_1 v_1 + \cdots + x_m v_m, \]
where v1 , . . . , vm are the columns of A.
Proof. According to the row definition of matrix-vector multiplication, the ith component of Ax is
[Ax]i = wi · x
= ai1 x1 + · · · + aim xm
= x1 [v1 ]i + · · · + xm [vm ]i
= [x1 v1 ]i + · · · + [xm vm ]i
= [x1 v1 + · · · + xm vm ]i .
Since Ax and x1 v1 + · · · + xm vm have equal ith components for all i, they are equal vectors.
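A quick numerical check that the row and column descriptions of Ax agree (a NumPy sketch; the particular matrix and vector are made up for illustration):

```python
import numpy as np

A = np.array([[1., -3.], [4., 9.], [2., 0.]])   # a 3 x 2 matrix with rows w_i and columns v_j
x = np.array([5., -1.])

row_def = np.array([A[i] @ x for i in range(A.shape[0])])   # ith component is w_i . x
col_def = sum(x[j] * A[:, j] for j in range(A.shape[1]))    # x_1 v_1 + ... + x_m v_m

print(row_def, col_def, A @ x)    # all three give [ 8. 11. 10.]
```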
Definition 18 (1.3.9). A vector b ∈ Rn is called a linear combination of the vectors v1 , . . . , vm in Rn if
there exist scalars x1 , . . . , xm such that
b = x1 v 1 + · · · + xm v m .
Note that Ax is a linear combination of the columns of A. By convention, 0 is considered to be the unique
linear combination of the empty set of vectors.
Theorem 19 (1.3.10, properties of matrix-vector multiplication). If A, B are n × m matrices, x, y ∈
Rm , and k is a scalar, then
a) A(x + y) = Ax + Ay,
b) (A + B)x = Ax + Bx,
c) A(kx) = k(Ax).
Proof. Let wi and ui be the ith rows of A and B, respectively. We show that the ith components of each
side are equal.
[A(x + y)]i = wi · (x + y) = wi · x + wi · y = [Ax]i + [Ay]i = [Ax + Ay]i ,
[(A + B)x]i = (ith row of A + B) · x = (wi + ui ) · x = wi · x + ui · x = [Ax]i + [Bx]i = [Ax + Bx]i ,
[A(kx)]i = wi · (kx) = k(wi · x) = k[Ax]i = [k(Ax)]i .
Definition 20 (1.3.11). A linear system with augmented matrix A | b can be written in matrix form as
Ax = b.
2 Linear Transformations

2.1 Introduction to Linear Transformations and Their Inverses
Definition 21 (2.1.1). A function T : Rm → Rn is called a linear transformation if there exists an n × m
matrix A such that
T (x) = Ax
for all vectors x in Rm .
Note 22. If T : R2 → R2 , T (x) = y, with $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$, $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, and $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$, then the linear transformation
above can be written as
\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \]
or
\[ \begin{cases} y_1 = a_{11} x_1 + a_{12} x_2 \\ y_2 = a_{21} x_1 + a_{22} x_2 . \end{cases} \]
Definition 23. The identity matrix of size n is
\[ I_n = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \]
and T (x) = In x = x is the identity transformation from Rn to Rn . If the value of n is understood,
then we often write just I for In .
Definition 24. The standard (basis) vectors e1 , e2 , . . . , em in Rm are the vectors
\[ e_i = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \]
with a 1 in the ith place and 0's elsewhere. Note that for a matrix A with m columns, Aei is the ith column of A.
Theorem 25 (2.1.2, matrix of a linear transformation). For a linear transformation T : Rm → Rn ,
there is a unique matrix A such that T (x) = Ax, obtained by applying T to the standard basis vectors:
\[ A = \begin{pmatrix} | & | & & | \\ T(e_1) & T(e_2) & \cdots & T(e_m) \\ | & | & & | \end{pmatrix}. \]
It follows that if two n × m matrices A and B satisfy Ax = Bx for all x ∈ Rm , then A = B.
Proof. By the definition of matrix-vector multiplication, the ith column of A is Aei = T (ei ). For the second
statement, note that if Ax = Bx for all x ∈ Rm , then A and B define the same linear transformation T , so
they must be the same matrix by the first part of the theorem.
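For a concrete transformation, A can be assembled column by column from the images of the standard basis vectors, as in the theorem. A small NumPy sketch (the map T used here, a rotation by 90°, is just an illustrative example):

```python
import numpy as np

def T(x):
    """An example linear map R^2 -> R^2: counterclockwise rotation by 90 degrees."""
    return np.array([-x[1], x[0]])

e1, e2 = np.array([1., 0.]), np.array([0., 1.])
A = np.column_stack([T(e1), T(e2)])     # the columns of A are T(e1) and T(e2)
print(A)                                # [[ 0. -1.]
                                        #  [ 1.  0.]]
v = np.array([3., 4.])
print(np.allclose(A @ v, T(v)))         # True: Av = T(v)
```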
Theorem 26 (2.1.3, linearity criterion). A function T : Rm → Rn is a linear transformation if and
only if
a) T (v + w) = T (v) + T (w), for all vectors v and w in Rm , and
b) T (kv) = kT (v), for all vectors v in Rm and all scalars k.
Proof. Suppose T is a linear transformation, and let A be a matrix such that T (x) = Ax for all x ∈ Rm .
Then
T (v + w) = A(v + w) = Av + Aw = T (v) + T (w),
T (kv) = A(kv) = k(Av) = kT (v).
To prove the converse, suppose that a function T : Rm → Rn satisfies (a) and (b). Then for all x ∈ Rm ,
\[ T(x) = T(x_1 e_1 + x_2 e_2 + \cdots + x_m e_m) = T(x_1 e_1) + \cdots + T(x_m e_m) = x_1 T(e_1) + \cdots + x_m T(e_m) = \begin{pmatrix} | & & | \\ T(e_1) & \cdots & T(e_m) \\ | & & | \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}, \]
so T is a linear transformation.
2.2 Linear Transformations in Geometry
Definition 27. The linear transformation from R2 to R2 represented by a matrix of the form
\[ A = \begin{pmatrix} k & 0 \\ 0 & k \end{pmatrix} \]
is called scaling by (a factor of) k.
Definition 28 (2.2.1). Given a line L through the origin in R2 parallel to the vector w, the orthogonal
projection onto L is the linear transformation
\[ \operatorname{proj}_L(x) = \frac{x \cdot w}{w \cdot w}\, w, \]
with matrix
\[ \frac{1}{w_1^2 + w_2^2} \begin{pmatrix} w_1^2 & w_1 w_2 \\ w_1 w_2 & w_2^2 \end{pmatrix}. \]
The projections onto the x- and y-axes are represented by the matrices
\[ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \]
respectively.
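A NumPy sketch of this projection matrix (the direction vector w is an arbitrary choice for illustration); note that the matrix above equals (1/(w · w)) w wᵀ:

```python
import numpy as np

w = np.array([3., 4.])                  # direction vector of the line L
P = np.outer(w, w) / (w @ w)            # (1/(w1^2 + w2^2)) [[w1^2, w1 w2], [w1 w2, w2^2]]
print(P)                                # [[0.36 0.48]
                                        #  [0.48 0.64]]
x = np.array([2., 1.])
print(P @ x)                            # proj_L(x) = [1.2 1.6]
print(np.allclose(P @ P, P))            # True: projecting twice changes nothing
```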
Definition 29 (2.2.2). Given a line L through the origin in R2 parallel to the vector w, the reflection
about L is the linear transformation
\[ \operatorname{ref}_L(x) = 2\operatorname{proj}_L(x) - x = 2\,\frac{x \cdot w}{w \cdot w}\, w - x, \]
with matrix
\[ \frac{1}{w_1^2 + w_2^2} \begin{pmatrix} w_1^2 - w_2^2 & 2 w_1 w_2 \\ 2 w_1 w_2 & w_2^2 - w_1^2 \end{pmatrix}. \]
The reflections about the x- and y-axes are represented by the matrices
\[ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \]
respectively.
Definition 30 (2.2.3). The linear transformation from R2 to R2 represented by a matrix of the form
\[ A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]
is called counterclockwise rotation through an angle θ (about the origin).
Definition 31 (2.2.5). The linear transformation from R2 to R2 represented by a matrix of the form
\[ A = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} \quad\text{or}\quad A = \begin{pmatrix} 1 & 0 \\ k & 1 \end{pmatrix} \]
is called a horizontal shear or a vertical shear, respectively.
2.3 Matrix Products
Theorem 32. If T : Rm → Rp and S : Rp → Rn are linear transformations, then their composition
S ◦ T : Rm → Rn given by
(S ◦ T )(x) = S(T (x))
is also a linear transformation.
Proof. We show that if T and S satisfy the linearity criteria, then so does S ◦ T . Let v, w ∈ Rm and k ∈ R.
Then
(S ◦ T )(v + w) = S(T (v + w)) = S(T (v) + T (w)) = S(T (v)) + S(T (w)) = (S ◦ T )(v) + (S ◦ T )(w),
(S ◦ T )(kv) = S(T (kv)) = S(kT (v)) = k(S(T (v))) = k(S ◦ T )(v).
Definition 33 (2.3.1, matrix multiplication from composition of linear transformations). If B is
an n × p matrix and A is a q × m matrix, then the product matrix BA is defined if and only if p = q, in
which case it is the matrix of the linear transformation T (x) = B(A(x)). As a result, (BA)x = B(A(x)).
Theorem 34 (2.3.2, matrix multiplication using columns of matrix on the right). If B is an n × p
matrix and A is a p × m matrix with columns v1 , . . . , vm , then
\[ BA = B \begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_m \\ | & | & & | \end{pmatrix} = \begin{pmatrix} | & | & & | \\ Bv_1 & Bv_2 & \cdots & Bv_m \\ | & | & & | \end{pmatrix}. \]
Proof. The ith column of BA is (BA)ei = B(Aei ) = Bvi .
Theorem 35 (2.3.4, matrix multiplication entry-by-entry). If B is an n × p matrix with rows w1 , . . . , wn and A is a p × m matrix with columns v1 , . . . , vm , then the ijth entry of
\[ BA = \begin{pmatrix} - & w_1 & - \\ & \vdots & \\ - & w_n & - \end{pmatrix} \begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix} \]
is the dot product of the ith row of B with the jth column of A:
\[ [BA]_{ij} = w_i \cdot v_j = b_{i1} a_{1j} + b_{i2} a_{2j} + \cdots + b_{ip} a_{pj} = \sum_{k=1}^{p} b_{ik} a_{kj}. \]
Proof. The ijth entry of BA is the ith component of Bvj , by Theorem 34, which equals wi ·vj , by Definition
15.
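The entry-by-entry formula translates directly into a triple loop. A plain Python/NumPy sketch, written only to unpack the formula (any real computation would use an optimized routine such as the built-in matrix product):

```python
import numpy as np

def matmul(B, A):
    """Compute BA entry by entry: [BA]_ij = sum_k b_ik a_kj."""
    n, p = B.shape
    q, m = A.shape
    assert p == q, "inner dimensions must agree"
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            # ith row of B dotted with jth column of A
            C[i, j] = sum(B[i, k] * A[k, j] for k in range(p))
    return C

B = np.array([[1., 2.], [0., 1.], [3., -1.]])   # 3 x 2
A = np.array([[4., 0., 1.], [2., 5., -2.]])     # 2 x 3
print(np.allclose(matmul(B, A), B @ A))         # True
```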
Theorem 36 (2.3.5, identity matrix). For any n × m matrix A,
AIm = A
and
In A = A.
Proof. Since (AIm )x = A(Im x) = Ax for all x ∈ Rm , we have AIm = A. The proof for In A = A is
analogous.
Theorem 37 (2.3.6, multiplication is associative). If AB and BC are defined, then (AB)C = A(BC).
We can simply write ABC to indicate this single matrix.
Proof. Using Definition 33 four times, we get
((AB)C)x = (AB)(Cx) = A(B(Cx)) = A((BC)x) = (A(BC))x
for any x of appropriate dimension, so (AB)C = A(BC).
Theorem 38 (2.3.7, multiplication distributes over addition). If A and B are n × p matrices and C
and D are p × m matrices, then
A(C + D) = AC + AD,
(A + B)C = AC + BC.
Proof. We show that the two sides of the first equation give the same linear transformation, using parts a
and b of Theorem 19. Because of Theorem 37, we can suppress some parentheses:
A(C + D)x = A(Cx + Dx) = ACx + ADx = (AC + AD)x.
Similarly,
(A + B)Cx = ACx + BCx = (AC + BC)x.
Theorem 39 (2.3.8, scalar multiplication). If A is an n × p matrix, B is a p × m matrix, and k is a
scalar, then
k(AB) = (kA)B and k(AB) = A(kB).
Note 40 (2.3.3, multiplication is not commutative). If A is an n × m matrix and B is an m × n matrix, then AB and
BA are both defined, but they are usually not equal. In fact, they do not even have the same dimensions
unless n = m.
2.4 The Inverse of a Linear Transformation
Definition 41. For a function T : X → Y , X is called the domain and Y is called the target.
• A function T : X → Y is called one-to-one if for any y ∈ Y there is at most one input x ∈ X such
that T (x) = y (different inputs give different outputs).
• A function T : X → Y is called onto if for any y ∈ Y there is at least one input x ∈ X such that
T (x) = y (every target element is an output).
• A function T : X → Y is called invertible if for any y ∈ Y there is exactly one x ∈ X such that
T (x) = y. Note that a function is invertible if and only if it is both one-to-one and onto.
Definition 42 (2.4.1). If T : X → Y is invertible, we can define a unique inverse function T −1 : Y → X
by setting T −1 (y) to be the unique x ∈ X such that T (x) = y. It follows that
T −1 (T (x)) = x
and T (T −1 (y)) = y,
so T −1 ◦ T and T ◦ T −1 are identity functions. For any invertible function T , (T −1 )−1 = T , so T −1 is also
invertible, with inverse function T .
Note 43. A linear transformation T : Rm → Rn given by T (x) = Ax is invertible if for any y ∈ Rn , there
is a unique x ∈ Rm such that Ax = y.
Theorem 44 (2.4.2, linearity of the inverse). If a linear transformation T : Rm → Rn is invertible, then
its inverse T −1 is also a linear transformation.
Proof. We show that if T satisfies the linearity criteria (Theorem 26), then T −1 : Rn → Rm does also. Let
v, w ∈ Rn and k ∈ R. Then
v + w = T (T −1 (v)) + T (T −1 (w)) = T (T −1 (v) + T −1 (w)),
and applying T −1 to each side gives
T −1 (v + w) = T −1 (v) + T −1 (w).
Similarly,
kv = kT (T −1 (v)) = T (kT −1 (v)),
and so
T −1 (kv) = kT −1 (v).
Definition 45 (2.4.2). If T (x) = Ax is invertible, then A is said to be an invertible matrix, and the
matrix of T −1 is called the inverse matrix of A, written A−1 .
Theorem 46 (2.4.8, the inverse matrix as multiplicative inverse). A is invertible if and only if there
exists a matrix B such that BA = I and AB = I. In this case, B = A−1 .
Proof. If A is invertible, then, taking B to be A−1 ,
(BA)x = (A−1 A)x = A−1 (Ax) = T −1 (T (x)) = x = Ix
(AB)y = (AA−1 )y = A(A−1 y) = T (T −1 (y)) = y = Iy
for all x, y of correct dimension, so BA = I and AB = I.
Conversely, if we have a matrix B such that BA = I and AB = I, then T (x) = Ax and S(y) = By
satisfy
S(T (x)) = B(Ax) = (BA)x = Ix = x,
T (S(y)) = A(By) = (AB)y = Iy = y,
for any x, y of correct dimension, so S ◦ T and T ◦ S are identity transformations. Thus S is the inverse
transformation of T , A is invertible, and B = A−1 .
Theorem 47 (2.4.3, invertibility criteria). If a matrix is not square, then it is not invertible. For an
n × n matrix A, the following are equivalent:
1. A is invertible,
2. Ax = b has a unique solution x for any b,
3. rref(A) = In ,
4. rank(A) = n.
Proof. Let A be an n × m matrix with inverse A−1 . Since T (x) = Ax maps Rm → Rn , the inverse
transformation T −1 maps Rn → Rm , so A−1 is an m × n matrix. If m > n, then the linear system Ax = 0
has at least one free variable, so it cannot have a unique solution, contradicting the invertibility of A. If
n > m, then A−1 y = 0 has at least one free variable, so it cannot have a unique solution, contradicting the
invertibility of A−1 . It follows that n = m.
The equivalence of the first two statements is a restatement of Note 43. Statements 3 and 4 are equivalent
to the second one by Theorem 12.
Procedure 48 (2.4.5, computing the inverse of a matrix). To find the inverse of a matrix A (if it
exists), compute rref[ A | In ], which is equal to [ rref(A) | B ] for some B.
• If rref(A) ≠ In , then A is not invertible.
• If rref(A) = In , then A is invertible and A−1 = B.
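A sketch of this procedure with SymPy, whose Matrix.rref() carries out Gauss-Jordan elimination exactly (the 2 × 2 matrix is an arbitrary example):

```python
from sympy import Matrix, eye

A = Matrix([[2, 1], [5, 3]])
n = A.shape[0]

R, pivots = A.row_join(eye(n)).rref()   # Gauss-Jordan elimination of [A | I_n]
left, right = R[:, :n], R[:, n:]        # R = [rref(A) | B]

if left == eye(n):
    A_inv = right                       # A is invertible and A^{-1} = B
    print(A_inv)                        # Matrix([[3, -1], [-5, 2]])
    print(A * A_inv == eye(n))          # True
else:
    print("A is not invertible")
```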
Theorem 49 (2.4.9, inverse of a 2 × 2 matrix). A 2 × 2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is invertible if and only if
ad − bc ≠ 0. The scalar ad − bc is called the determinant of A, written det(A). If A is invertible, then
\[ A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]
Theorem 50 (2.4.7, inverse of a product of matrices). If A and B are invertible n × n matrices,
then AB is invertible as well, and
(AB)−1 = B −1 A−1 .
Proof. To show that B −1 A−1 is the inverse of AB, we check that their product in either order is the identity
matrix:
(AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIn A−1 = AA−1 = In ,
(B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 In B = B −1 B = In .
3 Subspaces of Rn and Their Dimensions

3.1 Image and Kernel of a Linear Transformation
Definition 51 (3.1.1). The image of a function T : X → Y is its set of outputs:
im(T ) = {T (x) : x ∈ X},
a subset of the target Y . Note that T is onto if and only if im(T ) = Y .
For a linear transformation T : Rm → Rn , the image is
im(T ) = {T (x) : x ∈ Rm },
a subset of the target Rn .
Definition 52 (3.1.2). The set of all linear combinations of the vectors v1 , . . . , vm in Rn is called their
span:
span(v1 , v2 , . . . , vm ) = {c1 v1 + · · · + cm vm : c1 , . . . , cm ∈ R}.
If span(v1 , v2 , . . . , vm ) = W for some subset W of Rn , we say that the vectors v1 , . . . , vm span W . Thus
span can be used as a noun or as a verb.
Theorem 53 (3.1.3). The image of a linear transformation T (x) = Ax is the span of the column vectors of
A. We denote the image of T by im(T ) or im(A).
Proof. By the column definition of matrix multiplication,
\[ T(x) = Ax = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix} = x_1 v_1 + \cdots + x_m v_m. \]
Thus the image of T consists of all linear combinations of the column vectors v1 , . . . , vm of A.
Note 54. By the preceding theorem, the column vectors v1 , . . . , vm ∈ Rn of an n × m matrix A span Rn if
and only if im(A) = Rn , which is equivalent to T (x) = Ax being onto.
Theorem 55 (3.1.4, the image is a subspace). The image of a linear transformation T : Rm → Rn
has the following properties:
a) contains zero vector: 0 ∈ im(T ).
b) closed under addition: If y1 , y2 ∈ im(T ), then y1 + y2 ∈ im(T ).
c) closed under scalar multiplication: If y ∈ im(T ) and k ∈ R, then ky ∈ im(T ).
As we will see in the next section, these three properties mean that im(T ) is a subspace.
Proof.
a) 0 = A0 = T (0) ∈ im(T ).
b) There exist vectors x1 , x2 ∈ Rm such that T (x1 ) = y1 , T (x2 ) = y2 . Since T is linear, y1 + y2 =
T (x1 ) + T (x2 ) = T (x1 + x2 ) ∈ im(T ).
c) There exists a vector x ∈ Rm such that T (x) = y. Since T is linear, ky = kT (x) = T (kx) ∈ im(T ).
Definition 56 (3.1.1). The kernel of a linear transformation T : Rm → Rn is its set of zeros:
ker(T ) = {x ∈ Rm : T (x) = 0},
a subset of the domain Rm .
Theorem 57 (kernel criterion for one-to-one). A linear transformation T : Rm → Rn is one-to-one if
and only if ker(T ) = {0}.
Proof. Since T (x) = Ax for some A, we have T (0) = A0 = 0. If T is one-to-one, it follows immediately that
0 is the only solution of T (x) = 0, so ker(T ) = {0}.
Conversely, suppose ker(T ) = {0}, and let x1 , x2 ∈ Rm satisfy T (x1 ) = y and T (x2 ) = y. By the
linearity of T ,
T (x1 − x2 ) = T (x1 ) − T (x2 ) = y − y = 0,
so x1 − x2 = 0 and x1 = x2 , proving that T is one-to-one.
Definition 58 (3.2.6). A linear relation among the vectors v1 , . . . , vm ∈ Rn is an equation of the form
c1 v1 + · · · + cm vm = 0
for scalars c1 , . . . , cm ∈ R. If c1 = · · · = cm = 0, the relation is called trivial, while if at least one of the ci
is nonzero, the relation is nontrivial.
Theorem 59. The kernel of a linear transformation T (x) = Ax is the set of solutions x of the equation
Ax = 0, i.e.
\[ \begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix} = 0, \]
which corresponds to the set of linear relations x1 v1 + · · · + xm vm among the column vectors v1 , . . . , vm of
A. We denote the kernel of T by ker(T ) or ker(A).
Proof. The first statement is immediate from the definition of kernel, while the correspondence with linear
relations follows from the column definition of the product Ax.
Theorem 60 (3.1.4, the kernel is a subspace). The kernel of a linear transformation T : Rm → Rn
has the following properties:
a) contains zero vector: 0 ∈ ker(T ).
b) closed under addition: If x1 , x2 ∈ ker(T ), then x1 + x2 ∈ ker(T ).
c) closed under scalar multiplication: If x ∈ ker(T ) and k ∈ R, then kx ∈ ker(T ).
As we will see in the next section, these three properties mean that ker(T ) is a subspace.
Proof.
a) T (0) = A0 = 0.
b) Since T is linear, T (x1 + x2 ) = T (x1 ) + T (x2 ) = 0 + 0 = 0.
c) Since T is linear, T (kx) = kT (x) = k0 = 0.
Theorem 61 (3.1.7). For an n × m matrix A, ker(A) = {0} if and only if rank(A) = m.
Proof. The equation Ax = 0 always has a solution x = 0. This is the only solution if and only if there are
no free variables (by Theorem 9), meaning that all m variables are leading variables, i.e. rank(A) = m.
Theorem 62 (2.4.3, invertibility criteria). For an n × n matrix A, the following are equivalent:
1. A is invertible,
2. Ax = b has a unique solution x for any b,
3. rref(A) = In ,
4. rank(A) = n,
5. im(A) = Rn ,
6. ker(A) = {0}.
Proof. The equivalence of 1-4 was established by Theorem 47.
Statement 5 means that the linear system Ax = b is consistent for any b, which follows immediately
from 2.
We show that 5 implies 4 by proving the contrapositive. Suppose 4 is false, so that rank(A) < n.
Then [ rref(A) | en ] has an inconsistent row, so rref(A)x = en has no solutions. Applying the steps of Gauss-Jordan
elimination on A to the augmented matrix [ rref(A) | en ], but in reverse order, yields an augmented
matrix of the form [ A | b ] for some vector b, so Ax = b must also have no solutions. Thus 5 is false and
the contrapositive is proven.
The equivalence of 4 and 6 follows from the case n = m of the preceding theorem.
3.2 Subspaces of Rn; Bases and Linear Independence
Definition 63 (3.2.1). A subset W of a vector space Rn is called a subspace of Rn if it has the following
three properties:
a) contains zero vector: 0 ∈ W .
b) closed under addition: If w1 , w2 ∈ W , then w1 + w2 ∈ W .
c) closed under scalar multiplication: If w ∈ W and k ∈ R, then kw ∈ W .
Property a is needed only to assure that W is nonempty. If W contains any vector w, then it also contains 0w = 0, by property c. Properties b and c are together equivalent to W being closed under linear
combinations.
Note 64 (3.2.2). We proved in the preceding section that, for a linear transformation T : Rm → Rn , ker(T )
is a subspace of Rm , while im(T ) is a subspace of Rn .
Definition 65 (3.2.3). Let v1 , . . . , vm ∈ Rn .
a) A vector vi in the list v1 , . . . , vm is redundant if it is a linear combination of the preceding vectors
v1 , . . . , vi−1 . Note that v1 is redundant if and only if it equals 0, the unique linear combination of
the empty set of vectors.
b) The vectors v1 , . . . , vm are called linearly independent (LI) if none of them are redundant. Otherwise, they are linearly dependent (LD).
c) The vectors v1 , . . . , vm form a basis of a subspace V of Rn if they span V and are linearly independent.
Theorem 66 (3.2.7, linear dependence criterion). The vectors v1 , . . . , vm ∈ Rn are linearly dependent
if and only if there exists a nontrivial (linear) relation among them.
Proof. Suppose v1 , . . . , vm are linearly dependent and let vi = c1 v1 + · · · + ci−1 vi−1 be a redundant vector
in this list. Then we obtain a nontrivial relation by subtracting vi from both sides:
c1 v1 + · · · + ci−1 vi−1 + (−1)vi = 0.
Conversely, if there is a nontrivial relation c1 v1 + · · · + ci vi + · · · + cm vm = 0, where i is the highest
index such that ci ≠ 0, then we can solve for vi to show that vi is redundant:
\[ v_i = -\frac{c_1}{c_i} v_1 - \cdots - \frac{c_{i-1}}{c_i} v_{i-1}. \]
Thus the vectors v1 , . . . , vm are linearly dependent.
Theorem 67 (3.2.8-9, linear independence criteria). For a list v1 , . . . , vm of vectors in Rn , the following
statements are equivalent:
1. v1 , . . . , vm are linearly independent.
2. None of v1 , . . . , vm are redundant.
3. There are no nontrivial relations among v1 , . . . , vm , i.e.
c1 v 1 + · · · + cm v m = 0
implies
c1 = · · · = cm = 0.
4. $\ker \begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix} = \{0\}$.
5. $\operatorname{rank} \begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix} = m$.
To prove that vectors are linearly independent, statement 3 is useful in an abstract setting, whereas 5 is
convenient when the vectors are given concretely.
Proof. Statement 2 is the definition of 1.
The equivalence of 2 and 3 follows immediately
from the preceding theorem.
There exists a nonzero vector $x = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}$ in the kernel of $\begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix}$ if and only if there is a
corresponding nontrivial relation x1 v1 + · · · + xm vm = 0, so 3 is equivalent to 4.
Theorem 61 implies that 4 and 5 are equivalent.
Note 68. By the equivalence of 1 and 4 in the preceding theorem, and by Theorem 57, the column vectors
v1 , . . . vm ∈ Rn of a matrix A are linearly independent if and only if ker(A) = {0}, which is equivalent to
T (x) = Ax being one-to-one.
Theorem 69 (3.2.10, bases and unique representation). The vectors v1 , . . . , vm form a basis of a
subspace V of Rn if and only if every vector v ∈ V can be expressed uniquely as a linear combination
v = c1 v 1 + · · · + cm v m .
Proof. Suppose v1 , . . . , vm is a basis of V ⊂ Rn and let v be any vector in V . Since v1 , . . . , vm span V , v
can be expressed as a linear combination of v1 , . . . , vm . Suppose there are two such representations
v = c1 v1 + · · · + cm vm ,
v = d1 v1 + · · · + dm vm .
Subtracting the equations yields the linear relation
0 = (c1 − d1 )v1 + · · · + (cm − dm )vm .
Since v1 , . . . , vm are linearly independent, this relation is trivial, meaning that c1 − d1 = · · · = cm − dm = 0,
so ci = di for all i. Thus any two representations of v as a linear combination of the basis vectors are in fact
identical, so the representation is unique.
Conversely, suppose every vector v ∈ V can be expressed uniquely as a linear combination of v1 , . . . , vm .
Applying this statement with v = 0, we see that 0v1 + · · · + 0vm = 0 is the only linear relation among
v1 , . . . , vm , so these vectors are linearly independent. Since each v ∈ V is a linear combination of v1 , . . . , vm ,
these vectors span V . We conclude that v1 , . . . , vm form a basis for V .
3.3 The Dimension of a Subspace of Rn
Theorem 70 (3.3.1). Let V be a subspace of Rn . If the vectors v1 , . . . , vp ∈ V are linearly independent,
and the vectors w1 , . . . , wq ∈ V span V , then p ≤ q.
Proof. Define matrices
\[ A = \begin{pmatrix} | & & | \\ w_1 & \cdots & w_q \\ | & & | \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_p \\ | & & | \end{pmatrix}. \]
The vectors v1 , . . . , vp are in V = span(w1 , . . . , wq ) = im(A), so there exist u1 , . . . , up ∈ Rq such that
\[ v_1 = Au_1, \quad \ldots, \quad v_p = Au_p. \]
Combining these equations, we get
\[ B = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_p \\ | & & | \end{pmatrix} = A \underbrace{\begin{pmatrix} | & & | \\ u_1 & \cdots & u_p \\ | & & | \end{pmatrix}}_{C}, \]
or B = AC.
The kernel of C is a subset of the kernel of B, which equals {0} since v1 , . . . , vp are linearly independent, so
ker(C) = {0} as well. By Theorem 61, rank(C) = p, and the rank must be less than or equal to the number
of rows, so p ≤ q as claimed.
Theorem 71 (3.3.2, number of vectors in a basis). All bases of a subspace V of Rn contain the same
number of vectors.
Proof. Let v1 , . . . , vp and w1 , . . . , wq be two bases of V . Since v1 , . . . , vp are linearly independent and
w1 , . . . , wq span V , we have p ≤ q, by the preceding theorem. On the other hand, since v1 , . . . , vp span V
and w1 , . . . , wq are linearly independent, we have p ≥ q, so in fact p = q.
Definition 72 (3.3.3). The number of vectors in a basis of a subspace V of Rn is called the dimension of
V , denoted dim(V ).
Note 73. It can easily be shown that the standard basis vectors e1 , . . . , en of Rn do in fact form a basis of
Rn , so that, as would be expected, dim(Rn ) = n.
Theorem 74 (3.3.4, size of linearly independent and spanning sets). Let V be a subspace of Rn with
dim(V ) = m.
a) Any linearly independent set of vectors in V contains at most m vectors. If it contains exactly m
vectors, then it forms a basis of V .
b) Any spanning set of vectors in V contains at least m vectors. If it contains exactly m vectors, then it
forms a basis of V .
Proof.
a) Suppose v1 , . . . , vp are linearly independent vectors in V and w1 , . . . , wm form a basis of V . Then
w1 , . . . , wm span V , so p ≤ m by Theorem 70.
Now let v1 , . . . , vm be linearly independent vectors in V . To prove that v1 , . . . , vm form a basis of V ,
we must show that any vector v ∈ V is contained in the span of v1 , . . . , vm . By what we have already
shown, the m + 1 vectors v1 , . . . , vm , v must be linearly dependent. Since no vi is redundant in this
list, v must be redundant, meaning that v is a linear combination of v1 , . . . , vm , as needed.
b) Suppose v1 , . . . , vq span V and w1 , . . . , wm form a basis of V . Then w1 , . . . , wm are linearly independent, so q ≥ m by Theorem 70.
Now let v1 , . . . , vm be vectors which span V . To prove that v1 , . . . , vm form a basis of V , we must
show that they are linearly independent. We use proof by contradiction. Suppose that v1 , . . . , vm
are linearly dependent, with some redundant vi , so that vi = c1 v1 + · · · + ci−1 vi−1 for some scalars
c1 , . . . , ci−1 . In any linear combination v = d1 v1 + · · · + dm vm , we can substitute for vi to rewrite v
as a linear combination of the other vectors:
v = (d1 + di c1 )v1 + · · · + (di−1 + di ci−1 )vi−1 + di+1 vi+1 + · · · + dm vm .
We conclude that the subspace V = span(v1 , . . . , vm ) of dimension m is in fact spanned by just m − 1
vectors, a contradiction.
Procedure 75 (finding a basis of the kernel). The kernel of a matrix A consists of all solutions x to
the equation Ax = 0. To find a basis of the kernel of A, solve Ax = 0 using Gauss-Jordan elimination
to compute rref[ A | 0 ] = [ rref(A) | 0 ], solving the resulting system of linear equations for the leading
variables, and substituting parameters r, s, t, etc. for the free variables. Then write the general solution as
a linear combination of constant vectors with the parameters as coefficients. These constant vectors form a
basis for ker(A).
Procedure 76 (3.3.5, finding a basis of the image). To obtain a basis of the image of A, take the
columns of A corresponding to the columns of rref(A) containing leading 1’s.
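Both procedures can be spot-checked with SymPy: nullspace() returns essentially the constant vectors produced by Procedure 75, and columnspace() returns the pivot columns of Procedure 76 (the matrix below is made up for illustration):

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, 1],
            [2, 4, 1, 5],
            [1, 2, 1, 4]])

print(A.rref())         # (rref(A), pivot columns) -- here the pivots are columns 0 and 2
print(A.nullspace())    # basis of ker(A): one vector per free variable (columns 1 and 3)
print(A.columnspace())  # basis of im(A): the columns of A in the pivot positions
```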
Definition 77. The nullity of a matrix A, written nullity(A), is the dimension of the kernel of A.
Theorem 78. For any matrix A, rank(A) = dim(im A).
Proof. By Procedure 76, a basis of im(A) contains as many vectors as the number of leading 1’s in rref(A),
which is the definition of rank(A).
Theorem 79 (3.3.7, Rank-Nullity Theorem). For any n × m matrix A,
dim(ker A) + dim(im A) = m.
In terms of the linear transformation T : Rm → Rn given by T (x) = Ax, this can be written as
dim(ker T ) + dim(im T ) = dim(Rm ).
In terms of nullity and rank, we have
nullity(A) + rank(A) = m.
Proof. From Procedure 75, we know that a basis of ker(A) contains a vector for each free variable of A.
From Procedure 76, we know that a basis of im(A) contains a vector for each leading variable of A. Since
the number of free variables plus the number of leading variables equals the total number of variables m, the
first equation holds. The final two equations then follow from the definitions and the preceding theorem.
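A numerical spot-check of the Rank-Nullity Theorem (a sketch using NumPy and SciPy on a randomly generated 5 × 7 matrix of rank 2; scipy.linalg.null_space returns an orthonormal basis of the kernel):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 7))   # a 5 x 7 matrix of rank 2

rank = np.linalg.matrix_rank(A)       # dim(im A)
nullity = null_space(A).shape[1]      # dim(ker A): number of kernel basis vectors returned
print(rank, nullity, rank + nullity)  # 2 5 7 -- the sum is the number of columns m
```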
Theorem 80 (3.3.10, invertibility criteria). For an n × n matrix $A = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{pmatrix}$, the following are
equivalent:
1. A is invertible,
2. Ax = b has a unique solution x for any b,
3. rref(A) = In ,
4. rank(A) = n,
5. im(A) = Rn ,
6. ker(A) = {0},
7. v1 , . . . , vn span Rn ,
8. v1 , . . . , vn are linearly independent,
9. v1 , . . . , vn form a basis of Rn .
Proof. Statements 1-6 are equivalent by Theorem 62. Statements 5 and 7 are equivalent by Note 54. Statements 6 and 8 are equivalent by Note 68. Statements 7, 8, and 9 are equivalent by Theorem 74.
3.4 Coordinates
Definition 81 (3.4.1). Consider a basis B = (v1 , v2 , . . . , vm ) of a subspace V of Rn . By Theorem 69, any
vector x ∈ V can be written uniquely as
x = c1 v1 + c2 v2 + · · · + cm vm .
The scalars c1 , c2 , . . . , cm are called the B-coordinates of x, and
\[ [x]_B = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix} \]
is the B-coordinate vector of x.
Note 82. If we let $S = S_B = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix}$, then the relationship between x and [x]B is given by
\[ x = c_1 v_1 + \cdots + c_m v_m = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_m \\ | & & | \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_m \end{pmatrix} = S[x]_B. \]
Note 83. For the standard basis E = (e1 , . . . , en ) of Rn , the E-coordinate vector of a vector x ∈ Rn is just
x itself, since
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 e_1 + \cdots + x_n e_n. \]
In terms of the preceding note, $S_E = \begin{pmatrix} | & & | \\ e_1 & \cdots & e_n \\ | & & | \end{pmatrix} = I$, so that x = I[x]E = [x]E .
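A small NumPy sketch of the relation x = S[x]B (the basis of R² is invented for illustration); the coordinate vector is found by solving the linear system S[x]B = x:

```python
import numpy as np

v1, v2 = np.array([1., 1.]), np.array([1., -1.])
S = np.column_stack([v1, v2])        # S_B, with the basis vectors as columns

x = np.array([5., 1.])
x_B = np.linalg.solve(S, x)          # [x]_B, the solution of S [x]_B = x
print(x_B)                           # [3. 2.], since x = 3 v1 + 2 v2
print(np.allclose(S @ x_B, x))       # True: x = S [x]_B
```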
Theorem 84 (3.4.2, linearity of coordinates). If B is a basis of a subspace V of Rn , then for all x, y ∈ V
and k ∈ R:
a) [x + y]B = [x]B + [y]B ,
b) [kx]B = k[x]B .
Proof. Let B = (v1 , . . . , vm ).
a) If x = c1 v1 + · · · + cm vm and y = d1 v1 + · · · + dm vm , then x + y = (c1 + d1 )v1 + · · · + (cm + dm )vm ,
so that
\[ [x+y]_B = \begin{pmatrix} c_1 + d_1 \\ \vdots \\ c_m + d_m \end{pmatrix} = \begin{pmatrix} c_1 \\ \vdots \\ c_m \end{pmatrix} + \begin{pmatrix} d_1 \\ \vdots \\ d_m \end{pmatrix} = [x]_B + [y]_B. \]
b) If x = c1 v1 + · · · + cm vm , then kx = kc1 v1 + · · · + kcm vm , so that
\[ [kx]_B = \begin{pmatrix} kc_1 \\ \vdots \\ kc_m \end{pmatrix} = k \begin{pmatrix} c_1 \\ \vdots \\ c_m \end{pmatrix} = k[x]_B. \]
Theorem 85 (3.4.3, B-matrix of a linear transformation). Consider a linear transformation T : Rn →
Rn and a basis B = (v1 , . . . , vn ) of Rn . Then for any x ∈ Rn , the B-coordinate vectors of x and of T (x) are
related by the equation
[T (x)]B = B[x]B ,
where
\[ B = \begin{pmatrix} | & & | \\ [T(v_1)]_B & \cdots & [T(v_n)]_B \\ | & & | \end{pmatrix}, \]
the B-matrix of T . In other words, taking either path in the following diagram yields the same result (we
say that the diagram commutes):

  x = c1 v1 + · · · + cn vn  --T-->  T (x)
            |                          |
            v                          v
          [x]B        --B-->       [T (x)]B
Proof. Write x as a linear combination x = c1 v1 + · · · + cn vn of the vectors in the basis B. We use the
linearity of T to compute
T (x) = T (c1 v1 + · · · + cn vn ) = c1 T (v1 ) + · · · + cn T (vn ).
Taking the B-coordinate vector of each side and using the linearity of coordinates, we get
\[ [T(x)]_B = [c_1 T(v_1) + \cdots + c_n T(v_n)]_B = c_1 [T(v_1)]_B + \cdots + c_n [T(v_n)]_B = \begin{pmatrix} | & & | \\ [T(v_1)]_B & \cdots & [T(v_n)]_B \\ | & & | \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = B[x]_B. \]
Note 86. For the standard basis E = (e1 , . . . , en ) of Rn , the E-matrix of a linear transformation T : Rn → Rn
given by T (x) = Ax is just A, the standard matrix of T . In terms of the preceding theorem,
\[ \begin{pmatrix} | & & | \\ [T(v_1)]_E & \cdots & [T(v_n)]_E \\ | & & | \end{pmatrix} = \begin{pmatrix} | & & | \\ T(v_1) & \cdots & T(v_n) \\ | & & | \end{pmatrix} = A. \]
In this case, the diagram above becomes:

  x = x1 e1 + · · · + xn en  --T-->  T (x)
            |                          |
            v                          v
          [x]E        --A-->       [T (x)]E
Theorem 87 (3.4.4, standard matrix and B-matrix). Consider a linear transformation T : Rn → Rn
and a basis B = (v1 , . . . , vn ) of Rn . The standard matrix A of T and the B-matrix B of T are related by the
equation
\[ AS = SB, \quad\text{where}\quad S = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{pmatrix}. \]
The equation AS = SB can be solved for A or for B to obtain equivalent equations A = SBS −1 and
B = S −1 AS. The relationship between A and B is illustrated by the following diagram:

    x      --A-->    T (x)
    ^                  ^
    | S                | S
    |                  |
  [x]B     --B-->   [T (x)]B
Proof. Applying Note 82 and Theorem 85, we compute
T (x) = Ax = A(S[x]B ) = (AS)[x]B ,
and
T (x) = S[T (x)]B = S(B[x]B ) = (SB)[x]B .
Thus (AS)[x]B = (SB)[x]B for all [x]B ∈ Rn , which implies that AS = SB. Multiplying on the left of each
side by S −1 , we get S −1 AS = B. Multiplying instead on the right of each side by S −1 , we get A = SBS −1 .
(We know that S is invertible because its columns form a basis of Rn .)
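A NumPy sketch of the relation B = S⁻¹AS (everything here is an invented example: A is the standard matrix of the projection onto the line spanned by (3, 4), and B is a basis adapted to that line):

```python
import numpy as np

w = np.array([3., 4.])
A = np.outer(w, w) / (w @ w)         # standard matrix of the projection onto span(w)

v1, v2 = w, np.array([-4., 3.])      # basis B = (v1, v2): v1 on the line, v2 perpendicular to it
S = np.column_stack([v1, v2])

B = np.linalg.inv(S) @ A @ S         # B-matrix of the same transformation
print(np.round(B, 12))               # [[1. 0.]
                                     #  [0. 0.]] -- project onto the first B-coordinate
print(np.allclose(A @ S, S @ B))     # True: AS = SB
```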
Definition 88 (3.4.5). Given two n × n matrices A and B, we say that A is similar to B, abbreviated
A ∼ B, if there exists an invertible matrix S such that
AS = SB
or, equivalently, B = S −1 AS.
Note 89. By Theorem 87, the standard matrix A of a linear transformation T : Rn → Rn is similar to the
B-matrix B of T for any basis B of Rn .
Theorem 90 (3.4.6). Similarity is an equivalence relation, which means that it satisfies the following
three properties for any n × n matrices A, B, and C:
a) reflexivity: A ∼ A.
b) symmetry: If A ∼ B, then B ∼ A.
c) transitivity: If A ∼ B and B ∼ C, then A ∼ C.
Proof.
a) A = IAI = I −1 AI.
b) If A ∼ B, then there exists S such that B = S −1 AS. Multiplying on the left of each side by S and on
the right of each side by S −1 , we get SBS −1 = A, or A = SBS −1 = (S −1 )−1 B(S −1 ), which shows
that B ∼ A.
c) If A ∼ B and B ∼ C, then there exists S such that B = S −1 AS and T such that C = T −1 BT .
Substituting for B in the second equation yields C = T −1 (S −1 AS)T = (ST )−1 A(ST ), which shows
that A ∼ C.
4 Linear Spaces

4.1 Introduction to Linear Spaces
Definition 91 (4.1.1). A linear space V (more commonly known as a vector space) is a set V together
with an addition rule and a scalar multiplication rule:
• For f, g ∈ V , there is an element f + g ∈ V .
• For f ∈ V and k ∈ R, there is an element kf ∈ V .
which satisfy the following eight properties (for all f, g, h ∈ V and c, k ∈ R):
1. addition is associative: (f + g) + h = f + (g + h).
2. addition is commutative: f + g = g + f .
3. an additive identity exists (a neutral element): There is an element n ∈ V such that f + n = f
for all f ∈ V . This n is unique and is denoted by 0.
4. additive inverses exist: For each f ∈ V , there exists a g ∈ V such that f + g = 0. This g is unique
and is denoted by (−f ).
5. s.m. distributes over addition in V : k(f + g) = kf + kg.
6. s.m. distributes over addition in R: (c + k)f = cf + kf .
7. s.m. is “associative”: c(kf ) = (ck)f .
8. an “identity” exists for s.m.: 1f = f .
Note 92. Vector spaces Rn and their subspaces W ⊂ Rn are examples of linear spaces. Linear spaces
are generalizations of Rn . Using the addition and scalar multiplication operations, we can construct linear
combinations, which then enable us to define the basic notions of linear algebra (which we have already
defined for Rn ): subspace, span, linear independence, basis, dimension, coordinates, linear transformation,
image, kernel, matrix of a transformation, etc.
Note 93. Typically, both Rn and the linear spaces defined above are referred to as vector spaces. If it is
necessary to draw a distinction, then the latter are called abstract vector spaces. When one is first learning
linear algebra, this terminology is potentially confusing because the elements of most “abstract vector spaces”
are not vectors in the traditional sense, but functions, or polynomials, or sequences, etc. Thus we will follow
the text in speaking of vector spaces Rn and linear spaces V .
Definition 94.
• The set F (R, R) of all functions f : R → R (real-valued functions of the real numbers) is a linear space.
• The set C ∞ of all functions f : R → R which can be differentiated any number of times (smooth
functions) is a linear space. It includes all polynomials, exponential functions, sin(x), cos(x), etc.
• The set P of all polynomials (with real coefficients) is a linear space.
• The set Pn of all polynomials of degree ≤ n is a linear space.
• The set Rn×m of n × m matrices with real coefficients is a linear space.
Definition 95 (4.1.2). A subset W of a linear space V is called a subspace of V if it satisfies the following
three properties:
a) contains neutral element: 0 ∈ W .
b) closed under addition: If f, g ∈ W , then f + g ∈ W .
c) closed under scalar multiplication: If f ∈ W and k ∈ R, then kf ∈ W .
Theorem 96. A subspace W of a linear space V is itself a linear space.
Proof. Property a guarantees that W contains the neutral element from V , which is property 3 of a linear
space.
For property 4 of a linear space, first note that for any f ∈ V , 0f = (0 + 0)f = 0f + 0f , and so 0 = 0f . It
follows that we can write the additive inverse as −f = (−1)f , since f + (−1)f = 1f + (−1)f = (1 + (−1))f =
0f = 0. Thus if f ∈ W , we have that the element −f = (−1)f ∈ V is also in W , by property c above.
Properties b and c imply that addition and scalar multiplication are well defined as operations within
W.
For properties 1-2 and 5-8 of a linear space, we simply note that all elements of W are also elements of
V , so the properties hold automatically.
Definition 97 (4.1.3). The terms span, redundant, linearly independent, basis, coordinates, and dimension
are defined for linear spaces V just as for vector spaces Rn . In particular, for a basis B = (f1 , . . . , fn ) of a
linear space V , any element f ∈ V can be written uniquely as a linear combination f = c1 f1 + · · · + cn fn of
the vectors in B. The coefficients c1 , . . . , cn are called the coordinates of f and the vector
\[ [f]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \]
is the B-coordinate vector of f .
We define the B-coordinate transformation LB : V → Rn by
\[ L_B(f) = [f]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}. \]
If the basis B is understood, then we sometimes write just L for LB .
The B-coordinate transformation is invertible, with inverse
\[ L_B^{-1} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} = c_1 f_1 + \cdots + c_n f_n. \]
It is easy to check that in fact $L_B^{-1} \circ L_B$ is the identity on V , and $L_B \circ L_B^{-1}$ is the identity on Rn .
Note that the basis vectors f1 , . . . , fn for V and the standard basis vectors e1 , . . . , en for Rn are related
by $L_B(f_i) = e_i$ and $L_B^{-1}(e_i) = f_i$, since
\[ L_B(f_i) = L(0 f_1 + \cdots + 1 f_i + \cdots + 0 f_n) = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = e_i. \]
Theorem 98 (4.1.4, linearity of the coordinate transformation LB ). If B is a basis of a linear space
V with dim(V ) = n, then the B-coordinate transformation LB : V → Rn is linear. In other words, for all
f, g ∈ V and k ∈ R,
a) [f + g]B = [f ]B + [g]B ,
b) [kf ]B = k[f ]B .
Proof. The proof is analogous to that of Theorem 84.
Note 99. If V = Rn and B = (v1 , . . . , vn ), then $L_B^{-1} : \mathbb{R}^n \to V = \mathbb{R}^n$ has standard matrix $S_B = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{pmatrix}$ (encountered in the preceding section), so that
\[ L_B(x) = S_B^{-1} x = [x]_B. \]
Theorem 100 (4.1.5, dimension). If a linear space V has a basis with n elements, then all bases of V
consist of n elements, and we say that the dimension of V is n:
dim(V ) = n.
Proof. Consider two bases B = (f1 , . . . , fn ) and C = (g1 , . . . , gm ) of V .
We first show that [g1 ]B , . . . , [gm ]B ∈ Rn are linearly independent, which will imply m ≤ n by Theorem
74. Suppose
c1 [g1 ]B + · · · + cm [gm ]B = 0.
By the preceding theorem,
[c1 g1 + · · · + cm gm ]B = 0,
so that
c1 g1 + · · · + cm gm = 0.
Since g1 , . . . , gm are linearly independent, c1 = · · · = cm = 0, as claimed.
Similarly, we can show that [f1 ]C , . . . , [fn ]C are linearly independent, so that n ≤ m.
We conclude that n = m.
Definition 101 (4.1.8). Not every linear space has a (finite) basis. If we allow infinite bases, then every
linear space does have a basis, but we will not define infinite bases in this course. A linear space with a (finite)
basis is called finite dimensional. A linear space without a (finite) basis is called infinite dimensional.
Procedure 102 (4.1.6, finding a basis of a linear space V ).
1. Write down a typical element of V in terms of some arbitrary constants (parameters).
2. Express the typical element as a linear combination of some elements of V , using the arbitrary constants
as coefficients; these elements then span V .
3. Verify that the elements of V in this linear combination are linearly independent; if so, they form a
basis of V .
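As an illustration of this procedure (our own example, not one from the notes), take V to be the subspace of R2×2 consisting of symmetric matrices; a short SymPy sketch of steps 1 and 2:

```python
from sympy import Matrix, symbols

a, b, c = symbols('a b c')

# Step 1: a typical element of V, written with arbitrary parameters a, b, c.
M = Matrix([[a, b], [b, c]])

# Step 2: express it as a linear combination with the parameters as coefficients.
E1 = Matrix([[1, 0], [0, 0]])
E2 = Matrix([[0, 1], [1, 0]])
E3 = Matrix([[0, 0], [0, 1]])
print(M == a*E1 + b*E2 + c*E3)   # True, so E1, E2, E3 span V

# Step 3: a E1 + b E2 + c E3 = 0 forces a = b = c = 0,
# so (E1, E2, E3) is linearly independent, hence a basis, and dim(V) = 3.
```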
Theorem 103 (4.1.7, linear differential equations). The solutions of the differential equation
\[ f^{(n)}(x) + a_{n-1} f^{(n-1)}(x) + \cdots + a_1 f'(x) + a_0 f(x) = 0, \]
with a0 , . . . , an−1 ∈ R, form an n-dimensional subspace of C ∞ . A differential equation of this form is called
an nth-order linear differential equation with constant coefficients.
Proof. This theorem is proven in section 9.3 of the text, which we will not cover in this course.
4.2 Linear Transformations and Isomorphisms
Definition 104 (4.2.1). Let V and W be linear spaces. A function T : V → W is called a linear transformation if, for all f, g ∈ V and k ∈ R,
T (f + g) = T (f ) + T (g)
and T (kf ) = kT (f ).
For a linear transformation T : V → W , we let
im(T ) = {T (f ) : f ∈ V }
and
ker(T ) = {f ∈ V : T (f ) = 0}.
Then im(T ) is a subspace of the target W and ker(T ) is a subspace of the domain V , so im(T ) and ker(T )
are each linear spaces.
If the image of T is finite dimensional, then dim(im T ) is called the rank of T , and if the kernel of T is
finite dimensional, then dim(ker T ) is called the nullity of T .
Theorem 105 (Rank-nullity Theorem). If V is finite dimensional, then the Rank-Nullity Theorem holds:
dim(V ) = dim(im T ) + dim(ker T )
= rank(T ) + nullity(T ).
Proof. The proof is a series of exercises in the text.
Definition 106 (4.2.2). An invertible linear transformation T is called an isomorphism (from the Greek
for “same structure”). The linear space V is said to be isomorphic to the linear space W , written V ' W ,
if there exists an isomorphism T : V → W .
Theorem 107 (4.2.3, coordinate transformations are isomorphisms). If B = (f1 , f2 , . . . , fn ) is a basis
of a linear space V , then the B-coordinate transformation LB (f ) = [f ]B from V to Rn is an isomorphism.
Thus V is isomorphic to Rn .
Proof. We showed in the preceding section that LB : V → Rn is an invertible linear transformation: it sends f = c1 f1 + · · · + cn fn in V to $[f]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$ in Rn , and (LB )−1 sends this coordinate vector back to f .
Note 108. It follows from the preceding theorem that any n-dimensional vector space is isomorphic to Rn .
From this perspective, finite dimensional linear spaces are just vector spaces in disguise. An n-dimensional
linear space is really just Rn written in another “language.”
Theorem 109 (4.2.4, properties of isomorphisms). Let T : V → W be a linear transformation.
a) T is an isomorphism if and only if ker(T ) = {0} and im(T ) = W . (study only this part for the quiz)
b) Assume V and W are finite dimensional. If any two of the following statements are true, then T is
an isomorphism. If T is an isomorphism, then all three statements are true.
i. ker(T ) = {0}
ii. im(T ) = W
iii. dim(V ) = dim(W )
Proof.
a) Suppose T is an isomorphism. If T (f ) = 0 for an element f ∈ V , then we can apply T −1 to each side
to obtain T −1 (T (f )) = T −1 (0), or f = 0, so ker(T ) = {0}. To see that im(T ) = W , note that any g
in W can be written as g = T (T −1 (g)) ∈ im(T ).
Now suppose ker(T ) = {0} and im(T ) = W . To show that T is invertible, we must show that T (f ) = g
has a unique solution f for each g. Since im(T ) = W , there is at least one solution. If f1 and f2 are
two solutions, with T (f1 ) = g and T (f2 ) = g, then
T (f1 − f2 ) = T (f1 ) − T (f2 ) = g − g = 0,
so that f1 − f2 is in the kernel of T . Since ker(T ) = {0}, we have f1 − f2 = 0 and thus f1 = f2 .
b) If i. and ii. hold, then we have shown in part (a) that T is an isomorphism.
If i. and iii. hold, then
dim(im T ) = dim(V ) − dim(ker T ) = dim(V ) − dim{0} = dim(W ) − 0 = dim(W ).
We prove that im(T ) = W by contradiction. Suppose that there is some element g ∈ W which is not
contained in im(T ). If g1 , . . . , gn form a basis of im(T ), then g ∉ span(g1 , . . . , gn ) = im(T ), and so
g is not redundant in the list of vectors g1 , . . . , gn , g, which are therefore linearly independent. Thus
dim(W ) ≥ n + 1 > n = dim(im T ), contradicting our result dim(im T ) = dim(W ) from above.
If ii. and iii. hold, then
dim(ker T ) = dim(V ) − dim(im T ) = dim(V ) − dim(W ) = 0.
The only subspace with dimension 0 is {0}, so ker(T ) = {0}.
If T is an isomorphism, then i. and ii. hold by part (a). Statement iii. holds by the Rank-Nullity
Theorem and part (a):
dim(V ) = dim(ker T ) + dim(im T ) = dim{0} + dim(W ) = 0 + dim(W ) = dim(W ).
Theorem 110. If W is a subspace of a finite dimensional linear space V and dim(W ) = dim(V ), then
W =V.
Proof. Define a linear transformation T : W → V by T (x) = x. Then ker(T ) = {0}, which together with
the hypothesis dim(W ) = dim(V ) implies, by the preceding theorem, that im(T ) = V . It follows that every
element of V is also an element of W .
Theorem 111 (isomorphism is an equivalence relation). Isomorphism of linear spaces is an equivalence relation, which means that it satisfies the following three properties for any linear spaces V , W ,
and U :
a) reflexivity: V ' V .
b) symmetry: If V ' W , then W ' V .
c) transitivity: If V ' W and W ' U , then V ' U .
Proof.
a) Any linear space V is isomorphic to itself via the identity transformation I : V → V defined by
I(f ) = f , which is its own inverse.
b) If V ' W , then there exists an invertible linear transformation T : V → W . The inverse transformation T −1 : W → V is then an isomorphism from W to V , so W ' V .
c) If V ' W and W ' U , then there exist invertible linear transformations T : V → W and S : W → U .
Composing these transformations, we obtain (S ◦T ) : V → U , with inverse transformation (S ◦T )−1 =
T −1 S −1 . Thus S ◦ T is an isomorphism and V ' U .
4.3 The Matrix of a Linear Transformation
Definition 112 (4.3.1). Let V be an n-dimensional linear space with basis B, and let T : V → V be a linear
transformation. The B-matrix B of T is defined to be the standard matrix of the linear transformation
$L_B \circ T \circ L_B^{-1} : \mathbb{R}^n \to \mathbb{R}^n$, so that
\[ Bx = L_B(T(L_B^{-1}(x))) \]
for all x ∈ Rn :

    V    --T-->    V
    ^              |
    | L_B^{-1}     | L_B
    |              v
    Rn   --B-->    Rn

If $f = L_B^{-1}(x)$, so that $x = L_B(f) = [f]_B$ , then
\[ [T(f)]_B = B[f]_B \]
for all f ∈ V :

    f         --T-->    T (f )
    |                     |
    | L_B                 | L_B
    v                     v
  x = [f ]B   --B-->   [T (f )]B
Theorem 113 (4.3.2, B-matrix of a linear transformation). Let V be a linear space with basis B =
(f1 , . . . , fn ), and let T : V → V be a linear transformation. Then the B-matrix of T is given by
\[ B = \begin{pmatrix} | & & | \\ [T(f_1)]_B & \cdots & [T(f_n)]_B \\ | & & | \end{pmatrix}. \]
The columns of B are the B-coordinate vectors of the transforms of the elements f1 , . . . , fn in the basis B.
Proof. The ith column of B is
Bei = B[fi ]B = [T (fi )]B .
Definition 114 (4.3.3). Let V be an n-dimensional linear space with bases B and C, and let T : V → V be
a linear transformation. The change of basis matrix from B to C, denoted by SB→C , is defined to be the
standard matrix of the linear transformation $L_C \circ L_B^{-1} : \mathbb{R}^n \to \mathbb{R}^n$, so that
\[ S_{B \to C}\, x = L_C(L_B^{-1}(x)) \]
for all x ∈ Rn : the composition

    Rn  --L_B^{-1}-->  V  --L_C-->  Rn

is the map Rn --S_{B→C}--> Rn .
If $f = L_B^{-1}(x)$, so that $x = L_B(f) = [f]_B$ , then
\[ [f]_C = S_{B \to C}\, [f]_B \]
for all f ∈ V : at the level of elements, x = [f ]B ↦ f ↦ [f ]C .
Note 115. The inverse matrix of SB→C is SC→B , the standard matrix of $L_B \circ L_C^{-1} = (L_C \circ L_B^{-1})^{-1}$.
Theorem 116 (4.3.3, change of basis matrix). Let V be a linear space with two bases B = (f1 , . . . , fn )
and C. Then the change of basis matrix from B to C is given by
\[ S_{B \to C} = \begin{pmatrix} | & & | \\ [f_1]_C & \cdots & [f_n]_C \\ | & & | \end{pmatrix}. \]
The columns of SB→C are the C-coordinate vectors of the elements f1 , . . . , fn in the basis B.
Proof. The ith column of SB→C is
SB→C ei = SB→C [fi ]B = [fi ]C .
Theorem 117 (4.3.4, change of basis in a subspace of Rn ). Consider a subspace V of Rn with two
bases B = (f1 , . . . , fm ) and C = (g1 , . . . , gm ). Then SB = SC SB→C , or
\[ \begin{pmatrix} | & & | \\ f_1 & \cdots & f_m \\ | & & | \end{pmatrix} = \begin{pmatrix} | & & | \\ g_1 & \cdots & g_m \\ | & & | \end{pmatrix} S_{B \to C}, \]
which is illustrated in the following diagram:

  [x]B  --S_{B→C}-->  [x]C
      \               /
   S_B \             / S_C
        v           v
          x ∈ V ⊂ Rn
In the case n = m, SB and SC become invertible, and we can solve for the change of basis matrix:
SB→C = SC−1 SB .
If, in addition, we take C to be the standard basis E = (e1 , . . . , en ) of Rn , then we get
SB→E = SE−1 SB = ISB = SB .
Proof. By definition, $S_{B \to C}\, x = L_C(L_B^{-1}(x))$ for any x ∈ Rm , and we can apply $L_C^{-1}$ to both sides to obtain
\[ L_C^{-1}(S_{B \to C}\, x) = L_B^{-1}(x). \]
By Note 99, we can rewrite this as $S_C S_{B \to C}\, x = S_B x$. Since this holds for any x ∈ Rm , we conclude that
SC SB→C = SB .
When n = m, SB and SC are n × n matrices whose columns form bases, so they are invertible.
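A NumPy sketch of the case n = m (the two bases of R² are invented for illustration):

```python
import numpy as np

S_B = np.column_stack([[1., 1.], [1., -1.]])   # basis B = (f1, f2) of R^2
S_C = np.column_stack([[2., 0.], [1., 1.]])    # basis C = (g1, g2) of R^2

S_BC = np.linalg.inv(S_C) @ S_B                # change of basis matrix S_{B->C} = S_C^{-1} S_B
print(np.allclose(S_C @ S_BC, S_B))            # True: S_B = S_C S_{B->C}

x = np.array([5., 1.])
x_B = np.linalg.solve(S_B, x)                  # [x]_B
x_C = np.linalg.solve(S_C, x)                  # [x]_C
print(np.allclose(S_BC @ x_B, x_C))            # True: [x]_C = S_{B->C} [x]_B
```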
Theorem 118 (4.3.5, change of basis for the matrix of a linear transformation). Consider a linear
transformation T : V → V , where V is a finite dimensional linear space with two bases B and C. The
relationship between the B-matrix B of T and the C-matrix C of T involves the change of basis matrix
S = SB→C :
CS = SB or C = SBS −1 or B = S −1 CS.
The first equation comes from the outer rectangle in the following diagram. The two trapezoids and the two
(identical) triangles are precisely the commutative diagrams already encountered in this section.
  [f ]C  ----C---->  [T (f )]C
    ^                    ^
    | S_{B→C}            | S_{B→C}
    |                    |
  [f ]B  ----B---->  [T (f )]B

Here LB sends f to [f ]B and T (f ) to [T (f )]B , while LC sends f to [f ]C and T (f ) to [T (f )]C .
Proof. We prove that CSB→C = SB→C B. Intuitively, the large rectangle commutes because the two trapezoids and the two triangles inside of it commute. Algebraically, this amounts to the following (we write S
for SB→C and x for [f ]B ):
\[ CSx = (L_C \circ T \circ L_C^{-1})((L_C \circ L_B^{-1})(x)) = (L_C \circ T \circ (L_C^{-1} \circ L_C) \circ L_B^{-1})(x) = (L_C \circ T \circ L_B^{-1})(x). \]
Similarly,
\[ SBx = (L_C \circ L_B^{-1})((L_B \circ T \circ L_B^{-1})(x)) = (L_C \circ (L_B^{-1} \circ L_B) \circ T \circ L_B^{-1})(x) = (L_C \circ T \circ L_B^{-1})(x). \]
Combining these results, CSx = SBx for all x ∈ Rn , so CS = SB as desired.
Note 119. If V is a subspace of dimension m in the vector space Rn , then we can write matrices SB and SC
in place of linear transformations LB and LC (provided that we change the direction of the corresponding
arrows):
  [x]C  ----C---->  [T (x)]C
    |                   |
    | S_C               | S_C
    v                   v
    x   ----T---->    T (x)
    ^                   ^
    | S_B               | S_B
    |                   |
  [x]B  ----B---->  [T (x)]B
Finally, if V = Rn and we take C to be the standard basis E = (e1 , . . . , en ) of Rn , then the E-matrix of T
equals the standard matrix A of T , SE = I, SB→E = SB , and the picture simplifies to the following, where
the outer rectangle gives the familiar formula ASB = SB B from Theorem 87:
  [x]E = x  ----A---->  T (x) = [T (x)]E
      ^                      ^
      | S_B                  | S_B
      |                      |
    [x]B    ----B---->   [T (x)]B
5 Orthogonality and Least Squares

5.1 Orthogonal Projections and Orthonormal Bases
Definition 120 (5.1.1).
• Two vectors v, w ∈ Rn are called perpendicular or orthogonal if v · w = 0.
• A vector x ∈ Rn is orthogonal to a subspace V ⊂ Rn if x is orthogonal to all vectors v ∈ V .
Theorem 121. A vector x ∈ Rn is orthogonal to a subspace V ⊂ Rn with basis v1 , . . . , vm if and only
if x is orthogonal to all of the basis vectors v1 , . . . , vm .
Proof. If x is orthogonal to V , then x is orthogonal to v1 , . . . , vm by definition.
Conversely, if x is orthogonal to v1 , . . . , vm , then any v ∈ V can be written as a linear combination
v = c1 v1 + · · · + cm vm of basis vectors, from which it follows that
x · v = x · (c1 v1 + · · · + cm vm )
= x · (c1 v1 ) + · · · + x · (cm vm )
= c1 (x · v1 ) + · · · + cm (x · vm )
= c1 (0) + · · · + cm (0)
= 0,
so x is orthogonal to v.
Definition 122 (5.1.1).
• The length (or magnitude or norm) of a vector v ∈ Rn is ||v|| = √(v · v).
• A vector u ∈ Rn is called a unit vector if its length is 1 (i.e., ||u|| = 1 or u · u = 1).
Theorem 123.
• For any vectors v, w ∈ Rn and scalar k ∈ R,
k(v · w) = (kv) · w = v · (kw) and ||kv|| = |k| ||v||.
• If v ≠ 0, then the vector u = (1/||v||) v is a unit vector in the same direction as v, called the normalization of v.
Proof.
• We compute
(kv) · w = Σ_{i=1}^{n} (kv)_i w_i = Σ_{i=1}^{n} k v_i w_i = k Σ_{i=1}^{n} v_i w_i = k(v · w),
v · (kw) = Σ_{i=1}^{n} v_i (kw)_i = Σ_{i=1}^{n} v_i k w_i = k Σ_{i=1}^{n} v_i w_i = k(v · w),
which proves the first claim. We then use the definition of length to obtain
||kv|| = √((kv) · (kv)) = √(k²(v · v)) = √(k²) √(v · v) = |k| ||v||.
• To prove that the normalization u is a unit vector, we compute its length:
||u|| = ||(1/||v||) v|| = (1/||v||) ||v|| = 1.
Definition 124 (5.1.2). The vectors u1, . . . , um ∈ Rn are called orthonormal if they are all unit vectors and all orthogonal to each other:
ui · uj = 1 if i = j, and ui · uj = 0 if i ≠ j.
Note 125. The standard basis vectors e1 , . . . , en of Rn are orthonormal.
Theorem 126. Orthonormal vectors u1 , . . . , um are linearly independent.
Proof. Consider a relation
c1 u1 + · · · + ci ui + · · · + cm um = 0.
Taking the dot product of each side with ui , we get
(c1 u1 + · · · + ci ui + · · · + cm um ) · ui = 0 · ui = 0,
which simplifies to
c1 (u1 · ui ) + c2 (u2 · ui ) + · · · + ci (ui · ui ) + · · · + cm (um · ui ) = 0.
Since all of the dot products are 0 except for ui · ui = 1, we have ci = 0. This is true for all i = 1, 2, . . . , m,
so u1 , . . . , um are linearly independent.
Theorem 127 (5.1.4, orthogonal projection). For any vector x ∈ Rn and any subspace V ⊂ Rn, we can write
x = x∥ + x⊥
for some x∥ in V and x⊥ perpendicular to V, and this representation is unique. The vector projV (x) = x∥ is called the orthogonal projection of x onto V and is given by the formula
projV (x) = x∥ = (u1 · x)u1 + · · · + (um · x)um
for all x ∈ Rn, where (u1, . . . , um) is any orthonormal basis of V. The resulting orthogonal projection transformation projV : Rn → Rn is linear.
Proof. Any potential projV (x) = x∥ ∈ V can be written as a linear combination
x∥ = c1u1 + · · · + ciui + · · · + cmum
of the basis vectors of V. Then
x⊥ = x − x∥ = x − c1u1 − · · · − ciui − · · · − cmum
is orthogonal to V if and only if it is orthogonal to all of the basis vectors ui ∈ V:
0 = ui · (x − c1u1 − · · · − ciui − · · · − cmum)
  = ui · x − ui · (c1u1) − · · · − ui · (ciui) − · · · − ui · (cmum)
  = ui · x − c1(ui · u1) − · · · − ci(ui · ui) − · · · − cm(ui · um)
  = ui · x − ci,
since ui · uj = 0 for j ≠ i and ui · ui = 1.
Thus the unique solution has ci = ui · x for i = 1, . . . , m, which means that
x∥ = (u1 · x)u1 + · · · + (um · x)um
and
x⊥ = x − (u1 · x)u1 − · · · − (um · x)um.
For linearity, take x, y ∈ Rn and k ∈ R. Then
x + y = (x∥ + x⊥) + (y∥ + y⊥) = (x∥ + y∥) + (x⊥ + y⊥),
with x∥ + y∥ in V and x⊥ + y⊥ orthogonal to V, so
projV (x + y) = x∥ + y∥ = projV (x) + projV (y).
Similarly,
kx = k(x∥ + x⊥) = kx∥ + kx⊥,
with kx∥ in V and kx⊥ orthogonal to V, so
projV (kx) = kx∥ = k projV (x).
Note 128. The orthogonal projection of x onto a subspace V ⊂ Rn is obtained by summing the orthogonal
projections (ui · x)ui of x onto the lines spanned by the orthonormal basis vectors u1 , . . . , um of V .
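A small numerical sketch of Theorem 127 and Note 128 (not part of the notes; NumPy is assumed, and the subspace and the vector below are arbitrary choices):

```python
import numpy as np

# Hypothetical example: V = span(u1, u2) in R^3, with u1, u2 orthonormal.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
x = np.array([3.0, 4.0, 5.0])

x_par = (u1 @ x) * u1 + (u2 @ x) * u2   # proj_V(x) = (u1.x)u1 + (u2.x)u2
x_perp = x - x_par

# x_perp is orthogonal to V: its dot product with each basis vector is 0.
print(np.isclose(x_perp @ u1, 0.0), np.isclose(x_perp @ u2, 0.0))
print(x_par)   # [3.  4.5 4.5]
```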
Theorem 129 (5.1.6, coordinates via orthogonal projection). For any orthonormal basis B =
(u1 , . . . , un ) of Rn ,
x = (u1 · x)u1 + · · · + (un · x)un
for all x ∈ Rn, so the B-coordinate vector of x is given by
[x]_B = (u1 · x, . . . , un · x)^T,
the column vector whose ith entry is ui · x.
Proof. If V = Rn in Theorem 127, then clearly x = x + 0 is a decomposition of x with x in V and 0
orthogonal to V . Thus
x = x∥ = projV (x) = (u1 · x)u1 + · · · + (un · x)un.
Definition 130 (5.1.7). The orthogonal complement V⊥ of a subspace V ⊂ Rn is the set of all vectors x ∈ Rn which are orthogonal to every vector v ∈ V:
V⊥ = {x ∈ Rn : v · x = 0 for all v ∈ V}.
Theorem 131. For a subspace V ⊂ Rn , the orthogonal complement V ⊥ is the kernel of projV . The image
of projV is V itself.
Proof. Note that x ∈ V⊥ if and only if x⊥ = x, i.e. projV (x) = x∥ = 0.
Any vector in the image of projV is contained in V by definition. Conversely, for any v ∈ V , projV (v) = v,
so v is in the image of projV .
Theorem 132 (5.1.8, properties of the orthogonal complement). Let V be a subspace of Rn .
a) V ⊥ is a subspace of Rn .
b) V ∩ V ⊥ = {0}
c) dim(V ) + dim(V ⊥ ) = n
d) (V ⊥ )⊥ = V
Proof.
a) By the preceding theorem, V ⊥ is the kernel of the linear transformation projV : Rn → Rn and is
therefore a subspace of the domain Rn .
b) Clearly 0 is contained in both V and V⊥. Any vector x in both V and V⊥ is orthogonal to itself, so that x · x = ||x||² = 0 and thus x = 0.
c) Applying the Rank-Nullity Theorem to the linear transformation projV : Rn → Rn , we have
n = dim(im(projV )) + dim(ker(projV )) = dim(V ) + dim(V ⊥ ).
d) Note that V ⊂ (V ⊥ )⊥ because, for any v ∈ V and x ∈ V ⊥ , x · v = v · x = 0. By part (c),
dim((V ⊥ )⊥ ) = n − dim(V ⊥ ) = n − (n − dim(V )) = dim(V ),
so Theorem 110 implies that (V ⊥ )⊥ = V .
Theorem 133 (5.1.9, Pythagorean Theorem). For two vectors x, y ∈ Rn, the equation
||x + y||² = ||x||² + ||y||²
holds if and only if x and y are orthogonal.
Proof. We compute:
||x + y||² = (x + y) · (x + y)
           = x · x + 2(x · y) + y · y
           = ||x||² + 2(x · y) + ||y||²,
which equals ||x||² + ||y||² if and only if x · y = 0.
Theorem 134 (5.1.10). For any vector x ∈ Rn and subspace V ⊂ Rn,
||projV (x)|| ≤ ||x||,
with equality if and only if x ∈ V.
Proof. Since projV (x) = x∥ is orthogonal to x⊥, we can apply the Pythagorean Theorem:
||x||² = ||projV (x)||² + ||x⊥||².
It follows that ||projV (x)||² ≤ ||x||² and thus ||projV (x)|| ≤ ||x||. There is equality if and only if ||x⊥||² = 0, or x⊥ = 0, which is equivalent to x ∈ V.
Theorem 135 (5.1.11, Cauchy-Schwarz inequality). For two vectors x, y ∈ Rn,
|x · y| ≤ ||x|| ||y||,
with equality if and only if x and y are parallel.
Proof. Let u = (1/||y||) y be the normalization of y, and let V = span(y). Then
projV (x) = (x · u)u
for any x ∈ Rn, so by the preceding theorem,
||x|| ≥ ||projV (x)|| = ||(x · u)u|| = |x · u| ||u|| = |x · u| = |x · (1/||y||) y| = (1/||y||) |x · y|.
Multiplying each side by ||y||, we get
||x|| ||y|| ≥ |x · y|.
Definition 136 (5.1.12). By the Cauchy-Schwarz inequality, |x · y| / (||x|| ||y||) ≤ 1, so we may define the angle between two nonzero vectors x, y ∈ Rn to be
θ = arccos( (x · y) / (||x|| ||y||) ).
With this definition, we have the formula
x · y = ||x|| ||y|| cos θ
for the dot product in terms of the lengths of two vectors and the angle between them.
5.2 Gram-Schmidt Process and QR Factorization
Theorem 137 (5.2.1). For a basis v1 , . . . , vm of a subspace V ⊂ Rn , define subspaces
Vj = span(v1 , . . . , vj ) ⊂ V
for j = 0, 1, . . . , m. Note that V0 = span ∅ = {0} and Vm = V .
Let vj⊥ be the component of vj perpendicular to the span Vj−1 of the preceding basis vectors:
vj⊥ = vj − vj∥ = vj − projVj−1 (vj).
We can normalize the vj⊥ to obtain unit vectors:
uj = (1/||vj⊥||) vj⊥.
Then u1 , . . . , um form an orthonormal basis of V .
Proof. In order to define uj, we must ensure that vj⊥ ≠ 0. This holds because vj is not redundant in the list v1, . . . , vj, and thus is not contained in Vj−1.
By definition, the vj⊥ are all orthogonal to each other: for i < j we have vi⊥ ∈ Vi ⊂ Vj−1, while vj⊥ is orthogonal to Vj−1. Thus the uj are orthogonal too, since for i ≠ j
ui · uj = (1/||vi⊥||) vi⊥ · (1/||vj⊥||) vj⊥ = (1/(||vi⊥|| ||vj⊥||)) (vi⊥ · vj⊥) = 0.
Since u1 , . . . , um ∈ V are orthogonal unit vectors, they are linearly independent. Since dim(V ) = m, these
vectors form an (orthonormal) basis for V .
Procedure 138 (5.2.1, Gram-Schmidt orthogonalization). We compute the orthonormal basis u1, . . . , um of the preceding theorem by performing the following steps for j = 1, 2, . . . , m.
1. Let
vj⊥ = vj − projVj−1 (vj) = vj − (u1 · vj)u1 − · · · − (uj−1 · vj)uj−1.
2. Let
uj = (1/||vj⊥||) vj⊥.
Note that v1⊥ = v1 − projV0 (v1) = v1 − 0 = v1.
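A direct transcription of Procedure 138 into code (a minimal sketch, not from the notes; NumPy is assumed, and in practice a library QR routine or the modified Gram-Schmidt variant is numerically preferable):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (Procedure 138)."""
    basis = []
    for v in vectors:
        # v_perp = v - proj onto the span of the vectors processed so far
        v_perp = v - sum((u @ v) * u for u in basis)
        basis.append(v_perp / np.linalg.norm(v_perp))
    return basis

v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])
u1, u2 = gram_schmidt([v1, v2])
print(np.isclose(u1 @ u2, 0.0), np.isclose(u1 @ u1, 1.0), np.isclose(u2 @ u2, 1.0))
```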
Note 139. By Theorem 117, the change of basis matrix SB→C from the original basis B = (v1, . . . , vm) to the orthonormal basis C = (u1, . . . , um) satisfies the equation SB = SC SB→C, or
[ v1 · · · vm ] = [ u1 · · · um ] SB→C.
These matrices are customarily named M, Q, and R, respectively; the preceding equation M = QR is called the QR factorization of M.
[Diagram omitted: the triangle [x]_B → [x]_C via S_{B→C}, factoring through x via S_B and S_C, shown twice, the second time relabeled with M, Q, and R.]
Theorem 140 (5.2.2, QR factorization). Let M be an n × m matrix with linearly independent columns
v1 , . . . , vm . Then there exists an n × m matrix Q with orthonormal columns u1 , . . . , um and an upper
triangular matrix R with positive diagonal entries such that
M = QR.
This representation is unique, and rij = ui · vj for i ≤ j (the diagonal entries can alternately be written in the form rjj = ||vj⊥||):
R = [ ||v1⊥||   u1 · v2   · · ·   u1 · vm ;
        0       ||v2⊥||   · · ·   u2 · vm ;
        . . .                              ;
        0          0      · · ·   ||vm⊥|| ].
Proof. We obtain u1, . . . , um using Gram-Schmidt orthogonalization. To find the jth column of R, we
express vj as a linear combination of u1 , . . . , uj :
vj = projVj−1 (vj) + vj⊥ = (u1 · vj) u1 + · · · + (uj−1 · vj) uj−1 + ||vj⊥|| uj,
so that r1j = u1 · vj, . . . , rj−1,j = uj−1 · vj, and rjj = ||vj⊥||.
The diagonal entry rjj of R can be alternately expressed by taking the dot product of uj with each side of ||vj⊥|| uj = vj⊥ to get
||vj⊥|| = uj · vj⊥
        = uj · [vj − (u1 · vj)u1 − · · · − (uj−1 · vj)uj−1]
        = uj · vj.
The uniqueness of the factorization is an exercise in the text.
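A numerical sketch of Theorem 140 (not from the notes; NumPy is assumed, and the matrix M is an arbitrary choice). Note that np.linalg.qr may return R with negative diagonal entries, so the signs are flipped below to match the convention rjj > 0:

```python
import numpy as np

M = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

Q, R = np.linalg.qr(M)            # reduced QR: Q is 3x2 with orthonormal columns, R is 2x2
signs = np.sign(np.diag(R))       # flip signs so the diagonal of R is positive
Q, R = Q * signs, signs[:, None] * R

print(np.allclose(Q @ R, M))            # True: M = QR
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: columns of Q are orthonormal
print(R)                                # upper triangular with positive diagonal
```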
5.3 Orthogonal Transformations and Orthogonal Matrices
Definition 141 (5.3.1). A linear transformation T : Rn → Rn is called orthogonal if it preserves the
length of vectors:
||T (x)|| = ||x|| , for all x ∈ Rn .
If T (x) = Ax is an orthogonal transformation, we say that A is an orthogonal matrix.
Theorem 142 (5.3.2, orthogonal transformations preserve orthogonality). Let T : Rn → Rn be an
orthogonal linear transformation. If v, w ∈ Rn are orthogonal, then so are T (v), T (w).
Proof. We compute:
||T(v) + T(w)||² = ||T(v + w)||²
                 = ||v + w||²
                 = ||v||² + ||w||²
                 = ||T(v)||² + ||T(w)||²,
so T (v) is orthogonal to T (w) by the Pythagorean Theorem.
Note 143. In fact, orthogonal transformations preserve all angles, not just right angles: the angle between
two nonzero vectors v, w ∈ Rn equals the angle between T (v), T (w). This is a homework problem.
Theorem 144 (orthogonal transformations preserve the dot product). A linear transformation
T : Rn → Rn is orthogonal if and only if T preserves the dot product:
v · w = T (v) · T (w)
for all v, w ∈ Rn .
Proof. Suppose T is orthogonal. Then T preserves the length of v + w, so
||T(v + w)||² = ||v + w||²
(T(v) + T(w)) · (T(v) + T(w)) = (v + w) · (v + w)
T(v) · T(v) + 2 T(v) · T(w) + T(w) · T(w) = v · v + 2 v · w + w · w
||T(v)||² + 2 T(v) · T(w) + ||T(w)||² = ||v||² + 2 v · w + ||w||²
2 T(v) · T(w) = 2 v · w
T(v) · T(w) = v · w,
where we have used that ||T(v)|| = ||v|| and ||T(w)|| = ||w||.
Conversely, suppose T preserves the dot product. Then
||T(v)||² = T(v) · T(v) = v · v = ||v||²,
so ||T (v)|| = ||v||, and T is orthogonal.
Theorem 145 (5.3.3, orthogonal matrices and orthonormal bases). An n × n matrix A is orthogonal
if and only if its columns form an orthonormal basis of Rn .
Proof. Define T(x) = Ax and recall that
A = [ T(e1) · · · T(en) ].
Suppose A, and hence T , are orthogonal. Because e1 , . . . , en are orthonormal, their images T (e1 ), . . . , T (en )
are also orthonormal, since T preserves length and orthogonality. By Theorem 126, T (e1 ), . . . , T (en ) are
linearly independent. Since dim(Rn ) = n, the columns T (e1 ), . . . , T (en ) of A form an (orthonormal) basis
of Rn .
Conversely, suppose T (e1 ), . . . , T (en ) form an orthonormal basis of Rn . Then for any x = x1 e1 + · · · +
xn en ∈ Rn ,
||T(x)||² = ||x1 T(e1) + · · · + xn T(en)||²
          = ||x1 T(e1)||² + · · · + ||xn T(en)||²
          = (|x1| ||T(e1)||)² + · · · + (|xn| ||T(en)||)²
          = x1² + · · · + xn²
          = ||x||²,
where the second equals sign follows from the Pythagorean Theorem. Then ||T (x)|| = ||x|| and T and A are
orthogonal.
Theorem 146 (5.3.4, products and inverses of orthogonal matrices).
a) The product AB of two orthogonal n × n matrices A and B is orthogonal.
b) The inverse A−1 of an orthogonal n × n matrix A is orthogonal.
Proof.
a) The linear transformation T (x) = (AB)x is orthogonal because
||T (x)|| = ||A(Bx)|| = ||Bx|| = ||x|| .
b) The linear transformation T (x) = A−1 x is orthogonal because
||T(x)|| = ||A^{-1}x|| = ||A(A^{-1}x)|| = ||Ix|| = ||x||.
Definition 147 (5.3.5). For an m × n matrix A, the transpose AT of A is the n × m matrix whose ijth
entry is the jith entry of A:
[AT ]ij = Aji .
The rows of A become the columns of AT , and the columns of A become the rows of AT .
A square matrix A is symmetric if AT = A and skew-symmetric if AT = −A.
Note 148 (5.3.6). If v and w are two (column) vectors in Rn , then
v · w = vT w.
(Here we choose to ignore the difference between a scalar a and the 1 × 1 matrix [a].)
Theorem 149 (5.3.7, transpose criterion for orthogonal matrices). An n × n matrix A is orthogonal
if and only if AT A = In or, equivalently, if A has inverse A−1 = AT .
Proof. If
A = [ v1 v2 · · · vn ],
then A^TA is the n × n matrix whose ijth entry is vi^T vj = vi · vj:
A^TA = [ v1 · v1   v1 · v2   · · ·   v1 · vn ;
         v2 · v1   v2 · v2   · · ·   v2 · vn ;
         . . .                               ;
         vn · v1   vn · v2   · · ·   vn · vn ].
This product equals In if and only if the columns of A are orthonormal, which is equivalent to A being an
orthogonal matrix.
If A is orthogonal, it is also invertible, since its columns form an (orthonormal) basis of Rn . Thus
A^TA = In is equivalent to A^{-1} = A^T by simple matrix algebra.
Theorem 150 (5.3.8, summary: orthogonal matrices). For an n×n matrix A, the following statements
are equivalent:
1. A is an orthogonal matrix.
2. ||Ax|| = ||x|| for all x ∈ Rn .
3. The columns of A form an orthonormal basis of Rn .
4. AT A = In .
5. A−1 = AT .
Proof. See Definition 141 and Theorems 145 and 149.
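A numerical illustration of the equivalences in Theorem 150 (not from the notes; NumPy is assumed, and the rotation angle is an arbitrary choice):

```python
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation matrix

x = np.array([3.0, -2.0])
print(np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x)))  # ||Ax|| = ||x||
print(np.allclose(A.T @ A, np.eye(2)))                       # A^T A = I
print(np.allclose(np.linalg.inv(A), A.T))                    # A^{-1} = A^T
```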
Theorem 151 (5.3.9, properties of the transpose).
a) If A is an n × p matrix and B a p × m matrix (so that AB is defined), then
(AB)T = B T AT .
b) If an n × n matrix A is invertible, then so is AT , and
(AT )−1 = (A−1 )T .
c) For any matrix A,
rank(A) = rank(AT ).
Proof.
a) We check that the ijth entries of the two matrices are equal:
[(AB)T ]ij = [AB]ji = (jth row of A) · (ith column of B),
[B T AT ]ij = (ith row of B T ) · (jth column of AT ) = (ith column of B) · (jth row of A).
b) Taking the transpose of both sides of AA−1 = I and using part (a), we get
(AA−1 )T = (A−1 )T AT = I.
Similarly, A−1 A = I implies
(A−1 A)T = AT (A−1 )T = I.
We conclude that the inverse of AT is (A−1 )T .
c) Suppose A has n columns. Since the vectors in the kernel of A are precisely those vectors in Rn which
are orthogonal to all of the rows of A, and hence to the span of the rows of A,
span(rows of A) = (ker A)⊥ .
By the Rank-Nullity Theorem, together with its corollary in Theorem 132 part (c),
rank(AT ) = dim(im AT )
= dim(span(columns of AT ))
= dim(span(rows of A))
= dim((ker A)⊥ )
= n − dim(ker A)
= dim(im A)
= rank(A).

Theorem 152 (invertibility criteria involving rows). For an n × n matrix A with rows w1, . . . , wn, the following are equivalent:
1. A is invertible,
2. w1 , . . . , wn span Rn ,
3. w1 , . . . , wn are linearly independent,
4. w1 , . . . , wn form a basis of Rn .
Proof. By the preceding theorem, A is invertible if and only if AT is invertible. Statements 2-4 are just the
last three invertibility criteria in Theorem 80 applied to AT , since the columns of AT are the rows of A.
Theorem 153 (column-row definition of matrix multiplication). Given an n × m matrix A = [ v1 · · · vm ] with columns v1, . . . , vm ∈ Rn and an m × n matrix B with rows w1, . . . , wm (each wi ∈ Rn), think of the vi as n × 1 matrices and the wi as 1 × n matrices. Then the product of A and B can be computed as a sum of m matrices of size n × n:
AB = v1w1 + · · · + vmwm = Σ_{i=1}^{m} vi wi.
Proof.
[v1 w1 + · · · + vm wm ]ij = [v1 w1 ]ij + · · · + [vm wm ]ij
= [v1 ]i [w1 ]j + · · · + [vm ]i [wm ]j
= Ai1 B1j + · · · + Aim Bmj
= [AB]ij
Theorem 154 (5.3.10, matrix of an orthogonal projection). Let V be a subspace of Rn with orthonormal basis u1, . . . , um. Then the matrix of the orthogonal projection onto V is
QQ^T, where Q = [ u1 · · · um ].
Proof. We know from Theorem 127 that, for x ∈ Rn ,
projV (x) = (u1 · x)u1 + · · · + (um · x)um .
If we view the vector ui as an n × 1 matrix and the scalar ui · x as a 1 × 1 matrix, we can write
projV (x) = u1(u1 · x) + · · · + um(um · x)
          = u1(u1^T x) + · · · + um(um^T x)
          = (u1u1^T + · · · + umum^T) x
          = [ u1 · · · um ] [ u1^T ; . . . ; um^T ] x
          = QQ^T x.
The second to last equals sign follows from the preceding theorem.
5.4 Least Squares and Data Fitting
Theorem 155 (5.4.1). For any matrix A,
(im A)⊥ = ker(AT ).
Proof. Let
A = [ v1 · · · vm ],
so that A^T is the matrix with rows v1^T, . . . , vm^T,
and recall that im(A) = span(v1 , . . . , vm ). Then
(im A)⊥ = {x ∈ Rn : v · x = 0 for all v ∈ im(A)}
= {x ∈ Rn : vi · x = 0 for i = 1, . . . , m}
= ker(AT ).
Theorem 156 (5.4.2).
a) If A is an n × m matrix, then
ker(A) = ker(AT A).
b) If A is an n × m matrix with ker(A) = {0}, then AT A is invertible.
Proof.
a) If Ax = 0, then A^TAx = 0, so ker(A) ⊂ ker(A^TA).
Conversely, if A^TAx = 0, then Ax ∈ ker(A^T) and Ax ∈ im(A) = (ker(A^T))⊥ by Theorem 155, so Ax = 0 by Theorem 132 part (b). Thus ker(A^TA) ⊂ ker(A).
b) Since AT A is an m × m matrix and, by part (a), ker(AT A) = {0}, AT A is invertible by Theorem 47.
Theorem 157 (5.4.3, alternative characterization of orthogonal projection). Given a vector x ∈ Rn
and a subspace V ⊂ Rn , the orthogonal projection projV (x) is the vector in V closest to x, i.e.,
||x − projV (x)|| < ||x − v||
for all v ∈ V not equal to projV (x).
Proof. Note that x − projV (x) = x⊥ ∈ V ⊥ , while projV (x) − v ∈ V , so the two vectors are orthogonal. We
can therefore apply the Pythagorean Theorem:
||x − projV (x)||² + ||projV (x) − v||² = ||x − projV (x) + projV (x) − v||² = ||x − v||².
This implies that ||x − projV (x)||² < ||x − v||² unless ||projV (x) − v||² = 0, i.e. projV (x) = v.
Definition 158 (5.4.4). Let A be an n × m matrix. Then a vector x∗ ∈ Rm is called a least-squares
solution of the system Ax = b if the distance between Ax∗ and b is as small as possible:
||b − Ax∗ || ≤ ||b − Ax||
for all x ∈ Rm .
Note 159. The vector x∗ is called a “least-squares solution” because it minimizes the sum of the squares
of the components of the “error” vector b − Ax. If the system Ax = b is consistent, then the least-squares
solutions x∗ are just the exact solutions, so that the error b − Ax∗ = 0.
Theorem 160 (5.4.5, the normal equation). The least-squares solutions of the system
Ax = b
are the exact solutions of the (consistent) system
AT Ax = AT b,
which is called the normal equation of Ax = b.
Proof. We have the following chain of equivalent statements:
The vector x∗ is a least-squares solution of the system Ax = b
⇐⇒ ||b − Ax∗ || ≤ ||b − Ax||
for all x ∈ Rm
⇐⇒ Ax∗ = proj(im A) (b)
⇐⇒ b − Ax∗ ∈ (im A)⊥ = ker(AT )
⇐⇒ AT (b − Ax∗ ) = 0
⇐⇒ AT Ax∗ = AT b.
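A sketch of the normal equation in use (not from the notes; NumPy is assumed and the data points are arbitrary): we fit a line y = c0 + c1 t to a few points and compare with the library least-squares routine.

```python
import numpy as np

# Hypothetical data points (t_i, y_i) to be fit by y = c0 + c1*t.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])

A = np.column_stack([np.ones_like(t), t])    # design matrix
b = y

# Normal equation (Theorem 160): A^T A x = A^T b
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Same answer from the library least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_lstsq))   # True
print(x_star)                          # [intercept, slope]
```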
Theorem 161 (5.4.6, unique least-squares solution). If ker(A) = 0, then the linear system Ax = b has
the unique least-squares solution
x∗ = (AT A)−1 AT b.
Proof. By Theorem 156 part (b), the matrix AT A is invertible. Multiplying each side of the normal equation
on the left by (AT A)−1 , we obtain the result.
Theorem 162 (5.4.7, matrix of an orthogonal projection). Let v1 , . . . , vm be any basis of a subspace
V ⊂ Rn, and set
A = [ v1 · · · vm ].
Then the matrix of the orthogonal projection onto V is
A(A^TA)^{-1}A^T.
Proof. Let b be any vector in Rn . If x∗ is a least-squares solution of Ax = b, then Ax∗ is the projection
onto V = im(A) of b. Since the columns of A are linearly independent, ker(A) = {0}, so we have the unique
least-squares solution
x∗ = (AT A)−1 AT b.
Multiplying each side by A on the left, we get
projV (b) = Ax∗ = A(AT A)−1 AT b.
Note 163. If v1 , . . . , vm form an orthonormal basis, then AT A = Im and the formula for the matrix of an
orthogonal projection simplifies to AAT , as in Theorem 154.
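A sketch comparing the two projection-matrix formulas, A(A^TA)^{-1}A^T from Theorem 162 and QQ^T from Theorem 154 (not from the notes; NumPy is assumed and the basis below is an arbitrary, non-orthonormal choice):

```python
import numpy as np

# V = span of the columns of A (an arbitrary basis, not orthonormal).
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 2.0]])

P1 = A @ np.linalg.inv(A.T @ A) @ A.T   # A (A^T A)^{-1} A^T

Q, _ = np.linalg.qr(A)                  # orthonormal basis of V = im(A)
P2 = Q @ Q.T                            # Q Q^T

print(np.allclose(P1, P2))              # True: both give proj_V
print(np.allclose(P1 @ P1, P1))         # projecting twice changes nothing
```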
5.5 Inner Product Spaces
Definition 164 (5.5.1). An inner product on a linear space V is a rule that assigns a scalar, denoted ⟨f, g⟩, to any pair f, g of elements of V, such that the following properties hold for all f, g, h ∈ V and all c ∈ R:
1. symmetry: ⟨f, g⟩ = ⟨g, f⟩
2. preserves addition: ⟨f + h, g⟩ = ⟨f, g⟩ + ⟨h, g⟩
3. preserves scalar multiplication: ⟨cf, g⟩ = c⟨f, g⟩
4. positive definiteness: ⟨f, f⟩ > 0 for all nonzero f ∈ V.
A linear space endowed with an inner product is called an inner product space.
Note 165. Properties 2 and 3 state that, for a fixed g ∈ V, the transformation T : V → R given by T(f) = ⟨f, g⟩ is linear.
Definition 166. We list some examples of inner product spaces:
• Let C[a, b] be the linear space of continuous functions from the interval [a, b] to R. Then
⟨f, g⟩ = ∫_a^b f(t)g(t) dt
defines an inner product on C[a, b].
• Let ℓ² be the linear space of all “square-summable” infinite sequences, i.e., sequences
x = (x0, x1, x2, . . . , xn, . . .)
such that Σ_{i=0}^{∞} xi² = x0² + x1² + · · · converges. Then
⟨x, y⟩ = Σ_{i=0}^{∞} xi yi = x0y0 + x1y1 + · · ·
defines an inner product on ℓ².
• The trace of a square matrix A, denoted tr(A), is the sum of its diagonal entries. An inner product on the linear space R^{n×m} of all n × m matrices is given by
⟨A, B⟩ = tr(A^T B).
Definition 167 (5.5.2).
• The norm (or magnitude) of an element f of an inner product space is ||f|| = √⟨f, f⟩.
• Two elements f, g of an inner product space are called orthogonal (or perpendicular) if ⟨f, g⟩ = 0.
• The distance between two elements of an inner product space is defined to be the norm of their difference:
dist(f, g) = ||f − g||.
• The angle θ between two elements f, g of an inner product space is defined by the formula
θ = arccos( ⟨f, g⟩ / (||f|| ||g||) ).
Theorem 168 (5.5.3, orthogonal projection). If V is an inner product space with finite dimensional subspace W, then the orthogonal projection projW (f) of an element f ∈ V onto W is defined to be the unique element of W such that f − projW (f) is orthogonal to W. Alternately, projW (f) is the element g of W which minimizes the distance
dist(f, g) = ||f − g||.
If g1, . . . , gm is an orthonormal basis of W, then
projW (f) = ⟨g1, f⟩ g1 + · · · + ⟨gm, f⟩ gm
for all f ∈ V.
Definition 169. Consider the inner product
⟨f, g⟩ = (1/π) ∫_{−π}^{π} f(t)g(t) dt
on the linear space C[−π, π] of continuous functions on the interval [−π, π]. For each positive integer n,
define the subspace Tn of C[−π, π] to be
Tn = span(1, sin(t), cos(t), sin(2t), cos(2t), . . . , sin(nt), cos(nt)).
Then Tn consists of all functions of the form
f (t) = a + b1 sin(t) + c1 cos(t) + · · · + bn sin(nt) + cn cos(nt),
called trigonometric polynomials of order ≤ n.
Theorem 170 (5.5.4, an orthonormal basis of Tn ). The functions
1/√2, sin(t), cos(t), sin(2t), cos(2t), . . . , sin(nt), cos(nt)
form an orthonormal basis of Tn .
Proof. By the “Euler identities,” we obtain
⟨sin(pt), cos(mt)⟩ = (1/π) ∫_{−π}^{π} sin(pt) cos(mt) dt = 0, for all integers p, m,
⟨sin(pt), sin(mt)⟩ = (1/π) ∫_{−π}^{π} sin(pt) sin(mt) dt = 0, for distinct integers p, m,
⟨cos(pt), cos(mt)⟩ = (1/π) ∫_{−π}^{π} cos(pt) cos(mt) dt = 0, for distinct integers p, m.
(Note that 1 = cos(0t).) Thus the functions 1, sin(t), cos(t), sin(2t), cos(2t), . . . , sin(nt), cos(nt) are orthogonal to each other, and hence linearly independent. Since they clearly span Tn, they form a basis for Tn.
To obtain an orthonormal basis for Tn, we normalize the vectors. Since
||1|| = √( (1/π) ∫_{−π}^{π} 1 dt ) = √2,
||sin(mt)|| = √( (1/π) ∫_{−π}^{π} sin²(mt) dt ) = 1,
||cos(mt)|| = √( (1/π) ∫_{−π}^{π} cos²(mt) dt ) = 1,
we need only replace the function 1 by 1/||1|| = 1/√2.
Theorem 171 (5.5.5, Fourier coefficients). The best approximation (in the “continuous least-squares” sense) of f ∈ C[−π, π] by a function in the subspace Tn is
fn(t) = projTn f(t) = a0 (1/√2) + b1 sin(t) + c1 cos(t) + · · · + bn sin(nt) + cn cos(nt),
where
a0 = ⟨f(t), 1/√2⟩ = (1/(√2 π)) ∫_{−π}^{π} f(t) dt,
bk = ⟨f(t), sin(kt)⟩ = (1/π) ∫_{−π}^{π} f(t) sin(kt) dt,
ck = ⟨f(t), cos(kt)⟩ = (1/π) ∫_{−π}^{π} f(t) cos(kt) dt.
The a0 , bk , ck are called the Fourier coefficients of f , and the function fn is called the nth-order Fourier
approximation of f .
Proof. This is a direct application of the formula in Theorem 168 for the orthogonal projection of f ∈
C[−π, π] onto the subspace Tn ⊂ C[−π, π].
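A numerical sketch of Theorem 171 for the (arbitrary) choice f(t) = t (not from the notes; NumPy is assumed, and the inner-product integrals are approximated by the trapezoid rule on a fine grid):

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 20001)
f = t                                    # the function f(t) = t

def inner(g, h):
    # <g, h> = (1/pi) * integral of g*h over [-pi, pi], trapezoid rule
    y = g * h
    dt = t[1] - t[0]
    return (np.sum(y) - 0.5 * (y[0] + y[-1])) * dt / np.pi

n = 3
a0 = inner(f, np.full_like(t, 1.0 / np.sqrt(2)))
b = [inner(f, np.sin(k * t)) for k in range(1, n + 1)]
c = [inner(f, np.cos(k * t)) for k in range(1, n + 1)]

print(np.round(b, 4))   # approximately [2, -1, 0.6667], i.e. b_k = 2(-1)^(k+1)/k
print(np.round(c, 4))   # approximately 0
print(np.round(a0, 4))  # approximately 0
```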
Note 172. By the Pythagorean Theorem,
||fn||² = ||a0 (1/√2)||² + ||b1 sin(t)||² + ||c1 cos(t)||² + · · · + ||bn sin(nt)||² + ||cn cos(nt)||²
        = a0² + b1² + c1² + · · · + bn² + cn².
Theorem 173 (5.5.6, behavior of fn as n → ∞). As we take higher and higher order approximations fn of a function f ∈ C[−π, π], the error approaches zero:
lim_{n→∞} ||f − fn|| = 0.
Thus
lim_{n→∞} ||fn|| = ||f||,
and, combining this fact with the preceding note,
a0² + b1² + c1² + · · · + bn² + cn² + · · · = ||f||².
Proof. The first equality is proven using advanced calculus. The second one follows from the first, since
||f − fn||² + ||fn||² = ||f||²
by the Pythagorean Theorem. For the final equality, we use the preceding note to substitute for ||fn||² in
lim_{n→∞} ||fn||² = ||f||².
Note 174. Applying the final equation of Theorem 173 to the function f(t) = t in C[−π, π], we obtain
4 + 4/4 + 4/9 + · · · + 4/n² + · · · = ||t||² = (1/π) ∫_{−π}^{π} t² dt = (2/3)π²,
or
Σ_{n=1}^{∞} 1/n² = 1 + 1/4 + 1/9 + 1/16 + · · · = π²/6,
i.e.,
π = √( 6 Σ_{n=1}^{∞} 1/n² ).
6 Determinants
6.1 Introduction to Determinants
Note 175. A 2 × 2 matrix
A = [ a b ; c d ]
is invertible if and only if det(A) = ad − bc ≠ 0. The geometric reason for this is that (the absolute value of) the determinant measures the area of the parallelogram spanned by the columns of A. In particular,
area of the parallelogram spanned by (a, c) and (b, d) = || (a, c, 0) × (b, d, 0) || = || (0, 0, ad − bc) || = |ad − bc| = |det(A)|.
Thus, we have the following chain of equivalent statements:
A is invertible ⇐⇒ the columns (a, c) and (b, d) of A are linearly independent
⇐⇒ the area of the parallelogram spanned by (a, c) and (b, d) is nonzero
⇐⇒ det(A) = ad − bc ≠ 0.
Definition 176 (6.1.1). Consider the 3 × 3 matrix
A = [ u v w ] = [ u1 v1 w1 ; u2 v2 w2 ; u3 v3 w3 ].
We define the determinant of A to be the volume of the parallelepiped spanned by the column vectors
u, v, w of A, namely
det(A) = u · (v × w),
also known as the “triple product” of u, v, w. Then, as in the 2 × 2 case,
A is invertible ⇐⇒ the columns u, v, w of A are linearly independent
⇐⇒ the volume of the parallelepiped spanned by u, v, w is nonzero
⇐⇒ det(A) ≠ 0.
In terms of the entries of A,
det(A) = u · (v × w) = (u1, u2, u3) · (v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1)
       = u1(v2w3 − v3w2) + u2(v3w1 − v1w3) + u3(v1w2 − v2w1)
       = u1v2w3 − u1v3w2 + u2v3w1 − u2v1w3 + u3v1w2 − u3v2w1.
Note 177. In the final expression above for det(A), note that each term contains exactly one entry from
each row and each column of A. We have written the terms so that u, v, w always occur in the same order;
only the indices change. In fact, the indices occur once in each of the 3! = 6 possible permutations.
The sign on each term in the determinant formula is determined by how many pairs of indices are “out of
order,” or “inverted,” in the corresponding permutation of the numbers 1, 2, 3. If the number of “inversions”
is even, then the sign of the term is positive, and if odd, then negative.
For example, in the permutation 3, 1, 2, the pair of numbers 1, 3 is inverted, as is the pair 2, 3. In terms
of the entries of the matrix A, this can be visualized by noting that v1 and w2 are both above and to the
right of u3 . Since the number of inversions is even, the term u3 v1 w2 occurs with a positive sign.
Armed with this insight, we can define the determinant of a general n × n matrix.
Definition 178 (6.1.3).
• A pattern in an n × n matrix A is a choice of n entries of the matrix so that one entry is chosen in
each row and in each column of A. The product of the entries in a pattern P is denoted prod P .
• Two entries in a pattern are said to be inverted if one of them is above and to the right of the other
in the matrix.
• The signature of a pattern P is
sgn P = (−1)number of inversions in P .
• The determinant of A is then defined to be
det A = Σ_P (sgn P)(prod P),
where the sum is taken over all n! patterns P in the matrix.
Note 179. If we separate the positive terms from the negative terms, we can write:
det A = ( Σ_{patterns P with an even # of inversions} prod P ) − ( Σ_{patterns P with an odd # of inversions} prod P ).
Note 180. For a 2 × 2 matrix
A = [ a b ; c d ],
there are two patterns, with products ad and bc. They have 0 and 1 inversions, respectively, so det A = (−1)^0 ad + (−1)^1 bc = ad − bc, as expected.
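Definition 178 can be transcribed directly into code (a brute-force sketch, not from the notes; NumPy and itertools are assumed, and the sum over all n! permutations is only practical for small n):

```python
import numpy as np
from itertools import permutations

def det_by_patterns(A):
    """Determinant as a signed sum over patterns (Definition 178)."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):          # perm[i] = column chosen in row i
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])   # entry i is above and to the right of entry j
        prod = np.prod([A[i, perm[i]] for i in range(n)])
        total += (-1) ** inversions * prod
    return total

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
print(det_by_patterns(A), np.linalg.det(A))   # both -3 (up to rounding)
```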
Theorem 181 (6.1.4, determinant of a triangular matrix). The determinant of an upper or lower
triangular matrix is the product of the diagonal entries of the matrix. In particular, the determinant of a
diagonal matrix is the product of its diagonal entries.
Proof. For an upper triangular n × n matrix A, a pattern with nonzero product must contain a11 , and thus
a22 , . . . , and finally ann , so there is only one pattern with potentially nonzero product. This diagonal
pattern has no inversions, so its product is equal to det A.
The proof for a lower triangular matrix is analogous, and a diagonal matrix is a special case.
6.2 Properties of the Determinant
Theorem 182 (6.2.1, determinant of the transpose). For any square matrix A,
det(AT ) = det A.
Proof. Every pattern in A corresponds to a (transposed) pattern in AT with the same product and number
of inversions. Thus the determinants are equal.
Theorem 183 (6.2.2, linearity of the determinant in each row and column). Let w1, . . . , wi−1, wi+1, . . . , wn ∈ Rn be fixed row vectors. Then the function T : R^{1×n} → R given by
T(x) = det(the matrix with rows w1, . . . , wi−1, x, wi+1, . . . , wn)
is a linear transformation. We say that the determinant is linear in the ith row.
Similarly, let v1, . . . , vi−1, vi+1, . . . , vn ∈ Rn be fixed column vectors. Then the function T : R^{n×1} → R given by
T(x) = det[ v1 · · · vi−1 x vi+1 · · · vn ]
is a linear transformation. We say that the determinant is linear in the ith column.
Proof. The product prod P of a pattern P is linear in each row and column because it contains exactly one
factor from each row and one from each column. Since the determinant is a linear combination of pattern
products, it is linear in each row and column as well.
Note 184. The preceding theorem states that T(x + y) = T(x) + T(y) and T(kx) = kT(x), or, for linearity in a row,
det(rows: v1, . . . , x + y, . . . , vn) = det(rows: v1, . . . , x, . . . , vn) + det(rows: v1, . . . , y, . . . , vn)
and
det(rows: v1, . . . , kx, . . . , vn) = k det(rows: v1, . . . , x, . . . , vn).
Theorem 185 (6.2.3, elementary row operations and determinants). The elementary row operations
have the following effects on the determinant of a matrix.
a) If B is obtained from A by a row swap, then
det B = − det A.
b) If B is obtained from A by dividing a row of A by a scalar k, then
det B = (1/k) det A.
c) If B is obtained from A by adding a multiple of a row of A to another row, then
det B = det A.
Proof.
a) Row swap: Each pattern P in A corresponds to a pattern Pswap in B involving the same numbers.
If adjacent rows are swapped, then the number of inversions changes by exactly 1. Swapping any
two rows amounts to an odd number of swaps of adjacent rows, so the total change in the number of
inversions is odd. Thus sgn Pswap = − sgn P for each pattern P of A, which implies
det B = Σ_P (sgn P_swap)(prod P_swap) = Σ_P (−sgn P)(prod P) = −Σ_P (sgn P)(prod P) = −det A.
b) Row division: This follows immediately from linearity of the determinant in each row.
c) Row addition: Suppose B is obtained by adding k times the ith row of A to the jth row of A. Then, by linearity of the determinant in the jth row,
det B = det(rows: . . . , vi, . . . , vj + kvi, . . .)
      = det(rows: . . . , vi, . . . , vj, . . .) + k det(rows: . . . , vi, . . . , vi, . . .)
      = det A + k det C,
where C is the matrix whose jth row has been replaced by vi, so that C has two equal rows. If we swap those two rows, the result is again C, so that det C = −det C (by part (a)) and det C = 0. Thus det B = det A.
Procedure 186 (6.2.4, using ERO’s to compute the determinant). Use ERO’s to reduce the matrix
A to a matrix B for which the determinant is known (for example, use GJE to obtain B = rref(A)). If you
have swapped rows s times and divided rows by the scalars k1 , k2 , . . . , kr to get from A to B, then
det A = (−1)s k1 k2 · · · kr (det B).
Note 187. Since det(A) = det(AT ), elementary column operations (ECO’s) may also be used to
compute the determinant. This is because performing an ECO on A is equivalent to first applying the
corresponding ERO to AT , and then taking the transpose once again.
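Procedure 186 can be sketched in code (an illustrative toy, not from the notes; NumPy is assumed, and partial pivoting by row swaps is used). It tracks the sign flips from swaps and the scalars from row divisions:

```python
import numpy as np

def det_by_row_reduction(A):
    """Reduce to an upper triangular matrix with unit diagonal, tracking swaps and divisions."""
    A = A.astype(float).copy()
    n = A.shape[0]
    sign, scale = 1.0, 1.0
    for j in range(n):
        pivot = np.argmax(np.abs(A[j:, j])) + j
        if np.isclose(A[pivot, j], 0.0):
            return 0.0                     # the cursor column is zero below: det = 0
        if pivot != j:
            A[[j, pivot]] = A[[pivot, j]]  # row swap flips the sign
            sign = -sign
        scale *= A[j, j]                   # dividing row j by its pivot
        A[j] = A[j] / A[j, j]
        for i in range(j + 1, n):          # row additions do not change the determinant
            A[i] -= A[i, j] * A[j]
    return sign * scale                    # det A = (-1)^s * k1 * ... * kn * det(reduced matrix)

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 0.0, 3.0]])
print(det_by_row_reduction(A), np.linalg.det(A))   # both -8 (up to rounding)
```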
Theorem 188 (6.2.6, determinant of a product). If A and B are n × n matrices, then
det(AB) = (det A)(det B).
Proof.
• Suppose first that A is not invertible. Then im(AB) ⊂ im(A) 6= Rn , so AB is also not invertible. Thus
(det A)(det B) = 0(det B) = 0 = det(AB).
• If A is invertible, we begin by showing that
rref[ A | AB ] = [ In | B ].
It is clear that rref[ A | AB ] = [ rref(A) | C ] = [ In | C ] for some matrix C. We can associate to [ A | AB ] a matrix equation AX = AB, where X is a variable n × n matrix. Multiplying each side by A^{-1}, we see that the unique solution is X = B. When we apply elementary row operations to [ A | AB ], the set of solutions of the corresponding matrix equation does not change. Thus the matrix equation In X = C also has unique solution X = B, and B = C as needed.
Suppose we swap rows s times and divide rows by k1, k2, . . . , kr in computing rref[ A | AB ]. Considering the left and right halves of [ A | AB ] separately, and using Procedure 186, we conclude that
det A = (−1)^s k1k2 · · · kr
and
det(AB) = (−1)^s k1k2 · · · kr (det B) = (det A)(det B).
Theorem 189 (6.2.7, determinants of similar matrices). If A is similar to B, then
det A = det B.
Proof. By definition, there exists an invertible matrix S such that AS = SB. By the preceding theorem,
(det A)(det S) = det(AS) = det(SB) = (det S)(det B).
Since S is invertible, det S 6= 0, so we can divide each side by it to obtain det A = det B.
Theorem 190 (6.2.8, determinant of an inverse). If A is an invertible matrix, then
det(A^{-1}) = 1/det A = (det A)^{-1}.
Proof. Taking the determinant of both sides of AA^{-1} = In, we get
det(A) det(A^{-1}) = det(AA^{-1}) = det(In) = 1.
We divide both sides by det A ≠ 0 to obtain the result.
Theorem 191 (6.2.10, Laplace expansion). For an n × n matrix A, let Aij be the matrix obtained by
omitting the ith row and the jth column of A. The determinant of the (n − 1) × (n − 1) matrix Aij is called
the ijth minor of A.
The determinant of A can be computed by Laplace expansion (or cofactor expansion)
• along the ith row:
det A = Σ_{j=1}^{n} (−1)^{i+j} aij det(Aij), or
• along the jth column:
det A = Σ_{i=1}^{n} (−1)^{i+j} aij det(Aij).
Definition 192 (6.2.11, determinant of a linear transformation).
• Let T : Rn → Rn be a linear transformation given by T (x) = Ax. Then the determinant of T is
defined to be equal to the determinant of A:
det T = det A.
• If V is a finite-dimensional vector space with basis B and T : V → V is a linear transformation, then
we define the determinant of T to be equal to the determinant of the B-matrix of T :
det T = det B.
If we pick a different basis C of V , then the C-matrix C of T is similar to B, so det C = det B, and
there is no ambiguity in the definition.
Note that if V = Rn , then A is the E-matrix of T , where E = {e1 , . . . , en } is the standard basis of Rn ,
so our two definitions agree.
6.3 Geometrical Interpretations of the Determinant; Cramer’s Rule
Theorem 193 (6.3.1, determinant of an orthogonal matrix). The determinant of an orthogonal matrix
is either 1 or −1.
Proof. If A is orthogonal, then AT A = I. Taking the determinant of both sides, we see that
(det A)² = det(A^T) det(A) = det(A^TA) = det(I) = 1,
so det A is either 1 or −1.
Definition 194 (6.3.2). An orthogonal matrix A with det A = 1 is called a rotation matrix, and the
linear transformation T (x) = Ax is called a rotation.
Theorem 195 (6.3.3, the determinant and Gram-Schmidt orthogonalization). If A is an n × n matrix with columns v1, v2, . . . , vn, then
|det A| = ||v1⊥|| ||v2⊥|| · · · ||vn⊥||,
where vk⊥ is the component of vk perpendicular to span(v1, . . . , vk−1).
Proof. If A is invertible, then by Theorem 140 we can write A = QR, where Q is an orthogonal matrix and R is an upper triangular matrix with diagonal entries rjj = ||vj⊥||. Thus
|det A| = |det(QR)| = |(det Q)(det R)| = |det Q| |det R| = (1)(r11r22 · · · rnn) = ||v1⊥|| ||v2⊥|| · · · ||vn⊥||.
If A is not invertible, then some vk is redundant in the list v1, . . . , vn, so vk⊥ = 0 and
||v1⊥|| ||v2⊥|| · · · ||vn⊥|| = 0 = |det A|.
Note 196. In the special case where A has orthogonal columns, the theorem says that
|det A| = ||v1 || ||v2 || · · · ||vn || .
Definition 197.
• The m-parallelepiped defined by the vectors v1 , . . . , vm ∈ Rn is the set of all vectors in Rn of the
form
c1 v1 + · · · + cm vm , where 0 ≤ ci ≤ 1.
A 2-parallelepiped is also called a parallelogram.
• The m-volume V(v1, . . . , vm) of this m-parallelepiped is defined to be
V(v1, . . . , vm) = ||v1⊥|| ||v2⊥|| · · · ||vm⊥||.
In the case m = n, this is just |det A|, where A is the square matrix with columns v1 , . . . , vn ∈ Rn .
Theorem 198 (6.3.6, volume of an m-parallelepiped in Rn). The m-volume of the vectors v1, . . . , vm ∈ Rn is
V(v1, . . . , vm) = √(det(A^TA)),
where A is the n × m matrix with columns v1, . . . , vm ∈ Rn.
Proof. If the columns of A are linearly independent, then consider the QR factorization A = QR. Since the columns of Q are orthonormal, Q^TQ = Im, so
A^TA = (QR)^T(QR) = R^TQ^TQR = R^TR,
and therefore
det(A^TA) = det(R^TR) = det(R^T) det(R) = (det R)² = (||v1⊥|| ||v2⊥|| · · · ||vm⊥||)² = (V(v1, . . . , vm))².
Note 199. If m = n in the preceding theorem, then the m-volume is
√(det(A^TA)) = √(det(A^T) det(A)) = √(det(A) det(A)) = |det(A)|,
as noted above.
Theorem 200 (6.3.7, expansion factor). Let T : Rn → Rn be a linear transformation. The image of the
n-parallelepiped Ω defined by vectors v1 , . . . , vn is equal to the n-parallelepiped T (Ω) defined by the vectors
T (v1 ), . . . , T (vn ).
The ratio between the n-volumes of T (Ω) and Ω, called the expansion factor of T , is just |det T |:
V (T (v1 ), . . . , T (vn )) = |det T | V (v1 , . . . , vn ).
Proof. The first statement follows from the linearity of T :
T (c1 v1 + · · · + cn vn ) = c1 T (v1 ) + · · · + cn T (vn ).
To compute the expansion factor, suppose T (x) = Ax, and let B be the matrix with columns v1 , . . . , vn .
Then AB has columns T (v1 ), . . . , T (vn ), so
V (T (v1 ), . . . , T (vn )) = |det(AB)| = |det A| |det B| = |det T | V (v1 , . . . , vn ).
Theorem 201 (6.3.8, Cramer’s Rule). Given a linear system Ax = b, with A invertible, define Ab,j
to be the matrix obtained by replacing the jth column of A by b. Then the components xj of the unique
solution vector x are
xj = det(A_{b,j}) / det A.
Proof. Write A in terms of its columns, as A = [ v1 · · · vj · · · vn ]. If x is the solution of the system Ax = b, then
det(A_{b,j}) = det[ v1 · · · b · · · vn ]
             = det[ v1 · · · Ax · · · vn ]
             = det[ v1 · · · (x1v1 + · · · + xjvj + · · · + xnvn) · · · vn ]
             = det[ v1 · · · xjvj · · · vn ]
             = xj det[ v1 · · · vj · · · vn ]
             = xj det A,
where the fourth equality uses linearity of the determinant in the jth column together with the fact that a determinant with two equal columns is zero.
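A sketch of Cramer's rule in code (not from the notes; NumPy is assumed, the system is an arbitrary example, and for large systems this is far slower than Gaussian elimination):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b for invertible A via Cramer's rule (Theorem 201)."""
    n = A.shape[0]
    det_A = np.linalg.det(A)
    x = np.empty(n)
    for j in range(n):
        A_bj = A.copy()
        A_bj[:, j] = b                      # replace the jth column of A by b
        x[j] = np.linalg.det(A_bj) / det_A
    return x

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
b = np.array([3.0, 8.0])
print(cramer_solve(A, b))       # [1. 1.]
print(np.linalg.solve(A, b))    # same solution
```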
Theorem 202 (6.3.9, adjoint and inverse of a matrix). Let A be an invertible n × n matrix. Define the classical adjoint adj(A) of A to be the n × n matrix whose ijth entry is (−1)^{i+j} det(Aji). Then
A^{-1} = (1/det A) adj(A).
Note 203. In the 2 × 2 case, if A = [ a b ; c d ], then we get the familiar formula
A^{-1} = (1/(ad − bc)) [ d −b ; −c a ].
7 Eigenvalues and Eigenvectors
7.1 Dynamical Systems and Eigenvectors: An Introductory Example
7.2 Finding the Eigenvalues of a Matrix
Definition 204 (7.1.1). Let A be an n × n matrix. A nonzero vector v ∈ Rn is called an eigenvector of
A if Av is a scalar multiple of v, i.e.,
Av = λv
for some scalar λ.
The scalar λ is called the eigenvalue of A associated with the eigenvector v. We sometimes call v a
λ-eigenvector.
Note 205. Eigenvalues may be 0, but eigenvectors may not be 0. Eigen is German for “proper” or
“characteristic.”
Theorem 206 (geometric interpretation). A vector v ∈ Rn is an eigenvector of an n×n matrix A if and
only if the line span(v) through the origin in Rn is mapped to itself by the linear transformation T (x) = Ax,
i.e.,
x ∈ span(v) =⇒ Ax ∈ span(v).
Proof. Suppose v is a λ-eigenvector of A. Any element of span(v) is equal to kv for some scalar k. We check
that A(kv) ∈ span(v):
A(kv) = k(Av) = k(λv) = (kλ)v.
Conversely, suppose the line span(v) is mapped to itself by T (x) = Ax. Since v ∈ span(v), we must have
Av ∈ span(v), so Av = λv for some scalar λ, which means that v is an eigenvector of A.
Theorem 207 (7.2.1, finding eigenvalues). A scalar λ is an eigenvalue of an n × n matrix A if and
only if
det(A − λIn ) = 0.
The expression fA (λ) = det(A − λIn ) is called the characteristic polynomial of A.
Proof. Note that
Av = λv ⇐⇒ Av − λv = 0
⇐⇒ Av − λ(In v) = 0
⇐⇒ (A − λIn )v = 0,
so that we have the following chain of equivalent statements:
λ is an eigenvalue of A ⇐⇒ There exists v ≠ 0 such that Av = λv
⇐⇒ There exists v ≠ 0 such that (A − λIn)v = 0
⇐⇒ ker(A − λIn) ≠ {0}
⇐⇒ A − λIn is not invertible
⇐⇒ det(A − λIn ) = 0.
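A numerical sketch of Theorem 207 (not from the notes; NumPy is assumed and the matrix is an arbitrary choice): the eigenvalues are exactly the roots of det(A − λI), and np.linalg.eigvals returns the same numbers.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Characteristic polynomial f_A(lam) = det(A - lam*I), evaluated at candidate eigenvalues.
f = lambda lam: np.linalg.det(A - lam * np.eye(2))
print(f(1.0), f(3.0))                 # both numerically 0: 1 and 3 are eigenvalues
print(np.sort(np.linalg.eigvals(A)))  # [1. 3.]
```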
Theorem 208 (7.2.2, eigenvalues of a triangular matrix). The eigenvalues of a triangular matrix are
its diagonal entries.
Proof. If A is an n × n triangular matrix, then so is A − λIn . The characteristic polynomial is therefore
det(A − λIn ) = (a11 − λ)(a22 − λ) · · · (ann − λ),
with roots a11 , a22 , . . . , ann .
Theorem 209 (7.2.5, characteristic polynomial). The characteristic polynomial fA(λ) = det(A − λIn) is a polynomial of degree n in the variable λ, of the form
fA(λ) = (−λ)^n + (tr A)(−λ)^{n−1} + · · · + det A.
Proof. We have
fA(λ) = det(A − λIn) = det [ a11 − λ   a12       · · ·   a1n ;
                             a21       a22 − λ   · · ·   a2n ;
                             . . .                           ;
                             an1       an2       · · ·   ann − λ ].
The product of each pattern is a product of scalars aij and entries of the form aii − λ, which is a polynomial
in λ with degree equal to the number of diagonal entries in the pattern. The determinant, as a sum of these
products (or their opposites) is a sum of polynomials, and hence a polynomial.
The diagonal pattern contributes the product
(a11 − λ)(a22 − λ) · · · (ann − λ) = (−λ)n + (a11 + a22 + · · · + ann )(−λ)n−1 + (lower degree terms)
= (−λ)n + (tr A)(−λ)n−1 + (lower degree terms).
Any other pattern involves at least two entries off the diagonal, so its product is of degree ≤ n − 2. Thus
the degree of fA (λ) is n, with the leading two terms as claimed.
The constant term is fA (0) = det(A).
Definition 210 (7.2.6). An eigenvalue λ0 of a square matrix A has algebraic multiplicity k if λ0 is a
root of multiplicity k of the characteristic polynomial fA (λ), meaning that we can write
fA(λ) = (λ0 − λ)^k g(λ)
for some polynomial g(λ) with g(λ0) ≠ 0. We write AM(λ0) = k.
Theorem 211 (7.2.7, number of eigenvalues). An n × n matrix A has at most n real eigenvalues, even
if they are counted with their algebraic multiplicities. If n is odd, then A has at least one real eigenvalue. In
summary,
1 ≤ Σ_{eigenvalues λ of A} AM(λ) ≤ n.
Proof. The sum of the algebraic multiplicities of the eigenvalues of A is just the number of linear factors in
the complete factorization of the characteristic polynomial fA (λ) (over the real numbers), which is clearly
≤ n.
If n is odd, then lim_{λ→−∞} fA(λ) = ∞ and lim_{λ→∞} fA(λ) = −∞.
Thus there is some negative number a with fA (a) > 0 and some positive number b with fA (b) < 0. By the
Intermediate Value Theorem, there exists a real number c between a and b such that fA (c) = 0, so that c is
an eigenvalue of A.
Theorem 212 (7.2.8, determinant and trace in terms of eigenvalues). If the characteristic polynomial of an n × n matrix A factors completely into linear factors, so that A has n eigenvalues λ1, λ2, . . . , λn (counted with their algebraic multiplicities), then
det A = λ1 λ2 · · · λn
and
tr A = λ1 + λ2 + · · · + λn .
Proof. Since the characteristic polynomial factors completely, it can be written
fA (λ) = det(A − λIn ) = (λ1 − λ)(λ2 − λ) · · · (λn − λ).
Substituting 0 for λ, we get
fA (0) = det(A) = λ1 λ2 · · · λn .
The trace result is an exercise.
7.3 Finding the Eigenvectors of a Matrix
Definition 213 (7.3.1). Let λ be an eigenvalue of an n × n matrix A. The λ-eigenspace of A, denoted
Eλ , is defined to be
Eλ = ker(A − λIn )
= {v ∈ Rn : Av = λv}
= {λ-eigenvectors of A} ∪ {0}.
Note 214. An eigenspace is a subspace, since it is the kernel of the matrix A − λIn . All of the nonzero
vectors in Eλ are λ-eigenvectors.
Definition 215 (7.3.2). The dimension of the λ-eigenspace Eλ = ker(A − λIn ) is called the geometric
multiplicity of λ, written GM (λ). We have
GM (λ) = dim(Eλ )
= dim(ker(A − λIn ))
= nullity(A − λIn )
= n − rank(A − λIn ).
Definition 216 (7.3.3). Let A be an n × n matrix. A basis of Rn consisting of eigenvectors of A is called
an eigenbasis for A.
Theorem 217 (eigenvectors with distinct eigenvalues are linearly independent).
Let A be a square matrix. If v1 , v2 , . . . , vs are eigenvectors of A with distinct eigenvalues, then
v1 , v2 , . . . , vs are linearly independent.
Proof. We use proof by contradiction. Suppose v1 , . . . , vs are linearly dependent, and let vm be the first
redundant vector in this list, with
vm = c1 v1 + · · · + cm−1 vm−1 .
Suppose Avi = λi vi . Since the eigenvector vm is not 0, there must be some nonzero coefficient ck . Multiplying the equation vm = c1 v1 + · · · + ck vk + · · · + cm−1 vm−1 by A, we get
Avm = A(c1 v1 + · · · + ck vk + · · · + cm−1 vm−1 )
Avm = c1 Av1 + · · · + ck Avk + · · · + cm−1 Avm−1
λm vm = c1 λ1 v1 + · · · + ck λk vk + · · · + cm−1 λm−1 vm−1 .
Multiplying the same equation instead by λm , we get
λm vm = c1 λm v1 + · · · + ck λm vk + · · · + cm−1 λm vm−1 ,
which, when subtracted from our result above, yields
0 = (λm − λm )vm = c1 (λ1 − λm )v1 + · · · + ck (λk − λm )vk + · · · + cm−1 (λm−1 − λm )vm−1 .
Since ck and λk − λm are nonzero, we have a nontrivial linear relation among the vectors v1 , . . . , vm−1 ,
contradicting the minimality of m.
Note 218. Part (a) of the following theorem is a generalization of the preceding theorem, allowing multiple
(linearly independent) eigenvectors with a single eigenvalue.
Theorem 219 (7.3.4, eigenbases and geometric multiplicities).
a) Let A be an n × n matrix. If we concatenate bases for each eigenspace of A, then the resulting eigenvectors v1 , . . . , vs will be linearly independent. (Note that s is the sum of the geometric multiplicities
of the eigenvalues of A.)
b) There exists an eigenbasis for an n × n matrix A if and only if the sum of the geometric multiplicities
of its eigenvalues equals n:
Σ_{eigenvalues λ of A} GM(λ) = n.
Proof.
a) We use proof by contradiction. Suppose v1 , . . . , vs are linearly dependent, and let vm be the first
redundant vector in this list, with
vm = c1 v1 + · · · + cm−1 vm−1 .
Suppose Avi = λi vi. There must be at least one nonzero coefficient ck such that λk ≠ λm, since
vm and the other vectors vi with the same eigenvalue have been chosen to be linearly independent.
Multiplying the equation vm = c1 v1 + · · · + ck vk + · · · + cm−1 vm−1 by A, we get
Avm = A(c1 v1 + · · · + ck vk + · · · + cm−1 vm−1 )
Avm = c1 Av1 + · · · + ck Avk + · · · + cm−1 Avm−1
λm vm = c1 λ1 v1 + · · · + ck λk vk + · · · + cm−1 λm−1 vm−1 .
Multiplying the same equation instead by λm , we get
λm vm = c1 λm v1 + · · · + ck λm vk + · · · + cm−1 λm vm−1 ,
which, when subtracted from our result above, yields
0 = (λm − λm )vm = c1 (λ1 − λm )v1 + · · · + ck (λk − λm )vk + · · · + cm−1 (λm−1 − λm )vm−1 .
Since ck and λk −λm are nonzero, we have a nontrivial linear relation among the vectors v1 , . . . , vm−1 ,
contradicting the minimality of m.
b) Any linearly independent set of eigenvectors can contain at most GM (λ) vectors from Eλ , so the
sum s of the geometric multiplicities is an upper bound on the size of a linearly independent set of
eigenvectors. By part (a), there does always exists a linearly independent set of s eigenvectors. These
s linearly independent vectors form a basis of Rn if and only if s = dim(Rn) = n.
Theorem 220 (7.3.5, n distinct eigenvalues). If an n × n matrix has n distinct eigenvalues, then there
exists an eigenbasis for A.
Proof. For each of the n eigenvalues, the geometric multiplicity is at least 1 (in fact they must all equal 1 in
this case), so the sum of the geometric multiplicities is n. The preceding theorem implies that an eigenbasis
exists.
Theorem 221 (7.3.6, eigenvalues of similar matrices). Suppose A is similar to B. Then
a) fA (λ) = fB (λ). (study only this part for the quiz)
b) nullity(A) = nullity(B) and rank(A) = rank(B).
c) A and B have the same eigenvalues, with the same algebraic and geometric multiplicities.
d) det A = det B and tr A = tr B.
Proof.
a) If B = S −1 AS and A, B are n × n matrices, then
fB (λ) = det(B − λIn )
= det(S −1 AS − λS −1 In S)
= det(S^{-1}(A − λIn)S)
= (det S −1 )(det(A − λIn ))(det S)
= (det S)−1 (det S)(det(A − λIn ))
= det(A − λIn )
= fA (λ).
b) Suppose SB = AS. Let p = nullity(B) and consider a basis v1 , . . . , vp of ker(B). Then
A(Svi ) = S(Bvi ) = S(0) = 0,
so Sv1 , . . . , Svp ∈ ker(A). Furthermore, we show that Sv1 , . . . , Svp are linearly independent. Any
linear relation c1 Sv1 + · · · + cp Svp = 0 can be rewritten S(c1 v1 + · · · + cp vp ) = 0. Multiplying by
S −1 yields a linear relation c1 v1 + · · · + cp vp = S −1 0 = 0, which must be trivial, so c1 = · · · = cp = 0.
We have found p = nullity(B) linearly independent vectors in ker(A), which implies that nullity(A) ≥
nullity(B). A similar argument shows that nullity(A) ≤ nullity(B), so the nullities are equal. For the
ranks, we use the Rank-Nullity Theorem:
rank(A) = n − nullity(A) = n − nullity(B) = rank(B).
c) A and B have the same eigenvalues and algebraic multiplicities by part (a). Since A − λIn is similar
to B − λIn (see the proof of part (a)), the geometric multiplicities of an eigenvalue λ are equal by
part (b):
nullity(A − λIn ) = nullity(B − λIn ).
d) This follows from part (a), since determinant and trace are coefficients of the characteristic polynomial
(up to a fixed sign).
Note 222. Similar matrices generally do not have the same eigenvectors.
Theorem 223 (7.3.7, algebraic and geometric multiplicity). If λ is an eigenvalue of A, then
GM (λ) ≤ AM (λ).
Combining this with earlier results, we get
Σ_{eigenvalues λ of A} GM(λ) ≤ Σ_{eigenvalues λ of A} AM(λ) ≤ n.
7.4 Diagonalization
Theorem 224 (7.4.1, matrix of a linear transformation with respect to an eigenbasis). Let T :
Rn → Rn be a linear transformation given by T (x) = Ax. A basis D of Rn is an eigenbasis for A if and
only if the D-matrix of T is diagonal.
Proof. Let D = (v1, v2, . . . , vn). The D-matrix of T is diagonal if and only if its ith column, [T(vi)]_D = [Avi]_D, is equal to λi ei for some λi, i = 1, 2, . . . , n; that is, the D-matrix is
[ λ1e1  λ2e2  · · ·  λnen ] = diag(λ1, λ2, . . . , λn).
But [Avi ]D = λi ei if and only if Avi = λi vi , which is the definition of D being an eigenbasis.
Definition 225 (7.4.2). Consider a linear transformation T : Rn → Rn given by T (x) = Ax.
• T is called diagonalizable if there exists a basis D of Rn such that the D-matrix of T is diagonal.
• A is called diagonalizable if A is similar to some diagonal matrix D, i.e., if there exists an invertible
matrix S such that S −1 AS is diagonal.
Theorem 226 (7.4.3, eigenbases and diagonalizability). For a linear transformation T : Rn → Rn
given by T (x) = Ax, the following statements are equivalent:
1. T is diagonalizable.
2. A is diagonalizable.
3. There exists an eigenbasis for A.
Proof. 1 and 2 are equivalent because the D-matrix for T is equal to D = S −1 AS, where the columns of S
are the basis vectors in D.
1 and 3 are equivalent by Theorem 224.
Procedure 227 (7.4.4, diagonalizing a matrix). To diagonalize an n × n matrix A (if possible):
1. Find the eigenvalues of A, i.e., the roots of the characteristic polynomial fA (λ) = det(A − λIn ).
2. For each eigenvalue λ, find a basis of the eigenspace Eλ = ker(A − λIn ).
3. A is diagonalizable if and only if the dimensions of the eigenspaces add up to n. In this case, concatenate
the bases of the eigenspaces found in step 2 to obtain an eigenbasis D = (v1 , v2 , . . . , vn ) for A. Then
the matrix D = S −1 AS is diagonal, and the ith diagonal entry of D is the eigenvalue λi associated
with vi:
diag(λ1, λ2, . . . , λn) = S^{-1} A S, where S = [ v1 v2 · · · vn ].
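A sketch of Procedure 227 using NumPy's eigen-decomposition (not from the notes; the matrix is an arbitrary choice with distinct eigenvalues, so an eigenbasis exists). The same decomposition also illustrates the power formula A^t = SD^tS^{-1} of Theorem 228 below:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, S = np.linalg.eig(A)      # columns of S are eigenvectors of A
D = np.linalg.inv(S) @ A @ S       # S^{-1} A S

print(np.round(D, 10))             # diagonal, with the eigenvalues 5 and 2 (in some order)

t = 10
A_t = S @ np.diag(eigvals ** t) @ np.linalg.inv(S)      # S D^t S^{-1}
print(np.allclose(A_t, np.linalg.matrix_power(A, t)))   # True
```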
Theorem 228 (7.4.5, powers of a diagonalizable matrix). Suppose a matrix A is diagonalizable, with
S^{-1}AS = D = diag(λ1, λ2, . . . , λn).
Then, for any positive integer t,
A^t = SD^tS^{-1} = S diag(λ1^t, λ2^t, . . . , λn^t) S^{-1}.
Proof. Solving for A in S^{-1}AS = D, we obtain A = SDS^{-1}. Thus
A^t = (SDS^{-1})^t = (SDS^{-1})(SDS^{-1}) · · · (SDS^{-1})  (t factors)
    = S(DD · · · D)S^{-1} = SD^tS^{-1},
so that
A^t = S diag(λ1, . . . , λn)^t S^{-1} = S diag(λ1^t, . . . , λn^t) S^{-1}.
Definition 229 (7.4.6, eigenvalues of a linear transformation).
• Let V be a linear space and T : V → V a linear transformation. A nonzero element f ∈ V is called
an eigenvector (or an eigenfunction, eigenmatrix, etc., depending on the nature of V ) if T (f ) is
a scalar multiple of f , i.e.,
T (f ) = λf for some scalar λ.
The scalar λ is called the eigenvalue associated with the eigenvector f .
• If V is finite dimensional, then a basis D of V consisting of eigenvectors of T is called an eigenbasis
for T .
• The transformation T is called diagonalizable if there exists some basis D of V such that the D-matrix
of T is diagonal.
Theorem 230 (eigenbases and diagonalization). A linear transformation T : V → V is diagonalizable
if and only if there exists an eigenbasis for T .
Proof. Let D = (f1, f2, . . . , fn) be a basis of V. Then the D-matrix of T is diagonal if and only if its ith column [T(fi)]_D is equal to λi ei for some λi, i = 1, 2, . . . , n. This condition is equivalent to T(fi) = λi fi, which is the definition of D being an eigenbasis for T.
Procedure 231 (diagonalizing a linear transformation). Let V be a finite dimensional linear space.
To diagonalize a linear transformation T : V → V (if possible):
1. Sometimes you can find an eigenbasis D directly, in which case you are done. If not, then choose any
basis B = (f1 , . . . , fn ) of V .
2. Compute the B-matrix of T:
B = [ [T(f1)]_B · · · [T(fn)]_B ].
3. Find the eigenvalues of B, i.e., the roots of the characteristic polynomial fB (λ) = det(B − λIn ).
4. For each eigenvalue λ, find a basis of the eigenspace Eλ = ker(B − λIn ).
5. B (and hence T ) is diagonalizable if and only if the dimensions of the eigenspaces add up to n. In
this case, concatenate the bases of the eigenspaces found in step 4 to obtain an eigenbasis D′ =
(v1 , v2 , . . . , vn ) for B.
6. The vi are the B-coordinate vectors of an eigenbasis D = (g1 , . . . , gn ) for T , that is,
[gi ]B = vi ,
or gi = L_B^{-1}(vi).
This procedure is illustrated in the diagrams below:
[Commutative diagrams omitted: they relate the coordinate maps L_B, L_D, L_{D′}, the space V, the matrices B and D, and the action on an eigenvector: vi ↦ λi vi, ei ↦ λi ei, gi ↦ λi gi.]
7.5 Complex Eigenvalues
Definition 232. A field F is a set F together with an addition rule and a multiplication rule:
• For a, b ∈ F, there is an element a + b ∈ F.
• For a, b ∈ F, there is an element ab ∈ F.
which satisfy the following ten properties for all a, b, c ∈ F:
1. addition is associative: (a + b) + c = a + (b + c).
2. addition is commutative: a + b = b + a.
3. an additive identity exists: There is an element n ∈ F such that a + n = a for all a ∈ F. This n is
unique and is denoted by 0.
4. additive inverses exist: For each a ∈ F, there exists a b ∈ F such that a + b = 0. This b is unique
and is denoted by (−a).
5. multiplication is associative: a(bc) = (ab)c.
6. multiplication is commutative: ab = ba.
7. a multiplicative identity exists: There is an element e ∈ F such that ae = a for all a ∈ F. This e
is unique and is denoted by 1.
8. multiplicative inverses exist: For each nonzero a ∈ F, there exists a b ∈ F such that ab = 1. This
b is unique and is denoted by a−1 .
9. multiplication distributes over addition: a(b + c) = ab + ac.
10. the identities are distinct: 0 6= 1.
Note 233.
• The existence of additive inverses allows us to subtract, while the existence of multiplicative inverses
allows us to divide (by nonzero elements).
• In this course, we have studied linear algebra over the field R of real numbers. Other common fields
include the complex numbers C and the rational numbers Q. Many other fields exist, such as the field
F2 = {0, 1} of two elements, for which 1 + 1 = 0.
• The linear algebraic concepts we have studied in this course make sense over any field of scalars, with
the exception of the material in Chapter 5 involving dot products.
Theorem 234 (7.5.2, fundamental theorem of algebra). Any polynomial p(λ) with complex coefficients
splits, meaning that it can be written as a product of linear factors
p(λ) = k(λ − λ1 )(λ − λ2 ) · · · (λ − λn )
for some complex numbers k, λ1 , λ2 , · · · , λn .
Proof. This is a result in complex analysis.
Theorem 235 (7.5.4, number of complex eigenvalues). A complex n×n matrix A has exactly n complex
eigenvalues, if they are counted with their algebraic multiplicities. In other words,
Σ_{eigenvalues λ of A} AM(λ) = n.
Proof. The sum of the algebraic multiplicities is the number of linear factors in the complete factorization
of fA (λ), which equals n by the fundamental theorem of algebra.
Theorem 236 (7.5.3, real 2 × 2 matrices with complex eigenvalues). If A is a real 2 × 2 matrix with eigenvalues a ± ib (where b ≠ 0), and if v + iw is an eigenvector of A with eigenvalue a + ib, then
S^{-1}AS = [ a −b ; b a ], where S = [ w v ].
Thus A is similar, over the real numbers, to a rotation-scaling matrix
[ a −b ; b a ] = √(a² + b²) [ cos θ −sin θ ; sin θ cos θ ],
where cos θ = a/√(a² + b²) and sin θ = b/√(a² + b²).
Proof. By Theorem 226,
P^{-1}AP = [ a + ib  0 ; 0  a − ib ], where P = [ v + iw  v − iw ].
Similarly, we can diagonalize the rotation-scaling matrix above to obtain
R^{-1} [ a −b ; b a ] R = [ a + ib  0 ; 0  a − ib ], where R = [ i  −i ; 1  1 ].
Thus
P^{-1}AP = R^{-1} [ a −b ; b a ] R,
and
[ a −b ; b a ] = R(P^{-1}AP)R^{-1} = S^{-1}AS,
where S = PR^{-1} and S^{-1} = (PR^{-1})^{-1} = RP^{-1}. We check that
S = PR^{-1} = [ v + iw  v − iw ] · (1/(2i)) [ 1  i ; −1  i ] = [ w  v ].
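A numerical sketch of Theorem 236 (not from the notes; NumPy is assumed and the matrix below is an arbitrary choice with non-real eigenvalues a ± ib):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [1.0,  3.0]])          # eigenvalues 2 + i and 2 - i

eigvals, eigvecs = np.linalg.eig(A)
idx = 0 if eigvals[0].imag > 0 else 1     # pick the eigenvalue a + ib with b > 0
a, b = eigvals[idx].real, eigvals[idx].imag
vec = eigvecs[:, idx]                     # an eigenvector v + i w
v, w = vec.real, vec.imag

S = np.column_stack([w, v])               # S = [ w  v ]
print(np.round(np.linalg.inv(S) @ A @ S, 10))   # [[a, -b], [b, a]] = [[2, -1], [1, 2]]
print(np.round([a, b], 10))
```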
Theorem 237 (7.5.5, determinant and trace in terms of eigenvalues). For any n × n complex matrix
A with complex eigenvalues λ1 , λ2 , . . . , λn , listed with their algebraic multiplicities,
det A = λ1 λ2 · · · λn
and
tr A = λ1 + λ2 + · · · + λn .
Proof. The proof is the same as for Theorem 212.