Linear Algebra Course Notes
QIRUI LI
Contents
1. Matrices and Determinants
1.1. What the matrix does for linear algebra
1.2. Fields and Number Matrices
1.3. Matrix Multiplication
1.4. Block Matrix Multiplication
1.5. Elementary Matrices, Row and Column Transformations
1.6. Determinant
1.7. Laplace Expansion, Cofactor, Adjugate, Inverse Matrix Formula
2. Linear Equations
2.1. Non-Homogeneous Linear Equations, Existence of Solutions
2.2. Homogeneous Linear Equations, Uniqueness of Solutions
3. Vector Spaces, Linear Transformations
3.1. Actions, Matrices of Actions
3.2. Linear Spaces, Dimension
3.3. Linear Maps and Linear Transformations
3.4. Attempting to Go Back: Isomorphism, Kernel and Image
3.5. Invariant Subspaces for Linear Transformations, Power-Rank
4. Eigenvalues and Eigenvectors
4.1. Finding Eigenvalues and Eigenvectors
4.2. Linear Independence of Eigenvectors, Algebraic and Geometric Multiplicity, Eigenspaces, Diagonalization
4.3. Polynomials of a Linear Transformation, Spectral Mapping Theorem, Minimal Polynomial of a Linear Transformation
4.4. (Coming soon, not in final) Jordan Canonical Form
4.5. (Coming soon, not in final) Root Subspaces and Classification of Invariant Subspaces
4.6. (Coming soon, not in final) Classification of Linear Transformations Commuting with T
Date: Last Update: 16:22, Wednesday 1st July, 2015.
1. Matrices and Determinants
Introduction: Before you explore the interesting world of linear algebra, you should consolidate your computational skills and tools. Matrices and determinants are at the core of all kinds of linear algebra problems; like water and air, we cannot live without them. We will concentrate on calculation in this chapter.
1.1. What the matrix does for linear algebra.
1.1.1. Geometric intuition for linear transformations. Introduction: We will introduce, conceptually, why matrices are important and what we should always keep in mind when we first study them.
Think about how people draw a cubic box on a piece of paper:
It is common sense that angle A in the picture above should be a right angle. But if you measure A with a protractor, you will see it is an obtuse angle. How does our brain work so that we correctly recognize the right angle, even though it appears obtuse on paper?
Our brain performs a "linear transformation" all the time so that we can understand space through our eyes. Now let's think about how our brain works. If you are good at drawing, how do you draw a cubic box on paper so that it really looks like a box? (Suppose your distance from the box is large compared to the size of the box.)
As we try to make the drawing look real, we keep parallel lines parallel as we draw. This is exactly how objects appear in our eyes: parallel lines stay parallel. To understand the world in a photo, or to do better in art, we study linear algebra.
Look at the following pictures.
In the first two pictures parallel lines map to parallel lines, so the maps are linear. In the third picture, parallel lines are not mapped to parallel lines, so it is not linear. In the last picture, lines are even mapped to curves; this is like the world you see in a distorting mirror.
1.1.2. How to represent a linear transformation. We understand conceptually what a linear map is: a map under which parallel lines stay parallel. But how do we represent one explicitly?
It is common sense that a rotation is a linear map, because parallel lines rotate to parallel lines. Now let's specifically study the counterclockwise rotation by 90 degrees.
Problem 1. What are the coordinates of the point (1,0) after rotating 90 degrees counterclockwise? How about the point (0,1)?
We know that (1,0) is the vector pointing rightwards; after a rotation of 90 degrees counterclockwise it becomes a vector pointing upwards with its length unchanged, that is, (0,1). The vector (0,1) points upwards; after the rotation it points leftwards, that is, (-1,0). If we denote the map by T, then we know
T((1,0)) = (0,1)
T((0,1)) = (-1,0)
In order to fully understand a linear map, we need to know where every point goes.
Problem 2. Look at the following picture. Do you know where (3,5) maps to after rotating 90 degrees counterclockwise? In other words, what is T((3,5))?
We draw auxiliary grid lines, and we claim that the squares of the grid map to squares of the grid, because a square rotated counterclockwise by 90 degrees is still a square.
Before the rotation, we reach the point (3,5) by going right 3 units and up 5 units.
After the counterclockwise rotation, right becomes up, and up becomes left. So after the rotation, (3,5) maps to the point reached by going up 3 units and left 5 units, which is exactly (-5,3).
Remember the map T:
T((1,0)) = (0,1)
T((0,1)) = (-1,0)
Note that (1,0) means right and (0,1) means up; likewise (0,1) means up and (-1,0) means left. Then we can use coordinates to explain what is happening:
All of this explains T((3,5)) = (-5,3).
Not only for (3,5), but for any point, we can find where it maps by this method; for example, T((2,1)) = (-1,2) and T((9,9)) = (-9,9). To find a formula for a general point, we assume
T((x,y)) = (u,v)
If we can express (u,v) in terms of (x,y), we then dare to say we fully understand the rotation. We know that (x,y) rotates to (-y,x), that is, u = -y and v = x. We then write this system of equations in the following standard form.
(1)
u = 0x + (-1)y
v = 1x + 0y
With the previous understanding, we know that in order to determine how a linear transformation of the plane acts on every point, it is enough to know how it acts on (1,0) and (0,1). The coefficients in the system of equations determine the linear map completely. People therefore collect all the coefficients into a matrix to denote the specific linear transformation.
With the previous calculation, rotation by 90 degrees counterclockwise is
$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$
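The correspondence between the coefficient matrix and the map can be checked numerically. The following sketch uses NumPy (a library choice of ours, not part of the notes) to apply the rotation matrix to the points discussed above.

```python
import numpy as np

# Rotation counterclockwise by 90 degrees, read off from u = 0x + (-1)y, v = 1x + 0y.
T = np.array([[0, -1],
              [1,  0]])

# Applying T to a point (x, y) is the matrix-vector product T @ (x, y).
print(T @ np.array([1, 0]))  # right maps to up: [0 1]
print(T @ np.array([0, 1]))  # up maps to left: [-1 0]
print(T @ np.array([3, 5]))  # [-5 3], as computed above
```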
We have already seen how to get a matrix from a linear transformation. We end this section with an example of how to get the linear transformation from a matrix.
Problem 3. How do you understand $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$? Someone said it is the reflection across the x-axis; do you believe that?
We know that this matrix means a linear map
T((x,y)) = (u,v)
where the details of the map are given by the following equations:
(2)
u = 1x + 0y
v = 0x + (-1)y
Let's see where (3,4) maps to. Plugging (x,y) = (3,4) into the equations, we get (u,v) = (3,-4), which means
T((3,4)) = (3,-4)
This is exactly what reflection across the x-axis does. Checking more and more points, we find it makes sense that the map is indeed reflection across the x-axis.
Everything in this section is conceptual; none of it is a strict proof or definition. With this interpretation of matrices in mind, you will feel more comfortable studying matrices and understanding matrix multiplication. Note that this interpretation only makes sense for a matrix over R, assuming a basis is given and fixed. We will explain every piece of terminology in the future. Next, we will develop a strict theory of matrices. This section has nothing to do with the strict theory, but it should make further concepts easier to understand, even though matrices have far more meanings than linear transformations alone.
1.2. Fields and Number Matrices. Goal: Conceptually speaking, a matrix describes a linear transformation when its elements are in R, but a matrix itself is more than that. In this section you are going to learn the most basic, strict, even abstract concepts of number matrices. To start, we should define every term: number and matrix. To define what numbers are, we define the terminology: fields.
1.2.1. Fields.
Definition 1. We call a set F equipped with two two-term operations + and × a field if it satisfies the following axioms:
(1) Closure under addition and multiplication: if a ∈ F and b ∈ F, then a + b ∈ F and a × b ∈ F
(2) Commutativity of +: for any two elements a, b ∈ F, we have a + b = b + a
(3) Associativity of +: for any three elements a, b, c ∈ F, we have (a + b) + c = a + (b + c)
(4) Existence of 0: there exists an element 0 such that for any element a ∈ F, a + 0 = a
(5) Existence of opposites: for any element a ∈ F, there exists an element called −a such that a + (−a) = 0
(6) Commutativity of ×: for any two elements a, b ∈ F, we have a × b = b × a
(7) Associativity of ×: for any three elements a, b, c ∈ F, we have (a × b) × c = a × (b × c)
(8) Existence of 1: there exists an element 1 such that for any element a ∈ F, a × 1 = a
(9) Existence of reciprocals: for any non-zero element a ≠ 0 in F, there exists an element called $a^{-1}$ such that $a \times a^{-1} = 1$
(10) Distributivity: for any a, b, c ∈ F, we have a × (b + c) = a × b + a × c
As you see, there is nothing strange here. The importance of these axioms is that not only the real or complex numbers can serve as numbers; there are other useful types of numbers.
Example 1. A simple example of a field is the field of rational numbers, consisting of numbers that can be written as fractions a/b, where a and b are integers and b ≠ 0. The opposite of such a fraction is simply −a/b, and the reciprocal (provided that a ≠ 0) is b/a.
Example 2. Suppose F has only two elements, namely 0 and 1, the zero element and the unit element respectively, and we define 1 + 1 = 0, 0 + 1 = 1, 0 + 0 = 0, 0 × 0 = 0, 0 × 1 = 0, 1 × 1 = 1. Then this is a field. It is the smallest field in the universe: the field of two elements.
Example 3. It is clear that the real numbers (all rational and irrational numbers) form a field. Later on we will use the complex numbers, which can be expressed as a + bi, where i is the mysterious element satisfying $i^2 = -1$. We can verify that the complex numbers also satisfy the field axioms. In this book we denote the real number field by R and the complex number field by C.
In this book, the elements of a field are always called numbers or scalars.
In the study of linear algebra, you will encounter not only calculation but also proofs. A proof is a logical analysis of why some statement is true based on the axioms. To show an example, we prove the following proposition.
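The two-element field of Example 2 is small enough that every axiom involving concrete elements can be checked exhaustively by machine. Here is a minimal table-driven sketch in plain Python (the dictionaries encode the addition and multiplication tables above):

```python
# The field of two elements: + and * defined by the tables in Example 2.
add = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
mul = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
F = [0, 1]

# Check commutativity and associativity of both operations, and distributivity.
for a in F:
    for b in F:
        assert add[a, b] == add[b, a] and mul[a, b] == mul[b, a]
        for c in F:
            assert add[add[a, b], c] == add[a, add[b, c]]
            assert mul[mul[a, b], c] == mul[a, mul[b, c]]
            assert mul[a, add[b, c]] == add[mul[a, b], mul[a, c]]

# 0 is the additive identity, 1 the multiplicative identity, and 1 is its own reciprocal.
assert all(add[a, 0] == a and mul[a, 1] == a for a in F)
assert mul[1, 1] == 1
print("all field axioms verified for the two-element field")
```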
Proposition 1. For any element a ∈ F, a × 0 = 0.
Proof. We prove it as follows:
Let c = a × 0.
We claim that c = c + c, because:
By the definition of 0, we have 0 + 0 = 0.
Thus c = a × 0 = a × (0 + 0) = a × 0 + a × 0 = c + c.
So we have verified c = c + c.
By the existence of −c, we may add −c to both sides of the equation.
Thus (−c) + c = (−c) + c + c,
so 0 = c.
Thus we have proved c = 0, which means a × 0 = 0.
1.2.2. Matrices.
Definition 2. A matrix M over a field F (or simply a matrix M, when the field is clear) is a rectangular array of numbers $a_{ij}$ in F, usually presented in the following form:
$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$
Example 4. The following are matrices over Q (remember Q is the field of rational numbers, i.e. numbers of the form a/b, where a and b are integers, like 1/2, 2/7):
$\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}, \quad \begin{pmatrix} \frac12 & \frac34 & \frac18 \\ 5 & \frac53 & 1 \\ \frac23 & 0.03 & \frac76 \end{pmatrix}, \quad \begin{pmatrix} 0.212121\cdots & 0.5 \\ 0.24 & 0.333\cdots \end{pmatrix}, \quad (5)$
where in the matrix with repeating decimals the dots represent repeating digits, not omitted elements.
Example 5. The following are matrices over R (remember R is the field of all rational and irrational numbers, i.e. numbers that can be expressed with finitely many integer digits and infinitely many decimal digits, like 1/3, $\sqrt2$, π):
$\begin{pmatrix} \frac45 \\ \pi \\ e \end{pmatrix}, \quad \begin{pmatrix} \frac13 & \sqrt2 \end{pmatrix}, \quad \begin{pmatrix} 4 & 2 & 5 \\ 10^{-\frac13} & \cos 3\pi & 1 \\ \frac57 & \frac26 & 10 \end{pmatrix}$
Example 6. The following are matrices over C (remember C is the field of all complex numbers, i.e. numbers that can be expressed as a + bi, where a and b are real numbers and i is the element satisfying $i^2 = -1$):
$\begin{pmatrix} 1+i & 2i \\ 3+\pi i & 5+2i \end{pmatrix}, \quad \begin{pmatrix} 1 & 3 & 2 \\ 6 & 2 & 8 \\ 1 & 2 & 4 \end{pmatrix}, \quad \begin{pmatrix} 3+i & 1 & \frac53 \\ \frac{\pi}{5}+\frac16 i & 5 & 1 \\ 9+9i & 1 & 5i \end{pmatrix}$
Example 7. Because the rational numbers are contained in the real numbers, and the real numbers are contained in the complex numbers, a matrix over Q is a matrix over R, and a matrix over R is a matrix over C.
For a matrix $A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$ over F, the rows of the matrix are called row vectors of the matrix, always denoted $r_k$, where the subindices are arranged in order:
1st row vector $r_1 = (a_{11}\ a_{12}\ \cdots\ a_{1n})$, 2nd row vector $r_2 = (a_{21}\ a_{22}\ \cdots\ a_{2n})$, ..., m'th row vector $r_m = (a_{m1}\ a_{m2}\ \cdots\ a_{mn})$
The columns of the matrix are called column vectors of the matrix, always denoted $c_k$, where the subindices are arranged in order:
1st column vector $c_1 = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}$, 2nd column vector $c_2 = \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix}$, ..., n'th column vector $c_n = \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}$

Example 8. For the matrix over R: $A = \begin{pmatrix} 3 & 1 & \sqrt2 \\ 1 & 5 & 8 \\ \frac13 & 5 & 2 \end{pmatrix}$, the row vectors are
$r_1 = (3\ 1\ \sqrt2), \quad r_2 = (1\ 5\ 8), \quad r_3 = (\tfrac13\ 5\ 2)$
The column vectors are
$c_1 = \begin{pmatrix} 3 \\ 1 \\ \frac13 \end{pmatrix}, \quad c_2 = \begin{pmatrix} 1 \\ 5 \\ 5 \end{pmatrix}, \quad c_3 = \begin{pmatrix} \sqrt2 \\ 8 \\ 2 \end{pmatrix}$

The element $a_{ij}$, located at the i'th row and j'th column, is called the ij-entry of the matrix. We frequently denote a matrix simply by $A = (a_{ij})_{1\le i\le m,\ 1\le j\le n}$ if the formula or rule for $a_{ij}$ is explicitly given or clear.
Example 9. Suppose $(a_{ij})_{1\le i\le 3,\ 1\le j\le 3} = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 1 & 9 \\ 7 & 0 & 8 \end{pmatrix}$. What is $a_{12}$?
Answer: $a_{12}$ is the element in the first row and second column, so $a_{12} = 3$.
Example 10. What is the matrix $(i+j)_{1\le i\le 3,\ 1\le j\le 3}$?
Answer: $\begin{pmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \\ 4 & 5 & 6 \end{pmatrix}$
Example 11. What is the matrix $(i^2+j^2)_{1\le i\le 1,\ 1\le j\le 2}$?
Answer: $\begin{pmatrix} 2 & 5 \end{pmatrix}$
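A matrix given by an entry rule like (i+j) is easy to build programmatically. The sketch below (NumPy, our choice of tool) makes the 1-based indices of the notes explicit; note that code indexes from 0, while the notes index from 1.

```python
import numpy as np

# Build (a_ij) with a_ij = i + j for 1 <= i <= 3, 1 <= j <= 3 (Example 10).
A = np.array([[i + j for j in range(1, 4)] for i in range(1, 4)])
print(A)        # [[2 3 4] [3 4 5] [4 5 6]]

# a_12 is the entry in the 1st row and 2nd column (the convention of Example 9).
print(A[0, 1])  # 3  (0-based in code vs 1-based in the notes)

# Example 11: (i^2 + j^2) for 1 <= i <= 1, 1 <= j <= 2 is the 1 x 2 matrix [[2 5]].
B = np.array([[i**2 + j**2 for j in range(1, 3)] for i in range(1, 2)])
print(B)        # [[2 5]]
```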
Besides the notation $(a_{ij})_{1\le i\le m,\ 1\le j\le n}$, where the $a_{ij}$ are a bunch of numbers, one can also write a matrix as a row vector of column vectors, or as a column vector of row vectors. Explicitly, a row vector of column vectors looks like $A = (c_1\ c_2\ \cdots\ c_n)$; this only makes sense when the sizes of the column vectors $c_i$ are all equal. A column vector of row vectors looks like $A = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{pmatrix}$, where each $r_i$ is a row vector; it only makes sense when their sizes are all equal.
Example 12. Suppose $r_1 = (6\ 2\ 3)$ and $r_2 = (4\ 9)$. What is the matrix $\begin{pmatrix} r_1 \\ r_2 \end{pmatrix}$?
Answer: This does not make sense, because the size of $r_1$ is not equal to the size of $r_2$.
Example 13. Suppose $r_1 = (2\ 3)$ and $r_2 = (1\ 5)$. What is the matrix $\begin{pmatrix} r_1 \\ r_2 \end{pmatrix}$?
Answer: $\begin{pmatrix} 2 & 3 \\ 1 & 5 \end{pmatrix}$
Example 14. Suppose $c_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $c_2 = \begin{pmatrix} 2 \\ 9 \end{pmatrix}$. What is the matrix $(c_1\ c_2)$?
Answer: $\begin{pmatrix} 1 & 2 \\ 0 & 9 \end{pmatrix}$
A matrix with m rows and n columns is called an m by n matrix, written m × n. The pair of numbers m × n is called the size of the matrix.
Example 15. What is the size of $(1\ 3)$?
Answer: 1 × 2.
Two matrices A and B over F are called equal, written A = B, if they have the same size and each pair of corresponding elements is equal. Thus the equality of two m × n matrices is equivalent to a system of mn equalities, each corresponding to a pair of elements.
Example 16. Solve the equation $\begin{pmatrix} x & y \\ z & 5 \end{pmatrix} = \begin{pmatrix} 1 & 4 \\ 2 & 5 \end{pmatrix}$
Answer: Since the 22-entries match each other, it is possible for the equation to make sense. By the definition of equality of matrices, we have x = 1, y = 4, z = 2.
A matrix of size 1 × n over F, namely one consisting of only 1 row, is called a row matrix or, by a slight abuse of notation, a row vector.
A matrix of size m × 1 over F, namely one consisting of only 1 column, is called a column matrix or simply a column vector.
Example 17. Which of the following are row vectors? Which of them are column vectors?
$(2\ 4), \quad \begin{pmatrix} 4 \\ 2 \\ 6 \end{pmatrix}, \quad (5), \quad \begin{pmatrix} 1 & 3 & 6 \\ 9 & 1 & 7 \end{pmatrix}$
Answer: The 1st is a row vector, the 2nd is a column vector, the 3rd is both a row vector and a column vector, and the last is neither.
For convenience, people often leave entries equal to 0 blank in a matrix; when you see a number matrix with missing entries, those entries represent 0.
Example 18. $\begin{pmatrix} & & \\ & & 1 \\ & & \end{pmatrix}$ means $\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} & 2 & \\ 6 & & 3 \\ & 2 & \end{pmatrix}$ means $\begin{pmatrix} 0 & 2 & 0 \\ 6 & 0 & 3 \\ 0 & 2 & 0 \end{pmatrix}$
1.3. Matrix Multiplication. Goal: Our matrices are defined over a field. We have + and × in the field, so the matrices should inherit them. With these operations we can do basic algebra with matrices. You will see in the future how matrix algebra simplifies all kinds of linear algebra problems.
Definition 3. Suppose $A = (a_{ij})_{1\le i\le m,\ 1\le j\le n}$ and $B = (b_{ij})_{1\le i\le m,\ 1\le j\le n}$ are two matrices over F of the same size m × n. The sum of the two matrices is defined to be $A + B = (a_{ij} + b_{ij})_{1\le i\le m,\ 1\le j\le n}$. Explicitly,
$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{pmatrix}$


Example 19. Does $\begin{pmatrix} 1 & 3 & 5 \\ 7 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 7 \\ 2 & 2 \\ 4 & 9 \end{pmatrix}$ make sense? If it does, please calculate.
Answer: This expression doesn't make sense, because the former is of size 2 × 3 and the latter of size 3 × 2, so their sizes are not equal.
Example 20. Does $\begin{pmatrix} 1 & 3 & 5 \\ 7 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 7 & 2 \\ 2 & 14 & 9 \end{pmatrix}$ make sense? If it does, please calculate.
Answer: $\begin{pmatrix} 1 & 3 & 5 \\ 7 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 7 & 2 \\ 2 & 14 & 9 \end{pmatrix} = \begin{pmatrix} 1+1 & 3+7 & 5+2 \\ 7+2 & 1+14 & 0+9 \end{pmatrix} = \begin{pmatrix} 2 & 10 & 7 \\ 9 & 15 & 9 \end{pmatrix}$
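Matrix addition, and the size restriction of Example 19, can be mirrored in code. In the NumPy sketch below (our tool choice, not the notes'), the library raises an error when the shapes disagree, matching the definition:

```python
import numpy as np

A = np.array([[1, 3, 5],
              [7, 1, 0]])
B = np.array([[1, 7, 2],
              [2, 14, 9]])
print(A + B)  # entrywise sums, as in Example 20

# A 2 x 3 matrix cannot be added to a 3 x 2 one (Example 19):
try:
    A + np.array([[1, 7], [2, 2], [4, 9]])
except ValueError:
    print("sizes do not match, so the sum is undefined")
```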
The definition of matrix multiplication is somewhat complicated. To see how complicated it is, we give the strict definition:
Definition 4. Suppose $A = (a_{ij})_{1\le i\le m,\ 1\le j\le n}$ and $B = (b_{jk})_{1\le j\le n,\ 1\le k\le p}$ are two matrices over F. Then we define the product of these two matrices to be $AB = \left(\sum_{j=1}^{n} a_{ij} b_{jk}\right)_{1\le i\le m,\ 1\le k\le p}$. Explicitly,
$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \times \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^n a_{1j}b_{j1} & \sum_{j=1}^n a_{1j}b_{j2} & \cdots & \sum_{j=1}^n a_{1j}b_{jp} \\ \sum_{j=1}^n a_{2j}b_{j1} & \sum_{j=1}^n a_{2j}b_{j2} & \cdots & \sum_{j=1}^n a_{2j}b_{jp} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^n a_{mj}b_{j1} & \sum_{j=1}^n a_{mj}b_{j2} & \cdots & \sum_{j=1}^n a_{mj}b_{jp} \end{pmatrix}$
It might cost you your whole life to understand this definition from the words above, so let's analyze what is going on.
First, for the multiplication to be defined, the last number of the size of the first matrix must match the first number of the size of the second matrix. In other words, the number of columns of A must equal the number of rows of B.
Example 21. Suppose A is a 3 × 2 matrix and B is a 5 × 9 matrix. Does AB make sense?
Answer: No, because 2 ≠ 5.


Example 22. Suppose $A = \begin{pmatrix} 1 & 5 \\ 8 & 1 \\ 3 & 9 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 8 & 0 \\ 3 & 7 & 2 \end{pmatrix}$. Does AB make sense?
Answer: Yes, because the size of A is 3 × 2 and B is of size 2 × 3; the final result AB would be a 3 × 3 matrix, like $\begin{pmatrix} * & * & * \\ * & * & * \\ * & * & * \end{pmatrix}$
Example 23. Suppose $A = \begin{pmatrix} 1 & 5 \\ 8 & 1 \\ 3 & 9 \end{pmatrix}$, $B = \begin{pmatrix} 1 \\ 8 \end{pmatrix}$. Does AB make sense?
Answer: Yes, because the size of A is 3 × 2 and B is of size 2 × 1; the final result AB would be a 3 × 1 matrix, like $\begin{pmatrix} * \\ * \\ * \end{pmatrix}$
If the matrices are of the right sizes, suppose we have
$\begin{pmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & & \vdots & & \vdots \\ a_{m1} & \cdots & a_{mj} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} b_{11} & \cdots & b_{1k} & \cdots & b_{1p} \\ \vdots & & \vdots & & \vdots \\ b_{j1} & \cdots & b_{jk} & \cdots & b_{jp} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nk} & \cdots & b_{np} \end{pmatrix} = \begin{pmatrix} c_{11} & \cdots & c_{1k} & \cdots & c_{1p} \\ \vdots & & \vdots & & \vdots \\ c_{i1} & \cdots & c_{ik} & \cdots & c_{ip} \\ \vdots & & \vdots & & \vdots \\ c_{m1} & \cdots & c_{mk} & \cdots & c_{mp} \end{pmatrix}$
Then the element $c_{ik}$ is given by multiplying the i'th row of A and the k'th column of B:
$c_{ik} = a_{i1}b_{1k} + a_{i2}b_{2k} + a_{i3}b_{3k} + \cdots + a_{in}b_{nk}$


Example 24. $\begin{pmatrix} 1 & 5 & 7 \\ 2 & 8 & 1 \end{pmatrix} \begin{pmatrix} 5 & 7 \\ 1 & 0 \\ 8 & 3 \end{pmatrix} = \begin{pmatrix} 1\times5+5\times1+7\times8 & 1\times7+5\times0+7\times3 \\ 2\times5+8\times1+1\times8 & 2\times7+8\times0+1\times3 \end{pmatrix} = \begin{pmatrix} 66 & 28 \\ 26 & 17 \end{pmatrix}$
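The entry formula $c_{ik} = \sum_j a_{ij}b_{jk}$ can be implemented directly and compared against a library product. A NumPy sketch (the matrices are those of Example 24):

```python
import numpy as np

A = np.array([[1, 5, 7],
              [2, 8, 1]])
B = np.array([[5, 7],
              [1, 0],
              [8, 3]])

# c_ik = a_i1 b_1k + a_i2 b_2k + ... + a_in b_nk, written out as an explicit sum.
m, n = A.shape
n2, p = B.shape
assert n == n2  # the columns of A must match the rows of B
C = np.array([[sum(A[i, j] * B[j, k] for j in range(n)) for k in range(p)]
              for i in range(m)])
print(C)                   # [[66 28] [26 17]], as in Example 24
assert (C == A @ B).all()  # agrees with NumPy's built-in matrix product
```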
Definition 5. Suppose $A = (a_{ij})_{1\le i\le m,\ 1\le j\le n}$ is a matrix over F, and λ ∈ F is a scalar. Then we define the scalar multiplication of A to be $\lambda A = (\lambda a_{ij})_{1\le i\le m,\ 1\le j\le n}$; explicitly,
$\lambda \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} \lambda a_{11} & \lambda a_{12} & \cdots & \lambda a_{1n} \\ \lambda a_{21} & \lambda a_{22} & \cdots & \lambda a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda a_{m1} & \lambda a_{m2} & \cdots & \lambda a_{mn} \end{pmatrix}$
Example 25. $3 \begin{pmatrix} 1 & 3 & 6 \\ 2 & 9 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 9 & 18 \\ 6 & 27 & 6 \end{pmatrix}$
Definition 6. Suppose $A = (a_{ij})_{1\le i\le m,\ 1\le j\le n}$ and $B = (b_{ij})_{1\le i\le m,\ 1\le j\le n}$ are two matrices over F. Then we define the difference of these two matrices to be A − B = A + (−1)B. Explicitly, just subtract the corresponding entries of A and B.
Example 26. $\begin{pmatrix} 1 & 5 & 2 \\ 8 & 2 & 1 \\ 0 & 2 & 3 \end{pmatrix} - \begin{pmatrix} 1 & 2 & 0 \\ 2 & 4 & 6 \\ 8 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 3 & 2 \\ 6 & -2 & -5 \\ -8 & 1 & 1 \end{pmatrix}$
Definition 7. The zero matrix of size m × n is an m × n matrix with all entries equal to 0, denoted $0_{m\times n}$, or simply 0 if we know the size from context.
Example 27. $\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ is a zero matrix.
We call a matrix a square matrix if it is of size n × n for some integer n. The multiplication of two square matrices makes sense if and only if they are of the same size, and the result is still a square matrix of the same size. Now let's concentrate on square matrices.
Diagonal of a square matrix: we call the diagonal running from top-left to bottom-right the diagonal of the matrix.
Pay attention: we don't call the other diagonal line the diagonal of the matrix. When we say diagonal, we always mean the one starting at the top left and ending at the bottom right.











Example 28. In $\begin{pmatrix} * & & \\ & * & \\ & & * \end{pmatrix}$, all the stars lie on the diagonal of the square matrix, but in $\begin{pmatrix} & & * \\ & * & \\ * & & \end{pmatrix}$, the stars do not all lie on the diagonal of the matrix.
Definition 8. Unit matrix: the unit matrix is a square matrix with entry 1 on the diagonal and 0 elsewhere. We always denote it by $I_n$, where n × n is its size. The unit matrix looks like
$I_n = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}$
Proposition 2. For any square matrix A of size n × n, we have $I_n A = A I_n = A$.
This means the unit matrix acts like a multiplicative identity.
Definition 9. Suppose A is a square matrix of size n × n. If there exists a matrix B of the same size such that $BA = I_n$, then we call B the inverse of A, we call A an invertible matrix, and we always denote such a B by $A^{-1}$.
Proposition 3. If a matrix is invertible, then its inverse is unique and commutes with the original matrix, that is, $AA^{-1} = A^{-1}A = I_n$.
The proof needs the fact that if A has a left inverse, then A also has a right inverse. We will prove this proposition after we have learned about the rank of a matrix.
Proposition 4. The product of two invertible matrices is invertible, and the inverse is given by $(AB)^{-1} = B^{-1}A^{-1}$.
Proof. We calculate $B^{-1}A^{-1}AB = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I$. So indeed we can write $(B^{-1}A^{-1})(AB) = I$, and by the uniqueness of the inverse, this implies $(AB)^{-1} = B^{-1}A^{-1}$.
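Proposition 4 is easy to sanity-check numerically on a random invertible pair. A NumPy sketch (the diagonally dominant construction is our own device to guarantee invertibility); note the reversed order on the right-hand side:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3)) + 3 * np.eye(3)  # diagonally dominant, hence invertible
B = rng.random((3, 3)) + 3 * np.eye(3)

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)  # B^{-1} A^{-1}, not A^{-1} B^{-1}
print(np.allclose(lhs, rhs))               # True

# The un-reversed order generally fails:
print(np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B)))
```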
Proposition 5. Suppose A is an invertible matrix. Then
(1) $(A^T)^{-1} = (A^{-1})^T$
(2) $(A^{-1})^{-1} = A$
Proposition 6. If the following expressions make sense, then
(1) (A + B) + C = A + (B + C)
(2) A + 0 = 0 + A = A (here 0 means the zero matrix)
(3) A + (−A) = (−A) + A = 0
(4) A + B = B + A
(5) k(A + B) = kA + kB (here k ∈ F is a scalar in the field)
(6) (k + k′)A = kA + k′A (here k and k′ are two scalars in the field)
(7) (kk′)A = k(k′A)
(8) 1A = A
Proposition 7. If the following expressions make sense, then
(1) A(B + C) = AB + AC
(2) (B + C)A = BA + CA
(3) (AB)C = A(BC)
(4) k(AB) = (kA)B = A(kB)
With the distributive law, we can apply our algebraic skills with numbers to matrices; this is what we call matrix algebra. Keep in mind that A and B do not necessarily commute.
Example 29. Suppose A and B are 2 × 2 matrices. Expand (A + B)(A − B). Is this equal to $A^2 - B^2$? When are they equal?
Answer: $(A+B)(A-B) = A(A-B)+B(A-B) = AA-AB+BA-BB = A^2-B^2+(BA-AB)$. But $A^2-B^2+(BA-AB) = A^2-B^2$ if and only if $BA-AB = 0$, that is, $AB = BA$. So $(A+B)(A-B) = A^2-B^2$ if and only if A and B commute.
This example shows that when two matrices commute with each other, most of the laws for numbers apply to A and B; but typically they do not commute.
Commutativity is a very important problem in linear algebra. It is no exaggeration to say that the uncertainty of the future is due to the existence of non-commuting matrices.
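Example 29's point, that $(A+B)(A-B) = A^2 - B^2$ only when A and B commute, shows up immediately in a numerical experiment. A NumPy sketch with a made-up non-commuting pair:

```python
import numpy as np

A = np.array([[1, 2],
              [0, 1]])
B = np.array([[1, 0],
              [3, 1]])

lhs = (A + B) @ (A - B)
rhs = A @ A - B @ B
print(np.array_equal(lhs, rhs))  # False: A and B do not commute
print(A @ B - B @ A)             # the commutator AB - BA is nonzero

# With a commuting pair (e.g. B replaced by a scalar matrix) the identity holds.
S = 2 * np.eye(2, dtype=int)
print(np.array_equal((A + S) @ (A - S), A @ A - S @ S))  # True
```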
Definition 10. Suppose $A = (a_{ij})_{1\le i\le m,\ 1\le j\le n}$. Then we define the transpose of A to be $A^T = (a_{ji})_{1\le j\le n,\ 1\le i\le m}$; to be precise,
$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}$
Example 30.
$\begin{pmatrix} 2 & 3 & 1 \\ 0 & 6 & 5 \end{pmatrix}^T = \begin{pmatrix} 2 & 0 \\ 3 & 6 \\ 1 & 5 \end{pmatrix}, \quad \begin{pmatrix} 1 & 5 \\ 6 & 7 \end{pmatrix}^T = \begin{pmatrix} 1 & 6 \\ 5 & 7 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}^T = \begin{pmatrix} 1 & 3 & 5 \end{pmatrix}$
Proposition 8. Suppose the following expressions make sense; then
(1) (AB)T = B T AT
(2) (A + B)T = AT + B T
(3) (λA)T = λAT
(4) (AT )T = A
We will prove this proposition after we study the block matrix multiplication.
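Proposition 8 can likewise be checked on concrete matrices before the proof arrives. A NumPy sketch with made-up matrices:

```python
import numpy as np

A = np.array([[1, 2, 0],
              [3, 1, 4]])   # 2 x 3
B = np.array([[1, 0],
              [2, 5],
              [0, 1]])      # 3 x 2

assert np.array_equal((A @ B).T, B.T @ A.T)  # (AB)^T = B^T A^T, order reversed
assert np.array_equal((A.T + B).T, A + B.T)  # transpose distributes over sums
assert np.array_equal((3 * A).T, 3 * A.T)    # (lambda A)^T = lambda A^T
assert np.array_equal(A.T.T, A)              # (A^T)^T = A
print("all four transpose identities verified")
```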
1.3.1. Some special square matrices. Diagonal matrix: a diagonal matrix is a square matrix that has entries only on the diagonal and 0 elsewhere. We always denote it by $\mathrm{diag}(a_1, a_2, \cdots, a_n)$, where n × n is its size and each $a_i$ is a diagonal entry. A diagonal matrix looks like
$\mathrm{diag}(a_1, a_2, \cdots, a_n) = \begin{pmatrix} a_1 & & & \\ & a_2 & & \\ & & \ddots & \\ & & & a_n \end{pmatrix}$
Scalar matrix: a scalar matrix is a diagonal matrix whose diagonal entries are all the same. It can be viewed as a scalar multiple of the unit matrix.
The unit matrix is a scalar matrix, and every scalar matrix is a diagonal matrix. The zero square matrix is a scalar matrix, hence also a diagonal matrix.
Example 31. A diagonal matrix over Q: $\begin{pmatrix} -1 & 0 & 0 \\ 0 & \frac12 & 0 \\ 0 & 0 & 5 \end{pmatrix}$
A scalar matrix over Q: $\begin{pmatrix} -\frac12 & 0 & 0 \\ 0 & -\frac12 & 0 \\ 0 & 0 & -\frac12 \end{pmatrix}$
Now let's discuss what happens when we multiply by a diagonal matrix.
When left-multiplying a given matrix by a diagonal matrix, the corresponding entries of the diagonal matrix multiply each row of the given matrix. When right-multiplying by a diagonal matrix, the corresponding entries multiply each column of the given matrix. To be precise, look at the following examples:
Example 32. $\begin{pmatrix} 2 & & \\ & 5 & \\ & & 6 \end{pmatrix} \begin{pmatrix} 1 & 3 & 4 \\ 10 & 20 & 15 \\ 100 & 200 & 120 \end{pmatrix} = \begin{pmatrix} 2 & 6 & 8 \\ 50 & 100 & 75 \\ 600 & 1200 & 720 \end{pmatrix}$
This answer is obtained by applying the matrix multiplication rule: in the final result, the first row is multiplied by 2, the 2nd row by 5, and the 3rd row by 6.
Example 33. $\begin{pmatrix} 2 & 10 & 300 \\ 6 & 70 & 800 \\ 1 & 20 & 1200 \end{pmatrix} \begin{pmatrix} 2 & & \\ & 5 & \\ & & 6 \end{pmatrix} = \begin{pmatrix} 4 & 50 & 1800 \\ 12 & 350 & 4800 \\ 2 & 100 & 7200 \end{pmatrix}$
This answer is obtained by applying the matrix multiplication rule: in the final result, the first column is multiplied by 2, the 2nd column by 5, and the 3rd column by 6.
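The row/column scaling rule is worth seeing in code. A NumPy sketch reproducing Examples 32 and 33:

```python
import numpy as np

D = np.diag([2, 5, 6])
M = np.array([[1, 3, 4],
              [10, 20, 15],
              [100, 200, 120]])
# Left multiplication scales the rows of M by the diagonal entries 2, 5, 6:
print(D @ M)

N = np.array([[2, 10, 300],
              [6, 70, 800],
              [1, 20, 1200]])
# Right multiplication scales the columns by 2, 5, 6:
print(N @ D)
```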
Proposition 9. The product of diagonal matrices is a diagonal matrix. If the diagonal elements are all non-zero, then the diagonal matrix is invertible, and its inverse is still a diagonal matrix.
Upper triangular matrix: we call a square matrix A upper triangular if all its non-zero entries lie on or above the diagonal. In other words, all entries below the diagonal are 0. Upper triangular matrices look like:
$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ & a_{22} & \cdots & a_{2n} \\ & & \ddots & \vdots \\ & & & a_{nn} \end{pmatrix}$
Proposition 10. The product of two upper triangular matrices is still an upper triangular matrix, and the diagonal entries of the product are the products of the corresponding diagonal entries. If all the entries on the diagonal are non-zero, then the upper triangular matrix is invertible, and its inverse is still an upper triangular matrix.


 

Example 34. We compute $\begin{pmatrix} 1 & 3 & 5 \\ & 6 & 7 \\ & & 9 \end{pmatrix} \begin{pmatrix} 2 & 5 & 1 \\ & 3 & 2 \\ & & 7 \end{pmatrix} = \begin{pmatrix} 2 & 14 & 42 \\ & 18 & 61 \\ & & 63 \end{pmatrix}$
This example reflects the fact that the product of two upper triangular matrices is upper triangular, and the diagonal entries of the product are the products of the corresponding diagonal entries: 2 = 1 × 2, 18 = 6 × 3, 63 = 9 × 7.
We will prove this result after studying the blockwise computation of matrices.
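The diagonal rule of Proposition 10 can be verified on Example 34's matrices. A NumPy sketch:

```python
import numpy as np

U1 = np.array([[1, 3, 5],
               [0, 6, 7],
               [0, 0, 9]])
U2 = np.array([[2, 5, 1],
               [0, 3, 2],
               [0, 0, 7]])

P = U1 @ U2
print(P)  # still upper triangular

# Entries strictly below the diagonal stay zero, and the diagonal of the
# product is the entrywise product of the two diagonals: (2, 18, 63).
assert np.array_equal(np.tril(P, -1), np.zeros((3, 3), dtype=int))
assert np.array_equal(np.diag(P), np.diag(U1) * np.diag(U2))
```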
Lower triangular matrix: we call a square matrix A lower triangular if all its non-zero entries lie on or below the diagonal. In other words, all entries above the diagonal are 0. Lower triangular matrices look like:
$\begin{pmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & \vdots & \ddots & \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$
Proposition 11. The product of two lower triangular matrices is still a lower triangular matrix, and the diagonal entries of the product are the products of the corresponding diagonal entries. If all the entries on the diagonal are non-zero, then the lower triangular matrix is invertible, and its inverse is a lower triangular matrix.
Symmetric matrix: we call a square matrix A symmetric if it satisfies $A^T = A$.
The entries of a symmetric matrix are symmetric with respect to the diagonal. See the following examples.
Example 35. The following are symmetric matrices:
$\begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 4 & 6 \\ 4 & 7 & 5 \\ 6 & 5 & 0 \end{pmatrix}$
1.3.2. The geometric meaning of matrix multiplication. In this subsection we fix our field to be R; this means every entry of a matrix is a real number. As we said, a matrix represents a linear map. What about matrix multiplication?
As we said earlier, matrix multiplication corresponds to composition of linear maps.
To be concrete, the composition of two linear maps just means doing the first linear transformation first and then the second. We will see an example in this subsection; the detailed discussion is left to future chapters.
We illustrate some of the geometric meaning of the matrices we have defined.
The unit matrix defines the identity map: for example, when n = 2, the matrix $I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ corresponds to the equations u = 1x + 0y, v = 0x + 1y, which means u = x, v = y, so the linear map sends (x,y) to (x,y). It is as if it does nothing.
Now we will see an example that helps us understand the geometric meaning of matrix multiplication.
Example 36. Rotation, and rotation again.
If you rotate a figure counterclockwise by 90 degrees and then do it again, what is the final result? The final result is a rotation of the figure by 180 degrees.
This is exactly reflected in the matrices: we know that rotation counterclockwise by 90 degrees is given by $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and rotation by 180 degrees is given by $\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$, and the matrix multiplication exactly reflects this fact:
$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$
Example 37. If we know that rotating a figure counterclockwise by 120 degrees is given by the matrix $\begin{pmatrix} -\frac12 & -\frac{\sqrt3}{2} \\ \frac{\sqrt3}{2} & -\frac12 \end{pmatrix}$, show that
$\begin{pmatrix} -\frac12 & -\frac{\sqrt3}{2} \\ \frac{\sqrt3}{2} & -\frac12 \end{pmatrix}^3 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Cubing the matrix means rotating three times, each time by 120 degrees, so in the end it looks like doing nothing, and the matrix for doing nothing is the unit matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.
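Both rotation facts can be confirmed numerically (floating-point for the 120-degree case, so we compare with a tolerance). A NumPy sketch:

```python
import numpy as np

R90 = np.array([[0, -1],
                [1,  0]])
R180 = np.array([[-1,  0],
                 [ 0, -1]])
# Rotating by 90 degrees twice is rotating by 180 degrees:
assert np.array_equal(R90 @ R90, R180)

s = np.sqrt(3) / 2
R120 = np.array([[-0.5,   -s],
                 [   s, -0.5]])
# Three 120-degree rotations amount to doing nothing, i.e. the unit matrix:
print(np.allclose(np.linalg.matrix_power(R120, 3), np.eye(2)))  # True
```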
1.4. Block Matrix Multiplication. Matrix multiplication is valid not only entrywise; it also makes sense blockwise.
Definition 11. A partition of n is an ordered tuple of numbers $P = (n_1, n_2, \cdots, n_k)$ such that $n_1 + n_2 + \cdots + n_k = n$. We call the summand $n_i$ the i'th part of the partition. Two partitions $P_1 = (a_1, a_2, \cdots, a_k)$ and $P_2 = (b_1, b_2, \cdots, b_k)$ are said to be the same if the corresponding parts are equal, that is, $a_i = b_i$.
Example 38. Let (2, 3, 1) be a partition of 6; this partition separates objects like this: oo|ooo|o
Definition 12. A block matrix is a matrix with a partition $P_r$ on its rows and $P_c$ on its columns. These partitions separate the matrix into blocks. We call the ij-block the block located in the i'th part of the rows and the j'th part of the columns.
Example 39. Suppose we have partitions $P_r = (2, 1)$ and $P_c = (1, 2)$, and a 3 × 3 matrix A. Then we can separate A by the row partition $P_r$ and column partition $P_c$ into a block matrix:
$A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 7 & 9 \\ 0 & 2 & 5 \end{pmatrix} \to \left(\begin{array}{c|cc} 1 & 2 & 3 \\ 1 & 7 & 9 \\ \hline 0 & 2 & 5 \end{array}\right)$
and by this partition we can denote A as a block matrix $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$, where the 11-block is $A_{11} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, the 12-block is $A_{12} = \begin{pmatrix} 2 & 3 \\ 7 & 9 \end{pmatrix}$, the 21-block is $A_{21} = (0)$, and the 22-block is $A_{22} = (2\ 5)$.
Proposition 12. Block matrix multiplication: Suppose A is a block matrix with row partition $P_r$ and column partition $P_c$, and B is a block matrix with row partition $Q_r$ and column partition $Q_c$. If $P_c = Q_r$, then the product AB is computed by precisely the same rule applied to blocks; to be precise,
$\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} \times \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1p} \\ B_{21} & B_{22} & \cdots & B_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ B_{n1} & B_{n2} & \cdots & B_{np} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^n A_{1j}B_{j1} & \sum_{j=1}^n A_{1j}B_{j2} & \cdots & \sum_{j=1}^n A_{1j}B_{jp} \\ \sum_{j=1}^n A_{2j}B_{j1} & \sum_{j=1}^n A_{2j}B_{j2} & \cdots & \sum_{j=1}^n A_{2j}B_{jp} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^n A_{mj}B_{j1} & \sum_{j=1}^n A_{mj}B_{j2} & \cdots & \sum_{j=1}^n A_{mj}B_{jp} \end{pmatrix}$
and the row partition of the product is given by $P_r$, the column partition by $Q_c$.
We illustrate this property by an example.



Example 40. By the normal method of matrix multiplication, we can compute:
\[
\begin{pmatrix} 1 & 5 & 2 \\ 8 & 1 & 2 \\ 1 & 3 & 4 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 0 \\ 2 & 0 & 1 \\ 0 & 3 & 5 \end{pmatrix}
=
\begin{pmatrix} 11 & 8 & 15 \\ 10 & 22 & 11 \\ 7 & 14 & 23 \end{pmatrix}
\]
Now we use another computation method. We separate the matrices as follows, with the partition (2, 1) on both the rows and the columns of both factors:
\[
\left(\begin{array}{cc|c} 1 & 5 & 2 \\ 8 & 1 & 2 \\ \hline 1 & 3 & 4 \end{array}\right)
\left(\begin{array}{cc|c} 1 & 2 & 0 \\ 2 & 0 & 1 \\ \hline 0 & 3 & 5 \end{array}\right)
\]
and apply our general method of multiplying matrices. First we compute each block entry:
\[
\begin{pmatrix} 1 & 5 \\ 8 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix} + \begin{pmatrix} 2 \\ 2 \end{pmatrix}\begin{pmatrix} 0 & 3 \end{pmatrix} = \begin{pmatrix} 11 & 8 \\ 10 & 22 \end{pmatrix}
\qquad
\begin{pmatrix} 1 & 5 \\ 8 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 2 \\ 2 \end{pmatrix}\begin{pmatrix} 5 \end{pmatrix} = \begin{pmatrix} 15 \\ 11 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix} + \begin{pmatrix} 4 \end{pmatrix}\begin{pmatrix} 0 & 3 \end{pmatrix} = \begin{pmatrix} 7 & 14 \end{pmatrix}
\qquad
\begin{pmatrix} 1 & 3 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 4 \end{pmatrix}\begin{pmatrix} 5 \end{pmatrix} = \begin{pmatrix} 23 \end{pmatrix}
\]
Thus, the final result is \(\begin{pmatrix} 11 & 8 & 15 \\ 10 & 22 & 11 \\ 7 & 14 & 23 \end{pmatrix}\).
It doesn’t matter which partition we use to calculate the product.
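The blockwise computation of Example 40 can be checked directly (a minimal sketch; the helpers `mat_mul` and `mat_add` are ours, not part of the notes):

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    """Add two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 5, 2], [8, 1, 2], [1, 3, 4]]
B = [[1, 2, 0], [2, 0, 1], [0, 3, 5]]

# Blocks for the partition (2, 1) on both rows and columns of both factors.
A11, A12 = [[1, 5], [8, 1]], [[2], [2]]
A21, A22 = [[1, 3]], [[4]]
B11, B12 = [[1, 2], [2, 0]], [[0], [1]]
B21, B22 = [[0, 3]], [[5]]

# The same multiplication rule, applied to the blocks.
C11 = mat_add(mat_mul(A11, B11), mat_mul(A12, B21))
C12 = mat_add(mat_mul(A11, B12), mat_mul(A12, B22))
C21 = mat_add(mat_mul(A21, B11), mat_mul(A22, B21))
C22 = mat_add(mat_mul(A21, B12), mat_mul(A22, B22))

# Reassembling the blocks gives exactly the entrywise product.
blockwise = [C11[0] + C12[0], C11[1] + C12[1], C21[0] + C22[0]]
entrywise = mat_mul(A, B)
```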
A very important view of matrix multiplication comes from viewing a matrix as a row vector of column vectors, or as a column vector of row vectors. We can use block matrices to make this precise. The trivial partition of n is (n), which means no partition at all. The singleton partition of n is (1, 1, 1, · · · , 1), which means separating an n-element set into singletons. Thus, a row vector of column vectors is a block matrix with the trivial partition on its rows and the singleton partition on its columns, and a column vector of row vectors is a block matrix with the trivial partition on its columns and the singleton partition on its rows. Now we use this idea to discuss more about matrix multiplication.
Definition 13. Suppose \(c_1, c_2, \cdots, c_n\) are column vectors over F. A linear combination of the column vectors \(c_1, c_2, \cdots, c_n\) with coefficients \(a_1, a_2, \cdots, a_n\) means the column vector \(c_1 a_1 + c_2 a_2 + \cdots + c_n a_n\). We define linear combinations of row vectors in the same manner: the linear combination of row vectors \(r_1, r_2, \cdots, r_n\) means \(a_1 r_1 + a_2 r_2 + \cdots + a_n r_n\).
 
 
 
Example 41. Over the field Q, suppose \(c_1 = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}\), \(c_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\). Then \(\begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix}\) is a linear combination of \(c_1\) and \(c_2\); the coefficients have only one choice, \(a_1 = 1\), \(a_2 = 1\), which means
\[
\begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
\]
 
 
 
Example 42. Over the field Q, suppose \(c_1 = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}\), \(c_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\). Then \(\begin{pmatrix} 2 \\ 7 \\ 6 \end{pmatrix}\) is not a linear combination of \(c_1\) and \(c_2\), because we cannot find any coefficients that combine to such a vector. (To prove it, show that the entries of any linear combination of \(c_1\) and \(c_2\) must form an arithmetic sequence.)
Example 43. Over the field Q, suppose \(c_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\), \(c_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\), \(c_3 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\). Then \(\begin{pmatrix} 3 \\ 4 \end{pmatrix}\) is a linear combination of \(c_1\), \(c_2\), \(c_3\): it could have coefficients \(a_1 = 3\), \(a_2 = 4\), \(a_3 = 0\), or coefficients \(a_1 = 0\), \(a_2 = 1\), \(a_3 = 3\). There are infinitely many choices of coefficients. In this case we always say that, to represent \(\begin{pmatrix} 3 \\ 4 \end{pmatrix}\), these three vectors are redundant, which means we can choose only two of them, such as \(c_1\) and \(c_2\), and represent \(\begin{pmatrix} 3 \\ 4 \end{pmatrix}\) uniquely as \(3c_1 + 4c_2\).
Proposition 13. Suppose \(A_{m \times n}\) and \(B_{n \times p}\) are two matrices, and let \(C_{m \times p} = AB\). Then each column of C is a linear combination of the columns of A, and each row of C is a linear combination of the rows of B. The coefficients for the i'th row of C come from the i'th row of A; the coefficients for the j'th column of C come from the j'th column of B.
Proof. Suppose
\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},
\qquad
B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix}
\]
To prove the claim, write A as a block matrix with the trivial partition on its rows and the singleton partition on its columns. To simplify notation, denote \(A = (c_1, c_2, \cdots, c_n)\), where the \(c_i\) are column vectors. Now, using block matrix multiplication,
\[
(c_1, c_2, \cdots, c_n)
\begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{pmatrix}
= (d_1, d_2, \cdots, d_p),
\]
where \(d_1 = c_1 b_{11} + c_2 b_{21} + \cdots + c_n b_{n1}\), \(d_2 = c_1 b_{12} + c_2 b_{22} + \cdots + c_n b_{n2}\), and so on. This means the column vectors of the product are linear combinations of the column vectors of A. The rest of the proof we leave to the reader.



Example 44. What is the second column of the matrix product \(\begin{pmatrix} 1 & 3 & 6 \\ 2 & 7 & 8 \\ 1 & 9 & 0 \end{pmatrix}\begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 2 \\ 3 & 4 & 3 \end{pmatrix}\)?
Answer: The second column of the product is a linear combination of the columns of \(\begin{pmatrix} 1 & 3 & 6 \\ 2 & 7 & 8 \\ 1 & 9 & 0 \end{pmatrix}\), and the coefficients come from the second column of \(\begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 2 \\ 3 & 4 & 3 \end{pmatrix}\), that is, \(\begin{pmatrix} 2 \\ 0 \\ 4 \end{pmatrix}\). Thus we compute:
\[
\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \times 2 + \begin{pmatrix} 6 \\ 8 \\ 0 \end{pmatrix} \times 4 = \begin{pmatrix} 26 \\ 36 \\ 2 \end{pmatrix}
\]
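The same computation can be sketched in Python (the helper `mat_mul` is ours, not part of the notes): one column of the product is a linear combination of the columns of the first factor.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 3, 6], [2, 7, 8], [1, 9, 0]]
B = [[1, 2, 0], [0, 0, 2], [3, 4, 3]]

# Coefficients: the second column of B, namely (2, 0, 4).
coeffs = [row[1] for row in B]

# Linear combination of the columns of A with those coefficients.
second_col = [sum(A[i][j] * coeffs[j] for j in range(3)) for i in range(3)]

# The same numbers appear as the second column of the full product.
full = mat_mul(A, B)
```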


Proposition 14 (Transpose of Block Matrix). Suppose \(A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix}\) is a block matrix. Then
\[
A^T = \begin{pmatrix} A_{11}^T & A_{21}^T & \cdots & A_{m1}^T \\ A_{12}^T & A_{22}^T & \cdots & A_{m2}^T \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n}^T & A_{2n}^T & \cdots & A_{mn}^T \end{pmatrix}
\]
This proposition means that before we transpose the position of each block, we should transpose each block itself first.
Example 45.
\[
\begin{pmatrix} 3 & 1 & 2 \\ 2 & 5 & 6 \\ 7 & 1 & 6 \end{pmatrix}^T = \begin{pmatrix} 3 & 2 & 7 \\ 1 & 5 & 1 \\ 2 & 6 & 6 \end{pmatrix}
\]
To illustrate an application of block matrix multiplication, we give a proof of \((AB)^T = B^T A^T\).
Proposition 15. \((AB)^T = B^T A^T\)
Proof. Separate A and B into block matrices: we write A as a column of row vectors and B as a row of column vectors,
\[
A = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{pmatrix}, \qquad B = \begin{pmatrix} c_1 & c_2 & \cdots & c_p \end{pmatrix}
\]
We see that
\[
AB = \begin{pmatrix} r_1 c_1 & r_1 c_2 & \cdots & r_1 c_p \\ r_2 c_1 & r_2 c_2 & \cdots & r_2 c_p \\ \vdots & \vdots & \ddots & \vdots \\ r_m c_1 & r_m c_2 & \cdots & r_m c_p \end{pmatrix}
\]
For a column vector c and a row vector r of the same length, we claim \(rc = c^T r^T\). Assume
\[
c = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \qquad r = \begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix}
\]
We compute \(rc = b_1 a_1 + \cdots + b_n a_n\) and \(c^T r^T = a_1 b_1 + \cdots + a_n b_n\). Because multiplication of numbers is commutative, \(b_1 a_1 + \cdots + b_n a_n = a_1 b_1 + \cdots + a_n b_n\), and so \(rc = c^T r^T\). (Because the transpose of a number is the number itself, we also have \((rc)^T = rc = c^T r^T\).)
Thus,
\[
(AB)^T = \begin{pmatrix} c_1^T r_1^T & c_1^T r_2^T & \cdots & c_1^T r_m^T \\ c_2^T r_1^T & c_2^T r_2^T & \cdots & c_2^T r_m^T \\ \vdots & \vdots & \ddots & \vdots \\ c_p^T r_1^T & c_p^T r_2^T & \cdots & c_p^T r_m^T \end{pmatrix}
\]
On the other hand,
\[
B^T = \begin{pmatrix} c_1^T \\ c_2^T \\ \vdots \\ c_p^T \end{pmatrix}, \qquad A^T = \begin{pmatrix} r_1^T & r_2^T & \cdots & r_m^T \end{pmatrix}
\]
and we see that
\[
B^T A^T = \begin{pmatrix} c_1^T r_1^T & c_1^T r_2^T & \cdots & c_1^T r_m^T \\ c_2^T r_1^T & c_2^T r_2^T & \cdots & c_2^T r_m^T \\ \vdots & \vdots & \ddots & \vdots \\ c_p^T r_1^T & c_p^T r_2^T & \cdots & c_p^T r_m^T \end{pmatrix}
\]
So we conclude \((AB)^T = B^T A^T\).
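The identity can also be checked on a concrete pair of matrices (a sketch; the helpers `mat_mul` and `transpose` are ours, not part of the notes):

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]

A = [[1, 2, 3], [4, 5, 6]]       # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]  # 3 x 2

lhs = transpose(mat_mul(A, B))              # (AB)^T
rhs = mat_mul(transpose(B), transpose(A))   # B^T A^T
# lhs and rhs are the same matrix
```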
1.5. Elementary Matrices, Row and Column Transformations.
We have already seen in the last section that we can view matrix multiplication as taking linear combinations of the column vectors of the first matrix, or of the row vectors of the second matrix, with the coefficients given by the other matrix. This shows that to understand matrix multiplication, we have to study linear combinations of row and column vectors. In this section, we study the most basic linear combinations of rows and columns: row and column transformations.
1.5.1. Elementary Row transformation. We have three types of row transformation.
row switching This transformation switches two rows of a matrix.




Example 46. Switch the 1st and 3rd rows of a matrix:
\[
\begin{pmatrix} 1 & 4 & 8 \\ 2 & 0 & 9 \\ 3 & 3 & 5 \end{pmatrix} \xrightarrow{r_1 \leftrightarrow r_3} \begin{pmatrix} 3 & 3 & 5 \\ 2 & 0 & 9 \\ 1 & 4 & 8 \end{pmatrix}
\]
row multiplying This transformation multiplies some row by a scalar λ.
Example 47. Multiply the 2nd row of a matrix by 2:
\[
\begin{pmatrix} 1 & 4 & 8 \\ 2 & 0 & 9 \\ 3 & 3 & 5 \end{pmatrix} \xrightarrow{2 \times r_2} \begin{pmatrix} 1 & 4 & 8 \\ 4 & 0 & 18 \\ 3 & 3 & 5 \end{pmatrix}
\]
row adding In this transformation, we multiply some row by a scalar and add the result to another row.
Example 48. Add twice the 2nd row to the 3rd row:
\[
\begin{pmatrix} 1 & 4 & 8 \\ 2 & 0 & 9 \\ 3 & 3 & 5 \end{pmatrix} \xrightarrow{r_3 + 2 \times r_2} \begin{pmatrix} 1 & 4 & 8 \\ 2 & 0 & 9 \\ 7 & 3 & 23 \end{pmatrix}
\]
Caution: write 2 × r instead of r × 2. The reason is simple: a scalar is a 1 × 1 matrix, and in this view a scalar can only appear in front of a row vector.
Similarly, we can define the column transformations in the same way.
1.5.2. column transformation. column switching This transformation switches two columns of a matrix.




Example 49. Switch the 1st and 3rd columns of a matrix:
\[
\begin{pmatrix} 1 & 4 & 8 \\ 2 & 0 & 9 \\ 3 & 3 & 5 \end{pmatrix} \xrightarrow{c_1 \leftrightarrow c_3} \begin{pmatrix} 8 & 4 & 1 \\ 9 & 0 & 2 \\ 5 & 3 & 3 \end{pmatrix}
\]
column multiplying This transformation multiplies some column by a scalar λ.
Example 50. Multiply the 2nd column of a matrix by 2:
\[
\begin{pmatrix} 1 & 4 & 8 \\ 2 & 0 & 9 \\ 3 & 3 & 5 \end{pmatrix} \xrightarrow{c_2 \times 2} \begin{pmatrix} 1 & 8 & 8 \\ 2 & 0 & 9 \\ 3 & 6 & 5 \end{pmatrix}
\]
column adding In this transformation, we multiply some column by a scalar and add the result to another column.
Example 51. Add twice the 2nd column to the 3rd column:
\[
\begin{pmatrix} 1 & 4 & 8 \\ 2 & 0 & 9 \\ 3 & 3 & 5 \end{pmatrix} \xrightarrow{c_3 + c_2 \times 2} \begin{pmatrix} 1 & 4 & 16 \\ 2 & 0 & 9 \\ 3 & 3 & 11 \end{pmatrix}
\]
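The three row transformations above can be sketched as small Python functions (ours, not part of the notes; 0-indexed rows, each returning a new matrix):

```python
def row_switch(M, i, j):
    """Return a copy of M with rows i and j switched (0-indexed)."""
    N = [row[:] for row in M]
    N[i], N[j] = N[j], N[i]
    return N

def row_scale(M, i, lam):
    """Return a copy of M with row i multiplied by lam."""
    N = [row[:] for row in M]
    N[i] = [lam * x for x in N[i]]
    return N

def row_add(M, i, j, lam):
    """Return a copy of M with lam times row j added to row i."""
    N = [row[:] for row in M]
    N[i] = [x + lam * y for x, y in zip(N[i], N[j])]
    return N

M = [[1, 4, 8], [2, 0, 9], [3, 3, 5]]
ex46 = row_switch(M, 0, 2)   # Example 46: switch 1st and 3rd rows
ex47 = row_scale(M, 1, 2)    # Example 47: multiply the 2nd row by 2
ex48 = row_add(M, 2, 1, 2)   # Example 48: add twice the 2nd row to the 3rd row
```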
1.5.3. Realization of elementary transformation by matrix multiplication. In the view of the last section, a row transformation is equivalent to a left multiplication, and a column transformation is equivalent to a right multiplication. In order to make this precise, we define the following Elementary Matrices.
Definition 14. The Switching matrix is the matrix obtained by swapping the i'th and j'th rows of the unit matrix. Denote it by \(S_{ij}\):
\[
S_{ij} = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 0 & \cdots & 1 & & \\
& & \vdots & \ddots & \vdots & & \\
& & 1 & \cdots & 0 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}
\]
with the off-diagonal 1s in the (i, j) and (j, i) positions.
Proposition 16. Left multiplying by the switching matrix \(S_{ij}\) will switch the i'th and j'th rows of the matrix.


Example 52. \(\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}\) is obtained by switching the 2nd and 3rd rows of the unit matrix; left multiplying by it will do the same thing to the rows of another matrix. As we compute using the definition of matrix multiplication:
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
=
\begin{pmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \\ 4 & 5 & 6 \end{pmatrix}
\]
The result is exactly switching the 2nd and 3rd rows of \(\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}\).
Proposition 17. Right multiplying by the switching matrix \(S_{ij}\) will switch the i'th and j'th columns of the matrix.

Example 53. \(\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}\) is obtained by switching the 2nd and 3rd columns of the unit matrix; right multiplying by it will do the same thing to the columns of another matrix. As we compute using the definition of matrix multiplication:
\[
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 1 & 3 & 2 \\ 4 & 6 & 5 \\ 7 & 9 & 8 \end{pmatrix}
\]
The result is exactly switching the 2nd and 3rd columns of \(\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}\).
Definition 15. The Multiplying matrix is the matrix obtained by multiplying the i'th row of the unit matrix by a non-zero scalar λ. Denote it by \(M_i(\lambda)\):
\[
M_i(\lambda) = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & \lambda & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}
\]
with λ in the (i, i) position.
Proposition 18. Left multiplying by the multiplying matrix \(M_i(\lambda)\) multiplies the i'th row of the matrix by λ.


Example 54. \(\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}\) is obtained by multiplying the 2nd row of the unit matrix by 3; left multiplying by it will do the same thing to the rows of another matrix. As we compute using the definition of matrix multiplication:
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
=
\begin{pmatrix} 1 & 2 & 3 \\ 12 & 15 & 18 \\ 7 & 8 & 9 \end{pmatrix}
\]
The result is exactly multiplying the 2nd row of \(\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}\) by 3.
Proposition 19. Right multiplying by the multiplying matrix \(M_i(\lambda)\) multiplies the i'th column of the matrix by λ.


Example 55. \(\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}\) is obtained by multiplying the 2nd column of the unit matrix by 3; right multiplying by it will do the same thing to the columns of another matrix. As we compute using the definition of matrix multiplication:
\[
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 6 & 3 \\ 4 & 15 & 6 \\ 7 & 24 & 9 \end{pmatrix}
\]
The result is exactly multiplying the 2nd column of \(\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}\) by 3.
Definition 16. The Addition matrix is the matrix obtained by adding λ times the j'th row to the i'th row of the unit matrix. Denote it by \(A_{ij}(\lambda)\):
\[
A_{ij}(\lambda) = \begin{pmatrix} 1 & & & & \\ & \ddots & & \lambda & \\ & & \ddots & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}
\]
with λ in the (i, j) position.
Proposition 20. Left multiplying by the addition matrix \(A_{ij}(\lambda)\) will add λ times the j'th row to the i'th row.


Example 56. \(\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\) is obtained by adding twice the 2nd row of the unit matrix to the 1st row; left multiplying by it will do the same thing to the rows of another matrix. As we compute using the definition of matrix multiplication:
\[
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
=
\begin{pmatrix} 9 & 12 & 15 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
\]
The result is exactly adding twice the 2nd row to the 1st row of \(\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}\).
Proposition 21. Right multiplying by the addition matrix \(A_{ij}(\lambda)\) will add λ times the i'th column to the j'th column.


Example 57. \(\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\) is obtained by adding twice the 1st column of the unit matrix to the 2nd column; right multiplying by it will do the same thing to the columns of another matrix. As we compute using the definition of matrix multiplication:
\[
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 4 & 3 \\ 4 & 13 & 6 \\ 7 & 22 & 9 \end{pmatrix}
\]
The result is exactly adding twice the 1st column to the 2nd column of \(\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}\).
Caution: when \(A_{ij}(\lambda)\) acts on the left, it adds λ times the j'th row to the i'th row; but when it acts on the right, it adds λ times the i'th column to the j'th column. Keep in mind that the order is different.
As the previous examples show, there is an easy way to remember the operation. When left multiplying, as in AB, think of how to get A from the unit matrix by a row transformation, and then, to get the product, do the same row transformation to B. When right multiplying, as in BA, think of how to get A from the unit matrix by a column transformation, and then, to get the product, do the same column transformation to B.
All the matrices we defined above Sij , Mi (λ), Aij (λ), are called Elementary matrices.
Proposition 22. Elementary matrices are invertible; indeed, \(S_{ij}^{-1} = S_{ij}\), \(M_i(\lambda)^{-1} = M_i(\lambda^{-1})\), and \(A_{ij}(\lambda)^{-1} = A_{ij}(-\lambda)\).
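The elementary matrices and the inverses of Proposition 22 can be checked numerically (a sketch; the constructors `S`, `M`, `A_` and the helpers are ours, not part of the notes):

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def S(n, i, j):
    """Switching matrix: identity with rows i and j swapped."""
    E = identity(n); E[i], E[j] = E[j], E[i]; return E

def M(n, i, lam):
    """Multiplying matrix: identity with (i, i) entry lam."""
    E = identity(n); E[i][i] = lam; return E

def A_(n, i, j, lam):
    """Addition matrix: identity with lam in position (i, j)."""
    E = identity(n); E[i][j] = lam; return E

# Proposition 22: S^-1 = S, M(lam)^-1 = M(1/lam), A(lam)^-1 = A(-lam).
checks = [
    mat_mul(S(3, 0, 2), S(3, 0, 2)),
    mat_mul(M(3, 1, 2), M(3, 1, 0.5)),
    mat_mul(A_(3, 0, 1, 5), A_(3, 0, 1, -5)),
]

# Left multiplying S(3, 0, 2) switches the 1st and 3rd rows (Proposition 16):
swapped = mat_mul(S(3, 0, 2), [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```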
1.5.4. Row echelon form and column echelon form. Now we want to know how far we can go by row transformations. We want to make our matrices look cleaner and better. We start with an example:


\[
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 2 & 4 & 2 & 6 & 18 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix}
\]
Firstly, we use the first row to kill the first entry of the other two rows:
\[
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 2 & 4 & 2 & 6 & 18 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix}
\xrightarrow{\substack{r_2 - 2 \times r_1 \\ r_3 + 1 \times r_1}}
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 0 & 0 & 0 & 2 & 4 \\ 0 & 0 & 1 & 1 & 4 \end{pmatrix}
\]
Now the first column is clear (by clear I mean the column contains only one non-zero element), and fortunately so is the second column. Now in the third column we have two 1s, but we cannot use the first row to clean up the other rows, because doing so would hit the 3-1 and 3-2 entries, which we want to keep 0. Because finally we want the matrix to look like stairs, we arrange the rows from longer to shorter; thus we swap the last two rows:
\[
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 0 & 0 & 0 & 2 & 4 \\ 0 & 0 & 1 & 1 & 4 \end{pmatrix}
\xrightarrow{r_2 \leftrightarrow r_3}
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 0 & 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 2 & 4 \end{pmatrix}
\]
Now we can use the 2nd row to clean up the first row:
\[
\xrightarrow{r_1 - 1 \times r_2}
\begin{pmatrix} 1 & 2 & 0 & 1 & 3 \\ 0 & 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 2 & 4 \end{pmatrix}
\]
Now in the last row the leading number is 2; we can make it 1 by multiplying by \(\frac12\):
\[
\xrightarrow{\frac12 \times r_3}
\begin{pmatrix} 1 & 2 & 0 & 1 & 3 \\ 0 & 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\]
Now use the last row to clean up the first two rows:
\[
\xrightarrow{\substack{r_1 - 1 \times r_3 \\ r_2 - 1 \times r_3}}
\begin{pmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\]
This method works for every matrix. The simplest matrix we can reach looks like the one above. We call it Row Echelon Form.
Definition 17. An m × n matrix is called a Row Echelon Form if it satisfies:
(1) The first non-zero entry of every row vector is 1; we call this the leading 1 (or pivot).
(2) If a row contains a leading 1, then each row below it contains a leading 1 further to the right.
(3) If a column contains a leading 1, then all other entries in that column are 0.
Example 58. Which of the following is a Row Echelon Form?
\[
(1)\ \begin{pmatrix} 1 & 1 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\qquad
(2)\ \begin{pmatrix} 1 & 6 & 0 \\ 0 & 1 & 6 \\ 0 & 0 & 1 \end{pmatrix}
\qquad
(3)\ \begin{pmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]
Answer: The first one is not a row echelon form, because there is a row whose leading non-zero entry is 2. The second one is not a row echelon form, because if we circle the pivots, the second column contains a pivot but also contains another non-zero entry. The third one is a row echelon form, and its pivots are the leading 1s of the first and second rows.
Now let us summarize the steps we did in the first example and use matrix multiplication to represent them explicitly.
The row transformations were:
\[
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 2 & 4 & 2 & 6 & 18 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix}
\xrightarrow{\substack{r_2 - 2 \times r_1 \\ r_3 + 1 \times r_1}}
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 0 & 0 & 0 & 2 & 4 \\ 0 & 0 & 1 & 1 & 4 \end{pmatrix}
\xrightarrow{r_2 \leftrightarrow r_3}
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 0 & 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 2 & 4 \end{pmatrix}
\]
\[
\xrightarrow{r_1 - 1 \times r_2}
\begin{pmatrix} 1 & 2 & 0 & 1 & 3 \\ 0 & 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 2 & 4 \end{pmatrix}
\xrightarrow{\frac12 \times r_3}
\begin{pmatrix} 1 & 2 & 0 & 1 & 3 \\ 0 & 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\xrightarrow{\substack{r_1 - 1 \times r_3 \\ r_2 - 1 \times r_3}}
\begin{pmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\]
Combined with what we learned about the meaning of left multiplying by the elementary matrices, we can interpret every step in the row-transformation language as left multiplying by a matrix. For instance, the first two steps are the same as the equalities
\[
A_{21}(-2) \begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 2 & 4 & 2 & 6 & 18 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 0 & 0 & 0 & 2 & 4 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix},
\]
\[
A_{31}(1) \begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 0 & 0 & 0 & 2 & 4 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 0 & 0 & 0 & 2 & 4 \\ 0 & 0 & 1 & 1 & 4 \end{pmatrix},
\]
and so on for the remaining steps. So the whole process is the same as the equality
\[
A_{23}(-1)\, A_{13}(-1)\, M_3(\tfrac12)\, A_{12}(-1)\, S_{23}\, A_{31}(1)\, A_{21}(-2)
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 2 & 4 & 2 & 6 & 18 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix}
=
\begin{pmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\]
Remark 1. Whenever you see a statement saying that there exists an invertible matrix P such that PM = N, it is equivalent to saying that N can be obtained from M by row transformations; sometimes the latter statement is easier to prove. Whenever you see a statement saying that there exists an invertible matrix P such that MP = N, it is equivalent to saying that N can be obtained from M by column transformations.

Proposition 23. For every matrix \(A_{m \times n}\), there exists an invertible matrix \(P_{m \times m}\) such that PA is a row echelon form. In this case, PA is called the reduced row echelon form of A, denoted rref(A).
Returning to the example, we now compute the product of the elementary matrices:
\[
P = A_{23}(-1)\, A_{13}(-1)\, M_3(\tfrac12)\, A_{12}(-1)\, S_{23}\, A_{31}(1)\, A_{21}(-2)
= \begin{pmatrix} 1 & -\frac12 & -1 \\ 2 & -\frac12 & 1 \\ -1 & \frac12 & 0 \end{pmatrix}
\]
So our conclusion is:
\[
\begin{pmatrix} 1 & -\frac12 & -1 \\ 2 & -\frac12 & 1 \\ -1 & \frac12 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 2 & 4 & 2 & 6 & 18 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix}
=
\begin{pmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\]
Definition 18. For matrix Am×n , the rank of the matrix A is defined to be the number of non-zero rows of
rref(A)
Proposition 24. The rank of A is at most the number of non-zero rows of A


Example 59. Find the rank of \(A = \begin{pmatrix} 2 & 1 & 6 \\ 2 & 9 & 1 \\ 2 & 2 & 0 \end{pmatrix}\).
\[
\begin{pmatrix} 2 & 1 & 6 \\ 2 & 9 & 1 \\ 2 & 2 & 0 \end{pmatrix}
\xrightarrow{r_2 - 1 \times r_1}
\begin{pmatrix} 2 & 1 & 6 \\ 0 & 8 & -5 \\ 2 & 2 & 0 \end{pmatrix}
\xrightarrow{r_3 - 1 \times r_1}
\begin{pmatrix} 2 & 1 & 6 \\ 0 & 8 & -5 \\ 0 & 1 & -6 \end{pmatrix}
\xrightarrow{\frac12 \times r_1}
\begin{pmatrix} 1 & \frac12 & 3 \\ 0 & 8 & -5 \\ 0 & 1 & -6 \end{pmatrix}
\]
\[
\xrightarrow{r_2 \leftrightarrow r_3}
\begin{pmatrix} 1 & \frac12 & 3 \\ 0 & 1 & -6 \\ 0 & 8 & -5 \end{pmatrix}
\xrightarrow{r_1 - \frac12 \times r_2}
\begin{pmatrix} 1 & 0 & 6 \\ 0 & 1 & -6 \\ 0 & 8 & -5 \end{pmatrix}
\xrightarrow{r_3 - 8 \times r_2}
\begin{pmatrix} 1 & 0 & 6 \\ 0 & 1 & -6 \\ 0 & 0 & 43 \end{pmatrix}
\]
\[
\xrightarrow{\frac{1}{43} \times r_3}
\begin{pmatrix} 1 & 0 & 6 \\ 0 & 1 & -6 \\ 0 & 0 & 1 \end{pmatrix}
\xrightarrow{r_1 - 6 \times r_3}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -6 \\ 0 & 0 & 1 \end{pmatrix}
\xrightarrow{r_2 + 6 \times r_3}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
Thus, the rank of A is 3, and \(\operatorname{rref}(A) = I_3\).
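The reduction algorithm of this section can be sketched in Python using exact rational arithmetic (the functions `rref` and `rank` are ours, not part of the notes; they apply only the three elementary row transformations):

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form, using only the three row transformations."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivot_row = 0
    for col in range(cols):
        # find a row at or below pivot_row with a non-zero entry in this column
        pr = next((r for r in range(pivot_row, rows) if R[r][col] != 0), None)
        if pr is None:
            continue
        R[pivot_row], R[pr] = R[pr], R[pivot_row]          # row switching
        lead = R[pivot_row][col]
        R[pivot_row] = [x / lead for x in R[pivot_row]]    # row multiplying
        for r in range(rows):                              # row adding
            if r != pivot_row and R[r][col] != 0:
                f = R[r][col]
                R[r] = [x - f * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return R

def rank(M):
    """Number of non-zero rows of rref(M) (Definition 18)."""
    return sum(1 for row in rref(M) if any(x != 0 for x in row))

A = [[2, 1, 6], [2, 9, 1], [2, 2, 0]]   # Example 59: rank 3, rref(A) = I_3
```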
Now we write this into the language of matrix multiplication. The nine row transformations above correspond to left multiplication by elementary matrices, so the whole computation is the equality
\[
A_{23}(6)\, A_{13}(-6)\, M_3(\tfrac{1}{43})\, A_{32}(-8)\, A_{12}(-\tfrac12)\, S_{23}\, M_1(\tfrac12)\, A_{31}(-1)\, A_{21}(-1)
\begin{pmatrix} 2 & 1 & 6 \\ 2 & 9 & 1 \\ 2 & 2 & 0 \end{pmatrix}
= I_3
\]
Now we would like to compute the product of these elementary matrices. It is the same as doing the same row transformations to the identity matrix:
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\xrightarrow{r_2 - 1 \times r_1}
\begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\xrightarrow{r_3 - 1 \times r_1}
\begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}
\xrightarrow{\frac12 \times r_1}
\begin{pmatrix} \frac12 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}
\]
\[
\xrightarrow{r_2 \leftrightarrow r_3}
\begin{pmatrix} \frac12 & 0 & 0 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix}
\xrightarrow{r_1 - \frac12 \times r_2}
\begin{pmatrix} 1 & 0 & -\frac12 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix}
\xrightarrow{r_3 - 8 \times r_2}
\begin{pmatrix} 1 & 0 & -\frac12 \\ -1 & 0 & 1 \\ 7 & 1 & -8 \end{pmatrix}
\]
\[
\xrightarrow{\frac{1}{43} \times r_3}
\begin{pmatrix} 1 & 0 & -\frac12 \\ -1 & 0 & 1 \\ \frac{7}{43} & \frac{1}{43} & -\frac{8}{43} \end{pmatrix}
\xrightarrow{r_1 - 6 \times r_3}
\begin{pmatrix} \frac{1}{43} & -\frac{6}{43} & \frac{53}{86} \\ -1 & 0 & 1 \\ \frac{7}{43} & \frac{1}{43} & -\frac{8}{43} \end{pmatrix}
\xrightarrow{r_2 + 6 \times r_3}
\begin{pmatrix} \frac{1}{43} & -\frac{6}{43} & \frac{53}{86} \\ -\frac{1}{43} & \frac{6}{43} & -\frac{5}{43} \\ \frac{7}{43} & \frac{1}{43} & -\frac{8}{43} \end{pmatrix}
\]
Thus we know
\[
\begin{pmatrix} \frac{1}{43} & -\frac{6}{43} & \frac{53}{86} \\ -\frac{1}{43} & \frac{6}{43} & -\frac{5}{43} \\ \frac{7}{43} & \frac{1}{43} & -\frac{8}{43} \end{pmatrix}
\begin{pmatrix} 2 & 1 & 6 \\ 2 & 9 & 1 \\ 2 & 2 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
Note that by the definition of the inverse matrix, this exactly means
\[
A^{-1} = \begin{pmatrix} \frac{1}{43} & -\frac{6}{43} & \frac{53}{86} \\ -\frac{1}{43} & \frac{6}{43} & -\frac{5}{43} \\ \frac{7}{43} & \frac{1}{43} & -\frac{8}{43} \end{pmatrix}
\]
We call a square matrix A of full rank if the rank of A is equal to the size of A.
Think about the echelon form: suppose A is a full-rank n × n square matrix. A being full rank means rank(A) = n, so rref(A) has n leading 1s; but the matrix is square, which forces the leading 1s to lie on the diagonal, and because a leading 1 clears all other entries in its column, the reduced row echelon form of a full-rank square matrix must be the unit matrix. Thus, recording our row transformations in a matrix P, we can find an invertible matrix P such that PA = I, so A is invertible. This means a full-rank matrix is invertible. On the other hand, if A is invertible, there exists some P such that PA = I; but this just means that doing some row transformations to A gives the unit matrix, a row echelon form, so rref(A) = I, and in this way rank(A) = n.
Proposition 25. For a square matrix A, A is invertible if and only if A is full rank.
This process gives a method to find the inverse of A, which is to apply the same row transformations to the unit matrix. People often use this method to do the two steps simultaneously, as follows:
\[
\left(\begin{array}{ccc|ccc} 2 & 1 & 6 & 1 & 0 & 0 \\ 2 & 9 & 1 & 0 & 1 & 0 \\ 2 & 2 & 0 & 0 & 0 & 1 \end{array}\right)
\xrightarrow{\substack{r_2 - 1 \times r_1 \\ r_3 - 1 \times r_1}}
\left(\begin{array}{ccc|ccc} 2 & 1 & 6 & 1 & 0 & 0 \\ 0 & 8 & -5 & -1 & 1 & 0 \\ 0 & 1 & -6 & -1 & 0 & 1 \end{array}\right)
\xrightarrow{\frac12 \times r_1}
\left(\begin{array}{ccc|ccc} 1 & \frac12 & 3 & \frac12 & 0 & 0 \\ 0 & 8 & -5 & -1 & 1 & 0 \\ 0 & 1 & -6 & -1 & 0 & 1 \end{array}\right)
\]
\[
\xrightarrow{r_2 \leftrightarrow r_3}
\left(\begin{array}{ccc|ccc} 1 & \frac12 & 3 & \frac12 & 0 & 0 \\ 0 & 1 & -6 & -1 & 0 & 1 \\ 0 & 8 & -5 & -1 & 1 & 0 \end{array}\right)
\xrightarrow{r_1 - \frac12 \times r_2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 6 & 1 & 0 & -\frac12 \\ 0 & 1 & -6 & -1 & 0 & 1 \\ 0 & 8 & -5 & -1 & 1 & 0 \end{array}\right)
\xrightarrow{r_3 - 8 \times r_2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 6 & 1 & 0 & -\frac12 \\ 0 & 1 & -6 & -1 & 0 & 1 \\ 0 & 0 & 43 & 7 & 1 & -8 \end{array}\right)
\]
\[
\xrightarrow{\frac{1}{43} \times r_3}
\left(\begin{array}{ccc|ccc} 1 & 0 & 6 & 1 & 0 & -\frac12 \\ 0 & 1 & -6 & -1 & 0 & 1 \\ 0 & 0 & 1 & \frac{7}{43} & \frac{1}{43} & -\frac{8}{43} \end{array}\right)
\xrightarrow{\substack{r_1 - 6 \times r_3 \\ r_2 + 6 \times r_3}}
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{1}{43} & -\frac{6}{43} & \frac{53}{86} \\ 0 & 1 & 0 & -\frac{1}{43} & \frac{6}{43} & -\frac{5}{43} \\ 0 & 0 & 1 & \frac{7}{43} & \frac{1}{43} & -\frac{8}{43} \end{array}\right)
\]
When you put the unit matrix on the right and do row transformations to reduce the left matrix to the unit matrix, the inverse of the matrix appears on the right.
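The augmented-matrix method can be sketched in Python with exact rationals (the function `inverse` is ours, not part of the notes; it assumes the input matrix is of full rank):

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix by row-reducing the augmented matrix [M | I]."""
    n = len(M)
    # augment with the unit matrix on the right
    R = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pr = next(r for r in range(col, n) if R[r][col] != 0)  # assumes full rank
        R[col], R[pr] = R[pr], R[col]                          # row switching
        lead = R[col][col]
        R[col] = [x / lead for x in R[col]]                    # row multiplying
        for r in range(n):                                     # row adding
            if r != col and R[r][col] != 0:
                f = R[r][col]
                R[r] = [x - f * y for x, y in zip(R[r], R[col])]
    # the inverse appears on the right half
    return [row[n:] for row in R]

A = [[2, 1, 6], [2, 9, 1], [2, 2, 0]]
Ainv = inverse(A)   # matches the hand computation above
```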
The row echelon form is the simplest matrix we can reach by row transformations. But remember that we can still do column transformations; if we continue with column transformations, things become even cleaner. Let us continue the earlier example and see what happens if we do column transformations.


\[
\begin{pmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\xrightarrow{c_2 - c_1 \times 2}
\begin{pmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\xrightarrow{c_5 - c_1 \times 1}
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\xrightarrow{c_5 - c_3 \times 2}
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 2 \end{pmatrix}
\]
\[
\xrightarrow{c_5 - c_4 \times 2}
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}
\xrightarrow{c_2 \leftrightarrow c_3}
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}
\xrightarrow{c_3 \leftrightarrow c_4}
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}
\]
And each step is the same as right multiplying by a 5 × 5 matrix. If you want to figure out what the matrix is, just do the same transformations to the unit matrix. This matrix is given by
\[
Q = \begin{pmatrix}
1 & 0 & 0 & -2 & -1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & -2 \\
0 & 0 & 1 & 0 & -2 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]
Combined with all we discussed, our conclusion is
\[
\begin{pmatrix} 1 & -\frac12 & -1 \\ 2 & -\frac12 & 1 \\ -1 & \frac12 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 & 2 & 7 \\ 2 & 4 & 2 & 6 & 18 \\ -1 & -2 & 0 & -1 & -3 \end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & -2 & -1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & -2 \\
0 & 0 & 1 & 0 & -2 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}
\]
Now we write the right-hand side as a block matrix:
\[
\begin{pmatrix} I_3 & 0_{3 \times 2} \end{pmatrix}
\]
This is the simplest form we can reach by doing both row and column transformations. The number of 1s at the end equals the number of non-zero rows in the reduced row echelon form, so the number of 1s here reflects the rank. We get this result because the matrix we are using is of rank 3. If we start with a matrix of rank 2, we will get \(\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}\). We state this conclusion as a proposition.
Proposition 26. Suppose \(A_{m \times n}\) is of rank r. Then there exist two invertible square matrices \(P_{m \times m}\) and \(Q_{n \times n}\) such that
\[
PAQ = \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix}
\]
Here 0 means a matrix with all elements 0.
1.5.5. Some properties of rank. Now we discuss some useful properties of rank. Remember that the rank is defined to be the number of non-zero rows of the reduced row echelon form of A, and it is also the number of 1s when both row and column transformations are allowed.
Proposition 27. rank(A) = rank(AT )
Proof. Suppose rank(A) = r. Then we can find invertible matrices P, Q such that
\[
PAQ = \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix}
\]
Now we transpose both sides:
\[
Q^T A^T P^T = \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix}^T
= \begin{pmatrix} I_r & 0_{r \times (m-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (m-r)} \end{pmatrix}
\]
Because \(Q^T\) and \(P^T\) are invertible, rank(\(A^T\)) = r.
Proposition 28. If B is an invertible matrix, then rank(BA) = rank(A), rank(AB) = rank(A)
Proof. We prove the first equality. Suppose rank(A) = r; then there exist invertible square matrices P, Q such that
\[
PAQ = \begin{pmatrix} I_r & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{pmatrix}
\]
Now, because \(PAQ = (PB^{-1})(BA)Q\), the matrix \(PB^{-1}\) can serve as the "P" for BA, and Q can serve as the "Q" for BA, so rank(BA) = r. The same method works for AB.
Proposition 29. The rank of A is at most the number of non-zero rows or the number of non-zero columns
Proof. We just run our algorithm of reducing to echelon form on the non-zero rows, and thus the number of non-zero rows in the reduced row echelon form is no more than the number of original non-zero rows.
Because the ranks of A and its transpose are the same, the proposition also holds for columns.
Proposition 30. Consider the block matrix \(\begin{pmatrix} A & B \end{pmatrix}\); then \(\operatorname{rank}\begin{pmatrix} A & B \end{pmatrix} \geq \max\{\operatorname{rank}(A), \operatorname{rank}(B)\}\).
Proof. We only have to prove \(\operatorname{rank}(A) \leq \operatorname{rank}\begin{pmatrix} A & B \end{pmatrix}\); if that is true, the same method also works for B. Do row transformations so that PA is a reduced echelon form, that is, \(P \begin{pmatrix} A & B \end{pmatrix} = \begin{pmatrix} PA & PB \end{pmatrix}\), where PA is a reduced echelon form with r non-zero rows. Below the r'th row of \(\begin{pmatrix} PA & PB \end{pmatrix}\), every non-zero entry comes from the PB part, so it lies further to the right than the r'th leading 1. Continuing the algorithm, we will finally have no fewer than r leading 1s, so the rank of \(\begin{pmatrix} A & B \end{pmatrix}\) is at least the rank of A.
We list the following facts and leave them as exercises for the reader.
Proposition 31. If \(A_{m \times n}\) and \(B_{n \times q}\) are two matrices, then \(\operatorname{rank}(AB) \leq \min\{\operatorname{rank}(A), \operatorname{rank}(B)\}\)
Proposition 32. If A and B are two matrices of the same size, then \(\operatorname{rank}(A + B) \leq \operatorname{rank}(A) + \operatorname{rank}(B)\)
1.5.6. Block Row and Column Transformations. If you separate a matrix into blocks, can you still do row and column transformations? The answer is yes. We define the following elementary block matrices.
We have three types of block row transformation.
block row switching This transformation switches two block rows of a matrix.




Example 60. Switch the 1st and 2nd block rows of a matrix (row partition (2, 1)):
\[
\left(\begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \hline 7 & 8 & 9 \end{array}\right)
\xrightarrow{br_1 \leftrightarrow br_2}
\left(\begin{array}{ccc} 7 & 8 & 9 \\ \hline 1 & 2 & 3 \\ 4 & 5 & 6 \end{array}\right)
\]
block row multiplying This transformation left multiplies some block row by an invertible matrix.
Example 61. Left multiply the 1st block row of a matrix by \(\begin{pmatrix} 1 & 3 \\ 1 & 2 \end{pmatrix}\) (note this is invertible):
\[
\left(\begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \hline 7 & 8 & 9 \end{array}\right)
\xrightarrow{\begin{pmatrix} 1 & 3 \\ 1 & 2 \end{pmatrix} \times br_1}
\left(\begin{array}{ccc} 13 & 17 & 21 \\ 9 & 12 & 15 \\ \hline 7 & 8 & 9 \end{array}\right)
\]
block row adding In this transformation, we left multiply some block row by a matrix and add the result to another block row.
Example 62. Add \(\begin{pmatrix} 1 \\ 2 \end{pmatrix}\) times the 2nd block row to the 1st block row (note: left multiply):
\[
\left(\begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \hline 7 & 8 & 9 \end{array}\right)
\xrightarrow{br_1 + \begin{pmatrix} 1 \\ 2 \end{pmatrix} \times br_2}
\left(\begin{array}{ccc} 8 & 10 & 12 \\ 18 & 21 & 24 \\ \hline 7 & 8 & 9 \end{array}\right)
\]
Remark 2. Note that when we do row transformations, we multiply everything on the left; when we do column transformations, we multiply on the right.
Similarly, we can define the block column transformations in the same way.
1.5.7. block column transformation. column switching This transformation switches two block columns of a matrix.
Example 63. Switch the 1st and 2nd block columns of a matrix (column partition (1, 2)):
\[
\left(\begin{array}{c|cc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array}\right)
\xrightarrow{bc_1 \leftrightarrow bc_2}
\left(\begin{array}{cc|c} 2 & 3 & 1 \\ 5 & 6 & 4 \\ 8 & 9 & 7 \end{array}\right)
\]
column multiplying This transformation right multiplies some block column by an invertible matrix.
Example 64. Right multiply the 2nd block column by \(\begin{pmatrix} 1 & 3 \\ 1 & 2 \end{pmatrix}\):
\[
\left(\begin{array}{c|cc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array}\right)
\xrightarrow{bc_2 \times \begin{pmatrix} 1 & 3 \\ 1 & 2 \end{pmatrix}}
\left(\begin{array}{c|cc} 1 & 5 & 12 \\ 4 & 11 & 27 \\ 7 & 17 & 42 \end{array}\right)
\]
column adding In this transformation, we right multiply some block column by a matrix and add the result to another block column. The examples are left to the reader.
Similarly, we also have the following elementary block matrices. Suppose we fix a partition P; then an elementary block matrix is a square matrix with the same partition \(P = (n_1, n_2, \cdots, n_k)\) on its columns and rows, so the diagonal blocks of these block matrices are square.
Definition 19. The Block Switching matrix is the matrix obtained by swapping the i'th and j'th block rows of the unit matrix. Denote it by \(S_{ij}\):
\[
S_{ij} = \begin{pmatrix}
I & & & & & & \\
& \ddots & & & & & \\
& & 0 & \cdots & I & & \\
& & \vdots & \ddots & \vdots & & \\
& & I & \cdots & 0 & & \\
& & & & & \ddots & \\
& & & & & & I
\end{pmatrix}
\]
where each I is of the appropriate size \(n_k\).
Proposition 33. If the row partition of the block matrix A is the same as P, then left multiplying switching
matrix Sij will switch i’th and j’th block rows of A.
Proposition 34. If the column partition of the block matrix A is the same as P, then right multiplying
switching matrix Sij will switch i’th and j’th block column of A.
Definition 20. The Block Multiplying matrix is the matrix obtained by left multiplying the i'th block row of the unit matrix by an invertible matrix \(P_{n_i \times n_i}\). Denote it by \(M_i(P)\):
\[
M_i(P) = \begin{pmatrix} I & & & & \\ & \ddots & & & \\ & & P & & \\ & & & \ddots & \\ & & & & I \end{pmatrix}
\]
with P in the i'th diagonal block.
Proposition 35. If the row partition of the block matrix A is the same as P, then left multiplying by the multiplying matrix \(M_i(P)\) left multiplies the i'th block row of the matrix by P.
Proposition 36. If the column partition of the block matrix A is the same as P, then right multiplying by the multiplying matrix \(M_i(P)\) right multiplies the i'th block column of the matrix by P.
Definition 21. The Block Addition matrix is the matrix obtained by adding the j'th block row, left multiplied by a matrix \(M_{n_i \times n_j}\), to the i'th block row of the unit matrix. Denote it by \(A_{ij}(M)\):
\[
A_{ij}(M) = \begin{pmatrix} I & & & & \\ & \ddots & & M & \\ & & \ddots & & \\ & & & \ddots & \\ & & & & I \end{pmatrix}
\]
with M in the (i, j) block position.
Proposition 37. If the row partition of the block matrix A is the same as P, then left multiplying addition
matrix Aij (M ) will add M times j’th block row to the i’th block row.
Proposition 38. If the column partition of the block matrix A is the same as P, then right multiplying
addition matrix Aij (M ) will add i’th block column times M to the j’th block column.
Proposition 39. Elementary Block Matrices are invertible.
Proposition 40. Elementary Block row or column Transformation does not change the rank.
Example 65. To show an application of block matrices, suppose A and B are n × n square matrices; we show the identity \(\operatorname{rank}(I - AB) = \operatorname{rank}(I - BA)\).
Proof. We consider the 2n × 2n matrix blocked by the partition (n, n), that is, \(\begin{pmatrix} I & A \\ B & I \end{pmatrix}\).
\[
\begin{pmatrix} I & A \\ B & I \end{pmatrix}
\xrightarrow{bc_2 - bc_1 \times A}
\begin{pmatrix} I & 0 \\ B & I - BA \end{pmatrix}
\xrightarrow{br_2 - B \times br_1}
\begin{pmatrix} I & 0 \\ 0 & I - BA \end{pmatrix}
\]
From here we know
\[
\operatorname{rank}\begin{pmatrix} I & A \\ B & I \end{pmatrix} = \operatorname{rank}\begin{pmatrix} I & 0 \\ 0 & I - BA \end{pmatrix} = n + \operatorname{rank}(I - BA)
\]
(Remember that in the homework we proved \(\operatorname{rank}\begin{pmatrix} P & 0 \\ 0 & Q \end{pmatrix} = \operatorname{rank}(P) + \operatorname{rank}(Q)\).) We can also do another transformation:
\[
\begin{pmatrix} I & A \\ B & I \end{pmatrix}
\xrightarrow{bc_1 - bc_2 \times B}
\begin{pmatrix} I - AB & A \\ 0 & I \end{pmatrix}
\xrightarrow{br_1 - A \times br_2}
\begin{pmatrix} I - AB & 0 \\ 0 & I \end{pmatrix}
\]
From here we know
\[
\operatorname{rank}\begin{pmatrix} I & A \\ B & I \end{pmatrix} = \operatorname{rank}\begin{pmatrix} I - AB & 0 \\ 0 & I \end{pmatrix} = \operatorname{rank}(I - AB) + n
\]
So n + rank(I − BA) = n + rank(I − AB), and thus \(\operatorname{rank}(I - AB) = \operatorname{rank}(I - BA)\).
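The identity of Example 65 can be checked numerically on a concrete pair of matrices (a sketch; the helpers `rank`, `mat_mul`, `identity`, `mat_sub` and the choice of A and B are ours, not part of the notes):

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination over exact rationals."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(R), len(R[0]), 0
    for c in range(cols):
        pr = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if pr is None:
            continue
        R[r], R[pr] = R[pr], R[r]
        for i in range(r + 1, rows):
            f = R[i][c] / R[r][c]
            R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        r += 1
    return r

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def mat_sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [0, 0]]
B = [[1, 0], [0, 0]]
I = identity(2)
lhs = rank(mat_sub(I, mat_mul(A, B)))   # rank(I - AB)
rhs = rank(mat_sub(I, mat_mul(B, A)))   # rank(I - BA)
# lhs == rhs, as the proposition predicts
```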
1.6. Determinant.
Introduction: If you think of the row vectors of a matrix as the coordinates of vectors in space, then the geometric meaning of the rank of the matrix is the dimension of the parallelepiped spanned by them. But we do not only care about the dimension; sometimes we want its volume. We will find that the determinant is exactly the right value for this. Historically speaking, the determinant determines when a system of equations with n unknowns and n equalities has a unique solution. In our matrix language, the determinant determines when a matrix is invertible. You will find that the determinant gives a fast way to determine when a matrix is invertible and what its inverse is. In the 2 × 2 and 3 × 3 cases it is fast, but in the 4 × 4 or higher-dimensional cases, the method we learned before might be the fastest way.
We denote the set of all m × n matrices over F by \(M_{m \times n}\).
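The three defining properties below also give an efficient way to compute a determinant by row reduction (triangularize, then multiply the diagonal). Here is a sketch in Python (the function `det` is ours, not part of the notes):

```python
from fractions import Fraction

def det(M):
    """Determinant via row reduction, using the three defining properties:
    switching rows flips the sign, scaling a row scales the value,
    and adding a multiple of one row to another changes nothing."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    sign = 1
    for c in range(n):
        pr = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pr is None:
            return Fraction(0)         # no pivot in this column: determinant is 0
        if pr != c:
            A[c], A[pr] = A[pr], A[c]  # row switching: sign flips
            sign = -sign
        for r in range(c + 1, n):      # row adding: value unchanged
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    result = Fraction(sign)
    for i in range(n):                 # determinant of a triangular matrix
        result *= A[i][i]
    return result
```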
Theorem 4. There exists function f defined on Mn×n . That is assign every n × n matrix over F a value in
F. which saitisfying:
(1) If we switch ith and jth rows of B, then the value becomes to opposite. namely
f (Sij B) = −f (B)
(2) If we multiplied a row of B by λ, then the value is also multiplied by λ(here lambda could be 0).
namely
f (Mi (λ)B) = λf (B)
(3) If we add any scalar times jth row to ith row of B, then the value doesn’t change. namely
f (Aij (λ)B) = f (B)
Then this kind of function exists, and any two such function is differ by a scalar multiple.
As any such two functions differ by a scalar multiple, this means whenever we defines the value of unit
matrix, then the function is uniquely determined. This let us able to make following definition.
Definition 22. The determinant is the unique function on n × n matrices which assigns every n × n matrix M over F a value in F, denoted by |M | or det(M ), satisfying:
(1) If we switch the ith and jth rows of a matrix, then the determinant changes sign. Explicitly:

| a11 a12 a13 · · · a1n |       | a11 a12 a13 · · · a1n |
| ··· ··· ··· ··· ···   |       | ··· ··· ··· ··· ···   |
| ai1 ai2 ai3 · · · ain |       | aj1 aj2 aj3 · · · ajn |
| ··· ··· ··· ··· ···   |  = −  | ··· ··· ··· ··· ···   |
| aj1 aj2 aj3 · · · ajn |       | ai1 ai2 ai3 · · · ain |
| ··· ··· ··· ··· ···   |       | ··· ··· ··· ··· ···   |
| an1 an2 an3 · · · ann |       | an1 an2 an3 · · · ann |

(2) If we multiply any row by λ, then the determinant is also multiplied by λ (here λ could be 0). Explicitly:

| a11  a12  · · · a1n  |       | a11 a12 · · · a1n |
| ···  ···  ··· ···    |       | ··· ··· ··· ···   |
| λai1 λai2 · · · λain |  = λ  | ai1 ai2 · · · ain |
| ···  ···  ··· ···    |       | ··· ··· ··· ···   |
| an1  an2  · · · ann  |       | an1 an2 · · · ann |

(3) If we add any scalar multiple of the jth row to the ith row, then the determinant does not change. Explicitly:

| a11        a12        · · · a1n        |     | a11 a12 · · · a1n |
| ···        ···        ···   ···        |     | ··· ··· ··· ···   |
| ai1 + λaj1 ai2 + λaj2 · · · ain + λajn |     | ai1 ai2 · · · ain |
| ···        ···        ···   ···        |  =  | ··· ··· ··· ···   |
| aj1        aj2        · · · ajn        |     | aj1 aj2 · · · ajn |
| ···        ···        ···   ···        |     | ··· ··· ··· ···   |
| an1        an2        · · · ann        |     | an1 an2 · · · ann |

(4) The determinant of the unit matrix is 1.
In other words, the determinant is the unique function satisfying det(Sij B) = − det(B); det(Mi (λ)B) = λ det(B); det(Aij (λ)B) = det(B); det(In ) = 1.
This definition is essentially what the determinant is, and all other properties of the determinant can be deduced from its uniqueness.
Example 66. Compute the determinant of the matrix [ 2 3 ; 4 5 ]. We proceed as follows:

| 2 3 |
| 4 5 |
    | 2  3 |
  = | 0 −1 |              · · · · · · r2 + (−2) × r1
    | 2  0 |
  = | 0 −1 |              · · · · · · r1 + 3 × r2
                | 1 0 |
  = 2 × (−1) ×  | 0 1 |
  = 2 × (−1) × 1
  = −2
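The row-transformation recipe of this example can be turned into a small routine. This is our own sketch, not part of the notes: `det` eliminates below each pivot (rule (3), which changes nothing), tracks the sign flips from row swaps (rule (1)), and multiplies the diagonal of the resulting upper triangular matrix.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination, using the three defining rules."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    sign = 1
    for c in range(n):
        piv = next((i for i in range(c, n) if A[i][c] != 0), None)
        if piv is None:
            return Fraction(0)      # no pivot in this column: determinant is 0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            sign = -sign            # rule (1): a swap flips the sign
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]   # rule (3): row addition, determinant unchanged
            A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    prod = Fraction(sign)
    for i in range(n):              # upper triangular: multiply the diagonal
        prod *= A[i][i]
    return prod

print(det([[2, 3], [4, 5]]))   # -2, matching Example 66
```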
Proposition 41. If one row is a linear combination of the other rows, then the determinant is 0.
This is a skill for when you can see it, but don’t worry if you can’t: you can calculate every determinant just by row transformations. (Column transformations are also valid; we will talk about them in the next few propositions.)

Example 67. Calculate | 2 3 ; 4 6 |.
Someone can see that the second row is twice the first row, so the determinant should be 0. But one can also calculate:

| 2 3 |     | 2 3 |                                  | 2 3 |
| 4 6 |  =  | 0 0 |   · · · r2 + (−2) × r1   =  0 ×  | ∗ ∗ |  =  0

where ∗ can be anything.
Anyway, if one row of a determinant is a zero row, the determinant is just 0.
Example 68. Calculate | 2 3 4 ; 1 1 1 ; 7 8 9 |.
Someone can see that the 3rd row is the first row plus five times the second row, so the determinant should be 0. But one can also calculate:

| 2 3 4 |     | 2 3 4 |                              | 2 3 4 |
| 1 1 1 |  =  | 1 1 1 |   · · · r3 + (−1) × r1   =   | 1 1 1 |   · · · r3 + (−5) × r2   =  0
| 7 8 9 |     | 5 5 5 |                              | 0 0 0 |

because the last row of the determinant is a zero row.
Proof. If one row is a linear combination of the other rows, row additions turn it into a zero row without changing the determinant, and factoring 0 out of that row shows the determinant is 0.
Proposition 42. The determinant of a diagonal matrix is the product of all diagonal elements.

Proof.

| a11             |         | 1               |
|     a22         |         |    a22          |
|         a33     | = a11 × |        a33      | = · · · = a11 a22 a33 · · · ann × det(In ) = a11 a22 a33 · · · ann
|            ...  |         |            ...  |
|             ann |         |             ann |

each step factoring one diagonal entry out of its row.

Proposition 43. The determinant of an upper triangular matrix is the product of the diagonal elements, that is,

| a11 a12 a13 · · · a1n |
|     a22 a23 · · · a2n |
|          ...          |  = a11 a22 a33 · · · ann
|                   ann |

Proof. We can transform an upper triangular matrix into a diagonal matrix by only row additions, with the diagonal elements kept. But row additions do not change the determinant, so it is the same as the determinant of the diagonal matrix.

Example 69. Calculate | 1 4 5 ; 0 2 6 ; 0 0 9 |. Answer: 1 × 2 × 9 = 18.
Proposition 44. The matrix M is invertible if and only if det(M ) ≠ 0.

Proof.
If det(M ) = 0, then every elementary row transformation of M still has determinant 0; in other words, for any invertible matrix P, det(P M ) = 0. We choose a P such that P M = rref(M ). By this particular choice, det(rref(M )) = 0. Because rref(M ) is upper triangular, its determinant is the product of its diagonal, so there is a 0 on the diagonal. So M is not of full rank, hence not invertible.
If det(M ) ≠ 0, then every elementary row transformation of M has non-zero determinant, because each step changes the determinant only by a non-zero scalar multiple. That is, for any invertible matrix P, det(P M ) ≠ 0. Choose P such that P M = rref(M ), so det(rref(M )) ≠ 0, so all diagonal elements of rref(M ) are non-zero, which means rref(M ) is the unit matrix. So M is of full rank, so M is invertible.
Proposition 45. det(BA) = det(B) det(A)

Proof. If one of A or B is not invertible, then both sides of the equation are 0, so the identity holds. If A is invertible, consider the function f (B) = det(BA)/det(A). Let’s check:

f (Sij B) = det((Sij B)A)/det(A) = det(Sij (BA))/det(A) = − det(BA)/det(A) = −f (B).

We can use the same method to check f (Mi (λ)B) = λf (B), f (Aij (λ)B) = f (B), and f (In ) = 1. So f (B) = det(B) by uniqueness. Then det(BA)/det(A) = det(B), that is, det(BA) = det(B) det(A).
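A quick numeric check of this proposition; `det` (cofactor expansion along the first row) and `matmul` are ad hoc helpers written only for this check, not the notes’ own notation.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 3], [4, 5]]
B = [[1, 2], [0, 4]]
print(det(matmul(B, A)), det(B) * det(A))   # both -8
```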
Proposition 46. The determinant also satisfies the column transformation properties, that is, det(BSij ) = − det(B); det(BMi (λ)) = λ det(B); det(BAij (λ)) = det(B).

Proof. By the previous proposition, the determinant is multiplicative, so
det(BSij ) = det(B) det(Sij ) = − det(B);
det(BMi (λ)) = det(B) det(Mi (λ)) = λ det(B);
det(BAij (λ)) = det(B) det(Aij (λ)) = det(B).
This means that when we calculate determinants we can use both column and row transformations.
Proposition 47. det(A) = det(AT )

Proof. Consider the function f (B) = det(B T ). We have f (Sij B) = det((Sij B)T ) = det(B T SijT ) = det(B T ) det(SijT ) = − det(B T ) = −f (B), and we can check f (Mi (λ)B) = λf (B); f (Aij (λ)B) = f (B); f (In ) = 1 by the same process. By uniqueness, f (B) = det(B), which means det(B) = det(B T ).
Corollary 1. The determinant of a lower triangular matrix is the product of all diagonal entries.

Example 70. Without multiplying the two matrices, compute

    ( [ 1 2 5 ]   [ 3 0 0 ] )
det ( [ 0 7 8 ] · [ 1 8 0 ] )
    ( [ 0 0 9 ]   [ 9 2 5 ] )

    ( [ 1 2 5 ]   [ 3 0 0 ] )         [ 1 2 5 ]        [ 3 0 0 ]
det ( [ 0 7 8 ] · [ 1 8 0 ] )  = det  [ 0 7 8 ]  × det [ 1 8 0 ]
    ( [ 0 0 9 ]   [ 9 2 5 ] )         [ 0 0 9 ]        [ 9 2 5 ]
= 63 × 120
= 7560
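A numeric check of this example. The zero entries of the two triangular factors are filled in around the shown diagonals, which is an assumption about the intended matrices; `det3` is the six-term formula for 3 × 3 determinants (Proposition 50) and `matmul` is an ad hoc helper.

```python
def det3(M):
    """Six-term 3x3 determinant formula."""
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[1, 2, 5], [0, 7, 8], [0, 0, 9]]   # upper triangular: det = 1*7*9 = 63
B = [[3, 0, 0], [1, 8, 0], [9, 2, 5]]   # lower triangular: det = 3*8*5 = 120
print(det3(A) * det3(B), det3(matmul(A, B)))   # both 7560
```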
Next, we give some calculation methods for second order and third order determinants.
Before we proceed, we note one more important property of the determinant that is worth using:

Proposition 48.

| a11       a12       · · · a1n       |     | a11 a12 · · · a1n |     | a11 a12 · · · a1n |
| ···       ···       ···   ···       |     | ··· ··· ··· ···   |     | ··· ··· ··· ···   |
| ai1 + bi1 ai2 + bi2 · · · ain + bin |  =  | ai1 ai2 · · · ain |  +  | bi1 bi2 · · · bin |
| ···       ···       ···   ···       |     | ··· ··· ··· ···   |     | ··· ··· ··· ···   |
| an1       an2       · · · ann       |     | an1 an2 · · · ann |     | an1 an2 · · · ann |

Proof. This property can also be proved by the uniqueness of the determinant. Fix the row vector (bi1 , . . . , bin ), and for A = (aij ), 1 ≤ i, j ≤ n, consider the function

f (A) = det(A with (bi1 , . . . , bin ) added to its ith row) − det(A with its ith row replaced by (bi1 , . . . , bin ))

As we check the three properties of f, we should proceed with slight care about b. We omit the proof here; the reader could do it, and it is a good exercise.
Proposition 49. | a b ; c d | = ad − bc

Proof.

| a b |     | a 0 |     | 0 b |
| c d |  =  | c d |  +  | c d |  =  ad − cb

The first determinant is lower triangular, so it equals ad; in the second, swapping the two rows gives − | c d ; 0 b |, an upper triangular determinant with value cb.
Proposition 50. | a b c ; d e f ; g h i | = aei + bf g + cdh − ceg − bdi − af h

Proof. We use the same method as before, splitting rows until we get upper triangular, lower triangular, or block upper or lower triangular determinants. First split the first row:

| a b c |     | a 0 0 |     | 0 b 0 |     | 0 0 c |
| d e f |  =  | d e f |  +  | d e f |  +  | d e f |
| g h i |     | g h i |     | g h i |     | g h i |

In each piece, column swaps bring the non-zero first-row entry to the top-left corner, leaving a block lower triangular determinant (one swap for the middle piece gives a minus sign; the last piece needs two swaps, so no sign change):

| a 0 0 |                                     | 0 b 0 |                                      | 0 0 c |
| d e f |  =  a × | e f ; h i | = a(ei − f h);  | d e f |  =  −b × | d f ; g i | = −b(di − f g);  | d e f |  =  c × | d e ; g h | = c(dh − eg)
| g h i |                                     | g h i |                                      | g h i |

Adding up,

aei − af h + bf g − bdi + cdh − ceg = aei + bf g + cdh − ceg − bdi − af h
Example 71. Calculate the determinant of [ 2 5 ; 2 7 ]. Answer: 2 × 7 − 5 × 2 = 4
Example 72. Calculate the determinant of [ 2 5 6 ; 1 0 2 ; 3 2 0 ]. Answer: 2×0×0 + 5×2×3 + 6×1×2 − 6×0×3 − 5×1×0 − 2×2×2 = 0 + 30 + 12 − 0 − 0 − 8 = 34
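The six-term rule of Proposition 50 can be checked mechanically; `sarrus` below is a direct transcription of the formula, written just for this check.

```python
def sarrus(M):
    """Six-term formula of Proposition 50 for a 3x3 matrix."""
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

print(sarrus([[2, 5, 6], [1, 0, 2], [3, 2, 0]]))   # 34, matching Example 72
print(sarrus([[2, 5], [2, 7]][0:2] if False else [[2, 5, 0], [2, 7, 0], [0, 0, 1]]))  # 4, Example 71 padded to 3x3
```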
Example 73. Calculate the determinant of [ 2 0 0 ; 1 3 2 ; 0 0 9 ].
Answer: I want you to practice the row-additivity property:

| 2 0 0 |     | 2 0 0 |     | 2 0 0 |
| 1 3 2 |  =  | 0 3 2 |  +  | 1 0 0 |  =  2 × 3 × 9 + 0  =  54
| 0 0 9 |     | 0 0 9 |     | 0 0 9 |

The first determinant is upper triangular, and the second is 0 because its first two rows are proportional.
1.7. Laplacian Expansion, Cofactor, Adjugate, Inverse Matrix Formula.
In this section, we generalize the method that we used to compute the formulas for 2 by 2 and 3 by 3 determinants to n by n matrices. The goal is not to write down the explicit formula, but to make use of it to derive a method for finding the inverse.
Definition 23. A block diagonal matrix is a block matrix with the same partition on rows and columns, such that all non-zero blocks lie in the diagonal cells. Like:

[ A1          ]
[     ...     ]
[          An ]
Example 74. The following can be viewed as block diagonal matrices with appropriate partitions.

[ 1 0 0 ]     [ 1 |     ]
[ 0 2 5 ]  =  [---+-----]
[ 0 6 7 ]     [   | 2 5 ]
              [   | 6 7 ]

The above matrix can be seen as a block diagonal matrix with partition (1, 2).

[ 2 3 0 0 0 0 ]     [ 2 3 |       |   ]
[ 4 6 0 0 0 0 ]     [ 4 6 |       |   ]
[ 0 0 2 4 7 0 ]     [-----+-------+---]
[ 0 0 1 2 0 0 ]  =  [     | 2 4 7 |   ]
[ 0 0 3 5 6 0 ]     [     | 1 2 0 |   ]
[ 0 0 0 0 0 6 ]     [     | 3 5 6 |   ]
                    [-----+-------+---]
                    [     |       | 6 ]

The above matrix can be seen as a block diagonal matrix with partition (2, 3, 1).

[ 1 3 |   ]
[ 2 4 |   ]
[ 5 6 |   ]
[-----+---]
[     | 5 ]
[     | 7 ]

The above matrix is not a block diagonal matrix, as the partitions on its columns and rows are not the same.
Proposition 51. The determinant of a block diagonal matrix is the product of the determinants of all diagonal blocks.
Example 75. Compute the following determinant:

| 1 2     |
| 3 2     |
|     2 3 |
|     1 5 |

The determinant is just the product of the determinants of the diagonal blocks. That is,

| 1 2     |
| 3 2     |     | 1 2 |     | 2 3 |
|     2 3 |  =  | 3 2 |  ×  | 1 5 |  =  (−4) × 7  =  −28
|     1 5 |
Definition 24. A block upper triangular matrix is a block matrix with the same partition on rows and columns, such that all non-zero blocks lie on or above the diagonal cells. Like:

[ A11 · · · A1n ]
[      ...  ··· ]
[           Ann ]

Definition 25. A block lower triangular matrix is a block matrix with the same partition on rows and columns, such that all non-zero blocks lie on or below the diagonal cells. Like:

[ A11           ]
[ ···  ...      ]
[ An1 · · · Ann ]

Proposition 52. The determinant of a block upper triangular or block lower triangular matrix is the product of the determinants of the diagonal blocks.
Definition 26. Let M be an n × n square matrix over F, and 1 ≤ i ≤ n, 1 ≤ j ≤ n. The cofactor of M at (i, j) is defined to be the determinant of the (n − 1) × (n − 1) matrix obtained by deleting the ith row and jth column. If we multiply this determinant by (−1)^(i+j), then the value is called the algebraic cofactor, denoted by Mij , where M is the notation for the original matrix.

Example 76. Let M = [ 1 2 2 ; 8 2 8 ; −1 2 0 ]. What is M12 ?
Answer: M12 is obtained by computing the determinant after cancelling the 1st row and 2nd column, times (−1)^(i+j), i.e.

M12 = (−1)^(1+2) | 8 8 ; −1 0 | = −(8 × 0 − 8 × (−1)) = −8
Now we try to use our method to find the determinant of an n × n matrix inductively, and see what we can find.
Example 77. Trying to simplify the determinant of a 5 × 5 matrix into the computation of 4 × 4 determinants.
We use our standard method: split the first row into five pieces, so that each piece has a single non-zero entry a1j in its first row. In each piece, column swaps bring that entry to the top-left corner, each swap flipping the sign, and what remains is a block lower triangular determinant. So each piece contributes

a1j × (−1)^(1+j) × (the 4 × 4 determinant obtained by deleting the first row and the jth column)

As you can see, in the expansion we expanded out all the entries of the first row, and are left with determinants of the entries lying in different rows and columns from each expanded entry, with the signs alternating: that is exactly the first row times the algebraic cofactors corresponding to the first row. To conclude, we now have the Laplacian Expansion:
Definition 27. Suppose A is a square matrix. Then we can expand det(A) by any row or any column, namely
Row expansion:
det(A) = ai1 Ai1 + ai2 Ai2 + · · · + ain Ain
Column expansion:
det(A) = a1i A1i + a2i A2i + · · · + ani Ani
(Please keep in mind the signs, as the terms Aij appearing in the formula are algebraic cofactors.)
Example 78. Expand the following determinant by the second row:

| 6 8 9 |
| 2 4 1 |  =  −2 × | 8 9 |  +  4 × | 6 9 |  −  1 × | 6 8 |
| 1 2 5 |          | 2 5 |         | 1 5 |          | 1 2 |

Expand the following determinant by the second column:

| 6 8 9 |
| 2 4 1 |  =  −8 × | 2 1 |  +  4 × | 6 9 |  −  2 × | 6 9 |
| 1 2 5 |          | 1 5 |         | 1 5 |          | 2 1 |

As you can see, the method we used above inductively gives the formula. But what is more than that is the following observation: what if we replace the coefficient of each cofactor by the entries of another row? Then in the previous example it would be:

Example 79. In the expansion

| 6 8 9 |
| 2 4 1 |  =  −2 × | 8 9 |  +  4 × | 6 9 |  −  1 × | 6 8 |
| 1 2 5 |          | 2 5 |         | 1 5 |          | 1 2 |

if we replace the coefficients by the corresponding entries of the third row, that is,

−1 × | 8 9 |  +  2 × | 6 9 |  −  5 × | 6 8 |
     | 2 5 |         | 1 5 |         | 1 2 |

what would the value be? As you can see, the row that we expanded by only determines the coefficients in front of the algebraic cofactors; it does not appear in any of the cofactors. If we change the coefficients, the only change is the row that we expand by:

                                                 | 6 8 9 |
−1 × | 8 9 |  +  2 × | 6 9 |  −  5 × | 6 8 |  =  | 1 2 5 |  =  0
     | 2 5 |         | 1 5 |         | 1 2 |     | 1 2 5 |

We get the last equality because the second row is now a scalar multiple of the third row.
Proposition 53. The orthogonality property of cofactors: let A = (aij )n×n be a matrix, let

rk = ( ak1 ak2 · · · akn )

be any row of A, and let

cl = ( Al1 ; Al2 ; · · · ; Aln )

be the column vector of algebraic cofactors of the lth row. Then

rk cl = 0       if k ≠ l
rk cl = det A   if k = l
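A numeric check of this orthogonality on the matrix of Example 76; `det` and `alg_cofactor` are ad hoc helpers written only for this check.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(len(M)))

def alg_cofactor(M, i, j):
    """Algebraic cofactor: (-1)^(i+j) times the minor at (i, j)."""
    minor = [row[:j] + row[j+1:] for k, row in enumerate(M) if k != i]
    return (-1) ** (i + j) * det(minor)

M = [[1, 2, 2], [8, 2, 8], [-1, 2, 0]]
# row k dotted with the algebraic cofactors of row l
dot = lambda k, l: sum(M[k][j] * alg_cofactor(M, l, j) for j in range(3))
print(dot(0, 0), dot(1, 0), dot(2, 0))
```

Only the k = l combination reproduces det(A); mixing a row with another row’s cofactors gives 0, exactly as the proposition states.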
At this point, we can make the proposition. We construct a matrix, namely the Adjugate Matrix of A; it is constructed by putting Aij in the j’th row and i’th column (remember the order). Then with the adjugate matrix, we have the following property:

Proposition 54. Let A = (aij )1≤i,j≤n be a matrix, and define A∗ = (Aji )1≤i,j≤n to be the adjugate matrix. Then

A∗ A = det(A) In

Then, if the determinant is not 0, simply dividing the adjugate matrix by the determinant gives the inverse:

Proposition 55. Let A = (aij )1≤i,j≤n be a matrix with det(A) ≠ 0, and let A∗ = (Aji )1≤i,j≤n be its adjugate matrix. Then

A−1 = (1 / det(A)) A∗

Example 80. Find the inverse of

[ 2 4 6 ]
[ 1 3 4 ]
[ 6 7 2 ]

The algebraic cofactors are

A11 =  | 3 4 ; 7 2 | = −22    A12 = −| 1 4 ; 6 2 | = 22     A13 =  | 1 3 ; 6 7 | = −11
A21 = −| 4 6 ; 7 2 | = 34     A22 =  | 2 6 ; 6 2 | = −32    A23 = −| 2 4 ; 6 7 | = 10
A31 =  | 4 6 ; 3 4 | = −2     A32 = −| 2 6 ; 1 4 | = −2     A33 =  | 2 4 ; 1 3 | = 2

and expanding by the first row, det = 2 × (−22) + 4 × 22 + 6 × (−11) = −22. So

[ 2 4 6 ]−1      1     [ −22  34  −2 ]
[ 1 3 4 ]    = −−−−−   [  22 −32  −2 ]
[ 6 7 2 ]      (−22)   [ −11  10   2 ]
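The adjugate recipe can be verified numerically on the matrix of Example 80; `det` and `adjugate` below are ad hoc helpers, and the test multiplies A by the computed inverse to recover the identity.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def adjugate(M):
    """Adjugate: algebraic cofactor A_ij placed in row j, column i."""
    n = len(M)
    cof = lambda i, j: (-1) ** (i + j) * det(
        [r[:j] + r[j+1:] for k, r in enumerate(M) if k != i])
    return [[cof(j, i) for j in range(n)] for i in range(n)]  # note the transpose

A = [[2, 4, 6], [1, 3, 4], [6, 7, 2]]
d = det(A)
inv = [[Fraction(x, d) for x in row] for row in adjugate(A)]
print(d)    # -22
print(inv)
```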
2. L INEAR E QUATION
Introduction: We already have a solid background in matrix computation. Before we go further, we should learn to solve linear equations, which will be used extensively in our future study. Matrix multiplication tells us the output if we input some numbers. The linear equation goes the other way: it asks what input to the system creates a given output.
Definition 28. A linear equation over a field F is an algebraic equation in which each term is either a constant (in F ) or a product of a constant and (the first power of) a single variable.
Example 81. The following are linear equations
3x + 2y + z + w = 3
x + 2y = 5
3=7
Definition 29. A system of linear equations is a collection of equations in the same variables, considered together.
Example 82. The following is a system of linear equations:

3x + 2y + z = 6
4x + y      = 5
x + z       = 2
Definition 30. A solution of a linear equation is an assignment of each variable to a scalar in F such that the equation is true when we plug in the scalars for the corresponding variables. A solution of a system of linear equations is an assignment of each variable to a scalar in F such that every equation is simultaneously true when we plug in the scalars for the corresponding variables.
Example 83. If we have a system of linear equations like

3x + 2y + z = 6
4x + y      = 5
x + z       = 2

then the assignment

x = 1,  y = 1,  z = 1

is a solution of the system, because when we plug in the assignment, every equation is true:

3(1) + 2(1) + (1) = 6
4(1) + (1)        = 5
(1) + (1)         = 2
Another important point of view is that we can write the system of linear equations as a matrix multiplication; in this way, the system of linear equations is the same as a single matrix equation Ax = b. A is called the coefficient matrix of the system.
Example 84. We illustrate with an example here. For the system of linear equations

3x + 2y + z = 6
4x + y      = 5
x + z       = 2

recall matrix multiplication: we can write each equation as

( 3 2 1 ) ( x ; y ; z ) = 6
( 4 1 0 ) ( x ; y ; z ) = 5
( 1 0 1 ) ( x ; y ; z ) = 2

where ( x ; y ; z ) denotes the column vector. Now we put them into blocks:

[ ( 3 2 1 ) ( x ; y ; z ) ]     [ 6 ]
[ ( 4 1 0 ) ( x ; y ; z ) ]  =  [ 5 ]
[ ( 1 0 1 ) ( x ; y ; z ) ]     [ 2 ]

Remember that a common right factor of a column vector (resp. a common left factor of a row vector) can be factored out to the right (resp. to the left), that is,

[ ( 3 2 1 ) ( x ; y ; z ) ]     [ 3 2 1 ] [ x ]
[ ( 4 1 0 ) ( x ; y ; z ) ]  =  [ 4 1 0 ] [ y ]
[ ( 1 0 1 ) ( x ; y ; z ) ]     [ 1 0 1 ] [ z ]

Thus the system of linear equations is equivalent to saying

[ 3 2 1 ] [ x ]     [ 6 ]
[ 4 1 0 ] [ y ]  =  [ 5 ]
[ 1 0 1 ] [ z ]     [ 2 ]

Now you can expand this by matrix multiplication, and you will find that the single matrix equation is exactly the same as the system of linear equations.
Now we give two geometric meanings of a linear equation: one from the view of the row vectors of A, another from the view of the column vectors of A.
Viewing each row of A, namely looking at each single equation in the system: each single equation gives a codimension 1 space in the whole space, and the solution set is the intersection of these spaces. Let’s show some examples to make this idea clear.
Example 85. The equation

[ 0 1 2 ] [ x ]     [ 1 ]
[ 0 0 1 ] [ y ]  =  [ 1 ]
[ 1 0 0 ] [ z ]     [ 2 ]

is equivalent to the system of linear equations

y + 2z = 1
z      = 1
x      = 2

Each of the equations can be viewed as the equation of a plane in 3-dimensional space, say H1 , H2 , H3 . A solution is some assignment of x, y, z; if we view this assignment as the coordinates of a point, then the coordinates satisfy each of the equations. Remember that the coordinates of a point satisfying the equation of a surface is the same as the point lying on the surface. So a solution represents a point lying in each of the planes, and this is the same as asking for the intersection. By solving the equation, we find the intersection is (2, −1, 1).
Example 86. Previously we gave an example where the intersection is a unique point, but it is still possible to choose other positions of the planes, so that they have a different relative position. Consider the following example:

[ 0 1  0 ] [ x ]     [ 1 ]
[ 0 1  1 ] [ y ]  =  [ 3 ]
[ 0 1 −2 ] [ z ]     [ 3 ]

That is,

y      = 1
y + z  = 3
y − 2z = 3

The geometric picture shows three planes with no point shared by all of them. And indeed, if we try to solve this linear system, it has no solution.
Example 87. It can also be the case that a system of linear equations has multiple solutions. Consider the following example:

[ 0 1  0 ] [ x ]     [  1 ]
[ 0 1  1 ] [ y ]  =  [  3 ]
[ 0 1 −2 ] [ z ]     [ −3 ]

That is,

y      = 1
y + z  = 3
y − 2z = −3

In the picture, there are multiple points lying on each of the planes: clearly, every point with y coordinate 1 and z coordinate 2 is on the intersection. For example (1, 1, 2), (2, 1, 2), (π, 1, 2), . . . are all on the intersection. Indeed, if we solve the system, we will see there is no restriction on x, so x can be any value, and this system of equations has multiple solutions.
Now we look at the equation column by column. As we previously studied, right multiplying a matrix forms column combinations of the columns of the left matrix, with the coefficients coming from the corresponding column.
Example 88. Look at the equation

[ 2 5 ] [ x ]     [ 9 ]
[ 1 2 ] [ y ]  =  [ 4 ]

Of course, by looking at the rows, we could view the solution as the intersection of two lines. Now we change the point of view and look at the columns; the question can be viewed this way:

[ 9 ]     [ 2 5 ] [ x ]     [ 2 ]       [ 5 ]
[ 4 ]  =  [ 1 2 ] [ y ]  =  [ 1 ] x  +  [ 2 ] y

The last equality is multiplication by blocks. So the original equation is the same as asking for the right amounts x, y such that

[ 2 ]       [ 5 ]      [ 9 ]
[ 1 ] x  +  [ 2 ] y  =  [ 4 ]

If you treat a column vector as coordinates in the plane, then (9, 4) is a point in the plane, and (2, 1) and (5, 2) are two vectors, which start at the origin and end at (2, 1) and (5, 2) respectively.
Now think: if you can only move in the directions of these two arrows, how far in each direction do you go to get there? Of course, we can’t go by arbitrary amounts, because we might miss our goal. Finally, we figure out we should go 2 units of (2, 1), and 1 more unit of (5, 2).
This is the column picture of a system of linear equations: finding the right amounts in a combination to create the thing that we desired.
Now let’s focus on how to solve linear equations.
Definition 31. In the system of linear equations Ax = b, if b is the zero column vector, that is, Ax = 0, we call the system a homogeneous system of linear equations; if b ≠ 0, then we call the system non-homogeneous.
Example 89. The following is a non-homogeneous linear equation:
x + 2y + 3z = 5
The following is a homogeneous linear equation:
x + 2y + 3z = 0
The following is a non-homogeneous system of linear equations:

x + y + 2z = 1
x + z      = 0
x + y      = 0

The following is a homogeneous system of linear equations:

x + y + 2z = 0
x + z      = 0
x + y      = 0
2.1. Non-Homogeneous Linear Equation, The existence of the solution. We start with the method for finding the solutions of a non-homogeneous linear equation. This method is called Gaussian Elimination.
Proposition 56. Given a system of linear equations, we can perform the following transformations without changing the solution set of the system:
(1) Switch any two of the equations
(2) Multiply any of the equations by a non-zero scalar λ
(3) Add λ multiples of one equation to another equation.
Example 90. Suppose we have a system of linear equations:

x + y + 2z = 1
x + z      = 2
x + y      = 0

If we switch the first and second equations, of course the system means the same:

x + y + 2z = 1                  x + z      = 2
x + z      = 2    --r1↔r2-→     x + y + 2z = 1
x + y      = 0                  x + y      = 0

If we add a multiple of the second equation to the first, of course the systems should give the same solutions:

x + y + 2z = 1                  4x + y + 5z = 7
x + z      = 2    --r1+3×r2-→   x + z       = 2
x + y      = 0                  x + y       = 0

If we multiply an equation by a non-zero number, the systems should also give the same solutions:

x + y + 2z = 1                  x + y + 2z = 1
x + z      = 2    --5×r2-→      5x + 5z    = 10
x + y      = 0                  x + y      = 0
As you can see, the above statements are obvious. In the meantime, they also look a lot like row transformations; do they have something to do with the row transformations of matrices? The answer is definitely yes. To see this, remember that we previously showed that the system of equations is equivalent to a single matrix equation

Ax = b

We made the claim that if we apply a row transformation to A and b simultaneously, the solutions do not change. This claim is nothing more than saying that for any invertible matrix P, the equations P Ax = P b and Ax = b have the same solutions.
To see why, we have to check that every solution of Ax = b is a solution of P Ax = P b, and every solution of P Ax = P b is a solution of Ax = b. For the first, suppose the column vector x0 is a solution of Ax = b; then of course Ax0 = b. Multiplying both sides of the equation by P, we get P Ax0 = P b, so we have checked that x0 is a solution of P Ax = P b.
Now assume the column vector x1 is a solution of P Ax = P b; then of course P Ax1 = P b. Multiplying both sides of the equation by P −1 cancels P, so Ax1 = b, which means exactly that x1 is a solution of Ax = b.
Then we can use the same method as in matrix row transformations to reduce the coefficient matrix to reduced echelon form without changing the solutions. We show both ways of doing it in an example.
Example 91. Let’s solve the non-homogeneous system:

3x + 2y + z = 6
x + y       = 2
2x + 2y     = 4

Written in matrix form, this is the same as

[ 3 2 1 ] [ x ]     [ 6 ]
[ 1 1 0 ] [ y ]  =  [ 2 ]
[ 2 2 0 ] [ z ]     [ 4 ]

In order to do the row transformations simultaneously, we put A and b in one block row:

[ 3 2 1 | 6 ]
[ 1 1 0 | 2 ]
[ 2 2 0 | 4 ]

Now we are ready to do row transformations on A and b simultaneously:

[ 3 2 1 | 6 ]                   [ 0 −1 1 | 0 ]
[ 1 1 0 | 2 ]   --r1−3×r2-→     [ 1  1 0 | 2 ]
[ 2 2 0 | 4 ]                   [ 2  2 0 | 4 ]

                --r3−2×r2-→     [ 0 −1 1 | 0 ]
                                [ 1  1 0 | 2 ]
                                [ 0  0 0 | 0 ]

                --r1↔r2-→       [ 1  1 0 | 2 ]
                                [ 0 −1 1 | 0 ]
                                [ 0  0 0 | 0 ]

                --r1+1×r2-→     [ 1  0 1 | 2 ]
                                [ 0 −1 1 | 0 ]
                                [ 0  0 0 | 0 ]

                --(−1)×r2-→     [ 1 0  1 | 2 ]
                                [ 0 1 −1 | 0 ]
                                [ 0 0  0 | 0 ]

This now has the same solutions as

[ 1 0  1 ] [ x ]     [ 2 ]
[ 0 1 −1 ] [ y ]  =  [ 0 ]
[ 0 0  0 ] [ z ]     [ 0 ]

which means x + z = 2 and y − z = 0.
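The elimination above can be automated; `rref` below is our own sketch of the reduction over exact rationals, applied to the augmented matrix of Example 91.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0  # next pivot row
    for c in range(len(A[0])):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]          # swap the pivot row up
        A[r] = [x / A[r][c] for x in A[r]]   # scale the pivot to 1
        for i in range(len(A)):
            if i != r and A[i][c] != 0:      # clear the rest of the column
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

aug = [[3, 2, 1, 6], [1, 1, 0, 2], [2, 2, 0, 4]]
for row in rref(aug):
    print([int(x) for x in row])
```

The output rows are [1, 0, 1, 2], [0, 1, -1, 0], [0, 0, 0, 0], which is exactly the reduced form x + z = 2, y − z = 0 found by hand.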
Remember that in the reduced row echelon form it is possible for a leading 1 to appear in the last column. To show this, we look at another example.

Example 92. Let’s solve the non-homogeneous system:

3x + 2y + z = 6
x + y       = 2
2x + 2y     = 5

[ 3 2 1 | 6 ]                   [ 0 −1 1 | 0 ]
[ 1 1 0 | 2 ]   --r1−3×r2-→     [ 1  1 0 | 2 ]
[ 2 2 0 | 5 ]                   [ 2  2 0 | 5 ]

                --r3−2×r2-→     [ 0 −1 1 | 0 ]
                                [ 1  1 0 | 2 ]
                                [ 0  0 0 | 1 ]

                --r1↔r2-→       [ 1  1 0 | 2 ]
                                [ 0 −1 1 | 0 ]
                                [ 0  0 0 | 1 ]

                --r1+1×r2-→     [ 1  0 1 | 2 ]
                                [ 0 −1 1 | 0 ]
                                [ 0  0 0 | 1 ]

                --(−1)×r2-→     [ 1 0  1 | 2 ]
                                [ 0 1 −1 | 0 ]
                                [ 0 0  0 | 1 ]

This now has the same solutions as

[ 1 0  1 ] [ x ]     [ 2 ]
[ 0 1 −1 ] [ y ]  =  [ 0 ]
[ 0 0  0 ] [ z ]     [ 1 ]

which means x + z = 2, y − z = 0, and 0 = 1.

With the previous examples, we have seen that after the transformation we may end up with an equality that makes no sense at all, like 0 = 1. If this equation appears, no choice of the variables can make the equality true. Then we call the system of linear equations inconsistent. As we defined solutions to be numbers satisfying all the equations simultaneously, with 0 = 1 we are never able to do that; this means the system has no solution.
Proposition 57. The system of equations Ax = b has a solution if and only if the reduced echelon form of [ A | b ] does not have a leading 1 in the column of b.
Now we learn how to write down the solutions. The key is to express the variables of the leading 1’s in terms of the free variables. The free variables are those not corresponding to a leading 1.
Example 93. Suppose we reduced the system of linear equations to

[ 1 0  1 ] [ x ]     [ 2 ]
[ 0 1 −1 ] [ y ]  =  [ 0 ]
[ 0 0  0 ] [ z ]     [ 0 ]

which means x + z = 2 and y − z = 0.
Now the leading 1’s correspond to the variables x and y, so we can express x and y in terms of the other variables. In our case, it is

x = 2 − z
y = z

The variables not corresponding to a leading 1 are called free variables; we can assign them any value. For example, we can let z = 1, and then we get a solution

x = 1,  y = 1,  z = 1
2.2. Homogeneous Linear Equation, The uniqueness of the solution. Previously we talked about how to detect the existence of solutions of a system of linear equations: by looking at whether a leading 1 appears in the last column. And we defined consistency. Now we start talking about the uniqueness of solutions.

Proposition 58. Suppose u and v are two solutions of the equation Ax = b. Then u − v is a solution of the equation Ax = 0.

Proof.
Au = b and Av = b,
then Au = Av,
then Au − Av = 0,
that is, A(u − v) = 0.
Example 94. Consider the system of linear equations

3x + 2y + z = 6
x + y       = 2
2x + 2y     = 4

We can check that solutions exist for such a system, and in fact there are multiple solutions. Let’s take any two of them, say

(x, y, z) = (1, 1, 1)   and   (x, y, z) = (0, 2, 2)

As you can see, they both satisfy the system, and their difference, namely (1, −1, −1), should satisfy the homogeneous system

3x + 2y + z = 0
x + y       = 0
2x + 2y     = 0

and it really does. Take the first equation of the system for example; we have

3(1) + 2(−1) + 1(−1)
= 3(1 − 0) + 2(1 − 2) + 1(1 − 2)                      because (1, −1, −1) = (1, 1, 1) − (0, 2, 2)
= [3(1) + 2(1) + 1(1)] − [3(0) + 2(2) + 1(2)]
= 6 − 6                                               because (1, 1, 1) and (0, 2, 2) are solutions of 3x + 2y + z = 6
= 0
With the previous discussion, if we know one particular solution of Ax = b and all homogeneous solutions of Ax = 0, then we know all solutions of Ax = b by summing the particular solution with the homogeneous solutions.

Proposition 59. The solution set of Ax = 0 satisfies the following:
(1) The zero vector (0, 0, . . . , 0)T is a solution.
(2) If u, v are two solutions of the equation Ax = 0, then for any scalars λ, µ in the field F, we have uλ + vµ also a solution. More generally, any linear combination of solutions is also a solution.

Proof. Suppose v1 , v2 , · · · , vn are solutions of Ax = 0, and suppose a1 , a2 , · · · , an are any scalars in the field. Then A(v1 a1 + v2 a2 + · · · + vn an ) = (Av1 )a1 + (Av2 )a2 + · · · + (Avn )an = 0a1 + 0a2 + · · · + 0an = 0. Thus, v1 a1 + v2 a2 + · · · + vn an is a solution.
Definition 32. The set of all solutions of the equation Ax = 0 is called the (right) null space of A.
We call it a ”space” here because, by the previous proposition, the solutions actually form a linear space; we will learn about linear spaces in the future.
For any invertible matrix P, Ax = 0 is equivalent to Ax = P −1 0, which is equivalent to P Ax = 0. So the solution sets of Ax = 0 and P Ax = 0 are the same. Thus, we can choose P such that P A is the reduced echelon form, and expand to see what this tells us.
Example 95. Solve the homogeneous system whose coefficient matrix is the reduced echelon form

[ 1 2 0 3 0 ]
[ 0 0 1 5 0 ]
[ 0 0 0 0 1 ]

This is equivalent to the equations

x1 + 2x2 + 3x4 = 0
x3 + 5x4       = 0
x5             = 0

and thus we can write the solutions like

x1 = −2x2 − 3x4
x3 = −5x4
x5 = 0

Thus, the choice of x2 and x4 is free: we have a solution whenever we assign values to them, and we can exhaust all solutions by assigning all different values to them.
The variables not corresponding to a leading 1 are called free variables. As we can see, we can choose the values of the free variables however we like, and we can express the other variables by the values of the free variables. That means all the solutions are in one-to-one correspondence with the choices of values of the free variables, and each choice of free variables determines a linear combination of solutions. The choice where one free variable is 1 and the others are 0 is called an elementary solution. The number of free variables is called the dimension of the (right) null space of A.
Example 96. What are elementary solutions for the system of homogeneous linear equation?

 x1 + 2x2 + 3x4 = 0
x3 + 5x4
= 0

x5
= 0
write this into matrix form



1
1
1
3
5






1
x1
x2
x3
x4
x5


 
 
=
 
 
0
0
0
0
0






We discovered that the leading 1 corresponds to x1 , x3 and x5 . And thus the free varibles are x2 and
x4 ,So we express the varible corresponds to leading 1 in term of other varibles:

 x1 = −2x2 − 3x4
x3 = −5x4

x5 = 0
The first elementary solution is obtained by setting x2 = 1 and x4 = 0:
$$\begin{cases} x_1 = -2 \\ x_2 = 1 \\ x_3 = 0 \\ x_4 = 0 \\ x_5 = 0 \end{cases}$$
Thus the first elementary solution is
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
The second elementary solution is obtained by setting x2 = 0 and x4 = 1:
$$\begin{cases} x_1 = -3 \\ x_2 = 0 \\ x_3 = -5 \\ x_4 = 1 \\ x_5 = 0 \end{cases}$$
So the second elementary solution is
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -3 \\ 0 \\ -5 \\ 1 \\ 0 \end{pmatrix}$$
We denote the number of elementary solutions of the homogeneous linear system Ax = 0 by null(A).
Proposition 60. null(A) = n − rank(A), where n is the number of columns of A.
Proof. The number of variables is the number of columns of A, and the rank is the number of leading 1's. The free variables are those not corresponding to a leading 1. Thus null(A) = # of free variables = n − # of leading 1's = n − rank(A).
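Proposition 60 can be spot-checked numerically on the matrix of Examples 95 and 96. This is a sketch using the numpy library (an assumption; the notes prescribe no software), where `matrix_rank` computes the rank:

```python
import numpy as np

# The reduced-echelon matrix from Examples 95-96 (5 columns).
A = np.array([[1, 2, 0, 3, 0],
              [0, 0, 1, 5, 0],
              [0, 0, 0, 0, 1]], dtype=float)

n = A.shape[1]
rank = np.linalg.matrix_rank(A)
null = n - rank                  # Proposition 60: null(A) = n - rank(A)
assert (rank, null) == (3, 2)

# The two elementary solutions found above really solve Ax = 0.
e1 = np.array([-2.0, 1.0, 0.0, 0.0, 0.0])   # x2 = 1, x4 = 0
e2 = np.array([-3.0, 0.0, -5.0, 1.0, 0.0])  # x2 = 0, x4 = 1
assert np.allclose(A @ e1, 0) and np.allclose(A @ e2, 0)
```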
Proposition 61. Suppose a solution of the equation Ax = b exists. Then the solution is unique if and only if any of the following equivalent conditions holds:
(1) Ax = 0 has only the 0 solution
(2) null(A) = 0
(3) rank(A) = number of columns of A
Corollary 2. Suppose A is a square matrix. Then the solution of Ax = b exists and is unique if and only if any of the following equivalent conditions holds:
(1) A is invertible
(2) det(A) ≠ 0
(3) A is of full rank
In this case, the solution is uniquely given by x = A⁻¹b.
If we combine this with the method of expressing $A^{-1} = \frac{A^*}{\det(A)}$, then the solution is given by Cramer's rule.
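As a quick numerical illustration, we can solve a small square system both by the inverse and by Cramer's rule $x_i = \det(A_i)/\det(A)$, where $A_i$ replaces the i-th column of A by b. This is a sketch with numpy (an assumed tool; the particular A and b are made up for illustration):

```python
import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 1.0, 2.0]])
b = np.array([6.0, 3.0, 4.0])

x_inv = np.linalg.inv(A) @ b          # x = A^{-1} b

# Cramer's rule: replace column i of A by b and take determinants.
det_A = np.linalg.det(A)
x_cramer = np.empty(3)
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b                      # A_i: column i replaced by b
    x_cramer[i] = np.linalg.det(Ai) / det_A

assert np.allclose(x_inv, x_cramer)   # both methods agree
```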
With the previous discussion about existence and uniqueness, we now wish to completely solve our linear equation by representing all its solutions. There are several ways to present them.
Example 97. Represent the solution set of
$$\begin{cases} x_1 + 2x_2 + 3x_4 = 6 \\ x_3 + 5x_4 = 6 \\ x_5 = 1 \end{cases}$$
We know a particular solution:
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$
So every solution should be of the form
$$\begin{cases} x_1 = 1 + \tilde{x}_1 \\ x_2 = 1 + \tilde{x}_2 \\ x_3 = 1 + \tilde{x}_3 \\ x_4 = 1 + \tilde{x}_4 \\ x_5 = 1 + \tilde{x}_5 \end{cases}$$
where each $\tilde{x}_i$ represents the difference between two solutions. These $\tilde{x}_i$ satisfy the homogeneous equation
$$\begin{cases} \tilde{x}_1 + 2\tilde{x}_2 + 3\tilde{x}_4 = 0 \\ \tilde{x}_3 + 5\tilde{x}_4 = 0 \\ \tilde{x}_5 = 0 \end{cases}$$
And their solutions can be represented as
$$\begin{cases} \tilde{x}_1 = -2\tilde{x}_2 - 3\tilde{x}_4 \\ \tilde{x}_3 = -5\tilde{x}_4 \\ \tilde{x}_5 = 0 \end{cases}$$
We can set $\tilde{x}_2 = t$ and $\tilde{x}_4 = w$; thus we have
$$\begin{cases} \tilde{x}_1 = -2t - 3w \\ \tilde{x}_2 = t \\ \tilde{x}_3 = -5w \\ \tilde{x}_4 = w \\ \tilde{x}_5 = 0 \end{cases}$$
Combined with the particular solution, we have
$$\begin{cases} x_1 = 1 - 2t - 3w \\ x_2 = 1 + t \\ x_3 = 1 - 5w \\ x_4 = 1 + w \\ x_5 = 1 \end{cases}$$
So the solution set is equal to
$$\left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} \,\middle|\, x_1 = 1-2t-3w,\; x_2 = 1+t,\; x_3 = 1-5w,\; x_4 = 1+w,\; x_5 = 1, \text{ where } t, w \in \mathbb{R} \right\}$$
Now we introduce several ways to represent solution sets.
Parametric Form. We assign arbitrary values, called parameters, to the free variables, and all the variables are determined by the parameters; a solution represented this way is said to be in parametric form. Like
$$\left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} \,\middle|\, x_1 = 1-2t-3w,\; x_2 = 1+t,\; x_3 = 1-5w,\; x_4 = 1+w,\; x_5 = 1, \text{ where } t, w \in \mathbb{R} \right\}$$
or directly we represent it as
$$\begin{cases} x_1 = 1 - 2t - 3w \\ x_2 = 1 + t \\ x_3 = 1 - 5w \\ x_4 = 1 + w \\ x_5 = 1 \end{cases}$$
Linear Combination of Column Vector Form. We can write the solutions as a linear combination of column vectors, like
$$\left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} t + \begin{pmatrix} -3 \\ 0 \\ -5 \\ 1 \\ 0 \end{pmatrix} w \,\middle|\, \text{where } t, w \in \mathbb{R} \right\}$$
or directly as
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} t + \begin{pmatrix} -3 \\ 0 \\ -5 \\ 1 \\ 0 \end{pmatrix} w$$
Matrix Form. This presentation uses matrix multiplication, presenting all the solutions as
$$\left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + \begin{pmatrix} -2 & -3 \\ 1 & 0 \\ 0 & -5 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} t \\ w \end{pmatrix} \,\middle|\, \text{where } t, w \in \mathbb{R} \right\}$$
or directly as
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} + \begin{pmatrix} -2 & -3 \\ 1 & 0 \\ 0 & -5 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} t \\ w \end{pmatrix}$$
Of course, all of the sets above are equivalent to each other.
3. V ECTOR S PACES , L INEAR T RANSFORMATION
3.1. Actions, Matrix of Actions. Introduction: We have seen matrices of numbers, and matrices of matrices; a natural thing to wonder is whether there can be a matrix of anything. Now let's define a broader concept: Actions.
3.1.1. Actions.
Definition 33. Suppose X, Y, Z are three sets. An action is simply a map X × Y −→ Z. In this case, we say that an element of X left acts on an element of Y, resulting in an element of Z.
Example 98. Suppose you have a rock and an egg. When you want to eat the egg, the idea is to let the rock act on it; we can write this expression as multiplication:
Rock hit egg = good rock + damaged egg
Then the words "rock hit" can be viewed as an action: it can left act on any object and give the result. Let's see a different result if you throw your rock elsewhere:
Rock hit sea = disappeared rock + sea
The action can also be viewed in language, simply by combining phrases:
Example 99.
Operator | Object | Action | Result
I eat | an apple | "I eat" left acts on "an apple" | "I eat an apple"
I peel | the apple | "I peel" left acts on "the apple" | "I peel the apple"
The action from the left and from the right should be distinguished. Look at the following example.
Example 101. Now consider the composition of two operators, "wear shoes × wear socks". If your feet accept the action from the left, then this operator left acts on your feet, and the phenomenon is:
wear shoes × wear socks × your feet
= wear shoes × the feet wearing socks
= the feet wearing shoes, in which the socks are worn inside the shoes
Now suppose your feet are defined to accept the action of this kind of operator from the right. Then this operator right acts on your feet, and the phenomenon is:
your feet × wear shoes × wear socks
= the feet wearing shoes × wear socks
= the feet wearing shoes, in which the socks are worn outside of the shoes
Because we choose to write a given composition of operators in one way, we should already have defined which direction the operators act in, and one cannot change the direction.
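The order-sensitivity above can be mimicked with plain function composition (a sketch; the foot's "state" is just a list recording the layers from innermost out, an invented representation for illustration):

```python
# Model "wear socks" / "wear shoes" as functions acting on a foot,
# whose state is the list of layers from innermost to outermost.
def wear_socks(foot):
    return foot + ["socks"]

def wear_shoes(foot):
    return foot + ["shoes"]

# Left action: (wear shoes x wear socks) applied to bare feet.
# The rightmost operator acts first, so the socks go on first.
left = wear_shoes(wear_socks([]))

# The opposite composition puts the shoes on first.
right = wear_socks(wear_shoes([]))

assert left == ["socks", "shoes"]
assert right == ["shoes", "socks"]
```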
Definition 34. Suppose we already have an action X × Y → Z. Then we define the transposes of X, Y, Z to be X^T, Y^T, Z^T respectively, in which the elements are unchanged but the direction of receiving actions is reversed. More precisely, the action is calculated by the rule (ab)^T = b^T a^T.
If originally Y accepts a left action from X, then after transposing, Y^T is defined to accept a right action from X^T. In other words, nothing has changed, but everything looks like the world seen in a mirror.
Now we give an example.
Example 102. Suppose I eat × an apple = I eat an apple. Calculate (an apple)^T × (I eat)^T:
(an apple)^T × (I eat)^T = (I eat × an apple)^T = (I eat an apple)^T
The action above is the most general kind. Here we only consider a specific kind of action: we want our sets to have some additional structure, so that we can use matrix theory to represent them.
Definition 35. A set G equipped with an operator "+" is called an abelian semi-group if any three elements a, b, c of G satisfy the following:
(1) a + b is an element of the set G
(2) a + b = b + a
(3) a + (b + c) = (a + b) + c
Note that we didn't say anything about inverses or zero, so generally we can only do addition, no subtraction.
Example 103. Suppose X is a set, G is the set of all subsets of X, and define "+" to be taking the union of subsets. Then this clearly satisfies the 3 axioms. So does taking the intersection.
Example 104. Suppose G is the set of all positive integers, and we define "+" to be taking the greatest common divisor. Then this clearly satisfies the 3 axioms.
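A spot-check of the axioms in Example 104 on a handful of sample values (a sketch, not a proof):

```python
from math import gcd
from itertools import product

# "+" on positive integers is defined as gcd; check the three
# semi-group axioms on some sample triples.
values = [1, 2, 6, 12, 30, 35]
for a, b, c in product(values, repeat=3):
    assert gcd(a, b) >= 1                            # closure in the positive integers
    assert gcd(a, b) == gcd(b, a)                    # commutativity
    assert gcd(gcd(a, b), c) == gcd(a, gcd(b, c))    # associativity
```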
Now we would like to consider the case of an action of O (operators) on a semi-group G whose result is again an element of G itself. In other words, the action is O × G → G.
To have an example to play with, let G simply be the collection of all subsets of R², with addition being the union of sets. We define O to be the set of operations consisting of rotations and reflections, and we let O left act on G. We use symbols to represent each rotation or reflection.
Example 105. We denote four operators by pictorial symbols (drawn in the original notes): rotate clockwise by 90 degrees, rotate counterclockwise by 90 degrees, vertical reflection, and horizontal reflection.
Thinking of these as operators, and defining "+" as the union of subsets, we can calculate products of these operators acting on sums (unions) of shapes. (The pictured calculations do not survive in this text version.)
Now, if we put all these elements into a matrix, regard multiplication as the action, and regard addition as the addition of the semi-group, then we can notationally simplify a lot of expressions. The following example gives an idea of how to calculate with them.
Example 106. (A worked calculation in which a matrix of rotation/reflection operators acts on a matrix of shapes; the pictures do not survive in this text version.)
3.2. Linear Spaces, dimension. Introduction: Now we start studying linear algebra proper. With the foundations and the understanding of matrices built before, we can simplify many of the expressions we need and make them easier to understand.
Definition 36. A (right) linear space over a field F is a set V endowed with an abelian group structure which admits an action of the scalars of F from the right. In other words, V is a set equipped with an operator "+" satisfying:
(1) (V, +) is an abelian group, which means:
1) For any v ∈ V, w ∈ V, u = v + w is an element of V
2) For any v ∈ V, w ∈ V, v + w = w + v
3) For any u ∈ V, v ∈ V, w ∈ V, (u + v) + w = u + (v + w)
4) There exists 0 ∈ V such that for any v ∈ V, 0 + v = v
5) For any v ∈ V, there exists −v ∈ V such that v + (−v) = 0
(2) (V, +) admits an action of the field from the right, satisfying:
1) For any v ∈ V, v × 1 = v
2) For any v ∈ V, λ ∈ F, v × λ is an element of V
3) For any v ∈ V, λ, µ ∈ F, v × (λµ) = (v × λ) × µ
4) For any v ∈ V, λ, µ ∈ F, v × (λ + µ) = v × λ + v × µ
5) For any u, v ∈ V, λ ∈ F, (u + v) × λ = u × λ + v × λ
We often call the elements of a linear space vectors.
Definition 37. A subspace W of a linear space V is a subset of V such that u × λ + v × µ ∈ W for any u, v ∈ W and any scalars λ, µ.
Remark 3. (λv means v × λ.) It is a historical mistake to put the scalar action on the left. In the future we will introduce linear maps, which people wish to write on the left of the vector. The fact is that a vector admits both a scalar action and a linear-map action, but these two actions have to appear on different sides: if the linear map is to appear on the left, then the scalar should act on the right, and vice versa. People remember this mistake, though, and some textbooks in recent times try to keep the scalar on the left and write the linear map on the right. However, that suggestion didn't become popular, because people are already used to writing abstract actions on the left, like f(x), cos x. So, because of the historical mistake, λv will mean v × λ. It doesn't matter, because the scalars commute: for example, (µλ)v is v × µ × λ, which means µ acts first and then λ, while the historical-mistake notation looks as if λ acts first and then µ; but they give the same result, since λ and µ commute. In your future studies, when you treat modules over non-commutative rings, please let the scalars act from the right; don't let the historical mistake really mislead you.
Example 107. The plane, with a selected point and the structure of vector addition, is a linear space over R. Indeed, the selected point is called the origin, and every point of the plane corresponds to a vector starting at the origin and ending at that point. The sum of two vectors is composed by applying the parallelogram rule. The scalar multiplication of a vector is obtained by enlarging the vector by the corresponding ratio.
(Pictures: scalar multiplication; the parallelogram law.)
What are the subspaces of this space? Certainly, by definition, the space itself is a subspace; this is the only 2-dimensional subspace. The 1-dimensional subspaces are the lines passing through the origin. The 0-dimensional subspace is the singleton set {0}.
(Picture: 2-dimensional subspaces; the cube outside is auxiliary.)
The geometric intuition for subspaces is that they are not only flat: the space should also pass through the origin, or else it is not a subspace. The simplest reason is that a subspace must contain the vector 0, as in the following example.
Example 108. The subset shown in the following picture does not form a subspace.
The reason is simple: even if we just pick two vectors (starting at the origin and ending in the plane), then by the parallelogram law the endpoint of the sum leaves the plane. So this subset is not even closed under vector addition. Hence a linear subspace should pass through the origin.
Example 109. Denote the set of all n×1 matrices over F by M_{n×1}(F), endowed with the additive structure in which the sum of two matrices is the usual matrix sum, and the scalar multiplication is naturally obtained by right-multiplying by a scalar. This is indeed a vector space.
Example 110. Now let V = M_{3×1}(F); then V is a linear space whose elements look like $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$. Is the set
$$W = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \,\middle|\, x + y + z = 0 \right\}$$
a linear subspace?
Answer: Yes, it is a linear subspace; let us verify it. For any λ, µ ∈ F, we compose
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \lambda + \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \mu = \begin{pmatrix} \lambda x_1 + \mu y_1 \\ \lambda x_2 + \mu y_2 \\ \lambda x_3 + \mu y_3 \end{pmatrix}$$
Because each of the two vectors comes from W, we have the conditions
$$x_1 + x_2 + x_3 = 0 \quad\text{and}\quad y_1 + y_2 + y_3 = 0$$
Combining these two equations we get
$$(\lambda x_1 + \mu y_1) + (\lambda x_2 + \mu y_2) + (\lambda x_3 + \mu y_3) = 0$$
That means
$$\begin{pmatrix} \lambda x_1 + \mu y_1 \\ \lambda x_2 + \mu y_2 \\ \lambda x_3 + \mu y_3 \end{pmatrix} \in W$$
Example 111. With the same notation as before, is the subset $W = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \,\middle|\, x + y + z = 3 \right\}$ a subspace? The answer is no: simply take $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 3 \\ 0 \\ 0 \end{pmatrix}$ as two vectors in W; their sum $\begin{pmatrix} 4 \\ 1 \\ 1 \end{pmatrix}$ no longer satisfies the condition x + y + z = 3, so it does not lie in the subset. This subset is not closed under addition, so it is not a subspace.
Another point of view: geometrically speaking, a subspace should pass through the origin, so at least $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$ should be in the subset; but this subset W does not contain it. This is exactly the same case as we illustrated before.
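The closure failure in Example 111, and the closure success in Example 110, can be spot-checked numerically (a sketch with numpy; the sample vectors are chosen for illustration):

```python
import numpy as np

u = np.array([1.0, 1.0, 1.0])    # in W of Example 111: entries sum to 3
v = np.array([3.0, 0.0, 0.0])    # also in W
assert u.sum() == 3 and v.sum() == 3
assert (u + v).sum() != 3        # the sum leaves W: not closed under +

# Contrast with the subspace {x + y + z = 0} of Example 110:
a = np.array([1.0, -1.0, 0.0])
b = np.array([2.0, 3.0, -5.0])
lam, mu = 0.7, -2.5
assert np.isclose((a * lam + b * mu).sum(), 0.0)   # stays in W
```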
Example 112. Now let V = {ax² + bx + c | a, b, c ∈ R} be the polynomials of degree at most 2 with real coefficients. The additive structure is simply the sum of two polynomials, and the scalar multiplication is also defined in the usual way. This is a linear space.
Example 113. Now consider the subset W = {f ∈ V | f(0) = 0}; is W a subspace? Yes, it is. Indeed, for any two polynomials f, g ∈ W we have f(0) = g(0) = 0; then of course any linear combination h = fλ + gµ gives h(0) = λf(0) + µg(0) = λ0 + µ0 = 0, so h is an element of W, and W is a subspace.
Definition 38. Suppose V is a linear space over a field F. The linear combination of v1, v2, ..., vn by scalars a1, a2, ..., an means the vector obtained as
$$a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$$
Definition 39. Suppose w ∈ V and v1, v2, ..., vn ∈ V. We say w can be linearly expressed by {v1, v2, ..., vn} if w can be written as a linear combination of v1, v2, ..., vn with some coefficients.
Example 114. If we draw arbitrary three vectors in a plane, then one of them will be a linear combination of the others.
To see this, imagine you are driving a car from the origin in the direction of the first vector. At the same time, your eyes keep looking in the direction of the second vector. At some point you will see the endpoint of w, and then you immediately decide to drive in the direction of the second vector. Finally you reach w. In the whole process we only traveled along 2 directions, so w is a linear combination of these two vectors with some appropriate amounts.
We can use the same philosophy to imagine that among any 4 vectors in space, one should be a linear combination of the other three. (In general position, you drive along 1 vector, with your eyes looking along the plane spanned by the other 2.)
Remark 4. The previous one is just an example. When you are driving, please keep your eyes on the road! Never look around. No linear algebra when you drive!
 
 
 
Example 115. Let V = M_{3×1}(R). Then $\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix}$ is a linear combination of $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$, with coefficients 2 and 2, that is,
$$\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \times 2 + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times 2$$
Example 116. Let V = {ax² + bx + c | a, b, c ∈ R}. The polynomial x² + 1 is a linear combination of x² and 1, with coefficients 1, 1. x² + 1 is also a linear combination of x² − x and x + 1, again with coefficients 1, 1.
Definition 40. The subspace spanned by {v1, v2, ..., vn} is the subset of all possible linear combinations of {v1, v2, ..., vn}. This set forms a linear subspace, called the subspace spanned by {v1, v2, ..., vn}, denoted span{v1, v2, ..., vn}.
Example 117. Geometrically, the subspace spanned by a bunch of vectors is the minimal linear subspace that contains these vectors. It could also be viewed as the subspace generated by these vectors, where "generate" means putting in all possible linear combinations of them.


 
Example 118. Suppose V = M_{3×1}(R), and $v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $v_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$. Then the subspace spanned by v1 and v2 consists of all possible linear combinations of v1 and v2. Let's see:
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times x + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times y = \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}$$
So this subspace consists of all column vectors whose last entry is zero.
Definition 41. Suppose V is a linear space over F. What we call a linear relation of S = {v1, v2, ..., vn} is an equation expressing a linear combination of these vectors producing 0, i.e. an equation like
$$a_1 v_1 + a_2 v_2 + \cdots + a_n v_n = 0$$
Clearly we can always choose all the coefficients to be zero; in that case we call it the trivial linear relation.
Definition 42. Suppose S = {v1, v2, ..., vn} is a subset of a linear space V. We call the set S linearly dependent if it has a non-trivial linear relation; that is, there exist scalars a1, a2, ..., an, not all 0, such that
$$a_1 v_1 + a_2 v_2 + \cdots + a_n v_n = 0$$
Definition 43. Suppose S = {v1, v2, ..., vn} is a subset of a linear space V. We call the set S linearly independent if
$$a_1 v_1 + a_2 v_2 + \cdots + a_n v_n = 0$$
forces a1 = a2 = ... = an = 0; that is, this set of vectors has only the trivial linear relation.
Geometrically speaking, linear dependence is the case when two vectors are collinear, three vectors are coplanar, or, even more extremely, three vectors are collinear. Normally, n vectors should be able to span an n-dimensional space. Whenever they fail to do so, they are linearly dependent, and in this case we can always remove the redundant vectors and minimize the set while keeping its ability to span the space we want. Look at the following picture:
These three vectors are linearly dependent because they can only span a plane. In this case, removing any one vector leaves two that can still span the plane, and those two are linearly independent.
In this other case, the three vectors are also not able to span the three-dimensional space, but we may only remove one of the collinear pair, to make sure the rest can still span the plane.
Example 119. The zero vector by itself is a linearly dependent set. To see why, look at the expression
$$a \cdot 0 = 0$$
We can choose a to be any non-zero scalar and still produce the zero vector, so the set {0} is linearly dependent.
Example 120. In the plane, two collinear vectors are linearly dependent. Precisely, if w = λv, then the following holds:
$$w - \lambda v = 0$$
The first coefficient is 1 and the second is −λ.
Example 121. If w can be linearly expressed by v1, v2, ..., vn, then if we put them together, namely w, v1, v2, ..., vn, this set of vectors is linearly dependent. Indeed, from
$$w = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$$
we can write
$$0 = (-1)w + a_1 v_1 + \cdots + a_n v_n$$
Clearly not all of the coefficients are 0 (at least the first, −1, is not 0), so they are linearly dependent.
This is exactly where the word "dependent" comes from: linearly dependent means some of the vectors can be expressed as a linear combination of the other vectors, so that vector depends on the others.
Example 122. If V = M_{3×1}(R), the vectors $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$ are linearly dependent, because we can find not-all-zero scalars such that
$$\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times 1 + \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times 1 + \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \times (-1) = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
When we remove $\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, the remaining two vectors become linearly independent. To see this, suppose we have
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times x + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times y = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
That forces
$$\begin{pmatrix} x \\ y \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
that is, x = 0 and y = 0.
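Example 122 can also be checked mechanically: stack the vectors as columns of a matrix; they are linearly dependent exactly when the rank is smaller than the number of columns (a sketch with numpy, an assumed tool):

```python
import numpy as np

# Columns are the three vectors of Example 122.
M = np.array([[1, 0, 1],
              [0, 1, 1],
              [0, 0, 0]], dtype=float)
assert np.linalg.matrix_rank(M) == 2     # rank < 3 columns: dependent

# The non-trivial relation: col1*1 + col2*1 + col3*(-1) = 0.
assert np.allclose(M @ np.array([1.0, 1.0, -1.0]), 0)

# Dropping the third column leaves an independent pair.
assert np.linalg.matrix_rank(M[:, :2]) == 2
```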
Example 123. Suppose we have a bunch of vectors S = {v1, v2, ..., vn}. No matter whether they are linearly independent or dependent, if we put the zero vector in, T = S ∪ {0} = {0, v1, v2, ..., vn}, then this set of vectors becomes linearly dependent. To see this, we need only combine them to make 0: choose the coefficient of 0 to be 1 and the coefficients of the others to be 0, to get
$$0 \times 1 + v_1 \times 0 + v_2 \times 0 + \cdots + v_n \times 0 = 0$$
We have successfully chosen not-all-zero scalars combining them to 0, so they are linearly dependent.
Keep in mind why we choose a bunch of vectors in the space and talk so much about span and linear independence: there are infinitely many vectors in the space, and they are invisible. We want to choose finitely many of them to represent all of them, and the method of representation is linear combination. We want to choose enough vectors to be able to represent every vector in the space (so that they span the whole space); this guarantees the existence of scalars representing each vector. At the same time, we also want to choose as few vectors as we can, for convenience, by removing redundant (linearly dependent) ones; this guarantees the uniqueness of the scalars representing each vector.
Proposition 62. Let V be a linear space over F and S = {v1, v2, ..., vn} a set of vectors. Then the following are equivalent:
(1) v1, v2, ..., vn span V
(2) For any w ∈ V, there exist a1, a2, ..., an such that w = a1v1 + a2v2 + ... + anvn
Proof.
(1) → (2): Because v1, v2, ..., vn span V, by definition every element of V is a linear combination of v1, v2, ..., vn. That exactly means there exist a1, a2, ..., an such that w = a1v1 + a2v2 + ... + anvn.
(2) → (1): Choose any vector w ∈ V. There exist a1, a2, ..., an such that w = a1v1 + a2v2 + ... + anvn, so w is a linear combination of v1, v2, ..., vn; hence w ∈ span{v1, v2, ..., vn}. Because every vector lies in span{v1, v2, ..., vn}, we get V = span{v1, v2, ..., vn}.
Proposition 63. Let V be a linear space over F and S = {v1, v2, ..., vn} a set of vectors. Then the following are equivalent:
(1) v1, v2, ..., vn are linearly independent
(2) For any w ∈ V that can be linearly expressed by v1, v2, ..., vn, in the equation w = a1v1 + a2v2 + ... + anvn the choice of scalars is unique
Proof.
(1) → (2): Because v1, v2, ..., vn are linearly independent, a1v1 + a2v2 + ... + anvn = 0 forces a1 = a2 = ... = an = 0. Now suppose we have two expressions
$$w = b_1 v_1 + b_2 v_2 + \cdots + b_n v_n = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$$
Looking at the last equation and moving the terms to the left, we get
$$(b_1 - c_1)v_1 + (b_2 - c_2)v_2 + \cdots + (b_n - c_n)v_n = 0$$
Because v1, v2, ..., vn are linearly independent, this forces the coefficients to be 0, that is, (b1 − c1) = ... = (bn − cn) = 0, i.e. b1 = c1, b2 = c2, ..., bn = cn. So the expression is unique.
(2) → (1): Suppose for any vector linearly expressed by v1, v2, ..., vn the expression is unique. Obviously 0 can be linearly expressed by v1, v2, ..., vn with all coefficients 0. By uniqueness, the coefficients can only be 0. Thus if a1v1 + ... + anvn = 0, then a1 = ... = an = 0. This means v1, v2, ..., vn are linearly independent.
Now we want both uniqueness and existence. That would be great, so let us define the best bunch of vectors, which have both.
Definition 44. If V is a linear space over F, then the set {e1, e2, ..., en} is called a basis of V if
(1) {e1, e2, ..., en} is linearly independent;
(2) {e1, e2, ..., en} spans the whole space V.
Proposition 64. Suppose {e1, e2, ..., en} is a basis for V. Then every vector w ∈ V can be uniquely expressed as a linear combination of e1, e2, ..., en.
Proof. Spanning the space guarantees the existence of the linear combination; linear independence guarantees the uniqueness.
Example 124. Let V = {ax² + bx + c | a, b, c ∈ R}. We said it is a linear space; the vectors x², x, 1 form a basis for it. Firstly, any polynomial of degree at most 2 is a linear combination of these three vectors over R; besides, if a linear combination of them gives the zero polynomial, then all of the coefficients must be 0. Besides this standard basis, x² − 2x + 1, x − 1, 1 is also a basis. To see why, we just check the properties: for any polynomial, we can substitute x by (x − 1) + 1 and expand in terms of (x − 1), which gives the appropriate linear combination of these vectors; and if a combination of these vectors gives the zero polynomial, then all the coefficients must be 0. So it is indeed a basis.
     
Example 125. If V = M_{3×1}(R), then $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$ is a basis. To see why: firstly, any vector $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ can be written as a linear combination of these three:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times x + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times y + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \times z$$
Besides, if a linear combination of them gives the zero vector, then all the coefficients must be 0, because
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times x + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times y + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \times z = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
forces x = y = z = 0. We can also have a different basis: for example $\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ is also a basis. How to show this is a basis? We leave it to the reader.
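The verification left to the reader can be sketched mechanically: three vectors in M_{3×1}(R) form a basis exactly when the matrix having them as columns is invertible, i.e. has non-zero determinant (a sketch with numpy, an assumed tool; the test vector w is made up):

```python
import numpy as np

# Columns are the candidate basis vectors from Example 125.
E = np.array([[1.0,  0.0, 1.0],
              [1.0, -1.0, 0.0],
              [0.0,  1.0, 0.0]])
assert not np.isclose(np.linalg.det(E), 0)   # invertible: a basis

# Coordinates of an arbitrary vector w in this basis: solve E a = w.
w = np.array([2.0, 3.0, 5.0])
a = np.linalg.solve(E, w)
assert np.allclose(E @ a, w)                 # w is recovered uniquely
```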
Now that we have the concept of basis, let's define the most important concept of this course: dimension.
Proposition 65. V is a linear space; suppose S1 = {e1, e2, ..., en} is a basis for V, and S2 = {ε1, ε2, ..., εm} is also a basis for V. Then the numbers of elements in these two sets are the same. This number is called the dimension of V.
We leave this exercise to the reader. But don't do it right now; keep reading the next section.
3.2.1. Using Matrix to simplify notation.
In this section, we emphasize the historical mistake: λv really means v × λ.
As you can see, we didn't use any matrix notation until now, and the methods we used to prove things were simply plus, times and equals: we always had to move a lot of terms from the right to the left and factor out the common factor. So far so good, because we only treated one equation at a time. In the future, if we stick to this kind of notation, we will get stuck. That's why the notion of matrix can simplify all the work: we can represent a summation without using "+". Look at the following:
Example 126.
$$a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$





Example 127. Previously, we have the equation like
v1 × a1 + v2 × a2 + · · · + vn × an = v1 × b1 + v2 × b2 + · · · + vn × bn
and we conclude by move the right hand side to the left and then minus and then factor out vi to get
v1 × (a1 − b1 ) + v2 × (a2 − b2 ) + · · · + vn × (an − bn ) = 0
But this kind of equation can easily use matrix to represent.
Originally we have

v1 v2 · · · vn




a1
a2
..
.




=

v1 v2
an

b1
 b2 

· · · vn 
 ··· 
bn
Simply move the right hand side to the left, and factor out the common row matrix, we get

v1 v2 · · · vn


(




b1
 

  b2 
) = 
−

··· 


bn
an
a1
a2
..
.

0
0
..
.





0
And simply use matrix substraction, that is

v1 v2 · · · vn




a1 − b1
a2 − b2
..
.
an − bn


 
 
=
 
0
0
..
.





0
This is not even the worst case. If you stick to the classical notation, then sometimes you have to do substitutions: you substitute n variables by another n variables, and each variable depends on all n new variables. If you do that one by one, you will cry and never want to touch math again; now matrices come and tell you they can help, and the beauty of math comes again to intoxicate you. Matrix multiplication simplifies substitution of variables, and the associativity of matrix multiplication enables you to organize the work.
Example 128. If
$$\begin{cases} p = 3x + 2y + z \\ q = 2x + y \\ r = x + y + 2z \end{cases} \qquad \begin{cases} u = p + 2q + 4r \\ v = 5p + r \\ w = p + q + r \end{cases}$$
please write u, v, w in terms of x, y, z.
If you try to substitute again and again, it is a good exercise in calculational endurance. Now let's do it by matrices.
In the first system, we see that the scalars act on the left of the variables; we should keep the order.
$$\begin{cases} p = 3x + 2y + z \\ q = 2x + y \\ r = x + y + 2z \end{cases} \implies \begin{cases} p = \begin{pmatrix} 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \\[4pt] q = \begin{pmatrix} 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \\[4pt] r = \begin{pmatrix} 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \end{cases}$$
The important skill is:
In a column matrix (a matrix with only 1 column), a right common factor (no matter what it is) can be factored out to the right.
In a row matrix (a matrix with only 1 row), a left common factor (no matter what: same numbers, same matrices, same hamburgers) can be factored out to the left.
$$\begin{pmatrix} p \\ q \\ r \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \\[4pt] \begin{pmatrix} 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \\[4pt] \begin{pmatrix} 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \end{pmatrix} \implies \begin{pmatrix} p \\ q \\ r \end{pmatrix} = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
Now let's go back. The equations
$$\begin{cases} p = 3x + 2y + z \\ q = 2x + y \\ r = x + y + 2z \end{cases} \qquad \begin{cases} u = p + 2q + 4r \\ v = 5p + r \\ w = p + q + r \end{cases}$$
tell us nothing more than
$$\begin{pmatrix} p \\ q \\ r \end{pmatrix} = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \qquad \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} 1 & 2 & 4 \\ 5 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} p \\ q \\ r \end{pmatrix}$$
Then to get u, v, w in terms of x, y, z, we only need to use the first equation to substitute into the second; thus
$$\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} 1 & 2 & 4 \\ 5 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} p \\ q \\ r \end{pmatrix} = \begin{pmatrix} 1 & 2 & 4 \\ 5 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 11 & 8 & 9 \\ 16 & 11 & 7 \\ 6 & 4 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
Thus we know
$$\begin{cases} u = 11x + 8y + 9z \\ v = 16x + 11y + 7z \\ w = 6x + 4y + 3z \end{cases}$$
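The substitution in Example 128 is just a matrix product, which we can check numerically (a sketch with numpy, an assumed tool):

```python
import numpy as np

# p, q, r in terms of x, y, z.
A = np.array([[3, 2, 1],
              [2, 1, 0],
              [1, 1, 2]])
# u, v, w in terms of p, q, r.
B = np.array([[1, 2, 4],
              [5, 0, 1],
              [1, 1, 1]])

# Substitution of variables is exactly the product B @ A.
C = B @ A
print(C)   # coefficients of u, v, w in terms of x, y, z

# Spot-check against direct substitution at a sample point.
x, y, z = 1, 2, 3
p, q, r = A @ np.array([x, y, z])
u, v, w = B @ np.array([p, q, r])
assert np.array_equal(C @ np.array([x, y, z]), np.array([u, v, w]))
```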
Now let's use the language of matrices to redefine everything we did.
3.2.2. Matrix interpretation. Previously we put bunches of vectors in sets, while in future study you will find that the order in which we arrange them is important. So now, instead of putting them into a set, it is more reasonable to put them in a row matrix. Where previously we always talked about the linear independence of {v1, v2, ..., vn}, it is now more reasonable to speak of $\begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}$.
Definition 45. Suppose V is a linear space. The linear combination of v1, v2, ..., vn by scalars a1, a2, ..., an means
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
The left factor is a 1 × n matrix over V; the right is an n × 1 matrix over F.
Definition 46. Suppose w ∈ V and v1, v2, ..., vn ∈ V. We say w can be linearly expressed by {v1, v2, ..., vn} if the equation in the variable $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$
$$w = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
has a solution. (Warning: this is not a linear equation in the earlier sense, because the left matrix is not a number matrix.)
Definition 47. The subspace spanned by (v1, v2, ..., vn) is the collection of all possible values of the expression
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}$$
as each $t_i$ ranges over the base field.
Definition 48. (v1, v2, ..., vn) is said to be linearly dependent if the following equation in the variable $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$ has a non-zero solution:
$$0 = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Definition 49. (v1, v2, ..., vn) is said to be linearly independent if the following equation in the variable $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$ has only the zero solution:
$$0 = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
Example 129. Suppose $(v_1\ v_2\ \cdots\ v_n)$ is linearly independent, and
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.$$
Show the cancellation rule:
$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$
Proof. From
$$\begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$
we get
$$\begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} - \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = 0$$
That is
$$\begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}\left(\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} - \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}\right) = 0$$
That is
$$\begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}\begin{pmatrix} a_1 - b_1 \\ \vdots \\ a_n - b_n \end{pmatrix} = 0$$
Because $(v_1\ v_2\ \cdots\ v_n)$ is linearly independent, viewing the above as an equation, the only case in which it holds is
$$\begin{pmatrix} a_1 - b_1 \\ \vdots \\ a_n - b_n \end{pmatrix} = 0,$$
that is,
$$\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}$$
So, arithmetically speaking, a linearly independent row of vectors in a (right) linear space can be cancelled from the left, but a linearly dependent row cannot.
Definition 50. Let V be a linear space. Then $(e_1\ e_2\ \cdots\ e_n)$ is called a basis if
(1) $(e_1\ e_2\ \cdots\ e_n)$ spans the whole space; that is, for any v ∈ V there exist scalars $a_1, \cdots, a_n$ such that
$$v = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
(2) $(e_1\ e_2\ \cdots\ e_n)$ is linearly independent; that is, if
$$\begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = 0, \quad \text{then} \quad \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Proposition 66. If $(e_1\ e_2\ \cdots\ e_n)$ is a basis for the vector space V, then every vector v ∈ V can be expressed in a unique way as
$$v = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
Proof. The existence of this expression is guaranteed by the first property of the definition of basis. To show the uniqueness, remember that $(e_1\ e_2\ \cdots\ e_n)$ is linearly independent, so by the previous example the scalars are unique.
Definition 51. Suppose V is a linear space and $(e_1\ e_2\ \cdots\ e_n)$ is a basis. Then in the expression
$$v = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
the column vector $(a_1, a_2, \dots, a_n)^T$ over the base field is called the coordinate of v with respect to the basis $(e_1\ e_2\ \cdots\ e_n)$.
Example 130. Let $V = \{x^2 \times a + x \times b + c \mid a, b, c \in \mathbb{R}\}$ be a right vector space over R (we multiply scalars on the right because we want it to be a right vector space). Choose $(x^2\ x\ 1)$ to be a basis. Then any polynomial can be expressed in this way; for example,
$$x^2 + x + 1 = \begin{pmatrix} x^2 & x & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
Then the coordinate of $x^2 + x + 1$ with respect to this basis is $(1, 1, 1)^T$. Now let's change to another basis, say $(x^2+1\ \ x+1\ \ 2)$. Then we observe that
$$x^2 + x + 1 = (x^2 + 1) \times 1 + (x + 1) \times 1 + 2 \times \left(-\tfrac{1}{2}\right).$$
Then
$$x^2 + x + 1 = \begin{pmatrix} x^2+1 & x+1 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ -\tfrac{1}{2} \end{pmatrix}.$$
So the coordinate of this vector with respect to the other basis is $(1, 1, -\tfrac{1}{2})^T$.
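As a quick check of this computation, one can solve for the coordinates with a computer algebra system. The following is an illustrative sketch using SymPy (not part of the original notes); it recovers the coordinates 1, 1, −1/2 found above by comparing coefficients.

```python
import sympy as sp

x = sp.symbols("x")
a, b, c = sp.symbols("a b c")

# The second basis from Example 130, as polynomials.
basis = [x**2 + 1, x + 1, sp.Integer(2)]
target = x**2 + x + 1

# Solve  a*(x^2+1) + b*(x+1) + c*2 == x^2 + x + 1  coefficient by coefficient.
eq = sp.Poly(a * basis[0] + b * basis[1] + c * basis[2] - target, x)
sol = sp.solve(eq.all_coeffs(), [a, b, c])
print(sol)  # coordinates: a = 1, b = 1, c = -1/2
```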
Example 131. Let $V = M_{3\times 1}(\mathbb{R})$, and choose the basis
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
For keeping the beauty of mathematics, we write it as $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. Now the vector $(2, 4, 2)^T$ is a linear combination of the basis, namely
$$\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times 2 + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times 4 + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \times 2,$$
which means
$$\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix},$$
so the coordinate of $(2, 4, 2)^T$ with respect to the basis $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ is $(2, 4, 2)^T$. (Looks like matrix multiplication, huh?)
Now we choose another basis $\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$. We observe that
$$\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \times 2 + \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \times 2 + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \times 0.$$
This means
$$\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix}.$$
This means the coordinate of $(2, 4, 2)^T$ with respect to the basis $\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$ is $(2, 2, 0)^T$.
We sketch a very important point of view: when a basis is chosen, we can treat vectors as column matrices.
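With a basis fixed, finding the coordinate of a vector is just solving a small linear system. A hedged sketch of the second computation from Example 131, using NumPy:

```python
import numpy as np

# Columns are the second basis of Example 131.
E = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
v = np.array([2.0, 4.0, 2.0])

# The coordinate column satisfies  E @ coords = v.
coords = np.linalg.solve(E, v)
print(coords)  # [2. 2. 0.]
```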
Proposition 67. Suppose V is a linear space over F, v, w ∈ V, and $(e_1\ e_2\ \cdots\ e_n)$ is a basis of V. Let
$$v = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}; \qquad w = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$
Then:
$$v + w = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{pmatrix}$$
$$v - w = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 - b_1 \\ a_2 - b_2 \\ \vdots \\ a_n - b_n \end{pmatrix}$$
$$v \times \lambda = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\left(\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \times \lambda\right)$$
This means that although a linear space is an abstract object, once a basis is chosen we can treat its elements as if they were column vectors, by looking at their coordinates. The sum of vectors is just the sum of coordinates, the difference of two vectors is the difference of coordinates, and a scalar multiple of a vector is the scalar multiple of its coordinate. So the most important vector space we should handle well is the vector space of column matrices $M_{n\times 1}(F)$. If we understand this vector space well, then every other vector space is the same as this one once we choose a basis.
Previously, with a basis chosen, every single vector has a unique coordinate, and we are able to handle it by just thinking of it as a column matrix.
But how do we know whether a bunch of vectors is linearly independent? How do we know whether they span the whole space or not? How do we know whether two bunches of vectors span the same space?
With a basis chosen, every vector is transferred to a coordinate. The coordinate is like a name card of the vector, a realization of an abstract object by an explicitly computable and understandable column matrix. When we want to talk about a property of a vector, it is enough to look at its name card, the column matrix. To study the properties of a bunch of vectors, we should put them together; so in fact we put their coordinates together to form a matrix, link the properties of the bunch of vectors to properties of the matrix, and use this matrix to give algorithms for deciding whether the vectors are linearly independent, span the whole space, or form a new basis.
Let's begin with the strict definition of the coordinate matrix we want.
Proposition 68. Suppose V is a linear space with basis $(e_1\ e_2\ \cdots\ e_n)$. Then for every bunch of vectors $(v_1\ v_2\ \cdots\ v_m)$ there exists a unique n × m matrix P such that
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_m \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} P,$$
and each column of P is the coordinate of the corresponding $v_i$ with respect to the basis $(e_1\ e_2\ \cdots\ e_n)$.
Proof. Because $(e_1\ e_2\ \cdots\ e_n)$ is a basis, for each $v_i$, where i ranges from 1 to m, there exists a unique coordinate column matrix such that
$$v_i = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} p_{1i} \\ p_{2i} \\ \vdots \\ p_{ni} \end{pmatrix}.$$
Putting all these column vectors together, let
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nm} \end{pmatrix}$$
Then we have exactly
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_m \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} P.$$
The uniqueness of this matrix comes from the uniqueness of coordinates with respect to a basis. Each column of this matrix is exactly the coordinate of the corresponding $v_i$.
Theorem 5. Suppose V is a linear space over F. If there exists a basis $(e_1\ e_2\ \cdots\ e_n)$ consisting of finitely many vectors, say n, then any basis of V consists of n vectors.
Proof. Suppose we have two bases of the linear space, namely $(e_1\ e_2\ \cdots\ e_n)$ and $(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)$.
Because $(e_1\ e_2\ \cdots\ e_n)$ is a basis, there exists a unique matrix $P_{n\times m}$ such that
$$(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m) = (e_1\ e_2\ \cdots\ e_n)P.$$
Because $(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)$ is a basis, there exists a unique matrix $Q_{m\times n}$ such that
$$(e_1\ e_2\ \cdots\ e_n) = (\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)Q.$$
Substituting this equation into the previous one, we have
$$(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m) = (\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)QP.$$
Because we also have the obvious identity
$$(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m) = (\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)I_m,$$
and because coordinates with respect to the basis $(\varepsilon_1\ \cdots\ \varepsilon_m)$ are unique, we get
$$QP = I_m.$$
Similarly, we have
$$PQ = I_n.$$
Using the identity
$$\det(I_m + QP) = \det(I_n + PQ),$$
we now know $2^m = 2^n$, so m = n. Thus the bases $(e_1\ \cdots\ e_n)$ and $(\varepsilon_1\ \cdots\ \varepsilon_m)$ have the same number of elements. By the arbitrary choice of bases, we have shown that every basis consists of the same number of vectors.
Corollary 3. Suppose V is a linear space over F with basis $(e_1\ e_2\ \cdots\ e_n)$. Then the bunch of vectors $(v_1\ v_2\ \cdots\ v_n)$ is a basis if and only if, in the expression
$$(v_1\ v_2\ \cdots\ v_n) = (e_1\ e_2\ \cdots\ e_n)P,$$
the matrix P is invertible.
Proof. Suppose $(v_1\ v_2\ \cdots\ v_n)$ is a basis. Then there exists a unique square matrix Q such that
$$(e_1\ e_2\ \cdots\ e_n) = (v_1\ v_2\ \cdots\ v_n)Q.$$
Doing substitution,
$$(e_1\ e_2\ \cdots\ e_n) = (e_1\ e_2\ \cdots\ e_n)PQ.$$
We know $PQ = I_n$, so P is invertible.
On the other hand, suppose P is invertible. If
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = 0,$$
that means
$$\begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} P \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = 0.$$
That means
$$P\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
so
$$\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = P^{-1}\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
So $(v_1\ v_2\ \cdots\ v_n)$ is linearly independent.
For any vector w ∈ V there exists an expression
$$w = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}.$$
Substituting $(e_1\ e_2\ \cdots\ e_n) = (v_1\ v_2\ \cdots\ v_n)P^{-1}$, we have
$$w = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} P^{-1} \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}.$$
Thus any vector is a linear combination of the $v_i$, so $(v_1\ v_2\ \cdots\ v_n)$ spans the whole space. Hence $(v_1\ v_2\ \cdots\ v_n)$ is a basis.
Now the previous work gives us the first link: bases are linked to invertible matrices.
Example 132. Suppose V is a linear space over F with a basis $(e_1\ e_2\ e_3)$, and suppose we know
$$\begin{cases} v_1 = 3e_1 + 2e_2 \\ v_2 = 5e_1 + e_2 + e_3 \\ v_3 = 2e_1 + 6e_3 \end{cases}$$
Is $(v_1\ v_2\ v_3)$ a basis?
Answer: Write this in matrix form:
$$(v_1\ v_2\ v_3) = (e_1\ e_2\ e_3)\begin{pmatrix} 3 & 5 & 2 \\ 2 & 1 & 0 \\ 0 & 1 & 6 \end{pmatrix}$$
Then, because $(e_1\ e_2\ e_3)$ is a basis, this is the same as asking whether $\begin{pmatrix} 3 & 5 & 2 \\ 2 & 1 & 0 \\ 0 & 1 & 6 \end{pmatrix}$ is invertible. By calculating its determinant,
$$\begin{vmatrix} 3 & 5 & 2 \\ 2 & 1 & 0 \\ 0 & 1 & 6 \end{vmatrix} = -38 \neq 0,$$
we know that it is invertible; thus $(v_1\ v_2\ v_3)$ forms a basis.
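The determinant test above is easy to reproduce numerically. A small sketch, assuming NumPy is available:

```python
import numpy as np

# Coordinate matrix of (v1 v2 v3) from Example 132; columns are coordinates.
P = np.array([[3, 5, 2],
              [2, 1, 0],
              [0, 1, 6]])

d = int(round(np.linalg.det(P)))
print(d)  # -38: nonzero, so P is invertible and (v1 v2 v3) is a basis
```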
Definition 52. Let V be a linear space over F. The number of vectors in a basis of V does not depend on the choice of basis; this number is called the dimension of V.
Example 133. In our daily life, suppose an origin is chosen: a point is 0-dimensional, an infinite line is 1-dimensional over R, and an infinite flat plane is 2-dimensional over R. With time held fixed, and regardless of the curvature of our space, the world we live in is 3-dimensional over R. If you would like to add the freedom of time, then the world we live in is 4-dimensional over R. But what is the relation between time and length? Roughly speaking, you could think as if
$$1\,\mathrm{s} = 299792458\sqrt{-1}\,\mathrm{m}.$$
Example 134. Let $V = M_{2\times 2}(\mathbb{R})$ with the usual addition and scalar multiplication. Then V is a 4-dimensional vector space over R, because it has a basis consisting of
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \text{ and } \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
Now let's consider its subset
$$\left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\middle|\; \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\}.$$
This is actually a subspace. Why? Because if we have two matrices in the subset, say
$$\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix},$$
then any linear combination of these two matrices has the same property:
$$\left(\lambda\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} + \mu\begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix}\right)\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \lambda\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} + \mu\begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Now, what is the dimension of this linear subspace? As we can see, the equation
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
gives us 2 linear relations. With 2 restrictions, we expect the dimension to be 4 − 2 = 2. Let's check. By matrix multiplication, we have the following relations:
$$a + 2b = 0, \qquad c + 2d = 0.$$
So $a = -2b$ and $c = -2d$, and we can represent the elements of this set as
$$\begin{pmatrix} -2b & b \\ -2d & d \end{pmatrix}.$$
There are no restrictions on b and d now; they are free. Then clearly it has basis
$$\begin{pmatrix} -2 & 1 \\ 0 & 0 \end{pmatrix} \text{ and } \begin{pmatrix} 0 & 0 \\ -2 & 1 \end{pmatrix},$$
so the subspace is 2-dimensional.
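The dimension count 4 − 2 = 2 is an instance of rank–nullity: the subspace is the nullspace of the constraint map. A sketch of the check in NumPy:

```python
import numpy as np

# The condition  [[a,b],[c,d]] @ (1,2)^T = 0  written as linear constraints
# on the coordinate vector (a, b, c, d):  a + 2b = 0  and  c + 2d = 0.
C = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 2.0]])

rank = np.linalg.matrix_rank(C)
dim_subspace = 4 - rank  # rank-nullity: dimension of the solution space
print(dim_subspace)  # 2
```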
Proposition 69. Suppose V is an n-dimensional space over F, and suppose $(v_1\ v_2\ \cdots\ v_n)$ consists of n vectors, where n equals the dimension, and is linearly independent. Then $(v_1\ v_2\ \cdots\ v_n)$ is a basis of V.
Proof. Take a basis of V, say $(e_1\ e_2\ \cdots\ e_n)$. Then there is a matrix P such that $(v_1\ v_2\ \cdots\ v_n) = (e_1\ e_2\ \cdots\ e_n)P$. Now we claim that because $(v_1\ v_2\ \cdots\ v_n)$ is linearly independent, $\mathrm{rank}(P) = n$. This is because
$$\begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} P \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = 0$$
only has the zero solution. This means
$$P\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
only has the zero solution. This means there are no free variables; because the number of free variables equals $n - \mathrm{rank}(P)$, we get $n - \mathrm{rank}(P) = 0$, so $\mathrm{rank}(P) = n$. Thus P is invertible, and $(v_1\ v_2\ \cdots\ v_n)$ is another basis.
Proposition 70. Suppose V is an n-dimensional space over F. Then any n + 1 vectors in V are linearly dependent.
Proof. Suppose, for contradiction, that the n + 1 vectors together were linearly independent. Then any n of them are linearly independent, so by the previous proposition the first n of them form a basis. Then the remaining vector can be linearly represented by those n vectors, so the n + 1 vectors are actually linearly dependent, a contradiction.
Corollary 4. Suppose V is a linear space over F and W is a subspace of V. Then
$$\dim(W) \leq \dim(V)$$
Proof. Choose a basis of W; the vectors in it are also linearly independent in V, so the statement is clear.
Example 135. Show that if A is a 2 × 2 matrix over R, then there exists a polynomial $f(x) = ax^2 + bx + c$ of degree at most 2 such that $f(A) = aA^2 + bA + cI$ is not invertible.
Proof. We consider the space $V = M_{2\times 1}(\mathbb{R})$, take a non-zero vector, say $\begin{pmatrix} 2 \\ 1 \end{pmatrix}$, and consider the following three vectors:
$$I\begin{pmatrix} 2 \\ 1 \end{pmatrix}; \quad A\begin{pmatrix} 2 \\ 1 \end{pmatrix}; \quad A^2\begin{pmatrix} 2 \\ 1 \end{pmatrix}.$$
These are three vectors in the 2-dimensional space V, so they must be linearly dependent. So there exists a non-trivial linear relation
$$aA^2\begin{pmatrix} 2 \\ 1 \end{pmatrix} + bA\begin{pmatrix} 2 \\ 1 \end{pmatrix} + cI\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
This is the same thing as
$$(aA^2 + bA + cI)\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
that is,
$$f(A)\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
So f(A) is not invertible, because if it were, then
$$\begin{pmatrix} 2 \\ 1 \end{pmatrix} = [f(A)]^{-1}\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
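The argument above is constructive, so we can trace it numerically. The matrix A below is an arbitrary sample choice, not one fixed by the notes; the code finds a nontrivial relation among A²v, Av, v and confirms that the resulting f(A) is singular:

```python
import numpy as np

# A sample 2x2 matrix (any choice works); v is the non-zero vector (2, 1).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([2.0, 1.0])

# The three vectors A^2 v, A v, v live in a 2-dimensional space, so the
# 2x3 matrix having them as columns has a non-trivial nullspace.
M = np.column_stack([A @ A @ v, A @ v, v])
_, _, Vt = np.linalg.svd(M)
a, b, c = Vt[-1]  # right singular vector spanning the nullspace of M

f_of_A = a * A @ A + b * A + c * np.eye(2)
print(abs(np.linalg.det(f_of_A)))  # ~0: f(A) is not invertible
```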
Example 136. Let $P_{x^2} = \{ax^2 + bx + c \mid a, b, c \in \mathbb{R}\}$ be the space of all polynomials of degree at most 2. Show that the following polynomials form a basis:
$$u(x) = x(x-1); \quad v(x) = x(x-2); \quad w(x) = (x-1)(x-2).$$
Proof. As we know, $P_{x^2}$ is 3-dimensional, because we can choose the basis $(x^2\ x\ 1)$. So in order to show that 3 vectors form a basis, it is enough to show they are linearly independent. To show this, suppose there is a linear relation, that is,
$$a\,x(x-1) + b\,x(x-2) + c\,(x-1)(x-2) = 0.$$
This means the equation is true for every value of x. Plugging x = 1 into this equation, we get $-b = 0$. Plugging x = 2 in, we get $2a = 0$. Plugging x = 0 in, we get $2c = 0$. That means a = b = c = 0, so they are linearly independent; with the fact that their number equals the dimension, they form a basis.
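Alternatively, one can verify Example 136 with the coordinate method of Example 132: put the coefficients of u, v, w with respect to the basis (x², x, 1) into a matrix as columns and check that its determinant is nonzero. A sketch (not part of the original notes):

```python
import numpy as np

# Coordinates of u(x) = x^2 - x, v(x) = x^2 - 2x, w(x) = x^2 - 3x + 2
# with respect to the basis (x^2, x, 1), written as columns.
P = np.array([[1, 1, 1],
              [-1, -2, -3],
              [0, 0, 2]])

d = int(round(np.linalg.det(P)))
print(d)  # -2: nonzero, so u, v, w form a basis
```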
Change of Basis
Suppose we have two bases of a linear space; what is the relationship between the coordinates of the same vector? We have studied that in a vector space any two bases consist of the same number of vectors. So suppose we have $(e_1\ e_2\ \cdots\ e_n)$ and $(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_n)$ as two bases; then there exists a unique invertible matrix P such that
$$(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_n) = (e_1\ e_2\ \cdots\ e_n)P.$$
This P is called the change-of-basis matrix.
There are several ways to change basis; look at the following example. We show three methods. Each method is essentially the same, but with different notational habits.
Example 137. (Classical) Suppose V is a linear space with basis $(e_1\ e_2\ e_3)$, and suppose we have three vectors
$$\begin{cases} \varepsilon_1 = e_1 \\ \varepsilon_2 = e_1 + e_2 \\ \varepsilon_3 = e_2 + e_3 \end{cases}$$
If $v = e_1 + 2e_2 + 4e_3$, what is the coordinate of v in the basis $(\varepsilon_1\ \varepsilon_2\ \varepsilon_3)$?
Answer: We are trying to compose a linear combination. Suppose the coefficients are x, y, z; then we list the equation
$$x e_1 + y(e_1 + e_2) + z(e_2 + e_3) = e_1 + 2e_2 + 4e_3.$$
Distributing and combining terms with the same $e_i$, we get
$$(x + y)e_1 + (y + z)e_2 + z e_3 = e_1 + 2e_2 + 4e_3.$$
By comparing the coefficients in front of $e_1, e_2, e_3$ respectively, we have
$$\begin{cases} x + y = 1 \\ y + z = 2 \\ z = 4 \end{cases}$$
Solving this system, we get x = 3, y = −2, z = 4.
Now we turn to the second method: because our basis is chosen and fixed, let's treat the vectors exactly like column vectors.
Example 138. (Simplified) What is the coordinate of $(1, 2, 4)^T$ in the basis $\left\{\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}\right\}$?
Answer: Let these three column vectors be combined with scalars x, y, z; we list the equation
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times x + \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \times y + \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \times z = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}$$
This is the same as asking
$$\begin{pmatrix} x + y \\ y + z \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix},$$
then we solve x = 3, y = −2, z = 4. So the coordinate of $(1, 2, 4)^T$ in the new basis is $(3, -2, 4)^T$.
Now let's see the third method.
Example 139. (Strict) Suppose V is a linear space with basis $(e_1\ e_2\ e_3)$, and another basis is given by
$$(\varepsilon_1\ \varepsilon_2\ \varepsilon_3) = (e_1\ e_2\ e_3)\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$
If v ∈ V has coordinate $(1, 2, 4)^T$ with respect to the basis $(e_1\ e_2\ e_3)$, what is the coordinate of v with respect to the basis $(\varepsilon_1\ \varepsilon_2\ \varepsilon_3)$?
Let $v = (\varepsilon_1\ \varepsilon_2\ \varepsilon_3)\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$. Then, because
$$(\varepsilon_1\ \varepsilon_2\ \varepsilon_3) = (e_1\ e_2\ e_3)\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix},$$
we have
$$v = (e_1\ e_2\ e_3)\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
On the other hand,
$$v = (e_1\ e_2\ e_3)\begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}.$$
So
$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}.$$
Then
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ -2 \\ 4 \end{pmatrix}.$$
Thus we know the coordinate of the vector v in the basis $(\varepsilon_1\ \varepsilon_2\ \varepsilon_3)$ is $(3, -2, 4)^T$.
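The third method is a one-line computation once the change-of-basis matrix P is written down. A sketch with NumPy (solving rather than explicitly inverting):

```python
import numpy as np

# Change-of-basis matrix from Example 139: columns are the coordinates
# of eps1, eps2, eps3 with respect to (e1, e2, e3).
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
old_coords = np.array([1.0, 2.0, 4.0])

# The new coordinates satisfy  P @ new_coords = old_coords.
new_coords = np.linalg.solve(P, old_coords)
print(new_coords)  # [ 3. -2.  4.]
```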
The algorithm for linear independence of vectors.
Now we figure out an algorithm to determine whether a bunch of vectors $(v_1\ v_2\ \cdots\ v_m)$ is linearly independent. With our basis chosen, this is the same as asking how to determine whether a bunch of column matrices is linearly independent. Keep in mind that in the notation $(v_1\ v_2\ \cdots\ v_m) = (e_1\ e_2\ \cdots\ e_n)P$, each column of P is exactly the coordinate of the corresponding $v_i$, so our intuition tells us that the linear independence of the $v_i$ is exactly the linear independence of the columns of P.
Proposition 71. Let V be an n-dimensional linear space over F, let $(e_1\ e_2\ \cdots\ e_n)$ be a fixed basis, let $(v_1\ v_2\ \cdots\ v_m)$ be a bunch of vectors, and let $(v_1\ v_2\ \cdots\ v_m) = (e_1\ e_2\ \cdots\ e_n)P$. Then the following are equivalent:
(1) $(v_1\ v_2\ \cdots\ v_m)$ is linearly independent.
(2) The columns of P are linearly independent.
Proof. The proof is clear, because a linear relation among $(v_1\ v_2\ \cdots\ v_m)$ is the same as a linear relation among the columns of P. What I mean is
$$\begin{pmatrix} v_1 & \cdots & v_m \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} = 0 \;\Leftrightarrow\; \begin{pmatrix} e_1 & \cdots & e_n \end{pmatrix} P \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} = 0 \;\Leftrightarrow\; P\begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$
So their linear independence is equivalent.
Example 140. Suppose V is a linear space with basis $(e_1\ e_2\ e_3)$, and suppose
$$\begin{cases} v_1 = 2e_1 + 3e_2 \\ v_2 = e_1 + e_2 \\ v_3 = e_2 \end{cases}$$
Then clearly $(v_1\ v_2\ v_3)$ has a linear relation:
$$v_1 - 2v_2 - v_3 = 0.$$
Now, with the basis chosen, every vector has a coordinate:
$$v_1 = (e_1\ e_2\ e_3)\begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}; \quad v_2 = (e_1\ e_2\ e_3)\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}; \quad v_3 = (e_1\ e_2\ e_3)\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
As you see, the same linear relation also holds for the coordinates:
$$\begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \times 2 - \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
So when a basis is chosen, the coordinate represents the vector. Now the question becomes how to determine whether a bunch of column matrices is linearly independent.
Proposition 72. The following are equivalent:
(1) The columns of the matrix P are linearly independent.
(2) The linear equation
$$P\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
only has the zero solution.
(3) The rank of P equals the number of its columns.
(4) The reduced row echelon form of P is $\begin{pmatrix} I_r \\ 0 \end{pmatrix}$, where r is the number of columns.
Proof. The proof is very easy; if you don't know how to prove it, please send me an email.
We use this proposition, together with what we learned before, to show several examples.
Example 141. Suppose V is a linear space with basis $(e_1\ e_2\ e_3)$, and
$$v_1 = 2e_1 + 3e_2, \qquad v_2 = e_1 + e_2.$$
Show that $(v_1\ v_2)$ is linearly independent.
Proof: Writing down the coordinates of $v_1, v_2$, we get
$$(v_1\ v_2) = (e_1\ e_2\ e_3)\begin{pmatrix} 2 & 1 \\ 3 & 1 \\ 0 & 0 \end{pmatrix}$$
First method: The rank of $\begin{pmatrix} 2 & 1 \\ 3 & 1 \\ 0 & 0 \end{pmatrix}$ is 2, equal to the number of its columns, so $(v_1\ v_2)$ is linearly independent.
Second method: We find that the linear equation
$$\begin{pmatrix} 2 & 1 \\ 3 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
only has the zero solution. That means
$$(e_1\ e_2\ e_3)\begin{pmatrix} 2 & 1 \\ 3 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0$$
only has the zero solution. Because $(v_1\ v_2)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 0$ only has the zero solution, by definition $(v_1\ v_2)$ is linearly independent.
Third method: We reduce $\begin{pmatrix} 2 & 1 \\ 3 & 1 \\ 0 & 0 \end{pmatrix}$ to its reduced row echelon form $\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$; the upper block is exactly an identity matrix, thus the columns are linearly independent.
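All three methods of Example 141 can be checked mechanically; here is a sketch using SymPy's rref and rank (not part of the original notes):

```python
import sympy as sp

# Coordinate matrix of (v1 v2) from Example 141.
P = sp.Matrix([[2, 1],
               [3, 1],
               [0, 0]])

R, pivots = P.rref()
print(R)         # Matrix([[1, 0], [0, 1], [0, 0]])
print(P.rank())  # 2, equal to the number of columns
```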
Get rid of redundant vectors
Suppose we have a bunch of vectors $(v_1\ v_2\ \cdots\ v_m)$; they span a subspace W, and they are not necessarily linearly independent. Can you choose a minimal subset of the $v_i$ such that they still span W but are linearly independent?
In order to do this, we need to transfer our linear space to one we are familiar with. Look at the following:
$$\begin{pmatrix} v_1 & \cdots & v_m \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} = 0 \iff \begin{pmatrix} e_1 & \cdots & e_n \end{pmatrix} P \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} = 0 \iff P\begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$
The first equivalence is obtained by substitution, the second by the linear independence of the basis. This formal computation shows that a linear relation among $(v_1\ v_2\ \cdots\ v_m)$ is also a linear relation among the columns of P, which are the coordinates of $(v_1\ v_2\ \cdots\ v_m)$.
Thus, the study of linear independence of vectors is equivalent to the study of linear independence of their coordinates.
But even for coordinates, it is hard to decide linear independence in our heads; look at the following example.
Example 142. Let the linear space $V = M_{5\times 1}(\mathbb{R})$. Consider the following column matrices:
$$\begin{pmatrix} 10 \\ 0 \\ 2 \\ 3 \\ 4 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 2 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 5 \\ 0 \\ 0 \\ 1 \\ 5 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 0 \\ 2 \\ 1 \end{pmatrix}$$
Are they linearly independent? If not, can you get rid of some of them so that the rest become linearly independent?
As you can see, this takes a little bit of time to work out. But at least we know we shouldn't get rid of the second one: the second coordinate of the second vector is 1, while that of the others is 0. So getting rid of the second one would result in a subspace in which the second coordinate is always 0. That subspace is definitely smaller than the span of these four vectors, so the second vector is not redundant; taking it away would make the subspace collapse.
With the above discussion, you may find it is easier to tell which vector is indispensable when it has a non-zero coordinate entry while the other vectors are 0 in the corresponding position. This suggests finding a way to simplify general vectors into vectors of this kind while keeping their linear relations.
Proposition 73. Row transformations do not change the linear relations among column matrices.
We have several ways to understand this fact.
Matrix interpretation (strict language): Put the column matrices into a matrix P; a linear relation among these columns is simply a column $(a_1, \dots, a_n)^T$ such that
$$P\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Doing a row transformation is nothing more than multiplying on the left by an invertible matrix Q; if we do that, we have
$$QP\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = Q\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
So a linear relation among the columns of P is a linear relation among the columns of QP, and a linear relation of QP is a linear relation of $Q^{-1}QP = P$.
Direct understanding (more intuitive): If we have a linear relation among the columns, that means
$$\begin{pmatrix} c_{11} \\ c_{12} \\ \vdots \\ c_{1n} \end{pmatrix} \times a_1 + \cdots + \begin{pmatrix} c_{m1} \\ c_{m2} \\ \vdots \\ c_{mn} \end{pmatrix} \times a_m = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and this is simply n equations between numbers, one for each corresponding entry:
$$c_{1i}a_1 + \cdots + c_{mi}a_m = 0.$$
Clearly, a row transformation is simply an adding of equations, and each step is invertible, so it does not change the linear relation.
No matter which way you choose to understand this fact, remember that we can reduce a matrix to its reduced row echelon form, and the question is now simply which columns of the reduced echelon form we should take away.
We copy the definition of reduced row echelon form here to review:
Definition 53. An m × n matrix is called a reduced row echelon form if it satisfies:
(1) The first non-zero entry of every row vector is 1; we call this the leading 1 (or pivot).
(2) If a row contains a leading 1, then each row below it contains a leading 1 further to the right.
(3) If a column contains a leading 1, then all other entries in that column are 0.
Example 143. The following are reduced row echelon forms:
(1)
$$\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 3 \end{pmatrix}$$
(2)
$$\begin{pmatrix} 1 & 6 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
(3)
$$\begin{pmatrix} 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
Can you figure out which columns of a reduced row echelon form are redundant? Definitely easy: if a column does not contain a leading 1, then it is a linear combination of the columns that do contain leading 1s. Every column without a leading 1 is a redundant vector.
Now we are left with the columns containing leading 1s; are they linearly independent? They definitely are, because the leading 1s of different columns do not appear in the same location, so each column has a nonzero entry where every other column has 0 in the corresponding position. In this way, getting rid of any of these vectors would make the space collapse, and therefore they must be linearly independent.
Now we summarize the method of getting rid of redundant vectors.
The method of getting rid of redundant vectors:
Suppose we have a bunch of vectors $(v_1\ v_2\ \cdots\ v_m)$ in the space V.
Step 1: Choose a basis of the space V: $(e_1\ e_2\ \cdots\ e_n)$.
Step 2: Write down the coordinates of each vector: $(v_1\ v_2\ \cdots\ v_m) = (e_1\ e_2\ \cdots\ e_n)P$.
Step 3: Transform the coordinate matrix P to reduced row echelon form.
Step 4: The columns with leading 1s correspond to the useful vectors; the others are redundant.
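The four steps can be packaged into a small routine. This is a hedged sketch (the helper name and the sample vectors are ours, not from the notes): it keeps exactly the columns that carry leading 1s in the reduced row echelon form.

```python
import sympy as sp

def remove_redundant(columns):
    """Keep the vectors whose positions are pivot columns of the rref."""
    M = sp.Matrix.hstack(*[sp.Matrix(c) for c in columns])
    _, pivots = M.rref()
    return [columns[i] for i in pivots]

# Hypothetical example: the third vector is v1 + v2, hence redundant.
vs = [[1, 0, 2], [0, 1, 1], [1, 1, 3]]
print(remove_redundant(vs))  # [[1, 0, 2], [0, 1, 1]]
```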
Example 144. Suppose there are 3 banks: bank A, bank B, and bank C, and there are 3 products. If you buy the first product, 10% of your money goes to bank A and 90% goes to bank B. If you buy the second product, 50% of your money goes to bank B and 50% goes to bank C. If you buy the third product, 5% of your money goes to bank A, 70% goes to bank B, and 25% goes to bank C. As a manager, you want to invest your money in these 3 products. But actually, considering only 2 of them is enough. Why?
The simplest reason: the third product seems to be a linear combination of the first two. If buying products 1 and 2 together is the same as buying product 3 twice, why waste time considering product 3? The fewer factors we consider, the easier life is.
Now let's do this by a rigorous process.
Step 1: Choose a basis. Let
$$\begin{cases} e_1 = \text{dollar that is going to bank A} \\ e_2 = \text{dollar that is going to bank B} \\ e_3 = \text{dollar that is going to bank C} \end{cases}$$
This is a basis of the vector space of distributions of dollars. Note that each vector here is a dollar itself, not a variable representing an amount of dollars. The difference between the dollars $e_1$, $e_2$, and $e_3$ is the dream bank they are willing to go to. (Don't keep this kind of dollar with you; it might jump out of your pocket and run to bank A.)
Step 2: Write down the coordinates of the vectors. Suppose $v_1$ = product 1, $v_2$ = product 2, $v_3$ = product 3. So we have
$$\begin{cases} v_1 = 0.1e_1 + 0.9e_2 \\ v_2 = 0.5e_2 + 0.5e_3 \\ v_3 = 0.05e_1 + 0.7e_2 + 0.25e_3 \end{cases}$$
We put the coordinates into a matrix:
$$(v_1\ v_2\ v_3) = (e_1\ e_2\ e_3)\begin{pmatrix} 0.1 & 0 & 0.05 \\ 0.9 & 0.5 & 0.7 \\ 0 & 0.5 & 0.25 \end{pmatrix}$$
Step 3: Now do row transformations on the matrix:
$$\begin{pmatrix} 0.1 & 0 & 0.05 \\ 0.9 & 0.5 & 0.7 \\ 0 & 0.5 & 0.25 \end{pmatrix} \xrightarrow{r_2 - 9 \times r_1} \begin{pmatrix} 0.1 & 0 & 0.05 \\ 0 & 0.5 & 0.25 \\ 0 & 0.5 & 0.25 \end{pmatrix} \xrightarrow{r_3 - r_2} \begin{pmatrix} 0.1 & 0 & 0.05 \\ 0 & 0.5 & 0.25 \\ 0 & 0 & 0 \end{pmatrix} \xrightarrow[2 \times r_2]{10 \times r_1} \begin{pmatrix} 1 & 0 & 0.5 \\ 0 & 1 & 0.5 \\ 0 & 0 & 0 \end{pmatrix}$$
Because the leading 1s are in the 1st and 2nd columns, the third column is redundant; correspondingly, $v_3$ is redundant.
Step 4: Conclusion.
So $v_3$ is redundant; after getting rid of $v_3$, the remaining $v_1, v_2$ are linearly independent.
That's why we only have to consider the first and second products.
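The conclusion of Example 144 can be double-checked numerically; the code below verifies that product 3 is the average of products 1 and 2, so the coordinate matrix has rank 2:

```python
import numpy as np

v1 = np.array([0.10, 0.90, 0.00])  # product 1
v2 = np.array([0.00, 0.50, 0.50])  # product 2
v3 = np.array([0.05, 0.70, 0.25])  # product 3

# v3 is the average of v1 and v2, so buying products 1 and 2 together
# is the same as buying product 3 twice.
print(np.allclose(0.5 * v1 + 0.5 * v2, v3))                  # True
print(np.linalg.matrix_rank(np.column_stack([v1, v2, v3])))  # 2
```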
Example 145. Suppose $V = M_{5\times 1}(\mathbb{R})$. What is the dimension of the subspace W ⊂ V spanned by
$$\begin{pmatrix} 1 \\ 3 \\ 2 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 6 \\ 1 \\ 9 \end{pmatrix}, \begin{pmatrix} 3 \\ 7 \\ 10 \\ 1 \\ 13 \end{pmatrix}?$$
Simplify $\mathrm{span}\left\{(1,3,2,0,2)^T,\ (1,1,6,1,9)^T,\ (3,7,10,1,13)^T\right\}$.
In this example we already have the natural basis
$$\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},$$
so we do this directly: put the coordinates into the matrix
$$\begin{pmatrix} 1 & 1 & 3 \\ 3 & 1 & 7 \\ 2 & 6 & 10 \\ 0 & 1 & 1 \\ 2 & 9 & 13 \end{pmatrix}$$
The reduced row echelon form of this matrix is
$$\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
Remark: This not only tells us that the first and second vectors already span the subspace; it also tells us that the third is twice the first plus the second.
Now we know that the first two vectors are linearly independent and span the whole subspace. So they form a basis, and the subspace is 2-dimensional. And
$$\mathrm{span}\left\{\begin{pmatrix} 1 \\ 3 \\ 2 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 6 \\ 1 \\ 9 \end{pmatrix}, \begin{pmatrix} 3 \\ 7 \\ 10 \\ 1 \\ 13 \end{pmatrix}\right\} = \mathrm{span}\left\{\begin{pmatrix} 1 \\ 3 \\ 2 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 6 \\ 1 \\ 9 \end{pmatrix}\right\}$$
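Example 145 can be reproduced with SymPy's rref; the pivot positions give the basis columns, and the last column of the rref displays the relation v3 = 2v1 + v2:

```python
import sympy as sp

# Coordinates of the three vectors of Example 145, as columns.
P = sp.Matrix([[1, 1, 3],
               [3, 1, 7],
               [2, 6, 10],
               [0, 1, 1],
               [2, 9, 13]])

R, pivots = P.rref()
print(pivots)     # (0, 1): the first two vectors form a basis of the span
print(R[:, 2].T)  # Matrix([[2, 1, 0, 0, 0]]): v3 = 2*v1 + v2
```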
3.3. Linear Maps and Linear Transformations. .
Introduction: Now we are heading to the most important part of linear algebra. Previously we talked about the notational meaning of matrices, to make our language strict. But I should say the geometrical meaning of a matrix is more important, more intuitive, and should be the soul of linear algebra. You can use a matrix to represent a linear transformation, and the matrix product is just the composition of two transformations. Sometimes a linear transformation loses data; then we use the kernel to measure to what extent we lose information. We also want to know which values can be obtained by a linear transformation, so we talk about the image: the set of all possible values of the linear transformation. Now we start.
Geometrically speaking, a linear transformation is a transformation that preserves parallel lines.
Given that the transformation preserves parallel lines, can we completely understand where every point maps to?
Of course: because parallelograms are preserved, the parallelogram law still holds for the image vectors. If we know where two vectors go, we can predict where their sum goes, and likewise for scalar multiples.
Practically speaking, a linear transformation is a process of encoding data that preserves linear combinations. (What do I mean by that? I'll show an example.)
Example 146. In a game room we have two kinds of game coins, namely gold coins and silver coins. Every gold one is worth 2 dollars, and every silver one is worth 1 dollar.
Then the price of the game coins is a linear transformation. How to understand that?
Think of the words "the price of" as an action on the coins that you are going to buy, like
The price of (2 gold + 1 silver) = 5 dollars
The price of (5 gold + 2 silver) = 12 dollars
Now the game coins are put into bags. Suppose you don't know how many coins are in each bag or what types they are, but your friend knows and only tells you the price. He says the coins in the red bag are worth 6 dollars, and the coins in the dark bag are worth 5 dollars.
Problem 6. What is the price of the coins if you put the coins of the two bags together?
By common sense, we don't have to pull all the coins out and calculate the price one by one again; all we have to do is sum up 5 and 6, and the total is 11 dollars.
Problem 7. What is the price of the coins of 3 dark bags and 2 red bags?
You also don't need to pull them out and add one by one; simply compute 5 dollars × 3 + 6 dollars × 2 = 27 dollars.
Without knowing the coins, simply doing the combination gives the same information as pulling out all the coins and summing up. This kind of encoded data is the typical example of a linear map, and this example is exactly what I mean by preserving linear combinations.
Now let's see an example of a non-linear map.
Example 147. Diamonds are embedded on rings, and the price of a ring is proportional to the square of the number of diamonds on it; for example, we have the following prices:
$$\begin{cases} \text{Ring with 1 diamond} & \$100 \\ \text{Ring with 2 diamonds} & \$400 \\ \text{Ring with 3 diamonds} & \$900 \end{cases}$$
Now suppose I have 2 rings: the left one has 2 diamonds and is worth \$400, and the right one also has 2 diamonds and is worth \$400. What is the price of the ring if I dig out all the diamonds on these two rings and embed them on a new ring?
Obviously, the price is not just 400 + 400 = 800; instead, the price goes up to 1600. This map, the price of the ring, is not linear in the diamonds.
With this understanding of linear maps, let's define what a linear map is.
Definition 54. Suppose V1, V2 are two linear spaces over F. A map T : V1 −→ V2 is called a linear map if for any vectors v, w ∈ V1 and any scalar λ, we have
(1) T(v + w) = Tv + Tw
(2) T(λv) = λTv
We call the space V1 the Domain Space and the space V2 the Target Space. In the case that the Domain Space and the Target Space are the same, we often call T a Linear Transformation.
Now return to our earlier case.
Example 148. The bags of coins we discussed previously do not actually form a linear space, since the numbers of bags are integers instead of decimals, and negative numbers of bags are not allowed, while in the concept of a linear space we allow negative numbers. But if we do allow decimal bags and negative bags, then taking the price is a linear map, because the price of the sum of two bags is the sum of the prices, and the price of a scalar multiple of a bag is the scalar multiple of the price.
This is an example where the domain space and the target space are not the same. So it is a linear map.
Now look at the most popular linear transformation: rotation.
Example 149. Suppose we have already chosen orthogonal axes (but we haven't defined orthogonal yet), and we select our basis to be the unit arrows on the x-axis and y-axis (the standard orthogonal basis, which we also haven't defined), denoted $e_1$, $e_2$ respectively. Consider the rotation by the angle θ. This is an example where the domain and target are the same, so it is a linear transformation.
Every element in this plane can be expressed as $\begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$. What will this linear transformation do to such a vector?
To understand its behavior on an arbitrary vector, we first test this linear transformation on the basis. Using the trigonometric functions we have:
$$T e_1 = \cos\theta\, e_1 + \sin\theta\, e_2$$
$$T e_2 = -\sin\theta\, e_1 + \cos\theta\, e_2$$
It is clear that rotation preserves parallel lines, so rotation is indeed a linear transformation. With the properties of the transformation we are able to calculate where any point goes.
For example, let's calculate where the vector $e_1 + e_2$ goes:
$$T(e_1 + e_2) = T e_1 + T e_2 = (\cos\theta\, e_1 + \sin\theta\, e_2) + (-\sin\theta\, e_1 + \cos\theta\, e_2) = (\cos\theta - \sin\theta)\, e_1 + (\sin\theta + \cos\theta)\, e_2$$
This is not the only method; remember we can represent things with matrices. To put our work into a matrix, we have:
$$T\begin{pmatrix} e_1 & e_2 \end{pmatrix} = \begin{pmatrix} T e_1 & T e_2 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
So we calculate:
$$T(e_1 + e_2) = T\left[\begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right] = \begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} \cos\theta - \sin\theta \\ \sin\theta + \cos\theta \end{pmatrix}$$
As you can see again, the matrix puts everything together neatly, and it seems a linear transformation really behaves like matrix multiplication: once the basis is chosen and we regard vectors as column matrices, a linear transformation in this world is like a matrix. Let's now define the matrix representation of T.
Definition 55. Suppose T : V −→ W is a linear map, $(e_1\ e_2\ \cdots\ e_n)$ is a basis for V, and $(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)$ is a basis for W. Then there exists a unique matrix P, of size m × n, such that
$$T\begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_m \end{pmatrix} P$$
This matrix P is called the matrix representation of T with respect to the basis $(e_1\ e_2\ \cdots\ e_n)$ of the Domain Space and the basis $(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)$ of the Target Space.
Now let's find matrices with respect to given bases. We introduce several examples, including some practical ones.
Example 150. There are 3 brothers, namely A, B, C. They have some money originally, and suppose there is neither income nor cost. They like to share their money: at the end of every month, each of them splits his money into two equal parts and gives them to the other two. For example, suppose A has 300 dollars in the first month. In the second month, A will give 150 to B and 150 to C.
(1) Suppose in a month A has $1000, B has $500, C has $200. How much money does each of them have the next month?
Answer: Of course we can do it directly. A will receive half of B's money and half of C's, so A will have 350. B will receive half of A's and half of C's, so B will have 600. C will receive half of A's and half of B's, so C will have 750.
But we want a higher-level understanding of this process. View all possible statuses of the money each one has as a linear space; then it is 3-dimensional. At the end of every month the status changes, and this change is linear. So it is the same as saying that every month we apply a linear transformation to the current status, and the output is the status of the next month.
Let V be the linear space of the current account statuses of A, B, C. We choose a basis for V:
e1 = a dollar that belongs to A
e2 = a dollar that belongs to B
e3 = a dollar that belongs to C
And let T be the operator that changes the status to that of the next month. What are T e1, T e2, T e3 respectively?
Because A will split every dollar of his and give it to B and C, a dollar belonging to A now becomes half a dollar belonging to B and half a dollar belonging to C. That is,
$$T e_1 = \tfrac{1}{2} e_2 + \tfrac{1}{2} e_3$$
Writing everything down, we have
$$T e_1 = \tfrac{1}{2} e_2 + \tfrac{1}{2} e_3, \qquad T e_2 = \tfrac{1}{2} e_1 + \tfrac{1}{2} e_3, \qquad T e_3 = \tfrac{1}{2} e_1 + \tfrac{1}{2} e_2$$
Now put our work into a matrix:
$$T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix}$$

Because A has 1000, B has 500, C has 200. This status corresponding to vectors


1000
e1 e2 e3  500 
200
And plug in the matrix representation form of T. the linear transformation act on this would result
matrix multiplication on the coordinates:

T
e1 e2 e3

1000
 500  =
200

e1 e2 e3

1
2
1
2
1
2
1
2
1
2
1
2


1000
  500 
200

e1 e2
=

350
e3  600 
750
= 350e1 + 600e2 + 750e3
That means the status of next month. 350 dollars belongs to A, 600 dollars belongs to B, and 750
dollars belongs to C.


1
1
In this example, the matrix 
basis.
1
2
1
2
2
1
2
2
1
2
 is the matrix representation of T with respect to the
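Example 150 can be checked numerically; this is a small sketch with the transition matrix written out as floats:

```python
import numpy as np

# Matrix representation of T from Example 150: column j records where a
# dollar of brother j goes (half to each of the other two brothers).
M = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

this_month = np.array([1000.0, 500.0, 200.0])   # A, B, C
next_month = M @ this_month                     # coordinates next month
print(next_month)
```

Note that the columns of M each sum to 1, which is why the total amount of money is preserved from month to month.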
Example 151. Suppose V = M2×2(R) is the linear space of all 2 × 2 matrices, and define a map by
$$T(A) = \tfrac{1}{2} A - \tfrac{1}{2} A^{T}$$
(1) Is this map a linear transformation?
Answer: Yes, it is. Let A, B be two matrices and λ, µ two scalars. We find that
$$T(\lambda A + \mu B) = \tfrac{1}{2}(\lambda A + \mu B) - \tfrac{1}{2}(\lambda A + \mu B)^{T} = \tfrac{1}{2}(\lambda A + \mu B) - \tfrac{1}{2}(\lambda A^{T} + \mu B^{T}) = \lambda\left(\tfrac{1}{2} A - \tfrac{1}{2} A^{T}\right) + \mu\left(\tfrac{1}{2} B - \tfrac{1}{2} B^{T}\right) = \lambda T(A) + \mu T(B)$$
So, indeed, this is a linear map.
If you are proficient with matrix notation, you can also check this with matrices. But remember that in matrix notation it is better to represent scalar multiplication by $\lambda I_2$, where the subindex is the number of columns if it is multiplied on the right, and the number of rows if it is multiplied on the left.
Now let's show the neater work. Write $\lambda A + \mu B = \begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} \lambda I_2 \\ \mu I_2 \end{pmatrix}$. Then
$$T\left[\begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} \lambda I_2 \\ \mu I_2 \end{pmatrix}\right] = \frac{1}{2}\begin{pmatrix} A & B \end{pmatrix}\begin{pmatrix} \lambda I_2 \\ \mu I_2 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} A^{T} & B^{T} \end{pmatrix}\begin{pmatrix} \lambda I_2 \\ \mu I_2 \end{pmatrix}$$
$$= \begin{pmatrix} \frac{1}{2}A - \frac{1}{2}A^{T} & \frac{1}{2}B - \frac{1}{2}B^{T} \end{pmatrix}\begin{pmatrix} \lambda I_2 \\ \mu I_2 \end{pmatrix} = \begin{pmatrix} T(A) & T(B) \end{pmatrix}\begin{pmatrix} \lambda I_2 \\ \mu I_2 \end{pmatrix}$$
Whenever the operator T can pass through like this, it is a linear map. Because it is defined on the same space, it is a linear transformation.
(2) Find the matrix representation of this linear map with respect to the basis
$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$
We proceed by applying T to each of the vectors in the basis and then representing the resulting vector in this basis again:
$$T\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
$$T\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
$$T\begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = (-1)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$
$$T\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$
So the first two basis vectors (which are symmetric matrices) map to the zero matrix, the third maps to (−1) times the fourth, and the fourth maps to itself.
So we know that
$$T\begin{pmatrix} B_1 & B_2 & B_3 & B_4 \end{pmatrix} = \begin{pmatrix} B_1 & B_2 & B_3 & B_4 \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix}$$
where $B_1, B_2, B_3, B_4$ denote the four basis matrices above. The right matrix is the matrix representation of the linear transformation T.
As you can see, the expression on the right-hand side makes sense, because a scalar acting on a matrix from the right is just scalar multiplication. So indeed, this matrix is the matrix representation of T.
This might worry you: did we just create a rule that multiplies a 2 × 8 matrix and a 4 × 4 matrix together?!
No, we didn't. As you can see, the objects that the scalars act on are now matrices, so the left factor is actually a 1 × 4 row (whose entries are matrices) instead of a 2 × 8 matrix, and a 1 × 4 row times a 4 × 4 matrix makes sense.
We could stop here. But if you would still like to make this work as block matrices, and you insist on viewing the left-hand side as a 2 × 8 matrix, then remember that scalar multiplication is really multiplication by a scalar matrix; that is,
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \lambda \quad \text{really means} \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$$
So if you want the notation to work as block matrix multiplication, it should really go like
$$\begin{pmatrix} B_1 & B_2 & B_3 & B_4 \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -I_2 & I_2 \end{pmatrix}$$
where every scalar entry p of the 4 × 4 representation matrix has been replaced by the 2 × 2 block $p I_2$ (and every 0 by the 2 × 2 zero block). Now it is 2 × 8 times 8 × 8, and it makes sense.
But in this kind of problem we view matrices as objects, so we still think of the left matrix as 1 × 4 and the right matrix as 4 × 4, and the matrix representation of the linear transformation is of course 4 × 4. We have not created any rule that multiplies a 2 × 8 and a 4 × 4 matrix together.
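A numerical sketch of Example 151 as reconstructed here, assuming the map is the antisymmetrization T(A) = (A − Aᵀ)/2 and using the basis listed above; coordinates are computed by flattening each 2 × 2 matrix to a length-4 vector:

```python
import numpy as np

def T(A):
    # the (assumed) map of Example 151: T(A) = (A - A^T) / 2
    return 0.5 * (A - A.T)

# The basis of M_{2x2}(R) used in the worked example above (an assumption
# of this sketch, matching the reconstruction).
B = [np.array([[1., 1.], [1., 1.]]),
     np.array([[1., 0.], [0., 1.]]),
     np.array([[1., -1.], [1., -1.]]),
     np.array([[0., 1.], [-1., 0.]])]

# Change-of-basis matrix: columns are the flattened basis matrices.
C = np.column_stack([b.flatten() for b in B])

def coords(A):
    # coordinates of a 2x2 matrix with respect to the basis B
    return np.linalg.solve(C, A.flatten())

# Matrix representation of T: column j = coordinates of T(B[j]).
P = np.column_stack([coords(T(b)) for b in B])
print(np.round(P))
```

The computed P reproduces the 4 × 4 representation matrix from the example, with zeros in the first two columns and the −1 and 1 in the last row.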
Example 152. Suppose V = M2×1(R), W = M3×1(R), and suppose we have a linear map T : V −→ W defined by
$$T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad T\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
Find the matrix representation of T with respect to the basis $\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ in the domain space and the basis $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$ in the target space.
Answer: We proceed as follows: we test our linear map on each of the basis vectors:
$$T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times 1 + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times 1 + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \times 0$$
$$T\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \times 0 + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \times 1 + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \times 1$$
Then we put our work into a matrix:
$$T\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix}$$
So the matrix representation of T with respect to these bases is $\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix}$.
Now, what is $T\begin{pmatrix} a \\ b \end{pmatrix}$? In the natural basis, the coordinates of a column matrix are just the matrix itself. In this way, we have
$$T\begin{pmatrix} a \\ b \end{pmatrix} = T\left[\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix}\right] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix}$$
So we see in this example that a linear map from column matrices to column matrices is realized by matrix multiplication, and this matrix is obtained by representing the linear map with respect to those bases.
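A quick numerical sketch of Example 152: applying T to (a, b) = (3, 4) (an arbitrary choice) is just multiplication by the representation matrix:

```python
import numpy as np

# Matrix representation of T from Example 152; columns are the images
# of the standard basis vectors of R^2.
P = np.array([[1, 0],
              [1, 1],
              [0, 1]])

v = np.array([3, 4])   # the vector (a, b) with a = 3, b = 4
Tv = P @ v             # applying T is matrix multiplication on coordinates
print(Tv)
```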
We have already seen that with a basis we can identify a vector with a column vector, and under this identification, linear relations, linear independence, and linear dependence are all the same.
So what does a linear transformation look like under this identification? As you saw in the previous problem, a linear map between column vectors is just multiplication by a matrix. So we wish our abstract linear maps could also be realized as matrix multiplication on the coordinates.
Proposition 74. Suppose V, W are two linear spaces over F, with bases $(e_1\ e_2\ \cdots\ e_n)$ of V and $(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)$ of W. Suppose T : V −→ W is a linear map, and let P be the matrix representing this linear map with respect to these two bases:
$$T\begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_m \end{pmatrix} P$$
Then for any vector with coordinate expression
$$v = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
the coordinate form of Tv is
$$Tv = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_m \end{pmatrix} P \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
Proof. Apply T to the second equation and use the first equation to substitute; we get the third equation.
Although this is a trivial statement, it tells us that if we only look at coordinates, then what the linear map does is just multiplication by the matrix P.
Example 153. Suppose $V = P_x^2 = \{ax^2 + bx + c \mid a, b, c \in \mathbb{R}\}$ is the linear space of polynomials with real coefficients of degree at most 2, and T : V −→ R is the linear map defined by evaluation at x = 1.
(1) With the basis $(x^2\ x\ 1)$ in V and (1) in R, write down the matrix representing the linear map.
Answer: We test it on each basis vector:
$$T(x^2) = 1, \qquad T(x) = 1, \qquad T(1) = 1$$
So we have
$$T\begin{pmatrix} x^2 & x & 1 \end{pmatrix} = (1)\begin{pmatrix} 1 & 1 & 1 \end{pmatrix}$$
So the matrix representation is $\begin{pmatrix} 1 & 1 & 1 \end{pmatrix}$.
(2) Suppose v ∈ V has coordinates $\begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}$ with respect to the basis $(x^2\ x\ 1)$. What is the coordinate of Tv with respect to the basis (1)?
$$\begin{pmatrix} 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix} = 9$$
So the coordinate of Tv with respect to the basis (1) is 9. (Indeed, v = x² + 3x + 5 evaluated at x = 1 gives 1 + 3 + 5 = 9.)
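Example 153 in code: evaluation at x = 1 in the basis (x², x, 1) is multiplication by the row (1 1 1); direct evaluation of the polynomial is included as a cross-check:

```python
import numpy as np

row = np.array([[1, 1, 1]])     # matrix representation of "evaluate at x = 1"
coeffs = np.array([1, 3, 5])    # the polynomial x^2 + 3x + 5

value = int((row @ coeffs)[0])  # evaluation via matrix multiplication

# cross-check by evaluating the polynomial directly
x = 1
direct = coeffs[0] * x**2 + coeffs[1] * x + coeffs[2]
print(value, direct)
```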
Example 154. Previously we saw that with respect to the basis $(e_1\ e_2)$, our rotation has representation $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$. If $v = \begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix}$, what is the coordinate of Tv?
Answer: Just multiply the matrix onto it: the coordinate of Tv with respect to the basis $(e_1\ e_2)$ is going to be
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 3\cos\theta - 4\sin\theta \\ 3\sin\theta + 4\cos\theta \end{pmatrix}$$
In other words,
$$Tv = \begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} 3\cos\theta - 4\sin\theta \\ 3\sin\theta + 4\cos\theta \end{pmatrix}$$
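Example 154 checked at the concrete angle θ = π/2 (a 90-degree rotation, chosen just for the check): the coordinates (3, 4) should go to (−4, 3), and rotation should preserve length:

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([3.0, 4.0])
Tv = R @ v                # coordinates of the rotated vector
print(Tv)
```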
To conclude, once bases are chosen there is an identification of two worlds: on the left-hand side, abstract linear spaces and the linear transformations among them; on the right-hand side, coordinates and matrix multiplication. We summarize our results in the following table.
For spaces with chosen bases we have the following:
vectors ↔ coordinates
a bunch of vectors ↔ coordinate matrix P
a bunch of vectors forms a basis ↔ coordinate matrix P is invertible
a bunch of vectors is linearly independent ↔ rank(P) = number of columns of P
a bunch of vectors spans the whole space ↔ rank(P) = number of rows of P
linear transformation ↔ matrix multiplication
One more word about the last row of the table: the matrix is the one obtained by representing the linear transformation with respect to those bases.
3.4. Attempting to go back - Isomorphism, Kernel and Image. Now we know the concept of linear transformation, but typically the most interesting question is: what is the inverse?
Example 155. Let's keep the previous problem of the 3 brothers. Remember, at the end of each month they split their money and give it to the other two.
Question: Suppose A, B, C have $350, $600, $750 respectively this month. How much did each of them have the previous month?
We can figure this out by elementary methods. The money A has now was given by B and C, and it represents half of the money B and C had, so B and C together had $700 last month. The process of distributing does not change the total amount of money, which is $1700, so A had $1000 last month. By the same reasoning, B had $500 and C had $200.
Now let's do linear algebra. Previously we found that
$$T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix}$$
Now suppose $T^{-1}$ is the inverse change; then $T^{-1}T$ just goes back and forth, so it changes nothing. We apply (left multiply) $T^{-1}$ on both sides:
$$\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = T^{-1}\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix}$$
and we'd like to move the matrix to the other side; right multiplying by its inverse, we have
$$\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix}^{-1} = T^{-1}\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}$$
That means the matrix representation of the linear transformation $T^{-1}$ is the inverse of the matrix. Let's calculate this matrix:
$$\begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix}^{-1} = \begin{pmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}$$
That means
$$T^{-1}\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}$$
Now the current status is
$$\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 350 \\ 600 \\ 750 \end{pmatrix}$$
so applying $T^{-1}$ to it results in matrix multiplication on the coordinates. The previous month, the status was
$$\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}\begin{pmatrix} 350 \\ 600 \\ 750 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 1000 \\ 500 \\ 200 \end{pmatrix}$$
So in the previous month, A had 1000 dollars, B had 500 dollars, and C had 200 dollars.
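The inversion step can be sketched numerically; `np.linalg.inv` recovers the matrix of T⁻¹, and applying it to this month's coordinates gives back last month's:

```python
import numpy as np

M = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

M_inv = np.linalg.inv(M)            # matrix representation of T^{-1}
now = np.array([350.0, 600.0, 750.0])
last_month = M_inv @ now            # undo one month of redistribution
print(last_month)
```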
Now the question comes: can every linear transformation go back? Think about the question we discussed previously, the coins of the game room: a gold coin is worth $2 and a silver coin is worth $1, the coins are packed in bags, and you know the price of the coins in each bag.
Let's see the linear map case first. Suppose we map X to Y; now we want to go back from Y to X.
Problem 8. Suppose you know the total price of the coins in the red bag is $7. Do you know how many gold and how many silver coins there are?
This is a going-back question: we want to know the original numbers of gold and silver coins solely from the total price. You will see the answer is not unique. You could have the following combinations:
7 dollars = 3 gold coins + 1 silver coin
          = 2 gold coins + 3 silver coins
          = 1 gold coin + 5 silver coins
          = 0 gold coins + 7 silver coins
If you know one of the answers, how do you get the others? As you can see, each time you subtract 1 gold coin, you supplement 2 silver coins. So we can describe this process as adding "2 silver − 1 gold". This amount,
2 silver − 1 gold,
does not change the total price, but it changes the vector in your domain. Thus we see that the combinations whose price is 0 are very important: they measure the difference between your prediction and the real fact. This is the first factor preventing you from recovering the original value: the possible combinations that result in 0. Now let's define the kernel of a linear map.
Definition 56. Suppose V, W are linear spaces over F and T : V −→ W is a linear map. Then the kernel of T is defined to be the set
$$\ker(T) = \{v \in V \mid Tv = 0\}$$
It is a linear subspace of the domain.
As we can see, one reason we fail to go back is the kernel: when the kernel is not the zero space, we cannot tell exactly which vector was the departure point.
Sometimes the case is even worse: if no vector maps to a given point, going back is outright impossible. In the previous setting, the question is like:
Problem 9. For coins in a bag, what prices can result?
We know every price can result, if we allow the amounts of these coins to be decimal or negative numbers.
Now let's define the image of a linear map. Only for vectors in the image can you go back.
Definition 57. Suppose V, W are linear spaces over F and T : V −→ W is a linear map. Then the image of T is defined to be the set
$$\mathrm{Im}(T) = \{w \in W \mid w = Tv \text{ for some } v\}$$
It is a linear subspace of the target.
The kernel is the subspace of the domain defined by the restriction Tv = 0. The method to find the kernel is to remove the restriction by parametrization: write every vector in the kernel as a linear combination governed by these parameters, and we obtain a basis of the subspace.
The image is the subspace of the target consisting of all vectors that can be mapped to from the domain. Because every vector in the domain is a linear combination of the basis, every vector must map to a linear combination of the images of the basis vectors. This tells us the image is the subspace spanned by the images of the basis. To find the image of a linear map, we simply get rid of redundant vectors.
Example 156. Suppose $V_1, V_2$ are linear spaces over R, and $T : V_1 \longrightarrow V_2$ is a linear map. $(e_1\ e_2\ e_3)$ is a basis for $V_1$, and $(\varepsilon_1\ \varepsilon_2)$ is a basis for $V_2$. Suppose we know
$$T e_1 = \varepsilon_1 + 3\varepsilon_2, \qquad T e_2 = 2\varepsilon_1 + \varepsilon_2, \qquad T e_3 = \varepsilon_1 - 5\varepsilon_2$$
(1) Find the matrix representation of T with respect to the bases $(e_1\ e_2\ e_3)$ and $(\varepsilon_1\ \varepsilon_2)$.
$$T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix}\begin{pmatrix} 1 & 2 & 1 \\ 3 & 1 & -5 \end{pmatrix}$$
(2) Find the kernel of T.
The kernel of T asks for $v = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ such that
$$T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0$$
Plugging in the matrix representation, this asks for
$$\begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix}\begin{pmatrix} 1 & 2 & 1 \\ 3 & 1 & -5 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
Now we are ready to solve for x, y, z. Note that we can cancel linearly independent vectors on both sides, so we have
$$\begin{pmatrix} 1 & 2 & 1 \\ 3 & 1 & -5 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
Reducing to reduced row echelon form, we have
$$\begin{pmatrix} 1 & 2 & 1 \\ 3 & 1 & -5 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = P\begin{pmatrix} 1 & 0 & -\frac{11}{5} \\ 0 & 1 & \frac{8}{5} \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}$$
and we get the solution:
$$x - \tfrac{11}{5} z = 0, \qquad y + \tfrac{8}{5} z = 0$$
Now assign the free variable z the parameter 5t; we have
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 11t \\ -8t \\ 5t \end{pmatrix} = \begin{pmatrix} 11 \\ -8 \\ 5 \end{pmatrix} t$$
Thus any $v \in \ker(T)$ has the form
$$v = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 11 \\ -8 \\ 5 \end{pmatrix} t = t(11 e_1 - 8 e_2 + 5 e_3)$$
So an element of the kernel is a linear combination of $11 e_1 - 8 e_2 + 5 e_3$. Thus,
$$\ker(T) = \mathrm{span}\{11 e_1 - 8 e_2 + 5 e_3\}$$
(3) Find the image of T.
The image asks for all the possible values $w = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix}$ that can be obtained in the form
$$w = Tv = T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x\, T e_1 + y\, T e_2 + z\, T e_3$$
Thus $\mathrm{Im}(T) = \mathrm{span}\{T e_1, T e_2, T e_3\}$. But now we have to get rid of redundant vectors. As
$$\begin{pmatrix} T e_1 & T e_2 & T e_3 \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix}\begin{pmatrix} 1 & 2 & 1 \\ 3 & 1 & -5 \end{pmatrix}$$
we look at the coordinate matrix and do row transformations P to reduce to reduced row echelon form:
$$\begin{pmatrix} T e_1 & T e_2 & T e_3 \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix} P \begin{pmatrix} 1 & 0 & -\frac{11}{5} \\ 0 & 1 & \frac{8}{5} \end{pmatrix}$$
As you can see, in the reduced row echelon form the pivot columns are the first and second columns, so we get rid of $T e_3$. Thus
$$\mathrm{span}\{T e_1, T e_2, T e_3\} = \mathrm{span}\{T e_1, T e_2\} = \mathrm{span}\{\varepsilon_1 + 3\varepsilon_2,\ 2\varepsilon_1 + \varepsilon_2\}$$
$$\mathrm{Im}(T) = \mathrm{span}\{\varepsilon_1 + 3\varepsilon_2,\ 2\varepsilon_1 + \varepsilon_2\}$$
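A numerical check of Example 156 on the coordinate matrix: the claimed kernel generator is annihilated by A, and the rank confirms the image is 2-dimensional:

```python
import numpy as np

A = np.array([[1, 2, 1],
              [3, 1, -5]])         # coordinate matrix of T

k = np.array([11, -8, 5])          # coordinates of the kernel generator
print(A @ k)                       # should be the zero vector

rank = np.linalg.matrix_rank(A)    # dimension of the image
print(rank)
```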
We now illustrate an important formula in linear algebra.
Theorem 10. Suppose T : V −→ W is a linear map; then the following formula holds:
$$\dim(\ker(T)) + \dim(\mathrm{Im}(T)) = \dim(V)$$
Please note that the right-hand side of the equation is the dimension of the domain.
Proof. Suppose V has basis $(e_1\ e_2\ \cdots\ e_n)$ and W has basis $(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_m)$; then find the matrix representation
$$T\begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_m \end{pmatrix} A$$
where A is an m × n matrix.
The linear relations of $(T e_1, T e_2, \ldots, T e_n)$ are the same as the linear relations of the columns of A. By getting rid of redundant vectors among $(T e_1, \ldots, T e_n)$, we find a basis of the image. That means that after getting rid of redundant vectors, the number of linearly independent columns of A is $\dim(\mathrm{Im}\,T)$, and they correspond to the columns with leading 1's.
When we try to find the kernel, the last step is always to find the solutions of Ax = 0. There, the number of elementary solutions is the number of free variables; the elementary solutions are linearly independent, and left multiplying by our basis gives a basis of the kernel. So the number of free variables is $\dim(\ker T)$. Remember that the free variables correspond to the columns that do not have a leading 1.
Summing them together, $\dim(\mathrm{Im}\,T) + \dim(\ker T)$ equals the number of columns of A, which is n, the dimension of the domain.
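Theorem 10 checked on the matrix of Example 156: the rank counts the pivot columns and the kernel dimension counts the free variables, and together they give the number of columns. The kernel dimension below is read off from the singular values, one standard numerical way to count it:

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [3.0, 1.0, -5.0]])

n = A.shape[1]                           # dimension of the domain
rank = np.linalg.matrix_rank(A)          # dim Im(T)

s = np.linalg.svd(A, compute_uv=False)   # singular values of A
dim_ker = n - int((s > 1e-10).sum())     # dim ker(T)
print(rank, dim_ker, n)
```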
The above also shows us the method of finding the image. As you know, our motivation for kernel and image was trying to go back. To conclude, we define the two types of good maps.
Suppose T : V −→ W is a linear map.
Definition 58.
(1) T is called injective if for any $v_1 \neq v_2 \in V$, we have $T v_1 \neq T v_2$.
(2) T is called surjective if for any $w \in W$, we can find some $v \in V$ such that $w = Tv$.
To see it another way: for an injective map, the preimage of every vector is unique, but does not necessarily exist. For a surjective map, the preimage of every vector exists, but is not necessarily unique. For a bijective map, the preimage exists and is unique. Thus, only a bijective map is invertible. In the language of kernel and image, we conclude the following proposition.
Proposition 75. Suppose T : V −→ W is a linear map; then
(1) T is injective if and only if $\ker(T) = \{0\}$
(2) T is surjective if and only if $\mathrm{Im}(T) = W$
(3) T is bijective if and only if $\ker(T) = \{0\}$ and $\mathrm{Im}(T) = W$.
Some more explanation: injectivity preserves linear independence, and surjectivity preserves the span.
Proposition 76. Suppose T : V −→ W is a linear map; then
(1) If T is injective, then $\dim(V) \leq \dim(W)$
(2) If T is surjective, then $\dim(V) \geq \dim(W)$
(3) If T is bijective, then $\dim(V) = \dim(W)$
Definition 59. Two linear spaces V, W are called isomorphic if there exists a bijective linear map between them; such a map is called an isomorphism.
Proposition 77. Two linear spaces are isomorphic if and only if their dimensions are equal.
More explanation goes here
3.4.1. Composition, Inverse and Restriction. .
Composition
If we think of the linear maps T : V −→ W and R : W −→ U as two machines — T accepts vectors of V as input and outputs vectors in W; R accepts vectors in W as input and outputs vectors in U — then we can combine the two machines into one, simply by connecting the output pipe of T to the input pipe of R. It becomes a machine that receives input from V and gives output in U. To see the whole process: we give some vector v as input to the first machine; it gives some result w in the space W; now w immediately becomes the input of R; R acts on it, and the final output is a vector of U.
We now see a set-theoretic explanation of composition.
Nice picture goes here
Proposition 78. The matrix appearing in the matrix representation of the composition of two linear maps is the product of these two matrices.
More explanation goes here
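Proposition 78 in numbers, with two small hypothetical matrices: applying T and then R agrees with applying the single product matrix RT:

```python
import numpy as np

# T : R^2 -> R^3 and R : R^3 -> R^2, given by example matrices.
T = np.array([[1, 0],
              [1, 1],
              [0, 1]])
R = np.array([[1, 2, 0],
              [0, 1, -1]])

v = np.array([5, -2])
step_by_step = R @ (T @ v)   # feed v through T, then through R
composed = (R @ T) @ v       # one multiplication by the product matrix
print(step_by_step, composed)
```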
The inverse of a linear map
An invertible linear map can only exist between spaces of the same dimension, and thus the matrix appearing in its matrix representation is a square matrix.
Proposition 79. Suppose T : V −→ W is an isomorphism, V has basis $(e_1\ e_2\ \cdots\ e_n)$, and W has basis $(\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_n)$. Suppose
$$T\begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_n \end{pmatrix} P$$
Then
$$T^{-1}\begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \cdots & \varepsilon_n \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} P^{-1}$$
Example: plug in x = t + 1 and t = x − 1 as an example of inverse maps; find their matrices, and then see that these two matrices are inverse to each other.
The restriction of a linear map
Here goes the picture example of restriction on sets
Definition 60. Suppose T : V −→ W is a linear map and U is a subspace of V. Then T can be defined on U in the same way, denoted
$$T|_U : U \longrightarrow W$$
We call this construction restricting the linear map T to the subspace U of the domain.
For a linear map, restriction to a subspace of the domain comes for free: you can restrict a linear map to any subspace you like. Simply ignore the other elements and don't care where they go; you only keep track of your own elements.
An example of calculation for the restriction goes here
Whenever we have a linear map, we always consider two things: what is the kernel, and what is the image?
The kernel of $T|_U$ consists of the elements of U killed by $T|_U$. Because $T|_U$ maps by the same law as T, being killed by $T|_U$ is the same as being killed by T. Thus the set of all elements of U killed by T is simply $\ker(T) \cap U$.
Proposition 80. $\ker(T|_U) = \ker(T) \cap U$
So how about the image of $T|_U$? We cannot give a precise relation. We only know that the image of $T|_U$ is a subset of the image of T: because we no longer care about the other elements, our members might not fill all the possible positions of $\mathrm{Im}(T)$. So we know:
Proposition 81. $\mathrm{Im}(T|_U) \subset \mathrm{Im}(T)$
3.5. Invariant subspace for linear transformation. Power-Rank. .
The difference between map and transformation. When we talk about a linear map, we treat the two spaces separately; that is why we could discuss restriction in the domain or in the target. As you saw, restriction in the domain comes for free, restriction in the target needs some discussion, and if you want to restrict simultaneously to a subset of the domain and a subset of the target, you should first restrict the domain and then check whether your target works.
Now we discuss linear transformations. Because the domain and target are the same space, whenever we choose a subspace of the domain, this subspace is also a subspace of the target. And if we restrict only the domain or only the target, the result is no longer a linear transformation but merely a linear map. So, in order to make the restriction of a linear transformation again a linear transformation, we need to restrict simultaneously to the same subspace in the domain and the target. This discussion leads to the following definition of invariant subspace.
Before defining invariant subspaces, let's look at plain sets first: to which kinds of subsets can we restrict a transformation? Clearly, we need the transformation not to map things out of our subset. Look at the following picture.
We call a subset to which we can restrict our transformation an invariant subset. Would each of those be a set that we can restrict our transformation to?
In the first picture, yes. Every element in the blue circle stays inside; although they all collapse to the center, as long as nothing escapes, we call it invariant. Invariance only means not going out; collapsing is also invariance.
In the second picture, yes. Everyone in the blue circle stays: some might gradually collapse to some point, and some keep playing in a circle — they don't collapse, they just rotate and rotate all the way. This is an invariant subset.
In the third picture, it is not an invariant subset, because some element that was originally in the blue circle goes out after one application of the transformation.
To sum up, an invariant subset is a subset that keeps all its elements inside, no matter how many times you apply the transformation. In this case we can restrict our transformation to that subset; otherwise, we cannot.
For our linear transformations, the idea is the same.
Definition 61. Let T : V −→ V be a linear transformation on V. Then a subspace W ⊂ V is called an invariant subspace of T if
$$T(W) \subset W$$
Definition 62. Let T : V −→ V be a linear transformation and W an invariant subspace. Then, as a linear map, we can restrict T to W both in the domain and in the target. We can do that because $T(W) \subset W$. After this restriction, it becomes a linear transformation on the subspace. This is called the restriction of the linear transformation to the invariant subspace W.
Remark 5. Please distinguish the restriction of a linear map from that of a linear transformation. For a linear map, we discuss the domain and target separately: we can choose a set in the domain and another set in the target, and the restricted linear map is still a linear map. But in the linear transformation case, the target and domain are identified, so choosing a subset of the domain is the same as choosing a subset of the target. So a linear transformation can only be restricted to an invariant subspace, to make sure it is still a linear transformation. In the case that you just want to restrict the domain, you should forget about the linear transformation and treat it as a linear map — and if you do that, the result is just a linear map.
Proposition 82. Suppose T : V −→ V is a linear transformation; then ker T and Im T are invariant subspaces.
Proof. Easy.
Proposition 83. Suppose T : V −→ V and R : V −→ V are two linear maps on the same space satisfying TR = RT; then ker R and Im R are invariant subspaces for T.
Proof. Easy. Just check the definitions of kernel and image: show that any element in the kernel of R is mapped by T into the kernel of R, and every element in the image of R is mapped by T into the image of R.
Examples of invariant subspaces and the restriction of a linear transformation:
Example 157. Suppose V is a 3-dimensional space over Q with basis $(e_1\ e_2\ e_3)$, and T : V −→ V is a linear transformation such that
$$T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & -1 & -2 \\ 4 & 3 & 2 \\ -1 & 0 & 2 \end{pmatrix}$$
Determine whether the subspace $W = \mathrm{span}(e_3 - 2e_2,\ e_3 - e_1)$ is an invariant subspace; if it is, find the matrix representation of $T|_W$ with respect to this basis.
Answer: Denote $(\varepsilon_1\ \varepsilon_2) = (e_3 - 2e_2\ \ e_3 - e_1)$. To show W is an invariant subspace, we only need $T(\varepsilon_1)$ and $T(\varepsilon_2)$ to lie in W.
$$T(\varepsilon_1) = T(e_3 - 2e_2) = (-2e_1 + 2e_2 + 2e_3) - 2(3e_2 - e_1) = -4e_2 + 2e_3$$
$$T(\varepsilon_2) = T(e_3 - e_1) = T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & -1 & -2 \\ 4 & 3 & 2 \\ -1 & 0 & 2 \end{pmatrix}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} -2 \\ -2 \\ 3 \end{pmatrix} = -2e_1 - 2e_2 + 3e_3$$
Now, in order to show that $-4e_2 + 2e_3$ and $-2e_1 - 2e_2 + 3e_3$ are in the subspace spanned by $(e_3 - 2e_2,\ e_3 - e_1)$, we make use of the reduced row echelon form:
$$\begin{pmatrix} \varepsilon_1 & \varepsilon_2 & T\varepsilon_1 & T\varepsilon_2 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & -1 & 0 & -2 \\ -2 & 0 & -4 & -2 \\ 1 & 1 & 2 & 3 \end{pmatrix}$$
By applying the reduction process, we find
$$\begin{pmatrix} 0 & -1 & 0 & -2 \\ -2 & 0 & -4 & -2 \\ 1 & 1 & 2 & 3 \end{pmatrix} = P\begin{pmatrix} 1 & 0 & 2 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
for some invertible P. This exactly means $T\varepsilon_1 = 2\varepsilon_1$ and $T\varepsilon_2 = \varepsilon_1 + 2\varepsilon_2$, so W is indeed an invariant subspace.
Thus, for $T|_W$, the domain and target are both W, and with the basis $(\varepsilon_1\ \varepsilon_2)$, the matrix representation of the linear transformation $T|_W$ is
$$T|_W\begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}$$
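Example 157 can be verified numerically on the coordinates: multiplying the coordinate columns of ε1 and ε2 by the matrix of T reproduces 2ε1 and ε1 + 2ε2:

```python
import numpy as np

A = np.array([[0, -1, -2],
              [4, 3, 2],
              [-1, 0, 2]])        # matrix of T in the basis (e1, e2, e3)

eps1 = np.array([0, -2, 1])       # coordinates of e3 - 2*e2
eps2 = np.array([-1, 0, 1])       # coordinates of e3 - e1

t1 = A @ eps1                     # coordinates of T(eps1)
t2 = A @ eps2                     # coordinates of T(eps2)
print(t1, t2)
```

Since T(eps1) and T(eps2) are combinations of eps1 and eps2 again, the subspace they span is invariant.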
Image of high power
Before discussing this part, let’s see some example in set theory.
Example in set theory
Now we consider what will happen for the high power of the linear transformation.
Let’s still use set theory picture to consider, see the following picture.
121
If we keep track of all elements and act by T once, the whole set collapses down to a smaller subset, the image. If we act twice, this subset may become even smaller. If we act several times, how small does it collapse? Is there any way to describe this process?
Now let's think about why the set collapses under the transformation. It is because something in the center is absorbing points. Look at the following picture:
As you can see, points in ker T go to the black hole after 1 step. Points in ker T² go to ker T and then to the black hole; it takes them two steps. Points in ker T³ take 3 steps to collapse. And this absorption by the kernel is exactly the reason the image keeps shrinking.
Now we want to measure this collapse. We put the two graphs together for you to compare:
The total number of elements in the whole set is 19. After 1 application of the transformation there are 12 elements; after 2 applications there are only 6. Look at 19 −→ 12 −→ 6. The first time we lost 7 elements, the second time we lost 6; it seems that the number of elements we lose is decreasing. How do we measure what we lose?
As you can see, what we lose is reflected in ker T, because it is absorbing points. From the whole set to Im T, the number of points we lose is exactly the number of points absorbed by the kernel. And from Im T to Im T², the number of points we lose is the number of points in ker T ∩ Im T. So ker T ∩ Im T^k measures the points we lose from Im T^k to Im T^{k+1}.
Actually this is just an analogy to help you understand better. In set theory we cannot literally speak of a kernel, because there is nothing like a zero vector in a set. With this picture in mind, let's now discuss the case of a linear transformation.
Power-rank of a linear transformation
Now suppose T : V −→ V is a linear transformation. What can we say about the dimensions of the images and kernels of its high powers?
Proposition 84. Suppose T : V −→ V is a linear transformation. Then we have the following chains of invariant subspaces:
{0} ⊂ ker T ⊂ ker T² ⊂ ker T³ ⊂ · · ·
V ⊃ Im T ⊃ Im T² ⊃ Im T³ ⊃ · · ·
Proof. If x ∈ ker Tⁿ, then Tⁿ(x) = 0, so of course Tⁿ⁺¹(x) = T(0) = 0, hence x ∈ ker Tⁿ⁺¹; thus ker Tⁿ ⊂ ker Tⁿ⁺¹.
For y ∈ Im Tⁿ⁺¹, there exists x such that y = Tⁿ⁺¹x = Tⁿ(T x), so y ∈ Im Tⁿ.
In order to keep track of the map, we consider the following sequence of spaces:
$$V \xrightarrow{\ T\ } \operatorname{Im} T \xrightarrow{\ T|_{\operatorname{Im} T}\ } \operatorname{Im} T^2 \xrightarrow{\ T|_{\operatorname{Im} T^2}\ } \operatorname{Im} T^3 \xrightarrow{\ T|_{\operatorname{Im} T^3}\ } \operatorname{Im} T^4 \longrightarrow \cdots$$
In each step we might lose some dimension, and the dimension we lose is measured by the kernel. The dimension formula tells us
Dimension lost = dim(Domain) − dim(Image) = dim(Ker) = dimension of the part of the domain killed by the transformation
Example 158. Which subspace measures the dimension lost from Im T³ to Im T⁵?
Answer: From Im T³ to Im T⁵ the linear transformation is T², and the dimension lost is measured by the part of Im T³ that is killed by T², so this space is
Im(T³) ∩ ker(T²)
So
dim(Im T³) − dim(Im T⁵) = dim(Im T³ ∩ ker T²)
To say this rigorously, consider the linear transformation
T²|_{Im T³} : Im T³ −→ Im T³
The kernel of this transformation is ker T² ∩ Im T³, its image is Im T⁵, and its domain is Im T³, so by applying the dimension formula we have
dim(ker T²|_{Im T³}) + dim(Im T²|_{Im T³}) = dim(Im T³)
This is equivalent to
dim(ker T² ∩ Im T³) + dim(Im T⁵) = dim(Im T³)
Thus
dim(ker T² ∩ Im T³) = dim(Im T³) − dim(Im T⁵)
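This bookkeeping is easy to check numerically. Below is a small sketch (assuming numpy is available; the nilpotent "shift" matrix is our own made-up example) that compares rank(A³) − rank(A⁵) with dim(Im A³ ∩ ker A²), the latter computed directly from orthonormal bases via dim U + dim W − dim(U + W):

```python
import numpy as np

def null_basis(M, tol=1e-10):
    # Orthonormal basis (as columns) of ker M, read off from the SVD.
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def col_basis(M, tol=1e-10):
    # Orthonormal basis (as columns) of Im M (the column space).
    u, s, _ = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return u[:, :rank]

def dim_intersection(U, W):
    # dim(span U ∩ span W) = dim U + dim W - dim(U + W)
    return U.shape[1] + W.shape[1] - np.linalg.matrix_rank(np.hstack([U, W]))

# A 6x6 nilpotent "shift" matrix: A e_{k+1} = e_k, so ranks drop by 1 each power.
A = np.diag(np.ones(5), k=1)

lhs = np.linalg.matrix_rank(np.linalg.matrix_power(A, 3)) \
    - np.linalg.matrix_rank(np.linalg.matrix_power(A, 5))
rhs = dim_intersection(col_basis(np.linalg.matrix_power(A, 3)),
                       null_basis(np.linalg.matrix_power(A, 2)))
print(lhs, rhs)  # both equal 2
```

For the shift matrix, Im A³ = span{e1, e2, e3} and ker A² = span{e1, e2}, so both sides come out to 2, in agreement with the formula above.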
Now let's go back to the sequence:
$$V \xrightarrow{\ T\ } \operatorname{Im} T \xrightarrow{\ T|_{\operatorname{Im} T}\ } \operatorname{Im} T^2 \xrightarrow{\ T|_{\operatorname{Im} T^2}\ } \operatorname{Im} T^3 \xrightarrow{\ T|_{\operatorname{Im} T^3}\ } \operatorname{Im} T^4 \longrightarrow \cdots$$
We understand that at each step the dimension lost is the dimension of
ker T ∩ Im T^k
But we have a chain of subspaces
ker T ⊃ ker T ∩ Im T ⊃ ker T ∩ Im T² ⊃ ker T ∩ Im T³ ⊃ · · ·
and thus
dim(ker T) ≥ dim(ker T ∩ Im T) ≥ dim(ker T ∩ Im T²) ≥ dim(ker T ∩ Im T³) ≥ · · ·
With the understanding that this measures the loss of dimension, we have
dim(Im T⁰) − dim(Im T¹) ≥ dim(Im T¹) − dim(Im T²) ≥ dim(Im T²) − dim(Im T³) ≥ · · ·
Proposition 85. The function f(n) = dim(Im Tⁿ) satisfies the following properties:
(1) f never increases, that is,
f(n + 1) ≤ f(n)
(2) The decrease
∆f(n) = f(n) − f(n + 1)
never increases, that is,
f(n + 1) − f(n + 2) ≤ f(n) − f(n + 1)
Suppose we have a basis e1, e2, · · · , en. Then we can associate a matrix to a linear transformation, namely T(e1 e2 · · · en) = (e1 e2 · · · en)A, and the dimension of the image is simply the rank. So dim(Im Tⁿ) transfers to rank(Aⁿ). Conversely, when we have an n × n matrix, we can define a linear transformation of M_{n×1}(F) by simply left multiplying it on elements of M_{n×1}(F). Thus we have the same properties for the rank of a matrix.
Corollary 5. For an arbitrary square matrix A, rank(Aⁿ) satisfies the following properties:
(1) rank(Aⁿ) never increases, that is,
rank(Aⁿ⁺¹) ≤ rank(Aⁿ)
(2) The decrease of rank(Aⁿ) never increases, that is,
rank(Aⁿ⁺¹) − rank(Aⁿ⁺²) ≤ rank(Aⁿ) − rank(Aⁿ⁺¹)
In the whole process, as the decrease becomes smaller and smaller, it finally reaches 0. After that, because the rank can neither grow nor decrease further, it becomes a stable number as the power goes very large. Thus we define the stable number of the power-rank.
Definition 63. For a linear transformation T (or a matrix A), the rank stable number of T is defined to be the minimal number n such that dim(Im Tⁿ) = dim(Im Tⁿ⁺¹) (resp. rank(Aⁿ) = rank(Aⁿ⁺¹)).
Definition 64. If N is the rank stable number of T, then we call dim(Im(T^N)) (resp. rank(A^N)) the stable rank, and we call dim(ker(T^N)) the stable nullity.
Proposition 86. The rank stable number is at most the stable nullity.
Proof. The stable nullity measures the total amount of decrease needed to reach the stable point, while each step before the stable point decreases the rank by at least 1. So after at most stable-nullity many steps, the rank has already decreased to its stable value.
Example 159. Suppose V is a 10-dimensional space and T : V −→ V is a linear transformation. Suppose Im T is 8-dimensional, Im T³ is 5-dimensional, and T¹⁰ = 0.
(1) Find the dimension of the subspace Im T^k for all integers k.
(2) Find the dimension of ker T³ ∩ Im T⁵.
(3) What is the rank stable number?
Answer: First let's see the dimension of Im T²; it could be 8, 7, 6, or 5. Of course it could not be 8 or 5, because then the rank would be stable from that point on, which makes T¹⁰ = 0 impossible.
If it is 7, that is also impossible, because in that case the dimension drops by 1 from Im T to Im T², which forces the drop from Im T² to Im T³ to be at most 1; but Im T³ is 5-dimensional and 7 − 1 > 5. So the only possible case is that Im T² is 6-dimensional. Then, because Im T³ is 5-dimensional, from here on each step loses at most 1 dimension. Since T¹⁰ = 0, the rank must keep dropping until it becomes 0. So Im T⁴ is 4-dimensional, Im T⁵ is 3-dimensional, Im T⁶ is 2-dimensional, Im T⁷ is 1-dimensional, and Im T⁸ = 0 is 0-dimensional.
For (2), we know dim(ker T³ ∩ Im T⁵) = dim(Im T⁵) − dim(Im T⁸) = 3 − 0 = 3. So this space is 3-dimensional.
For (3), the rank stable number is 8, because 8 is the first power at which the rank is stable.
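A transformation with exactly these dimensions can be built from nilpotent Jordan blocks of sizes 8 and 2 (a construction borrowed ahead of its formal introduction; the block sizes are chosen so the rank drops are 2, 2, 1, 1, ...). A quick numerical check, assuming numpy:

```python
import numpy as np

def nilpotent_jordan(sizes):
    # Block-diagonal nilpotent matrix with Jordan blocks of the given sizes
    # (ones on the superdiagonal inside each block).
    n = sum(sizes)
    A = np.zeros((n, n))
    start = 0
    for size in sizes:
        for i in range(size - 1):
            A[start + i, start + i + 1] = 1.0
        start += size
    return A

# Blocks of sizes 8 and 2 realize exactly the data of Example 159:
# dim V = 10, rank(A) = 8, rank(A^3) = 5, A^10 = 0.
A = nilpotent_jordan([8, 2])
ranks = [np.linalg.matrix_rank(np.linalg.matrix_power(A, k)) for k in range(10)]
print(ranks)  # [10, 8, 6, 5, 4, 3, 2, 1, 0, 0]
```

The printed sequence is non-increasing with non-increasing drops, and it stabilizes at 0 starting from the 8th power, exactly as deduced above.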
The power-rank curve
Now let's use graphics to understand this better. We know rank(Aⁿ) as a function is non-increasing with non-increasing drops, so it is concave down. The power-rank curve satisfies the following properties:
Proposition 87.
(1) The slope of the curve is never positive.
(2) The curve lies on or above the segment joining any two adjacent points.
(3) The curve only passes through integer points.
The third condition is the strongest one. This is not like the curves we usually study: in this world we only have integers, so the curve is really a chain of segments. To give you an idea, let's look at an example like the previous one.
Example 160. Suppose V is a 9-dimensional space and T : V −→ V is a linear transformation with Im T³ 5-dimensional and T¹⁰ = 0 (this time the dimension of Im T is not given). Try to draw the power-rank curve.
We set up the points that we already have: (0, 9) and (3, 5), and the curve must touch 0 at power 10 or earlier.
Now you know the curve drops 4 units from power 0 to power 3. So it drops 4 units in 3 steps. What is the first drop?
Is it possible to drop by 1 at first?
That is impossible: because the drops never accelerate, dropping by 1 would mean every later step drops by 1 or 0, and even if we try our best we would still miss the point at (3, 5).
So the first step must drop by at least 2.
Is it possible to drop by 3 at first?
If it drops by 3 at first, it reaches 6 at power 1. But now its position is embarrassing: how does it get to 5 at power 3? It is two steps away and wants to drop by only 1 in total. Remember the curve only passes through integer points, so it has no choice: the only way to reach 5 at power 3 is to drop by 1 and then by 0.
And if it does that, it is stable from then on, so it will definitely miss the 0 at power 10.
Since dropping by 3 already fails, it is no wonder that dropping by 4 at the first step is impossible as well.
So the only case is that the first step drops by 2, reaching 7 at power 1. Every following step can then drop by 2, 1, or 0. To touch 5 at power 3, the only non-stabilizing option is to keep dropping by exactly 1 at each step.
From then on the curve can only drop by 1 or 0, and dropping by 0 means it is stable forever; stabilizing at any rank other than 0 would give a contradiction, because it would miss the 0 at power 10. So it can only stabilize at rank 0. This exactly means it keeps dropping by 1 each step until it touches the bottom.
So this is the only possible curve, and it tells us everything. The rank stable number is 8, the stable rank is 0, and the stable nullity is 9.
dim(V ) = 9
dim(Im(T )) = 7
dim(Im(T 2 )) = 6
dim(Im(T 3 )) = 5
dim(Im(T 4 )) = 4
dim(Im(T 5 )) = 3
dim(Im(T 6 )) = 2
dim(Im(T 7 )) = 1
dim(Im(T 8 )) = 0
4. E IGENVALUES AND E IGENVECTORS
4.1. Finding Eigenvalues and Eigenvectors. .
Remember what we talked about last time.
Problem 11. There are 3 brothers, namely A, B, C. They have some money originally. Suppose there is neither income nor expense, and they like to share their money: at the end of every month, each of them splits his money into two equal parts and gives them to the other two.
Suppose this month A has \$8192, B has \$4096, and C has \$8192. What amounts will they have after 1 year?
This is a most classical problem: a process of a system. We can track it by hand a few times, but if we are asked to track it many times, it becomes a pain.
Let
e1 = a dollar that belongs to A
e2 = a dollar that belongs to B
e3 = a dollar that belongs to C
Then the linear transformation has a very ugly look:
$$T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & \tfrac12 & 0 \end{pmatrix}$$
However, if we choose another basis, like
ε1 = e1 + e2 + e3
ε2 = e1 − e2
ε3 = e1 − e3
then ε1 , ε2 , ε3 together form a basis, and the behavior of the linear transformation is also nice:
$$\varepsilon_1 \longrightarrow \varepsilon_1 \longrightarrow \varepsilon_1 \longrightarrow \varepsilon_1 \longrightarrow \cdots \longrightarrow \varepsilon_1$$
$$\varepsilon_2 \longrightarrow -\tfrac12\varepsilon_2 \longrightarrow \tfrac14\varepsilon_2 \longrightarrow -\tfrac18\varepsilon_2 \longrightarrow \tfrac{1}{16}\varepsilon_2 \longrightarrow \cdots \longrightarrow \tfrac{1}{4096}\varepsilon_2$$
$$\varepsilon_3 \longrightarrow -\tfrac12\varepsilon_3 \longrightarrow \tfrac14\varepsilon_3 \longrightarrow -\tfrac18\varepsilon_3 \longrightarrow \tfrac{1}{16}\varepsilon_3 \longrightarrow \cdots \longrightarrow \tfrac{1}{4096}\varepsilon_3$$
This means that in the basis ε1 , ε2 , ε3 the matrix representation of T¹² has a very simple form, namely
$$T^{12}\begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \varepsilon_3 \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \varepsilon_3 \end{pmatrix}\begin{pmatrix} 1 & & \\ & \tfrac{1}{4096} & \\ & & \tfrac{1}{4096} \end{pmatrix}$$
By the change of basis relation
$$\begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \varepsilon_3 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}$$
we would have
$$T^{12}\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & & \\ & \tfrac{1}{4096} & \\ & & \tfrac{1}{4096} \end{pmatrix}$$
Simply moving the change of basis matrix to the right, we have
$$T^{12}\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & & \\ & \tfrac{1}{4096} & \\ & & \tfrac{1}{4096} \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}^{-1}$$
$$= \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\frac13\begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & & \\ & \tfrac{1}{4096} & \\ & & \tfrac{1}{4096} \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix}$$
You can finally calculate this out and find the answer.
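If you would rather let a computer do the final multiplication, a short sketch (assuming numpy) applies the 12th power of the matrix above to this month's amounts:

```python
import numpy as np

# Transition matrix of the money-splitting process, in the basis e1, e2, e3.
A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

v0 = np.array([8192.0, 4096.0, 8192.0])   # this month's holdings of A, B, C
v12 = np.linalg.matrix_power(A, 12) @ v0  # one year = 12 months
print(v12)  # [6827. 6826. 6827.]
```

So after one year A holds \$6827, B holds \$6826, and C holds \$6827; note that the total stays \$20480 throughout, since each month only moves money around.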
In the above exercise we saw that the key step in simplifying the linear transformation is to find some vector v such that
T v = vλ
And we expect that such vectors will be able to give us a basis.
Definition 65. Suppose T : V −→ V is a linear transformation. A non-zero vector v is called an eigenvector if and only if there is some λ such that
T v = vλ
In this case, λ is called an eigenvalue of T.
Let us do this directly. Suppose our space has a basis e1, e2, · · · , en, and we want to find a vector v such that T v = vλ. With this basis, we write
$$v = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
and suppose
$$T\begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & \cdots & e_n \end{pmatrix} A$$
where A is a matrix. We proceed as follows: T v = vλ means
$$T\begin{pmatrix} e_1 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} e_1 & \cdots & e_n \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}\lambda$$
Then
$$\begin{pmatrix} e_1 & \cdots & e_n \end{pmatrix} A\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} e_1 & \cdots & e_n \end{pmatrix}\begin{pmatrix} \lambda a_1 \\ \vdots \\ \lambda a_n \end{pmatrix} = \begin{pmatrix} e_1 & \cdots & e_n \end{pmatrix}\begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$$
Cancelling the basis on both sides, we get
$$A\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$$
With a neater symbol,
$$A\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \lambda I\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$$
Moving the left-hand side to the right-hand side, we finally get
$$(\lambda I - A)\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Now we know that
$$v \text{ is a non-zero vector} \iff \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \neq \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$
and such a non-zero solution exists ⟺ λI − A is not invertible ⟺ det(λI − A) = 0.
Proposition 88. λ is an eigenvalue of the linear transformation T if and only if det(λI − A) = 0, where A is the matrix appearing in a matrix representation of T with respect to some basis. We call the polynomial det(λI − A) the characteristic polynomial of T.
As you can see, we define a polynomial of T through its matrix representation. But the same T may be represented by different matrices. Doesn't this definition risk being contradictory?
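Proposition 88 can be checked numerically. A small sketch (assuming numpy; the matrix is the one from Example 157) computes the coefficients of det(λI − A) and verifies that the determinant vanishes at each computed eigenvalue:

```python
import numpy as np

# Matrix of the transformation from Example 157 (basis e1, e2, e3).
A = np.array([[0.0, -1.0, -2.0],
              [4.0,  3.0,  2.0],
              [-1.0, 0.0,  2.0]])

# np.poly on a square matrix returns the coefficients of det(lambda*I - A),
# highest degree first.
coeffs = np.poly(A)
print(np.round(coeffs, 6))  # [ 1. -5.  8. -4.]

# det(lambda*I - A) vanishes exactly at the eigenvalues.
for lam in np.linalg.eigvals(A):
    assert abs(np.linalg.det(lam * np.eye(3) - A)) < 1e-8
```

The printed coefficients say the characteristic polynomial is λ³ − 5λ² + 8λ − 4 = (λ − 1)(λ − 2)², which is consistent with the eigenvalue 2 appearing with multiplicity two in the restriction T|_W of Example 157.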
Proposition 89. If A, B are two matrices that represent the same linear transformation T with respect to some bases, then there exists some invertible P such that
B = P⁻¹AP
Proof. Suppose T(e1 e2 · · · en) = (e1 e2 · · · en)A, T(ε1 ε2 · · · εn) = (ε1 ε2 · · · εn)B, and (ε1 ε2 · · · εn) = (e1 e2 · · · en)P. Then substitute and use the uniqueness of coordinates; we get
B = P⁻¹AP
Definition 66. The matrices A and B are called similar if there exists an invertible matrix P such that B = P⁻¹AP.
Now let's discuss whether det(λI − A) and det(λI − B) are the same. We proceed as follows.
det(λI − B) = det(λI − P −1 AP )
= det(P −1 λIP − P −1 AP )
= det(P −1 (λI − A)P )
= det(P −1 ) det(λI − A) det P
= (det P )−1 det P det(λI − A)
= det(λI − A)
Proposition 90. The characteristic polynomials of similar matrices are the same. Thus similarity changes neither the eigenvalues nor their multiplicities as roots.
Proof. This is what we already showed previously: det(λI − A) = det(λI − B) if A and B are similar matrices.
So no matter which basis you choose, the characteristic polynomial does not change. Now let's do some examples on how to find and calculate eigenvalues and eigenvectors.
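A quick numerical sanity check of this invariance, with an explicit matrix and an invertible P (both made up for illustration), assuming numpy:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [1.0, -1.0, 3.0]])

# An explicit invertible P (upper unitriangular, so det P = 1).
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

B = np.linalg.inv(P) @ A @ P  # B is similar to A

# Similar matrices share the same characteristic polynomial coefficients.
print(np.allclose(np.poly(A), np.poly(B)))  # True
```

Any other invertible P would do just as well; the coefficients agree up to floating-point rounding.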
Example 161. Let V be a two-dimensional vector space over C with basis e1 , e2 , and suppose we have a linear transformation T switching the two vectors of the basis, that is,
T e1 = e2
T e2 = e1
Find the eigenvalues and eigenvectors of T.
Our intuition says that if T switches two things, then we should combine the two things as e1 + e2 and e1 − e2 ; then T will either keep the combination or multiply it by −1. Let's find them by the eigenvalue process. Writing down the matrix form, we get
$$T\begin{pmatrix} e_1 & e_2 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
and here we set $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. So its characteristic polynomial is
$$\det(\lambda I - A) = \begin{vmatrix} \lambda & -1 \\ -1 & \lambda \end{vmatrix} = \lambda^2 - 1$$
Now we find the two roots of this characteristic polynomial; they are λ = 1 and λ = −1.
For each of these two values there is at least one eigenvector, because if the coefficient matrix is not invertible then the homogeneous linear equation has a non-zero solution.
If λ = 1, then we solve the equation
$$\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
We find the elementary solution for this equation is $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
If λ = −1, then we solve the equation
$$\begin{pmatrix} -1 & -1 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
We find the elementary solution for this equation is $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$.
So this linear transformation has two eigenvalues, 1 and −1. An eigenvector corresponding to 1 is $\varepsilon_1 = \begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, and an eigenvector corresponding to −1 is $\varepsilon_2 = \begin{pmatrix} e_1 & e_2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. Fortunately, we can check that they form a basis of the linear space, and thus this linear transformation has a very simple representation:
$$T\begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix} = \begin{pmatrix} \varepsilon_1 & \varepsilon_2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
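This small example can be cross-checked numerically, assuming numpy is available:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
w, V = np.linalg.eig(A)  # eigenvalues and (column) eigenvectors

print(np.sort(w.real))  # [-1.  1.]

# Each column v of V satisfies A v = v * lambda.
for lam, v in zip(w, V.T):
    assert np.allclose(A @ v, lam * v)
```

The returned eigenvectors are normalized multiples of (1, 1) and (1, −1), matching the elementary solutions found by hand.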
Example 162. Now let's come back to our original problem, the three brothers distributing their money. With the basis we chose, the problem is the linear transformation
$$T\begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & \tfrac12 & 0 \end{pmatrix}$$
Now let's find eigenvalues and eigenvectors for this linear transformation. We have the characteristic polynomial
$$\det(\lambda I - A) = \begin{vmatrix} \lambda & -\tfrac12 & -\tfrac12 \\ -\tfrac12 & \lambda & -\tfrac12 \\ -\tfrac12 & -\tfrac12 & \lambda \end{vmatrix} = \lambda^3 - \frac34\lambda - \frac14 = (\lambda - 1)\left(\lambda + \frac12\right)^2$$
Thus this linear transformation has two distinct eigenvalues, namely 1 and −½. Now let's find the eigenvectors one by one.
For 1, we solve the equation
$$\begin{pmatrix} 1 & -\tfrac12 & -\tfrac12 \\ -\tfrac12 & 1 & -\tfrac12 \\ -\tfrac12 & -\tfrac12 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
By the reduced row echelon form, we see this has only one elementary solution, which is
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
Now let's continue and solve for the eigenvalue −½. By solving the equation
$$\begin{pmatrix} -\tfrac12 & -\tfrac12 & -\tfrac12 \\ -\tfrac12 & -\tfrac12 & -\tfrac12 \\ -\tfrac12 & -\tfrac12 & -\tfrac12 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
we get two elementary solutions, which are
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$$
Then we know the eigenvector for 1 is
$$\varepsilon_1 = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = e_1 + e_2 + e_3$$
and the eigenvectors for −½ are
$$\varepsilon_2 = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} = e_1 - e_2, \qquad \varepsilon_3 = \begin{pmatrix} e_1 & e_2 & e_3 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = e_1 - e_3$$
Fortunately they form a basis.
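Again a quick numerical cross-check of the eigenvalues, assuming numpy:

```python
import numpy as np

A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

w, V = np.linalg.eig(A)
print(np.round(np.sort(w.real), 6))  # [-0.5 -0.5  1. ]

# The eigenvector of eigenvalue 1 is a multiple of (1, 1, 1):
v1 = V[:, np.argmax(w.real)]
assert np.allclose(v1 / v1[0], [1.0, 1.0, 1.0])
```

The double eigenvalue −½ shows up twice in the sorted list, matching its multiplicity as a root of the characteristic polynomial.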
In the situation above we see two interesting facts. The eigenvectors seem to automatically form a basis; are they automatically linearly independent? Another fact is that the number of elementary solutions for an eigenvalue seems to equal its algebraic multiplicity as a root of the characteristic polynomial; is that true? We will address these questions in the next section.
4.2. Linear Independence of Eigenvectors, Algebraic and Geometric Multiplicity, Eigenspaces, Diagonalization.
The first step to the basis: linear independence
We have the following useful theorem.
Theorem 12. Suppose T : V −→ V is a linear transformation with distinct eigenvalues λ1 , λ2 , · · · , λs . (Here if an eigenvalue appears several times we just count it once; in the future we will call this "algebraic multiplicity not considered".) For each of them, choose an eigenvector vi belonging to λi . Then v1 , v2 , · · · , vs are linearly independent.
Proof. Suppose there is a linear relation
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_s \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_s \end{pmatrix} = 0$$
Then, acting with the linear transformation, we have
$$0 = T0 = T\begin{pmatrix} v_1 & \cdots & v_s \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_s \end{pmatrix} = \begin{pmatrix} v_1\lambda_1 & \cdots & v_s\lambda_s \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_s \end{pmatrix} = \begin{pmatrix} v_1 & \cdots & v_s \end{pmatrix}\begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_s \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_s \end{pmatrix} = \begin{pmatrix} v_1 & \cdots & v_s \end{pmatrix}\begin{pmatrix} \lambda_1 a_1 \\ \vdots \\ \lambda_s a_s \end{pmatrix}$$
The above calculation tells us that if
$$\begin{pmatrix} v_1 & \cdots & v_s \end{pmatrix}\begin{pmatrix} a_1 \\ \vdots \\ a_s \end{pmatrix} = 0$$
then by applying T,
$$\begin{pmatrix} v_1 & \cdots & v_s \end{pmatrix}\begin{pmatrix} \lambda_1 a_1 \\ \vdots \\ \lambda_s a_s \end{pmatrix} = 0$$
By applying T again and again, we get
$$\begin{pmatrix} v_1 & \cdots & v_s \end{pmatrix}\begin{pmatrix} \lambda_1^2 a_1 \\ \vdots \\ \lambda_s^2 a_s \end{pmatrix} = 0, \qquad \begin{pmatrix} v_1 & \cdots & v_s \end{pmatrix}\begin{pmatrix} \lambda_1^3 a_1 \\ \vdots \\ \lambda_s^3 a_s \end{pmatrix} = 0, \qquad \cdots$$
Applying T up to s − 1 times and putting the resulting columns together, we get
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_s \end{pmatrix}\begin{pmatrix} a_1 & \lambda_1 a_1 & \lambda_1^2 a_1 & \cdots & \lambda_1^{s-1} a_1 \\ a_2 & \lambda_2 a_2 & \lambda_2^2 a_2 & \cdots & \lambda_2^{s-1} a_2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_s & \lambda_s a_s & \lambda_s^2 a_s & \cdots & \lambda_s^{s-1} a_s \end{pmatrix} = \begin{pmatrix} 0 & 0 & \cdots & 0 \end{pmatrix}$$
Numbers commute, and each row has the common factor a_i; factoring it out row by row gives
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_s \end{pmatrix}\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_s \end{pmatrix}\begin{pmatrix} 1 & \lambda_1 & \lambda_1^2 & \cdots & \lambda_1^{s-1} \\ 1 & \lambda_2 & \lambda_2^2 & \cdots & \lambda_2^{s-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_s & \lambda_s^2 & \cdots & \lambda_s^{s-1} \end{pmatrix} = \begin{pmatrix} 0 & 0 & \cdots & 0 \end{pmatrix}$$
The matrix appearing on the right is a Vandermonde matrix. Because we did not count algebraic multiplicity, the eigenvalues listed here are all distinct. By the Quiz 2 bonus question, we know the Vandermonde matrix is invertible (because polynomials have two kinds of bases and the Vandermonde matrix plays the role of the change of basis matrix). Thus, multiplying by its inverse on the right, we have
$$\begin{pmatrix} v_1 & v_2 & \cdots & v_s \end{pmatrix}\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_s \end{pmatrix} = \begin{pmatrix} 0 & 0 & \cdots & 0 \end{pmatrix}\begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{s-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_s & \cdots & \lambda_s^{s-1} \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 0 & \cdots & 0 \end{pmatrix}$$
That is, a_i v_i = 0 for each i. Since each v_i is a non-zero vector, we have
a_1 = a_2 = · · · = a_s = 0
This proves the linear independence of eigenvectors belonging to different eigenvalues.
Thus, if you have a bunch of eigenvectors, to check that they are linearly independent you don't have to check them all together; instead you only have to check that each small family of eigenvectors belonging to the same eigenvalue is linearly independent. To see this, we give an example.
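The key fact used in the proof, that a Vandermonde matrix built on distinct λ's is invertible, is also easy to test numerically. A sketch assuming numpy, with some made-up distinct eigenvalues:

```python
import numpy as np

lams = np.array([1.0, -0.5, 2.0, 3.0])  # distinct eigenvalues

# Vandermonde matrix with rows (1, lam_i, lam_i^2, ...), as in the proof above.
V = np.vander(lams, increasing=True)

# Its determinant is the product of (lam_j - lam_i) over i < j,
# which is nonzero exactly when the lams are distinct.
prod = np.prod([lams[j] - lams[i]
                for i in range(len(lams))
                for j in range(i + 1, len(lams))])
print(np.isclose(np.linalg.det(V), prod))  # True
```

If any two entries of `lams` coincide, the product (and the determinant) collapses to 0 and the cancellation argument in the proof breaks down, which is why distinctness of the eigenvalues is essential.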
Example 163. Suppose T is a linear transformation T : V −→ V and e1 , e2 , e3 , e4 , e5 is a basis for V. The matrix representation of T is
$$T\begin{pmatrix} e_1 & e_2 & e_3 & e_4 & e_5 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 & e_4 & e_5 \end{pmatrix}\begin{pmatrix} 2 & 1 & 1 & 2 & -1 \\ 2 & 3 & 3 & 2 & 5 \\ 0 & 0 & 2 & 1 & -3 \\ 0 & 0 & 2 & 3 & -6 \\ 0 & 0 & 1 & 1 & -2 \end{pmatrix}$$
Are you scared by this 5 × 5 matrix? Don't worry: we notice the lucky fact that it is actually block upper triangular. Now let's calculate the characteristic polynomial; call the matrix appearing on the right A.
$$\det(\lambda I - A) = \begin{vmatrix} \lambda-2 & -1 \\ -2 & \lambda-3 \end{vmatrix}\cdot\begin{vmatrix} \lambda-2 & -1 & 3 \\ -2 & \lambda-3 & 6 \\ -1 & -1 & \lambda+2 \end{vmatrix}$$
As you can see, this is just the product of the characteristic polynomials of the diagonal blocks.
Now use tricks to calculate each polynomial. (If you calculate a characteristic polynomial by just expanding the determinant, please stop. That would be a waste of life. Please keep reading to find the tricks: the quick method to determine the coefficients of your characteristic polynomial.)
$$\begin{vmatrix} \lambda-2 & -1 \\ -2 & \lambda-3 \end{vmatrix} = \lambda^2 - 5\lambda + 4 = (\lambda-1)(\lambda-4)$$
And using the tricks for the 3 × 3 block, we have
$$\begin{vmatrix} \lambda-2 & -1 & 3 \\ -2 & \lambda-3 & 6 \\ -1 & -1 & \lambda+2 \end{vmatrix} = \lambda^3 - 3\lambda^2 + 3\lambda - 1 = (\lambda-1)^3$$
Now the characteristic polynomial of T is
f_T(λ) = (λ − 1)⁴(λ − 4)
So the algebraic multiplicity of the eigenvalue 1 is 4, and the algebraic multiplicity of the eigenvalue 4 is 1. Because the geometric multiplicity cannot exceed the algebraic multiplicity and is at least one, the geometric multiplicity of 4 is also 1.
Now let's find the eigenvectors for 1. We look at
$$A - I = \begin{pmatrix} 2 & 1 & 1 & 2 & -1 \\ 2 & 3 & 3 & 2 & 5 \\ 0 & 0 & 2 & 1 & -3 \\ 0 & 0 & 2 & 3 & -6 \\ 0 & 0 & 1 & 1 & -2 \end{pmatrix} - I = \begin{pmatrix} 1 & 1 & 1 & 2 & -1 \\ 2 & 2 & 3 & 2 & 5 \\ 0 & 0 & 1 & 1 & -3 \\ 0 & 0 & 2 & 2 & -6 \\ 0 & 0 & 1 & 1 & -3 \end{pmatrix}$$
Now we are going to find the null space of this matrix.
Do a little bit of row reduction. (Never reduce completely to reduced row echelon form; it would cost you a lot of time. Reduce only to the point where you can find the solution. The knowledge is going to serve us, not to kill us.)
$$\begin{pmatrix} 1 & 1 & 1 & 2 & -1 \\ 2 & 2 & 3 & 2 & 5 \\ 0 & 0 & 1 & 1 & -3 \\ 0 & 0 & 2 & 2 & -6 \\ 0 & 0 & 1 & 1 & -3 \end{pmatrix} \xrightarrow{r_2 - 2r_1} \begin{pmatrix} 1 & 1 & 1 & 2 & -1 \\ 0 & 0 & 1 & -2 & 7 \\ 0 & 0 & 1 & 1 & -3 \\ 0 & 0 & 2 & 2 & -6 \\ 0 & 0 & 1 & 1 & -3 \end{pmatrix} \xrightarrow{r_4 - 2r_3,\ r_5 - r_3} \begin{pmatrix} 1 & 1 & 1 & 2 & -1 \\ 0 & 0 & 1 & -2 & 7 \\ 0 & 0 & 1 & 1 & -3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{r_2 - r_3} \begin{pmatrix} 1 & 1 & 1 & 2 & -1 \\ 0 & 0 & 0 & -3 & 10 \\ 0 & 0 & 1 & 1 & -3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
OK, now the equation looks better. Without completely reducing to reduced row echelon form, we can start finding solutions of
$$\begin{pmatrix} 1 & 1 & 1 & 2 & -1 \\ 0 & 0 & 0 & -3 & 10 \\ 0 & 0 & 1 & 1 & -3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \\ d \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
The second equation tells you 10e = 3d, so we set d = 10t and e = 3t. Plugging into the third equation, we get
c + 10t − 9t = 0
which means c = −t. Now plug what you have into the first equation; you get
a + b − t + 20t − 3t = 0
which means
a + b = −16t
Now b comes as another free variable; set b = s. Then
a = −16t − s
Putting all your work together, you have
$$\begin{cases} a = -16t - s \\ b = s \\ c = -t \\ d = 10t \\ e = 3t \end{cases}$$
And by setting t = 1, s = 0 and then t = 0, s = 1, we get the elementary solutions
$$\begin{pmatrix} -16 \\ 0 \\ -1 \\ 10 \\ 3 \end{pmatrix}, \qquad \begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
Remember, these two vectors live only in the space of column matrices; we should go back to the original abstract linear space. To get the actual vectors in the linear space we left-multiply by the basis. So the actual eigenvectors are
$$\varepsilon_1 = \begin{pmatrix} e_1 & e_2 & e_3 & e_4 & e_5 \end{pmatrix}\begin{pmatrix} -16 \\ 0 \\ -1 \\ 10 \\ 3 \end{pmatrix}, \qquad \varepsilon_2 = \begin{pmatrix} e_1 & e_2 & e_3 & e_4 & e_5 \end{pmatrix}\begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = e_2 - e_1$$
Because the elementary solutions automatically give us a basis of the solution space, we don't have to check linear independence: they are automatically linearly independent within the same eigenspace. And across different eigenspaces, linear independence is guaranteed by our theorem, so we don't need to check it there either.
We have now found that the geometric multiplicity of 1 is 2. It is strictly less than its algebraic multiplicity, which is 4. This point alone is sufficient to say T is not diagonalizable.
Now let's work on the eigenvalue 4.
$$A - 4I = \begin{pmatrix} 2 & 1 & 1 & 2 & -1 \\ 2 & 3 & 3 & 2 & 5 \\ 0 & 0 & 2 & 1 & -3 \\ 0 & 0 & 2 & 3 & -6 \\ 0 & 0 & 1 & 1 & -2 \end{pmatrix} - 4I = \begin{pmatrix} -2 & 1 & 1 & 2 & -1 \\ 2 & -1 & 3 & 2 & 5 \\ 0 & 0 & -2 & 1 & -3 \\ 0 & 0 & 2 & -1 & -6 \\ 0 & 0 & 1 & 1 & -6 \end{pmatrix}$$
Now we solve the equation
$$\begin{pmatrix} -2 & 1 & 1 & 2 & -1 \\ 2 & -1 & 3 & 2 & 5 \\ 0 & 0 & -2 & 1 & -3 \\ 0 & 0 & 2 & -1 & -6 \\ 0 & 0 & 1 & 1 & -6 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \\ d \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
Before you start reducing to reduced row echelon form, wait! You notice something. Pick out the last three equations and rewrite them in matrix form:
$$\begin{pmatrix} -2 & 1 & -3 \\ 2 & -1 & -6 \\ 1 & 1 & -6 \end{pmatrix}\begin{pmatrix} c \\ d \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
But simply notice that the matrix
$$\begin{pmatrix} -2 & 1 & -3 \\ 2 & -1 & -6 \\ 1 & 1 & -6 \end{pmatrix}$$
is invertible! This means the homogeneous equation has only the 0 solution, so we simply get
c = 0, d = 0, e = 0
OK, with that, let's go back: you are finding
$$\begin{pmatrix} -2 & 1 & 1 & 2 & -1 \\ 2 & -1 & 3 & 2 & 5 \\ 0 & 0 & -2 & 1 & -3 \\ 0 & 0 & 2 & -1 & -6 \\ 0 & 0 & 1 & 1 & -6 \end{pmatrix}\begin{pmatrix} a \\ b \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
This is the same as finding
$$\begin{pmatrix} -2 & 1 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
That's easy; just let
$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
So we have
$$\begin{pmatrix} a \\ b \\ c \\ d \\ e \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
So the eigenvector of eigenvalue 4 for this linear transformation is
$$\varepsilon_3 = \begin{pmatrix} e_1 & e_2 & e_3 & e_4 & e_5 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ 0 \\ 0 \\ 0 \end{pmatrix} = e_1 + 2e_2$$
Because we got only one solution in the process of finding elementary solutions, this eigenspace is 1-dimensional. And the theorem tells us that eigenvectors coming from different eigenspaces are automatically linearly independent. So we know the vectors
ε1 = −16e1 − e3 + 10e4 + 3e5 , ε2 = e2 − e1 , ε3 = e1 + 2e2
are linearly independent.
All eigenvectors of eigenvalue 1 live in the subspace span(ε1 , ε2), and all eigenvectors of eigenvalue 4 live in the subspace span(ε3).
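The multiplicity bookkeeping of this example can be double-checked numerically, assuming numpy (the geometric multiplicity of λ is dim ker(A − λI) = 5 − rank(A − λI)):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0, 2.0, -1.0],
              [2.0, 3.0, 3.0, 2.0,  5.0],
              [0.0, 0.0, 2.0, 1.0, -3.0],
              [0.0, 0.0, 2.0, 3.0, -6.0],
              [0.0, 0.0, 1.0, 1.0, -2.0]])

# Geometric multiplicities via the rank of A - lambda*I.
geo_1 = 5 - np.linalg.matrix_rank(A - 1 * np.eye(5))
geo_4 = 5 - np.linalg.matrix_rank(A - 4 * np.eye(5))
print(geo_1, geo_4)  # 2 1

# The eigenvectors computed by hand really are eigenvectors:
eps1 = np.array([-16.0, 0.0, -1.0, 10.0, 3.0])
eps3 = np.array([1.0, 2.0, 0.0, 0.0, 0.0])
assert np.allclose(A @ eps1, 1 * eps1)
assert np.allclose(A @ eps3, 4 * eps3)
```

The two coordinate checks at the end confirm the elementary solutions found above for the eigenvalues 1 and 4.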
Let's use the viewpoint of spaces to look at eigenvectors.
Proposition 91. Suppose T : V −→ V is a linear transformation and λ is an eigenvalue of T. Then all the eigenvectors of eigenvalue λ, together with 0, form a subspace of V. We call this subspace the eigenspace of the eigenvalue λ.
Proof. The eigenvectors of eigenvalue λ, together with 0, form the kernel of the linear transformation T − λE, where E is the identity map; and of course a kernel is a linear subspace.
We also have another proof.
Proof. Consider the set W = {v | T v = λv}. We can check that for any two vectors in W, any linear combination of them is still in W, and thus it is a linear subspace.
Definition 67. Suppose T : V −→ V is a linear map with characteristic polynomial f_T = (λ − λ1)^{n1}(λ − λ2)^{n2}(λ − λ3)^{n3} · · · (λ − λm)^{nm}. Then for the eigenvalue λ1, the multiplicity with which it appears as a root of the characteristic polynomial is called the algebraic multiplicity of λ1, and the dimension of the eigenspace of λ1 is called the geometric multiplicity of λ1.
Theorem 13. Suppose T : V −→ V is a linear transformation and λ0 is an eigenvalue. Then the geometric multiplicity of λ0 is at most its algebraic multiplicity.
Proof. Fix λ0, and suppose the geometric multiplicity of λ0 is g. Then we can find g linearly independent eigenvectors v1, · · · , vg. Extend them to a basis of the whole space, v1, · · · , vg, εg+1, · · · , εn. With such a basis, the matrix representation of the linear transformation looks like
$$T\begin{pmatrix} v_1 & \cdots & v_g & \varepsilon_{g+1} & \cdots & \varepsilon_n \end{pmatrix} = \begin{pmatrix} v_1 & \cdots & v_g & \varepsilon_{g+1} & \cdots & \varepsilon_n \end{pmatrix}\begin{pmatrix} \lambda_0 I_g & * \\ 0 & B \end{pmatrix}$$
Call the matrix on the right A. Now, calculating its characteristic polynomial, we have
$$\det(\lambda I - A) = \begin{vmatrix} (\lambda-\lambda_0)I_g & -* \\ 0 & \lambda I_{n-g} - B \end{vmatrix} = (\lambda-\lambda_0)^g \det(\lambda I_{n-g} - B) = (\lambda-\lambda_0)^g\, h(\lambda)$$
So the characteristic polynomial has λ0 as a root of multiplicity at least g. Because λ0 was an arbitrary eigenvalue, we have proved that for any eigenvalue the geometric multiplicity is at most its algebraic multiplicity.
Please remember why we want to study eigenvalues and eigenvectors: we want to simplify our problem, and we wish for a nice basis of our space such that every vector in the basis is an eigenvector.
Definition 68. Let V be a linear space over F. A linear transformation T : V −→ V is called diagonalizable if there exists an eigenbasis for T in V, where an eigenbasis means a basis all of whose vectors are eigenvectors.
Let's translate this definition to matrices.
Definition 69. Let A be an n × n matrix. Then A is called diagonalizable if it is similar to a diagonal matrix.
Of course, a linear transformation is diagonalizable if and only if its matrix in some matrix representation is diagonalizable.
Corollary 6. A linear transformation is diagonalizable if and only if the geometric multiplicity equals the algebraic multiplicity for each of its eigenvalues.
Proof. We only need to find an eigenbasis. Suppose the geometric multiplicity equals the algebraic multiplicity. Then for each distinct eigenvalue we can find as many linearly independent eigenvectors as its algebraic multiplicity. Going over all the eigenvalues, we find as many eigenvectors as the dimension of the whole space. Because the eigenvectors within each eigenspace are linearly independent, and eigenvectors coming from eigenspaces of different eigenvalues are automatically linearly independent, when we put them all together they are linearly independent. Thus they form a basis.
On the other hand, if a linear transformation is diagonalizable, then we can find an eigenbasis. Because the geometric multiplicity cannot exceed the algebraic one, the total number of these basis vectors is at most the sum of the geometric multiplicities, hence at most the sum of the algebraic multiplicities. But it reaches that sum. The only way this can happen is that the geometric multiplicity equals the algebraic multiplicity for every eigenvalue.
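Numerically, one (somewhat fragile) way to test diagonalizability is to check whether the computed eigenvectors span the whole space; the helper name below is our own. A sketch assuming numpy:

```python
import numpy as np

def is_diagonalizable(A, tol=1e-8):
    # A is diagonalizable iff its eigenvectors span the space,
    # i.e. the eigenvector matrix returned by eig has full rank.
    # (Numerical caveat: nearly-defective matrices may be misclassified.)
    _, V = np.linalg.eig(A)
    return np.linalg.matrix_rank(V, tol=tol) == A.shape[0]

money = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])  # classic non-diagonalizable Jordan block

print(is_diagonalizable(money), is_diagonalizable(shear))  # True False
```

The money-splitting matrix passes (geometric multiplicities 1 and 2 fill the algebraic ones), while the 2 × 2 shear fails: its only eigenvalue 1 has algebraic multiplicity 2 but geometric multiplicity 1.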
Example 164. Previously we calculated the eigenvectors and eigenvalues of the linear transformation
$$T\begin{pmatrix} e_1 & e_2 & e_3 & e_4 & e_5 \end{pmatrix} = \begin{pmatrix} e_1 & e_2 & e_3 & e_4 & e_5 \end{pmatrix}\begin{pmatrix} 2 & 1 & 1 & 2 & -1 \\ 2 & 3 & 3 & 2 & 5 \\ 0 & 0 & 2 & 1 & -3 \\ 0 & 0 & 2 & 3 & -6 \\ 0 & 0 & 1 & 1 & -2 \end{pmatrix}$$
We found it has eigenvalues 1 and 4, and the characteristic polynomial is
(λ − 1)⁴(λ − 4)
For 1, it has eigenvectors (a basis of the eigenspace of eigenvalue 1):
ε1 = −16e1 − e3 + 10e4 + 3e5 , ε2 = e2 − e1
For 4, it has an eigenvector (a basis of the eigenspace of eigenvalue 4):
ε3 = e1 + 2e2
From the polynomial we know the multiplicity of the factor (λ − 1) is 4 and the multiplicity of the factor (λ − 4) is 1. So the algebraic multiplicity of 1 is 4 and the algebraic multiplicity of 4 is 1. Because for 1 we get at most 2 linearly independent eigenvectors, the geometric multiplicity of 1 is 2; and the geometric multiplicity of 4 is 1. To sum up in a chart:
Eigenvalues            1   4
Algebraic multiplicity 4   1
Geometric multiplicity 2   1
So for the eigenvalue 1 the geometric multiplicity cannot reach the algebraic one. This linear transformation is not diagonalizable, because for the eigenvalue 1 the geometric multiplicity fails to fill the algebraic multiplicity.
Example 165. Many times we are not lucky enough to have a diagonalizable transformation, but in some cases we are. Remember the original problem where 3 brothers are splitting money? The transformation is: V is 3-dimensional, and

T e1 = (1/2) e2 + (1/2) e3
T e2 = (1/2) e1 + (1/2) e3
T e3 = (1/2) e1 + (1/2) e2

And we found the characteristic polynomial is
(λ − 1)(λ + 1/2)^2
So it has eigenvalues 1 and −1/2; the algebraic multiplicity of 1 is 1, and that of −1/2 is 2.
OK, and then we found eigenvectors. For 1 we found the eigenvector e1 + e2 + e3, so the geometric multiplicity of 1 is 1. And for −1/2 we found two linearly independent eigenvectors e1 − e2 and e2 − e3, so the geometric multiplicity of −1/2 is 2. So to sum up, we have:

Eigenvalues             1  −1/2
Algebraic multiplicity  1  2
Geometric multiplicity  1  2
Because all the numbers match up, it is diagonalizable. In that case, pick up all the eigenvectors you found, in this case exactly

v1 = e1 + e2 + e3
v2 = e1 − e2
v3 = e2 − e3

They are automatically linearly independent because our theorem guarantees that, and because geometric multiplicity equals algebraic multiplicity, the total number of them equals the dimension of the whole space. If this happens, they form a basis. So let your transformation T act on this basis, and you get

T ( v1 v2 v3 ) = ( v1 v2 v3 ) [ 1            ]
                              [    −1/2      ]
                              [         −1/2 ]

Because you obtain a diagonal matrix, we call this process diagonalization.
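As a quick sanity check, here is a short Python sketch (using exact fractions) that verifies the three eigenpairs; M below is the matrix of T in the basis e1, e2, e3 read off from the formulas above.

```python
from fractions import Fraction as F

# Matrix of the three-brothers transformation: column j holds the
# coordinates of T(e_j) in the basis e1, e2, e3.
M = [[F(0),    F(1, 2), F(1, 2)],
     [F(1, 2), F(0),    F(1, 2)],
     [F(1, 2), F(1, 2), F(0)]]

def matvec(A, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# The eigenbasis and eigenvalues found in the example.
eigenpairs = [
    (F(1),     [F(1), F(1),  F(1)]),    # v1 = e1 + e2 + e3
    (F(-1, 2), [F(1), F(-1), F(0)]),    # v2 = e1 - e2
    (F(-1, 2), [F(0), F(1),  F(-1)]),   # v3 = e2 - e3
]

for lam, v in eigenpairs:
    assert matvec(M, v) == [lam * x for x in v]
print("all three eigenpairs check out")
```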
Now we are ready to explain the tricks to quickly determine the characteristic polynomial, like the adjugate method. This method is only quick for 2 by 2 and 3 by 3 matrices; for 4 by 4 or larger it is better to change your method. But this section is important because it sets up the relation between the eigenvalues and the appearance of the matrix.
As we saw, we can use a determinant to find the characteristic polynomial; is there any meaning to its coefficients?
Definition 70. Suppose A is a square matrix; then the trace of A is defined to be the sum of its diagonal entries.
Definition 71. For a square matrix A, the determinant of a square submatrix whose diagonal lies on the diagonal of A is called a principal minor.
Example 166. Suppose we have a 4 × 4 matrix

    [ 1 4 1 1 ]
A = [ 3 1 8 4 ]
    [ 5 5 3 2 ]
    [ 2 1 0 6 ]

Find all principal minors of A of size 2.
Answer: to find the principal minors of size 2, firstly pick any 2 elements on the diagonal, say a22 = 1 and a44 = 6. Then draw a vertical and a horizontal line through each element you picked, and take out the elements at the intersections of your lines. (I can't draw the picture due to the limited technology here. I would like to find someone to help me out.) For this choice you take out the four entries sitting in rows 2, 4 and columns 2, 4. Then put those 4 elements together into a determinant; it is a principal minor:

| 1 4 |
| 1 6 |

Because we have 4 elements on the diagonal, each choice of two of them gives us a principal minor, and there are 6 ways to choose 2 from 4. So we have 6 principal minors. Please practice this method to find the other five. Finally, you will find that they are:
| 1 4 |   | 1 1 |   | 1 1 |   | 1 4 |   | 1 8 |   | 3 2 |
| 3 1 | , | 5 3 | , | 2 6 | , | 1 6 | , | 5 3 | , | 0 6 |
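The picking procedure is easy to automate. The following Python sketch (the helper names `det` and `principal_minors` are our own, not from the notes) enumerates the size-2 principal minors of the matrix A of Example 166:

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row
    (fine for the small matrices used here)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def principal_minors(A, k):
    """All principal minors of size k: pick k diagonal positions,
    keep exactly those rows and columns, and take the determinant."""
    n = len(A)
    return [det([[A[i][j] for j in idx] for i in idx])
            for idx in combinations(range(n), k)]

# The 4x4 matrix from Example 166.
A = [[1, 4, 1, 1],
     [3, 1, 8, 4],
     [5, 5, 3, 2],
     [2, 1, 0, 6]]

print(principal_minors(A, 2))
```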
Theorem 14. Let A be a square matrix and fA(λ) its characteristic polynomial. Suppose
fA(λ) = λ^n − a1 λ^{n−1} + a2 λ^{n−2} − a3 λ^{n−3} + · · · + (−1)^n an
Then
ai = the sum of all principal minors of size i
In particular, a1 is the trace of the matrix, and an is the determinant of the matrix.
Proof. For a square matrix A, suppose
det(λI − A) = λ^n − a1 λ^{n−1} + a2 λ^{n−2} − a3 λ^{n−3} + · · · + (−1)^n an
Factoring out λ we get
λ^n det(I − (1/λ)A) = λ^n − a1 λ^{n−1} + a2 λ^{n−2} − a3 λ^{n−3} + · · · + (−1)^n an
Dividing both sides by λ^n, we have
det(I − (1/λ)A) = 1 − a1 λ^{−1} + a2 λ^{−2} − a3 λ^{−3} + · · · + (−1)^n an λ^{−n}
Now substitute λ by −λ:
det(I + (1/λ)A) = 1 + a1 λ^{−1} + a2 λ^{−2} + a3 λ^{−3} + · · · + an λ^{−n}
Now substitute λ by 1/λ. We have
det(I + λA) = 1 + a1 λ + a2 λ^2 + · · · + an λ^n
By taking derivatives, the k'th coefficient can be found from
k! ak = (d^k/dλ^k) det(I + λA) |_{λ=0}
Remember that the derivative of a determinant is the sum of the determinants obtained by differentiating one row at a time. Then the k'th derivative of det(I + λA) is the sum of the determinants of the matrices in which a selected set of k rows of I + λA has been differentiated; plugging in λ = 0, each term is exactly a determinant which, outside the k selected rows, has only 1 on the diagonal and 0 elsewhere, and inside the selected rows has the entries of A. Such a determinant is exactly a principal minor of size k, and it has been counted k! times due to the order of differentiation. So
k! ak = k! (sum of the principal minors of size k)
Then ak = sum of the principal minors of size k. □
Example 167. Yes, this is exactly what I mentioned as a trick. I think in the future you will do a lot of characteristic polynomial calculations in other courses; learn this and do it faster than others. Hahahaha! Let's just take the previous ones for example. I mentioned that we can use the trick to calculate the characteristic polynomials of the matrices

[ 2 1 ]        [ 2 1 −3 ]
[ 2 3 ]  and   [ 2 3 −6 ]
               [ 1 1 −2 ]
Now let's calculate for the first, 2 × 2, matrix:

[ 2 1 ]
[ 2 3 ]

You know it is a 2 × 2 matrix, so you write on your paper
λ^2
And the sum of the principal minors of size 1 is just the trace, so 2 + 3 = 5, and you write
λ^2 − 5λ
(Please take care: the signs alternate −, +, −, +, . . . ; be careful!)
Then calculate the determinant: 2 × 3 − 1 × 2 = 4. So you write
f(λ) = λ^2 − 5λ + 4
Yeah, we won!
OK and for the3 × 3 matrix.

2 1 −3
 2 3 −6 
1 1 −2
You know it is 3 × 3, so you write down
λ3
And then the next one would be the trace, so 2 + 3 + (−2) = 3, you write down (take care to change sign
alternatively!)
λ3 − 3λ2
OK, next coefficient, is the sum of principal minor of size 2. All principal minor is
2 1 = 4, 3 −6 = 0, 2 −3 = −1
2 3 1 −2 1 −2 And the sum of them are 4 + 0 − 1 = 3. So it is the coefficient with λ, you write down
λ3 − 3λ2 + 3λ
OK the last coefficient is the determinat. We found that determinant is 1. But now, take care! the last step
we should change the sign. So we wrote:
f (λ) = λ3 − 3λ2 + 3λ − 1
OK, we won!
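The whole trick of Theorem 14 can be turned into a small program. The sketch below (the helper names are our own) computes the characteristic polynomial coefficients as alternating sums of principal minors and reproduces both results above:

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def char_poly(A):
    """Coefficients [1, -a1, +a2, -a3, ...] of det(lambda*I - A),
    where a_k is the sum of the principal minors of size k (Theorem 14)."""
    n = len(A)
    coeffs = [1]
    for k in range(1, n + 1):
        ak = sum(det([[A[i][j] for j in idx] for i in idx])
                 for idx in combinations(range(n), k))
        coeffs.append((-1) ** k * ak)
    return coeffs

A = [[2, 1, -3],
     [2, 3, -6],
     [1, 1, -2]]
print(char_poly(A))   # coefficients of lambda^3 - 3 lambda^2 + 3 lambda - 1
```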
Remember that the coefficients of a polynomial automatically have their own interpretation in terms of its roots. This is called Vieta's formulas (you can look it up on Wikipedia to learn more!).

Lemma 1 (Vieta's formulas). Suppose P(x) is a polynomial of degree n over C with roots x1, x2, · · · , xn (considered in C, so every degree-n polynomial has n roots). Suppose
P(x) = x^n − a1 x^{n−1} + a2 x^{n−2} − · · · + (−1)^n an
Then

a1 = x1 + x2 + · · · + xn
a2 = (x1 x2 + x1 x3 + · · · + x1 xn) + (x2 x3 + · · · + x2 xn) + · · · + x_{n−1} xn
⋮
an = x1 x2 x3 · · · xn
Remark 6. We write the polynomial as an alternating sum because only with the alternating sum and leading coefficient one do those formulas have their simplest look; otherwise they would come with powers of −1 and divisions. And as you can see, in the characteristic polynomial case, if we use the alternating sum, then each coefficient is just the plain sum of principal minors. So the alternating sum with leading coefficient 1 is the most natural way to write a polynomial. We are following nature, not trying to get you in trouble : )
Now we have two different interpretations of the characteristic polynomial, and keeping in mind that its roots are the eigenvalues, we have the following property.
Corollary 7. Suppose T : V −→ V is a linear transformation of an n-dimensional space and A is its representation on some basis of V. Then

tr(A) = a1 = λ1 + · · · + λn
sum of principal minors of size 2 = a2 = (λ1 λ2 + λ1 λ3 + · · · + λ1 λn) + (λ2 λ3 + · · · + λ2 λn) + · · · + λ_{n−1} λn
⋮
det(A) = an = λ1 λ2 λ3 · · · λn
With those fun properties, we can do some interesting things.
Example 168. Suppose A is a 2 × 2 matrix, and suppose tr(A) = 3 and tr(A^2) = 5. Find det(A).
What? Are you sure? Yes! We are able to find the determinant knowing only that it is 2 by 2 and the traces of A and A squared! It's fun!
Because A is a 2 × 2 matrix, it has 2 eigenvalues (multiplicity considered), say
λ1, λ2
Then the eigenvalues of A^2 are
λ1^2, λ2^2
(This is by the spectral mapping theorem; you will see it in the next few pages.)
Because the trace is the sum of the eigenvalues,
3 = tr(A) = λ1 + λ2
5 = tr(A^2) = λ1^2 + λ2^2
Squaring the first equation we get
9 = λ1^2 + 2λ1λ2 + λ2^2
Because we already know from the second equation that
5 = λ1^2 + λ2^2
taking the difference of them, we get
4 = 2λ1λ2, that is, λ1λ2 = 2
And we know the product of the two eigenvalues equals the determinant, so
det(A) = 2
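The eigenvalue computation above amounts to the identity det(A) = (tr(A)^2 − tr(A^2))/2 for 2 × 2 matrices. A small Python check, using a hypothetical matrix chosen to have the given traces (the companion matrix of λ^2 − 3λ + 2, not a matrix from the notes):

```python
def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def det_from_traces(A):
    """For a 2x2 matrix: det(A) = (tr(A)^2 - tr(A^2)) / 2,
    exactly the eigenvalue argument of Example 168."""
    return (trace(A) ** 2 - trace(matmul(A, A))) // 2

# Hypothetical matrix with tr(A) = 3 and tr(A^2) = 5.
A = [[0, -2],
     [1, 3]]
print(trace(A), trace(matmul(A, A)), det_from_traces(A))   # 3 5 2
```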
Example 169. We call an n × n matrix A that has some power equal to 0 a nilpotent matrix. Show that if A^m = 0 for some m, then all the eigenvalues of A are equal to 0.
Answer: We don't actually know the eigenvalues of A, but we can call them
λ1, λ2, · · · , λn
With this notation, the eigenvalues of A^m are
λ1^m, λ2^m, · · · , λn^m
But A^m = 0, and we already know the eigenvalues of the 0 matrix: because the characteristic polynomial of the 0 matrix is det(λI − 0) = λ^n, all the eigenvalues of the 0 matrix are 0. On the other hand, this means
λ1^m = 0
λ2^m = 0
⋮
λn^m = 0

So we must have

λ1 = 0
λ2 = 0
⋮
λn = 0
Now let's look at another theorem, powered by det(I − AB) = det(I − BA). Actually, by replacing A by λA we get the more general formula det(I − λAB) = det(I − λBA), and the analogue also works for nullity: null(I − λAB) = null(I − λBA).
Theorem 15. Suppose A is an m × n matrix and B is an n × m matrix. Then the characteristic polynomials of AB and BA differ only by a factor of a power of λ.
Proof.
det(λI − AB) = λ^m det(I − (1/λ)AB)
             = λ^m det(I − (1/λ)BA)
             = λ^{m−n} det(λI − BA)
So the only difference between them comes from the difference of their sizes. □
Corollary 8. tr(AB) = tr(BA)
Proof. We know the characteristic polynomials of AB and BA differ only by a factor λ^{m−n}, so they have the same second coefficient. The trace of AB is the opposite of the second coefficient of the characteristic polynomial of AB, and the trace of BA is the opposite of the second coefficient of the characteristic polynomial of BA, so the two traces are the same. □
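A quick numerical check of the corollary with rectangular matrices (made-up entries), where AB and BA do not even have the same size:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# A is 2x3 and B is 3x2, so AB is 2x2 while BA is 3x3,
# yet their traces agree.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [2, 1],
     [0, 3]]

print(trace(matmul(A, B)), trace(matmul(B, A)))
```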
Upper triangularize a matrix by similarity.
Last lecture we said we were going to diagonalize matrices, but as you see, diagonalizability needs the strong condition that each algebraic multiplicity equals the corresponding geometric multiplicity. As a result, it often happens that we cannot diagonalize a matrix, and in that case some properties become hard to look into. Instead, we can still find a matrix P making our matrix similar to an upper triangular matrix.
Firstly, let's look into upper triangular matrices.
Proposition 92. If M is an upper triangular or lower triangular matrix, then the eigenvalues of M are exactly the entries on the diagonal, and the number of times each appears on the diagonal is exactly its algebraic multiplicity.
Proof. Suppose

    [ a11 a12 · · · a1n ]
A = [     a22 · · · a2n ]
    [          ⋱    ⋮   ]
    [               ann ]

Then the characteristic polynomial of A is

        | λ − a11   −a12   · · ·    −a1n |
fA(λ) = |         λ − a22  · · ·    −a2n |
        |                    ⋱       ⋮   |
        |                        λ − ann |

Because the determinant of an upper triangular matrix is the product of its diagonal,
fA(λ) = (λ − a11)(λ − a22) · · · (λ − ann)
Thus the diagonal entries are exactly the roots of this polynomial, and the algebraic multiplicity of each root is exactly the number of times it appears on the diagonal. □

Example 170. Find the eigenvalues and eigenvectors of the matrix
    [ 1 2 3 ]
A = [ 0 6 7 ]
    [ 0 0 9 ]

Answer: For this upper triangular matrix we observe that 1, 6, 9 are the entries on the diagonal, so 1, 6, 9 are its eigenvalues. For 1, we find

        [ 0 −2 −3 ]
I − A = [ 0 −5 −7 ]
        [ 0  0 −8 ]

By setting up the equation

[ 0 −2 −3 ] [ x ]   [ 0 ]
[ 0 −5 −7 ] [ y ] = [ 0 ]
[ 0  0 −8 ] [ z ]   [ 0 ]

we find 1 elementary solution:

     [ 1 ]
c1 = [ 0 ]
     [ 0 ]

Now, let's continue with (6I − A)(x, y, z)^T = 0, that is,

[ 5 −2 −3 ] [ x ]   [ 0 ]
[ 0  0 −7 ] [ y ] = [ 0 ]
[ 0  0 −3 ] [ z ]   [ 0 ]

We find another eigenvector:

     [ 2 ]
c2 = [ 5 ]
     [ 0 ]

Now let's also continue with (9I − A)(x, y, z)^T = 0, that is,

[ 8 −2 −3 ] [ x ]   [ 0 ]
[ 0  3 −7 ] [ y ] = [ 0 ]
[ 0  0  0 ] [ z ]   [ 0 ]

We find our last eigenvector:

     [ 23 ]
c3 = [ 56 ]
     [ 24 ]
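A short Python check that the three vectors found above really are eigenvectors:

```python
# The upper triangular matrix from Example 170.
A = [[1, 2, 3],
     [0, 6, 7],
     [0, 0, 9]]

def matvec(A, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

eigenpairs = [(1, [1, 0, 0]),
              (6, [2, 5, 0]),
              (9, [23, 56, 24])]

for lam, c in eigenpairs:
    assert matvec(A, c) == [lam * x for x in c]
print("A c = lambda c holds for all three pairs")
```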
Now let's see the method to make a matrix similar to an upper triangular matrix.

Lemma 2. For any matrix A over C (it is C! we need every polynomial to have roots), we are able to find an invertible P such that P^{−1}AP is a block upper triangular matrix like

           [ λ ∗ · · · ∗ ]
P^{−1}AP = [ 0 ∗ · · · ∗ ]
           [ ⋮ ⋮       ⋮ ]
           [ 0 ∗ · · · ∗ ]

Proof. We view the n × n matrix A as a linear transformation acting by left multiplication on M_{n×1}(C). The characteristic polynomial of A is det(λI − A), and this polynomial has at least 1 root, say λ0. Then we find an eigenvector for λ0, say

     [ c11 ]
c1 = [ c12 ]
     [  ⋮  ]
     [ c1n ]

Because c1 is a non-zero column matrix, some entry of c1 is not zero; let's say c1i ≠ 0. Then we take the other column vectors c2, · · · , cn to be the distinct standard column vectors having a single entry 1 at a place different from i and all other entries 0. Then c1, c2, · · · , cn form a basis of M_{n×1}(C), and we know Ac1 = λ0 c1. So

                                          [ λ0 ∗ · · · ∗ ]
A ( c1 c2 · · · cn ) = ( c1 c2 · · · cn ) [ 0  ∗ · · · ∗ ]
                                          [ ⋮  ⋮       ⋮ ]
                                          [ 0  ∗ · · · ∗ ]

Let P = ( c1 c2 · · · cn ); then AP = PM, where M is the matrix on the right above, and hence

                [ λ0 ∗ · · · ∗ ]
P^{−1}AP = M =  [ 0  ∗ · · · ∗ ]
                [ ⋮  ⋮       ⋮ ]
                [ 0  ∗ · · · ∗ ] □
Proposition 93. For any matrix A over C, we are able to find some invertible matrix P such that P^{−1}AP is an upper triangular matrix.

Proof. Firstly, by the lemma, let's find P1 such that

P1^{−1} A P1 = [ λ1 r1 ]
               [ 0  B  ]

where r1 is a row vector and B is a square block. Then for B, we find P2 such that

P2^{−1} B P2 = [ λ2 r2 ]
               [ 0  C  ]

And with the block diagonal matrix Q = [ 1 0 ; 0 P2 ] we have

Q^{−1} (P1^{−1} A P1) Q = [ λ1    r1 P2       ]
                          [ 0   P2^{−1} B P2  ]

                        = [ λ1  r1 P2 ]
                          [ 0   λ2 r2 ]
                          [ 0   0  C  ]

And we continue doing this for C, and continue and continue; finally we will get an upper triangular matrix. □
So we have used similarity to relate every matrix to an upper triangular matrix. Which properties are preserved under similarity? Let's look.
Summary 1. The following properties are preserved under similarity.
(1) det(A) = det(P −1 AP )
(2) rank(A) = rank(P −1 AP )
(3) The characteristic polynomial and the eigenvalues.
(4) The algebraic multiplicity of an eigenvalue λ.
(5) The geometric multiplicity of an eigenvalue λ.
Proof.
For (1), det(P −1 AP ) = det(P )−1 det(A) det(P ) = det(P )−1 det(P ) det(A) = det(A).
For (2), rank does not change under left or right multiplication by an invertible matrix, so rank(P −1 AP ) = rank(AP ) = rank(A).
For (3), we already proved that the characteristic polynomial is preserved; since the eigenvalues are the roots of the characteristic polynomial, of course they are preserved too.
For (4), because the characteristic polynomial is preserved, of course the algebraic multiplicity of each eigenvalue is preserved.
For (5), the geometric multiplicity is null(λI − A) = n − rank(λI − A), and λI − A is similar to λI − P −1 AP , so by (2) it is preserved. □
With the discussion above, we may change a matrix into an upper triangular matrix without changing its essential properties, and then discover the relation between algebraic multiplicity and geometric multiplicity. Because every matrix over C is similar to an upper triangular matrix, this is a useful tool to explain everything.
4.3. Polynomial of linear transformation, Spectral mapping theorem, minimal polynomial of linear transformation. In this section we discuss algebraic and geometric multiplicity in more detail, relate them to the power-rank curve, and then discuss all the properties a polynomial of a linear transformation has.
We will see that upper-triangular matrices are helpful.
Lemma 3. Suppose f (x) = x^n + a1 x^{n−1} + · · · + an is a polynomial over F; then f (P −1 AP ) = P −1 f (A)P . We call this property "similarity doesn't change a polynomial of a linear transformation".

Proof. We observe

(P −1 AP )^n = P −1 AP × P −1 AP × · · · × P −1 AP    (n many)
             = P −1 (A × A × · · · × A) P             (n many)
             = P −1 A^n P

because all the inner P P −1 pairs cancel. Now for f (x) = x^n + a1 x^{n−1} + · · · + an,

f (P −1 AP ) = (P −1 AP )^n + a1 (P −1 AP )^{n−1} + · · · + an I
             = P −1 A^n P + a1 P −1 A^{n−1} P + · · · + an P −1 I P
             = P −1 (A^n + a1 A^{n−1} + · · · + an I) P
             = P −1 f (A) P □
Theorem 16 (Spectral mapping theorem). Suppose A is an n × n matrix over C; then A has n eigenvalues (multiplicity considered), namely λ1, λ2, · · · , λn. If f is a polynomial, then the eigenvalues of f (A) are exactly f (λ1), f (λ2), · · · , f (λn).
Proof. Choose P such that P −1 AP is an upper-triangular matrix. Then the entries on the diagonal are exactly the eigenvalues, with multiplicity considered. Now apply the polynomial: the diagonal of f (P −1 AP ) is exactly the polynomial applied to each diagonal entry, so P −1 f (A)P = f (P −1 AP ) has eigenvalues f (λ1), f (λ2), · · · , f (λn). So does f (A). □
Example 171. Suppose A is an n × n matrix with characteristic polynomial
(λ − 1)^3 (λ − 2)^2
Find the characteristic polynomial of A^2 + I.
Answer: The characteristic polynomial is the product of the eigenvalue factors with algebraic multiplicity counted, so this fact tells us:
the eigenvalues of A are 1, 1, 1, 2, 2
Now A^2 + I is obtained just by applying the polynomial x^2 + 1, so the eigenvalues of A^2 + I are obtained by applying the polynomial x^2 + 1 to the eigenvalues of A:
the eigenvalues of A^2 + I are 2, 2, 2, 5, 5
So A^2 + I has characteristic polynomial (λ − 2)^3 (λ − 5)^2.
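Since A is only known through its characteristic polynomial, we can test the reasoning on any upper triangular matrix with diagonal 1, 1, 1, 2, 2 (the off-diagonal entries below are arbitrary made-up numbers): the diagonal of A^2 + I should become 2, 2, 2, 5, 5.

```python
# An upper triangular stand-in for A with diagonal 1,1,1,2,2.
A = [[1, 7, 0, 2, 0],
     [0, 1, 3, 0, 1],
     [0, 0, 1, 4, 0],
     [0, 0, 0, 2, 5],
     [0, 0, 0, 0, 2]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# f(A) = A^2 + I is again upper triangular, so its eigenvalues
# can be read straight off its diagonal.
fA = matmul(A, A)
for i in range(5):
    fA[i][i] += 1

diag = [fA[i][i] for i in range(5)]
print(diag)   # [2, 2, 2, 5, 5]
```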
Thus, the spectral mapping theorem helps us determine the eigenvalues and the algebraic multiplicity of an eigenvalue, but we still don't understand the geometric multiplicity part.
Let's recall what geometric multiplicity is. For an n × n matrix A and an eigenvalue λ of A, the geometric multiplicity of λ is
null(A − λI) = n − rank(A − λI)
Thus, to know a geometric multiplicity for a polynomial, say f (A), we have to know null(f (A) − f (λ)I), and so it is enough to study the nullity of a polynomial expression of a linear transformation.
Firstly, given a polynomial, say λ^2 − 3λ + 2, we can factor it, here as (λ − 1)(λ − 2). So in general we can write a polynomial as (λ − λ1)^{n1} (λ − λ2)^{n2} · · · (λ − λk)^{nk}.
The following lemma will be useful; it says that the nullity of coprime factors splits up.
Lemma 4. Suppose α and β are different numbers and A is an n × n matrix; then
null((A − αI)^r (A − βI)^s) = null((A − αI)^r) + null((A − βI)^s)
Proof. Let us denote B = A − βI and γ = α − β (note γ ≠ 0), so B − γI = A − αI. We only need to show
null((B − γI)^r B^s) = null((B − γI)^r) + null(B^s)
and because null((B − γI)^r) = null((γI − B)^r), let us show
null((γI − B)^r B^s) = null((γI − B)^r) + null(B^s)
For the right hand side of the equation, we have

null((γI − B)^r) + null(B^s) = null [ (γI − B)^r   0  ]
                                    [     0       B^s ]

because the nullity of a block diagonal matrix is the sum of the nullities of its blocks. Now let's do block row and column transformations; they are invertible, so they do not change the nullity. Please remember an important formula we learned in middle school:

a^k − b^k = (a − b)(a^{k−1} + a^{k−2} b + a^{k−3} b^2 + · · · + b^{k−1})

So (a^k − b^k)/(a − b) = a^{k−1} + a^{k−2} b + · · · + b^{k−1}. This looks fractional, but is not fractional; we will use this trick. Note also that expanding (γ^s I − B^s)^r gives γ^{sr} I + B^s g(B^s) for some polynomial g, since every term other than γ^{sr} I carries a factor B^s. Writing bri, bci for block row i and block column i, we start the transformations:

[ (γI − B)^r   0  ]
[     0       B^s ]

bc2 + bc1 × ((γ^s I − B^s)^r / (γI − B)^r)   (the quotient is a polynomial in B by the formula above):

[ (γI − B)^r   γ^{sr} I + B^s g(B^s) ]
[     0                 B^s          ]

br1 − g(B^s) × br2:

[ (γI − B)^r   γ^{sr} I ]
[     0           B^s   ]

br2 − (B^s / γ^{sr}) × br1:

[        (γI − B)^r           γ^{sr} I ]
[ −(1/γ^{sr})(γI − B)^r B^s      0     ]

bc1 − bc2 × ((γI − B)^r / γ^{sr}):

[            0                γ^{sr} I ]
[ −(1/γ^{sr})(γI − B)^r B^s      0     ]

bc1 ↔ bc2:

[ γ^{sr} I               0             ]
[    0      −(1/γ^{sr})(γI − B)^r B^s  ]

bc1 × (1/γ^{sr});  bc2 × (−γ^{sr}):

[ I          0        ]
[ 0   (γI − B)^r B^s  ]

Here g is some polynomial. From the above, we know
null((γI − B)^r) + null(B^s) = null(I) + null((γI − B)^r B^s)
and with null(I) = 0, we get the formula
null((γI − B)^r) + null(B^s) = null((γI − B)^r B^s) □
Example 172. Suppose A is an n × n matrix with eigenvalues 1, 2, 3, and the geometric multiplicity of 1 is 2, of 2 is 2, of 3 is 2. Find the geometric multiplicity of the eigenvalue −2 of the matrix A^2 − 3A.
Yes, it has eigenvalue −2, by the spectral mapping theorem. To find the geometric multiplicity of the eigenvalue −2 of A^2 − 3A, we need to find
null(A^2 − 3A + 2I)
We factor A^2 − 3A + 2I = (A − I)(A − 2I), so
null(A^2 − 3A + 2I) = null((A − I)(A − 2I)) = null(A − I) + null(A − 2I) = 2 + 2 = 4
Remember that the nullity of A − I is the geometric multiplicity of the eigenvalue 1, and the nullity of A − 2I is the geometric multiplicity of 2; we know they are both 2 from the assumption.
Previously, we saw that the geometric multiplicity of an eigenvalue λ0 of A equals null(A − λ0 I), but we still haven't given an explanation of what the algebraic multiplicity is. With our powerful power-rank curve, we can have a look into it.
Theorem 17. Suppose A is an n × n matrix, and let λ0 be one of its eigenvalues. Then the stable nullity of A − λ0 I is equal to the algebraic multiplicity of λ0.
Proof. Firstly, consider I, A, A^2, · · · , A^{n^2}: they are n^2 + 1 vectors in the n^2-dimensional space of n × n matrices, so they are linearly dependent. Thus there exists a non-zero polynomial f (λ) such that f (A) = 0. Now suppose (factoring over C)
f (λ) = (λ − λ1)^{k1} (λ − λ2)^{k2} · · · (λ − λm)^{km}
Because the product is 0 anyway, we can raise the powers ki to be large enough that the nullity of each factor is stable. So we know
(A − λ1 I)^{k1} (A − λ2 I)^{k2} · · · (A − λm I)^{km} = 0
Taking the nullity and using the lemma (the factors are coprime) on both sides, we have
null((A − λ1 I)^{k1}) + null((A − λ2 I)^{k2}) + · · · + null((A − λm I)^{km}) = n
Now replace each ki by si, where si is the rank-stable number, the minimal power at which the rank stops dropping; since each ki was taken large enough, null((A − λi I)^{ki}) = null((A − λi I)^{si}).
If some si = 0, then null((A − λi I)^{si}) = null(I) = 0, and we may omit that term because it contributes 0 to the equality. So we only write down the terms with stable number larger than zero:
null((A − λ1 I)^{s1}) + null((A − λ2 I)^{s2}) + · · · + null((A − λm I)^{sm}) = n
A stable number larger than zero means A − λi I is not invertible, so each such λi is an eigenvalue. And we know null((A − λi I)^{si}) is the geometric multiplicity of the eigenvalue 0 of (A − λi I)^{si}, which cannot exceed the algebraic multiplicity of the eigenvalue 0 of (A − λi I)^{si}, which in turn is equal to the algebraic multiplicity of the eigenvalue λi of A by the spectral mapping theorem. So we know
null((A − λi I)^{si}) ≤ ai
where ai is the algebraic multiplicity of λi. So
n = null((A − λ1 I)^{s1}) + null((A − λ2 I)^{s2}) + · · · + null((A − λm I)^{sm}) ≤ a1 + a2 + · · · + am ≤ n
Squeezed by the n on both sides of the inequality, the only possible case is equality everywhere. So the stable nullity is equal to the algebraic multiplicity. □
Now we have given an explanation of the algebraic multiplicity: it is the stable nullity. Meanwhile, the geometric multiplicity is just the nullity at the very first step; no wonder it is smaller.
Example 173. Suppose A is an 8 × 8 matrix with characteristic polynomial
(λ − 1)^3 (λ − 2)^5
Suppose we know the geometric multiplicity of the eigenvalue 2 is 1; then what are rank(A − 2I), rank((A − 2I)^2), rank((A − 2I)^3), rank((A − 2I)^4)?
Because the geometric multiplicity of 2 is just null(A − 2I) = 1, and the stable nullity is the algebraic multiplicity of 2, which is 5, the rank curve starts from 8, drops by 1 at the first step, and finally reaches 8 − 5 = 3 as its stable rank. Because at first it drops by 1, and non-acceleration guarantees that each later step can only drop by 1 or 0, while 0 would mean it is already stable, it drops by 1 each time until it touches 3. So
rank(A − 2I) = 7, rank((A − 2I)^2) = 6, rank((A − 2I)^3) = 5, rank((A − 2I)^4) = 4, rank((A − 2I)^5) = 3, rank((A − 2I)^6) = 3, rank((A − 2I)^7) = 3, · · ·
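We can reproduce this rank curve on a concrete matrix with the same data (a hypothetical choice, not from the notes: one 5 × 5 Jordan-type block for the eigenvalue 2, giving algebraic multiplicity 5 and geometric multiplicity 1, plus an identity block for the eigenvalue 1):

```python
from fractions import Fraction as F

def rank(M):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    M = [[F(x) for x in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# 8x8 example: a 5x5 Jordan block for eigenvalue 2, then I_3 for eigenvalue 1.
n = 8
A = [[0] * n for _ in range(n)]
for i in range(5):
    A[i][i] = 2
    if i < 4:
        A[i][i + 1] = 1        # the Jordan 1's above the diagonal
for i in range(5, 8):
    A[i][i] = 1

B = [[A[i][j] - 2 * (i == j) for j in range(n)] for i in range(n)]   # A - 2I
P = [[int(i == j) for j in range(n)] for i in range(n)]              # (A - 2I)^0
ranks = []
for _ in range(7):
    P = matmul(P, B)
    ranks.append(rank(P))
print(ranks)   # the power-rank curve of A - 2I
```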
Definition 72. Suppose A is an n × n matrix. We call a polynomial p(λ) an annihilating polynomial for A if p(A) = 0.

Example 174. For example, if A = [ 2 0 ; 0 1 ] is a 2 × 2 matrix, then p(λ) = λ^2 − 3λ + 2 is an annihilating polynomial, because if we plug our matrix into it, it becomes

A^2 − 3A + 2I = [ 4 0 ]     [ 2 0 ]     [ 1 0 ]   [ 0 0 ]
                [ 0 1 ] − 3 [ 0 1 ] + 2 [ 0 1 ] = [ 0 0 ]
Corollary 9. For an n × n matrix A, its characteristic polynomial is an annihilating polynomial.
Proof. Suppose we have
null((A − λ1 I)^{s1}) + null((A − λ2 I)^{s2}) + · · · + null((A − λm I)^{sm}) = n
in which the λi are the eigenvalues and the si are the rank-stable numbers. The algebraic multiplicity of λi, which we denote ai, serves as the stable nullity; and since the nullity increases by at least 1 at each step until it stabilizes, the stable number is at most the stable nullity, so ai ≥ si. So at power ai everything is already stable, and
null((A − λ1 I)^{a1}) + null((A − λ2 I)^{a2}) + · · · + null((A − λm I)^{am}) = n
By the lemma, the left hand side is the nullity of the product, so the product has nullity n; that is,
(A − λ1 I)^{a1} (A − λ2 I)^{a2} · · · (A − λm I)^{am} = 0
But the left hand side is exactly the characteristic polynomial applied to A. So this tells us the characteristic polynomial kills A. □
This is a very useful corollary; let's see an example.
Example 175. Suppose A is a 2 × 2 matrix with tr(A) = −1 and det(A) = 1. Show that A^3 = I.
We don't know what A actually is, but we can try. Because we have the trace and the determinant, we know the characteristic polynomial of A:
fA(λ) = λ^2 + λ + 1
Now, given that the characteristic polynomial is annihilating, we know
A^2 + A + I = 0, that is, A^2 = −A − I
Let's repeatedly use this to work out A^3, replacing A^2 by −A − I whenever it appears. Multiplying by A we get
A^3 = −A^2 − A
    = −(−A − I) − A
    = A + I − A
    = I
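A concrete check on a hypothetical matrix with tr(A) = −1 and det(A) = 1 (the companion matrix of λ^2 + λ + 1, our own choice):

```python
# Companion matrix of lambda^2 + lambda + 1: trace -1, determinant 1.
A = [[0, -1],
     [1, -1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

assert A[0][0] + A[1][1] == -1                       # trace
assert A[0][0] * A[1][1] - A[0][1] * A[1][0] == 1    # determinant

A3 = matmul(matmul(A, A), A)
print(A3)   # the identity matrix
```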
But sometimes the characteristic polynomial is not the most efficient one. Just think of the identity matrix: its eigenvalues are all equal to one, so its characteristic polynomial is (λ − 1)^n, where n is its size. But actually λ − 1 is already enough to kill it. The latter one is what we will call the minimal polynomial.
Definition 73. Suppose A is an n × n matrix. The minimal polynomial of A is defined to be the annihilating polynomial of lowest degree with leading coefficient one.

Example 176. Consider the matrix A = [ 1 1 ; 0 1 ].
Then A is an upper triangular matrix with eigenvalues 1 and 1. So we know the characteristic polynomial kills A; at least
(A − I)^2 = 0
But is it minimal? Yes, it is minimal, because we cannot lower the degree of the factor: if we plug A into λ − 1, the only matrix we get is

A − I = [ 0 1 ]   ≠   [ 0 0 ]
        [ 0 0 ]       [ 0 0 ]

and we still need to square it to get zero.
Previously, with the method shown for the nullity, we saw that only factors at eigenvalues contribute nullity. The minimal polynomial looks for the smallest powers that provide enough nullity, so of course the powers should be the rank-stable numbers, because the rank-stable number is the smallest power at which the nullity stops increasing.
Theorem 18. Suppose A is an n × n matrix and λ1, λ2, · · · , λk are all its distinct eigenvalues. Then the minimal polynomial of A is equal to
(λ − λ1)^{s1} (λ − λ2)^{s2} · · · (λ − λk)^{sk}
where each si is the rank-stable number of A − λi I.
Example 177. Suppose T : V −→ V is a linear transformation of a 15-dimensional space, its characteristic polynomial is (λ − 1)(λ − 2)^2 (λ − 3)^6 (λ − 4)^6, and suppose we know

dim(Im(T − 2I)) = 14
dim(Im(T − 3I)^3) = 11
dim(Im(T − 4I)^2) = 12

Find the minimal polynomial of T.
Answer: Finding the minimal polynomial is the same as finding the rank-stable numbers. Combined with the fact that the stable nullity of T − λ0 I is the algebraic multiplicity of λ0, we know that
dim(Im(T − 2I)^∞) = 15 − (algebraic multiplicity of 2) = 15 − 2 = 13
Similarly,
dim(Im(T − I)^∞) = 14
dim(Im(T − 3I)^∞) = 9
dim(Im(T − 4I)^∞) = 9
After plotting the power-rank curves, you find that the rank-stable number of T − I is 1, of T − 2I is 2, of T − 3I is 5, and of T − 4I is also 5. So the minimal polynomial is obtained just by putting each rank-stable number on the corresponding factor; the minimal polynomial of T is
(λ − 1)(λ − 2)^2 (λ − 3)^5 (λ − 4)^5
Example 178. Suppose A is a square matrix with minimal polynomial λ^2 − 3λ + 1. Express A^{−1} as a polynomial of A.
We have
A^2 − 3A + I = 0
We can write this as
(A − 3I)A + I = 0
Thus
(A − 3I)A = −I, so (3I − A)A = I
Then
A^{−1} = 3I − A
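A quick check on a hypothetical matrix with minimal polynomial λ^2 − 3λ + 1 (its companion matrix, our own choice):

```python
# Companion matrix of lambda^2 - 3*lambda + 1: trace 3, determinant 1.
A = [[0, -1],
     [1, 3]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# The claimed inverse 3I - A:
inv = [[3, 1],
       [-1, 0]]

I = [[1, 0], [0, 1]]
print(matmul(A, inv) == I, matmul(inv, A) == I)   # True True
```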
4.4. (Coming soon, not in final) Jordan Canonical Form.
4.5. (Coming soon, not in final) Root subspace and classification of invariant subspaces.
4.6. (Coming soon, not in final) Classification of linear transformations commuting with T.
The course notes end here; I'll continue writing.