Engineering Mathematics 1–Summer 2012
Linear Algebra
The commercial message. Linearity is a key concept in mathematics and its applications. Linear objects are
usually nice, smooth, clean, easy to handle. Nonlinear ones are slimy, temperamental, and full of dark corners in
which monsters can hide. The world, unfortunately, has a strong tendency toward non-linearity, but there is no
way that we can understand nonlinear phenomena without first having a good understanding of linear ones. In
fact, one of the most common first approaches to a non-linear problem may be to approximate it by a linear one.
1 Welcome to the world of linear algebra: Vector Spaces
Vector spaces, also known as linear spaces, come in two flavors, real and complex. The main difference between
them is what is meant by a scalar. When working with real vector spaces, a scalar is a real number. When working
with complex vector spaces, a scalar is a complex number. The important thing is not to mix the two flavors.
You either work exclusively with real vector spaces, or exclusively with complex ones. Well . . . ; nothing is that
definite. There are times when one wants to go from one flavor to the other; but that should be done with care.
One thing to remember, however, is that real numbers are also part of the complex universe; a real number is just
a complex number with zero imaginary part. So when working with complex vector spaces, real numbers are also
scalars because they are also complex numbers.
So what is a vector space? We could give a vague definition saying it is an environment that is linear algebra
friendly. But we need to be more precise. Linear operations are very basic ones, so as a better definition we can
say that a vector space is any set of objects in which we have defined two operations:
1. An addition, so that if we have any two objects in that set, we know what their sum is. And that sum should
be also an element of the set.
2. A scalar multiplication; it should be possible to multiply objects of the set by scalars to get (usually) new
objects in the set. If by scalars we mean real numbers, then we have a real vector space; if we mean complex
numbers, we have a complex vector space.
There is some freedom in how these operations are defined, but only a little bit. The operations should have the
usual properties we associate with sums and products. Before giving examples, we need to be a bit more precise.
In general, I will use lower case boldface letters to stand for vectors, a vector being simply an object in a
set that has been identified as a vector space, and regularly written lower case letters for scalars (real or complex
numbers). If V is a vector space, if v, w are elements of V (in other words, vectors), then their sum is denoted by
(written as) v + w, which also has to be in V .
If a is a scalar, v an element of V , then av denotes the scalar product of a times v. It should be an element of
V.
Summary of the information so far. By scalar we mean a complex or a real number. We should be aware that
if our scalars are real numbers, they should always be real; if complex, always complex. A vector (or linear) space
is any set of objects in which we have defined a sum and a scalar product satisfying some (soon to be specified)
properties. In this context, the elements of this set are called vectors. So we can say that a vector space is a bunch
of vectors closed under addition and scalar multiplication, and a vector is an element of a vector space.
Finally!, here are the basic properties these operations have to satisfy. For a set V of objects to deserve to be
called a vector space it is, in the first place necessary (as mentioned already several times) that if v, w ∈ V , then
we have defined v + w ∈ V , and if a is a scalar, v ∈ V , we have defined av ∈ V . These operations must satisfy:
1. (Associativity of the sum) If v, w, z ∈ V , then (v + w) + z = v + (w + z). This allows us to write the sum
of any number of objects (vectors) without using parentheses: If v, w, y, z ∈ V , it makes sense to write
v + w + y + z. One person may compute this sum by adding first v + w, then y + z, finally adding these
two together. Another person may first compute w + y, then add v in front to get v + (w + y), finally add
z to get (v + (w + y)) + z. The result is the same.
2. (Commutativity of the sum) If v, w ∈ V , then v + w = w + v.
3. (Existence of 0) There exists a unique element of V , usually denoted by 0, such that v + 0 = v, for all v ∈ V .
4. (Existence of an additive inverse) If v ∈ V , there exists a unique element −v ∈ V such that v + (−v) = 0.
This allows us to define subtraction; one writes v − w for v + (−w).
5. (Distributivity 1) If v, w ∈ V , a ∈ R, then a(v + w) = av + aw.
6. (Distributivity 2) If a, b ∈ R, and v ∈ V , then (a + b)v = av + bv.
7. (Associativity 2) If a, b ∈ R, v ∈ V , then a(bv) = (ab)v = b(av).
8. (One is one) 1v = v for all v ∈ V . (Here 1 is the scalar number 1.)
These properties have consequences. The main consequence is that one operates with vectors as with numbers,
as long as things make sense. So, for example, in general, one can’t multiply a vector by a vector (there will be
particular exceptions). But otherwise, what seems right is usually right. Here are some examples of what is true
in any vector space V . Any scalar times the 0 vector is the zero vector. In symbols: a0 = 0. It is not one of
the eight properties listed above, but it follows from them. It is also true that the scalar 0 times any vector is the
zero vector: 0v = 0. One good thing is that all the linear spaces we will consider will be quite concrete, and all
the properties of the operations quite evident. At least, so I hope. What this abstract part does is to provide a
common framework for all these concrete sets of objects.
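As a small illustration of how such facts follow from the eight properties, here is the argument for 0v = 0 (the scalar 0 times any vector v): by property 6, 0v = (0 + 0)v = 0v + 0v; adding the additive inverse −(0v) to both sides and using properties 1, 3 and 4 gives 0 = 0v. The argument for a0 = 0 is entirely similar, using property 5 instead.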
Here are a number of examples. It might be a good idea to check that in every case the 8 properties hold or,
at the very least, convince yourself that they hold.
• Example 1. The stupid vector space. (But, since we are scientists and engineers, we cannot use the
word stupid. We must be dignified. It is usually called the trivial vector space.) It is a silly example, but
it needs to be seen. It is the absolute simplest case of a linear space. The space has a single vector, the 0
vector. Addition is quite easy; all you need to know is that 0 + 0 = 0. So is scalar multiplication: If a is a
scalar, then a0 = 0. And −0 = 0.
The trivial vector space can be either real or complex. The next set of examples consist of real vector spaces.
• Example 2. The next vector space, just one degree above the previous one in complexity, is the set R of real
numbers. Here the real numbers are forced to play a double role, have something like a double personality:
On the one hand they are the numbers they always are, the scalars. But they also are the vectors. If a, b ∈ R
(I write them in boldface to emphasize that now they are acting as vectors), one defines a + b as usual. And
if a ∈ R and c is a scalar, then ca is just the usual product ca of c times a, but now interpreted as a vector.
• Example 3. Our main examples of real vector spaces are the spaces known as Rn , where n is a positive
integer. We already saw R1 ; it is just R. Now we will meet the next one: the space R2 consists of all pairs of
real numbers; in symbols:
R2 = {(a, b) : a, b are real numbers}.
As we all know, we can identify the elements of R2 with points in the plane; once we fix a system of cartesian
coordinates, we identify the pair (a, b) with the point of cartesian coordinates (a, b). Operations are defined
in a more or less expected way:
(a, b) + (c, d) = (a + c, b + d),                (1)
a(c, d) = (ac, ad).                              (2)
Verifying associativity, commutativity, and distributivity, reduces to the usual associative, commutative, and
distributive properties of the operations for real numbers. The 0 element is 0 = (0, 0); obviously
(a, b) + 0 = (a, b) + (0, 0) = (a + 0, b + 0) = (a, b).
The additive inverse is also easy to identify, −(a, b) = (−a, −b):
(a, b) + (−a, −b) = (a + (−a), b + (−b)) = (0, 0) = 0.
These operations have geometric interpretations. I said that R2 can be identified with points of the plane.
But another interpretation is to think of the elements of R2 as being vectors, we represent a pair (a, b) as an
arrow beginning at (0, 0) and ending at the point of coordinates (a, b). Then we can add the vectors by the
parallelogram rule: Complete the parallelogram having two sides determined by the vectors you want to add.
The diagonal of the parallelogram from the origin is the sum of the vectors. The picture shows how to add
graphically a = (5, 2) and b = (3, 7) to get a + b = (8, 9).
In many applications one needs free vectors, vectors that do not have their origin at the origin. We can then
think of the pair (a, b) as an arrow that can start at any point we wish, but once we fix the starting point,
the end-point, the tip of the arrow, is a units in the x-direction, b-units in the y-direction, from the starting
point. This gives us an alternative way of adding vectors: To add a and b, place the origin of b at the end
of a. Then a + b is the vector starting where a starts, ending where b ends.
Here is a picture of the sum of the same two vectors as before done by placing the beginning of one at the
end of the other. In black it is a + b, in red b + a. The picture shows that the end result is the same.
It is also easy to see what b − a should be graphically. It should be a vector such that when it follows a we
get b. In the parallelogram construction, it is the other diagonal of the parallelogram.
What happens if a and b are parallel? Drawing a parallelogram can be a bit of a problem, but following one
vector by the other is no problem. For example if a = (a1 , a2 ) and b = (−a1 , −a2 ), then a, b have the same
length and point in exactly opposite directions. If you place b starting where a ends, you cancel out the
effect of a. The sum is a vector starting and ending at the same point, the 0 vector. Of course, analytically,
a + b = (a1 − a1 , a2 − a2 ) = (0, 0) = 0.
The next picture is a graphic illustration of the associative property of the sum of vectors.
There is also a graphic interpretation of scalar multiplication. Thinking of these vectors as arrows, one
frequently reads that a vector is an object that has a magnitude and a sense of direction. So does a weather
vane and a lot of animals; I mention this to point out how vague this definition is. But it is a useful way
of thinking about vectors. A vector is a magnitude or length, pointing in some direction. Multiplying by a
positive scalar (number) keeps the direction the same, but multiplies the length of the vector by the scalar.
If the scalar is negative, the length gets multiplied by the absolute value of the scalar, and the vector gets
turned around so it points in the exact opposite direction from where it was pointing before. If the scalar
is 0, you get the zero vector. Incidentally, the magnitude or length of the vector of components (a, b) is
|(a, b)| = √(a2 + b2 ) (as Pythagoras has decreed!)
It is good to keep the graphic interpretation in mind for the applications, but when it comes to doing
computations seeing vectors in the plane as simply pairs of numbers, as R2 , makes for a better, more precise,
more efficient way of proceeding.
• Example 3, continued. The space R3 . This is the space of all triples of real numbers; in symbols:
R3 = {(a, b, c) : a, b, c are real numbers}.
It is similar to R2 , just one additional component. The operations are
(a, b, c) + (d, e, f ) = (a + d, b + e, c + f ),                (3)
r(a, b, c) = (ra, rb, rc).                                      (4)
The 0 element is 0 = (0, 0, 0); obviously
(a, b, c) + 0 = (a, b, c) + (0, 0, 0) = (a + 0, b + 0, c + 0) = (a, b, c).
The additive inverse is also easy to identify, −(a, b, c) = (−a, −b, −c):
(a, b, c) + (−a, −b, −c) = (a + (−a), b + (−b), c + (−c)) = (0, 0, 0) = 0.
We can think of these vectors as points in 3-space (after a system of cartesian coordinates has been set up)
or as free “vectors” in 3-space. In this second interpretation, v = (a, b, c) is an “arrow” that we can start
from any point we wish, as long as the end-point is a units in the x-direction, b in the y-direction, c in the
z-direction, away from its beginning. To add two vectors graphically we can still follow one vector by the
other one. Or we can start them both from the same point. If they are not parallel they will determine a
plane, and on this plane we can use the rule of the parallelogram to add them. If they are parallel, work on
any of the infinite number of planes that contain the two vectors. The length or magnitude of v = (a, b, c) is
again given by Pythagoras: |v| = √(a2 + b2 + c2 ).
• Example 4. Why stop at 3? If n is a positive integer, we denote by Rn the set of all n-tuples of real
numbers. A typical element of Rn will be denoted by a = (a1 , . . . , an ). With a bit of imagination we might
be able to imagine spaces of any number of dimensions; then the elements of Rn can be thought of as points
in an n-dimensional space. Isn’t this exciting? The operations are
(a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , a2 + b2 , . . . , an + bn ),                (5)
c(a1 , . . . , an ) = (ca1 , ca2 , . . . , can ).                                                (6)
The 0 element is 0 = (0, . . . , 0) (n positions); obviously, if a = (a1 , . . . , an ), then
a + 0 = (a1 , . . . , an ) + (0, . . . , 0) = (a1 , . . . , an ) = a.
The additive inverse is also easy to identify, −(a1 , . . . , an ) = (−a1 , . . . , −an ).
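As a quick computational illustration of this example, numpy arrays add and scale exactly componentwise, as in equations (5) and (6); the sketch below (with arbitrarily chosen vectors, not taken from the notes) also spot-checks two of the eight properties.

import numpy as np

# two arbitrary vectors of R^4 and a real scalar (illustrative values only)
a = np.array([1.0, -3.0, 5.0, 0.0])
b = np.array([2.0, 2.0, -1.0, 4.0])
r = 2.0

print(a + b)    # componentwise sum, as in (5)
print(r * a)    # componentwise scalar multiple, as in (6)

# spot-check commutativity (property 2) and distributivity (property 5)
print(np.allclose(a + b, b + a))
print(np.allclose(r * (a + b), r * a + r * b))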
• Example 5. What if we try something a bit more exotic? Suppose we again take R2 , pairs of real numbers,
but define (out of perversity)
(a, b) + (c, d) = (ac, bd),
r(a, b) = (ra, rb).
Well, it won’t work. Commutativity and associativity still hold. There even is something acting like a zero
element, namely (1, 1). In this crazy definition, (a, b) + (1, 1) = (a · 1, b · 1) = (a, b). Most elements even have
“additive” inverses; in this definition (a, b) + (1/a, 1/b) = (a(1/a), b(1/b)) = (1, 1), which is the zero element.
But “most” is not enough! It has to be ALL. And any pair of which at least one component is 0 has no
“additive” inverse. For example, (3, 0) + (c, d) in this strange bad addition works out to (3c, 0) and can never
be (1, 1), no matter what (c, d) is. This is not a vector space.
• Example 6. Matrices. A matrix is a rectangular array of numbers. In this arrangement, the horizontal
levels are called rows, each vertical grouping is a column. If the matrix has m rows and n columns, we say it
is an m × n matrix. Here is a picture copied (stolen?) from Wikipedia, illustrating some basics
The entries of a matrix are usually denoted by double subindices; the first subindex denotes the row, the
second the column in which the entry has been placed. The Wikipedia matrix seems to have no end to the
right or downwards. If I were to present an example of an abstract m × n matrix (as I am about to do), I’d
write it up as follows:

    [ a11  a12  a13  · · ·  a1n ]
    [ a21  a22  a23  · · ·  a2n ]
    [  ..   ..   ..          ..  ]
    [ am1  am2  am3  · · ·  amn ]
One surrounds the array of numbers with parentheses (or square brackets), so as to make sure they stay in
place. I also dispense with commas between the subindices in the abstract notation; I write a12 rather than
a1,2 for example. Wherever this could cause confusion, I would add commas. For example in the unlikely
event that we have to deal with a matrix in which m or n (or both) are greater than 10, talking of an element
a123 is confusing. It could be a1,23 or a12,3 . Commas are indicated.
We make a vector space out of m × n matrices, which I’ll denote by Mm,n , by defining addition and scalar
multiplication in what could be said to be the obvious way, elementwise:

    [ a11  a12  · · ·  a1n ]   [ b11  b12  · · ·  b1n ]   [ a11 + b11  a12 + b12  · · ·  a1n + b1n ]
    [ a21  a22  · · ·  a2n ] + [ b21  b22  · · ·  b2n ] = [ a21 + b21  a22 + b22  · · ·  a2n + b2n ]
    [  ..   ..          ..  ]   [  ..   ..          ..  ]   [     ..         ..                ..      ]
    [ am1  am2  · · ·  amn ]   [ bm1  bm2  · · ·  bmn ]   [ am1 + bm1  am2 + bm2  · · ·  amn + bmn ]

for the sum, and

      [ a11  a12  · · ·  a1n ]   [ ca11  ca12  · · ·  ca1n ]
    c [ a21  a22  · · ·  a2n ] = [ ca21  ca22  · · ·  ca2n ]
      [  ..   ..          ..  ]   [   ..    ..           ..   ]
      [ am1  am2  · · ·  amn ]   [ cam1  cam2  · · ·  camn ]
for the product by the scalar c. The same definitions, written in a more compact way, are: If A = (aij ) and
B = (bij ) are two m × n matrices, then
A + B = (aij + bij ),
cA = (cai,j ).
It is quite easy to check that with these operations Mm,n is a vector space. The 0 vector in this space is the
zero matrix, the matrix all of whose entries are 0. The additive inverse of (aij ) is (−aij ).
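In numpy the operations of Mm,n are again just + and scalar *, applied entrywise; a small sketch (the two 2 × 3 matrices happen to be the ones used again in Section 2):

import numpy as np

A = np.array([[1, 2, 3],
              [0, -2, 4]])
B = np.array([[0, 6, 7],
              [2, 0, -1]])

print(A + B)          # the matrix (aij + bij)
print(3 * A)          # the matrix (3 aij)
print(A + (-1) * A)   # the zero matrix, i.e. A plus its additive inverse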
Just to make sure we are on the same page, here are a few examples. These could be typical beginning linear
algebra exercises.
1. Let

       A = [  1  −3   5 ]        B = [ −5   2   4 ]        C = [  0   1   2 ]
           [  2   0   1 ]            [  6   7   8 ]            [ −6   8  11 ]
           [ −7   7   4 ] ,          [ −1  −1   0 ] ,          [  2   4   3 ] .
           [  0   5   0 ]            [  4   4   4 ]            [  0   0   1 ]

   Evaluate 2A − 3B + 5C.
   Solution.

       2A − 3B + 5C = [  17  −7   8 ]
                      [ −46  19  33 ]
                      [  −1  37  23 ]
                      [ −12  −2  −7 ]

2. Let

       A = [ 1  2   0  2 ]        and let B = [ −3  1  4  0 ] .
           [ 3  0  −4  2 ]                    [  1  1  2  2 ]

   Solve the matrix equation A + 5X = B.
   Solution.

       X = (1/5)(B − A) = [ −4/5  −1/5  4/5  −2/5 ]
                          [ −2/5   1/5  6/5     0 ]

3. Evaluate A + B if

       A = [ 0  1 ]        and B = [ 0  0  0 ] .
           [ 1  0 ]                [ 0  0  0 ]

Solution. IT CANNOT BE DONE! IMPOSSIBLE! These matrices belong to different
worlds, and different worlds cannot collide. Later on, we’ll see that these two matrices can be multiplied
(once we define matrix multiplication); more precisely, AB makes sense, while BA doesn’t. But for now
let’s keep in mind that matrices of different types canNOT be added (or subtracted).
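For what it is worth, numpy reproduces these conclusions; the sketch below checks exercise 2 and shows exercise 3 failing (the arrays transcribe the matrices given above).

import numpy as np

# Exercise 2: A + 5X = B, so X = (1/5)(B - A)
A = np.array([[1, 2, 0, 2],
              [3, 0, -4, 2]])
B = np.array([[-3, 1, 4, 0],
              [1, 1, 2, 2]])
X = (B - A) / 5
print(X)
print(np.allclose(A + 5 * X, B))   # True

# Exercise 3: matrices of different types cannot be added
A3 = np.array([[0, 1],
               [1, 0]])            # 2 x 2
B3 = np.zeros((2, 3))              # 2 x 3
try:
    A3 + B3
except ValueError as err:
    print("cannot be done:", err)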
• Example 7. Functions. I’ll just consider one case. Let I be an interval (open, closed, bounded, or not)
in the real line and let V be the set of all (real-valued) functions of domain I. If f, g are functions we define
the function f + g as the function whose value at x ∈ I is the sum of the values of f at x and of g at x; in
symbols
(f + g)(x) = f (x) + g(x).
We define the scalar product cf (if c is a real number, f a function on I) as the function whose value at x ∈ I
is c times the value of f at x; in symbols,
(cf )(x) = cf (x).
It is easy to see that we have a vector space in which 0 is the constant function that is identically 0, and if
f ∈ V , then −f is the function whose value at every x ∈ I is −f (x).
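A small Python sketch of this last example, with functions on an interval treated as the vectors; the helper names add, scale and zero below are my own, not notation from the notes.

import math

def add(f, g):
    return lambda x: f(x) + g(x)     # (f + g)(x) = f(x) + g(x)

def scale(c, f):
    return lambda x: c * f(x)        # (cf)(x) = c f(x)

zero = lambda x: 0.0                 # the 0 vector: the function identically 0

h = add(scale(2.0, math.sin), math.exp)      # the function 2 sin + exp
print(h(1.0))                                 # value at x = 1
print(2.0 * math.sin(1.0) + math.exp(1.0))    # the same number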
Every one of the examples has its complex counterpart; all we need to do is allow complex numbers as our
scalars. Here are examples 2, 3, 4, 6, and 7, redone as complex vector spaces. In each case the real vector space is
a subset of the complex one.
• Examples 2’, 3’, 4’. The vector space Cn consists of all n-tuples of complex numbers. The operations are
defined by
(a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , a2 + b2 , . . . , an + bn ),                (7)
c(a1 , . . . , an ) = (ca1 , ca2 , . . . , can ).                                                (8)
The 0 element is 0 = (0, . . . , 0) (n positions); obviously, if a = (a1 , . . . , an ), then
a + 0 = (a1 , . . . , an ) + (0, . . . , 0) = (a1 , . . . , an ) = a.
The additive inverse is, as in the real case, −(a1 , . . . , an ) = (−a1 , . . . , −an ).
• Example 6’. A complex matrix is a rectangular array of complex numbers. We will denote the set of all
m×n complex matrices by Mm,n (C). Addition and scalar multiplication are defined as in the real case, except
that we allow complex scalars. As an example (one of a zillion trillion possible ones), let A, B ∈ M3,2 (C) be
defined by

    A = [ 1 + i      √3 ]        B = [ −5  −8 ]
        [ 2 − 5i      0 ] ,          [ −2   1 ] ,
        [ 0           1 ]            [  1   0 ]

then (see if you get the same result!)

    √3 A − iB = [ √3 + (5 + √3)i      3 + 8i ]
                [ 2√3 + (2 − 5√3)i    −i     ]
                [ −i                  √3     ]
• Example 7’. Let I be an interval in the real line and consider all complex valued functions on I. In other
words, we consider all functions of the form f (t) = u(t) + iv(t), where u(t), v(t) are real valued functions of
domain I, and i = √−1. If f (t) = u1 (t) + iv1 (t) and g(t) = u2 (t) + iv2 (t), where u1 , u2 , v1 , v2 are real valued
functions of domain I, we define f + g in the natural way: (f + g)(t) = u1 (t) + u2 (t) + i(v1 (t) + v2 (t)). If
f (t) = u(t) + iv(t) and c = a + ib is a complex number (a scalar), where a, b are real numbers and u, v are
real valued functions of domain I, we define cf to be the function

    (cf )(t) = c(f (t)) = (a + ib)(u(t) + iv(t)) = au(t) − bv(t) + i(av(t) + bu(t)).
We have again a vector space, a complex one this time.
1.1 Exercises
1. Which of the following sets V, with the operations as defined, is a (real) vector space? In each case either
verify all properties (and this includes identifying the zero element), or determine at least one property that
fails to hold.
(a) V is the set of all upper triangular n × n matrices, with addition and scalar product defined in the usual
way. A square matrix is upper triangular if and only if all entries beneath the main diagonal are 0.
(b) V = {(x, y, z) ∈ R3 : 2x − 3y + 4z = 1}, with the same operations as elements of R3 .
(c) V = {(x, y, z) ∈ R3 : 2x − 3y + 4z = 0}, with the same operations as elements of R3 .
(d) V = {(x, y) : x, y ∈ R, x > 0} with operations defined as follows:
(x, y) + (x′ , y ′ ) = (xx′ , y + y ′ ),        r(x, y) = (x^r , ry),
if (x, y), (x′ , y ′ ) ∈ V , and r is a real scalar.
(e) I = (a, b) is an interval and V is the set of all (real valued) differentiable functions of domain I. That
is, f ∈ V if and only if f (x) is defined for all x ∈ I and the derivative f ′ (x) exists at each x ∈ I. The
operations are defined as usual for functions; that is, if f, g are functions on I, then the function f + g
on I is the function whose value at any point x is f (x) + g(x); that is, (f + g)(x) = f (x) + g(x). And if
f is a function, c a scalar, then cf is defined by (cf )(x) = cf (x).
(f) V is the set of all polynomial (functions); a polynomial being an expression of the form
p(x) = an xn + · · · + a1 x + a0 ,
where a0 , . . . , an are real numbers. Operations (addition and scalar multiplication) as usual.
(g) V is the set of all polynomials of degree higher than 2, plus the 0 polynomial.
2. Let y (n) + an−1 (t)y (n−1) + · · · + a1 (t)y ′ + a0 (t)y = 0 be a linear homogeneous differential equation of order
n with coefficients a0 (t), . . . , an−1 (t) continuous functions in some open interval I of R. Show that the set
of solutions of this equation in I is a real vector space. Operations on solutions are defined as usual (see
Exercise 1e)
3. Show that the set of complex numbers is a real vector space. That is, if we forget about multiplying two complex
non-real numbers and only consider products of the form cz, where c is real and z is complex, then C is a
real vector space.
2 What came first, the chicken or the egg?
In his book “God and Golem, Inc.: A Comment on Certain Points Where Cybernetics Impinges on Religion,” Norbert
Wiener (a famous mathematician of the first half of the twentieth century, who among many other things coined the
word cybernetics from the Greek word for steersman) reexamines the age old precedence question. He concludes
that the question is meaningless, since both carry the same information. While we usually see an egg as the
device by which a chicken produces another chicken, Wiener argues that we may as well consider a chicken as the
means by which an egg produces another egg. If two objects have the same information, they are the same. The
point of this, when applied to vector spaces, is that SO FAR, the information carried by an m × n matrix is the
same as that carried by a row of mn numbers, or of mn numbers written out in any way we please.
Here is something one can do with m × n matrices. I can take an m × n matrix and write out the elements in
a row (or column) of mn elements in a variety of ways. For example, I can write out first the elements of the first
row, followed by those of the second row, and so forth. To illustrate I’ll use the case of m = 2, n = 3, but a similar
thing holds for all m, n. Suppose
    A = [ 1   2  3 ] ,        B = [ 0  6   7 ] ,
        [ 0  −2  4 ]              [ 2  0  −1 ]
and we want to compute 2A + 3B. Instead of doing it the usual way, we could proceed as follows. First we write
both A, B as rows of numbers by the procedure I mentioned above. We call these new objects A′ , B ′ :

    A′ = (1, 2, 3, 0, −2, 4),        B ′ = (0, 6, 7, 2, 0, −1).

A′ , B ′ are vectors in R6 . Operating on them by the rules of R6 we get

    2A′ + 3B ′ = (2, 22, 27, 6, −4, 5).
Now we can reverse the process that we used to get A′ , B ′ and get

    2A + 3B = [ 2  22  27 ]
              [ 6  −4   5 ]

The difference between A and A′ (and B and B ′ ) is just one of convenience. As vector spaces, Mm,n and Rmn are
identical. The same is true of Mm,n (C) and Cmn . Later on we’ll see that there is a good reason for writing numbers
out as arrays, rather than as rows or columns. And that frequently we want to write the elements of Rn or Cn in
the form of columns, rather than rows.
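In numpy the passage from A to A′ and back is a reshape; a sketch redoing the 2A + 3B computation above both ways:

import numpy as np

A = np.array([[1, 2, 3],
              [0, -2, 4]])
B = np.array([[0, 6, 7],
              [2, 0, -1]])

Ap = A.reshape(-1)          # A' = (1, 2, 3, 0, -2, 4)
Bp = B.reshape(-1)          # B' = (0, 6, 7, 2, 0, -1)
Cp = 2 * Ap + 3 * Bp        # computed in R^6: (2, 22, 27, 6, -4, 5)
print(Cp)
print(Cp.reshape(2, 3))     # back to a 2 x 3 matrix
print(2 * A + 3 * B)        # the same matrix computed directly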
3 Subspaces
Given a vector space, inside of it there are usually sets that are themselves vector spaces with the vector operations.
For example, in R3 consider the set
V = {(x, y, 0) : x, y real numbers}.
Is it a vector space? The only thing that could possibly go wrong is that the vector operations can take us out
of V . Or that some element of V might not have its additive inverse (the thing you get putting a minus sign in
front) in V . Or maybe there is no 0 in V . Or maybe scalar product takes us out of V . But none of these things
happens. 0 = (0, 0, 0) is in V ; it is the case x = y = 0. The description of V in words is: V is the set of all triples
of real numbers in which the last component is 0. Well, adding two such triples produces another such triple, since
0 + 0 = 0. Given such a triple, its additive inverse is of the same type, since −0 = 0. Multiplying by a scalar such
a triple, produces another one with third component 0, since c · 0 = 0 for all scalars. So yes, V is a vector space.
We say it is a subspace of R3 . Here is a precise definition:
Assume V is a vector space. A subset W of V is a subspace of V if (and only if)
1. 0 ∈ W .
2. If v, w ∈ W , then v + w ∈ W . (W is closed under addition.)
3. If v ∈ W and c is a scalar, then cv ∈ W . (W is closed under scalar multiplication.)
With these properties we can work in W without ever having to leave it (except if we want to leave it). We did
not ask for W to contain additive inverses of elements because we get that for free. In fact, if W is a subspace, if
v ∈ W , then −v = (−1)v; since −1 is a scalar (both in the real and complex case), (−1)v must be in W , meaning
−v ∈ W .
Here are more examples.
E1. Let
W = {(x, y, z, w) : x, y, z, w are real numbers and x + 2y − 3z + w = 0}
This is a subset of R4 . Is it a subspace of R4 ? To answer this question, we have to check the three properties
mentioned above. Is 0 ∈ W ? For R4 , 0 = (0, 0, 0, 0), so x = 0, y = 0, z = 0, w = 0 and, of course,
0 + 2 · 0 − 3 · 0 + 0 = 0. Yes, 0 ∈ W . Next, we have to check that if v, w ∈ W , then v + w ∈ W . For this
we assume v = (x, y, z, w), w = (x′ , y ′ , z ′ , w′ ) where x, y, z, w, x′ , y ′ , z ′ , w′ satisfy x + 2y − 3z + w = 0 and
x′ + 2y ′ − 3z ′ + w′ = 0. Then
v + w = (x + x′ , y + y ′ , z + z ′ , w + w′ )
and the question is whether (x + x′ ) + 2(y + y ′ ) − 3(z + z ′ ) + (w + w′ ) = 0. Using a bit of high school
mathematics we see that
(x + x′ ) + 2(y + y ′ ) − 3(z + z ′ ) + (w + w′ ) = x + 2y − 3z + w + x′ + 2y ′ − 3z ′ + w′ = 0 + 0 = 0.
We conclude that v + w ∈ W . Finally, suppose v ∈ W , say v = (x, y, z, w) where x + 2y − 3z + w = 0.
Suppose c is a scalar (a real number since we are working with a real vector space). Then cv = (cx, cy, cz, cw)
and
cx + 2cy − 3cz + cw = c(x + 2y − 3z + w) = c · 0 = 0.
It follows that cv ∈ W . The conclusion is that W is indeed a subspace of R4 .
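A numerical spot-check of the three conditions (only an illustration, not a substitute for the proof just given); the helper functions below are my own.

import numpy as np

def in_W(v, tol=1e-12):
    x, y, z, w = v                      # W = {(x, y, z, w) : x + 2y - 3z + w = 0}
    return abs(x + 2*y - 3*z + w) < tol

def random_element_of_W(rng):
    x, y, z = rng.standard_normal(3)
    return np.array([x, y, z, -x - 2*y + 3*z])   # choose w so the equation holds

rng = np.random.default_rng(0)
v = random_element_of_W(rng)
u = random_element_of_W(rng)

print(in_W(np.zeros(4)))     # 0 is in W
print(in_W(v + u))           # closed under addition
print(in_W(2.7 * v))         # closed under scalar multiplication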
E2 Since real numbers are just complex numbers with the imaginary part equal to 0, it is (or should be) clear
that for any integer n > 0, Rn is a subset of Cn . Is it a subspace? At first it seems to be. The zero element,
the n-tuple with all components 0, is in Rn . Adding two vectors of Rn results in a vector of Rn . But our
scalars are now complex numbers; multiplying a vector of Rn by a scalar will almost always take us out of
Rn . Here is an example with n = 3. Take, for example, (1, 2, 3) ∈ R3 . Take for scalar the imaginary unit i.
Then i(1, 2, 3) = (i, 2i, 3i) ∉ R3 .
Rn is a subset but not a subspace of Cn .
E3. (The largest and smallest subspaces.) Suppose V is a vector space. Then V satisfies all the properties of being
a subspace of V . Every vector space is a subspace of itself. V is, of course, the largest possible subspace of
V . For the smallest one, consider the set W = {0}, the set containing only the zero element of V . Since it
contains the zero element, it satisfies the first property of being a subspace. If v, w ∈ W , well that can only
be if both are the zero element and then so is their sum; the second property holds. And, since any scalar
times the zero element is the zero element, the third property holds also. This is necessarily the smallest
possible subspace, known as the trivial subspace.
E4. Think of R3 as points of three space. That is, by setting up a system of cartesian coordinates, we can identify
each vector of R3 with a point of space. One can then show that precisely the following subsets of R3 are
subspaces:
• The origin as set of a single point; the trivial subspace.
• All straight lines through the origin.
• All planes that go through the origin.
• R3 .
E5. If A is an m×n matrix, then the transpose of A, denoted by AT , is the n×m matrix whose rows are the columns
of A (and, therefore, its columns are the rows of A). The formal description is: If A = (aij )1≤i≤m,1≤j≤n ,
then AT = (bij )1≤i≤n,1≤j≤m with bij = aji for all i, j. Briefly: If A = (aij ), then AT = (aji ). For example, if


    A = [ 1        0          2 − 3i   −5 ]
        [ 1        2          3         4 ] ,
        [ −√5      −2 + 7i    8         9 ]

then

    AT = [ 1         1        −√5     ]
         [ 0         2        −2 + 7i ]
         [ 2 − 3i    3         8      ]
         [ −5        4         9      ]
It is an easy exercise to show that if A, B are m × n matrices, then (A + B)T = AT + B T ; if A is an m × n
matrix, then (cA)T = c AT for all scalars c.
If m = n; that is, when dealing with square matrices, it is possible to have AT = A. Matrices verifying this
property are called symmetric. Let n be a positive integer and let V = Mn,n or Mn,n (C) (either the real or
the complex space of all square n × n matrices). Let W = {A ∈ V : A = AT }. Then W is a subspace of V .
I leave the simple checking of the properties as an exercise.
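A quick numpy illustration of the transpose identities and of the closure properties behind this subspace (the matrices below are arbitrary examples):

import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])    # 3 x 2
B = np.array([[0, 1], [1, 0], [2, -1]])   # 3 x 2
c = 4

print(np.array_equal((A + B).T, A.T + B.T))   # (A + B)^T = A^T + B^T
print(np.array_equal((c * A).T, c * A.T))     # (cA)^T = c A^T

S = np.array([[1, 5], [5, 2]])    # two symmetric 2 x 2 matrices
T = np.array([[0, -3], [-3, 7]])
print(np.array_equal((S + T).T, S + T))       # the sum is again symmetric
print(np.array_equal((c * S).T, c * S))       # so is a scalar multiple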
E6. Let b be a real number. Let W = {(x, y) : x, y are real numbers and x + y = b}. There is exactly one value
of b for which W so defined is a subspace of R2 . Find that value and explain why it is the only value that
works.
E7. Let I be an interval in the real line and let V be the set of all real valued functions of domain I. As seen earlier,
this is a vector space with the usual operations. Let W be the set of all continuous functions of domain I.
Because the function that is identically 0 is continuous (about as continuous as can be), it is in W . Since it is
also the zero element of V , W contains the 0 element of V . In Calculus 1 we learn (and one hopes remember)
that the sum of continuous functions is continuous, and that when we multiply a continuous function by a
scalar (real number) the result is again continuous. We conclude W is a subspace of V .
A few concluding remarks (for now) about subspaces. I hope you realize that we don’t always have to call them
W . We could even call the vector space W and the subspace V . Or by any other convenient name; one should try
never to get hung up on notation.
Example E4 shows a typical fact; there is a certain level among subspaces. For example, we have lines and
planes; nothing in between. This is due to the fact that vector spaces have a dimension, as we will see in a while.
3.1 Exercises
1. Which of the following are subspaces of R3 ? Justify your answer.
(a) All vectors of the form (a, 0, 0).
(b) All vectors of the form (a, 1, 1).
(c) All vectors of the form (a, b, c), where a = b + c.
(d) All vectors of the form (a, b, c), where a = b + c + 1.
2. Which of the following are subspaces of M2,2 ? Justify.
(a) All 2 × 2 matrices with integer entries.
(b) All matrices

        [ a  b ]
        [ c  d ]

    where a + b + c + d = 0.
(c)
(d) All matrices

        [ a   b ]
        [ −b  a ] .
3. Let I = (a, b) be an interval in R and let V be the vector space of all real valued functions on I; as seen in
the notes, it is a vector space with addition and scalar multiplication defined as usual. Let c, d be points in I:
a < c < d < b. Determine which of the following are subspaces of V .
(a) The set of all continuous bounded functions on I; that is, the set of all continuous f on I such that there
exists some number M (depending on f ) such that |f (x)| ≤ M for all x ∈ I.
(b) The set of all continuous functions on I such that ∫_c^d f (x) dx = 0.
(c) The set of all continuous functions on I such that ∫_c^d f (x) dx = 1.
(d) The set of all continuous functions on I such that ∫_c^x f (t) dt = f (x) for all x ∈ I.
(e) All solutions of the differential equation

        an (t) d^n y/dt^n + an−1 (t) d^(n−1) y/dt^(n−1) + · · · + a1 (t) dy/dt + a0 (t) y = 0

    where a0 , a1 , . . . , an are continuous functions on I.
4 More on matrices
As we saw, the set of m × n matrices is a vector space; real if we restrict ourselves to real entries; complex if we
allow complex numbers. But there is more to matrices than just being a vector space. Matrices can be multiplied.
Well, not always; sometimes. We can form the product AB of the matrix A times the matrix B if and
only if the number of columns of A equals the number of rows of B. Here is the basic definition. It is
best done using the summation symbol Σ.
Suppose A is an m × n matrix and B is an n × p matrix, say
A = (aij )1≤i≤m,1≤j≤n ,
B = (bjk )1≤j≤n,1≤k≤p .
Then AB = (cik )1≤i≤m,1≤k≤p , where

    cik = Σ_{j=1}^{n} aij bjk = ai1 b1k + ai2 b2k + · · · + ain bnk .
In words: The element of AB in position (i, k) is obtained as follows. Only the i-th row of A and the k-th column
of B are involved. We multiply the first component of the i-th row of A by the first component of the k-th column of
B, add to this the product of the second component or entry of the i-th row of A by the second one of the k-th column
of B, add to this the product of the third entry of the i-th row of A by the third entry of the k-th column of B, and so
forth. Since each row of A has n components, and each column of B has n components, it all works out, and we
end by adding the product of the last entries of the i-th row of A and k-th column of B.
Here are a number of examples and exercises. Please, verify that all examples are correct!
• Example.

      [ 1  2  3 ] [  1  2  4  0 ]   [ −4   4  27   8 ]
      [ 4  5  6 ] [ −1  1  1  1 ] = [ −7  13  63  17 ]
                  [ −1  0  7  2 ]

• Example. Let

      A = ( 1  −3  4  0 ) ,        B = [ −3 ]
                                       [  4 ]
                                       [  5 ] .
                                       [  7 ]

  That is, A is 1 × 4, B is 4 × 1. Then AB will be 1 × 1. We identify a 1 × 1 matrix with its single entry; that
  is, we don’t enclose it in parentheses.

      AB = 1(−3) + (−3)(4) + (4)(5) + (0)(7) = 5,

      BA = [ −3    9  −12  0 ]
           [  4  −12   16  0 ]
           [  5  −15   20  0 ] .
           [  7  −21   28  0 ]
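The products above are easy to check with numpy's @ operator; a sketch transcribing the two examples:

import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])
N = np.array([[1, 2, 4, 0],
              [-1, 1, 1, 1],
              [-1, 0, 7, 2]])
print(M @ N)                           # the 2 x 4 product of the first example

A = np.array([[1, -3, 4, 0]])          # 1 x 4
B = np.array([[-3], [4], [5], [7]])    # 4 x 1
print(A @ B)                           # [[5]], the 1 x 1 matrix identified with 5
print(B @ A)                           # the 4 x 4 matrix BA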
• Exercise. Let A be a 7 × 8 matrix and B an 8 × 4 matrix. Suppose all entries in the third row of A are zero.
Explain why all entries in the third row of AB will be 0.
Matrix multiplication behaves a lot like the ordinary product of numbers; that is, WITHIN REASON! and with one important
exception (commutativity, discussed below): given an equation involving matrices, if it is true when one substitutes numbers for the matrices, then it
will usually also be true for matrices. Specifically, the following properties hold.
• (Associativity of the product) Briefly (AB)C = A(BC). But the products have to make sense! In a more
detailed way: If A is an m × n matrix, B is n × p and C is p × q, then (AB)C = A(BC). In this case AB is
an m × p matrix and it can be multiplied by C, which is p × q, to produce an m × q matrix (AB)C. On the
other hand, BC is an n × q matrix; we can multiply on the left by A to get an m × q matrix A(BC). These
two m × q matrices are one and the same. This property is not hard to verify, but it can get messy.
• (Distributivity) Briefly written as

    A(B + C) = AB + AC,        (A + B)C = AC + BC.
Once again, all operations must make sense. For the first equality, B, C must be of the same type, say n × p.
If A is m × n, then A(B + C), AB, AC, AB + AC, are all defined; the property now makes sense. The second
equality assumes implicitly that A, B are of type m × n, and C of type n × p. This property is easy to verify.
• If c is a scalar, A of type m × n, B of type n × p, then
A(cB) = cAB = (cA)B.
Very easy to verify.
Is it true that AB = BA? This question doesn’t even make sense in most cases. For example if A is 3 × 5 and
B is 5 × 2, then AB is defined (it is 3 × 2), but BA is not. Even if both AB and BA are defined, the answer is
obviously no. For example if A is 3 × 4 and B is 4 × 3, then both AB and BA are defined, but AB is 3 × 3 and
BA is 4 × 4; they are most definitely not equal.
The question becomes more interesting for square matrices; if A, B are square matrices of the same type, say
both are n × n, then AB, BA are both defined, both n × n and potentially equal. But usually they are not. Here
are examples:
1.
      [ 1  2 ] [ 5  6 ]   [ 19  22 ]
      [ 3  4 ] [ 7  8 ] = [ 43  50 ]

      [ 5  6 ] [ 1  2 ]   [ 23  34 ]
      [ 7  8 ] [ 3  4 ] = [ 31  46 ]

2.
      [  1  0  −1 ] [  0  3   4 ]   [  2   1    4 ]
      [  0  3   4 ] [  1  0  −1 ] = [ −5   8   −3 ]
      [ −2  2   0 ] [ −2  2   0 ]   [  2  −6  −10 ]

      [  0  3   4 ] [  1  0  −1 ]   [ −8  17  12 ]
      [  1  0  −1 ] [  0  3   4 ] = [  3  −2  −1 ]
      [ −2  2   0 ] [ −2  2   0 ]   [ −2   6  10 ]

3.
      [  1  −1   2 ] [  2  −3   3 ]   [  11  −11   10 ]   [  2  −3   3 ] [  1  −1   2 ]
      [ −2   1  −1 ] [ −3   2  −3 ] = [ −10   11  −11 ] = [ −3   2  −3 ] [ −2   1  −1 ]
      [  1  −2   1 ] [  3  −3   2 ]   [  11  −10   11 ]   [  3  −3   2 ] [  1  −2   1 ]
In general AB ≠ BA but, as the third example shows, there are exceptions. For example, every square matrix
naturally commutes with itself: If A = B, then AB = AA = BA. If A is a square matrix, we write A2 for AA, A3
for AAA = AA2 = A2 A, etc. There are also square matrices that commute with every square matrix. The two
most notable examples are the zero matrix and the identity matrix (about to be defined). Suppose 0 is the n × n
zero matrix. Then, for every square n × n matrix A we have A0 = 0 = 0A; the square zero matrix commutes with
all square matrices.
The identity matrix is usually denoted by I, or by In if its type is to be emphasized. It is the square matrix
having ones in the main diagonal, all other entries equal to 0. If n = 2, then

    I = I2 = [ 1  0 ] ;
             [ 0  1 ]

if n = 3, then

    I = I3 = [ 1  0  0 ]
             [ 0  1  0 ] ;
             [ 0  0  1 ]

etc. The general definition is I = (δij )1≤i,j≤n , where

    δij = 0  if i ≠ j,        δij = 1  if i = j.
It is called the identity matrix because as is immediately verified, if A is any square matrix of the same type as I,
then IA = A = AI. Please, verify this on your own; convince yourself it holds and try to understand why it holds.
In particular, I commutes with all square matrices of its same type. That is, In commutes with all n × n matrices.
But there is a bit more. Let A be an m × n matrix. Then
Im A = A = AIn .
(Verify this property.)
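numpy will happily confirm this for a particular case (a sketch; the 2 × 3 matrix is an arbitrary example):

import numpy as np

A = np.array([[1, -3, 4],
              [0, 2, 7]])     # a 2 x 3 matrix, so m = 2 and n = 3
I2 = np.eye(2)
I3 = np.eye(3)
print(np.array_equal(I2 @ A, A))   # I_m A = A
print(np.array_equal(A @ I3, A))   # A I_n = A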
Exercise. Let M be a square n × n matrix. Show that M A = AM for all n × n matrices A if and only if M = cI,
where c is a scalar and I = In is the n × n identity matrix. (If c = 0, then M = 0, if c = 1 then M = I. In general
M would have to be a matrix having all off-diagonal entries equal to 0, all diagonal entries the same.) As a hint,
showing that if M = cI, then M A = AM for all n × n matrices should be very easy. For the converse, experiment
with different matrices A. For example, what does the equation M A = AM tell you assuming that A has all entries
but one equal to zero? For example, if A = (aij ) with aij = 0 whenever (i, j) ≠ (1, 2) and a1,2 = 1, and if M = (mij ), then one can
see that the first row of AM is equal to the second row of M , while all other rows have only zero entries. On the
other hand, in M A one sees that the second column of M A is the first column of M , all other columns are zero
columns:





    [ 0  1  0  · · ·  0 ] [ m11  m12  m13  · · ·  m1n ]   [ m21  m22  m23  · · ·  m2n ]
    [ 0  0  0  · · ·  0 ] [ m21  m22  m23  · · ·  m2n ]   [   0    0    0  · · ·    0 ]
    [ ..  ..  ..      .. ] [  ..    ..    ..          ..  ] = [  ..    ..   ..          ..  ] ,
    [ 0  0  0  · · ·  0 ] [ mn1  mn2  mn3  · · ·  mnn ]   [   0    0    0  · · ·    0 ]

    [ m11  m12  m13  · · ·  m1n ] [ 0  1  0  · · ·  0 ]   [ 0  m11  0  · · ·  0 ]
    [ m21  m22  m23  · · ·  m2n ] [ 0  0  0  · · ·  0 ]   [ 0  m21  0  · · ·  0 ]
    [  ..    ..    ..          ..  ] [ ..  ..  ..      .. ] = [ ..   ..   ..       .. ]
    [ mn1  mn2  mn3  · · ·  mnn ] [ 0  0  0  · · ·  0 ]   [ 0  mn1  0  · · ·  0 ]
Because M is supposed to commute with all matrices, it will commute with this selected one. Equating the two
product results, we see that m21 , m23 , . . . , m2n , m31 , m41 , . . . , mn1 must all be 0, while m11 = m22 . We are well on
our way.
From now on, instead of writing Mn,n I will write simply Mn . Thus Mn is the set of all n × n matrices with
real entries and Mn (C) is the set of all n × n matrices with complex entries. The space Mn (as well as Mn (C))
has some very nice properties. Under addition and scalar multiplication it is a vector space; but that is true of all
sets of matrices of the same type. But it is also closed under matrix multiplication: If A, B ∈ Mn then AB, BA
are defined and in Mn . The same holds for Mn (C). So in Mn (and in Mn (C)) we can add and multiply any two
matrices and never leave the set. We can, of course, also subtract; A − B is the same as A + (−1)B. Can we also
divide?
This is a very natural question; if A, B ∈ Mn (C), what, if anything, should A/B be? We could say that it
should be a square n × n matrix such that when multiplied by B we get A back. Here we run into a first problem;
multiplication not being commutative. Should we ask for (A/B)B = A, for B(A/B) = A, or for both? As it turns
out this approach has more problems than one might think at first. Here is a very simple example. Suppose
    A = 0 = [ 0  0 ] ,        B = [ 0  1 ] .
            [ 0  0 ]              [ 0  0 ]
One could say that obviously A/B = 0/B = 0. In fact if you multiply 0 by B (either on the right or on the left),
you get A = 0. But here is a quaint fact. Notice that B 2 = 0, the 2 × 2 zero matrix. Multiplying B by B itself
from the left or from the right will also result in A. Should 0/B = B?
The problem, in general, is that given square matrices A, B there could be more than one matrix that can act
like A/B; more than one matrix C such that CB = BC = A. Or there could be none. For example, if in the
previous example we replace A by the identity matrix I, leaving B as it is, then I/B is undefined; there is no
matrix C such that CB = BC = I. Can you prove this? Due to this one does things in a different way. We have
an identity I; we could start trying to define I/B and then define A/B as either A(I/B) or (I/B)A. Well this also
has its problems, but they are solvable.
Problem 1. There are many matrices B ∈ Mn (C) for which I/B won’t make sense. That is, matrices B for which
there is no C such that CB = BC = I.
Solution. Define I/B only for the matrices for which it makes sense.
Problem 2. Even if I/B makes sense, there is no guarantee that for any given matrix A we will have (I/B)A =
A(I/B). So what should A/B be?
Solution. Forget about A/B, just think of dividing by B on the left and dividing by B on the right.
Let’s begin to be precise. Let A ∈ Mn (C). We say A is invertible if (and only if) there exists B ∈ Mn (C) such
that AB = I = BA. In this case one can show that B is unique and one denotes it by A−1 . That “B is unique”
means: “for a given A there may or there may not exist such a B, but there cannot exist more than one such
matrix.” This uniqueness is actually very easy to prove, and here is a proof. Those of you who hate proofs and
prefer to accept the instructor’s word for everything, please ignore it.
Proof that there can only be one matrix deserving to be called A−1 . Suppose there is more than one
such matrix, so there are at least 2 matrices, call them B, C, such that
AB = BA = I
and
AC = CA = I.
Then
B = BI = B(AC) = (BA)C = IC = C.
End of proof.
What is however amazing, and harder to prove, is that if A, C ∈ Mn (C), and AC = I, then this suffices to
make A invertible and C = A−1 . That is, in this world of matrices where commutativity is so rare, it suffices to
have AC = I to conclude that we also have CA = I and C = A−1 . So in verifying that a matrix C is the inverse
of A, we don’t have to check that both CA and AC are the identity. If AC = I, then we are done; CA will also be
I and C = A−1 . Of course, the same is true if CA = I; then AC = I and C = A−1 .
Deciding whether a general n × n matrix is invertible and, if it is, finding its inverse is actually a matter of
looking at n2 equations, and solving them, or showing they can’t be solved. If we were to try to do this at our
current level of knowledge, it would be a boring, difficult, messy exercise. We will develop methods to do this
efficiently. However, to give you an idea of what can be involved if one attacks the problem without any more tools
at hand, let us try to decide if the matrix


    A = [ 1  2  2 ]
        [ 2  3  1 ]
        [ 1  0  1 ]

is invertible, and find its inverse. We are asking, in other words, whether there is a 3 × 3 matrix B such that

    [ 1  2  2 ]
    [ 2  3  1 ] B = I.
    [ 1  0  1 ]
It is enough if there is a matrix B that works on the right; as mentioned, it will also work from the left. Well, B
will look like

    B = [ r  s  t ]
        [ u  v  w ]
        [ x  y  z ]

and the question becomes can we find r, s, t, u, v, w, x, y, z solving

    [ 1  2  2 ] [ r  s  t ]   [ 1  0  0 ]
    [ 2  3  1 ] [ u  v  w ] = [ 0  1  0 ] .
    [ 1  0  1 ] [ x  y  z ]   [ 0  0  1 ]
If we can find such r, s, etc., then we have our inverse. If there is some reason why this is impossible, then there
won’t exist an inverse. In the last matrix we can perform the product on the left and the question now becomes:
Can we find r, s, t, u, v, w, x, y, z solving

    [ r + 2u + 2x   s + 2v + 2y   t + 2w + 2z ]   [ 1  0  0 ]
    [ 2r + 3u + x   2s + 3v + y   2t + 3w + z ] = [ 0  1  0 ] .
    [ r + x         s + y         t + z       ]   [ 0  0  1 ]

This is equivalent to solving 9 = 3^2 equations for r, s, t, u, v, w, x, y, z:

    r + 2u + 2x = 1
    s + 2v + 2y = 0
    t + 2w + 2z = 0
    2r + 3u + x = 0
    2s + 3v + y = 1
    2t + 3w + z = 0
    r + x = 0
    s + y = 0
    t + z = 1
These equations are not as hard as one might think; still solving the system, or verifying that there is no solution,
is work. But let’s do it!. Maybe you want to do it on your own; it might make you appreciate more the methods
we’ll develop later on. So solve the system, then come back to compare your solution with mine.
From the equation r + x = 0, we get x = −r. Using this in the equation 2r + 3u + x = 0 and solving for u, we
get u = −r/3. Using these values for x, u in the first equation we get r = −3/5. We now have our possible first
column.
    r = −3/5,        u = 1/5,        x = 3/5.
Similarly we find the other columns. From s + y = 0, we get y = −s. From 2s + 3v + y = 1 we get v = (1 − s)/3;
from s + 2v + 2y = 0 we get s = 2/5 so that
    s = 2/5,        v = 1/5,        y = −2/5.
Finally, from the equations t + z = 1, 2t + 3w + z = 0, t + 2w + 2z = 0 we get
    t = 4/5,        w = −3/5,        z = 1/5.
If our calculations are correct, the matrix A is invertible and
 3

2
4
−5
5
5



−3


1
1
1
3 
 1
−
A−1 = 
=
5
5 
 5

 5
3
3
2
1
−
5
5
5
1
.
5
2
1
−2

4
−3 
1
But to be absolutely sure, one should multiply A by its supposed inverse (on either side) and see that one gets the
identity matrix. One does.
Here is an easier exercise. Show that the matrix


    A = [ −1  2  0 ]
        [ −3  0  2 ]
        [  0  1  2 ]

is invertible and that

    A−1 = (1/14) [ −2  −4  4 ]
                 [  6  −2  2 ]
                 [ −3   1  6 ]

Do we have to go through the same process as before, finding the inverse of A and then seeing it works out to the
given value? Only if we are gluttons for pain! The obvious thing to do is to multiply A by its assumed inverse and
see that we get the identity. We see that

    [ −1  2  0 ]          [ −2  −4  4 ]          [ −1  2  0 ] [ −2  −4  4 ]          [ 14   0   0 ]
    [ −3  0  2 ] · (1/14) [  6  −2  2 ]  = (1/14)[ −3  0  2 ] [  6  −2  2 ]  = (1/14)[  0  14   0 ]  = I.
    [  0  1  2 ]          [ −3   1  6 ]          [  0  1  2 ] [ −3   1  6 ]          [  0   0  14 ]
And we are done. The matrix given as A−1 is indeed the inverse of A, and since A has an inverse, it is invertible.
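As an aside, numpy computes inverses directly; a sketch checking both matrices of this section against numpy.linalg.inv:

import numpy as np

A1 = np.array([[1, 2, 2],
               [2, 3, 1],
               [1, 0, 1]], dtype=float)
A1_inv = np.array([[-3, 2, 4],
                   [1, 1, -3],
                   [3, -2, 1]]) / 5
print(np.allclose(A1 @ A1_inv, np.eye(3)))    # the inverse computed by hand above
print(np.allclose(np.linalg.inv(A1), A1_inv))

A2 = np.array([[-1, 2, 0],
               [-3, 0, 2],
               [0, 1, 2]], dtype=float)
A2_inv = np.array([[-2, -4, 4],
                   [6, -2, 2],
                   [-3, 1, 6]]) / 14
print(np.allclose(A2 @ A2_inv, np.eye(3)))    # the inverse given in the exercise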
4.1 Exercises
1. Show that if A is invertible, then A−1 is invertible and (A−1 )−1 = A.
2. Show that if A ∈ Mn (C) is invertible, so is AT and (AT )−1 = (A−1 )T .
3. Show that if A, B ∈ Mn (C) are invertible, so is AB and (AB)−1 = B −1 A−1 .
4. Let A ∈ Mn (C) and assume A2 = 0. Prove that I + A is invertible and that (I + A)−1 = I − A.
Note: It is possible for A2 to be the zero matrix without A being the zero matrix. For example, for n = 3,
the following two matrices have square equal to 0. Neither is the zero matrix. There are, of course, many
others in the same category.




    [ 0  0  1 ]        [ 1  1  −2 ]
    [ 0  0  0 ] ,      [ 1  1  −2 ]
    [ 0  0  0 ]        [ 1  1  −2 ]
5 Systems of linear equations.
We now come to one of the many reasons linear algebra was invented, to solve systems of linear equations. In this
section I will try to summarize most of what you need to know about systems of equations of the form:
    a11 x1  +  a12 x2  +  · · ·  +  a1n xn  =  b1
    a21 x1  +  a22 x2  +  · · ·  +  a2n xn  =  b2                    (9)
      ..                                       ..
    am1 x1  +  am2 x2  +  · · ·  +  amn xn  =  bm
This is a system of m equations in n unknowns. The unknowns are usually denoted by x1 , . . . , xn , but the notation
can change depending on the circumstances. For example, if n = 2; that is, if there are only two unknowns, then
one frequently writes x for x1 and y for x2 . If there are only three unknowns, one frequently denotes them by x, y, z
rather than x1 , x2 , x3 . Less frequently, if one has four unknowns, one denotes them by x, y, z, w. For five or more
unknowns one usually uses subindices; since one can too easily run out of letters.
The coefficients aij are given numbers; they can be real or complex. The same holds true for the right hand
side entries b1 , . . . , bm . Most of my examples will be in the real case, but all that I say is valid also in the complex
case (except if it obviously is not valid!). Once a single complex non-real number enters into the picture, one is in
complex mode.
Solving a system like (9) consists in finding an n-tuple of numbers x1 , x2 , . . . , xn such that when plugging it
into the equations, all equations are satisfied. For example, consider the system of equations
    3x1 + 2x2 − x3 = 1
    −x1 +  x2 + x3 = 5                    (10)

Here m = 2, n = 3. Then x1 = 0, x2 = 2, x3 = 3 is a solution. In fact,

    3 · 0 + 2 · 2 − 3 = 1
      −0  +    2  + 3 = 5
But that isn’t all the story. There are more solutions, many more. For example, as one can verify,
x1 = 9, x2 = −4, x3 = 18
also is a solution. And so is
x1 = −6/5, x2 = 14/5, x3 = 1.
And many more. We will rarely be content with finding a single solution; in most cases we will want to find ALL
solutions. I anticipate here that the following, and only the following, can happen for a system of linear equations:
• The system has exactly one solution.
• The system has NO solutions.
• The system has an infinity of solutions.
So if someone tries to sell you a system that has exactly two solutions, don’t buy it! Once it has more than one
solution, it has an infinity of solutions.
To attack systems in a rational, efficient way, we want to develop some notation and terminology. Let us return
to the system (9). The matrix whose entries are the coefficients of the system; that is, the matrix


    A = [ a11  a12  · · ·  a1n ]
        [ a21  a22  · · ·  a2n ]
        [  ..   ..          ..  ]
        [ am1  am2  · · ·  amn ]

is called the system matrix (or matrix of the system). For example, for the system in (10), the system matrix is

    A = [  3  2  −1 ]
        [ −1  1   1 ]
It is an m × n matrix. It will also be convenient to write n-tuples (and m-tuples, and other tuples) vertically,
as column vectors. So we think of C2 as consisting of all elements of the form

    [ a1 ]
    [ a2 ]

where a1 , a2 are complex numbers; C3 as consisting of all columns with entries a1 , a2 , a3 complex; and so forth.
In general, Cn is the set of all columns

    [ a1 ]
    [ a2 ]
    [ .. ]        with a1 , a2 , . . . , an ∈ C.
    [ an ]
In other words, we are identifying Cn with the vector space of n × 1 matrices; Cn = Mn,1 (C). If working exclusively
with real numbers, one can replace C by R in all of this. We call matrices that consist of a single column, column
matrices or, more frequently, column vectors. Returning to our system, the m-tuple b1 , . . . , bm of numbers on the
right hand side of the equations will give rise to a vector

    b = [ b1 ]
        [ b2 ]
        [ .. ]  ∈ Cm = Mm,1 (C).
        [ bm ]

We also introduce the unknown/solution vector

    x = [ x1 ]
        [ x2 ]
        [ .. ] .
        [ xn ]

Then the left hand side of the system is precisely Ax and the system (9) can be written in a nice and compact way as

    Ax = b.

A solution of the system is now a vector x ∈ Cn such that Ax = b. Our objective is to develop an efficient method
for finding all solutions to such a system.
Two m × n systems of linear equations are said to be equivalent if they have exactly the same solutions. Solving
a system is usually done (whether one realizes it or not) by replacing the system by a sequence of systems, each
equivalent to the preceding one, until one gets a system so simple that it actually solves itself. The solutions of the
final, very simple system, are the same as of the original system.
The way one gets an equivalent system is by performing any of the following “operations.”
1. Interchange two equations. For example, if in the system (10) we interchange equations 1 and 2 we get
    −x1 +  x2 + x3 = 5
    3x1 + 2x2 − x3 = 1
Nothing essential has changed; the order in which the equations are presented does not affect the solutions.
2. Multiply one equation by a non-zero constant. Nothing changes; we can go back to the original system by
dividing out the constant.
3. Add to an equation another equation multiplied by a constant. Again, nothing changes; if we now subtract
from the new equation the same equation we previously added, multiplied by the same constant, we are back
where we were before.
These are the three basic “operations” by which any system can be reduced to an immediately solvable one, one
that can be solved by inspection. For example, here is how these operations affect system (10):
1. Interchange the first and second equation. The system becomes
    −x1 +  x2 + x3 = 5
    3x1 + 2x2 − x3 = 1
2. Multiply the first equation by −1:
     x1 −  x2 − x3 = −5
    3x1 + 2x2 − x3 =  1
3. Add, to the second equation, −3 times the first equation:
    x1 −  x2 −  x3 = −5
          5x2 + 2x3 = 16

4. Multiply the second equation by 1/5:

    x1 − x2 −       x3 = −5
         x2 + (2/5) x3 = 16/5
The system is now in Gauss reduced form, easy to solve, but I’ll go one step further (Gauss-Jordan reduction).
5. To the first equation, add the second equation:
    x1      − (3/5) x3 = −9/5
         x2 + (2/5) x3 = 16/5                    (11)
The last system of equations is equivalent to the first one; everything that we did can be undone. But it is also
solved. We see from it that every choice of x1 , x2 , x3 such that
    x1 = −9/5 + (3/5) x3 ,        x2 = 16/5 − (2/5) x3
is a solution; indicating that we can select x3 arbitrarily, and then use the formulas for x1 , x2 . If we write the
solutions in (column) vector form, we found that
 9 3

− 5 + 5 x3


 16 2


−
x
x=
5 3 
 5


x3
Using our knowledge of vector operations, we can break the solution
 9 
 3
−5
5



 16 
 2


x=
 5  + x3  − 5



0
1
up as follows






But why have an x3 when we don’t have an x1 nor x2 written out explicitly anymore? Another way of writing the
solution of (10) is as follows:

    x = [ −9/5 ]      [  3/5 ]
        [ 16/5 ]  + c [ −2/5 ] ,        where c is arbitrary.                    (12)
        [   0  ]      [   1  ]

Taking c = 3, we get the solution (0, 2, 3); taking c = 18 we get (9, −4, 18). These are the two first solutions
mentioned earlier. Since there is an infinite number of choices of c, we have an infinity of solutions.
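A numpy sketch confirming that every vector of the form (12) solves system (10) (the values of c below are arbitrary):

import numpy as np

A = np.array([[3, 2, -1],
              [-1, 1, 1]], dtype=float)
b = np.array([1, 5], dtype=float)

x0 = np.array([-9/5, 16/5, 0])    # the particular solution in (12)
d = np.array([3/5, -2/5, 1])      # the direction vector multiplying c

for c in [0, 3, 18, -7.5]:
    x = x0 + c * d
    print(c, np.allclose(A @ x, b))    # True for every c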
If we consider what we did here, we may realize that the only job done by the unknowns, by x1 , x2 , x3 in our
case, was to act as placeholders. The same is true about the equal sign. If we carefully keep the coefficients in
place, never mix a coefficient of x1 with one of x2 , for example, we don’t really need the variables. That is, we can
remove them from the original form of the system, and then bring them back at the very end. To be more precise
we need to introduce the augmented matrix of the system. This is the system matrix augmented (increased) by
adding the b vector as last column. We draw a line or dotted line to indicate that we have an augmented matrix.
The augmented matrix of (9) is the matrix


a11 a12 · · · a1n b1
 a21 a22 · · · a2n b2 


(A|b) =  .

..
..
....
 ..

.
.
..
am1
am2
···
amn
bm
Suppose we carry out the three operations on equations we mentioned above. The effect on the augmented matrix
of each one of these operations is, respectively:
5
SYSTEMS OF LINEAR EQUATIONS.
22
1. Interchange two rows. We will call the operation of interchanging rows i and j of a matrix operation I(i,j) .
2. Multiply a row by a non zero constant. We will call the operation of multiplying the i-th row by the constant
c 6= 0 operation IIc(i) .
3. Adding to a row another row multiplied by a constant. Adding to row i the j-th row multiplied by c (i 6= j)
will be denoted by III(i)+c(j) .
These operations are the row operations. The idea is to use them to simplify the augmented matrix. Simplifying
the matrix by applying row operations is known as row reduction.
Here is how we would solve (10) by working on the matrix. It is essentially what we did before, but it is less
messy, and easy to program. I usually start with the augmented matrix, and then perform row operations in
sequence. In some cases the order of performance doesn’t matter, so I may perform more than one simultaneously.
I write arrows joining one matrix to the next. Occasionally (and for now) I will write the operations performed on
top of the arrow. So here is the same old system (10) solved by row reduction; the first matrix is the augmented
matrix of the system.
I(1,2) ,II(−1)(1)
III(2)−3(1)
1 −1 −1 −5
1 −1 −1 −5
3 2 −1 1
−→
−→
−1 1
1 5
3
2 −1
1
0
5
2 16
II 1 (2)

1
−1
−1
−5
1
2
5
16
5
5

−→
0


III(1)+(2)

−→
1
0
− 35
− 95
1
2
5
16
5

0


The last matrix is in reduced canonical form (explained below), which means we are done. We can now write out
the system that has this last matrix as augmented matrix. We get
x1
x2
−
3
5 x3
=
− 59
+
2
5 x3
=
16
5
which is exactly (11). From here we continue as before to get the solution (12).
We need to know when to stop row reducing, when the simplest possible level has been achieved. This is called
row reduced echelon form or some such name; I’ll abbreviate it to RRE form. A matrix is in RRE form if
1. In each row, the first non zero coefficient is a 1. A row that does not start with a 1 will consist exclusively of
0’s.
2. All zero rows come after the non-zero rows.
3. All entries above and below the leading 1 of any given row are equal to 0.
4. If i < j, and neither row i nor row j is a zero row, then the leading 1 of row i must be in a column preceding
the column containing the leading 1 of row j.
Two facts are important here:
• Every matrix can be brought to RRE form by row operations.
• While there is more than one way to achieve RRE form by row operations, the end result is always the same
for any given matrix. It is called the RRE form of the matrix.
Maybe a few examples can clear things up.
Examples
5
SYSTEMS OF LINEAR EQUATIONS.
23
1. Find all solutions of
x1 − x2 + x3 − x4 + x5
=
1
3x1 + 2x2 − x4 + 9x5
=
0
7x1 + 10x2 + 3x3 + 6x4 − 9x5
The augmented matrix of the system is

1
 3
7
= −7

1
−1 1 −1
1
2 0 −1
9
0 
10 3
6 −9 −7
The first objective on our way to RRE form is, by row operations, get a 1 into the (1, 1) position. In our case,
the matrix already has a 1 in the correct place, so we need to do nothing. The only reason one will not be
able to get a 1 into the (1, 1) position is if the first column contains only zeros (unlikely!, but possible). Then
one moves over to the second column, tries to get a 1 into position (1, 2); if this fails, into position (1, 3);
etc. Total failure means one has the zero matrix, which is, of course, in RRE form. If there is any non-zero
entry in the first column, (or in any other column) one can get it to be in row 1 (if it isn’t there already) by
interchanging row 1 with the row containing the non-zero entry; if we then multiply row 1 by the reciprocal
of that non-zero entry, we have a 1 in row 1. So, at most two row operations place a 1 in row 1, if the column
has at least one non-zero entry. But, as mentioned, we already have a 1 where it should be to start.
Once we have the 1 in place, we use operations of the third type to get every entry below this 1 to be 0.
For example, if the first entry in row i, i > 1 is c, we perform the row operation III(i)−c(1) . Applied to our
matrix, it works as follows




1 −1 1 −1
1
1
1 −1
1 −1
1
1
III(2)−3(1) ,III(3)−7(1)
 3
 0
2 0 −1
9
5 −3
2
6 −3 
0 
−→
7 10 3
6 −9 −7
0 17 −4 13 −16 −14
The first column has been processed. Once a column has been processed, here is how on processes the next
column.
(a) Suppose i is the row in which the last leading 1 appeared (in one of the columns preceding the one we are
about to process). Presumably that 1 is in the column just preceding the one we are about to process,
but it could be before that. Is there a non-zero entry in the column to be processed, in some row strictly
below i? If no, the column has been processed; move on to the next column. If none remain, you are
done. If yes, get it into row i + 1 (if not already there) by interchanging the row containing it with row
i + 1. Then multiply the row i + 1 by the reciprocal of that non-zero entry so as to get a 1 in position
i + 1.
(b) Using operations of type III, get every entry above and below this 1 to be 0. Then move on to the next
column. If none remain, you are done.
Applying all this to our current matrix, here is how we finish the process.




1 −1
1 −1
1
1
1
1 −1
1 −1
1
II 1 (2)
5
2
6
 0
 0
5 −3
2
6 −3  −→
1 − 35
− 53 
5
5
0 17 −4 13 −16 −14
0 17 −4 13 −16 −14



11
11
2
2
2
2
− 35
− 53
1 0
1 0
5
5
5
5
5
5
 II


5



(3)
III(1)+(2) ,III(3)−17(2)
31
2
6
3
2
6
 0 1 −3

− 53 
− 35
−→
5
5
5
5
5

 −→  0 1 − 5



31
0 0 31
− 182
− 19
0 0
1
1 − 182
− 19
5
5
5
5
31
31


141
20
1 0 0 −1
31
31


III(1)− 2 (3) ,III(2)+ 3 (3) 

5
5
72
30 

1 − 31 − 31 
−→
 0 1 0


182
19
0 0 1
1 − 31 − 31






5
SYSTEMS OF LINEAR EQUATIONS.
24
The matrix is now in RRE form, and in the best possible way for the case m ≤ n (in our case m = 3, n = 5):
The first m columns constitute the m × m identity matrix. The system is equivalent to
141
31
72
x2 + x4 − x5
31
182
x3 + x4 −
x5
31
x1 − x4 +
=
=
=
20
31
30
−
31
19
−
31
Variables that do not correspond to the columns containing the leading 1’s; columns 4, 5 in our case, thus
x4 , x5 , can be chosen freely. All solution of the system are thus given by
141 20
+
31
31
x1
= x4 −
x2
= −x4 +
72
30
x5 −
31
31
x3
= −x4 +
182
19
x5 −
31
31
for arbitrary values of x4 , x5 . That is, for every choice of values of x4 , x5 we get a solution; we have again an
infinity of solutions. We can write the whole thing in vector notation as follows: The solutions of the system
are given by


 141   20 


20
x4 − 141
− 31
1
31 + 31
31



 











 −x4 + 72 x5 − 30 
 72   − 30 
 −1 
31
31 


 31   31 




 





 



x=
 −x4 + 182 x5 − 19  = x4  −1  + x5  182  +  − 19 
31
31 


 31   31 




 






 










1
x4
0
0 
0
1
0
x5
A number of other cosmetic changes can be made. For example, why have x4 , x5 where we don’t have
x1 , x2 , x3 ? We could relabel them c1 , c2 . Moreover, we can try to get rid of some denominators; if x5 = c2 is
arbitrary, so is x5 /31. We can also write the solution in the slightly nicer way






−141
1
20
 72 
 −1 
 −30 





1 




 −19 
x = c1  −1  + c2  182  +


31
 1 
 0 
 0 
31
0
0
for arbitrary values of c1 , c2
2. For our next example, consider
x + 3y + 2w
5x + 15y + z + 14w
=
5
= −1
x + 3y − z − 2w = 2
The augmented matrix is


1 3
0
2
5
 5 15
1 14 −1 
1 3 −1 −2
2
We proceed to row reduce.



1 3
0
2
5
1
III(2)−5(1) ,III(3)−(1)
 5 15
 0
1 14 −1 
−→
1 3 −1 −2
2
0
3
0
0


0
2
5
1
III(3)+(2)
1
4 −26  −→  0
−1 −4
−3
0
3
0
0
0
1
0

2
5
4 −26 
0 −29
5
SYSTEMS OF LINEAR EQUATIONS.
25
and we are done. The system has NO solutions. It has no solutions because it is equivalent to a system in
which the third equation is
0x + 0y + 0z + 0w = −29
and since the left hand side is 0 regardless of what x, y, z, w might be, it can never equal -29.
3. We saw in the previous example a system without solutions. Can it have solutions if we change the right
hand sides? That is, let us try to determine all values, if any, of b1 , b2 , b3 for which the system
x + 3y + 2w
5x + 15y + z + 14w
= b1
=
b2
x + 3y − z − 2w = b3
has solutions. We write up the augmented matrix and row reduce.




b1
1 3
0
2
1
3
0
2 b1
III(2)−5(1) ,III(3)−(1)
 0 0
 5 15
1 14 b2 
1
4 b2 − 5b1 
−→
1
3 −1 −2 b3
0 0 −1 −4
b3 − b1


b1
1 3 0 2
III(3)+(2)
b2 − 5b1 
−→  0 0 1 4
0 0 0 0 −6b1 + b2 + b3
The last equation makes sense if and only if −6b1 + b2 + b3 = 0, or b3 = 6b1 − b2 . In this case the RRE form
is


1 3 0 2
b1
 0 0 1 4 b2 − 5b1 
0
0 0 0 0
Notice also that the columns (among the first four) not containing leading 1’s are columns 2 and 4; thus y, w
can be selected freely. The solution could be given as
or in vector form
x
= −3y − 2w + b1
z
= −4w + b2 − 5b1


−3

 1 


x = y
 0 +w
0

 
b1
−2

0
0 
+
−4   b2 − 5b1
0
1


,

with y, w arbitrary.
4. We now find all solutions of
x1 − 2x2 + 2x3
=
1
x1 + x2 + 5x3
=
0
2x1 + 3x2 + x3
= −1
NOTE: If the number of equations is less than the number of unknowns (m < n) then there usually (but not
always!) is more than one solution. More precisely, there either is no solution or an infinity of solutions. If
the number of equations is more than the number of unknowns (m > n), there is a good chance of not having
any solutions. Actually, anything can happen, but the most likely outcome is no solutions because there are
too many conditions (equations). If the number of equations equals the number of unknowns (m = n), as
it does in our current example, one has a good chance to having a unique solution. Still, anything could
happen.
Solving the system. We row reduce the augmented matrix.





1 −2 2
1
1 −2
2
1
1
II 1 (2)
III(2)−(1) ,III(3)−2(1)
3
 0
 0
 1
1 5
0 
3
3 −1  −→
−→
2
3 1 −1
0
7 −3 −3
0

−2
2
1
1
1 −1/3 
7 −3
−3
5
SYSTEMS OF LINEAR EQUATIONS.

III(1)+2(2) ,III(3)−7(2)
−→
1 0
 0 1
0 0
26


4
1 0 4
1/3
II− 1 (3)
10
 0 1 1
1 −1/3  −→
−10 −2/3
0 0 1


1 0 0
1/3
III(1)−4(3) ,III(2)−(3)
 0 1 0
−1/3 
−→
1/15
0 0 1

1/15
−2/5 
1/15
There is a unique solution:
1
15


 2
x=
 −5







1
15
The following should be clear from all these examples: There exists a unique solution if and only if m ≥ n
and the first n rows of the RRE form of the augmented matrix constitute the identity matrix. Since the
augmented matrix has n + 1 columns, and as we row reduce it we are row reducing A, we see that this
condition involves only A, not the b part. Let us state this as a theorem so we realize it is an important
result:
Theorem 1 The m × n system of linear equations (9) has a unique solution for a given value of b ∈ Cm if
and only if
(a) m ≥ n.
(b) The row reduced echelon form of the system matrix A is either the n × n identity matrix (case n = m)
or the n × n identity matrix completed with m − n rows of zeros to an m × n matrix.
Moreover, since all this depends only on A, it has a unique solution for some b ∈ Cm if and only if it has a
unique solution for all b ∈ Cm .
The alternative to not having a unique solution for a given b ∈ Cm is having no solutions for that b or having
an infinity of solutions.
5. Suppose we want to solve two or more systems of linear equations having the same system matrix. For
example, say we want to solve
x1 − x2 + x3
=
1
−x1 + x2 + x3
=
2
x1 + x2 − x3
=
−5
and
x1 − x2 + x3
=
−1
−x1 + x2 + x3
=
4
x1 + x2 − x3
=
0
There is no need to duplicate efforts. You may have noticed that the system matrix carries the row reductions;
once the system matrix is in RRE form, so is the augmented matrix. So we just doubly augment and row
reduce


1 −1
1 −1
1
 −1
1
1
2
4 
1
1 −1 −5
0
Here we go:

1 −1
1
1
 −1
1
1
2
1
1 −1 −5


1
−1
III(2)+(1) ,III(3)−(1)
 0
4 
−→
0
0
−1
1
1
0
2
3
2 −2 −6


−1
1
I(2,3) ,II 1 (2)
2
 0
3 
−→
1
0

−1
1
1 −1
1 −1 −3 1/2 
0
2
3
3
5
SYSTEMS OF LINEAR EQUATIONS.

III(1)+(2)
−→
1 0
 0 1
0 0
0 −2
−1 −3
2
3
27


1 0
−1/2
II 1 (3)
2
 0 1
1/2  −→
3
0 0


1 0 0
−1/2
III(2)+(3)
1/2  −→  0 1 0
3/2
0 0 1
0 −2
−1 −3
1 3/2
−2
−3/2
3/2

−1/2
2 
3/2
The solution to the first system is

−2

 3
x=
 −2

3
2



,


the solution to the second system is

− 21





.
2
x=




3
2
5.1
Exercises
In Exercises 1-6 solve the systems by reducing the augmented matrix to reduced row echelon form.
1.
x
2x
3x
2.
x1
2x1
+ 3x2
+ 6x2
2x1
+
+ y
+ 4y
+ 6y
− 2x3
− 5x3
5x3
6x2
3.
x1
x1
x1
− 2x2
+
3x2
− 12x2
−
+
+
+ 2z
− 3z
− 5z
2x4
10x4
8x4
+
x3
+ 7x3
− 11x3
=
=
=
9
1
0
+ 2x5
+ 4x5
+
4x5
− 4x4
+ 2x4
− 16x4
−
+
+
3x6
15x6
18x6
=
0
= −1
=
5
=
6
= 1
= 2
= 5
4.
x1
−x1
3x1
+
x2
− 2x2
− 7x2
+ 2x3
+ 3x3
+ 4x3
=
=
=
8
1
10
2x1
−2x1
8x1
+ 2x2
+ 5x2
+
x2
+ 2x3
+ 2x3
+ 4x3
=
=
=
0
1
−1
5.
6.
3x1
6x1
− 2x2
+ 6x2
+ 6x2
+ 3x3
− 3x3
+ 3x3
=
1
= −2
=
5
7. For which values of a will the following system have no solutions? exactly one solution? Infinitely many
solutions?
x + 2y −
3z =
4
3x − y +
5z =
2
4x + y + (a2 − 14)z = a + 2
6
6
INVERSES REVISITED
28
Inverses Revisited
Suppose A is a square n × n matrix. It is invertible if and only if there exists a square n × n matrix X such that
AX = I. In this case we’ll also have XA = I, but this is of no concern right now. We can rephrase this in terms
of existence of solutions to systems of linear equations. Suppose we denote the columns of this (that may or may
not exist) matrix X by x(1) , . . . , x(n) . That is, if X = (xij )1≤i,j≤n , then






x1n
x12
x11
 x2n 
 x22 
 x21 






x(1) =  .  , x(2) =  .  , . . . , x(n) =  .  .
 .. 
 .. 
 .. 
xnn
xn2
xn1
It is then easy to see (I hope you agree) that the condition AX = I is equivalent to n systems of linear equations,
all with system matrix A:

Ax(1) = δ (1)



 Ax(2) = δ (2)
(13)
..

···
.
···



Ax(n) = δ (n)
where for 1 ≤ i ≤ n, δ (i) is the column vector having all entries equal to 0, except the i-th one which is 1; that
is δ (1) , δ (2) , . . . , ∆(n) are the columns of the identity matrix. We can solve all these systems simultaneously if we
augment A by all these columns; in other words if we augment A by the identity matrix: (A|I).
What will happen as we row reduce? For a square matrix, the row reduced form either is the identity matrix,
or there is at least one row of zeros in the RRE form. Think about it! You can figure out why this is so! Suppose
we get a row of zeros in the row reduced form of A (not of the augmented matrix, but of A). The only way we can
have solutions for all the n-systems is if we also have 0 for the full row in the augmented matrix. That means that
a certain number of row reductions produced a 0 row in the identity matrix. This is impossible, and it is not hard
to see why this is impossible. So if the RRE form of A contains a row of zeros, then A cannot be invertible; some
of the equations for the columns of the inverse are unsolvable. The alternative is that the row reduced echelon
form of A is the identity matrix. In this case the augmented columns contain the solutions to the equations (13);
in other words the augmented part has become the inverse matrix. Let us state part of this as a theorem.
Theorem 2 A square n × n matrix is invertible if and only if its RRE form is the identity matrix.
We go to examples.
Example 1. Let us compute the inverse of the matrix we already inverted in a previous section, namely


1 2 2
A= 2 3 1 
1 0 1
We augment and row reduce:

1 2 2 1
 2 3 1 0
1 0 1 0

III(1)+2(2) ,III(3)−2(2)
−→
1
 0
0
0
1
0


0
1
III(2)−2(1) ,III(3)−(1)
 0
0 
−→
1
0
0 −4 −3
−1 −3 −2
0
5
3
III(1)+4(3) ,III(2)−3(3)
−→
2
1
−2

1
 0
0
2
2
1
−1 −3 −2
−2 −1 −1


0
1 0
II−(2) ,II 1 (3)
5
 0 1
0 
−→
1
0 0
0 0
1 0
0 1
−3/5
1/5
3/5
2/5
1/5
−2/5
0
1
0

0
0 
1
−4 −3
3
2
1 3/5

4/5
−3/5 
1/5

2
0
−1
0 
−2/5 1/5
6
INVERSES REVISITED
29
As before, the inverse is

A−1
−3/5
=  1/5
3/5

4/5
−3/5 
1/5
2/5
1/5
−2/5
Example 2. Find the inverse of


−3
1
−4
0 

−1
1 
1 −3
0 3
 2 4

 2 1
3 0
Solution.
By row

0 3
 2 4

 2 1
3 0

reduction. As usual, row operations are performed in the order they


−3
1 1 0 0 0
1 2 −2
0 0 1/2 0
I(1,2) ,II 1 (1) 
−4
0 0 1 0 0 
0
3
−3
1
1
0 0
2


−→
 2 1 −1
0 1
−1
1 0 0 1 0 
1 0
1 −3 0 0 0 1
3 0
1 −3 0
0 0
are written.

0
0 

0 
1


1/2 0 0
1 2 −2
0 0
III(3)−2(1) ,III(4)−3(1)
III(3)+(2) ,III(4)+2(2)  0 3 −3
0 0 0 
1 1


−→
−→
 0 0
0
2 1
−1 1 0 
−3/2 0 1
0 0
1 −1 2



1 2 −2
0
1 0
0 −2/3 −2/3
0
1/2 0 0
1/2 0
II 1 (2) 


III
0
1
−1
1/3
1/3
0
0
0
0
1
−1
1/3
1/3
0 0
(1)−2(2)
3

 −→ 
−→
 0 0
 0 0
0
2
1
−1 1 0 
0
2
1
−1 1
0 0
1 −1
2 −3/2 0 1
0 0
1
−1
2 −3/2 0



1 0
0 −2/3 −2/3
1 0 0 −2/3 −2/3
1/2 0 0
1/2
 III(2)+(3) ,II 21 (4)  0 1 0 −2/3
I(4,3)  0 1 −1
1/3
1/3
0
0
0
7/3
−3/2


−→ 
−→
 0 0
 0 0 1
1
−1
2 −3/2 0 1 
−1
2 −3/2
0 0
0
2
1
−1 1 0
0 0 0
1
1/2 −1/2


1 0 0 0 −1/3
1/6 1/3 0
III(1)+ 2 (4) ,III(2)+ 2 (4) ,III(3)+(4) 
8/3 −11/6 1/3 1 
3
 0 1 0 0

−→ 3
 0 0 1 0
5/2
−2 1/2 1 
0 0 0 1
1/2 −1/2 1/2 0
1
 0

 0
0
2 −2
0
3 −3
1
−3
3
1
−6
7 −3
0
1
0
0
The answer is

A−1
− 31
1
6
1
3
8
3
− 11
6
1
3
5
2
−2
1
2
1
2
− 12
1
2




=




0



1 



1 


0
Example 3. Invert

1

4
A=
7
Solution.
By row reduction,

1 2 3 1 0
 4 5 6 0 1
7 8 9 0 0
2
5
8

3
6 .
9


0
1
III(2)−4(1) ,III(3)−7(1)
 0
0 
−→
1
0
2
3
1
−3 −6 −4
−6 −12 −7
0
1
0

0
0 
1
1/2 0
0 0
−1 1
−3/2 0

0
0 

0 
1

0 0
0 1 

0 1 
1/2 0

0
0 

0 
1
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES

III(3)−2(2)
−→
1
 0
0
30
2
3
1
−3 −6 −4
0
0
1

0 0
1 0 
−2 1
A row of zeros has developed in the row reduction of A; the systems of equations whose solutions are the
columns of the inverse of A are not solvable. The matrix A is not invertible.
a b
Example 4. Show that a 2 × 2 matrix
is invertible if and only if ad − bc 6= 0.
c d
Solution. We may have to consider two cases, a =
6 0, a = 0. Suppose first a 6= 0. We can then divide by
a and row reduce as follows
II 1 (1)
III(2)−c(1)
1
b/a
1 b/a 1/a 0
1/a 0
a b 1 0
a
−→
−→
0
1
c d 0 1
c
d
0 d − cb/a −c/a 1
If d − (cb/a) = 0, we are done, there is no inverse. Now d − (cb/a) = (ad − bc)/a 6= 0 if and only if ad − bc 6= 0.
We see that if ad − bc = 0, there is no inverse. On the other hand, if ad − bc 6= 0, we can divide by (ad − bc)/a:
II a
(2)
1
b/a
1/a 0
1 b/a
1/a
0
ad−bc
−→
0 d − cb/a −c/a 1
0 1 −c/(ad − bc) a/(ad − bc)

 

bc
b
d
b
1 0 a1 + a(ad−bc)
− ad−bc
1 0
− ad−bc
III(1)− b (2)
ad−bc
=

−→a 
c
a
c
a
0
1
−
0 1
− ad−bc
ad−bc
ad−bc
ad−bc
It follows that if a 6= 0, then
A
−1
1
=
ad − bc
d −b
−c
a
0 b
and ad − bc = −bc. If ad − bc = 0; that is (in this case) if b or c equals
c d
0, then A has a zero row or a zero column; it clearly can’t have an inverse. On the other hand if bc 6= 0, the
inverse we found above makes sense; it is
1
d −b
A−1 = −
−c
0
bc
Suppose now a = 0 so A =
Now
−
1
bc
d −b
−c
0
0
c
b
d
=−
1
bc
−bc
0
0
−bc
= I,
proving A is invertible and A−1 is given by the same formula as for a 6= 0.
6.1
Exercises
To come
7
Linear dependence, independence and bases
We return to our study of vector spaces. In this section we develop some of the fundamental concepts related to a
vector space. Remember that a scalar is either a complex number or a real number and that one either assumes
that we are only going to allow real numbers as scalars, and our vector spaces are real vector spaces, or we will
open the door to non-real numbers, and we are now dealing with complex vector spaces.
For all the definitions that follows, assume V is a vector space. It could be any of the examples we gave earlier,
or anything else that qualifies as being a vector space.
A linear combination of vectors v1 , . . . , vk ∈ V is any vector of the form
c1 v 1 + · · · + ck v k
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
31
where c1 , . . . , ck are scalars. We do allow the case k = 1; a linear combination is then just cv1 ; c a scalar.
Example. In R3 consider the vectors




1
−1
v1 =  −3 
and
v2 =  0 
2
1
Show that the vectors




0
1
0 =  0 ,
v =  −2 
0
1


1
are linear combinations of v1 , v2 , while w =  1  is not.
1
Solution. Given any number of vectors, the zero vector is always a linear combination of these vectors; we just
have to take the scalars involved equal to 0. In our case 0 = 0v1 + 0v2 . To show v is a linear combination of v1 , v2
reduces to showing that there exists scalars c1 , c2 such that v = c1 v1 + c2 v2 ; writing this out componentwise, it
works out to





 

1
1
−1
c1 − c2
 −2  = c1  −3  + c2  0  =  −3c1  .
1
2
1
2c1 + c2
In other words, we have to show the system of equations
c1 − c2
−3c1
2c1 + c2
=
1
= −2
=
1
has a solution. Analyzing similarly the situation for w, we have to prove that the system
c1 − c2
=
1
−3c1
=
1
2c1 + c2
=
1
does not have a solution. We can do it all at once, trying to solve both systems simultaneously since the have the
same system matrix. We row reduce


1 1
1 −1
 −3
0 −2 1 
2
1
1 1
Performing the following row operations III(2)+3(1) , III(3)−2(1) , and then III(3)+(2) , one gets


1 −1 1 1
 0 −3 1 4 
0
0 0 3
This already shows that the system for which the vector b is the last column of the matrix cannot have a solution.
In other words, this show that w is not a linear combination of v1 , v2 . Dropping the last column, we can solve for
c1 , c2 for the case of v. We keep row reducing performing II− 31 (2) followed by III(1)+(2) to get


1 0
2/3
 0 1 −1/3 
0 0
0
We obtained a unique solution c1 = 2/3, c2 = −1/3. One can verify that, in fact,
 2 1 
3 + 3








1
−1
1


2
1
2
  −2  = v.
−3  −  0  = 
 −3 3  =
3
3


2
1
1
2
1
23 − 3
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
32


1
 −2 

Another example. Express v = 
 3  as a linear combination of the
4
or show it isn’t possible.







1
0
1

 0 
 2 
 2 






v1 = 
 0  , v2 =  0  , v3 =  0  , v4 = 
−1
3
0
following five vectors v1 , v2 , v3 , v4 , v5 ,

1
−1 
,
1 
1


1
 0 

v5 = 
 1 
0
Solution. We have to find scalars c1 , c2 , c3 , c4 , c5 such that v = c1 v1 + c2 v2 + c3 v3 + c4 v4 + c5 v5 . Written out
in terms of components, this means finding c1 , c2 , c3 , c4 , c5 such that


 
c1 + c3 + c4 + c5
1
 −2   2c1 + 2c2 − c4 
;

 

 3 =
c4 + c5
3c2 − c3 + c4
4
in other words, solving the system of equations
c1 + c3 + c4 + c5
2c1 + 2c2 − c4
=
1
= −2
c4 + c5
=
3
3c2 − c3 + c4
=
4
As usual, we’ll do this setting up the augmented matrix and row reducing. The augmented matrix is:


1 0
1
1 1
1
 2 2
0 −1 0 −2 


 0 0
0
1 1
3 
4
0 3 −1
1 0
You will notice that the columns of the augmented matrix are the vectors v1 , v2 , v3 , v4 , and v. Noticing this
saves some time next time one has to do this. Lets proceed with the row reduction.




1 0
1
1 1
1
1 0
1
1
1
1
 2 2
0 −1 0 −2 
2 −2 −3 −2 −4 
(2)−2(1)  0

 III−→


 0 0


0
1 1
0 0
0
1
1
3
3 
0 3 −1
1 0
4
0 3 −1
1
0
4




1 0
1
1
1
1
1 0
1
1
1
1
II 1 (2) 
 (4)−3(2)  0 1 −1 −3/2 −1 −2 
2
 0 1 −1 −3/2 −1 −2  III−→


−→
 0 0
 0 0
0
1
1
3 
0
1
1
3 
0 3 −1
1
0
4
0 0
2 11/2
3 10




1 0
1
1
1
1
1 0 0 −7/4 −1/2 −4
II 1 (4) 

,III(2)+(4)  0 1 0
5/4
1/2
3 
2
 0 1 −1 −3/2 −1 −2  III(1)−(4)


−→
−→
 0 0


0
1
1
3
0 0 0
1
1
3 
5
5
0 0
1 11/4 3/2
0 0 1 11/4
3/2




1 0 0 −7/4 −1/2 −4
1 0 0 0
5/4
5/4
III
,III
7
5 (4) ,III(3)− 11 (4) 

I(3,4)  0 1 0
5/4
1/2
3 
4
4
 (1)+ 4 (4) (2)−
 0 1 0 0 −3/4 −3/4 
−→ 
−→
 0 0 1 11/4
 0 0 1 0 −5/4 −13/4 
3/2
5 
0 0 0
1
1
3
0 0 0 1
1
3
The solution is given by
5
5
c1 = − c5 + ,
4
4
c2 =
3
3
c5 − ,
4
4
c3 =
5
13
c5 − ,
4
4
c4 = −c5 + 3,
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
33
with c5 being arbitrary. This gives us an infinite number of ways of expressing v as a linear combination of
v1 , v2 , v3 , v4 , v5 . For example, selecting c5 = 0, we get the representation
v=
5
3
13
v1 − v2 − v3 + 3v4 .
4
4
4
You might want to take a minute or so and verify that this representation is correct. Or, we can take c5 = 1 to get
v = −2v3 − 2v4 + v5 .
Again, take a minute or so to see this works. Or, one can take c5 = −1 and get
v=
5
3
9
v1 − v2 − v3 + 4v4 − v5 .
2
2
2
And an infinity more.
Notice the difference between the two examples. In the first one there was a unique choice for the scalar
coefficients (or no choice at all). In the second example, there is an infinity of choices. This is because the set of
vectors in the first example were linearly independent, while those of the second example were not. Let us define
what this means.
Assume again that we have a vector space V and vectors v1 , . . . , vm in V . We say the vectors v1 , . . . , vm are
linearly dependent if there exist scalars c1 , c2 , . . . , cm , not all 0, such that
c1 v1 + · · · + cm vm = 0.
The vectors are linearly independent if they are not linearly dependent.
In more words than symbols: Given a set of vectors v1 , . . . , vm , the zero vector 0 can always be obtained as a
linear combination of these vectors by taking c1 = 0, c2 = 0, . . . , cm = 0; all coefficients equal to 0. If this is the
ONLY way one can get the zero vector, the vectors are said to be linearly independent. If there is another way,
with at least one of the coefficients not zero, they are linearly dependent.
A few obvious things to notice: If a set of vectors contains the zero vector, they are automatically linearly
dependent. In fact, say v1 = 0; then whatever v2 , . . . , vn may be, we will have c1 v1 + · · · + cn vn = 0 if we
take c2 = 0, . . . , cn = 0. and c1 = 1 (or any other non-zero number). Given two vectors v1 , v2 they are linearly
independent if and only if one is not equal to the other one times a scalar. In fact, if (say) v1 = cv2 , then
v1 + cv2 = 0 and since the coefficient of v1 is 1 6= 0, we clearly have linear dependence. If, on the other hand, we
have c1 v1 + c2 v2 = 0 and not both c1 , c2 are 0, then we can divide out by the non-zero coefficient and solve. For
example, if c1 6= 0, we can divide by c1 and solve to v1 = −(c2 /c1 )v2 , so v1 is a scalar times v2 . A bit less obvious,
perhaps, is that vectors v1 , . . . , vm are linearly dependent if and only if one of the vectors is a linear combination
of the others. In fact, if we have c1 v1 + · · · cm vm = 0 and one of the cj ’s is not 0, we can divide it out and solve
to get vj a linear combination of the remaining vectors. For example, if cm 6= 0 we get
cm−1
c1
v1 + · · · + −
vm−1 .
vm = −
cm
cm
Conversely, if one vector is a linear combination of the others, we can get at once a linear combination of all equal
to 0 in which the coefficient of the vector that was a combination of the others is 1 (or −1). For example, if
v2 = c1 v1 + c3 v3 + · · · + vm , then c1 v1 + (−1)v2 + c3 v3 + · · · + vm = 0
and, of course, −1 6= 0. Maybe we should have this as a theorem for easy reference:
Theorem 3 Vectors v1 , . . . , vm of a vector space V are linearly dependent if and only if one of the vectors is a
linear combination of the others. Equivalently, the vectors v1 , . . . , vm are linearly independent if and only if none
of the vectors is a linear combination of the others.
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
34
Here is another example.
Example Let the vector space V be now the space of all (real valued) functions defined on the real line. Show
that the following sets of vectors are linearly independent.
1. f1 , f2 where f1 (t) = sin t, f2 (t) = cos t.
2. g1 , g2 , g3 , g4 where g1 (t) = et , g2 (t) = e2t , g3 (t) = e3t , g4 (t) = e4t .
Solution. The first one is easy. Since we only have two vectors, we can ask whether one is a constant times the
other. Is there a scalar c such that either f1 = cf2 , which means sin t = c cos t FOR ALL t, or f2 = cf1 , which
means cos t = c sin t FOR ALL t. Of course not!. If sin t = c cos t, we simply have to set t = π/2 to get the rather
strange conclusion 1 = 0, so f1 = cf2 is out. So is f2 = cf1 ; we can’t have cos t = c sin t for all t because for t = 0
we get 1 = 0.
The second one is a bit harder; we have to show that the only way we can get c1 et + c2 e2t + c3 e3t + c4 e4t = 0 to
hold for ALL t is by taking c1 = c2 = c3 = c4 = 0. We can set up a system of impossible equations by judiciously
giving values to t, but I’ll postpone the somewhat simpler solution for later, once we have developed a few more
techniques.
Here comes another definition. Let v1 , . . . , vm be vectors in a vector space V . The span of v1 , . . . , vm is the set
of all linear combinations of these vectors. I will denote the span of v1 , . . . , vm by sp(v1 , . . . , vm ). Vectors that will
be for sure in the span of vectors v1 , . . . , vm include: The zero vector; as mentioned, it is a linear combination of
any bunch of vectors, just use 0 coefficients. The vectors v1 , . . . , vm themselves; to get vj as a linear combination
use cj = 1, all other coefficients equal to 0. And, in general, many more: v1 + 2v2 , −v1 , v1 + · · · + vm ; etc., etc.,
ad infinitum.
Here are a few examples.
Examples
1. Let us start with the simplest example; we have only one vector and that vector is the zero vector. What
can 0 span? Well, not much; multiplying by any scalar we always get 0. The span of the zero vector is the
set consisting of the zero vector alone; a set of a single element. In symbols, sp(0) = {0}.
2. On the other hand if V is a vector space, v ∈ V , v 6= 0, then the span of v consists of all multiples of v;
(
v) = {cv : c a scalar}.
3. Remember the example we did above:
In R3 consider the vectors
Show that the vectors

1
v1 =  −3 
2

−1
v2 =  0 
1


and




0
1
0 =  0 ,
v =  −2 
0
1


1
are linear combinations of v1 , v2 , while w =  1  is not.
1
We could now rephrase it in terms of spans; in solving it we showed that with the vectors defined as in the
example,
0, v ∈ sp(v1 , v2 ), while w ∈
/ sp(v1 , v2 ).
Here is a simple theorem.
Theorem 4 Assume v1 , . . . , vm are vectors in a vector space V . Then sp(v1 , . . . , vm ) is a subspace of V .
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
35
The reason why this theorem is true is, I hope, clear. First of all, the zero vector is in the span of v1 , . . . , vm . It
is also clear (I hope once more) that adding two linear combinations of these vectors, or multiplying such a linear
combination by a scalar, resolves into another linear combination of the same vectors. That’s it.
We will refer to the span ( v1 , . . . , vm ) of vectors v1 , . . . , vm also as the subspace of V spanned by v1 , . . . , vm ,
and call {v1 , . . . , vm } a spanning set for this subspace.
A subspace can be spanned by different sets of vectors. In fact, except for the pathetically small subspace
consisting of the zero vector by its lonesome, every other subspace of a vector space will have an infinity of different
spanning sets. Consider, for example, one of the simplest cases, suppose v is a non-zero vector in a vector space
V . Let W =( v). Then
W = {cv : c a scalar}.
But it is, or should be clear, that anything that is a multiple of v is also a multiple of any non-zero multiple of v.
That is, suppose d is any non zero scalar and we set w = dv. Any multiple of v is a multiple of w, and vice-versa:
If x = cv, then x = (c/d)w; if x = cw, then x = (cd)v. Any multiple of v also spans W .
Generally speaking, if there is one any spanning set, there is an infinity of them. But some spanning sets are
better than others. They have less fat,less superfluous elements. Say we are in a vector space V and W is a
subspace spanned by the vectors v1 , . . . , vm . If one of these vectors happens to be a linear combination of the
remaining ones, who needs it? For example, suppose
vm = a1 v1 + · · · + αm−1 vm−1
for some scalarsa1 , . . . , am .
Then any linear combination involving all vectors can be rewritten as one without vm :
c1 v1 +· · ·+cm vm = c1 v1 +· · ·+cm−1 vm−1 +cm (a1 v1 + · · · + αm−1 vm−1 ) = (c1 +cm a1 )v1 +· · ·+(cm−1 +cm am−1 )vm−1 .
That is, if vm is a linear combination of v1 , . . . , vm−1 , then sp(v1 , . . . , vm ) = sp(v1 , . . . , vm−1 ). Recalling Theorem
3, that one vectors is a linear combination of the others is equivalent to linear dependence. If the spanning is
linearly dependent, we can find a vector that is a linear combination of the others (there usually is more than one
choice), and throw it out. The remaining vectors still span the same subspace. We keep doing this. Can we run
out of vectors? No, we can’t, since nothing cannot span a subspace (well, sort of) and we are always spanning the
same subspace. But we only have a finite number of vectors to start with, so there must be a stopping point. The
stopping point is a linearly independent set of vectors spanning the same space as before. Such a set is called a
basis of the subspace. To put it in the form of a theorem:
Theorem 5 Let V be a vector space and let v1 , . . . , vm be vectors in V . Let W =( v1 , . . . , vm ). There is a subset
of {v1 , . . . , vm } that still spans W and is linearly independent; in other words: every spanning set contains a basis
of the spanned subspace.
We could now ask if this basis, obtained by discarding vectors from a spanning set, is always the same. Well,
if the spanning set was already linearly independent, and there is nothing to discard, then yes. Even so, there are
many other spanning sets that will span the same subspace. And since when discarding there is almost always
more than one choice of what to discard at each stage, the general answer is no. Vector subspaces tend to have an
infinity of different bases. What is, however, remarkable (I’d even dare say extremely remarkable) is that any two
bases of a given subspace of a vector space will have the same number of elements. So if you have a subspace and
found a basis of exactly seven vectors and someone tries to sell you a better, improved basis, of only six vectors,
don’t buy! There could be a better basis than the one you found, but it still will consist of seven vectors. If a
subspace has a basis of m vectors, we say that its dimension is m. That is, an m-dimensional subspace of a vector
space is one that has a basis of m elements (and hence ALL of its bases will have m elements).
Before we go any further, it may be good to make some additional definitions. A vector space V is a subspace
of itself, so we can ask if it can be spanned by a finite number of vectors. If so, we say it is finite dimensional ; if
not, it is infinite dimensional. By what we saw, if it is finite dimensional, it has a basis, hence a dimension. Infinite
dimensional vector spaces also have bases, but one has to redefine the concept a bit, and we won’t go into it.
Example. Show that the vectors
e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1),
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
36
form a basis of Rn , hence Rn is a vector space of dimension n. In case there are too many dots, the vector ej is
the n-tuple in which the j-th component is 1, all other components are 0.
Solution. A direct way to verify that a set of vectors is a basis of a given space (or subspace) is by performing
two tasks (in any order)
1. Verify linear independence. In other words show that the equation
c1 e1 + · · · + cn en = 0
can only be solved by c1 = 0, c2 = 0, . . . , cn = 0.
2. Verify that for every vector w in the space (or subspace), the equation
c1 e1 + · · · + cn en = w
has at least one solution c1 , c2 = 0, . . . , cn . (Incidentally, if it happens to have more than one solution, then
the first condition fails; if a vector is a linear combination of linearly independent vectors, there is only one
choice for the coefficients.)
If either condition fails, we don’t have a basis.
Verifying the first condition; linear independence: Suppose c1 e1 + · · · + cn en = 0 for some scalars c1 , . . . , cn . If
we write the vectors of Rn in the from of column vectors, then this equation becomes


 
 

 

0
1
0
0
c1
 0 
 0 
 1 
 0   c2 


 
 

 

 ..  = c1  ..  + c2  ..  + · · · + cn  ..  =  .. 
 . 
 .   . 
 . 
 . 
0
0
0
1
cn
The only way the first and last vectors can be equal is if c1 = c2 = · · · = cn = 0. Linear independence has been
established.



Verifying the second condition; spanning: Let w be a vector of Rn ,so w = 

w1
w2
..
.



. We have to show that no

wn
matter what w1 , . . . , wn are, we can

w1
 w2

 ..
 .
always solve

 

1

 0 


 

 = c1  ..  + c2 

 . 

wn
0
0
1
..
.






 + · · · + cn 


0
0
0
..
.



?

1
Writing the right hand side as a single vector, the equation becomes

 

w1
c1
 w 2   c2 

 

 ..  =  ..  .
 .   . 
wn
cn
There is a solution, namely c1 = w1 , c2 = w2 , . . . , cn = wn . Spanning has been established.
The basis {e1 , . . . , en } of Rn is sometimes referred to as the canonical basis of Rn . You might notice that if we
allow complex scalars, it is also a basis of Cn , so that Cn is a (complex) vector space of dimension n.
Here is a theorem about bases; some of the properties mentioned in it are sort of obvious, others less so. I hope
all are believable (given that they are true).
Theorem 6 Let V be a vector space of dimension n. Then
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
37
1. No set of more than n vectors can be linearly independent.
2. No set of less than n vectors can span V .
3. Any set of n vectors that spans V is also linearly independent and a basis; i.e., if V = sp(v1 , . . . , vn ), then
v1 , . . . , vn is a basis of V .
4. Any set of n vectors that is linearly independent will also span V and be a basis of V; i.e., if v1 , . . . , vn are
linearly independent, then V = sp(v1 , . . . , vn ), and v1 , . . . , vn is a basis of V .
5. Let v1 , . . . , vm be linearly independent (so, by the first property, we must have m ≤ n). If m = n it is a basis;
if m < n it can be extended to a basis meaning we can find vectors vm+1 , . . . , vn so that v1 , . . . , vn is a basis
of V .
6. Every subspace of V has a basis. If W is a subspace of V there exists a set of vectors {v1 , . . . , vm } that is
linearly independent and such that W = sp(v1 , . . . , vm ). Necessarily m ≤ n (m = n if and only if V = W ).
So once we have a finite dimensional space V , there is a hierarchy of subspaces: Precisely one subspace of dimension
n, namely V itself, a lot (an infinity) of subspaces of dimensions n−1, all the way down to dimension 1. To keep the
poor subspace consisting only of the zero vector happy by giving it a dimension, one says that the trivial subspace
has dimension 0. From a geometric point of view, if we think of Rn as a sort of n-dimensional replica of our familiar
3-space (or as our 3-space if n = 3, the plane if n = 2, a line if n = 1), one dimensional subspaces are lines through
the origin, two dimensional subspaces are planes containing the origin, three dimensional subspaces, well, they are
replicas of R3 containing the origin. Maybe it is a good idea to work for a while in the familiar environment of
3 space. Setting up a system of orthogonal

 coordinates, we can think of points of 3 space as being vectors. The
x
triple (written) as a column x =  y  can be interpreted as the point of coordinates x, y, z, or as the arrow
z
from the origin to the point of coordinates x, y, z. Choose the one you like best, it makes no difference. (In applied
situations it can make a difference, but here we are not in an applied situation.) Suppose we have a non-zero
vector;
vector constitutes a very small linearly independent set of a single element. Say
 a non-zero

b1
b =  b2  6= 0 (so at least one of b1 , b2 , b3 is not 0), then the subspace sp(b) is the set
b3




b1
cb1
sp(b) = {c  b2  : c a real number} = { cb2  : c a real number}.
b3
cb3
In other words, it consists of all points of coordinates x, y, z satisfying
x = cb1 ,
y = cb2 ,
z = cb3 , −∞ < c < ∞.
These are the parametric equations of a line through the origin in the direction of the vector b.One usually uses t
or s for the parameter instead of c, but that does not really matter.
What about the subspace spanned by two vectors a and b. If these vectors are linearly dependent, we are back
to a line (assume that neither is 0 to avoid wasting time). Both vectors are then on the same line through the
origin, and either one of them spans that line, and is a basis for the line. On the other hand, if the vectors are not
collinear (i.e., they are linearly independent), then sp(a, b) is the plane determined by the two vectors (thinking of
them as arrows from the origin). We might retake this example later on.
What about three vectors? Well, if they are linearly dependent and not all 0, they span a line or a plane
through the origin. If linearly independent, they are a basis of R3 and span R3 .
Since our main space may well be Rn (or Cn ), it may be a good idea to have some algorithm to decide when
vectors in Rn are linearly independent; know what they span. In the section on determinants we’ll see how to do
this with determinants, but for now we’ll use our old and a bit neglected friend, row reduction. We were writing
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
38
vectors of Rn as column vectors, but for the algorithm I have in mind it will be more convenient to write them
as rows. After this I’ll give you a second algorithm where you write. Well, lets try to be more or less consistent
and keep writing the vectors as columns; in the first algorithm we’ll just transpose to get rows. So assume given
m vectors in Rn , say






v1m
v12
v11
 v2m 
 v22 
 v21 






v1 =  .  , v2 =  .  , . . . , vm =  .  ,
 .. 
 .. 
 .. 
vnm
vn2
vn1
The problem to solve is: determine the dimension of sp(v1 , . . . , vm ) and find a basis for the subspace W =
sp(v1 , . . . , vm ).
For both algorithms we introduce the matrix I’ll call M whose columns are the vectors; that is,


v11 v12 · · · v1m
 v21 v22 · · · v2m 


M = .
..
..
.. 
 ..
.
.
. 
vn1
vn2
···
vnm
Algorithm 1 Let N = M T . The vectors v1 , . . . , vm are the rows of N . Row reduce N to RRE form. The non-zero
rows are then a basis for W ; write them again as column vectors.
Why does this work? It works because row operations do not change the span of the row vectors. That is fairly
easy to see. So once you are in RRE form, the rows of the RRE form still span the same subspace W . But because
the non-zero ones all start with a leading 1, and everything above and below that 1 is 0, it is easy to see that
the non-zero rows are linearly independent, thus a basis. This basis, of course, might contain no vector from the
original spanning set.
Algorithm 2. This one is a bit harder to explain, but one row reduces directly the matrix M , bringing it to RRE
form. Now go to the original spanning set v1 , . . . , vm and discard every vector that was in a column of M which
now does NOT have a leading 1. That is, keep only the original vectors that were in a column that now has a
leading 1. These remaining vectors form a basis for W . The advantage of this algorithm is that the basis is made
up out of vectors from the original set.
Lets illustrate this with a somewhat messy example. Messy examples can sometimes be the best. Sometimes
they are the worst. NOTICE: This is only an example! No instructor would be sadistic enough to have you do a
computation like this one by hand. I just thought it might be good to occasionally deal with larger systems, and
understand in the process why computers are a great invention.
Consider the following 8 vectors in R7 :










 
1
3
1
1
1
1
 2 
 6 
 −3 
 1 
 −8 
 0 










 
 3 
 9 
 −2 
 1 
 −7 
 1 










 










 
−1
−3
4
0
9
v1 = 
 , v2 = 
 , v3 = 
 , v4 = 
 , v5 = 
 , v6 =  1  ,
 −2 
 −6 
 0 
 5 
 2 
 1 










 
 4 
 12 
 1 
 −2 
 −2 
 1 
−2
−6
5
−1
12
1





v7 = 




0
2
1
0
1
1
−1










 , v8 = 








3
−3
−1
8
−5
10
8





.




We want to find the dimension of the subspace W = sp(v1 , . . . , v8 ) they span, and a basis of this spanned
subspace. If perchance the dimension is 7 (it won’t be), then they span R7 and the basis we would get would be a
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
39
basis of R7 . The matrix M is





M =





3
1
1
1 1
0
3
6 −3
1 −8 0
2 −3 

9 −2
1 −7 1
1 −1 

−3
4
0
9 1
0
8 

−6
0
5
2 1
1 −5 

12
1 −2 −2 1
1 10 
−6
5 −1 12 1 −1
8
1
2
3
−1
−2
4
−2
The transpose matrix is

N = MT





=





2
3 −1 −2
4
6
9 −3 −6 12
−3 −2
4
0
1
1
1
0
5 −2
−8 −7
9
2 −2
0
1
1
1
1
2
1
0
1
1
−3 −1
8 −5 10
1
3
1
1
1
1
0
3
−2
−6
5
−1
12
1
−1
8












To row reduce this matrix, I used Excel. Moreover, because at every row reduction the space spanned by the row
vectors is always the same, it isn’t really necessary to reach RRE; it suffices to stop once one can see that the
non-zero rows are linearly independent. With the help of Excel, I carried out the following operations on N , in the
indicated order:
III(2)−3(1) ,
III(3)−(1) ,
III(4)−(1) ,
III(1)−2(2) ,
III(3)+5(2) ,
III(4)+10(2) ,
II5(4) ,
III(4)−2(3) ,
II5(5) ,
III(5)−(1) ,
III(5)+2(2) ,
III(5)+3(3) ,
At this point I got the following matrix:












1
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
III(6)−(1) ,
II5(6) ,
III(8)−3(1) ,
III(6)−2(2) ,
III(6)−8(3) ,
I(2,8) ,
III(7)+9(2) ,
I(4,5) ,
I(2,3) ,
III(4)−2(3) ,
III(6)−(4) ,
II−(2)
I(4,7)
III(6)+2(5) .

−1
1
12 −8
0
2 −1 −7
6 −1 

5
0 −33 27
2 

0 10 −24 26 11 

0
0
11 −9
1 

0
0
0
0
0 

0
0
0
0
0 
0
0
0
0
0
I doubt I could have done this without Excel! Too many possibilities of mistakes. There is no real need to continue.
Vectors in Rn of the form






a1
0
0
 ∗ 
 a2 
 0 






 ∗ 
 ∗ 
 a3 






 ∗ ,
 ∗ ,
 ∗ ,...






 .. 
 .. 
 .. 
 . 
 . 
 . 
∗
∗
∗
where a1 , a2 , . . . are non-zero scalars and the entries marked with a * could be anything (zero or non-zero), have
to be linearly independent. Any linear combination of them with coefficients c1 , c2 , c3 , . . . would result in a vector
whose first component is c1 a1 . If it is the zero vector, then c1 a1 = 0, hence c1 = 0. The second component of this
linear combination is c1 (∗) + c2 a2 (c1 times the second component of the first vector plus c2 a2 ). Since c1 = 0, we
get c2 a2 = 0, hence c2 = 0. And so forth.
7
LINEAR DEPENDENCE, INDEPENDENCE AND BASES
40
Returning to our example, we proved that the set of vectors






0
0
1

 1 
 0 
0 







 2 
 −1 
5 










0 
w1 = 
,
 1  , w2 =  −1  , w3 = 
 −33 
 −7 
 12 






 27 
 6 
 −8 
2
−1
0





w4 = 




0
0
0
10
−24
26
11






,








w5 = 




0
0
0
0
11
−9
1





.




spans the same space as the original set v1 , . . . , v8 ; since they are linearly independent they are a basis of this
subspace and its dimension is 5.
Solution by the second algorithm. In many ways, this is a better solution. We need to row reduce the matrix
M ; this time I will take it to RRE form. Using excel, of course.
The following row operations put this matrix into RRE form. I’m not sure one can do it in less, but you may
certainly try.
III(2)−(1) ,
III(4)−5(2) ,
III(3)−3(1) ,
III(5)−2(2) ,
III(7)+ 25 (3) ,
I(5,7) ,
III(4)+(1) ,
III(6)+3(2) ,
II5(5) ,
III(2)+5(6) ,
The RRE form is:
III(5)+2(1) ,
III(6)−4(1) ,
III(7)−7(2) ,
III(1)− 53 (5) ,
III(3)−(6) ,










II−(3) ,
III(2)− 52 (5) ,
III(4)−2(6) ,
1 3 0 0
0 0 1 0
0 0 0 1
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
III(7)+2(1) ,
III(1)− 45 (3) ,
III(6)+ 95 (5) ,
III(5)+11(6) ,
II− 15 (2) ,
III(1)−(2) ,
III(2)− 15 (3) ,
III(5)− 33
,
5 (3)
1
III(7)− 11
, II 25
(5) ,
5 (5)
III(7)+29(6) ,
III(3)+5(2) ,
III(6)+ 27
,
5 (3)
III(1)+7(6) ,
I4,6

−1 0 0
2
2 0 0
3 

0 0 0
0 

0 1 0 −2 

0 0 1
1 

0 0 0
0 
0 0 0
0
Looking at this matrix we notice that the columns that contain the leading 1’s of the non-zero rows are columns
1, 3, 4, 6, and 7. That means that of the original vectors, v1 , v3 , v4 , v6 , and v7 constitute a basis of the subspace
spanned by the original 8 vectors. Since this basis has 5 vectors, we again get that the dimension is 5, but now we
have a basis consisting of a subset of the original vectors.
7.1
Exercises
1. Show that the vector space of all m × n matrices; that is Mm,n (or, in the complex case, Mm,n (C) has
dimension m · n. Describe a basis.
2. In one of the examples in this section it was shown that the vectors








1
0
0
0
 0 
 1 


0 
0 








 −1 
 2 



5
0 














0 
w1 = 
 1  , w2 =  −1  , w3 = 
 , w4 =  10  ,
 12 
 −7 
 −33 
 −24 








 −8 
 6 
 27 
 26 
0
−1
2
11





w5 = 




0
0
0
0
11
−9
1










8
DETERMINANTS
41
constitute a basis for the subspace of R7 spanned by the eight vectors









1
1
1
3
1
 −8
 1 
 −3 
 6 
 2 









 −7
 1 
 −2 
 9 
 3 


















0
4
−3
−1
v1 = 
 , v5 =  9
 , v4 = 
 , v3 = 
 , v2 = 
 2
 5 
 0 
 −6 
 −2 









 −2
 −2 
 1 
 12 
 4 
12
−1
5
−6
−2





v7 = 




0
2
1
0
1
1
−1










 , v8 = 








3
−3
−1
8
−5
10
8










 , v6 = 








1
0
1
1
1
1
1





,









.




1. Express v1 as a linear combination of the vectors w1 , w2 , w3 , w4 , w5 . Show there is only one way of doing
this.
2. Express w1 as a linear combination of the vectors v1 , v2 , v3 , v4 , v5 , v6 , v7 , v8 in three different ways.
8
Determinants
Every square matrix (and only square matrices) gets assigned a number called its determinant. To repeat, if M
is a square matrix then the determinant of M , denoted usually by det(M ), is a scalar (real if the matrix is real;
otherwise complex). There are many (equivalent) ways of defining the determinant; I’ll select what could be a
favorite way of doing it because it also tells you how to compute it. At first glance the definition may look a bit
complex, but I hope that the examples will clear it up. A bit of practice will clear it up even more. By the way, in
this section I prove almost nothing; you’ll have to take my word that all I tell you is true.
The definition I have in mind is a recursive definition; which becomes a recursive algorithm for computing
determinants. Recursive procedures should be quite familiar to anybody who has done a bit of programming. Let
us start with the simplest case, a 1 × 1 matrix. That’s just a scalar enclosed in parentheses; for example (5) or
(1 + 3i). Mostly, one doesn’t even write the parentheses and identifies the set of 1 × 1 matrices with the set of
scalars. We define, for a 1 × 1 matrix
det(a) = a.
So the determinant of a scalar is the scalar. This could be called our base case. The next part of the definition
explains how to reduce computing the determinant of an n×n-matrix to computing determinants of (n−1)×(n−1)
matrices. I’ll give it first in words; after the examples I’ll write out the formulas in a more precise way. Here is the
full algorithm, which is usually known as Laplace’s expansion. Assume given an n × n square matrix.
Step 1. If n = 1, it is explained above what to do. Suppose from now on n ≥ 2.
Step 2. Assign a sign (“+” or “−”) to each position of the matrix, beginning by placing a + in position (1, 1)
and then alternating signs. This is totally independent of the entries in the matrix; the entry in + position
may well be negative, the one in a − position can be positive. For example, here is how this signs assignment
looks for 2 × 2, 3 × 3 and 4 × 4 matrices:




+ − + −
+ − +
 − + − + 
+ −


 − + − ,
,
 + − + − .
− +
+ − +
− + − +
8
DETERMINANTS
42
Step 3. Select a row of the matrix. One usually selects the row with the largest number of zero entries; all
things being equal one selects the first row. For each entry in that row compute the determinant of the
(n − 1) × (n − 1) matrix obtained by crossing out the selected row and the column containing the entry.
Multiply the determinant times the entry. If the entry is in a negative position, change the sign of this result.
Step 4. Add up the results of all the computations done in step 3. That’s the determinant.
An important fact here is that the result of the computation does not depend on the choice of the row. That’s not
so easy to justify with the tools at our disposal. Here is how we would compute the determinant of a 2 × 2 matrix
a b
A=
c d
Let us select the first row. The first entry is a; if we cross out the first row and the column containing a, we are
left with d. We multiply by a, getting ad. Since a is in a positive position, we leave it as it is. The next (and last)
entry of the first row is b. Crossing out the first row and the column containing b we are left with c. We multiply
b times c and, since b is in a negative place, change the sign to −bc.
Adding it all up gives ad − bc. Thus
a b
det(A) = det
= ad − bc
c d
What would have happened if we had chosen the second row? We would get the same determinant in the form
−cb + da.
A word on notation. Given a square n × n matrix, it is customary to write its determinant by replacing the
parentheses that enclose the array of entries of the matrix by vertical lines. Thus
a b c d = ad − bc.
So if the matrix A is given by



A=

then
a11
a21
..
.
a22
a22
..
.
···
···
..
.
ann
a2n
..
.
an1
an2
···
ann
a11
a21
det(A) = .
..
an1



,

a22
a22
..
.
···
···
..
.
ann
a2n
..
.
an2
···
ann
,
One slight problem with this notation is that the vertical lines look like absolute values, but determinants can be
negative (or even non-real).
Let us compute now a 3 × 3 determinant. While you should
there is no need to memorize any further formulas (assuming you
Expansion by the first row:
a b c d e f = a e f − b d
h i g
g h i memorize the formula for a 2 × 2 determinant,
know how Laplace’s expansion works).
d
f +
c
g
i
e h To complete the job we need to compute the three 2 × 2 determinants. Here is how one can compute a concrete
4 × 4 determinant. I will expand at first by the second row because it has a 0 term, and that cuts by one the
8
DETERMINANTS
43
number of 3 × 3 determinants to compute; all three by three determinants by the first row.
1
2
3
2 1
1 3
2 3
2
2 2 2 2 −3 −4
0
= (−2) −1 2 −2 + (−3) 1 2 −2 + 4 1 −1 −2 1 −1
2 −2 1
1 2
1 2
1
2 2 2 1
1
2
2
−1 2 −1 −2 2 −2 + 2
− 3
= (−2) 2 1 2 1
2 2
2 1 2 1 −2 2 −2 + 2
− 3
−3 1 2 1
2 2
2 1 −1 1 −2 −1 −2 + 2
− 2
+4 1
1 1
2 1
2 = (−2) 2(4 − (−4)) − 3(−2 − (−2)) + 2(−2 − 2) − 3 (4 − (−4)) − 3(2 − (−2)) + 2(2 − 2)
+4 ((−2) − (−2)) − 2(2 − (−2)) + 2(1 − (−1)) = −20.
Laplace’s method is useful for calculating the determinants of small matrices (up to 3 × 3, maybe 4 × 4), and
for matrices that have a lot of zeros. But it is not a very efficient method. I will list now the basic properties of
determinants.Some of these properties will allow us to calculate determinants more efficiently. Maybe. Some of
these properties are easy to verify, others are not so easy.
D1. If M is an n × n matrix and its rows are linearly dependent; equivalently, one row is a linear combination of
the other rows, then det(M ) = 0. In particular this holds if M has a zero row, or two equal rows.
D2. If M is an n × n, then det(M T ) = det(M ). The determinant of a matrix equals the determinant of its
transpose. Because of this, it turns out that one can compute the determinant of a matrix expanding along
a column rather than a row.
D3. If M is an n × n matrix whose columns are linearly dependent (equivalently, one column is a linear combination of the other columns), then det(M) = 0. In particular this holds if M has a zero column, or two equal columns. This property is, of course, an immediate consequence of properties D1, D2.
D4. The effect of row operations. Let M be an n × n matrix. If the matrix N is obtained from M by
1. interchanging two rows, that is applying I(i,j), i ≠ j, then det(N) = −det(M);
2. multiplying a row by a scalar c (operation IIc(i)), then det(N) = c det(M);
3. adding to a row i a row j times a scalar (operation III(i)+c(j), i ≠ j), then det(N) = det(M).
(These effects, together with D5, are checked numerically in the short sketch after this list.)
D5. If A, B are n × n matrices, then det(AB) = det(BA) = det(A) × det(B).
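Here is a short numerical check of properties D4 and D5, using NumPy on a random integer matrix; NumPy and the names used below (M, B, N1, and so on) are my own assumptions, not part of these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-3, 4, size=(4, 4)).astype(float)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)

d = np.linalg.det(M)

# D4.1: swapping two rows flips the sign
N1 = M.copy(); N1[[0, 2]] = N1[[2, 0]]
print(np.isclose(np.linalg.det(N1), -d))

# D4.2: multiplying a row by c multiplies the determinant by c
N2 = M.copy(); N2[1] *= 5.0
print(np.isclose(np.linalg.det(N2), 5.0 * d))

# D4.3: adding a multiple of one row to another leaves it unchanged
N3 = M.copy(); N3[3] += 2.0 * N3[0]
print(np.isclose(np.linalg.det(N3), d))

# D5: det(AB) = det(A) det(B)
print(np.isclose(np.linalg.det(M @ B), d * np.linalg.det(B)))
```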
With these properties we can compute determinants using row reduction. While computing 5 × 5 determinants is still a difficult thing to do without the aid of some calculating device (nice computer software, for example), it is a far better method than using the Laplace expansion. If programmed in some computer language, it is an algorithm that may be hard to beat for finding the determinant of medium sized matrices. To use it at maximum efficiency we need the following additional property, which is actually an easy consequence of the Laplace expansion. First a definition. A square matrix is said to be upper triangular if all entries below the main diagonal are 0. It is said to be lower triangular if all entries above the main diagonal are 0. The following matrix is upper triangular.

U = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1(n-1)} & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2(n-1)} & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3(n-1)} & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & a_{nn}
\end{pmatrix}
If you transpose it, it becomes lower triangular. The property in question is:
D6. The determinant of an upper or a lower triangular matrix equals the product of its diagonal entries.
Examples: Calculate the determinants of the following matrices:

a) A = \begin{pmatrix} 1 & -5 & 6 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{pmatrix}, \qquad
b) B = \begin{pmatrix} 2 & 0 & 0 & 0 \\ -3 & 5 & 0 & 0 \\ 4 & 4 & -3 & 0 \\ 1 & 2 & 5 & 6 \end{pmatrix}, \qquad
c) C = \begin{pmatrix} 1 & -5 & 6 & 7 & 8 \\ 0 & 2 & 3 & 4 & 5 \\ 0 & 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 7 \end{pmatrix}.

Solution.
a) det(A) = 1 · 2 · 2 = 4,
b) det(B) = 2 · 5 · (−3) · 6 = −180,
c) det(C) = 1 · 2 · 0 · 2 · 7 = 0.
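As a quick sanity check of D6, the following NumPy snippet (my own addition, not part of the notes) compares the product of the diagonal entries with the library determinant for the three matrices above.

```python
import numpy as np

A = np.array([[1, -5, 6],
              [0,  2, 3],
              [0,  0, 2]], dtype=float)
B = np.array([[ 2, 0,  0, 0],
              [-3, 5,  0, 0],
              [ 4, 4, -3, 0],
              [ 1, 2,  5, 6]], dtype=float)
C = np.array([[1, -5, 6, 7, 8],
              [0,  2, 3, 4, 5],
              [0,  0, 0, 3, 3],
              [0,  0, 0, 2, 1],
              [0,  0, 0, 0, 7]], dtype=float)

for name, M in [("A", A), ("B", B), ("C", C)]:
    # product of the diagonal entries vs. the library determinant
    print(name, np.prod(np.diag(M)), round(np.linalg.det(M), 10))
# expected: 4, -180 and 0 in both columns (up to rounding)
```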
The idea now is to row reduce the matrix to upper (or lower; upper is more convenient) triangular form, keeping track of how the row reductions affect the determinant. If we interchange rows, we multiply the determinant by −1; if we multiply a row by a constant, we divide the determinant by that constant; if we do an operation of type III, the determinant stays the same. I'll illustrate this by computing again the determinant of the 4 × 4 matrix computed before. That is, I'll compute
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}.
First we work on the first column. We perform the operations III(2)−2(1), III(3)−(1), III(4)−(1) on the matrix; the determinant is unchanged. We get

\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
=
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & -7 & -10 & -4 \\ 0 & -3 & -1 & -4 \\ 0 & -1 & -1 & 0 \end{vmatrix}

We can now exchange lines 2 and 4; this changes the sign of the determinant, so to compensate we multiply the new determinant by −1 and get

\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= -
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & -1 & -1 & 0 \\ 0 & -3 & -1 & -4 \\ 0 & -7 & -10 & -4 \end{vmatrix}

Next I multiply the second row by −1. To compensate I need to divide by −1, which of course is the same as changing the sign:

\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
=
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & -3 & -1 & -4 \\ 0 & -7 & -10 & -4 \end{vmatrix}

Next I perform operations III(3)+3(2), III(4)+7(2). This does not change the determinant.
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
=
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 2 & -4 \\ 0 & 0 & -3 & -4 \end{vmatrix}
2 0 0 −3 −4 Trying to avoid working with fractions, or just for the fun of it, I’ll multiply the 3rd row by 3 and the fourth row
by 2. I need to compensate by dividing the determinant by 2 × 3 = 6:
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= \frac{1}{6}
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 6 & -12 \\ 0 & 0 & -6 & -8 \end{vmatrix}
Finally, I perform operation III(4)+(3), which does not change the determinant (and puts the matrix in upper triangular form), to get

\begin{vmatrix} 1 & 2 & 3 & 2 \\ 2 & -3 & -4 & 0 \\ 1 & -1 & 2 & -2 \\ 1 & 1 & 2 & 2 \end{vmatrix}
= \frac{1}{6}
\begin{vmatrix} 1 & 2 & 3 & 2 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 6 & -12 \\ 0 & 0 & 0 & -20 \end{vmatrix}
= \frac{1}{6}\,(1 \cdot 1 \cdot 6 \cdot (-20)) = -20.
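Here is a small Python sketch of determinant-by-row-reduction; it pivots, tracks the sign changes from row swaps, uses only operations of types I and III, and then multiplies the diagonal entries as in D6. The function name det_by_elimination is mine. It reproduces −20 for the matrix above.

```python
def det_by_elimination(m):
    """Determinant via row reduction to upper triangular form.

    Uses only row swaps (each one flips the sign) and operations of
    type III (which leave the determinant unchanged), then multiplies
    the diagonal entries, as in property D6.
    """
    a = [row[:] for row in m]        # work on a copy
    n = len(a)
    sign = 1.0
    for k in range(n):
        # pick the largest pivot in column k, at or below row k
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0:
            return 0.0               # no pivot: the determinant is 0
        if p != k:
            a[k], a[p] = a[p], a[k]  # row swap flips the sign
            sign = -sign
        for i in range(k + 1, n):    # clear the entries below the pivot
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    det = sign
    for k in range(n):
        det *= a[k][k]
    return det

A = [[1,  2,  3,  2],
     [2, -3, -4,  0],
     [1, -1,  2, -2],
     [1,  1,  2,  2]]
print(det_by_elimination(A))   # -20.0 (up to floating point rounding)
```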
What are determinants good for? Well, here is a first application.
Theorem 7 Let

v_1 = \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{n1} \end{pmatrix}, \quad \ldots, \quad
v_m = \begin{pmatrix} v_{1m} \\ v_{2m} \\ \vdots \\ v_{nm} \end{pmatrix}

be m vectors in R^n. They are linearly independent if and only if the matrix whose columns consist of the vectors, that is the matrix

M = \begin{pmatrix}
v_{11} & v_{12} & \cdots & v_{1m} \\
v_{21} & v_{22} & \cdots & v_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
v_{n1} & v_{n2} & \cdots & v_{nm}
\end{pmatrix}

has at least one m × m submatrix with a non-zero determinant. More generally, the dimension of the span of v_1, . . . , v_m is k if and only if M contains a k × k submatrix with non-zero determinant, and every (k + 1) × (k + 1) submatrix has 0 determinant.
I'll try to explain why this result must hold. First of all, by a submatrix of a matrix we mean either the matrix itself, or any matrix obtained by crossing out some rows and/or columns of the original matrix. For example, the 3 × 4 matrix

\begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & \ell \end{pmatrix}

has a total of 18 2 × 2 submatrices, of which a few are

\begin{pmatrix} a & b \\ e & f \end{pmatrix}, \quad
\begin{pmatrix} a & c \\ e & g \end{pmatrix}, \quad
\begin{pmatrix} b & d \\ j & \ell \end{pmatrix}, \quad
\begin{pmatrix} g & h \\ k & \ell \end{pmatrix}.
Before I try to explain why this theorem is true (an explanation you may skip if you trust everything I tell you), I want to consider some cases. Suppose m > n. Then there is no way we can find an m × m submatrix with a non-zero determinant, for the simple reason that you can't find an m × m submatrix when you have fewer than m rows. This simply reflects the fact that the dimension of R^n is n, and if m > n, a set of m vectors cannot be independent. So we may just restrict considerations to the case m ≤ n. Suppose m = n. Then the theorem tells us that the n vectors are linearly independent, hence a basis of R^n, if and only if the determinant of M (in this case the one and only n × n submatrix) is different from 0.
Here is a reason for the theorem. To simplify, I replace the matrix M by its transpose N; N = M^T, so N is m × n and the vectors are the rows of N. Because the determinant of a matrix and its transpose are the same, not much changes. Now suppose I row reduce N. A bit of reflection shows that, in doing this, every row operation on N induces a corresponding row operation on its submatrices; the submatrices are being row reduced along with N. The determinant can change under a row operation, but only by multiplication by a non-zero scalar. So, if we reduce N to RRE form, any submatrix of the original matrix N with a non-zero determinant will end up as a submatrix of the RRE form of N with a non-zero determinant. And conversely, zero determinants stay zero determinants. As we learned before, the dimension of the subspace spanned by the vectors v_1, . . . , v_m is the number of non-zero rows of the RRE form. A bit of reflection shows that if this number is k, then the largest non-zero determinant
you can get in the RRE form is the k × k determinant obtained by crossing out all zero rows, and all columns not
containing a leading 1.
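Here is a small NumPy illustration of Theorem 7 on an example of my own making (not from the notes): three vectors in R^4, one of which is the sum of the other two, so the span has dimension 2 and every 3 × 3 submatrix should have determinant 0.

```python
import numpy as np
from itertools import combinations

# three vectors in R^4 (my own example); v3 = v1 + v2, so the set is dependent
v1 = np.array([1.0, 0.0, 2.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0, 3.0])
v3 = np.array([1.0, 1.0, 3.0, 4.0])

M = np.column_stack([v1, v2, v3])      # 4 x 3 matrix, columns are the vectors

# determinants of all 3 x 3 submatrices (choose 3 of the 4 rows)
dets = [np.linalg.det(M[list(rows), :]) for rows in combinations(range(4), 3)]
print(any(abs(d) > 1e-9 for d in dets))   # False: no non-zero 3 x 3 minor
print(np.linalg.matrix_rank(M))           # 2, the dimension of the span
```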
A very important application is the following.
Theorem 8 Let A be a square n × n matrix. Then A is invertible if and only if det(A) ≠ 0.
I will try to at least give a reason why this theorem holds. Warning: Some of my arguments could be circular! But circles are, after all, consistent. Let us suppose first that A is a square invertible matrix. If we want to solve any system of linear equations having A as its matrix, a system of the form

Ax = b,

there is a theoretically simple way of doing it. I say "theoretically simple," because it isn't the best to actually use in practice. The method is to multiply both sides on the left by the inverse. Well, let's be precise. Assume first there is a solution x. Then, because Ax = b, we have A^{-1}(Ax) = A^{-1}b, (A^{-1}A)x = A^{-1}b, Ix = A^{-1}b, x = A^{-1}b. In other words, the only possible solution is x = A^{-1}b. Conversely, if we take x = A^{-1}b, then we verify at once that Ax = b. That is:

If A is an invertible n × n square matrix, then the system of linear equations Ax = b has a unique solution for every b ∈ R^n.
Fine. Now suppose A is a square, n × n matrix such that the equation Ax = b has a unique solution for every b ∈ R^n. With e_1, . . . , e_n being the canonical basis of R^n, we solve the equations

Ax^{(1)} = e_1, \quad Ax^{(2)} = e_2, \quad \ldots, \quad Ax^{(n)} = e_n.

If these solutions are

x^{(1)} = \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{n1} \end{pmatrix}, \quad \ldots, \quad
x^{(n)} = \begin{pmatrix} x_{1n} \\ x_{2n} \\ \vdots \\ x_{nn} \end{pmatrix},

and we use them as columns of a matrix X; that is

X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nn}
\end{pmatrix}
Then we see at once that AX = I, showing A is invertible. So we also have
If A is a square n × n matrix such that for every b ∈ Rn the system of equations Ax = b has a solution, then
A is invertible. And the solution is unique.
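As a sketch of this construction (with NumPy and an example matrix of my own choosing, neither of which comes from the notes): solve Ax^{(i)} = e_i for each column of the identity and assemble the solutions into X.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
n = A.shape[0]

# solve A x(i) = e_i for each column of the identity and
# use the solutions as the columns of X
I = np.eye(n)
X = np.column_stack([np.linalg.solve(A, I[:, i]) for i in range(n)])

print(np.allclose(A @ X, I))             # True: X is an inverse of A
print(np.allclose(X, np.linalg.inv(A)))  # True
```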
Putting it all together we see that a square n × n matrix is invertible if and only if the systems Ax = b have a
unique solution for every choice of b ∈ Rn . Well, if we recall Theorem 1, we see that this is equivalent to the RRE
form of the matrix A being the identity matrix. Let’s state this as a theorem.
Theorem 9 A square n × n matrix A is invertible if and only if its RRE form is the n × n identity matrix.
And now we can explain why Theorem 8 must be true. When we row reduce a matrix the determinant might change sign (if we interchange rows) or get multiplied by a constant (if we do that to a row). But if it starts out different from 0, it will end up different from 0. And, of course, vice versa. If A is invertible, its RRE form is I, det(I) = 1 ≠ 0, thus det(A) ≠ 0. On the other hand, for a square matrix, the only way one can avoid getting I as the RRE form is if one runs into a zero row or a zero column. In either case, the determinant is 0. Thus the only square matrix in RRE form with non-zero determinant is I. So if det(A) ≠ 0, then because the determinant of its RRE form also must be ≠ 0, the RRE form must be I and A is invertible.
We conclude this section with a new method for inverting matrices. It is probably the best method to use for
2 × 2 matrices, acceptable for 3 × 3 matrices, not so good for higher orders. To explain it, and also to write out
in a more precise way the computation of determinants by Laplace expansion, it is convenient to develop some
additional jargon.
Let A = (a_{ij})_{1≤i,j≤n} be a square n × n matrix (n ≥ 2). If (i, j) is a position in the matrix, I will denote by \mu_{ij}(A) the (n − 1) × (n − 1) matrix obtained from A by eliminating the i-th row and the j-th column. An n × n matrix has n^2 such submatrices. For example, if

A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix},

then these nine matrices are

\mu_{11}(A) = \begin{pmatrix} e & f \\ h & i \end{pmatrix}, \quad
\mu_{12}(A) = \begin{pmatrix} d & f \\ g & i \end{pmatrix}, \quad
\mu_{13}(A) = \begin{pmatrix} d & e \\ g & h \end{pmatrix}, \quad
\mu_{21}(A) = \begin{pmatrix} b & c \\ h & i \end{pmatrix}, \quad
\mu_{22}(A) = \begin{pmatrix} a & c \\ g & i \end{pmatrix},

\mu_{23}(A) = \begin{pmatrix} a & b \\ g & h \end{pmatrix}, \quad
\mu_{31}(A) = \begin{pmatrix} b & c \\ e & f \end{pmatrix}, \quad
\mu_{32}(A) = \begin{pmatrix} a & c \\ d & f \end{pmatrix}, \quad
\mu_{33}(A) = \begin{pmatrix} a & b \\ d & e \end{pmatrix}.
The (i, j)-th minor of the matrix A is defined to be the determinant of \mu_{ij}(A). The (i, j)-th cofactor is the same as the minor if (i, j) is a positive position, minus the minor otherwise. I will denote the (i, j)-th cofactor of A by c_{ij}(A); a convenient way of writing it is

c_{ij}(A) = (-1)^{i+j} \det(\mu_{ij}(A)).

The factor (-1)^{i+j} is 1 precisely if (i, j) is a positive position; −1 otherwise.
The adjunct matrix A† is defined as the transpose of the matrix of cofactors:

A^\dagger = (c_{ij}(A))^T.
This matrix is interesting because of the following result:
Theorem 10 Let A be a square matrix. Then
AA† = A† A = (det(A))I.
In particular, if det A = 0 then AA† is the zero matrix; not a very interesting observation. What is more interesting is that if det A ≠ 0, we can divide by the determinant of A and get

A\left(\frac{1}{\det A}\,A^\dagger\right) = \left(\frac{1}{\det A}\,A^\dagger\right)A = I.
This provides an alternative reason for why Theorem 8 is true and shows:

Theorem 11 If det(A) ≠ 0, then

A^{-1} = \frac{1}{\det A}\,A^\dagger.
Let us use this method to try to find again the inverse of the matrix

A = \begin{pmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{pmatrix}.

First, we compute the determinant. No point in doing anything else if the determinant is 0. Expanding by the last row,

\det(A) = \begin{vmatrix} 1 & 2 & 2 \\ 2 & 3 & 1 \\ 1 & 0 & 1 \end{vmatrix}
= \begin{vmatrix} 2 & 2 \\ 3 & 1 \end{vmatrix} + \begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix}
= (2 - 6) + (3 - 4) = -5 \neq 0.  (14)
The matrix is invertible. Next we compute the 9 cofactors.

c_{11}(A) = +\begin{vmatrix} 3 & 1 \\ 0 & 1 \end{vmatrix} = 3, \quad
c_{12}(A) = -\begin{vmatrix} 2 & 1 \\ 1 & 1 \end{vmatrix} = -1, \quad
c_{13}(A) = +\begin{vmatrix} 2 & 3 \\ 1 & 0 \end{vmatrix} = -3,  (15)

c_{21}(A) = -\begin{vmatrix} 2 & 2 \\ 0 & 1 \end{vmatrix} = -2, \quad
c_{22}(A) = +\begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} = -1, \quad
c_{23}(A) = -\begin{vmatrix} 1 & 2 \\ 1 & 0 \end{vmatrix} = 2,

c_{31}(A) = +\begin{vmatrix} 2 & 2 \\ 3 & 1 \end{vmatrix} = -4, \quad
c_{32}(A) = -\begin{vmatrix} 1 & 2 \\ 2 & 1 \end{vmatrix} = 3, \quad
c_{33}(A) = +\begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix} = -1.

The cofactor matrix is

\begin{pmatrix} 3 & -1 & -3 \\ -2 & -1 & 2 \\ -4 & 3 & -1 \end{pmatrix}.

Transposing we get the adjunct:

A^\dagger = \begin{pmatrix} 3 & -2 & -4 \\ -1 & -1 & 3 \\ -3 & 2 & -1 \end{pmatrix}.

The inverse is

A^{-1} = -\frac{1}{5}\begin{pmatrix} 3 & -2 & -4 \\ -1 & -1 & 3 \\ -3 & 2 & -1 \end{pmatrix}
= \begin{pmatrix} -\tfrac{3}{5} & \tfrac{2}{5} & \tfrac{4}{5} \\ \tfrac{1}{5} & \tfrac{1}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & -\tfrac{2}{5} & \tfrac{1}{5} \end{pmatrix}.
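Here is a short Python sketch of the cofactor method; the helper names (minor_matrix, det, adjunct) are mine, not notation from these notes. Run on the matrix above, it reproduces the determinant −5, the adjunct, and the inverse just computed.

```python
from fractions import Fraction

def minor_matrix(a, i, j):
    """The matrix mu_ij(A): delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Laplace expansion along the first row (fine for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor_matrix(a, 0, j))
               for j in range(len(a)))

def adjunct(a):
    """Transpose of the matrix of cofactors."""
    n = len(a)
    cof = [[(-1) ** (i + j) * det(minor_matrix(a, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]   # transpose

A = [[1, 2, 2],
     [2, 3, 1],
     [1, 0, 1]]
d = det(A)                      # -5
adj = adjunct(A)                # [[3, -2, -4], [-1, -1, 3], [-3, 2, -1]]
inv = [[Fraction(x, d) for x in row] for row in adj]   # exact fractions
print(d, adj, inv, sep="\n")
```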
Notes. Definitions of determinants vary. Of course, the end product is always the same! At an elementary level, a common approach is to define the determinant of a square matrix recursively as the scalar you obtain when expanding by Laplace along the first row. Using our notation for submatrices this definition would look somewhat like this: Let A = (a_{ij})_{1≤i,j≤n} be a square n × n matrix, say n ≥ 2 (one can start, as we did, from n = 1, but most texts will begin with n = 2).

1.) If n = 2 then det(A) = a_{11}a_{22} − a_{12}a_{21}. (Recursion base.)

2.) If n ≥ 3, define

\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(\mu_{1j}(A)),

where (as above) \mu_{1j}(A) is the matrix obtained from A by crossing out the first row and the j-th column. (Reduction of the n case to the n − 1 case.)
The problem with this as a definition, rather than a theorem, is that it is quite hard to use it to verify any properties of the determinant. In particular, it isn't easy to show, beginning with this definition, that you can actually expand using any row, or even column, not just the first row. I won't go into this any further, except to write out now the Laplace expansion formulas for the determinant, since so far I only gave them in words. It holds, for every i, 1 ≤ i ≤ n, that

\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(\mu_{ij}(A)).

This is the expansion by rows. A similar result holds for columns. For each j such that 1 ≤ j ≤ n,

\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(\mu_{ij}(A)).
Laplace vs. Row Reduction. To compute a 2 × 2 determinant one has to perform 3 operations: 2 products and a difference. Using Laplace, computing a 3 × 3 determinant reduces to computing three 2 × 2 determinants; a 4 × 4 determinant "reduces" to computing four 3 × 3 determinants, thus 4 × 3 = 12 two by two determinants. And so forth. For an n × n matrix, there will be approximately n! = n(n − 1) · · · 3 · 2 operations involved in computing a determinant by Laplace's method. This is computationally very, very bad. It's OK for small matrices, but as the matrices get larger, it becomes a very bad method. Of course, if the matrix is what is called a sparse matrix, that is, has a lot of zero entries, then it could be a reasonable method.

Consider now row reducing the matrix to triangular form. To get the first column to have all entries under the first one equal to 0, you need to pivot (perhaps; i.e., get a non-zero entry into position (1, 1)), let's call this one operation though it hardly takes time on a computer, divide each entry of the first row by the entry in position (1, 1) (n operations), and then for each row below the first, multiply the first row by the leading entry of that row and subtract it from the row in question (roughly n operations per row). That is, we have a maximum of about 1 + n + (n − 1)n = n^2 + 1 operations. We have to repeat similar operations for the other columns; to simplify, let's say we go all the way to RRE form, so that we have n^2 + 1 operations per column. That is a total of n(n^2 + 1) < (n + 1)^3 operations. Now, for small values of n, there is little difference. For example, if n = 4, then 4! = 24 and (4 + 1)^3 = 125. Row reduction seems worse (it actually isn't). But suppose n = 10. Then

10! = 3,628,800, while 11^3 = 1331.

Row reduction is a polynomial time algorithm, while Laplace is super-exponential.
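To make the comparison concrete, here is a tiny Python check of the two growth rates (the rough counts n! and (n + 1)^3 are the ones used above; the snippet is my own addition).

```python
from math import factorial

for n in (4, 6, 8, 10, 15):
    # rough operation counts: Laplace ~ n!, row reduction < (n + 1)^3
    print(n, factorial(n), (n + 1) ** 3)
# at n = 10: 3628800 vs. 1331; by n = 15 the gap is astronomical
```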
Cramer's rule. The expression of the inverse of a matrix in terms of the adjunct matrix gives rise to a very popular method, called Cramer's rule, for solving linear systems of equations in which the system matrix is square (as many equations as unknowns); it is so named for Gabriel Cramer, an 18th-century Swiss mathematician who supposedly first stated it. In words, it states that the solution

x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}

of the n × n system Ax = b can be found as follows. The component x_j equals the determinant of the matrix obtained from A by replacing the j-th column by b, divided by the determinant of A. The advantage of this method, which only works if det A ≠ 0, is that you can compute one component of the solution without having to compute the others. The disadvantage is that a computer might require more time to find one component by this method than to find all of them by row reduction.
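Here is a short NumPy sketch of Cramer's rule on a made-up 3 × 3 system; the matrix is the one inverted earlier, but the right-hand side and the function name cramer_solve are my own. It is checked against numpy.linalg.solve.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve A x = b by Cramer's rule (requires det(A) != 0)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                 # replace the j-th column by b
        x[j] = np.linalg.det(Aj) / d
    return x

A = [[1, 2, 2],
     [2, 3, 1],
     [1, 0, 1]]
b = [1, 2, 3]
print(cramer_solve(A, b))
print(np.linalg.solve(np.array(A, dtype=float), b))   # same answer
```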
8.1 Exercises

To come.