MATHEMATICS
Algebra, geometry, combinatorics
Dr Mark V Lawson
October 24, 2014
Contents
1 The nature of mathematics
1.1 What are algebra, geometry and combinatorics?
1.1.1 Algebra
1.1.2 Geometry
1.1.3 Combinatorics
1.2 The scope of mathematics
1.3 Pure versus applied mathematics
1.4 The antiquity of mathematics
1.5 The modernity of mathematics
1.6 The legacy of the Greeks
1.7 The legacy of the Romans
1.8 What they didn’t tell you in school
1.9 Further reading and links
2 Proofs
2.1 How do we know what we think is true is true?
2.2 Three fundamental assumptions of logic
2.3 Examples of proofs
2.3.1 Proof 1
2.3.2 Proof 2
2.3.3 Proof 3
2.3.4 Proof 4
2.3.5 Proof 5
2.4 Axioms
2.5 Mathematics and the real world
2.6 Proving something false
2.7 Key points
2.8 Mathematical creativity
2.9 Set theory: the language of mathematics
2.10 Proof by induction
3 High-school algebra revisited
3.1 The rules of the game
3.1.1 The axioms
3.1.2 Indices
3.1.3 Sigma notation
3.1.4 Infinite sums
3.2 Solving quadratic equations
3.3 Order
3.4 The real numbers
4 Number theory
4.1 The remainder theorem
4.2 Greatest common divisors
4.3 The fundamental theorem of arithmetic
4.4 Modular arithmetic
4.4.1 Congruences
4.4.2 Wilson’s theorem
4.5 Continued fractions
4.5.1 Fractions of fractions
4.5.2 Rabbits and pentagons
5 Complex numbers
5.1 Complex number arithmetic
5.2 The fundamental theorem of algebra
5.2.1 The remainder theorem
5.2.2 Roots of polynomials
5.2.3 The fundamental theorem of algebra
5.3 Complex number geometry
5.3.1 sin and cos
5.3.2 The complex plane
5.3.3 Arbitrary roots of complex numbers
5.3.4 Euler’s formula
5.4 Making sense of complex numbers
5.5 Radical solutions
5.5.1 Cubic equations
5.5.2 Quartic equations
5.5.3 Symmetries and particles
5.6 Gaussian integers and factorizing primes
6 Rational functions
6.1 Numerical partial fractions
6.2 Analogies
6.3 Partial fractions
6.4 Integrating rational functions
7 Matrices I: linear equations
7.1 Matrix arithmetic
7.1.1 Basic matrix definitions
7.1.2 Addition, subtraction, scalar multiplication and the transpose
7.1.3 Matrix multiplication
7.1.4 Special matrices
7.1.5 Linear equations
7.1.6 Conics and quadrics
7.1.7 Graphs
7.2 Matrix algebra
7.2.1 Properties of matrix addition
7.2.2 Properties of matrix multiplication
7.2.3 Properties of scalar multiplication
7.2.4 Properties of the transpose
7.2.5 Some proofs
7.3 Solving systems of linear equations
7.3.1 Some theory
7.3.2 Gaussian elimination
7.4 Blankinship’s algorithm
8 Matrices II: inverses
8.1 What is an inverse?
8.2 Determinants
8.3 When is a matrix invertible?
8.4 Computing inverses
8.5 The Cayley-Hamilton theorem
8.6 Complex numbers via matrices
9 Vectors
9.1 Vector algebra
9.1.1 Addition and scalar multiplication of vectors
9.1.2 Inner, scalar or dot products
9.1.3 Vector or cross products
9.1.4 Scalar triple products
9.2 Vector arithmetic
9.2.1 i’s, j’s and k’s
9.3 Geometry with vectors
9.3.1 Position vectors
9.3.2 Linear combinations
9.3.3 Lines
9.3.4 Planes
9.3.5 Determinants
9.4 Summary of vectors
9.5 *Two vector proofs*
9.6 Quaternions
Chapter 1
The nature of mathematics
This chapter is a guide to the mathematics described in this book.
1.1 What are algebra, geometry and combinatorics?
1.1.1 Algebra
Algebra started as the study of equations. The simplest kinds of equations
are ones like
3x − 1 = 0
where there is only one unknown x and that unknown occurs to the power
1. This means we have x alone and not, say, $x^{1000}$. It is easy to solve this
specific equation. Add 1 to both sides to get
3x = 1
and then divide both sides by 3 to get
$x = \frac{1}{3}.$
This is the solution to my original equation and, to make sure, we check our
answer by calculating
$3 \cdot \frac{1}{3} - 1$
and observing that we really do get 0 as required. Even this simple example
raises an important point: to carry out these calculations, I had to know
what rules the numbers and symbols obeyed. You probably applied these
rules unconsciously, but in this book it will be important to know explicitly
what they are. The method used for the specific example above can be
applied to any equation of the form
ax + b = 0
as long as $a \neq 0$. Here a, b are specific numbers, probably real numbers, and
x is the real number I am trying to find. This equation is the most general
example of a linear equation in one unknown.
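The method just described can be sketched in a few lines of Python. This is only an illustration: the function name `solve_linear` is hypothetical and not taken from the book, and exact fractions are used so that the answer 1/3 is not rounded.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a*x + b = 0 for x, assuming a != 0 (hypothetical helper)."""
    a, b = Fraction(a), Fraction(b)
    if a == 0:
        # With a = 0 there is either no solution or every x is a solution.
        raise ValueError("a must be non-zero for a unique solution")
    # Subtract b from both sides, then divide both sides by a.
    return -b / a

x = solve_linear(3, -1)   # the equation 3x - 1 = 0 from the text
check = 3 * x - 1         # substituting back should give 0
```

Substituting the result back into the left-hand side, as the text does by hand, is the natural way to confirm the answer.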
If x occurs to the power 2 then we get
$ax^2 + bx + c = 0$
where $a \neq 0$. This is an example of a quadratic equation in one unknown. You
will have learnt a formula to solve such equations. But there is no reason to
stop at 2. If x occurs to the power 3 we get a cubic equation in one unknown
$ax^3 + bx^2 + cx + d = 0$
where $a \neq 0$. Solving such equations is much harder than solving quadratics
but there is also an algebraic formula for the roots. But there is no reason to
stop at cubics. We could look at equations in which x occurs to the power 4,
quartics, and once again there is a formula for finding the roots. The highest
power of x that occurs in such an equation is called its degree. These results
might lead you to expect that there are always algebraic formulae for finding
the roots of any polynomial equation whatever its degree. There aren’t.
For equations of degree 5, the quintics, and more, there are no algebraic
formulae which enable you to solve the equations. I don’t mean that no
formulae have yet been discovered, I mean that someone has proved that such
a formula is impossible, that someone being the young French mathematician
Evariste Galois (1811–1832), the James Dean of mathematics. Galois’s work
meant the end of the view that algebra was about finding formulae to solve
equations. We shall not study Galois’s work in this book but it has had a
huge impact on algebra. It is one of the reasons why the algebra you study
later in your university careers will look very different from the algebra you
studied at school. In fact, one of my goals in writing this book is to help you
navigate this transition.
I have talked about solving equations where there is one unknown but
there is no reason to stop there. We can also study equations where there
are any finite number of unknowns and those unknowns occur to any powers.
The best place to start is where we have any number of unknowns but each
unknown can occur only to the first power and no products of unknowns are
allowed. This means we are studying linear equations like
x + 2y + 3z = 4.
Our goal is to find all the values of x, y and z that satisfy this equation.
Thus the solutions are ordered triples (x, y, z). For example, both (0, 2, 0)
and (2, 1, 0) are solutions whereas (1, 1, 1) is not a solution. It is unusual to
have just one linear equation to solve. Usually we have two or more such as
x + 2y + 3z = 4 and x + y + z = 0.
We then need to find all the triples (x, y, z) that satisfy both equations
simultaneously. In fact, as you should check, all the triples
(λ − 4, 4 − 2λ, λ)
where λ is any number satisfy both equations. For this reason, we often
speak about simultaneous linear equations. It turns out that solving systems
of linear equations never becomes difficult however many unknowns there
are. The modern way of studying systems of linear equations uses matrix
theory.
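The check suggested above can also be carried out mechanically. The following Python sketch (the function names are hypothetical) tests the three sample triples against the first equation and then tests the family (λ − 4, 4 − 2λ, λ) against both equations for a range of whole-number values of λ.

```python
def first_equation(x, y, z):
    """True if x + 2y + 3z = 4."""
    return x + 2 * y + 3 * z == 4

def both_equations(x, y, z):
    """True if x + 2y + 3z = 4 and x + y + z = 0 hold together."""
    return first_equation(x, y, z) and x + y + z == 0

def family(lam):
    """The triple (lam - 4, 4 - 2*lam, lam) quoted in the text."""
    return (lam - 4, 4 - 2 * lam, lam)

samples = [first_equation(0, 2, 0), first_equation(2, 1, 0), first_equation(1, 1, 1)]
family_ok = all(both_equations(*family(lam)) for lam in range(-10, 11))
```

A finite check like this is not a proof. The proof is the substitution itself: (λ − 4) + 2(4 − 2λ) + 3λ = 4 and (λ − 4) + (4 − 2λ) + λ = 0 for every value of λ.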
That leaves studying equations where there are at least 2 unknowns and
where there are no constraints on the powers of the unknowns and the extent
to which they may be multiplied together. This is much more complicated.
If you only allow squares such as $x^2$ or products of at most two unknowns,
such as xy, then there are relatively simple methods for solving them. But,
even here, strange things happen. For example, the solutions to
$x^2 + y^2 = 1$
can be written (x, y) = (sin θ, cos θ). If you allow cubes or products of more
than two unknowns then you enter the world of subjects like algebraic geometry and even connect with current research.
In this book, I shall introduce you to the theory of polynomial equations
and also to the theory of linear equations. I shall also show you how to solve
equations that look like this
$ax^2 + bxy + cy^2 + dx + ey + f = 0.$
So far, I have been talking about the algebra of numbers. But I shall also
introduce you to the algebra of matrices, and the algebra of vectors, and the
algebra of subsets of a set, amongst others. In fact, I think the first shock
on encountering university mathematics can be summed up in the following
statement.
There is not one algebra, but many different algebras, each designed for different purposes.
These different algebras are governed by different sets of rules. For this
reason, it becomes crucial in university mathematics to make those rules
explicit. In this book, the algebra you studied at school I often call high-school algebra so we know what we are talking about.
In my description of solving equations, I have left to one side something
that probably seemed obvious: the nature of the solutions. These solutions
are of course numbers but what do we mean by ‘numbers’ ? You might think
that a number is a number but in mathematics this concept turns out to
be much more interesting than it might first appear. The everyday idea
of a number is essentially that of a real number. Informally, these are the
numbers that can be expressed as positive or negative decimals, with possibly
an infinite number of digits after the decimal point such as
π = 3.14159265358...
where the dots indicate that this can be continued forever. Whilst such
numbers are sufficient to solve linear equations in one unknown, they are
not enough to solve quadratics, cubics, quartics etc. These require the introduction of complex numbers which involve such apparent ineffabilities as
the square root of minus one. Because such numbers don’t occur in everyday
life, there is a temptation to view them as somehow artificial or of purely
theoretical interest. This is wrong with a capital w. All numbers are artificial, in that they are artefacts of our imaginations that help us to understand
the world. Although you can see examples of two things you cannot see the
number two. It is an idea, an abstraction. As for being of only theoretical
interest, it is worth noting that quantum mechanics, the theory that explains
the behaviour of atoms and their constituents, uses complex numbers in an
essential way. In fact, for mathematicians the word ‘number’ usually means
‘complex number’ and mathematics is unthinkable without them.
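To see concretely why real numbers are not enough for quadratics, here is a minimal Python sketch. It assumes the usual quadratic formula, which is mentioned earlier but not stated at this point, and it uses the standard `cmath` module so that the square root of a negative number is allowed. The function name `quadratic_roots` is hypothetical.

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of a*x**2 + b*x + c = 0, assuming a != 0."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    # cmath.sqrt accepts negative (and complex) arguments.
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + root_of_discriminant) / (2 * a),
            (-b - root_of_discriminant) / (2 * a))

# x^2 + 1 = 0 has no real solution, but it has two complex ones.
r1, r2 = quadratic_roots(1, 0, 1)
residuals = [abs(r * r + 1) for r in (r1, r2)]
```

Both roots have zero real part: they are the two square roots of minus one.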
But this is not the end of our excavations of what we mean by the word
‘number’. There are occasions when we want to restrict the solutions: we
might want whole number solutions or solutions as fractions. It turns out
that the usual high-school methods for solving equations don’t work in these
cases. For example, consider the equation
2x + 4y = 3.
To find the real or complex solutions, we let x be any real or complex value
and then we can solve the equation to work out the corresponding value of
y. But suppose that we are only interested in whole number solutions? In
fact, there are none. You can see why by noting that the left-hand side of the
equation is exactly divisible by 2 for any whole numbers x and y, whereas the right-hand side isn’t. When
we are interested in solving equations, of whatever type, by means of whole
numbers or fractions we say that we are studying Diophantine equations. The
name comes from Diophantus of Alexandria who flourished around 250 CE,
and who studied such equations in his book Arithmetica. It is ironic that
solving Diophantine equations is often much harder than solving equations
using real or complex numbers.
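The divisibility argument used for 2x + 4y = 3 is a special case of a standard fact: ax + by = c has whole-number solutions exactly when the greatest common divisor of a and b divides c. The Python sketch below applies that test and compares it with a brute-force search over a small window of values. The function names and the search bound are illustrative assumptions.

```python
from math import gcd

def has_integer_solution(a, b, c):
    """True if a*x + b*y = c has whole-number solutions (a, b not both zero)."""
    return c % gcd(a, b) == 0

def brute_force(a, b, c, bound=50):
    """All solutions with -bound <= x, y <= bound, found by trying every pair."""
    return [(x, y)
            for x in range(-bound, bound + 1)
            for y in range(-bound, bound + 1)
            if a * x + b * y == c]

no_solutions = brute_force(2, 4, 3)          # the equation from the text
some_solutions = brute_force(2, 4, 6, bound=5)
```

The brute-force search can only ever report what happens inside its window. It is the divisibility test that settles the question for all whole numbers.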
1.1.2 Geometry
If algebra is about manipulating symbols, then geometry is about pictures.
The Ancient Greeks developed geometry to a very high level. Some of their
achievements are recorded in Euclid’s book the Elements which I shall have
more to say about later. It developed the whole of what became known as
Euclidean geometry on the basis of a few rules known as axioms. This geometry gives every impression of being a faithful mathematical version of the
geometry of actual space and for that reason you might expect that, unlike
algebra, there is only one geometry and that’s that. In fact, it was discovered
in the nineteenth century that there are other mathematical geometries such
as spherical geometry and hyperbolic geometry. In the twentieth century, it
became apparent that even the space we inhabit was much more complex
than it appeared. First came the four dimensional geometry of special relativity and then the curved space-time of general relativity. Modern particle
physics suggests that there may be many more dimensions in real space than
we can see. So, in fact, we have the following.
There is not one geometry, but many different geometries, each
designed for different purposes.
In this book, I will only talk about three-dimensional Euclidean geometry,
but this is the gateway to all these other geometries.
This, however, is not the end of the story. In fact, any book about algebra
must also be about geometry. The two are indivisible but it was not always
like that. Unlike geometry which began with a sort of Big Bang in Ancient
Greece, algebra crystallized much more slowly over time and in different
places. There is even some algebra, disguised, in the Elements. In the 17th
century, René Descartes discovered the first connection between algebra and
geometry which will be completely familiar to you. For example, $x^2 + y^2 = 1$
is an algebraic equation, but it also describes something geometric: a circle
of unit radius centred on the origin. This connection between algebra and
geometry will play an important role in our study of linear equations and
vectors. But it is just a beginning.
If you are studying an algebra look for an accompanying geometry,
and if you are studying a geometry find a companion algebra.
This is quite a fancy way of saying things, but it boils down to the fact that
manipulating symbols is often helped by drawing pictures, and sometimes
the pictures are too complex so it is helpful to replace them with symbols. It’s
not a one-way street.
I want to give you some idea of why the connection between algebra and
geometry is so significant. Let me start with a problem that looks completely
algebraic. Problem: find all whole numbers a, b, c that satisfy the equation
$a^2 + b^2 = c^2$. I’ll write solutions that satisfy this equation as (a, b, c). Such
numbers are called Pythagorean triples. Thus (0, 0, 0) is a solution and so is
(3, 4, 5), and I can put in minus signs since when squared they disappear so
(−3, 4, −5) is a solution. In addition, if (a, b, c) is a solution so is (λa, λb, λc)
where λ is any whole number. I shall now show that this problem is equivalent
to one in geometry. Suppose first that $a^2 + b^2 = c^2$. We exclude the case
where c = 0 since then a = 0 and b = 0. We may therefore divide both sides
by $c^2$ and get
$\left(\frac{a}{c}\right)^2 + \left(\frac{b}{c}\right)^2 = 1.$
Recall that a rational number is a real number that can be written in the
form $\frac{u}{v}$ where u and v are whole numbers and $v \neq 0$. It follows that
$(x, y) = \left(\frac{a}{c}, \frac{b}{c}\right)$
is a rational point on the unit circle; that is, a point with rational co-ordinates.
On the other hand, if
$(x, y) = \left(\frac{m}{n}, \frac{p}{q}\right)$
is a rational point on the unit circle then
$\frac{(mq)^2}{(nq)^2} + \frac{(np)^2}{(nq)^2} = 1.$
Thus (mq, pn, nq) is a Pythagorean triple. We may therefore interpret our
algebraic question as a geometric one: to find all Pythagorean triples, find
all those points on the unit circle with centre the origin whose x and y coordinates are both rational. In fact, this can be used to get a very nice
solution to the original algebraic problem as we shall show later.
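The two directions of this correspondence can be tried out with exact fractions. In the Python sketch below the function names are hypothetical. The second function follows the construction in the text, sending the rational point (m/n, p/q) to the triple (mq, pn, nq).

```python
from fractions import Fraction

def triple_to_point(a, b, c):
    """Send a Pythagorean triple (a, b, c) with c != 0 to the point (a/c, b/c)."""
    return (Fraction(a, c), Fraction(b, c))

def point_to_triple(x, y):
    """Send a rational point (m/n, p/q) on the unit circle to (mq, pn, nq)."""
    x, y = Fraction(x), Fraction(y)
    m, n = x.numerator, x.denominator
    p, q = y.numerator, y.denominator
    return (m * q, p * n, n * q)

x, y = triple_to_point(3, 4, 5)        # the point (3/5, 4/5)
on_circle = (x * x + y * y == 1)       # exact arithmetic, no rounding
a, b, c = point_to_triple(x, y)        # gives back a multiple of (3, 4, 5)
```

The triple recovered this way need not be the one you started with. Here it is (15, 20, 25), which is 5 times (3, 4, 5), in line with the earlier remark that multiples of a solution are again solutions.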
1.1.3 Combinatorics
The term ‘combinatorics’ may not be familiar though the sorts of questions
it deals with are. Combinatorics is the branch of mathematics that deals
with arrangements and the counting of arrangements. The fact that it deals
in counting makes it sound like this should be an easy subject. In fact, it is
often very difficult. For example, counting lies behind probability theory, a
subject that can often defy intuition. Let me give you a simple example. In a
class of, say, 25 students, how likely do you think it is that two students will
share the same birthday? By this I mean, the same date and month, though
not year. Unless you’ve seen this problem before, I think the instinct is to say
‘not very’. This is because we imagine in our mind’s eye those 25 students
to be arranged across 365 days without any pair of students landing on the
same date. In fact the answer, which you can calculate using the methods of
this book, is just over a half. In other words, there is roughly the same chance of two
students sharing the same birthday as there is of tossing a coin and getting
heads. This little problem is often known as the birthday paradox. It is a
good example of where maths can be used to correct our faulty intuition. But
this is really a counting problem. To get the right answers to such problems,
you need to think about what you are counting in the right way.
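Here is one way to do the counting for the birthday problem, as a minimal Python sketch. It assumes 365 equally likely birthdays, ignores leap years, and treats the students' birthdays as independent. These are modelling assumptions, not facts about any real class. The idea is to count the complementary event: the chance that all birthdays are different.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

p25 = shared_birthday_probability(25)
```

For 25 students this gives a value of roughly 0.57, which is indeed over a half.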
1.2 The scope of mathematics
The most common replies to the question ‘what is mathematics?’ addressed
to a non-mathematician are usually the depressing ‘arithmetic’ or ‘accountancy’. Ask them what they remember about school maths and they might be
able to dredge up some more-or-less arcane words with challenging spellings:
hypotenuse, isosceles, parallelogram. It either sounds a bit boring or a bit
weird, but in any event is so obviously completely removed from real life that
it can safely be ignored.
Mathematics, therefore, has an image problem.
I think part of the reason for this is the kind of maths that is taught in
schools and the way it is taught. School mathematics suffers by being based
on the narrow syllabuses prescribed by examining boards under political
direction. As a result, it is more by luck than design if anyone at school
gets an idea of what maths is actually about. In addition, teaching too often
means teaching to the exam, which means working through past exam papers
and learning tricks¹.
¹ I say teaching and not teachers. My criticism is directed at policy not those who are forced to carry out that policy often under enormous pressures.
Let me begin by showing you just how vast a subject mathematics really
is. The official Mathematics Subject Classification currently divides mathematics into 64 broad areas in any one of which a mathematician could
work their entire professional life. You can see what they are in the box.
By the way, the missing numbers are deliberate and not because I cannot
count.
Mathematics Subject Classification 2010 (adapted)
00. General 01. History and biography 03. Mathematical logic and foundations 05. Combinatorics 06. Order theory 08. General algebraic systems 11. Number theory 12. Field theory 13. Commutative rings 14. Algebraic geometry 15. Linear and multilinear algebra 16. Associative rings 17. Non-associative rings 18. Category theory 19. K-theory 20. Group theory and generalizations 22. Topological groups 26. Real functions 28. Measure and integration 30. Complex functions 31. Potential theory 32. Several complex variables 33. Special functions 34. Ordinary differential equations 35. Partial differential equations 37. Dynamical systems 39. Difference equations 40. Sequences, series, summability 41. Approximations and expansions 42. Harmonic analysis 43. Abstract harmonic analysis 44. Integral transforms 45. Integral equations 46. Functional analysis 47. Operator theory 49. Calculus of variations 51. Geometry 52. Convex geometry and discrete geometry 53. Differential geometry 54. General topology 55. Algebraic topology 57. Manifolds 58. Global analysis 60. Probability theory 62. Statistics 65. Numerical analysis 68. Computer science 70. Mechanics 74. Mechanics of deformable solids 76. Fluid mechanics 78. Optics 80. Classical thermodynamics 81. Quantum theory 82. Statistical mechanics 83. Relativity 85. Astronomy and astrophysics 86. Geophysics 90. Operations research 91. Game theory 92. Biology 93. Systems theory 94. Information and communication 97. Mathematics education
Each of these broad areas is then subdivided into a large number of smaller
areas, any one of which could be the subject of a PhD thesis. This is a little
overwhelming, so to make it more manageable it can be summarized, very
roughly, into the following ten areas:
Algebra
Calculus and analysis
Combinatorics
Geometry and topology
Logic
Number theory
Probability and statistics
Differential equations
Mathematical physics
Computing
Most undergraduate courses will fit under one of these headings. But it is
important to remember that mathematics is one subject — dividing it up
into smaller areas is done for convenience only. When solving a problem any
and all of the above areas might be needed.
1.3 Pure versus applied mathematics
Sometimes a distinction is drawn between pure and applied mathematics.
Pure maths is supposed to be maths done for its own sake with no thought to
applications, whereas applied maths is maths used to solve some, presumably
practical, problem. I think there is often an implicit moralistic undertone to
this distinction with pure maths being viewed as perhaps rather self-indulgent
and decorative, and applied maths as socially responsible grown-up maths
that pays its way. Politicians prefer applied maths because they think it will
make money. The following quote from the
English mathematician G. H. Hardy (1877–1947) is often offered as evidence
for this distinction:
“I have never done anything ‘useful’. No discovery of mine has
made, or is likely to make, directly or indirectly, for good or ill,
the least difference to the amenity of the world.”
Hardy was a truly great mathematician and a decent human being. As
his dates show, he was of the generation that witnessed the First World
War where science and technology were applied to the business of wholesale
slaughter. His views on maths are therefore a not unnatural reaction on
the part of someone who taught young people who then went to war never
to return. Maths for him was perhaps a sanctuary². In reality, the terms
pure and applied are extremely fuzzy. A mathematician might start work on
solving a real-life problem and then be led to develop new pure mathematics,
or start in pure maths and develop an application. Calculus, for example,
developed mainly out of the need to solve problems in physics and then was
applied to pure maths. Complex numbers couldn’t have been more pure,
introduced to provide the missing roots to polynomial equations, but are now
the basis of quantum mechanics. In reality, there is just one mathematics.
² There was a similar reaction at the end of the Second World War amongst physicists
who turned instead to biology as an alternative to building weapons.
The Banach-Tarski Paradox
The glory of mathematics is often to be found in its sheer weirdness.
For a universe founded on logic, it can lead to some pretty confounding
conclusions. For example, a solid the size of a pea may be cut into a
finite number of pieces which may then be reassembled in such a way as
to form another solid the size of the sun. This is known as the Banach-Tarski Paradox (1924). There’s no trickery involved here and no sleight
of hand. This is clearly pure maths (give me a real pea and whatever
I do it will remain resolutely pea-sized) but the ideas it uses involve
such fundamental and seemingly straightforward notions as length, area
and volume that have important applications in applied maths.
1.4 The antiquity of mathematics
The history of chemistry or astronomy is not hugely relevant, however interesting it may be, to modern theories of chemistry or astronomy. A few hundred years ago, chemistry was alchemy and astronomy was astrology: modern
chemists are not searching for the philosopher’s stone and astronomers don’t
cast horoscopes. Alchemists and astrologers are often the forbears they
would prefer to forget.³ Maths is different, since what was mathematically
true hundreds of years ago remains true today. Here is a famous example.
Plimpton 322 is a small clay tablet kept in the George A. Plimpton Collection
at Columbia University dating to about 1,800 BCE. Impressed on the tablet
are a number of columns of numbers written in cuneiform. The numbers are
written not in base 10 but in base 60, the base that still lies behind the way
we tell the time and measure angles. The meaning and purpose of this clay
tablet is much disputed. But the second and third columns consist of the
following numbers, where I have given the usual corrected numbers. I have
given the first seven lines of the table — there are fifteen in the original.
Row   B       C
1     119     169
2     3367    4825
3     4601    6649
4     12709   18541
5     65      97
6     319     481
7     2291    3541
If you calculate $C^2 - B^2$ you will get a perfect square $D^2$. Thus (B, D, C) is
a Pythagorean triple. How such large Pythagorean triples were computed is
a mystery.
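The claim about the seven rows quoted above is easy to confirm by machine. The Python sketch below (the function name `third_side` is hypothetical) computes the difference of the two squares for each row and checks that it is a perfect square, using the exact integer square root from the standard library.

```python
from math import isqrt

# The first seven (B, C) pairs as given in the table above.
rows = [(119, 169), (3367, 4825), (4601, 6649), (12709, 18541),
        (65, 97), (319, 481), (2291, 3541)]

def third_side(b, c):
    """Return D with D*D == c*c - b*b, or None if that is not a perfect square."""
    d_squared = c * c - b * b
    d = isqrt(d_squared)              # exact integer square root
    return d if d * d == d_squared else None

d_values = [third_side(b, c) for b, c in rows]
```

Every row passes the test, so each (B, D, C) is a Pythagorean triple. The values of D turn out to be 120, 3456, 4800, 13500, 72, 360 and 2700.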
³ I am exaggerating a little here for rhetorical purposes. In fact, much fine work was
carried out under the guise of alchemy and astrology.
This antiquity, combined with the fact that maths is a cumulative subject,
meaning that you have to learn X before you can learn Y, has the unfortunate
effect that most of the mathematics you learnt at school was invented before
1800. Here is a very rough chronology.
BCE
2000 Solving quadratics
400 Existence of irrational numbers
300 Euclidean geometry
200 Conics
CE
1550 Solving cubics and quartics
1590 Logarithms
1630 Analytic geometry
1675 Calculus
1700 Probability
1795 Complex numbers
Only matrices (1850) and vectors (1880) were introduced more recently. However, if you think of all the developments in physics since 1800 such as black
holes, the big bang theory, parallel universes, quantum, then you might suspect that there have also been big developments in mathematics. There have,
but you would be forgiven for not knowing about them because they are not
promoted in the media or taught in school.
I should add that like any other field of human endeavour, it is of course
true that mathematical ideas go in and out of fashion, but crucially they
don’t become wrong with time.
1.5
The modernity of mathematics
The fact that what’s taught in schools doesn’t seem to change much from
generation to generation leads to one of the biggest misconceptions about
mathematics: that it has already all been discovered. To try and bring you
up to date, I am going to say a little about three mathematicians and their
work: Alan Turing (1912–1954), Sir Andrew Wiles (b. 1953), and Terence
Tao (b. 1975). I have chosen them to illustrate some additional points I want
to make about maths.
Alan Turing
Alan Turing is the only mathematician I know who has had a West End
play written about his life: the 1986 play Breaking the code by Hugh Whitemore. Turing is best known as one of the leading members of Bletchley Park
during the Second World War, for his role in the British development of
computers during and after the War, and for the ultimately tragic nature of
his early death. Here I want to return to Turing the mathematician. As a
graduate student, he wrote a paper in 1936 entitled On computable numbers
with an application to the Entscheidungsproblem, where the long German
word means decision problem and refers to a specific question in mathematical logic. It was as a result of solving this problem that Turing was led to
formulate a precise mathematical blueprint for a computer, now called a Turing machine in his honour. This is the most extreme example I know of
a problem in pure maths leading to new applied maths — in fact, it led to
the whole field of computer science and the information age we now inhabit.
Amongst computer scientists, Turing is regarded as the father of computer
science. So, mathematicians invented the modern world.
Andrew Wiles
Mathematicians operate on a completely different timescale from everyone
else. I have already talked about Pythagorean triples, those whole numbers
(x, y, z) that satisfy the equation $x^2 + y^2 = z^2$. Here’s an idle thought. What
happens if we try to find whole number solutions to $x^3 + y^3 = z^3$ or $x^4 + y^4 = z^4$
or, more generally, $x^n + y^n = z^n$ where $n \geq 3$? Let’s exclude the trivial case
where some of the numbers x, y or z are 0. So, here is the question: for $n \geq 3$
find all whole number solutions to $x^n + y^n = z^n$ where $xyz \neq 0$. Back in the
17th century, Pierre de Fermat (1601?–1665) wrote in the margin of a book,
the Arithmetica of Diophantus, that he had found a proof that there were
no such solutions but that sadly there wasn’t enough room for him to record
it. This became known as Fermat’s Last Theorem. In fact, since Fermat’s
supposed proof was never found, it was really a conjecture. More to the
point, it is highly unlikely that he ever had a proof since in the subsequent
centuries many attempts were made to prove this result, all in vain, although
substantial progress was made. This problem became one of mathematics’
many Mount Everests: the peak that everyone wanted to scale. Finally, on
Monday 19th September, 1994, sitting at his desk, Andrew Wiles, building
on over three centuries of work, and haunted by his premature announcement
of his success the previous year, had a moment of inspiration as the following
quote from the Daily Telegraph dated 3rd May 1997 reveals:
“Suddenly, totally unexpectedly, I had this incredible revelation.
It was so indescribably beautiful, it was so simple and so elegant.”
As a result Fermat’s Conjecture really is a theorem, but the proof required
travelling through what can only be described as mathematical hyperspace.
Wiles’s reaction to his discovery is also a glimpse of the profound intellectual
excitement that engages the emotions as well as the intellect when doing
mathematics.4
Terence Tao
Tao won a Fields Medal in 2006. This is a mathematical honour comparable with a Nobel Prize though with the added twist that you have to be
under 40 to get one. You can read his thoughts at his blog, as well as use it
to find all manner of interesting things. So, what sorts of things does he do?
Here is one example that is remarkably easy to explain though the proof is
formidable. You know what primes are and, in any event, we shall talk about
them later. They can be regarded as the atoms of numbers and their properties have inspired hard questions and deep results. One of the things that
interests mathematicians is the sorts of patterns that can be found in primes.
An arithmetic progression is a sequence of numbers of the form a + dk where
a and d are fixed numbers and k = 0, 1, 2, . . . Consider the arithmetic progression 3 + 2k. Observe that for the consecutive values k = 0, 1, 2, the numbers 3, 5, 7 which
arise are all prime. But when k = 3 we get 9 which is not prime. Our little
example is an instance of an arithmetic progression with 3 terms all prime.
Here is one with 10 terms: 199 + 210k where k = 0, 1, . . . , 9. In 2004, Tao
and his colleague Ben Green proved that there were arithmetic progressions
of arbitrary length all of whose terms are prime. In other words, for any
number n there is an arithmetic progression so that the first n terms are all
prime.
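The two small examples of prime-only progressions can be checked by machine. Here is a rough sketch in Python; the helper names `is_prime` and `prime_run_length` are invented for this illustration, and the primality test is plain trial division, which is adequate only for small numbers like these.

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_run_length(a, d):
    """Count how many consecutive terms a, a + d, a + 2d, ... are prime, starting at k = 0."""
    if d <= 0:
        raise ValueError("the common difference d must be positive")
    k = 0
    while is_prime(a + d * k):
        k += 1
    return k


if __name__ == "__main__":
    print(prime_run_length(3, 2))      # the progression 3 + 2k
    print(prime_run_length(199, 210))  # the progression 199 + 210k
```

Such a check says nothing about progressions of arbitrary length, of course; that is exactly what requires a proof.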
1.6
The legacy of the Greeks
The word ‘mathematics’ is Greek. In fact, many mathematical terms are
Greek: lemma, theorem, hypotenuse, orthogonal, polygon, to name just a
few. The Greek alphabet is used as a standard part of mathematical notation. The very concept of a mathematical proof is a Greek idea. All of this
reflects the fact that Ancient Greece is the single most important historical
influence on the development and content of mathematics. By Ancient Greek
4
There is a documentary about Andrew Wiles, directed by Simon Singh, made
for the BBC’s Horizon series. It is a model of how to portray complex
mathematics in an accessible way and cannot be too highly recommended.
mathematics, I mean the mathematics developed in the wider Greek world
around the Mediterranean in the thousand or more years between roughly
600 BCE and 600 CE. It begins with the work of semi-mythical figures,
such as Thales of Miletus and Pythagoras of Samos, and is developed in the
books of such mathematicians as Euclid, Archimedes, Apollonius of Perga,
Diophantus and Pappus. Of all the Ancient Greek mathematicians the greatest was Archimedes. His work is sophisticated mathematics of the highest
order. In particular, he developed methods that are close to those of integral
calculus and used them to calculate areas and volumes of complicated curved
shapes.
1.7
The legacy of the Romans
For all their aqueducts, roads, baths and maintenance of public order, it has
been said of the Romans that their only contribution to mathematics was
when Cicero rediscovered the grave of Archimedes and had it restored.5
1.8
What they didn’t tell you in school
This book is written to help you make the transition from school maths to
university maths. You might well still be in school, or you might have left
school fifty years ago; it doesn’t matter. Maths as taught in school and the
maths taught at university are very different, and the failure to understand
those differences can cause problems. To be successful in university mathematics you have to think in new ways. University Mathematics is not just
School Mathematics with harder sums and fancier notation; it is different,
fundamentally different, from what you did at school.
In much of school mathematics, you learn methods for solving specific problems. Often, you just learn formulae.
A method for solving a problem that requires little thought in its application is called an algorithm. Computer programs are the supreme examples
of algorithms, and it is certainly true that finding algorithms for solving specific problems is an important part of mathematics, but it is by no means the
5
George Simmons, Calculus Gems, McGraw-Hill, Inc., New York, 1992, page 38.
only part. Problems do not come neatly labelled with the methods needed
for their solution. A new problem might be solvable using old methods or
it might require you to adapt those methods. On the other hand, you may
have to invent completely new methods to solve it. Such new methods require new ideas. In fact, what you might not have appreciated from school
mathematics is the important role played in mathematics by ideas. An idea
is a tool to help you think.
Mathematics at school is often taught without reasons being given
for why the methods work.
This is the fundamental difference between school mathematics and university mathematics. A reason why something works is called a proof. I shall
say a lot more about proofs in Chapter 2.
The Millennium Problems
Mathematics is difficult but intellectually rewarding. Just how difficult can
be gauged by the following. The Millennium Problems is a list of seven
outstanding problems posed by the Clay Mathematics Institute in the year 2000. A
correct solution to any one of them carries a one million dollar prize.
To date, only one has been solved, the Poincaré conjecture, by Grigori
Perelman, who was awarded the prize in 2010 but declined to take the money. The point is
that no one offers a million dollars for something that is trivial. You can
read more about these problems at
http://www.claymath.org/millennium-problems
1.9
Further reading and links
There is a wealth of material about mathematics available on the Web and
I would encourage exploration. Here, I will point out some books and links
that develop the themes of this chapter. A book that is in tune with the
goals of this chapter is
P. Davis, R. Hersh, E. A. Marchisotto, The mathematical experience, Birkhäuser,
2012.
It’s one of those books that you can dip into and you will learn something
interesting but, most importantly, it will expand your understanding of what
mathematics is, as it did mine.
A good source book for the history of mathematics, and again something
that can be dipped into, is
C. B. Boyer, U. C. Merzbach, A history of mathematics, Jossey Bass, 3rd
Edition, 2011.
The books above are about maths rather than doing maths. Let me now
turn to some books that do maths in a readable way. There is a plethora
of popular maths books now available, and if you pick up any books by Ian
Stewart — though if the book appears to be rather more about volcanoes
than is seemly in a maths book, you have Iain Stewart — and Peter Higgins
then you will find something interesting. Sir (William) Timothy Gowers won
a Fields Medal in 1998 and so can be assumed to know what he is talking
about.
T. Gowers, Mathematics: A Very Short Introduction, Oxford University
Press, 2002
It is worth checking out his homepage for some interesting links. He also
has his own blog which is worth checking out. I think the Web is serving to
humanize mathematicians: their ivory towers all have wi-fi. A classic book
of this type is
R. Courant, H. Robbins, What is mathematics?, OUP, 1996.
This is also an introduction to university-level maths, and it has influenced
my thinking on the subject.
If you have never looked into Euclid’s book the Elements, then I would
recommend you do.6 There is an online version that you can access via David
E. Joyce’s website at Clark University. A handsome printed version, edited
by Dana Densmore, has been published by Green Lion Press, Santa Fe, New
Mexico.
6
Whenever I refer to Euclid, it will always be to this book. It consists of thirteen
chapters, themselves called ‘books’, which are numbered in the Roman fashion I–XIII.
Finally, let me mention the books of Martin Gardner. For a quarter of
a century, he wrote a monthly column on recreational mathematics for the
Scientific American which inspired amateurs and professionals alike. I would
start with
M. Gardner, Hexaflexagons, probability paradoxes, and the Tower of Hanoi:
Martin Gardner’s first book of mathematical puzzles and games, CUP, 2002
and follow your interests.
Chapter 2
Proofs
Part of the argument sketch, Monty Python
M = Man looking for an argument
A = Arguer
M: An argument isn’t just contradiction.
A: It can be.
M: No it can’t. An argument is a connected series of statements intended to
establish a proposition.
A: No it isn’t.
M: Yes it is! It’s not just contradiction.
A: Look, if I argue with you, I must take up a contrary position.
M: Yes, but that’s not just saying ‘No it isn’t.’
A: Yes it is!
M: No it isn’t!
A: Yes it is!
M: Argument is an intellectual process. Contradiction is just the automatic
gainsaying of any statement the other person makes.
(short pause)
A: No it isn’t.
The most fundamental difference between school and university mathematics lies in proofs. At school, you were probably told mathematical facts
and given recipes that solved particular kinds of problems. But the chances
are, you were not given any reasons to back up those facts or explanations
as to why those recipes worked. University and professional mathematics is
different. Reasons and explanations are essential and are called proofs. They
are the essence of mathematics. Mathematical truth, and the notion of proof
that supports it, is so different from what we encounter in everyday life that
I shall need to begin by setting the scene.
2.1
How do we know what we think is true is
true?
Human beings usually believe something first for emotional reasons, and
then look for the evidence to back it up. The pitfalls of this are obvious.
We shall therefore be interested in reasons that do not involve emotion. To
be concrete, how would you verify the following claim: Mount Everest is
between 8 and 9 km high?
The appeal to authority
In the past, claims such as this would be resolved by consulting an encyclopedia or atlas whereas today, of course, we would simply go online. If
you do this, you will find that a height of about 8.8 km is quoted. For most
purposes this would settle things. But it’s important to understand what
this entails. We are, in effect, taking someone’s word for it. We assume that
whoever posted this information knows what they are talking about. What
we are doing, therefore, is appealing to authority. Most of what we take to
be true is based on such appeals to authority: parents, teachers, politicians,
religiosi etc tell us things that they claim to be true and more often than not
we believe them. There’s a small element of laziness involved on our part,
but it is so convenient. The pitfalls of this are also obvious.
The appeal to experiment
But where did the figure of 8.8 km come from? It wasn’t just plucked
from the sky. The height of Mount Everest was first measured as part of the
great survey of India undertaken in the nineteenth century. This consisted of
a team of expert surveyors who not only employed extremely precise instruments that were used to take multiple measurements but who also tried to
minimize the effect of factors influencing the accuracy of their measurements
such as temperature and, amazingly, variations in gravity. Making measurements and taking great pains over those measurements together with estimations of the error bounds is such an important part of science that science
itself would be impossible without it. Let’s call this the appeal to experiment.
This brings me to how we know statements are true in mathematics. The
essential point is the following:
Neither of the above methods for ascertaining truth plays any
role whatsoever in determining mathematical truth.
This is so important, I am going to say it again in a different way:
• Results are not true in maths because I say so or because someone
important said they were true a long time ago.
• Results are not true in mathematics because I have carried out experiments and I always get the same answer.
• Results are not true in maths ‘just because they are’.
How then can we determine whether something in mathematics is true?
• Results are true in maths only because they have been proved to be
true.
• A proof shows that a result is true.
• A proof is something that you yourself can follow and at the end you
will see the truth of what has been proved.
• A result that has been proved to be true is called a theorem.
• The appeal to authority and the appeal to experiment are both fallible.
The appeal to proof is never fallible. The only truths we know for
certain are mathematical truths.
This is heady stuff. So what, then, is a proof? The remainder of this
chapter is devoted to an introductory answer to this question.
2.2
Three fundamental assumptions of logic
In order to understand how mathematical proofs work, there are three simple, but fundamental, assumptions you have to understand.
I. Mathematics only deals in statements that are capable
of being either true or false.
Mathematics does not deal in statements which are ‘sometimes true’ or
‘mostly false’. There are no approximations to the truth in mathematics and
no grey areas. Either a statement is true or a statement is false, though we
might not know which. This is quite different from everyday life, where we
often say things which contain a grain of truth or where we say things for
rhetorical reasons which we don’t entirely mean. Mathematics also doesn’t
deal in statements that are neither true nor false like exclamations such as
‘Out damned spot!’ or with questions such as ‘To be or not to be?’.
II. If a statement is true then its negation is false, and if
a statement is false then its negation is true.
In natural languages, negating a sentence is achieved in different ways.
In English, the negation of ‘It is raining’ is ‘It is not raining’. In French,
the negation of ‘Il pleut’ is obtained by wrapping the verb in ‘ne . . . pas’ to
get ‘Il ne pleut pas’. To avoid grammatical idiosyncrasies, we can use the
formal phrase ‘it is not the case that’ and place it in front of any sentence
to negate it. So, ‘It is not the case that it is raining’ is the negation of ‘It is
raining’. In some languages, and French is one of them, adding negatives is
used for emphasis. This used to be the case in older forms of English and is
often the case in informal English. In formal English, we are taught that two
negatives make a positive, which is in fact rule (II) above, borrowed from mathematics,
where it is true. In fact, negating negatives in natural languages is
more complex than this. For example, if your partner says they are ‘not unhappy’ then this isn’t quite the same as being ‘happy’ and maybe you need
to talk.
III. Mathematics is free of contradictions.
A contradiction is where both a statement and its negation are true. This
is impossible by (II) above. This assumption will play a vital role in proofs
as we shall see later.
2.3
Examples of proofs
Armed with the three assumptions above, I am going to take you through
five proofs of five results, three of them being major theorems. This will
enable me to show you examples of proofs but will also illustrate important
issues about how proofs, and mathematics, work.
Although proofs can be long or short, hard or easy they all tend to follow
the same script. First, there will be a statement of what is going to be
proved. This usually has the form: if a bunch of things are assumed true
then something else is also true. If the things assumed true are lumped
together as A, for assumptions, and the thing to be proved true is labelled
C, for conclusion, then a statement to be proved usually has the shape ‘if A
then C’ or ‘A implies C’ or, in notation, ‘A ⇒ C’. The proof itself should
be thought of as a (rational) argument between two protagonists whom we
shall call Alice and Bob. We assume that Alice wants to prove C. She can
use any of the assumptions A, any previously proved theorems, the rules of
logic, which I shall describe as we meet them, and definitions. Bob’s role
is to act like an attorney and to demand that Alice justify each claim she
makes. Thus Alice cannot just make assertions without justifying them, and
she is limited in the sorts of things that count as justifications. At the end
of this, Alice can say something like ‘ . . . and so C is proved’ and Bob will
be forced to agree.
2.3.1
Proof 1
We shall prove the following statement.
The square of an even number is even, and the square of an odd
number is odd.
In fact, this is really two statements ‘If n is an even number then $n^2$ is even’
and ‘If n is an odd number then $n^2$ is odd.’ Before we can prove them, we
need to understand what they are actually saying. The terms odd and even
are only used of whole numbers such as
0, 1, 2, 3, 4, . . .
These numbers are called the natural numbers and they are the first kinds
of numbers we learn about as children. Thus we are being asked to prove a
statement about natural numbers. The terms ‘odd’ and ‘even’ might seem
obvious, but we need to be clear about how they are used in maths. By
definition, a natural number n is even if it is exactly divisible by 2; otherwise
it is said to be odd. In maths, we usually just say divisible rather than exactly
divisible. This definition of divisibility only makes sense when talking about
whole numbers. For fractions, for example, it is pointless since one fraction
will always divide another fraction. Notice that 0 is an even number because
0 = 2 × 0. In other words, 0 is exactly divisible by 2. However, remember,
you cannot divide by 0 but you can certainly divide into 0. You might have
been told that a number is even if its last digit is one of the digits 0, 2, 4, 6, 8.
In fact, this is a consequence of our definition rather than a definition itself.
I shall ask you to prove this result in the exercises. I shall say no more about
the definition of even. What about the definition of odd? A number is odd
if it is not even. This is not a very useful definition since it says only what
an odd number fails to be. We want a more positive characterization. So we shall
describe a better one. If you attempt to divide a number by 2 then there are
two possibilities: either it goes exactly, in which case the number is even, or
it goes so many times plus a remainder of 1, in which case the number is odd.
It follows that a better way of defining an odd number n is one that can be
written n = 2m + 1 for some natural number m. So, the even numbers are
those natural numbers that are divisible by 2, thus the numbers of the form
2n for some n, and the odd numbers are those that leave the remainder 1
when divided by 2, thus the numbers of the form 2n + 1 for some n. Every
number is either odd or even but not both.
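The two forms 2m and 2m + 1 can be made concrete with a few lines of Python. This is only a sketch of the definitions just given; the function name `classify` is made up for the purpose, and `divmod` is the built-in that returns quotient and remainder together.

```python
def classify(n):
    """Write a natural number n as 2m (even) or 2m + 1 (odd) and report which."""
    if n < 0:
        raise ValueError("natural numbers only")
    m, remainder = divmod(n, 2)  # n == 2 * m + remainder, with remainder 0 or 1
    return ("even" if remainder == 0 else "odd", m)


if __name__ == "__main__":
    for n in (0, 7, 12):
        print(n, classify(n))
```

Because the remainder on division by 2 is either 0 or 1 and never both, the function returns exactly one of the two labels for each n.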
There is a moral to be drawn from what I have just done, and I shall
state it boldly because of its importance. It may seem obvious but experience shows that it is, in fact, not.
Every time you are asked to prove a statement, you must ensure
that you understand what that statement is saying. This means,
in particular, checking that you understand what all the words in
the statement mean.
The next point is that we are making a claim about all even numbers. If
you pick a few even numbers at random and square them then you will find
in every case that the result is even but this does not prove our claim. Even if
you checked a trillion even numbers and squared them and the results were all
even it wouldn’t prove the claim. Maths, remember, is not an experimental
science. There are plenty of examples in maths of statements that look true
and are true for umpteen cases but are in fact bunkum.
This means that, in effect, we have to prove an infinite number of statements: $0^2$ is even, and $2^2$ is even, and $4^2$ is even . . . I cannot therefore prove
my claim by picking a specific even number, like 12, and checking that its
square is even. This simply verifies one of the infinitely many statements
above. As a result, the starting point for my proof cannot be a specific even
number. It has to be a general even number. We are now in a position to
prove our claims.
First, we prove that the square of an even number is even.
1. Let n be an even number. This is the assumption that gets the ball
rolling. Notice that n is not a specific even number. We want to prove
something for all even numbers so we cannot argue with a specific one.
2. Then n = 2m for some natural number m. Here we are using the
definition of what it means to be an even number.
3. Square both sides of the equation in (2) to get $n^2 = 4m^2$. To do this
correctly, you need to follow the rules of high-school algebra.
4. Now rewrite this equation as $n^2 = 2(2m^2)$. This uses more basic high-school algebra.
5. Since $2m^2$ is a natural number, it follows that $n^2$ is even using our
definition of an even number. This proves our claim.
Second, we prove that the square of an odd number is odd. I’ll provide less
commentary than in the previous case.
1. Let n be an odd number.
2. By definition n = 2m + 1 for some natural number m.
3. Square both sides of the equation in (2) to get $n^2 = 4m^2 + 4m + 1$.
4. Now rewrite the equation in (3) as $n^2 = 2(2m^2 + 2m) + 1$.
5. Since $2m^2 + 2m$ is a natural number, it follows that $n^2$ is odd using our
definition of an odd number. This proves our claim.
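Both proofs are constructive: they tell you exactly which natural number exhibits $n^2$ as twice something, or twice something plus 1. The following Python sketch computes that number; the name `square_witness` is hypothetical, chosen only for this illustration. Running it on many values of n is merely a sanity check of the algebra and is no substitute for the proofs above.

```python
def square_witness(n):
    """Return (parity, k) with n*n == 2*k when n is even and n*n == 2*k + 1 when n is odd.

    The value k is built exactly as in steps (3) to (5) of the two proofs.
    """
    m, remainder = divmod(n, 2)
    if remainder == 0:
        # n = 2m, so n^2 = 4m^2 = 2(2m^2)
        return ("even", 2 * m * m)
    # n = 2m + 1, so n^2 = 4m^2 + 4m + 1 = 2(2m^2 + 2m) + 1
    return ("odd", 2 * m * m + 2 * m)


if __name__ == "__main__":
    for n in range(6):
        print(n, n * n, square_witness(n))
```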
We have therefore proved our two claims. I admit that they are not
exciting but just bear with me.
2.3.2
Proof 2
We shall prove the following statement.
If the square of a number is even then that number is even, and
if the square of a number is odd then that number is odd.
In fact, this is really two statements ‘If $n^2$ is even then n is even’ and ‘If $n^2$
is odd then n is odd’. At first reading, you might think that I am simply
repeating what I proved above. But in Proof 1, I proved
‘if n is even then $n^2$ is even’
whereas now I want to prove
‘if $n^2$ is even then n is even’.
Our assumptions in each case are different and our conclusions in each case
are different. It is therefore important to distinguish between A ⇒ B and
B ⇒ A. The statement B ⇒ A is called the converse of the statement
A ⇒ B. Experience shows that people are prone to swapping assumptions
and conclusions without being aware of it.
We prove the first claim.
1. Suppose that $n^2$ is even.
2. Now it is very tempting to try and use the definition of even here, just
as we did in Proof 1, and write $n^2 = 2m$ for some natural number m.
But this turns out to be a dead-end. Just like playing a game such as
chess, not every possible move is a good one. Choosing the right move
comes with experience and sometimes just plain trial-and-error.
3. So we make a different move. We know that n is either odd or even.
Our goal is to prove that it must be even.
4. Could n be odd? The answer is no because, as we showed in Proof 1,
if n is odd then $n^2$ is odd, which contradicts our assumption that $n^2$ is even.
5. Therefore n is not odd.
6. But a number that is not odd must be even. It follows that n is even.
We use a similar strategy to prove the second claim.
The proofs here were more subtle, and less direct, than in our first example and they employed the following important strategy: if there are two
possibilities, exactly one of which is true, and we rule out one of those possibilities,
then we may deduce that the other possibility must be true.1
Here is a concrete example. There are two politicians, Alice and Bob.
One of them always lies and the other always tells the truth. Suppose you
ask Bob the question: is it true that 2 + 2 = 5? If he replies ‘yes’ then you
know Bob is lying. Without further ado, you can deduce that Alice is that
paragon of politicians and always tells the truth.
If A ⇒ B and B ⇒ A then we say that A if, and only if, B or A iff B
or A ⇔ B. The use of the word iff is peculiar to mathematical English. If
we combine Proofs 1 and 2, we have proved the following two statements for
all natural numbers n: ‘n is even if, and only if, $n^2$ is even’ and ‘n is odd if,
and only if, $n^2$ is odd’.
It is important to remember that the statement ‘A if, and only if, B’ is in
fact two statements in one. It means (1) ‘A implies B’ and (2) ‘B implies
A’. So, to prove the statement ‘A if and only if B’ we have to prove TWO
statements: we have to prove ‘A implies B’ and we have to prove ‘B implies
A’.
The results of this example were trickier to prove than the previous ones,
but not much more exciting. However, we have now laid the foundations for
a truly remarkable result.
1
This might be called the Sherlock Holmes method. “How often have I said to you that
when you have eliminated the impossible, whatever remains, however improbable, must
be the truth?” The Sign of Four, 1890.
2.3.3
Proof 3
We shall now prove our first real theorem.
$\sqrt{2}$ cannot be written as an exact fraction.
If you square each of the fractions in turn
$\frac{3}{2}, \frac{7}{5}, \frac{17}{12}, \frac{41}{29}, \ldots$
you will find that you get closer and closer to 2 and so each of these numbers
is an approximation to the square root of 2. This raises the question: is
it possible to find a fraction $\frac{x}{y}$ whose square is exactly 2? In fact, it isn’t
but that isn’t proved just because my attempts above failed. Maybe, I just
haven’t looked hard enough. So, I have to prove that it is impossible. To
prove that $\sqrt{2}$ is not an exact fraction, I am actually going to begin by trying
to show you that it is.
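To see how close the squares of those four fractions come to 2, here is a short sketch using exact rational arithmetic from Python's standard `fractions` module; the helper name `gap_from_two` is introduced only for this illustration.

```python
from fractions import Fraction

# The four fractions 3/2, 7/5, 17/12 and 41/29.
APPROXIMATIONS = [Fraction(3, 2), Fraction(7, 5), Fraction(17, 12), Fraction(41, 29)]


def gap_from_two(q):
    """Return q*q - 2 as an exact fraction (no rounding involved)."""
    return q * q - 2


if __name__ == "__main__":
    for q in APPROXIMATIONS:
        print(q, q * q, gap_from_two(q))
```

The gaps are 1/4, −1/25, 1/144 and −1/841: they shrink quickly, but none of them is 0. That is evidence, not proof, that no fraction squares to exactly 2.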
1. Suppose that $\sqrt{2} = \frac{x}{y}$ where x and y are positive whole numbers (so, in particular, $y \neq 0$).
2. We may assume that $\frac{x}{y}$ is a fraction in its lowest terms so that the only
natural number that divides both x and y is 1. Keep your eye on this
assumption because it will come back to sting us later.
3. Square both sides of the equation in (1) to get $2 = \frac{x^2}{y^2}$.
4. Multiply both sides of the equation in (3) by $y^2$.
5. We therefore get the equation $2y^2 = x^2$.
6. Since 2 divides the left-hand side of this equation, it must divide the
right-hand side. This means that $x^2$ is even.
7. We now use Proof 2 to deduce that x is even.
8. We may therefore write x = 2u for some natural number u.
9. Substitute this value for x into the equation in (5) to get $2y^2 = 4u^2$.
10. Divide both sides of the equation in (9) by 2 to get $y^2 = 2u^2$.
11. Since the right-hand side of the equation in (10) is even so is the left-hand side. Thus $y^2$ is even.
12. Since $y^2$ is even, it follows by Proof 2 that y is even.
13. If (1) is true then we are led to the following two conclusions. From (2),
we have that the only natural number to divide both x and y is 1. From
(7) and (12), 2 divides both x and y. This is a contradiction. Thus (1)
cannot be true. Hence $\sqrt{2}$ cannot be written as an exact fraction.
This result is phenomenal. It says that no matter how much money you
spend on a computer it will never be able to calculate the exact value of
$\sqrt{2}$, just a very, very good approximation. We now make a very important
definition. A real number that is not rational is called irrational. We have
therefore proved that $\sqrt{2}$ is irrational.
2.3.4
Proof 4
We now prove our second real theorem.
The angles in a triangle add up to 180°.
This is a famous result that everyone knows. You might have learnt about
it at school by drawing lots of triangles and measuring their angles but as
I said above, maths is not an experimental science and so this enterprise
proves nothing. The proof I give is very old and occurs in Euclid’s book the
Elements: specifically, Book I, Proposition 32. Draw a triangle and call its
three angles α, β and γ respectively.
[Figure: a triangle with angle β at the top vertex and angles α and γ at the base.]
Our goal is to prove that
α + β + γ = 180°.
In fact, we shall show that the three angles add up to a straight line which
is the same thing. Draw a line through the point P parallel to the base of
the triangle.
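[Figure: the same triangle, with the top vertex labelled P and a line drawn through P parallel to the base.]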
Then extend the two sides of the triangle that meet at the point P as shown.
[Figure: the two sides through P extended beyond P, forming the angles γ′, β′ and α′ along the parallel line.]
As a result, we get three angles that I have called α′, β′ and γ′. I now make
the following claims:
• β′ = β because the angles are opposite each other in a pair of intersecting straight lines.
• α′ = α because these two angles are formed from a straight line cutting
two parallel lines.
• γ′ = γ for the same reason as above.
But since α′, β′ and γ′ add up to give a straight line, we have proved the
claim.
Now this is all well and good, but we have proved our result on the basis
of three other results currently unproved:
1. That given a line l and a point P not on that line I may draw a line
through the point P and parallel to l.
2. If two lines intersect, then opposite angles are equal.
3. If a line l cuts two parallel lines $l_1$ and $l_2$, the angle l makes with $l_1$ is
the same as the angle it makes with $l_2$.
How do we know they are true? Result (2) can readily be proved. We shall
use the diagram below.
[Figure: two intersecting straight lines forming four angles, with α opposite γ and β opposite δ.]
The proof that α = γ follows from the simple observation that α + β = β + γ, since each side is the angle on a straight line.
This still leaves (1) and (3). I shall say more about them later when I talk
about axioms.
2.3.5 Proof 5
The most famous theorem of them all is the one attributed to Pythagoras
and proved in Book I, Proposition 47 of Euclid. We begin with a right-angled
triangle.
[Figure: a right-angled triangle with shorter sides a and b and hypotenuse c.]
We want to prove, of course, that
$a^2 + b^2 = c^2$.
Consider the shape below. It has been constructed from four copies of our
triangle and two squares of areas $a^2$ and $b^2$, respectively. I claim that this
shape is actually a square. First, the sides all have the same length a + b.
Second, the angles at the corners are right angles by Proof 4.
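[Figure: a square of side a + b divided into two squares of areas $a^2$ and $b^2$ and four copies of the triangle.]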
Now look at the following picture. This is also a square with sides a + b so it
has the same area as the first square. Using Proof 4, the shape in the middle
really is a square with area $c^2$.
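[Figure: a square of side a + b with a copy of the triangle in each corner, leaving a tilted square of area $c^2$ in the middle.]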
If we subtract the four copies of the original triangle from both squares, the
shapes that remain must have the same areas, and we have proved the claim.
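The comparison of areas can also be written out as a short calculation. It assumes one standard fact not stated above: each copy of the triangle has area $\tfrac{1}{2}ab$, being half of an a-by-b rectangle. The two ways of cutting up the square of side a + b then give:

```latex
\begin{align*}
(a+b)^2 &= a^2 + b^2 + 4 \cdot \tfrac{1}{2}ab && \text{(first picture)} \\
(a+b)^2 &= c^2 + 4 \cdot \tfrac{1}{2}ab       && \text{(second picture)}
\end{align*}
```

Subtracting $4 \cdot \tfrac{1}{2}ab$ from both right-hand sides leaves $a^2 + b^2 = c^2$.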
Exercises 2.3
1. Raymond Smullyan is both a mathematician and a magician. Here
are two of his puzzles. On an island there are two kinds of people:
knights who always tell the truth and knaves who always lie. They are
indistinguishable.
(a) You meet three such inhabitants A, B and C. You ask A whether
he is a knight or knave. He replies so softly that you cannot make
out what he said. You ask B what A said and B says ‘he said
he is a knave’. At which point C interjects and says ‘that’s a lie!’.
Was C a knight or a knave?
(b) You encounter three inhabitants: A, B and C.
A says ‘exactly one of us is a knave’.
B says ‘exactly two of us are knaves’.
C says: ‘all of us are knaves’.
What type is each?
2. There are five houses in a row, each of which is painted a
different colour. Their inhabitants are called W, C, O, S and M, but not
necessarily in that order from left to right, and they own different pets,
drink different drinks and drive different cars.
(a) There are five houses.
(b) W lives in the red house.
(c) C owns the dog.
(d) Coffee is drunk in the green house.
(e) O drinks tea.
(f) The green house is immediately to the right (that is: your right)
of the ivory house.
(g) The Oldsmobile driver owns snails.
(h) The Bentley owner lives in the yellow house.
(i) Milk is drunk in the middle house.
(j) S lives in the first house.
(k) The person who drives the Chevy lives in the house next to the
man with the fox.
(l) The Bentley owner lives in a house next to the house where the
horse is kept.
(m) The Lotus owner drinks orange juice.
(n) M drives the Porsche.
(o) S lives next to the blue house.
There are two questions: who drinks water and who owns the aardvark?
3. Prove that the sum of any two even numbers is even, that the sum of
any two odd numbers is even, and that the sum of an odd and an even
number is odd.
4. Prove that the sum of the interior angles in any quadrilateral is equal
to 360°.
5. (a) A rectangular box has sides of length 2, 3 and 7 units. What is the
length of the longest diagonal?
(b) I draw a square. Without measuring any lengths, you now have to
construct a square that has exactly twice the area.
(c) A right-angled triangle has sides with lengths x, y and hypotenuse
z. Prove that if the area of the triangle is $\frac{z^2}{4}$ then the triangle is
isosceles.
6. (a) Prove that the last digit in the square of a positive whole number
must be one of 0,1,4,5,6, or 9. Is the converse true?
(b) Prove that a natural number is even if, and only if, its last digit
is even.
(c) Prove that a natural number is exactly divisible by 9 if, and only
if, the sum of its digits is divisible by 9.
7. Prove that $\sqrt{3}$ cannot be written as an exact fraction.
8. The goal of this question is to prove Ptolemy’s theorem². This deals
with cyclic quadrilaterals, that is those quadrilaterals whose vertices lie
on a circle. With reference to the diagram below,
[Figure: a cyclic quadrilateral ABCD with sides AB = a, BC = b, CD = c, DA = d and diagonals AC = x, BD = y.]
this theorem states that
xy = ac + bd.
Hint. Show that on the line BD there is a point X such that the angle
XÂD is equal to the angle BÂC. Deduce that the triangles AXD and
ABC are similar, and that the triangles AXB and ACD are similar.
Let the distance between D and X be e. Show that
$$\frac{c}{x} = \frac{y - e}{a} \quad \text{and that} \quad \frac{b}{x} = \frac{e}{d}.$$
From this, the result follows by simple algebra. To help you show that
the triangles are similar, you will need to use Proposition III.21 from
Euclid (angles in the same segment of a circle are equal), which is illustrated by the following diagram.
² Claudius Ptolemaeus (Ptolemy) was a Greek mathematician and astronomer who flourished
around 150 CE in the city of Alexandria.
9. The goal of this question is to find all Pythagorean triples, that is,
triples of natural numbers (a, b, c) such that $a^2 + b^2 = c^2$. We shall do this using
geometry by finding all the rational points on the unit circle. We shall
use the diagram below.
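[Figure: the unit circle with the point A = (−1, 0) joined by a straight line to another point P on the circle.]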
We have drawn a unit circle centred at the origin. From the point (−1, 0),
called A, we draw a line to any other point P on the circle.
(a) Show that any non-vertical line passing through the point A has the equation
y = t(x + 1) where t is a real number.
(b) Show that this line intersects the circle at some point P on the
circle, different from A, when
$$(x, y) = \left( \frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2} \right).$$
(c) Deduce that the rational points on the circle correspond to the
values of t which are rational.
(d) Put $t = \frac{p}{q}$, in its lowest terms. Deduce that all Pythagorean triples
are obtained as the following
$(r(q^2 - p^2), 2pqr, r(p^2 + q^2))$
where p, q, r are any integers.
10. Take any positive natural number n; so n = 1, 2, 3, . . . If n is even,
divide it by 2 to get $\frac{n}{2}$; if n is odd, multiply it by 3 and add 1 to obtain
3n + 1. Now repeat this process and stop only if you get 1. For example,
if n = 6 you get 6, 3, 10, 5, 16, 8, 4, 2, 1. What happens if n = 11? What
about n = 27? Prove that no matter what number you start with, you
will always eventually reach 1.
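The experimental part of the last exercise is easy to hand over to a computer. Here is a minimal Python sketch; the function name is my own. It produces the whole sequence for a given starting value, and the loop only finishes if the process really does reach 1.

```python
def collatz_sequence(n):
    """Return the list n, ..., 1 produced by the halve-or-triple-plus-one rule.

    The loop terminates only if the process actually reaches 1 from n.
    """
    if n < 1:
        raise ValueError("n must be a positive natural number")
    sequence = [n]
    while n != 1:
        # Even: divide by 2. Odd: multiply by 3 and add 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        sequence.append(n)
    return sequence

print(collatz_sequence(6))
print(collatz_sequence(11))
print(len(collatz_sequence(27)) - 1, "steps from 27")
```

Running this for as many starting values as you like is an experiment, not a proof: it says nothing about the starting values you have not tried.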
2.4 Axioms
At this point, I need to confront some potential problems with the idea of
proof I have been developing. Once this is done, I will then be able to
complete the proof of Proof 4. Suppose I am trying to prove the statement
S. Then I am done if I can find a theorem $S_1$ so that $S_1 \Rightarrow S$. But this raises
the question of how I know that $S_1$ is a theorem. This can only be because I
can find a theorem $S_2$ such that $S_2 \Rightarrow S_1$. There are now three possibilities:
1. At some point I find a theorem $S_n$ such that $S \Rightarrow S_n$. This is clearly
a bad thing. In trying to prove S I have in fact used S and so haven’t
proved anything at all. This is an example of circular reasoning and has
to be avoided. I can do this by organizing what I know in a hierarchy
— so to prove a result, I am only allowed to use those theorems already
proved. In this way, I can avoid going around in circles.
2. Assuming I have avoided the above pitfall, the next nasty possibility is
that I get an infinite sequence of implications:
$\ldots \Rightarrow S_n \Rightarrow S_{n-1} \Rightarrow \ldots \Rightarrow S_1 \Rightarrow S.$
I never actually know that S is a theorem because it is always proved in
terms of something else without end. This is also clearly a bad thing.
I establish relative truth, a statement is true if another is true, but not
absolute truth. I clearly don’t want this to happen. But if not, then I
am led inexorably to the third possibility.
3. To prove S, I have to prove only a finite number of implications
$S_n \Rightarrow S_{n-1} \Rightarrow \ldots \Rightarrow S_1 \Rightarrow S.$
But, if $S_n$ is supposed to be a theorem then how do I know it is true if
not in terms of something else, contradicting the assumption that this
was supposed to be a complete argument?
I shall now delve into case (3) above in more detail, since resolving it will
lead to an important insight. Maths is supposed to be about proving theorems but the analysis above has led us to the uncomfortable possibility that
some things have to be accepted as true ‘because they are’ which contradicts
what I went to great trouble to rubbish earlier. Before I explain the way
out of this conundrum, let me first consider an example from an apparently
completely different enterprise: playing a game.
To be concrete, let’s take the game of chess. Most people have learnt
chess at some point even if, like me, you are not very good at it. This game
consists of a board and some pieces. The pieces are of different types — kings,
queens, knights, bishops, castles, pawns — each of which can be moved in
different ways. To play chess means to accept the rules of chess and to move
the pieces in accordance with the rules. Whether one player wins or there is
a draw is also described by the rules of chess. It’s meaningless to ask whether
the rules of chess are true. But a move in chess is valid, which is another way
of saying true, if it is made according to those rules. This example provides
a way of understanding how maths works.
Maths should be viewed as a collection of different mathematical domains
each described by its own ‘rules of the game’ which in maths are termed
axioms. These axioms are the basic assumptions on which the theory is built
and are the building blocks of all proofs within that mathematical domain.
Our goal is to prove interesting theorems from those axioms.
As an example, consider Euclidean geometry. The Greeks attributed the
discovery of geometry to the Ancient Egyptians who needed it in recalculating land boundaries for the purposes of tax assessment after the yearly flood
of the Nile. Thus geometry probably first existed as a collection of geometrical methods that worked: the tax was calculated, the pyramids built and
everyone was happy. But it was the Ancient Greeks themselves who elevated
it into a mathematical science and a model of what could be achieved in
mathematics. Euclid’s book the Elements codified what was known about
geometry into a handful of axioms and then showed that all of geometry
could be deduced from those axioms by the use of mathematical proof. The
Elements is not only the single most important mathematics book ever written but one of the most important books, full stop. Here is a list of the key
axioms.
1. Two distinct points determine a unique straight line.
2. A line segment can be extended infinitely in either direction.
3. Circles can be drawn with any centre and any radius.
4. Any two right angles are equal to each other.
5. Suppose that a straight line cuts two lines $l_1$ and $l_2$. If the interior
angles on the same side add up to strictly less than 180°, then if $l_1$ and
$l_2$ are extended on that side they will eventually meet.
The last axiom needs a picture to illustrate what is going on.
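[Figure: a straight line cutting two lines $l_1$ and $l_2$, which meet when extended on the side where the interior angles add up to less than 180°.]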
In principle, all of the results you learnt in school about triangles and circles can be proved from these axioms. I say ‘in principle’ since there were
a few bugs which were later fixed by a number of mathematicians most notably David Hilbert. But this shouldn’t detract from what an enormous
achievement Euclid’s book was and is. We may now finish off Proof 4: claim
(1) is proved in Book I, Proposition 31, and claim (3) is proved in Book I,
Proposition 29.
One way of teaching maths at university would therefore be to start with
a list of axioms and start proving things. But this approach has a number of disadvantages: it is time-consuming, laborious, sometimes, even, a
bit tedious, and takes a very, very long time to reach the really interesting
theorems. Therefore, in this book, I shall usually base each topic on quite
high-level axioms so that we can get to the interesting theorems quickly,
but I shall also give pointers to readers who want to see the full axiomatic
development.
Exercises 2.4
1. Hofstadter’s MU-puzzle. A string is just an ordered sequence of symbols. In this puzzle, you will construct strings using the letters M, I, U.
You are given the string MI which is your only axiom. You can make
new strings only by using the following rules any number of times in
succession in any order:
(I) If you have a string that ends in I then you can add a U on at the
end.
(II) If you have a string Mx where x is a string then you may form
Mxx.
(III) If III occurs in a string then you may make a new string with
III replaced by U.
(IV) If UU occurs in a string then you may erase it.
I shall write x → y to mean that y is the string obtained from the string
x by applying one of the above four rules. Here are some examples:
• By rule (I), MI → MIU.
• By rule (II), MIU → MIUIU.
• By rule (III), UMIIIMU → UMUMU.
• By rule (IV), MUUUII → MUII.
The question is: can you make MU?
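If you want to experiment with the puzzle before thinking about it, the four rules can be sketched in Python as follows. The function names and the limit on the number of steps are my own choices. The code only explores what can be reached in a few moves and does not settle the question.

```python
def successors(s):
    """Return the set of strings obtainable from s by one application of rules (I)-(IV)."""
    results = set()
    # Rule (I): a string ending in I may have U added at the end.
    if s.endswith("I"):
        results.add(s + "U")
    # Rule (II): a string Mx may be turned into Mxx.
    if s.startswith("M"):
        results.add(s + s[1:])
    # Rule (III): any occurrence of III may be replaced by U.
    for i in range(len(s) - 2):
        if s[i:i + 3] == "III":
            results.add(s[:i] + "U" + s[i + 3:])
    # Rule (IV): any occurrence of UU may be erased.
    for i in range(len(s) - 1):
        if s[i:i + 2] == "UU":
            results.add(s[:i] + s[i + 2:])
    return results

def reachable(axiom="MI", steps=3):
    """Return every string derivable from the axiom in at most `steps` rule applications."""
    seen = {axiom}
    frontier = {axiom}
    for _ in range(steps):
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
    return seen

print(sorted(reachable(steps=2), key=lambda s: (len(s), s)))
```

Note that the number of reachable strings grows quickly with the number of steps, so a search of this kind can show that a string is derivable but can never, on its own, show that a string is not.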
2.5 Mathematics and the real world
Euclidean geometry appears to be about the real world. In fact, for thousands of years this was what mathematicians believed until they discovered
other geometries with different properties. On the surface of a sphere, for
example, the sum of the angles in a spherical triangle will actually be bigger
than 180°, the exact amount being determined by the area of the triangle.
This result played an important role in surveying. But our analysis above
leads us to the following conclusion:
Mathematics is about logically consistent mathematical universes.
A mathematical truth is therefore something proved in one of those mathematical universes, and is not a truth about ‘out there’. Despite this, mathematical truths do help us to understand the actual physical universe we
inhabit. For example, does the geometry of the universe follow the rules of
Euclidean geometry? Here is what NASA says on the basis of the Wilkinson
Microwave Anisotropy Probe (WMAP):
“WMAP also confirms the predictions that the amplitude of the
variations in the density of the universe on big scales should be
slightly larger than smaller scales, and that the universe should
obey the rules of Euclidean geometry so the sum of the interior
angles of a triangle add to 180 degrees.”
http://map.gsfc.nasa.gov/news/index.html
2.6 Proving something false
‘Proving a statement true’ and ‘proving a statement false’ sound similar
but it turns out that ‘proving a statement false’ requires a lot less work
than ‘proving a statement true’. There is an asymmetry between them. To
prove a statement false all you need do is find a counterexample. Here is an
example. Consider the following statement: every odd number bigger than 1
is a prime. This is false. The reason is that 9 is odd, bigger than 1, and not
prime. Thus 9 is a counterexample. The number 9 here can be regarded as
a witness that shows the claim to be false. To prove a statement true, you
have to work hard. To prove a statement false, you only have to find one
counterexample and you are done. (Though in research mathematics finding
a counterexample can be a Herculean task.)
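Hunting for a counterexample is a mechanical search, which makes it a natural job for a computer. The following minimal Python sketch (the function names and the search limit are my own) looks for the smallest witness against the statement ‘every odd number bigger than 1 is a prime’.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def first_counterexample(limit):
    """Return the smallest odd n with 1 < n <= limit that is not prime, or None."""
    for n in range(3, limit + 1, 2):
        if not is_prime(n):
            return n
    return None

print(first_counterexample(100))
```

The asymmetry shows up here too: a single value returned by the search refutes the statement, whereas a search that returns nothing proves nothing about the numbers beyond the limit.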
2.7 Key points
• One of the goals of this book is to introduce you to proofs. This does
not mean that you will afterwards be able to do proofs. That takes
time and practice.
• Initially, you should aim to understand proofs. This means seeing
why a proof is true. A good test of whether you really understand a
proof is whether you can explain it to someone else. It is much easier
to check that a proof is correct than it is to invent the proof in the
first place. Nevertheless, be warned, it can also take a long time just
to understand a proof.
• I shall ask you to find some proofs for yourself. But do not expect to
find them in a few minutes. Constructing proofs takes time, trial and
error and, yes, luck.
• If you don’t understand the words used in a statement that you are
asked to prove then you are not going to be able to prove that statement. Definitions are vitally important in mathematics.
• Every statement that you make in a proof must be justified: if it is a
definition, say that it is a definition; if it is a result known to be true,
that is a theorem, say that it is known to be true; if it is one of the
assumptions, say that it is one of the assumptions; if it is an axiom,
say that it is an axiom.
• When starting out, it is probably best to write each statement of a
proof on a separate line followed by its justification.
Finally, there are one or two pieces of terminology and notation that are
worth mentioning here. The conclusion of a proof is marked using the symbol
□. This replaces the older use of QED. If we believe something might be true
but there isn’t yet a proof we say that it is a conjecture. The things we can
prove fall, roughly, into the following categories: a theorem is a major result,
worthy of note; a proposition is a result, and a lemma is an auxiliary result,
a tool, useful in many different places; a corollary is a result we can deduce
with little or no effort from a proposition or theorem.
2.8 Mathematical creativity
Everything I have said above is true, but does need to be placed in perspective. Where do proofs come from? More to the point, where do theorems
come from? Music is a useful analogy. You can learn how to write music
down, but that doesn’t make you a musician. In fact, there are some talented
musicians who cannot even read music. Proofs keep us honest and ground
what we are doing, but what makes maths fun is that it is creative, and
for creativity there are no rules. For example, in dreaming up a theorem,
experimentation may well play a role. Sometimes a theorem may evolve in
tandem with a proof, at other times the theorem, or more accurately, the
conjecture comes first and then there is the struggle to prove it, which may
take place over many generations and centuries.
2.9 Set theory: the language of mathematics
Everyday English is good at everyday jobs, but can be hopelessly imprecise where accuracy is important. To get around this, special varieties of
English, little dialects, have been constructed for particular purposes. In
mathematics, we use precise versions of everyday language augmented with
special symbols. Part of this special language is that of set theory, invented
by Georg Cantor (1845–1918) in the last quarter of the nineteenth century.
This section is mainly a phrasebook of the most important terms we shall
need for most of this book. I shall develop this language further, as needed,
when studying combinatorics.
The starting point of set theory is the following pair of deceptively simple
definitions:
• A set is a collection of objects which we wish to regard as a whole. The
members of a set are called its elements³.
• Two sets are equal precisely when they have the same elements.
³ Strictly speaking this definition is nonsense. Why?
We often use capital letters to name sets: such as A, B, or C or fancy capital
letters such as ℕ and ℤ. The elements of a set are usually denoted by lower
case letters. If x is an element of the set A then we write
x∈A
and if x is not an element of the set A then we write
x ∉ A.
A set should be regarded as a bag of elements, and so the order of the
elements within the set is not important. In addition, repetition of elements
is ignored.⁴
Examples 2.9.1.
1. The following sets are all equal: {a, b}, {b, a}, {a, a, b}, {a, a, a, a, b, b, b, a}
because the order of the elements within a set is not important and any
repetitions are ignored. Despite this it is usual to write sets without
repetitions to avoid confusion. We have that a ∈ {a, b} and b ∈ {a, b}
but α ∉ {a, b}.
2. The set {} is empty and is called the empty set. It is given a special
symbol ∅, which is modelled on the letter Ø of the Danish and Norwegian
alphabets. Remember that ∅ means the same thing as {}.
Take careful note that ∅ ≠ {∅}. The reason is that the empty set
contains no elements whereas the set {∅} contains one element. By the
way, the symbol for the empty set is different from the Greek letter phi:
φ or Φ.
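These two examples can be tried out in Python, whose built-in sets also ignore order and repetition. This is only a sketch: the variable names are mine, and Python requires a set that is itself an element of another set to be a `frozenset`, a detail that has nothing to do with the mathematics.

```python
# Order and repetition make no difference to a set.
pair = {"a", "b"}
assert pair == {"b", "a"} == {"a", "a", "b"}
assert "a" in pair and "b" in pair and "α" not in pair

# The empty set has no elements; the set containing the empty set has one.
empty = frozenset()
set_containing_empty = frozenset({empty})
assert empty != set_containing_empty
print(len(empty), len(set_containing_empty))
```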
The number of elements in a set is called its cardinality. If X is a set
then |X| denotes its cardinality. A set is finite if it only has a finite number
of elements, otherwise it is infinite. If a set has only finitely many elements
then we might be able to list them if there aren’t too many: this is done by
putting them in ‘curly brackets’ { and }. We can sometimes define infinite
sets by using curly brackets but then, because we can’t list all elements in
an infinite set, we use ‘. . .’ to mean ‘and so on in the obvious way’. This can
also be used to define finite sets where there is an obvious pattern. Often,
⁴ If you want to take account of repetitions you have to use multisets.
we describe a set by saying what properties an element must have to belong
to the set. Thus
{x : P (x)}
means ‘the set of all things x which satisfy the condition P ’. Here are some
examples of sets defined in various ways.
Examples 2.9.2.
1. D = { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday }, the set of the days of the week. This is a small finite set and so
we can conveniently list its elements.
2. M = { January, February, March, . . . , November, December }, the set
of the months of the year. This is a finite set but I didn’t want to write
down all the elements so I wrote ‘. . . ’ to indicate that there were other
elements of the set which I was too lazy to write down explicitly but
which are, nevertheless, there.
3. A = {x : x is a prime number}. I define a set by describing the properties that the elements of the set must have. Here P (x) is the statement
‘x is a prime number’ and those natural numbers x are admitted membership to the set when they are indeed prime.
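The notation {x : P(x)} has a close relative in Python’s set comprehensions. The sketch below makes one assumption that the mathematics does not: since a computer can only build finite sets, it restricts attention to x below 30, a bound I have chosen purely for illustration. The function `is_prime` plays the role of the condition P.

```python
def is_prime(x):
    """Trial division: True exactly when x is a prime number."""
    if x < 2:
        return False
    return all(x % d != 0 for d in range(2, int(x ** 0.5) + 1))

# {x : x is a prime number and x < 30}, a finite piece of the set A above.
primes_below_30 = {x for x in range(30) if is_prime(x)}
print(sorted(primes_below_30))
```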
In this book, the following sets of numbers will play a special role. We
shall use this notation throughout and so it is worthwhile getting used to it.
Examples 2.9.3.
1. The set ℕ = {0, 1, 2, 3, . . .} of all natural numbers.
2. The set ℤ = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} of all integers. The reason
ℤ is used to designate this set is that ‘Z’ is the first letter of the
word ‘Zahl’, the German for number.
3. The set ℚ of all rational numbers i.e. those numbers that can be written
as fractions whether positive or negative.
4. The set ℝ of all real numbers i.e. all numbers which can be represented
by decimals with potentially infinitely many digits after the decimal
point.
5. The set ℂ of all complex numbers, which I shall introduce from scratch
later on.
Given a set A, a new set B can be formed by choosing elements from
A to put in B. We say that B is a subset of A, which is written B ⊆ A.
In mathematics, the word ‘choose’ also includes the possibility of choosing
nothing and the possibility of choosing everything. In addition, there doesn’t
have to be any rhyme or reason to your choices: you can pick elements ‘at
random’ if you want. If B ⊆ A and A ≠ B then we say that B is a proper
subset of A.
Examples 2.9.4.
1. ∅ ⊆ A for every set A, where we choose no elements from A. It is a
very common mistake to forget the empty set when listing subsets of a
set.
2. A ⊆ A for every set A, where we choose all the elements from A. It is
a very common mistake to forget the set itself when listing subsets of
a set.
3. ℕ ⊆ ℤ ⊆ ℚ ⊆ ℝ ⊆ ℂ.
4. E, the set of even natural numbers, is a subset of ℕ.
5. O, the set of odd natural numbers, is a subset of ℕ.
6. P = {2, 3, 5, 7, 11, 13, 17, 19, 23, . . .}, the set of primes, is a subset of ℕ.
7. A = {x : x ∈ ℝ and $x^2 = 4$} which is just the set {−2, 2}.
There is a particular kind of subset that will be convenient to define now.
If A and B are sets we define the set A \ B to consist of those elements of A
that are not in B. Thus, in particular, A \ B ⊆ A. The operation is called
relative complement. For example, ℕ \ E = O. The set ℝ \ ℚ is precisely the
set of irrational numbers.
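Subsets and relative complements can be checked on small finite sets in Python. In this sketch the set `N` is only a stand-in for the natural numbers below 20, and the helper `subsets` is my own; it lists every subset of a finite set, including the two that are most often forgotten.

```python
from itertools import combinations

N = set(range(20))                  # stand-in: the natural numbers below 20
E = {n for n in N if n % 2 == 0}    # the even ones
O = {n for n in N if n % 2 == 1}    # the odd ones

assert E <= N and O <= N            # E and O are subsets of N
assert E < N                        # E is a proper subset of N
assert N - E == O                   # relative complement

def subsets(A):
    """Return all subsets of the finite set A, including the empty set and A itself."""
    items = list(A)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

print(subsets({1, 2, 3}))
```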
When set theory is first encountered it doesn’t look very impressive. What
could you possibly do with these very simple, if not simple-minded, definitions? In fact, all of mathematics can be developed using set theory. I am
going to finish off this section with a first glimpse at the power of set theory.
Consider the set {a, b}. I have explained above that order doesn’t matter
and so this is the same set as {b, a}. But there are many occasions where
we do want order to matter. For example, in the Olympics it is important
to know who came first and second in the 100m sprint not merely that the
first two over the finishing line were X and Y in alphabetical order. So we
need a new notion where order does matter. It is called an ordered pair and
is written (a, b), where a is called the first component and b is called the
second component. The key feature of this new object is that (a, b) = (c, d)
if, and only if, a = c and b = d. So, order matters. For example, the ordered pair (1, 2) is different from the ordered pair (2, 1). Furthermore, (1, 1)
does not mean the same as 1 on its own. The idea of an ordered pair is a
familiar one from co-ordinate geometry. We use ordered pairs of real numbers (x, y) to specify points in the plane. At first blush, set theory seems
inadequate to define ordered pairs. But in fact it can. I have put the details
in a box and you don’t need to read them to understand the rest of the book.
Ordered Pairs
I am going to show you how sets, which don’t encode order directly,
can nevertheless be used to define ordered pairs. It is an idea due to
Kuratowski (1896–1980). Define
(a, b) = {{a}, {a, b}}.
We have to prove, using only this definition, that we have (a, b) = (c, d)
if, and only if, a = c and b = d. The proof is essentially an exercise in
special cases. I shall prove the hard direction. Suppose that
{{a}, {a, b}} = {{c}, {c, d}}.
Since {a} is an element of the lefthand side it must be an element of the
righthand side. So {a} ∈ {{c}, {c, d}}. There are now two possibilities.
Either {a} = {c} or {a} = {c, d}. The first case gives us that a = c, and
the second case gives us that a = c = d. Since {a, b} is an element of the
lefthand side it must be an element of the righthand side. So {a, b} ∈
{{c}, {c, d}}. There are again two possibilities. Either {a, b} = {c} or
{a, b} = {c, d}. The first case gives us that a = b = c, and the second
case gives us that (a = c and b = d) or (a = d and b = c). We therefore
have the following possibilities:
• a = b = c. But then {{a}, {a, b}} = {{a}}. It follows that c = d
and so a = b = c = d and, in particular, a = c and b = d.
• a = c and b = d.
• In all remaining cases, a = b = c = d and so, in particular, a = c
and b = d.
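Kuratowski’s definition can be tested on a small scale in Python. In this sketch the function name `kpair` is my own, and `frozenset` is used because Python only allows immutable sets as elements of other sets. The loop checks the key property for every choice of a, b, c, d from a four-element range, which is evidence for the property and not a substitute for the proof above.

```python
from itertools import product

def kpair(a, b):
    """Kuratowski's encoding of the ordered pair (a, b) as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# (a, b) = (c, d) if, and only if, a = c and b = d.
for a, b, c, d in product(range(4), repeat=4):
    assert (kpair(a, b) == kpair(c, d)) == (a == c and b == d)

print(kpair(1, 2) == kpair(2, 1))
```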
We can now build sets of ordered pairs. Let A and B be sets. Define
A × B, the product of A and B, to be the set
A × B = {(a, b) : a ∈ A and b ∈ B}.
Example 2.9.5. Let A = {1, 2, 3} and let B = {a, b}. Then
A × B = {(1, a), (1, b), (2, a), (2, b), (3, a), (3, b)}
and
B × A = {(a, 1), (b, 1), (a, 2), (b, 2), (a, 3), (b, 3)}.
So, in particular, A × B ≠ B × A, in general.
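The same example can be reproduced in Python, where `itertools.product` lists the ordered pairs and tuples play the role of ordered pairs. This is only a sketch; the variable names are mine.

```python
from itertools import product

A = {1, 2, 3}
B = {"a", "b"}

A_times_B = set(product(A, B))   # all pairs (x, y) with x in A and y in B
B_times_A = set(product(B, A))   # all pairs (y, x) with y in B and x in A

print(sorted(A_times_B))
print(sorted(B_times_A))
print(A_times_B == B_times_A)
```

Both products have 3 × 2 = 6 elements, but they are different sets because the pairs (1, a) and (a, 1) are different.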
If A = B it is natural to abbreviate A × A as $A^2$. This now agrees with
the notation $\mathbb{R}^2$ which is the set of all ordered pairs of real numbers and,
geometrically, can be regarded as the real plane.
We have defined ordered pairs but there is no reason to stop with just
pairs. We may also define ordered triples. This can be done by defining
(x, y, z) = ((x, y), z).
The key property of ordered triples is that if (a, b, c) = (d, e, f ) then a = d,
b = e and c = f . Given three sets A, B and C we may define their product
A × B × C to be the set of all ordered triples (a, b, c) where a ∈ A, b ∈ B
and c ∈ C. A good example of an ordered triple in everyday life is a date
that consists of a day, a month and a year. Thus the 16th June 1904 is really
an ordered triple (16, June, 1904) where we specify day, month and year in
that order. If A = B = C then we write $A^3$ rather than A × A × A. Thus
the set $\mathbb{R}^3$ consists of all Cartesian co-ordinates (x, y, z). In general, we may
define ordered n-tuples, which look like this $(x_1, \ldots, x_n)$, and products of n sets $A_1 \times \ldots \times A_n$. And if $A_1 = \ldots = A_n = A$ then we write $A^n$ for their n-fold
product.
Russell’s Paradox
There is more to sets than meets the eye. I shall now describe a famous
result in the history of mathematics called Russell’s Paradox. Define
the following
R = {x : x ∉ x},
in other words: the set of all sets that do not contain themselves as an
element. For example, ∅ ∈ R. We now ask the question: is R ∈ R?
Before resolving this question, let’s back off a bit and ask what it means
for X ∈ R. From the entry requirements, we would have to show that
X ∉ X. Putting X = R we deduce that R ∈ R is true only if R ∉ R.
Since this is an evident contradiction, we are inclined to deduce that R ∉ R.
However, if R ∉ R then in fact R satisfies the entry requirements
to be an element of R and so R ∈ R. Thus exactly one of R ∈ R and
R ∉ R must be true but assuming one is true implies the other is true.
We therefore have an honest-to-goodness contradiction. Our only way
out is to conclude that, whatever R might be, it is not a set. But this
in turn contradicts my definition of a set as a collection of objects since
R is a collection of objects. If you want to understand how to escape
this predicament, you will have to study set theory. Disconcerting as this
might be to you, imagine how much more so it was to the mathematician
Gottlob Frege (1848–1925). He was working on a book which based the
development of maths on sets when he received a letter from Russell
describing this paradox and undermining what Frege was attempting to
achieve.
Bertrand Russell himself was an Anglo-Welsh philosopher born in
1872, when Queen Victoria still had nearly thirty years left on the throne
as ‘Queen empress’, and who died in 1970 a few months after Neil Armstrong stepped onto the moon. As a young man he made important
contributions to the foundations of mathematics but in the course of his
extraordinary life he found time to stand for parliament, encouraged the
philosopher Ludwig Wittgenstein, received two prison sentences, won
the Nobel prize for literature, was the first president of CND, and campaigned against the Vietnam war. See Russell: a very short introduction
by A. C. Grayling published by OUP, 2002, for a very short introduction.
I shall conclude this section by touching on a fundamental notion of mathematics: that of a function. I shall approach it by first defining something
more general.
Let A and B be any sets. By definition a subset X ⊆ A × B is called
a relation from A to B. To motivate this definition, and new terminology, I
shall consider an example.
Example 2.9.6. Let A be the set {A(dam), B(eth), C(ate), D(ave)} of people. Let B be the set {a(pples), b(ananas), o(ranges)} of fruit. Define X to
be the following set of ordered pairs
{(A, a), (A, o), (B, b), (D, a), (D, b), (D, o)}
which tells us who likes which fruit. Thus, for example, Adam likes apples
and oranges (but not bananas) and Cate doesn’t like any of the fruit on offer.
It is pretty irresistible to represent this information by means of a directed
graph, such as the one below. Clearly, such graphs can be drawn to represent
any relation. The term ‘relation’ is now explained by the fact that X tells
us how the elements of A are related to the elements of B. In this case, the
relation is ‘likes to eat’.
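[Figure: directed graph on the people A, B, C, D and the fruit a, o, b, with one arrow for each ordered pair in X.]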
Let X be a relation from A to B. We say that X is a function if it satisfies
two additional conditions: first, for each a ∈ A there is at least one b ∈ B
such that (a, b) ∈ X; second, if (a, b), (a, c) ∈ X then b = c. If we think back
to the graph in our example above, then the first condition says that every
element in A is at the base of an arrow, and the second condition says that
no element in A is at the base of two or more arrows. Slightly
different notation is used when dealing with functions. Rather than thinking
of ordered pairs, we think instead of inputs and outputs. Thus a function
from A to B is determined when for each a ∈ A there is associated exactly
one element b ∈ B. We think of a as the input and b as the corresponding,
uniquely determined, output. If we denote our function by f then we write
b = f(a) or a ↦ b. Thus the corresponding relation is the set of all
ordered pairs (a, f(a)) where a ∈ A. We call the set A the domain of the
function and the set B the codomain of the function. We write f : A → B or
$A \xrightarrow{f} B$.
Example 2.9.7. Here is an example of a function f : A → B. Let A be the
set of all students in the lecture theatre at this time. Let B be the set of
natural numbers. Then f is defined when for each student a ∈ A we associate
their age f (a). We can see why this is precisely a function and not merely a
relation. First, everyone has an age and, assuming they don’t lie, they have
exactly one age. On the other hand, if we kept A as it is and let B be the
set of nationalities then we will no longer have a function in general. Some
people might be stateless, but even if we include that as a possibility in the
set B, we still won’t necessarily have a function since some people own more
than one passport.
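The two conditions that turn a relation into a function can be checked mechanically for small finite sets. The following is only a sketch in Python: the names `is_relation` and `is_function`, and the 'favourite fruit' relation F, are hypothetical and are not part of the text.

```python
# People and fruit from Example 2.9.6; single letters stand for the names.
A = {"A", "B", "C", "D"}
B = {"a", "b", "o"}
X = {("A", "a"), ("A", "o"), ("B", "b"), ("D", "a"), ("D", "b"), ("D", "o")}


def is_relation(pairs, domain, codomain):
    """A relation from domain to codomain is any subset of domain x codomain."""
    return all(a in domain and b in codomain for (a, b) in pairs)


def is_function(pairs, domain, codomain):
    """Check the two extra conditions: every input has at least one output,
    and no input has two different outputs."""
    if not is_relation(pairs, domain, codomain):
        return False
    outputs = {a: set() for a in domain}
    for a, b in pairs:
        outputs[a].add(b)
    return all(len(bs) == 1 for bs in outputs.values())


# 'likes to eat' is a relation but not a function:
# Cate is at the base of no arrow and Adam is at the base of two.
print(is_relation(X, A, B), is_function(X, A, B))

# A hypothetical 'favourite fruit' relation, invented for illustration.
F = {("A", "a"), ("B", "b"), ("C", "o"), ("D", "a")}
print(is_function(F, A, B))
```

Storing the outputs of each input in a set means that both conditions reduce to one test: each input must have exactly one output.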
Exercises 2.9
1. Let A = {♣, ♦, ♥, ♠}, B = {♠, ♦, ♣, ♥} and C = {♠, ♦, ♣, ♥, ♣, ♦, ♥, ♠}.
Is it true or false that A = B and B = C? Explain.
2. Let X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Write down the following subsets of
X:
(a) The subset A of even elements of X.
(b) The subset B of odd elements of X.
(c) C = {x : x ∈ X and x ≥ 6}.
(d) D = {x : x ∈ X and x > 10}.
(e) E = {x : x ∈ X and x is prime}.
(f) F = {x : x ∈ X and (x ≤ 4 or x ≥ 7)}.
3. (a) Find all subsets of {a, b}. How many are there? Write down also
the number of subsets with respectively 0, 1 and 2 elements.
(b) Find all subsets of {a, b, c}. How many are there? Write down also
the number of subsets with respectively 0, 1, 2 and 3 elements.
(c) Find all subsets of the set {a, b, c, d}. How many are there? Write
down also the number of subsets with respectively 0, 1, 2, 3 and
4 elements.
(d) What patterns do you notice arising from these calculations?
4. If the set A has m elements and the set B has n elements how many
elements does the set A × B have?
5. If A has m elements, how many elements does the set $A^n$ have?
6. Prove that two sets A and B are equal if, and only if, A ⊆ B and
B ⊆ A.
2.10 Proof by induction
This is a method of proof that, although useful, does not always deliver much
insight into why something is true. The basis of this method is the following:
Let X be a subset of N that satisfies the following two conditions:
first, 0 ∈ X, and second if n ∈ X then n + 1 ∈ X. Then X = N.
This fact is called the induction principle, and can be viewed as one of the
basic axioms describing the natural numbers.
We may use it as a proof technique in the following way. Suppose we
have an infinite number of statements $S_0, S_1, S_2, \ldots$ which we want to prove.
By the induction principle, it is enough to do two things:
1. Show that $S_0$ is true.
2. Show that if $S_n$ is true then $S_{n+1}$ is also true.
It will then follow that $S_i$ is true for all natural numbers i.
Proofs by induction have the following script:
Base step Show that the case n = 0 holds.
Induction hypothesis (IH) Assume that the case where n = k holds.
Proof bit Now use (IH) to show that the case where n = k + 1 holds.
Conclude that the result holds for all n by the induction principle.
Example 2.10.1. Prove by induction that $n^3 + 2n$ is exactly divisible by 3
for all natural numbers n ≥ 0.
Base step: when n = 0, we have that $0^3 + 2 \cdot 0 = 0$ which is exactly
divisible by 3.
Induction hypothesis: assume the result is true for n = k. We prove it for
n = k + 1. We need to prove that $(k + 1)^3 + 2(k + 1)$ is exactly divisible
by 3 assuming only that $k^3 + 2k$ is exactly divisible by 3. We first expand
$(k + 1)^3 + 2(k + 1)$ to get
$$k^3 + 3k^2 + 3k + 1 + 2k + 2.$$
This is equal to
$$(k^3 + 2k) + 3(k^2 + k + 1)$$
which is exactly divisible by 3: the first bracket by the induction hypothesis, and the second term because it is a multiple of 3.
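A computer cannot replace the induction argument, but it can be used to sanity-check both the claim and the algebraic identity used in the induction step. The sketch below is in Python; the function names and the cut-off of 1000 are arbitrary choices made for illustration.

```python
def divisible_by_3(n):
    """Test the claim of Example 2.10.1 for a single value of n."""
    return (n**3 + 2 * n) % 3 == 0


def step_identity_holds(k):
    """Test the rearrangement used in the induction step:
    (k+1)^3 + 2(k+1) = (k^3 + 2k) + 3(k^2 + k + 1)."""
    left = (k + 1) ** 3 + 2 * (k + 1)
    right = (k**3 + 2 * k) + 3 * (k**2 + k + 1)
    return left == right


# Checking finitely many cases is evidence, not a proof.
checked = all(divisible_by_3(n) and step_identity_holds(n) for n in range(1000))
print(checked)
```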
In practice, some simple variants of this principle are used. Rather than
the whole set N, we often work with a set of the form
$$\mathbb{N}_{\geq k} = \mathbb{N} \setminus \{0, 1, \ldots, k - 1\}$$
where k ≥ 1. Our induction principle is modified accordingly: a subset X of
$\mathbb{N}_{\geq k}$ that contains k and contains n + 1 whenever it contains n must be equal
to the whole of $\mathbb{N}_{\geq k}$. In our script above, the base step involves checking the
case where n = k.
What I described above I shall call basic induction. There is also something called the strong induction principle which runs as follows:
Let X be a subset of N that satisfies the following two conditions:
first, 0 ∈ X and second, if {0, 1, . . . , n} ⊆ X, where n ≥ 0, then
{0, 1, . . . , n + 1} ⊆ X. Then X = N.
Finally, there is the well-ordering principle that states that every nonempty subset of the natural numbers has a smallest element.
Induction, strong induction and well-ordering look very different from
each other. In fact, they are equivalent and all useful in proving theorems.
Proposition 2.10.2. The following are equivalent.
1. The induction principle.
2. The strong induction principle.
3. The well-ordering principle.
Proof. (1)⇒(2). I shall assume that the induction principle holds and prove
that the strong induction principle holds. Let X ⊆ N be such that 0 ∈ X
and if {0, 1, . . . , n} ⊆ X, where n ≥ 0, then {0, 1, . . . , n + 1} ⊆ X. We
shall use induction to prove that X = N. Let Y ⊆ N consist of all natural
numbers n such that {0, 1, . . . , n} ⊆ X. We have that 0 ∈ Y and we have
that n + 1 ∈ Y whenever n ∈ Y . By induction, we deduce that Y = N. It
follows that X = N.
(2)⇒(3). I shall assume that the strong induction principle holds and
prove that the well-ordering principle holds. Let X ⊆ N be a subset that has
no smallest element. I shall prove that X must be empty. Put Y = N \ X. I
claim that 0 ∈ Y . If not, then 0 ∈ X and that would obviously have to be the
smallest element, which is a contradiction. Suppose that {0, 1, . . . , n} ⊆ Y .
Then we must have that n + 1 ∈ Y because otherwise n + 1 would be the
smallest element of X. We now invoke strong induction to deduce that Y = N
and so X = ∅.
(3)⇒(1). I shall assume the well-ordering principle and prove the induction principle. Let X ⊆ N be a subset such that 0 ∈ X and whenever n ∈ X
then n + 1 ∈ X. Suppose that N \ X is non-empty. Then it would have a
smallest element k say. Since 0 ∈ X, we have that k ≥ 1. But then k − 1 ∈ X and so, by assumption, k ∈ X,
which is a contradiction. Thus N \ X is empty and so X = N.
Strong induction will be used in a few places in this book but I will discuss
it in more detail when needed.
Exercises 2.10
1. Prove that for each natural number n ≥ 3, we have that
$n^2 > 2n + 1$.
2. Prove that for each natural number n ≥ 5, we have that
$2^n > n^2$.
3. Prove that for each natural number n ≥ 1, the number $4^n + 2$ is divisible
by 3.
4. Prove that
$$1 + 2 + 3 + \ldots + n = \frac{n(n + 1)}{2}.$$
5. Prove that
$$2 + 4 + 6 + \ldots + 2n = n(n + 1).$$
6. Prove that
$$1^3 + 2^3 + 3^3 + \ldots + n^3 = \left(\frac{n(n + 1)}{2}\right)^2.$$
7. Prove that a set with n ≥ 0 elements has exactly $2^n$ subsets.
Chapter 3
High-school algebra revisited
In this chapter, I will review some of the basic constructions from high-school
algebra from the perspective of this book.
3.1 The rules of the game
3.1.1 The axioms
Algebra deals with the manipulation of symbols. This means that symbols
are altered and combined according to certain rules. In high-school, the algebra you studied was mainly based on the properties of the real numbers. This
means that when you write x you mean an unknown or yet-to-be-determined
real number. In this section, I shall describe the rules, or axioms, that you
use for doing algebra with real numbers. The primary operations we are
interested in are addition x + y and multiplication x × y. As usual, I shall
abbreviate the operation of multiplication by concatenation, which simply
means we write xy. Sometimes, it is helpful to denote multiplication as
follows x · y. Of course, there are two other familiar operations: subtraction and division. We shall see that these should be treated in a different
way: subtraction as the inverse of addition, and division as the inverse of
multiplication.
Both addition and multiplication require two inputs and then deliver
one output with the inputs and outputs all being taken from the same set.
They are therefore examples of what are called binary operations and are
the commonest kinds of operations in algebra. For example, as we shall see
later, matrix addition and matrix multiplication are both binary operations,
the vector product of two vectors is a binary operation, and the intersection
and union of two sets are both binary operations. A binary operation on a
set X is nothing other than a function from X × X to X. I shall use ∗ to
mean any binary operation defined on some specified set X. We usually write
binary operations between the inputs rather than using the usual functional
notation.
[Figure: a box labelled ∗ with two inputs, a and b, and one output, a ∗ b.]
The two most important properties a binary operation may have are commutativity and associativity.
A binary operation is commutative if
a∗b=b∗a
in all cases. That is, the order in which you carry out the operation is
not important. Addition and multiplication of real, and as we shall see later,
complex numbers are commutative. But we shall also meet binary operations
that are not commutative: both matrix multiplication and vector products
are examples. Commutativity is therefore not automatic.
A binary operation is associative if
(a ∗ b) ∗ c = a ∗ (b ∗ c)
in all cases. Remember that the brackets tell you how to work out the
product. Thus (a ∗ b) ∗ c means first work out a ∗ b, let’s call it d, and then
work out d ∗ c. Almost all the binary operations we shall meet in this book
are associative, the one important exception being the vector product.
In order to show that a binary operation ∗ is associative, we have to check
that all possible products (a ∗ b) ∗ c and a ∗ (b ∗ c) are equal. To show that a
binary operation is not associative, we simply have to find specific values for
a, b and c so that (a ∗ b) ∗ c ≠ a ∗ (b ∗ c). Here are examples of both of these
possibilities.
Example 3.1.1. Let’s take the set of real numbers R and investigate a new
binary operation denoted by ◦ that is defined as follows
a ◦ b = a + b + ab.
We shall prove that it is associative. First, we have to understand what it is
we have to show. From the definition of associativity, we have to prove that
(a ◦ b) ◦ c = a ◦ (b ◦ c)
for all real numbers a, b and c. To do this, we calculate first the lefthand side
and then the righthand side and then verify they are equal. Because we are
trying to prove a result true for all real numbers, we cannot choose specific
values of a, b and c. We first calculate (a ◦ b) ◦ c. Using the axioms for real
numbers, we get that
(a ◦ b) ◦ c = (a + b + ab) ◦ c = (a + b + ab) + c + (a + b + ab)c
which is equal to a + b + c + ab + ac + bc + abc. Now we calculate a ◦ (b ◦ c).
We get that
a ◦ (b ◦ c) = a ◦ (b + c + bc) = a + (b + c + bc) + a(b + c + bc)
which is equal to a + b + c + ab + ac + bc + abc. We now see that we get the
same answers however we bracket the product and so we have proved that
the binary operation ◦ is associative.
Example 3.1.2. Let’s take the set N and define the binary operation ⊕ as
follows
$$a \oplus b = a^2 + b^2.$$
I shall show that this binary operation is not associative. Let’s calculate first
(1 ⊕ 2) ⊕ 3. By definition this is computed as follows
$$(1 \oplus 2) \oplus 3 = (1^2 + 2^2) \oplus 3 = 5 \oplus 3 = 5^2 + 3^2 = 25 + 9 = 34.$$
Now we calculate 1 ⊕ (2 ⊕ 3) as follows
$$1 \oplus (2 \oplus 3) = 1 \oplus (2^2 + 3^2) = 1 \oplus (4 + 9) = 1 \oplus 13 = 1^2 + 13^2 = 1 + 169 = 170.$$
Therefore
$$(1 \oplus 2) \oplus 3 \neq 1 \oplus (2 \oplus 3).$$
It follows that the binary operation ⊕ is not associative.
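The contrast between the two examples can be seen in a short Python sketch. The names `circ`, `oplus` and `is_associative_on` are hypothetical. Note the asymmetry: agreement on a finite sample is only evidence of associativity, whereas a single failing triple is a genuine proof of non-associativity.

```python
from fractions import Fraction
from itertools import product


def circ(a, b):
    """The operation of Example 3.1.1: a + b + ab."""
    return a + b + a * b


def oplus(a, b):
    """The operation of Example 3.1.2: a^2 + b^2."""
    return a**2 + b**2


def is_associative_on(op, samples):
    """Compare (a*b)*c with a*(b*c) for every triple drawn from samples."""
    return all(
        op(op(a, b), c) == op(a, op(b, c))
        for a, b, c in product(samples, repeat=3)
    )


# Exact rational samples, so no rounding error can hide a discrepancy.
samples = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]
print(is_associative_on(circ, samples))

# The counterexample from the text.
left = oplus(oplus(1, 2), 3)
right = oplus(1, oplus(2, 3))
print(left, right)
```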
We are now ready to state the algebraic axioms that form the basis of
high-school algebra. We shall split them up into three groups: those dealing
only with addition, those dealing only with multiplication, and finally those
that deal with both operations together.
Axioms for addition
(F1) Addition is associative. Let x, y and z be any real numbers. Then
(x + y) + z = x + (y + z).
(F2) There is an additive identity. The number 0 (zero) is the additive
identity. This means that for any real number x we have that x + 0 =
x = 0 + x. Thus adding zero to a number leaves it unchanged.
(F3) Each element has a unique additive inverse. This means that for each
number x there is another number, written −x, with the property that
x + (−x) = 0 = (−x) + x. The number −x is called the additive inverse
of the number x.
(F4) Addition is commutative. Let x and y be any real numbers. Then
x + y = y + x. The word commutative means that the order in which
you add the numbers does not matter.
The first thing to understand is that none of these axioms should be
surprising. They should all agree with your intuition.
Axioms for multiplication
(F5) Multiplication is associative. Let x, y and z be any real numbers. Then
(xy)z = x(yz).
(F6) There is a multiplicative identity. The number 1 is the multiplicative
identity. This means that for any real number x we have that 1x =
x = x1.
(F7) Each non-zero number has a unique multiplicative inverse. Let x ≠ 0.
Then there is a unique real number written $x^{-1}$ with the property that
$x^{-1}x = 1 = xx^{-1}$. The number $x^{-1}$ is called the multiplicative inverse
of x. It is, of course, the number $\frac{1}{x}$. It is very important to observe
that zero does not have a multiplicative inverse.
(F8) Multiplication is commutative. Let x and y be any real numbers. Then
xy = yx. Once again the word commutative means that the order in
which you carry out the operations doesn’t matter. In this case, the
operation is multiplication.
The axioms for multiplication are very similar to those for addition. The
only real difference between them is axiom (F7). This expresses the fact that
you cannot divide by zero.
Linking axioms
(F9) 0 ≠ 1.
(F10) The additive identity is a multiplicative zero. This means that 0x =
0 = x0. If you multiply any real number by 0 then you get 0.
(F11) Multiplication distributes over addition on the left and the right. There
are actually two distributive laws: the left distributive law
x(y + z) = xy + xz
and the right distributive law
(y + z)x = yx + zx.
Let me come back to the omission of subtraction and division. These are
not viewed as binary operations in their own right. Instead, we define a − b
to mean a + (−b). Thus to subtract b means the same thing as adding −b.
Likewise, we define a ÷ b, when b ≠ 0, to mean $a \times b^{-1}$. Thus to divide by b
is to multiply by $b^{-1}$.
We have missed out one further ingredient in algebra, and that is the
properties of equality.
Properties of equality
(E1) If a = b then c + a = c + b.
(E2) If a = b then ca = cb.
Example 3.1.3. When I talked about algebra in Chapter 1, I mentioned
that the usual way of solving a linear equation in one unknown depended on
the properties of real numbers. Let me now show you how we use the above
axioms to solve ax + b = 0 where a ≠ 0. Throughout, I use without comment
the two properties of equality I have listed above.
\begin{align*}
ax + b &= 0 \\
(ax + b) + (-b) &= 0 + (-b) && \text{by (F3)} \\
ax + (b + (-b)) &= 0 + (-b) && \text{by (F1)} \\
ax + 0 &= 0 + (-b) && \text{by (F3)} \\
ax &= 0 + (-b) && \text{by (F2)} \\
ax &= -b && \text{by (F2)} \\
a^{-1}(ax) &= a^{-1}(-b) && \text{by (F7) since } a \neq 0 \\
(a^{-1}a)x &= a^{-1}(-b) && \text{by (F5)} \\
1x &= a^{-1}(-b) && \text{by (F7)} \\
x &= a^{-1}(-b) && \text{by (F6)}
\end{align*}
I don’t propose that you go into quite such gory detail when solving
equations, but I wanted to show you what actually lay behind the rules that
you might have been taught at school.
Example 3.1.4. We can use our axioms to prove that (−1) × (−1) = 1, something
which is hard to understand in any other way. By definition, −1 is the
additive inverse of 1. This means that 1 + (−1) = 0. Let us calculate
(−1)(−1) − 1. We have that
\begin{align*}
(-1)(-1) - 1 &= (-1)(-1) + (-1) && \text{by definition of subtraction} \\
&= (-1)(-1) + (-1)1 && \text{since 1 is the multiplicative identity} \\
&= (-1)[(-1) + 1] && \text{by the left distributivity law} \\
&= (-1)0 && \text{by properties of additive inverses} \\
&= 0 && \text{by properties of zero.}
\end{align*}
Hence (−1)(−1) = 1. In other words, the result follows from the usual rules
of algebra.
My final example explains the reason for the prohibition about dividing
by zero.
Example 3.1.5. The following fallacious ‘proof’ shows that 1 = 2.
1. Let a = b.
2. Then $a^2 = ab$ when we multiply both sides by a.
3. Now add $a^2$ to both sides to get $2a^2 = a^2 + ab$.
4. Subtract 2ab from both sides to get $2a^2 - 2ab = a^2 + ab - 2ab$.
5. Thus $2(a^2 - ab) = a^2 - ab$.
6. We deduce that 2 = 1 by cancelling.
The source of the problem is in passing from line (5) to line (6). Since a = b,
we have that $a^2 - ab = 0$, and so in cancelling we are in fact dividing by zero.
3.1.2 Indices
We usually write $a^2$ rather than aa, and $a^3$ instead of aaa. In this section,
I want to review the meaning of algebraic expressions such as $a^{r/s}$ where $\frac{r}{s}$ is
any rational number. Our starting point is a result that I would encourage
you to assume as an axiom at a first reading. I have included the proof to
show you a more sophisticated example of proof by induction.
Lemma 3.1.6 (Generalized associativity). Let ∗ be any binary operation
defined on a set X. If ∗ is associative then however you bracket a product
such as
$$x_1 * \ldots * x_n$$
you will always get the same answer.
Proof. If $x_1, x_2, \ldots, x_n$ are elements of the set X then one particular bracketing will play an important role in our proof
$$x_1 * (x_2 * (\cdots (x_{n-1} * x_n) \cdots))$$
which we write as $[x_1 x_2 \ldots x_n]$.
The proof is by strong induction on the length n of the product in question. The base case is where n = 3 and is just an application of the associative
law. Assume that n ≥ 4 and that for all k < n, all bracketings of a sequence
of k elements of X lead to the same answer. This is therefore the induction hypothesis for strong induction. Let X denote any properly bracketed
expression obtained by inserting brackets into the sequence $x_1, x_2, \ldots, x_n$.
Observe that the computation of such a bracketed product involves computing n − 1 products. This is because at each step we can only compute the
product of adjacent letters $x_i * x_{i+1}$. Thus at each step of our calculation
we reduce the number of letters by one until there is only one letter left.
However the expression may be bracketed, the final step in the computation
will be of the form Y ∗ Z, where Y and Z will each have arisen from properly
bracketed expressions. In the case of Y it will involve a bracketing of some
sequence $x_1, x_2, \ldots, x_r$, and for Z the sequence $x_{r+1}, x_{r+2}, \ldots, x_n$ for some r
such that 1 ≤ r ≤ n − 1. Since Y involves a product of length r < n, we
may assume by the induction hypothesis that $Y = [x_1 x_2 \ldots x_r]$. Observe that
$[x_1 x_2 \ldots x_r] = x_1 * [x_2 \ldots x_r]$. Hence by associativity,
$$X = Y * Z = (x_1 * [x_2 \ldots x_r]) * Z = x_1 * ([x_2 \ldots x_r] * Z).$$
But $[x_2 \ldots x_r] * Z$ is a properly bracketed expression of length n − 1 in
$x_2, \ldots, x_n$ and so using the induction hypothesis must equal $[x_2 x_3 \ldots x_n]$.
It follows that $X = [x_1 x_2 \ldots x_n]$. We have therefore shown that all possible
bracketings yield the same result in the presence of associativity.
We illustrate a special case of the above proof in the example below.
Example 3.1.7. Take n = 5. Then the notation $[x_1 x_2 x_3 x_4 x_5]$ introduced
in the above proof means $x_1 * (x_2 * (x_3 * (x_4 * x_5)))$. Consider the product
$((x_1 * x_2) * x_3) * (x_4 * x_5)$. Here we have $Y = (x_1 * x_2) * x_3$ and $Z = x_4 * x_5$.
By associativity $Y = x_1 * (x_2 * x_3)$. Thus $Y * Z = (x_1 * (x_2 * x_3)) * (x_4 * x_5)$.
But this is equal to $x_1 * ((x_2 * x_3) * (x_4 * x_5))$ again by associativity. By the
induction hypothesis $(x_2 * x_3) * (x_4 * x_5) = x_2 * (x_3 * (x_4 * x_5))$, and so
$$((x_1 * x_2) * x_3) * (x_4 * x_5) = x_1 * (x_2 * (x_3 * (x_4 * x_5))),$$
as required.
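The lemma can be explored experimentally by computing a product under every possible bracketing. The Python sketch below mirrors the observation in the proof that the final step always has the form Y ∗ Z. The function `all_bracketings` is a hypothetical helper, and the two operations are those of Examples 3.1.1 and 3.1.2.

```python
def all_bracketings(xs, op):
    """Return the set of values obtained from every way of bracketing
    xs[0] * xs[1] * ... * xs[-1], keeping the elements in order."""
    if len(xs) == 1:
        return {xs[0]}
    results = set()
    # The final step is Y * Z, where Y uses the first r elements
    # and Z uses the remaining ones.
    for r in range(1, len(xs)):
        for y in all_bracketings(xs[:r], op):
            for z in all_bracketings(xs[r:], op):
                results.add(op(y, z))
    return results


def circ(a, b):
    """Associative operation from Example 3.1.1."""
    return a + b + a * b


def oplus(a, b):
    """Non-associative operation from Example 3.1.2."""
    return a**2 + b**2


xs = (1, 2, 3, 4, 5)
print(len(all_bracketings(xs, circ)))   # one value if every bracketing agrees
print(len(all_bracketings(xs, oplus)))  # several values are possible here
```

For the associative operation the set collapses to a single value, as the lemma predicts. For the non-associative one, different bracketings give different answers.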
If a binary operation is associative then the above lemma tells us that
computing products of elements is straightforward because we never have
to worry about how to evaluate it as long as we maintain the order of the
elements. We now consider a special case of this result. Let a be any real
number. Define the nth power $a^n$ of a, where n is a natural number, as
follows: $a^1 = a$ and $a^n = a a^{n-1}$ for any n ≥ 2. Generalized associativity
tells us that $a^n$ can in fact be calculated in any way we like because we shall
always obtain the same answer. The following result should be familiar. I
shall ask you to prove it in the exercises.
Lemma 3.1.8 (Laws of exponents). Let m, n ≥ 1 be any natural numbers.
1. $a^{m+n} = a^m a^n$.
2. $(a^m)^n = a^{mn}$.
It follows from the above lemma that powers of the same element a commute with one another: $a^m a^n = a^n a^m$ as both products equal $a^{m+n}$. Our goal
now is to define what $a^m$ means when m is an arbitrary rational number. We
shall be guided by the requirement that the above laws of exponents should
continue to hold. We may extend the laws of exponents to allow m or n to
be 0. The only way to do this is to define $a^0 = 1$, where 1 is the identity and
a ≠ 0.
An extreme case! What about $0^0$? This is a can of worms. For this book,
it is probably best to define $0^0 = 1$.
We have explained what $a^n$ means when n is positive but what can we say
when the exponent is negative? In other words, what does $a^{-n}$ mean? We
assume that the rules above still apply. Thus whatever $a^{-n}$ means we should
have that $a^{-n} a^n = a^0 = 1$. It follows that $a^{-n} = \frac{1}{a^n}$. With this interpretation
we have defined $a^n$ for all integer values of n.
We now investigate what $a^{1/n}$ should mean. If the laws of exponents are to
continue holding, then $(a^{1/n})^n = a^1 = a$. It follows that $a^{1/n} = \sqrt[n]{a}$.
We may now calculate $a^{r/s}$: it is equal to
$$a^{r/s} = (\sqrt[s]{a})^r.$$
How do we calculate $(ab)^n$? This is just ab times itself n times. But the
order in which we multiply a’s and b’s doesn’t matter and so we can arrange
all the a’s to the front. Thus $(ab)^n = a^n b^n$.
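The integer-exponent rules above can be checked with exact rational arithmetic. The following Python sketch defines a hypothetical function `power` directly from the definitions in the text and tests the laws on a small grid of exponents. Rational exponents are left out because roots cannot in general be computed exactly.

```python
from fractions import Fraction


def power(a, n):
    """Integer powers built from the definitions in the text:
    a^0 = 1, a^n = a * a^(n-1) for n >= 1, and a^(-n) = 1 / a^n for a != 0."""
    if n < 0:
        return 1 / power(a, -n)
    if n == 0:
        return Fraction(1)
    return a * power(a, n - 1)


# Two arbitrary non-zero rationals, chosen only for illustration.
a, b = Fraction(3, 2), Fraction(-5, 7)

laws_hold = all(
    power(a, m + n) == power(a, m) * power(a, n)
    and power(power(a, m), n) == power(a, m * n)
    and power(a * b, n) == power(a, n) * power(b, n)
    for m in range(-4, 5)
    for n in range(-4, 5)
)
print(laws_hold)
```

Using `Fraction` rather than floating-point numbers means each equality is tested exactly, so a failure would indicate a real error rather than rounding.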
We also have similar results for addition. We define 2x = x + x and
nx = x + . . . + x where the x occurs n times. We have 1x = x and 0x = 0.
Let $\{a_1, \ldots, a_n\}$ be a set of n elements. If we write them all in some
order $a_{i_1}, \ldots, a_{i_n}$ then we have what is called a permutation of the elements.
The following lemma can be treated as an axiom and the proof omitted until
later.
Lemma 3.1.9 (Generalized commutativity). Let ∗ be an associative and
commutative binary operation on a set X. Let $a_1, \ldots, a_n$ be any n elements
of X. Then
$$a_1 * \ldots * a_n = a_{i_1} * \ldots * a_{i_n}.$$
Proof. First prove by induction the result that
$$a_1 * \ldots * a_n * b = b * a_1 * \ldots * a_n.$$
Let $a_1, \ldots, a_n, a_{n+1}$ be n + 1 elements. Consider the product $a_{i_1} * \ldots * a_{i_n} * a_{i_{n+1}}$.
Suppose that $a_{n+1} = a_{i_r}$. Then
$$a_{i_1} * \ldots * a_{i_r} * \ldots * a_{i_n} * a_{i_{n+1}} = (a_{i_1} * \ldots * a_{i_n}) * a_{n+1}$$
where the expression in the brackets is a product of some permutation of
the elements $a_1, \ldots, a_n$. We have used here our result above. But by the
induction hypothesis, we may write $a_{i_1} * \ldots * a_{i_n} = a_1 * \ldots * a_n$.
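Generalized commutativity can be illustrated by computing a product in every possible order. This Python sketch uses a hypothetical helper `same_for_every_order` and tries it on addition, multiplication and, for contrast, subtraction.

```python
from functools import reduce
from itertools import permutations
from operator import add, mul


def same_for_every_order(op, elements):
    """Combine the elements with op, from left to right, in every possible
    order and report whether all the orders give the same value."""
    values = {reduce(op, order) for order in permutations(elements)}
    return len(values) == 1


elements = (2, 3, 5, 7)
print(same_for_every_order(add, elements))
print(same_for_every_order(mul, elements))

# Subtraction is neither associative nor commutative, so the order matters.
print(same_for_every_order(lambda x, y: x - y, elements))
```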
3.1.3 Sigma notation
At this point, it is appropriate to introduce some useful notation. Let
$a_1, a_2, \ldots, a_n$ be n numbers. Their sum is $a_1 + a_2 + \ldots + a_n$ and because
of generalized associativity we don’t have to worry about brackets. We now
abbreviate this as
$$\sum_{i=1}^{n} a_i,$$
where Σ is Greek ‘S’ and stands for Sum. The letter i is called a subscript.
The equality i = 1 tells us that we start the value of i at 1. The equality
i = n tells us that we end the value of i at n. Although I have started the
sum at 1, I could, in other circumstances, have started at 0, or any other
appropriate number. This notation is very useful and can be manipulated
using the rules above. If 1 < s < n, then we can write
$$\sum_{i=1}^{n} a_i = \sum_{i=1}^{s} a_i + \sum_{i=s+1}^{n} a_i.$$
If b is any number then
$$b\left(\sum_{i=1}^{n} a_i\right) = \sum_{i=1}^{n} b a_i$$
is the generalized distributivity law that you are asked to prove in the exercises. These uses of sigma-notation shouldn’t cause any problems.
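Sigma notation corresponds closely to summing over a range in a programming language. The Python sketch below checks the splitting rule and the generalized distributivity law. The numbers in the list, and the values of s and b, are invented for illustration.

```python
# a_1, ..., a_n are stored at list positions 0, ..., n - 1.
a = [4, -1, 7, 2, 9, 3]
n = len(a)
s = 2   # any s with 1 < s < n would do
b = 5

# Splitting the sum at s: positions 0..s-1 hold a_1..a_s.
total = sum(a[i] for i in range(n))
split = sum(a[i] for i in range(s)) + sum(a[i] for i in range(s, n))
print(total, split)

# Generalized distributivity: b times the sum equals the sum of the b * a_i.
scaled = sum(b * a[i] for i in range(n))
print(b * total, scaled)
```

The only subtlety is the shift in indexing: the text counts from 1, whereas Python lists count from 0.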
The most complicated use of Σ-notation arises when we have to sum up
what is called an array of numbers $a_{ij}$ where 1 ≤ i ≤ m and 1 ≤ j ≤ n.
This arises in matrix theory, for example. For concreteness, I shall give the
example where m = 3 and n = 4. We can therefore think of the numbers $a_{ij}$
as being arranged in a 3 × 4 array as follows:
$$\begin{array}{cccc}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{array}$$
Observe that the first subscript tells you the row and the second subscript
tells you the column. Thus $a_{23}$ is the number in the second row and the third
column. Now we can add these numbers up in two different ways getting the
same answer in both cases. The first way is to add the numbers up along the
rows. So, we calculate the following sums
$$\sum_{j=1}^{4} a_{1j}, \quad \sum_{j=1}^{4} a_{2j}, \quad \sum_{j=1}^{4} a_{3j}.$$
We then add up these three numbers
$$\sum_{j=1}^{4} a_{1j} + \sum_{j=1}^{4} a_{2j} + \sum_{j=1}^{4} a_{3j} = \sum_{i=1}^{3}\left(\sum_{j=1}^{4} a_{ij}\right).$$
The second way is to add the numbers up along the columns. So, we calculate
the following sums
$$\sum_{i=1}^{3} a_{i1}, \quad \sum_{i=1}^{3} a_{i2}, \quad \sum_{i=1}^{3} a_{i3}, \quad \sum_{i=1}^{3} a_{i4}.$$
We then add up these four numbers
$$\sum_{i=1}^{3} a_{i1} + \sum_{i=1}^{3} a_{i2} + \sum_{i=1}^{3} a_{i3} + \sum_{i=1}^{3} a_{i4} = \sum_{j=1}^{4}\left(\sum_{i=1}^{3} a_{ij}\right).$$
The fact that
$$\sum_{i=1}^{3}\left(\sum_{j=1}^{4} a_{ij}\right) = \sum_{j=1}^{4}\left(\sum_{i=1}^{3} a_{ij}\right)$$
is a consequence of the generalized commutativity law that you are asked to
prove in the exercises. We therefore have in general that
$$\sum_{i=1}^{m}\left(\sum_{j=1}^{n} a_{ij}\right) = \sum_{j=1}^{n}\left(\sum_{i=1}^{m} a_{ij}\right).$$
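Adding up an array by rows and then by columns is easy to try out. In this Python sketch the 3 × 4 array is invented for illustration (the entry in row i and column j is 10i + j), and the names `by_rows` and `by_cols` are hypothetical.

```python
rows, cols = 3, 4

# Entry a_ij is stored at a[i - 1][j - 1]; the values are arbitrary.
a = [[10 * i + j for j in range(1, cols + 1)] for i in range(1, rows + 1)]

# Sum along each row first, then add up the row totals.
by_rows = sum(sum(a[i][j] for j in range(cols)) for i in range(rows))

# Sum down each column first, then add up the column totals.
by_cols = sum(sum(a[i][j] for i in range(rows)) for j in range(cols))

print(by_rows, by_cols)
```

The two nested sums visit exactly the same entries, only in a different order, which is why generalized commutativity guarantees the same total.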
3.1.4 Infinite sums
What I have defined so far are finite sums and form part of algebra. There
are also infinite sums
$$\sum_{i=1}^{\infty} a_i$$
which form part of analysis, the subject that provides the foundations for
calculus. There is one place where we use infinite sums in everyday life, and
that is in the decimal representations of numbers. Thus the fraction $\frac{1}{3}$ can
be written as 0·3333 . . . and this is in fact an infinite sum: it means the
infinite sum
$$\sum_{i=1}^{\infty} \frac{3}{10^i}.$$
But in general infinite sums are problematic. For example, consider the
infinite sum
$$S = \sum_{i=1}^{\infty} (-1)^{i+1}.$$
So, this is just
S = 1 − 1 + 1 − 1 + . . .
What is S? Your first instinct might be to say 0 because
S = (1 − 1) + (1 − 1) + . . .
But it could equally well be 1 calculated as follows
S = 1 + (−1 + 1) + (−1 + 1) + . . .
In fact, it could even be $\frac{1}{2}$ since S + S = 1 and so $S = \frac{1}{2}$. There is clearly
something seriously awry here, and it is that infinite sums have to be handled
very carefully if they are to make sense. Just how is the business of analysis
and won’t be an issue in this book.
Warning! ∞ is not a number. It simply tells us to keep adding on terms
for increasing values of i without end so we never write
$$\frac{3}{10^{\infty}}.$$
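Although an infinite sum cannot be computed term by term, one can look at what happens when only the first n terms are added. The Python sketch below does this for the two sums discussed above. The idea of examining such finite truncations is borrowed from analysis rather than from this section, and the function names are hypothetical.

```python
from fractions import Fraction


def partial_sum_decimal(n):
    """Sum of 3/10^i for i = 1, ..., n, computed exactly."""
    return sum(Fraction(3, 10**i) for i in range(1, n + 1))


def partial_sum_alternating(n):
    """Sum of (-1)^(i+1) for i = 1, ..., n."""
    return sum((-1) ** (i + 1) for i in range(1, n + 1))


# The finite sums 0.3, 0.33, 0.333, ... get steadily closer to 1/3.
for n in (1, 2, 5, 10):
    gap = Fraction(1, 3) - partial_sum_decimal(n)
    print(n, partial_sum_decimal(n), gap)

# The finite sums of 1 - 1 + 1 - 1 + ... never settle down.
print([partial_sum_alternating(n) for n in range(1, 9)])
```

The first family of finite sums closes in on a single value while the second keeps jumping between two values, which gives some feeling for why the second infinite sum is problematic.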
Exercises 3.1
1. Prove the following identities using the axioms introduced.
(a) $(a + b)^2 = a^2 + 2ab + b^2$.
(b) $(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3$.
(c) $a^2 - b^2 = (a + b)(a - b)$.
(d) $(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$.
2. Calculate the following.
(a) $2^3$.
(b) $2^{1/3}$.
(c) $2^{-4}$.
(d) $2^{-3/2}$.
3. Assume that $a_{ij}$ are assigned the following values
$$\begin{array}{llll}
a_{11} = 1 & a_{12} = 2 & a_{13} = 3 & a_{14} = 4 \\
a_{21} = 5 & a_{22} = 6 & a_{23} = 7 & a_{24} = 8 \\
a_{31} = 9 & a_{32} = 10 & a_{33} = 11 & a_{34} = 12
\end{array}$$
Calculate the following sums.
(a) $\sum_{i=1}^{3} a_{i2}$.
(b) $\sum_{j=1}^{4} a_{3j}$.
(c) $\sum_{i=1}^{3} \left(\sum_{j=1}^{4} a_{ij}^2\right)$.
4. Let a, b, c ∈ R. If ab = ac is it true that b = c? Explain.
5. Laws of exponents.
(a) Prove by induction that $a^{m+n} = a^m a^n$. To do this, fix m and then
prove the result by induction on n. Deduce that it holds for all
m.
(b) Prove by induction that $(a^m)^n = a^{mn}$. To do this, fix m and then
prove the result by induction on n. Deduce that it holds for all
m.
6. Prove by induction that the left generalized distributivity law holds
$$a(b_1 + b_2 + b_3 + \ldots + b_n) = ab_1 + ab_2 + ab_3 + \ldots + ab_n,$$
for any n ≥ 2.
3.2 Solving quadratic equations
The previous section might have given the impression that algebraic calculations are routine. In fact, once you pass beyond linear equations, they
usually require good ideas. The first place where a good idea is needed is in
solving quadratic equations. Quadratic equations were solved by the Babylonians and the Egyptians and are dealt with in all school algebra courses. I
have included them here because I want to show you that you don’t have to
remember a formula to solve such equations; what you have to remember is
a method. Let’s recall some definitions. An expression of the form
$$ax^2 + bx + c$$
where a, b, c are numbers and a ≠ 0 is called a quadratic polynomial or a
polynomial of degree 2. The numbers a, b, c are called the coefficients of the
quadratic. A quadratic where a = 1 is said to be monic. A number r such
that
$$ar^2 + br + c = 0$$
is called a root of the polynomial. The problem of finding all the roots of a
quadratic is called solving the quadratic. Usually this problem is stated in
the form: ‘solve the quadratic equation $ax^2 + bx + c = 0$’. Equation because
we have set the polynomial equal to zero. I shall now show you how to solve
a quadratic equation without having to remember a formula. Observe first
that if $ax^2 + bx + c = 0$ then
$$x^2 + \frac{b}{a}x + \frac{c}{a} = 0.$$
Thus it is enough to find the roots of monic quadratics. We shall solve this
equation by trying to do the following: write $x^2 + \frac{b}{a}x$ as a perfect square plus
a number. This will turn out to be the crux of solving the quadratic. We
shall illustrate our construction by using some diagrams. First, we represent
geometrically the expression $x^2 + \frac{b}{a}x$.
[Figure: a square with sides x alongside a rectangle with sides x and b/a.]
Now cut the red rectangle into two pieces along the dotted line and rearrange
them as shown below.
[Figure: the rearranged pieces, with sides labelled x and b/2a, and a small dotted square in the corner.]
It is now geometrically obvious that if we add in the small dotted square, we
get a new bigger square. This explains why the procedure is called completing
the square. We now express in algebraic terms what these diagrams suggest.
$$x^2 + \frac{b}{a}x = x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} - \frac{b^2}{4a^2} = \left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2}.$$
We therefore have that
$$x^2 + \frac{b}{a}x = \left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2}.$$
Look carefully at what we have done here: we have rewritten the lefthand
side as a perfect square (the first term on the righthand side) plus a
number (the second term on the righthand side). It follows that
$$x^2 + \frac{b}{a}x + \frac{c}{a} = \left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2} + \frac{c}{a} = \left(x + \frac{b}{2a}\right)^2 + \frac{4ac - b^2}{4a^2}.$$
Setting the last expression equal to zero and rearranging, we get
$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}.$$
Now take square roots of both sides, remembering that a non-zero number
has two square roots:
$$x + \frac{b}{2a} = \pm\sqrt{\frac{b^2 - 4ac}{4a^2}}$$
which of course simplifies to
$$x + \frac{b}{2a} = \pm\frac{\sqrt{b^2 - 4ac}}{2a}.$$
Thus
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},$$
the usual formula for finding the roots of a quadratic.
Example 3.2.1. Solve the quadratic equation
$$2x^2 - 5x + 1 = 0$$
by completing the square. Divide through by 2 to make the quadratic monic
giving
$$x^2 - \frac{5}{2}x + \frac{1}{2} = 0.$$
We now want to write
$$x^2 - \frac{5}{2}x$$
as a perfect square plus a number. We get
$$x^2 - \frac{5}{2}x = \left(x - \frac{5}{4}\right)^2 - \frac{25}{16}.$$
Thus our quadratic becomes
$$\left(x - \frac{5}{4}\right)^2 - \frac{25}{16} + \frac{1}{2} = 0.$$
Rearranging and taking roots gives us
$$x = \frac{5}{4} \pm \frac{\sqrt{17}}{4} = \frac{5 \pm \sqrt{17}}{4}.$$
We now check our answer by substituting each of our two roots back into
the original quadratic and ensuring that we get zero in both cases.
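The procedure of this section can also be tried out on a computer. The following is only a minimal sketch in Python: the function names `complete_the_square` and `real_roots` are invented for this illustration, exact fractions are used for the completed square, and floating-point numbers are used for the square root, so the final check gives a value close to zero rather than exactly zero.

```python
from fractions import Fraction
import math

def complete_the_square(a, b, c):
    """Return (p, q) such that x^2 + (b/a)x + c/a = (x + p)^2 + q."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    p = b / (2 * a)          # half the coefficient of x in the monic quadratic
    q = c / a - p * p        # the number left over after completing the square
    return p, q

def real_roots(a, b, c):
    """Real roots of ax^2 + bx + c = 0, found from (x + p)^2 = -q."""
    p, q = complete_the_square(a, b, c)
    if q > 0:
        return []            # a square of a real number cannot be negative
    s = math.sqrt(float(-q))
    return sorted({float(-p) - s, float(-p) + s})

p, q = complete_the_square(2, -5, 1)
print(p, q)                  # -5/4 -17/16
for x in real_roots(2, -5, 1):
    print(x, 2 * x * x - 5 * x + 1)   # the second value should be close to 0
```

The pair printed first agrees with the completed square found in Example 3.2.1, and the loop carries out the check by substitution.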
For the quadratic equation
$$ax^2 + bx + c = 0$$
the number $D = b^2 - 4ac$, called the discriminant of the quadratic, plays an
important role.
• If D > 0 then the quadratic equation has two distinct real solutions.
• If D = 0 then the quadratic equation has one real root repeated. In
this case, the monic quadratic $x^2 + \frac{b}{a}x + \frac{c}{a}$ is the perfect square $\left(x + \frac{b}{2a}\right)^2$.
• If D < 0 then we shall see that the quadratic equation has two complex
roots which are complex conjugate to each other. This is called the
irreducible case.
If we put $y = ax^2 + bx + c$ then we may draw the graph of this equation.
The roots of the original quadratic therefore correspond to the points where
this graph crosses the x-axis. The diagrams below illustrate the three cases
that can arise.
[Figures: graphs of $y = ax^2 + bx + c$ in the three cases D > 0, D = 0 and D < 0.]
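The three cases can be told apart mechanically. Here is a small Python sketch, with function names chosen only for this illustration, that computes the discriminant and reports which case arises. The three quadratics used are my own examples, not those of the exercises.

```python
def discriminant(a, b, c):
    """The number D = b^2 - 4ac attached to ax^2 + bx + c."""
    return b * b - 4 * a * c

def classify(a, b, c):
    """Say which of the three cases the quadratic ax^2 + bx + c falls into."""
    d = discriminant(a, b, c)
    if d > 0:
        return "two distinct real roots"
    if d == 0:
        return "one repeated real root"
    return "two complex conjugate roots"

print(discriminant(1, -3, 2), classify(1, -3, 2))   # 1 two distinct real roots
print(discriminant(1, 2, 1), classify(1, 2, 1))     # 0 one repeated real root
print(discriminant(1, 1, 1), classify(1, 1, 1))     # -3 two complex conjugate roots
```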
Exercises 3.2
1. Calculate the discriminants of the following quadratics and so determine whether they have two distinct roots, or repeated roots, or no
real roots.
(a) $x^2 + 6x + 5$.
(b) $x^2 - 4x + 4$.
(c) $x^2 - 2x + 5$.
2. Solve the following quadratic equations by completing the square. Check
your answers.
(a) $x^2 + 10x + 16 = 0$.
(b) $x^2 + 4x + 2 = 0$.
(c) $2x^2 - x - 7 = 0$.
3. I am thinking of two numbers x and y. I tell you their sum a and their
product b. What are x and y in terms of a and b?
4. Let $p(x) = x^2 + bx + c$ be a monic quadratic with roots $x_1$ and $x_2$.
Express the discriminant of p(x) in terms of $x_1$ and $x_2$.
5. This question is an interpretation of part of Book X of Euclid. We shall
be interested in numbers of the form $a + \sqrt{b}$ where a and b are rational
and b > 0 where $\sqrt{b}$ is irrational. (Remember that irrational means not rational.)
(a) If $\sqrt{a} = b + \sqrt{c}$ where $\sqrt{c}$ is irrational then b = 0.
(b) If $a + \sqrt{b} = c + \sqrt{d}$ where a and c are rational and $\sqrt{b}$ and $\sqrt{d}$ are
irrational then a = c and b = d.
(c) Prove that the square roots of $a + \sqrt{b}$ have the form $\pm(\sqrt{x} + \sqrt{y})$.
3.3 Order
In addition to algebraic operations, the real numbers are also ordered: we
can always say of two real numbers whether they are equal or whether one of
them is bigger than the other. I shall write down first the axioms for order
that hold both for rational and real numbers. The following notation is
important. If a ≤ b and a ≠ b then we write a < b and say that a is strictly
less than b.
Axioms for order
(O1) For every element a we have a ≤ a.
(O2) If a ≤ b and b ≤ a then a = b.
(O3) If a ≤ b and b ≤ c then a ≤ c.
(O4) Given any two elements a and b then either a ≤ b or b ≤ a or a = b.
If a > 0 then we say that it is positive and if a < 0 we say it is negative.
(O5) If a ≤ b and c ≤ d then a + c ≤ b + d.
(O6) If a ≤ b and c is positive then ac ≤ bc.
The only axiom that you really have to watch is (O6). Here is an example
of a proof using these axioms.
Example 3.3.1. We prove that a ≤ b if, and only if, b − a is positive. Since
this statement involves an ‘if, and only if’ there are, as usual, two statements
to be proved. Suppose first that a ≤ b. By axiom (O5), we may add −a to
both sides to get a + (−a) ≤ b + (−a). But a + (−a) = 0 and b + (−a) = b − a,
by definition. It follows that 0 ≤ b − a and so b − a is positive. Now we prove
the converse. Suppose that b − a is positive. Then by definition 0 ≤ b − a.
Also by definition, b − a = b + (−a). Thus 0 ≤ b + (−a). By axiom (O5), we
may add a to both sides to get 0 + a ≤ (b + (−a)) + a. But 0 + a = a and
(b + (−a)) + a quickly simplifies to b. We have therefore proved that a ≤ b,
as required.
Exercises 3.3
1. Prove that between any two distinct rational numbers there is another
rational number.
2. Prove the following using the axioms.
(a) If a ≤ b then −b ≤ −a.
(b) $a^2$ is positive for all a ≠ 0.
(c) If 0 < a < b then $0 < b^{-1} < a^{-1}$.
3.4 The real numbers
The axioms I have introduced so far apply equally well to both the rational
numbers Q and the real numbers R. But we have seen that although Q ⊆ R
the two sets are not equal because we have proved that $\sqrt{2} \notin Q$. In fact, we
shall see later that there are many more irrational numbers than there are
rational numbers. In this section, I shall explain the fundamental difference
between rationals and reals. This material will not be needed in the rest of
this book; instead, its rôle is to connect with the foundations of calculus, that
is, with analysis.
It is convenient to write K to mean either Q or R in what follows because I
want to make the same definitions for both sets. A non-empty subset A ⊆ K
is said to be bounded above if there is some number b ∈ K so that for all
a ∈ A we have that a ≤ b. For example, the set $A = \{2^n : n \geq 0\}$ is not
bounded above since its elements get bigger and bigger without limit. On
the other hand, the set $B = \{\frac{1}{2^n} : n \geq 0\}$ is bounded above, for example
by 1. A non-empty subset A as above is said to have a least upper bound if
you can find a number a ∈ K with the following two properties: first of all,
a must be an upper bound for A and second of all if b is any upper bound for
A then a ≤ b. We shall now apply these definitions to a result we obtained
earlier.
Let
$$A = \{a : a \in Q \text{ and } a^2 \leq 2\}$$
and let
$$B = \{a : a \in R \text{ and } a^2 \leq 2\}.$$
Then A ⊆ Q and B ⊆ R. Both sets are bounded above: the number $1\frac{1}{2}$, for
example, works in both cases. However, I shall prove that the subset A does
not have a least upper bound, whereas the subset B does.
Let’s consider subset A first. Suppose that r were a least upper bound.
I claim that $r^2$ would have to equal 2 which is impossible because we have
proved that $\sqrt{2}$ is irrational.
Suppose first that $r^2 < 2$. Then I claim there is a rational number $r_1$ such
that $r < r_1$ and $r_1^2 < 2$. Choose any rational number h such that 0 < h < 1
and
$$h < \frac{2 - r^2}{2r + 1}.$$
Put $r_1 = r + h$. By construction $r_1 > r$. We calculate $r_1^2$ as follows
$$r_1^2 = r^2 + 2rh + h^2 = r^2 + (2r + h)h < r^2 + (2r + 1)h < r^2 + 2 - r^2 = 2.$$
Thus $r_1^2 < 2$ as claimed, and so $r_1$ belongs to A. But this contradicts the fact that r is an upper
bound of the set A.
Suppose now that $2 < r^2$. Then I claim that I can find a rational number
$r_1$ such that $r_1 < r$ and $2 < r_1^2$. Put $h = \frac{r^2 - 2}{2r}$ and define $r_1 = r - h$. Clearly,
$0 < r_1 < r$. We calculate $r_1^2$ as follows
$$r_1^2 = r^2 - 2rh + h^2 = r^2 - (r^2 - 2) + h^2 > r^2 - (r^2 - 2) = 2.$$
Thus $r_1$ is an upper bound of A that is smaller than r. But this contradicts the fact that r is supposed to be a least upper bound.
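The two constructions in this argument can be tested with exact rational arithmetic. The following Python sketch is only an illustration: the names `step_up` and `step_down` are invented here, and the particular choice of h in `step_up` is just one of many that satisfy the two conditions in the text.

```python
from fractions import Fraction

def step_up(r):
    """Given rational r >= 1 with r^2 < 2, return r1 > r with r1^2 < 2."""
    bound = (2 - r * r) / (2 * r + 1)
    h = min(Fraction(1, 2), bound / 2)   # ensures 0 < h < 1 and h < bound
    return r + h

def step_down(r):
    """Given rational r > 0 with r^2 > 2, return r1 with 0 < r1 < r and r1^2 > 2."""
    h = (r * r - 2) / (2 * r)
    return r - h

r = Fraction(7, 5)                       # 49/25 < 2
r1 = step_up(r)
print(r1, r1 > r, r1 * r1 < 2)

s = Fraction(3, 2)                       # 9/4 > 2
s1 = step_down(s)
print(s1, 0 < s1 < s, s1 * s1 > 2)       # 17/12 True True
```

Applying either function repeatedly never terminates at a rational whose square is exactly 2, which is the point of the argument.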
We have therefore proved that if r is a least upper bound of A then
$r = \sqrt{2}$. But this is impossible because we have proved that $\sqrt{2}$ is irrational.
Thus the set A does not have a least upper bound in the rationals. However,
by essentially the same reasoning the set B does have a least upper bound
in the reals: the number $\sqrt{2}$. This motivates the following definition. It is
this axiom that is needed to develop calculus properly.
The completeness axiom for R
Every non-empty subset of the reals that is bounded above has a least
upper bound.
The Peano Axioms
Set theory is supposed to be a framework in which all of mathematics
can take place. Let me briefly sketch out how we can construct the
real numbers using set theory. The starting point is the Peano axioms
studied by G. Peano (1858–1932). These deal with a set P and an
operation on this set called the successor function which for each n ∈ P
produces a unique element $n^+$. The following four axioms should hold:
(P1) There is a distinguished element of P that we denote by 0.
(P2) There is no element n ∈ P such that $n^+ = 0$.
(P3) If m, n ∈ P and $m^+ = n^+$ then m = n.
(P4) If X ⊆ P is such that 0 ∈ X and, whenever n ∈ X, also $n^+ \in X$, then
X = P.
By using ideas from set theory, one shows that P is essentially the set
of natural numbers together with its operations of addition and multiplication.
The natural numbers are deficient in that it is not always possible
to solve equations of the form a + x = b because of the lack of negative
numbers. However, we can use set theory to construct Z from N by
using ordered pairs. The idea is to regard (a, b) as meaning a − b.
However, there are many names for the same negative number so we
should have (0, 1) and (2, 3) and (3, 4) all signifying the same number:
namely, −1. To make this work, one uses another idea from set theory,
that of equivalence relations which we shall meet later. This gives rise
to the set Z. Again using ideas from set theory, the usual operations
can be constructed on Z.
But the integers are deficient because we cannot always solve equations of the form ax + b = 0 because of the lack of rational numbers.
To construct them we use ordered pairs again. This time (a, b), where
b ≠ 0, is interpreted as $\frac{a}{b}$. But again we have the problem of multiple
names for what should be the same number. Thus (1, 2) should equal
(−1, −2) should equal (2, 4) and so forth. Once again this problem is
solved by using an equivalence relation, and once again, the set which
arises, which is denoted by Q, is endowed with the usual operations.
As we have seen, the rationals are deficient in not containing numbers
like $\sqrt{2}$. The intuitive idea behind the construction of the reals from the
rationals is that we want to construct R as all the numbers that can
be approximated arbitrarily closely by rational numbers. To do this, we form
the set of all subsets X of Q which have the following characteristics:
X ≠ ∅, X ≠ Q, if x ∈ X and y ≤ x then y ∈ X, and X doesn’t have
a biggest element. These subsets are called Dedekind cuts and should
be regarded as defining the real number r so that X consists of all the
rational numbers less than r.
Chapter 4
Number theory
Number theory is one of the oldest branches of mathematics and deals,
mainly, with the properties of the integers, the simplest kinds of numbers. It
is a vast subject, and so this chapter can only be an introduction. The main
result proved is that every natural number greater than one can be written
as a product of powers of primes, a result known as the fundamental theorem
of arithmetic. This shows that the primes are the building blocks, or atoms,
from which all natural numbers are constructed. The primes are still the
subject of intensive research and the source of many unanswered questions.
It is ironic that the numbers we learn about first as children are the source
of some of mathematics’ most difficult and interesting questions. The tool
that enables this chapter to work is the remainder theorem so that is where
we shall start.
4.1 The remainder theorem
We begin by stating a basic result that you may assume as an axiom but
which I shall also set as a proof in one of the exercises.
Lemma 4.1.1 (Remainder Theorem). Let a and b be integers where b > 0.
Then there are unique integers q and r such that
a = bq + r
where 0 ≤ r < b.
The number q is called the quotient and the number r is called the remainder. For example, if we consider the pair of natural numbers 14 and 3
then
14 = 3 · 4 + 2
where 4 is the quotient and 2 is the remainder. Your first reaction to this
result is that it is obvious and you might conclude from this that it is therefore uninteresting. But this would be wrong. It is certainly not hard to
understand but despite that it is important. The reason is that whenever we
have a question that involves divisibility, it is very likely going to require the
use of this result.
Example 4.1.2. From the remainder theorem, we know that every natural
number n can be written as n = 10q + r where 0 ≤ r ≤ 9. The integer r is
nothing other than the units digit in the usual base 10 representation of n.
Thus, for example, 42 = 10 × 4 + 2. Similarly, it is the remainder theorem
that tells us that odd numbers are precisely those that leave remainder 1
when divided by 2.
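Quotients and remainders are easy to compute by machine. The sketch below uses Python's built-in `divmod`, which for a positive divisor b returns a remainder r with 0 ≤ r < b, exactly as in the remainder theorem; the wrapper name `quotient_remainder` is invented for this illustration.

```python
def quotient_remainder(a, b):
    """Return (q, r) with a = b*q + r and 0 <= r < b, for integers a and b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    q, r = divmod(a, b)
    assert a == b * q + r and 0 <= r < b   # the statement of the remainder theorem
    return q, r

print(quotient_remainder(14, 3))    # (4, 2)
print(quotient_remainder(42, 10))   # (4, 2): the remainder is the units digit
print(quotient_remainder(-14, 3))   # (-5, 1): the remainder is still non-negative
```

The last line shows that the theorem also covers negative a: we have −14 = 3 · (−5) + 1.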
Let a and b be integers where a ≠ 0. We say that a divides b or that b
is divisible by a if there is a q such that b = aq. In other words, there is no
remainder. We also say that a is a divisor or factor of b. We write a | b to
mean the same thing as ‘a divides b’. It is very important to remember that
a | b does not mean the same thing as $\frac{a}{b}$. The latter is a number, the former
is a statement about two numbers.
As a very simple example of the remainder theorem, we shall look at how
we write numbers down.
I don’t think our hunter-gatherer ancestors worried too much about writing numbers down because there wasn’t any need: they didn’t have to fill in
tax-returns and so didn’t need accountants. However, organizing cities does
need accountants and so ways had to be found of writing numbers down.
The simplest way of doing this is to use a mark like |, called a tally, for each
thing being counted. So
||||||||||
means 10 things. This system has advantages and disadvantages. The advantage is that you don’t have to go on a training course to learn it. The
disadvantage is that even quite small numbers need a lot of space like
||||||||||||||||||||||||||||||||||||||
It’s also hard to tell whether
|||||||||||||||||||||||||||||||||||||||
is the same number or not. (It’s not.) It’s inevitable that people will introduce abbreviations to make the system easier to use. Perhaps it was in
this way that the next development occurred. Both the ancient Egyptians
and Romans used similar systems but I’ll describe the Roman system because
it involves letters rather than pictures. First, you have a list of basic symbols:
number   1   5   10   50   100   500   1000
symbol   I   V   X    L    C     D     M
There are more symbols for bigger numbers. Numbers are then written
according to the additive principle. Thus MMVIIII is 2009. Incidentally, I
understand that the custom of also using a subtractive principle so that, for
example, IX means 9 rather than using VIIII, is a more modern innovation.
This system is clearly a great improvement on the tally-system. Even quite
big numbers are written compactly and it is easy to compare numbers. On
the other hand, there is more to learn. The other disadvantage is that we need
separate symbols for different powers of 10 and their multiples by 5. This
was probably not too inconvenient in the ancient world where it is likely that
the numbers needed on a day-to-day basis were never going to be that big.
A common criticism of this system is that it is hard to do multiplication in.
However, that turns out to be a non-problem because, like us, the Romans
used pocket calculators or, more accurately, a device called an abacus that
could easily be carried under a toga. The real evidence for the usefulness of
this system of writing numbers is that it survived for hundreds and hundreds
of years.
The system used throughout the world today is quite different and is
called the positional number system. It seems to have been in place by the
ninth century in India but it was hundreds of years in development and the
result of ideas from many different cultures: the invention of zero on its own
is one of the great steps in human intellectual development. The genius of
the system is that it requires only 10 symbols
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
and every natural number can be written using a sequence of these symbols.
The trick to making the system work is that we use the position on the page
of a symbol to tell us what number it means. Thus 2009 means
10^3   10^2   10^1   10^0
 2      0      0      9
In other words
$$2 \times 10^3 + 0 \times 10^2 + 0 \times 10^1 + 9 \times 10^0.$$
Notice the important rôle played by the symbol 0 which makes it clear to
which column a symbol belongs; otherwise we couldn’t tell 29 from 209 from
2009. The disadvantage of this system is that you do have to go on a course
to learn it because it is a highly sophisticated way of writing numbers. On the
other hand, it has the enormous advantage that any number can be written
down in a compact way. Once the basic system had been accepted it could
be adapted to deal not only with positive whole numbers but also negative
whole numbers, using the symbol −, and also fractions with the introduction
of the decimal point. By the end of the sixteenth century, the full decimal
system was in place.
Notation warning! In the UK, we use a raised decimal point like 0 · 123
and not a comma. Also we generally write the number 1 without a long
hook at the top. If you do write it like that there is a danger that people will
confuse it with the number 7 which is not always written in the UK with a
line through it.
We shall now look in more detail at the way in which numbers can be
written down using a positional notation. In order not to be biased, we shall
not just work in base 10 but show how any base can be used. Our main tool
is the remainder theorem.
Let’s see how to represent numbers in base d where d ≥ 2. If d ≤ 10 then
we represent numbers by sequences of symbols taken from the set
$$Z_d = \{0, 1, 2, 3, \ldots, d - 1\}$$
but if d > 10 then we need new symbols for 10, 11, 12 and so forth. It’s
convenient to use A,B,C, . . .. For example, if we want to write numbers in
base 12 we use the set of symbols
{0, 1, . . . , 9, A, B}
whereas if we work in base 16 we use the set of symbols
{0, 1, . . . , 9, A, B, C, D, E, F }.
If x is a sequence of symbols then we write $x_d$ to make it clear that we are
to interpret this sequence as a number in base d. Thus $BAD_{16}$ is a number
in base 16.
The symbols in a sequence $x_d$, reading from right to left, tell us the contribution each power of d such as $d^0$, $d^1$, $d^2$, etc. makes to the number the
sequence represents. Here are some examples.
Examples 4.1.3. Converting from base d to base 10.
1. $11A9_{12}$ is a number in base 12. This represents the following number
in base 10:
$$1 \times 12^3 + 1 \times 12^2 + A \times 12^1 + 9 \times 12^0,$$
which is just the number
$$12^3 + 12^2 + 10 \times 12 + 9 = 2001.$$
2. $BAD_{16}$ represents a number in base 16. This represents the following
number in base 10:
$$B \times 16^2 + A \times 16^1 + D \times 16^0,$$
which is just the number
$$11 \times 16^2 + 10 \times 16 + 13 = 2989.$$
3. $5556_7$ represents a number in base 7. This represents the following
number in base 10:
$$5 \times 7^3 + 5 \times 7^2 + 5 \times 7^1 + 6 \times 7^0 = 2001.$$
These examples show how easy it is to convert from base d to base 10.
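The conversion from base d to base 10 can be written as a short program. This is a sketch only: the names `DIGITS` and `from_base` are invented for the illustration, and instead of adding up separate powers of d the loop multiplies the running total by d before adding each new digit, which gives the same sum.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # symbols for the values 0 to 35

def from_base(symbols, d):
    """Interpret the string `symbols` as a number written in base d (2 <= d <= 36)."""
    n = 0
    for ch in symbols:
        value = DIGITS.index(ch)          # raises ValueError for an unknown symbol
        if value >= d:
            raise ValueError(f"{ch!r} is not a digit in base {d}")
        n = n * d + value                 # shift one place left, then add the digit
    return n

print(from_base("11A9", 12))   # 2001
print(from_base("BAD", 16))    # 2989
print(from_base("5556", 7))    # 2001
```

Python's built-in `int("BAD", 16)` performs the same conversion and can be used as a check.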
There are two ways to convert from base 10 to base d.
1. The first runs in outline as follows. Let n be the number in base 10
that we wish to write in base d. Look for the largest power $d^m$ such
that $d^m \leq n$, and then the largest $a < d$ such that $a d^m \leq n$. Then repeat for $n - a d^m$. Continuing in
this way, we write n as a sum of multiples of powers of d and so we can
write n in base d.
2. The second makes use of the remainder theorem. The idea behind this
method is as follows. Let
$$n = a_m \ldots a_1 a_0$$
in base d. We may think of this as
$$n = (a_m \ldots a_1)d + a_0.$$
It follows that $a_0$ is the remainder when n is divided by d, and the
quotient is $n' = a_m \ldots a_1$. Thus we can generate the digits of n in base
d from right to left by repeatedly finding the next quotient and next
remainder by dividing the current quotient by d; the process starts with
our input number as first quotient.
Examples 4.1.4. Converting from base 10 to base d.
1. Write 2001 in base 7. I’ll solve this question in two different ways: the
long but direct route and then the short but more thought-provoking
route.
We see that $7^4 > 2001$. Thus we divide 2001 by $7^3$. This goes 5 times
plus a remainder. Thus $2001 = 5 \times 7^3 + 286$. We now repeat with
286. We divide it by $7^2$. It goes 5 times again plus a remainder. Thus
$286 = 5 \times 7^2 + 41$. We now repeat with 41. We get that $41 = 5 \times 7 + 6$.
We have therefore shown that
$$2001 = 5 \times 7^3 + 5 \times 7^2 + 5 \times 7 + 6.$$
Thus 2001 in base 7 is just 5556.
Now for the short method.
        quotient   remainder
        2001
  7     285        6
  7     40         5
  7     5          5
  7     0          5
Thus 2001 in base 7 is:
5556.
2. Write 2001 in base 12.
        quotient   remainder
        2001
  12    166        9
  12    13         10 = A
  12    1          1
  12    0          1
Thus 2001 in base 12 is:
11A9.
3. Write 2001 in base 2.
        quotient   remainder
        2001
  2     1000       1
  2     500        0
  2     250        0
  2     125        0
  2     62         1
  2     31         0
  2     15         1
  2     7          1
  2     3          1
  2     1          1
  2     0          1
Thus 2001 in base 2 is (reading from bottom to top):
11111010001.
When converting from one base to another it is always wise to check
your calculations by converting back.
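The short method is easy to mechanize. The following Python sketch, in which the names `DIGITS` and `to_base` are invented for the illustration, repeatedly divides by d and collects the remainders; since the remainders appear from right to left, the list is reversed at the end. The check by converting back uses Python's built-in `int(string, base)`.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"   # symbols for the values 0 to 35

def to_base(n, d):
    """Write the natural number n in base d (2 <= d <= 36) by repeated division."""
    if n == 0:
        return "0"
    symbols = []
    while n > 0:
        n, r = divmod(n, d)        # next quotient and next remainder
        symbols.append(DIGITS[r])  # remainders arrive from right to left
    return "".join(reversed(symbols))

print(to_base(2001, 7))    # 5556
print(to_base(2001, 12))   # 11A9
print(to_base(2001, 2))    # 11111010001

# Check by converting back.
for d in (7, 12, 2):
    assert int(to_base(2001, d), d) == 2001
```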
Number bases have some special terminology associated with them which
you might encounter:
Base 2 binary.
Base 8 octal.
Base 10 decimal.
Base 12 duodecimal.
Base 16 hexadecimal.
Base 20 vigesimal.
Base 60 sexagesimal.
Binary, octal and hexadecimal occur in computer science; there are remnants
of a vigesimal system in French and the older Welsh system of counting; base
60 was used by astronomers in ancient Mesopotamia and is still the basis of
time measurement (60 seconds = 1 minute, and 60 minutes = 1 hour) and
angle measurement.
As a final example of the importance of the remainder theorem, we look
at how we may write proper fractions as decimals. To see what’s involved,
let’s calculate some decimal fractions.
Examples 4.1.5.
1. $\frac{1}{20} = 0 \cdot 05$. This fraction has a finite decimal representation.
2. $\frac{1}{7} = 0 \cdot 142857142857142857142857142857\ldots$. This fraction has an
infinite decimal representation, which consists of the same sequence of
numbers repeated. We abbreviate this decimal to $0 \cdot \overline{142857}$.
3. $\frac{37}{84} = 0 \cdot 44\overline{047619}$. This fraction has an infinite decimal representation,
which consists of a non-repeating part followed by a part which repeats.
I shall characterize those fractions which have a finite decimal representation once we have proved our main theorem. I want to focus here on the
last two cases. Case (2) is said to be a purely periodic decimal whereas case
(3), which is more general, is said to be ultimately periodic.
Proposition 4.1.6. An infinite decimal fraction represents a rational number if and only if it is ultimately periodic.
Proof. The key is in the remainders. Consider the ultimately periodic decimal number
$$r = 0 \cdot a_1 \ldots a_s \overline{b_1 \ldots b_t}.$$
We shall prove that r is rational. Observe that
$$10^s r = a_1 \ldots a_s \cdot \overline{b_1 \ldots b_t}$$
and
$$10^{s+t} r = a_1 \ldots a_s b_1 \ldots b_t \cdot \overline{b_1 \ldots b_t}.$$
From which we get that
$$10^{s+t} r - 10^s r = a_1 \ldots a_s b_1 \ldots b_t - a_1 \ldots a_s$$
where the righthand side is the decimal form of some integer that we shall
call a. It follows that
$$r = \frac{a}{10^{s+t} - 10^s}$$
is a rational number.
The proof of the converse is based on the method we use to compute
the decimal expansion of $\frac{m}{n}$. We carry out repeated divisions by n and at
each step of the computation we use the remainder obtained to calculate
the next digit. But there are only a finite number of possible remainders
and our expansion is assumed infinite. Thus at some point there must be
repetition.
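The role of the remainders in the second half of the proof can be seen in a short program. This is a sketch under the assumption that the input is a proper fraction m/n with 0 ≤ m < n; the function name `decimal_expansion` is invented for the illustration. It carries out the usual long division, records the position at which each remainder first occurs, and stops as soon as a remainder is zero or is seen a second time.

```python
def decimal_expansion(m, n):
    """Return (prefix, period) for the proper fraction m/n, where 0 <= m < n.

    `prefix` holds the non-repeating digits after the decimal point and
    `period` the repeating block (empty if the expansion is finite).
    """
    first_seen = {}            # remainder -> position of the digit it produces
    digits = []
    r = m
    while r != 0 and r not in first_seen:
        first_seen[r] = len(digits)
        q, r = divmod(10 * r, n)   # next digit and next remainder
        digits.append(str(q))
    if r == 0:
        return "".join(digits), ""
    start = first_seen[r]      # the expansion repeats from this position on
    return "".join(digits[:start]), "".join(digits[start:])

print(decimal_expansion(1, 20))    # ('05', '')
print(decimal_expansion(1, 7))     # ('', '142857')
print(decimal_expansion(37, 84))   # ('44', '047619')
```

Since every remainder lies between 0 and n − 1, the loop runs at most n times, which is the finiteness argument used in the proof.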
Example 4.1.7. We shall write the ultimately periodic decimal $0 \cdot 9\overline{4}$ as a
proper fraction in its lowest terms. Put $r = 0 \cdot 9\overline{4}$. Then
• $r = 0 \cdot 9\overline{4}$.
• 10r = 9.444 . . .
• 100r = 94.444 . . ..
Thus 100r − 10r = 94 − 9 = 85 and so $r = \frac{85}{90}$. We can simplify this to $r = \frac{17}{18}$.
We can now easily check that this is correct.
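The first half of the proof of Proposition 4.1.6 is itself a recipe, and the following Python sketch follows it. The function name `periodic_to_fraction` and the convention of passing the non-repeating digits and the repeating block as two strings are choices made for this illustration; `Fraction` reduces the answer to lowest terms automatically.

```python
from fractions import Fraction

def periodic_to_fraction(prefix, period):
    """Fraction equal to 0 . prefix period period period ...

    `prefix` is the string of s non-repeating digits (possibly empty) and
    `period` is the non-empty string of t repeating digits.
    """
    if not period:
        raise ValueError("the repeating block must not be empty")
    s, t = len(prefix), len(period)
    # a = (digits a1...as b1...bt read as an integer) - (digits a1...as)
    a = int(prefix + period) - (int(prefix) if prefix else 0)
    return Fraction(a, 10 ** (s + t) - 10 ** s)

print(periodic_to_fraction("9", "4"))          # 17/18
print(periodic_to_fraction("", "142857"))      # 1/7
print(periodic_to_fraction("44", "047619"))    # 37/84
```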
Exercises 4.1
1. Find the quotients and remainders for each of the following pairs of
numbers. Divide the smaller into the larger.
(a) 30 and 6.
(b) 100 and 24.
(c) 364 and 12.
2. Write the number 2009 in
(a) Base 5.
(b) Base 12.
(c) Base 16.
3. Write the following numbers in base 10.
(a) $DAB_{16}$.
(b) $ABBA_{12}$.
(c) $44332211_5$.
4. Write the following decimals as fractions in their lowest terms.
(a) 0 · 534.
(b) 0 · 2106.
(c) 0 · 076923.
5. Prove the following properties of the division relation on Z.
(a) If a ≠ 0 then a | a.
(b) If a | b and b | a then a = ±b.
(c) If a | b and b | c then a | c.
(d) If a | b and a | c then a | (b + c).
6. This question develops a proof of the remainder theorem. Let a and
b be integers with b > 0. Then there exist a unique pair of integers q
and r such that a = qb + r where 0 ≤ r < b.
(a) Let
X = {a − nb : n ∈ Z}.
Show that this set contains non-negative elements.
(b) Let $X^+$ be the subset of X consisting of non-negative elements.
This subset is non-empty by the first step. Use the well-ordering
principle to deduce that this set contains a minimum element r.
Thus r = a − qb ≥ 0 for some q ∈ Z.
(c) Show that if r ≥ b then $X^+$ in fact contains a smaller element,
which is a contradiction.
(d) We therefore have that a = bq + r where 0 ≤ r < b. It remains
to prove that q and r are unique with these properties. Assume
therefore that $a = bq' + r'$ where $0 \leq r' < b$. Deduce that $q = q'$
and $r = r'$.
4.2 Greatest common divisors
Let a, b ∈ N. A number d which divides both a and b is called a common
divisor of a and b. The largest number which divides both a and b is called
the greatest common divisor of a and b and is denoted by gcd(a, b). A pair
of natural numbers a and b is said to be coprime if gcd(a, b) = 1. For us
gcd(0, 0) is undefined but if a ≠ 0 then gcd(a, 0) = a.
Example 4.2.1. Consider the numbers 12 and 16. The set of divisors of
12 is {1, 2, 3, 4, 6, 12}. The set of divisors of 16 is {1, 2, 4, 8, 16}. The set of
common divisors is the set of numbers that belong to both of these two sets:
namely, {1, 2, 4}. The greatest common divisor of 12 and 16 is therefore 4.
Thus gcd(12, 16) = 4.
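The method of this example translates directly into a program, although, as a way of computing, it is only sensible for small numbers. In the following Python sketch the names `divisors` and `gcd_by_listing` are invented for the illustration, and both arguments are assumed to be non-zero natural numbers.

```python
def divisors(n):
    """The set of natural numbers that divide n, for n >= 1."""
    return {d for d in range(1, n + 1) if n % d == 0}

def gcd_by_listing(a, b):
    """Greatest common divisor found by intersecting the two sets of divisors."""
    common = divisors(a) & divisors(b)   # the set of common divisors
    return max(common)

print(sorted(divisors(12)))                  # [1, 2, 3, 4, 6, 12]
print(sorted(divisors(16)))                  # [1, 2, 4, 8, 16]
print(sorted(divisors(12) & divisors(16)))   # [1, 2, 4]
print(gcd_by_listing(12, 16))                # 4
```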
One application of greatest common divisors is in simplifying fractions.
For example, the fraction $\frac{12}{16}$ is equal to the fraction $\frac{3}{4}$ because we can divide
out by the common divisor of numerator and denominator. The fraction
which results cannot be simplified further and is in its lowest terms.
Lemma 4.2.2. Let d = gcd(a, b). Then $\gcd(\frac{a}{d}, \frac{b}{d}) = 1$.
Proof. Because d divides both a and b we may write $a = a'd$ and $b = b'd$ for
some natural numbers $a'$ and $b'$. We therefore need to prove that $\gcd(a', b') = 1$.
Suppose that $e \mid a'$ and $e \mid b'$. Then $a' = ex$ and $b' = ey$ for some natural
numbers x and y. Thus a = exd and b = eyd. Observe that ed | a and ed | b
and so ed is a common divisor of both a and b. But d is the greatest common
divisor and so e = 1, as required.
Let me paraphrase what the result above says since it is not surprising. If
I divide two numbers by their greatest common divisor then the numbers that
remain are coprime. This seems intuitively plausible and the proof ensures
that our intuition is correct.
Example 4.2.3. Greatest common divisors arise naturally in solving linear equations where we require the solutions to be integers. Consider, for
example, the linear equation
12x + 16y = 5.
If we want our solutions (x, y) to have real number co-ordinates, then it is
of course easy to solve this equation and find infinitely many solutions since
the solutions form a line in the plane. But suppose now that we require
$(x, y) \in Z^2$; that is, we want the solutions to be integers. In other words, we
want to know whether the line contains any points with integer co-ordinates.
We can see immediately that this is impossible. We have calculated that
gcd(12, 16) = 4. Thus if x and y are integers, the number 4 divides the
lefthand side of our equation. But clearly, 4 does not divide the righthand
side of our equation. Thus the set
$$\{(x, y) : (x, y) \in Z^2 \text{ and } 12x + 16y = 5\}$$
is empty.
If the numbers a and b are large, then calculating their gcd in the way
I did above would be time-consuming and error-prone. We want to find an
efficient method of calculating the greatest common divisor. The following
lemma is the basis of just such a method.
Lemma 4.2.4. Let a, b ∈ N, where b ≠ 0, and let a = bq + r where 0 ≤ r < b.
Then
gcd(a, b) = gcd(b, r).
Proof. Let d be a common divisor of a and b. Since a = bq + r we have that
a − bq = r so that d is also a divisor of r. It follows that any divisor of a and
b is also a divisor of b and r.
Now let d be a common divisor of b and r. Since a = bq + r we have that
d divides a. Thus any divisor of b and r is a divisor of a and b.
It follows that the set of common divisors of a and b is the same as the
set of common divisors of b and r. Thus gcd(a, b) = gcd(b, r).
The point of the above result is that b < a and r < b. So calculating gcd(b, r) will be easier than calculating gcd(a, b) because the numbers
involved are smaller. Compare
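the equation a = bq + r, in which the first two numbers a and b are marked, with the same equation a = bq + r, in which the last two numbers b and r are marked.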
The above result is the basis of an efficient algorithm for computing greatest
common divisors. It was described in Propositions 1 and 2 of Book VII of
Euclid.
Algorithm 4.2.5 (Euclid’s algorithm).
Input: a, b ∈ N such that a ≥ b and b ≠ 0.
Output: gcd(a, b).
Procedure: write a = bq + r where 0 ≤ r < b. Then gcd(a, b) = gcd(b, r). If
r ≠ 0 then repeat this procedure with b and r and so on. The last non-zero
remainder is gcd(a, b).
Example 4.2.6. Let’s calculate gcd(19, 7) using Euclid’s algorithm. I have
highlighted the numbers that are involved at each stage.
19 = 7 · 2 + 5
7 = 5 · 1 + 2
5 = 2 · 2 + 1 ∗
2 = 1 · 2 + 0
By Lemma 4.2.4 we have that
gcd(19, 7) = gcd(7, 5) = gcd(5, 2) = gcd(2, 1) = gcd(1, 0).
The last non-zero remainder is 1 and so gcd(19, 7) = 1 and, in this case, the
numbers are coprime.
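Euclid's algorithm is only a few lines long when written as a program. The following Python sketch, in which the function name `euclid` is invented for the illustration, returns the gcd together with the list of divisions carried out, so that the output can be compared with the calculation above.

```python
def euclid(a, b):
    """Return (gcd(a, b), steps) for natural numbers a >= b > 0.

    Each step is a tuple (a, b, q, r) recording the division a = b*q + r.
    """
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, b, q, r))
        a, b = b, r            # gcd(a, b) = gcd(b, r) by Lemma 4.2.4
    return a, steps            # now b = 0, so the gcd is a

g, steps = euclid(19, 7)
for a, b, q, r in steps:
    print(f"{a} = {b} · {q} + {r}")
print(g)                       # 1
```

The loop stops when the remainder is zero, at which point the first number of the pair is the last non-zero remainder.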
There are occasions when we need to extract more information from Euclid’s algorithm as we shall discover later when we come to deal with prime
numbers. The following provides what we need.
Theorem 4.2.7 (Bézout’s theorem). Let a and b be natural numbers. Then
there are integers x and y such that
gcd(a, b) = xa + yb.
I shall prove this theorem by describing an algorithm that will compute
the integers x and y above. This is achieved by running Euclid’s algorithm
in reverse and is called the extended Euclidean algorithm. The procedure for
doing so is outlined below but the details are explained in the example that
follows it.
Algorithm 4.2.8 (Extended Euclidean algorithm).
Input: a, b ∈ N where a ≥ b and b ≠ 0.
Output: numbers x, y ∈ Z such that gcd(a, b) = xa + yb.
Procedure: apply Euclid’s algorithm to a and b; working from bottom to top
rewrite each remainder in turn.
Example 4.2.9. This is a little involved so I have split the process up into
steps. I shall apply the extended Euclidean algorithm to the example I
calculated above. I have highlighted the non-zero remainders wherever they
occur, and I have discarded the last equality where the remainder was zero.
I have also marked the last non-zero remainder.
19 = 7 · 2 + 5
7 = 5·1+2
5 = 2·2+1 ∗
The first step is to rearrange each equation so that the non-zero remainder
is alone on the lefthand side.
5 = 19 − 7 · 2
2 = 7−5·1
1 = 5−2·2
Next we reverse the order of the list
1 = 5−2·2
2 = 7−5·1
5 = 19 − 7 · 2
We now start with the first equation. The lefthand side is the gcd we are
interested in. We treat all other remainders as algebraic quantities and systematically substitute them in order. Thus we begin with the first equation
1 = 5 − 2 · 2.
The next equation in our list is
2=7−5·1
so we replace 2 in our first equation by the expression on the right to get
1 = 5 − (7 − 5 · 1) · 2.
We now rearrange this equation by collecting up like terms treating the highlighted remainders as algebraic objects to get
1 = 3 · 5 − 2 · 7.
We can of course make a check at this point to ensure that our arithmetic is
correct. The next equation in our list is
5 = 19 − 7 · 2
so we replace 5 in our new equation by the expression on the right to get
1 = 3 · (19 − 7 · 2) − 2 · 7.
Again we rearrange to get
1 = 3 · 19 − 8 · 7 .
The algorithm now terminates and we can write
gcd(19, 7) = 3 · 19 + (−8) · 7 ,
as required. We can also, of course, easily check the answer!
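The back-substitution carried out in this example can be organized as a loop. The following Python sketch is one possible arrangement and not the matrix method promised for later in the book: the function name `extended_euclid` is invented for the illustration, and, as a small departure from the example, the substitution starts from the final division with zero remainder instead of discarding it, which leads to the same answer.

```python
def extended_euclid(a, b):
    """Return (g, x, y) with g = gcd(a, b) = x*a + y*b, for natural numbers a >= b > 0."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    g = a
    # At the end the pair is (g, 0), and g = 1*g + 0*0.
    x, y = 1, 0
    # Work from bottom to top.  If g = x*b + y*r and r = a - q*b,
    # then g = y*a + (x - q*y)*b.
    for q in reversed(quotients):
        x, y = y, x - q * y
    return g, x, y

g, x, y = extended_euclid(19, 7)
print(g, x, y)                 # 1 3 -8
assert g == x * 19 + y * 7     # the check suggested in the text
```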
I shall describe a much more efficient algorithm for implementing the
extended Euclidean algorithm later in this book when I have discussed matrices.
A very useful application of Bézout’s theorem is the following.
Lemma 4.2.10. Let a and b be natural numbers. Then a and b are coprime
if, and only if, we may find integers x and y such that
1 = xa + yb.
Proof. Suppose first that a and b are coprime. Then by Bézout’s theorem
gcd(a, b) = xa + yb
for some integers x and y. But, by assumption, gcd(a, b) = 1, and so 1 = xa + yb. Conversely,
suppose that
1 = xa + yb.
Then any natural number that divides both a and b must divide 1. It follows
that gcd(a, b) = 1.
The significance of the above lemma is that whenever you know that a
and b are coprime, you can actually write down an expression 1 = xa + yb
which means the same thing. This turns out to be enormously useful.
Exercises 4.2
1. Use Euclid’s algorithm to find the gcd’s of the following pairs of numbers.
(a) 35, 65.
(b) 135, 144.
(c) 17017, 18900.
2. Use the extended Euclidean algorithm to find integers x and y such
that gcd(a, b) = ax + by for each of the following pairs of numbers. You
should ensure that your answers for x and y have the correct signs.
(a) 112, 267.
(b) 242, 1870.
3. We know how to find the greatest natural number that divides two
numbers. Define now gcd(a, b, c) to be the greatest common divisor of
a and b and c jointly. Prove that
gcd(a, b, c) = gcd(gcd(a, b), c).
Deduce that
gcd(gcd(a, b), c) = gcd(a, gcd(b, c)).
We may similarly define gcd(a, b, c, d) to be the greatest common divisor
of a and b and c and d jointly. Calculate gcd(910, 780, 286, 195) and
justify your calculations.
4. The following question is by Dubisch Amer. Math. Mon. 69. Define
N∗ = N \ {0}. A binary operation ◦ defined on N∗ is known to have
the following properties:
(a) a ◦ b = b ◦ a.
(b) a ◦ a = a.
(c) a ◦ (a + b) = a ◦ b.
Prove that a ◦ b = gcd(a, b). Hint: the question is not asking you to
prove that gcd(a, b) has these properties.
5. You have an unlimited supply of 3 cent stamps and an unlimited supply
of 5 cent stamps. By combining stamps of different values you can make
up other values: for example, three 3 cent stamps and two 5 cent stamps
make the value 19 cents. What is the largest value you cannot make?
Hint: you need to show that the question makes sense.
6. Let n ≥ 1. Define φ(n) to be the number of numbers less than or equal
to n and coprime to n. This is the Euler totient function. Tabulate the
values of φ(n) for 1 ≤ n ≤ 12.
4.3
The fundamental theorem of arithmetic
The goal of this section is to state and prove the most basic result about the
natural numbers: each natural number, excluding 0 and 1, can be written
as a product of powers of primes in essentially one way. The primes are
therefore the ‘atoms’ from which all natural numbers can be built.
A proper divisor of a natural number n is a divisor that is neither 1 nor
n. A natural number n is said to be prime if n ≥ 2 and the only divisors of
n are 1 and n itself. A number bigger than or equal to 2 which is not prime
is said to be composite. It is important to remember that the number 1 is
not a prime. The only even prime is the number 2.
The properties of primes have exercised a great fascination ever since they
were first studied and continue to pose questions that mathematicians have
yet to solve. There are no nice formulae to tell us what the nth prime is but
there are still some interesting results in this direction. The polynomial
p(n) = n^2 − n + 41
has the property that its value for n = 1, 2, 3, 4, . . . , 40 is always prime. Of
course, for n = 41 it is clearly not prime. In 1971, the mathematician Yuri
Matijasevic found a polynomial in 26 variables of degree 25 with the property
that when non-negative integers are substituted for the variables the positive
values it takes are all and only the primes. However, this polynomial does
not generate the primes in any particular order.
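The claim about p(n) = n^2 − n + 41 is easy to test by machine. The following Python sketch does so; the helper is_prime is a plain trial-division test introduced here as an assumption, not a function from the text.

```python
def is_prime(n):
    """Trial division: True exactly when n is a prime number."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def p(n):
    return n * n - n + 41


# Every value for n = 1, ..., 40 should be prime.
all_prime = all(is_prime(p(n)) for n in range(1, 41))
print(all_prime)

# For n = 41 the value is 41 * 41, which is visibly composite.
print(p(41) == 41 * 41)
```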
Lemma 4.3.1. Let n ≥ 2. Either n is prime or the smallest proper divisor
of n is prime.
Proof. Suppose n is not prime. Let d be the smallest proper divisor of n. If d
were not prime then d would have a smallest proper divisor and this divisor
would in turn divide n, but this would contradict the fact that d was the
smallest proper divisor of n. Thus d must itself be prime.
The following was also proved by Euclid: it is Proposition 20 of Book IX
of Euclid.
Theorem 4.3.2. There are infinitely many primes.
Proof. Let p1 , . . . , pn be the first n primes. Put
N = (p1 . . . pn ) + 1.
If N is a prime, then N is a prime bigger than pn . If N is composite, then N
has a prime divisor p by Lemma 4.3.1. But p cannot equal any of the primes
p1 , . . . , pn because N leaves remainder 1 when divided by pi . It follows that
p is a prime bigger than pn . Thus we can always find a bigger prime. It
follows that there must be an infinite number of primes.
Example 4.3.3. It’s interesting to consider some specific cases of the numbers introduced in the above proof. The first few are already prime.
• 2 + 1 = 3 prime.
• 2 · 3 + 1 = 7 prime.
• 2 · 3 · 5 + 1 = 31 prime.
• 2 · 3 · 5 · 7 + 1 = 211 prime.
• 2 · 3 · 5 · 7 · 11 + 1 = 2,311 prime.
• 2 · 3 · 5 · 7 · 11 · 13 + 1 = 30,031 = 59 · 509.
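The list above can be reproduced with a few lines of Python. This is only a sketch: the names smallest_prime_factor and euclid_numbers are assumptions introduced here, and the search for a factor is naive trial division.

```python
def smallest_prime_factor(n):
    """Return the smallest prime dividing n, for n >= 2."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


primes = [2, 3, 5, 7, 11, 13]
product = 1
euclid_numbers = []
for q in primes:
    product *= q
    euclid_numbers.append(product + 1)  # (p1 ... pk) + 1

for N in euclid_numbers:
    f = smallest_prime_factor(N)
    print(N, "prime" if f == N else f"= {f} * {N // f}")
```

Note that the prime factors found for the last number, 59 and 509, are both larger than 13, exactly as the proof of Theorem 4.3.2 predicts.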
The Prime Number Theorem
There are infinitely many primes but how are those primes distributed?
For example, are they arranged fairly regularly, or do the gaps between
them get bigger and bigger? There are no formulae which output the
nth prime in a usable way, but if we adopt a statistical approach then we
can obtain much more useful results. The idea is that for each natural
number n we count the number of primes π(n) less than or equal to
n. The graph of π(n) has a staircase shape — it certainly isn’t smooth
— but as you zoom away it begins to look smoother and smoother.
This raises the question of whether there is a smooth function that is a
good approximation to π(n). In 1792, the young Carl Friedrich Gauss
(1777–1855) observed that π(n) appeared to be close to the value of the
amazingly simple function n/ln(n). But proving that this was always true,
and not just an artefact of the comparatively small numbers he looked
at, turned out to be difficult. Eventually, in 1896 two mathematicians,
Jacques Hadamard (1865–1963) and the spectacularly named Charles
Jean Gustave Nicolas Baron de la Vallée Poussin (1866–1962), proved
independently of each other that
lim_{x→∞} π(x) / (x/ln(x)) = 1,
a result known as the Prime Number Theorem. It was proved using complex analysis; that is, calculus using complex numbers. As an example,
we have that
π(1,000,000) = 78,498
whereas
10^6 / ln(10^6) ≈ 72,382.
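The two figures just quoted can be recomputed directly. The sketch below counts primes with a simple sieve; the function name prime_count is an assumption made here, and the sieve is just one convenient way of evaluating π(n), not the method used historically.

```python
import math


def prime_count(n):
    """Return pi(n), the number of primes less than or equal to n."""
    if n < 2:
        return 0
    flags = bytearray([1]) * (n + 1)  # flags[k] == 1 means "k is still a candidate"
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            # Cross out the multiples of p, starting from p * p.
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(flags)


n = 10**6
print(prime_count(n))            # 78498
print(n / math.log(n))           # about 72382.4
```

The ratio of the two numbers is roughly 1.08 at this size; the Prime Number Theorem says that it tends to 1, but slowly.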
Algorithm 4.3.4. To decide whether a number n is prime or composite.
Check to see if any prime p ≤ √n divides n. If none of them do, the number
n is prime. We shall now explain why this works. If a divides n then we can
write n = ab for some number b. If a < √n then b > √n whilst if a > √n
then b < √n. Thus to decide if n is prime or not we need only carry out trial
divisions by all numbers a ≤ √n. However, this is inefficient because if a
divides n and a is not prime then a is divisible by some prime p which must
therefore also divide n. It follows that we need only carry out trial divisions
by the primes p ≤ √n.
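Algorithm 4.3.4 translates almost word for word into code. The following Python sketch assumes n ≥ 2 is the interesting case; the helper names primes_up_to and is_prime are introduced here for illustration only.

```python
import math


def primes_up_to(m):
    """List the primes p <= m by checking each candidate against smaller primes."""
    primes = []
    for k in range(2, m + 1):
        if all(k % p != 0 for p in primes):
            primes.append(k)
    return primes


def is_prime(n):
    """Trial division by the primes p <= sqrt(n), as in Algorithm 4.3.4."""
    if n < 2:
        return False
    bound = math.isqrt(n)  # largest whole number <= sqrt(n)
    return all(n % p != 0 for p in primes_up_to(bound))


print(primes_up_to(9))  # [2, 3, 5, 7]
print(is_prime(97))     # True
print(is_prime(91))     # False, since 91 = 7 * 13
```

Building the list of small primes is itself done naively here; for large n one would want something better, but the point is only to mirror the algorithm as stated.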
Example 4.3.5. Determine whether 97 is prime using the above algorithm.
We first calculate the largest whole number less than or equal to √97. This
is 9. We now carry out trial divisions of 97 by each prime number p where
2 ≤ p ≤ 9; by the way, if you aren’t certain which of these numbers is prime:
just try them all. You’ll get the right answer although not as efficiently. You
might also want to remember that if m doesn’t divide a number neither can
any multiple of m. In any event, in this case we carry out trial divisions by
2, 3, 5 and 7. None of them divides 97 exactly and so 97 is prime.
Cryptography
Prime numbers play an important role in exchanging secret information.
In 1976, Whitfield Diffie and Martin Hellman wrote a paper on cryptography that can genuinely be called ground-breaking. In ‘New directions
in cryptography’ IEEE Transactions on Information Theory 22 (1976),
644–654, they put forward the idea of a public-key cryptosystem which
would enable
. . . a private conversation . . . [to] be held between any two individuals regardless of whether they have ever communicated
before.
With considerable farsightedness, Diffie and Hellman foresaw that such
cryptosystems would be essential if communication between computers
was to reach its full potential. However, their paper did not describe a
concrete way of doing this. It was R. L. Rivest, A. Shamir and L. Adleman (RSA) who found just such a concrete method described in their
paper, ‘A method for obtaining digital signatures and public-key cryptosystems’ Communications of the ACM 21 (1978), 120–126. Their
method is based on the following observation. Given two prime numbers it takes very little time to multiply them together, but if I give
you a number that is a product of two primes and ask you to factorize
it then it takes a lot of time. You might like to think about why in
relation to the algorithm I gave for factorizing numbers above. After
considerable experimentation, RSA showed how to use little more than
undergraduate mathematics to put together a public-key cryptosystem
that is an essential ingredient in e-commerce. Ironically, this secret code
had in fact been invented in 1973 at GCHQ, who had kept it secret.
The following is the key property of primes we shall need to prove the
fundamental theorem of arithmetic. It is the main reason why we needed
Bézout’s theorem. It is Proposition 30 of Book VII of Euclid.
Lemma 4.3.6 (Euclid’s lemma).
1. Let p | ab where p is a prime. Then p | a or p | b.
2. Let p | a1 . . . an where p is a prime. Then p | ai for some i.
Proof. (1) Suppose that p does not divide a. We shall prove that p must then
divide b. If p does not divide a, then a and p are coprime. By Lemma 4.2.10,
there exist integers x and y such that 1 = px + ay. Thus b = bpx + bay. Now
p | bp and p | ba, by assumption, and so p | b, as required.
(2) This is a typical application for proof by induction. We have proved
the base case where n = 2. Assume that the result holds when n = k. We
prove that it holds for n = k + 1. Suppose that p | (a_1 . . . a_k) a_{k+1}. From
the base case, either p | a_1 . . . a_k or p | a_{k+1}. But we may now deduce that
p | a_i for some 1 ≤ i ≤ k or p | a_{k+1} by the induction hypothesis. We have
therefore proved the result.
Example 4.3.7. The above result is not true if p is not a prime. For example,
6 | 9 × 4 but 6 divides neither 9 nor 4.
Lemma 4.3.6 is so important, I want to spell out in words what it says:
If a prime divides a product of numbers it must divide at least
one of them.
There is a very nice application of Euclid’s lemma to proving that certain
numbers are irrational. It generalizes our proof that √2 is irrational described
in Chapter 2.
Theorem 4.3.8. The square root of every prime number is irrational.
Proof. We shall prove this by contradiction. Assume that we can write √p
as a rational. I shall show that this assumption leads to a contradiction and
so must be false. We are assuming that √p = a/b. By cancelling the greatest
common divisor of a and b we can in fact assume that gcd(a, b) = 1. This
will be crucial to our argument. Squaring both sides of the equation √p = a/b
and multiplying the resulting equation by b^2 we get that
pb^2 = a^2.
This says that a^2 is divisible by p. But if a prime divides a product of two
numbers it must divide at least one of those numbers by Euclid’s lemma.
Thus p divides a. Thus we can write a = pc for some natural number c.
Substituting this into our equation above we get that
pb^2 = p^2 c^2.
Dividing both sides of this equation by p gives
b^2 = pc^2.
This tells us that b^2 is divisible by p and so in the same way as above p
divides b. We have therefore shown that our assumption that √p is rational
leads to both a and b being divisible by p. But this contradicts the fact that
gcd(a, b) = 1. Our assumption is therefore wrong, and so √p is not a rational
number.
We now come to the main theorem of this chapter.
Theorem 4.3.9 (Fundamental theorem of arithmetic). Every number n ≥ 2
can be written as a product of primes in one way if we ignore the order in
which the primes appear. By product we allow the possibility that there is
only one prime.
Proof. Let n ≥ 2. If n is already a prime then there is nothing to prove, so
we can suppose that n is composite. Let p1 be the smallest prime divisor of
n. Then we can write n = p1 n′ where n′ < n. Once again, n′ is either prime
or composite. Continuing in this way, we can write n as a product of primes.
We now prove uniqueness. Suppose that
n = p1 . . . ps = q1 . . . qt
are two ways of writing n as a product of primes. Now p1 | n and so p1 |
q1 . . . qt . By Euclid’s Lemma, the prime p1 must divide one of the qi ’s and,
since they are themselves prime, it must actually equal one of the qi ’s. By
relabelling if necessary, we can assume that p1 = q1 . Cancel p1 from both
sides and repeat with p2 . Continuing in this way, we see that every prime
occurring on the lefthand side occurs on the righthand side. Changing sides,
we see that every prime occurring on the righthand side occurs on the lefthand
side. We deduce that the two prime decompositions are identical.
When we write a number as a product of primes we usually gather together the same primes into a prime power, and write the primes in increasing
order which then gives a unique representation. This is illustrated in the example below.
Example 4.3.10. Let n = 999,999. Write n as a product of primes. There
are a number of ways of doing this but in this case there is an obvious place
to start. We have that
n = 3^2 · 111,111 = 3^3 · 37,037 = 3^3 · 7 · 5,291 = 3^3 · 7 · 11 · 481 = 3^3 · 7 · 11 · 13 · 37.
Thus the prime factorisation of 999,999 is
999,999 = 3^3 · 7 · 11 · 13 · 37.
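The existence half of the proof of Theorem 4.3.9 is really an algorithm: keep splitting off the smallest prime divisor. The Python sketch below follows that idea; the function name prime_factorization and the use of a dictionary mapping each prime to its power are assumptions made here for illustration.

```python
def prime_factorization(n):
    """Return {prime: exponent} for a natural number n >= 2."""
    factors = {}
    p = 2
    while p * p <= n:
        # Any p that divides n at this point is prime, because all
        # smaller prime factors have already been divided out.
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # What is left over has no divisor up to its square root, so it is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


print(prime_factorization(999999))  # {3: 3, 7: 1, 11: 1, 13: 1, 37: 1}
```

The output agrees with the factorisation 3^3 · 7 · 11 · 13 · 37 found by hand in Example 4.3.10.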
Supernatural Numbers
There are natural numbers. Are there super natural numbers? It sounds
like a joke but in fact there are, though to be honest they are only
encountered in advanced work. But since they are easy to understand
and I like the name, I have included a brief description. List the primes
in order 2, 3, 5, 7, . . .. By the fundamental theorem of arithmetic, each
natural number ≥ 2 may be expressed as a unique product of powers of
primes. Let’s write each such natural number as a product of all primes.
This can be achieved by including those primes not needed by raising
them to the power 0. For example,
10 = 2 · 5 = 2^1 · 3^0 · 5^1 · 7^0 . . .
which we could write as
(1, 0, 1, 0, 0, 0 . . .)
and
12 = 2^2 · 3 = 2^2 · 3^1 · 5^0 · 7^0 . . .
which we could write as
(2, 1, 0, 0, 0, 0 . . .)
Of course, for each natural number from some point on all the entries
will be zero. Thus each natural number ≥ 2 is encoded by an infinite
sequence of natural numbers that are zero from some point onwards.
We now define a supernatural number to be any sequence
(a1 , a2 , a3 , . . .)
where the ai are natural numbers. We define a natural number to be a
supernatural number where the ai = 0 for all i ≥ m for some natural
number m ≥ 1. This makes sense because each natural supernatural
number can be regarded as the encoded version of a natural number in
the non-super sense. I shall denote the set of supernatural numbers by
S; this is not yet the complete list since I still have to add some special
such numbers. I shall denote supernatural numbers by bold letters such
as a. I shall also denote the ith component by ai . Let a and b be two
supernatural numbers. We define their product as follows
(a · b)i = ai + bi .
This makes sense because, for example,
10 · 12 = 120
and
(1, 0, 1, 0, 0, 0 . . .) · (2, 1, 0, 0, 0, 0 . . .) = (3, 1, 1, 0, 0, 0 . . .)
which encodes 2^3 · 3^1 · 5^1 = 120. I shall leave you to check that the multiplication is associative. If we define
1 = (0, 0, 0, . . .)
and allow it to be supernatural then we also have a multiplicative identity because
1 · a = a = a · 1.
Now introduce a new symbol ∞ which satisfies a + ∞ = ∞ = ∞ + a.
Then if we allow
0 = (∞, ∞, ∞, . . .)
as a supernatural number then we also have a zero in the set of supernatural numbers since
0 · a = 0 = a · 0.
Finally, allow ∞ to occur anywhere any number of times in the definition
of a supernatural number. Then we have the full set of supernatural
numbers. How do you think that we could define gcd(a, b) and lcm(a, b)
of supernatural numbers?
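The encoding of numbers by sequences of exponents can be tried out on a computer, with one unavoidable compromise: an infinite sequence has to be cut off somewhere. The Python sketch below keeps only the first six primes, which is an assumption made purely for illustration, as are the names encode, multiply, ONE and ZERO. It deals with the product only and leaves the closing question to the reader.

```python
import math

PRIMES = (2, 3, 5, 7, 11, 13)  # assumption: truncate the infinite list of primes


def encode(n):
    """Exponent sequence of n >= 1 over PRIMES."""
    exponents = []
    for p in PRIMES:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exponents.append(e)
    if n != 1:
        raise ValueError("n has a prime factor outside PRIMES")
    return tuple(exponents)


def multiply(a, b):
    """Product of two sequences: add the exponents componentwise."""
    return tuple(x + y for x, y in zip(a, b))


ONE = (0,) * len(PRIMES)
ZERO = (math.inf,) * len(PRIMES)  # math.inf plays the role of the symbol ∞

print(multiply(encode(10), encode(12)))  # (3, 1, 1, 0, 0, 0)
print(encode(120))                       # (3, 1, 1, 0, 0, 0)
```

Because math.inf + k equals math.inf for every number k, the sequence ZERO absorbs every other sequence under multiply, just as 0 should.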
I shall now describe two simple applications of our main theorem.
The greatest common divisor of two numbers a and b is the largest number
that divides into both a and b. On the other hand, if a | c and b | c then we
say that c is a common multiple of a and b. The smallest common multiple
of a and b is called the least common multiple of a and b and is denoted by
lcm(a, b). You might expect that to calculate the least common multiple we
would need a new algorithm, but in fact we can use Euclid’s algorithm as
the following result shows.
Proposition 4.3.11. Let a and b be natural numbers not both zero. Then
gcd(a, b) · lcm(a, b) = ab.
Proof. We begin with a special case to motivate the idea. Suppose that
a = p^r and b = p^s where p is a prime. Then it is immediate from the
properties of indices that
gcd(a, b) = p^{min(r,s)} and lcm(a, b) = p^{max(r,s)}
and so, in this special case, we have that gcd(a, b) · lcm(a, b) = ab. Next
suppose that the prime factorizations of a and b are
a = p_1^{r_1} . . . p_m^{r_m} and b = p_1^{s_1} . . . p_m^{s_m}
where the p_i are primes. We may easily determine the prime factorization of
gcd(a, b) when we bear in mind the following points. The primes that occur
in the prime factorization of gcd(a, b) must be from the set {p_1, . . . , p_m}, the
number p_i^{min(r_i, s_i)} divides gcd(a, b) but no higher power does. It follows that
gcd(a, b) = p_1^{min(r_1, s_1)} . . . p_m^{min(r_m, s_m)}.
A similar kind of argument proves that
lcm(a, b) = p_1^{max(r_1, s_1)} . . . p_m^{max(r_m, s_m)}.
The proof of the fact that gcd(a, b)·lcm(a, b) = ab now follows by multiplying
the above two prime factorizations together. In the above proof, we assumed
that a and b had prime factorizations using the same set of primes. This
need not be true in general, but by allowing zero powers of primes we can
easily arrange for the same sets of primes to occur and the argument above
remains valid.
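The min/max description in the proof can be checked numerically against Euclid's algorithm. In the Python sketch below, math.gcd is the standard library's gcd; the helper names factorize and gcd_lcm_from_factorizations, and the sample numbers 360 and 84, are assumptions chosen here for illustration.

```python
from math import gcd


def factorize(n):
    """Return {prime: exponent} for n >= 1 by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def gcd_lcm_from_factorizations(a, b):
    """Build gcd and lcm from the minimum and maximum prime powers."""
    fa, fb = factorize(a), factorize(b)
    g, l = 1, 1
    for p in set(fa) | set(fb):
        r, s = fa.get(p, 0), fb.get(p, 0)  # a missing prime has power 0
        g *= p ** min(r, s)
        l *= p ** max(r, s)
    return g, l


a, b = 360, 84
g, l = gcd_lcm_from_factorizations(a, b)
print(g, l)                     # 12 2520
print(g == gcd(a, b))           # True
print(l == a * b // gcd(a, b))  # True
```

In practice one would use the last line as the definition: compute gcd(a, b) by Euclid's algorithm and divide ab by it, since factorizing is far more expensive.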
For our next result, we begin with an observation. Some fractions, such
as 1/2 can be written with only a finite number of digits after the decimal
place, but others, such as 1/3 require an infinite number of digits. We can now
account for this using our main theorem.
Proposition 4.3.12. A proper rational number a/b in its lowest terms has a
finite decimal expansion if and only if b = 2^m 5^n for some natural numbers m
and n.
Proof. Let a/b have the finite decimal representation 0 · a1 . . . an . This means
a/b = a_1/10 + a_2/10^2 + . . . + a_n/10^n.
The righthand side is just the fraction
(a_1 10^{n−1} + a_2 10^{n−2} + . . . + a_n) / 10^n.
The denominator contains only the prime factors 2 and 5 and so the reduced
form will also only contain at most the prime factors 2 and 5.
To prove the converse, consider the proper fraction
a / (2^α 5^β).
If α = β then the denominator is 10^α. If α ≠ β then multiply the top and bottom of the fraction by
a suitable power of 2 or 5 as appropriate so that the resulting fraction has
denominator a power of 10. But any fraction with denominator a power of
10 has a finite decimal expansion.
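Proposition 4.3.12 gives a mechanical test: reduce the fraction, then strip every factor of 2 and 5 from the denominator and see whether anything is left. The following Python sketch does this; fractions.Fraction is the standard library type, which reduces to lowest terms automatically, while the function name has_finite_decimal is an assumption introduced here.

```python
from fractions import Fraction


def has_finite_decimal(q):
    """True when the Fraction q has a terminating decimal expansion."""
    b = q.denominator  # Fraction stores q in lowest terms
    for p in (2, 5):
        while b % p == 0:
            b //= p
    return b == 1  # nothing left means b = 2^m * 5^n


print(has_finite_decimal(Fraction(1, 2)))   # True
print(has_finite_decimal(Fraction(1, 3)))   # False
print(has_finite_decimal(Fraction(3, 40)))  # True: 3/40 = 0.075
```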
Exercises 4.3
1. List the primes less than 100. Hint: use the Sieve of Eratosthenes1
which can be used to construct a table of all primes up to the number
N . List all numbers from 2 to N inclusive. Mark 2 as prime and then
cross out from the table all numbers which are multiples of 2. The
process now iterates as follows. Find the smallest number which is not
marked as a prime and which has not been crossed out. Mark it as a
prime and cross out all its multiples. If no such number can be found
then you have found all primes less than or equal to N .
2. For each of the following numbers use Algorithm 4.3.4 to determine
whether they are prime or composite. When they are composite find a
prime factorization. Show all working.
(a) 131.
(b) 689.
(c) 5491.
3. Find the lowest common multiples of the following pairs of numbers.
(a) 22, 121.
(b) 48, 72.
(c) 25, 116.
4. Given 2^4 · 3 · 5^5 · 11^2 and 2^2 · 5^6 · 11^4, calculate their greatest common
divisor and least common multiple.
5. Use the fundamental theorem of arithmetic to show that we can always
write √n, where n is a natural number, as a product of a natural
number and a product of square roots of primes. Calculate the square
roots of the following numbers exactly using the above method.
(a) 10.
(b) 42.
(c) 54.
6. Let a and b be coprime. Prove that if a | bc then a | c.
1 Eratosthenes of Cyrene, who lived about 250 BCE, is famous for using geometry
and some simple observations to estimate the circumference of the earth.
4.4
Modular arithmetic
From an early age, we are taught to think of numbers as being strung out
along the number line
−3   −2   −1   0   1   2   3
But that is not the only way we count. We count the seasons in a cyclic
manner
. . . autumn, winter, spring, summer . . .
and likewise the days of the week
. . . Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday . . .
Also the months of the year or the hours in a day, whether by means of the 12-hour clock or the 24-hour clock. The fact that we use words for these events
obscures the fact that we really are counting. This is clearer in the names
for the months since October, November and December were originally the
eighth, ninth and tenth months, respectively, until Roman politics intervened
and they were shifted. But the counting in all these cases is not linear but
cyclic. Rather than using a number line to represent this type of counting,
we use instead number circles, and rather than using the words above, I
shall use numbers. Here is the number circle for the seasons with numbers
replacing words.
      0
  3       1
      2
Adding in these systems of arithmetic means stepping around in a clockwise
direction, whereas subtracting means stepping around in an anticlockwise
direction. Modular arithmetic is the name given to these different systems
of cyclic counting. It was Gauss who realised that these different systems of
counting were mathematically interesting.
4.4.1
Congruences
Let n ≥ 2 be a fixed natural number which in this context we call the
modulus. If a, b ∈ Z we write a ≡ b if, and only if, a and b leave the same
remainder when divided by n or, what amounts to the same thing, n | a − b.
Here are a couple of simple examples. If n = 2, then a ≡ b if, and only
if, a and b are either both odd or both even. On the other hand, if n = 10
then a ≡ b if, and only if, a and b have the same units digit.
The symbol ≡ is a modification of the equality symbol =. If a ≡ b with
respect to n we say that a is congruent to b modulo n. In fact, congruence
behaves like a weakened form of equality as we now show.
Lemma 4.4.1. Let n ≥ 2 be a fixed modulus.
1. a ≡ a.
2. a ≡ b implies b ≡ a.
3. a ≡ b and b ≡ c implies that a ≡ c.
4. a ≡ b and c ≡ d implies that a + c ≡ b + d.
5. a ≡ b and c ≡ d implies that ac ≡ bd.
Here is a very simple application of modular arithmetic.
Lemma 4.4.2. A natural number n is divisible by 9 if, and only if, the sum
of the digits of n is divisible by 9.
Proof. We shall work modulo 9. The proof hinges on the fact that 10 ≡ 1
modulo 9. By using Lemma 4.4.1, we quickly find that 10^r ≡ 1 for all natural
numbers r ≥ 1. We use this result now. Let
n = a_k 10^k + a_{k−1} 10^{k−1} + . . . + a_1 10 + a_0
where a_k, . . . , a_0 are the digits of n.
Then n ≡ a_k + . . . + a_0. Thus n and the sum of the digits of n leave the same
remainder when divided by 9, and so n is divisible by 9 if, and only if, the
sum of the digits of n is divisible by 9.
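The proof shows slightly more than the lemma states: n and its digit sum leave the same remainder modulo 9. A short Python check of this, with digit_sum as a helper name assumed here, runs as follows.

```python
def digit_sum(n):
    """Sum of the decimal digits of a natural number n."""
    return sum(int(c) for c in str(n))


# n and its digit sum are congruent modulo 9 for every n tested.
same_remainder = all(n % 9 == digit_sum(n) % 9 for n in range(10000))
print(same_remainder)  # True

print(digit_sum(999999), 999999 % 9)  # 54 0
```

A finite check is of course no substitute for the proof; it simply illustrates what the congruence 10^r ≡ 1 buys us.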
Solving a linear equation such as ax+by = c is very easy. For each possible
real value of x we can compute the corresponding real value of y. But suppose
now that a, b and c are integers and we only want to find solutions (x, y) whose
co-ordinates are integers? This is an example of a Diophantine equation. We
shall show how it may be solved with the help of modular arithmetic. First,
we shall show that the problem of finding integer solutions is equivalent to
solving a simple kind of linear equation in one unknown in modular arithmetic.
Lemma 4.4.3. Let a, b and c be integers. Then the following are equivalent.
1. The pair (x1 , y1 ) is an integer solution to ax + by = c for some y1 .
2. The integer x1 is a solution to the equation ax ≡ c (mod b).
Proof. (1) ⇒ (2). Suppose that ax1 + by1 = c. Then it is immediate that
ax1 ≡ c (mod b).
(2) ⇒ (1). Suppose that ax1 ≡ c (mod b). Then by definition, ax1 − c =
bz1 for some integer z1 . Thus ax1 + b(−z1 ) = c. We may therefore put
y1 = −z1 .
We shall now describe how to solve all equations of the form
ax ≡ b (mod n).
Lemma 4.4.4. Consider the linear congruence ax ≡ b (mod n).
1. The linear congruence has a solution if, and only if, d = gcd(a, n) is
such that d | b.
2. If the condition in part (1) holds and x0 is any solution, then all solutions have the form
x = x0 + t(n/d)
where t ∈ Z.
Proof. (1). Suppose first that x1 is a solution to our linear congruence. Then
by definition, ax1 −b = nq for some integer q. It follows that ax1 +n(−q) = b.
By definition d | a and d | n and so d | b.
We now prove the converse. By Bézout’s theorem, we may find integers
u and v such that au + nv = d. By assumption, d | b and so b = dw for some
integer w. It follows that auw + nvw = dw = b. Thus a(uw) ≡ b (mod n),
and we have found a solution.
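(2) Let x0 be any one solution to ax ≡ b (mod n). It is routine to check
that x = x0 + t(n/d) is also a solution for any t ∈ Z. Conversely, let x1 be any solution to ax ≡ b (mod n).
Then a(x1 − x0 ) ≡ 0 (mod n). Thus a(x1 − x0 ) = sn for some integer s. Dividing
by d gives (a/d)(x1 − x0 ) = s(n/d). Since a/d and n/d are coprime, n/d must
divide x1 − x0 by Question 6 of Exercises 4.3. The
result now follows.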
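The proof of Lemma 4.4.4 is constructive, so it can be turned into a procedure. The Python sketch below is one way of doing this under stated assumptions: the function names extended_gcd and solve_linear_congruence are introduced here, the extended Euclidean step is the standard iterative version rather than the back-substitution of Example 4.2.9, and solutions are reported as representatives in the range 0 to n − 1.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) = x*a + y*b."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def solve_linear_congruence(a, b, n):
    """All x with 0 <= x < n and a*x ≡ b (mod n), in increasing order."""
    d, u, _ = extended_gcd(a % n, n)  # d = gcd(a, n) = u*a + v*n
    if b % d != 0:
        return []                      # part (1): no solution unless d | b
    x0 = (u * (b // d)) % n            # one solution, as in the proof
    step = n // d
    return sorted((x0 + t * step) % n for t in range(d))  # part (2)


print(solve_linear_congruence(6, 4, 10))  # [4, 9]
print(solve_linear_congruence(2, 1, 4))   # []
print(solve_linear_congruence(3, 1, 7))   # [5]
```

When n is prime and a is not congruent to 0, the list has exactly one entry, which is the content of Corollary 4.4.5.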
There is a special case of the above result that is very important. Its
proof is immediate.
Corollary 4.4.5. Let p be a prime. Then the linear congruence ax ≡ b
(mod p), where a is not congruent to 0 modulo p, always has a solution, and
all solutions are congruent modulo p.
Example 4.4.6. Let’s find all the points on the line 2x + 3y = 5 that have
integer co-ordinates. Observe first that gcd(2, 3) = 1. Thus such points exist.
In this case, by inspection, 1 = 2 · 2 + (−1)3. Thus 5 = 10 · 2 + (−5)3. It
follows that (10, −5) is one point on the line with integer co-ordinates. Thus
the set of integer solutions is
{(10 + 3t, −5 − 2t) : t ∈ Z}.
4.4.2
Wilson’s theorem
I shall finish off this section with an application of congruences to primes.
It is the first hint of hidden patterns in the primes. We need some notation
first. For each natural number n define n!, pronounced n factorial, or if
you are more extrovert n shriek, as follows: 0! = 1 and for n > 0 define
n! = n · (n − 1)!. In other words, n! is what you get when you multiply
together all the positive integers less than or equal to n. For each natural
number n, we shall be interested in the value of (n − 1)! modulo n. Observe
that there is no point in studying n! (mod n) since the answer is always 0.
It’s worth doing some numerical calculations first to see if you can spot a
pattern.
Theorem 4.4.7 (Wilson’s Theorem). Let n ≥ 2 be a natural number. Then n is
a prime if, and only if,
(n − 1)! ≡ n − 1 (mod n).
Since n − 1 ≡ −1 (mod n) this is usually expressed in the form
(n − 1)! ≡ −1 (mod n).
Proof. The statement to be proved is an ‘if, and only if’ and so we have to
prove two statements: (1) If n is prime then (n − 1)! ≡ n − 1 (mod n). (2)
If (n − 1)! ≡ n − 1 (mod n) then n is prime.
We prove (1) first. Let n be a prime. The result is clearly true when
n = 2 so we may assume n is an odd prime. By Corollary 4.4.5, for each 1 ≤ a ≤ n − 1 there
is a unique number 1 ≤ b ≤ n − 1 such that ab ≡ 1 (mod n). If a = b then
a^2 ≡ 1 (mod n) which means that n | (a − 1)(a + 1). Since n is a prime
either n | a − 1 or n | a + 1. This can only occur if a = 1 or a = n − 1. The
remaining numbers 2, . . . , n − 2 therefore fall into pairs whose product is
congruent to 1 modulo n. Thus
(n − 1)! ≡ 1 · (n − 1) ≡ n − 1 (mod n), as claimed.
We now prove (2) by showing that if n is not prime then (n − 1)! is not
congruent to n − 1 modulo n. When n = 4, we get that (4 − 1)! ≡ 2 (mod 4),
which is not congruent to 3.
Suppose that n > 4 is not prime. Then n = ab where 1 < a, b < n. If a ≠ b
then a and b both occur as factors of (n − 1)! and so this is congruent to 0 modulo n.
If a = b then a > 2, so both a and 2a are less than n and occur in (n − 1)!. Thus n is again a factor of
(n − 1)!. In either case (n − 1)! ≡ 0, which is not congruent to n − 1 modulo n.
This theorem is interesting for another reason. To show that a number
is prime, we would usually apply the algorithm we described earlier which
is just a systematic way of carrying out trial division. This theorem shows
that a number is prime in a completely different way. Although it is not a
practical test for deciding whether a number is prime or composite, since n!
gets very big very quickly, it shows that there might be backdoor ways of
showing that a number is prime. This is a very important question in the
light of the rôle of prime numbers in cryptography.
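Wilson's theorem can at least be tried out on small numbers. The Python sketch below compares it with trial division for 2 ≤ n < 200; the function names wilson_is_prime and trial_is_prime are assumptions introduced here. Reducing modulo n after each multiplication keeps the numbers small, but the loop still needs about n multiplications, which is far more work than trial division up to √n.

```python
def wilson_is_prime(n):
    """Wilson's criterion for n >= 2: is (n - 1)! ≡ n - 1 (mod n)?"""
    factorial_mod_n = 1
    for k in range(2, n):
        factorial_mod_n = (factorial_mod_n * k) % n  # reduce as we go
    return factorial_mod_n == n - 1


def trial_is_prime(n):
    """Ordinary trial division, used only as a cross-check."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


agree = all(wilson_is_prime(n) == trial_is_prime(n) for n in range(2, 200))
print(agree)  # True
```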
4.5
Continued fractions
The goal of this section is to show how some of the ideas we have introduced
so far can interact with each other. The material we cover is not needed
elsewhere in this book.
4.5.1
Fractions of fractions
We return to an earlier calculation. We used Euclid’s algorithm to calculate
gcd(19, 7) as follows.
19 = 7 · 2 + 5
7 = 5 · 1 + 2
5 = 2 · 2 + 1
2 = 1 · 2 + 0
We first rewrite each line, except the last, as follows
19/7 = 2 + 5/7
7/5 = 1 + 2/5
5/2 = 2 + 1/2
Take the first equality
19/7 = 2 + 5/7.
But 5/7 is the reciprocal of 7/5, and from the second equality
7/5 = 1 + 2/5.
If we combine them, we get
19/7 = 2 + 1/(1 + 2/5)
however strange this may look. We may repeat the process to get
19/7 = 2 + 1/(1 + 1/(2 + 1/2))
Fractions like this are called continued fractions. Suppose I just gave you
2 + 1/(1 + 1/(2 + 1/2))
You could work out what the usual rational expression was by working from
the bottom up. First compute the innermost part, 2 + 1/2, of
2 + 1/(1 + 1/(2 + 1/2))
to get
2 + 1/(1 + 1/(5/2))
which simplifies to
2 + 1/(1 + 2/5)
This process can now be repeated and we shall eventually obtain a standard
fraction.
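Both directions of this calculation are easy to automate. In the Python sketch below, the quotients produced by Euclid's algorithm are exactly the whole-number parts of the continued fraction, and the bottom-up evaluation uses exact rational arithmetic from the standard fractions module. The function names continued_fraction and evaluate are assumptions introduced here.

```python
from fractions import Fraction


def continued_fraction(a, b):
    """Whole-number parts of the continued fraction of a/b (with b > 0)."""
    terms = []
    while b != 0:
        q, r = divmod(a, b)  # one line of Euclid's algorithm
        terms.append(q)
        a, b = b, r
    return terms


def evaluate(terms):
    """Collapse a finite continued fraction, working from the bottom up."""
    value = Fraction(terms[-1])
    for q in reversed(terms[:-1]):
        value = q + 1 / value
    return value


print(continued_fraction(19, 7))  # [2, 1, 2, 2]
print(evaluate([2, 1, 2, 2]))     # 19/7
```

The list [2, 1, 2, 2] is shorthand for 2 + 1/(1 + 1/(2 + 1/2)), the expression obtained above.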
I am not going to develop the theory of continued fractions, but I shall
show you one more application. Let r be a real number. We may write
r as r = m1 + r1 where 0 ≤ r1 < 1. For example, π may be written as
π = 3 · 14159265358 . . . where here m1 = 3 and r1 = 0 · 14159265358 . . .. Now
r1 < 1 and we assume that it is non-zero. Then 1/r1 > 1. We may therefore
repeat the above process and write 1/r1 = m2 + r2 where once again r2 < 1.
This begins to feel an awful lot like what we did above. In fact, we may write
r = m1 + 1/(m2 + r2),
and we can continue the above process with r2 . It looks like we would obtain
a continued fraction representation of r with the big difference that it could
be infinite. Here is a concrete example.
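As a numerical companion to the worked example that follows, the process of splitting off the whole part and inverting the remainder can be sketched in Python. The function name partial_quotients is an assumption introduced here, and because the sketch uses floating-point numbers only the first few terms can be trusted; rounding errors grow at every inversion.

```python
import math


def partial_quotients(r, count):
    """First `count` whole-number parts m1, m2, ... of the real number r."""
    terms = []
    for _ in range(count):
        m = math.floor(r)   # r = m + remainder with 0 <= remainder < 1
        terms.append(m)
        remainder = r - m
        if remainder == 0:
            break           # the continued fraction is finite
        r = 1 / remainder   # invert and repeat
    return terms


print(partial_quotients(math.pi, 4))  # [3, 7, 15, 1]
print(partial_quotients(2.5, 5))      # [2, 2]
```

The first output says that π is approximately 3 + 1/(7 + 1/(15 + 1/1)), which is one way of explaining why 22/7 is such a good approximation to π.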
Example 4.5.1. We apply the above process to √3. Clearly, 1 < √3 < 2.
Thus
√3 = 1 + (√3 − 1)
where √3 − 1 < 1. We now focus on
1/(√3 − 1).
To convert this into a more usable form we multiply top and bottom by
√3 + 1. We therefore get that
1/(√3 − 1) = (1/2)(√3 + 1).
It is clear that 1 < (1/2)(√3 + 1) < 1½. Thus
1/(√3 − 1) = 1 + (√3 − 1)/2.
We now focus on
2/(√3 − 1)
which simplifies to √3 + 1. Clearly
2 < √3 + 1 < 3.
Thus √3 + 1 = 2 + (√3 − 1). However, we have now gone full circle. Let’s
see what we have obtained. We have that
√3 = 1 + 1/(1 + 1/(2 + (√3 − 1))).
However, we saw above that the pattern repeats at √3 − 1, so what we
actually have is
√3 = 1 + 1/(1 + 1/(2 + 1/(1 + . . .))).
Let’s see where we are by computing
1 + 1/(1 + 1/(2 + 1/1))
which simplifies to 7/4. You can check that this is an approximation to √3.
4.5.2
Rabbits and pentagons
We now illustrate some of the ways that algebra and geometry may interact. We begin with an artificial looking question. In his book, Liber Abaci,
Fibonacci raised the following little puzzle which I’ve taken from MacTutor:
“A certain man put a pair of rabbits in a place surrounded on
all sides by a wall. How many pairs of rabbits can be produced
from that pair in a year if it is supposed that every month each
pair begets a new pair which from the second month on becomes
productive?”
These are obviously mathematical rabbits rather than real ones so let me
spell out the rules more explicitly:
Rule 1 The problem begins with one pair of immature rabbits.²
Rule 2 Each immature pair of rabbits takes one month to mature.
Rule 3 Each mature pair of rabbits produces a new immature pair at the
end of a month.
Rule 4 The rabbits are immortal.
² Fibonacci himself seems to have assumed that the starting pair was already mature
but we shan’t.
The important point is that we must solve the problem using the rules we
have been given. To do this, I am going to draw a picture. I will represent
an immature pair of rabbits by ◦ and a mature pair by •. Rule 2 will be
represented by
◦
|
•
and Rule 3 will be represented by
  •
 / \
•   ◦
Rule 1 tells us that we start with ◦. Applying the rules we obtain the following
picture for the first 4 months.
Start:    ◦            1 pair
Month 1:  •            1 pair
Month 2:  • ◦          2 pairs
Month 3:  • ◦ •        3 pairs
Month 4:  • ◦ • • ◦    5 pairs
We start with 1 pair and at the end of the first month we still have 1 pair, at
the end of the second month 2 pairs, at the end of the third month 3 pairs,
and at the end of the fourth month 5 pairs. I shall write this $F_0 = 1$, $F_1 = 1$,
$F_2 = 2$, $F_3 = 3$, $F_4 = 5$, and so on. Thus the problem will be solved if we
can compute $F_{12}$. There is an apparent pattern in the sequence of numbers
1, 1, 2, 3, 5, . . . : after the first two terms in the sequence each number is the
sum of the previous two. Let’s check that we are not just seeing things.
Suppose that the number of immature pairs of rabbits at a given time t is
$I_t$ and the number of mature pairs is $M_t$. Then using our rules at time t + 1
we have that $M_{t+1} = M_t + I_t$ and $I_{t+1} = M_t$. Since the total number of pairs is $F_t = M_t + I_t$, we get
$$F_{t+1} = M_{t+1} + I_{t+1} = 2M_t + I_t.$$
Similarly
$$F_{t+2} = 3M_t + 2I_t.$$
It is now easy to check that
$$F_{t+2} = F_{t+1} + F_t.$$
The sequence of numbers such that $F_0 = 1$, $F_1 = 1$ and satisfying the
rule $F_{t+2} = F_{t+1} + F_t$ is called the Fibonacci sequence. We have that
$F_0 = 1$, $F_1 = 1$, $F_2 = 2$, $F_3 = 3$, $F_4 = 5$, $F_5 = 8$, $F_6 = 13$, $F_7 = 21$,
$F_8 = 34$, $F_9 = 55$, $F_{10} = 89$, $F_{11} = 144$, $F_{12} = 233$.
The solution to the original question is therefore 233 pairs of rabbits.
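The four rules can also be simulated directly, which gives an independent check on the table of values. The sketch below tracks the number of immature and mature pairs month by month; the function name `rabbit_pairs` is an assumption made for the illustration.

```python
def rabbit_pairs(months):
    """Return the total number of pairs at the start and after each month."""
    immature, mature = 1, 0              # Rule 1: one immature pair to begin with
    totals = [immature + mature]
    for _ in range(months):
        # Rule 2: every immature pair matures.
        # Rule 3: every mature pair produces one new immature pair.
        # Rule 4: nothing dies, so the mature pairs all survive.
        immature, mature = mature, mature + immature
        totals.append(immature + mature)
    return totals

print(rabbit_pairs(12))
```

The last entry of the list is the number of pairs after twelve months.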
Fibonacci numbers arise in the most diverse situations: famously, in phyllotaxis which is the study of how leaves and petals are arranged on plants.
We shall now look for a formula that will enable us to calculate Fn directly.
To begin, we’ll follow an idea due to the astronomer Johannes Kepler, and
look at the behaviour of the fractions $\frac{F_{n+1}}{F_n}$ as n gets bigger and bigger. I
have tabulated some calculations below.
F1/F0 = 1
F2/F1 = 2
F3/F2 = 1.5
F4/F3 = 1.667
F5/F4 = 1.6
F6/F5 = 1.625
F7/F6 = 1.615
F8/F7 = 1.619
F14/F13 = 1.6180
These ratios seem to be going somewhere; the question is: where? Notice
that
$$\frac{F_{n+1}}{F_n} = \frac{F_n + F_{n-1}}{F_n} = 1 + \frac{F_{n-1}}{F_n} = 1 + \cfrac{1}{\frac{F_n}{F_{n-1}}}.$$
But for very large n we suspect that $\frac{F_{n+1}}{F_n}$ and $\frac{F_n}{F_{n-1}}$ will be almost the same.
This suggests, but doesn’t prove, that we need to find the positive solution
x to
$$x = 1 + \frac{1}{x}.$$
Thus x is a number that when you take its reciprocal and add 1 you get x
back again. This problem is really a quadratic equation in disguise:
$$x^2 = x + 1 \quad\text{or more usually}\quad x^2 - x - 1 = 0.$$
This equation can be solved very simply to give us
$$x = \frac{1 \pm \sqrt{5}}{2}.$$
That is
$$\phi = \frac{1 + \sqrt{5}}{2} \quad\text{and}\quad \bar{\phi} = \frac{1 - \sqrt{5}}{2}.$$
The number φ is called the golden ratio, about which a deal of nonsense has
been written. Let’s go back and see if this calculation makes sense. First we
calculate φ and we get
$$\phi = 1.618033988\ldots$$
I compute
$$\frac{F_{19}}{F_{18}} = \frac{6765}{4181} = 1.618033963$$
on my pocket calculator. This is pretty close.
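We can now get our formula for the Fibonacci numbers. Define
$$f_n = \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right).$$
I’m going to show you that $F_n = f_n$. To do this, I’ll use the following
identities which are straightforward to check:
$$\phi - \bar{\phi} = \sqrt{5}, \quad \phi^2 = \phi + 1 \quad\text{and}\quad \bar{\phi}^2 = \bar{\phi} + 1.$$
Let’s start with $f_0$. We know that
$$\phi - \bar{\phi} = \sqrt{5}$$
and so we really do have that $f_0 = 1$. To calculate $f_1$ we use the other
formulae and again we get $f_1 = 1$. We now calculate $f_n + f_{n+1}$:
\begin{align*}
f_n + f_{n+1} &= \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right) + \frac{1}{\sqrt{5}}\left(\phi^{n+2} - \bar{\phi}^{n+2}\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1} + \phi^{n+2} - (\bar{\phi}^{n+1} + \bar{\phi}^{n+2})\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1}(1 + \phi) - \bar{\phi}^{n+1}(1 + \bar{\phi})\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+1}\phi^2 - \bar{\phi}^{n+1}\bar{\phi}^2\right) \\
&= \frac{1}{\sqrt{5}}\left(\phi^{n+3} - \bar{\phi}^{n+3}\right) = f_{n+2}.
\end{align*}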
Because $f_n$ and $F_n$ start in the same place and satisfy the same rules, we
have therefore proved that
$$F_n = \frac{1}{\sqrt{5}}\left(\phi^{n+1} - \bar{\phi}^{n+1}\right).$$
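A quick numerical check of this formula is sketched below. It compares the closed form, evaluated in floating point and rounded, against the numbers produced by the rule $F_{t+2} = F_{t+1} + F_t$. The function names are assumptions of the sketch, and because floating-point arithmetic is only approximate the comparison is made for modest values of n.

```python
import math

SQRT5 = math.sqrt(5)
PHI = (1 + SQRT5) / 2        # the golden ratio
PHI_BAR = (1 - SQRT5) / 2    # the other root of x^2 = x + 1

def fib_closed_form(n):
    """Evaluate (phi^(n+1) - phi_bar^(n+1)) / sqrt(5) in floating point."""
    return (PHI ** (n + 1) - PHI_BAR ** (n + 1)) / SQRT5

def fib_by_recurrence(n):
    """Compute F_n from F_0 = F_1 = 1 and F_(t+2) = F_(t+1) + F_t."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(13):
    print(n, fib_by_recurrence(n), round(fib_closed_form(n)))
```

The two columns agree, and the ratio `fib_by_recurrence(19) / fib_by_recurrence(18)` can be compared with `PHI` in the same way.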
At this point, we can go back and verify our original idea that the fractions
$\frac{F_{n+1}}{F_n}$ seem to get closer and closer to φ as n gets larger and larger. We have
that
$$\frac{F_{n+1}}{F_n} = \frac{\phi^{n+2} - \bar{\phi}^{n+2}}{\phi^{n+1} - \bar{\phi}^{n+1}} = \frac{\phi}{1 - \left(\frac{\bar{\phi}}{\phi}\right)^{n+1}} - \frac{\bar{\phi}}{\left(\frac{\phi}{\bar{\phi}}\right)^{n+1} - 1}.$$
I have rewritten it like this so that we can see what happens as n gets larger
and larger. Observe that the absolute value of $\frac{\bar{\phi}}{\phi}$ is less than 1. So as n gets
larger and larger the first term above gets closer and closer to φ. Now look
at the second term. The absolute value of the fraction $\frac{\phi}{\bar{\phi}}$ is strictly greater
than 1. Thus as n gets larger and larger the absolute value of the denominator of the second term
gets larger and larger and so the fraction as a whole gets smaller and smaller.
Thus we have proved that $\frac{F_{n+1}}{F_n}$ really is close to φ when n is large.
So far, what we have been doing is algebra. I shall now show that there
is geometry here as well. Below is a picture of a regular pentagon. I have
assumed that the length of the sides is 1. I claim that the length of a diagonal,
such as BE, is equal to φ.
[Figure: a regular pentagon with vertices A, B, C, D, E, a side marked 1 and a diagonal marked φ.]
To prove this I am going to use Ptolemy’s theorem. We shall concentrate on
the cyclic quadrilateral formed by the vertices ABDE.
[Figure: the regular pentagon ABCDE with the cyclic quadrilateral ABDE marked.]
I’ll let the length of a diagonal be x. Then by Ptolemy’s theorem, we have that
$$x^2 = 1 + x.$$
But this is precisely the quadratic equation we solved above. Its positive
solution is φ and so the length of a diagonal of a regular pentagon with side
1 is φ.
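For readers who want the step spelled out, here is a sketch of how Ptolemy’s theorem yields this equation. It assumes the vertices are labelled in cyclic order A, B, C, D, E, as the figure suggests, so that AB, DE and EA are sides of length 1 while BD, AD and BE are diagonals of length x. Ptolemy’s theorem says that in a cyclic quadrilateral the product of the two diagonals equals the sum of the products of the two pairs of opposite sides. Applied to ABDE this reads:

```latex
\begin{align*}
AD \cdot BE &= AB \cdot DE + BD \cdot EA \\
x \cdot x &= 1 \cdot 1 + x \cdot 1 \\
x^2 &= 1 + x.
\end{align*}
```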
This raises the question of whether we can somehow see the Fibonacci
numbers in the regular pentagon. The answer is: almost. Consider the
diagram below.
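[Figure: the regular pentagon ABCDE with all diagonals drawn; the inner intersection points are labelled a′, b′, c′, d′, e′, and the triangles BCD and Ac′E are shaded.]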
I’ve drawn in all the diagonals. The shaded triangle BCD is similar to the
shaded triangle Ac′E. This means that they have exactly the same shape,
just different sizes. It follows that
$$\frac{Ac'}{AE} = \frac{BC}{BD}.$$
But AE is a side of the pentagon and so has unit length, and BD is of length
φ. Thus
$$Ac' = \frac{1}{\phi}.$$
Now, Dc′ has the same length as BC which is a side of the pentagon. Thus
Dc′ = 1. We now have
$$\phi = DA = Dc' + c'A = 1 + \frac{1}{\phi}.$$
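Thus, just from geometry, we get
$$\phi = 1 + \frac{1}{\phi}.$$
This is a very odd equation because φ is mentioned on both sides. Let’s go
with it and repeat:
$$\phi = 1 + \cfrac{1}{1 + \cfrac{1}{\phi}}$$
and
$$\phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\phi}}}$$
and
$$\phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\phi}}}}$$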
and so on. We therefore obtain a continued fraction. For each of these
fractions cover up the term $\frac{1}{\phi}$ and then calculate what you see to get
$$1, \quad 1 + \cfrac{1}{1} = 2, \quad 1 + \cfrac{1}{1 + \cfrac{1}{1}} = \frac{3}{2}, \quad 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1}}} = \frac{5}{3}, \ldots$$
and the Fibonacci sequence reappears.
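The covering-up calculation can be repeated as often as you like with a short loop. The sketch below is an illustration only (the name `golden_convergents` is an assumption): each new value is obtained from the previous one by taking the reciprocal and adding 1, which is exactly the rule $x \mapsto 1 + \frac{1}{x}$ met above.

```python
from fractions import Fraction

def golden_convergents(count):
    """Return the first `count` truncations of 1 + 1/(1 + 1/(1 + ...)) as exact fractions."""
    values = []
    x = Fraction(1)
    for _ in range(count):
        values.append(x)
        x = 1 + 1 / x                    # add one more level to the continued fraction
    return values

for value in golden_convergents(8):
    print(value, float(value))
```

Numerators and denominators are consecutive Fibonacci numbers, and the decimal values settle down near the golden ratio.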
Chapter 5
Complex numbers
Why be one-dimensional when you can be two-dimensional?
[Figure: the real number line marked −3, −2, −1, 0, 1, 2, 3, with a question mark above the line and another below it.]
We begin by returning to the familiar number line. Where I have placed
the question marks there appear to be no numbers. I shall rectify this by
defining the complex numbers which give us a number plane rather than just a
number line. Complex numbers play a fundamental rôle in mathematics. For
example, in this chapter, I shall use them to show how e and π — numbers
of radically different origins — are in fact connected.
5.1
Complex number arithmetic
In the set of real numbers we can add, subtract, multiply and divide, but
we cannot always extract square roots. For example, the real number 1 has
the two real square roots 1 and −1, whereas the real number −1 has no real
square roots, the reason being that the square of any real non-zero number is
always positive. In this section, we shall repair this lack of square roots and,
as we shall learn, we shall in fact have achieved much more than this. Complex numbers were first studied in the 1500’s but were only fully accepted
and used in the 1800’s.
Warning! If r is a positive real number then $\sqrt{r}$ is usually interpreted to
mean the positive square root. If I want to emphasize that both square roots
need to be considered I shall write $\pm\sqrt{r}$.
When the discriminant of a quadratic equation is strictly less than zero,
we know that it has no real roots. In this section, we shall show that in
this case the equation has two complex roots. This will mean that quadratic
equations will always have two roots. The key step is the following.
We introduce a new number, denoted by i, whose defining property
is that $i^2 = -1$. We shall assume that in all other respects it
satisfies the usual axioms of high-school algebra. This assumption
will be justified later.
We shall now explore the consequences of this definition, which turns out
to be a profound one for mathematics. The numbers i and −i are the two
‘missing’ square roots of −1. In all other respects, the number i will behave
like a real number. Thus if b is any real number then bi is a number, and if
a is any real number then a + bi is a number. We therefore formally define a
complex number to be a number of the form a + bi where a, b ∈ R. We denote
the set of complex numbers by C. Complex numbers are sometimes called
imaginary numbers. This is not such a good term: they are not figments of
our imagination like unicorns or dragons. Like all numbers they are, however,
products of our imagination: no one has seen the complex number i
but, then again, no one has seen the number 2. If z = a + bi then we call a
the real part of z, denoted Re(z), and b the complex or imaginary part of z,
denoted Im(z).
Two complex numbers a + bi and c + di are equal precisely when
a = c and b = d. In other words, when their real parts are equal
and when their complex parts are equal.
We can think of every real number as being a special kind of complex
number because if a is real then a = a + 0i. Thus R ⊆ C. Complex numbers
of the form bi are said to be purely imaginary. Now we show that we can
add, subtract, multiply and divide complex numbers. Addition, subtraction
and multiplication are all easy. Let a + bi, c + di ∈ C. To add these numbers
means to calculate (a + bi) + (c + di). We assume that the order in which
we add complex numbers doesn’t matter and that we may bracket sums of
complex numbers how we like and still get the same answer and so we can
rewrite this as a + c + bi + di. Next we assume that multiplication of complex
numbers distributes over addition of complex numbers to get (a+c)+(b+d)i.
Thus
(a + bi) + (c + di) = (a + c) + (b + d)i.
The definition of subtraction is similar and justified in the same way
(a + bi) − (c + di) = (a − c) + (b − d)i.
To multiply our numbers means to calculate (a + bi)(c + di). We first assume
complex multiplication distributes over complex addition to get
(a + bi)(c + di) = ac + adi + bic + bidi. Next we assume that the order in which we
multiply complex numbers doesn’t matter to get
$ac + adi + bic + bidi = ac + adi + bci + bdi^2$. Now we use the fact that $i^2 = -1$ to get
$ac + adi + bci + bdi^2 = ac + adi + bci - bd$. We now rearrange the terms to get the following definition
of multiplication
$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i.$$
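The definitions of addition and multiplication translate directly into code. In the sketch below a complex number a + bi is represented by the pair `(a, b)`; this representation and the function names `add` and `multiply` are assumptions made for the illustration. Python also has a built-in complex type, written with `j` in place of i, which is used here only as a cross-check.

```python
def add(z, w):
    """(a + bi) + (c + di) = (a + c) + (b + d)i."""
    a, b = z
    c, d = w
    return (a + c, b + d)

def multiply(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

print(add((7, -1), (-6, 3)))         # (7 - i) + (-6 + 3i)
print(multiply((2, 1), (1, 2)))      # (2 + i)(1 + 2i)
print((2 + 1j) * (1 + 2j))           # the same product with the built-in type
```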
Examples 5.1.1. Carry out the following calculations.
1. (7 − i) + (−6 + 3i). We add together the real parts to get 1; adding
together −i and 3i we get 2i. Thus the solution is 1 + 2i.
2. (2 + i)(1 + 2i). First we multiply out the brackets as usual to get
$2 + 4i + i + 2i^2$. We now use the fact that $i^2 = -1$ to get 2 + 4i + i − 2.
Finally we simplify to get 0 + 5i = 5i.
3. $\left(\frac{1 - i}{\sqrt{2}}\right)^2$. Multiply out and simplify to get −i.
The final operation is division. We have to show that when a + ib ≠ 0
the reciprocal
$$\frac{1}{a + ib}$$
is also a complex number. We use an idea that can also be applied in other
situations called rationalizing the denominator. It is convenient first to define
a new operation on complex numbers. Let z = a + bi ∈ C. Define
z̄ = a − bi.
The number z̄ is called the complex conjugate of z. Why is this operation
useful? Let’s calculate z z̄. We have
$$z\bar{z} = (a + bi)(a - bi) = a^2 - abi + abi - b^2 i^2 = a^2 + b^2.$$
Notice that z z̄ = 0 if and only if z = 0. Thus for non-zero complex numbers
z, the number z z̄ is a positive real number. Let’s see how we can use the
complex conjugate to define division of complex numbers. Our goal is to
calculate
1
a + bi
where a + bi ≠ 0. The first step is to multiply top and bottom by the complex
conjugate of a + bi. We therefore get
$$\frac{a - bi}{(a + bi)(a - bi)} = \frac{a - bi}{a^2 + b^2} = \frac{1}{a^2 + b^2}(a - bi).$$
Examples 5.1.2. Carry out the following calculations.
1. $\frac{1+i}{i}$. The complex conjugate of i is −i. Multiply top and bottom of the
fraction by −i to get $\frac{-i+1}{1} = 1 - i$.
2. $\frac{i}{1-i}$. The complex conjugate of 1 − i is 1 + i. Multiply top and bottom
of the fraction by 1 + i to get $\frac{i(1+i)}{2} = \frac{i-1}{2}$.
3. $\frac{4+3i}{7-i}$. The complex conjugate of 7 − i is 7 + i. Multiply top and bottom
of the fraction by 7 + i to get $\frac{(4+3i)(7+i)}{50} = \frac{1+i}{2}$.
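The same recipe, multiply top and bottom by the conjugate of the denominator, can be sketched in code. As an assumption of the sketch, a complex number a + bi is again represented by a pair `(a, b)`, and exact fractions are used so that the answers match the hand calculations above.

```python
from fractions import Fraction

def divide(w, z):
    """Return w / z as a pair (real part, imaginary part); z must be non-zero."""
    a, b = z
    c, d = w
    norm = Fraction(a * a + b * b)       # z times its conjugate: a^2 + b^2
    if norm == 0:
        raise ZeroDivisionError("cannot divide by the complex number 0")
    # w times the conjugate of z: (c + di)(a - bi) = (ca + db) + (da - cb)i
    return ((c * a + d * b) / norm, (d * a - c * b) / norm)

print(divide((1, 1), (0, 1)))    # (1 + i) / i
print(divide((0, 1), (1, -1)))   # i / (1 - i)
print(divide((4, 3), (7, -1)))   # (4 + 3i) / (7 - i)
```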
We shall need the following properties of the complex conjugate later on.
Lemma 5.1.3.
1. $\overline{z_1 + \ldots + z_n} = \overline{z_1} + \ldots + \overline{z_n}$.
2. $\overline{z_1 \ldots z_n} = \overline{z_1} \ldots \overline{z_n}$.
3. z is real if and only if $\bar{z} = z$.
Proof. (1) We prove the case where n = 2. The general case can then be
proved using induction. Let $z_1 = a + ib$ and $z_2 = c + id$. Then
$z_1 + z_2 = (a + c) + i(b + d)$. Thus $\overline{z_1 + z_2} = (a + c) - i(b + d)$. But $\overline{z_1} = a - ib$ and
$\overline{z_2} = c - id$ and so $\overline{z_1} + \overline{z_2} = (a - ib) + (c - id) = (a + c) - (b + d)i$. Thus
$\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$.
(2) We prove the case where n = 2. The general case can then be proved
using induction. Using the notation from part (1), we have that
$z_1 z_2 = (ac - bd) + (ad + bc)i$. Thus $\overline{z_1 z_2} = (ac - bd) - (ad + bc)i$. On the other hand,
$\overline{z_1}\,\overline{z_2} = (a - ib)(c - id) = (ac - bd) - i(ad + bc)$, as required.
(3) If z is real then it is immediate that $\bar{z} = z$. Suppose that $\bar{z} = z$ where
z = a + ib. Then a − ib = a + ib. Hence b = −b and so b = 0. It follows that
z is real.
We now introduce a way of thinking about complex numbers that enables
us to visualize them. A complex number z = a + bi has two components: a
and b. It is irresistible to plot these as a point in the plane. The plane used
in this way is called the complex plane: the x-axis is the real axis and the
y-axis is interpreted as the complex axis.
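[Figure: the complex plane, with the point z = a + ib plotted at horizontal coordinate a and vertical coordinate ib.]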
Although a complex number can be thought of as labelling a point in the
complex plane, it can also be regarded as labelling the directed line segment
from the origin to the point, and this turns out to be the more fruitful
viewpoint. By Pythagoras’ theorem, the length of this line is $\sqrt{a^2 + b^2}$. We
define
$$|z| = \sqrt{a^2 + b^2}$$
where z = a + bi. This is called the modulus¹ of the complex number z.
Observe that
$$|z| = \sqrt{z\bar{z}}.$$
We shall use the following important property of moduli.
Lemma 5.1.4. |wz| = |w| |z|.
Proof. Let w = a + bi and z = c + di. Then wz = (ac − bd) + (ad + bc)i. Now
$|wz| = \sqrt{(ac - bd)^2 + (ad + bc)^2}$ whereas $|w|\,|z| = \sqrt{(a^2 + b^2)(c^2 + d^2)}$. But
$$(ac - bd)^2 + (ad + bc)^2 = (ac)^2 + (bd)^2 + (ad)^2 + (bc)^2 = (a^2 + b^2)(c^2 + d^2).$$
Thus the result follows.
The complex numbers were obtained from the reals by simply adjoining
one new number, i, a square root of −1. Remarkably, every complex number
has a square root — there is no need to invent any new numbers.
Theorem 5.1.5. Every nonzero complex number has exactly two square
roots.
Proof. Let z = a + bi be a nonzero complex number. We want to find a
complex number w so that $w^2 = z$. Let w = x + yi. Then we need to find
real numbers x and y such that $(x + yi)^2 = a + bi$. Thus $(x^2 - y^2) + 2xyi = a + bi$,
and so equating real and imaginary parts, we have to solve the following two
equations
$$x^2 - y^2 = a \quad\text{and}\quad 2xy = b.$$
Now we actually have enough information to solve our problem, but we can
make life easier for ourselves by adding one extra equation. To get it, we use
the modulus function. From $(x + yi)^2 = a + bi$ we get that $|x + yi|^2 = |a + bi|$.
Now $|x + yi|^2 = x^2 + y^2$ and $|a + bi| = \sqrt{a^2 + b^2}$. We therefore have three
equations
$$x^2 - y^2 = a \quad\text{and}\quad 2xy = b \quad\text{and}\quad x^2 + y^2 = \sqrt{a^2 + b^2}.$$
If we add the first and third equation together we get
$$x^2 = \frac{a}{2} + \frac{\sqrt{a^2 + b^2}}{2} = \frac{a + \sqrt{a^2 + b^2}}{2}.$$
We can now solve for x and therefore for y.
1
Plural: moduli
Example 5.1.6. Every negative real number has two square roots. We have
that the square roots of −r, where r > 0, are $\pm i\sqrt{r}$.
Example 5.1.7. Find both square roots of 3 + 4i and check your answers.
We assume that there is a complex number x + yi where both x and y are
real such that
$$(x + yi)^2 = 3 + 4i.$$
Squaring and comparing real and imaginary parts we get that the following
two equations must be satisfied by x and y
$$x^2 - y^2 = 3 \quad\text{and}\quad 2xy = 4.$$
We also have a third equation by taking moduli
$$x^2 + y^2 = 5.$$
Adding the first and third equation together we get $2x^2 = 8$ and so x = ±2. Thus y = 1 if
x = 2 and y = −1 if x = −2. The roots we want are therefore 2 + i and
−2 − i. Of course, one root will be minus the other. Now square either root
to check your answer: $(2 + i)^2 = 4 + 4i - 1 = 3 + 4i$, as required.
Remark Notice that the two square roots of a non-zero complex number will
have the form w and −w; in other words, one root will be −1 times the other.
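The method of Theorem 5.1.5 can be turned into a short routine. The sketch below is an illustration under stated assumptions: the function name `complex_square_roots` is invented for the purpose, the input a + bi is passed as two real numbers, and floating-point arithmetic is used, so answers are in general only approximate. The special case x = 0 (which happens exactly when b = 0 and a < 0) is handled separately because y cannot then be found from 2xy = b.

```python
import math

def complex_square_roots(a, b):
    """Return both square roots of the non-zero number a + bi as pairs (x, y)."""
    modulus = math.sqrt(a * a + b * b)       # |a + bi|
    x = math.sqrt((a + modulus) / 2)         # from adding x^2 - y^2 = a and x^2 + y^2 = |a + bi|
    if x == 0:
        y = math.sqrt(-a)                    # a negative real number: roots are +/- i sqrt(-a)
    else:
        y = b / (2 * x)                      # from 2xy = b
    return (x, y), (-x, -y)

print(complex_square_roots(3, 4))    # the roots of 3 + 4i from Example 5.1.7
print(complex_square_roots(-4, 0))   # a negative real number, as in Example 5.1.6
```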
If we combine our method for solving quadratics with our method for
determining the square roots of complex numbers, we have a method for
finding the roots of quadratics with any coefficients, whether they be real or
complex.
Example 5.1.8. Solve the quadratic equation
$$4z^2 + 4iz + (-13 - 16i) = 0.$$
The complex numbers obey the same algebraic laws as the reals and so we
can solve this equation by completing the square or we can simply plug the
numbers into the formula for the roots of a quadratic. Here I shall complete
the square. First, we convert the equation into a monic one
$$z^2 + iz + \frac{(-13 - 16i)}{4} = 0.$$
Next, we observe that
$$\left(z + \frac{i}{2}\right)^2 = z^2 + iz - \frac{1}{4}.$$
Thus
$$z^2 + iz = \left(z + \frac{i}{2}\right)^2 + \frac{1}{4}.$$
Our equation therefore becomes
$$\left(z + \frac{i}{2}\right)^2 + \frac{1}{4} - \frac{13}{4} - 4i = 0.$$
We therefore have
$$\left(z + \frac{i}{2}\right)^2 = 3 + 4i.$$
Taking square roots of both sides using a previous calculation, we have that
$$z + \frac{i}{2} = 2 + i \quad\text{or}\quad -2 - i.$$
It follows that $z = 2 + \frac{i}{2}$ or $-2 - \frac{3i}{2}$. Now check that these roots really do
work.
Every quadratic equation ALWAYS has exactly two roots.
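As a cross-check on hand calculations like the one in Example 5.1.8, the usual quadratic formula can be evaluated with Python's built-in complex numbers. This is a sketch only: `solve_quadratic` is a name chosen for the illustration, `cmath.sqrt` returns one of the two square roots (the other is its negative), and the results are floating-point approximations.

```python
import cmath

def solve_quadratic(a, b, c):
    """Return the two roots of a z^2 + b z + c = 0, where a, b, c may be complex and a != 0."""
    if a == 0:
        raise ValueError("not a quadratic: the leading coefficient is zero")
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + root_of_discriminant) / (2 * a),
            (-b - root_of_discriminant) / (2 * a))

# Example 5.1.8: 4z^2 + 4iz + (-13 - 16i) = 0.
for root in solve_quadratic(4, 4j, -13 - 16j):
    print(root, abs(4 * root ** 2 + 4j * root + (-13 - 16j)))
```

The second number printed on each line is the size of the left-hand side at the computed root, which should be zero up to rounding error.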
Exercises 5.1
1. Solve the following problems in complex number arithmetic. In each
case, the answer should be in the form a + ib where a and b are real.
(a) (2 + 3i) + (4 + i).
(b) (2 + 3i)(4 + i).
(c) $(8 + 6i)^2$.
(d) $\frac{2+3i}{4+i}$.
(e) $\frac{1}{i} + \frac{3}{1+i}$.
(f) $\frac{3+4i}{3-4i} - \frac{3-4i}{4+4i}$.
2. Find the square roots of each of the following complex numbers and
check your answers.
(a) −i.
(b) $-1 + i\sqrt{24}$.
(c) −13 − 84i.
3. Solve the following quadratic equations and check your answers.
(a) $x^2 + x + 1 = 0$.
(b) $2x^2 - 3x + 2 = 0$.
(c) $x^2 - (2 + 3i)x - 1 + 3i = 0$.
5.2
The fundamental theorem of algebra
We have proved that every quadratic equation has exactly two roots. The
goal of this section is to generalize this result: I shall prove that every polynomial equation of degree n has exactly n roots. This result plays a key role
in calculus where it is used (in its real version which I also describe) to prove
that any rational function can be integrated using partial fractions. In this
section, we shall work with arbitrary polynomials so I shall now recall some
of the terminology needed to handle them. An expression
$$a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0$$
where the $a_i$ are complex numbers, called the coefficients, is called a polynomial.
If all the coefficients are zero then the polynomial is identically zero and we
shall call it the zero polynomial. Otherwise we assume $a_n \neq 0$. The degree of this
polynomial is n. We abbreviate this to deg. If $a_n = 1$ the polynomial is
said to be monic. The term $a_0$ is called the constant term and the term
$a_n x^n$ is called the leading term. Polynomials can be added, subtracted and
multiplied. Two polynomials are equal if they have the same degree and the
coefficients of terms of the same degree are equal.
• Polynomials of degree 1 are said to be linear.
• those of degree 2, quadratic.
• those of degree 3, cubic.
• those of degree 4, quartic.
• those of degree 5, quintic.
There are special terms for polynomials of degree higher than 5, if you want
them. Why are polynomials interesting? There are two answers to this question. First, they have widespread applications such as in helping to solve
linear differential equations and in studying matrices. Second, a polynomial
defines a function which is calculated in a very simple way using the operations of addition, subtraction and multiplication. However many, more
complicated, functions can be usefully approximated by polynomial ones.
We denote by C[x] the set of polynomials with complex coefficients and by
R[x], the set of polynomials with real coefficients. I will write F[x] to mean
either R[x] or C[x]; that is, F = R or F = C.
5.2.1
The remainder theorem
The addition, subtraction and multiplication of polynomials is easy. We shall
therefore concentrate in this section on division. Let f (x), g(x) ∈ F [x]. We
say that g(x) divides f (x), denoted by
g(x) | f (x),
if there is a polynomial q(x) ∈ F [x] such that f (x) = g(x)q(x). We say that
g(x) is a factor of f (x). There are obvious similarities here with our work in
Chapter 4.
Example 5.2.1. Let $f(x) = x^4 + 2x + 1$ and g(x) = x + 1. Then
$$x + 1 \mid x^4 + 2x + 1$$
since $x^4 + 2x + 1 = (x + 1)(x^3 - x^2 + x + 1)$.
In multiplying and dividing polynomials the following result is key.
Lemma 5.2.2. Let f (x), g(x) ∈ F [x] be non-zero polynomials. Then
deg f (x)g(x) = deg f (x) + deg g(x).
Proof. Let f(x) have leading term $a_m x^m$ and let g(x) have leading term $b_n x^n$.
Then the leading term of f(x)g(x) is $a_m b_n x^{m+n}$. Now $a_m b_n \neq 0$ and so the
degree of f(x)g(x) is m + n, as required.
The following result is analogous to the remainder theorem for integers,
Lemma 4.1.1. I shall not prove it here.
Lemma 5.2.3 (Remainder theorem). Let f (x) and g(x) be polynomials in
F [x] where deg f (x) ≥ deg g(x). Then either
g(x) | f (x)
or
f (x) = g(x)q(x) + r(x)
where deg r(x) < deg g(x).
Example 5.2.4. Let $f(x) = x^3 + x + 3$ and $g(x) = x^2 + x$. Then
$x^3 + x + 3 = (x - 1)(x^2 + x) + (2x + 3)$. Here x − 1 is the quotient and 2x + 3 is the
remainder.
The following example is a reminder of how to carry out long division of
polynomials. Remember that answers can always be checked by multiplying
out.
Example 5.2.5. Divide $6x^4 + 5x^3 + 4x^2 + 3x + 2$ by $2x^2 + 4x + 5$ and so find
the quotient and remainder. We set out the computation in the following
form.
    2x^2 + 4x + 5 ) 6x^4 +  5x^3 +  4x^2 + 3x + 2
To get the term involving $6x^4$ we would have to multiply the lefthand side
by $3x^2$. As a result we write down the following
                    3x^2
    2x^2 + 4x + 5 ) 6x^4 +  5x^3 +  4x^2 + 3x + 2
                    6x^4 + 12x^3 + 15x^2
We now subtract the lower righthand side from the upper and we get
                    3x^2
    2x^2 + 4x + 5 ) 6x^4 +  5x^3 +  4x^2 + 3x + 2
                    6x^4 + 12x^3 + 15x^2
                          - 7x^3 - 11x^2 + 3x + 2
The procedure is now repeated with the new polynomial.
                    3x^2 - (7/2)x
    2x^2 + 4x + 5 ) 6x^4 +  5x^3 +  4x^2 + 3x + 2
                    6x^4 + 12x^3 + 15x^2
                          - 7x^3 - 11x^2 + 3x + 2
                          - 7x^3 - 14x^2 - (35/2)x
                                    3x^2 + (41/2)x + 2
The procedure is repeated one more time with the new polynomial
                    3x^2 - (7/2)x + 3/2          (quotient)
    2x^2 + 4x + 5 ) 6x^4 +  5x^3 +  4x^2 + 3x + 2
                    6x^4 + 12x^3 + 15x^2
                          - 7x^3 - 11x^2 + 3x + 2
                          - 7x^3 - 14x^2 - (35/2)x
                                    3x^2 + (41/2)x + 2
                                    3x^2 + (12/2)x + 15/2
                                           (29/2)x - 11/2   (remainder)
This is the end of the line because the new polynomial we obtain has degree
strictly less than the polynomial we are dividing by. What we have shown is
that
$$6x^4 + 5x^3 + 4x^2 + 3x + 2 = (2x^2 + 4x + 5)\left(3x^2 - \frac{7}{2}x + \frac{3}{2}\right) + \frac{29}{2}x - \frac{11}{2}.$$
You can verify this is true by multiplying out the righthand side.
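The long-division procedure is mechanical enough to be written as a short program. In the sketch below, which is an illustration and not part of the text's development, a polynomial is represented by the list of its coefficients from the highest degree down, so that $2x^2 + 4x + 5$ becomes `[2, 4, 5]`. The function name `poly_divmod` and this representation are assumptions; exact fractions are used so that coefficients such as 7/2 are not rounded.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g. Returns (quotient, remainder) as coefficient lists, highest degree first."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    if not g or g[0] == 0:
        raise ValueError("the divisor must have a non-zero leading coefficient")
    quotient = []
    remainder = f[:]
    while len(remainder) >= len(g):
        factor = remainder[0] / g[0]         # what the divisor must be multiplied by
        quotient.append(factor)
        for i, c in enumerate(g):            # subtract factor * g, aligned at the leading term
            remainder[i] -= factor * c
        remainder.pop(0)                     # the leading term has been cancelled
    return quotient, remainder

q, r = poly_divmod([6, 5, 4, 3, 2], [2, 4, 5])
print([str(c) for c in q])   # coefficients of the quotient
print([str(c) for c in r])   # coefficients of the remainder
```

The remainder list always has one entry fewer than the divisor, and some of its leading entries may be zero.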
5.2.2
Roots of polynomials
Let f (x) ∈ F [x]. A number r ∈ F is said to be a root or zero of f (x) if
f (r) = 0. The roots of f (x) are the solutions of the equation f (x) = 0.
Example 5.2.6. The number 1 is a root of $x^{100} - 2x^{98} + 1$ because 1 − 2 + 1 = 0.
Checking whether a number is a root is easy, but finding a root in the
first place is trickier. The next result tells us that when we find roots of polynomials we are in fact determining linear factors. It is crucial to everything
we shall do.
Proposition 5.2.7. Let r ∈ F . Then r is a root of f (x) ∈ F [x] if and only
if (x − r) | f (x).
Proof. Suppose that (x − r) | f (x). Then by definition f (x) = (x − r)q(x)
for some polynomial q(x). If we now calculate f (r) we see immediately that
it must be zero.
We now prove the converse. Suppose that r is a root of f (x). By the
remainder theorem, either (x − r) | f (x) or f (x) = q(x)(x − r) + r(x) where
deg(r(x)) < deg(x − r) = 1. If the former then we are done. If the latter
then it follows that r(x) is in fact a constant (that is, just a number). Call
this number a. If we calculate f (r) we get a. It follows that in fact a = 0
and so (x − r) | f (x).
Example 5.2.8. We have seen that the number 1 is a root of $x^{100} - 2x^{98} + 1$.
Thus by the above result $(x - 1) \mid x^{100} - 2x^{98} + 1$.
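Proposition 5.2.7 can be watched in action with a small sketch. Dividing by x − r is a special case of long division that is often organised as follows (commonly called synthetic division, a name not used in the text): run through the coefficients from the top, multiplying the running value by r and adding the next coefficient. The last value is the remainder, which equals f(r); the earlier values are the coefficients of the quotient. The function name and the coefficient-list representation are assumptions of the sketch.

```python
def divide_by_linear(coeffs, r):
    """Divide the polynomial with the given coefficients (highest degree first) by x - r.

    Returns (quotient coefficients, remainder); the remainder equals f(r).
    """
    values = []
    acc = 0
    for c in coeffs:
        acc = acc * r + c
        values.append(acc)
    return values[:-1], values[-1]

# x^2 - 5x + 6 divided by x - 2.
print(divide_by_linear([1, -5, 6], 2))

# x^100 - 2x^98 + 1 divided by x - 1: only the remainder is printed.
big = [1, 0, -2] + [0] * 97 + [1]
print(divide_by_linear(big, 1)[1])
```

A remainder of 0 confirms both that r is a root and that x − r is a factor.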
A root r of a polynomial f(x) is said to have multiplicity m if
$$(x - r)^m \mid f(x)$$
but $(x - r)^{m+1}$ does not divide f(x). A root is always counted according to
its multiplicity.
Example 5.2.9. The polynomial $x^2 + 2x + 1$ has −1 as a root and no other
roots. However $(x + 1)^2 = x^2 + 2x + 1$ and so the root −1 occurs with
multiplicity 2. Thus the polynomial has two roots counting multiplicities.
This is the sense in which we can say that a quadratic equation always has
two roots.
The following result is extremely useful. It provides an upper bound to
the number of roots a polynomial may have.
Theorem 5.2.10. A non-constant polynomial of degree n has at most n
roots.
Proof. Let f (x) be a non-zero polynomial of degree n > 0. Suppose that
f (x) has a root a. Then f (x) = (x − a)f1 (x) by Proposition 5.2.7 and the
degree of f1 (x) is n − 1. This argument can be repeated and we reach the
desired conclusion.
5.2.3
The fundamental theorem of algebra
The big question I have so far not dealt with is whether a polynomial need
have a root at all. This is answered by the following theorem whose name
reflects its importance when first discovered, though not its significance in
modern algebra. We shall not give a proof because that would require more
advanced methods than are covered in this book. It was first proved by
Gauss.
Theorem 5.2.11 (Fundamental theorem of algebra (FTA)). Every nonconstant polynomial of degree n with complex coefficients has a root.
The fundamental theorem of algebra has the following important consequence using Theorem 5.2.10.
Corollary 5.2.12. Every non-constant polynomial with complex coefficients
of degree n has exactly n complex roots (counting multiplicities). Thus every
such polynomial can be written as a product of linear polynomials.
Proof. Let f (x) be a non-constant polynomial of degree n. By the FTA, this
polynomial has a root r1 . Thus f (x) = (x − r1 )f1 (x) where f1 (x) is a polynomial of degree n − 1. This argument can be repeated and we eventually end
up with f (x) = a(x − r1 ) . . . (x − rn ) where a is the last quotient, necessarily
a complex number.
Example 5.2.13. It can be checked that the quartic $x^4 - 5x^2 - 10x - 6$ has
roots −1, 3, i − 1 and −1 − i. We can therefore write
$$x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i).$$
In many practical examples, our polynomials will have real coefficients
and we will want any factors of the polynomial to likewise be real. The result
above doesn’t do that because it could produce complex factors. However,
we can rectify this situation at a very small price. We shall use the notion of
the complex conjugate of a complex number that we introduced earlier. We
may now prove the following key lemma.
Lemma 5.2.14. Let f(x) be a polynomial with real coefficients. If the complex number z is a root then so too is $\bar{z}$.
Proof. Let
$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0$$
where the $a_i$ are real numbers. Let z be a complex root. Then
$$0 = a_n z^n + a_{n-1} z^{n-1} + \ldots + a_1 z + a_0.$$
Take the complex conjugate of both sides and use the properties of the complex
conjugate (Lemma 5.1.3), together with the fact that each $a_i$ is real, to get
$$0 = a_n \bar{z}^n + a_{n-1} \bar{z}^{n-1} + \ldots + a_1 \bar{z} + a_0$$
and so $\bar{z}$ is also a root.
Example 5.2.15. We saw above that
$$x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i).$$
Observe that the complex roots −1 − i and −1 + i are complex conjugates
of each other.
Lemma 5.2.16. Let z be a complex number which is not real. Then
(x − z)(x − z̄)
is an irreducible quadratic with real coefficients.
On the other hand, if $x^2 + bx + c$ is an irreducible quadratic with real
coefficients then its roots are complex conjugates of each other.
Proof. To prove the first claim, we multiply out to get
$$(x - z)(x - \bar{z}) = x^2 - (z + \bar{z})x + z\bar{z}.$$
Observe that $z + \bar{z}$ and $z\bar{z}$ are both real numbers. The discriminant of this
polynomial is $(z + \bar{z})^2 - 4z\bar{z} = (z - \bar{z})^2$. You can check that if z is complex and non-real then
$z - \bar{z}$ is purely imaginary. It follows that its square is negative. We have
therefore shown that our quadratic is irreducible.
The proof of the second claim follows from the formula for the roots of a
quadratic combined with the fact that the square root of a negative real will
have the form ±αi where α is real.
Example 5.2.17. We saw above that
$$x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x + 1 + i)(x + 1 - i).$$
Multiply out (x + 1 + i)(x + 1 − i) and we get $x^2 + 2x + 2$. Thus
$$x^4 - 5x^2 - 10x - 6 = (x + 1)(x - 3)(x^2 + 2x + 2)$$
with all the polynomials involved being real.
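Multiplying out factors by hand is error-prone, so here is a small sketch that does it mechanically. As an assumption of the sketch, polynomials are coefficient lists with the highest degree first, and Python's built-in complex numbers (written with `j` for i) stand in for the complex coefficients.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest degree first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b       # x^(deg p - i) * x^(deg q - j) lands at index i + j
    return result

# The pair of complex conjugate factors (x + 1 + i)(x + 1 - i).
quadratic = poly_mul([1, 1 + 1j], [1, 1 - 1j])
print(quadratic)

# The full product (x + 1)(x - 3)(x^2 + 2x + 2).
print(poly_mul(poly_mul([1, 1], [1, -3]), [1, 2, 2]))
```

The imaginary parts cancel in the first product, which is why the conjugate pair contributes a quadratic with real coefficients.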
The following theorem is the one that we can use to help us solve problems
involving real polynomials.
Theorem 5.2.18 (Fundamental theorem of algebra for real polynomials).
Every non-constant polynomial with real coefficients can be written as a product of polynomials with real coefficients which are either linear or irreducible
quadratic.
Proof. We can write the polynomial as a product of linear polynomials. Bring
the real linear factors to the front. The remaining linear polynomials will
have complex coefficients. They correspond to roots that come in complex
conjugate pairs. Multiplying together those complex linear factors corresponding to complex conjugate roots we get real quadratics and the result is
proved.
In fact, we can write any real polynomial as a real number times a product
of monic linear and quadratic factors. This result is the basis of the method
of partial fractions used in integrating rational functions in calculus. This is
discussed in Chapter 6.
Finding the exact roots of a polynomial is difficult, in general. However,
the following result tells us how to find the rational roots of polynomials with
integer coefficients. It is a nice, and perhaps unexpected, application of the
number theory we developed in Chapter 4.
Theorem 5.2.19 (Rational root theorem). Let
$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0$$
be a polynomial with integer coefficients. If $\frac{r}{s}$ is a root with r and s coprime
then $r \mid a_0$ and $s \mid a_n$. In particular, if the polynomial is monic then any
rational roots must be integers and divide the constant term.
Proof. Substituting $\frac{r}{s}$ into $f(x)$ we have, by assumption, that
$$0 = a_n \left(\frac{r}{s}\right)^n + a_{n-1} \left(\frac{r}{s}\right)^{n-1} + \ldots + a_1 \left(\frac{r}{s}\right) + a_0.$$
Multiply through by $s^n$ to get
$$0 = a_n r^n + a_{n-1} s r^{n-1} + \ldots + a_1 s^{n-1} r + a_0 s^n.$$
We now make two observations. First, $r \mid a_0 s^n$, since $r$ divides every other
term. I claim that $r$ and $s^n$ are coprime. We may then deduce that $r \mid a_0$ from
a previous exercise. It only remains to prove the claim. Let $p$ be any prime
that divides $r$ and $s^n$. Then by Euclid’s lemma, $p$ divides $r$ and $s$, which is a
contradiction since $r$ and $s$ are coprime. It follows that $r \mid a_0$. Second,
$s \mid a_n r^n$. By a similar argument to the previous case, $s \mid a_n$.
Example 5.2.20. Find all the roots of the following polynomial
$$x^4 - 8x^3 + 23x^2 - 28x + 12.$$
The polynomial is monic and so the only possible rational roots are integers
and must divide 12. Thus the only possible rational roots are
$$\pm 1, \pm 2, \pm 3, \pm 4, \pm 6, \pm 12.$$
We find immediately that 1 is a root and so $(x - 1)$ must be a factor. Dividing
out by this factor we get the quotient
$$x^3 - 7x^2 + 16x - 12.$$
We check this polynomial for rational roots and find 2 works. Dividing out
by $(x - 2)$ we get the quotient
$$x^2 - 5x + 6.$$
Once we get down to a quadratic we can solve it directly. In this case it
factorizes as $(x - 2)(x - 3)$. We therefore have that
$$x^4 - 8x^3 + 23x^2 - 28x + 12 = (x - 1)(x - 2)^2 (x - 3).$$
At this point, multiply out the right-hand side and check that we really do
have an equality. In this case, all roots are rational and are 1, 2, 2, 3.
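The search in the example above can be mechanised. The following Python sketch is only an illustration of the rational root theorem: the function names are my own, the polynomial is assumed to have integer coefficients with non-zero constant term, and exact arithmetic is done with Python's standard `fractions` module.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a non-zero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def evaluate(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from a_n down to a_0."""
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value

def rational_roots(coeffs):
    """Distinct rational roots of an integer polynomial with a_0 != 0."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    # By the theorem, a root r/s in lowest terms has r | a_0 and s | a_n.
    candidates = {Fraction(sign * r, s)
                  for r in divisors(a_0)
                  for s in divisors(a_n)
                  for sign in (1, -1)}
    return sorted(x for x in candidates if evaluate(coeffs, x) == 0)

# The quartic from Example 5.2.20.
print(rational_roots([1, -8, 23, -28, 12]))
```

The function reports each root once, so the repeated root 2 has to be detected separately by dividing out the factor, as in the example.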
Exercises 5.2
1. Find the quotient and remainder when the first polynomial is divided
by the second.
(a) $x^3 - 7x - 1$ and $x - 2$.
(b) $x^4 - 2x^2 - 1$ and $x^2 + 3x - 1$.
(c) $2x^3 - 3x^2 + 1$ and $x$.
2. Find all roots using the information given.
(a) 4 is a root of $3x^3 - 20x^2 + 36x - 16$.
(b) $-1, -2$ are both roots of $x^4 + 2x^3 + x + 2$.
3. Find a cubic having roots 2, −3, 4.
4. Find a quartic having roots $i$, $-i$, $1 + i$ and $1 - i$.
5. The cubic $x^3 + ax^2 + bx + c$ has roots α, β and γ. Show that a, b, c can
each be written in terms of the roots.
6. $3 + i\sqrt{2}$ is a root of $x^4 + x^3 - 25x^2 + 41x + 66$. Find the remaining
roots.
7. $1 - i\sqrt{5}$ is a root of $x^4 - 2x^3 + 4x^2 + 4x - 12$. Find the remaining roots.
8. Find all the roots of the following polynomials.
(a) $x^3 + x^2 + x + 1$.
(b) $x^3 - x^2 - 3x + 6$.
(c) $x^4 - x^3 + 5x^2 + x - 6$.
9. Write each of the following polynomials as a product of linear or quadratic
real factors.
(a) $x^3 - 1$.
(b) $x^4 - 1$.
(c) $x^4 + 1$.
5.3 Complex number geometry
We have proved that every non-zero complex number has two square roots
and from the fundamental theorem of algebra (FTA), we know that every
non-zero complex number has three cube roots, and four fourth roots, and
more generally n nth roots. However, we didn’t prove the FTA. The main
goal of this section is to prove that every non-zero complex number has n
nth-roots. To do this, we shall think about complex numbers in a geometric,
rather than an algebraic, way. Throughout this section we shall not assume
FTA. We shall only need Theorem 5.2.10: every polynomial of degree n has
at most n roots.
5.3.1 sin and cos
We recall some well-known properties of the trigonometric functions sin and
cos. First the addition formulae
sin(α + β) = sin α cos β + cos α sin β
and
cos(α + β) = cos α cos β − sin α sin β.
These formulae were important historically because they enabled unknown
values of sines and cosines to be calculated from known ones, and so they were
useful in constructing trig tables in the days before calculators.
Angles are most naturally measured in radians rather than degrees:
the system of angle measurement based on degrees is an historical accident.
Why 360 degrees in a circle? You would have to ask the Ancient Babylonians.
Recall that positive angles are measured in an anticlockwise direction.
The sin and cos functions are periodic functions with period 2π. This
means that for all angles θ
sin(θ + 2πn) = sin θ and cos(θ + 2πn) = cos θ
for all n ∈ Z. This fact will be crucial in what follows.
5.3.2 The complex plane
In this section, we shall describe in more detail an alternative way of thinking
about complex numbers which turns out to be very fruitful. Recall that a
complex number z = a + bi has two components: a and b. We can plot these
as a point in the plane. The plane used in this way is called the complex
plane: the x-axis is the real axis and the y-axis is interpreted as the complex
axis. Although a complex number can be thought of as labelling a point in
the complex plane, it can more usefully be regarded as labelling the directed
line segment from the origin to the point. This is how we shall regard it.
Let z = a + bi be a non-zero complex number and let θ be the angle that it
makes with the positive reals. The length of z as a directed line segment in
the complex plane is |z|, and by basic trig a = |z| cos θ and b = |z| sin θ. It
follows that
z = |z| (cos θ + i sin θ) .
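[Figure: z drawn as a directed line segment from the origin at angle θ to the positive real axis, with horizontal component |z| cos θ and vertical component i|z| sin θ.]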
Observe that |z| is a non-negative real number. This way of writing complex
numbers is called the polar form.
At this point, I need to clarify the only feature of complex numbers that
causes confusion. I have already mentioned that the functions sin and cos
are periodic. For that reason, there is not just one number θ that yields
the complex number z but infinitely many of them: namely, all the numbers
θ + 2πk where k ∈ Z. For this reason, we define the argument of z, denoted
by arg z, not merely to be the single angle θ but the set
arg z = {θ + 2πk : k ∈ Z}.
The angle θ is chosen so that 0 ≤ θ < 2π and is called, for convenience, the
principal argument. But note that books vary on what they choose to call
the principal argument. This feature of the argument plays a crucial role
when we come to calculate nth roots.
Let w = r (cos θ + i sin θ) and z = s (cos φ + i sin φ) be two non-zero
complex numbers. We shall calculate wz. We have that
wz = rs (cos θ + i sin θ) (cos φ + i sin φ)
= rs[(cos θ cos φ − sin θ sin φ) + (sin θ cos φ + cos θ sin φ)i]
but using the properties of the sin and cos functions this reduces to
wz = rs (cos(θ + φ) + i sin(θ + φ)) .
We thus have the following important result:
when two non-zero complex numbers are multiplied together their
lengths are multiplied and their arguments are added.
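As a numerical illustration of this rule, here is a small Python sketch using the standard `cmath` module. The helper name `to_polar` and the two sample numbers are my own choices. Note that `cmath.polar` returns an angle in (−π, π], so the sketch shifts it into the range 0 ≤ θ < 2π used for the principal argument in this section.

```python
import cmath
import math

def to_polar(z):
    """Return (|z|, theta) with the principal argument taken in [0, 2*pi)."""
    r, theta = cmath.polar(z)          # cmath gives theta in (-pi, pi]
    return r, theta % (2 * math.pi)

w = 1 + 1j
z = -2j
r, theta = to_polar(w)
s, phi = to_polar(z)
t, psi = to_polar(w * z)

# Lengths multiply; arguments add, up to a multiple of 2*pi.
print(math.isclose(t, r * s))
print(math.isclose(psi, (theta + phi) % (2 * math.pi)))
```

The reduction modulo 2π in the last line reflects the fact that arg z is a set of angles rather than a single angle.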
This result helps us to understand the meaning of i. Multiplication by i
is the same as a rotation about the origin by a right angle. Multiplication by
$i^2$ is therefore the same as a rotation about the origin by two right angles.
But this is exactly the same as multiplication by −1.
[Figure: the points 1, i, −1 and −i in the complex plane.]
We may apply similar reasoning to explain geometrically why −1 × −1 = 1.
We of course proved this algebraically in Chapter 3. Multiplication by −1 is
interpreted as rotation about the origin by 180°. It follows that doing this
twice takes us back to where we started and so is equivalent to multiplication
by 1.
The proof of the next theorem follows by induction from the result we
proved above. But it is important to note that it is the result above that is
really fundamental.
Theorem 5.3.1 (De Moivre). Let n be a positive integer. If $z = r(\cos\theta + i\sin\theta)$
then
$$z^n = r^n(\cos n\theta + i\sin n\theta).$$
Example 5.3.2. Observe that the set of complex numbers
$$S^1 = \{\cos\theta + i\sin\theta : \theta \in \mathbb{R}\}$$
can be interpreted geometrically as being the unit circle with centre the
origin in the complex plane. Thus every non-zero complex number is a real
number times a complex number lying on the unit circle. The set $S^1$ has
some interesting algebraic properties as well. Observe that if $u, v \in S^1$ then
$uv \in S^1$, and that if $u \in S^1$ then $u^{-1} \in S^1$.
Our results above have nice applications in painlessly obtaining trigonometric identities.
Example 5.3.3. If you remember that when multiplying complex numbers
in polar form you add their arguments, then you can easily reconstitute the
identities we started with since
$$(\cos\alpha + i\sin\alpha)(\cos\beta + i\sin\beta) = \cos(\alpha + \beta) + i\sin(\alpha + \beta).$$
This is helpful in getting both sines and signs right.
Example 5.3.4. Express cos 3θ in terms of cos θ and sin θ using De Moivre’s
Theorem. We have that
$$(\cos\theta + i\sin\theta)^3 = \cos 3\theta + i\sin 3\theta.$$
However, we can expand the left-hand side to get
$$\cos^3\theta + 3i\cos^2\theta\sin\theta + 3\cos\theta(i\sin\theta)^2 + (i\sin\theta)^3$$
which simplifies to
$$\cos^3\theta - 3\cos\theta\sin^2\theta + i\left(3\cos^2\theta\sin\theta - \sin^3\theta\right)$$
where we use the fact that $i^2 = -1$ and $i^3 = -i$ and $i^4 = 1$. Equating real
and imaginary parts we get
$$\cos 3\theta = \cos^3\theta - 3\cos\theta\sin^2\theta.$$
We also get the formula
$$\sin 3\theta = 3\cos^2\theta\sin\theta - \sin^3\theta$$
for free.
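A quick numerical sanity check of De Moivre's theorem and of the two triple-angle identities can be written in Python. This is only a spot check at one arbitrarily chosen angle, not a proof, and the function name `de_moivre` is my own.

```python
import math

def de_moivre(r, theta, n):
    """Return z**n for z = r(cos theta + i sin theta), computed via De Moivre."""
    return r**n * complex(math.cos(n * theta), math.sin(n * theta))

theta = 0.7                      # an arbitrary test angle in radians
c, s = math.cos(theta), math.sin(theta)

# Direct cube of cos(theta) + i sin(theta) versus the De Moivre form.
direct = complex(c, s) ** 3
print(abs(direct - de_moivre(1.0, theta, 3)) < 1e-12)

# The identities obtained by equating real and imaginary parts.
print(math.isclose(math.cos(3 * theta), c**3 - 3 * c * s**2))
print(math.isclose(math.sin(3 * theta), 3 * c**2 * s - s**3))
```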
5.3.3 Arbitrary roots of complex numbers
In this section, we shall prove that every non-zero complex number has n
nth roots: thus it has three cube roots, and four fourth roots and so on. We
begin with a special case that turns out to give us almost all the information
we need to solve the general case. We shall also need the following important
idea. The word radical simply means a square root, or a cube root, or a fourth
root and so on. We regard the four basic operations of algebra — addition,
subtraction, multiplication and division — together with the extraction of
nth roots as purely algebraic operations. Although slightly failing as a precise
definition, I shall say that a radical expression is an algebraic expression
involving nth roots. For example, the formula for the roots of a quadratic
describes the roots as radical expressions in terms of the coefficients of the
quadratic. Thus a radical expression is supposed to be an explicit description
of some real number. The following table gives some easy to find radical
expressions for the sines and cosines of some well-known angles.
θ       sin θ     cos θ
0°      0         1
30°     1/2       √3/2
45°     1/√2      1/√2
60°     √3/2      1/2
90°     1         0
The nth roots of unity
We shall show that the number 1 has n nth roots. These are called the
nth roots of unity. We know that the equation $z^n - 1 = 0$ has at most n roots,
so all we need do is find n roots and we are home and dry. We begin with a
motivating example.
Example 5.3.5. We find the three cube roots of 1. There are two ways of
writing these roots: as trigonometric expressions and as radical expressions.
Inscribe an equilateral triangle in the unit circle in the complex plane with 1
as one of its vertices. Then the other two roots are $\omega_1 = \cos 120° + i\sin 120°$,
whose angle $120° = \frac{2\pi}{3}$ is obtained by dividing 2π by 3, and
$\omega_2 = \cos 240° + i\sin 240°$, whose angle is twice $\frac{2\pi}{3}$.
If we put $\omega = \omega_1$ then in fact $\omega_2 = \omega^2$. This is the trigonometric form of the
roots.
[Figure: the cube roots of unity 1, ω and ω² on the unit circle.]
In this case, it is easy to write down the radical expressions for the roots
as well since we already have radical expressions for sin 60° and cos 60°. We
therefore have that
$$\omega = \frac{1}{2}\left(-1 + i\sqrt{3}\right) \quad\text{and}\quad \omega^2 = -\frac{1}{2}\left(1 + i\sqrt{3}\right).$$
The general case is solved in a similar way to our example above using
regular n-gons in the complex plane where one of the vertices is 1.
Theorem 5.3.6 (Roots of unity). The n nth roots of unity are given by the
following formula
$$\cos\frac{2k\pi}{n} + i\sin\frac{2k\pi}{n}$$
for k = 1, 2, . . . , n. These complex numbers are arranged uniformly on the
unit circle and form a regular polygon with n sides: the cube roots of unity
form an equilateral triangle, the fourth roots form a square, the fifth roots
form a pentagon, and so on.
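The formula in the theorem translates directly into a few lines of Python. The sketch below (function name my own) lists the roots for k = 1, 2, ..., n and checks numerically that each one really does satisfy $z^n = 1$ up to floating-point rounding.

```python
import math

def roots_of_unity(n):
    """The n nth roots of unity, for k = 1, 2, ..., n as in the theorem."""
    return [complex(math.cos(2 * k * math.pi / n), math.sin(2 * k * math.pi / n))
            for k in range(1, n + 1)]

for w in roots_of_unity(3):
    # Each root lies on the unit circle and its cube is 1 (up to rounding).
    print(w, abs(w), abs(w**3 - 1) < 1e-12)
```

For n = 3 the first value printed is a floating-point approximation to $\frac{1}{2}(-1 + i\sqrt{3})$.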
There is one point here that is potentially confusing. It is always possible,
and easy, to write down trigonometric expressions for the nth roots of unity.
Using such an expression, we can then write down numerical values of the
nth roots to any desired degree of accuracy. Thus, from a purely practical
point of view, we can find the nth roots of unity. It is also always possible
to write down the radical expressions of the nth roots of unity but this is
far from easy in general. In fact, it forms part of the advanced subject
known as Galois theory.
Example 5.3.7. Gauss proved the following result which is highly nontrivial.
You can verify that it is true by using a calculator, at least up to
the limits of your calculator. It is a good example of a radical expression
where, on this occasion, the only radicals that occur are square roots; the
theory Gauss developed showed that this implied that the 17-gon could be
constructed using only a ruler and compass.
$$16\cos\frac{2\pi}{17} = -1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}} + \sqrt{68 + 12\sqrt{17} - 16\sqrt{34 + 2\sqrt{17}} - 2\left(1 - \sqrt{17}\right)\sqrt{34 - 2\sqrt{17}}}$$
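The calculator check suggested above can also be done in a few lines of Python. This is just a floating-point comparison of the two sides of Gauss's formula; the variable names are my own.

```python
import math

s17 = math.sqrt(17)
a = math.sqrt(34 - 2 * s17)
b = math.sqrt(34 + 2 * s17)

# Right-hand side: the nested square roots in Gauss's expression.
rhs = -1 + s17 + a + math.sqrt(68 + 12 * s17 - 16 * b - 2 * (1 - s17) * a)
# Left-hand side: 16 cos(2*pi/17) computed directly.
lhs = 16 * math.cos(2 * math.pi / 17)

print(lhs, rhs)
print(math.isclose(lhs, rhs, rel_tol=1e-9))
```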
Arbitrary nth roots
The nth roots of unity play an important role in finding arbitrary nth
roots. We begin with an example to illustrate the idea.
Example 5.3.8. We find the three cube roots of 2. If you use your calculator
you will simply find $\sqrt[3]{2}$, a real number. There should be two others: where
are they? The explanation is that the other two cube roots are complex. Let
ω be the complex cube root of 1 that we described above. Then the three
cube roots of 2 are the following
$$\sqrt[3]{2}, \quad \omega\sqrt[3]{2}, \quad \omega^2\sqrt[3]{2}.$$
The above example generalizes.
Theorem 5.3.9 (nth roots). Let $z = r(\cos\theta + i\sin\theta)$ be a non-zero complex
number. Put
$$u = \sqrt[n]{r}\left(\cos\frac{\theta}{n} + i\sin\frac{\theta}{n}\right),$$
the obvious nth root, and put
$$\omega = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n},$$
the first interesting nth root of unity. Then the nth roots of z are as follows
$$u, u\omega, \ldots, u\omega^{n-1}.$$
It follows that the nth roots of $z = r(\cos\theta + i\sin\theta)$ can be written in the
form
$$\sqrt[n]{r}\left(\cos\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right) + i\sin\left(\frac{\theta}{n} + \frac{2k\pi}{n}\right)\right)$$
for k = 0, 1, 2, . . . , n − 1.
This is the reason why every non-zero complex number has two square roots that
differ by a factor of −1: the two square roots of 1 are 1 and −1.
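The recipe of Theorem 5.3.9 can be sketched in Python as follows. The function name `nth_roots` is my own, and the angle returned by `cmath.polar` lies in (−π, π] rather than in [0, 2π); since any element of arg z gives the same set of roots, this does not matter here.

```python
import cmath
import math

def nth_roots(z, n):
    """The n nth roots of a non-zero complex number z, as u, u*w, ..., u*w**(n-1)."""
    r, theta = cmath.polar(z)
    # u is the "obvious" root; omega is the first interesting nth root of unity.
    u = r ** (1.0 / n) * complex(math.cos(theta / n), math.sin(theta / n))
    omega = complex(math.cos(2 * math.pi / n), math.sin(2 * math.pi / n))
    return [u * omega**k for k in range(n)]

# The three cube roots of 2 from Example 5.3.8.
for w in nth_roots(2, 3):
    print(w, abs(w**3 - 2) < 1e-12)
```

The first root printed is the real cube root of 2; the other two are the complex ones that a calculator does not show.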
5.3.4 Euler’s formula
We have seen that every real number can be written as a whole number plus
a possibly infinite decimal part. It turns out that many functions can also be
written as a sort of decimal. I shall illustrate this by means of an example.
Consider the function $e^x$. All you need to know about this function is that
it is equal to its derivative and $e^0 = 1$. We would like to write
$$e^x = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots$$
where the $a_i$ are real numbers that we have yet to determine. We can work
out the value of $a_0$ easily by putting x = 0. This tells us that $a_0 = 1$. To get
the value of $a_1$ we first differentiate our expression to get
$$e^x = a_1 + 2a_2 x + 3a_3 x^2 + \ldots$$
Now put x = 0 again and this time we get that $a_1 = 1$. To get the value of
$a_2$ we differentiate our expression again to get
$$e^x = 2a_2 + 3 \cdot 2 \cdot a_3 x + \ldots$$
Now put x = 0 and we get that $a_2 = \frac{1}{2}$. Continuing in this way we quickly
spot the pattern for the values of the coefficients $a_n$. We find that $a_n = \frac{1}{n!}$
where $n! = n(n - 1)(n - 2) \ldots 2 \cdot 1$. What we have done for $e^x$ we can also
do for sin x and cos x and we obtain the following series expansions of each
of these functions.
• $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \ldots$
• $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots$
• $\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots$
There are interesting connections between these three series. We shall
now show that complex numbers help to explain them. Without worrying
about the validity of doing so, we calculate the infinite series expansion of
$e^{i\theta}$. We have that
$$e^{i\theta} = 1 + (i\theta) + \frac{1}{2!}(i\theta)^2 + \frac{1}{3!}(i\theta)^3 + \ldots$$
that is
$$e^{i\theta} = 1 + i\theta - \frac{1}{2!}\theta^2 - \frac{1}{3!}\theta^3 i + \frac{1}{4!}\theta^4 + \ldots$$
By separating out real and imaginary parts, and using the infinite series we
obtained above, we get Euler’s remarkable formula
$$e^{i\theta} = \cos\theta + i\sin\theta.$$
Thus the complex numbers enable us to find the hidden connections between
the three most important functions of calculus: the exponential function and
the sine and cosine functions. It follows that every non-zero complex number
can be written in the form $re^{i\theta}$. If we put θ = π in Euler’s formula, we get
the following result, which is widely regarded as one of the most amazing in
mathematics.
Theorem 5.3.10 (Euler’s identity).
$$e^{\pi i} = -1.$$
This result shows us that the real numbers π, e and −1 are connected,
but that to establish that connection we have to use the complex number i.
This is one of the important roles of the complex numbers in mathematics in
that they enable us to make connections between topics that look different:
they form a mathematical hyperspace.
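Euler's formula can be tested numerically by summing the first few terms of the exponential series at a purely imaginary argument. The Python sketch below is an illustration only: the function name `exp_series` and the choice of 30 terms are my own assumptions, and agreement of partial sums is evidence, not a proof.

```python
import math

def exp_series(z, terms=30):
    """Partial sum 1 + z + z**2/2! + ... of the exponential series."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)      # turns z**n/n! into z**(n+1)/(n+1)!
    return total

theta = 1.2
approx = exp_series(1j * theta)
exact = complex(math.cos(theta), math.sin(theta))
print(abs(approx - exact) < 1e-12)

# Euler's identity: the series at i*pi is very close to -1.
print(abs(exp_series(1j * math.pi) + 1) < 1e-9)
```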
Exercises 5.3
1. Express cos 5x and sin 5x in terms of cos x and sin x.
2. Prove the following where x is real. (Compare (a) and (b) with
$\sinh x = \frac{1}{2}(e^x - e^{-x})$ and $\cosh x = \frac{1}{2}(e^x + e^{-x})$.)
(a) $\sin x = \frac{1}{2i}(e^{ix} - e^{-ix})$.
(b) $\cos x = \frac{1}{2}(e^{ix} + e^{-ix})$.
Hence show that $\cos^4 x = \frac{1}{8}[\cos 4x + 4\cos 2x + 3]$.
3. Find the 4th roots of unity as radical expressions.
4. Find the 6th roots of unity as radical expressions.
5. Find the 8th roots of unity as radical expressions.
6. Solve $x^3 = -8i$.
7. Find radical expressions for the roots of $x^5 - 1$, and so show that
$$\cos 72° = \frac{\sqrt{5} - 1}{4} \quad\text{and}\quad \sin 72° = \frac{\sqrt{10 + 2\sqrt{5}}}{4}.$$
To do this, consider the equation
$$x^4 + x^3 + x^2 + x + 1 = 0.$$
Divide through by $x^2$ to get
$$x^2 + \frac{1}{x^2} + x + \frac{1}{x} + 1 = 0.$$
Put $y = x + \frac{1}{x}$. Show that y satisfies the quadratic
$$y^2 + y - 1 = 0.$$
You can now find all four values of x.
8. Determine all the values of $i^i$. What do you notice?
5.4 Making sense of complex numbers
In this chapter, I have assumed that complex numbers exist and that they
obey the usual high-school rules of algebra. In this section, I shall sketch out
a proof of this.
We start with the set R × R whose elements are ordered pairs (a, b) where
a and b are real numbers. It will be helpful to denote these ordered pairs by
bold letters so $\mathbf{a} = (a_1, a_2)$. We define $\mathbf{0} = (0, 0)$, $\mathbf{1} = (1, 0)$ and $\mathbf{i} = (0, 1)$.
We now define operations as follows.
• If $\mathbf{a} = (a_1, a_2)$ and $\mathbf{b} = (b_1, b_2)$, define $\mathbf{a} + \mathbf{b} = (a_1 + b_1, a_2 + b_2)$.
• If $\mathbf{a} = (a_1, a_2)$ define $-\mathbf{a} = (-a_1, -a_2)$.
• If $\mathbf{a} = (a_1, a_2)$ and $\mathbf{b} = (b_1, b_2)$, define
$$\mathbf{a}\mathbf{b} = (a_1 b_1 - a_2 b_2, a_1 b_2 + a_2 b_1).$$
• If $\mathbf{a} = (a_1, a_2) \neq \mathbf{0}$ define
$$\mathbf{a}^{-1} = \left(\frac{a_1}{a_1^2 + a_2^2}, \frac{-a_2}{a_1^2 + a_2^2}\right).$$
It is now a long exercise to check that all the usual axioms of high-school
algebra hold. Observe now that the element $(a_1, a_2)$ can be written
$$(a_1, 0)\mathbf{1} + (a_2, 0)\mathbf{i}$$
and that
$$\mathbf{i}^2 = (0, 1)(0, 1) = (-1, 0) = -\mathbf{1}.$$
The elements of the form (a, 0) can be identified with the real numbers. This
proves that the complex numbers as I described them earlier in this chapter
really do exist.
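The construction can be mimicked in Python without ever using the built-in complex type. In the sketch below the class name `Pair` and its methods are my own, and exact rationals from the standard `fractions` module stand in for real numbers. The inverse divides each component by $a_1^2 + a_2^2$, which is what makes the product of a pair with its inverse equal to (1, 0).

```python
from fractions import Fraction

class Pair:
    """An ordered pair (a1, a2), with rationals standing in for reals."""

    def __init__(self, a1, a2):
        self.a1, self.a2 = Fraction(a1), Fraction(a2)

    def __eq__(self, other):
        return (self.a1, self.a2) == (other.a1, other.a2)

    def __add__(self, other):
        return Pair(self.a1 + other.a1, self.a2 + other.a2)

    def __neg__(self):
        return Pair(-self.a1, -self.a2)

    def __mul__(self, other):
        return Pair(self.a1 * other.a1 - self.a2 * other.a2,
                    self.a1 * other.a2 + self.a2 * other.a1)

    def inverse(self):
        norm = self.a1**2 + self.a2**2   # non-zero unless the pair is (0, 0)
        return Pair(self.a1 / norm, -self.a2 / norm)

    def __repr__(self):
        return f"({self.a1}, {self.a2})"

ZERO, ONE, I = Pair(0, 0), Pair(1, 0), Pair(0, 1)

print(I * I)                 # the pair (-1, 0), that is, -1
a = Pair(3, 4)
print(a * a.inverse())       # the pair (1, 0), that is, 1
```

Checking a handful of pairs like this is no substitute for the long exercise of verifying the axioms, but it is a useful guard against slips in the definitions.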
5.5 Radical solutions
There are two great revolutions in the history of algebra. The first
is the discovery that there are irrational numbers, which means that we have
to learn to work with real numbers that are described by radical expressions
such as $\sqrt{2}$. The second is Galois’s discovery that the roots of a polynomial
need not be radical expressions of the coefficients of the polynomial. Put
simply, there is not always a formula for the roots of a polynomial
equation. We begin by describing the way in which cubics and quartics can
be solved purely algebraically.
5.5.1 Cubic equations
Let
$$f(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0$$
where $a_3 \neq 0$. I shall assume all coefficients are real though the theory
works in general. We shall find all the roots of f(x). This problem can be
simplified in two ways. First, we may divide through by $a_3$ and so, without
loss of generality, we may assume that f(x) is monic. That is, $a_3 = 1$. Second,
by means of a substitution we may obtain a cubic in which the coefficient of
the term in $x^2$ is zero. Put $x = y - \frac{a_2}{3}$. You should do this and check that
you get a polynomial of the form
$$g(y) = y^3 + py + q.$$
We say that such a cubic is reduced. It follows that without loss of generality,
we need only solve the cubic
$$g(x) = x^3 + px + q.$$
To do this needs what looks like a minor miracle. Let u and v be two complex
variables. Let $\omega = \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3}$, one of the complex cube roots of unity.
You should now check that the following cubic
$$t(x) = x^3 - 3uvx - (u^3 + v^3)$$
has the roots
$$u + v, \quad u\omega + v\omega^2, \quad u\omega^2 + v\omega.$$
Now we can solve
$$x^3 + px + q = 0$$
if we can find u and v such that
$$p = -3uv, \quad q = -u^3 - v^3.$$
Now if we cube the first equation, we get the following two equations
$$\frac{-p^3}{27} = u^3 v^3, \quad -q = u^3 + v^3.$$
If we regard $u^3$ and $v^3$ as the unknowns we know their sum and we know their
product. This means that $u^3$ and $v^3$ are the roots of the quadratic equation
$$x^2 + qx - \frac{p^3}{27} = 0.$$
We therefore have that
$$u^3 = \frac{1}{2}\left(-q + \sqrt{\frac{27q^2 + 4p^3}{27}}\right)$$
and
$$v^3 = \frac{1}{2}\left(-q - \sqrt{\frac{27q^2 + 4p^3}{27}}\right).$$
To find u we have to take a cube root of the number $u^3$ and there are three
possible such roots. Choose one such value for u. We then choose the value
of v so that p = −3uv.
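The method just described can be sketched in Python. The function name `solve_reduced_cubic` is my own, the sketch assumes p ≠ 0 (so that u is non-zero and v can be obtained by division), and it works in floating-point complex arithmetic, so the answers are approximations rather than radical expressions.

```python
import cmath
import math

def solve_reduced_cubic(p, q):
    """Approximate roots of x**3 + p*x + q = 0 by the u, v method (assumes p != 0)."""
    # u**3 is one root of the quadratic x**2 + q*x - p**3/27 = 0.
    u_cubed = (-q + cmath.sqrt((27 * q**2 + 4 * p**3) / 27)) / 2
    u = u_cubed ** (1 / 3)        # one of the three cube roots (the principal one)
    v = -p / (3 * u)              # chosen so that p = -3uv
    omega = complex(math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3))
    return [u + v, u * omega + v * omega**2, u * omega**2 + v * omega]

for x in solve_reduced_cubic(-15, -4):
    # Each residual should be zero up to rounding error.
    print(x, abs(x**3 - 15 * x - 4) < 1e-9)
```

Complex square roots are used throughout because, as the examples that follow show, the quantity under the square root can be negative even when the cubic has real roots.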
Example 5.5.1. Find the roots of $x^3 + 9x - 2 = 0$. Here p = 9 and q = −2.
The quadratic equation we have to solve is therefore
$$x^2 - 2x - 27 = 0.$$
This has roots $1 \pm 2\sqrt{7}$. Put $u^3 = 1 + 2\sqrt{7}$. We may choose a real cube root
in this case to get
$$u = \sqrt[3]{1 + \sqrt{28}}.$$
We must then choose v to be
$$v = \sqrt[3]{1 - \sqrt{28}},$$
so that $uv = \sqrt[3]{-27} = -3$ and hence $p = -3uv = 9$.
We may now write down the three roots of our original cubic.
The following cubic equation was studied by Bombelli in 1572 and had
an important influence on the development of complex numbers.
Example 5.5.2. Consider the cubic
$$x^3 - 15x - 4 = 0.$$
Here p = −15 and q = −4, so the associated quadratic in this case is
$$x^2 - 4x + 125 = 0.$$
This gives the two solutions that Bombelli would have written in a way
equivalent to the following
$$x = 2 \pm \sqrt{-121}.$$
We would write this as
$$x = 2 \pm 11i.$$
Thus
$$u^3 = 2 + 11i \quad\text{and}\quad v^3 = 2 - 11i.$$
There are three cube roots of 2 + 11i, all complex. Let’s press on regardless.
Write $\sqrt[3]{2 + 11i}$ to represent one of those cube roots. Write $\sqrt[3]{2 - 11i}$ to
be the corresponding cube root such that their product is 5. Thus at least
symbolically we may write
$$u + v = \sqrt[3]{2 + 11i} + \sqrt[3]{2 - 11i}.$$
What is surprising is that for some choice of these cube roots this value must
be real. The reason is that our cubic has a real root, which can easily be
checked to be 4. To see how this root arises, observe that
$$(2 + i)^3 = 2 + 11i \quad\text{and}\quad (2 - i)^3 = 2 - 11i.$$
If we choose 2 + i as one of the cube roots of 2 + 11i then we have to choose
2 − i as the corresponding cube root of 2 − 11i. In this way, we get
$$4 = (2 + i) + (2 - i)$$
as a root. It was the fact that real roots arose in this way that provided
the first inkling that there was a number system, the complex numbers,
that extended the so-called real numbers, but had every bit as tangible an
existence.
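The arithmetic in Bombelli's example is easy to confirm with Python's built-in complex numbers. The short check below uses variable names of my own choosing. It also computes the other two roots, uω + vω² and uω² + vω, and confirms that for this cubic they are real as well, up to rounding error.

```python
import math

u, v = 2 + 1j, 2 - 1j
omega = complex(math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3))

print(u**3, v**3)        # should agree with 2 + 11i and 2 - 11i
print(u * v)             # should agree with 5, so that p = -3uv = -15

roots = [u + v, u * omega + v * omega**2, u * omega**2 + v * omega]
for x in roots:
    # Residual of x**3 - 15x - 4, and the (tiny) imaginary part of the root.
    print(x, abs(x**3 - 15 * x - 4) < 1e-9, abs(x.imag) < 1e-12)
```

The two further roots are approximations to $-2 - \sqrt{3}$ and $-2 + \sqrt{3}$.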
5.5.2 Quartic equations
Let
$$f(x) = a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0.$$
As usual, we may assume that $a_4 = 1$. By means of a suitable substitution,
which is left as an exercise, we may eliminate the cubed term. We therefore
end up with a reduced quartic which it is convenient to write in the following
way
$$x^4 = ax^2 + bx + c.$$
Suppose that we could write the right-hand side as a perfect square $(dx + e)^2$.
Then our quartic could be written as the product of two quadratics
$$\left(x^2 - (dx + e)\right)\left(x^2 + dx + e\right).$$
The roots of these two quadratics will be the four roots of our original
quartic. It is not true that we can always do this, but by means of another
miracle we can transform the equation into one with the same roots where
we can. Let t be a new variable whose value will be determined later. We
may write
$$(x^2 + t)^2 = (a + 2t)x^2 + bx + (c + t^2).$$
We now want to choose a value of t so that the right-hand side is a perfect
square. This happens when the discriminant of the quadratic
$(a + 2t)x^2 + bx + (c + t^2)$ is zero. That is when
$$b^2 - 4(a + 2t)(c + t^2) = 0.$$
Now this is a cubic in t. We now use the method of the previous section to
find a specific value of t, say $t_1$. We then get
$$(x^2 + t_1)^2 = (a + 2t_1)\left(x + \frac{b}{2(a + 2t_1)}\right)^2.$$
It follows that the roots of the original quartic are the roots of the following
two quadratics
$$(x^2 + t_1) - \sqrt{a + 2t_1}\left(x + \frac{b}{2a + 4t_1}\right) = 0$$
and
$$(x^2 + t_1) + \sqrt{a + 2t_1}\left(x + \frac{b}{2a + 4t_1}\right) = 0.$$
Example 5.5.3. Solve the quartic
$$x^4 = 1 - 4x.$$
We shall find a value of t below
$$(x^2 + t)^2 = x^4 + 2x^2 t + t^2 = 2x^2 t - 4x + (1 + t^2)$$
which makes the right-hand side a perfect square. This requires us to find a
root of the cubic
$$t^3 + t - 2 = 0.$$
Here t = 1 works. With t = 1 our quartic therefore becomes
$$(x^2 + 1)^2 = 2(x - 1)^2.$$
Therefore the roots of our original quartic are the roots of the following two
quadratics
$$(x^2 + 1) - \sqrt{2}(x - 1) = 0 \quad\text{and}\quad (x^2 + 1) + \sqrt{2}(x - 1) = 0.$$
The roots of our original quartic are therefore
$$\frac{-1 \pm \sqrt{\sqrt{8} - 1}}{\sqrt{2}} \quad\text{and}\quad \frac{1 \pm i\sqrt{\sqrt{8} + 1}}{\sqrt{2}}.$$
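The four roots found in this example can be checked numerically. The Python sketch below solves the two quadratics with the usual formula (the helper name `quadratic_roots` is my own) and substitutes each root back into the quartic $x^4 = 1 - 4x$.

```python
import cmath
import math

def quadratic_roots(a, b, c):
    """Both roots of a*x**2 + b*x + c = 0, allowing complex answers."""
    d = cmath.sqrt(b * b - 4 * a * c)
    return [(-b + d) / (2 * a), (-b - d) / (2 * a)]

r2 = math.sqrt(2)
# (x**2 + 1) - sqrt(2)(x - 1) = 0  and  (x**2 + 1) + sqrt(2)(x - 1) = 0
roots = quadratic_roots(1, -r2, 1 + r2) + quadratic_roots(1, r2, 1 - r2)

for x in roots:
    # Residual of the original quartic; zero up to rounding error.
    print(x, abs(x**4 - (1 - 4 * x)) < 1e-12)
```

Two of the roots come out real and two complex, in agreement with the radical expressions above.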
5.5.3 Symmetries and particles
Although quadratic equations had been solved in antiquity, it was not until
the 16th century that cubics and quartics were first solved. This great leap
forward in the development of algebra was centred on a group of Italian
mathematicians: Scipione del Ferro (1465–1526), Niccolo Tartaglia (1500–1557),
Girolamo Cardano (1501–1576), Ludovico Ferrari (1522–1565) and Rafael
Bombelli (1526–1572). Their antics are worthy of an opera or a Shakespeare
comedy, but the importance of their work cannot be overemphasized. But
two points arise. First, the solutions of quadratics, cubics and quartics seem
to rely on mathematical miracles. Second, we appear to see a pattern: to
solve cubics we need to solve an associated quadratic and to solve quartics
we need to solve an associated cubic. These two points were investigated by
a number of mathematicians in great depth: in particular, Lagrange (1736–
1813), Ruffini (1765–1822) and Abel (1802–1829). The expectation was high
that quintics should be solvable by using quartics in a way that continued
the pattern. Then came the great surprise. Ruffini and Abel proved that
the pattern does not continue and that one cannot always describe the roots
of a quintic in radical form. There are, of course, five roots; the point
is that these roots cannot in general be written down using an algebraic
formula. The question is why, and the answer to this question also explains
the algebraic miracles we used above. It was discovered by Evariste Galois
(1811–1832). I shall not go into the details of his biography (he was killed,
for instance, in a duel) since you will find much more written about him
elsewhere, some of it accurate. Instead I shall focus on his mathematics.
By building on the work of Lagrange, he explained the miracles above and
much more. His approach was new: to determine whether the roots of a
polynomial could be expressed in algebraic terms as radical expressions, he
studied the symmetries of the polynomial. Just what this means is explained
in a subject known as Galois theory after its founder. Crucially, this is not
a mere extrapolation of existing algebraic manipulation, instead it involves
working at a higher level of abstraction. As so often happens in mathematics,
a development in one area led to developments in other areas. Sophus Lie
(1842–1899) realized that symmetries could also be used to help understand
the tricks that were used to solve differential equations. It was in this way
that symmetry came to play a fundamental role in physics. If you hear a
particle physicist talking about symmetries, they are paying an unconscious
tribute to Galois’ bold work in studying the nature of the roots of polynomial
equations.
5.6 Gaussian integers and factorizing primes
Complex numbers may be used to factorize some primes. For example,
5 = (1 − 2i)(1 + 2i).
To develop this example further, we shall need some definitions. The integers
Z are a subset of the reals R. We define the Gaussian integers, denoted
by Z[i], to be all complex numbers of the form m + in where m and n are
integers. What our example shows is that some primes can be factorized using
Gaussian integers. The question is: which ones? Observe that $5 = 1^2 + 2^2$.
In other words, it can be written as a sum of two squares. Another example
of a prime that can be written as a sum of two squares is 13. We have that
$$13 = 9 + 4 = 3^2 + 2^2.$$
This prime can also be factorized using Gaussian integers
$$13 = (3 + 2i)(3 - 2i).$$
In fact, any prime p that can be written as a sum of two squares $p = a^2 + b^2$,
can also be factorized using Gaussian integers
$$p = (a + ib)(a - ib).$$
This raises the question of exactly which primes can be written as a sum of
two squares.
Lemma 5.6.1. Let p be an odd prime that can be written as a sum of two
squares. Then p ≡ 1 (mod 4).
Proof. Let $p = a^2 + b^2$. Since p is assumed odd, we must have that one of $a^2$
and $b^2$ is even and the other odd. Without loss of generality, we may assume
that $a^2$ is odd and $b^2$ is even. But from Chapter 2, this implies that a is
odd and b is even. We may therefore write a = 2u + 1 and b = 2v for some
natural numbers u and v. But then $p = 4u^2 + 4u + 1 + 4v^2$. It follows that
p ≡ 1 (mod 4).
Lemma 5.6.2. Each odd prime p satisfies either p ≡ 1 (mod 4) or p ≡ 3
(mod 4).
Proof. The possible remainders when p is divided by 4 are 0, 1, 2, 3. Since p
is an odd prime both 0 and 2 are impossible and the result follows.
The lemma above tells us that each odd prime belongs to exactly one of
two camps. The obvious question is whether both of these camps are infinite.
Proposition 5.6.3.
1. There are infinitely many primes p such that p ≡ 3 (mod 4).
2. There are infinitely many primes p such that p ≡ 1 (mod 4).
We have proved that if an odd prime p can be written as a sum of two
squares then p ≡ 1 (mod 4). The hard question is whether the converse is
true.
Theorem 5.6.4 (Euler, 1754). An odd prime p can be written as a sum of
two squares if, and only if, p ≡ 1 (mod 4).
We may deduce from this theorem that every odd prime p ≡ 1 (mod 4)
can be factorized by means of Gaussian integers.
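Euler's theorem can be explored experimentally. The following Python sketch searches for a representation of a number as a sum of two squares by brute force and compares the outcome with the remainder on division by 4. The function names are my own, the search is the simplest one I could think of rather than an efficient algorithm, and `math.isqrt` needs Python 3.8 or later.

```python
import math

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def two_squares(p):
    """Return (a, b) with a*a + b*b == p, or None if no such pair exists."""
    for b in range(math.isqrt(p) + 1):
        a = math.isqrt(p - b * b)
        if a * a + b * b == p:
            return a, b
    return None

for p in range(3, 60, 2):
    if is_prime(p):
        # Second column is p mod 4; a pair appears exactly when it equals 1.
        print(p, p % 4, two_squares(p))

# A representation p = a^2 + b^2 gives the factorization (a + ib)(a - ib).
a, b = two_squares(13)
print(complex(a, b) * complex(a, -b))
```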
Chapter 6
Rational functions
A (real) rational function is simply a quotient $\frac{f(x)}{g(x)}$
where f(x) and g(x) are
any polynomials with real coefficients, the polynomial g(x) of course not
being equal to the zero polynomial. If deg f (x) < deg g(x), I shall say that
the rational function is proper. The set of all rational functions R(x) — notice
I use round brackets unlike the square brackets for the set of real polynomials
— can be added, subtracted, multiplied and divided. In fact, they satisfy all
the algebraic laws of high-school algebra. Rational functions are enormously
useful in mathematics. The goal of this section is to show that every rational
function can be written as a sum of simpler rational functions. Once I have
shown how to do this, I will outline its application to integration.
6.1 Numerical partial fractions
This section is intended as motivation for the partial fraction representation
of rational functions described in a later chapter, so it can be omitted at first
reading. The idea is to show how a fraction can be written as a sum of other
fractions having a particular form. Specifically, the goal of this section is to
show how a proper fraction can be written as a sum of proper fractions over
prime power denominators. This involves two steps which I shall describe by
means of examples. The theory is an application of the fundamental theorem
of arithmetic and the extended Euclidean algorithm.
In order to add two fractions together, we first have to ensure that both
are expressed over the same denominator. For example, suppose we want to
add $\frac{5}{7}$ and $\frac{8}{13}$. Since 7 × 13 = 91 we have the following
$$\frac{5}{7} + \frac{8}{13} = \frac{65 + 56}{91} = \frac{121}{91}.$$
We shall now consider the reverse process, using the fraction $\frac{810}{1003}$ as an
example. Observe that 1003 = 17 × 59 where 17 and 59 are coprime. Our
goal is to write
$$\frac{810}{1003} = \frac{a}{17} + \frac{b}{59}$$
for some natural numbers a and b. By the extended Euclidean algorithm, we
can write
$$1 = 7 \cdot 17 - 2 \cdot 59.$$
It follows that
$$\frac{1}{1003} = \frac{7 \cdot 17 - 2 \cdot 59}{17 \cdot 59} = \frac{7}{59} - \frac{2}{17}.$$
Now multiply both sides by 810 to get
$$\frac{810}{1003} = \frac{7 \cdot 810}{59} - \frac{2 \cdot 810}{17} = \left(96 + \frac{6}{59}\right) - \left(95 + \frac{5}{17}\right) = 1 + \frac{6}{59} - \frac{5}{17}.$$
Simplifying we get
$$\frac{810}{1003} = \frac{6}{59} + \frac{12}{17}$$
as required.
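The calculation above can be sketched in Python. The function names `extended_gcd` and `split_fraction` are my own, and the way the numerators are normalised (the numerator over n is reduced into the range 0 to n − 1, and the other numerator is whatever is left) is one possible convention, not necessarily the only one.

```python
from fractions import Fraction

def extended_gcd(m, n):
    """Return (g, x, y) with g = gcd(m, n) and g == x*m + y*n."""
    if n == 0:
        return m, 1, 0
    g, x, y = extended_gcd(n, m % n)
    return g, y, x - (m // n) * y

def split_fraction(a, m, n):
    """Write the proper fraction a/(m*n), with m and n coprime, as c/m + d/n."""
    g, x, y = extended_gcd(m, n)
    assert g == 1, "m and n must be coprime"
    # From 1 = x*m + y*n we get a/(m*n) = a*x/n + a*y/m.
    d = (a * x) % n              # numerator over n, reduced to 0 <= d < n
    c = (a - d * m) // n         # numerator over m; the division is exact
    return c, d

c, d = split_fraction(810, 17, 59)
print(c, d)
print(Fraction(c, 17) + Fraction(d, 59) == Fraction(810, 1003))
```

With this normalisation the numerator over m can be negative, but both fractions are always proper.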
We shall now do something different. Consider the fraction $\frac{10}{16}$. We have
that $16 = 2^4$ and so we cannot write it as a product of coprime numbers.
However, we can do something else. We can write $10 = 2 + 8 = 2^1 + 2^3$. Thus
$$\frac{10}{16} = \frac{2^1 + 2^3}{2^4} = \frac{2^1}{2^4} + \frac{2^3}{2^4} = \frac{1}{2^3} + \frac{1}{2}$$
and so
$$\frac{10}{16} = \frac{1}{2^1} + \frac{1}{2^3}.$$
Let's now combine these two steps. Consider the fraction $\frac{41}{90}$. The prime
factorisation of 90 is $2 \cdot 3^2 \cdot 5$. Our first goal is to write
\[ \frac{41}{90} = \frac{a}{2} + \frac{b}{3^2} + \frac{c}{5}. \]
Thus we have to find a, b, c such that
41 = 45a + 10b + 18c.
By trial and error, remembering that a, b, c have to be integers, we find that
41 = 45 · 1 + 10 · 5 + (−3) · 18.
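It follows that
\[ \frac{41}{90} = \frac{1}{2} + \frac{5}{3^2} - \frac{3}{5}. \]
We now want to write
\[ \frac{5}{3^2} = \frac{d}{3} + \frac{e}{3^2} \]
where $|d|, |e| < 3$. But $5 = 2 + 3$ and so
\[ \frac{5}{3^2} = \frac{1}{3} + \frac{2}{3^2}. \]
It follows that
\[ \frac{41}{90} = \frac{1}{2} + \frac{1}{3} + \frac{2}{9} - \frac{3}{5}. \]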
We may summarise what we have found in the following theorem.
Theorem 6.1.1.
(i) Let $\frac{a}{b}$ be a proper fraction, and let $b = p_1^{n_1} \cdots p_r^{n_r}$ be the prime factorisation
of $b$. Then
\[ \frac{a}{b} = \sum_{i=1}^{r} \frac{c_i}{p_i^{n_i}} \]
for some integers $c_i$, where each of the fractions is proper.
(ii) Now let $p$ be a prime and $\frac{c}{p^n}$ a proper fraction. Then
\[ \frac{c}{p^n} = \sum_{j=1}^{n} \frac{d_j}{p^j} \]
where each $d_j$ is such that $|d_j| < p$.
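Part (ii) amounts to writing the numerator in base p. The following sketch is one way to compute the numbers d_j; the function name `prime_power_digits` is my own, and negative numerators are handled by negating the digits of the absolute value, which is an assumption about how to read the condition |d_j| < p.

```python
def prime_power_digits(c, p, n):
    """Return [d_1, ..., d_n] with c/p**n == sum(d_j / p**j) and |d_j| < p."""
    if abs(c) >= p ** n:
        raise ValueError("c/p**n must be a proper fraction")
    sign = -1 if c < 0 else 1
    c = abs(c)
    digits = []
    for _ in range(n):
        c, e = divmod(c, p)     # peel off base-p digits, least significant first
        digits.append(sign * e)
    digits.reverse()            # the most significant digit sits over p**1
    return digits

print(prime_power_digits(10, 2, 4))  # [1, 0, 1, 0], that is 10/16 = 1/2 + 1/2**3
print(prime_power_digits(5, 3, 2))   # [1, 2], that is 5/9 = 1/3 + 2/9
```

Both outputs match the worked examples earlier in this section.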
6.2 Analogies
There are parallels between the properties of the natural numbers and the
properties of real polynomials. We have seen that there are remainder theorems for both natural numbers and polynomials. In the case of the natural
numbers, we used the remainder theorem to develop Euclid’s algorithm and
the Extended Euclidean algorithm for computing greatest common divisors.
We can do the same thing for polynomials. We define the greatest common
divisor of two real polynomials a(x) and b(x) to be a real polynomial of
largest degree dividing both a(x) and b(x). Any two such gcd’s will be real
number multiples of each other. We say that a(x) and b(x) are coprime if
their greatest common divisor is a constant polynomial. Euclid’s algorithm
and the Extended Euclidean algorithm can both be proved for real polynomials. As a consequence, if a(x) and b(x) are coprime real polynomials, then
we can find real polynomials c(x) and d(x) such that
1 = a(x)c(x) + b(x)d(x).
If f (x) is any real polynomial, then we can multiply both sides of the above
equation by f (x) to get
f (x) = a(x)[f (x)c(x)] + b(x)[f (x)d(x)].
Thus f (x) can be written in terms of a(x) and b(x) in a very simple way.
There is a simple refinement of this result I shall use below. If deg f (x) <
deg a(x) + deg b(x) then using the remainder theorem, we can in fact write
f (x) = B(x)a(x) + A(x)b(x)
where deg B(x) < deg b(x) and deg A(x) < deg a(x).
Every natural number can be written as a product of primes, where a
prime is a number which cannot be factorised in a non-trivial way. The
analogue of a prime number for real polynomials is the notion of an irreducible
polynomial. The real polynomial f (x) is said to be irreducible if it cannot
be factorised into real polynomials each having smaller degree than f (x).
Unlike the case of prime numbers, we can characterise the real irreducible
polynomials very easily. It is a consequence of the fundamental theorem for
real polynomials that there are only two kinds of irreducible real polynomials:
linear polynomials $c(x - a)$ and irreducible quadratic polynomials $c(x^2 + ax + b)$,
that is, quadratics having only non-real roots.
We now have the following analogue of the fundamental theorem of arithmetic
for real polynomials: every real polynomial of degree at least 1 can be
written as a product of a real number and powers of distinct monic linear
polynomials or distinct monic irreducible quadratic polynomials in essentially one
way.
6.3 Partial fractions
Let $\frac{f(x)}{g(x)}$ be a rational function. If $\deg f(x) \geq \deg g(x)$ then we may apply
the Remainder Theorem and write
\[ \frac{f(x)}{g(x)} = q(x) + \frac{r(x)}{g(x)} \]
where $\deg r(x) < \deg g(x)$. Thus without loss of generality, we may assume
that $\deg f(x) < \deg g(x)$ in what follows. I shall also assume that $g(x)$ is
monic; if it isn't there will simply be a constant factor at the front.
By the fundamental theorem for real polynomials, we may write $g(x)$ as
a product of distinct factors of the form $(x - a)^r$ or $(x^2 + ax + b)^s$. Using
this decomposition of $g(x)$, the rational function $\frac{f(x)}{g(x)}$ can now be written as
a sum of simpler rational functions which have the following forms:
• For each factor of $g(x)$ of the form $(x - a)^r$, we will have a sum of the
form
\[ \frac{A_1}{x - a} + \ldots + \frac{A_{r-1}}{(x - a)^{r-1}} + \frac{A_r}{(x - a)^r}. \]
• For each factor of $g(x)$ of the form $(x^2 + ax + b)^s$, we will have a sum
of the form
\[ \frac{A_1 x + B_1}{x^2 + ax + b} + \ldots + \frac{A_{s-1} x + B_{s-1}}{(x^2 + ax + b)^{s-1}} + \frac{A_s x + B_s}{(x^2 + ax + b)^s}. \]
This is called the partial fraction decomposition of $\frac{f(x)}{g(x)}$. The practical
method for finding such decompositions is best illustrated by means of some
examples.
Examples 6.3.1.
1. Write $\frac{5}{x^2 + x - 6}$ in partial fractions. We have that $x^2 + x - 6 = (x + 3)(x - 2)$,
a product of two distinct linear factors. We expect a solution of the
form
\[ \frac{5}{x^2 + x - 6} = \frac{A}{x + 3} + \frac{B}{x - 2} \]
where A and B are real numbers to be determined. The RHS is just
\[ \frac{A(x - 2) + B(x + 3)}{(x + 3)(x - 2)}. \]
Comparing the LHS with the RHS we get that
\[ 5 = A(x - 2) + B(x + 3) \]
which must hold for all values of x. Putting x = 2 we get B = 1 and
putting x = −3 we get A = −1. Thus
\[ \frac{5}{x^2 + x - 6} = \frac{-1}{x + 3} + \frac{1}{x - 2}. \]
At this point, we check our solution.
2. Write $\frac{9}{(x - 1)(x + 2)^2}$ in partial fractions. Here we have a single linear factor
and a square of a linear factor. We therefore expect an answer in the
form
\[ \frac{9}{(x - 1)(x + 2)^2} = \frac{A}{x - 1} + \frac{B}{x + 2} + \frac{C}{(x + 2)^2}. \]
Carrying out the sum on the RHS, and comparing the LHS with the
RHS we get that
\[ 9 = A(x + 2)^2 + B(x - 1)(x + 2) + C(x - 1). \]
Putting x = 1 we get that A = 1, putting x = −2, we get that C = −3
and putting x = −1 and using the values we have for A and C we get
that B = −1. Thus
\[ \frac{9}{(x - 1)(x + 2)^2} = \frac{1}{x - 1} - \frac{1}{x + 2} - \frac{3}{(x + 2)^2}. \]
3. Write $\frac{16x}{x^4 - 16}$ in partial fractions. We have that $x^4 - 16 = (x - 2)(x + 2)(x^2 + 4)$,
a product of two distinct linear factors and a quadratic
factor. We expect a solution of the form
\[ \frac{16x}{x^4 - 16} = \frac{A}{x - 2} + \frac{B}{x + 2} + \frac{Cx + D}{x^2 + 4}. \]
This leads to
\[ 16x = A(x + 2)(x^2 + 4) + B(x - 2)(x^2 + 4) + (Cx + D)(x - 2)(x + 2). \]
Putting x = 2 gives A = 1 and putting x = −2 gives B = 1. Putting x = 0
then gives D = 0, and comparing the coefficients of $x^3$ gives
0 = A + B + C, so that C = −2. Thus
\[ \frac{16x}{x^4 - 16} = \frac{1}{x - 2} + \frac{1}{x + 2} - \frac{2x}{x^2 + 4}. \]
4. Write $\frac{3x^2 + 2x + 1}{(x + 2)(x^2 + x + 1)^2}$ in partial fractions. We expect a solution in the
form
\[ \frac{3x^2 + 2x + 1}{(x + 2)(x^2 + x + 1)^2} = \frac{A}{x + 2} + \frac{Bx + C}{x^2 + x + 1} + \frac{Dx + E}{(x^2 + x + 1)^2}. \]
This leads to
\[ 3x^2 + 2x + 1 = A(x^2 + x + 1)^2 + (Bx + C)(x + 2)(x^2 + x + 1) + (Dx + E)(x + 2). \]
Putting x = −2 yields A = 1. There are four unknowns left and so we
need four equations. However, to avoid having to solve four equations
in four unknowns we can vary our procedure. Putting x = 0 gives
1 = A + 2C + 2E and so 0 = C + E. On the LHS the highest power of x
occurring is $x^2$. On the RHS the highest power of x occurring appears
to be $x^4$ but that immediately implies that its coefficient must be zero.
The coefficient of $x^4$ is 1 + B and so B = −1. Putting x = 1 gives
6 = 9A + 9(B + C) + 3(D + E) which, using A = 1, B = −1 and E = −C,
simplifies to 2 = 2C + D. Putting x = 2 gives
17 = 49A + 28(2B + C) + 4(2D + E), which simplifies in the same way to
3 = 3C + D. Subtracting the previous equation gives C = 1, and so
D = 0 and E = −1. Thus
\[ \frac{3x^2 + 2x + 1}{(x + 2)(x^2 + x + 1)^2} = \frac{1}{x + 2} + \frac{1 - x}{x^2 + x + 1} + \frac{-1}{(x^2 + x + 1)^2}. \]
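Checking a decomposition by hand is tedious, so here is a small sketch of how the check might be automated using exact rational arithmetic. The helper `agree` and the choice of sample points are my own. Agreement at more points than the degree of the numerators involved is enough to confirm the identity for Examples 1 and 2.

```python
from fractions import Fraction

def agree(lhs, rhs, samples):
    """Return True if lhs(x) == rhs(x) exactly at every sample point."""
    return all(lhs(x) == rhs(x) for x in samples)

# Example 1: 5/(x^2 + x - 6) = -1/(x + 3) + 1/(x - 2)
def lhs1(x):
    return 5 / (x * x + x - 6)

def rhs1(x):
    return -1 / (x + 3) + 1 / (x - 2)

# Example 2: 9/((x - 1)(x + 2)^2) = 1/(x - 1) - 1/(x + 2) - 3/(x + 2)^2
def lhs2(x):
    return 9 / ((x - 1) * (x + 2) ** 2)

def rhs2(x):
    return 1 / (x - 1) - 1 / (x + 2) - 3 / (x + 2) ** 2

# Sample points chosen to avoid the zeros of the denominators (-3, -2, 1, 2).
samples = [Fraction(k, 2) for k in (-9, -5, -1, 0, 1, 3, 5, 7)]
print(agree(lhs1, rhs1, samples), agree(lhs2, rhs2, samples))  # True True
```

Using `Fraction` rather than floating-point numbers means the comparison is exact, so a `True` here is not an artefact of rounding.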
Let me conclude this section by sketching out why the partial fraction
decomposition of real rational functions is possible.
Consider the proper rational function
\[ \frac{f(x)}{a(x)b(x)} \]
where $a(x)$ and $b(x)$ are coprime. Then we indicated above that we may
write
\[ \frac{f(x)}{a(x)b(x)} = \frac{A(x)}{a(x)} + \frac{B(x)}{b(x)} \]
where the rational functions are all proper. This may be generalised as
follows. Let $\frac{f(x)}{g(x)}$ be a proper rational function. Let $g(x) = a_1(x) \cdots a_m(x)$ be a
product of pairwise coprime polynomials. Then we may write
\[ \frac{f(x)}{g(x)} = \sum_{i=1}^{m} \frac{A_i(x)}{a_i(x)}, \]
where the rational functions are all proper.
We shall now assume that the $a_i(x)$ are either powers of linear factors or
of quadratic factors and that these factors are distinct for different i.
Consider the proper rational function $\frac{h(x)}{(x - a)^r}$ where $r \geq 1$. Then we may
write
\[ h(x) = a_0 + a_1(x - a) + \ldots + a_{r-1}(x - a)^{r-1} \]
for some real numbers $a_0, \ldots, a_{r-1}$ in a way analogous to writing a natural
number in a number base. Thus
\[ \frac{h(x)}{(x - a)^r} = \frac{a_{r-1}}{x - a} + \ldots + \frac{a_1}{(x - a)^{r-1}} + \frac{a_0}{(x - a)^r}. \]
Consider the proper rational function $\frac{h(x)}{(x^2 + ax + b)^r}$ where $r \geq 1$. Then we
may similarly write
\[ h(x) = (a_0 x + b_0) + (a_1 x + b_1)(x^2 + ax + b) + \ldots + (a_{r-1} x + b_{r-1})(x^2 + ax + b)^{r-1} \]
for some real numbers $a_0, \ldots, a_{r-1}$ and $b_0, \ldots, b_{r-1}$ in a way analogous to
writing a natural number in a number base. Thus
\[ \frac{h(x)}{(x^2 + ax + b)^r} = \frac{a_{r-1} x + b_{r-1}}{x^2 + ax + b} + \ldots + \frac{a_1 x + b_1}{(x^2 + ax + b)^{r-1}} + \frac{a_0 x + b_0}{(x^2 + ax + b)^r}. \]
The existence of partial fraction decompositions of real rational functions
now follows.
6.4 Integrating rational functions
In order to appreciate the significance of partial fractions it is essential to
understand how they are used. The goal of this section is therefore to show
you how to calculate
\[ \int \frac{f(x)}{g(x)} \, dx \]
exactly, when $f(x)$ and $g(x)$ are real polynomials.
We need to know one key property of integration: namely, if $a_i$ are real
numbers then
\[ \int \sum_{i=1}^{n} a_i f_i(x) \, dx = \sum_{i=1}^{n} a_i \int f_i(x) \, dx. \]
This property is known as linearity. I shall break my discussion up into a
number of steps.
Step 1. Suppose that in $\frac{f(x)}{g(x)}$ we have that $\deg f(x) \geq \deg g(x)$. By the
Remainder Theorem for polynomials we can write
\[ \frac{f(x)}{g(x)} = q(x) + \frac{r(x)}{g(x)} \]
where $\deg r(x) < \deg g(x)$. By the linearity of integration, we have that
\[ \int \frac{f(x)}{g(x)} \, dx = \int q(x) \, dx + \int \frac{r(x)}{g(x)} \, dx. \]
In other words, to integrate an arbitrary rational function it is enough to
know how to integrate polynomials and proper rational functions.
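Step 1 relies on dividing one polynomial by another with remainder. A minimal sketch of that division follows, assuming polynomials are stored as lists of coefficients with the highest power first; this representation and the name `poly_divmod` are illustrative choices of mine, not the author's.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g; both are coefficient lists, highest power first.

    Returns (q, r) with f = q*g + r and deg r < deg g.
    """
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    if not g or g[0] == 0:
        raise ValueError("the leading coefficient of g must be non-zero")
    q = []
    r = f[:]
    while len(r) >= len(g):
        coeff = r[0] / g[0]
        q.append(coeff)
        # Subtract coeff * g, aligned with the leading term of r.
        for i in range(len(g)):
            r[i] -= coeff * g[i]
        r.pop(0)  # the leading term is now zero
    return q, r

# (x^3 + 2x + 1) divided by (x^2 + x - 6)
q, r = poly_divmod([1, 0, 2, 1], [1, 1, -6])
print(q == [1, -1], r == [9, -5])  # True True
```

So $x^3 + 2x + 1 = (x - 1)(x^2 + x - 6) + (9x - 5)$, and the rational function splits into the polynomial $x - 1$ plus the proper rational function $\frac{9x - 5}{x^2 + x - 6}$.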
Step 2. By linearity of integration, integrating arbitrary polynomials can
be reduced to integrating the following
\[ \int x^n \, dx \]
where $n \geq 0$.
Step 3. Let $\frac{f(x)}{g(x)}$ be a proper rational function, so that $\deg f(x) < \deg g(x)$.
We may factorise $g(x)$ into a product of real linear polynomials and real
irreducible quadratic polynomials and then write $\frac{f(x)}{g(x)}$ as a sum of rational
functions of one of the following two forms
\[ \frac{a}{(x - d)^r}, \]
where a and d are real and $r \geq 1$, and
\[ \frac{px + q}{(x^2 + bx + c)^s} \]
where p, q, b, c are real and $s \geq 1$ and the quadratic has a pair of complex
conjugate roots. By the linearity of integration, this reduces calculating
\[ \int \frac{f(x)}{g(x)} \, dx \]
to calculating integrals of the following two forms
\[ \int \frac{a}{(x - d)^r} \, dx \quad \text{and} \quad \int \frac{px + q}{(x^2 + bx + c)^s} \, dx. \]
Again by linearity of integration, this reduces to being able to calculate the
following three integrals
\[ \int \frac{1}{(x - d)^r} \, dx, \qquad \int \frac{x}{(x^2 + bx + c)^s} \, dx, \qquad \int \frac{1}{(x^2 + bx + c)^s} \, dx. \]
Step 4. We now concentrate on the two integrals involving quadratics. By
completing the square, we can write
\[ x^2 + bx + c = \left(x + \frac{b}{2}\right)^2 + \left(c - \frac{b^2}{4}\right). \]
By assumption $b^2 - 4c < 0$ (why?). Put $e^2 = \frac{4c - b^2}{4}$ (which makes sense).
Thus
\[ x^2 + bx + c = \left(x + \frac{b}{2}\right)^2 + e^2. \]
I shall now use a technique of calculus known as substitution and put
$y = x + \frac{b}{2}$. Doing this, and returning to x as my variable, we need to be able to
calculate the following three integrals
\[ \int \frac{1}{(x - d)^r} \, dx, \qquad \int \frac{x}{(x^2 + e^2)^s} \, dx, \qquad \int \frac{1}{(x^2 + e^2)^s} \, dx. \]
Step 5. The second integral above can be converted into the first by means
of the substitution $x^2 = u$.
We have therefore proved the following.
Theorem 6.4.1. The integration of an arbitrary rational function can be
reduced to integrals of the following three kinds:
1. $\int x^n \, dx$.
2. $\int \frac{1}{(x - d)^r} \, dx$.
3. $\int \frac{1}{(x^2 + e^2)^s} \, dx$.
Chapter 7
Matrices I: linear equations
The term matrix was introduced by James Joseph Sylvester (1814–1897)
in 1850, and the first paper on matrix algebra was published by Arthur
Cayley (1821–1895) in 1858 [1]. Matrices were introduced initially as packaging
for systems of linear equations, but then came to be investigated in their
own right. The main goal of this chapter is to introduce the basics of the
arithmetic and algebra of matrices. This chapter and the two that follow
form the first steps in the subject known as linear algebra. It is hard to
overemphasize the importance of this subject throughout mathematics and
its applications.
7.1 Matrix arithmetic
In this section, I shall introduce matrices and three arithmetic operations
defined on them. I shall also define an operation called the ‘transpose of a
matrix’ that will be important in later work. This section forms the foundation for all that follows.
7.1.1 Basic matrix definitions
A matrix [2] is a rectangular array of numbers. In this course, the numbers will
usually be real numbers but, on occasion, I shall also use complex numbers
for variety.
[1] A memoir on matrices, Philosophical Transactions of the Royal Society of London 148
(1858), 17–37. This is well worth reading.
[2] Plural: matrices.
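Example 7.1.1. The following are all matrices:
\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}, \quad \begin{pmatrix} 6 \end{pmatrix}. \]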
Usually the array of numbers that comprises a matrix is enclosed in round
brackets. Occasionally books use square brackets with the same meaning.
Later on, I shall introduce determinants and these are indicated by using
straight brackets. In general, the kind of brackets you use is important and
is not just a matter of taste.
We usually denote matrices by capital Roman letters: A, B, C, etc. The
size of a matrix is m × n if it has m rows and n columns. The entries in a
matrix are often called the elements of the matrix and are usually denoted
by lower case Roman letters. If A is an m × n matrix, and 1 ≤ i ≤ m and
1 ≤ j ≤ n, then the entry in the ith row and jth column of A is often denoted
(A)ij . Thus ()ij means ‘the element in ith row and jth column’.
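Examples 7.1.2.
1. Let
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}. \]
Then A is a 2 × 3 matrix. We have that $(A)_{11} = 1$, $(A)_{12} = 2$, $(A)_{13} = 3$,
$(A)_{21} = 4$, $(A)_{22} = 5$, $(A)_{23} = 6$.
2. Let
\[ B = \begin{pmatrix} 4 \\ 1 \end{pmatrix}. \]
Then B is a 2 × 1 matrix. We have that $(B)_{11} = 4$, $(B)_{21} = 1$.
3. Let
\[ C = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}. \]
Then C is a 3 × 3 matrix. $(C)_{11} = 1$, $(C)_{12} = 1$, $(C)_{13} = -1$, $(C)_{21} = 0$,
$(C)_{22} = 2$, $(C)_{23} = 4$, $(C)_{31} = 1$, $(C)_{32} = 1$, $(C)_{33} = 3$.
4. Let
\[ D = \begin{pmatrix} 6 \end{pmatrix}. \]
Then D is a 1 × 1 matrix. We have that $(D)_{11} = 6$.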
Matrices A and B are said to be equal, written A = B, if they have the
same size and corresponding entries are equal: that is, (A)ij = (B)ij for all
allowable i and j.
Example 7.1.3. Given that
\[ \begin{pmatrix} a & 2 & b \\ 4 & 5 & c \end{pmatrix} = \begin{pmatrix} 3 & x & -2 \\ y & z & 0 \end{pmatrix}, \]
find a, b, c, x, y, z. This example simply illustrates what it means for two
matrices to be equal. By definition a = 3, 2 = x, b = −2, 4 = y, 5 = z and
c = 0.
When we want to talk about an arbitrary matrix A we usually denote
its elements by aij where i tells you the row the element lives in and j the
column. For example, a typical 2 × 3 matrix A would be written
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}. \]
7.1.2 Addition, subtraction, scalar multiplication and the transpose
We define first the operations that cause us no trouble.
Addition Let A and B be two matrices of the same size. Then their sum
A + B is the matrix defined by
(A + B)ij = (A)ij + (B)ij .
That is, corresponding entries of A and B are added. If A and B are
not the same size then their sum is not defined.
Subtraction Let A and B be two matrices of the same size. Then their
difference A − B is the matrix defined by
(A − B)ij = (A)ij − (B)ij .
That is, corresponding entries of A and B are subtracted. If A and B
are not the same size then their difference is not defined.
Scalar multiplication In matrix theory, numbers are often called scalars.
For us scalars will usually be either real or complex. Let A be any
matrix and λ any scalar. Then the matrix λA is defined as follows:
(λA)ij = λ(A)ij .
In other words, every element of A is multiplied by λ.
Transpose of a matrix Let A be an m × n matrix. Then the transpose
of A, denoted $A^T$, is the n × m matrix defined by $(A^T)_{ij} = (A)_{ji}$. We
therefore interchange rows and columns: the first row of A becomes the
first column of $A^T$, the second row of A becomes the second column of
$A^T$, and so on.
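The four operations just defined can be sketched in a few lines of code. This is an illustration only: it assumes a matrix is stored as a list of rows, and the function names `add`, `scale`, `subtract` and `transpose` are my own.

```python
def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def scale(lam, A):
    """Multiply every entry of A by the scalar lam."""
    return [[lam * a for a in row] for row in A]

def subtract(A, B):
    """Entrywise difference, computed as A + (-1)B."""
    return add(A, scale(-1, B))

def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, -1], [3, -4, 6]]
B = [[2, 1, 3], [-5, 2, 1]]
print(add(A, B))       # [[3, 3, 2], [-2, -2, 7]]
print(subtract(A, B))  # [[-1, 1, -4], [8, -6, 5]]
print(transpose(A))    # [[1, 3], [2, -4], [-1, 6]]
```

Calling `add` on matrices of different sizes raises an error, mirroring the fact that the sum is then not defined.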
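Examples 7.1.4.
1.
\[ \begin{pmatrix} 1 & 2 & -1 \\ 3 & -4 & 6 \end{pmatrix} + \begin{pmatrix} 2 & 1 & 3 \\ -5 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 1 + 2 & 2 + 1 & -1 + 3 \\ 3 + (-5) & -4 + 2 & 6 + 1 \end{pmatrix} \]
which gives
\[ \begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix}. \]
2.
\[ \begin{pmatrix} 1 & 2 & -1 \\ 3 & -4 & 6 \end{pmatrix} - \begin{pmatrix} 2 & 1 & 3 \\ -5 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 1 - 2 & 2 - 1 & -1 - 3 \\ 3 - (-5) & -4 - 2 & 6 - 1 \end{pmatrix} \]
which gives
\[ \begin{pmatrix} -1 & 1 & -4 \\ 8 & -6 & 5 \end{pmatrix}. \]
3.
\[ \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix} - \begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix} \]
is not defined since the matrices have different sizes.
4.
\[ 2 \begin{pmatrix} 3 & 3 & 2 \\ -2 & -2 & 7 \end{pmatrix} = \begin{pmatrix} 6 & 6 & 4 \\ -4 & -4 & 14 \end{pmatrix}. \]
5. The transposes of the following matrices
\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{pmatrix}, \quad \begin{pmatrix} 6 \end{pmatrix} \]
are, respectively,
\[ \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}, \quad \begin{pmatrix} 4 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 1 \\ 1 & 2 & 1 \\ -1 & 4 & 3 \end{pmatrix}, \quad \begin{pmatrix} 6 \end{pmatrix}. \]
7.1.3 Matrix multiplication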
This is more complicated than the other operations and, like them, is not
always defined. To define this operation it is useful to work with two special
classes of matrix. A row matrix or row vector is a matrix with one row (but
any number of columns). A column matrix or column vector is a matrix with
one column but any number of rows. Row and column matrices are often
denoted by bold lower case Roman letters a, b, c . . .. The ith element of the
row or column matrix a will be denoted by ai .
Examples 7.1.5. The matrix
\[ \begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix} \]
is a row matrix whilst
\[ \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \]
is a column matrix.
I shall build up to the definition of matrix multiplication in three stages.
Stage 1. Let a be a row matrix and b a column matrix, where
\[ \mathbf{a} = \begin{pmatrix} a_1 & a_2 & \ldots & a_m \end{pmatrix} \quad \text{and} \quad \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}. \]
Then their product ab is defined if, and only if, the number of columns of
a is equal to the number of rows of b, that is m = n, in which case their
product is the 1 × 1 matrix
\[ \mathbf{a}\mathbf{b} = \begin{pmatrix} a_1 b_1 + a_2 b_2 + \ldots + a_n b_n \end{pmatrix}. \]
The number
\[ a_1 b_1 + a_2 b_2 + \ldots + a_n b_n \]
is called the inner product of a and b and is denoted by a · b. Using this
notation we have that
\[ \mathbf{a}\mathbf{b} = (\mathbf{a} \cdot \mathbf{b}). \]
Example 7.1.6. This odd way of multiplying is actually quite natural.
Here's an example of where it arises in real life. If you buy y items whose
unit cost is x then you spend xy. This can be generalized as follows when
you buy a number of different kinds of items at different prices. Let a be the
row matrix
\[ \begin{pmatrix} 0.6 & 1 & 0.2 \end{pmatrix} \]
where 0.6 is the price of a bottle of milk, 1 is the price of a loaf of bread,
and 0.2 is the price of an egg. Let b be the column matrix
\[ \begin{pmatrix} 2 \\ 3 \\ 10 \end{pmatrix} \]
where 2 is the number of bottles of milk bought, 3 is the number of loaves
of bread bought, and 10 is the number of eggs bought. Thus a is the price
row matrix and b is the quantity column matrix. The total amount spent is
therefore
\[ 0.6 \times 2 + 1 \times 3 + 0.2 \times 10: \]
namely, the sum over all the commodities bought of the price of each commodity times the number of items of that commodity purchased. This number is precisely the inner product a · b: namely, 6.20.
Stage 2. Let a be a row matrix as above and let B be a matrix. Thus a is
a 1 × m matrix and B is a p × q matrix. Then their product aB is defined
if, and only if, the number of columns of a is equal to the number of rows
of B. Thus m = p. To calculate the product think of B as consisting of q
column matrices b1 , . . . , bq . We calculate the q numbers a · b1 , . . . , a · bq as
in stage 1, and the q numbers that result become the entries of aB. Thus
aB is a 1 × q matrix whose jth entry is the number a · bj .
Example 7.1.7. Let a be the cost matrix of our previous example. Let B be
the 3 × 5 matrix whose columns tell me the quantity of commodities bought
on each of the days of the week Monday to Friday:
\[ B = \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix}. \]
Thus on Tuesday and Thursday no purchases were made, whilst on Friday
extra commodities were bought in preparation for the weekend. The matrix
aB is a 1 × 5 matrix which tells us how much was spent on each day of the
week. Thus
\[ \mathbf{a}B = \begin{pmatrix} 0.6 & 1 & 0.2 \end{pmatrix} \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix} \]
which is equal to
\[ \begin{pmatrix} 6.2 & 0 & 7.2 & 0 & 14.4 \end{pmatrix}. \]
Stage 3. Let A be an m × n matrix and let B be a p × q matrix. Their
product AB is defined if, and only if, the number of columns of A is equal
to the number of rows of B: that is n = p. If this is so then AB is an m × q
matrix. To define this product we think of A as consisting of m row matrices
a1 , . . . , am and we think of B as consisting of q column matrices b1 , . . . , bq .
As in Stage 2 above, we multiply the first row of A into each of the columns
of B and this gives us the first row of AB; we then multiply the second row of
A into each of the columns of B to get the second row of AB, and so on.
Example 7.1.8. Let B be the 3 × 5 matrix of the previous example whose
columns tell me the quantity of commodities bought on each of the days
Monday to Friday
\[ B = \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix}. \]
Let A be the 2 × 3 matrix whose first row tells me the cost of the commodities
in shop 1 and whose second row tells me the cost of the commodities in shop 2.
\[ A = \begin{pmatrix} 0.6 & 1 & 0.2 \\ 0.65 & 1.05 & 0.30 \end{pmatrix} \]
The first row of AB tells me how much was spent on each day of the week
in shop 1, and the second row of AB tells me how much was spent on each
day of the week in shop 2. Thus
\[ AB = \begin{pmatrix} 0.6 & 1 & 0.2 \\ 0.65 & 1.05 & 0.30 \end{pmatrix} \begin{pmatrix} 2 & 0 & 2 & 0 & 4 \\ 3 & 0 & 4 & 0 & 8 \\ 10 & 0 & 10 & 0 & 20 \end{pmatrix} \]
which is equal to
\[ \begin{pmatrix} 6.2 & 0 & 7.2 & 0 & 14.4 \\ 7.45 & 0 & 8.5 & 0 & 17 \end{pmatrix}. \]
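Examples 7.1.9.
1.
\[ \begin{pmatrix} 1 & -1 & 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \\ -1 \\ 3 \end{pmatrix} = \begin{pmatrix} 0 \end{pmatrix} \]
2. The product
\[ \begin{pmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & -2 & 3 \\ 2 & 1 & -1 \end{pmatrix} \]
doesn't exist because the number of columns of the first matrix is not
equal to the number of rows of the second matrix.
3. The product
\[ \begin{pmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{pmatrix} \begin{pmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{pmatrix} \]
exists because the first matrix is a 2 × 3 and the second is a 3 × 4. Thus
the product will be a 2 × 4 matrix and is
\[ \begin{pmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{pmatrix}. \]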
Summary of matrix multiplication
• Let A be an m × n matrix and B a p × q matrix. The product AB
is defined if, and only if, n = p and the result will then be an m × q
matrix. In other words:
(m × n)(n × q) = (m × q).
• (AB)ij is the inner product of the ith row of A and the jth column of
B.
• It follows that the inner product of the ith row of A and each of the
columns of B in turn yields each of the elements of the ith row of AB
in turn.
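If $\mathbf{a}_i$ are row matrices and $\mathbf{b}_j$ are column matrices then the product of
two matrices can be written as follows
\[ \begin{pmatrix} \mathbf{a}_1 \\ \vdots \\ \mathbf{a}_m \end{pmatrix} \begin{pmatrix} \mathbf{b}_1 & \ldots & \mathbf{b}_n \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1 \cdot \mathbf{b}_1 & \ldots & \mathbf{a}_1 \cdot \mathbf{b}_n \\ \vdots & \ddots & \vdots \\ \mathbf{a}_m \cdot \mathbf{b}_1 & \ldots & \mathbf{a}_m \cdot \mathbf{b}_n \end{pmatrix}. \]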
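The summary above translates almost word for word into code. The sketch below assumes matrices are stored as lists of rows; the name `matmul` is my own choice.

```python
def matmul(A, B):
    """Return AB, where entry (i, j) is the inner product of row i of A
    with column j of B. Raises ValueError if the sizes do not match."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    if n != p:
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]
print(matmul(A, B))  # [[12, 27, 30, 13], [8, -4, 26, 12]]
```

The printed result is the 2 × 4 product computed by hand in the last of Examples 7.1.9.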
7.1.4 Special matrices
Matrices come in all shapes and sizes, but some of these are important enough
to warrant their own terminology. A matrix all of whose elements are zero is
called a zero matrix. The m × n zero matrix is denoted Om,n or just O and
we let the context determine the size of O. A square matrix is one in which
the number of rows is equal to the number of columns. In a square matrix A
the elements (A)11 , (A)22 , . . . , (A)nn are called the diagonal elements. All the
other elements of A are called the off-diagonal elements. A diagonal matrix is
a square matrix in which all off-diagonal elements are zero. A scalar matrix
is a diagonal matrix in which the diagonal elements are all the same. The
n × n identity matrix is the scalar matrix in which all the diagonal elements
are the number one. This is denoted by In or just I where we allow the
context to determine the size of I. Thus scalar matrices are those of the
form λI where λ is any scalar. A matrix is real if all its elements are real
numbers, and complex if all its elements are complex numbers. A matrix A
is said to be symmetric if AT = A. In particular, symmetric matrices are
always square.
Examples 7.1.10.
1. The matrix
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \]
is a 3 × 3 diagonal matrix.
2. The matrix
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
is the 4 × 4 identity matrix.
3. The matrix
\[ \begin{pmatrix} 42 & 0 & 0 & 0 & 0 \\ 0 & 42 & 0 & 0 & 0 \\ 0 & 0 & 42 & 0 & 0 \\ 0 & 0 & 0 & 42 & 0 \\ 0 & 0 & 0 & 0 & 42 \end{pmatrix} \]
is a 5 × 5 scalar matrix.
4. The matrix
\[ \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \]
is a 6 × 5 zero matrix.
5. The matrix
\[ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix} \]
is a 3 × 3 symmetric matrix.
7.1.5 Linear equations
Matrices are extremely useful in helping us to solve systems of linear equations. For the time being, I shall simply show you how matrices provide a
convenient notation for writing down such equations.
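A system of m linear equations in n unknowns is a list of equations of the
following form
\[ \begin{array}{rcl} a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n & = & b_1 \\ a_{21} x_1 + a_{22} x_2 + \ldots + a_{2n} x_n & = & b_2 \\ & \cdots & \\ a_{m1} x_1 + a_{m2} x_2 + \ldots + a_{mn} x_n & = & b_m \end{array} \]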
If we have only a few unknowns then we often use w, x, y, z rather than
x1 , x2 , x3 , x4 . A solution is a set of values of x1 , . . . , xn that satisfy all the
equations. The set of all solutions is called the solution set or general solution.
The equations above can be conveniently represented using matrices. Let A
be the m × n matrix (A)ij = aij , let b be the m × 1 matrix (b)i = bi , and let
x be the n × 1 matrix (x)j = xj . Then the system of linear equations above
can be written in the form
Ax = b.
The matrix A is called the coefficient matrix. At the moment, we are just
using matrices as packaging for the equations.
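Example 7.1.11. The following system of linear equations
\[ \begin{array}{rcl} 2x + 3y & = & 1 \\ x + y & = & 2 \end{array} \]
may be written in matrix form as follows.
\[ \begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]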
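To see the packaging Ax = b at work, the sketch below tests whether a candidate column x satisfies the system in the example. The helper names `mat_vec` and `is_solution` are my own, and the candidate x = 5, y = −3 was found by hand (subtracting twice the second equation from the first); it is not given in the text.

```python
def mat_vec(A, x):
    """Multiply an m x n matrix A (list of rows) by a column of n numbers x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_solution(A, x, b):
    """Return True if Ax equals b entry by entry."""
    return mat_vec(A, x) == b

A = [[2, 3], [1, 1]]   # coefficient matrix
b = [1, 2]             # right-hand sides
print(is_solution(A, [5, -3], b))  # True
print(is_solution(A, [1, 1], b))   # False
```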
7.1.6 Conics and quadrics
We have dealt with polynomial equations in one unknown and, in this chapter, we shall deal with linear equations in several unknowns. But what about
equations in several unknowns where both products and powers of the unknowns can occur? The simplest class of such equations are the conics. These
are equations of the form
\[ ax^2 + bxy + cy^2 + dx + ey + f = 0 \]
where a, b, c, d, e, f are numbers of some kind. These are equations in two
variables in which each term is either a constant (degree zero), a linear term
in x or y, or a product of two variables such as xy or $x^2$. In
general, the roots or zeroes of such equations form curves in the plane such
as circles, ellipses and hyperbolas. The term conic arises from the way that
they were first defined by the Greeks as those curves that arise when you cut
a double cone by means of a plane. These curves are important in astronomy
since it can be proved that the orbits of satellites, planets, space-craft etc
always follow conics. The reason for introducing them here is that they can
be represented as matrix equations as follows.
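\[ \mathbf{x}^T A \mathbf{x} + J^T \mathbf{x} + (f) = (0) \]
where
\[ A = \begin{pmatrix} a & \frac{1}{2} b \\ \frac{1}{2} b & c \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad J = \begin{pmatrix} d \\ e \end{pmatrix}. \]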
This is not just a notational convenience. The fact that the matrix A is
symmetric means that powerful ideas from matrix theory, to be developed
later in this book, can be brought to bear on studying such conics. If we
replace the x above by the matrix
\[ \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]
and A by a 3 × 3 symmetric matrix and J by a 3 × 1 matrix then we get
the matrix equation of a quadric surface. Examples of such surfaces are the
surface of a sphere or the surface described by a cooling tower. But even
though we are dealing with three rather than two dimensions, the matrix
algebra we shall develop applies just as well.
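As a sanity check on the matrix form of a conic, the sketch below evaluates both the polynomial and the matrix expression at a few points and confirms that they agree. It assumes the symmetric matrix has a and c on the diagonal and b/2 off the diagonal, that J holds the coefficients of the two linear terms, and that the 1 × 1 matrix holds the constant term. The function names and the sample coefficients are mine.

```python
from fractions import Fraction

def conic_polynomial(coeffs, x, y):
    """Evaluate a x^2 + b x y + c y^2 + (linear terms) + constant."""
    a, b, c, d, e, f = coeffs
    return a * x * x + b * x * y + c * y * y + d * x + e * y + f

def conic_matrix_form(coeffs, x, y):
    """Evaluate x^T A x + J^T x + constant with A symmetric."""
    a, b, c, d, e, f = coeffs
    half_b = Fraction(b, 2)
    A = [[a, half_b], [half_b, c]]
    v = [x, y]
    J = [d, e]
    quadratic = sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))
    linear = sum(J[i] * v[i] for i in range(2))
    return quadratic + linear + f

coeffs = (1, 3, -2, 4, 5, -6)   # hypothetical coefficients for illustration
points = [(Fraction(1, 2), Fraction(-3)), (Fraction(2), Fraction(7, 3)),
          (Fraction(0), Fraction(1))]
print(all(conic_polynomial(coeffs, x, y) == conic_matrix_form(coeffs, x, y)
          for x, y in points))  # True
```

Splitting the coefficient of xy equally between the two off-diagonal entries is what makes A symmetric.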
7.1.7 Graphs
The word ‘graph’ is used in two, completely different, ways in mathematics:
to mean the graph of a function, and to mean a certain kind of diagram. It
is in the second sense that we shall use it here. A graph consists of a set of
vertices and a collection of edges, where this collection may contain repeats.
By an edge we mean either a set of two vertices or a single vertex. Graphs
are represented by means of diagrams: the vertices are represented by circles
and the edges by means of lines joining the vertices. An edge joining a vertex
to itself is called a loop. An example of a graph is given below.
[Figure: a graph with five vertices labelled 1, 2, 3, 4 and 5.]
This information can be represented by means of a 5 × 5 symmetric matrix G
given below. The entry $(G)_{ij}$ is just the number of edges connecting vertices
i and j.
\[ G = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{pmatrix} \]
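The matrix G can be built mechanically from a list of edges. In the sketch below the edge list is read off from the non-zero entries of G itself, since the diagram is not reproduced here, and a loop is assumed to contribute 1 to the corresponding diagonal entry. The function name `adjacency_matrix` is my own.

```python
def adjacency_matrix(n, edges):
    """Return the n x n matrix whose (i, j) entry counts edges joining i and j.

    Vertices are numbered 1..n; each edge is a pair (u, v).
    """
    G = [[0] * n for _ in range(n)]
    for u, v in edges:
        G[u - 1][v - 1] += 1
        if u != v:               # a loop is recorded once, on the diagonal
            G[v - 1][u - 1] += 1
    return G

edges = [(1, 2), (1, 4), (2, 3), (3, 4), (3, 5), (4, 5)]
G = adjacency_matrix(5, edges)
for row in G:
    print(row)
# [0, 1, 0, 1, 0]
# [1, 0, 1, 0, 0]
# [0, 1, 0, 1, 1]
# [1, 0, 1, 0, 1]
# [0, 0, 1, 1, 0]
```

Because each edge between distinct vertices is recorded in both positions (i, j) and (j, i), the resulting matrix is automatically symmetric.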
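Exercises 7.1
1. Let $A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 4 \\ -1 & 1 \\ 0 & 3 \end{pmatrix}$. Find A + B, A − B and
−3B.
2. Let $A = \begin{pmatrix} 0 & 4 & 2 \\ -1 & 1 & 3 \\ 2 & 0 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & -3 & 5 \\ 2 & 0 & -4 \\ 3 & 2 & 0 \end{pmatrix}$. Find the matrices
AB and BA.
3. Let $A = \begin{pmatrix} 3 & 1 \\ 0 & -1 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 1 \\ -1 & 1 \\ 3 & 1 \end{pmatrix}$ and $C = \begin{pmatrix} 1 & 0 & 3 \\ -1 & 1 & 1 \end{pmatrix}$.
Calculate BA, AA and CB. Can any other pairs of these matrices be
multiplied? Multiply those which can.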
4. Calculate
\[ \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \]
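5. Let $A = \begin{pmatrix} 2 & 1 \\ -1 & 0 \\ 2 & 3 \end{pmatrix}$, $B = \begin{pmatrix} 3 & 0 \\ -2 & 1 \end{pmatrix}$ and $C = \begin{pmatrix} -1 & 2 & 3 \\ 4 & 0 & 1 \end{pmatrix}$. Calculate
both (AB)C and A(BC) and check that you get the same answer.
6. Calculate
\[ \begin{pmatrix} 2 & -1 & 2 \\ 1 & 2 & -4 \\ 3 & -1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]
7. Calculate
\[ \begin{pmatrix} 2 + i & 1 + 2i \\ i & 3 + i \end{pmatrix} \begin{pmatrix} 2i & 2 + i \\ 1 + i & 1 + 2i \end{pmatrix} \]
where i is the complex number i.
8. Calculate
\[ \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix} \begin{pmatrix} d & 0 & 0 \\ 0 & e & 0 \\ 0 & 0 & f \end{pmatrix} \]
9. Calculate
(a)
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]
(b)
\[ \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]
(c)
\[ \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
10. Find the transposes of each of the following matrices
\[ A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ -1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -3 & 5 \\ 2 & 0 & -4 \\ 3 & 2 & 0 \end{pmatrix}, \quad C = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \]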
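11. This question deals with the following 4 matrices with complex entries
and their negatives: I, X, Y, Z where
\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} \quad \text{and} \quad Z = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}. \]
Show that the product of any two such matrices is again a matrix of
this type by completing the following table for multiplication where the
entry in row A and column B is AB in that order.
\[ \begin{array}{c|cccccccc} & I & X & Y & Z & -I & -X & -Y & -Z \\ \hline I & & & & & & & & \\ X & & & & & & & & \\ Y & & & & & & & & \\ Z & & & & & & & & \\ -I & & & & & & & & \\ -X & & & & & & & & \\ -Y & & & & & & & & \\ -Z & & & & & & & & \end{array} \]
7.2 Matrix algebra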
In this section, we shall look at algebra where the variables are matrices.
This algebra is similar to high-school algebra but also differs significantly in
one or two places. For example, if A and B are matrices it is not true in
general that AB = BA even if both products are defined. We will learn in
this section which rules of high-school algebra apply to matrices and which
do not.
7.2.1 Properties of matrix addition
In Chapter 3, I introduced the idea of a binary operation. Matrix addition
and multiplication both have two inputs just as addition and multiplication
of real numbers, but there is an added complication that not all pairs of
matrices can be added and not all pairs of matrices can be multiplied. Despite
this difference, I shall nevertheless use the same terminology I introduced in
Chapter 3 but in this slightly different setting.
(MA1) (A + B) + C = A + (B + C). This is the associative law for matrix
addition.
(MA2) A + O = A = O + A. The zero matrix O, the same size as A, is the
additive identity for matrices the same size as A.
(MA3) A + (−A) = O = (−A) + A. The matrix −A is the unique additive
inverse of A.
(MA4) A + B = B + A. Matrix addition is commutative.
Thus matrix addition has the same properties as the addition of real numbers, apart from the fact that the sum of two matrices is only defined when
they have the same size. The role of zero is played by the zero matrix O of
the appropriate size.
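Example 7.2.1. Calculate
\[ 2A - 3B + 6I \]
where
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}. \]
Because we are dealing with matrix addition and scalar multiplication the
rules we apply are the same as those in high-school algebra. We get
\[ \begin{pmatrix} 8 & 1 \\ 0 & 11 \end{pmatrix}. \]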
7.2.2 Properties of matrix multiplication
(MM1) (AB)C = A(BC). This is the associative law for matrix multiplication.
(MM2) Let A be an m × n matrix. Then Im A = A = AIn . The matrices Im
and In are the left and right multiplicative identities, respectively.
(MM3) A(B + C) = AB + AC and (B + C)A = BA + CA. These are the
left and right distributivity laws for matrix multiplication over matrix
addition.
Thus matrix multiplication has the same properties as the multiplication
of real numbers, apart from the fact that the product is not always defined
and apart from the following three major differences.
1. Matrix multiplication is not commutative.
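Consider the matrices
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \]
Then $AB \neq BA$. One consequence of the fact that matrix multiplication is not commutative is that
\[ (A + B)^2 \neq A^2 + 2AB + B^2, \]
in general (see below).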
2. The product of two matrices can be a zero matrix without either
matrix being a zero matrix.
Consider the matrices
\[ A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -2 & -6 \\ 1 & 3 \end{pmatrix}. \]
Then $AB = O$.
3. Cancellation of matrices is not allowed.
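Consider the matrices
\[ A = \begin{pmatrix} 0 & 2 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} -1 & 1 \\ 1 & 4 \end{pmatrix}. \]
Then $A \neq O$ and $AB = AC$ but $B \neq C$.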
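The three differences listed above can be checked numerically with the matrices given in the text. This is a small sketch, assuming matrices are lists of rows; the function name `matmul` and the variable names are illustrative only.

```python
def matmul(M, N):
    """Product of an m x n matrix M and an n x p matrix N (lists of rows)."""
    if any(len(row) != len(N) for row in M):
        raise ValueError("product is not defined for these sizes")
    return [[sum(M[i][j] * N[j][k] for j in range(len(N)))
             for k in range(len(N[0]))]
            for i in range(len(M))]

# 1. Matrix multiplication is not commutative.
A1, B1 = [[1, 2], [3, 4]], [[1, 1], [-1, 1]]
AB1 = matmul(A1, B1)  # [[-1, 3], [-1, 7]]
BA1 = matmul(B1, A1)  # [[4, 6], [2, 2]]

# 2. A product can be zero although neither factor is zero.
A2, B2 = [[1, 2], [2, 4]], [[-2, -6], [1, 3]]
AB2 = matmul(A2, B2)  # [[0, 0], [0, 0]]

# 3. Cancellation fails: AB = AC with A non-zero, yet B differs from C.
A3, B3, C3 = [[0, 2], [0, 1]], [[2, 3], [1, 4]], [[-1, 1], [1, 4]]
AB3 = matmul(A3, B3)  # [[2, 8], [1, 4]]
AC3 = matmul(A3, C3)  # [[2, 8], [1, 4]]

print(AB1 != BA1, AB2, AB3 == AC3 and B3 != C3)
```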
Just how different matrix algebra is from high-school algebra is shown by
the following example.
Example 7.2.2. Suppose that $X^2 = I$. Then $X^2 - I = O$ and so we may factorize to get $(X - I)(X + I) = O$. But we cannot conclude from this that $X = I$ or $X = -I$, because we cannot conclude from the fact that the product of two matrices is a zero matrix that one of the matrices must itself be a zero matrix. We have seen that this is false. We therefore cannot deduce that the identity matrix has only two square roots. In fact, it has infinitely many as we now show. Let
\[ A = \begin{pmatrix} a & b \\ c & -a \end{pmatrix} \]
and suppose that $a^2 + bc = 1$. Check that $A^2 = I$. Examples of matrices satisfying these conditions are
\[ \begin{pmatrix} \sqrt{1 + n^2} & -n \\ n & -\sqrt{1 + n^2} \end{pmatrix} \]
where n is any positive integer. Thus the 2 × 2 identity matrix has infinitely many square roots.
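The claim that $A^2 = I$ whenever $a^2 + bc = 1$ can be tested by machine. The sketch below deliberately uses a different family from the one in the example, namely a = n, b = 1 − n², c = 1, chosen here (as an assumption for illustration) because it also satisfies a² + bc = 1 and keeps all arithmetic in exact integers, avoiding floating-point square roots. The function names are hypothetical.

```python
def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(M[i][j] * N[j][k] for j in range(len(N)))
             for k in range(len(N[0]))]
            for i in range(len(M))]

def square_root_of_identity(n):
    """A 2 x 2 matrix (a b; c -a) with a = n, b = 1 - n^2, c = 1.

    Then a^2 + bc = n^2 + (1 - n^2) = 1, so the matrix squares to I.
    """
    return [[n, 1 - n * n], [1, -n]]

I = [[1, 0], [0, 1]]
roots = [square_root_of_identity(n) for n in range(1, 6)]
print(all(matmul(R, R) == I for R in roots))  # True
```

Each value of n gives a different matrix, so this family alone already supplies infinitely many square roots of the identity.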
7.2.3 Properties of scalar multiplication
(S1) 1A = A.
(S2) λ(A + B) = λA + λB
(S3) (λµ)A = λ(µA).
(S4) (λ + µ)A = λA + µA.
(S5) (λA)B = A(λB) = λ(AB).
7.2.4 Properties of the transpose
(T1) $(A^T)^T = A$.
(T2) $(A + B)^T = A^T + B^T$.
(T3) $(\alpha A)^T = \alpha A^T$.
(T4) $(AB)^T = B^T A^T$.
It is important to observe that the transpose of a product reverses the
order of the matrices.
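The reversal of order in the transpose of a product is easy to see with non-square matrices, where the product in the other order would not even have the right size. A minimal sketch follows; the matrices and the helper names `transpose` and `matmul` are arbitrary illustrative choices.

```python
def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Product of an m x n matrix M and an n x p matrix N."""
    if any(len(row) != len(N) for row in M):
        raise ValueError("product is not defined for these sizes")
    return [[sum(M[i][j] * N[j][k] for j in range(len(N)))
             for k in range(len(N[0]))]
            for i in range(len(M))]

A = [[1, 2, 3], [4, 5, 6]]      # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]    # 3 x 2

lhs = transpose(matmul(A, B))             # (AB)^T, a 2 x 2 matrix
rhs = matmul(transpose(B), transpose(A))  # B^T A^T, also 2 x 2
print(lhs == rhs)  # True

# A^T B^T is defined here too, but it is 3 x 3, so it cannot equal (AB)^T.
wrong_order = matmul(transpose(A), transpose(B))
print(len(wrong_order), len(wrong_order[0]))  # 3 3
```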
There are some important consequences of the above properties:
• Because matrix addition is associative we can write sums without brackets.
• Because matrix multiplication is associative we can write matrix products without brackets.
• The left and right distributivity laws can be extended to arbitrary finite
sums.
7.2.5 Some proofs
In this section, I shall prove that the algebraic properties of matrices stated
really do hold. I shan’t prove all of them: just a representative sample. I shall
leave you the pleasure of proving the rest. It is important to observe that all
the properties of matrix algebra are ultimately proved using the properties
of real numbers.
Let A be an m × n matrix whose entry in the ith row and jth column is $a_{ij}$. Let B be an n × p matrix whose entry in the jth row and kth column is $b_{jk}$. By definition $(AB)_{ik}$ is the number equal to the product of the ith row of A times the kth column of B. This is just
\[ (AB)_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}. \]
Theorem 7.2.3.
1. (A + B) + C = A + (B + C).
2. A(BC) = (AB)C.
3. (λ + µ)A = λA + µA.
Proof. (1) To show that (A + B) + C = A + (B + C) we have to prove two
things. First, the size of (A + B) + C is the same as the size of A + (B + C).
Second, elements of (A + B) + C and A + (B + C) in corresponding positions
are equal. To add A and B they have to be the same size and the result
will be the same size as both of them. Thus C is the same size as A and B.
It’s clear that both sides of the equation really are the same size. We now
compare corresponding elements:
\[ ((A + B) + C)_{ij} = (A + B)_{ij} + (C)_{ij} = ((A)_{ij} + (B)_{ij}) + (C)_{ij}. \]
But now we use the associativity of addition of real numbers to get
\[ ((A)_{ij} + (B)_{ij}) + (C)_{ij} = (A)_{ij} + ((B)_{ij} + (C)_{ij}) = (A)_{ij} + (B + C)_{ij} = (A + (B + C))_{ij}, \]
as required.
(2) Let A be an m × n matrix with entries $a_{ij}$, let B be an n × p matrix with entries $b_{jk}$, and let C be a p × q matrix with entries $c_{kl}$. It’s evident that A(BC) and (AB)C have the same size, so it remains to show that corresponding elements are the same. We shall prove that
\[ (A(BC))_{il} = ((AB)C)_{il}. \]
By definition
\[ (A(BC))_{il} = \sum_{t=1}^{n} a_{it} (BC)_{tl}, \]
and
\[ (BC)_{tl} = \sum_{s=1}^{p} b_{ts} c_{sl}. \]
Thus
\[ (A(BC))_{il} = \sum_{t=1}^{n} a_{it} \left( \sum_{s=1}^{p} b_{ts} c_{sl} \right). \]
Using distributivity of multiplication over addition for real numbers this sum is just
\[ \sum_{t=1}^{n} \sum_{s=1}^{p} a_{it} b_{ts} c_{sl}. \]
Now change the order in which we add up these real numbers to get
\[ \sum_{s=1}^{p} \sum_{t=1}^{n} a_{it} b_{ts} c_{sl}. \]
Now use distributivity again
\[ \sum_{s=1}^{p} \left( \sum_{t=1}^{n} a_{it} b_{ts} \right) c_{sl}. \]
The sum within the brackets is just $(AB)_{is}$ and so the whole sum is
\[ \sum_{s=1}^{p} (AB)_{is} c_{sl} \]
which is precisely $((AB)C)_{il}$.
(3) Clearly (λ + µ)A and λA + µA have the same sizes. We show that
corresponding elements are the same:
\[ ((\lambda + \mu)A)_{ij} = (\lambda + \mu)(A)_{ij} = \lambda (A)_{ij} + \mu (A)_{ij} = (\lambda A)_{ij} + (\mu A)_{ij} \]
which is just $(\lambda A + \mu A)_{ij}$, as required.
We now prove the properties satisfied by the transpose.
Theorem 7.2.4.
1. $(A^T)^T = A$.
2. $(A + B)^T = A^T + B^T$.
3. $(\alpha A)^T = \alpha A^T$.
4. $(AB)^T = B^T A^T$.
Proof. (1) We have that
\[ ((A^T)^T)_{ij} = (A^T)_{ji} = (A)_{ij}. \]
(2) We have that
\[ ((A + B)^T)_{ij} = (A + B)_{ji} = (A)_{ji} + (B)_{ji} = (A^T)_{ij} + (B^T)_{ij} \]
which is just $(A^T + B^T)_{ij}$.
(3) We have that
\[ ((\alpha A)^T)_{ij} = (\alpha A)_{ji} = \alpha (A)_{ji} = \alpha (A^T)_{ij} = (\alpha A^T)_{ij}. \]
(4) Let A be an m × n matrix and B an n × p matrix. Thus AB is defined and is m × p. Hence $(AB)^T$ is p × m. Now $B^T$ is p × n and $A^T$ is n × m. Thus $B^T A^T$ is defined and is p × m. Hence $(AB)^T$ and $B^T A^T$ have the same size. We now show that corresponding elements are equal. By definition
\[ ((AB)^T)_{ij} = (AB)_{ji}. \]
This is equal to
\[ \sum_{s=1}^{n} (A)_{js} (B)_{si} = \sum_{s=1}^{n} (A^T)_{sj} (B^T)_{is}. \]
But real numbers commute under multiplication and so
\[ \sum_{s=1}^{n} (A^T)_{sj} (B^T)_{is} = \sum_{s=1}^{n} (B^T)_{is} (A^T)_{sj} = (B^T A^T)_{ij}, \]
as required.
Quantum Mechanics
Quantum mechanics is one of the fundamental theories of physics. At
its heart are matrices. We have defined the transpose of a matrix but for
matrices with complex entries there is another, related, operation. Given
any complex matrix A we define the matrix A† to be the one obtained by
transposing A and then taking the complex conjugate of all entries. It is
therefore the conjugate-transpose of A. A matrix A is called Hermitian
if A† = A. Observe that a real matrix is Hermitian precisely when it is
symmetric. It turns out that quantum mechanics is based on Hermitian
matrices and their generalizations. The fact that matrix multiplication
is not commutative is one of the reasons that quantum mechanics is so
different from classical mechanics. The theory of quantum computing
makes heavy use of Hermitian matrices and their properties.
Exercises 7.2
1. Calculate
2. Calculate
2
0
7 −1


1
 2 
3
1 1
1 0
+
+
0 1
1 1
+

1
3 2 1  −1 
−4
2 2
3 3

3. Calculate
x y
4. If
5 4
4 4
1 −1
1
2
A=
3 1 5
x
y
calculate A2 , A3 and A4 .
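5. Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ and $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Calculate $Ax$, $A^2x$, $A^3x$, $A^4x$ and $A^5x$. What do you notice?
6. Calculate $A^2$ where
\[ A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \]
7. Show that
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \]
satisfies $A^2 - 5A - 2I = O$.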
8. Let A be the following 3 × 3 matrix
\[ \begin{pmatrix} 2 & 4 & 4 \\ 0 & 1 & -1 \\ 0 & 1 & 3 \end{pmatrix} \]
Calculate
\[ A^3 - 6A^2 + 12A - 8I \]
where I is the 3 × 3 identity matrix.
9. Let
\[ A = \begin{pmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{pmatrix} \]
Calculate
\[ A^3 - 5A^2 + 8A - 4I \]
where I is the 3 × 3 identity matrix.
10. If 3X + A = B, find X in terms of A and B.
11. If $X + Y = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}$ and $X - Y = \begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}$ find X and Y.
12. If AB = BA show that $A^2 B = B A^2$.
13. Is it true that AABB = ABAB?
14. Show that
\[ (A + B)^2 - (A - B)^2 = 2(AB + BA). \]
15. Let A and B be n × n matrices. Is it necessarily true that
\[ (A - B)(A + B) = A^2 - B^2 ? \]
If so, prove it. If not, find a counterexample.
16. Expand $(A + I)^4$ carefully.
17. A matrix A is said to be symmetric if $A^T = A$.
(a) Show that a symmetric matrix must be square.
(b) Show that if A is any matrix then $AA^T$ is defined and symmetric.
(c) Let A and B be symmetric matrices of the same size. Prove that AB is symmetric if and only if AB = BA.
18. An n × n matrix A is said to be skew-symmetric if $A^T = -A$.
(a) Show that the diagonal entries of a skew-symmetric matrix are all zero.
(b) If B is any n × n matrix, show that $B + B^T$ is symmetric and that $B - B^T$ is skew-symmetric.
(c) Deduce that every square matrix can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix.
19. Let A, B and C be square matrices of the same size. Define [A, B] =
AB − BA. Calculate
[[A, B], C] + [[B, C], A] + [[C, A], B].
20. Let A be a 2 × 2 matrix such that AB = BA for all 2 × 2 matrices B. Show that
\[ A = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} \]
for some scalar λ.
21. Let A be a 2 × 2 matrix. The trace of A, denoted tr(A), is the sum of
the diagonal elements.
(a) Show that tr(A + B) = tr(A) + tr(B); tr(λA) = λtr(A); tr(AB) =
tr(BA).
(b) Let A be a known matrix. Show that the equation AX − XA = I
cannot be solved for X.
7.3 Solving systems of linear equations
The goal of this section is to use matrices to help us solve systems of linear
equations. We begin by proving some general results on linear equations,
and then we describe Gaussian elimination, an algorithm for solving systems
of linear equations.
7.3.1 Some theory
A system of m linear equations in n unknowns is a list of equations of the following form
\[ \begin{aligned} a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \ldots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \ldots + a_{mn} x_n &= b_m \end{aligned} \]
A solution is a sequence of values of $x_1, \ldots, x_n$ that satisfy all the equations. The set of all solutions is called the solution set or general solution.
The equations above can be conveniently represented using matrices. Let A be the m × n matrix $(A)_{ij} = a_{ij}$, let b be the m × 1 matrix $(b)_{i1} = b_i$, and let x be the n × 1 matrix $(x)_{j1} = x_j$. Then the system of linear equations above can be written in the form
Ax = b
If b is a zero matrix, we say that the equations are homogeneous, otherwise
they are said to be inhomogeneous.
A system of linear equations that has no solution is said to be inconsistent;
otherwise, it is said to be consistent.
We begin with some results that tell us what to expect when solving
systems of linear equations.
Proposition 7.3.1. Homogeneous equations Ax = 0 are always consistent,
because x = 0 is always a solution. In addition, the sum of any two solutions
is again a solution, and the scalar multiple of any solution is again a solution.
Proof. Let Ax = 0 be our homogeneous system of equations. Let a and b be
solutions. That is Aa = 0 and Ab = 0. We now calculate A(a + b). To do
this we use the fact that matrix multiplication satisfies the left distributivity
law
A(a + b) = Aa + Ab = 0 + 0 = 0.
Now let a be a solution and λ any scalar. Then
A(λa) = λAa = λ0 = 0.
Proposition 7.3.2. Let
Ax = b
be a consistent system of linear equations. Let p be any one solution. Then
every solution of the equation is of the form p + h for some solution h of
Ax = 0.
Proof. Let a be any solution to Ax = b. Let h = a − p. Then Ah = Aa − Ap = b − b = 0, so h is a solution of Ax = 0 and a = p + h. The result now follows.
Theorem 7.3.3 (Fundamental theorem of linear equations). We assume
that the scalars are the rationals, the reals or the complexes. A system of
linear equations Ax = b has either
• No solutions.
• Exactly one solution.
• Infinitely many solutions.
Proof. We prove that if we can find two different solutions we can in fact find infinitely many solutions. Let u and v be two distinct solutions to this equation. Then Au = b and Av = b. Consider now the column matrix w = u − v. Then
Aw = A(u − v) = Au − Av = 0
using the distributive law. Thus w is a non-zero column matrix that satisfies the equation Ax = 0. Consider now the column matrices of the form
u + λw
where λ is any scalar. Because w is non-zero, different values of λ give different column matrices, and so this is a set of infinitely many different column matrices. We calculate
A(u + λw) = Au + λAw = b
using the distributive law and properties of scalars. It follows that the infinitely many column matrices u + λw are solutions to the equation Ax = b.
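The construction in the proof can be tried out on a concrete system. The sketch below uses a small system chosen purely for illustration (it does not come from the text), with two known solutions u and v, and checks that u + λw is a solution for a range of rational values of λ. The names `matvec` and `solution` are hypothetical helpers.

```python
from fractions import Fraction

def matvec(A, x):
    """Product of a matrix (list of rows) with a column given as a list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Illustrative system: x + y = 3 and 2x + 2y = 6.
A = [[1, 1], [2, 2]]
b = [3, 6]

u = [3, 0]   # one solution
v = [0, 3]   # a second, different solution
w = [ui - vi for ui, vi in zip(u, v)]   # w = u - v satisfies Aw = 0

def solution(lam):
    """The column u + lam * w from the proof."""
    return [ui + lam * wi for ui, wi in zip(u, w)]

samples = [Fraction(k, 2) for k in range(-3, 4)]   # -3/2, -1, ..., 3/2
all_solve = all(matvec(A, solution(lam)) == b for lam in samples)
all_distinct = len({tuple(solution(lam)) for lam in samples}) == len(samples)
print(all_solve, all_distinct)  # True True
```

Exact rational arithmetic via `Fraction` is used so that the equality tests are not disturbed by rounding.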
7.3.2 Gaussian elimination
In this section, we shall develop an algorithm that will take as input a system of linear equations and produce as output the following: if the system has no solutions it will tell us; on the other hand, if it has solutions then it will determine them all. Our method is based on three simple ideas:
1. Certain systems of linear equations have a shape that makes them very
easy to solve.
2. Certain operations can be carried out on systems of linear equations
which simplify them but do not change the solutions.
3. Everything can be done using matrices.
Here are examples of each of these ideas.
Example 7.3.4. The system of equations
2x + 3y = 1
y = −3
is very easy to solve. From the second equation we get y = −3. Substituting
this value into the first equation gives us x = 5. We can check that this
solution is correct by checking that these two values satisfy every equation.
Example 7.3.5. The system of equations
2x + 3y = 1
x+y = 2
can be converted into a system with the same solutions but which is easier
to solve. Multiply the second equation by 2. This gives us the new equations
2x + 3y = 1
2x + 2y = 4
which have the same solutions as the original equations. Next, subtract the
first equation from the second equation to get
2x + 3y = 1
−y = 3
Finally, multiply the last equation by −1. The resulting equations have the
same solutions as the original equations, but they can now be easily solved
as we showed above.
Example 7.3.6. The system of equations
2x + 3y = 1
x+y = 2
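can be written in matrix form as the matrix equation
\[ \begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]
For the purposes of our algorithm, we rewrite this equation in terms of what is called an augmented matrix
\[ \left( \begin{array}{rr|r} 2 & 3 & 1 \\ 1 & 1 & 2 \end{array} \right) \]
The operations carried out in the previous example can be applied directly to the augmented matrix.
\[ \left( \begin{array}{rr|r} 2 & 3 & 1 \\ 1 & 1 & 2 \end{array} \right) \Longrightarrow \left( \begin{array}{rr|r} 2 & 3 & 1 \\ 2 & 2 & 4 \end{array} \right) \Longrightarrow \left( \begin{array}{rr|r} 2 & 3 & 1 \\ 0 & -1 & 3 \end{array} \right) \Longrightarrow \left( \begin{array}{rr|r} 2 & 3 & 1 \\ 0 & 1 & -3 \end{array} \right) \]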
This augmented matrix can then be converted back into the usual matrix
form and solved
2x + 3y = 1
y = −3
We now formalize the above ideas.
A matrix is called a row echelon matrix or to be in row echelon form if it
satisfies the following three conditions:
1. Any zero rows are at the bottom of the matrix.
2. If there are non-zero rows then they begin with the number 1, called
the leading 1.
3. In the column beneath a leading 1, the elements are all zero.
The following operations on a matrix are called elementary row operations:
1. Multiply row i by a non-zero scalar λ. We notate this operation by
Ri ← λRi .
2. Interchange rows i and j. We notate this operation by Ri ↔ Rj .
3. Add a multiple λ of row i to another row j. We notate this operation
by Rj ← Rj + λRi .
The following result is not hard to prove.
Proposition 7.3.7. Applying the elementary row operations to a system of linear equations does not change its solution set.
Given a system of linear equations
Ax = b
the matrix
(A|b)
is called the augmented matrix.
Algorithm 7.3.8. (Gaussian elimination) This is an algorithm for solving systems of linear equations. In outline, the algorithm runs as follows:
1. Given a system of equations Ax = b, form the augmented matrix (A|b).
2. By using elementary row operations, convert (A|b) into an augmented matrix $(A'|b')$ which is a row echelon matrix.
3. Solve the equations obtained from $(A'|b')$ by back substitution.
Remarks
• The process in step (2) has to be carried out systematically to avoid
going around in circles.
• Elementary row operations applied to a set of linear equations do not change the solution set. Thus the solution sets of Ax = b and $A'x = b'$ are the same.
• Solving systems of linear equations where the associated augmented
matrix is a row echelon matrix is easy and can be accomplished by
back substitution.
Here is a more detailed description of step (2) of the algorithm. The input is a matrix B and the output is a matrix $B'$ which is a row echelon matrix:
1. Locate the leftmost column that does not consist entirely of zeros.
2. Interchange the top row with another row if necessary to bring a non-zero entry to the top of the column found in step 1.
3. If the entry now at the top of the column found in step 1 is a, then multiply the first row by 1/a in order to introduce a leading 1.
4. Add suitable multiples of the top row to the rows below so that all
entries below the leading 1 become zeros.
5. Now cover up the top row, and begin again with step 1 applied to the
matrix that remains. Continue in this way until the entire matrix is a
row echelon matrix.
The important thing to remember is to start at the top and work downwards.
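The five steps above translate fairly directly into code. What follows is a minimal sketch rather than a definitive implementation: it assumes the matrix is given as a list of rows with integer or rational entries, uses exact arithmetic via `Fraction`, and the function name `row_echelon` and the variable `top` (marking the first row that has not yet been "covered up") are illustrative choices.

```python
from fractions import Fraction

def row_echelon(rows):
    """Return a row echelon form of the matrix, working from the top down."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    top = 0  # rows above index `top` are covered up
    for col in range(n_cols):
        if top == n_rows:
            break
        # Step 1: is this column non-zero in the uncovered rows?
        pivot = next((r for r in range(top, n_rows) if M[r][col] != 0), None)
        if pivot is None:
            continue  # no: move on to the next column to the right
        # Step 2: bring a non-zero entry to the top of the column.
        M[top], M[pivot] = M[pivot], M[top]
        # Step 3: multiply the top row by 1/a to introduce a leading 1.
        a = M[top][col]
        M[top] = [x / a for x in M[top]]
        # Step 4: clear the entries below the leading 1.
        for r in range(top + 1, n_rows):
            factor = M[r][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[top])]
        # Step 5: cover up the top row and repeat on what remains.
        top += 1
    return M

# Augmented matrix of x + 2y + 3z = 4, 2x + 2y + 4z = 0, 3x + 4y + 5z = 2.
E = row_echelon([[1, 2, 3, 4], [2, 2, 4, 0], [3, 4, 5, 2]])
print([[int(x) for x in row] for row in E])
# [[1, 2, 3, 4], [0, 1, 1, 4], [0, 0, 1, 1]]
```

Applied to the augmented matrix of an inconsistent system, this sketch ends with a row whose only non-zero entry is a leading 1 in the last column, which corresponds to the impossible equation 0 = 1.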
We now look in more detail at the final part of the overall algorithm where our set of equations $A'x = b'$ is derived from an augmented matrix which is a row echelon matrix. Assume that there is more than one solution. The variables are divided into two groups: those variables corresponding to the columns of $A'$ containing leading 1’s, called leading variables, and the rest,
called free variables. We solve for the leading variables in terms of the free
variables; the free variables can be assigned arbitrary values independently
of each other.
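The split into leading and free variables can be read off an echelon matrix mechanically. The sketch below assumes its input is already an augmented matrix in row echelon form, given as a list of rows; the function names `leading_columns` and `classify` and the returned labels are hypothetical, introduced only for illustration.

```python
def leading_columns(echelon):
    """Column index of the first non-zero entry in each non-zero row."""
    cols = []
    for row in echelon:
        for j, x in enumerate(row):
            if x != 0:
                cols.append(j)
                break
    return cols

def classify(echelon_augmented):
    """Classify a system from its echelon augmented matrix.

    Returns a label together with the indices of the free variables.
    """
    n_unknowns = len(echelon_augmented[0]) - 1
    lead = leading_columns(echelon_augmented)
    if n_unknowns in lead:
        # A row reading 0 = non-zero: the system has no solutions.
        return "inconsistent", []
    free = [j for j in range(n_unknowns) if j not in lead]
    return ("exactly one solution" if not free else "infinitely many solutions"), free

print(classify([[1, 2, -3, 6], [0, 1, -2, 2], [0, 0, 0, 0]]))
# ('infinitely many solutions', [2])
```

In the printed example the third variable (index 2) is free and the first two are leading variables, so the first two can be solved for in terms of the third.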
Examples 7.3.9.
1. We shall show that the following system of equations is inconsistent.
x + 2y − 3z = −1
3x − y + 2z = 7
5x + 3y − 4z = 2
The first step is to write down the augmented matrix of the system. In this case, this is the matrix
\[ \left( \begin{array}{rrr|r} 1 & 2 & -3 & -1 \\ 3 & -1 & 2 & 7 \\ 5 & 3 & -4 & 2 \end{array} \right) \]
Carry out the elementary row operations $R_2 \leftarrow R_2 - 3R_1$ and $R_3 \leftarrow R_3 - 5R_1$. This gives us
\[ \left( \begin{array}{rrr|r} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & -7 & 11 & 7 \end{array} \right) \]
Now carry out the elementary row operation $R_3 \leftarrow R_3 - R_2$ which yields
\[ \left( \begin{array}{rrr|r} 1 & 2 & -3 & -1 \\ 0 & -7 & 11 & 10 \\ 0 & 0 & 0 & -3 \end{array} \right) \]
The equation corresponding to the last line of the augmented matrix is 0x + 0y + 0z = −3. Clearly, this equation has no solutions because it is zero on the left and non-zero on the right. It follows that the original set of equations has no solutions.
2. We shall show that the following system of equations has exactly one solution, and we shall also check it.
x + 2y + 3z = 4
2x + 2y + 4z = 0
3x + 4y + 5z = 2
We first write down the augmented matrix
\[ \left( \begin{array}{rrr|r} 1 & 2 & 3 & 4 \\ 2 & 2 & 4 & 0 \\ 3 & 4 & 5 & 2 \end{array} \right) \]
We then carry out the elementary row operations $R_2 \leftarrow R_2 - 2R_1$ and $R_3 \leftarrow R_3 - 3R_1$ to get
\[ \left( \begin{array}{rrr|r} 1 & 2 & 3 & 4 \\ 0 & -2 & -2 & -8 \\ 0 & -2 & -4 & -10 \end{array} \right) \]
Then carry out the elementary row operations $R_2 \leftarrow -\frac{1}{2} R_2$ and $R_3 \leftarrow -\frac{1}{2} R_3$ that yield
\[ \left( \begin{array}{rrr|r} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 4 \\ 0 & 1 & 2 & 5 \end{array} \right) \]
Finally, carry out the elementary row operation $R_3 \leftarrow R_3 - R_2$
\[ \left( \begin{array}{rrr|r} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 1 & 1 \end{array} \right) \]
This is now a row echelon matrix. Write down the corresponding set of equations
x + 2y + 3z = 4
y + z = 4
z = 1
Now solve by back substitution to get x = −5, y = 3 and z = 1. Finally, we check that
\[ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 4 \\ 3 & 4 & 5 \end{pmatrix} \begin{pmatrix} -5 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ 2 \end{pmatrix} \]
3. We shall show that the following system of equations has infinitely many solutions, and we shall check them.
x + 2y − 3z = 6
2x − y + 4z = 2
4x + 3y − 2z = 14
The augmented matrix for this system is
\[ \left( \begin{array}{rrr|r} 1 & 2 & -3 & 6 \\ 2 & -1 & 4 & 2 \\ 4 & 3 & -2 & 14 \end{array} \right) \]
We transform this matrix into an echelon matrix by means of the following elementary row operations: $R_2 \leftarrow R_2 - 2R_1$, $R_3 \leftarrow R_3 - 4R_1$, $R_2 \leftarrow -\frac{1}{5} R_2$, $R_3 \leftarrow -\frac{1}{5} R_3$ and $R_3 \leftarrow R_3 - R_2$. This yields
\[ \left( \begin{array}{rrr|r} 1 & 2 & -3 & 6 \\ 0 & 1 & -2 & 2 \\ 0 & 0 & 0 & 0 \end{array} \right) \]
Because the bottom row consists entirely of zeros, this means that we have only two equations
x + 2y − 3z = 6
y − 2z = 2
By back substitution, both x and y can be expressed in terms of z, and z may take any value we like. We say that z is a free variable. Let $z = \lambda \in \mathbb{R}$. Then the set of solutions can be written in the form
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 0 \end{pmatrix} + \lambda \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix} \]
We now check that these solutions work
\[ \begin{pmatrix} 1 & 2 & -3 \\ 2 & -1 & 4 \\ 4 & 3 & -2 \end{pmatrix} \begin{pmatrix} 2 - \lambda \\ 2 + 2\lambda \\ \lambda \end{pmatrix} = \begin{pmatrix} 6 \\ 2 \\ 14 \end{pmatrix} \]
as required.
Exercises 7.3
1. In each case, determine whether the system of equations is consistent
or not. When consistent, find all solutions and show that they work.
(a)
2x + y − z = 1
3x + 3y − z = 2
2x + 4y + 0z = 2
(b)
2x + y − z = 1
3x + 3y − z = 2
2x + 4y + 0z = 3
(c)
2x + y − 2z = 10
3x + 2y + 2z = 1
5x + 4y + 3z = 4
(d)
x + y + z + w = 0
4x + 5y + 3z + 3w = 1
2x + 3y + z + w = 1
5x + 7y + 3z + 3w = 2
7.4 Blankinship’s algorithm
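The ideas of this chapter lead to an alternative, and better, procedure (see footnote 3) for calculating the integers x and y such that gcd(a, b) = xa + yb. To explain how it works, let’s go back to the basic step of Euclid’s algorithm. If a ≥ b then we divide b into a and write
a = bq + r
where 0 ≤ r < b. The key point is that gcd(a, b) = gcd(b, r). We shall now think of the pairs a, b and r, b as column matrices
\[ \begin{pmatrix} a \\ b \end{pmatrix}, \quad \begin{pmatrix} r \\ b \end{pmatrix}. \]
We want the 2 × 2 matrix that maps
\[ \begin{pmatrix} a \\ b \end{pmatrix} \quad\text{to}\quad \begin{pmatrix} r \\ b \end{pmatrix}. \]
This is the matrix
\[ \begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix}. \]
Thus
\[ \begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ b \end{pmatrix}. \]
Finally, we can describe the process by the following matrix operation
\[ \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \end{pmatrix} \rightarrow \begin{pmatrix} 1 & -q & r \\ 0 & 1 & b \end{pmatrix} \]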
by carrying out an elementary row operation. This procedure can be iterated.
It will terminate when one of the entries in the righthand column is 0. The
non-zero entry will then be the greatest common divisor of a and b and the
matrix on the lefthand side will tell you how to get to 0, gcd(a, b) from a, b
and so will provide the information that the Euclidean algorithm provides.
All of this is best illustrated by means of an example.
Footnote 3. It was described by W. A. Blankinship in his paper ‘A new version of the Euclidean algorithm’, American Mathematical Monthly 70 (1963), 742–745.
Let’s calculate x, y such that gcd(2520, 154) = x · 2520 + y · 154. We start with the matrix
\[ \begin{pmatrix} 1 & 0 & 2520 \\ 0 & 1 & 154 \end{pmatrix} \]
If we divide 154 into 2520 it goes 16 times plus a remainder. Thus we subtract 16 times the second row from the first to get
\[ \begin{pmatrix} 1 & -16 & 56 \\ 0 & 1 & 154 \end{pmatrix} \]
We now repeat the process but, since the larger number, 154, is on the bottom, we have to subtract some multiple of the first row from the second. This time we subtract twice the first row from the second to get
\[ \begin{pmatrix} 1 & -16 & 56 \\ -2 & 33 & 42 \end{pmatrix} \]
Now repeat this procedure to get
\[ \begin{pmatrix} 3 & -49 & 14 \\ -2 & 33 & 42 \end{pmatrix} \]
And again
\[ \begin{pmatrix} 3 & -49 & 14 \\ -11 & 180 & 0 \end{pmatrix} \]
The process now terminates because we have a zero in the rightmost column. The non-zero entry in the rightmost column is gcd(2520, 154). We also know that
\[ \begin{pmatrix} 3 & -49 \\ -11 & 180 \end{pmatrix} \begin{pmatrix} 2520 \\ 154 \end{pmatrix} = \begin{pmatrix} 14 \\ 0 \end{pmatrix}. \]
Now this matrix equation corresponds to two equations. It is the one corresponding to the non-zero value that says
14 = 3 × 2520 − 49 × 154
which is both true and solves the extended Euclidean problem.
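The procedure just illustrated can be sketched in a few lines of code. This is one possible rendering, assuming a and b are positive integers; the function name `blankinship` and the row names `top` and `bottom` are illustrative. Each row holds a triple (x, y, r) with the invariant x·a + y·b = r, which row operations preserve.

```python
def blankinship(a, b):
    """Return (g, x, y) with g = gcd(a, b) = x*a + y*b for positive integers."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    top = [1, 0, a]      # 1*a + 0*b = a
    bottom = [0, 1, b]   # 0*a + 1*b = b
    while top[2] != 0 and bottom[2] != 0:
        if top[2] >= bottom[2]:
            # Subtract q times the bottom row from the top row.
            q = top[2] // bottom[2]
            top = [t - q * s for t, s in zip(top, bottom)]
        else:
            # Subtract q times the top row from the bottom row.
            q = bottom[2] // top[2]
            bottom = [s - q * t for s, t in zip(bottom, top)]
    # The row with the non-zero last entry carries the answer.
    x, y, g = top if top[2] != 0 else bottom
    return g, x, y

g, x, y = blankinship(2520, 154)
print(g, x, y)  # 14 3 -49
```

The intermediate rows produced for 2520 and 154 are exactly those of the worked example: (1, −16, 56), (−2, 33, 42), (3, −49, 14) and finally (−11, 180, 0).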
Chapter 8
Matrices II: inverses
We have learnt how to add, subtract and multiply matrices but we have not defined division. The reason is that, in general, it cannot be defined. In this chapter, we shall explore when it can be. All matrices will be square, and I will always denote an identity matrix of the appropriate size.
8.1 What is an inverse?
The simplest kind of linear equation is ax = b where a and b are scalars. If $a \neq 0$ we can solve this by multiplying by $a^{-1}$ on both sides to get $a^{-1}(ax) = a^{-1}b$. We now use associativity to get $(a^{-1}a)x = a^{-1}b$. Finally, $a^{-1}a = 1$ and so $1x = a^{-1}b$ and this gives $x = a^{-1}b$. The number $a^{-1}$ is the multiplicative inverse of the non-zero number a.
We now try to copy this approach for the matrix equation
Ax = b.
We suppose that there is a matrix B such that BA = I.
• Multiply both sides of our equation Ax = b on the left by B to get B(Ax) = Bb. Because order matters when you multiply matrices, which side you multiply on also matters.
• Use associativity of matrix multiplication to get (BA)x = Bb.
• Now use our assumption that BA = I to get Ix = Bb.
• Finally, we use the properties of the identity matrix to get x = Bb.
We appear to have solved our equation, but we need to check it. We calculate
A(Bb). By associativity this is (AB)b. At this point we also have to assume
that AB = I. This gives Ib = b, as required. We conclude that in order to
copy the method for solving a linear equation in one unknown, our coefficient
matrix A must have the property that there is a matrix B such that
AB = I = BA.
We take this as the basis of the following definition.
A matrix A is said to be invertible if we can find a matrix B such that AB = I = BA. The matrix B is then called an inverse of A. Observe that A has to be square. A matrix that is not invertible is said to be singular.
Example 8.1.1. A real number r regarded as a 1 × 1 matrix is invertible if
and only if it is non-zero, in which case an inverse is its reciprocal.
It’s clear that if A is a zero matrix, then it can’t be invertible just as in
the case of real numbers. However, the next example shows that even if A is
not a zero matrix, then it need not be invertible.
Example 8.1.2. Let A be the matrix
\[ \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \]
We shall show that there is no matrix B such that AB = I = BA. Let B be the matrix
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]
From BA = I we get
a = 1 and a = 0.
It’s impossible to meet both these conditions at the same time and so B doesn’t exist.
On the other hand here is an example of a matrix that is invertible.
Example 8.1.3. Let
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & -2 & 5 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix} \]
Check that AB = I = BA. We deduce that A is invertible with inverse B.
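The check requested in Example 8.1.3 is a pair of matrix products, and both orders must be verified. A brief sketch, with `matmul` as an illustrative helper name:

```python
def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(M[i][j] * N[j][k] for j in range(len(N)))
             for k in range(len(N[0]))]
            for i in range(len(M))]

A = [[1, 2, 3], [0, 1, 4], [0, 0, 1]]
B = [[1, -2, 5], [0, 1, -4], [0, 0, 1]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Both products are needed because matrix multiplication is not commutative.
print(matmul(A, B) == I, matmul(B, A) == I)  # True True
```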
As always, in passing from numbers to matrices things become more
complicated. Before going any further, I need to clarify one point which will
at least make our lives a little simpler.
Lemma 8.1.4. Let A be invertible and suppose that B and C are matrices
such that
AB = I = BA and AC = I = CA.
Then B = C.
Proof. Multiply both sides of AB = I on the left by C. Then C(AB) = CI. Now CI = C, because I is the identity matrix, and C(AB) = (CA)B since matrix multiplication is associative. But CA = I, thus (CA)B = IB = B. It follows that C = B.
The above result tells us that if a matrix A is invertible then there is only one matrix B such that AB = I = BA. We call the matrix B the inverse of A. It is usually denoted by $A^{-1}$. It is important to remember that we can only write $A^{-1}$ if we know that A is invertible. In the following, we describe some important properties of the inverse of a matrix.
Lemma 8.1.5.
1. If A is invertible then $A^{-1}$ is invertible and its inverse is A.
2. If A and B are both invertible and AB is defined then AB is invertible with inverse $B^{-1}A^{-1}$.
3. If $A_1, \ldots, A_n$ are all invertible and $A_1 \ldots A_n$ is defined then $A_1 \ldots A_n$ is invertible and its inverse is $A_n^{-1} \ldots A_1^{-1}$.
Proof. (1) This is immediate from the equations $A^{-1}A = I = AA^{-1}$.
(2) Show that
\[ AB(B^{-1}A^{-1}) = I = (B^{-1}A^{-1})AB. \]
(3) This follows from (2) above and induction.
We shall deal with the practical computation of inverses later. Let me conclude this section by returning to my original motivation for introducing an inverse.
Theorem 8.1.6 (Matrix inverse method). A system of linear equations
Ax = b
in which A is invertible has unique solution
\[ x = A^{-1}b. \]
Proof. Observe that
\[ A(A^{-1}b) = (AA^{-1})b = Ib = b. \]
Thus $A^{-1}b$ is a solution. It is unique because if $x'$ is any solution then
\[ Ax' = b \]
giving
\[ A^{-1}(Ax') = A^{-1}b \]
and so
\[ x' = A^{-1}b. \]
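Example 8.1.7. We shall solve the following system of equations using the matrix inverse method
x + 2y = 1
3x + y = 2
Write the equations in matrix form.
\[ \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]
Determine the inverse of the coefficient matrix. In this case, you can check that this is the following
\[ A^{-1} = -\frac{1}{5} \begin{pmatrix} 1 & -2 \\ -3 & 1 \end{pmatrix} \]
Now we may solve the equations. From Ax = b we get that $x = A^{-1}b$. Thus in this case
\[ x = -\frac{1}{5} \begin{pmatrix} 1 & -2 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} \frac{3}{5} \\ \frac{1}{5} \end{pmatrix} \]
Thus $x = \frac{3}{5}$ and $y = \frac{1}{5}$. Finally, it is always a good idea to check the solutions. There are two (equivalent) ways of doing this. The first is to check by direct substitution
\[ x + 2y = \frac{3}{5} + 2 \cdot \frac{1}{5} = 1 \]
and
\[ 3x + y = 3 \cdot \frac{3}{5} + \frac{1}{5} = 2 \]
Alternatively, you can check by matrix multiplication
\[ \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} \frac{3}{5} \\ \frac{1}{5} \end{pmatrix} \]
which gives
\[ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \]
You can see that both calculations are, in fact, identical.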
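The matrix inverse method of Example 8.1.7 can be followed step by step in code. The sketch below takes the inverse stated in the example as given (it does not compute it), uses exact fractions, and relies on an illustrative helper `matvec`.

```python
from fractions import Fraction

def matvec(A, x):
    """Product of a matrix (list of rows) with a column given as a list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2], [3, 1]]
b = [1, 2]

# The inverse quoted in the example: -(1/5) times (1 -2; -3 1).
A_inv = [[Fraction(-1, 5), Fraction(2, 5)],
         [Fraction(3, 5), Fraction(-1, 5)]]

x = matvec(A_inv, b)   # the solution A^{-1} b
check = matvec(A, x)   # substituting back should give b
print(x, check == b)   # [Fraction(3, 5), Fraction(1, 5)] True
```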
8.2 Determinants
The obvious questions that arise from the previous section are how do we
decide whether a matrix is invertible or not and, if it is invertible, how do
we compute its inverse? The material in this section is key to answering
both of these questions. I shall define a number, called the determinant, that
can be calculated from any square matrix. Unfortunately, the definition is
completely unmotivated but it will justify itself by being useful.
Let A be a square matrix. We denote its determinant by det(A) or by
replacing the round brackets of the matrix A with straight brackets. It is
defined inductively: this means that I define an n × n determinant in terms
of (n − 1) × (n − 1) determinants.
• The determinant of the 1 × 1 matrix (a) is a.
• The determinant of the 2 × 2 matrix
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]
denoted
\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} \]
is the number ad − bc.
• The determinant of the 3 × 3 matrix
\[ \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \]
denoted
\[ \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} \]
is the number
\[ a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} \]
We could in fact define the determinant of any square matrix of whatever size in much the same way. However, we shall limit ourselves to calculating the determinants of 3 × 3 matrices at most. It’s important to pay attention to the signs in the definition. You multiply alternately by plus one and minus one
+ − + − ...
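Examples 8.2.1.
1.
\[ \begin{vmatrix} 2 & 3 \\ 4 & 5 \end{vmatrix} = 2 \times 5 - 3 \times 4 = -2. \]
2.
\[ \begin{vmatrix} 2 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 1 & 1 \end{vmatrix} = 2 \begin{vmatrix} 0 & 2 \\ 1 & 1 \end{vmatrix} - 1 \begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} = -5 \]
3.
\[ \begin{vmatrix} 1 & 2 & 1 \\ 3 & 1 & 0 \\ 2 & 0 & 1 \end{vmatrix} = 1 \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} - 2 \begin{vmatrix} 3 & 0 \\ 2 & 1 \end{vmatrix} + \begin{vmatrix} 3 & 1 \\ 2 & 0 \end{vmatrix} = -7 \]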
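The inductive definition, with its alternating signs along the first row, can be written as a short recursive function. This is a sketch of that definition only (expansion along the first row), not an efficient method for large matrices; the function name `det` is an illustrative choice.

```python
def det(M):
    """Determinant of a square matrix (list of rows) by first-row expansion."""
    n = len(M)
    if any(len(row) != n for row in M):
        raise ValueError("determinant is only defined for square matrices")
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Cross out the first row and the jth column.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        # Signs alternate + - + - ... along the first row.
        total += (-1) ** j * M[0][j] * det(minor)
    return total

print(det([[2, 3], [4, 5]]))                    # -2
print(det([[2, 1, 0], [1, 0, 2], [0, 1, 1]]))   # -5
print(det([[1, 2, 1], [3, 1, 0], [2, 0, 1]]))   # -7
```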
Determinants have many interesting properties, but as far as their connection with inverses is concerned, the following is the most important.
Theorem 8.2.2. Let A and B be square matrices having the same size. Then
det(AB) = det(A) det(B).
Proof. The result is true in general, but I shall only prove it for 2 × 2 matrices. Let
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} e & f \\ g & h \end{pmatrix} \]
We prove directly that det(AB) = det(A) det(B). First
\[ AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix} \]
Thus
\[ \det(AB) = (ae + bg)(cf + dh) - (af + bh)(ce + dg). \]
The first product multiplies out as
\[ acef + adeh + bcfg + bdgh \]
and the second as
\[ acef + adfg + bceh + bdgh. \]
Subtracting these two expressions we get
\[ adeh + bcfg - adfg - bceh. \]
Now we calculate det(A) det(B). This is just
\[ (ad - bc)(eh - fg) \]
which multiplies out to give
\[ adeh + bcfg - adfg - bceh. \]
Thus the two sides are equal, and we have proved the result.
I shall mention one other property of determinants that we shall need
when we come to study vectors and which will be useful in developing the
theory of inverses. It can be proved in the 2 × 2 and 3 × 3 cases by direct
verification.
Theorem 8.2.3. Let A be a square matrix and let B be obtained from A by
interchanging any two columns. Then det(B) = − det(A).
There is a very important consequence of the above result.
Proposition 8.2.4. If two columns of a determinant are equal then the determinant is zero.
Proof. Let A be a matrix with two columns equal. Then if we swap those
two columns the matrix remains unchanged. Thus by Theorem 8.2.3, we
have that det A = − det A. It follows that det A = 0.
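The properties stated in Theorem 8.2.2, Theorem 8.2.3 and Proposition 8.2.4 can be spot-checked on 3 × 3 matrices. The sketch below is a numerical check on particular matrices, not a proof; the helpers `det`, `matmul` and `swap_columns` are illustrative names, and `det` uses first-row expansion as in the definition.

```python
def det(M):
    """Determinant by expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(M[i][j] * N[j][k] for j in range(len(N)))
             for k in range(len(N[0]))]
            for i in range(len(M))]

def swap_columns(M, i, j):
    """Copy of M with columns i and j interchanged."""
    out = [row[:] for row in M]
    for row in out:
        row[i], row[j] = row[j], row[i]
    return out

A = [[2, 1, 0], [1, 0, 2], [0, 1, 1]]   # determinant -5
B = [[1, 2, 1], [3, 1, 0], [2, 0, 1]]   # determinant -7

product_rule = det(matmul(A, B)) == det(A) * det(B)
sign_change = det(swap_columns(A, 0, 2)) == -det(A)
equal_columns = det([[1, 1, 4], [2, 2, 5], [3, 3, 6]])  # first two columns equal

print(product_rule, sign_change, equal_columns)  # True True 0
```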
Exercises 8.2
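1. Compute the following determinants.
(a) $\begin{vmatrix} 1 & -1 \\ 2 & 3 \end{vmatrix}$
(b) $\begin{vmatrix} 3 & 2 \\ 6 & 4 \end{vmatrix}$
(c) $\begin{vmatrix} 1 & -1 & 1 \\ 2 & 3 & 4 \\ 0 & 0 & 1 \end{vmatrix}$
(d) $\begin{vmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 2 & 3 & 1 \end{vmatrix}$
(e) $\begin{vmatrix} 2 & 2 & 2 \\ 1 & 0 & 5 \\ 100 & 200 & 300 \end{vmatrix}$
(f) $\begin{vmatrix} 1 & 3 & 5 \\ 102 & 303 & 504 \\ 1000 & 3005 & 4999 \end{vmatrix}$
(g) $\begin{vmatrix} 1 & 1 & 2 \\ 2 & 1 & 1 \\ 1 & 2 & 1 \end{vmatrix}$
(h) $\begin{vmatrix} 15 & 16 & 17 \\ 18 & 19 & 20 \\ 21 & 22 & 23 \end{vmatrix}$
2. Solve
\[ \begin{vmatrix} 1 - x & 4 \\ 2 & 3 - x \end{vmatrix} = 0. \]
3. Calculate
\[ \begin{vmatrix} x & \cos x & \sin x \\ 1 & -\sin x & \cos x \\ 0 & -\cos x & -\sin x \end{vmatrix} \]
4. Prove that
\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = 0 \]
if, and only if, one column is a scalar multiple of the other. Hint: consider two cases: $ad = bc \neq 0$ and $ad = bc = 0$; for the second case you will need to consider various possibilities.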
8.3 When is a matrix invertible?
Recall from Theorem 8.2.2 that
det(AB) = det(A) det(B).
I use this property below to get a necessary condition for a matrix to be
invertible.
Lemma 8.3.1. If A is invertible then det(A) ≠ 0.
Proof. By assumption, there is a matrix B such that AB = I. Take determinants of both sides of the equation
AB = I
to get
det(AB) = det(I).
By the key property of determinants recalled above
det(AB) = det(A) det(B)
and so
det(A) det(B) = det(I).
But det(I) = 1 and so
det(A) det(B) = 1.
In particular, det(A) ≠ 0.
Are there any other properties that a matrix must satisfy in order to have
an inverse? The answer is, surprisingly, no. We shall prove that a square
matrix A is invertible if, and only if, det(A) ≠ 0. I shall only be able to
sketch out the proof of this theorem below. The practical issue of actually
computing inverses is dealt with in the next section.
To motivate things, we start with a 2 × 2 matrix A where we can prove
everything. Let
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]
We construct a new matrix as follows. Replace each entry $a_{ij}$ of A by the
element you get when you cross out the ith row and jth column. Thus we
get
\[ \begin{pmatrix} d & c \\ b & a \end{pmatrix}. \]
We now use the following matrix of signs
\[ \begin{pmatrix} + & - \\ - & + \end{pmatrix} \]
to get
\[ \begin{pmatrix} d & -c \\ -b & a \end{pmatrix}. \]
We now take the transpose of this matrix to get the matrix we call the
adjugate of A
\[ \operatorname{adj}(A) = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]
The defining characteristic of the adjugate is that
\[ A \operatorname{adj}(A) = \det(A) I = \operatorname{adj}(A) A \]
which can easily be checked. We deduce from the defining characteristic of
the adjugate that if det(A) ≠ 0 then
\[ A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \]
We have therefore proved the following.
Proposition 8.3.2. A 2×2 matrix is invertible if and only if its determinant
is non-zero.
Example 8.3.3. Let
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}. \]
Determine if A is invertible and, if it is, find its inverse, and check the
answer. We calculate det(A) = −5. This is non-zero, and so A is invertible.
We now form the adjugate of A:
\[ \operatorname{adj}(A) = \begin{pmatrix} 1 & -2 \\ -3 & 1 \end{pmatrix}. \]
Thus the inverse of A is
\[ A^{-1} = -\frac{1}{5} \begin{pmatrix} 1 & -2 \\ -3 & 1 \end{pmatrix}. \]
We now check that $AA^{-1} = I$ (to make sure that we haven’t made any
mistakes).
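The 2 × 2 recipe is short enough to sketch in Python. The helper names det2, adj2, inverse2 and mul2 below are illustrative assumptions, and exact fractions are used so that the final check is an equality rather than an approximation. The example matrix is the one with rows (1, 2) and (3, 1).

```python
from fractions import Fraction


def det2(m):
    """Determinant ad - bc of the 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def adj2(m):
    """Adjugate of a 2x2 matrix: swap the diagonal, negate the off-diagonal."""
    (a, b), (c, d) = m
    return [[d, -b], [-c, a]]


def inverse2(m):
    """Inverse via the adjugate; raises ValueError when det(m) == 0."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is not invertible: determinant is zero")
    return [[Fraction(x, d) for x in row] for row in adj2(m)]


def mul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[1, 2], [3, 1]]
A_inv = inverse2(A)
# The check recommended in the text: A times its inverse is the identity.
assert mul2(A, A_inv) == [[1, 0], [0, 1]]
```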
We now consider the general case. Here I will simply sketch out the
argument. Let A be an n × n matrix with entries $a_{ij}$. We define its adjugate
as the result of the following sequence of operations.
• Pick a particular row i and column j. If we cross out this row and
column we get an (n − 1) × (n − 1) matrix which I shall denote by
$M(A)_{ij}$. It is called a submatrix of the original matrix A.
• The determinant $\det(M(A)_{ij})$ is called the minor of the element $a_{ij}$.
• Finally, if we multiply $\det(M(A)_{ij})$ by the corresponding sign we get
the cofactor $c_{ij} = (-1)^{i+j} \det(M(A)_{ij})$ of the element $a_{ij}$.
• If we replace each element $a_{ij}$ by its cofactor, we get the matrix C(A)
of cofactors of A.
• The transpose of the matrix of cofactors C(A), denoted adj(A), is
called the adjugate¹ matrix of A. Thus the adjugate is the transpose
of the matrix of signed minors.
The crucial property of the adjugate is described in the next result.
Theorem 8.3.4. For any square matrix A, we have that
A(adj(A)) = det(A)I = (adj(A))A.
Proof. We have verified the above result in the case of 2 × 2 matrices. I shall
now prove it in the case of 3 × 3 matrices by means of an argument that
generalizes. Let $A = (a_{ij})$ and write
\[ B = \operatorname{adj}(A) = \begin{pmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{pmatrix}. \]
We shall compute AB. We have that
\[ (AB)_{11} = a_{11} c_{11} + a_{12} c_{12} + a_{13} c_{13} = \det A \]
by expanding the determinant along the top row. The next element is
\[ (AB)_{12} = a_{11} c_{21} + a_{12} c_{22} + a_{13} c_{23}. \]
But this is the determinant of the matrix
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \]
expanded along its second row, which, having two rows equal, must be zero by Proposition 8.2.4 applied to rows rather than columns. This pattern now continues with all the off-diagonal entries being zero for similar
reasons and the diagonal entries all being the determinant.
Footnote 1: This odd word (‘adjugate’) comes from Latin and means ‘yoked together’.
We may now prove the main theorem of this section.
Theorem 8.3.5. Let A be a square matrix. Then A is invertible if and only
if det(A) ≠ 0. When A is invertible, its inverse is given by
\[ A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A). \]
Proof. Let A be invertible. By our lemma above, det(A) ≠ 0 and so we can
form the matrix
\[ \frac{1}{\det(A)} \operatorname{adj}(A). \]
We now calculate
\[ A \left( \frac{1}{\det(A)} \operatorname{adj}(A) \right) = \frac{1}{\det(A)} A \operatorname{adj}(A) = I \]
by our theorem above. Thus A has the advertised inverse.
Conversely, suppose that det(A) ≠ 0. Then again we can form the matrix
\[ \frac{1}{\det(A)} \operatorname{adj}(A) \]
and verify that this is the inverse of A and so A is invertible.
Example 8.3.6. Let
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ -1 & 1 & 2 \end{pmatrix}. \]
We show that A is invertible and calculate its inverse. First, det(A) = −5
and so A is invertible. The matrix of minors is
\[ \begin{pmatrix} -1 & 5 & 2 \\ 1 & 5 & 3 \\ 2 & -5 & -4 \end{pmatrix}. \]
The matrix of cofactors is
\[ \begin{pmatrix} -1 & -5 & 2 \\ -1 & 5 & -3 \\ 2 & 5 & -4 \end{pmatrix}. \]
The adjugate is the transpose of the matrix of cofactors
\[ \begin{pmatrix} -1 & -1 & 2 \\ -5 & 5 & 5 \\ 2 & -3 & -4 \end{pmatrix}. \]
Thus the inverse of A is the adjugate with each entry divided by the determinant of A
\[ A^{-1} = -\frac{1}{5} \begin{pmatrix} -1 & -1 & 2 \\ -5 & 5 & 5 \\ 2 & -3 & -4 \end{pmatrix}. \]
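The sequence of operations defining the adjugate (submatrix, minor, cofactor, transpose) translates almost line by line into code. The following Python sketch is one possible rendering for n ≥ 2; the function names are assumptions made here, indices are 0-based rather than 1-based, and the determinant is computed by expansion along the first row, which is fine for small matrices but far too slow for large ones.

```python
def minor(m, i, j):
    """Submatrix of m with row i and column j crossed out (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j))
               for j in range(len(m)))


def cofactor_matrix(m):
    """Matrix of cofactors: signed determinants of the submatrices."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def adjugate(m):
    """Transpose of the matrix of cofactors (intended for n >= 2)."""
    return transpose(cofactor_matrix(m))


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


A = [[1, 2, 3], [2, 0, 1], [-1, 1, 2]]
d = det(A)
# The crucial property: A adj(A) = det(A) I.
assert matmul(A, adjugate(A)) == [[d, 0, 0], [0, d, 0], [0, 0, d]]
```

Dividing each entry of adjugate(A) by det(A) then gives the inverse, exactly as in the example.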
The Moore-Penrose Inverse
We have proved that a square matrix has an inverse if, and only if,
it has a non-zero determinant. For rectangular matrices, the existence
of an inverse doesn’t even come up for discussion. However, in later
applications of matrix theory it is very convenient if every matrix has
an ‘inverse’. Let A be any matrix. We say that $A^+$ is its Moore-Penrose
inverse if the following conditions hold:
1. $A = A A^+ A$.
2. $A^+ = A^+ A A^+$.
3. $(A^+ A)^T = A^+ A$.
4. $(A A^+)^T = A A^+$.
It is not obvious, but every matrix A has a Moore-Penrose inverse $A^+$
and, in fact, such an inverse is uniquely determined by the above four
conditions. In the case where A is invertible in the vanilla sense, its
Moore-Penrose inverse is just its inverse. But even singular matrices
have Moore-Penrose inverses. You can check that the matrix defined
below satisfies the four conditions above
\[ \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix}^{+} = \begin{pmatrix} 0.02 & 0.06 \\ 0.04 & 0.12 \end{pmatrix}. \]
The Moore-Penrose inverse can be used to find approximate solutions
to systems of linear equations that might otherwise have no solution.
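The four conditions are easy to test mechanically. The sketch below, in Python with exact fractions, checks them for the singular matrix with rows (1, 2) and (3, 6) and the candidate inverse with entries 1/50, 3/50, 2/50, 6/50 (that is, 0.02, 0.06, 0.04, 0.12). The function name is_moore_penrose_inverse is an illustrative assumption; the code only verifies a proposed answer and does not compute a Moore-Penrose inverse from scratch.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def transpose(m):
    return [list(col) for col in zip(*m)]


def is_moore_penrose_inverse(a, a_plus):
    """Check the four defining conditions for a candidate inverse a_plus."""
    ap_a = matmul(a_plus, a)   # A+ A
    a_ap = matmul(a, a_plus)   # A A+
    return (matmul(a, ap_a) == a                 # 1. A = A A+ A
            and matmul(ap_a, a_plus) == a_plus   # 2. A+ = A+ A A+
            and transpose(ap_a) == ap_a          # 3. A+ A is symmetric
            and transpose(a_ap) == a_ap)         # 4. A A+ is symmetric


A = [[1, 2], [3, 6]]
A_plus = [[Fraction(1, 50), Fraction(3, 50)],
          [Fraction(2, 50), Fraction(6, 50)]]
assert is_moore_penrose_inverse(A, A_plus)
```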
Exercises 8.3
1. Use the adjugate method to compute the inverses of the following matrices. In each case, check that your solution works.
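(a) $\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$
(b) $\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$
(c) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$
(d) $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ -1 & 1 & 2 \end{pmatrix}$
(e) $\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 3 \\ 1 & 2 & 4 \end{pmatrix}$
(f) $\begin{pmatrix} 2 & 2 & 1 \\ -2 & 1 & 2 \\ 1 & -2 & 2 \end{pmatrix}$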
8.4
Computing inverses
The practical way to compute inverses is to use elementary row operations.
I shall first describe the method and then I shall prove that it works. Let A
be a square n × n matrix. We want to determine whether it is invertible and,
if it is, we want to calculate its inverse. We shall do this at the same time
and we shall not need to calculate a determinant. We write down a new kind
of augmented matrix, this time of the form B = (A | I) where I is the n × n
identity matrix. The first part of the algorithm is to carry out elementary
row operations on B guided by A. Our goal is to convert A into a row echelon
matrix. This will have zeros below the leading diagonal. We are interested
in what entries lie on the leading diagonal. If one of them is zero we stop and
say that A is not invertible. If all of them are 1 then the algorithm continues.
We now use the 1’s that lie on the leading diagonal to remove all elements
above each 1. Our original matrix B now has the following form (I | A′). I
claim that $A′ = A^{-1}$. I shall illustrate this method by means of an example.
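Example 8.4.1. Let
\[ A = \begin{pmatrix} 2 & -2 & 4 \\ 2 & 3 & 2 \\ -1 & 1 & -1 \end{pmatrix}. \]
We shall show that A is invertible and calculate its inverse. We first write
down the augmented matrix
\[ \left( \begin{array}{rrr|rrr} 2 & -2 & 4 & 1 & 0 & 0 \\ 2 & 3 & 2 & 0 & 1 & 0 \\ -1 & 1 & -1 & 0 & 0 & 1 \end{array} \right). \]
We now carry out a sequence of elementary row operations to get the following
\[ \left( \begin{array}{rrr|rrr} 1 & -1 & 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & \frac{1}{5} & \frac{2}{5} \\ 0 & 0 & 1 & \frac{1}{2} & 0 & 1 \end{array} \right). \]
The leading diagonal contains only 1’s and so our original matrix is invertible.
We now use these 1’s to insert zeros above using elementary row operations.
\[ \left( \begin{array}{rrr|rrr} 1 & 0 & 0 & -\frac{1}{2} & \frac{1}{5} & -\frac{8}{5} \\ 0 & 1 & 0 & 0 & \frac{1}{5} & \frac{2}{5} \\ 0 & 0 & 1 & \frac{1}{2} & 0 & 1 \end{array} \right). \]
It follows that the inverse of A is
\[ A^{-1} = \begin{pmatrix} -\frac{1}{2} & \frac{1}{5} & -\frac{8}{5} \\ 0 & \frac{1}{5} & \frac{2}{5} \\ \frac{1}{2} & 0 & 1 \end{pmatrix}. \]
At this point, it is always advisable to check that $A^{-1} A = I$ in fact rather
than just in theory.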
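The method lends itself to a short program. The Python sketch below is one way to carry it out, under some assumptions of its own: the function name invert is hypothetical, exact fractions replace decimals, and the zeros below and above each leading 1 are created in a single pass (the Gauss-Jordan variant) rather than in the two phases described in the text. The end result, when it exists, is the same matrix, since inverses are unique.

```python
from fractions import Fraction


def invert(a):
    """Return the inverse of the square matrix a, or None if a is singular.

    Works on the augmented matrix (A | I) using elementary row operations.
    """
    n = len(a)
    # Build (A | I) with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Look for a row at or below the diagonal with a non-zero entry here.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # no leading 1 can be produced: A is not invertible
        aug[col], aug[pivot] = aug[pivot], aug[col]    # interchange two rows
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]           # scale to get a leading 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                # Subtract a multiple of the pivot row to clear this column.
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The left block is now I, so the right block is the inverse.
    return [row[n:] for row in aug]


A = [[2, -2, 4], [2, 3, 2], [-1, 1, -1]]
A_inv = invert(A)
```

For this matrix the rows of A_inv come out as (−1/2, 1/5, −8/5), (0, 1/5, 2/5) and (1/2, 0, 1), and a singular input such as the matrix with rows (3, 2) and (6, 4) returns None.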
We now need to explain why this method works. An n × n matrix E is
called an elementary matrix if it is obtained from the n × n identity matrix
by means of a single elementary row operation.
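Example 8.4.2. Let’s find all the 2 × 2 elementary matrices. The first one
is obtained by interchanging two rows and so is
\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]
Next we obtain two matrices by multiplying each row by a non-zero scalar λ
\[ \begin{pmatrix} \lambda & 0 \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & \lambda \end{pmatrix}. \]
Finally, we obtain two matrices by adding a scalar multiple of one row to
another row
\[ \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix}. \]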
There are now two key results we shall need.
Lemma 8.4.3.
1. Let B be obtained from A by means of a single elementary row operation
ρ. Thus B = ρ(A). Let E = ρ(I). Then B = EA.
2. Each elementary matrix is invertible.
Proof. (1) This has to be verified for each of the three types of elementary row
operation. I shall deal with the third class of such operations: $R_j \leftarrow R_j + \lambda R_i$.
Apply this elementary row operation to the n × n identity matrix I to get
the matrix E. This agrees with the identity matrix everywhere except the
jth row. There it has a λ in the ith column and, of course, a 1 in the jth
column. We now calculate the effect of E on any suitable matrix A. Then
EA will be the same as A except in the jth row. This will consist of the jth
row of A to which λ times the ith row of A has been added.
(2) Let E be the elementary matrix that arises from the elementary row
operation ρ. Thus E = ρ(I). Let ρ′ be the elementary row operation that
undoes the effect of ρ. Thus ρρ′ and ρ′ρ are both identity functions. Let
E′ = ρ′(I). Then, by part (1), E′E = ρ′(E) = ρ′(ρ(I)) = I. Similarly, EE′ = I. It
follows that E is invertible with inverse E′.
Example 8.4.4. We give an example of 2 × 2 elementary matrices. Consider
the elementary matrix
\[ E = \begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix} \]
which is obtained from the 2 × 2 identity matrix by carrying out the elementary row operation $R_2 \leftarrow R_2 + \lambda R_1$. We now calculate the effect of this
matrix when we multiply it into the following matrix
\[ A = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix} \]
and we get
\[ EA = \begin{pmatrix} a & b & c \\ \lambda a + d & \lambda b + e & \lambda c + f \end{pmatrix}. \]
But this matrix is what we would get if we applied the elementary row
operation directly to the matrix A.
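The same comparison can be run on numbers. In the Python sketch below, add_multiple is a hypothetical helper implementing the row operation $R_j \leftarrow R_j + \lambda R_i$ (with 0-based indices), and the particular values of λ and A are arbitrary choices made for illustration.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def add_multiple(m, j, i, lam):
    """Return a copy of m after the row operation R_j <- R_j + lam * R_i."""
    out = [row[:] for row in m]
    out[j] = [x + lam * y for x, y in zip(m[j], m[i])]
    return out


lam = 5
A = [[1, 2, 3], [4, 5, 6]]

# E is the operation applied to the identity; E_undo undoes it.
E = add_multiple(identity(2), 1, 0, lam)
E_undo = add_multiple(identity(2), 1, 0, -lam)

# Multiplying by E on the left has the same effect as the row operation.
assert matmul(E, A) == add_multiple(A, 1, 0, lam)
# The elementary matrix of the reverse operation is the inverse of E.
assert matmul(E_undo, E) == identity(2) == matmul(E, E_undo)
```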
We may now prove that the elementary row operation method for
calculating inverses, which we described above, really works.
Proposition 8.4.5. If (I | B) can be obtained from (A | I) by means of
elementary row operations then A is invertible with inverse B.
Proof. Let the elementary row operations that transform (A | I) to (I | B)
be $\rho_1, \ldots, \rho_n$ in this order. Thus
\[ (\rho_n \cdots \rho_1)(A) = I \quad\text{and}\quad (\rho_n \cdots \rho_1)(I) = B. \]
Let $E_i$ be the elementary matrix corresponding to the elementary row operation $\rho_i$. Then
\[ (E_n \cdots E_1) A = I \quad\text{and}\quad (E_n \cdots E_1) I = B. \]
Now the matrices $E_i$ are invertible and so
\[ A = (E_n \cdots E_1)^{-1} \quad\text{and}\quad B = E_n \cdots E_1. \]
Thus B is the inverse of A as claimed.
Exercises 8.4
1. Use elementary row operations to compute the inverses of the following
matrices. In each case, check that your solution works.
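(a) $\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$
(b) $\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$
(c) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$
(d) $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ -1 & 1 & 2 \end{pmatrix}$
(e) $\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 3 \\ 1 & 2 & 4 \end{pmatrix}$
(f) $\begin{pmatrix} 2 & 2 & 1 \\ -2 & 1 & 2 \\ 1 & -2 & 2 \end{pmatrix}$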
8.5
The Cayley-Hamilton theorem
The goal of this section is to prove a major theorem about square matrices. It
is true in general, although I shall only prove it for 2×2 matrices. It provides a
first indication of the importance of certain polynomials in studying matrices.
Let A be a square matrix. We can therefore form the product AA which
we write as $A^2$. When it comes to multiplying A by itself three times there
are apparently two possibilities: A(AA) and (AA)A. However, matrix multiplication is associative and so these two products are equal. We write this as
$A^3$. In general $A^{n+1} = A A^n = A^n A$. We define $A^0 = I$, the identity matrix
the same size as A. The usual properties of exponents hold
\[ A^m A^n = A^{m+n} \quad\text{and}\quad (A^m)^n = A^{mn}. \]
One important consequence is that powers of A commute so that
\[ A^m A^n = A^n A^m. \]
We can form powers of matrices, multiply them by scalars and add them
together. We can therefore form sums like
\[ A^3 + 3A^2 + A + 4I. \]
In other words, we can substitute A in the polynomial
\[ x^3 + 3x^2 + x + 4 \]
remembering that $4 = 4x^0$ and so has to be replaced by 4I.
Example 8.5.1. Let $f(x) = x^2 + x + 2$ and let
\[ A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}. \]
We calculate f(A). Remember that $x^2 + x + 2$ is really $x^2 + x + 2x^0$. Replace
x by A and so $x^0$ is replaced by $A^0$ which is I. We therefore get $A^2 + A + 2I$
and calculating gives
\[ \begin{pmatrix} 5 & 2 \\ 2 & 3 \end{pmatrix}. \]
It is important to remember that when a square matrix A is substituted
into a polynomial, you must replace the constant term of the polynomial by
the constant term times the identity matrix. The identity matrix you use
will have the same size as A.
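Substituting a matrix into a polynomial can be sketched in a few lines of Python. The function name evaluate and the convention of listing coefficients from the highest power down are assumptions made here; the evaluation uses Horner's scheme, and the constant term is added as a multiple of the identity matrix, as the text requires.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def evaluate(coeffs, a):
    """Evaluate a polynomial at the square matrix a.

    coeffs lists the coefficients from the highest power down, so
    [1, 1, 2] stands for x^2 + x + 2.
    """
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        # Horner step: result <- result * A + c * I.
        result = matmul(result, a)
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result


A = [[1, 1], [1, 0]]
f_of_A = evaluate([1, 1, 2], A)   # A^2 + A + 2I
```

With these inputs f_of_A has rows (5, 2) and (2, 3).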
We now come to an important extension of what we mean by a root. If
f (x) is a polynomial and A is a square matrix, we say that A is a matrix root
of f (x) if f (A) is the zero matrix.
Let A be a square n × n matrix. Put
\[ \chi_A(x) = \det(A - xI). \]
Observe that in this formula x is essentially a complex variable and so cannot be replaced
by a matrix. Then $\chi_A(x)$ is a polynomial of degree n called the characteristic
polynomial of A. It is worth observing that when x = 0 we get that $\chi_A(0) = \det(A)$,
which is therefore the value of the constant term of the characteristic
polynomial.
Theorem 8.5.2 (Cayley-Hamilton). Every square matrix is a root of its
characteristic polynomial.
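Proof. I shall only prove this theorem in the 2 × 2 case, though it is true in
general. Let
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]
Then from the definition the characteristic polynomial is
\[ \begin{vmatrix} a - x & b \\ c & d - x \end{vmatrix}. \]
Thus
\[ \chi_A(x) = x^2 - (a + d)x + (ad - bc). \]
We now calculate $\chi_A(A)$ which is just
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}^2 - (a + d) \begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} ad - bc & 0 \\ 0 & ad - bc \end{pmatrix}. \]
This simplifies to
\[ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \]
which proves the theorem in this case. The general proof uses the adjugate
matrix of A − xI.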
There is one very nice application of this theorem.
Proposition 8.5.3. Let A be an invertible n × n matrix. Then the inverse
of A can be written as a polynomial in A of degree n − 1.
Proof. We may write $\chi_A(x) = f(x) + \det(A)$ where f(x) is a polynomial
with constant term zero. Thus f(x) = xg(x) for some polynomial g(x) of
degree n − 1. By the Cayley-Hamilton theorem, 0 = Ag(A) + det(A)I. Thus
Ag(A) = − det(A)I. Put
\[ B = -\frac{1}{\det(A)} g(A). \]
Then AB = I. But A and B
commute since B is a polynomial in A. Thus BA = I. We have therefore
proved that $A^{-1} = B$.
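In the 2 × 2 case both the theorem and its application can be checked by direct computation. Here $\chi_A(x) = x^2 - (a + d)x + (ad - bc)$, so g(x) = x − (a + d) and the proof gives $A^{-1} = \frac{1}{\det(A)}\bigl((a + d)I - A\bigr)$. The Python sketch below uses hypothetical function names and the matrix with rows (1, 2) and (3, 1) as a test case.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def char_poly_at_matrix(m):
    """Compute A^2 - (a + d) A + (ad - bc) I for a 2x2 matrix A."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    m2 = matmul(m, m)
    return [[m2[i][j] - tr * m[i][j] + (det if i == j else 0)
             for j in range(2)] for i in range(2)]


def inverse_via_cayley_hamilton(m):
    """Inverse of an invertible 2x2 matrix as a polynomial in the matrix."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible: determinant is zero")
    # B = -(1/det) g(A) with g(x) = x - tr, that is, (tr I - A) / det.
    return [[Fraction((tr if i == j else 0) - m[i][j], det)
             for j in range(2)] for i in range(2)]


A = [[1, 2], [3, 1]]
assert char_poly_at_matrix(A) == [[0, 0], [0, 0]]       # Cayley-Hamilton
B = inverse_via_cayley_hamilton(A)
assert matmul(A, B) == [[1, 0], [0, 1]] == matmul(B, A)
```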
8.6
Complex numbers via matrices
Consider all matrices that have the following shape
\[ \begin{pmatrix} a & -b \\ b & a \end{pmatrix} \]
where a and b are arbitrary real numbers. You should show first that the sum,
difference and product of any two matrices having this shape is also a matrix
of this shape. Rather remarkably, matrix multiplication is commutative for
matrices of this shape. Observe that the determinant of our matrix above is
$a^2 + b^2$. It follows that every non-zero matrix of the above shape is invertible.
The inverse of the above matrix in the non-zero case is
\[ \frac{1}{a^2 + b^2} \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \]
and again has the same form. It follows that the set of all these matrices
satisfies the axioms of high-school algebra. Define
\[ \mathbf{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \mathbf{i} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]
We may therefore write our matrices in the form
\[ a\mathbf{1} + b\mathbf{i}. \]
Observe that
\[ \mathbf{i}^2 = -\mathbf{1}. \]
It follows that our set of matrices can be regarded as the complex numbers
in disguise.
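The disguise can be tested against Python's built-in complex numbers. In the sketch below, as_matrix is a hypothetical helper sending a + bi to the matrix with rows (a, −b) and (b, a); the particular numbers z and w are arbitrary choices with small integer parts, so the floating-point arithmetic involved is exact.

```python
def as_matrix(z):
    """The 2x2 real matrix representing the complex number z = a + bi."""
    a, b = z.real, z.imag
    return [[a, -b], [b, a]]


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def matadd(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


z, w = complex(1, 2), complex(3, -1)

# Sums and products of matrices mirror sums and products of complex numbers.
assert matadd(as_matrix(z), as_matrix(w)) == as_matrix(z + w)
assert matmul(as_matrix(z), as_matrix(w)) == as_matrix(z * w)
# Multiplication of these matrices is commutative.
assert matmul(as_matrix(z), as_matrix(w)) == matmul(as_matrix(w), as_matrix(z))
# The matrix i squares to minus the identity.
assert matmul(as_matrix(1j), as_matrix(1j)) == as_matrix(complex(-1, 0))
```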
Chapter 9
Vectors
Euclid’s book codified what was known about geometry into a handful of
axioms and then showed that all of geometry could be deduced from those
axioms by the use of mathematical proof. Impressive though Euclid’s achievement was, it does suffer from one drawback in that it is not the easiest system to
use. Even proving simple results, like Pythagoras’s theorem, takes dozens
of intermediate results. So although it is a great theoretical achievement, it
is not such a practical one. It was not until the nineteenth century that a
practical tool for doing three-dimensional geometry was constructed. On the
basis of the work carried out by Hamilton on quaternions — I say a little
more about this later — the theory of vectors, the subject of this chapter,
was developed by the American Josiah Willard Gibbs and promoted by the
English electrical engineer Oliver Heaviside (whose formal schooling ended
at the age of 16). In addition to setting up an algebraic system that will
enable us to carry out geometrical calculations easily, I shall also touch on a
deep connection with the work of the previous chapter. Each linear equation
in three unknowns is in fact the equation of a plane in three-dimensional
space. This means that the theory of linear equations in three unknowns
has a geometrical interpretation. This may be generalized: the theory of
matrices combined with a theory of vectors in arbitrary dimensions is known
as linear algebra, one of the most important branches of algebra. I have not
attempted to develop the subject in this chapter completely rigorously, so I
often make appeals to geometric intuition in setting up the algebraic theory
of vectors.
9.1
Vector algebra
I shall assume you are familiar with the following ideas from school:
• The notion of a point.
• The notion of a line and of a line segment.
• The notion of the length of a line segment and the angle between two
line segments.
• The notion of parallel lines.
The notion of a pair of lines being parallel is fundamental to Euclidean geometry. We used it to prove that the angles in a triangle add up to two right
angles.
9.1.1
Addition and scalar multiplication of vectors
Definition of a vector Two directed line segments which are parallel, have
the same length, and point in the same direction are said to represent the
same vector.
The word ‘vector’ means carrier in Latin and what a vector carries is
information about length and direction and nothing else. Because vectors
stay the same when they move parallel to themselves, they also preserve
information about angles.
Thus vectors have length and direction but no other properties. I shall
denote vectors by bold letters a, b, . . . If P and Q are points then the directed
line segment from P to Q is written $\overrightarrow{PQ}$ or PQ. If P = Q then PQ is just a
point. The zero vector 0 is represented by the degenerate line segment PP.
Vectors are denoted by arrows: the vector starts at the base of the arrow
(where the feathers would be), which we shall call the tail of the vector, and
ends at the tip (where the arrowhead is), which we shall call the point of the
vector.
Example 9.1.1. In the diagram below all the vectors shown are equal.
[Figure: a collection of parallel arrows of equal length, all pointing in the same direction.]
The set of vectors in space can be equipped with two operations: vector
addition and multiplication by a scalar.
Let a and b be vectors. Then their sum is defined as follows: slide the
vectors parallel to themselves so that the point of a touches the tail of b.
The directed line segment from the tail of a to the point of b represents the
vector a + b.
[Figure: a triangle of vectors; b starts at the point of a, and a + b runs from the tail of a to the point of b.]
This definition does make sense though I will not justify that here. If a is
a vector, then −a is defined to be the vector with the same length as a but
pointing in the opposite direction.
[Figure: the vectors a and −a, parallel and of equal length but pointing in opposite directions.]
Theorem 9.1.2 (Properties of vector addition).
(VA1) a + (b + c) = (a + b) + c. This is the associative law for vector
addition.
(VA2) 0 + a = a = a + 0. The zero vector is the additive identity.
(VA3) a + (−a) = 0 = (−a) + a. The vector −a is the additive inverse of a.
(VA4) a + b = b + a. This is the commutative law for vector addition.
The proof of the commutativity of vector addition is illustrated below.
[Figure: a parallelogram with sides a and b; its diagonal represents both a + b and b + a.]
The proof of associativity is illustrated below.
We define a − b = a + (−b).
Advanced remark We have seen the above properties before: real numbers
with respect to addition, and m × n matrices with respect to matrix addition. A set equipped with a binary operation that is associative, possesses
an identity, possesses unique inverses and is commutative is called an abelian
group.
Example 9.1.3. Consider the following square of vectors.
[Figure: a square whose sides are the vectors a, b, c and d.]
Then we have
a + b + c + d = 0.
Thus, in particular,
d = −c − b − a.
Example 9.1.4. Consider the following diagram
[Figure: a diagram of directed line segments labelled a, b, c, d, e, f, g, h and k.]
(i) We may write c in terms of e, d and f . By following the arrows we get
that c = d + ef
(ii) We may write g in terms of c, d, e and k. By following the arrows we
get that g = −k + c + d − e.
(iii) We may solve x + b = f using similar methods to high-school algebra
to get x = f − b which is just a.
(iv) We may solve x + h = d − e in a similar fashion to get x = d − e − h
which is just g.
If a is a vector then ‖a‖ is its length.
If ‖a‖ = 1 then a is called a unit vector. We have that ‖a‖ ≥ 0, and
‖a‖ = 0 iff a = 0. By results on triangles we have the triangle inequality
‖a + b‖ ≤ ‖a‖ + ‖b‖.
We now define multiplication of a vector by a scalar. Let λ be a scalar
and a a vector. If λ = 0 then λa = 0; if λ > 0 then λa has the same direction
as a and length λ‖a‖; if λ < 0 then λa has the opposite direction to a and
length (−λ)‖a‖. Observe that in all cases
‖λa‖ = |λ| ‖a‖.
If a is non-zero then
\[ \hat{\mathbf{a}} = \frac{\mathbf{a}}{\|\mathbf{a}\|} \]
is a unit vector in the same direction as a. We call this process normalisation.
Vectors that differ by a scalar multiple are said to be parallel.
Theorem 9.1.5 (Properties of scalar multiplication).
(i) 0a = 0.
(ii) 1a = a.
(iii) (−1)a = −a.
(iv) (λ + µ)a = λa + µa.
(v) λ(a + b) = λa + λb.
(vi) λ(µa) = (λµ)a.
We can use what we have introduced so far to prove simple geometric
theorems.
Example 9.1.6. If the midpoints of the consecutive sides of any quadrilateral
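are joined by line segments, then the resulting quadrilateral is a parallelogram. We refer to the picture below.
[Figure: a quadrilateral with consecutive sides a, b, c and d; the points A, B, C and D are the midpoints of these sides.]
We have that
a + b + c + d = 0.
Now $\overrightarrow{AB} = \frac{1}{2}\mathbf{a} + \frac{1}{2}\mathbf{b}$ and $\overrightarrow{CD} = \frac{1}{2}\mathbf{c} + \frac{1}{2}\mathbf{d}$. But a + b = −(c + d). It follows
that $\overrightarrow{AB} = -\overrightarrow{CD}$. Hence the line segment AB is parallel to the line segment
CD and they have the same lengths. Similarly, BC is parallel to AD and
has the same length.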
9.1.2
Inner, scalar or dot products
We now introduce a notion that will enable us to measure angles and lengths.
It is a development of the idea of the perpendicular projection of a line onto
another line.
Let a and b be two vectors. If a, b ≠ 0 then we define
a · b = ‖a‖ ‖b‖ cos θ
where θ is the angle between a and b.
Note that this angle is always chosen to be 0 ≤ θ ≤ π. If either a or b
is zero then a · b is defined to be zero. We call a · b the inner product of a
and b. It is also sometimes called the scalar product and the dot product. It
is important to remember that it is a scalar and not a vector.
We say that non-zero vectors a and b are orthogonal to each other if the
angle between them is ninety degrees. The key property of the inner product
is that for non-zero a and b we have that
a · b = 0 iff a and b are orthogonal.
Theorem 9.1.7 (Properties of the inner product).
(i) a · b = b · a.
(ii) a · a = ‖a‖².
(iii) λ(a · b) = (λa) · b = a · (λb).
(iv) a · (b + c) = a · b + a · c.
Remarks
(i) The inner product a · a is often abbreviated a².
(ii) Property (iv) says that the inner product of a sum is the sum of the
inner products. It will be very important to us. It is the only property
that takes a bit of work to prove. I give the proof later.
The inner product enables us to prove much more interesting theorems.
Example 9.1.8. The angle in a semicircle is a right angle. Draw a semicircle.
Choose any point on the circumference of the semicircle and join it to the
points at either end of the diameter of the semicircle. Then the claim is that
the resulting triangle is right-angled.
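We are interested in the angle formed by AB and AC. Observe that
$\overrightarrow{AB} = -(\mathbf{a} + \mathbf{b})$ and $\overrightarrow{AC} = \mathbf{a} - \mathbf{b}$. Thus
\[ \begin{aligned} \overrightarrow{AB} \cdot \overrightarrow{AC} &= -(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) \\ &= -(\mathbf{a}^2 - \mathbf{a} \cdot \mathbf{b} + \mathbf{b} \cdot \mathbf{a} - \mathbf{b}^2) \\ &= -(\mathbf{a}^2 - \mathbf{b}^2) \\ &= 0 \end{aligned} \]
using the fact that a · b = b · a and ‖a‖ = ‖b‖, because this is just the radius
of the semicircle. It follows that the angle BAC is a right angle, as claimed.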
Example 9.1.9. Pythagoras’ theorem proved using vectors.
We have that
a + b + c = 0
and so a + b = −c. Now
(a + b)² = (−c) · (−c) = ‖c‖².
But
(a + b)² = ‖a‖² + 2a · b + ‖b‖²
and this is equal to ‖a‖² + ‖b‖² because a · b = 0. It follows that
‖a‖² + ‖b‖² = ‖c‖².
Remark The set of 3-dimensional vectors equipped with the operations of
vector addition and scalar multiplication together with the inner product is
called three dimensional Euclidean space E³. This is precisely the space of
Euclid’s geometry, but done in a modern way.
9.1.3
Vector or cross products
In three dimensional space there is another operation available to us that is
useful in many applications. Let a and b be non-zero vectors. We define a
new vector
a × b = (‖a‖ ‖b‖ sin θ) n
where θ is the angle between a and b, and n is a unit vector at right angles to
the plane containing a and b — this determines n up to sign: we choose the
direction of n so that when rotating a to b in a clockwise direction through
the angle θ we are looking in the direction of n.
[Figure: the vectors a and b, with a × b at right angles to the plane containing them.]
If a or b is zero then a × b is the zero vector. We call it the vector
product of a and b. It is sometimes called the cross product. It is important
to remember that it is a vector. The key property of the vector product is
that for non-zero vectors
a × b = 0 iff a and b are parallel.
Theorem 9.1.10 (Properties of the vector product).
(i) a × b = −b × a.
(ii) λ(a × b) = (λa) × b = a × (λb).
(iii) a × (b + c) = a × b + a × c.
Remark Property (iii) says that the vector product distributes over addition.
This is the hardest property to prove; I give the proof later.
Warning! In general a × (b × c) ≠ (a × b) × c. In other words, the vector product is
not associative.
Warning! Distinguish between the following:
• λa. This is a scalar λ times a vector a and the result is a vector.
• a · b. This is the inner product of two vectors and is a scalar.
• a × b. This is the vector product of two vectors and is a vector.
You must not interchange notation for these different products (unlike
school algebra where you can).
Example 9.1.11. The area of the parallelogram determined by the vectors
a and b is ‖a × b‖ as the following picture shows.
Example 9.1.12. We shall prove the law of sines for triangles using the
vector product. With reference to the diagram below
we have that
\[ \frac{\sin A}{a} = \frac{\sin B}{b} = \frac{\sin C}{c}. \]
We choose vectors as shown so that
\[ \|\mathbf{a}\| = a, \quad \|\mathbf{b}\| = b, \quad \|\mathbf{c}\| = c. \]
Then
a + b + c = 0.
Hence
a + b = −c.
Take the vector product of this equation on both sides on the left with a, b
and c in turn. We get
1. a × b = c × a.
2. b × a = c × b.
3. c × a = b × c.
From (1), we get
‖a × b‖ = ‖c × a‖.
Thus
‖b‖ sin C = ‖c‖ sin B
which gives us the second equation in the statement of the result. The
remaining results follow similarly.
9.1.4
Scalar triple products
This product is nothing more than a combination of the previous two. However, it is included because, as we shall see, it has an important geometric
interpretation.
Let a, b and c be three vectors. Then b × c is a vector. Thus a · (b × c)
is a scalar. We define
[a, b, c] = a · (b × c).
It is called the scalar triple product. Its properties are determined by the
properties of the inner and vector products. What it means geometrically
will be described later.
Exercises 9.1
1. Consider the following diagram.
[Figure: a diagram with points A, B, C, D, E and F, in which the vectors a, b and c are marked.]
Now answer the following questions.
(i) Write the vector BD in terms of a and c
(ii) Write the vector AE in terms of a and c
(iii) What is the vector DE?
(iv) What is the vector CF ?
(v) What is the vector AC?
(vi) What is the vector BF ?
2. If a, b, c and d represent the consecutive sides of a quadrilateral, show
that the quadrilateral is a parallelogram if and only if a + c = 0.
3. In the regular pentagon ABCDE, let AB = a, BC = b, CD = c, and
DE = d. Express EA, DA, DB, CA, EC, BE in terms of a, b, c,
and d.
4. Let a and b represent adjacent sides of a regular hexagon so that the
initial point of b is the terminal point of a. Represent the remaining
sides by means of vectors expressed in terms of a and b.
5. Prove that kak b + kbk a is orthogonal to kak b − kbk a for all vectors
a and b.
6. Let a and b be two non-zero vectors. Let
\[ \mathbf{u} = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}} \, \mathbf{a}. \]
Show that b − u is orthogonal to a.
7. Simplify (u + v) × (u − v).
8. Let a and b be two unit vectors, the angle between them being π/3. Show
that 2b − a and a are orthogonal.
9. Prove that
‖u − v‖² + ‖u + v‖² = 2(‖u‖² + ‖v‖²).
Deduce that the sum of the squares of the diagonals of a parallelogram
is equal to the sum of the squares of all four sides.
9.2
Vector arithmetic
The theory I introduced in Section 9.1 is useful for proving general results
about geometry, but what if we want to calculate with particular vectors: how
do we describe them? To do this we need coordinates, and vectors viewed in
terms of coordinates will occupy us for the remainder of this section.
9.2.1
i’s, j’s and k’s
Set up a cartesian coordinate system consisting of x, y and z axes. We
orient the system so that in rotating the x axis clockwise to the y axis, we
are looking in the direction of the positive z axis. Let i, j and k be unit
vectors parallel to the x, y and z axes respectively (pointing in the positive
directions). Every vector a can be uniquely written in the form
a = a1 i + a2 j + a3 k
for some scalars a1 , a2 , a3 . This is achieved by orthogonal projection of the
vector a (moved so that it starts at the origin) onto each of the three coordinate axes. The numbers ai are called the components of a in each of the
three directions.
Remarks
• If a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k then a = b iff ai = bi ;
that is, corresponding components are equal.
• 0 = 0i + 0j + 0k.
• If a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k then
a + b = (a1 + b1)i + (a2 + b2)j + (a3 + b3)k.
• If a = a1 i + a2 j + a3 k then λa = λa1 i + λa2 j + λa3 k.
Theorem 9.2.1 (Scalar products). Let a = a1 i + a2 j + a3 k and b = b1 i +
b2 j + b3 k. Then
a · b = a1 b1 + a2 b2 + a3 b3.
Proof. This is proved using Theorem 9.1.7 (iv) and the following table
· | i  j  k
i | 1  0  0
j | 0  1  0
k | 0  0  1
computed from the definition of the inner product. We have that
a · b = a · (b1 i + b2 j + b3 k) = b1 (a · i) + b2 (a · j) + b3 (a · k).
We now compute a · i, a · j, and a · k in turn:
• a · i = a1 .
• a · j = a2 .
• a · k = a3 .
Putting everything together we get
a · b = a1 b1 + a2 b2 + a3 b3,
as required.
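Remark If a = a1 i + a2 j + a3 k then
\[ \|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2 + a_3^2}. \]
Theorem 9.2.2 (Vector products). Let a = a1 i + a2 j + a3 k and b = b1 i +
b2 j + b3 k. Then
\[ \mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}. \]
Warning! This ‘determinant’ can only be expanded along the first row.
Proof. This follows by Theorem 9.1.10 (iii) and the following table, in which the entry in a given row and column is the row vector × the column vector,
× |  i   j   k
i |  0   k  −j
j | −k   0   i
k |  j  −i   0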
computed from the definition of the vector product. We have that
a × b = a × (b1 i + b2 j + b3 k) = b1 (a × i) + b2 (a × j) + b3 (a × k).
We now compute a × i, a × j, and a × k in turn:
• a × i = −a2 k + a3 j.
• a × j = a1 k − a3 i.
• a × k = −a1 j + a2 i.
Putting everything together we get
a × b = (a2 b3 − a3 b2 )i − (a1 b3 − a3 b1 )j + (a1 b2 − a2 b1 )k
which is equal to the given determinant.
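The component formula can also be checked by machine. What follows is a minimal Python sketch, not part of the course: the names `cross` and `dot` and the use of triples (a1, a2, a3) for vectors are assumptions made for illustration. It expands the ‘determinant’ along its first row and confirms some entries of the table used in the proof.

```python
def dot(a, b):
    """Scalar product of two component triples."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product, obtained by expanding the 'determinant' along its first row."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, -(a1 * b3 - a3 * b1), a1 * b2 - a2 * b1)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Three entries of the table: i x j = k, j x k = i, and i x k = -j.
print(cross(i, j), cross(j, k), cross(i, k))

# The product is orthogonal to both factors, as the definition demands.
a, b = (1, 2, 3), (4, 5, 6)
n = cross(a, b)
print(n, dot(n, a), dot(n, b))
```

The last line is the useful sanity check in practice: whatever a and b are, both scalar products with a × b should vanish.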
The proof of the following now follows by our two theorems above.
Theorem 9.2.3 (Scalar triple products and determinants). Let
a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, c = c1 i + c2 j + c3 k.
Then
            | a1  a2  a3 |
[a, b, c] = | b1  b2  b3 |
            | c1  c2  c3 |
Thus the properties of scalar triple products are the same as the properties of
3 × 3 determinants.
Proof. We calculate a · (b × c). This is equal to
(a1 i + a2 j + a3 k) · [(b2 c3 − b3 c2 )i − (b1 c3 − b3 c1 )j + (b1 c2 − b2 c1 )k].
But this is equal to
a1 (b2 c3 − b3 c2 ) − a2 (b1 c3 − b3 c1 ) + a3 (b1 c2 − b2 c1 )
which is nothing other than
| a1  a2  a3 |
| b1  b2  b3 |
| c1  c2  c3 |
Before we look at some examples it is worth stepping back a bit to see
where we are.
Summary
In Section 9.1, we defined vectors and vector operations geometrically. In Section 9.2, we showed that once we had chosen a coordinate system, vectors and vector operations could be described
algebraically. The important point to remember in what follows
is that the two approaches must give the same answers.
Exercises 9.2
1. Let a = 3i + 4j, b = 2i + 2j − k and c = 3i − 4k.
(i) Find ‖a‖, ‖b‖, and ‖c‖.
(ii) Find a + b and a − c.
(iii) Determine ‖a − c‖.
2. (i) Let a = 4i + j − 3k and b = i + 2j + 2k. Find a · b. Are a and b
orthogonal?
(ii) Find the angle between −2(i − j) + k and j − i.
3. The unit cube is determined by the three vectors i, j and k. Find the
angle between the long diagonal of the unit cube and one of its edges.
4. Calculate i × (i × k) and (i × i) × k. What do you deduce as a result
of this?
5. Calculate u · (v × w) where u = 3i − 2j − 5k, v = i + 4j − 4k, and
w = 3j + 2k.
6. If [a, b, c] = 0 what can you deduce?
9.3 Geometry with vectors
There are two kinds of vectors: the free vectors that we have been dealing
with up to now and the position vectors we introduce next.
9.3.1 Position vectors
So far, we have used vectors to describe line segments. But we can also use
vectors to describe the precise location of points. To do this, we have to
choose and fix a point O in space, called an origin. We can then consider all
the directed line segments that start at O. Each such segment represents a
vector and every vector is thus represented. The tips of the line segments are
points in space, and every point thus occurs. It follows that once an origin
has been fixed, vectors can be used to describe points. We talk about the
position vectors of points. However, we can only talk about position vectors
with respect to some fixed point O.
Example 9.3.1. The point A has position vector a = −i + j and the point
B has position vector b = 2i + j − k. Find the position vector of the point
P which is 2/3 of the way between A and B.
[Figure: the triangle OAB, with a the position vector of A, b the position vector of B, and p the position vector of the point P on AB; the segments AP and PB are marked 2 and 1.]
We have that
→OP = →OA + →AP
    = →OA + (2/3) →AB
    = a + (2/3)(b − a)
    = (1/3) a + (2/3) b
    = (1/3)(−i + j) + (2/3)(2i + j − k)
    = i + j − (2/3) k.
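The calculation in this example is an instance of the general recipe p = a + t(b − a), and it can be reproduced exactly with rational arithmetic. The Python sketch below is only an illustration: the function name `point_between` and the representation of position vectors as component triples are assumptions, not notation used in the course.

```python
from fractions import Fraction

def point_between(a, b, t):
    """Position vector of the point a fraction t of the way from A to B."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))

a = (-1, 1, 0)   # a = -i + j
b = (2, 1, -1)   # b = 2i + j - k

# The point P of the example, 2/3 of the way from A to B.
p = point_between(a, b, Fraction(2, 3))
print(p)   # the components of i + j - (2/3)k
```

Taking t = 0 returns A and t = 1 returns B, which is a quick way to check that the fraction is being measured from the correct end.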
9.3.2 Linear combinations
Let v1 , . . . , vn be n vectors and let λ1 , . . . , λn be n scalars. Then the vector
v = λ1 v1 + . . . + λn vn
is called a linear combination of the n vectors.
Only two cases of this definition are needed in this course. If we are
given just one vector v1 then a linear combination is just a scalar multiple of
that vector. The other case is where we have two vectors v1 and v2 . Linear
combinations then look like this
λ1 v1 + λ2 v2 .
Let v be any non-zero vector. Then any vector parallel to this vector has
the form λv for some scalar λ.
Now let v1 and v2 be two non-zero vectors where neither is a multiple of
the other. Then these two vectors determine a plane in space. This plane
is not rooted to any point and so, for convenience, we may move it parallel
to itself so that it passes through some fixed point that we may treat as an
origin. Now let v be any vector which is parallel to this plane. We may move
it parallel to itself so that its tail is at the origin. By plane geometry, we
may find real numbers λ1 and λ2 such that
v = λ1 v1 + λ2 v2 .
We shall use these ideas in deriving formulae for lines and planes in space
in the sections below.
9.3.3 Lines
Intuitively, a line in space is determined by one of the following two pieces
of information:
1. Two distinct points each described by a position vector.
2. One point and a direction where the point is given by a position vector
and the direction by a (free) vector.
Let’s see how we can use vectors to obtain the equation of that line.
Let a and b be the position vectors of two distinct points. Let r =
xi + yj + zk be the position vector of a point on the line they determine.
Observe that the line determined by the two points is parallel to the
vector b − a, which therefore gives the direction of the line.
The vectors r − a and b − a will be parallel. Thus there is a scalar λ such
that r − a = λ(b − a). It follows that
r = a + λ(b − a).
This is called the (vector form of the) parametric equation of the line. The
parameter in question is λ.
We now derive the coordinate form of the parametric equation. Let
a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k.
Substituting in our vector form above and equating components we obtain
x = a1 + λ(b1 − a1 ),
y = a2 + λ(b2 − a2 ),
z = a3 + λ(b3 − a3 ).
For convenience, put ci = bi − ai . Thus the coordinate form of the parametric
equation for the line is
x = a1 + λc1 ,
y = a2 + λc2 ,
z = a3 + λc3 .
If c1 , c2 , c3 ≠ 0 then we can eliminate the parameter λ from the above equations to get the non-parametric equations of the line:
(x − a1)/c1 = (y − a2)/c2,
(y − a2)/c2 = (z − a3)/c3.
It’s worth noting that
• The parametric equation is useful for generating points on the line (by
choosing values of the parameter λ).
• The non-parametric equation is useful for checking that given points
lie on a given line.
Example 9.3.2. Find the parametric and the non-parametric equations of
the line through the point with position vector i + 2j + 3k and parallel to the
vector 4i + 5j + 6k. In this question, we are given the direction that the line
is parallel to. Thus
r − (i + 2j + 3k)
is parallel to
4i + 5j + 6k.
It follows that
r = i + 2j + 3k + λ(4i + 5j + 6k)
is the vector form of the parametric equation of the line. We now find the
cartesian form of the parametric equation. Put
r = xi + yj + zk.
Then
xi + yj + zk = i + 2j + 3k + λ(4i + 5j + 6k).
These two vectors are equal iff their coordinates are equal. Thus we have
that
x = 1 + 4λ
y = 2 + 5λ
z = 3 + 6λ
This is the cartesian form of the parametric equation of the line. Finally, we
eliminate λ to get the non-parametric equation of the line
(x − 1)/4 = (y − 2)/5 and (y − 2)/5 = (z − 3)/6.
These two equations can be rewritten in the form
5x − 4y = −3 and 6y − 5z = −3.
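The two forms of the equation do different jobs, and a short computation makes this concrete. In the Python sketch below (an illustration only: the names `point_on_line` and `on_line` are assumptions, not the course's notation) the parametric form generates points on the line of this example, and the non-parametric form tests whether a given point lies on it.

```python
from fractions import Fraction

def point_on_line(a, c, lam):
    """Point with position vector a + lam * c, for a point a and direction c."""
    return tuple(ai + lam * ci for ai, ci in zip(a, c))

def on_line(x, y, z):
    """Non-parametric test for the line of the example."""
    return 5 * x - 4 * y == -3 and 6 * y - 5 * z == -3

a = (1, 2, 3)   # the point i + 2j + 3k
c = (4, 5, 6)   # the direction 4i + 5j + 6k

# Every point generated from the parametric form passes the test.
generated = [point_on_line(a, c, lam) for lam in (0, 1, -2, Fraction(1, 2))]
print(all(on_line(*p) for p in generated))

# The origin does not lie on this line.
print(on_line(0, 0, 0))
```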
9.3.4 Planes
Intuitively, a plane in space is determined by one of the following three pieces
of information:
1. Any three points that do not all lie in a straight line; that is, the points
form the vertices of a triangle.
2. One point and two non-parallel directions.
3. One point and a direction which is perpendicular or normal to the
plane.
We shall begin by finding the parametric equation of the plane determined
by the three points with position vectors a, b and c.
The vectors b − a and c − a are both parallel to the plane, but are not
parallel to each other. Thus every vector parallel to the plane they determine
has the form
λ(b − a) + µ(c − a)
for some scalars λ and µ. Here we use the ideas of Section 9.3.2. Thus if
the position vector of an arbitrary point on the plane is r, then r − a =
λ(b − a) + µ(c − a). Thus the (vector form of the) parametric equation of
the plane is
r = a + λ(b − a) + µ(c − a).
This can easily be written in coordinate form by equating components.
To find the non-parametric equation of a plane, we use the fact that a
plane is determined once a point on the plane is known and a vector orthogonal to every vector in the plane — such a vector is said to be normal to
the plane. Let n be a vector normal to our plane, and let a be the position
vector of a point in the plane.
Then r − a is orthogonal to n. Thus
(r − a) · n = 0.
This is the (vector form of the) non-parametric equation of the plane. To
find the coordinate form of the non-parametric equation, let
r = xi + yj + zk,
a = a1 i + a2 j + a3 k,
n = n1 i + n2 j + n3 k.
From (r − a) · n = 0 we get (x − a1 )n1 + (y − a2 )n2 + (z − a3 )n3 = 0. Thus
the non-parametric equation of the plane is
n1 x + n2 y + n3 z = a1 n1 + a2 n2 + a3 n3 .
Remark From the equation above, we deduce that the solutions of a linear
equation in three unknowns
ax + by + cz = d
all lie on a plane in general (although there are some degenerate cases where
something different from a plane will be obtained).
We observe that the non-parametric equation of the line in fact describes
the line as the intersection of two planes.
If we have three equations in three unknowns then, as long as the planes
are angled correctly, they will intersect in a point — that is, the equations
will have a unique solution. However, there are many cases where either the
planes have no points in common (no solution) or have lines or indeed planes
in common (infinitely many solutions).
Thus the nature of the solutions of a system of linear equations in three
unknowns is intimately bound up with the geometry of the planes they determine.
We have one final question to answer: given the parametric equation of
the plane, how do we find the non-parametric equation? The vectors b − a
and c − a are parallel to the plane but not parallel to each other. The vector
n = (b − a) × (c − a)
is normal to our plane.
Example 9.3.3. Find the parametric and non-parametric equations of the
plane containing the three points with position vectors
a = j − k,
b = i + j,
c = i + 2j.
We have that
b−a=i+k
and
c − a = i + j + k.
Thus the parametric equation of the plane is
r = j − k + λ(i + k) + µ(i + j + k).
To find the non-parametric equation, we need to find a vector normal to the
plane. We calculate
(b − a) × (c − a) = k − i.
Thus
(r − a) · (k − i) = 0.
That is
(xi + (y − 1)j + (z + 1)k) · (k − i) = 0.
This simplifies to
z − x = −1,
the non-parametric equation of the plane. We now check that our three
original points satisfy this equation. The point a has co-ordinates (0, 1, −1);
the point b has co-ordinates (1, 1, 0); the point c has co-ordinates (1, 2, 0).
It is easy to check that each set of co-ordinates satisfies the equation.
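The passage from three points to the non-parametric equation is mechanical enough to be written as a short program. The Python sketch below is an illustration under my own naming assumptions (`sub`, `dot`, `cross` and `plane_through` are not names used in the course). It returns a normal n and a number d such that the plane is n · r = d, and it is applied to the three points of this example.

```python
def sub(u, v):
    """Difference u - v of two component triples."""
    return tuple(x - y for x, y in zip(u, v))

def dot(u, v):
    """Scalar product of two component triples."""
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    """Vector product of two component triples."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, -(u1 * v3 - u3 * v1), u1 * v2 - u2 * v1)

def plane_through(a, b, c):
    """Return (n, d) so that the plane through a, b, c is n . r = d."""
    n = cross(sub(b, a), sub(c, a))
    return n, dot(n, a)

a, b, c = (0, 1, -1), (1, 1, 0), (1, 2, 0)
n, d = plane_through(a, b, c)
print(n, d)   # normal k - i and right-hand side -1, that is, z - x = -1

# Each of the three original points satisfies n . r = d.
print([dot(n, p) == d for p in (a, b, c)])
```

If the three points happen to lie on a line, the normal comes out as the zero vector, which is a useful warning that no plane has been determined.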
9.3.5 Determinants
Let’s start with 1 × 1 matrices. The determinant of (a) is just a. The length
of a is |a|, the absolute value of the determinant of (a).
Theorem 9.3.4. Let a = ai + cj and b = bi + dj be a pair of plane vectors.
Then the area of the parallelogram determined by these vectors is the absolute
value of the determinant
| a  b |
| c  d |
Proof. The proof I give will be for the case where both vectors are in the
first quadrant. I shall consider two cases.
(Case 1): b is to the left of a when standing at the origin and looking
along a. Let
a = ai + cj and b = bi + dj.
The area of the parallelogram is the area of the rectangle defined by the
points
0, (a + b)i, a + b, (c + d)j
minus the area of two rectangles the same size, labelled (1), two triangles the
same size, labelled (2), and another two triangles of the same size, labelled
(3). That is
(a + b)(c + d) − 2bc − 2(1/2)ac − 2(1/2)bd
which is equal to
ac + ad + bc + bd − 2bc − bd − ac = ad − bc.
(Case 2): b is to the right of a when standing at the origin and looking
along a. A similar argument shows that the area is bc − ad which is the
negative of the determinant.
Putting these two cases together, we see that the area is the absolute value
of the determinant, because we usually expect areas to be non-negative.
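The dissection argument of Case 1 can be compared directly with the determinant. The following Python sketch is illustrative only, and the function names are assumptions of mine: `area_by_dissection` follows the rectangle-minus-pieces count of the proof, while `parallelogram_area` takes the absolute value of the determinant as in the theorem.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 array with rows (a, b) and (c, d)."""
    return a * d - b * c

def parallelogram_area(u, v):
    """Area of the parallelogram on the plane vectors u = (a, c) and v = (b, d)."""
    (a, c), (b, d) = u, v
    return abs(det2(a, b, c, d))

def area_by_dissection(u, v):
    """Case 1 of the proof: big rectangle minus two rectangles and four triangles."""
    (a, c), (b, d) = u, v
    return (a + b) * (c + d) - 2 * b * c - a * c - b * d

u, v = (3, 1), (1, 2)   # v lies to the left of u, so this is Case 1
print(area_by_dissection(u, v), parallelogram_area(u, v))

# Swapping the vectors gives Case 2: the determinant changes sign, the area does not.
print(det2(1, 3, 2, 1), parallelogram_area(v, u))
```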
Theorem 9.3.5. Let
a = ai + dj + gk, b = bi + ej + hk, c = ci + f j + ik
be three vectors. Then the volume of the parallelepiped (‘squashed box’) determined by these three vectors is the absolute value of the determinant
| a  b  c |
| d  e  f |
| g  h  i |
or its transpose.
Proof. We refer to the diagram below.
The volume of the box determined by the vectors a, b, c is equal to the
base area times the vertical height. This is equal to the absolute value of
‖a‖ ‖b‖ sin θ ‖c‖ cos φ.
We have to use the absolute value of this expression because cos(φ) can take
negative values if c is below rather than above the plane of a and b as I have
drawn it. Now
• a × b = ‖a‖ ‖b‖ sin θ n, where n is the unit vector orthogonal to a and
b and in the correct direction.
• n · c = ‖c‖ cos φ.
Thus
‖a‖ ‖b‖ sin θ ‖c‖ cos φ = (a × b) · c.
By the properties of the inner product
(a × b) · c = c · (a × b) = [c, a, b].
We now use properties of the determinant
[c, a, b] = −[a, c, b] = [a, b, c].
It follows that the volume of the box is the absolute value of
[a, b, c].
It follows from the above theorem and our theorem on scalar triple products that the volume of the parallelepiped determined by the three vectors
a, b, and c is the absolute value of the scalar triple product [a, b, c].
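The sign changes used in the proof, and the resulting volume, can be checked numerically. The Python sketch below is an illustration under my own naming assumptions (`dot`, `cross`, `triple` and `volume` are not the course's notation), and the three vectors are chosen arbitrarily.

```python
def dot(u, v):
    """Scalar product of two component triples."""
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    """Vector product of two component triples."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, -(u1 * v3 - u3 * v1), u1 * v2 - u2 * v1)

def triple(a, b, c):
    """Scalar triple product [a, b, c] = a . (b x c)."""
    return dot(a, cross(b, c))

def volume(a, b, c):
    """Volume of the parallelepiped determined by a, b and c."""
    return abs(triple(a, b, c))

a, b, c = (1, 0, 0), (1, 2, 0), (1, 1, 3)

# Swapping two vectors changes the sign; a cyclic shift does not.
print(triple(a, b, c), triple(b, a, c), triple(c, a, b))

# The volume does not care about the order of the vectors.
print(volume(a, b, c), volume(b, a, c))
```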
The geometric significance of determinants is that they enable us to measure lengths, areas and volumes.
Exercises 9.3
1. (i) Find the parametric and the non-parametric equations of the line
through the two points with position vectors i − j + 2k and 2i +
3j + 4k.
(ii) Find the parametric and the non-parametric equations of the plane
containing the three points with position vectors i + 3k, i + 2j − k,
and 3i − j − 2k.
2. Let c be the position vector of the centre of a sphere with radius R.
Let an arbitrary point on the sphere have position vector r. Why is
‖r − c‖ = R? Squaring both sides we get
(r − c) · (r − c) = R².
If r = xi + yj + zk and c = c1 i + c2 j + c3 k, deduce that the equation of
the sphere with centre c1 i + c2 j + c3 k and radius R is
(x − c1)² + (y − c2)² + (z − c3)² = R².
(i) Find the equation of the sphere with centre i + j + k and radius 2.
(ii) Find the centre and radius of the sphere with equation
x² + y² + z² − 2x − 4y − 6z − 2 = 0.
3. The distance of a point from a line is defined to be the length of the
perpendicular from the point to the line. Let the line in question have
parametric equation
r = p + λd
and let the position vector of the point be q. Show that the distance
of the point from the line is
‖d × (q − p)‖ / ‖d‖.
4. The distance of a point from a plane is defined to be the length of the
perpendicular from the point to the plane. Let the position vector of the point be q
and the equation of the plane be (r − p) · n = 0. Show that the distance
of the point from the plane is
|(q − p) · n| / ‖n‖.
9.4 Summary of vectors
Inner products
Definition
Let a and b be two vectors. If a, b ≠ 0 then we define
a · b = ‖a‖ ‖b‖ cos θ
where θ is the angle between a and b. Note that this angle is always chosen
to be 0 ≤ θ ≤ π. If either a or b is zero then a · b is defined to be zero. We
call a · b the inner product of a and b.
Co-ordinate form
Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
a · b = a1 b 1 + a2 b 2 + a3 b 3 .
Uses
• The most important application is the following: if the vectors a and b
are non-zero then a · b = 0 precisely when a and b are orthogonal —
meaning ‘at right angles to each other’.
• The inner product can more generally be used to work out the angle
between two vectors
cos θ = (a · b) / (‖a‖ ‖b‖)
where θ is the angle between the non-zero vectors a and b.
• The inner product can be used to work out the lengths of vectors:
‖a‖ = √(a · a).
Vector products
Definition
Let a and b be non-zero vectors. We define a new vector
a × b = ‖a‖ ‖b‖ sin θ n
where θ is the angle between a and b, and n is a unit vector at right angles to
the plane containing a and b. This determines n up to sign: we choose the
direction of n so that when rotating a to b in a clockwise direction through
the angle θ we are looking in the direction of n. If a or b is zero then a × b
is the zero vector. We call it the vector product of a and b.
Co-ordinate form
Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
        | i   j   k  |
a × b = | a1  a2  a3 |
        | b1  b2  b3 |
Uses
• The most important application of the vector product is in constructing a vector orthogonal to two other vectors, and in particular in constructing a vector orthogonal to a plane. That is, a vector normal to
the plane.
• If the vectors a and b are non-zero then a × b = 0 precisely when a
and b are parallel to each other.
• The vector product can be used to calculate the sine of the angle between two vectors
sin θ = ‖a × b‖ / (‖a‖ ‖b‖)
where θ is the angle between the non-zero vectors a and b.
Scalar triple products
Definition
Let a, b and c be three vectors. Then b × c is a vector. Thus a · (b × c)
is a scalar. We define
[a, b, c] = a · (b × c).
It is called the scalar triple product.
Co-ordinate form
Let a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and c = c1 i + c2 j + c3 k. Then
            | a1  a2  a3 |
[a, b, c] = | b1  b2  b3 |
            | c1  c2  c3 |
Uses
• The absolute value of [a, b, c] is the volume of the parallelepiped (‘squashed
box’) determined by the three vectors.
• The scalar triple product gives a geometric interpretation of 3 × 3 determinants.
9.5 *Two vector proofs*
This section will not be examined in 2013.
My development of the theory of vectors in this chapter depended on two
important results: Theorem 9.1.7 (iv), the fact that
a · (b + c) = a · b + a · c
and Theorem 9.1.10 (iii), the fact that
a × (b + c) = a × b + a × c.
I shall sketch out proofs of both of these results here. The proof of the first
is not too difficult.
Theorem 9.5.1.
a · (b + c) = a · b + a · c.
Proof. Let x and y be a pair of vectors. Then the component of x in the
direction of y, written comp(x, y), is by definition the number ‖x‖ cos θ where
θ is the angle between x and y. Clearly
x · y = ‖y‖ comp(x, y).
Geometry shows (this means you should draw the pictures) that
comp(b + c, a) = comp(b, a) + comp(c, a).
We therefore have that
(b + c) · a = ‖a‖ comp(b + c, a)
= ‖a‖ comp(b, a) + ‖a‖ comp(c, a)
= b · a + c · a.
Since the inner product is commutative, this is the same as a · (b + c) = a · b + a · c.
The proof of the second is hairier.
Theorem 9.5.2.
a × (b + c) = a × b + a × c.
Proof. We defined the vector product in terms of geometry and so we shall
have to prove this property by means of geometry. I shall sketch out a proof
following one given in Pettofrezzo’s book.
We begin with what is in effect a lemma. Let a and b be a pair of
vectors. It is convenient to move them so that they are both emanating from
the same point P . They determine a plane. In that plane, we can draw
a line perpendicular to the vector a and passing through the point P . We
project the vector b onto this line and we get a vector b′. We claim that
a × b = a × b′. The proof follows by observing that these two vectors clearly
have the same direction and a calculation shows that they have the same
length: if θ is the angle between a and b then ‖b′‖ = ‖b‖ sin θ, and a is
orthogonal to b′, so both lengths are equal to ‖a‖ ‖b‖ sin θ.
We now prove our theorem. We orientate ourselves so that the vector a
is at right angles to the page and pointing at you the reader. We project the
vectors b and c onto the plane of the page to get the vectors b′ and c′.
We shall prove that
a × (b′ + c′) = a × b′ + a × c′.
Let’s see first why this result is enough to prove the theorem. The vectors a
and b + c define a plane. As in our lemma above, we have that a × (b + c) =
a × (b + c)′. Also a × b = a × b′ and a × c = a × c′. As long as (b + c)′ = b′ + c′,
our theorem will follow.
We now prove that
a × (b′ + c′) = a × b′ + a × c′.
Now, by the way we have defined our vectors, a × b′ and a × c′ are in the
plane of the page and are orthogonal to b′ and c′, respectively. This leads to
the crux of the proof: the angle between a × b′ and a × c′ is the same as the
angle between b′ and c′. The point is that because a is pointing out of the
page, the operator a × − has the effect of rotating vectors by a right angle
in the plane of the page.
It follows that a × b′ + a × c′ is at right angles to b′ + c′. Thus a × b′ + a × c′
and a × (b′ + c′) are vectors pointing in the same direction.
We now compare the lengths of these two vectors. We shall use the fact
that the triangle formed by the vectors a × b′ and a × c′ is similar to the
triangle formed by the vectors b′ and c′. Thus
‖a × b′ + a × c′‖ / ‖b′ + c′‖ = ‖a × b′‖ / ‖b′‖.
Since a is orthogonal to b′, we have ‖a × b′‖ = ‖a‖ ‖b′‖, and so this works out to give that
‖a × b′ + a × c′‖ = ‖a‖ ‖b′ + c′‖,
which is also the length of a × (b′ + c′).
Our claim is now proved.
9.6 Quaternions
The set of quaternions, denoted by H, was invented by the Irish mathematician Sir William Rowan Hamilton in 1843. They are 4-dimensional generalisations of the complex numbers. It was from the theory of quaternions
that the modern theory of vectors with inner and vector products developed.
To describe what they are, I shall reverse history and derive them from vectors. Recall the following from an earlier exercise. The Pauli matrices are:
I, X, Y, Z, −I, −X, −Y, −Z where
X = (  0  1 )     Y = ( i   0 )     and Z = (  0  −i )
    ( −1  0 )         ( 0  −i )             ( −i   0 )
where i is the complex number i. You were asked to show that the product
of any two Pauli matrices is again a Pauli matrix by completing a table. We
shall just need a portion of that table relating to X, Y and Z. This is
   |  X   Y   Z
---+------------
 X | −I   Z  −Y
 Y | −Z  −I   X
 Z |  Y  −X  −I
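If you did not complete the table in the earlier exercise, it can be verified by machine. The Python sketch below is an illustration only: it stores each 2 × 2 matrix as a tuple of rows, uses Python's built-in complex number `1j` for i, and the helper names `matmul` and `neg` are assumptions of mine. Each entry is read as (row label) times (column label).

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(A[r][m] * B[m][c] for m in range(2)) for c in range(2))
        for r in range(2)
    )

def neg(A):
    """The matrix -A."""
    return tuple(tuple(-x for x in row) for row in A)

I = ((1, 0), (0, 1))
X = ((0, 1), (-1, 0))
Y = ((1j, 0), (0, -1j))
Z = ((0, -1j), (-1j, 0))

# Row label times column label, in the same order as the table.
expected = {
    ("X", "X"): neg(I), ("X", "Y"): Z,      ("X", "Z"): neg(Y),
    ("Y", "X"): neg(Z), ("Y", "Y"): neg(I), ("Y", "Z"): X,
    ("Z", "X"): Y,      ("Z", "Y"): neg(X), ("Z", "Z"): neg(I),
}
matrices = {"X": X, "Y": Y, "Z": Z}
table_ok = all(
    matmul(matrices[r], matrices[c]) == value
    for (r, c), value in expected.items()
)
print(table_ok)
```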
We shall now consider matrices of the form
λI + αX + βY + γZ
where λ, α, β, γ ∈ R. We calculate the product of two such matrices using the
distributivity and scalar multiplication properties of matrix multiplication
and the above multiplication table. The product
(λI + αX + βY + γZ)(µI + α′X + β′Y + γ′Z)
can be written in the form aI + bX + cY + dZ where a, b, c, d ∈ R although
I shall write it in a slightly different form
(λµ − αα′ − ββ′ − γγ′)I +
λ(α′X + β′Y + γ′Z) + µ(αX + βY + γZ) +
(βγ′ − γβ′)X + (γα′ − αγ′)Y + (αβ′ − βα′)Z.
Although this looks complicated there are some familiar things within it:
the first term contains what looks like an inner product and the last term
contains what looks like a vector product. Note that because this is matrix
multiplication this operation is associative.
The above calculation motivates the following construction. Let E3 denote the set of all 3-dimensional vectors. Thus a typical element of E3 is
αi + βj + γk. Put
H = R × E3 .
The elements of H are therefore ordered pairs (λ, a) consisting of a real
number λ and a vector a. We define the sum of two elements of H in a very
simple way
(λ, a) + (µ, a′) = (λ + µ, a + a′).
The product is defined in a way that mimics what I did above (you should
check this)
(λ, a)(µ, a′) = (λµ − a · a′, λa′ + µa + (a × a′)).
It follows that this product is associative!
We shall now investigate what we can do with H. I shall only deal with
multiplication because addition poses no problems.
• Consider the subset R of H which consists of elements of the form
(λ, 0). You can check that (λ, 0)(µ, 0) = (λµ, 0). Thus R mimics the
real numbers.
• Consider the subset C of H which consists of the elements of the form
(λ, ai). You can check that
(λ, ai)(µ, a′i) = (λµ − aa′, (λa′ + µa)i).
In particular, (0, i)(0, i) = (−1, 0). Thus C mimics the set of complex
numbers.
• Consider the subset E of H which consists of elements of the form (0, a).
You can check that
(0, a)(0, a′) = (−a · a′, a × a′).
Thus E mimics vectors, the inner product and the vector product.
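The three checks above, and associativity for a particular choice of elements, can be carried out with a few lines of code. This is a minimal Python sketch under assumptions of my own: an element of H is stored as a pair (λ, a) with a a component triple, and the names `qmul`, `dot` and `cross` are illustrative rather than anything defined in the course.

```python
def dot(u, v):
    """Scalar product of two component triples."""
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    """Vector product of two component triples."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, -(u1 * v3 - u3 * v1), u1 * v2 - u2 * v1)

def qmul(p, q):
    """Product (lam, a)(mu, b) = (lam*mu - a.b, lam*b + mu*a + a x b)."""
    lam, a = p
    mu, b = q
    axb = cross(a, b)
    return (
        lam * mu - dot(a, b),
        tuple(lam * y + mu * x + z for x, y, z in zip(a, b, axb)),
    )

zero, i = (0, 0, 0), (1, 0, 0)

print(qmul((2, zero), (3, zero)))   # elements (lam, 0) multiply like real numbers
print(qmul((0, i), (0, i)))         # (0, i) squares to (-1, 0)

# Associativity for one arbitrarily chosen triple of elements.
p, q, r = (1, (2, 0, -1)), (0, (1, 3, 2)), (-2, (0, 1, 1))
print(qmul(qmul(p, q), r) == qmul(p, qmul(q, r)))
```

A check of one triple is of course not a proof; the proof is the observation made earlier that the product mimics matrix multiplication.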
The set H with the above operations of addition and multiplication is
the set of quaternions. This structure pulls together most of the important
elements of this course: complex numbers, vectors and matrices.