F17CC1 ALGEBRA A
Algebra, geometry and combinatorics
Dr Mark V Lawson
July 12, 2013
Contents

1 The fundamental theorem of arithmetic
  1.1 Basic sets
  1.2 Writing numbers down
    1.2.1 From tallies to the Hindu-Arabic number system
    1.2.2 Number bases
  1.3 The fundamental theorem of arithmetic
    1.3.1 Greatest common divisors
    1.3.2 Primes: the atoms of number
  1.4 Real numbers
    1.4.1 Irrational numbers
    1.4.2 Decimal fractions
  1.5 *The prime number theorem*
  1.6 *Proofs by induction*
  1.7 Learning outcomes for Chapter 1
  1.8 Further reading and exercises

2 The fundamental theorem of algebra
  2.1 Complex number arithmetic
    2.1.1 Solving quadratic equations
    2.1.2 Introducing complex numbers
  2.2 The fundamental theorem of algebra
    2.2.1 The arithmetic of polynomials
    2.2.2 Roots of polynomials
  2.3 Complex number geometry
    2.3.1 sin and cos
    2.3.2 The complex plane
    2.3.3 Arbitrary roots of complex numbers
    2.3.4 Euler’s formula
  2.4 *Making sense of complex numbers*
  2.5 *Morning duel: cubics, quartics, quintics and beyond*
  2.6 *Analogies*
  2.7 *Rational functions*
    2.7.1 Numerical partial fractions
    2.7.2 Partial fractions
    2.7.3 Integrating rational functions
  2.8 Learning outcomes for Chapter 2
  2.9 Further reading and exercises

3 Matrices
  3.1 Matrix arithmetic
    3.1.1 Basic matrix definitions
    3.1.2 Addition, subtraction, scalar multiplication and the transpose
    3.1.3 Matrix multiplication
    3.1.4 Summary of matrix multiplication
    3.1.5 Special matrices
    3.1.6 Linear equations
  3.2 Matrix algebra
    3.2.1 Properties of matrix addition
    3.2.2 Properties of matrix multiplication
    3.2.3 Properties of scalar multiplication
    3.2.4 Properties of the transpose
    3.2.5 Polynomials of matrices
  3.3 Determinants
  3.4 Solving systems of linear equations
    3.4.1 Some theory
    3.4.2 Gaussian elimination
  3.5 Blankinship’s algorithm
  3.6 *Some proofs*
  3.7 *Matrix inverses*
    3.7.1 The key idea
    3.7.2 Invertible and noninvertible matrices
    3.7.3 The matrix inverse method for solving linear equations
  3.8 *Complex numbers via matrices*
  3.9 Learning outcomes for Chapter 3
  3.10 Further reading and exercises

4 Vectors
  4.1 Vector algebra
    4.1.1 Addition and scalar multiplication of vectors
    4.1.2 Inner, scalar or dot products
    4.1.3 Vector or cross products
    4.1.4 Scalar triple products
  4.2 Vector arithmetic
    4.2.1 i’s, j’s and k’s
  4.3 Geometry with vectors
    4.3.1 Position vectors
    4.3.2 Linear combinations
    4.3.3 Lines
    4.3.4 Planes
    4.3.5 Determinants
  4.4 Summary of vectors
  4.5 *Two vector proofs*
  4.6 *Quaternions*
  4.7 Learning outcomes for Chapter 4
  4.8 Further reading and exercises

5 Counting
  5.1 More set theory
    5.1.1 Operations on sets
    5.1.2 Partitions
    5.1.3 Sequences
  5.2 Ways of counting
    5.2.1 Counting principles
    5.2.2 Counting sequences
    5.2.3 The power set
    5.2.4 Counting arrangements: permutations
    5.2.5 Counting choices: combinations
    5.2.6 Examples of counting
  5.3 The binomial theorem
  5.4 *An introduction to infinite numbers*
  5.5 *Proving things about sets*
  5.6 Learning outcomes for Chapter 5
  5.7 Further reading and exercises

Afterword
Chapter 1
The fundamental theorem of arithmetic
In everyday life a number is a number, but in mathematics we distinguish
between different kinds of numbers according to their properties and uses.
The goal of this chapter is to describe those numbers that should be familiar
to you, whereas in the next chapter we shall introduce numbers that might
be unfamiliar to you: the complex numbers that are so important in the
later study of mathematics. There are two essential results in this chapter:
the fact that every natural number greater than or equal to 2 can be written
uniquely as a product of powers of primes, which is the fundamental theorem
of arithmetic, and the proof that certain numbers are irrational.
1.1 Basic sets
Set theory, invented by Georg Cantor (1845–1918) in the last quarter of the
nineteenth century, provides a precise language for doing mathematics. This
section is mainly a phrasebook of the most important terms we shall need
for the first four chapters, whereas in Chapter 5 we shall study this language
in slightly more detail. The starting point of set theory is the following two
deceptively simple definitions:
• A set is a collection of objects which we wish to regard as a whole. The
members of a set are called its elements.
• Two sets are equal precisely when they have the same elements.
We often use capital letters to name sets: such as A, B, or C or fancy capital
letters such as N and Z. The elements of a set are usually denoted by lower
case letters. If x is an element of the set A then we write
x ∈ A
and if x is not an element of the set A then we write
x ∉ A.
A set should be regarded as a bag of elements, and so the order of the
elements within the set is not important. In addition, repetition of elements
is ignored.¹
Examples 1.1.1.
(i) The following sets are all equal: {a, b}, {b, a}, {a, a, b}, {a, a, a, a, b, b, b, a},
because the order of the elements within a set is not important and any
repetitions are ignored. Despite this it is usual to write sets without
repetitions to avoid confusion. We have that a ∈ {a, b} and b ∈ {a, b}
but α ∉ {a, b}.
(ii) The set {} is empty and is called the empty set. It is given a special
symbol ∅, which is taken from Danish and is the first letter of the
Danish word meaning ‘empty’. Remember that ∅ means the same thing
as {}. Take careful note that ∅ ≠ {∅}. The reason is that the empty
set contains no elements whereas the set {∅} contains one element. By
the way, the symbol for the empty set is different from the Greek letter
phi: φ or Φ.
The number of elements in a set is called its cardinality. If X is a set
then |X| denotes its cardinality. A set is finite if it only has a finite number
of elements, otherwise it is infinite. If a set has only finitely many elements
then we might be able to list them if there aren’t too many: this is done by
putting them in ‘curly brackets’ { and }. We can sometimes define infinite
sets by using curly brackets but then, because we can’t list all elements in
an infinite set, we use ‘. . .’ to mean ‘and so on in the obvious way’. This can
also be used to define finite sets where there is an obvious pattern. Often,
¹If you want to take account of repetitions you have to use multisets.
we describe a set by saying what properties an element must have to belong
to the set. Thus
{x : P (x)}
means ‘the set of all things x which satisfy the condition P ’. Here are some
examples of sets defined in various ways.
Examples 1.1.2.
(i) D = { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday }, the set of the days of the week. This is a small finite set and so
we can conveniently list its elements.
(ii) M = { January, February, March, . . . , November, December }, the set
of the months of the year. This is a finite set but I didn’t want to write
down all the elements so I wrote ‘. . . ’ to indicate that there were other
elements of the set which I was too lazy to write down explicitly but
which are, nevertheless, there.
(iii) A = {x : x is a prime number}. I define a set by describing the properties that the elements of the set must have.
Sets can be complicated. In particular, a set can contain elements which
are themselves sets. For example, A = {{a}, {a, b}} is a set whose elements
are {a} and {a, b} which both happen to be sets. Thus {a} ∈ {{a}, {a, b}}.
In this course, the following sets of numbers will play a special role. We
shall use this notation throughout and so it is worthwhile getting used to it.
Examples 1.1.3.
(i) The set N = {0, 1, 2, 3, . . .} of all natural numbers.
(ii) The set Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} of all integers. The reason Z
is used to designate this set is because ‘Z’ is the first letter of the word
‘Zahl’, the German for number.
(iii) The set Q of all rational numbers i.e. those numbers that can be written
as fractions whether positive or negative.
(iv) The set R of all real numbers i.e. all numbers which can be represented
by decimals with potentially infinitely many digits after the decimal
point.
(v) The set C of all complex numbers, which I shall introduce in Chapter 2.
Given a set A, a new set B can be formed by choosing elements from A
to put in B. We say that B is a subset of A, which is written B ⊆ A. If
A ⊆ B and A ≠ B then we say that A is a proper subset of B. Observe that
two sets A and B are equal precisely when the following two conditions hold:
1. A ⊆ B.
2. B ⊆ A.
This is often the best way of showing that two sets are equal, although we
won’t make much use of it in this course.
Examples 1.1.4.
(i) ∅ ⊆ A for every set A, where we choose no elements from A. It is a very
common mistake to forget the empty set when listing subsets of a set.
(ii) A ⊆ A for every set A, where we choose all the elements from A.
(iii) N ⊆ Z ⊆ Q ⊆ R ⊆ C.
(iv) E, the set of even natural numbers, is a subset of N.
(v) O, the set of odd natural numbers, is a subset of N.
(vi) P = {2, 3, 5, 7, 11, 13, 17, 19, 23, . . .}, the set of primes, is a subset of N.
(vii) A = {x : x ∈ R and x² = 4}, which is just the set {−2, 2}.
Exercises 1.1
1. Let A = {♣, ♦, ♥, ♠}, B = {♠, ♦, ♣, ♥} and C = {♠, ♦, ♣, ♥, ♣, ♦, ♥, ♠}.
Is it true or false that A = B and B = C? Explain.
2. Find all subsets of the set {a, b, c, d}.
3. Let X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Write down the following subsets of
X:
(i) The subset A of even elements of X.
(ii) The subset B of odd elements of X.
(iii) C = {x : x ∈ X and x ≥ 6}.
(iv) D = {x : x ∈ X and x > 10}.
(v) E = {x : x ∈ X and x is prime}.
(vi) F = {x : x ∈ X and (x ≤ 4 or x ≥ 7)}.
4. Write down the cardinalities of the following sets.
(i) ∅.
(ii) {∅}.
(iii) {∅, {∅}}.
(iv) {∅, {∅}, {∅, {∅}}}.
1.2 Writing numbers down
In this section, we shall explain positional number notation, the system we
use for writing numbers down. The numbers we shall be looking at in this
section are the natural numbers: N = {0, 1, 2, 3, . . .}.
1.2.1 From tallies to the Hindu-Arabic number system
I don’t think our hunter-gatherer ancestors worried too much about writing
numbers down because there wasn’t any need: they didn’t have to fill in
tax-returns and so didn’t need accountants. However, organizing cities does
need accountants and so ways had to be found of writing numbers down.
The simplest way of doing this is to use a mark like |, called a tally, for
each thing being counted. So
||||||||||
means 10 things. This system has advantages and disadvantages. The advantage is that you don’t have to go on a training course to learn it. The
disadvantage is that even quite small numbers need a lot of space like
||||||||||||||||||||||||||||||||||||||
It’s also hard to tell whether
|||||||||||||||||||||||||||||||||||||||
is the same number or not. (It’s not.)
It’s inevitable that people will introduce abbreviations to make the system easier to use. Perhaps it was in this way that the next development
occurred. Both the ancient Egyptians and Romans used similar systems but
I’ll describe the Roman system because it involves letters rather than pictures. First, you have a list of basic symbols:
number   1   5   10   50   100   500   1000
symbol   I   V   X    L    C     D     M
There are more symbols for bigger numbers but we won’t worry about them.
Numbers are then written according to the additive principle. Thus MMVIIII
is 2009. Incidentally, I understand that the custom of also using a subtractive
principle (so that, for example, IX means 9 rather than VIIII) is a more
modern innovation.
This system is clearly a great improvement on the tally-system. Even
quite big numbers are written compactly and it is easy to compare numbers.
On the other hand, there is a bit more to learn. The other disadvantage is
that we need separate symbols for different powers of 10 and their multiples
by 5. This was probably not too inconvenient in the ancient world where
it is likely that the numbers needed on a day-to-day basis were never going
to be that big. A common criticism of this system is that it is hard to do
multiplication in. However, that turns out to be a non-problem because, like
us, the Romans used pocket calculators or, more accurately, a toga-friendly
device called an abacus. The real evidence for the usefulness of this system
of writing numbers is that it survived for hundreds and hundreds of years.
The system used throughout the world today, called the Hindu-Arabic
number system, seems to have been in place by the ninth century in India
but it was hundreds of years in development and the result of ideas from
many different cultures [3]; the invention of zero on its own is one of the
great steps in human intellectual development. The genius of the system is
that it requires only 10 symbols
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
and every natural number can be written using a sequence of these symbols.
The trick to making the system work is that we use the position on the page
of a symbol to tell us what number it means. Thus 2009 means

10³  10²  10¹  10⁰
 2    0    0    9

In other words
2 × 10³ + 0 × 10² + 0 × 10¹ + 9 × 10⁰.
Notice the important role played by the symbol 0 which makes it clear which
column a symbol belongs in otherwise we couldn’t tell 29 from 209 from 2009.
The disadvantage of this system is that you do have to go on a course to learn
it because it is a highly sophisticated way of writing numbers. On the other
hand, it has the enormous advantage that any number can be written down
in a compact way.
Once the basic system had been accepted it could be adapted to deal not
only with positive whole numbers but also negative whole numbers, using
the symbol −, and also fractions with the introduction of the decimal point.
By the end of the sixteenth century the full decimal system was in place [13].
Notation warning! In the UK, we use a raised decimal point like 0 · 123
and not a comma. Also we generally write the number 1 without a long
hook at the top. If you do write it like that there is a danger that people will
confuse it with the number 7 which is not always written in the UK with a
line through it.
1.2.2 Number bases
We shall now look in more detail at the way in which numbers can be written
down using a positional notation. In order not to be biased, we shall not just
work in base 10 but show how any base can be used. Base 10 probably arose
for biological reasons since we have ten fingers.
There is one result that we shall use throughout the remainder of this
section. It can be proved using the following idea. For simplicity let’s assume
that both a and b are positive. If 0 < a < b then b · 0 < a < b · 1. If a ≥ b
then we can always find a q such that bq ≤ a < b(q + 1). We therefore have
the following.
Lemma 1.2.1 (Remainder Theorem). Let a and b be natural numbers with
b ≠ 0. Then there are unique integers q and r such that
a = bq + r
where 0 ≤ r < b.
The number q is called the quotient and the number r is called the remainder. For example, if we consider the pair of natural numbers 14 and 3
then
14 = 3 · 4 + 2
where 4 is the quotient and 2 is the remainder. There’s nothing new here
except possibly the terminology.
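The notes themselves use no programming, but the quotient and remainder of the Remainder Theorem are easy to experiment with. In Python (used here purely for illustration), the built-in divmod returns both at once:

```python
# Quotient and remainder in the sense of the Remainder Theorem:
# a = b*q + r with 0 <= r < b.
a, b = 14, 3
q, r = divmod(a, b)          # q = a // b = 4, r = a % b = 2
assert a == b * q + r and 0 <= r < b
print(q, r)                  # 4 2
```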
Let a and b be integers. We say that a divides b, or that b is divisible by
a, if there is a q such that b = aq. In other words, there is no remainder. We
also say that a is a divisor or factor of b. We write a | b to mean the same
thing as ‘a divides b’.²
Warning! a | b does not mean the same thing as a/b. The latter is a number;
the former is a statement about two numbers.
Let’s see how to represent numbers in base d where d ≥ 2. If d ≤ 10 then
we represent numbers by sequences of symbols taken from the set
Zd = {0, 1, 2, 3, . . . , d − 1}
but if d > 10 then we need new symbols for 10, 11, 12 and so forth. It’s
convenient to use A, B, C, . . . For example, if we want to write numbers in
base 12 we use the set of symbols
{0, 1, . . . , 9, A, B}
whereas if we work in base 16 we use the set of symbols
{0, 1, . . . , 9, A, B, C, D, E, F }.
If x is a sequence of symbols then we write x_d to make it clear that we are
to interpret this sequence as a number in base d. Thus BAD₁₆ is a number
in base 16.
²Observe that if a is nonzero, then a | a; if a | b and b | a then a = ±b; and finally if
a | b and b | c then a | c.
The symbols in a sequence x_d, reading from right to left, tell us the contribution each power of d, such as d⁰, d¹, d², etc., makes to the number the
sequence represents. Here are some examples.
Examples 1.2.2. Converting from base d to base 10.
(i) 11A9₁₂ is a number in base 12. This represents the following number in
base 10:
1 × 12³ + 1 × 12² + A × 12¹ + 9 × 12⁰,
which is just the number
12³ + 12² + 10 × 12 + 9 = 2001.
(ii) BAD₁₆ represents a number in base 16. This represents the following
number in base 10:
B × 16² + A × 16¹ + D × 16⁰,
which is just the number
11 × 16² + 10 × 16 + 13 = 2989.
(iii) 5556₇ represents a number in base 7. This represents the following
number in base 10:
5 × 7³ + 5 × 7² + 5 × 7¹ + 6 × 7⁰ = 2001.
These examples show how easy it is to convert from base d to base 10.
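If you want to check such conversions mechanically, Python's built-in int accepts a base argument (Python is used here only as a checking tool, not as part of the notes):

```python
# Check the base d to base 10 conversions of Examples 1.2.2.
print(int("11A9", 12))   # 1*12**3 + 1*12**2 + 10*12 + 9 = 2001
print(int("BAD", 16))    # 11*16**2 + 10*16 + 13 = 2989
print(int("5556", 7))    # 5*7**3 + 5*7**2 + 5*7 + 6 = 2001
```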
There are two ways to convert from base 10 to base d.
1. The first runs in outline as follows. Let n be the number in base 10
that we wish to write in base d. Look for the largest power m of d such
that a·dᵐ ≤ n for some a < d. Then repeat for n − a·dᵐ. Continuing in
this way, we write n as a sum of multiples of powers of d and so we can
write n in base d.
2. The second makes use of the remainder theorem. The idea behind this
method is as follows. Let
n = aₘ . . . a₁a₀
in base d. We may think of this as
n = (aₘ . . . a₁) · d + a₀.
It follows that a₀ is the remainder when n is divided by d, and the
quotient is n′ = aₘ . . . a₁. Thus we can generate the digits of n in base
d from right to left by repeatedly finding the next quotient and next
remainder by dividing the current quotient by d; the process starts with
our input number as first quotient.
Examples 1.2.3. Converting from base 10 to base d.
(i) Write 2001 in base 7. I’ll solve this question in two different ways: the
long but direct route and then the short but more thought-provoking
route.
We see that 7⁴ > 2001. Thus we divide 2001 by 7³. This goes 5 times
plus a remainder. Thus 2001 = 5 × 7³ + 286. We now repeat with
286. We divide it by 7². It goes 5 times again plus a remainder. Thus
286 = 5 × 7² + 41. We now repeat with 41. We get that 41 = 5 × 7 + 6.
We have therefore shown that
2001 = 5 × 7³ + 5 × 7² + 5 × 7 + 6.
Thus 2001 in base 7 is just 5556.
Now for the short method.

7 | 2001 |
7 |  285 | 6
7 |   40 | 5
7 |    5 | 5
  |    0 | 5

Thus 2001 in base 7 (reading the remainders from bottom to top) is 5556.
(ii) Write 2001 in base 12.

12 | 2001 |
12 |  166 | 9
12 |   13 | 10 = A
12 |    1 | 1
   |    0 | 1

Thus 2001 in base 12 (reading the remainders from bottom to top) is 11A9.
(iii) Write 2001 in base 2.

2 | 2001 |
2 | 1000 | 1
2 |  500 | 0
2 |  250 | 0
2 |  125 | 0
2 |   62 | 1
2 |   31 | 0
2 |   15 | 1
2 |    7 | 1
2 |    3 | 1
2 |    1 | 1
  |    0 | 1

Thus 2001 in base 2 (reading the remainders from bottom to top) is 11111010001.
When converting from one base to another it is always wise to check
your calculations by converting back.
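The short method above (repeated division, collecting the remainders from bottom to top) can be sketched in Python; this is an illustration of the idea, not part of the notes:

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, d):
    """Write the natural number n in base d (2 <= d <= 16) by
    repeatedly dividing the current quotient by d; the remainders,
    read from last to first, give the digits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, d)          # next quotient and next remainder
        digits.append(DIGITS[r])
    return "".join(reversed(digits))

print(to_base(2001, 7))    # 5556
print(to_base(2001, 12))   # 11A9
print(to_base(2001, 2))    # 11111010001
```

Converting back with int(to_base(n, d), d) recovers n, which is exactly the check recommended above.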
Terminology Number bases have some special terminology associated with
them which you might encounter:
Base 2 binary.
Base 8 octal.
Base 10 decimal.
Base 12 duodecimal.
Base 16 hexadecimal.
Base 20 vigesimal.
Base 60 sexagesimal.
Binary, octal and hexadecimal occur in computer science; there are remnants
of a vigesimal system in French and the older Welsh system of counting; base
60 was used by astronomers in ancient Mesopotamia and is still the basis of
time measurement (60 seconds = 1 minute, and 60 minutes = 1 hour) and
angle measurement.
What good are number bases? There are several answers to this
question. First, they help you to understand the true meaning of our positional
number system. Second, computers, famously, work in base 2, and so number
bases give you some understanding of how they work. Third, as I indicated above, angle
and time measurement, for historical reasons, are carried out in base 60.
Fourth, there are mathematical patterns in working with different number
bases which turn out to have important applications. Fifth, they are interesting
mathematically.
Exercises 1.2
1. What are the possible remainders when a natural number is divided by
(i) 2.
(ii) 3.
(iii) 4.
(iv) n where n ≥ 2 is any natural number.
[This question really is as trivial as it looks].
2. Find the quotients and remainders for each of the following pairs of
numbers; divide the smaller into the larger.
(i) 30 and 6.
(ii) 100 and 24.
(iii) 364 and 12.
3. Write the number 2009 in
(i) Base 5.
(ii) Base 12.
(iii) Base 16.
4. Write the following numbers in base 10.
(i) DAB₁₆.
(ii) ABBA₁₂.
(iii) 44332211₅.
1.3 The fundamental theorem of arithmetic
The goal of this section is to state and prove the most basic result about the
natural numbers: each natural number, excluding 0 and 1, can be written
as a product of powers of primes in essentially one way. The primes are the
‘atoms’ from which all natural numbers can be built.
1.3.1 Greatest common divisors
Let a, b ∈ N. A number d which divides both a and b is called a common
divisor of a and b. The largest number which divides both a and b is called
the greatest common divisor of a and b and is denoted by gcd(a, b). A pair
of natural numbers a and b is said to be coprime if gcd(a, b) = 1.
Note that for us gcd(0, 0) is undefined, but if a ≠ 0 then gcd(a, 0) = a.
Example 1.3.1. Consider the numbers 12 and 16. The set of divisors of 12
is the set {1, 2, 3, 4, 6, 12}. The set of divisors of 16 is the set {1, 2, 4, 8, 16}.
The set of common divisors is the set of numbers that belong to both of
these two sets: namely, {1, 2, 4}. The greatest common divisor of 12 and 16
is therefore 4. Thus gcd(12, 16) = 4.
One application of greatest common divisors is in simplifying fractions.
For example, the fraction 12/16 is equal to the fraction 3/4 because we can divide
out the common divisor of numerator and denominator. The fraction which
results cannot be simplified further and is in its lowest terms. This is justified
by the following result.
Lemma 1.3.2. Let d = gcd(a, b). Then gcd(a/d, b/d) = 1.
Proof. Because d divides both a and b we may write a = a′d and b = b′d for
some natural numbers a′ and b′. We therefore need to prove that gcd(a′, b′) =
1. Suppose that e | a′ and e | b′. Then a′ = ex and b′ = ey for some natural
numbers x and y. Thus a = exd and b = eyd. Observe that ed | a and ed | b
and so ed is a common divisor of both a and b. But d is the greatest common
divisor and so e = 1, as required.
Let me paraphrase what the result above says. If I divide two numbers by
their greatest common divisor then the numbers that remain are coprime.
This seems intuitively plausible and the proof ensures that our intuition is
correct.
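The reduction of 12/16 to lowest terms can be replayed in Python; the Fraction class in the standard library performs the same division by the gcd automatically (the code is an illustration only):

```python
from fractions import Fraction
from math import gcd

# Simplify 12/16 by dividing out d = gcd(12, 16) = 4.
d = gcd(12, 16)
print(12 // d, 16 // d)             # 3 4
assert gcd(12 // d, 16 // d) == 1   # Lemma 1.3.2: the result is coprime

# Fraction performs the same reduction automatically.
print(Fraction(12, 16))             # 3/4
```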
If the numbers a and b are large, then calculating their gcd in the way
I did above would be time-consuming and error-prone. We want to find
an efficient way of calculating the greatest common divisor. The following
lemma is the basis of just such an efficient method.
Lemma 1.3.3. Let a, b ∈ N, where b ≠ 0, and let a = bq + r where 0 ≤ r < b.
Then
gcd(a, b) = gcd(b, r).
Proof. Let d be a common divisor of a and b. Since a = bq + r we have that
a − bq = r so that d is also a divisor of r. It follows that any divisor of a and
b is also a divisor of b and r.
Now let d be a common divisor of b and r. Since a = bq + r we have that
d divides a. Thus any divisor of b and r is a divisor of a and b.
It follows that the set of common divisors of a and b is the same as the
set of common divisors of b and r. Thus gcd(a, b) = gcd(b, r).
The point of the above result is that b < a and r < b. So calculating gcd(b, r) will be easier than calculating gcd(a, b) because the numbers
involved are smaller: in the equation a = bq + r, we pass from the pair (a, b)
to the smaller pair (b, r).
The above result is the basis of an efficient algorithm for computing greatest
common divisors. It was described by Euclid around 300 BC in his collection
of books ‘The Elements’ in Propositions 1 and 2 of Book VII.
Algorithm 1.3.4 (Euclid’s algorithm).
Input: a, b ∈ N such that a ≥ b and b ≠ 0.
Output: gcd(a, b).
Procedure: write a = bq + r where 0 ≤ r < b. Then gcd(a, b) = gcd(b, r). If
r ≠ 0 then repeat this procedure with b and r, and so on. The last non-zero
remainder is gcd(a, b).
Example 1.3.5. Let’s calculate gcd(19, 7) using Euclid’s algorithm. The ∗
marks the line containing the last non-zero remainder.

19 = 7 · 2 + 5
 7 = 5 · 1 + 2
 5 = 2 · 2 + 1   ∗
 2 = 1 · 2 + 0
By Lemma 1.3.3 we have that
gcd(19, 7) = gcd(7, 5) = gcd(5, 2) = gcd(2, 1) = gcd(1, 0).
The last non-zero remainder is 1 and so gcd(19, 7) = 1 and, in this case, the
numbers are coprime.
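Euclid's algorithm translates directly into a short loop. A sketch in Python (for illustration only):

```python
def gcd(a, b):
    """Euclid's algorithm (Algorithm 1.3.4): repeatedly replace the
    pair (a, b) by (b, r), where r is the remainder on dividing a
    by b, until the remainder is zero; the last non-zero remainder
    is the greatest common divisor."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(19, 7))   # 1, so 19 and 7 are coprime
print(gcd(12, 16))  # 4, as in Example 1.3.1
```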
There are occasions when we need to extract more information from Euclid’s algorithm as we shall discover later when we come to deal with prime
numbers. Specifically, we can use Euclid’s algorithm to find integers x and
y such that
gcd(a, b) = xa + yb.
This is called Bézout’s theorem. It is proved by running Euclid’s
algorithm in reverse; in this form it is called the extended Euclidean algorithm. The
procedure is outlined below and the details are explained in the
example that follows it.
Algorithm 1.3.6 (Bézout’s Theorem/Extended Euclidean algorithm).
Input: a, b ∈ N where a ≥ b and b ≠ 0.
Output: numbers x, y ∈ Z such that gcd(a, b) = xa + yb.
Procedure: apply Euclid’s algorithm to a and b; working from bottom to top,
rewrite each remainder in turn.
Example 1.3.7. This is a little involved so I have split the process up into
steps. I shall apply the extended Euclidean algorithm to the example I
calculated above. I have highlighted the non-zero remainders wherever they
occur, and I have discarded the last equality where the remainder was zero.
I have also marked the last non-zero remainder.
19 = 7 · 2 + 5
7 = 5·1+2
5 = 2·2+1 ∗
The first step is to rearrange each equation so that the non-zero remainder
is alone on the lefthand side.
5 = 19 − 7 · 2
2 = 7−5·1
1 = 5−2·2
Next we reverse the order of the list
1 = 5−2·2
2 = 7−5·1
5 = 19 − 7 · 2
We now start with the first equation. The lefthand side is the gcd we are
interested in. We treat all other remainders as algebraic quantities and systematically substitute them in order. Thus we begin with the first equation
1 = 5 − 2 · 2.
The next equation in our list is
2=7−5·1
so we replace 2 in our first equation by the expression on the right to get
1 = 5 − (7 − 5 · 1) · 2.
We now rearrange this equation by collecting up like terms treating the highlighted remainders as algebraic objects to get
1 = 3 · 5 − 2 · 7.
We can of course make a check at this point to ensure that our arithmetic is
correct. The next equation in our list is
5 = 19 − 7 · 2
so we replace 5 in our new equation by the expression on the right to get
1 = 3 · (19 − 7 · 2) − 2 · 7.
Again we rearrange to get
1 = 3 · 19 − 8 · 7 .
The algorithm now terminates and we can write
gcd(19, 7) = 3 · 19 + (−8) · 7 ,
as required. We can also, of course, easily check the answer!
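The back-substitution above can also be carried out in a single forward pass, which is how the extended Euclidean algorithm is usually implemented in practice. Here is a Python sketch (the function name is mine); it maintains the invariant that each remainder is an integer combination of a and b:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g == x*a + y*b."""
    # Invariant: r0 == x0*a + y0*b and r1 == x1*a + y1*b throughout.
    r0, r1 = a, b
    x0, x1 = 1, 0
    y0, y1 = 0, 1
    while r1 != 0:
        q = r0 // r1                 # quotient from Euclid's algorithm
        r0, r1 = r1, r0 - q * r1     # the usual remainder step
        x0, x1 = x1, x0 - q * x1     # update the Bezout coefficients
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0

# The worked example: gcd(19, 7) = 1 = 3 * 19 + (-8) * 7.
print(extended_gcd(19, 7))  # → (1, 3, -8)
```

Running it on Exercise 1.3.8 reproduces 6 = 28 · 2406 − 103 · 654.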
Exercise 1.3.8. Use the extended Euclidean algorithm to find integers x, y
such that gcd(a, b) = xa + yb when a = 2406 and b = 654. Check your
answer. [Solution: 6 = 28 · 2406 − 103 · 654].
I shall describe a much more efficient algorithm for implementing the
extended Euclidean algorithm when I have discussed matrices in Chapter 3.
The greatest common divisor of two numbers a and b is the largest number
that divides into both a and b. On the other hand, if a | c and b | c then we
say that c is a common multiple of a and b. The smallest common multiple
of a and b is called the least common multiple of a and b and is denoted by
lcm(a, b). You might expect that to calculate the least common multiple we
would need a new algorithm, but in fact we can use Euclid’s algorithm, as
the following result shows. I shall prove it later, once I have
proved the fundamental theorem of arithmetic.
Proposition 1.3.9. Let a and b be natural numbers. Then
gcd(a, b) × lcm(a, b) = ab.
I shall now show how gcd’s and lcm’s play a natural role in the arithmetic
of fractions. The key property of fractions is that a fraction a/b is unchanged
when numerator and denominator are both multiplied by the same non-zero
integer. Thus

a/b = (ac)/(bc).

Given a fraction a/b we often want to simplify it as much as possible, and this
is accomplished by calculating gcd(a, b) = d. We have a = a′d and b = b′d
and so

a/b = (a′d)/(b′d) = a′/b′.

We have proved above that gcd(a′, b′) = 1 and so the fraction cannot be
simplified any further. Thus a′/b′ is a fraction in its lowest terms.
When we come to add fractions, the problem is the reverse of simplification.
We cannot immediately add a/b + c/d because the denominators b and
d are different. To make progress, we have to rewrite each fraction so that
their denominators are the same. The simplest way to do this is to rewrite
each fraction as a fraction over bd: to do this, we multiply the first fraction
by d and the second by b to get

(ad)/(bd) + (bc)/(bd) = (ad + bc)/(bd).

However, the most efficient way is to write each fraction over lcm(b, d). Let
lcm(b, d) = b′b = d′d. Then

a/b + c/d = (b′a)/(b′b) + (d′c)/(d′d) = (b′a + d′c)/lcm(b, d).
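The addition-over-the-lcm recipe can be sketched in a few lines of Python (the function names are mine; Proposition 1.3.9 supplies the lcm):

```python
from math import gcd

def lcm(a, b):
    # Proposition 1.3.9: gcd(a, b) * lcm(a, b) = a * b.
    return a * b // gcd(a, b)

def add_fractions(a, b, c, d):
    """Return a/b + c/d as a pair (numerator, denominator) in lowest terms."""
    m = lcm(b, d)
    num = (m // b) * a + (m // d) * c   # b'a + d'c over the common denominator
    g = gcd(num, m)
    return num // g, m // g

# 1/6 + 1/4 computed over lcm(6, 4) = 12.
print(add_fractions(1, 6, 1, 4))  # → (5, 12)
```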
1.3.2 Primes: the atoms of number
A proper divisor of a natural number n is a divisor that is neither 1 nor n.
A natural number n is said to be prime if n ≥ 2 and the only divisors of n
are 1 and n itself. A number bigger than or equal to 2 which is not prime is
said to be composite.
Warning! The number 1 is not a prime.
The properties of primes have exercised a great fascination ever since they
were first studied and continue to pose questions that mathematicians have
yet to solve. We shall just describe their basic properties in this section.
Lemma 1.3.10. Let n ≥ 2. Either n is prime or the smallest proper divisor
of n is prime.
Proof. Suppose n is not prime. Let d be the smallest proper divisor of n. If d
were not prime then d would have a smallest proper divisor and this divisor
would in turn divide n, but this would contradict the fact that d was the
smallest proper divisor of n. Thus d must itself be prime.
The following was also proved by Euclid: it is Proposition 20 of Book IX
of ‘The Elements’.
Theorem 1.3.11. There are infinitely many primes.
Proof. Let p1 , . . . , pn be the first n primes. Put
N = (p1 . . . pn ) + 1.
If N is a prime, then N is a prime bigger than pn . If N is composite, then N
has a prime divisor p by Lemma 1.3.10. But p cannot equal any of the primes
p1 , . . . , pn because N leaves remainder 1 when divided by pi . It follows that
p is a prime bigger than pn . Thus we can always find a bigger prime. It
follows that there must be an infinite number of primes.
Algorithm 1.3.12. To decide if a number n is prime or composite: check
to see if any prime p ≤ √n divides n. If none of them do, the number n is
prime.

Let’s think about why this works. If a divides n then we can write n = ab
for some number b. If a < √n then b > √n, whilst if a > √n then b < √n.
Thus to decide if n is prime or not we need only carry out trial divisions by
all numbers a ≤ √n. However, this is inefficient because if a divides n and
a is not prime then a is divisible by some prime p, which must therefore also
divide n. It follows that we need only carry out trial divisions by the primes
p ≤ √n.
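A direct transcription of Algorithm 1.3.12 into Python (a sketch; for simplicity it trial-divides by 2 and then by the odd numbers up to √n rather than by primes only, which is correct though slightly less efficient):

```python
def is_prime(n):
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:       # equivalent to d <= sqrt(n), using integers only
        if n % d == 0:
            return False
        d += 2
    return True

print([p for p in range(2, 30) if is_prime(p)])  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```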
Example 1.3.13. Determine whether 97 is prime using the above algorithm.
We first calculate the largest whole number less than or equal to √97. This
is 9. We now carry out trial divisions of 97 by each prime number p where
2 ≤ p ≤ 9; by the way, if you aren’t certain which of these numbers is prime,
just try them all. You’ll get the right answer, although not as efficiently. You
might also want to remember that if m doesn’t divide a number then neither can
any multiple of m. In any event, in this case we carry out trial divisions by
2, 3, 5 and 7. None of them divides 97 exactly and so 97 is prime.
The following is the key property of primes we shall need to prove the
fundamental theorem of arithmetic. We use Bézout’s Theorem to prove it.
It is Proposition 30 of Book VII of ‘The Elements’.
Lemma 1.3.14 (Euclid’s lemma). Let p | ab where p is a prime. Then p | a
or p | b.³
Proof. Suppose that p does not divide a. We shall prove that p must then
divide b. If p does not divide a, then a and p are coprime, and so there exist
integers x and y such that 1 = px + ay. Thus b = bpx + bay. Now p | bp trivially and
p | ba by assumption, and so p | b, as required.
Example 1.3.15. The above result is not true if p is not a prime. For
example, 6 | 9 × 4 but 6 divides neither 9 nor 4.
Lemma 1.3.14 is so important that I want to spell out in words what it says:

If a prime divides a product of numbers it must divide at least
one of them.
Theorem 1.3.16 (Fundamental theorem of arithmetic). Every number n ≥
2 can be written as a product of primes in one way if we ignore the order in
which the primes appear. By product we allow the possibility that there is
only one prime.
Proof. Let n ≥ 2. If n is already a prime then there is nothing to prove, so
we can suppose that n is composite. Let p1 be the smallest prime divisor of
n. Then we can write n = p1 · n′ where n′ < n. Once again, n′ is either prime
or composite. Continuing in this way, we can write n as a product of primes.
³This result can be usefully generalised using much the same proof. Let p | ab where p
and a are coprime. Then p | b.
We now prove uniqueness. Suppose that
n = p1 . . . ps = q1 . . . qt
are two ways of writing n as a product of primes. Now p1 | n and so p1 |
q1 . . . qt . By Euclid’s Lemma, the prime p1 must divide one of the qi ’s and,
since they are themselves prime, it must actually equal one of the qi ’s. By
relabelling if necessary, we can assume that p1 = q1 . Cancel p1 from both
sides and repeat with p2 . Continuing in this way, we see that every prime
occurring on the lefthand side occurs on the righthand side. Changing sides,
we see that every prime occurring on the righthand side occurs on the lefthand
side. We deduce that the two prime decompositions are identical.
When we write a number as a product of primes, we usually gather together the same primes into a prime power, and write the primes in increasing
order; this gives a unique representation. This is illustrated in the example below.
Example 1.3.17. Let n = 999,999. Write n as a product of primes. There
are a number of ways of doing this but in this case there is an obvious place
to start. We have that

n = 3² · 111,111 = 3³ · 37,037 = 3³ · 7 · 5,291 = 3³ · 7 · 11 · 481 = 3³ · 7 · 11 · 13 · 37.

Thus the prime factorisation of 999,999 is

999,999 = 3³ · 7 · 11 · 13 · 37.
Primes can be regarded as the atoms from which all other numbers can
be constructed.
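Finding such a factorisation can be mechanised by repeated trial division, in the spirit of Lemma 1.3.10: keep stripping out the smallest divisor, which is automatically prime. A Python sketch (the function name is mine):

```python
def factorize(n):
    """Return the prime factorisation of n >= 2 as a list of (prime, exponent) pairs."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:      # strip out every copy of the prime d
                n //= d
                e += 1
            factors.append((d, e))
        d += 1
    if n > 1:                      # whatever is left over is itself prime
        factors.append((n, 1))
    return factors

# Example 1.3.17: 999,999 = 3^3 * 7 * 11 * 13 * 37.
print(factorize(999_999))  # → [(3, 3), (7, 1), (11, 1), (13, 1), (37, 1)]
```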
Using the fundamental theorem of arithmetic we can always compute √n,
where n is a natural number, exactly in terms of the square roots of prime
numbers. For example, let’s calculate √540 exactly. First, we find a prime
factorization of 540. We have

540 = 10 · 54 = 2 · 5 · 2 · 27 = 2 · 5 · 2 · 3 · 9 = 2² · 3³ · 5.

Thus

√540 = √(2² · 3² · 3 · 5) = 2 · 3 · √3 · √5 = 6√3√5.

This is an exact answer. If someone needs to compute it explicitly, then they
can do so to a degree of accuracy they choose and not one that you have
arbitrarily decided upon.
We can use the prime factorizations of numbers to give a nice proof of
Proposition 1.3.9. Let m and n be two integers. To keep things simple, we
suppose that their prime factorizations are

m = p1^α · p2^β · p3^γ and n = p1^δ · p2^ε · p3^ζ

where p1, p2, p3 are primes. It will be obvious how to extend this argument
to the general case. The prime factorizations of gcd(m, n) and lcm(m, n) are

gcd(m, n) = p1^min(α,δ) · p2^min(β,ε) · p3^min(γ,ζ)

and

lcm(m, n) = p1^max(α,δ) · p2^max(β,ε) · p3^max(γ,ζ)

respectively. I shall let you work out why, and also work out how we can use
these results to prove the above proposition.
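The min/max description can be checked mechanically. In the sketch below (Python; the helper names are mine) gcd and lcm are built directly from exponent dictionaries, and since min(α, δ) + max(α, δ) = α + δ for every prime, the product gcd · lcm recovers m · n:

```python
def exponents(n):
    """Map each prime divisor of n to its exponent."""
    exps, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            exps[d] = exps.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def gcd_lcm_from_exponents(m, n):
    em, en = exponents(m), exponents(n)
    g = l = 1
    for p in set(em) | set(en):
        g *= p ** min(em.get(p, 0), en.get(p, 0))   # min of exponents -> gcd
        l *= p ** max(em.get(p, 0), en.get(p, 0))   # max of exponents -> lcm
    return g, l

g, l = gcd_lcm_from_exponents(540, 999_999)
print(g, l, g * l == 540 * 999_999)  # min + max = sum, so gcd * lcm = m * n
```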
Exercises 1.3
1. Use Euclid’s algorithm to find the gcd’s of the following pairs of numbers.
(i) 35, 65.
(ii) 135, 144.
(iii) 17017, 18900.
2. Use the extended Euclidean algorithm to find integers x and y such
that gcd(a, b) = ax + by for each of the following pairs of numbers. You
should ensure that your answers for x and y have the correct signs.
(i) 112, 267.
(ii) 242, 1870.
3. Find the lowest common multiples of the following pairs of numbers.
(i) 22, 121.
(ii) 48, 72.
(iii) 25, 116.
4. List the primes less than 100.
5. For each of the following numbers use Algorithm 1.3.12 to determine
whether they are prime or composite. When they are composite find a
prime factorisation. Show all working.
(i) 131.
(ii) 689.
(iii) 5491.
6. Given that 3630000 = 2⁴ · 3 · 5⁴ · 11² and 915062500 = 2² · 5⁶ · 11⁴,
calculate the greatest common divisor and least common multiple of
these two numbers.
7. Calculate the square roots of the following numbers exactly.
(i) 10.
(ii) 42.
(iii) 54.
1.4 Real numbers
I described real numbers in terms of decimal expansions, but this is not the
basic definition of the real numbers. The reals differ from the rationals by
satisfying what is called the completeness axiom. A detailed discussion of
real numbers really belongs to a course in calculus/analysis rather than algebra: the whole of calculus is constructed on this very special property of
the real numbers but let me at least say something about it before moving
on to the business of this section. Draw the number line and now imagine
that you could see only the rational numbers on that line. We shall call this
the rational number line. Superficially, it wouldn’t look any different from
the whole number line. But in fact it is full of holes: indeed, there are more
holes than there are rational numbers. This is a little hard to believe at first
because between any two distinct rational numbers r1 and r2 you can always
find a third: namely, (r1 + r2)/2. However, we have already seen that for any prime
p the number √p is not rational, and so such numbers appear as holes in our rational
number line; indeed, it can be proved that there are lots and lots of holes. If we
now add to our rational number line all the remaining real numbers then it
can be proved that all the holes disappear. The completeness axiom is actually
a way of stating that there are no holes, but stated in mathematical
language: every non-empty subset of the reals that is bounded above has a
least upper bound. The completeness axiom enables us to talk about limits
and so about differentiable and integrable functions.
Remark. In mathematical work, expressions that contain roots such as √2,
or numbers such as π or e, and so on, are never explicitly calculated until
needed; this is for two reasons: first, simplifications may arise, and second,
any explicit calculation will always be an approximation and not the exact
answer.
1.4.1 Irrational numbers
Real numbers are the actual values of quantities such as mass, length and time.
We cannot measure them exactly: the result of a measurement will always
be a rational number. We begin by proving that there are real numbers that
are not rational.

Recall the basic property of prime numbers: if a prime divides the product
of two numbers then it must divide at least one of the numbers. We use this
property below.
A real number which is not rational is said to be irrational.
Theorem 1.4.1. The square root of every prime number is irrational.
Proof. We shall prove this by a method called proof by contradiction. Assume
that we can write √p as a rational. I shall show that this assumption leads
to a contradiction and so must be false.

We are assuming that √p = a/b. By cancelling the greatest common divisor
of a and b we can in fact assume that gcd(a, b) = 1. This will be crucial to
our argument.

Squaring both sides of the equation √p = a/b and multiplying the resulting
equation by b² we get that

pb² = a².
This says that a² is divisible by p. But if a prime divides a product of two
numbers it must divide at least one of those numbers by Euclid’s lemma.
Thus p divides a. Thus we can write a = pc for some natural number c.
Substituting this into our equation above we get that

pb² = p²c².

Dividing both sides of this equation by p gives

b² = pc².

This tells us that b² is divisible by p and so, in the same way as above, p
divides b.

We have therefore shown that our assumption that √p is rational leads
to both a and b being divisible by p. But this contradicts the fact that
gcd(a, b) = 1. Our assumption is therefore wrong, and so √p is not a rational
number.
Irrational numbers abound: both e and π can be proved to be irrational,
for example. The discovery of irrational numbers is due to the Ancient Greeks
and was one of the first great mathematical discoveries.
Although we cannot calculate irrational numbers exactly, we can calculate
them to any degree of accuracy needed, and it is by means of such approximations that irrational numbers
are handled practically. For example, suppose
we want to calculate √n where n is not a perfect square. Make a first guess
a to √n. Put b = n/a. Then their average a′ = (a + b)/2 is in general a better
guess. This process can be repeated, as the following example illustrates,
and enables us to calculate square roots to any desired degree of accuracy.
Example 1.4.2. I shall calculate some approximations to √3 using the above
method. We observe that 1² < 3 < 2², so my first guess is 1. We have that
3/1 = 3 and the average of 1 and 3 is 2. I now start the process all over again
with 2 as my guess. We have that 3/2 = 1·5 and the average of 2 and 1·5
is 1·75. This is my new guess. The number 3 divided by 1·75 is 1·714
(approximately). The average of 1·75 and 1·714 is 1·732. My new guess
is 1·732. 3 divided by 1·732 is 1·732 to 3 decimal places. Observe that
(1·732)² = 2·999 . . ., which isn’t bad.
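The averaging scheme of Example 1.4.2 (Heron’s method, a special case of Newton’s method) is easy to put into code. A Python sketch, assuming we stop once the square of the guess is within a given tolerance of n:

```python
def heron_sqrt(n, guess=1.0, tol=1e-12):
    """Approximate sqrt(n) by repeatedly averaging a guess a with b = n/a."""
    a = guess
    while abs(a * a - n) > tol:
        a = (a + n / a) / 2   # the average of a and b = n/a
    return a

print(heron_sqrt(3))  # ≈ 1.7320508075688772
```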
1.4.2 Decimal fractions
I shall describe in this section the decimal fractions which correspond to
rational numbers. To see what’s involved, let’s calculate some decimal fractions.
Examples 1.4.3.

(i) 1/20 = 0·05. This fraction has a finite decimal representation.

(ii) 1/7 = 0·142857142857142857142857142857 . . .. This fraction has an infinite decimal representation, which consists of the same sequence of
numbers repeated. We abbreviate this decimal to 0·1̄4̄2̄8̄5̄7̄.

(iii) 37/84 = 0·440̄4̄7̄6̄1̄9̄. This fraction has an infinite decimal representation,
which consists of a non-repeating part followed by a part which repeats.

Case (ii) is said to be a purely periodic decimal whereas case (iii), which
is more general, is said to be ultimately periodic.
Proposition 1.4.4. A proper rational number a/b in its lowest terms has a
finite decimal expansion if and only if b = 2^m 5^n for some natural numbers m
and n.

Proof. Let a/b have the finite decimal representation 0·a1 . . . an. This means

a/b = a1/10 + a2/10² + . . . + an/10^n.

The righthand side is just the fraction

(a1 · 10^(n−1) + a2 · 10^(n−2) + . . . + an) / 10^n.

The denominator contains only the prime factors 2 and 5 and so the reduced
form will also contain at most the prime factors 2 and 5.

To prove the converse, consider the proper fraction

a / (2^α 5^β).

If α = β then the denominator is 10^α. If α ≠ β then multiply the fraction by
a suitable power of 2 or 5, as appropriate, so that the resulting fraction has
denominator a power of 10. But any fraction with denominator a power of
10 has a finite decimal expansion.
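Proposition 1.4.4 gives a purely arithmetic test for finiteness. A Python sketch (the function name is mine) that puts a/b in lowest terms and then strips the factors of 2 and 5 from the denominator:

```python
from math import gcd

def has_finite_decimal(a, b):
    """True iff a/b has a finite decimal expansion (Proposition 1.4.4)."""
    b //= gcd(a, b)        # put the fraction in lowest terms
    for p in (2, 5):
        while b % p == 0:  # remove every factor of 2 and of 5
            b //= p
    return b == 1          # finite iff nothing else remains

print(has_finite_decimal(1, 20), has_finite_decimal(1, 7), has_finite_decimal(37, 84))
# → True False False
```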
Proposition 1.4.5. An infinite decimal fraction represents a rational number if and only if it is ultimately periodic.
Proof. Consider the ultimately periodic decimal number

r = 0·a1 . . . as b̄1 . . . b̄t.

We shall prove that r is rational. Observe that

10^s r = a1 . . . as · b̄1 . . . b̄t

and

10^(s+t) r = a1 . . . as b1 . . . bt · b̄1 . . . b̄t.

From which we get that

10^(s+t) r − 10^s r = a1 . . . as b1 . . . bt − a1 . . . as,

where the righthand side is the decimal form of some integer that we shall
call a. It follows that

r = a / (10^(s+t) − 10^s)

is a rational number.

The proof of the converse is based on the method we use to compute
the decimal expansion of m/n. We carry out repeated divisions by n and at
each step of the computation we use the remainder obtained to calculate
the next digit. But there are only a finite number of possible remainders
and our expansion is assumed infinite. Thus at some point there must be
repetition.
Example 1.4.6. We shall write the ultimately periodic decimal 0·94̄ as a
proper fraction in its lowest terms. Put r = 0·94̄. Then

• r = 0·94̄.
• 10r = 9·444 . . .
• 100r = 94·444 . . .

Thus 100r − 10r = 94 − 9 = 85 and so r = 85/90. We can simplify this to r = 17/18.
We can now easily check that this is correct.
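The 10^s and 10^(s+t) manipulation from the proof turns into a short routine. A Python sketch (the argument conventions are mine: the decimal 0·a1 . . . as b̄1 . . . b̄t is passed as the digit string before the bar and the digit string under it):

```python
from math import gcd

def periodic_to_fraction(prefix, block):
    """Convert 0.<prefix><block repeated forever> to a fraction in lowest terms."""
    s, t = len(prefix), len(block)
    # 10^(s+t) r - 10^s r is the integer  prefix+block  minus  prefix.
    a = int(prefix + block) - (int(prefix) if prefix else 0)
    b = 10 ** (s + t) - 10 ** s
    g = gcd(a, b)
    return a // g, b // g

print(periodic_to_fraction("9", "4"))       # 0.9444... → (17, 18)
print(periodic_to_fraction("", "142857"))   # 0.142857142857... → (1, 7)
```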
The commonest mistake in working with ultimately periodic decimals is
misreading which group of digits the overline groups together. The second
commonest is ignoring the overline sign completely.
Exercises 1.4

1. For each of the following fractions determine whether they have finite
or infinite decimal representations. If they have infinite decimal representations, determine whether they are purely periodic or ultimately
periodic; in both cases determine the periodic block.

(i) 1/2.
(ii) 1/3.
(iii) 1/4.
(iv) 1/5.
(v) 1/6.
(vi) 1/7.
2. Write the following decimals as fractions in their lowest terms.
(i) 0 · 534.
(ii) 0 · 2106.
(iii) 0 · 076923.
1.5 *The prime number theorem*
This section will not be examined in 2013.
There are no nice formulae to tell us what the nth prime is but there are
still some interesting results in this direction. The polynomial
p(n) = n² − n + 41
has the property that its value for n = 1, 2, 3, 4, . . . , 40 is always prime. Of
course, for n = 41 it is clearly not prime. In 1971, the mathematician Yuri
Matijasevic found a polynomial in 26 variables of degree 25 with the property
that when non-negative integers are substituted for the variables the positive
values it takes are all and only the primes. However, this polynomial does
not generate the primes in any particular order.
If we adopt a statistical approach then we can obtain much more useful
results. The idea is that for each natural number n we count the number
of primes π(n) less than or equal to n. If we are going to do this then our
first problem is to compile a table of sufficiently many of them. The simplest
way of doing this is to use the Sieve of Eratosthenes. Suppose we want to
construct a table of all primes up to the number N. We begin by listing all
numbers from 2 to N inclusive. Mark 2 as prime and then cross out from the
table all numbers which are multiples of 2. The first number after 2 which
we have not crossed out is 3. We mark this as prime and then cross out all
multiples of 3. The first number after 3 not crossed out is 5. We mark this as
prime and continue in the same way. We stop when we have crossed out all
multiples of the largest prime less than or equal to √N. All marked numbers
will be prime, as well as those numbers which remain not crossed out.
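The sieve translates directly into code. A Python sketch (the function name is mine), returning all primes up to N:

```python
def sieve(N):
    """Sieve of Eratosthenes: return the list of primes <= N."""
    is_prime = [True] * (N + 1)
    is_prime[0:2] = [False, False]           # 0 and 1 are not prime
    p = 2
    while p * p <= N:                        # stop once p exceeds sqrt(N)
        if is_prime[p]:
            for m in range(p * p, N + 1, p): # cross out the multiples of p
                is_prime[m] = False
        p += 1
    return [k for k in range(2, N + 1) if is_prime[k]]

print(sieve(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Counting the output of `sieve(1000)` reproduces the first table entry below, π(1000) = 168.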
If you compile tables of primes in this way, you can calculate the function
π(x). Its graph has a staircase shape — it certainly isn’t smooth — but as
you zoom away it begins to look smoother and smoother. This raises the
question whether there is a smooth function that is a good approximation to
π(n). This seems to have been what Gauss did. He set up a table something
like the following (this is taken from LeVeque’s book Fundamentals of number
theory, Dover, 1977) where

∆(x) = (π(x) − π(x − 1000)) / 1000

represents an approximate slope of the curve π(x).

x        π(x)    ∆(x)     1/ln(x)
1000     168     0·168    0·145
2000     303     0·135    0·132
3000     430     0·127    0·125
4000     550     0·120    0·121
5000     669     0·119    0·117
6000     783     0·114    0·115
7000     900     0·117    0·113
8000     1007    0·107    0·111
9000     1117    0·110    0·110
10000    1229    0·112    0·109
Gauss noticed, because that was the kind of person he was, that the slope of
π(x) looked very much like 1/ln(x). This suggests that the function defined by
integrating these slopes,

li(x) = ∫₂ˣ 1/ln(t) dt,

should be an approximation to π(x). It is called the logarithmic integral. Of
course, this is not a theorem: it is a conjecture. It was proved in 1896 by two
mathematicians: Hadamard in France and de la Vallée Poussin in Belgium.

Theorem 1.5.1 (The Prime Number Theorem (PNT): version 1).

lim_{x→∞} π(x)/li(x) = 1.
This version of the PNT is not that easy for us to use. However, by
l’Hôpital’s rule, we can show that

lim_{x→∞} li(x) / (x/ln(x)) = 1.

If we assume the first version of the PNT and use the above result, we obtain
the second version of the PNT.

Theorem 1.5.2 (The Prime Number Theorem: version 2).

lim_{x→∞} π(x) / (x/ln(x)) = 1.
The above theorem can be interpreted as saying that for large values of
x the value of π(x) is approximately given by x/ln(x). This result is a huge
improvement on the theorem that there are infinitely many primes: it tells
us not only that there are infinitely many of them but also how they are
distributed.
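The second version of the PNT can be spot-checked against the table above. A Python sketch that counts primes with a small sieve and compares π(x) with x/ln(x):

```python
from math import log

def primes_up_to(N):
    """Sieve of Eratosthenes, as described earlier in this section."""
    flags = [True] * (N + 1)
    flags[0:2] = [False, False]
    p = 2
    while p * p <= N:
        if flags[p]:
            for m in range(p * p, N + 1, p):
                flags[m] = False
        p += 1
    return [k for k in range(2, N + 1) if flags[k]]

primes = primes_up_to(10_000)
for x in (1_000, 10_000):
    pi_x = sum(1 for p in primes if p <= x)
    # Matches the table: pi(1000) = 168 vs 144.8, pi(10000) = 1229 vs 1085.7.
    print(x, pi_x, round(x / log(x), 1))
```

The ratio π(x)·ln(x)/x drifts slowly towards 1, which is exactly what the theorem asserts in the limit.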
Prime numbers also play an important role in computing: specifically, in
exchanging secret information. In 1976, Whitfield Diffie and Martin Hellman
wrote a paper on cryptography that can genuinely be called ground-breaking.
In ‘New directions in cryptography’, IEEE Transactions on Information Theory 22 (1976), 644–654, they put forward the idea of a public-key cryptosystem which would enable

. . . a private conversation . . . [to] be held between any two individuals regardless of whether they have ever communicated before.
With considerable farsightedness, Diffie and Hellman foresaw that such cryptosystems would be essential if communication between computers was to
reach its full potential. However, their paper did not describe a concrete way
of doing this. It was R. L. Rivest, A. Shamir and L. Adleman (RSA) who
found just such a concrete method, described in their paper ‘A method for
obtaining digital signatures and public-key cryptosystems’, Communications
of the ACM 21 (1978), 120–126. Their method is based on the following
observation. Given two prime numbers it takes very little time to multiply
them together, but if I give you a number that is a product of two primes
and ask you to factorize it then it takes a lot of time. After considerable
experimentation, RSA showed how to use little more than second year undergraduate mathematics to put together a public-key cryptosystem that is
an essential ingredient in e-commerce. Ironically, this secret code had in fact
been invented in 1973 at GCHQ — who had kept it secret.
1.6 *Proofs by induction*
This section will not be examined in 2013.
This is a proof technique with applications throughout mathematics. The
basis of this technique is the following idea:
“I am thinking of a subset X of the infinite set {1, 2, 3, . . .}. I tell
you two things about X: first, 1 ∈ X, and second if n ∈ X then
n + 1 ∈ X. What is X?”
The fact that these two pieces of information are enough to determine
the set of positive integers is called the principle of induction. This principle
can be used to prove results as follows.
Suppose we have an infinite number of statements S1 , S2 , S3 , . . . which we
want to prove. By the principle of induction it is enough to do two things:
1. Show that S1 is true.
2. Show that if Sn is true then Sn+1 is also true.
It will follow that Si is true for all positive i. This proof technique can only
be learnt by attempting lots of examples.
Example 1.6.1. Prove by induction that

∑_{k=1}^{n} k = n(n + 1)/2.
A proof by induction takes the following form:
Base step: show that the case k = 1 holds.

Induction hypothesis (IH): assume that the case k = n holds.

Proof bit: now use (IH) to show that the case k = n + 1 holds.
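A numerical spot-check is no substitute for the induction proof, but it is a useful sanity check on the formula being proved. A Python sketch comparing the closed form of Example 1.6.1 with a direct sum:

```python
def triangular(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2

# Check the formula against a direct sum for the first hundred cases.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == triangular(n)
print("formula verified for n = 1, ..., 100")
```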
Exercise 1.6.2. Prove by induction that

1 + 3 + 5 + . . . + (2n − 1) = n²

for each n ≥ 1.
Exercise 1.6.3. Prove by induction that 5ⁿ − 1 is exactly divisible by 4 for
all natural numbers n ≥ 1.
What I have described above I shall call ‘basic’ induction. There are
numerous variations on basic induction. I shall describe two here:
1. Rather than starting the base step at k = 1 we might start at k = 2 or
k = 3 and so on.
2. In basic induction we assume Sn and prove Sn+1 . Sometimes we need
to assume some or all of S1 , . . . , Sn to be true in order to prove Sn+1
and in addition our base case may consist of several cases. This is often
called ‘strong induction’.
Example 1.6.4. Prove for all natural numbers n ≥ 4 that n³ < 3ⁿ.
Exercises on induction
1. Prove the following by induction.
(i) n³ + 2n is exactly divisible by 3 for all natural numbers n ≥ 1.

(ii) ∑_{i=1}^{n} i² = n(n + 1)(2n + 1)/6 for all natural numbers n ≥ 1.

(iii) n! ≥ 2^(n−1) for all natural numbers n ≥ 1.
2. Prove for all n ≥ 1 that

1/(1 · 2) + 1/(2 · 3) + . . . + 1/(n(n + 1)) = n/(n + 1).
3. Prove for all n ≥ 0 that the following number is exactly divisible by 17:

3 · 5^(2n+1) + 2^(3n+1).
4. A matrix A is said to be symmetric if it is equal to its transpose; that
is, Aᵀ = A. Prove that if A is symmetric then Aⁿ is symmetric for all
n ≥ 1. [You will be able to answer this question after we have studied
matrices in Chapter 3.]
5. Prove that n³ < 3ⁿ for all n ≥ 4.
Solutions
1. (i) Base step: when n = 1, we have that n³ + 2n = 3, which is exactly
divisible by 3. Induction hypothesis: assume the result is true for
n = k. We prove it for n = k + 1. We need to prove that
(k + 1)³ + 2(k + 1) is exactly divisible by 3, assuming only that
k³ + 2k is exactly divisible by 3. We first expand (k + 1)³ + 2(k + 1)
to get

k³ + 3k² + 3k + 1 + 2k + 2.

This is equal to

(k³ + 2k) + 3(k² + k + 1),

which is exactly divisible by 3 using the induction hypothesis.
(ii) Base step: check the formula is true when n = 1. Induction hypothesis: assume the result is true for n = k. We prove it for
n = k + 1. We need to calculate

∑_{i=1}^{k+1} i².

But this is equal to

∑_{i=1}^{k+1} i² = (∑_{i=1}^{k} i²) + (k + 1)².

By the induction hypothesis this is equal to

k(k + 1)(2k + 1)/6 + (k + 1)².

This can be written as

(k(k + 1)(2k + 1) + 6(k + 1)²)/6.

Taking out the factor (k + 1) and then carrying out some algebraic
manipulation gives

(k + 1)(k + 2)(2k + 3)/6,

as required.
(iii) Base step: check the inequality holds when n = 1. Induction hypothesis: assume the inequality holds for n = k. We prove it for n = k + 1.
We argue as follows:

(k + 1)! = (k + 1) · k! ≥ (k + 1) · 2^(k−1)

by the induction hypothesis. Since k ≥ 1 it is clear that k + 1 ≥ 2.
Thus

(k + 1) · 2^(k−1) ≥ 2 · 2^(k−1) = 2^k.

Hence we have proved that

(k + 1)! ≥ 2^k,

as required.
2. Base case: when n = 1 the LHS is 1/2 and the RHS is 1/(1 + 1), and so
LHS = RHS.

IH: we assume that

∑_{i=1}^{n} 1/(i(i + 1)) = n/(n + 1).
Proof part: we have to prove that

∑_{i=1}^{n+1} 1/(i(i + 1)) = (n + 1)/(n + 2).

We start with the LHS of the equality we are trying to prove:

∑_{i=1}^{n+1} 1/(i(i + 1)) = (∑_{i=1}^{n} 1/(i(i + 1))) + 1/((n + 1)(n + 2)).

By the induction hypothesis this is equal to

n/(n + 1) + 1/((n + 1)(n + 2)).

If we add these fractions and factorise the numerator we get

(n + 1)²/((n + 1)(n + 2)).

On cancelling the common factor we get

(n + 1)/(n + 2),

as required.
3. Base case: when n = 0 the number in question is 3 · 5 + 2 = 17 and so clearly
exactly divisible by 17.

IH: assume that 3 · 5^(2n+1) + 2^(3n+1) is exactly divisible by 17.

Proof part: we have to prove that

3 · 5^(2n+3) + 2^(3n+4)

is exactly divisible by 17. Observe that

3 · 5^(2n+3) + 2^(3n+4) = 3 · 5^(2n+1) · 5² + 2^(3n+1) · 2³ = 3 · 5^(2n+1) · 25 + 2^(3n+1) · 8,

which is equal to

75 · 5^(2n+1) + 8 · 2^(3n+1).
Write 75 = 8 · 3 + 51 (why?). Then

75 · 5^(2n+1) + 8 · 2^(3n+1) = (8 · 3 + 51) · 5^(2n+1) + 8 · 2^(3n+1) = 8 · 3 · 5^(2n+1) + 8 · 2^(3n+1) + 51 · 5^(2n+1),

which is equal to

8(3 · 5^(2n+1) + 2^(3n+1)) + 17 · 3 · 5^(2n+1).

By (IH) the first summand is exactly divisible by 17; so is the second,
and so is their sum.
4. Base case: we are given that A is symmetric and A¹ = A by definition.

IH: we assume that if A is symmetric then Aⁿ is symmetric.

Proof part: we have to prove that Aⁿ⁺¹ is symmetric.

We know that Aⁿ⁺¹ = A·Aⁿ. Thus

(Aⁿ⁺¹)ᵀ = (A·Aⁿ)ᵀ = (Aⁿ)ᵀ·Aᵀ,

using a familiar property of the transpose. By (IH), we have that
(Aⁿ)ᵀ = Aⁿ and by assumption Aᵀ = A, and so

(Aⁿ⁺¹)ᵀ = Aⁿ·A = Aⁿ⁺¹,

as required.
5. Base case: here the base case is n = 4. The LHS is 4³ = 64 and the
RHS is 3⁴ = 81. Thus the LHS is strictly less than the RHS.

IH: we assume that n³ < 3ⁿ.

Proof part: we have to prove that (n + 1)³ < 3ⁿ⁺¹.

We start on the LHS of the inequality we are trying to prove:

(n + 1)³ = [n(1 + 1/n)]³ = n³(1 + 1/n)³.

By (IH), we know that n³ < 3ⁿ, and because n ≥ 4 we know that
(1 + 1/n)³ ≤ (5/4)³ < 3. Thus

(n + 1)³ < 3ⁿ · 3 = 3ⁿ⁺¹,

as required.
1.7 Learning outcomes for Chapter 1
At the end of working through this chapter, you should be able to do the
following. You can think of these as potential test and exam questions.
• Understand basic set notation.
• You should know the meanings of all the words highlighted in italics in
the lecture notes.
• You should work through and understand all the proofs in this chapter.
• Convert between different number bases.
• Calculate greatest common divisors using Euclid’s algorithm (and Blankinship’s algorithm discussed in Chapter 3).
• Use the extended Euclidean algorithm.
• Calculate least common multiples.
• Manipulate fractions using gcd’s and lcm’s.
• Find prime factorizations of numbers and apply them.
• Prove that certain numbers are irrational.
• Convert between fractional and decimal representations of numbers.
1.8 Further reading and exercises
For sets, look at Chapter 1 of Hammack. In Chapter 5, I shall end up
covering most of this material, but for the time being, concentrate on the
exercises that deal with the material I have taught so far. Chapter 1 of Olive
contains some basic background material that you might find useful. You can
find more exercises dealing with numbers in Chapter 11 of Schaum’s Outline
Discrete Mathematics. If you want to find out more about prime numbers, I
recommend Marcus du Sautoy, The music of the primes, Harper Perennial,
2004.
Chapter 2

The fundamental theorem of algebra
In this chapter, we introduce the complex numbers. These are essential for
the further development of both algebra and calculus. Not only do they
have practical applications, they also have important theoretical ones: they
enable us to connect different parts of mathematics that would otherwise
look unrelated.
2.1
Complex number arithmetic
In the set of real numbers we can add, subtract, multiply and divide, but
we cannot always extract square roots. For example, the real number 1 has
the two real square roots 1 and −1, whereas the real number −1 has no real
square roots, the reason being that the square of any real non-zero number is
always positive. In this section, we shall repair this lack of square roots and,
as we shall learn, we shall in fact achieve much more than this. Complex numbers were first studied in the 1500s but were only fully accepted
and used in the 1800s.
Warning! If r is a positive real number then √r is usually interpreted to
mean the positive square root. If I want to emphasise that both square roots
need to be considered I shall write ±√r.
2.1.1 Solving quadratic equations
Quadratic equations were solved by the Babylonians and the Egyptians and
are dealt with in all school algebra courses. I have included them here because
I want to show you that you don’t have to remember a formula to solve such
equations; what you have to remember is a method.
An expression of the form

ax^2 + bx + c,

where a, b, c are numbers and a ≠ 0, is called a quadratic polynomial or a
polynomial of degree 2. The numbers a, b, c are called the coefficients of the
quadratic. A quadratic where a = 1 is said to be monic. A number r such
that

ar^2 + br + c = 0
is called a root of the polynomial. The problem of finding all the roots of a
quadratic is called solving the quadratic. Usually this problem is stated in
the form: ‘solve the quadratic equation ax^2 + bx + c = 0’. We speak of an
equation because we have set the polynomial equal to zero.
I shall now show you how to solve a quadratic equation without having
to remember a formula. Observe first that if ax^2 + bx + c = 0 then

x^2 + (b/a)x + c/a = 0.
Thus it is enough to find the roots of monic quadratics.
Next, I want to write x^2 + (b/a)x as a perfect square plus a number: this
will turn out to be the crux of solving the quadratic. Let α be any number.
Then we have the following identity

(x + α)^2 = x^2 + 2αx + α^2.

We would like

2αx = (b/a)x.

It follows that α = b/(2a). We therefore have that

x^2 + (b/a)x = (x + b/(2a))^2 − b^2/(4a^2).
Look carefully at what we have done here: we have rewritten the lefthand
side as a perfect square (the first term on the righthand side) plus a
number (the second term on the righthand side). It follows that

x^2 + (b/a)x + c/a = (x + b/(2a))^2 − b^2/(4a^2) + c/a = (x + b/(2a))^2 + (4ac − b^2)/(4a^2).
Setting the last expression equal to zero and rearranging, we get

(x + b/(2a))^2 = (b^2 − 4ac)/(4a^2).
Now take square roots of both sides, remembering that a non-zero number
has two square roots:

x + b/(2a) = ±√((b^2 − 4ac)/(4a^2))
which of course simplifies to

x + b/(2a) = ±√(b^2 − 4ac)/(2a).
Thus

x = (−b ± √(b^2 − 4ac))/(2a),

the usual formula for finding the roots of a quadratic. This way of solving a
quadratic equation is called completing the square. Of course, you can just
use the formula, but now you have proved that the formula always works.
Example 2.1.1. Solve the quadratic equation

2x^2 − 5x + 1 = 0

by completing the square. Divide through by 2 to make the quadratic monic,
giving

x^2 − (5/2)x + 1/2 = 0.
We now want to write

x^2 − (5/2)x

as a perfect square plus a number. We get

x^2 − (5/2)x = (x − 5/4)^2 − 25/16.
Thus our quadratic becomes

(x − 5/4)^2 − 25/16 + 1/2 = 0.
Rearranging and taking roots gives us

x = 5/4 ± √17/4 = (5 ± √17)/4.
We now check our answer by substituting each of our two roots back into
the original quadratic and ensuring that we get zero in both cases.
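The method can be scripted for real quadratics with non-negative discriminant. A minimal Python sketch mirroring the steps above (the function name is ours, not part of the course):

```python
import math

def solve_quadratic(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 (real coefficients, non-negative
    discriminant), found by completing the square as in the text."""
    if a == 0:
        raise ValueError("not a quadratic")
    p, q = b / a, c / a                # make it monic: x**2 + p x + q = 0
    # x**2 + p x = (x + p/2)**2 - p**2/4, so (x + p/2)**2 = p**2/4 - q.
    rhs = p * p / 4 - q
    if rhs < 0:
        raise ValueError("negative discriminant: the roots are complex")
    root = math.sqrt(rhs)
    return (-p / 2 + root, -p / 2 - root)

# Example 2.1.1: 2x**2 - 5x + 1 = 0 has roots (5 ± sqrt(17))/4.
x1, x2 = solve_quadratic(2, -5, 1)
assert abs(x1 - (5 + math.sqrt(17)) / 4) < 1e-12
assert abs(2 * x1**2 - 5 * x1 + 1) < 1e-9   # substitute back, as in the text
```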
For the quadratic equation
ax2 + bx + c = 0
the number D = b2 − 4ac, called the discriminant of the quadratic, plays an
important role.
• If D > 0 then the quadratic equation has two distinct real solutions.
• If D = 0 then the quadratic equation has one real root repeated.
• If D < 0 then we shall see that the quadratic equation has two complex
roots which are complex conjugate to each other. This is called the
irreducible case.
2.1.2 Introducing complex numbers
In the previous section, we reviewed how to solve quadratic equations. The
method we described apparently yields nothing when the discriminant is
strictly less than zero because we have no way, up to now, of taking square
roots of negative numbers. In this section, we shall change all that and show
that quadratic equations always have two roots. The key step is the following.
We introduce a new number, denoted by i, whose defining property
is that i2 = −1. We shall assume that in all other respects it
satisfies the usual axioms of high-school algebra. This assumption
will be justified later.
We shall now explore the consequences of this definition which turns out to
be a profound one for mathematics.
It follows that i and −i are the two missing square roots of −1. In all other
respects the number i will behave like a real number. Thus if b is any real
number then bi is a number, and if a is any real number then a + bi is a
number.
A complex number is a number of the form a + bi where a, b ∈ R. We
denote the set of complex numbers by C. Complex numbers are sometimes
called imaginary numbers. This is not such a good term: they are not
figments of our imagination like unicorns or dragons. Like all numbers they
are, however, products of our imagination: no one has seen the complex
number i but, then again, no one has seen the number 2. If z =
a + bi then we call a the real part of z, denoted Re(z), and b the complex or
imaginary part of z, denoted Im(z).
Two complex numbers a + bi and c + di are equal precisely when
a = c and b = d. In other words, when their real parts are equal
and when their complex parts are equal.
We can think of every real number as being a special kind of complex
number because if a is real then a = a + 0i. Thus R ⊆ C. Complex numbers
of the form bi are said to be purely imaginary.
Now we show that we can add, subtract, multiply and divide complex
numbers. Addition, subtraction and multiplication are all easy.
Let a + bi, c + di ∈ C. To add these numbers means to calculate (a + bi) +
(c+di). We assume that the order in which we add complex numbers doesn’t
matter and that we may bracket sums of complex numbers how we like and
still get the same answer and so we can rewrite this as a + c + bi + di. Next
we assume that multiplication of complex numbers distributes over addition
of complex numbers to get (a + c) + (b + d)i. Thus
(a + bi) + (c + di) = (a + c) + (b + d)i.
The definition of subtraction is similar and justified in the same way
(a + bi) − (c + di) = (a − c) + (b − d)i.
To multiply our numbers means to calculate (a + bi)(c + di). We first
assume complex multiplication distributes over complex addition to get
(a + bi)(c + di) = ac + adi + bic + bidi. Next we assume that the order in which
we multiply complex numbers doesn't matter to get ac + adi + bic + bidi =
ac + adi + bci + bdi^2. Now we use the fact that i^2 = −1 to get ac + adi + bci +
bdi^2 = ac + adi + bci − bd. We now rearrange the terms to get the following
definition of multiplication
(a + bi)(c + di) = (ac − bd) + (ad + bc)i.
Examples 2.1.2. Carry out the following calculations.
(i) (7 − i) + (−6 + 3i). We add together the real parts to get 1; adding
together −i and 3i we get 2i. Thus the solution is 1 + 2i.
(ii) (2 + i)(1 + 2i). First we multiply out the brackets as usual to get 2 +
4i + i + 2i^2. We now use the fact that i^2 = −1 to get 2 + 4i + i − 2.
Finally we simplify to get 0 + 5i = 5i.
(iii) ((1 − i)/√2)^2. Multiply out and simplify to get −i.
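The addition and multiplication rules above transcribe directly into code. A small Python sketch of our own, representing a + bi as the pair (a, b) (Python's built-in complex type implements exactly these rules):

```python
def c_add(w, z):
    """(a + bi) + (c + di) = (a + c) + (b + d)i, with pairs (a, b), (c, d)."""
    a, b = w
    c, d = z
    return (a + c, b + d)

def c_mul(w, z):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = w
    c, d = z
    return (a * c - b * d, a * d + b * c)

# Examples 2.1.2: (7 - i) + (-6 + 3i) = 1 + 2i and (2 + i)(1 + 2i) = 5i.
assert c_add((7, -1), (-6, 3)) == (1, 2)
assert c_mul((2, 1), (1, 2)) == (0, 5)
assert c_mul((0, 1), (0, 1)) == (-1, 0)   # i * i = -1, the defining property
```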
The final operation is division. This is a little more involved and to
explain how it can be done we need to define a new operation on complex
numbers. Let z = a + bi ∈ C. Define
z̄ = a − bi.
The number z̄ is called the complex conjugate of z. Why is this operation
useful? Let’s calculate z z̄. We have
z z̄ = (a + bi)(a − bi) = a^2 − abi + abi − b^2 i^2 = a^2 + b^2.
Notice that z z̄ = 0 if and only if z = 0. Thus for non-zero complex numbers
z, the number z z̄ is a positive real number.
Let’s see how we can use the complex conjugate to define division of
complex numbers. Our goal is to calculate

(a + bi)/(c + di)

where c + di ≠ 0. The first step is to multiply top and bottom by the complex
conjugate of c + di. We therefore get
(a + bi)/(c + di) = (a + bi)(c − di)/((c + di)(c − di)) = (a + bi)(c − di)/(c^2 + d^2).
This pretty much solves the problem because the top of this expression can
be multiplied out in the usual way and the bottom is a real number and
can be divided into each term in the top. We therefore have the following
definition of division, where c + di ≠ 0:

(a + bi)/(c + di) = (ac + bd)/(c^2 + d^2) + ((bc − ad)/(c^2 + d^2)) i.
Examples 2.1.3. Carry out the following calculations.

(i) (1 + i)/i. The complex conjugate of i is −i. Multiply top and bottom
of the fraction to get (−i + 1)/1 = 1 − i.

(ii) i/(1 − i). The complex conjugate of 1 − i is 1 + i. Multiply top and
bottom of the fraction to get i(1 + i)/2 = (i − 1)/2.

(iii) (4 + 3i)/(7 − i). The complex conjugate of 7 − i is 7 + i. Multiply top
and bottom of the fraction to get (4 + 3i)(7 + i)/50 = (1 + i)/2.
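Division by the conjugate trick is equally mechanical. A Python sketch in the same pair representation as before (our own helper, not part of the course):

```python
def c_div(w, z):
    """(a + bi)/(c + di): multiply top and bottom by the conjugate c - di,
    so the denominator becomes the real number c**2 + d**2."""
    a, b = w
    c, d = z
    denom = c * c + d * d
    if denom == 0:
        raise ZeroDivisionError("division by 0 + 0i")
    return ((a * c + b * d) / denom, (b * c - a * d) / denom)

# Examples 2.1.3: (1 + i)/i = 1 - i and (4 + 3i)/(7 - i) = (1 + i)/2.
assert c_div((1, 1), (0, 1)) == (1.0, -1.0)
assert c_div((4, 3), (7, -1)) == (0.5, 0.5)
```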
We now introduce a way of thinking about complex numbers that enables
us to visualize them. A complex number z = a + bi has two components: a
and b. It is irresistible to plot these as a point in the plane. The plane used
in this way is called the complex plane: the x-axis is the real axis and the
y-axis is interpreted as the complex axis.
[Diagram: the complex number z = a + ib plotted in the complex plane, drawn as the directed line segment from the origin to the point with horizontal coordinate a on the real axis and vertical coordinate b on the imaginary axis.]
Although a complex number can be thought of as labelling a point in the
complex plane, it can also be regarded as labelling the directed line segment
from the origin to the point. By Pythagoras' theorem, the length of this line
is √(a^2 + b^2). We define

|z| = √(a^2 + b^2)
where z = a + bi. This is called the modulus (plural: moduli) of the complex number z.
Observe that

|z| = √(z z̄).
We shall use the following important property of moduli.
Lemma 2.1.4. |wz| = |w| |z|.
Proof. Let w = a + bi and z = c + di. Then wz = (ac − bd) + (ad + bc)i. Now
|wz| = √((ac − bd)^2 + (ad + bc)^2) whereas |w| |z| = √((a^2 + b^2)(c^2 + d^2)). But

(ac − bd)^2 + (ad + bc)^2 = (ac)^2 + (bd)^2 + (ad)^2 + (bc)^2 = (a^2 + b^2)(c^2 + d^2).

Thus the result follows.
Remark We can deduce an interesting result in number theory from the algebra used in proving the above result. The product of two natural numbers
each of which is a sum of squares is itself a sum of squares.
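The identity used in the proof produces the witnessing pair of squares explicitly. A small Python sketch of our own:

```python
def two_squares_product(a, b, c, d):
    """Given m = a**2 + b**2 and n = c**2 + d**2, return (p, q) with
    m * n = p**2 + q**2.  The pair is just the real and imaginary parts
    of (a + bi)(c + di), i.e. the identity behind |wz| = |w||z|."""
    return (a * c - b * d, a * d + b * c)

# 5 = 1 + 4 and 13 = 4 + 9, so 65 = 5 * 13 is again a sum of two squares:
p, q = two_squares_product(1, 2, 2, 3)
assert p * p + q * q == 5 * 13    # (-4)**2 + 7**2 == 65
```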
The complex numbers were obtained from the reals by simply throwing in
one new number, i, a square root of −1. Remarkably, every complex number
has a square root.
Theorem 2.1.5. Every nonzero complex number has exactly two square
roots.
Proof. Let z = a + bi be a nonzero complex number. We want to find a
complex number w so that w^2 = z. Let w = x + yi. Then we need to find
real numbers x and y such that (x + yi)^2 = a + bi. Thus (x^2 − y^2) + 2xyi = a + bi,
and so equating real and imaginary parts, we have to solve the following two
equations
x^2 − y^2 = a and 2xy = b.
Now we actually have enough information to solve our problem, but we can
make life easier for ourselves by adding one extra equation. To get it, we use
the modulus function. From (x + yi)^2 = a + bi we get that |x + yi|^2 = |a + bi|.
Now |x + yi|^2 = x^2 + y^2 and |a + bi| = √(a^2 + b^2). We therefore have three
equations

x^2 − y^2 = a and 2xy = b and x^2 + y^2 = √(a^2 + b^2).
If we add the first and third equations together we get

x^2 = a/2 + √(a^2 + b^2)/2 = (a + √(a^2 + b^2))/2.
We can now solve for x and therefore for y.
Example 2.1.6. Every negative real number has two square roots. We have
that the square roots of −r, where r > 0, are ±i√r.
Example 2.1.7. Find both square roots of 3 + 4i and check your answers.
We assume that there is a complex number x + yi where both x and y are
real such that

(x + yi)^2 = 3 + 4i.

Squaring and comparing real and imaginary parts we get that the following
two equations must be satisfied by x and y:

x^2 − y^2 = 3 and 2xy = 4.

We also have a third equation by taking moduli:

x^2 + y^2 = 5.

Adding the first and third equations together we get x = ±2. Thus y = 1 if
x = 2 and y = −1 if x = −2. The roots we want are therefore 2 + i and
−2 − i. Of course, one root will be minus the other. Now square either root
to check your answer: (2 + i)^2 = 4 + 4i − 1 = 3 + 4i, as required.
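The three-equation method from the proof can be coded directly. A minimal Python sketch (the function name is ours):

```python
import math

def complex_sqrt(a, b):
    """One square root x + yi of a + bi, following the proof above:
    solve x**2 - y**2 = a, 2xy = b and x**2 + y**2 = |a + bi|
    simultaneously.  The other root is its negative."""
    r = math.hypot(a, b)               # |a + bi| = sqrt(a**2 + b**2)
    x = math.sqrt((a + r) / 2)         # add the first and third equations
    if x != 0:
        y = b / (2 * x)                # 2xy = b fixes the sign of y
    else:
        y = math.sqrt((r - a) / 2)     # here b = 0 and a <= 0
    return (x, y)

# Example 2.1.7: the square roots of 3 + 4i are ±(2 + i).
assert complex_sqrt(3, 4) == (2.0, 1.0)
# Example 2.1.6: the square roots of -4 are ±2i.
assert complex_sqrt(-4, 0) == (0.0, 2.0)
```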
Remark Notice that the two square roots of a non-zero complex number will
have the form w and −w; in other words, one root will be −1 times the other.
If we combine our method for solving quadratics with our method for
determining the square roots of complex numbers, we have a method for
finding the roots of quadratics with any coefficients, whether they be real or
complex.
Example 2.1.8. Solve the quadratic equation
4z^2 + 4iz + (−13 − 16i) = 0.
The complex numbers obey the same algebraic laws as the reals and so we
can solve this equation by completing the square or we can simply plug the
numbers into the formula for the roots of a quadratic. Here I shall complete
the square. First, we convert the equation into a monic one
z^2 + iz + (−13 − 16i)/4 = 0.
Next, we observe that

(z + i/2)^2 = z^2 + iz − 1/4.
Thus

z^2 + iz = (z + i/2)^2 + 1/4.
Our equation therefore becomes

(z + i/2)^2 + 1/4 + (−13/4 − 4i) = 0.
We therefore have

(z + i/2)^2 = 3 + 4i.
Taking square roots of both sides using a previous calculation, we have that

z + i/2 = 2 + i or −2 − i.

It follows that z = 2 + i/2 or −2 − 3i/2. Now check that these roots really do
work.
Every quadratic equation ALWAYS has exactly two roots.
Warning! Complex numbers are numbers: you need to think of them as a
whole and not as an ordered pair of numbers. For example, it is very common for students to write down the roots of, say, 3 + 4i as x = 2, y = 1 and
x = −2, y = −1. This is wrong. The roots are 2 + i and −2 − i.
Historical point In fact, complex numbers were not introduced in quite the
way I described them: it was cubic equations that led to their discovery.
Exercises 2.1
1. Calculate the discriminants of the following real quadratics and so determine whether they have two distinct roots, or repeated roots, or no
real roots.
(i) x^2 + 6x + 5.
(ii) x^2 − 4x + 4.
(iii) x^2 − 2x + 5.
2. Solve the following quadratic equations by completing the square. Check
your answers.
(i) x^2 + 10x + 16 = 0.
(ii) x^2 + 4x + 2 = 0.
(iii) 2x^2 − x − 7 = 0.
3. Solve the following problems in complex number arithmetic. In each
case, the answer should be in the form a + ib where a and b are real.
(i) (2 + 3i) + (4 + i).
(ii) (2 + 3i)(4 + i).
(iii) (8 + 6i)^2.
(iv) (2 + 3i)/(4 + i).
(v) 1/i + 3/(1 + i).
(vi) (3 + 4i)/(3 − 4i) − (4 + 4i)/(3 − 4i).
4. Find the square roots of each of the following complex numbers and
check your answers.
(i) −i.
(ii) −1 + i√24.
(iii) −13 − 84i.
5. Solve the following quadratic equations and check your answers.
(i) x^2 + x + 1 = 0.
(ii) 2x^2 − 3x + 2 = 0.
(iii) x^2 − (2 + 3i)x − 1 + 3i = 0.
2.2 The fundamental theorem of algebra
I want to describe now a result which is one of the most important consequences of the properties of complex numbers: the fundamental theorem of
algebra. It should be understood that this is a misnomer since algebra has
expanded beyond all bounds since this theorem was first proved. Nevertheless, it is an important result playing a key role in calculus where it is used
(in its real version which I also describe) to prove that any rational function
can be integrated using partial fractions.
In this section, we shall work with arbitrary polynomials and I shall now
recall some terminology for handling them. An expression

a_n x^n + a_{n−1} x^(n−1) + ... + a_1 x + a_0,

where the a_i are complex numbers, called the coefficients, is called a polynomial.
We assume a_n ≠ 0. The degree of this polynomial is n. We abbreviate this
to deg. If a_n = 1 the polynomial is said to be monic. The term a_0 is called
the constant term and the term a_n x^n is called the leading term. Polynomials
can be added, subtracted and multiplied.
Two polynomials are equal if they have the same degree and the coefficients of terms of the same degree are equal.
• Polynomials of degree 1 are said to be linear;
• those of degree 2, quadratic;
• those of degree 3, cubic;
• those of degree 4, quartic;
• those of degree 5, quintic.
There are special terms for polynomials of degree higher than 5, if you want
them.
Why are polynomials interesting? There are two answers to this question.
First, they have widespread applications such as in helping to solve linear
differential equations and in studying matrices. Second, a polynomial defines
a function which is calculated in a very simple way using the operations of
addition, subtraction and multiplication. However, many more complicated
functions can be usefully approximated by polynomial ones.
We denote by C[x] the set of polynomials with complex coefficients and
by R[x], the set of polynomials with real coefficients. I will write F [x] to
mean F = R or F = C.
2.2.1 The arithmetic of polynomials
The addition, subtraction and multiplication of polynomials is easy. We shall
therefore concentrate in this section on division.
Let f (x), g(x) ∈ F [x]. We say that g(x) divides f (x), denoted by
g(x) | f (x),
if there is a polynomial q(x) ∈ F [x] such that f (x) = g(x)q(x). We say that
g(x) is a factor of f (x).
Example 2.2.1. Let f (x) = x^4 + 2x + 1 and g(x) = x + 1. Then

(x + 1) | x^4 + 2x + 1

since x^4 + 2x + 1 = (x + 1)(x^3 − x^2 + x + 1).
Lemma 2.2.2. Let f (x), g(x) ∈ F [x] be non-zero polynomials. Then
deg f (x)g(x) = deg f (x) + deg g(x).
Proof. Let f (x) have leading term a_m x^m and let g(x) have leading term b_n x^n.
Then the leading term of f (x)g(x) is a_m b_n x^(m+n). Now a_m b_n ≠ 0 and so the
degree of f (x)g(x) is m + n, as required.
The above result is used to justify the standard procedure for dividing one
polynomial into another which I shall now describe by means of an example.
Remember that answers can always be checked by multiplying out.
Example 2.2.3. Divide 6x^4 + 5x^3 + 4x^2 + 3x + 2 by 2x^2 + 4x + 5 and so find
the quotient and remainder. We set out the computation as a long division:

2x^2 + 4x + 5 ) 6x^4 + 5x^3 + 4x^2 + 3x + 2

To get the term 6x^4 we would have to multiply the divisor by 3x^2, so 3x^2
is the first term of the quotient. Subtracting

(2x^2 + 4x + 5) · 3x^2 = 6x^4 + 12x^3 + 15x^2

from the dividend leaves −7x^3 − 11x^2 + 3x + 2. The procedure is now repeated
with the new polynomial: dividing −7x^3 by 2x^2 gives the next quotient term
−(7/2)x, and subtracting

(2x^2 + 4x + 5) · (−(7/2)x) = −7x^3 − 14x^2 − (35/2)x

leaves 3x^2 + (41/2)x + 2. The procedure is repeated one more time: dividing
3x^2 by 2x^2 gives the quotient term 3/2, and subtracting

(2x^2 + 4x + 5) · (3/2) = 3x^2 + 6x + 15/2

leaves (29/2)x − 11/2. This is the end of the line because this polynomial has
degree strictly less than the polynomial we are dividing by. The quotient is
therefore 3x^2 − (7/2)x + 3/2 and the remainder is (29/2)x − 11/2. What we
have shown is that

6x^4 + 5x^3 + 4x^2 + 3x + 2 = (2x^2 + 4x + 5)(3x^2 − (7/2)x + 3/2) + ((29/2)x − 11/2).

You can verify this is true by multiplying out the righthand side.
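The hand procedure is exactly the algorithm below. A Python sketch of our own, using exact rational arithmetic so the fractional coefficients come out precisely:

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g, each a list of coefficients from the highest degree
    down.  Returns (quotient, remainder) with deg(remainder) < deg(g),
    following the hand procedure.  Assumes g is non-zero with g[0] != 0."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    quotient = []
    while len(f) >= len(g):
        coeff = f[0] / g[0]            # next term of the quotient
        quotient.append(coeff)
        for i, gc in enumerate(g):     # subtract coeff * x**k * g(x)
            f[i] -= coeff * gc
        f.pop(0)                       # the leading term is now zero
    return quotient, f                 # what is left of f is the remainder

# Example 2.2.3: quotient 3x^2 - (7/2)x + 3/2, remainder (29/2)x - 11/2.
quot, rem = poly_divmod([6, 5, 4, 3, 2], [2, 4, 5])
assert quot == [3, Fraction(-7, 2), Fraction(3, 2)]
assert rem == [Fraction(29, 2), Fraction(-11, 2)]
```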
2.2.2 Roots of polynomials
The following result is analogous to the remainder theorem for integers. I
shall not prove it here, although it should seem plausible.
Proposition 2.2.4 (Remainder theorem). Let f (x) and g(x) be polynomials
in F [x] where deg f (x) ≥ deg g(x). Then either
g(x) | f (x)
or
f (x) = g(x)q(x) + r(x)
where deg r(x) < deg g(x).
Example 2.2.5. Let f (x) = x^3 + x + 3 and g(x) = x^2 + x. Then x^3 + x + 3 =
(x − 1)(x^2 + x) + (2x + 3).
Let f (x) ∈ F [x]. A number r ∈ F is said to be a root or zero of f (x) if
f (r) = 0. The roots of f (x) are the solutions of the equation f (x) = 0.
Example 2.2.6. The number 1 is a root of x^100 − 2x^98 + 1 because 1 − 2 + 1 = 0.
Checking whether a number is a root is easy, but finding a root in the
first place is trickier. The next result tells us that when we find roots of
polynomials we are in fact determining linear factors. It is crucial to everything
we shall do.
Proposition 2.2.7. Let r ∈ F . Then r is a root of f (x) ∈ F [x] if and only
if (x − r) | f (x).
Proof. Suppose that (x − r) | f (x). Then by definition f (x) = (x − r)q(x)
for some polynomial q(x). If we now calculate f (r) we see immediately that
it must be zero.
We now prove the converse. Suppose that r is a root of f (x). By the
remainder theorem, either (x − r) | f (x) or f (x) = q(x)(x − r) + r(x) where
deg(r(x)) < deg(x − r) = 1. If the former then we are done. If the latter
then it follows that r(x) is in fact a constant (that is, just a number). Call
this number a. If we calculate f (r) we get a. It follows that in fact a = 0
and so (x − r) | f (x).
Example 2.2.8. We have seen that the number 1 is a root of x^100 − 2x^98 + 1.
Thus by the above result (x − 1) | x^100 − 2x^98 + 1.
A root r of a polynomial f (x) is said to have multiplicity m if

(x − r)^m | f (x)

but (x − r)^(m+1) does not divide f (x). A root is always counted according to
its multiplicity.
Example 2.2.9. The polynomial x^2 + 2x + 1 has −1 as a root and no other
roots. However (x + 1)^2 = x^2 + 2x + 1 and so the root −1 occurs with
multiplicity 2. Thus the polynomial has two roots counting multiplicities.
This is the sense in which we can say that a quadratic equation always has
two roots.
Proposition 2.2.10. A non-zero polynomial of degree n has at most n roots.
Proof. Let f (x) be a non-zero polynomial of degree n > 0. Suppose that
f (x) has a root a. Then f (x) = (x − a)f1 (x) by Proposition 2.2.7 and the
degree of f1 (x) is n − 1. This argument can be repeated and we reach the
desired conclusion.
One question I have so far not dealt with is whether a polynomial need
have a root. This is answered by the following theorem whose name reflects its
importance when first discovered, and not its significance in modern algebra.
We shall not give a proof because that would require more advanced methods
than are covered in this course. It was first proved by the great German
mathematician Carl Friedrich Gauss (1777–1855) in 1799 when he was 22.
Theorem 2.2.11 (Fundamental theorem of algebra (FTA)). Every polynomial
of degree n ≥ 1 with complex coefficients has a root.
This theorem has the following important consequence using Proposition 2.2.10.
Corollary 2.2.12. Every polynomial with complex coefficients of degree n
has exactly n complex roots (counting multiplicities). Thus every such polynomial can be written as a product of linear polynomials.
Proof. Let f (x) be a non-zero polynomial of degree n. By the FTA, this
polynomial has a root r1 . Thus f (x) = (x − r1 )f1 (x) where f1 (x) is a polynomial of degree n − 1. This argument can be repeated and we eventually end
up with f (x) = a(x − r1 ) . . . (x − rn ) where a is the last quotient, necessarily
a complex number.
Example 2.2.13. It can be checked that the quartic x^4 − 5x^2 − 10x − 6 has
roots −1, 3, −1 + i and −1 − i. We can therefore write

x^4 − 5x^2 − 10x − 6 = (x + 1)(x − 3)(x + 1 + i)(x + 1 − i).
In many practical examples, our polynomials will have real coefficients
and we will want any factors of the polynomial to be likewise real. The result
above doesn’t do that because it could produce complex factors. However,
we can rectify this situation at a very small price. We shall use the notion
of the complex conjugate of a complex number that we introduced earlier.
Observe that the conjugate of a sum is the sum of the conjugates, that the
conjugate of a product is the product of the conjugates, and that z is real if
and only if z = z̄.
The proofs are left as exercises. We may now prove the following key
lemma.
Lemma 2.2.14. Let f (x) be a polynomial with real coefficients. If the complex number z is a root then so too is z.
Proof. Let

f (x) = a_n x^n + a_{n−1} x^(n−1) + ... + a_1 x + a_0

where the a_i are real numbers. Let z be a complex root. Then

0 = a_n z^n + a_{n−1} z^(n−1) + ... + a_1 z + a_0.

Take the complex conjugate of both sides and use the properties of the complex
conjugate to get

0 = a_n z̄^n + a_{n−1} z̄^(n−1) + ... + a_1 z̄ + a_0

and so z̄ is also a root.
Example 2.2.15. We saw above that

x^4 − 5x^2 − 10x − 6 = (x + 1)(x − 3)(x + 1 + i)(x + 1 − i).
Observe that the complex roots −1 − i and −1 + i are complex conjugates
of each other.
Lemma 2.2.16. Let z be a complex number which is not real. Then
(x − z)(x − z̄)
is an irreducible quadratic with real coefficients.
On the other hand, if x^2 + bx + c is an irreducible quadratic with real
coefficients then its roots are complex conjugates of each other.
Proof. To prove the first claim, we multiply out to get
(x − z)(x − z̄) = x^2 − (z + z̄)x + z z̄.

Observe that z + z̄ and z z̄ are both real numbers. The discriminant of this
polynomial is (z − z̄)^2. You can check that if z is complex and non-real then
z − z̄ is purely imaginary. It follows that its square is negative. We have
therefore shown that our quadratic is irreducible.
The proof of the second claim follows from the formula for the roots of a
quadratic combined with the fact that the square root of a negative real will
have the form ±αi where α is real.
Example 2.2.17. We saw above that

x^4 − 5x^2 − 10x − 6 = (x + 1)(x − 3)(x + 1 + i)(x + 1 − i).

Multiply out (x + 1 + i)(x + 1 − i) and we get x^2 + 2x + 2. Thus

x^4 − 5x^2 − 10x − 6 = (x + 1)(x − 3)(x^2 + 2x + 2)
with all the polynomials involved being real.
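The real quadratic coming from a conjugate pair z, z̄ is x^2 − (z + z̄)x + z z̄ = x^2 − 2 Re(z) x + |z|^2, which is easy to check with Python's built-in complex type. A small sketch of our own:

```python
def conjugate_pair_quadratic(z):
    """Real coefficients (1, b, c) of (x - z)(x - conj(z)):
    b = -(z + conj(z)) = -2 Re(z) and c = z * conj(z) = |z|**2."""
    b = -2 * z.real
    c = z.real ** 2 + z.imag ** 2      # |z|**2, computed exactly from parts
    return (1.0, b, c)

# The pair -1 ± i from Example 2.2.17 gives x**2 + 2x + 2.
assert conjugate_pair_quadratic(-1 + 1j) == (1.0, 2.0, 2.0)
```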
The following theorem is the one that we can use to help us solve problems
involving real polynomials.
Theorem 2.2.18 (Fundamental theorem of algebra for real polynomials).
Every polynomial with real coefficients can be written as a product of polynomials with real coefficients which are either linear or irreducible quadratic.
Proof. We can write the polynomial as a product of linear polynomials. Bring
the real linear factors to the front. The remaining linear polynomials will
have complex coefficients. They correspond to roots that come in complex
conjugate pairs. Multiplying together those complex linear factors corresponding to complex conjugate roots we get real quadratics and the result is
proved.
In fact, we can write any real polynomial as a real number times a product
of monic linear and quadratic factors. This result is the basis of the method
of partial fractions used in integrating rational functions in calculus.
Finding the exact roots of a polynomial is difficult, in general. However,
the following result tells us how to find the rational roots of monic polynomials with integer coefficients. It is a nice, and perhaps unexpected, application
of the number theory we developed earlier.
Lemma 2.2.19. Let

f (x) = x^n + a_{n−1} x^(n−1) + ... + a_1 x + a_0

be a monic polynomial with integer coefficients. If r/s is a root with r
and s coprime, then r | a_0 and s = 1. In other words, any rational root is an
integer that divides the constant term.
Proof. Substituting r/s into f (x) we have, by assumption, that

0 = (r/s)^n + a_{n−1} (r/s)^(n−1) + ... + a_1 (r/s) + a_0.

Multiply through by s^n to get

0 = r^n + a_{n−1} s r^(n−1) + ... + a_1 s^(n−1) r + a_0 s^n.

Thus r | a_0 s^n. Now r and s are coprime and so r and s^n must be coprime
(think about common divisors which are prime). It follows that r | a_0.
Similarly s | r^n but s and r^n are coprime and so s = 1.
Example 2.2.20. Find all the roots of the following polynomial

x^4 − 8x^3 + 23x^2 − 28x + 12.
The polynomial is monic and so the only possible rational roots are integers
and must divide 12. Thus the only possible rational roots are
±1, ±2, ±3, ±4, ±6, ±12.
We find immediately that 1 is a root and so (x−1) must be a factor. Dividing
out by this factor we get the quotient
x^3 − 7x^2 + 16x − 12.
We check this polynomial for rational roots and find 2 works. Dividing out
by (x − 2) we get the quotient
x^2 − 5x + 6.
Once we get down to a quadratic we can solve it directly. In this case it
factorizes as (x − 2)(x − 3). We therefore have that
x^4 − 8x^3 + 23x^2 − 28x + 12 = (x − 1)(x − 2)^2 (x − 3).
At this point, I usually multiply out the righthand side and check that I
really do have an equality. In this case, all roots are rational and are 1, 2, 2, 3.
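The candidate-testing step of the example is easily automated. A Python sketch of our own (it reports distinct integer roots; the multiplicity of 2 is found, as in the text, by dividing out factors):

```python
def integer_roots(coeffs):
    """All integer roots of a monic polynomial with integer coefficients,
    given from the highest degree down.  By Lemma 2.2.19 any rational
    root is an integer dividing the constant term; assumes the constant
    term is non-zero.  Multiplicities are not reported."""
    def value(x):
        v = 0
        for c in coeffs:
            v = v * x + c              # Horner evaluation
        return v
    a0 = abs(coeffs[-1])
    divisors = [d for d in range(1, a0 + 1) if a0 % d == 0]
    candidates = divisors + [-d for d in divisors]
    return sorted(x for x in candidates if value(x) == 0)

# Example 2.2.20: x^4 - 8x^3 + 23x^2 - 28x + 12 has roots 1, 2, 2, 3.
print(integer_roots([1, -8, 23, -28, 12]))  # [1, 2, 3]
```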
The above result suggests an approach to finding the roots of polynomials
with rational coefficients. First, multiply through by a number large enough
to render all coefficients integers. Second, find the rational roots using the
method above. For each such root a, there will be a linear factor (x − a).
Dividing out by the product of these factors will yield a polynomial whose
roots will be necessarily irrational but will at least be of smaller degree. This
polynomial will have real coefficients and so its complex roots will occur
in complex conjugate pairs. Although in real life this approach cannot be
guaranteed to find all the roots, it should stand you in good stead in the
considerably less real life of exam questions.
Exercises 2.2
1. Find the quotient and remainder when the first polynomial is divided
by the second.
(i) x^3 − 7x − 1 and x − 2.
(ii) x^4 − 2x^2 − 1 and x^2 + 3x − 1.
(iii) 2x^3 − 3x^2 + 1 and x.
2. Find all roots using the information given.
(i) 4 is a root of 3x^3 − 20x^2 + 36x − 16.
(ii) −1, −2 are both roots of x^4 + 2x^3 + x + 2.
3. Find a cubic having roots 2, −3, 4.
4. Find a quartic having roots i, −i, 1 + i and 1 − i.
5. The cubic x^3 + ax^2 + bx + c has roots α, β and γ. Show that a, b, c can
each be written in terms of the roots.

6. 3 + i√2 is a root of x^4 + x^3 − 25x^2 + 41x + 66. Find the remaining
roots.

7. 1 − i√5 is a root of x^4 − 2x^3 + 4x^2 + 4x − 12. Find the remaining roots.
8. Find all the roots of the following polynomials.
(i) x^3 + x^2 + x + 1.
(ii) x^3 − x^2 − 3x + 6.
(iii) x^4 − x^3 + 5x^2 + x − 6.
9. Write each of the following polynomials as a product of linear or quadratic
real factors.
(i) x^3 − 1.
(ii) x^4 − 1.
(iii) x^4 + 1.
2.3 Complex number geometry
We proved in Section 2.1 that every non-zero complex number has two square
roots. From the fundamental theorem of algebra (FTA), discussed in Section 2.2,
we know that every non-zero complex number has three cube roots,
prove the FTA. The main goal of this section is to prove that every non-zero
complex number has n nth-roots. To do this, we shall think about complex
numbers in a geometric, rather than an algebraic, way. Throughout this section we shall not assume FTA. We shall only need the result proved in the
previous section that every polynomial of degree n has at most n roots.
2.3.1 sin and cos
We first recall some well-known properties of the trigonometric functions sin
and cos. First the addition formulae
sin(α + β) = sin α cos β + cos α sin β
and
cos(α + β) = cos α cos β − sin α sin β.
These formulae were important historically because they enabled unknown
values of sin’s and cos’s to be calculated from known ones, and so they were
useful in constructing trig tables in the days before calculators.
In university mathematics, angles are usually measured in radians rather than degrees. This is because radians are a natural unit of angle measurement, whereas the system of angle measurement based on degrees is an historical accident. Why 360 degrees in a circle? Ask the Ancient Babylonians. Positive angles are measured in an anticlockwise direction.
The sin and cos functions are periodic functions with period 2π. This
means that for all angles θ
sin(θ + 2πn) = sin θ and cos(θ + 2πn) = cos θ
for all n ∈ Z. This fact will be crucial in what follows.
The following table of values will be useful. I leave it as an exercise to
justify it.
 θ     sin θ    cos θ
 0°      0        1
 30°    1/2     √3/2
 45°   1/√2     1/√2
 60°   √3/2      1/2
 90°     1        0

2.3.2 The complex plane
In this section, we shall describe in more detail an alternative way of thinking
about complex numbers which turns out to be very fruitful. Recall that a
complex number z = a + bi has two components: a and b. We can plot these
as a point in the plane. The plane used in this way is called the complex
plane: the x-axis is the real axis and the y-axis is interpreted as the complex
axis. Although a complex number can be thought of as labelling a point in
the complex plane, it can more usefully be regarded as labelling the directed
line segment from the origin to the point. This is how we shall regard it.
Let z = a + bi be a non-zero complex number and let θ be the angle that it
makes with the positive reals. The length of z as a directed line segment in
the complex plane is |z|, and by basic trig a = |z| cos θ and b = |z| sin θ. It
follows that
z = |z| (cos θ + i sin θ) .
[Diagram: z drawn as a directed line segment from the origin, making angle θ with the positive real axis, with horizontal component |z| cos θ and vertical component i |z| sin θ.]
Observe that |z| is a non-negative real number. This way of writing complex
numbers is called the polar form.
At this point, I need to clarify the only feature of complex numbers that
causes confusion. I have already mentioned that the functions sin and cos
are periodic. For that reason, there is not just one number θ that yields
the complex number z but infinitely many of them: namely, all the numbers
θ + 2πk where k ∈ Z. For this reason, we define the argument of z, denoted
by arg z, not merely to be the single angle θ but the set of all angles θ + 2πk
where k ∈ Z. The angle θ is chosen so that 0 ≤ θ < 2π and is called, for
convenience, the principal argument. But note that books vary on what they
choose to call the principal argument. This feature of the argument plays a
crucial role when we come to calculate nth roots.
Observe that complex numbers of the form
cos θ + i sin θ
are precisely the complex numbers of unit length and so all together they
form the complex numbers lying on the unit circle with centre the origin in
the complex plane. Thus every non-zero complex number is a real number
times a complex number lying on the unit circle.
62
CHAPTER 2. THE FUNDAMENTAL THEOREM OF ALGEBRA
Let w = r (cos θ + i sin θ) and z = s (cos φ + i sin φ) be two non-zero
complex numbers. We shall calculate wz. We have that
wz = rs (cos θ + i sin θ) (cos φ + i sin φ)
= rs[(cos θ cos φ − sin θ sin φ) + (sin θ cos φ + cos θ sin φ)i]
but using the properties of the sin and cos functions this reduces to
wz = rs (cos(θ + φ) + i sin(θ + φ)) .
We thus have the following important result:
when two non-zero complex numbers are multiplied together their
lengths are multiplied and their arguments are added.
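This multiplication rule is easy to check numerically. A minimal sketch using Python's built-in `cmath` module (the particular lengths and angles chosen are arbitrary):

```python
import cmath

# Two arbitrary non-zero complex numbers built in polar form.
w = cmath.rect(2.0, 0.5)   # length 2, argument 0.5 radians
z = cmath.rect(3.0, 1.2)   # length 3, argument 1.2 radians

product = w * z

# Lengths multiply: |wz| = |w||z| = 6.
assert abs(abs(product) - 6.0) < 1e-9

# Arguments add: arg(wz) = arg(w) + arg(z) = 1.7 radians (mod 2π).
assert abs(cmath.phase(product) - 1.7) < 1e-9
```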
This result helps us to understand the meaning of i. Multiplication by i
is the same as a rotation about the origin by a right angle. Multiplication by
i2 is therefore the same as a rotation about the origin by two right angles.
But this is exactly the same as multiplication by −1.
[Diagram: the points 1, i, −1 and −i on the axes of the complex plane.]
We may apply similar geometric reasoning to explain why −1 × −1 = 1.
Multiplication by −1 is interpreted as rotation about the origin by 180°. It follows that doing this twice takes us back to where we started and so is equivalent to multiplication by 1.
The proof of the next theorem follows by repeated application of the
result we proved above.
Theorem 2.3.1 (De Moivre). Let n be a positive integer. If z = r (cos θ + i sin θ) then

zⁿ = rⁿ (cos nθ + i sin nθ).
This result has very nice applications in painlessly obtaining trigonometric identities.
Example 2.3.2. Express cos 3θ in terms of cos θ and sin θ using De Moivre’s
Theorem. We have that
(cos θ + i sin θ)³ = cos 3θ + i sin 3θ.
However, we can expand the LHS by either multiplying out directly or using
Pascal’s triangle (or the binomial theorem as described in Chapter 5) to get
cos³ θ + 3i cos² θ sin θ + 3 cos θ (i sin θ)² + (i sin θ)³

which simplifies to

cos³ θ − 3 cos θ sin² θ + i (3 cos² θ sin θ − sin³ θ)
where we use the fact that i² = −1, i³ = −i and i⁴ = 1. Equating real
and imaginary parts we get
cos 3θ = cos³ θ − 3 cos θ sin² θ.
We also get the formula
sin 3θ = 3 cos² θ sin θ − sin³ θ
for free.
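Both triple-angle formulae can be checked numerically; a small sketch (the sample angle is arbitrary):

```python
import math

# Check cos 3θ = cos³θ − 3 cos θ sin²θ and sin 3θ = 3 cos²θ sin θ − sin³θ.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)

assert abs(math.cos(3 * theta) - (c**3 - 3 * c * s**2)) < 1e-12
assert abs(math.sin(3 * theta) - (3 * c**2 * s - s**3)) < 1e-12
```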
2.3.3 Arbitrary roots of complex numbers
In this section, we shall prove that every non-zero complex number has n
nth roots: thus it has three cube roots, and four fourth roots and so on. We
begin with a special case that turns out to give us almost all the information
we need to solve the general case.
The nth roots of unity
We shall show that the number 1 has n nth roots — these are called the nth roots of unity. We know that the equation zⁿ − 1 = 0 has at most n roots,
so all we need do is find n roots and we are home and dry. We begin with
some concrete examples.
Example 2.3.3. We find the three cube roots of 1. Inscribe an equilateral triangle in the unit circle of the complex plane with 1 as one of its vertices. Let ω denote the first vertex we meet when travelling anticlockwise from 1. Then the roots are ω, ω², ω³ = 1, where

ω = (−1 + i√3)/2  and  ω² = −(1 + i√3)/2.
Example 2.3.4. We find the six sixth roots of unity. Inscribe a regular hexagon in the unit circle of the complex plane with 1 as one of its vertices. Let ω denote the first vertex we meet when travelling anticlockwise from 1. This is just

ω = cos 60° + i sin 60°.

The remaining vertices are ω², ω³, …, ω⁶ = 1. It is now easy to check using De Moivre that these are all sixth roots of unity. For example, ω² = cos 120° + i sin 120°. This gives the trigonometric form of the roots. In this case, we can easily find the algebraic form of the roots; we get

ω = (1 + i√3)/2.
The general case is solved in a similar way to our examples above using
regular n-gons in the complex plane where one of the vertices is 1.
Theorem 2.3.5 (Roots of unity). The n nth roots of unity are given by the formula

cos(2kπ/n) + i sin(2kπ/n)

for k = 1, 2, …, n. These complex numbers are arranged uniformly on the unit circle and form a regular polygon with n sides: the cube roots of unity form an equilateral triangle, the fourth roots form a square, the fifth roots form a regular pentagon, and so on.
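The formula of Theorem 2.3.5 translates directly into a few lines of code. A sketch using Python's `cmath` module (the function name `roots_of_unity` is our own):

```python
import cmath

def roots_of_unity(n):
    """The n nth roots of unity: cos(2kπ/n) + i sin(2kπ/n) for k = 1, ..., n."""
    return [cmath.rect(1.0, 2 * k * cmath.pi / n) for k in range(1, n + 1)]

cube_roots = roots_of_unity(3)

# Each root satisfies z³ = 1 and lies on the unit circle.
for z in cube_roots:
    assert abs(z**3 - 1) < 1e-9
    assert abs(abs(z) - 1) < 1e-12
```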
There is only one point here that is a little confusing. It is always possible
and easy to write down the trigonometric form of the nth roots of unity. It is
also always possible to write down the algebraic form of the nth roots of unity
but this is far from easy in general; in fact, it forms part of the advanced
subject known as Galois theory.
Arbitrary nth roots
The nth roots of unity play an important role in finding arbitrary nth
roots. We begin with an example to illustrate the idea.
Example 2.3.6. We find the three cube roots of 2. If you use your calculator you will simply find ∛2, a real number. There should be two others: where are they? The explanation is that the other two cube roots are complex. Let ω be the complex cube root of 1 that we described above. Then the three cube roots of 2 are the following:

∛2,  ω ∛2,  ω² ∛2.
The above example generalizes.
Theorem 2.3.7 (nth roots). Let z = r (cos θ + i sin θ) be a non-zero complex number. Put

u = ⁿ√r (cos(θ/n) + i sin(θ/n)),

the obvious nth root, and put

ω = cos(2π/n) + i sin(2π/n),

the first interesting nth root of unity. Then the nth roots of z are as follows:

u, uω, …, uωⁿ⁻¹.

It follows that the nth roots of z = r (cos θ + i sin θ) can be written in the form

ⁿ√r (cos(θ/n + 2kπ/n) + i sin(θ/n + 2kπ/n))

for k = 0, 1, 2, …, n − 1.
This is the reason why every non-zero number has two square roots that differ by a factor of −1: the two square roots of 1 are 1 and −1.
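Theorem 2.3.7 can likewise be turned into a short computation. A sketch (the helper `nth_roots` is our own naming):

```python
import cmath

def nth_roots(z, n):
    """All n nth roots of a non-zero complex number z, as in Theorem 2.3.7."""
    r, theta = abs(z), cmath.phase(z)
    u = cmath.rect(r ** (1.0 / n), theta / n)    # the 'obvious' nth root
    omega = cmath.rect(1.0, 2 * cmath.pi / n)    # first interesting root of unity
    return [u * omega**k for k in range(n)]

# The three cube roots of 2: one real, two complex.
for w in nth_roots(2, 3):
    assert abs(w**3 - 2) < 1e-9
```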
2.3.4 Euler's formula
We have seen that every real number can be written as a whole number plus
a possibly infinite decimal part. It turns out that many functions can also be
written as a sort of decimal. I shall illustrate this by means of an example.
Consider the function eˣ. All you need to know about this function is that it is equal to its own derivative and that e⁰ = 1. We would like to write

eˣ = a₀ + a₁x + a₂x² + a₃x³ + ⋯

where the aᵢ are real numbers that we have yet to determine. We can work out the value of a₀ easily by putting x = 0. This tells us that a₀ = 1. To get the value of a₁ we first differentiate our expression to get

eˣ = a₁ + 2a₂x + 3a₃x² + ⋯

Now put x = 0 again and this time we get that a₁ = 1. To get the value of a₂ we differentiate our expression again to get

eˣ = 2a₂ + 3 · 2 · a₃x + ⋯

Now put x = 0 and we get that a₂ = 1/2. Continuing in this way, we quickly spot the pattern for the values of the coefficients: aₙ = 1/n!, where n! = n(n − 1)(n − 2) ⋯ 2 · 1. What we have done for eˣ we can also do for sin x and cos x, and we obtain the following series expansions of each of these functions.
• eˣ = 1 + x + x²/2! + x³/3! + x⁴/4! + ⋯

• sin x = x − x³/3! + x⁵/5! − x⁷/7! + ⋯

• cos x = 1 − x²/2! + x⁴/4! − x⁶/6! + ⋯
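These truncated series converge very quickly; a sketch comparing twenty terms of each series against Python's `math` library (the cut-off of twenty terms is an arbitrary choice):

```python
import math

def exp_series(x, terms=20):
    """Truncated series 1 + x + x²/2! + x³/3! + ... for eˣ."""
    return sum(x**n / math.factorial(n) for n in range(terms))

def sin_series(x, terms=20):
    """Truncated series x − x³/3! + x⁵/5! − ... for sin x."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1) for k in range(terms))

def cos_series(x, terms=20):
    """Truncated series 1 − x²/2! + x⁴/4! − ... for cos x."""
    return sum((-1)**k * x**(2*k) / math.factorial(2*k) for k in range(terms))

x = 1.3
assert abs(exp_series(x) - math.exp(x)) < 1e-12
assert abs(sin_series(x) - math.sin(x)) < 1e-12
assert abs(cos_series(x) - math.cos(x)) < 1e-12
```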
There are interesting connections between these three series. We shall now show that complex numbers help to explain them. Without worrying about the validity of doing so, we calculate the infinite series expansion of e^{iθ}. We have that

e^{iθ} = 1 + (iθ) + (iθ)²/2! + (iθ)³/3! + ⋯

that is,

e^{iθ} = 1 + iθ − θ²/2! − iθ³/3! + θ⁴/4! + ⋯

By separating out real and imaginary parts, and using the infinite series we obtained above, we get Euler's remarkable formula
e^{iθ} = cos θ + i sin θ.
Thus the complex numbers enable us to find the hidden connections between
the three most important functions of calculus: the exponential function and
the sine and cosine functions. It follows that every non-zero complex number
can be written in the form re^{iθ}. If we put θ = π in Euler's formula, we get
the following result, which is widely regarded as one of the most amazing in
mathematics.
Theorem 2.3.8 (Euler's identity).

e^{πi} = −1.
This result shows us that the real numbers π, e and −1 are connected,
but that to establish that connection we have to use the complex number i.
This is one of the important roles of the complex numbers in mathematics in
that they enable us to make connections between topics that look different:
they form a mathematical hyperspace.
It’s soon’, no’ sense, that faddoms the herts o’ men,
And by my sangs the rouch auld Scots I ken
E’en herts that ha’e nae Scots’ll dirl richt thro’
As nocht else could – for here’s a language rings
Wi’ datchie sesames, and names for nameless things.
Gairmscoile by Hugh MacDiarmid (my italics)
Exercises 2.3
1. Express cos 5x and sin 5x in terms of cos x and sin x.
2. Solve x³ = −8i.
3. Prove the following, where x is real.²

(i) sin x = (1/2i)(e^{ix} − e^{−ix}).

(ii) cos x = (1/2)(e^{ix} + e^{−ix}).

4. Using Question 3(ii), show that cos⁴ x = (1/8)[cos 4x + 4 cos 2x + 3].
2.4 *Making sense of complex numbers*
This section will not be examined in 2013.
In this chapter, I have assumed that complex numbers exist and that they
obey the usual high-school rules of algebra. In this section, I shall sketch out
a proof of this.
We start with the set R × R whose elements are ordered pairs (a, b) where a and b are real numbers. It will be helpful to denote these ordered pairs by bold letters, so a = (a₁, a₂). We define 0 = (0, 0), 1 = (1, 0) and i = (0, 1).
We now define operations as follows
• If a = (a₁, a₂) and b = (b₁, b₂), define a + b = (a₁ + b₁, a₂ + b₂).

• If a = (a₁, a₂), define −a = (−a₁, −a₂).

• If a = (a₁, a₂) and b = (b₁, b₂), define

ab = (a₁b₁ − a₂b₂, a₁b₂ + a₂b₁).

• If a = (a₁, a₂) ≠ 0, define

a⁻¹ = (a₁/(a₁² + a₂²), −a₂/(a₁² + a₂²)).
It is now a long exercise to check that all the usual axioms of high-school
algebra hold. Now observe that the element (a₁, a₂) can be written

a₁ 1 + a₂ i
and that
ii = (0, 1)(0, 1) = (−1, 0) = −1.
This proves that the complex numbers as I described them earlier in this
chapter really do exist.
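The ordered-pair construction above can be transcribed almost verbatim into code. A sketch (the inverse here is taken over the squared modulus a₁² + a₂², and exact rational arithmetic via `Fraction` avoids rounding; all function names are our own):

```python
from fractions import Fraction

def add(a, b):
    """(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2)."""
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    """(a1, a2)(b1, b2) = (a1 b1 - a2 b2, a1 b2 + a2 b1)."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def inv(a):
    """Multiplicative inverse, defined whenever a != (0, 0)."""
    d = a[0]**2 + a[1]**2
    return (a[0] / d, -a[1] / d)

i = (0, 1)
assert mul(i, i) == (-1, 0)          # ii = -1

a = (Fraction(3), Fraction(4))
assert mul(a, inv(a)) == (1, 0)      # a * a^{-1} = 1
```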
²Compare (i) and (ii) with sinh x = ½(eˣ − e⁻ˣ) and cosh x = ½(eˣ + e⁻ˣ).
2.5 *Morning duel: cubics, quartics, quintics and beyond*
This section will not be examined in 2013.
The main source for this section is Galois Theory by Ian Stewart, Second
edition, Chapman and Hall, 1998.
We have seen that every polynomial with complex coefficients has a root.
However, no indication was given on how to find that root. On the other
hand, for quadratic polynomials we have a formula for finding the roots.
It’s natural to ask, therefore, whether we can find a formula for finding the
roots of any polynomial. I have already noted that the formula for solving a
quadratic equation was known in antiquity. In fact, a Babylonian clay tablet
dated 1600 BC poses problems which are equivalent to solving quadratic
equations. However, it was not until the sixteenth century that any advances
were made in solving polynomial equations of higher degree. The following
is taken from John Stillwell’s book Elements of algebra, Springer, 1994:
Around 1500, in Bologna³, del Ferro found the solution to the
cubic equation. The solution was rediscovered by Tartaglia in
the 1530s, and published in Cardano's Ars Magna⁴ [1545]. This
book also gave the solution to the quartic equation, which was
found by Cardano’s student Ferrari.
In other words, once quadratics had been solved, it took another 3,000
years to figure out how to solve cubics. Having learnt how to solve quadratics, cubics and quartics, the next equations to be studied were the quintics,
equations of degree 5. However, at this point the story takes an unexpected
turn. In 1824, the young Norwegian mathematician Abel proved that there
was no formula for the roots of a quintic expressible in terms of the basic
operations of algebra and extracting roots. The question now arises of why
there is such a difference between equations of degree 4 or less and equations
of degree 5 or more. The complete answer to this question was found by
the French mathematician Evariste Galois. An outline of his discoveries was
drawn up by Galois in a letter he wrote on the evening of 29th May 1832.
The next day he fought a duel with pistols, was shot, and subsequently died of peritonitis. The reason for the duel is disputed: perhaps a femme fatale, perhaps political reasons. The stark fact is that he was a mere 20 when he died, but his work changed the course of mathematics: there is algebra before Galois and algebra after Galois. Galois is responsible for the fact that algebra changes its character at the university level: it is not merely a harder version of school algebra but a different beast.

³In Italy.
⁴The Great Art.
2.6 *Analogies*
This section will not be examined in 2013.
There are parallels between the properties of the natural numbers and
the properties of real polynomials. We have seen that there are remainder
theorems for both natural numbers and polynomials. In the case of the natural numbers, we used the remainder theorem to develop Euclid’s algorithm
and the Extended Euclidean algorithm for computing greatest common divisors. We can do the same thing for polynomials. We define the greatest
common divisor of two real polynomials a(x) and b(x) to be a real polynomial
of largest degree dividing both a(x) and b(x). Any two such gcd’s will be real
number multiples of each other. We say that a(x) and b(x) are coprime if
their greatest common divisor is a constant polynomial. Euclid’s algorithm
and the Extended Euclidean algorithm can both be proved for real polynomials. As a consequence, if a(x) and b(x) are coprime real polynomials, then
we can find real polynomials c(x) and d(x) such that
1 = a(x)c(x) + b(x)d(x).
If f (x) is any real polynomial, then we can multiply both sides of the above
equation by f (x) to get
f (x) = a(x)[f (x)c(x)] + b(x)[f (x)d(x)].
Thus f (x) can be written in terms of a(x) and b(x) in a very simple way.
There is a simple refinement of this result I shall use below. If deg f (x) <
deg a(x) + deg b(x) then using the remainder theorem, we can in fact write
f (x) = B(x)a(x) + A(x)b(x)
where deg B(x) < deg b(x) and deg A(x) < deg a(x).
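The claim that Euclid's algorithm carries over to polynomials can be sketched as follows (coefficients are stored lowest degree first, `Fraction` keeps the arithmetic exact, and all function names are our own):

```python
from fractions import Fraction

def trim(p):
    """Remove trailing zero coefficients (but keep at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def pdivmod(a, b):
    """Polynomial quotient and remainder; coefficients listed lowest degree first."""
    a, b = trim(list(a)), trim(list(b))
    q = [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(a) >= len(b) and any(a):
        c = a[-1] / b[-1]            # coefficient of the next quotient term
        k = len(a) - len(b)
        q[k] = c
        for i, bc in enumerate(b):   # subtract c * x^k * b(x)
            a[k + i] -= c * bc
        trim(a)
    return trim(q), a

def pgcd(a, b):
    """Greatest common divisor of two polynomials via Euclid's algorithm."""
    a, b = trim(list(a)), trim(list(b))
    while any(b):
        a, b = b, pdivmod(a, b)[1]
    return a                         # determined up to a constant multiple

# gcd(x² − 1, x² + 2x + 1) should be a constant multiple of x + 1.
g = pgcd([Fraction(-1), Fraction(0), Fraction(1)],
         [Fraction(1), Fraction(2), Fraction(1)])
assert [c / g[-1] for c in g] == [1, 1]   # monic form: x + 1
```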
Every natural number can be written as a product of primes, where a
prime is a number which cannot be factorised in a non-trivial way. The
analogue of a prime number for real polynomials is the notion of an irreducible
polynomial. The real polynomial f (x) is said to be irreducible if it cannot
be factorised into real polynomials each having smaller degree than f (x).
Unlike the case of prime numbers, we can characterise the real irreducible
polynomials very easily. It is a consequence of the fundamental theorem for
real polynomials that there are only two kinds of irreducible real polynomials:
linear polynomials c(x − a) and irreducible quadratic polynomials c(x² + ax + b), that is, quadratics having only non-real roots.
We now have the following analogue of the fundamental theorem of arithmetic for real polynomials: every real polynomial of degree at least 1 can be written as a product of a real number and powers of distinct monic linear polynomials or distinct monic irreducible quadratic polynomials, in essentially one way.
2.7 *Rational functions*
This section will not be examined in 2013.
A (real) rational function is simply a quotient f(x)/g(x) where f(x) and g(x) are any polynomials with real coefficients, the polynomial g(x) of course not being equal to the zero polynomial. If deg f(x) < deg g(x), I shall say that the rational function is proper. The set of all rational functions R(x) — notice
I use round brackets unlike the square brackets for the set of real polynomials
— can be added, subtracted, multiplied and divided. In fact, they satisfy all
the laws of algebra that the real numbers do, (F1)–(F9), and so form a field
like the rationals, reals and complexes. Rational functions are enormously
useful in mathematics. The goal of this section is to show that every rational
function can be written as a sum of simpler rational functions. Once I have
shown how to do this, I will outline its application to integration.
2.7.1 Numerical partial fractions
This section is intended as motivation for the partial fraction representation of rational functions described below. I learnt about this idea from Littlewood's book 'A university algebra', published in 1958. It has since been forgotten and rediscovered: see the article by McDowell in The College Mathematics Journal 33 No. 5 (2002), 400–403.
The goal of this section is to show how a proper fraction can be written as a sum of proper fractions over prime power denominators. This involves two steps which I shall describe by means of examples. The theory is an application of the fundamental theorem of arithmetic and the extended Euclidean algorithm.
In order to add two fractions together, we first have to ensure that both are expressed over the same denominator. For example, suppose we want to add 5/7 and 8/13. Since 7 × 13 = 91 we have the following:

5/7 + 8/13 = (65 + 56)/91 = 121/91.

We shall now consider the reverse process, using the fraction 810/1003 as an example. Observe that 1003 = 17 × 59 where 17 and 59 are coprime. Our goal is to write

810/1003 = a/17 + b/59

for some natural numbers a and b. By the extended Euclidean algorithm, we can write

1 = 7 · 17 − 2 · 59.

It follows that

1/1003 = (7 · 17 − 2 · 59)/(17 · 59) = 7/59 − 2/17.

Now multiply both sides by 810 to get

810/1003 = (7 · 810)/59 − (2 · 810)/17 = (96 + 6/59) − (95 + 5/17) = 1 + 6/59 − 5/17.

Simplifying we get

810/1003 = 6/59 + 12/17

as required.
We shall now do something different. Consider the fraction 10/16. We have that 16 = 2⁴ and so we cannot write it as a product of coprime numbers. However, we can do something else. We can write 10 = 2 + 8 = 2¹ + 2³. Thus

10/16 = (2¹ + 2³)/2⁴ = 2¹/2⁴ + 2³/2⁴ = 1/2³ + 1/2.

Thus

10/16 = 1/2 + 1/2³.
Let's now combine these two steps. Consider the fraction 41/90. The prime factorisation of 90 is 2 · 3² · 5. Our first goal is to write

41/90 = a/2 + b/3² + c/5.

Thus we have to find a, b, c such that

41 = 45a + 10b + 18c.

By trial and error, remembering that a, b, c have to be integers, we find that

41 = 45 · 1 + 10 · 5 + (−3) · 18.

It follows that

41/90 = 1/2 + 5/3² − 3/5.

We now want to write

5/3² = d/3 + e/3²

where |d|, |e| < 3. But 5 = 2 + 3 and so

5/3² = 1/3 + 2/3².

It follows that

41/90 = 1/2 + 1/3 + 2/9 − 3/5.
We may summarise what we have found in the following theorem.
Theorem 2.7.1.

(i) Let a/b be a proper fraction, and let b = p₁^{n₁} ⋯ p_r^{n_r} be the prime factorisation of b. Then

a/b = Σᵢ₌₁ʳ cᵢ/pᵢ^{nᵢ}

for some integers cᵢ, where each of the fractions is proper.

(ii) Now let p be a prime and c/pⁿ a proper fraction. Then

c/pⁿ = Σⱼ₌₁ⁿ dⱼ/pʲ

where each dⱼ is such that |dⱼ| < p.
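The 810/1003 calculation above can be automated with the extended Euclidean algorithm. A sketch (`bezout` is our own naming):

```python
from fractions import Fraction

def bezout(m, n):
    """Extended Euclid: return (x, y) with x*m + y*n == gcd(m, n)."""
    if n == 0:
        return 1, 0
    x, y = bezout(n, m % n)
    return y, x - (m // n) * y

# 1003 = 17 * 59 with 17 and 59 coprime, as in the worked example.
x, y = bezout(17, 59)
assert (x, y) == (7, -2) and x * 17 + y * 59 == 1

# Since 1/(17*59) = x/59 + y/17, multiplying through by 810 gives the split.
assert Fraction(810, 1003) == 810 * Fraction(x, 59) + 810 * Fraction(y, 17)
```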
2.7.2 Partial fractions
Let f(x)/g(x) be a rational function. If deg f(x) ≥ deg g(x) then we may apply the Remainder Theorem and write

f(x)/g(x) = q(x) + r(x)/g(x)

where deg r(x) < deg g(x). Thus without loss of generality, we may assume that deg f(x) < deg g(x) in what follows. I shall also assume that g(x) is monic; if it isn't there will simply be a constant factor at the front.
By the fundamental theorem for real polynomials, we may write g(x) as a product of distinct factors of the form (x − a)^r or (x² + ax + b)^s. Using this decomposition of g(x), the rational function f(x)/g(x) can now be written as a sum of simpler rational functions which have the following forms:

• For each factor of g(x) of the form (x − a)^r, we will have a sum of the form

A₁/(x − a) + ⋯ + A_{r−1}/(x − a)^{r−1} + A_r/(x − a)^r.

• For each factor of g(x) of the form (x² + ax + b)^s, we will have a sum of the form

(A₁x + B₁)/(x² + ax + b) + ⋯ + (A_{s−1}x + B_{s−1})/(x² + ax + b)^{s−1} + (A_sx + B_s)/(x² + ax + b)^s.

This is called the partial fraction decomposition of f(x)/g(x). The practical method for finding such decompositions is best illustrated by means of some examples.
Examples 2.7.2.

(i) Write 5/(x² + x − 6) in partial fractions. We have that x² + x − 6 = (x + 3)(x − 2), a product of two distinct linear factors. We expect a solution of the form

5/(x² + x − 6) = A/(x + 3) + B/(x − 2)

where A and B are real numbers to be determined. The RHS is just

[A(x − 2) + B(x + 3)] / [(x + 3)(x − 2)].

Comparing the LHS with the RHS we get that

5 = A(x − 2) + B(x + 3)

which must hold for all values of x. Putting x = 2 we get B = 1 and putting x = −3 we get A = −1. Thus

5/(x² + x − 6) = −1/(x + 3) + 1/(x − 2).
At this point, we check our solution.
(ii) Write 9/((x − 1)(x + 2)²) in partial fractions. Here we have a single linear factor and a square of a linear factor. We therefore expect an answer in the form

9/((x − 1)(x + 2)²) = A/(x − 1) + B/(x + 2) + C/(x + 2)².

Carrying out the sum on the RHS, and comparing the LHS with the RHS, we get that

9 = A(x + 2)² + B(x − 1)(x + 2) + C(x − 1).

Putting x = 1 we get that A = 1, putting x = −2 we get that C = −3, and putting x = −1 and using the values we have for A and C we get that B = −1. Thus

9/((x − 1)(x + 2)²) = 1/(x − 1) − 1/(x + 2) − 3/(x + 2)².
(iii) Write 16x/(x⁴ − 16) in partial fractions. We have that x⁴ − 16 = (x − 2)(x + 2)(x² + 4), a product of two distinct linear factors and an irreducible quadratic factor. We expect a solution of the form

16x/(x⁴ − 16) = A/(x − 2) + B/(x + 2) + (Cx + D)/(x² + 4).

This leads to

16x = A(x + 2)(x² + 4) + B(x − 2)(x² + 4) + (Cx + D)(x − 2)(x + 2).

Using appropriate values of x we get that A = 1, B = 1, C = −2 and D = 0. Thus

16x/(x⁴ − 16) = 1/(x − 2) + 1/(x + 2) − 2x/(x² + 4).
(iv) Write (3x² + 2x + 1)/((x + 2)(x² + x + 1)²) in partial fractions. We expect a solution in the form

(3x² + 2x + 1)/((x + 2)(x² + x + 1)²) = A/(x + 2) + (Bx + C)/(x² + x + 1) + (Dx + E)/(x² + x + 1)².

This leads to

3x² + 2x + 1 = A(x² + x + 1)² + (Bx + C)(x + 2)(x² + x + 1) + (Dx + E)(x + 2).

Putting x = −2 yields 9 = 9A, so A = 1. There are four unknowns left and so we need four equations. However, to avoid having to solve four equations in four unknowns we can vary our procedure. Putting x = 0 gives 1 = A + 2C + 2E, so C + E = 0. On the LHS the highest power of x occurring is x². On the RHS the highest power of x appears to be x⁴, so its coefficient must be zero; the coefficient of x⁴ is A + B, and so B = −1. Putting x = 1 gives 6 = 9A + 9(B + C) + 3(D + E), which, using C + E = 0 and the values of A and B, reduces to 2C + D = 2. Putting x = 2 gives 17 = 49A + 28(2B + C) + 4(2D + E), which reduces to 3C + D = 3. Hence C = 1, D = 0 and E = −1. Thus

(3x² + 2x + 1)/((x + 2)(x² + x + 1)²) = 1/(x + 2) + (1 − x)/(x² + x + 1) − 1/(x² + x + 1)².
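Decompositions like these can be verified exactly with rational arithmetic. A sketch checking the decomposition 16x/(x⁴ − 16) = 1/(x − 2) + 1/(x + 2) − 2x/(x² + 4), with the signs as verified by direct substitution:

```python
from fractions import Fraction

def lhs(x):
    return Fraction(16) * x / (x**4 - 16)

def rhs(x):
    # Partial fraction decomposition of 16x/(x^4 - 16).
    return 1 / (x - 2) + 1 / (x + 2) - 2 * x / (x**2 + 4)

# Exact agreement at several sample points (avoiding the poles x = 2, -2).
for t in [Fraction(1), Fraction(3), Fraction(-1, 2), Fraction(7, 3)]:
    assert lhs(t) == rhs(t)
```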
Let me conclude this section by sketching out why the partial fraction
decomposition of real rational functions is possible.
Consider the proper rational function f(x)/(a(x)b(x)) where a(x) and b(x) are coprime. Then we indicated above that we may write

f(x)/(a(x)b(x)) = A(x)/a(x) + B(x)/b(x)

where the rational functions are all proper. This may be generalised as follows. Let f(x)/g(x) be a proper rational function. Let g(x) = a₁(x) ⋯ a_m(x) be a product of pairwise coprime polynomials. Then we may write

f(x)/g(x) = Σᵢ₌₁ᵐ Aᵢ(x)/aᵢ(x),

where the rational functions are all proper.
We shall now assume that the aᵢ(x) are either powers of linear factors or powers of irreducible quadratic factors, and that these factors are distinct for different i.

Consider the proper rational function h(x)/(x − a)^r where r ≥ 1. Then we may write

h(x) = a₀ + a₁(x − a) + ⋯ + a_{r−1}(x − a)^{r−1}

for some real numbers a₀, …, a_{r−1}, in a way analogous to writing a natural number in a number base. Thus

h(x)/(x − a)^r = a_{r−1}/(x − a) + ⋯ + a₁/(x − a)^{r−1} + a₀/(x − a)^r.

Consider the proper rational function h(x)/(x² + ax + b)^r where r ≥ 1. Then we may similarly write

h(x) = (a₀x + b₀) + (a₁x + b₁)(x² + ax + b) + ⋯ + (a_{r−1}x + b_{r−1})(x² + ax + b)^{r−1}

for some real numbers a₀, …, a_{r−1} and b₀, …, b_{r−1}, in a way analogous to writing a natural number in a number base. Thus

h(x)/(x² + ax + b)^r = (a_{r−1}x + b_{r−1})/(x² + ax + b) + ⋯ + (a₁x + b₁)/(x² + ax + b)^{r−1} + (a₀x + b₀)/(x² + ax + b)^r.

The existence of partial fraction decompositions of real rational functions now follows.
2.7.3 Integrating rational functions

In order to appreciate the significance of partial fractions it is essential to understand how they are used. The goal of this section is therefore to show you how to calculate

∫ f(x)/g(x) dx

exactly, when f(x) and g(x) are real polynomials.
We need to know one key property of integration: namely, if the aᵢ are real numbers then

∫ Σᵢ₌₁ⁿ aᵢ fᵢ(x) dx = Σᵢ₌₁ⁿ aᵢ ∫ fᵢ(x) dx.
This property is known as linearity. I shall break my discussion up into a
number of steps.
Step 1. Suppose that in f(x)/g(x) we have that deg f(x) ≥ deg g(x). By the Remainder Theorem for polynomials we can write

f(x)/g(x) = q(x) + r(x)/g(x)

where deg r(x) < deg g(x). By the linearity of integration, we have that

∫ f(x)/g(x) dx = ∫ q(x) dx + ∫ r(x)/g(x) dx.

In other words, to integrate an arbitrary rational function it is enough to know how to integrate polynomials and proper rational functions.
Step 2. By linearity of integration, integrating arbitrary polynomials can be reduced to integrating

∫ xⁿ dx

where n ≥ 0.
Step 3. Let f(x)/g(x) be a proper rational function, so that deg f(x) < deg g(x). We may factorise g(x) into a product of real linear polynomials and real irreducible quadratic polynomials and then write f(x)/g(x) as a sum of rational functions of one of the following two forms:

a/(x − d)^r,

where a and d are real and r ≥ 1, and

(px + q)/(x² + bx + c)^s,

where p, q, b, c are real, s ≥ 1, and the quadratic has a pair of complex conjugate roots. By the linearity of integration, this reduces calculating

∫ f(x)/g(x) dx

to calculating integrals of the following two forms:

∫ a/(x − d)^r dx  and  ∫ (px + q)/(x² + bx + c)^s dx.

Again by linearity of integration, this reduces to being able to calculate the following three integrals:

∫ 1/(x − d)^r dx,  ∫ x/(x² + bx + c)^s dx,  ∫ 1/(x² + bx + c)^s dx.
Step 4. We now concentrate on the two integrals involving quadratics. By completing the square, we can write

x² + bx + c = (x + b/2)² + (c − b²/4).

By assumption b² − 4c < 0 (why?). Put e = √((4c − b²)/4), which makes sense. Thus

x² + bx + c = (x + b/2)² + e².

I shall now use a technique of calculus known as substitution and put y = x + b/2. Doing this, and returning to x as my variable, we need to be able to calculate the following three integrals:

∫ 1/(x − d)^r dx,  ∫ x/(x² + e²)^s dx,  ∫ 1/(x² + e²)^s dx.
Step 5. The second integral above can be converted into the first by means
of the substitution x2 = u.
We have therefore proved the following.
Theorem 2.7.3. The integration of an arbitrary rational function can be reduced to integrals of the following three kinds:

1. ∫ xⁿ dx.

2. ∫ 1/(x − d)^r dx.

3. ∫ 1/(x² + e²)^s dx.
You will learn how to calculate these integrals in your calculus module;
it turns out that (1) and (2) are easy but (3) is trickier.
2.8 Learning outcomes for Chapter 2
• Solve quadratics by completing the square.
• Carry out the arithmetic of complex numbers.
• Find square roots of complex numbers.
• Prove trigonometric identities using complex numbers.
• Calculate nth roots.
• Factorise real polynomials into products of real linear and irreducible quadratic factors.
2.9 Further reading and exercises
There is a very nice introduction to complex numbers in Chapter 10 of Olive’s
book. Be warned that she uses j rather than i, a convention favoured by
engineers. Chapters 1 and 3 of Hirst and Singerman contain alternative
viewpoints and further exercises on the material of this chapter.
Chapter 3
Matrices
The term matrix was introduced by James Joseph Sylvester in 1850, and the
first paper on matrix algebra was published by Arthur Cayley in 1858. Matrices were introduced initially as packaging for systems of linear equations,
but then came to be investigated in their own right. The main goal of this
chapter is to introduce the basics of the arithmetic and algebra of matrices,
and we shall only be able to hint at some of their myriad applications. This
chapter and the next form the first steps in the subject known as linear algebra. It is hard to overemphasize the importance of this subject throughout mathematics and its applications.
3.1 Matrix arithmetic
In this section, I shall introduce matrices and three arithmetic operations
defined on them. I shall also define an operation called the ‘transpose of a
matrix’ that will be important in later work. This section forms the foundation for all that follows.
3.1.1 Basic matrix definitions
A matrix¹ is a rectangular array of numbers. In this course, the numbers will
usually be real numbers but, on occasion, I shall also use complex numbers
for variety.
¹Plural: matrices.
Example 3.1.1. The following are all matrices:

( 1 2 3 )      ( 4 )      ( 1 1 −1 )
( 4 5 6 ),     ( 1 ),     ( 0 2  4 ),     ( 6 ).
                          ( 1 1  3 )
Warning! Usually the array of numbers that comprises a matrix is enclosed
in round brackets. Occasionally books use square brackets with the same
meaning. Later on, I shall introduce determinants and these are indicated
by using straight brackets. In general, the kind of brackets you use is important and is not just a matter of taste.
We usually denote matrices by capital Roman letters: A, B, C, etc. The
size of a matrix is m × n if it has m rows and n columns. The entries in a
matrix are often called the elements of the matrix and are usually denoted
by lower case Roman letters. If A is an m × n matrix, and 1 ≤ i ≤ m and
1 ≤ j ≤ n, then the entry in the ith row and jth column of A is often denoted
(A)ij . Thus ()ij means ‘the element in ith row and jth column’.
Examples 3.1.2.
(i) Let
        ( 1 2 3 )
    A = ( 4 5 6 )
Then A is a 2 × 3 matrix. We have that (A)11 = 1, (A)12 = 2, (A)13 = 3,
(A)21 = 4, (A)22 = 5, (A)23 = 6.
(ii) Let
        ( 4 )
    B = ( 1 )
Then B is a 2 × 1 matrix. We have that (B)11 = 4, (B)21 = 1.
(iii) Let
        ( 1 1 −1 )
    C = ( 0 2  4 )
        ( 1 1  3 )
Then C is a 3 × 3 matrix. (C)11 = 1, (C)12 = 1, (C)13 = −1, (C)21 = 0,
(C)22 = 2, (C)23 = 4, (C)31 = 1, (C)32 = 1, (C)33 = 3.
(iv) Let
    D = ( 6 )
Then D is a 1 × 1 matrix. We have that (D)11 = 6.
Matrices A and B are said to be equal, written A = B, if they have the
same size and corresponding entries are equal: that is, (A)ij = (B)ij for all
allowable i and j.
Example 3.1.3. Given that
    ( a 2 b )   ( 3 x −2 )
    ( 4 5 c ) = ( y z  0 )
find a, b, c, x, y, z. This example simply illustrates what it means for two
matrices to be equal. By definition a = 3, 2 = x, b = −2, 4 = y, 5 = z and
c = 0.
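The indexing and equality conventions above are easy to try out in a short Python sketch (not part of the notes; the helper names `entry` and `equal` are made up here), with a matrix stored as a list of its rows:

```python
# A matrix as a list of rows; (A)ij becomes A[i-1][j-1] because
# Python indexes from 0 while the notes index from 1.
A = [[1, 2, 3],
     [4, 5, 6]]            # a 2 x 3 matrix

def entry(M, i, j):
    """Return (M)ij using the 1-based convention of the notes."""
    return M[i - 1][j - 1]

def equal(M, N):
    """Equal iff same size and corresponding entries agree."""
    return M == N          # list equality checks both at once

assert entry(A, 2, 1) == 4 and equal(A, [[1, 2, 3], [4, 5, 6]])
```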
Remark. A typical 2 × 3 matrix, for example, would be written
        ( a11 a12 a13 )
    A = ( a21 a22 a23 )
3.1.2 Addition, subtraction, scalar multiplication and the transpose
Addition Let A and B be two matrices of the same size. Then their sum
A + B is the matrix defined by
(A + B)ij = (A)ij + (B)ij .
That is, corresponding entries of A and B are added. If A and B are not the
same size then their sum is not defined.
Subtraction Let A and B be two matrices of the same size. Then their
difference A − B is the matrix defined by
(A − B)ij = (A)ij − (B)ij .
That is, corresponding entries of A and B are subtracted. If A and B are
not the same size then their difference is not defined.
Scalar multiplication In matrix theory numbers are often called scalars.
Let A be any matrix and λ any scalar. Then the matrix λA is defined as
follows:
(λA)ij = λ(A)ij .
In other words, every element of A is multiplied by λ.
Examples 3.1.4.
(i)
    ( 1  2 −1 )   (  2 1 3 )   ( 1 + 2      2 + 1  −1 + 3 )
    ( 3 −4  6 ) + ( −5 2 1 ) = ( 3 + (−5)  −4 + 2   6 + 1 )
which gives
    (  3  3 2 )
    ( −2 −2 7 )
(ii)
    ( 1  2 −1 )   (  2 1 3 )   ( 1 − 2      2 − 1  −1 − 3 )
    ( 3 −4  6 ) − ( −5 2 1 ) = ( 3 − (−5)  −4 − 2   6 − 1 )
which gives
    ( −1  1 −4 )
    (  8 −6  5 )
(iii)
    ( 1 1 )   (  3  3 2 )
    ( 2 1 ) − ( −2 −2 7 )
is not defined since the matrices have different sizes.
(iv)
      (  3  3 2 )   (  6  6  4 )
    2 ( −2 −2 7 ) = ( −4 −4 14 )
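The three operations just defined are all entrywise, so they become one-liners in a Python sketch (not from the notes; `add`, `sub` and `scale` are ad hoc names) that reproduces Examples 3.1.4:

```python
def add(A, B):
    # (A + B)ij = (A)ij + (B)ij; only defined when sizes match
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes differ"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    # (A - B)ij = (A)ij - (B)ij
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes differ"
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(lam, A):
    # (lam A)ij = lam (A)ij
    return [[lam * x for x in row] for row in A]

A = [[1, 2, -1], [3, -4, 6]]
B = [[2, 1, 3], [-5, 2, 1]]
assert add(A, B) == [[3, 3, 2], [-2, -2, 7]]
```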
Transpose of a matrix Let A be an m × n matrix. Then the transpose of
A, denoted A^T, is the n × m matrix defined by (A^T)ij = (A)ji. We therefore
interchange rows and columns: the first row of A becomes the first column
of A^T, the second row of A becomes the second column of A^T, and so on.
Examples 3.1.5. The transposes of the following matrices
    ( 1 2 3 )      ( 4 )      ( 1 1 −1 )
    ( 4 5 6 ),     ( 1 ),     ( 0 2  4 ),     ( 6 )
                              ( 1 1  3 )
are, respectively,
    ( 1 4 )                   (  1 0 1 )
    ( 2 5 ),     ( 4 1 ),     (  1 2 1 ),     ( 6 ).
    ( 3 6 )                   ( −1 4 3 )
Example 3.1.6. If
        ( 1 −1 2 )           ( 0 −2  3 )
    A = ( 3  0 1 )   and B = ( 2  1 −1 )
we may calculate 3A + 2B using the above definitions to get
    (  3 −7 12 )
    ( 13  2  1 )
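A transpose sketch in the same spirit (again not part of the notes) makes the row/column interchange concrete:

```python
def transpose(A):
    # Rows become columns: (A^T)ij = (A)ji.
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [4, 5, 6]]
assert transpose(A) == [[1, 4], [2, 5], [3, 6]]
```

Note that (A^T)^T = A falls out immediately: transposing twice restores the original list of rows.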
3.1.3 Matrix multiplication
This is more complicated than the other operations and, like them, is not
always defined. To define this operation it is useful to work with two special
classes of matrix. A row matrix or row vector is a matrix with one row (but
any number of columns). A column matrix or column vector is a matrix with
one column but any number of rows. Row and column matrices are often
denoted by bold lower case Roman letters a, b, c . . .. The ith element of the
row or column matrix a will be denoted by ai .
Examples 3.1.7. The matrix
    ( 1 2 3 4 )
is a row matrix whilst
    ( 1 )
    ( 2 )
    ( 3 )
    ( 4 )
is a column matrix.
I shall build up to the definition of matrix multiplication in three stages.
Stage 1. Let a be a row matrix and b a column matrix, where
    a = ( a1 a2 . . . am )
and
        ( b1 )
        ( b2 )
    b = (  . )
        (  . )
        ( bn )
Then their product ab is defined iff2 the number of columns of a is equal to
the number of rows of b, that is m = n, in which case their product is the
1 × 1 matrix
    ab = ( a1 b1 + a2 b2 + . . . + an bn ).
Remark. The number
    a1 b1 + a2 b2 + . . . + an bn
is called the inner product of a and b and is denoted by a · b. Using this
notation we have that
    ab = ( a · b ).
Example 3.1.8. This odd way of multiplying is actually quite natural.
Here’s an example of where it arises in real life. If you buy y items whose
unit cost is x then you spend xy. This can be generalised as follows when
you buy a number of different kinds of items at different prices. Let a be the
row matrix
    ( 0·6 1 0·2 )
2 Throughout these notes, I use the now standard abbreviation ‘iff’ for the phrase ‘if and only if’.
where 0·6 is the price of a bottle of milk, 1 is the price of a loaf of bread,
and 0·2 is the price of an egg. Let b be the column matrix
    (  2 )
    (  3 )
    ( 10 )
where 2 is the number of bottles of milk bought, 3 is the number of loaves
of bread bought, and 10 is the number of eggs bought. Thus a is the price
row matrix and b is the quantity column matrix. The total amount spent is
therefore
    0·6 × 2 + 1 × 3 + 0·2 × 10:
namely, the sum over all the commodities bought of the price of each commodity times the number of items of that commodity purchased. This number is precisely the inner product a · b: namely, 6·20.
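The shopping computation is a plain inner product, which the following Python sketch (not from the notes; the names are ad hoc) reproduces:

```python
def inner(a, b):
    # a . b = a1*b1 + ... + an*bn; defined only when lengths agree
    assert len(a) == len(b), "inner product needs equal lengths"
    return sum(x * y for x, y in zip(a, b))

prices = [0.6, 1.0, 0.2]      # milk, bread, egg
quantities = [2, 3, 10]
# 0.6*2 + 1*3 + 0.2*10 = 6.2, allowing for floating-point rounding
assert abs(inner(prices, quantities) - 6.2) < 1e-12
```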
Stage 2. Let a be a row matrix as above and let B be a matrix. Thus
a is a 1 × m matrix and B is a p × q matrix. Then their product aB is
defined iff the number of columns of a is equal to the number of rows of B.
Thus m = p. To calculate the product think of B as consisting of q column
matrices b1 , . . . , bq . We calculate the q numbers a·b1 , . . . , a·bq as in stage 1,
and the q numbers that result become the entries of aB. Thus aB is a 1 × q
matrix whose jth entry is the number a · bj .
Example 3.1.9. Let a be the cost matrix of our previous example. Let B be
the 3 × 5 matrix whose columns tell me the quantity of commodities bought
on each of the days of the week Monday to Friday:
        (  2 0  2 0  4 )
    B = (  3 0  4 0  8 )
        ( 10 0 10 0 20 )
Thus on Tuesday and Thursday no purchases were made, whilst on Friday
extra commodities were bought in preparation for the weekend. The matrix
aB is a 1 × 5 matrix which tells us how much was spent on each day of the
week. Thus
                        (  2 0  2 0  4 )
    aB = ( 0·6 1 0·2 )  (  3 0  4 0  8 )  =  ( 6·2 0 7·2 0 14·4 )
                        ( 10 0 10 0 20 )
Stage 3. Let A be an m × n matrix and let B be a p × q matrix. Their
product AB is defined iff the number of columns of A is equal to the number
of rows of B: that is n = p. If this is so then AB is an m × q matrix. To define
this product we think of A as consisting of m row matrices a1, . . . , am and
we think of B as consisting of q column matrices b1, . . . , bq. As in Stage 2
above, we multiply the first row of A into each of the columns of B and this
gives us the first row of AB; we then multiply the second row of A into each
of the columns of B to get the second row of AB, and so on.
Example 3.1.10. Let B be the 3 × 5 matrix of the previous example whose
columns tell me the quantity of commodities bought on each of the days
Monday to Friday
        (  2 0  2 0  4 )
    B = (  3 0  4 0  8 )
        ( 10 0 10 0 20 )
Let A be the 2 × 3 matrix whose first row tells me the cost of the commodities
in shop 1 and whose second row tells me the cost of the commodities in shop 2.
        ( 0·6   1     0·2  )
    A = ( 0·65  1·05  0·30 )
The first row of AB tells me how much was spent on each day of the week
in shop 1, and the second row of AB tells me how much was spent on each
day of the week in shop 2. Thus
         ( 0·6   1     0·2  )  (  2 0  2 0  4 )
    AB = ( 0·65  1·05  0·30 )  (  3 0  4 0  8 )
                               ( 10 0 10 0 20 )
which is equal to
    ( 6·2   0  7·2  0  14·4 )
    ( 7·45  0  8·5  0  17   )
Examples 3.1.11.
(i)
                       (  2 )
                       (  3 )
    ( 1 −1 0 2 1 )     (  1 )   =   ( 0 )
                       ( −1 )
                       (  3 )
(ii) The product
    ( 1 −1 2 ) ( 0 −2  3 )
    ( 3  0 1 ) ( 2  1 −1 )
doesn’t exist because the number of columns of the first matrix is not
equal to the number of rows of the second matrix.
(iii) The product
    ( 1 2 4 ) ( 4  1 4 3 )
    ( 2 6 0 ) ( 0 −1 3 1 )
              ( 2  7 5 2 )
exists because the first matrix is a 2 × 3 and the second is a 3 × 4. Thus
the product will be a 2 × 4 matrix and is
    ( 12 27 30 13 )
    (  8 −4 26 12 )
3.1.4 Summary of matrix multiplication
• Let A be an m × n matrix and B a p × q matrix. The product AB is
defined iff n = p and the result will then be an m × q matrix. In other
words:
(m × n)(n × q) = (m × q).
• (AB)ij is the inner product of the ith row of A and the jth column of
B.
• It follows that the inner product of the ith row of A and each of the
columns of B in turn yields each of the elements of the ith row of AB
in turn.
If ai are row matrices and bj are column matrices then the product of
two matrices can be written as follows
    ( a1 )                      ( a1 · b1  . . .  a1 · bn )
    (  . )                      (    .     . . .     .    )
    (  . ) ( b1 . . . bn )  =   (    .     . . .     .    )
    (  . )                      (    .     . . .     .    )
    ( am )                      ( am · b1  . . .  am · bn )
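The three-stage definition collapses into a few lines of Python (a sketch, not part of the notes): every entry of AB is an inner product of a row of A with a column of B.

```python
def matmul(A, B):
    # AB is defined iff the number of columns of A equals the number
    # of rows of B; (AB)ij is row i of A dotted with column j of B.
    assert len(A[0]) == len(B), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 4], [2, 6, 0]]                       # 2 x 3
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]  # 3 x 4
assert matmul(A, B) == [[12, 27, 30, 13], [8, -4, 26, 12]]  # 2 x 4
```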
3.1.5 Special matrices
Matrices come in all shapes and sizes, but some of these are important enough
to warrant their own terminology. A matrix all of whose elements are zero is
called a zero matrix. The m × n zero matrix is denoted Om,n or just O and
we let the context determine the size of O. A square matrix is one in which
the number of rows is equal to the number of columns. In a square matrix A
the elements (A)11 , (A)22 , . . . , (A)nn are called the diagonal elements. All the
other elements of A are called the off-diagonal elements. A diagonal matrix is
a square matrix in which all off-diagonal elements are zero. A scalar matrix
is a diagonal matrix in which the diagonal elements are all the same. The
n × n identity matrix is the scalar matrix in which all the diagonal elements
are the number one. This is denoted by In or just I where we allow the
context to determine the size of I. Thus scalar matrices are those of the
form λI where λ is any scalar. A matrix is real if all its elements are real
numbers, and complex if all its elements are complex numbers.
Examples 3.1.12.
(i) The matrix
    ( 1 0 0 )
    ( 0 2 0 )
    ( 0 0 3 )
is a 3 × 3 diagonal matrix.
(ii) The matrix
    ( 1 0 0 0 )
    ( 0 1 0 0 )
    ( 0 0 1 0 )
    ( 0 0 0 1 )
is the 4 × 4 identity matrix.
(iii) The matrix
    ( 42  0  0  0  0 )
    (  0 42  0  0  0 )
    (  0  0 42  0  0 )
    (  0  0  0 42  0 )
    (  0  0  0  0 42 )
is a 5 × 5 scalar matrix.
(iv) The matrix
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
is a 6 × 5 zero matrix.
3.1.6 Linear equations
Matrices are extremely useful in helping us to solve systems of linear equations. For the time being, I shall simply show you how matrices provide a
convenient notation for writing down such equations.
A system of m linear equations in n unknowns is a list of equations of the
following form
    a11 x1 + a12 x2 + . . . + a1n xn = b1
    a21 x1 + a22 x2 + . . . + a2n xn = b2
    ...
    am1 x1 + am2 x2 + . . . + amn xn = bm
If we have only a few unknowns then we often use w, x, y, z rather than
x1 , x2 , x3 , x4 . A solution is a set of values of x1 , . . . , xn that satisfy all the
equations. The set of all solutions is called the solution set or general solution.
The equations above can be conveniently represented using matrices. Let A
be the m × n matrix (A)ij = aij ; let b be the m × 1 matrix (b)i = bi , and let
x be the n × 1 matrix (x)j = xj . Then the system of linear equations above
can be written in the form
Ax = b.
The matrix A is called the coefficient matrix. At the moment, we are just
using matrices as packaging for the equations.
Example 3.1.13. Write the following system of equations in matrix form
    2x + 3y = 1
     x +  y = 2
This is just
    ( 2 3 ) ( x )   ( 1 )
    ( 1 1 ) ( y ) = ( 2 )
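At this stage matrices are only packaging, but even the packaging can be checked mechanically: the sketch below (not from the notes) verifies that a candidate solution of this system, here x = 5, y = −3 found by hand, really satisfies Ax = b.

```python
def matmul(A, B):
    # standard matrix product; here A is 2x2 and x is 2x1
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 3], [1, 1]]   # coefficient matrix of 2x + 3y = 1, x + y = 2
b = [[1], [2]]
x = [[5], [-3]]        # candidate solution found by hand
assert matmul(A, x) == b
```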
Exercises 3.1
1. Let
           (  1 2 )           (  1 4 )
       A = (  1 0 )   and B = ( −1 1 )
           ( −1 1 )           (  0 3 )
   Find A + B, A − B and −3B.
2. Let
           (  0 4 2 )           ( 1 −3  5 )
       A = ( −1 1 3 )   and B = ( 2  0 −4 )
           (  2 0 2 )           ( 3  2  0 )
   Find the matrices AB and BA.
3. Let
           ( 3  1 )       (  0 1 )           (  1 0 3 )
       A = ( 0 −1 ),  B = ( −1 1 )   and C = ( −1 1 1 ).
                          (  3 1 )
   Calculate BA, AA and CB. Can any other pairs of these matrices be
   multiplied? Multiply those which can.
4. Calculate
       ( 1 )
       ( 2 )
       ( 3 ) ( 1 2 3 )
       ( 4 )
5. If
           (  2 1 )           (  3 0 )           ( −1 2 3 )
       A = ( −1 0 ),      B = ( −2 1 )   and C = (  4 0 1 ).
           (  2 3 )
   Calculate both (AB)C and A(BC) and check that you get the same answer.
6. Calculate
       ( 2 −1  2 ) ( x )
       ( 1  2 −4 ) ( y )
       ( 3 −1  1 ) ( z )
7. Calculate
       ( 2 + i  1 + 2i ) (   2i    2 + i  )
       (   i    3 + i  ) ( 1 + i   1 + 2i )
   where i is the complex number i.
8. Calculate
       ( a 0 0 ) ( d 0 0 )
       ( 0 b 0 ) ( 0 e 0 )
       ( 0 0 c ) ( 0 0 f )
9. Calculate
   (i)
       ( 1 0 0 ) ( a b c )
       ( 0 1 0 ) ( d e f )
       ( 0 0 1 ) ( g h i )
   (ii)
       ( 0 1 0 ) ( a b c )
       ( 1 0 0 ) ( d e f )
       ( 0 0 1 ) ( g h i )
   (iii)
       ( a b c ) ( 0 1 0 )
       ( d e f ) ( 1 0 0 )
       ( g h i ) ( 0 0 1 )
10. Find the transposes of each of the following matrices
        (  1 2 )       ( 1 −3  5 )       ( 1 )
    A = (  1 0 ),  B = ( 2  0 −4 ),  C = ( 2 )
        ( −1 1 )       ( 3  2  0 )       ( 3 )
                                         ( 4 )
11. The Pauli matrices are the following 4 matrices with complex entries
    and their negatives:
        1, i, j, k
    where
        1 = ( 1 0 )    i = ( i  0 )    j = (  0 1 )    k = ( 0 i )
            ( 0 1 )        ( 0 −i )        ( −1 0 )        ( i 0 )
    Show that the product of any two Pauli matrices is again either a Pauli
    matrix or minus a Pauli matrix by completing the following Cayley
    table for multiplication (entry in row X and column Y is XY).
          1   i   j   k
      1
      i
      j
      k
3.2 Matrix algebra
In this section, we shall look at algebra where the variables are matrices.
This algebra is similar to high-school algebra but also differs significantly in
one or two places. For example, if A and B are matrices it is not true in
general that AB = BA even if both products are defined. We will learn in
this section which rules of school algebra apply to matrices and those which
don’t.
3.2.1 Properties of matrix addition
(MA1) (A + B) + C = A + (B + C). This is the associative law for matrix
addition.
(MA2) A + O = A = O + A. The zero matrix O, the same size as A, is the
additive identity for matrices the same size as A.
(MA3) A + (−A) = O = (−A) + A. The matrix −A is the unique additive
inverse of A.
(MA4) A + B = B + A. Matrix addition is commutative.
Thus matrix addition has the same properties as the addition of real numbers, apart from the fact that the sum of two matrices is only defined when
they have the same size. The role of zero is played by the zero matrix O of
the appropriate size.
Example 3.2.1. Calculate
    2A − 3B + 6I
where
        ( 1 2 )           ( 0 1 )
    A = ( 3 4 )   and B = ( 2 1 )
Because we are dealing with matrix addition and scalar multiplication the
rules we apply are the same as those in high-school algebra. We get
    ( 8  1 )
    ( 0 11 )
3.2.2 Properties of matrix multiplication
(MM1) (AB)C = A(BC). This is the associative law for matrix multiplication.
(MM2) Let A be an m × n matrix. Then Im A = A = AIn . The matrices Im
and In are the left and right multiplicative identities, respectively.
(MM3) A(B + C) = AB + AC and (B + C)A = BA + CA. These are the
left and right distributivity laws for matrix multiplication over matrix
addition.
Thus matrix multiplication has the same properties as the multiplication
of real numbers, apart from the fact that the product is not always defined,
and with the following three major differences.
Warning 1: matrix multiplication is not commutative.
Consider the matrices
        ( 1 2 )           (  1 1 )
    A = ( 3 4 )   and B = ( −1 1 )
Then AB ≠ BA. One consequence of the fact that matrix multiplication is
not commutative is that
    (A + B)^2 ≠ A^2 + 2AB + B^2,
in general (see below).
Warning 2: the product of two matrices can be a zero matrix
without either matrix being a zero matrix.
Consider the matrices
        ( 1 2 )           ( −2 −6 )
    A = ( 2 4 )   and B = (  1  3 )
Then AB = O.
Warning 3: cancellation of matrices is not allowed.
Consider the matrices
        ( 0 2 )        ( 2 3 )           ( −1 1 )
    A = ( 0 1 ),   B = ( 1 4 )   and C = (  1 4 )
Then A ≠ O and AB = AC but B ≠ C.
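The three warnings can be confirmed numerically; in this sketch (not part of the notes, using the matrices printed above) each assertion exhibits one failure of a familiar rule:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Warning 1: matrix multiplication is not commutative.
assert matmul([[1, 2], [3, 4]], [[1, 1], [-1, 1]]) != \
       matmul([[1, 1], [-1, 1]], [[1, 2], [3, 4]])

# Warning 2: AB can be the zero matrix with A, B both non-zero.
assert matmul([[1, 2], [2, 4]], [[-2, -6], [1, 3]]) == [[0, 0], [0, 0]]

# Warning 3: AB = AC does not force B = C.
A = [[0, 2], [0, 1]]
B = [[2, 3], [1, 4]]
C = [[-1, 1], [1, 4]]
assert matmul(A, B) == matmul(A, C) and B != C
```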
Example 3.2.2. Calculate
    ( 1 2 ) ( 2 1 ) ( 1 2 ) ( −1 −2 )
    ( 0 1 ) ( 1 0 ) ( 3 4 ) (  2  1 )
However you bracket the matrices to carry out the calculation you should
always get
    ( 17 −2 )
    (  3  0 )
Example 3.2.3. Find the 3 × 2 matrix X that satisfies
        (  1 −4 )          ( 2  0 )   ( 0 0 )
      2 ( −2  3 ) − 4X + 3 ( 4 −2 ) = ( 0 0 )
        (  4  0 )          ( 0  8 )   ( 0 0 )
This can be solved much in the way it would be solved in high-school algebra
to yield
    ( 2 −2 )
    ( 2  0 )
    ( 2  6 )
Example 3.2.4. If
        ( x y )           ( 4 1 )
    A = ( 3 1 )   and B = ( 3 0 )
commute find x and y. Multiplying out the matrices and comparing entries
yields x = 5 and y = 1.
3.2.3 Properties of scalar multiplication
(S1) 1A = A.
(S2) λ(A + B) = λA + λB
(S3) (λµ)A = λ(µA).
(S4) (λ + µ)A = λA + µA.
(S5) (λA)B = A(λB) = λ(AB).
Exercise 3.2.5. Calculate (3I)(4I). Note the brackets are there for clarity
only. The answer is simply 12I.
3.2.4 Properties of the transpose
(T1) (A^T)^T = A.
(T2) (A + B)^T = A^T + B^T.
(T3) (αA)^T = αA^T.
(T4) (AB)^T = B^T A^T.
Warning! Notice that the transpose of a product reverses the order of the
matrices.
There are some important consequences of the above properties:
• Because matrix addition is associative we can write sums without brackets.
• Because matrix multiplication is associative we can write matrix products without brackets.
• The left and right distributivity laws can be extended to arbitrary finite
sums.
3.2.5 Polynomials of matrices
Let A be a square matrix. We can therefore form the product AA which we
write as A^2. When it comes to multiplying A by itself three times there are
apparently two possibilities: A(AA) and (AA)A. However, matrix multiplication is associative and so these two products are equal. We write this as
A^3. In general A^(n+1) = A A^n = A^n A. We define A^0 = I, the identity matrix
the same size as A. It can be proved that the usual properties of exponents
hold:
    A^m A^n = A^(m+n)   and   (A^m)^n = A^(mn).
One important consequence is that powers of A commute so that
    A^m A^n = A^n A^m.
We can form powers of matrices, multiply them by scalars and add them
together. We can therefore form sums like
    A^3 + 3A^2 + A + 4I.
In other words, we can substitute A in the polynomial
    x^3 + 3x^2 + x + 4
remembering that 4 = 4x^0 and so has to be replaced by 4I.
Example 3.2.6. Let f(x) = x^2 + x + 2 and let
        ( 1 1 )
    A = ( 1 0 )
We calculate f(A). Remember that x^2 + x + 2 is really x^2 + x + 2x^0. Replace
x by A and so x^0 is replaced by A^0 which is I. We therefore get A^2 + A + 2I
and calculating gives
    ( 5 2 )
    ( 2 3 )
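Substituting a matrix into a polynomial can be sketched in Python too (not from the notes; `poly_at` is an ad hoc helper taking coefficients c0, c1, . . . in increasing degree); note how the constant term becomes a multiple of I:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def poly_at(coeffs, A):
    """Evaluate c0 + c1 x + ... + cn x^n at the square matrix A
    by Horner's scheme, replacing the constant term c0 by c0*I."""
    n = len(A)
    I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    result = [[coeffs[-1] * e for e in row] for row in I]   # cn * I
    for c in reversed(coeffs[:-1]):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c          # add c * I
    return result

A = [[1, 1], [1, 0]]
assert poly_at([2, 1, 1], A) == [[5, 2], [2, 3]]   # x^2 + x + 2 at A
```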
Example 3.2.7. Let f(x) = 2x^2 + 3x + 3 and let
        ( 1 2 )
    A = ( 2 3 )
Calculate f(A). You should get
           ( 16 22 )
    f(A) = ( 22 38 )
Warning! When a square matrix A is substituted into a polynomial, you must
replace the constant term of the polynomial by the constant term times the
identity matrix. The identity matrix you use will have the same size as A.
Example 3.2.8. Factorise
    A^2 + A.
We have that
    A^2 + A = A^2 + AI = A(A + I)
using distributivity. A very common mistake is to think this is A(A + 1)
which is wrong because the sizes of A and 1 will not match.
Recall that a square matrix is a scalar matrix if it is a scalar multiple of
an identity matrix. Scalar matrices commute with all matrices because
A(λI) = λ(AI) = λA = λ(IA) = (λI)A.
Let’s now multiply together two matrices of the form A + λI and A + µI.
We get
    (A + λI)(A + µI) = (A + λI)A + (A + λI)µI = AA + λIA + A(µI) + (λI)(µI)
which is just
    A^2 + (λ + µ)A + λµI.
Warning! In the above calculation (λ+µ) is a scalar and not a 1×1 matrix.
Example 3.2.9. The product of A + 2I and A + 3I is A^2 + 5A + 6I:
    A^2 + 5A + 6I = (A + 2I)(A + 3I).
Example 3.2.10. Is the following argument correct? Suppose X^2 = I. Then
X^2 − I = O. Thus (X − I)(X + I) = O. Hence X = I or X = −I. The
answer is no. It assumes, incorrectly, that if the product of two matrices is a
zero matrix then one of the matrices must itself be a zero matrix. We have
seen that this is false.
In fact, far from having only two square roots, the 2 × 2 identity matrix
has infinitely many, as we now show.
Example 3.2.11. Let
        ( a  b )
    A = ( c −a )
and suppose that a^2 + bc = 1. Check that A^2 = I. Examples of matrices
satisfying these conditions are
    ( √(1 + n^2)      −n      )
    (     n      −√(1 + n^2)  )
where n is any positive integer. Thus the 2 × 2 identity matrix has infinitely
many square roots!
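The claim is quick to verify numerically; this sketch (not part of the notes; `root_of_identity` is an ad hoc name) builds members of the family and squares them:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def root_of_identity(n):
    # A = (a b; c -a) with a^2 + bc = 1; here a = sqrt(1 + n^2),
    # b = -n, c = n, so a^2 + bc = 1 + n^2 - n^2 = 1.
    a = math.sqrt(1 + n * n)
    return [[a, -n], [n, -a]]

for n in range(1, 6):
    A2 = matmul(root_of_identity(n), root_of_identity(n))
    # A^2 should be the identity, up to floating-point rounding
    assert all(abs(A2[i][j] - (1 if i == j else 0)) < 1e-9
               for i in range(2) for j in range(2))
```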
Example 3.2.12. Suppose that AB = A and BA = B. Prove that A^2 = A
and B^2 = B. We have that
    A^2 = AA = (AB)A = A(BA) = AB = A.
The proof of the other case is similar.
Example 3.2.13. Given that BX = A and CB = I find X. We use matrix
algebra. Multiply BX = A on the left by C to get C(BX) = CA. By
associativity this is just (CB)X = CA. This simplifies to IX = CA and so
by properties of the identity matrix this yields X = CA.
Exercises 3.2
1. Calculate
       ( 2  0 )   ( 1 1 )   ( 0 1 )   ( 2 2 )
       ( 7 −1 ) + ( 1 0 ) + ( 1 1 ) + ( 3 3 )
2. Calculate
       ( 1 )                            (  1 )
       ( 2 ) ( 3 2 1 )   and  ( 3 1 5 ) ( −1 )
       ( 3 )                            ( −4 )
3. Calculate
               ( 5 4 ) ( x )
       ( x y ) ( 4 4 ) ( y )
4. If
        ( 1 −1 )
    A = ( 1  2 )
   calculate A^2, A^3 and A^4.
5. Let
        ( 1 1 )           ( 1 )
    A = ( 1 0 )   and x = ( 0 ).
   Calculate Ax, A^2 x, A^3 x, A^4 x and A^5 x. What do you notice?
6. Calculate A^2 where
        (  cos θ  sin θ )
    A = ( − sin θ cos θ )
7. Show that
        ( 1 2 )
    A = ( 3 4 )
   satisfies A^2 − 5A − 2I = O.
8. Let A be the following 3 × 3 matrix
        ( 2 4  4 )
        ( 0 1 −1 )
        ( 0 1  3 )
   Calculate
        A^3 − 6A^2 + 12A − 8I
   where I is the 3 × 3 identity matrix.
9. Let
        ( 3 1 −1 )
    A = ( 2 2 −1 )   and f(x) = x^3 − 5x^2 + 8x − 4. Calculate f(A).
        ( 2 2  0 )
10. If 3X + A = B, find X in terms of A and B.
11. If
        X + Y = ( 1 1 )   and X − Y = ( 2 2 )
                ( 2 2 )               ( 1 1 )
    find X and Y.
12. If AB = BA show that A^2 B = BA^2.
13. Is it true that AABB = ABAB?
14. Show that
        (A + B)^2 − (A − B)^2 = 2(AB + BA).
15. Let A and B be n × n matrices. Is it necessarily true that
        (A − B)(A + B) = A^2 − B^2?
    If so, prove it. If not, find a counterexample.
16. Expand (A + I)^4 carefully.
17. A matrix A is said to be symmetric if A^T = A.
    (i) Show that a symmetric matrix must be square.
    (ii) Show that if A is any matrix then AA^T is defined and symmetric.
    (iii) Let A and B be symmetric matrices of the same size. Prove that
    AB is symmetric if and only if AB = BA.
18. An n × n-matrix A is said to be skew-symmetric if A^T = −A.
    (i) Show that the diagonal entries of a skew-symmetric matrix are all
    zero.
    (ii) If B is any n × n-matrix, show that B + B^T is symmetric and
    that B − B^T is skew-symmetric.
    (iii) Deduce that every square matrix can be expressed as the sum of
    a symmetric matrix and a skew-symmetric matrix.
19. Let A, B and C be square matrices of the same size. Define [A, B] =
AB − BA. Calculate
[[A, B], C] + [[B, C], A] + [[C, A], B].
20. Let A be a 2 × 2 matrix such that AB = BA for all 2 × 2 matrices B.
    Show that
        A = ( λ 0 )
            ( 0 λ )
    for some scalar λ.
21. Let A be a 2 × 2 matrix. The trace of A, denoted tr(A), is the sum of
    the diagonal elements.
    (i) Show that tr(A + B) = tr(A) + tr(B); tr(λA) = λ tr(A); tr(AB) =
    tr(BA).
    (ii) Let A be a known matrix. Show that the equation AX − XA = I
    cannot be solved for X.
3.3 Determinants
In this section, we shall show how to calculate from a square matrix a single
number called the determinant of that matrix. Determinants are important,
and we shall explain their geometric meaning later on.
Determinants are only defined for square matrices. If A is a square matrix
we denote its determinant by det(A) or by replacing the round brackets of
the matrix A with straight brackets.
• The determinant of the 1 × 1 matrix ( a ) is a.
• The determinant of the 2 × 2 matrix
        ( a b )
    A = ( c d )
  denoted
    | a b |
    | c d |
  is the number ad − bc.
• The determinant of the 3 × 3 matrix
    ( a b c )
    ( d e f )
    ( g h i )
  denoted
    | a b c |
    | d e f |
    | g h i |
  is the number
      | e f |     | d f |     | d e |
    a | h i | − b | g i | + c | g h |
We could in fact define the determinant of any square matrix of whatever
size in much the same way. However, we shall limit ourselves to calculating
the determinants of 3 × 3 matrices at most.
Warning! Pay attention to the signs in the definition. You multiply alternately by plus one and minus one
+ − + − ...
Examples 3.3.1.
(i)
    | 2 3 |
    | 4 5 | = 2 × 5 − 3 × 4 = −2.
(ii)
    | 2 1 0 |
    | 1 0 2 |     | 0 2 |     | 1 2 |
    | 0 1 1 | = 2 | 1 1 | − 1 | 0 1 | = −5
(iii)
    | 1 2 1 |
    | 3 1 0 |     | 1 0 |     | 3 0 |   | 3 1 |
    | 2 0 1 | = 1 | 0 1 | − 2 | 2 1 | + | 2 0 | = −7
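The 2 × 2 and 3 × 3 formulas translate directly into a Python sketch (not from the notes) whose assertions reproduce Examples 3.3.1:

```python
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    # expand along the first row with the alternating signs + - +
    return (a * det2([[e, f], [h, i]])
            - b * det2([[d, f], [g, i]])
            + c * det2([[d, e], [g, h]]))

assert det2([[2, 3], [4, 5]]) == -2
assert det3([[2, 1, 0], [1, 0, 2], [0, 1, 1]]) == -5
assert det3([[1, 2, 1], [3, 1, 0], [2, 0, 1]]) == -7
```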
We shall briefly touch on some important properties of determinants that
play an important role in the more advanced theory.
Theorem 3.3.2. Let A and B be square matrices having the same size. Then
det(AB) = det(A) det(B).
Proof. The result is true in general, but I shall only prove it for 2×2 matrices.
Let
        ( a b )           ( e f )
    A = ( c d )   and B = ( g h )
We prove directly that det(AB) = det(A) det(B). First
         ( ae + bg  af + bh )
    AB = ( ce + dg  cf + dh )
Thus
    det(AB) = (ae + bg)(cf + dh) − (af + bh)(ce + dg).
The first bracket multiplies out as
    acef + adeh + bcfg + bdgh
and the second as
    acef + adfg + bceh + bdgh.
Subtracting these two expressions we get
    adeh + bcfg − adfg − bceh.
Now we calculate det(A) det(B). This is just
    (ad − bc)(eh − fg)
which multiplies out to give
    adeh + bcfg − adfg − bceh.
Thus the two sides are equal, and we have proved the result.
Theorem 3.3.3. Let A be any square matrix. Then
    det(A^T) = det(A).
Proof. The theorem is true in general, but I shall only prove it for 2 × 2
matrices. Let
        ( a b )
    A = ( c d )
We calculate det(A^T):
    | a c |             | a b |
    | b d | = ad − cb = | c d |
as claimed.
The following theorem is true in general, but I would recommend proving
it only in the 2 × 2 case. To state the results, I will think of an n × n matrix
A as being a list of its columns, where the columns are regarded as column
vectors. Thus
    A = (a1, . . . , an).
Theorem 3.3.4.
1. Let B be obtained from A by interchanging any two columns. Then
det(B) = − det(A). We say that the determinant is an alternating
function of its columns.
2. det(a1, . . . , λai + µbi, . . . , an) is equal to
    λ det(a1, . . . , ai, . . . , an) + µ det(a1, . . . , bi, . . . , an).
We say that the determinant is n-linear.
From the above theorem, we may deduce the following useful properties
of determinants.
1. If two columns of a matrix are equal then the determinant is zero.
2. If a multiple of one column of a matrix is added to another then the
determinant is unchanged.
These properties of determinants can be used to simplify their calculation.
We conclude this section with an application of determinants to solving
certain systems of linear equations. This is useful in theory and bad in
practice.
Theorem 3.3.5 (Cramer’s Rule). Let Ax = b be a system of n equations in
n unknowns xi where the matrix A has a non-zero determinant. Define Bi to
be the matrix obtained from A by replacing the ith column of A by b. Then
    xi = det(Bi) / det(A),
and this is the only solution.
We shall verify this theorem in the case where the coefficient matrix is
2 × 2.
Example 3.3.6. Consider the system of equations
    ax + by = e
    cx + dy = f
where the matrix of coefficients is invertible. By Cramer’s rule we have that
        | e b |              | a e |
        | f d |              | c f |
    x = -------    and   y = -------
        | a b |              | a b |
        | c d |              | c d |
Direct substitution into the lefthand side of the equations shows that these
solutions work. We shall show this for the first equation. This becomes
      | e b |     | a e |
    a | f d | + b | c f |
    ---------------------
          | a b |
          | c d |
The numerator is just
    a(de − bf) + b(af − ce) = ade − abf + abf − bce = e(ad − bc)
which gives the result.
Advice Although of some theoretical value, Cramer’s Rule can be used reasonably well for two or three unknowns but for larger systems it is highly
labour-intensive and not to be recommended.
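For two unknowns the rule is short enough to code; the sketch below (not part of the notes; `cramer2` is an ad hoc helper) solves the running system 2x + 3y = 1, x + y = 2:

```python
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

def cramer2(A, b):
    # Replace each column of A by b in turn and divide by det(A).
    d = det2(A)
    assert d != 0, "Cramer's rule needs a non-zero determinant"
    (a11, a12), (a21, a22) = A
    x = det2([[b[0], a12], [b[1], a22]]) / d
    y = det2([[a11, b[0]], [a21, b[1]]]) / d
    return x, y

assert cramer2([[2, 3], [1, 1]], [1, 2]) == (5.0, -3.0)
```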
Exercises 3.3
1. Compute the following determinants.
   (i)
       | 1 −1 |
       | 2  3 |
   (ii)
       | 3 2 |
       | 6 4 |
   (iii)
       | 1 −1 1 |
       | 2  3 4 |
       | 0  0 1 |
   (iv)
       | 1 2 0 |
       | 0 1 1 |
       | 2 3 1 |
   (v)
       |   2   2   2 |
       |   1   0   5 |
       | 100 200 300 |
   (vi)
       |    1    3    5 |
       |  102  303  504 |
       | 1000 3005 4999 |
   (vii)
       | 1 1 2 |
       | 2 1 1 |
       | 1 2 1 |
   (viii)
       | 15 16 17 |
       | 18 19 20 |
       | 21 22 23 |
2. Solve
       | 1 − x    4    |
       |   2    3 − x  | = 0.
3. Calculate
       | x    cos x    sin x |
       | 1  − sin x    cos x |
       | 0  − cos x  − sin x |
4. Solve the system of linear equations
2x + 4y + z = 1
x+y+z = 0
3x + 3y + z = 2
using Cramer’s rule and check your answer.
5. Consider a triangle with sides of lengths a, b and c. The angles opposite
these sides are respectively α, β and γ.
Show from basic trigonometry that
b cos γ + c cos β = a
c cos α + a cos γ = b
a cos β + b cos α = c
Use Cramer’s rule to show that
    cos α = (b^2 + c^2 − a^2) / (2bc).
6. Show that if x1, x2, x3 are distinct then the determinant
       | 1  x1  x1^2 |
       | 1  x2  x2^2 |
       | 1  x3  x3^2 |
   is (x2 − x1)(x3 − x1)(x3 − x2) and so non-zero. Deduce that given three
   distinct points (x1, y1), (x2, y2) and (x3, y3) there is a unique parabola
   y = a + bx + cx^2 that passes through them.
3.4 Solving systems of linear equations
The goal of this section is to use matrices to help us solve systems of linear
equations. We begin by proving some general results on linear equations,
and then we describe Gaussian elimination, an algorithm for solving systems
of linear equations.
3.4.1 Some theory
A system of m linear equations in n unknowns is a list of equations of the
following form
    a11 x1 + a12 x2 + . . . + a1n xn = b1
    a21 x1 + a22 x2 + . . . + a2n xn = b2
    ...
    am1 x1 + am2 x2 + . . . + amn xn = bm
A solution is a sequence of values of x1 , . . . , xn that satisfy all the equations. The set of all solutions is called the solution set or general solution.
The equations above can be conveniently represented using matrices. Let
A be the m × n matrix (A)ij = aij ; let b be the m × 1 matrix (b)i1 = bi , and
let x be the n × 1 matrix (x)j1 = xj . Then the system of linear equations
above can be written in the form
Ax = b
If b is a zero matrix, we say that the equations are homogeneous, otherwise
they are said to be inhomogeneous.
A system of linear equations that has no solution is said to be inconsistent;
otherwise, it is said to be consistent.
We begin with some results that tell us what to expect when solving
systems of linear equations.
Proposition 3.4.1. Homogeneous equations Ax = 0 are always consistent,
because x = 0 is always a solution. In addition, the sum of any two solutions
is again a solution, and the scalar multiple of any solution is again a solution.
Proof. Let Ax = 0 be our homogeneous system of equations. Let a and b be
solutions. That is Aa = 0 and Ab = 0. We now calculate A(a + b). To do
this we use the fact that matrix multiplication satisfies the left distributivity
law
A(a + b) = Aa + Ab = 0 + 0 = 0.
Now let a be a solution and λ any scalar. Then
A(λa) = λAa = λ0 = 0.
Proposition 3.4.2. Let
Ax = b
be a consistent system of linear equations. Let p be any one solution. Then
every solution of the equation is of the form p + h for some solution h of
Ax = 0.
Proof. Let a be any solution to Ax = b. Put h = a − p. Then Ah = Aa − Ap =
b − b = 0, so h is a solution of Ax = 0 and a = p + h. The result now follows.
Theorem 3.4.3. A system of linear equations Ax = b has either
• no solutions;
• exactly one solution;
• infinitely many solutions.
Proof. We prove that if we can find two different solutions we can in fact
find infinitely many solutions. Let u and v be two distinct solutions to
this equation; then Au = b and Av = b. Consider now the column matrix
w = u − v. Then
Aw = A(u − v) = Au − Av = 0
using the distributive law. Thus w is a non-zero column matrix that satisfies
the equation Ax = 0. Consider now the column matrices of the form
u + λw
where λ is any real number. This is therefore a set of infinitely many different
column matrices. We calculate
A(u + λw) = Au + λAw = b
using the distributive law and properties of scalars. It follows that the
infinitely many column matrices u + λw are solutions to the equation Ax = b.
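The argument of the proof can be illustrated with a tiny numerical sketch (not from the notes; the system x + y = 2 written down twice is chosen here because it visibly has infinitely many solutions):

```python
def matvec(A, x):
    # A times a column vector, written as a flat list
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = [[1, 1], [1, 1]]     # the equation x + y = 2, listed twice
b = [2, 2]
u = [2, 0]               # one particular solution
w = [1, -1]              # a non-zero solution of Ax = 0
assert matvec(A, w) == [0, 0]
# u + lam*w solves Ax = b for every scalar lam, as in the proof
for lam in [0, 1, -3, 2.5]:
    x = [ui + lam * wi for ui, wi in zip(u, w)]
    assert matvec(A, x) == b
```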
3.4.2 Gaussian elimination
In this section, we shall develop a method, in fact an algorithm, that will take
as input a system of linear equations and produce as output the following: if
the system has no solutions it will tell us, on the other hand if it has solutions
then it will determine them all. Our method is based on three simple ideas:
1. Certain systems of linear equations have a shape that makes them very
easy to solve.
2. Certain operations can be carried out on systems of linear equations
which simplify them but do not change the solutions.
3. Everything can be done using matrices.
Here are examples of each of these ideas.
Example 3.4.4. The system of equations
2x + 3y = 1
−y = 3
is very easy to solve. From the second equation we get y = −3. Substituting
this value into the first equation gives us x = 5. We can check that this
solution is correct by checking that these two values satisfy every equation.
3.4. SOLVING SYSTEMS OF LINEAR EQUATIONS
113
Example 3.4.5. The system of equations
2x + 3y = 1
x+y = 2
can be converted into a system with the same solutions but which is easier to
solve. Multiply the second equation by 2. This gives us the new equations
2x + 3y = 1
2x + 2y = 4
which have the same solutions as the original equations. Next, subtract the
first equation from the second equation to get
2x + 3y = 1
−y = 3
These equations also have the same solutions as the original equations, but
they can now be easily solved.
Example 3.4.6. The system of equations
2x + 3y = 1
x+y = 2
can be written in matrix form as the matrix equation
[ 2 3 ] [ x ]   [ 1 ]
[ 1 1 ] [ y ] = [ 2 ]
For the purposes of our algorithm, we rewrite this equation in terms of what
is called an augmented matrix
[ 2 3 | 1 ]
[ 1 1 | 2 ]
The operations carried out in the previous example can be applied directly
to the augmented matrix.
[ 2 3 | 1 ]      [ 2 3 | 1 ]      [ 2  3 | 1 ]
[ 1 1 | 2 ]  =⇒  [ 2 2 | 4 ]  =⇒  [ 0 −1 | 3 ]
This augmented matrix can then be converted back into the usual matrix
form and solved
2x + 3y = 1
−y = 3
We now formalize the above ideas.
A matrix is called a row echelon matrix, or is said to be in row echelon form, if it
satisfies the following three conditions:
1. Any zero rows are at the bottom of the matrix.
2. If there are non-zero rows then they begin with the number 1, called
the leading 1.
3. In the column beneath a leading 1, the elements are all zero.
The following operations on a matrix are called elementary row operations:
1. Multiply row i by a non-zero scalar λ. We notate this operation by
Ri ← λRi .
2. Interchange rows i and j. We notate this operation by Ri ↔ Rj .
3. Add a multiple λ of row i to another row j. We notate this operation
by Rj ← Rj + λRi .
The following result is not hard to prove.
Proposition 3.4.7. Applying the elementary row operations to a system of linear equations does not change its set of solutions.
Given a system of linear equations
Ax = b
the matrix
(A|b)
is called the augmented matrix.
Algorithm 3.4.8. (Gaussian elimination) This is an algorithm for solving
systems of linear equations. In outline, the algorithm runs as follows:
(i) Given a system of equations
Ax = b
form the augmented matrix
(A|b).
(ii) By using elementary row operations, convert
(A|b)
into an augmented matrix
(A′|b′)
which is a row echelon matrix.
(iii) Solve the equations obtained from
(A′|b′)
by back substitution.
Remarks
• The process in step (ii) has to be carried out systematically to avoid
going around in circles.
• Elementary row operations applied to a set of linear equations do not
change the solution set; thus the solution sets of
Ax = b and A′x = b′
are the same.
• Solving systems of linear equations where the associated augmented
matrix is a row echelon matrix is easy and can be accomplished by
back substitution.
Here is a more detailed description of step (ii) of the algorithm — the
input is a matrix B and the output is a matrix B′ which is a row echelon
matrix:
1. Locate the leftmost column that does not consist entirely of zeros.
2. Interchange the top row with another row if necessary to bring a non-zero entry to the top of the column found in step 1.
3. If the entry now at the top of the column found in step 1 is a, then multiply the first row by 1/a in order to introduce a leading 1.
4. Add suitable multiples of the top row to the rows below so that all
entries below the leading 1 become zeros.
5. Now cover up the top row, and begin again with step 1 applied to the
matrix that remains. Continue in this way until the entire matrix is a
row echelon matrix.
The important thing to remember is to start at the top and work downwards.
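The five numbered steps can be sketched in code. The following Python function is my own illustration (not part of the notes); it works on an augmented matrix given as a list of rows and uses exact fractions so that no rounding occurs.

```python
from fractions import Fraction

def row_echelon(M):
    """Reduce an augmented matrix (a list of rows) to row echelon form."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    top = 0
    for col in range(cols):
        # Steps 1 and 2: find a row at or below `top` with a non-zero
        # entry in this column, and swap it to the top.
        pivot = next((r for r in range(top, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[top], M[pivot] = M[pivot], M[top]
        # Step 3: scale the top row to create a leading 1.
        a = M[top][col]
        M[top] = [x / a for x in M[top]]
        # Step 4: subtract multiples of the top row to clear the column below.
        for r in range(top + 1, rows):
            factor = M[r][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[top])]
        # Step 5: cover up the top row and continue with what remains.
        top += 1
        if top == rows:
            break
    return M
```

On the augmented matrix of Example 3.4.5 this produces the rows (1, 3/2, 1/2) and (0, 1, −3), from which back substitution gives y = −3 and x = 5.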
Here is a more detailed description of step (iii) of the algorithm. Let A′x = b′ be a system of equations where the augmented matrix is a row echelon matrix and where there is more than one solution. The variables
are divided into two groups: those variables corresponding to the columns
of A0 containing leading 1’s, called leading variables, and the rest, called free
variables. We solve for the leading variables in terms of the free variables; the
free variables can be assigned arbitrary values independently of each other.
Examples 3.4.9.
1. Show that the following system of equations is inconsistent (i.e. has no
solutions).
x + 2y − 3z = −1
3x − y + 2z = 7
5x + 3y − 4z = 2
The first step is to write down the augmented matrix of the system. In
this case, this is the matrix
[ 1  2 −3 | −1 ]
[ 3 −1  2 |  7 ]
[ 5  3 −4 |  2 ]
Carry out the elementary row operations R2 ← R2 − 3R1 and R3 ←
R3 − 5R1 . This gives us
[ 1  2 −3 | −1 ]
[ 0 −7 11 | 10 ]
[ 0 −7 11 |  7 ]
Now carry out the elementary row operation R3 ← R3 − R2 which
yields
[ 1  2 −3 | −1 ]
[ 0 −7 11 | 10 ]
[ 0  0  0 | −3 ]
The equation corresponding to the last line of the augmented matrix is
0x + 0y + 0z = −3. Clearly, this equation has no solutions and so the
original set of equations has no solutions.
2. Show that the following system of equations has exactly one solution,
and check it.
x + 2y + 3z = 4
2x + 2y + 4z = 0
3x + 4y + 5z = 2
We first write down the augmented matrix
[ 1 2 3 | 4 ]
[ 2 2 4 | 0 ]
[ 3 4 5 | 2 ]
We then carry out the elementary row operations R2 ← R2 − 2R1 and
R3 ← R3 − 3R1 to get
[ 1  2  3 |  4 ]
[ 0 −2 −2 | −8 ]
[ 0 −2 −4 | −10 ]
Then carry out the elementary row operations R2 ← −(1/2)R2 and R3 ← −(1/2)R3 to yield
[ 1 2 3 | 4 ]
[ 0 1 1 | 4 ]
[ 0 1 2 | 5 ]
Finally, carry out the elementary row operation R3 ← R3 − R2 to get
[ 1 2 3 | 4 ]
[ 0 1 1 | 4 ]
[ 0 0 1 | 1 ]
This is now a row echelon matrix. Write down the corresponding set
of equations
x + 2y + 3z = 4
y+z = 4
z = 1
Now solve by back substitution to get x = −5, y = 3 and z = 1.
Finally, we check that
[ 1 2 3 ] [ −5 ]   [ 4 ]
[ 2 2 4 ] [  3 ] = [ 0 ]
[ 3 4 5 ] [  1 ]   [ 2 ]
3. Show that the following system of equations has infinitely many solutions, and check them.
x + 2y − 3z = 6
2x − y + 4z = 2
4x + 3y − 2z = 14
[ 1  2 −3 |  6 ]
[ 2 −1  4 |  2 ]
[ 4  3 −2 | 14 ]
We transform this matrix into an echelon matrix by means of the following elementary row operations: R2 ← R2 − 2R1, R3 ← R3 − 4R1, R2 ← −(1/5)R2, R3 ← −(1/5)R3 and R3 ← R3 − R2. This yields
[ 1 2 −3 | 6 ]
[ 0 1 −2 | 2 ]
[ 0 0  0 | 0 ]
Because the bottom row consists entirely of zeros, we have only two equations
x + 2y − 3z = 6
y − 2z = 2
By back substitution, both x and y can be expressed in terms of z and
z may take any value we like. We say that z is a free variable. Let
z = λ ∈ R. Then the set of solutions can be written in the form
[ x ]   [ 2 ]     [ −1 ]
[ y ] = [ 2 ] + λ [  2 ]
[ z ]   [ 0 ]     [  1 ]
We now check that these solutions work
[ 1  2 −3 ] [ 2 − λ  ]   [  6 ]
[ 2 −1  4 ] [ 2 + 2λ ] = [  2 ]
[ 4  3 −2 ] [   λ    ]   [ 14 ]
as required.
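The check in Example 3 can be mechanized. Here is a small Python sketch of mine that verifies the one-parameter family of solutions for several values of λ:

```python
def mat_vec(A, v):
    """Multiply a matrix (a list of rows) by a column vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, -3],
     [2, -1, 4],
     [4, 3, -2]]
b = [6, 2, 14]

# x = 2 - lam, y = 2 + 2*lam, z = lam should solve Ax = b for every lam.
for lam in [-2, 0, 1, 7]:
    assert mat_vec(A, [2 - lam, 2 + 2 * lam, lam]) == b
```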
Exercises 3.4
1. In each case, determine whether the system of equations is consistent
or not. When consistent, find all solutions and show that they work.
(i)
2x + y − z = 1
3x + 3y − z = 2
2x + 4y + 0z = 2
(ii)
2x + y − z = 1
3x + 3y − z = 2
2x + 4y + 0z = 3
(iii)
2x + y − 2z = 10
3x + 2y + 2z = 1
5x + 4y + 3z = 4
(iv)
x + y + z + w = 0
4x + 5y + 3z + 3w = 1
2x + 3y + z + w = 1
5x + 7y + 3z + 3w = 2

3.5 Blankinship's algorithm
This is an alternative procedure to the extended Euclidean algorithm that
delivers exactly the same information in a much nicer way. It uses matrix
theory and was described by W. A. Blankinship in ‘A new version of the
Euclidean algorithm’ American Mathematical Monthly 70 (1963), 742–745.
To explain how it works, let’s go back to the basic step of Euclid’s algorithm.
If a ≥ b then we divide b into a and write
a = bq + r
where 0 ≤ r < b. The key point is that gcd(a, b) = gcd(b, r). We shall now
think of (a, b) and (b, r) as the column matrices
[ a ]    [ r ]
[ b ],   [ b ].
We want the 2 × 2 matrix that maps
[ a ]      [ r ]
[ b ]  to  [ b ] .
This is the matrix
[ 1 −q ]
[ 0  1 ] .
Thus
[ 1 −q ] [ a ]   [ r ]
[ 0  1 ] [ b ] = [ b ] .
Finally, we can describe the process by the following matrix operation
[ 1 0 | a ]    [ 1 −q | r ]
[ 0 1 | b ]  → [ 0  1 | b ]
by carrying out an elementary row operation. This procedure can be iterated. It will terminate when one of the entries in the righthand column is 0. The non-zero entry will then be the greatest common divisor of a and b, and the matrix on the lefthand side records how 0 and gcd(a, b) were obtained from a and b, and so provides the same information as the extended Euclidean algorithm.
All of this is best illustrated by means of an example.
Let’s calculate x, y such that gcd(2520, 154) = x2520 + y154. We start
with the matrix
[ 1 0 | 2520 ]
[ 0 1 |  154 ]
If we divide 154 into 2520 it goes 16 times plus a remainder. Thus we subtract
16 times the second row from the first to get
[ 1 −16 |  56 ]
[ 0   1 | 154 ]
We now repeat the process but, since the larger number, 154, is on the
bottom, we have to subtract some multiple of the first row from the second.
This time we subtract twice the first row from the second to get
[  1 −16 | 56 ]
[ −2  33 | 42 ]
Now repeat this procedure to get
[  3 −49 | 14 ]
[ −2  33 | 42 ]
And again
[   3 −49 | 14 ]
[ −11 180 |  0 ]
The process now terminates because we have a zero in the rightmost column.
The non-zero entry in the rightmost column is gcd(2520, 154). We also know
that
[   3 −49 ] [ 2520 ]   [ 14 ]
[ −11 180 ] [  154 ] = [  0 ] .
Now this matrix equation corresponds to two equations. The bottom one
can be verified. The top one says that
14 = 3 × 2520 − 49 × 154
which is both true and solves the extended Euclidean problem!
You can use either this method or the one described in Chapter 1.
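Blankinship's procedure is also easy to program. In the Python sketch below (the function name and layout are mine), each row has the form [x, y, v] and maintains the invariant v = xa + yb while the right-hand entries are reduced exactly as in the worked example.

```python
def blankinship(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g = x*a + y*b."""
    # Each row [x, y, v] satisfies v == x*a + y*b throughout.
    r1, r2 = [1, 0, a], [0, 1, b]
    while r1[2] != 0 and r2[2] != 0:
        if r1[2] >= r2[2]:
            q = r1[2] // r2[2]
            r1 = [u - q * v for u, v in zip(r1, r2)]
        else:
            q = r2[2] // r1[2]
            r2 = [u - q * v for u, v in zip(r2, r1)]
    row = r1 if r1[2] != 0 else r2
    return row[2], row[0], row[1]
```

For the worked example, blankinship(2520, 154) returns (14, 3, −49), reproducing 14 = 3 × 2520 − 49 × 154.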
3.6 *Some proofs*
This section will not be examined in 2013.
In this section, I shall prove that the algebraic properties of matrices stated earlier really do hold. I shan't prove all of them: just a representative sample. It is important to observe that all the properties of matrix algebra are ultimately proved using the properties of real numbers.
Let A be an m × n matrix whose entry in the ith row and jth column is
aij . Let B be an n × p matrix whose entry in the jth row and kth column is
bjk . By definition (AB)ik is the number equal to the product of the ith row
of A times the kth column of B. This is just
(AB)ik = Σj=1..n aij bjk .

Theorem 3.6.1.
1. (A + B) + C = A + (B + C).
2. A(BC) = (AB)C.
3. (λ + µ)A = λA + µA.
Proof. (1) To show that (A + B) + C = A + (B + C) we have to prove two
things. First, the size of (A + B) + C is the same as the size of A + (B + C).
Second, elements of (A + B) + C and A + (B + C) in corresponding positions
are equal. To add A and B they have to be the same size and the result
will be the same size as both of them. Thus C is the same size as A and B.
It’s clear that both sides of the equation really are the same size. We now
compare corresponding elements:
((A + B) + C)ij = (A + B)ij + (C)ij = ((A)ij + (B)ij ) + (C)ij .
But now we use the associativity of addition of real numbers to get
((A)ij +(B)ij )+(C)ij = (A)ij +((B)ij +(C)ij ) = (A)ij +(B+C)ij = (A+(B+C))ij ,
as required.
(2) Let A be an m × n matrix with entries aij , let B be an n × p matrix
with entries bjk , and let C be a p × q matrix with entries ckl . It’s evident
that A(BC) and (AB)C have the same size, so it remains to show that
corresponding elements are the same. We shall prove that
(A(BC))il = ((AB)C)il .
By definition
(A(BC))il = Σt=1..n ait (BC)tl ,
and
(BC)tl = Σs=1..p bts csl .
Thus
(A(BC))il = Σt=1..n ait (Σs=1..p bts csl ).
Using distributivity of multiplication over addition for real numbers this sum
is just
Σt=1..n Σs=1..p ait bts csl .
Now change the order in which we add up these real numbers to get
Σs=1..p Σt=1..n ait bts csl .
Now use distributivity again
Σs=1..p (Σt=1..n ait bts ) csl .
The sum within the brackets is just
(AB)is
and so the whole sum is
Σs=1..p (AB)is csl
which is precisely
((AB)C)il .
(3) Clearly (λ + µ)A and λA + µA have the same sizes. We show that
corresponding elements are the same:
((λ + µ)A)ij = (λ + µ)(A)ij = λ(A)ij + µ(A)ij = (λA)ij + (µA)ij
which is just (λA + µA)ij , as required.
Warning! In (4) below, notice how the order of the matrices is reversed.
Theorem 3.6.2.
1. (Aᵀ)ᵀ = A.
2. (A + B)ᵀ = Aᵀ + Bᵀ.
3. (αA)ᵀ = αAᵀ.
4. (AB)ᵀ = BᵀAᵀ.
Proof. (1) We have that
((Aᵀ)ᵀ)ij = (Aᵀ)ji = (A)ij .
(2) We have that
((A + B)ᵀ)ij = (A + B)ji = (A)ji + (B)ji = (Aᵀ)ij + (Bᵀ)ij
which is just
(Aᵀ + Bᵀ)ij .
(3) We have that
((αA)ᵀ)ij = (αA)ji = α(A)ji = α(Aᵀ)ij = (αAᵀ)ij .
(4) Let A be an m × n matrix and B an n × p matrix. Thus AB is defined and is m × p. Hence (AB)ᵀ is p × m. Now Bᵀ is p × n and Aᵀ is n × m. Thus BᵀAᵀ is defined and is p × m. Hence (AB)ᵀ and BᵀAᵀ have the same size. We now show that corresponding elements are equal. By definition
((AB)ᵀ)ij = (AB)ji .
This is equal to
Σs=1..n (A)js (B)si = Σs=1..n (Aᵀ)sj (Bᵀ)is .
But real numbers commute under multiplication and so
Σs=1..n (Aᵀ)sj (Bᵀ)is = Σs=1..n (Bᵀ)is (Aᵀ)sj = (BᵀAᵀ)ij ,
as required.
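Part (4) is easy to spot-check numerically. A quick Python sketch of mine, using one rectangular example:

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """(AB)ik is the sum over j of A[i][j] * B[j][k]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]           # 3 x 2

# (AB) transposed equals the product of the transposes in reversed order.
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))
```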
3.7 *Matrix inverses*
This section will not be examined in 2013.
In this section, all matrices will be square.
I have described how matrices can be added, subtracted and multiplied.
In this section, I shall now describe the circumstances under which division
is possible, and give an application to solving linear equations.
3.7.1 The key idea
The simplest kind of linear equation is ax = b where a and b are scalars. If a ≠ 0 we can solve this by multiplying both sides by a⁻¹ to get a⁻¹(ax) = a⁻¹b; we now use associativity to get (a⁻¹a)x = a⁻¹b; but a⁻¹a = 1 and so 1x = a⁻¹b and this gives x = a⁻¹b.
We try to copy this approach for the matrix equation
Ax = b.
We suppose that there is a matrix B such that BA = I.
• Multiply on the left both sides of our equation Ax = b to get B(Ax) =
Bb. Because order matters when you multiply matrices, which side
you multiply on also matters.
• Use associativity of matrix multiplication to get (BA)x = Bb.
• Now use our assumption that BA = I to get Ix = Bb.
• Finally, we use the properties of the identity matrix to get x = Bb.
We appear to have solved our equation, but we need to check it. We
calculate A(Bb). By associativity this is (AB)b. At this point we also have
to assume that AB = I; this gives Ib = b, as required.
We conclude that in order to copy the method for solving a linear
equation in one unknown, our coefficient matrix A must have the property
that there is a matrix B such that
AB = I = BA.
We shall now investigate this condition in more detail.
3.7.2 Invertible and noninvertible matrices
A matrix A is said to be invertible if we can find a matrix B such that AB = I = BA. The matrix B is called an inverse of A.
Example 3.7.1. A real number r regarded as a 1 × 1 matrix is invertible if
and only if it is non-zero, in which case an inverse is its reciprocal.
It’s clear that if A is a zero matrix, then it can’t be invertible just as in
the case of real numbers. However, the next example shows that even if A is
not a zero matrix, then it need not be invertible.
Example 3.7.2. Let A be the matrix
[ 1 1 ]
[ 0 0 ]
We shall show that there is no matrix B such that AB = I = BA. Let B be
the matrix
[ a b ]
[ c d ]
From BA = I we get
a = 1 and a = 0.
It’s impossible to meet both these conditions at the same time and so B
doesn’t exist.
Example 3.7.3. Let
A = [ 1 2 3 ]
    [ 0 1 4 ]
    [ 0 0 1 ]
and
B = [ 1 −2  5 ]
    [ 0  1 −4 ]
    [ 0  0  1 ]
Check that AB = I = BA. We deduce that A is invertible with inverse B.
As always, in passing from numbers to matrices things become more
complicated. Before going any further, I need to clarify one point which will
at least make our lives a little simpler.
Lemma 3.7.4. Let A be invertible and suppose that B and C are matrices
such that
AB = I = BA and AC = I = CA.
Then B = C.
Proof. Multiply both sides of AB = I on the left by C. Then C(AB) = CI.
Now CI = C, because I is the identity matrix, and C(AB) = (CA)B since
matrix multiplication is associative. But CA = I thus (CA)B = IB = B. It
follows that C = B.
The above result tells us that if a matrix A is invertible then there is only
one matrix B such that AB = I = BA. We call the matrix B the inverse of
A. It is usually denoted by A⁻¹.
Warning! We can only write A⁻¹ if we know that A is invertible.
There are now two main questions. First, how can we tell whether a
matrix is invertible or not? And, second, if a matrix is invertible how do we
find its inverse? In the remainder of this section, I shall answer these two
questions. The key to answering them both is the determinant.
Recall from Theorem 3.3.2 that
det(AB) = det(A) det(B).
I use this property below to get a necessary condition for a matrix to be
invertible.
Lemma 3.7.5. If A is invertible then det(A) ≠ 0.
Proof. By assumption, there is a matrix B such that AB = I. Take determinants of both sides of the equation
AB = I
to get
det(AB) = det(I).
By the key property of determinants recalled above
det(AB) = det(A) det(B)
and so
det(A) det(B) = det(I).
But det(I) = 1 and so
det(A) det(B) = 1.
In particular, det(A) ≠ 0.
If we return to our example above, then we can now see why we could
not find an inverse: its determinant is zero.
Are there any other properties that a matrix must satisfy in order to have
an inverse? The answer is no. I shall prove this for 2 × 2 and 3 × 3 matrices.
3.7.3 The matrix inverse method for solving linear equations
Once we have written the equations in matrix form, we can use matrix inverses to solve them as long as the matrix of coefficients is invertible.
Theorem 3.7.6. A system of linear equations
Ax = b
in which A is invertible has the unique solution
x = A⁻¹b.
Proof. Observe that
A(A⁻¹b) = (AA⁻¹)b = Ib = b.
Thus A⁻¹b is a solution. It is unique because if x′ is any solution then
Ax′ = b
giving
A⁻¹(Ax′) = A⁻¹b
and so
x′ = A⁻¹b.
I shall call this method of solving linear equations the matrix inverse
method.
The 2 × 2 case
We shall begin by dealing with the case where we have two equations and
two unknowns. This contains all the ideas we shall need without any of the
labour. We therefore have a system of equations Ax = b where A is a 2 × 2
matrix.
An important ingredient in finding the inverse of A, if it exists, is a matrix
formed from A called the adjugate matrix of A. If
A = [ a b ]
    [ c d ]
then we define the adjugate matrix, adj(A), to be the following matrix
adj(A) = [  d −b ]
         [ −c  a ]
The defining characteristic of the adjugate is that
A adj(A) = det(A)I = adj(A)A.
We can deduce from the defining characteristic of the adjugate that if det(A) ≠ 0 then
A⁻¹ = (1/det(A)) [  d −b ]
                 [ −c  a ]
In particular, a 2 × 2 matrix is invertible if and only if its determinant is
non-zero.
Example 3.7.7. Let
A = [ 1 2 ]
    [ 3 1 ]
Determine if A is invertible and, if it is, find its inverse, and check the answer. We calculate det(A) = −5. This is non-zero, and so A is invertible. We now form the adjugate of A:
adj(A) = [  1 −2 ]
         [ −3  1 ]
Thus the inverse of A is
A⁻¹ = −(1/5) [  1 −2 ]
             [ −3  1 ]
We now check that AA⁻¹ = I (to make sure that we haven't made any
mistakes).
We can now put all the pieces together, and show how to apply the matrix
inverse method in practice.
Example 3.7.8. Solve the following system of equations using the matrix
inverse method
x + 2y = 1
3x + y = 2
1. Write the equations in matrix form:
[ 1 2 ] [ x ]   [ 1 ]
[ 3 1 ] [ y ] = [ 2 ]
2. Calculate det(A): this is equal to −5. Since the determinant is non-zero
the matrix inverse method can be applied.
3. Calculate adj(A):
adj(A) = [  1 −2 ]
         [ −3  1 ]

4. Form the inverse:

A⁻¹ = −(1/5) [  1 −2 ]
             [ −3  1 ]
5. Solve the equations using the inverse: from Ax = b we get that x = A⁻¹b. Thus in this case
x = −(1/5) [  1 −2 ] [ 1 ]   [ 3/5 ]
           [ −3  1 ] [ 2 ] = [ 1/5 ]
Thus x = 3/5 and y = 1/5.
6. Check the solutions: there are two (equivalent) ways of doing this. The first is to check by direct substitution
x + 2y = 3/5 + 2 · (1/5) = 1 and 3x + y = 3 · (3/5) + 1/5 = 2.
Alternatively, you can check by matrix multiplication
[ 1 2 ] [ 3/5 ]   [ 1 ]
[ 3 1 ] [ 1/5 ] = [ 2 ]
You can see that both calculations are, in fact, identical.
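The six steps translate directly into code. Here is a Python sketch of mine for the 2 × 2 matrix inverse method; exact fractions keep the arithmetic identical to the hand calculation.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 integer matrix via the adjugate; needs det(A) != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    # The adjugate with each entry divided by the determinant.
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [3, 1]]
x, y = mat_vec(inverse_2x2(A), [1, 2])   # solve the system of Example 3.7.8
```

This gives x = 3/5 and y = 1/5, agreeing with the worked example.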
The 3 × 3 case
We begin by showing how to construct the adjugate matrix in general.
Let A be an n × n matrix with entries aij .
• Pick a particular row i and column j. If we cross out this row and
column we get an (n − 1) × (n − 1) matrix which I shall denote by
M (A)ij ; it is called a submatrix of the original matrix A.
• The determinant det(M (A)ij ) is called the minor of the element aij .
• Finally, if we multiply det(M (A)ij ) by the corresponding sign we get
the cofactor (−1)^(i+j) det(M (A)ij ) of the element aij .
cofactor = signed minor
• If we replace each element aij by its cofactor, we get the matrix C(A)
of cofactors of A.
Example 3.7.9. We compute the matrix of cofactors of an arbitrary 2 × 2
matrix. Let
A = [ a b ]
    [ c d ]
We begin by computing the minors:
det(M(A)11) = d,  det(M(A)12) = c,  det(M(A)21) = b,  det(M(A)22) = a.
Thus the matrix of minors in this case is
[ d c ]
[ b a ]
The matrix of signs is
[ + − ]
[ − + ]
The matrix of cofactors is
[  d −c ]
[ −b  a ]
Example 3.7.10. Let
A = [ 3 1 −4 ]
    [ 2 5  6 ]
    [ 1 4  8 ]
We shall calculate the matrix of cofactors C(A) of the matrix A. The pattern of signs we shall use is
[ + − + ]
[ − + − ]
[ + − + ]
Computing each minor and attaching the appropriate sign gives
C(A) = [  16 −10   3 ]
       [ −24  28 −11 ]
       [  26 −26  13 ]
Let A be any square matrix. The transpose of the matrix of cofactors
C(A), denoted adj(A), is called the adjugate³ matrix of A.
The crucial property of the adjugate is described in the next result.
Theorem 3.7.11. For any square matrix A, we have that
A(adj(A)) = det(A)I = (adj(A))A.
We have verified the above result in the case of 2 × 2 matrices. It is
possible to verify the above theorem in the case of 3 × 3 matrices, but that is much more laborious. To prove this result in general we need to develop
the properties of determinants further.
Theorem 3.7.12. Let A be a square matrix. Then A is invertible if and only if det(A) ≠ 0. When A is invertible, its inverse is given by
A⁻¹ = (1/det(A)) adj(A).
Proof. Let A be invertible. By our lemma above, det(A) ≠ 0 and so we can form the matrix
(1/det(A)) adj(A).
We now calculate
A ((1/det(A)) adj(A)) = (1/det(A)) (A adj(A)) = I
by our theorem above. Thus A has the advertised inverse.
³This odd word comes from Latin and means ‘yoked together’.
Conversely, suppose that det(A) ≠ 0. Then again we can form the matrix
(1/det(A)) adj(A)
and verify that this is the inverse of A and so A is invertible.
Advice The adjugate is useful in finding the inverses of 2×2 and maybe 3×3
matrices, but for larger matrices it requires a lot of work. There are better
ways of finding inverses using what are called ‘elementary row operations’.
They will be described in a later course.
Example 3.7.13. Let
A = [  1 2 3 ]
    [  2 0 1 ]
    [ −1 1 2 ]
We show that A is invertible and calculate its inverse. First, det(A) = −5 and so A is invertible. The matrix of minors is
[ −1  5  2 ]
[  1  5  3 ]
[  2 −5 −4 ]
The matrix of cofactors is
[ −1 −5  2 ]
[ −1  5 −3 ]
[  2  5 −4 ]
The adjugate is the transpose of the matrix of cofactors
[ −1 −1  2 ]
[ −5  5  5 ]
[  2 −3 −4 ]
Thus the inverse of A is the adjugate with each entry divided by the determinant of A
A⁻¹ = −(1/5) [ −1 −1  2 ]
             [ −5  5  5 ]
             [  2 −3 −4 ]
Now check your answer!
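The whole computation in the example (minors, cofactors, adjugate, division by the determinant) can be automated. This Python sketch is my own; it uses first-row expansion for the determinant, as in the notes, and assumes integer entries.

```python
from fractions import Fraction

def minor(A, i, j):
    """The submatrix obtained by crossing out row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    # First-row expansion with the usual alternating pattern of signs.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    n = len(A)
    cofactors = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
                 for i in range(n)]
    adjugate = [list(col) for col in zip(*cofactors)]   # transpose of cofactors
    return [[Fraction(x, d) for x in row] for row in adjugate]
```

For the matrix of Example 3.7.13 this reproduces det(A) = −5 and the inverse found above, and multiplying A by the result gives the identity matrix.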
We can now apply the matrix inverse method to solve systems of three equations in three unknowns in the case where the matrix of coefficients is invertible.
3.8 *Complex numbers via matrices*
This section will not be examined in 2013.
Consider all matrices that have the following shape
[ a −b ]
[ b  a ]
where a and b are arbitrary real numbers. You should show first that the sum,
difference and product of any two matrices having this shape is also a matrix
of this shape. Rather remarkably matrix multiplication is commutative for
matrices of this shape. Observe that the determinant of our matrix above is
a² + b². It follows that every non-zero matrix of the above shape is invertible.
The inverse of the above matrix in the non-zero case is
(1/(a² + b²)) [  a b ]
              [ −b a ]
and again has the same form. It follows that the set of all these matrices
satisfies the axioms of high-school algebra. Define
1 = [ 1 0 ]
    [ 0 1 ]
and
i = [ 0 −1 ]
    [ 1  0 ]
We may therefore write our matrices in the form
a1 + bi.
Observe that
i² = −1.
It follows that our set of matrices can be regarded as the complex numbers
in disguise.
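These claims are easy to experiment with; the following Python sketch of mine checks two of them.

```python
def cplx(a, b):
    """The matrix a*1 + b*i representing the complex number a + bi."""
    return [[a, -b], [b, a]]

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

i = cplx(0, 1)
assert mat_mul(i, i) == cplx(-1, 0)                  # i squared is -1

# Multiplication of matrices of this shape is commutative, just like
# multiplication of complex numbers: (1 + 2i)(3 - i) = 5 + 5i.
z, w = cplx(1, 2), cplx(3, -1)
assert mat_mul(z, w) == mat_mul(w, z) == cplx(5, 5)
```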
3.9 Learning outcomes for Chapter 3
• Add, subtract, and multiply two matrices, and multiply a matrix by a
scalar; be able to carry out sequences of such operations to obtain a
single matrix as a result.
• Compute f (A) given a polynomial f (x) and a square matrix A.
• Compute (small) determinants by first row expansion.
• Know and be able to use the basic properties of determinants.
• Solve linear equations using Gaussian elimination.
• Cramer’s rule and why you shouldn’t use it.
3.10 Further reading and exercises
The material in this chapter is absolutely basic and it is essential to master all the ideas here since you will meet them continually in your future studies. The best place to gain practice is to use the Schaum Outline
book Linear algebra. This contains far more than I cover in this course
but you will also find it useful in the second year. Chapter 6 of Hirst and
Singerman contains a treatment of matrices.
Chapter 4
Vectors
The Greeks attributed the discovery of geometry to the Ancient Egyptians
who needed it in recalculating land boundaries for the purposes of tax assessment after the yearly flood of the Nile. But it was the Ancient Greeks
themselves who elevated it into a mathematical science and a paradigm of
what could be achieved in mathematics.
Euclid’s book the Elements codified what was known about geometry into
a handful of axioms and then showed that all of geometry could be deduced
from those axioms by the use of mathematical proof. The Elements is not only the single most important mathematics book ever written but one of the most important books, period, as the Americans say.
Impressive though Euclid’s achievement was, it does suffer one drawback
in that it is not the easiest system to use. Even proving simple results, like
Pythagoras’s theorem, takes dozens of intermediate results. So although it
is a great theoretical achievement, it is not such a practical one. It was not
until the nineteenth century that a practical tool for doing three-dimensional
geometry was constructed. On the basis of the work carried out by Hamilton on quaternions — I say a little more about this later — the theory of
vectors, the subject of this chapter, was developed by the American Josiah
Willard Gibbs and promoted by the English electrical engineer Oliver Heaviside (whose formal schooling ended at the age of 16).
In addition to setting up an algebraic system that will enable us to carry
out geometrical calculations easily, I shall also touch on a deep connection
with the work of the previous chapter. Each linear equation in three unknowns is in fact the equation of a plane in three-dimensional space. This
means that the theory of linear equations in three unknowns has a geometrical interpretation. This may be generalized: the theory of matrices combined
with a theory of vectors in arbitrary dimensions is known as linear algebra,
one of the most important branches of algebra.
I have not attempted to develop the subject in this chapter completely
rigorously, so I often make appeals to geometric intuition in setting up the
algebraic theory of vectors.
4.1 Vector algebra
I shall assume you are familiar with the following ideas from school:
• the notion of a point;
• the notion of a line and of a line segment;
• the notion of the length of a line segment and the angle between two
lines;
• the notion of parallel lines.
The notion of a pair of lines being parallel is fundamental to Euclidean
geometry. It is used, for example, in proving that the angles in a triangle add up to 180°. This is illustrated in the following diagram where we prove that A + B + C = 180°.
4.1.1 Addition and scalar multiplication of vectors
Key definition Two directed line segments which are parallel, have the
same length, and point in the same direction are said to represent the same
vector.
The word ‘vector’ means carrier in Latin and what a vector carries is
information about length and about direction. Because vectors stay the
same when they move parallel to themselves, they also preserve information
about angles.
Thus vectors have length and direction but no other properties. I shall
denote vectors by bold letters a, b, . . . If P and Q are points then the directed line segment from P to Q is written −→PQ or simply PQ. If P = Q then PQ is just a point. The zero vector 0 is represented by the degenerate line segment PP.
Vectors are denoted by arrows: the vector starts at the base of the arrow (where the feathers would be), which we shall call the tail of the vector, and ends at the tip (where the arrowhead is), which we shall call the point of the vector.
Example 4.1.1. In the diagram below all the vectors shown are equal.
[diagram: a family of parallel arrows, all with the same length and direction, each representing the same vector]
The set of vectors in space can be equipped with two operations: vector
addition and multiplication by a scalar.
Let a and b be vectors. Then their sum is defined as follows: slide the
vectors parallel to themselves so that the point of a touches the tail of b.
The directed line segment from the tail of a to the point of b represents the
vector a + b.
[diagram: the triangle rule for vector addition, with a and b placed tip-to-tail and a + b running from the tail of a to the point of b]
This definition does make sense though I will not justify that here. If a is
a vector, then −a is defined to be the vector with the same length as a but
pointing in the opposite direction.
[diagram: the vectors a and −a, equal in length but opposite in direction]
Theorem 4.1.2 (Properties of vector addition).
(VA1) a + (b + c) = (a + b) + c. This is the associative law for vector
addition.
(VA2) 0 + a = a = a + 0. The zero vector is the additive identity.
(VA3) a + (−a) = 0 = (−a) + a. The vector −a is the additive inverse of a.
(VA4) a + b = b + a. This is the commutative law for vector addition.
The proof of the commutativity of vector addition is illustrated below.
[diagram: the parallelogram with sides a and b, showing that a + b = b + a]
The proof of associativity is illustrated below.
We define a − b = a + (−b).
Advanced remark We have seen the above properties before: real numbers
with respect to addition, and m × n matrices with respect to matrix addition. A set equipped with a binary operation that is associative, possesses
an identity, possesses unique inverses and is commutative is called an abelian
group.
Example 4.1.3. Consider the following square of vectors.
[diagram: a square whose four sides, traversed in order, are the vectors a, b, c and d]
Then we have
a + b + c + d = 0.
Thus, in particular,
d = −c − b − a.
Example 4.1.4. Consider the following diagram
[diagram: a configuration of points joined by the vectors a, b, c, d, e, f, g, h and k]
(i) We may write c in terms of e, d and f . By following the arrows we get that c = d + e + f .
(ii) We may write g in terms of c, d, e and k. By following the arrows we
get that g = −k + c + d − e.
(iii) We may solve x + b = f using similar methods to high-school algebra
to get x = f − b which is just a.
(iv) We may solve x + h = d − e in a similar fashion to get x = d − e − h
which is just g.
If a is a vector then ‖a‖ is its length.
If ‖a‖ = 1 then a is called a unit vector. We have that ‖a‖ ≥ 0, and ‖a‖ = 0 iff a = 0. By results on triangles we have the triangle inequality
‖a + b‖ ≤ ‖a‖ + ‖b‖ .
We now define scalar multiplication of a vector. Let λ be a scalar and a
a vector. If λ = 0 then λa = 0; if λ > 0 then λa has the same direction as a
and length λ‖a‖; if λ < 0 then λa has the opposite direction to a and length
(−λ)‖a‖. Observe that in all cases

‖λa‖ = |λ| ‖a‖.

If a is non-zero then

â = a / ‖a‖

is a unit vector in the same direction as a. We call this process normalisation.
Vectors that differ by a scalar multiple are said to be parallel.
Theorem 4.1.5 (Properties of scalar multiplication).
(i) 0a = 0.
(ii) 1a = a.
(iii) (−1)a = −a.
(iv) (λ + µ)a = λa + µa.
(v) λ(a + b) = λa + λb.
(vi) λ(µa) = (λµ)a.
We can use what we have introduced so far to prove simple geometric
theorems.
Example 4.1.6. If the midpoints of the consecutive sides of any quadrilateral
are joined by line segments, then the resulting quadrilateral is a parallelogram. We refer to the picture below.
[Diagram: a quadrilateral with sides a, b, c and d; A, B, C and D are the midpoints of consecutive sides.]
We have that
a + b + c + d = 0.
Now the vector AB equals (1/2)a + (1/2)b and the vector CD equals
(1/2)c + (1/2)d. But a + b = −(c + d). It follows that the vector AB is the
negative of the vector CD. Hence the line segment AB is parallel to the line
segment CD and they have the same length. Similarly, BC is parallel to AD
and has the same length.
4.1.2 Inner, scalar or dot products
Let a and b be two vectors. If a, b ≠ 0 then we define

a · b = ‖a‖ ‖b‖ cos(θ)
where θ is the angle between a and b.
Note that this angle is always chosen to be 0 ≤ θ ≤ π. If either a or b
is zero then a · b is defined to be zero. We call a · b the inner product of a
and b. It is also sometimes called the scalar product and the dot product. It
is important to remember that it is a scalar and not a vector.
We say that non-zero vectors a and b are orthogonal to each other if the
angle between them is ninety degrees. The key property of the inner product
is that for non-zero a and b we have that
a · b = 0 iff a and b are orthogonal.
Theorem 4.1.7 (Properties of the inner product).
(i) a · b = b · a.
(ii) a · a = ‖a‖².
(iii) λ(a · b) = (λa) · b = a · (λb).
(iv) a · (b + c) = a · b + a · c.
Remarks
(i) The inner product a · a is often abbreviated a².
(ii) Property (iv) says that the inner product distributes over addition. It is
the only property that takes a bit of work to prove; I give the proof
later.
The inner product enables us to prove much more interesting theorems.
Example 4.1.8. The angle in a semicircle is a right angle. Draw a semicircle.
Choose any point on the circumference of the semicircle and join it to the
points at either end of the diameter of the semicircle. Then the claim is that
the resulting triangle is right-angled.
We are interested in the angle formed by AB and AC. Observe that
the vector AB is −(a + b) and the vector AC is a − b. Thus

AB · AC = −(a + b) · (a − b)
        = −(a² − a · b + b · a − b²)
        = −(a² − b²)
        = 0

using the fact that a · b = b · a and ‖a‖ = ‖b‖, because this is just the radius
of the semicircle. It follows that the angle BAC is a right angle, as claimed.
Example 4.1.9. Pythagoras’ theorem proved using vectors.
We have that

a + b + c = 0

and so a + b = −c. Now

(a + b)² = (−c) · (−c) = ‖c‖².

But

(a + b)² = ‖a‖² + 2a · b + ‖b‖²

and this is equal to ‖a‖² + ‖b‖² because a · b = 0. It follows that

‖a‖² + ‖b‖² = ‖c‖².
Remark The set of 3-dimensional vectors equipped with the operations of
vector addition and scalar multiplication together with the inner product is
called three dimensional Euclidean space. This is precisely the space of Euclid’s geometry, but done in a modern way.
4.1.3 Vector or cross products
In three dimensional space there is another operation available to us that is
useful in many applications. Let a and b be non-zero vectors. We define a
new vector
a × b = ‖a‖ ‖b‖ sin(θ) n
where θ is the angle between a and b, and n is a unit vector at right angles to
the plane containing a and b — this determines n up to sign: we choose the
direction of n so that when rotating a to b in a clockwise direction through
the angle θ we are looking in the direction of n.
[Diagram: vectors a and b spanning a plane, with a × b perpendicular to that plane.]
If a or b is zero then a × b is the zero vector. We call it the vector
product of a and b. It is sometimes called the cross product. It is important
to remember that it is a vector. The key property of the vector product is
that for non-zero vectors
a × b = 0 iff a and b are parallel.
Theorem 4.1.10 (Properties of the vector product).
(i) a × b = −b × a.
(ii) λ(a × b) = (λa) × b = a × (λb).
(iii) a × (b + c) = a × b + a × c.
Remark Property (iii) says that the vector product distributes over addition.
This is the hardest property to prove; I give the proof later.
Warning! a × (b × c) ≠ (a × b) × c. In other words, the vector product is
not associative.
Warning! Distinguish between the following:
• λa. This is a scalar λ times a vector a and the result is a vector.
• a · b. This is the inner product of two vectors and is a scalar.
• a × b. This is the vector product of two vectors and is a vector.
You must not interchange notation for these different products (unlike
school algebra where you can).
Example 4.1.11. The area of the parallelogram determined by the vectors
a and b is ‖a × b‖: the parallelogram has base ‖a‖ and height ‖b‖ sin(θ),
where θ is the angle between a and b.
Example 4.1.12. We shall prove the ‘law of sines’ for triangles using the
vector product. With reference to a triangle with angles A, B and C and
opposite sides of lengths a, b and c, we have that
sin(A)/a = sin(B)/b = sin(C)/c.
We choose vectors as shown so that
‖a‖ = a, ‖b‖ = b, ‖c‖ = c.
Then
a + b + c = 0.
Hence
a + b = −c.
Take the vector product of this equation on both sides on the left with a, b
and c in turn. We get
1. a × b = c × a.
2. b × a = c × b.
3. c × a = b × c.
From (1), we get

‖a × b‖ = ‖c × a‖.

Thus

‖b‖ sin(C) = ‖c‖ sin(B),

which gives us the second equation in the statement of the result. The
remaining results follow similarly.
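As a numerical sanity check of the law of sines (an illustration only; the particular triangle is my choice, not from the notes), take the 3–4–5 right triangle, whose angles are easy to write down:

```python
import math

# 3-4-5 right triangle: sides a=3, b=4, c=5; the angle C opposite c is a
# right angle, and sin(A) = 3/5, sin(B) = 4/5 with hypotenuse c.
a, b, c = 3.0, 4.0, 5.0
A = math.asin(a / c)
B = math.asin(b / c)
C = math.pi / 2

# The three ratios sin(angle)/opposite side should agree.
r1 = math.sin(A) / a
r2 = math.sin(B) / b
r3 = math.sin(C) / c
assert abs(r1 - r2) < 1e-12 and abs(r2 - r3) < 1e-12
```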
4.1.4 Scalar triple products
This product is nothing more than a combination of the previous two. However, it is included because, as we shall see, it has an important geometric
interpretation.
Let a, b and c be three vectors. Then b × c is a vector. Thus a · (b × c)
is a scalar. We define
[a, b, c] = a · (b × c).
It is called the scalar triple product. Its properties are determined by the
properties of the inner and vector products. What it means geometrically
will be described later.
Exercises 4.1
1. Consider the following diagram.
[Diagram: points A, B, C, D, E and F, with a the vector from A to B, b the vector from B to C and c the vector from C to D.]
Now answer the following questions.
(i) Write the vector BD in terms of a and c.
(ii) Write the vector AE in terms of a and c.
(iii) What is the vector DE?
(iv) What is the vector CF ?
(v) What is the vector AC?
(vi) What is the vector BF ?
2. If a, b, c and d represent the consecutive sides of a quadrilateral, show
that the quadrilateral is a parallelogram if and only if a + c = 0.
3. In the regular pentagon ABCDE, let AB = a, BC = b, CD = c, and
DE = d. Express EA, DA, DB, CA, EC, BE in terms of a, b, c,
and d.
4. Let a and b represent adjacent sides of a regular hexagon so that the
initial point of b is the terminal point of a. Represent the remaining
sides by means of vectors expressed in terms of a and b.
5. Prove that ‖a‖b + ‖b‖a is orthogonal to ‖a‖b − ‖b‖a for all vectors
a and b.
6. Let a and b be two non-zero vectors. Let

   u = ((a · b)/(a · a)) a.

   Show that b − u is orthogonal to a.
7. Simplify (u + v) × (u − v).
8. Let a and b be two unit vectors, the angle between them being π/3. Show
that 2b − a and a are orthogonal.
9. Prove that
‖u − v‖² + ‖u + v‖² = 2(‖u‖² + ‖v‖²).
Deduce that the sum of the squares of the diagonals of a parallelogram
is equal to the sum of the squares of all four sides.
4.2 Vector arithmetic
The theory I introduced in Section 4.1 is useful for proving general results
about geometry, but what if we want to calculate with particular vectors: how
do we describe them? To do this we need coordinates, and vectors viewed in
terms of coordinates will occupy us for the remainder of this section.
4.2.1 i's, j's and k's
Set up a cartesian coordinate system consisting of x, y and z axes. We
orient the system so that in rotating the x axis clockwise to the y axis, we
are looking in the direction of the positive z axis. Let i, j and k be unit
vectors parallel to the x, y and z axes respectively (pointing in the positive
directions). Every vector a can be uniquely written in the form
a = a1 i + a2 j + a3 k
for some scalars a1, a2, a3. This is achieved by orthogonal projection of the
vector a (moved so that it starts at the origin) onto each of the three
coordinate axes. The numbers a1, a2, a3 are called the components of a in each
of the three directions.
Remarks
• If a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k then a = b iff ai = bi ;
that is, corresponding components are equal.
• 0 = 0i + 0j + 0k.
• If a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k then
a + b = (a1 + b1 )i + (a2 + b2 )j + (a3 + b3 )k.
• If a = a1 i + a2 j + a3 k then λa = λa1 i + λa2 j + λa3 k.
Theorem 4.2.1. Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
a · b = a1 b1 + a2 b2 + a3 b3 .
Proof. This is proved using Theorem 4.1.7 (iv) and the following table
· i j k
i 1 0 0
j 0 1 0
k 0 0 1
computed from the definition of the inner product. We have that
a · b = a · (b1 i + b2 j + b3 k) = b1 (a · i) + b2 (a · j) + b3 (a · k).
We now compute a · i, a · j, and a · k in turn:
• a · i = a1 .
• a · j = a2 .
• a · k = a3 .
Putting everything together we get
a · b = a1 b 1 + a2 b 2 + a3 b 3 ,
as required.
Remark If a = a1 i + a2 j + a3 k then ‖a‖ = √(a1² + a2² + a3²).
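The co-ordinate formula for the inner product, and the norm formula in the remark, are easy to experiment with. A small Python sketch, offered as an illustration rather than part of the notes:

```python
import math

# Inner product and norm in component form.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Two vectors whose inner product vanishes are orthogonal.
a = (1, 2, -2)
b = (2, 1, 2)
assert dot(a, b) == 0

# The norm formula: |3i + 4j| = 5.
assert norm((3, 4, 0)) == 5.0

# The angle between vectors via cos(theta) = a.b / (|a| |b|).
u, v = (1, 0, 0), (1, 1, 0)
theta = math.acos(dot(u, v) / (norm(u) * norm(v)))
assert abs(theta - math.pi / 4) < 1e-12
```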
Theorem 4.2.2. Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then

a × b = | i  j  k  |
        | a1 a2 a3 |
        | b1 b2 b3 |

Warning! This 'determinant' can only be expanded along the first row.
Proof. This follows by Theorem 4.1.10 (iii) and the following table
×    i    j    k
i    0    k   −j
j   −k    0    i
k    j   −i    0
computed from the definition of the vector product. We have that
a × b = a × (b1 i + b2 j + b3 k) = b1 (a × i) + b2 (a × j) + b3 (a × k).
We now compute a × i, a × j, and a × k in turn:
• a × i = −a2 k + a3 j.
• a × j = a1 k − a3 i.
• a × k = −a1 j + a2 i.
Putting everything together we get
a × b = (a2 b3 − a3 b2 )i − (a1 b3 − a3 b1 )j + (a1 b2 − a2 b1 )k
which is equal to the given determinant.
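The co-ordinate formula for a × b can likewise be checked numerically. The sketch below (an illustration, not part of the notes) verifies the cycle i × j = k, j × k = i, k × i = j, the anti-commutativity of Theorem 4.1.10 (i), and the orthogonality of a × b to both factors:

```python
# Cross product in component form, expanding the determinant along row one.
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
assert cross(i, j) == k and cross(j, k) == i and cross(k, i) == j

a, b = (1, 2, 3), (4, 5, 6)
assert cross(a, b) == (-3, 6, -3)
# a x b is orthogonal to both factors, and b x a = -(a x b).
assert dot(cross(a, b), a) == 0 and dot(cross(a, b), b) == 0
assert cross(b, a) == (3, -6, 3)
```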
The proof of the following now follows by our two theorems above.
Theorem 4.2.3 (Scalar triple products and determinants). Let
a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, c = c1 i + c2 j + c3 k.
Then

[a, b, c] = | a1 a2 a3 |
            | b1 b2 b3 |
            | c1 c2 c3 |

Thus the properties of scalar triple products are the same as the properties of
3 × 3 determinants.
Proof. We calculate a · (b × c). This is equal to
(a1 i + a2 j + a3 k) · [(b2 c3 − b3 c2 )i − (b1 c3 − b3 c1 )j + (b1 c2 − b2 c1 )k].
But this is equal to
a1 (b2 c3 − b3 c2 ) − a2 (b1 c3 − b3 c1 ) + a3 (b1 c2 − b2 c1 )
which is nothing other than the determinant

| a1 a2 a3 |
| b1 b2 b3 |
| c1 c2 c3 |

Before we look at some examples it is worth stepping back a bit to see
where we are.
Summary
In Section 4.1, we defined vectors and the vector operations geometrically. In Section 4.2, we showed that once we had chosen a
co-ordinate system, vectors and the vector operations could be described algebraically. The important point to remember in what
follows is that the two approaches must give the same answers.
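The agreement between the two descriptions can be spot-checked numerically. The Python sketch below (an illustration, not part of the notes) computes [a, b, c] both as a · (b × c) and as a 3 × 3 determinant expanded along the first row:

```python
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def triple(a, b, c):
    """[a, b, c] = a . (b x c)."""
    return dot(a, cross(b, c))

def det3(r1, r2, r3):
    """3x3 determinant expanded along the first row."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = r1, r2, r3
    return (a1 * (b2 * c3 - b3 * c2)
            - a2 * (b1 * c3 - b3 * c1)
            + a3 * (b1 * c2 - b2 * c1))

a, b, c = (1, 2, 3), (0, 1, 4), (5, 6, 0)
assert triple(a, b, c) == det3(a, b, c)
# Swapping two rows changes the sign, exactly as for determinants.
assert triple(b, a, c) == -triple(a, b, c)
```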
Exercises 4.2
1. Let a = 3i + 4j, b = 2i + 2j − k and c = 3i − 4k.
(i) Find ‖a‖, ‖b‖, and ‖c‖.
(ii) Find a + b and a − c.
(iii) Determine ‖a − c‖.
2. (i) Let a = 4i + j − 3k and b = i + 2j + 2k. Find a · b. Are a and b
orthogonal?
(ii) Find the angle between −2(i − j) + k and j − i.
3. The unit cube is determined by the three vectors i, j and k. Find the
angle between the long diagonal of the unit cube and one of its edges.
4. Calculate i × (i × k) and (i × i) × k. What do you deduce as a result
of this?
5. Calculate u · (v × w) where u = 3i − 2j − 5k, v = i + 4j − 4k, and
w = 3j + 2k.
6. If [a, b, c] = 0 what can you deduce?
4.3 Geometry with vectors
There are two kinds of vectors: the free vectors that we have been dealing
with up to now and the position vectors we introduce next.
4.3.1 Position vectors
So far, we have used vectors to describe line segments. But we can also use
vectors to describe the precise location of points. To do this, we have to
choose and fix a point O in space, called an origin. We can then consider all
the directed line segments that start at O. Each such segment represents a
vector and every vector is thus represented. The tops of the line segments are
points in space, and every point thus occurs. It follows that once an origin
has been fixed, vectors can be used to describe points. We talk about the
position vectors of points. However, we can only talk about position vectors
with respect to some fixed point O.
Example 4.3.1. The point A has position vector a = −i + j and the point
B has position vector b = 2i + j − k. Find the position vector of the point
P which is 2/3 of the way between A and B.
[Diagram: triangle OAB with a from O to A, b from O to B, and P on AB with AP : PB = 2 : 1.]
We have that

OP = OA + AP
   = OA + (2/3) AB
   = a + (2/3)(b − a)
   = (1/3) a + (2/3) b
   = (1/3)(−i + j) + (2/3)(2i + j − k)
   = i + j − (2/3) k.
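The calculation in Example 4.3.1 can be reproduced with exact arithmetic; a short Python sketch (an illustration, not part of the notes):

```python
from fractions import Fraction as F

# Position vectors as component triples.
a = (-1, 1, 0)          # a = -i + j
b = (2, 1, -1)          # b = 2i + j - k

# p = (1/3) a + (2/3) b, computed with exact rational arithmetic.
p = tuple(F(1, 3) * x + F(2, 3) * y for x, y in zip(a, b))
assert p == (1, 1, F(-2, 3))   # i + j - (2/3)k, as in the example
```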
4.3.2 Linear combinations
Let v1 , . . . , vn be n vectors and let λ1 , . . . , λn be n scalars. Then the vector
v = λ1 v1 + . . . + λn vn
is called a linear combination of the n vectors.
Only two cases of this definition are needed in this course. If we are
given just one vector v1 then a linear combination is just a scalar multiple of
that vector. The other case is where we have two vectors v1 and v2 . Linear
combinations then look like this
λ1 v1 + λ2 v2 .
Let v be any non-zero vector. Then any vector parallel to this vector has
the form λv for some scalar λ.
Now let v1 and v2 be two non-zero vectors where neither is a multiple of
the other. Then these two vectors determine a plane in space. This plane
is not rooted to any point and so, for convenience, we may move it parallel
to itself so that it passes through some fixed point that we may treat as an
origin. Now let v be any vector which is parallel to this plane. We may move
it parallel to itself so that its tail is at the origin. By plane geometry, we
may find real numbers λ1 and λ2 such that
v = λ1 v1 + λ2 v2 .
We shall use these ideas in deriving formulae for lines and planes in space
in the sections below.
4.3.3 Lines
Intuitively, a line in space is determined by one of the following two pieces
of information:
1. Two distinct points.
2. One point and a direction.
Let’s see how we can use vectors to obtain the equation of that line.
Let a and b be the position vectors of two distinct points. Let r =
xi + yj + zk be the position vector of a point on the line they determine.
Observe that the line determined by the two points is parallel to the
vector b − a, which therefore gives the direction of the line.
The vectors r − a and b − a will be parallel. Thus there is a scalar λ such
that r − a = λ(b − a). It follows that
r = a + λ(b − a).
This is called the vector form of the parametric equation of the line. The
parameter in question is λ.
We now derive the coordinate form of the parametric equation. Let
a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k.
Substituting in our vector form above and equating components we obtain
x = a1 + λ(b1 − a1 ),
y = a2 + λ(b2 − a2 ),
z = a3 + λ(b3 − a3 ).
For convenience, put ci = bi − ai . Thus the coordinate form of the parametric
equation for the line is
x = a1 + λc1 ,
y = a2 + λc2 ,
z = a3 + λc3 .
If c1, c2, c3 ≠ 0 then we can eliminate the parameter in the above equations to get the non-parametric equations of the line:

(x − a1)/c1 = (y − a2)/c2,   (y − a2)/c2 = (z − a3)/c3.
It’s worth noting that
• The parametric equation is useful for generating points on the line (by
choosing values of the parameter λ).
• The non-parametric equation is useful for checking that given points
lie on a given line.
Example 4.3.2. Find the parametric and the non-parametric equations of
the line through the point with position vector i + 2j + 3k and parallel to the
vector 4i + 5j + 6k. In this question, we are given the direction that the line
is parallel to. Thus
r − (i + 2j + 3k)
is parallel to
4i + 5j + 6k.
It follows that
r = i + 2j + 3k + λ(4i + 5j + 6k)
is the vector form of the parametric equation of the line. We now find the
cartesian form of the parametric equation. Put
r = xi + yj + zk.
Then
xi + yj + zk = i + 2j + 3k + λ(4i + 5j + 6k).
These two vectors are equal iff their coordinates are equal. Thus we have
that
x = 1 + 4λ
y = 2 + 5λ
z = 3 + 6λ
This is the cartesian form of the parametric equation of the line. Finally, we
eliminate λ to get the non-parametric equation of the line
(x − 1)/4 = (y − 2)/5   and   (y − 2)/5 = (z − 3)/6.
These two equations can be rewritten in the form
5x − 4y = −3 and 6y − 5z = −3.
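The two uses noted earlier, the parametric form for generating points and the non-parametric form for checking them, can be illustrated with the line of Example 4.3.2 (a Python sketch, not part of the notes):

```python
# Parametric form r = (1,2,3) + lam*(4,5,6); generate points and check
# them against the non-parametric equations 5x - 4y = -3, 6y - 5z = -3.
def point(lam):
    return (1 + 4 * lam, 2 + 5 * lam, 3 + 6 * lam)

for lam in (-2, 0, 1, 7):
    x, y, z = point(lam)
    assert 5 * x - 4 * y == -3
    assert 6 * y - 5 * z == -3
```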
4.3.4 Planes
Intuitively, a plane in space is determined by one of the following three pieces
of information:
1. Any three points that do not all lie in a straight line; that is, the points
form the vertices of a triangle.
2. One point and two non-parallel directions.
3. One point and a direction which is perpendicular or normal to the
plane.
We shall begin by finding the parametric equation of the plane determined
by the three points with position vectors a, b and c.
The vectors b − a and c − a are both parallel to the plane, but are not
parallel to each other. Thus every vector parallel to the plane they determine
has the form
λ(b − a) + µ(c − a)
for some scalars λ and µ. Here we use the ideas of Section 4.3.2. Thus if
the position vector of an arbitrary point on the plane is r, then r − a =
λ(b − a) + µ(c − a). Thus the vector form of the parametric equation of
the plane is
r = a + λ(b − a) + µ(c − a).
This can easily be written in coordinate form by equating components.
To find the non-parametric equation of a plane, we use the fact that a
plane is determined once a point on the plane is known and a vector orthogonal to every vector in the plane — such a vector is said to be normal to
the plane. Let n be a vector normal to our plane, and let a be the position
vector of a point in the plane.
Then r − a is orthogonal to n. Thus
(r − a) · n = 0.
This is the vector form of the non-parametric equation of the plane. To
find the coordinate form of the non-parametric equation, let
r = xi + yj + zk,
a = a1 i + a2 j + a3 k,
n = n1 i + n2 j + n3 k.
From (r − a) · n = 0 we get (x − a1 )n1 + (y − a2 )n2 + (z − a3 )n3 = 0. Thus
the non-parametric equation of the plane is
n1 x + n2 y + n3 z = a1 n1 + a2 n2 + a3 n3 .
Remark From the equation above, we deduce that the solutions of a linear
equation in three unknowns
ax + by + cz = d
all lie on a plane in general (although there are some degenerate cases where
something different from a plane will be obtained).
We observe that the non-parametric equation of the line in fact describes
the line as the intersection of two planes.
If we have three equations in three unknowns then, as long as the planes
are angled correctly, they will intersect in a point — that is, the equations
will have a unique solution. However, there are many cases where either the
planes have no points in common (no solution) or have lines or indeed planes
in common (infinitely many solutions).
Thus the nature of the solutions of a system of linear equations in three
unknowns is intimately bound up with the geometry of the planes they determine.
We have one final question to answer: given the parametric equation of
the plane, how do we find the non-parametric equation? The vectors b − a
and c − a are parallel to the plane but not parallel to each other. The vector
n = (b − a) × (c − a)
is normal to our plane.
Example 4.3.3. Find the parametric and non-parametric equations of the
plane containing the three points with position vectors
a = j − k,
b = i + j,
c = i + 2j.
We have that
b−a=i+k
and
c − a = i + j + k.
Thus the parametric equation of the plane is
r = j − k + λ(i + k) + µ(i + j + k).
To find the non-parametric equation, we need to find a vector normal to the
plane. We calculate
(b − a) × (c − a) = k − i.
Thus
(r − a) · (k − i) = 0.
That is
(xi + (y − 1)j + (z + 1)k) · (k − i) = 0.
This simplifies to
z − x = −1,
the non-parametric equation of the plane. We now check that our three
original points satisfy this equation. The point a has co-ordinates (0, 1, −1);
the point b has co-ordinates (1, 1, 0); the point c has co-ordinates (1, 2, 0).
It is easy to check that each set of co-ordinates satisfies the equation.
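The same check can be automated. The sketch below (Python, an illustration rather than part of the notes) recomputes the normal of Example 4.3.3 and tests all three points against z − x = −1:

```python
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

a = (0, 1, -1)   # j - k
b = (1, 1, 0)    # i + j
c = (1, 2, 0)    # i + 2j

bma = tuple(x - y for x, y in zip(b, a))   # b - a = i + k
cma = tuple(x - y for x, y in zip(c, a))   # c - a = i + j + k
n = cross(bma, cma)
assert n == (-1, 0, 1)                     # k - i, a normal to the plane

# Each of the three points satisfies z - x = -1.
for p in (a, b, c):
    x, _, z = p
    assert z - x == -1
```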
4.3.5 Determinants
Let’s start with 1 × 1 matrices. The determinant of (a) is just a. The length
of a is |a|, the absolute value of the determinant of (a).
Theorem 4.3.4. Let a = ai + cj and b = bi + dj be a pair of plane vectors.
Then the area of the parallelogram determined by these vectors is the absolute
value of the determinant
| a b |
| c d |

Proof. The proof I give will be for the case where both vectors are in the
first quadrant. I shall consider two cases.
(Case 1): b is to the left of a when standing at the origin and looking
along a. Let
a = ai + cj and b = bi + dj.
The area of the parallelogram is the area of the rectangle defined by the
points
0, (a + b)i, a + b, (c + d)j
minus the area of two rectangles the same size, labelled (1), two triangles the
same size, labelled (2), and another two triangles of the same size, labelled
(3). That is
(a + b)(c + d) − 2bc − 2(1/2)ac − 2(1/2)bd
which is equal to
ac + ad + bc + bd − 2bc − bd − ac = ad − bc.
(Case 2): b is to the right of a when standing at the origin and looking
along a. A similar argument shows that the area is bc − ad which is the
negative of the determinant.
Putting these two cases together, we see that the area is the absolute value
of the determinant, because we usually expect areas to be non-negative.
Theorem 4.3.5. Let
a = ai + dj + gk, b = bi + ej + hk, c = ci + f j + ik
be three vectors. Then the volume of the parallelepiped ('squashed box') determined by these three vectors is the absolute value of the determinant

| a b c |
| d e f |
| g h i |

Proof. We refer to the diagram below.
The volume of the box determined by the vectors a, b, c is equal to the
base area times the vertical height. This is equal to the absolute value of
‖a‖ ‖b‖ sin(θ) ‖c‖ cos(φ).
We have to use the absolute value of this expression because cos(φ) can take
negative values if c is below rather than above the plane of a and b as I have
drawn it. Now
• a × b = ‖a‖ ‖b‖ sin(θ) n, where n is the unit vector orthogonal to a
and b and in the correct direction.
• n · c = ‖c‖ cos(φ).
Thus
‖a‖ ‖b‖ sin(θ) ‖c‖ cos(φ) = (a × b) · c.
By the properties of the inner product
(a × b) · c = c · (a × b) = [c, a, b].
We now use properties of the determinant
[c, a, b] = −[a, c, b] = [a, b, c].
It follows that the volume of the box is the absolute value of
[a, b, c].
It follows from the above theorem and our theorem on scalar triple products that the volume of the parallelepiped determined by the three vectors
a, b, and c is the absolute value of the scalar triple product [a, b, c].
The geometric significance of determinants is that they enable us to measure lengths, areas and volumes.
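As a concrete instance of Theorem 4.3.5, the box spanned by 2i, 3j and 5k should have volume 2 · 3 · 5 = 30. A Python sketch (an illustration, not part of the notes):

```python
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Box spanned by scaled axis vectors: volume is |[a, b, c]| = 30.
a, b, c = (2, 0, 0), (0, 3, 0), (0, 0, 5)
assert abs(dot(a, cross(b, c))) == 30

# Reordering the vectors can flip the sign, but not the absolute value.
assert dot(b, cross(a, c)) == -30
```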
Exercises 4.3
1. (i) Find the parametric and the non-parametric equations of the line
through the two points with position vectors i − j + 2k and 2i +
3j + 4k.
(ii) Find the parametric and the non-parametric equations of the plane
containing the three points with position vectors i + 3k, i + 2j − k,
and 3i − j − 2k.
2. Let c be the position vector of the centre of a sphere with radius R.
Let an arbitrary point on the sphere have position vector r. Why is
‖r − c‖ = R? Squaring both sides we get
(r − c) · (r − c) = R2 .
If r = xi + yj + zk and c = c1 i + c2 j + c3 k, deduce that the equation of
the sphere with centre c1 i + c2 j + c3 k and radius R is
(x − c1 )2 + (y − c2 )2 + (z − c3 )2 = R2 .
(i) Find the equation of the sphere with centre i + j + k and radius 2.
(ii) Find the centre and radius of the sphere with equation
x2 + y 2 + z 2 − 2x − 4y − 6z − 2 = 0.
3. The distance of a point from a line is defined to be the length of the
perpendicular from the point to the line. Let the line in question have
parametric equation
r = p + λd
and let the position vector of the point be q. Show that the distance
of the point from the line is
‖d × (q − p)‖ / ‖d‖.
4. The distance of a point from a plane is defined to be the length of the
perpendicular to the plane. Let the position vector of the point be q
and the equation of the plane be (r − p) · n = 0. Show that the distance
of the point from the plane is
|(q − p) · n| / ‖n‖.
4.4 Summary of vectors
Inner products
Definition
Let a and b be two vectors. If a, b ≠ 0 then we define
a · b = ‖a‖ ‖b‖ cos θ
where θ is the angle between a and b. Note that this angle is always chosen
to be 0 ≤ θ ≤ π. If either a or b is zero then a · b is defined to be zero. We
call a · b the inner product of a and b.
Co-ordinate form
Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
a · b = a1 b 1 + a2 b 2 + a3 b 3 .
Uses
• The most important application is the following: if the vectors a and b
are non-zero then a · b = 0 precisely when a and b are orthogonal —
meaning ‘at right angles to each other’.
• The inner product can more generally be used to work out the angle
between two vectors:

cos θ = (a · b) / (‖a‖ ‖b‖)

where θ is the angle between the non-zero vectors a and b.
• The inner product can be used to work out the lengths of vectors:
‖a‖ = √(a · a).
Vector products
Definition
Let a and b be non-zero vectors. We define a new vector
a × b = ‖a‖ ‖b‖ sin θ n
where θ is the angle between a and b, and n is a unit vector at right angles to
the plane containing a and b — this determines n up to sign: we choose the
direction of n so that when rotating a to b in a clockwise direction through
the angle θ we are looking in the direction of n. If a or b is zero then a × b
is the zero vector. We call it the vector product of a and b.
Co-ordinate form
Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
a × b = | i  j  k  |
        | a1 a2 a3 |
        | b1 b2 b3 |

Uses
• The most important application of the vector product is in constructing
a vector orthogonal to two other vectors, and in particular in constructing a vector orthogonal to a plane — a vector normal to the plane.
• If the vectors a and b are non-zero then a × b = 0 precisely when a
and b are parallel to each other.
• The vector product can be used to calculate the sine of the angle between two vectors:

sin θ = ‖a × b‖ / (‖a‖ ‖b‖)

where θ is the angle between the non-zero vectors a and b.
Scalar triple products
Definition
Let a, b and c be three vectors. Then b × c is a vector. Thus a · (b × c)
is a scalar. We define
[a, b, c] = a · (b × c).
It is called the scalar triple product.
Co-ordinate form
Let a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and c = c1 i + c2 j + c3 k. Then
[a, b, c] = | a1 a2 a3 |
            | b1 b2 b3 |
            | c1 c2 c3 |

Uses
• The absolute value of [a, b, c] is the volume of the parallelepiped (‘squashed
box’) determined by the three vectors.
• The scalar triple product gives a geometric interpretation of 3 × 3 determinants.
4.5 *Two vector proofs*
This section will not be examined in 2013.
My development of the theory of vectors in this chapter depended on
two important results: Theorem 4.1.7 (iv), the fact that
a · (b + c) = a · b + a · c
and Theorem 4.1.10 (iii), the fact that
a × (b + c) = a × b + a × c.
I shall sketch out proofs of both of these results here. The proof of the first
is not too difficult.
Theorem 4.5.1.
a · (b + c) = a · b + a · c.
Proof. Let x and y be a pair of vectors. Then the component of x in the
direction of y, written comp(x, y), is by definition the number ‖x‖ cos θ, where
θ is the angle between x and y. Clearly

x · y = ‖y‖ comp(x, y).

Geometry shows (this means you should draw the pictures) that

comp(b + c, a) = comp(b, a) + comp(c, a).

We therefore have that

(b + c) · a = ‖a‖ comp(b + c, a)
            = ‖a‖ comp(b, a) + ‖a‖ comp(c, a)
            = b · a + c · a.
The proof of the second is hairier.
Theorem 4.5.2.
a × (b + c) = a × b + a × c.
Proof. We defined the vector product in terms of geometry and so we shall
have to prove this property by means of geometry. I shall sketch out a proof
following one given in Pettofrezzo’s book.
We begin with what is in effect a lemma. Let a and b be a pair of
vectors. It is convenient to move them so that they are both emanating from
the same point P . They determine a plane. In that plane, we can draw
a line perpendicular to the vector a and passing through the point P . We
project the vector b onto this line and we get a vector b′. We claim that
a × b = a × b′. The proof follows by observing that these two vectors clearly
have the same direction and a calculation shows that they have the same
length.
We now prove our theorem. We orientate ourselves so that the vector a
is at right angles to the page and pointing at you, the reader. We project the
vectors b and c onto the plane of the page to get the vectors b′ and c′.
We shall prove that

a × (b′ + c′) = a × b′ + a × c′.

Let's see first why this result is enough to prove the theorem. The vectors a
and b + c define a plane. As in our lemma above, we have that a × (b + c) =
a × (b + c)′. Also a × b = a × b′ and a × c = a × c′. As long as (b + c)′ = b′ + c′,
our theorem will follow.

We now prove that

a × (b′ + c′) = a × b′ + a × c′.

Now, by the way we have defined our vectors, a × b′ and a × c′ are in the
plane of the page and are orthogonal to b′ and c′, respectively. This leads to
the crux of the proof: the angle between a × b′ and a × c′ is the same as the
angle between b′ and c′. The point is that because a is pointing out of the
page, the operator a × − has the effect of rotating vectors by a right angle
in the plane of the page.

It follows that a × b′ + a × c′ is at right angles to b′ + c′. Thus a × b′ + a × c′
and a × (b′ + c′) are vectors pointing in the same direction.

We now compare the lengths of these two vectors. We shall use the fact
that the triangle formed by the vectors a × b′ and a × c′ is similar to the
triangle formed by the vectors b′ and c′. Thus

‖a × b′ + a × c′‖ / ‖b′ + c′‖ = ‖a × b′‖ / ‖b′‖.
But this works out to give that
‖a × b′ + a × c′‖ = ‖a‖ ‖b′ + c′‖.
Our claim is now proved.
4.6 *Quaternions*
This section will not be examined in 2013.
The set of quaternions, denoted by H, was invented by the Irish mathematician Sir William Rowan Hamilton in 1843. They are 4-dimensional generalisations of the complex numbers. It was from the theory of quaternions
that the modern theory of vectors with inner and vector products developed.
To describe what they are, I shall reverse history and derive them from vectors. Recall the following from some earlier exercises; the notation is slightly
different. The Pauli matrices are: I, X, Y, Z, −I, −X, −Y, −Z where
X = (  0  1 )     Y = ( i   0 )     Z = (  0  −i )
    ( −1  0 )         ( 0  −i )         ( −i   0 )
where i is the complex number i. You were asked to show that the product
of any two Pauli matrices is again a Pauli matrix by completing a Cayley
table. We shall just need a portion of that Cayley table relating to X, Y and
Z. This is
     X    Y    Z
X   −I    Z   −Y
Y   −Z   −I    X
Z    Y   −X   −I
We shall now consider matrices of the form

λI + αX + βY + γZ

where λ, α, β, γ ∈ R. We calculate the product of two such matrices using the
distributivity and scalar multiplication properties of matrix multiplication
and the above multiplication table. The product

(λI + αX + βY + γZ)(µI + α′X + β′Y + γ′Z)
can be written in the form aI + bX + cY + dZ where a, b, c, d ∈ R, although
I shall write it in a slightly different form:

(λµ − αα′ − ββ′ − γγ′)I
+ λ(α′X + β′Y + γ′Z) + µ(αX + βY + γZ)
+ (βγ′ − γβ′)X + (γα′ − αγ′)Y + (αβ′ − βα′)Z.
Although this looks complicated there are some familiar things within it:
the first term contains what looks like an inner product and the last term
contains what looks like a vector product. Note that because this is matrix
multiplication this operation is associative.
The above calculation motivates the following construction. Let E3 denote the set of all 3-dimensional vectors. Thus a typical element of E3 is
αi + βj + γk. Put
H = R × E3 .
The elements of H are therefore ordered pairs (λ, a) consisting of a real
number λ and a vector a. We define the sum of two elements of H in a very
simple way:

(λ, a) + (µ, a′) = (λ + µ, a + a′).
The product is defined in a way that mimics what I did above (you should
check this):

(λ, a)(µ, a′) = (λµ − a · a′, λa′ + µa + (a × a′)).

It follows that this product is associative!
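The product just defined can be checked mechanically. Here is a minimal Python sketch (the helper names `qmul`, `dot` and `cross` are mine, not the text's) that encodes a quaternion as a pair (λ, a), verifies the relations i² = −1 and ij = k, and spot-checks associativity on one triple:

```python
# A sketch of quaternion multiplication as defined above: a quaternion is a
# pair (lam, a) of a real number and a 3-vector, stored as a Python tuple.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def qmul(p, q):
    (lam, a), (mu, b) = p, q
    scalar = lam*mu - dot(a, b)                      # the inner-product part
    c = cross(a, b)                                  # the vector-product part
    vector = tuple(lam*b[i] + mu*a[i] + c[i] for i in range(3))
    return (scalar, vector)

i = (0, (1, 0, 0))
j = (0, (0, 1, 0))
k = (0, (0, 0, 1))

# The defining relations: i^2 = -1 and ij = k.
print(qmul(i, i))        # (-1, (0, 0, 0))
print(qmul(i, j) == k)   # True

# Associativity, checked on one triple:
p, q, r = (1, (2, 0, 1)), (0, (1, 3, -1)), (2, (0, 1, 4))
print(qmul(qmul(p, q), r) == qmul(p, qmul(q, r)))   # True
```

The numerical check is of course no substitute for the matrix argument above, which is what actually proves associativity.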
We shall now investigate what we can do with H. I shall only deal with
multiplication because addition poses no problems.
• Consider the subset R of H which consists of elements of the form
(λ, 0). You can check that (λ, 0)(µ, 0) = (λµ, 0). Thus R mimics the
real numbers.
• Consider the subset C of H which consists of the elements of the form
(λ, ai). You can check that

(λ, ai)(µ, a′i) = (λµ − aa′, (λa′ + µa)i).

In particular, (0, i)(0, i) = (−1, 0). Thus C mimics the set of complex
numbers.
• Consider the subset E of H which consists of elements of the form (0, a).
You can check that

(0, a)(0, a′) = (−a · a′, a × a′).

Thus E mimics vectors, the inner product and the vector product.
The set H with the above operations of addition and multiplication is
the set of quaternions. This structure pulls together most of the important
elements of this course: complex numbers, vectors and matrices.
4.7 Learning outcomes for Chapter 4
• Compute with vectors using scalar products, vector products, and
scalar triple products.
• Find the equation of the unique line determined by two points in space
or a point and a direction.
• Find the equation of the unique plane determined by three points in
space or a point and two directions.
• Find the equation of the unique plane determined by a point in the
plane and the normal vector.
• Find volumes of parallelepipeds using scalar triple products.
4.8 Further reading and exercises
The material in this chapter usually causes problems. The reason, I think,
is that it requires you to think both geometrically and algebraically. I would
strongly recommend Chapter 11 of Olive for further reading, and possibly
Chapter 5 of Hirst and Singerman. If you would like to learn about vectors
and matrices with an antipodean flavour, I recommend the first six chapters
of David Easdown’s book A first course in linear algebra, Pearson Education
Australia, 2008 which also comes with an accompanying DVD.
Chapter 5
Counting
Counting seems such an easy process that a chapter devoted to it might
appear unnecessary, but counting is a lot harder than it looks, and it
lies behind probability theory. The main goal of this chapter is not so much
the results themselves, which are important, but the methods used to prove
them.
5.1 More set theory
The main tool needed to count sets is set theory itself. In this section, we
describe some constructions that will be the basis of some useful counting
techniques.
5.1.1 Operations on sets
There are three operations defined on sets using the words ‘and’, ‘or’ and
‘not’. Let A and B be sets.
We can construct a new set, called the intersection of A and B and
denoted by A ∩ B, whose elements consist of all those elements that belong
to both A and to B.
We can construct a new set, called the union of A and B and denoted by
A ∪ B, whose elements consist of all the elements of A together with all the
elements of B. In constructing unions of sets, we remember that repetitions
don’t count.
We can construct a new set, called the difference or relative complement
of A and B and denoted by A \ B,¹ whose elements consist of all those
elements that belong to A but not to B.
Warning! The word ‘or’ in mathematics doesn’t mean quite the same as
it does in everyday life. Thus ‘X or Y ’ means ‘X or Y or both’, whereas
in everyday life, we assume that ‘or’ means ‘exclusive or’: that is, ‘X or Y ’
means ‘X or Y but not both’.
When illustrating definitions involving sets or trying to gain an idea of
what’s true about sets in general, we often use Venn diagrams. In a Venn
diagram, a set is represented by a region in the plane. The intersection of two
sets can then be represented by the overlap of the two regions representing
each of the sets, and the union is represented by the region enclosed by both
regions. Although Venn diagrams cannot be used to prove results about sets,
they are a nice way of visualising sets and their properties.²
¹ Sometimes denoted by A − B.
² With thanks to Simone Rea for the Venn diagrams.
[Venn diagram: A ∩ B, the overlap of the regions A and B.]

[Venn diagram: A ∪ B, the region covered by A together with B.]

[Venn diagram: A \ B, the part of A lying outside B.]
Example 5.1.1. Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Find A∩B, A∪B,
A \ B and B \ A.
Let’s begin with A ∩ B. We have to find the elements that belong to both
A and B. We start with the elements in A and work left-to-right: 1 doesn’t
belong to B; 2 doesn’t belong to B; 3 does belong to B as does 4. Thus
A ∩ B = {3, 4}.
To find A ∪ B we join the two sets together {1, 2, 3, 4, 3, 4, 5, 6} and then
read from left-to-right weeding out repetitions to get A∪B = {1, 2, 3, 4, 5, 6}.
To calculate A \ B we have to find the elements of A that don't belong
to B. So read the elements of A from left-to-right, comparing them with the
elements of B: 1 doesn't belong to B; 2 doesn't belong to B; but 3 and 4 do
belong to B. It follows that A \ B = {1, 2}.
To calculate B \ A we have to find the set of elements of B that don’t
belong to A: this set is just {5, 6}.
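Python's built-in set type implements these three operations directly, so Example 5.1.1 can be checked at a prompt; this is an illustration, not part of the text:

```python
# Example 5.1.1 redone with Python sets; &, | and - are intersection,
# union and difference respectively.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
print(A & B)   # {3, 4}
print(A | B)   # {1, 2, 3, 4, 5, 6}
print(A - B)   # {1, 2}
print(B - A)   # {5, 6}
```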
Examples 5.1.2.
(i) E ∩ O = ∅. This says that there is no number which is both odd and
even.
(ii) P ∩ E = {2}. This says that the only even prime number is 2.
(iii) E ∪ O = N. This says that every natural number is either odd or even.
5.1.2 Partitions
The sets A and B are said to be disjoint if A ∩ B = ∅. If A and B are disjoint
then their union is called a disjoint union. If A1, . . . , An is a collection of sets
such that Ai ∩ Aj = ∅ when i ≠ j then we say they are pairwise disjoint.
Let A be a set. A partition P = {A1 , . . . , An } of A is a set whose elements
consist of non-empty subsets A1 , . . . , An of A which are pairwise disjoint and
whose union is A. The subsets Ai are called the blocks of the partition.3
Examples 5.1.3.
(i) The set
P = {{a, c}, {b}, {d, e, f }}
³ The number of partitions of a set with n elements is called the nth Bell number.
is a partition of the set X = {a, b, c, d, e, f }. In this case there are three
blocks in the partition.
(ii) For statistical purposes a group (i.e. a set) of people might be partitioned
by age: the blocks might be the under 18s, the 18 to 35 year olds, the 36 to 49
year olds, and those aged 50 and over. Here we have 4 blocks.
(iii) How many partitions does the set X = {a, b, c} have? There is 1 partition with 1 block
{{a, b, c}},
there are 3 partitions with 2 blocks
{{a}, {b, c}}, {{b}, {a, c}}, {{c}, {a, b}},
there is 1 partition with 3 blocks
{{a}, {b}, {c}}.
There are therefore 5 partitions of the set X.
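The count of 5 partitions can be reproduced by a short recursion: every partition of X either puts a chosen element in a block of its own or adds it to some block of a partition of the remaining elements. A Python sketch (the function name `partitions` is mine, not the text's):

```python
# Enumerate all partitions of a list of distinct elements.
def partitions(xs):
    if not xs:
        yield []          # the empty set has exactly one partition
        return
    first, rest = xs[0], xs[1:]
    for p in partitions(rest):
        # first as a singleton block
        yield [[first]] + p
        # first added to each existing block in turn
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

for p in partitions(['a', 'b', 'c']):
    print(p)
print(sum(1 for _ in partitions(['a', 'b', 'c'])))   # 5, the 3rd Bell number
```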
5.1.3 Sequences
In a set, the order the elements occur is irrelevant and repetitions are ignored.
But there are plenty of occasions when order does matter and repetitions are
required. On such occasions, we cannot use sets, we use instead a sequence
or list. The list of length n
(a1 , a2 , . . . , an )
consists of n entries where the first element is a1 , the second is a2 and so
on. A list with two entries (a, b) is called an ordered pair.⁴ Observe that
(a, b) ≠ (b, a) unless a = b, so order matters, and (a, a) is different from a on
its own, so repetition matters.
its own, so repetition matters. The element a is called the first component
and the element b is called the second component. We can also define ordered
triples, which look like (a, b, c), and more generally ordered n-tuples, which
look like (a1 , . . . , an ). Ordered n-tuples are just lists of length n.
⁴ This notation should not be confused with the notation for real intervals, where (a, b)
denotes the set {r : r ∈ R and a < r < b}, nor with the use of brackets in clarifying the
meaning of algebraic expressions. The context should make clear what is intended.
If A and B are sets then the set A × B, called the product of A by B, is
defined to be the following set:
A × B = {(a, b) : a ∈ A and b ∈ B},
the set of all ordered pairs where the first component comes from A and the
second component comes from B.
Example 5.1.4. For example, if A = {1, 2} and B = {a, b, c} then
A × B = {(1, a), (1, b), (1, c), (2, a), (2, b), (2, c)}.
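In Python, `itertools.product` enumerates A × B in exactly this order, so Example 5.1.4 can be checked directly:

```python
from itertools import product

# Example 5.1.4: the product of A = {1, 2} by B = {a, b, c}.
A = [1, 2]
B = ['a', 'b', 'c']
print(list(product(A, B)))
# [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')]
```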
More generally, we can define A × B × C to consist of all ordered triples
where the first component comes from A, the second from B and the third
from C. Yet more generally, we can define A1 × . . . × An to consist of all
n-tuples where the ith component comes from Ai .
Example 5.1.5. Dates consist of three pieces of information: the day, the
month and the year, which in the UK are usually stated in that order (this
is different in the US). So dates are ordered triples
(day, month, year)
where day ∈ {1, . . . , 31}, month ∈ {January, . . . , December} and year ∈ N
(I’m assuming here only AD years occur).
Let A be a set. Then we can form the sets A × A, A × A × A and so on.
These are usually abbreviated as A^2, A^3 and so on. Thus we have the sets R,
R^2 and R^3, which represent the real line, the real plane and real Euclidean space,
respectively.
Example 5.1.6. British car registration plates are 7-tuples consisting of two
letters, followed by two digits, followed by three letters. If we denote the set
of uppercase English letters by L and the set of digits by D = {0, 1, 2, . . . , 9}
then every registration plate is an element of the set
L × L × D × D × L × L × L.
In fact, not every such 7-tuple is allowable. To be more precise, we should
write

(L \ {I, Q, Z})^2 × D^2 × (L \ {I, Q})^3
where I have used powers of sets as an abbreviation. There are further
restrictions on the last three letters to avoid the use of taboo words. This
too could be expressed by means of set difference. In other words, we can
use set notation to give a precise definition of the form taken by allowable
registration plates.
Exercises 5.1
1. Let S = {4, 7, 8, 10, 23}, T = {5, 7, 10, 14, 20, 25} and V = {2, 5, 10, 20, 30, 36}.
What are S ∪ (T ∩ V ), S \ (T ∩ V ) and (S ∩ T ) \ V ?
2. Let A = {a, b, c, d, e, f } and B = {g, h, k, d, e, f }. What are the elements of the set A \ ((A ∪ B) \ (A ∩ B))?
3. Write down the elements in the set {A, B, C} × {a, b}.
4. Let A = {1, 2, 3} and B = {a, b, c}. What is the set
(A × B) \ (({1} × B) ∪ (A × {c}))?
5. Which of the following are partitions of the set X = {1, 2, . . . , 9}? For
those which are not partitions, explain why they fail.
(i) {{1, 3, 5}, {2, 6}, {4, 8, 9}}.
(ii) {{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}}.
(iii) {{1, 3, 5}, {2, 4, 6, 8}, {7, 9}}.
5.2 Ways of counting
The counting principles introduced in this section are of fundamental importance.
5.2.1 Counting principles
There are many occasions in mathematics when we have to count the number
of elements in some set. In this section, I have gathered together some of the
most important results on counting. Recall that the number of elements in
a set X is called the cardinality of X, denoted by |X|.
Examples 5.2.1.
(i) |∅| = 0.
(ii) |{♦, ♣, ♥, ♠}| = 4.
(iii) |{a, b, c . . . , x, y, z}| = 26.
There are two general principles that help in counting sets.
I shall say that there is a one-to-one correspondence between the set A
and the set B if each element of A can be paired off with a unique element
in B, and if each element of B is thereby paired off with a unique element in
A. We also say that there is a bijection between A and B.5
1. Correspondence principle (really a definition) There is a one-to-one correspondence between the set A and the set B if and only if |A| = |B|.
Example 5.2.2. The number of people (legally) at a football game is the
same as the number of tickets sold.
2. Partition counting principle Let A1, . . . , An be the blocks of a partition of A. Then |A| = |A1| + . . . + |An|.
Example 5.2.3. To count the number of people in the UK, count the number
of people (at a fixed time!) in each of the counties, and then add them up.
We shall now show how to use our two principles to count various kinds
of sets.
⁵ If there is an injection from A to B then |A| ≤ |B|, and if there is a surjection from A onto B then |A| ≥ |B|.
5.2.2 Counting sequences
We can apply our counting principles to counting lists which will lead to a
third counting principle.
Proposition 5.2.4 (Product counting principle). Let A and B be sets. Then
|A × B| = |A| |B|.

More generally, if there are n sets A1, . . . , An then

|A1 × . . . × An| = |A1| · · · |An|.
Proof. The set A × B is the union of the disjoint sets {a} × B where a ∈ A.
Now |{a} × B| = |B| because there is a one-to-one correspondence between
the elements of {a} × B and the elements of B. Thus A × B is the disjoint
union of |A| sets each with |B| elements. Therefore |A × B| = |A| |B|, as
required. The more general result can be proved by induction (proof omitted).
Example 5.2.5. Let A = {1, 2, 3} and B = {α, β}. Then
A × B = {(1, α), (1, β), (2, α), (2, β), (3, α), (3, β)},
and we can see that this set contains 6 elements. On the other hand, using the
proposition above, we have that |A| = 3 and |B| = 2 and so |A × B| = 3 · 2 = 6,
which is what we expect.
A special case of the above proposition occurs when A = A1 = . . . = An,
in which case

|A^n| = |A|^n.

This result is useful in counting lists. Let A be a set. Then a list with k
entries each from A is an element of A^k. Thus the number of lists of length
k with entries from A is |A|^k.
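Both counts can be verified by brute force in Python: `itertools.product(A, B)` builds A × B, and `product(A, repeat=k)` builds the set of lists of length k over A.

```python
from itertools import product

# The product counting principle, checked by enumeration:
# |A x B| = |A| |B|, and the number of length-k lists over A is |A|^k.
A = [1, 2, 3]
B = ['x', 'y']
assert len(list(product(A, B))) == len(A) * len(B)       # 6
k = 4
assert len(list(product(A, repeat=k))) == len(A) ** k    # 81
print("both counts agree")
```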
5.2.3 The power set
The set whose elements are all the subsets of X is called the power set of X
and is denoted by P(X).
Warning! The power set of a set X contains both ∅ and X as elements.
Example 5.2.6. Let's find all the subsets of the set X = {a, b, c}. First we
have the subset with no elements, the empty set. Then we have the subsets
that contain exactly one element: {a}, {b}, {c}. Then the subsets containing
exactly two elements: {a, b}, {a, c}, {b, c}. Finally, we have the whole set X.
It follows that X has 8 subsets and
P(X) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, X}.
Proposition 5.2.7. Let X be a finite set with n elements. Then

|P(X)| = 2^n.
Proof. List the elements of X in any order. A subset of X is determined by
saying which elements of X are to be in the subset and which are not; we
can indicate these elements by writing a 1 above an element of X in the list
if it is in the subset, and a 0 above the element in the list if it is not in the
subset. Thus a subset determines a sequence of 0’s and 1’s of length n where
the 1’s tell you which elements of X are to appear and the 0’s tell you which
elements of X are to be omitted. This is clearly a one-to-one correspondence
between the set of subsets of X and the set of all sequences of 0’s and 1’s
of length n. Thus the number of subsets of X is the same as the number
of lists of length n where each component of the list is taken from the set
{0, 1}. Thus the number of subsets of X is the same as the cardinality of the
set {0, 1}^n, but this is equal to 2^n. Thus

|P(X)| = 2^n,

as required.
Example 5.2.8. We can illustrate the above proof by looking at the example
where X = {a, b, c}. We list the elements of X as (a, b, c). The empty set
corresponds to the sequence (0, 0, 0) and the whole set to (1, 1, 1). The subset
{b, c} corresponds to the sequence (0, 1, 1).
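The bijection in the proof is easy to implement: the binary digits of the numbers 0, 1, . . . , 2^n − 1 are precisely the 0/1 sequences of length n. A Python sketch (the name `power_set` is mine, not the text's):

```python
# Build the power set of a list of distinct elements using the 0/1-sequence
# bijection from the proof of Proposition 5.2.7.
def power_set(xs):
    n = len(xs)
    subsets = []
    for bits in range(2 ** n):                    # each number 0 .. 2^n - 1
        subsets.append({xs[i] for i in range(n)   # keep xs[i] iff bit i is 1
                        if (bits >> i) & 1})
    return subsets

P = power_set(['a', 'b', 'c'])
print(len(P))    # 8 = 2^3
```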
5.2.4 Counting arrangements: permutations
Let X be an n-element set. We are interested in calculating the number of
lists of length n of elements of X where this time there are no repetitions.
We call such a list an n-permutation or just a permutation of X.
Example 5.2.9. Let X = {a, b, c}. The permutations which arise are
(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).
Thus the number of permutations of a 3-element set is 6.
Define 0! = 1 and

n! = n · (n − 1)!

when n ≥ 1. The number n! is called n factorial.
Proposition 5.2.10. Let X be an n-element set. The number of permutations of length n is n!.
Proof. Let Pn stand for the set of all permutations of an n-element set. Each
permutation of X will begin with one of the elements of X. Thus the set of
all permutations of X, Pn, is partitioned into n blocks, each block consisting
of all permutations which begin with one of the elements of X. If we take one
of these blocks, and remove the first element from each of the permutations
in that block, the set which results is a set of permutations of an (n − 1)-element
set. Thus the number of permutations of an n-element set is equal
to n times the number of permutations of an (n − 1)-element set. That is

|Pn| = n |Pn−1|.
We observe that the cardinality of P1 is just 1. The result now follows.
Let k ≤ n. By a k-permutation we simply mean a list of length k without
repetition whose components are drawn from a set with n elements. Thus
an n-permutation is just what we have defined to be a permutation.
Example 5.2.11. Let’s calculate the 2-permutations of elements from {a, b, c}.
They are
(a, b), (b, a), (a, c), (c, a), (b, c), (c, b).
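In Python, `itertools.permutations(xs, k)` enumerates exactly the k-permutations, so Example 5.2.11 and the count n!/(n − k)! can be compared directly:

```python
from itertools import permutations
from math import factorial

# Example 5.2.11: the 2-permutations of {a, b, c}.
perms = list(permutations(['a', 'b', 'c'], 2))
print(perms)                                            # six ordered pairs
print(len(perms) == factorial(3) // factorial(3 - 2))   # True: 3!/1! = 6
```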
Proposition 5.2.12. The number of k-permutations of n elements is n!/(n − k)!.

Proof. The proof of this just modifies the argument given in the above proposition.

Notation The number n!/(n − k)! is sometimes written nPk. This can be read:
the number of ways of ranking k elements chosen from n objects.
Example 5.2.13. For example, let’s list the 2-permutations from the set
{1, 2, 3, 4}. We obtain
(1, 2), (1, 3), (1, 4)
and
(2, 1), (2, 3), (2, 4)
and
(3, 1), (3, 2), (3, 4)
and
(4, 1), (4, 2), (4, 3).
On the other hand, our formula above tells us that we should have 4!/2! = 12,
which checks out.

5.2.5 Counting choices: combinations
Let A be a set where |A| = n. A subset of A with k elements is called a
k-subset. It is also often called a combination of k objects.
Example 5.2.14. Let A = {a, b, c, d}. Let’s find the 2-subsets of A. These
are just {a, b}, {a, c}, {a, d} and {b, c}, {b, d}, {c, d}. That is, 6 altogether.
If X has n elements then there will be one 0-subset, one n-subset, and
then various numbers of 1-subsets, 2-subsets, . . . , and (n − 1)-subsets. Denote
the number of k-subsets of an n-element set by

\binom{n}{k},

pronounced ‘n choose k’.
Notation The number \binom{n}{k} is sometimes written nCk and is read: the number
of ways of choosing k objects from n objects.
Example 5.2.15. Let X = {a, b, c}. There is one 0-subset namely ∅ and
one 3-subset namely X. The 1-subsets are: {a}, {b}, {c} and so there are
three of them. The 2-subsets are: {a, b}, {a, c}, {b, c} and so there are three
of them. Observe that 1 + 3 + 3 + 1 = 8 = 2^3.
Proposition 5.2.16. Let 0 ≤ k ≤ n. Then

\binom{n}{k} = n!/(k!(n − k)!).
Proof. Let P be the set of all k-permutations of a set with n elements. Partition
this set by putting two such permutations into the same block if they permute
the same set of k elements. Each block contains k! elements. It follows that
the number of blocks is |P|/k!. However, there is a bijective correspondence
between the set of blocks and the set of k-subsets. The result now follows.
Example 5.2.17. Let's now use this formula to calculate the number of
2-subsets of a 4-element set. This is just

\binom{4}{2} = 6,

which is what we found by explicitly finding them.
Numbers of the form \binom{n}{k} are called binomial coefficients.
Remark When calculating \binom{n}{k}, remember that in general a lot of
cancellation occurs. For example,

\binom{100}{98} = 100!/(98! 2!) = (100 · 99 · 98!)/(98! · 2) = 50 · 99 = 4,950.

It would be silly to actually calculate 100! first.
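Python's `math.comb` performs this computation (handling the cancellation internally), so the remark can be checked:

```python
from math import comb, factorial

# The cancellation in the Remark, checked three ways: 100 choose 98.
print(comb(100, 98))                                       # 4950
print(factorial(100) // (factorial(98) * factorial(2)))    # 4950, the long way
print(50 * 99)                                             # 4950 again
```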
Example 5.2.18. Direct calculation shows that

\binom{n}{k} = \binom{n}{n − k}
but we can explain why this is true in terms of counting subsets of a set. Every
time I choose a subset with k elements I am simultaneously not choosing a
subset with n − k elements. There is therefore a one-to-one correspondence
between the subsets with k elements and the subsets with n − k elements. It
follows by the Correspondence Principle that there must be the same number
of k-subsets as there are n − k-subsets.
5.2.6 Examples of counting
In questions involving counting, ask the following questions and then use the
formulae indicated. The number of objects we make our choice from is n,
and the number of objects being chosen in some manner is k.
Order matters?   Repetition allowed?   Terminology     Number
yes              yes                   sequences       n^k
yes              no                    permutations    n!/(n − k)!
no               no                    combinations    \binom{n}{k} = n!/(k!(n − k)!)
no               yes                   not discussed   –
Examples 5.2.19.
(i) In the lottery, 6 distinct numbers are chosen from the range 1 to 49. How
many ways can this be done? Order is not important and repetitions
are not allowed and so the solution is

\binom{49}{6} = 13,983,816.
(ii) There are 10 contestants in a race. Assuming no ties, how many possible
outcomes of the race are there? Here order matters and repetition is not
allowed. Thus the solution is 10! = 3, 628, 800. (Remember: 0! = 1).
(iii) A committee of 9 people has to elect a chairman, secretary and treasurer
(assumed all different). In how many ways can this be done? There is
an implicit order here: we are not just electing 3 people, we are electing
3 people to specific offices (which we could call ‘office 1’, ‘office 2’ and
‘office 3’). Thus order matters but repetition is not allowed and so the
solution is 9 × 8 × 7 = 504 ways.
(iv) Given the digits 1, 2, 3, 4, 5, how many 4-digit numbers can be formed
if repetition is allowed? We are just counting sequences and so the
solution is 5^4 = 625.
(v) The average novel has 250 pages, each page has 45 lines, and each line
consists of about 60 symbols. The symbols are upper and lower case
letters and punctuation symbols: say about 60 in total. How many
possible novels are there? We allow avant garde novels that consist of
nonsense words or are blank. Think of a novel as a sequence of symbols:
it will be
250 × 45 × 60 = 675, 000
symbols long. But each symbol can be one of 60 possibilities and so
the number of possible novels is

60^675000.

It's more convenient to write this as a power of 10 and we get, approximately,

10^(10^6)

possible novels. For comparison purposes, the number of atoms in the
universe is estimated to be 10^80.
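All the counts in Examples 5.2.19 can be confirmed with the Python standard library (assuming Python 3.8+, where `math.comb` and `math.perm` exist):

```python
from math import comb, factorial, perm

# Checking the counts in Examples 5.2.19.
print(comb(49, 6))        # (i)   13983816 lottery draws
print(factorial(10))      # (ii)  3628800 race outcomes
print(perm(9, 3))         # (iii) 504 ways to fill the three offices
print(5 ** 4)             # (iv)  625 four-digit strings over {1,...,5}
```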
Exercises 5.2
1. (i) A menu consists of 2 starters, 3 main courses and 4 drinks. How
many possible dinners are there consisting of one starter, one main
course and one drink? Explain your answer using products of sets.
(ii) For the purposes of this question, a date consists of an ordered triple
consisting of the following three components: first component a
natural number d in the range 1 ≤ d ≤ 31; second component a
natural number m in the range 1 ≤ m ≤ 12; third component a
natural number y in the range 1 ≤ y ≤ 3000. How many possible
dates are there?
(iii) In how many ways can 10 books be arranged on a shelf?
(iv) 8 cars are to be ranked first, second and third. In how many ways
can this be done?
(v) In how many ways can a hand of 13 cards be chosen from a pack
of 52 cards?
(vi) In how many ways can a committee of 4 people be chosen from 10
candidates?
2. Let A and B be any finite sets. Prove that

|A ∪ B| = |A| + |B| − |A ∩ B|.

3. Prove, using results about sets, that for n ≥ r ≥ 1, we have

\binom{n+1}{r} = \binom{n}{r−1} + \binom{n}{r}.
5.3 The binomial theorem
The goal of this section is to prove an important result in algebra using
what we have learnt about counting. We know how to calculate x^n, where
x is called a monomial. In this section, we shall describe how to calculate
(x + y)^n, where n is any natural number, in terms of powers of x and y. The
expression x + y is called a binomial since it consists of two terms. Let's look
at how this expression expands for n = 0, 1, 2, 3, 4. We have that

(x + y)^0 = 1
(x + y)^1 = 1x + 1y
(x + y)^2 = 1x^2 + 2xy + 1y^2
(x + y)^3 = 1x^3 + 3x^2 y + 3xy^2 + 1y^3
(x + y)^4 = 1x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + 1y^4
I have highlighted the coefficients that arise, including putting in unity.
These coefficients form what is known as Pascal’s triangle. Observe that
each row can be obtained from the preceding row as follows: apart from the
1’s at each end, each entry in row i + 1 is the sum of two entries in row i,
specifically the two numbers above to the left and right. We shall explain
why this works later. Let's look at the last row I have written. The numbers

1, 4, 6, 4, 1
are precisely the numbers

\binom{4}{0}, \binom{4}{1}, \binom{4}{2}, \binom{4}{3}, \binom{4}{4}.

We may therefore write

(x + y)^4 = \sum_{i=0}^{4} \binom{4}{i} x^{4−i} y^i.
The following theorem says that this result is true for any n not just for
n = 4.
Theorem 5.3.1 (The Binomial Theorem). For any natural number n, we
have that

(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n−i} y^i.
Proof. This is often proved by induction, but I want to give a more conceptual
proof. I shall also look at a special case to explain the idea. Let’s calculate
(x + y)(x + y)(x + y)
in great detail. Multiplying out the brackets, but before we carry out any
simplifications, we get
(x + y)(x + y)(x + y) = xxx + yxx + xyx + xxy + xyy + yxy + yyx + yyy.
There are 8 summands⁶ here and each summand is a sequence of x’s and y’s
of length 3. When we simplify, all summands containing the same number of
x’s are collected together. How many summands are there containing i x’s?
Clearly

\binom{n}{i}.

All summands containing i x’s can be simplified to look like

x^{n−i} y^i.

The result now follows by generalising this argument.
The result now follows by generalising this argument.
⁶ A summand is something being added in a sum.
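The multiplying-out argument can be simulated in Python: represent (x + y)^n by a dict sending the power of x in a summand to the number of summands with that power, multiply by (x + y) n times, and compare with the binomial coefficients. (The function name `expand_power` is mine, not the text's.)

```python
from math import comb

# Expand (x + y)^n by repeated multiplication and collect coefficients
# by the power of x (the power of y is then n minus it).
def expand_power(n):
    coeffs = {0: 1}                      # (x + y)^0 = 1
    for _ in range(n):                   # multiply by (x + y) once per step
        new = {}
        for i, c in coeffs.items():
            new[i + 1] = new.get(i + 1, 0) + c   # the summands picking x
            new[i] = new.get(i, 0) + c           # the summands picking y
        coeffs = new
    return coeffs

n = 7
assert expand_power(n) == {i: comb(n, i) for i in range(n + 1)}
print("coefficients match the binomial theorem for n =", n)
```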
Thus the numbers in Pascal’s triangle are just the binomial coefficients.
The explanation for the rule used in calculating successive rows of Pascal’s
triangle follows from the following lemma. The proof is left as an exercise in
algebraic manipulation.
Lemma 5.3.2. Let n ≥ r ≥ 1. Then

\binom{n+1}{r} = \binom{n}{r−1} + \binom{n}{r}.
One important application of the binomial theorem, which plays a role in
calculus, is in estimating (x + h)n when h is small. I shall illustrate this by
means of an example.
Example 5.3.3. Let x be a real number and let h be small, meaning 0 <
h < 1 with the idea that h is a lot smaller than 1. Let's calculate (x + h)^4.
By the Binomial Theorem we have that

(x + h)^4 = x^4 + 4x^3 h + 6x^2 h^2 + 4xh^3 + h^4.

Now, if h is much smaller than 1 then each of h^2, h^3, h^4 will be very much
smaller than 1. For example, if h = 0.2 then h^2 = 0.04 and h^3 = 0.008
and so on. We may therefore write

(x + h)^4 ≈ x^4 + 4x^3 h,

where the symbol ≈ means ‘approximately equal to’, when h is small.
Let's see how good this approximation is by calculating a specific example.
Calculate (2.00321)^4 approximately. By our argument above

(2.00321)^4 ≈ 2^4 + 4 × 2^3 × 0.00321 = 16.10272.

Calculating (2.00321)^4 exactly, we get

(2.00321)^4 = 16.10296756 (to 8 decimal places).
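A quick numerical check of the approximation in Example 5.3.3:

```python
# (x + h)^4 is close to x^4 + 4x^3 h when h is small: the error is of
# order h^2, here about 6 * x^2 * h^2.
x, h = 2, 0.00321
exact = (x + h) ** 4
approx = x ** 4 + 4 * x ** 3 * h
print(exact)                          # about 16.10297
print(approx)                         # about 16.10272
print(abs(exact - approx) < 1e-3)     # True
```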
The argument above is used to calculate the derivative of the function
x ↦ x^n when n is a positive integer.
Remark Remember to use ≈ and not = when approximating numbers.
Experience has shown that students often have problems with the binomial theorem. Here are some points to bear in mind:
• Unless the power you have to calculate is small, the binomial theorem
should always be used and not Pascal’s triangle.
• Always write down the theorem so you have something to work with:

(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n−i}.

Observe that there is a plus sign between the two terms in the brackets.

• \binom{n}{i} = \binom{n}{n−i}.
• What you call x and what you call y doesn’t matter.
Example 5.3.4. Calculate the constant term of

(3x^2 − 1/(2x))^9.

Observe first that the term in the brackets is 3x^2 + (−1/(2x)); thus X = 3x^2
and Y = −1/(2x). We can now expand using the binomial theorem and simplify
using the properties of exponents:

(3x^2 − 1/(2x))^9 = (3x^2 + (−1/(2x)))^9                                   (5.1)
                  = (X + Y)^9                                              (5.2)
                  = \sum_{i=0}^{9} \binom{9}{i} X^{9−i} Y^i                (5.3)
                  = \sum_{i=0}^{9} \binom{9}{i} (3x^2)^{9−i} (−1/(2x))^i   (5.4)
                  = \sum_{i=0}^{9} \binom{9}{i} (3x^2)^{9−i} (−2x)^{−i}    (5.5)
                  = \sum_{i=0}^{9} \binom{9}{i} (3x^2)^{9−i} (−2)^{−i} x^{−i}        (5.6)
                  = \sum_{i=0}^{9} \binom{9}{i} 3^{9−i} x^{18−2i} (−2)^{−i} x^{−i}   (5.7)
                  = \sum_{i=0}^{9} \binom{9}{i} 3^{9−i} (−2)^{−i} x^{18−2i} x^{−i}   (5.8)
                  = \sum_{i=0}^{9} \binom{9}{i} 3^{9−i} (−2)^{−i} x^{18−3i}          (5.9)
                  = \sum_{i=0}^{9} \binom{9}{i} (3^{9−i}/(−2)^i) x^{18−3i}.          (5.10)
Commentary

(5.1) The binomial theorem only applies to sums so the first step is to write
the difference as a sum.

(5.2) This step is not strictly necessary but you might find it helpful when
learning the Binomial Theorem.

(5.3) This is nothing other than the Binomial Theorem applied to (X + Y)^9.

(5.4) Now replace X by 3x^2 and Y by −1/(2x). It is important to observe that
there are brackets around these expressions and that the whole bracket
is raised to a power.

(5.5) Observe that −1/(2x) is the same as (−2x)^{−1}. We also use the fact that
(a^α)^β = a^{αβ}.

(5.6) This is where one of the commonest student errors creeps in. Observe
that (ab)^α = a^α b^α. It is a very common mistake to raise only one of
the terms in the brackets to the power.

(5.7) Here I have simply applied the rule (ab)^α = a^α b^α again.

(5.8) I have just rearranged and placed all the numbers together; their product
is the coefficient.

(5.9) I have used the result that a^α a^β = a^{α+β}.

(5.10) Not strictly necessary but I thought it looked better.
Once we have carried out the above computation we can find any coefficient
we want:

• The coefficient of x^{18−3i} is

\binom{9}{i} 3^{9−i}/(−2)^i.

• The constant term occurs when 18 − 3i = 0 and so i = 6. Thus the
constant term is

\binom{9}{6} 3^3/(−2)^6 = \binom{9}{6} 3^3/2^6 = 567/16.
Exercises 5.3
1. Write out (1 + x)^8 using sigma-notation.

2. Write out (1 − x)^8 using sigma-notation.
3. Calculate the coefficient of a^2 b^8 in (a + b)^10.

4. Calculate the coefficient of x^3 in (3 + 4x)^6.
5. Use the binomial theorem to prove the following.

(i) 2^n = \sum_{i=0}^{n} \binom{n}{i}.

(ii) 0 = \sum_{i=0}^{n} (−1)^i \binom{n}{i}.

(iii) (3/2)^n = \sum_{i=0}^{n} (1/2^i) \binom{n}{i}.

6. Prove, using results about sets, that

2^n = \sum_{i=0}^{n} \binom{n}{i}.
7. Use the binomial theorem to prove that \binom{2n}{n} = \sum_{i=0}^{n} \binom{n}{i}^2.
[Hint: calculate (x + y)^{2n} in two different ways.]

5.4 *An introduction to infinite numbers*
This section will not be examined in 2013.
We begin with a result important in the history of set theory. It is called
Russell’s Paradox.7 It is the first inkling that the intuitively plausible idea
of a set may contain hidden depths.
Theorem 5.4.1. The collection of all sets that do not contain themselves as
an element is not a set.
⁷ Bertrand Russell was an Anglo-Welsh philosopher born in 1872, when Queen Victoria
still had another thirty years on the throne as ‘Queen empress’, and who died in 1970 a few
months after Neil Armstrong stepped onto the moon. As a young man he made important
contributions to the foundations of mathematics but in the course of his extraordinary life
he found time to stand for parliament, encouraged the philosopher Ludwig Wittgenstein,
received two prison sentences, won the Nobel prize for literature, was the first president
of CND, and campaigned against the Vietnam war. T. S. Eliot even wrote a poem about
him. Born into an aristocratic family, albeit a startlingly progressive one, he was later
an earl entitled to sit in the House of Lords. See Russell: a very short introduction by
A. C. Grayling published by OUP, 2002, for a very short introduction.
Proof. Define
R = {x : x is a set and x ∉ x}.
Suppose that R were a set. There are now two possibilities: either R ∈ R
or R ∉ R. Suppose first that R ∈ R. Then R must satisfy the condition to
be an element of R, which is that R ∉ R, a contradiction. Suppose then that
R ∉ R. Then R does not satisfy the condition to be an element of R and so
in fact R ∈ R. Both possibilities lead to contradictions. The source of
the problem lies in our assumption that R is a set, and so it is not one.
It follows that the definition of set we gave earlier is really deficient. I'm
not going to discuss the ramifications of this result here; instead I shall leave
it hanging as a warning, or invitation, to the curious.
Earlier in this chapter, I introduced the correspondence principle that
essentially defined what it means for two sets to have the same cardinality. I
only explored this notion for finite sets. I shall now show that in fact it leads
to an interesting theory for arbitrary sets.
A bijective correspondence between two sets A and B is defined as follows:
each element of A is paired off with exactly one element of B in such a
way that different elements of A are paired off with different elements of
B, and every element of B is paired off with something in A. We say that
the sets A and B are equinumerous, denoted A ≅ B, if there is a bijective
correspondence between A and B.
If A = ∅ define |A| = 0. If A ≅ {1, 2, . . . , n} define |A| = n. If A ≅ N
define |A| = ℵ₀; this number is called aleph nought. Such a set is said to
be countably infinite. If A ≅ R define |A| = c; this number is called the
cardinality of the continuum.
Theorem 5.4.2.
1. |E| = |O| = |N|.
2. |Z| = |N|.
3. |Q| = |N|.
4. |R| ≠ |N| and so ℵ₀ ≠ c.
Proof. (1) To show that N ≅ E, we use the correspondence (function)
n ↦ 2n. To show that N ≅ O, we use the function n ↦ 2n + 1.
(2) List the elements of Z in the following way:
0, −1, 1, −2, 2, −3, 3, . . .
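This listing can be turned into an explicit formula for the bijection; a minimal Python sketch (the function name `zigzag` is my own):

```python
def zigzag(k):
    """Return the k-th element (k = 0, 1, 2, ...) of the listing
    0, -1, 1, -2, 2, -3, 3, ... of the integers Z."""
    if k == 0:
        return 0
    # Odd positions yield the negative integers, even positions the positive.
    return -(k + 1) // 2 if k % 2 == 1 else k // 2

print([zigzag(k) for k in range(7)])  # -> [0, -1, 1, -2, 2, -3, 3]
```

Every integer appears exactly once, which is what makes this a bijective correspondence between N and Z.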
(3) One way is to split Q into two disjoint sets and show that each of
these sets is equinumerous with O and E, respectively. I shall do part of this
and leave the rest to you. Let Q+ be the set of all positive rationals. Set up
an array with columns and rows labelled by the non-zero natural numbers.
Interpret the entry in row m and column n as the rational number n/m. Now
count the resulting fractions by counting along the diagonals (going up from
left to right) omitting repetitions.
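The diagonal count can be simulated mechanically; a Python sketch (the function name is mine), using Fraction so that repetitions such as 2/2 = 1/1 are detected and skipped:

```python
from fractions import Fraction

def count_positive_rationals(limit):
    """List the first `limit` positive rationals produced by walking the
    diagonals of the array whose (m, n) entry is n/m, skipping repeats."""
    seen, out = set(), []
    d = 2  # the diagonal on which m + n = d
    while len(out) < limit:
        for m in range(d - 1, 0, -1):  # going up from left to right
            q = Fraction(d - m, m)     # n = d - m
            if q not in seen:
                seen.add(q)
                out.append(q)
                if len(out) == limit:
                    break
        d += 1
    return out

print(count_positive_rationals(7))
# first terms: 1, 1/2, 2, 1/3, 3, 1/4, 2/3
```

Every positive rational appears somewhere in the array, so this walk eventually reaches each one exactly once.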
(4) This is one of the great results of mathematics, proved using the Cantor
diagonalization argument. To make my argument a tad more natural I shall
use the fact, which is easily proved, that N ≅ N*, where the latter set is the set
of positive natural numbers. Assume that there is a bijective correspondence
between N and R. Then we may list the reals as r₁, r₂, r₃, . . .. Each real
number can be expressed as an infinite decimal: rᵢ = aᵢ · aᵢ₁aᵢ₂aᵢ₃ . . .. Define
a real number R as follows:
R = 0 · R₁R₂R₃ . . .
where Rᵢ is equal to 0 if aᵢᵢ is odd, and is equal to 1 otherwise. Observe
that R ≠ rᵢ for all i by construction, contradicting our assumption that
N ≅ R.
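The digit rule in this proof can be carried out explicitly on any finite initial segment of such a list; a Python sketch (the sample digit strings are arbitrary):

```python
def diagonal_number(digit_rows):
    """Given a list of decimal-digit strings (the i-th string standing for
    the fractional digits of r_i), build the digits R_1 R_2 R_3 ... where
    R_i is 0 if the i-th digit of r_i is odd, and 1 otherwise."""
    digits = []
    for i, row in enumerate(digit_rows):
        a_ii = int(row[i])  # the diagonal digit of the i-th number
        digits.append('0' if a_ii % 2 == 1 else '1')
    return '0.' + ''.join(digits)

rows = ["141592", "718281", "414213", "302585", "577215", "693147"]
R = diagonal_number(rows)
print(R)  # differs from the i-th row in its i-th digit
```

By construction the result disagrees with every listed number in at least one decimal place, which is exactly what the proof exploits.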
At this point, we begin to reach the limits of what is known. The continuum hypothesis (CH) is the assertion that every infinite subset of the
reals is either countably infinite or has the cardinality of the continuum. No
one knows whether (CH) is true or false (but that's not the half of it).
5.5 *Proving things about sets*
This section will not be examined in 2013.
Let me begin by listing the most important properties of the set
operations that we defined at the start of this chapter. Let A, B and C be
any sets.
1. A ∩ (B ∩ C) = (A ∩ B) ∩ C. Intersection is associative.
2. A ∩ B = B ∩ A. Intersection is commutative.
3. A ∩ ∅ = ∅ = ∅ ∩ A.
4. A ∪ (B ∪ C) = (A ∪ B) ∪ C. Union is associative.
5. A ∪ B = B ∪ A. Union is commutative.
6. A ∪ ∅ = A = ∅ ∪ A. The empty set is the identity for union.
7. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). Intersection distributes over union.
8. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). Union distributes over intersection.
9. A \ (B ∪ C) = (A \ B) ∩ (A \ C).
10. A \ (B ∩ C) = (A \ B) ∪ (A \ C).
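These identities can be spot-checked on concrete sets using Python's built-in set operations (& for ∩, | for ∪, - for \); a minimal sketch with arbitrary sample sets, which, like a Venn diagram, illustrates rather than proves the identities:

```python
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}

assert A & (B & C) == (A & B) & C          # intersection is associative
assert A & B == B & A                      # intersection is commutative
assert A & set() == set() == set() & A
assert A | (B | C) == (A | B) | C          # union is associative
assert A | B == B | A                      # union is commutative
assert A | set() == A == set() | A         # empty set is the identity for union
assert A & (B | C) == (A & B) | (A & C)    # intersection distributes over union
assert A | (B & C) == (A | B) & (A | C)    # union distributes over intersection
assert A - (B | C) == (A - B) & (A - C)    # De Morgan-style law
assert A - (B & C) == (A - B) | (A - C)    # De Morgan-style law
print("all ten identities hold for these sets")
```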
It’s possible to illustrate these results by drawing Venn diagrams: for
each result, draw a Venn diagram of the left-hand side and, separately, a Venn
diagram of the right-hand side, and observe that you get the same diagrams.
This is very handy if you want to check a result but doesn’t really constitute
a proof. So how should we prove these results? First, to prove that X = Y
prove (1) that X ⊆ Y and (2) that Y ⊆ X. But to prove the above results
we need more. The operations of intersection, union and set difference were
defined using the words and, or and not. Thus to prove any results that
use these words we must have clear definitions of what we actually mean by
them. This requires setting up the basics of propositional logic. Once this is
done, it is then a simple matter to prove the above results precisely.
5.6 Learning outcomes for Chapter 5
• Manipulate sets.
• Answer simple counting questions including those involving permutations and combinations.
• Be able to apply the binomial theorem.
5.7 Further reading and exercises
This chapter is really a prelude to probability theory, and any mysteries will
fall quickly into place once you start studying that subject. Chapter 3 of
Hammack covers some of the same material, as does Chapter 7 of Hirst
and Singerman.
Afterword
The development of algebra can be viewed as the development of our understanding of the concept of number. The introduction of complex numbers in
the sixteenth century burnt like a slow fuse through the following centuries,
finally exploding in the nineteenth with the birth of many new algebraic
systems. This began with the observation that complex numbers have not
only an algebraic side but also a geometric one: they are two-dimensional
objects. Hamilton wondered if there were three-dimensional analogues of
complex numbers. After years of trying, he finally succeeded, but found not
three-dimensional but four-dimensional analogues, called quaternions. These
enjoyed an early vogue, but applied mathematicians found them less convenient to work with. Gibbs stripped the quaternions down and rebuilt them
as three-dimensional vectors equipped with scalar and vector products. Vectors form the basis of vector analysis and so provide the first language we meet for
dealing with, say, Maxwell's equations. When matrices were introduced, it
was realized that they could be used to represent both complex numbers and
quaternions as sets of matrices with real entries. This is the beginning of
both linear algebra and of the theory of (finite-dimensional) algebras.
Bibliography
[1] J. W. Archbold, Algebra, Fourth Edition, Pitman Paperbacks, 1970.
[2] G. Birkhoff and S. Mac Lane, A survey of modern algebra, Third Edition, The Macmillan Company, 1965.
[3] C. B. Boyer, U. Merzbach, History of mathematics, Second Edition, John Wiley and Sons, 1989.
[4] L. N. Childs, A concrete introduction to higher algebra, Second Edition, Springer, 1995.
[5] G. Chrystal, Introduction to algebra, Adam and Charles Black, London, 1902.
[6] G. Cornell, J. H. Silverman, G. Stevens, Modular forms and Fermat's last theorem, Springer, 2000.
[7] R. Courant, Differential and integral calculus, volume 1, Blackie and Son Limited, 1945.
[8] R. Courant and H. Robbins, What is mathematics?, OUP, 1978.
[9] H.-D. Ebbinghaus et al, Zahlen, Springer-Verlag, 1988.
[10] G. H. Hardy, A course of pure mathematics, Tenth Edition, CUP, 1967.
[11] J. L. Heilbron, Geometry civilized, Clarendon Press, Oxford, 2000.
[12] http://www-history.mcs.st-and.ac.uk/
[13] O. Ore, Number theory and its history, Dover, 1948.
[14] A. J. Pettofrezzo, Vectors and their applications, Prentice-Hall, Inc., 1966.
[15] E. Robson, Words and Pictures: New Light on Plimpton 322, American Mathematical Monthly 109 (2002), 105–119.
[16] L. E. Sigler, Fibonacci’s Liber Abaci, Springer, 2003.
[17] J. Stillwell, Elements of algebra, Springer, 1994.
[18] C. J. Tranter, Advanced level pure mathematics, Fourth Edition, Hodder and Stoughton, 1978.
[19] J. V. Uspensky, Theory of equations, McGraw-Hill, 1948.
[20] B. L. Van der Waerden, Algebra: erster Teil, Springer-Verlag, 1966.