F17CC1 ALGEBRA A
Algebra, geometry and combinatorics
Dr Mark V Lawson
July 12, 2013
Contents

1 The fundamental theorem of arithmetic
  1.1 Basic sets
  1.2 Writing numbers down
    1.2.1 From tallies to the Hindu-Arabic number system
    1.2.2 Number bases
  1.3 The fundamental theorem of arithmetic
    1.3.1 Greatest common divisors
    1.3.2 Primes: the atoms of number
  1.4 Real numbers
    1.4.1 Irrational numbers
    1.4.2 Decimal fractions
  1.5 *The prime number theorem*
  1.6 *Proofs by induction*
  1.7 Learning outcomes for Chapter 1
  1.8 Further reading and exercises

2 The fundamental theorem of algebra
  2.1 Complex number arithmetic
    2.1.1 Solving quadratic equations
    2.1.2 Introducing complex numbers
  2.2 The fundamental theorem of algebra
    2.2.1 The arithmetic of polynomials
    2.2.2 Roots of polynomials
  2.3 Complex number geometry
    2.3.1 sin and cos
    2.3.2 The complex plane
    2.3.3 Arbitrary roots of complex numbers
    2.3.4 Euler’s formula
  2.4 *Making sense of complex numbers*
  2.5 *Morning duel: cubics, quartics, quintics and beyond*
  2.6 *Analogies*
  2.7 *Rational functions*
    2.7.1 Numerical partial fractions
    2.7.2 Partial fractions
    2.7.3 Integrating rational functions
  2.8 Learning outcomes for Chapter 2
  2.9 Further reading and exercises

3 Matrices
  3.1 Matrix arithmetic
    3.1.1 Basic matrix definitions
    3.1.2 Addition, subtraction, scalar multiplication and the transpose
    3.1.3 Matrix multiplication
    3.1.4 Summary of matrix multiplication
    3.1.5 Special matrices
    3.1.6 Linear equations
  3.2 Matrix algebra
    3.2.1 Properties of matrix addition
    3.2.2 Properties of matrix multiplication
    3.2.3 Properties of scalar multiplication
    3.2.4 Properties of the transpose
    3.2.5 Polynomials of matrices
  3.3 Determinants
  3.4 Solving systems of linear equations
    3.4.1 Some theory
    3.4.2 Gaussian elimination
  3.5 Blankinship’s algorithm
  3.6 *Some proofs*
  3.7 *Matrix inverses*
    3.7.1 The key idea
    3.7.2 Invertible and noninvertible matrices
    3.7.3 The matrix inverse method for solving linear equations
  3.8 *Complex numbers via matrices*
  3.9 Learning outcomes for Chapter 3
  3.10 Further reading and exercises

4 Vectors
  4.1 Vector algebra
    4.1.1 Addition and scalar multiplication of vectors
    4.1.2 Inner, scalar or dot products
    4.1.3 Vector or cross products
    4.1.4 Scalar triple products
  4.2 Vector arithmetic
    4.2.1 i’s, j’s and k’s
  4.3 Geometry with vectors
    4.3.1 Position vectors
    4.3.2 Linear combinations
    4.3.3 Lines
    4.3.4 Planes
    4.3.5 Determinants
  4.4 Summary of vectors
  4.5 *Two vector proofs*
  4.6 *Quaternions*
  4.7 Learning outcomes for Chapter 4
  4.8 Further reading and exercises

5 Counting
  5.1 More set theory
    5.1.1 Operations on sets
    5.1.2 Partitions
    5.1.3 Sequences
  5.2 Ways of counting
    5.2.1 Counting principles
    5.2.2 Counting sequences
    5.2.3 The power set
    5.2.4 Counting arrangements: permutations
    5.2.5 Counting choices: combinations
    5.2.6 Examples of counting
  5.3 The binomial theorem
  5.4 *An introduction to infinite numbers*
  5.5 *Proving things about sets*
  5.6 Learning outcomes for Chapter 5
  5.7 Further reading and exercises

Afterword
Chapter 1
The fundamental theorem of arithmetic
In everyday life a number is a number, but in mathematics we distinguish
between different kinds of numbers according to their properties and uses.
The goal of this chapter is to describe those numbers that should be familiar
to you, whereas in the next chapter we shall introduce numbers that might
be unfamiliar to you: the complex numbers that are so important in the
later study of mathematics. There are two essential results in this chapter:
the fact that every natural number greater than or equal to 2 can be written
uniquely as a product of powers of primes, which is the fundamental theorem
of arithmetic, and the proof that certain numbers are irrational.
1.1 Basic sets
Set theory, invented by Georg Cantor (1845–1918) in the last quarter of the
nineteenth century, provides a precise language for doing mathematics. This
section is mainly a phrasebook of the most important terms we shall need
for the first four chapters, whereas in Chapter 5 we shall study this language
in slightly more detail. The starting point of set theory is the following two
deceptively simple definitions:
• A set is a collection of objects which we wish to regard as a whole. The
members of a set are called its elements.
• Two sets are equal precisely when they have the same elements.
We often use capital letters to name sets: such as A, B, or C or fancy capital
letters such as N and Z. The elements of a set are usually denoted by lower
case letters. If x is an element of the set A then we write
x ∈ A
and if x is not an element of the set A then we write
x ∉ A.
A set should be regarded as a bag of elements, and so the order of the
elements within the set is not important. In addition, repetition of elements
is ignored.¹
Examples 1.1.1.
(i) The following sets are all equal: {a, b}, {b, a}, {a, a, b}, {a, a, a, a, b, b, b, a},
because the order of the elements within a set is not important and any
repetitions are ignored. Despite this it is usual to write sets without
repetitions to avoid confusion. We have that a ∈ {a, b} and b ∈ {a, b}
but α ∉ {a, b}.
(ii) The set {} is empty and is called the empty set. It is given a special
symbol ∅, which is taken from Danish and is the first letter of the
Danish word meaning ‘empty’. Remember that ∅ means the same thing
as {}. Take careful note that ∅ ≠ {∅}. The reason is that the empty
set contains no elements whereas the set {∅} contains one element. By
the way, the symbol for the empty set is different from the Greek letter
phi: φ or Φ.
The number of elements in a set is called its cardinality. If X is a set
then |X| denotes its cardinality. A set is finite if it only has a finite number
of elements, otherwise it is infinite. If a set has only finitely many elements
then we might be able to list them if there aren’t too many: this is done by
putting them in ‘curly brackets’ { and }. We can sometimes define infinite
sets by using curly brackets but then, because we can’t list all elements in
an infinite set, we use ‘. . .’ to mean ‘and so on in the obvious way’. This can
also be used to define finite sets where there is an obvious pattern. Often,
¹If you want to take account of repetitions you have to use multisets.
we describe a set by saying what properties an element must have to belong
to the set. Thus
{x : P (x)}
means ‘the set of all things x which satisfy the condition P ’. Here are some
examples of sets defined in various ways.
Examples 1.1.2.
(i) D = { Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday }, the set of the days of the week. This is a small finite set and so
we can conveniently list its elements.
(ii) M = { January, February, March, . . . , November, December }, the set
of the months of the year. This is a finite set but I didn’t want to write
down all the elements so I wrote ‘. . . ’ to indicate that there were other
elements of the set which I was too lazy to write down explicitly but
which are, nevertheless, there.
(iii) A = {x : x is a prime number}. I define a set by describing the properties that the elements of the set must have.
Sets can be complicated. In particular, a set can contain elements which
are themselves sets. For example, A = {{a}, {a, b}} is a set whose elements
are {a} and {a, b} which both happen to be sets. Thus {a} ∈ {{a}, {a, b}}.
In this course, the following sets of numbers will play a special role. We
shall use this notation throughout and so it is worthwhile getting used to it.
Examples 1.1.3.
(i) The set N = {0, 1, 2, 3, . . .} of all natural numbers.
(ii) The set Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} of all integers. The reason Z
is used to designate this set is because ‘Z’ is the first letter of the word
‘Zahl’, the German for number.
(iii) The set Q of all rational numbers i.e. those numbers that can be written
as fractions whether positive or negative.
(iv) The set R of all real numbers i.e. all numbers which can be represented
by decimals with potentially infinitely many digits after the decimal
point.
(v) The set C of all complex numbers, which I shall introduce in Chapter 2.
Given a set A, a new set B can be formed by choosing elements from A
to put in B. We say that B is a subset of A, which is written B ⊆ A. If
A ⊆ B and A ≠ B then we say that A is a proper subset of B. Observe that
two sets A and B are equal precisely when the following two conditions hold:
1. A ⊆ B.
2. B ⊆ A.
This is often the best way of showing that two sets are equal, although we
won’t make much use of it in this course.
Examples 1.1.4.
(i) ∅ ⊆ A for every set A, where we choose no elements from A. It is a very
common mistake to forget the empty set when listing subsets of a set.
(ii) A ⊆ A for every set A, where we choose all the elements from A.
(iii) N ⊆ Z ⊆ Q ⊆ R ⊆ C.
(iv) E, the set of even natural numbers, is a subset of N.
(v) O, the set of odd natural numbers, is a subset of N.
(vi) P = {2, 3, 5, 7, 11, 13, 17, 19, 23, . . .}, the set of primes, is a subset of N.
(vii) A = {x : x ∈ R and x² = 4}, which is just the set {−2, 2}.
Exercises 1.1
1. Let A = {♣, ♦, ♥, ♠}, B = {♠, ♦, ♣, ♥} and C = {♠, ♦, ♣, ♥, ♣, ♦, ♥, ♠}.
Is it true or false that A = B and B = C? Explain.
2. Find all subsets of the set {a, b, c, d}.
3. Let X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Write down the following subsets of
X:
(i) The subset A of even elements of X.
(ii) The subset B of odd elements of X.
(iii) C = {x : x ∈ X and x ≥ 6}.
(iv) D = {x : x ∈ X and x > 10}.
(v) E = {x : x ∈ X and x is prime}.
(vi) F = {x : x ∈ X and (x ≤ 4 or x ≥ 7)}.
4. Write down the cardinalities of the following sets.
(i) ∅.
(ii) {∅}.
(iii) {∅, {∅}}.
(iv) {∅, {∅}, {∅, {∅}}}.
1.2 Writing numbers down
In this section, we shall explain positional number notation, the system we
use for writing numbers down. The numbers we shall be looking at in this
section are the natural numbers: N = {0, 1, 2, 3, . . .}.
1.2.1 From tallies to the Hindu-Arabic number system
I don’t think our hunter-gatherer ancestors worried too much about writing
numbers down because there wasn’t any need: they didn’t have to fill in
tax-returns and so didn’t need accountants. However, organizing cities does
need accountants and so ways had to be found of writing numbers down.
The simplest way of doing this is to use a mark like |, called a tally, for
each thing being counted. So
||||||||||
means 10 things. This system has advantages and disadvantages. The advantage is that you don’t have to go on a training course to learn it. The
disadvantage is that even quite small numbers need a lot of space like
||||||||||||||||||||||||||||||||||||||
It’s also hard to tell whether
|||||||||||||||||||||||||||||||||||||||
is the same number or not. (It’s not.)
It’s inevitable that people will introduce abbreviations to make the system easier to use. Perhaps it was in this way that the next development
occurred. Both the ancient Egyptians and Romans used similar systems but
I’ll describe the Roman system because it involves letters rather than pictures. First, you have a list of basic symbols:
number   1   5   10   50   100   500   1000
symbol   I   V   X    L    C     D     M
There are more symbols for bigger numbers but we won’t worry about them.
Numbers are then written according to the additive principle. Thus MMVIIII
is 2009. Incidentally, I understand that the custom of also using a subtractive
principle (so that, for example, IX means 9 rather than VIIII) is a more
modern innovation.
This system is clearly a great improvement on the tally-system. Even
quite big numbers are written compactly and it is easy to compare numbers.
On the other hand, there is a bit more to learn. The other disadvantage is
that we need separate symbols for different powers of 10 and their multiples
by 5. This was probably not too inconvenient in the ancient world where
it is likely that the numbers needed on a day-to-day basis were never going
to be that big. A common criticism of this system is that it is hard to do
multiplication in. However, that turns out to be a non-problem because, like
us, the Romans used pocket calculators or, more accurately, a toga-friendly
device called an abacus. The real evidence for the usefulness of this system
of writing numbers is that it survived for hundreds and hundreds of years.
The system used throughout the world today, called the Hindu-Arabic
number system, seems to have been in place by the ninth century in India
but it was hundreds of years in development and the result of ideas from
many different cultures [3]; the invention of zero on its own is one of the
great steps in human intellectual development. The genius of the system is
that it requires only 10 symbols
0, 1, 2, 3, 4, 5, 6, 7, 8, 9
and every natural number can be written using a sequence of these symbols.
The trick to making the system work is that we use the position on the page
of a symbol to tell us what number it means. Thus 2009 means

10³  10²  10¹  10⁰
 2    0    0    9

In other words
2 × 10³ + 0 × 10² + 0 × 10¹ + 9 × 10⁰.
Notice the important role played by the symbol 0 which makes it clear which
column a symbol belongs in otherwise we couldn’t tell 29 from 209 from 2009.
The disadvantage of this system is that you do have to go on a course to learn
it because it is a highly sophisticated way of writing numbers. On the other
hand, it has the enormous advantage that any number can be written down
in a compact way.
Once the basic system had been accepted it could be adapted to deal not
only with positive whole numbers but also negative whole numbers, using
the symbol −, and also fractions with the introduction of the decimal point.
By the end of the sixteenth century the full decimal system was in place [13].
Notation warning! In the UK, we use a raised decimal point like 0 · 123
and not a comma. Also we generally write the number 1 without a long
hook at the top. If you do write it like that there is a danger that people will
confuse it with the number 7 which is not always written in the UK with a
line through it.
1.2.2 Number bases
We shall now look in more detail at the way in which numbers can be written
down using a positional notation. In order not to be biased, we shall not just
work in base 10 but show how any base can be used. Base 10 probably arose
for biological reasons since we have ten fingers.
There is one result that we shall use throughout the remainder of this
section. It can be proved using the following idea. For simplicity let’s assume
that both a and b are positive. If 0 < a < b then b · 0 < a < b · 1. If a ≥ b
then we can always find a q such that bq ≤ a < b(q + 1). We therefore have
the following.
Lemma 1.2.1 (Remainder Theorem). Let a and b be natural numbers with
b ≠ 0. Then there are unique integers q and r such that
a = bq + r
where 0 ≤ r < b.
The number q is called the quotient and the number r is called the remainder. For example, if we consider the pair of natural numbers 14 and 3
then
14 = 3 · 4 + 2
where 4 is the quotient and 2 is the remainder. There’s nothing new here
except possibly the terminology.
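The notes themselves use no programming, but the quotient and remainder of the Remainder Theorem are easy to experiment with. In Python (used here purely for illustration), the built-in divmod returns both at once:

```python
# Quotient and remainder in the sense of the Remainder Theorem:
# a = b*q + r with 0 <= r < b.
a, b = 14, 3
q, r = divmod(a, b)          # q = a // b = 4, r = a % b = 2
assert a == b * q + r and 0 <= r < b
print(q, r)                  # 4 2
```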
Let a and b be integers. We say that a divides b, or that b is divisible by
a, if there is a q such that b = aq. In other words, there is no remainder. We
also say that a is a divisor or factor of b. We write a | b to mean the same
thing as ‘a divides b’.²
Warning! a | b does not mean the same thing as a/b. The latter is a number;
the former is a statement about two numbers.
Let’s see how to represent numbers in base d where d ≥ 2. If d ≤ 10 then
we represent numbers by sequences of symbols taken from the set
Zd = {0, 1, 2, 3, . . . , d − 1}
but if d > 10 then we need new symbols for 10, 11, 12 and so forth. It’s
convenient to use A, B, C, . . . For example, if we want to write numbers in
base 12 we use the set of symbols
{0, 1, . . . , 9, A, B}
whereas if we work in base 16 we use the set of symbols
{0, 1, . . . , 9, A, B, C, D, E, F }.
If x is a sequence of symbols then we write x_d to make it clear that we are
to interpret this sequence as a number in base d. Thus BAD₁₆ is a number
in base 16.
²Observe that if a is nonzero, then a | a; if a | b and b | a then a = ±b; and finally if
a | b and b | c then a | c.
The symbols in a sequence x_d, reading from right to left, tell us the contribution each power of d, such as d⁰, d¹, d², etc., makes to the number the
sequence represents. Here are some examples.
Examples 1.2.2. Converting from base d to base 10.
(i) 11A9₁₂ is a number in base 12. This represents the following number in
base 10:
1 × 12³ + 1 × 12² + A × 12¹ + 9 × 12⁰,
which is just the number
12³ + 12² + 10 × 12 + 9 = 2001.
(ii) BAD₁₆ represents a number in base 16. This represents the following
number in base 10:
B × 16² + A × 16¹ + D × 16⁰,
which is just the number
11 × 16² + 10 × 16 + 13 = 2989.
(iii) 5556₇ represents a number in base 7. This represents the following
number in base 10:
5 × 7³ + 5 × 7² + 5 × 7¹ + 6 × 7⁰ = 2001.
These examples show how easy it is to convert from base d to base 10.
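If you want to check such conversions mechanically, Python's built-in int accepts a base argument (Python is used here only as a checking tool, not as part of the notes):

```python
# Check the base d to base 10 conversions of Examples 1.2.2.
print(int("11A9", 12))   # 1*12**3 + 1*12**2 + 10*12 + 9 = 2001
print(int("BAD", 16))    # 11*16**2 + 10*16 + 13 = 2989
print(int("5556", 7))    # 5*7**3 + 5*7**2 + 5*7 + 6 = 2001
```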
There are two ways to convert from base 10 to base d.
1. The first runs in outline as follows. Let n be the number in base 10
that we wish to write in base d. Look for the largest power m of d such
that a·dᵐ ≤ n for some a < d. Then repeat for n − a·dᵐ. Continuing in
this way, we write n as a sum of multiples of powers of d and so we can
write n in base d.
2. The second makes use of the remainder theorem. The idea behind this
method is as follows. Let
n = aₘ . . . a₁a₀
in base d. We may think of this as
n = (aₘ . . . a₁) · d + a₀.
It follows that a₀ is the remainder when n is divided by d, and the
quotient is n′ = aₘ . . . a₁. Thus we can generate the digits of n in base
d from right to left by repeatedly finding the next quotient and next
remainder by dividing the current quotient by d; the process starts with
our input number as first quotient.
Examples 1.2.3. Converting from base 10 to base d.
(i) Write 2001 in base 7. I’ll solve this question in two different ways: the
long but direct route and then the short but more thought-provoking
route.
We see that 7⁴ > 2001. Thus we divide 2001 by 7³. This goes 5 times
plus a remainder. Thus 2001 = 5 × 7³ + 286. We now repeat with
286. We divide it by 7². It goes 5 times again plus a remainder. Thus
286 = 5 × 7² + 41. We now repeat with 41. We get that 41 = 5 × 7 + 6.
We have therefore shown that
2001 = 5 × 7³ + 5 × 7² + 5 × 7 + 6.
Thus 2001 in base 7 is just 5556.
Now for the short method.

7 | 2001 |
7 |  285 | 6
7 |   40 | 5
7 |    5 | 5
  |    0 | 5

Thus 2001 in base 7 (reading the remainders from bottom to top) is 5556.
(ii) Write 2001 in base 12.

12 | 2001 |
12 |  166 | 9
12 |   13 | 10 = A
12 |    1 | 1
   |    0 | 1

Thus 2001 in base 12 (reading the remainders from bottom to top) is 11A9.
(iii) Write 2001 in base 2.

2 | 2001 |
2 | 1000 | 1
2 |  500 | 0
2 |  250 | 0
2 |  125 | 0
2 |   62 | 1
2 |   31 | 0
2 |   15 | 1
2 |    7 | 1
2 |    3 | 1
2 |    1 | 1
  |    0 | 1

Thus 2001 in base 2 (reading the remainders from bottom to top) is 11111010001.
When converting from one base to another it is always wise to check
your calculations by converting back.
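The short method above (repeated division, collecting the remainders from bottom to top) can be sketched in Python; this is an illustration of the idea, not part of the notes:

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, d):
    """Write the natural number n in base d (2 <= d <= 16) by
    repeatedly dividing the current quotient by d; the remainders,
    read from last to first, give the digits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, d)          # next quotient and next remainder
        digits.append(DIGITS[r])
    return "".join(reversed(digits))

print(to_base(2001, 7))    # 5556
print(to_base(2001, 12))   # 11A9
print(to_base(2001, 2))    # 11111010001
```

Converting back with int(to_base(n, d), d) recovers n, which is exactly the check recommended above.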
Terminology Number bases have some special terminology associated with
them which you might encounter:
Base 2 binary.
Base 8 octal.
Base 10 decimal.
Base 12 duodecimal.
Base 16 hexadecimal.
Base 20 vigesimal.
Base 60 sexagesimal.
Binary, octal and hexadecimal occur in computer science; there are remnants
of a vigesimal system in French and the older Welsh system of counting; base
60 was used by astronomers in ancient Mesopotamia and is still the basis of
time measurement (60 seconds = 1 minute, and 60 minutes = 1 hour) and
angle measurement.
What good are number bases? There are several answers to this
question. First, they help you to understand the true meaning of our positional
number system. Second, computers, famously, work in base 2, and so number
bases give you some understanding of how they work. Third, as I indicated above, angle
and time measurement, for historical reasons, are carried out in base 60.
Fourth, there are mathematical patterns in working with different number
bases which turn out to have important applications. Fifth, they are interesting
mathematically.
Exercises 1.2
1. What are the possible remainders when a natural number is divided by
(i) 2.
(ii) 3.
(iii) 4.
(iv) n where n ≥ 2 is any natural number.
[This question really is as trivial as it looks].
2. Find the quotients and remainders for each of the following pairs of
numbers; divide the smaller into the larger.
(i) 30 and 6.
(ii) 100 and 24.
(iii) 364 and 12.
3. Write the number 2009 in
(i) Base 5.
(ii) Base 12.
(iii) Base 16.
4. Write the following numbers in base 10.
(i) DAB₁₆.
(ii) ABBA₁₂.
(iii) 44332211₅.
1.3 The fundamental theorem of arithmetic
The goal of this section is to state and prove the most basic result about the
natural numbers: each natural number, excluding 0 and 1, can be written
as a product of powers of primes in essentially one way. The primes are the
‘atoms’ from which all natural numbers can be built.
1.3.1 Greatest common divisors
Let a, b ∈ N. A number d which divides both a and b is called a common
divisor of a and b. The largest number which divides both a and b is called
the greatest common divisor of a and b and is denoted by gcd(a, b). A pair
of natural numbers a and b is said to be coprime if gcd(a, b) = 1.
Note that for us gcd(0, 0) is undefined, but if a ≠ 0 then gcd(a, 0) = a.
Example 1.3.1. Consider the numbers 12 and 16. The set of divisors of 12
is the set {1, 2, 3, 4, 6, 12}. The set of divisors of 16 is the set {1, 2, 4, 8, 16}.
The set of common divisors is the set of numbers that belong to both of
these two sets: namely, {1, 2, 4}. The greatest common divisor of 12 and 16
is therefore 4. Thus gcd(12, 16) = 4.
One application of greatest common divisors is in simplifying fractions.
For example, the fraction 12/16 is equal to the fraction 3/4 because we can divide
out the common divisor of numerator and denominator. The fraction which
results cannot be simplified further and is in its lowest terms. This is justified
by the following result.
Lemma 1.3.2. Let d = gcd(a, b). Then gcd(a/d, b/d) = 1.
Proof. Because d divides both a and b we may write a = a′d and b = b′d for
some natural numbers a′ and b′. We therefore need to prove that gcd(a′, b′) =
1. Suppose that e | a′ and e | b′. Then a′ = ex and b′ = ey for some natural
numbers x and y. Thus a = exd and b = eyd. Observe that ed | a and ed | b
and so ed is a common divisor of both a and b. But d is the greatest common
divisor and so e = 1, as required.
Let me paraphrase what the result above says. If I divide two numbers by
their greatest common divisor then the numbers that remain are coprime.
This seems intuitively plausible and the proof ensures that our intuition is
correct.
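The reduction of 12/16 to lowest terms can be replayed in Python; the Fraction class in the standard library performs the same division by the gcd automatically (the code is an illustration only):

```python
from fractions import Fraction
from math import gcd

# Simplify 12/16 by dividing out d = gcd(12, 16) = 4.
d = gcd(12, 16)
print(12 // d, 16 // d)             # 3 4
assert gcd(12 // d, 16 // d) == 1   # Lemma 1.3.2: the result is coprime

# Fraction performs the same reduction automatically.
print(Fraction(12, 16))             # 3/4
```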
If the numbers a and b are large, then calculating their gcd in the way
I did above would be time-consuming and error-prone. We want to find
an efficient way of calculating the greatest common divisor. The following
lemma is the basis of just such an efficient method.
Lemma 1.3.3. Let a, b ∈ N, where b ≠ 0, and let a = bq + r where 0 ≤ r < b.
Then
gcd(a, b) = gcd(b, r).
Proof. Let d be a common divisor of a and b. Since a = bq + r we have that
a − bq = r so that d is also a divisor of r. It follows that any divisor of a and
b is also a divisor of b and r.
Now let d be a common divisor of b and r. Since a = bq + r we have that
d divides a. Thus any divisor of b and r is a divisor of a and b.
It follows that the set of common divisors of a and b is the same as the
set of common divisors of b and r. Thus gcd(a, b) = gcd(b, r).
The point of the above result is that b < a and r < b. So calculating gcd(b, r) will be easier than calculating gcd(a, b) because the numbers
involved are smaller: in the equation a = bq + r, we pass from the pair (a, b)
to the smaller pair (b, r).
The above result is the basis of an efficient algorithm for computing greatest
common divisors. It was described by Euclid around 300 BC in his collection
of books ‘The Elements’ in Propositions 1 and 2 of Book VII.
Algorithm 1.3.4 (Euclid’s algorithm).
Input: a, b ∈ N such that a ≥ b and b ≠ 0.
Output: gcd(a, b).
Procedure: write a = bq + r where 0 ≤ r < b. Then gcd(a, b) = gcd(b, r). If
r ≠ 0 then repeat this procedure with b and r, and so on. The last non-zero
remainder is gcd(a, b).
Example 1.3.5. Let’s calculate gcd(19, 7) using Euclid’s algorithm. The ∗
marks the line containing the last non-zero remainder.

19 = 7 · 2 + 5
 7 = 5 · 1 + 2
 5 = 2 · 2 + 1   ∗
 2 = 1 · 2 + 0
By Lemma 1.3.3 we have that
gcd(19, 7) = gcd(7, 5) = gcd(5, 2) = gcd(2, 1) = gcd(1, 0).
The last non-zero remainder is 1 and so gcd(19, 7) = 1 and, in this case, the
numbers are coprime.
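Euclid's algorithm translates directly into a short loop. A sketch in Python (for illustration only):

```python
def gcd(a, b):
    """Euclid's algorithm (Algorithm 1.3.4): repeatedly replace the
    pair (a, b) by (b, r), where r is the remainder on dividing a
    by b, until the remainder is zero; the last non-zero remainder
    is the greatest common divisor."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(19, 7))   # 1, so 19 and 7 are coprime
print(gcd(12, 16))  # 4, as in Example 1.3.1
```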
There are occasions when we need to extract more information from Euclid’s algorithm as we shall discover later when we come to deal with prime
numbers. Specifically, we can use Euclid’s algorithm to find integers x and
y such that
gcd(a, b) = xa + yb.
This is called Bézout’s theorem. It is proved by running Euclid’s
algorithm in reverse; in this form it is called the extended Euclidean algorithm. The
procedure is outlined below and the details are explained in the
example that follows it.
Algorithm 1.3.6 (Bézout’s Theorem/Extended Euclidean algorithm).
Input: a, b ∈ N where a ≥ b and b ≠ 0.
Output: numbers x, y ∈ Z such that gcd(a, b) = xa + yb.
Procedure: apply Euclid’s algorithm to a and b; working from bottom to top,
rewrite each remainder in turn.
Example 1.3.7. This is a little involved so I have split the process up into
steps. I shall apply the extended Euclidean algorithm to the example I
calculated above. I have highlighted the non-zero remainders wherever they
occur, and I have discarded the last equality where the remainder was zero.
I have also marked the last non-zero remainder.
19 = 7 · 2 + 5
7 = 5·1+2
5 = 2·2+1 ∗
The first step is to rearrange each equation so that the non-zero remainder
is alone on the lefthand side.
5 = 19 − 7 · 2
2 = 7−5·1
1 = 5−2·2
Next we reverse the order of the list
1 = 5−2·2
2 = 7−5·1
5 = 19 − 7 · 2
We now start with the first equation. The lefthand side is the gcd we are
interested in. We treat all other remainders as algebraic quantities and systematically substitute them in order. Thus we begin with the first equation
1 = 5 − 2 · 2.
The next equation in our list is
2=7−5·1
so we replace 2 in our first equation by the expression on the right to get
1 = 5 − (7 − 5 · 1) · 2.
We now rearrange this equation by collecting up like terms treating the highlighted remainders as algebraic objects to get
1 = 3 · 5 − 2 · 7.
We can of course make a check at this point to ensure that our arithmetic is
correct. The next equation in our list is
5 = 19 − 7 · 2
so we replace 5 in our new equation by the expression on the right to get
1 = 3 · (19 − 7 · 2) − 2 · 7.
Again we rearrange to get
1 = 3 · 19 − 8 · 7 .
The algorithm now terminates and we can write
gcd(19, 7) = 3 · 19 + (−8) · 7 ,
as required. We can also, of course, easily check the answer!
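The back-substitution above can also be carried out in a single forward pass, which is how the extended Euclidean algorithm is usually implemented in practice. Here is a Python sketch (the function name is mine); it maintains the invariant that each remainder is an integer combination of a and b:

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g == x*a + y*b."""
    # Invariant: r0 == x0*a + y0*b and r1 == x1*a + y1*b throughout.
    r0, r1 = a, b
    x0, x1 = 1, 0
    y0, y1 = 0, 1
    while r1 != 0:
        q = r0 // r1                 # quotient from Euclid's algorithm
        r0, r1 = r1, r0 - q * r1     # the usual remainder step
        x0, x1 = x1, x0 - q * x1     # update the Bezout coefficients
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0

# The worked example: gcd(19, 7) = 1 = 3 * 19 + (-8) * 7.
print(extended_gcd(19, 7))  # → (1, 3, -8)
```

Running it on Exercise 1.3.8 reproduces 6 = 28 · 2406 − 103 · 654.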
Exercise 1.3.8. Use the extended Euclidean algorithm to find integers x, y
such that gcd(a, b) = xa + yb when a = 2406 and b = 654. Check your
answer. [Solution: 6 = 28 · 2406 − 103 · 654].
I shall describe a much more efficient algorithm for implementing the
extended Euclidean algorithm when I have discussed matrices in Chapter 3.
The greatest common divisor of two numbers a and b is the largest number
that divides into both a and b. On the other hand, if a | c and b | c then we
say that c is a common multiple of a and b. The smallest common multiple
of a and b is called the least common multiple of a and b and is denoted by
lcm(a, b). You might expect that to calculate the least common multiple we
would need a new algorithm, but in fact we can use Euclid’s algorithm, as
the following result shows. I shall prove it later, once I have
proved the fundamental theorem of arithmetic.
Proposition 1.3.9. Let a and b be natural numbers. Then
gcd(a, b) × lcm(a, b) = ab.
I shall now show how gcd’s and lcm’s play a natural role in the arithmetic
of fractions. The key property of fractions is that a fraction a/b is unchanged
when numerator and denominator are both multiplied by the same non-zero
integer. Thus

a/b = (ac)/(bc).

Given a fraction a/b we often want to simplify it as much as possible, and this
is accomplished by calculating gcd(a, b) = d. We have a = a′d and b = b′d
and so

a/b = (a′d)/(b′d) = a′/b′.

We have proved above that gcd(a′, b′) = 1 and so the fraction cannot be
simplified any further. Thus a′/b′ is a fraction in its lowest terms.
When we come to add fractions, the problem is the reverse of simplification.
We cannot immediately add a/b + c/d because the denominators b and
d are different. To make progress, we have to rewrite each fraction so that
their denominators are the same. The simplest way to do this is to rewrite
each fraction as a fraction over bd: to do this, we multiply the first fraction
by d and the second by b to get

(ad)/(bd) + (bc)/(bd) = (ad + bc)/(bd).

However, the most efficient way is to write each fraction over lcm(b, d). Let
lcm(b, d) = b′b = d′d. Then

a/b + c/d = (b′a)/(b′b) + (d′c)/(d′d) = (b′a + d′c)/lcm(b, d).
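The addition-over-the-lcm recipe can be sketched in a few lines of Python (the function names are mine; Proposition 1.3.9 supplies the lcm):

```python
from math import gcd

def lcm(a, b):
    # Proposition 1.3.9: gcd(a, b) * lcm(a, b) = a * b.
    return a * b // gcd(a, b)

def add_fractions(a, b, c, d):
    """Return a/b + c/d as a pair (numerator, denominator) in lowest terms."""
    m = lcm(b, d)
    num = (m // b) * a + (m // d) * c   # b'a + d'c over the common denominator
    g = gcd(num, m)
    return num // g, m // g

# 1/6 + 1/4 computed over lcm(6, 4) = 12.
print(add_fractions(1, 6, 1, 4))  # → (5, 12)
```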
1.3.2 Primes: the atoms of number
A proper divisor of a natural number n is a divisor that is neither 1 nor n.
A natural number n is said to be prime if n ≥ 2 and the only divisors of n
are 1 and n itself. A number bigger than or equal to 2 which is not prime is
said to be composite.
Warning! The number 1 is not a prime.
The properties of primes have exercised a great fascination ever since they
were first studied and continue to pose questions that mathematicians have
yet to solve. We shall just describe their basic properties in this section.
Lemma 1.3.10. Let n ≥ 2. Either n is prime or the smallest proper divisor
of n is prime.
Proof. Suppose n is not prime. Let d be the smallest proper divisor of n. If d
were not prime then d would have a smallest proper divisor and this divisor
would in turn divide n, but this would contradict the fact that d was the
smallest proper divisor of n. Thus d must itself be prime.
The following was also proved by Euclid: it is Proposition 20 of Book IX
of ‘The Elements’.
Theorem 1.3.11. There are infinitely many primes.
Proof. Let p1 , . . . , pn be the first n primes. Put
N = (p1 . . . pn ) + 1.
If N is a prime, then N is a prime bigger than pn . If N is composite, then N
has a prime divisor p by Lemma 1.3.10. But p cannot equal any of the primes
p1 , . . . , pn because N leaves remainder 1 when divided by pi . It follows that
p is a prime bigger than pn . Thus we can always find a bigger prime. It
follows that there must be an infinite number of primes.
Algorithm 1.3.12. To decide if a number n is prime or composite: check
to see if any prime p ≤ √n divides n. If none of them do, the number n is
prime.

Let’s think about why this works. If a divides n then we can write n = ab
for some number b. If a < √n then b > √n, whilst if a > √n then b < √n.
Thus to decide if n is prime or not we need only carry out trial divisions by
all numbers a ≤ √n. However, this is inefficient because if a divides n and
a is not prime then a is divisible by some prime p, which must therefore also
divide n. It follows that we need only carry out trial divisions by the primes
p ≤ √n.
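A direct transcription of Algorithm 1.3.12 into Python (a sketch; for simplicity it trial-divides by 2 and then by the odd numbers up to √n rather than by primes only, which is correct though slightly less efficient):

```python
def is_prime(n):
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:       # equivalent to d <= sqrt(n), using integers only
        if n % d == 0:
            return False
        d += 2
    return True

print([p for p in range(2, 30) if is_prime(p)])  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```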
Example 1.3.13. Determine whether 97 is prime using the above algorithm.
We first calculate the largest whole number less than or equal to √97. This
is 9. We now carry out trial divisions of 97 by each prime number p where
2 ≤ p ≤ 9; by the way, if you aren’t certain which of these numbers is prime,
just try them all. You’ll get the right answer, although not as efficiently. You
might also want to remember that if m doesn’t divide a number then neither can
any multiple of m. In any event, in this case we carry out trial divisions by
2, 3, 5 and 7. None of them divides 97 exactly and so 97 is prime.
The following is the key property of primes we shall need to prove the
fundamental theorem of arithmetic. We use Bézout’s Theorem to prove it.
It is Proposition 30 of Book VII of ‘The Elements’.
Lemma 1.3.14 (Euclid’s lemma). Let p | ab where p is a prime. Then p | a
or p | b.³
Proof. Suppose that p does not divide a. We shall prove that p must then
divide b. If p does not divide a, then a and p are coprime, and so there exist
integers x and y such that 1 = px + ay. Thus b = bpx + bay. Now p | bp trivially and
p | ba by assumption, and so p | b, as required.
Example 1.3.15. The above result is not true if p is not a prime. For
example, 6 | 9 × 4 but 6 divides neither 9 nor 4.
Lemma 1.3.14 is so important that I want to spell out in words what it says:

If a prime divides a product of numbers it must divide at least
one of them.
Theorem 1.3.16 (Fundamental theorem of arithmetic). Every number n ≥
2 can be written as a product of primes in one way if we ignore the order in
which the primes appear. By product we allow the possibility that there is
only one prime.
Proof. Let n ≥ 2. If n is already a prime then there is nothing to prove, so
we can suppose that n is composite. Let p1 be the smallest prime divisor of
n. Then we can write n = p1 · n′ where n′ < n. Once again, n′ is either prime
or composite. Continuing in this way, we can write n as a product of primes.
³This result can be usefully generalised using much the same proof. Let p | ab where p
and a are coprime. Then p | b.
We now prove uniqueness. Suppose that
n = p1 . . . ps = q1 . . . qt
are two ways of writing n as a product of primes. Now p1 | n and so p1 |
q1 . . . qt . By Euclid’s Lemma, the prime p1 must divide one of the qi ’s and,
since they are themselves prime, it must actually equal one of the qi ’s. By
relabelling if necessary, we can assume that p1 = q1 . Cancel p1 from both
sides and repeat with p2 . Continuing in this way, we see that every prime
occurring on the lefthand side occurs on the righthand side. Changing sides,
we see that every prime occurring on the righthand side occurs on the lefthand
side. We deduce that the two prime decompositions are identical.
When we write a number as a product of primes, we usually gather together the same primes into a prime power, and write the primes in increasing
order; this gives a unique representation. This is illustrated in the example below.
Example 1.3.17. Let n = 999,999. Write n as a product of primes. There
are a number of ways of doing this but in this case there is an obvious place
to start. We have that

n = 3² · 111,111 = 3³ · 37,037 = 3³ · 7 · 5,291 = 3³ · 7 · 11 · 481 = 3³ · 7 · 11 · 13 · 37.

Thus the prime factorisation of 999,999 is

999,999 = 3³ · 7 · 11 · 13 · 37.
Primes can be regarded as the atoms from which all other numbers can
be constructed.
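Finding such a factorisation can be mechanised by repeated trial division, in the spirit of Lemma 1.3.10: keep stripping out the smallest divisor, which is automatically prime. A Python sketch (the function name is mine):

```python
def factorize(n):
    """Return the prime factorisation of n >= 2 as a list of (prime, exponent) pairs."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:      # strip out every copy of the prime d
                n //= d
                e += 1
            factors.append((d, e))
        d += 1
    if n > 1:                      # whatever is left over is itself prime
        factors.append((n, 1))
    return factors

# Example 1.3.17: 999,999 = 3^3 * 7 * 11 * 13 * 37.
print(factorize(999_999))  # → [(3, 3), (7, 1), (11, 1), (13, 1), (37, 1)]
```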
Using the fundamental theorem of arithmetic we can always compute √n,
where n is a natural number, exactly in terms of the square roots of prime
numbers. For example, let’s calculate √540 exactly. First, we find a prime
factorization of 540. We have

540 = 10 · 54 = 2 · 5 · 2 · 27 = 2 · 5 · 2 · 3 · 9 = 2² · 3³ · 5.

Thus

√540 = √(2² · 3² · 3 · 5) = 2 · 3 · √3 · √5 = 6√3√5.

This is an exact answer. If someone needs to compute it explicitly, then they
can do so to a degree of accuracy they choose and not one that you have
arbitrarily decided upon.
We can use the prime factorizations of numbers to give a nice proof of
Proposition 1.3.9. Let m and n be two integers. To keep things simple, we
suppose that their prime factorizations are

m = p1^α · p2^β · p3^γ and n = p1^δ · p2^ε · p3^ζ

where p1, p2, p3 are primes. It will be obvious how to extend this argument
to the general case. The prime factorizations of gcd(m, n) and lcm(m, n) are

gcd(m, n) = p1^min(α,δ) · p2^min(β,ε) · p3^min(γ,ζ)

and

lcm(m, n) = p1^max(α,δ) · p2^max(β,ε) · p3^max(γ,ζ)

respectively. I shall let you work out why, and also work out how we can use
these results to prove the above proposition.
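The min/max description can be checked mechanically. In the sketch below (Python; the helper names are mine) gcd and lcm are built directly from exponent dictionaries, and since min(α, δ) + max(α, δ) = α + δ for every prime, the product gcd · lcm recovers m · n:

```python
def exponents(n):
    """Map each prime divisor of n to its exponent."""
    exps, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            exps[d] = exps.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def gcd_lcm_from_exponents(m, n):
    em, en = exponents(m), exponents(n)
    g = l = 1
    for p in set(em) | set(en):
        g *= p ** min(em.get(p, 0), en.get(p, 0))   # min of exponents -> gcd
        l *= p ** max(em.get(p, 0), en.get(p, 0))   # max of exponents -> lcm
    return g, l

g, l = gcd_lcm_from_exponents(540, 999_999)
print(g, l, g * l == 540 * 999_999)  # min + max = sum, so gcd * lcm = m * n
```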
Exercises 1.3
1. Use Euclid’s algorithm to find the gcd’s of the following pairs of numbers.
(i) 35, 65.
(ii) 135, 144.
(iii) 17017, 18900.
2. Use the extended Euclidean algorithm to find integers x and y such
that gcd(a, b) = ax + by for each of the following pairs of numbers. You
should ensure that your answers for x and y have the correct signs.
(i) 112, 267.
(ii) 242, 1870.
3. Find the lowest common multiples of the following pairs of numbers.
(i) 22, 121.
(ii) 48, 72.
(iii) 25, 116.
4. List the primes less than 100.
5. For each of the following numbers use Algorithm 1.3.12 to determine
whether they are prime or composite. When they are composite find a
prime factorisation. Show all working.
(i) 131.
(ii) 689.
(iii) 5491.
6. Given that 3630000 = 2⁴ · 3 · 5⁴ · 11² and 915062500 = 2² · 5⁶ · 11⁴,
calculate the greatest common divisor and least common multiple of
these two numbers.
7. Calculate the square roots of the following numbers exactly.
(i) 10.
(ii) 42.
(iii) 54.
1.4 Real numbers
I described real numbers in terms of decimal expansions, but this is not the
basic definition of the real numbers. The reals differ from the rationals by
satisfying what is called the completeness axiom. A detailed discussion of
real numbers really belongs to a course in calculus/analysis rather than algebra: the whole of calculus is constructed on this very special property of
the real numbers but let me at least say something about it before moving
on to the business of this section. Draw the number line and now imagine
that you could see only the rational numbers on that line. We shall call this
the rational number line. Superficially, it wouldn’t look any different from
the whole number line. But in fact it is full of holes: indeed, there are more
holes than there are rational numbers. This is a little hard to believe at first
because between any two distinct rational numbers r1 and r2 you can always
find a third: namely, (r1 + r2)/2. However, we have already seen that for any prime
p the number √p is not rational, and so such numbers appear as holes in our rational
number line; indeed, it can be proved that there are lots and lots of holes. If we
now add to our rational number line all the remaining real numbers then it
can be proved that all the holes disappear. The completeness axiom is actually
a way of stating that there are no holes, but stated in mathematical
language: every non-empty subset of the reals that is bounded above has a
least upper bound. The completeness axiom enables us to talk about limits
and so about differentiable and integrable functions.
Remark. In mathematical work, expressions that contain roots such as √2,
or numbers such as π or e, and so on, are never explicitly calculated until
needed; this is for two reasons: first, simplifications may arise, and second,
any explicit calculation will always be an approximation and not the exact
answer.
1.4.1 Irrational numbers
Real numbers are the actual values of quantities such as mass, length and time.
We cannot measure them exactly: the result of a measurement will always
be a rational number. We begin by proving that there are real numbers that
are not rational.

Recall the basic property of prime numbers: if a prime divides the product
of two numbers then it must divide at least one of the numbers. We use this
property below.
A real number which is not rational is said to be irrational.
Theorem 1.4.1. The square root of every prime number is irrational.
Proof. We shall prove this by a method called proof by contradiction. Assume
that we can write √p as a rational. I shall show that this assumption leads
to a contradiction and so must be false.

We are assuming that √p = a/b. By cancelling the greatest common divisor
of a and b we can in fact assume that gcd(a, b) = 1. This will be crucial to
our argument.

Squaring both sides of the equation √p = a/b and multiplying the resulting
equation by b² we get that

pb² = a².
This says that a² is divisible by p. But if a prime divides a product of two
numbers it must divide at least one of those numbers by Euclid’s lemma.
Thus p divides a. Thus we can write a = pc for some natural number c.
Substituting this into our equation above we get that

pb² = p²c².

Dividing both sides of this equation by p gives

b² = pc².

This tells us that b² is divisible by p and so, in the same way as above, p
divides b.

We have therefore shown that our assumption that √p is rational leads
to both a and b being divisible by p. But this contradicts the fact that
gcd(a, b) = 1. Our assumption is therefore wrong, and so √p is not a rational
number.
Irrational numbers abound: both e and π can be proved to be irrational,
for example. The discovery of irrational numbers is due to the Ancient Greeks
and was one of the first great mathematical discoveries.
Although we cannot calculate irrational numbers exactly, we can calculate
them to any degree of accuracy needed, and it is by means of such approximations that irrational numbers
are handled practically. For example, suppose
we want to calculate √n where n is not a perfect square. Make a first guess
a to √n. Put b = n/a. Then their average a′ = (a + b)/2 is in general a better
guess. This process can be repeated, as the following example illustrates,
and enables us to calculate square roots to any desired degree of accuracy.
Example 1.4.2. I shall calculate some approximations to √3 using the above
method. We observe that 1² < 3 < 2², so my first guess is 1. We have that
3/1 = 3 and the average of 1 and 3 is 2. I now start the process all over again
with 2 as my guess. We have that 3/2 = 1·5 and the average of 2 and 1·5
is 1·75. This is my new guess. The number 3 divided by 1·75 is 1·714
(approximately). The average of 1·75 and 1·714 is 1·732. My new guess
is 1·732. 3 divided by 1·732 is 1·732 to 3 decimal places. Observe that
(1·732)² = 2·999 . . ., which isn’t bad.
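The averaging scheme of Example 1.4.2 (Heron’s method, a special case of Newton’s method) is easy to put into code. A Python sketch, assuming we stop once the square of the guess is within a given tolerance of n:

```python
def heron_sqrt(n, guess=1.0, tol=1e-12):
    """Approximate sqrt(n) by repeatedly averaging a guess a with b = n/a."""
    a = guess
    while abs(a * a - n) > tol:
        a = (a + n / a) / 2   # the average of a and b = n/a
    return a

print(heron_sqrt(3))  # ≈ 1.7320508075688772
```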
1.4.2 Decimal fractions
I shall describe in this section the decimal fractions which correspond to
rational numbers. To see what’s involved, let’s calculate some decimal fractions.
Examples 1.4.3.

(i) 1/20 = 0·05. This fraction has a finite decimal representation.

(ii) 1/7 = 0·142857142857142857142857142857 . . .. This fraction has an infinite decimal representation, which consists of the same sequence of
numbers repeated. We abbreviate this decimal to 0·1̄4̄2̄8̄5̄7̄.

(iii) 37/84 = 0·440̄4̄7̄6̄1̄9̄. This fraction has an infinite decimal representation,
which consists of a non-repeating part followed by a part which repeats.

Case (ii) is said to be a purely periodic decimal whereas case (iii), which
is more general, is said to be ultimately periodic.
Proposition 1.4.4. A proper rational number a/b in its lowest terms has a
finite decimal expansion if and only if b = 2^m 5^n for some natural numbers m
and n.

Proof. Let a/b have the finite decimal representation 0·a1 . . . an. This means

a/b = a1/10 + a2/10² + . . . + an/10^n.

The righthand side is just the fraction

(a1 · 10^(n−1) + a2 · 10^(n−2) + . . . + an) / 10^n.

The denominator contains only the prime factors 2 and 5 and so the reduced
form will also contain at most the prime factors 2 and 5.

To prove the converse, consider the proper fraction

a / (2^α 5^β).

If α = β then the denominator is 10^α. If α ≠ β then multiply the fraction by
a suitable power of 2 or 5, as appropriate, so that the resulting fraction has
denominator a power of 10. But any fraction with denominator a power of
10 has a finite decimal expansion.
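Proposition 1.4.4 gives a purely arithmetic test for finiteness. A Python sketch (the function name is mine) that puts a/b in lowest terms and then strips the factors of 2 and 5 from the denominator:

```python
from math import gcd

def has_finite_decimal(a, b):
    """True iff a/b has a finite decimal expansion (Proposition 1.4.4)."""
    b //= gcd(a, b)        # put the fraction in lowest terms
    for p in (2, 5):
        while b % p == 0:  # remove every factor of 2 and of 5
            b //= p
    return b == 1          # finite iff nothing else remains

print(has_finite_decimal(1, 20), has_finite_decimal(1, 7), has_finite_decimal(37, 84))
# → True False False
```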
Proposition 1.4.5. An infinite decimal fraction represents a rational number if and only if it is ultimately periodic.
Proof. Consider the ultimately periodic decimal number

r = 0·a1 . . . as b̄1 . . . b̄t.

We shall prove that r is rational. Observe that

10^s r = a1 . . . as · b̄1 . . . b̄t

and

10^(s+t) r = a1 . . . as b1 . . . bt · b̄1 . . . b̄t.

From which we get that

10^(s+t) r − 10^s r = a1 . . . as b1 . . . bt − a1 . . . as,

where the righthand side is the decimal form of some integer that we shall
call a. It follows that

r = a / (10^(s+t) − 10^s)

is a rational number.

The proof of the converse is based on the method we use to compute
the decimal expansion of m/n. We carry out repeated divisions by n and at
each step of the computation we use the remainder obtained to calculate
the next digit. But there are only a finite number of possible remainders
and our expansion is assumed infinite. Thus at some point there must be
repetition.
Example 1.4.6. We shall write the ultimately periodic decimal 0·94̄ as a
proper fraction in its lowest terms. Put r = 0·94̄. Then

• r = 0·94̄.
• 10r = 9·444 . . .
• 100r = 94·444 . . .

Thus 100r − 10r = 94 − 9 = 85 and so r = 85/90. We can simplify this to r = 17/18.
We can now easily check that this is correct.
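The 10^s and 10^(s+t) manipulation from the proof turns into a short routine. A Python sketch (the argument conventions are mine: the decimal 0·a1 . . . as b̄1 . . . b̄t is passed as the digit string before the bar and the digit string under it):

```python
from math import gcd

def periodic_to_fraction(prefix, block):
    """Convert 0.<prefix><block repeated forever> to a fraction in lowest terms."""
    s, t = len(prefix), len(block)
    # 10^(s+t) r - 10^s r is the integer  prefix+block  minus  prefix.
    a = int(prefix + block) - (int(prefix) if prefix else 0)
    b = 10 ** (s + t) - 10 ** s
    g = gcd(a, b)
    return a // g, b // g

print(periodic_to_fraction("9", "4"))       # 0.9444... → (17, 18)
print(periodic_to_fraction("", "142857"))   # 0.142857142857... → (1, 7)
```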
The commonest mistake in working with ultimately periodic decimals is
misreading which group of digits the overline groups together. The second
commonest is ignoring the overline sign completely.
Exercises 1.4

1. For each of the following fractions determine whether they have finite
or infinite decimal representations. If they have infinite decimal representations, determine whether they are purely periodic or ultimately
periodic; in both cases determine the periodic block.

(i) 1/2.
(ii) 1/3.
(iii) 1/4.
(iv) 1/5.
(v) 1/6.
(vi) 1/7.
2. Write the following decimals as fractions in their lowest terms.
(i) 0 · 534.
(ii) 0 · 2106.
(iii) 0 · 076923.
1.5 *The prime number theorem*
This section will not be examined in 2013.
There are no nice formulae to tell us what the nth prime is but there are
still some interesting results in this direction. The polynomial
p(n) = n² − n + 41
has the property that its value for n = 1, 2, 3, 4, . . . , 40 is always prime. Of
course, for n = 41 it is clearly not prime. In 1971, the mathematician Yuri
Matijasevic found a polynomial in 26 variables of degree 25 with the property
that when non-negative integers are substituted for the variables the positive
values it takes are all and only the primes. However, this polynomial does
not generate the primes in any particular order.
If we adopt a statistical approach then we can obtain much more useful
results. The idea is that for each natural number n we count the number
of primes π(n) less than or equal to n. If we are going to do this then our
first problem is to compile a table of sufficiently many of them. The simplest
way of doing this is to use the Sieve of Eratosthenes. Suppose we want to
construct a table of all primes up to the number N. We begin by listing all
numbers from 2 to N inclusive. Mark 2 as prime and then cross out from the
table all numbers which are multiples of 2. The first number after 2 which
we have not crossed out is 3. We mark this as prime and then cross out all
multiples of 3. The first number after 3 not crossed out is 5. We mark this as
prime and continue in the same way. We stop when we have crossed out all
multiples of the largest prime less than or equal to √N. All marked numbers
will be prime, as well as those numbers which remain not crossed out.
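The sieve translates directly into code. A Python sketch (the function name is mine), returning all primes up to N:

```python
def sieve(N):
    """Sieve of Eratosthenes: return the list of primes <= N."""
    is_prime = [True] * (N + 1)
    is_prime[0:2] = [False, False]           # 0 and 1 are not prime
    p = 2
    while p * p <= N:                        # stop once p exceeds sqrt(N)
        if is_prime[p]:
            for m in range(p * p, N + 1, p): # cross out the multiples of p
                is_prime[m] = False
        p += 1
    return [k for k in range(2, N + 1) if is_prime[k]]

print(sieve(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Counting the output of `sieve(1000)` reproduces the first table entry below, π(1000) = 168.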
If you compile tables of primes in this way, you can calculate the function
π(x). Its graph has a staircase shape — it certainly isn’t smooth — but as
you zoom away it begins to look smoother and smoother. This raises the
question whether there is a smooth function that is a good approximation to
π(n). This seems to have been what Gauss did. He set up a table something
like the following (this is taken from LeVeque’s book Fundamentals of number
theory, Dover, 1977) where

∆(x) = (π(x) − π(x − 1000)) / 1000

represents an approximate slope of the curve π(x).

x        π(x)    ∆(x)     1/ln(x)
1000     168     0·168    0·145
2000     303     0·135    0·132
3000     430     0·127    0·125
4000     550     0·120    0·121
5000     669     0·119    0·117
6000     783     0·114    0·115
7000     900     0·117    0·113
8000     1007    0·107    0·111
9000     1117    0·110    0·110
10000    1229    0·112    0·109
Gauss noticed, because that was the kind of person he was, that the slope of
π(x) looked very much like 1/ln(x). This suggests that the function defined by
integrating these slopes,

li(x) = ∫₂ˣ 1/ln(t) dt,

should be an approximation to π(x). It is called the logarithmic integral. Of
course, this is not a theorem: it is a conjecture. It was proved in 1896 by two
mathematicians: Hadamard in France and de la Vallée Poussin in Belgium.

Theorem 1.5.1 (The Prime Number Theorem (PNT): version 1).

lim_{x→∞} π(x)/li(x) = 1.
This version of the PNT is not that easy for us to use. However, by
l’Hôpital’s rule, we can show that

lim_{x→∞} li(x) / (x/ln(x)) = 1.

If we assume the first version of the PNT and use the above result, we obtain
the second version of the PNT.

Theorem 1.5.2 (The Prime Number Theorem: version 2).

lim_{x→∞} π(x) / (x/ln(x)) = 1.
The above theorem can be interpreted as saying that for large values of
x the value of π(x) is approximately given by x/ln(x). This result is a huge
improvement on the theorem that there are infinitely many primes: it tells
us not only that there are infinitely many of them but also how they are
distributed.
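The second version of the PNT can be spot-checked against the table above. A Python sketch that counts primes with a small sieve and compares π(x) with x/ln(x):

```python
from math import log

def primes_up_to(N):
    """Sieve of Eratosthenes, as described earlier in this section."""
    flags = [True] * (N + 1)
    flags[0:2] = [False, False]
    p = 2
    while p * p <= N:
        if flags[p]:
            for m in range(p * p, N + 1, p):
                flags[m] = False
        p += 1
    return [k for k in range(2, N + 1) if flags[k]]

primes = primes_up_to(10_000)
for x in (1_000, 10_000):
    pi_x = sum(1 for p in primes if p <= x)
    # Matches the table: pi(1000) = 168 vs 144.8, pi(10000) = 1229 vs 1085.7.
    print(x, pi_x, round(x / log(x), 1))
```

The ratio π(x)·ln(x)/x drifts slowly towards 1, which is exactly what the theorem asserts in the limit.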
Prime numbers also play an important role in computing: specifically, in
exchanging secret information. In 1976, Whitfield Diffie and Martin Hellman
wrote a paper on cryptography that can genuinely be called ground-breaking.
In ‘New directions in cryptography’, IEEE Transactions on Information Theory 22 (1976), 644–654, they put forward the idea of a public-key cryptosystem which would enable

. . . a private conversation . . . [to] be held between any two individuals regardless of whether they have ever communicated before.
With considerable farsightedness, Diffie and Hellman foresaw that such cryptosystems would be essential if communication between computers was to
reach its full potential. However, their paper did not describe a concrete way
of doing this. It was R. L. Rivest, A. Shamir and L. Adleman (RSA) who
found just such a concrete method, described in their paper ‘A method for
obtaining digital signatures and public-key cryptosystems’, Communications
of the ACM 21 (1978), 120–126. Their method is based on the following
observation. Given two prime numbers it takes very little time to multiply
them together, but if I give you a number that is a product of two primes
and ask you to factorize it then it takes a lot of time. After considerable
experimentation, RSA showed how to use little more than second year undergraduate mathematics to put together a public-key cryptosystem that is
an essential ingredient in e-commerce. Ironically, this secret code had in fact
been invented in 1973 at GCHQ — who had kept it secret.
1.6 *Proofs by induction*
This section will not be examined in 2013.
This is a proof technique with applications throughout mathematics. The
basis of this technique is the following idea:
“I am thinking of a subset X of the infinite set {1, 2, 3, . . .}. I tell
you two things about X: first, 1 ∈ X, and second if n ∈ X then
n + 1 ∈ X. What is X?”
The fact that these two pieces of information are enough to determine
the set of positive integers is called the principle of induction. This principle
can be used to prove results as follows.
Suppose we have an infinite number of statements S1 , S2 , S3 , . . . which we
want to prove. By the principle of induction it is enough to do two things:
1. Show that S1 is true.
2. Show that if Sn is true then Sn+1 is also true.
It will follow that Si is true for all positive i. This proof technique can only
be learnt by attempting lots of examples.
Example 1.6.1. Prove by induction that

∑_{k=1}^{n} k = n(n + 1)/2.
A proof by induction takes the following form:
Base step: show that the case k = 1 holds.

Induction hypothesis (IH): assume that the case k = n holds.

Proof bit: now use (IH) to show that the case k = n + 1 holds.
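A numerical spot-check is no substitute for the induction proof, but it is a useful sanity check on the formula being proved. A Python sketch comparing the closed form of Example 1.6.1 with a direct sum:

```python
def triangular(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2

# Check the formula against a direct sum for the first hundred cases.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == triangular(n)
print("formula verified for n = 1, ..., 100")
```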
Exercise 1.6.2. Prove by induction that

1 + 3 + 5 + . . . + (2n − 1) = n²

for each n ≥ 1.
Exercise 1.6.3. Prove by induction that 5ⁿ − 1 is exactly divisible by 4 for
all natural numbers n ≥ 1.
What I have described above I shall call ‘basic’ induction. There are
numerous variations on basic induction. I shall describe two here:
1. Rather than starting the base step at k = 1 we might start at k = 2 or
k = 3 and so on.
2. In basic induction we assume Sn and prove Sn+1 . Sometimes we need
to assume some or all of S1 , . . . , Sn to be true in order to prove Sn+1
and in addition our base case may consist of several cases. This is often
called ‘strong induction’.
Example 1.6.4. Prove for all natural numbers n ≥ 4 that n³ < 3ⁿ.
Exercises on induction
1. Prove the following by induction.
(i) n³ + 2n is exactly divisible by 3 for all natural numbers n ≥ 1.

(ii) ∑_{i=1}^{n} i² = n(n + 1)(2n + 1)/6 for all natural numbers n ≥ 1.

(iii) n! ≥ 2^(n−1) for all natural numbers n ≥ 1.
2. Prove for all n ≥ 1 that

1/(1 · 2) + 1/(2 · 3) + . . . + 1/(n(n + 1)) = n/(n + 1).
3. Prove for all n ≥ 0 that the following number is exactly divisible by 17:

3 · 5^(2n+1) + 2^(3n+1).
4. A matrix A is said to be symmetric if it is equal to its transpose; that
is, Aᵀ = A. Prove that if A is symmetric then Aⁿ is symmetric for all
n ≥ 1. [You will be able to answer this question after we have studied
matrices in Chapter 3.]
5. Prove that n³ < 3ⁿ for all n ≥ 4.
Solutions
1. (i) Base step: when n = 1, we have that n³ + 2n = 3, which is exactly
divisible by 3. Induction hypothesis: assume the result is true for
n = k. We prove it for n = k + 1. We need to prove that
(k + 1)³ + 2(k + 1) is exactly divisible by 3, assuming only that
k³ + 2k is exactly divisible by 3. We first expand (k + 1)³ + 2(k + 1)
to get

k³ + 3k² + 3k + 1 + 2k + 2.

This is equal to

(k³ + 2k) + 3(k² + k + 1),

which is exactly divisible by 3 using the induction hypothesis.
(ii) Base step: check the formula is true when n = 1. Induction hypothesis: assume the result is true for n = k. We prove it for
n = k + 1. We need to calculate

∑_{i=1}^{k+1} i².

But this is equal to

∑_{i=1}^{k+1} i² = (∑_{i=1}^{k} i²) + (k + 1)².

By the induction hypothesis this is equal to

k(k + 1)(2k + 1)/6 + (k + 1)².

This can be written as

(k(k + 1)(2k + 1) + 6(k + 1)²)/6.

Taking out the factor (k + 1) and then carrying out some algebraic
manipulation gives

(k + 1)(k + 2)(2k + 3)/6,

as required.
(iii) Base step: check the inequality holds when n = 1. Induction hypothesis: assume the inequality holds for n = k. We prove it for n = k + 1.
We argue as follows:

(k + 1)! = (k + 1) · k! ≥ (k + 1) · 2^(k−1)

by the induction hypothesis. Since k ≥ 1 it is clear that k + 1 ≥ 2.
Thus

(k + 1) · 2^(k−1) ≥ 2 · 2^(k−1) = 2^k.

Hence we have proved that

(k + 1)! ≥ 2^k,

as required.
2. Base case: when n = 1 the LHS is 1/2 and the RHS is 1/(1 + 1), and so
LHS = RHS.

IH: we assume that

∑_{i=1}^{n} 1/(i(i + 1)) = n/(n + 1).
Proof part: we have to prove that

∑_{i=1}^{n+1} 1/(i(i + 1)) = (n + 1)/(n + 2).

We start with the LHS of the equality we are trying to prove:

∑_{i=1}^{n+1} 1/(i(i + 1)) = (∑_{i=1}^{n} 1/(i(i + 1))) + 1/((n + 1)(n + 2)).

By the induction hypothesis this is equal to

n/(n + 1) + 1/((n + 1)(n + 2)).

If we add these fractions and factorise the numerator we get

(n + 1)²/((n + 1)(n + 2)).

On cancelling the common factor we get

(n + 1)/(n + 2),

as required.
3. Base case: when n = 0 the number in question is 3 · 5 + 2 = 17 and so clearly
exactly divisible by 17.

IH: assume that 3 · 5^(2n+1) + 2^(3n+1) is exactly divisible by 17.

Proof part: we have to prove that

3 · 5^(2n+3) + 2^(3n+4)

is exactly divisible by 17. Observe that

3 · 5^(2n+3) + 2^(3n+4) = 3 · 5^(2n+1) · 5² + 2^(3n+1) · 2³ = 3 · 5^(2n+1) · 25 + 2^(3n+1) · 8,

which is equal to

75 · 5^(2n+1) + 8 · 2^(3n+1).
Write 75 = 8 · 3 + 51 (why?). Then

75 · 5^(2n+1) + 8 · 2^(3n+1) = (8 · 3 + 51) · 5^(2n+1) + 8 · 2^(3n+1) = 8 · 3 · 5^(2n+1) + 8 · 2^(3n+1) + 51 · 5^(2n+1),

which is equal to

8(3 · 5^(2n+1) + 2^(3n+1)) + 17 · 3 · 5^(2n+1).

By (IH) the first summand is exactly divisible by 17; so is the second,
and so is their sum.
4. Base case: we are given that A is symmetric and A¹ = A by definition.

IH: we assume that if A is symmetric then Aⁿ is symmetric.

Proof part: we have to prove that Aⁿ⁺¹ is symmetric.

We know that Aⁿ⁺¹ = A·Aⁿ. Thus

(Aⁿ⁺¹)ᵀ = (A·Aⁿ)ᵀ = (Aⁿ)ᵀ·Aᵀ,

using a familiar property of the transpose. By (IH), we have that
(Aⁿ)ᵀ = Aⁿ and by assumption Aᵀ = A, and so

(Aⁿ⁺¹)ᵀ = Aⁿ·A = Aⁿ⁺¹,

as required.
5. Base case: here the base case is n = 4. The LHS is 4³ = 64 and the
RHS is 3⁴ = 81. Thus the LHS is strictly less than the RHS.

IH: we assume that n³ < 3ⁿ.

Proof part: we have to prove that (n + 1)³ < 3ⁿ⁺¹.

We start on the LHS of the inequality we are trying to prove:

(n + 1)³ = [n(1 + 1/n)]³ = n³(1 + 1/n)³.

By (IH), we know that n³ < 3ⁿ, and because n ≥ 4 we know that
(1 + 1/n)³ ≤ (5/4)³ < 3. Thus

(n + 1)³ < 3ⁿ · 3 = 3ⁿ⁺¹,

as required.
1.7 Learning outcomes for Chapter 1
At the end of working through this chapter, you should be able to do the
following. You can think of these as potential test and exam questions.
• Understand basic set notation.
• You should know the meanings of all the words highlighted in italics in
the lecture notes.
• You should work through and understand all the proofs in this chapter.
• Convert between different number bases.
• Calculate greatest common divisors using Euclid’s algorithm (and Blankinship’s algorithm discussed in Chapter 3).
• Use the extended Euclidean algorithm.
• Calculate least common multiples.
• Manipulate fractions using gcd’s and lcm’s.
• Find prime factorizations of numbers and apply them.
• Prove that certain numbers are irrational.
• Convert between fractional and decimal representations of numbers.
1.8 Further reading and exercises
For sets, look at Chapter 1 of Hammack. In Chapter 5, I shall end up
covering most of this material, but for the time being, concentrate on the
exercises that deal with the material I have taught so far. Chapter 1 of Olive
contains some basic background material that you might find useful. You can
find more exercises dealing with numbers in Chapter 11 of Schaum’s Outline
Discrete Mathematics. If you want to find out more about prime numbers, I
recommend Marcus du Sautoy, The music of the primes, Harper Perennial,
2004.
Chapter 2

The fundamental theorem of algebra
In this chapter, we introduce the complex numbers. These are essential for
the further development of both algebra and calculus. Not only do they
have practical applications, they also have important theoretical ones: they
enable us to connect different parts of mathematics that would otherwise
look unrelated.
2.1
Complex number arithmetic
In the set of real numbers we can add, subtract, multiply and divide, but
we cannot always extract square roots. For example, the real number 1 has
the two real square roots 1 and −1, whereas the real number −1 has no real
square roots, the reason being that the square of any real non-zero number is
always positive. In this section, we shall repair this lack of square roots and,
as we shall learn, we shall in fact achieve much more than this. Complex numbers were first studied in the 1500s but were only fully accepted
and used in the 1800s.
Warning! If r is a positive real number then √r is usually interpreted to
mean the positive square root. If I want to emphasise that both square roots
need to be considered I shall write ±√r.
2.1.1 Solving quadratic equations
Quadratic equations were solved by the Babylonians and the Egyptians and
are dealt with in all school algebra courses. I have included them here because
I want to show you that you don’t have to remember a formula to solve such
equations; what you have to remember is a method.
An expression of the form

ax^2 + bx + c,

where a, b, c are numbers and a ≠ 0, is called a quadratic polynomial or a
polynomial of degree 2. The numbers a, b, c are called the coefficients of the
quadratic. A quadratic where a = 1 is said to be monic. A number r such
that

ar^2 + br + c = 0
is called a root of the polynomial. The problem of finding all the roots of a
quadratic is called solving the quadratic. Usually this problem is stated in
the form: ‘solve the quadratic equation ax^2 + bx + c = 0’. We speak of an
equation because we have set the polynomial equal to zero.
I shall now show you how to solve a quadratic equation without having
to remember a formula. Observe first that if ax^2 + bx + c = 0 then

x^2 + (b/a)x + c/a = 0.
Thus it is enough to find the roots of monic quadratics.
Next, I want to write x^2 + (b/a)x as a perfect square plus a number: this
will turn out to be the crux of solving the quadratic. Let α be any number.
Then we have the following identity

(x + α)^2 = x^2 + 2αx + α^2.

We would like

2αx = (b/a)x.

It follows that α = b/(2a). We therefore have that

x^2 + (b/a)x = (x + b/(2a))^2 − b^2/(4a^2).
Look carefully at what we have done here: we have rewritten the lefthand
side as a perfect square (the first term on the righthand side) plus a
number (the second term on the righthand side). It follows that

x^2 + (b/a)x + c/a = (x + b/(2a))^2 − b^2/(4a^2) + c/a = (x + b/(2a))^2 + (4ac − b^2)/(4a^2).
Setting the last expression equal to zero and rearranging, we get

(x + b/(2a))^2 = (b^2 − 4ac)/(4a^2).
Now take square roots of both sides, remembering that a non-zero number
has two square roots:

x + b/(2a) = ±√((b^2 − 4ac)/(4a^2))
which of course simplifies to

x + b/(2a) = ±√(b^2 − 4ac)/(2a).
Thus

x = (−b ± √(b^2 − 4ac))/(2a),

the usual formula for finding the roots of a quadratic. This way of solving a
quadratic equation is called completing the square. Of course, you can just
use the formula, but now you have proved that the formula always works.
Example 2.1.1. Solve the quadratic equation

2x^2 − 5x + 1 = 0

by completing the square. Divide through by 2 to make the quadratic monic,
giving

x^2 − (5/2)x + 1/2 = 0.
We now want to write

x^2 − (5/2)x

as a perfect square plus a number. We get

x^2 − (5/2)x = (x − 5/4)^2 − 25/16.
Thus our quadratic becomes

(x − 5/4)^2 − 25/16 + 1/2 = 0.
Rearranging and taking roots gives us

x = 5/4 ± √17/4 = (5 ± √17)/4.
We now check our answer by substituting each of our two roots back into
the original quadratic and ensuring that we get zero in both cases.
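The method can be scripted for real quadratics with non-negative discriminant. A minimal Python sketch mirroring the steps above (the function name is ours, not part of the course):

```python
import math

def solve_quadratic(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 (real coefficients, non-negative
    discriminant), found by completing the square as in the text."""
    if a == 0:
        raise ValueError("not a quadratic")
    p, q = b / a, c / a                # make it monic: x**2 + p x + q = 0
    # x**2 + p x = (x + p/2)**2 - p**2/4, so (x + p/2)**2 = p**2/4 - q.
    rhs = p * p / 4 - q
    if rhs < 0:
        raise ValueError("negative discriminant: the roots are complex")
    root = math.sqrt(rhs)
    return (-p / 2 + root, -p / 2 - root)

# Example 2.1.1: 2x**2 - 5x + 1 = 0 has roots (5 ± sqrt(17))/4.
x1, x2 = solve_quadratic(2, -5, 1)
assert abs(x1 - (5 + math.sqrt(17)) / 4) < 1e-12
assert abs(2 * x1**2 - 5 * x1 + 1) < 1e-9   # substitute back, as in the text
```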
For the quadratic equation
ax2 + bx + c = 0
the number D = b2 − 4ac, called the discriminant of the quadratic, plays an
important role.
• If D > 0 then the quadratic equation has two distinct real solutions.
• If D = 0 then the quadratic equation has one real root repeated.
• If D < 0 then we shall see that the quadratic equation has two complex
roots which are complex conjugate to each other. This is called the
irreducible case.
2.1.2 Introducing complex numbers
In the previous section, we reviewed how to solve quadratic equations. The
method we described apparently yields nothing when the discriminant is
strictly less than zero because we have no way, up to now, of taking square
roots of negative numbers. In this section, we shall change all that and show
that quadratic equations always have two roots. The key step is the following.
We introduce a new number, denoted by i, whose defining property
is that i2 = −1. We shall assume that in all other respects it
satisfies the usual axioms of high-school algebra. This assumption
will be justified later.
We shall now explore the consequences of this definition which turns out to
be a profound one for mathematics.
It follows that i and −i are the two missing square roots of −1. In all other
respects the number i will behave like a real number. Thus if b is any real
number then bi is a number, and if a is any real number then a + bi is a
number.
A complex number is a number of the form a + bi where a, b ∈ R. We
denote the set of complex numbers by C. Complex numbers are sometimes
called imaginary numbers. This is not such a good term: they are not
figments of our imagination like unicorns or dragons. Like all numbers they
are, however, products of our imagination: no one has seen the complex
number i but, then again, no one has seen the number 2. If z =
a + bi then we call a the real part of z, denoted Re(z), and b the complex or
imaginary part of z, denoted Im(z).
Two complex numbers a + bi and c + di are equal precisely when
a = c and b = d. In other words, when their real parts are equal
and when their complex parts are equal.
We can think of every real number as being a special kind of complex
number because if a is real then a = a + 0i. Thus R ⊆ C. Complex numbers
of the form bi are said to be purely imaginary.
Now we show that we can add, subtract, multiply and divide complex
numbers. Addition, subtraction and multiplication are all easy.
Let a + bi, c + di ∈ C. To add these numbers means to calculate (a + bi) +
(c+di). We assume that the order in which we add complex numbers doesn’t
matter and that we may bracket sums of complex numbers how we like and
still get the same answer and so we can rewrite this as a + c + bi + di. Next
we assume that multiplication of complex numbers distributes over addition
of complex numbers to get (a + c) + (b + d)i. Thus
(a + bi) + (c + di) = (a + c) + (b + d)i.
The definition of subtraction is similar and justified in the same way
(a + bi) − (c + di) = (a − c) + (b − d)i.
To multiply our numbers means to calculate (a + bi)(c + di). We first
assume complex multiplication distributes over complex addition to get
(a + bi)(c + di) = ac + adi + bic + bidi. Next we assume that the order in which
we multiply complex numbers doesn't matter to get ac + adi + bic + bidi =
ac + adi + bci + bdi^2. Now we use the fact that i^2 = −1 to get ac + adi + bci +
bdi^2 = ac + adi + bci − bd. We now rearrange the terms to get the following
definition of multiplication
(a + bi)(c + di) = (ac − bd) + (ad + bc)i.
Examples 2.1.2. Carry out the following calculations.
(i) (7 − i) + (−6 + 3i). We add together the real parts to get 1; adding
together −i and 3i we get 2i. Thus the solution is 1 + 2i.
(ii) (2 + i)(1 + 2i). First we multiply out the brackets as usual to get 2 +
4i + i + 2i^2. We now use the fact that i^2 = −1 to get 2 + 4i + i − 2.
Finally we simplify to get 0 + 5i = 5i.
(iii) ((1 − i)/√2)^2. Multiply out and simplify to get −i.
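The addition and multiplication rules above transcribe directly into code. A small Python sketch of our own, representing a + bi as the pair (a, b) (Python's built-in complex type implements exactly these rules):

```python
def c_add(w, z):
    """(a + bi) + (c + di) = (a + c) + (b + d)i, with pairs (a, b), (c, d)."""
    a, b = w
    c, d = z
    return (a + c, b + d)

def c_mul(w, z):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    a, b = w
    c, d = z
    return (a * c - b * d, a * d + b * c)

# Examples 2.1.2: (7 - i) + (-6 + 3i) = 1 + 2i and (2 + i)(1 + 2i) = 5i.
assert c_add((7, -1), (-6, 3)) == (1, 2)
assert c_mul((2, 1), (1, 2)) == (0, 5)
assert c_mul((0, 1), (0, 1)) == (-1, 0)   # i * i = -1, the defining property
```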
The final operation is division. This is a little more involved and to
explain how it can be done we need to define a new operation on complex
numbers. Let z = a + bi ∈ C. Define
z̄ = a − bi.
The number z̄ is called the complex conjugate of z. Why is this operation
useful? Let’s calculate z z̄. We have
z z̄ = (a + bi)(a − bi) = a^2 − abi + abi − b^2 i^2 = a^2 + b^2.
Notice that z z̄ = 0 if and only if z = 0. Thus for non-zero complex numbers
z, the number z z̄ is a positive real number.
Let’s see how we can use the complex conjugate to define division of
complex numbers. Our goal is to calculate

(a + bi)/(c + di)

where c + di ≠ 0. The first step is to multiply top and bottom by the complex
conjugate of c + di. We therefore get
(a + bi)/(c + di) = (a + bi)(c − di)/((c + di)(c − di)) = (a + bi)(c − di)/(c^2 + d^2).
This pretty much solves the problem because the top of this expression can
be multiplied out in the usual way and the bottom is a real number and
can be divided into each term in the top. We therefore have the following
definition of division, where c + di ≠ 0:

(a + bi)/(c + di) = (ac + bd)/(c^2 + d^2) + ((bc − ad)/(c^2 + d^2)) i.
Examples 2.1.3. Carry out the following calculations.

(i) (1 + i)/i. The complex conjugate of i is −i. Multiply top and bottom
of the fraction to get (−i + 1)/1 = 1 − i.

(ii) i/(1 − i). The complex conjugate of 1 − i is 1 + i. Multiply top and
bottom of the fraction to get i(1 + i)/2 = (i − 1)/2.

(iii) (4 + 3i)/(7 − i). The complex conjugate of 7 − i is 7 + i. Multiply top
and bottom of the fraction to get (4 + 3i)(7 + i)/50 = (1 + i)/2.
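Division by the conjugate trick is equally mechanical. A Python sketch in the same pair representation as before (our own helper, not part of the course):

```python
def c_div(w, z):
    """(a + bi)/(c + di): multiply top and bottom by the conjugate c - di,
    so the denominator becomes the real number c**2 + d**2."""
    a, b = w
    c, d = z
    denom = c * c + d * d
    if denom == 0:
        raise ZeroDivisionError("division by 0 + 0i")
    return ((a * c + b * d) / denom, (b * c - a * d) / denom)

# Examples 2.1.3: (1 + i)/i = 1 - i and (4 + 3i)/(7 - i) = (1 + i)/2.
assert c_div((1, 1), (0, 1)) == (1.0, -1.0)
assert c_div((4, 3), (7, -1)) == (0.5, 0.5)
```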
We now introduce a way of thinking about complex numbers that enables
us to visualize them. A complex number z = a + bi has two components: a
and b. It is irresistible to plot these as a point in the plane. The plane used
in this way is called the complex plane: the x-axis is the real axis and the
y-axis is interpreted as the complex axis.
[Diagram: the complex number z = a + ib plotted in the complex plane, drawn as the directed line segment from the origin to the point with horizontal coordinate a on the real axis and vertical coordinate b on the imaginary axis.]
Although a complex number can be thought of as labelling a point in the
complex plane, it can also be regarded as labelling the directed line segment
from the origin to the point. By Pythagoras' theorem, the length of this line
is √(a^2 + b^2). We define

|z| = √(a^2 + b^2)
where z = a + bi. This is called the modulus (plural: moduli) of the complex number z.
Observe that

|z| = √(z z̄).
We shall use the following important property of moduli.
Lemma 2.1.4. |wz| = |w| |z|.
Proof. Let w = a + bi and z = c + di. Then wz = (ac − bd) + (ad + bc)i. Now
|wz| = √((ac − bd)^2 + (ad + bc)^2) whereas |w| |z| = √((a^2 + b^2)(c^2 + d^2)). But

(ac − bd)^2 + (ad + bc)^2 = (ac)^2 + (bd)^2 + (ad)^2 + (bc)^2 = (a^2 + b^2)(c^2 + d^2).

Thus the result follows.
Remark We can deduce an interesting result in number theory from the algebra used in proving the above result. The product of two natural numbers
each of which is a sum of squares is itself a sum of squares.
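The identity used in the proof produces the witnessing pair of squares explicitly. A small Python sketch of our own:

```python
def two_squares_product(a, b, c, d):
    """Given m = a**2 + b**2 and n = c**2 + d**2, return (p, q) with
    m * n = p**2 + q**2.  The pair is just the real and imaginary parts
    of (a + bi)(c + di), i.e. the identity behind |wz| = |w||z|."""
    return (a * c - b * d, a * d + b * c)

# 5 = 1 + 4 and 13 = 4 + 9, so 65 = 5 * 13 is again a sum of two squares:
p, q = two_squares_product(1, 2, 2, 3)
assert p * p + q * q == 5 * 13    # (-4)**2 + 7**2 == 65
```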
The complex numbers were obtained from the reals by simply throwing in
one new number, i, a square root of −1. Remarkably, every complex number
has a square root.
Theorem 2.1.5. Every nonzero complex number has exactly two square
roots.
Proof. Let z = a + bi be a nonzero complex number. We want to find a
complex number w so that w^2 = z. Let w = x + yi. Then we need to find
real numbers x and y such that (x + yi)^2 = a + bi. Thus (x^2 − y^2) + 2xyi = a + bi,
and so equating real and imaginary parts, we have to solve the following two
equations
x^2 − y^2 = a and 2xy = b.
Now we actually have enough information to solve our problem, but we can
make life easier for ourselves by adding one extra equation. To get it, we use
the modulus function. From (x + yi)^2 = a + bi we get that |x + yi|^2 = |a + bi|.
Now |x + yi|^2 = x^2 + y^2 and |a + bi| = √(a^2 + b^2). We therefore have three
equations

x^2 − y^2 = a and 2xy = b and x^2 + y^2 = √(a^2 + b^2).
If we add the first and third equations together we get

x^2 = a/2 + √(a^2 + b^2)/2 = (a + √(a^2 + b^2))/2.
We can now solve for x and therefore for y.
Example 2.1.6. Every negative real number has two square roots. We have
that the square roots of −r, where r > 0, are ±i√r.
Example 2.1.7. Find both square roots of 3 + 4i and check your answers.
We assume that there is a complex number x + yi where both x and y are
real such that

(x + yi)^2 = 3 + 4i.

Squaring and comparing real and imaginary parts we get that the following
two equations must be satisfied by x and y:

x^2 − y^2 = 3 and 2xy = 4.

We also have a third equation by taking moduli:

x^2 + y^2 = 5.

Adding the first and third equations together we get x = ±2. Thus y = 1 if
x = 2 and y = −1 if x = −2. The roots we want are therefore 2 + i and
−2 − i. Of course, one root will be minus the other. Now square either root
to check your answer: (2 + i)^2 = 4 + 4i − 1 = 3 + 4i, as required.
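The three-equation method from the proof can be coded directly. A minimal Python sketch (the function name is ours):

```python
import math

def complex_sqrt(a, b):
    """One square root x + yi of a + bi, following the proof above:
    solve x**2 - y**2 = a, 2xy = b and x**2 + y**2 = |a + bi|
    simultaneously.  The other root is its negative."""
    r = math.hypot(a, b)               # |a + bi| = sqrt(a**2 + b**2)
    x = math.sqrt((a + r) / 2)         # add the first and third equations
    if x != 0:
        y = b / (2 * x)                # 2xy = b fixes the sign of y
    else:
        y = math.sqrt((r - a) / 2)     # here b = 0 and a <= 0
    return (x, y)

# Example 2.1.7: the square roots of 3 + 4i are ±(2 + i).
assert complex_sqrt(3, 4) == (2.0, 1.0)
# Example 2.1.6: the square roots of -4 are ±2i.
assert complex_sqrt(-4, 0) == (0.0, 2.0)
```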
Remark Notice that the two square roots of a non-zero complex number will
have the form w and −w; in other words, one root will be −1 times the other.
If we combine our method for solving quadratics with our method for
determining the square roots of complex numbers, we have a method for
finding the roots of quadratics with any coefficients, whether they be real or
complex.
Example 2.1.8. Solve the quadratic equation
4z^2 + 4iz + (−13 − 16i) = 0.
The complex numbers obey the same algebraic laws as the reals and so we
can solve this equation by completing the square or we can simply plug the
numbers into the formula for the roots of a quadratic. Here I shall complete
the square. First, we convert the equation into a monic one
z^2 + iz + (−13 − 16i)/4 = 0.
Next, we observe that

(z + i/2)^2 = z^2 + iz − 1/4.
Thus

z^2 + iz = (z + i/2)^2 + 1/4.
Our equation therefore becomes

(z + i/2)^2 + 1/4 + (−13/4 − 4i) = 0.
We therefore have

(z + i/2)^2 = 3 + 4i.
Taking square roots of both sides using a previous calculation, we have that

z + i/2 = 2 + i or −2 − i.

It follows that z = 2 + i/2 or −2 − 3i/2. Now check that these roots really do
work.
Every quadratic equation ALWAYS has exactly two roots.
Warning! Complex numbers are numbers: you need to think of them as a
whole and not as an ordered pair of numbers. For example, it is very common for students to write down the roots of, say, 3 + 4i as x = 2, y = 1 and
x = −2, y = −1. This is wrong. The roots are 2 + i and −2 − i.
Historical point In fact, complex numbers were not introduced in quite the
way I described them: it was cubic equations that led to their discovery.
Exercises 2.1
1. Calculate the discriminants of the following real quadratics and so determine whether they have two distinct roots, or repeated roots, or no
real roots.
(i) x^2 + 6x + 5.
(ii) x^2 − 4x + 4.
(iii) x^2 − 2x + 5.
2. Solve the following quadratic equations by completing the square. Check
your answers.
(i) x^2 + 10x + 16 = 0.
(ii) x^2 + 4x + 2 = 0.
(iii) 2x^2 − x − 7 = 0.
3. Solve the following problems in complex number arithmetic. In each
case, the answer should be in the form a + ib where a and b are real.
(i) (2 + 3i) + (4 + i).
(ii) (2 + 3i)(4 + i).
(iii) (8 + 6i)^2.
(iv) (2 + 3i)/(4 + i).
(v) 1/i + 3/(1 + i).
(vi) (3 + 4i)/(3 − 4i) − (4 + 4i)/(3 − 4i).
4. Find the square roots of each of the following complex numbers and
check your answers.
(i) −i.
(ii) −1 + i√24.
(iii) −13 − 84i.
5. Solve the following quadratic equations and check your answers.
(i) x^2 + x + 1 = 0.
(ii) 2x^2 − 3x + 2 = 0.
(iii) x^2 − (2 + 3i)x − 1 + 3i = 0.
2.2 The fundamental theorem of algebra
I want to describe now a result which is one of the most important consequences of the properties of complex numbers: the fundamental theorem of
algebra. It should be understood that this is a misnomer since algebra has
expanded beyond all bounds since this theorem was first proved. Nevertheless, it is an important result playing a key role in calculus where it is used
(in its real version which I also describe) to prove that any rational function
can be integrated using partial fractions.
In this section, we shall work with arbitrary polynomials and I shall now
recall some terminology for handling them. An expression

a_n x^n + a_{n−1} x^(n−1) + ... + a_1 x + a_0,

where the a_i are complex numbers, called the coefficients, is called a polynomial.
We assume a_n ≠ 0. The degree of this polynomial is n. We abbreviate this
to deg. If a_n = 1 the polynomial is said to be monic. The term a_0 is called
the constant term and the term a_n x^n is called the leading term. Polynomials
can be added, subtracted and multiplied.
Two polynomials are equal if they have the same degree and the coefficients of terms of the same degree are equal.
• Polynomials of degree 1 are said to be linear;
• those of degree 2, quadratic;
• those of degree 3, cubic;
• those of degree 4, quartic;
• those of degree 5, quintic.
There are special terms for polynomials of degree higher than 5, if you want
them.
Why are polynomials interesting? There are two answers to this question.
First, they have widespread applications such as in helping to solve linear
differential equations and in studying matrices. Second, a polynomial defines
a function which is calculated in a very simple way using the operations of
addition, subtraction and multiplication. However, many more complicated
functions can be usefully approximated by polynomial ones.
We denote by C[x] the set of polynomials with complex coefficients and
by R[x], the set of polynomials with real coefficients. I will write F [x] to
mean F = R or F = C.
2.2.1 The arithmetic of polynomials
The addition, subtraction and multiplication of polynomials is easy. We shall
therefore concentrate in this section on division.
Let f (x), g(x) ∈ F [x]. We say that g(x) divides f (x), denoted by
g(x) | f (x),
if there is a polynomial q(x) ∈ F [x] such that f (x) = g(x)q(x). We say that
g(x) is a factor of f (x).
Example 2.2.1. Let f (x) = x^4 + 2x + 1 and g(x) = x + 1. Then

(x + 1) | x^4 + 2x + 1

since x^4 + 2x + 1 = (x + 1)(x^3 − x^2 + x + 1).
Lemma 2.2.2. Let f (x), g(x) ∈ F [x] be non-zero polynomials. Then
deg f (x)g(x) = deg f (x) + deg g(x).
Proof. Let f (x) have leading term a_m x^m and let g(x) have leading term b_n x^n.
Then the leading term of f (x)g(x) is a_m b_n x^(m+n). Now a_m b_n ≠ 0 and so the
degree of f (x)g(x) is m + n, as required.
The above result is used to justify the standard procedure for dividing one
polynomial into another which I shall now describe by means of an example.
Remember that answers can always be checked by multiplying out.
Example 2.2.3. Divide 6x^4 + 5x^3 + 4x^2 + 3x + 2 by 2x^2 + 4x + 5 and so find
the quotient and remainder. We set out the computation as a long division:

2x^2 + 4x + 5 ) 6x^4 + 5x^3 + 4x^2 + 3x + 2

To get the term 6x^4 we would have to multiply the divisor by 3x^2, so 3x^2
is the first term of the quotient. Subtracting

(2x^2 + 4x + 5) · 3x^2 = 6x^4 + 12x^3 + 15x^2

from the dividend leaves −7x^3 − 11x^2 + 3x + 2. The procedure is now repeated
with the new polynomial: dividing −7x^3 by 2x^2 gives the next quotient term
−(7/2)x, and subtracting

(2x^2 + 4x + 5) · (−(7/2)x) = −7x^3 − 14x^2 − (35/2)x

leaves 3x^2 + (41/2)x + 2. The procedure is repeated one more time: dividing
3x^2 by 2x^2 gives the quotient term 3/2, and subtracting

(2x^2 + 4x + 5) · (3/2) = 3x^2 + 6x + 15/2

leaves (29/2)x − 11/2. This is the end of the line because this polynomial has
degree strictly less than the polynomial we are dividing by. The quotient is
therefore 3x^2 − (7/2)x + 3/2 and the remainder is (29/2)x − 11/2. What we
have shown is that

6x^4 + 5x^3 + 4x^2 + 3x + 2 = (2x^2 + 4x + 5)(3x^2 − (7/2)x + 3/2) + ((29/2)x − 11/2).

You can verify this is true by multiplying out the righthand side.
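The hand procedure is exactly the algorithm below. A Python sketch of our own, using exact rational arithmetic so the fractional coefficients come out precisely:

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g, each a list of coefficients from the highest degree
    down.  Returns (quotient, remainder) with deg(remainder) < deg(g),
    following the hand procedure.  Assumes g is non-zero with g[0] != 0."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    quotient = []
    while len(f) >= len(g):
        coeff = f[0] / g[0]            # next term of the quotient
        quotient.append(coeff)
        for i, gc in enumerate(g):     # subtract coeff * x**k * g(x)
            f[i] -= coeff * gc
        f.pop(0)                       # the leading term is now zero
    return quotient, f                 # what is left of f is the remainder

# Example 2.2.3: quotient 3x^2 - (7/2)x + 3/2, remainder (29/2)x - 11/2.
quot, rem = poly_divmod([6, 5, 4, 3, 2], [2, 4, 5])
assert quot == [3, Fraction(-7, 2), Fraction(3, 2)]
assert rem == [Fraction(29, 2), Fraction(-11, 2)]
```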
2.2.2 Roots of polynomials
The following result is analogous to the remainder theorem for integers. I
shall not prove it here, although it should seem plausible.
Proposition 2.2.4 (Remainder theorem). Let f (x) and g(x) be polynomials
in F [x] where deg f (x) ≥ deg g(x). Then either
g(x) | f (x)
or
f (x) = g(x)q(x) + r(x)
where deg r(x) < deg g(x).
Example 2.2.5. Let f (x) = x^3 + x + 3 and g(x) = x^2 + x. Then x^3 + x + 3 =
(x − 1)(x^2 + x) + (2x + 3).
Let f (x) ∈ F [x]. A number r ∈ F is said to be a root or zero of f (x) if
f (r) = 0. The roots of f (x) are the solutions of the equation f (x) = 0.
Example 2.2.6. The number 1 is a root of x^100 − 2x^98 + 1 because 1 − 2 + 1 = 0.
Checking whether a number is a root is easy, but finding a root in the
first place is trickier. The next result tells us that when we find roots of
polynomials we are in fact determining linear factors. It is crucial to everything
we shall do.
Proposition 2.2.7. Let r ∈ F . Then r is a root of f (x) ∈ F [x] if and only
if (x − r) | f (x).
Proof. Suppose that (x − r) | f (x). Then by definition f (x) = (x − r)q(x)
for some polynomial q(x). If we now calculate f (r) we see immediately that
it must be zero.
We now prove the converse. Suppose that r is a root of f (x). By the
remainder theorem, either (x − r) | f (x) or f (x) = q(x)(x − r) + r(x) where
deg(r(x)) < deg(x − r) = 1. If the former then we are done. If the latter
then it follows that r(x) is in fact a constant (that is, just a number). Call
this number a. If we calculate f (r) we get a. It follows that in fact a = 0
and so (x − r) | f (x).
Example 2.2.8. We have seen that the number 1 is a root of x^100 − 2x^98 + 1.
Thus by the above result (x − 1) | x^100 − 2x^98 + 1.
A root r of a polynomial f (x) is said to have multiplicity m if

(x − r)^m | f (x)

but (x − r)^(m+1) does not divide f (x). A root is always counted according to
its multiplicity.
Example 2.2.9. The polynomial x^2 + 2x + 1 has −1 as a root and no other
roots. However (x + 1)^2 = x^2 + 2x + 1 and so the root −1 occurs with
multiplicity 2. Thus the polynomial has two roots counting multiplicities.
This is the sense in which we can say that a quadratic equation always has
two roots.
Proposition 2.2.10. A non-zero polynomial of degree n has at most n roots.
Proof. Let f (x) be a non-zero polynomial of degree n > 0. Suppose that
f (x) has a root a. Then f (x) = (x − a)f1 (x) by Proposition 2.2.7 and the
degree of f1 (x) is n − 1. This argument can be repeated and we reach the
desired conclusion.
One question I have so far not dealt with is whether a polynomial need
have a root. This is answered by the following theorem whose name reflects its
importance when first discovered, and not its significance in modern algebra.
We shall not give a proof because that would require more advanced methods
than are covered in this course. It was first proved by the great German
mathematician Carl Friedrich Gauss (1777–1855) in 1799 when he was 22.
Theorem 2.2.11 (Fundamental theorem of algebra (FTA)). Every polynomial
of degree n ≥ 1 with complex coefficients has a root.
This theorem has the following important consequence using Proposition 2.2.10.
Corollary 2.2.12. Every polynomial with complex coefficients of degree n
has exactly n complex roots (counting multiplicities). Thus every such polynomial can be written as a product of linear polynomials.
Proof. Let f (x) be a non-zero polynomial of degree n. By the FTA, this
polynomial has a root r1 . Thus f (x) = (x − r1 )f1 (x) where f1 (x) is a polynomial of degree n − 1. This argument can be repeated and we eventually end
up with f (x) = a(x − r1 ) . . . (x − rn ) where a is the last quotient, necessarily
a complex number.
Example 2.2.13. It can be checked that the quartic x^4 − 5x^2 − 10x − 6 has
roots −1, 3, −1 + i and −1 − i. We can therefore write

x^4 − 5x^2 − 10x − 6 = (x + 1)(x − 3)(x + 1 + i)(x + 1 − i).
In many practical examples, our polynomials will have real coefficients
and we will want any factors of the polynomial to be likewise real. The result
above doesn’t do that because it could produce complex factors. However,
we can rectify this situation at a very small price. We shall use the notion
of the complex conjugate of a complex number that we introduced earlier.
Observe that the conjugate of a sum is the sum of the conjugates, that the
conjugate of a product is the product of the conjugates, and that z is real if
and only if z = z̄.
The proofs are left as exercises. We may now prove the following key
lemma.
Lemma 2.2.14. Let f (x) be a polynomial with real coefficients. If the complex number z is a root then so too is z.
Proof. Let

f (x) = a_n x^n + a_{n−1} x^(n−1) + ... + a_1 x + a_0

where the a_i are real numbers. Let z be a complex root. Then

0 = a_n z^n + a_{n−1} z^(n−1) + ... + a_1 z + a_0.

Take the complex conjugate of both sides and use the properties of the complex
conjugate to get

0 = a_n z̄^n + a_{n−1} z̄^(n−1) + ... + a_1 z̄ + a_0

and so z̄ is also a root.
Example 2.2.15. We saw above that

x^4 − 5x^2 − 10x − 6 = (x + 1)(x − 3)(x + 1 + i)(x + 1 − i).
Observe that the complex roots −1 − i and −1 + i are complex conjugates
of each other.
Lemma 2.2.16. Let z be a complex number which is not real. Then
(x − z)(x − z̄)
is an irreducible quadratic with real coefficients.
On the other hand, if x^2 + bx + c is an irreducible quadratic with real
coefficients then its roots are complex conjugates of each other.
Proof. To prove the first claim, we multiply out to get
(x − z)(x − z̄) = x^2 − (z + z̄)x + z z̄.

Observe that z + z̄ and z z̄ are both real numbers. The discriminant of this
polynomial is (z − z̄)^2. You can check that if z is complex and non-real then
z − z̄ is purely imaginary. It follows that its square is negative. We have
therefore shown that our quadratic is irreducible.
The proof of the second claim follows from the formula for the roots of a
quadratic combined with the fact that the square root of a negative real will
have the form ±αi where α is real.
Example 2.2.17. We saw above that

x^4 − 5x^2 − 10x − 6 = (x + 1)(x − 3)(x + 1 + i)(x + 1 − i).

Multiply out (x + 1 + i)(x + 1 − i) and we get x^2 + 2x + 2. Thus

x^4 − 5x^2 − 10x − 6 = (x + 1)(x − 3)(x^2 + 2x + 2)
with all the polynomials involved being real.
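The real quadratic coming from a conjugate pair z, z̄ is x^2 − (z + z̄)x + z z̄ = x^2 − 2 Re(z) x + |z|^2, which is easy to check with Python's built-in complex type. A small sketch of our own:

```python
def conjugate_pair_quadratic(z):
    """Real coefficients (1, b, c) of (x - z)(x - conj(z)):
    b = -(z + conj(z)) = -2 Re(z) and c = z * conj(z) = |z|**2."""
    b = -2 * z.real
    c = z.real ** 2 + z.imag ** 2      # |z|**2, computed exactly from parts
    return (1.0, b, c)

# The pair -1 ± i from Example 2.2.17 gives x**2 + 2x + 2.
assert conjugate_pair_quadratic(-1 + 1j) == (1.0, 2.0, 2.0)
```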
The following theorem is the one that we can use to help us solve problems
involving real polynomials.
Theorem 2.2.18 (Fundamental theorem of algebra for real polynomials).
Every polynomial with real coefficients can be written as a product of polynomials with real coefficients which are either linear or irreducible quadratic.
Proof. We can write the polynomial as a product of linear polynomials. Bring
the real linear factors to the front. The remaining linear polynomials will
have complex coefficients. They correspond to roots that come in complex
conjugate pairs. Multiplying together those complex linear factors corresponding to complex conjugate roots we get real quadratics and the result is
proved.
In fact, we can write any real polynomial as a real number times a product
of monic linear and quadratic factors. This result is the basis of the method
of partial fractions used in integrating rational functions in calculus.
Finding the exact roots of a polynomial is difficult, in general. However,
the following result tells us how to find the rational roots of monic polynomials with integer coefficients. It is a nice, and perhaps unexpected, application
of the number theory we developed earlier.
Lemma 2.2.19. Let

f (x) = x^n + a_{n−1} x^(n−1) + ... + a_1 x + a_0

be a monic polynomial with integer coefficients. If r/s is a root with r
and s coprime, then r | a_0 and s = 1. In other words, any rational root is an
integer that divides the constant term.
Proof. Substituting r/s into f (x) we have, by assumption, that

0 = (r/s)^n + a_{n−1} (r/s)^(n−1) + ... + a_1 (r/s) + a_0.

Multiply through by s^n to get

0 = r^n + a_{n−1} s r^(n−1) + ... + a_1 s^(n−1) r + a_0 s^n.

Thus r | a_0 s^n. Now r and s are coprime and so r and s^n must be coprime
(think about common divisors which are prime). It follows that r | a_0.
Similarly s | r^n but s and r^n are coprime and so s = 1.
Example 2.2.20. Find all the roots of the following polynomial

x^4 − 8x^3 + 23x^2 − 28x + 12.
The polynomial is monic and so the only possible rational roots are integers
and must divide 12. Thus the only possible rational roots are
±1, ±2, ±3, ±4, ±6, ±12.
We find immediately that 1 is a root and so (x−1) must be a factor. Dividing
out by this factor we get the quotient
x^3 − 7x^2 + 16x − 12.
We check this polynomial for rational roots and find 2 works. Dividing out
by (x − 2) we get the quotient
x^2 − 5x + 6.
Once we get down to a quadratic we can solve it directly. In this case it
factorizes as (x − 2)(x − 3). We therefore have that
x^4 − 8x^3 + 23x^2 − 28x + 12 = (x − 1)(x − 2)^2 (x − 3).
At this point, I usually multiply out the righthand side and check that I
really do have an equality. In this case, all roots are rational and are 1, 2, 2, 3.
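The candidate-testing step of the example is easily automated. A Python sketch of our own (it reports distinct integer roots; the multiplicity of 2 is found, as in the text, by dividing out factors):

```python
def integer_roots(coeffs):
    """All integer roots of a monic polynomial with integer coefficients,
    given from the highest degree down.  By Lemma 2.2.19 any rational
    root is an integer dividing the constant term; assumes the constant
    term is non-zero.  Multiplicities are not reported."""
    def value(x):
        v = 0
        for c in coeffs:
            v = v * x + c              # Horner evaluation
        return v
    a0 = abs(coeffs[-1])
    divisors = [d for d in range(1, a0 + 1) if a0 % d == 0]
    candidates = divisors + [-d for d in divisors]
    return sorted(x for x in candidates if value(x) == 0)

# Example 2.2.20: x^4 - 8x^3 + 23x^2 - 28x + 12 has roots 1, 2, 2, 3.
print(integer_roots([1, -8, 23, -28, 12]))  # [1, 2, 3]
```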
The above result suggests an approach to finding the roots of polynomials
with rational coefficients. First, multiply through by a number large enough
to render all coefficients integers. Second, find the rational roots using the
method above. For each such root a, there will be a linear factor (x − a).
Dividing out by the product of these factors will yield a polynomial whose
roots will be necessarily irrational but will at least be of smaller degree. This
polynomial will have real coefficients and so its complex roots will occur
in complex conjugate pairs. Although in real life this approach cannot be
guaranteed to find all the roots, it should stand you in good stead in the
considerably less real life of exam questions.
Exercises 2.2
1. Find the quotient and remainder when the first polynomial is divided
by the second.
(i) x^3 − 7x − 1 and x − 2.
(ii) x^4 − 2x^2 − 1 and x^2 + 3x − 1.
(iii) 2x^3 − 3x^2 + 1 and x.
2. Find all roots using the information given.
(i) 4 is a root of 3x^3 − 20x^2 + 36x − 16.
(ii) −1, −2 are both roots of x^4 + 2x^3 + x + 2.
3. Find a cubic having roots 2, −3, 4.
4. Find a quartic having roots i, −i, 1 + i and 1 − i.
5. The cubic x^3 + ax^2 + bx + c has roots α, β and γ. Show that a, b, c can
each be written in terms of the roots.

6. 3 + i√2 is a root of x^4 + x^3 − 25x^2 + 41x + 66. Find the remaining
roots.

7. 1 − i√5 is a root of x^4 − 2x^3 + 4x^2 + 4x − 12. Find the remaining roots.
8. Find all the roots of the following polynomials.
(i) x^3 + x^2 + x + 1.
(ii) x^3 − x^2 − 3x + 6.
(iii) x^4 − x^3 + 5x^2 + x − 6.
9. Write each of the following polynomials as a product of linear or quadratic
real factors.
(i) x^3 − 1.
(ii) x^4 − 1.
(iii) x^4 + 1.
2.3 Complex number geometry
We proved in Section 2.1 that every non-zero complex number has two square
roots. From the fundamental theorem of algebra (FTA), discussed in Section 2.2,
we know that every non-zero complex number has three cube roots,
prove the FTA. The main goal of this section is to prove that every non-zero
complex number has n nth-roots. To do this, we shall think about complex
numbers in a geometric, rather than an algebraic, way. Throughout this section we shall not assume FTA. We shall only need the result proved in the
previous section that every polynomial of degree n has at most n roots.
2.3.1 sin and cos
We first recall some well-known properties of the trigonometric functions sin
and cos. First the addition formulae
sin(α + β) = sin α cos β + cos α sin β
and
cos(α + β) = cos α cos β − sin α sin β.
These formulae were important historically because they enabled unknown
values of sin’s and cos’s to be calculated from known ones, and so they were
useful in constructing trig tables in the days before calculators.
In university mathematics, angles are usually measured in radians rather than degrees. This is because radians are a natural unit of angle measurement, whereas the system of angle measurement based on degrees is an historical accident. Why 360 degrees in a circle? Ask the Ancient Babylonians. Positive angles are measured in an anticlockwise direction.
The sin and cos functions are periodic functions with period 2π. This
means that for all angles θ
sin(θ + 2πn) = sin θ and cos(θ + 2πn) = cos θ
for all n ∈ Z. This fact will be crucial in what follows.
The following table of values will be useful. I leave it as an exercise to
justify it.
 θ     sin θ    cos θ
 0°      0        1
 30°    1/2     √3/2
 45°   1/√2     1/√2
 60°   √3/2      1/2
 90°     1        0

2.3.2 The complex plane
In this section, we shall describe in more detail an alternative way of thinking
about complex numbers which turns out to be very fruitful. Recall that a
complex number z = a + bi has two components: a and b. We can plot these
as a point in the plane. The plane used in this way is called the complex
plane: the x-axis is the real axis and the y-axis is interpreted as the complex
axis. Although a complex number can be thought of as labelling a point in
the complex plane, it can more usefully be regarded as labelling the directed
line segment from the origin to the point. This is how we shall regard it.
Let z = a + bi be a non-zero complex number and let θ be the angle that it
makes with the positive reals. The length of z as a directed line segment in
the complex plane is |z|, and by basic trig a = |z| cos θ and b = |z| sin θ. It
follows that
z = |z| (cos θ + i sin θ) .
[Diagram: z drawn as a directed line segment from the origin, making angle θ with the positive real axis, with horizontal component |z| cos θ and vertical component i |z| sin θ.]
Observe that |z| is a non-negative real number. This way of writing complex
numbers is called the polar form.
At this point, I need to clarify the only feature of complex numbers that
causes confusion. I have already mentioned that the functions sin and cos
are periodic. For that reason, there is not just one number θ that yields
the complex number z but infinitely many of them: namely, all the numbers
θ + 2πk where k ∈ Z. For this reason, we define the argument of z, denoted
by arg z, not merely to be the single angle θ but the set of all angles θ + 2πk
where k ∈ Z. The angle θ is chosen so that 0 ≤ θ < 2π and is called, for
convenience, the principal argument. But note that books vary on what they
choose to call the principal argument. This feature of the argument plays a
crucial role when we come to calculate nth roots.
Observe that complex numbers of the form
cos θ + i sin θ
are precisely the complex numbers of unit length and so all together they
form the complex numbers lying on the unit circle with centre the origin in
the complex plane. Thus every non-zero complex number is a real number
times a complex number lying on the unit circle.
62
CHAPTER 2. THE FUNDAMENTAL THEOREM OF ALGEBRA
Let w = r (cos θ + i sin θ) and z = s (cos φ + i sin φ) be two non-zero
complex numbers. We shall calculate wz. We have that
wz = rs (cos θ + i sin θ) (cos φ + i sin φ)
= rs[(cos θ cos φ − sin θ sin φ) + (sin θ cos φ + cos θ sin φ)i]
but using the properties of the sin and cos functions this reduces to
wz = rs (cos(θ + φ) + i sin(θ + φ)) .
We thus have the following important result:
when two non-zero complex numbers are multiplied together their
lengths are multiplied and their arguments are added.
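This multiplication rule is easy to check numerically. A minimal sketch using Python's built-in `cmath` module (the particular lengths and angles chosen are arbitrary):

```python
import cmath

# Two arbitrary non-zero complex numbers built in polar form.
w = cmath.rect(2.0, 0.5)   # length 2, argument 0.5 radians
z = cmath.rect(3.0, 1.2)   # length 3, argument 1.2 radians

product = w * z

# Lengths multiply: |wz| = |w||z| = 6.
assert abs(abs(product) - 6.0) < 1e-9

# Arguments add: arg(wz) = arg(w) + arg(z) = 1.7 radians (mod 2π).
assert abs(cmath.phase(product) - 1.7) < 1e-9
```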
This result helps us to understand the meaning of i. Multiplication by i
is the same as a rotation about the origin by a right angle. Multiplication by
i2 is therefore the same as a rotation about the origin by two right angles.
But this is exactly the same as multiplication by −1.
[Diagram: the points 1, i, −1 and −i on the axes of the complex plane.]
We may apply similar geometric reasoning to explain why −1 × −1 = 1.
Multiplication by −1 is interpreted as rotation about the origin by 180°. It follows that doing this twice takes us back to where we started and so is equivalent to multiplication by 1.
The proof of the next theorem follows by repeated application of the
result we proved above.
Theorem 2.3.1 (De Moivre). Let n be a positive integer. If z = r (cos θ + i sin θ) then

zⁿ = rⁿ (cos nθ + i sin nθ).
This result has very nice applications in painlessly obtaining trigonometric identities.
Example 2.3.2. Express cos 3θ in terms of cos θ and sin θ using De Moivre’s
Theorem. We have that
(cos θ + i sin θ)³ = cos 3θ + i sin 3θ.
However, we can expand the LHS by either multiplying out directly or using
Pascal’s triangle (or the binomial theorem as described in Chapter 5) to get
cos³ θ + 3i cos² θ sin θ + 3 cos θ (i sin θ)² + (i sin θ)³

which simplifies to

cos³ θ − 3 cos θ sin² θ + i (3 cos² θ sin θ − sin³ θ)
where we use the fact that i² = −1, i³ = −i and i⁴ = 1. Equating real
and imaginary parts we get
cos 3θ = cos³ θ − 3 cos θ sin² θ.
We also get the formula
sin 3θ = 3 cos² θ sin θ − sin³ θ
for free.
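Both triple-angle formulae can be checked numerically; a small sketch (the sample angle is arbitrary):

```python
import math

# Check cos 3θ = cos³θ − 3 cos θ sin²θ and sin 3θ = 3 cos²θ sin θ − sin³θ.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)

assert abs(math.cos(3 * theta) - (c**3 - 3 * c * s**2)) < 1e-12
assert abs(math.sin(3 * theta) - (3 * c**2 * s - s**3)) < 1e-12
```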
2.3.3 Arbitrary roots of complex numbers
In this section, we shall prove that every non-zero complex number has n
nth roots: thus it has three cube roots, and four fourth roots and so on. We
begin with a special case that turns out to give us almost all the information
we need to solve the general case.
The nth roots of unity
We shall show that the number 1 has n nth roots — these are called the nth roots of unity. We know that the equation zⁿ − 1 = 0 has at most n roots,
so all we need do is find n roots and we are home and dry. We begin with
some concrete examples.
Example 2.3.3. We find the three cube roots of 1. Inscribe an equilateral triangle in the unit circle of the complex plane with 1 as one of its vertices. Let ω denote the first vertex we meet when travelling anticlockwise from 1. Then the roots are ω, ω², ω³ = 1, where

ω = (−1 + i√3)/2  and  ω² = −(1 + i√3)/2.
Example 2.3.4. We find the six sixth roots of unity. Inscribe a regular hexagon in the unit circle of the complex plane with 1 as one of its vertices. Let ω denote the first vertex we meet when travelling anticlockwise from 1. This is just

ω = cos 60° + i sin 60°.

The remaining vertices are ω², ω³, …, ω⁶ = 1. It is now easy to check using De Moivre that these are all sixth roots of unity. For example, ω² = cos 120° + i sin 120°. This gives the trigonometric form of the roots. In this case, we can easily find the algebraic form of the roots; we get

ω = (1 + i√3)/2.
The general case is solved in a similar way to our examples above using
regular n-gons in the complex plane where one of the vertices is 1.
Theorem 2.3.5 (Roots of unity). The n nth roots of unity are given by the formula

cos(2kπ/n) + i sin(2kπ/n)

for k = 1, 2, …, n. These complex numbers are arranged uniformly on the unit circle and form a regular polygon with n sides: the cube roots of unity form an equilateral triangle, the fourth roots form a square, the fifth roots form a regular pentagon, and so on.
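The formula of Theorem 2.3.5 translates directly into a few lines of code. A sketch using Python's `cmath` module (the function name `roots_of_unity` is our own):

```python
import cmath

def roots_of_unity(n):
    """The n nth roots of unity: cos(2kπ/n) + i sin(2kπ/n) for k = 1, ..., n."""
    return [cmath.rect(1.0, 2 * k * cmath.pi / n) for k in range(1, n + 1)]

cube_roots = roots_of_unity(3)

# Each root satisfies z³ = 1 and lies on the unit circle.
for z in cube_roots:
    assert abs(z**3 - 1) < 1e-9
    assert abs(abs(z) - 1) < 1e-12
```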
There is only one point here that is a little confusing. It is always possible
and easy to write down the trigonometric form of the nth roots of unity. It is
also always possible to write down the algebraic form of the nth roots of unity
but this is far from easy in general; in fact, it forms part of the advanced
subject known as Galois theory.
Arbitrary nth roots
The nth roots of unity play an important role in finding arbitrary nth
roots. We begin with an example to illustrate the idea.
Example 2.3.6. We find the three cube roots of 2. If you use your calculator you will simply find ∛2, a real number. There should be two others: where are they? The explanation is that the other two cube roots are complex. Let ω be the complex cube root of 1 that we described above. Then the three cube roots of 2 are the following:

∛2,  ω ∛2,  ω² ∛2.
The above example generalizes.
Theorem 2.3.7 (nth roots). Let z = r (cos θ + i sin θ) be a non-zero complex number. Put

u = ⁿ√r (cos(θ/n) + i sin(θ/n)),

the obvious nth root, and put

ω = cos(2π/n) + i sin(2π/n),

the first interesting nth root of unity. Then the nth roots of z are as follows:

u, uω, …, uωⁿ⁻¹.

It follows that the nth roots of z = r (cos θ + i sin θ) can be written in the form

ⁿ√r (cos(θ/n + 2kπ/n) + i sin(θ/n + 2kπ/n))

for k = 0, 1, 2, …, n − 1.
This is the reason why every non-zero number has two square roots that differ by a factor of −1: the two square roots of 1 are 1 and −1.
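Theorem 2.3.7 can likewise be turned into a short computation. A sketch (the helper `nth_roots` is our own naming):

```python
import cmath

def nth_roots(z, n):
    """All n nth roots of a non-zero complex number z, as in Theorem 2.3.7."""
    r, theta = abs(z), cmath.phase(z)
    u = cmath.rect(r ** (1.0 / n), theta / n)    # the 'obvious' nth root
    omega = cmath.rect(1.0, 2 * cmath.pi / n)    # first interesting root of unity
    return [u * omega**k for k in range(n)]

# The three cube roots of 2: one real, two complex.
for w in nth_roots(2, 3):
    assert abs(w**3 - 2) < 1e-9
```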
2.3.4 Euler's formula
We have seen that every real number can be written as a whole number plus
a possibly infinite decimal part. It turns out that many functions can also be
written as a sort of decimal. I shall illustrate this by means of an example.
Consider the function eˣ. All you need to know about this function is that it is equal to its own derivative and that e⁰ = 1. We would like to write

eˣ = a₀ + a₁x + a₂x² + a₃x³ + ⋯

where the aᵢ are real numbers that we have yet to determine. We can work out the value of a₀ easily by putting x = 0. This tells us that a₀ = 1. To get the value of a₁ we first differentiate our expression to get

eˣ = a₁ + 2a₂x + 3a₃x² + ⋯

Now put x = 0 again and this time we get that a₁ = 1. To get the value of a₂ we differentiate our expression again to get

eˣ = 2a₂ + 3 · 2 · a₃x + ⋯

Now put x = 0 and we get that a₂ = 1/2. Continuing in this way, we quickly spot the pattern for the values of the coefficients: aₙ = 1/n!, where n! = n(n − 1)(n − 2) ⋯ 2 · 1. What we have done for eˣ we can also do for sin x and cos x, and we obtain the following series expansions of each of these functions.
• eˣ = 1 + x + x²/2! + x³/3! + x⁴/4! + ⋯

• sin x = x − x³/3! + x⁵/5! − x⁷/7! + ⋯

• cos x = 1 − x²/2! + x⁴/4! − x⁶/6! + ⋯
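These truncated series converge very quickly; a sketch comparing twenty terms of each series against Python's `math` library (the cut-off of twenty terms is an arbitrary choice):

```python
import math

def exp_series(x, terms=20):
    """Truncated series 1 + x + x²/2! + x³/3! + ... for eˣ."""
    return sum(x**n / math.factorial(n) for n in range(terms))

def sin_series(x, terms=20):
    """Truncated series x − x³/3! + x⁵/5! − ... for sin x."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1) for k in range(terms))

def cos_series(x, terms=20):
    """Truncated series 1 − x²/2! + x⁴/4! − ... for cos x."""
    return sum((-1)**k * x**(2*k) / math.factorial(2*k) for k in range(terms))

x = 1.3
assert abs(exp_series(x) - math.exp(x)) < 1e-12
assert abs(sin_series(x) - math.sin(x)) < 1e-12
assert abs(cos_series(x) - math.cos(x)) < 1e-12
```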
There are interesting connections between these three series. We shall now show that complex numbers help to explain them. Without worrying about the validity of doing so, we calculate the infinite series expansion of e^{iθ}. We have that

e^{iθ} = 1 + (iθ) + (iθ)²/2! + (iθ)³/3! + ⋯

that is,

e^{iθ} = 1 + iθ − θ²/2! − iθ³/3! + θ⁴/4! + ⋯

By separating out real and imaginary parts, and using the infinite series we obtained above, we get Euler's remarkable formula
e^{iθ} = cos θ + i sin θ.
Thus the complex numbers enable us to find the hidden connections between
the three most important functions of calculus: the exponential function and
the sine and cosine functions. It follows that every non-zero complex number
can be written in the form re^{iθ}. If we put θ = π in Euler's formula, we get
the following result, which is widely regarded as one of the most amazing in
mathematics.
Theorem 2.3.8 (Euler's identity).

e^{πi} = −1.
This result shows us that the real numbers π, e and −1 are connected,
but that to establish that connection we have to use the complex number i.
This is one of the important roles of the complex numbers in mathematics in
that they enable us to make connections between topics that look different:
they form a mathematical hyperspace.
It’s soon’, no’ sense, that faddoms the herts o’ men,
And by my sangs the rouch auld Scots I ken
E’en herts that ha’e nae Scots’ll dirl richt thro’
As nocht else could – for here’s a language rings
Wi’ datchie sesames, and names for nameless things.
Gairmscoile by Hugh MacDiarmid (my italics)
Exercises 2.3
1. Express cos 5x and sin 5x in terms of cos x and sin x.
2. Solve x³ = −8i.
3. Prove the following, where x is real.²

(i) sin x = (1/2i)(e^{ix} − e^{−ix}).

(ii) cos x = (1/2)(e^{ix} + e^{−ix}).

4. Using Question 3(ii), show that cos⁴ x = (1/8)[cos 4x + 4 cos 2x + 3].
2.4 *Making sense of complex numbers*
This section will not be examined in 2013.
In this chapter, I have assumed that complex numbers exist and that they
obey the usual high-school rules of algebra. In this section, I shall sketch out
a proof of this.
We start with the set R × R whose elements are ordered pairs (a, b) where a and b are real numbers. It will be helpful to denote these ordered pairs by bold letters, so a = (a₁, a₂). We define 0 = (0, 0), 1 = (1, 0) and i = (0, 1).
We now define operations as follows
• If a = (a₁, a₂) and b = (b₁, b₂), define a + b = (a₁ + b₁, a₂ + b₂).

• If a = (a₁, a₂), define −a = (−a₁, −a₂).

• If a = (a₁, a₂) and b = (b₁, b₂), define

ab = (a₁b₁ − a₂b₂, a₁b₂ + a₂b₁).

• If a = (a₁, a₂) ≠ 0, define

a⁻¹ = (a₁/(a₁² + a₂²), −a₂/(a₁² + a₂²)).
It is now a long exercise to check that all the usual axioms of high-school
algebra hold. Now observe that the element (a₁, a₂) can be written

a₁ 1 + a₂ i
and that
ii = (0, 1)(0, 1) = (−1, 0) = −1.
This proves that the complex numbers as I described them earlier in this
chapter really do exist.
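The ordered-pair construction above can be transcribed almost verbatim into code. A sketch (the inverse here is taken over the squared modulus a₁² + a₂², and exact rational arithmetic via `Fraction` avoids rounding; all function names are our own):

```python
from fractions import Fraction

def add(a, b):
    """(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2)."""
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    """(a1, a2)(b1, b2) = (a1 b1 - a2 b2, a1 b2 + a2 b1)."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def inv(a):
    """Multiplicative inverse, defined whenever a != (0, 0)."""
    d = a[0]**2 + a[1]**2
    return (a[0] / d, -a[1] / d)

i = (0, 1)
assert mul(i, i) == (-1, 0)          # ii = -1

a = (Fraction(3), Fraction(4))
assert mul(a, inv(a)) == (1, 0)      # a * a^{-1} = 1
```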
²Compare (i) and (ii) with sinh x = ½(eˣ − e⁻ˣ) and cosh x = ½(eˣ + e⁻ˣ).
2.5 *Morning duel: cubics, quartics, quintics and beyond*
This section will not be examined in 2013.
The main source for this section is Galois Theory by Ian Stewart, Second
edition, Chapman and Hall, 1998.
We have seen that every polynomial with complex coefficients has a root.
However, no indication was given on how to find that root. On the other
hand, for quadratic polynomials we have a formula for finding the roots.
It’s natural to ask, therefore, whether we can find a formula for finding the
roots of any polynomial. I have already noted that the formula for solving a
quadratic equation was known in antiquity. In fact, a Babylonian clay tablet
dated 1600 BC poses problems which are equivalent to solving quadratic
equations. However, it was not until the sixteenth century that any advances
were made in solving polynomial equations of higher degree. The following
is taken from John Stillwell’s book Elements of algebra, Springer, 1994:
Around 1500, in Bologna³, del Ferro found the solution to the
cubic equation. The solution was rediscovered by Tartaglia in
the 1530s, and published in Cardano's Ars Magna⁴ [1545]. This
book also gave the solution to the quartic equation, which was
found by Cardano’s student Ferrari.
In other words, once quadratics had been solved, it took another 3,000
years to figure out how to solve cubics. Having learnt how to solve quadratics, cubics and quartics, the next equations to be studied were the quintics,
equations of degree 5. However, at this point the story takes an unexpected
turn. In 1824, the young Norwegian mathematician Abel proved that there
was no formula for the roots of a quintic expressible in terms of the basic
operations of algebra and extracting roots. The question now arises of why
there is such a difference between equations of degree 4 or less and equations
of degree 5 or more. The complete answer to this question was found by
the French mathematician Evariste Galois. An outline of his discoveries was
drawn up by Galois in a letter he wrote on the evening of 29th May 1832.
The next day he fought a duel with pistols, was shot, and subsequently died of peritonitis. The reason for the duel is disputed: perhaps a femme fatale, perhaps political reasons. The stark fact is that he was a mere 20 when he died, but his work changed the course of mathematics: there is algebra before Galois and algebra after Galois. Galois is responsible for the fact that algebra changes its character at the university level: it is not merely a harder version of school algebra but a different beast.

³In Italy.
⁴The Great Art.
2.6 *Analogies*
This section will not be examined in 2013.
There are parallels between the properties of the natural numbers and
the properties of real polynomials. We have seen that there are remainder
theorems for both natural numbers and polynomials. In the case of the natural numbers, we used the remainder theorem to develop Euclid’s algorithm
and the Extended Euclidean algorithm for computing greatest common divisors. We can do the same thing for polynomials. We define the greatest
common divisor of two real polynomials a(x) and b(x) to be a real polynomial
of largest degree dividing both a(x) and b(x). Any two such gcd’s will be real
number multiples of each other. We say that a(x) and b(x) are coprime if
their greatest common divisor is a constant polynomial. Euclid’s algorithm
and the Extended Euclidean algorithm can both be proved for real polynomials. As a consequence, if a(x) and b(x) are coprime real polynomials, then
we can find real polynomials c(x) and d(x) such that
1 = a(x)c(x) + b(x)d(x).
If f (x) is any real polynomial, then we can multiply both sides of the above
equation by f (x) to get
f (x) = a(x)[f (x)c(x)] + b(x)[f (x)d(x)].
Thus f (x) can be written in terms of a(x) and b(x) in a very simple way.
There is a simple refinement of this result I shall use below. If deg f (x) <
deg a(x) + deg b(x) then using the remainder theorem, we can in fact write
f (x) = B(x)a(x) + A(x)b(x)
where deg B(x) < deg b(x) and deg A(x) < deg a(x).
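The claim that Euclid's algorithm carries over to polynomials can be sketched as follows (coefficients are stored lowest degree first, `Fraction` keeps the arithmetic exact, and all function names are our own):

```python
from fractions import Fraction

def trim(p):
    """Remove trailing zero coefficients (but keep at least one entry)."""
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def pdivmod(a, b):
    """Polynomial quotient and remainder; coefficients listed lowest degree first."""
    a, b = trim(list(a)), trim(list(b))
    q = [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(a) >= len(b) and any(a):
        c = a[-1] / b[-1]            # coefficient of the next quotient term
        k = len(a) - len(b)
        q[k] = c
        for i, bc in enumerate(b):   # subtract c * x^k * b(x)
            a[k + i] -= c * bc
        trim(a)
    return trim(q), a

def pgcd(a, b):
    """Greatest common divisor of two polynomials via Euclid's algorithm."""
    a, b = trim(list(a)), trim(list(b))
    while any(b):
        a, b = b, pdivmod(a, b)[1]
    return a                         # determined up to a constant multiple

# gcd(x² − 1, x² + 2x + 1) should be a constant multiple of x + 1.
g = pgcd([Fraction(-1), Fraction(0), Fraction(1)],
         [Fraction(1), Fraction(2), Fraction(1)])
assert [c / g[-1] for c in g] == [1, 1]   # monic form: x + 1
```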
Every natural number can be written as a product of primes, where a
prime is a number which cannot be factorised in a non-trivial way. The
analogue of a prime number for real polynomials is the notion of an irreducible
polynomial. The real polynomial f (x) is said to be irreducible if it cannot
be factorised into real polynomials each having smaller degree than f (x).
Unlike the case of prime numbers, we can characterise the real irreducible
polynomials very easily. It is a consequence of the fundamental theorem for
real polynomials that there are only two kinds of irreducible real polynomials:
linear polynomials c(x − a) and irreducible quadratic polynomials c(x² + ax + b), that is, quadratics having only non-real roots.
We now have the following analogue of the fundamental theorem of arithmetic for real polynomials: every real polynomial of degree at least 1 can be written as a product of a real number and powers of distinct monic linear polynomials or distinct monic irreducible quadratic polynomials, in essentially one way.
2.7 *Rational functions*
This section will not be examined in 2013.
A (real) rational function is simply a quotient f(x)/g(x) where f(x) and g(x) are any polynomials with real coefficients, the polynomial g(x) of course not being equal to the zero polynomial. If deg f(x) < deg g(x), I shall say that the rational function is proper. The set of all rational functions R(x) — notice
I use round brackets unlike the square brackets for the set of real polynomials
— can be added, subtracted, multiplied and divided. In fact, they satisfy all
the laws of algebra that the real numbers do, (F1)–(F9), and so form a field
like the rationals, reals and complexes. Rational functions are enormously
useful in mathematics. The goal of this section is to show that every rational
function can be written as a sum of simpler rational functions. Once I have
shown how to do this, I will outline its application to integration.
2.7.1 Numerical partial fractions
This section is intended as motivation for the partial fraction representation of rational functions described below. I learnt about this idea from Littlewood's book 'A university algebra', published in 1958. It has since been forgotten and rediscovered: see the article by McDowell in The College Mathematics Journal 33 No. 5 (2002), 400–403.
The goal of this section is to show how a proper fraction can be written as a sum of proper fractions over prime power denominators. This involves two steps which I shall describe by means of examples. The theory is an application of the fundamental theorem of arithmetic and the extended Euclidean algorithm.
In order to add two fractions together, we first have to ensure that both are expressed over the same denominator. For example, suppose we want to add 5/7 and 8/13. Since 7 × 13 = 91 we have the following:

5/7 + 8/13 = (65 + 56)/91 = 121/91.

We shall now consider the reverse process, using the fraction 810/1003 as an example. Observe that 1003 = 17 × 59 where 17 and 59 are coprime. Our goal is to write

810/1003 = a/17 + b/59

for some natural numbers a and b. By the extended Euclidean algorithm, we can write

1 = 7 · 17 − 2 · 59.

It follows that

1/1003 = (7 · 17 − 2 · 59)/(17 · 59) = 7/59 − 2/17.

Now multiply both sides by 810 to get

810/1003 = (7 · 810)/59 − (2 · 810)/17 = (96 + 6/59) − (95 + 5/17) = 1 + 6/59 − 5/17.

Simplifying we get

810/1003 = 6/59 + 12/17

as required.
We shall now do something different. Consider the fraction 10/16. We have that 16 = 2⁴ and so we cannot write it as a product of coprime numbers. However, we can do something else. We can write 10 = 2 + 8 = 2¹ + 2³. Thus

10/16 = (2¹ + 2³)/2⁴ = 2¹/2⁴ + 2³/2⁴ = 1/2³ + 1/2.

Thus

10/16 = 1/2 + 1/2³.
Let's now combine these two steps. Consider the fraction 41/90. The prime factorisation of 90 is 2 · 3² · 5. Our first goal is to write

41/90 = a/2 + b/3² + c/5.

Thus we have to find a, b, c such that

41 = 45a + 10b + 18c.

By trial and error, remembering that a, b, c have to be integers, we find that

41 = 45 · 1 + 10 · 5 + (−3) · 18.

It follows that

41/90 = 1/2 + 5/3² − 3/5.

We now want to write

5/3² = d/3 + e/3²

where |d|, |e| < 3. But 5 = 2 + 3 and so

5/3² = 1/3 + 2/3².

It follows that

41/90 = 1/2 + 1/3 + 2/9 − 3/5.
We may summarise what we have found in the following theorem.
Theorem 2.7.1.

(i) Let a/b be a proper fraction, and let b = p₁^{n₁} ⋯ p_r^{n_r} be the prime factorisation of b. Then

a/b = Σᵢ₌₁ʳ cᵢ/pᵢ^{nᵢ}

for some integers cᵢ, where each of the fractions is proper.

(ii) Now let p be a prime and c/pⁿ a proper fraction. Then

c/pⁿ = Σⱼ₌₁ⁿ dⱼ/pʲ

where each dⱼ is such that |dⱼ| < p.
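The 810/1003 calculation above can be automated with the extended Euclidean algorithm. A sketch (`bezout` is our own naming):

```python
from fractions import Fraction

def bezout(m, n):
    """Extended Euclid: return (x, y) with x*m + y*n == gcd(m, n)."""
    if n == 0:
        return 1, 0
    x, y = bezout(n, m % n)
    return y, x - (m // n) * y

# 1003 = 17 * 59 with 17 and 59 coprime, as in the worked example.
x, y = bezout(17, 59)
assert (x, y) == (7, -2) and x * 17 + y * 59 == 1

# Since 1/(17*59) = x/59 + y/17, multiplying through by 810 gives the split.
assert Fraction(810, 1003) == 810 * Fraction(x, 59) + 810 * Fraction(y, 17)
```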
2.7.2 Partial fractions
Let f(x)/g(x) be a rational function. If deg f(x) ≥ deg g(x) then we may apply the Remainder Theorem and write

f(x)/g(x) = q(x) + r(x)/g(x)

where deg r(x) < deg g(x). Thus without loss of generality, we may assume that deg f(x) < deg g(x) in what follows. I shall also assume that g(x) is monic; if it isn't there will simply be a constant factor at the front.
By the fundamental theorem for real polynomials, we may write g(x) as a product of distinct factors of the form (x − a)^r or (x² + ax + b)^s. Using this decomposition of g(x), the rational function f(x)/g(x) can now be written as a sum of simpler rational functions which have the following forms:

• For each factor of g(x) of the form (x − a)^r, we will have a sum of the form

A₁/(x − a) + ⋯ + A_{r−1}/(x − a)^{r−1} + A_r/(x − a)^r.

• For each factor of g(x) of the form (x² + ax + b)^s, we will have a sum of the form

(A₁x + B₁)/(x² + ax + b) + ⋯ + (A_{s−1}x + B_{s−1})/(x² + ax + b)^{s−1} + (A_sx + B_s)/(x² + ax + b)^s.

This is called the partial fraction decomposition of f(x)/g(x). The practical method for finding such decompositions is best illustrated by means of some examples.
Examples 2.7.2.

(i) Write 5/(x² + x − 6) in partial fractions. We have that x² + x − 6 = (x + 3)(x − 2), a product of two distinct linear factors. We expect a solution of the form

5/(x² + x − 6) = A/(x + 3) + B/(x − 2)

where A and B are real numbers to be determined. The RHS is just

[A(x − 2) + B(x + 3)] / [(x + 3)(x − 2)].

Comparing the LHS with the RHS we get that

5 = A(x − 2) + B(x + 3)

which must hold for all values of x. Putting x = 2 we get B = 1 and putting x = −3 we get A = −1. Thus

5/(x² + x − 6) = −1/(x + 3) + 1/(x − 2).
At this point, we check our solution.
(ii) Write 9/((x − 1)(x + 2)²) in partial fractions. Here we have a single linear factor and a square of a linear factor. We therefore expect an answer in the form

9/((x − 1)(x + 2)²) = A/(x − 1) + B/(x + 2) + C/(x + 2)².

Carrying out the sum on the RHS, and comparing the LHS with the RHS, we get that

9 = A(x + 2)² + B(x − 1)(x + 2) + C(x − 1).

Putting x = 1 we get that A = 1, putting x = −2 we get that C = −3, and putting x = −1 and using the values we have for A and C we get that B = −1. Thus

9/((x − 1)(x + 2)²) = 1/(x − 1) − 1/(x + 2) − 3/(x + 2)².
(iii) Write 16x/(x⁴ − 16) in partial fractions. We have that x⁴ − 16 = (x − 2)(x + 2)(x² + 4), a product of two distinct linear factors and an irreducible quadratic factor. We expect a solution of the form

16x/(x⁴ − 16) = A/(x − 2) + B/(x + 2) + (Cx + D)/(x² + 4).

This leads to

16x = A(x + 2)(x² + 4) + B(x − 2)(x² + 4) + (Cx + D)(x − 2)(x + 2).

Using appropriate values of x we get that A = 1, B = 1, C = −2 and D = 0. Thus

16x/(x⁴ − 16) = 1/(x − 2) + 1/(x + 2) − 2x/(x² + 4).
(iv) Write (3x² + 2x + 1)/((x + 2)(x² + x + 1)²) in partial fractions. We expect a solution in the form

(3x² + 2x + 1)/((x + 2)(x² + x + 1)²) = A/(x + 2) + (Bx + C)/(x² + x + 1) + (Dx + E)/(x² + x + 1)².

This leads to

3x² + 2x + 1 = A(x² + x + 1)² + (Bx + C)(x + 2)(x² + x + 1) + (Dx + E)(x + 2).

Putting x = −2 yields 9 = 9A, so A = 1. There are four unknowns left and so we need four equations. However, to avoid having to solve four equations in four unknowns we can vary our procedure. Putting x = 0 gives 1 = A + 2C + 2E, so C + E = 0. On the LHS the highest power of x occurring is x². On the RHS the highest power of x appears to be x⁴, so its coefficient must be zero; the coefficient of x⁴ is A + B, and so B = −1. Putting x = 1 gives 6 = 9A + 9(B + C) + 3(D + E), which, using C + E = 0 and the values of A and B, reduces to 2C + D = 2. Putting x = 2 gives 17 = 49A + 28(2B + C) + 4(2D + E), which reduces to 3C + D = 3. Hence C = 1, D = 0 and E = −1. Thus

(3x² + 2x + 1)/((x + 2)(x² + x + 1)²) = 1/(x + 2) + (1 − x)/(x² + x + 1) − 1/(x² + x + 1)².
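Decompositions like these can be verified exactly with rational arithmetic. A sketch checking the decomposition 16x/(x⁴ − 16) = 1/(x − 2) + 1/(x + 2) − 2x/(x² + 4), with the signs as verified by direct substitution:

```python
from fractions import Fraction

def lhs(x):
    return Fraction(16) * x / (x**4 - 16)

def rhs(x):
    # Partial fraction decomposition of 16x/(x^4 - 16).
    return 1 / (x - 2) + 1 / (x + 2) - 2 * x / (x**2 + 4)

# Exact agreement at several sample points (avoiding the poles x = 2, -2).
for t in [Fraction(1), Fraction(3), Fraction(-1, 2), Fraction(7, 3)]:
    assert lhs(t) == rhs(t)
```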
Let me conclude this section by sketching out why the partial fraction
decomposition of real rational functions is possible.
Consider the proper rational function f(x)/(a(x)b(x)) where a(x) and b(x) are coprime. Then we indicated above that we may write

f(x)/(a(x)b(x)) = A(x)/a(x) + B(x)/b(x)

where the rational functions are all proper. This may be generalised as follows. Let f(x)/g(x) be a proper rational function. Let g(x) = a₁(x) ⋯ a_m(x) be a product of pairwise coprime polynomials. Then we may write

f(x)/g(x) = Σᵢ₌₁ᵐ Aᵢ(x)/aᵢ(x),

where the rational functions are all proper.
We shall now assume that the aᵢ(x) are either powers of linear factors or powers of irreducible quadratic factors, and that these factors are distinct for different i.

Consider the proper rational function h(x)/(x − a)^r where r ≥ 1. Then we may write

h(x) = a₀ + a₁(x − a) + ⋯ + a_{r−1}(x − a)^{r−1}

for some real numbers a₀, …, a_{r−1}, in a way analogous to writing a natural number in a number base. Thus

h(x)/(x − a)^r = a_{r−1}/(x − a) + ⋯ + a₁/(x − a)^{r−1} + a₀/(x − a)^r.

Consider the proper rational function h(x)/(x² + ax + b)^r where r ≥ 1. Then we may similarly write

h(x) = (a₀x + b₀) + (a₁x + b₁)(x² + ax + b) + ⋯ + (a_{r−1}x + b_{r−1})(x² + ax + b)^{r−1}

for some real numbers a₀, …, a_{r−1} and b₀, …, b_{r−1}, in a way analogous to writing a natural number in a number base. Thus

h(x)/(x² + ax + b)^r = (a_{r−1}x + b_{r−1})/(x² + ax + b) + ⋯ + (a₁x + b₁)/(x² + ax + b)^{r−1} + (a₀x + b₀)/(x² + ax + b)^r.

The existence of partial fraction decompositions of real rational functions now follows.
2.7.3 Integrating rational functions

In order to appreciate the significance of partial fractions it is essential to understand how they are used. The goal of this section is therefore to show you how to calculate

∫ f(x)/g(x) dx

exactly, when f(x) and g(x) are real polynomials.
We need to know one key property of integration: namely, if the aᵢ are real numbers then

∫ Σᵢ₌₁ⁿ aᵢ fᵢ(x) dx = Σᵢ₌₁ⁿ aᵢ ∫ fᵢ(x) dx.
This property is known as linearity. I shall break my discussion up into a
number of steps.
Step 1. Suppose that in f(x)/g(x) we have that deg f(x) ≥ deg g(x). By the Remainder Theorem for polynomials we can write

f(x)/g(x) = q(x) + r(x)/g(x)

where deg r(x) < deg g(x). By the linearity of integration, we have that

∫ f(x)/g(x) dx = ∫ q(x) dx + ∫ r(x)/g(x) dx.

In other words, to integrate an arbitrary rational function it is enough to know how to integrate polynomials and proper rational functions.
Step 2. By linearity of integration, integrating arbitrary polynomials can be reduced to integrating

∫ xⁿ dx

where n ≥ 0.
Step 3. Let f(x)/g(x) be a proper rational function, so that deg f(x) < deg g(x). We may factorise g(x) into a product of real linear polynomials and real irreducible quadratic polynomials and then write f(x)/g(x) as a sum of rational functions of one of the following two forms:

a/(x − d)^r,

where a and d are real and r ≥ 1, and

(px + q)/(x² + bx + c)^s,

where p, q, b, c are real, s ≥ 1, and the quadratic has a pair of complex conjugate roots. By the linearity of integration, this reduces calculating

∫ f(x)/g(x) dx

to calculating integrals of the following two forms:

∫ a/(x − d)^r dx  and  ∫ (px + q)/(x² + bx + c)^s dx.

Again by linearity of integration, this reduces to being able to calculate the following three integrals:

∫ 1/(x − d)^r dx,  ∫ x/(x² + bx + c)^s dx,  ∫ 1/(x² + bx + c)^s dx.
Step 4. We now concentrate on the two integrals involving quadratics. By completing the square, we can write

x² + bx + c = (x + b/2)² + (c − b²/4).

By assumption b² − 4c < 0 (why?). Put e = √((4c − b²)/4), which makes sense. Thus

x² + bx + c = (x + b/2)² + e².

I shall now use a technique of calculus known as substitution and put y = x + b/2. Doing this, and returning to x as my variable, we need to be able to calculate the following three integrals:

∫ 1/(x − d)^r dx,  ∫ x/(x² + e²)^s dx,  ∫ 1/(x² + e²)^s dx.
Step 5. The second integral above can be converted into the first by means
of the substitution x2 = u.
We have therefore proved the following.
Theorem 2.7.3. The integration of an arbitrary rational function can be reduced to integrals of the following three kinds:

1. ∫ xⁿ dx.

2. ∫ 1/(x − d)^r dx.

3. ∫ 1/(x² + e²)^s dx.
You will learn how to calculate these integrals in your calculus module;
it turns out that (1) and (2) are easy but (3) is trickier.
2.8 Learning outcomes for Chapter 2
• Solve quadratics by completing the square.
• Carry out the arithmetic of complex numbers.
• Find square roots of complex numbers.
• Prove trigonometric identities using complex numbers.
• Calculate nth roots.
• Factorise real polynomials into products of real linear and irreducible quadratic factors.
2.9 Further reading and exercises
There is a very nice introduction to complex numbers in Chapter 10 of Olive’s
book. Be warned that she uses j rather than i, a convention favoured by
engineers. Chapters 1 and 3 of Hirst and Singerman contain alternative
viewpoints and further exercises on the material of this chapter.
Chapter 3
Matrices
The term matrix was introduced by James Joseph Sylvester in 1850, and the
first paper on matrix algebra was published by Arthur Cayley in 1858. Matrices were introduced initially as packaging for systems of linear equations,
but then came to be investigated in their own right. The main goal of this
chapter is to introduce the basics of the arithmetic and algebra of matrices,
and we shall only be able to hint at some of their myriad applications. This
chapter and the next form the first steps in the subject known as linear algebra. It is hard to overemphasize the importance of this subject throughout mathematics and its applications.
3.1 Matrix arithmetic
In this section, I shall introduce matrices and three arithmetic operations
defined on them. I shall also define an operation called the ‘transpose of a
matrix’ that will be important in later work. This section forms the foundation for all that follows.
3.1.1 Basic matrix definitions
A matrix¹ is a rectangular array of numbers. In this course, the numbers will
usually be real numbers but, on occasion, I shall also use complex numbers
for variety.
¹Plural: matrices.
Example 3.1.1. The following are all matrices:

( 1 2 3 )      ( 4 )      ( 1 1 −1 )
( 4 5 6 ),     ( 1 ),     ( 0 2  4 ),     ( 6 ).
                          ( 1 1  3 )
Warning! Usually the array of numbers that comprises a matrix is enclosed
in round brackets. Occasionally books use square brackets with the same
meaning. Later on, I shall introduce determinants and these are indicated
by using straight brackets. In general, the kind of brackets you use is important and is not just a matter of taste.
We usually denote matrices by capital Roman letters: A, B, C, etc. The
size of a matrix is m × n if it has m rows and n columns. The entries in a
matrix are often called the elements of the matrix and are usually denoted
by lower case Roman letters. If A is an m × n matrix, and 1 ≤ i ≤ m and
1 ≤ j ≤ n, then the entry in the ith row and jth column of A is often denoted
(A)ij . Thus ()ij means ‘the element in ith row and jth column’.
Examples 3.1.2.
(i) Let
        ( 1 2 3 )
    A = ( 4 5 6 )
Then A is a 2 × 3 matrix. We have that (A)11 = 1, (A)12 = 2, (A)13 = 3,
(A)21 = 4, (A)22 = 5, (A)23 = 6.
(ii) Let
        ( 4 )
    B = ( 1 )
Then B is a 2 × 1 matrix. We have that (B)11 = 4, (B)21 = 1.
(iii) Let
        ( 1 1 −1 )
    C = ( 0 2  4 )
        ( 1 1  3 )
Then C is a 3 × 3 matrix. (C)11 = 1, (C)12 = 1, (C)13 = −1, (C)21 = 0,
(C)22 = 2, (C)23 = 4, (C)31 = 1, (C)32 = 1, (C)33 = 3.
(iv) Let
    D = ( 6 )
Then D is a 1 × 1 matrix. We have that (D)11 = 6.
Matrices A and B are said to be equal, written A = B, if they have the
same size and corresponding entries are equal: that is, (A)ij = (B)ij for all
allowable i and j.
Example 3.1.3. Given that
    ( a 2 b )   ( 3 x −2 )
    ( 4 5 c ) = ( y z  0 )
find a, b, c, x, y, z. This example simply illustrates what it means for two
matrices to be equal. By definition a = 3, 2 = x, b = −2, 4 = y, 5 = z and
c = 0.
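The indexing and equality conventions above are easy to try out in a short Python sketch (not part of the notes; the helper names `entry` and `equal` are made up here), with a matrix stored as a list of its rows:

```python
# A matrix as a list of rows; (A)ij becomes A[i-1][j-1] because
# Python indexes from 0 while the notes index from 1.
A = [[1, 2, 3],
     [4, 5, 6]]            # a 2 x 3 matrix

def entry(M, i, j):
    """Return (M)ij using the 1-based convention of the notes."""
    return M[i - 1][j - 1]

def equal(M, N):
    """Equal iff same size and corresponding entries agree."""
    return M == N          # list equality checks both at once

assert entry(A, 2, 1) == 4 and equal(A, [[1, 2, 3], [4, 5, 6]])
```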
Remark. A typical 2 × 3 matrix, for example, would be written
        ( a11 a12 a13 )
    A = ( a21 a22 a23 )
3.1.2 Addition, subtraction, scalar multiplication and the transpose
Addition Let A and B be two matrices of the same size. Then their sum
A + B is the matrix defined by
(A + B)ij = (A)ij + (B)ij .
That is, corresponding entries of A and B are added. If A and B are not the
same size then their sum is not defined.
Subtraction Let A and B be two matrices of the same size. Then their
difference A − B is the matrix defined by
(A − B)ij = (A)ij − (B)ij .
That is, corresponding entries of A and B are subtracted. If A and B are
not the same size then their difference is not defined.
Scalar multiplication In matrix theory numbers are often called scalars.
Let A be any matrix and λ any scalar. Then the matrix λA is defined as
follows:
(λA)ij = λ(A)ij .
In other words, every element of A is multiplied by λ.
Examples 3.1.4.
(i)
    ( 1  2 −1 )   (  2 1 3 )   ( 1 + 2      2 + 1  −1 + 3 )
    ( 3 −4  6 ) + ( −5 2 1 ) = ( 3 + (−5)  −4 + 2   6 + 1 )
which gives
    (  3  3 2 )
    ( −2 −2 7 )
(ii)
    ( 1  2 −1 )   (  2 1 3 )   ( 1 − 2      2 − 1  −1 − 3 )
    ( 3 −4  6 ) − ( −5 2 1 ) = ( 3 − (−5)  −4 − 2   6 − 1 )
which gives
    ( −1  1 −4 )
    (  8 −6  5 )
(iii)
    ( 1 1 )   (  3  3 2 )
    ( 2 1 ) − ( −2 −2 7 )
is not defined since the matrices have different sizes.
(iv)
      (  3  3 2 )   (  6  6  4 )
    2 ( −2 −2 7 ) = ( −4 −4 14 )
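The three operations just defined are all entrywise, so they become one-liners in a Python sketch (not from the notes; `add`, `sub` and `scale` are ad hoc names) that reproduces Examples 3.1.4:

```python
def add(A, B):
    # (A + B)ij = (A)ij + (B)ij; only defined when sizes match
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes differ"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    # (A - B)ij = (A)ij - (B)ij
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes differ"
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(lam, A):
    # (lam A)ij = lam (A)ij
    return [[lam * x for x in row] for row in A]

A = [[1, 2, -1], [3, -4, 6]]
B = [[2, 1, 3], [-5, 2, 1]]
assert add(A, B) == [[3, 3, 2], [-2, -2, 7]]
```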
Transpose of a matrix Let A be an m × n matrix. Then the transpose of
A, denoted A^T, is the n × m matrix defined by (A^T)ij = (A)ji. We therefore
interchange rows and columns: the first row of A becomes the first column
of A^T, the second row of A becomes the second column of A^T, and so on.
Examples 3.1.5. The transposes of the following matrices
    ( 1 2 3 )      ( 4 )      ( 1 1 −1 )
    ( 4 5 6 ),     ( 1 ),     ( 0 2  4 ),     ( 6 )
                              ( 1 1  3 )
are, respectively,
    ( 1 4 )                   (  1 0 1 )
    ( 2 5 ),     ( 4 1 ),     (  1 2 1 ),     ( 6 ).
    ( 3 6 )                   ( −1 4 3 )
Example 3.1.6. If
        ( 1 −1 2 )           ( 0 −2  3 )
    A = ( 3  0 1 )   and B = ( 2  1 −1 )
we may calculate 3A + 2B using the above definitions to get
    (  3 −7 12 )
    ( 13  2  1 )
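A transpose sketch in the same spirit (again not part of the notes) makes the row/column interchange concrete:

```python
def transpose(A):
    # Rows become columns: (A^T)ij = (A)ji.
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [4, 5, 6]]
assert transpose(A) == [[1, 4], [2, 5], [3, 6]]
```

Note that (A^T)^T = A falls out immediately: transposing twice restores the original list of rows.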
3.1.3 Matrix multiplication
This is more complicated than the other operations and, like them, is not
always defined. To define this operation it is useful to work with two special
classes of matrix. A row matrix or row vector is a matrix with one row (but
any number of columns). A column matrix or column vector is a matrix with
one column but any number of rows. Row and column matrices are often
denoted by bold lower case Roman letters a, b, c . . .. The ith element of the
row or column matrix a will be denoted by ai .
Examples 3.1.7. The matrix
    ( 1 2 3 4 )
is a row matrix whilst
    ( 1 )
    ( 2 )
    ( 3 )
    ( 4 )
is a column matrix.
I shall build up to the definition of matrix multiplication in three stages.
Stage 1. Let a be a row matrix and b a column matrix, where
    a = ( a1 a2 . . . am )
and
        ( b1 )
        ( b2 )
    b = (  . )
        (  . )
        ( bn )
Then their product ab is defined iff2 the number of columns of a is equal to
the number of rows of b, that is m = n, in which case their product is the
1 × 1 matrix
    ab = ( a1 b1 + a2 b2 + . . . + an bn ).
Remark. The number
    a1 b1 + a2 b2 + . . . + an bn
is called the inner product of a and b and is denoted by a · b. Using this
notation we have that
    ab = ( a · b ).
Example 3.1.8. This odd way of multiplying is actually quite natural.
Here’s an example of where it arises in real life. If you buy y items whose
unit cost is x then you spend xy. This can be generalised as follows when
you buy a number of different kinds of items at different prices. Let a be the
row matrix
    ( 0·6 1 0·2 )
2 Throughout these notes, I use the now standard abbreviation ‘iff’ for the phrase ‘if and only if’.
where 0·6 is the price of a bottle of milk, 1 is the price of a loaf of bread,
and 0·2 is the price of an egg. Let b be the column matrix
    (  2 )
    (  3 )
    ( 10 )
where 2 is the number of bottles of milk bought, 3 is the number of loaves
of bread bought, and 10 is the number of eggs bought. Thus a is the price
row matrix and b is the quantity column matrix. The total amount spent is
therefore
    0·6 × 2 + 1 × 3 + 0·2 × 10:
namely, the sum over all the commodities bought of the price of each commodity times the number of items of that commodity purchased. This number is precisely the inner product a · b: namely, 6·20.
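The shopping computation is a plain inner product, which the following Python sketch (not from the notes; the names are ad hoc) reproduces:

```python
def inner(a, b):
    # a . b = a1*b1 + ... + an*bn; defined only when lengths agree
    assert len(a) == len(b), "inner product needs equal lengths"
    return sum(x * y for x, y in zip(a, b))

prices = [0.6, 1.0, 0.2]      # milk, bread, egg
quantities = [2, 3, 10]
# 0.6*2 + 1*3 + 0.2*10 = 6.2, allowing for floating-point rounding
assert abs(inner(prices, quantities) - 6.2) < 1e-12
```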
Stage 2. Let a be a row matrix as above and let B be a matrix. Thus
a is a 1 × m matrix and B is a p × q matrix. Then their product aB is
defined iff the number of columns of a is equal to the number of rows of B.
Thus m = p. To calculate the product think of B as consisting of q column
matrices b1 , . . . , bq . We calculate the q numbers a·b1 , . . . , a·bq as in stage 1,
and the q numbers that result become the entries of aB. Thus aB is a 1 × q
matrix whose jth entry is the number a · bj .
Example 3.1.9. Let a be the cost matrix of our previous example. Let B be
the 3 × 5 matrix whose columns tell me the quantity of commodities bought
on each of the days of the week Monday to Friday:
        (  2 0  2 0  4 )
    B = (  3 0  4 0  8 )
        ( 10 0 10 0 20 )
Thus on Tuesday and Thursday no purchases were made, whilst on Friday
extra commodities were bought in preparation for the weekend. The matrix
aB is a 1 × 5 matrix which tells us how much was spent on each day of the
week. Thus
                        (  2 0  2 0  4 )
    aB = ( 0·6 1 0·2 )  (  3 0  4 0  8 )  =  ( 6·2 0 7·2 0 14·4 )
                        ( 10 0 10 0 20 )
Stage 3. Let A be an m × n matrix and let B be a p × q matrix. Their
product AB is defined iff the number of columns of A is equal to the number
of rows of B: that is n = p. If this is so then AB is an m × q matrix. To define
this product we think of A as consisting of m row matrices a1, . . . , am and
we think of B as consisting of q column matrices b1, . . . , bq. As in Stage 2
above, we multiply the first row of A into each of the columns of B and this
gives us the first row of AB; we then multiply the second row of A into each
of the columns of B to get the second row of AB, and so on.
Example 3.1.10. Let B be the 3 × 5 matrix of the previous example whose
columns tell me the quantity of commodities bought on each of the days
Monday to Friday
        (  2 0  2 0  4 )
    B = (  3 0  4 0  8 )
        ( 10 0 10 0 20 )
Let A be the 2 × 3 matrix whose first row tells me the cost of the commodities
in shop 1 and whose second row tells me the cost of the commodities in shop 2.
        ( 0·6   1     0·2  )
    A = ( 0·65  1·05  0·30 )
The first row of AB tells me how much was spent on each day of the week
in shop 1, and the second row of AB tells me how much was spent on each
day of the week in shop 2. Thus
         ( 0·6   1     0·2  )  (  2 0  2 0  4 )
    AB = ( 0·65  1·05  0·30 )  (  3 0  4 0  8 )
                               ( 10 0 10 0 20 )
which is equal to
    ( 6·2   0  7·2  0  14·4 )
    ( 7·45  0  8·5  0  17   )
Examples 3.1.11.
(i)
                       (  2 )
                       (  3 )
    ( 1 −1 0 2 1 )     (  1 )   =   ( 0 )
                       ( −1 )
                       (  3 )
(ii) The product
    ( 1 −1 2 ) ( 0 −2  3 )
    ( 3  0 1 ) ( 2  1 −1 )
doesn’t exist because the number of columns of the first matrix is not
equal to the number of rows of the second matrix.
(iii) The product
    ( 1 2 4 ) ( 4  1 4 3 )
    ( 2 6 0 ) ( 0 −1 3 1 )
              ( 2  7 5 2 )
exists because the first matrix is a 2 × 3 and the second is a 3 × 4. Thus
the product will be a 2 × 4 matrix and is
    ( 12 27 30 13 )
    (  8 −4 26 12 )
3.1.4 Summary of matrix multiplication
• Let A be an m × n matrix and B a p × q matrix. The product AB is
defined iff n = p and the result will then be an m × q matrix. In other
words:
(m × n)(n × q) = (m × q).
• (AB)ij is the inner product of the ith row of A and the jth column of
B.
• It follows that the inner product of the ith row of A and each of the
columns of B in turn yields each of the elements of the ith row of AB
in turn.
If ai are row matrices and bj are column matrices then the product of
two matrices can be written as follows
    ( a1 )                      ( a1 · b1  . . .  a1 · bn )
    (  . )                      (    .     . . .     .    )
    (  . ) ( b1 . . . bn )  =   (    .     . . .     .    )
    (  . )                      (    .     . . .     .    )
    ( am )                      ( am · b1  . . .  am · bn )
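The three-stage definition collapses into a few lines of Python (a sketch, not part of the notes): every entry of AB is an inner product of a row of A with a column of B.

```python
def matmul(A, B):
    # AB is defined iff the number of columns of A equals the number
    # of rows of B; (AB)ij is row i of A dotted with column j of B.
    assert len(A[0]) == len(B), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 4], [2, 6, 0]]                       # 2 x 3
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]  # 3 x 4
assert matmul(A, B) == [[12, 27, 30, 13], [8, -4, 26, 12]]  # 2 x 4
```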
3.1.5 Special matrices
Matrices come in all shapes and sizes, but some of these are important enough
to warrant their own terminology. A matrix all of whose elements are zero is
called a zero matrix. The m × n zero matrix is denoted Om,n or just O and
we let the context determine the size of O. A square matrix is one in which
the number of rows is equal to the number of columns. In a square matrix A
the elements (A)11 , (A)22 , . . . , (A)nn are called the diagonal elements. All the
other elements of A are called the off-diagonal elements. A diagonal matrix is
a square matrix in which all off-diagonal elements are zero. A scalar matrix
is a diagonal matrix in which the diagonal elements are all the same. The
n × n identity matrix is the scalar matrix in which all the diagonal elements
are the number one. This is denoted by In or just I where we allow the
context to determine the size of I. Thus scalar matrices are those of the
form λI where λ is any scalar. A matrix is real if all its elements are real
numbers, and complex if all its elements are complex numbers.
Examples 3.1.12.
(i) The matrix
    ( 1 0 0 )
    ( 0 2 0 )
    ( 0 0 3 )
is a 3 × 3 diagonal matrix.
(ii) The matrix
    ( 1 0 0 0 )
    ( 0 1 0 0 )
    ( 0 0 1 0 )
    ( 0 0 0 1 )
is the 4 × 4 identity matrix.
(iii) The matrix
    ( 42  0  0  0  0 )
    (  0 42  0  0  0 )
    (  0  0 42  0  0 )
    (  0  0  0 42  0 )
    (  0  0  0  0 42 )
is a 5 × 5 scalar matrix.
(iv) The matrix
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
    ( 0 0 0 0 0 )
is a 6 × 5 zero matrix.
3.1.6 Linear equations
Matrices are extremely useful in helping us to solve systems of linear equations. For the time being, I shall simply show you how matrices provide a
convenient notation for writing down such equations.
A system of m linear equations in n unknowns is a list of equations of the
following form
    a11 x1 + a12 x2 + . . . + a1n xn = b1
    a21 x1 + a22 x2 + . . . + a2n xn = b2
    ...
    am1 x1 + am2 x2 + . . . + amn xn = bm
If we have only a few unknowns then we often use w, x, y, z rather than
x1 , x2 , x3 , x4 . A solution is a set of values of x1 , . . . , xn that satisfy all the
equations. The set of all solutions is called the solution set or general solution.
The equations above can be conveniently represented using matrices. Let A
be the m × n matrix (A)ij = aij ; let b be the m × 1 matrix (b)i = bi , and let
x be the n × 1 matrix (x)j = xj . Then the system of linear equations above
can be written in the form
Ax = b.
The matrix A is called the coefficient matrix. At the moment, we are just
using matrices as packaging for the equations.
Example 3.1.13. Write the following system of equations in matrix form
    2x + 3y = 1
     x +  y = 2
This is just
    ( 2 3 ) ( x )   ( 1 )
    ( 1 1 ) ( y ) = ( 2 )
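At this stage matrices are only packaging, but even the packaging can be checked mechanically: the sketch below (not from the notes) verifies that a candidate solution of this system, here x = 5, y = −3 found by hand, really satisfies Ax = b.

```python
def matmul(A, B):
    # standard matrix product; here A is 2x2 and x is 2x1
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 3], [1, 1]]   # coefficient matrix of 2x + 3y = 1, x + y = 2
b = [[1], [2]]
x = [[5], [-3]]        # candidate solution found by hand
assert matmul(A, x) == b
```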
Exercises 3.1
1. Let
           (  1 2 )           (  1 4 )
       A = (  1 0 )   and B = ( −1 1 )
           ( −1 1 )           (  0 3 )
   Find A + B, A − B and −3B.
2. Let
           (  0 4 2 )           ( 1 −3  5 )
       A = ( −1 1 3 )   and B = ( 2  0 −4 )
           (  2 0 2 )           ( 3  2  0 )
   Find the matrices AB and BA.
3. Let
           ( 3  1 )       (  0 1 )           (  1 0 3 )
       A = ( 0 −1 ),  B = ( −1 1 )   and C = ( −1 1 1 ).
                          (  3 1 )
   Calculate BA, AA and CB. Can any other pairs of these matrices be
   multiplied? Multiply those which can.
4. Calculate
       ( 1 )
       ( 2 )
       ( 3 ) ( 1 2 3 )
       ( 4 )
5. If
           (  2 1 )           (  3 0 )           ( −1 2 3 )
       A = ( −1 0 ),      B = ( −2 1 )   and C = (  4 0 1 ).
           (  2 3 )
   Calculate both (AB)C and A(BC) and check that you get the same answer.
6. Calculate
       ( 2 −1  2 ) ( x )
       ( 1  2 −4 ) ( y )
       ( 3 −1  1 ) ( z )
7. Calculate
       ( 2 + i  1 + 2i ) (   2i    2 + i  )
       (   i    3 + i  ) ( 1 + i   1 + 2i )
   where i is the complex number i.
8. Calculate
       ( a 0 0 ) ( d 0 0 )
       ( 0 b 0 ) ( 0 e 0 )
       ( 0 0 c ) ( 0 0 f )
9. Calculate
   (i)
       ( 1 0 0 ) ( a b c )
       ( 0 1 0 ) ( d e f )
       ( 0 0 1 ) ( g h i )
   (ii)
       ( 0 1 0 ) ( a b c )
       ( 1 0 0 ) ( d e f )
       ( 0 0 1 ) ( g h i )
   (iii)
       ( a b c ) ( 0 1 0 )
       ( d e f ) ( 1 0 0 )
       ( g h i ) ( 0 0 1 )
10. Find the transposes of each of the following matrices
        (  1 2 )       ( 1 −3  5 )       ( 1 )
    A = (  1 0 ),  B = ( 2  0 −4 ),  C = ( 2 )
        ( −1 1 )       ( 3  2  0 )       ( 3 )
                                         ( 4 )
11. The Pauli matrices are the following 4 matrices with complex entries
    and their negatives:
        1, i, j, k
    where
        1 = ( 1 0 )    i = ( i  0 )    j = (  0 1 )    k = ( 0 i )
            ( 0 1 )        ( 0 −i )        ( −1 0 )        ( i 0 )
    Show that the product of any two Pauli matrices is again either a Pauli
    matrix or minus a Pauli matrix by completing the following Cayley
    table for multiplication (entry in row X and column Y is XY).
          1   i   j   k
      1
      i
      j
      k
3.2 Matrix algebra
In this section, we shall look at algebra where the variables are matrices.
This algebra is similar to high-school algebra but also differs significantly in
one or two places. For example, if A and B are matrices it is not true in
general that AB = BA even if both products are defined. We will learn in
this section which rules of school algebra apply to matrices and those which
don’t.
3.2.1 Properties of matrix addition
(MA1) (A + B) + C = A + (B + C). This is the associative law for matrix
addition.
(MA2) A + O = A = O + A. The zero matrix O, the same size as A, is the
additive identity for matrices the same size as A.
(MA3) A + (−A) = O = (−A) + A. The matrix −A is the unique additive
inverse of A.
(MA4) A + B = B + A. Matrix addition is commutative.
Thus matrix addition has the same properties as the addition of real numbers, apart from the fact that the sum of two matrices is only defined when
they have the same size. The role of zero is played by the zero matrix O of
the appropriate size.
Example 3.2.1. Calculate
    2A − 3B + 6I
where
        ( 1 2 )           ( 0 1 )
    A = ( 3 4 )   and B = ( 2 1 )
Because we are dealing with matrix addition and scalar multiplication the
rules we apply are the same as those in high-school algebra. We get
    ( 8  1 )
    ( 0 11 )
3.2.2 Properties of matrix multiplication
(MM1) (AB)C = A(BC). This is the associative law for matrix multiplication.
(MM2) Let A be an m × n matrix. Then Im A = A = AIn . The matrices Im
and In are the left and right multiplicative identities, respectively.
(MM3) A(B + C) = AB + AC and (B + C)A = BA + CA. These are the
left and right distributivity laws for matrix multiplication over matrix
addition.
Thus matrix multiplication has the same properties as the multiplication
of real numbers, apart from the fact that the product is not always defined,
and with the following three major differences.
Warning 1: matrix multiplication is not commutative.
Consider the matrices
        ( 1 2 )           (  1 1 )
    A = ( 3 4 )   and B = ( −1 1 )
Then AB ≠ BA. One consequence of the fact that matrix multiplication is
not commutative is that
    (A + B)^2 ≠ A^2 + 2AB + B^2,
in general (see below).
Warning 2: the product of two matrices can be a zero matrix
without either matrix being a zero matrix.
Consider the matrices
        ( 1 2 )           ( −2 −6 )
    A = ( 2 4 )   and B = (  1  3 )
Then AB = O.
Warning 3: cancellation of matrices is not allowed.
Consider the matrices
        ( 0 2 )        ( 2 3 )           ( −1 1 )
    A = ( 0 1 ),   B = ( 1 4 )   and C = (  1 4 )
Then A ≠ O and AB = AC but B ≠ C.
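The three warnings can be confirmed numerically; in this sketch (not part of the notes, using the matrices printed above) each assertion exhibits one failure of a familiar rule:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Warning 1: matrix multiplication is not commutative.
assert matmul([[1, 2], [3, 4]], [[1, 1], [-1, 1]]) != \
       matmul([[1, 1], [-1, 1]], [[1, 2], [3, 4]])

# Warning 2: AB can be the zero matrix with A, B both non-zero.
assert matmul([[1, 2], [2, 4]], [[-2, -6], [1, 3]]) == [[0, 0], [0, 0]]

# Warning 3: AB = AC does not force B = C.
A = [[0, 2], [0, 1]]
B = [[2, 3], [1, 4]]
C = [[-1, 1], [1, 4]]
assert matmul(A, B) == matmul(A, C) and B != C
```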
Example 3.2.2. Calculate
    ( 1 2 ) ( 2 1 ) ( 1 2 ) ( −1 −2 )
    ( 0 1 ) ( 1 0 ) ( 3 4 ) (  2  1 )
However you bracket the matrices to carry out the calculation you should
always get
    ( 17 −2 )
    (  3  0 )
Example 3.2.3. Find the 3 × 2 matrix X that satisfies
        (  1 −4 )          ( 2  0 )   ( 0 0 )
      2 ( −2  3 ) − 4X + 3 ( 4 −2 ) = ( 0 0 )
        (  4  0 )          ( 0  8 )   ( 0 0 )
This can be solved much in the way it would be solved in high-school algebra
to yield
    ( 2 −2 )
    ( 2  0 )
    ( 2  6 )
Example 3.2.4. If
        ( x y )           ( 4 1 )
    A = ( 3 1 )   and B = ( 3 0 )
commute find x and y. Multiplying out the matrices and comparing entries
yields x = 5 and y = 1.
3.2.3 Properties of scalar multiplication
(S1) 1A = A.
(S2) λ(A + B) = λA + λB
(S3) (λµ)A = λ(µA).
(S4) (λ + µ)A = λA + µA.
(S5) (λA)B = A(λB) = λ(AB).
Exercise 3.2.5. Calculate (3I)(4I). Note the brackets are there for clarity
only. The answer is simply 12I.
3.2.4 Properties of the transpose
(T1) (A^T)^T = A.
(T2) (A + B)^T = A^T + B^T.
(T3) (αA)^T = αA^T.
(T4) (AB)^T = B^T A^T.
Warning! Notice that the transpose of a product reverses the order of the
matrices.
There are some important consequences of the above properties:
• Because matrix addition is associative we can write sums without brackets.
• Because matrix multiplication is associative we can write matrix products without brackets.
• The left and right distributivity laws can be extended to arbitrary finite
sums.
3.2.5 Polynomials of matrices
Let A be a square matrix. We can therefore form the product AA which we
write as A^2. When it comes to multiplying A by itself three times there are
apparently two possibilities: A(AA) and (AA)A. However, matrix multiplication is associative and so these two products are equal. We write this as
A^3. In general A^(n+1) = A A^n = A^n A. We define A^0 = I, the identity matrix
the same size as A. It can be proved that the usual properties of exponents
hold:
    A^m A^n = A^(m+n)   and   (A^m)^n = A^(mn).
One important consequence is that powers of A commute so that
    A^m A^n = A^n A^m.
We can form powers of matrices, multiply them by scalars and add them
together. We can therefore form sums like
    A^3 + 3A^2 + A + 4I.
In other words, we can substitute A in the polynomial
    x^3 + 3x^2 + x + 4
remembering that 4 = 4x^0 and so has to be replaced by 4I.
Example 3.2.6. Let f(x) = x^2 + x + 2 and let
        ( 1 1 )
    A = ( 1 0 )
We calculate f(A). Remember that x^2 + x + 2 is really x^2 + x + 2x^0. Replace
x by A and so x^0 is replaced by A^0 which is I. We therefore get A^2 + A + 2I
and calculating gives
    ( 5 2 )
    ( 2 3 )
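Substituting a matrix into a polynomial can be sketched in Python too (not from the notes; `poly_at` is an ad hoc helper taking coefficients c0, c1, . . . in increasing degree); note how the constant term becomes a multiple of I:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def poly_at(coeffs, A):
    """Evaluate c0 + c1 x + ... + cn x^n at the square matrix A
    by Horner's scheme, replacing the constant term c0 by c0*I."""
    n = len(A)
    I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    result = [[coeffs[-1] * e for e in row] for row in I]   # cn * I
    for c in reversed(coeffs[:-1]):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c          # add c * I
    return result

A = [[1, 1], [1, 0]]
assert poly_at([2, 1, 1], A) == [[5, 2], [2, 3]]   # x^2 + x + 2 at A
```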
Example 3.2.7. Let f(x) = 2x^2 + 3x + 3 and let
        ( 1 2 )
    A = ( 2 3 )
Calculate f(A). You should get
           ( 16 22 )
    f(A) = ( 22 38 )
Warning! When a square matrix A is substituted into a polynomial, you must
replace the constant term of the polynomial by the constant term times the
identity matrix. The identity matrix you use will have the same size as A.
Example 3.2.8. Factorise
    A^2 + A.
We have that
    A^2 + A = A^2 + AI = A(A + I)
using distributivity. A very common mistake is to think this is A(A + 1)
which is wrong because the sizes of A and 1 will not match.
Recall that a square matrix is a scalar matrix if it is a scalar multiple of
an identity matrix. Scalar matrices commute with all matrices because
A(λI) = λ(AI) = λA = λ(IA) = (λI)A.
Let’s now multiply together two matrices of the form A + λI and A + µI.
We get
    (A + λI)(A + µI) = (A + λI)A + (A + λI)µI = AA + λIA + A(µI) + (λI)(µI)
which is just
    A^2 + (λ + µ)A + λµI.
Warning! In the above calculation (λ+µ) is a scalar and not a 1×1 matrix.
Example 3.2.9. The product of A + 2I and A + 3I is A^2 + 5A + 6I:
    A^2 + 5A + 6I = (A + 2I)(A + 3I).
Example 3.2.10. Is the following argument correct? Suppose X^2 = I. Then
X^2 − I = O. Thus (X − I)(X + I) = O. Hence X = I or X = −I. The
answer is no. It assumes, incorrectly, that if the product of two matrices is a
zero matrix then one of the matrices must itself be a zero matrix. We have
seen that this is false.
In fact, far from having only two square roots, the 2 × 2 identity matrix
has infinitely many, as we now show.
Example 3.2.11. Let
        ( a  b )
    A = ( c −a )
and suppose that a^2 + bc = 1. Check that A^2 = I. Examples of matrices
satisfying these conditions are
    ( √(1 + n^2)      −n      )
    (     n      −√(1 + n^2)  )
where n is any positive integer. Thus the 2 × 2 identity matrix has infinitely
many square roots!
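The claim is quick to verify numerically; this sketch (not part of the notes; `root_of_identity` is an ad hoc name) builds members of the family and squares them:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def root_of_identity(n):
    # A = (a b; c -a) with a^2 + bc = 1; here a = sqrt(1 + n^2),
    # b = -n, c = n, so a^2 + bc = 1 + n^2 - n^2 = 1.
    a = math.sqrt(1 + n * n)
    return [[a, -n], [n, -a]]

for n in range(1, 6):
    A2 = matmul(root_of_identity(n), root_of_identity(n))
    # A^2 should be the identity, up to floating-point rounding
    assert all(abs(A2[i][j] - (1 if i == j else 0)) < 1e-9
               for i in range(2) for j in range(2))
```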
Example 3.2.12. Suppose that AB = A and BA = B. Prove that A^2 = A
and B^2 = B. We have that
    A^2 = AA = (AB)A = A(BA) = AB = A.
The proof of the other case is similar.
Example 3.2.13. Given that BX = A and CB = I find X. We use matrix
algebra. Multiply BX = A on the left by C to get C(BX) = CA. By
associativity this is just (CB)X = CA. This simplifies to IX = CA and so
by properties of the identity matrix this yields X = CA.
Exercises 3.2
1. Calculate
       ( 2  0 )   ( 1 1 )   ( 0 1 )   ( 2 2 )
       ( 7 −1 ) + ( 1 0 ) + ( 1 1 ) + ( 3 3 )
2. Calculate
       ( 1 )                            (  1 )
       ( 2 ) ( 3 2 1 )   and  ( 3 1 5 ) ( −1 )
       ( 3 )                            ( −4 )
3. Calculate
               ( 5 4 ) ( x )
       ( x y ) ( 4 4 ) ( y )
4. If
        ( 1 −1 )
    A = ( 1  2 )
   calculate A^2, A^3 and A^4.
5. Let
        ( 1 1 )           ( 1 )
    A = ( 1 0 )   and x = ( 0 ).
   Calculate Ax, A^2 x, A^3 x, A^4 x and A^5 x. What do you notice?
6. Calculate A^2 where
        (  cos θ  sin θ )
    A = ( − sin θ cos θ )
7. Show that
        ( 1 2 )
    A = ( 3 4 )
   satisfies A^2 − 5A − 2I = O.
8. Let A be the following 3 × 3 matrix
        ( 2 4  4 )
        ( 0 1 −1 )
        ( 0 1  3 )
   Calculate
        A^3 − 6A^2 + 12A − 8I
   where I is the 3 × 3 identity matrix.
9. Let
        ( 3 1 −1 )
    A = ( 2 2 −1 )   and f(x) = x^3 − 5x^2 + 8x − 4. Calculate f(A).
        ( 2 2  0 )
10. If 3X + A = B, find X in terms of A and B.
11. If
        X + Y = ( 1 1 )   and X − Y = ( 2 2 )
                ( 2 2 )               ( 1 1 )
    find X and Y.
12. If AB = BA show that A^2 B = BA^2.
13. Is it true that AABB = ABAB?
14. Show that
        (A + B)^2 − (A − B)^2 = 2(AB + BA).
15. Let A and B be n × n matrices. Is it necessarily true that
        (A − B)(A + B) = A^2 − B^2?
    If so, prove it. If not, find a counterexample.
16. Expand (A + I)^4 carefully.
17. A matrix A is said to be symmetric if A^T = A.
    (i) Show that a symmetric matrix must be square.
    (ii) Show that if A is any matrix then AA^T is defined and symmetric.
    (iii) Let A and B be symmetric matrices of the same size. Prove that
    AB is symmetric if and only if AB = BA.
18. An n × n-matrix A is said to be skew-symmetric if A^T = −A.
    (i) Show that the diagonal entries of a skew-symmetric matrix are all
    zero.
    (ii) If B is any n × n-matrix, show that B + B^T is symmetric and
    that B − B^T is skew-symmetric.
    (iii) Deduce that every square matrix can be expressed as the sum of
    a symmetric matrix and a skew-symmetric matrix.
19. Let A, B and C be square matrices of the same size. Define [A, B] =
AB − BA. Calculate
[[A, B], C] + [[B, C], A] + [[C, A], B].
20. Let A be a 2 × 2 matrix such that AB = BA for all 2 × 2 matrices B.
    Show that
        A = ( λ 0 )
            ( 0 λ )
    for some scalar λ.
21. Let A be a 2 × 2 matrix. The trace of A, denoted tr(A), is the sum of
    the diagonal elements.
    (i) Show that tr(A + B) = tr(A) + tr(B); tr(λA) = λ tr(A); tr(AB) =
    tr(BA).
    (ii) Let A be a known matrix. Show that the equation AX − XA = I
    cannot be solved for X.
3.3 Determinants
In this section, we shall show how to calculate from a square matrix a single
number called the determinant of that matrix. Determinants are important,
and we shall explain their geometric meaning later on.
Determinants are only defined for square matrices. If A is a square matrix
we denote its determinant by det(A) or by replacing the round brackets of
the matrix A with straight brackets.
• The determinant of the 1 × 1 matrix ( a ) is a.
• The determinant of the 2 × 2 matrix
        ( a b )
    A = ( c d )
  denoted
    | a b |
    | c d |
  is the number ad − bc.
• The determinant of the 3 × 3 matrix
    ( a b c )
    ( d e f )
    ( g h i )
  denoted
    | a b c |
    | d e f |
    | g h i |
  is the number
      | e f |     | d f |     | d e |
    a | h i | − b | g i | + c | g h |
We could in fact define the determinant of any square matrix of whatever
size in much the same way. However, we shall limit ourselves to calculating
the determinants of 3 × 3 matrices at most.
Warning! Pay attention to the signs in the definition. You multiply alternately by plus one and minus one
+ − + − ...
Examples 3.3.1.
(i)
    | 2 3 |
    | 4 5 | = 2 × 5 − 3 × 4 = −2.
(ii)
    | 2 1 0 |
    | 1 0 2 |     | 0 2 |     | 1 2 |
    | 0 1 1 | = 2 | 1 1 | − 1 | 0 1 | = −5
(iii)
    | 1 2 1 |
    | 3 1 0 |     | 1 0 |     | 3 0 |   | 3 1 |
    | 2 0 1 | = 1 | 0 1 | − 2 | 2 1 | + | 2 0 | = −7
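The 2 × 2 and 3 × 3 formulas translate directly into a Python sketch (not from the notes) whose assertions reproduce Examples 3.3.1:

```python
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    # expand along the first row with the alternating signs + - +
    return (a * det2([[e, f], [h, i]])
            - b * det2([[d, f], [g, i]])
            + c * det2([[d, e], [g, h]]))

assert det2([[2, 3], [4, 5]]) == -2
assert det3([[2, 1, 0], [1, 0, 2], [0, 1, 1]]) == -5
assert det3([[1, 2, 1], [3, 1, 0], [2, 0, 1]]) == -7
```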
We shall briefly touch on some important properties of determinants that
play an important role in the more advanced theory.
Theorem 3.3.2. Let A and B be square matrices having the same size. Then
det(AB) = det(A) det(B).
Proof. The result is true in general, but I shall only prove it for 2×2 matrices.
Let
        ( a b )           ( e f )
    A = ( c d )   and B = ( g h )
We prove directly that det(AB) = det(A) det(B). First
         ( ae + bg  af + bh )
    AB = ( ce + dg  cf + dh )
Thus
    det(AB) = (ae + bg)(cf + dh) − (af + bh)(ce + dg).
The first bracket multiplies out as
    acef + adeh + bcfg + bdgh
and the second as
    acef + adfg + bceh + bdgh.
Subtracting these two expressions we get
    adeh + bcfg − adfg − bceh.
Now we calculate det(A) det(B). This is just
    (ad − bc)(eh − fg)
which multiplies out to give
    adeh + bcfg − adfg − bceh.
Thus the two sides are equal, and we have proved the result.
Theorem 3.3.3. Let A be any square matrix. Then
    det(A^T) = det(A).
Proof. The theorem is true in general, but I shall only prove it for 2 × 2
matrices. Let
        ( a b )
    A = ( c d )
We calculate det(A^T):
    | a c |             | a b |
    | b d | = ad − cb = | c d |
as claimed.
The following theorem is true in general, but I would recommend proving
it only in the 2 × 2 case. To state the results, I will think of an n × n matrix
A as being a list of its columns, where the columns are regarded as column
vectors. Thus
    A = (a1, . . . , an).
Theorem 3.3.4.
1. Let B be obtained from A by interchanging any two columns. Then
det(B) = − det(A). We say that the determinant is an alternating
function of its columns.
2. det(a1, . . . , λai + µbi, . . . , an) is equal to
    λ det(a1, . . . , ai, . . . , an) + µ det(a1, . . . , bi, . . . , an).
We say that the determinant is n-linear.
From the above theorem, we may deduce the following useful properties
of determinants.
1. If two columns of a matrix are equal then the determinant is zero.
2. If a multiple of one column of a matrix is added to another then the
determinant is unchanged.
These properties of determinants can be used to simplify their calculation.
We conclude this section with an application of determinants to solving
certain systems of linear equations. This is useful in theory and bad in
practice.
Theorem 3.3.5 (Cramer’s Rule). Let Ax = b be a system of n equations in
n unknowns xi where the matrix A has a non-zero determinant. Define Bi to
be the matrix obtained from A by replacing the ith column of A by b. Then
    xi = det(Bi) / det(A),
and this is the only solution.
We shall verify this theorem in the case where the coefficient matrix is
2 × 2.
Example 3.3.6. Consider the system of equations
    ax + by = e
    cx + dy = f
where the matrix of coefficients is invertible. By Cramer’s rule we have that
        | e b |              | a e |
        | f d |              | c f |
    x = -------    and   y = -------
        | a b |              | a b |
        | c d |              | c d |
Direct substitution into the lefthand side of the equations shows that these
solutions work. We shall show this for the first equation. This becomes
      | e b |     | a e |
    a | f d | + b | c f |
    ---------------------
          | a b |
          | c d |
The numerator is just
    a(de − bf) + b(af − ce) = ade − abf + abf − bce = e(ad − bc)
which gives the result.
Advice Although of some theoretical value, Cramer’s Rule can be used reasonably well for two or three unknowns but for larger systems it is highly
labour-intensive and not to be recommended.
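For two unknowns the rule is short enough to code; the sketch below (not part of the notes; `cramer2` is an ad hoc helper) solves the running system 2x + 3y = 1, x + y = 2:

```python
def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

def cramer2(A, b):
    # Replace each column of A by b in turn and divide by det(A).
    d = det2(A)
    assert d != 0, "Cramer's rule needs a non-zero determinant"
    (a11, a12), (a21, a22) = A
    x = det2([[b[0], a12], [b[1], a22]]) / d
    y = det2([[a11, b[0]], [a21, b[1]]]) / d
    return x, y

assert cramer2([[2, 3], [1, 1]], [1, 2]) == (5.0, -3.0)
```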
Exercises 3.3
1. Compute the following determinants.
   (i)
       | 1 −1 |
       | 2  3 |
   (ii)
       | 3 2 |
       | 6 4 |
   (iii)
       | 1 −1 1 |
       | 2  3 4 |
       | 0  0 1 |
   (iv)
       | 1 2 0 |
       | 0 1 1 |
       | 2 3 1 |
   (v)
       |   2   2   2 |
       |   1   0   5 |
       | 100 200 300 |
   (vi)
       |    1    3    5 |
       |  102  303  504 |
       | 1000 3005 4999 |
   (vii)
       | 1 1 2 |
       | 2 1 1 |
       | 1 2 1 |
   (viii)
       | 15 16 17 |
       | 18 19 20 |
       | 21 22 23 |
2. Solve
       | 1 − x    4    |
       |   2    3 − x  | = 0.
3. Calculate
       | x    cos x    sin x |
       | 1  − sin x    cos x |
       | 0  − cos x  − sin x |
4. Solve the system of linear equations
2x + 4y + z = 1
x+y+z = 0
3x + 3y + z = 2
using Cramer’s rule and check your answer.
5. Consider a triangle with sides of lengths a, b and c. The angles opposite
these sides are respectively α, β and γ.
Show from basic trigonometry that
b cos γ + c cos β = a
c cos α + a cos γ = b
a cos β + b cos α = c
Use Cramer’s rule to show that
    cos α = (b^2 + c^2 − a^2) / (2bc).
6. Show that if x1, x2, x3 are distinct then the determinant
       | 1  x1  x1^2 |
       | 1  x2  x2^2 |
       | 1  x3  x3^2 |
   is (x2 − x1)(x3 − x1)(x3 − x2) and so non-zero. Deduce that given three
   distinct points (x1, y1), (x2, y2) and (x3, y3) there is a unique parabola
   y = a + bx + cx^2 that passes through them.
3.4 Solving systems of linear equations
The goal of this section is to use matrices to help us solve systems of linear
equations. We begin by proving some general results on linear equations,
and then we describe Gaussian elimination, an algorithm for solving systems
of linear equations.
3.4.1 Some theory
A system of m linear equations in n unknowns is a list of equations of the
following form
    a11 x1 + a12 x2 + . . . + a1n xn = b1
    a21 x1 + a22 x2 + . . . + a2n xn = b2
    ...
    am1 x1 + am2 x2 + . . . + amn xn = bm
A solution is a sequence of values of x1 , . . . , xn that satisfy all the equations. The set of all solutions is called the solution set or general solution.
The equations above can be conveniently represented using matrices. Let
A be the m × n matrix (A)ij = aij ; let b be the m × 1 matrix (b)i1 = bi , and
let x be the n × 1 matrix (x)j1 = xj . Then the system of linear equations
above can be written in the form
Ax = b
If b is a zero matrix, we say that the equations are homogeneous, otherwise
they are said to be inhomogeneous.
A system of linear equations that has no solution is said to be inconsistent;
otherwise, it is said to be consistent.
We begin with some results that tell us what to expect when solving
systems of linear equations.
Proposition 3.4.1. Homogeneous equations Ax = 0 are always consistent,
because x = 0 is always a solution. In addition, the sum of any two solutions
is again a solution, and the scalar multiple of any solution is again a solution.
Proof. Let Ax = 0 be our homogeneous system of equations. Let a and b be
solutions. That is Aa = 0 and Ab = 0. We now calculate A(a + b). To do
this we use the fact that matrix multiplication satisfies the left distributivity
law
A(a + b) = Aa + Ab = 0 + 0 = 0.
Now let a be a solution and λ any scalar. Then
A(λa) = λAa = λ0 = 0.
Proposition 3.4.2. Let
Ax = b
be a consistent system of linear equations. Let p be any one solution. Then
every solution of the equation is of the form p + h for some solution h of
Ax = 0.
Proof. Let a be any solution to Ax = b. Put h = a − p. Then Ah = Aa − Ap =
b − b = 0, so h is a solution of Ax = 0 and a = p + h. The result now follows.
Theorem 3.4.3. A system of linear equations Ax = b has either
• no solutions;
• exactly one solution;
• infinitely many solutions.
Proof. We prove that if we can find two different solutions we can in fact
find infinitely many solutions. Let u and v be two distinct solutions to
this equation; then Au = b and Av = b. Consider now the column matrix
w = u − v. Then
Aw = A(u − v) = Au − Av = 0
using the distributive law. Thus w is a non-zero column matrix that satisfies
the equation Ax = 0. Consider now the column matrices of the form
u + λw
where λ is any real number. This is therefore a set of infinitely many different
column matrices. We calculate
A(u + λw) = Au + λAw = b
using the distributive law and properties of scalars. It follows that the
infinitely many column matrices u + λw are solutions to the equation Ax = b.
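The argument of the proof can be illustrated with a tiny numerical sketch (not from the notes; the system x + y = 2 written down twice is chosen here because it visibly has infinitely many solutions):

```python
def matvec(A, x):
    # A times a column vector, written as a flat list
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = [[1, 1], [1, 1]]     # the equation x + y = 2, listed twice
b = [2, 2]
u = [2, 0]               # one particular solution
w = [1, -1]              # a non-zero solution of Ax = 0
assert matvec(A, w) == [0, 0]
# u + lam*w solves Ax = b for every scalar lam, as in the proof
for lam in [0, 1, -3, 2.5]:
    x = [ui + lam * wi for ui, wi in zip(u, w)]
    assert matvec(A, x) == b
```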
3.4.2 Gaussian elimination
In this section, we shall develop a method, in fact an algorithm, that will take
as input a system of linear equations and produce as output the following: if
the system has no solutions it will tell us, on the other hand if it has solutions
then it will determine them all. Our method is based on three simple ideas:
1. Certain systems of linear equations have a shape that makes them very
easy to solve.
2. Certain operations can be carried out on systems of linear equations
which simplify them but do not change the solutions.
3. Everything can be done using matrices.
Here are examples of each of these ideas.
Example 3.4.4. The system of equations
2x + 3y = 1
−y = 3
is very easy to solve. From the second equation we get y = −3. Substituting
this value into the first equation gives us x = 5. We can check that this
solution is correct by checking that these two values satisfy every equation.
3.4. SOLVING SYSTEMS OF LINEAR EQUATIONS
113
Example 3.4.5. The system of equations
2x + 3y = 1
x+y = 2
can be converted into a system with the same solutions but which is easier to
solve. Multiply the second equation by 2. This gives us the new equations
2x + 3y = 1
2x + 2y = 4
which have the same solutions as the original equations. Next, subtract the
first equation from the second equation to get
2x + 3y = 1
−y = 3
These equations also have the same solutions as the original equations, but
they can now be easily solved.
Example 3.4.6. The system of equations
2x + 3y = 1
x+y = 2
can be written in matrix form as the matrix equation
[ 2 3 ] [ x ]   [ 1 ]
[ 1 1 ] [ y ] = [ 2 ]
For the purposes of our algorithm, we rewrite this equation in terms of what
is called an augmented matrix
[ 2 3 | 1 ]
[ 1 1 | 2 ]
The operations carried out in the previous example can be applied directly
to the augmented matrix.
[ 2 3 | 1 ]      [ 2 3 | 1 ]      [ 2  3 | 1 ]
[ 1 1 | 2 ]  =⇒  [ 2 2 | 4 ]  =⇒  [ 0 −1 | 3 ]
This augmented matrix can then be converted back into the usual matrix
form and solved
2x + 3y = 1
−y = 3
We now formalize the above ideas.
A matrix is called a row echelon matrix, or is said to be in row echelon form, if it
satisfies the following three conditions:
1. Any zero rows are at the bottom of the matrix.
2. If there are non-zero rows then they begin with the number 1, called
the leading 1.
3. In the column beneath a leading 1, the elements are all zero.
The following operations on a matrix are called elementary row operations:
1. Multiply row i by a non-zero scalar λ. We notate this operation by
Ri ← λRi .
2. Interchange rows i and j. We notate this operation by Ri ↔ Rj .
3. Add a multiple λ of row i to another row j. We notate this operation
by Rj ← Rj + λRi .
The following result is not hard to prove.
Proposition 3.4.7. Applying the elementary row operations to a system of linear equations does not change its set of solutions.
Given a system of linear equations
Ax = b
the matrix
(A|b)
is called the augmented matrix.
Algorithm 3.4.8. (Gaussian elimination) This is an algorithm for solving
systems of linear equations. In outline, the algorithm runs as follows:
(i) Given a system of equations
Ax = b
form the augmented matrix
(A|b).
(ii) By using elementary row operations, convert
(A|b)
into an augmented matrix
(A′|b′)
which is a row echelon matrix.
(iii) Solve the equations obtained from
(A′|b′)
by back substitution.
Remarks
• The process in step (ii) has to be carried out systematically to avoid
going around in circles.
• Elementary row operations applied to a set of linear equations do not
change the solution set; thus the solution sets of
Ax = b and A′x = b′
are the same.
• Solving systems of linear equations where the associated augmented
matrix is a row echelon matrix is easy and can be accomplished by
back substitution.
Here is a more detailed description of step (ii) of the algorithm — the
input is a matrix B and the output is a matrix B′ which is a row echelon
matrix:
1. Locate the leftmost column that does not consist entirely of zeros.
2. Interchange the top row with another row if necessary to bring a non-zero entry to the top of the column found in step 1.
3. If the entry now at the top of the column found in step 1 is a, then multiply the first row by 1/a in order to introduce a leading 1.
4. Add suitable multiples of the top row to the rows below so that all
entries below the leading 1 become zeros.
5. Now cover up the top row, and begin again with step 1 applied to the
matrix that remains. Continue in this way until the entire matrix is a
row echelon matrix.
The important thing to remember is to start at the top and work downwards.
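The five numbered steps can be sketched in code. The following Python function is my own illustration (not part of the notes); it works on an augmented matrix given as a list of rows and uses exact fractions so that no rounding occurs.

```python
from fractions import Fraction

def row_echelon(M):
    """Reduce an augmented matrix (a list of rows) to row echelon form."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    top = 0
    for col in range(cols):
        # Steps 1 and 2: find a row at or below `top` with a non-zero
        # entry in this column, and swap it to the top.
        pivot = next((r for r in range(top, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[top], M[pivot] = M[pivot], M[top]
        # Step 3: scale the top row to create a leading 1.
        a = M[top][col]
        M[top] = [x / a for x in M[top]]
        # Step 4: subtract multiples of the top row to clear the column below.
        for r in range(top + 1, rows):
            factor = M[r][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[top])]
        # Step 5: cover up the top row and continue with what remains.
        top += 1
        if top == rows:
            break
    return M
```

On the augmented matrix of Example 3.4.5 this produces the rows (1, 3/2, 1/2) and (0, 1, −3), from which back substitution gives y = −3 and x = 5.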
Here is a more detailed description of step (iii) of the algorithm. Let A′x = b′ be a system of equations where the augmented matrix is a row echelon matrix and where there is more than one solution. The variables
are divided into two groups: those variables corresponding to the columns
of A0 containing leading 1’s, called leading variables, and the rest, called free
variables. We solve for the leading variables in terms of the free variables; the
free variables can be assigned arbitrary values independently of each other.
Examples 3.4.9.
1. Show that the following system of equations is inconsistent (i.e. has no
solutions).
x + 2y − 3z = −1
3x − y + 2z = 7
5x + 3y − 4z = 2
The first step is to write down the augmented matrix of the system. In
this case, this is the matrix
[ 1  2 −3 | −1 ]
[ 3 −1  2 |  7 ]
[ 5  3 −4 |  2 ]
Carry out the elementary row operations R2 ← R2 − 3R1 and R3 ←
R3 − 5R1 . This gives us
[ 1  2 −3 | −1 ]
[ 0 −7 11 | 10 ]
[ 0 −7 11 |  7 ]
Now carry out the elementary row operation R3 ← R3 − R2 which
yields
[ 1  2 −3 | −1 ]
[ 0 −7 11 | 10 ]
[ 0  0  0 | −3 ]
The equation corresponding to the last line of the augmented matrix is
0x + 0y + 0z = −3. Clearly, this equation has no solutions and so the
original set of equations has no solutions.
2. Show that the following system of equations has exactly one solution,
and check it.
x + 2y + 3z = 4
2x + 2y + 4z = 0
3x + 4y + 5z = 2
We first write down the augmented matrix
[ 1 2 3 | 4 ]
[ 2 2 4 | 0 ]
[ 3 4 5 | 2 ]
We then carry out the elementary row operations R2 ← R2 − 2R1 and
R3 ← R3 − 3R1 to get
[ 1  2  3 |  4 ]
[ 0 −2 −2 | −8 ]
[ 0 −2 −4 | −10 ]
Then carry out the elementary row operations R2 ← −(1/2)R2 and R3 ← −(1/2)R3 to yield
[ 1 2 3 | 4 ]
[ 0 1 1 | 4 ]
[ 0 1 2 | 5 ]
Finally, carry out the elementary row operation R3 ← R3 − R2 to get
[ 1 2 3 | 4 ]
[ 0 1 1 | 4 ]
[ 0 0 1 | 1 ]
This is now a row echelon matrix. Write down the corresponding set
of equations
x + 2y + 3z = 4
y+z = 4
z = 1
Now solve by back substitution to get x = −5, y = 3 and z = 1.
Finally, we check that
[ 1 2 3 ] [ −5 ]   [ 4 ]
[ 2 2 4 ] [  3 ] = [ 0 ]
[ 3 4 5 ] [  1 ]   [ 2 ]
3. Show that the following system of equations has infinitely many solutions, and check them.
x + 2y − 3z = 6
2x − y + 4z = 2
4x + 3y − 2z = 14
[ 1  2 −3 |  6 ]
[ 2 −1  4 |  2 ]
[ 4  3 −2 | 14 ]
We transform this matrix into an echelon matrix by means of the following elementary row operations: R2 ← R2 − 2R1, R3 ← R3 − 4R1, R2 ← −(1/5)R2, R3 ← −(1/5)R3 and R3 ← R3 − R2. This yields
[ 1 2 −3 | 6 ]
[ 0 1 −2 | 2 ]
[ 0 0  0 | 0 ]
Because the bottom row consists entirely of zeros, we have only two equations
x + 2y − 3z = 6
y − 2z = 2
By back substitution, both x and y can be expressed in terms of z and
z may take any value we like. We say that z is a free variable. Let
z = λ ∈ R. Then the set of solutions can be written in the form
[ x ]   [ 2 ]     [ −1 ]
[ y ] = [ 2 ] + λ [  2 ]
[ z ]   [ 0 ]     [  1 ]
We now check that these solutions work
[ 1  2 −3 ] [ 2 − λ  ]   [  6 ]
[ 2 −1  4 ] [ 2 + 2λ ] = [  2 ]
[ 4  3 −2 ] [   λ    ]   [ 14 ]
as required.
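The check in Example 3 can be mechanized. Here is a small Python sketch of mine that verifies the one-parameter family of solutions for several values of λ:

```python
def mat_vec(A, v):
    """Multiply a matrix (a list of rows) by a column vector (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, -3],
     [2, -1, 4],
     [4, 3, -2]]
b = [6, 2, 14]

# x = 2 - lam, y = 2 + 2*lam, z = lam should solve Ax = b for every lam.
for lam in [-2, 0, 1, 7]:
    assert mat_vec(A, [2 - lam, 2 + 2 * lam, lam]) == b
```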
Exercises 3.4
1. In each case, determine whether the system of equations is consistent
or not. When consistent, find all solutions and show that they work.
(i)
2x + y − z = 1
3x + 3y − z = 2
2x + 4y + 0z = 2
(ii)
2x + y − z = 1
3x + 3y − z = 2
2x + 4y + 0z = 3
(iii)
2x + y − 2z = 10
3x + 2y + 2z = 1
5x + 4y + 3z = 4
(iv)
x + y + z + w = 0
4x + 5y + 3z + 3w = 1
2x + 3y + z + w = 1
5x + 7y + 3z + 3w = 2

3.5 Blankinship's algorithm
This is an alternative procedure to the extended Euclidean algorithm that
delivers exactly the same information in a much nicer way. It uses matrix
theory and was described by W. A. Blankinship in ‘A new version of the
Euclidean algorithm’ American Mathematical Monthly 70 (1963), 742–745.
To explain how it works, let’s go back to the basic step of Euclid’s algorithm.
If a ≥ b then we divide b into a and write
a = bq + r
where 0 ≤ r < b. The key point is that gcd(a, b) = gcd(b, r). We shall now
think of (a, b) and (b, r) as the column matrices
[ a ]    [ r ]
[ b ],   [ b ].
We want the 2 × 2 matrix that maps
[ a ]      [ r ]
[ b ]  to  [ b ] .
This is the matrix
[ 1 −q ]
[ 0  1 ] .
Thus
[ 1 −q ] [ a ]   [ r ]
[ 0  1 ] [ b ] = [ b ] .
Finally, we can describe the process by the following matrix operation
[ 1 0 | a ]    [ 1 −q | r ]
[ 0 1 | b ]  → [ 0  1 | b ]
by carrying out an elementary row operation. This procedure can be iterated. It will terminate when one of the entries in the righthand column is 0. The non-zero entry will then be the greatest common divisor of a and b, and the matrix on the lefthand side records how 0 and gcd(a, b) were obtained from a and b, and so provides the same information as the extended Euclidean algorithm.
All of this is best illustrated by means of an example.
Let’s calculate x, y such that gcd(2520, 154) = x2520 + y154. We start
with the matrix
[ 1 0 | 2520 ]
[ 0 1 |  154 ]
If we divide 154 into 2520 it goes 16 times plus a remainder. Thus we subtract
16 times the second row from the first to get
[ 1 −16 |  56 ]
[ 0   1 | 154 ]
We now repeat the process but, since the larger number, 154, is on the
bottom, we have to subtract some multiple of the first row from the second.
This time we subtract twice the first row from the second to get
[  1 −16 | 56 ]
[ −2  33 | 42 ]
Now repeat this procedure to get
[  3 −49 | 14 ]
[ −2  33 | 42 ]
And again
[   3 −49 | 14 ]
[ −11 180 |  0 ]
The process now terminates because we have a zero in the rightmost column.
The non-zero entry in the rightmost column is gcd(2520, 154). We also know
that
[   3 −49 ] [ 2520 ]   [ 14 ]
[ −11 180 ] [  154 ] = [  0 ] .
Now this matrix equation corresponds to two equations. The bottom one
can be verified. The top one says that
14 = 3 × 2520 − 49 × 154
which is both true and solves the extended Euclidean problem!
You can use either this method or the one described in Chapter 1.
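Blankinship's procedure is also easy to program. In the Python sketch below (the function name and layout are mine), each row has the form [x, y, v] and maintains the invariant v = xa + yb while the right-hand entries are reduced exactly as in the worked example.

```python
def blankinship(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g = x*a + y*b."""
    # Each row [x, y, v] satisfies v == x*a + y*b throughout.
    r1, r2 = [1, 0, a], [0, 1, b]
    while r1[2] != 0 and r2[2] != 0:
        if r1[2] >= r2[2]:
            q = r1[2] // r2[2]
            r1 = [u - q * v for u, v in zip(r1, r2)]
        else:
            q = r2[2] // r1[2]
            r2 = [u - q * v for u, v in zip(r2, r1)]
    row = r1 if r1[2] != 0 else r2
    return row[2], row[0], row[1]
```

For the worked example, blankinship(2520, 154) returns (14, 3, −49), reproducing 14 = 3 × 2520 − 49 × 154.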
3.6 *Some proofs*
This section will not be examined in 2013.
In this section, I shall prove that the algebraic properties of matrices stated earlier really do hold. I shan't prove all of them: just a representative sample. It is important to observe that all the properties of matrix algebra are ultimately proved using the properties of real numbers.
Let A be an m × n matrix whose entry in the ith row and jth column is
aij . Let B be an n × p matrix whose entry in the jth row and kth column is
bjk . By definition (AB)ik is the number equal to the product of the ith row
of A times the kth column of B. This is just
(AB)ik = Σj=1..n aij bjk .

Theorem 3.6.1.
1. (A + B) + C = A + (B + C).
2. A(BC) = (AB)C.
3. (λ + µ)A = λA + µA.
Proof. (1) To show that (A + B) + C = A + (B + C) we have to prove two
things. First, the size of (A + B) + C is the same as the size of A + (B + C).
Second, elements of (A + B) + C and A + (B + C) in corresponding positions
are equal. To add A and B they have to be the same size and the result
will be the same size as both of them. Thus C is the same size as A and B.
It’s clear that both sides of the equation really are the same size. We now
compare corresponding elements:
((A + B) + C)ij = (A + B)ij + (C)ij = ((A)ij + (B)ij ) + (C)ij .
But now we use the associativity of addition of real numbers to get
((A)ij +(B)ij )+(C)ij = (A)ij +((B)ij +(C)ij ) = (A)ij +(B+C)ij = (A+(B+C))ij ,
as required.
(2) Let A be an m × n matrix with entries aij , let B be an n × p matrix
with entries bjk , and let C be a p × q matrix with entries ckl . It’s evident
that A(BC) and (AB)C have the same size, so it remains to show that
corresponding elements are the same. We shall prove that
(A(BC))il = ((AB)C)il .
By definition
(A(BC))il = Σt=1..n ait (BC)tl ,
and
(BC)tl = Σs=1..p bts csl .
Thus
(A(BC))il = Σt=1..n ait (Σs=1..p bts csl ).
Using distributivity of multiplication over addition for real numbers this sum
is just
Σt=1..n Σs=1..p ait bts csl .
Now change the order in which we add up these real numbers to get
Σs=1..p Σt=1..n ait bts csl .
Now use distributivity again
Σs=1..p (Σt=1..n ait bts ) csl .
The sum within the brackets is just
(AB)is
and so the whole sum is
Σs=1..p (AB)is csl
which is precisely
((AB)C)il .
(3) Clearly (λ + µ)A and λA + µA have the same sizes. We show that
corresponding elements are the same:
((λ + µ)A)ij = (λ + µ)(A)ij = λ(A)ij + µ(A)ij = (λA)ij + (µA)ij
which is just (λA + µA)ij , as required.
Warning! In (4) below, notice how the order of the matrices is reversed.
Theorem 3.6.2.
1. (Aᵀ)ᵀ = A.
2. (A + B)ᵀ = Aᵀ + Bᵀ.
3. (αA)ᵀ = αAᵀ.
4. (AB)ᵀ = BᵀAᵀ.
Proof. (1) We have that
((Aᵀ)ᵀ)ij = (Aᵀ)ji = (A)ij .
(2) We have that
((A + B)ᵀ)ij = (A + B)ji = (A)ji + (B)ji = (Aᵀ)ij + (Bᵀ)ij
which is just
(Aᵀ + Bᵀ)ij .
(3) We have that
((αA)ᵀ)ij = (αA)ji = α(A)ji = α(Aᵀ)ij = (αAᵀ)ij .
(4) Let A be an m × n matrix and B an n × p matrix. Thus AB is defined and is m × p. Hence (AB)ᵀ is p × m. Now Bᵀ is p × n and Aᵀ is n × m. Thus BᵀAᵀ is defined and is p × m. Hence (AB)ᵀ and BᵀAᵀ have the same size. We now show that corresponding elements are equal. By definition
((AB)ᵀ)ij = (AB)ji .
This is equal to
Σs=1..n (A)js (B)si = Σs=1..n (Aᵀ)sj (Bᵀ)is .
But real numbers commute under multiplication and so
Σs=1..n (Aᵀ)sj (Bᵀ)is = Σs=1..n (Bᵀ)is (Aᵀ)sj = (BᵀAᵀ)ij ,
as required.
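Part (4) is easy to spot-check numerically. A quick Python sketch of mine, using one rectangular example:

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """(AB)ik is the sum over j of A[i][j] * B[j][k]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]           # 3 x 2

# (AB) transposed equals the product of the transposes in reversed order.
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))
```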
3.7 *Matrix inverses*
This section will not be examined in 2013.
In this section, all matrices will be square.
I have described how matrices can be added, subtracted and multiplied.
In this section, I shall now describe the circumstances under which division
is possible, and give an application to solving linear equations.
3.7.1 The key idea
The simplest kind of linear equation is ax = b where a and b are scalars. If a ≠ 0 we can solve this by multiplying both sides by a⁻¹ to get a⁻¹(ax) = a⁻¹b; we now use associativity to get (a⁻¹a)x = a⁻¹b; but a⁻¹a = 1 and so 1x = a⁻¹b and this gives x = a⁻¹b.
We try to copy this approach for the matrix equation
Ax = b.
We suppose that there is a matrix B such that BA = I.
• Multiply on the left both sides of our equation Ax = b to get B(Ax) =
Bb. Because order matters when you multiply matrices, which side
you multiply on also matters.
• Use associativity of matrix multiplication to get (BA)x = Bb.
• Now use our assumption that BA = I to get Ix = Bb.
• Finally, we use the properties of the identity matrix to get x = Bb.
We appear to have solved our equation, but we need to check it. We
calculate A(Bb). By associativity this is (AB)b. At this point we also have
to assume that AB = I; this gives Ib = b, as required.
We conclude that in order to copy the method for solving a linear
equation in one unknown, our coefficient matrix A must have the property
that there is a matrix B such that
AB = I = BA.
We shall now investigate this condition in more detail.
3.7.2 Invertible and noninvertible matrices
A matrix A is said to be invertible if we can find a matrix B such that AB = I = BA. The matrix B is called an inverse of A.
Example 3.7.1. A real number r regarded as a 1 × 1 matrix is invertible if
and only if it is non-zero, in which case an inverse is its reciprocal.
It’s clear that if A is a zero matrix, then it can’t be invertible just as in
the case of real numbers. However, the next example shows that even if A is
not a zero matrix, then it need not be invertible.
Example 3.7.2. Let A be the matrix
[ 1 1 ]
[ 0 0 ]
We shall show that there is no matrix B such that AB = I = BA. Let B be
the matrix
[ a b ]
[ c d ]
From BA = I we get
a = 1 and a = 0.
It’s impossible to meet both these conditions at the same time and so B
doesn’t exist.
Example 3.7.3. Let
A = [ 1 2 3 ]
    [ 0 1 4 ]
    [ 0 0 1 ]
and
B = [ 1 −2  5 ]
    [ 0  1 −4 ]
    [ 0  0  1 ]
Check that AB = I = BA. We deduce that A is invertible with inverse B.
As always, in passing from numbers to matrices things become more
complicated. Before going any further, I need to clarify one point which will
at least make our lives a little simpler.
Lemma 3.7.4. Let A be invertible and suppose that B and C are matrices
such that
AB = I = BA and AC = I = CA.
Then B = C.
Proof. Multiply both sides of AB = I on the left by C. Then C(AB) = CI.
Now CI = C, because I is the identity matrix, and C(AB) = (CA)B since
matrix multiplication is associative. But CA = I thus (CA)B = IB = B. It
follows that C = B.
The above result tells us that if a matrix A is invertible then there is only
one matrix B such that AB = I = BA. We call the matrix B the inverse of
A. It is usually denoted by A⁻¹.
Warning! We can only write A⁻¹ if we know that A is invertible.
There are now two main questions. First, how can we tell whether a
matrix is invertible or not? And, second, if a matrix is invertible how do we
find its inverse? In the remainder of this section, I shall answer these two
questions. The key to answering them both is the determinant.
Recall from Theorem 3.3.2 that
det(AB) = det(A) det(B).
I use this property below to get a necessary condition for a matrix to be
invertible.
Lemma 3.7.5. If A is invertible then det(A) ≠ 0.
Proof. By assumption, there is a matrix B such that AB = I. Take determinants of both sides of the equation
AB = I
to get
det(AB) = det(I).
By the key property of determinants recalled above
det(AB) = det(A) det(B)
and so
det(A) det(B) = det(I).
But det(I) = 1 and so
det(A) det(B) = 1.
In particular, det(A) ≠ 0.
If we return to our example above, then we can now see why we could
not find an inverse: its determinant is zero.
Are there any other properties that a matrix must satisfy in order to have
an inverse? The answer is no. I shall prove this for 2 × 2 and 3 × 3 matrices.
3.7.3 The matrix inverse method for solving linear equations
Once we have written the equations in matrix form, we can use matrix inverses to solve them as long as the matrix of coefficients is invertible.
Theorem 3.7.6. A system of linear equations
Ax = b
in which A is invertible has the unique solution
x = A⁻¹b.
Proof. Observe that
A(A⁻¹b) = (AA⁻¹)b = Ib = b.
Thus A⁻¹b is a solution. It is unique because if x′ is any solution then
Ax′ = b
giving
A⁻¹(Ax′) = A⁻¹b
and so
x′ = A⁻¹b.
I shall call this method of solving linear equations the matrix inverse
method.
The 2 × 2 case
We shall begin by dealing with the case where we have two equations and
two unknowns. This contains all the ideas we shall need without any of the
labour. We therefore have a system of equations Ax = b where A is a 2 × 2
matrix.
An important ingredient in finding the inverse of A, if it exists, is a matrix
formed from A called the adjugate matrix of A. If
A = [ a b ]
    [ c d ]
then we define the adjugate matrix, adj(A), to be the following matrix
adj(A) = [  d −b ]
         [ −c  a ]
The defining characteristic of the adjugate is that
A adj(A) = det(A)I = adj(A)A.
We can deduce from the defining characteristic of the adjugate that if det(A) ≠ 0 then
A⁻¹ = (1/det(A)) [  d −b ]
                 [ −c  a ]
In particular, a 2 × 2 matrix is invertible if and only if its determinant is
non-zero.
Example 3.7.7. Let
A = [ 1 2 ]
    [ 3 1 ]
Determine if A is invertible and, if it is, find its inverse, and check the answer. We calculate det(A) = −5. This is non-zero, and so A is invertible. We now form the adjugate of A:
adj(A) = [  1 −2 ]
         [ −3  1 ]
Thus the inverse of A is
A⁻¹ = −(1/5) [  1 −2 ]
             [ −3  1 ]
We now check that AA⁻¹ = I (to make sure that we haven't made any
mistakes).
We can now put all the pieces together, and show how to apply the matrix
inverse method in practice.
Example 3.7.8. Solve the following system of equations using the matrix
inverse method
x + 2y = 1
3x + y = 2
1. Write the equations in matrix form:
[ 1 2 ] [ x ]   [ 1 ]
[ 3 1 ] [ y ] = [ 2 ]
2. Calculate det(A): this is equal to −5. Since the determinant is non-zero
the matrix inverse method can be applied.
3. Calculate adj(A):
adj(A) = [  1 −2 ]
         [ −3  1 ]

4. Form the inverse:

A⁻¹ = −(1/5) [  1 −2 ]
             [ −3  1 ]
5. Solve the equations using the inverse: from Ax = b we get that x = A⁻¹b. Thus in this case
x = −(1/5) [  1 −2 ] [ 1 ]   [ 3/5 ]
           [ −3  1 ] [ 2 ] = [ 1/5 ]
Thus x = 3/5 and y = 1/5.
6. Check the solutions: there are two (equivalent) ways of doing this. The first is to check by direct substitution
x + 2y = 3/5 + 2 · (1/5) = 1 and 3x + y = 3 · (3/5) + 1/5 = 2.
Alternatively, you can check by matrix multiplication
[ 1 2 ] [ 3/5 ]   [ 1 ]
[ 3 1 ] [ 1/5 ] = [ 2 ]
You can see that both calculations are, in fact, identical.
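The six steps translate directly into code. Here is a Python sketch of mine for the 2 × 2 matrix inverse method; exact fractions keep the arithmetic identical to the hand calculation.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 integer matrix via the adjugate; needs det(A) != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    # The adjugate with each entry divided by the determinant.
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [3, 1]]
x, y = mat_vec(inverse_2x2(A), [1, 2])   # solve the system of Example 3.7.8
```

This gives x = 3/5 and y = 1/5, agreeing with the worked example.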
The 3 × 3 case
We begin by showing how to construct the adjugate matrix in general.
Let A be an n × n matrix with entries aij .
• Pick a particular row i and column j. If we cross out this row and
column we get an (n − 1) × (n − 1) matrix which I shall denote by
M (A)ij ; it is called a submatrix of the original matrix A.
• The determinant det(M (A)ij ) is called the minor of the element aij .
• Finally, if we multiply det(M (A)ij ) by the corresponding sign we get
the cofactor (−1)^(i+j) det(M (A)ij ) of the element aij .
cofactor = signed minor
• If we replace each element aij by its cofactor, we get the matrix C(A)
of cofactors of A.
Example 3.7.9. We compute the matrix of cofactors of an arbitrary 2 × 2
matrix. Let
A = [ a b ]
    [ c d ]
We begin by computing the minors:
det(M(A)11) = d,  det(M(A)12) = c,  det(M(A)21) = b,  det(M(A)22) = a.
Thus the matrix of minors in this case is
[ d c ]
[ b a ]
The matrix of signs is
[ + − ]
[ − + ]
The matrix of cofactors is
[  d −c ]
[ −b  a ]
Example 3.7.10. Let
A = [ 3 1 −4 ]
    [ 2 5  6 ]
    [ 1 4  8 ]
We shall calculate the matrix of cofactors C(A) of the matrix A. The pattern of signs we shall use is
[ + − + ]
[ − + − ]
[ + − + ]
Computing each minor and attaching the appropriate sign gives
C(A) = [  16 −10   3 ]
       [ −24  28 −11 ]
       [  26 −26  13 ]
Let A be any square matrix. The transpose of the matrix of cofactors
C(A), denoted adj(A), is called the adjugate³ matrix of A.
The crucial property of the adjugate is described in the next result.
Theorem 3.7.11. For any square matrix A, we have that
A(adj(A)) = det(A)I = (adj(A))A.
We have verified the above result in the case of 2 × 2 matrices. It is
possible to verify the above theorem in the case of 3 × 3 matrices, but that is much more laborious. To prove this result in general we need to develop
the properties of determinants further.
Theorem 3.7.12. Let A be a square matrix. Then A is invertible if and only if det(A) ≠ 0. When A is invertible, its inverse is given by
A⁻¹ = (1/det(A)) adj(A).
Proof. Let A be invertible. By our lemma above, det(A) ≠ 0 and so we can form the matrix
(1/det(A)) adj(A).
We now calculate
A ((1/det(A)) adj(A)) = (1/det(A)) (A adj(A)) = I
by our theorem above. Thus A has the advertised inverse.
³This odd word comes from Latin and means ‘yoked together’.
Conversely, suppose that det(A) ≠ 0. Then again we can form the matrix
(1/det(A)) adj(A)
and verify that this is the inverse of A and so A is invertible.
Advice The adjugate is useful in finding the inverses of 2×2 and maybe 3×3
matrices, but for larger matrices it requires a lot of work. There are better
ways of finding inverses using what are called ‘elementary row operations’.
They will be described in a later course.
Example 3.7.13. Let
A = [  1 2 3 ]
    [  2 0 1 ]
    [ −1 1 2 ]
We show that A is invertible and calculate its inverse. First, det(A) = −5 and so A is invertible. The matrix of minors is
[ −1  5  2 ]
[  1  5  3 ]
[  2 −5 −4 ]
The matrix of cofactors is
[ −1 −5  2 ]
[ −1  5 −3 ]
[  2  5 −4 ]
The adjugate is the transpose of the matrix of cofactors
[ −1 −1  2 ]
[ −5  5  5 ]
[  2 −3 −4 ]
Thus the inverse of A is the adjugate with each entry divided by the determinant of A
A⁻¹ = −(1/5) [ −1 −1  2 ]
             [ −5  5  5 ]
             [  2 −3 −4 ]
Now check your answer!
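The whole computation in the example (minors, cofactors, adjugate, division by the determinant) can be automated. This Python sketch is my own; it uses first-row expansion for the determinant, as in the notes, and assumes integer entries.

```python
from fractions import Fraction

def minor(A, i, j):
    """The submatrix obtained by crossing out row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    # First-row expansion with the usual alternating pattern of signs.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    n = len(A)
    cofactors = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
                 for i in range(n)]
    adjugate = [list(col) for col in zip(*cofactors)]   # transpose of cofactors
    return [[Fraction(x, d) for x in row] for row in adjugate]
```

For the matrix of Example 3.7.13 this reproduces det(A) = −5 and the inverse found above, and multiplying A by the result gives the identity matrix.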
We can now apply the matrix inverse method to solve systems of three equations in three unknowns in the case where the matrix of coefficients is invertible.
3.8 *Complex numbers via matrices*
This section will not be examined in 2013.
Consider all matrices that have the following shape
[ a −b ]
[ b  a ]
where a and b are arbitrary real numbers. You should show first that the sum,
difference and product of any two matrices having this shape is also a matrix
of this shape. Rather remarkably matrix multiplication is commutative for
matrices of this shape. Observe that the determinant of our matrix above is
a² + b². It follows that every non-zero matrix of the above shape is invertible.
The inverse of the above matrix in the non-zero case is
(1/(a² + b²)) [  a b ]
              [ −b a ]
and again has the same form. It follows that the set of all these matrices
satisfies the axioms of high-school algebra. Define
1 = [ 1 0 ]
    [ 0 1 ]
and
i = [ 0 −1 ]
    [ 1  0 ]
We may therefore write our matrices in the form
a1 + bi.
Observe that
i² = −1.
It follows that our set of matrices can be regarded as the complex numbers
in disguise.
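These claims are easy to experiment with; the following Python sketch of mine checks two of them.

```python
def cplx(a, b):
    """The matrix a*1 + b*i representing the complex number a + bi."""
    return [[a, -b], [b, a]]

def mat_mul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

i = cplx(0, 1)
assert mat_mul(i, i) == cplx(-1, 0)                  # i squared is -1

# Multiplication of matrices of this shape is commutative, just like
# multiplication of complex numbers: (1 + 2i)(3 - i) = 5 + 5i.
z, w = cplx(1, 2), cplx(3, -1)
assert mat_mul(z, w) == mat_mul(w, z) == cplx(5, 5)
```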
3.9 Learning outcomes for Chapter 3
• Add, subtract, and multiply two matrices, and multiply a matrix by a
scalar; be able to carry out sequences of such operations to obtain a
single matrix as a result.
• Compute f (A) given a polynomial f (x) and a square matrix A.
• Compute (small) determinants by first row expansion.
• Know and be able to use the basic properties of determinants.
• Solve linear equations using Gaussian elimination.
• Cramer’s rule and why you shouldn’t use it.
3.10 Further reading and exercises
The material in this chapter is absolutely basic and it is essential to master all the ideas here since you will meet them continually in your future studies. The best place to gain practice is to use the Schaum Outline
book Linear algebra. This contains far more than I cover in this course
but you will also find it useful in the second year. Chapter 6 of Hirst and
Singerman contains a treatment of matrices.
Chapter 4
Vectors
The Greeks attributed the discovery of geometry to the Ancient Egyptians
who needed it in recalculating land boundaries for the purposes of tax assessment after the yearly flood of the Nile. But it was the Ancient Greeks
themselves who elevated it into a mathematical science and a paradigm of
what could be achieved in mathematics.
Euclid’s book the Elements codified what was known about geometry into
a handful of axioms and then showed that all of geometry could be deduced
from those axioms by the use of mathematical proof. The Elements is not only the single most important mathematics book ever written but one of the most important books, period, as the Americans say.
Impressive though Euclid’s achievement was, it does suffer one drawback
in that it is not the easiest system to use. Even proving simple results, like
Pythagoras’s theorem, takes dozens of intermediate results. So although it
is a great theoretical achievement, it is not such a practical one. It was not
until the nineteenth century that a practical tool for doing three-dimensional
geometry was constructed. On the basis of the work carried out by Hamilton on quaternions — I say a little more about this later — the theory of
vectors, the subject of this chapter, was developed by the American Josiah
Willard Gibbs and promoted by the English electrical engineer Oliver Heaviside (whose formal schooling ended at the age of 16).
In addition to setting up an algebraic system that will enable us to carry
out geometrical calculations easily, I shall also touch on a deep connection
with the work of the previous chapter. Each linear equation in three unknowns is in fact the equation of a plane in three-dimensional space. This
means that the theory of linear equations in three unknowns has a geometrical interpretation. This may be generalized: the theory of matrices combined
with a theory of vectors in arbitrary dimensions is known as linear algebra,
one of the most important branches of algebra.
I have not attempted to develop the subject in this chapter completely
rigorously, so I often make appeals to geometric intuition in setting up the
algebraic theory of vectors.
4.1 Vector algebra
I shall assume you are familiar with the following ideas from school:
• the notion of a point;
• the notion of a line and of a line segment;
• the notion of the length of a line segment and the angle between two
lines;
• the notion of parallel lines.
The notion of a pair of lines being parallel is fundamental to Euclidean
geometry. It is used, for example, in proving that the angles in a triangle add up to 180°. This is illustrated in the following diagram where we prove that A + B + C = 180°.
4.1.1 Addition and scalar multiplication of vectors
Key definition Two directed line segments which are parallel, have the
same length, and point in the same direction are said to represent the same
vector.
The word ‘vector’ means carrier in Latin and what a vector carries is
information about length and about direction. Because vectors stay the
same when they move parallel to themselves, they also preserve information
about angles.
Thus vectors have length and direction but no other properties. I shall
denote vectors by bold letters a, b, . . . If P and Q are points then the directed line segment from P to Q is written −→PQ or simply PQ. If P = Q then PQ is just a point. The zero vector 0 is represented by the degenerate line segment PP.
Vectors are denoted by arrows: the vector starts at the base of the arrow (where the feathers would be), which we shall call the tail of the vector, and ends at the tip (where the arrowhead is), which we shall call the point of the vector.
Example 4.1.1. In the diagram below all the vectors shown are equal.
[diagram: a family of parallel arrows, all with the same length and direction, each representing the same vector]
The set of vectors in space can be equipped with two operations: vector
addition and multiplication by a scalar.
Let a and b be vectors. Then their sum is defined as follows: slide the
vectors parallel to themselves so that the point of a touches the tail of b.
The directed line segment from the tail of a to the point of b represents the
vector a + b.
[diagram: the triangle rule for vector addition, with a and b placed tip-to-tail and a + b running from the tail of a to the point of b]
This definition does make sense though I will not justify that here. If a is
a vector, then −a is defined to be the vector with the same length as a but
pointing in the opposite direction.
[diagram: the vectors a and −a, equal in length but opposite in direction]
Theorem 4.1.2 (Properties of vector addition).
(VA1) a + (b + c) = (a + b) + c. This is the associative law for vector
addition.
(VA2) 0 + a = a = a + 0. The zero vector is the additive identity.
(VA3) a + (−a) = 0 = (−a) + a. The vector −a is the additive inverse of a.
(VA4) a + b = b + a. This is the commutative law for vector addition.
The proof of the commutativity of vector addition is illustrated below.
[diagram: the parallelogram with sides a and b, showing that a + b = b + a]
The proof of associativity is illustrated below.
We define a − b = a + (−b).
Advanced remark We have seen the above properties before: real numbers
with respect to addition, and m × n matrices with respect to matrix addition. A set equipped with a binary operation that is associative, possesses
an identity, possesses unique inverses and is commutative is called an abelian
group.
Example 4.1.3. Consider the following square of vectors.
[diagram: a square whose four sides, traversed in order, are the vectors a, b, c and d]
Then we have
a + b + c + d = 0.
Thus, in particular,
d = −c − b − a.
Example 4.1.4. Consider the following diagram
[diagram: a configuration of points joined by the vectors a, b, c, d, e, f, g, h and k]
(i) We may write c in terms of e, d and f . By following the arrows we get that c = d + e + f .
(ii) We may write g in terms of c, d, e and k. By following the arrows we
get that g = −k + c + d − e.
(iii) We may solve x + b = f using similar methods to high-school algebra
to get x = f − b which is just a.
(iv) We may solve x + h = d − e in a similar fashion to get x = d − e − h
which is just g.
If a is a vector then ‖a‖ is its length.
If ‖a‖ = 1 then a is called a unit vector. We have that ‖a‖ ≥ 0, and ‖a‖ = 0 iff a = 0. By results on triangles we have the triangle inequality
‖a + b‖ ≤ ‖a‖ + ‖b‖ .
We now define scalar multiplication of a vector. Let λ be a scalar and a
a vector. If λ = 0 then λa = 0; if λ > 0 then λa has the same direction as a
and length λ‖a‖; if λ < 0 then λa has the opposite direction to a and length
(−λ)‖a‖. Observe that in all cases

‖λa‖ = |λ| ‖a‖.

If a is non-zero then

â = a / ‖a‖

is a unit vector in the same direction as a. We call this process normalisation.
Vectors that differ by a scalar multiple are said to be parallel.
Theorem 4.1.5 (Properties of scalar multiplication).
(i) 0a = 0.
(ii) 1a = a.
(iii) (−1)a = −a.
(iv) (λ + µ)a = λa + µa.
(v) λ(a + b) = λa + λb.
(vi) λ(µa) = (λµ)a.
We can use what we have introduced so far to prove simple geometric
theorems.
Example 4.1.6. If the midpoints of the consecutive sides of any quadrilateral
are joined by line segments, then the resulting quadrilateral is a parallelogram. We refer to the picture below.
[Diagram: a quadrilateral with sides a, b, c and d; A, B, C and D are the midpoints of consecutive sides.]
We have that
a + b + c + d = 0.
Now the vector AB equals (1/2)a + (1/2)b and the vector CD equals
(1/2)c + (1/2)d. But a + b = −(c + d). It follows that the vector AB is the
negative of the vector CD. Hence the line segment AB is parallel to the line
segment CD and they have the same length. Similarly, BC is parallel to AD
and has the same length.
4.1.2 Inner, scalar or dot products
Let a and b be two vectors. If a, b ≠ 0 then we define

a · b = ‖a‖ ‖b‖ cos(θ)
where θ is the angle between a and b.
Note that this angle is always chosen to be 0 ≤ θ ≤ π. If either a or b
is zero then a · b is defined to be zero. We call a · b the inner product of a
and b. It is also sometimes called the scalar product and the dot product. It
is important to remember that it is a scalar and not a vector.
We say that non-zero vectors a and b are orthogonal to each other if the
angle between them is ninety degrees. The key property of the inner product
is that for non-zero a and b we have that
a · b = 0 iff a and b are orthogonal.
Theorem 4.1.7 (Properties of the inner product).
(i) a · b = b · a.
(ii) a · a = ‖a‖².
(iii) λ(a · b) = (λa) · b = a · (λb).
(iv) a · (b + c) = a · b + a · c.
Remarks
(i) The inner product a · a is often abbreviated a².
(ii) Property (iv) says that the inner product distributes over addition. It is
the only property that takes a bit of work to prove; I give the proof
later.
The inner product enables us to prove much more interesting theorems.
Example 4.1.8. The angle in a semicircle is a right angle. Draw a semicircle.
Choose any point on the circumference of the semicircle and join it to the
points at either end of the diameter of the semicircle. Then the claim is that
the resulting triangle is right-angled.
We are interested in the angle formed by AB and AC. Observe that
the vector AB is −(a + b) and the vector AC is a − b. Thus

AB · AC = −(a + b) · (a − b)
        = −(a² − a · b + b · a − b²)
        = −(a² − b²)
        = 0

using the fact that a · b = b · a and ‖a‖ = ‖b‖, because this is just the radius
of the semicircle. It follows that the angle BAC is a right angle, as claimed.
Example 4.1.9. Pythagoras’ theorem proved using vectors.
We have that

a + b + c = 0

and so a + b = −c. Now

(a + b)² = (−c) · (−c) = ‖c‖².

But

(a + b)² = ‖a‖² + 2a · b + ‖b‖²

and this is equal to ‖a‖² + ‖b‖² because a · b = 0. It follows that

‖a‖² + ‖b‖² = ‖c‖².
Remark The set of 3-dimensional vectors equipped with the operations of
vector addition and scalar multiplication together with the inner product is
called three dimensional Euclidean space. This is precisely the space of Euclid’s geometry, but done in a modern way.
4.1.3 Vector or cross products
In three dimensional space there is another operation available to us that is
useful in many applications. Let a and b be non-zero vectors. We define a
new vector
a × b = ‖a‖ ‖b‖ sin(θ) n
where θ is the angle between a and b, and n is a unit vector at right angles to
the plane containing a and b — this determines n up to sign: we choose the
direction of n so that when rotating a to b in a clockwise direction through
the angle θ we are looking in the direction of n.
[Diagram: vectors a and b spanning a plane, with a × b perpendicular to that plane.]
If a or b is zero then a × b is the zero vector. We call it the vector
product of a and b. It is sometimes called the cross product. It is important
to remember that it is a vector. The key property of the vector product is
that for non-zero vectors
a × b = 0 iff a and b are parallel.
Theorem 4.1.10 (Properties of the vector product).
(i) a × b = −b × a.
(ii) λ(a × b) = (λa) × b = a × (λb).
(iii) a × (b + c) = a × b + a × c.
Remark Property (iii) says that the vector product distributes over addition.
This is the hardest property to prove; I give the proof later.
Warning! a × (b × c) ≠ (a × b) × c. In other words, the vector product is
not associative.
Warning! Distinguish between the following:
• λa. This is a scalar λ times a vector a and the result is a vector.
• a · b. This is the inner product of two vectors and is a scalar.
• a × b. This is the vector product of two vectors and is a vector.
You must not interchange notation for these different products (unlike
school algebra where you can).
Example 4.1.11. The area of the parallelogram determined by the vectors
a and b is ‖a × b‖: the parallelogram has base ‖a‖ and height ‖b‖ sin(θ),
where θ is the angle between a and b.
Example 4.1.12. We shall prove the ‘law of sines’ for triangles using the
vector product. With reference to a triangle with angles A, B and C and
opposite sides of lengths a, b and c, we have that
sin(A)/a = sin(B)/b = sin(C)/c.
We choose vectors as shown so that
‖a‖ = a, ‖b‖ = b, ‖c‖ = c.
Then
a + b + c = 0.
Hence
a + b = −c.
Take the vector product of this equation on both sides on the left with a, b
and c in turn. We get
1. a × b = c × a.
2. b × a = c × b.
3. c × a = b × c.
From (1), we get

‖a × b‖ = ‖c × a‖.

Thus

‖b‖ sin(C) = ‖c‖ sin(B),

which gives us the second equation in the statement of the result. The
remaining results follow similarly.
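As a numerical sanity check of the law of sines (an illustration only; the particular triangle is my choice, not from the notes), take the 3–4–5 right triangle, whose angles are easy to write down:

```python
import math

# 3-4-5 right triangle: sides a=3, b=4, c=5; the angle C opposite c is a
# right angle, and sin(A) = 3/5, sin(B) = 4/5 with hypotenuse c.
a, b, c = 3.0, 4.0, 5.0
A = math.asin(a / c)
B = math.asin(b / c)
C = math.pi / 2

# The three ratios sin(angle)/opposite side should agree.
r1 = math.sin(A) / a
r2 = math.sin(B) / b
r3 = math.sin(C) / c
assert abs(r1 - r2) < 1e-12 and abs(r2 - r3) < 1e-12
```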
4.1.4 Scalar triple products
This product is nothing more than a combination of the previous two. However, it is included because, as we shall see, it has an important geometric
interpretation.
Let a, b and c be three vectors. Then b × c is a vector. Thus a · (b × c)
is a scalar. We define
[a, b, c] = a · (b × c).
It is called the scalar triple product. Its properties are determined by the
properties of the inner and vector products. What it means geometrically
will be described later.
Exercises 4.1
1. Consider the following diagram.
[Diagram: points A, B, C, D, E and F, with a the vector from A to B, b the vector from B to C and c the vector from C to D.]
Now answer the following questions.
(i) Write the vector BD in terms of a and c.
(ii) Write the vector AE in terms of a and c.
(iii) What is the vector DE?
(iv) What is the vector CF ?
(v) What is the vector AC?
(vi) What is the vector BF ?
2. If a, b, c and d represent the consecutive sides of a quadrilateral, show
that the quadrilateral is a parallelogram if and only if a + c = 0.
3. In the regular pentagon ABCDE, let AB = a, BC = b, CD = c, and
DE = d. Express EA, DA, DB, CA, EC, BE in terms of a, b, c,
and d.
4. Let a and b represent adjacent sides of a regular hexagon so that the
initial point of b is the terminal point of a. Represent the remaining
sides by means of vectors expressed in terms of a and b.
5. Prove that ‖a‖b + ‖b‖a is orthogonal to ‖a‖b − ‖b‖a for all vectors
a and b.
6. Let a and b be two non-zero vectors. Let

   u = ((a · b)/(a · a)) a.

   Show that b − u is orthogonal to a.
7. Simplify (u + v) × (u − v).
8. Let a and b be two unit vectors, the angle between them being π/3. Show
that 2b − a and a are orthogonal.
9. Prove that
‖u − v‖² + ‖u + v‖² = 2(‖u‖² + ‖v‖²).
Deduce that the sum of the squares of the diagonals of a parallelogram
is equal to the sum of the squares of all four sides.
4.2 Vector arithmetic
The theory I introduced in Section 4.1 is useful for proving general results
about geometry, but what if we want to calculate with particular vectors: how
do we describe them? To do this we need coordinates, and vectors viewed in
terms of coordinates will occupy us for the remainder of this section.
4.2.1 i's, j's and k's
Set up a cartesian coordinate system consisting of x, y and z axes. We
orient the system so that in rotating the x axis clockwise to the y axis, we
are looking in the direction of the positive z axis. Let i, j and k be unit
vectors parallel to the x, y and z axes respectively (pointing in the positive
directions). Every vector a can be uniquely written in the form
a = a1 i + a2 j + a3 k
for some scalars a1, a2, a3. This is achieved by orthogonal projection of the
vector a (moved so that it starts at the origin) onto each of the three
coordinate axes. The numbers a1, a2, a3 are called the components of a in each
of the three directions.
Remarks
• If a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k then a = b iff ai = bi ;
that is, corresponding components are equal.
• 0 = 0i + 0j + 0k.
• If a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k then
a + b = (a1 + b1 )i + (a2 + b2 )j + (a3 + b3 )k.
• If a = a1 i + a2 j + a3 k then λa = λa1 i + λa2 j + λa3 k.
Theorem 4.2.1. Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
a · b = a1 b1 + a2 b2 + a3 b3 .
Proof. This is proved using Theorem 4.1.7 (iv) and the following table
· i j k
i 1 0 0
j 0 1 0
k 0 0 1
computed from the definition of the inner product. We have that
a · b = a · (b1 i + b2 j + b3 k) = b1 (a · i) + b2 (a · j) + b3 (a · k).
We now compute a · i, a · j, and a · k in turn:
• a · i = a1 .
• a · j = a2 .
• a · k = a3 .
Putting everything together we get
a · b = a1 b 1 + a2 b 2 + a3 b 3 ,
as required.
Remark If a = a1 i + a2 j + a3 k then ‖a‖ = √(a1² + a2² + a3²).
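The co-ordinate formula for the inner product, and the norm formula in the remark, are easy to experiment with. A small Python sketch, offered as an illustration rather than part of the notes:

```python
import math

# Inner product and norm in component form.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Two vectors whose inner product vanishes are orthogonal.
a = (1, 2, -2)
b = (2, 1, 2)
assert dot(a, b) == 0

# The norm formula: |3i + 4j| = 5.
assert norm((3, 4, 0)) == 5.0

# The angle between vectors via cos(theta) = a.b / (|a| |b|).
u, v = (1, 0, 0), (1, 1, 0)
theta = math.acos(dot(u, v) / (norm(u) * norm(v)))
assert abs(theta - math.pi / 4) < 1e-12
```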
Theorem 4.2.2. Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then

a × b = | i  j  k  |
        | a1 a2 a3 |
        | b1 b2 b3 |

Warning! This 'determinant' can only be expanded along the first row.
Proof. This follows by Theorem 4.1.10 (iii) and the following table
×    i    j    k
i    0    k   −j
j   −k    0    i
k    j   −i    0
computed from the definition of the vector product. We have that
a × b = a × (b1 i + b2 j + b3 k) = b1 (a × i) + b2 (a × j) + b3 (a × k).
We now compute a × i, a × j, and a × k in turn:
• a × i = −a2 k + a3 j.
• a × j = a1 k − a3 i.
• a × k = −a1 j + a2 i.
Putting everything together we get
a × b = (a2 b3 − a3 b2 )i − (a1 b3 − a3 b1 )j + (a1 b2 − a2 b1 )k
which is equal to the given determinant.
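The co-ordinate formula for a × b can likewise be checked numerically. The sketch below (an illustration, not part of the notes) verifies the cycle i × j = k, j × k = i, k × i = j, the anti-commutativity of Theorem 4.1.10 (i), and the orthogonality of a × b to both factors:

```python
# Cross product in component form, expanding the determinant along row one.
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
assert cross(i, j) == k and cross(j, k) == i and cross(k, i) == j

a, b = (1, 2, 3), (4, 5, 6)
assert cross(a, b) == (-3, 6, -3)
# a x b is orthogonal to both factors, and b x a = -(a x b).
assert dot(cross(a, b), a) == 0 and dot(cross(a, b), b) == 0
assert cross(b, a) == (3, -6, 3)
```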
The proof of the following now follows by our two theorems above.
Theorem 4.2.3 (Scalar triple products and determinants). Let
a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, c = c1 i + c2 j + c3 k.
Then

[a, b, c] = | a1 a2 a3 |
            | b1 b2 b3 |
            | c1 c2 c3 |

Thus the properties of scalar triple products are the same as the properties of
3 × 3 determinants.
Proof. We calculate a · (b × c). This is equal to
(a1 i + a2 j + a3 k) · [(b2 c3 − b3 c2 )i − (b1 c3 − b3 c1 )j + (b1 c2 − b2 c1 )k].
But this is equal to
a1 (b2 c3 − b3 c2 ) − a2 (b1 c3 − b3 c1 ) + a3 (b1 c2 − b2 c1 )
which is nothing other than the determinant

| a1 a2 a3 |
| b1 b2 b3 |
| c1 c2 c3 |

Before we look at some examples it is worth stepping back a bit to see
where we are.
Summary
In Section 4.1, we defined vectors and the vector operations geometrically. In Section 4.2, we showed that once we had chosen a
co-ordinate system, vectors and the vector operations could be described algebraically. The important point to remember in what
follows is that the two approaches must give the same answers.
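The agreement between the two descriptions can be spot-checked numerically. The Python sketch below (an illustration, not part of the notes) computes [a, b, c] both as a · (b × c) and as a 3 × 3 determinant expanded along the first row:

```python
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def triple(a, b, c):
    """[a, b, c] = a . (b x c)."""
    return dot(a, cross(b, c))

def det3(r1, r2, r3):
    """3x3 determinant expanded along the first row."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = r1, r2, r3
    return (a1 * (b2 * c3 - b3 * c2)
            - a2 * (b1 * c3 - b3 * c1)
            + a3 * (b1 * c2 - b2 * c1))

a, b, c = (1, 2, 3), (0, 1, 4), (5, 6, 0)
assert triple(a, b, c) == det3(a, b, c)
# Swapping two rows changes the sign, exactly as for determinants.
assert triple(b, a, c) == -triple(a, b, c)
```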
Exercises 4.2
1. Let a = 3i + 4j, b = 2i + 2j − k and c = 3i − 4k.
(i) Find ‖a‖, ‖b‖, and ‖c‖.
(ii) Find a + b and a − c.
(iii) Determine ‖a − c‖.
2. (i) Let a = 4i + j − 3k and b = i + 2j + 2k. Find a · b. Are a and b
orthogonal?
(ii) Find the angle between −2(i − j) + k and j − i.
3. The unit cube is determined by the three vectors i, j and k. Find the
angle between the long diagonal of the unit cube and one of its edges.
4. Calculate i × (i × k) and (i × i) × k. What do you deduce as a result
of this?
5. Calculate u · (v × w) where u = 3i − 2j − 5k, v = i + 4j − 4k, and
w = 3j + 2k.
6. If [a, b, c] = 0 what can you deduce?
4.3 Geometry with vectors
There are two kinds of vectors: the free vectors that we have been dealing
with up to now and the position vectors we introduce next.
4.3.1 Position vectors
So far, we have used vectors to describe line segments. But we can also use
vectors to describe the precise location of points. To do this, we have to
choose and fix a point O in space, called an origin. We can then consider all
the directed line segments that start at O. Each such segment represents a
vector and every vector is thus represented. The tops of the line segments are
points in space, and every point thus occurs. It follows that once an origin
has been fixed, vectors can be used to describe points. We talk about the
position vectors of points. However, we can only talk about position vectors
with respect to some fixed point O.
Example 4.3.1. The point A has position vector a = −i + j and the point
B has position vector b = 2i + j − k. Find the position vector of the point
P which is 2/3 of the way between A and B.
[Diagram: triangle OAB with a from O to A, b from O to B, and P on AB with AP : PB = 2 : 1.]
We have that

OP = OA + AP
   = OA + (2/3) AB
   = a + (2/3)(b − a)
   = (1/3) a + (2/3) b
   = (1/3)(−i + j) + (2/3)(2i + j − k)
   = i + j − (2/3) k.
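The calculation in Example 4.3.1 can be reproduced with exact arithmetic; a short Python sketch (an illustration, not part of the notes):

```python
from fractions import Fraction as F

# Position vectors as component triples.
a = (-1, 1, 0)          # a = -i + j
b = (2, 1, -1)          # b = 2i + j - k

# p = (1/3) a + (2/3) b, computed with exact rational arithmetic.
p = tuple(F(1, 3) * x + F(2, 3) * y for x, y in zip(a, b))
assert p == (1, 1, F(-2, 3))   # i + j - (2/3)k, as in the example
```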
4.3.2 Linear combinations
Let v1 , . . . , vn be n vectors and let λ1 , . . . , λn be n scalars. Then the vector
v = λ1 v1 + . . . + λn vn
is called a linear combination of the n vectors.
Only two cases of this definition are needed in this course. If we are
given just one vector v1 then a linear combination is just a scalar multiple of
that vector. The other case is where we have two vectors v1 and v2 . Linear
combinations then look like this
λ1 v1 + λ2 v2 .
Let v be any non-zero vector. Then any vector parallel to this vector has
the form λv for some scalar λ.
Now let v1 and v2 be two non-zero vectors where neither is a multiple of
the other. Then these two vectors determine a plane in space. This plane
is not rooted to any point and so, for convenience, we may move it parallel
to itself so that it passes through some fixed point that we may treat as an
origin. Now let v be any vector which is parallel to this plane. We may move
it parallel to itself so that its tail is at the origin. By plane geometry, we
may find real numbers λ1 and λ2 such that
v = λ1 v1 + λ2 v2 .
We shall use these ideas in deriving formulae for lines and planes in space
in the sections below.
4.3.3 Lines
Intuitively, a line in space is determined by one of the following two pieces
of information:
1. Two distinct points.
2. One point and a direction.
Let’s see how we can use vectors to obtain the equation of that line.
Let a and b be the position vectors of two distinct points. Let r =
xi + yj + zk be the position vector of a point on the line they determine.
Observe that the line determined by the two points is parallel to the
vector b − a, which therefore gives the direction of the line.
The vectors r − a and b − a will be parallel. Thus there is a scalar λ such
that r − a = λ(b − a). It follows that
r = a + λ(b − a).
This is called the vector form of the parametric equation of the line. The
parameter in question is λ.
We now derive the coordinate form of the parametric equation. Let
a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k.
Substituting in our vector form above and equating components we obtain
x = a1 + λ(b1 − a1 ),
y = a2 + λ(b2 − a2 ),
z = a3 + λ(b3 − a3 ).
For convenience, put ci = bi − ai . Thus the coordinate form of the parametric
equation for the line is
x = a1 + λc1 ,
y = a2 + λc2 ,
z = a3 + λc3 .
If c1, c2, c3 ≠ 0 then we can eliminate the parameter in the above equations to get the non-parametric equations of the line:

(x − a1)/c1 = (y − a2)/c2,   (y − a2)/c2 = (z − a3)/c3.
It’s worth noting that
• The parametric equation is useful for generating points on the line (by
choosing values of the parameter λ).
• The non-parametric equation is useful for checking that given points
lie on a given line.
Example 4.3.2. Find the parametric and the non-parametric equations of
the line through the point with position vector i + 2j + 3k and parallel to the
vector 4i + 5j + 6k. In this question, we are given the direction that the line
is parallel to. Thus
r − (i + 2j + 3k)
is parallel to
4i + 5j + 6k.
It follows that
r = i + 2j + 3k + λ(4i + 5j + 6k)
is the vector form of the parametric equation of the line. We now find the
cartesian form of the parametric equation. Put
r = xi + yj + zk.
Then
xi + yj + zk = i + 2j + 3k + λ(4i + 5j + 6k).
These two vectors are equal iff their coordinates are equal. Thus we have
that
x = 1 + 4λ
y = 2 + 5λ
z = 3 + 6λ
This is the cartesian form of the parametric equation of the line. Finally, we
eliminate λ to get the non-parametric equation of the line
(x − 1)/4 = (y − 2)/5   and   (y − 2)/5 = (z − 3)/6.
These two equations can be rewritten in the form
5x − 4y = −3 and 6y − 5z = −3.
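The two uses noted earlier, the parametric form for generating points and the non-parametric form for checking them, can be illustrated with the line of Example 4.3.2 (a Python sketch, not part of the notes):

```python
# Parametric form r = (1,2,3) + lam*(4,5,6); generate points and check
# them against the non-parametric equations 5x - 4y = -3, 6y - 5z = -3.
def point(lam):
    return (1 + 4 * lam, 2 + 5 * lam, 3 + 6 * lam)

for lam in (-2, 0, 1, 7):
    x, y, z = point(lam)
    assert 5 * x - 4 * y == -3
    assert 6 * y - 5 * z == -3
```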
4.3.4 Planes
Intuitively, a plane in space is determined by one of the following three pieces
of information:
1. Any three points that do not all lie in a straight line; that is, the points
form the vertices of a triangle.
2. One point and two non-parallel directions.
3. One point and a direction which is perpendicular or normal to the
plane.
We shall begin by finding the parametric equation of the plane determined
by the three points with position vectors a, b and c.
The vectors b − a and c − a are both parallel to the plane, but are not
parallel to each other. Thus every vector parallel to the plane they determine
has the form
λ(b − a) + µ(c − a)
for some scalars λ and µ. Here we use the ideas of Section 4.3.2. Thus if
the position vector of an arbitrary point on the plane is r, then r − a =
λ(b − a) + µ(c − a). Thus the vector form of the parametric equation of
the plane is
r = a + λ(b − a) + µ(c − a).
This can easily be written in coordinate form by equating components.
To find the non-parametric equation of a plane, we use the fact that a
plane is determined once a point on the plane is known and a vector orthogonal to every vector in the plane — such a vector is said to be normal to
the plane. Let n be a vector normal to our plane, and let a be the position
vector of a point in the plane.
Then r − a is orthogonal to n. Thus
(r − a) · n = 0.
This is the vector form of the non-parametric equation of the plane. To
find the coordinate form of the non-parametric equation, let
r = xi + yj + zk,
a = a1 i + a2 j + a3 k,
n = n1 i + n2 j + n3 k.
From (r − a) · n = 0 we get (x − a1 )n1 + (y − a2 )n2 + (z − a3 )n3 = 0. Thus
the non-parametric equation of the plane is
n1 x + n2 y + n3 z = a1 n1 + a2 n2 + a3 n3 .
Remark From the equation above, we deduce that the solutions of a linear
equation in three unknowns
ax + by + cz = d
all lie on a plane in general (although there are some degenerate cases where
something different from a plane will be obtained).
We observe that the non-parametric equation of the line in fact describes
the line as the intersection of two planes.
If we have three equations in three unknowns then, as long as the planes
are angled correctly, they will intersect in a point — that is, the equations
will have a unique solution. However, there are many cases where either the
planes have no points in common (no solution) or have lines or indeed planes
in common (infinitely many solutions).
Thus the nature of the solutions of a system of linear equations in three
unknowns is intimately bound up with the geometry of the planes they determine.
We have one final question to answer: given the parametric equation of
the plane, how do we find the non-parametric equation? The vectors b − a
and c − a are parallel to the plane but not parallel to each other. The vector
n = (b − a) × (c − a)
is normal to our plane.
Example 4.3.3. Find the parametric and non-parametric equations of the
plane containing the three points with position vectors
a = j − k,
b = i + j,
c = i + 2j.
We have that
b−a=i+k
and
c − a = i + j + k.
Thus the parametric equation of the plane is
r = j − k + λ(i + k) + µ(i + j + k).
To find the non-parametric equation, we need to find a vector normal to the
plane. We calculate
(b − a) × (c − a) = k − i.
Thus
(r − a) · (k − i) = 0.
That is
(xi + (y − 1)j + (z + 1)k) · (k − i) = 0.
This simplifies to
z − x = −1,
the non-parametric equation of the plane. We now check that our three
original points satisfy this equation. The point a has co-ordinates (0, 1, −1);
the point b has co-ordinates (1, 1, 0); the point c has co-ordinates (1, 2, 0).
It is easy to check that each set of co-ordinates satisfies the equation.
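The same check can be automated. The sketch below (Python, an illustration rather than part of the notes) recomputes the normal of Example 4.3.3 and tests all three points against z − x = −1:

```python
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

a = (0, 1, -1)   # j - k
b = (1, 1, 0)    # i + j
c = (1, 2, 0)    # i + 2j

bma = tuple(x - y for x, y in zip(b, a))   # b - a = i + k
cma = tuple(x - y for x, y in zip(c, a))   # c - a = i + j + k
n = cross(bma, cma)
assert n == (-1, 0, 1)                     # k - i, a normal to the plane

# Each of the three points satisfies z - x = -1.
for p in (a, b, c):
    x, _, z = p
    assert z - x == -1
```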
4.3.5 Determinants
Let’s start with 1 × 1 matrices. The determinant of (a) is just a. The length
of a is |a|, the absolute value of the determinant of (a).
Theorem 4.3.4. Let a = ai + cj and b = bi + dj be a pair of plane vectors.
Then the area of the parallelogram determined by these vectors is the absolute
value of the determinant
| a b |
| c d |

Proof. The proof I give will be for the case where both vectors are in the
first quadrant. I shall consider two cases.
(Case 1): b is to the left of a when standing at the origin and looking
along a. Let
a = ai + cj and b = bi + dj.
The area of the parallelogram is the area of the rectangle defined by the
points
0, (a + b)i, a + b, (c + d)j
minus the area of two rectangles the same size, labelled (1), two triangles the
same size, labelled (2), and another two triangles of the same size, labelled
(3). That is
(a + b)(c + d) − 2bc − 2(1/2)ac − 2(1/2)bd
which is equal to
ac + ad + bc + bd − 2bc − bd − ac = ad − bc.
(Case 2): b is to the right of a when standing at the origin and looking
along a. A similar argument shows that the area is bc − ad which is the
negative of the determinant.
Putting these two cases together, we see that the area is the absolute value
of the determinant, because we usually expect areas to be non-negative.
Theorem 4.3.5. Let
a = ai + dj + gk, b = bi + ej + hk, c = ci + f j + ik
be three vectors. Then the volume of the parallelepiped ('squashed box') determined by these three vectors is the absolute value of the determinant

| a b c |
| d e f |
| g h i |

Proof. We refer to the diagram below.
The volume of the box determined by the vectors a, b, c is equal to the
base area times the vertical height. This is equal to the absolute value of
‖a‖ ‖b‖ sin(θ) ‖c‖ cos(φ).
We have to use the absolute value of this expression because cos(φ) can take
negative values if c is below rather than above the plane of a and b as I have
drawn it. Now
• a × b = ‖a‖ ‖b‖ sin(θ) n, where n is the unit vector orthogonal to a
and b and in the correct direction.
• n · c = ‖c‖ cos(φ).
Thus
‖a‖ ‖b‖ sin(θ) ‖c‖ cos(φ) = (a × b) · c.
By the properties of the inner product
(a × b) · c = c · (a × b) = [c, a, b].
We now use properties of the determinant
[c, a, b] = −[a, c, b] = [a, b, c].
It follows that the volume of the box is the absolute value of
[a, b, c].
It follows from the above theorem and our theorem on scalar triple products that the volume of the parallelepiped determined by the three vectors
a, b, and c is the absolute value of the scalar triple product [a, b, c].
The geometric significance of determinants is that they enable us to measure lengths, areas and volumes.
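As a concrete instance of Theorem 4.3.5, the box spanned by 2i, 3j and 5k should have volume 2 · 3 · 5 = 30. A Python sketch (an illustration, not part of the notes):

```python
def cross(u, v):
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Box spanned by scaled axis vectors: volume is |[a, b, c]| = 30.
a, b, c = (2, 0, 0), (0, 3, 0), (0, 0, 5)
assert abs(dot(a, cross(b, c))) == 30

# Reordering the vectors can flip the sign, but not the absolute value.
assert dot(b, cross(a, c)) == -30
```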
Exercises 4.3
1. (i) Find the parametric and the non-parametric equations of the line
through the two points with position vectors i − j + 2k and 2i +
3j + 4k.
(ii) Find the parametric and the non-parametric equations of the plane
containing the three points with position vectors i + 3k, i + 2j − k,
and 3i − j − 2k.
2. Let c be the position vector of the centre of a sphere with radius R.
Let an arbitrary point on the sphere have position vector r. Why is
‖r − c‖ = R? Squaring both sides we get
(r − c) · (r − c) = R2 .
If r = xi + yj + zk and c = c1 i + c2 j + c3 k, deduce that the equation of
the sphere with centre c1 i + c2 j + c3 k and radius R is
(x − c1 )2 + (y − c2 )2 + (z − c3 )2 = R2 .
(i) Find the equation of the sphere with centre i + j + k and radius 2.
(ii) Find the centre and radius of the sphere with equation
x2 + y 2 + z 2 − 2x − 4y − 6z − 2 = 0.
3. The distance of a point from a line is defined to be the length of the
perpendicular from the point to the line. Let the line in question have
parametric equation
r = p + λd
and let the position vector of the point be q. Show that the distance
of the point from the line is
‖d × (q − p)‖ / ‖d‖.
4. The distance of a point from a plane is defined to be the length of the
perpendicular to the plane. Let the position vector of the point be q
and the equation of the plane be (r − p) · n = 0. Show that the distance
of the point from the plane is
|(q − p) · n| / ‖n‖.
4.4 Summary of vectors
Inner products
Definition
Let a and b be two vectors. If a, b ≠ 0 then we define
a · b = ‖a‖ ‖b‖ cos θ
where θ is the angle between a and b. Note that this angle is always chosen
to be 0 ≤ θ ≤ π. If either a or b is zero then a · b is defined to be zero. We
call a · b the inner product of a and b.
Co-ordinate form
Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
a · b = a1 b 1 + a2 b 2 + a3 b 3 .
Uses
• The most important application is the following: if the vectors a and b
are non-zero then a · b = 0 precisely when a and b are orthogonal —
meaning ‘at right angles to each other’.
• The inner product can more generally be used to work out the angle
between two vectors:

cos θ = (a · b) / (‖a‖ ‖b‖)

where θ is the angle between the non-zero vectors a and b.
• The inner product can be used to work out the lengths of vectors:
‖a‖ = √(a · a).
Vector products
Definition
Let a and b be non-zero vectors. We define a new vector
a × b = ‖a‖ ‖b‖ sin θ n
where θ is the angle between a and b, and n is a unit vector at right angles to
the plane containing a and b — this determines n up to sign: we choose the
direction of n so that when rotating a to b in a clockwise direction through
the angle θ we are looking in the direction of n. If a or b is zero then a × b
is the zero vector. We call it the vector product of a and b.
Co-ordinate form
Let a = a1 i + a2 j + a3 k and b = b1 i + b2 j + b3 k. Then
a × b = | i  j  k  |
        | a1 a2 a3 |
        | b1 b2 b3 |

Uses
• The most important application of the vector product is in constructing
a vector orthogonal to two other vectors, and in particular in constructing a vector orthogonal to a plane — a vector normal to the plane.
• If the vectors a and b are non-zero then a × b = 0 precisely when a
and b are parallel to each other.
• The vector product can be used to calculate the sine of the angle between two vectors:

sin θ = ‖a × b‖ / (‖a‖ ‖b‖)

where θ is the angle between the non-zero vectors a and b.
Scalar triple products
Definition
Let a, b and c be three vectors. Then b × c is a vector. Thus a · (b × c)
is a scalar. We define
[a, b, c] = a · (b × c).
It is called the scalar triple product.
Co-ordinate form
Let a = a1 i + a2 j + a3 k, b = b1 i + b2 j + b3 k, and c = c1 i + c2 j + c3 k. Then
[a, b, c] = | a1 a2 a3 |
            | b1 b2 b3 |
            | c1 c2 c3 |

Uses
• The absolute value of [a, b, c] is the volume of the parallelepiped (‘squashed
box’) determined by the three vectors.
• The scalar triple product gives a geometric interpretation of 3 × 3 determinants.
4.5 *Two vector proofs*
This section will not be examined in 2013.
My development of the theory of vectors in this chapter depended on
two important results: Theorem 4.1.7 (iv), the fact that
a · (b + c) = a · b + a · c
and Theorem 4.1.10 (iii), the fact that
a × (b + c) = a × b + a × c.
I shall sketch out proofs of both of these results here. The proof of the first
is not too difficult.
Theorem 4.5.1.
a · (b + c) = a · b + a · c.
Proof. Let x and y be a pair of vectors. Then the component of x in the
direction of y, written comp(x, y), is by definition the number ‖x‖ cos θ, where
θ is the angle between x and y. Clearly

x · y = ‖y‖ comp(x, y).

Geometry shows (this means you should draw the pictures) that

comp(b + c, a) = comp(b, a) + comp(c, a).

We therefore have that

(b + c) · a = ‖a‖ comp(b + c, a)
            = ‖a‖ comp(b, a) + ‖a‖ comp(c, a)
            = b · a + c · a.
The proof of the second is hairier.
Theorem 4.5.2.
a × (b + c) = a × b + a × c.
Proof. We defined the vector product in terms of geometry and so we shall
have to prove this property by means of geometry. I shall sketch out a proof
following one given in Pettofrezzo’s book.
We begin with what is in effect a lemma. Let a and b be a pair of
vectors. It is convenient to move them so that they are both emanating from
the same point P . They determine a plane. In that plane, we can draw
a line perpendicular to the vector a and passing through the point P . We
project the vector b onto this line and we get a vector b′. We claim that
a × b = a × b′. The proof follows by observing that these two vectors clearly
have the same direction and a calculation shows that they have the same
length.
We now prove our theorem. We orientate ourselves so that the vector a
is at right angles to the page and pointing at you, the reader. We project the
vectors b and c onto the plane of the page to get the vectors b′ and c′.
We shall prove that

a × (b′ + c′) = a × b′ + a × c′.

Let's see first why this result is enough to prove the theorem. The vectors a
and b + c define a plane. As in our lemma above, we have that a × (b + c) =
a × (b + c)′. Also a × b = a × b′ and a × c = a × c′. As long as (b + c)′ = b′ + c′,
our theorem will follow.

We now prove that

a × (b′ + c′) = a × b′ + a × c′.

Now, by the way we have defined our vectors, a × b′ and a × c′ are in the
plane of the page and are orthogonal to b′ and c′, respectively. This leads to
the crux of the proof: the angle between a × b′ and a × c′ is the same as the
angle between b′ and c′. The point is that because a is pointing out of the
page, the operator a × − has the effect of rotating vectors by a right angle
in the plane of the page.

It follows that a × b′ + a × c′ is at right angles to b′ + c′. Thus a × b′ + a × c′
and a × (b′ + c′) are vectors pointing in the same direction.

We now compare the lengths of these two vectors. We shall use the fact
that the triangle formed by the vectors a × b′ and a × c′ is similar to the
triangle formed by the vectors b′ and c′. Thus

‖a × b′ + a × c′‖ / ‖b′ + c′‖ = ‖a × b′‖ / ‖b′‖.
But this works out to give that
‖a × b′ + a × c′‖ = ‖a‖ ‖b′ + c′‖.
Our claim is now proved.
4.6 *Quaternions*
This section will not be examined in 2013.
The set of quaternions, denoted by H, was invented by the Irish mathematician Sir William Rowan Hamilton in 1843. They are 4-dimensional generalisations of the complex numbers. It was from the theory of quaternions
that the modern theory of vectors with inner and vector products developed.
To describe what they are, I shall reverse history and derive them from vectors. Recall the following from some earlier exercises; the notation is slightly
different. The Pauli matrices are: I, X, Y, Z, −I, −X, −Y, −Z where
X = (  0  1 )     Y = ( i   0 )     Z = (  0  −i )
    ( −1  0 )         ( 0  −i )         ( −i   0 )
where i is the complex number i. You were asked to show that the product
of any two Pauli matrices is again a Pauli matrix by completing a Cayley
table. We shall just need a portion of that Cayley table relating to X, Y and
Z. This is
     X    Y    Z
X   −I    Z   −Y
Y   −Z   −I    X
Z    Y   −X   −I
We shall now consider matrices of the form

λI + αX + βY + γZ

where λ, α, β, γ ∈ R. We calculate the product of two such matrices using the
distributivity and scalar multiplication properties of matrix multiplication
and the above multiplication table. The product

(λI + αX + βY + γZ)(µI + α′X + β′Y + γ′Z)
can be written in the form aI + bX + cY + dZ where a, b, c, d ∈ R, although
I shall write it in a slightly different form:

(λµ − αα′ − ββ′ − γγ′)I
+ λ(α′X + β′Y + γ′Z) + µ(αX + βY + γZ)
+ (βγ′ − γβ′)X + (γα′ − αγ′)Y + (αβ′ − βα′)Z.
Although this looks complicated there are some familiar things within it:
the first term contains what looks like an inner product and the last term
contains what looks like a vector product. Note that because this is matrix
multiplication this operation is associative.
The above calculation motivates the following construction. Let E3 denote the set of all 3-dimensional vectors. Thus a typical element of E3 is
αi + βj + γk. Put
H = R × E3 .
The elements of H are therefore ordered pairs (λ, a) consisting of a real
number λ and a vector a. We define the sum of two elements of H in a very
simple way:

(λ, a) + (µ, a′) = (λ + µ, a + a′).
The product is defined in a way that mimics what I did above (you should
check this):

(λ, a)(µ, a′) = (λµ − a · a′, λa′ + µa + (a × a′)).

It follows that this product is associative!
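The product just defined can be checked mechanically. Here is a minimal Python sketch (the helper names `qmul`, `dot` and `cross` are mine, not the text's) that encodes a quaternion as a pair (λ, a), verifies the relations i² = −1 and ij = k, and spot-checks associativity on one triple:

```python
# A sketch of quaternion multiplication as defined above: a quaternion is a
# pair (lam, a) of a real number and a 3-vector, stored as a Python tuple.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def qmul(p, q):
    (lam, a), (mu, b) = p, q
    scalar = lam*mu - dot(a, b)                      # the inner-product part
    c = cross(a, b)                                  # the vector-product part
    vector = tuple(lam*b[i] + mu*a[i] + c[i] for i in range(3))
    return (scalar, vector)

i = (0, (1, 0, 0))
j = (0, (0, 1, 0))
k = (0, (0, 0, 1))

# The defining relations: i^2 = -1 and ij = k.
print(qmul(i, i))        # (-1, (0, 0, 0))
print(qmul(i, j) == k)   # True

# Associativity, checked on one triple:
p, q, r = (1, (2, 0, 1)), (0, (1, 3, -1)), (2, (0, 1, 4))
print(qmul(qmul(p, q), r) == qmul(p, qmul(q, r)))   # True
```

The numerical check is of course no substitute for the matrix argument above, which is what actually proves associativity.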
We shall now investigate what we can do with H. I shall only deal with
multiplication because addition poses no problems.
• Consider the subset R of H which consists of elements of the form
(λ, 0). You can check that (λ, 0)(µ, 0) = (λµ, 0). Thus R mimics the
real numbers.
• Consider the subset C of H which consists of the elements of the form
(λ, ai). You can check that

(λ, ai)(µ, a′i) = (λµ − aa′, (λa′ + µa)i).

In particular, (0, i)(0, i) = (−1, 0). Thus C mimics the set of complex
numbers.
• Consider the subset E of H which consists of elements of the form (0, a).
You can check that

(0, a)(0, a′) = (−a · a′, a × a′).

Thus E mimics vectors, the inner product and the vector product.
The set H with the above operations of addition and multiplication is
the set of quaternions. This structure pulls together most of the important
elements of this course: complex numbers, vectors and matrices.
4.7 Learning outcomes for Chapter 4
• Compute with vectors using scalar products, vector products, and
scalar triple products.
• Find the equation of the unique line determined by two points in space
or a point and a direction.
• Find the equation of the unique plane determined by three points in
space or a point and two directions.
• Find the equation of the unique plane determined by a point in the
plane and the normal vector.
• Find volumes of parallelepipeds using scalar triple products.
4.8 Further reading and exercises
The material in this chapter usually causes problems. The reason, I think,
is that it requires you to think both geometrically and algebraically. I would
strongly recommend Chapter 11 of Olive for further reading, and possibly
Chapter 5 of Hirst and Singerman. If you would like to learn about vectors
and matrices with an antipodean flavour, I recommend the first six chapters
of David Easdown’s book A first course in linear algebra, Pearson Education
Australia, 2008 which also comes with an accompanying DVD.
Chapter 5
Counting
Counting seems such an easy process that a chapter devoted to it might
appear unnecessary, but counting is a lot harder than it looks, and it
lies behind probability theory. The main goal of this chapter is not so much
the results themselves, which are important, but the methods used to prove
them.
5.1 More set theory
The main tool needed to count sets is set theory itself. In this section, we
describe some constructions that will be the basis of some useful counting
techniques.
5.1.1 Operations on sets
There are three operations defined on sets using the words ‘and’, ‘or’ and
‘not’. Let A and B be sets.
We can construct a new set, called the intersection of A and B and
denoted by A ∩ B, whose elements consist of all those elements that belong
to both A and to B.
We can construct a new set, called the union of A and B and denoted by
A ∪ B, whose elements consist of all the elements of A together with all the
elements of B. In constructing unions of sets, we remember that repetitions
don’t count.
We can construct a new set, called the difference or relative complement
of A and B and denoted by A \ B,¹ whose elements consist of all those
elements that belong to A but not to B.
Warning! The word ‘or’ in mathematics doesn’t mean quite the same as
it does in everyday life. Thus ‘X or Y ’ means ‘X or Y or both’, whereas
in everyday life, we assume that ‘or’ means ‘exclusive or’: that is, ‘X or Y ’
means ‘X or Y but not both’.
When illustrating definitions involving sets or trying to gain an idea of
what’s true about sets in general, we often use Venn diagrams. In a Venn
diagram, a set is represented by a region in the plane. The intersection of two
sets can then be represented by the overlap of the two regions representing
each of the sets, and the union is represented by the region enclosed by both
regions. Although Venn diagrams cannot be used to prove results about sets,
they are a nice way of visualising sets and their properties.²
¹ Sometimes denoted by A − B.
² With thanks to Simone Rea for the Venn diagrams.
[Venn diagram: A ∩ B, the overlap of the regions A and B.]

[Venn diagram: A ∪ B, the region covered by A together with B.]

[Venn diagram: A \ B, the part of A lying outside B.]
Example 5.1.1. Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Find A∩B, A∪B,
A \ B and B \ A.
Let’s begin with A ∩ B. We have to find the elements that belong to both
A and B. We start with the elements in A and work left-to-right: 1 doesn’t
belong to B; 2 doesn’t belong to B; 3 does belong to B as does 4. Thus
A ∩ B = {3, 4}.
To find A ∪ B we join the two sets together {1, 2, 3, 4, 3, 4, 5, 6} and then
read from left-to-right weeding out repetitions to get A∪B = {1, 2, 3, 4, 5, 6}.
To calculate A \ B we have to find the elements of A that don't belong
to B. So read the elements of A from left-to-right, comparing them with the
elements of B: 1 doesn't belong to B; 2 doesn't belong to B; but 3 and 4 do
belong to B. It follows that A \ B = {1, 2}.
To calculate B \ A we have to find the set of elements of B that don’t
belong to A: this set is just {5, 6}.
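Python's built-in set type implements these three operations directly, so Example 5.1.1 can be checked at a prompt; this is an illustration, not part of the text:

```python
# Example 5.1.1 redone with Python sets; &, | and - are intersection,
# union and difference respectively.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
print(A & B)   # {3, 4}
print(A | B)   # {1, 2, 3, 4, 5, 6}
print(A - B)   # {1, 2}
print(B - A)   # {5, 6}
```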
Examples 5.1.2.
(i) E ∩ O = ∅. This says that there is no number which is both odd and
even.
(ii) P ∩ E = {2}. This says that the only even prime number is 2.
(iii) E ∪ O = N. This says that every natural number is either odd or even.
5.1.2 Partitions
The sets A and B are said to be disjoint if A ∩ B = ∅. If A and B are disjoint
then their union is called a disjoint union. If A1, . . . , An is a collection of sets
such that Ai ∩ Aj = ∅ when i ≠ j then we say they are pairwise disjoint.
Let A be a set. A partition P = {A1 , . . . , An } of A is a set whose elements
consist of non-empty subsets A1 , . . . , An of A which are pairwise disjoint and
whose union is A. The subsets Ai are called the blocks of the partition.3
Examples 5.1.3.
(i) The set
P = {{a, c}, {b}, {d, e, f }}
³ The number of partitions of a set with n elements is called the nth Bell number.
is a partition of the set X = {a, b, c, d, e, f }. In this case there are three
blocks in the partition.
(ii) For statistical purposes a group (i.e. a set) of people might be partitioned
by age: the blocks might be the under 18s, the 18 to 35 year olds, the 36 to 49
year olds, and those aged 50 and over. Here we have 4 blocks.
(iii) How many partitions does the set X = {a, b, c} have? There is 1 partition with 1 block
{{a, b, c}},
there are 3 partitions with 2 blocks
{{a}, {b, c}}, {{b}, {a, c}}, {{c}, {a, b}},
there is 1 partition with 3 blocks
{{a}, {b}, {c}}.
There are therefore 5 partitions of the set X.
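The count of 5 partitions can be reproduced by a short recursion: every partition of X either puts a chosen element in a block of its own or adds it to some block of a partition of the remaining elements. A Python sketch (the function name `partitions` is mine, not the text's):

```python
# Enumerate all partitions of a list of distinct elements.
def partitions(xs):
    if not xs:
        yield []          # the empty set has exactly one partition
        return
    first, rest = xs[0], xs[1:]
    for p in partitions(rest):
        # first as a singleton block
        yield [[first]] + p
        # first added to each existing block in turn
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

for p in partitions(['a', 'b', 'c']):
    print(p)
print(sum(1 for _ in partitions(['a', 'b', 'c'])))   # 5, the 3rd Bell number
```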
5.1.3 Sequences
In a set, the order the elements occur is irrelevant and repetitions are ignored.
But there are plenty of occasions when order does matter and repetitions are
required. On such occasions, we cannot use sets, we use instead a sequence
or list. The list of length n
(a1 , a2 , . . . , an )
consists of n entries where the first element is a1 , the second is a2 and so
on. A list with two entries (a, b) is called an ordered pair.⁴ Observe that
(a, b) ≠ (b, a) unless a = b, so order matters, and (a, a) is different from a on
its own, so repetition matters.
its own, so repetition matters. The element a is called the first component
and the element b is called the second component. We can also define ordered
triples, which look like (a, b, c), and more generally ordered n-tuples, which
look like (a1 , . . . , an ). Ordered n-tuples are just lists of length n.
⁴ This notation should not be confused with the notation for real intervals, where (a, b)
denotes the set {r : r ∈ R and a < r < b}, nor with the use of brackets in clarifying the
meaning of algebraic expressions. The context should make clear what is intended.
If A and B are sets then the set A × B, called the product of A by B, is
defined to be the following set:
A × B = {(a, b) : a ∈ A and b ∈ B},
the set of all ordered pairs where the first component comes from A and the
second component comes from B.
Example 5.1.4. For example, if A = {1, 2} and B = {a, b, c} then
A × B = {(1, a), (1, b), (1, c), (2, a), (2, b), (2, c)}.
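In Python, `itertools.product` enumerates A × B in exactly this order, so Example 5.1.4 can be checked directly:

```python
from itertools import product

# Example 5.1.4: the product of A = {1, 2} by B = {a, b, c}.
A = [1, 2]
B = ['a', 'b', 'c']
print(list(product(A, B)))
# [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c')]
```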
More generally, we can define A × B × C to consist of all ordered triples
where the first component comes from A, the second from B and the third
from C. Yet more generally, we can define A1 × . . . × An to consist of all
n-tuples where the ith component comes from Ai .
Example 5.1.5. Dates consist of three pieces of information: the day, the
month and the year, which in the UK are usually stated in that order (this
is different in the US). So dates are ordered triples
(day, month, year)
where day ∈ {1, . . . , 31}, month ∈ {January, . . . , December} and year ∈ N
(I’m assuming here only AD years occur).
Let A be a set. Then we can form the sets A × A, A × A × A and so on.
These are usually abbreviated as A^2, A^3 and so on. Thus we have the sets R,
R^2 and R^3, which represent the real line, the real plane and real Euclidean space,
respectively.
Example 5.1.6. British car registration plates are 7-tuples consisting of two
letters, followed by two digits, followed by three letters. If we denote the set
of uppercase English letters by L and the set of digits by D = {0, 1, 2, . . . , 9}
then every registration plate is an element of the set
L × L × D × D × L × L × L.
In fact, not every such 7-tuple is allowable. To be more precise, we should
write

(L \ {I, Q, Z})^2 × D^2 × (L \ {I, Q})^3
where I have used powers of sets as an abbreviation. There are further
restrictions on the last three letters to avoid the use of taboo words. This
too could be expressed by means of set difference. In other words, we can
use set notation to give a precise definition of the form taken by allowable
registration plates.
Exercises 5.1
1. Let S = {4, 7, 8, 10, 23}, T = {5, 7, 10, 14, 20, 25} and V = {2, 5, 10, 20, 30, 36}.
What are S ∪ (T ∩ V ), S \ (T ∩ V ) and (S ∩ T ) \ V ?
2. Let A = {a, b, c, d, e, f } and B = {g, h, k, d, e, f }. What are the elements of the set A \ ((A ∪ B) \ (A ∩ B))?
3. Write down the elements in the set {A, B, C} × {a, b}.
4. Let A = {1, 2, 3} and B = {a, b, c}. What is the set
(A × B) \ (({1} × B) ∪ (A × {c}))?
5. Which of the following are partitions of the set X = {1, 2, . . . , 9}? For
those which are not partitions, explain why they fail.
(i) {{1, 3, 5}, {2, 6}, {4, 8, 9}}.
(ii) {{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}}.
(iii) {{1, 3, 5}, {2, 4, 6, 8}, {7, 9}}.
5.2 Ways of counting
The counting principles introduced in this section are of fundamental importance.
5.2.1 Counting principles
There are many occasions in mathematics when we have to count the number
of elements in some set. In this section, I have gathered together some of the
most important results on counting. Recall that the number of elements in
a set X is called the cardinality of X, denoted by |X|.
Examples 5.2.1.
(i) |∅| = 0.
(ii) |{♦, ♣, ♥, ♠}| = 4.
(iii) |{a, b, c . . . , x, y, z}| = 26.
There are two general principles that help in counting sets.
I shall say that there is a one-to-one correspondence between the set A
and the set B if each element of A can be paired off with a unique element
in B, and if each element of B is thereby paired off with a unique element in
A. We also say that there is a bijection between A and B.5
1. Correspondence principle (really a definition) There is a one-to-one correspondence between the set A and the set B if and only if |A| = |B|.
Example 5.2.2. The number of people (legally) at a football game is the
same as the number of tickets sold.
2. Partition counting principle Let A1, . . . , An be the blocks of a partition of A. Then |A| = |A1| + . . . + |An|.
Example 5.2.3. To count the number of people in the UK, count the number
of people (at a fixed time!) in each of the counties, and then add them up.
We shall now show how to use our two principles to count various kinds
of sets.
⁵ If there is an injection from A to B then |A| ≤ |B|, and if there is a surjection from A onto B then |A| ≥ |B|.
5.2.2 Counting sequences
We can apply our counting principles to counting lists which will lead to a
third counting principle.
Proposition 5.2.4 (Product counting principle). Let A and B be sets. Then
|A × B| = |A| |B|.

More generally, if there are n sets A1, . . . , An then

|A1 × . . . × An| = |A1| · · · |An|.
Proof. The set A × B is the union of the disjoint sets {a} × B where a ∈ A.
Now |{a} × B| = |B| because there is a one-to-one correspondence between
the elements of {a} × B and the elements of B. Thus A × B is the disjoint
union of |A| sets each with |B| elements. Therefore |A × B| = |A| |B|, as
required. The more general result can be proved by induction (proof omitted).
Example 5.2.5. Let A = {1, 2, 3} and B = {α, β}. Then
A × B = {(1, α), (1, β), (2, α), (2, β), (3, α), (3, β)},
and we can see that this set contains 6 elements. On the other hand, using the
proposition above, we have that |A| = 3 and |B| = 2 and so |A × B| = 3 · 2 = 6,
which is what we expect.
A special case of the above proposition occurs when A = A1 = . . . = An,
in which case

|A^n| = |A|^n.

This result is useful in counting lists. Let A be a set. Then a list with k
entries each from A is an element of A^k. Thus the number of lists of length
k with entries from A is |A|^k.
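Both counts can be verified by brute force in Python: `itertools.product(A, B)` builds A × B, and `product(A, repeat=k)` builds the set of lists of length k over A.

```python
from itertools import product

# The product counting principle, checked by enumeration:
# |A x B| = |A| |B|, and the number of length-k lists over A is |A|^k.
A = [1, 2, 3]
B = ['x', 'y']
assert len(list(product(A, B))) == len(A) * len(B)       # 6
k = 4
assert len(list(product(A, repeat=k))) == len(A) ** k    # 81
print("both counts agree")
```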
5.2.3 The power set
The set whose elements are all the subsets of X is called the power set of X
and is denoted by P(X).
Warning! The power set of a set X contains both ∅ and X as elements.
Example 5.2.6. Let's find all the subsets of the set X = {a, b, c}. First we
have the subset with no elements, the empty set. Then we have the subsets
that contain exactly one element: {a}, {b}, {c}. Then the subsets containing
exactly two elements: {a, b}, {a, c}, {b, c}. Finally, we have the whole set X.
It follows that X has 8 subsets and
P(X) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, X}.
Proposition 5.2.7. Let X be a finite set with n elements. Then

|P(X)| = 2^n.
Proof. List the elements of X in any order. A subset of X is determined by
saying which elements of X are to be in the subset and which are not; we
can indicate these elements by writing a 1 above an element of X in the list
if it is in the subset, and a 0 above the element in the list if it is not in the
subset. Thus a subset determines a sequence of 0’s and 1’s of length n where
the 1’s tell you which elements of X are to appear and the 0’s tell you which
elements of X are to be omitted. This is clearly a one-to-one correspondence
between the set of subsets of X and the set of all sequences of 0’s and 1’s
of length n. Thus the number of subsets of X is the same as the number
of lists of length n where each component of the list is taken from the set
{0, 1}. Thus the number of subsets of X is the same as the cardinality of the
set {0, 1}^n, but this is equal to 2^n. Thus

|P(X)| = 2^n,

as required.
Example 5.2.8. We can illustrate the above proof by looking at the example
where X = {a, b, c}. We list the elements of X as (a, b, c). The empty set
corresponds to the sequence (0, 0, 0) and the whole set to (1, 1, 1). The subset
{b, c} corresponds to the sequence (0, 1, 1).
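The bijection in the proof is easy to implement: the binary digits of the numbers 0, 1, . . . , 2^n − 1 are precisely the 0/1 sequences of length n. A Python sketch (the name `power_set` is mine, not the text's):

```python
# Build the power set of a list of distinct elements using the 0/1-sequence
# bijection from the proof of Proposition 5.2.7.
def power_set(xs):
    n = len(xs)
    subsets = []
    for bits in range(2 ** n):                    # each number 0 .. 2^n - 1
        subsets.append({xs[i] for i in range(n)   # keep xs[i] iff bit i is 1
                        if (bits >> i) & 1})
    return subsets

P = power_set(['a', 'b', 'c'])
print(len(P))    # 8 = 2^3
```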
5.2.4 Counting arrangements: permutations
Let X be an n-element set. We are interested in calculating the number of
lists of length n of elements of X where this time there are no repetitions.
We call such a list an n-permutation or just a permutation of X.
Example 5.2.9. Let X = {a, b, c}. The permutations which arise are
(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).
Thus the number of permutations of a 3-element set is 6.
Define 0! = 1 and

n! = n · (n − 1)!

when n ≥ 1. The number n! is called n factorial.
Proposition 5.2.10. Let X be an n-element set. The number of permutations of length n is n!.
Proof. Let Pn stand for the set of all permutations of an n-element set. Each
permutation of X will begin with one of the elements of X. Thus the set of
all permutations of X, Pn, is partitioned into n blocks, each block consisting
of all permutations which begin with one of the elements of X. If we take one
of these blocks, and remove the first element from each of the permutations
in that block, the set which results is a set of permutations of an (n − 1)-element
set. Thus the number of permutations of an n-element set is equal
to n times the number of permutations of an (n − 1)-element set. That is

|Pn| = n |Pn−1|.
We observe that the cardinality of P1 is just 1. The result now follows.
Let k ≤ n. By a k-permutation we simply mean a list of length k without
repetition whose components are drawn from a set with n elements. Thus
an n-permutation is just what we have defined to be a permutation.
Example 5.2.11. Let’s calculate the 2-permutations of elements from {a, b, c}.
They are
(a, b), (b, a), (a, c), (c, a), (b, c), (c, b).
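In Python, `itertools.permutations(xs, k)` enumerates exactly the k-permutations, so Example 5.2.11 and the count n!/(n − k)! can be compared directly:

```python
from itertools import permutations
from math import factorial

# Example 5.2.11: the 2-permutations of {a, b, c}.
perms = list(permutations(['a', 'b', 'c'], 2))
print(perms)                                            # six ordered pairs
print(len(perms) == factorial(3) // factorial(3 - 2))   # True: 3!/1! = 6
```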
Proposition 5.2.12. The number of k-permutations of n elements is n!/(n − k)!.

Proof. The proof of this just modifies the argument given in the above proposition.

Notation The number n!/(n − k)! is sometimes written nPk. This can be read:
the number of ways of ranking k elements chosen from n objects.
Example 5.2.13. For example, let’s list the 2-permutations from the set
{1, 2, 3, 4}. We obtain
(1, 2), (1, 3), (1, 4)
and
(2, 1), (2, 3), (2, 4)
and
(3, 1), (3, 2), (3, 4)
and
(4, 1), (4, 2), (4, 3).
On the other hand, our formula above tells us that we should have 4!/2! = 12,
which checks out.

5.2.5 Counting choices: combinations
Let A be a set where |A| = n. A subset of A with k elements is called a
k-subset. It is also often called a combination of k objects.
Example 5.2.14. Let A = {a, b, c, d}. Let’s find the 2-subsets of A. These
are just {a, b}, {a, c}, {a, d} and {b, c}, {b, d}, {c, d}. That is, 6 altogether.
If X has n elements then there will be one 0-subset, one n-subset, and
then various numbers of 1-subsets, 2-subsets, . . . , and (n − 1)-subsets. Denote
the number of k-subsets of an n-element set by

\binom{n}{k},

pronounced ‘n choose k’.
Notation The number \binom{n}{k} is sometimes written nCk and is read: the number
of ways of choosing k objects from n objects.
Example 5.2.15. Let X = {a, b, c}. There is one 0-subset namely ∅ and
one 3-subset namely X. The 1-subsets are: {a}, {b}, {c} and so there are
three of them. The 2-subsets are: {a, b}, {a, c}, {b, c} and so there are three
of them. Observe that 1 + 3 + 3 + 1 = 8 = 2^3.
Proposition 5.2.16. Let 0 ≤ k ≤ n. Then

\binom{n}{k} = n!/(k!(n − k)!).
Proof. Let P be the set of all k-permutations of a set with n elements. Partition
this set by putting two such permutations into the same block if they permute
the same set of k elements. Each block contains k! elements. It follows that
the number of blocks is |P|/k!. However, there is a bijective correspondence
between the set of blocks and the set of k-subsets. The result now follows.
Example 5.2.17. Let's now use this formula to calculate the number of
2-subsets of a 4-element set. This is just

\binom{4}{2} = 6,

which is what we found by explicitly finding them.
Numbers of the form \binom{n}{k} are called binomial coefficients.
Remark When calculating \binom{n}{k}, remember that in general a lot of
cancellation occurs. For example,

\binom{100}{98} = 100!/(98! 2!) = (100 · 99 · 98!)/(98! · 2) = 50 · 99 = 4,950.

It would be silly to actually calculate 100! first.
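Python's `math.comb` performs this computation (handling the cancellation internally), so the remark can be checked:

```python
from math import comb, factorial

# The cancellation in the Remark, checked three ways: 100 choose 98.
print(comb(100, 98))                                       # 4950
print(factorial(100) // (factorial(98) * factorial(2)))    # 4950, the long way
print(50 * 99)                                             # 4950 again
```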
Example 5.2.18. Direct calculation shows that

\binom{n}{k} = \binom{n}{n − k}
but we can explain why this is true in terms of counting subsets of a set. Every
time I choose a subset with k elements I am simultaneously not choosing a
subset with n − k elements. There is therefore a one-to-one correspondence
between the subsets with k elements and the subsets with n − k elements. It
follows by the Correspondence Principle that there must be the same number
of k-subsets as there are n − k-subsets.
5.2.6 Examples of counting
In questions involving counting, ask the following questions and then use the
formulae indicated. The number of objects we make our choice from is n,
and the number of objects being chosen in some manner is k.
Order matters?   Repetition allowed?   Terminology     Number
yes              yes                   sequences       n^k
yes              no                    permutations    n!/(n − k)!
no               no                    combinations    \binom{n}{k} = n!/(k!(n − k)!)
no               yes                   not discussed   –
Examples 5.2.19.
(i) In the lottery, 6 distinct numbers are chosen from the range 1 to 49. How
many ways can this be done? Order is not important and repetitions
are not allowed and so the solution is

\binom{49}{6} = 13,983,816.
(ii) There are 10 contestants in a race. Assuming no ties, how many possible
outcomes of the race are there? Here order matters and repetition is not
allowed. Thus the solution is 10! = 3, 628, 800. (Remember: 0! = 1).
(iii) A committee of 9 people has to elect a chairman, secretary and treasurer
(assumed all different). In how many ways can this be done? There is
an implicit order here: we are not just electing 3 people, we are electing
3 people to specific offices (which we could call ‘office 1’, ‘office 2’ and
‘office 3’). Thus order matters but repetition is not allowed and so the
solution is 9 × 8 × 7 = 504 ways.
(iv) Given the digits 1, 2, 3, 4, 5, how many 4-digit numbers can be formed
if repetition is allowed? We are just counting sequences and so the
solution is 5^4 = 625.
(v) The average novel has 250 pages, each page has 45 lines, and each line
consists of about 60 symbols. The symbols are upper and lower case
letters and punctuation symbols: say about 60 in total. How many
possible novels are there? We allow avant garde novels that consist of
nonsense words or are blank. Think of a novel as a sequence of symbols:
it will be
250 × 45 × 60 = 675, 000
symbols long. But each symbol can be one of 60 possibilities and so
the number of possible novels is

60^675000.

It's more convenient to write this as a power of 10 and we get, approximately,

10^(10^6)

possible novels. For comparison purposes, the number of atoms in the
universe is estimated to be 10^80.
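All the counts in Examples 5.2.19 can be confirmed with the Python standard library (assuming Python 3.8+, where `math.comb` and `math.perm` exist):

```python
from math import comb, factorial, perm

# Checking the counts in Examples 5.2.19.
print(comb(49, 6))        # (i)   13983816 lottery draws
print(factorial(10))      # (ii)  3628800 race outcomes
print(perm(9, 3))         # (iii) 504 ways to fill the three offices
print(5 ** 4)             # (iv)  625 four-digit strings over {1,...,5}
```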
Exercises 5.2
1. (i) A menu consists of 2 starters, 3 main courses and 4 drinks. How
many possible dinners are there consisting of one starter, one main
course and one drink? Explain your answer using products of sets.
(ii) For the purposes of this question, a date consists of an ordered triple
consisting of the following three components: first component a
natural number d in the range 1 ≤ d ≤ 31; second component a
natural number m in the range 1 ≤ m ≤ 12; third component a
natural number y in the range 1 ≤ y ≤ 3000. How many possible
dates are there?
(iii) In how many ways can 10 books be arranged on a shelf?
(iv) 8 cars are to be ranked first, second and third. In how many ways
can this be done?
(v) In how many ways can a hand of 13 cards be chosen from a pack
of 52 cards?
(vi) In how many ways can a committee of 4 people be chosen from 10
candidates?
2. Let A and B be any finite sets. Prove that

|A ∪ B| = |A| + |B| − |A ∩ B|.

3. Prove, using results about sets, that for n ≥ r ≥ 1, we have

\binom{n+1}{r} = \binom{n}{r−1} + \binom{n}{r}.
5.3 The binomial theorem
The goal of this section is to prove an important result in algebra using
what we have learnt about counting. We know how to calculate x^n, where
x is called a monomial. In this section, we shall describe how to calculate
(x + y)^n, where n is any natural number, in terms of powers of x and y. The
expression x + y is called a binomial since it consists of two terms. Let's look
at how this expression expands for n = 0, 1, 2, 3, 4. We have that

(x + y)^0 = 1
(x + y)^1 = 1x + 1y
(x + y)^2 = 1x^2 + 2xy + 1y^2
(x + y)^3 = 1x^3 + 3x^2 y + 3xy^2 + 1y^3
(x + y)^4 = 1x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + 1y^4
I have highlighted the coefficients that arise, including putting in unity.
These coefficients form what is known as Pascal’s triangle. Observe that
each row can be obtained from the preceding row as follows: apart from the
1’s at each end, each entry in row i + 1 is the sum of two entries in row i,
specifically the two numbers above to the left and right. We shall explain
why this works later. Let's look at the last row I have written. The numbers

1, 4, 6, 4, 1
are precisely the numbers

\binom{4}{0}, \binom{4}{1}, \binom{4}{2}, \binom{4}{3}, \binom{4}{4}.

We may therefore write

(x + y)^4 = \sum_{i=0}^{4} \binom{4}{i} x^{4−i} y^i.
The following theorem says that this result is true for any n not just for
n = 4.
Theorem 5.3.1 (The Binomial Theorem). For any natural number n, we
have that

(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n−i} y^i.
Proof. This is often proved by induction, but I want to give a more conceptual
proof. I shall also look at a special case to explain the idea. Let’s calculate
(x + y)(x + y)(x + y)
in great detail. Multiplying out the brackets, but before we carry out any
simplifications, we get
(x + y)(x + y)(x + y) = xxx + yxx + xyx + xxy + xyy + yxy + yyx + yyy.
There are 8 summands⁶ here and each summand is a sequence of x’s and y’s
of length 3. When we simplify, all summands containing the same number of
x’s are collected together. How many summands are there containing i x’s?
Clearly

\binom{n}{i}.

All summands containing i x’s can be simplified to look like

x^{n−i} y^i.

The result now follows by generalising this argument.
The result now follows by generalising this argument.
⁶ A summand is something being added in a sum.
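The multiplying-out argument can be simulated in Python: represent (x + y)^n by a dict sending the power of x in a summand to the number of summands with that power, multiply by (x + y) n times, and compare with the binomial coefficients. (The function name `expand_power` is mine, not the text's.)

```python
from math import comb

# Expand (x + y)^n by repeated multiplication and collect coefficients
# by the power of x (the power of y is then n minus it).
def expand_power(n):
    coeffs = {0: 1}                      # (x + y)^0 = 1
    for _ in range(n):                   # multiply by (x + y) once per step
        new = {}
        for i, c in coeffs.items():
            new[i + 1] = new.get(i + 1, 0) + c   # the summands picking x
            new[i] = new.get(i, 0) + c           # the summands picking y
        coeffs = new
    return coeffs

n = 7
assert expand_power(n) == {i: comb(n, i) for i in range(n + 1)}
print("coefficients match the binomial theorem for n =", n)
```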
Thus the numbers in Pascal’s triangle are just the binomial coefficients.
The explanation for the rule used in calculating successive rows of Pascal’s
triangle follows from the following lemma. The proof is left as an exercise in
algebraic manipulation.
Lemma 5.3.2. Let n ≥ r ≥ 1. Then

\binom{n+1}{r} = \binom{n}{r−1} + \binom{n}{r}.
One important application of the binomial theorem, which plays a role in
calculus, is in estimating (x + h)n when h is small. I shall illustrate this by
means of an example.
Example 5.3.3. Let x be a real number and let h be small, meaning 0 <
h < 1 with the idea that h is a lot smaller than 1. Let's calculate (x + h)^4.
By the Binomial Theorem we have that

(x + h)^4 = x^4 + 4x^3 h + 6x^2 h^2 + 4xh^3 + h^4.

Now, if h is much smaller than 1 then each of h^2, h^3, h^4 will be very much
smaller than 1. For example, if h = 0.2 then h^2 = 0.04 and h^3 = 0.008
and so on. We may therefore write

(x + h)^4 ≈ x^4 + 4x^3 h,

where the symbol ≈ means ‘approximately equal to’, when h is small.
Let's see how good this approximation is by calculating a specific example.
Calculate (2.00321)^4 approximately. By our argument above

(2.00321)^4 ≈ 2^4 + 4 × 2^3 × 0.00321 = 16.10272.

Calculating (2.00321)^4 exactly, we get

(2.00321)^4 = 16.10296756 (to 8 decimal places).
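A quick numerical check of the approximation in Example 5.3.3:

```python
# (x + h)^4 is close to x^4 + 4x^3 h when h is small: the error is of
# order h^2, here about 6 * x^2 * h^2.
x, h = 2, 0.00321
exact = (x + h) ** 4
approx = x ** 4 + 4 * x ** 3 * h
print(exact)                          # about 16.10297
print(approx)                         # about 16.10272
print(abs(exact - approx) < 1e-3)     # True
```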
The argument above is used to calculate the derivative of the function
x ↦ x^n when n is a positive integer.
Remark Remember to use ≈ and not = when approximating numbers.
Experience has shown that students often have problems with the binomial theorem. Here are some points to bear in mind:
• Unless the power you have to calculate is small, the binomial theorem
should always be used and not Pascal’s triangle.
• Always write down the theorem so you have something to work with:

(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n−i}.

Observe that there is a plus sign between the two terms in the brackets.

• \binom{n}{i} = \binom{n}{n−i}.
• What you call x and what you call y doesn’t matter.
Example 5.3.4. Calculate the constant term of

(3x^2 − 1/(2x))^9.

Observe first that the term in the brackets is 3x^2 + (−1/(2x)); thus X = 3x^2
and Y = −1/(2x). We can now expand using the binomial theorem and simplify
using the properties of exponents:

(3x^2 − 1/(2x))^9 = (3x^2 + (−1/(2x)))^9                                   (5.1)
                  = (X + Y)^9                                              (5.2)
                  = \sum_{i=0}^{9} \binom{9}{i} X^{9−i} Y^i                (5.3)
                  = \sum_{i=0}^{9} \binom{9}{i} (3x^2)^{9−i} (−1/(2x))^i   (5.4)
                  = \sum_{i=0}^{9} \binom{9}{i} (3x^2)^{9−i} (−2x)^{−i}    (5.5)
                  = \sum_{i=0}^{9} \binom{9}{i} (3x^2)^{9−i} (−2)^{−i} x^{−i}        (5.6)
                  = \sum_{i=0}^{9} \binom{9}{i} 3^{9−i} x^{18−2i} (−2)^{−i} x^{−i}   (5.7)
                  = \sum_{i=0}^{9} \binom{9}{i} 3^{9−i} (−2)^{−i} x^{18−2i} x^{−i}   (5.8)
                  = \sum_{i=0}^{9} \binom{9}{i} 3^{9−i} (−2)^{−i} x^{18−3i}          (5.9)
                  = \sum_{i=0}^{9} \binom{9}{i} (3^{9−i}/(−2)^i) x^{18−3i}.          (5.10)
Commentary

(5.1) The binomial theorem only applies to sums so the first step is to write
the difference as a sum.

(5.2) This step is not strictly necessary but you might find it helpful when
learning the Binomial Theorem.

(5.3) This is nothing other than the Binomial Theorem applied to (X + Y)^9.

(5.4) Now replace X by 3x^2 and Y by −1/(2x). It is important to observe that
there are brackets around these expressions and that the whole bracket
is raised to a power.

(5.5) Observe that −1/(2x) is the same as (−2x)^{−1}. We also use the fact that
(a^α)^β = a^{αβ}.

(5.6) This is where one of the commonest student errors creeps in. Observe
that (ab)^α = a^α b^α. It is a very common mistake to raise only one of
the terms in the brackets to the power.

(5.7) Here I have simply applied the rule (ab)^α = a^α b^α again.

(5.8) I have just rearranged and placed all the numbers together; their product
is the coefficient.

(5.9) I have used the result that a^α a^β = a^{α+β}.

(5.10) Not strictly necessary but I thought it looked better.
Once we have carried out the above computation we can find any coefficient
we want:

• The coefficient of x^{18−3i} is

\binom{9}{i} 3^{9−i}/(−2)^i.

• The constant term occurs when 18 − 3i = 0 and so i = 6. Thus the
constant term is

\binom{9}{6} 3^3/(−2)^6 = \binom{9}{6} 3^3/2^6 = 567/16.
Exercises 5.3
1. Write out (1 + x)^8 using sigma-notation.

2. Write out (1 − x)^8 using sigma-notation.
3. Calculate the coefficient of a^2 b^8 in (a + b)^10.

4. Calculate the coefficient of x^3 in (3 + 4x)^6.
5. Use the binomial theorem to prove the following.

(i) 2^n = \sum_{i=0}^{n} \binom{n}{i}.

(ii) 0 = \sum_{i=0}^{n} (−1)^i \binom{n}{i}.

(iii) (3/2)^n = \sum_{i=0}^{n} (1/2^i) \binom{n}{i}.

6. Prove, using results about sets, that

2^n = \sum_{i=0}^{n} \binom{n}{i}.
7. Use the binomial theorem to prove that \binom{2n}{n} = \sum_{i=0}^{n} \binom{n}{i}^2.
[Hint: calculate (x + y)^{2n} in two different ways.]

5.4 *An introduction to infinite numbers*
This section will not be examined in 2013.
We begin with a result important in the history of set theory. It is called
Russell’s Paradox.7 It is the first inkling that the intuitively plausible idea
of a set may contain hidden depths.
Theorem 5.4.1. The collection of all sets that do not contain themselves as
an element is not a set.
⁷ Bertrand Russell was an Anglo-Welsh philosopher born in 1872, when Queen Victoria
still had another thirty years on the throne as ‘Queen empress’, and who died in 1970 a few
months after Neil Armstrong stepped onto the moon. As a young man he made important
contributions to the foundations of mathematics but in the course of his extraordinary life
he found time to stand for parliament, encouraged the philosopher Ludwig Wittgenstein,
received two prison sentences, won the Nobel prize for literature, was the first president
of CND, and campaigned against the Vietnam war. T. S. Eliot even wrote a poem about
him. Born into an aristocratic family, albeit a startlingly progressive one, he was later
an earl entitled to sit in the House of Lords. See Russell: a very short introduction by
A. C. Grayling published by OUP, 2002, for a very short introduction.
Proof. Define
R = {x : x is a set and x ∉ x}.
Suppose that R were a set. There are now two possibilities: either R ∈ R
or R ∉ R. Suppose first that R ∈ R. Then R must satisfy the condition to
be an element of R, which is that R ∉ R, a contradiction. Suppose then that
R ∉ R. Then R does not satisfy the condition to be an element of R and so
in fact R ∈ R. Both possibilities lead to contradictions. The source of
the problem lies in our assumption that R is a set, and so it is not one.
It follows that the definition of set we gave earlier is really deficient. I'm
not going to discuss the ramifications of this result here; instead I shall leave
it hanging as a warning, or invitation, to the curious.
Earlier in this chapter, I introduced the correspondence principle that
essentially defined what it means for two sets to have the same cardinality. I
only explored this notion for finite sets. I shall now show that in fact it leads
to an interesting theory for arbitrary sets.
A bijective correspondence between two sets A and B is defined as follows:
each element of A is paired off with exactly one element of B in such a
way that different elements of A are paired off with different elements of
B, and every element of B is paired off with something in A. We say that
the sets A and B are equinumerous, denoted A ≅ B, if there is a bijective
correspondence between A and B.
If A = ∅ define |A| = 0. If A ≅ {1, 2, . . . , n} define |A| = n. If A ≅ N
define |A| = ℵ₀; this number is called aleph nought. Such a set is said to
be countably infinite. If A ≅ R define |A| = c; this number is called the
cardinality of the continuum.
Theorem 5.4.2.
1. |E| = |O| = |N|.
2. |Z| = |N|.
3. |Q| = |N|.
4. |R| ≠ |N| and so ℵ₀ ≠ c.
Proof. (1) To show that N ≅ E, we use the correspondence (function)
n ↦ 2n. To show that N ≅ O, we use the function n ↦ 2n + 1.
(2) List the elements of Z in the following way:
0, −1, 1, −2, 2, −3, 3, . . .
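This listing can be turned into an explicit formula for the bijection; a minimal Python sketch (the function name `zigzag` is my own):

```python
def zigzag(k):
    """Return the k-th element (k = 0, 1, 2, ...) of the listing
    0, -1, 1, -2, 2, -3, 3, ... of the integers Z."""
    if k == 0:
        return 0
    # Odd positions yield the negative integers, even positions the positive.
    return -(k + 1) // 2 if k % 2 == 1 else k // 2

print([zigzag(k) for k in range(7)])  # -> [0, -1, 1, -2, 2, -3, 3]
```

Every integer appears exactly once, which is what makes this a bijective correspondence between N and Z.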
(3) One way is to split Q into two disjoint sets and show that each of
these sets is equinumerous with O and E, respectively. I shall do part of this
and leave the rest to you. Let Q+ be the set of all positive rationals. Set up
an array with columns and rows labelled by the non-zero natural numbers.
Interpret the entry in row m and column n as the rational number n/m. Now
count the resulting fractions by counting along the diagonals (going up from
left to right) omitting repetitions.
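The diagonal count can be simulated mechanically; a Python sketch (the function name is mine), using Fraction so that repetitions such as 2/2 = 1/1 are detected and skipped:

```python
from fractions import Fraction

def count_positive_rationals(limit):
    """List the first `limit` positive rationals produced by walking the
    diagonals of the array whose (m, n) entry is n/m, skipping repeats."""
    seen, out = set(), []
    d = 2  # the diagonal on which m + n = d
    while len(out) < limit:
        for m in range(d - 1, 0, -1):  # going up from left to right
            q = Fraction(d - m, m)     # n = d - m
            if q not in seen:
                seen.add(q)
                out.append(q)
                if len(out) == limit:
                    break
        d += 1
    return out

print(count_positive_rationals(7))
# first terms: 1, 1/2, 2, 1/3, 3, 1/4, 2/3
```

Every positive rational appears somewhere in the array, so this walk eventually reaches each one exactly once.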
(4) This is one of the great results of mathematics, proved using the Cantor
diagonalization argument. To make my argument a tad more natural I shall
use the fact, which is easily proved, that N ≅ N*, where the latter set is the set
of positive natural numbers. Assume that there is a bijective correspondence
between N and R. Then we may list the reals as r₁, r₂, r₃, . . .. Each real
number can be expressed as an infinite decimal: rᵢ = aᵢ · aᵢ₁aᵢ₂aᵢ₃ . . .. Define
a real number R as follows:
R = 0 · R₁R₂R₃ . . .
where Rᵢ is equal to 0 if aᵢᵢ is odd, and is equal to 1 otherwise. Observe
that R ≠ rᵢ for all i by construction, contradicting our assumption that
N ≅ R.
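The digit rule in this proof can be carried out explicitly on any finite initial segment of such a list; a Python sketch (the sample digit strings are arbitrary):

```python
def diagonal_number(digit_rows):
    """Given a list of decimal-digit strings (the i-th string standing for
    the fractional digits of r_i), build the digits R_1 R_2 R_3 ... where
    R_i is 0 if the i-th digit of r_i is odd, and 1 otherwise."""
    digits = []
    for i, row in enumerate(digit_rows):
        a_ii = int(row[i])  # the diagonal digit of the i-th number
        digits.append('0' if a_ii % 2 == 1 else '1')
    return '0.' + ''.join(digits)

rows = ["141592", "718281", "414213", "302585", "577215", "693147"]
R = diagonal_number(rows)
print(R)  # differs from the i-th row in its i-th digit
```

By construction the result disagrees with every listed number in at least one decimal place, which is exactly what the proof exploits.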
At this point, we begin to reach the limits of what is known. The continuum hypothesis (CH) is the assertion that every infinite subset of the
reals is either countably infinite or has the cardinality of the continuum. No
one knows whether (CH) is true or false (but that's not the half of it).
5.5 *Proving things about sets*
This section will not be examined in 2013.
Let me begin by listing the most important properties of the set
operations that we defined at the start of this chapter. Let A, B and C be
any sets.
1. A ∩ (B ∩ C) = (A ∩ B) ∩ C. Intersection is associative.
2. A ∩ B = B ∩ A. Intersection is commutative.
3. A ∩ ∅ = ∅ = ∅ ∩ A.
4. A ∪ (B ∪ C) = (A ∪ B) ∪ C. Union is associative.
5. A ∪ B = B ∪ A. Union is commutative.
6. A ∪ ∅ = A = ∅ ∪ A. The empty set is the identity for union.
7. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). Intersection distributes over union.
8. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). Union distributes over intersection.
9. A \ (B ∪ C) = (A \ B) ∩ (A \ C).
10. A \ (B ∩ C) = (A \ B) ∪ (A \ C).
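These identities can be spot-checked on concrete sets using Python's built-in set operations (& for ∩, | for ∪, - for \); a minimal sketch with arbitrary sample sets, which, like a Venn diagram, illustrates rather than proves the identities:

```python
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}

assert A & (B & C) == (A & B) & C          # intersection is associative
assert A & B == B & A                      # intersection is commutative
assert A & set() == set() == set() & A
assert A | (B | C) == (A | B) | C          # union is associative
assert A | B == B | A                      # union is commutative
assert A | set() == A == set() | A         # empty set is the identity for union
assert A & (B | C) == (A & B) | (A & C)    # intersection distributes over union
assert A | (B & C) == (A | B) & (A | C)    # union distributes over intersection
assert A - (B | C) == (A - B) & (A - C)    # De Morgan-style law
assert A - (B & C) == (A - B) | (A - C)    # De Morgan-style law
print("all ten identities hold for these sets")
```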
It’s possible to illustrate these results by drawing Venn diagrams: for
each result, draw a Venn diagram of the left-hand side and, separately, a Venn
diagram of the right-hand side, and observe that you get the same diagrams.
This is very handy if you want to check a result but doesn’t really constitute
a proof. So how should we prove these results? First, to prove that X = Y
prove (1) that X ⊆ Y and (2) that Y ⊆ X. But to prove the above results
we need more. The operations of intersection, union and set difference were
defined using the words and, or and not. Thus to prove any results that
use these words we must have clear definitions of what we actually mean by
them. This requires setting up the basics of propositional logic. Once this is
done, it is then a simple matter to prove the above results precisely.
5.6 Learning outcomes for Chapter 5
• Manipulate sets.
• Answer simple counting questions including those involving permutations and combinations.
• Be able to apply the binomial theorem.
5.7 Further reading and exercises
This chapter is really a prelude to probability theory, and any mysteries will
fall quickly into place once you start studying that subject. Chapter 3 of
Hammack covers some of the same material, as does Chapter 7 of Hirst
and Singerman.
Afterword
The development of algebra can be viewed as the development of our understanding of the concept of number. The introduction of complex numbers in
the sixteenth century burnt like a slow fuse through the following centuries,
finally exploding in the nineteenth with the birth of many new algebraic
systems. This began with the observation that complex numbers have not
only an algebraic side but also a geometric one: they are two-dimensional
objects. Hamilton wondered if there were three-dimensional analogues of
complex numbers. After years of trying, he finally succeeded, but found not
three-dimensional but four-dimensional analogues, called quaternions. These
enjoyed an early vogue, but applied mathematicians found them less convenient to work with. Gibbs stripped the quaternions down and rebuilt them
as three-dimensional vectors equipped with scalar and vector products. Vectors form the basis of vector analysis and so provide the first language we meet for
dealing with, say, Maxwell's equations. When matrices were introduced, it
was realized that they could be used to represent both complex numbers and
quaternions as sets of matrices with real entries. This is the beginning of
both linear algebra and of the theory of (finite-dimensional) algebras.
Bibliography
[1] J. W. Archbold, Algebra, Fourth Edition, Pitman Paperbacks, 1970.
[2] G. Birkhoff and S. Mac Lane, A survey of modern algebra, Third Edition, The Macmillan Company, 1965.
[3] C. B. Boyer, U. Merzbach, History of mathematics, Second Edition, John Wiley and Sons, 1989.
[4] L. N. Childs, A concrete introduction to higher algebra, Second Edition, Springer, 1995.
[5] G. Chrystal, Introduction to algebra, Adam and Charles Black, London, 1902.
[6] G. Cornell, J. H. Silverman, G. Stevens, Modular forms and Fermat's last theorem, Springer, 2000.
[7] R. Courant, Differential and integral calculus, volume 1, Blackie and Son Limited, 1945.
[8] R. Courant and H. Robbins, What is mathematics?, OUP, 1978.
[9] H.-D. Ebbinghaus et al, Zahlen, Springer-Verlag, 1988.
[10] G. H. Hardy, A course of pure mathematics, Tenth Edition, CUP, 1967.
[11] J. L. Heilbron, Geometry civilized, Clarendon Press, Oxford, 2000.
[12] http://www-history.mcs.st-and.ac.uk/
[13] O. Ore, Number theory and its history, Dover, 1948.
[14] A. J. Pettofrezzo, Vectors and their applications, Prentice-Hall, Inc., 1966.
[15] E. Robson, Words and Pictures: New Light on Plimpton 322, American Mathematical Monthly 109 (2002), 105–119.
[16] L. E. Sigler, Fibonacci’s Liber Abaci, Springer, 2003.
[17] J. Stillwell, Elements of algebra, Springer, 1994.
[18] C. J. Tranter, Advanced level pure mathematics, Fourth Edition, Hodder and Stoughton, 1978.
[19] J. V. Uspensky, Theory of equations, McGraw-Hill, 1948.
[20] B. L. Van der Waerden, Algebra: erster Teil, Springer-Verlag, 1966.