Benjamin McKay
Concrete Algebra
With a View Toward Abstract Algebra
March 20, 2017
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Preface
With my full philosophical rucksack I can only climb slowly up the mountain of mathematics.
— Ludwig Wittgenstein
Culture and Value
These notes are from lectures given in 2015 at University College Cork. They aim
to explain the most concrete and fundamental aspects of algebra, in particular the
algebra of the integers and of polynomial functions of a single variable, grounded
by proofs using mathematical induction. It is impossible to learn mathematics by
reading a book like you would read a novel; you have to work through exercises and
calculate out examples. You should try all of the problems. More importantly, since
the purpose of this class is to give you a deeper feeling for elementary mathematics,
rather than rushing into advanced mathematics, you should reflect about how the
following simple ideas reshape your vision of algebra. Consider how you can use
your new perspective on elementary mathematics to help you some day guide other
students, especially children, with surer footing than the teachers who guided you.
The temperature of Heaven can be rather accurately computed.
Our authority is Isaiah 30:26, “Moreover, the light of the Moon
shall be as the light of the Sun and the light of the Sun shall
be sevenfold, as the light of seven days.” Thus Heaven receives
from the Moon as much radiation as we do from the Sun, and in
addition 7 × 7 = 49 times as much as the Earth does from the
Sun, or 50 times in all. The light we receive from the Moon is
one ten-thousandth of the light we receive from the Sun, so we can ignore
that. . . . The radiation falling on Heaven will heat it to the point
where the heat lost by radiation is just equal to the heat received by
radiation, i.e., Heaven loses 50 times as much heat as the Earth by
radiation. Using the Stefan-Boltzmann law for radiation, (H/E)⁴ = 50, where E is the
absolute temperature of the Earth (∼ 300 K), gives H as 798 K (525 ◦C).
The exact temperature of Hell cannot be computed. . . . [However]
Revelations 21:8 says “But the fearful, and unbelieving . . . shall
have their part in the lake which burneth with fire and brimstone.”
A lake of molten brimstone means that its temperature must be at
or below the boiling point, 444.6 ◦C. We have, then, that Heaven,
at 525 ◦C is hotter than Hell at 445 ◦C.
— Applied Optics , vol. 11, A14, 1972
In these days the angel of topology and the devil of abstract algebra
fight for the soul of every individual discipline of mathematics.
— Hermann Weyl
Invariants, Duke Mathematical Journal 5, 1939, 489–
502
— and so who are you, after all?
— I am part of the power which forever wills evil and forever works
good.
— Goethe
Faust
This Book is not to be doubted.
— Quran , 2:1/2:6-2:10 The Cow
Contents
1 The integers
2 Mathematical induction
3 Greatest common divisors
4 Prime numbers
5 Modular arithmetic
6 Secret messages
7 Rational, real and complex numbers
8 Polynomials
9 Factoring polynomials
10 Resultants and discriminants
11 Groups
12 Permuting roots
13 Fields
14 Rings
15 Algebraic curves in the plane
16 Where plane curves intersect
17 Quotient rings
18 Field extensions and algebraic curves
19 The projective plane
20 Algebraic curves in the projective plane
21 Families of plane curves
22 The tangent line
23 Projective duality
24 Polynomial equations have solutions
25 More projective planes
Bibliography
List of notation
Index
Chapter 1
The integers
God made the integers; all else is the work of man.
— Leopold Kronecker
Notation
We will write numbers using notation like 1 234 567.123 45, using a decimal point . at
the last integer digit, and using thin spaces to separate out every 3 digits before or
after the decimal point. You might prefer 1,234,567·123,45 or 1,234,567.123,45, which
are also fine. We reserve the · symbol for multiplication, writing 2 · 3 = 6 rather than
2 × 3 = 6.
The laws of integer arithmetic
The integers are the numbers . . . , −2, −1, 0, 1, 2, . . .. Let us distill their essential
properties, using only the concepts of addition and multiplication.
Addition laws:
a. The associative law: For any integers a, b, c: (a + b) + c = a + (b + c).
b. The identity law: There is an integer 0 so that for any integer a, a + 0 = a.
c. The existence of negatives: for any integer a, there is an integer b (denoted
by the symbol −a) so that a + b = 0.
d. The commutative law: For any integers a, b, a + b = b + a.
Multiplication laws:
a. The associative law: For any integers a, b, c: (ab)c = a(bc).
b. The identity law: There is an integer 1 so that for any integer a, a1 = a.
c. The zero divisors law: For any integers a, b, if ab = 0 then a = 0 or b = 0.
d. The commutative law: For any integers a, b, ab = ba.
The distributive law:
a. For any integers a, b, c: a(b + c) = ab + ac.
Sign laws:
An integer a is positive if it lies among the integers 1, 1+1, 1+1+1, . . ., negative
if −a is positive. Of course, we write 1 + 1 as 2, and 1 + 1 + 1 as 3 and so on.
We write a < b to mean that there is a positive number c so that b = a + c,
write a > b to mean b < a, write a ≤ b to mean a < b or a = b, and so on. We
write |a| to mean a, if a ≥ 0, and to mean −a otherwise, and call it the absolute
value of a.
a. The inequality cancellation law for addition: For any integers a, b, c, if
a + c < b + c then a < b.
b. The inequality cancellation law for multiplication: If a < b and if c > 0
then ac < bc.
c. Determinacy of sign: Every integer a has precisely one of the following
properties: a > 0, a < 0, a = 0.
The law of well ordering:
a. Any nonempty collection of positive integers has a least element; that is to
say, an element a so that every element b satisfies a ≤ b.
All of the other arithmetic laws we are familiar with can be derived from these. For
example, the associative law for addition, applied twice, shows that (a + b) + (c + d) =
a + (b + (c + d)), and so on, so that we can add up any finite sequence of integers, in
any order, and get the same result, which we write in this case as a + b + c + d. A
similar story holds for multiplication.
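These laws cannot be established by checking finitely many cases, but a numerical spot check is a useful sanity test. Here is a minimal sketch in Python (rather than sage); the function name and the range of integers tested are our own arbitrary choices, and passing the check is of course not a proof.

```python
def spot_check_laws(lo=-3, hi=3):
    """Spot-check (not prove!) some arithmetic laws on a small range of integers."""
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            for c in range(lo, hi + 1):
                assert (a + b) + c == a + (b + c)    # associative law for addition
                assert (a * b) * c == a * (b * c)    # associative law for multiplication
                assert a * (b + c) == a * b + a * c  # distributive law
    return True
```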
To understand mathematics, you have to solve a large number of problems.
I prayed for twenty years but received no answer until I prayed with my
legs.
— Frederick Douglass, statesman and escaped slave
1.1 For each equation below, what law above justifies it?
a. 7(3 + 1) = 7 · 3 + 7 · 1
b. 4(9 · 2) = (4 · 9)2
c. 2 · 3 = 3 · 2
1.2 Use the laws above to prove that if a, b, c are any integers then (a + b)c = ac + bc.
1.3 Use the laws above to prove that 0 + 0 = 0.
1.4 Use the laws above to prove that 0 · 0 = 0.
1.5 Use the laws above to prove that, for any integer a, a · 0 = 0.
1.6 Use the laws above to prove that (−1)(−1) = 1.
1.7 Use the laws above to prove that the product of any two negative numbers is
positive.
1.8 Our laws ensure that there is an integer 0 so that a + 0 = 0 + a = a for any integer
a. Could there be two different integers, say p and q, so that a + p = p + a = a for any
integer a, and also so that a + q = q + a = a for any integer a? (Roughly speaking,
we are asking if there is more than one integer which can “play the role” of zero.)
1.9 Our laws ensure that there is an integer 1 so that a · 1 = 1 · a = a for any integer
a. Could there be two different integers, say p and q, so that ap = pa = a for any
integer a, and also so that aq = qa = a for any integer a?
Theorem 1.1 (The equality cancellation law for addition). Suppose that a, b and c
are integers and that a + c = b + c. Then a = b.
Proof. By the existence of negatives, there is a number −c so that c + (−c) = 0.
Clearly (a + c) + (−c) = (b + c) + (−c). Apply the associative law for addition:
a + (c + (−c)) = b + (c + (−c)), so a + 0 = b + 0, so a = b.
We haven’t mentioned subtraction yet.
1.10 Suppose that a and b are integers. Prove that there is a unique integer c so that
a = b + c. Of course, from now on we write this integer c as a − b.
1.11 Prove that, for any integers a, b, a − b = a + (−b).
1.12 Prove that subtraction distributes over multiplication: for any integers a, b, c,
a(b − c) = ab − ac.
1.13 Prove that any two integers b and c satisfy just precisely one of the conditions
b > c, b = c, b < c.
Theorem 1.2. The equality cancellation law for multiplication: for any integers a, b, c,
if ab = ac and a ≠ 0 then b = c.
Proof. If ab = ac then ab − ac = ac − ac = 0. But ab − ac = a(b − c) by the distributive
law. Then a(b − c) = 0, so, by the zero divisors law, a = 0 or b − c = 0. Since a ≠ 0,
we find b − c = 0, so b = c.
1.14 We know how to add and multiply 2 × 2 matrices with integer entries. Of the
various laws of addition and multiplication and signs for integers, which hold true also
for such matrices?
Division of integers
Can you do Division? Divide a loaf by a knife—what’s the answer to
that?
— Lewis Carroll
Through the Looking Glass
We haven’t mentioned division yet. Danger: although 2 and 3 are integers,
3/2 = 1.5
is not an integer.
1.15 Suppose that a and b are integers and that b ≠ 0. Prove from the laws above
that there is at most one integer c so that a = bc. Of course, from now on we write
this integer c as a/b.
1.16 Danger: why can’t we divide by zero?
We already noted that 3/2 is not an integer. At the moment, we are trying to
work with integers only. An integer b divides an integer c if c/b is an integer; we also
say that b is a divisor of c.
1.17 Explain why every integer divides into zero.
1.18 Prove that, for any two integers b and c, the following are equivalent:
a. b divides c,
b. −b divides c,
c. b divides −c,
d. −b divides −c,
e. |b| divides |c|.
Proposition 1.3. Take any two integers b and c with c ≠ 0. If b divides c, then |b| < |c|
or b = c or b = −c.
Proof. By the solution of the last problem, clearly we can assume that b and c are
positive. We know that b divides c, say c = qb for some integer q. Since b > 0 and
c > 0, clearly q > 0. If q = 1 then c = b. Otherwise q > 1, say q = n + 1, for some positive
integer n. But then c = qb = (n + 1)b = nb + b > b.
Theorem 1.4 (Euclid). Suppose that b and c are integers and c ≠ 0. Then there
are unique integers q and r (the quotient and remainder) so that b = qc + r and so
that 0 ≤ r < |c|.
Proof. To make our notation a little easier, we can assume (by perhaps changing the
signs of c and q in our statement above) that c > 0.
Note that the integer (1 + |b|)c is a positive multiple of c bigger than b. By the
law of well ordering, since there is a positive multiple of c bigger than b, there is a
smallest one.
Write the smallest positive multiple of c bigger than b as (q + 1)c. So the next
smallest positive multiple of c is not bigger than b: qc ≤ b. So qc ≤ b < (q + 1)c, or
0 ≤ b − qc < c. Let r = b − qc.
We have proven that we can find a quotient q and remainder r. We need to show
that they are unique. Suppose that there is some other choice of integers Q and R
so that b = Qc + R and 0 ≤ R < c. Taking the difference between b = qc + r and
b = Qc + R, we find 0 = (Q − q)c + (R − r). In particular, c divides into R − r.
Switching the labels as to which ones are q, r and which are Q, R, we can assume
that R ≥ r. So then 0 ≤ r ≤ R < c, so R − r < c. By proposition 1.3, since
0 ≤ R − r < c and c divides into R − r, we must have R − r = 0, so r = R. Plug into
0 = (Q − q)c + (R − r) to see that Q = q.
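Theorem 1.4 translates directly into a short program. The sketch below is Python, not sage, and the function name is our own; note that Python’s built-in division floors toward −∞, which produces a negative remainder when c < 0, so we adjust to enforce 0 ≤ r < |c|.

```python
def quotient_remainder(b, c):
    """Return (q, r) with b = q*c + r and 0 <= r < |c|, as in theorem 1.4."""
    q, r = divmod(b, c)   # Python rounds q toward minus infinity
    if r < 0:             # happens exactly when c < 0 and c does not divide b
        q, r = q + 1, r - c
    return q, r
```

For example, quotient_remainder(249, 17) gives (14, 11), and quotient_remainder(279, −11) gives (−25, 4), since 279 = (−25)(−11) + 4.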
How do we actually find quotient and remainder, by hand? Long division:
      14
17 ) 249
     170
      79
      68
      11
So if we start with b = 249 and c = 17, we carry out long division to find the quotient
q = 14 and the remainder r = 11.
1.19 Find the quotient and remainder for b, c equal to:
a. −180, 9
b. −169, 11
c. −982, −11
d. 279, −11
e. 247, −27
The greatest common divisor
A common divisor of some integers is an integer which divides them all.
1.20 Given any collection of one or more integers, not all zero, prove that they have
a greatest common divisor, i.e. a largest positive integer divisor.
Denote the greatest common divisor of integers m₁, m₂, . . . , mₙ as
    gcd {m₁, m₂, . . . , mₙ}.
If the greatest common divisor of two integers is 1, they are coprime. We will also
define gcd {0, 0, . . . , 0} := 0.
Lemma 1.5. Take two integers b, c with c ≠ 0 and compute the quotient q and
remainder r, so that b = qc + r and 0 ≤ r < |c|. Then the greatest common divisor
of b and c is the greatest common divisor of c and r.
Proof. Any divisor of c and r divides the right hand side of the equation b = qc + r, so it
divides the left hand side, and so divides b and c. By the same reasoning, writing the
same equation as r = b − qc, we see that any divisor of b and c divides c and r.
This makes very fast work of finding the greatest common divisor by hand. For example, if b = 249 and c = 17 then we found that q = 14 and r = 11, so gcd {249, 17} =
gcd {17, 11}. Repeat the process: taking 17 and dividing out 11, the quotient and
remainder are 1, 6, so gcd {17, 11} = gcd {11, 6}. Again repeat the process: the
quotient and remainder for 11, 6 are 1, 5, so gcd {11, 6} = gcd {6, 5}. In
more detail, we divide the smaller number (smaller in absolute value) into the
larger:
      14
17 ) 249
     170
      79
      68
      11
Throw out the larger number, 249, and replace it by the remainder, 11, and divide
again:
      1
11 ) 17
     11
      6
Again we throw out the larger number (in absolute value), 17, and replace with the
remainder, 6, and repeat:
     1
6 ) 11
     6
     5
and again:
    1
5 ) 6
    5
    1
and again:
    5
1 ) 5
    5
    0
The remainder is now zero. The greatest common divisor is therefore 1, the final
nonzero remainder: gcd {249, 17} = 1. This method to find the greatest common
divisor is called the Euclidean algorithm.
If the final nonzero number is negative, just change its sign to get the greatest
common divisor. For example,
gcd {−4, −2} = gcd {−2, 0} = gcd {−2} = 2.
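The same procedure, stated as code: this sketch (in Python notation, with a function name of our own) repeatedly replaces the pair by the divisor and the remainder, then flips the sign at the end if needed.

```python
def euclid_gcd(b, c):
    # Replace (b, c) by (c, remainder of b by c) until the remainder is zero;
    # the final nonzero number, made positive, is the greatest common divisor.
    while c != 0:
        b, c = c, b % c
    return abs(b)
```

Tracing euclid_gcd(249, 17) passes through exactly the pairs computed by hand above: (249, 17), (17, 11), (11, 6), (6, 5), (5, 1), (1, 0).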
1.21 Find the greatest common divisor of
a. 4233, 884
b. -191, 78
c. 253, 29
d. 84, 276
e. -92, 876
f. 147, 637
g. 266 664, 877 769 (answer: 11 111)
The least common multiple
The least common multiple of a finite collection of integers m₁, m₂, . . . , mₙ is the
smallest positive number ℓ = lcm {m₁, m₂, . . . , mₙ} so that all of the integers
m₁, m₂, . . . , mₙ divide ℓ.
Lemma 1.6. The least common multiple of any two integers b, c (not both zero) is
    lcm {b, c} = |bc| / gcd {b, c}.
Proof. For simplicity, assume that b > 0 and c > 0; the cases of b ≤ 0 or c ≤ 0 are
treated easily by flipping signs as needed and checking what happens when b = 0
or when c = 0 directly; we leave this to the reader. Let d := gcd {b, c}, and factor
b = Bd and c = Cd. Then B and C are coprime. Write the least common multiple
ℓ of b, c as either ℓ = b₁b or as ℓ = c₁c, since it is a multiple of both b and c. So
then ℓ = b₁Bd = c₁Cd. Cancelling, b₁B = c₁C. So C divides b₁B; since B and C are
coprime, C divides b₁, say b₁ = b₂C, so ℓ = b₁Bd = b₂CBd. So BCd divides ℓ. But
(bc)/d = BCd is itself a common multiple: BCd = (Bd)C = bC is a multiple of b, and
BCd = B(Cd) = Bc is a multiple of c. Since ℓ is the least such multiple, ℓ = BCd = (bc)/d.
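Lemma 1.6 gives a formula we can compute with directly. A Python sketch, using math.gcd from the standard library (the function name lcm2 and the treatment of the all-zero case, following our convention gcd {0, 0} := 0, are our own):

```python
import math

def lcm2(b, c):
    # lcm{b, c} = |bc| / gcd{b, c}, as in lemma 1.6
    if b == 0 and c == 0:
        return 0
    return abs(b * c) // math.gcd(b, c)
```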
Sage
Computers are useless. They can only give you answers.
— Pablo Picasso
These lecture notes include optional sections explaining the use of the sage computer algebra system. At the time these notes were written, instructions to install sage
on a computer are at www.sagemath.org, but you should be able to try out sage on
sagecell.sagemath.org or even create worksheets in sage on cloud.sagemath.com
over the internet without installing anything on your computer. If we type
gcd(1200,1040)
and press shift–enter, we see the greatest common divisor: 80. Similarly type
lcm(1200,1040)
and press shift–enter, we see the least common multiple: 15600. Sage can carry out
complicated arithmetic operations. The multiplication symbol is *:
12345*13579
gives 167632755. Sage uses the expression 2^3 to mean 2³. You can invent variables
in sage by just typing in names for them. For example, ending each line with enter,
except the last which you end with shift–enter (when you want sage to compute
results):
x=4
y=7
x*y
to print 28. The expression 15 % 4 means the remainder of 15 divided by 4, while
15//4 means the quotient of 15 divided by 4. We can calculate the greatest common
divisor of several numbers as
gcd([3800,7600,1900])
giving us 1900.
Chapter 2
Mathematical induction
It is sometimes required to prove a theorem which shall be true
whenever a certain quantity n which it involves shall be an integer
or whole number and the method of proof is usually of the following
kind. 1st. The theorem is proved to be true when n = 1. 2ndly.
It is proved that if the theorem is true when n is a given whole
number, it will be true if n is the next greater integer. Hence the
theorem is true universally.
— George Boole
So nat’ralists observe, a flea
Has smaller fleas that on him prey;
And these have smaller fleas to bite ’em.
And so proceeds Ad infinitum.
— Jonathan Swift
On Poetry: A Rhapsody
It is often believed that everyone with red hair must have a red haired ancestor.
But this principle is not true. We have all seen someone with red hair.
According to the principle, she must have a red haired ancestor. But then by
the same principle, he must have a red haired ancestor too, and so on. So the
principle predicts an infinite chain of red haired ancestors. But there have
only been a finite number of creatures (so the geologists tell us). So some
red haired creature had no red haired ancestor.
We will see that

    1 + 2 + 3 + · · · + n = n(n + 1)/2

for any positive integer n. First, let’s check that this is correct for a few values
of n just to be careful. Danger: just to be very careful, we put =? between any
two quantities when we are checking to see if they are equal; we are making
clear that we don’t yet know.
For n = 1, we get 1 =? 1(1 + 1)/2, which is true because the right hand side
is 1(1 + 1)/2 = 2/2 = 1.
For n = 2, we need to check 1 + 2 =? 2(2 + 1)/2. The left hand side of the
equation is 3, while the right hand side is

    2(2 + 1)/2 = 2(3)/2 = 3.

So we checked that they are equal.
For n = 3, we need to check that 1 + 2 + 3 =? 3(3 + 1)/2. The left hand side
is 6, and you can check that the right hand side is 3(4)/2 = 6 too. So we
checked that they are equal.
But this process will just go on for ever and we will never finish checking all
values of n. We want to check all values of n, all at once.
Picture a row of dominoes. If we can make the first domino topple, and we can
make sure that each domino, if it topples, will make the next domino topple, then
they will all topple.
Theorem 2.1 (The Principle of Mathematical Induction). Take a collection of positive integers. Suppose that 1 belongs to this collection. Suppose that, whenever all
positive integers less than a given positive integer n belong to the collection, then so
does the integer n. (For example, if 1, 2, 3, 4, 5 belong, then so does 6, and so on.)
Then the collection consists precisely of all of the positive integers.
Proof. Let S be the set of positive integers not belonging to that collection. If S is
empty, then our collection contains all positive integers, so our theorem is correct.
But what if S is not empty? By the law of well ordering, if S is not empty, then S
has a least element, say n. So n is not in our collection. But being the least integer
not in our collection, all integers 1, 2, . . . , n − 1 less than n are in our collection. By
hypothesis, n is also in our collection, a contradiction to our assumption that S is not
empty.
Let’s prove that 1 + 2 + 3 + · · · + n = n(n + 1)/2 for any positive integer
n. First, note that we have already checked this for n = 1 above. Imagine
that we have checked our equation 1 + 2 + 3 + · · · + n = n(n + 1)/2 for any
positive integer n up to, but not including, some integer n = k. Now we want
to check for n = k whether it is still true: 1 + 2 + 3 + · · · + n =? n(n + 1)/2.
Since we already know this for n = k − 1, we are allowed to write that out,
without question marks:
1 + 2 + 3 + · · · + k − 1 = (k − 1)(k − 1 + 1)/2.
Simplify:
1 + 2 + 3 + · · · + k − 1 = (k − 1)k/2.
Now add k to both sides. The left hand side becomes 1 + 2 + 3 + · · · + k. The
right hand side becomes (k − 1)k/2 + k, which we simplify to

    (k − 1)k/2 + k = (k − 1)k/2 + 2k/2
                   = ((k − 1)k + 2k)/2
                   = (k² − k + 2k)/2
                   = (k² + k)/2
                   = k(k + 1)/2.

So we have found that, as we wanted to prove,

    1 + 2 + 3 + · · · + k = k(k + 1)/2.
Note that we used mathematical induction to prove this: we prove the result
for n = 1, and then suppose that we have proven it already for all values of
n up to some given value, and then show that this will ensure it is still true
for the next value.
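As with the dominoes, a computer can only topple finitely many cases, but checking the formula for many values of n is still reassuring. A quick check in Python (the range of values is our own choice; this is a spot check, not a proof):

```python
# Check 1 + 2 + ... + n == n(n+1)/2 for n = 1, ..., 100.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
```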
The general pattern of induction arguments: we start with a statement we want
to prove, which contains a variable, say n, representing a positive integer.
a. The base case: Prove your statement directly for n = 1.
b. The induction hypothesis: Assume that the statement is true for all positive
integer values less than some value n.
c. The induction step: Prove that it is therefore also true for that value of n.
We can use induction in definitions, not just in proofs. We haven’t made
sense yet of exponents. (Exponents are often called indices in Irish secondary
schools, but nowhere else in the world to my knowledge).
a. The base case: For any integer a ≠ 0 we define a⁰ := 1. Watch: We
can write := instead of = to mean that this is our definition of what
a⁰ means, not an equation we have somehow calculated out. For any
integer a (including the possibility of a = 0) we also define a¹ := a.
b. The induction hypothesis: Suppose that we have defined already what
a¹, a², . . . , aᵇ
means, for some positive integer b.
c. The induction step: We then define aᵇ⁺¹ to mean aᵇ⁺¹ := a · aᵇ.
For example, by writing out

    a⁴ = a · a³
       = a · a · a²
       = a · a · a · a¹
       = a · a · a · a    (4 times).

Another example:

    a³ a² = (a · a · a) (a · a) = a⁵.

In an expression aᵇ, the quantity a is the mantissa or base and b is the
exponent.
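The inductive definition of aᵇ is exactly a recursive function. A Python sketch (the name power is our own):

```python
def power(a, b):
    # Base cases from the inductive definition: a^0 := 1 (for a != 0), a^1 := a.
    if b == 0:
        return 1
    if b == 1:
        return a
    # Induction step: a^(b+1) := a * a^b.
    return a * power(a, b - 1)
```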
2.1 Use induction to prove that aᵇ⁺ᶜ = aᵇ aᶜ for any integer a and for any positive
integers b, c.
2.2 Use induction to prove that (aᵇ)ᶜ = aᵇᶜ for any integer a and for any positive
integers b, c.
Sometimes we start induction at a value of n which is not at n = 1.
Let’s prove, for any integer n ≥ 2, that 2ⁿ⁺¹ < 3ⁿ.
a. The base case: First, we check that this is true for n = 2: 2²⁺¹ <? 3²?
Simplify to see that this is 8 < 9, which is clearly true.
b. The induction hypothesis: Next, suppose that we know this is true for
all values n = 2, 3, 4, . . . , k − 1.
c. The induction step: We need to check it for n = k: 2ᵏ⁺¹ <? 3ᵏ? How
can we relate this to the values n = 2, 3, 4, . . . , k − 1?

    2ᵏ⁺¹ = 2 · 2ᵏ
         < 3 · 3ᵏ⁻¹ by assumption,
         = 3ᵏ.

We conclude that 2ⁿ⁺¹ < 3ⁿ for all integers n ≥ 2.
2.3 All horses are the same colour; we can prove this by induction on the number of
horses in a given set.
a. The base case: If there’s just one horse then it’s the same colour as itself, so
the base case is trivial.
b. The induction hypothesis: Suppose that we have proven that, in any set of at
most k horses, all of the horses are the same colour as one another, for any
number k = 1, 2, . . . , n.
c. The induction step: Assume that there are n horses numbered 1 to n. By the
induction hypothesis, horses 1 through n − 1 are the same colour as one another.
Similarly, horses 2 through n are the same colour. But the middle horses, 2
through n − 1, can’t change colour when they’re in different groups; these are
horses, not chameleons. So horses 1 and n must be the same colour as well.
Thus all n horses are the same colour. What is wrong with this reasoning?
2.4 We could have assumed much less about the integers than the laws we gave in
chapter 1. Using only the laws for addition and induction,
a. Explain how to define multiplication of integers.
b. Use your definition of multiplication to prove the associative law for multiplication.
c. Use your definition of multiplication to prove the equality cancellation law for
multiplication.
2.5 Suppose that x, b are positive integers, with b ≥ 2. Prove that x can be written as

    x = a₀ + a₁b + a₂b² + · · · + aₖbᵏ

for unique integers a₀, a₁, . . . , aₖ with 0 ≤ aᵢ < b. Hint: take quotient and remainder,
and apply induction. (The sequence aₖ, aₖ₋₁, . . . , a₁, a₀ is the sequence of digits of x
in base b notation.)
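The hint — take quotient and remainder, and apply induction — is also how one computes the digits. A Python sketch of the computation only (it does not, of course, prove uniqueness); the name digits is our own:

```python
def digits(x, b):
    """Digits of x in base b, most significant first, for positive x and b >= 2."""
    ds = []
    while x > 0:
        x, r = divmod(x, b)  # peel off the last digit r = x % b, keep the quotient
        ds.append(r)
    ds.reverse()             # we collected digits least significant first
    return ds
```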
2.6 Prove that for every positive integer n,

    1³ + 2³ + · · · + n³ = (n(n + 1)/2)².
Sage
A computer once beat me at chess, but it was no match for me at kick
boxing.
— Emo Philips
To define a function in sage, you type def, to mean define, like:
def f(n):
    return 2*n+1
and press shift–enter. For any input n, it will return a value f (n) = 2n + 1. Careful
about the * for multiplication. Use your function, for example, by typing
f(3)
and press shift–enter to get 7.
A function can depend on several inputs:
def g(x,y):
    return x*y+2
A recursive function is one whose value for some input depends on its value for
other inputs. For example, the sum of the integers from 1 to n is:
def sum_of_integers(n):
    if n<=0:
        return 0
    else:
        return n+sum_of_integers(n-1)
giving us sum_of_integers(5) = 15.
2.7 The Fibonacci sequence is F (1) = 1, F (2) = 1, and F (n) = F (n − 1) + F (n − 2)
for n ≥ 3. Write out a function F(n) in sage to compute the Fibonacci sequence.
Let’s check that 2ⁿ⁺¹ < 3ⁿ for some values of n, using a loop:
for n in [0..10]:
    print(n, 3^n-2^(n+1))
This will try successively plugging in each value of n starting at n = 0 and going up
to n = 1, n = 2, and so on up to n = 10, and print out the value of n and the value
of 3ⁿ − 2ⁿ⁺¹. It prints out:
(0, -1)
(1, -1)
(2, 1)
(3, 11)
(4, 49)
(5, 179)
(6, 601)
(7, 1931)
(8, 6049)
(9, 18659)
(10, 57001)
giving the value of n and the value of the difference 3ⁿ − 2ⁿ⁺¹, which we can readily
see is positive once n ≥ 2. Tricks like this help us to check our induction proofs.
If we write a=a+1 in sage, this means that the new value of the variable a is equal
to one more than the old value that a had previously. Unlike mathematical variables,
sage variables can change value over time. In sage the expression a<>b means a ≠ b.
Although sage knows how to calculate the greatest common divisor of some integers,
we can define our own greatest common divisor function:
def my_gcd(a, b):
    while b<>0:
        a,b=b,a%b
    return a
We could also write a greatest common divisor function that uses repeated subtraction,
avoiding the remainder operation %:
def gcd_by_subtraction(a, b):
    if a<0:
        a=-a
    if b<0:
        b=-b
    if a==0:
        return b
    if b==0:
        return a
    while a<>b:
        if a>b:
            a=a-b
        else:
            b=b-a
    return a
Lists in sage
Sage manipulates lists. A list is a finite sequence of objects written down next to one
another. The list L=[4,4,7,9] is a list called L which consists of four numbers: 4, 4, 7
and 9, in that order. Note that the same entry can appear any number of times; in
this example, the number 4 appears twice. You can think of a list as like a vector in
Rn , a list of numbers. When you “add” lists they concatenate: [6]+[2] yields [6, 2].
Sage uses notation L[0], L[1], and so on, to retrieve the elements of a list, instead of
the usual vector notation. Warning: the element L[1] is not the first element; L[0] is.
The length of a list L is len(L). To create a list [0, 1, 2, 3, 4, 5] of successive integers,
type range(0,6). Note that the number 6 here tells you to go up to, but not include, 6.
So a list L of 6 elements has elements L₀, L₁, . . . , L₅, denoted L[0], L[1], . . . , L[5] in
sage. To retrieve the list of elements of L from L₂ to L₄, type L[2:5]; again strangely
this means up to but not including L₅.
For example, to reverse the elements of a list:
def reverse(L):
    n=len(L)
    if n<=1:
        return L
    else:
        return L[n-1:n]+reverse(L[0:n-1])
We create a list with 10 entries, print it:
L=range(0,10)
print(L)
yielding [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and then print its reversal:
print(reverse(L))
yielding [9, 8, 7, 6, 5, 4, 3, 2, 1, 0].
We can construct a list of elements of the Fibonacci sequence:
def fibonacci_list(n):
    L=[1,1]
    for i in range(2,n):
        L=L+[L[i-1]+L[i-2]]
    return L
so that fibonacci_list(10) yields [1, 1, 2, 3, 5, 8, 13, 21, 34, 55], a list of the first ten
numbers in the Fibonacci sequence. The code works by first creating a list L=[1,1]
with the first two entries in it, and then successively building up the list L by concatenating to it a list with one more element, the sum of the two previous list entries,
for all values i = 2, 3, . . . , n − 1.
Chapter 3
Greatest common divisors
The laws of nature are but the mathematical thoughts of God.
— Euclid
The extended Euclidean algorithm
Theorem 3.1 (Bézout). For any two integers b, c, not both zero, there are integers s, t
so that sb + tc = gcd {b, c}.
These Bézout coefficients s, t play an essential role in advanced arithmetic, as we
will see.
To find the Bézout coefficients of 12, 8, write out a matrix

    1  0  12
    0  1   8

Now repeatedly add some integer multiple of one row to the other, to try to
make the bigger number in the last column (bigger by absolute value) get
smaller (by absolute value). In this case, we can subtract row 2 from row 1,
to get rid of as much of the 12 as possible. All operations are carried out a
whole row at a time:

    1  −1  4
    0   1  8

Now the bigger number (by absolute value) in the last column is the 8. We
subtract as large an integer multiple of row 1 from row 2, in this case subtract
2 (row 1) from row 2, to kill as much of the 8 as possible:

     1  −1  4
    −2   3  0

Once we get a zero in the last column, the other entry in the last column is
the greatest common divisor, and the entries in the same row as the greatest
common divisor are the Bézout coefficients. In our example, the Bézout
coefficients of 12, 8 are 1, −1 and the greatest common divisor is 4. This trick
to calculate Bézout coefficients is the extended Euclidean algorithm.
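The row operations above can be carried out mechanically. A Python sketch (the function name is our own), keeping two rows (s, t, u) and (S, T, U) with the invariant sb + tc = u and Sb + Tc = U, and fixing the sign at the end as described:

```python
def extended_gcd(b, c):
    # Rows of the matrix [s t | u; S T | U]; initially [1 0 | b; 0 1 | c].
    s, t, u = 1, 0, b
    S, T, U = 0, 1, c
    while U != 0:
        q = u // U  # subtract q times the second row from the first, then swap rows
        s, t, u, S, T, U = S, T, U, s - q * S, t - q * T, u - q * U
    if u < 0:       # wrong sign popped out: flip the whole row
        s, t, u = -s, -t, -u
    return s, t, u  # s*b + t*c == u == gcd of b and c
```

For example, extended_gcd(12, 8) returns (1, −1, 4), the Bézout coefficients and greatest common divisor found above.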
Again, we have to be a little bit careful about minus signs. For example, if
we look at −8, −4, we find Bézout coefficients by

    1  0  −8        1  −2   0
    0  1  −4   →    0   1  −4

the process stops here, and where we expect to get an equation sb + tc =
gcd {b, c}, instead we get

    0(−8) + 1(−4) = −4,

which has the wrong sign, a negative greatest common divisor. So if the
answer pops out a negative for the greatest common divisor, we change signs:
s = 0, t = −1, gcd {b, c} = 4.
Theorem 3.2. The extended Euclidean algorithm calculates Bézout coefficients. In
particular, Bézout coefficients s, t exist for any integers b, c with c ≠ 0.
Proof. At each step in the extended Euclidean algorithm, the third column proceeds
exactly by the Euclidean algorithm, replacing the larger number (in absolute value)
with the remainder by the division. So at the last step, the last nonzero number in
the third column is the greatest common divisor.
Our extended Euclidean algorithm starts with

    1  0  b
    0  1  c

a matrix which satisfies

    ( 1  0  b )   (  b )   ( 0 )
    ( 0  1  c ) · (  c ) = ( 0 ).
                  ( −1 )
It is easy to check that if some 2 × 3 matrix M satisfies

        (  b )   ( 0 )
    M · (  c ) = ( 0 ),
        ( −1 )

then so does the matrix you get by adding any multiple of one row of M to the other
row.
Eventually we get a zero in the final column, say

    s  t  d         s  t  0
    S  T  0    or   S  T  D.

It doesn’t matter which since, at any step, we could always swap the two rows without
changing the steps otherwise. So suppose we end up at

    s  t  d
    S  T  0.
But

    ( s  t  d )   (  b )   ( sb + tc − d )
    ( S  T  0 ) · (  c ) = ( Sb + Tc     ).
                  ( −1 )

The first entry has to be zero, so:

    sb + tc = d.
Proposition 3.3. Take two integers b, c, not both zero. The number gcd {b, c} is
precisely the smallest positive integer which can be written as sb + tc for some integers
s, t.
Proof. Let d ..= gcd {b, c}. Since d divides into b, we can write b = Bd for some
integer B. Similarly, c = Cd for some integer C. Imagine that we write down some
positive integer as sb + tc for some integers s, t. Then sb + tc = sBd + tCd = (sB + tC)d
is a multiple of d, so no smaller than d. On the other hand, the extended Euclidean
algorithm writes d itself as sb + tc, so d is the smallest such positive integer.
The greatest common divisor of several numbers
Take several integers, say m1 , m2 , . . . , mn , not all zero. To find their greatest common
divisor, we only have to find the greatest common divisor of the greatest common
divisors. For example,
gcd {m1 , m2 , m3 } = gcd {gcd {m1 , m2 } , m3 }
and
gcd {m1 , m2 , m3 , m4 } = gcd {gcd {m1 , m2 } , gcd {m3 , m4 }}
and so on.
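Using Python’s library functions, the nested-gcd idea above is a single fold (a sketch, not code from the text):

```python
# Sketch: gcd of several integers by repeatedly taking pairwise gcds,
# exactly as in the formulas above.
from functools import reduce
from math import gcd

def gcd_many(numbers):
    return reduce(gcd, numbers)

print(gcd_many([12, 20, 18]))  # → 2
```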
3.1 Use this approach to find gcd {54, 90, 75}.
Bézout coefficients of several numbers
Take integers m1 , m2 , . . . , mn , not all zero. We will see that their greatest common
divisor is the smallest positive integer d so that
d = s1 m1 + s2 m2 + · · · + sn mn
for some integers s1 , s2 , . . . , sn , the Bézout coefficients of m1 , m2 , . . . , mn . To find the
Bézout coefficients, and the greatest common divisor:
a. Write down the matrix:

    1  0  0  . . .  0  m1
    0  1  0  . . .  0  m2
    0  0  1  . . .  0  m3
    .  .  .  . . .  .  ..
    0  0  0  . . .  1  mn
b. Subtract whatever integer multiples of rows from other rows, repeatedly, until
the final column has exactly one nonzero entry.
c. That nonzero entry is the greatest common divisor

    d = gcd {m1 , m2 , . . . , mn } .

d. The entries in the same row, in earlier columns, are the Bézout coefficients, so
the row looks like

    s1  s2  . . .  sn  d
To find Bézout coefficients of 12, 20, 18:

    1  0  0  12
    0  1  0  20
    0  0  1  18

the smallest entry (in absolute value) in the final column is the 12, so we
divide it into the other two entries:
a. add −row 1 to row 2, and
b. −row 1 to row 3:

     1  0  0  12
    −1  1  0   8
    −1  0  1   6

Now the smallest (in absolute value) is the 6, so we add integer multiples of
its row to the two earlier rows:
a. add −2 · row 3 to row 1, and
b. −row 3 to row 2:

     3  0  −2  0
     0  1  −1  2
    −1  0   1  6

Now the 2 is the smallest nonzero entry (in absolute value) in the final column.
Note: once a row in the final column has a zero in it, it becomes irrelevant,
so we could save time by crossing it out and deleting it. That happens here
with the first row. But we will leave it as it is, just to make the steps easier
to follow. Add −3 · row 2 to row 3:

     3   0  −2  0
     0   1  −1  2
    −1  −3   4  0

Just to carefully check that we haven’t made any mistake, you can compute
out that this matrix satisfies:

    (  3   0  −2  0 )   ( 12 )   ( 0 )
    (  0   1  −1  2 ) · ( 20 ) = ( 0 )
    ( −1  −3   4  0 )   ( 18 )   ( 0 )
                        ( −1 )

In particular, our Bézout coefficients are the 0, 1, −1, in front of the greatest
common divisor, 2. Summing up:

    0 · 12 + 1 · 20 + (−1) · 18 = 2.
3.2 Use this approach to find Bézout coefficients of 54, 90, 75.
3.3 Prove that this method works.
3.4 Explain how to find the least common multiple of three integers a, b, c.
Sage
Computer science is no more about computers than astronomy is about
telescopes.
— Edsger Dijkstra
The extended Euclidean algorithm is built in to sage: xgcd(12,8) returns (4, 1, −1),
the greatest common divisor and the Bézout coefficients, so that 4 = 1 · 12 + (−1) · 8.
We can write our own function to compute Bézout coefficients:
def bez(b,c):
    p,q,r = 1,0,b
    s,t,u = 0,1,c
    while u != 0:
        if abs(r) > abs(u):
            p,q,r,s,t,u = s,t,u,p,q,r
        Q = u//r
        s,t,u = s-Q*p, t-Q*q, u-Q*r
    return r,p,q
so that bez(45,210) returns a triple (g, s, t) where g is the greatest commmon divisor
of 45, 210, while s, t are the Bézout coefficients.
Let’s find Bézout coefficients of several numbers. First we take a matrix A of size
n × (n + 1). Just as sage has lists L indexed starting at L[0], so sage indexes matrices
as Aij for i and j starting at zero, i.e. the upper left corner entry of any matrix A is
A[0,0] in sage. We search for a row which has a nonzero entry in the final column:
def row_with_nonzero_entry_in_final_column(A,rows,columns,starting_from_row=0):
    for i in range(starting_from_row,rows):
        if A[i,columns-1] != 0:
            return i
    return rows
Note that we write starting_from_row=0 to give a default value: we can specify a row
to start looking from, but if you call the function row_with_nonzero_entry_in_final_column
without specifying a value for the variable starting_from_row, then that variable
will take the value 0. We use this to find the Bézout coefficients of a list of integers:
def bez(L):
    n = len(L)
    A = matrix(n,n+1)
    for i in range(0,n):
        for j in range(0,n):
            if i == j:
                A[i,j] = 1
            else:
                A[i,j] = 0
        A[i,n] = L[i]
    k = row_with_nonzero_entry_in_final_column(A,n,n+1)
    l = row_with_nonzero_entry_in_final_column(A,n,n+1,k+1)
    while l < n:
        if abs(A[l,n]) > abs(A[k,n]):
            k,l = l,k
        q = A[k,n]//A[l,n]
        for j in range(0,n+1):
            A[k,j] = A[k,j]-q*A[l,j]
        k = row_with_nonzero_entry_in_final_column(A,n,n+1)
        l = row_with_nonzero_entry_in_final_column(A,n,n+1,k+1)
    return ([A[k,j] for j in range(0,n)],A[k,n])
Given a list L of integers, we let n be the length of the list L, form a matrix A of size
n × (n + 1), put the identity matrix into its first n columns, and put the entries of
L into its final column. Then we find the first row k with a nonzero entry in the last
column. Then we find the next row l with a nonzero entry in the last column. Swap
k and l if needed to get row l to have smaller entry in the last column. Find the
quotient q of those entries, dropping remainder. Subtract off q times row l from row
k. Repeat until there is no such row l left. Return a list of the numbers. Test it out:
bez([-2*3*5,2*3*7,2*3*11]) yields ([3, 2, 0] , −6), giving the Bézout coefficients as
the list [3, 2, 0] and the greatest common divisor next: −6.
Chapter 4
Prime numbers
I think prime numbers are like life. They are very logical but you could
never work out the rules, even if you spent all your time thinking about
them.
— Mark Haddon
The Curious Incident of the Dog in the Night-Time
Prime numbers
A positive integer p ≥ 2 is prime if the only positive integers which divide p are 1 and
p.
4.1 Prove that 2 is prime and that 3 is prime and that 4 is not prime.
Lemma 4.1. If b, c, d are integers and d divides bc and d is coprime to b then d
divides c.
Proof. Since d divides the product bc, we can write the product as bc = qd. Since d
and b are coprime, their greatest common divisor is 1. So there are Bézout coefficients
s, t for b, d, so sb + td = 1. So then sbc + tdc = c, i.e. sqd + tdc = c, or d(sq + tc) = c,
i.e. d divides c.
Corollary 4.2. A prime number divides a product bc just when it divides one of the
factors b or c.
An expression of a positive integer as a product of primes, such as 12 = 2 · 2 · 3,
written down in increasing order, is a prime factorization.
Theorem 4.3. Every positive integer n has a unique prime factorization.
Proof. Danger: if we multiply together a collection of integers, we must insist that
there can only be finitely many numbers in the collection, to make sense out of the
multiplication.
More danger: an empty collection is always allowed in mathematics. We will
simply say that if we have no numbers at all, then the product of those numbers (the
product of no numbers at all) is defined to mean 1. This will be the right definition
to use to make our theorems have the simplest expression. In particular, the integer
1 has the prime factorization consist of no primes.
First, let’s show that there is a prime factorization, and then we will see that
it is unique. It is clear that 1, 2, 3, . . . , 12 have prime factorizations, as in our table
    2 = 2,            3 = 3,            4 = 2 · 2,
    5 = 5,            6 = 2 · 3,        7 = 7,
    8 = 2 · 2 · 2,    9 = 3 · 3,        10 = 2 · 5,
    11 = 11,          12 = 2 · 2 · 3,
    ...
    2395800 = 2³ · 3² · 5² · 11³,
    ...
above. Suppose that some positive integer has no prime factorization. Pick n to be
the smallest positive integer with no prime factorization. If n does not factor into
smaller factors, then n is prime and n = n is a factorization. Suppose that n factors,
say into a product n = bc of positive integers. Write down a prime factorization for b
and then next to it one for c, giving a prime factorization for n = bc. So every positive
integer has at least one prime factorization.
Let p be the smallest integer so that p ≥ 2 and p divides n. Clearly p is prime.
Since p divides the product of the primes in any prime factorization of n, so p must
divide one of the primes in the factorization. But then p must equal one of these
primes, and must be the smallest prime in the factorization. This determines the first
prime in the factorization. So write n = pn1 for a unique integer n1 , and by induction
we can assume that n1 has a unique prime factorization, and so n does as well.
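The existence half of the proof suggests a simple (if slow) way to compute prime factorizations; here is a plain-Python sketch of that idea, repeatedly dividing out the smallest factor p ≥ 2 (which, as in the proof, is automatically prime):

```python
# Trial-division sketch: repeatedly divide out the smallest remaining factor.
def prime_factorization(n):
    factors = []
    p = 2
    while n > 1:
        while n % p == 0:   # p divides n; it is the smallest factor, hence prime
            factors.append(p)
            n //= p
        p += 1
    return factors

print(prime_factorization(12))  # → [2, 2, 3]
```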
Theorem 4.4 (Euclid). There are infinitely many prime numbers.
Proof. Write down a list of finitely many primes, say p1 , p2 , . . . , pn . Let b = p1 p2 . . . pn
and let c = b + 1. Then clearly b, c are coprime. So the prime decomposition of c has
none of the primes p1 , p2 , . . . , pn in it, and so must have some other primes distinct
from these.
4.2 Write down the positive integers in a long list starting at 2:
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, . . .
Strike out all multiples of 2, except 2 itself (struck-out numbers shown here in
parentheses):

2, 3, (4), 5, (6), 7, (8), 9, (10), 11, (12), 13, (14), 15, (16), . . .

Skip on to the next number which isn’t struck out: 3. Strike out all multiples of 3,
except 3 itself:

2, 3, (4), 5, (6), 7, (8), (9), (10), 11, (12), 13, (14), (15), (16), . . .
Skip on to the next number which isn’t struck out, and strike out all of its multiples
except the number itself. Continue in this way forever. Prove that the remaining
numbers, those which do not get struck out, are precisely the prime numbers. Use
this to write down all prime numbers less than 120.
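The striking-out process (the sieve of Eratosthenes) is easy to sketch in plain Python; this is an illustration of the process itself, not a worked solution to the problem:

```python
# Sieve sketch: cross out the proper multiples of each number that survives.
def sieve(limit):
    struck = [False] * (limit + 1)
    primes = []
    for n in range(2, limit + 1):
        if not struck[n]:                 # n was never struck out, so n is prime
            primes.append(n)
            for multiple in range(2 * n, limit + 1, n):
                struck[multiple] = True   # strike out all larger multiples of n
    return primes

print(sieve(30))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```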
Greatest common divisors
We speed up calculation of the greatest common divisor by factoring out any obvious
common factors: gcd {46, 12} = gcd {2 · 23, 2 · 6} = 2 gcd {23, 6} = 2. For example,
this works well if both numbers are even, or both are multiples of ten, or both are
multiples of 5, or even if both are multiples of 3.
4.3 Prove that an integer is a multiple of 3 just when its digits sum up to a multiple
of 3. Use this to see if 12345 is a multiple of 3. Use this to find gcd {12345, 123456}.
4.4 An even integer is a multiple of 2; an odd integer is not a multiple of 2. Prove
that if b, c are positive integers and
a. b and c are both even then gcd {b, c} = 2 gcd {b/2, c/2},
b. b is even and c is odd, then gcd {b, c} = gcd {b/2, c},
c. b is odd and c is even, then gcd {b, c} = gcd {b, c/2},
d. b and c are both odd, then gcd {b, c} = gcd {b, b − c}.
Use this to find the greatest common divisor of 4864, 3458 without using long division.
Answer: gcd {4864, 3458} = 19.
Sage
Mathematicians have tried in vain to this day to discover some order
in the sequence of prime numbers, and we have reason to believe that it
is a mystery into which the human mind will never penetrate.
— Leonhard Euler
Sage can factor numbers into primes:
factor(1386)
gives 2 · 32 · 7 · 11. Sage can find the next prime number larger than a given number:
next_prime(2005)= 2011. You can test if the number 4 is prime with is_prime(4),
which returns False. To list the primes between 14 and 100, type
prime_range(14,100)
to see
[17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89,
97]
The number of primes less than or equal to a real number x is traditionally denoted
π(x) (which has nothing to do with the π of trigonometry). In sage, π(x) is denoted
prime_pi(x). For example, π(10⁶) = 78498 is calculated as prime_pi(10^6). The
prime number theorem says that π(x) gets more closely approximated by

    x / (log x − 1)

as x gets large. We can plot π(x), and compare it to that approximation.
p=plot(prime_pi, 0, 10000, rgbcolor='red')
q=plot(x/(log(x)-1), 5, 10000, rgbcolor='blue')
show(p+q)
which displays the two graphs together.
4.5 Write code in sage to find the greatest common divisor of two integers, using the
ideas of problem 4.4 on page 24.
Chapter 5
Modular arithmetic
Mathematics is the queen of the sciences and number theory is the queen
of mathematics.
— Carl Friedrich Gauss
Definition
If we divide integers by 7, the possible remainders are
0, 1, 2, 3, 4, 5, 6.
For example, 65 = 9 · 7 + 2, so the remainder is 2. Two integers are congruent mod 7
if they have the same remainder modulo 7. So 65 and 2 are congruent mod 7, because
65 = 9 · 7 + 2 and 2 = 0 · 7 + 2. We denote congruence modulo 7 as
65 ≡ 2 (mod 7).
Sometimes we will allow ourselves a sloppy notation, where we write the remainder of
65 modulo 7 as 65. This is sloppy because the notation doesn’t remind us that we are
working out remainders modulo 7. If we change 7 to some other number, we could get
confused by this notation. We will often compute remainders modulo some chosen
number, say m, instead of 7.
If we add multiples of 7 to an integer, we don’t change its remainder modulo 7:
65 + 7 has the same remainder as 65. Similarly, 65 − 7 has the same remainder as 65.
If we add, or multiply, some numbers, what happens to their remainders?
Theorem 5.1. Take a positive integer m and integers a, A, b, B. If a ≡ A (mod m)
and b ≡ B (mod m) then

    a + b ≡ A + B (mod m),
    a − b ≡ A − B (mod m),
    and ab ≡ AB (mod m).
The bar notation is more elegant. If we agree that ā means the remainder of a modulo
the fixed choice of integer m, we can write this as: if ā = Ā and b̄ = B̄ then the
remainder of a + b equals the remainder of A + B, the remainder of a − b equals the
remainder of A − B, and the remainder of ab equals the remainder of AB.
Note that 9 ≡ 2 (mod 7) and 4 ≡ 11 (mod 7), so our theorem tells us that
9 · 4 ≡ 2 · 11 (mod 7). Let’s check this. The left hand side:
9 · 4 = 36,
= 5(7) + 1,
so that 9 · 4 ≡ 1 (mod 7). The right hand side:
2 · 11 = 22,
= 3(7) + 1,
so that 2 · 11 ≡ 1 (mod 7).
So it works in this example. Let’s prove that it always works.
Proof. Since a − A is a multiple of m, as is b − B, note that
(a + b) − (A + B) = (a − A) + (b − B)
is also a multiple of m, so addition works. In the same way
ab − AB = (a − A)b + A(b − B)
is a multiple of m.
5.1 Prove by induction that every “perfect square”, i.e. integer of the form n², has
remainder 0, 1 or 4 modulo 8.
5.2 Take an integer like 243098 and write out the sum of its digits 2 + 4 + 3 + 0 + 9 + 8.
Explain why every integer is congruent to the sum of its digits, modulo 9.
5.3 Take integers a, b, m with m ≠ 0. Let d = gcd {a, b, m}. Prove that a ≡ b (mod m)
just when a/d ≡ b/d (mod m/d).
Arithmetic of remainders
We now define an addition law on the numbers 0, 1, 2, 3, 4, 5, 6 by declaring that
when we add these numbers, we then take remainder modulo 7. This is not usual
addition. To make this clear, we write the remainders with bars over them, always.
For example, we are saying that in this addition law 3̄ + 5̄ means 3 + 5 = 8 = 1̄,
since we are working modulo 7. We adopt the same rule for subtraction, and for
multiplication. For example, modulo 13,

    (7̄ + 9̄)(11 + 6̄) = 16 · 17,
                     = (13 + 3)(13 + 4),
                     = 3̄ · 4̄,
                     = 12.
If we are daring, we might just drop all of the bars, and state clearly that we are
calculating modulo some integer. In our daring notation, modulo 17,
16 · 29 − 7 · 5 = 16 · 12 − 7 · 5,
= 192 − 35,
= (11 · 17 + 5) − (2 · 17 + 1) ,
= 5 − 1,
= 4.
5.4 Expand and simplify 5̄ · 2̄ · 6̄ − 9̄ modulo 7.
The addition and multiplication tables for remainders modulo 5:

    +   0  1  2  3  4
    0   0  1  2  3  4
    1   1  2  3  4  0
    2   2  3  4  0  1
    3   3  4  0  1  2
    4   4  0  1  2  3

    ·   0  1  2  3  4
    0   0  0  0  0  0
    1   0  1  2  3  4
    2   0  2  4  1  3
    3   0  3  1  4  2
    4   0  4  3  2  1
5.5 Describe laws of modular arithmetic, imitating the laws of integer arithmetic. If
we work modulo 4, explain why the zero divisors law fails. Why are there no sign
laws?
5.6 Compute the remainder when dividing 19 into 37²⁰⁰.
5.7 Compute the last two digits of 9²⁰⁰⁰.
5.8 Prove that the equation a² + b² = 3c² has no solutions in nonzero integers a,
b and c. Hint: start by proving that modulo 4, a² ≡ 0 or 1. Then consider the
equation modulo 4; show that a, b and c are divisible by 2. Then each of a², b² and
c² has a factor of 4. Divide through by 4 to show that there would be a smaller set
of solutions to the original equation. Apply induction.
To carry a remainder to a huge power, say 2²⁰⁰⁵ modulo 13, we can build up the
power out of smaller ones. For example, 2² = 4 modulo 13, and therefore modulo 13,

    2⁴ = (2²)² = 4² = 16 = 3.

Keeping track of these partial results as we go, modulo 13,

    2⁸ = (2⁴)² = 3² = 9.
We get higher and higher powers of 2: modulo 13,

    k     2^k    2^(2^k) mod 13
    0        1    2
    1        2    4
    2        4    3
    3        8    9
    4       16    9² = 81 = 3
    5       32    3² = 9
    6       64    9² = 3
    7      128    3² = 9
    8      256    9² = 3
    9      512    3² = 9
    10    1024    9² = 3
    11    2048    3² = 9
The last row gets into 2²⁰⁴⁸, too large to be relevant to our problem. We now want
to write out the exponent 2005 as a sum of powers of 2, by first dividing in 1024:

    2005 = 1024 + 981

and then dividing in the next power of 2 we can fit into the remainder,

    = 1024 + 512 + 469,
    = 1024 + 512 + 256 + 128 + 64 + 16 + 4 + 1.

Then we can compute out modulo 13:

    2²⁰⁰⁵ = 2¹⁰²⁴ · 2⁵¹² · 2²⁵⁶ · 2¹²⁸ · 2⁶⁴ · 2¹⁶ · 2⁴ · 2¹,
          = 3 · 9 · 3 · 9 · 3 · 3 · 3 · 2,
          = (3 · 9)³ · 2,
          = 27³ · 2,
          = 1 · 2,
          = 2.
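The same repeated-squaring trick can be sketched in plain Python (Python’s built-in pow(base, exponent, modulus) does this for you); this is an illustration, not code from the book:

```python
# Repeated squaring: walk through the binary digits of the exponent,
# squaring modulo 13 as we go, just as in the table of powers above.
def power_mod(base, exponent, modulus):
    result = 1
    square = base % modulus      # holds base^(2^k) mod modulus
    while exponent > 0:
        if exponent % 2 == 1:    # this binary digit of the exponent is 1
            result = (result * square) % modulus
        square = (square * square) % modulus
        exponent //= 2
    return result

print(power_mod(2, 2005, 13))  # → 2
```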
5.9 Compute 2¹⁰⁰ modulo 125.
Reciprocals
Every nonzero rational number b/c has a reciprocal: c/b. Since we now have modular
arithmetic defined, we want to know which remainders have “reciprocals”. Working
modulo some positive integer, say that a remainder x has a reciprocal y = x−1 if
xy = 1. (It seems just a little too weird to write it as y = 1/x, but you can if you
like.) Reciprocals are also called multiplicative inverses. For example, modulo 7
    1̄ · 1̄ = 1̄,   2̄ · 4̄ = 1̄,   3̄ · 5̄ = 1̄,
    4̄ · 2̄ = 1̄,   5̄ · 3̄ = 1̄,   6̄ · 6̄ = 1̄.
So in this weird type of arithmetic, we can allow ourselves the freedom to write these
equations as identifying a reciprocal.
    1̄⁻¹ = 1̄,   2̄⁻¹ = 4̄,   3̄⁻¹ = 5̄,
    4̄⁻¹ = 2̄,   5̄⁻¹ = 3̄,   6̄⁻¹ = 6̄.
A remainder that has a reciprocal is a unit.
5.10 Danger: If we work modulo 4, then prove that 2̄ has no reciprocal. Hint: 2̄ · 2̄ = 0̄.
5.11 Prove that, modulo any integer m, (m − 1)⁻¹ = m − 1, and that modulo m²,
(m − 1)⁻¹ = m² − m − 1.
Theorem 5.2. Take a positive integer m. In the remainders modulo m, a remainder
r̄ is a unit just when r, m are coprime integers.
Proof. If r, m are coprime integers, so their greatest common divisor is 1, then write
Bézout coefficients sr + tm = 1, and quotient by m:
s̄r̄ = 1̄.
On the other hand, if
s̄r̄ = 1̄,
then sr is congruent to 1 modulo m, i.e. there is some quotient q so that sr = qm + 1,
so sr − qm = 1, giving Bézout coefficients s = s, t = −q, so the greatest common
divisor of r, m is 1.
Working modulo 163, let’s compute 14⁻¹. First we carry out the long division:

    163 = 11 · 14 + 9.

Now let’s start looking for Bézout coefficients, by writing out the matrix:

    1  0   14
    0  1  163

and then add −11 · row 1 to row 2:

      1  0  14
    −11  1   9

Add −row 2 to row 1:

     12  −1  5
    −11   1  9

Add −row 1 to row 2:

     12  −1  5
    −23   2  4

Add −row 2 to row 1:

     35  −3  1
    −23   2  4

Add −4 · row 1 to row 2:

      35   −3  1
    −163   14  0

Summing it all up: 35 · 14 + (−3) · 163 = 1. Quotient out by 163: modulo
163, 35 · 14 = 1, so modulo 163, 14⁻¹ = 35.
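The same computation can be sketched in plain Python (an illustration; Sage’s inverse_mod plays the same role). We only need one column of Bézout coefficients alongside the last column of the matrix:

```python
# Reciprocal modulo m from Bezout coefficients: run the extended Euclidean
# algorithm on (r, m), maintaining s_i * r + t_i * m = u_i (the t-column is
# never needed, so we drop it).
def reciprocal_mod(r, m):
    s0, u0 = 1, r
    s1, u1 = 0, m
    while u1 != 0:
        q = u0 // u1
        s0, s1 = s1, s0 - q * s1
        u0, u1 = u1, u0 - q * u1
    if u0 not in (1, -1):
        raise ValueError("r and m are not coprime, so r has no reciprocal")
    return (s0 * u0) % m       # flip the sign if the gcd came out as -1

print(reciprocal_mod(14, 163))  # → 35
```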
5.12 Use this method to find reciprocals:
a. 13⁻¹ modulo 59
b. 10⁻¹ modulo 11
c. 2⁻¹ modulo 193.
d. 6003722857⁻¹ modulo 77695236973.
5.13 Suppose that b, c are remainders modulo a prime. Prove that bc = 0 just when
either b = 0 or c = 0.
5.14 Suppose that p is a prime number and n is an integer with 0 < n < p. Explain why,
modulo p, the numbers 0, n, 2n, 3n, . . . , (p − 1)n consist of precisely the remainders
0, 1, 2, . . . , p − 1, in some order. (Hint: use the reciprocal of n.) Next, since every
nonzero remainder has a reciprocal remainder, explain why the product of the nonzero
remainders is a unit. Use this to explain why

    n^(p−1) ≡ 1 (mod p).

Finally, explain why, for any integer k,

    k^p ≡ k (mod p).
The Chinese remainder theorem
An old woman goes to market and a horse steps on her basket and
crushes the eggs. The rider offers to pay for the damages and asks
her how many eggs she had brought. She does not remember the exact
number, but when she had taken them out two at a time, there was one
egg left. The same happened when she picked them out three, four, five,
and six at a time, but when she took them seven at a time they came
out even. What is the smallest number of eggs she could have had?
— Brahmagupta (580CE–670CE)
Brahma-Sphuta-Siddhanta (Brahma’s Correct System)
Take a look at some numbers and their remainders modulo 3 and 5:

    n        0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
    n mod 3  0  1  2  0  1  2  0  1  2  0   1   2   0   1   2   0
    n mod 5  0  1  2  3  4  0  1  2  3  4   0   1   2   3   4   0

Remainders modulo 3 repeat every 3, and remainders modulo 5 repeat every 5, but
the pair of remainders modulo 3 and 5 together repeat every 15.
Theorem 5.3. Take some positive integers m1 , m2 , . . . , mn , so that any two of them
are coprime. Suppose that we want to find an unknown integer x given only the
knowledge of its remainders modulo m1 , m2 , . . . , mn ; so we know its remainder modulo
m1 is r1 , modulo m2 is r2 , and so on. There is such an integer x, and x is unique
modulo

    m1 m2 . . . mn .
Proof. Let

    m ..= m1 m2 . . . mn .

For each i, let

    ui ..= m/mi = m1 m2 . . . mi−1 mi+1 mi+2 . . . mn .

Note carefully that all of m1 , m2 , . . . , mn divide ui , except mi . So if j ≠ i then,
modulo mj , ūi = 0. All of the other mj are coprime to mi , so their product ui is
coprime to mi , so has a reciprocal modulo mi . Thus modulo mi , ūi ≠ 0. Each ūi has
some reciprocal modulo mi , call it v̄i . Let

    x ..= r1 u1 v1 + r2 u2 v2 + · · · + rn un vn .

Then modulo m1 , all terms but the first vanish, and ū1 v̄1 = 1̄, so x̄ = r̄1 , and so on.
Let’s find an integer x so that

    x ≡ 1 (mod 3),
    x ≡ 2 (mod 4),
    x ≡ 1 (mod 7).
So in this problem we have to work modulo (m1 , m2 , m3 ) = (3, 4, 7), and get
remainders (r1 , r2 , r3 ) = (1, 2, 1). First, no matter what the remainders, we
have to work out the reciprocal mod each mi of the product of all of the other
mj . So let’s reduce these products down to their remainders:
    4 · 7 = 28 = 9 · 3 + 1 ≡ 1 (mod 3),
    3 · 7 = 21 = 5 · 4 + 1 ≡ 1 (mod 4),
    3 · 4 = 12 = 1 · 7 + 5 ≡ 5 (mod 7).
We need the reciprocals of these, which, to save ink, we just write down for
you without writing out the calculations:

    1⁻¹ ≡ 1 (mod 3),
    1⁻¹ ≡ 1 (mod 4),
    5⁻¹ ≡ 3 (mod 7).

(You can easily check those.) Finally, we add up remainder times product
times reciprocal:

    x = r1 · 4 · 7 · 1 + r2 · 3 · 7 · 1 + r3 · 3 · 4 · 3,
      = 1 · 4 · 7 · 1 + 2 · 3 · 7 · 1 + 1 · 3 · 4 · 3,
      = 106.
We can now check to be sure:

    106 ≡ 1 (mod 3),
    106 ≡ 2 (mod 4),
    106 ≡ 1 (mod 7).
The Chinese remainder theorem tells us also that 106 is the unique solution
modulo 3 · 4 · 7 = 84. But then 106 − 84 = 22 is also a solution, the smallest
positive solution.
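The construction in the proof translates directly into a plain-Python sketch (an illustration, using Python 3.8’s pow(u, -1, m) for the reciprocal; Sage’s crt does the same job):

```python
# Chinese remainder construction: for each modulus mi, add up
# remainder * (product of the other moduli) * (its reciprocal modulo mi).
def chinese_remainder(remainders, moduli):
    m = 1
    for mi in moduli:
        m *= mi
    x = 0
    for ri, mi in zip(remainders, moduli):
        ui = m // mi             # product of all the other moduli
        vi = pow(ui, -1, mi)     # reciprocal of ui modulo mi (Python 3.8+)
        x += ri * ui * vi
    return x % m                 # the unique solution modulo m

print(chinese_remainder([1, 2, 1], [3, 4, 7]))  # → 22
```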
5.15 Solve the problem about eggs. Hint: ignore the information about eggs taken
out six at a time.
5.16 How many soldiers are there in Han Xin’s army? If you let them parade in rows
of 3 soldiers, two soldiers will be left. If you let them parade in rows of 5, 3 will be
left, and in rows of 7, 2 will be left.
Given some integers m1 , m2 , . . . , mn , we consider sequences (b1 , b2 , . . . , bn ) consisting of remainders: b1 a remainder modulo m1 , and so on. Add sequences of remainders
in the obvious way:
(b1 , b2 , . . . , bn ) + (c1 , c2 , . . . , cn ) = (b1 + c1 , b2 + c2 , . . . , bn + cn ) .
Similarly, we can subtract and multiply sequences of remainders:
(b1 , b2 , . . . , bn ) (c1 , c2 , . . . , cn ) = (b1 c1 , b2 c2 , . . . , bn cn ) ,
by multiplying remainders as usual, modulo the various m1 , m2 , . . . , mn .
Modulo (3, 5), we multiply
(2, 4)(3, 2) = (2 · 3, 4 · 2),
= (6, 8),
= (0, 3).
Let m ..= m1 m2 . . . mn . To each remainder b modulo m, associate its remainder
b1 modulo m1 , b2 modulo m2 , and so on: the sequence ~b ..= (b1 , b2 , . . . , bn ) of all
of those remainders. In this way we make a map taking each remainder b modulo m
to its sequence ~b of remainders modulo all of the various mi . Moreover, the sequence
of b + c is ~b + ~c, the sequence of b − c is ~b − ~c, and the sequence of bc is ~b~c, since
each of these works when we take the remainder modulo anything.
Take m1 , m2 , m3 to be 3, 4, 7. Then m = 3 · 4 · 7 = 84. If b = 8 modulo 84,
then

    b1 = 8 mod 3 = 2,
    b2 = 8 mod 4 = 0,
    b3 = 8 mod 7 = 1,

so ~b = (b1 , b2 , b3 ) = (2, 0, 1).
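The map b ↦ ~b of this example is one line of plain Python (a sketch, not from the book):

```python
# The sequence-of-remainders map from the example: b modulo 84 goes to
# (b mod 3, b mod 4, b mod 7).
def remainder_sequence(b, moduli):
    return tuple(b % mi for mi in moduli)

print(remainder_sequence(8, [3, 4, 7]))  # → (2, 0, 1)
```

One can check numerically that this map takes products to componentwise products, as claimed.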
Corollary 5.4. Take some positive integers m1 , m2 , . . . , mn , so that any two of
them are coprime. The map taking b to ~b, from remainders modulo m to sequences
of remainders modulo m1 , m2 , . . . , mn , is one-to-one and onto, identifies sums with
sums, products with products, differences with differences, units with sequences of
units.
Euler’s totient function
Euler’s totient function φ assigns to each positive integer m = 2, 3, . . . the number of
all remainders modulo m which are units (in other words, which are relatively prime
to m, i.e. which have reciprocals). It is convenient to define φ(1) ..= 1
(even though there isn’t actually 1 unit remainder modulo 1).
5.17 Explain by examining the remainders that the first few values of φ are

    m     1  2  3  4  5  6  7
    φ(m)  1  1  2  2  4  2  6
5.18 Prove that a positive integer m ≥ 2 is prime just when φ(m) = m − 1.
Theorem 5.5. Suppose that m ≥ 2 is an integer with prime factorization

    m = p1^a1 p2^a2 . . . pn^an ,

so that p1 , p2 , . . . , pn are prime numbers and a1 , a2 , . . . , an are positive integers. Then

    φ(m) = (p1^a1 − p1^(a1−1)) (p2^a2 − p2^(a2−1)) . . . (pn^an − pn^(an−1)).
Proof. If m is prime, this follows from problem 5.18.
Suppose that m has just one prime factor, or in other words that m = p^a for some
prime number p and integer a. It is tricky to count the remainders relatively prime
to m, but easier to count those not relatively prime, i.e. those which have a factor of
p. Clearly these are the multiples of p between 0 and p^a − p, so the numbers pj for
0 ≤ j ≤ p^(a−1) − 1. So there are p^(a−1) such remainders. We take these out and we are
left with p^a − p^(a−1) remainders, i.e. those relatively prime.
If b, c are relatively prime integers ≥ 2, then corollary 5.4 on the preceding page
maps units modulo bc to pairs of a unit modulo b and a unit modulo c, and is one-to-one
and onto. Therefore counting units: φ(bc) = φ(b)φ(c).
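Theorem 5.5 gives a recipe for φ that is easy to sketch in plain Python (an illustration; Sage’s euler_phi is the real tool):

```python
# Totient from the prime factorization: each prime power p^a dividing m
# contributes a factor p^a - p^(a-1).
def totient(m):
    result = 1
    p = 2
    while m > 1:
        if m % p == 0:
            a = 0
            while m % p == 0:   # pull out the full power p^a dividing m
                a += 1
                m //= p
            result *= p**a - p**(a - 1)
        p += 1
    return result

print(totient(666))  # → 216
```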
Theorem 5.6 (Euler). For any positive integer m and any integer a coprime to m,

    a^φ(m) ≡ 1 (mod m).
Proof. Working with remainders modulo m, we have to prove that for any unit
remainder a, a^φ(m) = 1 modulo m.
Let U be the set of all units modulo m, so U is a subset of the remainders
0, 1, 2, . . . , m − 1. The product of units is a unit, since it has a reciprocal (the product
of the reciprocals). Therefore the map
u ∈ U 7→ au ∈ U
is defined. It has an inverse:
u ∈ U 7→ a−1 u ∈ U,
where a−1 is the reciprocal of a. Writing out the elements of U , say as
    u1 , u2 , . . . , uq ,

note that q = φ(m). Then multiplying by a scrambles these units into a different
order:

    au1 , au2 , . . . , auq .

But multiplying by a just scrambles the order of the units, so if we multiply them all,
and then scramble them back into order:

    (au1 ) (au2 ) . . . (auq ) = u1 u2 . . . uq .

Divide every unit u1 , u2 , . . . , uq out of both sides to find a^q = 1.
5.19 Take an integer m ≥ 2. Suppose that a is a unit in the remainders modulo m.
Prove that the reciprocal of a is a^(φ(m)−1).
Euler’s theorem is important because we can use it to calculate polynomials of very
large degree quickly modulo prime numbers (and sometimes even modulo numbers
which are not prime).
Modulo 19, let’s find 123456789⁹⁸⁷⁶⁵⁴³²¹. First, the base of this expression is

    123456789 = 6497725 · 19 + 14.

So modulo 19:

    123456789⁹⁸⁷⁶⁵⁴³²¹ = 14⁹⁸⁷⁶⁵⁴³²¹.

That helps with the base, but the exponent is still large. According to Euler’s
theorem, since 19 is prime, modulo 19:

    a¹⁸ = 1

for any nonzero remainder a. In particular, modulo 19,

    14¹⁸ = 1.

So every time we get rid of 18 copies of 14 multiplied together, we don’t
change our result. Divide 18 into the exponent:

    987654321 = 54869684 · 18 + 9.

So then modulo 19:

    14⁹⁸⁷⁶⁵⁴³²¹ = 14⁹.

We know that 14¹⁸ = 1 modulo 19, so 14⁹ is a square root of 1. But the
only square roots of 1 are ±1 modulo any prime, so 14⁹ = ±1 modulo 19.
We leave the reader to check that 14⁹ = −1 = 18 modulo 19, so that finally,
modulo 19,

    123456789⁹⁸⁷⁶⁵⁴³²¹ = 18.
Lemma 5.7. For any prime number p and integers a and k ≥ 0, a^(1+k(p−1)) = a modulo p.

Proof. If a is a multiple of p then a = 0 modulo p so both sides are zero. If a is not
a multiple of p then a^(p−1) = 1 modulo p, by Euler’s theorem. Take both sides to the
power k and multiply by a to get the result.
Theorem 5.8. Suppose that m = p1 p2 . . . pn is a product of distinct prime numbers.
Then for any integers a and k with k ≥ 0,

    a^(1+kφ(m)) ≡ a (mod m).

Proof. By lemma 5.7, the result is true if m is prime. So if we take two distinct prime
numbers p and q, then a^(1+k(p−1)(q−1)) = a modulo p, but also modulo q, and therefore
modulo pq. The same trick works if we start throwing in more distinct prime factors into
m.
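A quick numerical check of Theorem 5.8 (a sketch, using the squarefree modulus m = 2 · 3 · 5 = 30, where φ(30) = 8):

```python
# Verify a^(1 + k*phi(m)) ≡ a (mod m) for every remainder a, with m = 30.
m, phi_m = 30, 8
for a in range(m):
    for k in range(4):
        assert pow(a, 1 + k * phi_m, m) == a % m
print("checked")  # → checked
```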
5.20 Give an example of integers a and m with m ≥ 2 for which a^(1+φ(m)) ≠ a modulo
m.
Sage
In Sage, the remainder of 71 modulo 13 is mod(71,13). The tricky bit: it returns a
“remainder modulo 13”, so if we write
a=mod(71,13)
this will define a to be a “remainder modulo 13”. The value of a is then 6, but the
value of a2 is 10, because the result is again calculated modulo 13.
Euler’s totient function is
euler_phi(777)
which yields φ(777) = 432. To find 14−1 modulo 19,
inverse_mod(14,19)
yields 14−1 = 15 modulo 19.
We can write our own version of Euler’s totient function, just to see how it might
look:

def phi(n):
    return prod(p^a - p^(a-1) for (p,a) in factor(n))

where prod means product, so that phi(666) yields φ(666) = 216. The code here
uses the function factor(), which takes an integer n and returns its prime
factorization as a list of pairs of prime factors together with their powers. In our
case, the prime factorization of 666 is 666 = 2 · 3² · 37, so p=factor(666) yields the
list [(2,1), (3,2), (37,1)]. To pick entries out of the list: p[0] yields (2,1),
p[1] yields (3,2), and p[2] yields (37,1), while len(p) gives the length of the
list p, which is the number of entries in that list, in this case 3. For each pair
(p,a) in the factorization, our function multiplies in a factor p^a − p^(a−1).
To use the Chinese remainder theorem, suppose we want to find a number x so
that x has remainders 1 mod 3 and 2 mod 7,
crt([1,2],[3,7])
gives you x = 16; you first list the two remainders 1,2 and then list the moduli 3,7.
Another way to work with modular arithmetic in sage: we can create an object
which represents the remainder of 9 modulo 17:
a=mod(9,17)
a^(-1)
yielding 2. As long as all of our remainders are modulo the same number 17, we can
do arithmetic directly on them:
b=mod(7,17)
a*b
yields 12.
Chapter 6
Secret messages
And when at last you find someone to whom you feel you can pour
out your soul, you stop in shock at the words you utter— they
are so rusty, so ugly, so meaningless and feeble from being kept
in the small cramped dark inside you so long.
— Sylvia Plath
The Unabridged Journals of Sylvia Plath
Man is not what he thinks he is, he is what he hides.
— André Malraux
Enigma, Nazi secret code machine
RSA: the Cocks, Rivest, Shamir and Adleman algorithm
Alice wants to send a message to Bob, but she doesn’t want Eve to read it.
She first writes the message down in a computer, as a collection of 0’s and
1’s. She can think of these 0’s and 1’s as binary digits of some large integer x,
or as binary digits of some remainder x modulo some large integer m. Alice
takes her message x and turns it into a secret coded message by sending Bob
not the original x, but instead sending him xd modulo m, for some integer
d. This will scramble up the digits of x unrecognizably, if d is chosen “at
random”. For random enough d, and suitably chosen positive integer m, Bob
can unscramble the digits of xd to find x.
For example, if we take m ..= 55 and let d ..= 17, we find:

     x   x^17 mod 55      x   x^17 mod 55
     0        0          28        8
     1        1          29       39
     2        7          30       35
     3       53          31       26
     4       49          32       32
     5       25          33       33
     6       41          34       34
     7       17          35       40
     8       13          36       31
     9        4          37       27
    10       10          38        3
    11       11          39       19
    12       12          40       50
    13       18          41       46
    14        9          42       37
    15        5          43       43
    16       36          44       44
    17       52          45       45
    18       28          46       51
    19       24          47       42
    20       15          48       38
    21       21          49       14
    22       22          50       30
    23       23          51        6
    24       29          52        2
    25       20          53       48
    26       16          54       54
    27       47          55        0
If Alice sends Bob the secret message y = x^17, then Bob decodes it by x = y^33,
as we will see.
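The table and the claim that y^33 recovers x can be checked in a few lines of plain Python, where pow(a, k, m) computes a^k mod m:

```python
# pow(a, k, m) computes a^k mod m, like Sage's power_mod(a, k, m)
m, d, e = 55, 17, 33
table = {x: pow(x, d, m) for x in range(m)}
assert table[2] == 7 and table[3] == 53    # two entries of the table above
for x in range(m):                         # decoding undoes encoding
    assert pow(table[x], e, m) == x
```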
Theorem 6.1. Pick two different prime numbers p and q and let m = pq. Recall
Euler’s totient function φ(m). Suppose that d and e are positive integers so that de = 1
modulo φ(m). If Alice maps each remainder x to y = x^d modulo m, then Bob can
invert this map by taking each remainder y to x = y^e modulo m.

Proof. We have to prove that (x^d)^e = x modulo m for all x, i.e. that x^(de) = x modulo
m for all x. For x coprime to m, this follows from Euler’s theorem, but for other
values of x the result is not obvious.
By theorem 5.5 on page 36, φ(m) = (p − 1)(q − 1). Since de = 1 modulo φ(m),
clearly
de − 1 = k(p − 1)(q − 1)
for some integer k. If k = 0, then de = 1 and the result is clear: x¹ = x. Since d and
e are positive integers, de − 1 is not negative, so k ≥ 0. So we can assume that k > 0.
The result follows from theorem 5.8 on page 38.
6.1 Take each letter in the alphabet, ordered

abc . . . zABC . . . Z,

and one “blank space” letter, so 26 + 26 + 1 = 53 letters in all. Use the rule that
each letter is then represented by a number from 100 to 152, starting with a ↦ 100,
b ↦ 101, and so on, ending with the blank space ↦ 152. Then string the digits of
these numbers together into a single number. For example, ab c is 100 101 152 102.
a. Write out the message Hail Caesar as a number.
b. Translate the number 127 114 114 111 104 into letters by this encoding.
c. Apply the method above, taking
p = 993 319,
q = 999 331,
d = 13
and use a computer to compute the secret code on the number x = 127 114 114 111 104.
You should get:
y = 202 002 770 451
d. Use a computer to check that the associated number e for the given numbers
p, q, d in this case is
e = 839 936 711 257.
How can you use a computer to find this number e, if you only know the numbers
p, q, d in this case?
e. Use the process described in theorem 6.1 to decode the message
y = 660 968 731 660.
Warning: you might want to factor e first and then compute the power y^e
modulo m by computing one factor of e at a time.
Sage
Sage shines because we can’t do any of the computations of this chapter by hand.
Alice
To encode a single letter as a number, sage has a function ord():
ord(’A’)
yields 65, giving each letter (and punctuation sign and space, etc.) a different positive
integer less than 100. (If Alice wants to use lower case letters, we need to go beyond
100, so let’s not do that.) To apply this to an entire string of text,
m = "HELLO WORLD"
m = map(ord, m)
m
yields [72, 69, 76, 76, 79, 32, 87, 79, 82, 76, 68]. Alice turns this list of numbers into a
single number, by first reversing order:
m.reverse()
m
yields [68, 76, 82, 79, 87, 32, 79, 76, 76, 69, 72]. Then the expression
ZZ(m,100)
adds up successively these numbers, multiplying by 100 at each step, so that
x=ZZ(m,100)
x
yields x. This is the integer Alice wants to encode. Alice sets up the values of her
primes and exponent:
p=993319
q=999331
d=13
m=p*q
f=euler_phi(m)
e=inverse_mod(d,f)
The secret code Alice sends is y = xd (mod m). Sage can compute powers in modular
arithmetic quickly (using the same tricks we learned previously):
y=power_mod(x, d, m)
y
yielding 782102149108. This is Alice’s encoded message. She can send it to Bob
openly: over the phone, by email, or on a billboard.
Bob
Bob decodes by
x=power_mod(y, e, m)
x
yielding 53800826153. (Alice has secretly whispered e and m to Bob.) Bob turns this
number x back into text by using the function chr(), which is the inverse of ord():
def recover_message(x):
    if x==0:
        return ""
    else:
        return recover_message(x//100)+chr(x%100)
Bob applies this:
print(recover_message(x))
to yield HELLO WORLD.
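The whole exchange also fits in a few lines of plain Python, with the built-in pow() playing the role of both power_mod() and inverse_mod(). Here the message value from exercise 6.1 is used, reduced modulo m, since the theorem only recovers x as a remainder modulo m:

```python
p, q, d = 993319, 999331, 13
m = p * q
f = (p - 1) * (q - 1)        # φ(m) = (p − 1)(q − 1) for m = pq
e = pow(d, -1, f)            # inverse_mod(d, f); needs Python 3.8+
x = 127114114111104 % m      # the message, reduced to a remainder modulo m
y = pow(x, d, m)             # Alice encodes
assert pow(y, e, m) == x     # Bob decodes and recovers x
```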
Chapter 7
Rational, real and complex numbers
A fragment of the Rhind papyrus, 1650BCE, containing ancient Egyptian calculations
using rational numbers
There is a remote tribe that worships the number zero. Is nothing sacred?
— Les Dawson
Rational numbers
A rational number is a ratio of two integers, like

    2/3,   −7/9,   22/−11.

We also write them 2/3, −7/9, 22/−11. In writing 2/3, the number 2 is the numerator,
and 3 is the denominator. We can think of rational numbers two different ways:
a. Geometry: Divide up an interval of length b into c equal pieces. Each piece has
length b/c.
b. Algebra: a rational number b/c is just a formal expression we write down, with
two integers b and c, in which c ≠ 0. We declare my rational number b/c
and your rational number B/C to be equal just when we can rescale
numerators and denominators to get them to match. In other words, just when
I can find a nonzero integer a, and you can find a nonzero integer A, so that
ab = AB and ac = AC.
Of course, the two ideas give the same objects, but in these notes, the algebra approach
is best. For example, we see that 22/11 = 14/7, because I can factor out:

    22/11 = (11 · 2)/(11 · 1),

and factor out

    14/7 = (7 · 2)/(7 · 1).

Write any rational number b/1 as simply b, and in this way we see the integers sitting
among the rational numbers.
More generally, any rational number b/c can be simplified by dividing b and c by
their greatest common divisor, and then changing the sign of b and of c if needed, to
ensure that c > 0; the resulting rational number is in lowest terms.
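Python's fractions module implements exactly this: a Fraction is kept in lowest terms, with the denominator always positive. A quick illustration with the 22/11 = 14/7 example from above:

```python
from fractions import Fraction

# a Fraction is stored in lowest terms, with a positive denominator
a = Fraction(22, 11)
b = Fraction(14, 7)
assert a == b == 2           # both reduce to 2/1
c = Fraction(3, -6)          # the sign moves to the numerator
assert (c.numerator, c.denominator) == (-1, 2)
```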
7.1 Bring these rational numbers to lowest terms: 224/82, 324/ − 72, −1000/8800.
Multiply rational numbers by

    (b/c) · (B/C) = (bB)/(cC).

Similarly, divide a rational number by a nonzero rational number as

    (b/c) / (B/C) = (b/c) · (C/B).

Add rationals with the same denominator as:

    b/d + c/d = (b + c)/d.

The same for subtraction:

    b/d − c/d = (b − c)/d.

But if the denominators don’t match up, we rescale to make them match up:

    b/c + B/C = (bC)/(cC) + (cB)/(cC).
7.2 By hand, showing your work, simplify

    2/3 − 1/2,   (324/49) · (392/81),   4/5 + 7/4.
Lemma 7.1. The number √2 is irrational. (To be more precise, no rational number
squares to 2.)

Proof. If it is rational, say √2 = b/c, then c√2 = b, so squaring gives c² · 2 = b². Every
prime factor in b occurs twice in b², so the prime factorization of the right hand side
has an even number of factors of each prime. But the left hand side has an odd
number of factors of 2, since c² has an even number, but there is another factor of 2
on the left hand side. This contradicts uniqueness of prime factorization.
7.3 Suppose that N is a positive integer. Prove that either √N is irrational or N is
a perfect square, i.e. N = n² for some integer n.

7.4 Prove that there are no rational numbers x and y satisfying √3 = x + y√2.
A rational number is positive if it is a ratio of positive integers.
7.5 Prove that every positive rational number has a unique expression in the form

    (p_1^α1 p_2^α2 . . . p_k^αk) / (q_1^β1 q_2^β2 . . . q_ℓ^βℓ)

where

    p_1, p_2, . . . , p_k, q_1, q_2, . . . , q_ℓ

are prime numbers, all different, and

    α_1, α_2, . . . , α_k, β_1, β_2, . . . , β_ℓ

are positive integers. For example

    234/16 = (3² · 13)/2³.
7.6 Of the various laws of addition and multiplication and signs for integers, which
hold true also for rational numbers?
The most important property that rational numbers have, that integers don’t, is
that we can divide any rational number by any nonzero rational number.
Real numbers
Order is an exotic in Ireland. It has been imported from England but
will not grow. It suits neither soil nor climate.
— J.A. Froude
Think of the rational numbers geometrically, so that b/c is the length of each piece
when a rod of length b is cut into c equal lengths. Draw a straight line, infinite in
both directions, and mark a point 0 on it. Then mark a point 1 at 1 unit of distance
from 0, and so on, marking all of the integers to form a number line. The points of this
continuous number line are the real numbers. This physical definition doesn’t make
it possible to prove anything about the real numbers.
The rational numbers are insufficient for algebra: they have no √2. Imagine
taking all positive rational numbers b/c for which b²/c² > 2. Thinking geometrically,
drawing a number line:

    [a number line, with the point √2 circled]

Our rational numbers sit along the line getting very close to the circled point of the
line, which should be the point √2. But there is no rational number there.
We assume that the reader is familiar with real numbers, with continuity, and
with the intermediate value theorem: every continuous function y = f (x) defined on
an interval a ≤ x ≤ b takes on all values between f (a) and f (b). We also assume that
the reader knows that polynomials are continuous. See [6] for the complete story of
real numbers, continuity and the intermediate value theorem.
If f(x) is a polynomial with positive leading coefficient, like

    f(x) = −4 + 7x + 17x² + 674x⁴,

then making x very large makes f(x) very large positive, clearly, since x⁴
eventually gets much larger than all other terms. The equation f(x) = 0
must have a solution x > 0, because f(x) starts at f(0) = −4 (negative) and
then gets much larger than anything we like eventually for large enough x,
so gets positive, so must be zero somewhere in between.
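We can watch the intermediate value theorem find this root numerically; a bisection sketch in plain Python:

```python
def f(x):
    return -4 + 7*x + 17*x**2 + 674*x**4

# f(0) = -4 < 0 and f(1) = 694 > 0, so a root lies between 0 and 1;
# repeatedly halve the interval, keeping f negative at lo, positive at hi
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
assert abs(f(lo)) < 1e-9     # lo is now (numerically) a root
```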
7.7 Prove that the equation

    394x³ − 92843x² + 209374x − 2830 = 0

has a real number solution x. More generally, prove that any odd-degree polynomial
equation in a real variable has a real solution.
A root x₀ of a polynomial b(x) has multiplicity k if (x − x₀)^k divides b(x).
Theorem 7.2 (Descartes). The number of positive roots, counted with multiplicities,
of a real polynomial b(x) is either equal to the number of changes of sign of its
coefficients (when we write the polynomial b(x) with terms in order of degree), or less
by a multiple of 2.
• The polynomial

      b(x) = 12x⁷ + x⁵ − x³ − x² + x + 1

  has coefficients with signs + + − − + +. We ignore the zero coefficients,
  for example in front of x⁶ and x⁴. So as we travel along the signs, they
  change twice, once from a + to a neighboring −, and once from a −
  to a neighboring +. So this b(x) has at most 2 roots x > 0, and the
  number of roots with x > 0 is even. So there are either 2 positive roots
  or none.
• The polynomial c(x) = x⁵ − 3x⁴ + 3x³ + 9x² − 27x + 3 has 4 sign changes,
  so 4 or 2 or 0 roots x with x > 0. If we look at the polynomial c(−x),
  expanded out to get c(−x) = −x⁵ − 3x⁴ − 3x³ + 9x² + 27x + 3, it has 1 sign
  change, so 1 root with x > 0. Therefore c(x) has 1 root with x < 0.
Proof. Take a polynomial b(x). If x divides b(x), dividing out factors of x from b(x)
doesn’t change positive roots or signs of coefficients, so we can assume that there are
no factors of x. In other words, assume that the constant coefficient of b(x) is not
zero:
    b(x) = b_0 + b_1 x + b_2 x² + · · · + b_n x^n

with b_0 ≠ 0 and b_n ≠ 0.
The number of sign changes in the coefficients is even just when b0 and bn have
the same sign: we start positive and end positive, so switched off and on an even
number of times.
Plug in x = 0 to see that b(0) = b_0. For large positive numbers x, b(x) has the
same sign as b_n:

    b(x)/x^n = b_0/x^n + b_1/x^(n−1) + · · · + b_(n−1)/x + b_n → b_n.
If b0 has the opposite sign to bn , then b(0) and b(x) have opposite signs for large
positive x, so b(x) must be zero somewhere in between by the intermediate value
theorem. So if b0 has opposite sign to bn , then b(x) has a positive root. So if b(x) has
no positive roots, then b0 and bn have the same sign. So if b(x) has no positive roots,
then the number of sign changes in the coefficients is even and the theorem holds true
of b(x).
Suppose that b(x) has a positive root, say at x = a. Factor out:
b(x) = (x − a)c(x)
for some polynomial c(x). We claim that b(x) has one more sign change than c(x).
Write out the coefficients of c(x), say as
    c(x) = c_0 + c_1 x + c_2 x² + · · · + c_(n−1) x^(n−1).
Expand out the equation
b(x) = (x − a)c(x)
to find the relations between the coefficients:
    b_j = c_(j−1) − a c_j.
Starting at the top, b_n = c_(n−1), so the top coefficients match in sign. As we go down
the coefficients by degree, pick out the first sign change in the coefficients of c(x), say
c_(j−1) < 0 while c_j > 0. Then b_j = c_(j−1) − a c_j < 0, so the sign of the coefficient b_j
is the same as that of c_(j−1). If the coefficients of b(x) have not already changed sign
before we got to b_j, then they certainly must change sign by the time we get to b_j.
The coefficients of b(x) might have changed sign many times before we get to b_j, but
at least we see that they have to change sign at least once by the time we get to b_j.
The same argument works for the next sign change and so on: at least as many sign
changes. Since the signs of the highest coefficients of b(x) and c(x) agree, while the
signs of the lowest coefficients disagree, the total number of sign changes of b(x) must
be greater than those of c(x) by an odd number: one more sign change or three more
or five more, etc.
By induction, the number of positive roots of c(x) is equal to the number of sign
changes of c(x) or less by a multiple of two. Since b(x) has one more positive root, and
one more sign change or three more or five more, etc., the result holds by induction
on degree.
For example, factor out all of the positive roots of a polynomial:

    b(x) = (x − x_1)(x − x_2) . . . (x − x_k) c(x),

so that c(x) has no positive roots, and so the constant term of c(x) is positive. The
constant term of b(x) is then

    b_0 = b(0) = (−1)^k x_1 . . . x_k c(0),

with sign (−1)^k. So the number of sign changes in b(x) is even just when the constant
term in b(x) is positive, which occurs just when the number of positive roots is even.
Complex numbers
For every complex problem there is an answer that is clear, simple,
and wrong.
— H. L. Mencken
Entre deux vérités du domaine réel, le chemin le plus facile et le
plus court passe bien souvent par le domaine complexe.
The shortest and easiest path between any two facts about the
real domain often passes through the complex domain.
— Paul Painlevé
Analyse des travaux scientifiques
Just as the rational numbers have a deficiency, no √2, so the real numbers have a
deficiency: no √−1. We can fix this by introducing the complex numbers. Here are
two definitions:
a. Algebra: We agree to play a game, where we write down algebraic expressions in
real numbers and in an abstract variable i, but we treat the symbol i according
to the following rule. Whenever we see two copies of i multiplied together (or
we see i²) appear anywhere, we are allowed to wipe them out and replace with
−1, and vice versa: when we see a −1 we can replace it with an i². We force by
hand the existence of a √−1 by forcing the abstract symbol i to be √−1.
b. Geometry: We draw the usual x, y axes on the plane. We pretend that each
point of the plane represents a single number, which we write as z = x + yi to
represent the point (x, y). If we have two points, z = x + yi and Z = X + Y i,
we add them by the rule:
z + Z = (x + X) + (y + Y )i
and multiply them by the rule:
zZ = (xX − yY ) + (xY + yX)i.
For example,
(2 + 4i)(3 − i) = (2 · 3 + 4 · 1) + (−2 + 4 · 3)i = 10 + 10i.
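The two definitions can be compared directly in plain Python, whose built-in complex type writes i as j:

```python
# the multiplication rule, checked against Python's built-in complex type
x, y, X, Y = 2, 4, 3, -1                  # (2 + 4i)(3 - i)
assert (x*X - y*Y, x*Y + y*X) == (10, 10)
assert (2+4j) * (3-1j) == 10+10j          # Python writes i as j
```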
It isn’t at all clear that these two games we can play arrive at the same result. Note
that the rule for multiplication seems to come out of the air in the geometric theory,
but in the algebraic theory it is just what you get by expanding out both sides and
insisting that i2 = −1. Algebra is in this respect clearly superior. But the geometric
approach makes very precise what sort of objects we are really dealing with, so we
will use it as our definition.
Just like with rational numbers, if a complex number z = x + yi is not zero, or in
other words x and y are not both zero, then it has a reciprocal

    1/z = (x − yi)/(x² + y²).
7.8 Check that this is a reciprocal. Why is it defined?
The complex numbers are in a very strong sense free of the deficiencies of the
rational and real numbers:
Theorem 7.3 (The fundamental theorem of algebra). Every nonconstant polynomial
function

    p(z) = a_0 + a_1 z + a_2 z² + · · · + a_n z^n

with complex number coefficients a_0, a_1, . . . , a_n has a complex root, i.e. a complex
number z = z_0 so that p(z_0) = 0.
The proof of the theorem uses analysis, but we want to focus on algebra; see [6] p.
513 chapter 2 theorem 2 for a complete proof. For every complex polynomial problem,
there is an answer that is perhaps unclear, complex and right.
Complex conjugation is the operation taking z = x + yi to z̄ ..= x − iy.
7.9 Prove that a complex number z is a real number just when z = z̄.
7.10 Prove that addition, subtraction, multiplication and division of complex numbers
passes through conjugation:

    (z + w)‾ = z̄ + w̄,   (z − w)‾ = z̄ − w̄,   (zw)‾ = z̄ w̄,   (z/w)‾ = z̄/w̄,

for any complex numbers z and w. Explain by induction why, for any polynomial p(z)
with real number coefficients,

    p(z)‾ = p(z̄),

for any complex number z.
The norm or modulus of a complex number z = x + yi is the real number
|z| ..= √(x² + y²).

7.11 Prove that z z̄ = |z|² for any complex number z.
Sage
Sage represents numbers in two different ways: as exact symbolic expressions, or as a
finite decimal approximation (often called a numeric or floating point representation).
For example we compute 7/2 + 5/3 symbolically as

7/2+5/3

yielding 31/6.
We can also play with irrational numbers symbolically: represent √27 as

sqrt(27)

which sage simplifies to yield 3√3.
We will never use decimal approximations in this class (please don’t), but here is
how you can use them if you wish. By writing out a number with a decimal point, we
are asking sage to approximate numerically, keeping only some number of decimals
accuracy:
sqrt(27.0)
yields 5.19615242270663. If we want to ask sage to give us a particular number of
decimal places accuracy in a decimal approximation of a symbolic expression, say 50
decimals:
numerical_approx(pi, digits=50)
yields
3.1415926535897932384626433832795028841971693993751.
To use complex numbers, try
z=3+4*i
real_part(z)
yielding 3, the real part, while imag_part(z) yields 4.
To count the number of sign changes in a polynomial, it is usually easy to spot
by eye, but we can also write sage code. Each polynomial p has an associated list of
coefficients p.list(). This list has the coefficients in order of increasing degree, with
zero entries in the list at zero coefficients. For example, if p=t^5-2*t+7 then p.list()
yields [7, −2, 0, 0, 0, 1]. We will travel along the coefficients one by one, with a for loop.
code (which we explain below):
def descartes(p):
    sign = 0
    sign_changes = 0
    for c in p.list():
        if c != 0:
            if sign == 0:
                if c < 0:
                    sign = -1
                else:
                    sign = 1
            else:
                if c*sign < 0:
                    sign_changes = sign_changes + 1
                    sign = -sign
    return sign_changes
We store the sign of the last nonzero coefficient in a variable sign, which is set to
zero initially to mean that we haven’t yet encountered a nonzero coefficient. When we
encounter our first nonzero coefficient, we just set sign to its sign. But if we find
any subsequent nonzero coefficient, we compare whether it has the same sign as sign,
and if it has an opposite sign, i.e. if c*sign<0, then we count a new sign change.
Now calling the function:
t = var(’t’)
descartes(t^(79)-4*t^(17)+t^(10)+t^6+t-1)
yields 3.
Chapter 8
Polynomials
I was x years old in the year x².
— Augustus De Morgan (when asked his age)
A polynomial in a variable x is an expression like

    1,   x + 1,   5 + 3x² − 2x,

a finite sum of terms, each the product of a number (the coefficient of the term), and
a nonnegative integer power of x (the degree of the term). We define the degree of
a constant to be 0, except that the degree of 0 is defined to be −∞. We can add
together coefficients of terms of the same degree:

    2x² + 7x² + x = (2 + 7)x² + x = 9x² + x.
We usually arrange that the terms decrease in degree, writing 1 + 5x + x² as x² + 5x + 1.
When we write a polynomial this way, the degree is the largest degree of any term. We
might ask the coefficients to be integers, rational numbers, real numbers, or complex
numbers (in which case, we usually use z as the name of the variable, instead of
x). Bizarrely, we might even allow the coefficients to be remainders modulo some
integer. As soon as we fix on any of these choices of coefficients, the addition laws,
multiplication laws, and distributive law (familiar from chapter 1) are immediately
obvious, just writing b(x) as a polynomial, instead of an integer b, and so on, and
changing the word integer to polynomial. We will ignore the sign laws and the law of
well ordering.
Let’s work modulo 2, in honour of George Boole. Let p(x) ..= x² + x. To be
very careful, when we write x² + x, we mean 1̄x² + 1̄x, i.e. the coefficients are
remainders. This polynomial is not zero, because its coefficients are not zero.
Danger: if we ask that x also be a remainder modulo 2, we might want to
think of this polynomial as a function of a variable x. But then that function
is zero, because if we let x = 0̄ then

    p(0̄) = 1̄ · 0̄² + 1̄ · 0̄ = 0̄

and if we let x = 1̄ then

    p(1̄) = 1̄ · 1̄² + 1̄ · 1̄ = 0̄

modulo 2. So a polynomial with coefficients in remainders modulo an integer
is not just a function. It is a purely algebraic object.
8.1 Work modulo 5: for which remainders a can we find a remainder b so that a = b²?
In other words, which elements have square roots?
The extended Euclidean algorithm
You are familiar with long division of polynomials. For example, to divide x − 4 into
x⁴ − 3x + 5, we have to order the terms in decreasing degrees as usual, and add in
zero terms to get all degrees to show up, then

            x³ +  4x² + 16x +  61
          ______________________________
    x − 4 ) x⁴ + 0x³ +  0x² −  3x +   5
            x⁴ − 4x³
            _________
                 4x³ +  0x² −  3x +   5
                 4x³ − 16x²
                 ___________
                       16x² −  3x +   5
                       16x² − 64x
                       __________
                               61x +  5
                               61x − 244
                               _________
                                     249

The quotient is x³ + 4x² + 16x + 61 and the remainder is 249. Summing up, this
calculation tells us that

    x⁴ − 3x + 5 = (x − 4)(x³ + 4x² + 16x + 61) + 249.

We stop when the remainder has smaller degree than the expression we want to divide
by, in this case when 249 has degree zero, smaller than the degree of x − 4.
In this simple problem, we never needed to divide. But in general, to divide b(x)
by c(x), we might need to divide coefficients of b(x) by those of c(x) at some stage.
Therefore from now on, we will restrict the choice of coefficients to ensure that we can
always divide. For example, we can carry out long division of any two polynomials
with rational coefficients, and we always end up with a quotient and a remainder,
which are themselves polynomials with rational coefficients. The same is true if we
work with real number coefficients, or complex number coefficients. But if we work
with integer coefficients, the resulting quotient and remainder might only have rational
coefficients.
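The long division above is mechanical enough to sketch in a few lines of plain Python, with polynomials as coefficient lists, highest degree first. This sketch assumes the divisor's leading coefficient is invertible, which is exactly the point of restricting the coefficients to a field:

```python
def poly_divmod(b, c):
    # b, c: coefficient lists, highest degree first; the leading
    # coefficient of c must be invertible, as it is over a field
    b = b[:]
    q = [0] * (len(b) - len(c) + 1)
    for i in range(len(q)):
        q[i] = b[i] / c[0]               # next quotient coefficient
        for j, cj in enumerate(c):
            b[i + j] -= q[i] * cj        # subtract q[i] * c shifted by i
    return q, b[len(q):]                 # (quotient, remainder)

# divide x^4 - 3x + 5 by x - 4
q, r = poly_divmod([1, 0, 0, -3, 5], [1, -4])
assert q == [1, 4, 16, 61] and r == [249]
```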
8.2 Determine the greatest common divisor of b(x) = x³ + 4x² + x − 6 and c(x) =
x⁵ − 6x + 5 and the associated Bézout coefficients s(x), t(x).
It is trickier to specify the restrictions on coefficients if they are to be remainders
modulo an integer. From now on, we will only work with remainders modulo a prime.
The reason is that every nonzero remainder modulo a prime has a reciprocal, so we
can always divide coefficients.
8.3 Prove that every nonzero remainder modulo a prime has a reciprocal modulo
that prime.
With these restrictions on coefficients, we can immediately generalize the Euclidean
and extended Euclidean algorithms, and compute Bézout coefficients and least
common multiples, using the same proofs, so we won’t repeat them.
Take the polynomials b(x) = 3 + 2x − x² and c(x) = 12 − x − x². We
want to find the Bézout coefficients and the greatest common divisor among
polynomials over the rational numbers. At each step, force down the degree
of one of the polynomials.

    ( 1   0 | 3 + 2x − x²  )
    ( 0   1 | 12 − x − x²  )

add −(row 2) to row 1,

    ( 1  −1 | −9 + 3x      )
    ( 0   1 | 12 − x − x²  )

add (x/3)(row 1) to row 2,

    ( 1     −1        | −9 + 3x )
    ( x/3   1 − x/3   | 12 − 4x )

add (4/3)(row 1) to row 2:

    ( 1           −1           | −9 + 3x )
    ( x/3 + 4/3   −1/3 − x/3   | 0       )

The equation of the Bézout coefficients is therefore

    1 · (3 + 2x − x²) − 1 · (12 − x − x²) = −9 + 3x = 3(x − 3).
The greatest common divisor, once we divide away any coefficient in front of
the highest term, is x − 3. A monic polynomial is one whose highest order
term has coefficient 1: the greatest common divisor will always be taken to
be a monic polynomial, and then there is a unique greatest common divisor.
Fix a choice of whether we are working with rational, real or complex coefficients
or if our coefficients will be remainders modulo a prime. We will say this choice is a
choice of field, and our polynomials are said to live over that field. A polynomial over
a chosen field is irreducible if it does not split into a product a(x)b(x) of polynomials
over the same field, except for having a(x) or b(x) constant. The same proof for
prime numbers proves that a polynomial over a field admits a factorization into a
product of irreducibles over that field, unique up to reordering the factors and perhaps
multiplying any of the factors by a nonzero constant.
Let’s work over the remainders modulo 2. Notice that 2x = 0x = 0, and
similarly that −1 = 1 so −x² = x². Let’s find the Bézout coefficients of
b(x) = x³ + x and c(x) = x²:

    ( 1  0 | x³ + x )
    ( 0  1 | x²     )

add x(row 2) to row 1,

    ( 1  x | x  )
    ( 0  1 | x² )

add x(row 1) to row 2,

    ( 1  x      | x )
    ( x  1 + x² | 0 )

The Bézout coefficients are 1, x, and the greatest common divisor is x:

    1 · (x³ + x) + x · x² = x.
Factoring
Proposition 8.1. Take any constant c and any polynomial p(x) over any field. Then
p(x) has remainder p(c) when divided by x − c.

Proof. Take quotient and remainder: p(x) = (x − c)q(x) + r(x), using the Euclidean
algorithm. The degree of the remainder is smaller than the degree of x − c, so must
be zero or −∞. So the remainder is a constant, say r₀. Plug in x = c: p(c) =
(c − c)q(c) + r₀ = r₀.
Corollary 8.2. A polynomial p(x) (with coefficients, as described above, rational or
real or complex or in the remainders modulo a prime number) has a root at x = c,
i.e. p(c) = 0, for some constant c, just when the linear polynomial x − c divides
into p(x).
8.4 Working over any field, show that the solutions of the quadratic equation

    0 = ax² + bx + c

(where a, b, c are constants and a ≠ 0) are precisely the numbers x so that

    (x + b/(2a))² = b²/(2a)² − c/a.

Show that every constant k has either two square roots, which we denote as ±√k, or
one square root (if 2 = 0 in our field or if k = 0) or has no square roots. Conclude
that the solutions of the quadratic equation, over any field, are

    x = (−b ± √(b² − 4ac)) / (2a),

just when the required square roots exist.
8.5 Prove that a quadratic or cubic polynomial (in other words of degree 2 or 3) over
any field is reducible just when it has a root.
8.6 Prove that the polynomial x³ + x + 1 is irreducible over the field of remainders
modulo 2. By writing down all quadratic polynomials over that field, and factoring
all of them but x² + x + 1, show that x² + x + 1 is the only irreducible quadratic
polynomial over that field.
The polynomial x³ + x + 1 is irreducible over the field of remainders modulo
2, as we saw in the previous problem. But then, if it were reducible over the
integers, say as a(x)b(x), then reducing the coefficients modulo 2 would give a
factorization ā(x)b̄(x) over the field of remainders modulo 2, a contradiction.
Therefore x³ + x + 1 is irreducible over the integers.
Corollary 8.3. Every polynomial over any field has at most one factorisation into
linear factors, up to reordering the factors.
Proof. The linear factors are determined by the roots. Divide off one linear factor
from each root, and apply induction on the degree of the polynomial to see that the
number of times each linear factor appears is uniquely determined.
Factoring integer coefficient polynomials
A powerful trick for guessing rational roots of integer coefficient polynomials:
Lemma 8.4. If a polynomial p(x) with integer coefficients has a rational root n/d,
with n and d coprime integers, then the numerator n divides the coefficient of the
lowest term of p(x), while the denominator d divides the coefficient of the highest
term of p(x).
Proof. Write out

    p(x) = a_k x^k + a_(k−1) x^(k−1) + · · · + a_1 x + a_0,

so that these integers a_0, a_1, . . . , a_k are the coefficients of p(x). Plug in:

    0 = a_k (n/d)^k + a_(k−1) (n/d)^(k−1) + · · · + a_1 (n/d) + a_0.

Multiply both sides by d^k:

    0 = a_k n^k + a_(k−1) n^(k−1) d + · · · + a_1 n d^(k−1) + a_0 d^k.

All terms but the last term contain a factor of n, so n divides into the last term. But
n and d are relatively prime, so n divides into a_0. The same trick backwards: all
terms but the first term contain a factor of d, so d divides into the first term. But n
and d are relatively prime, so d divides into a_k.
The polynomial p(x) = x³ − 3x − 1, if it had a rational root n/d, would
need to have n divide into −1, while d would divide into 1. So the only
possibilities are n/d = 1 or n/d = −1. Plug in to see that these are not roots,
so x³ − 3x − 1 has no rational roots.
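This candidate search can be automated; a plain-Python sketch using Fraction, with coefficients listed from lowest degree. It brute-forces every divisor pair, which is fine for small coefficients:

```python
from fractions import Fraction

def rational_roots(coeffs):
    # coeffs: integer coefficients a0, a1, ..., an by increasing degree,
    # with a0 and an nonzero; try every candidate ±n/d with n | a0, d | an
    def divisors(m):
        m = abs(m)
        return [k for k in range(1, m + 1) if m % k == 0]
    roots = set()
    for n in divisors(coeffs[0]):
        for d in divisors(coeffs[-1]):
            for r in (Fraction(n, d), Fraction(-n, d)):
                if sum(c * r**k for k, c in enumerate(coeffs)) == 0:
                    roots.add(r)
    return roots

# x^2 - 3x + 2 = (x - 1)(x - 2), while x^3 - 3x - 1 has no rational roots
assert rational_roots([2, -3, 1]) == {1, 2}
assert rational_roots([-1, -3, 0, 1]) == set()
```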
8.7 Using lemma 8.4, find all rational roots of
a. x² − 5,
b. x² − 3x − 5,
c. x¹⁰⁰⁰ − 3x − 5,
d. x²/3 − x − 5/3.
A root c of a polynomial p(x) has multiplicity k if (x − c)^k divides p(x).
Corollary 8.5. Over any field, the degree of a polynomial is never less than the sum
of the multiplicities of its roots, with equality just when the polynomial is a product of
linear factors.
Proof. Degrees add when polynomials multiply, by looking at the leading term.
Sage
To construct polynomials in a variable x, we first defined the variable, using the
strange expression:
x = var(’x’)
We can then solve polynomial equations:
solve(x^2 + 3*x + 2, x)
yielding [x == -2, x == -1]. To factor polynomials:
x=var(’x’)
factor(x^2-1)
yields (x + 1)*(x - 1). To use two variables x and y:
x, y = var(’x, y’)
solve([x+y==6, x-y==4], x, y)
yielding [[x == 5, y == 1]]. We can take greatest common divisor:
x=var(’x’)
b=x^3+x^2
c=x^2+x
gcd(b,c)
yielding x² + x. For some purposes, we will need to tell sage which field the coefficients
of our polynomials come from. We can get the quotient and remainder in polynomial
division by
R.<x> = PolynomialRing(QQ)
b=x^3+x^2
c=x^2+x
b.quo_rem(c)
yielding (x, 0). To find Bézout coefficients,
xgcd(x^4-x^2,x^3-x)
yields (x³ − x, 0, 1).
The expression R.<x> = PolynomialRing(QQ) tells sage to use polynomials in a
variable x with coefficients being rational numbers. The set of all rational numbers is
called Q, written QQ when we type into a computer. To work with coefficients that
are integers modulo a prime, for example with integers modulo 5:
R.<x> = PolynomialRing(GF(5))
b=x^3+x^2+1
c=x^2+x+1
b.quo_rem(c)
yields (x, 4x + 1). The set of all integers modulo a prime p is sometimes called GF(p).
We can define our own function to calculate Bézout coefficients, which does exactly
what the xgcd() function does on polynomials of one variable:
def bezpoly(b,c):
    # Extended Euclidean algorithm: returns (g,p,q) with g = gcd(b,c) = p*b + q*c.
    p,q,r=1,0,b
    s,t,u=0,1,c
    while u!=0:
        if r.degree()>u.degree():
            p,q,r,s,t,u=s,t,u,p,q,r
        Q=u//r
        s,t,u=s-Q*p,t-Q*q,u-Q*r
    return (r,p,q)
Then the code
R.<t> = PolynomialRing(QQ)
f=(2*t+1)*(t+1)*(t-1)
g=(2*t+1)*(t-1)*t^2
bezpoly(f,g)
returns (2t² − t − 1, −t + 1, 1).
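We can spot-check this output numerically in plain Python (no Sage needed): two polynomials of degree at most 3 that agree at more than 3 points are equal, so checking the Bézout identity at several integers confirms it.

```python
# f = (2t+1)(t+1)(t-1), g = (2t+1)(t-1)t^2; the claimed identity is
# (-t + 1)*f(t) + 1*g(t) == 2t^2 - t - 1, the gcd of f and g.
def f(t): return (2*t + 1) * (t + 1) * (t - 1)
def g(t): return (2*t + 1) * (t - 1) * t**2

for t in range(-5, 6):
    assert (-t + 1) * f(t) + g(t) == 2 * t**2 - t - 1
print("Bezout identity confirmed at 11 sample points")
```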
Chapter 9
Factoring polynomials
Silva: You’ve got many refinements. I don’t think you need to worry
about your failure at long division. I mean, after all, you got through
short division, and short division is all that a lady ought to cope with.
— Tennessee Williams
Baby Doll and Tiger Tail, Act I, Scene 2
Factoring with integer coefficients
Proposition 9.1. Working with coefficients that are remainders modulo a prime, if
0 = a(x)b(x) for two polynomials a(x), b(x), then either a(x) = 0 or b(x) = 0.
Proof. The highest term of the product is the product of the highest terms, so has
coefficient the product of the coefficients of the highest terms. In problem 5.13 on
page 32, we saw that if a product of remainders modulo a prime vanishes (modulo
that prime), then one of the factors is zero. So the coefficient of the highest term of
either a(x) or of b(x) is zero. But then by definition that isn’t the highest term.
Lemma 9.2 (Gauss’s lemma). Take a polynomial p(x) with integer coefficients. Suppose that we can factor it as p(x) = b(x)c(x) into polynomials b(x), c(x) with rational
coefficients. Then we can rescale each of b(x) and c(x), by multiplying by nonzero
rational numbers, to arrange that p(x) = b(x)c(x) still but now b(x) and c(x) have
integer coefficients.
Proof. The idea is to just clear denominators. First, take a least common denominator
d for all of the coefficients of b(x) and c(x), and scale both sides with it, so that now we
have an equation d p(x) = b(x)c(x) with new polynomials b(x), c(x), but with integer
coefficients. Expand d into its prime factorization, say
d = p1 p2 . . . pn ,
where some of these primes might appear several times in the factorization. Quotient
out the coefficients of both sides of d p(x) = b(x)c(x) by the prime p1 to see 0 =
b̄(x)c̄(x). So then one of b̄(x) or c̄(x) vanishes (in other words, all of its coefficients
vanish). Say, for example, b̄(x) = 0. But then p1 divides into all coefficients of b(x),
so we can divide both sides of d p(x) = b(x)c(x) by p1 . Repeat until you have divided
away all of the prime factors in d.
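The denominator-clearing step can be sketched in plain Python (our own helper, not from the text, using exact rational arithmetic): scale a rational coefficient list to a coprime ("primitive") integer coefficient list, the normal form used throughout this argument.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm

def primitive_integer_form(coeffs):
    """Given Fraction coefficients, return (scale, ints) with coprime integer
    coefficients ints, so the original polynomial is scale times the integer one."""
    d = reduce(lcm, (c.denominator for c in coeffs), 1)  # clear denominators
    ints = [int(c * d) for c in coeffs]
    g = reduce(gcd, ints)                                # pull out the common factor
    return Fraction(g, d), [c // g for c in ints]

# (3/4) + (3/2)x = (3/4) * (1 + 2x)
print(primitive_integer_form([Fraction(3, 4), Fraction(3, 2)]))
```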
A polynomial with integer coefficients is irreducible if it does not split into a
product b(x)c(x) except with b(x) = ±1 or c(x) = ±1. (Note that this notion of irreducibility
is different from being irreducible over a field.)
Corollary 9.3. Suppose that p(x) is a polynomial with coprime integer coefficients.
Then it is irreducible as an integer coefficient polynomial just when it is irreducible as
a rational coefficient polynomial.
Proof. By Gauss’ Lemma above, if p(x) factors into rational polynomials, then it
factors into integer polynomials. Conversely, since p(x) has coprime coefficients, if
p(x) factors into integer polynomials p(x) = b(x)c(x) then neither b(x) nor c(x) can
be constant polynomials.
We saw in example 8 on page 61 that x³ − 3x − 1 has no rational roots. If it
is reducible over the field of rational numbers, then it has a rational root; see
problem 8.5 on page 60. Therefore it is irreducible over the field of rational
numbers. Its coefficients are coprime, so it is irreducible over the integers.
Theorem 9.4. Every nonzero integer coefficient polynomial has a factorization into
irreducible integer coefficient polynomials, unique up to the order in which we write
down the factors and up to perhaps multiplying factors by −1.
Proof. Clearly we can assume that p(x) is not constant.
Let d be the greatest common divisor of the coefficients of p(x), so that p(x) =
d P (x), where the coefficients of P (x) are coprime. Since d factors uniquely into
primes, it suffices to prove that P (x) can be factored uniquely into irreducibles. Thus
we may assume that coefficients of p(x) are coprime.
Factor p(x) into irreducibles over the field of rational numbers. By Gauss’ Lemma,
such a factorization yields a factorization of p(x) into integer coefficient factors, whose
factors are constant rational number multiples of the rational coefficient factors. But
we want to check that the factors remain irreducible. Since the coefficients of p(x) are
coprime, the coefficients in each of these factors are also coprime: a common divisor
would pull out of the whole factorization. By corollary 9.3, each factor is irreducible
over the field of rational numbers. This shows that p(x) can be written as a finite
product of integer coefficient polynomials, each irreducible over the field of rational
numbers, so irreducible as an integer coefficient polynomial.
Suppose that we have two factorizations of p(x) into irreducible integer coefficient
polynomials. Recall that the factorization over the field of rationals is unique up to
reordering factors and scaling factors by nonzero constant rational numbers. Therefore
our two factorizations are each obtained from the other by such tricks. So if we take
out one of the factors P (x) from one factorization, and the corresponding factor Q(x)
from the other, then Q(x) = (a/b)P(x) where a, b are integers. The coefficients of P(x)
are coprime integers, and so are those of Q(x). So bQ(x) = aP(x), and taking greatest
common divisors, we find |b| = |a|. So P(x) = ±Q(x).
Sage can test to see if a polynomial is irreducible:
R.<x> = PolynomialRing(QQ)
b=x^3+x+2
b.is_irreducible()
yields False. Indeed factor(b) yields (x + 1) · (x2 − x + 2). To work over the finite
field with 2 elements:
R.<x> = PolynomialRing(GF(2))
b=x^3+x+2
b.is_irreducible()
yields False, while factor(b) yields x · (x + 1)2 .
Eisenstein’s criterion: checking that there are no more factors
Proposition 9.5 (Eisenstein’s criterion). Take a polynomial
q(x) = an xⁿ + an−1 xⁿ⁻¹ + · · · + a1 x + a0
with integer coefficients a0 , a1 , . . . , an . Suppose that there is a prime p so that
a. p does not divide the highest degree coefficient an and
b. p divides all of the other coefficients and
c. p2 does not divide the lowest coefficient a0 .
Then q(x) is irreducible over the field of rational numbers.
Proof. By corollary 9.3 on the preceding page, if q(x) factors over the field of rational
numbers, then it factors over the integers, say
an xn + an−1 xn−1 + · · · + a0 = (br xr + · · · + b0 ) (cs xs + · · · + c0 )
with integer coefficients
b0 , b 1 , . . . , b r , c 0 , c 1 , . . . , c s .
Since p, but not p², divides the lowest coefficient a0 = b0 c0, p divides exactly one of
b0 , c0 , say b0 . Now from the equation
a1 = b0 c1 + b1 c0 ,
and the fact that p divides all but the highest degree coefficient an , we see that p
divides b1 . From the equation
a2 = b0 c2 + b1 c1 + b2 c0 ,
we see that p divides b2. By induction, we find that p divides all coefficients
b0, b1, . . . , br. In particular p divides br, so p divides an = br cs.
This contradicts the condition that p does not divide an.
The polynomial x⁹ + 14x + 7 is irreducible by Eisenstein's criterion (take p = 7).
For any prime p and positive integer d, the polynomial xᵈ − p satisfies Eisenstein's criterion, so is irreducible over the rational numbers. Therefore p has no rational dth root for any
degree d ≥ 2.
9.1 Give some examples of polynomials to which Eisenstein’s criterion applies.
Eisenstein’s criterion in sage
To check Eisenstein’s criterion, tell sage to work with integer coefficient polynomials:
R.<x> = PolynomialRing(ZZ)
Write a function to find a list of prime factors of any integer:
def prime_factors(x):
    x=abs(x)
    list=[]
    p=2
    while p<=x:
        if x%p==0:
            list=list+[p]
            while x%p==0:
                x=x//p
        p=p+1
    return list
Note that [] represents an empty list, and we add lists by concatenating them. So
prime_factors(-6) yields [2,3], the prime factors in order. Finally, we make a
function Eisenstein(b) to apply to integer coefficient polynomials, which returns
the lowest prime p for which Eisenstein’s criterion applies to our polynomial b(x), or
returns 0 if no such prime exists.
def Eisenstein(b):
    c=b.coefficients()
    highest=c.pop()
    possible_primes=prime_factors(gcd(c))
    for p in possible_primes:
        if (highest%p!=0) and (c[0]%(p^2)!=0):
            return p
    return 0
For example
Eisenstein(2*x^8+27*x^4+3*x^2+6)
yields 3. The expression c=b.coefficients() yields a list of coefficients of the polynomial, in order from lowest to highest degree. The expression c.pop() returns the
last element in the list, and at the same time deletes that element from the list. If
Eisenstein(b) is not 0, then b is irreducible by Eisenstein's criterion.
Factoring real and complex polynomials
Theorem 9.6. Every complex coefficient polynomial function

p(z) = a0 + a1 z + a2 z² + · · · + an zⁿ

of any degree n splits into a product of linear factors

p(z) = an (z − r1)(z − r2) . . . (z − rn).
In particular, a complex coefficient polynomial is irreducible just when it is linear.
Proof. Apply the fundamental theorem of algebra to find a root r1 , and then divide
p(z) by z − r1 and apply induction.
Theorem 9.7. Every real coefficient polynomial function

p(x) = a0 + a1 x + a2 x² + · · · + an xⁿ

of any degree n splits into a product of real linear and quadratic factors

p(x) = an (x − r1)(x − r2) . . . (x − rk) q1(x)q2(x) . . . qℓ(x),

where qj(x) = x² + bj x + cj is quadratic with no real roots. In particular, a real
coefficient polynomial is irreducible just when it is linear or quadratic ax² + bx + c,
with a ≠ 0 and with b² − 4ac < 0.
Proof. If the polynomial p(x) has a real root, divide it off and apply induction on the
degree of p(x). So we can assume p(x) has no real roots. Take a complex root, say
z1 = x1 + y1 i. Because all coefficients of p(x) are real, the complex conjugate of p(z)
is p(z̄) for any complex number z. In particular, if z1 = x1 + y1 i is a root, then
z̄1 = x1 − y1 i is also a root. So p(z) is divisible by

(z − z1)(z − z̄1) = z² − 2x1 z + |z1|²,

a quadratic function with real coefficients. Divide this factor off and apply induction
on the degree of p(x).
Partial fractions
Lemma 9.8. Take any rational function p(x)/q(x) over any field. Suppose that its
denominator factors, say into coprime factors q(x) = u(x)v(x). Then it splits into a sum

p(x)/q(x) = t(x)p(x)/u(x) + s(x)p(x)/v(x),

where s(x), t(x) are the Bézout coefficients of u(x), v(x).
Proof. Write out s(x)u(x) + t(x)v(x) = 1 and then multiply both sides by p(x) and
divide both sides by q(x).
A partial fraction is a rational function b(x)/c(x)ⁿ where b(x) has smaller degree than
c(x) and c(x) is irreducible. A partial fraction decomposition of a rational function
p(x)/q(x) is an expression as a sum of a polynomial function together with finitely many
partial fractions, so that the denominators of the partial fractions multiplied together
divide q(x).
Theorem 9.9. Every rational function over any field has a partial fraction decomposition, unique up to reordering the partial fractions.
Proof. Just repeat the process described in lemma 9.8 on the preceding page.
• 4/(x² + 2x − 3) = −1/(x + 3) + 1/(x − 1).
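Identities like the first one are easy to spot-check numerically; here is a quick plain-Python check with exact rational arithmetic (any sample points away from the poles x = −3 and x = 1 will do):

```python
from fractions import Fraction

# Check 4/(x^2 + 2x - 3) = -1/(x + 3) + 1/(x - 1) at several rational points.
for x in [Fraction(2), Fraction(5, 3), Fraction(-7, 2), Fraction(10)]:
    lhs = 4 / (x**2 + 2 * x - 3)
    rhs = -1 / (x + 3) + 1 / (x - 1)
    assert lhs == rhs
print("partial fraction identity confirmed at sample points")
```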
• Over the field of real numbers, the rational function

p(x)/q(x) = (x⁹ − 2x⁶ + 2x⁵ − 7x⁴ + 13x³ − 11x² + 12x − 4)/(x⁷ − 3x⁶ + 5x⁵ − 7x⁴ + 7x³ − 5x² + 3x − 1)

has partial fraction decomposition

p(x)/q(x) = x² + 3x + 4 + 1/(x − 1) + 1/(x − 1)³ + 1/(x² + 1) + (x + 1)/(x² + 1)².

Note that we cannot get rid of the square in the last denominator,
because we have to allow powers. Also note that if we allow complex
numbers here instead of real numbers, then we can factor x² + 1 =
(x − i)(x + i), so there is a different partial fraction decomposition over
the complex numbers, with lower degrees in denominators.
In sage:

x = var('x')
f = 4/(x^2+2*x-3)
f.partial_fraction(x)

yields −1/(x + 3) + 1/(x − 1).
Integrals
Corollary 9.10. Every rational function f(x) with real coefficients has indefinite
integral ∫ f(x) dx a polynomial in functions of the form

x, 1/(x − c), log(x − c), log(x² + bx + c), arctan(x − c),

for finitely many constant numbers b, c.
Proof. The partial fraction decomposition of f(x) is a sum of a polynomial and some
partial fractions. The integral of a polynomial is a polynomial, and the integral of
a sum is a sum, so it is enough to prove the result assuming that f(x) is a single
partial fraction p(x)/q(x)ⁿ. Write out p(x) as a sum of terms, and thereby break up the
integral, so it is enough to assume that p(x) = xᵏ for some integer k ≥ 0. Since q(x)
is irreducible, it is linear or quadratic with no real roots.

Suppose that q(x) is linear and n = 1. Since the degree of p(x) is smaller than
that of q(x), p(x) is a constant, and we can rescale to get p(x) = 1 and q(x) monic,
and translate in x by a constant to get q(x) = x:

∫ f(x) dx = ∫ dx/x = log x + C.
Suppose that q(x) is quadratic and n = 1. Rescale to get q(x) and p(x) monic.
Since the degree of p(x) is smaller than that of q(x), p(x) is a constant or linear. This
q(x) has two complex roots, say z1 and z̄1. Translate x by a real number constant to
get z1 to have vanishing real part: z1 = y1 i. Rescale variables to get y1 = 1, so that
q(x) = x² + 1. If p(x) is linear, say p(x) = x + c, then

p(x)/q(x) = x/q(x) + c · 1/q(x),

so we can assume that p(x) = x or p(x) = 1. So then

f(x) = x/(x² + 1) or f(x) = 1/(x² + 1),

and correspondingly

∫ f(x) dx = log(x² + 1)/2 + C, or ∫ f(x) dx = arctan(x) + C.
If q(x) = (x − c)ⁿ with n ≥ 2, translate to get q(x) = xⁿ so that we can reduce
the problem to

∫ dx/xⁿ = x¹⁻ⁿ/(1 − n) + C.

If q(x) is an irreducible quadratic, raised to a power n ≥ 2, translation gives us
q(x) = 1 + x² as above. If p(x) = 1, then we use

∫ dx/(1 + x²)ⁿ⁺¹ = x/(2n(1 + x²)ⁿ) + ((2n − 1)/2n) ∫ dx/(1 + x²)ⁿ,

by induction. If p(x) = x²ᵏ⁺¹ is an odd power, then use the substitution u = x², to reduce
the problem down by induction. In general, differentiate the expression

xᵏ/(1 + x²)ⁿ

and integrate both sides to find

xᵏ/(1 + x²)ⁿ + C = −2n ∫ xᵏ⁺¹ dx/(1 + x²)ⁿ⁺¹ + k ∫ xᵏ⁻¹ dx/(1 + x²)ⁿ.

This enables us, by induction, to lower the order of the numerator, at the cost of
raising the order of the denominator, to reduce the integration to a previous case.
Sage can integrate:
integral((1/x)+(x-1)/x^3,x)
yields −(2x − 1)/(2x²) + log(x).
Several variables
Over any finite field, the polynomial

q(t) = ∏_c (t − c)

(where the product is over all constants c in the field) vanishes for any value of the
variable t in that field.
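For example, over the integers modulo 5, q(t) = t(t − 1)(t − 2)(t − 3)(t − 4); a quick plain-Python check that it vanishes at every element of the field:

```python
p = 5  # any prime will do here

def q(t):
    """The product of (t - c) over all c in the field of integers mod p."""
    result = 1
    for c in range(p):
        result = (result * (t - c)) % p
    return result

assert all(q(t) == 0 for t in range(p))
print("q vanishes at every element of the field with", p, "elements")
```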
Lemma 9.11. Take a nonzero polynomial in several variables, over a field k, that
vanishes for all values of those variables. Then the field k is finite and the polynomial
is expressible as

p(x) = Σ_{i=1}^{n} pᵢ(x) q(xᵢ),

where x = (x1, x2, . . . , xn) and each pᵢ(x) is a polynomial and

q(t) = ∏_c (t − c)

is our polynomial that vanishes for all values of t in k. In particular, p(x) has degree
at least equal to the number of elements in the field in at least one of the variables.
Proof. In one variable, we can factor each root as in corollary 8.2 on page 60. Suppose
we have two variables x, y and a polynomial p(x, y). Set y to zero, and find that
by induction, the resulting polynomial p(x, 0) is divisible as required by q(x), say
p(x, 0) = q(x)p1 (x). So p(x, y) = q(x)p1 (x) + yh(x, y), say. It is good enough to prove
the result for yh(x, y) and add to q(x)p1 (x). So we can assume that p(x, y) = yh(x, y).
Moreover, since p(x, y) vanishes for all x, y values in our field, h(x, y) vanishes for all
x, y values in our field as long as y 6= 0.
Define a polynomial δ(t) by

δ(t) = ∏_{c≠0} (1 − t/c),
where the product is over all nonzero constant elements c in our field. Check that
δ(0) = 1 while δ(b) = 0 for any nonzero element b of our field. Therefore
h(x, y) − δ(y)h(x, 0)
vanishes for all values of x, y in our field. By induction on degree, we can write
h(x, y) − δ(y)h(x, 0) = h1 (x, y)q(x) + h2 (x, y)q(y).
Plugging back into p(x, y) gives the result.
A linear function is a polynomial of the form
f (x1 , x2 , . . . , xn ) = a1 x1 + a2 x2 + · · · + an xn .
If not all of the coefficients a1 , a2 , . . . , an are zero, the set of points x = (x1 , x2 , . . . , xn )
at which f (x) = 0 is called a linear hyperplane.
Lemma 9.12. Suppose that in variables x = (x1 , x2 , . . . , xn )
a. p(x) is a polynomial and
b. f (x) is a nonzero linear function and
c. p(x) = 0 at every point x where f (x) = 0 and
d. the field we are working over contains more elements than the degree of p in
any variable.
Then f divides p, i.e. there is a polynomial q(x) so that p(x) = q(x)f (x).
Proof. By a linear change of variables, we can arrange that f (x) = x1 . Expand p(x)
in powers of x1 . If there is a “constant term”, i.e. a monomial in x2 , x3 , . . . , xn ,
then setting x1 = 0 we will still have some nonzero polynomial in the variables
x2 , x3 , . . . , xn . But this polynomial vanishes for all values of these variables, so is zero
by lemma 9.11 on the facing page.
Factorization over rational functions and over polynomials
Proposition 9.13 (Gauss’ lemma). Suppose that p(x, y) is a polynomial in two
variables over a field, and that p(x, y) factors as p(x, y) = b(x, y)c(x, y), where b(x, y)
and c(x, y) are polynomial in x with coefficients rational functions of y. Then p(x, y)
also factors in polynomials in x, y. To be precise, there are nonzero rational functions
s(y) and t(y) so that B(x, y) := s(y)b(x, y) is a polynomial in x, y and C(x, y) :=
t(y)c(x, y) is a polynomial in x, y, and p(x, y) = B(x, y)C(x, y).
Proof. The coefficients in x on the right hand side of the equation p(x, y) = b(x, y)c(x, y)
are rational in y, hence are quotients of polynomials in y. Multiplying through by a
common denominator we obtain an equation
d(y)p(x, y) = B(x, y)C(x, y)
where B(x, y), C(x, y) are polynomials in x, y and d(y) is a nonzero polynomial. If
d(y) is constant, divide it into C(x, y), and the result is proven.
So we can assume that d(y) is nonconstant and write d(y) as a product of irreducible polynomials
d(y) = d1 (y)d2 (y) . . . dn (y).
Expanding out B(x, y) and C(x, y) into powers of x,

B(x, y) = Σ_j Bj(y)xʲ,  C(x, y) = Σ_j Cj(y)xʲ.

Then d1(y) divides into all of the terms in

B(x, y)C(x, y) = Σ_{j,k} Bj(y)Ck(y)x^{j+k}.
In particular, d1 (y) divides into the lowest term B0 (y)C0 (y), so into one of the factors,
say into B0 (y). Suppose that d1 (y) divides into all of the terms B0 (y), B1 (y), . . . , Bj−1 (y)
but not into Bj (y), and divides into all of the terms C0 (y), C1 (y), . . . , Ck−1 (y) but
not into Ck(y). Then d1(y) does not divide into the x^{j+k} term, a contradiction.
Therefore d1 (y) divides into all terms in B(x, y) (and hence divides into B(x, y)), or
divides into all terms in C(x, y) (and hence divides into C(x, y)). We cancel out a
copy of d1 (y) from both sides of the equation
d(y)p(x, y) = B(x, y)C(x, y)
and proceed by induction on the degree of d(y).
Chapter 10
Resultants and discriminants
The resultant
For any polynomial

b(x) = b0 + b1 x + · · · + bm xᵐ,

denote its vector of coefficients as the column vector

$$\vec b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_m \end{pmatrix}.$$
Clearly the vector of coefficients of xb(x) is the same, with an extra row added to the
top, with a zero in that row. When we multiply two polynomials b(x) and

c(x) = c0 + c1 x + · · · + cn xⁿ,

the vector of coefficients of b(x)c(x) is

$$\begin{pmatrix}
b_0 & & & \\
b_1 & b_0 & & \\
\vdots & b_1 & \ddots & \\
b_m & \vdots & \ddots & b_0 \\
 & b_m & & b_1 \\
 & & \ddots & \vdots \\
 & & & b_m
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{pmatrix}$$

(with zeroes represented as blank spaces), and this (m + n + 1) × (n + 1) matrix we
denote by [b]. So the vectors of coefficients are related by $\overrightarrow{bc} = [b]\,\vec c$.
The resultant of two polynomials
b(x) = b0 + b1 x + · · · + bm xm and
c(x) = c0 + c1 x + · · · + cn xn ,
denoted resb,c , is the determinant of the matrix [b] [c] , given by stacking the
matrices [b] and [c] of suitable sizes beside one another, to get a square matrix of size
(m + n) × (m + n). To remember the sizes: the number of columns in [b] is the degree
of c, while the number of columns in [c] is the degree of b.
• If

b(x) = 4 − x + 7x² + x³,  c(x) = −6 + 2x,

then resb,c is the determinant of the 4 × 4 matrix

$$\begin{pmatrix}
-6 & 0 & 0 & 4 \\
2 & -6 & 0 & -1 \\
0 & 2 & -6 & 7 \\
0 & 0 & 2 & 1
\end{pmatrix},$$

which is resb,c = −728.
• If we swap b and c, we swap columns, so clearly resc,b = (−1)^{mn} resb,c.
• Take b(x) = x² + 4x + 7 and c(x) = x³ and compute

$$\mathrm{res}_{b,c} = \det\begin{pmatrix}
7 & 0 & 0 & 0 & 0 \\
4 & 7 & 0 & 0 & 0 \\
1 & 4 & 7 & 0 & 0 \\
0 & 1 & 4 & 1 & 0 \\
0 & 0 & 1 & 0 & 1
\end{pmatrix} = 7^3.$$
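These determinants can be computed mechanically. Here is a plain-Python sketch (our own helper functions, not Sage's built-ins) that builds the stacked matrix of shifted coefficient columns described above and takes its determinant with exact rational arithmetic. (The sign of the determinant depends on the order in which the columns are stacked; in these test cases mn is even, so the order does not matter.)

```python
from fractions import Fraction

def det(rows):
    """Determinant by Gaussian elimination over exact Fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, sign = len(m), 1
    for i in range(n):
        piv = next((r for r in range(i, n) if m[r][i] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != i:
            m[i], m[piv] = m[piv], m[i]
            sign = -sign
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]
    return result

def resultant(b, c):
    """Resultant from the stacked coefficient columns of b and c.
    b, c are coefficient lists, constant term first."""
    m, n = len(b) - 1, len(c) - 1
    cols = [[0] * j + list(b) + [0] * (n - 1 - j) for j in range(n)]
    cols += [[0] * j + list(c) + [0] * (m - 1 - j) for j in range(m)]
    rows = [[col[i] for col in cols] for i in range(m + n)]
    return det(rows)

# b(x) = x^2 + 4x + 7, c(x) = x^3: resultant is 7^3 = 343
print(resultant([7, 4, 1], [0, 0, 0, 1]))
# b(x) = x^2 - 1 and c(x) = x - 1 share the factor x - 1: resultant 0
print(resultant([-1, 0, 1], [-1, 1]))
```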
• The easiest examples of arbitrary degrees arise when we take any b(x)
but try c(x) = xⁿ:

$$\mathrm{res}_{b,c} = \det\begin{pmatrix}
b_0 & & & & & \\
b_1 & b_0 & & & & \\
\vdots & b_1 & \ddots & & & \\
b_m & \vdots & \ddots & b_0 & & \\
 & b_m & & b_1 & 1 & \\
 & & \ddots & \vdots & & \ddots \\
 & & & b_m & & & 1
\end{pmatrix} = b_0^n.$$
Sage
Computing resultants in sage requires us to tell sage what sort of numbers arise as
the coefficients of our polynomials.
P.<x> = PolynomialRing(QQ)
a=x^3+x+7
b=x^2+2
a.resultant(b)
yields 51. The expression P.<x> = PolynomialRing(QQ) tells sage that we are working
with polynomials with rational coefficients (QQ means rational) in a variable x.
We could write our own resultant function, just to see how it might work:
def resultant(b,c):
    B=b.list()
    C=c.list()
    m=len(B)-1
    n=len(C)-1
    A=matrix(m+n,m+n)
    for j in range(0,n):
        for i in range(0,m+n):
            if (0<=i-j) and (i-j<=m):
                A[i,j]=B[i-j]
            else:
                A[i,j]=0
    for j in range(n,m+n):
        for i in range(0,m+n):
            if (0<=i-j+n) and (i-j+n<=n):
                A[i,j]=C[i-j+n]
            else:
                A[i,j]=0
    return det(A)
Try it out:
t=var('t')
p = t^2+2*t+1
q = t+1
resultant(p,q)
yields 0.
Common factors and resultants
Lemma 10.1. Two polynomials b(x) and c(x) of degrees m and n, over any field,
have a nonconstant common factor just when
0 = u(x)b(x) + v(x)c(x)
for two polynomials u(x) and v(x) of degrees n − 1 and m − 1 at most, not both zero.
Proof. Two polynomials b(x) and c(x) of degrees m and n have a nonconstant common
factor just when we can write the polynomials factored, say as b(x) = v(x)d(x) and
c(x) = −u(x)d(x). But then
0 = u(x)b(x) + v(x)c(x).
On the other hand, suppose that
0 = u(x)b(x) + v(x)c(x)
for two polynomials u(x) and v(x) of degrees n − 1 and m − 1 at most. Suppose that
b(x) and c(x) have no common factor, so their greatest common divisor is 1. Write
out Bézout coefficients
1 = s(x)b(x) + t(x)c(x)
and compute
v = vsb + vtc,
= vsb + tvc,
= vsb − tub,
= b (vs − tu) .
But the degree of v is smaller than that of b, so v = 0. Swapping roles of b, c and of
u, v and of s, t, we get u = 0.
Proposition 10.2. Two polynomials, over any field, have a nonconstant common
factor just when their resultant vanishes.
Proof. Take two polynomials
u(x) = u0 + u1 x + · · · + un−1 xn−1 ,
v(x) = v0 + v1 x + · · · + vm−1 xm−1 .
The vector of coefficients of u(x)b(x) + v(x)c(x) is
$$\overrightarrow{bu + cv} = \begin{pmatrix} [b] & [c] \end{pmatrix}\begin{pmatrix} \vec u \\ \vec v \end{pmatrix}.$$
By linear algebra, the determinant vanishes just when the matrix has a nonzero null
vector, i.e. a nonzero vector in its kernel. So the resultant vanishes just when there
is a choice of coefficients
u0 , u 1 , . . . , v 0 , v 1 , . . . ,
not all zero, so that u(x)b(x) + v(x)c(x) = 0. In other words, there are polynomials
u(x) and v(x) of required degrees so that u(x)b(x) + v(x)c(x) = 0. The theorem now
follows from lemma 10.1 on the previous page.
The resultant is brutal to compute, even for polynomials of fairly small degree.
Its importance, like that of the determinant of a matrix, arises from its theoretical
power.
10.1 Find the resultants, and use them to decide if there are nonconstant common
factors.
a. x² − 5x + 6, x³ − 3x² + x − 3
b. x² + 1, x − 1
10.2 Over the field of remainders modulo 2, calculate the resultant of b(x) = x³ +
x² + x + 1, c(x) = x³ + x. Warning: the 6 × 6 determinant you get is actually easier
than it looks at first, so look for tricks to find it without any hard calculation.
Lemma 10.3. Take two polynomials b(x), c(x) in a variable x of degrees m, n. The
resultant r = resb,c is expressible as r = u(x)b(x) + v(x)c(x) where u(x) and v(x) are
polynomials of degrees n − 1, m − 1. One can make a particular choice of u(x), v(x)
so that the coefficients of u(x) and of v(x) are expressible as polynomial expressions
in the coefficients of b(x) and c(x), and those polynomial expressions have integer
coefficients.
Proof. Look at our matrix:
$$r = \det\begin{pmatrix}
b_0 & & & & c_0 & & & \\
b_1 & b_0 & & & c_1 & c_0 & & \\
\vdots & b_1 & \ddots & & \vdots & c_1 & \ddots & \\
b_m & \vdots & \ddots & b_0 & c_n & \vdots & \ddots & c_0 \\
 & b_m & & b_1 & & c_n & & c_1 \\
 & & \ddots & \vdots & & & \ddots & \vdots \\
 & & & b_m & & & & c_n
\end{pmatrix}.$$
Add to the first row the second multiplied by x and then the third multiplied by x2
and so on, which doesn’t change the determinant:
$$r = \det\begin{pmatrix}
b(x) & xb(x) & \cdots & x^{n-1}b(x) & c(x) & xc(x) & \cdots & x^{m-1}c(x) \\
b_1 & b_0 & & & c_1 & c_0 & & \\
\vdots & b_1 & \ddots & & \vdots & c_1 & \ddots & \\
b_m & \vdots & \ddots & b_0 & c_n & \vdots & \ddots & c_0 \\
 & b_m & & b_1 & & c_n & & c_1 \\
 & & \ddots & \vdots & & & \ddots & \vdots \\
 & & & b_m & & & & c_n
\end{pmatrix}.$$
Expand the determinant across the top row to see that
r = u(x)b(x) + v(x)c(x)
where u(x) is the determinant of the matrix above with its first row replaced by

(1, x, . . . , x^{n−1}, 0, . . . , 0),
and
v(x) is the determinant of the matrix above with its first row replaced by

(0, . . . , 0, 1, x, . . . , x^{m−1}).
Note that we can expand out any determinant using only multiplication and addition
of entries and some ± signs.
If we scale up b(x) by a factor λ and look at the matrix whose determinant is our
resultant, we see that the resultant of b(x), c(x) scales by λⁿ, where n is the degree
of c(x). So practically we only need to compute resultants of monic polynomials.
Proposition 10.4. Given two monic polynomials over any field, each factored into
linear factors,
b(x) = (x − β1 ) (x − β2 ) . . . (x − βm ) ,
c(x) = (x − γ1 ) (x − γ2 ) . . . (x − γn ) ,
then the resultant is
resb,c = (β1 − γ1 ) (β1 − γ2 ) . . . (βm − γn ) ,
= c(β1 ) c(β2 ) . . . c(βm ) ,
= (−1)mn b(γ1 ) b(γ2 ) . . . b(γn ) .
Proof. We will only give the proof here for polynomials over an infinite field. We will
give the proof over finite fields in chapter 13. If we expand out the expression for b(x)
into a sum of monomials, with coefficients being expressed in terms of these βi , and
similarly for c(x), then the resultant is a huge polynomial expression in terms of the
various βi and γj . This polynomial vanishes whenever βi = γj . Thinking of βi and
γj as abstract variables, the expression βi − γj is a linear function. Assume for the
moment that our field is infinite. Then the resultant is divisible by βi − γj , for any i
and j, by lemma 9.12 on page 73. Therefore the resultant is divisible by
(β1 − γ1 ) (β1 − γ2 ) . . . (βm − γn ) .
From our examples above, we see that the total number of such factors is mn, and
therefore each one arises just once, so our resultant is as described
resb,c = (β1 − γ1 ) (β1 − γ2 ) . . . (βm − γn ) ,
except perhaps for a constant factor, which we check by looking at our examples. For
a finite field, we will fix up this proof in chapter 13; for the moment, the reader will
have to accept that this can be done somehow.
10.3 Suppose that b(x) is monic and c(x) is not and that they both split into linear
factors. Prove that
resb,c = c(β1 ) c(β2 ) . . . c(βm ) .
In particular, the resultant vanishes just when b(x) and c(x) have a common root.
Lemma 10.5. For any polynomials b(x), c(x), d(x),
resb,cd = resb,c resb,d .
Proof. If all of the polynomials split into linear factors, the result is clear:
resb,cd = c(β1 ) d(β1 ) . . . c(βm ) d(βm ) ,
= c(β1 ) c(β2 ) . . . c(βm ) d(β1 ) d(β2 ) . . . d(βm ) ,
= resb,c resb,d .
We will see eventually (in theorem 13.2 on page 126) that we can always replace
our field by a larger field in which any finite number of polynomials split into linear
factors.
10.4 Prove that
resb,c+db = resb,c
if deg d + deg b = deg c.
10.5 A fast trick to find resultants: use the result of the previous problem and the
computation on page 58 to compute resb,c where
b(x) = x⁴ − 3x + 5,  c(x) = x − 4.
The discriminant
For any polynomial p(x), with any sort of coefficients, say

p(x) = a0 + a1 x + · · · + an xⁿ,

we define the derivative

p′(x) = a1 + 2a2 x + 3a3 x² + · · · + n an xⁿ⁻¹,
just imitating calculus. There is no interpretation of this as a “rate of change”, since
the coefficients could be remainders modulo some integer, or something much stranger.
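Since the derivative is defined purely formally, it is one line of code on a coefficient list, whatever the coefficients are; a plain-Python sketch:

```python
def formal_derivative(coeffs):
    """Formal derivative of a0 + a1 x + ... + an x^n, with the coefficient
    list written constant term first: the x^i term contributes i*a_i x^(i-1)."""
    return [i * a for i, a in enumerate(coeffs)][1:]

# p(x) = 1 + 2x^2 + x^4  ->  p'(x) = 4x + 4x^3
print(formal_derivative([1, 0, 2, 0, 1]))  # [0, 4, 0, 4]
```

For coefficients that are remainders modulo some integer m, the same list can simply be reduced termwise modulo m afterwards.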
The discriminant ∆p of a polynomial p(x) is the resultant with the derivative:
∆p := resp,p′. By definition, the discriminant vanishes just when the polynomial has
a common factor with its derivative. For example, over the complex numbers, the
discriminant vanishes just when the polynomial has a "multiple root", i.e. a repeated
linear factor:

p(z) = (z − z0)² · · · .
10.6 Find the discriminants of (a) ax² + bx + c, (b) x³ + x², (c) x³ + x² + 1 and explain
what they tell you about these polynomials.
• The polynomial f(x) = 1 + 2x² + x⁴ over the real numbers has no real
roots. Its derivative is f′(x) = 4x + 4x³, so its discriminant is

$$\Delta_f = \det\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 4 & 0 & 0 & 0 \\
2 & 0 & 1 & 0 & 4 & 0 & 0 \\
0 & 2 & 0 & 4 & 0 & 4 & 0 \\
1 & 0 & 2 & 0 & 4 & 0 & 4 \\
0 & 1 & 0 & 0 & 0 & 4 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 4
\end{pmatrix} = 0$$
(after a long calculation). So f(x) has a common factor with f′(x). We
can see the common factor more clearly if we factor:

f(x) = 1 + 2x² + x⁴ = (1 + x²)²,
f′(x) = 4x + 4x³ = 4x(1 + x²).
So f(x), f′(x) share a common factor of 1 + x². They don't share a
root over the real numbers, but over the complex numbers

f(x) = (x − i)²(x + i)²,  f′(x) = 4x(x − i)(x + i),

so they share roots at x = ±i, the double roots of f(x).
• The polynomial p(x) = x³ + 2x² + 2x + 2 is irreducible over the rational
numbers by Eisenstein's criterion, so has no rational roots. It has odd
degree so at least one real root. There is no positive real root since
all terms are positive. The derivative is p′(x) = 3x² + 4x + 2. By the
quadratic formula, the zeroes of p′(x) are at

x = −2/3 ± (√2/3)i,

not real. Clearly p′(x) > 0 for all values of x. Therefore p(x) is
increasing, so has precisely one real root, irrational and negative. The
two remaining complex roots are nonreal conjugates. The discriminant
of p(x) is

$$\Delta = \det\begin{pmatrix}
2 & 0 & 2 & 0 & 0 \\
2 & 2 & 4 & 2 & 0 \\
2 & 2 & 3 & 4 & 2 \\
1 & 2 & 0 & 3 & 4 \\
0 & 1 & 0 & 0 & 3
\end{pmatrix} = -44.$$
Since ∆ ≠ 0, we see that there are no double roots over the complex
numbers, which is already clear.
10.7 Prove that for the discriminant ∆ of a cubic polynomial the sign tells us that
the number of roots is:

∆ > 0: 3 distinct roots
∆ = 0: a triple root, or 1 real double root and 1 real single root
∆ < 0: 1 real and 2 complex roots.
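One way to experiment with this problem is the standard closed-form discriminant of a monic cubic x³ + ax² + bx + c, namely 18abc − 4a³c + a²b² − 4b³ − 27c², which reproduces the −44 computed above. A plain-Python sketch:

```python
def cubic_discriminant(a, b, c):
    """Discriminant of the monic cubic x^3 + a x^2 + b x + c
    (the standard closed form)."""
    return 18*a*b*c - 4*a**3*c + a*a*b*b - 4*b**3 - 27*c*c

print(cubic_discriminant(2, 2, 2))   # -44: one real and two complex roots
print(cubic_discriminant(0, -1, 0))  # x^3 - x has 3 distinct real roots: positive
```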
Sage
Sage computes discriminants:
R.<t> = PolynomialRing(QQ)
(t^2-t+7).discriminant()
yields −27.
Parameterisation
Take the curve in the plane parameterised by

(x(t), y(t)) = (t² − 1, t(t² − 1)):
We want to find an equation for this curve, as a polynomial in x and y.
Proposition 10.6. Draw a curve

x(t) = p(t)/q(t),  y(t) = u(t)/v(t),

where p(t), q(t), u(t), v(t) are polynomials with complex coefficients. The resultant in
the variable t of the polynomials

p(t) − xq(t),  u(t) − yv(t)

vanishes along the points (x, y) in the plane which lie along this curve.
Proof. The resultant vanishes just at those points (x, y) where the two polynomials
have a common factor. If there is a common factor of two polynomials, that factor has
a root over the complex numbers, so there is a common root t of both polynomials,
in other words a value of t where x = p(t)/q(t) and y = u(t)/v(t). On the other hand,
a common root gives a common linear factor. So the resultant vanishes just along
the image of the curve in the “plane”. This “plane” has points given as pairs (x, y)
of complex numbers, so it is not the complex plane, but two copies of the complex
plane.
• For the curve in the plane parameterised by
(x(t), y(t)) = (t² − 1, t(t² − 1)),
we have to take the resultant of
t² − 1 − x,  t(t² − 1) − y,
as a polynomial in t; treat x and y as constants:
    ⎛1  0  −(1 + x)    0          0     ⎞
    ⎜0  1    0       −(1 + x)     0     ⎟
det ⎜0  0    1         0       −(1 + x) ⎟ = y² − x² − x³.
    ⎜1  0   −1        −y          0     ⎟
    ⎝0  1    0        −1         −y     ⎠
So the curve is precisely the set of points (x, y) so that y 2 = x2 + x3 .
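As a quick sanity check of this computation (a sketch in plain Python rather than Sage, with our own helper name), we can sample rational values of t and verify that every point of the parameterisation satisfies the implicit equation exactly:

```python
# Check that points (x, y) = (t^2 - 1, t(t^2 - 1)) satisfy y^2 = x^2 + x^3.
from fractions import Fraction

def residual(t):
    x = t * t - 1
    y = t * (t * t - 1)
    return y * y - x * x - x ** 3  # zero exactly when (x, y) lies on the curve

# exact rational arithmetic, so this is a genuine identity check at these points
assert all(residual(Fraction(n, 7)) == 0 for n in range(-20, 21))
print("all sampled points lie on y^2 = x^2 + x^3")
```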
• For real variables, the curve x = t², y = t⁴ has both x ≥ 0 and y ≥ 0
since t² ≥ 0 and t⁴ ≥ 0. So the curve is the right half of the parabola
y = x² on which x ≥ 0. The resultant comes out as
    ⎛1  0  −x   0   0   0⎞
    ⎜0  1   0  −x   0   0⎟
    ⎜0  0   1   0  −x   0⎟
det ⎜0  0   0   1   0  −x⎟ = (x² − y)².
    ⎜1  0   0   0  −y   0⎟
    ⎝0  1   0   0   0  −y⎠
So the resultant vanishes along y = x², the whole parabola. So we
don’t quite get what we wanted: we get the algebraic equation y = x²
that the curve satisfies, but we miss out on the inequality x ≥ 0.
10.8 Find a polynomial equation f (x, y) = 0 whose solutions are the points of the
curve
x(t) = t², y(t) = t³ − t.
10.9 Find a polynomial equation f (x, y) = 0 whose solutions are the points of the
curve
x(t) = t² + t,  y(t) = 1/t.
Sage
Sage handles 2 variable polynomials:
P.<x,y> = PolynomialRing(QQ)
a = x + y
b = x^3 - y^3
a.resultant(b)
yields −2y³. If we want to use the resultant in the other variable, we specify which
variable to get rid of: a.resultant(b, y) yields 2x³. Sage will take a parameterisation
of a curve and give us the equation of the curve. Our example above becomes:
R.<x,y,t> = PolynomialRing(QQ)
(t^2-1-x).resultant(t*(t^2-1)-y,t)
yields −x³ − x² + y².
Chapter 11
Groups
Numbers measure size, groups measure symmetry
— Mark Armstrong
Groups and Symmetry
The theory of groups is, as it were, the whole of mathematics stripped
of its matter and reduced to pure form.
— Henri Poincaré
Transformation groups
Take a set X, which could be a set of points in the plane, or a set of numbers, or a
collection of various geometric figures, or the set of books in a library, or any other
set of any sort of things. A transformation f of X associates to each element x of X
some element f (x) also from X.
• Take a square X in the plane, and rotate it around its centre by a right
angle. Each point p of X is rotated into some point f (p). Note that
the centre point of the square is “fixed”, i.e. rotated into itself.
• Take any set X. The identity transformation is the transformation
that leaves every point of X where it is: f(x) = x.
• Shuffle a deck of cards. If X is the set of cards, the shuffle is a transformation f of X.
Two maps f : X → Y and g : Y → X between sets are inverses if f(g(y)) = y
for any element y of Y and g(f(x)) = x for any element x of X. In other words, f ◦ g
and g ◦ f are both identity transformations. Recall that a map f is invertible, also
called a bijection, if it has an inverse. A bijective transformation of a set is called a
permutation.
11.1 A map f : X → Y is injective (also called one-to-one) if f (p) = f (q)
only when p = q. It is surjective (also called onto) if every element y of Y has the
form y = f (x) for some element x of X. Prove that a map is a bijection just when it
is injective and surjective.
11.2 Prove that a transformation f of a finite set X is injective just when it is
surjective, and hence just when it is a bijection.
11.3 Prove that a bijection f has a unique inverse.
The inverse of a bijection f is denoted f −1 .
A transformation group G on a set X is a nonempty collection of permutations of
X so that
a. if f and g are in G then f ◦ g is in G.
b. if f is in G then f −1 is also in G.
• Take X the Euclidean plane. Every rotation of the plane around the
origin is by some angle, say θ. Conversely, every angle θ is the angle
of some rotation around the origin. So angles form a transformation
group G, the group of rotations of the plane X around the origin.
• Take X to be three dimensional Euclidean space. Take a line ℓ through
the origin, with a chosen direction along that line. Hold the line in
your hand, so that your thumb points in the chosen direction. Then
your fingers curl around the line in a certain direction. In addition,
take an angle θ, and rotate the whole space X around the line in the direction
of your fingers, by the angle θ. Every rotation of Euclidean space X
arises in this way. (If you reverse the chosen direction along ℓ,
but keep the same axis of rotation ℓ, then you change θ to −θ.)
When you compose a rotation with another rotation, the result is yet a
third rotation. The rotations of Euclidean space form a transformation
group G.
• Take X to be the real number line. Given any real number t, we can
define a transformation f (x) = x + t. In this way, the real numbers t
are associated to the elements of a group G of transformations of the
real number line.
• A rigid motion of Euclidean space is a transformation that preserves
distances. For example, rotations, mirror reflections: (x, y) 7→ (x, −y),
and translations (x, y) 7→ (x + x0 , y + y0 ) are rigid motions of the plane.
The rigid motions form a transformation group.
• Take a geometric figure X in the plane, or in space. A rigid motion
preserving X is a symmetry of X; the symmetries of a figure form a
transformation group. For example, if X is a square, every symmetry permutes
the corners somehow, and we can easily see these symmetries:

(figure: copies of the square with corners labelled 1, 2, 3, 4, showing how symmetries relabel the corners)
The symmetry group of a non-isosceles triangle consists just of the
identity transformation. For an isosceles, but not equilateral, triangle,
the symmetry group is the identity transformation and the reflection
across the axis of symmetry. An equilateral triangle has as symmetries:
the identity transformation, the rotation by 120◦ , the rotation by 240◦ ,
and the three reflections in the three axes of symmetry.
• The symmetry group of an infinitely repeating sodium chloride lattice
includes the translations of space that move each green chlorine atom
and purple sodium atom to its neighbor, up, down, left, right, forward
or back, and mirror reflections around the coordinate planes through
the origin, where we put the origin in the middle of any one of these
atoms, and also the transformations given by permuting the coordinates
x, y, z.
• Every 2 × 2 matrix

A = ⎛a  b⎞
    ⎝c  d⎠

(with entries from any chosen field) determines a linear transformation

⎛x⎞     ⎛x⎞   ⎛a  b⎞ ⎛x⎞   ⎛ax + by⎞
⎝y⎠ ↦ A ⎝y⎠ = ⎝c  d⎠ ⎝y⎠ = ⎝cx + dy⎠
of the plane. Similarly, a square matrix A, say n × n, determines
a linear transformation of n-dimensional Euclidean space. A linear
transformation is a bijection just when the associated matrix A is
invertible, i.e. just when A has nonzero determinant. The invertible
n×n matrices form a transformation group G, of linear transformations
of n-dimensional Euclidean space.
• Take X to be the set of numbers { 1, 2, 3, 4 }. The symmetric group on 4
letters is the group G of all permutations of X. The terminology “on 4
letters” reminds us that we could just as well have taken X = { a, b, c, d }
to consist of any 4 distinct elements, and then we would get essentially
the same story, just relabelling. It is convenient notation to write a
permutation of finite sets as, for example,
⎛1 2 3 4⎞
⎝2 3 1 4⎠
to mean the map f so that f (1) = 2, f (2) = 3, f (3) = 1 and f (4) = 4,
i.e. reading down from the top row as the input to f to the entry
underneath as the output of f . It is often understood that the first row
is just 1 2 3 4 or some such, so we can just write down the second row:
2 3 1 4, to display the permutation.
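This second-row notation is easy to compute with. Here is an illustrative plain-Python sketch (the `compose` helper is ours, not from any library), storing a permutation f of {1, . . . , n} as the list of its bottom row, so that `perm[i - 1]` is f(i):

```python
# A permutation f of {1, ..., n} stored as its bottom row: perm[i - 1] = f(i).
def compose(g, f):
    """The permutation g ∘ f, i.e. first apply f, then g."""
    return [g[f[i] - 1] for i in range(len(f))]

f = [2, 3, 1, 4]       # the example above: f(1) = 2, f(2) = 3, f(3) = 1, f(4) = 4
identity = [1, 2, 3, 4]
print(compose(f, f))                           # → [3, 1, 2, 4]
print(compose(compose(f, f), f) == identity)   # → True: applying f three times is the identity
```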
11.4 What are the permutations g ◦ f and f ◦ g where g is 2 3 1 4 and f is 3 1 2 4?
Another notation for permutations: the cycle (1342) means the permutation that
takes 1 to 3, 3 to 4, 4 to 2 and 2 to 1, reading from left to right. More generally,
the cycle (a1 a2 . . . ak ) is the permutation taking a1 to a2 , a2 to a3 , and so on, and
taking ak to a1 . So (2314) takes 3 to 1. By definition, if we have a number x which is
not among these various a1 , a2 , . . . , ak , then the cycle (a1 a2 . . . ak ) fixes the number
x: (2354) takes 1 to 1. Two cycles (a1 a2 . . . ak ) and (b1 b2 . . . b` ) are disjoint if none
of the ai occur among the bj and vice versa. So
(1452) is disjoint from (36),
but
(1352) is not disjoint from (36)
since they both have 3 in them. We can write the identity permutation as (). Note
that any cycle of length 1 is also the identity transformation: (2) = (1) = (), since (2)
takes 2 to 2, and fixes everything else. If two permutations f and g can be expressed
as disjoint cycles, then f g = gf , since each fixes all of the letters moved by the other.
Sage knows how to multiply cycles:
G = SymmetricGroup(5)
sigma = G("(1,3) (2,5,4)")
rho = G([(2,4), (1,5)])
rho^(-1) * sigma * rho
prints (1, 2, 4)(3, 5).
Theorem 11.1. Every permutation of the elements of a finite set is expressed as a
product of disjoint cycles, uniquely except for the order in which we write down the
cycles, which can be arbitrary.
Proof. Suppose our finite set is 1, 2, . . . , n. Our permutation f takes 1 to some number
f(1), takes that number to some other number f(f(1)), and so on, and eventually
(because there are finitely many), if we repeat it enough times we cycle back to a
number we have seen before. Suppose that this first happens at the k-th step. Writing
f^2 1 to mean f(f(1)), composition rather than multiplication, we are saying that f^k 1 must
already be among 1, f 1, . . . , f^(k−1) 1, say f^k 1 = f^ℓ 1 for some 0 ≤ ℓ < k. Suppose that
we pick k the smallest possible. Since f is a bijection, if ℓ > 0 we apply f⁻¹ to both
sides of f^k 1 = f^ℓ 1 to find a smaller value of k, a contradiction. So ℓ = 0, i.e. f^k 1 = 1
is our first repeat. So we have a cycle (1 f1 . . . f^(k−1) 1). But that cycle might not
exhaust the whole of f, since there might be some other numbers among 1, 2, . . . , n
not among the elements of this cycle.
Take the smallest one, say b. Since f is a bijection, if any number from b, f b, f^2 b, . . .
is also in the cycle (1 f1 f^2 1 . . . f^(k−1) 1), we can apply f⁻¹ to both sides as many
times as we need to get that b is already in that cycle, a contradiction. By the same
argument above, replacing 1 by b, we get another cycle (b f b . . . f^(ℓ−1) b) disjoint from
the first. Continue in this way by induction to express f as a product of disjoint
cycles. This expression is unique, since each cycle is precisely built from applying
f repeatedly to any element until a cycle forms, so any two such expressions have
exactly the same cycle through any element.
For example, in cycle notation

⎛1 2 3 4⎞
⎝2 3 1 4⎠ = (123)

and

⎛1 2 3 4 5⎞
⎝3 4 1 5 2⎠ = (13)(245).
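The procedure in the proof of theorem 11.1 is easy to turn into code. The sketch below (our own helper, using the same bottom-row convention `perm[i - 1] = f(i)`) follows each element around until it returns, collecting one cycle at a time and omitting cycles of length 1:

```python
# Sketch of the proof's procedure: follow each unvisited element around
# until it comes back, collecting one disjoint cycle at a time.
def cycles(perm):
    """perm[i - 1] = f(i); returns the disjoint cycles, fixed points omitted."""
    seen, result = set(), []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x - 1]
        if len(cycle) > 1:          # cycles of length 1 are the identity
            result.append(tuple(cycle))
    return result

print(cycles([2, 3, 1, 4]))        # the first example above  → [(1, 2, 3)]
print(cycles([3, 4, 1, 5, 2]))     # the second example above → [(1, 3), (2, 4, 5)]
```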
11.5 Prove that any set with n elements has n! permutations, in several ways: by
counting
a. how many places to move 1 to, and then how many places there are left to move
2 to, and so on,
b. the number of places to stick object n in between the elements of any given
permutation a1 a2 . . . an−1 of the numbers 1, 2, . . . , n−1, say as na1 a2 . . . an−1 or
a1 na2 . . . an−1 or a1 a2 n . . . an−1 and so on, for example: given the permutation
312
of the numbers 1, 2, 3, we can put 4 in there as
4 3 1 2,
3 4 1 2,
3 1 4 2, or
3 1 2 4.
c. the number of ways to put at the end of a given permutation a1 a2 . . . an−1 some
number b from among
b = 1/2, 1 + 1/2, . . . , n − 1/2,
for example into the permutation 3 1 2 we stick
3 1 2 1/2, or 3 1 2 3/2, or 3 1 2 5/2, or 3 1 2 7/2,
and then replace the numbers in the list by integers from 1, 2, . . . , n, but in the
same order:
4 2 3 1, or 4 1 3 2, or 4 1 2 3, or 3 1 2 4.
d. the possible disjoint cycles.
Sage knows all of the permutations of small symmetric groups, and will print them
out for you in cycle notation:
H = DihedralGroup(6)
H.list()
yields
,
(1, 5)(2, 4),
(1, 4)(2, 3)(5, 6),
(1, 2)(3, 6)(4, 5),
(1, 6)(2, 5)(3, 4),
(2, 6)(3, 5),
(1, 6, 5, 4, 3, 2),
(1, 5, 3)(2, 6, 4),
(1, 2, 3, 4, 5, 6),
(1, 3, 5)(2, 4, 6),
(1, 4)(2, 5)(3, 6),
(1, 3)(4, 6)
where the identity element is the blank space at the start.
Groups
Each symmetry of an equilateral triangle in the plane permutes the vertices. If we
label the vertices, say as 1, 2, 3, then every permutation of the labels 1, 2, 3 occurs
from a unique symmetry of the triangle. So the symmetries of the equilateral triangle
correspond to the permutations of the labels, i.e. the symmetry group of the equilateral
triangle is, in some sense, the same as the symmetric group on three letters. It is
this notion of correspondence that leads us to see that two transformation groups
can be “the same” while they are transforming very different sets; in our example,
the symmetries of the equilateral triangle transform the infinitely many points of the
triangle, while the permutations of three letters transform a set of three letters. In
order to say what is “the same” in the two transformation groups, we need to move
to a more abstract concept of group.
A group is a set G so that
a. To any elements a, b of G there is associated an element ab of G, called the
product of a and b.
b. The product is associative: if a, b, c are elements of G, then (ab)c = a(bc), so
that either expression is denoted as abc.
c. There is an identity element: some element 1 of G, so that a1 = 1a = a for any
element a of G.
d. Every element is invertible: if a is an element of G, there is an element, denoted
a−1 , of G for which aa−1 = a−1 a = 1.
• Nonzero rational numbers form a group G under usual multiplication,
where a−1 means the reciprocal of any rational number a. Note that 0
does not have a reciprocal, so we have to remove it.
• Nonzero elements of any field form a group G under usual multiplication,
where a−1 means the reciprocal of any element a. Note that 0 does not
have a reciprocal, so we have to remove it.
• Any transformation group G is a group, using composition of transformations as its product, and the identity transformation as its identity
element. In particular, as examples of groups: the symmetry groups of
geometric figures, the group of invertible matrices (under matrix multiplication), the symmetric group on some letters, the group of rotations
of the Euclidean plane, and the group of rotations of Euclidean space.
It is easier to check if a collection of transformations forms a transformation group
than to check if some abstract product operation forms a group; almost all of our
examples of groups will be transformation groups in some obvious way.
Two elements a, b of a group commute if ab = ba. A group is abelian if any two
elements of the group commute.
• Nonzero rational numbers form an abelian group under usual multiplication.
• Nonzero elements of any field form an abelian group under usual multiplication.
• Invertible 2 × 2 matrices, with entries drawn from some chosen field,
form a non-abelian group under usual matrix multiplication.
• The symmetries of the equilateral triangle form a non-abelian group:
if we rotate by 120◦ and then reflect across an angle bisector, we get
a different result than if we first reflect across that bisector and then
rotate.
11.6 Prove that, for any elements a, b, c, d of a group G,
a(b(cd)) = (ab)(cd) = ((ab)c)d.
Generalize by induction to prove that parentheses are not needed in expressions of
any length.
11.7 For any element a of a group G, define a⁰ to mean 1, and define by induction
a^(n+1) = a a^n. Prove by induction that a^m a^n = a^(m+n) for any integers m, n.
11.8 Suppose that 1, 1′ are two elements of a group G, and that a1 = 1a = a and
also that a1′ = 1′a = a for any element a of the group G. Prove that 1 = 1′.
11.9 Given elements a, b of a group G, prove that the equation ax = b has a unique
solution x in G. Prove also that the equation ya = b has a unique solution y in G.
11.10 Prove that (ab)−1 = b−1 a−1 for any elements a, b of any group.
11.11 Suppose that x belongs to a group G and that x² = x. Prove that x = 1.
11.12 Prove that the set of nonzero rational numbers, with product operation defined
as usual division, does not form a group. (It might help to write the operation in some
funny notation like a ⋆ b, instead of ab, which easily gets confused with multiplication.)
Additive notation
When a group is abelian, it is often preferred to write the group operation not as ab
but instead as a + b, and the identity element as 0 rather than 1, and the inverse as
−a rather than a−1 .
• The set of integers form an abelian group, where the group operation
is usual addition.
• Any field is an abelian group, where the group operation is usual
addition.
• The set of 2 × 2 matrices with integer entries forms an abelian group,
where the group operation is usual addition. The same for rational
entries, real entries, complex entries, entries drawn from any field, and
for 2 × 3 matrices, and so on.
• The rotations of the plane around the origin form an abelian group,
writing each rotation in terms of its angle of rotation, and using addition
of angles, i.e. composition of rotations, as the group operation. (Careful: the rotations of Euclidean 3-dimensional space form a non-abelian
group.)
Sage knows many finite groups, and can say whether they are abelian:
H = SymmetricGroup(6)
H.is_abelian()
yields False.
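Without Sage, the same question can be settled by brute force: a finite transformation group is abelian just when every pair of its elements commutes. An illustrative sketch for the symmetric group on 3 letters, with permutations stored in the bottom-row form used earlier (the helper names are ours):

```python
# Sketch: test whether a set of permutations is abelian by brute force,
# using the one-line convention perm[i - 1] = f(i).
from itertools import permutations

def compose(g, f):
    return tuple(g[f[i] - 1] for i in range(len(f)))

S3 = list(permutations([1, 2, 3]))  # all 6 permutations of 3 letters
abelian = all(compose(g, f) == compose(f, g) for f in S3 for g in S3)
print(abelian)  # → False: the symmetric group on 3 letters is non-abelian
```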
Multiplication tables
If a group G is finite (i.e. has finitely many elements), we let |G| be the number
of elements of G, called the order of G. We can then write a multiplication table,
with rows representing some element x, columns some element y, and entries xy. For
example, if a group G has 2 elements, say 1 and a, then it is easy to see that a² = 1,
so the multiplication table must be

    1  a
1   1  a
a   a  1
11.13 Prove that any group G of order 3 has (for some labelling of its elements) the
multiplication table

    1  a  b
1   1  a  b
a   a  b  1
b   b  1  a
If G and H are two groups, their product is the group G × H whose elements are
pairs (g, h) of elements g of G and h of H, with product
(g1 , h1 )(g2 , h2 ) = (g1 g2 , h1 h2 )
and identity element (1, 1). Given a multiplication table for G and one for H, we can
make one for G × H: for each row a and column b entry ab of the table of G, and each
row x and column y entry xy of the table of H, the table of G × H has row (a, x),
column (b, y), and entry (ab, xy). For example, if G = H are both the group of
order 2 above, then G × H has multiplication table:

        1      (a,1)  (1,a)  (a,a)
1       1      (a,1)  (1,a)  (a,a)
(a,1)   (a,1)  1      (a,a)  (1,a)
(1,a)   (1,a)  (a,a)  1      (a,1)
(a,a)   (a,a)  (1,a)  (a,1)  1

where 1 is short for the identity element (1, 1).
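The recipe for the product table can be sketched in Python, storing a multiplication table as a dictionary from pairs of elements to their product (an illustrative sketch; the names here are ours):

```python
# Sketch: given multiplication tables as dicts, build the table of G x H.
G = {("1", "1"): "1", ("1", "a"): "a", ("a", "1"): "a", ("a", "a"): "1"}

def product_table(G, H):
    elems_G = {x for x, _ in G}   # the elements appearing in G's table
    elems_H = {x for x, _ in H}
    return {((g1, h1), (g2, h2)): (G[(g1, g2)], H[(h1, h2)])
            for g1 in elems_G for g2 in elems_G
            for h1 in elems_H for h2 in elems_H}

GxG = product_table(G, G)
print(GxG[(("a", "1"), ("1", "a"))])   # → ('a', 'a'), the entry in the table above
print(GxG[(("a", "a"), ("a", "a"))])   # → ('1', '1'): every element squares to the identity
```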
An isomorphism φ : G → H is a bijection between elements of two groups G and
H so that φ(xy) = φ(x)φ(y): φ preserves multiplication. If there is some isomorphism
between two groups, they are isomorphic, and for all purposes of algebra we can treat
them as being the same group.
• In problem 11.13, we saw that all groups of order 3 are isomorphic.
• The remainders modulo 4 form a group G1 , under addition as the group
operation. The set of complex numbers G2 = { 1, −1, i, −i } form a
group under multiplication.
• The matrices

⎛1  0⎞  ⎛0  −1⎞  ⎛−1   0⎞  ⎛ 0  1⎞
⎝0  1⎠, ⎝1   0⎠, ⎝ 0  −1⎠, ⎝−1  0⎠

form a group G₃ under multiplication. All of these groups are isomorphic. For example, map 0 ↦ 1, 1 ↦ i, 2 ↦ −1, 3 ↦ −i to map
G₁ → G₂.
• Let G be the group of order 2 we found above and H the group of
remainders modulo 4. The groups G × G and H are both of order 4,
but they are not isomorphic, because H contains the element 1 and
every element of H is expressed in terms of 1 as precisely one of the 4
elements
1, 1 + 1, 1 + 1 + 1, 1 + 1 + 1 + 1.
So if G × G (in multiplicative notation!) were isomorphic to H, we would
have to find some element (x, y) of G × G so that the 4 elements
(x, y), (x, y)², (x, y)³, (x, y)⁴
constitute the whole of G × G. But check the multiplication table above
to see that (x, y)² = (1, 1) for every element, so the powers of a single
element never give 4 different elements.
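One way to make this argument concrete: isomorphic groups have the same number of elements of each order, so counting orders distinguishes the two groups. A sketch (our own helpers), taking H as the remainders modulo 4 under addition and G × G additively as pairs of remainders modulo 2:

```python
# Sketch: count element orders in Z/4 and in G x G (the Klein four-group);
# isomorphic groups would show the same counts.
def order_mod4(x):                # order of x in the remainders mod 4 under addition
    k, y = 1, x
    while y % 4 != 0:
        y, k = y + x, k + 1
    return k

def order_klein(p):               # p = (x, y) with x, y in {0, 1}, addition mod 2
    k, q = 1, p
    while q != (0, 0):
        q, k = ((q[0] + p[0]) % 2, (q[1] + p[1]) % 2), k + 1
    return k

print(sorted(order_mod4(x) for x in range(4)))                      # → [1, 2, 4, 4]
print(sorted(order_klein((x, y)) for x in (0, 1) for y in (0, 1)))  # → [1, 2, 2, 2]
```

The first group has an element of order 4 and the second does not, so they cannot be isomorphic.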
Sage knows the multiplication tables of many groups. The dihedral group Dn is
the symmetry group of a regular polygon with n equal sides and angles. If we type
H = DihedralGroup(6)
H.cayley_table()
we see the multiplication table:
∗  a  b  c  d  e  f  g  h  i  j  k  l
a  a  b  c  d  e  f  g  h  i  j  k  l
b  b  a  d  c  h  g  f  e  l  k  j  i
c  c  e  f  b  j  i  d  a  k  l  h  g
d  d  h  g  a  k  l  c  b  j  i  e  f
e  e  c  b  f  a  d  i  j  g  h  l  k
f  f  j  i  e  l  k  b  c  h  g  a  d
g  g  k  l  h  i  j  a  d  e  f  b  c
h  h  d  a  g  b  c  l  k  f  e  i  j
i  i  l  k  j  g  h  e  f  a  d  c  b
j  j  f  e  i  c  b  k  l  d  a  g  h
k  k  g  h  l  d  a  j  i  c  b  f  e
l  l  i  j  k  f  e  h  g  b  c  d  a
where a is the identity element.
Cyclic groups
A group G is cyclic if it consists of the powers of a single element x, i.e. G consists
of the powers x^n for integers n. For example, the integers (written additively) are
generated by multiples of 1, as is the group of remainders modulo some integer n ≥ 0.
Theorem 11.2. Every cyclic group G is isomorphic to either the integers (when
|G| = ∞) or to the integers modulo an integer n ≥ 0, when |G| = n.
Proof. Suppose that G consists of the powers x^k of a single element x. Define a map
f : Z → G by f(k) = x^k. Then clearly f(k + ℓ) = x^(k+ℓ) = x^k x^ℓ = f(k)f(ℓ). If f
is a bijection, then f is our isomorphism. So suppose that f is not a bijection. By
definition, f is surjective, so f must fail to be injective, i.e. f(k) = f(ℓ) for some
k ≠ ℓ, i.e. x^k = x^ℓ, i.e. x^(k−ℓ) = 1, i.e. f(k − ℓ) = 1. Let K ⊂ Z be the set of all
integers m so that f(m) = 1. Note that K contains zero, and K also contains some
nonzero element k − ℓ. If m is in K then x^m = 1, so x^(2m) = 1 and so on, i.e. if m is
in K then all multiples of m are in K. If K contains some integers b, c, take Bézout
coefficients sb + tc = g, the greatest common divisor of b and c, and then
f(g) = f(sb + tc) = x^(sb+tc) = (x^b)^s (x^c)^t = 1^s 1^t = 1.
Hence g is in K as well: the greatest common divisor n of all elements of K also
lies in K. But then every multiple of n also lies in K. Hence K consists precisely
of all multiples of n, i.e. K = nZ, and so G = { 1, x, . . . , x^(n−1) } is isomorphic to the
remainders modulo n, taking each remainder b to x^b.
The order of an element a of a group G is the order of the cyclic group consisting
of the powers a^k.
11.14 Take any two elements a, b of some group. Prove that ab and ba have the same
order.
11.15 Take any two elements a, b of order two in some group. If ab also has order
two, prove that a and b commute.
11.16 Take a permutation f of n letters 1, 2, . . . , n, expressed as a product of disjoint
cycles. Prove that the order of f is the least common multiple of the orders of the
disjoint cycles.
Sage can compute the order of an element of a group:
G = SymmetricGroup(5)
sigma = G("(1,3) (2,5,4)")
sigma.order()
yields 6.
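Sage's `order()` can be mimicked directly from the definition: compose the permutation with itself until the identity appears. An illustrative sketch in plain Python (bottom-row convention, helper names ours), using the same permutation (1,3)(2,5,4):

```python
# Sketch: the order of a permutation, found by composing it with itself
# until the identity returns.  perm[i - 1] = f(i).
def compose(g, f):
    return tuple(g[f[i] - 1] for i in range(len(f)))

def order(perm):
    identity = tuple(range(1, len(perm) + 1))
    k, p = 1, perm
    while p != identity:
        p, k = compose(perm, p), k + 1
    return k

sigma = (3, 5, 1, 2, 4)   # (1,3)(2,5,4): sigma(1)=3, sigma(2)=5, sigma(3)=1, sigma(4)=2, sigma(5)=4
print(order(sigma))       # → 6, agreeing with Sage
```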
Graphs
Take a hexagon X in the plane and let G be its symmetry group. Label the
vertices of X in order as we travel around the vertices counterclockwise, say
as 1, 2, 3, 4, 5, 6.
(figure: the hexagon with its vertices labelled 1 to 6, counterclockwise)
Any symmetry moves 1 to one of the six vertices, and moves 2 to a next
door neighbor vertex. We can draw the vertices of the hexagon twice, once
to record where 1 is taken, and once again to represent whether 2 is taken
to the vertex counterclockwise from 1 or clockwise.
In a sense, this is a picture of the group, since every element of the group is
completely determined once we know where vertices 1 and 2 go: the rest are
then in order cyclically around the hexagon. Let b be rotation of the hexagon
by 60◦ , and let c be reflection of the hexagon across the axis of symmetry
through vertex 1. Label the elements of the group according to how they
move vertices 1 and 2, i.e. how much of b and c they contain:
(figure: the twelve elements 1, b¹, b², b³, b⁴, b⁵ and c, b¹c, b²c, b³c, b⁴c, b⁵c arranged around two concentric hexagons)
Multiplication of each element by b on the left, i.e. x ↦ bx, spins the hexagon
around to the right.
(figure: the same twelve elements, each moved along by x ↦ bx)
Multiplication of each element by c on the right, i.e. x 7→ xc, moves the
inward hexagon points to the outward ones, and vice versa.
Elements g, h, . . . of a group G generate that group if every element can be written
as a product of integer powers of those elements. If we fix some elements generating
a group G, the Cayley graph of G (with respect to those generators) is the set of
elements of G, each drawn as a dot, together with an arrow from one dot to another,
labelled by g, if the first dot is some element k and the second is gk (or kg, depending
on which ordering we prefer). These pictures are almost never useful in proving results
about groups, but they give us some intuition.
Sage knows the Cayley graphs (for various generators) of many groups. If we type
H = DihedralGroup(6)
show(H.cayley_graph())
we see the graph:
The graph is labelled by the cycles of each element, thought of as a permutation of
the vertices of the hexagon.
Subgroups
A subset H of a group G is a subgroup if
a. H contains the identity element and
b. if a and b are elements of H then ab is also an element of H,
c. if b is an element of H then b−1 is also an element of H.
It follows that H is itself a group.
• The even numbers are a subgroup of the integers, under usual integer
addition.
• The invertible integer 2 × 2 matrices are a subgroup of the invertible
real 2 × 2 matrices.
• If b is an element of any group G, then the set of all elements of G of
the form b^k for integers k is a cyclic group, the subgroup generated by
b.
• The real 2 × 2 matrices of unit determinant form a subgroup of the
group of all invertible real 2 × 2 matrices, since determinants multiply.
Cosets
If S is any subset of a group G and b is any element of G, denote by bS the set of
elements of the form bs for s in S, called a left translate or left coset of S; if we write
coset, we mean left coset. Denote by Sb the set of elements of the form sb for s in
S, called a right translate or right coset of S. Each coset bS has the same number of
elements as S, since s 7→ bs has inverse t 7→ b−1 t, so is a bijection.
• The group G of symmetries of the hexagon has a subgroup H of rotations of the hexagon. Divide up G into H-cosets:
(figure: the twelve elements, with the rotations 1, b¹, . . . , b⁵ forming one coset and the reflections c, b¹c, . . . , b⁵c the other)
It is more helpful to draw our cosets as columns of a matrix
1    c
b¹   b¹c
b²   b²c
b³   b³c
b⁴   b⁴c
b⁵   b⁵c
to emphasize that they are all the same size, each a copy of H.
• Let G be the group of rotations and H the subgroup of those rotations
which preserve a regular hexagon X around the origin. For each angle
θ, which we think of as a rotation g, gH is the set of all rotations gh for
h ∈ H, i.e. geometrically the rotations which take the given hexagon
X into the hexagon gX.
• Let G be the group of integers under addition, and H the subgroup of
multiples of 7, i.e. H is the set of all integers of the form 7k for any
integer k. Our translates in this notation have the form 1 + H, 2 + H,
and so on. The translate 1 + H is the set of all numbers of the form
1 + 7k for any integer k. The translate 2 + H is the set of all numbers
of the form 2 + 7k for any integer k, and so on.
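These coset computations can be checked by brute force in a small example (an illustrative plain-Python sketch, helper names ours): the left cosets of the subgroup of "rotations" H = {1, (123), (132)} in the symmetric group on 3 letters, with permutations in the bottom-row form used earlier.

```python
# Sketch: left cosets xH of H = {identity, (123), (132)} in the symmetric
# group on 3 letters, with permutations stored as perm[i - 1] = f(i).
from itertools import permutations

def compose(g, f):
    return tuple(g[f[i] - 1] for i in range(len(f)))

G = list(permutations([1, 2, 3]))
H = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]       # the identity, (123) and (132)

cosets = {frozenset(compose(x, h) for h in H) for x in G}
assert all(len(c) == len(H) for c in cosets)  # each coset is a copy of H
print(len(cosets))                            # → 2 = |G| / |H|
```

The two cosets partition G, just as the lemma below asserts in general.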
Lemma 11.3. If x and y are elements of a group G and H is a subgroup of G,
then xH = yH just when y⁻¹x belongs to H. Any two cosets xH and yH are either
disjoint: xH ∩ yH empty, or equal: xH = yH.
Proof. Suppose that xH = yH. Then every element xh of xH must belong to yH,
i.e. have the form yh′ for some element h′ of H, so xh = yh′, i.e. x = yh′h⁻¹, and so
y⁻¹x = h′h⁻¹ is in H.
Conversely, if y⁻¹x is in H, then every element xh of xH has the form
xh = (yy⁻¹)xh,
= y(y⁻¹xh),
= yh′
where h′ = y⁻¹xh is in H. Similarly, every element yh of yH has the form
yh = (xx⁻¹)yh,
= x(x⁻¹yh),
= xh′
where h′ = x⁻¹yh = (y⁻¹x)⁻¹h is in H.
If xH intersects yH, say at xh = yh′, then y⁻¹xh = h′ lies in H so y⁻¹x lies in
H, i.e. xH = yH.
If H is a subgroup of a group G, we let G/H be the collection of all cosets of H
in G.
• The group G of symmetries of the hexagon has a subgroup H of rotations of the hexagon. If we draw each coset as a column, we draw each
point of G/H as a single dot underneath that column:
1    c
b¹   b¹c
b²   b²c
b³   b³c
b⁴   b⁴c
b⁵   b⁵c

1H   cH
• Let G be the group of all rotations of the plane and H the subgroup of
those rotations which preserve a regular hexagon X around the origin.
For each angle θ, which we think of as a rotation g, gH is the set of all
rotations gh for h ∈ H, i.e. geometrically the rotations which take the
given hexagon X into the hexagon gX. So each coset gH in G/H is
identified with a rotated hexagon gX, i.e. G/H is the set of all rotated
hexagons gX for any rotation g.
• Let G be the group of integers under addition, and H the subgroup of
multiples of 7, i.e. H is the set of all integers of the form 7k for any
integer k. Our translates in this notation have the form 1 + H, 2 + H,
and so on. The translate 1 + H is the set of all numbers of the form
1 + 7k for any integer k, which we denote by 1̄. The translate 2 + H is
the set of all numbers of the form 2 + 7k for any integer k, which we
denote by 2̄, and so on. Of course, 7̄ = 0̄ is the coset of H. So G/H
is the set of remainders modulo 7. Each element of G/H represents
“gluing together” all elements of G that differ by a multiple of H.
Actions
A group G acts on a set X if each element g of G has an associated transformation
f_g of X so that f_gh = f_g ◦ f_h for any elements g, h of G. So then G is a transformation
group of X, and we write gx to mean f_g(x). The action is transitive if any two points
p, q of X are connected by some element of G, i.e. q = gp for some g in G. The
stabilizer of a point x0 of X is the subset Gx0 of G consisting of those g in G so that
gx0 = x0 . If a group G acts on two sets X and Y , a map f : X → Y is equivariant if
f (gx) = gf (x) for any x in X and g in G. An equivariant bijection makes two actions
practically identical.
Lemma 11.4. Given a group action of a group G on a set X, and an element x0
of X, the stabilizer Gx0 of each point is a subgroup.
Proof. Clearly if bx0 = x0 and cx0 = x0 then (bc)x0 = b(cx0 ) = bx0 = x0 and so
on.
Subgroups correspond to transitive actions; geometrically it is always easier to
think about actions, but algebraically it is easier to work with subgroups.
Theorem 11.5. If H is a subgroup of a group G, and we let X = G/H, then G
acts on X transitively by b(cH) = (bc)H with stabilizer Gx0 = H where x0 is the
coset 1H. Conversely, if G acts transitively on a set X, and we let H = Gx0 , there
is a unique equivariant bijection f : G/H → X, so that f (gH) = gx0 .
Proof. Clearly b(cH) = (bc)H is the set of elements of G of the form bch for h in H.
Hence the action of G on G/H is defined.
Take any element gH of G/H. So gH is a coset. By lemma 11.3 on page 100,
knowing only the coset gH, we can determine the element g, up to replacing g by gh for
any h ∈ H. But then the expression gx0 will only change to (gh)x0 = g(hx0 ) = gx0 ,
i.e. not at all. So knowledge of the coset gH determines knowledge of gx0 , i.e. f is
well defined. It is then easy to check that f is equivariant. Since G acts transitively
on X, every element x of X has the form x = gx0 for some g, f is surjective. If
f (bH) = f (cH) then bx0 = cx0 so c−1 bx0 = x0 i.e. c−1 b lies in Gx0 . But H is by
definition Gx0 , so c−1 b lies in H, i.e. bH = cH, so f is injective.
Theorem 11.6 (Lagrange). If H is a finite subgroup of a group G then
|G/H| = |G| / |H|.
Proof. The set G is divided up into disjoint cosets, each containing |H| elements.
Corollary 11.7. The order of any finite subgroup of a group divides the order of the
group.
Corollary 11.8. The order of any element of a group divides the order of the group.
Corollary 11.9. Every finite group of prime order, say order p, is cyclic, isomorphic
to the group of remainders modulo p under addition.
Proof. Take any element which is not the identity element, so it cannot have order 1.
Since G is finite, the order of the element is finite. Since |G| is prime, the order of
the element divides this prime, so equals this prime. Hence the cyclic subgroup
generated by that element has order p, so is all of G, and G is cyclic.
Take a group G acting on a set X. The orbit Gx of an element x of X is the set
of all elements of X of the form gx for g in G: it is where G takes x. Every orbit is,
by definition, acted on transitively by G, so the map G/Gx → Gx taking gGx ↦ gx
is an equivariant bijection.
11.17 If X is a hexagon in the plane, and G is the symmetry group of X, what are
all of the orbits of G?
A fixed point of a group G acting on a set X is a point x fixed by all elements of
G, i.e. gx = x for any g in G.
11.18 If a finite group G acts on a finite set X, and the order of X is coprime to the
order of G, prove that X contains a fixed point of the G-action.
Group presentations
The symmetries of an equilateral triangle can all be expressed in terms of
the rotation, call it b, by 120◦ , and the reflection, call it c, about any one of
the axes of symmetry through one of the vertices. Clearly b3 = 1, i.e. if we
rotate three times by 120◦ , the resulting rotation by 360◦ moves every point
back where it started. Similarly c2 = 1. The reader can draw pictures to see
that bc = cb2 . Once we have these three rules, any expression in the group
in terms of b and c can be successively simplified by these three rules, to an
expression in which c only appears to the first power, i.e. no c2 or c−1 or
c−13 appears, and b appears only to the power 1 or 2, and b always appears
after c. Hence the group is precisely
1, b, c, b2 , cb, cb2 .
Take an abstract set S. A word on the alphabet S is a finite sequence of choices
of element of S and integer power. We write each word as a string of symbols
s1^a1 s2^a2 . . . sk^ak .
We allow the empty string, and write it as 1. Reduce a word by deleting any s^0 symbols
and replacing any subword s^p s^q (two successive powers of the same symbol) by s^(p+q).
The free group ⟨S⟩ on an alphabet S is the collection of all reduced words on S. The
multiplication operation: multiply two words by writing one after the other (concatenation)
and then reducing.
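The reduction procedure is easy to implement. This sketch, in plain Python rather than Sage, stores each word as a list of (symbol, power) pairs; the representation is our own choice:

```python
def reduce_word(word):
    """Reduce a word, stored as a list of (symbol, power) pairs:
    merge adjacent powers of the same symbol and drop zero powers."""
    reduced = []
    for s, p in word:
        if reduced and reduced[-1][0] == s:   # same symbol as the previous entry
            p += reduced.pop()[1]             # merge s^p0 s^p into s^(p0+p)
        if p != 0:
            reduced.append((s, p))
    return reduced

def concatenate(w1, w2):
    # Multiplication in the free group: write one word after the other, reduce.
    return reduce_word(w1 + w2)

def invert(w):
    # Reverse the word and negate every power.
    return [(s, -p) for s, p in reversed(w)]

w = [('a', 2), ('b', -3), ('a', 1)]
assert reduce_word([('a', 1), ('a', 2), ('b', 0)]) == [('a', 3)]
assert concatenate(w, invert(w)) == []    # w w^(-1) reduces to the empty word 1
```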
The inverse operation: write down the word in reverse order, with opposite signs for
the powers.
Take an abstract set S with associated free group ⟨S⟩. Take a subset R ⊂ ⟨S⟩.
The group generated by S with relations R, denoted ⟨S|R⟩, is the quotient of ⟨S⟩ by
the smallest normal subgroup containing R. The expression of a group in the form
⟨S|R⟩ is a presentation of the group with generators S and relations R.
Lemma 11.10. Every map of sets f : S → G to a group extends uniquely to a
morphism of groups f : ⟨S⟩ → G. It therefore descends to an injective morphism of
groups ⟨S|R⟩ → G where R is the set of all words
w = s1^a1 s2^a2 . . . sk^ak
on the alphabet S for which
f (s1 )^a1 f (s2 )^a2 . . . f (sk )^ak = 1.
In practice, we usually write the relations not as a set R, but as a collection of
equations like
w1 = w2
between words. This equation means that we require w2−1 w1 ∈ R.
The two transformations
T : (x, y) 7→ (x, y + 1)
and
F : (x, y) 7→ (x + 1, −y)
of the plane, a translation and a flip, satisfy T F T = F , and generate some
group H, the symmetries of the wallpaper pattern:
Let G = ⟨f, t | tf t = f⟩; in other words G is the group generated by alphabet
{ t, f } with relation tf t = f , i.e. with R = { f −1 tf t }. So we have a surjective
map G → H, mapping t ↦ T and f ↦ F . Here t is a formal symbol, not
an actual transformation of the plane, while T is the actual transformation.
Take any word in G. Wherever we find tf we replace it by f t−1 . Wherever
we find t−1 f we replace it by f t. Wherever we find tf −1 we replace it by
f −1 t−1 . Wherever we find t−1 f −1 we replace it by f −1 t. The reader can
check that these are all consequences of tf t = f . Therefore any word in G
can be written uniquely as f p tq . If the corresponding transformation F p T q is
the identity, then it fixes the origin, and so the translations of the x variable
cancel each other out, i.e. p = 0. But then T q (x, y) = (x, y + q) fixes the
origin just when q = 0. So the only element of G mapping to the trivial
transformation of the plane is 1. We can see that G → H is an isomorphism.
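The normal form f^p t^q can be tested against the actual plane transformations in plain Python; the encoding of each group element as a pair (p, q) is our own, and the multiplication rule below just records the effect of moving t past f using tf = f t−1:

```python
import random

def multiply(a, b):
    """(f^p t^q)(f^P t^Q) = f^(p+P) t^((-1)^P q + Q), using t f = f t^(-1)."""
    p, q = a
    P, Q = b
    return (p + P, (-1) ** (P % 2) * q + Q)

def as_map(element):
    # f^p t^q as a transformation of the plane, with T(x,y) = (x, y+1)
    # and F(x,y) = (x+1, -y).
    p, q = element
    return lambda x, y: (x + p, (-1) ** (p % 2) * (y + q))

assert multiply((0, 1), (1, 0)) == (1, -1)    # t f = f t^(-1)

random.seed(0)
for _ in range(100):
    a = (random.randint(-3, 3), random.randint(-3, 3))
    b = (random.randint(-3, 3), random.randint(-3, 3))
    x, y = random.randint(-5, 5), random.randint(-5, 5)
    # The product of normal forms acts as the composition of the maps.
    assert as_map(multiply(a, b))(x, y) == as_map(a)(*as_map(b)(x, y))
```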
Morphisms
A morphism of groups is a map φ : G → H so that φ(xy) = φ(x)φ(y): φ preserves
multiplication. We have seen several of these.
• Label the vertices of an equilateral triangle as 1, 2, 3. Identify the
group G of symmetries of the triangle with the symmetric group H on
3 letters, by taking each symmetry g of the equilateral triangle to the
permutation h = φ(g) of its vertices.
• Label the vertices of any polygon X in the plane, or polyhedron X in
Euclidean 3-dimensional space, with n vertices. Map the group G of
symmetries of X to the symmetric group H on n letters, by taking
each symmetry g of X to the permutation h = φ(g) of its vertices. This
map might not be surjective. For example, if X is a square in the plane with
vertices labelled 1, 2, 3, 4 in order around the square, then no rotation or
reflection will ever put 1 and 2 diagonally across from each other, because
that would pull them apart to a greater distance.
• Every element g of any group G has associated a morphism
φ : Z → G,
given by φ(k) = g k . This map is surjective just when G is cyclic, and
injective just when g has infinite order.
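For example, in the additive group Z/12Z, where g^k means the multiple kg, this morphism can be tabulated in plain Python; the choices of 12, 8 and 5 are just for illustration:

```python
m = 12
phi = lambda g, k: (k * g) % m            # phi(k) = g^k, written additively

image_of_8 = {phi(8, k) for k in range(m)}
assert image_of_8 == {0, 4, 8}            # the subgroup generated by 8: not all of Z/12Z

image_of_5 = {phi(5, k) for k in range(m)}
assert image_of_5 == set(range(m))        # 5 generates Z/12Z, so phi is surjective
```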
Almost everything we know about any group arises from looking at morphisms to
that group and from that group. The kernel of a morphism φ : G → H is the set of
all elements g of G for which φ(g) = 1, i.e. the elements “killed” by φ. The image of
a morphism φ : G → H is the set of all elements h of H for which h = φ(g) for some
g in G.
11.19 Prove that the kernel of any morphism of groups φ : G → H is a subgroup of
G.
11.20 Prove that the kernel of any morphism of groups φ : G → H is { 1 } just when
φ is injective.
11.21 Prove that the image of any morphism of groups φ : G → H is a subgroup of
H.
It seems natural to ask which subgroups arise as kernels and as images.
11.22 Prove that every subgroup G of any group H is the image of a morphism of
groups φ : G → H.
A subgroup K of a group G is normal if gKg −1 = K for any g in G.
11.23 Every rigid motion f of three dimensional Euclidean space can be written
uniquely as f (x) = Ax + b where A is an orthogonal matrix and b is a vector in
three dimensional Euclidean space. A translation is a rigid motion f of the form
f (x) = x + b. An orthogonal transformation is a rigid motion f of the form f (x) = Ax.
Let G be the group of all rigid motions of three dimensional Euclidean space. Let T be
the set of all translations. Prove that T is a normal subgroup of G. Let U be the set
of all orthogonal transformations. Prove that U is not a normal subgroup of G.
Theorem 11.11. A subset K of a group G is a normal subgroup if and only if K is
the kernel of some morphism φ : G → H to some group H. Indeed, if K is normal,
then there is a unique group operation on G/K so that the map φ : G → G/K given
by φ(g) = gK is a group morphism with kernel K.
Proof. If K is the kernel of a morphism φ : G → H to some group H, then an element
k of G belongs to K just when φ(k) = 1. But
φ(gkg−1 ) = φ(g)φ(k)φ(g)−1
so φ(k) = 1 implies φ(gkg−1 ) = 1, i.e. if k lies in K then gkg−1 does too. Similarly,
φ(k) = φ(g−1 (gkg−1 )g) = φ(g)−1 φ(gkg−1 )φ(g),
so φ(gkg−1 ) = 1 implies φ(k) = 1, i.e. k lies in K just when gkg−1 does too. Hence
K is normal.
Suppose that K is normal. Define on G/K the operation (bK)(cK) = bcK. It
is convenient to write each coset bK of K as b̄. Then the operation is b̄c̄ = bc, with
identity 1̄. To see that this is well defined, note that knowledge of the coset bK
determines b precisely up to replacing b by bk for some k in K. Hence b̄ determines
b uniquely up to the equation b̄ = bk for k in K. So given b̄ and c̄, we know b and
c precisely up to replacing them with bk0 and ck1 for some k0 , k1 in K. Hence this
replaces bc by
bk0 ck1 = bc(c−1 k0 c)k1 .
Since K is normal, c−1 k0 c lies in K, so taking K-cosets, bk0 ck1 and bc lie in the
same coset. Hence the resulting product b̄c̄ comes out the same for b, c and for bk0 ,
ck1 , i.e. the operation is well defined. The kernel of the morphism φ(g) = ḡ is the set
of elements b of G for which b̄ = 1̄, i.e. bK = K, i.e. b in K.
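The well-definedness argument can be checked exhaustively on a small example in plain Python (not Sage): take G the symmetric group on 3 letters and K its normal subgroup of even permutations.

```python
from itertools import permutations

def compose(a, b):
    return tuple(a[b[x]] for x in range(3))

def is_even(p):
    # Count inversions; the permutation is even when the count is even.
    inversions = sum(p[i] > p[j] for i in range(3) for j in range(i + 1, 3))
    return inversions % 2 == 0

G = list(permutations(range(3)))
K = [p for p in G if is_even(p)]          # the alternating group, normal in G

def coset(g):
    return frozenset(compose(g, k) for k in K)

# (bK)(cK) = (bc)K is well defined: any choice of representatives b1 of bK
# and c1 of cK yields the same product coset.
for b in G:
    for c in G:
        for b1 in coset(b):
            for c1 in coset(c):
                assert coset(compose(b1, c1)) == coset(compose(b, c))
assert len({coset(g) for g in G}) == 2    # G/K has two elements
```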
Sage can tell you whether the subgroup H generated by some elements r1 , r2 , r3
is a normal subgroup:
A4 = AlternatingGroup(4)
r1 = A4("(1,2) (3,4)")
r2 = A4("(1,3) (2,4)")
r3 = A4("(1,4) (2,3)")
H = A4.subgroup([r1, r2, r3])
H.is_normal(A4)
yields True.
Theorem 11.12. The image of any morphism of groups φ : G → H with kernel K
is isomorphic to G/K by the isomorphism
φ̄ : G/K → H
given by φ̄(gK) = φ(g). If φ is onto, then φ̄ is onto.
Proof. When we map g to φ(g), if we replace g by gk for any k in K, this replaces
φ(g) by φ(gk) = φ(g)φ(k) = φ(g)1 = φ(g). Hence φ̄(ḡ) = φ(g) is well defined. It is a
morphism of groups because φ̄(b̄c̄) = φ̄(bc) = φ(bc) = φ(b)φ(c) = φ̄(b̄)φ̄(c̄). Its kernel is { 1̄ },
because φ̄(b̄) = 1 just when φ(b) = 1, just when b is in K, so just when b̄ = 1̄ is the
identity element of G/K.
Sage knows how to construct the quotient group G/H:
A4 = AlternatingGroup(4)
r1 = A4("(1,2) (3,4)")
r2 = A4("(1,3) (2,4)")
r3 = A4("(1,4) (2,3)")
H = A4.subgroup([r1, r2, r3])
A4.quotient(H)
yields ⟨(1, 2, 3)⟩, meaning that the quotient group is isomorphic to the group of
permutations generated by the cycle (1 2 3).
Lemma 11.13. Suppose that φ : G → H is a morphism of groups and that NG
and NH are normal subgroups of G and H and that φ(NG ) lies in NH . Then there
is a unique morphism φ̄ : G/NG → H/NH so that φ̄(gNG ) = φ(g)NH .
Proof. If we replace g by gn for some n in NG , then we alter φ(g) to φ(gn) = φ(g)φ(n).
So we don’t change φ(g)NH . Hence φ̄ is defined. It is easy to check that φ̄ is a
morphism of groups.
11.24 The kernel K of the action of a group G on a set X is the set of elements g
of G so that gx = x for every x in X, i.e. the kernel is the set of elements that act
trivially. Prove that the kernel of any action is a normal subgroup, and that there is
a unique action of G/K on X so that (gK)x = gx for any g in G and x in X.
Theorem 11.14. Suppose that N is a normal subgroup of a group G and lies inside
another normal subgroup H of G. Then G/H is isomorphic to (G/N )/(H/N ) via
the map f (gH) = (gN )(H/N ).
Proof. Note that H/N is a normal subgroup of G/N because its elements are the
cosets hN for h in H, and, for any gN in G/N ,
(gN )(hN )(gN )−1 = (gN )(hN )(g −1 N ) = ghg −1 N
lies in H/N . Therefore by the above result, the surjective morphism φ : G → G/N
descends to a surjective morphism φ̄ : G/H → (G/N )/(H/N ). The kernel of this
morphism consists of the cosets gH for which φ(g) lies in H/N , i.e. g lies in H, i.e.
gH = H is the identity element; hence φ̄ is injective, so an isomorphism.
Amalgamations
Suppose that G and H are two groups. We would like to define a group G ∗ H, which
contains G and H, and is otherwise “as unconstrained as possible”. The product
G × H is not “unconstrained”, because the elements of G commute with those of H
inside G × H.
First, note that any group G has an obvious group morphism ⟨G⟩ → G given by
g ↦ g. It will help to write out concatenations using some symbol like
g1 ∗ g2 ∗ · · · ∗ gn ∈ ⟨G⟩ .
Then we can write our group morphism as
g1 ∗ g2 ∗ · · · ∗ gn ∈ ⟨G⟩ ↦ g1 g2 . . . gn ∈ G.
This group morphism is clearly surjective, with kernel precisely the group NG ⊂ ⟨G⟩
whose elements are the concatenations
g1 ∗ g2 ∗ · · · ∗ gn
for which g1 g2 . . . gn = 1. So we can write
G = ⟨G|NG ⟩ .
Think of NG as encoding all of the equations satisfied inside the group G.
The free product G ∗ H is the group ⟨G ⊔ H | NG ⊔ NH ⟩ generated by the
elements of G and H, subject to the relations consisting of all equations satisfied by
elements of G together with all equations satisfied by elements of H.
Another way to look at this: a word in G, H is a finite sequence of elements of G
and of H (perhaps empty), written beside one another with ∗ symbols inbetween, like
g1 ∗ g2 ∗ h1 ∗ g3 ∗ h2 ∗ h3 ∗ g4 ,
et cetera. We denote the empty sequence as 1. We reduce a word by deleting any
appearance of the identity element (of either group), and also by replacing any two
neighboring elements from the same group by their product in that group:
g1 ∗ g2 7→ g1 g2 .
A word is reduced if we cannot further reduce it. The group G ∗ H is the set of reduced
words, with multiplication being simply writing down one word after another and
then reducing.
A further wrinkle: suppose that K ⊂ G is a subgroup which also appears as a
subgroup of H: K ⊂ H. The amalgamation of G and H over K, denoted G ∗K H, is
G ∗K H = ⟨G ⊔ H | NG ⊔ NH ⊔ E⟩
where E is the collection of equations kG = kH where kG is an element of K as a
subgroup of G, and kH is the associated element of K ⊂ H.
Chapter 12
Permuting roots
This letter, if judged by the novelty and profundity of ideas it contains,
is perhaps the most substantial piece of writing in the whole literature
of mankind.
— Hermann Weyl
On Évariste Galois’s letter, written the night before Galois
died in a pistol duel. Galois’s letter is about permutations
of roots of polynomials.
Vieta’s formulas
For now, we work over any field. Which polynomials have which roots? How are the
coefficients related to the roots?
To get a quadratic polynomial p(x) to have roots at 3 and 7, we need it to
have x − 3 and x − 7 as factors. So it has to be
p(x) = a(x − 3)(x − 7),
= a(x2 − 7x − 3x + (−3)(−7)),
= a(x2 − (3 + 7)x + 3 · 7).
A polynomial is monic if its leading coefficient is 1. We can make any polynomial
monic by dividing off the leading term:
3x2 − 4x + 1 = 3(x2 − (4/3)x + 1/3).
The only monic quadratic polynomial p(x) with roots x = 3 and x = 7 is
p(x) = x2 − (3 + 7)x + 3 · 7.
The only monic quadratic polynomial p(x) with roots x = r and x = s, by the same
steps, must be
p(x) = x2 − (r + s)x + rs.
By the same steps, the only monic cubic polynomial q(x) with roots at x = r, x = s and
x = t is
q(x) = (x − r)(x − s)(x − t).
If we multiply it all out, we get
q(x) = x3 − (r + s + t)x2 + (rs + rt + st)x − rst.
Ignoring the minus signs, the coefficients are
1, r + s + t, rs + rt + st, rst.
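The pattern can be confirmed by multiplying out the factors numerically; a plain Python sketch, with roots 2, 3, 5 chosen for illustration:

```python
def multiply(p, q):
    """Multiply polynomials given as coefficient lists, constant term first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

r, s, t = 2, 3, 5
# (x - r)(x - s)(x - t), as the coefficient list [constant, x, x^2, x^3]:
cubic = multiply(multiply([-r, 1], [-s, 1]), [-t, 1])
# x^3 - (r+s+t) x^2 + (rs+rt+st) x - rst:
assert cubic == [-r*s*t, r*s + r*t + s*t, -(r + s + t), 1]
```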
There is some pattern:
a. There is a minus sign, turning on and off like a light switch, each time we write
down a term.
b. Each term involves successively more roots multiplied together, and added up
over all choices of that many roots.
Proposition 12.1 (Vieta’s formula (Vieté)). If a monic polynomial p(x) splits into
a product of linear factors, say
p(x) = (x − r1 ) (x − r2 ) . . . (x − rn )
then the numbers r1 , r2 , . . . , rn are the roots of p(x), and the coefficients of p(x), say
p(x) = xn − an−1 xn−1 + · · · ± a1 x ± a0 ,
(with signs switching each term in front of the aj coefficients) are computed from the
roots by
an−1 = r1 + r2 + · · · + rn ,
an−2 =
X
ri rj ,
i<j
..
.
ai =
X
ri1 ri2 . . . rin−i ,
i1 <i2 <···<in−i
..
.
a2 = r1 r2 . . . rn−1 + r1 r2 . . . rn−2 rn + · · · + r2 r3 . . . rn ,
a1 = r1 r2 . . . rn .
The elementary symmetric functions are the polynomials e1 , . . . , en of variables
t1 , t2 , . . . , tn , given by
e1 (t1 , t2 , . . . , tn ) = t1 + t2 + · · · + tn ,
e2 (t1 , t2 , . . . , tn ) = t1 t2 + t1 t3 + · · · + tn−1 tn = Σ_{i<j} ti tj ,
...
ei (t1 , t2 , . . . , tn ) = Σ_{j1 <j2 <···<ji } tj1 tj2 . . . tji , the sum of all products of i of the variables,
...
en−1 (t1 , t2 , . . . , tn ) = t1 t2 . . . tn−1 + t1 t2 . . . tn−2 tn + · · · + t2 t3 . . . tn ,
en (t1 , t2 , . . . , tn ) = t1 t2 . . . tn .
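In plain Python (rather than Sage), each elementary symmetric function is just a sum of products over all choices of i of the variables; a minimal sketch:

```python
from itertools import combinations
from math import prod

def elementary_symmetric(i, t):
    """e_i(t): the sum of all products of i of the variables."""
    return sum(prod(c) for c in combinations(t, i))

t = (2, 3, 5)
assert elementary_symmetric(1, t) == 2 + 3 + 5            # e1 = 10
assert elementary_symmetric(2, t) == 2*3 + 2*5 + 3*5      # e2 = 31
assert elementary_symmetric(3, t) == 2 * 3 * 5            # e3 = 30
```

Note that for i = 0 the sum has a single empty product, so e0 = 1, matching the usual convention.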
So we can restate our proposition as:
Proposition 12.2 (Vieta’s formula). If a monic polynomial p(x) splits into a product
of linear factors, say
p(x) = (x − r1 ) (x − r2 ) . . . (x − rn )
then the numbers r1 , r2 , . . . , rn are the roots of p(x), and the coefficients of p(x):
p(x) = xn − e1 xn−1 + · · · + (−1)n−1 en−1 x + (−1)n en ,
are the elementary symmetric functions
ei = ei (r1 , r2 , . . . , rn )
of the roots r1 , r2 , . . . , rn .
Proof. We can see this immediately for linear polynomials: p(x) = x − r, and we
checked it above for quadratic and cubic ones. It is convenient to rewrite the elementary
symmetric functions as
ej (r1 , r2 , . . . , rn ) = Σ r1^b1 r2^b2 . . . rn^bn
where the sum is over all choices of numbers b1 , b2 , . . . , bn so that
a. each of these bi is either 0 or 1 and
b. so that altogether
b1 + b2 + b3 + · · · + bn = j,
or in other words, “turn on” j of the roots, and turn off the rest, and multiply out
the ones that are turned on, and then sum over all choices of which to turn on.
If we expand out the product
p(x) = (x − r1 ) (x − r2 ) . . . (x − rn )
we do so by picking whether to multiply with x or with −r1 from the first factor, and
then whether to multiply with x or with −r2 from the second, and so on, and we add
over all of these choices. For each choice, write b1 = 0 if we pick to multiply in the x,
but b1 = 1 if we pick to multiply in the −r1 , and so on:
p(x) = Σ_{b1 ,...,bn } (−r1 )^b1 (−r2 )^b2 . . . (−rn )^bn x^{(1−b1 )+(1−b2 )+···+(1−bn )}
= Σ_j Σ_{b1 +···+bn =j} (−r1 )^b1 (−r2 )^b2 . . . (−rn )^bn x^{n−j} .
Symmetric functions
A function f (t1 , t2 , . . . , tn ) is symmetric if its value is unchanged by permuting the
variables t1 , t2 , . . . , tn .
For example, t1 + t2 + · · · + tn is clearly symmetric. For any numbers t =
(t1 , t2 , . . . , tn ) let
Pt (x) = (x − t1 ) (x − t2 ) . . . (x − tn ) .
Clearly the roots of Pt (x) are precisely the entries of the vector t.
Let e(t) = (e1 (t), e2 (t), . . . , en (t)), so that e maps vectors to vectors (with entries
in our field).
Lemma 12.3. Over the field of complex numbers, the map e is onto, i.e. for each
complex vector c there is a complex vector t so that e(t) = c. In particular, there is
no relation between the elementary symmetric functions.
Proof. Let t1 , t2 , . . . , tn be the complex roots of the polynomial
P (x) = xn − c1 xn−1 + c2 xn−2 + · · · + (−1)n cn .
Such roots exist by the fundamental theorem of algebra (see theorem 7.3 on page 53).
By proposition 12.2 on the previous page, Pt (x) has coefficients precisely the same as
those of P (x), i.e. Pt (x) = P (x).
Lemma 12.4. Over any field, the entries of two vectors s and t are permutations of
one another just when e(s) = e(t), i.e. just when the elementary symmetric functions
take the same values at s and at t.
Proof. The roots of Ps (x) and Pt (x) are the same numbers.
Corollary 12.5. A function is symmetric just when it is a function of the elementary
symmetric functions.
This means that every symmetric function f (t1 , t2 , . . . , tn ) has the form f (t) =
h(e(t)), for a unique function h, and conversely if h is any function at all, then
f (t) = h(e(t)) determines a symmetric function. We want a recipe to write down
each symmetric polynomial function in terms of the elementary symmetric functions,
to find this mysterious h.
Take the function
f = 3x2 yz + 3xy 2 z + 3xyz 2 + 5xy + 5xz + 5yz.
Clearly the terms with the 5’s look like an elementary symmetric function:
f = 3x2 yz + 3xy 2 z + 3xyz 2 + 5e2 .
(We write e2 to mean the function e2 (x, y, z) = xy + xz + yz, the 2nd
elementary symmetric function.) But what about the terms with the 3’s?
Factor them all together as much as we can:
f = 3xyz(x + y + z) + 5e2 .
Then it is clear:
f = 3e3 e1 + 5e2 .
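The identity can be double checked at random sample points in plain Python:

```python
from itertools import combinations
from math import prod
import random

def e(i, t):
    # The i-th elementary symmetric function of the entries of t.
    return sum(prod(c) for c in combinations(t, i))

random.seed(1)
for _ in range(20):
    x, y, z = (random.randint(-9, 9) for _ in range(3))
    f = 3*x*x*y*z + 3*x*y*y*z + 3*x*y*z*z + 5*(x*y + x*z + y*z)
    assert f == 3 * e(3, (x, y, z)) * e(1, (x, y, z)) + 5 * e(2, (x, y, z))
```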
If a = (a1 , a2 , . . . , an ), write t^a to mean t1^a1 t2^a2 . . . tn^an . Order terms by
"alphabetical" order, also called weight, by taking account of the order in t1 first, and
then if the orders of two terms in t1 are tied, break the tie by looking at the order in
t2 , and so on.
• The term t1^5 t2^3 has higher weight than any of
t1^3 t2^5 , t1^5 t2^2 , 1, and t2^1000 .
• Showing the highest order term, and writing dots to indicate lower
order terms,
e1 (t) = t1 + . . . ,
e2 (t) = t1 t2 + . . . ,
..
.
ej (t) = t1 t2 . . . tj + . . .
..
.
en (t) = t1 t2 . . . tn .
If a symmetric polynomial contains a term, then it also contains every term
obtained by permuting the variables. The exponents appear in all possible permutations,
as in
t1^3 t2^5 + t1^5 t2^3 .
Among those permutations, the unique highest weight term, say t^a , has a = (a1 , a2 , . . . , an )
with a1 ≥ a2 ≥ · · · ≥ an .
Take a symmetric polynomial
f = 6 t1^16 t2^9 t3^7 + . . .
where we only write out the highest weight term. The highest weight term
contains the variables t1 , t2 , t3 , with smallest power in t3 and higher powers
in the others. Inside the highest weight term, each of t1 , t2 , t3 appears to at
least a power of 7. Factor out 7 powers of each variable:
f = 6 t1^9 t2^2 (t1 t2 t3 )^7 + . . .
In the remaining factor, t1 and t2 each appear at least to a power of 2, so
we factor out 2 of each:
f = 6 t1^7 (t1 t2 )^2 (t1 t2 t3 )^7 + . . .
So finally it is clear that f has the same highest weight term as 6 e1^7 e2^2 e3^7 . Hence
f = 6 e1^7 e2^2 e3^7 + . . .
up to terms of lower weight.
If we pick integers d1 , d2 , . . . , dn ≥ 0,
e1 (t)^d1 e2 (t)^d2 . . . en (t)^dn = t1^d1 (t1 t2 )^d2 . . . (t1 t2 . . . tn )^dn + . . . ,
= t1^(d1 +d2 +···+dn ) t2^(d2 +d3 +···+dn ) . . . tn^dn + . . . .
Write this more neatly as
e(t)^d = t^a + . . .
where
a1 = d1 + d2 + d3 + · · · + dn ,
a2 = d2 + d3 + · · · + dn ,
...
an = dn .
On the other hand, given a decreasing sequence of nonnegative integers
a = (a1 , a2 , . . . , an )
calculate d by
d1 = a1 − a2 ,
d2 = a2 − a3 ,
...
dn = an
and then e(t)^d = t^a + . . . . So for each decreasing sequence a = (a1 , a2 , . . . , an ), we
have t^a = e(t)^d + . . . .
Theorem 12.6. Every symmetric polynomial f has exactly one expression as a
polynomial in the elementary symmetric polynomials. If f has real/rational/integer
coefficients, then f is a real/rational/integer coefficient polynomial of the elementary
symmetric polynomials.
Proof. The highest weight term in f is a constant multiple of a monomial ta . Every
symmetric polynomial, if it contains a monomial ta among its terms, also contains tb ,
for any b obtained by permuting the entries of a. So if there is any term at all, there
is one ta for which a has decreasing entries. As above, let
dn = an , dn−1 = an−1 − an , dn−2 = an−2 − an−1 , . . . , d1 = a1 − a2 .
Then e(t)d has leading term ta , so subtracting off a suitable multiple of e(t)d from f
gives us a polynomial whose highest weight term is lower than ta . Apply induction
on the highest term.
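The proof is really an algorithm, and we can run it. The sketch below, in plain Python, stores a polynomial in three variables as a dict from exponent tuples to coefficients (a representation of our own choosing) and peels off the highest weight term exactly as in the proof:

```python
from itertools import combinations
from collections import defaultdict

n = 3

def multiply(p, q):
    out = defaultdict(int)
    for a, ca in p.items():
        for b, cb in q.items():
            out[tuple(x + y for x, y in zip(a, b))] += ca * cb
    return {k: v for k, v in out.items() if v}

def e_poly(i):
    # e_i as a polynomial: one term for each choice of i of the n variables.
    return {tuple(1 if j in c else 0 for j in range(n)): 1
            for c in combinations(range(n), i)}

def in_elementary(f):
    """Express a symmetric polynomial f in the elementary symmetric polynomials.
    Returns a dict (d1, ..., dn) -> coeff, meaning the sum of the terms
    coeff * e1^d1 * ... * en^dn."""
    f, result = dict(f), {}
    while f:
        a = max(f)                       # the highest weight (lexicographic) term
        coeff = f[a]
        # As in the proof: d_i = a_i - a_(i+1), d_n = a_n.
        d = tuple(a[i] - a[i + 1] for i in range(n - 1)) + (a[-1],)
        result[d] = coeff
        g = {(0,) * n: coeff}            # build coeff * e1^d1 ... en^dn
        for i, di in enumerate(d):
            for _ in range(di):
                g = multiply(g, e_poly(i + 1))
        f = {k: f.get(k, 0) - g.get(k, 0) for k in set(f) | set(g)
             if f.get(k, 0) != g.get(k, 0)}
    return result

# t1^2 + t2^2 + t3^2 = e1^2 - 2 e2:
sum_of_squares = {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): 1}
assert in_elementary(sum_of_squares) == {(2, 0, 0): 1, (0, 1, 0): -2}
```

The loop terminates because each pass strictly lowers the highest weight term, which is the induction in the proof.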
The sum of squares of two variables is symmetric:
t1^2 + t2^2 = (t1 + t2 )^2 − 2 t1 t2 .
To compute out these expressions: f (t) = t1^2 + t2^2 has highest term t1^2 . The
polynomials e1 (t) and e2 (t) have highest terms t1 and t1 t2 . So we subtract
off e1 (t)^2 from f (t), and find f (t) − e1 (t)^2 = −2 t1 t2 = −2 e2 (t).
12.1 Express each of the following polynomials as polynomials in the elementary
symmetric functions:
a. 4xyz^3 + 4xzy^3 + 4yzx^3
b. x^4 y^4 z^4
c. xyz + x^2 y^2 z^2
d. x^3 y^3 z + x^3 yz^3 + xy^3 z^3
Sage
Sage can compute with elementary symmetric polynomials. It writes them in a
strange notation. The polynomial we have denoted by e3 sage denotes by e[3].
More strangely, sage denotes e1 e2^3 e5^2 as e[5,5,2,2,2,1], which is the same in sage
as e[2]^3*e[1]*e[5]^2. You can't write e[1,2,2,2,5,5]; you have to write it as
e[5,5,2,2,2,1]: the indices have to decrease. Set up the required rings:
P.<w,x,y,z>=PolynomialRing(QQ)
S=SymmetricFunctions(QQ)
e=S.e()
This creates a ring P = Q[w, x, y, z], and then lets S be the ring of symmetric functions.
The last (mysterious) line sets up the object e to be the elementary symmetric functions
of the ring S. We can then define a polynomial in our variables:
f = w^2+x^2+y^2+z^2+w*x+w*y+w*z+x*y+x*z+y*z
e.from_polynomial(f)
which prints out e1,1 − e2 , the expression of f in terms of elementary symmetric
functions. To expand out a symmetric polynomial into w, x, y, z variables:
q = e[2,1]+e[3]
q.expand(4,alphabet=[’w’,’x’,’y’,’z’])
prints out
w^2 x + wx^2 + w^2 y + 4wxy + x^2 y + wy^2 + xy^2 + w^2 z + 4wxz + x^2 z + 4wyz + 4xyz + y^2 z + wz^2 + xz^2 + yz^2
Sums of powers
Define pj (t) = t1^j + t2^j + · · · + tn^j , the sums of powers.
Lemma 12.7 (Isaac Newton). The sums of powers are related to the elementary
symmetric functions by
0 = e1 − p1 ,
0 = 2 e2 − p1 e1 + p2 ,
..
.
0 = k ek − p1 ek−1 + p2 ek−2 − · · · + (−1)^(k−1) pk−1 e1 + (−1)^k pk .
Using these equations, we can write the elementary symmetric functions inductively in terms of the sums of powers, or vice versa.
Proof. Let’s write t(`) for t with the `th entry removed, so if t is a vector with n
entries, then t(`) is a vector with n − 1 entries.
pj ek−j =
X
tj`
`
X
ti1 ti2 . . . tik−j
i1 <i2 <···<ik−j
Either we can’t pull a t` factor out of the second sum, or we can:
i1 ,i2 ,···6=`
=
X
`
=
X
tj`
i1 ,i2 ,···6=`
X
ti1 ti2 . . . tik−j +
X
i1 <i2 <···<ik−j
tj` ek−j t(`) +
`
`
X
X
tj+1
`
ti1 ti2 . . . tik−j−1
i1 <i2 <···<ik−j−1
tj+1
ek−j−1 t(`) .
`
`
Putting in successive terms of our sum,
pj ek−j − pj+1 ek−j−1 =
X
−
X
=
X
tj` ek−j t(`) +
X
`
tj+1
ek−j−1 t(`)
`
`
tj+1
ek−j−1
`
(`)
t
−
X
−
`
X
tj+2
ek−j−2 t(`)
`
`
tj` ek−j
(`)
t
`
tj+2
ek−j−2
`
t(`) .
`
Hence the sum collapses to
p1 ek − p2 ek−1 + · · · + (−1)k−1 pk−1 e1 =
X
t` ek−1 t(`) + (−1)k−1
`
= k ek + (−1)k−1 pk .
X
`
tk` · e0 t(`)
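The identities are easy to confirm numerically; a plain Python check at random integer points:

```python
from itertools import combinations
from math import prod
import random

def e(i, t):
    # e_0 = 1 (the empty product), matching the convention in the proof.
    return sum(prod(c) for c in combinations(t, i))

def p(j, t):
    return sum(x ** j for x in t)

random.seed(2)
t = [random.randint(-5, 5) for _ in range(6)]
for k in range(1, 7):
    # 0 = k e_k - p_1 e_(k-1) + p_2 e_(k-2) - ... + (-1)^k p_k
    total = k * e(k, t) + sum((-1) ** j * p(j, t) * e(k - j, t)
                              for j in range(1, k + 1))
    assert total == 0
```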
Proposition 12.8. Every symmetric polynomial is a polynomial in the sums of
powers. If the coefficients of the symmetric polynomial are real (or rational), then it
is a real (or rational) polynomial function of the sums of powers.
Proof. We can solve recursively for the sums of powers in terms of the elementary
symmetric functions and conversely.
The invariants of a square matrix
A complex-valued or real-valued function f (A), depending as a polynomial on the
entries of a square matrix A, is an invariant if f (F AF −1 ) = f (A) for any invertible
matrix F .
So an invariant is independent of change of basis. If T : V → V is a linear map
on an n-dimensional vector space, we can define the value f (T ) of any invariant f of
n × n matrices, by letting f (T ) = f (A) where A = F −1 T F is the associated matrix,
for any isomorphism F : Rn → V of vector spaces.
For any n × n matrix A, write
det(A − λI) = χA (λ) = en (A) − en−1 (A)λ + en−2 (A)λ2 − · · · + (−1)n λn .
The functions e1 (A), e2 (A), . . . , en (A) are invariants, while χA (λ) is the characteristic polynomial of A.
If we write the trace of a matrix A as tr A, then the functions
pk (A) = tr A^k
are invariants.
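For example, in plain Python with 2 × 2 integer matrices (F chosen with determinant 1, so that its inverse is also an integer matrix):

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

random.seed(3)
A = [[random.randint(-5, 5) for _ in range(2)] for _ in range(2)]
F = [[2, 1], [1, 1]]                     # det F = 1, so F is invertible over Z
Finv = [[1, -1], [-1, 2]]                # its inverse

B = matmul(matmul(F, A), Finv)           # B = F A F^(-1)
Ak, Bk = A, B
for k in range(1, 5):
    assert trace(Ak) == trace(Bk)        # p_k(A) = tr A^k is invariant
    Ak, Bk = matmul(Ak, A), matmul(Bk, B)
```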
12.2 If A is diagonal, say the diagonal matrix with diagonal entries t1 , t2 , . . . , tn ,
then prove that ej (A) = ej (t1 , t2 , . . . , tn ), the elementary symmetric functions of the
eigenvalues.
12.3 Generalize the previous exercise to A diagonalizable.
Theorem 12.9. Every invariant polynomial function of real or complex matrices has
exactly one expression as a polynomial function of the elementary symmetric functions
of the eigenvalues.
We can replace the elementary symmetric functions of the eigenvalues by the sums
of powers of the eigenvalues.
Proof. Every invariant function f (A) determines a function f (t) by setting A to be
the diagonal matrix with diagonal entries t1 , t2 , . . . , tn .
Taking F any permutation matrix, invariance tells us that f (F AF −1 ) = f (A). But
f (F AF −1 ) is given by applying the associated permutation to the entries of t. Therefore f (t) is a symmetric function. Therefore f (t) = h(e(t)), for some polynomial
h; so f (A) = h(e(A)) for diagonal matrices. By invariance, the same is true for
diagonalizable matrices. If we work with complex matrices, then every matrix can be
approximated arbitrarily closely by diagonalizable matrices, as we hope the reader
has seen previously. Therefore by continuity of h, the equation f (A) = h(e(A)) holds
for all matrices A.
For real matrices, the equation only holds for those matrices whose eigenvalues
are real. However, for polynomials this is enough, since two polynomial functions
equal on an open set are equal everywhere.
Consider the function f (A) = ej (|λ1 | , |λ2 | , . . . , |λn |), where A has eigenvalues
λ1 , λ2 , . . . , λn . This function is a continuous invariant function of a real matrix A,
and is not a polynomial in λ1 , λ2 , . . . , λn .
Resultants and permutations
A polynomial is homogeneous if all of its terms have the same total degree, i.e. sum
of degrees in all variables.
12.4 Prove that a polynomial b(x) of degree m in some variables x1 , x2 , . . . , xn over
an infinite field is homogeneous just when b(tx) = t^m b(x) for every t. Give a counterexample over a finite field.
Lemma 12.10. Every factor of a homogeneous polynomial is homogeneous.
Proof. Write the polynomial factorized, as b(x)c(x), and expand b(x) and c(x) into
sums of terms of various total degrees. Each term of b(x)c(x) is a product of a term
of b(x) and a term of c(x), with total degrees adding up. Since every term of b(x)c(x)
has the same total degree, the product of the highest total degree terms of b(x) and
c(x) and the product of their lowest total degree terms have the same total degree,
which forces b(x) and c(x) each to have all of their terms in a single total degree.
In chapter 10, we saw that for any two polynomials split into linear factors
b(x) = (x − β1 ) (x − β2 ) . . . (x − βm ) ,
c(x) = (x − γ1 ) (x − γ2 ) . . . (x − γn ) ,
the resultant is
resb,c = (β1 − γ1 ) (β1 − γ2 ) . . . (βm − γn ) .
So the resultant is homogeneous of degree m + n in the variables βi , γj . The resultant
is clearly invariant under any permutation of the roots of b(x), and also under any
permutation of the roots of c(x). Indeed the resultant is by definition expressed
in terms of the coefficients of b(x) and c(x), not the roots. The coefficients are
homogeneous polynomials in the roots, elementary symmetric functions. Expanding
out the coefficients
b(x) = xm + bm−1 xm−1 + · · · + b0 ,
= xm − e1 (β)xm−1 + e2 (β)xm−2 + · · · ± em (β),
c(x) = xn + cn−1 xn−1 + · · · + c0 ,
= xn − e1 (γ)xn−1 + e2 (γ)xn−2 + · · · ± en (γ),
we see that bj has degree m − j in the roots.
Suppose now that we just take any polynomial b(x) of degree m with coefficients
being abstract variables bj . We will now invent a different concept of weight. It seems
natural to assign each coefficient bj the weight m − j, and then define the weight of a
monomial in the bj to be the sum of weights of its factors. If b(x) splits, the weight of
a polynomial in the coefficients of b(x) is precisely its degree as a polynomial in the
roots of b(x). In particular, the weight of the resultant is mn, when bi receives weight
m − i and cj receives weight n − j. For example, from the examples we computed,
we can see that there is a term b0^n , and by symmetry there is a term
(−1)^(mn) c0^m , both of weight mn.
Proposition 12.11. Given any two polynomials b(x, y) of total degree m and c(x, y)
of total degree n over a field, either there are at most mn values of y for which there is
some point (x, y) which satisfies both b(x, y) = 0 and c(x, y) = 0, or b(x, y) and c(x, y)
have a common factor as polynomials in x, y. If the polynomials are homogeneous
then either they have a homogeneous common factor, or their resultant is homogeneous
nonzero of degree mn.
Proof. Thinking of the polynomials as polynomials in x with coefficients rational in
y, we look at the resultant. Expanding out
b(x, y) =
X
j+k≤m
bjk xk y j =
X
bj (y)xj ,
j
we see that bj (y) has degree at most m − j, exactly the weight of the coefficient bj (y)
as it enters into the resultant. Therefore the resultant is a polynomial of degree at
most mn in y. The resultant vanishes at those values y = a for which there is a
common factor between b(x, a) and c(x, a), and in particular if there is a common
root it vanishes. So there are at most mn such points or the resultant is everywhere
zero. If the resultant vanishes everywhere, then b(x, y) and c(x, y) have a common
factor with coefficients rational functions of y. By Gauss's lemma (proposition 9.13
on page 73) they also have a common factor in polynomials in x, y.
For homogeneous polynomials, each term bj (y) has degree exactly m − j, so the
resultant either vanishes everywhere or has degree exactly mn.
Chapter 13
Fields
Names for our favourite sets
The following are the standard names for various collections of numbers (or remainders):
Z, the set of all integers,
Q, the set of all rational numbers,
R, the set of all real numbers,
C, the set of all complex numbers,
mZ, the set of all integer multiples of some number m,
Z/mZ, the set of remainders modulo a positive integer m.
We can draw R as the number line, and draw Z as equispaced dots on the number line, and C as the plane. We could also draw the elements of Z/mZ much like we would draw a clock, with the remainders 0, 1, . . . , m − 1 arranged in a circle.
13.1 Explain how to add real numbers, how to add complex numbers, and how to
add remainders modulo an integer, using these pictures. Explain how to multiply
complex numbers using these pictures.
Rings
All of the above sets are examples of rings. A ring is a set of objects S together with
two operations called addition and multiplication, and denoted by b + c and bc, so
that if b and c belong to S, then b + c and bc also belong to S, and so that:
Addition laws:
a. The associative law: For any elements a, b, c in S: (a + b) + c = a + (b + c).
b. The identity law: There is an element 0 in S so that for any element a in
S, a + 0 = a.
c. The existence of negatives: for any element a in S, there is an element b
in S (denote by the symbol −a) so that a + b = 0.
d. The commutative law: For any elements a, b in S, a + b = b + a.
Multiplication laws:
a. The associative law: For any elements a, b, c in S: (ab)c = a(bc).
The Distributive Law:
a. For any elements a, b, c in S: a(b + c) = ab + ac.
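For a finite ring such as Z/6Z, all of these laws can be checked exhaustively by machine. Here is a plain Python sketch (not Sage; the helper names add and mul are our own):

```python
# Represent Z/6Z as the numbers 0..5 with addition and multiplication mod 6,
# then check every ring law on every triple of elements.
m = 6
S = range(m)

def add(a, b): return (a + b) % m
def mul(a, b): return (a * b) % m

# Addition laws: associativity, identity, negatives, commutativity.
assert all(add(add(a, b), c) == add(a, add(b, c)) for a in S for b in S for c in S)
assert all(add(a, 0) == a for a in S)
assert all(any(add(a, b) == 0 for b in S) for a in S)
assert all(add(a, b) == add(b, a) for a in S for b in S)
# Multiplication law: associativity.
assert all(mul(mul(a, b), c) == mul(a, mul(b, c)) for a in S for b in S for c in S)
# Distributive law.
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) for a in S for b in S for c in S)
```

The same loops work for any modulus m, so this doubles as an experimental check that every Z/mZ is a ring.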
• Clearly Z, Q, R, C and Z/mZ (for any positive integer m) are rings.
• If X is a set, and S is a ring, then we can add and multiply any two
functions mapping X to S by adding and multiplying the values of the
functions as usual. The set T of all functions mapping X to S is a ring.
• If S is a ring, and x is an abstract variable (really just a pure symbol
that we write down) then a polynomial in x with values in S is a formal
expression we write down of the form
a0 + a1 x + · · · + an xn
with coefficients a0 , a1 , . . . , an drawn from S. A polynomial is equal to
0 just when all of its coefficients are 0. Denote the set of all polynomials
in x with coefficients from S as S[x]. Similarly, if we have two abstract
variables x and y, we treat them as commuting: xy = yx, and then we
define S[x, y] to be the set of all polynomials in x and y with coefficients
drawn from S. For example,

1/3 − 2x + (4/7)y + 8x²y + y³

belongs to Q[x, y].
• If S is a ring, a rational function with coefficients drawn from S is a formal ratio

b(x)/c(x)

of polynomials b(x), c(x) in S[x] with c(x) not the zero polynomial. We declare such a formal expression to be equal to

a(x)b(x)/(a(x)c(x))

for any nonzero polynomial a(x). Let S(x) be the collection of all rational functions with coefficients in S in a variable x. Danger: even though we call b(x)/c(x) a rational function, it is not, strictly speaking, actually a function, but only a formal expression. For example, if S = Z/2Z, then the rational function 1/(x(x + 1))
is not defined for any value of x in Z/2Z. So we have to think of x not
as a variable that varies over some numbers, but as an abstract symbol.
Similarly, a rational function is really not a function but an abstract
combination of symbols.
• If x, y are two abstract variables, we similarly define S(x, y) to be the
ring of all rational functions in x, y, i.e. formal ratios of polynomials
in x, y with coefficients drawn from S.
• If R is a ring, R[x, x⁻¹] means the ring whose elements are b(x) + c(x⁻¹), polynomials in an abstract variable x and an abstract variable called x⁻¹, but so that when we multiply those two variables together, we get 1.
• If R and S are rings, let R ⊕ S be the set of all pairs (r, s) where r is
in R and s is in S. Define addition and multiplication by
(r1 , s1 ) + (r2 , s2 ) ..= (r1 + r2 , s1 + s2 ) ,
(r1 , s1 ) (r2 , s2 ) ..= (r1 r2 , s1 s2 ) .
Similarly we define a finite (or even an infinite) sum of rings, as the set
of finite (or infinite) sequences of elements, one from each ring, with
obvious addition and multiplication, element at a time.
If a ring R is a subset of a ring S, and the addition and multiplication operations
of R are just the ones of S applied to elements of R, then R is a subring of S. For
example, Z is a subring of Q, which is subring of R, which is a subring of C.
Special types of rings
A ring is a ring with identity if it satisfies the identity law: There is an element 1 in
S so that for any element a in S, a1 = a.
A ring is an integral domain if it satisfies the no zero divisors law: for any elements a, b of S, if ab = 0 then a = 0 or b = 0.
A ring is commutative if it satisfies the commutative law: for any elements a, b of
S, ab = ba.
The associative law for addition, applied twice, shows that (a + b) + (c + d) =
a + (b + (c + d)), and so on, so that we can add up any finite sequence, in any order,
and get the same result, which we write in this case as a + b + c + d. A similar story
holds for multiplication.
13.2 Let S be the set of all 2 × 2 matrices with integer coefficients, using the usual
laws of matrix addition and multiplication as the addition and multiplication in S.
Prove that S is a ring with identity, but not commutative. (Note that we require
every ring to have commutative addition; a commutative ring is one with commutative
multiplication).
13.3 Let 2Z be the set of all even integers using the usual laws of integer addition
and multiplication as the addition and multiplication in 2Z. Prove that 2Z is a
commutative ring but without identity.
13.4 Let 2S be the set of all 2 × 2 matrices with even integer coefficients, using the
usual laws of matrix addition and multiplication as the addition and multiplication
in 2S. Prove that 2S is a noncommutative ring without identity.
13.5 Suppose that S is a ring with identity in which 0 = 1. Prove that S = { 0 }.
From here on, we will only be interested in commutative rings with identity.
Fields
A unit in a ring is an element that has a multiplicative inverse, i.e. a reciprocal. For
example, in Z/7Z every nonzero element is a unit, because 7 is a prime number. In Z
the units are ±1. If S is a ring, we denote by S × the set of units of S. For example, for
any ring S without zero divisors, (S[x])× = S×, since no polynomial of positive degree has a reciprocal polynomial. On the other hand, every nonzero rational function is a unit in S(x).
13.6 Prove that, for any ring S with identity, if b and c are units then bc is a unit.
13.7 Suppose that R is a ring and that b ∈ R is both a zero divisor and a unit. Prove that R = { 0 }, i.e. that R has exactly one element. Prove that any two rings that contain exactly one element are isomorphic, by a unique isomorphism.
A field is a commutative ring with identity in which every nonzero element is a
unit, i.e. has a multiplicative inverse.
Lemma 13.1. For any positive integer m, Z/mZ is a field if and only if m is prime.
Proof. If m factors, say as m = bc, with b, c ≥ 2, then b ≠ 0 modulo m, but bc = 0
modulo m, so both b and c are zero divisors, and therefore are not units. If m does
not factor, then m is prime, and we know that every nonzero element of Z/mZ is a
unit; we even have a recipe to find its multiplicative inverse.
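A quick brute-force experiment in plain Python (the helper name units is our own) illustrates the lemma:

```python
# List the units of Z/mZ by searching for a multiplicative inverse by brute
# force; modulo a prime every nonzero remainder appears, modulo a composite
# the zero divisors are missing.
def units(m):
    """Return the list of units in Z/mZ, found by brute force."""
    return [b for b in range(1, m)
            if any(b * c % m == 1 for c in range(1, m))]

assert units(7) == [1, 2, 3, 4, 5, 6]   # 7 is prime: every nonzero element is a unit
assert units(6) == [1, 5]               # 6 = 2 * 3: here 2, 3, 4 are zero divisors
```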
• The set Q of rational numbers is a field, as is the set R of real numbers
and the set C of complex numbers.
• Any field k lies inside the field k(x) of rational functions with coefficients
in k. In particular, every field is contained inside an infinite field.
We can easily finish the proof of proposition 10.4 on page 80 over any field, because
any field k lies inside an infinite field k(x).
13.8 A zero divisor r in a ring R is an element so that rs = 0 for some nonzero element s of R. Suppose that k is a field. Find the (i) zero divisors and (ii) the units in
a. k
b. k[x]
c. k[x, x⁻¹]
d. k[x] ⊕ k[y].
Linear algebra
Most of linear algebra works identically over any field: matrices, Gaussian elimination,
invertibility, solvability of equations, determinants, and the theory of eigenvalues and
eigenvectors.
13.9 Let k be the Boolean numbers k ..= Z/2Z, and A the matrix

A = ( 0 1 1 )
    ( 1 0 1 )
    ( 0 1 0 ),

thought of as having entries from k. Is A invertible? If so, find A⁻¹.
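One way to experiment with such questions is Gauss-Jordan elimination over Z/pZ, which works just as over Q because every nonzero pivot has an inverse mod p. The following plain Python sketch is our own helper, demonstrated on a different sample matrix so as not to spoil the problem:

```python
# Invert a square matrix over Z/pZ (p prime) by Gauss-Jordan elimination on
# the augmented matrix [A | I]; return None if A is singular.
def inverse_mod_p(A, p):
    n = len(A)
    # Augment A with the identity matrix.
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero pivot in this column.
        pivot = next((r for r in range(col, n) if M[r][col] % p != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        # Inverse of the pivot mod p, by Fermat's little theorem (p prime).
        inv = pow(M[col][col], p - 2, p)
        M[col] = [x * inv % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] % p:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

# A sample matrix over Z/2Z (not the one from the problem):
B = [[1, 1], [0, 1]]
assert inverse_mod_p(B, 2) == [[1, 1], [0, 1]]  # B is its own inverse mod 2
```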
13.10 If A is a matrix whose entries are rational functions of a variable t over a field
k, prove that the rank of A is constant in t, except for finitely many values of t.
13.11 If A is a matrix whose entries are integers, let Ap be the same matrix, but
with coefficients taken modulo a prime p. Prove that A is invertible over the rational
numbers (i.e. has an inverse matrix with rational number entries) just when, for all
but finitely many primes p, the matrix Ap is invertible over Z/pZ. Prove that A is
not invertible over the rational numbers just when, for every prime number p, Ap is
not invertible over Z/pZ. Prove that A is invertible over the integers just when, for
every prime number p, the matrix Ap is invertible over Z/pZ.
Splitting fields
If k and K are fields and k is contained in K, with the same identity element, zero
element, addition and multiplication operations, then k is a subfield of K and K is
an extension of k. The oldest example: the complex numbers form a field extension
of the real numbers. A polynomial p(x) with coefficients in a field k splits if it is
a product of linear factors. If k ⊂ K is a subfield, a polynomial p(x) in k[x] splits
over K if it splits into a product of linear factors when we allow the factors to have
coefficients from K.
The real-coefficient polynomial x2 + 1 splits over C:
x2 + 1 = (x − i) (x + i) .
13.12 Denote by Q(√2) the field extension of Q generated by √2, i.e. the smallest subfield of C containing √2. Prove that Q(√2) consists precisely of the numbers b + c√2 for b, c rational. Prove that Q(√2) is a splitting field for x² − 2 over Q.
Consider how we could write down field extensions. If k ⊂ K is a subfield, a basis
of K over k is a collection of elements α1 , α2 , . . . , αn of K so that every element of
K has the form
a1 α1 + a2 α2 + · · · + an αn
for some coefficients
a1 , a2 , . . . , an
from k. The degree of K over k is the smallest number of elements in a basis.
• We can write every element of C as x · 1 + y · i, for some real coefficients
x, y. So C is a degree 2 extension of R with 1, i as a basis.
• Clearly 1, √2 is a basis for Q(√2) over Q.
• The real numbers R as an extension of Q has no basis; it is an infinite
degree extension.
We will prove the following theorem in a later chapter:
Theorem 13.2. If p(x) is a polynomial over a field k, then there is an extension K
of k over which p(x) splits, and so that every element of K is expressible as a rational
function of the roots of p(x) with coefficients from k. Moreover, K has finite degree
over k. This field K is called the splitting field of p(x) over k, and is uniquely
determined up to an isomorphism of fields which is the identity map on k.
• Over k = R the polynomial p(x) = x2 + 1 has splitting field C:
x2 + 1 = (x − i) (x + i) .
• Every polynomial over C splits into linear factors, so for any subfield
k ⊂ C the splitting field K of any polynomial over k lies inside C, a
field of complex numbers.
Example of a splitting field
Consider the polynomial p(x) ..= x2 + x + 1 over the finite field k ..= Z/2Z. Let’s look
for roots of p(x). Try x = 0:
p(0) = 02 + 0 + 1 = 1,
no good. Try x = 1:
p(1) = 12 + 1 + 1 = 1 + 1 + 1 = 1,
since 1 + 1 = 0. No good. So p(x) has no roots in k. We know by theorem 13.2 that
there is some splitting field K for p(x), containing k, so that p(x) splits into linear
factors over K, say
p(x) = (x − α) (x − β) ,
for some α, β ∈ K. Moreover, every element of K is a rational function of α, β over k.
What can we say about this mysterious field K? We will find that we can nail down
the multiplication and addition table of K completely. Effectively, we can just write
down K. We know that K contains k, contains α, contains β, and that everything in
K is made up of rational functions over k with α and β plugged in for the variables.
We also know that K has finite dimension over k. Otherwise, K is a total mystery:
we don’t know its dimension, or a basis of it over k, or its multiplication or addition
rules, or anything. We know that in k, 1 + 1 = 0. Since k is a subfield of K, this
holds in K as well. So in K, for any c ∈ K,
c(1 + 1) = c0 = 0.
Therefore c + c = 0 in K, for any element c ∈ K. Roughly speaking, the arithmetic
rules in k impose themselves in K as well.
A clever trick, which you probably wouldn’t notice at first: it turns out that
β = α + 1. Why? Clearly by definition, α is a root of p(x), i.e.
α2 + α + 1 = 0.
So then let’s try α + 1 and check that it is also a root.
(α + 1)2 + (α + 1) + 1 = α2 + 2 α + 1 + α + 1 + 1,
but numbers in K cancel in pairs, c + c = 0, so
= α2 + α + 1,
=0
since α is a root of p(x). So therefore elements of K can be written in terms of α
purely, and we can just forget β.
The next fact about K: clearly
{0, 1, α, α + 1} ⊂ K.
We want to prove that
{0, 1, α, α + 1} = K.
How? First, let's try to make up an addition table for these 4 elements:

  +    | 0     1     α     α+1
  0    | 0     1     α     α+1
  1    | 1     0     α+1   α
  α    | α     α+1   0     1
  α+1  | α+1   α     1     0
To make up a multiplication table, we need to note that
0 = α2 + α + 1,
so that
α2 = −α − 1 = α + 1,
and
α(α + 1) = α2 + α = α + 1 + α = 1.
Therefore
(α + 1) (α + 1) = α2 + 2α + 1 = α.
This gives the complete multiplication table:
  ·    | 0   1     α     α+1
  0    | 0   0     0     0
  1    | 0   1     α     α+1
  α    | 0   α     α+1   1
  α+1  | 0   α+1   1     α
Looking for reciprocals, we find that

1/0 does not exist,   1/1 = 1,   1/α = α + 1,   1/(α + 1) = α.
So {0, 1, α, α + 1} is a field, containing k, and p(x) splits over this field, and the field
is generated by k and α, so this field must be the splitting field of p(x):
{0, 1, α, α + 1} = K.
So K is a finite field with 4 elements.
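We can replay this whole computation in plain Python by representing each element c₀ + c₁α of K as a pair (c0, c1) of remainders modulo 2, with the rule α² = α + 1 built into multiplication (the encoding and helper names are our own):

```python
# Elements of K = {0, 1, a, a+1} as pairs (c0, c1) meaning c0 + c1*a,
# with coefficients in Z/2Z and the relation a^2 = a + 1.
def add(u, v):
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    # (u0 + u1 a)(v0 + v1 a) = u0 v0 + (u0 v1 + u1 v0) a + u1 v1 a^2,
    # and a^2 = a + 1, so the a^2 term contributes u1 v1 to both coefficients.
    c0 = (u[0] * v[0] + u[1] * v[1]) % 2
    c1 = (u[0] * v[1] + u[1] * v[0] + u[1] * v[1]) % 2
    return (c0, c1)

zero, one, a, a1 = (0, 0), (1, 0), (0, 1), (1, 1)
assert mul(a, a) == a1            # a^2 = a + 1
assert mul(a, a1) == one          # a(a + 1) = 1, so 1/a = a + 1
assert add(a, a) == zero          # c + c = 0 for every c
# a is a root of p(x) = x^2 + x + 1:
assert add(add(mul(a, a), a), one) == zero
```

Looping add and mul over all four pairs reproduces exactly the two tables above.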
13.13 Consider the polynomial
p(x) = x3 + x2 + 1
over the field k = Z/2Z. Suppose that the splitting field K of p(x) contains a root α
of p(x). Prove that α2 and 1 + α + α2 are the two other roots. Compute the addition
table and the multiplication table of the 8 elements
0, 1, α, 1 + α, α2 , 1 + α2 , α + α2 , 1 + α + α2 .
Use this to prove that
K = { 0, 1, α, 1 + α, α2 , 1 + α2 , α + α2 , 1 + α + α2 }
so K is a finite field with 8 elements.
Sage
Let's get Sage to construct the splitting field K of p(x) = x2 + x + 1 over the field
k = Z/2Z. Sage constructs splitting fields by first building a ring, which we will call
R, of polynomials over a field. In our case, we build R = k[x]. Then we let K be the
field generated by an element a, which corresponds to the variable x in the polynomial
ring R.
R.<x> = PolynomialRing(GF(2))
K.<a> = (x^2 + x + 1).splitting_field()
We can then make a list of some elements of K, call it S, and compute out addition
and multiplication tables:
S=[0,1,a,a+1]
print("Addition table:")
for i in range(0,4):
    for j in range(0,4):
        print("({})+({})={}".format(S[i],S[j],S[i]+S[j]))
print("Multiplication table:")
for i in range(0,4):
    for j in range(0,4):
        print("({})*({})={}".format(S[i],S[j],S[i]*S[j]))
Algebraic closure
An extension K of a field k is a splitting field of a collection of polynomials over k if
every one of those polynomials splits over K and K is generated by the roots of all of
these polynomials put together. If that collection consists of all of the polynomials
defined over k, then we say that K is an algebraic closure of k.
We won’t prove:
Theorem 13.3. Every field k has an algebraic closure, denoted k̄, unique up to an
isomorphism which is the identity map on k.
• Clearly R̄ = C.
• Danger: Q̄ ≠ C. The number π = 3.14 . . . does not satisfy any polynomial with rational coefficients, so π belongs to C but doesn't belong
to Q̄. However, Q̄ is a subfield of C. Sit Q inside C as usual. Take
the subfield of C generated by all roots of all polynomials with rational
coefficients, all roots of polynomials with all of those as coefficients,
and so on, and take the union K of all such fields. By the theorem, K
is isomorphic to Q̄, by an isomorphism fixing every rational number.
• If k is any finite field, say with elements
k = { α1 , α2 , . . . , αn } ,
then let
p(x) ..= 1 + (x − α1 ) (x − α2 ) . . . (x − αn ) .
Clearly p(α) = 1 for any α ∈ k. Therefore k is not algebraically closed:
every algebraically closed field is infinite.
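The argument is easy to test in plain Python, say over k = Z/5Z:

```python
# Over k = Z/5Z, the polynomial p(x) = 1 + (x - a1)...(x - an), with one
# linear factor for each element of k, takes the value 1 at every element
# of k, so it has no root in k.
k = list(range(5))

def p(x):
    prod = 1
    for alpha in k:
        prod = prod * (x - alpha) % 5
    return (1 + prod) % 5

assert all(p(alpha) == 1 for alpha in k)
```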
13.14 Prove that, for any finite field, and any integer, there is a finite extension of
that field which has more elements than that integer.
Proposition 13.4. Draw a curve

x(t) = p(t)/q(t),    y(t) = u(t)/v(t),

where p(t), q(t), u(t), v(t) are polynomials with coefficients in a field k. The resultant in the variable t of the polynomials

p(t) − x q(t),    u(t) − y v(t)

vanishes at the points (x, y) in the plane, with coordinates in k̄, which lie along this curve.
Proof. The proof is the same as that of proposition 10.6 on page 83, using algebraic
closure instead of complex coefficients.
Lemma 13.5. Suppose that b(x) and c(x) are polynomials over a field. In the
splitting field of b(x), take roots β1 , β2 , . . . , βm of b(x) and in the splitting field of
c(x), take roots γ1 , γ2 , . . . , γn of c(x). If the highest terms are b(x) = bm xm + . . .
and c(x) = cn xn + . . . then
res_{b,c} = bₘⁿ cₙᵐ (β₁ − γ₁) (β₁ − γ₂) . . . (βₘ − γₙ)
          = bₘⁿ c(β₁) c(β₂) . . . c(βₘ)
          = (−1)ᵐⁿ cₙᵐ b(γ₁) b(γ₂) . . . b(γₙ).
Proof. Our previous proof, in proposition 10.4 on page 80, worked over any infinite
field in which the polynomials split. That proof therefore works over the algebraic
closure of any field, since the algebraic closure is infinite and contains all splitting
fields.
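Here is a numerical sanity check of lemma 13.5 in plain Python over Q, for split polynomials whose roots and leading coefficients we choose ourselves:

```python
# b(x) = 2(x-1)(x-2) and c(x) = 3(x-3)(x-4)(x-5), so bm = 2, m = 2,
# cn = 3, n = 3; all three expressions from the lemma should agree.
betas, gammas = [1, 2], [3, 4, 5]
bm, cn = 2, 3
m, n = len(betas), len(gammas)

def b(x):
    return bm * (x - betas[0]) * (x - betas[1])

def c(x):
    return cn * (x - gammas[0]) * (x - gammas[1]) * (x - gammas[2])

prod = 1
for beta in betas:
    for gamma in gammas:
        prod *= beta - gamma
r1 = bm**n * cn**m * prod
r2 = bm**n * c(betas[0]) * c(betas[1])
r3 = (-1)**(m * n) * cn**m * b(gammas[0]) * b(gammas[1]) * b(gammas[2])
assert r1 == r2 == r3
```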
Sage
Sage computes resultants of polynomials over any field. For example, let K be the
splitting field of x2 +x+1 over k = Z/2Z, with generator a. We compute the resultant
and greatest common divisor of
p(t) = (at + 1)²(at − 1)²,   q(t) = (at + 1)(at + a²)².
via
R.<x> = PolynomialRing(GF(2))
K.<a> = (x^2 + x + 1).splitting_field()
S.<t> = PolynomialRing(K)
p = S((a*t+1)^2*(a*t-1)^2)
q = S((a*t+1)*(a*t+a^2)^2)
print(p.gcd(q))
print(p.resultant(q))
yielding t + a + 1 and 0.
Chapter 14
Rings
We dance round in a ring and suppose, but the secret sits in the middle
and knows.
— Robert Frost
The Secret Sits, A Witness Tree, 1942
Morphisms
A morphism of rings f : R → S is a map f taking elements of a ring R to elements of a ring S, so that f (a + b) = f (a) + f (b) and f (ab) = f (a)f (b) for all elements a, b of R. (Morphisms are also often called homomorphisms.)
The map f : Z → Z/mZ, f (b) = b̄, remainder modulo m, is a morphism.
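We can check this on a sample modulus with a short plain Python loop (the name f is our own):

```python
# The remainder map f(b) = b mod m preserves both operations, which is
# exactly the definition of a ring morphism; check it on many inputs.
m = 5

def f(b):
    return b % m

for a in range(-20, 21):
    for b in range(-20, 21):
        assert f(a + b) == (f(a) + f(b)) % m
        assert f(a * b) == (f(a) * f(b)) % m
```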
14.1 Prove that the map f : Z/3Z → Z/6Z, f (b) = 4b is a morphism. Draw pictures
to explain it.
14.2 Find all morphisms f : Z/pZ → Z/qZ, where p and q are prime numbers and p ≠ q.
Sage
We define R = Q[x, y] and T = Q[z] and define a morphism φ : R → T by φ(x) = z³ and φ(y) = 0:
R.<x,y> = PolynomialRing(QQ)
T.<z> = PolynomialRing(QQ)
phi = R.hom([z^3, 0],T)
phi(x^2+y^2)
yielding φ(x² + y²) = z⁶.
Kernel and image of a morphism
If f : R → S is a morphism of rings, the kernel of f is the set
ker f ..= { r ∈ R | f (r) = 0 } .
The image of f is the set
f (R) ..= { s ∈ S | s = f (r) for some r ∈ R } .
The morphism x ∈ R ↦ x + 0i ∈ C has kernel { 0 } and image the set of all numbers x + 0i. (We often write 0 to mean { 0 } in algebra.)
The morphism a ∈ Z ↦ ā ∈ Z/mZ has kernel mZ.
14.3 Prove that the kernel of a morphism f : R → S of rings is a subring of R, while
the image is a subring of S.
14.4 Prove that the morphism p(x) ∈ R[x] ↦ p(i) ∈ C has kernel consisting of all polynomials of the form

(1 + x²) p(x),

for any polynomial p(x).
14.5 Suppose that R is a ring and that f, g : Q → R are morphisms and that f (1) =
g(1). Prove that f = g.
A morphism f : R → S with kernel 0 is injective, also called one-to-one, while a
morphism f : R → S with image f (R) = S is surjective or onto.
A surjective and injective morphism is an isomorphism. If there is an isomorphism between rings R and S, they are isomorphic, denoted R ≅ S. If two rings are isomorphic, then they are identical for all practical purposes. An isomorphism f : R → R from a ring to itself is an automorphism.
Take some positive integers m1 , m2 , . . . , mn . Let
m = m1 m2 . . . mn .
For each k ∈ Z/mZ, write its remainder modulo m1 as k̄1 , and so on. Write k̄ to mean

k̄ ..= (k̄1 , k̄2 , . . . , k̄n ).
Let
S = (Z/m1 Z) ⊕ (Z/m2 Z) ⊕ · · · ⊕ (Z/mn Z) .
Map f : Z/mZ → S by f (k) = k̄. It is clear that f is a ring morphism. The
Chinese remainder theorem states that f is a ring isomorphism precisely
when any two of the integers
m1 , m2 , . . . , mn
are coprime. For example, if m = 3030 = 2 · 3 · 5 · 101, then
Z/3030Z ≅ Z/2Z ⊕ Z/3Z ⊕ Z/5Z ⊕ Z/101Z.
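A plain Python sketch makes the coprimality condition visible (the helper name remainder_map is our own):

```python
# When the moduli are pairwise coprime, k -> (k mod m1, ..., k mod mn) is a
# bijection from Z/mZ onto the sum of the Z/miZ, hence a ring isomorphism;
# when they share a factor, the map repeats values and cannot be injective.
def remainder_map(moduli):
    m = 1
    for mi in moduli:
        m *= mi
    return [tuple(k % mi for mi in moduli) for k in range(m)]

# 30 = 2 * 3 * 5, pairwise coprime: all 30 images are distinct.
assert len(set(remainder_map([2, 3, 5]))) == 30

# 4 and 6 share the factor 2: only 12 distinct images among 24.
assert len(set(remainder_map([4, 6]))) == 12
```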
14.6 Prove that Z/4Z is not isomorphic to Z/2Z ⊕ Z/2Z.
Kernels
The main trick to understand complicated rings is to find morphisms to them from
more elementary rings, and from them to more elementary rings.
14.7 Prove that every subring of any given ring is the image of some morphism of
rings.
It is natural to ask which subrings of a ring are kernels of morphisms. An ideal
of a commutative ring R is a nonempty subset I of R so that, if i, j are in I then
i − j is in I and if i is in I and r is in R then ir is in I: an ideal is sticky under
multiplication.
• The even integers form an ideal 2Z inside the integers.
• The integers do not form an ideal inside the rational numbers. They are not sticky, because i = 2 is an integer and r = 1/3 is rational, but ir = 2/3 is not an integer.
• The real polynomials in a variable x vanishing to order 3 or more at
the origin form an ideal x3 R[x] inside R[x]. Note why they are sticky:
if i = x3 p(x) and r = q(x) then ir = x3 p(x)q(x) has a factor of x3 still,
so also vanishes to order 3 or more. This is a good way to think about
ideals: they remind us of the ideal of functions vanishing to some order
somewhere, inside some ring of functions.
14.8 Prove that the only ideals in Q are 0 and Q.
Lemma 14.1. The kernel of any morphism of rings is an ideal.
Proof. Take a ring morphism f : R → S and let I ..= ker f . Then if i, j are in
I, this means precisely that f kills i and j, so f (i) = 0 and f (j) = 0. But then
f (i − j) = f (i) − f (j) = 0 − 0 = 0. So f kills i − j, or in other words i − j lies in I.
Pick any i in I and any r in R. We need to prove that ir is in I, i.e. that f kills
ir, i.e. that 0 = f (ir). But f (ir) = f (i)f (r) = 0f (r) = 0.
We will soon see that kernels of morphisms are precisely the same as ideals.
14.9 Prove that the ideals of Z are precisely 0, Z and the subsets mZ for any integer
m ≥ 2. (Of course, we could just say that the ideals are the sets mZ for any integer
m ≥ 0.)
Lemma 14.2. Given a ring R and any set A of elements of R, there is a unique smallest ideal containing A, denoted (A). To be precise, this ideal I ..= (A) is the set of all finite sums

r1 a1 + r2 a2 + · · · + rn an

for any elements r1 , r2 , . . . , rn of R and any elements a1 , a2 , . . . , an of A.
Proof. Clearly A lies in I, and I is an ideal. On the other hand, if A lies in some ideal,
that ideal must contain all elements of A, and be sticky, so contain all products ra
for r in R and a in A, and so contains all finite sums of such, and so contains I.
• The ideal I ..= (A) generated by A ..= { 12, 18 } inside R ..= Z is
precisely 6Z, because we use Bézout coefficients r1 , r2 to write 6 as 6 =
r1 · 12 + r2 · 18, and so 6 lies in I = (12, 18). But then I = (12, 18) = (6).
• More generally, the ideal I ..= (m1 , m2 , . . . , mn ) generated by some
integers m1 , m2 , . . . , mn is precisely I = (m) = mZ generated by their
greatest common divisor. So we can think of ideals in rings as a
replacement for the concept of greatest common divisor.
• The same idea works for polynomials, instead of integers. Working, for example, over the rational numbers, the ideal I ..=
(p1 (x), p2 (x), . . . , pn (x)) inside the ring R = Q[x] is just I = p(x)Q[x]
where
p(x) = gcd {p1 (x), p2 (x), . . . , pn (x)} .
• In R ..= Q[x, y], the story is more complicated. The ideal I ..= (x, y²) is not expressible as p(x, y)R, because I cannot be expressed using only one generator p(x, y).
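For the ideal (12, 18) = (6) above, the extended Euclidean algorithm actually produces the Bézout coefficients. A plain Python sketch (the helper name bezout is our own):

```python
# Extended Euclidean algorithm: returns (g, r1, r2) with
# g = gcd(a, b) = r1*a + r2*b, so g lies in the ideal (a, b).
def bezout(a, b):
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

g, r1, r2 = bezout(12, 18)
assert g == 6 and r1 * 12 + r2 * 18 == 6
# Conversely, every combination r1*12 + r2*18 is a multiple of 6:
assert all((i * 12 + j * 18) % 6 == 0 for i in range(-5, 6) for j in range(-5, 6))
```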
Sage
In sage we construct ideals, for example inside the ring R = Q[x, y], as:
R.<x,y>=PolynomialRing(QQ)
R.ideal( [x^3,y^3+x^3] )
We can then test whether two ideals are equal:
R.ideal( [x^3,y^3+x^3] )==R.ideal( [x^3,y^3] )
yields True.
Fractions
Theorem 14.3. A commutative ring R lies inside a field k as a subring just when
the product of any two nonzero elements of R is nonzero. After perhaps replacing k by
the subfield generated by R, the field k is then uniquely determined up to isomorphism,
and called the field of fractions of R. Every ring morphism R → K to a field K
extends uniquely to an injective field morphism k → K.
Proof. If a commutative ring R lies in a field k, then clearly any two nonzero elements
have nonzero product, as they both have reciprocals. Take two pairs (a, b) and (c, d)
with a, b, c, d in R, and b ≠ 0 and d ≠ 0. Declare them equivalent if ad = bc. Write the equivalence class of (a, b) as a/b. Define, as you might expect,

a/b + c/d = (ad + bc)/(bd),
a/b − c/d = (ad − bc)/(bd),
a/b · c/d = (ac)/(bd).

Tedious checking shows that these are well defined, and yield a field k. The crucial step is that the denominators can't vanish. The reciprocal in k is easy to find:

(a/b)⁻¹ = b/a

if a ≠ 0 and b ≠ 0. Map R to k by a ↦ a/1. Check that this map is injective. Given a ring morphism f : R → K to a field, extend by

f(a/b) = f(a)/f(b).
Again tedious checking ensures that this is defined. Every field morphism is injective.
• The field of fractions of Z is Q.
• The field of fractions of the ring Z[1/2] of half integers is also Q.
• The field of fractions of a field k is k.
• The field of fractions of Z[x] is Q(x).
• The field of fractions of Q[x] is also Q(x).
14.10 Prove that these examples above are correct.
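Python's standard library happens to implement the field of fractions of Z exactly as in theorem 14.3: fractions.Fraction identifies the pair (a, b) with (c, d) precisely when ad = bc.

```python
# Fraction(a, b) is the equivalence class a/b from the proof of theorem 14.3,
# with the expected addition, multiplication, and reciprocal.
from fractions import Fraction

assert Fraction(2, 4) == Fraction(1, 2)                    # (2,4) ~ (1,2) since 2*2 == 4*1
assert Fraction(1, 3) + Fraction(1, 6) == Fraction(1, 2)   # (ad + bc)/(bd)
assert 1 / Fraction(2, 3) == Fraction(3, 2)                # reciprocal b/a
```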
Sage
Sage constructs fields of fractions out of any ring with no zero divisors. For example,
let F = Z/7Z and R = F [x, y] and let k be the fraction field of R, and then S = k[t]
and then compute the resultant of two polynomials p(t), q(t) ∈ S:
R.<x,y> = PolynomialRing(GF(7))
k=R.fraction_field()
S.<t> = PolynomialRing(k)
p=S(x*t+t^2/t)
q=S(y*t+1)
p.resultant(q)
Algebras
Almost every ring we encounter allows us to multiply by elements from some field.
• Matrices over a field k can be multiplied by elements of k.
• Polynomials over a field k can be multiplied by elements of k.
• Rational functions over a field k can be multiplied by elements of k.
An algebra A over a field k is a ring with an operation s ∈ k, a ∈ A ↦ sa ∈ A,
called scalar multiplication so that
a. s(a + b) = (sa) + (sb),
b. (r + s)a = (ra) + (sa),
c. (rs)a = r(sa),
d. s(ab) = (sa)b = a(sb),
e. 1a = a,
for r, s ∈ k, a, b ∈ A.
• The field k is an algebra over itself.
• If k ⊂ K is a subfield, then K is an algebra over k.
• If A is an algebra over k, then the n × n matrices with entries in A are
an algebra over k.
• If A is an algebra over k, then the polynomials A[x] with coefficients
in A are an algebra over k.
Chapter 15
Algebraic curves in the plane
Everyone knows what a curve is, until he has studied enough mathematics to become confused through the countless number of possible
exceptions.
— Felix Klein
Let’s find the solutions of the algebraic equation y 2 = x2 + x3 . The solutions
form a curve in the plane, intuitively because the equation is one constraint on
two variables. First, we note that there is one obvious solution: (x, y) = (0, 0).
Next, a minor miracle occurs: we find that almost every line through (0, 0)
passes through another point of the curve, which we can compute. To see
this, write any line through (0, 0), say with slope t, as y = tx. Plug this in
to the equation y 2 = x2 + x3 to see if we can find any more solutions:
(tx)2 = x2 + x3 ,
which we simplify to get either x = 0 or, dividing by x,
t2 = 1 + x.
Solving for x, we find x = t² − 1. Plug in to the equation of the line y = tx to get y = t(t² − 1). Every solution, except (x, y) = (0, 0), has to lie on some
line through (0, 0), and that line will be either y = tx or a vertical line x = 0.
But on the line x = 0, the only solution is y = 0, the origin. Moreover, if
we let t = ±1, we get the solution (x, y) = (0, 0) too. So we get all of the
solutions.
Each line strikes the curve at two points (one of which is the origin), except for the vertical line.
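We can verify the parametrization x = t² − 1, y = t(t² − 1) with exact rational arithmetic in plain Python:

```python
# Check that every rational value of the slope t yields a point of the
# curve y^2 = x^2 + x^3, using exact fractions rather than floats.
from fractions import Fraction

for t in [Fraction(p, q) for p in range(-5, 6) for q in range(1, 6)]:
    x = t**2 - 1
    y = t * (t**2 - 1)
    assert y**2 == x**2 + x**3
```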
An algebraic curve is the set of zeroes of a nonconstant polynomial in two variables; the degree of the curve is the degree of the polynomial. A curve of degree 1, 2, 3, 4, 5, 6 is called a line, conic, cubic curve, quartic curve, quintic curve, sextic curve.
15.1 Use the same trick for the circle x² + y² = 1 as follows. Show first (using the formula for the slope of a line) that for any point (x, y) of the circle, the line through (0, 1) and (x, y) passes through the horizontal axis at the point (t, 0) where

t = x/(1 − y).

It helps to draw a picture. Solve for x: x = t(1 − y), and plug into the equation of the circle to find that

(x, y) = (1/(t² + 1)) (2t, t² − 1).

Explain now why every rational point of the circle corresponds to a rational value of t and vice versa. Explain why the same conclusion holds over any field k.
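As a check on the formulas in this problem, plain Python with exact rational arithmetic confirms that the parametrization lands on the circle and that t is recovered as the slope parameter (circle_point is our own helper name):

```python
# The point (2t/(t^2+1), (t^2-1)/(t^2+1)) lies on x^2 + y^2 = 1 for every
# rational t, and t = x/(1 - y) recovers the parameter.
from fractions import Fraction

def circle_point(t):
    d = t**2 + 1
    return (2 * t / d, (t**2 - 1) / d)

for t in [Fraction(p, q) for p in range(-4, 5) for q in range(1, 5)]:
    x, y = circle_point(t)
    assert x**2 + y**2 == 1
    assert t == x / (1 - y)
```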
Let’s try a different cubic curve. The degree of a term in a polynomial is the
sum of the degrees in each of the variables; the degree of the polynomial is
the highest degree of any nonzero term. Take the cubic curve
y 2 = (x + 1)x(x − 1).
We see right away that some lines strike the curve in more than two points,
one at the origin, and two others. Calculate as above that those two others
are
x = (t² ± √(t⁴ + 4))/2,    y = tx.
Fermat's Last Theorem says that there are no integer solutions to aⁿ + bⁿ = cⁿ with nonzero integers a, b, c and n ≥ 3. Dividing both sides by cⁿ, if there were a solution, we would solve

xⁿ + yⁿ = 1
with nonzero rational numbers x, y. Conversely, any positive rational numbers
x, y solving this equation, after clearing denominators, give a solution to
an + bn = cn . So Fermat’s Last Theorem is equivalent to the statement that
the algebraic curve xn + y n = 1 has no positive rational number solutions;
in particular, we are interested in algebraic curves over the rational numbers,
not just the real numbers.
We make our first, naive, attempt to define algebraic curves. An algebraic curve
is the set of points (x, y) of the plane satisfying an equation 0 = f (x, y), where f (x, y)
is a nonconstant polynomial.
Consider the polynomial equation x2 + y 2 = −7. There are no real values of
x, y that satisfy this, so the algebraic curve is empty.
To remedy this problem, we allow the variables x and y to take on complex values.
Sticking in a value for one of the variables, or perhaps for the other, with a suitable
choice of value, we will find that we have a nonconstant polynomial in one variable,
so there is a solution (x, y) by the fundamental theorem of algebra. Hence algebraic
curves have complex points, but perhaps no real points.
To study the existence of rational solutions to algebraic equations we can clear
denominators as we did above for Fermat’s Last Theorem, and look for integer solutions
to different algebraic equations. We can then look at those equations modulo a prime,
and get equations over a finite field. So it is natural to consider algebraic curves over
any field.
If k is a field, an algebraic curve X over k given by a nonconstant polynomial
equation f (x, y) = 0 with coefficients in k is the set X of its k̄-points, i.e. points (x, y)
with x, y in the algebraic closure k̄ satisfying f (x, y) = 0. There are infinitely many
k̄-points, because we can force either x or y to any constant value, and for infinitely
many such constants f (c, y) or f (x, c) is not constant so has a root in k̄.
The equation of an algebraic curve is not uniquely determined. The curve (x² − y)⁹ = 0 is the same curve as x² − y = 0. The curve xy = 0 splits into x = 0 or y = 0, two
curves. If the equation of the curve has nonconstant factors, we say that the curve
is reducible and the equations formed from the factors, or the curves they cut out,
are components of X. We can assume that the polynomial f (x, y) in the equation
f (x, y) = 0 of a curve is not a square, cube, etc. of any polynomial.
A regular function on an algebraic curve is the restriction of a polynomial function.
For example, on the algebraic curve y^2 = x^2 + x^3, the functions 0 and y^2 − x^2 − x^3
take on the same values at every point, and therefore are equal as regular functions.
If X is an algebraic curve over a field k, let k[X] be the set of regular functions on X
with coefficients in k.
A regular morphism of algebraic curves f : C → D is a map which can be expressed
somehow as f(x, y) = (s(x, y), t(x, y)) for two polynomials s(x, y), t(x, y), so that f,
applied to any point of C, yields a point of D. For example, the map f(x, y) = (s, t) =
(x^2 − 1, x^3 − x) maps (y = 0) to t^2 = s^2 + s^3, as we saw previously. A regular morphism
with a regular inverse is biregular, and the associated curves are biregular. Clearly
biregular algebraic curves have isomorphic algebras of regular functions.
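This can be spot-checked directly; a quick Python sketch (Python rather than the book's Sage, and the sample values are ours):

```python
# The map (s, t) = (x^2 - 1, x^3 - x) sends a point (x, 0) of the line y = 0
# to a point of the cubic t^2 = s^2 + s^3: here t = x s, so
# t^2 = x^2 s^2 = (1 + s) s^2 = s^2 + s^3.
for x in [-3, -1, 0, 2, 5]:
    s, t = x**2 - 1, x**3 - x
    assert t**2 == s**2 + s**3
```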
Take a conic 0 = f(x, y) and expand out

f(x, y) = ax^2 + bxy + cy^2 + rx + sy + t.

If 0 = a = b = c then this is a line, not a conic, so we can assume that at
least one of a, b, c is not zero. First, suppose that 0 = a = c, so

f(x, y) = bxy + rx + sy + t.

Rescale to arrange that b = 1. Factor as

f(x, y) = (x + s)(y + r) + t − rs.

Change variables by translation, replacing x + s by x and y + r by y and let
u = rs − t to get

f(x, y) = xy − u.

If u = 0, then our conic is a pair of lines intersecting at a single point. If
u ≠ 0, rescale x by 1/u to arrange that our conic is xy = 1.

Suppose instead that one of a or c is not zero, and swap variables if needed
to arrange that a ≠ 0 and rescale the equation 0 = f(x, y) to get a = 1:

f(x, y) = x^2 + bxy + cy^2 + rx + sy + t.

Suppose we work over a field not of characteristic 2. Complete the square in x by

f(x, y) = (x + by/2)^2 + (c − b^2/4) y^2 + rx + sy + t.

Replace x + by/2 by a new variable called x and replace c − b^2/4 by a new
constant called c:

f(x, y) = x^2 + cy^2 + rx + sy + t.

Complete the square in x:

f(x, y) = (x + r/2)^2 + cy^2 + sy + t − r^2/4.

Rename x + r/2 to x and t − r^2/4 to t:

f(x, y) = x^2 + cy^2 + sy + t.
If c = 0 then

f(x, y) = x^2 + sy + t.

If s ≠ 0 then replace sy + t by a new variable, which we call −y, to get

f(x, y) = x^2 − y.

Note that then our curve is a parabola y = x^2. On the parabola, every
polynomial g(x, y) is just g(x, x^2), a polynomial in x. If 0 = s = t then our
conic is actually a line x = 0; linearly transform variables to get the conic
to be y = 0. If 0 = s and t ≠ 0 then our conic is x^2 = −t. Again linearly
transform to get the conic y^2 = −t. If −t has a square root in k, say a, then
our equation is y = ±a, and rescale the y variable to get a = 1, so a pair of
disjoint lines y = ±1. If −t has no square root in k, then our conic has an
empty set of points with coordinates in k, but is a pair of disjoint lines with
coordinates in k(α), α = √(−t): y = ±α. This α is uniquely determined up to
scaling by an element of k^×. Summing up, over any field of characteristic not 2, every
conic is biregular to precisely one of
Equation   Geometry                                           Regular function algebra
y = 0      line                                               k[x]
xy = 0     pair of intersecting lines                         k[x] ⊕ k[y]
xy = 1     hyperbola                                          k[x, x^−1]
y = x^2    parabola                                           k[x]
y = ±1     pair of disjoint lines                             k[x] ⊕ k[x]
y = ±α     pair of disjoint lines over k(α), α^2 ∈ k, α ∉ k   k(α)[x]
For example, over the rational numbers, the equations x^2 = −t for t any prime
or x^2 = t for t any prime give conics with no rational points. Different primes
give different conics, and the reader can work out why they are not biregular
over the rational numbers. On the other hand, over the real numbers, we
only have a single conic x^2 = −1 with no real points.
A rational function on an algebraic curve X over a field k is a rational function
of x, y, whose denominator is not divisible by any nonconstant factor of the equation
of the curve.
On the curve X = (y = x^2), the rational function f = x/(x − y) might seem
to be undefined near the origin, but we can also write it as

f = x/(x − y) = x/(x − x^2) = 1/(1 − x).
A rational function f is regular near a point of X if f can be written as f = b/c
for some polynomials b, c with c nonzero near that point.
A rational morphism of algebraic curves f : C → D is a map which can be expressed
somehow as f (x, y) = (s(x, y), t(x, y)) for two rational functions s(x, y), t(x, y), so that
f , applied to any point of C at which both s(x, y) and t(x, y) are defined, yields a point
of D. A rational morphism with a rational inverse is birational, and the associated
curves are birational. A rational morphism is regular near a point of C if s(x, y) and
t(x, y) are regular near that point, i.e. can be expressed near that point as ratios of
polynomials not vanishing near that point.
Lemma 15.1. Algebraic curves in the plane (over any field) are birational just when
their fields of rational functions are isomorphic.
Proof. Suppose that the curves are birational. Clearly the rational functions compose
with the birational map, and with its inverse, identifying the fields. Conversely,
suppose that the fields are isomorphic, say by an isomorphism
φ : k(C) → k(D).
Write the coordinates of the plane in which C lives as x, y, and those of the plane in
which D lives as u, v. Let s(x, y) = φ−1 u and t(x, y) = φ−1 v.
A curve birational to a line in the plane (with equation y = 0, for example) is a
rational curve. An algebraic curve C is rational just when k(C) ≅ k(x).
15.2 Imitate our picture of drawing lines through a point of y 2 = x3 +x2 , but replacing
that cubic curve by an irreducible conic (curve of degree 2). Prove that conics are
rational.
15.3 Prove that every algebraic curve over any algebraically closed field has infinitely
many points. Hints: Pick a point not on the curve. (You need to prove that such a
point exists, using the fact that every algebraically closed field is infinite.) Change
variables by translation to arrange that this point is the origin. Any line through the
origin strikes the curve at some point, except perhaps for finitely many exceptional
lines. (Why?) Two lines through the origin intersect only at the origin. So any two
lines through the origin strike the curve at different points. There are infinitely many
lines through the origin. (Why?)
Integrals
Suppose that g(x, y) = 0 is the equation of an irreducible plane algebraic curve over
the field of real numbers, for example y^2 = x^2 + x^3. Suppose that we can locally write
the curve as the graph of a function y = y(x), for example y = √(x^2 + x^3). Pick a
rational function f(x, y) on the curve, and consider the integral ∫ f dx, by which we
mean

∫ f(x, y) dx = ∫ f(x, y(x)) dx.
If the curve is rational, then we can parameterize it by x = x(t), y = y(t), as rational
functions of a new variable t. So then the integral becomes

∫ f dx = ∫ f(x(t), y(t)) (dx/dt) dt
and this is a composition of rational functions, so a rational function of t. Using partial
fractions, we can integrate this explicitly. For example, on the curve y^2 = x^2 + x^3, if
we take the rational function

f = x^3/y,

we can write our integral as

∫ f dx = ∫ x^3 dx/√(x^2 + x^3).

We then solve this integral by parameterising as on page 137,

x = x(t) = t^2 − 1,  y = y(t) = t(t^2 − 1)
to get
∫ x^3 dx/√(x^2 + x^3) = ∫ x^3 dx/y
  = ∫ (t^2 − 1)^3 · 2t dt / (t(t^2 − 1))
  = 2 ∫ (t^2 − 1)^2 dt,

and the reader can simplify this.
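The substitution can be verified with exact rational arithmetic; a brief Python sketch (ours, with sample values of t):

```python
from fractions import Fraction

# On y^2 = x^2 + x^3 with x = t^2 - 1 and y = t (t^2 - 1), the integrand
# (x^3 / y) (dx/dt) should reduce to the polynomial 2 (t^2 - 1)^2.
for t in [Fraction(3, 2), Fraction(-5, 3), Fraction(7, 4)]:
    x = t**2 - 1
    y = t * (t**2 - 1)
    dx_dt = 2 * t
    assert y**2 == x**2 + x**3          # the point lies on the curve
    assert (x**3 / y) * dx_dt == 2 * (t**2 - 1)**2
```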
Ideals
Given an algebraic curve C and a point p ∈ C (say, to be precise, that p ∈ C(k)
is a point defined over k), we are naturally interested in the regular functions on
C vanishing at p. If p has coordinates p = (x0 , y0 ), then the polynomial functions
x−x0 , y−y0 vanish simultaneously only at p. For simplicity, translate the plane so that
p lies at the origin. A regular function on C vanishes at p just when it is a sum xg(x, y)+
yh(x, y), since every polynomial function vanishing at the origin is expressible in a
finite Taylor series expansion. In other words, the regular functions on C vanishing
at the origin are precisely the ideal I = (x, y). Similarly, the regular functions on
C vanishing at (x, y) = (x0 , y0 ) are the polynomials of the form (x − x0 ) g(x, y) +
(y − y0 ) h(x, y), so constitute the ideal I = (x − x0 , y − y0 ). This is the motivation
for the name “ideal”: an ideal is like a idealized notion of a point.
The regular functions on the real number line are the polynomials p(x), and those
which vanish at a point x = x0 are those of the form (x − x0) p(x). Similarly, if we
take two points x0 ≠ x1, then the regular functions on the line vanishing at both
points are the polynomials of the form (x − x0)(x − x1) p(x), the polynomials divisible
by (x − x0)(x − x1). If we imagine running the two points into one another, so that they
collide at x = 0, then these polynomials approach polynomials divisible by x^2. We
could think of the equation x^2 = 0, or of the ideal I = (x^2), as representing a "double
point".
Working over k ..= Q on the curve C ..= (x^2 = y), the ideal I ..= (x^2 − 2) does
not correspond to any point defined over k. Again, we can think of I as an "ideal
point".
Suppose that C is an algebraic curve and p = (x0 , y0 ) is a point of C. Let I be
the ideal of all regular functions on C vanishing at p. Then I is the kernel of the
morphism g(x, y) 7→ g(x0 , y0 ), which is a morphism k[C] → k. So we think of all
ideals of k[C] as if they were “ideal points”.
Sage
Sage can usually plot the real points of an algebraic plane curve. For any equation
like y^3 + x^3 − 6x^2 y = 0 the code
x,y=var(’x,y’)
contour_plot(y^3+x^3-6*x^2*y==0, (x,-10,10), (y,-10,10))
yields a picture of the level sets of the function y^3 + x^3 − 6x^2 y:
from which we can see that our curve is a union of three lines. Similarly
x,y=var(’x,y’)
contour_plot(y^2-x*(x-1)*(x-2)==0, (x,-3,4), (y,-5,5))
yields level sets like
and we can also plot the algebraic curve, without the other level sets, using
f(x,y) = y^2-x*(x-1)*(x-2)
implicit_plot(f, (-3, 4), (-5, 5))
yielding
Similarly, we can draw algebraic surfaces:
f(x,y,z) = y^2-x*(x-1)*(x-2)+z^4-1
implicit_plot3d(f, (-3, 4), (-5, 5), (-1,1))
yielding
Chapter 16
Where plane curves intersect
Resultants in many variables
Lemma 16.1. Take two polynomials b(x), c(x) in a variable x, with coefficients in
a commutative ring S, of degrees m, n. Then the resultant r = res_{b,c} is expressible
as r = u(x)b(x) + v(x)c(x) where u(x) and v(x) are polynomials, with coefficients in
the same commutative ring S, of degrees n − 1, m − 1.
The proof is identical to the proof of lemma 10.3 on page 79.
Take two polynomials b(x, y) and c(x, y) and let r(x) be the resultant of b(x, y), c(x, y)
in the y variable, so thinking of b and c as polynomials in y, with coefficients being
polynomials in x. So r(x) is a polynomial in x.
• If b(x, y) = y + x, c(x, y) = y, then r(x) = x vanishes just at the value
x = 0 where there is a common factor: y.

[picture: the curves b = 0 and c = 0, and the line r = 0]
• Let b(x, y) = xy^2 + y, c(x, y) = xy^2 + y + 1.

[picture: the curves b = 0 and c = 0]
Look at the picture at x = 0 and near x = 0: because the polynomials
drop degrees, the number of roots of b(0, y) on the vertical line x = 0
is smaller than the number of roots of b(a, y) on the vertical line x = a
for constants x = a near 0. We can see roots “flying away” to infinity
on those lines.
[picture: the curves b = 0 and c = 0]
Finding the determinant of the associated 4 × 4 matrix tells us that
r(x) = x2 , which vanishes just at the value x = 0 where b(0, y) = y
and c(0, y) = y + 1 drop in degrees, not due to a common factor.
[picture: the curves b = 0 and c = 0]

The resultant of y and y + 1 is not r(0), since the degrees drop; we
would compute the resultant of y and y + 1 using a 2 × 2 matrix, and
find that it is −1.
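These resultants are determinants of 4 × 4 Sylvester matrices, which we can check with a short Python sketch (Python rather than the book's Sage; the helper names are ours):

```python
from fractions import Fraction
from itertools import permutations

def det(m):
    # Leibniz formula; fine for small matrices.
    n = len(m)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        prod = Fraction(1)
        for i, p in enumerate(perm):
            prod *= m[i][p]
        total += (-1 if inversions % 2 else 1) * prod
    return total

def resultant_deg2(b, c):
    # Sylvester matrix of two degree-2 polynomials in y; coefficients are
    # listed from highest degree down, so b = b2 y^2 + b1 y + b0.
    b2, b1, b0 = b
    c2, c1, c0 = c
    return det([[b2, b1, b0, 0],
                [0, b2, b1, b0],
                [c2, c1, c0, 0],
                [0, c2, c1, c0]])

# b(x, y) = x y^2 + y and c(x, y) = x y^2 + y + 1 have resultant r(x) = x^2:
for a in [Fraction(2), Fraction(-3), Fraction(1, 2)]:
    assert resultant_deg2((a, 1, 0), (a, 1, 1)) == a**2
```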
• If b(x, y) = xy^2 + y and c(x, y) = 2xy^2 + y + 1 then r(x) = x(x + 1)
vanishes at x = 0 and x = −1. At x = 0, b(0, y) = y, c(0, y) = y + 1 have
no common factor, but drop degrees. At x = −1, b(−1, y) = −y(y − 1),
c(−1, y) = −2(y − 1)(y + 1/2) have a common factor, but they don't
drop degrees.
[picture: the curves b = 0 and c = 0]
• If b(x, y) = x^2 + y^2 − 1 and c(x, y) = (x − 1)^2 + y^2 − 1 then r(x) = (2x − 1)^2
vanishes at x = 1/2, where there are two different intersection points,
a double common factor:

b(1/2, y) = c(1/2, y) = (y − √3/2)(y + √3/2).

[picture: the circles b = 0 and c = 0]
Lemma 16.2. Suppose that k is a field and
x = (x1 , x2 , . . . , xn )
are variables and y is a variable. Take b(x, y) and c(x, y) two nonconstant polynomials
in k[x, y]. Then in some finite degree extension of k there are constants
λ = (λ1 , λ2 , . . . , λn )
so that in the expressions b(x + λy, y) , c(x + λy, y), the coefficients of highest order
in y are both nonzero constants.
• Return to our earlier example of b(x, y) = xy 2 + y and c(x, y) = 2xy 2 +
y + 1 and let
B(x, y) ..= b(x + λy, y) = λy^3 + xy^2 + y,
C(x, y) ..= c(x + λy, y) = 2λy^3 + 2xy^2 + y + 1.
For any nonzero constant λ, the resultant of B(x, y), C(x, y) is

r(x) = det
| 0  0  0  1   0   0  |
| 1  0  0  1   1   0  |
| x  1  0  2x  1   1  |
| λ  x  1  2λ  2x  1  |
| 0  λ  x  0   2λ  2x |
| 0  0  λ  0   0   2λ |
= −λ^2 (λ + 1 + x).
So the resultant now vanishes just when x = −(λ+1), which is precisely
when B(x, y), C(x, y) have a common factor of y − 1. With a small
value of λ 6= 0, the picture changes very slightly: we change x, y to
x+λy, y, a linear transformation which leaves the x-axis alone, but tilts
the y-axis to the left. Crucially, we tilt the asymptotic line at which the
two curves approached one another (where b(x, y) and c(x, y) dropped
degrees), but with essentially no effect on the intersection point:
[picture: the curves B = 0 and C = 0]
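The same Sylvester-matrix computation can be run for the sheared cubics; this Python sketch (helper names ours) verifies that the resultant vanishes exactly at x = −(λ + 1). The sign of the determinant depends on row-ordering conventions, so we compare absolute values:

```python
from fractions import Fraction
from itertools import permutations

def det(m):
    # Leibniz formula; fine for a 6 x 6 matrix.
    n = len(m)
    total = Fraction(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        prod = Fraction(1)
        for i, p in enumerate(perm):
            prod *= m[i][p]
        total += (-1 if inversions % 2 else 1) * prod
    return total

def sylvester_resultant(b, c):
    # b, c: coefficient lists from highest degree down, as polynomials in y.
    mb, mc = len(b) - 1, len(c) - 1
    rows = [[0] * i + list(b) + [0] * (mc - 1 - i) for i in range(mc)]
    rows += [[0] * i + list(c) + [0] * (mb - 1 - i) for i in range(mb)]
    return det(rows)

lam = Fraction(3)
for x in [-(lam + 1), Fraction(0), Fraction(5)]:
    # B = lam y^3 + x y^2 + y and C = 2 lam y^3 + 2x y^2 + y + 1
    r = sylvester_resultant([lam, x, 1, 0], [2 * lam, 2 * x, 1, 1])
    assert abs(r) == lam**2 * abs(lam + 1 + x)
    assert (r == 0) == (x == -(lam + 1))
```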
• If b(x, y) = x^2 + y^2 − 1 and c(x, y) = (x − 1)^2 + y^2 − 1, then r(x) =
(2x − 1)^2 vanishes at x = 1/2, where there are two different intersection
points.
[picture: the circles b = 0 and c = 0]
If instead we pick any nonzero constant λ and let B(x, y) = (x + λy)^2 +
y^2 − 1 and C(x, y) = (x + λy − 1)^2 + y^2 − 1 then the resultant

r(x) = 4 (λ^2 + 1) (x − (1 + λ√3)/2) (x − (1 − λ√3)/2)

vanishes at two distinct values of x corresponding to the two distinct
roots. In the picture, the two roots now lie on different vertical lines
(different values of x).
[picture: the curves B = 0 and C = 0]
Proof. Write b as a sum of homogeneous polynomials of degrees 0, 1, 2, . . . , d, say

b = b0 + b1 + · · · + bd.

It is enough to prove the result for bd, so assume that b is homogeneous of degree d.
The expression b(x, 1) is a nonzero polynomial over an infinite field, so doesn't vanish
everywhere by lemma 9.11 on page 72. Pick λ to be a value of x for which b(λ, 1) ≠ 0.
Then

b(x + λy, y) = b(x, 0) + · · · + y^d b(λ, 1).

Similarly for two polynomials b(x, y), c(x, y), or even for any finite set of polynomials.
Corollary 16.3. Take one variable y and several variables x = (x1 , x2 , . . . , xn )
and two polynomials b(x, y) and c(x, y) over a field. Let r(x) be the resultant of
b(x, y), c(x, y) in the y variable. For any constant a in our field, if b(a, y) and c(a, y)
have a common root in the algebraic closure of our field then r(a) = 0.
If the coefficient of highest order in y of both b(x, y) and c(x, y) is constant in x (for
example, perhaps after the normalization described in lemma 16.2 on the previous page)
then r(a) = 0 at some value x = a in our field just exactly when b(a, y) and c(a, y)
have a common factor. Moreover r(x) is then the zero polynomial just when b(x, y)
and c(x, y) have a common factor which is a polynomial in x, y of positive degree in y.
Proof. If, at some value x = a, the degrees of b(x, y) and c(x, y) both don't drop,
then the resultant in y is given by the same expression whether we first set x = a
and then compute the resultant, or leave x as an abstract variable, compute the
resultant, and then set x = a.
Work over the ring of polynomials in y, with coefficients rational in x. The
resultant in y being zero as a function of x forces a common factor in that ring, i.e.
b(x, y) = d(x, y)B(x, y),
c(x, y) = d(x, y)C(x, y),
where d(x, y), B(x, y) and C(x, y) are rational in x and polynomial in y and d(x, y)
has positive degree in y. In particular, c(x, y) factorises over that ring. By the Gauss
lemma (proposition 9.13 on page 73), c(x, y) factorises over the polynomials in x, y.
But c(x, y) is irreducible, so one factor is constant, and it isn’t d(x, y), so it must
be C(x, y), so we rescale by a nonzero constant to get d(x, y) = c(x, y), i.e. c(x, y)
divides b(x, y).
Summing up: given two algebraic curves in the plane B = (b(x, y) = 0) and
C = (c(x, y) = 0), we can (after perhaps “slanting” the variables a little) find the
points where they intersect (over the algebraic closure of our field) by calculating a
resultant.
Chapter 17
Quotient rings
Recall once again that all of our rings are assumed to be commutative rings with
identity. Suppose that R is a ring and that I is an ideal. For any element r ∈ R, the
translate of I by r, denoted r + I, is the set of all elements r + i for i ∈ I.
• If R = Z and I = 12Z is the multiples of 12, then 7 + I is the set
of integers which are 7 larger than a multiple of 12, i.e. the set of all
numbers

. . . , 7 − 12, 7, 7 + 12, 7 + 2 · 12, 7 + 3 · 12, . . .

which simplifies to

. . . , −5, 7, 19, 31, 43, . . .

But this is the same translate as −5 + I, since −5 + 12 = 7, so −5 + I
is the set of all numbers

. . . , −5 − 12, −5, −5 + 12, −5 + 2 · 12, −5 + 3 · 12, . . .

which is just

. . . , −17, −5, 7, 19, 31, . . .

the same sequence.
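A quick sanity check of this arithmetic in Python (ours), using the nonnegative remainder r % 12 as the canonical representative of the translate r + 12Z:

```python
# 7 + 12Z and -5 + 12Z are the same translate, since 7 - (-5) = 12 lies in 12Z.
assert 7 % 12 == (-5) % 12 == 7
# A few members of the common translate, in increasing order:
members = [7 + 12 * k for k in range(-2, 3)]
assert members == [-17, -5, 7, 19, 31]
```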
• Working again inside Z, 2 + 2Z = 2Z is the set of even integers.
• If R = Q[x] and I = (x), then 1/2 + I is the set of all polynomials of the
form

1/2 + x p(x)

for any polynomial p(x).
• If i ∈ I then i + I = I.
We add two translates by
(r1 + I) + (r2 + I) ..= (r1 + r2 ) + I,
and multiply by
(r1 + I) (r2 + I) = (r1 r2 ) + I.
Lemma 17.1. For any ideal I in any ring R, two translates a + I and b + I are
equal just when a and b differ by an element of I.
Proof. We suppose that a + I = b + I. Clearly a belongs to a + I, because 0 belongs
to I. Therefore a belongs to a + I = b + I, i.e. a = b + i for some i from I.
Lemma 17.2. The addition and multiplication operations on translates are well
defined, i.e. if we find two different ways to write a translate as r + I, the results of
adding and multiplying translates don’t depend on which choice we make of how to
write them.
Proof. You write your translates as a1 + I and a2 + I, and I write mine as b1 + I and
b2 + I, but we suppose that they are the same subsets of the same ring: a1 + I = b1 + I
and a2 + I = b2 + I. By lemma 17.1 on the previous page, a1 = b1 + i1 and a2 = b2 + i2
for some elements i1, i2 of I. So then

a1 + a2 + I = b1 + i1 + b2 + i2 + I,
            = b1 + b2 + (i1 + i2) + I,
            = b1 + b2 + I.

The same for multiplication.
If I is an ideal in a commutative ring R, the quotient ring R/I is the set of all
translates of I, with addition and multiplication of translates as above. The reader
can easily prove:
Lemma 17.3. For any ideal I in any commutative ring R, R/I is a commutative
ring. If R has an identity element then R/I has an identity element.
• If R ..= Z and I ..= mZ for some integer m then R/I = Z/mZ is the
usual ring of remainders modulo m.
• If p(x, y) is an irreducible nonconstant polynomial over a field k and
I = (p(x, y)) = p(x, y) k[x, y] is the associated ideal, then k[x, y]/I = k[X]
is the ring of regular functions on the algebraic plane curve X =
(p(x, y) = 0).
Lemma 17.4. An ideal I in a ring R with identity is all of R just when 1 is in I.
Proof. If 1 lies in I, then any r in R is r = r1 so lies in I, so I = R.
An ideal I in a ring R is prime if I is not all of R and, for any two elements a, b
of R, if ab is in I then either a or b lies in I.
• The ideal pZ in Z generated by any prime number is prime.
• The ideal R is not a prime ideal in R, because the definition explicitly
excludes it.
• Inside R ..= Z[x], the ideal (x) is prime, because in order to have a
factor of x sitting in a product, it must lie in one of the factors.
• Inside R ..= Z[x], the ideal (2x) is not prime, because a = 2 and b = x
multiply to a multiple of 2x, but neither factor is a multiple of 2x.
• If p(x) is an irreducible polynomial, then (p(x)) is a prime ideal.
• The regular functions on an algebraic curve p(x, y) = 0 over a field k
constitute the ring R/I where R = k[x, y] and I = (p(x, y)). The curve
is irreducible just when the ideal is prime.
Quotient rings
• Take a field k. In the quotient ring R = k[x, y, z]/(z^2 − xy), the ideal
(z) contains z^2 = xy, but does not contain x or y, so is not prime.
Lemma 17.5. An ideal I in a commutative ring with identity R is prime just when
R/I has no zero divisors, i.e. no nonzero elements α, β can have αβ = 0.
Proof. Write α and β as translates α = a + I and β = b + I. Then αβ = 0 just when
ab + I = I, i.e. just when ab lies in I. On the other hand, α = 0 just when a + I = I,
just when a is in I.
An ideal I in a ring R is maximal if I is not all of R but any ideal J which contains
I is either equal to I or equal to R (nothing fits in between).
Lemma 17.6. If I is an ideal in a commutative ring R with identity, then I is a
maximal ideal just when R/I is a field.
Proof. Suppose that I is a maximal ideal. Take any element α ≠ 0 of R/I and write
it as a translate α = a + I. Since α ≠ 0, a is not in I. We know that (a) + I = R,
since I is maximal. But then 1 lies in (a) + I, so 1 = ab + i for some b in R and i in
I. Expand out to see that

1/α = b + I.

So nonzero elements of R/I have reciprocals.
Suppose instead that R/I is a field. Take an element a in R not in I. Then a + I
has a reciprocal, say b + I. But then ab + I = 1 + I, so that ab = 1 + i for some i in
I. So (a) + I contains (ab) + I = (1) = R.
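Concretely, for R = Z and I = pZ with p prime, the reciprocal b with ab = 1 + i comes from the extended Euclidean algorithm, since gcd(a, p) = 1. A small Python sketch (ours, not from the text):

```python
# Find b with a*b = 1 modulo p, i.e. 1 = a*b + i for some i in pZ,
# using the extended Euclidean algorithm.
def inverse_mod(a, p):
    old_r, r = a % p, p
    old_s, s = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    assert old_r == 1  # gcd(a, p) = 1 since p is prime and p does not divide a
    return old_s % p

p = 13
for a in range(1, p):
    assert (a * inverse_mod(a, p)) % p == 1  # every nonzero a + pZ has a reciprocal
```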
• If p is a prime number, then I ..= pZ is maximal in R ..= Z, because
any other ideal J containing I and containing some other integer, say
n, not a multiple of p, must contain the greatest common divisor of n
and p, which is 1 since p is prime. We recover the fact that Z/pZ is a
field.
• The ideals in R ..= Z/4Z are (0), (1) = Z/4Z and (2) ≅ Z/2Z, and the
last of these is maximal.
• Take a field k and let R ..= k[x]. Every ideal I in R has the form
I = (p(x)), because each ideal is generated by the greatest common
divisor of its elements. We have seen that I is prime just when
p(x) is irreducible. To further have R/I a field, we need I = (p(x))
to be maximal. Take some polynomial q(x) not belonging to I, so
not divisible by p(x). Since p(x) is irreducible, p(x) has no nonconstant
proper factors, so gcd {p(x), q(x)} = 1 in k[x], so that the ideal generated
by p(x), q(x) is all of R. Therefore I = (p(x)) is maximal, and the quotient
R/I = k[x]/(p(x)) is a field.
• If k is an algebraically closed field and R ..= k[x] then every irreducible
polynomial is linear, so every maximal ideal in R is I = (x − c) for
some constant c.
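The Z/4Z example above can be spot-checked by brute force; a quick Python sketch (ours, not from the text):

```python
# The ideal generated by a in Z/mZ is the set of multiples {a*r mod m}.
def ideal(a, m):
    return sorted({(a * r) % m for r in range(m)})

assert ideal(0, 4) == [0]
assert ideal(1, 4) == [0, 1, 2, 3]   # all of Z/4Z
assert ideal(2, 4) == [0, 2]         # a copy of Z/2Z, the maximal ideal
assert ideal(3, 4) == [0, 1, 2, 3]   # 3 is a unit mod 4, so (3) = (1)
```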
17.1 For any positive integer m ≥ 2, what are the prime ideals in Z/mZ, and what
are the maximal ideals?
Sage
We take the ring R = Q[x, y] and then quotient out by the ideal I = (x^2 y, x y^3) to
produce the ring S = R/I. The generators x, y in R give rise to elements x + I, y + I
in S, which are renamed to a, b for notational convenience.
R.<x,y> = PolynomialRing(QQ)
I = R.ideal([x^2*y,x*y^3])
S.<a,b> = R.quotient_ring(I)
a^2*b + a^2 + 1
which yields a^2 + 1, since a^2 b = x^2 y + I is zero in S.
Careful: sage is not perfect, and if you change QQ above to ZZ, sage doesn't seem
to be able to give the correct answer (as of the time these notes were written).
We can define R = Q[x, y], I = (y^2), S = R/I where we label x + I, y + I as a, b,
T = Q[z] and define a morphism φ : S → T by φ(a) = z^3, φ(b) = 0:
R.<x,y> = PolynomialRing(QQ)
I = R.ideal([y^2])
S.<a,b> = R.quotient_ring(I)
T.<z> = PolynomialRing(QQ)
phi = S.hom([z^3, 0],T)
phi(a^2+b^2)
yielding z^6.
Construction of splitting fields
Theorem 17.7. Suppose that k is a field and p(x) is a nonconstant polynomial over k,
say of degree n. Then there is a splitting field for p(x) over k, of degree at most n!.
Proof. Think for example of x^2 + 1 over k = R, to have some concrete example in
mind. For the moment, assume that p(x) is irreducible. Recall that if p(x) divides a
product q(x)r(x), then p(x) divides one of the factors q(x) or r(x). There may be no
roots of p(x) in k, so we will construct a splitting field.
Recall that k[x] is the ring of all polynomials over k in a variable x. Let I ⊂ k[x]
be the ideal consisting of all polynomials divisible by p(x) and let K ..= k[x]/I. We
have seen above that K is a field.
Every element of K is a translate, so has the form q(x) + I, for some polynomial
q(x). Any two translates, say q(x) + I and r(x) + I, are equal just when q(x) − r(x)
belongs to I, i.e. is divisible by p(x). In other words, if you write down a translate
q(x) + I and I write down a translate r(x) + I then these will be the same translate
just exactly when
r(x) = q(x) + p(x)s(x),
for some polynomial s(x) over k.
Recall how we multiply elements of K. For two elements q(x) + I, r(x) + I, their
product in K is q(x)r(x) + I. If I write the same translates down differently, I could
write them as
q(x) + Q(x)p(x) + I, r(x) + R(x)p(x) + I,
and my product would turn out to be
(q(x) + Q(x)p(x)) (r(x) + R(x)p(x)) + I
=q(x)r(x) + p(x) (q(x)R(x) + Q(x)r(x) + Q(x)R(x)) + I,
=q(x)r(x) + I,
the same translate, since your result and mine agree up to multiples of p(x), so
represent the same translate.
The next claim is that K is a finite degree extension of k. This is not obvious,
since K = k[x]/I and both k[x] and I are “infinite dimensional” in some sense. Take
any element of K, say q(x) + I, and divide q(x) by p(x), say
q(x) = p(x)Q(x) + r(x),
a quotient Q(x) and remainder r(x). Clearly q(x) differs from r(x) by a multiple of
p(x), so
q(x) + I = r(x) + I.
Therefore every element of K can be written as a translate r(x) + I for r(x) a
polynomial of degree less than the degree of p(x). Clearly r(x) is unique, since we
can’t quotient out anything of lower degree than p(x). Therefore K is identified
with the space of polynomials in x of degree less than the degree of p(x), as the
identification preserves addition and scaling by constants. Danger: this identification
doesn’t preserve multiplication. We multiply and then modulo out by p(x).
The notation is much nicer if we write x + I as, say, α. Then clearly x^2 + I = α^2
and so on, so q(x) + I = q(α) for any polynomial q(x) over k. So we can say that
α ∈ K is an element so that p(α) = 0, so p(x) has a root over K. Moreover, every
element of K is a polynomial over k in the element α. In particular, K has a basis
1, α, α^2, . . . , α^(n−1) where n is the degree of p(x): the degree of K as a field extension
is the degree of p(x).
We still need to check that p(x) splits over K into a product of linear factors. In
fact, the bad news is that it might not. Clearly we can split off one linear factor: x − α,
since α is a root of p(x) over K. Inductively, if p(x) doesn’t completely split into a
product of linear factors, we can try to factor out as many linear factors as possible,
and then repeat the whole process for any remaining nonlinear factors, constructing
splitting fields for them, adding in new roots as needed.
Finally, if p(x) is reducible, factor p(x) into irreducible factors, and take a splitting
field K for one of the factors. Then by induction on the degree of p(x), the other
factors have a splitting field over K.
Let's work out how to calculate in a splitting field of an irreducible polynomial p(x)
over a field k. Our proof shows that our splitting field is produced one step at a time,
starting with a field extension K which just gives us one new root α to p(x). The
elements of K look like polynomials q(α), where α is just some formal symbol, and
you add and multiply as usual with polynomials. But we can always assume that q(α)
is a polynomial in α of degree less than the degree of p(x), and then subtract off any
p(x)-multiples when we multiply or add, since p(α) = 0. For example, if p(x) = x^2 + 1
and k = R, then the field K consists of expressions like q(α) = b + cα, where b and c
are any real numbers. When we multiply, we just replace α^2 + 1 by 0, i.e. replace α^2
by −1. So we just get K being the usual complex numbers.
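This example can be sketched in a few lines of Python (ours, not from the text): represent an element b + cα of K = k[x]/(x^2 + 1) as a pair, and reduce α^2 to −1 when multiplying. We use exact rationals, so this is really Q[x]/(x^2 + 1); with real coefficients it would be the complex numbers.

```python
from fractions import Fraction

# Arithmetic in K = k[x]/(x^2 + 1): the element b + c*alpha is stored
# as the pair (b, c), and multiplication replaces alpha^2 by -1.
def mul(u, v):
    b, c = u
    d, e = v
    # (b + c a)(d + e a) = (bd - ce) + (be + cd) a, since a^2 = -1
    return (b * d - c * e, b * e + c * d)

alpha = (Fraction(0), Fraction(1))
assert mul(alpha, alpha) == (Fraction(-1), Fraction(0))  # alpha^2 = -1
# (1 + 2 alpha)(3 + 4 alpha) = -5 + 10 alpha, just like complex numbers
assert mul((Fraction(1), Fraction(2)), (Fraction(3), Fraction(4))) == \
    (Fraction(-5), Fraction(10))
```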
Transcendental numbers
A number α ∈ C is algebraic if it is the solution of a polynomial equation p(α) = 0
where p(x) is a nonzero polynomial with rational coefficients. A number which is not
algebraic is called transcendental. More generally, given a field extension K of a field
k, an element α ∈ K is algebraic over k if it is a root of a nonzero polynomial p(x)
with coefficients in k.
Theorem 17.8. For any field extension K of a field k, an element α ∈ K is transcendental over k if and only if the field

k(α) = { p(α)/q(α) : p(x)/q(x) ∈ k(x) and q(α) ≠ 0 }

is isomorphic to k(x).
Proof. If α is transcendental, then the map

f : p(x)/q(x) ∈ k(x) ↦ p(α)/q(α) ∈ k(α)

is clearly well defined, onto, and preserves all arithmetic operations. To show that f
is 1-1 is the same as showing that f has trivial kernel. Suppose that p(x)/q(x) lies in
the kernel of f. Then p(α) = 0, so p(x) = 0, so p(x)/q(x) = 0. So the kernel is trivial,
and so f is a bijection preserving all arithmetic operations, so f is an isomorphism of
fields.
On the other hand, take an element α in K and suppose that there is some
isomorphism of fields

g : k(x) → k(α).

Let β = g(x). Because g is a field isomorphism, all arithmetic operations carried out
on x must then be matched up with arithmetic operations carried out on β, so

g(p(x)/q(x)) = p(β)/q(β).

Because g is an isomorphism, some element must map to α, say

g(p0(x)/q0(x)) = α.

So

p0(β)/q0(β) = α.
So k(β) = k(α). Any algebraic relation on α clearly gives one on β and vice versa.
Therefore α is algebraic if and only if β is. Suppose that β is algebraic. Then q(β) = 0
for some polynomial q(x), and then g is not defined on 1/q(x), a contradiction.
Chapter 18
Field extensions and algebraic curves
We can sometimes draw pictures of field extensions. Take an algebraic curve C =
(p(x, y) = 0) in the plane, and look at its field of rational functions k(C), our favourite
example of a field extension of a field k.
Suppose that K is a field extension of k(C), say K = k(C)(α) given by extending by
an element α. We want to draw a curve D so that K = k(D).
If α is transcendental, then k(C)(α) ≅ k(C)(z) for some abstract variable z. But
k(C) is already a field of rational functions of two variables x, y, quotiented out to
enforce the relation p(x, y) = 0. So K is just the rational functions on the “cylinder”
C × k inside 3-dimensional space, with variables x, y, z; our cylinder has the equation
p(x, y) = 0, independent of z.
Suppose instead that α is algebraic, say satisfying a polynomial equation f (z) = 0
with f (z) a polynomial with coefficients from k(C). So each coefficient of f (z) is itself
a rational function on C. Each such function is expressible as a rational function
of x, y. Clearing denominators, we get a polynomial equation f (x, y, z) = 0 with
coefficients in the underlying field k. The solutions form a surface in 3-dimensional
space. But the resulting surface depends on how we pick out rational functions of
x, y to represent our coefficients of f (z) from k(C). So in fact we have to restrict
our x, y variables to lie in C, i.e. we consider the curve D cut out by the equations
p(x, y) = f (x, y, z) = 0.
The map (x, y, z) ∈ D 7→ (x, y) ∈ C is regular.
We wonder now whether there is a k(C)-root to f (z) = 0, say β. This β is in k(C),
so a rational function on C, so that f (β) = 0. So this is a rational map z = β(x, y)
from C to D, lifting the curve C up from the plane into space, tracing out D on the
cylinder. In particular, there is no such map if D is irreducible and D → C is many-to-one.
So in such a case, k(C) ⊂ k(D) is a nontrivial field extension. We get a picture giving
some intuition for nontrivial algebraic field extensions: typically we expect to see an
irreducible curve D mapping to a curve C by a many-to-one map.
The story continues similarly in higher dimensions: a transcendental extension
increases dimension, while an algebraic extension, say of degree n, preserves dimension,
giving a new geometric object which maps regularly to the old one, by an n-to-1 map.
Rational curves
Recall that the degree of a field extension k ⊂ K, denoted [K : k], is the smallest
integer n so that there are n elements α1 , α2 , . . . , αn of K so that every element of
K is a linear combination Σj aj αj with coefficients a1 , a2 , . . . , an in k. The degree of
the field extension is infinite if there is no such integer n.
18.1 Suppose that k ⊂ K ⊂ L are field extensions. Prove that [L : k] = [L : K][K : k].
Note: this works even if one or more degree is infinite: let n∞ = ∞n = ∞∞ = ∞ for
any positive integer n. You have to prove these infinite degree cases as well.
18.2 Prove that a field extension is an isomorphism if and only if it has degree one.
18.3 Suppose that p(x) is an irreducible polynomial over a field k. Let K = k[x]/(p(x)).
Prove that [K : k] is the degree of p(x).
18.4 Prove that if u(t) is a nonconstant rational function over a field k, then there
are infinitely many values of u(t) as t varies over k̄. In particular, k ⊂ k(u(t)) is a
transcendental extension.
Recall that a curve C = (0 = f (x, y)) is irreducible if f (x, y) is irreducible.
Theorem 18.1. An irreducible plane algebraic curve C is rational just when there
is a nonconstant rational map L → C. In other words, C is rational just when there
are rational functions x(t), y(t) for which the point (x(t), y(t)) lies in C for all t in
the algebraic closure of the field.
Proof. If our curve C is rational then by definition C is birational to a line, so there is
such a map. Suppose on the other hand that there is such a map L → C, say written
as t ∈ L 7→ (x(t), y(t)) ∈ C with
f (x(t), y(t)) = 0
for some rational functions x(t), y(t), not both constant. As we vary t through k̄,
the moving point (x(t), y(t)) moves through infinitely many different points of the
plane. Take a rational function on C, say g(x, y) = b(x, y)/c(x, y). Then c(x, y) ≠ 0
at all but finitely many points of C. So at infinitely many values of t, g(x(t), y(t)) is defined.
Note that this requires that our curve C is irreducible, as otherwise we might have
g(x, y) vanishing on a component of C containing the points of the form (x, y) =
(x(t), y(t)), and C might contain a rational curve among its various components. We
map g(x, y) ∈ k(C) ↦ g(x(t), y(t)) ∈ k(t), a morphism of fields.
subfield K = k(x(t), y(t)) ⊂ k(t). The result follows from the following theorem
applied to K.
Theorem 18.2 (Lüroth’s theorem). Suppose that K ⊂ k(t) is a finitely generated
extension of a field k, where t is an abstract variable. Then K = k(f (t)) is generated
by a single rational function f (t) in k(t). In particular, either K = k (if f (t) is a
constant function) or K is isomorphic to k(t) by t ↦ f (t).
The proof of this theorem requires a lemma:
Lemma 18.3. Take a nonconstant rational function f (x) = b(x)/c(x), with b(x), c(x)
relatively prime, over a field k. Then, as a polynomial in a variable y with coefficients
in k(f (x)), the expression
f (x)c(y) − b(y)
in k(f (x))[y] is irreducible.
Proof. This expression is not zero, because it is nonconstant in x. It vanishes at y = x.
Hence x satisfies this polynomial (in y) expression with coefficients in k(f (x)). So k(x)
is an algebraic extension of k(f (x)). Since k(x) is transcendental over k, k(f (x)) is
transcendental over k. So f (x) solves no polynomial equation over k, and so taking any
abstract variable t and mapping k(t, y) → k(f (x), y) by t ↦ f (x) gives an isomorphism
of fields, and similarly k[t, y] → k[f (x), y] is an isomorphism of rings. By Gauss’s
lemma (proposition 9.13 on page 73), to show that tc(y) − b(y) is irreducible in k(t)[y],
it suffices to prove that it is irreducible in k[t, y], i.e. clear t from denominators. If
we can factor tc(y) − b(y) = g(t, y)h(t, y) in k[t, y] then one of g(t, y) or h(t, y) has
degree 1 in t, since the product does, and so the other has degree zero
in t, depending only on y, say g(t, y) = g(y). Then g(y)h(t, y) = tc(y) − b(y), so g(y)
divides tc(y) − b(y), and since t is an abstract variable, g(y) divides b(y) and c(y),
which are relatively prime, so g(y) is a nonzero constant.
We now prove Lüroth’s theorem:
Proof. For each element f (t) = b(t)/c(t) in K, let n be the larger of the degrees of
b(t), c(t). Pick an element f (t) = b(t)/c(t) in K, not in k, for which the value of n is
as small as possible. By the previous lemma, the polynomial B(y) = f (x)c(y) − b(y)
is irreducible in k(f (x))[y], so the map

k(f (x))[y]/(B(y)) → k(x), y ↦ x,

is an isomorphism of fields and [k(x) : k(f (x))] = n. The expression B(y) is the
polynomial in K[y] of smallest positive y-degree which is satisfied by y = x, since a
smaller degree one would give a smaller value for n. Indeed the y-degree of B(y) is n.
In particular, B(y) is also irreducible in K[y] and the map

K[y]/(B(y)) → k(x), y ↦ x,

is an isomorphism. In particular, [k(x) : K] = n. But

n = [k(x) : k(f (x))] = [k(x) : K][K : k(f (x))] = n[K : k(f (x))],

so [K : k(f (x))] = 1.
Chapter 19
The projective plane
The more systematic course in the present introductory memoir . . . would
have been to ignore altogether the notions of distance and metrical geometry . . . . Metrical geometry is a part of descriptive geometry, and
descriptive geometry is all geometry.
— Arthur Cayley
The old fashioned term descriptive geometry means the geometry of straight lines.
Straight lines in the plane remain straight when you rescale the plane, or translate
the plane to the left or right, up or down, or when you rotate the plane. They even
remain straight when you carry out any linear change of variables, as you know from
linear algebra.
To see a World in a Grain of Sand,
And Heaven in a Wild Flower.
Hold Infinity in the palm of your hand,
And Eternity in an hour.
— William Blake
Auguries of Innocence
Railway tracks built on a flat plane appear to meet “at infinity”.
Imagine that we were to add a point to the plane “at infinity” in the direction where
these tracks (straight lines) appear to meet. Danger: in order to add only one point,
imagine that the two rails meet at the same point in one direction as they do in the
other, making them both into circles. The projective plane is the set of all of the
usual points of the plane (which we can think of as the “finite points” of the projective
plane) together with some sort of “points at infinity” to represent the directions in
the plane.
But you must note this: if God exists and if He really did create the world,
then, as we all know, He created it according to the geometry of Euclid
and the human mind with the conception of only three dimensions in
space. Yet there have been and still are geometricians and philosophers,
and even some of the most distinguished, who doubt whether the whole
universe, or to speak more widely the whole of being, was only created in
Euclid’s geometry; they even dare to dream that two parallel lines, which
according to Euclid can never meet on earth, may meet somewhere in
infinity.
— Fyodor Dostoyevsky
The Brothers Karamazov
We would like a more precise mathematical definition of points “at infinity”. A
pencil of parallel lines is the collection of all lines parallel to a given line:
A simple approach: just say that a “point at infinity” means nothing more than a
choice of pencil of parallel lines.
There is another completely different way to give a precise mathematical definition
of the projective plane, which is more geometric and more powerful. Place yourself
at a vantage point above the plane.
Every “finite” point of the plane lies on a line through your vantage point: the line
of sight.
You can also look out to the points at infinity, along lines:
For example, there is such a line parallel to the lines of our train track:
On the other hand, if we take any line through the vantage point, either it hits a
“finite” point of the plane:
or it is parallel to the plane
so that it is pointed along the direction of some train tracks (lines) in the plane, to
some “point at infinity”.
The points of the projective plane, finite points together with infinite points at infinity,
are in 1-1 correspondence with lines through the vantage point.
This picture gives us a rigorous definition of points at infinity: the projective plane
is the set of all lines through a chosen point of 3-dimensional space, called the vantage
point. The points of the projective plane (using this definition) are the lines through
the vantage point. The “finite points” are the lines not parallel to the horizontal
plane, while the “infinite points” are the lines which are parallel to the horizontal plane.
The lines, also called projective lines, of the projective plane are the planes through
the vantage point.
Homogeneous coordinates
We can make this more explicit by writing each point of 3-dimensional space R3 as a
triple (x, y, z). Draw the plane not as the usual xy-plane, but as the plane z = z0 , for
some nonzero constant z0 . Take the vantage point to be the origin. Any linear change
of the 3 variables x, y, z takes lines (and planes) through the origin to one another:
there are more symmetries of the projective plane than we encountered before.
Every point of 3-dimensional space not at the origin lies on a unique line through
the origin. So every point (x, y, z) with not all of x, y, z zero lies on a unique line
through the origin. The points of this line are precisely the rescalings (tx, ty, tz) of
that point. So each point of the projective plane can be written as a triple (x, y, z),
not all zero, and two such triples represent the same point just when each is a rescaling
of the other. Denote such a point as [x, y, z]. For example, the point [3, 1, 2] of the
projective plane means the line through the origin consisting of the points of the form
(3t, t, 2t) for all values of a variable t. The coordinates x, y, z are the homogeneous
coordinates of a point of the projective plane.
In our pictures, the plane z = z0 was below the vantage point, but it is more
traditional to take the plane to be z = 1, above the vantage point. The affine plane is
just the usual xy plane, but identified either with the plane z = 1, or with the subset
of the projective plane given by points [x, y, 1].
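The rescaling rule is easy to make concrete. A minimal sketch in plain Python (our own illustration, not from the book; the function name is ours): two triples name the same projective point just when all 2 × 2 minors of the matrix they form vanish, which tests proportionality without dividing.

```python
def same_projective_point(p, q):
    # Two nonzero triples of homogeneous coordinates name the same
    # point [x, y, z] just when one is a nonzero rescaling of the other,
    # i.e. all 2x2 minors of the 2x3 matrix with rows p, q vanish.
    (x, y, z), (X, Y, Z) = p, q
    return x*Y - X*y == 0 and x*Z - X*z == 0 and y*Z - Y*z == 0

# [3, 1, 2] and its rescaling [6, 2, 4] name the same projective point:
assert same_projective_point((3, 1, 2), (6, 2, 4))
# but [3, 1, 2] and [3, 1, 0] are different points:
assert not same_projective_point((3, 1, 2), (3, 1, 0))
```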
The roles of x, y, z are all the same now, and we can see that any linear change of
variables of x, y, z takes the points of the projective plane to one another, and takes
the lines of the projective plane to one another. Each linear change of variables is
represented by an invertible matrix

g = ( g00 g01 g02
      g10 g11 g12
      g20 g21 g22 ).
Conventionally we label our matrix columns using 0, 1, 2 rather than 1, 2, 3. Such a
matrix takes a point p = [x, y, z] of the plane to the point q = [X, Y, Z] where

(X, Y, Z) = g (x, y, z),

treating both triples as column vectors.
We denote this relation as q = [g]p and call [g] a projective automorphism. Clearly
if g and h are two invertible 3 × 3 matrices, then [gh] = [g][h], i.e. we carry out the
transformation of lines through the origin by carrying out the multiplications by the
matrices.
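The rule [gh] = [g][h] can be checked numerically. A plain-Python sketch (our own, not the book's) applies 3 × 3 matrices to homogeneous coordinates and verifies that multiplying matrices matches composing the maps, up to the usual rescaling:

```python
def matmul(g, h):
    # product of 3x3 matrices
    return tuple(tuple(sum(g[i][k] * h[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def apply(g, p):
    # act on homogeneous coordinates: matrix times column vector
    return tuple(sum(g[i][j] * p[j] for j in range(3)) for i in range(3))

def proportional(p, q):
    # same projective point: all 2x2 minors vanish
    return (p[0]*q[1] - q[0]*p[1] == 0 and
            p[0]*q[2] - q[0]*p[2] == 0 and
            p[1]*q[2] - q[1]*p[2] == 0)

g = ((1, 2, 0), (0, 1, 0), (0, 0, 1))   # a shear
h = ((2, 0, 0), (0, 3, 0), (1, 0, 1))   # another invertible matrix
p = (3, 1, 2)

# [gh]p and [g]([h]p) name the same projective point:
assert proportional(apply(matmul(g, h), p), apply(g, apply(h, p)))
```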
Lemma 19.1. If a square matrix g rescales every nonzero vector by a scalar, i.e.
gx = λ(x)x for some number λ(x), for every vector x ≠ 0, then λ(x) is a constant
scalar independent of x.
Proof. Suppose that g is n × n. If n = 1 then every square matrix is a constant scalar,
so suppose that n ≥ 2. Write our vectors as x = Σj xj ej in the standard basis of Rn
and calculate

gx = λ(x) x = Σj λ(x) xj ej = Σj xj gej = Σj xj λ(ej ) ej .

So for any x with xj ≠ 0, we have λ(x) = λ(ej ). But then if xi ≠ 0, and if i ≠ j, then

λ(x) = λ(ei ) = λ(ei + ej ) = λ(ej ) .

So λ is a constant.
Lemma 19.2. Two projective automorphisms [g], [h] have precisely the same effect
on all points [x, y, z] of the projective plane just when the matrices agree up to a
constant: h = λg for some number λ ≠ 0.
Proof. Let ℓ = h−1 g, so that [ℓ] = [h]−1 [g], and [ℓ] fixes every point of the projective
plane just when ℓ rescales every vector in R3 by a scalar, which by lemma 19.1 happens
just when ℓ = λ is a constant scalar, i.e. g = λh.
In particular, the points of the projective plane are permuted by the projective
automorphisms, as are the projective lines.
The projective plane over any field k, denoted P2 (k), is the set of lines through
the origin in k3 . In k3 , the line through two points p = (x, y, z) and q = (X, Y, Z)
is just the set of all points tp + (1 − t)q for t in k.
Example: the Fano plane
Let k := Z/2Z; the Fano plane is the projective plane P2 (k). Each point is [x, y, z]
with x, y, z in k defined up to rescaling. But the only nonzero scalar you can use to
rescale with is 1, since k = { 0, 1 }. Therefore we can just write [x, y, z] as (x, y, z).
Taking all possibilities for x, y, z from k, not all zero, we get 7 points

(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1),

corresponding to the numbers 1, 2, . . . , 7 written in base 2 (reading z as the most
significant binary digit and x as the least).
We can also draw this diagram as a cube; for each point, you draw the corresponding point in R3 with the same coordinates:
(the cube’s vertices carry the labels 1 through 7, one per nonzero point) with one
vertex not marked. The unmarked vertex is the origin: if we let k = Z/2Z, then the
cube is the set of points in k3 , or more precisely, since the unlabelled point is the
origin and is deleted, the cube minus that vertex is the Fano plane. Every line in the
Fano plane has three points. Take two numbers from 1 to 7, say 1, 5, and write out
their binary digits under one another:
001
101

Look at these as vectors in k3 , so add them without carrying digits:

001
101
100
The sum vector lies on the same line in the Fano plane, giving the lines:
(Seven copies of the diagram, one for each line: {1, 2, 3}, {1, 4, 5}, {1, 6, 7},
{2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}.)
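The rule “a, b and a XOR b are collinear” lets a computer list every line. A small Python sketch (ours, not the book's) recovers the seven lines of the Fano plane from bitwise XOR:

```python
# Label the seven Fano points 1..7, reading (z, y, x) as base-2 digits.
# Three distinct labels a, b, c lie on one line just when c == a ^ b,
# i.e. the three vectors in k^3 sum to zero over k = Z/2Z.
points = range(1, 8)
lines = {frozenset((a, b, a ^ b)) for a in points for b in points if a != b}

assert len(lines) == 7                                  # seven lines
assert all(len(line) == 3 for line in lines)            # three points each
assert frozenset((1, 5, 4)) in lines                    # the worked example: 1 + 5 = 4
# every point lies on exactly three lines
assert all(sum(q in line for line in lines) == 3 for q in points)
```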
Projective space
Similarly, projective space Pn (k) is the set of lines through the origin of kn+1 , and
a projective automorphism of projective space is the result of applying an invertible
square matrix to those lines. Again, two invertible square matrices yield the same
projective automorphism just when they agree up to a scalar multiple, by the same
proof.
Chapter 20
Algebraic curves in the projective plane
Homogenisation of plane curves
Take an algebraic curve y 2 = x2 + x3 in the affine plane over a field k.
The cone on the curve is the surface in k3 given by y 2 z = x2 z + x3 .
The recipe: given an algebraic equation, like y 2 = x2 + x3 :
a. Find the degree n of the highest term; in this example n = 3.
b. Invent a new variable z.
c. Multiply each term by a power of z so that the total degree of each term reaches
n.
A polynomial is homogeneous if all of its terms have the same total degree, i.e. sum
of degrees in all variables.
20.1 Homogenise x2 y − x2 + y = 1 + 2x − y 2 x3 .
In sage:
R.<x,y,z> = PolynomialRing(QQ)
p = x^3+x*y+1
p.homogenize(z)
yields x3 + xyz + z 3 .
The homogenised polynomial equation y 2 z = x2 z + x3 has all terms of degree
3, so if we rescale x, y, z to tx, ty, tz, we rescale both sides of the equation by t3 , so
solutions (x, y, z) rescaled remain solutions (tx, ty, tz). Therefore every solution lies on
a line through the origin consisting of other solutions. The set of solutions is a surface,
which is a union of lines through the origin. A projective plane curve is the set of
lines through the origin satisfying a nonzero homogeneous polynomial.
Conics
Take a hyperbola xy = 1 in the affine plane,
and homogenise to xy = z 2 , as we see from three different perspectives:
If we slice the surface along a plane like z = 2, parallel to the plane we drew
but higher up, that plane z = 2 intersects our surface in a dilated hyperbola.
But if we slice the surface along the plane x = 1, we get xy = z 2 to become
y = z 2 , a parabola:
Suitable linear changes of variable interchange the x and z axes, so suitable
projective automorphisms identify the two affine curves: the hyperbola and
the parabola. In particular, the two curves have isomorphic fields of rational
functions. They don’t have isomorphic rings of regular functions.
If instead we slice our surface with the plane x + y = 1,
we get y = 1 − x plugged in to xy = z 2 , which simplifies to

(x − 1/2)2 + z 2 = 1/4,

a circle of radius 1/2. So again the circle is birational to the parabola and to
the hyperbola, and they all have the same rational function fields. In these
pictures we can see why the conics are called conics: they are all slices of the
same cone.
The affine curve x2 +y 2 = −1 has no real points on it, but has complex points.
It homogenises to x2 + y 2 + z 2 = 0, a “sphere of zero radius”, with the origin
as its only real point, but with again many complex points. The complex
linear change of variables X = x − iy, Y = x + iy, Z = iz gives us XY = Z 2 ,
the same projective curve as before. So these curves are all birational over
the complex numbers, and indeed are all identified by automorphisms of the
complex projective plane.
20.2 By the same trick of homogenising and dehomogenising in various variables, draw
the same sort of pictures for the circle x2 + y 2 = 1 to see what cone it homogenises
to, and what curves it makes when you intersect that cone with the planes (a) x = 1,
(b) y = 1 and (c) z = y + 1.
Counting intersections of curves
Who are you going to believe, me or your own eyes?
— Chico Marx
• A line through a cubic curve
can hit as many as 3 points.
• Two conics
can intersect at as many as 4 points,
• but sometimes only at 3 points:
• Some conics don’t appear to intersect at all
but keep in mind that our points can have coordinates in the algebraic
closure, so they actually intersect.
173
Counting intersections of curves
• For example, x2 + y 2 = 1 and (x − 4)2 + y 2 = 1 intersect at (x, y) = (2, ±i√3):
subtracting the two equations gives 8x = 16, so x = 2 and y 2 = 1 − 4 = −3.
• Some conics appear not to have even complex intersections, such as
x2 + y 2 = 1 and x2 + y 2 = 2.
But in the projective plane, we can homogenise to x2 + y 2 = z 2 and
x2 +y 2 = 2z 2 , and then intersect with the plane x = 1, to get 1+y 2 = z 2
and 1 + y 2 = 2z 2 , which intersect at (y, z) = (±i, 0).
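Both computations are easy to confirm with a computer algebra system. A sketch using the sympy library (our choice; the book's own snippets use Sage):

```python
from sympy import I, sqrt, symbols, solve

x, y, z = symbols('x y z')

# The unit circles centred at 0 and 4 meet only at complex points:
sols = solve([x**2 + y**2 - 1, (x - 4)**2 + y**2 - 1], [x, y])
assert all(s[0] == 2 for s in sols)
assert {s[1] for s in sols} == {I*sqrt(3), -I*sqrt(3)}

# x^2 + y^2 = 1 and x^2 + y^2 = 2: homogenise, then slice with x = 1
sols = solve([1 + y**2 - z**2, 1 + y**2 - 2*z**2], [y, z])
assert {tuple(s) for s in sols} == {(I, 0), (-I, 0)}
```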
Lemma 20.1. Suppose that k is an infinite field. Then for any finite set of lines
in P2 (k), there are infinitely many points of P2 (k) not on any of those lines.
Proof. If we work in the affine plane, the points (x, y) lying on those lines must satisfy
one of the associated linear equations, say y = mi x + bi or x = ci . Pick a number
x0 not among the constants ci . Now we have finitely many values y = mi x0 + bi to
avoid; pick y to not be one of those.
Lemma 20.2. Every nonzero homogeneous polynomial of degree n in two variables
over a field k has n roots, counted with multiplicities, in P1 (k̄), and so splits into a
product of homogeneous linear factors.
Proof. As part of the proof, we define multiplicities of zeroes of a homogeneous
polynomial p(x, y) to be multiplicities of linear factors. So we are saying precisely
that p(x, y) factors into n factors. If p(x, y) has a factor of y, then divide that out;
by induction we can assume that there is no factor of y. Set y = 1: look for zeroes
of p(x, 1). Since y is not a factor of p(x, y), p(x, 1) has degree precisely n, and so has
precisely n zeroes in k̄, so splits into n linear factors.
• The polynomial
p(x, y) = x3 + x2 y − 2xy 2
becomes
p(x, 1) = x3 + x2 − 2x = x(x − 1)(x + 2),
so that
p(x, y) = x(x − y)(x + 2y).
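This dehomogenise-factor-rehomogenise computation can be replayed in sympy (our sketch; the book's own examples use Sage):

```python
from sympy import symbols, factor, expand

x, y = symbols('x y')
p = x**3 + x**2*y - 2*x*y**2

# set y = 1 and factor the resulting one-variable polynomial
assert expand(factor(p.subs(y, 1)) - x*(x - 1)*(x + 2)) == 0
# re-homogenising each linear factor recovers the splitting of p(x, y)
assert expand(x*(x - y)*(x + 2*y) - p) == 0
```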
• The family of curves with y 2 = ax contains the curve y 2 = 0 when
a = 0, a line. So a family of curves of degree 2 contains a curve of
degree 1. It is natural to think of the curve being a “double line” when
a = 0.
Similarly, we can say that a multiple component of a curve means a curve given
by an equation which is a positive integer power of an irreducible homogeneous polynomial, say 0 = f (x, y, z)^k , giving a component with multiplicity k. From now on, we
allow projective plane curves to have components with positive integer multiplicities.
Breaking curves into components
Theorem 20.3 (Study). Every algebraic curve in the projective plane has a unique
decomposition into irreducible algebraic curves, its components. To be more precise,
every nonzero homogeneous polynomial in 3 variables f (x, y, z) has a decomposition
f = a0 g1^k1 g2^k2 · · · gn^kn ,
into a product of a nonzero constant a0 and several coprime irreducible homogeneous
polynomials gj (x, y, z) to various positive integer powers, unique up to rescaling the
constant and the polynomials by nonzero constants.
Proof. Suppose that g is an irreducible polynomial and that f = 0 at every point in
k̄ where g = 0. We want to prove that g divides f . Work in the affine plane z = 1.
Think of each equation as a polynomial in one variable x, but with coefficients rational
functions of y. By corollary 16.3 on page 150, after perhaps a linear change of variables,
the resultant in y vanishes just at those x values where there are simultaneous roots.
In particular, the number of different values of x where such roots occur is never more
than the degree of the resultant, as a polynomial in x. For each value x = c (over
k̄) for which g(c, y) is not constant, the values of y at which g(c, y) = 0 also satisfy
f (c, y) = 0, so the resultant vanishes. Therefore this resultant is the zero polynomial.
If the curve where f = 0 is the union of distinct irreducible curves given by equations
gi = 0, then each of these gi divides f , so their product divides f . Apply induction
on the degree of f .
Theorem 20.4. Every rational morphism of projective plane algebraic curves f : B →
C is constant or onto a finite union of components of C, at most one for each
component of B.
Proof. We can assume that B is irreducible, and we have already assumed that our
field is algebraically closed by definition of the points of the curve. Our rational
morphism is
f (x, y, z) = (p(x, y, z), q(x, y, z), r(x, y, z)).
If B = (b(x, y, z) = 0) then the system of equations

b(x, y, z) = 0, u = p(x, y, z), v = q(x, y, z), w = r(x, y, z)

has (after perhaps a linear change of x, y, z variables) resultant eliminating x, y, z given
by some algebraic equation in u, v, w which specifies which points lie in the image. By
corollary 16.3 on page 150 this equation is not trivial. Split C into irreducibles
C1 , C2 , . . . , CN , say Cj = (cj = 0). Then c1 c2 · · · cN ◦ f = 0 on B, since f takes B to
C. So if B = (b(x, y, z) = 0) then c1 c2 · · · cN ◦ f is divisible by b in any affine chart.
If c1 ◦ f is not zero on B then divide by it to see that already c2 · · · cN ◦ f = 0 on B.
So f takes B into C2 ∪ · · · ∪ CN . We can assume that cj ◦ f = 0 on B for every j, i.e.
that B ⊂ f −1 Cj for every j, i.e. that f (B) ⊂ ∩j Cj . Distinct irreducible curves meet
in only finitely many points, so there is only one such Cj , i.e. C is irreducible. Hence
the image is precisely C.
Counting intersections
Pick a triangle pqr in the projective plane. A triangle is generic for two algebraic
projective plane curves B, C if (1) no line connecting two intersection points of B and
C lies through q and (2) neither curve passes through q.
A triangle is very generic for curves B and C if it is generic and there is no intersection
point of B and C along the line qr, even in the algebraic closure of the field.
By projective automorphism, we can make any triangle become the one whose
vertices are
[0, 0, 1], [1, 0, 0], [0, 1, 0].
This projective automorphism is unique up to rescaling the variables. In the affine
plane, these vertices are the points (0, 0), (∞, 0), (0, ∞). Suppose that our two curves
have equations B = (b(x, y) = 0) and C = (c(x, y) = 0). The triangle is generic
just when (1) no vertical line intersects two intersection points of B and C and (2)
(0, ∞) = [0 : 1 : 0] is not on B or C, i.e. b(x, y) and c(x, y) both have nonzero constants
in front of their highest terms in y. If the triangle is generic, it is also very generic
just when there is no zero of the homogeneous polynomials b(x, y, z), c(x, y, z) on the
line at infinity, i.e. the line z = 0, or, in other words, the highest order terms of b(x, y)
and c(x, y) (which are homogeneous of the same degree as b(x, y) and c(x, y)) have no
common root on P1 . The resultant resb,c (x) in y depends, up to constant factor, only
on the choice of curves B and C and the choice of triangle. The intersection number
of B and C at a point p = (x, y), denoted BCp , is the multiplicity of x as a zero of
resb,c (x). If p is not a point of both B and C then define BCp := 0. If p lies on a
common component of B and C then let BCp = ∞. Danger: so far, the intersection
number depends on the choice of triangle.
• Let b(x, y) = y + x and c(x, y) = y. The intersection points of the
lines (b(x, y) = 0) and (c(x, y) = 0) in the affine plane are at 0 =
y = y + x, i.e. at the origin. In the projective plane, there is no
further intersection point, as the equations are already homogeneous.
Therefore the standard triangle is very generic for these two lines. The
resultant of b(x, y), c(x, y) in y is r(x) = x. Therefore the two lines
B = (y + x = 0) and C = (y = 0) have intersection number 1 at
(0, 0), and similarly any two lines have intersection number 1 at their
intersection point.
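The resultant computation in this example is one line in sympy (our sketch, not the book's code):

```python
from sympy import symbols, resultant, degree

x, y = symbols('x y')
b = y + x
c = y

# eliminate y: the roots of the resultant are the x-values of intersections
r = resultant(b, c, y)
assert r.subs(x, 0) == 0    # the lines meet over x = 0 ...
assert degree(r, x) == 1    # ... with intersection number 1 there
```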
• Let B = (xy 2 + y = 0) and C = (xy 2 + y + 1 = 0). The curves intersect
in the affine plane at (x, y) = (−2, −1/2)
(assuming that 2 ≠ 0 in our field). But if we homogenize, we find also
the intersection point (0, ∞) = [0 : 1 : 0]. So the coordinate triangle is
not generic. As previously, if we change x to x + λ for any constant
λ ≠ 0, our curves change to
B = (λy 3 + xy 2 + y = 0), C = (2λy 3 + 2xy 2 + y + 1 = 0).
They now intersect at
(x, y) = (−1 − λ, 1) = [−1 − λ, 1, 1].
(We previously saw this from looking at the resultant r(x) = −λ2 (λ +
1 + x).) After homogenizing and then setting y = 1, we also find
another intersection at [x, y, z] = [−λ, 1, 0]. The standard triangle is
generic for these curves, since the two intersection points lie on different
lines through [0, 1, 0]. But it is not very generic, because our second
intersection point lies outside the affine plane, on the line at infinity.
We can only calculate the intersection number at the first point, where
the resultant vanishes to degree 1, so an intersection number
BC(−λ−1,1) = 1.
The other point is not in the affine plane, so intersection multiplicity
is not defined at that point, at least by our definition.
Theorem 20.5 (Bézout). Over any field, take two projective algebraic plane curves B
and C not sharing a common component. Over some finite degree extension of the
field there is a very generic triangle for the curves. For any generic triangle, the
intersection numbers of the intersection points at points in the affine plane over the
given field sum to

Σp BCp ≤ deg B deg C.

A generic triangle is very generic just when the intersection numbers of the intersection
points at points of the affine plane sum to

Σp BCp = deg B deg C.
Proof. Split B and C into irreducibles and add up intersections by multiplying resultants: without loss of generality we can assume that B and C are irreducible.
By proposition 12.11 on page 119, there are at most deg B deg C values of x that
occur on intersection points, and by the same reasoning at most that number of y
values, so finitely many intersection points.
Since there are finitely many intersection points, there are finitely many lines
through them. By lemma 20.1 on page 173 we can pick one vertex of our triangle
to avoid lying on any of them (after perhaps replacing by an extension field), while
arbitrarily picking the other two vertices to be any two distinct points. So we have
a generic triangle, and we can assume that its vertices are (0, 0), (∞, 0) and (0, ∞).
We can then move the line at infinity to avoid the intersection points (again in some
extension field), so there is a very generic triangle.
Pick any generic triangle. No two intersection points lie on any line through (0, ∞),
i.e. on any vertical line: all of the intersection points have distinct x values. There is
no intersection point at (0, ∞), i.e. the highest order terms in y in the equations have
no common zero. Therefore even when b(a, y) drops degree at some value of x = a,
c(a, y) doesn’t, so the resultant is the product of the values of b(a, y) at the roots y
of c(a, y), vanishing precisely at the values of x where the curves have an intersection.
In the proof of proposition 12.11 on page 119, we found the degree of the resultant
to be at most deg b deg c and nonzero. There are at most a number of intersection
points given by the degree of the resultant as a polynomial. Hence the sum of the
intersection numbers over the intersections in the affine plane is the sum of the degree
of root over all roots of the resultant, so at most the degree of the resultant. Our
triangle is very generic just when this sum covers all intersection points, as there are
none on the line at infinity.
We still need to see that for our very generic triangle, the degree of the resultant is
precisely deg B deg C. The assumption of being very generic is expressed algebraically
as saying that the homogeneous polynomials b(x, y, z), c(x, y, z) have no common roots
along z = 0. In other words, b(x, y, 0) and c(x, y, 0) have no common roots on P1 . So
the resultant of b(x, y, 0), c(x, y, 0) in y is nonzero, and has degree deg B deg C. But
the degree of the resultant of b(x, y, z), c(x, y, z) can be no less for variable z than for
the fixed value z = 0. Therefore the resultant has the required degree.
Theorem 20.6. To any two projective algebraic plane curves B and C over a field k
and any point p of the projective plane defined over the algebraic closure k̄ of k, there
is a unique quantity BCp so that
a. BCp = CBp and
b. BCp = 0 just when p does not lie on any common point of B and C defined
over k̄ and
c. BCp = ∞ just when p lies on a common component of B and C defined over k̄
and
d. BCp is a positive integer otherwise and
e. two distinct lines meet with intersection multiplicity 1 at their unique point of
intersection and
f. if B splits into components (perhaps with multiplicities) B1 and B2 , then BCp =
B1 Cp + B2 Cp and
g. if B and C are defined by homogeneous polynomials b(x, y, z) and c(x, y, z) and
E is the curve defined by bh + c where h(x, y, z) has degree deg h = deg c − deg b,
then BCp = BEp .
Moreover, BCp can be computed as defined above using any generic triangle.
Proof. In this proof, we write bcp instead of BCp if B is cut out by the equation
0 = b(x, y, z) and C by 0 = c(x, y, z). First we prove that the conditions a.–g.
determine the multiplicity uniquely. Since they are independent of choice of affine
chart, this ensures that the multiplicity is also independent. We then check that our
definition above satisfies these, so must be determined by these conditions independent
of the choice of affine chart made in the definition above.
Any two notions of multiplicity agree on common components of B and C, and
on points not belonging to B or C, by the conditions above. So we only need to
check points of positive finite multiplicity. We can assume by induction that B and
C are both irreducible, and cut out by irreducible homogeneous polynomials b and
c. Suppose that we have two different methods of calculating an intersection number
satisfying our various conditions, one of which we can take to be the definition by the
resultant above. By induction suppose that they agree wherever they both assign a
value less than the larger of the two values that they assign as bcp .
Take affine coordinates in which p = (0, 0). If deg b(x, 0) = 0 then b(x, 0) = 0 so
b(x, y) is divisible by y, not irreducible, so our result holds by induction. The same
holds if deg c(x, 0) = 0. So both b and c have positive degree in each of x and y when
the other variable is set to zero. Rescale to get both to have unit coefficient in x:
b(x, y) = x^β + · · · ,
c(x, y) = x^γ + · · · .
Suppose (after perhaps swapping the names of b and c) that β ≤ γ and let
h(x, y) = c(x, y) − x^{γ−β} b(x, y).
By construction, h(x, 0) has degree in x at most γ −1. Our last property of intersection
multiplicity above demands that bcp = bhp . Therefore we can replace c by h and
repeat, lowering the degree in x by induction. This proves uniqueness of intersection
multiplicities satisfying our axioms.
We want to prove existence, i.e. that the recipe we defined above satisfies these
axioms:
a. Resultants change at most by a sign when we swap the polynomials: res_{b,c} = ± res_{c,b}.
b. Follows from theorem 20.5 on the preceding page.
c. Follows from theorem 20.5 on page 177.
d. Follows from theorem 20.5 on page 177.
e. Follows by direct computation of the resultant.
f. Follows from lemma 10.5 on page 81.
g. Follows from proposition 10.4 on page 80, and the existence of splitting fields.
We restate Bézout’s theorem without reference to generic triangles, since the
previous theorem proves that they are irrelevant.
Theorem 20.7 (Bézout). Over any field k, take two projective algebraic plane curves
B and C not sharing a common component. Over the algebraic closure k̄ of the field,
the intersection numbers of the intersection points sum to
∑_{p ∈ P²(k̄)} BCp = deg B deg C.
20.3 Suppose that every intersection point of two projective algebraic plane curves
B and C, over an algebraically closed field, has multiplicity one. Prove that B and C
are regular at these points.
20.4 Prove that every regular projective algebraic plane curve is irreducible. Give an example of a reducible regular affine algebraic plane curve.
Sage
Pieter Belmans wrote the following sage code to find the intersection number of two
algebraic curves at a point. First we find the minimal degree in the first variable of any term of a polynomial f.
def ldegree(f):
    minimum = infinity
    for (n, m) in f.dict():
        minimum = min(minimum, n)
    return minimum
Given any two polynomials b, c in two variables, we want to figure out what the two
variables are:
def determine_variables(b, c):
    if len(b.variables()) == 2:
        return b.variables()
    if len(c.variables()) == 2:
        return c.variables()
    if len(b.variables()) == len(c.variables()) == 1:
        if b.variables() == c.variables():
            return (b.variable(0), 0)
        else:
            return (c.variable(0), b.variable(0))
    return (0, 0)
Finally, we use our definition of intersection number, and induction, to compute
intersection numbers recursively.
def intersection_number(b, c, point = (0,0)):
    (x, y) = determine_variables(b, c)
    # translate both curves to origin and calculate it there
    b = b.subs({x:x + point[0], y:y + point[1]})
    c = c.subs({x:x + point[0], y:y + point[1]})
    # if $b(0,0)\neq 0$ or $c(0,0)\neq 0$ they don't intersect in the origin
    if b.subs({x:0, y:0}) != 0 or c.subs({x:0, y:0}) != 0:
        return 0
    # if $b$ or $c$ are zero they don't intersect properly
    if b == 0 or c == 0:
        return infinity
    # we only look at factors of $x$
    f = b.subs({y:0})
    g = c.subs({y:0})
    # $b$ contains a component $y=0$
    if f == 0:
        # $c$ contains a component $y=0$ too, no proper intersection
        if g == 0:
            return infinity
        # remove common $y^n$ in $b$, count degree of $x$ in $c$ and recurse
        else:
            f = b.quo_rem(y)[0]
            return ldegree(g) + intersection_number(f, c)
    # $b$ does not contain a component $y=0$
    else:
        # $c$ *does* contain a component $y=0$
        if g == 0:
            g = c.quo_rem(y)[0]
            return ldegree(f) + intersection_number(b, g)
        # we recurse by removing factors of $x$
        else:
            p, q = f.lc(), g.lc()
            r, s = f.degree(), g.degree()
            # we drop the highest degree term
            if r <= s:
                return intersection_number(b, p*c - q*x^(s-r)*b)
            else:
                return intersection_number(q*b - p*x^(r-s)*c, c)
A test run:
P.<x,y> = PolynomialRing(QQ)
b = P(x^2+y^2)^2+3*x^2*y-y^3
c = P(x^2+y^2)^3-4*x^2*y^2
intersection_number(b,c)
yields 14, the intersection number at the origin of these two curves.
Chapter 21
Families of plane curves
All happy families are alike; each unhappy family is unhappy in its own
way.
— Leo Tolstoy
Anna Karenina
Pencils of curves
Every curve C of degree n has a homogeneous equation p(x, y, z) = 0 of degree n,
unique up to rescaling, by Study’s theorem (theorem 20.3 on page 174). Take two
curves C and D of the same degree n, with equations p(x, y, z) = 0 and q(x, y, z) = 0.
The level sets
p(x, y, z)/q(x, y, z) = r
of the ratio form a collection of curves Cr , called the pencil of curves containing C and
D. The curve C is C0 while D is C∞ . Every point [x, y, z] in the plane lies on a unique
one of these curves, with value r given by taking the value of p(x, y, z)/q(x, y, z) at
that point, except for the points in C ∩ D, for which the ratio is never defined. To be
more precise, each curve Cr can also be written as
p(x, y, z) = rq(x, y, z),
(except for C∞ = D). The points of C ∩ D belong to all curves Cr of the pencil.
Moreover the degree of Cr is the same as that of C and D.
• The lines through a point form a pencil: if C = (x = 0) and D = (y = 0)
then Cr = (x = ry).
• The pencil of conics containing the circle x² + y² = 1 and the reducible conic x² = y² (which is a pair of lines x = ±y): look at the 4 points where the circle intersects the two lines; these points lie in every conic in the pencil.
• The pencil of cubics containing y² = x + x³ and xy² = 0.
• Danger: for some values of r, Cr might actually be of lower order than C or D. For example, if C = (y² = x) and D = (−y² = x), two conics, the pencil has curves Cr with equation
(1 − r)y² = (1 + r)x,
which drops order at r = 1, i.e. C1 is a line. But note that if we homogenise,
(1 − r)y² = (1 + r)zx,
the curve C1 is actually a pair of lines. In a pencil Cr for which C0 and C∞ are irreducible, we rarely but occasionally find reducible curves, as in this example, and we might picture that the moving curve Cr “bubbles” into reducible components.
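The pencil of conics above is easy to test numerically. A small plain-Python check (not Sage; the function names p, q and the numerical tolerance are our own) that the four base points lie on every member of the pencil:

```python
import math

# The pencil p = r*q of conics through the circle and the line pair.
def p(x, y): return x*x + y*y - 1     # circle x^2 + y^2 = 1
def q(x, y): return x*x - y*y         # pair of lines x = ±y

# The four base points: x^2 = y^2 together with 2x^2 = 1.
s = 1 / math.sqrt(2)
base_points = [(s, s), (s, -s), (-s, s), (-s, -s)]

# Every member of the pencil passes through all four base points.
for r in [-3.0, 0.0, 0.5, 2.0, 17.0]:
    for (x, y) in base_points:
        assert abs(p(x, y) - r * q(x, y)) < 1e-12
```

Any other values of r work equally well, since p and q both vanish exactly at the base points.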
Polygons
Lemma 21.1. Suppose that two projective plane curves C and D of order n intersect
at exactly n2 points (the maximum possible by Bézout’s theorem). Write n as n = p+q.
If exactly pn of these points lie on a curve E of degree p (again the maximum possible)
then the remaining points, qn of them, lie on a curve F of degree q (again the maximum
possible).
Proof. Write C and D as C0 and C∞ in a pencil Cr . Pick a point s different from
the intersection points of C0 and C∞ . We can pick such a point, because we can
replace our field by its algebraic closure if needed, or more simply by any infinite
field containing it. The number of intersection points can't increase when we work in a
larger field, since Bézout’s theorem tells us that they are maximal already. The point
s then belongs to a unique element of the pencil, say to Cr0 .
In particular, we can pick s to belong to E. But then s sits in Cr0 along with the
points of C ∩ D and s also sits in E along with pn points of C ∩ D. So E and Cr0
contain 1 + pn points of C ∩ D. By Bézout’s theorem, since their degrees are p and n,
they contain a common irreducible component. Since E is irreducible by assumption,
E is that component of Cr0 . So Cr0 = E ∪ F , for some curve F of degree at most
q, i.e. the pencil bubbles into two components. The remaining intersection points of
C ∩ D don’t lie in E, but all lie in Cr0 , so they lie in F .
In a projective plane, we have a concept of line, but not of line segment, or of angle
or distance. In a projective plane a triangle is a triple of points not all on the same
line, together with the lines that connect them. So the sides of a triangle are whole lines rather than line segments.
Similarly, a quadrilateral is a quadruple of points, called the vertices, no three on the
same line, with a chosen ordering defined up to cyclically permuting, so there is no
first point or second point, but if you pick a point from the quadruple, there is a next
point, and a next, and so on until you come back to the point you started with. In
the same way, we define a polygon. The lines through subsequent vertices are the
edges. We can’t define the notion of a square or rectangle since there is no notion of
length or angle available.
Pascal’s mystic hexagon
Draw a hexagon with vertices on a conic.
Since each side has a “successor”, and there is an even number of sides, each side has
an “opposite” side, half way around the hexagon. Pick a side and its opposite, extend them to full lines, and mark their intersection point.
The same for the next pair of opposite sides.
And finally for the third pair of opposite sides.
Draw all of these lines together in one picture.
Mysteriously, all three intersection points lie on a line.
Theorem 21.2. Given a hexagon in the projective plane over a field, the intersection
points of opposite sides lie on a line.
Proof. Let C be the reducible cubic curve consisting of three of the lines, no two
of them in succession (for example, the first, third and fifth lines). Let D be the
reducible cubic curve consisting of the remaining three lines of the hexagon. Let E be the conic. The result follows immediately from lemma 21.1 on page 185.
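Theorem 21.2 can be tested with exact arithmetic. In homogeneous coordinates the line through two points, and the meeting point of two lines, are both cross products, and three points are collinear just when a 3 × 3 determinant vanishes. A plain-Python sketch (not Sage; the choice of hexagon on the conic yz = x² and all helper names are ours):

```python
from fractions import Fraction

# Homogeneous coordinates: the line through two points, and the meeting
# point of two lines, are both given by the cross product.
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def det3(u, v, w):
    return (u[0]*(v[1]*w[2] - v[2]*w[1])
          - u[1]*(v[0]*w[2] - v[2]*w[0])
          + u[2]*(v[0]*w[1] - v[1]*w[0]))

# Six points on the conic yz = x^2, taken as hexagon vertices in order.
ts = [Fraction(t) for t in (1, 2, 3, 5, 7, 11)]
P = [(t, t*t, Fraction(1)) for t in ts]

# The six sides, each joining consecutive vertices.
sides = [cross(P[i], P[(i + 1) % 6]) for i in range(6)]

# Meeting points of the three pairs of opposite sides.
meets = [cross(sides[i], sides[i + 3]) for i in range(3)]

# Pascal: the three meeting points are collinear.
assert det3(*meets) == 0
```

Changing the six parameter values in ts to any distinct rationals gives another hexagon inscribed in the conic, and the determinant still vanishes exactly.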
The space of curves of given degree
Consider the vector space Vd of homogeneous polynomials of degree d in 3 variables.
Each such polynomial has a coefficient of xa y b z c as long as a + b + c = d. Imagine
that we take d + 2 sticks sitting in a row. Colour 2 of them white and the rest black.
Take a to be the number of black sticks before the first white one, b the number of
black sticks between the two white ones, and c the number of black sticks after the
second white one. Clearly a + b + c = d. So the dimension of Vd as a vector space is
(d + 2 choose 2).
But since the equation pC = 0 of a curve C is defined only up to rescaling, the curves of degree d are identified with points of P^{d∗}(k) where

d∗ = −1 + (d + 2 choose 2) = d(d + 3)/2.

d   1  2  3   4   5   6   7   8   9
d∗  2  5  9  14  20  27  35  44  54
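This dimension count is easy to confirm by machine. A plain-Python sketch (the function name dstar is ours, not from the text):

```python
from math import comb

# d* = dimension of the projective space of degree-d plane curves:
# the number of monomials x^a y^b z^c with a + b + c = d, minus one
# because equations are only defined up to rescaling.
def dstar(d):
    return comb(d + 2, 2) - 1

# Matches the closed form d(d+3)/2 and the table in the text.
for d in range(1, 10):
    assert dstar(d) == d * (d + 3) // 2

assert [dstar(d) for d in range(1, 10)] == [2, 5, 9, 14, 20, 27, 35, 44, 54]
```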
For example, the set of lines in P²(k) is a 2-dimensional projective space, a projective plane, the dual projective plane. More generally, our projective space P^{d∗}(k) contains all curves of degree d, reducible or irreducible.
A pencil Cr = (p = rq) of two curves C and D is just a line inside P^{d∗}(k). More generally, a linear system is a collection of plane curves whose equations form a projective linear subspace in P^{d∗}(k).
The condition that a point q lies in a curve C = (pC = 0) is a linear equation pC(q) = 0 on the coefficients of the polynomial pC, so is satisfied by a projective linear hyperplane inside P^{d∗}(k). For example, the condition that a point lie in a line is one
equation, so determines a line in the dual projective plane. Similarly, the condition
that a curve has order at least k at a point q of the plane is a collection of k(k + 1)/2
linearly independent equations, killing the terms in the Taylor series up to order k
(if we arrange that q is the origin, for example). A collection of points p1 , p2 , . . . , pn
in the projective plane is in general position for curves of degree d if the linear
system of curves passing through them either (1) has the minimal dimension possible, d(d + 3)/2 − n, if this quantity is not negative, or (2) is empty otherwise.
Lemma 21.3. The definition of general position makes sense, in that any linear
system of degree d curves through n points has dimension at least
d∗ − n = d(d + 3)/2 − n.
After perhaps replacing our field with an infinite extension field, if d∗ − n ≥ 0 then
equality is achieved for some collection of n points, while if d∗ − n < 0 then there is
a collection of n points not lying on any curve of degree d.
Proof. The proof is by induction. For d = 1, there is a pencil of lines through a point
(n = 1), a unique line through any 2 points (n = 2), and no line through 3 or more points if we pick our 3 points correctly. Suppose that the result is true for all values
of n and d less than some particular choices of n and d ≥ 2. Note that
d(d + 3)/2 = d + 1 + (d − 1)(d − 1 + 3)/2.
Pick d + 1 points on a line L, and then pick
(d − 1)(d − 1 + 3)/2
points in general position, so that they lie on a unique curve C of degree d − 1. By
Bézout’s theorem, the line L is a component of any curve D of degree d through the
first d + 1 points. Therefore such a curve D is just C ∪ L. Hence D is the unique
curve of its degree through the given points. So through some collection of d(d + 3)/2
points, there is a unique curve of degree d. If we add any more points, choosing them
not to lie on that curve, then there is no curve of degree d through those points.
If we take away any points, say down to some number n of points, then the dimension
goes up by at most one for each removed point, giving us our result by induction.
• Through any 2 points there is precisely one line.
• Through any 5 points, no three on a line, there is precisely one irreducible conic.
• Through any 9 points in general position, there is precisely one cubic.
• Through any 14 points in general position, there is precisely one quartic.
• Through any 20 points in general position, there is precisely one quintic.
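The middle claim, one conic through 5 points, can be confirmed by linear algebra over Q: each point imposes one linear condition on the six coefficients of a conic, and for the five points below those conditions have rank 5, leaving a single conic up to rescaling. A plain-Python sketch (not Sage; the helpers rank and row are ours):

```python
from fractions import Fraction

# Rank of a matrix over Q by Gaussian elimination with exact arithmetic.
def rank(rows):
    rows = [list(map(Fraction, r)) for r in rows]
    rk, col, ncols = 0, 0, len(rows[0])
    while rk < len(rows) and col < ncols:
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            col += 1
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
        col += 1
    return rk

# A conic a x^2 + b xy + c y^2 + d x + e y + f = 0 has 6 coefficients;
# each point gives one linear condition on (a, b, c, d, e, f).
def row(x, y):
    return [x*x, x*y, y*y, x, y, 1]

# Five points, no three on a line (all on the parabola y = x^2).
pts = [(0, 0), (1, 1), (-1, 1), (2, 4), (3, 9)]
M = [row(x, y) for (x, y) in pts]

# Rank 5 out of 6 unknowns: a one-dimensional solution space, i.e.
# exactly one conic through the five points, up to rescaling.
assert rank(M) == 5
```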
Chapter 22
The tangent line
Proposition 22.1. Pick an irreducible plane algebraic curve C of degree d and a
point p of the plane not lying on C. Among all of the lines through p, every one of
them strikes C in exactly d distinct points, except for at most d(d − 1) lines.
Proof. Arrange that the point p lies at the origin in an affine chart, so that the lines through p are those of the form y = mx or x = 0.
Also arrange that C is not entirely on the line at infinity, so contains some finite points.
From the equation q(x, y) = 0 of the curve C, the intersections lie on q(x, mx) = 0,
a polynomial of degree at most d in x. This polynomial is not the zero polynomial,
because the origin does not lie on C. Therefore the number of intersections is d.
The multiplicities will all be 1 except at values of m where the discriminant (as a
polynomial in x) of q(x, mx) = 0 vanishes. The discriminant is an equation of degree at
most d(d − 1) in m. If the discriminant vanishes as a polynomial in m, it vanishes as a
rational function of m, and so q(x, mx) has a zero of higher multiplicity as a polynomial
in x with rational coefficients in m: q(x, mx) = (x − f (m))2 Q(x, m). But replacing
m by y/x we find that q(x, y) is reducible as a polynomial in x with coefficients
rational in y, and so by Gauss's lemma q(x, y) is reducible as a polynomial in x, y,
a contradiction. So the discriminant is a nonzero polynomial, and if it is constant we
can homogenise and dehomogenise to get it to be nonconstant.
Given a curve in the projective plane, we can take any of the three homogeneous
coordinate variables [x, y, z] and rescale to 1, say to x = 1, and see the curve as lying
on the affine plane. As we have seen, changing the choice of variable, say to y = 1 or
to z = 1, involves rehomogenising the equation and then plugging in y = 1 or z = 1,
giving us another affine curve birational to the first.
Looking at our curve in the affine plane, by setting x = 1, the curve “homogenises”
into its cone in k3 , by homogenising the equation. The tangent line in the affine plane
x = 1 homogenises into a plane through the origin, the tangent plane to the cone.
When we set y = 1 or z = 1, or more generally we look at how our homogenised cone
surface passes through some plane not containing the origin, the tangent plane to
the cone becomes a tangent line to the resulting curve. Therefore we can speak of a
tangent line to an algebraic plane curve in the projective plane, by which we mean
the projective line which corresponds to the tangent plane to the cone.
To calculate that tangent line, we can either operate directly in some affine plane,
say x = 1, and just differentiate implicitly as usual, or we can homogenise, and then
differentiate the equation of the surface via partial derivatives.
• The curve y² = x² + x³ has tangent line given by differentiating implicitly:
2yy′ = 2x + 3x²,
so that at the point (x, y) = (3, 6), we have
2(6)y′ = 2(3) + 3(3)²,
i.e. the slope of the tangent line is
y′ = 11/4,
giving the tangent line
y − 6 = (11/4)(x − 3)
at the point (x, y) = (3, 6).
• For the same curve, we could instead homogenise to the cone
y²z = x²z + x³,
and then treat any one of the variables as a function of the other two, say treating z = z(x, y), so take partial derivatives:
y² ∂z/∂x = 2xz + x² ∂z/∂x + 3x²,
2yz + y² ∂z/∂y = x² ∂z/∂y.
We are interested in the point (x, y) = (3, 6) in the affine plane z = 1, so the point (x, y, z) = (3, 6, 1). We plug in these values to our equations:
6² ∂z/∂x = 2(3)(1) + 3² ∂z/∂x + 3(3)²,
2(6)(1) + 6² ∂z/∂y = 3² ∂z/∂y.
Solve these to find
∂z/∂x = 11/9,  ∂z/∂y = −4/9,
giving an equation of the tangent plane:
z − 1 = (11/9)(x − 3) − (4/9)(y − 6),
which simplifies to
z = (11/9)x − (4/9)y.
This is the same solution, because along the affine plane z = 1 it gives
1 = (11/9)x − (4/9)y,
which, solving for y, is
y − 6 = (11/4)(x − 3),
exactly the same solution we had above.
• At the point (x, y) = (0, 0), the process above fails to yield a tangent line to the curve y² = x² + x³, since differentiating implicitly:
2yy′ = 2x + 3x²,
and plugging in the point (x, y) = (0, 0), we have 0 = 0, no equation at all. In general, our procedure yields a tangent line to an irreducible curve precisely at the points (x, y) where the polynomial p(x, y) in the equation p(x, y) = 0 of the curve has at least one nonzero partial derivative, so that we can solve for y′(x) or for x′(y) implicitly. In this example, there are clearly 2 different lines tangent to the curve. We find these by taking the homogeneous terms of lowest degree in x, y, namely y² = x², dropping the x³ because it has higher order, and then factoring into y = ±x. We can justify this over the real numbers (or the complex numbers) by saying that our equation y² = x² + x³ has solutions y = ±√(x² + x³) ∼ ±x, asymptotically as x → 0.
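The computations in the examples above can be verified mechanically. A plain-Python sketch (not Sage; the dictionary encoding of p and the helper names are ours) checking the slope 11/4 at (3, 6), the vanishing of both partial derivatives at the origin, and the tangent cone y² − x² there:

```python
from fractions import Fraction

# p(x, y) = y^2 - x^2 - x^3, the curve of the examples above.
def px(x, y): return -2*x - 3*x*x    # dp/dx
def py(x, y): return 2*y             # dp/dy

# Implicit differentiation: y' = -px/py wherever py != 0.
x0, y0 = Fraction(3), Fraction(6)
slope = -px(x0, y0) / py(x0, y0)
assert slope == Fraction(11, 4)      # the tangent slope computed above

# At the origin both partials vanish: no single tangent line there.
assert px(0, 0) == 0 and py(0, 0) == 0

# Tangent cone at the origin: lowest-degree homogeneous part of p,
# with p stored as {(i, j): coefficient of x^i y^j}.
p = {(0, 2): 1, (2, 0): -1, (3, 0): -1}
low = min(i + j for (i, j) in p)
cone = {m: c for m, c in p.items() if m[0] + m[1] == low}
assert cone == {(0, 2): 1, (2, 0): -1}   # y^2 - x^2 = (y - x)(y + x)
```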
Take any algebraic curve C in the projective plane, and any point p0 of C. Write
out an affine chart in which p0 is the origin (x, y) = (0, 0), and write out the equation
0 = p(x, y) of the curve. We can take p(x, y) to be a product of irreducibles pi (x, y).
Since the origin belongs to C, at least one of the factors pi (x, y) vanishes there. Expand
each pi (x, y) into sums of constants times monomials. Without loss of generality, we
can work over an infinite field. Rescale (x, y) by a nonzero constant, and then let
that constant “get small”, i.e. look at the powers of that constant, thinking of higher
powers as “smaller”. Each monomial term rescales by some power of that factor.
Rescale both sides of the equation 0 = p(x, y) to get rid of the lowest power. For
example if C is 0 = x(y² − x² − x³), rescaling by a factor λ gives
0 = λx(λ²y² − λ²x² − λ³x³).
Divide out the lowest power of λ, in this case λ³, to get
0 = x(y² − x² − λx³).
The invariant terms give a homogeneous polynomial x(y² − x²). The zeroes of this
polynomial form the tangent lines to the curve. By homogeneity, the number of
tangent lines is the degree of the homogeneous polynomial, which is at most the
degree of the curve. The order of a point p0 on a curve C is the number of tangent
lines, counted with multiplicity. A regular point of an algebraic curve is a point of
order 1; any other point is a singular point.
• At any point of order 1, make a projective change of variables to put the point at the origin, and the line to be the horizontal line y = 0. So our curve has equation p(x, y) = 0 with ∂p/∂y ≠ 0 and with ∂p/∂x = 0 at the origin. After a constant rescaling, our equation is y = ax² + bxy + cy² + · · · . Note that we cannot rid our equation of these higher order terms in y.
• The curve y³ = x⁴ over the real numbers is the graph of a differentiable (but not polynomial) function y = x^{4/3}. Its tangent lines at the origin arise from keeping the lowest degree homogeneous terms: y³ = 0, i.e. there is a unique tangent line y = 0. Nonetheless, over the complex numbers this curve is not the graph of a differentiable complex function of a complex variable.
• The singular points of an affine algebraic curve C with equation 0 = p(x, y) are precisely the points where
0 = p(x, y) = ∂p/∂x = ∂p/∂y,
by definition. Similarly, the singular points of a projective algebraic curve are the singular points of one of its representatives in an affine chart.
• Take a reducible curve and write it as a union of irreducible component
curves. Correspondingly, write its equation as a product of two polynomials. Any two of the irreducible components intersect somewhere
(at least in the points defined over the algebraic closure of the field),
say at a point p. Take affine coordinates in which p is the origin. Each
of the two components has equation given by a polynomial vanishing
at the origin, so the original curve is a product of such polynomials.
Therefore the equation of the reducible curve vanishes to at least second order. Therefore every reducible curve is singular. Consequently,
every nonsingular curve is irreducible.
• The curve y = x^n in the affine chart z = 1 has no singularities in that chart, if n ≥ 1 is an integer. In the chart x = 1 it becomes z^{n−1}y = 1, which has singularities just when
z^{n−1}y = 1 and z^{n−1} = 0 and (n − 1)z^{n−2}y = 0,
which never happen simultaneously. Therefore over any field, the curve y = x^n is regular for every n = 1, 2, . . .. In particular, there are smooth curves of all degrees.
Lemma 22.2. The number of singular points of an irreducible algebraic curve of
degree d is at most d(d − 1).
Proof. The singular points are the solutions of
0 = p(x, y) = ∂p/∂x = ∂p/∂y.
If one or both of these partial derivatives vanishes, then p(x, y) is a function of one
variable only, and the result is trivial to prove. If the two functions p(x, y) and ∂p/∂x have a common factor, then since p(x, y) is irreducible, this factor must be p(x, y). But ∂p/∂x is a polynomial of lower degree in x than p(x, y), a contradiction.
22.1 What are the singularities, and their orders, of y^p = x^{p+q} for 1 ≤ p, q?
Theorem 22.3. The tangent line to a projective plane algebraic curve at a smooth
point is the unique line that has intersection number 2 or more at that point. More
generally, the intersection number of a line ℓ and a curve C = (0 = f) at a point p is the multiplicity of p as a zero of the restriction of f to ℓ.
Proof. We can arrange that our coordinate axes form a generic triangle as usual, and that our intersection point is at (x, y) = (1, 1). We can assume that our projective plane algebraic curve is
0 = f(x, y) = ∑_j f_j(x) y^j.
We can arrange that our line is y = 1. The intersection number is the multiplicity of
x = 1 as a zero of the resultant

\[
\det\begin{pmatrix}
-1 & & & & & f_0(x)\\
1 & -1 & & & & f_1(x)\\
 & 1 & \ddots & & & f_2(x)\\
 & & \ddots & -1 & & \vdots\\
 & & & 1 & -1 & f_{n-1}(x)\\
 & & & & 1 & f_n(x)
\end{pmatrix},
\]

and we add each row to the next:

\[
= \det\begin{pmatrix}
-1 & & & & f_0(x)\\
 & -1 & & & f_0(x)+f_1(x)\\
 & & \ddots & & \vdots\\
 & & & -1 & f_0(x)+f_1(x)+\dots+f_{n-1}(x)\\
 & & & & f_0(x)+f_1(x)+\dots+f_n(x)
\end{pmatrix}
= (-1)^n \left(f_0(x)+f_1(x)+\dots+f_n(x)\right) = (-1)^n f(x,1).
\]
So the intersection number of the line and the curve is the multiplicity of vanishing of f(x, 1) at x = 1. Since f(1, 1) = 0, the intersection number is positive, and is two or more just when
(d/dx) f(x, 1) = 0 at x = 1,
i.e. just when y = 1 is a tangent line.
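Theorem 22.3 can be sketched on the conic y = x² at the origin: restrict f(x, y) = y − x² to a line y = mx through the origin and count the multiplicity of the root x = 0 of the restriction. A plain-Python check (the helper names are ours):

```python
# Multiplicity of x = 0 as a root of a polynomial given by its
# coefficient list [c0, c1, c2, ...] for c0 + c1 x + c2 x^2 + ...
def mult_at_zero(coeffs):
    m = 0
    while m < len(coeffs) and coeffs[m] == 0:
        m += 1
    return m

# Curve f(x, y) = y - x^2, a smooth conic through the origin.
# Restrict f to the line y = m*x: f(x, m x) = m x - x^2.
def restrict(m):
    return [0, m, -1]

# The tangent line y = 0 meets with multiplicity 2; other lines
# through the origin meet with multiplicity 1.
assert mult_at_zero(restrict(0)) == 2
assert mult_at_zero(restrict(1)) == 1
```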
Theorem 22.4. Denote by Cp the order of a projective algebraic plane curve C at a
point p. Then for any point p and projective algebraic plane curves C, D:
Cp Dp ≤ CDp,
with equality just when the curves have no tangent line in common at p.
Proof. After perhaps replacing our field by a finite degree extension, we pick a generic
triangle for the two curves, and arrange that its vertices are (0, 0), (∞, 0), (0, ∞) as
usual. We can arrange that our point p is (x, y) = (1, 1). Write the equations of
our curves as C = (f = 0) and D = (g = 0). Then CDp is the order of x = 1 as a zero of the resultant resf,g(x) in y. Let r ..= Cp, s ..= Dp. The lowest order homogeneous polynomial terms have order equal to the order of the point on the curve, so we can write out

f(x, y) = f0(x)(1 − x)^r + f1(x)(1 − x)^{r−1}(1 − y) + · · · + fr(x)(1 − y)^r + fr+1(x)(1 − y)^{r+1} + · · · + fm(x)(1 − y)^m,
g(x, y) = g0(x)(1 − x)^s + g1(x)(1 − x)^{s−1}(1 − y) + · · · + gs(x)(1 − y)^s + gs+1(x)(1 − y)^{s+1} + · · · + gn(x)(1 − y)^n.

The tangent lines are the zero lines of the homogeneous terms of lowest order in 1 − x, 1 − y:

0 = f0(0)(1 − x)^r + f1(0)(1 − x)^{r−1}(1 − y) + · · · + fr(0)(1 − y)^r,
0 = g0(0)(1 − x)^s + g1(0)(1 − x)^{s−1}(1 − y) + · · · + gs(0)(1 − y)^s.

We can assume that our axes are chosen so that the horizontal and vertical axes are not tangent to either curve at p. To find tangent lines we can dehomogenise, setting 1 − y = 1:

0 = f0(0)(1 − x)^r + f1(0)(1 − x)^{r−1} + · · · + fr(0),
0 = g0(0)(1 − x)^s + g1(0)(1 − x)^{s−1} + · · · + gs(0).

The number of common tangent lines, counting multiplicity, is the degree of the resultant of these two polynomials.
Conics
A quadratic form is a homogeneous polynomial of degree 2.
Proposition 22.5 (Sylvester's Law of Inertia). Every quadratic form in n variables over the real numbers is identified, by some linear change of variables, with precisely one of

x1² + x2² + · · · + xp² − y1² − y2² − · · · − yq²

where p + q ≤ n. The triple of numbers (p, q, r) with p + q + r = n is called the signature of the quadratic form.
Proof. Write the quadratic form as

q(x) = a11 x1² + a12 x1 x2 + · · · + ann xn²,

or in other words as q(x) = ⟨Ax, x⟩ for a symmetric matrix A. Apply an orthogonal change of variables which diagonalizes the matrix A. Then rescale each variable to get the eigenvalues of the matrix to be ±1 or zero.
Lemma 22.6. Every quadratic form in n variables over a field k of characteristic not 2 can be brought to the “diagonal” form

q(x) = a1 x1² + a2 x2² + · · · + an xn²

by a linear change of variables. The ai are uniquely determined up to replacing each ai by ai bi² for nonzero elements bi of k.
Proof. Take a vector x for which q(x) ≠ 0. Make a linear change of variables to get x = e1, so that q(e1) ≠ 0. Hence after rescaling

q(x) = c1 x1² + c2 x1 x2 + · · · + cn x1 xn + . . .

where x1 does not appear in the remaining terms. Replace x1, x2, . . . , xn with X1, x2, . . . , xn where

X1 ..= x1 + (c2/2c1) x2 + · · · + (cn/2c1) xn

and check that now

q(x) = c1 X1² + . . .

where the . . . don't involve X1, so we can apply induction.
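The completion of squares in this proof is mechanical: clear the first row and column by simultaneous row and column operations and recurse. A plain-Python sketch over Q (not Sage; the function diagonalize is ours, and it assumes a nonzero diagonal pivot can always be found by swapping, which holds for the example here but can fail in general):

```python
from fractions import Fraction

# Symmetric Gaussian elimination: congruent row + column operations
# bring a symmetric matrix to diagonal form, as in the proof above.
def diagonalize(A):
    A = [[Fraction(v) for v in row] for row in A]
    n = len(A)
    for i in range(n):
        if A[i][i] == 0:                      # look for a nonzero pivot
            for j in range(i + 1, n):
                if A[j][j] != 0:
                    A[i], A[j] = A[j], A[i]   # swap rows, then columns
                    for row in A:
                        row[i], row[j] = row[j], row[i]
                    break
        if A[i][i] == 0:
            continue
        for j in range(i + 1, n):             # clear row and column i
            f = A[j][i] / A[i][i]
            A[j] = [a - f * b for a, b in zip(A[j], A[i])]
            for row in A:
                row[j] -= f * row[i]
    return [A[i][i] for i in range(n)]

# q(x, y) = x^2 + 4xy + y^2 has matrix [[1, 2], [2, 1]]; completing
# the square gives (x + 2y)^2 - 3y^2, i.e. diagonal entries 1, -3.
assert diagonalize([[1, 2], [2, 1]]) == [1, -3]
```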
• For the real numbers, any positive real number has a square root, so
we recover the previous lemma.
• For k = C, any complex number has a square root, so every quadratic form in n complex variables becomes
q(z) = z1² + z2² + · · · + zp²,
for a unique p ≤ n.
• For k = Q, any rational number, up to multiplying by a nonzero square of a rational number, is a square-free integer, i.e. has the form
α = ±p1 p2 . . . ps
where all of the primes p1, p2, . . . , ps are distinct. So every quadratic form over k has the form
q(x) = ∑ αi xi²
for square-free integers αi. In the associated conic curve, we can rescale the coefficients, so we can assume that the square-free integers αi have no common factor. We can also assume, by changing all signs if needed, that there are at least as many positive αi as negative.
• Since every element of any field obtains a square root in some extension, the quadratic forms over any algebraically closed field have the form
q(z) = z1² + z2² + · · · + zp²,
for a unique p ≤ n.
• Look at the curve x² = y² and the curve x² + y² = 0, defined over k = R; each is a singular plane conic. The first is a pair of lines x = ±y while the second, a circle of radius zero, has the origin as its only real point, but splits into x = ±iy over k̄ = C.
Lemma 22.7. If a plane conic over a field k has a singular point with coordinates in k then it is either
a. a pair of lines defined over k or else
b. a pair of lines defined over a quadratic extension k(√α), meeting at a single point defined over k, the singular point, and no other point of the curve is defined over k.
Proof. At a singular point of a plane conic, in an affine chart in which the singular
point is the origin, the equation of the conic becomes a quadratic form in x, y. This
factors over k or over a quadratic extension of k, i.e. the conic is a pair of lines
in a quadratic extension. If any point (except the origin) on one of those lines has
coordinates in k, then the line does, and so does the associated linear factor, and then
by dividing that linear factor in, we see that the other linear factor is also defined
over k. Suppose on the other hand that the linear factors don’t exist over k, only over
the quadratic extension. Then no other point of the affine plane with coordinates
in k belongs to the curve, except the singular point. Write out the equation of the
curve as ax² + bxy + cy² = 0, say. Switch to the affine chart y = 1, where the curve is ax² + bx + c = 0. Since the homogeneous equation has no roots in P¹(k), this quadratic equation has no root in k, so the curve has no point defined over k in this affine chart. Therefore the projective conic has precisely one point with coordinates in k.
Proposition 22.8. If a conic in the projective plane over a field k has a regular point
with coordinates in k, it is either a pair of distinct lines, and taken to the conic xy = 0
by some projective automorphism, or is irreducible and taken to the conic y = x² by
some projective automorphism.
Proof. Suppose that our conic is not a pair of lines. It has a regular point with
coordinates in k, and is not a pair of lines, so it has no singular points, and so is
irreducible. Arrange by projective automorphism that the regular point lies at the
origin of the affine chart z = 1 with tangent line y = 0, so the curve has the form
y = ax² + bxy + cy².
If a = 0, we get a factor of y on both sides, so the equation is reducible. Rescale y to arrange a = 1, and write the equation as x² = y(1 − bx − cy). Change variables by a projective automorphism X = x, Y = y, Z = z − bx − cy to get to x² = y.
Corollary 22.9. If an irreducible conic in the projective plane over a field k has a
regular point with coordinates in k then it is birational to a line.
Proof. Suppose that the conic, call it C, has equation x² = y, i.e. x² = yz homogeneously. Map
[s, t] ∈ P¹(k) ↦ [st, s², t²] ∈ C ⊂ P²(k).
The inverse map is
[x, y, z] ∈ C ↦ [y, x] if y ≠ 0, and [x, z] if z ≠ 0.
Clearly this identifies the fields of rational functions.
More geometrically, pick a regular point p0 on the curve, and arrange that it is
the origin in an affine chart. Then take a line ` in the plane, and identify each point
p ∈ ` of that line with the intersection point of the line pp0 with C. We leave the
reader to check that this gives the same map.
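A quick plain-Python check of this parametrization (the helper names are ours): the image of [s, t] lies on x² = yz, and since [y, x] = [s², st] and [x, z] = [st, t²], either formula recovers [s, t] up to rescaling of homogeneous coordinates.

```python
# The parametrization [s, t] -> [s t, s^2, t^2] of the conic x^2 = yz,
# and its inverse [x, y, z] -> [y, x] (if y != 0) or [x, z] (if z != 0).
def to_conic(s, t):
    return (s * t, s * s, t * t)

def to_line(x, y, z):
    return (y, x) if y != 0 else (x, z)

def proportional(u, v):
    # equality in projective space: coordinates agree up to a scalar
    return u[0] * v[1] == u[1] * v[0]

for (s, t) in [(1, 2), (3, 5), (0, 1), (1, 0), (-2, 7)]:
    x, y, z = to_conic(s, t)
    assert x * x == y * z                           # image lies on the conic
    assert proportional(to_line(x, y, z), (s, t))   # round trip, up to scale
```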
Chapter 23
Projective duality
The set of lines in the projective plane P = P2 (k) is called the dual plane and denoted
P ∗ = P2∗ (k). Each line has a homogeneous linear equation 0 = ax + by + cz, unique
up to rescaling, and so the point [a, b, c] is determined, allowing us to identify P ∗
with P2 (k) itself. The points of P ∗ correspond to lines in P . Through each point
of P , there is a pencil of lines; for example if the point is (x0 , y0 ) in the affine chart
z = 1, then the lines are a (x − x0 ) + b (y − y0) = 0, or expanding out: ax + by + c = 0
where c = −ax0 − by0 . Hence in P ∗ the pencil is the line consisting of those points
[a, b, c] so that ax0 + by0 + c = 0. So each point p of P is identified with a line p∗ in
P ∗ , consisting of those lines in P passing through p. Each line λ of P is identified
with a point λ∗ in P ∗ . Given two points p, q of P and the line pq through them, the
corresponding lines p∗ and q ∗ in P ∗ intersect at the point corresponding to (pq)∗ . We
can drop all of this notation and just say that each point p of P is a line in P ∗ , and
vice versa.
Dual curves
Take a projective algebraic curve C in P = P2 (k) over a field k with algebraic closure
k̄, with equation 0 = p(x, y) in some affine chart.
23.1 Denote partial derivatives as px ..= ∂p/∂x, etc. Show that at any point (x0 , y0 )
of C at which px 6= 0 or py 6= 0, the equation of the tangent line is
0 = px (x0 , y0 ) (x − x0 ) + py (x0 , y0 ) (y − y0 ) .
Lemma 23.1. In homogeneous coordinates [x, y, z] on P = P2 , if the equation of
a curve C is 0 = p(x, y, z) with p(x, y, z) homogeneous, then the tangent line at a
point [x0 , y0 , z0 ] of C has homogeneous equation
0 = xpx (x0 , y0 , z0 ) + ypy (x0 , y0 , z0 ) + zpz (x0 , y0 , z0 ) .
Proof. Differentiate to get the tangent line
0 = px (x0 , y0 , z0 ) (x − x0 ) + py (x0 , y0 , z0 ) (y − y0 ) + pz (x0 , y0 , z0 ) (z − z0 ) .
Take the equation that says that p(x, y, z) is homogeneous of degree d:
p(λx, λy, λz) = λd p(x, y, z),
and differentiate it in λ to get
xpx (λx, λy, λz) + ypy (λx, λy, λz) + zpz (λx, λy, λz) = dλd−1 p(x, y, z).
Now set λ = 1 to get
xpx + ypy + zpz = dp.
Plug into the homogeneous equation for the tangent line to get
0 = xpx (x0 , y0 , z0 ) + ypy (x0 , y0 , z0 ) + zpz (x0 , y0 , z0 ) − dp(x0 , y0 , z0 ) .
But at [x0 , y0 , z0 ] we know that p(x0 , y0 , z0 ) = 0.
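The identity xpx + ypy + zpz = dp derived in this proof is Euler's relation for homogeneous polynomials. As a quick numerical sanity check (the cubic below is our own example, not one from the text):

```python
# Euler's relation x*px + y*py + z*pz = d*p, checked numerically for one
# homogeneous cubic of our own choosing.
def p(x, y, z):
    return x**3 + x*y*z - 2*z**3      # homogeneous of degree d = 3

def grad(x, y, z):
    # the partial derivatives of p, computed by hand
    return (3*x**2 + y*z, x*z, x*y - 6*z**2)

d = 3
for v in [(1, 2, 3), (-1, 0, 5), (2, -2, 1)]:
    assert sum(c * g for c, g in zip(v, grad(*v))) == d * p(*v)
```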
Each line [a, b, c] in P ∗ has homogeneous equation ax+by +cz = 0, so the equation
of the tangent line at each point [x0 , y0 , z0 ] of a curve with homogeneous equation
0 = p(x, y, z) is
[a, b, c] = [px (x0 , y0 , z0 ), py (x0 , y0 , z0 ), pz (x0 , y0 , z0 )] .
The dual curve of C is the smallest algebraic curve C ∗ in P ∗ containing the tangent
lines of C with coordinates in k̄.
Corollary 23.2. Every curve has a dual curve, i.e. the map
[a, b, c] = [px , py , pz ]
taking the smooth points of C to P ∗ has image inside an algebraic curve.
Proof. Assume that the field is algebraically closed. If the image contains only
finitely many points, it certainly lies inside an algebraic curve; this happens just
when the original curve has finitely many tangent lines, i.e. is a finite union of
lines. So suppose the image is infinite. Following Study’s theorem (theorem 20.3
on page 174) we can assume
that C is irreducible, say C = (p(x, y, z) = 0). Take the four equations p = 0, a = px ,
b = py , c = pz , rescaling out any one of the x, y, z variables by homogeneity (or some
linear combination of them), and compute resultants for them in one of the x, y, z
variables, treating them as having coefficients rational in the other two x, y, z variables
and in a, b, c. Repeat until we get rid of x, y, z entirely. Since we have polynomial
expressions to begin with in all of our variables, each resultant is also polynomial in
those variables. By corollary 16.3 on page 150, perhaps making a linear change of
x, y, z variables before taking our first resultant, the resultants we end up with are not all
zero polynomials. So the image of the duality map lies in an algebraic curve in the
dual plane. But the resultant vanishes just precisely on the points in the image of the
duality map, since these equations specify precisely those points [a, b, c].
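The resultants in this proof are determinants of Sylvester matrices (chapter 16). A minimal one-variable implementation over Q, written purely for illustration (all names and conventions are our own), shows the two basic behaviours: the resultant vanishes exactly when the polynomials share a root.

```python
from fractions import Fraction

# A toy resultant of two one-variable polynomials over Q, via the Sylvester
# matrix. A polynomial is a list of coefficients, highest degree first.

def det(M):
    # determinant by exact Gaussian elimination over Q
    M = [[Fraction(x) for x in row] for row in M]
    n, sign = len(M), 1
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            sign = -sign
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    d = Fraction(sign)
    for i in range(n):
        d *= M[i][i]
    return d

def resultant(f, g):
    m, n = len(f) - 1, len(g) - 1   # degrees of f and g
    rows = [[0]*i + list(f) + [0]*(n - 1 - i) for i in range(n)]
    rows += [[0]*i + list(g) + [0]*(m - 1 - i) for i in range(m)]
    return det(rows)

# x^2 - 1 and x - 1 share the root 1, so the resultant vanishes
assert resultant([1, 0, -1], [1, -1]) == 0
# x^2 + 1 and x - 1 share no root; the resultant is f(1) = 2
assert resultant([1, 0, 1], [1, -1]) == 2
```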
Chapter 24
Polynomial equations have solutions
For any field k, affine space kn is the set of all points
c = (c1 , c2 , . . . , cn )
with coordinates c1 , c2 , . . . , cn in k. Recall that an ideal I in k[x] is a collection of
polynomials so that we can add and multiply polynomials in I without leaving I, and
I is sticky: if f ∈ I and g ∈ k[x] then f g ∈ I. Any set S of polynomials generates an
ideal (S): the ideal is the set of all expressions
g1 f1 + g2 f2 + · · · + gs fs
where f1 , f2 , . . . , fs is any finite collection of polynomials in S and g1 , g2 , . . . , gs are
any polynomials at all in k[x]. Given any set of polynomials, we can replace that set
by the ideal of polynomials it generates inside k[x] without changing the roots of the
polynomials.
Theorem 24.1 (Nullstellensatz). Given a collection of polynomial equations in variables
x = (x1 , x2 , . . . , xn )
over a field k, either there is a point x = c defined in a finite algebraic extension field
of k where all of these polynomials vanish or else the ideal generated by the polynomials
is all of k[x].
Proof. Let I be the ideal generated by the polynomials in the equations, and let X
be the set of simultaneous roots of all of those polynomials. If the ideal I is the zero
ideal, then X = kn and the result is clear.
In one dimension, i.e. if n = 1, take a smallest degree nonzero polynomial in
the ideal. Every element of the ideal is divisible by that polynomial: otherwise,
quotient and remainder would give a nonzero element of the ideal of smaller
degree, a contradiction. Hence X is the set of zeroes of that polynomial, so not
empty (after perhaps replacing k by a finite algebraic extension field).
Suppose that we have already proven the result for all numbers of variables
up to and including n. Write polynomials in n + 1 variables as f (x, y) with x =
(x1 , x2 , . . . , xn ). Pick any nonzero element g(x, y) from I. By lemma 16.2 on page 149,
we can make a linear change of variables to arrange that g(x, y) = y m + . . . is monic
in the y variable (after perhaps replacing k by a finite algebraic extension field). Let
I 0 = I ∩ k[x] be the ideal in k[x] consisting of polynomials in I that don’t involve the
y variable. Note that I = k[x, y] just when 1 ∈ I, i.e. just when 1 ∈ I 0 . So we can
suppose that 1 is not in I 0 , so I 0 6= k[x]. By induction, there is a point x = c at which
all polynomials in I 0 vanish (after perhaps replacing k by a finite algebraic extension
field).
Let
J ..= { f (c, y) | f ∈ I } ⊂ k[y].
It is clear that J is an ideal in k[y]. We will now prove that J 6= k[y]. If J = k[y], we
can write 1 = f (c, y) for some f ∈ I. Expand out as a polynomial in y:
f (x, y) = f0 (x) + f1 (x)y + · · · + f` (x)y ` .
Then at x = c,
1 = f0 (c) + f1 (c)y + · · · + f` (c)y ` .
So 1 = f0 (c) and 0 = f1 (c) = f2 (c) = · · · = f` (c). Take the resultant in the y variable:
r(x) ..= Res(f, g). By lemma 16.1 on page 147,
r(x) = u(x, y)f (x, y) + v(x, y)g(x, y)
for polynomials u(x, y), v(x, y). Therefore r(x) belongs to I 0 , and so r(c) = 0. On
the other hand, since f (c, y) = 1 and g(c, y) = y m + . . . is monic, the determinant
that gives the resultant is, at x = c, upper triangular with 1 in every diagonal
entry, so r(c) = 1. This is a contradiction, so our assumption that J = k[y] has
led us astray. So J ( k[y] is an ideal in one variable polynomials, and therefore J is
generated by a single polynomial h(y). All elements of I vanish at (x, y) = (c, b) for
any of the zeroes y = b of h(y).
24.1 Given a point c ∈ kn , prove that the ideal
Ic ..= { f (x) | f (c) = 0 }
of functions vanishing at x = c is a maximal ideal. Prove that if x = p and x = q are
distinct points of kn , then Ip 6= Iq .
Corollary 24.2. Take variables
x = (x1 , x2 , . . . , xn )
over an algebraically closed field k. Every maximal ideal in k[x] is the set of all
polynomial functions vanishing at some point c, for a unique point c.
Proof. Take any maximal ideal I ⊂ k[x]. Since I 6= k[x], theorem 24.1 on the
previous page gives a point c, a priori with coordinates in a finite algebraic
extension field, at which all polynomials in I vanish; as k is algebraically closed,
c lies in kn . Then I ⊂ Ic 6= k[x], and since I is maximal, I = Ic . Uniqueness of c
follows from problem 24.1.
Theorem 24.3 (Hilbert basis theorem). For any field k and any variables x =
(x1 , x2 , . . . , xn ), every ideal I in k[x] is finitely generated. In other words, there are
finitely many polynomials
f1 (x), f2 (x), . . . , fs (x)
in I so that the polynomials in I are precisely those of the form
g1 (x)f1 (x) + g2 (x)f2 (x) + · · · + gs (x)fs (x),
for any polynomials g1 (x), g2 (x), . . . , gs (x) in k[x].
Proof. The result is obvious if n = 0 since the only ideals in k are k and 0. By
induction, suppose that the result is true for any number of variables less than or
equal to n. Take an ideal I ⊂ k[x, y] where x = (x1 , x2 , . . . , xn ) and y is a variable.
Write each element f ∈ I as
f (x, y) = f0 (x) + f1 (x)y + · · · + fd (x)y d ,
so that fd (x) is the leading coefficient. Let J be the set of all leading coefficients of
elements of I. Clearly J is an ideal in k[x], so finitely generated by induction. So
we can pick polynomials f1 (x, y), f2 (x, y), . . . , fs (x, y) in I whose leading coefficients
generate the ideal J of all leading coefficients.
Let N be the maximum degree in y of any of the fj (x, y). Take some polynomial
g(x, y) of degree greater than N in y, say with leading coefficient ǧ(x). Express that
leading coefficient in terms of the leading coefficients fˇj (x) of the polynomials fj (x, y)
as
ǧ(x) = h1 (x)fˇ1 (x) + · · · + hs (x)fˇs (x).
If we raise the y degree of fj (x, y) suitably to match the y degree of g(x, y), say by
multiplying by y kj , some positive integer kj , then
h1 (x)y k1 f1 (x, y) + · · · + hs (x)y ks fs (x, y)
has leading coefficient the same as g(x, y), so
g(x, y) = h1 (x)y k1 f1 (x, y) + · · · + hs (x)y ks fs (x, y) + . . .
up to lower order in y. Inductively we can lower the order in y of g, replacing g by the
. . ., until we get to order in y less than N . At each step, we modify g by something
in the ideal generated by the fj (x, y).
Add to our list of fj (x, y) some additional fj (x, y) ∈ I to span all of the polynomials
in I of degree up to N in y. By induction we have generated all of the polynomials in
I.
Closed sets
An affine variety X = XS is the collection of zeroes of a set S of polynomials in
variables
x = (x1 , x2 , . . . , xn ) .
A closed set in affine space kn (over a field k) is just another name for an affine
variety.
• The entire set kn is closed: it is the zero locus of the polynomial
function p(x) = 0.
• Every finite set in k is closed: the set { c1 , c2 , . . . , cn } is the zero locus
of
p(x) = (x − c1 ) (x − c2 ) . . . (x − cn ) .
• Any infinite subset of k, except perhaps k itself, is not closed. We
know that every ideal I in k[x] is generated by a single polynomial (the
greatest common divisor of the polynomials in I), say I = (p(x)), so the
associated closed set is the finite set of roots of p(x).
• A subset of k2 is closed just when it is either (a) all of k2 or (b) a finite
union of algebraic curves and points.
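The claim above, that every ideal in k[x] is generated by the greatest common divisor of its elements, can be made concrete with the Euclidean algorithm. Here is a toy sketch of our own over Q (the names and conventions are not from the text):

```python
from fractions import Fraction

# The Euclidean algorithm in Q[x]: the ideal (f, g) is generated by
# gcd(f, g). Polynomials are coefficient lists, highest degree first.

def strip(f):
    # drop leading zero coefficients, keeping at least one entry
    i = 0
    while i < len(f) - 1 and f[i] == 0:
        i += 1
    return f[i:]

def polymod(f, g):
    # remainder of f on division by g (g nonzero)
    f = strip([Fraction(c) for c in f])
    g = strip([Fraction(c) for c in g])
    while f != [Fraction(0)] and len(f) >= len(g):
        coef, shift = f[0] / g[0], len(f) - len(g)
        # cancel the leading term of f against coef * x^shift * g
        f = strip([a - coef * b for a, b in zip(f, g + [Fraction(0)] * shift)])
    return f

def polygcd(f, g):
    f = strip([Fraction(c) for c in f])
    g = strip([Fraction(c) for c in g])
    while g != [Fraction(0)]:
        f, g = g, polymod(f, g)
    return [c / f[0] for c in f]   # normalize to a monic generator

# the ideal (x^2 - 1, (x - 1)^2) is generated by x - 1,
# so the associated closed set is the single point x = 1
assert polygcd([1, 0, -1], [1, -2, 1]) == [1, -1]
```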
An open set is the complement of a closed set. An open set containing a point
c of kn is a neighborhood of c. The intersection of any collection of closed sets is a
closed set: just put together all of the equations of each set into one set of equations.
In particular, the intersection of all closed sets containing some collection X of points
of kn is a closed set, the closure of X. A set S is dense in a closed set X if X is the
closure of S.
If the set X in kn is the zero locus of some polynomials fi (x) and the set Y in kn
is the zero locus of some polynomials gj (x), then X ∪ Y in kn is the zero locus of the
polynomials fi (x)gj (x). Hence the union of finitely many closed sets is closed.
Chapter 25
More projective planes
So full of shapes is fancy
That it alone is high fantastical.
— William Shakespeare
Twelfth Night
We know that man has the faculty of becoming completely absorbed in a
subject however trivial it may be, and that there is no subject so trivial
that it will not grow to infinite proportions if one’s entire attention is
devoted to it.
— Leo Tolstoy
War and Peace
Projective planes
A projective plane (in a more general sense of the term than we have used previously)
is a set P , whose elements are called points, together with a set L, whose elements
are called lines, and a defined notion of incidence, i.e. of a point being “on” a line (or
we could say that a line is “on” or “through” a point) so that
a. Any two distinct points both lie on a unique line.
b. Any two distinct lines both lie on a unique point.
c. There are 4 points, with no 3 on the same line.
We usually denote a projective plane as P , leaving its set L of lines understood, and
its notion of incidence understood.
Lemma 25.1. The projective plane P2 (k) over any field is a projective plane in this
more general sense.
Proof. Given any two distinct points of P2 (k), they arise from two distinct vectors in
k3 , determined up to rescaling. Such vectors lie in a basis of k3 , and span a plane in
k3 through the origin. The lines in that plane lying through the origin constitute a
projective line through the two points. Similarly, any two distinct projective lines are
the lines through the origin of two distinct planes through the origin. Those planes are
each cut out by a linear equation, and the linear equations have linearly independent
coefficients, because they are distinct planes. But then the simultaneous solutions of
the two equations in three variables form a line through the origin, a projective point.
To find 4 points, with no 3 on the same line, consider the 3 points
[1, a, b], [0, 1, c], [0, 0, 1]
for any constants a, b, c from k. For these points to lie on the same projective line,
the corresponding vectors
(1, a, b), (0, 1, c), (0, 0, 1)
must lie on the same plane, i.e. satisfy the same linear equation with not all
coefficients zero. But these vectors are linearly independent. Now take the 4 points
[1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1]
and check that no 3 of them lie on a projective line.
25.1 Prove that every line in a projective plane contains at least three points.
Duality
We defined polygons in projective planes over fields in section 21 on page 185; we
employ the same definitions in any projective plane. If P is a projective plane with
set of lines L, the dual projective plane, denoted P ∗ , has L as its set of “points” and
P as its set of “lines”, and the same incidence notion, i.e. we interchange the
words “point” and “line”.
Lemma 25.2. The dual P ∗ of a projective plane P is a projective plane, and P ∗∗ =
P.
Proof. We have only to check that there is a quadrilateral in P ∗ . Pick a quadrilateral
in P , and we need to see that its 4 edges, in the obvious cyclic ordering, are the
vertices of a quadrilateral in P ∗ . Three of those vertices in P ∗ lie on a line in P ∗
just when three of the lines of the original quadrilateral lie through the same point,
say p. So we need to prove that no quadrilateral has three of its edges through the
same point. Taking the remaining edge as the fourth edge, the first three edges of the
quadrilateral pass through the same point, so that point is the first and the second
vertex. But then the third vertex lies on a line with the first two.
For example, if V is a 3-dimensional vector space over a field k, the associated
projective plane PV is the set of lines through the origin of V . A line in PV is a
plane through the origin of V . Every plane through the origin is the set of solutions
of a linear equation, i.e. is the set of zeroes of a linear function f : V → k, i.e. of an
element 0 6= f ∈ V ∗ of the dual vector space. Such a function is uniquely determined
by its zeroes, up to rescaling. So each line in PV is identified with a 2-plane through
the origin of V and thereby with a linear function f ∈ V ∗ defined up to rescaling, and
thereby with a line k · f ⊂ V ∗ through the origin of V ∗ , and thereby with a point of
PV ∗ : P(V ∗ ) = P(V )∗ .
Projective spaces
Projective spaces
A projective space is a pair of sets, called the sets of points and lines, together with a
definition of incidence of points on lines, so that
a. Any two distinct points p, q are on exactly one line pq.
b. Every line contains at least three points.
c. For any four points p, q, r, s, with no three on the same line, if pq intersects rs
then pr intersects qs.
Points are collinear if they lie on a line, and lines intersect at a point if they both lie
on that point. By a. and b., every line of a projective space is determined uniquely
by the set of points lying on it. Therefore it is safe to think of lines as sets of points.
25.2 Prove that Pn (k) is a projective space, for any field k.
Given a point p and a line `, let p`, the plane spanned by p and `, be the set of
points q so that q = p or pq intersects `. A line of a plane P = p` in a projective
space is a line of the projective space which intersects two or more points of P .
Lemma 25.3. Every plane in a projective space is a projective plane.
Proof. Take a projective space S, and a plane P = p` in S. Looking at the definition
of projective space, the first requirement ensures the first requirement of a projective
plane: any two distinct points lie on a unique line. Take two distinct lines m, n. We
need to show they intersect at a unique point, and that this point also lies in P .
First, suppose that m = ` and that p lies on n. We need to prove that n intersects
`. By definition of lines in a plane, there are at least two points of P on each line of
P . Pick a point q 6= p from P on n. Then n = pq intersects m = ` at a point of `, by
definition of a plane.
Next, suppose that m = ` but that p does not lie on n. We need to prove that n
intersects `. By definition of n being a line on the plane P , n contains two points of P .
Pick two points q, q 0 of P lying on n. If either point lies on `, we are done. Let r, r0
be the intersection points of pq, pq 0 with `, which exist by definition of a plane. Then
pq = qr and pq 0 = q 0 r0 so qr and q 0 r0 intersect at p. Suppose that three of q, q 0 , r, r0
lie on the same line. But n = qq 0 so that line is n. But then r is on n, so n intersects
`, and we are done. So we can assume that q, q 0 , r, r0 are not collinear. By property c.,
since qr and q 0 r0 intersect (at p), qq 0 = n intersects rr0 = `.
Next, suppose that m 6= `. Repeating the above if n = `, we can assume that
n 6= `. From the above, m and n both intersect `, say at points a and b. If a = b
then m and n intersect and we are done. If a 6= b, take points a0 6= a in m and
b0 6= b in n both lying in P . Repeating the above, a0 b0 intersects `. So either three of
a, a0 , b, b0 are collinear, or aa0 = m intersects bb0 = n. Suppose that three of a, a0 , b, b0
are collinear. Then one of b or b0 lies in m, or one of a, a0 lies in n, so m and n
intersect at a point of P . Therefore we can assume that no three of a, a0 , b, b0 are
collinear. By property c., since a0 b0 and ab intersect, aa0 = m intersects bb0 = n. We
have established that any two lines of P intersect in S. We have to prove that this
intersection point c lies in the plane P . The line pa is a line of P , and so intersects `,
m and n. The line n = bc is also a line of P . If three of p, a, b, c are collinear, then
p lies in n or a lies in n, and we are done as above. By property c., since pa and bc
intersect, pc intersects ab = `. By definition of a plane, the point c belongs to P . So
we have established that any two distinct lines of P lie on a unique point of P .
Finally, take two distinct points a, b on ` and a0 6= a on pa and b0 6= b on pb. By
definition, a, b, a0 , b0 all lie on P . Collinearity of any three of these points would, by
the same argument, force a = b.
Corollary 25.4. If a line lies in a plane in a projective space, then every point lying
on that line lies on that plane.
A projective subspace of a projective space of dimension −∞ means an empty set,
of dimension zero means a point, and inductively of dimension n + 1 means the union
of all points on all lines through a projective subspace of dimension n and a disjoint
point. A line of a projective subspace is a line of the projective space containing two
or more points of the subspace. Clearly every projective subspace is itself a projective
space, by induction from the proof that every plane in a projective space is a projective
plane.
Lemma 25.5. In any 3-dimensional projective space, every line meets every plane.
Proof. Pick a plane P . We want to see that every line ` meets P . Suppose that ` = ab
for two points a, b. If either of a, b are on P , then ` meets P , so suppose otherwise.
Pick a point q0 not in P . Suppose that a 6= q0 and b 6= q0 . Each of a, b lies on
a line q0 a0 , q0 b0 , for some a0 , b0 on P , since our projective space has dimension 3. If
a = a0 then a is on P and ` so ` meets P . Similarly if b = b0 . Therefore we can
suppose that a 6= a0 and b 6= b0 . So aa0 and bb0 intersect at q0 . Suppose that three of
a, a0 , b, b0 lie on a line. Then aa0 = q0 a0 contains b or b0 , so equals ab = ` or a0 b0 lying
on P . In the first case, ab = ` then contains a0 , which lies on P , so ` meets P . In the
second case, a0 b0 lies in P and contains a which lies on `, so ` meets P . So we can
assume that no three of a, a0 , b, b0 lie on a line. By c., ab intersects a0 b0 , so ` intersects
a line in P .
Lemma 25.6. Any two distinct planes in a 3-dimensional projective space meet along
a unique line.
Proof. Take planes P 6= P 0 . Take a point p0 in P but not P 0 . Pick two distinct lines
`, m in P through p0 . Let a be the intersection of ` and P 0 , and b the intersection of
m and P 0 . The line ab lies in P 0 and P . If some common point c of P and P 0 does
not lie in ab, then the plane abc lies in P and P 0 , so P = P 0 .
The quaternionic projective spaces
The quaternions over a commutative ring R are the quadruples a = (a0 , a1 , a2 , a3 ) of
elements of R, each written as
a = a0 + a1 i + a2 j + a3 k.
They form the ring HR with addition component by component and multiplication
given by the rules
i2 = j 2 = k2 = −1, ij = k, jk = i, ki = j, ji = −k, kj = −i, ik = −j,
and conjugation defined by
ā = a0 − a1 i − a2 j − a3 k.
The quaternions H (with no underlying commutative ring specified) are the quaternions over R = R, the field of real numbers.
25.3 Prove that HR is associative, by a direct computation.
A skew field is an associative ring with identity, not necessarily commutative, in
which every element a has a multiplicative inverse: a−1 a = aa−1 = 1.
25.4 Check that
aā = āa = a20 + a21 + a22 + a23 .
In particular, if k is a field in which sums of squares are never zero unless all elements
are zero (as when k = R), prove that Hk is a skew field.
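Problems 25.3 and 25.4 invite direct computation. The following Python sketch (our own, purely illustrative) encodes a quaternion as a quadruple and spot-checks the multiplication rules, associativity on a few samples, and the sum-of-squares formula for aā:

```python
from itertools import product

# Quaternion arithmetic, spot-checked by machine:
# (a0, a1, a2, a3) stands for a0 + a1 i + a2 j + a3 k.

def qmul(a, b):
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def conj(a):
    return (a[0], -a[1], -a[2], -a[3])

def minus(q):
    return tuple(-c for c in q)

one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# the defining relations of the quaternions
assert qmul(i, i) == qmul(j, j) == qmul(k, k) == minus(one)
assert qmul(i, j) == k and qmul(j, i) == minus(k)
assert qmul(j, k) == i and qmul(k, j) == minus(i)
assert qmul(k, i) == j and qmul(i, k) == minus(j)

# associativity, spot-checked on a few triples (evidence, not a proof)
samples = [(1, 2, -1, 3), (0, 1, 1, 0), (2, 0, -3, 5)]
for a, b, c in product(samples, repeat=3):
    assert qmul(qmul(a, b), c) == qmul(a, qmul(b, c))

# a * conj(a) = conj(a) * a = a0^2 + a1^2 + a2^2 + a3^2, as in problem 25.4
a = (1, 2, -1, 3)
n = sum(c*c for c in a)
assert qmul(a, conj(a)) == (n, 0, 0, 0) == qmul(conj(a), a)
```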
25.5 Suppose that k is a skew field. Define P2 (k) by analogy with the definition for
a field. Explain why P2 (k) is then a projective plane.
The quaternionic projective plane P2 (H) is the projective plane of the quaternions.
Take a skew field k. Add vectors x in kn as usual component by component, but
multiply by elements a in k on the right:
(x1 , x2 , . . . , xn ) a = (x1 a, x2 a, . . . , xn a) .
Multiply by matrices on the left as usual.
A vector space V over a skew field k is a set V equipped with an operation u + v
and an operation va so that
Addition laws:
1. u + v is in V
2. (u + v) + w = u + (v + w)
3. u + v = v + u
for any vectors u, v, w in V ,
Zero laws:
1. There is a vector 0 in V so that 0 + v = v for any vector v in V .
2. For each vector v in V , there is a vector w in V , for which v + w = 0.
Multiplication laws:
1. va is in V
2. v · 1 = v
3. v(ab) = (va)b
4. v(a + b) = va + vb
5. (u + v)a = ua + va
for any numbers a, b ∈ k, and any vectors u and v in V .
212
More projective planes
For example, V = kn is a vector space, with obvious componentwise addition and
scaling. A subspace of a vector space V is a nonempty subset U of V so that when
we apply the same scaling and addition operations to elements of U we always get
elements of U . In particular, U is also a vector space.
A linear relation between vectors v1 , v2 , . . . , vp ∈ V is an equation
0 = v1 a1 + v2 a2 + · · · + vp ap
satisfied by these vectors, in which not all of the coefficients a1 , a2 , . . . , ap vanish.
Vectors with no linear relation are linearly independent. A linear combination of
vectors v1 , v2 , . . . , vp ∈ V is a vector
v1 a1 + v2 a2 + · · · + vp ap
for some elements a1 , a2 , . . . , ap from k. One weird special case: we consider the vector
0 in V to be a linear combination of an empty set of vectors. The span of a set of
vectors is the set of all of their linear combinations. A basis for V is a set of vectors
so that every vector in V is a unique linear combination of these vectors. Note that
some vector spaces might not have any basis. Also, if V = { 0 } then the empty set is
a basis of V .
Theorem 25.7. Any two bases of a vector space over a skew field have the same
number of elements.
Proof. If V = { 0 }, the reader can check this special case and see that the empty set
is the only basis. Take two bases u1 , u2 , . . . , up and v1 , v2 , . . . , vq for V . If needed,
swap bases to arrange that p ≤ q. Write the vector v1 as a linear combination
v1 = u1 a1 + u2 a2 + · · · + up ap .
If all of the elements a1 , a2 , . . . , ap vanish then v1 = 0, and then zero can be written as
more than one linear combination, a contradiction. By perhaps reordering our basis
vectors, we can suppose that a1 6= 0. Solve for u1 :
u1 = (v1 + u2 (−a2 ) + · · · + up (−ap )) a1−1 .
We can write every basis vector u1 , u2 , . . . , up as a linear combination of v1 , u2 , . . . , up ,
and so the span of v1 , u2 , . . . , up is V . If there is more than one way to write a vector
in V as such a linear combination, say
v1 b1 + u2 b2 + · · · + up bp = v1 c1 + u2 c2 + · · · + up cp
then the difference between the left and right hand sides is a linear relation
0 = v1 d1 + u2 d2 + · · · + up dp .
Solve for v1 in terms of u1 and plug in:
0 = u1 a1 d1 + u2 (d2 + a2 d1 ) + · · · + up (dp + ap d1 ) .
Since these u1 , u2 , . . . , up form a basis,
0 = a1 d1 = d2 + a2 d1 = · · · = dp + ap d1 .
Multiplying the first equation by a1−1 gives d1 = 0, and then that gives 0 = d2 =
· · · = dp , a contradiction. Therefore v1 , u2 , . . . , up is a basis. By induction, repeating
this process, we can get more and more of the vectors v1 , v2 , . . . , vq to replace the
vectors u1 , u2 , . . . , up and still have a basis. Finally, we will find a basis u1 , u2 , . . . , up
consisting of a subset of p of the vectors v1 , v2 , . . . , vq . But then every vector is a
unique linear combination of some subset of these vectors, a contradiction unless that
subset is all of them, i.e. p = q.
The dimension of a vector space is the number of elements in a basis, or ∞ if there
is no basis. A linear map f : V → W between vector spaces V, W over a skew field k
is a map so that f (v1 + v2 ) = f (v1 ) + f (v2 ) for any v1 , v2 in V and f (va) = f (v)a for
any v in V and a in k.
25.6 Prove that for any linear map f : V → W between vector space V = kq and
W = kp , there is a unique p × q matrix A with entries from k so that f (v) = Av for
any v in V .
The kernel of a linear map f : V → W is the set of all vectors v in V so that
f (v) = 0. The image of a linear map f : V → W is the set of all vectors w in W of
the form w = f (v) for some v in V .
25.7 Prove that the kernel and image of a linear map f : V → W are subspaces of V
and of W and that their dimensions add up to the dimension of V .
25.8 Prove that for any basis v1 , v2 , . . . , vn for a vector space V , there is a unique
linear map f : V → W with any given values f (vi ) = wi .
25.9 Imitating the proof of lemma 25.1 on page 207, prove that P2 (k) is a projective
plane for any skew field k.
25.10 Prove that Pn (k) is a projective space for any skew field k.
Incidence geometries
We want to consider more general configurations of points and lines than projective
planes or spaces. An incidence geometry, also called a configuration, is a set P , whose
elements are called points, together with a set L, whose elements are called lines, and
a defined notion of incidence, i.e. of a point being “on” a line (or we could say that a
line is “on” or “through” a point).
• As points, take P ..= Z/mZ and as lines take L ..= Z/mZ and then
say that a point p is on a line ` if p = ` or p = ` + 1. Call this a polygon
or an m-gon. If m = 3, call it a triangle, and if m = 4 a quadrilateral.
• Any projective plane or projective space is a configuration.
• The affine plane is a configuration.
• If we just draw any picture on a piece of paper of some dots (“points”)
and some squiggly curves (“lines”, not necessarily straight), with some
indication which lines intersect which points, that describes a configuration. For example, we can draw the Fano plane as:
[figure: the Fano plane drawn as a triangle with its vertices, edge midpoints,
and center, numbered 1 to 7]
where the “points” are the numbered vertices and the “lines” are the
straight lines, including the three straight lines through the center, and
the circle.
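Smallness makes such configurations easy to check by machine. The following Python sketch (our own, not from the text) builds P2(F2), which is the Fano plane, and verifies the projective-plane axioms by brute force:

```python
from itertools import product

# Build P^2(F_2), the Fano plane, and verify the projective-plane axioms.
# Over F_2 every nonzero vector is the unique representative of its
# projective point, so points are just the 7 nonzero triples.
points = [v for v in product((0, 1), repeat=3) if any(v)]
lines = list(points)   # a line [a, b, c] is the locus ax + by + cz = 0

def on(pt, ln):
    return sum(p * l for p, l in zip(pt, ln)) % 2 == 0

assert len(points) == 7 and len(lines) == 7
# every line of the Fano plane carries exactly 3 points (compare problem 25.1)
assert all(sum(on(pt, ln) for pt in points) == 3 for ln in lines)
# axiom a: two distinct points lie on exactly one common line
for p1, p2 in product(points, repeat=2):
    if p1 != p2:
        assert sum(on(p1, ln) and on(p2, ln) for ln in lines) == 1
# axiom b: two distinct lines meet at exactly one point
for l1, l2 in product(lines, repeat=2):
    if l1 != l2:
        assert sum(on(pt, l1) and on(pt, l2) for pt in points) == 1
```

Axiom c. also holds: one can check that no three of the four points (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1) lie on a common line.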
A morphism of configurations
f : (P0 , L0 ) → (P1 , L1 )
is a pair of maps, both called f , f : P0 → P1 and f : L0 → L1 so that if a point p lies
on a line `, then f (p) lies on f (`).
• Every projective plane contains a quadrilateral, so has a morphism
from the quadrilateral P = Z/4Z.
• Take the Fano plane, using our diagram, and delete the one line corresponding to the circle; call the result the kite. Every projective plane
P2 (k) over a field k contains the points [x, y, z] for x, y, z among 0, 1
and not all zero, and contains the various lines between them, i.e. contains the kite. So there is a morphism, injective on points and lines,
from the kite to every projective plane over any field. In fact, if we
start off with any quadrilateral in any projective plane, and we draw
from it all lines through any two of its vertices, and then all of the
intersections of those lines, and then all lines through any two vertices
of that configuration, and so on, we end up with the kite.
• If k has a field extension K, then the obvious map P2 (k) → P2 (K) is
a morphism.
An isomorphism is a morphism whose inverse is also a morphism, so that a point
lies on a line just exactly when the corresponding point lies on the corresponding
line. An automorphism of a configuration is an isomorphism from the configuration
to itself. A monomorphism is a morphism injective on points and on lines, so that a
point lies on a line after we apply the morphism just when it did before.
• By definition, every projective plane admits a monomorphism from a
quadrilateral.
• The map P2 (R) → P2 (C) is not an isomorphism, as it is not onto on
either points or lines, so has no inverse, but it is a monomorphism.
• The map f : Pn (k) → Pn+1 (k) taking
f [x0 , x1 , . . . , xn ] = [x0 , x1 , . . . , xn , 0]
is a monomorphism.
Desargues’ theorem
• Invertible matrices with entries in a field k act as projective automorphisms on P2 (k), and similarly in all dimensions; call these linear
projective automorphisms or k-projective automorphisms.
• Let k = Q(√2) and let f : k → k be the automorphism
f (a + b√2) = a − b√2
for any a, b in Q. Then clearly f extends to kn+1 for any integer n,
and so extends to Pn (k), giving an automorphism which is not linear.
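We can check by machine that this conjugation really does preserve the field operations. In the sketch below (our own; a + b√2 is stored as the pair (a, b) of rationals), we spot-check that it respects addition and multiplication:

```python
from fractions import Fraction

# The conjugation f(a + b*sqrt(2)) = a - b*sqrt(2) on Q(sqrt 2), with each
# number stored as a pair (a, b) of rationals.

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v):
    # (a + b sqrt2)(c + d sqrt2) = (ac + 2bd) + (ad + bc) sqrt2
    (a, b), (c, d) = u, v
    return (a*c + 2*b*d, a*d + b*c)

def conj(u):
    return (u[0], -u[1])

samples = [(Fraction(1), Fraction(2)),
           (Fraction(-3, 2), Fraction(1, 3)),
           (Fraction(0), Fraction(1))]
for u in samples:
    for v in samples:
        assert conj(add(u, v)) == add(conj(u), conj(v))
        assert conj(mul(u, v)) == mul(conj(u), conj(v))
```

The map fixes Q pointwise but sends √2 to −√2, which is why the induced map on Pn (k) is not given by any invertible matrix over k.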
25.11 Any two quadrilaterals in the projective plane P2 (k) over any field k can be
taken one to the other by a linear automorphism. Hint: think about the linear algebra
in k3 , get the first three vectors where you want them to go, and think about rescaling
them to get the last vector in place.
The dual of a configuration has as lines the points and as points the lines and the
same incidence relation. If f : A → B is a morphism of configurations, define the
dual morphism to be the same maps on points and lines, but interpreted as lines and
points respectively.
Desargues’ theorem
Two triangles with a chosen correspondence between their vertices are in perspective
from a point p if the lines through corresponding vertices pass through p.
The two triangles are in perspective from a line ℓ if corresponding edges intersect on ℓ.
More projective planes
Every projective plane has a quadrilateral, so contains a pair of triangles in
perspective from a point. The dual of a pair of triangles in perspective from a point is
a pair of triangles in perspective from a line. A configuration is Desargues if every pair
of triangles in perspective from some point is also in perspective from some line. If we
draw the picture very carefully, it appears that the real projective plane is Desargues.
The incidence geometry of that picture is the standard Desargues configuration. The same configuration with three curves added through those 3 points to determine a triangle is the broken Desargues configuration.
A configuration is thus Desargues just when every morphism to it from the configuration of a pair of triangles in perspective extends to a morphism from a standard Desargues configuration, or equivalently the configuration admits no monomorphisms from the broken Desargues configuration. Not every projective plane is Desargues, as
we will see.
25.12 Prove that a projective plane is Desargues precisely when, for any pair of
triangles in perspective, their dual triangles in the dual projective plane are also in
perspective.
Lemma 25.8. Every 3-dimensional projective space is Desargues.
Proof. Looking back at our drawings of triangles in perspective, suppose that the two
triangles lie on different planes. Then these two planes intersect along a single line.
The lines drawn along the sides of either triangle lie in one or the other of those two
different planes, and intersect along that line. Therefore the intersection points lie on
that line, i.e. the intersection points are collinear.
But so far we have assumed that the two triangles in perspective lie in different
planes. Suppose that they lie in the same plane. We want to “perturb” the triangles
out of that plane. Pick a point outside that plane, which exists by axiom c., and call
it the perturbed origin o′. On the line o′a from the perturbed origin o′ to one of the
vertices a of the first triangle, pick a point, the perturbed vertex a′ of the first triangle,
not lying in the original plane. Take the vertex A in the second triangle corresponding
to a. The points o, a, A are collinear by definition of triangles in perspective. The
points o, o′, a, A therefore are coplanar. The point a′ by construction lies in this
plane. So the lines oa′ and o′A lie in that plane, and intersect in a point A′, the
perturbed vertex of the second triangle. The triangles with perturbed vertices are
now in perspective from the origin o (not from the perturbed origin). The perturbed
triangles lie in different planes, since otherwise the entire perturbed picture lies in
a single plane, which contains the other vertices of our two triangles, so lies in the
original plane.
Essentially we just perturbed a vertex of the first triangle, and then suitably
perturbed the corresponding one in the second triangle to get a noncoplanar pair of
triangles in perspective. The intersection points p′, q′, r′ of the perturbed standard
Desargues configuration lie in a line p′q′r′ by the previous reasoning. Take the plane
containing that line p′q′r′ and the perturbed origin o′. That plane intersects the plane
P of the original unperturbed triangles along a line ℓ. The lines o′p′, o′q′, o′r′ intersect
the plane P at the three intersection points p, q, r of the unperturbed configuration,
so these lie in ℓ.
Theorem 25.9. Every projective space of dimension 3 or more is Desargues.
Proof. Start with any triangle: it lies on a plane. Add one more point: the picture
still lies in a 3-dimensional projective space. But then we can start our sequence
of pictures above constructing the two triangles in perspective, and then the entire
Desargues configuration, inside that 3-dimensional projective space.
For example, projective space Pn (k) of any dimension n ≥ 3 over any skew field k
is Desargues.
Corollary 25.10. If a projective plane embeds into a projective space of some dimension n ≥ 3, then the projective plane is Desargues.
For example, projective space Pn (k) of any dimension n ≥ 2 over any field k is
Desargues.
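The theorem can be watched in action in a small example. The sketch below (an illustration, not from the text) works in P2(F7) with homogeneous coordinates in F7^3: the cross product mod 7 gives the line through two points or the intersection point of two lines, and a vanishing determinant detects collinearity.

```python
# Two triangles of P2(F7) in perspective from a point: their corresponding
# sides meet in three collinear points, checked by exact arithmetic mod 7.

p = 7  # any prime will do

def cross(u, v):
    # the line through two points, or the point on two lines
    return tuple((u[(i + 1) % 3] * v[(i + 2) % 3]
                  - u[(i + 2) % 3] * v[(i + 1) % 3]) % p for i in range(3))

def det(u, v, w):
    # zero exactly when the three points are collinear
    return sum(u[i] * cross(v, w)[i] for i in range(3)) % p

# two triangles in perspective from o = [1, 1, 1]: each capital vertex
# is chosen on the line through o and the corresponding small vertex
o = (1, 1, 1)
a, b, c = (1, 0, 0), (0, 1, 0), (0, 0, 1)
A = tuple((x + 2 * y) % p for x, y in zip(o, a))  # on the line oa
B = tuple((x + 3 * y) % p for x, y in zip(o, b))  # on the line ob
C = tuple((x + 5 * y) % p for x, y in zip(o, c))  # on the line oc

# intersection points of corresponding sides
pab = cross(cross(a, b), cross(A, B))
pbc = cross(cross(b, c), cross(B, C))
pca = cross(cross(c, a), cross(C, A))

collinear = det(pab, pbc, pca) == 0
```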
Generating projective planes
A monomorphism A → P from a configuration to a projective plane P is universal
if every monomorphism A → Q to a projective plane Q factors through a morphism
P → Q, i.e. A → P → Q. A configuration A is planar if any two distinct points of A
lie on at most one line. A planar configuration is spanning if there are 4 points with
no three collinear.
Lemma 25.11. A configuration is planar if and only if it has a monomorphism to a
projective plane. A planar configuration is spanning if and only if it has a universal
monomorphism to a projective plane; this projective plane, the plane generated by the
configuration, is then unique up to a unique isomorphism.
Proof. Note that any two lines of a planar configuration A intersect at, at most, one
point, since otherwise they would contain two distinct points lying on two different
lines.
For any configuration B, let B₊ have the same points as B, but add in a new line
for each pair of points that don’t lie on a line. To be more precise, we can let the
lines of B₊ be the lines of B together with the unordered pairs of points {p, q} of B
that don’t lie on a line of B.
For any configuration B, let B⁺ have the same lines as B, but add in a new point
for each pair of lines that don’t lie on a point. To be more precise, we can let the
points of B⁺ be the points of B together with the unordered pairs of lines {m, n}
of B that don’t lie on a point of B. Inductively, let A0 ..= A, A2j+1 ..= (A2j)₊ and
A2j+2 ..= (A2j+1)⁺.
Let P be the configuration whose points are all points of A1 , A2 , . . ., and whose
lines are the lines of A1 , A2 , . . .. Clearly any two distinct points of P lie on a unique
line, and any two distinct lines of P lie on a unique point. Suppose that A is spanning.
Three noncollinear points of A will remain noncollinear in A⁺, because we only add
new points. They will also remain noncollinear in A₊, because we only add new lines
through pairs of points, not through triples. Therefore they remain noncollinear in
P , so that P is a projective plane.
If f : A → Q is a monomorphism to a projective plane, define f : A₊ → Q by
f {p, q} = f (p)f (q), and similarly for A⁺ → Q, to define a morphism f : P → Q.
Given another universal morphism f′ : A → P′, both monomorphisms must factor
via A → P → P′ and A → P′ → P. These factoring maps P → P′ and P′ → P are
isomorphisms, as one easily checks, by looking step by step at how they act on points
and lines.
A trivalent configuration is one in which every point is on at least three lines and
every line is on at least three points.
Lemma 25.12. Take a spanning planar configuration A and its universal morphism
to a projective plane A → P . Every monomorphism T → P from a finite trivalent
configuration factors through T → A → P .
Proof. At each stage of constructing P , we never put in a new line passing through
more than two points. So among the lines of T , the last to appear in the construction
of P contains only two points, a contradiction, unless T lies in A.
Let A have 4 points and no lines, like ::, which is clearly spanning and planar.
Take P to be the projective plane generated by A. By the lemma, no line in
P contains 3 points. As every line in the Desargues configuration contains
3 points, P does not contain the Desargues configuration. Start with the
quadrilateral given by the points of A, thought of as one point and one triangle
of P . Since every line in any projective plane contains at least 3 points, we
can then build a pair of triangles in perspective in P out of our triangle and
point, as in our pictures of the Desargues configuration. Therefore P is not
Desargues.
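The two completion steps from the proof of Lemma 25.11 (add a line through each pair of points not yet on a common line; add a point to each pair of lines not yet meeting) are easy to carry out by machine. Here is a sketch (not from the text) starting from the 4-point, no-line configuration A; the growing point and line counts hint that the generated plane is infinite.

```python
# Free completion of a configuration: points are hashable labels, lines
# are stored as a dict from line label to its set of points.

from itertools import combinations

def add_lines(points, lines):
    # one new two-point line for each pair of points not yet on a common line
    new = dict(lines)
    for p, q in combinations(points, 2):
        if not any(p in pts and q in pts for pts in lines.values()):
            new[('L', frozenset({p, q}))] = {p, q}
    return set(points), new

def add_points(points, lines):
    # one new point on each pair of lines not yet meeting at a point
    new_points = set(points)
    new_lines = {l: set(pts) for l, pts in lines.items()}
    for l, m in combinations(lines, 2):
        if not lines[l] & lines[m]:
            x = ('P', frozenset({l, m}))
            new_points.add(x)
            new_lines[l].add(x)
            new_lines[m].add(x)
    return new_points, new_lines

# start from 4 points and no lines, alternating the two steps
points, lines = {'a', 'b', 'c', 'd'}, {}
sizes = []
for step in range(4):
    points, lines = (add_lines if step % 2 == 0 else add_points)(points, lines)
    sizes.append((len(points), len(lines)))
# sizes grows: 6 lines, then the 3 diagonal points, then 3 diagonal
# lines, then 6 more points, and so on forever
```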
Octonions
In this section, we will allow the use of the word ring in a somewhat more general
context. A ring is a set of objects S together with two operations called addition and
multiplication, and denoted by b + c and bc, so that if b and c belong to S, then b + c
and bc also belong to S, and so that:
Addition laws:
a. The associative law: For any elements a, b, c in S: (a + b) + c = a + (b + c).
b. The identity law: There is an element 0 in S so that for any element a in
S, a + 0 = a.
c. The existence of negatives: for any element a in S, there is an element b
in S (denoted by the symbol −a) so that a + b = 0.
d. The commutative law: For any elements a, b in S, a + b = b + a.
Multiplication laws:
The Distributive Law:
a. For any elements a, b, c in S: a(b + c) = ab + ac and (b + c)a = ba + ca.
We do not require the associative law for multiplication; if it is satisfied the ring is
associative.
A ring is a ring with identity if it satisfies the identity law: There is an element
1 in S so that for any element a in S, a1 = a. We will only ever consider rings with
identity.
A ring is a division ring if it satisfies the zero divisors law: for any elements a, b
of S, if ab = 0 then a = 0 or b = 0.
A ring is commutative if it satisfies the commutative law: for any elements a, b of
S, ab = ba.
The associative law for addition, applied twice, shows that (a + b) + (c + d) =
a + (b + (c + d)), and so on, so that we can add up any finite sequence, in any order,
and get the same result, which we write in this case as a + b + c + d. A similar story
holds for multiplication if the ring is associative.
Any skew field k has a projective space Pn (k) for any integer n, defined as the set
of tuples
x = (x0 , x1 , . . . , xn )
with x0 , x1 , . . . , xn ∈ k, but with two such tuples equivalent just when the second
equals the first scaled by a nonzero element:
x = (x0 , x1 , . . . , xn ) ∼ xc = (x0 c, x1 c, . . . , xn c) .
This approach doesn’t work for nonassociative rings, because it might be that (xa)b
can not be written as x(ab) or even as xc for any c. It turns out to be a very tricky
problem to define a meaning for the expression Pn (R) if R is a nonassociative ring, a
problem which does not have a general solution.
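For an associative base, these equivalence classes are easy to enumerate. A quick sketch (not from the text) counts the points of P2(F3) by brute force; the total (q³ − 1)/(q − 1) = q² + q + 1 = 13 is the expected one.

```python
# Points of P2(F3) as scaling classes of nonzero vectors in F3^3.

q = 3
F = range(q)
vectors = [(x0, x1, x2) for x0 in F for x1 in F for x2 in F
           if (x0, x1, x2) != (0, 0, 0)]

def point(x):
    # canonical representative of the class of x: try all q - 1 nonzero
    # scalings and keep the lexicographically least tuple
    return min(tuple(c * t % q for t in x) for c in range(1, q))

points = {point(x) for x in vectors}
```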
A conjugation on a ring R is a map taking each element b of R to an element,
denoted b̄, of R, so that
a. the conjugate of b + c is b̄ + c̄,
b. the conjugate of bc is c̄b̄,
c. 1̄ = 1 if R has a chosen multiplicative identity element 1,
d. the conjugate of b̄ is b,
for any elements b, c of R. A ring with a conjugation is a ∗-ring. If R is a ∗-ring, let
R+ be the set of pairs (a, b) for a, b in R; write such a pair as a + ιb, and write a + ι0
as a, so that R ⊂ R+ . Endow R+ with multiplication
(a + ιb) (c + ιd) ..= ac − db̄ + ι (ād + cb)
with conjugation
a + ιb ..= ā − ιb
and with identity 1 and zero 0 from R ⊂ R+ . For example, R+ = C, H ..= C+ is the
quaternions and O ..= H+ is the octaves, also called the octonions. A normed ring is
a ∗-ring for which the vanishing of a finite sum
0 = aā + bb̄ + · · · + cc̄
implies 0 = a = b = · · · = c.
25.13 If R is a normed ring then prove that R+ is also a normed ring.
25.14 If R is associative, normed and has no zero divisors, then prove that R+ has
no zero divisors.
A ∗-ring is real if a = ā for every a.
25.15 Prove that R+ is not real for any ∗-ring R.
25.16 Prove that every real ∗-ring is commutative.
25.17 Prove that R+ is not real, for any ∗-ring R, and that therefore (R+)+ is not commutative.
25.18 Prove that R is real just when R+ is commutative.
25.19 Prove that R is commutative and associative just when R+ is associative.
The associator of a ring R is the expression
[a, b, c] = (ab)c − a(bc).
Clearly a ring is associative just when its associator vanishes. A ring is alternative
when its associator alternates, i.e. changes sign when we swap any two of a, b, c.
25.20 Prove that any alternative ring satisfies b(cb) = (bc)b for any elements b, c. We
denote these expressions both as bcb, without ambiguity.
25.21 Prove that R is normed and associative just when R+ is normed and alternative.
25.22 Prove that if R is normed and associative and every nonzero element of the form
aā + bb̄
in R has a left inverse c, then every nonzero element a + ιb in R+ has a left inverse: cā − ιcb.
In particular, O is a normed alternative division ring and every nonzero element of O
has a left inverse.
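The doubling construction is easy to experiment with. The sketch below (not from the text) stores an element of the j-th doubling of the real numbers as a tuple of 2^j integers and implements the multiplication and conjugation formulas above: length 2 gives C, length 4 gives H, length 8 gives the octaves O.

```python
# Cayley–Dickson doubling with the formulas of this section:
# conj(a + ιb) = ā - ιb and (a + ιb)(c + ιd) = (ac - d b̄) + ι(ā d + c b).

def conj(x):
    # conjugation: trivial on the reals, ā - ιb on a pair (a, b)
    if len(x) == 1:
        return x
    n = len(x) // 2
    return conj(x[:n]) + tuple(-t for t in x[n:])

def add(x, y):
    return tuple(s + t for s, t in zip(x, y))

def sub(x, y):
    return tuple(s - t for s, t in zip(x, y))

def mul(x, y):
    if len(x) == 1:
        return (x[0] * y[0],)
    n = len(x) // 2
    a, b, c, d = x[:n], x[n:], y[:n], y[n:]
    return sub(mul(a, c), mul(d, conj(b))) + add(mul(conj(a), d), mul(c, b))

def unit(n, i):
    return tuple(1 if j == i else 0 for j in range(n))

# quaternions: i j = -(j i), so H is not commutative
i, j = unit(4, 1), unit(4, 2)
ij, ji = mul(i, j), mul(j, i)

# octaves: some basis triple fails associativity ...
e = [unit(8, k) for k in range(8)]
nonassociative = any(
    mul(mul(e[a], e[b]), e[c]) != mul(e[a], mul(e[b], e[c]))
    for a in range(1, 8) for b in range(1, 8) for c in range(1, 8))

# ... yet the left alternative law (xx)y = x(xy) still holds exactly
x = (1, 2, 3, 4, 5, 6, 7, 8)
y = (8, 7, 6, 5, 4, 3, 2, 1)
left_alternative = mul(mul(x, x), y) == mul(x, mul(x, y))
```

With this particular sign convention the quaternion units satisfy ij = −(ji); other standard conventions for the doubling produce isomorphic algebras over the reals.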
Coordinates
Given a projective plane P , take a quadrilateral. Call the line through two of its
points the line at infinity; henceforth, draw it far away. Two lines are parallel if they
meet at the line at infinity. One of the two “finite” points of our quadrilateral we
designate the origin. We draw lines from it out to the two “far away” points: the axes.
Map any point p not on the line at infinity to points px and py on the two axes.
The map
p ↦ (px , py )
is invertible: if we denote the “far away” points ∞x and ∞y ,
(px , py ) ↦ p = (px ∞x ) (py ∞y ) .
This map takes the finite points p (i.e. not on the line at infinity) to their “coordinates”
px and py on the axes, and takes finite points on the axes to points in the plane. So we
write the point p as (px , py ). We still haven’t used the fourth point of the quadrilateral.
Draw lines through it parallel to the axes, and call the points where these lines hit
the axes (1, 0) and (0, 1). Take any point on one axis and draw the line through it
parallel to the line (1, 0)(0, 1). Identify two points of the two axes if they lie on one of
these parallels, but of course we write them as (x, 0) or (0, x).
The points of the line at infinity we think of as slopes. Any point of the line at
infinity, except the one on the vertical axis, is a “finite slope”. Draw lines through the
origin. Each line (except the vertical axis) passes through a point (1, m) and a point
on the line at infinity, which we label as m.
A line in the finite plane has slope m if it hits the line at infinity in this point labelled
m. If it strikes the vertical axis at (0, b), we call b its intercept. Given points x, m, b on
the finite horizontal axis, we write mx + b to mean the point y on the finite horizontal
axis so that the point with coordinates (x, y) lies on the line with slope m and vertical
intercept b. This defines an operation x, m, b 7→ mx + b, which depends on three
arguments x, m, b on the finite horizontal line.
A ternary ring (sometimes called a planar ternary ring) is
a. an operation taking three elements x, m, b of a set R to an element, denoted
mx + b, of the set R,
b. together with a choice of two elements 0, 1 of R, so that for any x, y, m, n, b, c
in R:
1. m0 + b = 0x + b = b,
2. m1 + 0 = m,
3. 1x + 0 = x,
4. mx + b = y has a unique solution b for any given x, m, y,
5. mx + b = y has a unique solution x for any given b, m, y if m ≠ 0,
6. mx + b = nx + c has a unique solution x if m ≠ n,
7. The pair of equations mx1 + b = y1 , mx2 + b = y2 has a unique solution
m, b if x1 ≠ x2 and y1 ≠ y2 .
Warning: a ternary ring is not necessarily a ring. Given a ternary ring, we can
define a + b to mean 1a + b, and define ab to mean ab + 0, defining an addition and
multiplication operation, but these do not necessarily satisfy the axioms of a ring.
Warning: even worse, with these definitions of addition and multiplication, it might
turn out that (mx) + b is not equal to the ternary operation mx + b, so we will be
safe: we avoid reference to the addition and multiplication operations henceforth.
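Since we avoid the derived addition and multiplication, the seven axioms can be checked directly by brute force. A sketch (not from the text) for the ternary operation mx + b of the field F5:

```python
# Verify the planar ternary ring axioms for t(x, m, b) = m*x + b over F5.

q = 5
R = range(q)

def t(x, m, b):
    return (m * x + b) % q

# axioms 1-3: the behaviour of 0 and 1
assert all(t(0, m, b) == b == t(x, 0, b) for m in R for b in R for x in R)
assert all(t(1, m, 0) == m for m in R)
assert all(t(x, 1, 0) == x for x in R)

# axioms 4-6: unique solvability in b and in x
assert all(len([b for b in R if t(x, m, b) == y]) == 1
           for x in R for m in R for y in R)
assert all(len([x for x in R if t(x, m, b) == y]) == 1
           for m in R if m != 0 for b in R for y in R)
assert all(len([x for x in R if t(x, m, b) == t(x, n, c)]) == 1
           for m in R for n in R if m != n for b in R for c in R)

# axiom 7: a unique (m, b) through two points with distinct x coordinates
assert all(len([(m, b) for m in R for b in R
                if t(x1, m, b) == y1 and t(x2, m, b) == y2]) == 1
           for x1 in R for x2 in R if x1 != x2 for y1 in R for y2 in R)
```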
Theorem 25.13. For any quadrilateral in any projective plane, the operation mx + b
defined above yields a ternary ring. Every ternary ring arises this way from a unique
projective plane with quadrilateral.
Proof. Start with a projective plane, and follow our steps above to define mx + b. To
be precise, y = mx+b holds just when (x, y) lies on the line with slope m and intercept
(0, b). We need to check that this operation satisfies the identities of a ternary ring.
The identity m0 + b = b says that the point (0, b) lies on the line of slope m through
(0, b), for example. All of the other statements have equally banal proofs, which the
reader can fill in.
Start with a ternary ring R. Let’s construct a projective plane. The finite points
of the projective plane are the points of the set R × R. The finite slopes of the
projective plane are the points of a copy of the set R. The infinite slope point of the
projective plane is some point ∞y , just a symbol we invent which is not an element
of R. The line at infinity is the set R ∪ { ∞y }. The points of our projective plane are
the elements of the set
P ..= R × R ∪ R ∪ { ∞y } .
For any m, b from R, the line with slope m and intercept b is the set
{ (x, y) | y = mx + b } ∪ { m } .
A line of our projective plane is either (1) the line at infinity, (2) a line with some
slope m and intercept b, or (3) a vertical line { (c, y) | y in R } ∪ { ∞y } for some c in R
(without the vertical lines, two finite points with the same first coordinate would lie
on no common line). Again, boring steps show that the axioms of a projective
plane are satisfied, with (0, 0), (1, 1), 0, ∞y as quadrilateral.
For example, every ring with identity (not necessarily associative) in which every
nonzero element has a left inverse becomes a ternary ring by taking mx + b to mean
(mx) + b, using multiplication and addition in the ring. In particular, the octaves
have a projective plane P2 (O), the octave projective plane.
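The whole construction can be carried out by machine for R = F3. In the sketch below (not from the text), the vertical lines through ∞y are included alongside the slope-intercept lines and the line at infinity, since two finite points with equal first coordinate must also lie on a line; the result is a projective plane with 13 points and 13 lines.

```python
# Build the projective plane of the ternary ring t(x, m, b) = m*x + b
# over F3, and check the projective plane axioms by brute force.

from itertools import combinations

q = 3
R = range(q)
INF = 'inf_y'  # the invented infinite-slope symbol

points = ([(x, y) for x in R for y in R]          # finite points
          + [('slope', m) for m in R]             # finite slopes
          + [INF])                                # infinite slope

def slope_line(m, b):
    # the line of slope m and intercept b, plus its point at infinity
    return frozenset((x, (m * x + b) % q) for x in R) | {('slope', m)}

lines = ([slope_line(m, b) for m in R for b in R]
         + [frozenset((x0, y) for y in R) | {INF} for x0 in R]  # vertical
         + [frozenset(('slope', m) for m in R) | {INF}])        # at infinity

def lines_through(p, r):
    return [l for l in lines if p in l and r in l]

two_points_one_line = all(len(lines_through(p, r)) == 1
                          for p, r in combinations(points, 2))
two_lines_one_point = all(len(l & m) == 1 for l, m in combinations(lines, 2))
```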
Bibliography
[1] C. Herbert Clemens, A scrapbook of complex curve theory, second ed., Graduate
Studies in Mathematics, vol. 55, American Mathematical Society, Providence, RI,
2003. MR 1946768 (2003m:14001)
[2] David S. Dummit and Richard M. Foote, Abstract algebra, third ed., John Wiley
& Sons, Inc., Hoboken, NJ, 2004. MR 2286236 (2007h:00003)
[3] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik, Concrete mathematics, second ed., Addison-Wesley Publishing Company, Reading, MA, 1994, A foundation for computer science. MR 1397498 (97d:68003)
[4] Thomas W. Hungerford, Algebra, Graduate Texts in Mathematics, vol. 73,
Springer-Verlag, New York-Berlin, 1980, Reprint of the 1974 original. MR 600654
(82a:00006)
[5] Igor R. Shafarevich, Basic notions of algebra, Encyclopaedia of Mathematical
Sciences, vol. 11, Springer-Verlag, Berlin, 2005, Translated from the 1986 Russian
original by Miles Reid, Reprint of the 1997 English translation [MR1634541],
Algebra, I. MR 2164721
[6] Michael Spivak, Calculus, Cambridge University Press, 2006.
[7] E. B. Vinberg, A course in algebra, Graduate Studies in Mathematics, vol. 56,
American Mathematical Society, Providence, RI, 2003, Translated from the 2001
Russian original by Alexander Retakh. MR 1974508 (2004h:00001)
[8] Judy L. Walker, Codes and curves, Student Mathematical Library, vol. 7, American Mathematical Society, Providence, RI; Institute for Advanced Study (IAS), Princeton, NJ, 2000, IAS/Park City Mathematical Subseries. MR 1768485 (2001f:14046)
[9] André Weil, Number theory for beginners, Springer-Verlag, New York-Heidelberg,
1979, With the collaboration of Maxwell Rosenlicht. MR 532370 (80e:10004)
List of notation
gcd {m1 , m2 , . . . , mn }
greatest common divisor, 5
lcm {m1 , m2 , . . . , mn }
least common multiple, 7
b̄
congruence class of b, 27
b ≡ c (mod m)
b is congruent to c modulo m, 27
φ(m)
Euler’s totient function, 36
||z||
modulus of a complex number, 53
resb,c
resultant of polynomials b(x), c(x), 76
p′ (x)
derivative of polynomial, 81
|G|
order of group, 94
Dn
the dihedral group, 95
G ∗K H
amalgation over a subgroup, 108
G∗H
amalgation, 108
ej
elementary symmetric functions, 111
x^a
multivariable exponent, 113
pj
sum of powers polynomials, 116
R(x)
the field of rational functions with coefficients in a ring R, 122
k̄
algebraic closure, 129
ker f
kernel of a ring morphism, 131
R ≅ S
isomorphic rings, 132
(A)
ideal generated by a set A in a ring R, 133
R/I
quotient of a ring by an ideal, 154
P2 (k)
projective plane over a field k, 167
Pn (k)
projective space over a field k, 168
BCp
multiplicity of intersection of curves B and C at point p, 176
P∗
dual projective plane, 208
H
quaternions, 210
P2 (O)
the octave projective plane, 224
Index
∗-ring, 220
real, 221
Carroll, Lewis, 3
Cayley graph, 98
Cayley, Arthur, 163
characteristic polynomial, 117
Chinese remainder theorem, 132
closed set, 205
closure, 206
coefficient
of polynomial, 57
common divisor, 5
commutative law
addition, 1, 122, 220
multiplication, 1, 123, 220
commuting, 92
complex conjugation, 53
component, 139
multiple, 174
concatentation, 103
cone, 169
configuration, 213
broken Desargues, 217
dual, 215
standard Desargues, 217
trivalent, 219
conic, 138
conjugation, 220
coordinates
homogeneous, 166
coprime, 5
cubic
curve, 138
polynomial, 60
curve
algebraic, 138
cubic, 138
irreducible, 139
projective plane, 170
quartic, 138
quintic, 138
rational, 142
abelian, 92
absolute value, 2
addition laws, 1, 122, 211, 219
affine space, 203
affine variety, 205
algebra, 136
algebraic
closure, 129
curve, 138, 139
algorithm
Euclidean, 6
extended Euclidean, 17
alphabet, 103
alternative ring, 221
Armstrong, Mark, 87
associative law
addition, 1, 122, 220
multiplication, 1, 122
associator, 221
automorphism
projective plane, 214
ring, 132
Bézout
coefficients, 17
theorem, 177, 179
basis, 125, 212
bijection, 87
birational, 141
biregular, 139
Blake, William, 163
Boole, George, 9
Brahma’s correct system, 33
Brahma-Sphuta-Siddhanta, 33
Brahmagupta, 33
broken Desargues configuration, 217
Brothers Karamazov, 164
reducible, 139
cycle, 90
disjoint, 90
cyclic, 95
Dawson, Les, 47
De Morgan, Augustus, 57
degree
of field extension, 125, 160
of polynomial, 57
denominator, 48
derivative, 81
Desargues
configuration
broken, 217
standard, 217
projective plane, 216
Descartes
theorem, 50
descriptive geometry, 163
determinacy of sign, 2
determinant, 89
dihedral group, 95
Dijkstra, Edsger, 21
dimension, 213
discriminant, 81
distributive law, 1, 122, 220
division, 3
by zero, 4
divisor, 4
Dostoyevsky, Fyodor, 164
Douglass, Frederick, 2
dual
configuration, 215
curve, 202
projective plane, 187, 201, 208
equality cancellation law
addition, 3
multiplication, 3
Euclid, 4, 17, 24
Euclidean algorithm, 6
extended, 17
Euler’s totient function, 36, 43
Euler, Leonhard, 25
even integer, 24
existence of negatives, 1, 122, 220
extended Euclidean algorithm, 17
Fano plane, 167, 213, 214
Fermat’s Last Theorem, 138
Fibonacci sequence, 14, 15
field, 59, 124
of fractions, 134
splitting, 126, 129
fixed point, 103
form
quadratic, 196
free
group, 103
product, 108
Frost, Robert, 131
function
rational, 122, 141
recursive, 13
regular, 139
Gauss’s lemma, 65
Gauss, Carl Friedrich, 27
general position, 187
generate
projective plane, 218
generators, 98
generic
triangle, 175
very, 175
greatest common divisor, 5
group, 92
isomorphism, 94
transformation, 88
Haddon, Mark, 23
hair
red, 9
heaven, vi
hell, vi
Hilbert basis theorem, 204
homogeneous
coordinates, 166
polynomial, 118, 169
homomorphism, 131
horse, 12
hyperplane
linear, 73
ideal, 133
maximal, 155
prime, 154
identity law
addition, 1, 122, 220
multiplication, 1, 123, 220
identity transformation, 87
image, 105, 213
incidence geometry, 213
induction, 10
inequality cancellation law
addition, 2
multiplication, 2
inertia, 196
injective, 87
morphism, 132
intermediate value theorem, 50
intersection number, 176
invariant
of square matrix, 117
inverse, 87
invertible
transformation, 87
irreducible
curve, 139
polynomial, 59, 66, 161
isomorphic
rings, 132
isomorphism
of groups, 94
projective plane, 214
ring, 132
Karenina, Anna, 183
kernel, 105, 213
ring morphism, 131
kite, 214
Klein, Felix, 137
Kronecker, Leopold, 1
Lagrange, Joseph-Louis, 102
law of well ordering, 2, 4, 10
least common multiple, 7
left coset, 99
left translate, 99
line, 138
linear
function, 73
hyperplane, 73
projective automorphism, 215
system, 187
linear combination, 212
linear map, 213
linear relation, 212
linear transformation, 89
linearly independent, 212
list, 15
Lüroth’s theorem, 161
Malraux, André, 41
Marx, Chico, 172
mathematical induction, 10
maximal ideal, 155
Mencken, H. L., 52
modulus, 53
monic polynomial, 59, 109
morphism
injective, 132
of groups, 105
one-to-one, 132
onto, 132
rational, 141
regular, 139
ring, 131
surjective, 132
multiplication laws, 1, 122, 211, 220
multiplication table, 94
multiplicative inverse, 31
multiplicity, 50
negative, 2
neighborhood, 206
Newton, Isaac, 116
norm, 53
nullstellensatz, 203
number
rational, 47
number line, 49
numerator, 48
octave projective plane, 224
octaves, 221
octonions, 221
odd integer, 24
one-to-one, 87
morphism, 132
onto, 87
morphism, 132
open set, 206
optics, vi
orbit, 103
order, 94
group element, 96
of point on curve, 193
Painlevé, Paul, 52
papyrus
Rhind, 47
partial fraction, 70, 142
decomposition, 70
pencil, 164, 183, 187
perfect square, 49
permutation, 87
perspective, 215
Philips, Emo, 13
Picasso, Pablo, 7
plane, 209
Fano, 167
plane curve
projective, 170
Plath, Sylvia, 41
Poincaré, Henri, 87
point
regular, 193
singular, 193
polygon, 185, 213
polynomial, 57, 122
cubic, 60
homogeneous, 118, 169
irreducible, 59, 66, 161
monic, 59, 109
quadratic, 60
reducible, 59, 66, 161
positive
integer, 2
rational number, 49
presentation, 104
prime, 23
prime ideal, 154
product, 94
projective
automorphism, 166, 168
line, 166
plane, 165, 207
automorphism, 214
Desargues, 216
dual, 187, 208
generation, 218
isomorphism, 214
plane curve, 170
space, 168, 209
subspace, 210
projective plane
quaternionic, 211
quadratic
form, 196
polynomial, 60
quadrilateral, 185
quartic
curve, 138
quaternionic projective plane, 211
quaternions, 210
quintic
curve, 138
quotient, 4
quotient ring, 154
rational
curve, 142
function, 122, 141
morphism, 141
number, 47
real
∗-ring, 221
recursive function, 13
red hair, 9
reduce, 103
reducible
curve, 139
polynomial, 59, 66, 161
regular
function, 139
point, 193
regular point, 141
remainder, 4, 42
resultant, 76, 129, 147
Rhind papyrus, 47
right coset, 99
right translate, 99
ring, 121, 219
alternative, 221
scalar multiplication, 136
Shakespeare, William, 207
sign laws, 2
signature, 196
singular point, 193
skew field, 211
span, 212
split polynomial, 125
splitting field, 126, 129
square-free, 197
standard Desargues configuration, 217
Study
theorem, 174
subgroup, 99
subspace, 212
projective, 210
subtraction, 3
surjective, 87
morphism, 132
Swift, Jonathan, 9
Sylvester’s law of inertia, 196
Sylvester, James, 196
symmetric
function, 112
symmetric group, 89
tangent line, 191
temperature, vi
ternary ring, 223
theorem
Bézout, 177, 179
Descartes, 50
Hilbert basis, 204
intermediate value, 50
Lüroth, 161
nullstellensatz, 203
Study, 174
Tolstoy, Leo, 183, 207
totient function, 36
trace, 117
transformation, 87
group, 88
identity, 87
invertible, 87
linear, 89
triangle, 185
generic, 175
very generic, 175
trivalent configuration, 219
Twelfth Night, 207
unit, 31
ring, 124
variety
affine, 205
very generic
triangle, 175
Viète, François, 110
War and Peace, 207
weight, 113, 119
well ordering, 2
Weyl, Hermann, 109
Williams, Tennessee, 65
Wittgenstein, Ludwig, v
word, 103, 108
zero divisor, 124