PMATH 340
Lecture Notes on Elementary Number Theory
Anton Mosunov
Department of Pure Mathematics
University of Waterloo
Winter, 2017
Contents

1 Introduction
2 Divisibility. Factorization of Integers. The Fundamental Theorem of Arithmetic
3 Greatest Common Divisor. Least Common Multiple. Bézout’s Lemma
4 Diophantine Equations. The Linear Diophantine Equation ax + by = c
5 Euclidean Algorithm. Extended Euclidean Algorithm
6 Congruences. The Double-and-Add Algorithm
7 The Ring of Residue Classes Zn
8 Linear Congruences
9 The Group of Units Z∗n
10 Euler’s Theorem and Fermat’s Little Theorem
11 The Chinese Remainder Theorem
12 Polynomial Congruences
13 The Discrete Logarithm Problem. The Order of Elements in Z∗n
14 The Primitive Root Theorem
15 Big-O Notation
16 Primality Testing
   16.1 Trial Division
   16.2 Fermat’s Primality Test
   16.3 Miller-Rabin Primality Test
17 Public Key Cryptosystems. The RSA Cryptosystem
18 The Diffie-Hellman Key Exchange Protocol
19 Integer Factorization
   19.1 Fermat’s Factorization Method
   19.2 Dixon’s Factorization Method
20 Quadratic Residues
21 The Law of Quadratic Reciprocity
22 Multiplicative Functions
23 The Möbius Inversion
24 The Prime Number Theorem
25 The Density of Squarefree Numbers
26 Perfect Numbers
27 Pythagorean Triples
28 Fermat’s Infinite Descent. Fermat’s Last Theorem
29 Gaussian Integers
30 Fermat’s Theorem on Sums of Two Squares
31 Continued Fractions
32 Pell’s Equation
33 Algebraic and Transcendental Numbers. Liouville’s Approximation Theorem
34 Elliptic Curves
1 Introduction
This is a course on number theory, undoubtedly the oldest mathematical discipline
known to the world. Number theory studies the properties of numbers. These may
be integers, like −2, 0 or 7, or rational numbers like 1/3 or −7/9, or algebraic
numbers like √2 or i, or transcendental numbers like e or π. Though most of
the course will be dedicated to Elementary Number Theory, which studies congruences and various divisibility properties of the integers, we will also dedicate
several lectures to Analytic Number Theory, Algebraic Number Theory, and other
subareas of number theory.
There are many interesting questions that one might ask about numbers. In
search of answers to these questions, mathematicians unravel fascinating properties of numbers, some of which are quite profound. Here are several curious facts:
1. Every odd number exceeding 5 can be expressed as a sum of three primes
(Helfgott-Vinogradov Theorem, 2013. In 1954, Vinogradov proved the result for all odd n > B for some B, and in 2013 Helfgott demonstrated that
one can take B = 5);
2. There are infinitely many prime numbers p and q such that |p − q| ≤ 246
(Zhang’s Theorem, 2013. Zhang proved the result for 7 · 10^7, and in 2014
the constant was reduced to 246 by Maynard, Tao, Konyagin and Ford);
3. For all n ≥ exp(exp(33.217)) there always exists a prime between n^3 and
(n + 1)^3 (Ingham’s Theorem, 1937. Ingham proved the result for all n ≥ B
for some B, and in 2014 Dudek demonstrated that one can take B as above);
4. There are infinitely many primes of the form x^2 + y^4 (Friedlander-Iwaniec
Theorem, 1997);
5. For x > 1, there are “approximately” x/log x prime numbers up to x (Prime Number Theorem, 1896);
6. Given a positive integer d, there exist distinct prime numbers p1 , p2 , . . . , pd
which form an arithmetic progression (Green-Tao Theorem, 2004).
Despite the simplicity of their formulations, all of these results are highly nontrivial and their proofs rely on some deep theories. For example, the Green-Tao
Theorem relies on Szemerédi’s Theorem, which in turn uses the theory of random graphs.
There are many number theoretical problems out there that are still open. At
the 1912 International Congress of Mathematicians, the German mathematician
Edmund Landau listed the following four basic problems about primes that still
remain unresolved:
1. Can every even integer greater than 2 be written as a sum of two primes?
(Goldbach’s Conjecture, 1742);
2. Are there infinitely many prime numbers p and q such that |p − q| = 2?
(Twin Prime Conjecture, 1849);
3. Does there always exist a prime between two consecutive perfect squares?
(Legendre’s Conjecture, circa 1800);
4. Are there infinitely many primes of the form n^2 + 1? (see Bunyakovsky’s
Conjecture, 1857).
It is widely believed that the answer to each of the questions above is “yes”.
There is a lot of computational evidence towards each of them, and for some of
them conjectural asymptotic formulas have been established. However, none of them
has been proved.
Aside from being an interesting theoretical subject, number theory also has
many practical applications. It is widely used in cryptographic protocols, such
as RSA (Rivest-Shamir-Adleman, 1977), the Diffie-Hellman protocol (1976), and
ECIES (Elliptic Curve Integrated Encryption Scheme). These protocols rely on
certain fundamental properties of finite fields (RSA, D-H) and elliptic curves defined over them (ECIES). For example, consider the Discrete Logarithm Problem:
given a prime p and integers c, m, one may ask whether there exists an integer d
such that c^d − m is divisible by p, and if so, what its value is. We may write this
in the form of a congruence

c^d ≡ m (mod p).
When p is extremely large (hundreds of digits) and c, m are chosen properly, this
problem is widely believed to be intractable; that is, no modern computer can
solve it in a reasonable amount of time (the computation would require billions of
years). This property is used in many cryptosystems, including the first two mentioned above. Many cryptosystems, like RSA, can be broken by quantum computers. The construction of protocols resistant to attacks by quantum computers is
the subject of Post-Quantum Cryptography, and number theory plays a crucial role
there (see Lattice-Based or Isogeny-Based Cryptography).
2 Divisibility. Factorization of Integers. The Fundamental Theorem of Arithmetic
Before we proceed, let us introduce a little bit of notation:
N = {1, 2, 3, . . .} — the natural numbers;
Z = {0, ±1, ±2, . . .} — the ring of integers;
Q = {m/n : m ∈ Z, n ∈ N} — the field of fractions;
R — the field of real numbers;
C = {a + bi : a, b ∈ R, i^2 = −1} — the field of complex numbers.
We call Z a ring because 0, 1 ∈ Z and a, b ∈ Z implies a ± b ∈ Z and a · b ∈ Z.
In other words, Z is closed under addition, subtraction and multiplication. Note,
however, that a, b ∈ Z with b ≠ 0 does not imply that a/b ∈ Z, so it is not closed
under division. A collection that is closed under addition, subtraction, multiplication and division by a non-zero element is called a field. According to this
definition, every field is also a ring.
Exercise 2.1. Demonstrate the proper inclusions in N ⊊ Z ⊊ Q ⊊ R ⊊ C. No
proofs are required.
Definition 2.2. Let a, b ∈ Z. We say that a divides b, or that a is a factor of b,
when b = ak for some k ∈ Z. We write a | b if this is the case, and a ∤ b otherwise.
Example 2.3. 3 | 12 because 12 = 3 · 4; 3 ∤ 13; −1 | 7 because 7 = (−1) · (−7);
0 ∤ 3.
Proposition 2.4. (Proposition 1.2 in Frank Zorzitto, A Taste of Number Theory.)
Let a, b, c, x, y ∈ Z.
1. If a | b and b | c, then a | c;
2. If c | a and c | b, then c | ax ± by;
3. If c | a and c ∤ b, then c ∤ a ± b;
4. If a | b and b ≠ 0, then |a| ≤ |b|;
5. If a | b and b | a, then a = ±b;
6. If a | b, then ±a | ±b;
7. 1 | a for all a ∈ Z;
8. a | 0 for all a ∈ Z;
9. 0 | a if and only if a = 0.
Proof. Exercise.
Definition 2.5. Let p ≥ 2 be a natural number. Then p is called prime if the only
positive integers that divide p are 1 and p itself. It is called composite otherwise.
We remark that 1 is neither prime nor composite. We will also use the above
terminology only with respect to integers exceeding 1 (so according to this convention −3 is not prime and −6 is not composite).
Exercise 2.6. Among the collection −5, 1, 5, 6, which numbers are prime?
Theorem 2.7. For each integer n ≥ 2 there exists a prime p such that p | n.
Proof. We will prove this result using strong induction on n.
Base case. For n = 2 we have 2 | n. Since 2 is prime, the theorem holds.
Induction hypothesis. Suppose that the theorem is true for n = 2, 3, . . . , k.
Induction step. We will show that the theorem is true for n = k + 1. If n is
prime the result holds. Otherwise there exists a positive integer d such that d | n,
d ≠ 1 and d ≠ n. By property 4 of Proposition 2.4 we have d ≤ n, and since d ≠ 1
and d ≠ n we conclude that 2 ≤ d ≤ n − 1 = k. Thus d satisfies the induction
hypothesis, so there exists a prime p such that p | d. Since p | d and d | n, by
property 1 of Proposition 2.4 we conclude that p | n.
Theorem 2.8. (Euclid’s Theorem, circa 300BC) There are infinitely many prime
numbers.
6
Proof. Suppose not: then there are only finitely many prime numbers, say p1 , p2 , . . . , pk .
Consider the number
q = p1 p2 · · · pk + 1.
Since q ≥ 2, by Theorem 2.7 there exists some prime, say pi , which divides q. On
the other hand, since pi | p1 p2 · · · pk and pi ∤ 1, by property 3 of Proposition 2.4 it
is the case that pi ∤ q. This leads us to a contradiction. Hence there are infinitely
many prime numbers.
There are many alternative proofs of this fact, suggested by Euler, Erdős,
Furstenberg, and other mathematicians (see the Wikipedia page for Euclid’s Theorem). At the end of this section, we will see the proof given by Euler.
We will now turn our attention to the Fundamental Theorem of Arithmetic,
which states that any integer greater than 1 can be written uniquely (up to reordering) as the product of primes.
Example 2.9. The number 60 can be written as 60 = 2^2 · 3 · 5.
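Such a factorization can be found mechanically by trial division. The following Python sketch (ours, with an illustrative function name; not part of the original notes) returns the prime factors of n with multiplicity, so their product reconstructs n:

```python
def prime_factors(n):
    """Return the prime factorization of n >= 2 as a list with multiplicity."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:  # divide out each prime factor completely
            factors.append(d)
            n //= d
        d += 1
    if n > 1:  # whatever remains has no divisor up to its square root, so it is prime
        factors.append(n)
    return factors

print(prime_factors(60))  # → [2, 2, 3, 5], i.e. 60 = 2^2 · 3 · 5
```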
In order to prove the theorem, we will utilize the following tools:
1. Well-Ordering Principle. Let S be a non-empty subset of the natural numbers N. Then S contains a smallest element. To spell it out, there exists
x ∈ S such that the inequality x ≤ y holds for any y ∈ S.
2. Generalized Euclid’s Lemma (to be proved in Corollary 3.15, once we introduce the notion of a greatest common divisor). Let p be a prime number and a1 , a2 , . . . , ak
be integers. If p | a1 a2 · · · ak , then there exists an index i, 1 ≤ i ≤ k, such
that p | ai .
Theorem 2.10. (The Fundamental Theorem of Arithmetic) Any integer greater
than 1 can be written uniquely (up to reordering) as the product of primes.
Proof. We will start by proving that every positive integer greater than 1 can be
written as a product of primes. Let S denote the collection of all positive integers
greater than 1 that cannot be written as a product of primes. Suppose that S is
not empty. Since S ⊆ N and N is well-ordered, we conclude that S contains a
smallest element, say n. Clearly, n is not prime. Thus there exists a positive
integer d such that d | n, d ≠ 1 and d ≠ n. Thus both d and n/d are strictly less than
n and greater than 1. Furthermore, either d or n/d cannot be written as a product
of primes, for otherwise n itself would be a product of primes. Thus either
d or n/d is in S, which contradicts the fact that n is the smallest element in S. This
means that S is empty, so every integer greater than 1 is a product of primes.
To prove uniqueness, consider two prime power decompositions

n = p1^a1 p2^a2 · · · pk^ak = q1^b1 q2^b2 · · · qℓ^bℓ.

We will show that they are in fact the same. (Note that this is not a proof by
contradiction, for we do not assume that these decompositions are distinct.)
Without loss of generality, we may assume that p1 < p2 < . . . < pk and
q1 < q2 < . . . < qℓ. Pick some index i such that 1 ≤ i ≤ k. Since
pi | n = q1^b1 q2^b2 · · · qℓ^bℓ, by Generalized Euclid’s Lemma there
exists some index j(i), 1 ≤ j(i) ≤ ℓ, such that

pi | q_{j(i)}^{b_{j(i)}}.

Now apply Generalized Euclid’s Lemma once again to deduce that pi | q_{j(i)}. Since
q_{j(i)} is prime, its only positive divisors are 1 and q_{j(i)}, which means that
pi = q_{j(i)}. Since p1 < p2 < . . . < pk , we see that j(i1) ≠ j(i2) whenever i1 ≠ i2.
From the above we conclude that to each i with 1 ≤ i ≤ k we can put in
correspondence some index j(i) with 1 ≤ j(i) ≤ ℓ, and each j(i) arises from a
unique i. This means that there are at least as many j’s as there are i’s, so k ≤ ℓ.
Apply Generalized Euclid’s Lemma once again, but with the roles of pi and
qj reversed, thus observing that to each j with 1 ≤ j ≤ ℓ we can put in
correspondence some index i(j) with 1 ≤ i(j) ≤ k, and each i(j) arises from a
unique j. Hence ℓ ≤ k. Since k ≤ ℓ and ℓ ≤ k, it is the case that k = ℓ.
From here we deduce that pi^ai | qi^bi and qi^bi | pi^ai. By property 5 of Proposition
2.4, we have pi^ai = qi^bi. Since pi = qi , it is the case that ai = bi .
The fact that the prime factorization is unique was utilized by Euler to provide
an alternative proof of Euclid’s Theorem.
Theorem 2.8. (Euclid’s Theorem, circa 300 BC; restated) There are infinitely many
prime numbers.
Proof. (Euler’s proof, 1700s) Consider the harmonic series

∑_{n=1}^∞ 1/n = 1 + 1/2 + 1/3 + . . . .

It is widely known that this series is divergent. Now let p > 1 and recall the
formula for the infinite geometric series:

∑_{k=0}^∞ 1/p^k = 1 + 1/p + 1/p^2 + . . . = 1/(1 − 1/p).

Using this formula, we observe that

∏_{p prime} 1/(1 − 1/p) = ∏_{p prime} (1 + 1/p + 1/p^2 + . . .) = ∑_{n=1}^∞ 1/n,

where the last equality holds by the Fundamental Theorem of Arithmetic. If there
were only finitely many primes, the product on the left-hand side would be
finite, which contradicts the fact that the series on the right-hand side is divergent.
3 Greatest Common Divisor. Least Common Multiple. Bézout’s Lemma
When divisibility fails, we speak of quotients and remainders.
Theorem 3.1. (The Remainder Theorem; Proposition 1.3 in Frank Zorzitto, A Taste of Number Theory) Let a, b be integers, a > 0. Then there
exist unique integers q and r such that
b = aq + r,
where 0 ≤ r < a.
Proof. Recall that every real number x “sits” in between two consecutive integers;
that is, there exists some unique integer q such that
q ≤ x < q + 1.
Now set x = b/a. Then from the above inequality it follows that
aq ≤ b < aq + a.
But then
0 ≤ b − aq < a.
If we now put r = b − aq, then
b = aq + r
and r satisfies 0 ≤ r < a. From the above construction it is also evident that q and
r are unique, so the result follows.
Definition 3.2. Let a, b be integers, a > 0. Write b = aq + r, where 0 ≤ r < a.
Then a is called the modulus, b is called the dividend, q is called the quotient and
r is called the remainder.
Note that for a > 0 the expression a | b simply means that in b = aq + r the
remainder r is equal to zero.
Given a and b, one can easily compute q and r using a calculator. First,
compute b/a and round it down to the nearest integer; this is precisely your q. Then
compute r with the formula r = b − aq.
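In Python this recipe corresponds to floor division and the formula r = b − aq (a small sketch of ours, not from the notes; note that the notes write b = aq + r with modulus a):

```python
b, a = 23, 7   # dividend b, modulus a > 0
q = b // a     # floor of b/a, the quotient
r = b - a * q  # the remainder, satisfying 0 <= r < a
print(q, r)    # → 3 2, since 23 = 7 * 3 + 2

# The same works for a negative dividend, because // rounds down:
q, r = -23 // 7, -23 % 7
print(q, r)    # → -4 5, since -23 = 7 * (-4) + 5
```

Python's `//` and `%` round toward negative infinity, so the remainder always lands in 0 ≤ r < a exactly as Theorem 3.1 requires; in languages that truncate toward zero (such as C), negative dividends need an extra adjustment.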
Definition 3.3. Let a and b be integers. An integer d such that d | a and d | b is
called a common divisor of a and b. When at least one of a and b is not zero, the
largest integer with such a property is called the greatest common divisor of a and
b and is denoted by gcd(a, b). When a = b = 0, we define gcd(a, b) := 0.
The greatest common divisor of a and b possesses many interesting properties.
Let us demonstrate several of them.
Proposition 3.4. Let

a = p1^e1 p2^e2 · · · pk^ek and b = p1^f1 p2^f2 · · · pk^fk,

where p1 , p2 , . . . , pk are distinct prime numbers and e1 , e2 , . . . , ek , f1 , f2 , . . . , fk are
integers ≥ 0. Then

gcd(a, b) = p1^min{e1, f1} p2^min{e2, f2} · · · pk^min{ek, fk}.    (1)

Further, any common divisor c of a and b must also divide gcd(a, b).
Proof. Note that

g = p1^min{e1, f1} p2^min{e2, f2} · · · pk^min{ek, fk}

divides both a and b. Also, any integer

c = p1^g1 p2^g2 · · · pk^gk

such that gi > min{ei , fi } for some i fails to divide either a or b. Hence any
common divisor c of this form satisfies gi ≤ min{ei , fi } for all i, 1 ≤ i ≤ k. Hence c divides
g. Taking gi = min{ei , fi } for each index i, we see that g is in fact the greatest
common divisor.
Note that Proposition 3.4 suggests a formula for the computation of gcd(a, b).
First, one has to factor a and b by writing them in the form

a = p1^e1 p2^e2 · · · pk^ek and b = p1^f1 p2^f2 · · · pk^fk,

where the exponents ei and fi are allowed to be 0 (convince yourself that any two
numbers can be written in this form). Then one might simply utilize formula
(1). This approach works fine when the numbers are small and easily factorable,
but unfortunately as the numbers get really large, efficient factorization becomes infeasible for modern electronic computers (though feasible for quantum computers;
see Shor’s Algorithm). In fact, the security of the RSA public key cryptosystem
is based on the difficulty of factorization.
Example 3.5. Let us compute the greatest common divisor of 440 and 300. The
prime factorizations are 440 = 2^3 · 5 · 11 and 300 = 2^2 · 3 · 5^2. We see that

440 = 2^3 · 3^0 · 5^1 · 11^1 and 300 = 2^2 · 3^1 · 5^2 · 11^0.

Thus

gcd(440, 300) = 2^min{3,2} · 3^min{0,1} · 5^min{1,2} · 11^min{1,0} = 2^2 · 3^0 · 5^1 · 11^0 = 20.
Exercise 3.6. Let a and b be integers. An integer ` is called a common multiple
of a and b if it satisfies a | ` and b | `. The smallest non-negative integer with
such a property is called the least common multiple of a and b and is denoted by
lcm(a, b). Given the statement as in Proposition 3.4, prove that

lcm(a, b) = p1^max{e1, f1} p2^max{e2, f2} · · · pk^max{ek, fk}    (2)

and that every common multiple c of a and b is divisible by lcm(a, b). That is, if
a | c and b | c, then lcm(a, b) | c.
Exercise 3.7. Let a and b be non-negative integers. Prove that
ab = gcd(a, b) · lcm(a, b).    (3)
Exercise 3.8. Compute lcm(440, 300) using formulas (2) and (3).
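As a numerical sanity check on formulas (1), (2) and (3), here is a Python sketch of ours (the helper function and variable names are illustrative, not from the notes) that computes gcd(440, 300) and lcm(440, 300) directly from the prime factorizations of Example 3.5:

```python
from collections import Counter

def prime_factors(n):
    """Trial-division factorization of n >= 2, as a multiset of primes."""
    factors, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

a, b = prime_factors(440), prime_factors(300)  # 2^3·5·11 and 2^2·3·5^2
primes = set(a) | set(b)                       # Counter reports exponent 0 for absent primes
gcd = 1
lcm = 1
for p in primes:
    gcd *= p ** min(a[p], b[p])  # formula (1): minimum of the exponents
    lcm *= p ** max(a[p], b[p])  # formula (2): maximum of the exponents
print(gcd, lcm)                  # → 20 6600
assert gcd * lcm == 440 * 300    # formula (3)
```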
We will now address the following question: which integers c can be written
in the form ax + by, where x and y are integers? Speaking in fancy mathematical
language, the identity c = ax + by means that c is an integer (linear) combination
of a and b.
Let us play around a little bit with the quantity ax + by. Clearly, a can be
written in this form, since a = a · 1 + b · 0. Same applies to b, since b = a · 0 + b · 1.
The number 0 can always be represented in this form, since 0 = a · 0 + b · 0. Note
that, when at least one of a and b is not zero, ax + by will always represent at least
one positive integer, because a · a + b · b > 0. It turns out that the least positive
integer d represented by ax + by is precisely the greatest common divisor of a and
b.
Example 3.9. Consider a = 7 and b = 15. Then the equation
7x + 15y = 1
has a solution (x, y) = (−2, 1). In fact, it has infinitely many solutions, as any
pair of the form (x, y) = (−2 + 15n, 1 − 7n) for n ∈ Z is a solution, too.
However, when a = 7 and b = 14 the equation
7x + 14y = 1
has no solutions, as the left hand side will always be divisible by 7, while this
is not the case for the right hand side. So the number 1 cannot be represented as an
integer linear combination of 7 and 14. Hence the question: which numbers can
be represented in this form?
Theorem 3.10. (Bézout’s Lemma; Proposition 1.4 in Frank Zorzitto, A Taste of Number Theory) Let a, b be integers such that a ≠ 0 or b ≠ 0.
If d is the least positive integer combination of a and b, then d divides every
combination of a and b. Furthermore, d = gcd(a, b).
Proof. We know that ax + by = d > 0. Now consider some integer combination
c = as + bt,
where s,t ∈ Z. We want to show that d | c. Recall that
c = dq + r
for some q, r ∈ Z, where 0 ≤ r < d. Thus

0 ≤ r = c − dq = as + bt − (ax + by)q = a(s − xq) + b(t − yq) < d.
We see that r is an integer combination of a and b, which is less than d, and nonnegative. Because d is the least positive integer combination of a and b, the only
option is that r = 0. Hence d | c. In particular, d | a and d | b, because a, b are
integer combinations of a and b.
We will now show that d = gcd(a, b). On one hand, we know that d | a and
d | b, so d is a common divisor of a and b. By Proposition 3.4, d must divide the
greatest common divisor of a and b, i.e. d | gcd(a, b). On the other hand, since
d = ax + by = gcd(a, b)(a1 x + b1 y) for some x, y, a1 , b1 ∈ Z, we have gcd(a, b) | d.
Since d | gcd(a, b) and gcd(a, b) | d, by property 5 of Proposition 2.4 we conclude
that d = gcd(a, b).
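Bézout’s Lemma can be checked numerically on small inputs by naive search over integer combinations (a sketch of ours, not from the notes; the function name and search bound are illustrative, and the bound must be large enough to reach the minimum):

```python
from math import gcd

def least_positive_combination(a, b, bound=50):
    """Naively search x, y in [-bound, bound] for the least positive a*x + b*y."""
    best = None
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            v = a * x + b * y
            if v > 0 and (best is None or v < best):
                best = v
    return best

# By Bézout's Lemma, the least positive combination equals the gcd.
print(least_positive_combination(7, 15))                    # → 1
print(least_positive_combination(440, 300), gcd(440, 300))  # → 20 20
```

The Extended Euclidean Algorithm of Section 5 finds such a combination far more efficiently than this exhaustive search.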
With the help of Theorem 3.10 we can answer the question which numbers
can be represented in the form ax + by. Since
gcd(a, b) = ax + by
for some x, y ∈ Z and gcd(a, b) is the smallest positive integer representable in this
form, we see that any integer c divisible by gcd(a, b) can be written in such a way,
since for some k ∈ Z it is the case that
c = k · gcd(a, b) = k(ax + by) = a(kx) + b(ky).
On the other hand, if gcd(a, b) ∤ c, then c cannot be written as an integer combination of a and b.
We will now use Bézout’s lemma to establish a few more properties of prime
numbers. In particular, we will prove Euclid’s lemma, which we already saw in
Section 2.
Definition 3.11. Let a and b be integers. We say that a and b are coprime if
gcd(a, b) = 1.
13
Proposition 3.12. Let a, b, c be integers with a, b coprime. If a | c and b | c, then
ab | c.
Proof. Since a and b are coprime, by Bézout’s lemma there exist integers x and y
such that
ax + by = 1.
Thus
a(cx) + b(cy) = c.
After dividing both sides of the above equality by ab we obtain

(c/b) · x + (c/a) · y = c/(ab).
Since a | c and b | c, the quantity on the left hand side of the above equality is an
integer. Hence the same applies to the quantity on the right hand side, so c/(ab)
is an integer.
Proposition 3.13. Let a, b, c be integers with a, b coprime. If a | bc, then a | c.
Proof. Since a and b are coprime, by Bézout’s lemma there exist integers x and y
such that
ax + by = 1.
Thus
a(cx) + b(cy) = c.
After dividing both sides of the above equality by a we obtain

c · x + (bc/a) · y = c/a.
Since a | bc, the quantity on the left hand side of the above equality is an integer.
Hence the same applies to the quantity on the right hand side, so c/a is an integer.
Proposition 3.14. (Euclid’s Lemma) If p is prime and p | ab for some integers a,
b, then p | a or p | b. (The proof below is from Frank Zorzitto’s “A Taste of Number Theory”; see Proposition 2.4 on page 31.)
Proof. Say p ∤ a. Let d = gcd(p, a). Since d | p, the definition of primes forces
d = 1 or d = p, and since p ∤ a, it must be that d = 1, so p and a are coprime.
From Proposition 3.13 it follows that p | b.
Corollary 3.15. (Generalized Euclid’s lemma) Let p be a prime number and
a1 , a2 , . . . , ak be integers. If p | a1 a2 · · · ak , then there exists an index i, 1 ≤ i ≤ k,
such that p | ai .
Proof. The result clearly holds for k = 1, so assume that k ≥ 2. If p | a1 , we are
done. If not, then p and a1 are coprime, so by Proposition 3.13 it must be the case
that p | a2 a3 · · · ak . If p | a2 we are done. If not, then p and a2 are coprime, so by
Proposition 3.13 it must be the case that p | a3 a4 · · · ak . Proceeding in the same
fashion, we eventually reach p | ak−1 ak , where we may apply Euclid’s lemma to
draw the desired conclusion.
Exercise 3.16. Show that the coprimality condition cannot be removed from either
Proposition 3.12 or Proposition 3.13.
4 Diophantine Equations. The Linear Diophantine Equation ax + by = c
An equation is called Diophantine if we are only concerned with its integer solutions. Any equation can be converted into its Diophantine form. For example,
instead of looking at x^2 + y^2 = 1 for (x, y) ∈ R^2 we may restrict our attention to
(x, y) ∈ Z^2. Note that in the former case there are infinitely many solutions (in
fact, there are uncountably many of them). These are all the points lying on the
circle centered at the origin with radius equal to 1. However, if we look at
(x, y) ∈ Z^2 then there are only four solutions, namely (±1, 0) and (0, ±1). (Do
you see why?)
Sometimes, converting an equation into its Diophantine form is not very interesting. This is the case for the equation x^2 + y^2 = 1. Another example is the
equation y = x√2, which has no integer solutions aside from (0, 0) due to the irrationality of √2. But sometimes understanding integer solutions can get difficult,
even extremely difficult. The reason is that, when considering an equation over the
real numbers R or — even better! — over the complex numbers C, there are many
analytical tools that we can utilize. Say, if we are looking at the equation f (x) = 0
for x ∈ R, we might utilize the fact that f (x) is continuous, or differentiable, or
maybe even smooth. Another reason why it might be easier to analyze equations
not only over R or C, but also over Q, is because all of them are fields.
Quite often we can say many things about the Diophantine equation by “lifting” it and considering it, for example, over Q, for if there are only finitely many
solutions over Q, then there are only finitely many solutions over Z. Such a technique applies to hyperelliptic equations, like y^2 = x^5 + 2 (see Faltings’ Theorem).
However, sometimes there are infinitely many solutions over Q, but only finitely
many — or even none! — over Z. The fact that Q is a field can be utilized to
prove that there are infinitely many rational solutions to the elliptic equations

y^2 = x^3 + 46,    y^2 = x^3 − 2.

Note that the first equation has a solution (−7/4, 51/8), while the second equation
has a solution (129/100, 383/1000). Unlike Q, R or C, the ring of integers Z
is not closed under division by a non-zero element, so we need to use different
techniques to study it. For example, the equation y^2 = x^3 + 46 has no solutions in
integers, while the equation y^2 = x^3 − 2 has two solutions (3, ±5).
Example 4.1. Let a, b, c, d, n be fixed integers, n ≥ 3, and x, y, z be integer variables.
Here are several examples of Diophantine equations:

ax + by = c — Linear Diophantine equation in two variables;
x^2 + y^2 = z^2 — Pythagorean equation;
x^2 − dy^2 = ±1 — Pell equation;
y^2 = x^3 + ax + b — Weierstrass equation;
ax^n + by^n = c — Thue equation;
ax^n + by^n = cz^n — Fermat type equation;
x^2 + 7 = 2^y — Ramanujan-Nagell equation.
When analyzing equations, we would like to answer the following questions:
1. Do solutions exist?
2. If solutions exist, how many of them are there? (finitely many, countably
many, uncountably many)
3. What are the solutions?
4. Are there algorithms which can generate solutions?
We address the same questions when analyzing Diophantine equations. Of course,
in this case the number of solutions will be at most countable.
We will now turn our attention to the linear Diophantine equation in two variables
ax + by = c.
Here a, b, c are fixed integers and x, y are integer variables. We will fully classify
the solutions to this equation.
The question of existence of a solution was fully resolved at the end of Section
3, where we established that solutions exist if and only if gcd(a, b) | c. Thus
the only thing left for us to do is to find all the solutions when they exist,
and come up with a procedure for their computation. As the following proposition
shows, knowing one solution to ax + by = c lets us deduce all of the solutions.
Proposition 4.2. Let a, b, c be integers. Let (x, y) be a pair of integers such that

ax + by = c.

Then any pair of integers (x′, y′) such that c = ax′ + by′ must be of the form

(x′, y′) = ( x − (b / gcd(a, b)) · n, y + (a / gcd(a, b)) · n ),

where n ranges over the integers.
Proof. Suppose that (x, y) and (x′, y′) are integer pairs such that

c = ax + by = ax′ + by′.

Then a(x − x′) = b(y′ − y). This means that a | b(y′ − y), and further

a / gcd(a, b) | (y′ − y).

This means that

y′ = y + n · a / gcd(a, b)

for some n ∈ Z. Substituting this relation into the equation a(x − x′) = b(y′ − y),
we see that

a(x − x′) = n · ab / gcd(a, b),

which means that

x′ = x − n · b / gcd(a, b).
Thus we see that from one solution to ax + by = c (if it exists) we may produce
all solutions once we compute gcd(a, b). In order to determine one solution to this
equation, we use the Extended Euclidean Algorithm. This algorithm allows one
to compute a pair of integers (x, y) such that
ax + by = gcd(a, b).
This allows us to produce a solution to ax + by = c, as then it must be the case that
gcd(a, b) | c, so for some integer k we have
c = k gcd(a, b) = k(ax + by) = a(kx) + b(ky).
We may then use Proposition 4.2 to compute all solutions to the linear Diophantine
equation ax + by = c. We will learn about the Extended Euclidean Algorithm in
the following section.
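Proposition 4.2 can be turned into a short Python sketch (ours, not from the notes; the function and variable names are illustrative). Given one particular solution, it generates members of the family of all solutions; here we reuse the particular solution (−2, 1) of 7x + 15y = 1 from Example 3.9:

```python
from math import gcd

def solutions(a, b, c, x0, y0, n_range):
    """Given one solution (x0, y0) of a*x + b*y = c, generate the solutions
    (x0 - n*b/g, y0 + n*a/g) of Proposition 4.2 for n in n_range."""
    g = gcd(a, b)
    return [(x0 - n * b // g, y0 + n * a // g) for n in n_range]

# Example 3.9: 7x + 15y = 1 has the particular solution (-2, 1).
for x, y in solutions(7, 15, 1, -2, 1, range(-2, 3)):
    assert 7 * x + 15 * y == 1  # each generated pair satisfies the equation
    print(x, y)
```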
Exercise 4.3. Let a1 , a2 , . . . , ak be integers at least one of which is not 0. The
largest integer d such that d | ai for all i, 1 ≤ i ≤ k, is called the greatest common divisor of a1 , a2 , . . . , ak . It is denoted by gcd(a1 , a2 , . . . , ak ). When a1 = a2 = . . . = ak = 0,
we define gcd(a1 , a2 , . . . , ak ) := 0.
Determine the formulas for gcd(a1 , a2 , . . . , ak ) and lcm(a1 , a2 , . . . , ak ) that are
analogous to (1) and (2). Does a formula similar to (3) hold? Explain why or why
not.
Exercise 4.4. Let a1 , a2 , . . . , ak be integers. We say that c ∈ Z can be represented
as an integer linear combination of a1 , a2 , . . . , ak if there exist x1 , x2 , . . . , xk ∈ Z
such that
c = a1 x1 + a2 x2 + . . . + ak xk .
Given integers a1 , a2 , . . . , ak , which integers can be written as an integer combination of a1 , a2 , . . . , ak ?
5 Euclidean Algorithm. Extended Euclidean Algorithm
Let a, b be integers at least one of which is not 0. In Section 3, we
found a formula for the computation of gcd(a, b), namely (1). Though
useful, it is not very efficient, as no fast (that is, polynomial-time) algorithm for integer factorization is
known. However, there exists a much more efficient algorithm to compute
gcd(a, b), developed by Euclid in his fundamental work Elements. It is called the
Euclidean Algorithm.
We begin our explorations by first showing yet another interesting property of
the greatest common divisor. In particular, if a, b are integers at least one of which
is not zero, then gcd(a, b) does not change if we replace b with b + ak, where k is
an arbitrary integer.
Proposition 5.1. Suppose a, b are two integers. Then for any integer k it is the
case that
gcd(a, b) = gcd(a, b + ak).
Proof. Let d1 = gcd(a, b) and d2 = gcd(a, b + ak). We will show that d1 | d2 and
d2 | d1 .
Since d1 | a and d1 | b, it is the case that d1 | (b + ak). Since d1 is a common
divisor of a and b + ak, by Proposition 3.4 it must divide their greatest common
divisor d2 . Thus d1 | d2 .
Now observe that d2 | a and d2 | b + ak. Thus a = d2 r1 and b + ak = d2 r2 for
some r1 , r2 ∈ Z. But then
b = d2 r2 − ak = d2 r2 − d2 r1 k = d2 (r2 − r1 k).
Hence d2 | b, which means that d2 is a common divisor of a and b. By Proposition
3.4 it must divide their greatest common divisor d1 . Thus d2 | d1 . Since d1 | d2
and d2 | d1 , we conclude that d1 = d2 .
We will now describe the Euclidean Algorithm. Let a, b be positive integers
(when ab = 0 the computation of gcd(a, b) is easy). Without loss of generality,
we suppose that a > b (if a < b we may interchange a and b, and if a = b then
gcd(a, b) = a). We define the finite sequence of integers r1 , r2 , . . . as follows.
Set r1 = a, r2 = b, and write
r1 = q1 r2 + r3 .
Note that the remainder r3 satisfies 0 ≤ r3 < r2 = b. Then compute
r2 = q2 r3 + r4 ,
r3 = q3 r4 + r5 ,
7 By “fast” we mean “polynomial time”.
and so on. Since the sequence of integers r1 > r2 > . . . is strictly decreasing and
bounded below by 0, after finitely many steps, say n, it reaches its smallest positive
term rn . We will show that this smallest positive integer rn is precisely gcd(a, b).
Why does this process allow one to compute gcd(a, b)? By Proposition 5.1,
gcd(r1 , r2 ) = gcd(r1 − q1 r2 , r2 ) = gcd(r3 , r2 ).
Let us compute one more step:
gcd(r3 , r2 ) = gcd(r3 , r2 − q2 r3 ) = gcd(r3 , r4 ).
Proceeding in the same fashion, we see that
gcd(a, b) = gcd(r1 , r2 ) = gcd(r2 , r3 ) = . . . = gcd(ri , ri+1 )
for all i such that 1 ≤ i ≤ n − 1. We see that the calculations get easier with each
step, and in the end we obtain
gcd(a, b) = gcd(r1 , r2 ) = . . . = gcd(rn−1 , rn ) = gcd(rn , 0) = rn .
Theorem 5.2. Let a, b be positive integers with a > b. Let r1 > r2 > . . . be the
finite sequence as defined above. Let rn be the smallest positive integer in this
sequence. Then rn = gcd(a, b).
Proof. Recall that d = gcd(a, b) = gcd(ri , ri+1 ) for i = 1, 2, . . . , n − 1. Now consider the last equation
rn−2 = qn−2 rn−1 + rn .
The remainder in the expression
rn−1 = qn−1 rn + rn+1
satisfies 0 ≤ rn+1 < rn . Since rn is the smallest positive integer in this sequence
and the sequence is strictly decreasing, the only possibility is that rn+1 = 0, which
means that rn divides rn−1 . But then
rn = gcd(rn−1 , rn ) = gcd(rn−2 , rn−1 ) = . . . = gcd(r1 , r2 ) = gcd(a, b).
Let us consider several examples.
Example 5.3. Let us compute gcd(440, 300) using the Euclidean Algorithm. We
have
440 = 1 · 300 + 140
300 = 2 · 140 + 20
140 = 7 · 20 + 0.
Thus gcd(440, 300) = 20.
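The division steps above translate into a short loop. Here is a minimal sketch in Python (the function name `euclid_gcd` is our own choice):

```python
def euclid_gcd(a, b):
    """Compute gcd(a, b) by the Euclidean Algorithm.

    Each pass replaces (a, b) by (b, a mod b); by Proposition 5.1
    this does not change the gcd, and the second entry strictly
    decreases, so the loop stops at the last nonzero remainder.
    """
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b
    return a

print(euclid_gcd(440, 300))  # → 20, matching Example 5.3
```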
Example 5.4. Let us compute gcd(233, 144) using the Euclidean Algorithm. We
have
233 = 1 · 144 + 89
144 = 1 · 89 + 55
89 = 1 · 55 + 34
55 = 1 · 34 + 21
34 = 1 · 21 + 13
21 = 1 · 13 + 8
13 = 1 · 8 + 5
8 = 1·5+3
5 = 1·3+2
3 = 1·2+1
2 = 2 · 1 + 0.
Thus gcd(233, 144) = 1.
Note that both numbers in Example 5.4 are smaller than in Example 5.3. Nevertheless, in Example 5.4 the Euclidean Algorithm terminated in 12 steps, while
in Example 5.3 it terminated in 3 steps. This is because in Example 5.4 we
chose our integers to be the 12th and the 11th Fibonacci numbers. Recall that
the Fibonacci numbers are defined recursively by F1 = 1, F2 = 2 and
Fn = Fn−1 + Fn−2 for n ≥ 3. It turns out that the slowest performance of the Euclidean Algorithm is achieved for consecutive Fibonacci numbers. Nevertheless,
the algorithm runs in polynomial time: in 1844, Gabriel Lamé proved that
the number of steps required for the completion of the Euclidean Algorithm is at
most 5 log10 (min{a, b}).
Exercise 5.5. Let F1 = 1, F2 = 2, and for an integer n ≥ 3 define Fn = Fn−1 + Fn−2 .
The number Fn is called the n-th Fibonacci number. Prove that the computation
of gcd(Fn+1 , Fn ) with the Euclidean Algorithm requires n steps.
Above we managed to compute gcd(a, b). Still, we do not know how to produce integer solutions (x, y) to the Diophantine equation
ax + by = gcd(a, b).
This can be achieved with the help of the Extended Euclidean Algorithm. It is
essentially the same as the Euclidean Algorithm, but along with the sequence
r1 , r2 , . . . we will also keep track of two additional sequences s1 , s2 , . . . and t1 ,t2 , . . ..
The algorithm is as follows. Set
r1 = a, r2 = b;
s1 = 1, s2 = 0;
t1 = 0, t2 = 1.
For i ≥ 3, we proceed by computing
ri+1 = ri−1 − qi−1 ri ;
si+1 = si−1 − qi−1 si ;
ti+1 = ti−1 − qi−1ti .
Note that, out of the three lines above, the Euclidean Algorithm computes only
the first one. We claim that, if the Euclidean Algorithm terminates in n + 1 steps,
then integers sn and tn satisfy asn + btn = gcd(a, b).
Theorem 5.6. Let a, b be positive integers with a > b. Let r1 > r2 > . . . > rn > 0,
s1 , s2 , . . . , sn and t1 ,t2 , . . . ,tn be sequences as defined above. Then
asn + btn = gcd(a, b).
Proof. We claim that the equation
asi + bti = ri
is satisfied for all i = 1, 2, . . . , n. Since Theorem 5.2 asserts that rn = gcd(a, b),
this would imply the result. To prove this statement, we proceed using induction
on i.
Base case. According to our setup, r1 = a, r2 = b, s2 = t1 = 0 and s1 = t2 = 1.
Thus as1 + bt1 = r1 and as2 + bt2 = r2 , so the base case holds for i = 1, 2.
Induction hypothesis. Assume that asi + bti = ri for i = k − 1, k.
Induction step. We will demonstrate that the result holds for i = k + 1:

rk+1 = rk−1 − qk−1 rk
     = (ask−1 + btk−1 ) − (ask + btk )qk−1
     = a(sk−1 − qk−1 sk ) + b(tk−1 − qk−1 tk )
     = ask+1 + btk+1 .

We conclude that asi + bti = ri for all i = 1, 2, . . . , n, as claimed.
Using the Extended Euclidean Algorithm, we can finally solve the Diophantine
equation ax + by = c.
Example 5.7. Let us determine all solutions to the Diophantine equation
440x + 300y = 80
using the Extended Euclidean Algorithm. Set

r1 = 440, r2 = 300;
s1 = 1, s2 = 0;
t1 = 0, t2 = 1.
Step 1. 440 = 1 · 300 + 140, so q1 = 1 and r3 = 140. Thus
s3 = s1 − q1 s2 = 1 − 1 · 0 = 1;
t3 = t1 − q1t2 = 0 − 1 · 1 = −1.
Step 2. 300 = 2 · 140 + 20, so q2 = 2 and r4 = 20. Thus

s4 = s2 − q2 s3 = 0 − 2 · 1 = −2;
t4 = t2 − q2 t3 = 1 − 2 · (−1) = 3.
Step 3. Since 20 | 140, the algorithm terminates.
We conclude that
440 · (−2) + 300 · 3 = 20.
After multiplying both sides of the above equality by 4, we obtain a solution
(x, y) = (−8, 12) to the Diophantine equation 440x + 300y = 80. By Proposition
4.2, if a = 440 and b = 300 then all solutions to this Diophantine equation must
be of the form

(x − (b/ gcd(a, b)) n, y + (a/ gcd(a, b)) n) = (−8 − 15n, 12 + 22n),

where n ranges over the integers.
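The bookkeeping of the three sequences r, s, t can be sketched in Python as follows; the function name `extended_euclid` is our own, and the code keeps only the two most recent terms of each sequence:

```python
def extended_euclid(a, b):
    """Return (g, s, t) with a*s + b*t == g == gcd(a, b).

    Starting from r1 = a, r2 = b, s1 = 1, s2 = 0, t1 = 0, t2 = 1,
    each step computes x_{i+1} = x_{i-1} - q * x_i for r, s and t,
    exactly as in the text.
    """
    r_prev, r_cur = a, b
    s_prev, s_cur = 1, 0
    t_prev, t_cur = 0, 1
    while r_cur != 0:
        q = r_prev // r_cur
        r_prev, r_cur = r_cur, r_prev - q * r_cur
        s_prev, s_cur = s_cur, s_prev - q * s_cur
        t_prev, t_cur = t_cur, t_prev - q * t_cur
    return r_prev, s_prev, t_prev

g, s, t = extended_euclid(440, 300)
print(g, s, t)  # → 20 -2 3, i.e. 440*(-2) + 300*3 = 20
```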
Exercise 5.8. (a) Let a, b, c be integers such that a ≠ 0 or b ≠ 0, and gcd(a, b) | c.
Consider the Diophantine equation ax + by = c. Prove that there exists a
unique solution (x, y) such that 0 ≤ x < b/ gcd(a, b), and a unique solution
(x0 , y0 ) such that 0 ≤ y0 < a/ gcd(a, b);
(b) For (x, y) ∈ R^2 , let ‖(x, y)‖ := √(x^2 + y^2 ) denote the Euclidean norm. Let a, b, c
be integers such that c ≠ 0 and gcd(a, b) = 1, and consider the linear Diophantine equation

ax + by = c.

Prove that the solution (x, y) ∈ Z^2 of the above equation that corresponds to
the smallest value of ‖(x, y)‖ satisfies

|c| / ‖(a, b)‖ ≤ ‖(x, y)‖ ≤ |c| / ‖(a, b)‖ + ‖(a, b)‖ / 2.
6
Congruences.
Throughout this section, we fix a positive integer n, which we call the modulus.
Definition 6.1. We say that integers a and b are congruent modulo n if n divides
a − b. We denote this by
a ≡ b (mod n).
To say that a and b are congruent modulo n is the same as to say that their
remainders after division by n are the same. That is, if
a = q1 n + r1 and b = q2 n + r2 ,
where 0 ≤ r1 , r2 < n, then r1 = r2 . A rather surprising fact is that the congruence
relation ≡ behaves much like the equality relation =.
Proposition 6.2. The congruence relation ≡ is an equivalence relation. That is,
it satisfies the following three axioms:
(a) Reflexivity. If a is any integer, then a ≡ a (mod n);
(b) Symmetry. If a ≡ b (mod n), then b ≡ a (mod n);
(c) Transitivity. If a ≡ b and b ≡ c (mod n), then a ≡ c (mod n).
Proof. Exercise.
Example 6.3. Let n = 5. Then the numbers 7 and 27 are congruent to each other
modulo 5, because 5 | (27 − 7). Also note that both 7 and 27 have the same
remainder after division by 5:
7 = 1 · 5 + 2 and 27 = 5 · 5 + 2.
In fact, it is easy to notice that there are infinitely many numbers congruent to 7
modulo 5. Convince yourself that all of them belong to the set
{5q + 2 : q ∈ Z} = {. . . , −8, −3, 2, 7, 12, . . .}.
Proposition 6.4.8 Let n be a modulus, and suppose that

a ≡ a1 (mod n),
b ≡ b1 (mod n).

Then

a ± b ≡ a1 ± b1 (mod n),
ab ≡ a1 b1 (mod n).
Proof. Let us first show that a + b ≡ a1 + b1 (mod n). Note that n | (a − a1 ) and
n | (b − b1 ). By property 2 of Proposition 2.4,
n | (a − a1 ) + (b − b1 ) = (a + b) − (a1 + b1 ),
so by definition we see that a + b ≡ a1 + b1 (mod n). An analogous proof holds if
we replace the plus sign with the minus sign.
To see that ab ≡ a1 b1 (mod n), observe that
ab − a1 b1 = ab − a1 b + a1 b − a1 b1 = (a − a1 )b + a1 (b − b1 ).
Since n | (a − a1 ) and n | (b − b1 ), once again, by property 2 of Proposition 2.4 it
is the case that
n | (a − a1 )b + a1 (b − b1 ) = ab − a1 b1 ,
and by definition this means that ab ≡ a1 b1 (mod n).
If we now combine Propositions 6.2 and 6.4, it becomes clear that in any congruence that involves only addition, subtraction and multiplication of integers, we
can replace a with a1 whenever a ≡ a1 (mod n). This is known as the replacement
principle.
8 Proposition 3.3 in Frank Zorzitto, A Taste of Number Theory.
Example 6.5. Let f (x) = x^5 − 10x + 7. We can compute the remainder of f (27)
divided by 5 as follows: note that 27 ≡ 2 (mod 5). Since f (x) involves only
addition, subtraction and multiplication of integers, by the replacement principle
we can compute f (2) instead of f (27), because f (27) ≡ f (2) (mod 5). Also,
since 10 ≡ 0 (mod 5) and 7 ≡ 2 (mod 5), we see that

f (27) ≡ f (2)
       ≡ 2^5 − 10 · 2 + 7
       ≡ 2^5 − 0 · 2 + 2
       ≡ 34
       ≡ 4 (mod 5).
Since 0 ≤ 4 < 5, we conclude that 4 is the remainder of f (27) divided by 5.
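The replacement principle is easy to test numerically; a quick Python check of Example 6.5:

```python
def f(x):
    # the polynomial from Example 6.5
    return x**5 - 10*x + 7

# f involves only addition, subtraction and multiplication, and
# 27 ≡ 2 (mod 5), so f(27) and f(2) leave the same remainder mod 5.
print(f(27) % 5)  # → 4
print(f(2) % 5)   # → 4
```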
Example 6.6. Let us compute the last decimal digit of 307^99. Note that this is
the same as finding the remainder of 307^99 divided by 10. By the replacement
principle, reading from left to right and top to bottom, we have

307^99 ≡ 7^99
       ≡ (7^3)^33 ≡ 343^33 ≡ 3^33
       ≡ (3^3)^11 ≡ 27^11 ≡ 7^11
       ≡ 7^2 · (7^3)^3
       ≡ 49 · 3^3 ≡ 9 · 27 ≡ 9 · 7
       ≡ 63
       ≡ 3 (mod 10).
Thus 3 is the last decimal digit of 307^99. Analogously, we can determine the last
k decimal digits of any number by applying the replacement principle modulo 10^k
instead of 10. However, as the modulus grows, the computations become more
and more challenging.
In practice, in order to compute a^ℓ (mod n) for some large power ℓ, we utilize
the so-called Double-and-Add Algorithm. The algorithm is as follows: first, write
the integer ℓ in its binary expansion, i.e.

ℓ = ck 2^k + ck−1 2^{k−1} + . . . + c1 · 2 + c0 ,

where ci ∈ {0, 1}. Then

a^ℓ ≡ a^{ck 2^k + ck−1 2^{k−1} + . . . + c1 · 2 + c0}
    ≡ (a^{2^k})^{ck} · (a^{2^{k−1}})^{ck−1} · · · (a^2)^{c1} · a^{c0} (mod n).
But then note that, for j such that 2 ≤ j ≤ k, we can deduce the value of a^{2^j}
from a^{2^{j−1}} modulo n as follows:

a^{2^j} ≡ (a^{2^{j−1}})^2 (mod n).

Therefore we can compute a^2, a^{2^2}, . . . , a^{2^k} in k − 1 steps.
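The Double-and-Add Algorithm can be sketched in Python as follows; the function name is our own choice, and Python's built-in `pow(a, l, n)` performs the same computation:

```python
def double_and_add(a, exp, n):
    """Compute a**exp mod n by scanning the binary digits of exp.

    We maintain power = a^(2^i) mod n, squaring it once per bit
    (the "double"), and multiply it into the result whenever the
    current binary digit c_i equals 1 (the "add").
    """
    result = 1
    power = a % n
    while exp > 0:
        if exp & 1:                    # binary digit c_i is 1
            result = (result * power) % n
        power = (power * power) % n    # a^(2^i) -> a^(2^(i+1))
        exp >>= 1
    return result

print(double_and_add(307, 99, 10))  # → 3, the last digit from Example 6.6
print(pow(307, 99, 10))             # → 3, via the built-in
```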
Example 6.7. Let us compute n ≡ 7^114 (mod 23) such that 0 ≤ n < 23. Note that

114 = 64 + 32 + 16 + 2 = 2^6 + 2^5 + 2^4 + 2.

Then

7^2  ≡ 49 ≡ 3 (mod 23);
7^4  ≡ (7^2)^2  ≡ 3^2  ≡ 9 (mod 23);
7^8  ≡ (7^4)^2  ≡ 9^2  ≡ 81 ≡ 12 (mod 23);
7^16 ≡ (7^8)^2  ≡ 12^2 ≡ 144 ≡ 6 (mod 23);
7^32 ≡ (7^16)^2 ≡ 6^2  ≡ 36 ≡ 13 (mod 23);
7^64 ≡ (7^32)^2 ≡ 13^2 ≡ 169 ≡ 8 (mod 23).

We can utilize the table above in our calculations:

7^114 ≡ 7^{64+32+16+2}
      ≡ 7^64 · 7^32 · 7^16 · 7^2
      ≡ 8 · 13 · 6 · 3
      ≡ 1872
      ≡ 9 (mod 23).
We will now take a look at some interesting applications of modular arithmetic. For example, it can be used to demonstrate that certain Diophantine equations have no solutions.
Example 6.8. Let us show that the Diophantine equation
x^2 + y^2 = 4z + 3

has no solutions. This is the same as solving the congruence

x^2 + y^2 ≡ 3 (mod 4)
in integers x and y. Since every integer is congruent to either 0, 1, 2 or 3 modulo
4, there are essentially 16 possible combinations of x and y that we can check.
However, the problem becomes even simpler if we note that
0^2 ≡ 0, 1^2 ≡ 1, 2^2 ≡ 0, 3^2 ≡ 1 (mod 4).
Thus every perfect square is congruent to either 0 or 1 modulo 4. Since we are
dealing with the sum of two perfect squares, there are now only three options left
to check, namely
0 + 0 ≡ 0, 0 + 1 ≡ 1, 1 + 1 ≡ 2 (mod 4).
As we can see, none of them add up to 3, which means that x^2 + y^2 ≡ 3 (mod 4)
has no solutions in integers x and y. Therefore there are no solutions to the Diophantine equation x^2 + y^2 = 4z + 3.
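The finite case check in Example 6.8 can also be carried out exhaustively by machine; a short Python verification:

```python
# A perfect square is congruent to one of x^2 mod 4 for x = 0, 1, 2, 3.
squares_mod_4 = {x * x % 4 for x in range(4)}
print(squares_mod_4)  # → {0, 1}

# Possible values of x^2 + y^2 modulo 4: the residue 3 never occurs,
# so x^2 + y^2 = 4z + 3 has no integer solutions.
sums_mod_4 = {(a + b) % 4 for a in squares_mod_4 for b in squares_mod_4}
print(sums_mod_4)     # → {0, 1, 2}
```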
Exercise 6.9. (a) Show that the Diophantine equation x^2 + y^2 + z^2 = 8t + 7 has
no solutions for x, y, z, t ∈ Z;

(b) Let Z[√2] := {a + b√2 : a, b ∈ Z}. Show that there exists a solution to
x^2 + y^2 + z^2 = 8t + 7 for x, y, z, t ∈ Z[√2];

(c) Show that integers x, y, z, t satisfy x^2 + y^2 + z^2 = 8t + 3 if and only if x, y and
z are odd.
In school, you probably heard of divisibility rules for various integers. For
example, in order to check that some integer is divisible by 3, one just has to add
up all of its decimal digits together and verify that the resulting number is divisible
by 3. To verify that some integer n is divisible by 4, one just has to ensure that the
number representable by the last two decimal digits of n is divisible by 4. These
divisibility rules are the consequences of modular arithmetic.
Example 6.10. Let us prove the following divisibility rule for 3 and 9. Let n be a
positive integer, and let m be the sum of the decimal digits of n. Then 3 | n if and
only if 3 | m, and 9 | n if and only if 9 | m.
Let us prove the divisibility rule for 3, as the divisibility rule for 9 is analogous
to it. We write the number n in base 10:

n = ak 10^k + ak−1 10^{k−1} + . . . + a1 · 10 + a0 ,

where ai ∈ {0, 1, . . . , 9}. Then, by definition,

m = ak + ak−1 + . . . + a1 + a0 .
Since 10 ≡ 1 (mod 3),

n ≡ ak 10^k + ak−1 10^{k−1} + . . . + a1 · 10 + a0
  ≡ ak · 1^k + ak−1 · 1^{k−1} + . . . + a1 · 1 + a0
  ≡ ak + ak−1 + . . . + a1 + a0
  ≡ m (mod 3).
We conclude that 3 | (n − m), so there exists an integer k1 such that n − m = 3k1 .
Now assume that 3 | m. Then there exists an integer k2 such that m = 3k2 . But
then
3k1 = n − m = n − 3k2
implies n = 3(k1 + k2 ), which means that 3 | n. Analogously, we can show that if
3 | n, then 3 | m. If we replace the modulus 3 with the modulus 9, the proof will
remain the same.
Exercise 6.11. Prove the following divisibility rule for 11. Let n be an integer.
Let m be the sum of the digits of n in blocks of two from right to left. Then 11 | n
if and only if 11 | m.
Example: If n = 3928881, then m = 3 + 92 + 88 + 81 = 264 is divisible by
11. Thus 3928881 is divisible by 11 as well.
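Both divisibility rules are easy to check by machine. The sketch below treats the rule for 11 as a digit sum in base 100 (blocks of two decimal digits); the helper name `digit_sum` is our own:

```python
def digit_sum(n, base=10):
    """Sum of the digits of n written in the given base."""
    n = abs(n)
    total = 0
    while n > 0:
        total += n % base
        n //= base
    return total

n = 3928881
print(digit_sum(n))            # → 39, and 3 | 39, so 3 | n
print(digit_sum(n, base=100))  # → 264 = 81 + 88 + 92 + 3
print(n % 11 == digit_sum(n, base=100) % 11)  # → True
```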
7
The Ring of Residue Classes Zn
Recall that, according to our terminology, the set of all integers Z forms a ring:
0, 1 ∈ Z, and for all a and b in Z we have a ± b ∈ Z and a · b ∈ Z. Now let n
be a modulus. In this section, we will introduce the first example of a finite ring
Zn and study its properties. As the name suggests, this ring will have only finitely
many elements. Just like the ring of integers Z, it will contain its own analogues
of 0 and 1, and we will also endow it with the operations of addition, subtraction
and multiplication, which will be very much similar to the operations in Z.
Definition 7.1. Let a be an integer. The set
[a] := {nq + a : q ∈ Z}
is called the residue class of a modulo n. The integer a is called a representative
of the residue class [a].
Note that [a] = [b] if and only if a ≡ b (mod n). Also, each residue class
contains an integer r such that 0 ≤ r < n. It is conventional to pick such integers
as representatives. For example, if n = 5, even though one can denote the set of
all integers congruent to 17 modulo 5 by , we would rather prefer to use 
instead, since 17 ≡ 2 (mod 5) and 0 ≤ 2 < 5. Since there are only n possible
numbers r with 0 ≤ r ≤ n − 1, namely
0, 1, 2, . . . , n − 1,
and each integer is congruent modulo n to exactly one of these numbers, we see
that there are exactly n residue classes modulo n.
Exercise 7.2. Let n be a positive integer. Prove that the residue classes , ,
. . . , [n − 1] modulo n partition the integers. That is,

 ∪  ∪ . . . ∪ [n − 1] = Z,

and [a] ∩ [b] ≠ ∅ implies [a] = [b]. Hint: use Proposition 6.2.
We denote the collection of residues modulo n by Z/nZ or Zn .9 Since the
notation Zn is utilized in your course notes, we will stick with it in these lecture
notes.
Proposition 7.3. Let n be a positive integer and consider the collection Zn of all
residues modulo n. Define the binary operations +, − and · as follows:
[a] ± [b] := [a ± b] and [a] · [b] := [a · b].
Then, under these binary operations, Zn forms a ring.
Proof. Exercise. Hint: use Proposition 6.4.
Note that Zn is a finite ring. When we carry out operations in Zn , we are
doing modular arithmetic. To do modular arithmetic, just carry out the regular
arithmetic and then replace the result with any integer congruent to it modulo n
(once again, conventionally we pick a representative r such that 0 ≤ r < n).
9 The latter notation might be ambiguous: when p is prime, the symbol Z p is also used to represent the ring of p-adic integers.
Example 7.4. Here are two examples of modular arithmetic in Z17 :

 +  =  +  =  = .
 ·  =  ·  =  = .

Note that, in the case of addition, one could slightly simplify the computations by
noting that 33 ≡ −1 (mod 17):

 +  = [−1] +  = .

After all, dealing with −1 is much simpler than with 16.
Despite the fact that Zn behaves much like Z, some of its properties might
be rather unpleasant. For example, Z has no zero divisors apart from 0. In other
words, the identity ab = 0 implies that either a = 0 or b = 0. In general, this is not
true for Zn .
Example 7.5. To see that Z6 contains zero divisors different from , note that

 ·  =  = .

Thus  ·  =  in Z6 , even though  ≠  and  ≠ .
The same is true for Z15 :

 ·  =  = .
We also note another major difference between Z and Zn : in Z, the equation
ab = ac with a ≠ 0 implies b = c. However, in general, this is no longer true for
Zn . It is not difficult to show that, in fact, Zn has no non-trivial zero divisors if
and only if n is prime or n = 1.
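The zero divisors of Zn can be found by brute force; a small Python sketch (the function name is ours):

```python
def zero_divisors(n):
    """Nonzero residues a such that a*b ≡ 0 (mod n) for some nonzero b."""
    return [a for a in range(1, n)
            if any(a * b % n == 0 for b in range(1, n))]

print(zero_divisors(6))   # → [2, 3, 4]
print(zero_divisors(15))  # → [3, 5, 6, 9, 10, 12]
print(zero_divisors(7))   # → [], consistent with 7 being prime
```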
8
Linear Congruences
Let n be a modulus. We will now turn our attention to equations in Zn . Let a, b be
integers, and consider the linear equation
[a][x] = [b],
where x is an unknown integer.
Example 8.1. The linear equation

[x] = 

has only one solution in Z13 , namely [x] = . As there are only finitely many
possibilities, we may check all of them, from  to , in order to find a
solution. Even though there is only one solution in Z13 , there are actually infinitely
many solutions in Z. This is because any integer y ∈  (that is, any integer of
the form y = 13q + 6) satisfies

7y ≡ 3 (mod 13).
The linear equation
[x] = 
has two solutions in Z9 , namely [x] =  and [x] = . Here we see the principal
difference between the linear equation in Zn and the linear equation cx = d in Z:
the only way cx = d can have more than one solution is if c = d = 0.
Finally, the equation
[x] = 
has no solutions in Z9 . Once again, we can easily verify this by plugging in all
the possible values [x] = , , . . . , .
It turns out that the tools we already have allow us to solve the linear congruence
easily. Observe that
[a][x] = [ax] = [b],
and this is the same as solving the congruence
ax ≡ b (mod n).
Now by definition, n has to divide ax − b, so there exists an integer y such that
ax − b = n(−y).
In other words, the linear congruence [a][x] = [b] has a solution if and only if the
Diophantine equation
ax + ny = b
has a solution in integers x and y. From what we have learned in Section 3, it immediately follows that the linear equation [a][x] = [b] has no solutions if and only
if gcd(a, n) ∤ b (verify that this is the case for the last two equations in Example
8.1). When the solutions exist, we can use the Extended Euclidean Algorithm to
find them.
Example 8.2. Let us consider the linear equation [x] =  in Z300 . From
Example 5.7 we know that the solutions to

440x + 300y = 80

in integers x and y are of the form

x = −8 + 15n and y = 12 − 22n,

where n is an integer. Thus [x] = [−8 + 15n] in Z300 . By evaluating −8 + 15n
at n = 1, 2, . . . , 20 we obtain 20 distinct solutions in Z300 , namely

, , , . . . , .
Note that gcd(440, 300) = 20 and there are 20 distinct solutions. In Exercise 8.3,
you are asked to prove that this phenomenon holds in general.
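The count in Example 8.2 is easy to confirm by brute force; a short Python check (the function name is ours):

```python
def residues_solving(a, b, n):
    """All residues x in {0, ..., n-1} with a*x ≡ b (mod n)."""
    return [x for x in range(n) if (a * x - b) % n == 0]

sols = residues_solving(440, 80, 300)
print(len(sols))  # → 20, which equals gcd(440, 300)
print(sols[:4])   # → [7, 22, 37, 52], spaced 300/20 = 15 apart
```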
Exercise 8.3. Let n ≥ 1 be a modulus and let a, b be integers such that a ≠ 0. Prove
that, if gcd(a, n) | b, then the total number of distinct residue classes satisfying
[a][x] = [b] is equal to gcd(a, n).
9
The Group of Units Z*n
Let n be a modulus and consider the finite ring Zn of residues modulo n. Recall
that, in general, the ring Zn does not enjoy the property that if [a][b] = [a][c] and
[a] ≠  then [b] = [c] (see Example 7.5). However, for special values of [a],
called units, this cancellation law actually holds.

Definition 9.1. The residue class [a] in Zn is called a unit if there exists a solution
to [a][x] =  in Zn . If [a] is a unit, we say that any integer b ∈ [a] is invertible
modulo n.
Proposition 9.2. The following statements are equivalent:
1. [a] is a unit;
2. For all integers b and c, [a][b] = [a][c] implies [b] = [c];
3. a and n are coprime.
Proof. Let us prove that 1 implies 2. Since [a] is a unit, there exists an integer x
such that [a][x] = . Now suppose that [a][b] = [a][c] for some integers b and c.
Then

[x][a][b] = [x][a][c].

Since Zn is a commutative ring, we see that [x][a] = [a][x] = . Thus the above
equality simplifies to [b] = [c].
To prove that 2 implies 3, suppose that the statement is false and a and n are
not coprime. Without loss of generality, we may assume that 0 ≤ a < n. Then
there exists an integer p > 1 such that a = pk1 and n = pk2 for some integers k1
and k2 . Since p > 1, we conclude that 1 ≤ k2 < n, which in turn implies

k2 ≢ 0 (mod n).

But then

ak2 = pk1 k2 = pk2 k1 = nk1 ≡ 0 ≡ a · 0 (mod n).

Thus we see that [a][k2 ] = [a], even though [k2 ] ≠ . This contradicts
property 2, so a and n must be coprime.
Finally, let us demonstrate that 3 implies 1. Since a and n are coprime, by
Bézout’s lemma there exist integers x and y such that ax + ny = 1. This means that
[a][x] = , so by Definition 9.1 the residue class [a] is a unit.
Corollary 9.3. Let [a] be a unit in Zn . Then for any integer b the equation
[a][x] = [b] has a unique solution.
Proof. Suppose that there are two solutions [x] and [y], so
[a][x] = [b] = [a][y].
By property 2 of Proposition 9.2, the identity [a][x] = [a][y] implies [x] = [y].
Note that the statements of Proposition 9.2 and Corollary 9.3 can be translated
from the language of residue classes to the language of congruences. For example,
property 1 simply states that ax ≡ 1 (mod n), while property 2 states that ab ≡ ac
(mod n) implies b ≡ c (mod n). Finally, Corollary 9.3 implies that the congruence
ax ≡ b (mod n) has a unique solution such that 0 ≤ x < n, and all integer solutions
to this congruence must be of the form x + nq for q ∈ Z.
Proposition 9.4. If p is prime and [a] ≠  in Z p , then [a] is a unit. Furthermore,
Z p has no zero divisors apart from  itself.

Proof. Since [a] ≠ , without loss of generality we may assume that 1 ≤ a < p.
Note that this implies that a and p are coprime: otherwise d = gcd(a, p) > 1 would
divide the prime p, forcing d = p. But then p = d ≤ a and a < p at the same time,
a contradiction. Since gcd(a, p) = 1, by Bézout’s lemma there exist integers x and
y such that ax + py = 1. But then [a][x] = , so by Definition 9.1 the residue
class [a] must be a unit in Z p . Since every unit obeys the cancellation law stated
in property 2 of Proposition 9.2, it follows that Z p has no zero divisors apart from
 itself.
Definition 9.5. Let [a] be a unit in Zn . The element [x] satisfying [a][x] =  is
called an inverse of [a] in Zn and is denoted by [a]−1 .
When translated to the language of congruences, the fact that a is invertible
modulo n implies the existence of an integer which we denote by a−1 such that
a · a−1 ≡ 1 (mod n).
Definition 9.6. The set of all units of Zn is called the group of units of Zn and is
denoted by Z*n .
Proposition 9.7. The set of all units of Zn forms a group under the operation of
multiplication. That is, it satisfies the following four group axioms:
1. Closure. For all [a], [b] ∈ Z*n , [a] · [b] ∈ Z*n ;
2. Associativity. ([a] · [b]) · [c] = [a] · ([b] · [c]);
3. Identity element. For all [a] in Z*n , the element  satisfies

[a] ·  =  · [a] = [a];

4. Inverse element. For each [a] in Z*n there exists an element [a]−1 in Z*n such
that

[a] · [a]−1 = [a]−1 · [a] = .
Furthermore, the group of units Z*n is finite and Abelian:10
10 In the context of groups, it is conventional to use the word “Abelian” instead of “commutative”.
5. Abelianness. For all [a], [b] ∈ Z*n , [a] · [b] = [b] · [a];

6. Finiteness. There are only finitely many elements in Z*n .
Proof. Exercise.
Example 9.8. Let us compute Z*10 . By Proposition 9.2, it suffices to find all
integers m, 0 ≤ m < 10, that are coprime to 10. Thus Z*10 = {, , , }.
To convince ourselves that Z*10 is closed under the operation of multiplication,
let us construct the multiplication table:

·   1   3   7   9
1   1   3   7   9
3   3   9   1   7
7   7   1   9   3
9   9   7   3   1

We can see that all of the elements in the multiplication table are indeed in Z*10 .
Furthermore, each row, as well as each column, of this table is a permutation of
1, 3, 7 and 9. In the future, we will see that this is not a coincidence.
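Both the list of units and the multiplication table can be reproduced mechanically; a Python sketch (the function name `units` is ours):

```python
from math import gcd

def units(n):
    """Representatives of the units of Z_n: residues coprime to n."""
    return [a for a in range(n) if gcd(a, n) == 1]

U = units(10)
print(U)  # → [1, 3, 7, 9]

# Each row of the multiplication table is a permutation of U,
# illustrating that the set of units is closed under multiplication.
for a in U:
    print(a, [a * b % 10 for b in U])
```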
10
Euler’s Theorem and Fermat’s Little Theorem
We will now prove our first non-trivial result, Euler’s Theorem.
Definition 10.1. Let ϕ(n) denote the number of integers m such that 0 ≤ m < n
and gcd(m, n) = 1. The function ϕ is called Euler’s totient function.
Exercise 10.2. Let #X denote the cardinality of a set X. Let n be a modulus. Prove
that ϕ(n) = #Z*n .

Theorem 10.3. (Euler’s Theorem) If [a] ∈ Z*n , then [a]^ϕ(n) = .
Proof.11 Let k = ϕ(n). Let

[u1 ], [u2 ], . . . , [uk ]
11 Theorem 3.16 in Frank Zorzitto, A Taste of Number Theory.
be the complete list of residues of Z*n . Since Z*n is a group, all the elements

[a] · [u1 ], [a] · [u2 ], . . . , [a] · [uk ]

are in Z*n . Furthermore, no element appears in this list twice, for if [a] · [ui ] =
[a] · [u j ] for some i ≠ j, then [ui ] = [u j ] by property 2 of Proposition 9.2. Hence
[u1 ] · [u2 ] · · · [uk ] = ([a] · [u1 ]) · ([a] · [u2 ]) · · · ([a] · [uk ]).
Since Z*n is an Abelian group, we can rearrange the order of multiplication in order
to obtain
[u1 ] · [u2 ] · · · [uk ] = [a]^k · [u1 ] · [u2 ] · · · [uk ].

Finally, we refer to property 2 of Proposition 9.2 to cancel the unit [u1 ] · [u2 ] · · · [uk ],
and conclude that [a]^k = .
In the language of congruences, Euler’s Theorem translates to
a^ϕ(n) ≡ 1 (mod n)
for every integer that is invertible modulo n.
Example 10.4. Let us prove that 1223 divides 623^1222 − 1. This becomes evident
once we note that ϕ(1223) = 1222 (since 1223 is prime) and gcd(623, 1223) = 1,
so  is a unit in Z1223 . By Euler’s Theorem,

623^1222 ≡ 1 (mod 1223),

which means that 1223 divides 623^1222 − 1.
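Definition 10.1 and Example 10.4 can both be checked directly; a sketch in Python, with `phi` implemented naively from the definition:

```python
from math import gcd

def phi(n):
    """Euler's totient: the count of m with 0 <= m < n and gcd(m, n) = 1."""
    return sum(1 for m in range(n) if gcd(m, n) == 1)

print(phi(1223))             # → 1222 (1223 is prime)
print(pow(623, 1222, 1223))  # → 1, as Euler's Theorem predicts
```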
Corollary 10.5. (Fermat’s Little Theorem) Let p be prime. Then for any integer
a such that p ∤ a it is the case that [a]^{p−1} = . In other words,

a^{p−1} ≡ 1 (mod p).
Proof. Note that any integer a with p ∤ a is congruent modulo p to some a′ with
1 ≤ a′ < p, and gcd(a′ , p) = 1. Thus [a] = [a′ ] is a unit in Z*p . Since
ϕ(p) = p − 1, the result follows from Euler’s Theorem.
The theorems of Euler and Fermat give us a useful tool for raising integers to
high powers modulo n.
Proposition 10.6.12 If n is a modulus, a is coprime to n, and k, ℓ are non-negative
integers such that

k ≡ ℓ (mod ϕ(n)),

then

a^k ≡ a^ℓ (mod n).

Proof. Say k ≤ ℓ. We are given that ℓ = qϕ(n) + k for some q ≥ 0. Then, by
Euler’s Theorem,

a^ℓ = a^{qϕ(n)+k} = (a^{ϕ(n)})^q a^k ≡ 1^q a^k = a^k (mod n).
Example 10.7. Let us compute 17^{7^{155}} modulo 33. Note that ϕ(33) = 20. Since
gcd(17, 33) = 1, by Euler’s theorem it first makes sense to reduce 7^155 modulo
20. We can apply Euler’s Theorem again here. Note that ϕ(20) = 8, and since
gcd(7, 20) = 1 we can see that 7^8 ≡ 1 (mod 20). But then, by Proposition 10.6,

7^155 = 7^{19·8+3} ≡ 7^3 ≡ 343 ≡ 3 (mod 20).

Thus

17^{7^{155}} ≡ 17^3 ≡ 4913 ≡ 29 (mod 33).
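Since Python integers are unbounded, the tower in Example 10.7 can be checked directly against the two-step reduction:

```python
exponent = 7**155             # the full tower exponent, a very large integer
print(exponent % 20)          # → 3, matching the reduction modulo ϕ(33) = 20
print(pow(17, exponent, 33))  # → 29, the direct computation
print(pow(17, 3, 33))         # → 29, as Proposition 10.6 predicts
```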
Exercise 10.8. Compute the integer n, 0 ≤ n < 55, such that

n ≡ 8^{13^{2134}} (mod 55).

11

The Chinese Remainder Theorem
Now that we know how to solve linear congruences, let us try to understand how
to work with systems of congruences. Since the congruence relation ≡ behaves
much like the equality relation =, solving a system of linear congruences with
a single modulus would be very similar to solving a system of linear equations,
which we already know how to handle through the methods of linear algebra.
12 Proposition 3.20 in Frank Zorzitto, A Taste of Number Theory.
On the other hand, if we consider systems of congruences with different moduli,
things might get interesting. We will merely consider the simplest example of
such systems, namely

x ≡ a1 (mod n1 ),
x ≡ a2 (mod n2 ),
. . .
x ≡ ak (mod nk ),
where a1 , a2 , . . . , ak are integers and n1 , n2 , . . . , nk are positive integers greater than
1 that are pairwise coprime. Our goal here is to determine x, which satisfies all of
the k congruences above. The existence of such an x is asserted by the Chinese
Remainder Theorem. Before proceeding to its statement, let us recall Proposition
3.12 and the following consequence of it.
Proposition 11.1. Let m and n be integers greater than 1 that are coprime. Then
the congruence
a ≡ b (mod mn)
is true if and only if both of the congruences
a ≡ b (mod m),
a ≡ b (mod n)
are true.
Proof. Suppose that a ≡ b (mod mn). Then mn | (a − b). But then m | (a − b) and
n | (a − b) so, by definition, a ≡ b (mod m) and a ≡ b (mod n).
To prove the converse, suppose that a ≡ b (mod m) and a ≡ b (mod n). Then
m | (a − b) and n | (a − b). Since gcd(m, n) = 1, we may apply Proposition 3.12
to conclude that mn | (a − b). Thus a ≡ b (mod mn).
Theorem 11.2. (The Chinese Remainder Theorem)13 If m, n are coprime moduli
and a, b are any integers, then the congruences
x ≡ a (mod m),
x ≡ b (mod n)
have a common solution x. Furthermore, any two solutions x, y to this pair of
congruences must be such that x ≡ y (mod mn).
13 Theorem 4.2 in Frank Zorzitto, A Taste of Number Theory.
Proof. Since m and n are coprime, by Bézout’s lemma the equation

mt − ns = b − a

can be solved in integers s and t. Set x := mt + a = ns + b. Note that x ≡ a (mod m)
and x ≡ b (mod n), which makes x a solution to both congruences.
If y is another solution to the system of congruences, then
x ≡ y (mod m),
x ≡ y (mod n).
By Proposition 11.1, we conclude that x ≡ y (mod mn).
We can easily generalize this result to an arbitrary number of pairwise coprime moduli.
Theorem 11.3. (Generalized Chinese Remainder Theorem)14 Suppose n1 , n2 , . . . , nk
are moduli that are pairwise coprime. That is, ni and n j are coprime when i 6= j.
If a1 , a2 , . . . , ak are integers, then there exists an integer x such that

x ≡ a1 (mod n1 ),
x ≡ a2 (mod n2 ),
. . .
x ≡ ak (mod nk ).
Furthermore, if x0 is such a solution of these congruences, then the complete
solution is given by all
x ≡ x0 (mod n1 n2 · · · nk ).
Example 11.4. Let us solve the system of congruences
x ≡ 3 (mod 6),
x ≡ 7 (mod 13).
Since 6 and 13 are coprime, by Bézout’s lemma there exist integers x and y such
that
6x + 13y = 1.
14 Theorem 4.3 in Frank Zorzitto, A Taste of Number Theory.
Note that x = −2 and y = 1 give us an answer. We can multiply both sides of the
above equality by 7 − 3 = 4 to obtain a solution to
6x0 + 13y0 = 7 − 3.
Such a solution is given by x0 = 4·(−2) = −8 and y0 = 1·4 = 4. After rearranging,
we get
3 + 6x0 = 7 − 13y0 = −45.
Note that −45 ≡ 3 (mod 6) and −45 ≡ 7 (mod 13). Since 6 and 13 are coprime, by
the Chinese Remainder Theorem the congruence x ≡ −45 ≡ 33 (mod 78) captures
all integer solutions to the original system of congruences.
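The method of Example 11.4 can be mechanized. The following Python sketch (the function names `extended_gcd` and `crt_pair` are ours, not from the notes) finds a Bézout relation ms + nt = 1, scales it by b − a, and reduces modulo mn:

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and a*s + b*t == g."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = extended_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def crt_pair(a, m, b, n):
    """Solve x ≡ a (mod m), x ≡ b (mod n) for coprime moduli m, n."""
    g, s, t = extended_gcd(m, n)
    assert g == 1, "moduli must be coprime"
    # m*s + n*t = 1; adding m*s*(b - a) to a leaves the residue mod m
    # unchanged and shifts the residue mod n from a to b.
    x = a + m * s * (b - a)
    return x % (m * n)

print(crt_pair(3, 6, 7, 13))  # 33, as in Example 11.4
```

By Theorem 11.2, the returned residue modulo mn captures all integer solutions.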
Exercise 11.5. Solve the system of congruences

x ≡ 3 (mod 5),
x ≡ 5 (mod 7),
x ≡ 7 (mod 11).
12 Polynomial Congruences
The Chinese Remainder Theorem can also be utilized to solve polynomial congruences. Let d be a positive integer and consider a polynomial
f(x) = c_d x^d + c_{d−1} x^{d−1} + . . . + c_1 x + c_0
with integer coefficients c_0, c_1, c_2, . . . , c_d. Then a congruence of the form
f(x) ≡ 0 (mod n)    (4)
is called a polynomial congruence. We would like to find all integers x which
satisfy such a congruence. Note that, if we replace the coefficients c_i of f(x) with
their residue classes [c_i], thus “reducing” our polynomial from Z to Z_n, solving
the congruence (4) is equivalent to solving the equation
f([x]) = [0]
in Z_n. If such an equation is satisfied by some residue class [x_0], we say that [x_0]
is a root of f(x) in Z_n.
Let
n = p1^e1 · p2^e2 · · · pk^ek
be the prime factorization of n. Then, as it turns out, there is a one-to-one correspondence between solutions to the congruence (4) and solutions to the system of
congruences

f(x) ≡ 0 (mod p1^e1),
f(x) ≡ 0 (mod p2^e2),
. . .
f(x) ≡ 0 (mod pk^ek).

This result follows from the next proposition, which is very similar to Proposition
11.1.
Proposition 12.1. Let f (x) ∈ Z[x] be a polynomial. Let m and n be coprime
moduli. Then
f (x) ≡ 0 (mod mn)
if and only if
f(x) ≡ 0 (mod m),
f(x) ≡ 0 (mod n).
Proof. Suppose that f (x) ≡ 0 (mod mn). Then mn | f (x), which means that
m | f (x) and n | f (x).
Conversely, suppose that f(x) ≡ 0 (mod m) and f(x) ≡ 0 (mod n). Then m | f(x) and
n | f(x). Since m and n are coprime, it follows from Proposition 3.12 that mn | f(x).
Coming back to our previous notation, if n = p1^e1 · p2^e2 · · · pk^ek is the prime factorization of n, and integers x_1, x_2, . . . , x_k satisfy
f(x_i) ≡ 0 (mod pi^ei)
for i = 1, 2, . . . , k, then we can find x such that x ≡ x_i (mod pi^ei) for all i using
the Generalized Chinese Remainder Theorem. But then such an x satisfies
f(x) ≡ 0 (mod pi^ei) for all i, and therefore f(x) ≡ 0 (mod n). From here it follows
that, if each congruence f(x) ≡ 0 (mod pi^ei) has s_i solutions, then the congruence
f(x) ≡ 0 (mod n) has s_1 s_2 · · · s_k solutions.
Now we would like to determine how many solutions a polynomial congruence f(x) ≡ 0 (mod p^e) has. Due to time limitations, we will answer this
question only in the case e = 1, and show that there are at most d solutions, where
d is the degree of f(x). We remark that, in general, there are at most d solutions
when p is an odd prime, and at most 2d solutions when p = 2. The most accurate
estimates on the number of solutions of polynomial congruences were established
in 1991 by the Canadian mathematician Cameron L. Stewart, who is currently a
professor at the University of Waterloo.
Proposition 12.2.15 If p is prime and f(x) is a polynomial of degree d with
coefficients in Z_p, then f(x) has at most d roots in Z_p.
Proof. We will prove this result by induction on the degree d of a polynomial
f (x).
Base case. Let d = 0. Then f(x) = α_0 for some non-zero α_0 in Z_p. Clearly,
this polynomial has no roots at all, so the result holds.
Induction hypothesis. Suppose that the result is true for all polynomials of
degrees 0, 1, . . . , d − 1.
Induction step. We will show that the result holds for every polynomial of
degree k = d. Let
f(x) = α_d x^d + α_{d−1} x^{d−1} + . . . + α_1 x + α_0,
where α_d ≠ 0. If f(x) has no roots, then surely 0 ≤ d and we are done. Otherwise f(x) has a root,
say β. Then
f(x) = f(x) − 0
= f(x) − f(β)
= α_d(x^d − β^d) + α_{d−1}(x^{d−1} − β^{d−1}) + . . . + α_1(x − β).
Now recall that, for any integer j ≥ 2, it is the case that
x^j − β^j = (x − β)(x^{j−1} + x^{j−2} β + x^{j−3} β^2 + . . . + x β^{j−2} + β^{j−1}).
Now we see that we can factor out (x − β ) in the expression for f (x) given above,
which means that
f (x) = (x − β )g(x)
for some polynomial g(x) with coefficients in Z_p. Clearly, the degree of g(x) does
not exceed d − 1, so we can apply the induction hypothesis to conclude that g(x)
has at most d − 1 roots.
15 Proposition 5.14 in Frank Zorzitto, A Taste of Number Theory.
Let γ ≠ β be some root of f(x). Then
0 = f(γ) = (γ − β)g(γ).
We claim that g(γ) = 0. Assume otherwise, so that g(γ) ≠ 0 while γ − β ≠ 0. But
then both γ − β and g(γ) are non-trivial zero divisors in Z_p, and this contradicts
Proposition 9.4, which asserts that there are no non-trivial zero divisors in Z_p
whenever p is prime. We conclude that g(γ) = 0.
Since every root of f (x) is either equal to β or one of at most d − 1 roots of
g(x), we conclude that there are at most d roots of f (x).
Example 12.3. Let us solve the polynomial congruence
x^49 + 2x^33 + 24 ≡ 0 (mod 119).
Note that 119 = 7 · 17. By Proposition 12.1, there is a one-to-one correspondence
between the roots of the above congruence and the roots of the system of congruences

x^49 + 2x^33 + 24 ≡ 0 (mod 7),
x^49 + 2x^33 + 24 ≡ 0 (mod 17).
Let us solve each of these congruences separately.
Consider the case n = 7, with ϕ(7) = 6. Note that x ≡ 0 (mod 7) is not a
solution. This means that any solution satisfies gcd(x, 7) = 1, so we may apply Euler's Theorem:
x^49 + 2x^33 + 24 ≡ x^{8·6+1} + 2x^{5·6+3} + 24
≡ x + 2x^3 + 24
≡ 2x^3 + x + 3 (mod 7).
Thus we need to solve the congruence
2x^3 + x + 3 ≡ 0 (mod 7).
After evaluating the left-hand side at x = 1, 2, 3, 4, 5, 6, we can convince ourselves
that there are only two solutions, namely
x ≡ 2 (mod 7) and x ≡ 6 (mod 7).
Consider the case n = 17, with ϕ(17) = 16. Note that x ≡ 0 (mod 17) is not a
solution. This means that any solution satisfies gcd(x, 17) = 1, so we may apply Euler's Theorem:
x^49 + 2x^33 + 24 ≡ x^{3·16+1} + 2x^{2·16+1} + 24
≡ x + 2x + 24
≡ 3x + 24 (mod 17).
Thus we need to solve the congruence
3x + 24 ≡ 0 (mod 17).
We see that x ≡ −8 ≡ 9 (mod 17) is a solution. Since 17 is prime, it follows from
Proposition 12.2 that this is the only solution.
Since there are two solutions modulo 7 and only one solution modulo 17, we
conclude that there are 2 · 1 = 2 solutions modulo 7 · 17 = 119. These solutions
correspond to two systems of congruences:

x ≡ 2 (mod 7), x ≡ 9 (mod 17)

and

x ≡ 6 (mod 7), x ≡ 9 (mod 17).
We can compute solutions modulo 119 using the Extended Euclidean Algorithm.
Consider the first system of congruences. Since 7 and 17 are coprime, by Bézout’s
lemma there exists a solution to
7x + 17y = 1.
For example, x = 5 and y = −2. By multiplying both sides of the above equality
by 9 − 2 = 7, we can find a solution to
7x0 + 17y0 = 9 − 2 = 7,
namely x0 = 7 · 5 = 35 and y0 = 7 · (−2) = −14. But then
x1 = 2 + 7x0 = 9 − 17y0 = 247
satisfies x1 ≡ 2 (mod 7) and x1 ≡ 9 (mod 17). Therefore x1 ≡ 247 ≡ 9 (mod 119)
is a solution. The second system of congruences can be solved analogously and
gives the solution x2 ≡ 111 (mod 119).
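For moduli this small, Example 12.3 can be double-checked by brute force. The following Python sketch (ours, not from the notes) evaluates f(x) = x^49 + 2x^33 + 24 at every residue:

```python
def roots_mod(n):
    """Roots of f(x) = x^49 + 2x^33 + 24 modulo n, found by exhaustion."""
    # pow(x, e, n) reduces modulo n at every step, so the numbers stay small
    return [x for x in range(n)
            if (pow(x, 49, n) + 2 * pow(x, 33, n) + 24) % n == 0]

print(roots_mod(7))    # [2, 6]
print(roots_mod(17))   # [9]
print(roots_mod(119))  # [9, 111]
```

The counts confirm the 2 · 1 = 2 solutions predicted by Proposition 12.1 and the Generalized Chinese Remainder Theorem.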
Exercise 12.4. Give examples of polynomials with coefficients in Z8 and Z15 for
which the conclusion of Proposition 12.2 does not hold.
13 The Discrete Logarithm Problem.
The Order of Elements in Z*_n
Let n be a modulus. We already looked at certain kinds of equations in Z_n. For example, in Section 6, we learned that neither [x]^2 + [y]^2 = [3] in Z_4 nor [x]^2 + [y]^2 + [z]^2 = [7]
in Z_8 has solutions. In Section 8, we studied the equation [a][x] = [b] in Z_n and
saw that the usual application of the Extended Euclidean Algorithm allows us to
produce all of its solutions.
Now we want to understand how to handle exponential equations in Z*_n. In
these kinds of equations, we are given residue classes [a] and [b] from Z*_n, and
we want to determine all integer solutions x to the equation [a]^x = [b]. This is
essentially the same as solving the congruence
a^x ≡ b (mod n).
The problem of finding solutions to these exponential equations is known as the
discrete logarithm problem, or DLP.
Example 13.1. In Section 10, we already saw an example of an exponential equation in Z*_n, namely
a^x ≡ 1 (mod n).
According to Euler's Theorem, this equation always has a non-zero solution whenever a and n are coprime. In particular, any x ≡ 0 (mod ϕ(n)) satisfies the above
congruence, for if x = ϕ(n)k for some integer k, then
a^x ≡ a^{ϕ(n)k} ≡ (a^{ϕ(n)})^k ≡ 1^k ≡ 1 (mod n).
However, this argument does not rule out other solutions.
Depending on the choice of a, there might exist other solutions as well.
In general, the discrete logarithm problem is hard to solve. This problem
lies at the foundation of certain cryptosystems, which we will study later. Examples include the ElGamal encryption scheme and the Diffie-Hellman
key exchange. There are algorithms for solving the discrete logarithm problem,
such as Shanks's baby-step giant-step algorithm, or the number field sieve. None
of these algorithms runs in polynomial time. However, just like for the problem
of integer factorization, there are quantum algorithms which solve the
discrete logarithm problem in polynomial time. In these notes, when solving the
discrete logarithm problem, we will use brute force or apply Euler's Theorem.
In order to understand what solutions to a^x ≡ b (mod n) look like, we need to
understand certain fundamental properties of the group of units Z*_n.
Definition 13.2. If α ∈ Z*_n, the order of α is the smallest exponent k ≥ 1 such that
α^k = 1. The order is denoted by k = ord(α) or, if α = [a] for some integer a, by
k = ord(a).
From Euler's Theorem, it follows that for all α ∈ Z*_n it is the case that ord(α) ≤ ϕ(n).
In fact, a much stronger result holds.
Proposition 13.3.16 Let α ∈ Z*_n. A positive integer m satisfies α^m = 1 if and only if
ord(α) | m. Consequently, ord(α) | ϕ(n).
Proof. Let k = ord(α). We apply the Remainder Theorem and write
m = kq + r,
where 0 ≤ r < k. Then, since α^k = 1, we obtain
1 = α^m = α^{kq+r} = (α^k)^q α^r = 1^q α^r = α^r.
Since k is the smallest positive integer satisfying α^k = 1, it must be the case that
r = 0, so k | m.
For the converse, let m = kq. Then
α^m = α^{kq} = (α^k)^q = 1^q = 1.
Finally, according to Euler's Theorem it is the case that α^{ϕ(n)} = 1. But then it
follows from what we proved above that ord(α) | ϕ(n).
Example 13.4. Let us determine ord(α) in Z*_n for n = 17 and α = [3]. We have
ϕ(n) = 16. Note that D = {1, 2, 4, 8, 16} is the complete list of positive divisors of
ϕ(n). It follows from Proposition 13.3 that ord(α) ∈ D. Thus, in order to find the
order of α, we just need to iterate over all elements in D. The smallest element d
satisfying α^d = 1 is the order. We have
3^1 ≡ 3 (mod 17),
3^2 ≡ 9 (mod 17),
3^4 ≡ (3^2)^2 ≡ 9^2 ≡ 81 ≡ −4 (mod 17),
3^8 ≡ (3^4)^2 ≡ (−4)^2 ≡ 16 ≡ −1 (mod 17),
3^16 ≡ (3^8)^2 ≡ (−1)^2 ≡ 1 (mod 17).
Thus we see that ord(α) = 16, which is the largest possible order that an element
of Z*_17 can attain. Note that there was no need for us to compute 3^16 modulo 17,
because we know the result from Euler's Theorem.
In contrast, consider the element β = [9] in Z*_17. We have
1 ≡ 3^16 ≡ (3^2)^8 ≡ 9^8 (mod 17),
which means that ord(β) ≤ 8. Convince yourself that, in fact, ord(β) = 8.
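The strategy of Example 13.4, testing only the divisors of ϕ(n), translates directly into code. In this Python sketch (ours, not from the notes) the value of ϕ(n) is passed in explicitly:

```python
def order(a, n, phi):
    """Order of a in Z_n*, assuming gcd(a, n) = 1 and phi = phi(n)."""
    # By Proposition 13.3 the order divides phi(n), so we test only
    # the divisors of phi, in increasing order.
    divisors = [d for d in range(1, phi + 1) if phi % d == 0]
    for d in divisors:
        if pow(a, d, n) == 1:
            return d

print(order(3, 17, 16))  # 16: [3] has the largest possible order
print(order(9, 17, 16))  # 8, confirming ord(9) = 8
```
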
16 Proposition 5.5 in Frank Zorzitto, A Taste of Number Theory.
Proposition 13.3 allows us to classify all solutions to the exponential equation
[a]^x = [b].
Proposition 13.5. Let [a], [b] be elements of Z*_n. If x satisfies the equation
[a]^x = [b], then all solutions x′ to this equation satisfy
x′ ≡ x (mod ord(a)).
Proof. Let x be a solution to a^x ≡ b (mod n) and let k = ord(a). By the Remainder
Theorem, we can write
x = kq + r,
where 0 ≤ r < k. But then
a^x ≡ a^{kq+r} ≡ (a^k)^q · a^r ≡ 1 · a^r ≡ a^r (mod n).
Thus, without loss of generality, we may assume that 0 ≤ x < k. Now suppose
that there exists some other x′ such that a^{x′} ≡ b (mod n). Once again, without loss
of generality we may assume that 0 ≤ x ≤ x′ < k. But then
a^x ≡ b ≡ a^{x′} (mod n)
implies
a^{x′−x} ≡ 1 (mod n).
Since 0 ≤ x′ − x < k, it must be the case that x = x′, for otherwise we would get
a contradiction to the fact that k is the smallest positive integer satisfying a^k ≡ 1
(mod n). Therefore all solutions to [a]^x = [b] are of the form x′ ≡ x (mod ord(a)).
Example 13.6. Let us compare the solutions to the exponential equations
3^x ≡ 1 (mod 17) and 9^y ≡ 1 (mod 17).
In the first case, we see that the congruence x ≡ 0 (mod 16) captures all solutions.
However, in the second case, even though y ≡ 0 (mod 16) does provide solutions,
it clearly does not cover all of the possibilities because, for example, y = 8 also
satisfies 9^y ≡ 1 (mod 17). In fact, Proposition 13.5 implies that the solutions are
precisely those of the form y ≡ 0 (mod 8).
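Example 13.6 is small enough to verify by exhaustive search, the brute-force approach mentioned earlier. A short Python sketch (ours, not from the notes):

```python
# All exponents in 0..15 solving 3^x ≡ 1 (mod 17) and 9^y ≡ 1 (mod 17).
sols_3 = [x for x in range(16) if pow(3, x, 17) == 1]
sols_9 = [y for y in range(16) if pow(9, y, 17) == 1]

print(sols_3)  # [0]    -> all solutions are x ≡ 0 (mod 16)
print(sols_9)  # [0, 8] -> all solutions are y ≡ 0 (mod 8)
```
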
We conclude this section with several general observations about orders of
elements of Z*_n.
Proposition 13.7.17 If α ∈ Z*_n and k = ord(α), then the list
α, α^2, α^3, . . . , α^k = 1
does not repeat itself.
Proof. Suppose that we have a repetition α^i = α^j, where 1 ≤ i < j ≤ k. Then
α^{j−i} = 1. Since 1 ≤ j − i < k, this contradicts the minimality of k as the order of
α.
Proposition 13.8.18 If α ∈ Z*_n and k = ord(α), then
ord(α^j) = k / gcd(j, k).
Proof. Let ord(α^j) = ℓ. We will show that ℓ = k/gcd(j, k). Note that
α^{jℓ} = (α^j)^ℓ = 1.
It follows from Proposition 13.3 that k | jℓ. That is, jℓ = ku for some integer u.
But then
(j/gcd(j, k)) ℓ = (k/gcd(j, k)) u,
and since j/gcd(j, k) and k/gcd(j, k) are coprime, it follows from Proposition
3.13 that k/gcd(j, k) divides ℓ.
On the other hand, since k is the order of α,
(α^j)^{k/gcd(j,k)} = (α^k)^{j/gcd(j,k)} = 1^{j/gcd(j,k)} = 1.
By Proposition 13.3 applied to the order of α^j, we obtain that ℓ | k/gcd(j, k).
Since k/gcd(j, k) | ℓ and ℓ | k/gcd(j, k), we conclude that ℓ = k/gcd(j, k).
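The formula ord(α^j) = k/gcd(j, k) can be checked numerically. A Python sketch (ours, not from the notes) verifies it for α = [3] in Z*_17, whose order is k = 16:

```python
from math import gcd

def order(a, n):
    """Order of a in Z_n*, assuming gcd(a, n) = 1, by direct multiplication."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

# ord(3^j) should equal 16 // gcd(j, 16) for every j.
for j in range(1, 17):
    assert order(pow(3, j, 17), 17) == 16 // gcd(j, 16)
print("Proposition 13.8 verified for alpha = [3] in Z*_17")
```
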
Corollary 13.9.19 Let α be an element of Z*_n. Then ord(α^j) = ord(α) if and only if
gcd(j, ord(α)) = 1.
Proposition 13.10.20 Let α, β in Z*_n have orders k and ℓ, respectively. If k and ℓ
are coprime, then
ord(αβ) = kℓ.
17 Proposition 5.6 in Frank Zorzitto, A Taste of Number Theory.
18 Proposition 5.7 in Frank Zorzitto, A Taste of Number Theory.
19 Proposition 5.9 in Frank Zorzitto, A Taste of Number Theory.
20 Proposition 5.16 in Frank Zorzitto, A Taste of Number Theory.
Proof. Let m = ord(αβ). Since
(αβ)^{kℓ} = α^{kℓ} β^{kℓ} = (α^k)^ℓ (β^ℓ)^k = 1^ℓ 1^k = 1,
we see from Proposition 13.3 that m | kℓ.
We will now show that kℓ | m. Since gcd(k, ℓ) = 1, it follows from Proposition
3.12 that we only need to demonstrate k | m and ℓ | m. On one hand,
(α^m)^k = α^{mk} = (α^k)^m = 1^m = 1
and
(β^m)^ℓ = β^{mℓ} = (β^ℓ)^m = 1^m = 1.
On the other hand,
(α^m)^ℓ = (α^m)^ℓ · 1
= (α^m)^ℓ (β^m)^ℓ
= (α^m β^m)^ℓ
= ((αβ)^m)^ℓ
= 1^ℓ
= 1.
It follows from the above calculations, as well as from Proposition 13.3, that k | mℓ.
Since k and ℓ are coprime, Proposition 3.13 allows us to conclude that k | m. We
can carry out an analogous calculation to show that (β^m)^k = 1, which would imply
ℓ | m. But then kℓ | m, and since we already demonstrated that m | kℓ, it must be
the case that m = kℓ.
14 The Primitive Root Theorem
Let n be a modulus. The elements α ∈ Z*_n whose order is equal to ϕ(n) deserve
special attention. According to Proposition 13.7, they generate the whole group
Z*_n: the ϕ(n) powers α, α^2, . . . , α^{ϕ(n)} = 1 are distinct, so they run through all of Z*_n. Such elements are
called primitive roots, and in this section we address the question of their existence
in Z*_n. We will answer this question only partially by proving the Primitive Root
Theorem.
Definition 14.1. An element α ∈ Z*_n is called a primitive root if ord(α) = ϕ(n).
Example 14.2. Let us demonstrate that Z*_17 contains a primitive root. If we reduce
the elements in the list {3, 3^2, 3^3, . . . , 3^16} modulo 17, then the resulting list is
{3, 9, 10, 13, 5, 15, 11, 16, 14, 8, 7, 4, 12, 2, 6, 1}.
Note that all 16 elements are distinct and they constitute the whole Z*_17.
Not every element in Z*_17 is a primitive root. For example, the observation
made above does not hold for the list {9, 9^2, 9^3, . . . , 9^16} reduced modulo 17:
{9, 13, 15, 16, 8, 4, 2, 1, 9, 13, 15, 16, 8, 4, 2, 1}.
The first 8 elements are distinct, and starting from the 9th element the pattern
repeats. Hence 9, 9^2, . . . , 9^{ϕ(n)} = 1 do not produce all of Z*_17, which is not a surprise,
because from Example 13.4 we know that ord(9) = 8.
There are groups which have no primitive roots at all. By the Generalized
Primitive Root Theorem stated at the end of this section, these are exactly the
groups Z*_n where n is not 2, 4, a power of an odd prime, or twice such a power.
Examples include Z*_8, Z*_12 and Z*_15, and we leave it as an exercise to the reader to
verify that each of these three groups has no primitive roots.
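The two lists in Example 14.2 are easy to regenerate. A Python sketch (ours, not from the notes):

```python
# Powers of 3 and of 9 modulo 17, as in Example 14.2.
powers_3 = [pow(3, k, 17) for k in range(1, 17)]
powers_9 = [pow(9, k, 17) for k in range(1, 17)]

print(powers_3)  # 16 distinct values: [3] is a primitive root of Z*_17
print(powers_9)  # the first 8 values repeat: ord(9) = 8, so [9] is not
```
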
Before jumping into the proof of the Primitive Root Theorem, let us determine
how many primitive roots there are in Z*_n.
Proposition 14.3.21 If Z*_n has a primitive root, then the total number of primitive
roots in Z*_n is ϕ(ϕ(n)).
Proof. Let α be a primitive root, so that ord(α) = ϕ(n) and
α, α^2, . . . , α^{ϕ(n)} = 1
cover all of Z*_n without repetition. The other primitive roots are those powers α^j in
the list for which
ord(α^j) = ϕ(n) = ord(α).
According to Corollary 13.9, these are the powers α^j where j from 1 to ϕ(n) is
coprime to ϕ(n), and there are precisely ϕ(ϕ(n)) such j's.
We are now ready to state the Primitive Root Theorem.
Theorem 14.4. (The Primitive Root Theorem)22 Let p be prime. Then Z*_p contains a primitive root.

21 Proposition 5.10 in Frank Zorzitto, A Taste of Number Theory.
22 Theorem 5.17 in Frank Zorzitto, A Taste of Number Theory.
If you are familiar with the basics of group theory, then you can translate the
statement of the theorem into group-theoretical language by saying that the group
Z*_p is cyclic whenever p is prime. In order to prove this result, we need to prove
one lemma.
Lemma 14.5.23 Let p be prime. If α is an element of Z*_p of order k, then
α, α^2, . . . , α^{k−1}, α^k = 1
is the complete, non-repeating list of all β in Z*_p such that β^k = 1.
Proof. According to Proposition 13.7, the list α, α^2, . . . , α^k contains no repetitions. Every α^j in the list satisfies
(α^j)^k = (α^k)^j = 1^j = 1.
Hence every element in the list is a root of the polynomial x^k − 1. Since we
found k distinct roots of the polynomial x^k − 1, whose degree is k, it follows from
Proposition 12.2 that there are no other roots.
Proof. (of Theorem 14.4) Let α be an element of Z*_p. If ord(α) = p − 1, then α
is a primitive root, so we are done. Thus we may assume that k = ord(α) < p − 1.
According to Lemma 14.5, the list α, α^2, . . . , α^k = 1 picks up all roots of x^k − 1
in Z*_p. Since k < p − 1, there is some γ in Z*_p which is not on this list. Hence
γ^k ≠ 1.
Let ℓ = ord(γ). Notice that ℓ ∤ k, for otherwise we would have γ^k = (γ^ℓ)^{k/ℓ} =
1^{k/ℓ} = 1. This means that in the unique factorizations of k and ℓ, there is a prime
number q that appears more often in ℓ than it does in k. Therefore
k = q^d k_1 and ℓ = q^e ℓ_1,
where 0 ≤ d < e and q ∤ k_1, q ∤ ℓ_1.
Let β = α^{q^d} γ^{ℓ_1}. Then, according to Proposition 13.8,
ord(α^{q^d}) = k / gcd(k, q^d) = k / q^d = k_1,
ord(γ^{ℓ_1}) = ℓ / gcd(ℓ, ℓ_1) = ℓ / ℓ_1 = q^e.
Since k_1 and q^e are coprime, it follows from Proposition 13.10 that
ord(β) = ord(α^{q^d} γ^{ℓ_1}) = ord(α^{q^d}) · ord(γ^{ℓ_1}) = q^e k_1 > q^d k_1 = k = ord(α).
In this way, new elements of strictly increasing order can be found in Z*_p, until
we reach some element of the largest possible order ϕ(p) = p − 1. By definition,
this element is a primitive root.

23 Proposition 5.15 in Frank Zorzitto, A Taste of Number Theory.
In conclusion, we provide a statement of the Generalized Primitive Root Theorem, which gives a full classification of the moduli n such that Z*_n contains a
primitive root. Due to time limitations, we will refrain from proving this result.
Theorem 14.6. (Generalized Primitive Root Theorem) The group of units Z*_n contains a primitive root if and only if n = 2, 4, an odd prime power, or an odd prime
power multiplied by two.
15 Big-O Notation
Before we proceed to the discussion of primality tests and integer factorization
algorithms, let us introduce several important definitions. When analyzing the
performance of algorithms, we will often be using the big-O notation and the notion of a polynomial time (or subexponential time or exponential time) algorithm.
Definition 15.1. Let f (n) and g(n) be two functions of n. We say that f (n) =
O(g(n)) if there exists a positive real number M such that | f (n)| ≤ M|g(n)| for all
sufficiently large n.
Example 15.2. Let f(n) = n^2 + 4n + 7 and g(n) = n^3. Note that
f(1) = 12 > g(1) = 1,
f(2) = 19 > g(2) = 8,
f(3) = 28 > g(3) = 27,
f(4) = 39 < g(4) = 64,
f(5) = 52 < g(5) = 125,
. . .
so we see that, even though f(n) dominates g(n) for n = 1, 2, 3, the pattern changes
for n = 4, 5, and in fact it so happens that f(n) < g(n) for all n ≥ 4. Thus f(n) =
O(g(n)). Note, however, that g(n) ≠ O(f(n)).
Another example is f(n) = e^n and g(n) = 5e^n + e^{n/2}. Evidently, f(n) ≤ g(n),
so f(n) = O(g(n)). However, one may also notice that e^{n/2} ≤ e^n, and this implies that
g(n) = 5e^n + e^{n/2} ≤ 5e^n + e^n = 6e^n = 6 f(n),
which means that g(n) = O(f(n)). In this case, we say that f(n) and g(n) have
the same asymptotic behaviour as n approaches infinity.
The big-O notation is used in order to simplify f(n) whenever we are interested not in its precise form, but rather in its behaviour for very large n. For
example, the function
f(n) = n^5 + 2e^n + 3 log(n)
simplifies to f(n) = O(e^n), because 2e^n dominates all other summands present
above (note that 3 log(n) < n^5 < 2e^n for sufficiently large n). Also, according to
our definition, we may ignore the constant 2 in front of 2e^n, because it is present
implicitly in the expression f(n) = O(e^n). Thus, when writing a certain expression
in its big-O form, all that we need to do is to identify some “simple” function that
dominates f(n), and we want to pick this function in the best way possible. Say,
in the example above we could have written f(n) = O(e^{2n}), but this is a less sharp
estimate than f(n) = O(e^n), because e^{2n} grows much faster than e^n. Thus the
expression f(n) = O(e^n) is preferable to f(n) = O(e^{2n}). The most common types of functions that we will
encounter are

O(1)                at most constant growth;
O(log n)            at most logarithmic growth;
O(n^k)              at most polynomial growth (k > 0);
O(exp(c n^{1/k}))   at most subexponential growth (c > 0, k > 1);
O(exp(cn))          at most exponential growth (c > 0).

When analyzing the performance of algorithms, the function f(n) will represent the number of steps required for the algorithm to terminate given the input n. For example, it was proved by Gabriel Lamé that the computation of
gcd(a, b) with the Euclidean algorithm requires at most 5 log10(min{a, b}) steps,
and this allows us to conclude that the performance of the Euclidean algorithm is
O(log(min{a, b})). So the number of steps required for the algorithm to terminate grows logarithmically as min{a, b} approaches infinity.
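Lamé's bound can be checked experimentally. The Python sketch below (ours, not from the notes) counts division steps; we use the classical digit-count form of Lamé's theorem, at most 5 times the number of decimal digits of min{a, b}, and assume a ≥ b ≥ 2:

```python
def euclid_steps(a, b):
    """Number of division steps the Euclidean algorithm takes on (a, b)."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return steps

# Check the digit-count form of Lame's bound on a small sample, b <= a.
for b in range(2, 300):
    for a in range(b, 300):
        assert euclid_steps(a, b) <= 5 * len(str(b))
print("Lame's bound holds on the sample")
```

The worst cases are consecutive Fibonacci numbers, such as (144, 89), which take 10 steps against the bound 5 · 2 = 10.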
Definition 15.3. Suppose that an algorithm takes a positive integer n as its input.
We say that the algorithm works in polynomial time if there exists a positive real
number k such that the number of steps required for it to terminate is O((log n)^k).
Once again, consider the Euclidean Algorithm. As the number of steps required to compute gcd(a, b) is O(log(min{a, b})), we see that we may
take k = 1 in order to conclude that the algorithm works in polynomial time. This
may seem a bit strange, because (log n)^k is not a polynomial function (compare
it to, say, n^2 or n^3 + n + 7, which are polynomials). But when talking about an
algorithm, we are interested in its performance not with respect to an input n,
but rather with respect to the size of an input. You may think of the size of n
as the number of decimal digits of n. This number never exceeds ⌊log10 n⌋ + 1,
so it is logarithmic in terms of n. So, if we provide n = 1000000 as an input to
some algorithm, roughly speaking we would consider it efficient if it terminates
in 7^k steps for some positive integer k (note that 7 is the number of decimal digits
of n) rather than in 1000000^k steps. From this perspective, any algorithm which
works in O(n) = O(e^{log n}) steps would actually be considered an algorithm which
works in exponential time. Such algorithms can be used only
for relatively small values of n.
Example 15.4. Here are some examples of famous algorithms and their asymptotic running times.
• One of the fastest practical algorithms for integer multiplication is the Toom-Cook Multiplication Algorithm, which was invented in 1963. Given two
positive integers a and b, for d = log(max{a, b}) this algorithm requires
O(d^1.585) steps, so it works in polynomial time;
• Shanks's Baby-Step Giant-Step Algorithm, which was invented in 1971, allows one to compute discrete logarithms modulo n. If d = log n, then the
running time of the algorithm is O(√n) = O(e^{d/2}), so it works in exponential time;
• The general number field sieve is the fastest known algorithm for factoring large
integers. If n is an integer and d = log n, the algorithm
works in O(e^{2 d^{1/3} (log d)^{2/3}}) steps. The constant 2 in this expression is not optimal.
We see that this algorithm is neither polynomial nor exponential. These
types of algorithms are called subexponential.
16 Primality Testing
For more details, please refer to the monograph by R. Crandall, C. Pomerance,
Prime Numbers: A Computational Perspective, 2001.
As was mentioned in the introduction, number theory is heavily used in cryptography. In the upcoming sections, we will look at several cryptographic protocols, all of which, in one way or another, involve primality testing. For example,
in order to ensure that the communication provided by the RSA cryptosystem is
secure, one has to be able to generate a pair of very large prime numbers (several
thousand bits). But how do we ensure that some given number n is prime,
when we know that the problem of factoring large integers is infeasible for
electronic computers? It turns out that there are several alternative ways to verify
that n is prime which do not require the factorization of n.
There are three kinds of primality tests out there, namely
1. Heuristic tests — tests that work well in practice, but rest on a heuristic
explanation rather than on a proof (Fermat's Primality Test);
2. Probabilistic tests — given n, these tests verify whether a number n is a
pseudoprime, i.e., it is prime with a very large probability (Miller-Rabin
Primality Test);
3. Deterministic tests — given n, these tests guarantee the primality or the
compositeness of n (trial division, AKS Primality Test, Elliptic Curve Primality Test).
In this section, we will study the trial division method, Fermat's Primality
Test and the Miller-Rabin Primality Test. We remark that the AKS Primality Test was invented by the Indian mathematicians Manindra
Agrawal, Neeraj Kayal and Nitin Saxena in 2002. To this day, it is the only deterministic unconditional polynomial-time algorithm for primality testing. In 2005,
its asymptotic running time was improved by C. Pomerance and H. W. Lenstra, Jr.
to Õ((log n)^6). Despite all of its benefits, the probabilistic Miller-Rabin Primality
Test is used in practice more often. If k denotes the number of times the algorithm
has to run before we conclude that n is a pseudoprime, the asymptotic running
time of the Miller-Rabin Primality Test is O(k(log n)^3).
16.1 Trial Division
What is the most obvious way for determining whether a given integer n ≥ 2 is
composite? Well, one just has to find one of its non-trivial factors! That is, if we
can show that there exists some integer d such that d | n and 1 < d < n, then n is
composite.
For example, if n = 35, we just have to check that 2 ∤ 35, 3 ∤ 35, 4 ∤ 35, until we
find out that 5 | 35. Therefore, 35 is a composite number. Of course, if we instead
consider n = 37, a problem arises, as now we have to check 2 ∤ 37, 3 ∤ 37, . . . ,
36 ∤ 37, until we find out that n is prime. Fortunately, as the following proposition
suggests, there is no need to check all n − 2 numbers between 1 and n to be
certain that n is prime.
Proposition 16.1. For any composite integer n ≥ 2 there exists a divisor d such
that 1 < d ≤ √n. Furthermore, we may assume that d is prime.
Proof. Let n = dk for some non-trivial divisors d and k. If we now suppose
that both d > √n and k > √n, then dk > n, a contradiction. Therefore either
1 < d ≤ √n or 1 < k ≤ √n holds. Without loss of generality, assume the former.
Since Theorem 2.7 asserts the existence of a prime p dividing d, and d ≤ √n, we
see that 1 < p ≤ d ≤ √n.
Now we may adjust our primality test as follows. Let ⌊x⌋ denote the largest
integer ≤ x. According to Proposition 16.1, in order to verify that n is prime, we
just have to ensure that
2 ∤ n, 3 ∤ n, . . . , ⌊√n⌋ ∤ n.
For example, in the case of n = 37, we have ⌊√37⌋ = ⌊6.083⌋ = 6, and 2 ∤ 37,
3 ∤ 37, . . . , 6 ∤ 37. Therefore 37 is prime. Thus we were able to reduce the number
of steps in our primality test from n − 2 to ⌊√n⌋ − 1. Quite a significant improvement!
We can actually do slightly better. According to Proposition 16.1, we can limit
ourselves only to prime divisors of n. So, in the case of n = 37, there was no need
to check its divisibility by 4 or 6, since these numbers are composite. So we could
achieve the same conclusion simply by testing 2 ∤ 37, 3 ∤ 37 and 5 ∤ 37.
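The first of these two improvements, trying all candidate divisors up to ⌊√n⌋, is a few lines of Python (our own sketch, not from the notes):

```python
from math import isqrt  # isqrt(n) computes the integer square root of n

def is_prime_trial(n):
    """Primality by trial division: test every d with 2 <= d <= isqrt(n)."""
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False  # found a non-trivial divisor
    return True

print(is_prime_trial(37))  # True
print(is_prime_trial(35))  # False, since 5 | 35
```

Restricting the loop to primes only, as in the second improvement, requires a precomputed list of primes up to √n.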
In order to make this further improvement, we need to know all prime numbers ≤ √n. Fortunately, there is a rather simple method called the Sieve of Eratosthenes, which allows us to produce all prime numbers up to X in O(X log log X)
steps (see Assignment 3). The method was discovered by the Greek mathematician Eratosthenes of Cyrene (≈ 250 BC), and goes as follows:
1. Initialize a table A of X elements by setting A[1] = 1 and A[i] = 0 for all 2 ≤
i ≤ X;
2. Let p = 2;
3. Set A[2p] = 1, A[3p] = 1, A[4p] = 1, and so on, for all multiples of p in the
table A;
4. Change p to the smallest index k > p such that A[k] = 0. If p > √X, terminate.
Otherwise, return to step 3.
In the end, all indices i ≥ 2 such that A[i] = 0 will correspond to prime numbers.
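The steps above, together with the p^2 refinement discussed next, can be sketched in Python (our own implementation, not from the notes):

```python
def sieve(X):
    """Sieve of Eratosthenes: return all primes <= X."""
    A = [0] * (X + 1)  # A[i] == 0 means "i is not yet known to be composite"
    A[0] = A[1] = 1
    p = 2
    while p * p <= X:
        if A[p] == 0:  # p is prime: cross out its multiples
            # start at p*p, since smaller multiples were already crossed
            # out by smaller primes
            for m in range(p * p, X + 1, p):
                A[m] = 1
        p += 1
    return [i for i in range(2, X + 1) if A[i] == 0]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

As a sanity check, `len(sieve(10**6))` returns 78498, the count of primes below a million quoted later in this section.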
It follows from Mertens' Second Theorem that the asymptotic running time of the
Sieve of Eratosthenes is O(X log log X) (see Assignment 3). This can be further
improved to O(X) if we start eliminating not from 2p (i.e. 2p, 3p, 4p, and so
on), but from p^2, thus crossing out p^2, (p + 1)p, (p + 2)p, etc. The improvement
becomes evident once we note that by the time the algorithm reaches the prime p, the
numbers 2p, 3p, . . . , (p − 1)p already got eliminated by some prime less than p.
Of course, it is impractical to run the Sieve of Eratosthenes up to √n each time
we try to factor n, as then the asymptotic running time will always be O(√n). This
is why in practice one usually runs the Sieve of Eratosthenes up to some large
bound first, then stores all prime numbers in a table, and later uses this table
to factor integers. It follows from the Prime Number Theorem that the number
of primes ≤ X is O(X/log X). So, assuming that the table of prime numbers up
to √n is given to us a priori, the trial division will now take O(√n / log n) steps.
Note the power of this method: for example, given a number n ≤ 10^12, we
just have to check p | n for all primes p ≤ 10^6. Given the table containing the 78498
prime numbers less than a million, this verification can be done by a computer
almost immediately. In fact, this method should work quite fast for all numbers
with at most 18 decimal digits. However, when the number of digits of n exceeds
18, things start to get more complicated: there are too many prime numbers to
check, and it is difficult to fit all of them into memory at once.
16.2 Fermat's Primality Test
Another interesting way of demonstrating that a number n is composite is to use
Fermat's Little Theorem, which states that, if n is prime and a is any integer,
then
a^n ≡ a (mod n).
Therefore all that we have to do to prove that n is composite is to find a such that
a^n ≢ a (mod n). If a satisfies such a property, we call it a witness for the non-primality of n. In practice, the computation of a^n (mod n) can be done relatively
quickly using the Double-and-Add Algorithm.
Example 16.2. Let us use Fermat’s Primality Test to prove that n = 323 is not
prime. Note that
323 = 2^8 + 2^6 + 2 + 1 = 256 + 64 + 2 + 1.
Now pick a random a such that 1 < a < 323, say a = 5. If n is prime, then Fermat’s
Little Theorem should hold for a. We use the Double-and-Add Algorithm to check
whether this is the case:
5^2 ≡ 25,
5^4 ≡ (5^2)^2 ≡ 302,
5^8 ≡ (5^4)^2 ≡ 118,
5^16 ≡ (5^8)^2 ≡ 35,
5^32 ≡ (5^16)^2 ≡ 256,
5^64 ≡ (5^32)^2 ≡ 290,
5^128 ≡ (5^64)^2 ≡ 120,
5^256 ≡ (5^128)^2 ≡ 188 (mod 323).
Thus
5^323 ≡ 5^256 · 5^64 · 5^2 · 5
      ≡ 188 · 290 · 25 · 5
      ≡ 256 · 125
      ≡ 23
      ≢ 5 (mod 323).
This result allows us to conclude that 323 is not prime. Note, however, that if we
randomly picked a = 18, 152, 170 or any other number for which a^323 ≡ a
(mod 323) actually holds, we would not be able to draw any conclusion about n.
Fortunately, for 323 there are only 7 possible a's between 1 and 323 such that
a^323 ≡ a (mod 323), so the probability of this happening is relatively small. And
even if this happens, we can just pick yet another random value of a, for which
a^323 ≢ a (mod 323) might be true.
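The Double-and-Add computation used in Example 16.2 can be sketched as follows (the function name is mine; it processes the binary digits of the exponent from least to most significant):

```python
def power_mod(a, e, n):
    # Square-and-multiply: `square` runs through a, a^2, a^4, ... (mod n),
    # and we multiply it in whenever the corresponding binary digit of e is 1.
    result = 1
    square = a % n
    while e > 0:
        if e & 1:
            result = result * square % n
        square = square * square % n
        e >>= 1
    return result
```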
From Example 16.2, the algorithm becomes clear. Let n be an integer, and let
k ≥ 1 be the maximal number of times that we are going to choose a at random.
Then do the following:
1. Set i = 0;
2. If i = k, conclude that n is a pseudoprime. Otherwise pick a random integer
a such that 1 < a < n;
3. Compute a^n (mod n) using the Double-and-Add Algorithm;
4. If a^n ≢ a (mod n), conclude that n is composite. Otherwise increment i and
go back to step 2.
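The steps above can be sketched in Python (the function name and return values are mine; Python's built-in three-argument pow performs the modular exponentiation):

```python
import random

def fermat_test(n, k=10):
    # Steps 1-4 above: try up to k random bases a with 1 < a < n.
    for _ in range(k):
        a = random.randrange(2, n)
        if pow(a, n, n) != a:        # a is a witness for non-primality
            return "composite"
    return "pseudoprime"
```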
According to this algorithm, we conclude that n is a pseudoprime whenever k
random choices of a result in a^n ≡ a (mod n). In practice, this algorithm works
quite well, even though it is purely heuristic. However, there are some special
composite numbers which do not admit witnesses of their non-primality at all.
Definition 16.3. A composite integer n is called a Carmichael number whenever
a^n ≡ a (mod n)
for all integers a.
There exist infinitely many Carmichael numbers, and the first 10 of them are
561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341.
They were discovered by the American mathematician Robert Carmichael. What
is interesting is that the criterion for determining Carmichael numbers was found
by the German mathematician Alwin Korselt in 1899, even before Carmichael
numbers were discovered.
Theorem 16.4.^24 An integer n is a Carmichael number if and only if
1. n = p_1 · p_2 · · · p_k, where k > 1 and the p_j are primes without repetition;
2. every p_j − 1 divides n − 1.
Therefore every Carmichael number will always be regarded as a pseudoprime
by Fermat's Primality Test, and this is unavoidable.
^24 Theorem 5.21 in Frank Zorzitto, A Taste of Number Theory.
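Korselt's criterion from Theorem 16.4 is easy to test by machine; here is a small sketch (the function name is mine; it factors n by trial division, so it is only meant for modest n):

```python
def is_carmichael(n):
    # Theorem 16.4: n is Carmichael iff n is squarefree, has more than
    # one prime factor, and p - 1 divides n - 1 for every prime p | n.
    if n < 2:
        return False
    m, factors = n, []
    p = 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:       # repeated prime factor: not squarefree
                return False
            factors.append(p)
        else:
            p += 1
    if m > 1:
        factors.append(m)
    if len(factors) < 2:
        return False             # n is prime (or 1), not Carmichael
    return all((n - 1) % (p - 1) == 0 for p in factors)
```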
16.3
Miller-Rabin Primality Test
This test was originally developed by Gary Miller in 1976, and it was deterministic,
but its determinism relied on a reasonable but unproved conjecture called the
Extended Riemann Hypothesis. In 1980, Michael Rabin converted this algorithm
into an unconditional but probabilistic algorithm. This is the algorithm that we are
going to study.
To understand the idea behind the Miller-Rabin primality test, recall that the
congruence
x^2 ≡ 1 (mod p)
has exactly two solutions, namely x ≡ ±1 (mod p), whenever p is prime. This
simply follows from Proposition 12.2 applied to the quadratic polynomial x^2 − 1
with coefficients in Z_p.
Now let n > 2 be prime, and let a be coprime to n. Then n − 1 = 2^s d for some
positive integers s and d, where d is odd. According to Fermat's Little Theorem,

a^(n−1) ≡ a^(2^s d) ≡ (a^(2^(s−1) d))^2 ≡ 1 (mod n).

Thus we see that a^(2^(s−1) d) is a root of x^2 − 1 modulo n. Since n is prime, a^(2^(s−1) d) ≡ ±1
(mod n). If a^(2^(s−1) d) ≡ −1 (mod n), we stop. Otherwise, we can extract the square
root one more time, so that a^(2^(s−2) d) ≡ ±1 (mod n), and so on, until we either reach
a^(2^r d) ≡ −1 (mod n) for some r or a^d ≡ 1 (mod n). We conclude that, if n is prime,
then

• either a^d ≡ 1 (mod n); or

• a^(2^r d) ≡ −1 (mod n) for some r such that 0 ≤ r ≤ s − 1.
Thus, if we could show that

a^d ≢ 1 (mod n)

and

a^(2^r d) ≢ −1 (mod n)

for all r such that 0 ≤ r ≤ s − 1, then n has to be composite. Note that with
Fermat's Primality Test we would only check whether a^(2^s d) ≡ 1 (mod n), whereas in the
Miller-Rabin primality test we perform s checks, for a^d, a^(2d), . . . , a^(2^(s−1) d) (mod n).
As it turns out, this is more than enough to fix many problems that we saw with
Fermat's Primality Test. For example, Carmichael numbers can be recognized as
composite numbers. Furthermore, one can prove that at least 3/4 of a’s coprime
to an odd composite number n are witnesses of n’s compositeness. Therefore,
the probability that the Miller-Rabin Test would fail is at most 1/4, which means
that after k verifications the probability that n is composite while it is reported as
pseudoprime is at most 1/4^k.
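A sketch of the resulting test (the function name is mine; the three-argument pow performs the Double-and-Add computation):

```python
import random

def miller_rabin(n, k=20):
    # Write n - 1 = 2^s * d with d odd, then check for each random base a
    # whether a^d = 1 (mod n) or a^(2^r d) = -1 (mod n) for some 0 <= r < s.
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    s, d = 0, n - 1
    while d % 2 == 0:
        s += 1
        d //= 2
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue                     # this base is not a witness
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a witnesses that n is composite
    return True                          # probably prime (pseudoprime)
```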
Unfortunately, one cannot do better than that and predict the location of witnesses
in Z/nZ. Their distribution can be very different, and this is why choosing
a at random is better than using a = 2, 3, 5, . . . iteratively. For example, Arnault
found a 397-digit composite number for which all bases a < 307 are not witnesses.
This number was reported to be prime by the Maple isprime() function, because
it picked prime bases a = 2, 3, 5, . . . iteratively, rather than randomly.
Example 16.5. Let us apply the Miller-Rabin Primality Test to n = 323 with
base a = 18. Note that a^323 ≡ a (mod n), so if we used Fermat's Primality Test
on n with this base, it would report n as a pseudoprime.
However, 322 = 2 · 161, and we note that

18^161 ≡ 18 ≢ ±1 (mod 323),

so n = 323 is reported as composite by the Miller-Rabin Primality Test.
17
Public Key Cryptosystems.
The RSA Cryptosystem
For more details, please refer to the monograph by W. Trappe, L. C. Washington,
Introduction to Cryptography with Coding Theory, 2nd edition, 2006.
Suppose that Alice wants to send a secret message to Bob, and because they
are too far away from each other and personal communication is impossible, she
needs to send this message over the internet. The channel between Alice's computer
and Bob's computer is unprotected. While travelling from one computer
to the other, the message passes through many different routers, and
it is possible to intercept it by listening on the channel. For example, this can be
done with packet analyzers like WireShark. Though interception of the message is
hardly avoidable, it is possible to protect the information itself through encryption.

Since antiquity, humanity has used what we now call private key
cryptosystems. Perhaps the most famous example of a private key encryption
is the so-called Caesar cypher. According to Suetonius, Julius Caesar used this
cypher in order to encrypt messages of military significance. The cypher shifts the
message by 3 letters to the left: A → X, B → Y , C → Z, D → A, . . . , Y → T , Z → V
(note that we used Latin alphabet instead of English alphabet). For example, the
phrase
DEVS EX MACHINA
can be encrypted using Caesar’s cypher as follows:
ABRP BS IXZEFKX
Now this cypher is not terribly sophisticated, but back in Caesar’s time it was
considered quite complex, and surely the receiver would have to know the magical
number 3 in order to decrypt it by shifting letters three times to the right. So, as
we can see, both the sender and the receiver, along with the encryption/decryption
procedure, must agree on some private key, which in this case is equal to 3. Many
ciphers, such as the Vigenère cipher, the renowned Enigma cipher, or modern
ciphers such as the Digital Encryption Standard (DES) or Rijndael (AES), work
according to this principle: once the sender and the receiver agree on some secret
key, they both can encrypt and decrypt messages, thus being able to communicate
securely. But what if the sender and the receiver are too far away from each other?
If Alice is in Australia and Bob is in Bulgaria, then how can they agree on a secret
key? One answer to this problem is public key cryptography. Key insight:
Alice and Bob don’t even have to agree on a private key in order to send encrypted
messages to each other!
The RSA cryptosystem was invented in 1977 by Ron Rivest, Adi Shamir and
Leonard Adleman. It was the first practical, widely deployed public key cryptosystem.
This is how RSA works. Bob generates two really large distinct prime
numbers p and q, computes n = pq, as well as ϕ(n) = (p − 1)(q − 1). Then he
chooses an encryption exponent e such that
gcd(e, ϕ(n)) = 1,
and solves the congruence
de ≡ 1
(mod ϕ(n))
for d. Then he sends the public key (n, e) to Alice. Alternatively, he can publish
(n, e) on his webpage, thus making this key publicly available to everyone. However,
he does not release the private key (p, q, d). No one knows the values of p, q
and d except for Bob.
Now Alice can use Bob’s public key (n, e) to send messages to Bob securely.
Suppose that Alice wants to send a message written in English. First, she converts
this message into a number m. For example, this can be done using the ASCII
table. According to the ASCII table, every upper or lower case letter of the English
alphabet, every digit, and some special characters like *, $, ! or %, corresponds to some
number between 0 and 127. For example, in the message
Hello!
the letter ‘H’ corresponds to 72, letter ‘e’ corresponds to 101, and so on:
Character | Base 10 | Base 2
H         | 72      | 01001000
e         | 101     | 01100101
l         | 108     | 01101100
o         | 111     | 01101111
!         | 33      | 00100001
We concatenate the base 2 representations of the ASCII numbers corresponding to our
characters, thus obtaining a bigger number m:

m = 01001000 01100101 01101100 01101100 01101111 00100001 (base 2),

where the consecutive bytes encode the characters 'H', 'e', 'l', 'l', 'o', '!'.
Note that each character fits into 1 byte = 8 bits. Since there are 6 characters in
our message, the resulting number m satisfies 0 ≤ m < 2^(6·8) = 2^48. Now, when Bob
receives this number m, he can easily decode the message by reading off 8 bits
at a time and matching them to the corresponding character in the ASCII table.
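The conversion just described can be sketched as follows (the function names are mine; ord and chr give the ASCII code of a character and back):

```python
def message_to_int(text):
    # Concatenate the 8-bit ASCII codes of the characters, as in the table above.
    m = 0
    for ch in text:
        m = (m << 8) | ord(ch)
    return m

def int_to_message(m, nbytes):
    # Read the integer back off, 8 bits at a time.
    chars = []
    for _ in range(nbytes):
        chars.append(chr(m & 0xFF))
        m >>= 8
    return "".join(reversed(chars))
```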
Before encrypting the message, Alice needs to verify that 0 ≤ m < n so that
the information will not get lost during the transmission. If it so happens that
m ≥ n, she breaks the message into k = bm/nc + 1 pieces m1 , m2 , . . . , mk such that
0 ≤ mi < n for all i, 1 ≤ i ≤ k, and then sends m1 , m2 , . . . , mk to Bob consecutively.
Suppose that 0 ≤ m < n. Now Alice uses Bob’s public key (n, e) and computes
the integer c, 0 ≤ c < n, such that
c ≡ m^e (mod n).
This number c is the result of RSA encryption, and Alice sends this encrypted
message to Bob over the unprotected channel.
When Bob receives the encrypted message c, he can decrypt it and obtain the
original message m using the private key d:
c^d ≡ (m^e)^d ≡ m^(de) ≡ m (mod n).

Note that above we utilized the fact that de ≡ 1 (mod ϕ(n)).
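A toy sketch of key generation, encryption and decryption as described above (the function names are mine; real deployments use far larger primes and message padding; pow(e, -1, phi) computes the modular inverse and requires Python 3.8+):

```python
import math

def rsa_keygen(p, q, e):
    # Bob's side: n = pq, phi(n) = (p-1)(q-1), and d with d*e = 1 (mod phi).
    n = p * q
    phi = (p - 1) * (q - 1)
    assert math.gcd(e, phi) == 1
    d = pow(e, -1, phi)
    return (n, e), d              # public key (n, e), private exponent d

def rsa_encrypt(m, public_key):
    n, e = public_key
    assert 0 <= m < n             # the message must fit modulo n
    return pow(m, e, n)

def rsa_decrypt(c, d, n):
    return pow(c, d, n)
```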
Example 17.1. Suppose that Bob chose p = 1597 and q = 4139. Then
n = pq = 1597 · 4139 = 6609983,
ϕ(n) = (p − 1)(q − 1) = 1596 · 4138 = 6604248.
Bob chooses the encryption exponent e = 3263993 and then computes

d ≡ e^(−1) ≡ 3263993^(−1) ≡ 2051801 (mod 6604248).

Now he keeps p, q and d secret, and makes (n, e) publicly available.
Now, in order to send the message “Hi!” to Bob, Alice converts it into an
integer m using the ASCII table:
m = 01001000 01101001 00100001 (base 2) = 4745505,

where the consecutive bytes encode the characters 'H', 'i', '!'.
Alice verifies that 0 ≤ m < n, and then computes the encrypted message c with
the Double-and-Add Algorithm using Bob’s encryption exponent e:
c ≡ m^e ≡ 4745505^3263993 ≡ 673426 (mod 6609983).
Then Alice sends c = 673426 to Bob. Upon receiving it, Bob decrypts the message
using his private key d:

m ≡ c^d ≡ 673426^2051801 ≡ 4745505 (mod 6609983).
After that, using the ASCII table, Bob converts the 3 byte number m into the three
character message "Hi!" which Alice sent to him.
Now, why is this method of communication secure? Suppose that some malicious
adversary Eve managed to eavesdrop on the unprotected channel and intercept
the message c. Since Bob's public key (n, e) is available to everyone, Eve also
knows both n and e. Therefore Eve's goal is, by knowing (n, e) and c, to obtain
m. The most obvious way to solve this problem is to find an integer d such that
de ≡ 1 (mod ϕ(n)). In order to do so, Eve has to compute ϕ(n) = (p − 1)(q − 1)
by knowing n. Unfortunately for Eve, the problem of computing ϕ(n) from n
when n is a composite number is difficult, and requires a factorization of n. To
this day, we do not know any polynomial time factorization algorithms. The best
ones, namely the Quadratic Sieve and the Generalized Number Field Sieve, are
subexponential. Thus, if we choose n large enough (the National Institute
of Standards and Technology (NIST) recommends choosing n > 2^1024), the
factorization of n becomes infeasible for modern electronic computers, even
if the work load is distributed among several supercomputers.
Of course, the numbers p, q and e should be chosen by Bob very carefully.
For example, if either p or q is really small, then it can be located using trial
division. If either p or q is really close to √n = √(pq), say |p − √n| ≤ 2n^(1/4),
then the number n can be factored using Fermat's Factorization Method. If
the prime divisors of either p − 1 or q − 1 are really small, then the number n can
be factored using Pollard's p − 1 Algorithm (see Assignment 3). If e is chosen
so that d is really small, say d < 3^(−1) n^(1/4), then it can be calculated in polynomial
time O(log n) (see Section 6.2.1 in Trappe and Washington).
When sending the message, Alice has to be really cautious as well. For example, if the number m is relatively small in comparison to n, then even without
the knowledge of d or the factorization of n Eve can decrypt the message using
the Short Plaintext Attack (see Section 6.2.2 in Trappe and Washington). To solve
this problem, Alice can pad her message with some random characters either at
the beginning or at the end. So as you can see, there are many things that both
Alice and Bob have to check before establishing a secure communication.
The RSA cryptosystem can be utilized not only for secure communication, but
also for authentication purposes. Imagine a situation when Alice sends a message
m to Bob, and Bob cares not so much about the privacy of their communication,
but rather about the authenticity of the sender. That is, he wants to be absolutely
sure that the message m was sent to him by Alice and no one else. The way
this can be done using RSA is as follows: Alice puts a digital signature s on the
message m using her private key d:
s ≡ m^d (mod n).
Then she sends (m, s) to Bob. When Bob receives the message with Alice's signature,
he can verify that it belongs to Alice by using her public key e and checking
that

m ≡ s^e (mod n).
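The signing and verification steps can be sketched as follows (the function names are mine; this is the textbook scheme without hashing or padding):

```python
def rsa_sign(m, d, n):
    # Alice signs with her PRIVATE exponent d ...
    return pow(m, d, n)

def rsa_verify(m, s, e, n):
    # ... and anyone can check the signature with her PUBLIC exponent e.
    return pow(s, e, n) == m % n
```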
Exercise 17.2. Use your favourite computer algebra system to encrypt the message
m = 12345 with RSA using the public key (n, e) = (786073, 221891). Then
break the system by factoring n = pq, determining the private key d, and then
decrypting the message c = 547988.
Exercise 17.3. Use your favourite computer algebra system to verify that the
message (m, s) = (100, 1580073) belongs to the owner of the public key (n, e) =
(5988889, 4324055). Then break the system and put a fake digital signature s′ on
the message m′ = 1000000, so that (m′, s′) passes the verification with the public
key (n, e).
Exercise 17.4. (Exercise 7 in Trappe and Washington) Naive Nelson uses RSA to
receive a single ciphertext c, corresponding to the message m. His public modulus
is n and his public encryption exponent is e. Since he feels guilty that his system
was used only once, he agrees to decrypt any ciphertext that someone sends him,
as long as it is not c, and return the answer to that person. Eve sends him the
ciphertext 2^e · c (mod n). Show how this allows Eve to find m.
Exercise 17.5. (Exercise 8 in Trappe and Washington) In order to increase security,
Bob chooses n and two encryption exponents e_1, e_2. He asks Alice to encrypt
her message m to him by first computing c_1 ≡ m^(e_1) (mod n), then encrypting c_1 to
get c_2 ≡ c_1^(e_2) (mod n). Alice then sends c_2 to Bob. Does this double encryption
increase security over single encryption? Why or why not?
Exercise 17.6. (Exercise 10 in Trappe and Washington) The exponents e = 1 and
e = 2 should not be used in RSA. Why?
18
The Diffie-Hellman Key Exchange Protocol
There are many benefits to using RSA, but there is one big problem: despite the
fact that it works in polynomial time, it is quite slow. For suppose that we want to
compute

c ≡ m^e (mod n).

The Double-and-Add Algorithm requires at most log e squarings and at most log e
multiplications, thus resulting in at most 2 log e ≤ 2 log n arithmetic operations in
total. Each multiplication involves numbers of size at most log n. The best known
multiplication algorithm, the Toom-Cook Algorithm, requires O((log n)^1.465) steps
to multiply two integers of size at most log n. Since there are at most 2 log n
multiplications, the encryption and decryption require O((log n)^2.465) steps to compute.
Roughly speaking, this means that if n is a 2048 bit number, then one can encrypt
or decrypt messages in 2048^2.465 ≈ 1.45 · 10^8 steps.
Private key cryptosystems (also referred to as symmetric ciphers or block ciphers)
are much faster, because their execution does not involve any complex
mathematical computations. Instead, in order to encrypt the message they
use logical operations, such as AND, OR, NOT and XOR, as well as bit shifts
and bit permutations. The Caesar cipher is an example of a cipher which uses only
shifts, but on letters of the alphabet rather than on bits. Anagrams, like "eHll!o",
are examples of permutations on letters. These operations are very simple and
in fact require only O(1) steps to compute (compare this to multiplication, which
requires O((log n)^1.465)). In the end, both encryption and decryption for these
ciphers require O(log n) steps. The most widely deployed symmetric ciphers are
3-DES (Triple Data Encryption Standard) and AES (Advanced Encryption Standard),
which is also commonly referred to as Rijndael.
As was mentioned in Section 17, in order to use private key cryptosystems,
two parties must agree on a secret key. So how can this be done when Alice and
Bob are too far away from each other? Here is one way: Alice generates a secret
key K, encrypts it using RSA with Bob’s public key, and then sends the encrypted
message to Bob. Bob decrypts the message, and so now Alice and Bob share a
secret K in common. Then they may use whichever symmetric algorithm they
want, such as 3-DES or AES.
But there is another way for Alice and Bob to agree on a common key. This
procedure, called The Diffie-Hellman Key Exchange Protocol, was patented by
Whitfield Diffie and Martin Hellman in 1977. Its security is based on the Discrete
Logarithm Problem, and it works as follows. Alice generates a large prime number
p, an integer g such that 0 ≤ g < p, and an integer x such that 1 ≤ x ≤ p − 2. She
computes g^x (mod p), and then sends p, g and g^x (mod p) to Bob. When Bob
receives p, g and g^x (mod p), he generates an integer y such that 1 ≤ y ≤ p − 2,
computes g^y (mod p), and then sends it back to Alice. Finally, since Alice knows
x and g^y (mod p), she can compute

g^(xy) ≡ (g^y)^x (mod p),

and since Bob knows y and g^x (mod p), he can compute

g^(xy) ≡ (g^x)^y (mod p).

So in the end both Alice and Bob share a secret in common, namely g^(xy) (mod p).
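One run of the protocol can be sketched as follows (the function name is mine; both sides end up with the same value g^(xy) mod p):

```python
import random

def diffie_hellman_demo(p, g):
    # x and y are the secrets of Alice and Bob respectively.
    x = random.randrange(1, p - 1)
    y = random.randrange(1, p - 1)
    A = pow(g, x, p)                 # Alice -> Bob: g^x (mod p)
    B = pow(g, y, p)                 # Bob -> Alice: g^y (mod p)
    return pow(B, x, p), pow(A, y, p)   # the keys each side computes
```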
Why is this secure? If a malicious adversary Eve listened on the communication
between Alice and Bob, she could intercept p, g, g^x (mod p) and
g^y (mod p), and by knowing this information she would have to compute g^(xy)
(mod p). This problem is called the Diffie-Hellman Problem, and it is at least as
hard as the Discrete Logarithm Problem. That is, if Eve knew how to solve
the Discrete Logarithm Problem, she would be able to solve the Diffie-Hellman
Problem (see Assignment 3). However, it is not known whether these two problems
are equivalent. We do not know any polynomial time algorithm for computing
discrete logarithms. The best known subexponential algorithm is due to
Adleman and it utilizes index calculus. The discrete logarithm can be computed
quite fast in some special cases, but if the parameters p, g, x and y are chosen
properly, the problem becomes intractable for modern electronic computers. There
are many things that need to be verified in order to ensure that the communication
is secure, but we will just mention that the parameter g should be chosen so that
ord(g) in Z_p^* is sufficiently large.
As a final remark, we would like to mention that there exists an efficient quantum algorithm for computing discrete logarithms, which was invented by Peter
Shor in 1997.
19
Integer Factorization
The next computational problem that we address is the integer factorization problem.
That is, given a composite integer n, we would like to find a non-trivial
divisor of n. Unlike for primality testing, we do not know any polynomial time
algorithm for integer factorization. Many mathematicians believe that the integer
factorization problem is hard, and several cryptographic protocols, such as RSA,
rest on this assumption. If you want to become a famous mathematician, try
inventing a polynomial time algorithm for integer factorization. Note, however,
that there exists an efficient quantum algorithm for integer factorization, which
was invented by Peter Shor in 1994.

There are many algorithms for integer factorization. The most obvious one,
trial division, we studied in Section 16. Of course, this algorithm allows us to
factor an integer n in O(√n) = O(e^((log n)/2)) steps, so this algorithm is exponential
and is no good for factoring large integers.
In this section, we will study two factorization algorithms, namely Fermat's
Algorithm and its optimized variant, Dixon's Algorithm. The
former is an exponential algorithm and the latter is a subexponential algorithm.
You will also learn about Euler’s Factorization Method in Assignment 3.
19.1
Fermat’s Factorization Method
Fermat's Factorization Method was suggested by the French mathematician Pierre
de Fermat back in the 17th century. The idea is simple: given an integer n, the goal
is to find integers x and y such that

n = x^2 − y^2.
Then
n = (x − y)(x + y),
and if neither x − y nor x + y is equal to 1, this results in a non-trivial factorization
of n. Note that an even number n ≡ 2 (mod 4) cannot be represented in this form, but
we may easily disregard even numbers from consideration, since every even number
greater than 2 always has the non-trivial divisor 2. Unlike such even integers, odd
integers can always be represented as a difference of two perfect squares, for if n = kℓ, then

n = ((k + ℓ)/2)^2 − ((k − ℓ)/2)^2.
Since n is odd, so are k and ℓ, which means that both (k + ℓ)/2 and (k − ℓ)/2
are integers, too. If n = kℓ is a multiple of 4, such a representation is also possible
once we choose both k and ℓ to be even. From the formula above it is also
evident that there can be many representations of an integer as a difference of two
perfect squares.
Let ⌈x⌉ denote the smallest integer ≥ x. We will now convert the observations
above into an algorithm. Given an odd integer n:

1. Put x := ⌈√n⌉ and then set y := x^2 − n;
2. If y is a perfect square, return x − √y; otherwise proceed to Step 3;
3. Increase x by 1 and then set y := x^2 − n;
4. Go back to Step 2.
Note that the algorithm always terminates. Furthermore, if the algorithm returns 1, then the number n must be prime.
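The four steps above can be sketched as follows (the function name is mine; math.isqrt computes integer square roots):

```python
import math

def fermat_factor(n):
    # Fermat's Factorization Method for odd n, following Steps 1-4 above.
    assert n % 2 == 1 and n > 1
    x = math.isqrt(n)
    if x * x < n:
        x += 1                      # x = ceil(sqrt(n))
    while True:
        y = x * x - n
        r = math.isqrt(y)
        if r * r == y:              # y is a perfect square
            return x - r            # returns 1 exactly when n is prime
        x += 1
```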
Example 19.1. Let us use Fermat's Algorithm to factorize n = 8023. Note that
√n ≈ 89.57, so we begin with x = 90 and y = x^2 − n = 90^2 − 8023 = 77. We see
that

x  | y   | y a perfect square?
90 | 77  | no
91 | 258 | no
92 | 441 | yes

Since √441 = 21, we see that

8023 = 92^2 − 21^2 = (92 − 21)(92 + 21) = 71 · 113.

Thus Fermat's Factorization Algorithm terminated in just three steps, resulting in
a non-trivial factor x − √y = 92 − 21 = 71.
Exercise 19.2. Use Fermat’s Algorithm to factor integers 4747 and 7303.
Now let us analyze the performance of the algorithm above. We will count a
single computation of x and y as one step. If n = kℓ and k is the largest divisor of
n such that k ≤ √n, then Fermat's Algorithm will return k as a result. In this case,
the final value of x is (k + ℓ)/2, which means that the number of steps required for
the computation is equal to

(k + ℓ)/2 − ⌈√n⌉.

We can bound this quantity from above as follows:

(k + ℓ)/2 − ⌈√n⌉ ≤ (k + ℓ)/2 − √n
                  = (√k − √ℓ)^2/2
                  = (√n − k)^2/(2k).

We see that, if n is prime, then k = 1 and the algorithm requires O(n) steps to
compute. Therefore, in its worst case, the algorithm is exponential. Note that it is
even worse than trial division, because trial division requires O(√n) steps to
compute.
Why do we care then about Fermat's Factorization Method? First of all, in
some special cases it performs really well. For suppose that k satisfies

√n − k ≤ 2n^(1/4),

so it is relatively close to √n. Then for all n > 6^4 it is the case that

(√n − k)^2/(2k) ≤ 4√n / (2(√n − 2n^(1/4)))
               = 2/(1 − 2n^(−1/4))
               < 3,

which means that Fermat's Algorithm terminates in two steps! Of course, this is
much faster than if we used trial division. This is why Fermat's Factorization
Method is usually used in combination with the Trial Division Method. First
one chooses a constant c > √n, and then Fermat's Algorithm is used to look for
divisors between √n and c. After that, one only has to check prime divisors of
n with the trial division method up to c − √(c^2 − n) instead of √n. Even though
this observation does not allow us to push the bound below O(n^(1/2)), it helps to
decrease the constant implicit in the big-O notation significantly. Further improvements
can be made through sieving, and in 1974 Lehman managed to combine all
of the improvements and invented a factorization algorithm based on Fermat's
Factorization Method and trial division with asymptotic running time O(n^(1/3)).
Though Fermat's Algorithm can be quite slow in its worst case, it lies at
the foundation of the best factorization algorithms known to date, namely the
quadratic sieve and the generalized number field sieve, which have subexponential
asymptotic running time. Both of these algorithms evolved from the factorization
method due to Dixon.
19.2
Dixon’s Factorization Method
Dixon's Factorization Method was proposed in 1981 by the Canadian mathematician
John D. Dixon, who is a professor emeritus at Carleton University, Ottawa.
Recall that in Fermat's Factorization Method we were choosing an integer x between
0 and n and then evaluating x^2 (mod n), hoping that the result would be a
perfect square; that is,

x^2 ≡ y^2 (mod n).
Unfortunately, up to n there are only ⌊√n⌋ perfect squares, and so for very large
n the total proportion of perfect squares less than n tends to zero:

⌊√n⌋/n ≤ √n/n = 1/√n −→ 0.
Dixon's method suggests that, instead of looking for a perfect square, we can actually
construct it from many random samples. The idea is as follows: by picking
distinct x_1, x_2, . . . between 0 and n at random, we obtain relations of the form

x_1^2 ≡ z_1 (mod n),
x_2^2 ≡ z_2 (mod n),
. . .

where z_1, z_2, . . . are integers between 0 and n. One would then hope to select
relations i_1, i_2, . . . , i_r so that the number z_{i_1} z_{i_2} · · · z_{i_r} = y^2 is a perfect square. But
then

(x_{i_1} x_{i_2} · · · x_{i_r})^2 ≡ y^2 (mod n),

which means that one can compute a divisor d of n by evaluating

d = gcd(x_{i_1} x_{i_2} · · · x_{i_r} − y, n).

If it so happens that d = 1 or d = n, we construct a new set of random samples, or
select a different r-tuple i_1, i_2, . . . , i_r with the property described above.
Now the main question is, how do we construct congruences x_i^2 ≡ z_i (mod n)
from which we can produce a non-trivial perfect square? The main idea here is to
pick only those x_i's for which the resulting values z_i are so-called B-smooth
numbers.
Definition 19.3. Let B ≥ 2 be a real number. An integer n is called B-smooth if
for any prime p | n it is the case that p ≤ B.
Example 19.4. For example, numbers 2, 3, 4, 5, 6, 8, 9, 10, 12 are all 5-smooth.
The reason is that every prime p dividing an integer from that list satisfies p ≤ 5.
The numbers 7 and 11, however, are not 5-smooth, but they are both 11-smooth.
Now every time we choose a random x and then evaluate z ≡ x^2 (mod n) such
that 0 ≤ z < n, we need to verify that z is B-smooth. One can check that a given
number z is B-smooth in just O(B) steps using trial division. Note that, if p_1 <
p_2 < . . . < p_k are all the prime numbers ≤ B, then every B-smooth number can be
written in the form

z = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k},

where e_1, e_2, . . . , e_k are non-negative integers. Thus we obtain a vector v = (e_1, e_2, . . . , e_k)
in Z^k. Further, we can reduce the elements of this vector modulo 2, thus obtaining
a vector ṽ = (ẽ_1, ẽ_2, . . . , ẽ_k) in Z_2^k with ẽ_1, ẽ_2, . . . , ẽ_k ∈ {0, 1}. Because Z_2 forms a
field (that is, division by a non-zero element is always allowed), the set Z_2^k constitutes
a k-dimensional vector space over Z_2, which means that we can analyze it
from the perspective of linear algebra. In particular, any collection of k + 1 vectors
in Z_2^k will always be linearly dependent.
Now suppose that for distinct values x_1, x_2, . . . , x_{k+1} we managed to compute
B-smooth values z_1, z_2, . . . , z_{k+1}, which correspond to vectors ṽ_1, ṽ_2, . . . , ṽ_{k+1} in
Z_2^k. Since Z_2^k has dimension k, it must be the case that the vectors ṽ_1, ṽ_2, . . . , ṽ_{k+1} are
linearly dependent in Z_2^k. But then there must exist indices i_1, i_2, . . . , i_r for some
r ≤ k + 1 such that

v_{i_1} + v_{i_2} + . . . + v_{i_r} ≡ 0 (mod 2),

which means that z_{i_1} z_{i_2} · · · z_{i_r} is a perfect square. In order to find such linearly
dependent vectors ṽ_{i_1}, ṽ_{i_2}, . . . , ṽ_{i_r} in Z_2^k, we row reduce the (k + 1) × k matrix

M = [ṽ_1, ṽ_2, . . . , ṽ_{k+1}]^T,

whose coefficients belong to Z_2. Note that the row reduction requires O(k^3) =
O(B^3) steps. At this point, writing y^2 = z_{i_1} z_{i_2} · · · z_{i_r}, we can compute the value

d = gcd(x_{i_1} x_{i_2} · · · x_{i_r} − y, n)

and, in case d = 1 or d = n, repeat the procedure of choosing distinct random
values x_1, x_2, . . . , x_{k+1} once again.
The only thing that is left for us to establish is the value of B. As it turns out,
the optimal choice for B is B = e^{O(√(log n log log n))}, so the asymptotic running
time of Dixon's algorithm is subexponential.
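A small sketch of the whole method (the names are mine, and so is one simplification: instead of row reducing the matrix M, this toy version searches all subsets of a handful of stored relations for an even combined exponent vector, which is fine for a tiny factor base):

```python
import math
import random
from itertools import combinations

def exponent_vector(z, primes):
    # Trial-divide z over the factor base; None means z is not B-smooth.
    exps = [0] * len(primes)
    for i, p in enumerate(primes):
        while z % p == 0:
            exps[i] += 1
            z //= p
    return exps if z == 1 else None

def dixon(n, primes=(2, 3, 5, 7), tries=200000, seed=1):
    rng = random.Random(seed)
    relations = []                    # pairs (x, exponent vector of x^2 mod n)
    for _ in range(tries):
        x = rng.randrange(2, n)
        g = math.gcd(x, n)
        if g > 1:                     # lucky: x already shares a factor with n
            return g
        v = exponent_vector(x * x % n, primes)
        if v is None:
            continue
        relations.append((x, v))
        if len(relations) > len(primes) + 3:
            relations.pop(0)          # keep the subset search small
        # look for a subset whose combined exponent vector is even
        for r in range(1, len(relations) + 1):
            for subset in combinations(relations, r):
                total = [sum(es) for es in zip(*(v for _, v in subset))]
                if any(e % 2 for e in total):
                    continue
                X = 1
                for xi, _ in subset:
                    X = X * xi % n
                y = 1
                for p, e in zip(primes, total):
                    y = y * pow(p, e // 2, n) % n
                d = math.gcd(X - y, n)
                if 1 < d < n:
                    return d          # non-trivial divisor of n
    return None                       # no factor found within `tries` samples
```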
Exercise 19.5. In this exercise, we will use Dixon’s method to find a non-trivial
factor of 34081.
(a) Factorize integers 15, 486, 24010 to ensure that they are all 7-smooth;
(b) Suppose that the execution of Dixon’s Factorization Algorithm allowed us to
locate the congruences
805^2 ≡ 486 (mod 34081);
846^2 ≡ 15 (mod 34081);
954^2 ≡ 24010 (mod 34081).
Using the above congruences, as well as the factorizations obtained in Part (a),
find integers x and y such that
x^2 ≡ y^2 (mod 34081),
and then use these x and y to compute a non-trivial factor of 34081.
20
Quadratic Congruences
Let n ≥ 3 be a modulus and a, b, c be arbitrary integers. We will now turn our
attention to the quadratic congruence

ax^2 + bx + c ≡ 0 (mod n).

We require that n ∤ a, for otherwise the above congruence would reduce to the
linear congruence bx + c ≡ 0 (mod n). Also, if n = 2, by Fermat's Little Theorem
x^2 ≡ x (mod 2) regardless of x. Thus

ax^2 + bx + c ≡ (a + b)x + c (mod 2),

so once again we obtain a linear congruence. Thus it is reasonable to assume that
n ≥ 3. Finally, for the simplicity of exposition, we will assume that n is an odd
prime, and we will indicate that by writing p instead of n. Note that the integer
p − 1 is even.
In this section, we will not aim to solve quadratic congruences. Instead, we
will investigate when solutions exist. Note that it follows from Proposition 12.2
that the polynomial [a][x]^2 + [b][x] + [c] has at most 2 roots in Z_p.
Proposition 20.1.^25 Let p be an odd prime, and a, b, c be integers where p ∤ a.
Then the congruence

ax^2 + bx + c ≡ 0 (mod p)

has a solution x if and only if the congruence

y^2 ≡ b^2 − 4ac (mod p)

has a solution y. In that case, y ≡ 2ax + b (mod p).
Proof. Multiply both sides of the quadratic congruence by 4a to get

4a^2 x^2 + 4abx + 4ac ≡ 0 (mod p).

This can be rewritten as

(2ax + b)^2 − b^2 + 4ac ≡ 0 (mod p),

which is the same as

(2ax + b)^2 ≡ b^2 − 4ac (mod p).

^25 Proposition 6.1 in Frank Zorzitto, A Taste of Number Theory.
Conversely, suppose that y is a solution to y^2 ≡ b^2 − 4ac (mod p). Note that we can solve the linear congruence 2ax + b ≡ y (mod p) for x, because [2a] is a unit in Z_p. Thus

(2ax + b)^2 ≡ y^2 ≡ b^2 − 4ac (mod p),

which is the same as

4a^2 x^2 + 4abx + 4ac ≡ 0 (mod p).

Since [4a] is a unit in Z_p, we can multiply both sides of the above congruence by (4a)^{−1} (mod p) in order to obtain

ax^2 + bx + c ≡ 0 (mod p).

Therefore the x which satisfies 2ax + b ≡ y (mod p) is a solution to the original quadratic congruence.
Proposition 20.1 tells us that solving the quadratic congruence

ax^2 + bx + c ≡ 0 (mod p)

is equivalent to solving the simplified quadratic congruence

x^2 ≡ d (mod p),

where d = b^2 − 4ac. The integer d is called the discriminant of the quadratic polynomial aX^2 + bX + c. Thus, in order to find solutions to x^2 ≡ d (mod p), we need to understand which residue classes of Z_p are squares.
Definition 20.2. A residue α in Z_p is called a quadratic residue when α ∈ Z*_p and α = β^2 for some residue β in Z*_p. If no such β exists, then α is called a quadratic nonresidue.

When translated to the language of congruences, we say that an integer a is a quadratic residue modulo an odd prime p if p ∤ a and a ≡ x^2 (mod p) for some integer x.

Example 20.3. Let us find all quadratic residues in Z*_13. We note that

[1]^2 = [1], [2]^2 = [4], [3]^2 = [9], [4]^2 = [3],
[5]^2 = [12], [6]^2 = [10], [7]^2 = [10], [8]^2 = [12],
[9]^2 = [3], [10]^2 = [9], [11]^2 = [4], [12]^2 = [1].

Thus the quadratic residues are [1], [3], [4], [9], [10], [12].
Exercise 20.4. Determine all quadratic residues in Z?17 , Z?19 and Z?23 .
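Both the example and the exercise are easy to check by brute force. The sketch below (the helper name `quadratic_residues` is ours) squares every unit of Z*_p:

```python
def quadratic_residues(p):
    """Return the sorted list of quadratic residues in Z_p* for an odd
    prime p, obtained by squaring every unit 1, 2, ..., p - 1."""
    return sorted({(a * a) % p for a in range(1, p)})

print(quadratic_residues(13))  # [1, 3, 4, 9, 10, 12]
```

Consistent with Proposition 20.5 below, the list always contains exactly (p − 1)/2 elements.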
Proposition 20.5. Let p be an odd prime. Then the group of units Z*_p has exactly (p − 1)/2 quadratic residues and exactly (p − 1)/2 quadratic nonresidues.

Proof. Note that, for any [a] in Z*_p, it is the case that [a]^2 = (−[a])^2. Thus it is sufficient to look at a's such that 1 ≤ a ≤ (p − 1)/2. We now claim that all the elements in the collection

[1]^2, [2]^2, . . . , [(p − 1)/2]^2

are distinct. Suppose not, and [a]^2 = [b]^2 = [c] for some 1 ≤ a < b ≤ (p − 1)/2 and some residue [c]. Then both [a] and [b] are roots of the polynomial X^2 − [c] in Z_p. By Proposition 12.2, such a polynomial has at most 2 roots in Z_p. However, we see that it has at least 4 distinct roots, namely ±[a] and ±[b] (these are distinct because 1 ≤ a < b ≤ (p − 1)/2). Thus we obtain a contradiction. Therefore the above collection has no repetitions, so Z*_p contains (p − 1)/2 quadratic residues. Since every element of Z*_p which is not a quadratic residue is a quadratic nonresidue, we conclude that there are exactly (p − 1)/2 quadratic nonresidues.
Definition 20.6. For an odd prime p and an integer a coprime with p, we let

(a/p) := +1 if a is a quadratic residue modulo p; −1 if a is a quadratic nonresidue modulo p.

The symbol (a/p) is called the Legendre symbol for a modulo p.

Example 20.7. Note that (8/17) = +1 while (6/17) = −1.

Also, for any odd prime p it is clear that 1 is a quadratic residue, i.e. (1/p) = +1. However, the value of (−1/p) varies with p. For example,

(−1/13) = +1 while (−1/19) = −1.
We will now give an alternative proof of Proposition 20.5 using primitive roots.

Proof. (of Proposition 20.5) Since p is an odd prime, it follows from the Primitive Root Theorem that there exists a primitive root γ in Z*_p. That is, for every residue α in Z*_p there exists an integer j, 1 ≤ j ≤ p − 1, such that α = γ^j.

First of all, let us demonstrate that it is impossible to represent α by both odd and even powers of γ. For suppose that α = γ^i = γ^j for some 1 ≤ i ≤ j. Then γ^{j−i} = 1. By Proposition 13.3, ord(γ) | j − i. Since ord(γ) = p − 1, we conclude that the even number p − 1 divides j − i. Hence j − i is even, so either both i and j are odd or both i and j are even.

Now recall that, since γ is a primitive root in Z*_p, the elements γ, γ^2, . . . , γ^{p−1} are distinct, and half of them are even powers of γ. These even powers are precisely the squares, so they are the quadratic residues. On the other hand, all odd powers of γ are quadratic nonresidues.

Proposition 20.8. Let p be an odd prime and let α and β be elements of Z*_p. Then:

• If α and β are quadratic residues, then αβ is a quadratic residue;

• If α is a quadratic residue and β is a quadratic nonresidue, then αβ is a quadratic nonresidue;

• If α and β are quadratic nonresidues, then αβ is a quadratic residue.

Proof. Since p is an odd prime, it follows from the Primitive Root Theorem that there exists a primitive root γ in Z*_p. Then α = γ^i and β = γ^j, so αβ = γ^{i+j}. Now, as we saw in the second proof of Proposition 20.5, if α and β are quadratic residues, then both i and j are even, which means that i + j is even as well. Therefore αβ = (γ^{(i+j)/2})^2 is a quadratic residue. The other two statements can be proved analogously.

The propositions above suggest one algorithm for calculating the Legendre symbol (a/p): first find a primitive root γ in Z*_p, and then determine the parity of x in γ^x = [a]. Fortunately, Euler came up with a much simpler procedure.
Proposition 20.9. (Euler's Test)^26 If p is an odd prime and a is an integer such that p ∤ a, then

(a/p) ≡ a^{(p−1)/2} (mod p).

In other words, if a is a quadratic residue, then a^{(p−1)/2} ≡ +1 (mod p), and if a is a quadratic nonresidue, then a^{(p−1)/2} ≡ −1 (mod p).

Proof. Let [b] be a primitive root in Z*_p. Suppose that a is a quadratic residue. Then

a ≡ b^{2j} (mod p)

for some non-negative integer j. Thus

a^{(p−1)/2} ≡ (b^{2j})^{(p−1)/2} ≡ b^{(p−1)j} ≡ (b^j)^{p−1} ≡ 1 (mod p).

Thus (a/p) = +1, as claimed.
Now suppose that a is a quadratic nonresidue. Then

a ≡ b^{2j+1} (mod p)

for some non-negative integer j. Then

a^{(p−1)/2} ≡ (b^{2j+1})^{(p−1)/2} ≡ b^{(p−1)/2} · b^{(p−1)j} ≡ b^{(p−1)/2} (mod p).

Note that

(b^{(p−1)/2})^2 ≡ b^{p−1} ≡ 1 (mod p),

so the residue class [b^{(p−1)/2}] is a root of the polynomial X^2 − [1] in Z_p. Since p is an odd prime, by Proposition 12.2, this polynomial has at most two roots. In fact, it has exactly two roots, namely X = ±[1]. Therefore

b^{(p−1)/2} ≡ ±1 (mod p).

Note that it cannot happen that b^{(p−1)/2} ≡ 1 (mod p), because then the order of [b] would be strictly less than p − 1 = ϕ(p), which contradicts the fact that [b] is a primitive root in Z*_p. Therefore

b^{(p−1)/2} ≡ −1 (mod p),

^26 Proposition 6.8 in Frank Zorzitto, A Taste of Number Theory.
and so we conclude that, when a is a quadratic nonresidue,

a^{(p−1)/2} ≡ −1 (mod p).

Therefore for any a such that p ∤ a it is the case that a^{(p−1)/2} ≡ (a/p) (mod p).

Corollary 20.10.^27 The integer −1 is a quadratic residue modulo an odd prime p if and only if p ≡ 1 (mod 4).

Proof. By Euler's Test,

(−1/p) ≡ (−1)^{(p−1)/2} (mod p).

Since both sides of the above congruence are equal to ±1, this congruence is actually an equality. The result then follows from the fact that

(−1)^{(p−1)/2} = 1 if p ≡ 1 (mod 4); −1 if p ≡ 3 (mod 4).
Example 20.11. Is a = 138 a quadratic residue modulo p = 557? We use Euler's Test to answer this question. Note that (p − 1)/2 = 278. We can now compute a^{(p−1)/2} (mod p) using the Double-and-Add algorithm:

a^{(p−1)/2} ≡ 138^{278} ≡ −1 (mod 557).

Therefore 138 is a quadratic nonresidue modulo 557.

Exercise 20.12. Compute (51/199), (364/503) and (273/461) using Euler's Test.
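Euler's Test is a one-liner with fast modular exponentiation; Python's built-in three-argument `pow` plays the role of the Double-and-Add algorithm (the function name `euler_test` is ours):

```python
def euler_test(a, p):
    """Legendre symbol (a/p) for an odd prime p with p not dividing a,
    computed via Euler's Test: (a/p) ≡ a^((p-1)/2) (mod p)."""
    r = pow(a, (p - 1) // 2, p)  # fast modular exponentiation
    return 1 if r == 1 else -1

print(euler_test(138, 557))  # -1, matching Example 20.11
```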
At the end of this section, let us take a look at one curious application of the theory of quadratic residues.

Proposition 20.13.^28 There are infinitely many primes congruent to 1 modulo 4.

^27 Proposition 6.10 in Frank Zorzitto, A Taste of Number Theory.
^28 Proposition 6.11 in Frank Zorzitto, A Taste of Number Theory.
Proof. Suppose we have a finite list of primes p_1, p_2, . . . , p_n congruent to 1 modulo 4. We will show how to produce yet another prime congruent to 1 modulo 4 that is not on this list. Let

x = (2 · p_1 · p_2 · · · p_n)^2 + 1.

Let q be any prime factor of x. If q ∈ {2, p_1, p_2, . . . , p_n}, then q divides both x and x − 1 = (2 · p_1 · p_2 · · · p_n)^2, so q | 1, which is impossible. Since q divides x, we see that

−1 ≡ (2 · p_1 · p_2 · · · p_n)^2 (mod q),

which means that −1 is a quadratic residue modulo q. But then it follows from Corollary 20.10 that q ≡ 1 (mod 4). Thus we were able to produce one more prime which is not in the original list of primes. Repeating this procedure with the list p_1, p_2, . . . , p_n, p_{n+1} = q, we can produce yet another prime congruent to 1 modulo 4, and so on. Hence we can generate infinitely many distinct primes that are congruent to 1 modulo 4.
21 The Law of Quadratic Reciprocity

Let p ≥ 3 be prime and a be an integer such that p ∤ a. We have already seen several approaches for computing (a/p), for example Euler's Test. In this section, we will investigate one more approach invented by Gauss. In fact, he established what we now call the Law of Quadratic Reciprocity, which encapsulates a very important relationship between the symbols (p/q) and (q/p) for distinct odd primes p and q.
We begin by proving the following proposition on the multiplicativity of the
Legendre symbol.
Proposition 21.1.^29 The Legendre symbol is multiplicative. That is, if p is an odd prime and a, b are integers coprime to p, then

(ab/p) = (a/p)(b/p).

Furthermore, if a ≡ b (mod p), then

(a/p) = (b/p).

^29 Proposition 6.15 in Frank Zorzitto, A Taste of Number Theory.
Proof. The second statement is obvious, because congruent integers determine the same residue class, and hence are quadratic residues (or nonresidues) simultaneously.

To prove that (ab/p) = (a/p)(b/p) for any a and b coprime to p, we apply Euler's Test (see Proposition 20.9):

(a/p)(b/p) ≡ a^{(p−1)/2} · b^{(p−1)/2} ≡ (ab)^{(p−1)/2} ≡ (ab/p) (mod p).

Since (a/p)(b/p) = ±1 and (ab/p) = ±1, and these two integers are congruent modulo p ≥ 3, they have to be identical.

By the Fundamental Theorem of Arithmetic, every positive integer a > 1 is a product of primes. That is, a = q_1 q_2 · · · q_n for some primes q_1, q_2, . . . , q_n, with repetitions allowed. By Proposition 21.1,

(a/p) = (q_1/p)(q_2/p) · · · (q_n/p).

Also, if a is a negative integer, then a = −1 · b for some positive integer b, which means that

(a/p) = (−1/p)(b/p).

We conclude that, in order to determine the value of (a/p), one has to explore the values of (q/p) for distinct primes p and q.
Essentially, for any fixed prime q, the Law of Quadratic Reciprocity allows us to understand what values the Legendre symbol (q/p) takes as the odd prime p varies. As a very simple example, let us explore the case q = 2.

Proposition 21.2.^30 If p is an odd prime then

(2/p) = +1 if p ≡ 1, 7 (mod 8); −1 if p ≡ 3, 5 (mod 8).

Proof. Suppose p = 8k + 1 for some positive integer k. There are 4k = (p − 1)/2 even integers between 1 and p, namely

2, 4, 6, . . . , 4k − 2, 4k, 4k + 2, 4k + 4, . . . , 8k − 2, 8k.

^30 Proposition 6.14 in Frank Zorzitto, A Taste of Number Theory.
Let us compute their product:

x = 2 · 4 · 6 · · · (4k − 2) · (4k) · (4k + 2) · (4k + 4) · · · (8k − 2) · (8k)
  = 2^{4k} (1 · 2 · 3 · · · (2k) · (2k + 1) · (2k + 2) · · · (4k − 1) · (4k))
  = 2^{4k} (4k)!

However,

4k + 2 ≡ 1 − 4k (mod p)
4k + 4 ≡ 3 − 4k (mod p)
. . .
8k − 2 ≡ −3 (mod p)
8k ≡ −1 (mod p).

Using the above information, we can compute x (mod p) as follows:

x ≡ 2 · 4 · 6 · · · (4k − 2) · (4k) · (1 − 4k) · (3 − 4k) · (5 − 4k) · · · (−3) · (−1)
  ≡ 2 · 4 · 6 · · · (4k) · (4k − 1) · (4k − 3) · (4k − 5) · · · 3 · 1 · (−1)^{2k}
  ≡ (4k)! (mod p).

We conclude that

2^{4k} (4k)! ≡ (4k)! (mod p).

After cancelling (4k)! on both sides we obtain

2^{(p−1)/2} ≡ 2^{4k} ≡ 1 (mod p).

By Euler's Test, the integer 2 is a quadratic residue modulo p. The cases p ≡ 3, 5, 7 (mod 8) can be studied analogously and are left as an exercise to the reader.

Since we managed to understand how (q/p) behaves for the fixed prime q = 2, one would hope that such a result can be established for all other primes. Indeed, this can be achieved with the Law of Quadratic Reciprocity, proved by the German mathematician Carl Friedrich Gauss at the age of 19.
Theorem 21.3. (Gauss's Law of Quadratic Reciprocity)^31 Let p and q be distinct odd prime numbers. Then

(p/q)(q/p) = (−1)^{((p−1)/2) · ((q−1)/2)}.

In other words,

(q/p) = (p/q) if p ≡ 1 (mod 4) or q ≡ 1 (mod 4);
(q/p) = −(p/q) if p ≡ 3 (mod 4) and q ≡ 3 (mod 4).

^31 Theorem 6.16 in Frank Zorzitto, A Taste of Number Theory.
The proof is quite non-trivial, so due to time limitations we will not present it
in class or in these notes. If you would like to see the proof, see Section 6.4 in
Frank Zorzitto, A Taste of Number Theory.
Example 21.4. Let us examine how the value of (3/p) depends on the odd prime p. By the Law of Quadratic Reciprocity,

(3/p)(p/3) = (−1)^{((p−1)/2) · ((3−1)/2)} = (−1)^{(p−1)/2}.

Multiplying both sides of the above equality by (p/3), we obtain

(3/p) = (−1)^{(p−1)/2} (p/3).

Now there are two cases to consider:

1. Suppose that p ≡ 1 (mod 4). Then (3/p) = (p/3), so the value of (3/p) depends on the congruence class of p modulo 3. Note that (1/3) = +1 and (2/3) = −1. We conclude that (3/p) = +1 if

p ≡ 1 (mod 4) and p ≡ 1 (mod 3),

and (3/p) = −1 if

p ≡ 1 (mod 4) and p ≡ 2 (mod 3).

Since 3 and 4 are coprime, we can apply the Chinese Remainder Theorem to conclude that (3/p) = +1 when p ≡ 1 (mod 12) and (3/p) = −1 when p ≡ 5 (mod 12).

2. Analogously, we can analyze the case p ≡ 3 (mod 4). We have (3/p) = −(p/3), which means that (3/p) = +1 if

p ≡ 3 (mod 4) and p ≡ 2 (mod 3),

and (3/p) = −1 if

p ≡ 3 (mod 4) and p ≡ 1 (mod 3).

Applying the Chinese Remainder Theorem, we see that (3/p) = +1 when p ≡ 11 (mod 12) and (3/p) = −1 when p ≡ 7 (mod 12).

We conclude that

(3/p) = +1 if p ≡ 1, 11 (mod 12); −1 if p ≡ 5, 7 (mod 12).
Exercise 21.5. Determine for which odd primes p the Legendre symbols (±5/p) and (±7/p) are equal to +1 or −1.
Example 21.6. Let us determine the value of (247/479). Note that 247 = 13 · 19, 13 ≡ 1 (mod 4) and 19, 479 ≡ 3 (mod 4). Then we may use the multiplicativity of the Legendre symbol and the Law of Quadratic Reciprocity as follows:

(247/479) = (13/479)(19/479)
          = (479/13) · (−(479/19))
          = −(11/13)(4/19)          since 479 ≡ 11 (mod 13) and 479 ≡ 4 (mod 19)
          = −(11/13)(2/19)^2
          = −(11/13)
          = −(13/11)
          = −(2/11)                 since 13 ≡ 2 (mod 11)
          = 1.

Note that the last equality holds because the only quadratic residues in Z*_11 are [1], [3], [4], [5] and [9]. Since [2] is not in this list, it is a quadratic nonresidue, so (2/11) = −1.
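The recipe of Example 21.6 can be automated. The sketch below (the function name is ours) actually computes the Jacobi symbol — the standard generalization that obeys the same multiplicativity, (2/p), and reciprocity rules — which coincides with the Legendre symbol whenever the lower argument is an odd prime:

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via the Jacobi-symbol
    algorithm: pull out factors of 2 using Proposition 21.2, and flip
    (a/p) -> (p/a) using the Law of Quadratic Reciprocity."""
    a %= p
    if a == 0:
        return 0
    result = 1
    while a > 1:
        if a % 2 == 0:                 # factor of 2: (2/p) = -1 iff p ≡ 3, 5 (mod 8)
            if p % 8 in (3, 5):
                result = -result
            a //= 2
        else:                          # reciprocity: sign flips iff both ≡ 3 (mod 4)
            if a % 4 == 3 and p % 4 == 3:
                result = -result
            a, p = p % a, a
    return result

print(legendre(247, 479))  # 1, matching Example 21.6
```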
22 Multiplicative Functions

The last 16 sections were all devoted to the theory of congruences, and at this point it is time to switch gears and move towards other topics. In this section, we begin our first foray into analytic number theory.
In analytic number theory, we utilize the tools of real or complex analysis in
order to answer some questions in number theory. For example, the techniques of
analytic number theory allow us to explain the asymptotic behaviour of functions
π(x) = #{p ≤ x : p is prime}
or
Q(x) = #{n ≤ x : n is squarefree}.
Here #X denotes the cardinality of the set X. The study of analytic number theory
begins with the introduction of multiplicative and totally multiplicative functions.
Definition 22.1. A non-zero function f : N → C is called multiplicative if for any
coprime positive integers m and n it is the case that
f (mn) = f (m) f (n).
Definition 22.2. A non-zero function f : N → C is called totally multiplicative if
for any positive integers m and n, not necessarily coprime, it is the case that
f (mn) = f (m) f (n).
Example 22.3. Here are some examples of multiplicative and totally multiplicative functions:

1. The indicator function I(n) is totally multiplicative:

I(n) = 1 if n = 1; 0 if n ≠ 1;

2. The constant function 1(n) is totally multiplicative: 1(n) = 1 for all n.

3. The identity function i(n) is totally multiplicative: i(n) = n for all n.

4. The Legendre symbol (n/p) for a fixed odd prime p is totally multiplicative, in accordance with Proposition 21.1;

5. The Euler totient function ϕ(n) is multiplicative, but not totally multiplicative;

6. The number of divisors function τ(n) is multiplicative, but not totally multiplicative:

τ(n) = #{d : d | n, d > 0};

7. The sum of divisors function σ(n) is multiplicative, but not totally multiplicative:

σ(n) = ∑_{d|n, d>0} d;

8. The Möbius function is multiplicative, but not totally multiplicative (you will prove this fact in Assignment 5):

μ(n) = 1 if n = 1; 0 if n is not squarefree; (−1)^k if n is squarefree with k distinct prime factors.
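For experimentation, the functions of Example 22.3 can be sketched by brute force as follows (all names are ours; `tau` and `sigma` simply enumerate divisors, while `mobius` inspects the factorization):

```python
def factorize(n):
    """Prime factorization of n >= 2 by trial division, as a dict {p: e}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def tau(n):
    """Number of positive divisors of n."""
    return len([d for d in range(1, n + 1) if n % d == 0])

def sigma(n):
    """Sum of positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def mobius(n):
    """Möbius function: 0 if n is not squarefree, else (-1)^(number of primes)."""
    f = factorize(n) if n > 1 else {}
    if any(e >= 2 for e in f.values()):
        return 0
    return (-1) ** len(f)
```

For instance, `tau(12)` returns 6 and `sigma(12)` returns 28, and one can spot-check multiplicativity on coprime pairs.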
We will now explore some properties of multiplicative functions.
Proposition 22.4. 32 If m and n are coprime positive integers, then every positive
divisor d of their product mn comes from a unique pair of integers a and b such
that
a | m, b | n and ab = d.
Proof. If the unique factorizations of m and n are given by

m = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} and n = q_1^{f_1} q_2^{f_2} · · · q_ℓ^{f_ℓ},

then mn = p_1^{e_1} · · · p_k^{e_k} q_1^{f_1} · · · q_ℓ^{f_ℓ}, and every positive divisor d of mn takes the form

d = p_1^{r_1} p_2^{r_2} · · · p_k^{r_k} q_1^{s_1} q_2^{s_2} · · · q_ℓ^{s_ℓ},

where 0 ≤ r_i ≤ e_i and 0 ≤ s_j ≤ f_j. If we now set

a = p_1^{r_1} p_2^{r_2} · · · p_k^{r_k} and b = q_1^{s_1} q_2^{s_2} · · · q_ℓ^{s_ℓ},

it becomes obvious that a | m, b | n and ab = d.

^32 Proposition 8.2 in Frank Zorzitto, A Taste of Number Theory.
Now we need to confirm that the above a and b are unique. Suppose that there
exist positive integers c and e such that c | m, e | n and ec = d. Then ce = ab. Since
c | m and b | n, it must be the case that c and b are coprime. Therefore c | a. By a
symmetric argument, a | c, whence a = c, and then b = e.
Proposition 22.5. Let f : N → C be a multiplicative function. Then:

1. f(1) = 1;

2. The function f(n) is fully determined by its values at prime powers;

3. The function g(n) given by

g(n) := ∑_{d|n, d>0} f(d)

is multiplicative.
Proof. Property 1 is obvious, because
f (n) = f (1 · n) = f (1) f (n).
By definition, f (n) is non-zero, so there exists some n such that f (n) 6= 0. For such
n, we may cancel f (n) on both sides of the above equality, thus leaving f (1) = 1.
To establish property 2, let n = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} be the prime factorization of n. Then

f(n) = f(p_1^{e_1}) f(p_2^{e_2} · · · p_k^{e_k})              since gcd(p_1^{e_1}, p_2^{e_2} · · · p_k^{e_k}) = 1;
     = f(p_1^{e_1}) f(p_2^{e_2}) f(p_3^{e_3} · · · p_k^{e_k})  since gcd(p_2^{e_2}, p_3^{e_3} · · · p_k^{e_k}) = 1;
     · · ·
     = f(p_1^{e_1}) f(p_2^{e_2}) · · · f(p_k^{e_k}).

Thus if we know the values of f(p^e) for all prime powers p^e, we know the values of f(n) for all positive integers n.

To establish property 3, we use Proposition 22.4:

g(mn) = ∑_{d|mn} f(d)
      = ∑_{a|m, b|n} f(ab)        by Proposition 22.4;
      = ∑_{a|m, b|n} f(a) f(b)    since gcd(a, b) = 1 and f is multiplicative;
      = ( ∑_{a|m} f(a) ) ( ∑_{b|n} f(b) )
      = g(m) g(n).
Proposition 22.6. The Euler totient function ϕ(n) is multiplicative. Furthermore, if n = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} is the prime factorization of n, then

ϕ(n) = (p_1^{e_1} − p_1^{e_1−1})(p_2^{e_2} − p_2^{e_2−1}) · · · (p_k^{e_k} − p_k^{e_k−1}).
Proof. For an integer x, let us use the notation [x]n to indicate the residue class of
x modulo n.
Let m and n be coprime integers exceeding 1. We will show that Z*_{mn} is in one-to-one correspondence with the Cartesian product

Z*_m × Z*_n = {(α, β) : α ∈ Z*_m, β ∈ Z*_n}.

Let [x]_{mn} ∈ Z*_{mn}. Then gcd(x, mn) = 1, which means that gcd(x, m) = 1 and gcd(x, n) = 1. But then [x]_m and [x]_n must be units in Z_m and Z_n respectively, so [x]_m ∈ Z*_m and [x]_n ∈ Z*_n.

Conversely, if [a]_m ∈ Z*_m and [b]_n ∈ Z*_n, then by the Chinese Remainder Theorem there exists some [x]_{mn} ∈ Z_{mn} such that

[x]_m = [a]_m ∈ Z*_m and [x]_n = [b]_n ∈ Z*_n.

Therefore x is coprime to both m and n, and so x is coprime to mn. Thus we conclude that [x]_{mn} ∈ Z*_{mn}.

Now that we have seen that there exists a one-to-one correspondence between Z*_{mn} and Z*_m × Z*_n, we can conclude that

#Z*_{mn} = #(Z*_m × Z*_n).

But since the cardinality of a Cartesian product is equal to the product of the cardinalities of the individual sets, i.e.

#(Z*_m × Z*_n) = #Z*_m · #Z*_n,

with the help of Exercise 10.2 we can conclude that

ϕ(mn) = #Z*_{mn} = #(Z*_m × Z*_n) = #Z*_m · #Z*_n = ϕ(m)ϕ(n).

In order to establish the formula for ϕ(n), recall that according to property 2 of Proposition 22.5 it is sufficient to compute ϕ(p^e) for a prime power p^e. The only positive integers less than p^e that are not coprime to it are p, 2p, 3p, . . . , (p^{e−1} − 1)p. There are p^{e−1} − 1 numbers like that in total, which means that

ϕ(p^e) = (p^e − 1) − (p^{e−1} − 1) = p^e − p^{e−1}.

Now that we know the formula for ϕ(p^e) when p^e is a prime power, it is straightforward to write down the general formula for ϕ(n) because it is multiplicative.
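The formula of Proposition 22.6 translates directly into code; the sketch below (the name `phi` is ours) factors n by trial division and multiplies together the factors p^e − p^{e−1}:

```python
def phi(n):
    """Euler's totient via Proposition 22.6:
    phi(n) is the product of p^e - p^(e-1) over prime powers p^e dividing n exactly."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:
                n //= d
                e += 1
            result *= d ** e - d ** (e - 1)
        d += 1
    if n > 1:
        result *= n - 1  # leftover prime factor with exponent e = 1
    return result

print(phi(100))  # 40
```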
Proposition 22.7. The number of divisors function τ(n) is multiplicative. Furthermore, if n = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} is the prime factorization of n, then

τ(n) = (e_1 + 1)(e_2 + 1) · · · (e_k + 1).

Proof. To see that τ(n) is multiplicative, let n ≥ 2 be an integer and consider the prime factorization of n:

n = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k}.

Then every divisor d of n must be of the form

d = p_1^{f_1} p_2^{f_2} · · · p_k^{f_k},

where 0 ≤ f_i ≤ e_i for all i = 1, 2, . . . , k. Each f_i has e_i + 1 possibilities, so we see that there are exactly

τ(n) = (e_1 + 1)(e_2 + 1) · · · (e_k + 1)

possible divisors of n.

Now suppose that

m = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} and n = q_1^{f_1} q_2^{f_2} · · · q_ℓ^{f_ℓ}

are coprime, i.e. the prime numbers p_1, p_2, . . . , p_k, q_1, q_2, . . . , q_ℓ are distinct. Then

τ(mn) = (e_1 + 1)(e_2 + 1) · · · (e_k + 1)(f_1 + 1)(f_2 + 1) · · · (f_ℓ + 1) = τ(m)τ(n),

which means that τ(n) is a multiplicative function.
Proposition 22.8. The sum of divisors function σ(n) is multiplicative. Furthermore, if n = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} is the prime factorization of n, then

σ(n) = ((p_1^{e_1+1} − 1)/(p_1 − 1)) ((p_2^{e_2+1} − 1)/(p_2 − 1)) · · · ((p_k^{e_k+1} − 1)/(p_k − 1)).

Proof. To see that σ(n) is multiplicative, note that

σ(n) = ∑_{d|n, d>0} d = ∑_{d|n, d>0} i(d),

where i(n) = n is the identity function. Since the identity function i(n) is multiplicative, it follows from property 3 of Proposition 22.5 that σ(n) is multiplicative as well.

In order to establish the formula for σ(n), recall that according to property 2 of Proposition 22.5 it is sufficient to compute σ(p^e) for a prime power p^e. The divisors of p^e are 1, p, p^2, . . . , p^e, so

σ(p^e) = 1 + p + p^2 + . . . + p^e = (p^{e+1} − 1)/(p − 1).

Note that the last equality holds because the sequence 1, p, . . . , p^e constitutes an (e + 1)-term geometric progression with first element 1 and common ratio p. Now that we know the formula for σ(p^e) when p^e is a prime power, it is straightforward to write down the general formula for σ(n) because it is multiplicative.
23 The Möbius Inversion

From now on, when writing d | n, we will always assume that the divisor d is positive. As we shall see, the Möbius function

μ(n) = 1 if n = 1; 0 if n is not squarefree; (−1)^k if n is squarefree with k distinct prime factors

plays a crucial role in analytic number theory.
Proposition 23.1.^33 For every n ≥ 1,

∑_{d|n} μ(d) = I(n).

Proof. Let g(n) = ∑_{d|n} μ(d). Note that

g(1) = μ(1) = 1 = I(1).

Now let n ≥ 2. Since μ(n) is multiplicative, it follows from property 3 of Proposition 22.5 that g(n) is multiplicative as well. By property 2 of Proposition 22.5, it suffices to check that g(p^e) = 0 for every prime power p^e. We have

g(p^e) = ∑_{d|p^e} μ(d)
       = μ(1) + μ(p) + μ(p^2) + . . . + μ(p^e)
       = 1 − 1 + 0 + . . . + 0
       = 0
       = I(p^e),
so the result follows.
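Proposition 23.1 is easy to verify numerically. In the sketch below (names ours), `mobius_sum(n)` computes ∑_{d|n} μ(d), which should equal 1 for n = 1 and 0 for every larger n:

```python
def mobius(n):
    """Möbius function computed from the prime factorization of n."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:   # d^2 divided the original n: not squarefree
                return 0
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

def mobius_sum(n):
    """Left-hand side of Proposition 23.1: sum of mu(d) over divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

print(mobius_sum(1), mobius_sum(12), mobius_sum(30))  # 1 0 0
```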
The Möbius function is important because it allows us to express the function
f in terms of g whenever these two functions are connected by the relation
g(n) = ∑ f (d).
d|n
The operation of expressing f through g is called the Möbius inversion.
Proposition 23.2.^34 If f and g are arbitrary functions, not necessarily multiplicative, that are defined on the set of positive integers and satisfy

g(n) = ∑_{d|n} f(d)

for all n ≥ 1, then

f(n) = ∑_{d|n} g(d) μ(n/d) = ∑_{d|n} g(n/d) μ(d).

^33 Proposition 8.6 in Frank Zorzitto, A Taste of Number Theory.
^34 Theorem 8.7 in Frank Zorzitto, A Taste of Number Theory.
Proof. First, note that for a positive integer n and a pair of positive integers d, e it is the case that de | n if and only if d | n and e | n/d.

Second, note that

∑_{d|n} g(n/d) μ(d) = ∑_{d|n} ( ∑_{e|n/d} f(e) ) μ(d)
                   = ∑_{d|n, e|n/d} f(e) μ(d)
                   = ∑_{de|n} f(e) μ(d)
                   = ∑_{e|n, d|n/e} f(e) μ(d)
                   = ∑_{e|n} ( ∑_{d|n/e} μ(d) ) f(e)
                   = ∑_{e|n} I(n/e) f(e)
                   = f(n).
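The inversion formula can be checked mechanically for any starting function; in the sketch below (all names ours), we pick an arbitrary `f`, build `g` from it, and recover `f` again via Proposition 23.2:

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

def f(n):            # an arbitrary test function
    return n * n + 3

def g(n):            # g(n) = sum of f(d) over d | n
    return sum(f(d) for d in divisors(n))

def f_recovered(n):  # Möbius inversion: sum of g(n/d) * mu(d) over d | n
    return sum(g(n // d) * mobius(d) for d in divisors(n))
```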
Before proceeding to examples of the Möbius inversion, let us prove the following fact about the Euler totient function ϕ(n).
Proposition 23.3.^35 For every positive integer n,

∑_{d|n} ϕ(d) = n.

Proof. Let g(n) = ∑_{d|n} ϕ(d). By property 3 of Proposition 22.5, the function g(n) is multiplicative. Therefore, by property 2 of Proposition 22.5, it is sufficient to understand its values g(p^e) for prime powers p^e. Using the formula given in Proposition 22.6, we obtain

g(p^e) = ϕ(1) + ϕ(p) + ϕ(p^2) + . . . + ϕ(p^e)
       = 1 + (p − 1) + (p^2 − p) + . . . + (p^e − p^{e−1})
       = p^e.

And now, since g(n) is multiplicative, for any integer n with prime factorization n = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} we may conclude that

g(n) = g(p_1^{e_1}) g(p_2^{e_2}) · · · g(p_k^{e_k}) = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} = n.

^35 Proposition 8.4 in Frank Zorzitto, A Taste of Number Theory.
Now that we established the connection between the identity function i(n) and
the Euler totient function ϕ(n), we can write down a new formula for ϕ(n) via the
Möbius inversion.
Example 23.4. Let us prove that for every positive integer n it is the case that

ϕ(n) = n ∑_{d|n} μ(d)/d.

By Proposition 23.3, the identity function i(n) and the Euler totient function ϕ(n) are connected by means of the relation

i(n) = ∑_{d|n} ϕ(d).

Now the Möbius inversion formula tells us that

ϕ(n) = ∑_{d|n} μ(d) i(n/d) = ∑_{d|n} μ(d) (n/d) = n ∑_{d|n} μ(d)/d.
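The identity of Example 23.4 can be checked in exact rational arithmetic (names ours; `Fraction` avoids any rounding in the sum of μ(d)/d):

```python
from fractions import Fraction
from math import gcd

def mobius(n):
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

def phi(n):
    """Totient by direct count of residues coprime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def phi_via_mobius(n):
    """Example 23.4: phi(n) = n * sum over d | n of mu(d)/d."""
    s = sum(Fraction(mobius(d), d) for d in range(1, n + 1) if n % d == 0)
    return int(n * s)

print(phi_via_mobius(12), phi(12))  # 4 4
```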
Example 23.5. Note that

σ(n) = ∑_{d|n} d,

which means that there is a connection between the sum of divisors function σ(n) and the identity function i(n), namely σ(n) = ∑_{d|n} i(d). But then it follows from the Möbius inversion formula that

n = ∑_{d|n} μ(d) σ(n/d).

Exercise 23.6. The von Mangoldt function, denoted by Λ(n), is defined as

Λ(n) = log p, if n = p^k for some prime p and integer k ≥ 1; 0, otherwise.

Prove that

log n = ∑_{d|n} Λ(d),

and then use the Möbius inversion to establish the formula

Λ(n) = −∑_{d|n} μ(d) log d.
24 The Prime Number Theorem

In 1797 or 1798, Legendre conjectured that the number of primes up to x is approximated by the function x/(A log x + B), where A and B are some constants. According to the recollections of Gauss, "in the year 1792 or 1793", when he was 15 or 16 years old, he made a similar observation. In simple terms, this conjecture states that, up to x, there are "roughly" x/log x prime numbers.
The Prime Number Theorem is a theorem which confirms the conjecture made
by Legendre and Gauss. It is one of the most renowned results in Analytic Number Theory. The Prime Number Theorem was proved independently by Jacques
Hadamard and Charles Jean de la Vallée-Poussin in 1896.
Theorem 24.1. (The Prime Number Theorem) Let

π(x) := #{p ≤ x : p is prime}.

Then

lim_{x→∞} π(x) / (x/log x) = 1.

A more accurate statement of the Prime Number Theorem is the following one:

π(x) = Li(x) + O(x e^{−a √(log x)}),

where a is a positive constant and

Li(x) = ∫_2^x dt/log t.

Indeed, the function Li(x) describes the behaviour of the prime counting function more precisely than x/log x. In this form, we also see the error term, which tells us how far the value of π(x) is from the value of Li(x).
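The statement of the Prime Number Theorem is easy to probe numerically; the sketch below (names ours) sieves the primes up to 10,000 and compares π(x) with x/log x:

```python
from math import log

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, n + 1) if sieve[p]]

x = 10_000
pi_x = len(primes_up_to(x))
print(pi_x, x / log(x))  # 1229 versus roughly 1085.7
```

The ratio π(x)/(x/log x) is about 1.13 here and creeps toward 1 very slowly as x grows, which is why the sharper approximation Li(x) is preferred in practice.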
The analytic proof of the Prime Number Theorem heavily relies on complex analysis, so it is not "elementary". More precisely, it requires some delicate analysis of the (non-trivial) zeros of the Riemann zeta function

ζ(s) := ∑_{n=1}^∞ 1/n^s,

where s is a complex number with Re(s) > 1. The elementary proof of the Prime Number Theorem was discovered half a century later, in 1948, by the Norwegian mathematician Atle Selberg.^36

Since the proof was introduced, the error term O(x e^{−a √(log x)}) has been improved many times. If the Riemann Hypothesis is true, the error term can be improved to O(√x log x). The Riemann Hypothesis concerns the distribution of the non-trivial zeros of ζ(s). It is undoubtedly one of the hardest open mathematical problems. At the University of Waterloo, there are several experts who work in the area of Analytic Number Theory and on problems related to the distribution of zeros of the Riemann zeta function, including Yu-Ru Liu and Michael Rubinstein.
It is worthwhile mentioning a very interesting elementary argument of Erdős, which explains why the function x/log x "captures" the behaviour of π(x). The proof does not involve any analytic techniques and should be quite accessible to second or third year undergraduate students in mathematics. To those who are interested in the subject, we recommend this proof for further reading.

Theorem 24.2. (Erdős, 1949) For x ≥ 2,

((3 log 2)/8) · x/log x < π(x) < (6 log 2) · x/log x.

Proof. See Theorem 4 in https://uwaterloo.ca/pure-mathematics/sites/
25 The Density of Squarefree Numbers

In this section, we will see one basic analytical result on the density of squarefree numbers.

^36 On the history of the elementary proof of the Prime Number Theorem and Selberg's dispute with Erdős, see the article of D. Goldfeld, The elementary proof of the prime number theorem: an historical perspective, 2003.
Theorem 25.1. Let

Q(x) = #{n ≤ x : n is squarefree}.

Then the natural asymptotic density of squarefree numbers is given by

lim_{x→∞} Q(x)/x = 6/π^2 ≈ 0.6079.
In other words, Theorem 25.1 tells us that over 60% of all positive integers are
squarefree. Before proceeding to the proof, let us establish the following simple
lemma.
Lemma 25.2. Let f(n) be a multiplicative function such that the series

∑_{n=1}^∞ |f(n)|

converges. Then

∑_{n=1}^∞ f(n) = ∏_{p prime} (1 + f(p) + f(p^2) + . . .).

Proof. For a fixed positive number y, the following identity holds:

∏_{p prime, p<y} (1 + f(p) + f(p^2) + . . .) = ∑_{n : p|n ⇒ p<y} f(n),

where the sum runs over all n whose prime factors are all less than y. As y approaches infinity, the right hand side approaches ∑_{n=1}^∞ f(n), while the left hand side approaches the desired Euler product.
Since the series ∑_{n=1}^∞ |f(n)| converges, it must be the case that

∑_{n≥y} |f(n)| → 0

as y approaches infinity. We can utilize this fact in order to show that, as y approaches infinity,

| ∑_{n=1}^∞ f(n) − ∑_{n : p|n ⇒ p<y} f(n) | = | ∑_{n : ∃p|n with p≥y} f(n) | ≤ ∑_{n≥y} |f(n)| → 0.

This observation allows us to conclude that

∑_{n=1}^∞ f(n) = lim_{y→∞} ∑_{n : p|n ⇒ p<y} f(n)
             = lim_{y→∞} ∏_{p prime, p<y} (1 + f(p) + f(p^2) + . . .)
             = ∏_{p prime} (1 + f(p) + f(p^2) + . . .).
Proof. (of Theorem 25.1) Note that

μ^2(n) = 1, if n is squarefree; 0, otherwise,

which means that

Q(x) = ∑_{n≤x} μ^2(n).

Let ℓ(n) denote the largest integer such that ℓ(n)^2 | n, and note that d^2 | n if and only if d | ℓ(n). Then it follows from Proposition 23.1 that

μ^2(n) = I(ℓ(n)) = ∑_{d | ℓ(n)} μ(d) = ∑_{d : d^2 | n} μ(d).

As it turns out, this formula is much easier to analyze than μ^2(n).
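The identity μ^2(n) = ∑_{d^2|n} μ(d) derived above can be verified directly (names ours):

```python
from math import isqrt

def mobius(n):
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

def is_squarefree(n):
    return all(n % (d * d) != 0 for d in range(2, isqrt(n) + 1))

def mu_squared_via_sum(n):
    """Right-hand side: sum of mu(d) over all d >= 1 with d^2 | n."""
    return sum(mobius(d) for d in range(1, isqrt(n) + 1) if n % (d * d) == 0)
```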
Now let {x} := x − ⌊x⌋ denote the fractional part of x. Note that {x} satisfies 0 ≤ {x} < 1 for any x. Then

Q(x) = ∑_{n≤x} μ^2(n)
     = ∑_{n≤x} ∑_{d : d^2|n} μ(d)
     = ∑_{d≤√x} μ(d) ( ∑_{n≤x, d^2|n} 1 )
     = ∑_{d≤√x} μ(d) ⌊x/d^2⌋
     = ∑_{d≤√x} μ(d) ( x/d^2 − {x/d^2} )
     = x ∑_{d≤√x} μ(d)/d^2 − ∑_{d≤√x} μ(d) {x/d^2}.
Since |μ(d){x/d^2}| < 1, we conclude that

Q(x) = x ∑_{d≤√x} μ(d)/d^2 − ∑_{d≤√x} μ(d) {x/d^2}
     < x ∑_{d≤√x} μ(d)/d^2 + ∑_{d≤√x} 1
     = x ∑_{d≤√x} μ(d)/d^2 + ⌊√x⌋
     ≤ x ∑_{d=1}^∞ μ(d)/d^2 − x ∑_{d>√x} μ(d)/d^2 + √x.
Now observe that

| ∑_{d>√x} μ(d)/d^2 | ≤ ∑_{d>√x} 1/d^2 < ∫_{⌊√x⌋}^∞ dt/t^2 = 1/⌊√x⌋ ≤ 2/√x.

Above we utilized the fact that √x ≤ 2⌊√x⌋ for all x ≥ 2. For convenience, define the constant c as

c := ∑_{d=1}^∞ μ(d)/d^2.
Then

Q(x) ≤ cx − x ∑_{d>√x} μ(d)/d^2 + √x
     < cx + x · (2/√x) + √x
     = cx + 3√x.

Through analogous observations, we can also establish the lower bound on Q(x), and obtain the final relation

cx − 3√x < Q(x) < cx + 3√x.
Now the only thing that is left for us to do is to compute c. Recall that

∑_{n=1}^∞ 1/n^2 = π^2/6.

This result was proved by Leonhard Euler in 1734. Further, by an argument analogous to the second proof of Theorem 2.10, we see that

π^2/6 = ∑_{n=1}^∞ 1/n^2 = ∏_{p prime} (1 + 1/p^2 + 1/p^4 + . . .) = ∏_{p prime} (1 − 1/p^2)^{−1}.

Note that the last equality holds due to the formula for the infinite geometric series.
Since the function μ(n)/n^2 is multiplicative and

∑_{d=1}^∞ |μ(d)/d^2| ≤ ∑_{d=1}^∞ 1/d^2 = π^2/6 < ∞,

we can apply Lemma 25.2 to the series ∑_{d=1}^∞ μ(d)/d^2 in order to obtain

c = ∑_{d=1}^∞ μ(d)/d^2 = ∏_{p prime} (1 + μ(p)/p^2 + μ(p^2)/p^4 + . . .)
  = ∏_{p prime} (1 − 1/p^2)
  = ( ∑_{n=1}^∞ 1/n^2 )^{−1}
  = 6/π^2.
Thus we conclude that

(6/π^2) x − 3√x < Q(x) < (6/π^2) x + 3√x,

and further

6/π^2 − 3/√x < Q(x)/x < 6/π^2 + 3/√x.

By letting x tend to infinity, we see that the Squeeze Theorem implies

lim_{x→∞} Q(x)/x = 6/π^2.
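A quick numerical check of Theorem 25.1 (names ours); by the bound just proved, the density of squarefree n up to x is within 3/√x of 6/π^2:

```python
from math import pi, isqrt

def is_squarefree(n):
    return all(n % (d * d) != 0 for d in range(2, isqrt(n) + 1))

x = 10_000
Q = sum(1 for n in range(1, x + 1) if is_squarefree(n))
print(Q / x, 6 / pi ** 2)  # both are roughly 0.608
```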
26 Perfect Numbers

One of the oldest problems in mathematics concerns the existence of odd perfect numbers. Around 300 BC, these numbers were introduced by Euclid in his book Elements (VII.22).
Definition 26.1. A positive integer n is called perfect if the sum of its divisors is
equal to 2n, or in other words σ (n) = 2n.
The first eight perfect numbers are
6, 28, 496, 8128, 33550336, 8589869056, 137438691328, 2305843008139952128.
Aside from the fact that they tend to grow pretty quickly (which we shall explain later), we may notice one thing that they all have in common, namely that they are all even. But do there exist odd perfect numbers? We do not know. This question was studied thoroughly over the past two centuries, and quite a few things are known about odd perfect numbers. For example, if an odd perfect number n exists, it must satisfy the following three (out of many other) criteria:

1. n > 10^1500;

2. n has at least 101 prime factors and at least 10 distinct prime factors;

3. the largest prime factor of n is greater than 10^8.

In 2003, Carl Pomerance gave a heuristic argument for why the existence of odd perfect numbers is highly unlikely.
Unlike odd perfect numbers, we do know that even perfect numbers exist. Even more than that, we know exactly what even perfect numbers look like. However,
we still do not know whether there are infinitely many even perfect numbers. As
we shall see later, this problem is equivalent to showing that there are infinitely
many Mersenne primes.
Definition 26.2. Let M_n := 2ⁿ − 1. An integer M_p = 2^p − 1 is called a Mersenne prime if it is prime.
The first eight Mersenne primes are
3, 7, 31, 127, 8191, 131071, 524287, 2147483647.
As we will see in the Euclid–Euler Theorem, proved by Leonhard Euler in 1747, even perfect numbers and Mersenne primes are closely related.
Theorem 26.3. (Euclid-Euler Theorem, 1747)37 An even positive integer n is a perfect number if and only if it has the form n = 2^(p−1) M_p, where M_p is a Mersenne prime.
Proof. The sufficient condition was proved by Euclid around 300 BC. You are
asked to reproduce his proof in Assignment 5, so we omit it in these lecture notes.
37 Theorem 8.5 in Frank Zorzitto, A Taste of Number Theory.
For the necessary condition, suppose that n is even and perfect. Let us write n = 2^(p−1) m, where p ≥ 2 and m is odd. Note that p ≥ 2 because n is even. We will show that m = 2^p − 1, and that m is prime.
We have that n is perfect, and so

σ(n) = 2n = 2^p m.

Because 2^(p−1) and m are coprime and σ is multiplicative, the first equation yields

σ(n) = σ(2^(p−1)) σ(m).

By adding up the divisors of 2^(p−1) we obtain

σ(2^(p−1)) = 1 + 2 + 2² + ⋯ + 2^(p−1) = 2^p − 1.
We conclude that

σ(n) = (2^p − 1) σ(m),

and so

2^p m = (2^p − 1) σ(m).

Since 2^p and 2^p − 1 are coprime, 2^p − 1 | m. So

m = (2^p − 1) d

for some positive integer d. Now we need to prove that in the expression m = (2^p − 1) d we have d = 1. We plug this expression into the equality 2^p m = (2^p − 1) σ(m) in order to obtain

2^p (2^p − 1) d = (2^p − 1) σ(m),

and thus 2^p d = σ(m). From m = (2^p − 1) d and 2^p d = σ(m) we come to

m + d = 2^p d = σ(m).
Now suppose that d > 1. Since 1 < d < m, there are at least three distinct divisors of m, namely 1, d and m. So σ(m) ≥ m + d + 1, and this contradicts the fact that σ(m) = m + d. Therefore d = 1.

To see that m is prime, note that σ(m) = m + d = m + 1. Since the divisors of m add up to m + 1, our m can have only 1 and m as divisors, which makes m a prime. Hence our even perfect number n is of the form 2^(p−1) M_p, where M_p = 2^p − 1 is a Mersenne prime.
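The correspondence in the Euclid–Euler Theorem can be checked numerically. Below is a small Python sketch (my own illustration, not from the assignments): it uses naive trial division, in the spirit of Section 16.1, to detect Mersenne primes M_p, builds n = 2^(p−1) M_p, and verifies σ(n) = 2n directly.

```python
# Verify the Euclid-Euler correspondence for small exponents p.

def sigma(n):
    """Sum of the positive divisors of n (naive, fine for small n)."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

def is_prime(n):
    """Trial division, cf. Section 16.1."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

perfect = []
for p in range(2, 14):
    m = 2**p - 1
    if is_prime(m):                 # M_p is a Mersenne prime
        n = 2**(p - 1) * m
        assert sigma(n) == 2 * n    # n is perfect
        perfect.append(n)

print(perfect)  # [6, 28, 496, 8128, 33550336]
```

The exponents p = 2, 3, 5, 7, 13 produce the first five perfect numbers; p = 11 is skipped because M_11 = 2047 = 23 · 89 is not prime.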
Though we do not know if there are infinitely many Mersenne primes, we do know quite a few of them. On January 7th, 2016, the Great Internet Mersenne Prime Search reported the discovery of the 49th known Mersenne prime, which is the largest Mersenne prime known to date. This prime is M_74207281, and it has 22,338,618 decimal digits. If you want to make a significant impact on Computational Number Theory, try to search for other Mersenne primes!
27
Pythagorean Triples
In Section 4, we learned how to solve the linear Diophantine equation ax + by = c. We will now turn our attention to equations of degree two or more. The analysis of such equations can be much more challenging, and many Diophantine equations, such as Thue equations, remain objects of active research. In this section, we will classify all positive integer solutions to the Pythagorean equation

x² + y² = z².

Note that if the integers x, y and z satisfy the above equation, then so do the integers dx, dy and dz for any integer d. Thus it is only interesting to consider the case when gcd(x, y, z) = 1; in this case, we call the solution triple primitive. The first three primitive solutions to the Pythagorean equation are (x, y, z) = (3, 4, 5), (5, 12, 13) and (8, 15, 17).
Theorem 27.1. Suppose integers x, y and z satisfy the Pythagorean equation x² + y² = z². Then there exist integers d, m, n such that, possibly after swapping x and y,

x = d(n² − m²), y = 2dmn, z = d(n² + m²).
Proof. 38 Let d = gcd(x, y, z). Then the triple (x/d, y/d, z/d) is also a solution, so without loss of generality we may assume that gcd(x, y, z) = 1, i.e. (x, y, z) is a primitive solution. From here it follows that x and y have different parity: they cannot both be even, since gcd(x, y, z) = 1, and if both x and y were odd, then x² + y² ≡ 2 (mod 4), which contradicts the fact that z² ≡ 0, 1 (mod 4) for any integer z. Without loss of generality, we may assume that x is odd and y is even, which means that z is odd. Now we write

y² = z² − x² = (z − x)(z + x).
38 The proof is taken from Section 1.1 of M. J. Jacobson, Jr. and H. C. Williams, Solving the Pell Equation, 2009.
If we let g = gcd(z − x, z + x) = gcd(2z, z + x) = gcd(z − x, 2x) (see Proposition
5.1), then g | 2z and g | 2x, which means that g | gcd(2z, 2x) = 2 gcd(z, x). Since
(x, y, z) is a primitive solution, it must be the case that gcd(z, x) = 1. This means
that g | 2, and since x and z are odd it must be the case that g = 2.
Now we can write

(y/2)² = ((z − x)/2) · ((z + x)/2).

Since the value on the left-hand side of the above equality is a perfect square and (z − x)/2, (z + x)/2 are coprime integers, it must be the case that (z − x)/2 and (z + x)/2 are both perfect squares. Put

(z − x)/2 = m² and (z + x)/2 = n².
But then x = n² − m², y = 2mn and z = n² + m². Now we see that for any integer d the identity

(d(n² − m²))² + (2dmn)² = (d(n² + m²))²

holds, which means that all solutions (x, y, z) to x² + y² = z² are of the form (d(n² − m²), 2dmn, d(n² + m²)), as claimed.
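The parameterization just proved is easy to turn into a generator of primitive triples. The sketch below is my own illustration (with d = 1, and following the proof's convention that x is odd and y is even): coprime n > m ≥ 1 of opposite parity give the primitive triples.

```python
# Generate primitive Pythagorean triples from Theorem 27.1 with d = 1.
from math import gcd

def primitive_triples(bound):
    """Return primitive triples (x, y, z) with z <= bound."""
    triples = []
    n = 2
    while n * n + 1 <= bound:          # smallest z for this n is n^2 + 1
        for m in range(1, n):
            # opposite parity and coprimality make the triple primitive
            if (n - m) % 2 == 1 and gcd(n, m) == 1:
                x, y, z = n*n - m*m, 2*m*n, n*n + m*m
                if z <= bound:
                    triples.append((x, y, z))
        n += 1
    return triples

for x, y, z in primitive_triples(30):
    assert x*x + y*y == z*z and gcd(gcd(x, y), z) == 1
print(primitive_triples(30))
# [(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25), (21, 20, 29)]
```

Note that the triple (8, 15, 17) from the text appears here as (15, 8, 17), since the parameterization places the odd leg first.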
28
Fermat’s Infinite Descent.
Fermat’s Last Theorem
Perhaps the most famous mathematical story is the story of Fermat's Last Theorem. Around 1637, Fermat wrote his Last Theorem in the margin of his copy of Diophantus's Arithmetica. When reformulated, his claim sounds as follows:
Theorem 28.1. (Fermat's Last Theorem) Let n ≥ 3. Then the equation xⁿ + yⁿ = zⁿ has no solutions in positive integers x, y and z.
He claimed to have discovered a “truly marvellous” proof of this fact, but could not write it down because the margin of the book which he was reading was too narrow to contain all of the proof.
Many mathematicians tried to establish the proof of Fermat's Last Theorem. The case n = 4 was proved by Fermat himself in 1636. In 1753, Euler proved it for the case n = 3; alternative proofs were given by Kausler, Legendre, Calzolari, Lamé, and many others. In his proof, Euler utilized Fermat's idea of infinite descent, which we shall discuss in this section. The case n = 5 was proved by Dirichlet and Legendre around 1825, and alternative proofs were given by Gauss, Lebesgue, Lamé, and others. The case n = 7 was proved by Gabriel Lamé in 1839.

In the 1820s, Sophie Germain developed an approach to attack the problem for several exponents at the same time. In particular, she managed to show that the first case of Fermat's Last Theorem holds for all odd primes n < 100.
In 1847, Gabriel Lamé suggested to approach the problem by factoring the equation x^p + y^p = z^p for an odd prime p as follows:

z^p = x^p + y^p = (x + y)(x + ζ_p y)(x + ζ_p² y) ⋯ (x + ζ_p^(p−1) y), (5)

where ζ_p = exp(2πi/p) is a primitive p-th root of unity. If instead of the standard ring of integers Z one considers the ring of integers

Z[ζ_p] = {x₀ + x₁ζ_p + x₂ζ_p² + ⋯ + x_(p−1)ζ_p^(p−1) : x₀, x₁, …, x_(p−1) ∈ Z},
then one would hope that such notions as unique factorization or coprimality take
place in Z[ζ p ], just like they do in Z. Assuming that this is the case, one could
show that the algebraic integers x + y, x + ζ_p y, …, x + ζ_p^(p−1) y are coprime, and since the expression (5) has a p-th power of the integer z on its left-hand side, one could then hope that x + ζ_p^i y = q_i^p for some q_i ∈ Z[ζ_p], where i = 0, 1, …, p − 1. In other words, each of the numbers x + y, x + ζ_p y, …, x + ζ_p^(p−1) y would be a perfect p-th power, and one could prove that this is impossible. Note how similar this idea is to the one presented in the proof of Theorem 27.1.
Unfortunately, there is a flaw in this argument: it is not necessarily true that the ring Z[α] for an algebraic number α has unique factorization. Perhaps the most famous example is that in the ring

Z[√−5] = {x₁ + x₂√−5 : x₁, x₂ ∈ Z}

one can write the number 6 in two essentially different ways:

6 = 2 · 3 = (1 + √−5)(1 − √−5).

The odd primes p such that the elements of the ring Z[ζ_p] may not possess unique factorization are called irregular primes; otherwise they are called regular. The first eight irregular primes are

37, 59, 67, 101, 103, 131, 149, 157.
Therefore Lamé's strategy applies to all primes p < 100, except for p = 37, 59 and 67. Around 1850, Ernst Kummer managed to prove that for every regular odd prime p the Fermat equation x^p + y^p = z^p has no solutions in positive integers. However, it is still unknown whether there are infinitely many regular primes. In 1964, Carl Ludwig Siegel conjectured that approximately 60.65% of all prime numbers are regular. The techniques suggested by Lamé and Kummer (and Euler before that) evolved into a whole new area of mathematics, known nowadays as Algebraic Number Theory. The next few sections will contain a brief introduction to this subject.
Fermat's Last Theorem was proved by the English mathematician Andrew Wiles. His proof was published in 1995 in a special issue of the Annals of Mathematics. The original paper is available here: https://math.stanford.edu/~lekheng/flt/wiles.pdf. As an exercise, try to understand at least the first page! Since the Fields Medal, which is one of the most important awards for mathematicians, is restricted to those under age 40, and Andrew Wiles proved Fermat's Last Theorem at the age of 41, he received a silver plaque from the International Mathematical Union instead of the Fields Medal.
The proof of Andrew Wiles combined many areas of number theory. It is an interconnection of the theory of elliptic curves, the theory of modular forms, representation theory, Iwasawa theory, and many other mathematical subjects. In short, Andrew Wiles managed to do the following. Consider the equation

y² = x³ + ax + b,

where a and b are complex numbers such that 4a³ + 27b² ≠ 0. When a and b are real, such an equation defines a plane curve, called an elliptic curve. In 1985,
the German mathematician Gerhard Frey pointed out that for an integer n ≥ 3 the
elliptic curve
y² = x(x − aⁿ)(x + bⁿ),
where a and b are positive integers such that an + bn = cn for some integer c, must
be very special. In particular, he pointed out that such a curve must be semistable
and non-modular. The fact that it is non-modular would then contradict the so-called Taniyama–Shimura Conjecture, proposed by the Japanese mathematicians
Yutaka Taniyama and Goro Shimura in 1957. The conjecture stated that every
elliptic curve, semistable or not, has to be modular. Andrew Wiles managed to
prove this conjecture in the semistable case. Fermat’s Last Theorem then follows
from this result. The fact that all elliptic curves, semistable or not, are modular, was proved in 2001 by Christophe Breuil, Brian Conrad, Fred Diamond and
Richard Taylor. This result is known as the Modularity Theorem. It took more
than 350 years for the proof of Fermat’s Last Theorem to be discovered.
Fermat claimed that he had a proof of his Last Theorem. Of course, it is highly unlikely that the argument he had in mind was as involved as the one given by Andrew Wiles. Most likely, Fermat believed that the theorem could be proved using the technique of infinite descent, which he developed; this technique allowed him to prove the theorem in the special case n = 4, and we present a more general result in the following proposition. The idea of infinite descent can be summarized as follows: when considering certain Diophantine equations, like x³ + 2y³ + 4z³ = 0 or x⁴ + y⁴ = z², one can show that the existence of one solution leads to the existence of another solution, which is “smaller” than the previous one. One would then obtain an infinite strictly decreasing sequence of positive integers x₁ > x₂ > x₃ > …, which would contradict the fact that the natural numbers are bounded below by 1. We will demonstrate the application of this technique in two special cases. More examples can be found in Keith Conrad's notes Proofs by descent.
Proposition 28.2. (Fermat, 1636)39 The equation x⁴ + y⁴ = z² has no solutions in positive integers x, y and z.
Proof. By Theorem 27.1, every primitive solution (x, y, z) to the equation x² + y² = z², with x odd and y even, must be of the form

x = n² − m², y = 2mn, z = n² + m².
Assume that there is a solution to x⁴ + y⁴ = z², where x, y and z are positive integers. Without loss of generality, we may suppose that gcd(x, y) = 1, which means that gcd(x, z) = 1 and gcd(y, z) = 1. We will find a second positive integer solution (x′, y′, z′) with gcd(x′, y′) = 1 that is smaller than (x, y, z) in a suitable sense.
Since gcd(x, y) = 1, the integers x and y cannot both be even. They also cannot both be odd: otherwise z² = x⁴ + y⁴ ≡ 2 (mod 4), and as we saw before this congruence has no solutions. Without loss of generality, we may assume that x is odd and y is even. Then z is odd. Since (x²)² + (y²)² = z², the triple (x², y², z) must be a primitive Pythagorean triple, so there exist integers m and n such that
x² = n² − m², y² = 2mn, z = n² + m². (6)

39 Theorem 3.1 in Keith Conrad, Proofs by descent.
Since x2 + m2 = n2 and gcd(m, n) = 1, we conclude that (x, m, n) is another primitive Pythagorean triple. Since x is odd, the formula for primitive Pythagorean
triples once again tells us that
x = a² − b², m = 2ab, n = a² + b², (7)
where a and b are positive. Substituting the values of m and n in (7) into the second equation of (6), we obtain y² = 4(a² + b²)ab. Since y is even,

(y/2)² = (a² + b²)ab.

Since gcd(a, b) = 1, the three factors on the right are pairwise coprime. Since they are all positive, each of them must be a perfect square:

a = x′², b = y′², a² + b² = z′².
Since gcd(a, b) = 1, it must be the case that gcd(x′, y′) = 1. Now the last equation can be rewritten as x′⁴ + y′⁴ = z′², so (x′, y′, z′) is another solution to our original equation with gcd(x′, y′) = 1.
Now we compare z′ to z. Since

0 < z′ ≤ z′² = a² + b² = n ≤ n² < z,

we see that from one primitive solution (x, y, z) to x⁴ + y⁴ = z² we can produce another primitive solution (x′, y′, z′) such that z > z′. But then we could produce an infinite strictly decreasing sequence of positive integers z > z′ > z′′ > …, and this contradicts the fact that the positive integers are bounded below by 1.
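As an illustration (and certainly not a proof), one can confirm by brute force that x⁴ + y⁴ = z² has no solutions in small positive integers; the search below is my own sketch.

```python
# Brute-force sanity check, consistent with Proposition 28.2: no
# positive integers x <= y <= 200 satisfy x^4 + y^4 = z^2.
from math import isqrt

def solutions(bound):
    """Search for solutions of x^4 + y^4 = z^2 with 1 <= x <= y <= bound."""
    sols = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            s = x**4 + y**4
            z = isqrt(s)
            if z * z == s:          # s is a perfect square
                sols.append((x, y, z))
    return sols

print(solutions(200))  # []
```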
Corollary 28.3. Fermat's Last Theorem holds for n = 4. In other words, the equation x⁴ + y⁴ = z⁴ has no solutions in positive integers x, y and z. Indeed, any such solution (x, y, z) would give the solution (x, y, z²) to the equation x⁴ + y⁴ = z², contradicting Proposition 28.2.
Another example of a proof by infinite descent is the proof of the irrationality of √2. This proof was discovered by the Pythagoreans, who showed that the diagonal of a unit square cannot be represented as a ratio of two integers. The Pythagoreans kept the proof of this fact as a secret and, according to the legend, its discoverer (possibly Hippasus of Metapontum) was murdered for divulging it.
Proposition 28.4. The number √2 is irrational. That is, there exist no integers m and n such that √2 = m/n.
Proof. Suppose not, so that there exist positive integers m and n such that √2 = m/n. Then m = √2·n. Raising both sides of this equation to the power of two, we obtain

m² = 2n²,
so (m, n) is a positive solution to the Diophantine equation x² = 2y². From the
above equality we see that 2 | m², which means that 2 | m. But then we can write m as m = 2m′ for some integer m′. Therefore

m² = (2m′)² = 4m′² = 2n².

Thus we obtain 4m′² = 2n², and by cancelling 2 on both sides we get

n² = 2m′².
Thus from the positive integer solution (m, n) we can obtain another positive integer solution (n, m′) to the Diophantine equation x² = 2y². Note that

m′ = m/2 = n/√2 < n,

so the second coordinate in the solution (m, n) is strictly greater than the second coordinate in the solution (n, m′). Thus, if there were a positive integer solution to the Diophantine equation x² = 2y², we could produce an infinite strictly decreasing sequence of positive integers n > m′ > m′′ > …, and this contradicts the fact that the positive integers are bounded below by 1.
Exercise 28.5. Let k be a positive integer. Prove that the number √k is rational if and only if k is a perfect square.
29
Gaussian Integers
Let i denote one of the complex roots of the polynomial x² + 1. That is, the number i satisfies the equation i² = −1. Notice that if i is a root of x² + 1, then so is −i.
Definition 29.1. A complex number of the form a + bi, where a, b ∈ Z, is called a Gaussian integer. The set of Gaussian integers is denoted by Z[i].
The notation Z[i] suggests that the set of Gaussian integers is analogous to the
ring of rational integers Z, where we now treat the numbers i or −i as (Gaussian)
integers as well. The similarities between the two sets become even more obvious
once we note that, just like the set of rational integers Z, the set Z[i] forms a
commutative ring under the standard operations of addition and multiplication.
Proposition 29.2. The set of Gaussian integers
Z[i] := {a + bi : a, b ∈ Z}
forms a commutative ring under the standard operations of addition and multiplication.
Proof. Strictly speaking, to prove this result one would have to do the routine verification of the ring axioms and of commutativity; we leave this part as an exercise. What is worth mentioning is that both 0 = 0 + 0·i and 1 = 1 + 0·i are elements of Z[i], and also that the operations of addition and multiplication are well-defined. That is, for all a + bi, c + di ∈ Z[i], their sum, difference and product are again elements of Z[i]:

(a + bi) ± (c + di) = (a ± c) + (b ± d)i ∈ Z[i];
(a + bi)(c + di) = (ac − bd) + (ad + bc)i ∈ Z[i].
Also, note that Z ⊊ Z[i], so every rational integer is also a Gaussian integer.
We will see that the Gaussian integers will be of great help when we try to answer the question of which integers can be represented as a sum of two squares. In other words, we will use Gaussian integers to solve the Diophantine equation n = a² + b², where n is fixed and a, b ∈ Z are variables. Note that, if n ∈ N is representable as a sum of two squares, then

n = a² + b² = (a + bi)(a − bi),

so we just managed to factor the rational integer n, which is also a Gaussian integer, as a product of two Gaussian integers a + bi and a − bi.
Definition 29.3. Let a, b ∈ Z[i]. We say that a divides b, or that a is a factor of
b, when b = ak for some k ∈ Z[i]. We write a | b if this is the case, and a - b
otherwise.
Example 29.4. For example, 5 = (1 + 2i)(1 − 2i), so (1 + 2i) | 5 and (1 − 2i) | 5.
One of the most important invariants attached to a Gaussian integer z is its
norm, which we denote by N(z).
Definition 29.5. The norm function is defined to be

N : Z[i] → N ∪ {0}, a + bi ↦ a² + b².

Definition 29.6. Let z = a + bi be a Gaussian integer. The complex conjugate of z is z̄ = a − bi. The absolute value of z is

|z| := √(z z̄) = √((a + ib)(a − ib)) = √(a² + b²).
Note the obvious connection between the norm of a Gaussian integer z = a + bi and the absolute value of z:

N(z) = N(a + bi) = a² + b² = (√(a² + b²))² = |z|².

The norm map has many nice properties. For example, it is multiplicative; that is, the norm of the product of two Gaussian integers is equal to the product of their norms:

N(zw) = |zw|² = (|z||w|)² = |z|²|w|² = N(z)N(w).
We will see the usefulness of this property later. Another important thing to mention is that the only Gaussian integer whose norm is equal to zero is zero itself.
That is, N(z) = 0 if and only if z = 0.
Now comes the time to speak about the geometric interpretation of the Gaussian integers. Consider Figure 1,40 which depicts the complex plane. The Gaussian integers a + bi form a square grid consisting of the points (a, b) whose coordinates are rational integers. If z = a + bi is a Gaussian integer, then the point (a, −b), which corresponds to the complex conjugate z̄ = a − bi of z, is the reflection of the point (a, b) across the x-axis. In turn, the absolute value |z| represents the distance from the point (a, b) to the origin; note that it is equal to the distance from the point (a, −b) to the origin.
The next important concept that we need to introduce is the concept of units.
Definition 29.7. A Gaussian integer u is a unit of Z[i] when u | w for all w in Z[i].
In other words, the units are those very special numbers that divide every
single element of the ring. The notion of a unit does not apply only to the ring of
Gaussian integers, but in fact applies to any algebraic ring. For example, in Z the
only units are 1 and −1. When talking about the prime factorization of rational
40 The image is taken from …/7d/Gaussian_integer_lattice.png (URL truncated in the original).
Figure 1: Gaussian integers
integers, we always omit ±1. When doing so, we actually mean that the prime
factorization is unique up to multiplication by ±1. We will see that the analogue
of the Fundamental Theorem of Arithmetic holds for Gaussian integers, and so
every Gaussian integer has the unique prime factorization up to multiplication by
a unit. In the next proposition, we prove that the only units in the ring of Gaussian
integers are ±1 and ±i.
Proposition 29.8.
41
The following are equivalent:
1. z is a unit in Z[i];
2. N(z) = 1;
3. z ∈ {1, −1, i, −i};
4. the inverse complex number z−1 := 1/z is also a Gaussian integer.
Proof. Suppose that z is a unit. Then z | 1, since z divides every Gaussian integer.
Thus 1 = zw for some w in Z[i]. Then
1 = 1² + 0² = N(1) = N(zw) = N(z)N(w).
Since N(z) and N(w) are positive integers, we deduce that N(z) = 1 (and N(w) = 1).
41 Proposition 7.9 in Frank Zorzitto, A Taste of Number Theory.
Suppose that N(z) = 1, where z = a + bi for some a, b ∈ Z. We have a² + b² = 1, which means that a² = 1, b = 0 or a = 0, b² = 1. In the first case, a = ±1 and b = 0, which means that z = ±1. In the second case, a = 0 and b = ±1, which means that z = ±i.
If z is one of 1, −1, i, −i, its inverse is 1, −1, −i, i, respectively, and these are
again Gaussian integers.
Finally, suppose that z and z⁻¹ are Gaussian integers. If w is any other Gaussian integer, we see that z | w, because w = z(z⁻¹w) and z⁻¹w is a Gaussian integer.
We will now turn our attention to establishing the analogue of the Fundamental
Theorem of Arithmetic in the ring of Gaussian integers. For this purpose, we need
to introduce the definition of a Gaussian prime.
Definition 29.9. Let z be a Gaussian integer. Then z is called a Gaussian prime if
it is not a unit and any factorization z = wu in Z[i] forces w or u to be a unit.
Compare this definition to Definition 2.5, where we introduced the notion of
a rational prime. One can notice the similarities, since an ordinary rational prime
can be factored in Z only if one of its factors is a unit, which in the case of Z are
±1.
Example 29.10. The integer 2 is a prime in Z, but it is not a Gaussian prime, because 2 = (1 + i)(1 − i), and neither 1 + i nor 1 − i is a unit. The number 3, however, is not only a rational prime, but also a Gaussian prime. For suppose that 3 = zw for some Gaussian integers z and w, where neither z nor w is a unit. Then

9 = N(3) = N(zw) = N(z)N(w),

which means that N(z) | 9 and N(w) | 9. By Proposition 29.8, N(z) ≠ 1 and N(w) ≠ 1, so the only option is N(z) = N(w) = 3. But if we write z = a + bi, then

3 = N(z) = N(a + bi) = a² + b².

However, we saw many times that integers congruent to 3 modulo 4 cannot be represented as a sum of two squares, so N(z) ≠ 3, a contradiction. Hence in any factorization 3 = zw, one of z or w must be a unit, which makes 3 a Gaussian prime.
Exercise 29.11. Prove that every rational prime p such that p ≡ 3 (mod 4) is a
Gaussian prime.
The next step is to establish the analogue of the Remainder Theorem for Gaussian integers.
Proposition 29.12. 42 If z, w are Gaussian integers and z ≠ 0, then there exist Gaussian integers q and r such that w = qz + r, where N(r) < N(z).
Proof. Recall the geometric interpretation of the Gaussian integers, given in Figure 1. The complex number w/z is located somewhere on the complex plane C. This w/z need not be a Gaussian integer. However, as Figure 2 demonstrates, one can see that it falls into one of the square regions whose vertices are Gaussian integers.
Figure 2: Complex number in a square with Gaussian integers as vertices
We pick our Gaussian integer q so that the distance between the point corresponding to q and the point corresponding to w/z is the smallest. By inspection, we can see that such a Gaussian integer q must satisfy

|w/z − q| ≤ 1/√2.

The Gaussian integer q has to be in one of the four boxes as shown on Figure 3, and the diagonal of each box has length

√((1/2)² + (1/2)²) = 1/√2.
Figure 3: Gaussian integer closest to a given complex number
We conclude that

|w/z − q|² ≤ 1/2 < 1.

Therefore

|(w − zq)/z|² < 1,

and so |w − zq|² < |z|², which is the same as N(w − zq) < N(z). Put r := w − zq, and obtain w = zq + r, where N(r) < N(z).
Example 29.13. Let us see how the Remainder Theorem for Gaussian integers works. Let w = 4 + 7i and z = 1 − 3i. Then

w/z = (4 + 7i)/(1 − 3i) = ((4 + 7i)(1 + 3i))/((1 − 3i)(1 + 3i)) = (−17 + 19i)/10 = −1.7 + 1.9i.

We see that the nearest integer point to (−1.7, 1.9) is (−2, 2). Thus q = −2 + 2i. Then

r = w − qz = (4 + 7i) − (1 − 3i)(−2 + 2i) = −i.

We conclude that

4 + 7i = (−2 + 2i)(1 − 3i) − i.

Note that N(−i) = 1 < 10 = N(z).
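The nearest-lattice-point recipe of the proof above translates directly into code. Here is a Python sketch (the representation and names are my own): Gaussian integers are pairs (a, b) standing for a + bi, g_divmod implements division with remainder as in Proposition 29.12, and the Euclidean algorithm for greatest common divisors (unique up to a unit, cf. Exercise 29.16) then comes for free.

```python
# Division with remainder and gcds in Z[i].

def norm(z):
    """N(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b

def g_divmod(w, z):
    """Return (q, r) with w = q*z + r and N(r) < N(z), for z != (0, 0)."""
    (a, b), (c, d) = w, z
    n = norm(z)
    # w/z = w * conj(z) / N(z); pick q as the nearest lattice point
    re, im = a * c + b * d, b * c - a * d
    qa, qb = round(re / n), round(im / n)
    # r = w - q*z, computed coordinate-wise
    r = (a - (qa * c - qb * d), b - (qa * d + qb * c))
    return (qa, qb), r

def g_gcd(u, v):
    """Euclidean algorithm in Z[i]; the answer is unique up to a unit."""
    while v != (0, 0):
        _, r = g_divmod(u, v)
        u, v = v, r
    return u

# The worked example: w = 4 + 7i, z = 1 - 3i
q, r = g_divmod((4, 7), (1, -3))
print(q, r)  # (-2, 2) (0, -1), i.e. q = -2 + 2i and r = -i
print(g_gcd((5, 0), (1, 2)))  # (1, 2): indeed 5 = (1 + 2i)(1 - 2i)
```

Since the nearest lattice point is at distance at most 1/√2 from w/z, the remainder always satisfies N(r) ≤ N(z)/2, so the Euclidean algorithm terminates quickly.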
42 Proposition 7.11 in Frank Zorzitto, A Taste of Number Theory.
We will now prove the analogue of Bézout's lemma for Gaussian integers. For a, b ∈ Z[i], we call a Gaussian integer of the form ax + by with x, y ∈ Z[i] a Gaussian combination of a and b. In the following proposition, it is crucial that for every Gaussian integer a the value N(a) is a non-negative integer.
Proposition 29.14. Let a, b be Gaussian integers such that a 6= 0 or b 6= 0. If d
is a Gaussian combination of a and b such that N(d) is minimal, then d divides
every combination of a and b.
Proof. We know that ax + by = d for some x, y ∈ Z[i], and that N(d) > 0 is minimal among the nonzero Gaussian combinations of a and b. Now consider some Gaussian combination

c = as + bt,

where s, t ∈ Z[i]. We want to show that d | c. By Proposition 29.12,

c = dq + r

for some q, r ∈ Z[i], where N(r) < N(d). Thus

r = c − dq = as + bt − (ax + by)q = a(s − xq) + b(t − yq).

We see that r is a Gaussian combination of a and b such that N(r) < N(d). Because d is the Gaussian combination of a and b such that N(d) is minimal, the only option is that N(r) = 0, i.e. r = 0 and c = dq. Hence d | c. In particular, d | a and d | b, because a and b are themselves Gaussian combinations of a and b.
Definition 29.15. A Gaussian integer d = ax + by, with x, y ∈ Z[i], such that d | a and d | b is called a greatest common divisor of the Gaussian integers a and b.
Exercise 29.16. Let d1 and d2 be greatest common divisors of Gaussian integers
a and b. Prove that d1 = ud2 for some unit u in Z[i].
Finally, we prove the analogue of Euclid’s lemma for Gaussian integers.
Proposition 29.17. 43 If p is a Gaussian prime and p | zw for some Gaussian
integers z, w, then p | z or p | w.
43 Proposition 7.13 in Frank Zorzitto, A Taste of Number Theory.
Proof. Suppose that p - z. We will show that p | w. Let u be a greatest common
divisor of p and z. Thus u = pt + zs for some t, s ∈ Z[i] and u | p, u | z. Write
p = uk for some k ∈ Z[i]. Since p is a Gaussian prime, one of u or k is a unit in
Z[i].
If k is a unit, then u = pk⁻¹ ∈ Z[i], and so p | u. Since p | u and u | z, it must be that p | z, contrary to our assumption. Thus u is a unit with inverse u⁻¹ ∈ Z[i]. Now multiply u = pt + zs by wu⁻¹:

w = ptwu⁻¹ + zswu⁻¹.
Clearly, p | ptwu−1 , and we are given that p | zw. Thus p | w.
Exercise 29.18. Use the results established above to prove the Fundamental Theorem of Arithmetic for Gaussian integers: any Gaussian integer that is not a unit
can be written uniquely (up to reordering and multiplication by a unit) as a product
of Gaussian primes.
Exercise 29.19. Compute the quotient and the remainder after division of w by z,
when (w, z) = (6 + i, 2 − i), (27 − 5i, 3 − 7i), (4 + 7i, 8 − i).
Exercise 29.20. Let ω denote a primitive third root of unity. That is,

ω = e^(2πi/3) = (−1 + √−3)/2.

Note that ω satisfies the equation ω² + ω + 1 = 0. The set
Z[ω] := {a + bω : a, b ∈ Z}
is called the ring of Eisenstein integers. For any Eisenstein integer α = a + bω,
where a, b ∈ Z, the norm map is defined by
N(a + bω) := a² − ab + b². (8)
Just like the ring of Gaussian integers, the ring of Eisenstein integers is a Unique
Factorization Domain. Geometrically, Eisenstein integers form a lattice on the
complex plane (see Figure 4).
1. Prove that Z[ω] is a ring by showing that 0, 1 ∈ Z[ω], and for all α, β ∈ Z[ω]
it is the case that α ± β ∈ Z[ω] and α · β ∈ Z[ω];
2. Prove that the norm map defined in (8) is multiplicative. That is, for every
α, β ∈ Z[ω] it is the case that N(αβ ) = N(α)N(β ). Explain why N(α) ≥ 0
for every α ∈ Z[ω] and why N(α) = 0 if and only if α = 0;
3. We say that υ ∈ Z[ω] is a unit if υ | α for every α ∈ Z[ω]. Prove that
υ ∈ Z[ω] is a unit if and only if N(υ) = 1;
4. Find all units in Z[ω].
Figure 4: Eisenstein integers
Exercise 29.21. Let

Z[√2] := {a + b√2 : a, b ∈ Z}.

We say that υ ∈ Z[√2] is a unit if υ | α for every α ∈ Z[√2]. Prove that there are infinitely many units in Z[√2].

Hint: Consider the Pell equation x² − 2y² = ±1. Explain why, for every pair (x₁, y₁) satisfying this Diophantine equation, the value x₁ + y₁√2 is a unit in Z[√2]. Find any solution (x₁, y₁), and then prove that, for every positive integer n, the integer coefficients xₙ and yₙ of the number xₙ + yₙ√2 := (x₁ + y₁√2)ⁿ also satisfy the equation xₙ² − 2yₙ² = ±1.
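Following the hint, here is a small Python sketch (my own, using the particular solution (x₁, y₁) = (1, 1), i.e. the unit 1 + √2): multiplying by 1 + √2 maps (x, y) to (x + 2y, x + y), and every power satisfies x² − 2y² = ±1.

```python
# Generate units x_n + y_n*sqrt(2) of Z[sqrt(2)] as powers of 1 + sqrt(2).

def next_power(x, y):
    """(x + y*sqrt(2)) * (1 + sqrt(2)) = (x + 2y) + (x + y)*sqrt(2)."""
    return x + 2 * y, x + y

x, y = 1, 1                               # 1 + sqrt(2): 1^2 - 2*1^2 = -1
for n in range(1, 9):
    assert x * x - 2 * y * y in (1, -1)   # x_n + y_n*sqrt(2) is a unit
    print(n, (x, y), x * x - 2 * y * y)
    x, y = next_power(x, y)
```

The coefficients (1, 1), (3, 2), (7, 5), (17, 12), … grow without bound, which is the computational face of the claim that Z[√2] has infinitely many units.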
Exercise 29.22. Consider the ring

Z[√−13] = {a + b√−13 : a, b ∈ Z}.

For every a, b ∈ Z, the norm map on Z[√−13] is defined by

N(a + b√−13) := a² + 13b².

You may assume that the norm is multiplicative. We will show that unique factorization fails in Z[√−13]. To solve this problem, you might want to refer to Section 2.3 in Frank Zorzitto, A Taste of Number Theory.
1. Prove that the only units of Z[√−13] are ±1.

Hint: Let υ = a + b√−13 for a, b ∈ Z. By definition, υ ∈ Z[√−13] is a unit if υ | α for every α ∈ Z[√−13]. Thus, in particular, υ | 1. Explain why this fact implies the equality a² + 13b² = 1. What are the solutions to this Diophantine equation?
2. We say that a non-zero number γ ∈ Z[√−13] is prime if the factorization γ = αβ for α, β ∈ Z[√−13] implies that either α is a unit or β is a unit. Prove that the numbers 2, 7, 1 + √−13 and 1 − √−13 are prime in Z[√−13];

3. Using Part 2, explain why unique factorization fails in Z[√−13].
30
Fermat’s Theorem on Sums of Two Squares
We will now turn our attention to the Diophantine equation n = a² + b², where n is a fixed positive integer and a, b are integer variables. On December 25th, 1640, Fermat sent the proof of the following theorem to Mersenne, which is why in some sources it is called Fermat's Christmas Theorem. This theorem will allow us to explain which positive integers are representable as a sum of two squares, and how many solutions the equation n = a² + b² has.
Theorem 30.1. (Fermat’s Theorem on Sums of Two Squares)44 If p is a rational
odd prime and p ≡ 1 (mod 4), then p = a2 + b2 for some rational integers a and
b.
Proof. (Richard Dedekind, circa 1894) Since p ≡ 1 (mod 4), it follows from Corollary 20.10 that −1 is a quadratic residue modulo p. Thus −1 ≡ x^2 (mod p) for some rational integer x. Thus p | x^2 + 1 in Z, and so p | (x + i)(x − i) in Z[i]. Now note that p ∤ x + i, for if we assume that x + i = p(c + di) for some Gaussian integer c + di, then by equating the imaginary parts we get pd = 1, which contradicts the fact that p ∤ 1. Likewise, p ∤ x − i.
^44 Theorem 7.14 in Frank Zorzitto, A Taste of Number Theory.
Since p divides a product without dividing either of the factors, Proposition 29.17 tells us that p is not a Gaussian prime. Thus p = uv, where u, v ∈ Z[i] are not units. But then
p^2 = N(p) = N(uv) = N(u)N(v),
so N(u) = 1, p or p^2. If N(u) = 1, then u is a unit, and if N(u) = p^2, then N(v) = 1, so v is a unit; both cases are impossible. Hence N(u) = N(v) = p. But if we now write u = a + bi, then
p = N(u) = N(a + bi) = a^2 + b^2,
so p is a sum of two squares of rational integers.
Now we know that, when p is an odd prime, the equation p = x^2 + y^2 has a solution in positive integers x and y if and only if p ≡ 1 (mod 4). Notice that it also has a solution when p = 2, because 2 = 1^2 + 1^2. We would now like to generalize this result to all positive integers n. For this purpose, we need to prove the following lemma.
Lemma 30.2.^45 If p in Z[i] is a Gaussian prime and p^k | uv for some Gaussian integers u and v and exponent k ≥ 1, then there are exponents j, ℓ ∈ {0, 1, . . . , k} such that p^j | u, p^ℓ | v and j + ℓ = k.
Proof. We will prove this statement using the principle of mathematical induction.
Base case. For k = 1, the result is equivalent to Euclid's lemma for Gaussian integers, stated in Proposition 29.17.
Induction hypothesis. Suppose that the theorem is true for k − 1.
Induction step. Let p^k | uv. Then p | u or p | v. Suppose that p | v (the case p | u is handled analogously). Write v = wp for some w in Z[i]. Then p^k | uwp, which means that p^{k−1} | uw. According to the induction hypothesis, there exist integers j and m, 0 ≤ j, m ≤ k − 1, such that p^j | u, p^m | w, and j + m = k − 1. But then p^{m+1} | wp = v. If we now put ℓ = m + 1, then p^j | u, p^ℓ | v, and j + ℓ = k, as claimed.
Proposition 30.3. Let n be a positive integer. The Diophantine equation n = x^2 + y^2 has a solution if and only if n has the prime factorization
n = 2^t p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} q_1^{2f_1} q_2^{2f_2} · · · q_ℓ^{2f_ℓ},
where p_j ≡ 1 (mod 4) for all j = 1, 2, . . . , k and q_j ≡ 3 (mod 4) for all j = 1, 2, . . . , ℓ.
^45 Proposition 7.16 in Frank Zorzitto, A Taste of Number Theory.
Proof. Let w = a + bi and z = c + di be Gaussian integers. Since the norm map is multiplicative, it is the case that
(a^2 + b^2)(c^2 + d^2) = N(w)N(z) = N(wz) = N((ac − bd) + (ad + bc)i) = (ac − bd)^2 + (ad + bc)^2.
The identity above allows us to conclude that the product mn of any two numbers m = a^2 + b^2 and n = c^2 + d^2 will be representable as a sum of two squares as well:
mn = (a^2 + b^2)(c^2 + d^2) = (ac − bd)^2 + (ad + bc)^2.
Since 2 is representable as a sum of two squares, as well as any odd prime p ≡ 1 (mod 4), we conclude that every integer n with the prime factorization
n = 2^t p_1^{e_1} p_2^{e_2} · · · p_k^{e_k},
where p_j ≡ 1 (mod 4) for all j = 1, 2, . . . , k, is representable as a sum of two squares. We know that for every rational prime q ≡ 3 (mod 4) the Diophantine equation q^{2f+1} = a^2 + b^2 has no solutions for any non-negative integer f, because q^{2f+1} ≡ 3 (mod 4), while a sum of two squares is congruent to 0, 1 or 2 modulo 4. However, every even power of q is representable as a sum of two squares, because q^{2f} = (q^f)^2 + 0^2 for every positive integer f. But then once again we can use the identity
(a^2 + b^2)(c^2 + d^2) = (ac − bd)^2 + (ad + bc)^2
to conclude that every integer n with the prime factorization
n = 2^t p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} q_1^{2f_1} q_2^{2f_2} · · · q_ℓ^{2f_ℓ},
where p_j ≡ 1 (mod 4) for all j = 1, 2, . . . , k and q_j ≡ 3 (mod 4) for all j = 1, 2, . . . , ℓ, is representable as a sum of two squares. We will now show that these are the only numbers representable as a sum of two squares.
To prove this fact, all that we have to do is to show that, whenever n = x^2 + y^2 and some prime q ≡ 3 (mod 4) satisfies n = q^k m, where m is an integer such that q ∤ m, then the exponent k has to be even. We see that
q^k | x^2 + y^2 = (x + yi)(x − yi).
Since every rational prime q ≡ 3 (mod 4) is also a Gaussian prime, it follows from Lemma 30.2 that there exist integers j and ℓ, 0 ≤ j, ℓ ≤ k, such that j + ℓ = k, q^j | (x + yi) and q^ℓ | (x − yi).
Suppose that j ≥ ℓ (the case j < ℓ is analogous, with x − yi in place of x + yi). Then x + yi = q^j(c + di) for some integers c and d. Therefore
x + yi = q^j c + q^j di,
which means that x = q^j c and y = q^j d. But then
n = x^2 + y^2 = q^{2j} c^2 + q^{2j} d^2 = q^{2j}(c^2 + d^2).
Since j ≥ ℓ, we see that 2j = j + j ≥ j + ℓ = k, and since q^k is the highest power of q that divides n and q^{2j} | n, we must conclude that k = 2j, which is an even number.
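The criterion of Proposition 30.3 is easy to test by machine. Below is a minimal sketch (plain trial division, adequate only for small n; the function names are ours) comparing the criterion against a direct search for a representation:

```python
import math

def is_sum_of_two_squares(n):
    """Criterion of Proposition 30.3: n = x^2 + y^2 is solvable iff
    every prime q ≡ 3 (mod 4) divides n to an even power."""
    m, q = n, 2
    while q * q <= m:
        if m % q == 0:
            e = 0
            while m % q == 0:
                m //= q
                e += 1
            if q % 4 == 3 and e % 2 == 1:
                return False
        q += 1
    # any leftover factor m > 1 is prime and appears to the power 1
    return not (m > 1 and m % 4 == 3)

def has_representation(n):
    """Directly search for x, y >= 0 with x^2 + y^2 = n."""
    return any(math.isqrt(n - x * x) ** 2 == n - x * x
               for x in range(math.isqrt(n) + 1))

# The two tests agree on every n up to 500:
assert all(is_sum_of_two_squares(n) == has_representation(n)
           for n in range(1, 501))
```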
Now that we know for which positive integers n the Diophantine equation n = x^2 + y^2 has a non-trivial solution, there are only two questions left for us to discuss, namely how many solutions there are and how one computes them. Let r_2(n) denote the number of integer solutions to n = x^2 + y^2, where x, y ∈ Z are allowed to be positive, negative or zero. As it turns out,
r_2(n) = 4(d_1(n) − d_3(n)),
where d_1(n) and d_3(n) correspond to the number of divisors of n congruent to 1 and 3 modulo 4, respectively. This formula can also be rewritten as follows:
r_2(n) = 4 ∑_{d | n, d ≡ 1, 3 (mod 4)} (−1)^{(d−1)/2}.
From this formula it follows that for every prime p ≡ 1 (mod 4) the Diophantine equation p = x^2 + y^2 has exactly 8 solutions: if (x, y) is one of them, then the others are (x, −y), (−x, y), (−x, −y), (y, x), (y, −x), (−y, x) and (−y, −y's counterpart (−y, −x).
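The divisor formula for r_2(n) can be verified against a direct lattice-point count; a quick sketch for small n:

```python
import math

def r2(n):
    """Count integer solutions to n = x^2 + y^2 via the divisor
    formula r2(n) = 4 * (d1(n) - d3(n))."""
    d1 = d3 = 0
    for d in range(1, n + 1):
        if n % d == 0:
            if d % 4 == 1:
                d1 += 1
            elif d % 4 == 3:
                d3 += 1
    return 4 * (d1 - d3)

def r2_lattice(n):
    """Count the lattice points (x, y) with x^2 + y^2 = n directly."""
    r = math.isqrt(n)
    return sum(1 for x in range(-r, r + 1) for y in range(-r, r + 1)
               if x * x + y * y == n)

# The formula matches the direct count for every n up to 200:
assert all(r2(n) == r2_lattice(n) for n in range(1, 201))
```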
As for the computation of the actual solutions, when p ≡ 1 (mod 4) is prime, the computation of x and y such that p = x^2 + y^2 basically reduces to finding a square root of −1 modulo p. This can be done in polynomial time using the Tonelli–Shanks algorithm. If z is an integer such that z^2 ≡ −1 (mod p), then one can use the Euclidean algorithm for Gaussian integers to compute x + yi = gcd(z + i, p). In order to find a solution to n = x^2 + y^2 for a composite integer n, one would have to factor n first, and, as we know, integer factorization is in general a difficult problem. In fact, as we saw in Assignment 3, the ability to represent a composite integer n as a sum of two squares in two different ways yields a non-trivial factorization of n. Such a method of factorization is called the Euler Factorization Method. Leonhard Euler used this method to factor the integer 1000009 = 293 · 3413 by knowing the fact that
1000009 = 1000^2 + 3^2 = 972^2 + 235^2.
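The gcd recipe above can be sketched in a few lines. The Euclidean algorithm in Z[i] divides with quotients rounded coordinate-wise to the nearest integer, which keeps the norm of each remainder below that of the divisor. For brevity, this sketch finds a square root of −1 modulo p via Euler's criterion with a searched non-residue, rather than with the Tonelli–Shanks algorithm mentioned above:

```python
def gaussian_gcd(a, b):
    """Euclidean algorithm in Z[i]; Gaussian integers are pairs (re, im).
    The quotient rounds each coordinate of a/b to the nearest integer,
    so that N(remainder) < N(divisor)."""
    while b != (0, 0):
        (ar, ai), (br, bi) = a, b
        n = br * br + bi * bi                       # N(b)
        # a * conj(b) = (ar*br + ai*bi, ai*br - ar*bi); divide by N(b),
        # rounding to nearest: round(x / n) = (2x + n) // (2n)
        qr = (2 * (ar * br + ai * bi) + n) // (2 * n)
        qi = (2 * (ai * br - ar * bi) + n) // (2 * n)
        rem = (ar - (qr * br - qi * bi), ai - (qr * bi + qi * br))
        a, b = b, rem
    return a

def two_squares(p):
    """Write a prime p ≡ 1 (mod 4) as x^2 + y^2 via x + yi = gcd(z + i, p),
    where z^2 ≡ -1 (mod p)."""
    # z = a^((p-1)/4) mod p works whenever a is a quadratic non-residue
    for a in range(2, p):
        z = pow(a, (p - 1) // 4, p)
        if z * z % p == p - 1:
            break
    g = gaussian_gcd((z, 1), (p, 0))
    return abs(g[0]), abs(g[1])

assert sorted(two_squares(13)) == [2, 3]   # 13 = 2^2 + 3^2
```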
Exercise 30.4. Consider the setup as in Exercise 29.19. We say that γ ≠ 0 is an Eisenstein prime if the factorization γ = αβ for α, β ∈ Z[ω] implies that either α is a unit or β is a unit.
1. Prove that every rational prime p ≡ 2 (mod 3) is also an Eisenstein prime.
Hint: See Example 29.10.
2. Note that 3 = (1 − ω)(1 − ω^2), so 3 is not an Eisenstein prime. Also, it can be shown that every rational prime p ≡ 1 (mod 3) is not an Eisenstein prime. Use this fact, as well as Part 1, to show that every integer n with the prime factorization
n = 3^t p_1^{e_1} p_2^{e_2} · · · p_k^{e_k} q_1^{2f_1} q_2^{2f_2} · · · q_ℓ^{2f_ℓ},
where p_i ≡ 1 (mod 3) for all i = 1, 2, . . . , k and q_j ≡ 2 (mod 3) for all j = 1, 2, . . . , ℓ, admits a non-trivial solution (x, y) to the Diophantine equation n = x^2 − xy + y^2.
31
Continued Fractions
Even though most real numbers are not rational, to simplify calculations we approximate them by rationals. However, some rationals are better than others, so which ones should we pick? For example, we can truncate the decimal expansion of the number π = 3.1415926535 . . . after the 9th digit, and approximate π by the rational number 3141592654/10^9. However, after a careful investigation we discover that the rational number 103993/33102 also approximates π to 9 decimal digits while having a significantly smaller denominator. So we can ask the following question:
For a given real number α and a positive integer Q, which rational numbers p/q with 1 ≤ q ≤ Q correspond to the minimal value of |α − p/q|?
This question lies at the core of the subarea of Number Theory called Diophantine Approximation. As we will find out, the best possible rational approximations to a non-zero real number α form a sequence {p_n/q_n}_{n=0}^∞, called the canonical continued fraction expansion of α. Every canonical continued fraction is a special case of what is called a partial fraction, whose properties we will now investigate.
Definition 31.1. Let a_0, a_1, . . . , a_N be real numbers such that a_i > 0 for all i satisfying 1 ≤ i ≤ N. Define the partial fraction [a_0, a_1, . . . , a_N] by
[a_0, a_1, . . . , a_N] := a_0 + 1/(a_1 + 1/(a_2 + · · · + 1/a_N)).
The numbers a_0, a_1, . . . , a_N are called the partial coefficients of [a_0, a_1, . . . , a_N]. If n is an integer such that 0 ≤ n ≤ N, the partial fraction [a_0, a_1, . . . , a_n] is called the n-th convergent to [a_0, a_1, . . . , a_N].
Note that in the definition of a partial fraction we let the a_i's be real numbers such that a_i > 0 for all i satisfying 1 ≤ i ≤ N. If we allow the a_i's to be negative or complex, then not every choice of a_i's is admissible, as the examples [1, 1, −1] or [i, i, i] demonstrate. Soon we will introduce canonical continued fractions and restrict the domain of the a_i's from real numbers to integers.
Example 31.2. Let us determine the value of [√2, √2, √2]. We have
[√2, √2, √2] = √2 + 1/(√2 + 1/√2) = √2 + √2/3 = 4√2/3.
Also, we see that
4√2/3 = 1 + 1/(1 + 1/(7/2 + 3√2)) = [1, 1, 7/2 + 3√2].
Thus several continued fractions can correspond to the same number. Some continued fractions, like
4√2/3 = [1, 1, 7, 1, 2, 1, 7, 1, 2, 1, . . .],
appear to be periodic, while some continued fractions, like
∛3 = [1, 2, 3, 1, 4, 1, 5, 1, 1, 6, 2, 5, 8, . . .],
seem to be aperiodic. They can also be infinite. Certain numbers have quite elegant continued fraction expansions. For example,
tan(1) = [1, 1, 1, 3, 1, 5, 1, 7, 1, 9, 1, 11, 1, 13, . . .].
Exercise 31.3. Compute [1, 2, 3, 4, 5] and [√5, 2√5, 3√5]. Give an example of a continued fraction of ∛2 with at least five terms.
Some elementary properties of continued fractions are
[a_0, a_1, . . . , a_n] = [a_0, a_1, . . . , a_{n−1} + 1/a_n],
[a_0, a_1, . . . , a_n] = [a_0, [a_1, . . . , a_n]]
and, more generally,
[a_0, a_1, . . . , a_n] = [a_0, a_1, . . . , a_{m−1}, [a_m, . . . , a_n]].
Proposition 31.4. Let a_0, a_1, . . . , a_N be real numbers such that a_i > 0 for all i satisfying 1 ≤ i ≤ N. For a non-negative integer n, define the real numbers p_n and q_n by
p_0 = a_0, q_0 = 1,
p_1 = a_1 a_0 + 1, q_1 = a_1,
p_n = a_n p_{n−1} + p_{n−2}, q_n = a_n q_{n−1} + q_{n−2} for n ≥ 2.
Then [a_0, a_1, . . . , a_n] = p_n/q_n.
Proof. We will prove this statement using the principle of mathematical induction.
Base case. Clearly, we have
[a_0] = a_0 = a_0/1 = p_0/q_0,
[a_0, a_1] = a_0 + 1/a_1 = (a_1 a_0 + 1)/a_1 = p_1/q_1,
so the result holds for n = 0, 1.
Induction hypothesis. Suppose that the statement is true for n = m − 1, m, where m < N.
Induction step. We will show that the result holds for n = m + 1. We have
[a_0, a_1, . . . , a_{m+1}] = [a_0, a_1, . . . , a_m + 1/a_{m+1}]
= ((a_m + 1/a_{m+1}) p_{m−1} + p_{m−2}) / ((a_m + 1/a_{m+1}) q_{m−1} + q_{m−2})
= (a_{m+1}(a_m p_{m−1} + p_{m−2}) + p_{m−1}) / (a_{m+1}(a_m q_{m−1} + q_{m−2}) + q_{m−1})
= (a_{m+1} p_m + p_{m−1}) / (a_{m+1} q_m + q_{m−1})
= p_{m+1}/q_{m+1},
where the second equality applies the induction hypothesis to the continued fraction with last coefficient a_m + 1/a_{m+1}.
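The recurrence of Proposition 31.4 also gives a fast way to compute convergents in practice. A minimal sketch in Python, for integer partial coefficients only; seeding with p_{−1} = 1, q_{−1} = 0 is a standard bookkeeping trick that reproduces the stated initial values:

```python
from fractions import Fraction

def convergents(a):
    """Given partial coefficients a = [a0, a1, ..., aN], return the
    list of convergents p_n/q_n via the recurrence
    p_n = a_n p_{n-1} + p_{n-2}, q_n = a_n q_{n-1} + q_{n-2}."""
    p_prev, q_prev = 1, 0          # conventional p_{-1}, q_{-1}
    p, q = a[0], 1                 # p_0, q_0
    result = [Fraction(p, q)]
    for an in a[1:]:
        p, p_prev = an * p + p_prev, p
        q, q_prev = an * q + q_prev, q
        result.append(Fraction(p, q))
    return result

# Convergents of pi = [3, 7, 15, 1, 292, 1, ...]:
cs = convergents([3, 7, 15, 1, 292, 1])
assert cs[1] == Fraction(22, 7) and cs[3] == Fraction(355, 113)
```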
Proposition 31.5. For any positive integer n, it is the case that
p_n q_{n−1} − p_{n−1} q_n = (−1)^{n−1}
or, equivalently,
p_n/q_n − p_{n−1}/q_{n−1} = (−1)^{n−1}/(q_n q_{n−1}).
Proof. See Assignment 6.
Proposition 31.6. For any integer n ≥ 2, it is the case that
p_n q_{n−2} − p_{n−2} q_n = (−1)^n a_n
or, equivalently,
p_n/q_n − p_{n−2}/q_{n−2} = (−1)^n a_n/(q_n q_{n−2}).
Proof. The result follows from Proposition 31.5:
p_n/q_n − p_{n−2}/q_{n−2} = (a_n p_{n−1} + p_{n−2})/(a_n q_{n−1} + q_{n−2}) − p_{n−2}/q_{n−2}
= (q_{n−2}(a_n p_{n−1} + p_{n−2}) − p_{n−2}(a_n q_{n−1} + q_{n−2})) / (q_{n−2}(a_n q_{n−1} + q_{n−2}))
= a_n(p_{n−1} q_{n−2} − p_{n−2} q_{n−1}) / (q_n q_{n−2})
= (−1)^n a_n / (q_n q_{n−2}).
Proposition 31.7. Let a_0, a_1, . . . , a_N be real numbers such that a_i > 0 for all i satisfying 1 ≤ i ≤ N. Let x_n = p_n/q_n. Then the following hold:
1. It is the case that
x_0 < x_2 < x_4 < . . .  and  x_1 > x_3 > x_5 > . . . ;
2. Every odd convergent is greater than any even convergent. That is, x_{2k+1} > x_{2ℓ} for any k and ℓ;
3. The N-th convergent x_N is greater than any other even convergent and less than any other odd convergent.
Proof. Let us prove property 1. If n is even, then it follows from Proposition 31.6 that
x_n − x_{n−2} = p_n/q_n − p_{n−2}/q_{n−2} = a_n/(q_n q_{n−2}) > 0.
Therefore
x_{n−2} = p_{n−2}/q_{n−2} < p_n/q_n = x_n
for all even n. Analogously, one can show that x_{n−2} > x_n for all odd n.
To establish property 2, recall that by Proposition 31.5 we have x_{2k+1} > x_{2k} for all non-negative k. If ℓ ≤ k, then x_{2k} ≥ x_{2ℓ}, so x_{2k+1} > x_{2ℓ}. If ℓ > k, then x_{2ℓ} < x_{2ℓ+1}; since x_{2k+1} > x_{2ℓ+1}, it follows that x_{2k+1} > x_{2ℓ}.
Finally, to see that property 3 holds, we note that if N is even then by property 1 we have x_0 < x_2 < . . . < x_N. Thus x_N is greater than any other even convergent. On the other hand, by property 2, every even convergent, including x_N, is less than every odd convergent. The result then follows for all even N, and similarly one can also argue that it is true when N is odd.
Example 31.8. Let us see an example of the phenomenon described in Proposition 31.7. Consider the following continued fraction expansion of √7:
√7 = [2, 1, 1, 1, 4, 1, 1, 1, 4, 1, . . .] = 2.64575 . . . .
The first 10 convergents of √7 are
2, 3, 5/2, 8/3, 37/14, 45/17, 82/31, 127/48, 590/223, 717/271.
We see that
2 < 5/2 < 37/14 < 82/31 < 590/223 < . . . < √7 < . . . < 717/271 < 127/48 < 45/17 < 8/3 < 3.
The n-th convergents to the left of √7 correspond to even n, while the n-th convergents to the right of √7 correspond to odd n.
Now let α be a real number. We construct the canonical continued fraction expansion of α as follows:
Step 1. Define a_0 := ⌊α⌋. If α = a_0 then α = [a_0]. Otherwise let α = a_0 + 1/α_1 for some α_1.
Step 2. Let a_1 := ⌊α_1⌋. If α_1 = a_1 then α = a_0 + 1/a_1 = [a_0, a_1]. Otherwise let α_1 = a_1 + 1/α_2 for some α_2.
We repeat this procedure. If it stops after a finite number of steps, then α = [a_0, . . . , a_N]. Otherwise α = [a_0, a_1, . . .] has an infinite canonical continued fraction expansion.
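The floor-and-invert procedure above can be sketched in a few lines of Python. For a rational input the computation is exact if we use Fraction; with floating-point input the later terms would quickly become unreliable:

```python
from fractions import Fraction
import math

def canonical_cf(alpha, max_terms=10):
    """Run the floor-and-invert procedure on an exact Fraction.
    Stops when alpha_n is an integer or after max_terms terms."""
    terms = []
    while len(terms) < max_terms:
        a = math.floor(alpha)
        terms.append(a)
        if alpha == a:
            break
        alpha = 1 / (alpha - a)    # alpha_{n+1} = 1 / (alpha_n - a_n)
    return terms

# A rational number has a finite expansion (cf. Exercise 31.11):
assert canonical_cf(Fraction(103993, 33102)) == [3, 7, 15, 1, 292]
```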
Example 31.9. Let us determine the first five terms in the canonical continued fraction expansion of π = 3.14159 . . ., as well as the first five convergents of π.
Step 1. Define a_0 := ⌊π⌋ = 3. Then
π = [3, α_1] = 3 + 1/α_1,
where α_1 = 1/(π − 3) = 7.06251 . . . .
Step 2. Define a_1 := ⌊α_1⌋ = 7. Then
π = [3, 7, α_2] = 3 + 1/(7 + 1/α_2),
where α_2 = 1/(α_1 − 7) = 15.99659 . . . . We see that the first convergent to π is
p_1/q_1 = a_0 + 1/a_1 = 3 + 1/7 = 22/7.
Step 3. Define a_2 := ⌊α_2⌋ = 15. Then
π = [3, 7, 15, α_3] = 3 + 1/(7 + 1/(15 + 1/α_3)),
where α_3 = 1/(α_2 − 15) = 1.00342 . . . . We see that the second convergent to π is
p_2/q_2 = a_0 + 1/(a_1 + 1/a_2) = 3 + 1/(7 + 1/15) = 333/106.
Proceeding in the same fashion, we see that
π = [3, 7, 15, 1, 292, 1, . . .],
and the first five convergents of π are
22/7, 333/106, 355/113, 103993/33102, 104348/33215.
Exercise 31.10. Determine the first five terms in the canonical continued fraction expansion of Euler's number e = 2.71828 . . ., as well as the first five convergents of e.
Exercise 31.11. Prove that α has a finite canonical continued fraction expansion if and only if α is a rational number.
Proposition 31.12. Let α be a real number and let p_n/q_n be the n-th convergent in the canonical continued fraction expansion of α. Then
|q_1 α − p_1| > |q_2 α − p_2| > |q_3 α − p_3| > . . . .
Proof. Let α = [a_0, a_1, . . . , a_n, α_{n+1}]. Then
α = (α_{n+1} p_n + p_{n−1})/(α_{n+1} q_n + q_{n−1}).
It follows from Proposition 31.5 that
|q_n α − p_n| = |q_n · (α_{n+1} p_n + p_{n−1})/(α_{n+1} q_n + q_{n−1}) − p_n| = |q_n p_{n−1} − p_n q_{n−1}|/(α_{n+1} q_n + q_{n−1}) = 1/(q_n α_{n+1} + q_{n−1}).
Now note that
q_n α_{n+1} + q_{n−1} ≥ q_n + q_{n−1} = a_n q_{n−1} + q_{n−2} + q_{n−1} = q_{n−1}(a_n + 1) + q_{n−2} > q_{n−1} α_n + q_{n−2},
where the first inequality holds because α_{n+1} ≥ 1, and the last because α_n < a_n + 1. The observation made above allows us to conclude that
|q_n α − p_n| = 1/(q_n α_{n+1} + q_{n−1}) < 1/(q_{n−1} α_n + q_{n−2}) = |q_{n−1} α − p_{n−1}|.
Proposition 31.13. Let α be a real number and let p_n/q_n be the n-th convergent in the canonical continued fraction expansion of α. Then
1/((a_{n+1} + 2) q_n^2) < |α − p_n/q_n| < 1/(a_{n+1} q_n^2).
Proof. Let α = [a_0, a_1, . . . , α_{n+1}] for some α_{n+1} such that a_{n+1} ≤ α_{n+1} < a_{n+1} + 1. Also, let p_n/q_n = [a_0, a_1, . . . , a_n] be the n-th convergent to α. Then it follows from the formula
α = (α_{n+1} p_n + p_{n−1})/(α_{n+1} q_n + q_{n−1}),
as well as from Proposition 31.5, that
|α − p_n/q_n| = 1/(q_n(α_{n+1} q_n + q_{n−1})).
Since q_n > q_{n−1}, we can deduce the desired result by establishing the following inequalities:
a_{n+1} q_n < α_{n+1} q_n + q_{n−1} < (a_{n+1} + 1) q_n + q_n = (a_{n+1} + 2) q_n.
Proposition 31.14. Let α be a real number and let p_n/q_n be the n-th convergent in the canonical continued fraction expansion of α. Then for all integers p and q such that 0 < q < q_{n+1} it is the case that |qα − p| ≥ |q_n α − p_n|.
Proof. Note that if p = p_n and q = q_n then the result holds. Thus we may assume that p/q ≠ p_n/q_n. Recall from Proposition 31.5 that
p_n q_{n+1} − q_n p_{n+1} = (−1)^{n+1}.
Then the matrix
A = [ p_n  p_{n+1} ]
    [ q_n  q_{n+1} ]
has the non-zero determinant det A = (−1)^{n+1}, which means that it is invertible. Furthermore, the inverse matrix is
A^{−1} = (1/det A) [ q_{n+1}  −p_{n+1} ]  =  (−1)^{n+1} [ q_{n+1}  −p_{n+1} ]
                   [ −q_n      p_n     ]               [ −q_n      p_n     ].
As we can see, the matrix A^{−1} has integer coefficients. This means that the matrix equation
[ p ]     [ u ]
[ q ] = A [ v ]
can be solved in integers u and v, and the solution is
u = (−1)^{n+1}(q_{n+1} p − p_{n+1} q),  v = (−1)^{n+1}(p_n q − q_n p).
Note that v ≠ 0 and u ≠ 0, for otherwise it would be the case that p/q = p_n/q_n or p/q = p_{n+1}/q_{n+1}. Of course, the latter is impossible because, according to the hypothesis, q < q_{n+1}.
Now consider the expressions
p = u p_n + v p_{n+1},  q = u q_n + v q_{n+1}.
Note that
q = u q_n + v q_{n+1} < q_{n+1}.
We claim that u and v have opposite signs. If both u and v were negative, then q would be negative, which contradicts the assumption q > 0. On the other hand, if both u and v were positive, then q would have to exceed q_{n+1}, contradicting the inequality established above. Since neither u nor v can be zero, we see that our claim holds; that is, the numbers u and v have different signs.
Next, recall that according to property 3 of Proposition 31.7, either
p_n/q_n < α < p_{n+1}/q_{n+1}  or  p_{n+1}/q_{n+1} < α < p_n/q_n
must hold, depending on whether n is even or odd. In any case, it must be that αq_n − p_n and αq_{n+1} − p_{n+1} have different signs. Since u, v have different signs and αq_n − p_n, αq_{n+1} − p_{n+1} have different signs, the signs of u(q_n α − p_n) and v(q_{n+1} α − p_{n+1}) match. Hence
|qα − p| = |α(u q_n + v q_{n+1}) − (u p_n + v p_{n+1})|
= |u(q_n α − p_n) + v(q_{n+1} α − p_{n+1})|
= |u(q_n α − p_n)| + |v(q_{n+1} α − p_{n+1})|
≥ |u| |q_n α − p_n|
≥ |q_n α − p_n|.
The fact that u(q_n α − p_n) and v(q_{n+1} α − p_{n+1}) have the same sign was utilized to establish the third equality. In turn, the last inequality follows from the fact that u is a non-zero integer.
Corollary 31.15. Let p/q be a rational number and let α be a real number. Then the inequality
|α − p/q| < 1/(2q^2)
implies that p/q = p_n/q_n for some non-negative integer n. That is, the number p/q appears as a convergent in the canonical continued fraction expansion of α.
Proof. See Assignment 6.
We conclude this section by discussing the question of periodicity of canonical continued fraction expansions.
Definition 31.16. Let α be a real number with the canonical continued fraction expansion
α = [a_0, a_1, . . . , a_n; b_1, b_2, . . . , b_k, b_1, b_2, . . . , b_k, b_1, . . .].
In other words, at some point the elements of the continued fraction expansion start to repeat. We indicate this by writing
α = [a_0, a_1, . . . , a_n; b_1, b_2, . . . , b_k].
A canonical continued fraction expansion of this kind is called preperiodic, and if the terms a_0, a_1, a_2, . . . , a_n are missing we say that it is periodic. The smallest number k such that the terms repeat is called the period of the continued fraction.
It was proved by Joseph-Louis Lagrange that a real number α has a preperiodic canonical continued fraction expansion if and only if it is a quadratic irrational. That is, α = a + b√d for some rational numbers a and b ≠ 0, where d is a positive integer that is not a perfect square.
Example 31.17. Let us determine the canonical continued fraction expansion of √7. By computing the first few terms, we see that
√7 = [2, 1, 1, 1, 4, 1, 1, 1, 4, 1, . . .].
Thus we can guess that √7 = [2; 1, 1, 1, 4]. Let us prove this fact.
Let θ = [1, 1, 1, 4, 1, 1, 1, 4, . . .] be the periodic part. Then
θ = [1, 1, 1, 4, θ] = 1 + 1/(1 + 1/(1 + 1/(4 + 1/θ))) = (14θ + 3)/(9θ + 2).
We see that θ satisfies the equation
3θ^2 − 4θ − 1 = 0.
The above equation has two roots, but since θ > 0 we can conclude that
θ = (2 + √7)/3.
Then
[2; 1, 1, 1, 4] = [2, θ] = 2 + 1/θ = 2 + 3/(2 + √7) = (7 + 2√7)/(2 + √7) = √7,
as claimed.
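Such identities are easy to corroborate numerically, by evaluating a long truncation of the periodic expansion exactly and comparing its square with 7. A small sketch:

```python
from fractions import Fraction

def cf_value(terms):
    """Evaluate a finite continued fraction [a0, a1, ..., aN]
    from the inside out, using exact rational arithmetic."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value

# Truncate [2; 1, 1, 1, 4] after six full periods:
terms = [2] + [1, 1, 1, 4] * 6
x = cf_value(terms)
# x is an extremely good rational approximation to sqrt(7)
assert abs(float(x) ** 2 - 7) < 1e-9
```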
Exercise 31.18. Determine the canonical continued fraction expansions of (1 + √5)/2 and √2. Are they both preperiodic? Are they both periodic? What are the periods of their continued fraction expansions?
Exercise 31.19. Prove that if a real number α has a preperiodic canonical continued fraction expansion, then there exist rational integers a, b and c, not all zero, such that
aα^2 + bα + c = 0.
32
Pell's Equation
For more details on the subject, we refer the reader to the monograph of M. J. Jacobson, Jr. and H. C. Williams, Solving the Pell Equation, 2009.
In 1773, Gotthold Ephraim Lessing was appointed librarian of the Herzog August Library in Wolfenbüttel, Germany. In this library, he discovered an ancient Greek manuscript with a poem of 44 lines, which contained an interesting arithmetical problem. This problem is attributed to Archimedes and is called Archimedes' Cattle Problem. The problem was to calculate the number of cattle in the herd of Helios, the god of the sun. There were two parts to this problem, the first of which could be solved relatively easily by setting up a system of seven equations in eight unknowns, one for each type of bulls and cows present in the herd. Much more challenging was the second part of the problem, which, in essence, reduces to solving the equation
x^2 − 4729494y^2 = 1.
Despite its innocent look, the smallest solution to this equation has more than 100000 digits. In 1880, A. Amthor discovered that the smallest herd that could satisfy both parts of this problem had approximately 7.76 × 10^206544 bulls. In comparison, it is conjectured that there are between 10^78 and 10^82 atoms in the known, observable universe.^46 Of course, Amthor himself did not calculate this number precisely. In 1965, the precise answer to Archimedes' Cattle Problem was given by Hugh Williams, Gus German and Robert Zanke, who were University of Waterloo students at that time. To calculate the answer, they used a combination of the IBM 7040 and IBM 1620 computers. You can find a fascinating article about the history of computing at the University of Waterloo here: https://cs.uwaterloo.ca/40th/Chronology/printable.shtml.
^46 According to http://www.universetoday.com/36302/atoms-in-the-universe/.
An equation of the form
x^2 − dy^2 = ±1,
where d is positive and is not a perfect square, is called Pell's equation. The name is due to Euler, who attributed the method of solving this equation to John Pell. It is widely believed that Euler actually made a mistake and confused John Pell with William Brouncker. The English mathematician William Brouncker discovered a general method for solving Pell's equation, which was based on continued fractions. He was able to apply it to the equation
x^2 − 313y^2 = 1
and find the smallest positive solution
x = 32188120829134849, y = 1819380158564160.
When writing to Frenicle de Bessy, who proposed this problem to him, Brouncker claimed that it only took him "an hour or two" to find the solution. In 1768, Joseph-Louis Lagrange managed to prove that Pell's equation has a solution different from (±1, 0) for every positive d that is not a perfect square.
We will now apply Corollary 31.15 to show that every positive solution to Pell's equation x^2 − dy^2 = ±1 must arise from a convergent of √d.
Theorem 32.1. Let d be a positive integer that is not a perfect square. Then every solution (x, y) ≠ (±1, 0) to Pell's equation
x^2 − dy^2 = ±1
must satisfy x/y = p_n/q_n for some non-negative integer n, where p_n/q_n is the n-th convergent of √d.
Proof. Suppose that (x, y) ≠ (±1, 0) is a solution. Without loss of generality, we may assume that x and y are positive. Then
x ≥ √(dy^2 − 1) ≥ y√(d − 1).
Therefore
|x − √d y| = |x^2 − dy^2|/(x + √d y) = 1/(x + √d y) ≤ 1/(y(√d + √(d − 1))) < 1/(2y),
since √d + √(d − 1) > 2 for d ≥ 2. Thus
|√d − x/y| < 1/(2y^2).
It follows from Corollary 31.15 that x/y is a convergent of √d.
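Theorem 32.1 suggests an algorithm for solving Pell's equation: generate the convergents of √d one by one and stop at the first one with x^2 − dy^2 = ±1. The sketch below produces the partial coefficients of √d with the standard quadratic-surd recurrence (m_{n+1} = a_n c_n − m_n, c_{n+1} = (d − m_{n+1}^2)/c_n; an assumption of this sketch — any exact method of expanding √d would do), so that all arithmetic stays in integers:

```python
import math

def solve_pell(d, max_terms=200):
    """Search the convergents p_n/q_n of sqrt(d) for a solution to
    x^2 - d*y^2 = ±1.  Requires d > 0 and d not a perfect square."""
    a0 = math.isqrt(d)
    # surd state: alpha_n = (m + sqrt(d)) / c, starting at sqrt(d)
    m, c, a = 0, 1, a0
    p_prev, q_prev, p, q = 1, 0, a0, 1   # convergent p_0/q_0 = a0/1
    for _ in range(max_terms):
        if p * p - d * q * q in (1, -1):
            return p, q
        # next partial coefficient of sqrt(d)
        m = a * c - m
        c = (d - m * m) // c
        a = (a0 + m) // c
        # next convergent via the recurrence of Proposition 31.4
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return None

assert solve_pell(2) == (1, 1)   # 1^2 - 2*1^2 = -1
assert solve_pell(7) == (8, 3)   # 8^2 - 7*3^2 = 1
```

For d = 313 this search recovers a fundamental solution well within the first few dozen convergents, in line with Brouncker's "hour or two".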
33
Algebraic and Transcendental Numbers.
Liouville’s Approximation Theorem
In 1840, the French mathematician Joseph Marie Liouville proved the so-called Approximation Theorem, which allowed him to discover the first transcendental number, ∑_{k=0}^∞ 10^{−k!}. This number is called the Liouville Number. You are asked to reproduce Liouville's proof for a different number in Exercise 33.7.
Definition 33.1. A complex number α is called algebraic if there exists a non-zero polynomial f(t) with rational coefficients such that f(α) = 0. Otherwise, it is called transcendental.
Definition 33.2. Let α be an algebraic number. Let
f(t) = c_d t^d + c_{d−1} t^{d−1} + . . . + c_1 t + c_0
be a polynomial such that
a) f(α) = 0;
b) c_0, c_1, . . . , c_d ∈ Z;
c) c_d > 0;
d) gcd(c_0, c_1, . . . , c_d) = 1;
e) the polynomial f(t) has the smallest degree among all non-zero polynomials satisfying a), b), c) and d).
Then f(t) is called the minimal polynomial of α. It is a fact from algebraic number theory that such a polynomial is unique. We say that the algebraic number α has degree d if the degree of its minimal polynomial is equal to d, i.e. deg f = d.
Example 33.3. Consider the number √2. This number is algebraic, since √2 is a root of the polynomial f(t) = t^2 − 2, which has rational coefficients. Note that it is also a root of f_1(t) = 0, of f_2(t) = t^3 + 3t^2 − 2t − 6, and of f_3(t) = 6t^2 − 12. However, none of these polynomials satisfies Definition 33.2.
Exercise 33.4. Explain why the numbers α = 0, 1/2, i, √(2 + √3) are algebraic. For each α, find a non-zero monic polynomial with rational coefficients such that f(α) = 0.
Exercise 33.5. a) Prove that every rational number x/y has degree 1;
b) Prove that every quadratic irrational has degree 2. In other words, show that every number of the form a + b√d, where a, b, d ∈ Q, b ≠ 0 and d is not the square of a rational number, satisfies some polynomial f(x) of degree 2 and does not satisfy any polynomial of degree 1.
Some properties of minimal polynomials:
• For a given algebraic number α, the minimal polynomial of α is unique;
• Every minimal polynomial f(t) is irreducible over the field of rational numbers. That is, if g(t) | f(t) and g(t) ∈ Q[t], then g(t) is a constant multiple of either f(t) or 1;
• Let α be a root of its minimal polynomial f(t). Then f′(α) ≠ 0. That is, in C[t] it is the case that (t − α) | f(t) while (t − α)^2 ∤ f(t).
Theorem 33.6. (Liouville's Approximation Theorem, 1840) Let α be an irrational algebraic number (that is, an algebraic number of degree d ≥ 2). Then there exists some constant C > 0, which depends only on α, such that for any x ∈ Z, y ∈ N the following inequality holds:
|α − x/y| ≥ C/y^d.
Proof.^47 Let f(t) = c_d t^d + . . . + c_1 t + c_0 be the minimal polynomial of α. Since f is irreducible over Q and is of degree d ≥ 2, it has no rational roots, so f(x/y) ≠ 0 for any x ∈ Z, y ∈ N. Furthermore,
|f(x/y)| = |∑_{k=0}^d c_k (x/y)^k| = (1/y^d) |∑_{k=0}^d c_k x^k y^{d−k}| ≥ 1/y^d,
since ∑_{k=0}^d c_k x^k y^{d−k} is a non-zero integer.
We now apply the Mean Value Theorem and observe that there exists some real ξ between x/y and α satisfying
f′(ξ) = (f(α) − f(x/y))/(α − x/y) = −f(x/y)/(α − x/y).
^47 The proof is from P. Garrett, Liouville's theorem on diophantine approximation, 2013. See http://www.math.umn.edu/~garrett/m/mfms/notes_2013-14/04b_Liouville_approx.pdf. Note that there is an error in these notes: instead of estimating |f′(ξ)| from above, the author obtains the estimate from below.
Rearranging the terms of the above equality, we get
|α − x/y| = |f(x/y)| · |f′(ξ)|^{−1} ≥ |f′(ξ)|^{−1}/y^d.
For now, our constant |f′(ξ)|^{−1} depends on α and y (note that ξ depends on α, x and y), but it is not hard to eliminate the dependency on y by slightly adjusting our constant. In particular, since f is minimal, the multiplicity of the root α is 1, which means that f′(α) ≠ 0. This means that for all ξ within some small neighbourhood U_α of α, it must be the case that 0 < |f′(ξ)| ≤ 2|f′(α)|. Plainly, there exists some large y_0, which depends only on α, such that any rational fraction x/y with denominator y ≥ y_0 that approximates α well falls into U_α. We conclude that
|α − x/y| ≥ |f′(ξ)|^{−1}/y^d ≥ |f′(α)|^{−1}/(2y^d)
for all y ≥ y_0. Finally, we choose our constant C by picking the minimum between 2^{−1}|f′(α)|^{−1} and y^d|α − x/y| over all y < y_0. This concludes the proof.
Liouville's Approximation Theorem is a very elegant result which can be explained on a rather intuitive level. As y grows, we certainly expect our approximations x/y of α to become more precise. The question is, to what extent, and how can we measure the "quality" of our approximation? The theorem tells us that an irrational algebraic number cannot be approximated "too well" by rational numbers. One intuitive explanation of this phenomenon is the following: no fraction x/y will approximate α to more than d + log_y(1/C) accurate base-y places.
For example, when α = √2 one may take C = 1/4, and observe that for all x ∈ Z and all y ≥ 2
|√2 − x/y| ≥ 1/(4y^2).
One of the ways to interpret the above inequality is as follows: no fraction x/y with y > 2 will approximate √2 significantly better than up to 2 base-y places.
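This inequality is easy to spot-check by machine. For a fixed denominator y the left-hand side is smallest when x is the integer nearest to y√2, so it suffices to test that single numerator; the floating-point comparison below has ample margin, since the best constant achievable by convergents is 1/(2√2) ≈ 0.354 > 1/4:

```python
import math

SQRT2 = math.sqrt(2)

# Check |sqrt(2) - x/y| >= 1/(4 y^2) for every denominator y up to 10^4.
# Any x other than the nearest integer only makes the left side larger.
for y in range(2, 10001):
    x = round(y * SQRT2)
    assert abs(SQRT2 - x / y) >= 1 / (4 * y * y)
```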
Many more things can be said regarding Liouville's inequality. For example, one may ask what happens if we make C a function of y:
|α − x/y| ≥ C(y)/y^d.
It turns out that for d = 2 one cannot replace the constant C with a monotonically increasing function C(y), but for d ≥ 3 this can be done. The first improvement of this kind was introduced by Thue in 1909, who showed that one can take C(y) = c_1 y^{d/2 − 1 − ε} for any ε > 0 and some constant c_1, which depends only on α and ε. This result allowed him to prove Thue's Theorem. Further improvements were developed by Siegel, Gelfond and Dyson, until in 1955 Roth showed that C(y) = c_1 y^{d − 2 − ε} would do the job as well. In basic terms, his result states that for an algebraic number α of degree ≥ 3 there are only finitely many rational approximations x/y that are accurate to more than 2 + ε base-y places.
Exercise 33.7. (a) Prove that, for every integer n ≥ 1, the number
α := ∑_{k=0}^∞ 1/2^{k!} = 1 + 1/4 + 1/64 + 1/16777216 + . . .
(note that the k = 0 and k = 1 terms are each equal to 1/2) satisfies the inequality
|α − ∑_{k=0}^n 1/2^{k!}| < 1/(2^{n!})^n.     (9)
Hint: Note that
∑_{k=n+1}^∞ 1/2^{k!} < ∑_{k=(n+1)!}^∞ 1/2^k.
Use the formula for the infinite geometric series afterwards.
(b) Use Liouville's Theorem and the inequality established in Part (a) to prove that the number α is either rational or transcendental.
Hint: Suppose not. Then there exist a fixed integer d ≥ 2 and a constant C > 0 such that
|α − x/y| ≥ C/y^d
for all integers x and y > 0. Why does this inequality contradict inequality (9)?
34
Elliptic Curves
Let n be a squarefree number. We say that n is congruent if there exists a right
triangle with rational sides whose area is n. For example, the number 5 is congruent since it is the area of the right triangle with rational sides 20/3, 3/2 and 41/6.
140
The number 6 is also congruent, since it is the area of the right triangle with
sides 3, 4 and 5. In contrast, the number 3 is not congruent. Also, note that if n
is congruent, then any integer of the form s^2·n, with s a positive integer, also
trivially arises as the area of a right triangle with rational sides. That is why we
restrict our attention only to squarefree n.
Given a squarefree number n, how can we find out whether it is congruent or
not? Essentially, what we need to do is to solve the system of equations
a^2 + b^2 = c^2;
(1/2)ab = n

for a, b, c ∈ Q. Set

x = n(a + c)/b,   y = 2n^2(a + c)/b^2.
Then
y^2 = x^3 − n^2 x,

where y ≠ 0. Thus, instead of the original system of equations we just have to
find x, y ∈ Q such that y^2 = x^3 − n^2 x. If such rational x and y exist, one can easily
obtain a solution to the original system of equations by setting
a = (x^2 − n^2)/y,   b = 2nx/y,   c = (x^2 + n^2)/y.
Thus we just have to find a rational point (x, y) on the curve y^2 = x^3 − n^2 x. Such a
curve is an example of an elliptic curve.
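The passage from a rational point back to a triangle can be checked numerically. The sketch below (Python's `fractions` for exact arithmetic; the helper name is made up) recovers the triangle with sides 20/3, 3/2, 41/6 from the point (45, 300) on y^2 = x^3 − 25x:

```python
from fractions import Fraction

def triangle_from_point(n, x, y):
    """Recover a rational right triangle of area n from a rational
    point (x, y), y != 0, on the curve y^2 = x^3 - n^2 x."""
    assert y != 0 and y**2 == x**3 - n**2 * x
    a = Fraction(x**2 - n**2, y)
    b = Fraction(2 * n * x, y)
    c = Fraction(x**2 + n**2, y)
    assert a**2 + b**2 == c**2       # the triangle is right-angled
    assert Fraction(a * b, 2) == n   # and its area is n
    return a, b, c

# (45, 300) satisfies y^2 = x^3 - 25x, so 5 is a congruent number:
print(triangle_from_point(5, 45, 300))  # the triangle (20/3, 3/2, 41/6)
```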
Definition 34.1. Let F = F_q, Q, R or C, where q is a prime power.48 Let a, b ∈ F be
such that 4a^3 + 27b^2 ≠ 0. The collection

E(F) = {(x, y) ∈ F^2 : y^2 = x^3 + ax + b} ∪ {∞}

is called an elliptic curve, defined over the field F. Here ∞ denotes the point at
infinity. The value

∆ = −16(4a^3 + 27b^2)

is called the discriminant of the elliptic curve E(F).
48 Here F_q denotes the finite field of order q. We will not give a rigorous construction of F_q here.
We remark though that when q is prime the finite field F_q is the same as Z_q, the ring of residue
classes modulo q.
Example 34.2. The graph of the elliptic curve E1 : y^2 = x^3 − 25x over R is
depicted in Figure 5. This elliptic curve, aside from the trivial rational points (0, 0)
and (±5, 0), contains a rational point (x, y) = (45, 300). This fact implies that the
number 5 is congruent. Furthermore, one can show that in the case of E1(Q) the
existence of one non-trivial rational point implies the existence of infinitely many
rational points. In contrast, E2 : y^2 = x^3 − 9x has no non-trivial rational points, so
the elliptic curve E2(Q) contains only four points, namely (0, 0), (±3, 0) and the
point at infinity. Both curves E1(R) and E2(R) contain infinitely many points.
Also, note that the graph of E1(R) contains two connected components. This
is because the discriminant of E1 is equal to ∆(E1) = 10^6 and is positive. In
contrast, the discriminant of E3 : y^2 = x^3 − 2 is equal to ∆(E3) = −1728 and is
negative. The negative sign indicates that the graph of E3(R) has one connected
component.
Figure 5: Elliptic curves y^2 = x^3 − 25x and y^2 = x^3 − 2
Exercise 34.3. Find integers a and b such that the discriminant of the curve y^2 =
x^3 + ax + b is equal to zero. What does the graph of such a curve look like?
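For experimenting with this exercise, the discriminant from Definition 34.1 is a one-liner. Below is a small sketch (Python, illustrative helper name), checked against the curves of Example 34.2:

```python
def discriminant(a, b):
    """Discriminant of the curve y^2 = x^3 + ax + b, as in Definition 34.1."""
    return -16 * (4 * a**3 + 27 * b**2)

# The two curves from Example 34.2:
print(discriminant(-25, 0))  # E1: y^2 = x^3 - 25x has positive discriminant
print(discriminant(0, -2))   # E3: y^2 = x^3 - 2  has negative discriminant
```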
Many problems in number theory are actually connected to elliptic curves. For
example, consider the Fermat equation

a^3 + b^3 = c^3.

The question of existence of non-trivial solutions to this Diophantine equation is
equivalent to solving the equation

u^3 + v^3 = 1

in rational numbers u and v. If we now let

x = 12(u^2 − uv + v^2),   y = 36(u − v)(u^2 − uv + v^2),

then

y^2 = x^3 − 432.
If some point (x, y) ∈ Q^2 with x ≠ 0 lies on the elliptic curve determined by the
above equation, then it is straightforward to check that the numbers

u = (36 + y)/(6x),   v = (36 − y)/(6x)

are rational and satisfy u^3 + v^3 = 1. So once again the existence of a solution to
some Diophantine equation reduces to the question of existence of a non-trivial
rational point on some elliptic curve.
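The correspondence above can be verified for the one rational point we can exhibit. A sketch with Python's `fractions` (the helper name is made up): the trivial solution u = 1, v = 0 of u^3 + v^3 = 1 corresponds to the point (12, 36) on y^2 = x^3 − 432:

```python
from fractions import Fraction

def uv_from_point(x, y):
    """Map a point (x, y), x != 0, on y^2 = x^3 - 432 to a pair (u, v)
    with u^3 + v^3 = 1, via the formulas in the text."""
    u = Fraction(36 + y, 6 * x)
    v = Fraction(36 - y, 6 * x)
    assert u**3 + v**3 == 1
    return u, v

# (12, 36) lies on the curve: 36^2 = 1296 = 12^3 - 432.
assert 36**2 == 12**3 - 432
print(uv_from_point(12, 36))  # the trivial solution u = 1, v = 0
```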
The first questions about elliptic curves date back to Diophantus of Alexandria,
who looked at the Diophantine equation of the form

y(6 − y) = x^3 − x.
Fermat claimed that he knew how to solve the Diophantine equation y^2 = x^3 + 1,
but did not provide his proof. The problem got fully resolved only one century
later by Euler. The field of algebraic number theory essentially was born when
Euler tried to solve the Diophantine equation y^2 = x^3 − 2 by writing

x^3 = y^2 + 2 = (y + √−2)(y − √−2)

and then claiming that y + √−2 and y − √−2 are “coprime”, without rigorously
explaining what coprimeness means in this setting. Of course, his intuition was
correct: the ring Z[√−2] is a Unique Factorization Domain, and indeed one can
show that y + √−2 and y − √−2 are coprime in Z[√−2], as long as y ≠ 0.
Elliptic curves have been extensively studied over the past two centuries. The
theory of elliptic curves truly blossomed with the prominent work of Weierstrass on
elliptic functions, which connects elliptic curves defined over the field of complex
numbers C to lattices in the complex plane. In fact, every elliptic curve over C arises
from (or can be reduced to) a lattice in the complex plane! Elliptic curves are
intimately connected to modular forms, and the development of the theories of
elliptic curves and modular forms resulted in Andrew Wiles’s proof of Fermat’s
Last Theorem (see Section 28 for more details).
Other prominent mathematicians who contributed a lot to the development of
the theory of elliptic curves were Abel and Jacobi. By studying so-called elliptic
integrals, they realized that, in fact, one can impose arithmetic on the points of an
elliptic curve. More precisely, such an arithmetic takes place whenever an elliptic
curve E(F) is defined over a field F. This is why we restrict our attention only to
F = F_q, Q, R, C and not, say, Z or Z/p^k Z for p prime and k ≥ 2. The latter two
collections are rings but not fields.
To explain what this means, consider for now some elliptic curve E defined
over the field of real numbers R. For two distinct points P, Q ∈ E(R), we draw
a line through P and Q. Of course, this line is uniquely defined. For now, let us
assume that this line is tangent to E at neither P nor Q (see the first picture on Figure
6).49 Our line will intersect E at some third point, say R. Our arithmetic on an
elliptic curve is then defined as follows:

P + Q + R = ∞;

that is, any three points P, Q and R which lie on E and on a common line add up
to ∞ (the point at infinity). Alternatively, if R = (x_R, y_R), we can write

P + Q = −R,

so by “adding” two points together we were able to produce a third point,
namely −R = (x_R, −y_R). On Figure 6, the point at infinity is actually denoted by
0. Soon we will see that there is a deep reason for this alternative notation.
Figure 6: Group law
49 The picture is taken from Wikipedia: https://upload.wikimedia.org/wikipedia/
commons/thumb/7/77/ECClines-2.0.svg/680px-ECClines-2.0.svg.png.
We can formalize the observations made above as follows. Let P = (x_P, y_P)
and Q = (x_Q, y_Q). Suppose that x_P ≠ x_Q. Let

s = (y_P − y_Q)/(x_P − x_Q)

denote the slope of the line passing through the points P and Q. Then we define
the third point R = (x_R, y_R) = P + Q as follows:

x_R = s^2 − x_P − x_Q,   y_R = −y_P + s(x_P − x_R).
It is straightforward to verify that R indeed belongs to E(R). Furthermore, if we
look closer at the expressions for x_R and y_R, we can notice that they preserve the
field of definition. That is, if P and Q are points in R^2, then R is also a point in
R^2. If P and Q are points in Q^2, then so is R. This applies to any field, so the
procedure of addition of points is well-defined over any base field F. See Figure 7
for a demonstration that the field of definition remains unchanged. All the points
in this example belong to Z^2, and therefore to Q^2 as well. Note, however, that in
general the addition of two integer points on an elliptic curve may not result in an
integer point, but it will result in a rational point or the point at infinity.50
We need to consider three special cases separately. For example, on the second
picture of Figure 6, we see the situation when the line is tangent to the curve at the
point Q. This picture corresponds to the following: if instead of distinct points P
and Q we pick two identical points, i.e. P = Q, then we can think of the tangent
line as the line which passes through both P and Q. In this case, the slope of our
line tangent to E at P = (x_P, y_P) is equal to

s = (3x_P^2 + a)/(2y_P),

and we may compute the point R = (x_R, y_R) = P + P as

x_R = s^2 − 2x_P,   y_R = −y_P + s(x_P − x_R).
Once again, we can easily verify that (x_R, y_R) indeed lies on E and that the
formulas for x_R and y_R above preserve the field of definition. That is, if the
coordinates of P lie in F, then so do the coordinates of R = P + P. For short, we
write R = 2P, and more generally

nP = P + P + · · · + P   (n times).
50 The picture is taken from William Stein’s lecture notes, Chapter 6, Figure 6.3: http://
wstein.org/simuw06/ch6.pdf.
Figure 7: The group law: (1, 0) + (0, 2) = (3, 4) on y^2 = x^3 − 5x + 4
The only two special cases left for us to consider are when P ≠ Q with x_P = x_Q
(third picture on Figure 6) and when P = Q with y_P = 0 (fourth picture on Figure 6).
Both cases result in a vertical line, which has an infinite slope. In the former case,
we write P + Q = ∞, and in the latter case we write 2P = ∞.
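The full case analysis translates directly into a short addition routine. A minimal sketch over Q, using exact rational arithmetic (Python's `fractions`; the function name is illustrative), checked against the example of Figure 7:

```python
from fractions import Fraction

INF = None  # the point at infinity

def ec_add(P, Q, a):
    """Add two points on y^2 = x^3 + ax + b, following the four cases
    described in the text (the coefficient b is not needed)."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    xP, yP = P
    xQ, yQ = Q
    if xP == xQ and yP == -yQ:        # vertical line: P + Q = infinity
        return INF
    if P == Q:                        # tangent line (point doubling)
        s = Fraction(3 * xP**2 + a, 2 * yP)
    else:                             # secant line through P and Q
        s = Fraction(yP - yQ, xP - xQ)
    xR = s**2 - xP - xQ
    yR = -yP + s * (xP - xR)
    return (xR, yR)

# Figure 7: (1, 0) + (0, 2) = (3, 4) on y^2 = x^3 - 5x + 4 (so a = -5):
print(ec_add((1, 0), (0, 2), -5))  # the point (3, 4)
```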
At this point, we have covered all four cases that can arise. In this unorthodox way,
we were able to define the operation of addition “+” on E(F). We can also define
the operation of negation: if P = (x_P, y_P), we write −P = (x_P, −y_P). Also, we can
define the operation of subtraction “−” as follows: P − Q = P + (−Q). One can
also notice that the point at infinity plays the role of zero, and this explains the
notation present in Figure 6. We summarize the observations made above (and
introduce a few more) in Proposition 34.4.
Proposition 34.4. (The Group Law) Let F be a field and E(F) be an elliptic curve.
The collection of points E(F) forms a group, called a Mordell-Weil Group, under
the operation of addition. That is, it satisfies the following four group axioms:
1. Closure. For all P, Q ∈ E(F), P + Q ∈ E(F);
2. Associativity. For all P, Q, R ∈ E(F), (P + Q) + R = P + (Q + R);
3. Identity element. For all P in E(F), the element ∞ satisfies
P + ∞ = ∞ + P = P;
4. Inverse element. For each P in E(F) there exists an element −P in E(F)
such that
P + (−P) = (−P) + P = ∞.
Furthermore, the group of points on an elliptic curve E(F) is Abelian:
5. Abelianness. For all P, Q ∈ E(F), P + Q = Q + P.
Theorem 34.5. (Mordell’s Theorem, 1922) For every elliptic curve E defined over
the field of rational numbers Q, the group E(Q) is a finitely generated Abelian
group. That is,

E(Q) ≅ C × Z^r,

where r is a non-negative integer and C is a finite Abelian group.
The main point of the above theorem is that the number r is finite. It is called
the Mordell-Weil rank of an elliptic curve E(Q). Such a nice classification is
impossible when the base field is R or C. In its essence, the theorem is saying
that even though there can be infinitely many rational points, there cannot be “too
many” of them in a very precise sense.
To better explain the theorem of Mordell, let us recall the notion of an order
of a group element. Just like for other groups that we studied, we say that the
point P ∈ E(F) has order n if n is the smallest positive integer such that nP = ∞.
If such an integer does not exist, we say that P has infinite order. According to
Mordell’s Theorem, there exist r elements P1 , P2 , . . . , Pr of infinite order such that
every element P ∈ E(Q) can be written in the form
P = T + ∑_{i=1}^{r} n_i P_i,
where n1 , n2 , . . . , nr are integers and T is a point of finite order (such points are
called torsion points).
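Orders of points can be explored by simply iterating the group law. The sketch below (Python; it repeats a compact version of the addition formulas from earlier in this section, and the helper names are made up) confirms that (3, 0) on E2 : y^2 = x^3 − 9x is a torsion point of order 2:

```python
from fractions import Fraction

INF = None  # the point at infinity

def ec_add(P, Q, a):
    """Group law on y^2 = x^3 + ax + b (same case analysis as in the text)."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (xP, yP), (xQ, yQ) = P, Q
    if xP == xQ and yP == -yQ:        # vertical line
        return INF
    s = Fraction(3 * xP**2 + a, 2 * yP) if P == Q else Fraction(yP - yQ, xP - xQ)
    xR = s**2 - xP - xQ
    return (xR, -yP + s * (xP - xR))

def order(P, a, bound=20):
    """Smallest n <= bound with nP = infinity, or None if no such n is found
    (which merely suggests, not proves, infinite order)."""
    nP = P
    for n in range(1, bound + 1):
        if nP is INF:
            return n
        nP = ec_add(nP, P, a)
    return None

# (3, 0) has y = 0, so 2*(3, 0) = infinity: it is a torsion point of order 2.
print(order((3, 0), -9))
```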
An elliptic curve is the first interesting example of what is called an Abelian
variety. In 1928, the theorem of Mordell was generalized by the French mathematician André Weil to all Abelian varieties.
We conclude this section with Siegel’s Theorem, which has profound consequences in the analysis of Diophantine equations related to elliptic curves.
Theorem 34.6. (Siegel’s Theorem, 1929) Every elliptic curve E(Q) contains only
finitely many integer points. That is, for any rational numbers a, b ∈ Q such that
4a^3 + 27b^2 ≠ 0, the Diophantine equation

y^2 = x^3 + ax + b

has only finitely many solutions in integers x and y.
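Siegel's Theorem asserts finiteness but gives no recipe for finding the points; still, for small coefficients a brute-force search is instructive. A sketch (Python; the search bound is an arbitrary choice of ours, not something the theorem provides):

```python
from math import isqrt

def integer_points(a, b, x_range=100):
    """Brute-force search for integer points on y^2 = x^3 + ax + b
    with |x| <= x_range (illustrative only)."""
    points = []
    for x in range(-x_range, x_range + 1):
        rhs = x**3 + a * x + b
        if rhs >= 0 and isqrt(rhs)**2 == rhs:  # rhs is a perfect square
            y = isqrt(rhs)
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return points

# Euler's curve y^2 = x^3 - 2: the search finds only the points (3, +-5).
print(integer_points(0, -2))
```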
```