MA 311
NUMBER THEORY
BUTLER UNIVERSITY
FALL 2008
SCOTT T. PARSELL
1. Introduction
Number theory could be defined fairly concisely as the study of the natural numbers:
1, 2, 3, 4, 5, 6, . . . . We usually denote this set by ℕ. The set of all integers (including 0 and
the negatives) is denoted by ℤ. Is there anything about the natural numbers that’s worth
studying? It seems that we have a pretty good understanding of them once we’ve learned
to count! Perhaps surprisingly, this turns out to be a rich and fascinating field of study,
bursting with unsolved problems. A good starting point for our investigations is to look at
how the natural numbers factor.
Primes. A prime number is a number greater than 1 that cannot be written as the
product of two smaller natural numbers. The first few primes are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, . . . .
Integers exceeding 1 that are not prime are called composite. The primes are important
because each natural number greater than 1 can be written as a product of primes, and
this factorization is unique (up to the order of the factors). For example, 24 = 2^3 ⋅ 3 and
105 = 3 ⋅ 5 ⋅ 7. It is fairly easy to show that there are infinitely many prime numbers; we’ll
prove this in a later section. However, there remain many interesting unsolved (or partially
solved) questions about the primes and how they are distributed. For example,
∙ How precisely can we estimate the number of primes less than 𝑥? (We know that
𝑥/ log 𝑥 gives a good first approximation.) What about primes of the form 4𝑛 + 1, of
the form 4𝑛 + 3, etc.?
∙ Are there infinitely many primes of the form 𝑛^2 + 1? How about of the form 2^𝑛 − 1?
Of the form 2^𝑛 + 1?
∙ Is there an efficient algorithm for finding a number’s prime factorization or proving
that a number is prime? (The difficulty of factoring efficiently is the basis of the
security of RSA encryption.)
∙ Are there infinitely many pairs of “twin primes” (i.e., primes whose difference is two,
such as 3 and 5)? If not, can anything be said about small gaps between primes
asymptotically?
∙ (Goldbach’s problem) Can every even integer exceeding 2 be written as the sum of
two primes?
Questions about the distribution of primes usually fall under the heading of analytic
number theory because many of the techniques are based on real and complex analysis (i.e.,
mathematics related to calculus).
Divisibility and congruences. Along with the idea of factoring integers comes the
notion of divisibility. We say that 𝑎 divides 𝑏 if there exists an integer 𝑘 such that 𝑎𝑘 = 𝑏.
For example, 4 divides 24 since 4 ⋅ 6 = 24, and 15 divides 105 since 15 ⋅ 7 = 105. Divisibility
leads to the important idea of congruences. We say that 𝑎 is congruent to 𝑏 modulo 𝑛 if 𝑛
divides 𝑎 − 𝑏. In this case, we write
𝑎 ≡ 𝑏 (mod 𝑛).
For example, 3 ≡ 75 (mod 24) and 8 ≡ 38 (mod 10). Arithmetic with congruences (sometimes called modular arithmetic) is useful for detecting certain types of periodic phenomena.
For example, one could use arithmetic mod 24 to keep track of the hour of day (in military
time) without regard to minutes, seconds, or day. One could use arithmetic mod 10 to keep
track of the last digit of a positive number (or mod 100 to keep track of the last two digits).
If 𝑛 objects are arranged in a circle, then arithmetic mod 𝑛 can be used to keep track of
the positions of the objects as they are rearranged. We’ll see some more interesting uses of
congruences later on. For instance, they can be used to construct check-digit schemes to
minimize errors in data entry. Facts about the computation of powers modulo 𝑛 form the
basis for constructing an RSA cryptosystem.
Rings and fields. If one is doing arithmetic with congruences, say modulo 6, then
effectively there are only 6 distinct “numbers” to work with, usually denoted by 0, 1, 2, 3,
4, and 5. Under this scheme, the number 0 actually stands for the set
[0]6 = {. . . , −24, −18, −12, −6, 0, 6, 12, 18, 24, . . . }.
Similarly, 1 stands for [1]6 = {. . . , −17, −11, −5, 1, 7, 13, 19, . . . }, and so on. However, it is
convenient to pick one small integer (usually either the smallest positive integer or the one of
smallest absolute value) to represent each “congruence class”. The integers themselves are
an example of an abstract algebraic structure called a ring, which is basically a set equipped
with addition and multiplication operations satisfying basic properties like associativity and
the distributive law (we omit the precise definition of a ring here). The set of congruence
classes {0, 1, 2, 3, 4, 5} can be viewed as a ring in its own right, sometimes denoted by ℤ/6ℤ
or ℤ6 , with addition and multiplication defined modulo 6. For example, 2 + 5 = 1 and
2 ⋅ 3 = 0 in the ring ℤ6 .
One defect of rings is that multiplicative inverses do not exist in general. For example, 2
does not have a multiplicative inverse in ℤ, nor in ℤ6 . However, 2 does have a multiplicative
inverse in ℤ7 , since 2 ⋅ 4 = 1 under mod 7 arithmetic. Special rings in which all nonzero elements have multiplicative inverses (such as the rational numbers, real numbers, and complex
numbers) are called fields. It turns out that
ℤ𝑛 = {0, 1, 2, . . . , 𝑛 − 1},
under arithmetic modulo 𝑛 is a field if and only if 𝑛 is prime. Our algebra with congruences
will be influenced by these considerations. Just as the equation 2𝑥 = 1 can be solved over
the rationals but not over the integers, the congruence 2𝑥 ≡ 1 (mod 𝑛) can be solved when
𝑛 = 7 but not when 𝑛 = 6 (in other words, the equation 2𝑥 = 1 has a solution over ℤ7 but
not over ℤ6 ).
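The claims about inverses above can be checked by brute force for small moduli. The following is a minimal sketch only; the helper names `inverse_mod` and `is_field` are illustrative and not part of the notes.

```python
def inverse_mod(a: int, n: int):
    """Return x in {0, ..., n-1} with a*x ≡ 1 (mod n), or None if no such x exists."""
    for x in range(n):
        if (a * x) % n == 1:
            return x
    return None


def is_field(n: int) -> bool:
    """Z_n is a field exactly when every nonzero class has a multiplicative inverse."""
    return n > 1 and all(inverse_mod(a, n) is not None for a in range(1, n))


print(inverse_mod(2, 7))   # 4, since 2*4 = 8 ≡ 1 (mod 7)
print(inverse_mod(2, 6))   # None: 2x mod 6 is always 0, 2, or 4
print([n for n in range(2, 20) if is_field(n)])  # the primes below 20
```

The exhaustive search is only sensible for small 𝑛; it is meant to illustrate the statement that ℤ𝑛 is a field if and only if 𝑛 is prime, not to prove it.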
One can construct further examples of rings by “adjoining” irrational or complex numbers
to the set of integers. For example, if 𝑖 = √−1, then the set ℤ[𝑖] of all complex numbers
of the form 𝑎 + 𝑏𝑖, where 𝑎 and 𝑏 are integers, forms a ring, known as the ring of Gaussian
integers. One can ask whether such a ring has any number-theoretic properties in common
with the integers, such as unique factorization. It turns out that this ring does have unique
factorization, but not all the integer primes remain prime in ℤ[𝑖]. For instance, 2 = (1 +
𝑖)(1 − 𝑖), but 3 remains irreducible. The numbers 1 + 𝑖 and 1 − 𝑖 are primes in ℤ[𝑖], and the
number 6 has the unique prime factorization 6 = (1 + 𝑖) ⋅ (1 − 𝑖) ⋅ 3.
If we let 𝛿 = √−5, then we can construct the ring ℤ[𝛿], which is the set of all complex
numbers of the form 𝑎 + 𝑏𝛿, where 𝑎 and 𝑏 are integers. Something bizarre happens when
we try to factor 6 in this ring. We obviously have
6=2⋅3
and
6 = (1 + 𝛿)(1 − 𝛿),
and one can show that 2, 3, 1 + 𝛿, and 1 − 𝛿 are all irreducible in the ring ℤ[𝛿]. Thus we have
two different factorizations for 6, which means that unique factorization fails in this ring!
The study of primes and factorization in rings such as ℤ[𝑖] and ℤ[𝛿] forms the basis for
much of algebraic number theory. Here one makes heavy use of general results from modern
algebra, so we won’t pursue this branch of the subject very deeply.
Diophantine equations. One area of number theory that we hope to touch on later in
the course overlaps with both analytic and algebraic number theory. A diophantine equation
is simply an equation (usually a polynomial in two or more variables) for which we seek
integer (or sometimes rational) solutions; a classic example is the equation
𝑥^2 + 𝑦^2 = 𝑧^2.
This equation has many integer solutions, such as (3, 4, 5) and (5, 12, 13). In fact, it can
be shown that there are infinitely many integer solutions, and all the solutions can be described by an explicit parametrization. These are the so-called Pythagorean triples, which
correspond to the lengths of the sides in right triangles. Interestingly, things become dramatically different if we change the equation to 𝑥^3 + 𝑦^3 = 𝑧^3. Here the only integer solutions
are the “trivial” ones with 𝑥𝑦𝑧 = 0. In fact, Fermat’s Last Theorem asserts that if 𝑘 is any
integer exceeding 2 then the diophantine equation 𝑥^𝑘 + 𝑦^𝑘 = 𝑧^𝑘 has only trivial solutions.
This seemingly innocent conjecture remained unproven for over 300 years until deep work of
Wiles resolved it in 1995.
As another example, consider the diophantine equation 𝑦^2 = 𝑥^3 + 17. This is an example of
an elliptic curve, which more generally has the form 𝑦^2 = 𝑓(𝑥), where 𝑓 is a cubic polynomial.
It turns out that the rational points lying on such a curve have an additive group structure,
and this can be used as the basis for an encryption scheme and also for an efficient factoring
algorithm. Wiles also exploited connections with elliptic curves in his proof of Fermat’s Last
Theorem. All this work on diophantine equations in few variables uses primarily algebraic
techniques, so the detailed study of these topics is best left for a more advanced course.
A theorem of Lagrange states that every positive integer can be expressed as the sum of
four perfect squares. In other words, the diophantine equation
𝑥_1^2 + 𝑥_2^2 + 𝑥_3^2 + 𝑥_4^2 = 𝑛
can be solved for every positive integer 𝑛. For instance, when 𝑛 = 31 we can take 𝑥_1 = 5,
𝑥_2 = 2, 𝑥_3 = 1, and 𝑥_4 = 1. A generalization of this question known as Waring’s problem
asks what happens with higher powers. For instance, how large does 𝑠 have to be in order
to represent all positive integers as sums of 𝑠 non-negative perfect cubes? (The answer turns out to be 9.) What if
we only need to represent all sufficiently large integers? Here we know that 7 cubes suffice,
but it’s conjectured that 4 would be enough! The type of diophantine equation involved in
Waring’s problem typically has enough variables that it can be attacked by analytic methods,
and this has been a very active area of research over the past 20 years. We’ll discuss some
of the underlying ideas later in the course.
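Lagrange's four-square theorem, mentioned above, is easy to explore numerically. The sketch below is a brute-force search, not an efficient algorithm; the function name `four_squares` and the convention of returning the squares in non-increasing order are assumptions made for illustration.

```python
from math import isqrt


def four_squares(n: int):
    """Return (x1, x2, x3, x4) with x1 >= x2 >= x3 >= x4 >= 0 and
    x1^2 + x2^2 + x3^2 + x4^2 == n, or None if the search finds nothing."""
    for x1 in range(isqrt(n), -1, -1):
        r1 = n - x1 * x1
        for x2 in range(min(x1, isqrt(r1)), -1, -1):
            r2 = r1 - x2 * x2
            for x3 in range(min(x2, isqrt(r2)), -1, -1):
                r3 = r2 - x3 * x3
                x4 = isqrt(r3)
                # The last square is forced: check that r3 is a perfect square.
                if x4 <= x3 and x4 * x4 == r3:
                    return (x1, x2, x3, x4)
    return None


print(four_squares(31))  # (5, 2, 1, 1), the representation given in the text
```

Since any representation can be sorted into non-increasing order, restricting the search to sorted tuples loses nothing, and Lagrange's theorem guarantees that the search succeeds for every positive integer 𝑛.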
In Waring’s problem, one could also ask what happens if the variables are restricted to
be primes. For example, the Goldbach problem mentioned earlier amounts to solving the
equation 𝑝1 + 𝑝2 = 𝑛 in primes 𝑝1 and 𝑝2 for every even 𝑛 > 2. The general Waring-Goldbach
problem considers the solubility of the diophantine equation
𝑝_1^𝑘 + ⋅ ⋅ ⋅ + 𝑝_𝑠^𝑘 = 𝑛
in primes 𝑝_1, . . . , 𝑝_𝑠 for every 𝑛 for which the underlying congruences are feasible.
A variation known as a diophantine inequality arises when attempting to approximate
irrational numbers by rational numbers. For instance, if we want to find a rational number
close to √2, then we are looking for integer solutions to the inequality ∣𝑥/𝑦 − √2∣ < 𝜀, where
𝜀 is a small positive number. Dirichlet’s theorem on diophantine approximation actually
tells us that we can solve this inequality with 𝜀 replaced by an explicit function of the
denominator, namely 1/𝑦^2. Thus we can solve the diophantine inequality ∣𝑥 − √2 𝑦∣ < 1/𝑦.
More general inequalities (for example, involving sums of 𝑘th powers) are a subject of current
research interest.
Where do we begin? We’ve only scratched the surface of number theory by mentioning
some of the important ideas and some of the interesting unsolved problems. In the next
section, we’ll start laying the foundations for our study by developing some actual machinery
on divisibility, primes, and congruences. This will lead us to our first main goal, which is
to understand RSA cryptography. Following that, we hope to touch on some of the more
advanced topics mentioned above, such as the distribution of primes, the algebraic structure
of ℤ𝑛 , Waring’s problem, and diophantine approximation.
2. Divisibility
Recall that if 𝑎, 𝑏 ∈ ℤ, we say that 𝑎 divides 𝑏 (and write 𝑎∣𝑏) if there exists 𝑘 ∈ ℤ such
that 𝑏 = 𝑎𝑘. For example, 2 divides 6, but 4 does not divide 6. When 𝑎 divides 𝑏, we say
that 𝑏 is a multiple of 𝑎 and that 𝑏 is divisible by 𝑎. Two easy properties of divisibility that
we’ll find useful are given in the following lemma.
Lemma 2.1. Let 𝑎, 𝑏, and 𝑐 be integers.
(a) If 𝑎∣𝑏 and 𝑏∣𝑐, then 𝑎∣𝑐.
(b) If 𝑎∣𝑏 and 𝑎∣𝑐, then 𝑎∣(𝑏𝑠 + 𝑐𝑡) for all integers 𝑠 and 𝑡.
Proof. If 𝑎∣𝑏 and 𝑏∣𝑐, then we can write 𝑏 = 𝑎𝑘 and 𝑐 = 𝑏𝑙 for some integers 𝑘 and 𝑙. We
then have 𝑐 = 𝑎(𝑘𝑙), which shows that 𝑎∣𝑐. Similarly, if 𝑎∣𝑏 and 𝑎∣𝑐, then we can write
𝑏 = 𝑎𝑘 and 𝑐 = 𝑎𝑙 for some integers 𝑘 and 𝑙. If 𝑠 and 𝑡 are arbitrary integers, we have
𝑏𝑠 + 𝑐𝑡 = 𝑎𝑘𝑠 + 𝑎𝑙𝑡 = 𝑎(𝑘𝑠 + 𝑙𝑡), which shows that 𝑎∣(𝑏𝑠 + 𝑐𝑡).
□
The following divisibility exercise gives us a chance to review proof by mathematical
induction.
Example 2.2. Prove that 𝑛^5 − 𝑛 is divisible by 5 for every positive integer 𝑛.
Solution. We proceed by induction on 𝑛. First of all, we have 1^5 − 1 = 0, which is clearly
divisible by 5, since 0 = 5 ⋅ 0. This establishes the base case. Now suppose that 𝑛 ≥ 1 is an
integer and that 𝑛^5 − 𝑛 is divisible by 5. Then by the binomial theorem one has
(𝑛 + 1)^5 − (𝑛 + 1) = 𝑛^5 + 5𝑛^4 + 10𝑛^3 + 10𝑛^2 + 5𝑛 + 1 − 𝑛 − 1
= (𝑛^5 − 𝑛) + 5(𝑛^4 + 2𝑛^3 + 2𝑛^2 + 𝑛).
Here the first term on the right is divisible by 5 according to the induction hypothesis, and
the second term is clearly divisible by 5 since 𝑛^4 + 2𝑛^3 + 2𝑛^2 + 𝑛 is an integer. We therefore
deduce from part (b) of Lemma 2.1 that (𝑛 + 1)^5 − (𝑛 + 1) is divisible by 5, and the result
now follows by induction.
□
In the future, we will not always be quite so pedantic in writing, but the above solution
serves as a good model for constructing proofs of this type. In general, to prove that a
statement 𝑃 (𝑛) holds for all positive integers 𝑛, one must first establish 𝑃 (1) and then prove
the implication 𝑃 (𝑛) =⇒ 𝑃 (𝑛 + 1). This principle is one of the fundamental axioms
about the integers. It is equivalent to the well-ordering principle, which states that every
non-empty subset of the positive integers has a smallest element.
Greatest common divisors. The greatest common divisor of 𝑎 and 𝑏 is the largest
positive integer that divides both 𝑎 and 𝑏. It is denoted by gcd(𝑎, 𝑏), or sometimes just
(𝑎, 𝑏) when there is no danger of confusion with an ordered pair. For example, gcd(4, 6) = 2,
gcd(12, 51) = 3, and gcd(9, 16) = 1. If gcd(𝑎, 𝑏) = 1, then we say that 𝑎 and 𝑏 are relatively
prime (or coprime). We note that gcd(𝑎, 0) = ∣𝑎∣ for every non-zero integer 𝑎 and that gcd(0, 0)
is undefined. The least common multiple of 𝑎 and 𝑏 is the smallest positive integer that is a
multiple of both 𝑎 and 𝑏. It is denoted by lcm(𝑎, 𝑏) or [𝑎, 𝑏]. For example, lcm(4, 6) = 12. It
is fairly easy to see that, for positive integers 𝑎 and 𝑏,
gcd(𝑎, 𝑏) lcm(𝑎, 𝑏) = 𝑎𝑏.
When 𝑎 and 𝑏 are small, one can compute gcd(𝑎, 𝑏) fairly easily by looking at the prime
factorizations of 𝑎 and 𝑏 and picking out the parts in common. For instance, 24 = 2^3 ⋅ 3
and 180 = 2^2 ⋅ 3^2 ⋅ 5, so gcd(24, 180) = 2^2 ⋅ 3 = 12. However, since factoring is expensive
computationally, this is not an efficient method when 𝑎 and 𝑏 are large. A better method is
based on the division with remainder algorithm learned in grade school.
Theorem 2.3. (Division with remainder) For any integers 𝑎 and 𝑏 with 𝑏 > 0, there
exist unique integers 𝑞 and 𝑟 such that
𝑎 = 𝑞𝑏 + 𝑟
and
0 ≤ 𝑟 < 𝑏.
Proof. We first prove the existence of 𝑞 and 𝑟. Consider the list of integers
. . . 𝑎 − 3𝑏, 𝑎 − 2𝑏, 𝑎 − 𝑏, 𝑎, 𝑎 + 𝑏, 𝑎 + 2𝑏, 𝑎 + 3𝑏, . . . .
Since 𝑏 > 0, we can select one with the smallest non-negative value, say 𝑟 = 𝑎 − 𝑞𝑏. If 𝑟 ≥ 𝑏,
then we find that
𝑟 − 𝑏 = 𝑎 − 𝑞𝑏 − 𝑏 = 𝑎 − (𝑞 + 1)𝑏
is a non-negative number on our list with a smaller value than 𝑟, which contradicts our choice
of 𝑞. Thus we have 0 ≤ 𝑟 < 𝑏 and 𝑎 = 𝑞𝑏 + 𝑟.
To check uniqueness, suppose there are integers 𝑞1 , 𝑞2 , 𝑟1 , and 𝑟2 with
𝑎 = 𝑞1 𝑏 + 𝑟1 = 𝑞2 𝑏 + 𝑟2
and
0 ≤ 𝑟1 , 𝑟2 < 𝑏.
Then we have 𝑏(𝑞1 − 𝑞2 ) = 𝑟2 − 𝑟1 , and we may suppose without loss of generality that
𝑟1 ≤ 𝑟2 . Then
0 ≤ 𝑟2 − 𝑟1 < 𝑏 − 𝑟1 ≤ 𝑏,
and hence
0 ≤ 𝑏(𝑞1 − 𝑞2 ) < 𝑏,
which implies that 𝑞1 − 𝑞2 = 0. Thus 𝑞1 = 𝑞2 , and it follows that 𝑟1 = 𝑟2 .
□
For example, if 𝑎 = 48 and 𝑏 = 9, then we can write 48 = 5 ⋅ 9 + 3, so we can take 𝑞 = 5
and 𝑟 = 3 in Theorem 2.3. We call 𝑞 the quotient and 𝑟 the remainder. Notice that 𝑟 = 0 if
and only if 𝑏 divides 𝑎.
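Theorem 2.3 corresponds closely to Python's built-in `divmod`, which floors the quotient and therefore returns a remainder in the range 0 ≤ 𝑟 < 𝑏 whenever 𝑏 > 0, even for negative 𝑎. The wrapper name `division_with_remainder` below is illustrative only.

```python
def division_with_remainder(a: int, b: int):
    """Return the unique (q, r) with a == q*b + r and 0 <= r < b, for b > 0."""
    if b <= 0:
        raise ValueError("b must be positive, as in Theorem 2.3")
    q, r = divmod(a, b)  # floor division, so r is never negative when b > 0
    return q, r


print(division_with_remainder(48, 9))    # (5, 3), as in the example above
print(division_with_remainder(-48, 9))   # (-6, 6), since -48 = (-6)*9 + 6
```

Note that some other languages truncate toward zero instead of flooring, in which case the remainder for negative 𝑎 can be negative and would need adjusting to match the theorem.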
Theorem 2.4. Let 𝑎 and 𝑏 be nonzero integers. Then gcd(𝑎, 𝑏) is the smallest positive
integral linear combination of 𝑎 and 𝑏. That is, gcd(𝑎, 𝑏) is the smallest positive value of
𝑎𝑠 + 𝑏𝑡, where 𝑠 and 𝑡 are integers.
Proof. By taking 𝑠 = 𝑎 and 𝑡 = 𝑏, we see that positive integral linear combinations exist, so
we can let 𝑔 denote the smallest such value. Write 𝑔 = 𝑎𝑠0 + 𝑏𝑡0 . By Theorem 2.3, we can
write
𝑎 = 𝑞𝑔 + 𝑟 = 𝑞(𝑎𝑠0 + 𝑏𝑡0 ) + 𝑟,
where
0 ≤ 𝑟 < 𝑔.
Solving for 𝑟, we get
𝑟 = 𝑎(1 − 𝑞𝑠0 ) + 𝑏(−𝑞𝑡0 ),
so 𝑟 is an integral linear combination of 𝑎 and 𝑏, and since 𝑟 < 𝑔, the minimality of 𝑔 implies
that 𝑟 = 0. Thus we see that 𝑔 divides 𝑎, and we can apply a similar argument to deduce
that 𝑔 divides 𝑏. Thus 𝑔 is a common divisor of 𝑎 and 𝑏. Moreover, if 𝑑 is any common
divisor of 𝑎 and 𝑏, then 𝑑 divides both 𝑎𝑠0 and 𝑏𝑡0 , so 𝑑 divides 𝑔 and hence 𝑑 ≤ 𝑔. Thus we conclude that
𝑔 = gcd(𝑎, 𝑏).
□
Corollary 2.5. The integers 𝑎 and 𝑏 are relatively prime if and only if there exist integers
𝑠 and 𝑡 such that 𝑎𝑠 + 𝑏𝑡 = 1.
Proof. If gcd(𝑎, 𝑏) = 1, then it follows from Theorem 2.4 that 𝑎𝑠 + 𝑏𝑡 = 1 for some integers
𝑠 and 𝑡. Conversely, suppose that 1 can be expressed as a linear combination of 𝑎 and 𝑏.
Since Theorem 2.4 ensures that gcd(𝑎, 𝑏) is the smallest positive integer with this property,
we may conclude that gcd(𝑎, 𝑏) = 1.
□
For example, we have 9 ⋅ (−7) + 16 ⋅ 4 = 1, which shows that gcd(9, 16) = 1. An efficient
algorithm for computing gcd(𝑎, 𝑏) is based on the following simple result.
Lemma 2.6. If 𝑎 = 𝑞𝑏 + 𝑟, then gcd(𝑎, 𝑏) = gcd(𝑏, 𝑟).
Proof. If 𝑑 divides both 𝑎 and 𝑏, then 𝑑 clearly divides 𝑟 = 𝑎 − 𝑞𝑏, so 𝑑 is a common divisor
of 𝑏 and 𝑟. Conversely, if 𝑑 divides both 𝑏 and 𝑟, then 𝑑 clearly divides 𝑎 = 𝑞𝑏 + 𝑟, so 𝑑 is a
common divisor of 𝑎 and 𝑏. Therefore the set of common divisors of 𝑎 and 𝑏 is identical to
the set of common divisors of 𝑏 and 𝑟, so the greatest common divisors must be equal. □
The Euclidean Algorithm. We can compute the greatest common divisor very efficiently by successively applying Theorem 2.3 and Lemma 2.6. The gcd is the last non-zero
remainder in this process. That is, to compute gcd(𝑎, 𝑏), we write
𝑎 = 𝑏 𝑞_1 + 𝑟_1    (0 < 𝑟_1 < 𝑏)
𝑏 = 𝑟_1 𝑞_2 + 𝑟_2    (0 < 𝑟_2 < 𝑟_1)
𝑟_1 = 𝑟_2 𝑞_3 + 𝑟_3    (0 < 𝑟_3 < 𝑟_2)
...
𝑟_(𝑗−2) = 𝑟_(𝑗−1) 𝑞_𝑗 + 𝑟_𝑗    (0 < 𝑟_𝑗 < 𝑟_(𝑗−1))
𝑟_(𝑗−1) = 𝑟_𝑗 𝑞_(𝑗+1),
so that gcd(𝑎, 𝑏) = 𝑟𝑗 .
Example 2.7. Use the Euclidean algorithm to compute 𝑑 = gcd(630, 132), and find integers
𝑠 and 𝑡 such that 𝑑 = 630𝑠 + 132𝑡.
Solution. We have
630 = 132 ⋅ 4 + 102
132 = 102 ⋅ 1 + 30
102 = 30 ⋅ 3 + 12
30 = 12 ⋅ 2 + 6
12 = 6 ⋅ 2,
so the algorithm terminates with 𝑗 = 4, and we have gcd(630, 132) = 𝑟4 = 6. We can now
work backwards through these equations to find the required integers 𝑠 and 𝑡. We have
6 = 30 − 12 ⋅ 2
= 30 − (102 − 30 ⋅ 3) ⋅ 2
= 30 ⋅ 7 − 102 ⋅ 2
= (132 − 102) ⋅ 7 − 102 ⋅ 2
= 132 ⋅ 7 − 102 ⋅ 9
= 132 ⋅ 7 − (630 − 132 ⋅ 4) ⋅ 9
= 132 ⋅ 43 − 630 ⋅ 9,
so we can take 𝑠 = −9 and 𝑡 = 43.
□
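The forward pass of Example 2.7 can be reproduced with a few lines of code. This is a sketch assuming positive integer inputs; the function name `euclidean_algorithm` and the decision to record each division step are choices made for illustration.

```python
def euclidean_algorithm(a: int, b: int):
    """Return (gcd, steps) for positive integers a and b.

    Each step is a tuple (a, b, q, r) recording the division a = b*q + r.
    """
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, b, q, r))
        a, b = b, r          # Lemma 2.6: gcd(a, b) = gcd(b, r)
    return a, steps


g, steps = euclidean_algorithm(630, 132)
for a, b, q, r in steps:
    print(f"{a} = {b} * {q} + {r}")
print("gcd =", g)  # gcd = 6
```

The printed lines match the five divisions in the solution above, with the final step having remainder 0.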
There is another way to organize the computations in the Euclidean algorithm that produces gcd(𝑎, 𝑏) and the integers 𝑠 and 𝑡 simultaneously. The idea is to set up an augmented
matrix consisting of a 2 × 2 identity matrix, followed by 𝑎 and 𝑏 in the third column. One
then subtracts a multiple of one row from the other until one entry in the third column divides the other. The multiples we use are exactly the quotients 𝑞1 , 𝑞2 , . . . , 𝑞𝑗 . Thus
Example 2.7 could be handled as follows:
[  1    0 | 630 ]     [  1   −4 | 102 ]     [  1   −4 | 102 ]
[  0    1 | 132 ]  →  [  0    1 | 132 ]  →  [ −1    5 |  30 ]

   [  4  −19 |  12 ]     [  4  −19 |  12 ]
→  [ −1    5 |  30 ]  →  [ −9   43 |   6 ].
Every row [𝑥 𝑦 ∣ 𝑧] of every matrix in this computation has the property that 630𝑥 + 132𝑦 =
𝑧, because this is satisfied by the initial matrix and is preserved by the row operations.
Therefore, the required integers 𝑠 and 𝑡 appear to the left of gcd(𝑎, 𝑏) in the final matrix.
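The row-operation method translates directly into code. The sketch below keeps two rows (𝑠, 𝑡, 𝑟), each satisfying 𝑎𝑠 + 𝑏𝑡 = 𝑟, and assumes positive inputs; unlike the hand computation, it performs one extra step and stops when a remainder of 0 appears. The name `extended_gcd` is illustrative.

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) and a*s + b*t == g, for positive a, b."""
    # Invariant: every row (s, t, r) satisfies a*s + b*t == r.
    row0 = (1, 0, a)
    row1 = (0, 1, b)
    while row1[2] != 0:
        q = row0[2] // row1[2]
        # Subtract q times the lower row from the upper row, then swap roles.
        row0, row1 = row1, tuple(x - q * y for x, y in zip(row0, row1))
    s, t, g = row0
    return g, s, t


print(extended_gcd(630, 132))  # (6, -9, 43): 630*(-9) + 132*43 = 6
```

The coefficients found agree with 𝑠 = −9 and 𝑡 = 43 from Example 2.7, although such coefficients are not unique in general.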
In the worst case, the Euclidean algorithm takes on the order of log 𝑛 steps to compute
gcd(𝑎, 𝑏), where 𝑛 = max(∣𝑎∣, ∣𝑏∣). The function log 𝑛 grows very slowly as 𝑛 → ∞, so the
algorithm runs very quickly on a computer.
Primes. Recall that an integer 𝑛 > 1 is said to be prime if its only positive factors
are 1 and 𝑛. One can generate all the primes up to 𝑁 using the Sieve of Eratosthenes to
successively strike out all the proper multiples of 2, 3, 5, etc. If an integer less than 𝑁 is not
prime, then it has a prime divisor less than √𝑁, so one can terminate this process at √𝑁.
The integers that remain uncrossed are the primes up to 𝑁 .
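A minimal sketch of the sieve just described follows. The function name `primes_up_to` is illustrative, and starting the strike-out at 𝑝 ⋅ 𝑝 is a standard optimization (smaller multiples of 𝑝 were already struck by smaller primes) rather than something stated in the notes.

```python
from math import isqrt


def primes_up_to(N: int):
    """Return the list of primes p <= N using the Sieve of Eratosthenes."""
    if N < 2:
        return []
    is_prime = [True] * (N + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(N) + 1):       # stop at sqrt(N)
        if is_prime[p]:
            for m in range(p * p, N + 1, p):
                is_prime[m] = False        # strike out proper multiples of p
    return [n for n in range(2, N + 1) if is_prime[n]]


print(primes_up_to(47))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```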
Lemma 2.8. (Euclid’s Lemma) Let 𝑎 and 𝑏 be integers, and let 𝑝 be a prime. If 𝑝∣𝑎𝑏,
then 𝑝∣𝑎 or 𝑝∣𝑏.
Proof. Suppose that 𝑝 divides 𝑎𝑏 but that 𝑝 does not divide 𝑎. Since 𝑝 is prime, we must
have gcd(𝑎, 𝑝) = 1, so by Theorem 2.4 there exist integers 𝑠 and 𝑡 such that 𝑎𝑠 + 𝑝𝑡 = 1.
Multiplying through by 𝑏, we obtain
𝑎𝑏𝑠 + 𝑝𝑏𝑡 = 𝑏.
Since 𝑝∣𝑎𝑏 and 𝑝∣𝑝, we deduce from part (b) of Lemma 2.1 that 𝑝∣𝑏.
□
Note that Lemma 2.8 fails if 𝑝 is not prime. For example, 6∣12 = 3⋅4, but 6 does not divide
3 or 4. One can easily show by induction that Lemma 2.8 can be extended to products of
more than two integers. That is, if 𝑝 is a prime dividing the product 𝑎1 ⋅ ⋅ ⋅ 𝑎𝑚 , then 𝑝 must
divide at least one of the 𝑎𝑖 .
As a simple application of Euclid’s Lemma, we perform the following entertaining exercise.
Example 2.9. Prove that √2 is irrational.
Solution. We proceed by contradiction. If √2 were rational, then we could write √2 = 𝑎/𝑏
for some positive integers 𝑎 and 𝑏 with (𝑎, 𝑏) = 1. After squaring both sides and clearing
denominators, we find that 2𝑏^2 = 𝑎^2, and hence in particular that 2∣𝑎^2. Since 2 is prime,
it now follows from Euclid’s Lemma that 2∣𝑎, so we can write 𝑎 = 2𝑐 for some integer 𝑐.
Substituting this into our previous equation yields 2𝑏^2 = 4𝑐^2, or 𝑏^2 = 2𝑐^2. Thus 2∣𝑏^2 and
hence by Euclid’s Lemma we have 2∣𝑏. We have now deduced that both 𝑎 and 𝑏 are divisible
by 2, contradicting our original assumption that (𝑎, 𝑏) = 1. This contradiction forces us to
conclude that √2 is in fact irrational.
□
Note that there is little difficulty in generalizing the argument to handle √𝑝, where 𝑝 is any
prime. In fact it is not hard to see that √𝑛 is irrational if and only if 𝑛 fails to be a perfect
square, but this requires information about factoring composite integers. The following
result is the most important application of Euclid’s Lemma and, as its name suggests, is
fundamental to our study of number theory.
Theorem 2.10. (Fundamental Theorem of Arithmetic) Every integer 𝑛 > 1 can be
written as a product of prime factors, and this factorization is unique up to the order of the
factors.
Proof. The existence of factorizations follows easily by induction on the size of the integer 𝑛.
For the base case, it suffices to note that 𝑛 = 2 is prime. Now suppose that 𝑛 ≥ 2 and that
every integer 𝑘 with 2 ≤ 𝑘 ≤ 𝑛 − 1 has a factorization into primes. If 𝑛 is prime, then we are
done. Otherwise, we may write 𝑛 = 𝑎𝑏 where 2 ≤ 𝑎, 𝑏 ≤ 𝑛 − 1, and the induction hypothesis
shows that 𝑎 and 𝑏 both have factorizations, which combine to produce a factorization of 𝑛.
To prove uniqueness, we induct on the number of factors. Suppose that
𝑛 = 𝑝1 ⋅ ⋅ ⋅ 𝑝𝑟 = 𝑞1 ⋅ ⋅ ⋅ 𝑞𝑠 ,
where the 𝑝𝑖 and 𝑞𝑖 are primes, and we may assume without loss of generality that 𝑟 ≤ 𝑠. If
𝑟 = 1, then clearly 𝑠 = 1, so 𝑝1 = 𝑞1 . Now let 𝑟 > 1, and suppose that unique factorization
holds for all integers with fewer than 𝑟 prime factors. Since 𝑝1 ∣𝑞1 ⋅ ⋅ ⋅ 𝑞𝑠 , we have 𝑝1 ∣𝑞𝑖 (and
hence 𝑝1 = 𝑞𝑖 ) for some 𝑖 by an easy extension of Euclid’s Lemma. By relabeling, we may
suppose that 𝑖 = 1, and hence we may divide through by 𝑝1 to get
𝑝2 ⋅ ⋅ ⋅ 𝑝𝑟 = 𝑞2 ⋅ ⋅ ⋅ 𝑞𝑠 .
The induction hypothesis now implies that 𝑟 = 𝑠 and that 𝑝2 , . . . , 𝑝𝑟 is a permutation of
𝑞2 , . . . , 𝑞𝑠 , and the uniqueness follows.
□
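The existence half of Theorem 2.10 can be illustrated by trial division, which repeatedly splits off the smallest prime factor. This is a sketch for small inputs only (as noted earlier, factoring large integers is expensive); the name `prime_factorization` is illustrative.

```python
def prime_factorization(n: int):
    """Return the prime factors of n > 1 in non-decreasing order, with repetition."""
    if n < 2:
        raise ValueError("n must exceed 1")
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:      # p is prime here: smaller factors were removed
            factors.append(p)
            n //= p
        p += 1
    if n > 1:                  # whatever remains has no divisor <= its square root
        factors.append(n)
    return factors


print(prime_factorization(24))    # [2, 2, 2, 3]
print(prime_factorization(105))   # [3, 5, 7]
print(prime_factorization(180))   # [2, 2, 3, 3, 5]
```

Listing the factors in sorted order gives a canonical form, which is one concrete way to read "unique up to the order of the factors".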
In rings where unique factorization fails, like ℤ[√−5], the problem is that the notions
of “irreducible” and “prime” do not correspond. The property in Lemma 2.8 is used as
the definition of prime, but there are irreducible elements that don’t satisfy this property.
For example, 2 is irreducible in ℤ[√−5], but it is not prime in this ring because 2 divides
6 = (1 + √−5)(1 − √−5), but 2 does not divide 1 + √−5 or 1 − √−5.
Theorem 2.11. There are infinitely many primes.
Proof. Assume to the contrary that there are only finitely many primes, say 𝑝1 , 𝑝2 , . . . , 𝑝𝑛 ,
and let
𝑁 = 𝑝1 𝑝2 ⋅ ⋅ ⋅ 𝑝𝑛 + 1.
We know from Theorem 2.10 that 𝑁 has at least one prime factor, say 𝑞. We cannot have
𝑞 = 𝑝𝑖 for any 𝑖, because then 𝑞 would divide both 𝑁 and 𝑝1 𝑝2 ⋅ ⋅ ⋅ 𝑝𝑛 , and hence would divide 1 = 𝑁 − 𝑝1 𝑝2 ⋅ ⋅ ⋅ 𝑝𝑛 . Thus 𝑞 is a
prime missing from our supposedly complete list. This contradiction shows that there must be infinitely many primes.
□
This theorem was first proved by Euclid, and we’ve given his original proof. Many other
proofs have been discovered since Euclid’s time. A more general theorem of Dirichlet states
that there are infinitely many primes of the form 𝑝 = 𝑞𝑛 + 𝑎 whenever 𝑞 and 𝑎 are relatively
prime. For example, there are infinitely many primes of the form 𝑝 = 4𝑛 + 1 and also of the
form 𝑝 = 4𝑛 + 3. A weak version of the prime number theorem states that if 𝜋(𝑥) denotes
the number of primes up to 𝑥, then 𝜋(𝑥) ∼ 𝑥/ log 𝑥 asymptotically, in the sense that
lim_(𝑥→∞) 𝜋(𝑥) / (𝑥/ log 𝑥) = 1.
One could interpret this by saying that the probability that the integer 𝑥 is prime is roughly
1/ log 𝑥. Throughout these notes log 𝑥 denotes the natural (base 𝑒) logarithm.
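To get a feel for the asymptotic statement, one can compare 𝜋(𝑥) with 𝑥/ log 𝑥 for a few moderate values of 𝑥. The sketch below counts primes with a simple sieve; the function name `prime_count` and the sample values of 𝑥 are arbitrary choices, and a handful of data points is of course no substitute for the theorem.

```python
from math import isqrt, log


def prime_count(x: int) -> int:
    """Return pi(x), the number of primes p <= x, via a simple sieve."""
    if x < 2:
        return 0
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(x) + 1):
        if is_prime[p]:
            for m in range(p * p, x + 1, p):
                is_prime[m] = False
    return sum(is_prime)


for x in (10**3, 10**4, 10**5):
    pi_x = prime_count(x)
    estimate = x / log(x)          # natural logarithm, as in the notes
    print(x, pi_x, round(pi_x / estimate, 4))
```

For these values the ratio stays somewhat above 1 and decreases slowly, which is consistent with (but does not demonstrate) the limit being 1.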
Theorem 2.12. There are arbitrarily large gaps between consecutive primes.
Proof. Given an integer 𝑛 > 1, we’ll construct a list of 𝑛 consecutive composite numbers. If
we let 𝑎 = (𝑛 + 1)! + 2, then the 𝑛 numbers
𝑎, 𝑎 + 1, 𝑎 + 2, . . . , 𝑎 + 𝑛 − 1
are all composite, since 𝑘 + 2 divides 𝑎 + 𝑘 = (𝑛 + 1)! + (𝑘 + 2) for 𝑘 = 0, 1, 2 . . . , 𝑛 − 1. □
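The construction in this proof is completely explicit, so it can be checked for small 𝑛. The helper name `composite_run` is illustrative; the numbers grow factorially, so this is only practical for modest 𝑛.

```python
from math import factorial


def composite_run(n: int):
    """Return the n consecutive integers (n+1)! + 2, ..., (n+1)! + n + 1."""
    a = factorial(n + 1) + 2
    return [a + k for k in range(n)]


# For n = 5 the run starts at 6! + 2 = 722; each a + k is divisible by k + 2.
for k, number in enumerate(composite_run(5)):
    d = k + 2
    assert number % d == 0 and 1 < d < number
    print(number, "is divisible by", d)
```

The run produced this way is far from the first gap of its length; the proof only needs some run of 𝑛 composites to exist.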
At the other extreme, the Twin Primes Conjecture states that there are infinitely many pairs of
primes whose difference is 2, for instance
(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), . . . .
Those familiar with analysis may wish to observe that if 𝑝_𝑛 denotes the 𝑛th prime then
Theorem 2.12 is equivalent to the statement that lim sup(𝑝_(𝑛+1) − 𝑝_𝑛) = ∞, while the Twin
Primes Conjecture asserts that lim inf(𝑝_(𝑛+1) − 𝑝_𝑛) = 2. In spite of some recent breakthroughs
in this area, we do not even know for sure that lim inf(𝑝_(𝑛+1) − 𝑝_𝑛) < ∞. This indicates that
we’re not very close to a proof of the Twin Primes Conjecture!
Perfect numbers and Mersenne primes. A positive integer is said to be perfect if
it is the sum of its proper positive divisors (that is, not including the number itself). For
example,
6=1+2+3
and
28 = 1 + 2 + 4 + 7 + 14
are perfect. The first few perfect numbers are 6, 28, 496, 8128, 33550336. It is believed that
there are infinitely many perfect numbers, but this is not known. Another open problem is
to determine whether there are any odd perfect numbers (it’s believed that the answer is
no).
Theorem 2.13. A positive even integer 𝑚 is perfect if and only if we can write 𝑚 =
2^(𝑛−1) (2^𝑛 − 1), where 2^𝑛 − 1 is prime.
Proof. First suppose that 𝑝 = 2^𝑛 − 1 is prime. We need to show that 𝑚 = 2^(𝑛−1) 𝑝 is perfect.
The proper positive divisors of 𝑚 are
1, 2, 4, 8, . . . , 2^(𝑛−1), 𝑝, 2𝑝, 4𝑝, 8𝑝, . . . , 2^(𝑛−2) 𝑝,
so their sum is
(2^𝑛 − 1) + 𝑝(2^(𝑛−1) − 1) = 𝑝 + (2^(𝑛−1) − 1)𝑝 = 2^(𝑛−1) 𝑝 = 𝑚.
This shows that 𝑚 is perfect.
Conversely, suppose that 𝑚 is an even perfect number. We need to show that there is an
integer 𝑛 such that 𝑚 = 2^(𝑛−1) (2^𝑛 − 1) and 2^𝑛 − 1 is prime. Since 𝑚 is even, we can write
𝑚 = 2^𝑎 𝑡, where 𝑎 ≥ 1 and 𝑡 is odd. Let 𝑆 denote the sum of all the positive divisors of 𝑡
(i.e., the sum of the odd positive divisors of 𝑚). Since 𝑚 is perfect, we know that the sum
of all the positive divisors of 𝑚 is equal to 2𝑚, so we have
2𝑚 = 𝑆 + 2𝑆 + 4𝑆 + 8𝑆 + ⋅ ⋅ ⋅ + 2^𝑎 𝑆 = (2^(𝑎+1) − 1)𝑆,
and thus
𝑆 = 2𝑚/(2^(𝑎+1) − 1) = 2^(𝑎+1) 𝑡/(2^(𝑎+1) − 1) = ((2^(𝑎+1) − 1)𝑡 + 𝑡)/(2^(𝑎+1) − 1) = 𝑡 + 𝑡/(2^(𝑎+1) − 1).
Since 𝑆 and 𝑡 are integers, we see that 𝑢 = 𝑡/(2^(𝑎+1) − 1) is an integer, and 𝑢 < 𝑡 since 𝑎 ≥ 1.
Thus 𝑢 and 𝑡 are two distinct positive divisors of 𝑡. Since 𝑆 = 𝑡 + 𝑢 is the sum of all the positive divisors of 𝑡, it follows that they are the only positive divisors
of 𝑡, whence 𝑡 is prime and 𝑢 = 1. Thus we have 𝑡 = 2^(𝑎+1) − 1, so on setting 𝑛 = 𝑎 + 1 we get
𝑚 = 2^(𝑛−1) 𝑡 = 2^(𝑛−1) (2^𝑛 − 1),
where 2^𝑛 − 1 is prime.
□
Primes of the form 2^𝑛 − 1 are called Mersenne primes. As a result of Theorem 2.13, finding
even perfect numbers is equivalent to finding Mersenne primes. Notice that 6 = 2^1 ⋅ (2^2 − 1),
28 = 2^2 (2^3 − 1), 496 = 2^4 (2^5 − 1), 8128 = 2^6 (2^7 − 1), and 33550336 = 2^12 (2^13 − 1).
The following theorem restricts the possibilities somewhat.
Theorem 2.14. If 2^𝑛 − 1 is prime, then 𝑛 is prime.
Proof. We prove the contrapositive. Suppose that 𝑛 is composite. Then we can write 𝑛 = 𝑎𝑏
for some integers 𝑎 and 𝑏 with 1 < 𝑎, 𝑏 < 𝑛. Then we have
2^𝑛 − 1 = 2^(𝑎𝑏) − 1 = (2^𝑎)^𝑏 − 1 = (2^𝑎 − 1)(1 + 2^𝑎 + 2^(2𝑎) + ⋅ ⋅ ⋅ + 2^((𝑏−1)𝑎)).
Here we have used the factorization
𝑥^𝑏 − 1 = (𝑥 − 1)(1 + 𝑥 + 𝑥^2 + ⋅ ⋅ ⋅ + 𝑥^(𝑏−1))
with 𝑥 = 2^𝑎. Since 1 < 𝑎 < 𝑛, we have 1 < 2^𝑎 − 1 < 2^𝑛 − 1, and hence we conclude that
2^𝑛 − 1 is composite.
□
The converse of Theorem 2.14 is false. That is, there exist primes 𝑝 for which 2^𝑝 − 1 is
not prime. The smallest example is 2^11 − 1 = 2047 = 23 ⋅ 89. There are 46 known Mersenne
primes, the largest of which is 2^43,112,609 − 1. This was discovered in August 2008 and has
12,978,189 digits. The largest known perfect number is therefore 2^43,112,608 (2^43,112,609 − 1).
This world-record prime was actually the 45th Mersenne prime to be discovered. The 46th
one was found about two weeks later but has only 11,185,272 digits. To join the Great
Internet Mersenne Prime Search (GIMPS), go to http://www.mersenne.org.
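Theorems 2.13 and 2.14 together suggest a simple way to list small even perfect numbers: test only prime exponents 𝑛 and keep those for which 2^𝑛 − 1 is prime. The sketch below uses naive trial division, so it is suitable only for small exponents; serious searches rely on specialized primality tests not covered here. The function names are illustrative.

```python
def is_prime(n: int) -> bool:
    """Naive trial-division primality test (fine for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def even_perfect_numbers(max_exponent: int):
    """Return 2^(n-1) * (2^n - 1) for each n <= max_exponent with 2^n - 1 prime."""
    result = []
    for n in range(2, max_exponent + 1):
        mersenne = 2**n - 1
        # Theorem 2.14: only prime exponents can give a Mersenne prime.
        if is_prime(n) and is_prime(mersenne):
            result.append(2 ** (n - 1) * mersenne)
    return result


print(even_perfect_numbers(13))  # [6, 28, 496, 8128, 33550336]
```

The exponent 𝑛 = 11 is skipped automatically, since 2^11 − 1 = 2047 = 23 ⋅ 89 is composite.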
3. Congruences
Let 𝑛 be a positive integer, and let 𝑎 and 𝑏 be arbitrary integers. We say that 𝑎 and 𝑏 are
congruent modulo 𝑛 if 𝑛 divides 𝑎 − 𝑏. In this case, we write
𝑎 ≡ 𝑏 (mod 𝑛).
For example, we have 37 ≡ 2 (mod 5), 37 ≡ −3 (mod 5), and 24 ≡ 0 (mod 6). Notice
that 𝑎 ≡ 0 (mod 𝑛) if and only if 𝑛∣𝑎 and that 𝑎 ≡ 𝑏 (mod 𝑛) if and only if we can write
𝑎 = 𝑏 + 𝑘𝑛 for some integer 𝑘.
Lemma 3.1. If 𝑎 ≡ 𝑐 (mod 𝑛) and 𝑏 ≡ 𝑑 (mod 𝑛), then 𝑎 + 𝑏 ≡ 𝑐 + 𝑑 (mod 𝑛) and 𝑎𝑏 ≡ 𝑐𝑑
(mod 𝑛).
Proof. Suppose that 𝑎 ≡ 𝑐 (mod 𝑛) and 𝑏 ≡ 𝑑 (mod 𝑛). Then there exist integers 𝑘 and 𝑙
such that 𝑎 = 𝑐 + 𝑘𝑛 and 𝑏 = 𝑑 + 𝑙𝑛. We then have
𝑎 + 𝑏 = 𝑐 + 𝑑 + (𝑘 + 𝑙)𝑛
and
𝑎𝑏 = 𝑐𝑑 + (𝑘𝑑 + 𝑙𝑐 + 𝑘𝑙𝑛)𝑛,
which shows that 𝑎 + 𝑏 ≡ 𝑐 + 𝑑 (mod 𝑛) and 𝑎𝑏 ≡ 𝑐𝑑 (mod 𝑛).
□
This lemma allows us to manipulate congruences algebraically as we do with equations.
Example 3.2. For what integers 𝑥 does the congruence 4𝑥 + 1 ≡ 3 (mod 7) hold?
Solution. Subtracting 1 from both sides shows that the congruence is equivalent to 4𝑥 ≡ 2
(mod 7). Multiplying both sides by 2 now gives 8𝑥 ≡ 4 (mod 7), which is the same as 𝑥 ≡ 4
(mod 7), since 8 ≡ 1 (mod 7). Hence the congruence is satisfied by all integers 𝑥 of the form
𝑥 = 4 + 7𝑘, where 𝑘 is an integer.
□
Lemma 3.3. (Cancellation) If 𝑎𝑏 ≡ 𝑎𝑐 (mod 𝑛) and (𝑎, 𝑛) = 1, then 𝑏 ≡ 𝑐 (mod 𝑛).
Proof. Suppose that 𝑎𝑏 ≡ 𝑎𝑐 (mod 𝑛) and (𝑎, 𝑛) = 1. Then 𝑛 divides 𝑎𝑏 − 𝑎𝑐 = 𝑎(𝑏 − 𝑐).
Since (𝑎, 𝑛) = 1, it follows by imitating the proof of Euclid’s Lemma that 𝑛 divides 𝑏 − 𝑐
(exercise). Thus we have 𝑏 ≡ 𝑐 (mod 𝑛).
□
Note that Lemma 3.3 may fail without the assumption that (𝑎, 𝑛) = 1. For instance, we
have 2 ⋅ 5 ≡ 2 ⋅ 14 (mod 6), but 5 ≢ 14 (mod 6).
Example 3.4. For what values of 𝑥 does the congruence 4𝑥 + 1 ≡ 5 (mod 7) hold?
Solution. Here the congruence is equivalent to 4𝑥 ≡ 4 (mod 7), and since (4, 7) = 1 we
may apply Lemma 3.3 to conclude that 𝑥 ≡ 1 (mod 7). Hence the congruence holds for all
integers 𝑥 of the form 𝑥 = 1 + 7𝑘, where 𝑘 is an integer.
□
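For a small modulus, the solutions found in Examples 3.2 and 3.4 can also be confirmed by testing every representative 0, 1, . . . , 𝑛 − 1. The following Python sketch does exactly that; the function name solve_by_search is our own choice for illustration and does not come from any library.

```python
def solve_by_search(lhs, rhs, n):
    """Return every x in {0, 1, ..., n-1} with lhs(x) ≡ rhs (mod n)."""
    return [x for x in range(n) if (lhs(x) - rhs) % n == 0]

# Example 3.2: 4x + 1 ≡ 3 (mod 7)
print(solve_by_search(lambda x: 4 * x + 1, 3, 7))   # [4]
# Example 3.4: 4x + 1 ≡ 5 (mod 7)
print(solve_by_search(lambda x: 4 * x + 1, 5, 7))   # [1]
```

Exhaustive search is only sensible when 𝑛 is small, but it is a handy check on hand computations.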
Residue Classes. It is easy to see that congruence modulo 𝑛 defines an equivalence
relation on the set of integers and therefore partitions the integers into equivalence classes.
Our solutions to Examples 3.2 and 3.4 indicate how these are defined. In Example 3.4, for
instance, the solution was the set of all integers congruent to 1 modulo 7, that is, all integers
𝑥 that can be expressed in the form 𝑥 = 1 + 7𝑘 for some integer 𝑘. We call this set the
residue class of 1 modulo 7. It is sometimes denoted by [1] or [1]_7. Thus
[1]_7 = {. . . , −20, −13, −6, 1, 8, 15, 22, . . . }.
Similarly, the solution of Example 3.2 is the set of all integers in the residue class
[4]_7 = {. . . , −17, −10, −3, 4, 11, 18, . . . }.
In general, we let [𝑎] or [𝑎]_𝑛 denote the residue class of 𝑎 modulo 𝑛, which is defined to be
the set of all integers of the form 𝑎 + 𝑘𝑛, where 𝑘 ∈ ℤ.
It is often convenient to view each residue class as a single element in a number system.
Therefore, we let ℤ𝑛 denote the set of residue classes modulo 𝑛. Technically, we have
ℤ𝑛 = {[0]_𝑛 , [1]_𝑛 , [2]_𝑛 , . . . , [𝑛 − 1]_𝑛 },
but Lemma 3.1 allows us to work with any set of representatives, such as {0, 1, 2, . . . , 𝑛 − 1},
when doing computations. Thus we often dispense with the brackets and just think of ℤ𝑛
as the set {0, 1, 2, . . . , 𝑛 − 1} under mod 𝑛 arithmetic. With this viewpoint, we could say
that the congruence in Example 3.4 has the unique solution 𝑥 = 1 in ℤ7 . Addition and
multiplication in ℤ7 can be represented by the following tables:
 + | 0  1  2  3  4  5  6
---+---------------------
 0 | 0  1  2  3  4  5  6
 1 | 1  2  3  4  5  6  0
 2 | 2  3  4  5  6  0  1
 3 | 3  4  5  6  0  1  2
 4 | 4  5  6  0  1  2  3
 5 | 5  6  0  1  2  3  4
 6 | 6  0  1  2  3  4  5

 × | 0  1  2  3  4  5  6
---+---------------------
 0 | 0  0  0  0  0  0  0
 1 | 0  1  2  3  4  5  6
 2 | 0  2  4  6  1  3  5
 3 | 0  3  6  2  5  1  4
 4 | 0  4  1  5  2  6  3
 5 | 0  5  3  1  6  4  2
 6 | 0  6  5  4  3  2  1
A set such as {0, 1, 2, . . . , 𝑛−1} that contains exactly one representative of each equivalence
class is called a complete residue system modulo 𝑛. Complete residue systems are not unique;
for instance {0, 1, 2, 3, 4, 5, 6} and {−3, −2, −1, 0, 1, 2, 3} are equally valid complete residue
systems modulo 7, and either one could be used to represent ℤ7 .
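Tables like those for ℤ7 can be generated mechanically for any modulus. The sketch below is a minimal illustration; the helper name operation_table is hypothetical, and the last lines check that {−3, . . . , 3} really is a complete residue system modulo 7.

```python
def operation_table(n, op):
    """Rows of op(i, j) MOD n for i, j in the complete residue system {0, ..., n-1}."""
    return [[op(i, j) % n for j in range(n)] for i in range(n)]

add7 = operation_table(7, lambda i, j: i + j)
mul7 = operation_table(7, lambda i, j: i * j)

for row in mul7:
    print(*row)

# A complete residue system hits every residue class exactly once.
candidates = [-3, -2, -1, 0, 1, 2, 3]
print(sorted(x % 7 for x in candidates) == list(range(7)))   # True
```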
Solving Linear Congruences. We want to develop a systematic procedure for finding
the solutions of a congruence of the shape 𝑎𝑥 ≡ 𝑏 (mod 𝑛). The following lemma is an
important starting point.
Lemma 3.5. (Multiplicative Inverses) If (𝑎, 𝑛) = 1, then there is an integer 𝑐 such that
𝑐𝑎 ≡ 1 (mod 𝑛). Moreover, the residue class of 𝑐 modulo 𝑛 is unique.
Proof. Since (𝑎, 𝑛) = 1, we know from Corollary 2.5 that there exist integers 𝑠 and 𝑡 with
𝑎𝑠 + 𝑛𝑡 = 1. We then have 𝑎𝑠 = 1 − 𝑛𝑡, which shows that 𝑎𝑠 ≡ 1 (mod 𝑛), so we can take
𝑐 = 𝑠. Now suppose that 𝑐′ is any other integer with 𝑐′ 𝑎 ≡ 1 (mod 𝑛). Then
𝑐′ ≡ 𝑐′ (𝑐𝑎) ≡ (𝑐′ 𝑎)𝑐 ≡ 𝑐 (mod 𝑛),
and the uniqueness claim follows.
□
If 𝑐𝑎 ≡ 1 (mod 𝑛), then we say that 𝑐 is the inverse of 𝑎 modulo 𝑛, and we sometimes
write 𝑐 = 𝑎^{−1} or 𝑐 = 𝑎^{−1} mod 𝑛. Lemma 3.5 shows that when (𝑎, 𝑛) = 1, the congruence
𝑎𝑥 ≡ 𝑏 (mod 𝑛) has a unique solution in ℤ𝑛 , given by 𝑥 = 𝑎^{−1}𝑏.
In view of Corollary 2.5, it is easy to see that Lemma 3.5 can be strengthened to an “if and
only if” statement. That is, 𝑎 has a multiplicative inverse modulo 𝑛 if and only if (𝑎, 𝑛) = 1.
In order to find 𝑎^{−1} when (𝑎, 𝑛) = 1, we apply the Euclidean algorithm to find integers 𝑠
and 𝑡 with
𝑎𝑠 + 𝑛𝑡 = 1.
We then have 𝑎𝑠 ≡ 1 (mod 𝑛), so 𝑠 ≡ 𝑎^{−1} (mod 𝑛). For small values of 𝑛, we can often find
inverses by inspection without resorting to the Euclidean algorithm.
Example 3.6. Solve the congruence 4𝑥 ≡ 3 (mod 9).
Solution. Since (4, 9) = 1 we know that 4 has a multiplicative inverse modulo 9, and we find
by inspection that 4^{−1} = 7 in ℤ9 since 4 ⋅ 7 = 28 ≡ 1 (mod 9). Multiplying through by 7
now gives 𝑥 ≡ 21 ≡ 3 (mod 9), and hence 𝑥 = 3 is the unique solution in ℤ9 .
□
Example 3.7. Solve the congruence 91𝑥 ≡ 5 (mod 64).
Solution. We can start by observing that 91 ≡ 27 (mod 64), so the congruence is equivalent
to 27𝑥 ≡ 5 (mod 64). Since (27, 64) = 1, we can again find a unique solution modulo 64 by
multiplying through by 27^{−1}, but finding the inverse by inspection is not quite as easy as it was
in Example 3.6. Thus we apply the Euclidean algorithm:
  [ 1   0  64 ]     [ 1  −2  10 ]     [  1  −2  10 ]
  [ 0   1  27 ]  →  [ 0   1  27 ]  →  [ −2   5   7 ]

                 →  [  3  −7   3 ]  →  [  3  −7   3 ]
                    [ −2   5   7 ]     [ −8  19   1 ].
This shows that 64 ⋅ (−8) + 27 ⋅ 19 = 1 and hence that 27 ⋅ 19 ≡ 1 (mod 64). Hence we have
27^{−1} = 19 in ℤ64 . Thus 𝑥 ≡ 5 ⋅ 19 ≡ 31 is the unique solution modulo 64.
□
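The row operations used above translate directly into a short program. The following Python sketch keeps track of the coefficients 𝑠 and 𝑡 while running the Euclidean algorithm; the names extended_gcd and mod_inverse are our own, and the code is meant as an illustration rather than a definitive implementation.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and a*s + b*t == g."""
    s0, t0, s1, t1 = 1, 0, 0, 1          # coefficient rows for a and for b
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        s0, s1 = s1, s0 - q * s1          # same row operation as in the matrices
        t0, t1 = t1, t0 - q * t1
    return a, s0, t0

def mod_inverse(a, n):
    """Return the inverse of a modulo n in {0, ..., n-1}; requires (a, n) = 1."""
    g, s, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a has no inverse modulo n")
    return s % n

print(extended_gcd(27, 64))          # (1, 19, -8)
print(mod_inverse(27, 64))           # 19
print(5 * mod_inverse(27, 64) % 64)  # 31, the solution of Example 3.7
```

In Python 3.8 and later, the built-in call pow(27, -1, 64) returns the same inverse.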
What, if anything, can we say about the solutions to the congruence 𝑎𝑥 ≡ 𝑏 (mod 𝑛)
when (𝑎, 𝑛) > 1? The following theorem provides the answer.
Theorem 3.8. Write 𝑑 = (𝑎, 𝑛). The congruence 𝑎𝑥 ≡ 𝑏 (mod 𝑛) has a solution if and
only if 𝑑 divides 𝑏. In this case, there are exactly 𝑑 solutions modulo 𝑛, spaced 𝑛/𝑑 apart.
Proof. If 𝑥 is a solution to the congruence, then we have 𝑎𝑥 = 𝑏 + 𝑘𝑛 for some integer 𝑘,
and thus 𝑏 = 𝑎𝑥 − 𝑘𝑛. Since 𝑑∣𝑎 and 𝑑∣𝑛, we must have 𝑑∣𝑏 by Lemma 2.1. Therefore the
congruence has no solution if 𝑑 does not divide 𝑏.
Now suppose that 𝑑∣𝑏. Then since 𝑎𝑥 − 𝑏 = 𝑘𝑛 if and only if (𝑎/𝑑)𝑥 − 𝑏/𝑑 = 𝑘(𝑛/𝑑), we see that the
congruence is equivalent to
(𝑎/𝑑)𝑥 ≡ 𝑏/𝑑 (mod 𝑛/𝑑).
Since (𝑎/𝑑, 𝑛/𝑑) = 1, Lemma 3.5 shows that there is a unique solution 𝑥0 modulo 𝑛/𝑑 and
hence 𝑑 distinct solutions modulo 𝑛, given by 𝑥 = 𝑥0 + 𝑚(𝑛/𝑑) for 0 ≤ 𝑚 ≤ 𝑑 − 1.
□
Example 3.9. Describe the solutions of the congruence 6𝑥 ≡ 5 (mod 9).
Solution. We have (6, 9) = 3, which fails to divide 5, so Theorem 3.8 tells us that there is
no solution.
□
Example 3.10. Describe the solutions of the congruence 24𝑥 ≡ 9 (mod 33).
Solution. We have (24, 33) = 3, which divides 9, so the proof of Theorem 3.8 shows that the
congruence is equivalent to 8𝑥 ≡ 3 (mod 11). Since 8^{−1} = 7 in ℤ11 , we find that 𝑥 = 10
is the unique solution modulo 11. It follows that there are exactly 3 solutions modulo 33,
represented by the residue classes 𝑥 = 10, 𝑥 = 21, and 𝑥 = 32.
□
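The proof of Theorem 3.8 is constructive, so it can be turned into a procedure that lists all solutions of 𝑎𝑥 ≡ 𝑏 (mod 𝑛). The sketch below assumes Python 3.8 or later, where pow(a, -1, m) computes a modular inverse; the function name solve_linear_congruence is hypothetical.

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """Return all solutions of a*x ≡ b (mod n) in {0, ..., n-1}, in increasing order."""
    d = gcd(a, n)
    if b % d != 0:
        return []                                  # no solution unless d divides b
    a1, b1, n1 = a // d, b // d, n // d            # reduced congruence modulo n/d
    x0 = pow(a1, -1, n1) * b1 % n1                 # unique solution modulo n/d
    return [x0 + m * n1 for m in range(d)]         # d solutions spaced n/d apart

print(solve_linear_congruence(6, 5, 9))     # []            (Example 3.9)
print(solve_linear_congruence(24, 9, 33))   # [10, 21, 32]  (Example 3.10)
```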
Applications to check digit schemes. Congruences can be used to construct a method
for reducing errors in data entry. Suppose we have a list of 9-digit identification numbers
of the form 𝑥1 𝑥2 . . . 𝑥9 to enter into a computer. We can add a 10th digit 𝑥10 satisfying the
congruence
𝑥10 ≡ 𝑥1 + ⋅ ⋅ ⋅ + 𝑥9 (mod 10);
that is, 𝑥10 is the sum of the previous 9 digits modulo 10. We can now enter our ID numbers
in the form 𝑥1 𝑥2 . . . 𝑥10 and program our computer to reject our entry if the above congruence
is not satisfied. For example, the number 129-28-5468 would be entered as 129-28-5468-5.
The number 𝑥10 (in this case 5) is called a check digit. This scheme will catch any errors
in which only a single digit is mistyped; for instance, the erroneous entry 126-28-5468-5 for
the ID number above would be rejected. Many other errors will be caught as well, and this
scheme can be applied to data strings of any length. One notable disadvantage is that it does
not detect errors in which two digits are interchanged; for example, the entry 129-28-4568-5
would be accepted by our computer as a valid ID even though it may have resulted from
mistyping 54 as 45.
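The behavior of this simple scheme on the three entries discussed above can be reproduced with a few lines of Python. The helper names below are our own, and hyphens are simply ignored when reading the digits.

```python
def digits_of(entry):
    """Extract the decimal digits of an entry, ignoring hyphens."""
    return [int(c) for c in entry if c.isdigit()]

def append_check_digit(id_number):
    """Append the check digit x10 ≡ x1 + ... + x9 (mod 10)."""
    return id_number + "-" + str(sum(digits_of(id_number)) % 10)

def passes_check(entry):
    """True if the last digit equals the sum of the earlier digits modulo 10."""
    *body, check = digits_of(entry)
    return sum(body) % 10 == check

print(append_check_digit("129-28-5468"))   # 129-28-5468-5
print(passes_check("126-28-5468-5"))       # False: a single mistyped digit is caught
print(passes_check("129-28-4568-5"))       # True: the transposition goes unnoticed
```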
In order to detect errors resulting from interchanging digits, one can employ a more sophisticated scheme. We illustrate by examining the International Standard Book Number
(ISBN) system. These numbers are 10 digits long and come in 4 blocks; for instance, the
ISBN for Niven, Zuckerman, and Montgomery, Introduction to the Theory of Numbers, 5th
edition, is 0-471-62546-9. The first digit indicates the country of publication, the second
block encodes the publisher (Wiley), the third block identifies the title and edition, and the
fourth block is a check digit. If the first nine digits are 𝑥1 , . . . , 𝑥9 , then the check digit 𝑥10 is
determined by the congruence
𝑥10 ≡ ∑_{𝑖=1}^{9} 𝑖𝑥𝑖 ≡ 𝑥1 + 2𝑥2 + 3𝑥3 + ⋅ ⋅ ⋅ + 9𝑥9 (mod 11).
Thus in the above case, we would compute
𝑥10 ≡ 0 + 2 ⋅ 4 + 3 ⋅ 7 + 4 ⋅ 1 + 5 ⋅ 6 + 6 ⋅ 2 + 7 ⋅ 5 + 8 ⋅ 4 + 9 ⋅ 6 ≡ 196 ≡ 9 (mod 11).
We find 𝑥10 by reducing the above expression modulo 11 to obtain one of the standard
representatives 0, 1, 2 . . . , 9, 10. (In the event that 𝑥10 = 10, the ISBN uses X instead.)
It turns out that this scheme protects both against mistyping a single digit and against
interchanging two unequal digits, as long as only one of these errors occurs in a given entry.
Theorem 3.11. If 𝐴 = 𝑥1 𝑥2 . . . 𝑥10 is a valid ISBN and 𝐵 = 𝑥′1 𝑥′2 . . . 𝑥′10 is obtained from 𝐴
by altering exactly one digit or interchanging two unequal digits, then 𝐵 is not a valid ISBN.
Proof. Note that since 10 ≡ −1 (mod 11) our check digit test for a valid ISBN is equivalent
to the congruence
∑_{𝑖=1}^{10} 𝑖𝑥𝑖 ≡ 0 (mod 11).
Suppose that 𝐵 is obtained from 𝐴 by replacing some digit 𝑥𝑗 by 𝑥′𝑗 , where 𝑥𝑗 ∕= 𝑥′𝑗 . Then
∑_{𝑖=1}^{10} 𝑖𝑥′𝑖 = (∑_{𝑖=1}^{10} 𝑖𝑥𝑖) − 𝑗𝑥𝑗 + 𝑗𝑥′𝑗 ≡ 𝑗(𝑥′𝑗 − 𝑥𝑗 ) ≢ 0 (mod 11)
by Euclid’s Lemma, since 11 does not divide 𝑗 or 𝑥𝑗 − 𝑥′𝑗 .
Suppose instead that 𝐵 is obtained from 𝐴 by interchanging the 𝑗th and 𝑘th digits, where
𝑗 ∕= 𝑘 and 𝑥𝑗 ∕= 𝑥𝑘 . Then we can write 𝑥′𝑗 = 𝑥𝑘 and 𝑥′𝑘 = 𝑥𝑗 , and hence
∑_{𝑖=1}^{10} 𝑖𝑥′𝑖 ≡ (∑_{𝑖=1}^{10} 𝑖𝑥𝑖) + 𝑗𝑥𝑘 + 𝑘𝑥𝑗 − 𝑗𝑥𝑗 − 𝑘𝑥𝑘 ≡ (𝑘 − 𝑗)(𝑥𝑗 − 𝑥𝑘 ) ≢ 0 (mod 11)
by Euclid’s Lemma, since 11 does not divide 𝑘 − 𝑗 or 𝑥𝑗 − 𝑥𝑘 .
□
Example 3.12. The code number 5-382-14572-2 was obtained from a valid ISBN by interchanging two adjacent digits. What was the original ISBN?
Solution. Adopting the notation from the proof of Theorem 3.11, we have
∑_{𝑖=1}^{10} 𝑖𝑥′𝑖 = 5 + 6 + 24 + 8 + 5 + 24 + 35 + 56 + 18 + 20 = 201 ≡ 3 (mod 11).
Suppose the adjacent digits 𝑥𝑗 and 𝑥_{𝑗+1} were interchanged in the original ISBN. Then by
applying the last displayed equation in the proof of Theorem 3.11 with 𝑘 = 𝑗 + 1, we see
that
𝑥′_{𝑗+1} − 𝑥′𝑗 = 𝑥𝑗 − 𝑥_{𝑗+1} ≡ 3 (mod 11).
In the given code, we have 𝑥′6 − 𝑥′5 = 3, and there is no other pair of adjacent digits with
this property, so these must be the ones that were interchanged. It follows that the original
ISBN was 5-382-41572-2.
□
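The ISBN-10 test and the repair carried out in Example 3.12 can be sketched in Python as follows. The function names are our own; the repair routine simply tries every swap of adjacent unequal symbols and keeps those that produce a valid code, which is a brute-force stand-in for the argument given in the solution.

```python
def isbn10_weighted_sum(code):
    """Sum of i * x_i over the ten symbols, reading 'X' as the value 10."""
    values = [10 if c == "X" else int(c) for c in code if c != "-"]
    return sum(i * x for i, x in enumerate(values, start=1))

def is_valid_isbn10(code):
    """Validity test from the proof of Theorem 3.11: the weighted sum is 0 mod 11."""
    return isbn10_weighted_sum(code) % 11 == 0

def undo_adjacent_swap(code):
    """All valid codes obtained by swapping one pair of adjacent, unequal symbols."""
    symbols = [c for c in code if c != "-"]
    repaired = []
    for j in range(len(symbols) - 1):
        if symbols[j] != symbols[j + 1]:
            trial = symbols[:]
            trial[j], trial[j + 1] = trial[j + 1], trial[j]
            if is_valid_isbn10("".join(trial)):
                repaired.append("".join(trial))
    return repaired

print(is_valid_isbn10("0-471-62546-9"))            # True
print(isbn10_weighted_sum("5-382-14572-2") % 11)   # 3
print(undo_adjacent_swap("5-382-14572-2"))         # ['5382415722']
```

Because exactly one candidate is returned, the error in Example 3.12 can be corrected unambiguously.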
In the above example, we were able to use the ISBN scheme not only to detect an error
but also to correct it, assuming we were fairly confident that the error involved transposing
adjacent digits. Of course, if there was more than one adjacent pair (𝑥′𝑗 , 𝑥′_{𝑗+1}) in the erroneous
code with 𝑥′_{𝑗+1} − 𝑥′𝑗 = 3, then we’d be less successful.
Recently, the above system (known as ISBN-10) has been phased out in favor of a 13-digit
code that is compatible with the UPC/EAN scheme. Here the check digit is determined by
the congruence
𝑥1 + 3𝑥2 + 𝑥3 + 3𝑥4 + 𝑥5 + ⋅ ⋅ ⋅ + 3𝑥12 + 𝑥13 ≡ 0 (mod 10),
and a 12-digit UPC is converted to this form by putting an extra 0 at the beginning. Since
the arithmetic now occurs in ℤ10 , there is no need to allow X as a possible check digit. This
scheme (known as ISBN-13) still detects all single-digit errors but unfortunately no longer
detects all transpositions. Many recent books contain both the ISBN-10 and ISBN-13 codes.
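To see concretely how a transposition can slip past the ISBN-13 test, note that swapping adjacent digits 𝑥 and 𝑦 changes the weighted sum by ±2(𝑥 − 𝑦), which is divisible by 10 whenever 𝑥 and 𝑦 differ by 5. The Python sketch below illustrates this; the 13-digit strings are used purely as test data for the congruence, and the function name is our own.

```python
def passes_isbn13_check(code):
    """True if x1 + 3*x2 + x3 + ... + 3*x12 + x13 ≡ 0 (mod 10)."""
    digits = [int(c) for c in code if c.isdigit()]
    if len(digits) != 13:
        return False
    weights = [1, 3] * 6 + [1]
    return sum(w * x for w, x in zip(weights, digits)) % 10 == 0

print(passes_isbn13_check("9780471625469"))   # True
print(passes_isbn13_check("9780471625468"))   # False: single-digit error is caught
print(passes_isbn13_check("9780476125469"))   # True: swapping the adjacent 1 and 6 is missed
```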
Fermat’s Little Theorem. In many applications of congruences, it is important to be
able to compute powers of an integer efficiently modulo some number 𝑛. In the case where
𝑛 is a prime, we have the following useful result.
Theorem 3.13. (Fermat’s Little Theorem) If 𝑝 is a prime not dividing 𝑎, then
𝑎^{𝑝−1} ≡ 1 (mod 𝑝).
Proof. Suppose that 𝑝 does not divide 𝑎, and consider the product
𝑋 = 𝑎 ⋅ 2𝑎 ⋅ 3𝑎 ⋅ ⋅ ⋅ (𝑝 − 1)𝑎 = 𝑎^{𝑝−1} [1 ⋅ 2 ⋅ 3 ⋅ ⋅ ⋅ (𝑝 − 1)] = 𝑎^{𝑝−1} (𝑝 − 1)!.
Suppose that 1 ≤ 𝑖, 𝑗 ≤ 𝑝 − 1 and that 𝑖𝑎 ≡ 𝑗𝑎 (mod 𝑝). Since (𝑎, 𝑝) = 1, Lemma 3.3 implies
that 𝑖 ≡ 𝑗 (mod 𝑝), and hence that 𝑖 = 𝑗. Therefore, the integers 𝑎, 2𝑎, 3𝑎, . . . , (𝑝 − 1)𝑎
represent all the non-zero residue classes modulo 𝑝, and hence their product, 𝑋, must be
congruent modulo 𝑝 to 1 ⋅ 2 ⋅ 3 ⋅ ⋅ ⋅ (𝑝 − 1) = (𝑝 − 1)!. That is, we have
𝑎^{𝑝−1} (𝑝 − 1)! ≡ (𝑝 − 1)! (mod 𝑝).
Now since all the prime factors of (𝑝 − 1)! are smaller than 𝑝, we find that 𝑝 and (𝑝 − 1)! are
relatively prime, and thus Lemma 3.3 implies that 𝑎^{𝑝−1} ≡ 1 (mod 𝑝).
□
We can use Fermat’s Little Theorem to compute powers modulo a prime very efficiently
by applying division with remainder to the exponent. Usually we are interested in the least
non-negative representative for a particular residue class; this is sometimes called the residue
and denoted by the MOD symbol. For instance, the residue of 8 modulo 5 is 8 MOD 5 = 3.
Example 3.14. Compute 2^{2008} MOD 13.
Solution. Since 13 is prime and doesn’t divide 2, Theorem 3.13 implies that 2^{12} ≡ 1 (mod 13).
Moreover, division with remainder yields 2008 = 12 ⋅ 167 + 4, so
2^{2008} = 2^{12⋅167+4} = (2^{12})^{167} ⋅ 2^4 ≡ 2^4 ≡ 3 (mod 13).
Thus we have 2^{2008} MOD 13 = 3.
□
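The reduction of the exponent used in Example 3.14 can be written as a short Python function. This is a sketch under the assumption that the caller supplies a prime 𝑝 (primality is not checked); the name power_mod_prime is hypothetical.

```python
def power_mod_prime(a, k, p):
    """Compute a**k MOD p for a prime p not dividing a, via Fermat's Little Theorem."""
    if a % p == 0:
        raise ValueError("p must not divide a")
    r = k % (p - 1)          # k = (p - 1) * q + r, so a**k ≡ a**r (mod p)
    return a ** r % p

print(power_mod_prime(2, 2008, 13))   # 3
```

The answer agrees with Python's built-in three-argument pow(2, 2008, 13).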
Fermat’s Little Theorem also yields a negative test for primality, which is often faster than
trial division. If 𝑏 is a positive integer not divisible by 𝑛 and we can show that 𝑏^{𝑛−1} ≢ 1
(mod 𝑛), then we may conclude that 𝑛 is not prime. However, the converse of this is false.
For example, 2^{340} ≡ 1 (mod 341), and yet 341 = 11 ⋅ 31 is not prime. So this does not give
a way to prove that an integer is prime. We’ll return to this topic in the next section.
Reduced residues and Euler’s Theorem. Recall that 𝑎 has a multiplicative inverse
modulo 𝑛 if and only if (𝑎, 𝑛) = 1. When 𝑛 is prime, the residues with this property are
just 1, 2, 3, . . . , 𝑛 − 1. In general, we write 𝜙(𝑛) for the number of positive integers less than
or equal to 𝑛 that are relatively prime to 𝑛. This is known as Euler’s phi function. For
instance, we have 𝜙(1) = 1, 𝜙(2) = 1, 𝜙(3) = 2, 𝜙(4) = 2, 𝜙(5) = 4, 𝜙(6) = 2, 𝜙(7) = 6,
𝜙(8) = 4, 𝜙(9) = 6, and 𝜙(10) = 4. Notice that 𝜙(𝑝) = 𝑝 − 1 whenever 𝑝 is prime.
The property of being relatively prime to 𝑛 depends only on the residue class of an integer,
since (𝑎, 𝑛) = (𝑎+𝑘𝑛, 𝑛) for any integer 𝑘 by Lemma 2.6. Therefore, we can view 𝜙(𝑛) as the
number of residue classes modulo 𝑛 that are relatively prime to 𝑛. Any set of representatives
for these classes is called a reduced residue system modulo 𝑛. For instance, {1, 2, 3, 4} is a
reduced residue system modulo 5, while {1, 3, 7, 9} and {−3, −1, 1, 3} are reduced residue
systems modulo 10. We often use ℤ∗𝑛 to denote a reduced residue system modulo 𝑛. Those
familiar with abstract algebra may wish to note that ℤ∗𝑛 forms a group under multiplication.
The following result generalizes Fermat’s Little Theorem to the case of composite moduli.
Theorem 3.15. (Euler’s Theorem) If 𝑎 and 𝑛 are positive integers with (𝑎, 𝑛) = 1, then
𝑎^{𝜙(𝑛)} ≡ 1 (mod 𝑛).
Proof. Let 𝑏1 , . . . , 𝑏_{𝜙(𝑛)} denote the positive integers less than or equal to 𝑛 that are relatively
prime to 𝑛, and let 𝑟𝑖 = 𝑎𝑏𝑖 MOD 𝑛 be the residue of 𝑎𝑏𝑖 modulo 𝑛. Suppose that 1 ≤
𝑖, 𝑗 ≤ 𝜙(𝑛) and 𝑟𝑖 = 𝑟𝑗 . Then 𝑎𝑏𝑖 ≡ 𝑎𝑏𝑗 (mod 𝑛), which implies that 𝑏𝑖 ≡ 𝑏𝑗 (mod 𝑛) since
(𝑎, 𝑛) = 1. Since 𝑏1 , . . . , 𝑏_{𝜙(𝑛)} are distinct integers between 1 and 𝑛, we must have 𝑖 = 𝑗.
This shows that 𝑟1 , . . . , 𝑟_{𝜙(𝑛)} are distinct. Moreover, it is clear that (𝑟𝑖 , 𝑛) = 1 for each 𝑖, so
{𝑟1 , . . . , 𝑟_{𝜙(𝑛)}} is a reduced residue system modulo 𝑛. In particular, we have
𝑏1 ⋅ ⋅ ⋅ 𝑏_{𝜙(𝑛)} ≡ 𝑟1 ⋅ ⋅ ⋅ 𝑟_{𝜙(𝑛)} ≡ 𝑎𝑏1 ⋅ ⋅ ⋅ 𝑎𝑏_{𝜙(𝑛)} ≡ 𝑎^{𝜙(𝑛)} 𝑏1 ⋅ ⋅ ⋅ 𝑏_{𝜙(𝑛)} (mod 𝑛).
Since 𝑏1 ⋅ ⋅ ⋅ 𝑏_{𝜙(𝑛)} is relatively prime to 𝑛, we conclude that 𝑎^{𝜙(𝑛)} ≡ 1 (mod 𝑛), as desired. □
Example 3.16. Compute 5^{999} MOD 12.
Solution. We have 𝜙(12) = 4 and (5, 12) = 1, so Theorem 3.15 implies that 5^4 ≡ 1 (mod 12).
Since 999 = 4 ⋅ 249 + 3, we have
5^{999} = 5^{4⋅249+3} = (5^4)^{249} ⋅ 5^3 ≡ 5^3 ≡ 5 (mod 12).
Thus we have 5^{999} MOD 12 = 5.
□
It turns out that 𝜙(𝑛) can be computed easily provided that the prime factorization of 𝑛 is
known. This is a consequence of the following important theorem about simultaneous congruences.
We say that integers 𝑚1 , . . . , 𝑚𝑟 are pairwise relatively prime if (𝑚𝑖 , 𝑚𝑗 ) = 1 whenever 𝑖 ∕= 𝑗.
Theorem 3.17. (Chinese Remainder Theorem) Let 𝑚1 , . . . , 𝑚𝑟 be pairwise relatively
prime positive integers, and let 𝑏1 , . . . , 𝑏𝑟 be any integers. There exists an integer 𝑥 satisfying
the system of congruences
𝑥 ≡ 𝑏1 (mod 𝑚1 ),
𝑥 ≡ 𝑏2 (mod 𝑚2 ),
... ,
𝑥 ≡ 𝑏𝑟 (mod 𝑚𝑟 ),
and 𝑥 is unique modulo 𝑚1 ⋅ ⋅ ⋅ 𝑚𝑟 .
Proof. Let 𝑀 = 𝑚1 ⋅ ⋅ ⋅ 𝑚𝑟 , and for each 𝑖 write 𝑀𝑖 = 𝑀/𝑚𝑖 . Since the 𝑚𝑖 are pairwise
relatively prime, we have (𝑀𝑖 , 𝑚𝑖 ) = 1, and thus Theorem 3.8 shows that there is a unique
integer 𝑠𝑖 modulo 𝑚𝑖 satisfying the congruence
𝑀𝑖 𝑠𝑖 ≡ 𝑏𝑖
(mod 𝑚𝑖 ).
It is easy to check that the integer
𝑥 = 𝑀1 𝑠1 + 𝑀2 𝑠2 + ⋅ ⋅ ⋅ + 𝑀𝑟 𝑠𝑟
satisfies our system of congruences. If 𝑥′ is another solution to the system, then we have
𝑥 ≡ 𝑥′ (mod 𝑚𝑖 ) for each 𝑖, and hence 𝑥 − 𝑥′ is divisible by 𝑚𝑖 . Since the 𝑚𝑖 are pairwise
relatively prime, it follows easily that 𝑥 − 𝑥′ is divisible by 𝑀 , which establishes uniqueness
modulo 𝑀 .
□
Example 3.18. Solve the system of congruences
𝑥 ≡ 1 (mod 5),
2𝑥 ≡ 4 (mod 6),
3𝑥 ≡ 2 (mod 7).
Solution. We first rewrite the system in a form to which Theorem 3.17 applies. In view of
Theorem 3.8, we see that the system is equivalent to
𝑥 ≡ 1 (mod 5),
𝑥 ≡ 2 (mod 3),
𝑥 ≡ 3 (mod 7),
and we may now employ the proof of Theorem 3.17 with 𝑚1 = 5, 𝑚2 = 3, and 𝑚3 = 7 to
produce a unique solution modulo 𝑀 = 105. We must find integers 𝑠1 , 𝑠2 , and 𝑠3 satisfying
the congruences
21𝑠1 ≡ 1 (mod 5),
35𝑠2 ≡ 2 (mod 3),
15𝑠3 ≡ 3 (mod 7).
We see easily by inspection that 𝑠1 = 1, 𝑠2 = 1, and 𝑠3 = 3 are solutions, and thus
𝑥 = 21 ⋅ 1 + 35 ⋅ 1 + 15 ⋅ 3 = 101
is the unique solution of the original system modulo 105. Hence the solutions are precisely
the integers of the form 𝑥 = 101 + 105𝑘, where 𝑘 ∈ ℤ.
□
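The construction in the proof of Theorem 3.17 is easy to automate. The following Python sketch follows the proof step by step, with variable names chosen to match it; it assumes Python 3.8 or later (for math.prod and for pow with exponent −1) and assumes, without checking, that the moduli are pairwise relatively prime.

```python
from math import prod

def crt(residues, moduli):
    """Return (x, M) with x ≡ b_i (mod m_i) for every i and 0 <= x < M = m_1 ... m_r."""
    M = prod(moduli)
    x = 0
    for b_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        s_i = pow(M_i, -1, m_i) * b_i % m_i    # solves M_i * s_i ≡ b_i (mod m_i)
        x += M_i * s_i
    return x % M, M

print(crt([1, 2, 3], [5, 3, 7]))   # (101, 105), as in Example 3.18
```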
The Chinese Remainder Theorem also allows us to deal with systems of congruences in
which the moduli are not pairwise relatively prime. The technique is to convert the system
to an equivalent one in which the moduli are powers of distinct primes.
Example 3.19. Find all solutions of the system
𝑥 ≡ 1 (mod 36)
and
𝑥 ≡ 5 (mod 56).
Solution. By the Chinese Remainder Theorem, the first congruence is equivalent to the pair
𝑥 ≡ 1 (mod 4)
and
𝑥 ≡ 1 (mod 9),
and the second congruence is equivalent to the pair
𝑥 ≡ 5 (mod 8)
and
𝑥 ≡ 5 (mod 7).
The congruences modulo powers of 2 must contain either redundant or contradictory information, so we examine these more carefully. If 𝑥 ≡ 5 (mod 8), then we can write
𝑥 = 8𝑘 + 5 = 4(2𝑘 + 1) + 1,
for some 𝑘 ∈ ℤ, and it follows that 𝑥 ≡ 1 (mod 4). Since 𝑥 ≡ 5 (mod 8) implies 𝑥 ≡ 1
(mod 4), the latter congruence is redundant and may be eliminated from consideration. We
have therefore reduced to the system
𝑥 ≡ 5 (mod 8),
𝑥 ≡ 1 (mod 9),
𝑥 ≡ 5 (mod 7),
and here the moduli are pairwise relatively prime, so Theorem 3.17 applies. We know that
the unique solution modulo 𝑀 = 504 is given by
𝑥 = 63𝑠1 + 56𝑠2 + 72𝑠3 ,
where 𝑠1 , 𝑠2 , and 𝑠3 are integers satisfying
63𝑠1 ≡ 5 (mod 8),    56𝑠2 ≡ 1 (mod 9),    72𝑠3 ≡ 5 (mod 7),
or equivalently,
7𝑠1 ≡ 5 (mod 8),    2𝑠2 ≡ 1 (mod 9),    2𝑠3 ≡ 5 (mod 7).
We see that 𝑠1 = 3, 𝑠2 = 5, and 𝑠3 = 6 satisfy these congruences, and thus
𝑥 = 63 ⋅ 3 + 56 ⋅ 5 + 72 ⋅ 6 = 901 ≡ 397 (mod 504)
is the unique solution modulo 504.
□
Example 3.20. Find all solutions of the system
𝑥 ≡ 1 (mod 36)
and
𝑥 ≡ 3 (mod 56).
Solution. As in the previous example, the Chinese Remainder Theorem implies that the
system is equivalent to
𝑥 ≡ 1 (mod 4),
𝑥 ≡ 3 (mod 8),
𝑥 ≡ 1 (mod 9),
𝑥 ≡ 3 (mod 7).
But if 𝑥 ≡ 3 (mod 8), then we have 𝑥 = 8𝑘 + 3 = 4(2𝑘) + 3 for some integer 𝑘, which shows
that 𝑥 ≡ 3 (mod 4). Hence these two congruences are inconsistent, and we conclude that
the system has no solution.
□
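Examples 3.19 and 3.20 can also be checked by machine. The sketch below does not split the moduli into prime powers as in the text; instead it uses a different but equivalent route, applying Theorem 3.8 directly to the congruence 𝑚1 𝑘 ≡ 𝑏2 − 𝑏1 (mod 𝑚2 ) that arises from writing 𝑥 = 𝑏1 + 𝑚1 𝑘. The function name merge_congruences is our own, and Python 3.8 or later is assumed.

```python
from math import gcd

def merge_congruences(b1, m1, b2, m2):
    """Solve x ≡ b1 (mod m1), x ≡ b2 (mod m2).

    Returns (x, L) with L = lcm(m1, m2) and 0 <= x < L, or None if inconsistent.
    """
    g = gcd(m1, m2)
    if (b2 - b1) % g != 0:
        return None                                   # Theorem 3.8: no solution
    L = m1 // g * m2
    # Solve (m1/g) * k ≡ (b2 - b1)/g (mod m2/g), then set x = b1 + m1 * k.
    k = (b2 - b1) // g * pow(m1 // g, -1, m2 // g) % (m2 // g)
    return (b1 + m1 * k) % L, L

print(merge_congruences(1, 36, 5, 56))   # (397, 504)  Example 3.19
print(merge_congruences(1, 36, 3, 56))   # None        Example 3.20
```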
One way of viewing the Chinese Remainder Theorem is that it gives a bijection between
the integers 𝑥 with 0 ≤ 𝑥 < 𝑀 and the integral 𝑟-tuples (𝑏1 , . . . , 𝑏𝑟 ) with 0 ≤ 𝑏𝑖 < 𝑚𝑖 . The
correspondence is given by
𝑥 ↦ (𝑥 MOD 𝑚1 , . . . , 𝑥 MOD 𝑚𝑟 ).
The CRT is what allows us to recover 𝑥 uniquely modulo 𝑀 from the numbers 𝑏𝑖 = 𝑥 MOD
𝑚𝑖 . In fact, this yields a bijection between the 𝜙(𝑀 ) reduced residue classes modulo 𝑀 and
the 𝜙(𝑚1 ) ⋅ ⋅ ⋅ 𝜙(𝑚𝑟 ) 𝑟-tuples of reduced residue classes modulo 𝑚1 , . . . , 𝑚𝑟 . This observation
allows us to prove the following important multiplicative property of Euler’s phi function.
Theorem 3.21. If (𝑚, 𝑛) = 1, then 𝜙(𝑚𝑛) = 𝜙(𝑚)𝜙(𝑛).
Proof. By the Chinese Remainder Theorem, there is a one-to-one correspondence,
𝑥 ↦ (𝑥 MOD 𝑚, 𝑥 MOD 𝑛)
between the integers 𝑥 with 0 ≤ 𝑥 < 𝑚𝑛 and the pairs (𝑎, 𝑏) with 0 ≤ 𝑎 < 𝑚 and 0 ≤ 𝑏 < 𝑛.
Now suppose that 𝑥 is one of the 𝜙(𝑚𝑛) integers with (𝑥, 𝑚𝑛) = 1. Then one clearly
has (𝑥, 𝑚) = (𝑥, 𝑛) = 1, so Lemma 2.6 implies that (𝑥 MOD 𝑚, 𝑥 MOD 𝑛) is one of the
𝜙(𝑚)𝜙(𝑛) pairs (𝑎, 𝑏) with (𝑎, 𝑚) = (𝑏, 𝑛) = 1. On the other hand, if 𝑥 ≡ 𝑎 (mod 𝑚) and
𝑥 ≡ 𝑏 (mod 𝑛), where (𝑎, 𝑚) = (𝑏, 𝑛) = 1, then Lemma 2.6 shows that (𝑥, 𝑚) = (𝑥, 𝑛) = 1
and hence that (𝑥, 𝑚𝑛) = 1. It follows that the CRT bijection specializes to a bijection
among reduced residue classes.
□
To help visualize the correspondence used in the proof of Theorem 3.21, we illustrate it
explicitly for the case 𝑚 = 8, 𝑛 = 9. In row 𝑖, column 𝑗 we write the unique integer 𝑥 with
0 ≤ 𝑥 < 72 that satisfies 𝑥 ≡ 𝑖 (mod 8) and 𝑥 ≡ 𝑗 (mod 9). The reduced residues modulo
8, 9, and 72 are indicated by stars, and we see that 𝜙(72) = 24 = 4 ⋅ 6 = 𝜙(8)𝜙(9).
(Rows are indexed by 𝑖 modulo 8, columns by 𝑗 modulo 9.)

         0     1∗    2∗    3     4∗    5∗    6     7∗    8∗
  0      0    64    56    48    40    32    24    16     8
  1∗     9     1∗   65∗   57    49∗   41∗   33    25∗   17∗
  2     18    10     2    66    58    50    42    34    26
  3∗    27    19∗   11∗    3    67∗   59∗   51    43∗   35∗
  4     36    28    20    12     4    68    60    52    44
  5∗    45    37∗   29∗   21    13∗    5∗   69    61∗   53∗
  6     54    46    38    30    22    14     6    70    62
  7∗    63    55∗   47∗   39    31∗   23∗   15     7∗   71∗
Corollary 3.22. If 𝑛 = 𝑝1^{𝛼1} ⋅ ⋅ ⋅ 𝑝𝑘^{𝛼𝑘}, where 𝑝1 , . . . , 𝑝𝑘 are distinct primes, then
𝜙(𝑛) = (𝑝1^{𝛼1} − 𝑝1^{𝛼1−1}) ⋅ ⋅ ⋅ (𝑝𝑘^{𝛼𝑘} − 𝑝𝑘^{𝛼𝑘−1}) = 𝑛 (1 − 1/𝑝1) ⋅ ⋅ ⋅ (1 − 1/𝑝𝑘).
Proof. Applying Theorem 3.21 repeatedly gives 𝜙(𝑛) = 𝜙(𝑝1^{𝛼1}) ⋅ ⋅ ⋅ 𝜙(𝑝𝑘^{𝛼𝑘}). Now if 𝑝 is prime,
then the only positive integers less than or equal to 𝑝^𝑡 that are not relatively prime to 𝑝^𝑡 are
the multiples of 𝑝, namely 𝑝, 2𝑝, 3𝑝, . . . , 𝑝^{𝑡−1}𝑝. Since there are 𝑝^{𝑡−1} such multiples, we have
𝜙(𝑝^𝑡) = 𝑝^𝑡 − 𝑝^{𝑡−1} = 𝑝^𝑡 (1 − 1/𝑝),
and the result follows.
□
Example 3.23. Compute 𝜙(21000).
Solution. We have 21000 = 2^3 ⋅ 3 ⋅ 5^3 ⋅ 7, so Corollary 3.22 gives
𝜙(21000) = (2^3 − 2^2)(3 − 1)(5^3 − 5^2)(7 − 1) = 4 ⋅ 2 ⋅ 100 ⋅ 6 = 4800.
□
Example 3.24. Find the last two digits of 3^{2008}.
Solution. The last two digits are determined by the residue class modulo 100. Since 100 =
2^2 ⋅ 5^2, we have 𝜙(100) = (4 − 2)(25 − 5) = 40 by Corollary 3.22. Moreover, one has
2008 = 40 ⋅ 50 + 8, so Euler’s Theorem gives
3^{2008} = (3^{40})^{50} ⋅ 3^8 ≡ 3^8 ≡ 61 (mod 100).
Therefore the last two digits are 61.
□
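Corollary 3.22 and Euler’s Theorem combine into a short computation. In the Python sketch below, the factorization is found by trial division, which is adequate only for small 𝑛; the names factorize and phi are our own.

```python
def factorize(n):
    """Prime factorization of n by trial division, as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi(n):
    """Euler's phi function via Corollary 3.22."""
    result = 1
    for p, alpha in factorize(n).items():
        result *= p**alpha - p**(alpha - 1)
    return result

print(phi(21000))                            # 4800  (Example 3.23)
print(pow(3, 2008 % phi(100), 100))          # 61    (Example 3.24)
print(pow(5, 999 % phi(12), 12))             # 5     (Example 3.16)
```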
4. Public-key cryptography
We can use Euler’s Theorem to devise a scheme for public-key encryption. In such a
system, each individual creates and publishes some unique data (known as a public key)
that allows them to receive encrypted messages from other users. The system we’ll describe
was developed at MIT in 1977 by Rivest, Shamir, and Adleman and is commonly known as
RSA. To construct the code, we choose two large primes 𝑝 and 𝑞, say around 200 digits each.
We then compute 𝑛 = 𝑝𝑞 and use Corollary 3.22 to calculate 𝜙(𝑛) = (𝑝 − 1)(𝑞 − 1). Next
we choose an integer 𝑒 > 1 that is relatively prime to 𝜙(𝑛) and use the Euclidean algorithm
to find
𝑑 = 𝑒^{−1} MOD 𝜙(𝑛).
We make the pair (𝑛, 𝑒) publicly available but keep 𝑑 secret. Obviously, we keep 𝑝 and 𝑞
secret as well, since knowing them would enable one to find 𝜙(𝑛), and hence 𝑑. The security
of the system rests on the fact that it is essentially impossible to factor 𝑛 in a reasonable
amount of time with current technology.
To encrypt a message to a user whose public key is (𝑛, 𝑒), we first create a digital version of
the message, say 𝑀 , using some character-to-integer scheme such as ASCII. For simplicity,
we use the conversions
A = 01, B = 02, C = 03, D = 04, . . . , Y = 25, Z = 26,
so that each letter of the alphabet corresponds to a 2-digit integer, and we use 27 to represent
a space. If desired, we could introduce additional integers to stand for punctuation marks
and other symbols. If 𝑀 ≥ 𝑛, we break the message into blocks so that each block is smaller
than 𝑛. We then encrypt the message by computing
𝐸 = 𝑀^𝑒 MOD 𝑛.
The recipient then decrypts the message by computing 𝐸^𝑑 MOD 𝑛, using Euler’s Theorem
and the fact that 𝑑𝑒 = 1 + 𝑘𝜙(𝑛) for some integer 𝑘. One has
𝐸^𝑑 ≡ (𝑀^𝑒)^𝑑 ≡ 𝑀^{𝑑𝑒} ≡ 𝑀^{1+𝑘𝜙(𝑛)} ≡ (𝑀^{𝜙(𝑛)})^𝑘 ⋅ 𝑀 ≡ 𝑀 (mod 𝑛),
and thus 𝐸^𝑑 MOD 𝑛 = 𝑀. In applying Euler’s Theorem, we implicitly assumed that
(𝑀, 𝑛) = 1, but the probability that this fails is negligible when 𝑛 is composed of two 200-digit primes. The point of RSA is that, given 𝑛 and 𝑒, no practical method is known for computing the decryption
key 𝑑 without knowing 𝜙(𝑛), which is equivalent to knowing the factorization 𝑛 = 𝑝𝑞.
Example 4.1. Decode the encrypted message 0828, which was generated using RSA with
public key (4897, 19).
Solution. Here the integer 𝑛 = 4897 is far too small to create a secure cryptosystem, and
after some trial division we easily obtain the factorization 𝑛 = 𝑝𝑞, where 𝑝 = 59 and 𝑞 = 83.
It now follows from Corollary 3.22 that 𝜙(𝑛) = 58 ⋅ 82 = 4756. Next we calculate the
decryption key 𝑑 = 19^{−1} MOD 4756 using the Euclidean Algorithm:
  [ 1   0  4756 ]     [ 1  −250   6 ]     [  1  −250  6 ]
  [ 0   1    19 ]  →  [ 0     1  19 ]  →  [ −3   751  1 ].
Hence we have 𝑑 = 751, so we can decrypt the message by computing 𝑀 = 828^{751} MOD
4897. This computation is doable on a calculator by successive squaring. We write the
exponent 751 in binary as
1011101111_2 = 512 + 128 + 64 + 32 + 8 + 4 + 2 + 1
and square 𝑎 = 828 repeatedly modulo 4897. This gives 𝑎^2 = 4, 𝑎^4 = 16, 𝑎^8 = 256,
𝑎^{16} = 1875, 𝑎^{32} = −421, 𝑎^{64} = 949, 𝑎^{128} = −447, 𝑎^{256} = −968, and 𝑎^{512} = 1697, and it
follows that
𝑀 ≡ 𝑎^{512} 𝑎^{128} 𝑎^{64} 𝑎^{32} 𝑎^8 𝑎^4 𝑎^2 𝑎 ≡ 2515 (mod 4897),
so the message was “YO”.
□
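The whole of Example 4.1 can be replayed in Python. The sketch below implements successive squaring by hand in power_mod (a name of our own; Python’s built-in three-argument pow does the same job) and decodes the result with the letter scheme A = 01, . . . , Z = 26, 27 = space. Python 3.8 or later is assumed for pow(e, -1, m).

```python
def power_mod(a, k, n):
    """Compute a**k MOD n by successive squaring, scanning the binary digits of k."""
    result = 1
    square = a % n                    # holds a**(2**i) MOD n at step i
    while k > 0:
        if k & 1:                     # this power of 2 appears in the exponent
            result = result * square % n
        square = square * square % n
        k >>= 1
    return result

def decode(number, width):
    """Convert a block of 2-digit codes (zero-padded to `width` digits) into text."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
    digits = str(number).zfill(width)
    return "".join(letters[int(digits[i:i + 2]) - 1] for i in range(0, width, 2))

n, e = 4897, 19                       # public key from Example 4.1
p, q = 59, 83                         # found by trial division
d = pow(e, -1, (p - 1) * (q - 1))     # decryption key
M = power_mod(828, d, n)

print(d)                # 751
print(M)                # 2515
print(decode(M, 4))     # YO
```

Re-encrypting, power_mod(2515, 19, 4897) returns 828, the ciphertext we started from.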
The method of successive squaring used in the above example gives a good way of performing fast modular exponentiation, which has been implemented in many software packages.
For instance, Mathematica has the function PowerMod[a,b,n], which quickly computes 𝑎^𝑏
MOD 𝑛. A useful tool for finding modular inverses is the function ExtendedGCD[a,b], which
returns gcd(𝑎, 𝑏), together with integers 𝑠 and 𝑡 satisfying gcd(𝑎, 𝑏) = 𝑎𝑠 + 𝑏𝑡. Mathematica
has built-in arbitrary precision, so it is great for handling long integers without the fear of
truncation. Programs that only store, say, the first 16 digits of a number give more than sufficient accuracy for many applications, but losing even a single digit of an integer is obviously
devastating for number theory.
Digital Signatures. We can also apply the RSA encryption principle to authenticate
digital signatures. If your public key is (𝑛, 𝑒) and you send me your signature 𝑆 in the form
𝐷 = 𝑆^𝑑 MOD 𝑛,
where 𝑑 is your personal decryption key, then I can recover 𝑆 by computing
𝐷^𝑒 MOD 𝑛 = 𝑆^{𝑑𝑒} MOD 𝑛 = 𝑆^{1+𝑘𝜙(𝑛)} MOD 𝑛 = 𝑆.
Moreover, I know that the signature is authentic since you’re the only one who knows 𝑑.
If the signature had been encrypted using an incorrect 𝑑 then I would most likely obtain
gibberish when attempting to recover 𝑆.
Example 4.2. You receive the digital signature 20496 from a user with initials S. P. and
public key (21311, 41). Does it appear to be authentic?
Solution. We compute 20496^{41} MOD 21311 using Mathematica and get
PowerMod[20496, 41, 21311] = 1916.
Since S=19 and P=16, the message must have come from S. P., or at least someone with
access to his decryption key. Of course, the numbers are once again so small that anyone
could have figured out the decryption key and sent a phony signature.
□
The digital signature process described above presumes that the signer is not concerned
about the possibility of his or her signature being viewed by a third party. The goal is
simply to provide a method for the recipient to verify the signer’s identity. One can transmit
sensitive information and verify the identity of the sender by nesting an encryption within
the digital signature process. Suppose that Alice, whose public key is (𝑛𝐴 , 𝑒𝐴 ), wants to
send a message 𝑀 to Bob, whose public key is (𝑛𝐵 , 𝑒𝐵 ), and Bob wants to be sure that the
message is really coming from Alice. Alice first “signs” the message using her own decryption
key, 𝑑𝐴 , and then encrypts it using Bob’s public key. Thus she computes 𝑆 = 𝑀^{𝑑𝐴} MOD 𝑛𝐴
and sends Bob 𝐸 = 𝑆^{𝑒𝐵} MOD 𝑛𝐵 . When Bob receives 𝐸, he uses his decryption key 𝑑𝐵 to
compute 𝑆 = 𝐸^{𝑑𝐵} MOD 𝑛𝐵 and then recovers the message as 𝑀 = 𝑆^{𝑒𝐴} MOD 𝑛𝐴 . At this
point, he knows the contents of the message and can be sure that it was sent by Alice and
not someone pretending to be Alice.
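Both the signature check of Example 4.2 and the nested sign-then-encrypt procedure can be tried out with the toy keys from the examples. In the Python sketch below, Alice is given the key of Example 4.1 and Bob the key of Example 4.2; the factorization 21311 = 101 ⋅ 211 is our own computation, the dictionary layout and function names are hypothetical, and the sketch relies on 𝑛𝐴 < 𝑛𝐵 so that the signed value fits in a single block for Bob’s modulus.

```python
# Toy keys: (n, e) is public, d is private.
alice = {"n": 4897, "e": 19, "d": 751}                         # 4897 = 59 * 83
bob = {"n": 21311, "e": 41, "d": pow(41, -1, 100 * 210)}       # 21311 = 101 * 211

def apply_private(x, key):
    """Sign or decrypt: raise x to the private exponent d."""
    return pow(x, key["d"], key["n"])

def apply_public(x, key):
    """Verify or encrypt: raise x to the public exponent e."""
    return pow(x, key["e"], key["n"])

# Example 4.2: recover the signature S from D = 20496 with the signer's public key.
print(apply_public(20496, bob))                 # 1916, i.e. S = 19, P = 16

# Alice signs M with her own key, then encrypts the result for Bob.
M = 2515
E = apply_public(apply_private(M, alice), bob)
# Bob decrypts with his key, then removes Alice's signature with her public key.
recovered = apply_public(apply_private(E, bob), alice)
print(recovered)                                # 2515
```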
Primality testing. One issue in implementing RSA is that we need to find large integers
that are known to be prime. Fortunately, there are alternatives to trial division for investigating primality. As mentioned in §3, Fermat’s Little Theorem can be used to show that
an integer is not prime. For example, if 𝑝 is an odd prime, then the theorem tells us that
2^{𝑝−1} ≡ 1 (mod 𝑝). The converse of this statement is false; that is, there exist odd composite
integers 𝑛 with the property that 2^{𝑛−1} ≡ 1 (mod 𝑛). However, it turns out that such integers are fairly rare, so there is a good chance that an integer 𝑛 satisfying this congruence
will in fact be prime. An odd composite integer 𝑛 satisfying this congruence is called a
pseudoprime. The only pseudoprimes less than 1000 are
341 = 11 ⋅ 31,    561 = 3 ⋅ 11 ⋅ 17,    and    645 = 3 ⋅ 5 ⋅ 43.
More generally, if 𝑛 is an odd composite integer with (𝑏, 𝑛) = 1 and 𝑏^{𝑛−1} ≡ 1 (mod 𝑛), then
we say that 𝑛 is a pseudoprime for the base 𝑏.
If we want to know whether 𝑛 is prime, we could first test for divisibility by 2, 3, 5, and 7
and then compute 2^{𝑛−1} MOD 𝑛, 3^{𝑛−1} MOD 𝑛, 5^{𝑛−1} MOD 𝑛, and 7^{𝑛−1} MOD 𝑛, for instance.
If any of these is not equal to 1, then Fermat’s Theorem implies that 𝑛 is not prime. If they
are all equal to 1, then 𝑛 is very likely (but not certain) to be prime. Interestingly, and
perhaps unfortunately, there are odd composite integers 𝑛 that are pseudoprimes for every
base 𝑏 with (𝑏, 𝑛) = 1. Such numbers are called Carmichael numbers, and the smallest one
is 561. Carmichael numbers are very sparse (561 is the only one less than 1000), but it was
proved in 1994 by Alford, Granville, and Pomerance that there are infinitely many! In fact,
they showed that there are more than 𝑥^{2/7} Carmichael numbers not exceeding 𝑥, provided that 𝑥 is sufficiently large.
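The statements about pseudoprimes and Carmichael numbers below 1000 are easy to confirm by direct search. The Python sketch below uses trial division for the primality check, which is fine in this small range; all function names are our own.

```python
from math import gcd

def is_prime(n):
    """Primality by trial division; adequate for the small range used here."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

def passes_fermat_test(n, b):
    """True if b**(n-1) ≡ 1 (mod n)."""
    return pow(b, n - 1, n) == 1

def is_carmichael(n):
    """Odd composite n that is a pseudoprime for every base b with (b, n) = 1."""
    return (n % 2 == 1 and not is_prime(n)
            and all(passes_fermat_test(n, b) for b in range(2, n) if gcd(b, n) == 1))

odd_composites = [n for n in range(3, 1000, 2) if not is_prime(n)]
print([n for n in odd_composites if passes_fermat_test(n, 2)])   # [341, 561, 645]
print([n for n in odd_composites if is_carmichael(n)])           # [561]
print(passes_fermat_test(341, 3))                                # False
```

The last line shows that 341, although a pseudoprime for the base 2, is exposed as composite by the base 3.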
The concept of pseudoprimes can be strengthened by making the following simple observation. Suppose that 𝑏^{𝑛−1} ≡ 1 (mod 𝑛). If 𝑛 is odd, we can write 𝑛 = 2𝑚 + 1 for some
integer 𝑚, and we see that 𝑛 divides 𝑏^{2𝑚} − 1 = (𝑏^𝑚 − 1)(𝑏^𝑚 + 1). If 𝑛 is prime, it now follows
from Euclid’s Lemma that 𝑛 divides 𝑏^𝑚 − 1 or 𝑏^𝑚 + 1. Thus if 𝑏^𝑚 ≢ ±1 (mod 𝑛) then we
can conclude that 𝑛 is not prime. On the other hand, if 𝑏^𝑚 ≡ 1 (mod 𝑛) and 𝑚 is even, then
we can apply the same reasoning within the factorization 𝑏^𝑚 − 1 = (𝑏^{𝑚/2} − 1)(𝑏^{𝑚/2} + 1).
Example 4.3. Show how to deduce that 341 is not prime without using its prime factorization.
Solution. One easily computes that 2^{340} ≡ 1 (mod 341), so 341 divides
2^{340} − 1 = (2^{170} − 1)(2^{170} + 1) = (2^{85} − 1)(2^{85} + 1)(2^{170} + 1).
We further compute that 2^{170} ≡ 1 (mod 341) and 2^{85} ≡ 32 (mod 341). But if 341 were prime
then it would have to divide 2^{85} − 1, 2^{85} + 1, or 2^{170} + 1 by Euclid’s Lemma, which would
mean that 2^{85} ≡ ±1 (mod 341) or 2^{170} ≡ −1 (mod 341). Since none of these conclusions
holds, we may conclude that 341 is not prime.
□
In general, if 𝑛 is an odd integer exceeding 1, we can write 𝑛 = 2^𝑎 𝑡 + 1, where 𝑡 is odd and
𝑎 ≥ 1. Then one has the factorization
𝑏^{𝑛−1} − 1 = (𝑏^𝑡 − 1)(𝑏^𝑡 + 1)(𝑏^{2𝑡} + 1)(𝑏^{4𝑡} + 1) ⋅ ⋅ ⋅ (𝑏^{2^{𝑎−1}𝑡} + 1).    (4.1)
In Example 4.3 we had 𝑎 = 2 and 𝑡 = 85. An odd composite integer 𝑛 = 2^𝑎 𝑡 + 1 is called
a strong pseudoprime for the base 𝑏 if (𝑏, 𝑛) = 1 and 𝑛 divides one of the factors on the
right-hand side of (4.1). Any integer (prime or composite) with this property is said to have
passed the strong pseudoprime test. Strong pseudoprimes are considerably more scarce than
ordinary pseudoprimes. For the base 𝑏 = 2, for example, there are 5597 pseudoprimes up to
10^9 but only 1282 strong pseudoprimes, the smallest of which is 2047.
Example 4.4. Show that 2047 is a strong pseudoprime for the base 2.
Solution. We observe that 2046 is not divisible by 4, so we have 𝑎 = 1 and 𝑡 = 1023 in the
above notation. Moreover, Mathematica shows that 2^{1023} ≡ 1 (mod 2047), so 2047 divides
2^{1023} − 1 and hence passes the strong pseudoprime test. Finally, we note that 2047 = 23 ⋅ 89
is composite and is therefore in fact a strong pseudoprime.
□
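The strong pseudoprime test described above can be sketched in a few lines of Python. This is only an illustration under our own naming (the function passes_strong_test is hypothetical, not part of any package): it writes 𝑛 − 1 = 2^𝑎 𝑡 and checks whether 𝑛 divides one of the factors on the right-hand side of (4.1).

```python
def passes_strong_test(n: int, b: int) -> bool:
    """Return True if the odd integer n > 2 divides one of the factors in (4.1)."""
    # Write n - 1 = 2^a * t with t odd.
    a, t = 0, n - 1
    while t % 2 == 0:
        a += 1
        t //= 2
    x = pow(b, t, n)              # b^t mod n
    if x == 1 or x == n - 1:      # n divides b^t - 1 or b^t + 1
        return True
    for _ in range(a - 1):        # factors b^(2^j t) + 1 for 1 <= j <= a - 1
        x = (x * x) % n
        if x == n - 1:
            return True
    return False


print(passes_strong_test(341, 2))    # Example 4.3: 341 fails for base 2
print(passes_strong_test(2047, 2))   # Example 4.4: 2047 passes for base 2
```

A composite 𝑛 for which the function returns True is a strong pseudoprime for the base 𝑏; a return value of False proves that 𝑛 is composite.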
The results are more striking if we apply the strong pseudoprime test for several different
bases. There is only one integer less than 2.5 × 10^{10}, namely
3,215,031,751 = 151 × 751 × 28351,
that is a strong pseudoprime for bases 2, 3, 5, and 7. Moreover, there is no “strong” analogue
of the Carmichael numbers. That is, every composite number 𝑛 fails the strong pseudoprime
test for some base 𝑏 with (𝑏, 𝑛) = 1. Such a 𝑏 is called a witness to the compositeness of 𝑛.
In fact, it can be shown that at least half of the bases 𝑏 ≤ 𝑛 with (𝑏, 𝑛) = 1 are witnesses
when 𝑛 is composite, so this procedure can be used to identify primes with near certainty.
In 2002, Agrawal, Kayal, and Saxena announced an algorithm (published in 2004) that proves the primality or
compositeness of 𝑛 with a running time that is polynomial in log 𝑛. For comparison, trial
division requires 𝑂(√𝑛) steps to prove that an integer is prime. Many software packages
have built-in functions that implement various primality tests. In Mathematica, for example,
PrimeQ[n] returns true or false according to whether or not 𝑛 is prime, while Prime[k]
returns the 𝑘th prime number.
Factorization algorithms. Attacks on RSA could be made if efficient factoring algorithms were known. As with primality testing, there are algorithms that are far more
efficient than trial division, but no current algorithm comes close to breaking RSA with
200-digit primes, except in very special cases that can easily be avoided. We briefly explore
some of the ideas involved in these factorization techniques.
Fermat’s factoring method was to try to express 𝑛 as the difference of two squares. If we
can find positive integers 𝑥 and 𝑦 such that
𝑛 = 𝑥^2 − 𝑦^2 = (𝑥 − 𝑦)(𝑥 + 𝑦),
then we’ve found a factorization of 𝑛, provided that 𝑥 − 𝑦 ≠ 1. Kraitchik realized that one
could apply the spirit of Fermat’s idea more efficiently by instead looking for integers 𝑥 and
𝑦 satisfying the weaker condition
𝑥^2 ≡ 𝑦^2 (mod 𝑛),
so that 𝑛 divides (𝑥 − 𝑦)(𝑥 + 𝑦). This no longer ensures a factorization of 𝑛, but there is a
reasonable chance that both 𝑥 − 𝑦 and 𝑥 + 𝑦 contain some of the prime factors of 𝑛. For
example, if 𝑛 is the product of two distinct primes 𝑝 and 𝑞, one would expect roughly a 50%
chance that 𝑝 and 𝑞 split among the two factors 𝑥−𝑦 and 𝑥+𝑦. In this case gcd(𝑥−𝑦, 𝑛) will
be a non-trivial factor of 𝑛, and this can be computed efficiently via the Euclidean algorithm.
If both 𝑝 and 𝑞 divide the same factor, then one can simply try different values for 𝑥 and
𝑦. Powerful recent factoring methods like the quadratic sieve are based on finding suitable
integers 𝑥 and 𝑦 to carry out this principle.
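To make the difference-of-squares idea concrete, here is a minimal Python sketch. The function names and the sample inputs 5959 and 91 are our own illustrative choices, not taken from the notes. The first function searches for 𝑥 with 𝑥^2 − 𝑛 a perfect square (Fermat); the second extracts a divisor from a congruence 𝑥^2 ≡ 𝑦^2 (mod 𝑛) in the manner of Kraitchik.

```python
from math import gcd, isqrt


def fermat_factor(n):
    """Fermat's method for odd n > 1: find x with x^2 - n = y^2."""
    x = isqrt(n)
    if x * x < n:
        x += 1                       # start at the ceiling of sqrt(n)
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:              # x^2 - n is a perfect square
            return x - y, x + y      # (1, n) is returned when n is prime
        x += 1


def divisor_from_congruence(x, y, n):
    """Given x^2 = y^2 (mod n), return gcd(x - y, n), which may be trivial."""
    assert (x * x - y * y) % n == 0
    return gcd(x - y, n)


print(fermat_factor(5959))                 # 80^2 - 5959 = 21^2
print(divisor_from_congruence(10, 3, 91))  # 10^2 = 100 = 9 = 3^2 (mod 91)
```

Fermat's search is fast only when 𝑛 has two factors close to √𝑛; the congruence version leaves open how suitable 𝑥 and 𝑦 are found, which is the hard part of methods such as the quadratic sieve.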
Pollard’s so-called “rho” method is based on generating a quasi-random sequence of numbers that are distinct modulo the integer 𝑛 to be factored but not distinct modulo its smallest
prime divisor 𝑝. Suppose we generate “random” integers 𝑥_1, . . . , 𝑥_𝑘, where 𝑘 is large by comparison with √𝑝 but small by comparison with 𝑛. For example, we could take 𝑘 ≈ 10𝑛^{1/4}.
Then the probability that the 𝑥_𝑖 are distinct modulo 𝑝 is very small, so gcd(𝑥_𝑖 − 𝑥_𝑗, 𝑛)
will most likely produce the factor 𝑝 for some 𝑖 and 𝑗. This leads to a factorization of 𝑛
with expected running time 𝑂(𝑛^{1/4}). When the method works, the numbers 𝑥_𝑖 MOD 𝑝 are
eventually periodic and can thus be written in a shape resembling the Greek letter 𝜌.
Example 4.5. Use Pollard’s rho method to factor the integer 𝑛 = 36287.
Solution. First note that 2^{36286} ≡ 35799 ≢ 1 (mod 36287), so Fermat’s Little Theorem
implies that 𝑛 is composite. We construct our quasi-random sequence of integers recursively
by taking 𝑥_0 = 1 and 𝑥_{𝑖+1} = (𝑥_𝑖^2 + 1) MOD 𝑛. The first few terms of the sequence are
1, 2, 5, 26, 677, 22886, 2439, 33941, 24380, 3341, 22173, 25654, 26685.
In particular, one has 𝑥_5 = 22886 and 𝑥_{12} = 26685, which gives gcd(𝑥_{12} − 𝑥_5, 𝑛) = 131. Thus
we obtain the factorization 36287 = 131 ⋅ 277.
□
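The computation in Example 4.5 can be reproduced with a short Python sketch. The function names are our own, and the pairwise search below is the naive version of the idea; practical implementations usually compare 𝑥_𝑖 with 𝑥_{2𝑖} (Floyd's cycle-finding) to avoid storing the sequence, which we do not attempt here.

```python
from math import gcd


def rho_sequence(n, length, x0=1):
    """First `length` terms of x_{i+1} = (x_i^2 + 1) MOD n, starting from x_0."""
    xs = [x0]
    while len(xs) < length:
        xs.append((xs[-1] ** 2 + 1) % n)
    return xs


def rho_factor(n, max_terms=200):
    """Return the first nontrivial gcd(x_j - x_i, n) found, or None."""
    xs = [1]
    for j in range(1, max_terms):
        xs.append((xs[-1] ** 2 + 1) % n)
        for i in range(j):
            g = gcd(abs(xs[j] - xs[i]), n)
            if 1 < g < n:
                return g
    return None


xs = rho_sequence(36287, 13)
print(xs)
print(gcd(xs[12] - xs[5], 36287))   # the pair used in Example 4.5
```

The pair (𝑥_5, 𝑥_{12}) is the one singled out in the example; the search in rho_factor may stop at an earlier pair, but any value it returns is a nontrivial divisor of 𝑛.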
Suppose that 𝑛 is a large composite integer with no small prime factors but that all the
prime factors of 𝑝 − 1 are small for some prime 𝑝∣𝑛. For example, suppose that 𝑝 − 1 divides
10000!. Then by Fermat’s Little Theorem one has 2^{10000!} ≡ 1 (mod 𝑝), and thus 𝑝 divides
gcd(2^{10000!} − 1, 𝑛). Thus we can attempt to find 𝑝 by computing gcd(2^{𝑖!} − 1, 𝑛) for various
values of 𝑖. This is known as Pollard’s 𝑝 − 1 method, and it can be applied with bases other
than 2 as well.
Example 4.6. Use Pollard’s 𝑝 − 1 method to factor the integer 𝑛 = 69841.
Solution. We have 2^{𝑛−1} ≡ 37073 ≢ 1 (mod 𝑛), so 𝑛 is composite. With 𝑖 = 5 in the above
notation, we obtain gcd(2^{120} − 1, 69841) = 331, which gives us a nontrivial divisor of 𝑛. It
is now easily checked that 𝑛 = 211 ⋅ 331 is the desired prime factorization. Note that the
method was effective here because 𝑝 − 1 = 330 = 2 ⋅ 3 ⋅ 5 ⋅ 11 divides 11!. One would
typically expect to test up to 𝑖 = 11 before finding 𝑝, but we happened to find it sooner. □
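A minimal Python sketch of the 𝑝 − 1 method follows. The function name and the cutoff max_i are our own assumptions. The key point is that 2^{𝑖!} MOD 𝑛 can be built up step by step, since 2^{𝑖!} = (2^{(𝑖−1)!})^𝑖.

```python
from math import gcd


def pollard_p_minus_1(n, base=2, max_i=50):
    """Return (divisor, i) for the first i with 1 < gcd(base^(i!) - 1, n) < n."""
    a = base
    for i in range(2, max_i + 1):
        a = pow(a, i, n)          # a is now base^(i!) MOD n
        g = gcd(a - 1, n)
        if 1 < g < n:
            return g, i
    return None                   # no divisor found up to max_i


print(pollard_p_minus_1(69841))   # Example 4.6
```

If the gcd jumps from 1 straight to 𝑛, the method fails for that base, and one would try a different base as mentioned above.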
One important consequence of the Pollard 𝑝 − 1 method is that RSA can be broken if the
primes 𝑝 and 𝑞 are chosen in such a way that 𝑝 − 1 or 𝑞 − 1 has only small prime factors.
Therefore, one must be careful to avoid this situation when constructing a public key. We
also mention that neither of Pollard’s algorithms will prove primality if they are applied to
prime integers. Hence they should only be used on integers that are known to be composite,
for example by failing a pseudoprime test. The Mathematica function FactorInteger[n]
implements a variety of advanced algorithms to attempt to determine the prime factorization
of 𝑛, but it typically becomes extremely slow when the smallest prime factor of 𝑛 is large.
5. Primitive roots
When (𝑎, 𝑛) = 1, we know from Euler’s Theorem that 𝑎^{𝜙(𝑛)} ≡ 1 (mod 𝑛). However, there
may be smaller powers of 𝑎 that are congruent to 1 modulo 𝑛. We define the order of 𝑎
modulo 𝑛 (or the order of 𝑎 in ℤ∗𝑛 ) to be the smallest positive integer 𝑑 such that
𝑎^𝑑 ≡ 1 (mod 𝑛).
For example, the elements 3, 5, and 7 all have order 2 in ℤ∗8 . The elements 2 and 3 have
orders 3 and 6, respectively, in ℤ∗7 . In view of Euler’s Theorem, the order of an element in
ℤ∗𝑛 is at most 𝜙(𝑛).
Note that if 𝑎^𝑑 ≡ 1 (mod 𝑛) for some positive integer 𝑑, then we can write 𝑎^{𝑑−1} ⋅ 𝑎 + 𝑘𝑛 = 1
for some 𝑘 ∈ ℤ, so Corollary 2.5 implies that (𝑎, 𝑛) = 1. Thus if (𝑎, 𝑛) > 1 then there is no
positive power of 𝑎 that is 1 modulo 𝑛. Hence order is not defined for the elements of ℤ𝑛
that are not relatively prime to 𝑛.
Theorem 5.1. If 𝑎 has order 𝑑 in ℤ∗𝑛 and 𝑚 is a positive integer with 𝑎^𝑚 ≡ 1 (mod 𝑛),
then 𝑑 divides 𝑚.
Proof. We use division with remainder (Theorem 2.3) to write 𝑚 = 𝑞𝑑 + 𝑟, where 𝑞 and 𝑟
are integers with 0 ≤ 𝑟 < 𝑑. Then we have
1 ≡ 𝑎^𝑚 ≡ 𝑎^{𝑞𝑑+𝑟} ≡ (𝑎^𝑑)^𝑞 ⋅ 𝑎^𝑟 ≡ 𝑎^𝑟 (mod 𝑛),
so the minimality of 𝑑 implies that 𝑟 = 0, and hence 𝑚 = 𝑞𝑑, as required.
□
Corollary 5.2. The order of every element of ℤ∗𝑛 divides 𝜙(𝑛).
Proof. In view of Euler’s Theorem, this follows from Theorem 5.1 with 𝑚 = 𝜙(𝑛).
□
For example, it is easy to check directly that each element of ℤ∗7 has order 1, 2, 3, or 6
and that each element of ℤ∗8 has order 1 or 2. If the order of 𝑎 modulo 𝑛 happens to be 𝜙(𝑛)
then we say that 𝑎 is a primitive root modulo 𝑛.
Example 5.3. Determine the primitive roots modulo 7 and modulo 8.
Solution. The elements 3 and 5 are primitive roots modulo 7 because they both have order
6 = 𝜙(7). It is easily checked that all other elements of ℤ∗7 have order less than 6, so there
are no other primitive roots. Finally, there are no primitive roots modulo 8 because there
are no elements of order 4 = 𝜙(8).
□
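The definitions of order and primitive root translate directly into a brute-force Python sketch. The function names are ours, and the approach is only sensible for small moduli.

```python
from math import gcd


def phi(n):
    """Euler's phi function by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def order(a, n):
    """Smallest positive d with a^d = 1 (mod n); requires gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError("order is only defined when gcd(a, n) = 1")
    d, power = 1, a % n
    while power != 1 % n:
        power = (power * a) % n
        d += 1
    return d


def primitive_roots(n):
    """All a in 1..n-1 whose order modulo n equals phi(n)."""
    target = phi(n)
    return [a for a in range(1, n) if gcd(a, n) == 1 and order(a, n) == target]


print(primitive_roots(7))   # Example 5.3
print(primitive_roots(8))   # no primitive roots modulo 8
```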
A primitive root is sometimes called a generator because computing successive powers
of it generates the whole of ℤ∗𝑛 . For example, 3 is a generator for ℤ∗7 because
3^1 = 3, 3^2 = 2, 3^3 = 6, 3^4 = 4, 3^5 = 5, and 3^6 = 1.
For this reason, we sometimes use the letter 𝑔 to denote a primitive root. In algebraic
terms, the existence of a primitive root modulo 𝑛 means that ℤ∗𝑛 is a cyclic group under
multiplication. Example 5.3 shows that ℤ∗7 is cyclic but that ℤ∗8 is not. The following
theorem shows that primitive roots are generators.
Theorem 5.4. If 𝑔 is a primitive root modulo 𝑛 and (𝑟, 𝑛) = 1, then we have 𝑟 ≡ 𝑔^𝑖 (mod 𝑛)
for some integer 𝑖 with 1 ≤ 𝑖 ≤ 𝜙(𝑛).
Proof. Consider the 𝜙(𝑛) integers 𝑔, 𝑔^2, 𝑔^3, . . . , 𝑔^{𝜙(𝑛)}. If 𝑔^𝑖 ≡ 𝑔^𝑗 (mod 𝑛) for some 𝑖 and 𝑗
with 1 ≤ 𝑖 < 𝑗 ≤ 𝜙(𝑛), then we would have 𝑔^{𝑗−𝑖} ≡ 1 (mod 𝑛), which is impossible since
𝑔 has order 𝜙(𝑛) and 0 < 𝑗 − 𝑖 < 𝜙(𝑛). Therefore, the integers 𝑔, 𝑔^2, 𝑔^3, . . . , 𝑔^{𝜙(𝑛)} all lie in
distinct residue classes modulo 𝑛. Since each 𝑔^𝑖 is also relatively prime to 𝑛, we deduce that
the set {𝑔, 𝑔^2, 𝑔^3, . . . , 𝑔^{𝜙(𝑛)}} forms a reduced residue system modulo 𝑛. Hence there is some
exponent 𝑖 for which 𝑟 ≡ 𝑔^𝑖 (mod 𝑛).
□
Theorem 5.5. If 𝑎 has order 𝑑 modulo 𝑛, then 𝑎^𝑖 has order 𝑑/(𝑑, 𝑖) modulo 𝑛.
Proof. Let 𝑒 denote the order of 𝑎^𝑖 modulo 𝑛. First of all, we have
(𝑎^𝑖)^{𝑑/(𝑑,𝑖)} ≡ (𝑎^𝑑)^{𝑖/(𝑑,𝑖)} ≡ 1 (mod 𝑛),
so Theorem 5.1 implies that 𝑒 divides 𝑑/(𝑑, 𝑖). Moreover, we have
𝑎^{𝑒𝑖} ≡ (𝑎^𝑖)^𝑒 ≡ 1 (mod 𝑛),
so Theorem 5.1 further implies that 𝑑 divides 𝑒𝑖, and hence that 𝑑/(𝑑, 𝑖) divides 𝑒𝑖/(𝑑, 𝑖).
Since 𝑑/(𝑑, 𝑖) and 𝑖/(𝑑, 𝑖) are relatively prime, it follows from a homework exercise that
𝑑/(𝑑, 𝑖) divides 𝑒. Since 𝑒 divides 𝑑/(𝑑, 𝑖) and 𝑑/(𝑑, 𝑖) divides 𝑒, and both quantities are
positive, we may conclude that 𝑒 = 𝑑/(𝑑, 𝑖), as desired.
□
Corollary 5.6. If ℤ∗𝑛 contains a primitive root, then the total number of primitive roots in
ℤ∗𝑛 is 𝜙(𝜙(𝑛)). In other words, if ℤ∗𝑛 is cyclic, then it has 𝜙(𝜙(𝑛)) generators.
Proof. If 𝑔 is a primitive root modulo 𝑛, then Theorem 5.5 shows that 𝑔^𝑖 is a primitive root
if and only if (𝜙(𝑛), 𝑖) = 1. Hence there are 𝜙(𝜙(𝑛)) choices for 𝑖. By Theorem 5.4, all
elements of ℤ∗𝑛 can be expressed as 𝑔^𝑖 for some 𝑖, so this completes the proof.
□
The following theorem, due to Gauss, completely characterizes the integers 𝑛 for which
ℤ∗𝑛 has a primitive root.
Theorem 5.7. There exists a primitive root modulo 𝑛 if and only if 𝑛 = 1, 2, 4, 𝑝^𝑘, or 2𝑝^𝑘,
where 𝑝 is an odd prime and 𝑘 is a positive integer.
Example 5.8. What can you say about the existence of primitive roots modulo 𝑛 when
9 ≤ 𝑛 ≤ 20? How many primitive roots are there modulo 18 and 19?
Solution. In view of Theorem 5.7, there are primitive roots modulo 9, 10, 11, 13, 14, 17, 18,
and 19, while there are no primitive roots modulo 12, 15, 16, or 20. By Corollary 5.6, the
number of primitive roots modulo 18 is 𝜙(𝜙(18)) = 𝜙(6) = 2, and the number of primitive
roots modulo 19 is 𝜙(𝜙(19)) = 𝜙(18) = 6.
□
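The answers in Example 5.8 can be confirmed by brute force. The following Python sketch (with hypothetical helper names of our own) lists the moduli between 9 and 20 that have a primitive root and compares the number of primitive roots with 𝜙(𝜙(𝑛)).

```python
from math import gcd


def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def order(a, n):
    d, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        d += 1
    return d


def count_primitive_roots(n):
    """Number of a in 1..n-1 with gcd(a, n) = 1 and order phi(n); needs n > 1."""
    target = phi(n)
    return sum(1 for a in range(1, n) if gcd(a, n) == 1 and order(a, n) == target)


with_roots = [n for n in range(9, 21) if count_primitive_roots(n) > 0]
print(with_roots)
print(count_primitive_roots(18), phi(phi(18)))
print(count_primitive_roots(19), phi(phi(19)))
```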
The full proof of Theorem 5.7 is somewhat time-consuming, although it is accessible with
elementary techniques. Rather than giving the complete argument, which involves a number
of separate cases, we will be content to prove the existence of primitive roots for prime
moduli. Before doing this, we need some auxiliary results. The following theorem, due to
Lagrange, concerns solutions of polynomial congruences modulo a prime.
Theorem 5.9. Let 𝑓 (𝑥) be a polynomial of degree 𝑑 with integer coefficients, and let 𝑝 be a
prime not dividing the leading coefficient of 𝑓 (𝑥). Then the congruence 𝑓 (𝑥) ≡ 0 (mod 𝑝)
has at most 𝑑 distinct solutions modulo 𝑝.
Proof. We proceed by induction on 𝑑. When 𝑑 = 0, the polynomial 𝑓 (𝑥) is a constant not
divisible by 𝑝, so the congruence has no solutions. Now suppose that 𝑑 > 0 and that the
result holds for all polynomials of degree less than 𝑑. Let 𝑓 (𝑥) be a polynomial of degree
𝑑, with 𝑝 not dividing the leading coefficient, and suppose that 𝑓 (𝑎) ≡ 0 (mod 𝑝). Using
division with remainder for polynomials, we can write
𝑓 (𝑥) = 𝑞(𝑥)(𝑥 − 𝑎) + 𝑟,
where 𝑞(𝑥) is a polynomial of degree 𝑑 − 1, and where 𝑟 is an integer. (Since 𝑥 − 𝑎 has degree
one, the remainder has degree zero.) Moreover, 𝑝 does not divide the leading coefficient of
𝑞(𝑥), since this is the same as the leading coefficient of 𝑓 (𝑥). We have
𝑟 = 𝑓 (𝑎) ≡ 0 (mod 𝑝),
which means that 𝑝∣𝑟, and thus for any integer 𝑥 we have
𝑓 (𝑥) ≡ 𝑞(𝑥)(𝑥 − 𝑎) (mod 𝑝).
Now if 𝑓 (𝑏) ≡ 0 (mod 𝑝), then 𝑝 divides 𝑞(𝑏)(𝑏 − 𝑎), so Euclid’s Lemma implies that 𝑝
divides 𝑞(𝑏) or 𝑝 divides 𝑏 − 𝑎. In the first case, we have 𝑞(𝑏) ≡ 0 (mod 𝑝), so the induction
hypothesis ensures that there are at most 𝑑 − 1 choices for 𝑏. In the second case, we have
𝑏 ≡ 𝑎 (mod 𝑝), which gives one additional possibility. Thus 𝑓 (𝑥) ≡ 0 (mod 𝑝) has at most
𝑑 solutions in total.
□
Note that the theorem fails for composite moduli. For example, the congruence 𝑥^2 − 1 ≡ 0
(mod 8) has four solutions, 𝑥 = 1, 3, 5, 7, but the polynomial 𝑓(𝑥) = 𝑥^2 − 1 has degree two.
The next lemma establishes an interesting relationship between an integer and the Euler
phi function of its divisors. To illustrate, notice that the positive divisors of 12 are 1, 2, 3,
4, 6, and 12 and that
𝜙(1) + 𝜙(2) + 𝜙(3) + 𝜙(4) + 𝜙(6) + 𝜙(12) = 1 + 1 + 2 + 2 + 2 + 4 = 12.
The positive divisors of 17 are 1 and 17, and we have 𝜙(1) + 𝜙(17) = 1 + 16 = 17. The
positive divisors of 20 are 1, 2, 4, 5, 10, and 20, and we have
𝜙(1) + 𝜙(2) + 𝜙(4) + 𝜙(5) + 𝜙(10) + 𝜙(20) = 1 + 1 + 2 + 4 + 4 + 8 = 20.
We now show that this phenomenon occurs in general.
Lemma 5.10. Let 𝑛 be a positive integer, and let 𝑑_1, 𝑑_2, . . . , 𝑑_𝑡 denote the positive divisors
of 𝑛. Then
𝜙(𝑑_1) + 𝜙(𝑑_2) + ⋅ ⋅ ⋅ + 𝜙(𝑑_𝑡) = ∑_{𝑑∣𝑛} 𝜙(𝑑) = 𝑛.
Proof. Let 𝑆(𝑑) denote the number of integers 𝑎 with 1 ≤ 𝑎 ≤ 𝑛 and (𝑎, 𝑛) = 𝑑. Each such 𝑎
is counted by exactly one 𝑆(𝑑), and 𝑆(𝑑) = 0 unless 𝑑 is a divisor of 𝑛, so we can write
𝑆(𝑑_1) + 𝑆(𝑑_2) + ⋅ ⋅ ⋅ + 𝑆(𝑑_𝑡) = ∑_{𝑑∣𝑛} 𝑆(𝑑) = 𝑛.
Consider an integer 𝑎 counted by 𝑆(𝑑). Then (𝑎, 𝑛) = 𝑑, so in particular we have 𝑑∣𝑎 and
𝑑∣𝑛, and furthermore (𝑎/𝑑, 𝑛/𝑑) = 1. Thus we can write 𝑎 = 𝑘𝑑 for a unique integer 𝑘 with
1 ≤ 𝑘 ≤ 𝑛/𝑑 and (𝑘, 𝑛/𝑑) = 1. Hence the number of choices for 𝑘 is 𝜙(𝑛/𝑑). Since this
also gives the number of possibilities for 𝑎, we deduce that 𝑆(𝑑) = 𝜙(𝑛/𝑑). Notice that the
numbers 𝑛/𝑑_1, 𝑛/𝑑_2, . . . , 𝑛/𝑑_𝑡 are just the divisors 𝑑_1, 𝑑_2, . . . , 𝑑_𝑡, listed in a different order.
Thus we have
∑_{𝑑∣𝑛} 𝜙(𝑑) = ∑_{𝑑∣𝑛} 𝜙(𝑛/𝑑) = ∑_{𝑑∣𝑛} 𝑆(𝑑) = 𝑛,
as desired.
□
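Lemma 5.10 and the counting function 𝑆(𝑑) from its proof are easy to check numerically. The sketch below uses our own function names and direct counting, so it is meant only for small 𝑛.

```python
from math import gcd


def phi(n):
    """Euler's phi function by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def phi_divisor_sum(n):
    """Sum of phi(d) over the positive divisors d of n."""
    return sum(phi(d) for d in divisors(n))


def S(d, n):
    """Number of a in 1..n with gcd(a, n) = d, as in the proof of Lemma 5.10."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == d)


print([phi_divisor_sum(n) for n in (12, 17, 20)])
print([(d, S(d, 12), phi(12 // d)) for d in divisors(12)])
```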
We can now demonstrate the existence of primitive roots modulo a prime 𝑝. The following
theorem actually makes the stronger assertion that ℤ∗𝑝 contains elements of all orders dividing
𝑝 − 1 (including 𝑝 − 1 itself). Note that Corollary 5.2 implies that no other orders are
permissible.
Theorem 5.11. If 𝑝 is prime and 𝑑 is a positive integer dividing 𝑝−1, then there are exactly
𝜙(𝑑) elements of order 𝑑 in ℤ∗𝑝 .
Proof. Let 𝑑 be a divisor of 𝑝 − 1, and let 𝑁 (𝑑) be the number of elements of order 𝑑 in
ℤ∗𝑝 . If 𝑁(𝑑) > 0, then there exists some 𝑎 ∈ ℤ∗𝑝 of order 𝑑. The integers 𝑎, 𝑎^2, 𝑎^3, . . . , 𝑎^𝑑 are
distinct modulo 𝑝, since otherwise we would have 𝑎^{𝑗−𝑖} ≡ 1 (mod 𝑝), where 0 < 𝑗 − 𝑖 < 𝑑,
violating the definition of order. Moreover, for each 𝑖 with 1 ≤ 𝑖 ≤ 𝑑 we have
(𝑎^𝑖)^𝑑 ≡ (𝑎^𝑑)^𝑖 ≡ 1 (mod 𝑝),
so each 𝑎^𝑖 is a solution of the congruence 𝑥^𝑑 − 1 ≡ 0 (mod 𝑝), and Theorem 5.9 implies that
these are the only solutions. Furthermore, every element of order 𝑑 satisfies the congruence
and must therefore be a power of 𝑎. We know from Theorem 5.5 that 𝑎^𝑖 has order 𝑑 if and
only if (𝑑, 𝑖) = 1, so there are exactly 𝜙(𝑑) elements of order 𝑑. Thus we’ve shown that either
𝑁(𝑑) = 0 or 𝑁(𝑑) = 𝜙(𝑑) whenever 𝑑∣(𝑝 − 1). Since there are 𝑝 − 1 elements in ℤ∗𝑝 , we deduce
from Lemma 5.10 that
∑_{𝑑∣(𝑝−1)} 𝑁(𝑑) = 𝑝 − 1 = ∑_{𝑑∣(𝑝−1)} 𝜙(𝑑).
Since 𝑁 (𝑑) ≤ 𝜙(𝑑) for each 𝑑, we must actually have 𝑁 (𝑑) = 𝜙(𝑑) for each 𝑑, and this
completes the proof.
□
Corollary 5.12. There are exactly 𝜙(𝑝 − 1) primitive roots in ℤ∗𝑝 .
Proof. This follows immediately by taking 𝑑 = 𝑝 − 1 in Theorem 5.11.
□
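Theorem 5.11 can be illustrated by tabulating the orders of all elements of ℤ∗𝑝 for a few small primes. The following Python sketch uses our own helper names and compares the number of elements of each order 𝑑 with 𝜙(𝑑).

```python
from collections import Counter
from math import gcd


def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def order(a, p):
    """Order of a modulo the prime p, for 1 <= a < p."""
    d, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        d += 1
    return d


def order_counts(p):
    """Map each order d to the number N(d) of elements of order d in Z_p^*."""
    return Counter(order(a, p) for a in range(1, p))


for p in (7, 13, 31):
    counts = order_counts(p)
    print(p, sorted(counts.items()))
```

For 𝑝 = 7 the table reads 𝑁(1) = 1, 𝑁(2) = 1, 𝑁(3) = 2, 𝑁(6) = 2, matching 𝜙(1), 𝜙(2), 𝜙(3), 𝜙(6).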
The Lucas primality test. Suppose the integer 𝑛 has passed a strong pseudoprime test
and is therefore suspected to be prime. It turns out that we can then use primitive roots to
try to prove that 𝑛 is prime. Suppose that 𝑛 passes the ordinary pseudoprime test for the
base 𝑏, so that
𝑏^{𝑛−1} ≡ 1 (mod 𝑛),
and further that we are able to factor 𝑛 − 1, say 𝑛 − 1 = 𝑝_1^{𝑒_1} ⋅ ⋅ ⋅ 𝑝_𝑟^{𝑒_𝑟}. Theorem 5.1 implies
that the order of 𝑏 modulo 𝑛 divides 𝑛 − 1, so if we can show that
𝑏^{(𝑛−1)/𝑝_𝑖} ≢ 1 (mod 𝑛)
for each 𝑖 then we may conclude that the order of 𝑏 is actually 𝑛 − 1. On the other hand, we
know from Euler’s Theorem that the order of 𝑏 cannot exceed 𝜙(𝑛), so we have 𝑛 − 1 ≤ 𝜙(𝑛).
But it follows easily from Corollary 3.22 that this can only happen if 𝑛 is prime, in which
case 𝜙(𝑛) = 𝑛 − 1.
Example 5.13. Use the Lucas test to prove that 𝑛 = 631 is prime.
Solution. We have 3^{630} ≡ 1 (mod 631), so the order of 3 modulo 631 must divide 630.
Moreover we have 630 = 2 ⋅ 3^2 ⋅ 5 ⋅ 7 and
3^{(𝑛−1)/2} = 3^{315} ≡ −1 (mod 631),
3^{(𝑛−1)/3} = 3^{210} ≡ −44 (mod 631),
3^{(𝑛−1)/5} = 3^{126} ≡ 242 (mod 631),
3^{(𝑛−1)/7} = 3^{90} ≡ 269 (mod 631),
which shows that the order of 3 modulo 631 is actually equal to 630. We may therefore
conclude that 631 is prime and that 3 is a primitive root modulo 631.
□
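The Lucas test is straightforward to sketch in Python once the prime factors of 𝑛 − 1 are known. The function name is our own, and the caller is assumed to supply the correct list of distinct primes dividing 𝑛 − 1; the sketch does not check that list.

```python
def lucas_certifies(n, b, prime_factors):
    """Return True if b has order n - 1 modulo n, which proves n prime.

    prime_factors must be the distinct primes dividing n - 1 (assumed correct).
    """
    if pow(b, n - 1, n) != 1:
        return False                      # b fails the ordinary pseudoprime test
    return all(pow(b, (n - 1) // p, n) != 1 for p in prime_factors)


# Example 5.13: n = 631, b = 3, and 630 = 2 * 3^2 * 5 * 7.
print([pow(3, 630 // p, 631) for p in (2, 3, 5, 7)])
print(lucas_certifies(631, 3, [2, 3, 5, 7]))
```

A return value of False does not show that 𝑛 is composite; it only means that this particular base 𝑏 is not a primitive root modulo 𝑛, and another base should be tried.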
If we find an element 𝑏 of order 𝑛 − 1 in ℤ∗𝑛 , then the above argument shows that 𝑛 is
prime and hence that 𝑏 is a primitive root modulo 𝑛. So the success of the test depends
in part on being able to find primitive roots quickly. However, Corollary 5.12 implies that
there are 𝜙(𝑛 − 1) primitive roots in ℤ∗𝑛 when 𝑛 is prime, and a bit of elementary analytic
number theory shows that 𝜙(𝑛) ≈ (6/𝜋^2)𝑛 on average. Hence the proportion of numbers 𝑏 ≤ 𝑛
that are primitive roots modulo 𝑛 averages about 6/𝜋^2 ≈ 0.608 for large prime values of 𝑛.
Thus we have a good chance of finding a suitable 𝑏 fairly quickly if 𝑛 is in fact prime.
A more serious issue is that it may not be easy to factor 𝑛 − 1. If we’re lucky, it will have
several relatively small prime factors, but there might be a large factor remaining whose
primality needs to be established. In this case, we can iterate the Lucas test until our
numbers 𝑛 − 1 are small enough to be factored by trial division.
The Diffie-Hellman key exchange. The first secure method for public-key cryptography was actually developed about two years before the RSA breakthrough. One of the
fundamental problems with classical cryptography is the difficulty of agreeing on the key for
a particular cipher without having this information intercepted. Diffie and Hellman resolved
this by generating a large prime 𝑝 and then choosing a primitive root 𝑠 modulo 𝑝. Note that
Corollary 5.12 ensures that there are 𝜙(𝑝 − 1) possible choices for 𝑠. The pair (𝑝, 𝑠) is public information. Now if Alice wants to communicate with Bob, she chooses a number 𝑎 at
random between 2 and 𝑝 − 2 and sends him 𝛼 = 𝑠^𝑎 MOD 𝑝. Bob then chooses a number 𝑏
at random between 2 and 𝑝 − 2 and sends Alice 𝛽 = 𝑠^𝑏 MOD 𝑝. Since 𝑠 is a primitive root
modulo 𝑝, we know that neither 𝛼 nor 𝛽 will equal 1. We observe that
𝑘 = 𝑠^{𝑎𝑏} MOD 𝑝 = 𝛽^𝑎 MOD 𝑝 = 𝛼^𝑏 MOD 𝑝
can now be calculated by both Alice and Bob and can be used as the key for whatever
cryptosystem they employ.
Example 5.14. Using the public prime 𝑝 = 197 and public base 𝑠 = 31, show how Alice and
Bob can agree on a common key for secure communication.
Solution. Suppose that Alice randomly chooses 𝑎 = 72. Then she sends
𝛼 = 31^{72} MOD 197 = 76
to Bob. If Bob randomly chooses 𝑏 = 109, then he sends
𝛽 = 31^{109} MOD 197 = 147
to Alice. At this point, Alice computes
𝛽^{72} MOD 197 = 147^{72} MOD 197 = 28
and Bob computes
𝛼^{109} MOD 197 = 76^{109} MOD 197 = 28,
so they’ve agreed on the key 𝑘 = 28.
□
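The exchange in Example 5.14 can be replayed in Python using the built-in three-argument pow for modular exponentiation. The function names are our own, and the private exponents are fixed to the values in the example rather than chosen at random, so this is a sketch of the arithmetic and not a secure implementation.

```python
def public_value(s, secret, p):
    """The number a party publishes: s^secret MOD p."""
    return pow(s, secret, p)


def shared_key(received, secret, p):
    """The key computed from the other party's public value."""
    return pow(received, secret, p)


p, s = 197, 31            # public prime and base from Example 5.14
a, b = 72, 109            # Alice's and Bob's private choices

alpha = public_value(s, a, p)
beta = public_value(s, b, p)
key_alice = shared_key(beta, a, p)
key_bob = shared_key(alpha, b, p)
print(alpha, beta, key_alice, key_bob)
```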
The natural way for Eve (who is eavesdropping) to obtain the key 𝑘 would be to solve the
two congruences
𝑠^𝑎 ≡ 𝛼 (mod 𝑝) and 𝑠^𝑏 ≡ 𝛽 (mod 𝑝) (5.1)
for 𝑎 and 𝑏. Solving either one of these is known as the discrete log problem, and there is
no known efficient algorithm for handling it. It is believed that no such algorithm exists,
but this has not been proven. What Eve really needs is an efficient algorithm for finding 𝑠^{𝑎𝑏}
MOD 𝑝 from 𝑠^𝑎 and 𝑠^𝑏, which is known as the Diffie-Hellman problem. Its solution would
obviously follow from a solution to the discrete log problem, but it’s not known whether the
two problems are equivalent. In view of Theorem 5.4, the fact that 𝑠 is a primitive root
modulo 𝑝 ensures that the congruences (5.1) have unique solutions 𝑎 and 𝑏 between 1 and
𝑝 − 1 for every choice of 𝛼 and 𝛽 between 1 and 𝑝 − 1. The uniqueness of the solutions
obviously makes it less likely that Eve will stumble upon one quickly by trial and error.
The ElGamal Cryptosystem. One disadvantage of the Diffie-Hellman method is that
Alice has to wait for a response from Bob before she can calculate the key and initiate secure
communication. However, ElGamal showed that the protocol can be adapted to create a self-contained public-key cryptosystem. In addition to the public prime 𝑝 and base 𝑠, suppose
that Alice and Bob publish their numbers 𝛼 and 𝛽 in a directory. If Alice wants to send a
message 𝑥 to Bob, she generates a random session key 𝑘 between 2 and 𝑝 − 2 and sends Bob
𝑡 = 𝑠^𝑘 MOD 𝑝 and 𝑦 = 𝛽^𝑘 𝑥 MOD 𝑝.
He then recovers the message by computing
𝑦(𝑡^𝑏)^{−1} MOD 𝑝 = (𝑥𝑠^{𝑏𝑘}) ⋅ (𝑠^{𝑘𝑏})^{−1} MOD 𝑝 = 𝑥,
provided that 𝑥 ≤ 𝑝 − 1. Longer messages can of course be broken into blocks prior to
encryption.
Example 5.15. Suppose that Alice and Bob use ElGamal with public prime 𝑝 = 11881379
and base 𝑠 = 23, and that Alice has published 𝛼 = 10442571. How can Bob discreetly ask
Alice to tea?
Solution. Bob first needs to pick a random session key, say 𝑘 = 101. He then converts TEA
to digital form, say 𝑥 = 200501, and calculates
𝑡 = 23^{101} MOD 𝑝 = 3054634 and 𝑦 = 200501 ⋅ 10442571^{101} MOD 𝑝 = 3497868.
Bob now sends the pair (3054634, 3497868) to Alice, and she recovers the message using her
private key, 𝑎 = 8137:
3497868 ⋅ (3054634^{8137})^{−1} MOD 𝑝 = 3497868 ⋅ 7225717 MOD 𝑝 = 200501.
□
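The ElGamal encryption and decryption steps can be sketched in Python as follows. To keep the arithmetic easy to follow, the sketch uses the small prime 𝑝 = 197 and base 𝑠 = 31 rather than the parameters of Example 5.15; the function names, the recipient's private exponent 109, and the sample messages are our own choices. The inverse (𝑡^𝑏)^{−1} is computed as 𝑡^{𝑝−1−𝑏} MOD 𝑝, which is valid by Fermat's Little Theorem.

```python
def elgamal_encrypt(x, k, beta, s, p):
    """Encrypt x (1 <= x <= p - 1) with session key k for public key beta."""
    t = pow(s, k, p)
    y = (pow(beta, k, p) * x) % p
    return t, y


def elgamal_decrypt(t, y, b, p):
    """Recover x using the private exponent b: y * (t^b)^(-1) MOD p."""
    inverse = pow(t, p - 1 - b, p)     # t^(p-1-b) = (t^b)^(-1) (mod p)
    return (y * inverse) % p


p, s = 197, 31                         # small illustrative parameters
b = 109                                # recipient's private exponent
beta = pow(s, b, p)                    # recipient's published number

t, y = elgamal_encrypt(42, 101, beta, s, p)
print(t, y, elgamal_decrypt(t, y, b, p))
```

In practice the session key 𝑘 should be chosen at random for every message, as in the description above; it is fixed here only to make the sketch reproducible.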
6. Quadratic reciprocity
Quadratic Residues. Having studied linear congruences in §3, it is natural to ask about
solving quadratic congruences. Let 𝑝 be a prime and let 𝑎 ∈ ℤ∗𝑝 . We say that 𝑎 is a quadratic
residue modulo 𝑝 if there exists 𝑥 such that 𝑥^2 ≡ 𝑎 (mod 𝑝). If no such 𝑥 exists, then 𝑎 is
called a quadratic non-residue modulo 𝑝. We sometimes denote the sets of quadratic residues
and non-residues in ℤ∗𝑝 by 𝑅 and 𝑁 , respectively.
Example 6.1. Identify the quadratic residues and non-residues in ℤ∗5 , ℤ∗7 , and ℤ∗11 .
Solution. In ℤ∗5 , we have 𝑅 = {1, 4} and 𝑁 = {2, 3}. In ℤ∗7 , we have 𝑅 = {1, 2, 4} and
𝑁 = {3, 5, 6}. In ℤ∗11 , we have 𝑅 = {1, 3, 4, 5, 9} and 𝑁 = {2, 6, 7, 8, 10}.
□
The next theorem shows that there are always equal numbers of quadratic residues and
non-residues modulo an odd prime.
Theorem 6.2. If 𝑝 is an odd prime, then ∣𝑅∣ = ∣𝑁∣ = ½(𝑝 − 1).
Proof. If 𝑥^2 ≡ 𝑦^2 (mod 𝑝), then 𝑝 divides 𝑥^2 − 𝑦^2 = (𝑥 − 𝑦)(𝑥 + 𝑦), so Euclid’s Lemma implies
that 𝑝 divides 𝑥 − 𝑦 or 𝑥 + 𝑦, and thus 𝑥 ≡ ±𝑦 (mod 𝑝). Since 𝑝 is odd, we have 𝑦 ≢ −𝑦 (mod 𝑝)
for every 𝑦 ∈ ℤ∗𝑝 . Thus every quadratic residue in ℤ∗𝑝
has exactly two distinct square roots, which implies that the set 𝑅 = {𝑥^2 : 𝑥 ∈ ℤ∗𝑝 } contains
exactly half the elements of ℤ∗𝑝 .
□
How do we determine whether a particular element of ℤ∗𝑝 is a quadratic residue? One
answer is given by the following theorem.
Theorem 6.3. (Euler’s Criterion) Let 𝑝 be an odd prime, and let 𝑎 ∈ ℤ∗𝑝 . Then 𝑎 is a
quadratic residue modulo 𝑝 if and only if 𝑎^{(𝑝−1)/2} ≡ 1 (mod 𝑝).
Proof. By Theorem 5.11, we know that there exists a primitive root 𝑔 modulo 𝑝. If 𝑎 is a
quadratic residue modulo 𝑝, then there exists 𝑥 ∈ ℤ∗𝑝 with 𝑥^2 ≡ 𝑎 (mod 𝑝). By Theorem
5.4, we have 𝑥 ≡ 𝑔^𝑖 (mod 𝑝) for some integer 𝑖, and thus 𝑎 ≡ 𝑔^{2𝑖} (mod 𝑝). It follows that
𝑎^{(𝑝−1)/2} ≡ (𝑔^{2𝑖})^{(𝑝−1)/2} ≡ (𝑔^{𝑝−1})^𝑖 ≡ 1 (mod 𝑝).
Conversely, if 𝑎 is a quadratic non-residue modulo 𝑝, then 𝑎 cannot be an even power of 𝑔,
so Theorem 5.4 implies that 𝑎 ≡ 𝑔^{2𝑗+1} (mod 𝑝) for some integer 𝑗. Thus we have
𝑎^{(𝑝−1)/2} ≡ (𝑔^{2𝑗+1})^{(𝑝−1)/2} ≡ (𝑔^{𝑝−1})^𝑗 𝑔^{(𝑝−1)/2} ≡ 𝑔^{(𝑝−1)/2} ≢ 1 (mod 𝑝),
since 𝑔 has order 𝑝 − 1. In fact, we can deduce that 𝑎^{(𝑝−1)/2} ≡ −1 (mod 𝑝), since 𝑎^{(𝑝−1)/2} is
a solution of the congruence 𝑥^2 ≡ 1 (mod 𝑝).
□
As a result, the congruence 𝑥^2 ≡ 𝑎 (mod 𝑝) has two solutions if 𝑎^{(𝑝−1)/2} ≡ 1 (mod 𝑝) and
no solutions if 𝑎^{(𝑝−1)/2} ≡ −1 (mod 𝑝).
Example 6.4. Decide whether the congruence 𝑥^2 ≡ 6 (mod 37) has solutions.
Solution. We have
6^{18} ≡ (6^2)^9 ≡ (−1)^9 ≡ −1 (mod 37),
so Euler’s Criterion implies that the congruence has no solution.
□
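Euler's Criterion gives a quick computational test for quadratic residues. The following Python sketch (function names are ours) applies it and, as a sanity check, compares the result with a brute-force search for square roots.

```python
def is_residue_euler(a, p):
    """Euler's Criterion for an odd prime p and a not divisible by p."""
    return pow(a, (p - 1) // 2, p) == 1


def is_residue_brute(a, p):
    """Direct search for x with x^2 = a (mod p)."""
    return any((x * x - a) % p == 0 for x in range(1, p))


print(is_residue_euler(6, 37))                            # Example 6.4
print([a for a in range(1, 11) if is_residue_euler(a, 11)])
```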
When 𝑎 is an integer and 𝑝 is an odd prime, we define the Legendre symbol (𝑎/𝑝) by
(𝑎/𝑝) = 0 if 𝑝∣𝑎,
(𝑎/𝑝) = 1 if 𝑎 ∈ 𝑅,
(𝑎/𝑝) = −1 if 𝑎 ∈ 𝑁.
Note that this definition only depends on the residue class of 𝑎 modulo 𝑝, so replacing 𝑎 by
𝑎 + 𝑘𝑝 does not change the value. For example, we have (2/7) = 1, (3/7) = −1, and (7/7) = 0.
The Legendre symbol (sometimes read as “𝑎 on 𝑝”) has the following useful properties:
Theorem 6.5. Let 𝑎 and 𝑏 be integers, and let 𝑝 be an odd prime. Then
(i) (𝑎/𝑝) ≡ 𝑎^{(𝑝−1)/2} (mod 𝑝)
(ii) (−1/𝑝) = (−1)^{(𝑝−1)/2}
(iii) (𝑎𝑏/𝑝) = (𝑎/𝑝)(𝑏/𝑝)
(iv) (𝑎^2/𝑝) = 1 if 𝑎 is not divisible by 𝑝.
Proof. Fermat’s Little Theorem gives 𝑎^{𝑝−1} ≡ 1 (mod 𝑝) when (𝑎, 𝑝) = 1, so property (i)
follows immediately from Euler’s Criterion and Euclid’s Lemma. Properties (ii), (iii), and
(iv) follow easily from (i).
□
Note that property (ii) implies that −1 is a quadratic residue mod 𝑝 if and only if 𝑝 ≡ 1
(mod 4). For example, the congruence 𝑥^2 ≡ −1 is solvable modulo 73 but not modulo 71.
The following criterion is a key ingredient in proving the law of quadratic reciprocity,
which provides an efficient method for computing the Legendre symbol.
Theorem 6.6. (Gauss’ Criterion) Let 𝑝 be an odd prime, and let 𝑎 be a positive integer
not divisible by 𝑝. For 1 ≤ 𝑖 ≤ ½(𝑝 − 1), let 𝑟_𝑖 = 𝑎(2𝑖 − 1) MOD 𝑝, and let 𝑡 be the number
of 𝑟_𝑖 that are even. Then we have
(𝑎/𝑝) = (−1)^𝑡.
Example 6.7. Use Gauss’ criterion to calculate (2/7), (2/11), (2/13), and (2/17).
Solution. For 𝑝 = 7, we have 𝑟_1 = 2 ⋅ 1 = 2, 𝑟_2 = 2 ⋅ 3 = 6, and 𝑟_3 = 2 ⋅ 5 MOD 7 = 3. Hence the
number of even residues is 𝑡 = 2, and Gauss’ Criterion gives (2/7) = (−1)^2 = 1. Similarly,
for 𝑝 = 11 we have 𝑟_1 = 2, 𝑟_2 = 6, 𝑟_3 = 10, 𝑟_4 = 3, and 𝑟_5 = 7, which yields 𝑡 = 3 and
(2/11) = (−1)^3 = −1. For 𝑝 = 13, we get 𝑟_1 = 2, 𝑟_2 = 6, 𝑟_3 = 10, 𝑟_4 = 1, 𝑟_5 = 5, and 𝑟_6 = 9,
so we again have 𝑡 = 3 and (2/13) = −1. Finally, for 𝑝 = 17 we have 𝑟_1 = 2, 𝑟_2 = 6, 𝑟_3 = 10,
𝑟_4 = 14, 𝑟_5 = 1, 𝑟_6 = 5, 𝑟_7 = 9, and 𝑟_8 = 13, which gives 𝑡 = 4 and (2/17) = 1.
□
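Gauss' criterion amounts to counting even residues, which the following Python sketch does directly. The function names are our own; the residues 𝑟_𝑖 = 𝑎(2𝑖 − 1) MOD 𝑝 are computed exactly as in Theorem 6.6.

```python
def gauss_residues(a, p):
    """The residues r_i = a(2i - 1) MOD p for 1 <= i <= (p - 1)/2."""
    return [(a * (2 * i - 1)) % p for i in range(1, (p - 1) // 2 + 1)]


def legendre_by_gauss(a, p):
    """(a/p) = (-1)^t, where t is the number of even r_i."""
    t = sum(1 for r in gauss_residues(a, p) if r % 2 == 0)
    return (-1) ** t


for p in (7, 11, 13, 17):
    print(p, gauss_residues(2, p), legendre_by_gauss(2, p))
```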
The result of Example 6.7 may be generalized as follows.
Corollary 6.8. If 𝑝 is an odd prime, then (2/𝑝) = (−1)^{(𝑝^2−1)/8}.
Proof. It is an easy exercise to check that (𝑝^2 − 1)/8 is even if 𝑝 ≡ ±1 (mod 8) and odd if
𝑝 ≡ ±3 (mod 8). The proof therefore splits into four cases. First of all, suppose that 𝑝 ≡ 1
(mod 8) so that 𝑝 = 8𝑘 + 1 for some positive integer 𝑘. The numbers
2 ⋅ 1, 2 ⋅ 3, 2 ⋅ 5, . . . , 2(4𝑘 − 1)
are all less than 𝑝 (since 8𝑘 − 2 < 8𝑘 + 1), so their residues are all clearly even. On the other
hand, the numbers
2(4𝑘 + 1), 2(4𝑘 + 3), 2(4𝑘 + 5), . . . , 2(8𝑘 − 1)
all lie between 𝑝 and 2𝑝, so their residues are 1, 5, 9, . . . , 8𝑘 − 3, which are all odd. The
number of even residues in Gauss’ criterion is therefore 𝑡 = 2𝑘, since 2𝑖 − 1 ranges from 1
to 4𝑘 − 1 as 𝑖 ranges from 1 to 2𝑘, and thus (2/𝑝) = (−1)^{2𝑘} = 1. Next suppose that 𝑝 ≡ 3
(mod 8), so that 𝑝 = 8𝑘 + 3. Then the numbers
2 ⋅ 1, 2 ⋅ 3, 2 ⋅ 5, . . . , 2(4𝑘 + 1)
are all less than 𝑝 (since 8𝑘 + 2 < 8𝑘 + 3), so their residues are even. The numbers
2(4𝑘 + 3), 2(4𝑘 + 5), 2(4𝑘 + 7), . . . , 2(8𝑘 + 1)
all lie between 𝑝 and 2𝑝, so their residues are 3, 7, 11, . . . , 8𝑘 − 1, which are all odd. We
therefore have 𝑡 = 2𝑘 + 1 and hence (2/𝑝) = (−1)^{2𝑘+1} = −1. The remaining two cases are
left as exercises.
□
Proof of Gauss’ criterion: Write 𝑚 = ½(𝑝 − 1). We re-index the residues so that
𝑟_1, 𝑟_2, . . . , 𝑟_𝑡 are even and 𝑟_{𝑡+1}, 𝑟_{𝑡+2}, . . . , 𝑟_𝑚 are odd. Let 𝑏_1, 𝑏_2, . . . , 𝑏_𝑚 be the positive odd
integers less than 𝑝, re-ordered so that 𝑟_𝑖 = 𝑎𝑏_𝑖 MOD 𝑝. The numbers
𝑝 − 𝑟_1, 𝑝 − 𝑟_2, . . . , 𝑝 − 𝑟_𝑡, 𝑟_{𝑡+1}, 𝑟_{𝑡+2}, . . . , 𝑟_𝑚
are positive odd integers less than 𝑝; we claim that they are distinct and hence a re-ordering
of 𝑏_1, . . . , 𝑏_𝑚. To show this, we consider three cases:
(i) If 𝑟_𝑖 = 𝑟_𝑗, where 𝑡 + 1 ≤ 𝑖, 𝑗 ≤ 𝑚, then 𝑎𝑏_𝑖 ≡ 𝑎𝑏_𝑗 (mod 𝑝), so Lemma 3.3 gives 𝑏_𝑖 ≡ 𝑏_𝑗
(mod 𝑝). But 𝑏_1, . . . , 𝑏_𝑚 are distinct positive integers less than 𝑝, so we deduce that 𝑖 = 𝑗.
(ii) If 𝑝 − 𝑟_𝑖 = 𝑝 − 𝑟_𝑗, where 1 ≤ 𝑖, 𝑗 ≤ 𝑡, then 𝑟_𝑖 = 𝑟_𝑗, so the above argument gives 𝑖 = 𝑗.
(iii) If 𝑝 − 𝑟_𝑖 = 𝑟_𝑗, where 1 ≤ 𝑖 ≤ 𝑡 and 𝑡 + 1 ≤ 𝑗 ≤ 𝑚, then 𝑟_𝑖 + 𝑟_𝑗 ≡ 0 (mod 𝑝), so
𝑎(𝑏_𝑖 + 𝑏_𝑗) ≡ 0 (mod 𝑝), and thus 𝑏_𝑖 + 𝑏_𝑗 ≡ 0 (mod 𝑝). Since 0 < 𝑏_𝑖 + 𝑏_𝑗 < 2𝑝, it follows that
𝑏_𝑖 + 𝑏_𝑗 = 𝑝, which is impossible since 𝑏_𝑖 + 𝑏_𝑗 is even.
We therefore have
𝑏_1 ⋅ ⋅ ⋅ 𝑏_𝑚 ≡ (𝑝 − 𝑟_1) ⋅ ⋅ ⋅ (𝑝 − 𝑟_𝑡)𝑟_{𝑡+1} ⋅ ⋅ ⋅ 𝑟_𝑚 ≡ (−1)^𝑡 𝑟_1 ⋅ ⋅ ⋅ 𝑟_𝑚 ≡ (−1)^𝑡 𝑎^𝑚 (𝑏_1 ⋅ ⋅ ⋅ 𝑏_𝑚) (mod 𝑝).
Since 𝑝 does not divide 𝑏_1 ⋅ ⋅ ⋅ 𝑏_𝑚, we deduce that 𝑎^𝑚 ≡ (−1)^𝑡 (mod 𝑝), and the result now
follows from part (i) of Theorem 6.5. □
We are now ready to state the main theorem of this section, which is one of the most
important and beautiful results in elementary number theory.
Theorem 6.9. (Quadratic Reciprocity) If 𝑝 and 𝑞 are distinct odd primes, then
(𝑝/𝑞)(𝑞/𝑝) = (−1)^{(𝑝−1)(𝑞−1)/4},
which equals 1 if 𝑝 ≡ 1 (mod 4) or 𝑞 ≡ 1 (mod 4), and −1 if 𝑝 ≡ 𝑞 ≡ 3 (mod 4).
The proof of Theorem 6.9 uses Gauss’ criterion but requires a somewhat technical argument to count the even residues 𝑝(2𝑖 − 1) MOD 𝑞 and 𝑞(2𝑗 − 1) MOD 𝑝. There are actually
many ways of proving quadratic reciprocity; over 200 different proofs have appeared in print
since Gauss’ original work in the early 1800s. Before launching into a proof, we illustrate
with some typical applications.
Example 6.10. Use quadratic reciprocity to calculate (11/31).
Solution. Since 11 and 31 are both primes congruent to 3 mod 4, quadratic reciprocity gives
(11/31) = −(31/11). Now since 31 ≡ 9 (mod 11), we have (31/11) = (9/11) = (3^2/11) = 1. We therefore
conclude that (11/31) = −1 and hence that 11 is a quadratic non-residue modulo 31.
□
Example 6.11. Use quadratic reciprocity to calculate (42/61).
Solution. We first apply Theorem 6.5 (iii) to write
(42/61) = (2/61)(3/61)(7/61).
Since 61 ≡ 5 (mod 8), Corollary 6.8 gives (2/61) = −1. Next, since 61 ≡ 1 (mod 4), we may
apply quadratic reciprocity to obtain
(3/61) = (61/3) = (1/3) = 1 and (7/61) = (61/7) = (5/7) = −1,
by the result of Example 6.1. Thus we conclude that (42/61) = (−1) ⋅ (1) ⋅ (−1) = 1, and hence
that 42 is a quadratic residue modulo 61.
□
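The Legendre symbols in Examples 6.10 and 6.11, and the reciprocity law itself, can be checked numerically. The Python sketch below (with our own helper names) evaluates the Legendre symbol via Euler's Criterion rather than by reciprocity, so it serves as an independent check on the hand calculations.

```python
def is_prime(n):
    """Trial division; adequate for the small range used here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's Criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def reciprocity_holds(p, q):
    """Check (p/q)(q/p) = (-1)^((p-1)(q-1)/4) for distinct odd primes p, q."""
    return legendre(p, q) * legendre(q, p) == (-1) ** (((p - 1) * (q - 1)) // 4)


odd_primes = [p for p in range(3, 60) if is_prime(p)]
print(legendre(11, 31), legendre(42, 61))
print(all(reciprocity_holds(p, q) for p in odd_primes for q in odd_primes if p != q))
```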
Quadratic reciprocity can be used to determine a general criterion for 3 to be a quadratic
residue modulo a prime 𝑝 > 3. The result is somewhat reminiscent of the analogous criterion
for (2/𝑝) given in Corollary 6.8, except that here the conclusion depends on the residue class
of 𝑝 modulo 12 rather than modulo 8.
Corollary 6.12. One has (3/𝑝) = 1 if 𝑝 ≡ ±1 (mod 12), and (3/𝑝) = −1 if 𝑝 ≡ ±5 (mod 12).
Proof. If 𝑝 ≡ 1 (mod 12), then 𝑝 ≡ 1 (mod 3) and 𝑝 ≡ 1 (mod 4), so quadratic reciprocity
gives
(3/𝑝) = (𝑝/3) = (1/3) = 1.
Similarly, if 𝑝 ≡ −1 (mod 12), then 𝑝 ≡ 2 (mod 3) and 𝑝 ≡ 3 (mod 4), so quadratic
reciprocity yields
(3/𝑝) = −(𝑝/3) = −(2/3) = −(−1) = 1.
We leave the remaining two cases as exercises.
□
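Corollary 6.12 is also easy to test by brute force for small primes. In the Python sketch below (helper names are ours), we search directly for a square root of 3 modulo 𝑝 and compare the outcome with the residue class of 𝑝 modulo 12.

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def three_is_residue(p):
    """Direct search for x with x^2 = 3 (mod p), for a prime p > 3."""
    return any((x * x - 3) % p == 0 for x in range(1, p))


def predicted_by_corollary(p):
    """Corollary 6.12: 3 is a residue exactly when p = 1 or 11 (mod 12)."""
    return p % 12 in (1, 11)


primes = [p for p in range(5, 200) if is_prime(p)]
print([p for p in primes if three_is_residue(p)][:8])
```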
A proof of quadratic reciprocity. We now describe an argument that leads from
Gauss’ criterion to the conclusion of Theorem 6.9. Let 𝑝 and 𝑞 be distinct odd primes, and define
𝑟_𝑖 = 𝑞(2𝑖 − 1) MOD 𝑝 (1 ≤ 𝑖 ≤ (𝑝 − 1)/2)
and
𝑠_𝑗 = 𝑝(2𝑗 − 1) MOD 𝑞 (1 ≤ 𝑗 ≤ (𝑞 − 1)/2).
By Theorem 6.6, we have (𝑞/𝑝) = (−1)^𝑡, where 𝑡 is the number of even 𝑟_𝑖, and (𝑝/𝑞) = (−1)^𝑢,
where 𝑢 is the number of even 𝑠_𝑗. It follows that
(𝑝/𝑞)(𝑞/𝑝) = (−1)^{𝑡+𝑢}. (6.1)
It therefore suffices to show that 𝑡 + 𝑢 is odd if and only if 𝑝 ≡ 𝑞 ≡ 3 (mod 4). We now
let 𝑋 denote the set of all integers of the form 𝑥 = 𝑞𝑎 − 𝑝𝑏, where 𝑎 and 𝑏 are odd integers
with 1 ≤ 𝑎 < 𝑝 and 1 ≤ 𝑏 < 𝑞. For example, if 𝑝 = 7 and 𝑞 = 11, each element of 𝑋
has the form 𝑥 = 11𝑎 − 7𝑏 where 𝑎 ∈ {1, 3, 5} and 𝑏 ∈ {1, 3, 5, 7, 9}. Taking 𝑎 = 1 gives
𝑥 = 4, −10, −24, −38, −52, while 𝑎 = 3 gives 𝑥 = 26, 12, −2, −16, −30, and finally 𝑎 = 5
gives 𝑥 = 48, 34, 20, 6, −8, for a total of 15 elements.
Lemma 6.13. The elements of 𝑋 are nonzero even integers, and one has
∣𝑋∣ = ¼(𝑝 − 1)(𝑞 − 1).
Proof. Suppose that 𝑥 = 𝑞𝑎 − 𝑝𝑏 ∈ 𝑋. Then 𝑞𝑎 and 𝑝𝑏 are odd, so 𝑥 is clearly even.
Moreover, if 𝑞𝑎 = 𝑝𝑏, then 𝑝∣𝑞𝑎, and hence 𝑝∣𝑎, which is impossible since 1 ≤ 𝑎 < 𝑝. Finally,
if 𝑞𝑎 − 𝑝𝑏 = 𝑞𝑎′ − 𝑝𝑏′ , then 𝑞(𝑎 − 𝑎′ ) = 𝑝(𝑏 − 𝑏′ ), which implies that 𝑝∣(𝑎 − 𝑎′ ) and hence that
𝑎 = 𝑎′ and 𝑏 = 𝑏′ , since −𝑝 < 𝑎 − 𝑎′ < 𝑝. Hence these expressions are all distinct, and ∣𝑋∣ is
just the number of ordered pairs (𝑎, 𝑏).
□
Next, we let 𝑌 = {𝑟 ∈ 𝑋 : −𝑞 < 𝑟 < 𝑝}. For example, when 𝑝 = 7 and 𝑞 = 11, we have
𝑌 = {−10, −8, −2, 4, 6}.
Lemma 6.14. One has ∣𝑌 ∣ = 𝑡 + 𝑢.
Proof. First suppose that 𝑟 ∈ 𝑌 and that 0 < 𝑟 < 𝑝. Then 𝑟 ≡ 𝑞𝑎 (mod 𝑝) for some odd
integer 𝑎 with 1 ≤ 𝑎 < 𝑝, and we can write 𝑎 = 2𝑖 − 1 with 1 ≤ 𝑖 ≤ ½(𝑝 − 1). But since 𝑟 < 𝑝,
we must actually have 𝑟 = 𝑟ᵢ, and Lemma 6.13 shows that this is one of the even residues
counted by 𝑡. On the other hand, if 𝑟ᵢ = 𝑞(2𝑖 − 1) MOD 𝑝 is even, then 0 < 𝑟ᵢ < 𝑝, and
𝑟ᵢ ≡ 𝑞𝑎 (mod 𝑝), where 𝑎 = 2𝑖 − 1 is odd and 1 ≤ 𝑎 < 𝑝. It then follows that 𝑞𝑎 − 𝑟ᵢ = 𝑝𝑏
for some 𝑏 ∈ ℤ, and clearly 𝑏 must be odd and positive. Moreover, 𝑝𝑏 < 𝑞𝑎 < 𝑞𝑝 and hence
𝑏 < 𝑞. This shows that 𝑟ᵢ ∈ 𝑌. We may therefore conclude that the elements 𝑟 ∈ 𝑌 with
0 < 𝑟 < 𝑝 are precisely the even residues 𝑟ᵢ counted by 𝑡. A similar argument shows that the
elements 𝑠 ∈ 𝑌 with −𝑞 < 𝑠 < 0 are precisely the negatives of the even residues 𝑠ⱼ counted
by 𝑢, and the lemma follows immediately.
□
To determine whether 𝑡 + 𝑢 is even, we attempt to pair up the elements of 𝑌 via the
correspondence
\[ qa - pb \mapsto q(p-1-a) - p(q-1-b). \tag{6.2} \]
For example, when 𝑝 = 7 and 𝑞 = 11, we have 11𝑎 − 7𝑏 ↦ 11(6 − 𝑎) − 7(10 − 𝑏), which gives
the pairs (4, −8), (−10, 6), and (−2, −2). On the other hand, if 𝑝 = 5 and 𝑞 = 7, then 𝑋 =
{−18, −8, −4, 2, 6, 16} and 𝑌 = {−4, 2}, so the correspondence 7𝑎 − 5𝑏 ↦ 7(4 − 𝑎) − 5(6 − 𝑏)
yields the obvious pair (2, −4). We now aim to show that this correspondence gives the
desired parity result for ∣𝑌 ∣.
Lemma 6.15. The pairs arising from the correspondence (6.2) consist of distinct elements
unless 𝑝 ≡ 𝑞 ≡ 3 (mod 4), in which case a single element is paired with itself.
Proof. We first note that if 𝑞𝑎 − 𝑝𝑏 ∈ 𝑌 then one has
−𝑞 = −𝑞 + 𝑝 − 𝑝 < −𝑞 + 𝑝 − (𝑞𝑎 − 𝑝𝑏) < −𝑞 + 𝑝 + 𝑞 = 𝑝,
which shows that 𝑞(𝑝 − 1 − 𝑎) − 𝑝(𝑞 − 1 − 𝑏) = −𝑞 + 𝑝 − (𝑞𝑎 − 𝑝𝑏) ∈ 𝑌 . Moreover, Lemma 6.13
shows that the expressions 𝑞𝑎 − 𝑝𝑏 are distinct, so if an element is paired with itself in (6.2),
we must have 𝑎 = 𝑝 − 1 − 𝑎 and 𝑏 = 𝑞 − 1 − 𝑏, which gives 𝑎 = ½(𝑝 − 1) and 𝑏 = ½(𝑞 − 1).
But these values are both odd if and only if 𝑝 ≡ 𝑞 ≡ 3 (mod 4), and this completes the
proof.
□
The proof of quadratic reciprocity is now within our grasp. By Lemmas 6.14 and 6.15, we
see that ∣𝑌 ∣ = 𝑡 + 𝑢 is odd if and only if 𝑝 ≡ 𝑞 ≡ 3 (mod 4), so the result follows from (6.1).
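The objects in this proof are small enough to enumerate directly. The following sketch (the function name `reciprocity_data` is hypothetical) builds the residues 𝑟ᵢ and 𝑠ⱼ, the counts 𝑡 and 𝑢, and the sets 𝑋 and 𝑌 for a pair of distinct odd primes, so that Lemmas 6.13 and 6.14 and the final parity claim can be checked on examples.

```python
def reciprocity_data(p, q):
    """Return (t, u, X, Y) as defined in the proof, for distinct odd primes p and q."""
    r = [q * (2 * i - 1) % p for i in range(1, (p - 1) // 2 + 1)]
    s = [p * (2 * j - 1) % q for j in range(1, (q - 1) // 2 + 1)]
    t = sum(1 for v in r if v % 2 == 0)          # number of even r_i
    u = sum(1 for v in s if v % 2 == 0)          # number of even s_j
    # X: all q*a - p*b with a, b odd, 1 <= a < p, 1 <= b < q
    X = {q * a - p * b for a in range(1, p, 2) for b in range(1, q, 2)}
    Y = {x for x in X if -q < x < p}
    return t, u, X, Y


t, u, X, Y = reciprocity_data(7, 11)
print(t, u, sorted(Y))
```

For 𝑝 = 7 and 𝑞 = 11 this reproduces the set 𝑌 = {−10, −8, −2, 4, 6} from the text, with 𝑡 + 𝑢 = 5.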
The Jacobi symbol. There is a generalization of the Legendre symbol, called the Jacobi
symbol, that is defined whenever the bottom entry is odd. If 𝑚 = 𝑝₁ ⋯ 𝑝ᵣ, where the 𝑝ᵢ are
(not necessarily distinct) primes, then we define
\[ \left(\frac{a}{m}\right) = \left(\frac{a}{p_1}\right) \cdots \left(\frac{a}{p_r}\right), \]
where the factors on the right are Legendre symbols. It turns out that the Jacobi symbol
enjoys many of the same properties as the Legendre symbol, including the law of quadratic
reciprocity.
Theorem 6.16. The results of Theorem 6.5 (𝑖𝑖), (𝑖𝑖𝑖), Corollary 6.8, and Theorem 6.9 hold
with the Legendre symbol replaced by the Jacobi symbol and the odd primes 𝑝 and 𝑞 replaced
by odd positive integers.
Proof. It suffices to write out the prime factorizations of the odd integers in question and
apply the definition of the Jacobi symbol in combination with the corresponding properties
of the Legendre symbol. We leave the details as an exercise.
□
Note that part (iv) of Theorem 6.5 does not quite hold for the Jacobi symbol. The
correct analogue is that (𝑛²/𝑚) = 1 if (𝑚, 𝑛) = 1. Theorem 6.16 often allows us to perform
computations with Legendre symbols more efficiently than was previously possible. For
instance, in Example 6.11, we could apply quadratic reciprocity for Jacobi symbols to obtain
\[ \left(\frac{21}{61}\right) = \left(\frac{61}{21}\right) = \left(\frac{19}{21}\right) = \left(\frac{21}{19}\right) = \left(\frac{2}{19}\right) = -1 \]
rather than dealing with (3/61) and (7/61) separately. Unfortunately, the Jacobi symbol
(𝑎/𝑚) does not tell us whether 𝑎 is a square mod 𝑚. For example, (2/9) = (2/3)(2/3) = 1,
but 2 is not a square modulo 9.
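The reciprocity law for Jacobi symbols turns the computation above into a loop that never factors anything. The sketch below is one common way to organize it, not necessarily the procedure the notes have in mind; the name `jacobi` is hypothetical, and the rules used are exactly the Jacobi-symbol versions of Corollary 6.8 and Theorem 6.9.

```python
def jacobi(a, m):
    """Jacobi symbol (a/m) for an odd positive integer m."""
    if m <= 0 or m % 2 == 0:
        raise ValueError("m must be odd and positive")
    a %= m
    result = 1
    while a != 0:
        # Pull out factors of 2 using (2/m) = -1 exactly when m = 3 or 5 (mod 8).
        while a % 2 == 0:
            a //= 2
            if m % 8 in (3, 5):
                result = -result
        # Reciprocity: flipping costs a sign only when both entries are 3 (mod 4).
        a, m = m, a
        if a % 4 == 3 and m % 4 == 3:
            result = -result
        a %= m
    return result if m == 1 else 0   # 0 when the original entries share a factor


print(jacobi(21, 61), jacobi(42, 61), jacobi(2, 9))
```

The last value illustrates the warning in the text: the symbol (2/9) equals 1 even though 2 is not a square modulo 9.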
7. Some diophantine equations
A diophantine equation usually refers to a polynomial equation with integer coefficients to
which we seek integer solutions. As a simple example, consider the equation
9𝑥 + 6𝑦 = 20.
This is a linear diophantine equation in two variables. A moment’s thought reveals that this
equation has no integer solutions, since 9𝑥+6𝑦 is divisible by 3 for any integers 𝑥 and 𝑦 while
20 is not divisible by 3. From another point of view, notice that solving the above equation
is equivalent to solving the congruence 9𝑥 ≡ 20 (mod 6), and we know from Theorem 3.8
that this has no solution since (9, 6) = 3 does not divide 20.
On the other hand, the equation 2𝑥 + 3𝑦 = 7 has infinitely many integer solutions, given
by 𝑥 = −1 + 3𝑘 and 𝑦 = 3 − 2𝑘 for any 𝑘 ∈ ℤ. The following theorem characterizes the
solutions of the linear diophantine equation 𝑎𝑥 + 𝑏𝑦 = 𝑐.
Theorem 7.1. Let 𝑎, 𝑏, and 𝑐 be integers, and write 𝑑 = (𝑎, 𝑏). The equation 𝑎𝑥 + 𝑏𝑦 = 𝑐
has integer solutions if and only if 𝑑∣𝑐. Moreover, the set of solutions is given by
\[ x = x_0 + kb/d, \qquad y = y_0 - ka/d \qquad (k \in \mathbb{Z}), \]
where (𝑥₀, 𝑦₀) is any particular solution.
Proof. The equation 𝑎𝑥 + 𝑏𝑦 = 𝑐 is equivalent to the congruence 𝑎𝑥 ≡ 𝑐 (mod 𝑏), and
Theorem 3.8 shows that this is solvable if and only if (𝑎, 𝑏) divides 𝑐. If 𝑥₀ is any solution
of the congruence, then we have 𝑎𝑥₀ = 𝑐 − 𝑏𝑦₀ for some integer 𝑦₀, so (𝑥₀, 𝑦₀) solves the
equation. Moreover, any solution (𝑥, 𝑦) satisfies the congruences
\[ \frac{a}{d}\,x \equiv \frac{c}{d} \pmod{b/d} \quad\text{and}\quad \frac{b}{d}\,y \equiv \frac{c}{d} \pmod{a/d}, \]
which have unique solutions modulo 𝑏/𝑑 and 𝑎/𝑑, respectively. Therefore we have 𝑥 =
𝑥₀ + 𝑘𝑏/𝑑 and 𝑦 = 𝑦₀ + 𝑚𝑎/𝑑 for some integers 𝑘 and 𝑚. Substituting into the equation
𝑎𝑥 + 𝑏𝑦 = 𝑐, we find that (𝑥, 𝑦) is a solution if and only if 𝑚 = −𝑘.
□
Example 7.2. Describe all integer solutions of the diophantine equations 35𝑥 + 49𝑦 = 64
and 35𝑥 + 49𝑦 = 63.
Solution. In view of Theorem 7.1, the equation 35𝑥 + 49𝑦 = 64 has no integer solutions, but
the equation 35𝑥 + 49𝑦 = 63 has solutions 𝑥 = −1 + 7𝑘 and 𝑦 = 2 − 5𝑘 for every 𝑘 ∈ ℤ. □
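Theorem 7.1 can be made computational with the extended Euclidean algorithm, which is an assumption of this sketch rather than something used in the proof above. The names `extended_gcd` and `solve_linear` are hypothetical, and the code assumes 𝑎 and 𝑏 are positive.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and a*u + b*v = g."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        k = old_r // r
        old_r, r = r, old_r - k * r
        old_u, u = u, old_u - k * u
        old_v, v = v, old_v - k * v
    return old_r, old_u, old_v


def solve_linear(a, b, c):
    """Solve a*x + b*y = c for positive a, b.

    Returns None if there is no solution; otherwise (x0, y0, dx, dy), meaning
    that every solution is x = x0 + k*dx, y = y0 - k*dy with k an integer.
    """
    d, u, v = extended_gcd(a, b)
    if c % d != 0:
        return None
    scale = c // d
    return u * scale, v * scale, b // d, a // d


print(solve_linear(35, 49, 64))
print(solve_linear(35, 49, 63))
```

The particular solution returned need not be the pair (−1, 2) quoted in Example 7.2, but it differs from it by a multiple of (7, −5), as the theorem predicts.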
Notice that the solubility of our linear diophantine equation was closely connected to the
solubility of the underlying congruences. This is a fairly general principle that is useful to
keep in mind when studying higher degree equations.
Example 7.3. Determine all integer solutions of the diophantine equation 𝑥² + 𝑦² = 1999.
Solution. Notice that 0 and 1 are the only perfect squares modulo 4, and no two of these
add up to 3, which is congruent to 1999 modulo 4. We therefore conclude that the equation
has no integer solutions.
□
Example 7.4. Determine all integer solutions of the equation 𝑥² + 7𝑦² + 35𝑧² = 70493.
Solution. If (𝑥, 𝑦, 𝑧) were a solution, then 𝑥 would satisfy the congruence 𝑥² ≡ 70493 ≡ 3
(mod 7). But 3 is a quadratic non-residue modulo 7, so we conclude that there are no integer
solutions.
□
Pythagorean triples. A famous quadratic diophantine equation in three variables is the
Pythagorean equation
\[ x^2 + y^2 = z^2. \tag{7.1} \]
Notice that this equation has many “trivial” solutions, (0, 𝑦, ±𝑦) and (𝑥, 0, ±𝑥), obtained
by setting one of the variables on the left hand side equal to zero. These solutions are
not very interesting. Of course, there are some well-known right triangles with integer side
lengths, which give non-trivial solutions such as (3, 4, 5) and (5, 12, 13). A solution to (7.1)
is sometimes called a Pythagorean triple.
The equation (7.1) also has a special property called homogeneity, which means that if
(𝑥, 𝑦, 𝑧) is a solution, then so is (𝑘𝑥, 𝑘𝑦, 𝑘𝑧) for any integer 𝑘. For this reason, we usually
restrict attention to the so-called primitive solutions, in which 𝑥, 𝑦, and 𝑧 have no non-trivial
common factors. It turns out that we can express all primitive solutions of this equation as
a two-parameter family. It is easy to show that in any primitive Pythagorean triple we must
have 𝑧 odd and either 𝑥 or 𝑦 even. By interchanging 𝑥 and 𝑦 if necessary, we may suppose
without loss of generality that 𝑥 is even.
Theorem 7.5. If (𝑥, 𝑦, 𝑧) is a primitive Pythagorean triple, where 𝑥 is even and 𝑥, 𝑦, and
𝑧 are positive, then
\[ x = 2st, \qquad y = s^2 - t^2, \qquad\text{and}\qquad z = s^2 + t^2, \]
for some relatively prime positive integers 𝑠 and 𝑡. Conversely, if 𝑠 and 𝑡 are relatively prime,
𝑠 > 𝑡 > 0, and 𝑠 or 𝑡 is even, then (2𝑠𝑡, 𝑠² − 𝑡², 𝑠² + 𝑡²) is a primitive Pythagorean triple.
Proof. Let (𝑥, 𝑦, 𝑧) be a positive primitive Pythagorean triple with 𝑥 even and 𝑦 and 𝑧 odd.
Then we have
\[ x^2 = z^2 - y^2 = (z + y)(z - y), \]
and both 𝑧 + 𝑦 and 𝑧 − 𝑦 are even, so we can write
\[ \left(\frac{x}{2}\right)^2 = \left(\frac{z+y}{2}\right)\left(\frac{z-y}{2}\right), \]
where all three factors are integers. Any common divisor of (𝑧 + 𝑦)/2 and (𝑧 − 𝑦)/2 would
have to divide their sum and difference, 𝑧 and 𝑦, but we know that 𝑧 and 𝑦 are relatively
prime and hence so are (𝑧 + 𝑦)/2 and (𝑧 − 𝑦)/2. It follows easily that both (𝑧 + 𝑦)/2 and
(𝑧 − 𝑦)/2 must be perfect squares, say
\[ \frac{z+y}{2} = s^2 \quad\text{and}\quad \frac{z-y}{2} = t^2. \]
The equations 𝑥 = 2𝑠𝑡, 𝑦 = 𝑠² − 𝑡², and 𝑧 = 𝑠² + 𝑡² follow immediately. Conversely, it is
easy to check that
\[ (2st)^2 + (s^2 - t^2)^2 = (s^2 + t^2)^2. \]
Moreover, any odd prime dividing both 2𝑠𝑡 and 𝑠² − 𝑡² would have to divide either 𝑠 or 𝑡
and either 𝑠 + 𝑡 or 𝑠 − 𝑡, and in all of these cases the prime would divide both 𝑠 and 𝑡.
Furthermore, since exactly one of 𝑠 and 𝑡 is even, 𝑠² − 𝑡² is odd, so 2 is not a common divisor. Thus
if (𝑠, 𝑡) = 1 then the above triple is primitive.
□
Example 7.6. Find all positive primitive Pythagorean triples with one of the variables equal
to 15.
Solution. Since 15 is odd and is not the sum of two squares, Theorem 7.5 implies that
𝑦 = 𝑠² − 𝑡² is the only variable that could take the value 15. So we seek positive integers
𝑠 > 𝑡 such that 𝑠² − 𝑡² = (𝑠 + 𝑡)(𝑠 − 𝑡) = 15. Clearly, the only possibilities are
\[ s + t = 15,\ \ s - t = 1 \qquad\text{and}\qquad s + t = 5,\ \ s - t = 3, \]
which yield 𝑠 = 8, 𝑡 = 7 and 𝑠 = 4, 𝑡 = 1. Hence the only Pythagorean triples of this type
are (112, 15, 113) and (8, 15, 17).
□
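The parametrization in Theorem 7.5 makes it straightforward to list primitive triples by machine. The sketch below (the generator name `primitive_triples` and the search bound are hypothetical choices) enumerates coprime pairs 𝑠 > 𝑡 > 0 of opposite parity and then filters for the value 15, as in Example 7.6.

```python
from math import gcd


def primitive_triples(limit):
    """Yield (2st, s^2 - t^2, s^2 + t^2) for coprime s > t > 0 of opposite parity, s <= limit."""
    for s in range(2, limit + 1):
        for t in range(1, s):
            if gcd(s, t) == 1 and (s - t) % 2 == 1:
                yield (2 * s * t, s * s - t * t, s * s + t * t)


# Since s^2 - t^2 >= 2s - 1, any triple containing 15 has s <= 8, so limit=20 is ample.
with_15 = sorted(tr for tr in primitive_triples(20) if 15 in tr)
print(with_15)
```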
Theorem 7.7. The equation 𝑥⁴ + 𝑦⁴ = 𝑧² has no integer solutions with 𝑥𝑦𝑧 ≠ 0.
Proof. If (𝑥, 𝑦, 𝑧) is a solution with gcd(𝑥, 𝑦) = 𝑑, then 𝑧² is divisible by 𝑑⁴ and hence 𝑧
is divisible by 𝑑², so we obtain a new solution (𝑥/𝑑, 𝑦/𝑑, 𝑧/𝑑²) with the first two variables
relatively prime. Therefore we may suppose that (𝑥, 𝑦, 𝑧) is a solution with 𝑥, 𝑦, and 𝑧
positive, gcd(𝑥, 𝑦) = 1 and 𝑧 as small as possible. We will show how to construct a solution
with a smaller value of 𝑧, thereby producing a contradiction.
Since (𝑥², 𝑦², 𝑧) is a positive primitive Pythagorean triple, we may apply Theorem 7.5
(after possibly interchanging 𝑥 and 𝑦) to write
\[ x^2 = 2st, \qquad y^2 = s^2 - t^2, \qquad\text{and}\qquad z = s^2 + t^2, \]
where 𝑠 > 𝑡 > 0 and gcd(𝑠, 𝑡) = 1. Since 𝑦 is odd, it follows that 𝑠 is odd and 𝑡 is even, so
we in fact have gcd(𝑠, 2𝑡) = 1, and thus 𝑠 and 2𝑡 are both perfect squares, say
\[ s = u^2 \quad\text{and}\quad 2t = v^2. \]
Furthermore, we have 𝑡² + 𝑦² = 𝑠², so (𝑡, 𝑦, 𝑠) is another primitive Pythagorean triple, and
we can apply Theorem 7.5 again to write
\[ t = 2ST, \qquad y = S^2 - T^2, \qquad\text{and}\qquad s = S^2 + T^2, \]
where 𝑆 > 𝑇 > 0 and gcd(𝑆, 𝑇) = 1. We now have 𝑆𝑇 = 𝑡/2 = (𝑣/2)², which implies that
𝑆 and 𝑇 are both perfect squares, say 𝑆 = 𝑋² and 𝑇 = 𝑌². But now
\[ X^4 + Y^4 = S^2 + T^2 = s = u^2 \quad\text{and}\quad u^2 = s < (s^2 + t^2)^2 = z^2, \]
so 𝑢 < 𝑧, and taking 𝑍 = 𝑢 gives a new solution (𝑋, 𝑌, 𝑍) with 𝑍 < 𝑧.
□
Corollary 7.8. The equation 𝑥⁴ + 𝑦⁴ = 𝑧⁴ has no integer solutions with 𝑥𝑦𝑧 ≠ 0.
Proof. If (𝑥, 𝑦, 𝑧) were a solution with 𝑥𝑦𝑧 ≠ 0, then we would have 𝑥⁴ + 𝑦⁴ = (𝑧²)²,
contradicting Theorem 7.7.
□
Theorem 7.9. (Fermat’s Last Theorem) If 𝑘 is an integer with 𝑘 ≥ 3, then
the equation 𝑥ᵏ + 𝑦ᵏ = 𝑧ᵏ has no integer solutions with 𝑥𝑦𝑧 ≠ 0.
Note that this follows easily from Corollary 7.8 when 𝑘 is a multiple of 4. The proof
for arbitrary 𝑘 is extremely hard and was just completed by Wiles in 1995. The following
symmetric generalization of Fermat’s Last Theorem is still unsolved.
Conjecture 7.10. If 𝑘 is an integer with 𝑘 ≥ 5, then the equation 𝑥ᵏ + 𝑦ᵏ = 𝑧ᵏ + 𝑤ᵏ has
no non-trivial integer solutions.
Notice that there are non-trivial solutions to this equation when 𝑘 = 2, 3, and 4. For
instance, one has
\[ 1^2 + 7^2 = 5^2 + 5^2, \qquad 1^3 + 12^3 = 9^3 + 10^3, \qquad\text{and}\qquad 133^4 + 134^4 = 158^4 + 59^4. \]
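The three identities quoted for 𝑘 = 2, 3, and 4 involve numbers large enough that a quick machine check is reassuring. The helper `power_sum` below is a hypothetical name used only for this illustration.

```python
def power_sum(values, k):
    """Sum of the k-th powers of the given integers."""
    return sum(v ** k for v in values)


# (exponent, left-hand pair, right-hand pair) for each identity in the text
identities = [
    (2, (1, 7), (5, 5)),
    (3, (1, 12), (9, 10)),
    (4, (133, 134), (158, 59)),
]

for k, left, right in identities:
    print(k, power_sum(left, k), power_sum(right, k))
```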
Equations in “many” variables. A general theme illustrated above is that diophantine
equations in few variables (relative to the degree) tend to have few, if any, non-trivial solutions. Conversely, equations in sufficiently many variables (relative to the degree) tend to
have many non-trivial solutions. One of the most interesting problems here is to try to quantify the phrase “sufficiently many.” As an example, we look at the problem of representing
integers as sums of 𝑘th powers.
Theorem 7.11. (Lagrange’s four squares theorem) Every positive integer can be written as the sum of four squares.
For example, we have 31 = 5² + 2² + 1² + 1² and 120 = 10² + 4² + 2² + 0². We leave it as an
exercise to show that there are infinitely many positive integers that cannot be represented
as sums of three squares.
Lemma 7.12. If 𝑚 and 𝑛 are sums of four squares, then so is 𝑚𝑛.
Proof. Suppose that 𝑚 = 𝑥² + 𝑦² + 𝑧² + 𝑤² and 𝑛 = 𝑎² + 𝑏² + 𝑐² + 𝑑². Then it is easy (but
somewhat tedious) to verify that
\[ mn = (xa + yb + zc + wd)^2 + (xb - ya + zd - wc)^2 + (xc - za + wb - yd)^2 + (xd - wa + yc - zb)^2. \]
We leave this algebra as an exercise.
□
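A numerical spot check is no substitute for the algebra, but it is a quick way to catch a sign error in the identity of Lemma 7.12. The sketch below (with the hypothetical name `four_square_product`) evaluates the four expressions on the right-hand side and compares the sum of their squares with 𝑚𝑛.

```python
import random


def four_square_product(x, y, z, w, a, b, c, d):
    """The four integers whose squares sum to (x^2+y^2+z^2+w^2)(a^2+b^2+c^2+d^2)."""
    return (x * a + y * b + z * c + w * d,
            x * b - y * a + z * d - w * c,
            x * c - z * a + w * b - y * d,
            x * d - w * a + y * c - z * b)


def sum_of_squares(values):
    return sum(v * v for v in values)


# Example: 31 = 5^2 + 2^2 + 1^2 + 1^2 and 120 = 10^2 + 4^2 + 2^2 + 0^2
combined = four_square_product(5, 2, 1, 1, 10, 4, 2, 0)
print(combined, sum_of_squares(combined), 31 * 120)
```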
Lemma 7.13. If 𝑝 is an odd prime, then there exist integers 𝑥, 𝑦, and 𝑘, with 0 < 𝑘 < 𝑝,
such that
\[ x^2 + y^2 + 1 = kp. \]
Proof. It suffices to find integers 𝑥 and 𝑦 with
\[ x^2 + y^2 + 1 \equiv 0 \pmod{p} \quad\text{and}\quad x^2 + y^2 + 1 < p^2. \tag{7.2} \]
We divide the proof into two cases.
If 𝑝 ≡ 1 (mod 4), then Theorem 6.5 (ii) implies that (−1/𝑝) = 1, so −1 is a quadratic
residue modulo 𝑝. Therefore we can find 𝑥 with 0 < 𝑥 < 𝑝/2 such that 𝑥² ≡ −1 (mod 𝑝),
and (7.2) is satisfied with 𝑦 = 0.
If 𝑝 ≡ 3 (mod 4), then Theorem 6.5 (ii) implies that (−1/𝑝) = −1. Now let 𝑎 be the
smallest quadratic non-residue modulo 𝑝. Then we have
\[ \left(\frac{-a}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{a}{p}\right) = (-1)(-1) = 1 \]
by Theorem 6.5 (iii), so −𝑎 is a quadratic residue modulo 𝑝. Therefore we can find 𝑥 with
0 < 𝑥 < 𝑝/2 such that 𝑥² ≡ −𝑎 (mod 𝑝). Furthermore, the minimality of 𝑎 ensures that
𝑎 − 1 is a quadratic residue modulo 𝑝, so we can find 𝑦 with 0 < 𝑦 < 𝑝/2 such that 𝑦² ≡ 𝑎 − 1
(mod 𝑝). It is easy to check that 𝑥 and 𝑦 satisfy (7.2), so this completes the proof.
□
Proof of Lagrange’s Theorem: In view of Lemma 7.12 and the fact that 2 = 1² + 1² + 0² + 0²,
it suffices to prove that every odd prime 𝑝 is the sum of four squares. By Lemma 7.13, we
can find integers 𝑥, 𝑦, 𝑧, and 𝑤 such that
\[ x^2 + y^2 + z^2 + w^2 = kp \tag{7.3} \]
for some positive integer 𝑘 < 𝑝. For instance, take 𝑥 and 𝑦 as in the lemma, 𝑧 = 1, and
𝑤 = 0. We employ a descent argument to show that we can find a solution to (7.3) with
𝑘 = 1. To do this, we suppose that we have a solution with 𝑘 > 1 and demonstrate how to
construct a solution with a smaller value of 𝑘. First of all, if 𝑘 is even, then an even number
of the variables on the left-hand side are odd, so by relabeling if necessary we may suppose
that 𝑥 ± 𝑦 and 𝑧 ± 𝑤 are even, and
\[ \left(\frac{x+y}{2}\right)^2 + \left(\frac{x-y}{2}\right)^2 + \left(\frac{z+w}{2}\right)^2 + \left(\frac{z-w}{2}\right)^2 = (k/2)p. \]
If 𝑘/2 is even, then we can repeat the argument until we obtain a solution to (7.3) with 𝑘
odd, so we may suppose from now on that 𝑘 is odd. Now let 𝑎, 𝑏, 𝑐, and 𝑑 denote the least
absolute value residues of 𝑥, 𝑦, 𝑧, and 𝑤 modulo 𝑘. That is,
𝑎 ≡ 𝑥 (mod 𝑘),
𝑏 ≡ 𝑦 (mod 𝑘),
𝑐 ≡ 𝑧 (mod 𝑘),
𝑑 ≡ 𝑤 (mod 𝑘),
where ∣𝑎∣, ∣𝑏∣, ∣𝑐∣, ∣𝑑∣ < 𝑘/2 since 𝑘 is odd. Then we have
\[ a^2 + b^2 + c^2 + d^2 \equiv x^2 + y^2 + z^2 + w^2 \equiv 0 \pmod{k}, \]
so we can write 𝑎² + 𝑏² + 𝑐² + 𝑑² = 𝑘𝑚 for some integer 𝑚, and 𝑎² + 𝑏² + 𝑐² + 𝑑² < 𝑘², so
we have 𝑚 < 𝑘. If 𝑚 = 0 then we would have 𝑎 = 𝑏 = 𝑐 = 𝑑 = 0, which would imply that
𝑘𝑝 = 𝑥² + 𝑦² + 𝑧² + 𝑤² is divisible by 𝑘². This cannot occur when 1 < 𝑘 < 𝑝 since 𝑝 is prime,
so we conclude that 𝑚 > 0. Now by the proof of Lemma 7.12 we can write
\[ (kp)(km) = X^2 + Y^2 + Z^2 + W^2, \]
where
𝑋 = 𝑥𝑎 + 𝑦𝑏 + 𝑧𝑐 + 𝑤𝑑,
𝑌 = 𝑥𝑏 − 𝑦𝑎 + 𝑧𝑑 − 𝑤𝑐,
𝑍 = 𝑥𝑐 − 𝑧𝑎 + 𝑤𝑏 − 𝑦𝑑,
𝑊 = 𝑥𝑑 − 𝑤𝑎 + 𝑦𝑐 − 𝑧𝑏,
and it is easy to check that 𝑋, 𝑌 , 𝑍, and 𝑊 are each divisible by 𝑘. It follows that
\[ (X/k)^2 + (Y/k)^2 + (Z/k)^2 + (W/k)^2 = mp, \]
which gives a solution of (7.3) with 0 < 𝑚 < 𝑘. This completes the descent and shows that
there is in fact a solution with 𝑘 = 1. □
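The descent in this proof is constructive, so it can be followed step by step in code. The sketch below is one possible rendering under stated assumptions: the starting pair from Lemma 7.13 is found by brute-force search rather than by the quadratic-residue argument, and the names `least_abs_residue` and `four_squares_of_prime` are hypothetical.

```python
def least_abs_residue(value, k):
    """Residue of value modulo odd k lying strictly between -k/2 and k/2."""
    r = value % k
    return r - k if r > k // 2 else r


def four_squares_of_prime(p):
    """Write an odd prime p as a sum of four squares, following the descent."""
    half = (p + 1) // 2
    # Lemma 7.13 by brute force: 0 <= x, y < p/2 with p dividing x^2 + y^2 + 1.
    x, y = next((x, y) for x in range(half) for y in range(half)
                if (x * x + y * y + 1) % p == 0)
    v = [x, y, 1, 0]
    k = sum(c * c for c in v) // p           # 0 < k < p
    while k > 1:
        if k % 2 == 0:
            # An even number of entries are odd: group equal parities in pairs.
            v.sort(key=lambda c: c % 2)
            x, y, z, w = v
            v = [(x + y) // 2, (x - y) // 2, (z + w) // 2, (z - w) // 2]
            k //= 2
        else:
            x, y, z, w = v
            a, b, c, d = [least_abs_residue(val, k) for val in v]
            m = (a * a + b * b + c * c + d * d) // k
            X = x * a + y * b + z * c + w * d
            Y = x * b - y * a + z * d - w * c
            Z = x * c - z * a + w * b - y * d
            W = x * d - w * a + y * c - z * b
            v = [X // k, Y // k, Z // k, W // k]   # each is divisible by k
            k = m
    return tuple(abs(c) for c in v)


print(four_squares_of_prime(7), four_squares_of_prime(103))
```

Combined with the identity of Lemma 7.12 and a factorization of 𝑛, this would give a representation of any positive integer, although the sketch stops at primes.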
Waring’s problem. One might ask whether similar results exist for higher powers. That
is, given a positive integer 𝑘, can we find a positive integer 𝑠 such that all positive integers
𝑛 can be written in the form
\[ n = x_1^k + x_2^k + \cdots + x_s^k \tag{7.4} \]
for some non-negative integers 𝑥₁, 𝑥₂, . . . , 𝑥ₛ? This question was posed by Waring in 1770
(around the same time as Lagrange’s Theorem was proved) and has received considerable
attention over the past century. The original version of the problem seeks to determine 𝑔(𝑘),
which is defined to be the smallest integer 𝑠 such that the above equation can be solved for
every positive integer 𝑛. For example, one has 𝑔(2) = 4. It is also known that 𝑔(3) = 9,
𝑔(4) = 19, and 𝑔(5) = 37 and that
\[ g(k) \ge 2^k + \left\lfloor \left(\tfrac{3}{2}\right)^k \right\rfloor - 2 \tag{7.5} \]
for all 𝑘, where ⌊𝑥⌋ denotes the greatest integer less than or equal to 𝑥. Notice that the
integer 23 really does require 9 cubes in order to achieve a representation. Since 23 < 3³
and 23 < 2³ + 2³ + 2³, the most efficient decomposition is
\[ 23 = 2^3 + 2^3 + 1^3 + 1^3 + 1^3 + 1^3 + 1^3 + 1^3 + 1^3. \]
Amazingly, it turns out that 23 and 239 are the only two integers that actually require 9
cubes. In fact, there are only finitely many integers that require 8 cubes, and it follows
that every sufficiently large integer can be expressed as the sum of 7 cubes. In general, we
define 𝐺(𝑘) to be the smallest integer 𝑠 such that every sufficiently large integer 𝑛 can be
represented in the form (7.4). For example, it is known that
\[ G(2) = 4, \qquad 4 \le G(3) \le 7, \qquad G(4) = 16, \qquad 6 \le G(5) \le 17, \qquad\text{and}\qquad 9 \le G(6) \le 24. \]
It turns out that 𝐺(𝑘) grows much slower than 𝑔(𝑘) as 𝑘 → ∞, reflecting the fact that the
representation of small integers poses some unusual difficulties that do not persist in the
long run. In fact, it was shown by Wooley in 1992 that 𝐺(𝑘) grows no faster than 𝑘 log 𝑘
asymptotically, whereas (7.5) shows that the growth of 𝑔(𝑘) is exponential in 𝑘. In the 1920s,
Hardy and Littlewood devised a method for counting the number of representations of 𝑛 in
the form (7.4) by using a definite integral. Refinements of this strategy due to Vinogradov,
Davenport, Vaughan, Wooley, and others have led to the sharpest available upper bounds
for 𝐺(𝑘) when 𝑘 ≥ 3. Notice that even in the cubic case, the existing technology still leaves
fairly large gaps between what is conjectured and what can be proved!
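Both the lower bound (7.5) and the claims about 23 and 239 can be checked for small cases. The sketch below uses a simple dynamic program to find the least number of positive 𝑘th powers summing to 𝑛; the function names `g_lower_bound` and `min_kth_powers` are hypothetical, and the dynamic program is a brute-force device, not a method from the theory described above.

```python
def g_lower_bound(k):
    """Right-hand side of (7.5): 2^k + floor((3/2)^k) - 2, in exact integer arithmetic."""
    return 2 ** k + (3 ** k // 2 ** k) - 2


def min_kth_powers(n, k):
    """Least number of positive k-th powers whose sum is n."""
    best = [0] + [n] * n                 # n copies of 1^k always suffice
    for total in range(1, n + 1):
        x = 1
        while x ** k <= total:
            best[total] = min(best[total], best[total - x ** k] + 1)
            x += 1
    return best[n]


print([g_lower_bound(k) for k in range(2, 6)])
print(min_kth_powers(23, 3), min_kth_powers(239, 3))
```

For 𝑘 = 2, 3, 4, 5 the bound evaluates to the known values of 𝑔(𝑘) quoted in the text, so (7.5) holds with equality in these cases.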
We give a very brief outline of the Hardy-Littlewood method. When 𝛼 is a real number
and 𝑖 = √−1, write
\[ e(\alpha) = e^{2\pi i\alpha} = \cos(2\pi\alpha) + i\sin(2\pi\alpha). \]
If 𝑚 is an integer, then it is easy to verify the orthogonality relations
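\[ \int_0^1 e(\alpha m)\,d\alpha = \int_0^1 \cos(2\pi\alpha m)\,d\alpha + i\int_0^1 \sin(2\pi\alpha m)\,d\alpha = \begin{cases} 1 & \text{if } m = 0, \\ 0 & \text{if } m \ne 0. \end{cases} \]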
If we let 𝑃 = ⌊𝑛^{1/𝑘}⌋ and introduce the exponential sum
\[ f(\alpha) = \sum_{x=1}^{P} e(\alpha x^k), \]
then the fact that 𝑒(𝑎)𝑒(𝑏) = 𝑒(𝑎 + 𝑏) gives
\[ \int_0^1 f(\alpha)^s e(-\alpha n)\,d\alpha = \sum_{x_1=1}^{P} \cdots \sum_{x_s=1}^{P} \int_0^1 e\bigl(\alpha(x_1^k + \cdots + x_s^k - n)\bigr)\,d\alpha, \]
and the orthogonality relations show that each term in the sum is 1 or 0 according to whether
or not 𝑥₁ᵏ + ⋯ + 𝑥ₛᵏ = 𝑛. The integral on the left therefore counts the representations of 𝑛
in this form, and demonstrating the existence of representations amounts to showing that
the integral is positive. This is a non-trivial task that involves dissecting the interval [0, 1]
into two subsets according to the nature of the rational approximations to 𝛼 and applying
several types of estimates for the exponential sum 𝑓 (𝛼). Notice that as the real variable 𝛼
runs from 0 to 1, the complex variable 𝑧 = 𝑒^{2𝜋𝑖𝛼} traces out the unit circle ∣𝑧∣ = 1. The
original set-up devised by Hardy and Littlewood actually takes the latter perspective, using
integrals over circles in the complex plane. For this reason, the technique is often referred to
as the circle method, and the two subsets mentioned above are called major and minor arcs.
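The counting identity behind the method can be tried out numerically on tiny cases. The sketch below does not evaluate the integral itself: it swaps in a discrete analogue, averaging over the points 𝛼 = 𝑗/𝑁 for a modulus 𝑁 chosen larger than every possible value of ∣𝑥₁ᵏ + ⋯ + 𝑥ₛᵏ − 𝑛∣, so that the finite orthogonality relation for 𝑁th roots of unity plays the role of the integral. All function names are hypothetical.

```python
import cmath
from itertools import product


def e(alpha):
    """e(alpha) = exp(2*pi*i*alpha)."""
    return cmath.exp(2j * cmath.pi * alpha)


def integer_root(n, k):
    """Largest integer P with P**k <= n."""
    P = 0
    while (P + 1) ** k <= n:
        P += 1
    return P


def count_by_orthogonality(n, k, s):
    """Count s-tuples of positive integers with x_1^k + ... + x_s^k = n."""
    P = integer_root(n, k)
    N = max(n, s * P ** k) + 1           # congruence mod N then forces equality
    total = 0
    for j in range(N):
        f = sum(e((j * x ** k % N) / N) for x in range(1, P + 1))
        total += f ** s * e(-(j * n % N) / N)
    return round((total / N).real)


def count_directly(n, k, s):
    """The same count by exhaustive enumeration, for comparison."""
    P = integer_root(n, k)
    return sum(1 for xs in product(range(1, P + 1), repeat=s)
               if sum(x ** k for x in xs) == n)


print(count_by_orthogonality(1729, 3, 2), count_directly(1729, 3, 2))
```

The count is over ordered tuples, so 1729 = 1³ + 12³ = 9³ + 10³ contributes four representations.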
8. Irrationality and transcendence
We have already seen in §2 that irrational numbers exist; for instance, √2 ∉ ℚ. In fact,
almost all real numbers are irrational, since the rationals form a countable set while the
reals are uncountable. On the other hand, given any two real numbers 𝛼 < 𝛽, we can find a
rational number lying between them. To see this, let 𝑛 be an integer with 𝑛 > 1/(𝛽 − 𝛼),
so that 𝑛𝛽 − 𝑛𝛼 > 1. Clearly there must be an integer 𝑚 between 𝑛𝛼 and 𝑛𝛽, and it
follows that the rational number 𝑚/𝑛 lies between 𝛼 and 𝛽. In particular, by choosing 𝛽
sufficiently close to 𝛼, we can find a rational number that approximates 𝛼 to any desired
degree of accuracy. This property is often expressed by saying that the rationals are dense
in the reals. In number theory, we often desire more quantitative information about rational
approximations. For instance, how does the quality of the approximation improve as we
allow the denominator to increase? This is the type of information that determines how we
dissect into major and minor arcs in the Hardy-Littlewood method. One simple answer is
given by the following theorem.
Theorem 8.1. (Dirichlet’s theorem on diophantine approximation) Given a real
number 𝛼 and an integer 𝑃 ≥ 2, there exist integers 𝑎 and 𝑞 with (𝑎, 𝑞) = 1 and 1 ≤ 𝑞 ≤
𝑃 − 1 such that
\[ \left|\alpha - \frac{a}{q}\right| \le \frac{1}{qP}. \]
Proof. It suffices to prove the result for 𝛼 ∈ [0, 1], since the general case can then be obtained
by replacing 𝑎/𝑞 by ⌊𝛼⌋+𝑎/𝑞. We divide the interval [0, 1] into 𝑃 subintervals, each of length
1/𝑃 , and consider the values of 𝑞𝛼 − ⌊𝑞𝛼⌋ as 𝑞 runs over the integers 1, 2, 3, . . . , 𝑃 − 1. First
of all, if 𝑞𝛼 − ⌊𝑞𝛼⌋ lies in the interval [0, 1/𝑃 ] for some 𝑞, then taking 𝑎 = ⌊𝑞𝛼⌋ gives
∣𝑞𝛼 − 𝑎∣ ≤ 1/𝑃 . Similarly, if 𝑞𝛼 − ⌊𝑞𝛼⌋ lies in the interval [1 − 1/𝑃, 1] for some 𝑞, then taking
𝑎 = ⌊𝑞𝛼⌋ + 1 gives ∣𝑞𝛼 − 𝑎∣ ≤ 1/𝑃 . If none of these 𝑃 − 1 values lies in the first or last
subinterval, then the pigeonhole principle ensures that two of them must lie in one of the
remaining 𝑃 − 2 subintervals. That is, we have
\[ \left|(q_2\alpha - \lfloor q_2\alpha\rfloor) - (q_1\alpha - \lfloor q_1\alpha\rfloor)\right| \le 1/P \]
for some integers 𝑞₁ and 𝑞₂ with 1 ≤ 𝑞₁ < 𝑞₂ ≤ 𝑃 − 1. Taking 𝑞 = 𝑞₂ − 𝑞₁ and 𝑎 = ⌊𝑞₂𝛼⌋ − ⌊𝑞₁𝛼⌋
again gives ∣𝑞𝛼 − 𝑎∣ ≤ 1/𝑃. Finally, if (𝑎, 𝑞) = 𝑑 then setting 𝑎′ = 𝑎/𝑑 and 𝑞′ = 𝑞/𝑑 gives
(𝑞′, 𝑎′) = 1 and ∣𝑞′𝛼 − 𝑎′∣ ≤ 1/(𝑑𝑃) ≤ 1/𝑃, which completes the proof.
□
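Since the theorem guarantees a good denominator among 1, 2, . . . , 𝑃 − 1, a direct search will always find one. The sketch below is such a search in floating-point arithmetic, so it is only meaningful when 𝛼 is not within rounding error of a boundary case; the function name `dirichlet_approximation` is hypothetical.

```python
from fractions import Fraction
from math import sqrt


def dirichlet_approximation(alpha, P):
    """Search q = 1, ..., P-1 for an integer a with |q*alpha - a| <= 1/P.

    Returns a/q in lowest terms, or None if the floating-point search fails.
    """
    for q in range(1, P):
        a = round(q * alpha)              # nearest integer to q*alpha
        if abs(q * alpha - a) <= 1.0 / P:
            return Fraction(a, q)         # Fraction reduces to lowest terms
    return None


for P in (2, 10, 100):
    print(P, dirichlet_approximation(sqrt(2), P))
```

Reducing to lowest terms only shrinks the denominator, so the bound 1/(𝑞𝑃) continues to hold for the reduced fraction.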
Corollary 8.2. If 𝛼 is an irrational number, then there are infinitely many rational numbers
𝑎/𝑞 for which
\[ \left|\alpha - \frac{a}{q}\right| < \frac{1}{q^2}. \]
Proof. If there were only finitely many such rational approximations to 𝛼, then we could
find one, say 𝑎/𝑞, with 𝛿 = ∣𝛼 − 𝑎/𝑞∣ minimal. Since 𝛼 ∕∈ ℚ, we have 𝛿 > 0, so we may let
𝑃 = ⌊1/𝛿⌋ + 1 > 1/𝛿. By Theorem 8.1, we can find a rational number 𝑏/𝑟 with 1 ≤ 𝑟 < 𝑃
and
\[ \left|\alpha - \frac{b}{r}\right| \le \frac{1}{rP} < \frac{1}{r^2}. \]
Since 1/(𝑟𝑃 ) < 𝛿, this contradicts the minimality of 𝛿.
□
Note that if 𝛼 is rational then the inequality in Corollary 8.2 has only finitely many
solutions. To see this, write 𝛼 = 𝑚/𝑛 and note that if 𝑚/𝑛 ∕= 𝑎/𝑞 then we have
\[ \left|\frac{m}{n} - \frac{a}{q}\right| = \frac{|mq - an|}{nq} \ge \frac{1}{nq} \ge \frac{1}{q^2} \]
whenever 𝑞 ≥ 𝑛, so the only possible solutions come from 1 ≤ 𝑞 < 𝑛. A theorem of Hurwitz
shows that there are in fact infinitely many solutions of
\[ \left|\alpha - \frac{a}{q}\right| < \frac{1}{\sqrt{5}\,q^2} \]
when 𝛼 is irrational. This turns out to be best possible in the sense that the result fails if
the constant 1/√5 is replaced by anything smaller. However, the golden ratio 𝛼 = ½(1 + √5)
provides essentially the only counterexample!
Continued fractions. One way of generating good rational approximations to an irrational number 𝛼 is to construct the continued fraction expansion
\[ \alpha = x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cfrac{1}{x_4 + \cdots}}}}. \]
To save space, this is sometimes denoted by 𝛼 = [𝑥₀; 𝑥₁, 𝑥₂, 𝑥₃, 𝑥₄, . . . ]. We can construct
continued fractions for rational numbers as well, but in this case the expansion is finite.
Example 8.3. Express the rational number 125/54 as a finite continued fraction.
Solution. We first split off the integer part by writing
\[ \frac{125}{54} = 2 + \frac{17}{54}. \]
Next we take the reciprocal of the fractional part and repeat the process. We have
\[ \frac{54}{17} = 3 + \frac{3}{17}, \qquad \frac{17}{3} = 5 + \frac{2}{3}, \qquad\text{and}\qquad \frac{3}{2} = 1 + \frac{1}{2}. \]
Thus we have
\[ \frac{125}{54} = 2 + \cfrac{1}{3 + \cfrac{1}{5 + \cfrac{1}{1 + \cfrac{1}{2}}}} = [2; 3, 5, 1, 2]. \]
□
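The procedure in Example 8.3 is the Euclidean algorithm in disguise: the partial quotients are the successive integer quotients. A minimal sketch, assuming a positive denominator and using the hypothetical name `rational_continued_fraction`:

```python
def rational_continued_fraction(num, den):
    """Partial quotients [x0, x1, ...] of num/den, assuming den > 0."""
    quotients = []
    while den != 0:
        quotients.append(num // den)     # split off the integer part
        num, den = den, num % den        # take the reciprocal of the fractional part
    return quotients


print(rational_continued_fraction(125, 54))
```

The loop terminates because the remainders strictly decrease, which is why the expansion of a rational number is always finite.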
Example 8.4. What real number is represented by the continued fraction [1; 1, 1, 1, 1, . . . ]?
Solution. If 𝛼 = [1; 1, 1, 1, 1, . . . ] then we have
\[ \alpha = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}} = 1 + \frac{1}{\alpha}. \]
It follows that 𝛼² − 𝛼 − 1 = 0, and since 𝛼 is clearly positive we may conclude that
\[ \alpha = \frac{1 + \sqrt{5}}{2}. \]
□
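To generate the continued fraction for 𝛼, we first take 𝑥₀ = ⌊𝛼⌋ and then write
\[ \alpha_1 = \frac{1}{\alpha - x_0} \quad\text{and}\quad x_1 = \lfloor \alpha_1 \rfloor. \]
In general, if 𝛼ₙ and 𝑥ₙ have been defined, we take
\[ \alpha_{n+1} = \frac{1}{\alpha_n - x_n} \quad\text{and}\quad x_{n+1} = \lfloor \alpha_{n+1} \rfloor. \]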
Example 8.5. Compute the continued fraction for √2.
Solution. First of all, we have 𝑥₀ = ⌊√2⌋ = 1. Next, we have
\[ \alpha_1 = \frac{1}{\sqrt{2} - 1} = \sqrt{2} + 1, \]
and hence 𝑥₁ = 2. Furthermore,
\[ \alpha_2 = \frac{1}{\alpha_1 - 2} = \frac{1}{\sqrt{2} - 1} = \alpha_1, \]
and hence 𝑥₂ = 2. Since 𝛼ₙ₊₁ depends only on 𝛼ₙ and 𝑥ₙ, we can conclude that 𝑥ₙ = 2 for
all 𝑛 ≥ 1. Therefore,
\[ \sqrt{2} = [1; 2, 2, 2, 2, \ldots] = [1; \overline{2}]. \]
□
By truncating the continued fraction obtained above, we can obtain rational approximations
to √2, for instance
\[ \frac{p_1}{q_1} = 1 + \frac{1}{2} = \frac{3}{2}, \qquad \frac{p_2}{q_2} = 1 + \cfrac{1}{2 + \cfrac{1}{2}} = \frac{7}{5}, \qquad\text{and}\qquad \frac{p_3}{q_3} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2}}} = \frac{17}{12}. \]
The rational number 𝑝ₙ/𝑞ₙ is called the 𝑛th convergent to 𝛼, and the integer 𝑥ₙ is called the
𝑛th partial quotient of 𝛼. It turns out that the convergents satisfy some simple recurrence
relations, which make them easy to compute once the partial quotients are known.
Theorem 8.6. If 𝛼 has the continued fraction expansion [𝑥₀; 𝑥₁, 𝑥₂, 𝑥₃, . . . ], then the 𝑛th
convergent to 𝛼 is the rational number 𝑝ₙ/𝑞ₙ defined by the recurrence relations
\[ p_n = x_n p_{n-1} + p_{n-2} \quad\text{and}\quad q_n = x_n q_{n-1} + q_{n-2} \qquad (n \ge 0), \]
where we take 𝑝₋₁ = 1, 𝑞₋₁ = 0, 𝑝₋₂ = 0, and 𝑞₋₂ = 1.
Proof. We regard the convergents 𝑝ₙ/𝑞ₙ as functions of the partial quotients. That is,
𝑝ₙ = 𝑝ₙ(𝑥₀, 𝑥₁, . . . , 𝑥ₙ) and 𝑞ₙ = 𝑞ₙ(𝑥₀, 𝑥₁, . . . , 𝑥ₙ). The result is clear for 𝑛 = 0, since
the recursions give 𝑝₀ = 𝑥₀ and 𝑞₀ = 1. Now suppose that [𝑥₀; 𝑥₁, . . . , 𝑥ₙ₋₁] = 𝑝ₙ₋₁/𝑞ₙ₋₁.
Then we can write
\[ [x_0; x_1, \ldots, x_{n-1}, x_n] = \left[x_0; x_1, \ldots, x_{n-1} + \frac{1}{x_n}\right] = \frac{p_{n-1}\left(x_0, x_1, \ldots, x_{n-1} + \frac{1}{x_n}\right)}{q_{n-1}\left(x_0, x_1, \ldots, x_{n-1} + \frac{1}{x_n}\right)}. \]
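Applying the above recurrence relations, we obtain
\[ [x_0; x_1, \ldots, x_{n-1}, x_n] = \frac{\left(x_{n-1} + \frac{1}{x_n}\right)p_{n-2} + p_{n-3}}{\left(x_{n-1} + \frac{1}{x_n}\right)q_{n-2} + q_{n-3}} = \frac{x_n x_{n-1} p_{n-2} + p_{n-2} + x_n p_{n-3}}{x_n x_{n-1} q_{n-2} + q_{n-2} + x_n q_{n-3}} \]
\[ = \frac{x_n(x_{n-1} p_{n-2} + p_{n-3}) + p_{n-2}}{x_n(x_{n-1} q_{n-2} + q_{n-3}) + q_{n-2}} = \frac{x_n p_{n-1} + p_{n-2}}{x_n q_{n-1} + q_{n-2}} = \frac{p_n}{q_n}. \]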
The result follows by induction.
□
Example 8.7. Find the continued fraction expansion for √29, and compute the first 6
convergents.
Solution. We have 𝑥₀ = ⌊√29⌋ = 5, and thus
\[ \alpha_1 = \frac{1}{\sqrt{29} - 5} = \frac{\sqrt{29} + 5}{4} = 2 + \frac{\sqrt{29} - 3}{4}. \]
It follows that 𝑥₁ = 2 and
\[ \alpha_2 = \frac{4}{\sqrt{29} - 3} = \frac{\sqrt{29} + 3}{5} = 1 + \frac{\sqrt{29} - 2}{5}. \]
This in turn gives 𝑥₂ = 1 and
\[ \alpha_3 = \frac{5}{\sqrt{29} - 2} = \frac{\sqrt{29} + 2}{5} = 1 + \frac{\sqrt{29} - 3}{5}, \]
which yields 𝑥₃ = 1 and
\[ \alpha_4 = \frac{5}{\sqrt{29} - 3} = \frac{\sqrt{29} + 3}{4} = 2 + \frac{\sqrt{29} - 5}{4}. \]
Now we have 𝑥₄ = 2 and
\[ \alpha_5 = \frac{4}{\sqrt{29} - 5} = \sqrt{29} + 5 = 10 + (\sqrt{29} - 5), \]
and from this we see that 𝑥₅ = 10 and 𝛼₆ = 𝛼₁, which means that the continued fraction
becomes periodic. We therefore conclude that
\[ \sqrt{29} = [5; \overline{2, 1, 1, 2, 10}], \]
and we can use Theorem 8.6 to compute the convergents. We have 𝑝₀ = 5, 𝑝₁ = 2 ⋅ 5 + 1 = 11,
𝑝₂ = 1 ⋅ 11 + 5 = 16, 𝑝₃ = 1 ⋅ 16 + 11 = 27, 𝑝₄ = 2 ⋅ 27 + 16 = 70, and 𝑝₅ = 10 ⋅ 70 + 27 = 727.
Similarly, we get 𝑞₀ = 1, 𝑞₁ = 2, 𝑞₂ = 1 ⋅ 2 + 1 = 3, 𝑞₃ = 1 ⋅ 3 + 2 = 5, 𝑞₄ = 2 ⋅ 5 + 3 = 13, and
𝑞₅ = 10 ⋅ 13 + 5 = 135. Hence the first 6 convergents are
\[ 5, \quad \frac{11}{2}, \quad \frac{16}{3}, \quad \frac{27}{5}, \quad \frac{70}{13}, \quad\text{and}\quad \frac{727}{135}. \]
□
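The computation in Example 8.7 can be mechanized. In the sketch below, the partial quotients of √𝑛 are produced by a standard integer recurrence that tracks each 𝛼ⱼ in the form (√𝑛 + 𝑚)/𝑑; this recurrence is swapped in for the hand calculation and does not appear in the text, while `convergents` follows the recurrence relations of Theorem 8.6 directly. Both function names are hypothetical.

```python
from math import isqrt


def sqrt_partial_quotients(n, count):
    """First `count` partial quotients of sqrt(n), for n not a perfect square."""
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, d, a = 0, 1, a0                   # current alpha_j is (sqrt(n) + m) / d
    quotients = [a0]
    for _ in range(count - 1):
        m = d * a - m
        d = (n - m * m) // d             # this division is always exact
        a = (a0 + m) // d
        quotients.append(a)
    return quotients


def convergents(quotients):
    """Pairs (p_n, q_n) from Theorem 8.6, starting from p_{-1}=1, q_{-1}=0, p_{-2}=0, q_{-2}=1."""
    p_prev2, p_prev1 = 0, 1
    q_prev2, q_prev1 = 1, 0
    result = []
    for x in quotients:
        p = x * p_prev1 + p_prev2
        q = x * q_prev1 + q_prev2
        result.append((p, q))
        p_prev2, p_prev1 = p_prev1, p
        q_prev2, q_prev1 = q_prev1, q
    return result


xs = sqrt_partial_quotients(29, 6)
print(xs)
print(convergents(xs))
```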
Algebraic and transcendental numbers. A real number that is a root of a non-trivial
polynomial with integer coefficients is said to be algebraic. More precisely, if 𝛼 is a root of a
polynomial of degree 𝑘 with integer coefficients that is irreducible over ℚ, then we say that 𝛼
is algebraic of degree 𝑘. Note that any rational number 𝑝/𝑞 is algebraic of degree one, since
it is a root of the polynomial 𝑓(𝑥) = 𝑞𝑥 − 𝑝. Any real number of the form 𝑎 ± 𝑏√𝑑, where
𝑎, 𝑏, and 𝑑 are rational and 𝑑 is not a perfect square, is algebraic of degree two. For instance,
½(1 + √5) is algebraic of degree two, since it is a root of 𝑓(𝑥) = 𝑥² − 𝑥 − 1. Algebraic numbers
of degree two are sometimes called quadratic irrationals. It turns out that a number is a
quadratic irrational if and only if it has an eventually periodic continued fraction expansion.
The set of algebraic numbers is closed under addition and multiplication, but the set of
algebraic numbers of degree 𝑘 is not. For instance, √2 and √3 are algebraic of degree 2, but
√2 + √3 is algebraic of degree 4 and √2 ⋅ √2 = 2 is algebraic of degree 1.
Example 8.8. Prove that 𝛼 = √2 + √3 is algebraic.
Solution. First of all, we have 𝛼² = 2 + 2√6 + 3, and hence 𝛼² − 5 = 2√6. Squaring both
sides gives 𝛼⁴ − 10𝛼² + 25 = 24, or 𝛼⁴ − 10𝛼² + 1 = 0. Thus 𝛼 is a root of the polynomial
𝑓(𝑥) = 𝑥⁴ − 10𝑥² + 1 and hence is algebraic of degree at most 4. One can in fact show that
𝑓 is irreducible over ℚ and hence that 𝛼 is algebraic of degree 4.
□
Real numbers that are not algebraic are called transcendental. Probably the two most
famous transcendental numbers are 𝑒 and 𝜋. Proving the transcendence of 𝑒 and 𝜋 is beyond
the scope of the course; however, it is not too difficult to show that 𝑒 is irrational.
Theorem 8.9. The number 𝑒 is irrational.
Proof. Suppose to the contrary that 𝑒 is rational, say 𝑒 = 𝑝/𝑞, where 𝑝 and 𝑞 are integers
with 𝑞 ≥ 1. We recall that 𝑒 can be expressed as the infinite series
\[ e = \sum_{k=0}^{\infty} \frac{1}{k!}. \]
Let 𝑛 ≥ 2𝑞 be an integer, and let 𝑒ₙ denote the 𝑛th partial sum of this series; that is,
\[ e_n = \sum_{k=0}^{n} \frac{1}{k!} = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \cdots + \frac{1}{n!}. \]
Clearly 𝑒ₙ is rational, and we can write 𝑒ₙ = 𝑎/𝑛! for some integer 𝑎. Moreover, we have
𝑒 > 𝑒ₙ, and thus
\[ e - e_n = \frac{p}{q} - \frac{a}{n!} = \frac{p\,n! - aq}{q\,n!} \ge \frac{1}{q\,n!}. \]
On the other hand, we have
\[ e - e_n = \sum_{k=n+1}^{\infty} \frac{1}{k!} = \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \frac{1}{(n+3)!} + \cdots \]
\[ < \frac{1}{(n+1)!}\left(1 + \frac{1}{n} + \frac{1}{n^2} + \cdots\right) = \frac{1}{(n+1)!} \cdot \frac{1}{1 - 1/n} \le \frac{2}{(n+1)!} \]
since 𝑛 ≥ 2. Combining our two inequalities, we obtain
\[ \frac{1}{q\,n!} \le e - e_n \le \frac{2}{(n+1)!}, \]
which implies that 𝑛 ≤ 2𝑞 − 1, a contradiction.
□
The idea of the preceding proof may be summarized by saying that 𝑒 has rational approximations (namely 𝑒ₙ) that are “too good” to allow 𝑒 to be rational, since two distinct
rationals differ by at least the reciprocal of the product of the denominators. The following
theorem may be viewed as a generalization of this idea. It states that algebraic numbers
cannot have fantastically good rational approximations.
Theorem 8.10. (Liouville’s Theorem) Suppose that 𝛼 is an algebraic number of degree
𝑘 ≥ 2. Then there exists a positive constant 𝑐𝛼 such that
¯
¯
¯
¯
¯𝛼 − 𝑎 ¯ > 𝑐𝛼
¯
𝑞 ¯ 𝑞𝑘
for all integers 𝑎 and 𝑞 with 𝑞 ≥ 1.
Proof. Suppose that 𝛼 is a root of the irreducible polynomial
𝑃 (𝑥) = 𝑏𝑘 𝑥𝑘 + 𝑏𝑘−1 𝑥𝑘−1 + ⋅ ⋅ ⋅ + 𝑏1 𝑥 + 𝑏0 ,
where 𝑘 ≥ 2, and let 𝑎 and 𝑞 be integers with 𝑞 ≥ 1. First of all, we note that 𝑃 (𝑎/𝑞) ∕= 0,
since 𝑃 is irreducible of degree at least two. Furthermore, it is clear that 𝑞 𝑘 𝑃 (𝑎/𝑞) is
an integer and hence that 𝑞 𝑘 ∣𝑃 (𝑎/𝑞)∣ ≥ 1. Since 𝛼 is a root of 𝑃 , we may write 𝑃 (𝑥) =
(𝑥−𝛼)𝑄(𝑥), where 𝑄 is a polynomial of degree 𝑘 −1, not necessarily with integer coefficients.
Since 𝑄 is a continuous function, we know that it attains maximum and minimum values on
any closed, bounded interval. Therefore, there exists 𝑀𝛼 > 0 such that ∣𝑄(𝑥)∣ ≤ 𝑀𝛼 for all
𝑥 ∈ [𝛼 − 1, 𝛼 + 1]. We set 𝑐𝛼 = (1 + 𝑀𝛼 )−1 and consider two cases. If ∣𝛼 − 𝑎/𝑞∣ ≤ 1, then
we have
𝑞 −𝑘 ≤ ∣𝑃 (𝑎/𝑞)∣ ≤ ∣𝛼 − 𝑎/𝑞∣∣𝑄(𝑎/𝑞)∣ ≤ ∣𝛼 − 𝑎/𝑞∣𝑀𝛼 < ∣𝛼 − 𝑎/𝑞∣𝑐−1
𝛼 ,
which gives ∣𝛼 − 𝑎/𝑞∣ > 𝑐𝛼 𝑞 −𝑘 , as required. If ∣𝛼 − 𝑎/𝑞∣ > 1, then the desired inequality
follows from the observation that 𝑐𝛼 ≤ 1.
□
Example 8.11. Find an admissible value for $c_\alpha$ in Liouville’s Theorem when $\alpha = \sqrt[3]{2}$.
Solution. In the notation of the above proof, we have
\[ P(x) = x^3 - 2 = (x - \alpha)(x^2 + \alpha x + \alpha^2) = (x - \alpha)Q(x). \]
Since $Q'(x) = 2x + \alpha$, we find that $Q$ is increasing on the interval $[\alpha - 1, \alpha + 1]$ and hence that $Q(\alpha - 1) \le Q(x) \le Q(\alpha + 1)$ for all $x$ in the interval $[\alpha - 1, \alpha + 1]$. Since $Q(\alpha - 1) > 0$ and $Q(\alpha + 1) = 3\alpha^2 + 3\alpha + 1 < 9.542$, we can take $M_\alpha = 9.542$ and thus any $c_\alpha < (10.542)^{-1}$ is admissible. For example, one has
\[ \left| \sqrt[3]{2} - \frac{a}{q} \right| > \frac{1}{11 q^3} \]
for all integers $a$ and all positive integers $q$.
□
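The constant found in Example 8.11 can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it works in floating point, tests only denominators up to a chosen `q_max`, and the names `Q` and `worst_ratio` are hypothetical. For each $q$ it uses the nearest numerator $a$, which is the hardest case for the inequality.

```python
ALPHA = 2 ** (1 / 3)   # floating-point approximation of the cube root of 2

def Q(x):
    """The cofactor Q(x) = x^2 + alpha*x + alpha^2 from P(x) = (x - alpha) Q(x)."""
    return x * x + ALPHA * x + ALPHA * ALPHA

def worst_ratio(q_max):
    """Minimum of q^3 * |alpha - a/q| over 1 <= q <= q_max, with a the nearest integer to q*alpha."""
    best = float("inf")
    for q in range(1, q_max + 1):
        a = round(q * ALPHA)   # nearest numerator minimises |alpha - a/q| for this q
        best = min(best, q ** 3 * abs(ALPHA - a / q))
    return best

M = Q(ALPHA + 1)              # maximum of Q on [alpha - 1, alpha + 1]
print(M, 1 / (1 + M))         # M is just below 9.542, so c = 1/11 is admissible
print(worst_ratio(1000))      # should stay above 1/11
```

In this range the smallest value of $q^3\,|\alpha - a/q|$ occurs at $q = 1$, so the Liouville bound is far from sharp here.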
One might hope that the proof of Theorem 8.9 could be modified to show that $e$ is transcendental using the contrapositive of Liouville’s Theorem. However, the quality of the rational approximations $e_n$ is not sufficient to make this argument work. We note that $e_n$ has denominator $q = n!$, but $2/(n+1)! > 1/(n!)^2 = 1/q^2$, so the inequality $|e - e_n| < 2/(n+1)!$ doesn’t even rule out the possibility that $e$ is a quadratic irrational! Therefore a more sophisticated argument is required to prove that $e$ is transcendental. However, we can establish the existence of transcendental numbers by working with a series that converges much faster.
Theorem 8.12. The number
\[ \alpha = \sum_{j=1}^{\infty} 10^{-j!} = 0.110001000000000000000001000\ldots \]
is transcendental.
Proof. We write
\[ \frac{a_n}{q_n} = \sum_{j=1}^{n} 10^{-j!}, \quad \text{where } q_n = 10^{n!}. \]
We then have
\begin{align*}
\left| \alpha - \frac{a_n}{q_n} \right| &= \sum_{j=n+1}^{\infty} 10^{-j!} = 10^{-(n+1)!} + 10^{-(n+2)!} + 10^{-(n+3)!} + \cdots \\
&< 10^{-(n+1)!}\left(1 + 10^{-1} + 10^{-2} + \cdots\right) = \frac{10}{9} \cdot 10^{-(n+1)!} = \frac{10}{9}\, q_n^{-(n+1)}.
\end{align*}
If $\alpha$ is algebraic of degree $k \ge 2$, then Liouville’s Theorem implies that there is a constant $c > 0$ such that $|\alpha - a_n/q_n| > c\,q_n^{-k}$ for all $n$. This statement holds for $k = 1$ as well, since $\alpha \ne a_n/q_n$ and hence $\alpha = b/r \implies |\alpha - a_n/q_n| \ge (r q_n)^{-1}$, whence we can take $c = 1/(r+1)$. Thus if $\alpha$ is algebraic of degree $k$ we have
\[ c\,q_n^{-k} < |\alpha - a_n/q_n| < \frac{10}{9}\, q_n^{-(n+1)}, \]
and thus $q_n^{n+1-k} < 10/(9c)$. But $q_n \to \infty$ as $n \to \infty$, so we obtain a contradiction by taking $n$ sufficiently large in terms of $k$ and $c$.
□
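The approximation estimate in the proof of Theorem 8.12 can be checked exactly for small $n$. This is a rough sketch with hypothetical helper names (`approximant`, `digits`); as a stand-in for $\alpha - a_n/q_n$ it uses the first two terms of the tail, which is a lower bound for the true difference.

```python
from fractions import Fraction
from math import factorial

def approximant(n):
    """Return a_n/q_n = sum_{j=1}^{n} 10^(-j!) as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, n + 1))

def digits(num_digits):
    """Decimal digits of alpha after the point: digit i is 1 exactly when i = j!."""
    ones = set()
    j = 1
    while factorial(j) <= num_digits:
        ones.add(factorial(j))
        j += 1
    return "".join("1" if i in ones else "0" for i in range(1, num_digits + 1))

print("0." + digits(30))

for n in range(1, 5):
    q_n = 10 ** factorial(n)
    gap = approximant(n + 2) - approximant(n)    # first two terms of the tail
    bound = Fraction(10, 9) / q_n ** (n + 1)     # (10/9) * q_n^(-(n+1))
    print(n, gap < bound)
```

Because the exponent $n + 1$ in the bound grows with $n$, no fixed power $q_n^{-k}$ can serve as a lower bound, which is the point of the proof.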
Some open questions. A real number $\alpha$ is said to be badly approximable if there is a
positive constant $c_\alpha$ such that $|\alpha - a/q| > c_\alpha q^{-2}$ for all integers $a$ and $q$ with $q \ge 1$. Liouville’s
Theorem shows that all algebraic numbers of degree two (i.e., all quadratic irrationals) are
badly approximable. It is conjectured that no algebraic numbers of degree greater than
two are badly approximable, but this has not been proven. It turns out that a number is
badly approximable if and only if the partial quotients in its continued fraction expansion
are bounded. For instance, 𝑒 is not badly approximable, for it can be shown that
𝑒 = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, . . . ].
By contrast, it is unknown whether 𝜋 is badly approximable (the conjecture is that it’s
not). We do know that most real numbers are not badly approximable, in the sense that the
badly approximable numbers have measure zero in the real line. However, the set of badly
approximable numbers is uncountable (like the reals), whereas the set of algebraic numbers
is countable (like the integers and the rationals). Therefore, there are uncountably many
badly approximable transcendental numbers, but producing a single specific example seems
to be non-trivial.
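The continued fraction expansion of $e$ quoted above can be reproduced from a sufficiently accurate rational approximation. The sketch below is illustrative only: it assumes that the partial sum $e_{30}$ of the series $\sum 1/k!$ is close enough to $e$ for the first 18 partial quotients to agree, and the function name `continued_fraction` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def continued_fraction(x, terms):
    """Return up to `terms` partial quotients of the rational number x."""
    quotients = []
    for _ in range(terms):
        a = x.numerator // x.denominator    # integer part of x
        quotients.append(a)
        fractional = x - a
        if fractional == 0:                 # expansion of a rational terminates
            break
        x = 1 / fractional
    return quotients

# e_30 differs from e by less than 2/31!, far smaller than the spacing
# between numbers that share these 18 partial quotients.
e_approx = sum(Fraction(1, factorial(k)) for k in range(31))
print(continued_fraction(e_approx, 18))
```

The partial quotients 2, 4, 6, 8, ... grow without bound, which is why $e$ is not badly approximable.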
On a more basic level, much is still unknown about the irrationality and transcendence of familiar numbers. For instance, it is not known whether the numbers $\pi \pm e$, $\pi/e$, $\pi^e$, $\pi^{\sqrt{2}}$, $2^e$, and $\log \pi$ are irrational. As another example, consider the Riemann zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \]
for $s > 1$. When $s$ is a positive even integer, it is known that $\zeta(s)$ is a rational multiple of $\pi^s$ (and hence transcendental); for example, $\zeta(2) = \pi^2/6$. Much less is known when $s$ is odd.
It was proved by Apéry in 1979 that 𝜁(3) is irrational, but it is unknown whether 𝜁(3) is
transcendental. It is unknown whether 𝜁(5) is irrational, although it has been shown that at
least one of 𝜁(5), 𝜁(7), 𝜁(9), and 𝜁(11) must be irrational (Zudilin, 2001). In fact, it is known
that there are infinitely many odd integers 𝑠 for which 𝜁(𝑠) is irrational (Rivoal, 2000), but
the irrationality is not known for any particular odd 𝑠 > 3.
9. The distribution of primes
Suppose that $p_n$ denotes the $n$th prime, so that $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, and so on. One of the central problems in analytic number theory is to obtain precise information about the behavior of $p_n$ as $n \to \infty$. A simple way to get an idea of how the sequence $(p_n)$ is distributed is to look at the sum of the reciprocals of the terms:
\[ \sum_{n=1}^{\infty} \frac{1}{p_n} = \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \frac{1}{13} + \frac{1}{17} + \cdots. \tag{9.1} \]
As a starting point, we recall from calculus that the harmonic series $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$ diverges. The following lemma provides quantitative information about the growth rate of the partial sums. Specifically, it says that the sum of the first $N$ terms is roughly $\log N$.
Lemma 9.1. For all positive integers $N$, one has
\[ 0 < 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{N} - \log N \le 1. \]
Proof. By using a left-hand Riemann sum to over-estimate the area under the graph of $y = 1/t$, we find that
\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{N} > \int_1^{N+1} \frac{dt}{t} = \log(N+1) > \log N. \]
By considering a right-hand Riemann sum we similarly obtain
\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{N} \le 1 + \int_1^{N} \frac{dt}{t} = 1 + \log N, \]
and the result follows by subtracting $\log N$ from each side of the above inequalities.
□
It turns out that the quantity considered above actually approaches a limit as $N \to \infty$, known as Euler’s constant:
\[ \gamma = \lim_{N \to \infty} \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{N} - \log N \right) = 0.57721566490153286060651209008240243\ldots \]
Although Euler’s constant is known to over 1,000,000 decimal places, it is still unknown
whether it is irrational. It is conjectured to be transcendental.
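Lemma 9.1 and the convergence to Euler’s constant are easy to observe numerically. The following is a small sketch in floating point; the constant `EULER_GAMMA` is the value quoted above rounded to double precision, and the function name is an illustrative choice.

```python
import math

EULER_GAMMA = 0.5772156649015329   # value quoted in the text, rounded to a double

def harmonic_minus_log(N):
    """Return 1 + 1/2 + ... + 1/N - log N."""
    return math.fsum(1.0 / k for k in range(1, N + 1)) - math.log(N)

for N in (1, 10, 100, 1000, 10**4, 10**5):
    d = harmonic_minus_log(N)
    print(N, d, d - EULER_GAMMA)    # the last column shrinks roughly like 1/(2N)
```

Every value lies in the interval $(0, 1]$ predicted by Lemma 9.1, with equality to 1 only at $N = 1$.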
We now return to the prime harmonic series (9.1). It turns out that the $N$th partial sum of this series is on the order of $\log \log N$ rather than $\log N$. In what follows, it is useful to call an integer square-free if it is not divisible by the square of any prime. In other words, $n$ is square-free if we can write $n = p_1 \cdots p_r$, where $p_1, \ldots, p_r$ are distinct primes. For instance, the integer $42 = 2 \cdot 3 \cdot 7$ is square-free but $45 = 3^2 \cdot 5$ is not. From now on, the letter $p$ is reserved to denote a prime unless otherwise indicated.
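Theorem 9.2. For every integer $N > 1$, one has
\[ \sum_{p \le N} \frac{1}{p} > \log \log N - \log 2. \]
Proof. Every positive integer $n$ can be written uniquely in the form $n = q m^2$, where $q$ and $m$ are positive integers with $q$ square-free. Using Lemma 9.1, we obtain
\[ \log N < \sum_{n \le N} \frac{1}{n} = \sum_{\substack{q \le N \\ q \text{ squarefree}}} \; \sum_{m \le \sqrt{N/q}} \frac{1}{q m^2} \le \sum_{\substack{q \le N \\ q \text{ squarefree}}} \frac{1}{q} \sum_{m=1}^{\infty} \frac{1}{m^2}. \]
Furthermore, we have
\[ \sum_{m=1}^{\infty} \frac{1}{m^2} < 1 + \int_1^{\infty} \frac{dt}{t^2} = 2, \]
and the inequality $1 + u \le e^u$ yields
\[ \sum_{\substack{q \le N \\ q \text{ squarefree}}} \frac{1}{q} \le \prod_{p \le N} \left( 1 + \frac{1}{p} \right) \le \prod_{p \le N} e^{1/p} = \exp\left( \sum_{p \le N} \frac{1}{p} \right). \]
We therefore deduce that
\[ \log N < 2 \exp\left( \sum_{p \le N} \frac{1}{p} \right), \]
and taking logarithms gives
\[ \log \log N < \log 2 + \sum_{p \le N} \frac{1}{p}, \]
as required.
□
Corollary 9.3. The prime harmonic series $\sum_{n=1}^{\infty} \frac{1}{p_n}$ diverges.
Proof. This follows immediately from Theorem 9.2, since $\lim_{N \to \infty} (\log \log N) = \infty$.
□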
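To get a feel for how slowly the prime harmonic series diverges, one can compare its partial sums with the lower bound of Theorem 9.2. The sketch below uses a basic sieve of Eratosthenes; the function names are illustrative assumptions, and the sums are computed in floating point.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # cross out the multiples i*i, i*i + i, ... of the prime i
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def prime_reciprocal_sum(N):
    """Return the sum of 1/p over primes p <= N."""
    return math.fsum(1.0 / p for p in primes_up_to(N))

for N in (10, 100, 10**4, 10**6):
    s = prime_reciprocal_sum(N)
    lower = math.log(math.log(N)) - math.log(2)   # bound from Theorem 9.2
    print(N, round(s, 4), round(lower, 4))
```

Even at $N = 10^6$ the sum is still below 3, although it eventually exceeds every bound.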
We may interpret the divergence of $\sum p_n^{-1}$ to mean that the primes are not all that sparsely distributed. For instance, if the primes were as sparse as the sequence of perfect squares then the series would converge by comparison with $\sum n^{-2}$. On the other hand, comparing the orders of growth of the partial sums in Lemma 9.1 and Theorem 9.2 indicates that the primes are, at least in some sense, significantly sparser than the integers themselves.
In order to obtain more precise information about the growth of $p_n$, it is useful to define $\pi(n)$ to be the number of primes $p$ with $p \le n$. We aim to derive some elementary bounds for $\pi(n)$ due to Chebyshev and then use our results to obtain bounds on $p_n$. We begin with two simple combinatorial lemmas.
Lemma 9.4. One has
\[ 2^n \le \binom{2n}{n} < 4^n \]
for all positive integers $n$.
Proof. By the binomial theorem, we have
\[ 4^n = (1 + 1)^{2n} = \sum_{k=0}^{2n} \binom{2n}{k} > \binom{2n}{n}. \]
The other inequality may be established by a simple induction argument and is left as an exercise.
□
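Lemma 9.5. One has
\[ \log n! = \sum_{p \le n} \sum_{m=1}^{\lfloor \log_p n \rfloor} \left\lfloor \frac{n}{p^m} \right\rfloor \log p. \]
Proof. Since $n! = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$, it is clear that there are no primes $p > n$ dividing $n!$, so we may write $n! = \prod_{p \le n} p^{\alpha_p}$, where $\alpha_p$ is a non-negative integer representing the exact power of $p$ that divides $n!$. Taking logarithms gives
\[ \log n! = \sum_{p \le n} \alpha_p \log p, \]
so it remains to find a formula for $\alpha_p$. Among the integers $1, 2, 3, \ldots, n-1, n$, there are $\lfloor n/p \rfloor$ multiples of $p$. Of these, $\lfloor n/p^2 \rfloor$ are also multiples of $p^2$, and in general $\lfloor n/p^m \rfloor$ of them are multiples of $p^m$. Since $p^m > n$ when $m > \log_p n$, we see that
\[ \alpha_p = \left\lfloor \frac{n}{p} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots = \sum_{m=1}^{\lfloor \log_p n \rfloor} \left\lfloor \frac{n}{p^m} \right\rfloor, \]
and this completes the proof.
□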
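The formula for $\alpha_p$ in the proof of Lemma 9.5 translates directly into a short loop. The sketch below compares it with the exponent obtained by dividing $n!$ repeatedly by $p$; both function names are illustrative assumptions.

```python
from math import factorial

def legendre_exponent(n, p):
    """Return alpha_p = sum over m >= 1 of floor(n / p^m), the exact power of p dividing n!."""
    total, power = 0, p
    while power <= n:          # terms with p^m > n contribute zero
        total += n // power
        power *= p
    return total

def exponent_by_division(N, p):
    """Exponent of p in the positive integer N, by repeated division (independent check)."""
    count = 0
    while N % p == 0:
        N //= p
        count += 1
    return count

n = 100
for p in (2, 3, 5, 7):
    print(p, legendre_exponent(n, p), exponent_by_division(factorial(n), p))
```

For $n = 100$ and $p = 2$ the sum is $50 + 25 + 12 + 6 + 3 + 1 = 97$, so $2^{97}$ is the exact power of 2 dividing $100!$.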
Theorem 9.6. For every integer $n \ge 2$ one has
\[ \frac{n}{6 \log n} < \pi(n) < \frac{6n}{\log n}. \]
Proof. By taking logarithms in the result of Lemma 9.4, we obtain
\[ n \log 2 \le \log (2n)! - 2 \log n! < n \log 4, \]
since $\binom{2n}{n} = (2n)!/(n!)^2$. Lemma 9.5 therefore gives
\[ n \log 2 \le \sum_{p \le 2n} \sum_{m=1}^{\lfloor \log_p 2n \rfloor} \left( \left\lfloor \frac{2n}{p^m} \right\rfloor - 2 \left\lfloor \frac{n}{p^m} \right\rfloor \right) \log p < n \log 4. \tag{9.2} \]
Now since $\lfloor 2x \rfloor - 2\lfloor x \rfloor$ is 0 if $0 \le x - \lfloor x \rfloor < 1/2$ and 1 otherwise, we find that
\[ \frac{n}{2} < n \log 2 \le \sum_{p \le 2n} (\log_p 2n)(\log p) = \pi(2n) \log 2n, \]
and hence $\pi(2n) > 2n/(4 \log 2n)$, which establishes the lower bound for even integers. Since $2n \ge \frac{2}{3}(2n+1)$ for $n \ge 1$, we also have
\[ \pi(2n+1) \ge \pi(2n) > \frac{2n}{4 \log 2n} > \frac{2n+1}{6 \log(2n+1)}, \]
which proves the lower bound for odd integers. For the upper bound, we delete all but the $m = 1$ term from (9.2) to obtain
\[ \sum_{p \le 2n} \left( \left\lfloor \frac{2n}{p} \right\rfloor - 2 \left\lfloor \frac{n}{p} \right\rfloor \right) \log p < n \log 4. \]
Let $\vartheta(n) = \sum_{p \le n} \log p$. Since $\lfloor 2n/p \rfloor - 2\lfloor n/p \rfloor = 1$ when $n < p \le 2n$, we deduce that
\[ \vartheta(2n) - \vartheta(n) = \sum_{n < p \le 2n} \log p < n \log 4. \]
Now if $n$ is a particular integer with $n \ge 2$, there is a positive integer $k$ such that $2^k \le n < 2^{k+1}$. We then have
\[ \vartheta(n) \le \vartheta(2^{k+1}) = \sum_{r=0}^{k} \left( \vartheta(2^{r+1}) - \vartheta(2^r) \right) < \sum_{r=0}^{k} 2^r \log 4 = (2^{k+1} - 1) \log 4 < 4n \log 2, \]
since the first summation telescopes and $\vartheta(1) = 0$. On the other hand, we have
\[ \vartheta(n) \ge \sum_{n^{2/3} < p \le n} \log p \ge \left( \pi(n) - \pi(n^{2/3}) \right) \log(n^{2/3}) \ge \tfrac{2}{3} \left( \pi(n) - n^{2/3} \right) \log n. \]
Combining the previous two inequalities yields $(\pi(n) - n^{2/3}) \log n < 6n \log 2$, and hence
\[ \pi(n) < \frac{6n \log 2}{\log n} + n^{2/3} = \frac{n}{\log n} \left( 6 \log 2 + \frac{\log n}{n^{1/3}} \right). \]
It is a simple calculus exercise to show that the function $(\log x)/x^{1/3}$ takes its maximum value at $x = e^3$ and hence that $(\log n)/n^{1/3} \le 3/e$ for all $n \ge 1$. We therefore have
\[ \pi(n) < \frac{n}{\log n} (6 \log 2 + 3/e) < \frac{6n}{\log n}, \]
as required.
□
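The Chebyshev bounds of Theorem 9.6 can be checked by brute force over a finite range. The sketch below is illustrative (the function names and the cutoff $10^5$ are assumptions); it tabulates $\pi(n)$ with a sieve and tests both inequalities for every $n$ in the range.

```python
import math

def prime_counts(limit):
    """Return a list pi with pi[n] = number of primes p <= n, for 0 <= n <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    pi, count = [0] * (limit + 1), 0
    for n in range(limit + 1):
        count += sieve[n]      # running total of primes seen so far
        pi[n] = count
    return pi

def chebyshev_bounds_hold(limit):
    """Check n/(6 log n) < pi(n) < 6n/log n for every 2 <= n <= limit."""
    pi = prime_counts(limit)
    return all(
        n / (6 * math.log(n)) < pi[n] < 6 * n / math.log(n)
        for n in range(2, limit + 1)
    )

pi = prime_counts(10**5)
for n in (10, 100, 1000, 10**4, 10**5):
    print(n, pi[n], round(n / math.log(n), 1))
print(chebyshev_bounds_hold(10**5))
```

The table also suggests that the constants 1/6 and 6 are far from optimal, since $\pi(n)$ stays fairly close to $n/\log n$.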
We now deduce upper and lower bounds on the size of the 𝑛th prime.
Theorem 9.7. For every integer $n \ge 2$, one has
\[ \tfrac{1}{6}\, n \log n < p_n < 18\, n \log n. \]
Proof. Suppose that $p_n = k$. By Theorem 9.6, we have
\[ n = \pi(k) < \frac{6k}{\log k} = \frac{6 p_n}{\log p_n}, \]
and thus $p_n > \frac{1}{6} n \log p_n > \frac{1}{6} n \log n$, which gives the lower bound. Similarly, Theorem 9.6 gives
\[ n = \pi(k) > \frac{k}{6 \log k} = \frac{p_n}{6 \log p_n}, \]
and thus $p_n < 6n \log p_n$. We recall from the proof of Theorem 9.6 that $\log x \le (3/e) x^{1/3}$, which gives $p_n < (18/e)\, n\, p_n^{1/3}$ and thus $p_n^{2/3} < 18n/e$. Taking logarithms gives
\[ \tfrac{2}{3} \log p_n < \log n + \log(18/e) < 2 \log n, \]
provided that $n > 6$. We therefore obtain $p_n < 18 n \log n$ when $n > 6$, and it is easy to check that this holds for $2 \le n \le 6$ as well.
□
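The bounds of Theorem 9.7 can likewise be tested on the first few thousand primes. This is a sketch with illustrative function names; primes are generated by trial division against the primes already found, which is adequate at this scale.

```python
import math

def first_primes(count):
    """Return [p_1, ..., p_count] using trial division by the primes found so far."""
    primes = []
    candidate = 2
    while len(primes) < count:
        is_prime = True
        for p in primes:
            if p * p > candidate:      # no divisor up to sqrt(candidate)
                break
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(candidate)
        candidate += 1
    return primes

def bounds_hold(count):
    """Check n log n / 6 < p_n < 18 n log n for 2 <= n <= count."""
    primes = first_primes(count)
    return all(
        n * math.log(n) / 6 < primes[n - 1] < 18 * n * math.log(n)
        for n in range(2, count + 1)
    )

primes = first_primes(5000)
for n in (2, 10, 100, 1000, 5000):
    # last column: the ratio p_n / (n log n)
    print(n, primes[n - 1], round(primes[n - 1] / (n * math.log(n)), 3))
print(bounds_hold(5000))
```

The ratio in the last column stays well inside the interval $(1/6, 18)$ throughout this range.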
Even more precise information is known about $\pi(n)$ and $p_n$ asymptotically as $n \to \infty$. Before mentioning some of these results, we discuss some of the common asymptotic notation. We say that $f(x) \sim g(x)$ as $x \to \infty$ if
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1. \]
Furthermore, we write $f(x) = o(g(x))$ if
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 0. \]
Finally, we write $f(x) = O(g(x))$ if there is a constant $M$ such that $|f(x)| \le M |g(x)|$ for all $x$. Notice that $f = o(g)$ implies that $f = O(g)$.
Theorem 9.8. (The Prime Number Theorem) As $n \to \infty$ one has
\[ \pi(n) \sim \frac{n}{\log n} \quad \text{and} \quad p_n \sim n \log n. \]
The proof of the prime number theorem is beyond the scope of the course, as the most
direct method requires the theory of complex variables. If 𝜋(𝑛; 𝑞, 𝑎) denotes the number of
primes 𝑝 ≤ 𝑛 with 𝑝 ≡ 𝑎 (mod 𝑞), then it is also known that
\[ \pi(n; q, a) \sim \frac{1}{\phi(q)}\, \pi(n) \sim \frac{n}{\phi(q) \log n} \]
whenever (𝑞, 𝑎) = 1. This is called the prime number theorem for arithmetic progressions. In
particular, it shows that there are infinitely many primes in each reduced residue class modulo
𝑞 and that the primes are equally distributed among the residue classes asymptotically. For
example, roughly half of the odd primes are congruent to 1 mod 4 and roughly half are
congruent to 3 mod 4.
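The equidistribution of primes among reduced residue classes can be observed directly by counting. The sketch below is an illustration with assumed function names; it counts the primes up to $n$ in each residue class $a$ modulo $q$ with $(q, a) = 1$.

```python
import math
from collections import Counter

def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def residue_counts(n, q):
    """Count the primes p <= n in each residue class a mod q with gcd(a, q) = 1."""
    counts = Counter(p % q for p in primes_up_to(n) if math.gcd(p, q) == 1)
    return dict(sorted(counts.items()))

print(residue_counts(10**5, 4))     # classes 1 and 3 mod 4
print(residue_counts(10**5, 10))    # classes 1, 3, 7, 9 mod 10
```

Primes dividing $q$ (such as 2 when $q = 4$) are excluded, since they do not lie in a reduced residue class.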
The prime number theorem may be interpreted by saying that the probability that the
integer 𝑛 is prime is roughly 1/ log 𝑛. In fact, this interpretation leads to an approximation
for 𝜋(𝑛) that is more accurate than 𝑛/ log 𝑛. It is known that 𝜋(𝑛) ∼ li(𝑛), where
\[ \mathrm{li}(x) = \int_2^x \frac{dt}{\log t}. \]
We may think of $\mathrm{li}(x)$ as a sort of cumulative distribution function for the density function $f(t) = 1/\log t$. It is known that $|\pi(x) - \mathrm{li}(x)| = o(x)$ as $x \to \infty$, and in fact one can make the error term more explicit. A classical quantitative version of the prime number theorem states that
\[ |\pi(x) - \mathrm{li}(x)| = O\left( x \exp(-c \sqrt{\log x}) \right) \]
for some constant $c > 0$. However, it is easy to show that this error term grows more rapidly than $x^{1-\delta}$ for every $\delta > 0$, so this is actually a fairly weak result in some sense. It is conjectured that the true error term is just slightly larger than $\sqrt{x}$.
Conjecture 9.9. (The Riemann Hypothesis) One has
\[ |\pi(x) - \mathrm{li}(x)| = O(\sqrt{x} \log x). \]
This is one of the most notorious unsolved problems in mathematics, and even establishing an error term of $O(x^{1-\delta})$ for some positive $\delta$ would be considered a major breakthrough. The usual statement of the Riemann hypothesis concerns the zeta function mentioned at the end of §8. This is a function of a complex variable, which is defined by the infinite series
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s} \]
when $\mathrm{Re}(s) > 1$. The above series fails to converge when $\mathrm{Re}(s) \le 1$, but it turns out that the zeta function has a unique extension (called an analytic continuation) to the whole complex plane, apart from a simple pole at $s = 1$. This extension of $\zeta(s)$ has so-called “trivial” zeros at the negative even integers, and the Riemann hypothesis is equivalent to the assertion that all the remaining zeros of $\zeta(s)$ lie on the line $\mathrm{Re}(s) = 1/2$.
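The claim that $\mathrm{li}(x)$ approximates $\pi(x)$ better than $x/\log x$ can be illustrated numerically. The sketch below is a rough illustration, not a rigorous computation: `li` approximates the integral $\int_2^x dt/\log t$ by the composite Simpson rule with an assumed number of steps, and `prime_count` uses a simple sieve.

```python
import math

def prime_count(x):
    """Return pi(x), the number of primes p <= x, using a sieve (x >= 1)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(x) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve)

def li(x, steps=200_000):
    """Approximate the integral of dt/log t from 2 to x by composite Simpson's rule."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even number of steps
    h = (x - 2) / steps
    total = 1 / math.log(2) + 1 / math.log(x)
    for j in range(1, steps):
        weight = 4 if j % 2 else 2      # interior weights alternate 4, 2, 4, ...
        total += weight / math.log(2 + j * h)
    return total * h / 3

x = 10**5
print(prime_count(x), round(li(x), 1), round(x / math.log(x), 1))
```

At $x = 10^5$ the value of $\mathrm{li}(x)$ overshoots $\pi(x)$ by a few dozen, whereas $x/\log x$ falls short by several hundred.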
Twin Primes and Mersenne Primes. It is conjectured that $\pi_2(n)$, the number of twin prime pairs $(p, p+2)$ with $p + 2 \le n$, is asymptotic to $Cn/(\log n)^2$ for some constant $C > 0$, but we don’t even know that $\pi_2(n) \to \infty$. This latter statement is known as the Twin Prime Conjecture. In some sense, the twin primes are very sparse, as it can be shown that the sum of their reciprocals,
\[ \left( \frac{1}{3} + \frac{1}{5} \right) + \left( \frac{1}{5} + \frac{1}{7} \right) + \left( \frac{1}{11} + \frac{1}{13} \right) + \left( \frac{1}{17} + \frac{1}{19} \right) + \cdots \]
converges, in contrast to the conclusion of Corollary 9.3. The value of the above sum, known
as Brun’s constant, is quite difficult to estimate precisely because of the slow convergence;
however, its value appears to be around 1.902160583. In 1994, Nicely discovered inconsistencies in his computations of Brun’s constant, which turned out to result from a subtle flaw
in Intel’s new Pentium processor. This led to an embarrassing recall and provided one of
the more surprising applications of number theory. It is not known whether Brun’s constant
is rational; of course, its irrationality would imply the Twin Prime Conjecture since a finite
sum of rational numbers is rational.
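The slow convergence of the twin prime reciprocal sum is visible even in small computations. The following sketch, with illustrative function names, lists twin prime pairs $(p, p+2)$ with $p + 2 \le n$ and accumulates the partial sums of Brun’s series in floating point.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def twin_prime_pairs(n):
    """Return all pairs (p, p + 2) of primes with p + 2 <= n."""
    primes = primes_up_to(n)
    prime_set = set(primes)
    return [(p, p + 2) for p in primes if p + 2 in prime_set]

def brun_partial_sum(n):
    """Sum of 1/p + 1/(p + 2) over twin prime pairs with p + 2 <= n."""
    return math.fsum(1.0 / p + 1.0 / (p + 2) for p, _ in twin_prime_pairs(n))

for n in (100, 1000, 10**4, 10**5, 10**6):
    print(n, len(twin_prime_pairs(n)), round(brun_partial_sum(n), 6))
```

The partial sums increase toward the value near 1.902 quoted above, but they are still well short of it at $n = 10^6$.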
Recall that the Mersenne numbers are integers of the form $2^p - 1$ where $p$ is prime. It
is conjectured that the number of Mersenne primes up to $n$ is asymptotic to $e^{\gamma} \log_2(\log n)$,
where $\gamma$ is Euler’s constant. However, only 46 Mersenne primes have been discovered as
of November 2008, and proving that there are infinitely many seems completely out of
reach. The computational evidence certainly suggests that the Mersenne primes are sparsely
distributed among the Mersenne numbers; that is, for most primes $p$ the number $2^p - 1$ turns
out to be composite. However, it also remains an open problem to establish that there are
infinitely many composite Mersenne numbers. It seems inconceivable that this would fail,
since then all sufficiently large Mersenne numbers would be prime! Nevertheless, the existing
technology does not seem to be capable of generating a proof.
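For small exponents, one can see directly which Mersenne numbers are prime. The sketch below uses plain trial division, which is an assumption made for simplicity and is only practical for small $p$; it is not the method used in large-scale searches.

```python
import math

def is_prime(m):
    """Trial division; adequate for the small numbers used here."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    for d in range(3, math.isqrt(m) + 1, 2):
        if m % d == 0:
            return False
    return True

def mersenne_exponents(limit):
    """Return the primes p <= limit for which 2^p - 1 is prime."""
    return [p for p in range(2, limit + 1) if is_prime(p) and is_prime(2 ** p - 1)]

print(mersenne_exponents(31))
print([p for p in range(2, 32) if is_prime(p) and not is_prime(2 ** p - 1)])
```

Among the eleven primes $p \le 31$, the Mersenne number $2^p - 1$ is composite for $p = 11, 23, 29$; for instance $2^{11} - 1 = 2047 = 23 \cdot 89$.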
58
SCOTT T. PARSELL
References
[1] G. E. Andrews, Number Theory, Dover, 1994.
[2] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics, Springer-Verlag, 1976.
[3] T. H. Barr, Invitation to cryptology, Prentice Hall, 2002.
[4] D. M. Bressoud, Factorization and primality testing, Undergraduate Texts in Mathematics, Springer-Verlag, New York, 1989.
[5] E. B. Burger, Exploring the number theory jungle: A journey into diophantine analysis,
AMS Student Mathematical Library, Volume 8, 2000.
[6] M. Erickson and A. Vazzana, Introduction to number theory, Discrete Mathematics and
its Applications, Chapman & Hall/CRC, Boca Raton, 2008.
[7] J. A. Gallian, Contemporary abstract algebra, 6th ed, Houghton Mifflin, 2006.
[8] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 6th ed,
Oxford University Press, 2008.
[9] J. F. Humphreys and M. Y. Prest, Numbers, groups, and codes, Cambridge University
Press, 1989.
[10] K. Ireland and M. Rosen, A classical introduction to modern number theory, 2nd ed,
Graduate Texts in Mathematics, 84, Springer-Verlag, 1990.
[11] N. Koblitz, A course in number theory and cryptography, 2nd ed, Graduate Texts in
Mathematics, 114, Springer-Verlag, 1994.
[12] H. L. Montgomery and R. C. Vaughan, Multiplicative number theory I. Classical
theory, Cambridge University Press, 2007.
[13] M. B. Nathanson, Additive number theory I: The classical bases, Graduate Texts in
Mathematics, 164, Springer-Verlag, 1996.
[14] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An introduction to the theory of
numbers, 5th ed, Wiley, 1991.
[15] K. H. Rosen, Elementary number theory and its applications, 5th ed, Pearson Addison
Wesley, 2005.
[16] J. H. Silverman, A friendly introduction to number theory, 3rd ed, Pearson Prentice
Hall, 2006.
[17] G. Tenenbauam and M. Mendés France, The prime numbers and their distribution,
AMS Student Mathematical Library, Volume 6, 2000.
[18] R. C. Vaughan, The Hardy-Littlewood method, 2nd ed, Cambridge University Press,
1997.