Download Graduate Texts in Mathematics 232

Document related concepts

Addition wikipedia , lookup

Line (geometry) wikipedia , lookup

Vincent's theorem wikipedia , lookup

List of prime numbers wikipedia , lookup

Georg Cantor's first set theory article wikipedia , lookup

Mathematical proof wikipedia , lookup

Non-standard calculus wikipedia , lookup

Elementary mathematics wikipedia , lookup

Nyquist–Shannon sampling theorem wikipedia , lookup

Central limit theorem wikipedia , lookup

Four color theorem wikipedia , lookup

Brouwer fixed-point theorem wikipedia , lookup

List of important publications in mathematics wikipedia , lookup

Quadratic reciprocity wikipedia , lookup

Theorem wikipedia , lookup

Number theory wikipedia , lookup

Wiles's proof of Fermat's Last Theorem wikipedia , lookup

Fundamental theorem of algebra wikipedia , lookup

Proofs of Fermat's little theorem wikipedia , lookup

Transcript
Graduate Texts in Mathematics
232
Editorial Board
S. Axler K.A. Ribet
Graham Everest
Thomas Ward
An Introduction to
Number Theory
With 16 Figures
Graham Everest, BSc, PhD
School of Mathematics
University of East Anglia
Norwich
NR4 7TJ
UK
Thomas Ward, BSc, MSc, PhD
School of Mathematics
University of East Anglia
Norwich
NR4 7TJ
UK
Editorial Board
S. Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132
USA
K.A. Ribet
Department of Mathematics
University of California, Berkeley
Berkeley, CA 94720-3840
USA
Mathematics Subject Classification (2000): 11Y05/11/16/55
British Library Cataloguing in Publication Data
Everest, Graham, 1957–
An introduction to number theory. — (Graduate texts in
mathematics ; 232)
1. Number theory
I. Title II. Ward, Thomas, 1963–
512.7
ISBN 1852339179
Library of Congress Control Number: 2005923447
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored on transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the
Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to
the publishers.
Graduate Texts in Mathematics series ISSN 0072-5285
ISBN-10: 1-85233-917-9
ISBN-13: 978-1-85233-917-3
Springer Science+Business Media
springeronline.com
© Springer-Verlag London Limited 2005
The use of registered names, trademarks, etc. in this publication does not imply, even in the absence
of a specific statement, that such names are exempt from the relevant laws and regulations and therefore
free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or
omissions that may be made.
Typesetting: Camera-ready by authors
Printed in the United States of America
12/3830-543210 Printed on acid-free paper SPIN 11316527
And he brought him forth abroad, and said,
Look now toward heaven, and tell the stars, if
thou be able to number them: and he said unto
him, So shall thy seed be.
Genesis 15, verse 5
Contents
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
1
A Brief History of Prime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.1 Euclid and Primes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.2 Summing Over the Primes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.3 Listing the Primes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.4 Fermat Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.5 Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.6 Proving the Fundamental Theorem of Arithmetic . . . . . . . . . . . .
1.7 Euclid’s Theorem Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
7
11
16
29
31
35
39
2
Diophantine Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1 Pythagoras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 The Fundamental Theorem of Arithmetic in
Other Contexts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Sums of Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 Siegel’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.5 Fermat, Catalan, and Euler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
43
43
45
48
52
56
3
Quadratic Diophantine Equations . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1 Quadratic Congruences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2 Euler’s Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 The Quadratic Reciprocity Law . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.4 Quadratic Rings
.........................................
√
3.5 Units in Z[ d], d > 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.6 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
59
59
65
67
73
75
78
4
Recovering the Fundamental Theorem of Arithmetic . . . . . .
4.1 Crisis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2 An Ideal Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3 Fundamental Theorem of Arithmetic for Ideals . . . . . . . . . . . . . .
83
83
84
85
viii
Contents
4.4 The Ideal Class Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5
Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.1 Rational Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2 The Congruent Number Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.3 Explicit Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4 Points of Order Eleven . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.5 Prime Values of Elliptic Divisibility Sequences . . . . . . . . . . . . . . 112
5.6 Ramanujan Numbers and the Taxicab Problem . . . . . . . . . . . . . . 117
6
Elliptic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.1 Elliptic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.2 Parametrizing an Elliptic Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.3 Complex Torsion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.4 Partial Proof of Theorem 6.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7
Heights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.1 Heights on Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.2 Mordell’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.3 The Weak Mordell Theorem: Congruent
Number Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.4 The Parallelogram Law and the Canonical Height . . . . . . . . . . . 146
7.5 Mahler Measure and the Naı̈ve Parallelogram Law . . . . . . . . . . . 150
8
The Riemann Zeta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.1 Euler’s Summation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.2 Multiplicative Arithmetic Functions . . . . . . . . . . . . . . . . . . . . . . . . 161
8.3 Dirichlet Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.4 Euler Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.5 Uniform Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8.6 The Zeta Function Is Analytic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.7 Analytic Continuation of the Zeta Function . . . . . . . . . . . . . . . . . 175
9
The Functional Equation of the
Riemann Zeta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.1 The Gamma Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.2 The Functional Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
9.3 Fourier Analysis on Schwartz Spaces . . . . . . . . . . . . . . . . . . . . . . . 187
9.4 Fourier Analysis of Periodic Functions . . . . . . . . . . . . . . . . . . . . . 189
9.5 The Theta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
9.6 The Gamma Function Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
Contents
ix
10 Primes in an Arithmetic Progression . . . . . . . . . . . . . . . . . . . . . . 207
10.1 A New Method of Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
10.2 Congruences Modulo 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
10.3 Characters of Finite Abelian Groups . . . . . . . . . . . . . . . . . . . . . . . 213
10.4 Dirichlet Characters and L-Functions . . . . . . . . . . . . . . . . . . . . . . 217
10.5 Analytic Continuation and Abel’s
Summation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
10.6 Abel’s Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
11 Converging Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
11.1 The Class Number Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
11.2 The Dedekind Zeta Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
11.3 Proof of the Class Number Formula . . . . . . . . . . . . . . . . . . . . . . . . 233
11.4 The Sign of the Gauss Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.5 The Conjectures of Birch and Swinnerton-Dyer . . . . . . . . . . . . . . 238
12 Computational Number Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
12.1 Complexity of Arithmetic Computations . . . . . . . . . . . . . . . . . . . 245
12.2 Public-key Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
12.3 Primality Testing: Euclidean Algorithm . . . . . . . . . . . . . . . . . . . . 253
12.4 Primality Testing: Pseudoprimes . . . . . . . . . . . . . . . . . . . . . . . . . . 258
12.5 Carmichael Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
12.6 Probabilistic Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
12.7 The Agrawal–Kayal–Saxena Algorithm . . . . . . . . . . . . . . . . . . . . . 266
12.8 Factorizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
12.9 Complexity of Arithmetic in Finite Fields . . . . . . . . . . . . . . . . . . 276
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
Introduction
This book is written from the perspective of several passionately held beliefs
about mathematical education. The first is that mathematics is a good story.
Theorems are not discovered in isolation, but happen as part of a culture, and
they are generally motivated by paradigms. In this book we are going to show
how one result from antiquity can be used to illuminate the study of much
that forms the undergraduate curriculum in number theory at a typical U.K.
university. The result is the Fundamental Theorem of Arithmetic. Our hope
is that students will understand that number theory is not just a collection of
tricks and isolated results but has a coherence fueled directly by a connected
narrative that spans centuries.
The second belief is that mathematics students (and indeed professional
mathematicians) come to the subject with different preferences and evolving
strengths. Therefore, we have endeavored to present differing approaches to
number theory. One way to achieve this is the obvious one of selecting material from both the algebraic and the analytic disciplines. Less obviously, in
the early part of the book particularly, we sometimes present several different
proofs of a single result. The aim is to try to capture the imagination of the
reader and help her or him to discover his or her own taste in mathematics.
The book is written under the assumption that students are being exposed
to the power of analysis in courses such as complex variables, as well as the
power of abstraction in courses such as algebra. Thus we use notions from
finite group theory at several points to give alternative proofs. Often the resulting approaches simplify and promote generalization, as well as providing
elegance. We also use this approach because we want to try to explain how
different approaches to elementary results are worked out later in different
approaches to the subject in general. Thus Euler’s proof of the Fundamental
Theorem of Arithmetic could be taken to prefigure the development of analytic
number theory with its ingenious use of the Euler product Formula. When we
move further into the analytic aspects of arithmetic, Euler’s relatively simple
observation may seem like a rather flimsy pretext. However, the view that
many nineteenth-century mathematicians took of functions (complex func-
2
Introduction
tions particularly) was profoundly influenced by the Fundamental Theorem
of Arithmetic. In their view, many functions are factorizable objects, and we
will try to illustrate this in describing some of the great achievements of that
century.
Having spoken of different approaches, it will surprise few readers that
number theory has many streams. A major surprise is the fact that some
of these meet again: Chapter 11 shows that many of the themes in Chapters 1–10 become reconciled further on. The classical class number formula
reconciles the analytic stream of ideas with the algebraic. We also discuss –
necessarily in general terms – the L-function associated with an elliptic curve
and the conjectures of Birch and Swinnerton-Dyer, which draw together the
elliptic, algebraic and analytic streams. The underlying motif is the theory
of L-functions. As we enter a new millennium, it has become clear that one
of the ways into the deepest parts of number theory requires a better understanding of these fundamental objects.
The third belief is that number theory is a living subject, even when studied at an elementary level. The onset of electronic computing gave the subject
an enormous boost, and it is a pleasure to be able to record some recent developments. The language of arithmetical complexity has helped to change the
way we think about numbers. Modern computers can carry out calculations
with numbers that are almost unimaginably large. We recommend that any
reader unfamiliar with modern number theory packages tries a few experiments using some of the excellent free software available from the internet. To
start to think of the issues raised by large integer calculation can be no bad
thing. Intellectually too, this computational topic illustrates an interesting
point about the enduring nature of the paradigm. Our story begins over two
millennia ago, yet it is the same questions that continue to fascinate us. What
are the primes like? Where can they be found? How can the prime factors of
an integer be computed? Whether these questions will endure awhile longer
nobody can tell. The history of these problems already presents a fascinating
story worth telling, and one that says a lot about one of the most important
and beautiful narratives of enquiry in human history – mathematics.
One of the most striking and pleasurable aspects of number theory is the
extent of time and range of cultures over which it has been studied. We do
not go into a detailed history of the developments described here, but the
names and places given in the list of “Dramatis Personae” should give some
idea of how widely number theory has been studied. The names in this list are
rather crudely Anglicized and the locations somewhat arbitrarily modernized.
The many living mathematicians who have made significant contributions to
the topics covered here have been omitted but may be found on the Web
site in [113]. A densely written, comprehensive review of number theory up
to about 1920 may be found in Dickson’s history [42], [43], [44]; a discursive
and masterly account of the four millennia ending in 1798 is provided by
Weil [157].
Introduction
3
Finally, we say something about the way this book could be used. It is
based on three courses taught at the University of East Anglia on various
aspects of number theory (analytic, algebraic/geometric, and computational),
mostly at the final-year undergraduate level. We were motivated in part by
G. A. and J. M. Jones’ attractive book [84]. Their book sets out to deal with
the subject as it is actually taught. Typically, third-year students will not
have done a course in number theory and their experience will necessarily
be fragmentary. Like [84], our book begins in quite an elementary way. We
have found that the different years at a university do not equate neatly with
different abilities: Students in their early years can often be stretched well
beyond what seems possible, and upper-level students do not complain about
beginning in simple ways. We will try to show how different chapters can
be put together to make a course; the book can be used as a basis for two
upper-level courses and one at an intermediate level.
We thank many people for contributing to this text. Notable among them
are Christian Röttger, for writing up notes from an analytic number theory
course at UEA; Sanju Velani, for making available notes from his analytic
number theory course; several cohorts of UEA undergraduates for feedback on
lecture courses; Neal Koblitz and Joe Silverman for their inspiring books; and
Elena Nardi for help with the ancient Greek in Section 1.7.1. We thank Karim
Belabas, Robin Chapman, Sue Everest, Gareth and Mary Jones, Graham
Norton, David Pierce, Peter Pleasants, Christian Röttger, Alice Silverberg,
Shaun Stevens, Alan and Honor Ward, and others for pointing out errors and
suggesting improvements. Errors and solecisms that remain are entirely the
authors’ responsibility.
February 14, 2005
Norwich, UK
Graham Everest
Thomas Ward
Notation and terminology
“Arithmetic” is used both as a noun and an adjective. The particular notation used is collected at the start of the index. The symbols N, P, Z, Q, R, C
denote the natural numbers {1, 2, 3, . . . }, prime numbers {2, 3, 5, 7, . . . }, integers, rational numbers, real numbers, and complex numbers, respectively.
Any field with q = pr elements, p ∈ P and r ∈ N, is denoted Fq , and F∗q
denotes its multiplicative group; the field Fp , p ∈ P, is identified with the
set {0, 1, . . . , p − 1} under addition and multiplication modulo p. For a complex number s = σ + it, (s) = σ and (s) = t denote the real and imaginary
parts of s respectively. The symbol means “divides”, so for a, b ∈ Z, ab if
there is an integer k with ak = b. For any set X, |X| denotes the cardinality
of X. The greatest common divisor of a and b is written gcd(a, b). Products
are written using · as in 12 = 3 · 4 or n! = 1 · 2 · · · (n − 1) · n. The order
of growth of functions f, g (usually these are functions N → R) is compared
using the following notation:
4
Introduction
f (x)
−→ 1 as x → ∞;
g(x)
f = O(g) if there is a constant A > 0 with f (x) Ag(x) for all x;
f (x)
f = o(g) if
−→ 0 as x → ∞.
g(x)
f ∼ g if
In particular, f = O(1) means that f is bounded. The relation f = O(g) will
also be written f g, particularly when it is being used to express the fact
that two functions are commensurate, f g f . A sequence a1 , a2 , . . . will
be denoted (an ).
References
The references are not comprehensive, and material that is not explicitly cited
is nonetheless well-known. It is inevitable that we have borrowed ideas and
used them inadvertently without citation; we apologize for any egregious instances of this. The general references that are likely to be most accessible
without much background are as follows. For Chapter 2, [147]; for Chapters 3
and 4, [77], [96], [147], and [154]; for Chapters 5–7, [27] and [143]; for Chapters 8–10, [4], [75], and [81]; for Chapter 9, [6]; and for Chapter 12, [21], [22],
[36], [90], and [66].
Possible Courses
A course on analytic number theory could follow Chapters 1, 8, 9, and 10;
one on Diophantine problems or elliptic curves could follow Chapters 1, 2, 5,
6, and 7. A lower-level course on algebraic number theory could be based on
Chapters 1, 2, 3 and 4; one on complexity could be based on Chapters 1 and 12.
(These could also be used for the complexity part of a course on cryptography.)
The exercises are generally routine applications of the methods in the text,
but exercises marked * are to be viewed as projects, some of them requiring
further reading and research.
Introduction
5
Dramatis Personae
Person
Pythagoras of Samos
Euclid of Alexandria
Eratosthenes of Cyrene
Diophantus of Alexandria
Hypatia of Alexandria
Sun Zi
Brahmagupta
Abu Ali al-Hasan ibn al-Haytham
Bhaskaracharya
Leonardo Pisano Fibonacci
Qin Jiushao
Pietro Antonio Cataldi
Claude Gaspar Bachet de Méziriac
Marin Mersenne
Pierre de Fermat
James Stirling
Leonhard Euler
Joseph–Louis Lagrange
Lorenzo Mascheroni
Adrien-Marie Legendre
Jean Baptiste Joseph Fourier
Johann Carl Friedrich Gauss
Siméon Denis Poisson
August Ferdinand Möbius
Niels Henrik Abel
Carl Gustav Jacob Jacobi
Johann Peter Gustav Lejeune Dirichlet
Joseph Liouville
Ernst Eduard Kummer
Evariste Galois
Karl Theodor Wilhelm Weierstrass
Pafnuty Lvovich Tchebychef
Georg Friedrich Bernhard Riemann
François Edouard Anatole Lucas
Jules Henri Poincaré
David Hilbert
Srinivasa Aiyangar Ramanujan
Louis Joel Mordell
Carl Ludwig Siegel
Emil Artin
Kurt Mahler
Derrick Henry Lehmer
André Weil
Date
Country
569 b.c.–475 b.c. Greece, Egypt
325 b.c.–265 b.c. Greece, Egypt
276 b.c.–194 b.c. Libya, Greece, Egypt
200–284
Greece, Egypt
370–415
Egypt
400–460
China
598–670
India
965–1040
Iraq, Egypt
1114–1185
India
1170–1250
Italy
1202–1261
China
1548–1626
Italy
1581–1638
France
1588–1648
France
1601–1665
France
1692–1770
Scotland
1707–1783
Switzerland, Russia
1736–1813
Italy, France
1750–1800
Italy, France
1752–1833
France
1768–1830
France
1777–1855
Germany
1781–1840
France
1790–1868
Germany
1802–1829
Norway
1804–1851
Germany
1805–1859
France, Germany
1809–1882
France
1810–1893
Germany
1811–1832
France
1815–1897
Germany
1821–1894
Russia
1826–1866
Germany, Italy
1842–1891
France
1854–1912
France
1862–1943
Germany
1887–1920
India, England
1888–1972
USA, England
1896–1981
Germany
1898–1962
Austria, Germany
1903–1988
Germany, UK, Australia
1905–1991
USA
1906–1998
France, USA
1
A Brief History of Prime
Most of the results in this book grow out of one theorem that has probably
been known in some form since antiquity.
Theorem 1.1. [Fundamental Theorem of Arithmetic] Every integer
greater than 1 can be expressed as a product of prime numbers in a way that
is unique up to order.
For the moment, we are using the term prime in its most primitive form –
to mean an irreducible integer greater than one. Thus a positive integer p is
prime if p > 1 and the factorization p = ab into positive integers implies that
either a = 1 or b = 1. The expression “up to order” means simply that we
regard, for example, the two factorizations 6 = 2 · 3 = 3 · 2 as the same.
Theorem 1.1, the Fundamental Theorem of Arithmetic, will reverberate
throughout the text. The fact that the primes are the building blocks for all
integers already suggests they are worth particular study, rather in the way
that scientists study matter at an atomic level. In this case, we need a way of
looking for primes and methods to construct them, identify them, and even
quantify their appearance if possible. Some of these quests took thousands of
years to fulfill, and some are still works in progress. At the end of this chapter,
we will give a proof of Theorem 1.1, but for now we want to get on with our
main theme.
1.1 Euclid and Primes
The first consequence of the Fundamental Theorem of Arithmetic for the
primes is that there must be infinitely many of them.
Theorem 1.2. [Euclid] There are infinitely many primes.
To emphasize the diversity of approaches to number theory, we will give
several proofs of this famous result.
8
1 A Brief History of Prime
Euclid’s Proof in Modern Form. If there are only finitely many primes,
we can list them as p1 , . . . , pr . Let
N = p1 · · · pr + 1 > 1.
By the Fundamental Theorem of Arithmetic, N can be factorized, so it must
be divisible by some prime pk of our list. Since pk also divides p1 · · · pr , it
must divide the difference
N − p1 · · · pr = 1,
which is impossible, as pk > 1.
Euler’s Analytic Proof. Assume that there are only finitely many primes,
so they may be listed as p1 , . . . , pr . Consider the product
X=
r k=1
1
1−
pk
−1
.
The product is finite since 1 is not a prime and by hypothesis there are only
finitely many primes. Now expand each factor into a convergent geometric
series,
1
1
1
1
= 1 + + 2 + 3 + ··· .
p p
p
1 − p1
For any fixed K, we deduce that
1
1−
1
p
1+
1
1
1
+
+ ··· + K .
p p2
p
Putting this into the equation for X gives
1
1
1
1
1
1
X 1 + + 2 + ··· + K · 1 + + 2 + ··· + K
2 2
2
3 3
3
1
1
1
1
1
1
· 1 + + 2 + ··· + K ··· 1 +
+ 2 + ··· + K
5 5
5
pr
pr
pr
1 1 1
= 1 + + + + ···
2 3 4
1
=
,
n
(1.1)
n∈N (K)
where
N (K) = {n ∈ N | n = pe11 · · · perr , ei K for all i}
denotes the set of all natural numbers with the property that each prime
factor appears no more than K times. Notice that the identity (1.1) requires
1.1 Euclid and Primes
9
the Fundamental Theorem of Arithmetic. Given any number n ∈ N, if K is
large enough, then n ∈ N (K), so we deduce that
X
∞
1
.
n
n=1
The series on the right-hand side (known as the harmonic series) diverges
to infinity, but X is finite. Again we have reached a contradiction from the
assumption that there are finitely many primes.
Let us recall why the harmonic series diverges to infinity. As with Theorem 1.2, there are many ways to prove this; the first is elementary, while the
second compares the series with an integral.
Elementary Proof. Notice that
1
1
,
2
2
1 1
1
+ ,
3 4
2
1
1 1 1 1
+ + + ,
5 6 7 8
2
1+
and so on. For any k 1,
1
1
1
1
1
+
+ · · · + k+1 2k · k+1 = .
2k + 1 2k + 2
2
2
2
This means that
k+1
2
n=1
and it follows that
1
k
for all k 1,
n
2
∞
1
diverges.
n
n=1
Hidden in the last argument is some indication of the rate at which the
harmonic series diverges. Since the sum of the first 2k+1 terms exceeds k/2,
the sum of the first N terms must be approximately Clog N for some positive
constant C. The second proof improves on this: Equation (1.2) gives a sharper
lower bound as well as an upper bound.
∞
1
diverges using the same technique
2
n
n=1
of grouping terms together. Of course, this will not work since this series
converges, but you will see something mildly interesting. In particular, can
you use this to estimate the sum?
Exercise 1.1. Try to prove that
10
1 A Brief History of Prime
N
Using the Integral Test. Compare n=1 n1 with the integral
N
1
dx = log N.
x
1
6 1
6
6
Figure 1.1 shows n=1 n1 trapped between 0 x+1
dx and 1 + 1
general, it follows that
log(N + 1) N
1
1 + log N.
n
n=1
1
x
dx; in
(1.2)
This shows again that the harmonic series diverges and that the partial sum
of the first N terms is approximately log N .
1
y=
y=
1
x+1
1
x
1
1
2
0
1
1
3
1
4
2
Figure 1.1. Graphs of y =
1
x
1
5
3
4
and y =
1
x+1
1
6
5
6
trapping the harmonic series.
This proof is a harbinger of more subtle results. Comparing series with
integrals is a powerful technique; more generally, using analytic techniques
to study properties of numbers has been one of the most important ideas in
number theory.
Exercise 1.2. Extend the method illustrated in Figure 1.1 to show that the
sequence (an ) defined by
an =
n
1
− log n
m
m=1
is decreasing (that is, an+1 an for all n) and nonnegative. Deduce that it
converges to some number γ, and estimate γ to three digits. This number
is known as the Euler–Mascheroni constant. It is not known if γ is rational,
although it is expected not to be.
1.2 Summing Over the Primes
11
1.2 Summing Over the Primes
We begin this section with yet another proof that there are infinitely many
primes. Recall that P denotes the set of prime numbers.
Theorem 1.3. The series
1
p∈P
p
diverges.
Several proofs are offered; each one provides different insights. We adopt
ap dethe convention that p always denotes a prime so, for example,
notes
p>N
ap .
p∈P,p>N
Notice that Theorem 1.3 tells us something about the sequence (pn ) of
primes that begins
p1 = 2, p2 = 3, p3 = 5, . . . . For example, the sequence n1+ε /pn cannot be bounded for any ε > 0.
First Proof of Theorem 1.3. We argue by contradiction: Assume that
the series converges. Then there is some N such that
1
1
< .
p
2
p>N
Let
Q=
p
pN
be the product of all the primes less than or equal to N . The numbers
1 + nQ,
n ∈ N,
are never divisible by primes less than N because such primes do divide Q.
Now consider
⎞t
⎛
∞
∞
1
1
⎠<
⎝
P =
= 1.
p
2t
t=1
t=1
p>N
We claim that
⎞t
⎛
∞
1
1
⎠
⎝
1
+
nQ
p
n=1
t=1
∞
p>N
because every term on the left-hand side appears on the right-hand side at
least once. (Convince yourself of this claim by taking N = 11 and finding
some terms on the right-hand side.) It follows that
∞
1
1.
1
+
nQ
n=1
(1.3)
12
1 A Brief History of Prime
However, the series in Equation (1.3) diverges since
K
K
1
1 1
1 + nQ
2Q n=1 n
n=1
for any K, and the right-hand side diverges as K → ∞. This contradiction
proves the theorem.
Second Proof of Theorem 1.3. We will prove a stronger result, namely
1
> log log N − 2.
p
(1.4)
pN
Fix N and let
N(N ) = {n ∈ N : all prime factors of n are less than or equal to N }.
Then (just as in Euler’s analytic proof of Theorem 1.2 on p. 8)
1
1 + p−1 + p−2 + p−3 + · · ·
=
n
pN
n∈N(N )
−1
1 − p−1 .
=
pN
If n N , then certainly n ∈ N(N ), so
1
n
nN
n∈N(N )
1
.
n
It follows by Equation (1.2) that
log N n∈N(N )
−1
1
1 − p−1 .
=
n
(1.5)
pN
In order to estimate the right-hand side of Equation (1.5), we need the
following bound. For any v ∈ [0, 1/2],
2
1
ev+v .
1−v
To see why the bound (1.6) holds, let f(v) = (1 − v) exp(v + v 2 ). Then
f (v) = v(1 − 2v) exp(v + v 2 ) 0 for v ∈ [0, 12 ],
so the fact that f(0) = 1 implies that f(v) 1 for all v ∈ [0, 1/2].
For any prime p, v = p1 12 , so by the bound (1.6)
(1.6)
1.2 Summing Over the Primes
1−p
pN
−1 −1
exp p−1 + p
−2
13
.
pN
Combining this with Equation (1.5) and taking logarithms gives
log log N p−1 + p−2 .
(1.7)
pN
Finally, we observe that
∞
1
1
<
< 1,
2
p
n2
p
n=2
(1.8)
so the contribution to the right-hand side of Equation (1.7) from pN p−2 is
bounded independently of N . This completes the second proof of Theorem 1.3.
Exercise 1.3. Prove the second inequality in Equation (1.8) using the integral
test: Show that
N
N
1
1
<
dx 1
2
n
(x
−
1)2
2
n=2
for all N 2.
In fact, an estimate stronger than Equation (1.4) holds. Mertens showed
that there is a constant A (approximately 0.261) such that
1
1
.
(1.9)
= log log N + A + O
p
log N
pN
Exercise 1.4. Is it possible to prove Equation (1.9) with O(1) in place of
A + O(
1
)
log N
using only the methods of the second proof of Theorem 1.3?
The third proof of Theorem 1.3 extends the relationship between prod
−1
ucts such as p∈P 1 − p−1
and the harmonic series to a factorization of a
function that will later turn out to have a starring role.
Definition 1.4. The Riemann zeta function is defined by
∞
1
ζ(σ) =
σ
n
n=1
wherever this makes sense.
14
1 A Brief History of Prime
10
8
6
4
2
0
2
4
6
8
10
12
14
16
18
20
Figure 1.2. The graph of ζ(σ) for 1 < σ 20.
Understanding the properties of this function turns out to be the key to
many deeper properties of the prime numbers. For now, we simply think of σ
as being a real number and note that the series defining ζ(σ) converges by the
integral test for σ > 1 to a positive sum and diverges at σ = 1. For σ > 1, ζ(σ)
is a decreasing function of σ.
Viewed as a real function of a real variable, the zeta function does not look
particularly subtle or useful. Figure 1.2 shows the graph of ζ(σ) for 1 < σ 20.
Some indication of just how complicated this function really is appears when
it is viewed as a complex-valued function of a complex variable. It is clear
that the series defining the zeta function converges for s = σ + it when σ > 1
(see p. 166 for more on this). Figure 1.3 shows the function (ζ( 32 + it))
for 0 t 60, giving the first insight into the complex properties of the zeta
function.
In Chapter 8, the Riemann zeta function is extended to a complex analytic
function defined on the whole complex plane with the exception of a single
pole, and this opens up the most mysterious aspect of the zeta function – its
behavior along the line (s) = 12 . Figure 9.1 on p. 186 gives some idea of how
complicated this is.
Recall that p will be used to denote a prime number, so a product over
the variable p means a product over p ∈ P.
The first step in understanding the zeta function is the Euler product
representation, which is a factorization of the zeta function into terms corresponding to primes. The idea of factorizing a function will be discussed again
at the start of Chapter 9.
Theorem 1.5. [Euler Product Representation] For any σ > 1,
−1
ζ(σ) =
1 − p−σ .
p
1.2 Summing Over the Primes
15
2.0
1.6
1.2
0.8
0.4
0
10
20
30
40
50
60
Figure 1.3. The graph of (ζ( 32 + it)) for 0 t 60.
Proof. For any σ > 1,
∞
∞
1
1
1 − 2−σ ζ(σ) =
−
σ
n
(2n)σ
n=1
n=1
1
=
nσ
n odd
1
= 1+
,
nσ
p|n⇒p>2
where the last sum is taken over those n with all prime factors greater than 2
(that is, the odd numbers greater than 2).
Now let P be a large prime and repeat the same argument with each of
the primes 3, 5, . . . , P in turn. This gives
1
1 − 2−σ 1 − 3−σ 1 − 5−σ · · · 1 − P −σ ζ(σ) = 1 +
.
nσ
p|n⇒p>P
The last sum ranges over those n with the property that all the prime factors
of n are greater than P . Thus the last sum is a subsum of the tail of the
convergent series defining ζ(σ), and in particular it must tend to zero as P
goes to infinity. It follows that
lim 1 − 2−σ 1 − 3−σ 1 − 5−σ · · · 1 − P −σ ζ(σ) = 1,
P →∞
so
ζ(σ) =
1 − p−σ
−1
.
p
Remark 1.6. An infinite product is defined to be convergent if the corresponding partial products form a convergent sequence, that does not converge to
zero. The nonzero condition is imposed to allow us to take logarithms of infinite products, thereby connecting infinite products and infinite sums in a
meaningful way.
16
1 A Brief History of Prime
Third Proof of Theorem 1.3. Taking logarithms of the Euler product
representation shows that, for any σ > 1,
log 1 − p−σ
log ζ(σ) = −
p
=−
∞
p
∞
1
−1
1
=
+
.
mσ
σ
mσ
mp
p
mp
p
p m=2
m=1
(1.10)
Notice that the series involved converge absolutely, so rearrangement is permissible. For any prime p,
1
1
1− σ ,
p
2
so
∞
p
∞
1
1
<
mσ
mσ
mp
p
p m=2
m=2
1
1
=
2σ 1 − p−σ
p
p
1
2
2ζ(2σ) < 2ζ(2),
p2σ
p
which shows that the last double sum in Equation (1.10) is bounded. The
bound 2ζ(2) holds for any σ 1, and the double sum converges for σ > 12 .
Thus
1
+ O(1).
log ζ(σ) =
pσ
p
The left-hand side goes to infinity as σ tends to 1 from above, so the sum on
the right-hand side must do the same.
1.3 Listing the Primes
Early in the history of the subject, Eratosthenes1 devised a kind of sieve for
listing the primes. To illustrate his method – the sieve of Eratosthenes – we
consider the problem of finding all the primes up to 50. First arrange all the
integers between 1 and 50 in a grid.
1
Eratosthenes of Cyrene (276 b.c.–194 b.c.) was born in what is now Libya. He
made major contributions to many subjects, including finding surprisingly accurate estimates for the circumference of the Earth and the distances from the
Earth to the Sun and the Moon.
1.3 Listing the Primes
1
11
21
31
41
2
12
22
32
42
3
13
23
33
43
4
14
24
34
44
5
15
25
35
45
6
16
26
36
46
7
17
27
37
47
8
18
28
38
48
9
19
29
39
49
17
10
20
30
40
50
Now do the sieving: Eliminate 1, then start with 2 and cross out all numbers greater than 2 and divisible by 2. Then take the next surviving number 3
and cross out all the multiples of 3 that are greater than 3. Repeat with
the next surviving number and continue until the numbers divisible by 7 are
crossed out.
Exercise 1.5. Why can you stop sieving once you get to 7?
The remaining numbers are the prime numbers below 50, as shown below.
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 Understanding the patterns of the surviving numbers remains one of the great
challenges facing mathematics two thousand years after Eratosthenes.
This method has great value, allowing people throughout history to rapidly
create lists of primes. It fails to meet our longer-term objectives however. It
elegantly and efficiently produces lists of primes without having to do trial
divisions but does not help to decide if a given large number (with hundreds
of digits, for example) is prime.
Table 1.1. Early prime hunters.
Name
Pietro Cataldi
T. Brancker
Felkel Kulik
Derrick Henry Lehmer
Date
Bound
1588
750
1688
100000
1876 100330200
1909 10006721
Table 1.1 is a short list of some of the calculations of prime tables in
recent history; in each case all the primes up to the bound were listed. A
rather different problem is to find exactly how many primes there are below
a certain bound (without finding them all). Kulik listed the smallest factors
of all the integers up to his bound and in particular found all the primes up
to his bound. Lehmer’s table was widely distributed and as a result was very
influential (despite being shorter than Kulik’s table).
18
1 A Brief History of Prime
1.3.1 Functions that Generate Primes.
In the seventeenth century attention turned to finding formulas that would
generate the primes. Euler pointed out the following polynomial example.
Example 1.7. The polynomial x2 + x + 41 yields prime values for 0 x 39,
but x = 40, 41 do not yield primes.
What is striking about this example is that it is prime for many values in
succession relative to the size of the coefficients and the degree.
Exercise 1.6. (a) [Goldbach 1752] Prove that if f ∈ Z[x] has the property
that f (n) is prime for all n 1, then f must be a constant.
(b) Extend your argument to show that if f ∈ Z[x] has the property that f (n)
is prime for all n N for some N , then f must be a constant.
(c) Let P ∈ Z[x1 , . . . , xk ] be a polynomial in k 2 variables with integer
coefficients. Define a function f by f (n) = P (n, 2n , 3n , . . . , (k − 1)n ), and
assume that f (n) → ∞ as n → ∞. Show that f (n) is composite for infinitely
many values of n.
Remarkably, there is an explicit integral polynomial in several variables
whose set of positive values as the variables run through the nonnegative
integers coincides with the primes. This polynomial was discovered as a byproduct of research into Hilbert’s 10th Problem, which asked if there could
be an algorithm to determine if a polynomial Diophantine2 problem has a
solution. However, once again, this is useless with regard to the aim of finding
ways to generate primes efficiently.
There are ingenious “formulas” for the primes. Many of these require
knowledge of the first (n − 1) primes to produce the nth prime, and none
of them seem to be computationally useful. We will prove one striking result
of this kind here, and two further results in Exercise 1.24 on p. 33 and in
Exercise 8.9 on p. 163. The result proved here rests on Bertrand’s Postulate,
which is the first of many results that say something about how the prime
numbers appear and how the next prime compares in size with the previous
prime. The arguments below are intricate but elementary, and the basic contradiction arrived at in the proof of Theorem 1.9 is similar to one that will be
used to prove Zsigmondy’s Theorem (Theorem 1.15) in Section 8.3.1.
We need a lemma that says something about the growth in the product of
all the primes up to n. As usual p will be used to denote a prime.
Lemma 1.8. For any n 1,
log p < 2n log 2.
(1.11)
pn
2
Diophantine problems are discussed in Chapter 2. The term is used to denote
problems involving equations in which only integer solutions are sought.
1.3 Listing the Primes
Proof. Let
M=
19
2m + 1
(2m + 1)(2m) · · · (m + 2)
.
=
m!
m
This is a binomial coefficient, so it is an integer (see Exercise 1.10 for a stronger
form of this). The coefficient M appears twice in the binomial expansion
of 22m+1 = (1 + 1)2m+1 , so M < 22m . If m + 1 < p 2m + 1 for some prime p,
then p divides the numerator of M but does not divide the denominator, so
p divides M,
p∈A(m)
where A(m) denotes the set of primes p with m + 1 < p 2m + 1. It follows
that
log p −
log p =
log p log M < 2m log 2.
(1.12)
p2m+1
pm+1
p∈A(m)
We now prove Equation (1.11) by induction. It holds for n 2, so suppose it
holds for all n k − 1. If k is even, then
log p =
log p < 2(k − 1) log 2 < 2k log 2
pk
pk−1
by the inductive hypothesis. If k is odd, write k = 2m + 1 and then
log p =
log p −
log p +
log p
p2m+1
p2m+1
pm+1
pm+1
< 2m log 2 + 2(m + 1) log 2
= 2(2m + 1) log 2 = 2k log 2,
since m + 1 < k. Thus the inequality (1.11) holds for all n by induction.
Theorem 1.9. [Bertrand’s Postulate] If n 1, then there is at least one
prime p with the property that
n < p 2n.
(1.13)
Proof. For any real number x, let x
denote the integer part of x. Thus x
is the greatest integer less than or equal to x. Let p be any prime. Then
n
n
n
+ 2 + 3 + ···
p
p
p
is the largest power of p dividing n! (see Exercise 8.7(a) on p. 162). Fix n 1
and let
20
1 A Brief History of Prime
N=
pk(p)
p2n
be the prime decomposition of N = (2n)!/(n!)2 . The number of times that
a given prime p divides N is the difference between the number of times it
divides (2n)! and (n!)2 , so
k(p) =
∞ 2n
n
−
2
,
pm
pm
m=1
(1.14)
and each of the terms in the sum is either 0 or 1, depending on whether p2n
m
is odd or even. If pm > 2n the term is certainly 0, so
log 2n
.
(1.15)
k(p) log p
Now the proof of the theorem proceeds by a contradiction argument. Assume there is some n 1 for which there is no prime satisfying the inequality (1.13), and let p be a prime factor of N = (2n)!/(n!)2 . Thus p < n by our
assumption, and k(p) 1. If
2
n<pn
3
then
2p 2n < 3p and p2 >
4 2
n > 2n,
9
so Equation (1.14) becomes
k(p) =
2n
n
−2
= 2 − 2 = 0.
p
p
We deduce that p 23 n for every prime factor p of N . It follows that
log p p|N
p2n/3
log p 4
n log 2
3
(1.16)
by Lemma 1.8. Now if k(p) 2 then by the bound (1.15),
2 log p k(p) log p log 2n,
√
√
so p 2n and thus there are at most 2n possible values of p. Hence
√
k(p) log p 2n log 2n.
k(p)2
Together with the inequality (1.16), this shows that
1.3 Listing the Primes
log N log p +
k(p)=1
log p +
21
k(p) log p
k(p)2
√
2n log 2n
p|N
√
4
log 2 + 2n log 2n.
3
(1.17)
Now N is the largest coefficient (namely the middle one) in the binomial
expansion of
22n = (1 + 1)2n ,
so
22n = 2 +
2n
2n
2n
+
+· · · +
2nN.
1
2
2n − 1
Substituting this estimate into the inequality (1.17) gives
2n log 2 √
4
n log 2 + log 2n + 2n log 2n.
3
(1.18)
It is clear that the inequality (1.18) cannot hold for large values of n; a simple
calculation shows that (1.18) implies that n does not exceed 500.
It follows that if n > 500, then there is a prime satisfying the inequality (1.13). A calculation confirms that (1.13) also holds for all n 500, completing the proof of the theorem.
Notice that a consequence of Equation (1.13) is that if the primes are listed
in order as p1 , p2 , . . . , then
pn+1 < 2pn for all n 1.
(1.19)
It is clear that Theorem 1.9 gives another proof that there must be infinitely many primes. In each interval of the form (n, 2n] there is at least one.
This gives us a bound for the prime counting function
π(X) = |{p X | p ∈ P}.
The proof of Euclid’s Theorem 1.2 already says a little more than the purely
qualitative statement that π(X) → ∞ as X → ∞: from the proof of Theorem 1.2 we see that
pn+1 p1 p2 · · · pn + 1.
This tells us something about π(X). Define a sequence (un ) by setting u1 = 2
and un+1 = u1 · · · un + 1 for n 1. Then
π(X) min{n | un X}.
This is an extremely slowly growing sequence, and the bound obtained
for π(X) is very far from the truth.
22
1 A Brief History of Prime
Theorem 1.9 says more: there are at least N primes in the interval
(1, 2N ] = (1, 2] ∪ (2, 4] ∪ (4, 8] ∪ · · · ∪ (2N −1 , 2N ],
so π(2N ) > N . It follows that π(X) is larger than C log(X) for some positive constant C, infinitely often. Something closer to the truth about the
asymptotic behavior of π(X) is the Prime Number Theorem (Theorem 8.1).
Finding more refined estimates for π(X) generally involves deep problems in
analytic number theory. An exception is the result of Tchebychef, described in
Exercise 8.7 on p. 162, which uses elementary methods to give better bounds
for π(X).
Bertrand’s Postulate is enough to exhibit a striking but impractical formula for the primes. More importantly, the bound (1.13) immediately motivates the question of whether the upper estimate 2n could be reduced, perhaps
for all large n only, and this is the subject of ongoing research.
Corollary 1.10. There exists a real number θ with the property that
·θ
22
·
2·
is a prime number for any number of iterations of the exponential.
Proof. Let q1 be any prime, and choose a sequence of primes (qn ) with the
property that
2qn < qn+1 < 2qn +1 .
(1.20)
This is possible by Bertrand’s Postulate. Now define functions f (1) , f (2) , . . .
by f (1) (x) = log2 (x) and f (n+1) (x) = log2 (f (n) (x)) for n 1. Define sequences (un ) and (vn ) by
un = f (n) (qn ) and vn = f (n) (qn + 1).
By the inequality (1.20),
qn < f (1) (qn+1 ) < f (1) (qn+1 + 1) < qn + 1,
so by applying the increasing function f (n) we have
un < un+1 < vn+1 < vn .
It follows that the sequence (un ) is increasing and bounded above, so it converges. Let
θ = lim un .
n→∞
Define functions g
Then
(n)
by g
(1)
(x) = 2x and g (n+1) (x) = 2g
g (n) (un ) < g (n) (θ) < g (n) (vn ),
(n)
(x)
for all n 1.
1.3 Listing the Primes
23
so
qn < g (n) (θ) < qn + 1 for all n 1
as required.
Exercise 1.7. [Mills] A deep result of Ingham improves Equation (1.13) to
say that there is a constant C such that
pn+1 − pn < Cp5/8
n .
Assuming this result, modify the proof of Corollary 1.10 to show that there
n
is a real number θ with the property that θ3 is a prime for all n 1.
Exercise 1.8. [Richert] Use Theorem 1.9 to show that every integer greater
than 6 is a sum of distinct primes. (Hint: Show this is true for the numbers 7
to 19, then use Theorem 1.9 to see that we can keep adding new primes to
the set of sums obtained without missing out any integers).
Exercise 1.9. [Dressler] (a) Modify the proof of Theorem 1.9 to show that
pn+1 < 2pn − 10 for all n > 6.
(Hint: Assume there is an integer n 1000 for which no prime p has
the
property n < p < 2n − 10, and consider the primes dividing N = 2n−10
n−10 .)
(b)*Use your result to prove that every positive integer apart from 1, 2, 4, 6
and 9 can be written as a sum of distinct odd primes.
1.3.2 Mersenne Primes
Mersenne3 noticed that 22 − 1 = 3, 23 − 1 = 7, 25 − 1 = 31, and 27 − 1 = 127
are all primes. He suggested on the basis of experiments that 2p − 1 would be
a prime whenever p is a prime that exceeds by 3 or less an even power of 2.
Lemma 1.11. If 2n − 1 is prime, then n is prime.
Proof. We prove the contrapositive statement that n being composite
forces 2n − 1 to be composite. If n = ab with a, b > 1, then
2n − 1 = (2a − 1)(2n−a + 2n−2a + · · · + 2a + 1),
so 2n − 1 is composite.
The list of primes noticed by Mersenne does not continue uninterrupted
because 211 −1 is composite. A prime of the form 2p −1 is known as a Mersenne
3
Marin Mersenne (1588–1648) was a French friar in the religious order of the
Minims. He defended Descartes and Galileo against their theological critics and
worked to undermine alchemy and astrology. He wrote on music as part of his
studies in physics and mathematics.
24
1 A Brief History of Prime
prime. The next few Mersenne primes are 213 − 1, 217 − 1 and 219 − 1. It is
not known if there are infinitely many Mersenne primes. That 219 − 1 is prime
was known to Cataldi in 1588, and this was the largest known prime for 150
years. Fermat discovered that 223 − 1 is not prime in 1640; in 1732 Euler knew
that 229 − 1 is not prime but that 231 − 1 is prime.
It is worth pausing to say something about how this knowledge, which
potentially requires the factorization of ten-digit numbers, accrued. Generally
this involved a mixture of improving technique with congruences, some guile,
and some heroic calculations. The first of several theoretical advances was
discovered by Fermat and is now known as Fermat’s Little Theorem.
Theorem 1.12. [Fermat’s Little Theorem] For any prime p and any
integer a,
ap ≡ a (mod p).
In keeping with our philosophy about differing approaches, we present two
proofs of Fermat’s Little Theorem.
Combinatorial Proof. It is enough to prove the statement when a is a
positive integer, so we use induction. The result is true for a = 1 because
both sides are 1. Assume it is true for a = b. Now
p p j
p
p
p−1
(b + 1) = b + pb
+ · · · + pb + 1 =
b
j
j=0
p!
by the Binomial Theorem. For 0 < j < p, pj = j!(p−j)!
has a numerator
divisible by p and denominatornot
divisible
by
p;
the
Fundamental
Theorem
of Arithmetic then shows that pj is divisible by p for j = 1, . . . , p − 1. So
(b + 1)p ≡ bp + 1 ≡ b + 1
(mod p)
by the inductive hypothesis. Thus Fermat’s Little Theorem is proved.
Exercise 1.10. Prove that the product of any n successive integers is divisible
by n!.
A second, and often more useful, version of Fermat’s Little Theorem can
be written as follows. Integers a and b are said to be coprime if gcd(a, b) = 1.
For all a ∈ Z that are coprime to p,
ap−1 ≡ 1 (mod p).
(1.21)
This form is easily seen to be equivalent to Theorem 1.12 as follows:
ap − a = a(ap−1 − 1),
so when
p does not divide a the Fundamental Theorem of Arithmetic shows
that p(ap−1 − 1) if and only if p(ap − a).
1.3 Listing the Primes
25
The second proof of Fermat’s Little Theorem proves the congruence (1.21)
and uses slightly more sophisticated ideas from group theory. The virtue of
this second proof is that it is quicker and (as we shall see) is better suited
to generalization. It does require some properties of modular arithmetic (see
Exercise 1.28 on p. 38).
Proof Using Group Theory. Work in the group G = (Z/pZ)∗ of nonzero
residues modulo p under multiplication. The residue of a generates a cyclic
subgroup of G whose order must divide that of G by Lagrange’s Theorem.
Since the order of G is (p − 1), we deduce Equation (1.21).
This proof is something of an anachronism: Lagrange’s Theorem generalized Fermat’s Little Theorem. However, thinking of residues using group
theory is a powerful tool and gives rise to many more results, so it is useful to
begin thinking in those terms now. Exercise 3.6 on p. 62 gives a good example
where a proof using group theory can be favourably compared with a proof
that only uses congruences.
Exercise 1.11. Fermat’s Little Theorem says that, for any prime p, 2p−1 − 1
is divisible by p. It sometimes happens that 2p−1 −1 is divisible by p2 . Find all
the primes p with this property for p < 106 . Such primes are called Wieferich
primes, and it is not known if there are infinitely many of them.
Exercise 1.12. *A pair of congruences that arises in the Catalan problem
(see p. 57) for odd primes p, q is
pq−1 ≡ 1
(mod q 2 ) and q p−1 ≡ 1
(mod p2 ).
(1.22)
A pair of odd primes satisfying Equation (1.22) is called a Wieferich pair.
Find all the Wieferich pairs with p, q < 104 .
Exercise 1.13. An integer n is called a perfect number if it is equal to the
sum of its proper divisors. Thus 6 = 1 + 2 + 3 is a perfect number.
(a) If q = 2p − 1 is a Mersenne prime, prove that 2p−1 q is a perfect number.
(b) Prove that if n is an even perfect number, then n has the form 2p−1 (2p −1)
for some prime of the form 2p − 1.
It is not known if there are any odd perfect numbers, but there are certainly
no odd perfect numbers smaller than 10400 .
Write Mn = 2n − 1 for the nth Mersenne number. The Mersenne numbers
have special properties that make them particularly suitable for primality
testing. The next result is the first of a series of results showing that divisors
of Mn are quite prescribed when n is prime.
Lemma 1.13. Suppose p is a prime and q is a nontrivial prime divisor of Mp .
Then q ≡ 1 modulo p.
26
1 A Brief History of Prime
Again, we give two proofs.
Proof Using the Euclidean Algorithm. The condition that q divides Mp amounts to
2p ≡ 1 (mod q).
By Fermat’s Little
Theorem, 2q−1 ≡ 1 modulo q. Let d = gcd(p, q − 1).
If d = p, then p(q − 1) as required. The only other possibility is d = 1 since p
is prime. By Theorem 1.23 (see p. 35), in this case there are integers a and b
with 1 = pa + (q − 1)b. Notice that one of a and b must be negative. Now
2 ≡ 21 ≡ 2pa+(q−1)b ≡ (2p )a (2(q−1) )b ≡ 1a 1b ≡ 1
(mod q),
(1.23)
which is impossible as q > 1, so the result is proved.
In the preceding argument, we have made use of negative exponents of
expressions modulo q, but only in the form
1−a ≡ 1 (mod q) for a > 0.
(1.24)
Proof Using Group Theory. Work in the group G of nonzero residues
modulo q. In this group 2 generates a cyclic subgroup whose order divides p
since 2p − 1 ≡ 0 modulo q. Since 2 is not the identity and p is prime, the
order of 2 must be p. Again, by Lagrange’s Theorem, this order must divide
the order of the group G, which is (q − 1).
Example 1.14. Lemma 1.13 is a significant help in factorizing Mn . To see how
this works, we present Fermat’s proof from 1640 that 223 − 1 is not prime. If q
is a prime dividing
223 − 1, then q ≡ 1 modulo 23. Now 23n + 1 is a prime
√
smaller than 223 − 1 only for
n = 2, 12, 20, 26, 30, 36, 42, 44, 50, 56, 60, 62, 72, 84, 86, 102, 104, 110.
Trial division shows that M23 is divisible by the first of the resulting numbers, 47. In general, there is no reason to expect the smallest possible candidate
to be a divisor, but even if the largest were the first such divisor, only 18 trial
divisions are involved.
In 1876, Lucas discovered a test for proving the primality of Mersenne
numbers. Using this test, he proved that
2127 − 1 = 170141183460469231731687303715884105727
is prime, but 267 − 1 is not. This disproved the suggestion of Mersenne.
The latter number occupies a special place in the history (and folklore) of
mathematics. First, Lucas showed it is not prime but was not able to exhibit a
nontrivial factor, which might seem a remarkable idea. In fact, it is something
we will encounter again in the computational number theory sections. Second,
1.3 Listing the Primes
27
this number was the subject of a famous talk given by Prof. F. N. Cole to
the American Mathematical Society in 1903 entitled “On the Factorization
of Large Numbers.” On one blackboard, he wrote out the decimal expansion
of 267 − 1 and on another he proceeded to compute the product of 193707721
and 761838257287, thereby showing them to be equal. The legend goes that
after this silent lecture he sat down to “prolonged applause.”
The specific arithmetic properties of Mersenne numbers mean that results
on the primality of later terms in the sequence sometimes predated results on
earlier terms. For example, 2127 −1 was shown to be prime in 1876 while 289 −1
and 2107 − 1 were shown to be prime in 1914.
Exercise 1.14. *[Lucas–Lehmer Test] Define an integer sequence by
S1 = 4
and Sn+1 = Sn2 − 2
for n 2.
Let p be an odd prime. Prove that Mp = 2p − 1 is a prime if and only
if Sp−1 ≡ 0 modulo Mp .
1.3.3 Zsigmondy’s Theorem
Although the proof of the conjecture that there are infinitely many Mersenne
primes seems a long way off, it is known that the sequence starts to produce
new prime factors very quickly. A prime p is a primitive divisor of Mn if p
divides Mn but does not divide Mm for any m < n. Table 1.2 shows the prime
factorization of Mn for 2 n 24, with primitive divisors shown in bold.
The pattern that seems to emerge from Table 1.2 turns out to reflect
something genuine. Sequences such as the Mersenne sequence, after a few
initial terms, always have primitive divisors.
Theorem 1.15. [Zsigmondy] Let Mn = 2n −1. Then for every n = 6, n > 1,
the term Mn has a primitive divisor.
As seen in Table 1.2, M6 does not have a primitive divisor, so this result
is optimal. The proof of Theorem 1.15 is presented in Section 8.3.1 on p. 167,
after we have proved the Möbius inversion formula (Theorem 8.15). A basic
result that will be needed for the proof can be proved now, using the Binomial
Theorem. Notice that this result, proved as the next exercise, already shows
that the divisors of the sequence (Mn ) have a special structure.
Exercise 1.15. Let p denote a prime, and for any integer N , define ordp (N
)
to be the exact power of p that divides N . Thus ordp (N ) = a means pa N
but pa+1 N .
(a) Prove that ordp behaves like a logarithm in the sense that
ordp (xy) = ordp (x) + ordp (y)
for all integers x, y.
(b) Prove that if pMn then ordp (Mkn ) = ordp (Mn ) + ordp (k).
(c) Deduce that gcd(Mn , Mm ) = Mgcd(n,m) for all m, n.
28
1 A Brief History of Prime
Table 1.2. Primitive divisors of (Mn ).
n
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Mn
Factorization
3
3
7
7
15
3·5
31
31
63
32 · 7
127
127
255
3 · 5 · 17
511
7 · 73
1023
3 · 11 · 31
2047
23 · 89
4095
3 · 5 · 7 · 13
8191
8191
16383
3 · 43 · 127
32767
7 · 31 · 151
65535
3 · 5 · 17 · 257
131071
131071
262143
33 · 7 · 19 · 73
524287
524287
1048575 3 · 52 · 11 · 31 · 41
2097151
7 · 127 · 337
4194303
3 · 23 · 89 · 683
8388607
47 · 178481
16777215 3 · 5 · 7 · 13 · 17 · 241
Exercise 1.16. (a) Show that if q is a prime then every prime divisor of Mq
is a primitive divisor.
(b) If Mn does not have a primitive divisor show that Mn divides the quantity
Mn/p .
n
p|n,
p<n
(c) Deduce that for n > 6, every term Mn has a primitive divisor if n has only
two distinct prime divisors. (Hint: take logarithms of the quantities in (b) and
compare the growth rates of both sides.)
(d) What can you deduce if n has three distinct prime divisors?
Zsigmondy’s Theorem holds in greater generality, though we will not prove
the following result here.
Theorem 1.16. [Zsigmondy] Let an = cn − dn , where c > d are positive
coprime integers. Then an always has a primitive divisor unless
(1) c = 2, d = 1 and n = 6; or
(2) c + d = 2k and n = 2.
1.4 Fermat Numbers
29
Exercise 1.17. Find some nontrivial examples of case (2) of the theorem.
A more general result is considered in Exercise 8.19 on p. 169.
Exercise 1.18. Prove that the sequence (un ) does not satisfy a Zsigmondy
Theorem in each of the following cases. This means that for every N there is
a term un , n > N , which does not have a primitive divisor.
(a) un = an + b for integers a and b;
(b) un = n2 + an + b for integers a and b with the property that the zeros
of x2 + ax + b are integers;
(c)*un = n2 + an + b for integers a and b.
Exercise 1.19. *Can any polynomial un = nd + ad−1 nd−1 + · · · + a0 for integers a0 , . . . , ad−1 have the property that the sequence (un ) satisfies a Zsigmondy Theorem?
1.3.4 Mersenne Primes in the Computer Age
The arrival of electronic computers extended the limits of large Mersenne
prime-hunting dramatically.
Table 1.3 is a short list showing how the size of the largest known Mersenne
prime has grown over recent years; #Mp denotes the number of decimal digits
in Mp . In 1978, Nickol and Noll were 18-year-old students. We do not distinguish here between a Mersenne prime that is the largest known at the time
from a Mersenne prime for which all smaller Mersenne primes are known;
see the references for a more detailed discussion. In Table 1.3, (G) denotes
GIMPS and (P) denotes PrimeNet; these are distributed computer searches
using idle time on many thousands of computers all over the world. Because
of the special properties of Mersenne numbers (and related numbers of special
shape), it has usually been the case that the largest explicitly known prime
number is a Mersenne prime.
1.4 Fermat Numbers
n
Fermat noticed that the expression Fn = 22 + 1 takes prime values for the
first few values of n:
F0 = 3,
F1 = 5,
F2 = 17,
F3 = 257,
and
F4 = 65537.
He believed the sequence might always take prime values.
Euler in 1732 gave
the first counterexample, when he showed that 641F5 .
Euler, in common with Fermat and many others, was able to perform
these impressive calculations through a good use of technique to minimize
the amount of calculation required. Since Euler’s time, many other Fermat
numbers have been investigated and shown to be composite. No prime values
30
1 A Brief History of Prime
Table 1.3. Largest known prime values of Mp (from Caldwell’s Prime Pages [25]).
p
17
19
31
61
89
107
127
521
607
1279
2203
2281
3217
4253
4423
9689
9941
11213
19937
21701
23209
44497
86243
110503
132049
216091
756839
859433
1257787
1398269
2976221
3021377
6972593
13466917
20996011
24036583
#Mp
6
6
10
19
27
33
39
157
183
386
664
687
969
1281
1332
2917
2993
3376
6002
6533
6987
13395
25962
33265
39751
65050
227832
258716
378632
420921
895932
909526
2098960
4053946
6320430
7235733
Date
1588
1588
1772
1883
1911
1914
1876
1952
1952
1952
1952
1952
1957
1961
1961
1963
1963
1963
1971
1978
1979
1979
1982
1988
1983
1985
1992
1994
1996
1996
1997
1998
1999
2001
2003
2004
Discoverer
Cataldi
Cataldi
Euler
Pervushin
Powers
Powers
Lucas
Robinson
Robinson
Robinson
Robinson
Robinson
Riesel
Hurwitz
Hurwitz
Gillies
Gillies
Gillies
Tuckerman
Nickol and Noll
Noll
Nelson and Slowinski
Slowinski
Colquitt and Welsh
Slowinski
Slowinski
Slowinski and Gage
Slowinski and Gage
Slowinski and Gage
Armengaud, Woltman et al. (G)
Spence, Woltman et al. (G)
Clarkson, Woltman, Kurowski et al. (G, P)
Hajratwala, Woltman, Kurowski et al. (G, P)
Cameron, Woltman, Kurowski et al. (G, P)
Shafer, Woltman, Kurowski et al. (G, P)
Findley, Woltman, Kurowski et al. (G)
of Fn with n > 4 have been discovered, and it is generally expected that only
finitely many terms of the sequence (Fn ) are prime.
To begin, we return to Euler’s result that 641 divides F5 . First, notice
that 640 = 5 · 27 ≡ −1 modulo 641 so working modulo 641,
1 = (−1)4 ≡ (5 · 27 )4 = 54 · 228 .
Now 54 = 625 ≡ −16 modulo 641 and 16 = 24 . Hence
1.5 Primality Testing
1 ≡ −232 ≡ −22
5
31
(mod 641).
Of course, this elegant argument is useful only once we suspect that 641
is a factor of F5 . Euler also used some cunning to reach that point.
Lemma 1.17. Suppose p is a prime with pFn . Then p = 2n+1 k + 1 for
some k ∈ N.
Example 1.18. When n = 5, Lemma 1.17 shows that if p is a prime dividing F5 ,
then p = 26 k + 1 = 64k + 1 for some k. Thus the list of possible divisors is
greatly reduced. We only have to test F5 for divisibility by
65, 129, 193, 257, 321, 385, 449, 513, 577, 641, . . . ,
of which 65, 129, 321, 385, 513, . . . are not primes. Therefore we only have
to test 193,
257, 449, 577, 641, . . . and so on. At the fifth attempt, we find
that 641F5 .
n
Proof of Lemma 1.17. Suppose p is a prime with pFn , so 22 ≡ −1
modulo p and p is odd. Hence
22
n+1
n
= (22 )2 ≡ (−1)2 ≡ 1
(mod p).
Let d = gcd(2n+1 , p − 1), and write d = 2n+1 a + (p − 1)b for integers a and b
using Theorem 1.23. Just as in Equation (1.23) one of a and b will be negative,
so we again use Equation (1.24) to argue that
2d = 22
n+1
a+(p−1)b
≡ (22
n+1
)a (2p−1 )b ≡ 1
(mod p).
Since d2n+1 , d = 2c for some 0 c n + 1 so
c
22 = 2d ≡ 1 (mod p).
n
However, 22 ≡ −1 modulo p and −1 ≡ 1 modulo p, so the smallest possibility
for c is (n + 1). Hence d = 2n+1 . On the other hand, d(p − 1) so p − 1 = k2n+1
as claimed.
Exercise 1.20. Strengthen Lemma 1.17 by showing that any prime p dividing Fn must have the form 2n+2 k + 1 for some k ∈ N.
1.5 Primality Testing
We have covered enough ground to take a first look at the challenges thrown
up by primality testing. Given a small integer, one can determine if it is
prime by testing for divisibility by known small primes. This method becomes
totally unfeasible very quickly. We are really trying to factorize. The ability
32
1 A Brief History of Prime
to rapidly factorize large integers remains the Holy Grail of computational
number theory. Later we will look at some more sophisticated techniques and
estimate the range of integers for which they are applicable.
For now, we concentrate on properties of primes that can be used to help
determine primality. Fermat’s Little Theorem is an example, although it does
not give a necessary and sufficient condition for primality, just a necessary one.
The next result does give a necessary and sufficient condition; it is known as
Wilson’s Theorem because of a remark to this effect allegedly made by John
Wilson in 1770 to the mathematician Edward Waring. An early proof was
published by Lagrange in 1772. The theorem first seems to have been noted
by al-Haytham4 some 750 years before Wilson.
Theorem 1.19. An integer n > 1 is prime if and only if
(n − 1)! ≡ −1 (mod n).
Proof of ‘only if’ direction. We prove that the congruence is satisfied
when n is prime and leave the converse as an exercise. Assume that n = p is
an odd prime. (The congruence is clear for n = 2.)
Each of the integers 1 < a < p − 1 has a unique multiplicative inverse
distinct from a modulo p (see Corollary 1.25). Uniqueness is obvious; for
distinctness, note that a2 ≡ 1 modulo p implies p(a+1)(a−1), forcing a ≡ ±1
modulo p by primality. Thus in the product
(p − 1)! = (p − 1)(p − 2) · · · 3 · 2 · 1,
all the terms cancel out modulo p except the first and the last. Their product
is clearly −1 modulo p.
Exercise 1.21. Prove the converse: If n > 1 and (n − 1)! ≡ −1 modulo n,
then n is prime.
Exercise 1.22. [Gauss] Prove the following generalization of Theorem 1.19.
Let
Pn =
m
m<n,
gcd(m,n)=1
be the product of all positive integers less than n and coprime to n. Then Pn +1
is divisible by n if n is equal to 4, pk , or 2pk for some odd prime p, and Pn − 1
is divisible by n if n is not of that form.
4
Abu Ali al-Hasan ibn al-Haytham (964–1040) lived in Persia and Egypt. He is
most famous for Alhazen’s Problem: Find the point on a spherical mirror where
a light will be reflected to an observer. In number theory, in addition to proving
what we often call Wilson’s Theorem, al-Haytham worked on perfect numbers
(see Exercise 1.13).
1.5 Primality Testing
33
Exercise 1.23. [Clement] (a) Use al-Haytham’s Theorem (Theorem 1.19)
to prove that, for n > 1, n and n + 2 are both prime if and only if
4 (n − 1)! + 1 + n ≡ 0 (mod n(n + 2)).
(b) Prove that, for n > 13, the triple n, n + 2, and n + 6 are all prime if and
only if
4320 4 (n − 1)! + 1 + n + 361n(n + 2) ≡ 0 mod n(n + 2)(n + 6) .
(c) Find a similar characterization of prime triples of the form n, n + 4,
and n + 6.
Primes p for which p + 2 is also a prime are called twin primes, and it
is a long-standing conjecture that there are infinitely many twin primes. A
remarkable result of Brun from 1919 is that the reciprocals of the twin primes
(whether there are infinitely many or not) are summable:
p,p+2∈P
1
= B < ∞.
p
(1.25)
Numerical estimation of Brun’s constant B is very difficult.
Exercise 1.24. Theorem 1.19 gives another ‘formula’ for the primes. Show
that (n − 2)! is congruent to 1 or 0 modulo n depending on whether n is prime
or not, for n 3.
(a) Deduce that the prime counting function π(X) = |{p ∈ P | p X}| may
be written
π(X) = 1 +
X j=3
(j − 2)!
(j − 2)! − j
j
, X 3,
with π(1) = 0, π(2) = 1.
(b) Define a function f by f (x, x) = 0 and
1
x−y
f (x, y) =
1+
for x = y.
2
|x − y|
Use Theorem 1.9 to prove that
n
pn = 1 +
2
f (n, π(j)).
j=1
In principle, Theorem 1.19 seems to offer a general primality test because
the condition is necessary and sufficient. The problem is that in practice it
is impossible to compute (n − 1)! modulo n in a reasonable amount of time
34
1 A Brief History of Prime
for any integer that is not quite small. In Chapter 12 we will seek to give a
better understanding of what counts as “small” or “large” in terms of modern
computing.
Fermat’s Little Theorem offers another hope. Taking a = 2, Fermat’s Little
Theorem implies that
2p−1 ≡ 1 (mod p) whenever p is prime.
(1.26)
At various times in history, it has been thought that a kind of converse might
be true: If n is odd and 2n−1 ≡ 1 modulo n, might it follow that n is prime?
Calculations tend to support this, and for n < 341 this does indeed successfully
detect primality.
Example 1.20. Testing the congruence 2n−1 ≡ 1 modulo n fails to detect the
fact that n = 341 = 11 · 31 is composite. By Fermat’s Little Theorem, 210 ≡ 1
modulo 11 so 2340 ≡ 134 ≡ 1 modulo 11. Also 25 = 32 ≡ 1 modulo 31, so
2340 = (25 )68 ≡ 168 = 1 (mod 31).
Thus 2340 − 1 is divisible by the coprime numbers 11 and 31, and hence by
their product 341, so 2340 ≡ 1 modulo 341.
However, Fermat’s Little Theorem says more than Equation (1.26): It gives
the congruence
ap−1 ≡ 1 (mod p)
for any base a, not just a = 2. Taking a = 3 in Example 1.20, we quickly find
3340 ≡ 56 (mod 341),
which contradicts Fermat’s Little Theorem with a = 3, showing that 341
cannot be prime. Notice the recurrence of a phenomenon encountered before:
Using a = 3, we have shown that a number is not prime without exhibiting a
nontrivial factor.
This method suggests the following as a primality test. Given an integer n,
choose numbers a at random with 1 < a < n and test to see if an−1 ≡ 1
modulo n. If not, then n is definitely composite. If the congruence is satisfied
for several such a, we might view this as compelling evidence that n must be
prime. Unfortunately, this also fails as a primality test.
Exercise 1.25. Prove that n = 561 is a composite number that satisfies Fermat’s Little Theorem for every possible base by showing that a560 ≡ 1 modulo 561 for every a, 1 < a < n with gcd(a, 561) = 1. (Hint: Use Fermat’s Little
Theorem on each of the factors 3, 11, and 17 of 561.)
A composite integer that satisfies the congruence of Fermat’s Little Theorem for all bases coprime to itself is known as a Carmichael number ; these will
be discussed in more detail in Section 12.5. It was not known whether there
1.6 Proving the Fundamental Theorem of Arithmetic
35
are infinitely many Carmichael numbers until 1994, when Alford, Granville,
and Pomerance not only proved that there are infinitely many but gave some
measure of how many there are asymptotically. The existence of infinitely
many Carmichael numbers renders the test based on Fermat’s Little Theorem test too unreliable. Later, we will see however that a more sophisticated
version is salvageable as a primality test.
1.6 Proving the Fundamental Theorem of Arithmetic
We uncover Euclid’s real genius once we try to prove the Fundamental Theorem of Arithmetic. There are two parts to it: existence and uniqueness. The existence part is not difficult. Let n > 1 be an integer, and choose r with 2r > n.
If n itself is not divisible by any a with 1 < a < n, then nothing else needs
to be said. Otherwise, we can write n = ab with 1 < a, b < n. Again, if a
and b cannot be factorized, further then we are done. If this is not the case
then at least one of them can be factorized. Once we have done this r times,
we have n = a1 · · · ar with each 1 < ai < n. This implies n 2r , giving a
contradiction. Thus n must be a product of no more than r prime factors.
It is when we come to the uniqueness part of the proof that we uncover a
subtlety – namely, that the definition of prime as an irreducible element is not
really adequate to prove the Fundamental Theorem of Arithmetic. Suppose
we try to argue as follows: Consider two factorizations for n into primes, say
p1 · · · p r = n = q 1 · · · q s .
We would like to say that because p1 divides the right-hand side, it must
divide one of the qi . However, if we are working with the definition of prime
as irreducible, then we need a result that tells us that being irreducible forces
this divisibility property. Such a result may be found using the Euclidean
Algorithm.
Later, we will see examples in rings that are closely related to Z whose
elements have genuinely different factorizations into irreducibles.
Exercise 1.26. Let
A = {n ∈ N | n ≡ 1 (mod 4)},
and call n = 1 an A-prime if the only divisors of n in A are 1 and n.
(a) Show that every element of A except 1 factorizes as a finite product of Aprimes.
(b) Show that this factorization into A-primes is not unique.
1.6.1 The Euclidean Algorithm
Given a, b > 0 in Z, we can always find q and r with a = bq + r and 0 r < b.
Indeed, for q we can simply take the integer part a/b
of a/b and then show
that by defining r = a − bq we must have 0 r < b.
36
1 A Brief History of Prime
Something very interesting happens when we iterate this process. It will
help to define q = q1 and r = r1 and continue to find quotients and remainders
as follows:
a = bq1 + r1 ,
0 r1 < b
b = r1 q 2 + r 2 ,
0 r2 < r1
..
..
.
.
rn−3 = rn−2 qn−1 + rn−1 ,
rn−2 = rn−1 qn + rn ,
rn−1 = rn qn+1 + 0.
0 rn−1 < rn−2
0 rn < rn−1
The sequence of remainders is decreasing and each term is nonnegative, so the
sequence
must terminate. We have written rn for the last nonzero remainder,
so rn rn−1 . We claim that rn is the greatest common divisor of a and b.
Example 1.21. Let a = 17 and b = 11. Then the Euclidean Algorithm gives
the equations
17 = 11 · 1 + 6,
11 = 6 · 1 + 5,
6 = 5 · 1 + 1,
5 = 1 · 5 + 0.
The last nonzero remainder is the greatest common divisor of 17 and 11, which
is clearly 1.
To prove that rn = gcd(a, b), we need a better notion of greatest common
divisor than the intuitive one.
Definition 1.22. If a and b in Z are not both zero, d is said to be a greatest
common divisor of a and b if
(1) da and db; and
(2) if d is any number with d a and d b, then d d.
The first condition says d is a common divisor of a and b, while the second
says it is the greatest such divisor.
Note that we say “a” greatest common divisor rather than “the” greatest
common divisor because if d satisfies this condition then −d will also satisfy the definition. If we work in N, then the greatest common divisor will
be unique. The notation gcd(a, b) denotes the unique nonnegative greatest
common divisor of a and b. If gcd(a, b) = 1, then we will call a and b coprime.
Exercise 1.27. Using Definition 1.22, show that rn = gcd(a, b). (Hint: Work
your way up and then down the chain of equations to verify the two properties.)
1.6 Proving the Fundamental Theorem of Arithmetic
37
The next result is fundamental to the structure of the integers; it is an
easy consequence of the Euclidean Algorithm and is sometimes referred to as
Bezout’s Lemma.
Theorem 1.23. If d = gcd(a, b) with a, b ∈ Z not both zero, then there are
numbers x, y ∈ Z with
d = ax + by.
(1.27)
Proof. The idea is to work your way up the chain of equations in the Euclidean Algorithm, always expressing the remainder in terms of the previous
two remainders. Writing ∗ for an integer, we get
gcd(a, b) = rn = rn−2 − rn−1 qn
= rn−2 (1 + qn qn−1 ) − rn−3 qn
= rn−3 · ∗ + rn−4 · ∗
..
.
= b · ∗ + r1 · ∗
= a · ∗ + b · ∗.
Example 1.24. Using the equations from Example 1.21 we find that
1 = 6−5
= 6 − (11 − 6)
= 2 · 6 − 11
= 2(17 − 11) − 11
= 2 · 17 − 3 · 11.
Corollary 1.25. Let n > 1 and a denote elements of Z. Then a and n are
coprime if and only if there exists x with
ax ≡ 1 (mod n).
That is, gcd(a, n) = 1 if and only if a is invertible modulo n.
Proof. The congruence is equivalent to the existence of an integer y with
ax + ny = 1.
If a and n have a factor in common then that factor will also divide 1, so the
congruence implies a and n are coprime. Conversely, if a and n are coprime
then 1 is a greatest common divisor of a and n so we can use Theorem 1.23
to see that there are integers x and y with ax + ny = 1, which translates into
the congruence.
38
1 A Brief History of Prime
∗
Exercise 1.28. Let p be a prime. Prove that the set (Z/pZ) of nonzero
elements in Z/pZ forms a group under multiplication modulo p.
One of the remarkable things about the Euclidean Algorithm is that it
finds the greatest common divisor of two integers without factorizing either
of them. We will see later how this has been exploited in powerful ways by
computational number theory in recent years.
Exercise 1.29. Prove the Fundamental Theorem of Arithmetic using Theorem 1.23. (Hint: This is done in greater generality on p. 47.)
1.6.2 An Inductive Proof of Theorem 1.1
We wish to prove that any natural number n has a decomposition n = p1 · · · pr
into primes uniquely up to rearrangement of the prime factors.
For n = 2, the theorem is clearly true. We proceed by induction. Suppose
that the Fundamental Theorem of Arithmetic holds for all natural numbers
strictly less than some a > 1. We want to deduce the Fundamental Theorem
of Arithmetic for a. Let
D = {d | d > 1, da}
denote the set of non-identity divisors of a. The set D is nonempty since it
contains a, so it has a smallest element, which we denote p. This smallest
element must be a prime because if it had a nontrivial divisor that would be
a smaller element of D. Thus we have a decomposition
a = pb, p prime, b < a.
Since b < a, by the inductive hypothesis, the Fundamental Theorem of Arithmetic holds for b, so there is a prime decomposition
b = p 1 · · · ps
into primes uniquely up to rearrangement. It follows that
a = p · p1 · · · ps
is a prime decomposition of a, and a has no other prime decomposition involving the prime p.
Suppose that a has another prime decomposition,
a = q1 · · · qr ,
in which the prime p does not appear. In particular, q1 = p. Moreover, by the
definition of p, q1 > p since q1 ∈ D, 1 q1 − p < q1 . Let c = q2 · · · qr , and
define
a0 = a − pc = p(b − c) = (q1 − p)c.
(1.28)
1.7 Euclid’s Theorem Revisited
39
Now 1 a0 < a and the divisors (b − c), (q1 − p), and c are all less than a.
By the inductive hypothesis, the numbers a0 , (b − c), (q1 − p), and c all have
unique prime decompositions. By Equation (1.28), the prime p must appear
in any prime decomposition of a0 and therefore (by uniqueness) must also
appear in the decomposition of (q1 − p) or that of c.
Now p cannot
appear in a prime decomposition of (q1 − p) because that
would require pq1 , which is impossible, as p and q1 are distinct primes. Nor
can p appear in a prime decomposition of c = q2 · · · qr by assumption. Thus
the assumption of a second prime decomposition for a leads to a contradiction,
completing the proof of the Fundamental Theorem of Arithmetic.
1.7 Euclid’s Theorem Revisited
In this section, three further proofs of Theorem 1.2 are given, each interesting
and suggestive in its own right.
1.7.1 What Did Euclid Really Prove?
First, we return to the master’s proof. The following is a translation of Euclid’s
proof taken from Joyce’s Web translation of Euclid’s Elements. In Euclid’s
time, numbers were thought of as relatively concrete lengths of line segments.
Thus, for example, a number A measures a number B if a stick of length A
could be used to fit into a stick of length B a whole number of times. In
modern terminology, A divides B. We start with Euclid’s Theorem in (an
approximation of) Euclid’s language:
OÉ prÀtoi ĆrijmoÈ pleÐouc eÊsÈ pantäc toÜ
protejèntoc plăjouc prÿtwn ĆrijmÀn.
A translation of this is the following theorem, which is Proposition 20 of
Book IX in Euclid’s Elements.
Theorem 1.26. The prime numbers are more than any assigned multitude of
prime numbers.
Proof. Let A, B, and C be the assigned prime numbers. I say that there are
more prime numbers than A, B, and C. Take the least number DE measured
by A, B, and C. Add the unit DF to DE.
Then EF is either prime or not.
First, let it be prime. Then the prime numbers A, B, C, and EF have
been found, which are more than A, B, and C.
Next, let EF not be prime. Therefore, it is measured by some prime number. Let it be measured by the prime number G. I say that G is not the same
as any of the numbers A, B, and C.
40
1 A Brief History of Prime
If possible, let it be so.
Now A, B, and C measure DE, and therefore G also measures DE. But
it also measures EF . Therefore G, being a number, measures the remainder,
the unit DF , which is absurd.
Therefore G is not the same as any one of the numbers A, B, and C,
and by hypothesis it is prime. Therefore, the prime numbers A, B, C, and G
have been found, which are more than the assigned multitude of A, B, and C.
Therefore, prime numbers are more than any assigned multitude of prime
numbers.
There is little between this argument and Euclid’s proof in modern form
on p. 8. Euclid did not have our modern notion of infinity, so he proved that
there are more primes than any prescribed number. He also often stated proofs
using examples (in this case, what he really proves is that there are more than
three primes), but it is clear he understood the general case. It is possible that
part of the reason for this is the notational difficulties involved in dealing with
arbitrarily large finite lists of objects.
1.7.2 A Topological Proof of Theorem 1.2
In 1955, Furstenberg gave a completely different type of proof of the infinitude
of the primes using ideas from topology.
Furstenberg’s Topological Proof of Theorem 1.2. Define a topology
on the integers Z by taking as a basis the arithmetic progressions. For each
prime p, let Sp denote the arithmetic progression pZ. Since
Sp = Z\ (pZ + 1) ∪ · · · ∪ (pZ + (p − 1)) ,
the set Sp is the complement of an open set, and thus is closed. Let S = p Sp
be the union of all the sets Sp as p varies over the primes. If there are only
finitely many primes, then S is a finite union of closed sets, and thus is closed.
However, every integer except ±1 is in some Sp , so the complement of S
is {1, −1}, which is clearly not open. It follows that S cannot be closed and
therefore cannot be a finite union, so there must be infinitely many primes. In contrast with the other proofs of Theorem 1.2, this is qualitative – all
it tells us about the prime counting function is that π(X) → ∞ as X → ∞.
1.7.3 Goldbach’s Proof
Goldbach showed how one may use a sequence of integers with the property
that an infinite subsequence are pairwise coprime to give a different proof.
Goldbach’s Proof of Theorem 1.2. We claim that the Fermat numn
bers Fn = 22 + 1 are pairwise coprime:.
m = n =⇒ gcd (Fm , Fn ) = 1.
(1.29)
1.7 Euclid’s Theorem Revisited
41
The first step is to show by induction that
Fm − 2 = F0 F1 · · · Fm−1 for all m 1.
(1.30)
To see why this is true, first note that F1 − 2 = F0 and assume that Equation (1.30) holds for m k. Then
F0 F1 · · · Fk−1 Fk = (Fk − 2)Fk
k
k
= 22 − 1 2 2 + 1
= 22
k+1
− 1 = Fk+1 − 2,
showing Equation (1.30) by induction. Thus for m > n,
dFm , dFn =⇒ dFm − 2 =⇒ d2,
which forces d to be 1 since all the Fn are odd numbers. This proves Equation (1.29).
This in turn means there must be infinitely many primes. By Theorem 1.1,
each Fn has a prime factor pn , say, and by Equation (1.29) these are all
distinct.
The proof using Fermat numbers actually does a little more than prove
there are infinitely many primes. It also gives some insight into how many
primes there are that are smaller than a given number. By the time we reach
the number Fn , we must have seen at least n different primes, so
log(X − 1)
1
log
π(X) ,
log 2
log 2
which is approximately proportional to log log X. This is far weaker than the
remark on p. 21.
Notes to Chapter 1: The exact history of Theorem 1.1 is not clear, and it is
likely that it was known and used long before it was explicitly stated. The earliest
precise formulation and proof seems to be due to Gauss [67], but it could be argued
that Euclid certainly knew that if a prime p divides a product ab, then p must divide a or b, and that his geometrical formalism and approach to exposition did not
require him to consider products of more than three terms (see Section 1.7.1). Many
of the proofs of Euclid’s Theorem are featured in the Prime Pages Web site [25];
Ribenboim’s book [125] describes no fewer than 11 proofs. Example 1.7 is related to
subtle problems in algebraic number theory; see Ribenboim’s book [125] for a discussion and detailed references. That the positive values of a polynomial in several
variables could coincide with the primes is essentially a by-product of Matijasevič’s
solution to one of Hilbert’s famous problems. Some of the history and references
and two explicit polynomials are given in accessible form in the paper [85] of Jones,
Sato, Wada and Wiens. The proofs of Lemma 1.8 and Theorem 1.9 are those of
42
1 A Brief History of Prime
Erdös [51] and Kalmar, and may be found in Hardy and Wright [75]; that of Corollary 1.10 follows a survey paper of Dudley [46]. Bertrand’s Postulate (Theorem 1.9)
was first proved by Tchebychef [151, Tome I, pp. 49–70, 63]. He also proved that
for any e > 15 , there is a prime between x and (1 + e)x for x sufficiently large. The
deep result of Ingham [80] has been improved a great deal — for example, Baker,
Harman and Pintz [8] have shown that there is a prime in the interval [x − x0.525 , x]
for x sufficiently large. Exercise 1.7 is due to Mills [107]. Exercise 1.8 comes from a
paper of Richert [127]; Exercise 1.9 from a paper of Dressler [45]. Further material
on Mersenne primes – and on large primes in general – may be found on Caldwell’s
Prime Pages Web site [25]; Table 1.3 is taken from his Web site. A recent account
of the GIMPS record-breaking prime is in Ziegler’s short article [167]. Zsigmondy’s
Theorems 1.15 and 1.16 appeared first in his paper [168]; a more accessible proof
may be found in a short paper by Roitman [132]. Deep recent work has extended this
to a larger class of sequences: Bilu, Hanrot and Voutier have shown that for n > 30
the nth term of any Lucas or Lehmer sequence has a primitive divisor in their
paper [15]. The current status of Fermat numbers and their factorization may be
found on Keller’s Web site [88]. Parts of the intricate connection between group
theory and the origins of modern number theory, and in particular a discussion of
how Gauss used group-theoretic concepts long before they were formalized, are in a
paper of Wußing [164]. For more on the very special numbers found in Exercise 1.11
see Ribenboim’s popular article [123]. The inductive proof of Theorem 1.1 in Section 1.6.2 is taken from Hasse’s classic text [76] and is attributed there to Zermelo.
Hasse’s text is also the source of the statement of Euclid’s Theorem in Greek in
Section 1.7.1. We thank David Joyce for permission to use the translation in Section 1.7 from his Web site [86]; this Web site is based on several translations of
Euclid’s work, but the primary and most accessible source remains the translation
by Heath [53]. Exercise 1.24 is taken from Hardy and Wright [75]. Furstenberg’s
proof of Euclid’s Theorem appeared in [63]. Exercise 1.23 is taken from Clement’s
paper [31]. Brun’s result in Equation (1.25) appeared originally in his paper [24]; a
modern proof may be found in the book of LeVeque [100]. Finally, we make some
remarks concerning Section 1.7.2. Using topology in this setting might seem odd,
but perhaps Euler’s proof using the harmonic series seemed odd when it first appeared. We don’t wish to stretch the point, but it could just be that Furstenburg’s
proof points forward to new ways of looking at arithmetic in just the same way
as Euler’s did. Profound structures in the integers have certainly been uncovered
using methods from ergodic theory, combinatorics, functional analysis, and Fourier
analysis; see a survey paper of Bergelson [11], the book by Furstenberg [64], and
a new approach in a paper of Gowers [72] for some of these startling results. In a
similar vein, Green and Tao [73] have recently proved the deep result that the primes
contain arbitrarily long arithmetic progressions.
2
Diophantine Equations
Diophantine equations are equations (very often involving polynomials with
integer coefficients) in which the solutions are required to be integers. They
have been studied since antiquity and are mathematically both challenging
and attractive because of the great diversity of methods that are needed to
understand them.
2.1 Pythagoras
In this chapter, we are going to explore the relationship between the Fundamental Theorem of Arithmetic and the study of polynomial Diophantine
problems. We begin with an equation handed down from antiquity,
x2 + y 2 = z 2 .
(2.1)
We know that an equation of this kind is related to a right-angled triangle with
side lengths x, y, and z. Right-angled triangles have been studied and used
for four thousand years (at least). Equation (2.1) is called the Pythagorean
equation to honor Pythagoras for his result connecting Equation (2.1) to rightangled triangles. We seek to identify all the integral solutions; that is, to find
all triples of integers (x, y, z) that satisfy Equation (2.1). The main point in
the first three sections of this chapter is to emphasize the symbiosis between
properties of numbers and solutions of equations.
To motivate what follows, rearrange the equation to read
x2 = z 2 − y 2 = (z + y)(z − y).
(2.2)
If we knew that gcd(z + y, z − y) = 1, then we could apply the Fundamental
Theorem of Arithmetic to argue that both (z + y) and (z − y) must themselves be squares and use the resulting equations to parametrize all triples of
solutions.
44
2 Diophantine Equations
To refine the proof, we resort to a congruence argument. First, we may
assume that the triple (x, y, z) contains no common prime factor – otherwise
we may divide through by the square of that factor. A triple (x, y, z) is called
a primitive solution of Equation (2.1) if x, y, and z have no common factor.
Second, we may assume that only one of the three is even because if two are
then the third must be, contrary to the primitive condition. Now the even one
out (so to speak) cannot be z because
x2 + y 2 ≡ 0 (mod 4)
is impossible with x and y being odd. Thus we may suppose one of x or y is
even. Without loss of generality, suppose it is x that is even. Write x = 2x
and substitute into Equation (2.2) to give
z−y
z+y
.
x2 =
2
2
Notice that each of (z ±y)/2 must be an integer because z and y are both odd.
More than that, they must be coprime because any common factor of any two
of x, y, and z must divide the third. Hence, any common divisor of (z ± y)/2
will also divide their sum and their difference, z and y, and we are assuming
the triple (x, y, z) is primitive.
Thus at last we may apply the Fundamental Theorem of Arithmetic to
deduce that (z ± y)/2 are both squares, say
z + y = 2m2 , z − y = 2n2 , m > n.
(2.3)
We are assuming z and y are positive so z + y > z − y, giving the inequality between m and n. Solving Equation (2.3) for z and y and then using Equation (2.1) to find x gives the following characterization of primitive
Pythagorean triples.
Theorem 2.1. The primitive integral solutions of the Pythagorean equation
x2 + y 2 = z 2
with even x are given by
x = 2mn, y = m2 − n2 , z = m2 + n2
with m > n coprime integers, not both odd.
The integers m > n are said to parametrize the solutions of the equation.
Exercise 2.1. For any primitive solution of Equation (2.1) show that one
of x, y, or z is divisible by 3, one by 4, and one by 5.
2.2 The Fundamental Theorem of Arithmetic in Other Contexts
45
Exercise 2.2. Finding integral solutions to Equation (2.1) is equivalent to
finding rational solutions to x2 + y 2 = 1. Find the second point of intersection
with the circle x2 + y 2 = 1 of the line with slope t through the point (1, 0),
and show that letting t run through all rationals gives all rational solutions
to x2 + y 2 = 1.
Using geometry to construct new rational solutions of Diophantine equations from old ones is a powerful idea that will be taken up again in Section 5.1.
2.2 The Fundamental Theorem of Arithmetic in
Other Contexts
In the integers, the Fundamental Theorem of Arithmetic is a direct consequence of the existence of the Euclidean Algorithm. In certain rings, the
two properties are not equivalent. For example, the Fundamental Theorem
of Arithmetic holds in the ring of integer polynomials Z[x], even though this
ring does not have a Euclidean Algorithm. Nonetheless, in many arithmetic
contexts, the Fundamental Theorem of Arithmetic can be proven easily because one has a Euclidean Algorithm. We will consider only commutative rings
with a multiplicative identity, written 1.
Definition 2.2. A commutative ring R is Euclidean if there is a function
N : R\{0} → N
with the following properties:
(1) N (ab) = N (a)N (b) for all a, b ∈ R, and
(2) for all a, b ∈ R, if b = 0, then there exist q, r ∈ R such that
a = bq + r and r = 0 or N (r) < N (b).
Such a function is called a norm on R.
Much of what follows can be done with weaker conditions. In particular,
one does not need such a strong property as (1). However, in many cases, the
norm does have this property, so we assume it to allow a speedier and more
natural development of the argument.
Example 2.3. The following are examples of Euclidean rings.
(1) Let R = Z[i] denote the Gaussian integers, so
R = {x + iy | x, y ∈ Z},
where i2 = −1. Setting N (x + iy) = x2 + y 2 shows that R is a Euclidean
ring.
46
2 Diophantine Equations
(2) Let F denote any field and let R = F[x] be the ring of polynomials with
coefficients in F. Define N (f ) = 2deg(f ) , where deg(f ) is the degree of f
in F[x], which is defined for all nonzero elements of R.
We prove the first of these; the second is an exercise.
Proof that Z[i] Is Euclidean. Condition (1) of Definition 2.2 is easily verified by direct computation. For property (2), let a, b = 0 ∈ R and
write ab−1 = p + iq with p, q ∈ Q. Now define m, n ∈ Z by
m ∈ [p − 1/2, p + 1/2), n ∈ [q − 1/2, q + 1/2).
Let q = m + in ∈ R and r = a − b(m + in). For r = 0,
N (r) = N ((ab−1 − m − in)b)
= N (p + iq − m − in)N (b)
= N (p − m + i(q − n))N (b) 1
4
+
1
4
N (b) < N (b),
showing property (2).
Exercise 2.3. When R = Z, for any fixed a and b, the values of q and r in
Definition 2.2(2) are uniquely determined. Is the same true when R = Z[i]?
In any ring, we define greatest common divisors in exactly the same way
as before. A greatest common divisor is defined up to multiplication by units
(invertible elements). In any Euclidean ring, the function N can be used to
define a Euclidean Algorithm, which can be used to find the greatest common
divisor just as for the integers.
Definition 2.4. In a ring R,
(1) α divides β, written αβ, if there is an element γ ∈ R with β = αγ;
(2) u is a unit if u divides 1;
(3) π (not equal to zero nor to a unit) is prime if for all α, β ∈ R,
π αβ =⇒ π α or π β;
(4) a non-unit µ is irreducible if
µ = αβ =⇒ α or β is a unit.
Notice that u ∈ R is a unit if and only if there is some µ with uµ = 1. We
write U (R) or R∗ for the units in the commutative ring R; this is an Abelian
group under multiplication. If the recent clutch of definitions are new to you,
we recommend the following exercise.
2.2 The Fundamental Theorem of Arithmetic in Other Contexts
47
Exercise 2.4. (a) Show that, in any commutative ring, every prime element
is irreducible.
(b) Show that, in a Euclidean ring, u is a unit if and
√ only if N (u) = 1.
(c) Show that there√are infinitely many units in Z[ 3].√
(d) Show that 3√+ −2 is an irreducible element of Z[ −2].
(e) Let ξ = −1+2 −3 and R = Z[ξ]. Prove that R is a Euclidean domain with
¯ and find all
respect to the norm N (a + bξ) = a2 − ab + b2 = (a + bξ)(a + bξ)
the units in R.
Exercise 2.5. Prove the Remainder Theorem:
For a polynomial f ∈ F[x], F
a field, f (a) = 0 if and only if (x − a)f (x).
Exercise 2.6. Give a different proof of Lemma 1.17 on p. 31 using group theory by considering the multiplicative group of units U (Z/Fn Z) = (Z/Fn Z)∗ .
Exercise 2.7. Prove that Z[x] does not have a Euclidean Algorithm by showing that the equation 2f (x) + xg(x) = 1 has no solution for f, g ∈ Z[x], but 2
and x have no common divisor in Z[x].
Despite the conclusion of Exercise 2.7, the ring Z[x] does have unique
factorization into irreducibles.
We will say that a ring has the Fundamental Theorem of Arithmetic if
either of the following properties hold.
(FTA1) Every irreducible element is prime.
(FTA2) Every nonzero non-unit can be factorized uniquely up to order and
multiplication by units.
Theorem 2.5. Every Euclidean ring has the Fundamental Theorem of Arithmetic.
Proof. Clearly, every irreducible µ has N (µ) 2. Arguing as we did in Z
shows we cannot keep factorizing into irreducibles forever, so the existence
part is easy. To complete the argument, we just need to show that every
irreducible is prime. This follows easily from Theorem 1.23. Let µ be an irreducible and suppose that µ divides αβ but µ does not divide α. Clearly, the
greatest common divisor of µ and α is 1 because µ admits only itself and units
as divisors and µ does not divide α, so we can write
µx + αy = 1
for some x, y ∈ R by Theorem 1.23. Multiply through by β to obtain
µxβ + αβy = β.
Since µ divides both terms on the left-hand side, it must divide the right-hand
side, and this completes the proof.
48
2 Diophantine Equations
2.3 Sums of Squares
The resolution of the Pythagorean equation ( Equation (2.1)) is an elementary and well-known result. We are now going to show how the Fundamental
Theorem of Arithmetic in other contexts can yield solutions to less tractable
Diophantine equations. Consider the following problem: Which integers can
be represented as the sum of two squares? That is, what are the solutions to
the Diophantine problem
n = x2 + y 2 ?
When n is a prime, experimenting with a few small values suggests the following.
Theorem 2.6. The prime p can be written as the sum of two squares if and
only if p = 2 or p is congruent to 1 modulo 4.
To prove this, we are going to use the Fundamental Theorem of Arithmetic
in the ring of Gaussian integers R = Z[i] with norm function N : R → N
defined by N (x + iy) = x2 + y 2 as in Example 2.3(1).
Lemma 2.7. If p is 2 or a prime congruent to 1 modulo 4, then the congruence
T 2 + 1 ≡ 0 (mod p)
is solvable in integers.
Proof. This is clear for p = 2 so suppose p = 4n + 1 for some integer n > 0.
Using al-Haytham’s Theorem (Theorem 1.19),
(p − 1)! = (p − 1)(p − 2) · · · 3 · 2 · 1 ≡ −1
(mod p).
Now
4n = p − 1 ≡ −1 (mod p),
4n − 1 = p − 2 ≡ −2 (mod p),
..
.
2n + 1 = p − 2n ≡ −2n
(mod p).
It follows that
(−1)(−2) · · · (−2n)(2n)(2n − 1) · · · 3 · 2 · 1 = (2n)!(−1)2n ≡ −1
Thus T = (2n)! has T 2 + 1 ≡ 0 modulo p, proving the lemma.
(mod p).
Proof of Theorem 2.6. The case p = 2 is trivial. The case when p is
congruent to 3 modulo 4 is also dealt with easily; no integer that is congruent
2.3 Sums of Squares
49
to 3 modulo 4 can be the sum of two squares because squares are 0 or 1
modulo 4.
Assume that p is a prime congruent to 1 modulo 4. By Lemma 2.7, we can
write
cp = T 2 + 1 = (T + i)(T − i) in R = Z[i]
for some integers T and c.
Suppose (for a contradiction) that p is irreducible in R. Then since Z[i] has
the Fundamental Theorem of Arithmetic, p is prime. Hence p must divide one
of T ± i in R since it divides their product, and this is impossible because p
does not divide the coefficient of i. It follows that p cannot be irreducible in R,
so
p = µν
is a product of two non-units in R. Taking the norm of both sides shows that
p2 = N (µν) = N (µ)N (ν).
This is an equation in Z, so by the Fundamental Theorem of Arithmetic there
are three possibilities.
1. N (µ) = 1 and N (ν) = p2 , which is impossible since µ is not a unit;
2. N (ν) = 1 and N (µ) = p2 , which is impossible since ν is not a unit;
3. N (µ) = N (ν) = p, which must be the case, and this means there is a
nontrivial solution to the equation x2 + y 2 = p.
What is being witnessed here is a symbiotic relationship between certain
Diophantine equations and the structure of an associated ring. To illustrate
this, we now give a theorem that characterizes the primes of Z[i].
Theorem 2.8. The primes of R = Z[i] are of three types,
(1) 1 + i,
(2) integer primes p ≡ 3 modulo 4,
(3) factors x ± iy of the integer primes p ≡ 1 modulo 4,
together with all multiples of these types by units.
Exercise 2.8. Prove Theorem 2.8. (Hint: Show that any prime in Z[i] divides
a prime in Z.)
Exercise 2.9. Prove that if a prime p is a sum of two squares, p = a2 + b2 ,
then this representation is unique (apart from the obvious changes).
Exercise 2.10. Prove that the positive integer n is a sum of two squares if
and only if every prime p with p ≡ 3 modulo 4 that divides n does so to an
even exponent.
50
2 Diophantine Equations
2.3.1 Lagrange’s Four Squares Theorem
One of the many classical results of elementary number theory extends Theorem 2.6 to all integers – at the expense of allowing more squares to be added
together. Bachet conjectured the result, and Diophantus stated it; Fermat
may have had a proof. The first published proof was that of Lagrange in 1770,
which we now present.
Lemma 2.9. Let p be an odd prime. Then there are integers a and b with
a2 + b2 + 1 ≡ 0 (mod p).
Proof. Define the sets
A=
and
a2 | 0 a p−1
2
p−1
2
B = −b − 1 | 0 b .
2
No two elements of A are congruent modulo p, and no two elements of B are
congruent modulo p. It follows that each of the sets A and B contains p+1
2
elements modulo p, so by the pigeonhole principle1 there must be an element
of A that is equal to an element of B modulo p since there are only p distinct
integers modulo p. Thus there are integers a and b with
a2 + b2 + 1 ≡ 0 (mod p)
as required.
Theorem 2.10. [Lagrange] Every positive integer is a sum of four integer
squares.
Proof. The first step is to note the Euler four-square identity,
(a2 + b2 + c2 + d2 )(w2 + x2 + y 2 + z 2 ) = (aw + bx + cy + dz)2
+(ax − bw − cz + dy)2
+(ay + bz − cw − dx)2
+(az − by + cx − dw)2 ,
which may be proved simply by expanding the right-hand side. This identity
means that the property of being written as a sum of four squares is preserved
under products. By the Fundamental Theorem of Arithmetic, it is therefore
1
The ‘pigeonhole’ principle states that if (Q + 1) letters are placed in Q pigeonholes, one pigeonhole must contain more than one letter. It is readily proved by
contradiction.
2.3 Sums of Squares
51
sufficient to prove that any prime is a sum of four integer squares. It is clear
that 2 = 12 + 12 + 02 + 02 is a sum of four integer squares, so it is enough to
prove that any odd prime is a sum of four integer squares.
Let p be an odd prime. By Lemma 2.9, there are integers a, b, c, d and m
with
mp = a2 + b2 + c2 + d2 .
(2.4)
If m = 1 then we are done, so assume that m > 1. The proof proceeds by
finding an expression for m p as a sum of four squares, with 0 < m < m. This
can be repeated, reducing the size of m each time, until we eventually must
find an expression for the prime p itself as a sum of four squares.
Now notice that if an even integer 2n is a sum of two squares, 2n = x2 +y 2 ,
then the integers x and y are either both even or both odd. It follows that the
identity
2 2
x+y
x−y
n=
+
(2.5)
2
2
expresses n as a sum of two integer squares. Returning to Equation (2.4), if m
is even, then either none, two, or four of the numbers a, b, c, d are even. Thus
we can use Equation (2.5) twice to deduce that ( m
2 )p is a sum of four squares.
In this case we have halved the size of m.
If m is odd, write
w≡a
x≡b
y≡c
z≡d
with − m
2 < w, x, y, z <
m
2.
(mod
(mod
(mod
(mod
m)
m)
m)
m)
Then
w2 + x2 + y 2 + z 2 < m2
and
w2 + x2 + y 2 + z 2 ≡ 0 (mod m).
It follows that
w2 + x2 + y 2 + z 2 = km
for some k, 0 < k < m. Now in Euler’s four-square identity
(a2 + b2 + c2 + d2 )(w2 + x2 + y 2 + z 2 ) = (aw + bx + cy + dz)2
+(ax − bw − cz + dy)2
+(ay + bz − cw − dx)2
+(az − by + cx − dw)2 (2.6)
the left-hand side is km2 p. By our choice of w, x, y, z we see that ax ≡ bw
and dy ≡ cz modulo m, so (ax − bw − cz + dy)2 is divisible by m2 . A similar
argument shows that
52
2 Diophantine Equations
(ay + bz − cw − dx)2
and
(az − by + cx − dw)2
are also divisible by m2 . For the first term,
aw + bx + cy + dz ≡ w2 + x2 + y 2 + z 2 ≡ 0
(mod m),
so the right-hand side of Equation (2.6) is divisible by m2 . It follows that the
identity (2.6) can be divided through by m2 , resulting in an expression for kp
as a sum of four squares, with 0 < k < m.
Repeating this reduction a finite number of times will reduce m to 1,
resulting in an expression for the odd prime p as a sum of four squares,
completing the proof.
Exercise 2.11. *[Legendre] Show that every integer not of the form
4n (8k + 7)
is a sum of three integer squares.
Exercise 2.12. Suppose a prime p is a sum of four squares. Is it true that
the representation is unique? What if p is a sum of three squares?
2.4 Siegel’s Theorem
In this section, we show how a direct application of the Fundamental Theorem of Arithmetic in rings that are larger than the integers, for example the
Gaussian integers Z[i], can yield all the integral solutions to certain cubic equations. In the first example, we use the Fundamental Theorem of Arithmetic
only in Z.
Theorem 2.11. The only integral solution of the equation
y 2 = x3 + x
(2.7)
is x = 0, y = 0.
Proof. Let x and y be integers with y 2 = x3 + x. Write the right-hand side
of the equation as x3 + x = x(x2 + 1). Any factor of x will divide x2 , so any
factor common to x and x2 + 1 will also divide 1. Thus x and x2 + 1 must be
coprime and hence, by the Fundamental Theorem of Arithmetic, both must
be squares (since their product is y 2 ). Writing z 2 = x2 + 1, we see that
1 = z 2 − x2 = (z + x)(z − x).
By the Fundamental Theorem of Arithmetic in Z, (z + x) and (z − x) must
both be 1 or both be −1.
Solving for x and z shows that x = 0 in both cases.
2.4 Siegel’s Theorem
53
Theorem 2.12. The only integral solution of the equation
y 2 = x3 − 1
(2.8)
is x = 1, y = 0.
For this equation, it looks as if we should factorize the right-hand side
over Z, but it does not seem easy to get to the proof that way. Instead we
factorize over a bigger ring that is also known to satisfy the Fundamental
Theorem of Arithmetic.
Proof of Theorem 2.12. Rewrite the equation as
y 2 + 1 = x3
and then factorize the left-hand side as (y + i)(y − i) in Z[i]. We claim that the
two factors y ± i must be coprime. To see why, let δ = gcd(y + i, y − i); δ must
divide the difference y + i − (y − i) = 2i. However, we claim that no factor
of 2 can divide y ± i. This is because x must be odd; if x is even then x3 ≡ 0
modulo 8, which means that y 2 + 1 ≡ 0 modulo 8 and this congruence has
no solutions. We deduce that δ must be a unit, and the two factors y ± i are
coprime in Z[i].
Applying the Fundamental Theorem of Arithmetic in Z[i], we deduce that
each factor y + i, y − i must be a unit multiple of a cube. Since all units are
themselves cubes, we deduce that each of y ± i is a cube in Z[i], so assume
y + i = (a + bi)3 , a, b ∈ Z.
Equating imaginary parts gives
1 = 3a2 b − b3 = b(3a2 − b2 ).
By the Fundamental Theorem of Arithmetic in Z, the solutions are greatly
restricted: b = (3a2 − b2 ) = ±1. If b = 1, then 3a2 − 1 = 1, which is impossible
as no integer a has 3a2 = 2. The only alternative is b = −1, in which case a =
0, yielding the unique solution y = 0 and x = 1.
√
Exercise 2.13. Use the preceding method in the ring Z[ −2] to prove that
the only integral solutions of
y 2 = x3 − 2
are x = 3, y = ±5.
Later we will be thinking of the set of solutions to equations such as these
geometrically, so we will describe the solutions as points (x, y) in the plane.
Now consider the example
y 2 = x3 − 3.
(2.9)
54
2 Diophantine Equations
Experimentation with small integers suggests that there will be no integral
solutions, but we encounter a difficulty when we try to prove this using the
preceding methods. The reason is √
that the Fundamental Theorem of Arithmetic does not hold in the ring Z[ −3] (see Exercise 3.17 on p. 73.) On the
other hand, the Fundamental Theorem of Arithmetic does hold in the bigger
ring Z[ω], where ω = e2πi/3 is a nontrivial cube root of unity.
This suggests that we might try to find all the solutions (x, y) over the
ring Z[ω] as a precursor to finding all the solutions over the smaller ring Z.
This might seem audacious but historically this is just what happened in the
general case.
Theorem 2.13. [Siegel’s Theorem] Suppose a, b, c ∈ Q. Then there are
only finitely many integer pairs (x, y) with
y 2 = x3 + ax2 + bx + c,
(2.10)
provided the cubic polynomial x3 + ax2 + bx + c has no repeated zeros.
This theorem will not be proved here – see the notes at the end of the
chapter for references where complete proofs may be found. The curve described by an equation of the shape Equation (2.10) is known as an elliptic
curve provided the right-hand side has no repeated zeros. In order for Siegel’s
Theorem to hold, some condition about the cubic polynomial is clearly needed
because, for example, the equation y 2 = x3 has infinitely many integral solutions. We will devote considerable space to studying the remarkable properties
of elliptic curves.
Exercise 2.14. Prove that the polynomial x3 + ax + b has no repeated zero
if and only if 4a3 + 27b2 = 0.
The genius of people such as Siegel is that they are willing to take an
imaginative step up from particular cases, and are in addition able to supply
the guile needed to complete the proof. In fact, he gave two different proofs of
Theorem 2.13. In his second proof Siegel showed that there are only finitely
many solutions (x, y) with x and y lying inside a suitably large ring containing Z in which the Fundamental Theorem of Arithmetic holds. The rings
in which Siegel proposed to work typically contain infinitely many units, in
contrast with the integers Z. We can appreciate some of the technical difficulties he had to overcome by considering the techniques that went into his
second proof in some special cases. His second proof turned out to be very
important: He first reduced the given equation to a finite number of linear
equations over a finitely generated group. Subsequently, methods were developed in Diophantine Approximation that applied to these linear equations and
allowed, ultimately, a practical method for finding all the integral solutions of
the equation in Theorem 2.13.
2.4 Siegel’s Theorem
55
√
Exercise 2.15. Fix a square-free integer d > 1, and assume that Z[ d] satisfies the Fundamental Theorem of Arithmetic. Show that the equation
y 2 = x3 + d
has only finitely many
√ integral solutions. (Hint: You may assume that the
units of the ring Z[ d] are all of the form ±un for some unit u > 1.)
The rings Siegel worked with are obtained by inverting certain chosen
primes. This technique provides us with a new class of rings to study. As a
simple illustrative example, let S denote the set {2} and let ZS denote the
ring Z[ 12 ] consisting of all rational numbers with a denominator consisting of
a power of 2. Given any nonzero q ∈ Q, write q = 2r q , where r ∈ Z and
the numerator and denominator of q are odd. Define the S-norm of q to
be |q|S = |q |. The ring R has infinitely many units, consisting of the rational
numbers ±2k for k ∈ Z. The ring R is sometimes called the ring of S-integers
of Z, and its units are known as S-units.
Exercise 2.16. Prove that the ring ZS is a Euclidean ring with respect to |.|S .
The next exercise will provide a further illustration of some of the techniques needed to prove Siegel’s Theorem. We have already seen examples
where the Fundamental Theorem
√ of Arithmetic fails in some quadratic rings.
We overcame that failure in Z[ −3] by working in the bigger ring R = Z[ω],
where ω is a nontrivial cube root of unity.
√ Letting S = {2} as before, R is a
subring of an even bigger ring RS = Z[ −3, 12 ].
√
Exercise 2.17. Define a norm function on RS = Z[ −3, 12 ] with the property
that RS is a Euclidean ring. Find all solutions to Equation (2.9) in the ring RS .
Again, this exercise shows there are only finitely many solutions to a specific
cubic equation in a ring with infinitely many units.
Theorem 2.14 below is quite deep and we will not prove it. The proof
requires Theorem 4.14 from Chapter 3. The notes at the end of the chapter
reference a proof in√the literature. It shows that the Fundamental Theorem
of Arithmetic in Z[ d] can be recovered by inverting a finite list of primes.
Theorem 2.14. Let d be a nonsquare integer. There is a finite list of primes
p1 , . . . , pr
√ 1
with the property that Z[ d, p1 , . . . , p1r ] has the Fundamental Theorem of
Arithmetic.
Combining the techniques learned thus far allows a special case of Siegel’s
Theorem to be proved. An integer is called square-free if it is not divisible by
the square of any integer greater than 1.
56
2 Diophantine Equations
Exercise
√ 2.18. Suppose d < 0 is a square-free integer with the property
that Z[ d, p1 ] has the Fundamental Theorem of Arithmetic for some prime p.
Show that the equation
y 2 = x3 + d
(2.11)
has only finitely many integral solutions.
When explicit approaches such as this succeed, they allow the determination of all the integral solutions. Determining all the integral solutions predicted by Siegel’s Theorem is generally quite a difficult problem and requires
powerful methods from transcendence theory. It was not until late in the twentieth century that these methods were sufficiently well advanced to allow for
a practical method of solving a given equation.
An S-unit equation is one of the form
a1 x1 + · · · + an xn = 1
with ai fixed constants in some field K, and the solutions xi are sought in
a finitely generated subgroup of K∗ . For the cubic equations studied here,
Siegel reduced the problem of finding all the integral solutions to finding the
solutions of a finite number of S-unit equations all having n = 2. He then
showed that such an equation has only finitely many solutions. In general Sunit equations turn out to lie behind many other Diophantine equations and
they have come to be studied as important in their own right.
2.5 Fermat, Catalan, and Euler
Finally we mention three famous Diophantine problems, all of which have
recently been solved. There are detailed references in the notes at the end of
the chapter.
2.5.1 Fermat
Fermat’s Last Theorem, now proved by Wiles, states that the equation
xn + y n = z n ,
n 3,
(2.12)
has no nontrivial solutions. (A solution is trivial if one of x, y or z is zero.)
Clearly, it is only necessary to prove this in the case when n = p is a prime.
A startling aspect of the solution is that it depends on deep results concerning the arithmetic of elliptic curves: If ap + bp + cp = 0 for a prime p and
integers a, b, c, then the elliptic curve with equation
y 2 = x(x − ap )(x + bp )
turns out to have properties that Wiles was able to show were impossible. We
will be studying the arithmetic of elliptic curves in Chapters 5 and 6.
2.5 Fermat, Catalan, and Euler
57
Exercise 2.19. *Prove that Equation (2.12) has no nontrivial solutions with n
equal to 3, 4, or 5.
Exercise 2.20. *Prove that Equation (2.12) has no nontrivial solutions in
Gaussian integers with n = 4.
Exercise 2.21. *Prove that Equation (2.12) has no solutions x, y, z in positive integers with n a Gaussian integer.
2.5.2 Catalan
The Catalan equation is
ux − v y = 1,
u, v, x, y ∈ N,
u, v, x, y 2.
(2.13)
A solution is 32 − 23 = 1; the Catalan problem is to show that there are no
others, and this has recently been proved.
2.5.3 Euler
Euler conjectured that an nth power cannot be written as the sum of fewer
than n nontrivial nth powers for n 3. Lander and Parkin made a computer
search for nontrivial solutions to the Diophantine equation
n
x5i = y 5 ,
n 6.
i=1
Among the solutions, they found a counterexample to Euler’s conjecture
for n = 5. Their resulting announcement matches the famous seminar of Cole
described on p. 27 for its brevity and drama: The entire text of their paper is
as follows.
“A direct search on the CDC 6600 yielded
275 + 845 + 1105 + 1335 = 1445
as the smallest instance in which four fifth powers sum to a fifth
power. This is a counterexample to a conjecture by Euler [see L. E.
Dickson, History of the theory of numbers, Vol. 1, p. 648, Chelsea,
New York, 1952] that at least n nth powers are required to sum to
an nth power, n > 2.”
In addition, it was shown that the case n = 4, namely the Diophantine equation
u4 + v 4 + w4 = x4 ,
(2.14)
has no solutions in positive integers with x < 220000.
58
2 Diophantine Equations
In a dramatic development, Elkies used a mixture of sophisticated theory
and a computer search to find a solution to Equation (2.14),
26824404 + 153656394 + 187967604 = 206156734 .
(2.15)
Following this, Roger Frye found that the minimal solution to Equation (2.14)
is
958004 + 2175194 + 4145604 = 4224814
and showed that there are no other solutions with u v w < x < 1000000.
Notes to Chapter 2: Much of the material in this chapter is part of algebraic
number theory. Stewart’s book [147] is an accessible introduction at this level; for
more advanced treatments, see the books of Hasse [76], Janusz [83] or Lang [96]. A
sophisticated text on related topics is Serre’s classic book [137]. Barbeau’s book [10]
discusses Pell’s equation in detail and requires very little background. A proof of
Theorem 2.14 can be found in Lang [96, Chapter I, Proposition 17]. The seminal
finiteness results on S-unit equations mentioned at the end of Section 2.4 may be
found in the papers of Evertse [60], Schlickewei [134], and van der Poorten and
Schlickewei [120]. These results have found wide application; a surprising connection to ergodic theory is shown in a paper of Schmidt and Ward [135]. For attractive
accounts of Fermat’s Last Theorem, see the popular accounts of Ribenboim [126]
and van der Poorten [119]; a serious introduction at a high level to the mathematics
behind Wiles’ extraordinary proof [162] may be found in the proceedings [35] of an instructional conference edited by Cornell, Silverman and Stevens. Exercise 2.20 comes
from a short note by Cross [38]; Exercise 2.21 comes from a paper of Zuehlke [169]
and uses some transcendence theory. The Catalan problem Equation (2.13) was initially reduced to a finite calculation and then solved completely by Mihăilescu; see
the paper of Metsänkylä [106] for an account and the monograph [58, p. 159] by
Everest, van der Poorten, Shparlinski and Ward for an overview of related questions. An accessible account of the Catalan problem before its final solution may be
found in the book of Ribenboim [124]. The results of Lander and Parkin appeared
in their paper [92]; their dramatic announcement quoted in Section 2.5.3 is [91].
The state of Euler’s problem in 1967 is surveyed in a paper of Lander, Parkin and
Selfridge [93]. Equation (2.15) of Elkies is in [49].
3
Quadratic Diophantine Equations
Attempts to go beyond the Pythagorean Diophantine equation quickly lead
to general questions about quadratic Diophantine problems. Apparently simple questions seem to require an excursion into the theory of finite fields.
For example, we prove that any finite field has a primitive root in order to
develop the classical theory of the Legendre symbol and the Quadratic Reciprocity Law. Some general theory of quadratic rings and quadratic forms is
established, up to the finiteness of the class number for quadratic forms.
3.1 Quadratic Congruences
Suppose we now seek to generalize our earlier results and understand the
Diophantine equation
x2 + 2y 2 = p
(3.1)
when p is a prime
√ and x and y are integers. We can do this by using properties
of the ring Z[ −2], but we also need a better understanding of the arithmetic
of the integers modulo p when p is a prime.
√
Exercise 3.1. Let R = Z[ −2].
(a) Show that the function N : R → N defined by
√
N (x + y −2) = x2 + 2y 2
satisfies N (αβ) = N (α)N (β) for all α, β ∈ R.
(b) Determine all the units in R.
(c) Show that R is Euclidean with respect to N .
Following our earlier method, we now expect to use unique factorization in R together with some knowledge of congruences to understand Equation (3.1). The relevant congruence to study for this equation is
T 2 + 2 ≡ 0 (mod p).
(3.2)
60
3 Quadratic Diophantine Equations
Exercise 3.2. Compute the list of primes p < 1000 for which the congruence (3.2) has a solution with T ∈ Z.
It is becoming clear that we need some tool that will guarantee the existence of a solution for certain congruences and rule out a solution for others. For example, your computations in Exercise 3.2 should suggest that for
primes p ≡ 1 or 3 modulo 8 there is a solution, while there is no solution for
primes p ≡ 5 or 7 modulo 8. (The prime p = 2 does give a solution.) Our
earlier approach suggests that the area we need to look at is the arithmetic
of Z/pZ. Previously we used al-Haytham’s Theorem in a crucial way, and here
we have no obvious analog. It turns out that the property we need is directly
related to a natural concept in group theory.
Definition 3.1. An element a of Z/pZ is a primitive root modulo p if the
powers of a generate all the nonzero residues modulo p.
Example 3.2. It is easy to prove that the powers of 2 yield all the nonzero
residues modulo 5: 20 ≡ 1, 21 ≡ 2, 22 ≡ 4, 23 ≡ 3 modulo 5. Thus 2 is a
primitive root modulo 5. Similarly, 3 is a primitive root modulo 7, but 2 is
not since no power of 2 is congruent to 3 modulo 7.
The set of residues modulo p forms a field: The existence of a primitive root a modulo p is the same as the statement that the multiplicative
group (Z/pZ)∗ of the field Z/pZ is cyclic, generated by a. We will use freely
other equivalent ways of saying this. If G denotes a finite Abelian group with n
elements, written multiplicatively, then a generates G if and only if any of the
following equivalent conditions hold:
1. am = 1, 1 < m n =⇒ m = n;
2. the order of a is n; 3. am = 1, 1 < m =⇒ nm.
Theorem 3.3. The multiplicative group of any finite field is cyclic.
This is an important result, and we will spend some time proving it. When
we have done this, we can return to our equations. The proof of Theorem 3.3
involves an important example of an arithmetic function.
Definition 3.4. An arithmetic function is any function f : N → C. An arithmetic function with f(1) = 0 and
f(mn) = f(m)f(n)
whenever m and n are coprime is called multiplicative. (Note that this implies f(1) = 1.) If f has this property not only for coprime m, n, but for
all m, n ∈ N, then f is called completely multiplicative.
3.1 Quadratic Congruences
61
Multiplicative arithmetic functions will be discussed further in Section 8.2.
One of the most important arithmetic functions is
φ(n) = |{1 a n | gcd(a, n) = 1}| ,
called the Euler phi-function.
Exercise 3.3. Let p be a prime. Show that φ(pe ) = pe−1 (p − 1) for any e 1.
Lemma 3.5. The Euler phi-function is multiplicative.
We will postpone the proof slightly to note an immediate corollary of
Lemma 3.5 and Exercise 3.3.
Corollary 3.6. If n is factorized into powers of distinct primes, n = p pep ,
then
p−1
(p − 1)pep −1 = n
.
φ(n) =
p
p|n
p|n
Exercise 3.4. Give an example to show that φ is not completely multiplicative.
Exercise 3.5. (a) Find all values of n ∈ N with φ(n) = 12 n.
(b) Find all values of n ∈ N with φ(n) = φ(2n).
(c) Find all six values of n ∈ N with φ(n) = 12.
1
(d) Find the smallest n ∈ N for which φ(n)
n < 4.
φ(n )
(e) Find a sequence of integers (nj ) for which njj → 0 as j → ∞.
The proof of Lemma 3.5 depends on the following result.
Theorem 3.7. [Chinese Remainder Theorem] Suppose m, n ∈ N are coprime. Then the simultaneous congruences
x ≡ a (mod m),
x ≡ b (mod n),
have a solution x ∈ N for any a, b ∈ Z, and the solution is unique modulo mn.
The Chinese Remainder Theorem was discovered by Chinese mathematicians in the fourth century A.D. The first appearance seems to have been in
a work of Sun-Zi, and a general treatment was given by Qin1 Jiushao. Special
1
Also transliterated as Ch’in Chiu-Shao. Jiushao seems to have been both a rogue
and a mathematical genius. His work Shushu Jiuzhang (Mathematical Treatise
in Nine Sections) appeared in 1247 and contained many important and novel
results and methods. The so-called Chinese Remainder Theorem is among these,
attributed to experts in astronomy and calenders. It has been suggested that the
theorem does not bear his name because in the form ‘Chin’ it was too easily
confused with ‘Chinese’.
62
3 Quadratic Diophantine Equations
results of the same sort were used by Fibonacci in Italy and al-Haytham in
Iraq. We will see it again in Chapter 12 (see p. 256) in greater generality.
Proof of the Chinese Remainder Theorem. The coprimality condition
guarantees that there exist m , n such that
mm = 1 (mod n) and nn = 1
(mod m)
(3.3)
by Corollary 1.25. Then x = bmm + ann satisfies both the required congruences.
If, on the other hand, x and y satisfy both congruences, then (x − y) is
divisible by m and by n. Since m and n are coprime, (x − y) must be divisible
by mn.
Example 3.8. Solve the simultaneous congruences x ≡ 2 modulo 17 and x ≡ 8
modulo 11. We find m = 2 and n = 14 in the proof of the Chinese Remainder
Theorem. Then
x = 8 · (17 · 2) + 2 · (11 · 14) = 580
satisfies the two congruences. (The smallest solution is the remainder of 580
divided by 11 · 17, namely 19.)
Proof of Lemma 3.5. Let m and n be coprime. Define a map
Φ : Z/mnZ → Z/mZ × Z/nZ
by
x → (x (mod m), x (mod n)) ,
where we think of the elements of Z/mnZ as {0, 1, . . . , mn−1}. By the Chinese
Remainder Theorem, Φ is a bijection. (In fact, Φ is an isomorphism of rings.)
Now define
∗
(Z/nZ) = {1 a n : gcd(a, n) = 1}
and likewise for n and mn. Since x is coprime to mn if and only if it is coprime
both to m and n, Φ restricts to these subsets:
∗
∗
∗
Φ : (Z/mnZ) → (Z/mZ) × (Z/nZ) .
Here Φ is still a bijection. (In fact, the set (Z/kZ)∗ is the set of units U (Z/kZ)
of Z/kZ, and Φ is an isomorphism of (multiplicative) groups.) By definition,
the cardinality of (Z/mZ)∗ is just φ(m) and likewise for n and mn, which
completes the proof of Lemma 3.5.
The next exercise is a generalization of Fermat’s Little Theorem (Theorem 1.12), called the Euler–Fermat Theorem.
Exercise 3.6. Given n > 1 in N, show that for any a ∈ Z with gcd(a, n) = 1
aφ(n) ≡ 1 (mod n).
(3.4)
3.1 Quadratic Congruences
63
Exercise 3.6 is a pretty standard one found in most texts that deal with
the φ-function. It is a good test case for our earlier remarks about how different
approaches can yield different benefits. It is possible to prove Equation (3.4)
using congruences modulo pr for each prime power pr dividing n, together
with the Binomial Theorem. Another, slicker, proof simply uses Lagrange’s
Theorem on the group U (Z/nZ) = (Z/nZ)∗ .
Theorem 3.9. For any n ∈ N,
φ(d) = n.
(3.5)
d|n
Proof. First check the equality when n = pr is a prime power. The left-hand
side is
r
1+
(p − 1)pi−1 = 1 + (pr − 1) = n
i=1
by summing the geometric progression or noticing that it is a telescoping sum.
Next, observe that both sides of Equation (3.5) are multiplicative arithmetic
functions. For the left-hand side, this follows from
φ(d) =
φ(d1 d2 ) =
φ(d1 )
φ(d2 )
d|mn
d1 |m d2 |n
d1 |m
d2 |n
for any pair of coprime integers (m, n). Note that d divides mn if and only if
there exist divisors d1 of m and d2 of n such that d = d1 d2 , so it is enough to
check the prime power case.
We can now prove Theorem 3.3. In the proof, we will be working with
a general finite field. Such a field can always be explicitly presented using
polynomials; however, nowhere will we need an explicit presentation. This
suggests that more abstract methods might also be applicable to prove the
theorem. Indeed, a proof can be given that only uses the theory of finite
Abelian groups.
Proof of Theorem 3.3. Let F be a finite field with q elements. We are
going to prove that if g is any element of F∗ , then g j has the same order as g
if and only if gcd(j, q − 1) = 1. This will allow us to find how many elements
there are of each order, showing in particular that there are φ(q − 1) distinct
generators in total.
Example 3.10. The distinct powers of 3 in F∗7 are
30 ≡ 1, 31 ≡ 3, 32 ≡ 2, 33 ≡ 6, 34 ≡ 4, 35 ≡ 5.
The only values of j, 1 j 6 with gcd(j, 6) = 1 are 1 and 5. Since 35 ≡ 5
modulo 7, 5 is another generator of F∗7 . Similarly, F∗11 = 2 (the multiplicative group generated by 2). The values of j between 1 and 10 for
which gcd(j, 10) = 1 are j = 1, 3, 7, 9 so there are four possibilities for generators of F∗11 , namely 21 ≡ 2, 23 ≡ 8, 27 ≡ 7, 29 ≡ 6.
64
3 Quadratic Diophantine Equations
Exercise 3.7. Prove that in any field, a polynomial of degree d has no more
than d zeros. (Hint: Use Exercise 2.5 on p. 47).
Returning to the proof of Theorem 3.3, suppose d(q − 1) and a is an
element of F∗ of order d (if one exists). Then
ad = 1 in F and am = 1 with 0 m < d implies m = 0.
The elements 1, a, a2 , · · · , ad−1 are all distinct, otherwise ai = aj would imply
that ak = 1 with some 0 < k < d. We claim that if an element a of order d
exists, then the other elements of order d in F∗ are precisely those powers aj
with 1 j < d and gcd(j, d) = 1. Thus if there is an element of order d, then
there will be precisely φ(d) of them. If a does have order d, then the only other
elements of order d must lie among the powers aj above since any element of
order d satisfies the equation
xd − 1 = 0
in F, this equation has at most d roots by Exercise 3.7, and each of the
powers aj , 0 j < d satisfies the equation. Thus all the elements of order d
must lie among these powers. But which of the powers have order d? We now
prove our claim that aj has order d, 1 j < d, if and only if gcd(j, d) = 1.
If 1 < gcd(j, d) = d < d then 1 < d/d < d and
(aj )d/d = (ad )j/d = 1j/d = 1,
so aj does not have order d (since d/d < d).
Conversely, suppose that gcd(j, d) = 1 and aj has order d with
1 < d d.
Then ajd = 1, so djd since a has order d. However, gcd(d, j) = 1, which
forces dd . On the other hand, d d, so we must have d = d . This
completes the proof that aj has order d if and only if gcd(j, d) = 1.
Each of the (q − 1) elements of F∗ has order dividing (q − 1), so by Theorem 3.9,
φ(d) = q − 1.
d|(q−1)
Thus, for every d dividing (q − 1), we must have φ(d) elements (not none) of
order d. In particular, we have φ(q − 1) 1 elements of order (q − 1).
Notice that we have proved a little more than Theorem 3.3. The proof
shows how many elements of Fq there are of each possible order, finding in
particular that there must be at least one element of order (q − 1), which is
therefore a primitive root.
Exercise 3.8. Verify that 2 is a primitive root for the prime p = 19. Find all
the elements of order 6 under multiplication modulo 19, expressed as integers
between 1 and 18.
3.2 Euler’s Criterion
65
Despite the seemingly complete knowledge provided by the proof of Theorem 3.3, several closely related questions turn out to be extremely difficult.
The following is a famous conjecture of Artin which remains an open problem.
Conjecture 3.11. [Artin] Any integer that is not a square or −1 is a primitive
root modulo p for infinitely many primes p.
An apparently less ambitious question is to ask, given an explicitly presented finite field, whether there is an algorithm for determining a primitive
root. For example, if p is a given prime, can we determine a primitive root
for p? The most obvious thing to try is checking the integers 2, 3, 5, 6 . . . (not 4
of course!) in the hope that a primitive root will soon be found. Thus one seeks
an upper bound on the smallest primitive root, and this too is difficult. The
smallest primitive root modulo p can be shown – conditionally – to be bounded
by a constant multiple of (log p)6 , a result of Shoup from 1992. However this
result relies upon a hard unproven hypothesis stated in Section 12.7.1. This
might not sound very satisfactory, but it turns out to have great practical
value.
3.2 Euler’s Criterion
Many problems concerning quadratic congruences can be reduced to solving
the simplest such congruence, namely x2 ≡ a modulo p for a prime p and
given a.
Definition 3.12. Let p be an odd prime and a an integer. The Legendre
symbol is defined by
⎧
⎨ 0 if pa,
a
=
1 if p a and x2 ≡ a (mod p) has a solution,
⎩
p
−1 otherwise.
If a = 0 and ( ap ) = −1 then a is a quadratic nonresidue modulo p; otherwise a is a quadratic residue modulo p.
Some elementary properties of the Legendre symbol will be used without
2
comment. In particular, if a ≡ b modulo p, then ( ap ) = ( pb ) and ( ap ) = 1 for
any a = 0.
Theorem 3.13. [Euler’s Criterion] Let p be an odd prime. Then
a
≡ a(p−1)/2 (mod p).
p
(3.6)
66
3 Quadratic Diophantine Equations
Proof. The statement is obvious if a ≡ 0 modulo p, so assume that a is
coprime to p. Notice that the only square roots of 1 modulo p are congruent
to ±1 since x2 − 1 = (x − 1)(x + 1) in any field.
Now
(a(p−1)/2 )2 = ap−1 ≡ 1 (mod p),
so
a(p−1)/2 ≡ ±1 (mod p).
Let g denote a generator of the cyclic group (Z/pZ)∗ . Then a ≡ g j modulo p
for some j, and a is a quadratic residue if and only if j is even. Suppose a is
a quadratic residue, so j = 2j for some integer j . It follows that
a(p−1)/2 ≡ (g j )(p−1)/2 = g j
(p−1)
= (g p−1 )j ≡ 1
(mod p).
Thus ( ap ) = 1 implies that a(p−1)/2 ≡ 1 modulo p.
Conversely, if a(p−1)/2 ≡ 1 modulo p, then g j(p−1)/2 ≡ 1 modulo p. However, g has order (p − 1) modulo p, so
(p − 1)j(p − 1)/2, which implies 2(p − 1)j(p − 1).
Canceling (p − 1) from both sides shows that j is even. Thus a(p−1)/2 ≡ 1
modulo p implies that ( ap ) = 1.
Corollary 3.14. The Legendre symbol satisfies
ab
a
b
=
.
p
p
p
That is, the Legendre symbol viewed as an arithmetic function
·
: Z → {0, ±1}
p
is completely multiplicative.
The proof follows immediately from Theorem 3.13 because the right-hand
side of Equation (3.6) is completely multiplicative.
Exercise 3.9. Suppose that p, q > 0 are odd primes with q = 4p + 1. Prove
that 2 is a primitive root modulo q. It follows that Artin’s conjecture (on p. 65)
for a = 2 would be proved if we knew there are infinitely many primes q of
the form 4p + 1 where p is a prime.
Exercise 3.10. Prove Corollary 3.14 using concepts from group theory. (Hint:
The set of squares in the group G = (Z/pZ)∗ forms a subgroup. The index of
this subgroup in G is of order 2 if p is odd; see Exercise 3.12 below.)
3.3 The Quadratic Reciprocity Law
67
3.3 The Quadratic Reciprocity Law
The main result on quadratic residues is a reciprocity law. Gauss did many
calculations with quadratic residues and in particular studied whether there
might be a relation between p being a quadratic residue modulo q and q being
a quadratic residue modulo p when p and q are primes. Based on his extensive
calculations, he conjectured and then proved (in several ways) the following:
When one of p or q is congruent to 1 modulo 4, either both of the congruences
x2 ≡ q
(mod p), y 2 ≡ p
(mod q),
are solvable or both are not. If both p and q are congruent to 3 modulo 4,
then one is solvable if and only if the other is not. This surprising result is of
great importance.
Theorem 3.15. Let p and q denote odd primes. If p ≡ q ≡ 3 modulo 4, then
q
p
=−
.
p
q
If at least one of p or q is 1 modulo 4, then the symbols are equal.
Theorem 3.15 can be stated as a neater formula, and this is what we will
prove. If p and q are odd primes, then
q
p
(p−1)/2·(q−1)/2
= (−1)
.
(3.7)
p
q
The even prime 2 has to be treated separately: The theorem below will be
proved on p. 68.
Theorem 3.16. If p is an odd prime, then
2
= 1 if and only if p ≡ ±1
p
(mod 8).
Exercise 3.11. (a) Show that Theorem 3.16 can be written in the form
2
2
= (−1)(p −1)/8
p
for an odd prime p.
(p−1)/2
(b) Prove that ( −1
.
p ) = (−1)
Exercise 3.12. A Diophantine equation with solutions in Z must have solutions modulo p (that is, in Z/pZ) for all primes p.
(a) Show that the converse does not hold by proving that
(x2 − 2)(x2 − 3)(x2 − 6) = 0
has a solution modulo p for every prime p but no integral solution.
(b) Show that x8 − 16 = 0 has a solution modulo p for every prime p but no
integral solution.
68
3 Quadratic Diophantine Equations
Exercise 3.13. (a) Show that Equation (3.7) is equivalent to Theorem 3.15.
(b) Show that if p is an odd prime, then
−2
= 1 if and only if p ≡ 1 or 3 (mod 8).
p
√
(c) Use the arithmetic of Z[ −2] to show that the prime p can be written
p = x2 + 2y 2 , with x, y ∈ Z,
if and only if p ≡ 1 or 3 modulo 8.
Exercise 3.14. (a) Show that if p > 3 is a prime, then
−3
= 1 if and only if p ≡ 1 (mod 3).
p
(b) Show that the map x → x3 + 2 is a bijection on Z/pZ for any odd prime p
congruent to 2 modulo 3. Deduce that the equation(x2 + 3)(x3 + 2) = 0 has
a solution modulo q for any prime q but has no integral solutions.
Exercise 3.15. *Show that a monic polynomial f ∈ Z[x] of degree 4 or less
that has a solution modulo q for every prime q has an integral solution.
The proof of Theorem 3.16 acts as a dummy run for the proof of Theorem 3.15. The proofs given here are due to Serre.
Proof of Theorem 3.16. The prime p is odd, so p2 −1 ≡ 0 modulo 8. Let F
denote the field with p2 elements. Then F∗ is a cyclic group of order p2 − 1
by Theorem 3.3. Since p2 − 1 is divisible by 8, this implies that F∗ contains
an element of order 8. Let ζ denote such an element. Let
G = ζ − ζ 3 − ζ 5 + ζ 7.
(3.8)
Now (ζ 4 )2 = ζ 8 = 1, so ζ 4 = −1 (ζ has order 8, so we cannot have ζ 4 = 1)
and therefore
ζ 5 = −ζ and ζ 7 = −ζ 3 .
Therefore
G = 2(ζ − ζ 3 ),
so G2 = 4(ζ − ζ 3 )2 = 4(ζ 2 + ζ 6 − 2ζ 4 ). But ζ 4 + 1 = 0 implies that ζ 6 + ζ 2 = 0.
Therefore
G2 = 8.
Recall that we are working in the field F so that 8 denotes not only the
integer 8 but also the sum 1F + · · · + 1F (seven additions), where 1F is the
multiplicative identity in F∗ .
The proof of the theorem depends on finding two distinct expressions
for Gp .
3.3 The Quadratic Reciprocity Law
69
First expression for Gp :
Gp = GGp−1
= G(G2 )(p−1)/2
2
= G8(p−1)/2
because G = 8
8
by Euler’s criterion
=G
p
2
=G
by Corollary 3.14.
p
Second expression for Gp :
2
Define a function f : Z → {0, ±1} to be 0 when j is even and (−1)(j −1)/8
when j is odd. Notice that
f (j) = 1 if and only if j ≡ ±1
(mod 8).
The second expression for Gp is
Gp = f (p)G.
(3.9)
Equate the two expressions for Gp to obtain
2
= f (p)G.
G
p
Now G is not zero in F (because G2 = 8), so cancelling gives
2
= f (p) = 1 if and only if p ≡ ±1 (mod 8).
p
The field F has characteristic p, so (a + b)p = ap + bp in F because all
binomial coefficients apart from the end ones are divisible by p. (A similar argument was used in the proof of Fermat’s Little Theorem on p. 24.) Similarly,
by induction,
(a1 + · · · + an )p = ap1 + · · · + apn .
Using Equation (3.8) and the definition of f ,
G = f (1)ζ + f (3)ζ 3 + f (5)ζ 5 + f (7)ζ 7
= f (0) + f (1)ζ + f (2)ζ 2 + · · · + f (7)ζ 7
=
7
f (j)ζ j .
j=0
Thus
Gp =
7
j=0
f (j)ζ j
p
=
7
f (j)ζ jp .
(3.10)
j=0
Note that f (j) does not need to be raised to the power p because f (j)p = f (j).
70
3 Quadratic Diophantine Equations
Lemma 3.17. For all j ∈ Z, f (p)f (jp) = f (j).
Assuming this lemma for the moment, Equation (3.10) gives
Gp =
7
7
f (j)ζ jp =
j=0
f (p)f (jp)ζ jp = f (p)
7
j=0
f (jp)ζ jp .
j=0
Now, for fixed p, jp modulo 8 runs through 0, . . . , 7 as j does, so this shows
that Gp = f (p)G, proving Equation (3.9).
All we need to do now is prove Lemma 3.17, which states that
f (p)f (pj) = f (j).
Clearly, this is true if j is even, so suppose that j is odd. The statement is true
for any odd pair j and p. This can be checked by examining all the possibilities
for j and p modulo 16. Alternatively, notice that
(−1)((jp)
2
−1)/8
= (−1)((jp)
2
2
−p2 +p2 −1)/8
= ((−1)p )(j
= (−1)(j
2
2
−1)/8
−1)/8
(−1)(p
(−1)(p
2
2
−1)/8
−1)/8
.
This shows that f (jp) = f (j)f (p), and Lemma 3.17 follows by multiplying
both sides by f (p) (whose square is 1).
Finally, we come to the proof of the Quadratic Reciprocity Law (Theorem 3.15). Theorem 3.3 will again play a pivotal role.
Proof of Theorem 3.15. Consider the field F with pq−1 elements. Then F∗
is a cyclic group with order pq−1 − 1 by Theorem 3.3. By Fermat’s Little
Theorem, pq−1 ≡ 1 modulo q. Thus there is an element ζ in F∗ whose order
is q. Define
q−1 j
ζj.
(3.11)
G=
q
j=1
The sum G is called a Gauss sum because Gauss seems to have been the first
person to systematically study sums such as these.
The proof works as before by finding two different expressions for Gp . We
claim first that
G2 = (−1)(q−1)/2 q.
(3.12)
Using this, we can derive our first expression for Gp .
First expression for Gp :
Gp = GGp−1 = G(G2 )(p−1)/2 = G((−1)(q−1)/2 q)(p−1)/2
q
.
= G(−1)(q−1)/2·(p−1)/2 q (p−1)/2 = G(−1)(q−1)/2·(p−1)/2
p
3.3 The Quadratic Reciprocity Law
Second expression for Gp :
We claim that
Gp =
p
G.
q
71
(3.13)
Equating the two expressions gives
(q−1)/2·(p−1)/2
G(−1)
p
q
=
G.
p
q
We can cancel G because it is not zero in F (since its square is (−1)(q−1)/2 q,
which is not zero in F); the Quadratic Reciprocity Law follows at once.
The next step is to show Equation (3.13). By the Binomial Theorem,
q−1 p q−1 j
j
p
ζj
ζ jp
G =
=
q
q
j=1
j=1
because ( qj )p = ( qj ). By the multiplicativity of the Legendre symbol (Corollary 3.14), the right-hand side is
q−1 jp
p
ζ jp
q j=1 q
since pq is ±1. Now jp modulo q runs through 1, . . . , q − 1 as j does, so the
second expression for Gp can be written as in Equation (3.13).
The only tricky part of this proof is to evaluate G2 . Expanding the product
for G2 gives
q−1 q−1 j
−k
ζj
ζ −k ,
G2 =
q
q
j=1
k=1
noting that as k runs through 1, . . . , q − 1, so does −k modulo
q. k
= −1
By the multiplicativity of the Legendre symbol, −k
q
q
q .
−1
Pulling the factor q out to the front and replacing k by jk in the second
sum gives
q−1 q−1 jk
j
−1
ζ j(1−k) .
G2 =
q j=1
q
q
k=1
jk
= kq , so
By the multiplicativity of the Legendre symbol qj
q
G2 =
q−1 q−1 k
−1
ζ j(1−k) .
q j=1
q
k=1
72
3 Quadratic Diophantine Equations
Next we add zero to both sides of this equation in a special form. On the
right-hand side, add
q−1 k
ζ 0(1−k) .
(3.14)
0=
q
k=1
This expression is zero because half of the nonzero residues modulo q are
squares, so half of the values of the symbol are 1 and the other half are −1.
Thus
q−1 q−1 −1 k
G2 =
ζ j(1−k) .
q j=0
q
k=1
This double sum can be rearranged to give
q−1 q−1
k
−1
G =
ζ j(1−k) .
q
q j=0
2
(3.15)
k=1
By Euler’s criterion (Theorem 3.13), the term with k = 1 contributes
−1
q = (−1)(q−1)/2 q
q
to G2 .
We claim that all the other terms (those with k = 1) in Equation (3.15)
contribute nothing. Assume that k = 1, and write η = ζ 1−k . Then η is a
nontrivial qth root of 1. We claim that
S = 1 + η + · · · + η q−1 = 0.
To see this, notice that
ηS = η + η 2 + · · · + η q−1 + η q = 1 + η + · · · + η q−1 = S,
which shows that S = 0 since η = 1.
Apart from being a very beautiful result, the Quadratic Reciprocity Law
is important in that it allows the Legendre symbol to be rapidly computed.
This is useful in many areas, including primality testing (see, for example,
Section 12.6).
91 Example 3.18. Compute the Legendre symbol 167
using the Quadratic Reciprocity Law. First notice that
91
7
13
167
167
6
11
=
=−
=−
.
167
167
167
7
13
7
13
The problem has become more manageable and is readily finished by noting
that
3.4 Quadratic Rings
11
13
=
13
11
=
2
11
73
= −1
6
2
3
3
7
1
=
=
=−
=−
= −1.
7
7
7
7
3
3
91 It follows that 167
= −1.
and
19
1003
Exercise 3.16. Evaluate the Legendre symbols ( 11
37 ), ( 31 ), ( 111 ).
3.4 Quadratic Rings
It is tempting to conclude that we are now in a position to characterize those
primes p that can be written in the form
p = x2 + dy 2 ,
with x, y ∈ Z for a given d ∈ Z. Unfortunately, this problem is a little more
complicated than it√first appears. The methods of this chapter are applicable
only if the ring√Z[ −d] is Euclidean, and this is not always the case. The
structure of Z[ −d] is quite subtle, and some basic questions about these
rings are still open.
Exercise 3.17. Show
√ that the Fundamental Theorem of Arithmetic does not
hold in the ring Z[ −3] by considering the two factorizations
√
√
2 · 2 = 4 = (1 + −3)(1 − −3)
of 4. (Hint: Show that 2 cannot be a prime in this ring.)
Example 3.19. Consider the equation
x2 + 5y 2 = p.
In order to understand this, we expect to use the Quadratic Reciprocity Law
to solve
T 2 + 5 ≡ 0 (mod p)
for T .
Exercise 3.18. Show that ( −5
p ) = 1 if and only if p ≡ 1, 3, 7, or 9 modulo 20.
In particular, the congruence T 2 + 5 ≡ 0 modulo 7 has a solution: it is
easily found that T = 3 is a solution. However, the equation
x2 + 5y 2 = 7
has no solution in integers.
74
3 Quadratic Diophantine Equations
Exercise 3.19. Show
√ that the Fundamental Theorem of Arithmetic does not
hold in the ring Z[ −5].
√
Exercise 3.20. Show that there are infinitely many rings Z[ −d], where d is
a positive square-free integer, in which the Fundamental Theorem of Arithmetic does not hold.
The Quadratic Reciprocity Law is a useful tool for understanding when
quadratic congruences have no solutions. For example, Exercise 3.18 shows
that we will never obtain a solution to the equation x2 + 5y 2 = p if p is any
prime that is congruent to 11 modulo 20. More than that, it can predict the
existence of solutions when the equation cannot be checked easily by hand.
√
Exercise 3.21. (a) Show that Z[ 2] is Euclidean with respect to the norm
√
N (x + y 2) = x2 − 2y 2 .
(b) Show that if p is an odd prime, then the equation
x2 − 2y 2 = p
has a solution whenever p ≡ ±1 modulo 8 but has no solutions when p ≡ ±3
modulo 8. This is (by now) a routine use of
√ the Quadratic Reciprocity Law
together with the Euclidean property of Z[ 2].
√
Exercise 3.22. When d > 1 is square-free, the ring Z[ d] has infinitely many
units. Deduce that if
x2 − dy 2 = p
has a solution in integers, then it has infinitely many solutions. (The first
part of the exercise will be covered in the next section, but try to find a proof
yourself.)
The statement in the first part of the exercise is not easy. The equation
x2 − dy 2 = 1
is often called Pell’s Equation after the seventeenth-century mathematician
John Pell. This is now thought to be a misattribution. Brahmagupta seems
to have known how to solve the equation long before Pell. In the twelfth century, Bhaskaracharya discovered the simplest of the infinitely many nontrivial
solutions when d = 61, namely
x = 1766319049,
y = 226153980.
√
3.5 Units in Z[ d], d > 0
√
3.5 Units in Z[ d], d > 0
75
√
For d < 0, the ring R = Z[ d] has only finitely many units, so we assume
in this section that d > 0 is a fixed square-free integer. Write {t} for the
fractional part of a real number t.
Lemma 3.20. There are infinitely many coprime pairs of integers p and q > 0
with
√
1
|q d − p| < .
q
Proof. Let Q > 1 denote an integer. Divide the interval [0, 1) into Q subintervals [0, 1/Q), [1/Q, 2/Q), . . . and consider the (Q + 1) numbers
√
√
√
0, { d}, {2 d}, . . . , {Q d}.
√
There are (Q + 1) of them since d is irrational, so at least two must lie in a
single one of the Q intervals of the form [a/Q, (a + 1)/Q) by the pigeonhole
principle Thus there must be integers q1 , q2 with 0 q1 < q2 Q such that
√
√ {q2 d} − {q1 d} < 1/Q.
Unwinding the definition of the fractional part, this means that there are
integers p1 and p2 with
√
√
√
q2 d − p2 − q1 d + p1 = (q2 − q1 ) d − (p2 − p1 ) < 1/Q.
The proof is now finished by choosing Q q = q2 − q1 > 0 and p = p2 − p1 . This was originally proved by Dirichlet and is the starting point for a deep
subject known as Diophantine approximation. This subject has to do with
how well an irrational number can be approximated by rational numbers.
Exercise 3.23. Show that there is a constant C > 0 such that
√
C
< |q d − p|
q
for all integers p and q > 0.
Exercise 3.24. More generally, show that if α is algebraic of degree k > 1
(that is, α satisfies an irreducible polynomial of degree k with integer coefficients), then there is a constant C(α) > 0 such that
C(α)
< |qα − p|
q k−1
for all integers p and q > 0.
76
3 Quadratic Diophantine Equations
Theorem 3.21. If d > 1 is a square-free integer, then
x2 − dy 2 = 1
has infinitely many solutions√in integers (x, y). Moreover, each solution corresponds to a unit in R = Z[ d] with norm 1. Any unit with norm 1 has the
form ±un for n ∈ Z, where u is a fixed unit with norm 1.
Proof. Using Lemma 3.20, choose p, q > 0 with
√
1
|q d − p| < .
q
Then
p−
so
(3.16)
√
1
1
<q d<p+ ,
q
q
√
√
1
|q d + p| < 2q d + .
q
(3.17)
Multiplying the inequalities (3.16) and (3.17) shows that
√
|p2 − dq 2 | < 1 + 2 d.
We would like to show that the left-hand side is 1 for infinitely many
pairs (p, q). We cannot deduce this at once, but notice that the right-hand
side is a uniform bound (independent of p and q), so there must be an integer e with
√
1e<1+2 d
such that for infinitely many pairs p and q,
p2 − dq 2 = e.
There also must be infinitely many distinct pairs (p, q) and (p , q ) such
that p ≡ p modulo e, q ≡ q modulo e, and
pp − dqq ≡ p2 − dq 2 ≡ 0
(mod e).
Given such a distinct pair, write
pp − dqq = xe and pq − q p = ye
for integers x and y. Then
2
2
pp − dqq pq − qp
−d
x2 − dy 2 =
e
e
1 2 2
= 2 p (p − dq 2 ) − dq 2 (p2 − dq 2 )
e
1
= (p2 − dq 2 ) = 1,
e
√
3.5 Units in Z[ d], d > 0
77
so there are infinitely many solutions.
To prove the claim about the structure of the unit group, consider the map
L : U (R) → R2
defined by
√
√
√
L(x + y d) = (log(x + y d), log(x − y d)).
The image is a nontrivial discrete subgroup (see Exercise 3.25 below) of R2 ,
so it must have rank 1 or 2 by Exercise 3.26.
On the other hand, x2 − dy 2 = 1 implies that
√
√
log(x − y d) + log(x + y d) = 0,
so the image of L lies in a one-dimensional subspace of R2 and therefore
the rank must be 1. This is enough to prove the claim: The image set must
be {n(v, −v) | n ∈ Z} for some nonzero v ∈ R and the claim follows with u
satisfying L(u) = (v, −v).
Exercise 3.25. Explain why the image of L is a discrete subgroup of R2 in
the proof of Theorem 3.21.
Exercise 3.26. Prove that a discrete subgroup of Rn has rank less than or
equal to n.
Finding u is in general a nontrivial problem. In some books you will see the
method of continued fractions used, which does give an algorithm. The method
used here, which is a first step into the subject called geometry of numbers,
was chosen for two reasons. First, using a generalization of this argument, one
can go on to analyze the units of the ring of algebraic integers (see p. 84) inside
a number field. This always turns out to be finitely generated with a rank that
is easily computed from basic data about the number field. The method using
continued fractions does not generalize. Second, the geometry of numbers,
when worked out fully, really represents an application of topological ideas. If
you ask what kind of shapes in space must contain lattice points, you quickly
find yourself resorting to ideas such as compactness and connectedness as well
as convexity.
A beautiful fact about the solutions of the equation in Theorem 3.21√is
that they form a group. Moreover, the multiplication law on elements x + y d
can be expressed in terms of polynomial functions on the coordinates x and y
as follows:
√
√
√
(x1 + y1 d)(x2 + y2 d) = (x1 x2 + dy1 y2 ) + (x1 y2 + x2 y1 ) d.
In Chapter 5, we will encounter a whole family of Diophantine equations in
two variables whose solutions form groups, and for which the multiplication
law can be expressed in terms of rational functions on the coordinates.
78
3 Quadratic Diophantine Equations
3.6 Quadratic Forms
The subject of quadratic forms is a large one, and we will merely introduce it
here via a classical proof of a result due to Lagrange and some comment on
Gauss’ Theorem. Consider the Diophantine equation
ax2 + bxy + cy 2 = n
(3.18)
in which we seek an integral solution (x, y) for given a, b, c, n ∈ Z. The discriminant ∆ of the quadratic form ax2 + bxy + cy 2 is defined to be
∆ = b2 − 4ac.
Just as with the Pythagorean equation, there are several elementary reductions to be made.
First, if gcd(a, b, c) = d > 1, then d must also divide n, and Equation (3.18)
becomes
(a/d)x2 + (b/d)xy + (c/d)y 2 = (n/d),
so without loss of generality we may assume that gcd(a, b, c) = 1.
Second, if gcd(x, y) = e > 1, then
a(x/e)2 + b(x/e)(y/e) + c(y/e)2 = (n/e2 ),
so we may assume without loss of generality that x and y are coprime. As in
the Pythagorean case, call solutions (x, y) with gcd(x, y) = 1 primitive.
Third, if the discriminant ∆ is a square, then the equation
at2 + bt + c = 0
has rational solutions that may be written u1 /v1 and u2 /v2 in lowest terms,
with v1 and v2 positive, so Equation (3.18) may be written as
a(v1 x − u1 y)(v2 x − u2 y) = nv1 v2 .
This is not really a quadratic equation, but a pair of linear ones. For each
integral pair (r, s) with ars = nv1 v2 , solve the equations
v1 x − u1 y = r,
v2 x − u2 y = s.
Integral solutions to this pair of equations – if there are any – solve Equation (3.18).
Exercise 3.27. Let ρ =
1 − (−1)b
∆−ρ
. Show that
is an integer.
2
4
3.6 Quadratic Forms
79
Theorem 3.22. [Lagrange] Let ∆ be a nonsquare integer. Then there is a
quadratic form ax2 + bxy + cy 2 of discriminant ∆ with a primitive solution to
ax2 + bxy + cy 2 = n
if and only if the congruence
∆−ρ
z + ρz −
≡0
4
2
(mod n)
(3.19)
has a solution z.
Proof. Assume that (α, β) is a primitive integral solution to Equation (3.18).
By Theorem 1.23, there are integers γ, δ with
αγ + βδ = 1.
x
α −δ X
=
.
y
β γ
Y
Let
Notice that det
α −δ
= 1, so this matrix is an invertible transformation
β γ
on Z2 .
Let
r = aαδ + cβδ +
b+ρ
2
and
s = aδ 2 + bδγ + cγ 2 .
Notice that by our choice of ρ, both r and s are integers. Now express Equation (3.18) in the variables X and Y to obtain
a(αX − δY )2 + b(αX − δY )(βX − γY ) + c(βX + γY )2
= X 2 (aα2 + bαβ + cβ 2 ) + XY (2aαδ − b(αγ + βδ) + 2cβγ) + Y 2 (aδ 2 + bδγ + cγ 2 )
= nX 2 + (2r + ρ)XY + sY 2 = n.
The equation
nX 2 + (2r + ρ)XY + sY 2 = n
(3.20)
has the solution X = 1, Y = 0, corresponding to
α
α −δ 1
=
.
β
β γ
0
The discriminant of Equation (3.20) is
(2r + ρ)2 − 4sn = ∆,
(3.21)
80
3 Quadratic Diophantine Equations
so
r2 + ρr −
∆−ρ
= sn,
4
showing that r is a solution of the congruence (3.19).
Conversely, assume that r is a solution to the congruence (3.19). Then
solving Equation (3.21) gives an integer s and hence the integer solution X =
1, Y = 0 to Equation (3.20). Changing back to the variables x, y using
X
γ δ x
=
Y
−β α y
gives an integral solution to the equation
nx2 + (2r + ρ)xy + s2 = n
that has discriminant ∆.
Example 3.23. Let a = 1, b = 0, c = 5, and n = 7, so ρ = 0 and ∆ = −20.
Theorem 3.22 applies to say that there is a quadratic form representing 7 with
discriminant −20 if and only if
z 2 + 5 ≡ 0 (mod 7)
has a solution. We know that −5 is a quadratic residue modulo 7, so there is
such a form. The proof constructs the form
7x2 + 6xy + 2y 2 ,
and of course this represents 7 when x = 1 and y = 0.
Exercise 3.28. Prove that any odd prime congruent to 1 modulo 4 is a sum
of two integer squares using Theorem 3.22 (cf. Theorem 2.6 where this was
proved using different methods).
The next exercises explore the change of variables (X, Y ) to (x, y) used
in the proof of Theorem 3.22. An integer n is said to be represented by an
integral quadratic form Q if there are integers x and y with Q(x, y) = n.
Exercise 3.29. Let
P (x, y) = ax2 + bxy + cy 2
and
Q(x, y) = AX 2 + BXY + CY 2
be binary quadratic forms with integer coefficients. Say that P and Q are
equivalent, written P ∼ Q, if there is an integral change of variables
x
α −δ X
=
y
β γ
Y
3.6 Quadratic Forms
81
α −δ
with det
= 1 such that
β γ
P (x, y) = Q(X, Y ).
(a) Show that ∼ is an equivalence relation.
(b) Show that equivalent quadratic forms have the same discriminant.
(c) Show that equivalent quadratic forms represent the same set of integers:
If P ∼ Q, then
{P (x, y) | x, y ∈ Z} = {Q(x, y) | x, y ∈ Z}.
Exercise 3.30. Show that a prime number p is represented by a quadratic
form P if and only if there is a quadratic form equivalent to P of the form
px2 + dxy + ey 2
for integers d and e.
Let P (x, y) = ax2 + bxy + cy 2 be a quadratic form. Then P is positivedefinite if P (x, y) 0 for all x and y and is reduced if either
c > a and − a < b a
or
c = a and 0 b a.
Exercise 3.31. Prove that a positive-definite binary quadratic form is equivalent to a unique reduced quadratic form.
Exercise 3.32. The class number of d is the number of equivalence classes
of positive-definite forms with discriminant d. Prove that the class number is
finite for any d.
Notes to Chapter 3: Artin’s conjecture from Section 3.1 is still open – see the
monograph [58, Section 3.2, 3.3] by Everest, van der Poorten, Shparlinski and Ward
for descriptions of what is known and references to the literature. Shoup’s result
can be found in his paper [138]. There is a discussion of the history of the Chinese
Remainder Theorem in many places; see the ‘History of Mathematics’ Web site [113]
for references. Mahler’s paper [102] gives an account of the method actually used by
the early Chinese mathematicians, as opposed to the modern approach which follows
Gauss [67]. We thank Robin Chapman for Exercise 3.12(b). Gauss was justly proud
of having proved the Quadratic Reciprocity Law and many mathematicians have
seen it since as foundational in the modern theory of numbers. The history and
mathematics of the Quadratic Reciprocity Law and the development of reciprocity
laws for higher degrees are described in Lemmermeyer’s monograph [98].
4
Recovering the Fundamental Theorem of
Arithmetic
This short chapter will explain how ideal theory was developed as a means
of recovering from the failure of the Fundamental Theorem of Arithmetic
witnessed in Chapter 3. We begin with a few historical remarks to set that
development in context and go on to give a reasonably complete account of
unique factorization of ideals in the ring of algebraic integers in a quadratic
field. Finally we introduce the class number and the class group.
4.1 Crisis
The attempt to understand fully the problem we set out to study in the
last chapter exposed a phenomenon that represented something of a historical crisis. During the nineteenth century, mathematicians had to come to
terms with the breakdown of the Fundamental Theorem of Arithmetic. In
March 1847, Lamé announced a proof of Fermat’s Last Theorem (described in
Section 2.5.1) to the Paris Academy, assuming (wrongly) that the Fundamental Theorem of Arithmetic held in the ring Z[e2πi/n ] for every n 1. Lamé
acknowledged that Liouville originally suggested this approach to Fermat’s
Last Theorem, but Liouville himself addressed the meeting and suggested
that there might be a problem with the assumption of unique factorization
into primes.
The question raised was this: Does unique factorization into primes hold
in the ring Z[e2πi/n ]? This problem became a focal point for rapid developments. On May 24th 1847, Liouville presented a letter from Kummer to
the Academy that settled the arguments. Kummer had proved in 1844 that
unique factorization failed in general but that his “ideal complex numbers”
in a paper of 1846 allowed a form of unique factorization to be recovered. By
September 1847, Kummer had presented a paper to the Berlin Academy in
which he proved that for p a regular prime 1 Fermat’s Last Theorem holds for
1
A prime p is called regular if p does not divide the numerators of any of the
Bernoulli numbers B2 , B4 , . . . , Bp−3 ; the Bernoulli numbers are defined on p. 203.
84
4 Recovering the Fundamental Theorem of Arithmetic
exponent n = p, essentially by Lamé’s method. In this paper, Kummer also
showed that 37 is not regular since 37 divides the numerator of B32 . Thus
Kummer proved Fermat’s Last Theorem for many indices and showed that
Lamé’s approach failed for others.
These dramatic developments did not lead to a proof of Fermat’s Last
Theorem but contributed to algebraic number theory√in a profound way by
eventually leading to the result that rings such as Z[ −5] do have a kind of
Fundamental Theorem of Arithmetic – but at the level of ideals rather than
elements.
4.2 An Ideal Solution
Definition 4.1. An ideal in a commutative ring R is a subgroup of the additive group of R that is closed under multiplication by elements of R.
It is easy to construct ideals in a commutative ring: Take all the multiples
(a) = aR = {ar | r ∈ R}
√
of a single element a. In rings such as Z and Z[ −2], all ideals are of this
form, and this is true for any Euclidean ring.
Exercise 4.1. (a) Using the Euclidean Algorithm, prove that any ideal in Z
has the form (k) = kZ for some k ∈ Z.
(b) More generally, prove that in a Euclidean ring R any ideal has the
form (k) = kR, the multiples of a single element k.
Such singly-generated ideals are called principal, and any ring in which all
ideals are principal is called a principal ideal domain.
The statement in Exercise 4.1(b) is not true in all commutative rings.
It is difficult to envisage what ideals look like in general; however, a more
sophisticated version of the Fundamental Theorem of Arithmetic makes them
easier to understand. This is described in Section 4.3 for quadratic fields.
Any field K containing the rationals contains a ring OK of algebraic integers; this ring is a generalization of the usual integers in the rationals, and
is defined to be the set of all zeros in the field K of monic polynomials with
coefficients in Z.
√
√
Exercise 4.2. Show that the ring of√algebraic integers in Q( d) is Z[ d]
if d ≡ 2 or 3 modulo 4 and is Z[(1 + d)/2]√if d ≡ 1 modulo 4. (Hint: Start
by showing that any algebraic integer in Q( d) that is not in Z must satisfy
a quadratic equation.)
√
Exercise
√ 4.3. By√the previous√exercise, the ring of algebraic integers in Q( 6)
(or Q(√ 14))
that the ring of algebraic integers
√ is Z[ 6] (resp. Z[ 14]). Prove
√ √
in Q( 6, 14) is strictly larger than Z[ 6, 14].
4.3 Fundamental Theorem of Arithmetic for Ideals
85
Exercise 4.4. Adapt the methods of Theorem 3.21 to show that the group
√
∗
of units OK
inside the ring of algebraic integers OK of the field K = Q( d)
when d > 0 is square-free comprises {±un | n ∈ Z}, where u > 1 is some unit
of OK . Such an element u is called a fundamental unit.
Exercise 4.5. Find fundamental units for the real quadratic fields
√
√
√
√
Q( 2), Q( 3), Q( 5), and Q( 7).
√
Any element of Q( √
d) for a square-free integer d may be written √
uniquely
in the form α = x + y d with x and y rational. This
presents
Q(
d) as a
√
two-dimensional vector space over Q with basis {1, d}.
√
√
Definition 4.2. The norm of α = x + y d in Q( d) is defined to be
N (α) = x2 − dy 2
and the trace
T (α) = 2x.
√
Exercise√4.6. The map β → αβ on Q( d) is a Q-linear map on the Q vector
space Q( d). Find the 2 × 2 matrix determined by this map, and show that
the absolute value of its determinant is |N (α)| and its trace is T (α).
Unique factorization will be recovered in Section 4.3 by working with prime
ideals in the algebraic integers. These matters represent the beginnings of
an important subject called algebraic number theory. The recovery of the
Fundamental Theorem of Arithmetic at the level of ideals represents a major
achievement that continues to influence the development of number theory
and geometry.
Theorem 2.14 on p. 55 gives a different way to recover the Fundamental
Theorem of Arithmetic, used to dramatic effect in Theorem 2.13, but the
development of ideal theory proved to be of much greater importance.
4.3 Fundamental Theorem of Arithmetic for Ideals
We begin with a natural definition of multiplication on ideals. Subsequently,
we introduce a notion of prime ideal, then we go on to show that every nontrivial ideal factorizes as a product of prime ideals in a way that is unique.
Definition 4.3. Let I and J denote ideals in a commutative ring. The sum
and product of I and J are defined by I + J = {a + b | a ∈ I, b ∈ J}, while IJ
is the additive subgroup generated by the set {ab | a ∈ I, b ∈ J}.
Exercise 4.7. If I and J denote ideals in a commutative ring, prove that the
sum and the product of I and J are also ideals.
86
4 Recovering the Fundamental Theorem of Arithmetic
Sums and products of more than two ideals are defined in an entirely
analogous fashion and again turn out to be ideals.
It might have seemed more natural to define IJ to be the set
{ab | a ∈ I, b ∈ J}
rather than the subgroup this set generates, however the set of products by
itself is not always closed under addition.
Exercise 4.8. Give an example of a commutative ring R together with two
ideals I and J such that the set {ab | a ∈ I, b ∈ J} is not an ideal.
√
√
√
If α = x + y d ∈ K = Q( d), write α∗ = x − y d for the conjugate of α.
If d < 0, then this is the usual complex conjugate; for d > 0 this terminology
comes from Galois theory. For an ideal I, write I ∗ for the set of conjugates
of α ∈ I.
Exercise 4.9. Let I and J denote ideals in OK .
(a) Show that I ∗ is an ideal in OK .
(b) Show that (I + J)∗ = I ∗ + J ∗ .
(c) Show that (IJ)∗ = I ∗ J ∗ .
If α1 , . . . , αk are elements of OK , write
(α1 , . . . , αk )
for the ideal
(α1 ) + · · · + (αk ) = α1 OK + · · · + αk OK
generated by α1 , . . . , αk . Also define
α1 , . . . , αk = α1 Z + · · · + αk Z
for the additive subgroup of OK generated by α1 , . . . , αk . It is important to
distinguish these different types of generation.
In what follows,
we are going to work with the full ring of algebraic integers
√
in the field √
Q( d) for a square-free integer√d. Following Exercise 4.2, define δ
to be (1 +√ d)/2 if d ≡ 1 modulo 4 and d if d ≡ 2 or 3 modulo 4. Thus,
if K = Q( d), then OK = Z[δ]. Ideals in OK , although not always principal,
can always be generated as ideals by two elements.
Theorem 4.4. Let I denote an ideal in OK . Then there are elements α, β in I
with I = (α, β).
Proof. Since OK as an additive group is a subgroup of Q2 , it follows that I
can be generated as an additive group by two elements. We will first show
that one of these elements can be chosen to lie in Z. Let
B = {b ∈ Z | a + bδ ∈ I for some a ∈ Z};
4.3 Fundamental Theorem of Arithmetic for Ideals
87
then B is an ideal of Z. Hence B = gZ for some g ∈ Z and similarly I ∩Z = hZ
for some h ∈ Z. Since g ∈ B, there must be c ∈ Z with c + gδ ∈ I.
We claim that
I = c + gδ, h.
(4.1)
Clearly c + gδ, h ⊆ I. Now assume that a + bδ ∈ I with a, b ∈ Z.
Since b ∈ B, b = eg for some e ∈ Z. Therefore
a − ec = a + bδ − e(c + gδ).
This is an element of I ∩ Z, so it can be written as f h for some f ∈ Z. Then
a + bδ = a − ec + e(c + gδ) = f h + e(c + gδ) ∈ c + gδ, h,
showing Equation (4.1).
To finish the proof of the theorem, use Equation (4.1) to write α = c + hδ
and β = g. Then α, β ∈ I, so (α, β) ⊆ I. Conversely, if γ ∈ I, then for
integers m and n,
γ = mα + nβ,
so I ⊆ (α, β), which concludes the proof.
As a final step toward proving the Fundamental Theorem of Arithmetic
for ideals in OK , we note the following lemma.
Lemma 4.5. [Hurwitz’s Lemma] If α, β are elements of OK and k ∈ Z
divides N (α), N (β), and T (αβ ∗ ), then k divides αβ ∗ and α∗ β in OK .
Exercise 4.10. Prove Hurwitz’s Lemma. (Hint: This only uses simple properties of the norm and trace functions.)
Corollary 4.6. Let I denote any ideal of OK . Then II ∗ is a principal ideal kZ
of Z.
Proof. We know that I = (α, β) for some α, β, so
II ∗ = (α, β)(α∗ , β ∗ ) = (αα∗ , αβ ∗ , βα∗ , ββ ∗ ).
This means that II ∗ contains the integers N (α) = αα∗ and N (β), as well
as T (αβ ∗ ) = αβ ∗ + α∗ β. If k denotes the greatest common divisor
of these
integers then k ∈ II ∗ , so (k) ⊆ II ∗ . Now k N (α), k N (β), and k T (αβ ∗ ) and
hence, by Hurwitz’s Lemma, k αβ ∗ and k βα∗ , so II ∗ ⊆ (k) as claimed. The integer k appearing in Corollary 4.6 may be taken as positive without
loss of generality since kZ = −kZ.
Definition 4.7. If I denotes any ideal of OK , then the unique integer k > 0
with II ∗ = kZ is called the norm of I, written N (I).
88
4 Recovering the Fundamental Theorem of Arithmetic
Corollary 4.8. (1) For I = (α, β),
N (I) = gcd(N (α), N (β), T (αβ ∗ )).
(2) If I = (α) is a principal ideal, then N (I) = N (α).
(3) The norm is multiplicative: N (IJ) = N (I)N (J) for all ideals I and J.
(4) N (I) = [OK : I], the group-theoretic index of I as a subgroup of OK .
Proof.(1) This appeared in the proof of Corollary 4.6.
(2) This follows because (α)(α∗ ) = (αα∗ ).
(3) (N (IJ)) = IJI ∗ J ∗ = (II ∗ )(JJ ∗ ) = (N (I))(N (J)) = (N (I)N (J)).
Exercise 4.11. Prove Corollary 4.8(4). (Hint: if (h, c + gδ) is a nonzero ideal
of OK , then N (I) = gh.)
Corollary 4.9. If I, J and K are ideals of OK with I = {0} and IJ = IK,
then J = K.
Proof. This is obvious if I = (α) is principal because in that case IJ = αJ
so J = α−1 (IJ). Similarly, K = α−1 (IK) = α−1 (IJ) = J. In general, the
identity IJ = IK implies that
(II ∗ )J = (IJ)I ∗ = (IK)I ∗ = (II ∗ )K,
and the result follows as before.
This important ‘cancellation’ property of ideals in OK will play a key role
in the proof of the Fundamental Theorem of Arithmetic for ideals.
Definition 4.10. If I and J are ideals in OK , we write I J (I divides J) if
there is an ideal K in OK with J = IK.
Notice that IK ⊆ I, so if I J then J ⊆ I.
Lemma 4.11. Given two ideals I and J in OK , I J if and only if J ⊆ I.
Proof. One direction is already proved, so assume that J ⊆ I. Then
JI ∗ ⊆ II ∗ = (N (I)),
so
1
JI ∗
N (I)
is an ideal contained in OK . It follows that
K=
IK =
1
1
1
I(JI ∗ ) =
J(II ∗ ) =
J(N (I)) = J,
N (I)
N (I)
N (I)
and hence I J as claimed.
In what follows, we see a real duplication of ideas from Chapter 1, worked
out in the context of ideals. The interchangeability of inclusion and divisibility
for ideals will be used repeatedly.
4.4 The Ideal Class Group
89
Definition 4.12. A nonzero ideal
I = R in a commutative ring R is called
maximal
if
for
any
ideal
J,
J
I implies that J = I. An ideal P is prime
if P IJ implies that P I or P J.
Exercise 4.12. In a commutative ring R, let M and P denote ideals.
(a) Show that M is maximal if and only if the quotient ring R/M is a field.
(b) Show that P is prime if and only if R/P is an integral domain (that is,
in R/P the equation ab = 0 forces either a or b to be 0).
(c) Deduce that every maximal ideal is prime.
Theorem 4.13. [Fundamental Theorem of Arithmetic for Ideals]
Any nonzero proper ideal in OK can be written as a product of prime ideals,
and that factorization is unique up to order.
Proof. If I is not maximal, it can be written as a product of two nontrivial
ideals. Comparing norms shows these ideals must have norms smaller than I.
Keep going: The sequence of norms is descending, so it must terminate, resulting in a finite factorization of I. By Exercise 4.12, every maximal ideal is
prime, so all that remains is to demonstrate that the resulting factorization is
unique. This uniqueness follows from Corollary 4.9, which allows cancellation
of nonzero ideals common to two products.
4.4 The Ideal Class Group
In this section, we are going to see how the nineteenth-century mathematicians
interpreted Exercise 3.32 on p. 81 in terms of quadratic fields. The major result
we will present is that ideals in OK , for a quadratic field K, can be described
using a finite list of representatives I1 , . . . , Ih ; any nontrivial ideal I can be
written Ii P , where 1 i h and P is a principal ideal. Thus h, known as the
class number, measures the extent to which OK fails to be a principal ideal
domain. This statement was proved for arbitrary algebraic number fields and
proved to be influential in the way number theory developed in the twentieth
century.
Given two ideals I and J in OK , define a relation ∼ by
I ∼ J if and only if I = λJ for some λ ∈ K∗ .
Exercise 4.13. Show that ∼ is an equivalence relation.
We are going to outline a proof of the following important theorem.
Theorem 4.14. There are only finitely many equivalence classes of ideals
in OK under ∼.
90
4 Recovering the Fundamental Theorem of Arithmetic
One class is easy to spot – namely the one consisting of all principal ideals.
Of course, OK is a principal ideal domain if and only if there is only one class
under the relation. One can define a multiplication on classes: If [I] denotes
the class containing I, then one can show that the multiplication defined by
[I][J] = [IJ]
(4.2)
is independent of the representatives chosen.
Corollary 4.15. The set of classes under ∼ forms a finite Abelian group.
The group in Corollary 4.15 is known as the ideal class group of K (or just
the class group).
Proof of Corollary 4.15. In the class group, associativity of multiplication is inherited from OK . The element [OK ] acts as the identity. Finally, given
any nonzero ideal I, the relation II ∗ = (N (I)) shows that the inverse of the
class [I] is [I ∗ ].
Lemma 4.16. Given a square-free integer d = 1, there is a constant Cd √that
depends upon d only such that for any nonzero ideal I of OK , K = Q( d),
there is a nonzero element α ∈ I with |N (α)| Cd N (I).
Exercise 4.14. *Prove Lemma 4.16. The basic idea is a technique similar to
that used in the proof of Theorem 3.21 showing that a lattice point must
exist in a region constrained by various inequalities. Since the original proof,
considerable efforts have gone into decreasing the constant Cd for practical application. The best techniques use the geometry of numbers, a theory initiated
by Minkowski.
Proof of Theorem 4.14. First show that every class contains an ideal
whose norm is bounded by Cd . Given a class [I], apply Lemma 4.16 with I ∗
replacing I. Now (α) ⊆ I ∗ , so we can write (α) = I ∗ J for some ideal J.
However, this gives a relation [I ∗ ][J] = [(α)] in the class group. This means
that [J] is the inverse of [I ∗ ]. However, we remarked earlier that [I] and [I ∗ ]
are mutual inverses in the class group. Hence [I] = [J]. Now
|N (α)| = N ((α)) = N (I ∗ )N (J).
Since the left-hand side is bounded by Cd N (I ∗ ), we can cancel N (I ∗ ) to
obtain N (J) Cd .
Now the theorem follows easily: For any given integer k 0, there are only
finitely many ideals of norm k; this is because any ideal must be a product
of prime ideals of norm p or p2 , where p runs through the prime factors of k.
There are only finitely many such prime ideals and hence there are only finitely
many ideals of norm k. Now apply this to the integers k Cd to deduce that
there are only finitely many ideals of norm bounded by Cd . Since each class
contains an ideal whose norm is thus bounded, by the first part of the proof,
it follows that there are only finitely many classes.
4.4 The Ideal Class Group
91
Exercise 4.15. Investigate the relationship between quadratic forms and ideals in quadratic fields. In particular, show that Exercise 3.32 on p. 81 is equivalent to Theorem 4.14. (Hint: If I denotes an ideal with basis {α, β}, show
that for x, y ∈ Z, N (xα +yβ)/N (I) is a (binary) integral quadratic form. How
does a change of basis for I relate to the form? What effect does multiplying I
by a principal ideal have on the form?)
4.4.1 Prime Ideals
To better understand prime ideals, we close with an exercise that links up the
various trains of thought in this chapter and shows that ideal theory better
explains the various phenomena encountered in Chapter 3.
√
Exercise 4.16. Factorize the ideal (6) √
into prime ideals in Z[ −5], expressing
each prime factor in the form (a, b + c −5).
Exercise 4.17.
√ Let OK denote the ring of algebraic integers in the quadratic
field K = Q( d) for a square-free integer d.
(a) If P is a prime ideal in OK , show that P (p) for some integer prime p ∈ Z.
(b) Show that there are only three possibilities for the factorization of the
ideal (p) in OK :
(p) = P1 P2 where P1 and P2 are prime ideals in OK (p splits);
(p) = P , where P is a prime ideal in OK (p is inert);
(p) = P 2 , where P is a prime ideal in OK (p is ramified ).
This should be compared with the possible primes in Z[i] described in Theorem 2.8(3). The following exercise gives a complete description of splitting
types in terms of the Legendre symbol.
Exercise 4.18.
√ Let OK denote the ring of algebraic integers in the quadratic
field K = Q( d) for a square-free integer d. Let D = d if d ≡ 1 modulo 4
and let D = 4d otherwise. Show
that an odd prime p is inert, ramified, or
is −1, 0, or +1, respectively. What are the
split as the Legendre symbol D
p
possibilities when p = 2?
We should say something about the terminology. Splitting and inertia are
fairly obvious, the latter signifying that the prime p remains prime in this
bigger ring, just as primes p ≡ 3 modulo 4 remain primes in Z[i]. The term
“ramify” means literally to branch, and we see here something
of an overlap
√
with the theory of functions. A function such as y = x really consists of
two possible branches. This notion was borrowed deliberately to name the
phenomenon seen in number theory, where a prime in Z becomes a power of
a prime in a larger ring. We end this chapter with a definition because it is
going to appear again in Chapter 11.
92
4 Recovering the Fundamental Theorem of Arithmetic
√
Definition 4.17. Let K = Q( d) denote a quadratic field, where d is a
square-free integer. Define D by
d if d ≡ 1 modulo 4 and
D=
(4.3)
4d otherwise.
√
Then D is called the discriminant of the quadratic field K = Q( d).
Notes to Chapter 4: Much of this chapter was based on Robin Chapman’s excellent expository notes. To see the details worked out economically in the general
case, consult Lang’s book [96]. Lemma 4.16 is proved as Theorem 4 on p.119 of that
book; Chapter V is an excellent introduction to Minkowski’s geometry of numbers.
5
Elliptic Curves
One of the many powerful ideas that have been brought to bear on problems in
number theory is a connection between Diophantine problems and geometry.
Exercise 2.2 on p. 45 gave a hint of this phenomenon; the geometric structure
in that case was a unit circle, an object with algebraic structure in that the
points of the unit circle form a group. In this chapter, we introduce a family
of curves with a group structure. The main aim is to develop a working understanding of the group operation and to illustrate this with many examples.
In subsequent chapters we will make these ideas more rigorous.
5.1 Rational Points
Having studied the Pythagorean equation, which has infinitely many integral
solutions, perhaps the existence of so few integral solutions to the equation1
y 2 = x3 − 2
seems a little disappointing. However, the solution (3, 5) has an amazing property. We can use this one integral solution to generate other exotic rational
solutions to the equation. It may
seem
obvious, but we can use this so not383
lution to generate the solution 129
,
100 1000 . Moreover, in a precise sense, this
rational solution is the next simplest solution to the equation.
1
The equation y 2 = x3 + C is sometimes called Bachet’s equation after Claude
Bachet (1581–1638). Bachet is most famous for translating the Arithmetica of
Diophantus from Greek into Latin. This is the book in which Fermat wrote his
famous marginal note asserting what is now called Fermat’s Last Theorem. In
addition, Bachet discovered the duplication formula for this curve, showing that
if (x, y) is a solution, then
((x4 − 8Cx)/(2y)2 , (−x6 − 20Cx3 + 8C 2 )/(2y)3 )
is also a – potentially different – solution.
94
5 Elliptic Curves
To see how this is done, first construct the tangent to the curve at the
point P = (3, 5). This has equation
y=
27
31
x− .
10
10
If we substitute the equation for this line into the equation of the curve, then
(we claim) the line will meet the curve at another point, and this point will
have rational coordinates.2 To see this more explicitly, note that when we
substitute, we get a cubic equation for x,
31
27
x−
10
10
2
= x3 − 2.
We claim that x = 3 is a double root of this equation. Clearly, it is a single root
by substituting in and getting 25 on both sides of the equation; differentiating
and substituting shows it is a double root because you get 27 on both sides.
To find the third point of intersection, use the sum of roots formula. For
a cubic, this says that if x1 , x2 , and x3 are the three zeros of the cubic
x3 + ax2 + bx + c,
then x1 + x2 + x3 = −a (see Exercise 5.11 on p. 105). Applying this, and
letting x denote the third root, we see that
3+3+x=
27
10
2
.
Solving this for x gives x = 129
100 . To find y, use the equation of the tangent to
383
see that y = 1000
.
It is tempting to try this again. We cannot expect anything by joining our
new point back to P . However, we could join the other integral solution (3, −5)
to the new point to see where the line meets the curve again. Technically, it
is better to reflect the new point in the x-axis and try to join that to our first
point (for reasons
that will
become apparent later). Thus we define P1 = (3, 5)
383
and P2 to be 129
100 , − 1000 . Recursively define Pn to be the reflection in the xaxis of the third point of intersection of the line joining P to Pn−1 . The next
point is
164323
66234835
,
,−
P3 =
29241
5000211
from which we obtain
P4 =
2
2340922881 113259286337279
.
,
58675600
449455096000
If you are comfortable with geometrical notions, then you will accept that since P
is already a double point, the third point must be rational.
5.1 Rational Points
95
It is an amazing fact that there are infinitely many rational points on this
curve and (up to reflection in the x-axis) they can all be constructed in this
way, starting with P .
This example already exhibits some typical behavior; for example, the
denominator of the x-coordinate of P3 is a square while the denominator of
the y-coordinate is the cube of the same number:
66234835
164323
.
,
−
P3 =
1712
1713
Exercise 5.1. Prove that any rational point on the curve y 2 = x3 − 2 must
have the shape P = (A/B 2 , C/B 3 ) for coprime integers A, B, C.
Example 5.1. To start to understand what is going on in the geometrical iteration that produces the points Pn , consider the sequence (Bn ), where Bn is
the square root of the denominator of the x-coordinate of Pn . The first few
values are shown in Table 5.1.
Table 5.1. Growth in the values of Bn .
n
Bn
1
1
2
10
3
171
4
7660
5
12660211
6
22652313570
7
58809175344521
8
1735132266687114280
9
357172782187144055262201
10
115455343251682907198856192050
11
30298854203539385536028167296302051
12
689991490842950483313935163766440646064580
13
22743339816243727151383520741637996456735801712571
14
1301982234059157037070228212465238100265563723924858470330
15 45687890972429224342713610900040552323182688307706080693278173039
The lengths of the numbers Bn written out in decimal digits seem to grow
quadratically in n. The number of digits in Bn is approximately log10 Bn , so
this suggests a relationship between log Bn and n2 . The following beautiful
result makes this precise.
Theorem 5.2. There is a constant h > 0 for which
1
log Bn → h as n → ∞
n2
where (Bn ) is the sequence in Table 5.1.
96
5 Elliptic Curves
We are not going to prove this result. An easy consequence of Theorem 5.2
is the finiteness of the number of integral points in the sequence (Pn ). Later,
we will prove that the maximum of log |An | and 2 log |Bn | also grows as in the
statement of Theorem 5.2 (see the comments after Theorem 7.13 on p. 147.)
In Section 7.4.1, Theorem 7.15, we will relate the growth rates of log |An |
and 2 log |Bn | to each other.
The geometrical operation taking Pn to Pn+1 described above is a special case of a more general one: There is a binary operation on the set of
points (x, y) satisfying the equation y 2 = x3 − 2 that behaves like a group
law. (At this point there is no indication of an identity.) Indeed, we can define such an operation on the set of points satisfying any equation of the
form y 2 = x3 + ax2 + bx + c under the nondegeneracy condition of no repeated
zeros used in Siegel’s Theorem (Theorem 2.13 on p. 54).
Let E denote the set of points (x, y) with y 2 = x3 + ax2 + bx + c, assume
that the cubic has no repeated zeros, and define a binary operation + on the
curve E as follows. If P and Q are points on E, then the line through P and Q
meets E in exactly one further point, say (x, y). The reflection R = (x, −y)
of (x, y) in the x-axis is then defined to be P + Q (see Figure 5.1). The
case P = Q requires a notion of tangency (which can be defined for curves
over any field, using order of vanishing), and then 2P is obtained by reflecting
the unique other point of intersection of the line tangent to the curve at P in
the x-axis. The tangent is well-defined by the nondegeneracy condition.
Exercise 5.2. Draw the curve y 2 = x2 (x + 1). Show that the tangent at (1, 0)
is not well-defined.
Theorem 5.3. The set E with binary operation + forms an Abelian group
after adding one point “at infinity.”
A natural question is to ask what the identity of the group is, and this
will be fully resolved – and the theorem proved – in the next chapter. At this
stage, we have to confess that the identity element does not appear to exist
– it is the point ‘at infinity’. For now, we can think of this as a formal single
point added to the plane with the property that it lies on any vertical line of
the form x = constant. We will give more justification for this claim and will
return to the question when we have described a fascinating class of functions
that lie behind the theory of elliptic curves – see Chapter 6. The identity
element is the point at infinity, which in Figure 5.1 may be thought of as being
reached by moving infinitely far up (or down) the y-axis. Additive inverses
are given by reflection in the x-axis, so if P = (x, y) then −P = (x, −y). In
Figure 5.2, the point P = (0, 1) on the curve y 2 = x3 − 3x + 1 is shown, with
a sequence of points approaching −P = (0, −1) shown being added to P ; the
third point of intersection is approaching 0, the point at infinity. Take care
not to confuse 0, the point at infinity, with the origin (0, 0).
Exercise 5.3. Draw a picture of the (x, y) plane with a unit sphere whose
South pole is tangent to the plane at (0, 0). Define a map from the plane
5.1 Rational Points
97
4
2
−2
1
2
3
−2
−4
Figure 5.1. The binary operation on y 2 = x3 +1, showing (2, 3)+(0, 1)+(−1, 0) = 0.
to the sphere by sending a point P on the plane to the unique point on the
sphere that is collinear with P and the North pole. Show that the closure of
the image of a curve y 2 = x3 + ax2 + bx + c in the sphere contains the North
pole. This single point may be thought of as giving a single “point at infinity”
on the curve.
A more subtle question is how to verify the associative law for the binary
operation. This is so familiar in ordinary addition that we are prone to overlook it. When it is encountered in matrix multiplication, it follows from the
associative law in the underlying ring. Here a different principle is at work:
Although it is still true that the law is inherited from the complex numbers, it
is so via a bijection involving transcendental functions. In the twentieth century, algebraic geometers sought to understand this phenomenon in a more
abstract way. The subject of Abelian varieties is a deep and powerful one
98
5 Elliptic Curves
4
2
P
−2
1
2
3
4
5
−P
−2
−4
Figure 5.2. Points converging to −P = (0, −1) showing the point at infinity.
about geometric objects, defined over arbitrary fields, with an Abelian group
structure.
Exercise 5.4. Convince yourself that the associative law holds for an elliptic
curve with the geometrical binary operation. In other words, choose a specific
elliptic curve and plot it accurately. Choose three arbitrary points P , Q and R.
Now demonstrate geometrically that the point you get by adding R to P + Q
is the same as the one you get by adding P to Q + R.
5.2 The Congruent Number Problem
In this section, we introduce a problem from antiquity that was recently reinterpreted using the theory of elliptic curves. A natural number-theoretic
question arises with the familiar (3, 4, 5) triangle in Figure 5.3. This triangle
5.2 The Congruent Number Problem
99
– we may think of it as being defined by the triple of integers (3, 4, 5) – has
integral sides and integral area: What other triples of integers, or rationals,
have this property?
5 area=6
3
4
Figure 5.3. Six is a congruent number.
Example 5.4. There is a right-angled triangle with rational sides and area 5.
The triple (1 12 , 6 32 , 6 56 ) is Pythagorean: Expressing these as fractions over 6
and checking that
92 + 402 = 412
confirms the triangle with the sides given is right-angled. The area of the
triangle is easily computed to be 5.
We will see later that there are arbitrarily complicated examples of this
sort.
Example 5.5. The triple
2017680 1437599 2094350404801
,
,
1437599 168140 241717895860
gives a right-angled triangle with rational sides and area 6.
Examples 5.4 and 5.5 give examples of integer right-angled triangles with
integral area by clearing fractions, but it is simpler to allow the sides to be
rational, giving Definition 5.6.
Exercise 5.5. Find a rational right-angled triangle with area 7.
Such a triangle was known to Arab mathematicians of the twelfth century
and rediscovered by Euler in the eighteenth century.
Definition 5.6. An integer that is the area of a right-angled triangle with
rational sides is called a congruent number.
If an integer n is a congruent number and it is divisible by a square then
the sides of any triangle showing that n is congruent can be scaled accordingly.
Therefore we will assume without comment that it is sufficient to assume n is
square-free in any discussion about whether it is a congruent number or not.
100
5 Elliptic Curves
For millennia it has remained an unsolved problem to find an algorithm
for checking whether a given integer is a congruent number. In recent times,
Tunnell has shown how such an algorithm can be devised – see p. 243. What
is remarkable about his work is the fact that although the proof uses a great
deal of sophisticated twentieth-century mathematics, the way into the proof
is a back-of-an-envelope piece of high-school algebra that goes as follows. If n
is a congruent number, then there is a triple of rational numbers (X, Y, Z)
with
X 2 + Y 2 = Z 2 and 12 XY = n.
These two equations give two further equations,
(X ± Y )2 = X 2 ± 2XY + Y 2 = Z 2 ± 4n,
which can be written
2 2
X ±Y
Z
=
± n.
2
2
Multiplying the two equations (given by the choice of sign) together gives
2
2 4
X −Y2
Z
=
− n2.
4
2
Writing v = (X 2 − Y 2 )/4 and u = Z/2, we obtain
v 2 = u 4 − n2 .
Now multiply by u2 to obtain
(uv)2 = u6 − n2 u2 .
Finally, writing x = u2 and y = uv, we obtain a rational point (x, y) on the
elliptic curve
y 2 = x3 − n2 x,
so a congruent number n gives rise to a rational point on an elliptic curve
associated with n.
Example 5.7. If we start with the (3, 4, 5) triangle, then following the steps
35
2
3
just given, we obtain the rational point ( 25
4 , − 8 ) on the curve y = x − 36x
(following the convention that X = 4 should be the even side).
The curve y 2 = x3 − 36x has several integral points. In addition to (0, 0)
and (±6, 0), there is another, namely (−3, 9). One might wonder if these also
come from right-angled triangles. The answer is no, and Tunnell’s Theorem
(Theorem 5.8) suggests why not.
Notice that in the construction above, the x-coordinate we obtained turned
out to be the square of a rational. Moreover, the denominator of x must be
even. To see this, remember Theorem 2.1, which determines the Pythagorean
triples. On clearing the denominators in the triple (X, Y, Z), one of X or Y
must have an even numerator and Z cannot. Thus the denominator 2 in u =
Z/2 cannot cancel.
5.2 The Congruent Number Problem
101
Theorem 5.8. Suppose n is a positive integer and (x, y) denotes a rational
point on the elliptic curve y 2 = x3 − n2 x with x equal to the square of a
rational with an even denominator. Then n is a congruent number.
Proof. The proof uses the characterization of Pythagorean triples from Theorem 2.1. Initially we retrace some of the steps√used earlier, but there is an
ingenious twist at the end of the proof. Let u = x > 0; by assumption u ∈ Q.
Write v = y/u so
v 2 = y 2 /u2 = x(x2 − n2 )/x = x2 − n2 .
We therefore have a Pythagorean equation,
v 2 + n2 = x2 .
(5.1)
Unfortunately, the resulting triangle does not have area n. Let t denote the
denominator of v; then t is the denominator of x, by Equation (5.1). Now
clear the denominators to obtain a Pythagorean triple (t2 v, t2 n, t2 x). Since t
is even, we can write, for integers a > b > 0,
t2 n = 2ab, t2 v = a2 − b2 , t2 x = a2 + b2 .
We claim there is a right-angled triangle with sides 2a/t, 2b/t, and 2u. This
is easy to see:
2 2
2a
2b
+
= 4(a2 + b2 )/t2 = 4t2 x/t2 = 4x = (2u)2.
t
t
The area of this triangle is
1 2a 2b
2ab
= 2 = n.
2 t t
t
Of course, this theorem does not solve the congruent number problem:
What makes us think we know any more about the rational points on an
elliptic curve than we do about congruent numbers? In fact, a great deal of
research about rational points on elliptic curves took place in the twentieth
century, so reducing a problem to finding rational points on elliptic curves
allows many deep results to be applied. Even without invoking any of that,
we already learn something quite surprising from Theorem 5.8.
Exercise 5.6. Let P be a rational point on the elliptic curve
y 2 = x3 − n2 x
which is neither (0, 0) nor (±n, 0). Using the algebraic doubling formula we
used before, show that the x-coordinate of the resulting point is the square
of a rational with an even denominator. Thus, if we can keep doing this, we
obtain (potentially) infinitely many different rational right-angled triangles
with area n. This was certainly not obvious from the definition of a congruent
number.
102
5 Elliptic Curves
The construction in Theorem 5.8 looked a little unwieldy. The next result
is a neater formulation.
Theorem 5.9. Suppose n is a positive integer and x ∈ Q has the property
that x, x + n, x − n are all rational squares. Put
√
√
√
√
√
X = x + n − x − n, Y = x + n + x − n, Z = 2 x.
Then the triangle with sides X, Y , and Z is a rational right-angled triangle
with area n.
Exercise 5.7. Confirm the statements in Theorem 5.9.
The shape of the equation defining the elliptic curve
y 2 = x(x + n)(x − n)
might lead you to think the conditions of Theorem 5.9 must always be satisfied
for a rational point (x, y). The point (−3, 9) is a counterexample however.
Subsequently (see Section 7.3) we will come to understand when the conditions
hold in terms of the group-theoretic structure of the curve.
35
Exercise 5.8. Take n = 6 and P = 25
4 , 8 . Find the rational right-angled
triangle of area 6 corresponding to 2P . Find the triangle corresponding to 4P .
Exercise 5.9. Find a rational point P other than (0, 0) or (±5, 0) on the
curve y 2 = x3 − 25x. Use P to find a rational right-angled triangle of area 5
different from Example 5.4.
The hard part of all this is to understand when rational points of the right
kind exist in the first place. It is somewhat easier to show that as long as a
rational point is not (0, 0) or (±n, 0), then one can go on constructing others
with the right properties to guarantee the existence of many rational rightangled triangles with area n. Thus the problem comes down to finding for
which n are there any nontrivial rational points. A satisfactory resolution of
this problem has recently been given – see p. 243, where Tunnell’s Theorem
is stated. We will go on now to relate the geometric construction given before
to the existence of a group structure on the curve.
Exercise 5.10. The geometric addition on elliptic curves allows us to construct new rational right-angled triangles from existing ones. In this exercise,
the same construction is carried out directly on the triangle. Let (x, y, z) be
a Pythagorean triple with x < y < z. This construction will find another
Pythagorean triple (X, Y, Z) with
y 2 − x2
;
2z
2xyz
;
Y = 2
y − x2
x4 + y 4 + 6x2 y 2
Z=
.
2z(y 2 − x2 )
X=
5.2 The Congruent Number Problem
103
Let Px , Py , Pz denote the vertices of the triangle, opposite the sides x, y and z
respectively. Draw a circle with center Pz and radius x, and let Q be the point
on this circle where a tangential line from Px meets the circle (see Figure 5.4).
Extend the line Px Py to a point R at a distance 2z from Px . Now draw a circle
with center Px through Q, and call S the point of intersection between the
circle and the line Px Py . Finally, draw a line through S parallel to QR, and
let T be the intersection of this line with Px Q.
R
z
Py
S
z
Px
x
Pz
y
X
T
Q
Figure 5.4. Constructing a new Pythagorean triple.
Prove that the distance from Px to T is X =
continue the construction to find the length Y .
y 2 −x2
2z ,
and show how to
35
Example 5.10. Consider the curve y 2 = x3 − 36x and the point P = ( 25
4 , 8 )
on the curve. Then P is a rational point of infinite order, and we compute
that
1726556399
2P = 1442401
19600 , − 2744000
and
4P =
4386303618090112563849601
233710164715943220558400
, − 870369109085580828275935650626254401
11298385812463619737216684496448000 .
104
5 Elliptic Curves
The elliptic curve y 2 = x3 − 36x allows other rational right triangles with
area 6 to be computed: Using the points above, one finds the right-angled
triangles with sides
120 7 1201
, ,
7 10 70
and
2017680 1437599 2094350404801
,
,
,
1437599 168140 241717895860
each of which has area 6 (see Exercise 5.8 on p. 102).
The arithmetic complexity of the rational points in Example 5.10 seems
to grow enormously, just as we saw in Example 5.1 on p. 95, and we want to
quantify this growth in complexity. As we saw stated in Theorem 5.2, there is
a quadratic-exponential growth in the size of the denominators. To make this
more precise, we will use a naı̈ve notion of “height” on elliptic curves over the
rationals, which allows us to measure how rational points grow in complexity
under maps such as P → 2P . This notion of height was introduced by Mordell
with the specific aim of proving the following theorem, which was conjectured
by Poincaré. The proof will exercise us considerably in Chapter 7.
For an elliptic curve E defined over the rationals, denote by E(Q) the set of
points on E with rational coordinates, together with the point ‘at infinity’. The
geometrical addition law makes E(Q) into a group, and Mordell’s Theorem
says something about the structure of this group.
Theorem 5.11. [Mordell’s Theorem] Let E denote an elliptic curve defined over Q. Then E(Q) is a finitely generated Abelian group.
A complete proof of this theorem may be found in the references at the end
of the chapter. In Section 7.2 we will show how it follows from the so-called
weak Mordell Theorem, and then in Section 7.3 will prove the weak Mordell
Theorem in a special case.
Later developments have placed this result in a more general context. Algebraic curves have an integer parameter called the genus, which measures the
topological complexity of the underlying complex space. For an elliptic curve,
the fundamental domain (this will be defined in Chapter 6 on p. 122) can be
wrapped up into a torus (or doughnut) that is topologically a sphere with one
handle. Roughly speaking, the genus counts the complexity in this topological sense when the underlying field of definition is the complex numbers. One
of the great challenges facing mathematicians during the last century was to
give a properly precise definition of genus when the base field is arbitrary.
Remarkably, the genus of a curve seems to govern how many rational points
it will have. Elliptic curves have genus one, giving a finitely generated group
of rational points. Curves of genus greater than one have only finitely many
rational points by a deep result of Faltings.
Theorem 5.11 means that E(Q) is isomorphic to Zr × F for some r ∈ N
and finite group F . The number r is called the rank of the curve, and it is
5.3 Explicit Formulas
105
conjectured that for any r ∈ N there is a curve defined over the rationals with
rank r. The possibilities for the finite group F are more constrained – we will
describe some of this in Section 5.4.
5.3 Explicit Formulas
In this section we will turn the geometric notion of addition on an elliptic curve
into an algebraic formulation that allows computations to be made. We are
going to work with a special form of cubic equation throughout this section.
Subsequently, we will explain how the different forms of equation relate to
each other. As a warm-up, we recommend the following exercise.
Exercise 5.11. Let p(x) = x3 + ax2 + bx + c = (x − λ1 )(x − λ2 )(x − λ3 ). Find
expressions in a, b and c for λ1 λ2 λ3 , λ1 λ2 + λ1 λ3 + λ2 λ3 , and λ1 + λ2 + λ3 .
Given points P1 = (x1 , y1 ) and P2 = (x2 , y2 ) on the elliptic curve
y 2 = x3 + ax + b,
explicit formulas may be found for x3 and y3 , where P1 + P2 = (x3 , y3 ).
Case I: If x1 = x2 , then the line joining P1 to P2 has equation
y2 − y1
y − y1
=
,
x − x1
x2 − x1
so
y=
y2 − y1
x2 y1 − x1 y2
x+
.
x2 − x1
x2 − x1
!"
#
!"
#
α
β
Substituting this into the equation
y 2 = x3 + ax + b
for the curve gives
(αx + β)2 = x3 + ax + b,
whose roots are the x-coordinates x1 , x2 , x3 of the three points of intersection
with the curve. By the sum of roots formula in Exercise 5.11, we must have
x1 + x2 + x3 = α2 ,
so
x3 = α − x1 − x2 =
2
Reflecting in the x-axis gives P3 , so
y2 − y 1
x2 − x1
2
− x1 − x2 .
106
5 Elliptic Curves
y2 − y 1
x2 y1 − x1 y2
x3 −
y3 = −αx3 − β = −
x2 − x1
x2 − x1
or
y3 = α(x1 − x3 ) − y1 .
Case II: Assume that x1 = x2 and y1 = y2 . Let y = αx + β be the equation of the tangent to the curve at (x1 , y1 ). By implicit differentiation of the
equation y 2 = x3 + ax + b, we obtain
α=
and hence
2
2
3x1 + a
x3 =
− 2x1 ,
2y1
3x21 + a
,
2y1
y3 =
3x21 + a
2y1
3x21 + a
2y1
2
− 3x1 + y1 ,
so y3 = α(x1 − x3 ) − y1 .
Case III: If x1 = x2 and y1 = −y2 , then P1 = −P2 so P3 is the point at
infinity.
Notice that all the formulas are rational functions (quotients of polynomials) with coefficients in the same field as a and b. This suggests there is
a closure property as follows. Let L denote any field over which the curve is
defined, and write E(L) for the set of points with coefficients in L together
with the point at infinity. Then P1 , P2 ∈ E(L) implies that P1 + P2 ∈ E(L).
Thus the group operation is well-defined on E(L). Actually, some care needs
to be taken if the characteristic of L is 2 or 3, starting with a different form
of equation. We will discuss this further in Section 5.3.2.
5.3.1 Torsion Points
Later, we will give a more precise explanation of the identity element for the
group operation. For the moment, we continue to think of the identity as the
point at infinity, so an equation such as 2P = 0 on the curve E means that a
vertical line is a tangent to E at the point P . This allows us to speak of torsion
points on an elliptic curve E: P is a point of order dividing n if nP = 0 in
this geometrical sense. As we will see, the geometrical definition really gives a
group structure to the points on the elliptic curve, and thus the usual terms
from group theory such as “torsion” and “order” can be applied.
Example 5.12. Consider the curve E : y 2 = x3 + 1, and let P = (2, 3). Using
the formulas, we find
2P = (0, 1), 3P = 2P + P = (−1, 0), 4P = 3P + P = (0, −1) = −2P.
It follows that 6P = 0 (so P is a torsion point with respect to the group
structure on the curve), and since P, 2P, 3P = 0, the point P has order 6 (see
Figure 5.5).
5.3 Explicit Formulas
4
P
2
2P
3P
−2
1
2
3
4P
−2
−4
Figure 5.5. The point P = (2, 3) has order 6 on y 2 = x3 + 1.
Exercise 5.12. Find the order of the point (3, 8) on the elliptic curve
y 2 = x3 − 43x + 166.
Exercise 5.13. Find the order of the point (0, 16) on the elliptic curve
y 2 = x3 + 256.
Exercise 5.14. Find the order of the point ( 12 , 12 ) on the elliptic curve
y 2 = x3 + 14 x.
Exercise 5.15. Find the order of the point (− 13 , 12 ) on the elliptic curve
y 2 = x3 − 13 x +
19
108 .
107
108
5 Elliptic Curves
Exercise 5.16. Suppose that K denotes any field with characteristic not
equal to 2 or 3, and E : y 2 = x3 + ax + b (a, b ∈ K). Assuming the binary operation defined before makes E(K) into a group, prove that P = (x, y)
has order 2 if and only if y = 0.
Exercise 5.17. Suppose 1 n ∈ N and consider the elliptic curve
E : y 2 = x3 − n2 x.
Prove that there are only two real points of order 3 in E(R). Mark these points
on a graph of the curve. (They are points of inflexion.) It may be interesting
to look at Example 7.2 on p. 134.
Exercise 5.18. Use your graph from Exercise 5.17 to show that the subgroup
of real points on y 2 = x3 − n2 x with order dividing 4 is isomorphic to C2 × C4 .
Exercise 7.5 on p. 138 describes a useful result that allows all the rational torsion points on an integral elliptic curve to be effectively determined.
When K = Q, there are not many possibilities for the orders of torsion points
in E(Q). For example, in Section 5.4 we show that there are no points of
order 11 on any elliptic curve defined over the rationals (assuming a difficult
but, in principle, elementary result from Diophantine equations). On the other
hand, the complex torsion points on an elliptic curve are easy to describe once
we have the necessary function theory – see Section 6.3.
5.3.2 The Equation Defining an Elliptic Curve
At several points we have used equations of differing shapes to define an elliptic
curve. In the statement of Siegel’s Theorem (Theorem 2.13) we set y 2 equal to
a cubic in x with no repeated zeros. The addition formulas in the last section
were computed using a special type of cubic. It is fair to ask just what is the
correct definition in general. In Chapter 6 we will see that a pair of complex
functions parametrize a curve of the shape y 2 = x3 + ax + b in which the
right-hand side has no repeated zeros. Because of his important work in the
area, this equation became known as a Weierstrass equation or Weierstrass
model. We will see that the geometric definition of addition does indeed impose
a group structure on the complex solutions of that equation. However, the
explicit formulas define a group structure over any field of characteristic other
than 2 or 3 (as does the geometrical definition of the group operation when
a suitable notion of tangency is developed). We will have to ask you to take
this statement on trust, or apply the Lefschetz3 principle.
3
The Lefschetz principle says, in effect, that if an algebraic formula holds in C, then
it will hold in any field where it makes sense. Although this is a valid principle,
generally it is best used as a pointer toward phenomena that deserve to be better understood, rather in the way that algebraic geometers came to understand
elliptic curves and their generalizations.
5.3 Explicit Formulas
109
Suppose then that F denotes any field. It is possible to develop a theory of
elliptic curves from one equation, regardless of the characteristic. When using
the Weierstrass equation in characteristic 2 or 3, one cannot define tangents
adequately (look at what happens when you differentiate). Tate used a more
general equation with the following shape:
E : y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6
(5.2)
with a1 , a2 , a3 , a4 , a6 ∈ F satisfying the non-degeneracy condition that every
point on the curve has a unique tangent. This condition is equivalent to the
non-vanishing of a complicated polynomial expression. For the special case in
which Equation (5.2) takes the form y 2 = x3 + ax + b, this non-degeneracy
condition is equivalent to the cubic having no repeated zeros, and therefore
to the condition that 4a3 + 27b2 = 0 (see Exercise 2.14). The addition formulas can all be worked out for this general equation, in any characteristic.
However, the formulas are significantly more complicated and this can hinder
the development of intuition about the group law. This is why we prefer to
develop the theory for the Weierstrass equation. The Equation (5.2) became
known as a generalized Weierstrass equation, although it is becoming usual
to refer to this too as a Weierstrass equation. The reader should beware that
the modern literature on elliptic curves tends to work with the generalized
equation.
The following exercise shows how gory the associative law can be when
expressed in terms of the algebraic formula, even for the simplest form of
equation.
Exercise 5.19. Using just the Weierstrass equation y 2 = x3 +ax+b, verify the
associative law for addition on an elliptic curve using the algebraic formulas
from Section 5.3. Different formulas are required depending upon whether
the x-coordinates are equal or not. Even doing one special case of
P + (Q + R) = (P + Q) + R
is tiresome and requires a great deal of both paper and patience.
Although we do not have the space to develop the algebraic geometry
needed to properly develop a theory of elliptic curves over arbitrary fields, we
recommend doing the following exercise to get a feel for elliptic curves over a
finite field.
Exercise 5.20. Let E denote the elliptic curve y 2 = x3 − 2. Find the order
of the point (3, 5) in the group E(F7 ). What is the order of E(F7 )? Do the
same over other fields Fp for primes p. Can you detect any restrictions of
the resulting group orders? For a precise result on this theme consult Hasse’s
Theorem (Theorem 11.11 on p. 240).
In several respects the group E(F), where F denotes a finite field, can be
studied along the lines that we studied F∗ . The two groups will often exhibit
110
5 Elliptic Curves
properties that can be directly related – and this phenomenon is useful in
cryptography and coding theory. Earlier we proved that F∗ is always a cyclic
group. Therefore a natural question is to ask for the structure of E(F).
Exercise 5.21. *Let F be a finite field. Prove that E(F) is always a cyclic
group or a direct product of two cyclic groups. Find an example where the
group has two nontrivial cyclic factors.
5.4 Points of Order Eleven
The structure of the points of finite order in the group E(Q) for an elliptic
curve defined over the rationals is very constrained: A deep result of Mazur
says that the torsion subgroup of E(Q) must be isomorphic to Z/nZ for
some n, 1 n 12, n = 11, or to Z/2Z ⊕ Z/nZ, 1 n 4. Proving this
important result requires more material, but we can exhibit one nontrivial
constraint (assuming a difficult Diophantine result and using some elementary
properties of the geometry of the rational projective plane P2 (Q)). If you have
not encountered projective space, postpone this section until you have read
Section 6.2. In what follows, we use little more than the geometric definition
of addition on an elliptic curve to paint a putative rational point of order 11
into a corner where it cannot exist.
Theorem 5.13. If E is an elliptic curve defined over Q, then E(Q) has no
point of order 11.
Proof. Assume that P is a point in E(Q) with order 11. Then no three
points of S = {0, P, 3P, 4P } could lie on a straight line because if A, B, C are
collinear then A + B + C = 0 by the geometric definition of group addition.
Since P has order 11, this last equation is impossible for three distinct points
from S.
It follows that there is a nonsingular linear map on P2 (Q) sending
0 → [0, 1, 0], P → [1, 0, 0], 3P → [0, 0, 1], and 4P → [1, 1, 1].
To see this, notice first that of the four points
[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 1],
no three are collinear, by checking the various determinants. Given any four
points with homogenous coordinates v1 , v2 , v3 , v4 , the matrix
M = [av1t |bv2t |cv3t ]
will, for any a, b, c = 0, send
5.4 Points of Order Eleven
111
[1, 0, 0] → v1 ,
[0, 1, 0] → v2 ,
[0, 0, 1] → v3 , and
[1, 1, 1] → av1 + bv2 + cv3 .
The equation av1 + bv2 + cv3 = v4 has a unique solution with a, b, c all
nonzero by the non-collinearity assumption. Thus, by applying a change of
variables in P2 (Q), we may assume that 0 = [0, 1, 0], P = [1, 0, 0], 3P = [0, 0, 1],
and 4P = [1, 1, 1].
Now let 5P = [x1 , x2 , x3 ]. Then, if 1 is the line through 5P and 0, and 2
is the line through 4P and P , −5P ∈ 1 ∩ 2 . Thus
r[0, 1, 0] + s[x1 , x2 , x3 ] = t[1, 0, 0] + w[1, 1, 1],
for some r, s, t, w ∈ Q. Comparing coefficients shows that
sx1 = t + w; sx2 + r = w; sx3 = w.
If s = 0, then P = 0, which is impossible, so without loss of generality we
may put s = 1. Then r = x3 − x2 , and so
−5P = r[0, 1, 0] + s[x1 , x2 , x3 ] = [x1 , x3 , x3 ].
Similar arguments show that
−4P = [1, 0, 1],
−P = [x1 − x3 , x2 , 0],
−3P = [0, x3 − x1 + x2 , x3 − x1 ], and
2P = [x1 x3 − x21 + x1 x2 , x23 − x1 x3 + x2 x3 , x23 − x1 x3 ].
Since 11P = 0, the points 5P, 4P, 2P are collinear. Taking the determinant
of the matrix whose rows are the coefficients of these points, it follows that
x33 − x21 x2 + x21 x3 + x1 x22 − 2x1 x23 = 0.
(5.3)
We claim that the only rational solutions to Equation (5.3) are
[0, 1, 0], [1, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0].
The notes at the end of the chapter provide references where this difficult
result is proved. The point 5P must correspond to one of these possibilities.
It cannot be [0, 1, 0] because this is 0 and 5P = 0. It cannot be [1, 1, 1] because
this is 4P and 5P = 4P implies P = 0. Similarly, it cannot be [1, 0, 0] because
this is P and 5P = P implies 4P = 0. It cannot be [1, 0, 1] because this is −4P
and 9P = 0. It cannot be [1, 1, 0] because this is −P and 6P = 0.
The contradiction proves that there can be no such point P .
112
5 Elliptic Curves
5.5 Prime Values of Elliptic Divisibility Sequences
Elliptic curves generate a family of integer sequences that relate to several
interesting parts of mathematics, including graph theory and cryptography.
Suppose the elliptic curve E has a nontorsion point P ∈ E(Q). Write
x(nP ) =
An
,
Bn2
(5.4)
in lowest terms, with An and Bn in Z. An elliptic analog of the question about
Mersenne primes asks how often Bn is prime as n varies. Because Bn grows
so rapidly, this is potentially a method to find very large prime numbers.
Example 5.14. Let
E : y 2 = x3 + 26,
P = (−1, 5).
The term B29 is a prime with 286 decimal digits.
Example 5.15. Let
E : y 2 = x3 + 15,
P = (1, 4).
The term B41 is a prime with 510 decimal digits.
In some respects, this method for producing primes mirrors the situation
with sequences such as the Mersenne and Fibonacci sequences, which are
expected to produce large primes. For many years, the largest known primes
have come from the Mersenne sequence. However, numerical investigation
suggests that, for fixed E and P , the sequence (Bn ) should only contain
finitely many primes, and a non-rigorous probabilistic argument4 suggests
the number of prime terms should be uniformly bounded.
Just like the Mersenne and Fibonacci sequences, the sequence (Bn ) is a
divisibility sequence, meaning that Bm |Bn whenever m|n. A consequence of
this property, together with the rapid growth rate, is that there can only be
finitely many primes in the sequence (Bn ) if P is the multiple of another point,
or if P is a non-integral point; moreover the terms Bn for large n cannot be
prime if the index n is not itself prime. We say that a rational point is a
generator if it is not the multiple of any other rational point.
Let E and E be two elliptic curves defined over Q. An isogeny is a nonzero
homomorphism defined by rational functions on the coordinates of the points:
4
Crudely, the Prime Number Theorem (Theorem 8.1) implies that the probability
that a large integer N is prime is approximately 1/ log N . The expected
number
of prime terms Bn with n < x is (speculatively) approximately n<x 1/ log Bn .
By Theorem 5.2 this sum converges as x → ∞. It is known that the quantity h
appearing in Theorem 5.2 is uniformly bounded below by some positive constant
independent of the initial nontorsion rational point P and curve defined over the
rationals E, provided the starting equation has minimal ∆.
5.5 Prime Values of Elliptic Divisibility Sequences
113
φ : E → E.
Taking E = E , the multiplication-by-n map P → nP for n ∈ Z is an example
of an isogeny. The isogeny has an integral degree m 1, which is the degree
of the underlying rational functions that define it.
Exercise 5.22. *Prove that the degree of the isogeny P → nP is n2 .
The curves E and E are said to be m-isogenous if there is an isogeny of
degree m between them. It can be proved that the multiplication-by-n map
can be factorized as a composition of two isogenies, each of degree n.
Definition 5.16. We say the point P ∈ E(Q) is magnified if it is the image
of a rational point under an isogeny of degree m > 1.
The term was chosen because the height of a point increases under such a
map – see Chapter 7 for more details about heights. The following result of
Everest, Miller and Stephens will not be proved here.
Theorem 5.17. If P ∈ E(Q) is a magnified point, then Bn is a prime power
for only finitely many n.
Example 5.18. (1) The curve
y 2 = x3 + x2 − 4x
is 2-isogenous to the curve in Weierstrass form,
E : y 2 = x3 + x2 + 16x + 16.
The generator (−2, 2) maps to the generator P = (0, 4) on E. Thus the
sequence of denominators for P on E contains only a finite number of prime
powers.
(2) The curve
y 2 = x3 − 9x + 9
is 3-isogenous to the curve in Weierstrass form,
E : y 2 = x3 − 189x − 999.
The generator (1, 1) maps to the generator P = (−8, 1) on E. Thus the
sequence of denominators for P on E contains only a finite number of prime
powers.
Call the number of distinct prime divisors of an integer its length. The
following conjecture has arisen from work of Everest and King.
Conjecture 5.19. Given a fixed bound on the length, there are only finitely
many terms Bn with length below that bound.
114
5 Elliptic Curves
5.5.1 The curve u3 + v 3 = D
This section shows that the primality question can be answered in complete
generality for curves in homogenous form.
Theorem 5.20. Suppose E denotes a curve defined by an equation
u3 + v 3 = D
(5.5)
for some nonzero D ∈ Q. Let P denote a nontorsion Q-rational point. Write,
in lowest terms,
AP CP
P =
.
,
BP BP
Then the integers BP are prime powers for only finitely many Q-points P .
Note that the shape of the rational points is slightly different; the denominators of the x and y coordinates are not compelled to be powers. These
curves, although not in the form to which we are accustomed, are still elliptic
curves. The geometric addition used before works here and defines a group.
As we shall see, a simple transformation puts them into the more usual form.
Example 5.21. As Ramanujan famously pointed out, the taxicab equation5
x3 + y 3 = 1729,
(5.6)
has two distinct integral solutions. These give rise to points
P = (1, 12) and Q = (9, 10)
on the elliptic curve defined by Equation (5.6). The only rational points
on Equation (5.6) that seem to yield prime denominators are 2Q and P + Q
(and their inverses).
Proof of Theorem 5.20. There is a transformation between the homogenous model given by Equation (5.5) and the Weierstrass model,
y 2 = x3 − 24 33 D2 .
The transformations are given by
22 3D
22 32 D(u − v)
,
y=
,
u+v
u+v
2 2 32 D − y
22 32 D + y
,
v=
.
u=
6x
6x
x=
5
Srinivasa Ramanujan was a largely self-taught mathematical genius. According
to C. P. Snow, on one of G. H. Hardy’s visits to Ramanujan in the hospital in
Putney, Hardy said “I thought the number of my taxicab was 1729. It seemed to
me rather a dull number.” To which Ramanujan replied, “No, Hardy! It is a very
interesting number. It is the smallest number expressible as the sum of two cubes
in two different ways.”
5.5 Prime Values of Elliptic Divisibility Sequences
115
Writing x = X/Z 2 and y = Y /Z 3 , where gcd(X, Z) = gcd(Y, Z) = 1, it follows
that
22 32 DZ 3 + Y
u=
.
6XZ
If X divides the numerator of u, then X divides 26 33 D2 . By Siegel’s Theorem
(Theorem 2.13), this can only happen finitely often. Since Z is coprime to the
numerator, apart from a finite number of points, the denominator of u always
has two nontrivial coprime factors.
Exercise 5.23. Prove$
that any integer solutions to the equation u3 + v 3 = D
have max{|u|, |v|} 2
|D|
3 .
5.5.2 Higher Rank Considerations
Let E denote an elliptic curve, defined over Q. We say rational points P and Q
are independent if no integer linear combination mP + nQ can represent the
point at infinity unless m = n = 0.
Theorem 5.22. Let E denote an elliptic curve, defined over Q, and suppose
that P and Q denote independent rational points both of which are magnified
under the same isogeny. Write
x(nP + mQ) =
An,m
.
2
Bn,m
(5.7)
Then there are only finitely many pairs (m, n) for which Bn,m is prime.
This theorem will not be proved here. Examples of the phenomenon of
simultaneous magnification under the same isogeny are not easy to find: The
following example uses the generalized Weierstrass form (5.2).
Example 5.23. The elliptic curve
y 2 + xy = x3 + x2 − 156x + 2070
has independent generators P = (3, 39) and Q = (13, 43) that are magnified
under the same 2-isogeny.
Remark 5.24. Probabilistic arguments together with results from some numerical experiments suggest that, for certain curves in Weierstrass form (5.2), if P
and Q denote independent nontorsion rational points, then the denominator
of nP + mQ can be the square of a prime infinitely often. Indeed, there seem
to be asymptotically c log X such primes with |m|, |n| < X. Of course, none of
the numerical examples that are considered in these arguments use magnified
points.
116
5 Elliptic Curves
5.5.3 Elliptic Analogs of Zsigmondy’s Theorem
Zsigmondy’s Theorem (Theorem 1.16 on p. 28) has an elliptic analog.
Theorem 5.25. [Silverman] Let E denote an elliptic curve defined over Q,
in generalized Weierstrass form, and let P = (x(P ), y(P )) denote a nontorAn
sion rational point on E. Let x(nP ) = B
2 in lowest terms. Then the elliptic
n
divisibility sequence (Bn ) satisfies a Zsigmondy theorem: For all sufficiently
large n, Bn has a primitive divisor.
In view of the fact that sequences such as (Bn ) seem likely to contain only
finitely many prime terms, Theorem 5.25 takes on a more interesting status,
as a means of producing large primes from elliptic divisibility sequences.
Analogs of the precise bound in Theorem 1.15 hold for certain elliptic divisibility sequences. The next result is an explicit bound for the first appearance
of a primitive divisor in a congruent number curve.
Example 5.26. Let E denote the curve
E : y 2 = x3 − 25x
and let P = (−4, 6). Then Bn has a primitive divisor for every n > 1.
The factorizations of Bn for this example, 2 n 8, with the primitive
divisors in bold, are shown in Table 5.2.
Table 5.2. Primitive divisors of (Bn ).
n
Bn
Factorization
2
12
22 ·3
3
2257
37·61
4
1494696
23 ·3·72 ·31·41
5
8914433905
5·13·17·761·10601
6
178761481355556
22 ·32 ·11·37·61·71·587·4799
7
62419747600438859233
197·421·215153·3498052153
8 5354229862821602092291248 24 ·3·72 ·31·41·113279·3344161·4728001
There is a difference in the proof for the odd and even terms. For a sequence (Bn ), define the even Zsigmondy bound of (Bn ) to be the greatest even
integer n for which Bn does not have a primitive divisor, and similarly define
the odd Zsigmondy bound of (Bn ) to be the greatest odd integer for which Bn
does not have a primitive divisor.
Theorem 5.27. Let E denote the elliptic curve
E : y 2 = x3 − T 2 x,
5.6 Ramanujan Numbers and the Taxicab Problem
117
where T 1 is a square-free integer. Let P ∈ E(Q) denote a nontorsion
point and write Bn2 for the denominator of x(nP ). Then the even Zsigmondy
bound of the sequence (Bn ) is not greater than 18. If x(P ) < 0, then the odd
Zsigmondy bound of (Bn ) is not greater than 3. If x(P ) is a square, then the
odd Zsigmondy bound is not greater than 21.
In specific cases, the terms not covered by Theorem 5.27 can be checked on
a computer; this is how Example 5.26 was computed. Theorem 5.27 will not
be proved here, but the main idea is contained in the following exercise. The
condition stated there for the absence of a primitive divisor is very similar to
that found for the Mersenne numbers in Exercise 1.16(b) on p. 28.
Exercise 5.24. It can be shown that if Bn does not have a primitive divisor
then
Bn n
Bn/p .
p|n
Assuming this, use Theorem 5.2 to deduce that n must be bounded.
5.6 Ramanujan Numbers and the Taxicab Problem
In view of Example 5.21 and the story concerning Ramanujan, integers N for
which the Diophantine equation
N = x3 + y 3
has two nontrivially distinct solutions are sometimes called Ramanujan numbers. Table 5.3 shows the first few of these; there are infinitely many such
numbers. In the table u3 + v 3 = x3 + y 3 .
Table 5.3. The first few Ramanujan numbers.
N
1729
4104
13832
20683
32832
u
1
2
18
10
18
v
12
16
20
27
30
x
9
9
2
19
4
y
10
15
24
24
32
Indeed, it turns out that for any k there are infinitely many numbers N
with the property that N can be expressed as a nontrivial sum of two cubes
in k essentially different ways. The smallest number T (k) with this property is
called the kth taxicab number or Hardy–Ramanujan number. Table 5.4 shows
the known taxicab numbers with the pairs whose cubes sum to the number,
and the discoverer.
118
5 Elliptic Curves
Table 5.4. The first few taxicab numbers.
k
1
2
3
4
5
T (k)
2
Pairs
1, 1
1, 12
1729
9, 10
167, 436
228, 423
87539319
255, 414
2421, 19083
5436, 18948
6963472309248
10200, 18072
13322, 16630
38787, 365757
107839, 362753
48988659276962496 205292, 342952
221424, 336588
231518, 331954
Discoverer
de Bessy (1657)
Leech (1957)
Rosenstiel et al. (1991)
Wilson (1997)
It is suspected that
T (6) = 24153319581254312065344.
Notes to Chapter 5: The footnote about Bachet’s equation on p. 93 is taken from
the book of Silverman and Tate [143]. A very thorough treatment of all aspects of
elliptic curves is given in Silverman’s books [139], [142], and aspects of elliptic curves
close to the topics in number theory we study are in Koblitz’s book [89]. These books
are highly recommended to any reader interested in learning more about elliptic
curves. The construction in Exercise 5.10 on p. 102 was shown to us by Bartholdi,
and we thank him for permission to include it here. The congruent number problem
and its connection to elliptic curves are described in detail in Koblitz’s book [89].
Mordell’s Theorem appears first in his paper [110]; the paper of Poincaré mentioned
is [116]. An attractive historical account of Mordell’s theorem may be found in the
paper of Cassells [26]. Faltings’ Theorem on higher-genus curves appears in his papers [61] and [62]. An account of some of the background needed for this proof
appears in the conference proceedings [34] edited by Cornell and Silverman. There
are expositions of Faltings’ proof by Deligne [41] and Szpiro [149]. The claim about
the integral solutions to Equation (5.3) may be found in several places, including
a paper [14] by Billing and Mahler; the presentation in Section 5.4 comes from a
course taught by Silverberg at Ohio State University. Mazur’s Theorem appeared
first in his paper [105]; a treatment may also be found in Silverman’s book [139].
Elliptic divisibility sequences are discussed in the monograph [58, Chapter 10] by
Everest, van der Poorten, Shparlinski and Ward. The incidence of primes in these
sequences has been studied by Chudnovsky and Chudnovsky [30] (Example 5.14 is
taken from that paper), Einsiedler, Everest and Ward [48] and Rogers [131]. Theorem 5.17 appears in the paper [56] of Everest, Miller and Stephens; Example 5.18
5.6 Ramanujan Numbers and the Taxicab Problem
119
comes from Cremona’s Web site [37]. More on Conjecture 5.19 may be found in a
paper of Everest and King [54]. More on Remark 5.24 may be found in a paper of
Everest, Rogers and Ward [57] or Rogers’ thesis [131]. Exercise 5.23 is taken from the
book [143, p. 149] by Silverman and Tate. Theorem 5.25 is proved in Silverman’s
paper [140]; Example 5.26 and Theorem 5.27 are taken from a paper of Everest,
McLaren and Ward [55]. References for the taxicab numbers in Table 5.4 may be
found in Sloane’s on-line encyclopedia of integer sequences [144]; there is an elementary account of the connection between T (2) and elliptic curves in an accessible
paper by Silverman [141], and the calculation of T (5) is described in an article by
Wilson [163].
6
Elliptic Functions
Elliptic curves can be viewed from many different mathematical perspectives.
In the last chapter, they were seen as primarily geometrical objects; in this
chapter, we start by emphasizing their relationship with some classical transcendental functions from complex analysis. To motivate the material in this
chapter, recall that the trigonometric functions sine and cosine parametrize
the points on the circle S1 . The rational points on the circle in turn parametrize
Pythagorean triples. This gives a triangle of ideas involving the circle: in one
corner are the classical transcendental functions, in another a compact group,
and in the third a connection to a Diophantine problem. In the last chapter,
we saw two corners of an analogous triangle involving elliptic curves. Rational points on elliptic curves give solutions to various Diophantine problems.
Our next goal is to fill out the third corner of the elliptic triangle by finding
transcendental functions that parametrize the points on elliptic curves. An
important by-product of our work will be the justification that the operation
defined by geometry in Chapter 5 really satisfies the axioms for a group. (See
Theorem 6.5 and the comments just after.)
6.1 Elliptic Functions
Let L ⊆ C denote a lattice in the complex plane. This means L is the set
of integer linear combinations of two complex numbers w1 and w2 that are
linearly independent over R. Write ω1 , ω2 for the lattice ω1 Z + ω2 Z ⊆ C.
More generally, a lattice in Rn is any subgroup isomorphic to Zn ; a lattice
in C coincides with this definition by viewing C as R2 .
One of the ways lattices of different dimensions arise naturally is in the
study of periodic functions. The best-known example is the exponential function
e : R → S1 = {z ∈ C | |z| = 1}
x → eix
122
6 Elliptic Functions
This is a periodic function because it satisfies e(x + 2π) = e(x) for all x ∈ R,
so e is periodic with respect to the one-dimensional lattice 2πZ ⊆ R.
We are interested in complex functions f with the doubly-periodic property
that
f (z + ω1 ) = f (z + ω2 ) = f (z),
that is, functions that are periodic with respect to L or L-periodic.
ω1
ω2
Figure 6.1. The lattice L spanned by ω1 and ω2 in C.
The lattice L is represented as a discrete subset of C in Figure 6.1: The
points of L are the points where the dashed lines intersect. The shaded region
Π = {r1 ω1 + r2 ω2 | 0 r1 , r2 < 1}
is a fundamental domain for the quotient C/L in the sense that each coset of L
has exactly one representative in Π. The L-periodic function analogous to the
exponential function that we will study is called the Weierstrass ℘-function
corresponding to L. For any z ∈
/ L, this is defined to be
1
1
1
.
(6.1)
−
℘L (z) = 2 +
z
(z − )2
2
0=∈L
6.1 Elliptic Functions
123
The elements of L have to be enumerated in some way in order to define the
sum. For the moment,suppose some enumeration
L\{0} = {1 , 2 , . . . } has
∞
been fixed and define 0=∈L f () to be n=1 f (n ). We will first prove that
the series in Equation (6.1) converges absolutely. It follows that the order in
which the enumeration takes place does not affect the value of the sum.
Lemma 6.1. The series
1
1
1
℘L (z) = 2 +
− 2
z
(z − )2
0=∈L
is absolutely convergent for any z ∈
/ L. The series defines a meromorphic
function whose only singularities are double poles at each lattice point in L.
Proof. Let z be any point not in L. Write
1
1
1 2z − z 2 /
−
=
.
.
(z − )2
2
3 (z/ − 1)2
Since |z/ − 1| is bounded below by a positive constant, there is a constant C1
depending on z such that
1 C1
1
−
.
≤
(z − )2
2 ||3
Therefore, it is enough to prove that the series 0=∈L ||−3 converges. To
see this, notice first that there is a constant C > 0 with the property that
|mω1 + nω2 | 1
max{|m|, |n|}.
C
Exercise 6.1. Prove that there are 8k integer pairs (m, n) with max{|m|, |n|}
equal to k. (See Figure 6.2, which suggests an inductive proof.)
It follows that
0=∈L
||−3 =
(m,n)=(0,0)
C3 ·
1
|mω1 + nω2 |3
(m,n)=(0,0)
= C3 ·
∞
8k
k=1
k3
1
max{|m|, |n|}3
= 8C 3
∞
1
,
k2
k=1
which converges. We have shown that the series defining ℘L (z) converges
absolutely for z ∈ C\L.
Finally, it is clear that the only pole of ℘L in Π is a double pole at 0 since
124
6 Elliptic Functions
r
r
r
r
r r
6
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r -
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
r
Figure 6.2. There are 8k integer pairs (m, n) with max{|m|, |n|} = k.
℘L (z) −
1
1
1
=
−
z2
(z − )2
2
0=∈L
converges absolutely in Π. Similarly, for any ∈ L,
1
1
1
1
+ 2
=
−
℘L (z) −
2
2
(z − )2
(z
−
)
z
=∈L;
=0
converges absolutely in Π + for the same reason, showing that the only pole
of ℘L in Π + is a double pole at .
The absolute convergence of ℘L (z) means that Equation (6.1) can be differentiated term by term (see Exercise 6.2 below) to give
℘L (z) = −2
∈L
1
,
(z − )3
(6.2)
which also converges absolutely. It is clear that ℘L (z) is periodic with respect
to L since if 0 ∈ L
℘L (z + 0 ) = −2
∈L
1
1
= −2
3
(z + 0 − )
(z − )3
∈L
is just a rearrangement of the terms.
Our ultimate goal is to prove that ℘L is periodic with respect to L. Periodicity of ℘L does not itself imply this, of course, but a simple argument does
allow us to deduce it.
6.1 Elliptic Functions
125
∞
n
Exercise 6.2. (a) Let f (z) =
be a complex power series with
n=0 cn z
radius of convergence R > 0. Prove that f is differentiable (see
∞Definition 8.18
on p. 170) on the set {z ∈ C | |z| < R} and that f (z) = n=1 ncn z n−1 on
this set.
(b) Show how to use this to justify the expression Equation (6.2) by using
the absolute convergence of the series defining ℘L to show that it may be
expanded as a power series.
What we have done up to now might seem clumsy: Given a series whose
terms are clearly differentiable, the most natural way to show it is differentiable is surely to differentiate term by term. This is a reasonable criticism,
however it involves a more subtle notion of convergence called uniform convergence (see Section 8.5). Term-by-term differentiability is easily provable
for power series, whose terms are simply monomials, but can be much trickier
when the terms are more complicated functions. This alternative approach
to the analyticity of ℘L is given in Exercise 8.20 on p. 173, using the concept of uniform convergence, once we have had time to introduce the concept
properly.
Lemma 6.2. The Weierstrass ℘-function ℘L is periodic with respect to L.
Proof. We want to prove that
℘L (z + ω1 ) = ℘L (z + ω2 ) = ℘L (z)
for all z ∈
/ L. First, notice that by Equation (6.1) and Equation (6.2),
℘L (−z) = ℘L (z) and ℘L (−z) = −℘L (z).
That is, ℘L (z) is an even function and ℘L (z) is an odd function. Now fix i to
be 1 or 2 and let
f (z) = ℘L (z + ωi ) − ℘L (z).
Then f is differentiable for all z ∈
/ L. Since ℘L (z) is periodic with respect
to L, we deduce that f (z) = 0 for all z ∈ C\L, so f is constant on this open
connected set.
To determine the constant value of f let z = −ωi /2. Then
f (−wi /2) = ℘L (ωi /2) − ℘L (−ωi /2),
which shows that f (−ωi /2) = 0 since ℘L is an even function. It follows that f
must be zero everywhere, showing that ℘L is periodic with respect to L. Definition 6.3. An elliptic function is a meromorphic function C → C that
is periodic with respect to a lattice L. If L = Zω1 + Zω2 , then ω1 and ω2 are
known as periods. With respect to a chosen basis {ω1 , ω2 }, the domain
Π = {r1 ω1 + r2 ω2 | 0 r1 , r2 < 1}
for the lattice L is the fundamental domain.
126
6 Elliptic Functions
Lemma 6.4. An elliptic function with no poles in its fundamental domain is
constant. Let Πβ = β + Π be the fundamental domain translated by β ∈ C,
and let f denote an elliptic function with no zeros or poles on the boundary
of Πβ
. If the zeros of f in Πβ have orders mi and the poles have orders nj ,
then
mi = nj .
Proof. The first statement is clear: Any such function would be bounded
on Π, and therefore on all of C, by periodicity, so it is a bounded entire
function and therefore must be constant by Liouville’s Theorem.
For the second statement, first notice that
f (z) dz = 0
Πβ
since f has the same values on opposite sides of Πβ , while dz changes sign. The
result now follows by applying this to the elliptic function g(z) = f (z)/f (z).
Near a zero z0 of order m for f , g has a simple pole with residue m (that
m
near z0 ). Near a pole z0 of order n for f , g has a
is, g(z) behaves like z−z
0
n
).
simple pole with residue −n (that is, g(z) behaves like − z−z
0
Cauchy’s Residue Theorem gives the result.
6.2 Parametrizing an Elliptic Curve
Lemma 6.4 will be used to prove the main result of this section: The values
of ℘L (z) and ℘L (z), for z lying in the fundamental domain, parametrize a
complex elliptic curve. Before stating this important result, we return to the
question raised at the end of Section 5.1: What is the identity element for the
binary operation on an elliptic curve?
In order to answer this, we need to come clean about elliptic curves. The
discussion in Section 5.1 concerned the set of solutions to an equation y 2 =
x3 + ax2 + bx + c in R2 ; these are just an affine part of the real points of the
curve. A complex elliptic curve is really the set of complex points in projective
space satisfying the projectivized version of the equation. The vague notion
of adding a point ‘at infinity’ can be made precise by studying elliptic curves
in this more natural setting of projective space.
Two-dimensional projective space P2 (C) is defined to be the set of equivalence classes
P2 (C) = {(z0 , z1 , z2 ) ∈ C3 | (z0 , z1 , z2 ) = (0, 0, 0)}/ ∼,
where (z0 , z1 , z2 ) ∼ (z0 , z1 , z2 ) if there is a constant λ = 0 with
(z0 , z1 , z2 ) = (λz0 , λz1 , λz2 ).
An element of P2 (C) is then an equivalence class, and we write
6.2 Parametrizing an Elliptic Curve
127
[z0 , z1 , z2 ] = {(z0 , z1 , z2 ) | (z0 , z1 , z2 ) ∼ (z0 , z1 , z2 )}
for the equivalence class containing (z0 , z1 , z2 ).
The complex elliptic curve E(C) associated with the equation
E : y 2 = x3 + ax2 + bx + c
is the subset of P2 (C) defined by
E(C) = {[z0 , z1 , z2 ] | z12 z2 = z03 + az02 z2 + bz0 z22 + cz23 }.
Notice that this curve contains two parts. If z2 = 0, then we can assume
without loss of generality that z2 = 1, so all the points [z0 , z1 , 1] with
z12 = z03 + az02 + bz0 + c
lie on E. This is the complex affine part of the curve. There is exactly one point
with z2 = 0 (if z2 = 0, then z0 = 0 so z1 must be nonzero), namely [0, 1, 0].
This point is the “point at infinity” on the curve.
We will write
E : y 2 = x3 + ax2 + bx + c
for the complex projective curve, suppressing the third variable (because it
only contributes one point to the curve). We will always assume that the righthand side has no repeated zeros. (See Exercise 2.14 for a simple formulation
of this condition in the case a = 0.)
It will be useful to talk about the K-points of an elliptic curve for other
fields K. The curve E : y 2 = x3 +ax2 +bx+c is said to be defined over a field L
if the coefficients a, b, c come from L. For any field K containing L, the Kpoints of the curve, E(K), are the points in E whose projective coordinates
can be chosen in K. Thus E(C) is the complex projective curve. The following
is a major result and most of this section will be devoted to the proof.
Theorem 6.5. Let L ⊆ C denote a lattice with fundamental domain Π.
(1) There are constants a = a(L) and b = b(L) with 4a3 + 27b2 = 0 such that,
for all z ∈ C\L,
1 2
4 ℘L (z)
= ℘L (z)3 + a℘L (z) + b.
(2) For z ∈ C/L, the map π : Π → P2 (Q) defined by π(0) = [0, 1, 0] and
π(z) = [℘L (z), 12 ℘L (z), 1],
z = 0,
defines a bijection between Π and the set of complex projective points on
the elliptic curve E : y 2 = x3 + ax + b.
(3) Suppose z1 , z2 , z3 ∈ Π have images π(zi ) = Pi , i = 1, 2, 3. Then
z 1 + z2 + z3 = 0
in Π if and only if P1 , P2 , and P3 lie on a straight line.
128
6 Elliptic Functions
The last part of the theorem is the long-awaited justification that the
operation defined on the points of an elliptic curve in Chapter 5 is a group
operation. Under the bijection
π : z → [℘L (z), 12 ℘L (z), 1],
z = 0,
the fact that the point 0 = z ∈ C/L corresponds to the point at infinity
relates the geometrical idea of infinity on the projective curve to the analytic
idea that ℘L (z) −→ ∞ as z −→ 0. This is important if we work with the
projective curve because the set of projective points forms a group with the
point at infinity as the identity. Notice that this arises simply by transporting
the group structure of C/L to the curve E. Theorem 6.5(3) says that the
familiar addition in C is related, via the transcendental functions ℘L and ℘L ,
to the geometric addition on the projective curve. This transport of structure
from the additive group C to the curve proves that the geometric binary
operation on the projective curve really does satisfy the group axioms. Now
the ‘Lefschetz principle’ (see the footnote on p. 108) shows that this result
over C extends to verify the group law for elliptic curves over arbitrary fields
in characteristic not equal to 2 or 3.
Exercise 6.3. Show that
℘L (ω1 /2) = ℘L (ω2 /2) = ℘L ((ω1 + ω2 )/2) = 0.
Show that there are no other solutions of ℘L (z) = 0 with z ∈ Π.
Exercise 6.3 identifies the 2-torsion points on the elliptic curve with reference to the lattice L. The complex torsion on an elliptic curve can easily be
described. We will take a brief interlude to apply Theorem 6.5 to the study
of the complex torsion points on an elliptic curve. The proof of Theorem 6.5
will follow in Section 6.4.
6.3 Complex Torsion
Theorem 6.5 allows the torsion points on an elliptic curve to be understood in
a way that is analogous to our understanding of torsion points on the circle:
Since e : R → S1 has kernel 2πZ, it induces an isomorphism
e : R/2πZ −→ S1 .
The distinct points of order dividing n in the additive group R/2πZ are those
of the form 2πj
n +2πZ for j = 0, 1, . . . , n−1. We deduce that the points of order
dividing n in S1 are those of the form e(2πj/n) = e2πij/n for j = 0, 1, . . . , n−1.
It is not difficult to find the points of order dividing n on S1 . Theorem 6.5
repeats the trick for the problem of finding all points of order dividing n for
the group operation on a complex elliptic curve. Given 1 n ∈ N, the points
6.4 Partial Proof of Theorem 6.5
129
z = (r1 ω1 + r2 ω2 )/n for 0 r1 , r2 n
all have nz ≡ 0 modulo L, and these are the n2 points with order dividing n
in the group C/L. These are torsion points on the complex curve. Deciding
which of these points correspond to rational torsion points on the curve is a
different and difficult question.
Exercise 6.4. Let En (C) for n ∈ N denote the subgroup of points on a complex elliptic curve E whose order divides n. Show that
En (C) ∼
= Z/nZ ⊕ Z/nZ.
6.4 Partial Proof of Theorem 6.5
We are not going to prove all of Theorem 6.5; in particular we will not prove
that the quantity 4a3 + 27b2 is not zero. A complete account may be found
in the references. What we will show is how the important equation in Theorem 6.5(1) arises.
Proof of Theorem 6.5(1). Assume first that z has |z| < || for all
nonzero ∈ L. Then the Taylor expansion about z = 0 gives
1
1
1
1
2z
3z 2
4z 3
− 2 = 2 (1 − z/)−2 − 2 = 3 + 4 + 5 + · · · .
2
(z − )
By absolute convergence of the series defining ℘L (z), we can rearrange the
terms in
2z
1
3z 2
4z 3
℘L (z) = 2 +
+ 4 + 5 + ···
z
3
0=∈L
to get
℘L (z) =
1
+ 2z
−3 + 3z 2
−4 + 4z 3
−5 + · · · .
2
z
indicates that the sum is over the nonzero lattice points ∈ L only. For
The
any n ∈ N, the terms of the form −(2n+1) as runs through the
nonzero terms
−2n−1
of L cancel out in pairs: (−)−2n−1 = −−2n−1 . It follows that
= 0,
so the Laurent expansion of ℘L (z) about z = 0 looks like
℘L (z) =
1
+ 3z 2 G4 (L) + 5z 4 G6 (L) + · · · ,
z2
where
G2n (L) =
(6.3)
−2n , 1 n ∈ N.
This expression agrees with the classical result that even meromorphic functions only have even powers in their Laurent expansion at 0.
130
6 Elliptic Functions
Consider the function
g(z) = ℘L (z)2 − 4℘L (z)3 + 60G4 (L)℘L (z) + 140G6 (L).
This function is analytic on Π, moreover g is periodic with respect to L
because it is an algebraic expression in periodic functions. Finally, it can be
checked that the Laurent expansion of g(z) contains only positive powers of z.
By Lemma 6.4, g must be a constant. Setting z = 0 shows that this constant
value must be zero, so g is the zero function and hence the equation stated in
the theorem holds (after dividing by 4).
Notice that a = −15G4 (L) and b = −35G6 (L).
Notice that Theorem 6.5(1) is a statement about all z ∈ C\L. In the proof,
we have assumed that |z| < || for all nonzero lattice points. This
means
in
2
particular that the proof is valid for all points in the region − ω1 +ω
+ Π;
2
it follows for all z ∈ C\L by periodicity.
Exercise 6.5. (a) Let L = 1, i. Show that the corresponding curve EL has
equation y 2 = x3 + ax for some a ∈ R.
(b) Let L = 1, ω, where ω denotes a cube root of unity. Show that the
corresponding elliptic curve EL has equation y 2 = x3 + b for some b ∈ R.
Proof of Theorem 6.5(2). We show that the map is a bijection, beginning
with surjectivity. Suppose α ∈ C is given. The function ℘L (z) − α has two
poles (actually one double pole) in Π so, by Lemma 6.4, it must have two
zeros. To prove injectivity (which appears to be threatened by the existence
of the two zeros) note that the two zeros are negatives of each other. This is
because, for z ∈
/ L, ℘L (−z) = ℘L (z). However, ℘L (−z) = −℘L (z). Thus, the
images of z and −z will (usually) be distinct points on the curve, the only
counterexamples arising when ℘L (z) = 0. By Exercise 6.3, this happens for
only three values of z, namely w1 /2, w2 /2, and (w1 +w2 )/2, but this is exactly
when z and −z define the same element of C/L.
Finally, we show how an argument using complex analysis gives the third
part of Theorem 6.5.
Proof of Theorem 6.5(3). Let the equation of the line containing the
points P1 and P2 be y = mz + b. Consider the function
f (z) = ℘L (z) − m℘L (z) − b.
This has three poles in Π (actually one triple pole) so, by Lemma 6.4, it has
three zeros. Two of these are z1 and z2 ; let z3 denote the third. Then P1 , P2 ,
and P3 lie on the line y = mz + b and (3) is seen by integrating the function h(z) = zf (z)/f (z) over a displaced parallelogram Πβ = β + Π, where β
is chosen so that h has no singularities on the boundary Γβ of Πβ shown in
Figure 6.3.
6.4 Partial Proof of Theorem 6.5
131
β + ω1 + ω2
β + ω1
β + ω2
β
Figure 6.3. Integrating along the four sides of Γβ .
The main part of the proof is to show that z1 + z2 + z3 ∈ L. By Cauchy’s
Residue Theorem,
1
(6.4)
h(z) dz = z1 + z2 + z3
2πi Γβ
because h has a simple pole at each zi with residue zi . Now break the integral
in Equation (6.4) into two parts corresponding to pairs of opposite sides in Γβ :
β+ω2
β+ω1
1
1
h(z) dz =
h(z) dz +
h(z) dz
2πi Γβ
2πi
β
β+ω1 +ω2
β
β+ω1 +ω2
1
+
h(z) dz +
h(z) dz
2πi
β+ω2
β+ω1
= I1 + I2 .
Substitute z = w + ω2 in the second integral of I1 , and use the periodicity
of f to obtain
β+w1
β+w1
zf (z)
(z + w2 )f (z)
1
dz −
dz
I1 =
2πi
f (z)
f (z)
β
β
ω2 β+ω1 f (z)
=
dz.
2πi β
f (z)
Now make the substitution u = f (z) to deduce that
1
ω2
I1 =
du,
2πi Ω u
where Ω is the image of the line joining β to β+ω1 in the variable u. Periodicity
with respect to L means that Ω is a closed curve, so we finally obtain
132
6 Elliptic Functions
I1 =
ω2
2πi
Ω
1
du = mω2 ∈ Zω2 ,
u
where the integer m is the winding number, counting the number of times Ω
winds around zero.
A similar argument for the two other sides of Πβ shows that I2 = nw1 for
some n ∈ Z. Thus
z1 + z2 + z3 = nw1 + mw2 ∈ L.
Exercise 6.6. (a) Prove that, for any lattice L ⊆ C,
G8 (L) = 37 G24 (L).
(b) More generally, prove that all the Gi (i 8) can be expressed as polynomials in G4 and G6 with rational coefficients.
Exercise 6.7. (a) Given any nonzero c ∈ C, consider the map L → cL = L .
Let EL and EL denote the corresponding elliptic curves. Prove that the map
defines a group isomorphism between EL (C) and EL (C).
(b) Prove that the map in (a) has the following effect upon the coordinates
of the corresponding curves. If y 2 = x3 + ax + b is the equation defining EL
and y 2 = x3 + a x + b is the equation defining EL , show that the effect of the
map in (a) is to take (x, y) to (c−2 x, c−3 y). (Hint: Recall the definition of a
and b from Theorem 6.5(1).)
Exercise 6.8. (a) Show that, for any lattice L and c ∈ C∗ ,
G4 (cL) = c−4 G4 (L) and G6 (cL) = c−6 G6 (L).
(b) Prove that any elliptic curve y 2 = x3 + ax + b with ab = 0 is parametrized
by the Weierstrass ℘-function for some lattice L.
Notes to Chapter 6: The Lefschetz principle is discussed in Silverman [139, Section VI.6]. Theorem 6.5 is also proved in [139] along with a converse result: Given a
and b with 4a3 + 27b2 = 0, there exists a lattice L such that ℘L (z) and 12 ℘L (z)
parametrize the elliptic curve with equation y 2 = x3 + ax + b. For an explanation
of the remarkable phenomenon described in Exercise 6.6(b), consult Koblitz [89]. A
classical treatment of elliptic functions from the analytic viewpoint is contained in
Whittaker and Watson [160]; there are sophisticated accounts of elliptic functions
and their role in number theory in the books of Apostol [5], Chandrasekharan [29],
Lang [95] and Weil [159].
7
Heights
In this chapter we introduce a way to measure the arithmetic complexity of
points on elliptic curves. This measurement of the height turns out to be an
essential ingredient in understanding the structure of the rational points on
an elliptic curve. Our understanding of heights will be a key ingredient in the
proof of Mordell’s Theorem in Section 7.2.
7.1 Heights on Elliptic Curves
Given a rational affine point P = ( M
N , ∗), where M and N are coprime integers,
define the naı̈ve height of P to be
max{|M |, |N |} if M
N = 0,
H(P ) =
1
if M = 0.
Write x(P ) and y(P ) for the coordinates of an affine point P = (x(P ), y(P )).
Define the logarithmic height to be h(P ) = log H(P ).
The definition of the complex projective plane P2 (C) on p. 126 extends to
higher dimensions: For any field K, projective N -space over K is defined by
PN (K) = {(x0 , . . . , xN ) | (x0 , . . . , xN ) = (0, . . . , 0)}/ ∼,
where (x0 , . . . , xN ) ∼ (x0 , . . . , xN ) if there is a constant λ ∈ K∗ with
(x0 , . . . , xN ) = λ(x0 , . . . , xN ).
As before, we write [x0 , . . . , xN ] for the equivalence class (or point in projective
space) containing the affine point (x0 , . . . , xN ).
The naı̈ve height extends to projective space PN (Q). Given a point [y]
in PN (Q), choose x = (x0 , . . . , xN ) ∈ ZN +1 in such a way that [y] = [x] and
gcd(x0 , . . . , xN ) = 1.
134
7 Heights
Then the projective height
H : PN (Q) → R
is defined by
H([x]) = max {|xi |}.
i=0,...,N
Notice that this is compatible with the naı̈ve height in the following sense:
If P = (x, y) is a point on the affine piece of E(Q), then H(P ) = H([x, 1]),
where [x, 1] ∈ P1 (Q).
The logarithmic quantity h(P ) is a simple example of a Mahler measure:
log max{|M |, |N |} = m(N x − M )
(see p. 150).
Examples 5.1 and 5.10 suggest that the number of decimal digits in the numerator and the denominator roughly quadruples each time a point is doubled.
This is a manifestation of a general phenomenon, the duplication formula.
Theorem 7.1. [Duplication Formula] Let E denote an elliptic curve defined over the rationals, and let P be a point in E(Q). Then
h(2P ) = 4h(P ) + O(1),
(7.1)
where the implied constant in O depends on E but not on the point P .
This will be proved on p. 137 after some more machinery has been developed.
In multiplicative notation, the duplication formula may be written
H(P )4 H(2P ) H(P )4 .
Example 7.2. Consider the curve E : y 2 = x3 − n2 x with 1 n ∈ N. Let P be
a rational point on E. A calculation gives
x(2P ) =
so if x(P ) =
M
N
x2 + n2
2y
2
=
(x2 + n2 )2
,
4(x3 − n2 x)
in lowest terms, then
x(2P ) =
(M 2 + n2 N 2 )2
.
4M N (M 2 − n2 N 2 )
(7.2)
It may be checked that any cancellation in Equation (7.2) is bounded: Explicitly, if d divides both numerator and denominator, then d|16n6 . Examining
the cases |M | |N | and |M | < |N | separately shows that
max{|M 2 + n2 N 2 |2 , |N 2 (M 2 − n2 N 2 )|}
7.1 Heights on Elliptic Curves
135
is commensurate1 with
max{|M |4 , |N |4 } = max{|M |, |N |}4 ,
and the duplication formula Equation (7.1) follows.
Exercise 7.1. Verify Theorem 7.1 for the curve y 2 = x3 + C, C = 0.
The duplication formula is a special case of a general principle about polynomial maps on projective space, and so we prove Theorem 7.1 in greater
generality.
A polynomial f in N variables is called homogenous if there is a constant d ∈ N (the degree of f ) with
f (λx0 , . . . , λxN ) = λd f (x0 , . . . , xN ).
Exercise 7.2. Let f0 , . . . , fM be polynomials in N + 1 variables. Show that
the map x → (f0 (x), . . . , fM (x)) between KN +1 and KM +1 induces a welldefined map PN (K) → PM (K) if and only if the polynomials f0 , . . . , fM are all
homogenous of the same degree and the only common zero of the polynomials
is the point (0, . . . , 0).
Definition 7.3. A map
f : PN (Q) −→ PM (Q)
is called a morphism of degree d if
f ([x]) = f ([x0 , . . . , xN ]) = [f0 ([x]), . . . , fM ([x])],
where the fj , 0 j M are homogenous polynomials of degree d with the
property that the only common zero is 0.
Lemma 7.4. Let f : PN (Q) → PM (Q) be a morphism of degree d. Then
H([x])d H(f ([x])) H([x])d .
Proof. Write f ([x]) = [f0 (x), . . . , fM (x)], where [x] = [x0 , . . . , xN ] ∈ PN (Q).
By clearing denominators, we may assume that each xj is an integer. Since
each fi is homogenous of degree d, they may be written
fi (x) =
ce xe00 · · · xeNN ,
with ce ∈ Q, ei ∈ N, e0 + · · · + eN = d, and only finitely many ce nonzero. It
follows that there is a constant C such that
1
In the sense that the ratio is bounded above and below by positive constants
independent of N and M .
136
7 Heights
|fi (x)| C · (max{|xj |})d ,
for each i and all j, so there is a similar bound for max{|fi (x)|}. To find the
height, notice that the only possible denominators that need to be cleared
come from the coefficients of the polynomials fi , which is a bounded quantity
in total. It follows that there is an upper bound for the height of the form
C · H(x)d .
To get the lower bound, we use Hilbert’s Nullstellensatz: There exists e ∈ N
and polynomials gij ∈ Q[x] such that
xe0 = g00 (x)f0 (x) + · · · + g0N (x)fN (x)
..
.
xeN = gN 0 (x)f0 (x) + · · · + gN N (x)fN (x).
The gij s can be taken to be homogenous polynomials of degree (e − d) so
|gij (x)| (max{|xk |})e−d .
On the other hand,
xej = gj0 (x)f0 (x) + · · · + gjN (x)fN (x)
for j = 0, . . . , N , so
(max{|xj |})e−d max{|f0 |, . . . , |fN |} (max{|xj |})e .
It follows that
max{|f0 |, . . . , |fN |} (max{|xj |})d ,
and since the only possible denominators are those arising from the coefficients
of the fi , the lower bound is proved.
Example 7.5. To see that e > d really occurs in the Nullstellensatz, define
f : P1 (Q) → P1 (Q)
by
f : [x0 , x1 ] → [x20 , (x0 + x1 )2 ] = [f0 (x0 , x1 ), f1 (x0 , x1 )].
Then f is a morphism of degree 2. Now x20 = 1 · f0 , but there are no rational
polynomials A, B for which x21 = A · f0 + B · f1 . However,
x30 = x0 · f0
x31 = (2x0 + 3x1 ) · f0 + (−2x0 + x1 ) · f1 .
7.1 Heights on Elliptic Curves
137
Exercise 7.3. (a) Using the explicit formulas from Example 7.2, prove that
the map defined by [x(P ), 1] → [x(2P ), 1] is a morphism of degree 4 for the
curve y 2 = x3 − n2 x.
(b) Do the same for the curve y 2 = x3 + c.
Proof of Theorem 7.1. By Lemma 7.4, all we need to show is that the
map
[x, 1] → [x(2P ), 1]
on P1 (Q) is a morphism of degree 4. Assume that the curve is
E : y 2 = x3 + ax + b
and P = (x, y). Then
2
3x2 + a
− 2x
2y
(3x2 + a)2
− 2x
4y 2
9x4 + 6x2 a + a2
− 2x
4(x3 + ax + b)
x4 − 2x2 a − 8xb + a2
.
4(x3 + ax + b)
x(2P ) =
=
=
=
Write x = xx01 ∈ Q (in lowest terms as usual). Then, writing x(2P ) =
and dropping a factor of 4,
f0 (x0 ,x1 )
f1 (x0 ,x1 )
f0 (x0 , x1 ) = x40 − 2x20 x21 a − 8x0 x31 b + a2 x41 , and
f1 (x0 , x1 ) = x30 x1 + ax0 x31 + bx41 .
To show that these define a morphism of degree 4, it only remains to show
that the unique common zero of f0 and f1 is (0, 0). If f0 = f1 = 0 and x1 = 0,
then x0 = 0. Assume x1 = 0. Then we may assume that x1 = 1 and x0 = x.
We now need to show that
f (x) = x4 − 2x2 a − 8xb + a2 , and
g(x) = x3 + ax + b
cannot have a common zero. One way to see this is using resultants (see
Exercise 7.4); we will use the Euclidean Algorithm (see Example 2.3(2)) to
find the greatest common divisor of f and g. Assume first that a = 0 and
recall we are assuming that 4a3 + 27b2 = 0. The Euclidean Algorithm gives
x4 −2x2 a−8xb+a2 = x3 +ax+b x−3ax2 −9bx+a2 ;
2
1
9b
4
b
x3 + ax + b = −3ax2 − 9bx + a2 − x + 2 +
+
a
x;
3a
a
a2
3
9a3
27ba2
4
b2
+a2,
− 3
x−
−3ax2 −9bx+a2 =
a+9 2 x
3
a
4a +27b2
(4a3 +27b2 )
138
7 Heights
which shows that the greatest common divisor of f and g is a nonzero constant.
If a = 0 then b = 0 since 4a3 + 27b2 = 0, so the Euclidean Algorithm gives
x4 − 8xb = (x3 + b)x − 9xb;
1 (x3 + b) = (−9xb) − 9b
x + b,
which, again, shows the greatest common divisor of f and g is a nonzero
constant.
Exercise 7.4. Show that the resultant of the polynomials
f (x) = x4 − 2x2 a − 8xb + a2 and g(x) = x3 + ax + b
is (4a2 + 27b2 )2 . This shows that the condition 4a3 + 27b2 = 0 implies that f
and g have no common zero.
Exercise 7.5. Let E : y 2 = x3 + ax + b with a, b ∈ Z denote an elliptic
curve. Using arguments from p-adic analysis, it can be shown that any nonzero
torsion point Q ∈ E(Q) must have x(Q) and y(Q) integral. Assuming this,
prove that y(Q) = 0 or y(Q)2 divides 4a3 +27b2 for any rational torsion point.
Exercise 7.6. Recall from Exercise 5.12 that the point P = (3, 8) has order 7
on the elliptic curve
y 2 = x3 − 43x + 166.
Using Exercise 7.5, show that there are no rational torsion points other than
those in the subgroup generated by P .
7.2 Mordell’s Theorem
In this section, we will see how Mordell’s Theorem follows from the weak
Mordell Theorem. In the next section, we will give a proof of the weak Mordell
Theorem for the congruent number curve and discuss how the proof can be
extended to cover a wider class of curves. The proof in full generality requires
more algebraic number theory than we have at our disposal. Complete proofs
may be found in the references discussed at the end of the chapter.
Theorem 7.6. [Weak Mordell Theorem] Let E denote an elliptic curve
defined over Q. Then E(Q)/2E(Q) is a finite Abelian group.
Lemma 7.7. Let E : y 2 = x3 + ax + b be an elliptic curve defined over the
rationals.
(1) If P0 = 0 is a point in E(Q), then there is a constant c1 = c1 (E, P0 ) > 0
such that
h(P + P0 ) < 2h(P ) + c1 .
(7.3)
7.2 Mordell’s Theorem
139
(2) Given h0 > 0, there are only finitely many points P ∈ E(Q) with
h(P ) < h0 .
Proof. (2) is clear since only finitely many rationals m
n (in lowest terms)
have log max{|m|, |n|} < h0 .
To prove (1), write P = (x, y) and P0 = (x0 , y0 ). From the equation
y 2 = x3 + ax + b,
write (in lowest terms)
x=
s
r
s
r
, y = 3 , x0 = 20 , y0 = 30 ,
2
t
t
t0
t0
with r, s, t, r0 , s0 , t0 all integers. Then
x(P + P0 ) =
y0 − y
x0 − x
2
− x − x0
=
y02 − 2y0 y + y 2
(x0 + x) 2
−
(x − 2x0 x + x2 )
(x0 − x)2
(x0 − x)2 0
=
x30 + ax0 + b + x3 + ax + b − 2y0 y
(x0 − x)2
−
(x30 − 2x20 x + x0 x2 + xx20 − 2x0 x2 + x3 )
(x0 − x)2
=
a(x0 + x) + 2b − 2y0 y − (−x20 x − x2 x0 )
(x0 − x)2
=
a(x0 + x) + 2b − 2y0 y + x0 x(x0 + x)
(x0 − x)2
=
(a + x0 x)(x0 + x) + 2b − 2y0 y
.
(x0 − x)2
Substituting r, s, t then gives
r r
r
r
ss
a + t20t2 t20 + t2 + 2b − 2 t3 t03
0
0
0
x(P + P0 ) =
2
r0
r
− t2
t2
0
(at20 t2 + r0 r)(r0 t2 + rt20 ) + 2bt4 t40 − 2ss0 tt0
=
.
(r0 t2 − rt20 )2
The effect of clearing denominators in the rationals a and b appearing as
coefficients in the elliptic curve can be absorbed into the constant c1 . It is
therefore sufficient to check that the numerator and denominator satisfy the
inequality in Equation (7.3).
140
7 Heights
First
|numerator| < (c2 |t|2 + c3 |r|)(c4 |t|2 + c5 |r|) +c6 |t|4 + c7 |st|.
!"
#
<c8
(7.4)
(max{|r|,|t|2 })2
The first two terms have
c8 (max{|r|, |t|2 })2 + c6 |t|4 c9 H(P )2 .
Looking at the third term, we need to show that
|st| c10 H(P )2 .
Since y 2 = x3 + ax + b, we have
(st)2 = r3 t2 + art6 + bt8 .
There are two cases to consider.
I: |r| |t|2 . In this case
|st|2 c11 |r|4 + c12 |r|4 + c13 |r|4 = c14 H(P )4 .
II: |r| < |t|2 . In this case
|st|2 < c15 |t|8 + c16 |t|8 + c17 |t|8 = c18 H(P )4 .
In both cases, |st| < c10 H(P )2 , as required. Therefore
|numerator| < c19 H(P )2 ,
or in logarithmic form
log |numerator| < c20 + 2h(P ).
The denominator is simpler:
|denominator| = |r0 t2 − rt20 |2
2
< c21 |t|2 + c22 |r|
< c23 H(P )2 ,
so
log |denominator| < c24 + 2h(P ).
Proof of Theorem 5.11 Assuming Theorem 7.6. Let Q = {Q1 , . . . , Qs }
denote a fixed set of coset representatives for 2E(Q) in E(Q). By Theorem 7.6,
it is enough to prove the following: There is a constant R = R(E, Q) with
the property that every point P ∈ E(Q) can be expressed as an integral
7.2 Mordell’s Theorem
141
combination of the Qi , i = 1, . . . , s and those Q ∈ E(Q) with h(Q) < R
(finite in number by Lemma 7.7(2)).
Let P be a rational point on E, and write (for some i0 , i1 , · · · ∈ {1, . . . , s})
P = P0 = Qi0 + 2P1
P1 = Qi1 + 2P2
..
.
Pn = Qin + 2Pn+1 .
The duplication formula Equation (7.1) says that
4h(Pn+1 ) − c1 < h(2Pn+1 )
(7.5)
for some c1 = c1 (E) > 0. On the other hand, Lemma 7.7(1) shows that
h(2Pn+1 ) = h(Pn − Qin ) < 2h(Pn ) + c2
(7.6)
for some c2 = c2 (E, Q). Combining Equation (7.5) and Equation (7.6) gives
1
h(Pn ) + c3
2
for some c3 = c3 (E, Q). Iterating this gives
1 1
h(Pn−1 ) + c3 + c3
h(Pn+1 ) <
2 2
1
1
= 2 h(Pn−1 ) + c3 1 +
2
2
..
.
1
1
1
1
< n+1 h(P0 ) + c3 1 + + 2 + · · · + n .
2
2 2
2
h(Pn+1 ) <
As n → ∞, for fixed P0 ,
1
2n+1
h(P0 ) → 0 and c3
1
1
1
1 + + 2 + ··· + n
2 2
2
−→ 2c3 .
(7.7)
Now P = Qi0 + 2P1 , P1 = Qi1 + 2P2 and so on gives
P = Qi0 + 2(Qi1 + 2P2 )
= Qi0 + 2Qi1 + 22 P2
= Qi0 + 2Qi1 + 22 Qi2 + 23 P3
..
.
so P is being written as an integer combination of elements of Q and a point
whose height is uniformly bounded as n → ∞ by Equation (7.7).
1
Take R = (2 + 10
)c3 , a constant depending on E and Q. We have shown
that any rational point P can be written as an integral combination of the
points of Q and a point with height bounded by R.
142
7 Heights
7.3 The Weak Mordell Theorem: Congruent
Number Curve
We will now give a proof of Theorem 7.6 for the congruent number curve
y 2 = x3 − n2 x,
where n > 0 denotes an integer. The proof uses a homomorphism from E(Q)
to a quotient group of the group of nonzero rationals and we begin my introducing this group. Let Q denote Q∗ /Q∗2 , which is the quotient of the group of
nonzero rationals by the subgroup of all nonzero squares. The representatives
for this group can be taken to be all nonzero integers r which are not divisible
by the square of a prime. We will write r for the coset containing r. Notice
that the identity of the group is 1 and the element −1 is an element of order 2
in Q. In this section, the point (0, 0) will play a distinguished role and will be
denoted T = (0, 0).
Lemma 7.8. Define a map φ1 : E(Q) → Q by
φ1 (0) = 1
φ1 (T ) = −1
φ1 ((x, y)) = x otherwise.
Then φ1 is a group homomorphism.
This is a remarkable claim. If you try to prove it simply using the addition
formula it can be difficult to dig out, and might even start to look impossible.
We will use a simple trick already encountered to make it come out quite
smoothly. The reason φ in the definition carries the suffix 1 is because we will
define two other similar maps shortly.
Proof of Lemma 7.8. Let P1 and P2 denote rational points with
P1 + P2 = P3 .
We wish to deduce that φ1 (P3 ) = φ1 (P1 )φ1 (P2 ). There are a number of special
cases to be considered before we can deal with the general situation. The only
nontrivial special case which requires any work arises when one of P1 or P2 is
the 2-torsion point T = (0, 0). Say we add P = (x, y) to T , where x = 0. The
image of the sum under φ1 is
(y/x)2 − x = (y 2 − x3 )/x2 = −n2 x = −x = φ1 (P )φ1 (T ),
hence the result is true in this special case. An almost identical proof gives
the case where P3 = T = (0, 0).
Recall Section 5.3, where we converted the geometric addition on an elliptic
curve into explicit formulas. The group law on an elliptic curve tells us that
7.3 The Weak Mordell Theorem: Congruent Number Curve
143
the points P1 , P2 and −P3 lie on the same straight line. Writing P1 + P2 = P3
with Pi = (xi , yi ), we need to show that x1 x2 x3 is a rational square. From the
above, we may assume each of x1 , x2 , and x3 are nonzero rational numbers.
Let the line containing the points P1 , P2 and −P3 be written y = αx + β,
for rationals α and β. Our assumptions guarantee that β = 0. Substitute the
equation of the line into the equation of the curve to get
x3 − n2 x − (αx + β)2 = 0.
The roots of this equation are the three rational numbers x1 , x2 and x3 because
it is this equation which defines them. Hence we can factorize the left-hand
side as
(x − x1 )(x − x2 )(x − x3 ).
Now if we compare the two equations (see Exercise 5.11 on p. 105) we see
that x1 x2 x3 is equal to β 2 , the square of a rational. In other words, up to a
rational square x1 x2 and x3 are equal; hence
φ1 (P1 + P2 ) = φ1 (P1 )φ1 (P2 ).
Exercise 7.7. Verify that Lemma 7.8 is true for an elliptic curve of the form
y 2 = x3 + ax2 + bx
with the same definition of φ1 .
We have already indicated that E(Q) can be an infinite group. The second
lemma says that even if that is true, the image of this group under φ1 is a
finite group.
Lemma 7.9. The image of E(Q) under φ1 is a finite subgroup of Q.
Proof. Suppose r lies in the image of φ1 . Without loss of generality, assume r
is a square-free
integer. We claim
that rn. To prove this, suppose p is a prime
with p r, then we will show p n. The statement φ1 ((x, y)) = r amounts to two
equations
x2 − n2 = rs2
x = rt2
for rationals s and t. Now clear denominators by writing t = a/b for coprime
integers a and b. Eliminating x, we obtain an equation
r2 a4 − n2 b4 = rc2
144
7 Heights
for some integer c. If pr but p n then pb and therefore p4 n2 b4 . Thus p2
must divide the left-hand side and it follows that p 2 must divide the righthand side. Since r is square-free, it follows that pc so p2 c2 and
hence p3
divides the right-hand side. This forces p3 to divide r2 a4 so pa (since r is
square-free). Thus p divides a and b which contradicts the assumption that
they are coprime.
Exercise 7.8. For the elliptic curve E defined by y 2 = x3 − 36x, the torsionfree part of E(Q) is generated by the rational point (−3, 9) (you may assume
this). Find the image of E(Q) under the map φ1 .
Exercise 7.9. Suppose E denotes an elliptic curve and p and q denote rational
numbers. The map x → x − p, y → y − q takes rational points on this curve
to rational points on a new elliptic curve. Assume that the point at infinity
on the first curve maps to the point at infinity on the second (this can be
verified by taking limits as before). Show that the resulting map is a group
isomorphism. In the language of Section 5.5, the map is an isogeny of degree
one.
Exercise 7.10. Define a map φ2 : E(Q) → Q by
φ2 (O) = 1;
φ2 ((n, 0)) = −1;
φ2 ((x, y)) = x − n otherwise.
Prove that φ2 is a group homomorphism. (Hint: Compose this map with a
suitable translation map and use Exercise 7.9.)
In a similar vein to Exercise 7.10, we can define a map φ3 : E(Q) → Q by
φ3 ((x, y)) = x + n
whenever x = −n.
Exercise 7.11. Show that both of the maps φ2 and φ3 have finite image in Q.
Our goal is in sight now. Combine the three maps into one by defining
φ : E(Q) → Q3
to be
φ(P ) = (φ1 (P ), φ2 (P ), φ3 (P )).
Earlier on we showed that the doubling map on a rational point on the congruent number curve E : y 2 = x3 − n2 x produced an x-coordinate which is
the square of a rational, provided the starting point does not have order 2.
This suggests that we might find 2E(Q) inside the kernel of φ. More is true.
7.3 The Weak Mordell Theorem: Congruent Number Curve
145
Lemma 7.10. The kernel of φ is precisely 2E(Q). In other words the rational
point P = (x, y) is the double of a rational point if and only if x and x ± n
are all rational squares. Explicitly, write
x = r12 ;
x + n = r22 ;
x − n = r32 for ri ∈ Q.
Then P = 2Q where Q = (X, Y ) and X and Y are given by the formulas
X = x + r1 r2 + r1 r3 + r2 r3 ,
Y = (r1 + r2 + r3 )(X − x) − y,
provided the signs of the ri are chosen so that r1 r2 r3 = y.
35
Example 7.11. Let n = 6. The point Q = (−3, 9) doubles to the point ( 25
4 ,− 8 )
5
7
2
3
on the curve y = x − 36x. This is verified by taking r1 = 2 , r2 = − 2 , r3 = 12 .
As expected,
25 5 7 5 1 1 7
X=
− . + . − . = −3,
4
2 2 2 2 2 2
and similarly Y = 9.
The proof of Lemma 7.10 is purely computational and we leave the verification as an exercise. The burden of explanation rests on the question of why
it should be true in the first place. In one sense it is not wrong to say it comes
down to Mordell’s genius. The notes at the end of the chapter include a useful
reference which suggests how Mordell might have come upon this remarkable
phenomenon.
Exercise 7.12. Suppose E is an elliptic curve defined by the equation
E : y 2 = x3 + ax2 + bx + c
where a, b and c are rational. Assuming the roots of the cubic are all rational,
adapt the proof above to deduce the weak Mordell Theorem for E.
In the general case, the technicalities of the proof are no greater from the
point of view of elliptic curves. What is required is a deeper knowledge of
algebraic number fields.
During this section, we have seen how homomorphisms between elliptic
curves, or homomorphisms from elliptic curves to other groups, played an
important role. Although we will not develop this any further, it is worth being
aware of the importance of the map which reduces modulo p, for a prime p.
This map takes an elliptic curve defined over Q to one defined over Fp . Since
all the group operations are defined by rational functions, we should not be
surprised that the map is a group homomorphism (though this does of course
require that the reduced curve is really an elliptic curve.) More remarkably, the
notion of “infinity” as the identity of the group is quite robust. The following
exercise gives an opportunity to encounter this phenomenon.
146
7 Heights
Exercise 7.13. Suppose E is an elliptic curve defined by the equation
E : y 2 = x3 + ax + b,
a, b ∈ Z.
Let p denote a prime number coprime to 4a3 + 27b2 and let E1 (Q) denote
the set of rational points (x, y) on the curve with the property that the denominators of x and y are divisible by p together with the point at infinity.
Prove that E1 (Q) is a subgroup of E(Q). (Hint: Resist the temptation to do
this using the functions defining addition. What is the kernel of the reduction
map?)
7.3.1 The Generation Game
We have seen some examples of elliptic curves with rank 1; for example the
curve given by the equation y 2 = x3 − 2, with the generator (3, 5), also the
congruent number curve for n = 6 which is generated by (−3, 9). It is natural
to ask how the rank can be proved to be 1 and how these can be proved to be
the generators. Although many special cases have been worked out, in general
there is no algorithm known for determining the rank of an elliptic curve nor
for finding a set of generators. In the notes at the end of the chapter, several
of the references provide details about how special cases can be approached,
as well as links to massive tables of curves whose ranks have been computed,
along with systems of generators. We recommend as a worthwhile exercise,
doing some computations with some of these curves using a computer algebra
package.
7.4 The Parallelogram Law and the Canonical Height
The duplication formula Equation (7.1) says that for any P ∈ E(Q)
h(2P ) = 4h(P ) + O(1),
or, equivalently, there is a constant c = c(E) such that
h(P ) − 1 h(2P ) < c.
4
(7.8)
The next result exploits this to produce a height function with better functorial properties, the canonical height. The approach below is due to Tate; the
canonical height was discovered independently by Neron.
Theorem 7.12. For any rational point P on an elliptic curve E defined over
the rationals,
h(2n P ) %
lim
= h(P )
(7.9)
n→∞
4n
exists. The limit %
h(P ) is called the canonical height of P .
7.4 The Parallelogram Law and the Canonical Height
Proof. Let aN =
1
4N
147
h(2N P ). If N > M 1, then
1
1
h(2M P ) − N h(2N P )
M
4
4
1
1
M
= M h(2 P ) − M +1 h(2M +1 P )
4
4
1
1
M +1
+ M +1 h(2
P ) − M +2 h(2M +2 P )
4
4
aM − aN =
+···
+
1
4N −1
h(2N −1 P ) −
1
h(2N P )
4N
which may be grouped into
1
1
M
M
aM − aN = M h(2 P ) − h(2 · 2 P )
4
4
1
1
M +1
M +1
+ M +1 h(2
P ) − h(2 · 2
P)
4
4
+···
1
1
+ N −1 h(2N −1 P ) − h(2 · 2N −1 P ) .
4
4
By the duplication formula (Theorem 7.1), this gives
1
1
1
1
4
→ 0 as M → ∞,
|aM − aN | < M c 1 + + 2 + · · · = M c
4
4 4
4
3
showing that (aN ) is a Cauchy sequence.
If the order of P is a power of 2, then %
h(P ) = 0. In fact, any torsion
%
%
point P has h(P ) = 0, and moreover h(P ) = 0 implies that P is a torsion
point by Theorem 7.13(4).
Theorem 7.13. Let E be an elliptic curve defined over the rationals.
(1) For every point P ∈ E(Q),
%
h(P ) = h(P ) + O(1)
uniformly.
(2) For all P, Q ∈ E(Q),
%
h(P + Q) + %
h(P − Q) = 2%
h(P ) + 2%
h(Q).
(3) For every m ∈ Z and P ∈ E(Q),
%
h(mP ) = m2 %
h(P ).
(7.10)
148
7 Heights
(4) For P ∈ E(Q),
%
h(P ) = 0 if and only if P is a torsion point.
This is proved below. Equation (7.10) is called the parallelogram law. It
follows from (1) and (3) in Theorem 7.13 that
h(mP ) = m2 h(P ) + O(1),
which is a weaker version of Theorem 5.2. A more useful generalization of this
formula is the parallelogram law for the naı̈ve height. This will be stated now,
then Theorem 7.13 will be proved. The parallelogram law for the naı̈ve height
will be proved in Section 7.5.
Lemma 7.14. For all P, Q ∈ E(Q),
h(P + Q) + h(P − Q) = 2h(P ) + 2h(Q) + O(1)
(7.11)
uniformly.
Proof of Theorem 7.13.
(1) By iterating the relation
h(P ) =
1
(h(2P ) + O(1)) ,
4
we have
1
h(22 P ) 1
+ O(1) + O(1)
4
4
4
2
h(2 P )
1
1
=
+
O(1)
+
42
4 42
..
.
h(2N P )
1
1
1
=
+
O(1)
+
·
·
·
+
.
+
4N
4 42
4N
!"
#
h(P ) =
O(1)
Letting N → ∞ gives
h(P ) = %
h(P ) + O(1).
(2) Applying a similar limiting procedure to the naı̈ve parallelogram law Equation (7.11) gives
7.4 The Parallelogram Law and the Canonical Height
149
%
h(P + Q) + %
h(P − Q) − 2%
h(P ) − 2%
h(Q)
1
1
= lim
h(2N (P + Q)) + N h(2N (P − Q))
N →∞ 4N
4
2
2
− N h(2N P ) − N h(2N Q)
4
4
1
= lim
O(1) = 0.
N →∞ 4N
(3) This is proved by induction on m 1. The case m −1 follows since
h(−P ) = h(P ) ⇒ %
h(P ) = %
h(−P ).
For m = 0, h(0) = 0 = %
h(0). Assume therefore that
%
h(mP ) = m2 %
h(P ),
and substitute mP for P and P for Q in the parallelogram law Equation (7.10):
%
h(mP + P ) = 2%
h(mP ) + 2%
h(P ) − %
h((m − 1)P )
2%
= 2m h(P ) + 2%
h(P ) − (m − 1)2 %
h(P )
= (m + 1)2 %
h(P ).
(4) If P is a torsion point, then mP = 0 for some m = 0, so by (3) %
h(P ) = 0.
Conversely, suppose that %
h(P ) = 0 for some P ∈ E(Q). Then
%
h(mP ) = m2 %
h(P ) = 0 for all m,
so h(mP ) must be uniformly bounded for all m by (1). By Lemma 7.7(2), this
means that the set {mP }m∈Z must be finite, so P is a torsion point.
7.4.1 A Strong Form of Siegel’s Theorem
The result that follows we call the Strong Siegel Theorem; it was proved by
Silverman and we will not prove it here. It relates the growth rates of the
numerators and the denominators of the multiples nP of a nontorsion rational
point.
Theorem 7.15. [Strong Siegel Theorem] Let E denote an elliptic curve
defined over Q and suppose P ∈ E(Q) denotes a nontorsion point. Let (Pn )
be any sequence of rational points with %
h(Pn ) → ∞ as n → ∞, and write
An Cn
.
Pn =
,
Bn2 Bn3
Then
log |An |
−→ 1
2 log |Bn |
as
n −→ ∞.
150
7 Heights
This can be interpreted as saying that the numerators and denominators
of Pn have roughly the same number of decimal digits for large n. Theorem 5.2
follows from this, together with Theorem 7.13. A particular case of a rational
sequence (Pn ) with %
h(Pn ) → ∞ is given by taking Pn = nP for a rational
nontorsion point P on an elliptic curve defined over the rationals.
7.5 Mahler Measure and the Naı̈ve Parallelogram Law
In proving Lemma 7.14, some simple estimates on polynomials will be needed,
and one way to phrase them is to use the Mahler measure, which is of independent interest. There are several natural ways to measure the size of a
polynomial in such a way that an integer polynomial with zeros of large arithmetic complexity will have large measure.
Definition 7.16. For any nonzero polynomial
d
d−1
F (x) = ad x + ad−1 x
+ · · · + a0 = ad
d
(x − αi )
i=1
in C[x], define three measures as follows.
d
(1) The Mahler measure of F is M (F ) = |ad | · i=1 max{1, |αi |}.
(2) The height of F is H(F ) = max01d {|ai |}.
d
(3) The length of F is L(F ) = i=0 |ai |.
In (1), an empty product is assumed to be 1, so the Mahler measure of
the nonzero constant polynomial F (x) = a0 is |a0 |. Write m(F ) = log M (F )
for the logarithmic Mahler measure of F .
Mahler showed that
d
|ai | M (F ) for all i = 0, . . . , d
i
and also showed that all three measures are commensurate in the sense that
H(F ) M (F ) H(F )
and
L(F ) M (F ) L(F ),
(7.12)
with the implied constants depending only on the degree d.
The absolute value of the discriminant of F is defined to be
|αi − αj |.
|∆(F )| = |ad |2d−2
i=j
Mahler also showed that
|∆(F )| dd M (F )2d−2 .
(7.13)
7.5 Mahler Measure and the Naı̈ve Parallelogram Law
151
Exercise 7.14. (a) Prove that
−d log 2 + (F ) m(F ) (F ),
where we write = log L. This is equivalent to an exact description of the
implied constants in Equation (7.12):
2−d L(F ) M (F ) L(F ).
(b) Prove a weaker form of the inequality (7.13) as follows. Assume that
(x − αi )
F (x) = xd + ad−1 xd−1 + · · · + a0 =
1id
is monic, so the absolute value of the discriminant is
|∆(F )| =
|αi − αj |.
i=j
Prove that
|∆(F )| 2d(d−1) M (F )2d−2 .
Exercise 7.15. Fix a polynomial
F (x) = ad xd + ad−1 xd−1 + · · · + a0 = ad
d
(x − αi )
i=1
in Z[x]. Call F hyperbolic if |αi | =
1 for all i = 1, . . . , d and ergodic if αik = 1
for some k 0, and any i implies that k = 0.
(a) Prove that
1
m(F ) =
log |F (e2πis )| ds
(7.14)
0
when F is hyperbolic.
(b) Prove Equation (7.14) without assuming that F is hyperbolic.
d
(c) Prove that ∆n (F ) = i=1 |αin −1| is an integer for all n. For F hyperbolic,
prove that
∆n+1 (F )
lim ∆n (F )1/n = lim
= M (F ).
n→∞
n→∞ ∆n (F )
(d) Prove that an ergodic polynomial of degree d 3 is hyperbolic.
(e) Find a polynomial that is ergodic but not hyperbolic.
(f)*For F ergodic but not hyperbolic, prove that
lim ∆n (F )1/n = M (F )
n→∞
but that
∆n+1 (F )
∆n (F )
does not converge as n → ∞.
152
7 Heights
Exercise 7.16. [Kronecker’s Lemma] Prove that a polynomial F ∈ Z[x]
has m(F ) = 0 if and only if every zero λ of F satisfies λk = 1 for some k 1.
Exercise 7.17. Considerable interest has been shown in the set of values of
the Mahler measure of integer polynomials.
(a) Compute m(F ) to 3 decimal places when
F (x) = x10 + x9 − x7 − x6 − x5 − x4 − x3 + x + 1.
(7.15)
(b)*Explore the mathematical literature on Lehmer’s Problem: Is there an
integer polynomial G with m(G) > 0 and with m(G) < m(F )? More generally,
given arbitrary > 0, is there an integer polynomial H with m(H) > 0
and m(H) < ? Extensive calculations have been made of values of the Mahler
measure for monic polynomials, and no nonzero value smaller than m(F ) has
been found.
Proof of Lemma 7.14. Let E : y 2 = x3 + ax + b be the elliptic curve. Let P
and Q be points in E(Q), and write x(P ) = x1 , x(Q) = x2 , x(P + Q) = x3 ,
and x(P − Q) = x4 .
The values of x3 and x4 depend on the y coordinates of P and Q, which
complicates the proof considerably. We will work in the coordinates
x1 x2 , x1 + x2 , x3 x4 , and x3 + x4 ,
because these only depend on the x coordinates. Now
2(x1 + x2 )(a + x1 x2 ) + 4b
,
(x1 − x2 )2
(x1 x2 − a)2 − 4b(x1 + x2 )
,
x3 x4 =
(x1 − x2 )2
x3 + x4 =
and we may write
(x1 − x2 )2 = (x1 + x2 )2 − 4x1 x2 ,
giving x3 + x4 and x3 x4 in terms of x1 x2 , x1 + x2 .
We claim that for any x1 , x2 ∈ Q,
h([x1 + x2 , x1 x2 , 1]) = h([x1 , 1]) + h([x2 , 1]) + O(1).
To see this, write x1 = st , x2 =
u
v
(7.16)
in lowest terms, and define
F1 (x) = tx − s,
F2 (x) = vx − u.
Then
m(F1 F2 ) = m(F1 )m(F2 ).
Now by Equation (7.12) and Exercise 7.14,
(7.17)
7.5 Mahler Measure and the Naı̈ve Parallelogram Law
m
d
ai xi
=h
i=0
where h(
d
i=0
d
153
ai xi
+ O(d),
i=0
ai xi ) = log max{|ai |}. Applying this to Equation (7.17) gives
h(F1 F2 ) = h(F1 ) + h(F2 ) + O(1).
(7.18)
Now
h(F1 ) = h([x1 , 1]),
h(F2 ) = h([x2 , 1]).
(7.19)
On the other hand,
F1 (x)F2 (x) = (tx − s)(vx − u) = tvx2 − x(sv + tu) + su
so
h(F1 F2 ) = max{|tv|, |sv + tu|, |su|}.
Now
sv + tu
, and
tv
su
x1 x2 =
,
tv
x1 + x2 =
so
h([x1 + x2 , x1 x2 , 1]) = h
sv + tu su
, ,1
tv
tv
= h([sv + tu, su, tv]).
Now sv + tu, su, and tv cannot have a common factor by Gauss’ Lemma,
so h(F1 F2 ) = h([x1 +x2 , x1 x2 , 1]), and Equations (7.18) and (7.19) give Equation (7.16).
Change variables and work with x1 + x2 and x1 x2 :
2(x1 + x2 )(a + x1 x2 ) + 4b
and
(x1 + x2 )2 − 4x1 x2
(x1 x2 − a)2 − 4b(x1 + x2 )
.
x3 x4 =
(x1 + x2 )2 − 4x1 x2
x3 + x4 =
Lemma 7.17. Assume that 4a3 + 27b2 = 0. Then the map P2 (Q) → P2 (Q)
defined by
[u, v, t] = [2u(at + v) + 4bt2 , (v − at)2 − 4btu, u2 − 4tv]
is a morphism of degree 2.
The formulas in Lemma 7.17 come from setting u = x1 + x2 and v = x1 x2
and using t to make the expressions homogenous.
Proof of Lemma 7.17. Suppose the three polynomials vanish:
154
7 Heights
2u(at + v) + 4bt2 = 0,
(v − at) − 4btu = 0, and
u2 − 4tv = 0.
2
(7.20)
(7.21)
(7.22)
If t = 0, then u = 0 by Equation (7.22), so v = 0 by Equation (7.21).
Suppose therefore that t = 0, and divide by t2 in each equation. Write
x=
u
v
, so x2 =
2t
t
by Equation (7.22). Equations (7.20) and (7.21) give
(x2 − a)2 − 8bx = 0 and
x(a + x2 ) + b = 0,
or
x4 − 2ax2 − 8bx + a2 = 0 and
x3 + ax + b = 0.
These polynomials arose in the proof of the duplication formula on p. 138,
where it was shown that they have no common zero.
Now apply Lemma 7.17 to the vectors
[x1 + x2 , x1 x2 , 1] and [x3 + x4 , x3 x4 , 1].
Since the map from the first to the second is a morphism of degree 2,
h([x3 + x4 , x3 x4 , 1]) = 2h([x1 + x2 , x1 x2 , 1]) + O(1)
by Lemma 7.4. Equation (7.16) shows that
h([x3 + x4 , x3 x4 , 1]) = h([x3 , 1]) + h([x4 , 1]) + O(1)
and
h([x1 + x2 , x1 x2 , 1]) = h([x1 , 1]) + h([x2 , 1]) + O(1),
so
h([x3 , 1]) + h([x4 , 1]) = 2h([x1 , 1]) + 2h([x2 , 1]) + O(1),
and therefore
h(P + Q) + h(P − Q) = 2h(P ) + 2h(Q) + O(1).
Notes to Chapter 7: The polynomial in Equation (7.15) is taken from Lehmer’s
paper [97]; a starting point for Exercises 7.15 and 7.17(b) is [59] and the references
7.5 Mahler Measure and the Naı̈ve Parallelogram Law
155
therein. Hilbert’s Nullstellensatz may be found in any book on algebraic geometry; an accessible account is in Reid’s notes [122]. The statement about integrality
of torsion points in Exercise 7.5 is due to Lutz [101] and Nagell [112]. Accessible
proofs are in Cassels [27, Chapter 12], Husemöller [79, Chapter 5, Section 6] and
Silverman [139, Chapter VIII, Section 7]. The characterization of all torsion points
on the curves y 2 = x3 + ax (for a integral and not divisible by a fourth power)
and y 2 = x3 + b (for b integral and not divisible by a sixth power) is given in
Cassels [27, Exercise to Chapter 12]. Theorem 7.6 is proved in Lang [94], and Silverman [139]; a sketch proof from an advanced point of view is in the article by
Milne [108, Theorem 20.10]; see also Lemmermeyer’s excellent Web notes on elliptic
curves. Cassels’ article [26] is an excellent piece of background reading, in which he
gives a plausible explanation as to how Mordell might have discovered what became
known as the weak Mordell Theorem. Cremona’s Web site [37] gives the rank and
a set of generators for thousands of elliptic curves. The strong form of Siegel’s Theorem (Theorem 7.15) can be found in Silverman [139, Chapter IX, Section 3]. The
Mahler measure in Section 7.5 was introduced in two papers of Mahler [103], [104].
There are extensive references to the many places where the Mahler measure arises
in [59], especially connections between heights and dynamical systems. Computational material related particularly to the Mahler measure from Section 7.5 appears
in a book by Borwein [18].
8
The Riemann Zeta Function
We saw in Chapter 1 that estimates for sums of arithmetic functions are an
essential step in understanding arithmetic problems. One of the themes we
wish to pursue is the following strange phenomenon: Arithmetic properties of
integers, especially primes, can be deduced from analytic properties of functions. A serious instance is afforded by the Prime Number Theorem itself
(see p. 3 for the ∼ and o(x) notation).
Theorem 8.1. [Prime Number Theorem] Asymptotically, the number of
primes is given by
π(x) = |{p ∈ P | p x}| ∼
x
.
log x
The Prime Number Theorem is of major importance in number theory. The
notes at the end of the chapter give references where proofs can be found, including elementary approaches (in this context “elementary” means “without
recourse to complex analysis”, and not “easy”). Around the beginning of the
nineteenth century, Legendre published a conjecture equivalent to the Prime
Number Theorem. Gauss also studied the values of π(x) at a similar time and
conjectured1 that
x
1
dt.
(8.1)
π(x) ∼ Li(x) =
log
t
2
The Prime Number Theorem was first proved in 1896 by two mathematicians
independently – Hadamard and de la Vallée Poussin. Their proofs used the
Riemann zeta function and they were able to give an estimate for the error
1
For all small values of x, π(x) < Li(x), and several prominent mathematicians
conjectured that the inequality always holds. In 1914, Littlewood proved that the
inequality reverses infinitely often. Amazingly, the smallest value of x where the
inequality first reverses is still not known, although it is known to be below 10371 .
This is a compelling instance of a situation where even enormous amounts of
numerical evidence can be completely deceptive.
158
8 The Riemann Zeta Function
term in the formula, based upon an estimate for a zero-free region of the
zeta function. We will have more to say about the zeros of the Riemann zeta
function in Section 9.2.1.
In this chapter, we will start by giving a far-reaching refinement of the
integral test that quickly gives sharper estimates for some arithmetic functions. We then develop the algebra of arithmetic functions with respect to a
natural notion of multiplication, Dirichlet convolution. Finally, we apply these
results to show how the Riemann zeta function may be extended to the whole
complex plane with a simple pole at 1.
8.1 Euler’s Summation Formula
N
The integral test used on p. 10 compares the sum n=1 f(n) with the inteN
gral 1 f(x) dx. Euler’s Summation Formula is a refinement of this tool that
allows us to derive sharper asymptotic formulas. Recall that
{t} = t − t
(8.2)
denotes the fractional part of a real number t, where t
is the greatest integer
smaller than or equal to t.
Theorem 8.2. Let a < b be real numbers, and suppose that f is a complexvalued function defined on [a, b] with a continuous derivative on (a, b). Then
b
b
f(n) =
f(t) dt +
{t}f (t) dt − f(b){b} + f(a){a}.
(8.3)
a
a<nb
a
Proof. We give the proof in the case a, b ∈ N for simplicity.
Suppose that a < n − 1 < n b. Now
n
t
f (t) dt = (n − 1)[f(n) − f(n − 1)] = nf(n) − (n − 1)f(n − 1) − f(n).
n−1
Sum this from a + 1 to b:
b
t
f (t) dt =
a
b
(n − 1)[f(n) − f(n − 1)]
n=a+1
b
= bf(b) − af(a) −
f(n).
n=a+1
Rearrange to give
b
n=a+1
f(n) = bf(b) − af(a) −
a
b
[t]f (t) dt.
(8.4)
8.1 Euler’s Summation Formula
159
On the other hand, integrating by parts,
b
a
&
'b
f(t) dt = tf(t) a −
b
tf (t) dt.
(8.5)
a
Equations (8.4) and (8.5) together give
b
n=a+1
f(n) =
b
f(t) dt +
a
b
{t}f (t) dt,
a
completing the proof.
N
Applying the Euler Summation Formula to the harmonic series k=2 k1
already gives a nontrivial result. Here a = 1, b = N , and f(t) = 1/t. This gives
N
N
N
N
1
1
{t}
{t}
=
dt −
dt = log N −
dt.
2
n
t
t
t2
1
1
1
(8.6)
k=2
Clearly
∞
∞
{t}
{t}
{t}
dt
=
dt
−
dt
2
2
t
t
t2
1
1
N
∞
and the last term is less than N t12 dt = N1 . Adding 1 to each side of Equation (8.6) gives the following result, which should be compared with Exercise 1.2 to appreciate the power of the Euler Summation Formula.
N
Theorem 8.3.
N
1
1
= log N + γ + O
, where
n
N
n=1
∞
{t}
γ = 1−
dt
t2
1
is the Euler–Mascheroni constant.
Definition 8.4. For 1 n ∈ N, let d(n) denote the number of divisors of n.
For example, d(n) = 2 if and only if n is a prime. It follows that information
about d reflects something about the distribution of the primes themselves.
Exercise 8.1. Prove that d(n) is odd if and only if n is a square.
Theorem 8.5.
N
n=1
√
d(n) = N log N + (2γ − 1)N + O( N ).
160
8 The Riemann Zeta Function
Proof. The Euler Summation Formula in the usual form with integer boundaries gives a much larger remainder term of the form
√ O(N ), which swamps
the (2γ −1)N term. For the sharper result with a O( N ) error term, we apply
the Euler Summation Formula in the more general form given in Theorem 8.2.
Notice that
√
{(m, q) : mq x} = 2 {(m, q) : mq x, m < q} + O( x).
(8.7)
It follows that
d(n) =
nx
1
mx qx/m
= 2
√
1 + O( x)
m,q;
mqx,m<q
= 2
( x )
√
− m + O( x)
m
√
m< x
= 2x
1
√
m + O( x).
−2
√ m
√
m< x
m< x
Now we can apply the Euler Summation Formula to each sum to obtain
x
√
√
√ d(n) = 2x(log x + γ + O(x−1/2 )) − 2
+ O( x) + O( x)
2
nx
√
= x log x + (2γ − 1)x + O( x)
as required.
The sharper form of the Euler Summation Formula used in the proof of
Theorem 8.5 does not always give more precise results. For example, Theorem 8.2 applied to the harmonic series gives
1
{x}
1
+
= log x + γ + O
;
n
x
x
1nx
the last summand is O( x1 ) and so goes into the remainder term. In this case,
the general form does not give us more information, and for many summands
this will be the case.
Exercise
8.2. Prove Equation (8.7). (Hint: If mq x and m < q, then m <
√
x. The number of q in this set for fixed m is [x/m] − m; draw a picture to
see why.)
Another application of the Euler Summation Formula gives Stirling’s Formula.
8.2 Multiplicative Arithmetic Functions
161
Theorem 8.6. [Stirling’s Formula]
log N ! = N log N − N + O(log N ).
N
Proof. Clearly log N ! = n=2 log n. Put f(t) = log t, and then by the Euler
Summation Formula
N
N
{t}
dt = N log N − N + O(log N ).
log t dt +
log N ! =
t
1
1
Stirling’s Formula and its refinements are extremely important in many
parts of mathematics. A refinement of the Euler Summation Formula using
derivatives of higher order gives the more precise estimate
√
√
2πnn+1/2 e−n < n! < 2πnn+1/2 e−n+1/12n .
(8.8)
Exercise 8.3. *Prove the inequality (8.8).
Exercise 8.4. Use the Euler Summation Formula to prove the following
asymptotic formulas, where A and B are constants.
N
log n
log N
1
2
(a)
.
= (log N ) + A + O
n
2
N
n=1
N
1
1
(b)
= log log N + B + O
.
n log n
N log N
n=2
Exercise 8.5. Prove that
N
1
d(n)
= (log N )2 + 2γ log N + O(1),
n
2
n=1
where γ is the Euler–Mascheroni constant.
8.2 Multiplicative Arithmetic Functions
Recall from Definition 3.4 that an arithmetic function f is multiplicative if
f(m)f(n) = f(m)f(n) when gcd(m, n) = 1.
It turns out that many functions of interest in number theory have this property.
Definition 8.7. The Möbius function µ is defined by µ(1) = 1 and
(−1)k if n is a product of k distinct primes, and
µ(n) =
0 otherwise.
162
8 The Riemann Zeta Function
Exercise 8.6. By Fermat’s Little Theorem (Theorem 1.12), for any prime p
and integer a, ap − a ≡ 0 modulo p. The Möbius function gives a natural
generalization to a composite modulus.
(a) For any a, n ∈ N, prove that
µ(n/d)ad ≡ 0 (mod n).
d|n
(b)*For any a, k, n ∈ N, prove that
k
µ(n/d)ad ≡ 0 (mod n).
d|n
Remarkably, the Prime Number Theorem is also intimately connected with
growth properties of the Möbius function, and in fact
µ(n) = o(x).
(8.9)
Prime Number Theorem ⇐⇒
nx
That is,
the Prime Number Theorem follows from and implies the fact
that x1 nx µ(n) → 0. Similarly, the important Riemann Hypothesis (Conjecture 9.7 in the next chapter) is equivalent to a conjectured result about
partial sums of the Möbius function.
Exercise 8.7. Prove Tchebychef’s weak form of the Prime Number Theorem
by the following steps.
(a) Let N be an integer and p a prime. Show that the largest power of p
dividing N ! is exactly
N
N
N
+ 2 + 3 + ··· .
p
p
p
(b) Use (a) to show that
N N N log(N !) =
+ 2 + 3 + · · · log p.
p
p
p
pN
(c) Use Stirling’s Formula (Theorem 8.6) to show that
N
log p
= N log N + O(N ).
p
pN
(d) Deduce that there exist constants A, B > 0 with the property that
A
N
N
< π(N ) < B
log N
log N
for all N 2.
8.2 Multiplicative Arithmetic Functions
163
Exercise 8.8. Let 0 < a < b be fixed real numbers and assume the Prime
Number Theorem.
a
(a) Prove that π(ax)
π(bx) → b as x → ∞.
(b) Deduce that there is some X such that for any x > X there is a prime p
with ax < p < bx.
(c) Deduce that there is a rational pq , a < pq < b, with p and q both prime.
The Möbius function appears in a striking prime formula due to Gandhi.
Exercise 8.9. *Let Pn denote the product of the first n primes. Prove that
the next prime pn+1 is the unique integer m with the property that
⎞
⎛
µ(d)
1
1 < 2m ⎝
− ⎠ < 2.
2d − 1 2
d|Pn
Theorem 8.8. The Möbius function is multiplicative. Moreover,
1
if n = 1,
µ(d) =
0
otherwise.
d|n
Proof. Let m and n be integers with gcd(m, n) = 1, and factorize m and n as
products of prime powers. The primes involved must all be distinct. If, in the
factorization there is an exponent of 2 or more, then we are done since both
sides of the equation µ(mn) = µ(m)µ(n) are zero. If m (and n) is a product
of k (resp. ) distinct primes, then mn is a product of k + primes, all of
which are distinct since m and n are coprime. It follows that µ(m) = (−1)k
and µ(n) = (−1) , and so
µ(mn) = (−1)k+ = µ(m)µ(n).
For the next claim, it is sufficient to check the prime power case since again
both sides of the equation of Theorem 8.8 are multiplicative (by the argument
used in the proof of Theorem 3.9). If n = pr with r 1, the left-hand side is
µ(1) + µ(p) = 1 − 1 = 0,
which completes the proof.
Theorem 8.9. For all integers n 1, φ(n) =
d|n
Proof. By Corollary 3.6, φ(n) = n
pi |n
1−
1
pi
n
µ(d) .
d
, so
1
µ(d)
1
φ(n)
+
− ··· =
=1−
.
n
pi
pi pj
d
pi |n
i=j
d|n
164
8 The Riemann Zeta Function
8.3 Dirichlet Convolution
The proof of Theorem 8.9 is a special instance of a general technique.
Definition 8.10. The convolution of arithmetic functions f and g is the function f ∗ g defined by
n
.
(f ∗ g)(n) =
f(d)g
d
d|n
Theorem 8.11. Convolution is commutative and associative. In other words,
f ∗ g = g ∗ f and (f ∗ g) ∗ h = f ∗ (g ∗ h)
for any arithmetic functions f, g, and h.
Proof. The sum in
f(d)g
n
d|n
d
runs over all pairs d, e ∈ N with de = n, so it is equal to
f(d)g(e),
de=n
and the latter expression is symmetric in f and g.
To see that convolution is associative, check the property for n = p a prime
by hand. The proof in the general case goes in much the same way as the proof
of commutativity:
(f ∗ g) ∗ h (n) = f ∗ (g ∗ h) (n) =
f(c)g(d)h(e),
cde=n
from which associativity is clear.
Lemma 8.12. Define the arithmetic function I by I(1) = 1 and I(n) = 0 for
all n > 1. Then, for any arithmetic function f,
f ∗ I = I ∗f = f.
Proof.
(f ∗ I) (n) =
d|n
f(d) I
n
d
= f(n) I(1) = f(n)
since all the other summands are zero by the definition of I.
Theorem 8.13. If f is an arithmetic function with f(1) = 0, then there is a
unique arithmetic function g such that f ∗ g = I. This function is denoted f −1 .
8.3 Dirichlet Convolution
165
Proof. The equation (f ∗ g)(1) = f(1)g(1) determines g(1). Then define g
recursively as follows. Assuming that g(1), . . . , g(n − 1) have been defined
uniquely, the equation
n
(f ∗ g)(n) = f(1)g(n) +
f(d)g
d
1<d|n
allows us to calculate g(n) uniquely.
Example 8.14. Let u(n) = 1 for all n. Then, by Theorem 8.8,
u−1 = µ.
(8.10)
Exercise 8.10. Let f be a multiplicative arithmetic function with f(1) = 0.
(a) Prove that f −1 (n) = µ(n)f(n) for all square-free n.
(b) Prove that f −1 (p2 ) = f(p)2 − f(p2 ) for all primes p.
Exercise 8.11. Let f be an arithmetic function, and consider the (formal)
relationship
∞
∞
(1 − xn )f(n)/n =
R(n)xn .
(8.11)
n=1
− n1
n=0
n
(a) Prove that R(n) =
a=1 (f ∗ u)(a) · R(n − a) for all n 1.
(b) Assume that R(0) = 1. Prove that f is uniquely determined by Equation (8.11).
n
(c) For f(n) = nα , prove that R(n) = − n1 a=1 (nα + 1) · R(n − a) for all n 1.
Exercise 8.12. If f is multiplicative, prove that f is completely multiplicative
if and only if f −1 (pa ) = 0 for all primes p and a 2.
Exercise 8.13. Define an arithmetic function ν(n) to be 1 when n = 0 and
the number of distinct prime factors of n for n 1. Let f = µ ∗ ν. Prove
that f(n) ∈ {0, 1} for all n ∈ N.
Exercise 8.14. (a) Prove that the collection of all arithmetic functions f
with f(1) = 0 forms an Abelian group under Dirichlet convolution.
(b) Prove that the multiplicative arithmetic functions form a subgroup.
(c) Show by example that the completely multiplicative functions do not form
a subgroup.
Theorem 8.15. [Möbius inversion formula] Given
functions f
narithmetic
.
and g, f(n) =
g(d) if and only if g(n) =
f(d)µ
d
d|n
d|n
Proof. Assume that f(n) =
d|n g(d), and let u(n) = 1 for all n as in
Example 8.14. Then f = g ∗ u. Convolve both sides of f = g ∗ u with µ and
use Equation (8.10) to see that
166
8 The Riemann Zeta Function
f ∗ µ = g ∗ u ∗ µ = g ∗ I = g,
so
g(n) =
f(d)µ
d|n
n
d
.
For the converse, convolve g = f ∗ µ with u.
Thus Theorem 3.9 and Theorem 8.9 are equivalent: We can move from
one to the other by convolving with the Möbius function or its convolution
inverse.
Exercise 8.15. Suppose σ denotes a real number for which
F(σ) =
∞
∞
f(n)
g(n)
and
G(σ)
=
σ
n
nσ
n=1
n=1
are absolutely convergent series. Prove that
F(σ) · G(σ) =
∞
(f ∗ g)(n)
.
nσ
n=1
Example 8.16. If f ∗ g = I, then F(σ)G(σ) = 1, so
∞
µ(n)
1
.
=
ζ(σ) n=1 nσ
Series such as F, G, and F · G are called Dirichlet series. We next study the
Riemann zeta function in the context of Dirichlet series.
Exercise 8.16. For all s ∈ C with (s) > 2, show that
∞
ζ(s − 1) φ(n)
.
=
ζ(s)
ns
n=1
The traditional notation for the variable s in Definition 1.4 of the Riemann
zeta function is
s = σ + it
with σ, t ∈ R.
∞
For s with real part σ = (s) > 1, we claim that the series n=1 n1s converges
absolutely. To prove this, notice that
has modulus n−σ
n−s = n−σ−it = n−σ e−it log n
∞
and that n=1 n1σ is a convergent series by the integral test.
8.3 Dirichlet Convolution
167
8.3.1 Application of Möbius Inversion to Zsigmondy’s Theorem
Before showing how Theorem 8.15 can be used to prove Zsigmondy’s Theorem
(Theorem 1.15), a preliminary observation needs to be made. The polynomial xn − 1 already has some natural factorization according to the divisors
of n. If d 1 denotes any integer, let φd denote the monic polynomial whose
zeros are the primitive 2 dth roots of unity. The polynomial φd is known as
the dth cyclotomic polynomial.
A simple application of Galois theory says that
φd (x) ∈ Z[x] for every d 1.
If you are not familiar with Galois theory we ask you take this on trust. A
natural factorization of xn − 1 into integral polynomials follows at once, by
dividing the nth roots of unity into the dth primitive roots of unity for d
running over the divisors of n,
φd (x).
(8.12)
xn − 1 =
d|n
Exercise 8.17. Compute the polynomials φd for 1 d 15.
The factorization given by Equation (8.12) into integral polynomials yields
a partial factorization of Mn = 2n − 1 into integers,
2n − 1 =
φd (2).
(8.13)
d|n
The first thing to notice about Equation (8.13) is that, by definition, any
primitive divisor of Mn must divide φn (2). The proof of Theorem 1.15 proceeds
by showing that any factor of φn (2) which is common to Md for some d < n
must itself already divide n. Therefore, as soon as φn (2) exceeds n, Mn is
guaranteed to have a primitive divisor.
We claim that for every n > 6, φn (2) > n. To prove this, note first that
for all n > 1,
2φ(n)
φn (2) >
.
(8.14)
e
To prove this inequality, apply Möbius inversion to Equation (8.13) to see
that the logarithm of the left-hand side of the inequality (8.14) is
2
A complex number z is a primitive dth root of unity if z d = 1, but z e = 1 for
any e with 0 < e < d. Thus the primitive dth roots of unity are precisely the
complex numbers
e2πik/d with gcd(k, d) = 1.
Thus there are exactly φ(d) distinct primitive dth roots of unity, where φ denotes
the Euler function defined on p. 61.
168
8 The Riemann Zeta Function
µ(n/d) log(2d − 1).
d|n
This is bounded below by φ(n) log 2 − 1 using Theorem 8.9 and an easy estimate for the logarithm.
The right-hand term of the inequality (8.14) can be estimated using Corollary 3.6 and the bound (1.4). We deduce that if φn (2) n then
log n < 2 log log n + C,
(8.15)
for a constant C > 0 which can be made explicit. Clearly, this inequality
bounds n and so we have completed the proof that for large enough n, the
logarithm of the right-hand side of the inequality (8.14) is greater than log n.
This completes the proof of the original claim, albeit in an inexplicit way.
Exercise 8.18. Find an explicit value for the constant C in the inequality (8.15) and use this to find an explicit bound for n.
Applying the multiplicative form of Theorem 8.15 to Equation (8.12) yields
φn (x) =
(xd − 1)µ(n/d) ,
(8.16)
d|n
so in particular
φn (2) =
(2d − 1)µ(n/d) .
(8.17)
d|n
Thus, Equation (8.17) gives
ordp (φn (2)) =
µ(n/d) ordp (Md ).
(8.18)
d|n
Now suppose that p is a prime with pMn and pMd for some d, 1 < d < n
(so p is not a primitive divisor of Mn ). Let d0 be the smallest value of d for
which pMd . Now the sequence (Mn ) has the strong divisibility property of
Exercise 1.15(c) on p. 27, namely,
gcd(Mn , Mm ) = Mgcd(n,m) for all m, n.
Thus we may assume that d0 n. By Exercise 1.15(b),
ordp (Mnd0 ) = ordp (Md0 ) + ordp (n),
(8.19)
and ordp (Md ) = 0 unless d is a multiple of d0 . Use Equation (8.19) to
write Equation (8.18) as
ordp (φn (2)) =
µ(n/dd0 ) ordp (Md0 ) + ordp (d)
d|(n/d0 )
= ordp (Md0 )
d|(n/d0 )
µ(n/dd0 ) +
d|(n/d0 )
µ(n/dd0 ) ordp (d).
8.4 Euler Products
169
Since n > d0 , the first term vanishes because
µ(d) = 0 for n > 1
d|n
by Theorem 8.8. The second term is bounded above by ordp (n). As shown
above, φn (2) > n for all n > 6, which concludes the proof of Theorem 1.15
on p. 27.
The next exercise uses similar methods to solve a special case of a deep
general result due to Bilu, Hanrot and Voutier. The result was first shown by
Carmichael in 1913.
Exercise 8.19. [Carmichael] Let A and B be nonzero integers. The Lucas
sequence associated with the pair (A, B) is the sequence (un ) defined by
un =
αn − β n
,
α−β
where α and β are the roots of the equation
x2 − Ax + B = 0.
Assume that α and β are real, and prove the following theorem: The term un
of the associated Lucas sequence has a primitive divisor for n = 1, 2, 6, except
when A = B = −1 and n = 12. (Hint: Let ζ be a primitive nth root of unity,
and write
Qn =
(α − ζ k β).
1kn,
gcd(k,n)=1
Show that un will have a primitive divisor if Qn is not too small relative to the
size of n, and then show that the smallest values of Qn arise for A = 1, B = −1
and A = 3, B = 2.)
8.4 Euler Products
Recall from Chapter 1 that the Riemann zeta function has a decomposition
as an Euler product. By Theorem 1.5,
ζ(σ) =
p
1−
1
pσ
−1
,
where, as before, the product over p means a product over all the primes,
and σ > 1. We will now show that this holds for all complex s with (s) > 1.
Since it is no more difficult, we prove a more general theorem.
170
8 The Riemann Zeta Function
Theorem 8.17. If f is a multiplicative arithmetic function, and
converges absolutely, then
∞
f(n) =
∞
n=1
f(n)
f(1) + f(p) + f(p2 ) + · · · .
p
n=1
If, in addition, f is completely multiplicative, then
∞
f(n) =
p
n=1
Proof. Let
P (x) =
1
.
1 − f(p)
(f(1) + f(p) + f(p2 ) + · · · ).
px
Each factor is absolutely convergent by hypothesis, and there are a finite
number of factors, so
f(n),
P (x) =
n∈A(x)
where
A(x) = {n ∈ N | all prime factors of n are less than or equal to x}.
Now consider the difference
∞
f(n) −
f(n) |f(n)| |f(n)|.
n=1
n∈A(x)
n>x
n∈A(x)
/
The last sum tends to zero as x tends to infinity because it is the tail of a
convergent series. The identity for completely multiplicative functions follows
from the general Euler product expansion because in this case each factor of
the infinite product is a convergent geometric series.
The nonvanishing condition of Remark 1.6 on p. 15 is automatically satisfied in the setting of Theorem 8.17 for all completely multiplicative functions f
since the limit of a nontrivial convergent geometric series cannot be zero.
Much of the study of the Riemann zeta function involves complex analysis.
Definition 8.18. Let S be an open subset of C. A function f : S → C is called
complex differentiable or holomorphic on S if the limit
f(z + h) − f(z)
h→0
h
exists and is finite for all z ∈ S. If for all z ∈ S, f equals its own Taylor series
in a small neighborhood of z,
lim
f(z + h) =
then f is called analytic on S.
∞ (n)
f (z) n
h for small h,
n!
n=0
8.5 Uniform Convergence
171
Recall that all functions holomorphic on S are analytic on S and vice versa,
in contrast with the case of real functions of a real variable, where “analytic”
is a strictly stronger condition than “differentiable infinitely often.”
Our next goal is to show that the function s → ζ(s) is analytic on the
half-plane (s) > 1. It is tempting to argue as follows. Each of the individual
functions s
→ n−s is analytic, so the convergent sum should be analytic with
∞
n
derivative n=1 − log
ns . Unfortunately, an infinite sum of analytic functions
might not be analytic – indeed it might not even be continuous.
Example 8.19. Let
fn (x) =
for n 1, and let f(x) =
a geometric progression,
N
n=0
fn (x) =
∞
n=1 fn (x).
x2
1−
1
1 + x2
x2
(1 + x2 )n
We can sum the fn because they form
− 1−
1
1 + x2
x2
1 + x2
N +1
.
Now when x = 0 we can let N tend to infinity, the second term tends to
zero, and the whole
sum converges to f(x) = 1 + x2 . However, fn (0) = 0 for
∞
all n 1, so f(0) = n=1 fn (0) = 0 also. Thus the limit function f is not even
continuous, although all the summands fn are analytic on a neighborhood of 0.
The same phenomenon for complex z is seen in the region {z | | arg(z)| < π/4}.
One useful criterion to make sure that nothing goes wrong in manipulating
series of functions is uniform convergence.
8.5 Uniform Convergence
Definition 8.20. Let S ⊆ C be a nonempty set. A sequence (Fn ) converges
pointwise to F on S if, for every s ∈ S, Fn (s) → F(s). The sequence (Fn )
converges uniformly to F on S if for all > 0 there exists N = N () such that
for all n > N
|F(s) − Fn (s)| < for all s ∈ S.
The uniformity in this definition is that the number N is not allowed to
depend on s. Many useful properties of the terms of a sequence of functions
are inherited by the limit function if the convergence is uniform.
Theorem 8.21. Suppose that the sequence of functions (Fn ) converges to F
uniformly on S. If each Fn is continuous on S, then F is continuous on S.
172
8 The Riemann Zeta Function
Proof. Fix s0 ∈ S and > 0. Choose N such that, for all s ∈ S,
.
3
|F(s) − FN (s)| <
(8.20)
This is possible because the sequence of functions (Fn ) is converging uniformly
on S. Next, choose δ > 0 such that, for all s ∈ S with |s − s0 | < δ,
|FN (s) − FN (s0 )| <
.
3
(8.21)
Now we have set the stage: For all s ∈ S with |s − s0 | < δ, we have
|F(s) − F(s0 )| = |F(s) − FN (s) + FN (s) − FN (s0 ) + FN (s0 ) − F(s0 )|
|F(s) − FN (s)| + |FN (s) − FN (s0 )| + |FN (s0 ) − F(s0 )|
< + + = .
3 3 3
We have used the inequality (8.20) twice, in order to estimate the first and
third terms, and the inequality (8.21) for the second term. This proves that F
is continuous at each point s0 ∈ S.
Theorem 8.22. For every δ > 0, the partial sums of the Riemann zeta function converge uniformly on S1+δ = {s ∈ C : (s) > 1 + δ}. That is,
N
1
−→ ζ(s) as N → ∞
s
n
n=1
uniformly on S1+δ .
on
In particular, by Theorem 8.21, the Riemann zeta function is continuous
*
S1+δ = {s ∈ C : (s) > 1} = S1 .
δ>0
The convergence is not uniform on the whole of S1 .
Proof of Theorem 8.22. Notice that for any s ∈ S1+δ ,
N
∞
1 1 ζ(s) −
=
ns ns n=1
<
n=N +1
∞
n=N +1
∞
<
N
1
n1+δ
1
x1+δ
dx =
N −δ
.
δ
Given any > 0, this is less than when N is large, independently of s,
showing uniform convergence.
8.6 The Zeta Function Is Analytic
173
Exercise 8.20. (see Section 6.1) Let L denote a lattice in C with an associated Weierstrass ℘-function defined by
1
1
1
for z ∈
/ L.
(8.22)
−
℘L (z) = 2 +
z
(z − )2
2
0=∈L
Prove that
1
1
1
−→ ℘L (z)
+
−
z2
(z − )2
2
0=∈L,
||n
uniformly as n → ∞ on any compact subset of C\L.
8.6 The Zeta Function Is Analytic
The next consequence of uniform convergence is that it preserves the analyticity of complex functions.
Theorem 8.23. Suppose S ⊆ C is open, and we have a function F : S → C
and a sequence of functions FN : S → C converging to F uniformly on S. If
each FN is analytic, then F is analytic.
N
Example 8.24. The function FN (s) = n=1 n1s is analytic and converges uniformly to ζ(s) on every S1+δ , δ > 0 as in Theorem 8.22, so by Theorem 8.23,
the Riemann zeta function is analytic on S1+δ for every δ > 0 – in other
words, ζ is analytic on {s ∈ C | (s) > 1}.
Proof of Theorem 8.23. Given a fixed point a ∈ S, we have to prove that F
is analytic on a neighborhood of a. We use complex analysis, in particular
Cauchy’s formula. Let γ be a closed simple curve, that is a finite join of smooth
curves such that a ∈ Int(γ) and the closure Int(γ) ⊆ S. Then Cauchy’s formula
says that for any function f that is analytic on S, and for any b ∈ Int(γ),
1
f(z)
dz.
(8.23)
f(b) =
2πi γ z − b
Since S is open, it contains a small disk around a. We will need the following
result.
Lemma 8.25. Suppose a sequence of continuous functions GN : γ → C converges uniformly on γ to a function G : γ → C. Then G is continuous and
GN (s) ds =
G(s) ds.
lim
N →∞
γ
γ
174
8 The Riemann Zeta Function
Proof. The continuity of G follows from Theorem 8.21, so in particular G is
integrable. Now
G(s) ds − GN (s) ds = [G(s) − GN (s)] ds
γ
γ
γ
|G(s) − GN (s)| ds
γ
length(γ) max |G(s) − GN (s)|.
s∈γ
The last quantity tends to zero by the definition of uniform convergence. This
completes the proof of the lemma.
Exercise 8.21. (a) Prove Morera’s Theorem: If f is a continuous function
on a domain D ⊆ C with the property that γ f(z) dz = 0 for every closed
contour in D, then f is analytic on D.
(b) Use this to give a different proof of Theorem 8.23.
Now a magic wand can be waved to complete the proof of Theorem 8.23.
By hypothesis, the functions FN are analytic on S, so for all b ∈ Int(γ) ⊆ S,
by Cauchy’s formula Equation (8.23),
FN (s)
1
ds.
FN (b) =
2πi γ s − b
Define GN (s) =
FN (s)
s−b .
Then GN converges to G(s) =
F(s)
s−b
uniformly on γ since
F(s) − FN (s) |G(s) − GN (s)| = s−b
C max |F(s) − FN (s)|,
s∈γ
where C = maxs∈γ
1
|s−b| .
This proves that for all b ∈ Int(γ)
1
F(b) = lim FN (b) = lim
N →∞
N →∞ 2πi
1
GN (s) ds =
2πi
γ
γ
F(s)
ds.
s−b
(8.24)
Here, we have applied Lemma 8.25 to interchange a uniform limit and the
integral in the last step. Finally, recall that any function F satisfying Cauchy’s
formula (Equation (8.24)) on Int(γ) is analytic there; this may be seen as
follows. For all b ∈ Int(γ),
1
F(b + h) − F(b)
1
1
−
ds
F(s)
=
h
2πih γ
s−b−h s−b
1
F(s)
=
ds,
2πi γ (s − b)(s − b − h)
8.7 Analytic Continuation of the Zeta Function
175
since the hs cancel. In the last integral, the limit h → 0 may be taken, which
gives the derivative. Strictly speaking, we should establish uniform convergence (as with the GN above) to be able to apply Lemma 8.25 again, but this
is straightforward.
Corollary 8.26. For all s with (s) > 1,
∞
d
log n
ζ(s) = −
.
ds
ns
n=1
We will see later that many of the deeper properties of the Riemann zeta
function and their consequences for number theory take place for complex
values s = σ + it with σ < 1, and all we have done so far (including the
definition of the Riemann zeta function) does not apply to such values.
8.7 Analytic Continuation of the Zeta Function
A very important idea from complex analysis is that of analytic continuation.
Given a function f defined by a convergent power series on a disk D of positive
radius, an analytic function defined on any domain containing D that coincides
with f on D is called a continuation of f.
Exercise 8.22. Prove the Uniqueness Theorem: Suppose that G is a domain
in C, and f and g are differentiable functions on G with f(z) = g(z) for
all z ∈ S, where S ⊆ G has a limit point in G. Then f(z) = g(z) for all z ∈ G.
Example 8.27. Consider the function defined by the power series
g(s) = 1 + s + s2 + · · · ,
which converges for |s| < 1. Then g can be continued to a function that is
analytic on the whole of C except for a simple pole at s = 1. To see this, notice
−1
that for |s| < 1, g(s) = s−1
. The latter expression is defined on C apart from
a simple pole at s = 1 with residue −1. Of course, g is not defined by the
series for |s| 1.
Theorem 8.28. The Riemann zeta function has an analytic continuation
to the whole of the complex plane with the exception of a simple pole with
residue 1 at s = 1.
We will present two different proofs of this: a standard proof that may
be found in any of the books on the topic and an alternative proof no doubt
known to the experts but not readily available in the literature.
In Chapter 9, we will find a functional equation for the Riemann zeta
function. Using this, Theorem 8.28 may be deduced from the weaker statement
that there is a continuation to (s) > 0. The first proof is of this weaker
statement.
176
8 The Riemann Zeta Function
Theorem 8.29. The Riemann zeta function has an analytic continuation to
the set {s ∈ C | (s) > 0} with the exception of a simple pole at s = 1 with
residue 1.
First (Standard) Proof of Theorem 8.29. This involves a careful use
of the Euler Summation Formula – the care is needed because the formula as
stated only applies to finite intervals. Assume first that (s) > 1; then, by
the Euler Summation Formula,
N
N
N
1
−s{t}
−s
=
t dt +
dt.
(8.25)
s
n
ts+1
1
1
n=2
The first term is
t1−s
1−s
N
=
1
N 1−s
1
,
−
1−s
1−s
and since we suppose (s) > 1, we get N 1−s → 0 as N tends to infinity. The
second term in Equation (8.25) also converges as N tends to infinity since
∞
∞
−s{t} |s|
dt
dt < ∞.
s+1
σ+1
t
t
1
1
∞ n+1 −s{t}
This shows that n=1 n
ts+1 dt is absolutely convergent, so we are justified in writing
∞
∞
1
{t}
1
=
1
−
dt.
(8.26)
ζ(s) = 1 +
−
s
s
1+s
n
1
−
s
t
1
n=2
Lemma 8.30. The integral in Equation (8.26) represents an analytic function
on the range (s) > 0.
Proof. Write
I(s) =
where fn (s) =
n+1
n
∞
fn (s),
n=1
{t}
ts+1
dt. We will prove that for any δ > 0
(a) the series for I(s) converges uniformly on (s) > δ and
(b) each fn is analytic on the range (s) > δ.
Then we may apply Theorem 8.23 to complete the proof that I(s) is analytic
on (s) > 0. As for (a),
∞
N
∞
fn (s) = fn (s) |fn (s)|
I(s) −
n=1
n=N +1
n=N +1
−σ ∞
∞
1
t
dt
=
σ+1
t
−σ N +1
N +1
=
(N + 1)−σ
,
σ
8.7 Analytic Continuation of the Zeta Function
177
and in absolute value this is smaller than δ(N 1+1)δ , so it tends to zero. The
bound depends on δ only, not on σ or s, which proves the uniform convergence.
For (b), consider the difference quotient
fn (s + h) − fn (s)
1
=
h
h
n+1
n
{t}
ts+1
1
− 1 dt.
th
(8.27)
Use a first-order Taylor approximation for the exponential
t−h = e−h log t = 1 − h log t + f(h, t),
where
f(h, t) = O (h log t)2 .
(8.28)
Substituting into Equation (8.27) gives
1
(fn (s + h) − fn (s)) =
h
n+1
n
{t}
ts+1
− log t +
1
f(h, t) dt.
h
The left-hand side for small h should be close to the derivative of fn , and we
can make an intelligent guess at what this derivative will be:
n+1
n+1
1
{t}
1
(fn (s + h) − fn (s)) +
log
t
dt
|f(h, t)| dt.
h
s+1
σ+1
t
ht
n
n
The right-hand side tends to zero as h → 0 by Equation (8.28). This completes
the first proof of the analytic continuation of the zeta function to (s) > 0.
Lemma 8.30 gives the analytic continuation of ζ to the half-plane (s) > 0
by Equation (8.26), apart from a simple pole at s = 1 with residue 1.
A natural question is to ask why it was necessary to split the integral from 1
to ∞ into a sum of subintegrals. The reason is that the Taylor approximation
in Equation (8.28) is only valid for bounded values of h log t. If we had tried to
make a similar argument for the integral from 1 to ∞, t would be unbounded
and the quantity h log t would be unbounded. By the splitting of the integral,
we had only to consider t ∈ [n, n + 1] for a fixed n at each stage.
These are treacherous waters! We are catching a glimpse here of how quite
reasonable questions about the Riemann zeta function turn out to involve
potentially subtle analytic problems. The methods we have just used can be
squeezed to give a little more. The second proof of Theorem 8.29 (given below)
will give an analytic continuation to the whole plane.
Exercise 8.23. (a) Show that
A(s) = 1 −
1
1
1
+ s − s + ···
s
2
3
4
178
8 The Riemann Zeta Function
is analytic for (s) > 0.
1
ζ(s), and deduce the analytic continuation
(b) Show that A(s) = 1 − 2s−1
of the Riemann zeta function to (s) > 0.
(c) Repeat (a) and find a suitable analog of (b) for
B(s) = 1 +
1
1
1
1
1
− s + s + s − s + ··· .
2s
3
4
5
6
(d) Deduce from (c) that the only pole of ζ in (s) > 0 is at s = 1.
Exercise 8.24. Prove that the Laurent expansion of ζ about s = 1 begins
ζ(s) =
1
+ γ + φ(s),
s−1
(8.29)
where φ denotes a function that is analytic at s = 1 and vanishes there.
The next proof gives the full continuation to the whole complex plane
(with the exception of the simple pole at s = 1).
Second Proof of Theorem 8.28. Consider
∞
−1
1
x−s dx =
=
.
1
−
s
s
−
1
1
We subtract this from ζ(s) to remove the pole at s = 1. For (s) > 1,
∞
∞ n+1
1
1
=
−
x−s dx
ζ(s) −
s − 1 n=1 ns n=1 n
1
∞ 1
−s
−
(n + x) dx
=
ns
0
n=1
1
∞
x −s
1
1−
1+
dx .
(8.30)
=
ns
n
0
n=1
The sums involved here converge absolutely in the region (s) > 1.
Now assume that we have continued the zeta function to the domain
(s) > 1 − K
for some integer K 0. We want to continue it further to (s) > −K. To do
this, put h = x/n and use a Taylor approximation for
fs (h) = (1 + h)−s
of order K. Recall that the Taylor polynomial of degree K for fs at h = 0 is
defined by3
3
If you only want the continuation to (s) > 0, think of K = 1: The Taylor
polynomial for (1 + x/n)−s in this case is simply 1 − sx
, and the error term
n
is O(n−2 ). Substitute Equation (8.31) into Equation (8.30), and you get a series
convergent for (s) > 0.
8.7 Analytic Continuation of the Zeta Function
Tf,s,K (h) =
K
(k)
fs (0)
k=0
k!
hk .
179
(8.31)
We have to calculate higher derivatives of fs at h = 0. These are given by
setting h = 0 in the relation
fs(k) (h) =
(−1)k (s + k − 1)(s + k − 2) · · · (s + 1)s
fs (h),
(h + 1)k
which is easy to prove by induction on k. Since fs is analytic on the neighborhood of h = 0, we have an estimate for the error term of the form
(K+1) (h )
fs
|h|K+1
(8.32)
|fs (h) − Tf,s,K (h)| (K + 1)!
for some h with |h | < |h|. For bounded values of s, this is O(|h|K+1 ). Use the
Taylor polynomial with h = x/n in Equation (8.30). We evaluate the inner
integral first:
1
0
K
(k)
x −s
fs (0)
1
1+
dx = 1 +
+ O K+1 .
n
(k + 1)! nk
n
k=1
Putting this into Equation (8.30) gives the nice identity
(−1)k (s + k − 1) · · · (s + 1)s
1
=−
ζ(s + k)
s−1
(k + 1)!
k=1
1
∞
x x −s
1
−
1
+
+
T
dx.
f,s,K
ns 0
n
n
n=1
K
ζ(s) −
(8.33)
The last sum converges by the inequality (8.32), and for all s = 0, −1, −2, . . .
the values
ζ(s + 1), ζ(s + 2), . . . , ζ(s + K − 1)
are all defined by hypothesis, giving the continuation of the zeta function to
(s) > −K.
In the case s = −m = 0, −1, −2, . . . , 1 − K, one of the arguments of ζ in the
first summand of Equation (8.33) becomes 1, but this is no problem since the
pole is cancelled out by the appropriate factor (s + m) in the coefficient. (The
right-hand side of Equation (8.33) has a removable singularity there.)
Thus, by induction, there is an analytic continuation of the zeta function
to
(s) > −1, −2, . . . ,
in fact to the whole of the complex plane.
180
8 The Riemann Zeta Function
Exercise 8.25. Recall that π(x) denotes the number of primes less than or
equal to x. Prove that if s = σ + it with σ > 1, then
∞
π(x)
dx.
log ζ(s) = s
s − 1)
x(x
2
To do this, convert the sum over all primes to a sum over all natural numbers
using
1
if n is prime,
π(n) − π(n − 1) =
0
otherwise.
The next exercise is an easy version of what has been done for the Riemann
∞
zeta function ζ(s) = n=1 e−s log n .
Exercise 8.26. Define a function F by
F (s) =
∞
e−ns .
n=0
(a) Find the domain of convergence for this series.
(b) Prove that the series converges uniformly for (s) > δ for any fixed δ > 0.
(c) Using (b), find the derivative of F by differentiating term by term.
(d) Find a simple expression for F by viewing it as a geometric progression
and summing it.
(e) Differentiate this closed form and expand the answer using the Binomial
Theorem. Check that you get the series in (c) again.
(f) Obtain the analytic continuation of F to the whole complex plane. Describe
the location and order of all the poles of F .
(g) Compute F (−1)
Exercise 8.27. Prove that
1
ζ(s)
=
∞
µ(n)
for (s) > 1.
ns
n=1
Exercise 8.28. Prove that ζ 2 (s) =
Exercise 8.29. Prove that
Exercise 8.30. Prove that
ζ(s−1)
ζ(s)
∞
d(n)
for (s) > 1.
ns
n=1
=
m,n∈N;
gcd(m,n)=1
∞
φ(n)
for (s) > 2.
ns
n=1
1
ζ 2 (2)
.
=
2
(mn)
ζ(4)
Notes to Chapter 8: A treatment of the Prime Number Theorem at a level
similar to ours may be found in Jameson’s book [81]; this book also includes Selberg’s elementary proof of the Prime Number Theorem [136] (not using analytic
8.7 Analytic Continuation of the Zeta Function
methods). The quantity θ(n) =
pn
181
log p from Lemma 1.8 plays a central role
→ 1 as n → ∞ is equivalent
in the elementary proof; indeed the statement θ(n)
n
to the theorem. Erdös [52] also published an elementary proof of the Prime Number Theorem; a careful account of the controversy is provided by Goldfeld [70]. An
accessible proof of the important inequality (8.8) may be found in Spivak’s lovely
book [145, p. 543]. Exercise 8.6(b) is a special case of a result due to Moss [111].
The implications (8.9) are in the book of Apostol [4, p. 91]. Tchebychef’s proof of
Exercise 8.7 appeared originally in his paper [152] of 1852. Exercise 8.9 is due to
Gandhi [65]; an accessible treatment is in a paper of Golomb [71]. Exercise 8.11(a)
and (c) are taken from a paper of Brent [20]; Exercise 8.11(b) comes from a paper of
Jänichen [82]. The proof of Zsigmondy’s Theorem in Section 8.3.1 is adapted from
a more general result of Schinzel [133]. A readable account of Exercise 8.19 appears
in a paper of Yabuta [165]. For further reading on Section 8.7, consult Apostol [4] or
Titchmarsh [153]. Edwards’ book [47] contains a translation of Riemann’s original
paper [128]. Fourier analysis – and its more august cousin, harmonic analysis – plays
a central role in number theory. For sophisticated accounts, see Tate’s thesis [150]
or Weil’s book [158]. A more accessible account of this advanced material may be
found in the book of Ramakrishnan and Valenza [121].
9
The Functional Equation of the
Riemann Zeta Function
A functional equation is simply an identity involving functions. The trigonometric identity sin2 (x) + cos2 (x) = 1 is one of the most familiar examples. In
this chapter, a highly nontrivial functional equation satisfied by the Riemann
zeta function is found, that allows the function to be extended to the whole
complex plane (apart from the singularity at s = 1).
The proof is difficult, and it might seem that we have strayed far from the
arithmetic path that started with the Fundamental Theorem of Arithmetic.
However, the proof relies crucially on some observations that arose because
of the way mathematicians, particularly in the nineteenth century, thought
about functions. If a function f : C → C has distinct zeros z1 , z2 , . . . , then it
seems natural to “factorize” it, and hope that
f(z) = (z − z1 )(z − z2 ) · · · .
Of course, convergence issues arise, and occasionally some careful doctoring
is needed to make this idea precise and useful.
9.1 The Gamma Function
The Gamma function is one of many classical special functions. It is surprising how the Gamma function helps us to understand properties of the zeta
function and some other arithmetic problems.
Definition 9.1. The Gamma function Γ is defined by
∞
Γ (s) =
e−t ts−1 dt
(9.1)
0
for any s ∈ C with (s) > 0.
Exercise 9.1. Prove that the integral in Equation (9.1) exists for any s ∈ C
with (s) > 0.
184
9 The Functional Equation of the Riemann Zeta Function
As with the Riemann zeta function, it is important to establish the analytic
properties of the Gamma function.
Exercise 9.2. Prove that Γ (s) is an analytic function of s for (s) > 0 by
proving the following statements.
n+1
∞
(a) Γ (s) = n=0 Γn (s), where Γn (s) = n e−t ts−1 dt.
N
(b) For any fixed δ > 0, n=0 Γn (s) → Γ (s) uniformly on {s ∈ C | (s) > δ}
as N → ∞.
(c) Γn is analytic for any n 0.
∞
All these steps are very similar to the argument for 0 t{t}
s+1 dt.
Later on, another (better) proof that Γ is analytic will be given.
Lemma 9.2. The Gamma function has the following properties.
(1) For all s with (s) > 0,
Γ (s + 1) = sΓ (s).
(2) For all integers N 0,
Γ (N + 1) = N !.
Proof. The first relation is found by integrating,
∞
'∞
&
e−t ts dt = −e−t ts 0 + s
Γ (s + 1) =
0
∞
e−t ts−1 dt.
0
The first term vanishes at t = 0 because (s) > 0.
The second statement follows from the first by induction together with the
easy calculation that Γ (1) = 1.
Proposition 9.3. The Gamma function can be analytically continued to all
of C, where it is analytic apart from simple poles at 0, −1, −2, . . . and so on.
Proof. By Lemma 9.2(1), we may write
1
Γ (s + 1).
s
The right-hand side is defined for (s) > −1 apart from s = 0, where it has
a simple pole with residue Γ (1) = 1. Iterating this gives
Γ (s) =
Γ (s) =
1
Γ (s + 2).
s(s + 1)
(9.2)
The right-hand side of Equation (9.2) is defined for (s) > −2, apart from s =
0, −1, where there are simple poles again. In this way, we can inductively
continue the Gamma function to the whole plane, where it is analytic apart
from simple poles at 0, −1, −2, . . . .
Theorem 9.4. Γ (s) = 0 for all s ∈ C.
This will be proved in Section 9.6.
9.2 The Functional Equation
185
9.2 The Functional Equation
Our goal throughout this chapter will be the proof of the following theorem.
Theorem 9.5. [The Functional Equation] Let
s
F(s) = π −s/2 Γ
ζ(s)
2
for (s) > 0. Then F satisfies the functional equation
F(1 − s) = F(s).
Corollary 9.6. The function F has an analytic continuation to the whole
complex plane apart from poles at 1 and 0. The Riemann zeta function has
an analytic continuation to the complex plane where it is analytic apart from
a simple pole at s = 1. The zeta function vanishes at negative even integers.
Proof. Expand Theorem 9.5 to give
s
1−s
−(1−s)/2
ζ(s),
π
ζ(1 − s) = π −s/2 Γ
Γ
2
2
π −s+1/2 Γ 2s ζ(s)
ζ(1 − s) =
.
Γ 1−s
2
(9.3)
We know that Γ (s) has a simple pole at s = −m for m ∈ N. Thus, for
all m ∈ N,
ζ(−2m) = 0
since 1 − s = −2m if and only if s = 2m + 1, ζ(s) = 0 for (s) > 1 by
the Euler product expansion, and Γ = 0 everywhere. The case s = 1 is
different: Here the right-hand side has a simple pole in the numerator, too
(in ζ), cancelling the one in Γ . Thus ζ(s) is analytic and nonzero at s = 0.
By the functional equation, the values of F(s) for (s) 1/2 determine all
of F. We found ζ(−2m) = 0 for all m ∈ N, and there are no more zeros of ζ
with (s) < 0 because Γ has no other poles by Equation (9.3). Also, ζ(s) = 0
for (s) > 1 because of the Euler product expansion (see Remark 1.6).
In the course of the proof, we found a set of special values of the zeta
function at negative even integers. Later we will see that negative odd integers
yield rational values of the zeta function (see Exercise 9.10 on p. 204).
9.2.1 The Riemann Hypothesis
Corollary 9.6 gives another proof that the Riemann zeta function can be
continued to the whole plane, where it is analytic apart from a simple pole
at s = 1. Moreover, any nontrivial zero of ζ must lie in the critical strip defined
by 0 (s) 1. Riemann stated without proof the following conjecture.
186
9 The Functional Equation of the Riemann Zeta Function
Conjecture 9.7. [The Riemann Hypothesis] All zeros of ζ in the critical
strip 0 (s) 1 have (s) = 12 .
This is still an open problem, and its resolution is viewed as one of the outstanding open problems in mathematics. All the zeros found thus far (the
first ten billion are known) lie on the line (s) = 12 , and they are all simple. Figure 9.1 shows (ζ( 12 + it)) for 0 t 60, which already shows the
extraordinary subtlety and complexity of the zeta function along the critical
line.
30
20
10
0
10
20
30
40
50
60
Figure 9.1. The graph of (ζ( 21 + it)) for 0 < t 60.
Just as the Prime Number Theorem is equivalent to a statement about the
partial sums of the Möbius function, the Riemann Hypothesis is equivalent to
the statement that for every ε > 0
µ(n) = O(xε+1/2 ).
nx
Growth properties of the Möbius function are very delicate, and the numerical
evidence can be deceptive. A long-standing conjecture of Mertens, supported
by a great deal of numerical evidence, was that
√
µ(n) < x;
nx
this was eventually disproved by Odlyzko and te Riele in 1985.
It is reasonable to ask why certain problems, such as the Riemann Hypothesis, obtain legendary status. Certainly this one has attracted considerable
folklore. David Hilbert is reputed to have said that if he were to be awoken in
9.3 Fourier Analysis on Schwartz Spaces
187
a thousand years, the first question he would ask would be about the status
of the Riemann Hypothesis. Many mathematicians believe it must be true,
although some great figures have been sceptical. The explanation for its importance is multi-faceted. On the one hand, its statement has great beauty
and simplicity, while the many unsuccessful attempts to resolve it have driven
forward sophisticated methods in number theory. On the other hand – perhaps
more germane to our study – the Riemann Hypothesis is intimately connected
to the distribution of the primes. Many results in analytic number theory can
be proved in stronger forms if the Riemann Hypothesis is assumed. Less obviously, but perhaps most importantly, the Riemann Hypothesis seems to lie
at the heart of future developments in the area of overlap between number
theory, geometry and analysis. Workers in this area sometimes need an almost prophetic insight that can lead to layers of conjectures about how hard
unsolved problems will eventually be cracked. Much of this has to do with
functions that generalize the Riemann zeta function, called L-functions (we
will encounter an L-function in the next chapter.) The Riemann Hypothesis
seems to be a basic example of a whole series of results that will be needed
to make progress in this area. Finally, in addition to its central role in number theory, the Riemann Hypothesis is conjectured to relate to problems in
physics – the zeros of the zeta function corresponding to the eigenvalues of an
appropriate Hermitian operator.
The Clay Mathematics Institute1 has offered a million dollars for a proof
of the Riemann Hypothesis. The prize is not on offer for a disproof, say by
giving a counterexample.
9.3 Fourier Analysis on Schwartz Spaces
For the proof of the functional equation in Theorem 9.5, we will need some
Fourier analysis.
Definition 9.8. The Schwartz space S is the set of functions f : R → C that
are infinitely differentiable and whose derivatives f (n) (including the function
itself f (0) = f) all satisfy
m
(1 + |x|) f (n) = O(1)
(9.4)
for all m ∈ N. The bound in O(1) may depend upon m and n.
Example 9.9. The Gaussian function f(x) = e−x is in S.
2
1
On May 24th 2000, the Clay Mathematics Institute established seven Millennium
Prize Problems, each worth one million dollars, including the Riemann Hypothesis
because “they are important classic questions that have resisted solution over the
years.”
188
9 The Functional Equation of the Riemann Zeta Function
Notice that S is a complex vector space and that any function f ∈ S is
integrable,
∞
∞
∞
1
f(x)
dx
|f(x)|
dx
C
dx < ∞,
2
−∞
−∞
−∞ (1 + |x|)
just by taking n = 0 and m = 2 in Equation (9.4).
Definition 9.10. For any function f ∈ S, the Fourier transform of f is the
function
∞
%f(y) =
f(x)e−2πixy dx.
−∞
The integral exists for the same reason as before,
∞
|%f(y)| |f(x)| dx < ∞,
−∞
and in fact %f ∈ S again since we may apply Equation (9.4) with m = n to get
the bound for %f (n) .
Thus f → %f is a linear map from S to S. It turns out that this
map has a fixed point
√ – a function equal to its Fourier transform. Recall
2
∞
that −∞ e−x dx = π.
Lemma 9.11. If f(y) = e−πy , then %f(y) = f(y).
2
Proof.
%f(y) =
∞
e−πx e−2πixy dx.
2
−∞
The idea is to complete the square,
−π(x2 + 2ixy) = −π[(x + iy)2 + y 2 ],
so the Fourier transform of f is
%f(y) = e−πy2
∞
e−π(x+iy) dx.
2
−∞
Let
∞
I(y) =
e−π(x+iy) dx.
2
−∞
We know that I(0) = 1. What happens if y = 0? Fix some large N and
consider the following paths:
γ1 = [−N, N ],
γ2 = [N, N + yi],
γ3 = [N + yi, −N + yi], γ4 = [−N + yi, −N ].
9.4 Fourier Analysis of Periodic Functions
189
Put γ = γ1 + γ2 + γ3 + γ4 (a rectangle). Since e−πz is an analytic function
on the whole of the complex plane, we have, for any N 0,
2
e−πz dz = 0.
2
γ
Now, as N → ∞, the integral of e−πz over γ1 tends to I(0) = 1, the integral
over γ3 tends to −I(y), and the integrals over γ2 and γ4 both tend to 0,
as N → ∞. This completes the proof of Lemma 9.11.
2
Exercise 9.3. Prove that
N +yi
N
e−z dz → 0 as N → ∞ for any y ∈ R.
2
9.4 Fourier Analysis of Periodic Functions
Fourier analysis is more familiar in the setting of periodic functions.
Definition 9.12. A function g : R → C is periodic with period 1 if
g(x) = g(x + 1) for all x ∈ R.
If g is periodic and piecewise continuous, then its kth Fourier coefficient is
defined for k ∈ Z by
1
g(x)e−2πikx dx,
ck =
0
and its Fourier series is the function
G(x) =
ck e2πikx .
k∈Z
Lemma 9.13. If g is periodic and twice differentiable with continuous second
derivative, then there exists a constant C > 0, depending only upon g, such
that
C
|ck | 2
k
for all k = 0.
Proof. Integrate by parts:
−e−2πikx g(x)
ck =
2πik
1
+
0
0
1
e−2πikx g (x)
dx.
2πik
Now the bracketed term vanishes because g is periodic. Integrate by parts
again, so that k 2 appears
1in the denominator, and then bound the exponential
by 1. Finally, put C = 0 |g | dx/(4π 2 ).
190
9 The Functional Equation of the Riemann Zeta Function
Theorem 9.14. Any function g that is periodic and differentiable infinitely
often has a Fourier series expansion
g(x) =
ck e2πikx
k∈Z
that is uniformly convergent on R.
Proof. Let G be the Fourier series of g, and apply Lemma 9.13:
n
1
2πikx ck e
,
G(x) −
C
k2
|k|>n
k=−n
where the last sum tends to zero independent of x since the constant C depends only on g. This proves the convergence is uniform.
The equality g(x) = G(x) is not so easy to prove. We first record a few
lemmas that are of interest in their own right.
Lemma 9.15. Consider the sequence of functions (DK ) defined by
DK (x) =
K
for K ∈ N,
e2πikx ,
k=−K
called the Dirichlet kernel. Then
1
DK (x) dx = 1,
(9.5)
0
DK (x) =
and
sin((2K + 1)πx)
,
sin(πx)
1
g(y + x)DK (x) dx =
0
K
ck e2πiky ,
(9.6)
(9.7)
k=−K
where ck are the Fourier coefficients of g as in Theorem 9.14.
The functions DK are useful because they concentrate at the origin and
pick out the Fourier coefficients conveniently. The shape of DK is illustrated
in Figure 9.2, that shows the graph of D11 .
Proof of Lemma 9.15. Equation (9.5) follows from the fact that
1
e2πikx dx = 0 for all k = 0.
0
Equation (9.6) is proved by induction on k or directly by summation of a
geometric progression. Equation (9.7) follows since
9.4 Fourier Analysis of Periodic Functions
191
20
15
10
5
Figure 9.2. The Dirichlet kernel D11 (x) for − 12 x 12 .
1
y
g(z)DK (z − y) dz
g(y + x)DK (x) dx =
0
y−1
1/2
=
−1/2
g(z)DK (y − z) dz.
In the last step, we have used the fact that g and DK are periodic functions
and that DK is an even function. At this point, we put in the definition of
the DK , interchange the integral and the sum, and extract a factor e2πiky
from each summand, which gives the right-hand side of Equation (9.7).
Lemma 9.16. [Riemann–Lebesgue Lemma] Let g be a continuous periodic
function, and let ck be the kth Fourier coefficient of g. Then
lim ck = 0.
|k|→∞
192
9 The Functional Equation of the Riemann Zeta Function
Proof of Lemma 9.16. Define for continuous complex-valued periodic functions u, v the inner product
1
u(x)v(x) dx
(u, v) =
0
and the norm
u =
+
(u, u).
Let uk (x) = e2πikx so that ck = (g, uk ). Using the linearity of the inner product
and the orthogonality relations
0 if k = ,
(uk , u ) =
1 if k = ,
we get
,2 ,
K
K
K
,
,
,
,
(g, uk )uk , = g −
(g, uk )uk , g −
(g, uk )uk
,g −
,
,
k=−K
k=−K
k=−K
K
= (g, g) −
|(g, uk )|2
k=−K
K
= g2 −
|(g, uk )|2 .
k=−K
Since the left-hand side is nonnegative, the sum
∞on the right-hand side must
be bounded independently of K, so the series k=−∞ |ck |2 converges. In particular, ck → 0 as |k| → ∞.
An immediate consequence of the Riemann–Lebesgue Lemma is
1
g(x) sin(kx) dx −→ 0 as k → ∞.
ck + c−k = 2i
(9.8)
0
Now we are ready to complete the proof of Theorem 9.14. By Lemma 9.15,
the partial sums of the Fourier series are given by the left-hand side of Equation (9.7). We manipulate this integral a little, using Equation (9.6):
1
g(y + x)DK (x) dx =
0
1/2
−1/2
g(y + x) − g(y)
sin((2K + 1)x) dx
sin(πx)
1/2
+
−1/2
g(y)DK (x) dx.
(9.9)
The last integral in Equation (9.9) simply equals g(y) for all K by the property
in Equation (9.5) of the Dirichlet kernel. For the first summand, observe that
9.4 Fourier Analysis of Periodic Functions
193
g(y + x) − g(y)
sin(πx)
is a periodic continuous function for x ∈ [−1/2, 1/2] (the limit in x = 0 exists
by l’Hôpital’s rule). By Equation (9.8), this implies that the first summand
tends to zero as K tends to infinity.
Theorem 9.17. [Poisson Summation Formula] Suppose that f belongs to
the Schwartz space S. Then
%f(m).
f(m) =
m∈Z
Proof. Let
g(x) =
m∈Z
f(x + m),
m∈Z
which is certainly convergent since f ∈ S. Clearly, g is periodic. Moreover, g
is differentiable infinitely often since
(n)
1
(n)
f
(x
+
m)
(x
+
m)
,
f
C
(1 + |x + m|)2
|m|>N
|m|>N
|m|>N
where the last series tends to zero for |x| bounded. Therefore the nth derivatives of the partial sums converge uniformly by periodicity for all n 0. We
cannot apply Theorem 8.23 since the functions fn are not necessarily analytic.
However, we can use Lemma 8.25 as follows. Let γ be the real interval [1, x],
let GN be the N th partial sum of the derivatives fn , and use the fundamental
theorem of calculus to see that
x
d
GN (t) dt = GN (x) − GN (0).
dx 0
The integral converges to g(x) − g(0) as N → ∞, and similarly for higher
derivatives, using induction. It follows that g is n times differentiable and
its nth derivative is the limit of that of the partial sums, so we may do Fourier
analysis on g.
Let ck be the kth Fourier coefficient of g. Then, by Theorem 9.14,
∞
g(x) =
ck e2πikx ,
g(0) =
k=−∞
=
∞
m=−∞
0
1
∞
0 m=−∞
1
ck .
k=−∞
On the other hand,
1
−2πikx
ck =
g(x)e
dx =
0
∞
f(x + m)e−2πikx dx.
f(x + m)e−2πikx dx
(9.10)
194
9 The Functional Equation of the Riemann Zeta Function
This interchange of sum and integral is justified because the series for g converges uniformly by Lemma 8.25 again. Multiply each summand by a factor
of e−2πikm = 1 and substitute x + m for x to find
1
∞
∞
ck =
f(x + m)e−2πik(x+m) dx =
f(x)e−2πikx dx
m=−∞
−∞
0
= %f(k).
Now
∞
∞
f(m) = g(0) =
m=−∞
ck =
k=−∞
∞
%f(k)
k=−∞
by Equation (9.10). This completes the proof of the Poisson Summation Formula.
9.5 The Theta Function
Another classical special function we need is the theta function. This satisfies
a surprising functional equation, which plays a key role in the proof of the
functional equation for the zeta function.
Theorem 9.18. For real y > 0, define the theta function by
θ(y) =
∞
e−n
2
πy
.
n=−∞
Then
θ
1
√
= y θ(y).
y
Proof. This relation is far from obvious and looks barely possible. The series
defining θ converges uniformly in the range y > δ for any fixed δ > 0. Fix
2
some real b > 0 and define, with f(y) = e−πy as in Lemma 9.11,
fb (y) = f(by) = e−πb y .
2 2
Of course, fb is in the Schwartz space S, so we may apply the Poisson Summation Formula (Theorem 9.17) to obtain
∞
fb (n) =
n=−∞
Next, we need to compute %fb (y):
∞
n=−∞
%fb (n).
(9.11)
9.5 The Theta Function
%fb (y) =
∞
−∞
fb (x)e−2πixy dx =
∞
f(bx)e−2πixy dx.
195
(9.12)
−∞
Now put u = bx, so dx = 1b du. Thus, Equation (9.12) becomes
∞
y
1 y
%fb (y) = 1
.
f(u)e−2πiu b du = %f
b −∞
b b
Apply Lemma 9.11 to this equation to see that
%fb (y) = 1 f y .
b b
Put this result into Equation (9.11), and use the definition of f again to obtain
∞
e−πb
2
n2
∞
1 −πn2 /b2
e
.
b n=−∞
=
n=−∞
Finally, put b =
√
y and the functional equation for θ emerges.
We are now ready for the proof of the functional equation of the zeta
function.
Proof of Theorem 9.5. We begin with
∞
s ∞
dx
=
Γ
e−x xs/2−1 dx =
e−x xs/2
,
2
x
0
0
so, in the domain (s) > 1 + δ,
F(s) = π
−s/2
∞ n=1
∞
0
Next, replace x by πn2 y in the integral. This means
cancellation we get
F(s) =
0
dx
.
x
n−s e−x xs/2
∞
∞
n=1
e−πn y y s/2
2
dx
dy
=
x
y,
and after some
dy
.
y
(9.13)
The interchange of the integral and sum is permitted because the series for
the zeta function converges uniformly on (s) > 1 + δ for any fixed δ > 0.
Define
∞
2
θ(y) − 1
e−πn y =
.
g(y) =
2
n=1
Split the integral in Equation (9.13) into 0 y 1 and 1 y < ∞,
1
∞
dy
dy
+
.
y s/2 g(y)
y s/2 g(y)
F(s) =
y
y
1
0
196
9 The Functional Equation of the Riemann Zeta Function
In the second integral, change y to z = y −1 , so that it becomes an integral
dz
over the region ∞ > z 1, and dy
= − yz . Thus
F(s) =
∞
y
1
s/2
dy
+
g(y)
y
∞
z −s/2 g(z −1 )
1
dz
.
z
(9.14)
The Poisson Summation Formula (Theorem 9.17) gave us Theorem 9.18,
which may be applied to give
√
y θ(y) − 1
θ(y −1 ) − 1
g(y −1 ) =
=
2
2
√
√
√
y (θ(y) − 1) + y − 1 √
y−1
=
= y g(y) +
.
2
2
Substituting this into Equation (9.14),
∞
∞
dy
dy
y s/2 g(y)
y (1−s)/2 g(y)
F(s) =
+
y
y
1
1
∞
dy
1
.
y −s/2 y 1/2 − 1
+
2 1
y
(9.15)
Let J denote the third integral in Equation (9.15). Then
∞
(1−s)/2
y −s/2
y
−
y −(1+s)/2 − y −(2+s)/2 dy =
(1 − s)/2
−s/2 1
1
1
1
1
−1
−
−
=2
.
=2
1−s s
s−1 s
∞
2J =
(9.16)
Lemma 9.19. For all z ∈ C, the function
∞
dy
G(z) =
y z g(y)
y
1
is analytic.
Assuming this for the moment, we have, by Equations (9.15) and (9.16),
1
1−s
s
1
+G
+
F(s) = G
− ,
2
2
s−1 s
so F(1 − s) = F(s), and F is analytic for all s ∈ C apart from simple poles
at s = 1 and s = 0, completing the proof of Theorem 9.5.
All that remains is to prove the lemma.
∞
Proof of Lemma 9.19. Write G(z) = n=1 Gn (z), where
n+1
y z g(y) dy.
Gn (z) =
n
9.6 The Gamma Function Revisited
197
We will prove that the Gn are analytic functions on all of C and then use
a uniform convergence argument. Consider the difference quotient for Gn (z)
(exactly as in the standard proof of Theorem 8.29 for the analytic continuation
of the zeta function on p. 176),
1
h
n+1
n
1
y z y h − 1 g(y) dy =
h
n+1
y z g(y)(1 + h log y + ρ(h, y) − 1) dy,
n
where ρ(h, y) = O(h2 ) for bounded values of y. One may therefore divide by h
and take the limit h → 0.
Next, we prove that the partial sums of the Gn converge uniformly on a
suitable domain. Consider z in the half-plane (z) < K for some fixed K.
There
∞
∞
z
y
g(y)
dy
y K |g(y)| dy.
(9.17)
N
N
Now we estimate |g(y)|:
|g(y)| =
∞
e−πn
2
y
n=1
∞
e−πny =
n=1
e−πy
.
1 − e−πy
The denominator is clearly bounded below for 1 < y, so the right-hand side of
the inequality (9.17) is finite for N = 1, say. As an immediate consequence, the
integrals from N to infinity must tend to zero, and all this was independent
of z. Now we may apply Theorem 8.23 and deduce that G is analytic on
the half-plane (z) < K. Since K was arbitrary, G is analytic on the whole
complex plane.
9.6 The Gamma Function Revisited
We have seen that the zeta function and the Gamma function go together
like Hardy and Wright. We need to know some additional properties of Γ
(in particular that Γ (s) = 0 for all s ∈ C) in order to understand the zeta
function better.
Theorem 9.20. [Weierstrass] Define a function f by
γs
f(s) = se
∞ (
n=1
1+
s −s/n )
e
,
n
where γ is the Euler–Mascheroni constant. Then f is an analytic function on
the whole of the complex plane, and it is zero at 0, −1, −2, −3, . . . only.
This rather mysterious function turns out to satisfy f(s) = 1/Γ (s), giving
another formula for the Gamma function and incidentally proving that Γ (s)
198
9 The Functional Equation of the Riemann Zeta Function
is nonzero for all s ∈ C (Theorem 9.4). The argument may appear at first
sight an infuriating piece of magic, but it appears more reasonable when
thought of as a (functional) factorization. We know that Γ has simple poles
at 0, −1, −2, . . . , so 1/Γ must have zeros there. The most naı̈ve approach is
to look for a factorization of 1/Γ (s) in the form
Cs(s + 1)(s + 2) · · · ,
but this expression clearly does not converge. Trying to correct the most
obvious defect (that the terms do not converge to 1) would lead one to look
for expressions such as
Cs(1 + s)(1 + s/2)(1 + s/3) · · · ,
∞
but this is still not convergent because n=1 ns is not convergent. What is
needed is a factorization in which the terms converge to 1 fast enough to
guarantee convergence of the infinite product. The argument below gives a
quadratic rate of convergence. This kind of adjustment became a standard
tool in nineteenth-century analytic number theory, and we will encounter it
several times.
Proof of Theorem 9.20. Consider the function
g(s) =
∞
gn (s) =
n=1
∞ (
n=1
s s)
− .
log 1 +
n
n
(9.18)
Each gn is analytic except at −1, −2, −3, . . . . We want to prove that the
series Equation (9.18) converges uniformly on {s ∈ C : |s| < K} for every
fixed K > 0. Choose N > 2K, so for all n N , |s/n| 1/2, and therefore
s
1 s 2 1 s 3
s
= −
+
− ··· .
log 1 +
n
n 2 n
3 n
Thus we can estimate gn (s) for all these s and n by
1 s 2 1 s 3
+ + ···
2 n
3 n
|s|2
1
2K 2
|s|2
2 2 < 2 .
2
n 1 − |s|/n
n
n
|gn (s)| This can be summed from n = N to give
∞
∞
2K 2
gn (s) ,
n2
n=N
n=N
and the latter is arbitrarily small if N is large, as it is the tail end of a
convergent series. Thus the series Equation (9.18) is a uniformly convergent
sum of analytic functions gn (s) on |s| < K for any K. By Theorem 8.23, we
9.6 The Gamma Function Revisited
199
deduce that the limit g(s) is analytic for all s not equal to −1, −2, −3, . . . .
The same holds for
∞ (
s −s/n )
e
1+
.
eg(s) =
n
n=1
After multiplying this by seγs , we see that f is an analytic function away
from −1, −2, −3, . . . . It is clear that f has zeros at each of these points, so log f
has a singularity there. Conversely, away from these obvious zeros, we have
shown that log f is analytic, so f cannot be zero elsewhere. Finally, for some
fixed m ∈ N, consider the infinite product defining f without the factor corresponding to n = m. The same estimates as above show that the logarithm
of this is analytic at s = −m, so f is analytic at s = −m as well.
Corollary 9.21. The zeros of f in Theorem 9.20 are all simple, and the function
∞
1 −γs s −1 s/n
1
= e
1+
e
f(s)
s
n
n=1
is analytic on C apart from simple poles at 0, −1, −2, . . . . The function 1/f
has no zeros at all (because f has no poles).
Theorem 9.22. [Euler] For all s = 0, −1, −2, . . .,
-
s
−1 .
∞
1 1
1
s
=
1+
1+
.
f(s)
s n=1
n
n
Proof. We use the definition of the Euler–Mascheroni constant γ (see Exercise 1.2 on p. 10 or Theorem 8.3).
f(s) = s lim es(1+1/2+···+1/m−log m) lim
N →∞
m→∞
= s lim es(1+1/2+···+1/m−log m)
m→∞
= s lim m−s
m→∞
N m n=1
s
1+ .
n
m n=1
n=1
1+
1+
s −s/n
e
n
s −s/n
e
n
(9.19)
Now we pull a rabbit out of the hat: Write m as
2 3
m−1
m
· ···
·
1 2
m−2 m−1
1
1
1
= 1+
1+
··· 1 +
,
1
2
m−1
m=
(9.20)
where as usual an empty product (the case m = 1) is defined to be 1. Substitute this into Equation (9.19) and use the fact that for all s
200
9 The Functional Equation of the Riemann Zeta Function
lim
m→∞
1+
1 s
m
=1
to see that Equation (9.19) becomes
−s m 1
s
1+
1+
f(s) = s lim
m→∞
n
n
n=1
n=1
s −s
m 1
1
s
= s lim 1 +
1+
1+
m→∞
m n=1
n
n
m
−s
1
s
.
= s lim
1+
1+
m→∞
n
n
n=1
m−1
Now invert both sides, and the proof of Theorem 9.22 is complete.
Corollary 9.23. For all s ∈ C,
1
1 · 2 · · · (m − 1)ms
= lim
.
f(s) m→∞ s(s + 1) · · · (s + m − 1)
Proof. By Theorem 9.22, for s = 0, −1, −2, . . .
s
−1
m−1 1
1 s
1
= lim
1+
1+
f(s) m→∞ s n=1
n
n
s
1 s
1
s
(1
+
1)
1
+
·
·
·
1
+
2
m−1
1
·
= lim
m→∞ s
s
s
1 + 1 · · · 1 + m−1
s
1 s
1
s
(1
+
1)
2
1
+
·
·
·
(m
−
1)
1
+
2
m−1
1
·
,
= lim
m→∞ s
(1 + s)(2 + s) · · · (m − 1 + s)
where we have just multiplied the numerator and denominator by
2 · 3 · · · (m − 1).
Now collect the integers in the numerator into one product and the other
factors into a product
s
m−1
1
1+
= ms
n
n=1
by the identity (9.20). This completes the proof of the corollary.
Theorem 9.24. For all s such that (s) > 0,
∞
1
e−t ts−1 dt.
= Γ (s) =
f(s)
0
9.6 The Gamma Function Revisited
201
Thus we have three representations of the Gamma function – Definition 9.1
and the ones given in Theorems 9.20 and 9.22. The ability to move between
these different formulations will be very useful.
Proof of Theorem 9.24. For n ∈ N, define
n
n
t
Γn (s) =
1−
ts−1dt.
n
0
Evaluate Γn (s) using integration by parts. Substitute t = nτ to give
1
(1 − τ )n τ s−1 dτ
Γn (s) = ns
0
s 1
ns n 1
nτ
= n (1 − τ )
+
(1 − τ )n−1 τ s dτ
s 0
s 0
ns n(n − 1) 1
=
(1 − τ )n−2 τ s+1 dτ = · · ·
s(s + 1) 0
ns n · (n − 1) · (n − 2) · · · 2 · 1
.
=
s(s + 1) · · · (s + n)
s
Now let n tend to infinity, and use Corollary 9.23, which shows that
lim Γn (s) =
n→∞
1
.
f(s)
To complete the proof of Theorem 9.24, we need to prove that
lim Γn (s) = Γ (s).
n→∞
This is plausible because
lim
n→∞
t
1−
n
n
= e−t
(9.21)
for all t. (To prove this, just take logarithms, replace 1/n by h, and apply
l’Hôpital’s rule.) However, to apply Equation (9.21) to our problem, an exchange of limit and integral is required. We must therefore prove that
n n
t
lim
e−t − 1 −
ts−1 dt = 0.
(9.22)
n→∞ 0
n
Estimate the integrand in Equation (9.22) by
n n s−1 −t
t
= tσ−1 e−t − 1 − t .
t
e
−
1
−
n
n We need the following estimate:
202
9 The Functional Equation of the Riemann Zeta Function
n 2 −t
−t
e − 1 − t t e
n
n
(9.23)
for all t ∈ [0, n]. Assuming this,
n n
1 ∞ −t σ+1
t Γ (σ + 2)
σ−1 −t
t
e t
dt =
,
e − 1 − n dt n
n
0
0
which obviously tends to zero. (Note that the convergence is even uniform for
bounded s, although we do not need this here.)
Exercise 9.4. Prove the inequality (9.23).
Exercise 9.5. Using logarithmic differentiation on the representation of Γ in
Theorem 9.20, prove that
(9.24)
Γ (1) = −γ.
Corollary 9.25. For all s ∈ C, s ∈
/ N,
Γ (s)Γ (1 − s) =
π
.
sin(πs)
Proof. By Theorem 9.20,
∞
∞
1 s −1 s/n s −1 −s/n
Γ (s)Γ (−s) = − 2
1+
1−
e
e
s n=1
n
n
n=1
−1
∞ s2
1 π
1− 2
=− 2
=−
s n=1
n
s sin(πs)
using the classical formula
s2
1− 2 .
n
(9.25)
The corollary follows because −sΓ (−s) = Γ (1 − s).
sin(πs) = πs
∞ n=1
Equation (9.25) is another example of an analog of the Fundamental Theorem of Arithmetic in a function-theory context. We know that sin(πs) vanishes
at each integer, so we might hope to factorize it in the form
cs
∞
(n2 − s2 ).
n=1
Of course, this does not converge, and attempting to get the terms to converge
to 1 fast enough to guarantee convergence of the infinite product plausibly
leads one to conjecture Equation (9.25).
9.6 The Gamma Function Revisited
203
Exercise 9.6. Prove the identity (9.25).
Exercise 9.7. Justify the steps in the following argument. The Taylor expansion of the sine function gives
sin(πs) = πs −
(πs)3
+ ··· .
6
(9.26)
By Equation (9.25), this is equal to
1 1 1
πs 1 − s2
+ + + ··· + ···
1 4 9
∞
1
+ ··· .
= πs − πs3
2
n
n=1
Comparing the coefficient of s3 with that of Equation (9.26) gives
∞
1
π2
.
=
n2
6
n=1
Exercise 9.8. Prove that ζ(2k) is a rational multiple of π 2k for any k 1.
Much less is known about the values ζ(3), ζ(5), . . . . Apéry proved in 1978
that ζ(3) ∈ Q, and there are some very deep results on the algebraic independence of various values of ζ at odd integers.
Exercise 9.9. This exercise is a more explicit version of the previous one.
(a) Replace s by iz in Equation (9.25) to deduce that
sinh(πz) = πz
∞ 1+
n=1
z2
.
n2
(9.27)
(b) Use logarithmic differentiation to prove
∞
(−1)k+1
πz
πz
+
=1+
ζ(2k)z 2k .
πz
e −1
2
22k−1
(9.28)
k=1
(c) Deduce that
ζ(2k) = (−1)k π 2k
22k−1
(2k − 1)!
−
B2k
,
2k
(9.29)
where Bn denotes the nth Bernoulli number defined by
∞
Bn z n
z
=
.
z
e − 1 n=1 n!
(9.30)
204
9 The Functional Equation of the Riemann Zeta Function
Exercise 9.10. (a) Use Theorem 9.5 and Equation (9.29) to prove that ζ
takes rational values at negative odd integers.
(b)Use Equation (9.30) to show that Bn = 0 for odd integers n > 1.
(c)Deduce that
Bn+1
ζ(−n) = −
(9.31)
n+1
for all n > 0.
The neatness of Equation (9.31) suggests there might be a more elegant
way to prove it. Hurwitz found a beautiful proof using complex analysis.
Exercise 9.11. Use the functional equation together with Equations (9.24)
and (8.24) to prove that
ζ (0)
= log(2π).
(9.32)
ζ(0)
Prove that ζ(0) = − 12 and deduce the value of ζ (0).
Exercise 9.12. *Prove that
any k 2.
∞
1
is a rational multiple of π k for
k
(4n
+
1)
n=−∞
There are many deep results on the location and distribution of the zeros
of the Riemann zeta function, all far beyond our scope.
Theorem 9.26. Define N (T ) to be the number of zeros of the Riemann zeta
function in the critical strip up to height T ,
N (T ) = |{s ∈ C : 0 (s) 1, ζ(s) = 0, 0 < (s) < T }| .
Then there is an asymptotic formula,
T
T
T
log
+ O(log T ).
−
N (T ) =
2π
2π
2π
The proof makes use of Stirling’s Formula extended to the complex plane,
1
log Γ (s) = −s + s −
log s + O(1),
2
provided |Arg(s)| < π − δ.
Exercise 9.13. Define a function ν by ν(1) = 0, and ν(n) is the number of
distinct prime divisors of n for n > 1.
∞
1
ν(n)
(a) Prove that
= ζ(s)
.
s
n
ps
n=1
p∈P
∞
2ν(n)
ζ 2 (s)
(b) Prove that
=
.
s
n
ζ(2s)
n=1
9.6 The Gamma Function Revisited
205
At the start of this chapter, the idea of “factorizing” functions in the way
that polynomials are factorized was discussed. Quite apart from the convergence issues that pervade this topic, infinite products may behave in quite
surprising ways, as shown by the next exercise.
Exercise 9.14. Using Exercise 8.11, show that, for any x with |x| < 1,
ex =
∞
(1 − xn )−µ(n)/n .
n=1
The functional equations we have considered in this chapter are analytic
properties of known classical functions. The next exercise is (relatively) light
relief and is a functional equation in another sense: The unknown solution
sought is a function.
Exercise 9.15. *Find the solutions to the functional equation
f (xz − y)f (x)f (y) + 3f (0) = 1 + 2f (0)f (0) + f (x)f (y) for all x, y, z ∈ R.
Does the solution change if the identity is only required to hold for all x, y, z
in Z?
9.6.1 Factorizing the Riemann Zeta Function
Several times in this chapter, we have seen a function factorize in a meaningful
way into an infinite product of “irreducible” terms corresponding to zeros,
corresponding to a function-theoretic version of the Fundamental Theorem of
Arithmetic. The Riemann Hypothesis itself can be understood in these terms
– except that the location of the zeros is not known.
Theorem 9.27. [Hadamard] Let Ξ denote the set of zeros of the Riemann
zeta function in the critical strip {z | 0 < (z) < 1}. Then
ebs
s s/ξ
ζ(s) =
e ,
1−
2(s − 1)Γ ( 2s + 1)
ξ
ξ∈Ξ
where b = log(2π) − 1 + γ2 .
In this theorem, the zeros of the zeta function outside the critical strip are
accounted for by the poles of Γ ( 2s + 1).
Exercise 9.16. Assuming the statement of Theorem 9.27 for some constant b,
show that it must have the stated value by using Exercise 9.11.
Notes to Chapter 9: For a very interesting discussion of both the mathematics
and the history of the type of analysis used in this chapter, and in particular to
206
9 The Functional Equation of the Riemann Zeta Function
gain some insight into how Euler came close to the functional equation, see Hardy’s
monograph [74]. An elegant guide to classical Fourier analysis may be found in
Katznelson’s book [87]. Apéry’s proof that ζ(3) is irrational appeared in his paper [3];
an accessible account is provided by van der Poorten [118]. More recent results on
values of the zeta function at odd integers appear in works by Ball and Rivoal [9] or
Rivoal [130] and references therein. The disproof of Merten’s conjecture mentioned
on p. 186 appears in the paper of Odlyzko and te Riele [114]. A comprehensive
guide to many of the analytic arguments here, including Exercises 9.4 and 9.6 is
the classic text of Whittaker and Watson [160]. Artin’s book [6] is an exceptionally
clear account of the main properties of the Gamma function. Deeper properties of
the zeta function, emphasizing the role of Poisson summation, may be found in
Patterson’s book [115]. Several different approaches to the functional equation for
the Riemann zeta function appear in the book of Titchmarsh [153]. For a recent
overview of the Riemann Hypothesis written by a worker in the field, consult the
survey of Conrey [33]. Exercise 9.12 is classical; a proof requiring little background
appears in a paper of Beukers, Kolk and Calabi [13] and is discussed in a paper of
Elkies [50]. Exercise 9.14 is taken from a paper of Brent [19]. Exercise 9.15 is taken
from a paper of Šuniḱ [148].
10
Primes in an Arithmetic Progression
We begin with two elementary results and then give more sophisticated proofs
of them, suggesting a general method. The algebraic part of this method
concerns characters of Abelian groups, the analytic part is a nonvanishing
statement about L-functions. The culmination is Dirichlet’s general result,
Theorem 10.5 in Section 10.1.
Consider all the primes congruent to 1 modulo 4 (the first row of Figure 10.1) and all the primes congruent to 3 modulo 4 (the second row). We
might guess that there are infinitely many primes of each type.
p≡1
p≡3
(mod 4) 5
13 17
29
37 41
53
61
73
(mod 4) 3 7 11
19 23
31
43 47
59
67 71
Figure 10.1. The primes modulo 4.
Proposition 10.1. There are infinitely many primes congruent to 3 modulo 4.
Proof. This proceeds like Euclid’s proof that there are infinitely many prime
numbers. Suppose that the proposition is false, and there are only r such
primes p1 , . . . , pr . Let
N = (p1 · · · pr )2 + 2.
Since
p21 ≡ · · · ≡ p2r ≡ 1 (mod 4),
we have N ≡ 3 modulo 4. Now N decomposes into prime factors,
N = q1 · · · q k ,
which must all be odd, so they are all congruent to 1 or 3 modulo 4. At
least one of the primes qi must be congruent to 3 since otherwise N would be
208
10 Primes in an Arithmetic Progression
congruent to 1. Thus qi is one of p1 , . . . , pr and divides N and (N − 2), and
hence divides 2, a contradiction.
Proposition 10.2. There are infinitely many primes congruent to 1 modulo 4.
First Proof of Proposition 10.2. This proof is slightly different. Rather
than deriving a contradiction, we will show that, for any given N > 1, there
exists a prime congruent to 1 modulo 4 and greater than N . Given N > 1,
define
M = (N !)2 + 1.
(10.1)
Clearly, M is odd. Let p be the smallest prime factor of M . We must
have p > N since N ≡ 1 modulo q for any prime q N . We claim that p ≡ 1
modulo 4 (which completes the proof since N was arbitrary). To prove the
claim, transform Equation (10.1) into
(N !)2 = M − 1 ≡ −1 (mod p).
(10.2)
Since p divides M , p is odd, and we may raise the congruence (10.2) to
the ( p−1
2 )th power:
(N !)p−1 ≡ (−1)(p−1)/2 .
By Fermat’s Little Theorem, ap−1 ≡ 1 modulo p for all a ≡ 0 modulo p,
so (p − 1)/2 must be even, proving the claim.
Exercise 10.1. Prove that there are infinitely many primes congruent to 1
or to 5 modulo 6.
These results are all very well, but the proofs are awkward and ad hoc.
We would like to have a general principle for proving such results.
10.1 A New Method of Proof
Just as the analytic proofs of Theorem 1.2 in the end gave us more information than Euclid’s original proof by contradiction, it turns out that the most
powerful approach to primes in congruence classes comes from analysis.
Second Proof of Proposition 10.2. This proof works along the lines of
the second proof of Theorem 1.3 on p. 12. Consider for odd n ∈ N the function
1 + (−1)(n−1)/2
1 for n ≡ 1 (mod 4),
c1 (n) =
=
(10.3)
0 for n ≡ 3 (mod 4).
2
The function c1 is a gadget for picking out a particular congruence class.
Later, we will generalize this fact using orthogonality relations for characters
of Abelian groups. Using the gadget, for real σ > 1,
10.1 A New Method of Proof
p≡1 mod 4
209
c1 (p)
1
=
σ
p
pσ
p odd
=
1 1
1 (−1)(p−1)/2
+
.
2
pσ
2
pσ
p odd
(10.4)
p odd
The rearrangement here is permitted because the series involved converge
absolutely. The first summand on the right-hand side of Equation (10.4) tends
to infinity as σ → 1 (by Theorem 1.3). We claim that the second summand
converges for σ → 1 and in particular is bounded. (This will be proved below.)
This implies that the left-hand side of Equation (10.4) tends to infinity, and we
conclude that there must be infinitely many primes over which the summation
runs.
For the moment, let us pursue the aim of another proof of Proposition 10.2
since all the essentials of Dirichlet’s proof become apparent there already. We
still have to prove the convergence claim for the last sum in Equation (10.4).
To do this, define two functions χ, χ0 : N → {−1, 0, 1} by
0
if n is even,
χ(n) =
(−1)(n−1)/2 if n is odd,
0 if n is even,
χ0 (n) =
1 if n is odd.
Define a complex function by
L(s, χ) =
∞
χ(n)
,
ns
n=1
and define L(s, χ0 ) similarly. Such functions are called L-functions and are a
special kind of Dirichlet series. Clearly, the series defining L(s, χ) and L(s, χ0 )
converge absolutely for all s with (s) > 1.
Lemma 10.3. The series L(s, χ) converges for s = 1, and
L(1, χ) = 1 −
1 1 1
π
+ − + ··· = .
3 5 7
4
Proof. Consider the integral
1
dt
π
= [tan−1 (t)]10 = .
2
1
+
t
4
0
(10.5)
Substitute into this integral the expansion
∞
1
1
=
,
2 )n
1 + t2
(−t
n=0
(10.6)
210
10 Primes in an Arithmetic Progression
which converges for all 0 t < 1. Fix any 0 < x < 1, and then the series
in Equation (10.6) converges uniformly for 0 t x. We therefore have for
all x with 0 < x < 1
x
∞ x
dt
f(x) =
=
(−t2 )n dt
2
0 1+t
0
n=1
=
∞
(−1)n 2n+1
x
2n + 1
n=0
because there we may interchange integration and summation thanks to the
uniform convergence. Now, we may take the limit x → 1 thanks to Abel’s
Limit Theorem. (This is a useful special feature of power series – see Section 10.6.) For x → 1, we get L(1, χ) on the right-hand side, and the integral
in Equation (10.5), f(1) = π/4, on the left-hand side.
Lemma 10.4. The functions χ and χ0 are completely multiplicative (see Definition 3.4).
Proof. Check all the possible values of m and n modulo 4.
Now recall the Euler expansion (Theorem 1.5) for the zeta function. Since χ
and χ0 are completely multiplicative, we get in exactly the same way (see
Theorem 8.17) an Euler expansion of L(σ, χ) and L(σ, χ0 ),
−1
χ(p)
1− σ
L(σ, χ) =
,
p
p odd
−1
1
1− σ
L(σ, χ0 ) =
.
p
(10.7)
(10.8)
p odd
Take logarithms of Equation (10.7) and Equation (10.8) to get
χ(p)
χ(p)
log L(σ, χ) = −
=
log 1 − σ
+ O(1),
p
pσ
p odd
p odd
1
1
log L(σ, χ0 ) = −
log 1 − σ =
+ O(1).
p
pσ
p odd
p odd
Adding (see Equation (10.4)) gives
log L(σ, χ0 ) · L(σ, χ) =
1 + χ(p)
+ O(1)
pσ
p odd
= 2
p≡1 mod 4
1
+ O(1).
pσ
(10.9)
(10.10)
10.2 Congruences Modulo 3
211
What is the behavior of the left-hand side as σ tends to 1 from above?
π
= 0,
4
1 1
L(σ, χ0 ) = 1 − σ
−→ ∞.
2
nσ
n
L(σ, χ) →
The terms O(1) in Equations (10.9) and (10.10) are still O(1) for σ → 1, so
we conclude that
1
−→ ∞
pσ
p≡1 mod 4
as σ → 1 from above. This completes the second proof of Proposition 10.2. Had we subtracted Equations (10.9) and (10.10) instead of adding them,
we would have proved that
1
pσ
p≡3 mod 4
diverges as σ → 1 from above and hence would have found another proof of
Proposition 10.1.
Exercise 10.2. Use the gadget
c3 (n) =
1 − (−1)(n−1)/2
=
2
0 for n ≡ 1 (mod 4)
1 for n ≡ 3 (mod 4)
(10.11)
to prove that there are infinitely many primes p ≡ 3 modulo 4.
The biggest payoff of this more sophisticated approach is that the argument can be made to work in complete generality. At the end of this chapter,
we will have proved the following theorem.
Theorem 10.5. [Dirichlet] If a ∈ N and q ∈ N are coprime, then there are
infinitely many primes p such that
p ≡ a (mod q).
Note that this is the most general result we could hope for: If a and q
are not coprime, then every number n ≡ a modulo q will be divisible
by gcd(a, q) > 1, so there can only be finitely many such primes.
10.2 Congruences Modulo 3
To understand the ingredients necessary to prove Dirichlet’s Theorem, we
repeat the argument above for primes congruent to 1 or 2 modulo 3.
Consider the functions
212
10 Primes in an Arithmetic Progression
χ0 (n) =
1 if
0 if
3 n
3n,
⎧
⎨ 1 if n ≡ 1 (mod 3)
χ(n) = −1 if n ≡ 2 (mod 3)
⎩
0 if n ≡ 0 (mod 3).
As in the previous example, the functions c1 and c2 picking out a particular
congruence class can be rewritten using χ and χ0 as
1
1 if n ≡ 1 (mod 3)
c1 (n) = (χ0 (n) + χ(n)) =
0 otherwise,
2
and
1
c2 (n) = (χ0 (n) − χ(n)) =
2
1 if n ≡ 2 (mod 3)
0 otherwise.
Define the associated L-functions
L(σ, χ) =
∞
χ(n)
nσ
n=1
and similarly L(σ, χ0 ). As in the previous example, χ and χ0 are completely
multiplicative and hence L(σ, χ) and L(σ, χ0 ) have Euler product expansions.
Moreover,
1
1
= 1 − σ ζ(σ)
L(σ, χ0 ) =
nσ
3
3 |n
tends to infinity as σ → 1. We have all the ingredients to repeat the analog
of the second proof of Proposition 10.2 in this case except one: We do not yet
know whether
L(1, χ) = 0.
(10.12)
If we knew this, we could proceed exactly as before; the key step is to notice
that
1
log L(σ, χ0 ) · L(σ, χ) = 2
+ O(1).
pσ
p≡1 mod 3
As long as we do not know the inequality (10.12), the left-hand side might have
a limit as σ tends to 1. We will prove the inequality (10.12) in Section 10.5
as part of a general result.
Thus if we want to prove results such as Proposition 10.2 along the lines
of the second proof, then there are two things we need to get to grips with:
1. A mechanism for pulling out a particular congruence class via multiplicative functions (see Sections 10.3 and 10.4).
2. A nonvanishing statement about L-functions at σ = 1 (see Section 10.5).
10.3 Characters of Finite Abelian Groups
213
10.3 Characters of Finite Abelian Groups
In this section, we want to deal with the first problem from the preceding
section. Consider the example n = 5. Define the functions
1 if n ≡ 0 (mod 5)
χ0 (n) =
0 if 5n,
⎧
i if n ≡ 2 (mod 5)
⎪
⎪
⎪
⎪
⎨ −1 if n ≡ 4 (mod 5)
−i if n ≡ 3 (mod 5)
χ(n) =
⎪
⎪
1 if n ≡ 1 (mod 5)
⎪
⎪
⎩
0 if n ≡ 0 (mod 5).
Now check that
1
(χ0 (n) + χ(n) + χ2 (n) + χ3 (n)) = c1 (n) =
4
1 if n ≡ 1 (mod 5)
0 otherwise.
What if you want to pull out the congruence class n ≡ 2 modulo 5? Is an
ingenious ad hoc. argument needed each time? We clearly need a general
setup.
Recall that, in the ring Z/nZ, the units are
U (Z/nZ) = {k
(mod n) | gcd(k, n) = 1}.
The units form a group under multiplication.
Example 10.6. Let n = 5, so U (Z/5Z) = {1, 2, 3, 4} is a cyclic group, and 2 is
a generator. The multiplication table of U (Z/5Z) is
1
2
3
4
1
1
2
3
4
2
2
4
1
3
3
3
1
4
2
4
4
3
2
1
so U (Z/5Z) ∼
= {1, i, −1, −i}.
Definition 10.7. Let G be a finite Abelian group. A character of G is a homomorphism
χ : G → (C∗ , ·).
The multiplicative group C∗ is C\{0} equipped with the usual multiplication.
By convention, we will write all finite groups multiplicatively in this section –
hence the identity will be written as 1G or 1. For any group, the map
χ0 : G → C∗ , χ0 (g) = 1,
is a character called the trivial character.
214
10 Primes in an Arithmetic Progression
Lemma 10.8. Let G be a finite Abelian group, and let χ be a character of G.
Then χ(1G ) = 1 and χ(g) is a root of unity for any g ∈ G. In particular, |χ(g)| = 1. Thus χ(g) lies on the unit circle in C.
Proof. Clearly
χ(1G ) = χ(1G · 1G ) = χ(1G )χ(1G ),
so χ(1G ) = 1 since χ(1G ) = 0. As to the second statement, we use the fact
that for every g ∈ G there exists n ∈ N such that g n = 1G . This implies that
χ(g)n = χ(g n ) = χ(1G ) = 1.
Example 10.9. Let G = Ck = g, a cyclic group of order k. Now
g k = 1,
so
χ(g)k = 1
and therefore χ(g) must be a kth root of unity. Any of the k different kth
roots of unity can occur as χ(g), and of course χ(g) determines all the values
of χ on G since G is generated by g, so there are k distinct characters of G.
We can label the characters of G with labels 0, 1, . . . , n − 1 as follows: χj is
determined by χj (g) = e2πij/k , so χj (g m ) = e2πijm/k .
Theorem 10.10. Let G be a finite Abelian group. Then the characters of G
form a group with respect to the multiplication
(χ · ψ)(g) = χ(g)ψ(g),
% The identity in G
% is the trivial character. The group G
% is isomordenoted G.
phic to G. In particular, any finite Abelian group G of order n has exactly n
distinct characters.
This theorem is the first intimation of an entire dual world, a mirror image
to the familiar world of finite Abelian groups. This duality extends to a larger
class of Abelian groups and in that wider class takes subgroups to quotient
groups, quotient groups to subgroups, and products to sums.
Exercise 10.3. What happens if the same construction is made for other
groups?
(a) Describe the group
% = {homomorphisms Z → S1 }.
Z
(b) For nondiscrete groups G, we need to restrict to continuous characters.
Find
01 = {continuous homomorphisms S1 → S1 }.
S
%
(c)*A more challenging problem is to describe the group Q.
10.3 Characters of Finite Abelian Groups
215
Proof of Theorem 10.10. Use the structure theorem for finite Abelian
groups, which says that G is isomorphic to a product of cyclic groups,
G∼
=
k
Cnj .
j=1
Choose a generator gj for each of the factors Cnj and define characters on G
by
χ(j) (∗, . . . , ∗, gj , ∗, . . . , ∗) = e2πi/nj ,
that is, ignore all entries except the jth, and there use the same definition
as in Example 10.9. Then the characters χ(1) , . . . , χ(k) generate a subgroup
% that is isomorphic to G: Each χ(j) generates a cyclic group of order nj ,
of G
and this group has a trivial intersection with the span of all the other χ(i) s
since all characters in the latter have value 1 at gj . Likewise, for any given
character of G, it is easy to write down a product of powers of the χ(j) that
coincides with χ on the generators gj and hence on all of G.
Corollary 10.11. Let G be a finite Abelian group. For any 1 = g ∈ G, there
% such that χ(g) = 1.
exists χ ∈ G
Proof. Looking again at the proof of Theorem 10.10, we may write
g = (∗, . . . , ∗, gjr , ∗, . . . , ∗)
with some entry gjr = 1, 0 < r < nj . Then χ(j) (g) = e2πir/nj = 1.
Theorem 10.12. Let G be a finite Abelian group. Then, for any element h ∈
%
G and any character ψ ∈ G,
|G| if ψ = χ0
(10.13)
ψ(g) =
0 if ψ = χ0 ,
g∈G
|G| if h = 1
χ(h) =
(10.14)
0 if h = 1.
χ∈G
These identities are known as the orthogonality relations for finite Abelian
group characters.
Proof. Consider Equation (10.13) first. The case ψ = χ0 is trivial, so assume ψ = χ0 . There is an element h ∈ G such that ψ(h) = 1. Then
ψ(h)
ψ(g) =
ψ(gh) =
ψ(g)
g∈G
g∈G
g∈G
because multiplication
by h only permutes the summands. This equation can
only be true if g∈G ψ(g) = 0.
216
10 Primes in an Arithmetic Progression
For Equation (10.14), assume h = 1. By Corollary 10.11, there exists some
% such that ψ(h) = 1. We now use the dual of the argument
character ψ ∈ G
above,
ψ(h)
χ(h) =
(ψ · χ)(h) =
χ(h)
χ∈G
χ∈G
χ∈G
%
since multiplication
by ψ only permutes the elements of G, and again this can
only be true if χ∈G χ(h) = 0.
Corollary 10.13. For all g, h ∈ G, we have
|G| if g = h
χ(g)χ(h) =
0 if g =
h.
χ∈G
Proof. Note that
χ(h−1 ) = χ(h)−1 = χ(h)
since χ(h) is on the unit circle in C. Then use Theorem 10.12 with gh−1 in
place of h.
This is the gadget in its ultimate form. Character theory allows us to
construct functions that will extract any desired residue class. As an example,
take G = U (Z/5Z) ∼
= C4 . Table 10.1 shows all the characters on G.
Table 10.1. Characters on U (Z/5Z).
1
2
4
3
χ0 χ1 χ2 χ3
1 1 1 1
1 i −1 −i
1 −1 1 −1
1 −i −1 i
Note that we have written the elements of U (Z/5Z) in Table 10.1 in an unusual ordering 20 , 21 , 22 , 23 , adapted to the generator 2. The character values
behave likewise. Note also χ21 = χ2 and χ31 = χ3 = χ−1
1 . We used earlier
χ0 (n) + χ1 (n) + χ2 (n) + χ3 (n) = 4c1 (n),
which is just the case h = 1 of Corollary 10.13. We asked then, “What
about c2 (n), which is 1 if n is congruent to 2 and 0 otherwise?” The corollary
suggests that we take h = 2, and we get
χ0 (n) − iχ1 (n) − χ2 (n) + iχ3 (n) = 4c2 (n).
This can be checked simply by going through the possible cases.
10.4 Dirichlet Characters and L-Functions
217
If you compare the ideas used here with Fourier analysis, much is familiar.
The expression
1 f(h)g(h)
|G|
h∈G
is an inner product on the vector space of all functions on G, and the characters
form a complete orthonormal set. There are no difficulties about convergence
because the group is finite. In particular, any complex function on G can be
written as a linear combination of the characters.
10.4 Dirichlet Characters and L-Functions
Definition 10.14. Given 1 < q ∈ N, let G = U (Z/qZ) and fix a character χ
% Extend χ to a function X on N by setting
in G.
χ(n mod q) if n is coprime to q,
X(n) =
0
otherwise.
The function X is called a Dirichlet character modulo q.
This is a slight abuse of language – characters are functions on groups,
and N certainly is not a group. We will even write χ instead of X for the
Dirichlet character associated with χ. In the same way, for any a ∈ G, we can
extend the function
1 if b = a,
(10.15)
ca (b) =
0 otherwise,
to a periodic function on N, which will also be written as ca . Finally, associate
to each Dirichlet character χ the L-function
L(s, χ) =
∞
χ(n)
,
ns
n=1
which is called the L-function of χ.
Example 10.15. Take the trivial character χ0 of U (Z/4Z). The associated
Dirichlet character is just the function χ0 that we used in the second proof
of Proposition 10.2. The functions c1 and c3 that we used there are extensions of functions on U (Z/4Z) as in Equation (10.15). The same holds for
the corresponding L-functions – the notation has been carefully chosen to be
consistent.
Theorem 10.16. A Dirichlet character is completely multiplicative, and the
associated L-function therefore has an Euler product expansion.
218
10 Primes in an Arithmetic Progression
Proof. Let χ be a Dirichlet character modulo q. If two integers m, n are given,
and at least one of them is not coprime to q, then neither is the product mn.
Thus, χ(mn) = 0 = χ(m)χ(n). If on the other hand, both m and n are coprime
to q, then (m mod q)·(n mod q) = (mn mod q) by definition, and because χ in
the original sense is a group character, we have χ(mn) = χ(m)χ(n). The existence of an Euler product expansion then follows directly from Theorem 8.17.
Since χ(p) = 0 for all p dividing q, we get
L(s, χ) =
p
χ(p)
1− s
p
−1
=
p |q
χ(p)
1− s
p
−1
.
(10.16)
Clearly, the L-functions converge for (s) > 1 by comparison with the
Riemann zeta function. Now let us see how these L-functions can be marshaled
to prove Dirichlet’s Theorem about primes in an arithmetic progression.
By Theorem 8.17, which gives the Euler product expansion, L(s, χ) = 0
for (s) > 1, so we may take logarithms in Equation (10.16) and expand the
logarithm in a Taylor expansion
log L(s, χ) = −
p |q
=
χ(p)
log 1 − s
p
χ(p)
p |q
ps
=
∞
1 χ(pm )
m psm
m=1
p |q
+ O(1),
as before. For a given congruence class a mod q, with a coprime to q, multiply
both sides by χ(a) and sum over all χ ∈ U (Z/qZ) (the associated Dirichlet
characters). We get
χ(a) log L(s, χ) =
χ
χ(a)
χ(p)
p |q
χ
ps
+ O(1).
Since the series on the right converges absolutely, we may interchange summations. By Corollary 10.13,
φ(q) if p = a (mod q)
χ(a)χ(p) =
0
otherwise,
χ
where φ(q) = |U (Z/qZ)| by definition of the Euler function. We have proved
χ
χ(a) log L(s, χ) = φ(q)
p≡a mod q
1
+ O(1).
qs
Now let s → 1 from above. We claim the following.
(10.17)
10.5 Analytic Continuation and Abel’s Summation Formula
219
1. The L-function L(s, χ0 ) has a simple pole at s = 1.
2. For all χ = χ0 , the L-function L(s, χ) has a nonzero limit as s → 1.
Once these claims have been proved, we know that the left-hand side
in Equation (10.17) tends to infinity as s → 1. For the right-hand side, this
means that there must be infinitely many summands, which will complete the
proof of Dirichlet’s Theorem.
The first claim is quite easy to prove,
L(s, χ0 ) =
p |q
1−
1
ps
−1 1
1 − s ζ(s),
=
p
p|q
and we know that ζ has a simple pole at s = 1. The second claim – the
nonvanishing of the L-function – is very difficult to prove. Thus far, nothing
that we have done has required the L-function to be defined for complex
values of s. It is the proof of nonvanishing that requires the complex variable
methods.
A last remark before we embark on this. Looking back at Figure 10.1, one
might have guessed that both congruence classes of primes modulo 4 contain
about the same number of primes up to a given bound. This is true in complete
generality: For a coprime to q,
|{p ∈ P | p ≡ a (mod q), p T }|
1
−→
as T → ∞.
|{p ∈ P | p T }|
φ(q)
In particular, the limit is independent of a. This can be proved by a slight
refinement of the methods given in this chapter.
10.5 Analytic Continuation and Abel’s
Summation Formula
In this section, we will not only complete the proof of Dirichlet’s Theorem 10.5.
On the way, we will see Abel’s Summation Formula and the analytic continuation of L-functions to the half-plane (s) > 0.
Theorem 10.17. [Abel] Let a be an arithmetic function, and define
A(x) =
a(n).
nx
Let f : [x, y] → C be differentiable with a continuous derivative. Then
y
a(n)f(n) = A(y)f(y) − A(x)f(x) −
A(t)f (t) dt.
(10.18)
x<ny
x
220
10 Primes in an Arithmetic Progression
Proof. Assume x, y ∈ N are integral for simplicity, m = y and k = x, so the
left-hand side of Equation (10.18) is
m
n=k+1
a(n)f(n) =
m
(A(n) − A(n − 1))f(n)
n=k+1
m
= −
A(n − 1)(f(n) − f(n − 1)) + A(m)f(m) − A(k)f(k).
n=k+1
This is rather like integration by parts, using sums instead of integrals and
taking differences instead of differentiating. There are some terms coming
from the boundary of the summation interval, too, which complicates things
a little. Notice that we have not used the hypothesis that f be differentiable
up to now. Using A(t) = A(n) for all t ∈ [n, n + 1), we get
n
n
A(n − 1)(f(n) − f(n − 1)) = A(n − 1)
f (t) dt =
A(t)f (t) dt.
n−1
n−1
Sum this from n = k + 1 to n = m to get the statement of the theorem.
We apply this formula to a(n) = χ(n) (for χ a Dirichlet character modulo q,
k+q−1
but not the trivial character) and f(t) = t−s . Notice that n=k χ(n) = 0 for
all k ∈ N, so A(x) = O(1); in fact |A(x)| φ(q) for all x. Abel’s Summation
Formula gives
y
χ(n)
A(t)
A(y)
= s −1+s
dt.
(10.19)
s+1
ns
y
1 t
1<ny
The integral on the right-hand side of Equation (10.19) can be split into
integrals from 1 to 2, from 2 to 3, and so on. The series of these integrals
converges uniformly for (s) > δ, with any fixed δ > 0, so we may let y
go to ∞. Each of these integrals is an analytic function of s for (s) > 0.
Hence the function L(s, χ) is analytic in this domain by Theorem 8.23. This
argument is in fact the same as the one we used in one of the proofs of the
analytic continuation of the zeta function.
Proof of Nonvanishing of L-functions. We will now finally prove the
last step in Dirichlet’s Theorem by showing that L(s, ψ) = 0 for s = 1. Consider first the case that ψ is a non-real Dirichlet character (non-real meaning
that not all values of ψ(n) are ±1). Consider, for σ > 1,
χ(p)
log
L(σ, χ) =
log L(σ, χ) = −
log 1 − s
p
χ
χ
χ
p |q
=
χ
p |q
∞
χ(p)m
.
mpσm
m=1
(10.20)
Suppose L(1, ψ) = 0. Then we must also have L(1, ψ̄) = 0. By hypothesis, ψ = ψ̄, and both ψ and ψ̄ appear in the product over all characters
10.5 Analytic Continuation and Abel’s Summation Formula
221
in Equation (10.20). As σ tends to 0, the simple pole of L(s, χ0 ) is doubly
cancelled by the zeros in L(s, ψ) and L(s, ψ̄), and hence the product must
tend to 0 and the logarithm in Equation (10.20) must go to −∞. But the
right-hand side
of Equation (10.20) is always nonnegative. This follows from
the fact that χ χ(pm ) = 0 or φ(q) by Theorem 10.12. This is a contradiction,
and we have proved L(1, ψ) = 0 in the case that ψ is not real.
The case that ψ is real, so ψ(n) = ±1 for all n ∈ N, is rather more complicated. Suppose again that L(1, ψ) = 0. Then ζ(s)L(s, ψ) must be analytic
on the half-plane (s) > 0. Write F(s) = ζ(s)L(s, ψ) as a Dirichlet series (see
Exercise 8.15 on p. 166),
∞
f(n)
F(s) =
,
ns
n=1
where the function f = ψ ∗ u is defined by
f(n) = (ψ ∗ u)(n) =
ψ(d).
d|n
Lemma 10.18. Define another arithmetic function g by
1 if n is a square;
g(n) =
0 otherwise.
Then f(n) g(n) for all n ∈ N.
Proof. Note that both f and g are multiplicative arithmetic functions, so it
is enough to consider the case n = pk , a prime power. We have
⎧
1 if ψ(p) = 0,
⎪
⎪
⎨
k
+
1 if ψ(p) = 1,
f(pk ) = 1 + ψ(p) + · · · + ψ(p)k =
0
if ψ(p) = −1 and k is odd,
⎪
⎪
⎩
1 if ψ(p) = −1 and k is even.
Clearly f(n) 0 for all n. This settles already the claim of the lemma in
the case that n is not a square. If n is a square, the exponent of each prime
in n is even, and we get f(n) 1 by looking at the preceding equation. This
completes the proof of Lemma 10.18.
Returning to the main proof, fix a number r with 0 < r < 3/2. Now F is
analytic on the half-plane σ > 0, so we may consider the Taylor expansion
of F about s = 2,
∞
F(ν) (2)
F(2 − r) =
(−r)ν ,
ν!
ν=1
where the νth derivative F(ν) (2) is given by
F(ν) (2) =
∞
n=1
f(n)
(− log n)ν
.
n2
(10.21)
222
10 Primes in an Arithmetic Progression
We will prove that the Dirichlet series for F converges uniformly for σ > 0
so that we may indeed differentiate term by term. Now consider a general
summand of the Taylor expansion,
∞
∞
F(ν) (2)
rν f(n)(log n)ν
rν g(n)(log n)ν
(−r)ν =
ν!
ν! n=1
n2
ν! n=1
n2
=
∞
∞
rν 1 · (log n2 )ν
(−2r)ν (− log n)ν
=
ν! n=1
n4
ν! n=1
n4
=
(−2r)ν (ν)
ζ (4).
ν!
Use this inequality for all terms of the Taylor expansion of F to deduce that
F(2 − r) (−2r)ν
ν=0
ν!
ζ (ν) (4) = ζ(4 − 2r).
(10.22)
Now let r converge to 3/2 from below. The right-hand side of Equation (10.22)
tends to infinity since ζ has a pole at s = 1. The left-hand side is bounded
because F(s) is analytic for s > 0, a contradiction.
We still have to prove our claim that the Dirichlet series for F converges
uniformly for all s > 0. Look again at Equation (10.21) and substitute it into
the Taylor series for F about s = 2 to get
F(2 − r) =
∞
∞
1 ν
(log n)ν
r
f(n)
.
ν! n=1
n2
ν=0
Note that the minus sign of −r cancels with that in the derivative so that all
terms are positive. Hence we may interchange the summations,
F(2 − r) =
∞
∞
f(n) 1
(r log n)ν ,
2
n
ν!
n=1
ν=0
(10.23)
and this sum converges for all r with |r| < 2 because by our assumption F(s) is
analytic on the whole half-plane (s) > 0. The inner sum in Equation (10.23)
is just er log n = nr , so Equation (10.23) becomes
F(2 − r) =
∞
f(n)
.
n2−r
n=1
The right-hand side converges for all r < 2, and if we substitute s = 2 − r, we
get just the Dirichlet series for F back again, which converges for all s > 0.
Since any Dirichlet series convergent for all s > s0 converges uniformly in
that domain, the proof is complete.
Exercise 10.4. Locate the steps in the preceding proof that required L(χ, s)
to be a function of a complex variable s rather than a real one.
10.6 Abel’s Limit Theorem
223
10.6 Abel’s Limit Theorem
We used the following result on p. 210 in the proof of Lemma 10.3.
Theorem 10.19. [Abel] Given a real power series
f(x) =
∞
an xn
n=0
that converges for all x with 0 < x < x0 , suppose that the limit
L=
∞
an xn0 = lim
N →∞
n=0
N
an xn0
n=0
exists. Then the limit of f(x) exists as x tends to x0 from below, and
lim f(x) = L.
x→x−
0
Proof. For any > 0, we will show that, for all x sufficiently close to x0 and
for all N sufficiently large,
N
n
n an (x0 − x ) .
(10.24)
n=0
To do this, rewrite the sum in the inequality (10.24) as
N
an (xn0
n=0
−x )=
n
N
an 1 −
n=0
x
x0
n xn0 .
Let y = x/x0 and use the geometric series expansion
1 − y n = (1 − y)(1 + y + y 2 + · · · + y n−1 )
to obtain
N
an (xn0 − xn ) = (1 − y)
n=0
N
an
n=1
n−1
y k xn0 .
(10.25)
k=0
We may interchange the summations since these sums are both finite, giving
N
n=0
an (xn0 − xn ) = (1 − y)
N
k=0
yk
N
an xn0 .
(10.26)
n=k+1
The coefficients of y k in Equation (10.26) form a Cauchy sequence since the
corresponding series converges. Hence they are bounded in absolute value by
some bk ,
224
10 Primes in an Arithmetic Progression
N
n
an x0 bk ,
n=k+1
which depends only on k, not on N . Moreover, we know that
bk −→ 0 as k −→ ∞
so we may choose K such that bk < for all k K. Then we can estimate
the sum in Equation (10.25) by splitting it,
N
K−1
∞
n
n an (x0 − x ) (1 − y)
y k bk + (1 − y)
y k .
(10.27)
n=0
k=0
k=K
As x tends to x0 , y tends to 1. This means that the first summand on the
right-hand side of the inequality (10.27) becomes arbitrarily small. The second
summand is equal to y K , and hence tends to , and our estimates no longer
depend on N .
Notes to Chapter 10: There are accounts of this material in many of the references. Monsky’s paper [109] discusses one of many possible simplifications of the
proof of Dirichlet’s Theorem. Theorem 10.19 is a result of Abelian type; typically the
converse of such a result is false but becomes true under an additional assumption.
The converse theorems obtained in this way are called Tauberian and are generally
deeper; see Hardy [74, Chapter VII]. The survey paper [68] of Gelbart and Miller
discusses the historical development of L-functions, starting with Riemann’s original
paper and explaining how L-functions may be associated with groups. The solution
to Exercise 10.3(c) is most naturally given in terms of adeles – see Ramakrishnan
and Valenza [121], Tate’s thesis [150], or Weil [158].
11
Converging Streams
In Chapters 2 and 4, and again in Chapters 8–10, we developed two different
approaches to number theory. The first viewed the study of numbers in more
algebraic terms, the second in more analytic terms. One of the great early
achievements of algebraic number theory was a reconciliation of these two
approaches in the class number formula, discussed in Section 11.1. The name
algebraic number theory is a little unfortunate – it is the study of algebraic
numbers (solutions of polynomial equations) and the fields and rings they
generate, but it uses an enormous range of techniques, including analysis. We
will need to discuss the evaluation of Gauss sums as part of this study. This
too shows that an idea which appears to belong within elementary number
theory can only be understood, apparently, by deep methods.
Similarly, in Chapters 5–7 we developed the arithmetic of elliptic curves
and this again appeared to be somewhat distinct in flavour from the other
topics. A second unifying theme presented informally in this chapter is the
conjecture of Birch and Swinnerton-Dyer, which is a profound connection
between the arithmetic and analytic properties of elliptic curves. The bridge
connecting this to the other chapters is the theory of L-functions. It is no
surprise that some of the deepest and most important unsolved problems in
number theory for the new millennium are concerned with these mysterious
objects.
11.1 The Class Number Formula
We will state the class number formula for quadratic fields and then look at
some aspects of the proof. There is a more general formulation of the class
number formula for algebraic number fields. Dirichlet originally proved the
quadratic case in 1837 using quadratic forms rather than quadratic fields, as
part of his proof concerning primes in arithmetic progressions. He showed that
the associated L-function, for real characters, does not vanish at 1 because it
is equal to a recognizably nonzero expression. Thus the results in this chapter
226
11 Converging Streams
could be taken as another approach to the nonvanishing of the L-function for
a real character.
In essence, the class number formula gives the class number as a finite
sum of easily computable quantities. In special cases, it gives an unexpected
relation between units of the ring of algebraic integers that depends upon
the class number. To this day, the only convincing proofs of this result use
nontrivial analysis.
Jacobi made many fundamental discoveries about the class number, and
Dirichlet was able to build upon these in his formulation and eventual proof of
the class number formula. Jacobi’s name is attached to a generalization of the
Legendre symbol (defined on p. 65) which will be needed later. This symbol
is sometimes called the quadratic symbol.
Definition 11.1. Let n 1 be an odd integer with prime factorization
αr
1
n = pα
1 · · · pr .
Then the Jacobi symbol
·
n
: Z → {0, ±1} is defined to be
α1 αr
a
a
a
=
···
,
n
p1
pr
where
a
pi
denotes the Legendre symbol.
a
= 0 unless gcd(n, a) = 1. Also, if a prime p does not divide n,
Clearly,
n
then
2 p a
a
=
n
n
and
a
2
p n
=
a
.
n
Thus, in order to evaluate Jacobi symbols, we need only consider a and n
square-free and coprime. Note that the symbol only depends upon a modulo n.
The Quadratic Reciprocity Law extends to the Jacobi symbol in the following form.
Theorem 11.2. Suppose a and n are positive, nonzero, coprime integers.
Then
n
a
= (−1)(a−1)/2·(n−1)/2
if a and n are odd;
n
a
2
2
= (−1)(n −1)/8 if n is odd.
n
11.1 The Class Number Formula
227
Exercise 11.1. Prove Theorem 11.2. (Hint: Consider the primes p ≡ 3 modulo 4 dividing a and n.)
Notice that the Jacobi symbol does not characterize the property of being
a quadratic residue in the way the Legendre symbol does.
Certainly if a is a
quadratic residue modulo n with gcd(a, n) = 1, then na = 1. The converse
does not hold, as the next example shows.
Example 11.3. Since 2 is not a quadratic residue modulo 3, it is not a quadratic
residue modulo 15. On the other hand,
2
2
2
=
by definition
15
3
5
= (−1)(−1) by Theorem 11.2
= 1.
a
, is an extension
Definition 11.4. The Kronecker symbol, also written
n
of the Jacobi symbol to all n = 0. It is defined by the properties:
•
a
= 0 if gcd(a, n) > 1;
n
a
1 if a > 0,
•
=
−1 if a < 0;
−1
a 1 if a ≡ ±1 (mod 8),
•
=
−1 if a ≡ ±3 (mod 8);
2
a
b
a
b
ab
=
.
•
cd
c
c
d
d
√
Exercise 11.2. Let D be the discriminant of the quadratic field Q( d)
for a
square-free integer d. Show that χ : Z → {0, ±1} defined by χ(n) = D
n is a
Dirichlet character modulo |D|.
Theorem 11.5. [Class Number
Formula] Let D denote the discriminant
√
of a quadratic field K = Q( d). Then the class number h of OK is given by
⎧
D−1
⎪
⎪
⎪− 1
χ(a) log sin(aπ/D) for D > 0,
⎪
⎪
⎨ 2 log u a=1
h=
|D|−1
⎪
⎪
w ⎪
⎪
⎪
χ(a)a
for D < 0,
⎩ − 2|D|
a=1
where w denotes the number of roots of unity in K, u is the fundamental unit
of K, and χ denotes the associated character (see Exercise 11.2).
228
11 Converging Streams
The fundamental unit u is introduced in Exercise 4.4 on p. 85.
The following theorem is a special case of the class number formula, yet it
already makes an amazing claim, giving a totally unexpected relation between
units in a real quadratic number field.
Theorem 11.6. Let q > 0 denote a prime congruent to 1 modulo 4. Then the
quantity
(q−1)/2
a
sin(aπ/q)−( q ) ,
v =
a=1
√
is a unit in Z[(1 + q)/2], the ring of algebraic integers in Q( q). This unit
is related to the fundamental unit u by the equation
√
v = uh ,
√
where h denotes the class number of the field Q( q).
Exercise 11.3. Prove that
+
sin(π/5) =
and deduce that
√
10 − 2 5
,
4
+
√
10 + 2 5
sin(2π/5) =
.
4
Use this to show
that Theorem 11.6 applied to the prime q = 5 constructs the
√
unit v = 1+2 5 .
Exercise 11.4. Show that h = 1 when −D is 3, 4, 7, 8, 11, 19, 43, 67, or 163.
Exercise 11.5. Use Exercise 4.5 on p. 85 to find the class number of
√
√
√
√
Q( 2), Q( 3), Q( 5) and Q( 7).
Gauss, using the equivalent notion for quadratic forms, conjectured that
the only values of −D with D > 0 for which h = 1 are those given in Exercise 11.4. This was eventually proved1 in the second half of the twentiethcentury by Heegner, and then by Baker and Stark. Baker and Stark verified
the result independently using different methods. They also solved the class
number two problem.
1
Kurt Heegner was a school teacher in Berlin. In 1952 he published a proof of
the class number one problem, an old and famous problem. For several reasons
– minor errors in the work, Heegner’s refusal to give seminar presentations of
his work, and perhaps some reluctance by professional mathematicians to accept
that an “amateur” had solved such an important problem – his proof was not
generally accepted. Heegner’s proof was finally accepted after Alan Baker and
Harold Stark independently proved the result in 1967.
11.2 The Dedekind Zeta Function
229
Exercise 11.6. Show that h = 2 when −D is 15, 20, 24, 35, 40, 51, 52, 88,
91, 115, 123, 148, 187, 232, 235, 267, 403, or 427.
Exercise 11.7. Using a computer
√ algebra package, find the class number of
the imaginary quadratic field Q( −d) for 1 d 100.
We are going to prove the class number formula up to sign. Then we will
consider the sign of the Gauss sum in a separate section. Firstly we develop
some machinery: These ideas are so important, they are worth considering
just for their own sake.
11.2 The Dedekind Zeta Function
A quadratic field has a complex function associated with it that generalizes
the Riemann zeta function associated with Q. The theory of ideals proved to
be ideal as a way of defining such a function. Euler’s momentous observation
about the product formula for the Riemann zeta function requires the Fundamental Theorem of Arithmetic in the integers, which does not hold at the
level of elements in the ring of algebraic integers in a quadratic field. Instead,
one defines the Dedekind zeta function as follows. If I is an ideal in OK , then
write N (I) for the norm of I defined on p. 87. Now define
ζK (s) =
N (I)−s
I
for s ∈ C, where the sum runs over all the nonzero ideals in OK . If K = Q, then
the Dedekind zeta function ζK coincides with the Riemann zeta function ζ.
Notice that our results about the recovery of the Fundamental Theorem
of Arithmetic at the level of ideals (see Section 4.3) means the Dedekind zeta
function admits an Euler product expansion
ζK (s) =
P
1−
1
N (P )s
−1
,
where the product is taken over all prime ideals of OK .
√
Exercise 11.8. Let K = Q( d) for a square-free integer d. Show that
ζK (s) = ζ(s)L(s, χ) for (s) > 1,
where ζ is the Riemann zeta function and L(·, χ) is the L-function for the
character χ from Exercise 11.2. (Hint: Use Exercise 4.18 on p. 91.) Using the
results from earlier chapters, deduce the analytic continuation of ζK to a larger
half-plane.
230
11 Converging Streams
Exercise 11.8, when combined with the results in Chapter 10, gives information about the nature of ζK (s) near s = 1. The L-function is analytic
at s = 1 so ζK (s) inherits only a simple pole from the Riemann zeta function.
The class number formula eventually follows because it is possible to directly
compute the nature of the singularity in two different ways and then equate
them. The residue of ζK at the pole s = 1 is given by the following theorem.
Theorem 11.7. Let ζK denote the Dedekind zeta function of a quadratic number field with discriminant D. Let h denote the class number of OK , let w
denote the number of units in OK if D < 0 and let u denote a fundamental
∗
unit of OK
if D > 0. Then
⎧
2h log u
⎪
⎪
if D > 0,
⎨ √
D
lim (s − 1)ζK (s) = ρK =
2πh
s→1
⎪
⎪
if D < 0.
⎩ +
w |D|
Dirichlet proved this using the language of quadratic forms. Nowadays,
Hecke’s proof using the language of fields and ideals tends to be preferred.
The proof sketched below is a variation of Hecke’s and uses some complex
analysis.
Outline proof of Theorem 11.7. The following proof varies from that in
most of the textbooks; references are provided in the notes at the end of the
chapter.
The two cases D > 0 and D < 0 vary ultimately but begin in the same
way. The idea is to sum over each ideal class, so fix an ideal class C. There is
a fixed ideal J belonging to the inverse class with the property that IJ = (bI )
is a principal ideal generated by the quadratic integer bI which depends on I.
Now bI ∈ J so sum over the elements of J, up to multiplication by units, and
consider
N (J) s
1
= N (J)s
.
(11.1)
N (b)
N (b)s
0=b∈J
0=b∈J
In this sum it is understood that no pair of distinct elements b and b have b/b
equal to a unit. The technical details thus become much easier in the imaginary
quadratic case because there are only finitely many units, so we assume for
now that D < 0. After choosing a basis {b1 , b2 } for the ideal J, N (b) becomes
a positive-definite quadratic form Q(x, y) in the variables x, y, where b =
b1 x + b2 y.
The sum in Equation (11.1) can be compared with the corresponding integral
Q(x, y)−s dxdy,
A
where x and y now become continuous variables. The range of integration A
is an infinite cone, with a small region around (0, 0) removed to take account
11.2 The Dedekind Zeta Function
231
of the fact that b = 0 is excluded from the summation. The shape of the cone
depends on the number of units in OK – if there are w units in OK then the
angle of the cone is 2π/w. The difference between the sum and the integral is
analytic on the region (s) > 0. The method of proof for this is an extension
of the methods used in the second proof of Theorem 8.29 on p. 176. Around
each integral point (x, y) there is a square of area 1. The integral of Q−s over
that square differs from Q(x, y)−s by an amount that may be computed using
the Taylor expansion. The sum over the integral points near the boundary
of the cone contributes a function which is analytic on R(s) > 12 so we can
ignore it.
Exercise 11.9. Prove that this sum is analytic, by comparing the sum with
an integral along an infinite strip with parallel sides.
It follows that the singularity of the sum can be calculated from that of
the integral.
The integral is easily integrated because Q is+a positive-definite quadratic
form. A linear substitution with Jacobian N (J) |D| reduces the integral to
1
+
(X 2 + Y 2 )−s dXdY,
N (J) |D|
B
where the region B is a cone of angle 2π/w with a small region around the
origin removed. Using polar coordinates (r, θ), the region around the origin
can be taken to be r < 1. The resulting integral has a simple pole at s = 1
with residue
2π
+
.
wN (J) |D|
When evaluating the residue in the sum in Equation (11.1) there are two
things to be borne in mind. The first is that the factor N (J) cancels because
of the factor N (J)s in Equation (11.1), which is N (J) when s = 1. The second
is that we sum over all the ideal classes. This explains the appearance of the
term h and the shape of the residue in Theorem 11.7.
When D > 0 a complication is added to the proof because there are
infinitely many units in the ring of algebraic integers OK . Thus greater care is
needed in counting the elements b of J. Roughly the same technique is used
as in the D < 0 case – the sum can be written as in Equation (11.1) with the
same proviso about elements differing by unit multiplication. An additional
assumption will be inserted however; assume that b > 0 and work with the
subgroup of positive units. Write b and b∗ for the conjugates of b ∈ J and fix
a basis for J as above. Writing b = b1 x + b2 y and extending the coordinates x
and y to continuous real variables, we now switch to the continuous, positive
real variables X√ = b1 x + b2 y and Y = b∗1 x + b∗2 y. The transformation has
Jacobian N (J) D exactly as before. A factor of 2 will be inserted below
because of the assumption about the positive unit group. The sum equals (up
232
11 Converging Streams
to an analytic function on (s) > 12 ) the following integral in a half-plane
containing s = 1:
2
√
(XY )−s dXdY,
N (J) D
C
where C is a region we will now describe. Essentially, the region is a fundamental domain under the action of the positive part of the unit group. This
can be expressed neatly in terms of X and Y as follows: The map b → bu
sends c = b∗ /b to c/u2 . Hence each element b in the sum can be represented
by one with
b∗
1
< u2 .
b
This imposes the following constraints upon X and Y :
1
Y
< u2 .
X
(11.2)
One final piece of bookkeeping requires a small region around (0, 0) to be
removed, to take account of the fact that b = 0 is not included in the sum:
Perform the double integral, first over Y satisfying the inequality (11.2) and
then over the interval 1 < X < ∞ – the lower bound ensuring that a suitable
region has been removed around the origin. Integrating over Y gives
-
.
∞
∞
2
2(u2(1−s) − 1)
−s
−s
√
√
X
Y
dY dX =
X 1−2s dX.
Y
2
N (J) D 1
(1
−
s)N
(J)
D
1
1 X <u
Integrating over X yields the following closed formula for the value of the
integral:
1 − u2−2s
√
.
I=
N (J) D(1 − s)2
Although this might appear to yield a double pole at s = 1, notice that s = 1
is a zero of the numerator. The Taylor series of the numerator about s = 1
begins
1 − 1 + 2(s − 1) log u + · · · .
The term N (J) disappears just as it did before. Now the claims made about
the nature of the singularity and the residue in Theorem 11.7 follow after
summing over the classes.
A version of Theorem 11.7 exists which is expressed in terms of the counting of norms of ideals.
Exercise 11.10. Let K denote a quadratic field with discriminant D. For
positive real T , let R(T ) denote the ideal counting function defined as follows:
R(T ) = |{I | N (I) < T }|,
where I denotes an ideal of the ring of algebraic integers of K. Prove that
11.3 Proof of the Class Number Formula
R(T )
→ ρK
T
233
T → ∞,
as
where ρK is defined by the formulas in Theorem 11.7, depending upon the
sign of D.
11.3 Proof of the Class Number Formula
Before we can prove Theorem 11.5, we record some basic lemmas to allow the
proof to proceed unhindered.
Lemma 11.8. Let N > 1 denote a positive integer and suppose 1 a < N .
Then
2a
πa πi
+
− log 1 − e2πai/N = − log 2 sin
1−
.
N
2
N
Proof. This is elementary, relying only upon the definition of the principal
branch of the complex logarithm and some manipulation with the half-angle
formulas.
Let χ denote the character from Exercise 11.2. We will use a form of Gauss
sum (compare this definition with the one in Equation (3.11)).
Lemma 11.9. Let ζ = e2πi/|D| and define
|D|−1
G=
χ(a)ζ a .
(11.3)
a=1
Then
G2 = D.
(11.4)
|D|−1
1 χ(a)ζ an .
G a=1
(11.5)
Moreover, for every n,
χ(n) =
Proof. The claim about G2 is proved in exactly the same way as Equation (3.12) on p. 70 was proved. The product G2 may be written
|D|−1
a=1;
(a,|D|)=1
|D|−1
|D|−1
χ(a)ζ
a
r=1;
(r,|D|)=1
χ(ar)ζ
ar
=
|D|−1
χ(r)ζ (1+r)a .
a=1;
r=1;
(a,|D|)=1 (r,|D|)=1
Instead of adding zero in the form of Equation (3.14), here we add zero in the
two forms
234
11 Converging Streams
|D|−1
|D|−1
χ(r)ζ (1+r)a
a=1;
r=1;
(a,|D|)=1 (r,|D|)>1
and
|D|−1
|D|−1
a=1;
(a,|D|)>1
r=1
χ(r)ζ (1+r)a .
The rest of the proof proceeds as before. Equation (11.5) is proved as follows.
For (n, D) = 1,
|D|−1
|D|−1
χ(a)ζ an =
a=1
|D|−1
χ(an2 )ζ an = χ(n)
a=1
χ(an)ζ an = χ(n)G,
a=1
while for (n, D) > 1 it is clear.
Notice that G lies in a quadratic field. Which quadratic field it lies in
depends upon the sign of D.
√ If D > 0 then it follows from Equation (11.5)
that G is real; hence G = ± D by Equation (11.4).
When D < 0, it follows that
G = −G,
+
so G = ±i |D| using Equation (11.4). Understanding which sign occurs is
not trivial – during the proof below we will fudge this issue, essentially giving
a proof of the class number formula up to sign. In Section 11.4 we will show
how a simple application of Fourier Analysis can be used to determine the
sign of the simplest Gauss sum.
Proof of Theorem 11.5. We begin with the formal evaluation of L(χ, 1),
worrying about convergence later. By definition,
L(χ, 1) =
∞
χ(n)
.
n
n=1
Apply Equation (11.5) and rearrange to give
L(χ, 1) =
|D|−1
∞
1 1 2πain/|D|
χ(a)
.
e
G a=1
n
n=1
The inner sum is − log 1 − e2πai/|D| so invoke Lemma 11.8 with N = |D| to
obtain
L(χ, 1) = −
|D|−1
1 πi
2a
πa
+
1−
.
χ(a) log 2 sin
G a=1
|D|
2
|D|
(11.6)
11.4 The Sign of the Gauss Sum
235
At this point, the sign of D brings about a dichotomy. If D > 0 then only
the logarithm terms in Equation (11.6) survive. We know that L(χ, 1) must be
real and since the Gauss sum G is real, the imaginary part of Equation (11.6)
must cancel. We obtain
|D|−1
1 πa
L(χ, 1) = −
.
χ(a) log sin
G a=1
D
+
+
By Theorem 11.7 this is equal to 2h log u/ |D|. Since G = ± |D|, cancellation occurs and the theorem is proved up to sign.
When D < 0 it is the logarithm terms in Equation (11.6) which cancel.
The Gauss sum G is purely imaginary so no real part of Equation (11.6) can
survive. The cancelling leaves
|D|−1
πi χ(a)a.
L(χ, 1) = −
G|D| a=1
+
By Theorem 11.7
+this is equal to 2πh/w |D|. In this case, cancellation occurs
because G = ±i |D| and once again the formula is proved up to sign.
Finally, the rearrangement can be justified using Abel’s Summation Formula,
Theorem 10.17. All that is needed to use this is the uniform boundedness
of ax χ(a).
Alternatively, one could work backwards from the Taylor Series for the
sine function, whose convergence properties are known to be adequate.
Proof of Theorem 11.6. In this special case the class number formula
comes out as
(q−1)/2 aπ
a
log sin
= h log u,
q
q
a=1
which proves the claim about v as well as giving the proof of its relation
with u.
Of particular note is the way that so much of our earlier material goes
into the class number formula. It suggests that the formula lies very deep in
the fabric of number theory, and it has long been recognized as a profound
relationship. We hope this material might persuade a reader to look into more
advanced topics in the area of overlap between algebra and analysis.
11.4 The Sign of the Gauss Sum
We are only going to consider a simple example, which should be compared
with the Gauss sumdefined
by Equation (3.11) on p. 70. Let q denote an odd
·
prime number and q the Legendre symbol.
236
11 Converging Streams
Write
G=
q−1 a
a=1
q
ζa
2πi/q
where ζ = e
. It is important to recognize, as Gauss himself did, that G is
sensitive to the choice of qth root of unity. In particular, replacing ζ by some
other primitive qth root of unity could change the sign of G. In the following,
we will use Dirichlet’s method2 to evaluate G, based on Fourier analysis.
Exercise 11.11. Generalize Exercise 5.11 to show that
q−1
e2πia/q = −1.
a=1
By Exercise 11.11 we may write
G=
q−1 a
a=1
=2
q
q−1
q−1
e2πia/q +
e2πia/q + 1
a=1
e2πia/q + 1
a=1;
( aq )=1
=
q−1
e2πib
2
/q
.
b=0
Dirichlet’s method works for a more general class of sums.
Theorem 11.10. [Dirichlet] Let N denote a positive integer and define
H=
N
−1
e2πik
2
/N
.
(11.7)
k=0
√
⎧
(1
+
i)
⎪
⎪
√ N
⎨
N
H=
⎪
0
⎪
√
⎩
i N
Then
where
2
√
if
if
if
if
N
N
N
N
≡ 0 (mod 4),
≡ 1 (mod 4),
≡ 2 (mod 4),
≡ 3 (mod 4),
N denotes the positive square root.
Dirichlet contributed significantly to the theory of Fourier analysis. For example, in 1829, he became the first person to give a rigorous proof of the Poisson
Summation Formula. He obtained Theorem 11.10 in 1835 as an application.
11.4 The Sign of the Gauss Sum
237
Proof. The functions {x → e2πinx/N }n∈Z form an orthonormal family with
respect to the inner product
< f, g >=
1
N
N
f (t)g(t) dt.
0
2
The Fourier expansion3 of the map x → e2πix /N with respect to the orthonormal family shows that
∞
1 N 2πix2 /N −2πinx/N
2πik2 /N
=
e
e
dx e2πink/N ,
e
N
0
n=−∞
so
H=
N
−1
∞
k=0 n=−∞
=
∞
n=−∞
1
N
1
N
N
2πix2 /N −2πinx/N
e
e
0
N
2πix2 /N −2πinx/N
e
e
dx
dx e2πink/N
N −1
0
e2πink/N .
(11.8)
k=0
The orthogonality relations Theorem 10.12 applied to the group Z/N Z show
that
N
−1
N if N n,
2πink/N
e
=
0
if N n.
k=0
Thus Equation (11.8) simplifies to give
∞ H=
N
e2πix
2
/N −2πinx
e
dx
0
=
n=−∞
∞ N
n=−∞ 0
∞
=N
2
e2πi(x
−πin2 N/2
3
x
N
dx
1−n/2
2
e2πiN v dv
e
(11.9)
−n/2
n=−∞
where v =
−N nx)/N
− n2 . Now
The function which is being expanded here is defined as follows. Let
2
f (x) = e2πix
/N
for 0 x < N,
and then extend f by requiring that f (x + N ) = f (x) for all x. It is clear
that f (0) = limx→N f (x), so the resulting function is continuous and piecewise
twice continuously differentiable. It follows that the Fourier series for f converges
pointwise to f everywhere.
238
11 Converging Streams
e−πin
2
N/2
=
1
i−N
if n is even,
if n is odd
because odd squares are congruent to 1 modulo 4 so the sum in Equation (11.9)
may be split into sums over n = 2m + 1 and n = 2m, giving
H=N
∞
m=−∞
1−m
2πiN v 2
e
−N
dv + N i
−m
∞
−m+1/2
2
e2πiN v dv.
−m−1/2
m=−∞
Recombining the integrals shows that
H = N (1 + i−N )
=
√
−N
N (1 + i
∞
2
e2πiN v dv
−∞
∞
)
2
e2πiw dw
(11.10)
−∞
√
where w = v N . To compute the integral, notice that Equation (11.10) holds
for all N , in particular for N = 1. When N = 1, H = 1 by Equation (11.7)
and it follows that
∞
2
1
e2πiw dw =
.
1
+
i−1
−∞
Substituting this value gives
H=
1 + i−N √
N,
1 + i−1
and checking the possible congruence classes modulo 4 completes the proof of
Theorem 11.10.
Exercise 11.12. Evaluate the sum G in Equation (11.3).
11.5 The Conjectures of Birch and Swinnerton-Dyer
The group law on an elliptic curve introduced in Chapter 5 involves rational
functions only so, as pointed out in Section 5.3, the theory of elliptic curves
makes sense over a finite field Fq of characteristic p as long as we avoid division
by p. Exercises 5.20 and 5.21 on p. 109 began our study of elliptic curves over
finite fields.
11.5.1 The Hasse Theorem
Consider the elliptic curve E defined by the affine equation
E : y 2 = x3 + ax + b,
(11.11)
11.5 The Conjectures of Birch and Swinnerton-Dyer
239
with a and b integral. Recall that the non-degeneracy condition on the curve E
is defined in terms of
∆ = 4a3 + 27b2 .
Fix a prime p. If p ∆, then the curve defined over Fp obtained by reducing Equation (11.11) modulo p is an elliptic curve; let Np = Np (E) denote
the number of points on this curve. How large is Np ?
There is a trivial bound: The projective plane P2 (Fp ) is defined by
P2 (Fp ) = {(x, y, z) ∈ F3p | (x, y, z) = (0, 0, 0)}/ ∼,
where (x, y, z) ∼ (x , y , z ) if and only if there is a λ ∈ F∗p with
(x, y, z) = λ(x , y , z ).
There are (p3 − 1) choices for the triple (x, y, z), and each equivalence class
3
−1
under ∼ has |F∗p | = (p−1) elements. It follows that there are pp−1
= p2 +p+1
points in P2 (Fp ), so certainly
|Np | p2 + p + 1.
(11.12)
This estimate ignores the fact that the points we are counting lie on the curve
defined by Equation (11.11), so it is hardly surprising that much more can be
done. The curve in projective coordinates is defined by
{[x, y, z] ∈ P2 (Fp ) | y 2 z = x3 + axz 2 + bz 3 }.
For each of the (p2 − p) possible values of (x, z) with z = 0, we are trying to
solve an equation of the form y 2 = f (x, z) for y. This has at most two possible
solutions. If z = 0, then x = 0 also, so y = 0, and there is (projectively) one
choice for y. Thus
|Np | 2(p2 − p)/(p − 1) + (p − 1)/(p − 1) = 2p + 1,
(11.13)
a dramatic improvement over the inequality (11.12). However, we know that
not all numbers are quadratic residues modulo p. Indeed, for p = 2, exactly
half of the elements of F∗p are quadratic residues.
Exercise 11.13. Let p denote an odd prime, and assume that gcd(a, p) = 1.
Find the exact number of solutions to y 2 = ax + b over Fp .
Exercise 11.14. Let p denote a prime congruent to 2 modulo 3. Show that
for an elliptic curve of the form E : y 2 = x3 + b, Np = p + 1.
Exercise 11.15. Let p denote a prime congruent to 3 modulo 4. Show that
for an elliptic curve of the form E : y 2 = x3 − x, Np = p + 1.
240
11 Converging Streams
Our problem is a little more subtle. Working in projective coordinates as
before, when z = 0 there are two choices (or one choice if p = 2) for y if
(x3 + axz 2 + bz 2 )/z
is a quadratic residue modulo p and no possible choices for y if
(x3 + axz 2 + bz 2 )/z
is a quadratic nonresidue modulo p. If z = 0, then x = 0, so there is one choice
for y. Thus for an odd prime p, we expect
|Np | = 2|{x ∈ Fp | x3 + ax + b is a quadratic residue (mod p)}|
+|{x ∈ Fp | x3 + ax + b ≡ 0 (mod p)}|
+1.
It is not clear if this is an improvement over the inequality (11.13), but it
is nonetheless suggestive. For now, we ignore the second term since no more
than three values of x in Fp will have x3 +ax+b ≡ 0 modulo p. If x3 +ax+b is
no more or less likely than x to be a quadratic residue, then we would expect
the first term to contribute p to the total. This suggests that we write
Np = (p + 1) − ap ,
(11.14)
where ap encodes the information about the extent to which the polynomial x3 + ax + b fails to distribute its values fairly between quadratic residues
and nonresidues. If the polynomial behaves reasonably well, then we expect
the “error” ap defined by Equation (11.14) to be small relative to the prime p.
This turns out to be the case. The next theorem was conjectured by Artin
in his thesis and proved by Hasse. We will not prove it here; proofs may be
found in several of the references at the end of the chapter.
Theorem 11.11. [Hasse’s Theorem] Let Np denote the number of points
in Fp on an elliptic curve defined over Fp . Then
√
|Np − (p + 1)| 2 p.
(11.15)
Notice
that the hypothesis (of being an elliptic curve over Fp ) requires
that p ∆. This theorem gives a precise bound for the size of the error term ap
in Equation (11.14).
Birch and Swinnerton-Dyer carried out extensive calculations on the numbers Np and combined this with deep theoretical insights into the arithmetic of
elliptic curves. One of the resulting conjectures is that one of the “global” measures of the complexity of E(Q) – the rank of the curve – should be reflected
in the extent to which the “local” quantities Np exceed (p + 1). Numerical
experiments led to the following conjecture.
11.5 The Conjectures of Birch and Swinnerton-Dyer
241
Conjecture 11.12. There is a constant C, depending only on the curve E, with
the property that
Np
∼ C(log X)r
p
p∈P,p<X
where r is the rank of the group E(Q).
11.5.2 The L-function Attached to an Elliptic Curve
The error term ap defined in Equation (11.14) as p varies carries subtle information about the elliptic curve. For p∆, the reduction of the curve modulo p
is not an elliptic curve. This phenomena is called bad reduction.
In order to understand some of the possibilities in the more familiar setting
of curves over the reals, we recall two examples. The point (0, 0) on y 2 = x3
(mentioned on p. 54) is a cusp – a point where two tangents coincide. The
point (1, 0) on y 2 = x2 (x + 1) is also singular but for a different reason: there
is a pair of distinct tangents (see Exercise 5.2).
Returning to the primes of bad reduction, the associated number ap is
defined according to the type of reduction as follows, using formal derivatives
and tangents over finite fields.
•
If the curve reduced modulo p has a cusp (a point where two tangents
coincide), then set ap to be 0. This is called additive reduction.
• If the curve has a double point as its only singularity, with tangents having
rational slopes over Fp , then set ap to be 1. This is called split multiplicative
reduction.
• The remaining possibility is that the curve has a double point as its only
singularity over Fp , with tangents having slopes defined over a quadratic
extension of Fp but not over Fp . In that case set ap to be −1. This is called
non-split multiplicative reduction.
With the notation above, the L-function attached to the curve E is defined
as an Euler product to be
−1 −1
LE (s) =
1 − ap p−s
1 − ap p−s + p1−2s
(11.16)
p|∆
p |∆
Exercise 11.16. Use the inequality (11.15) to show that the Euler product (11.16) converges when (s) > 32 .
Exercise 11.17. If LE is expanded as a Dirichlet series
LE (s) =
prove that for each prime p, cp = ap .
∞
cn
,
ns
n=1
242
11 Converging Streams
The Euler product defining LE does not converge at s = 1, but for any X
notice that Equation (11.14) shows that
p
−1
=
(1 − ap /p + 1/p) .
Np
p<X
p<X
Thus
p|∆,
p<X
1 − ap p−1
−1 1 − ap p−1 + p1−2
−1
=A×
p |∆,
p<X
p
,
Np
p<X
where A is a term depending only on the finitely many primes dividing ∆.
What this means is that if the L-function can be extended to include the
point s = 1, we expect the analytic properties at s = 1 of the extension
to carry information about the numbers Np . The conjectures of Birch and
Swinnerton-Dyer are a very precise formulation of this idea. In order to state
some of them, we need several preliminary results and some new definitions.
Some details, even of the definitions, are omitted.
The first of the results needed is a conjecture usually attributed to Hasse,
Weil, and Deuring, which was proved by work of Wiles and of Taylor and
Wiles in a special case. The full result was obtained by a strengthening of
Wiles’ method due to Breuil, Conrad, Diamond, and Taylor. The arithmetic
conductor NE of the elliptic curve E, referred to in Theorem 11.13, is a refined
version of ∆. It has the same prime divisors as ∆ but reflects more precisely
the behavior of the curve under reduction modulo each prime.
Theorem 11.13. The L-function LE extends to an entire function on the
whole complex plane, and the extended function satisfies a functional equation
of the form
ΛE (s) = ±ΛE (2 − s),
where
ΛE (s) =
1
s/2
Γ (s)NE LE (s).
(2π)s
There are several aspects to the conjecture of Birch and Swinnerton-Dyer,
two of which we present below in ascending order of strength.
Conjecture 11.14. [Birch and Swinnerton-Dyer] The extended L-function
has the following properties.
1. LE (1) = 0 if and only if the rank of E is zero.
2. If the expansion of LE at s = 1 has the form
LE (s) = bg (s − 1)g + · · ·
with g 0 and bg = 0, then the rank of E is g.
11.5 The Conjectures of Birch and Swinnerton-Dyer
243
The third part of the conjecture is the analog of the formula in Equation (11.7); it gives an exact formula for bg in terms of data associated with
the elliptic curve. It is too complicated to state here but may be found in
the references. Remarkably, each of the ingredients in Theorem 11.7 has an
elliptic analog.
The conjecture of Birch and Swinnerton-Dyer is one of the seven Millennium Prize Problems announced by the Clay Mathematics Institute.
11.5.3 Tunnell’s Theorem
Theorem 5.8 hints at a connection between elliptic curves and congruent numbers. It is very far from a characterization of congruent numbers. A deep
result due to Tunnell exploits the connection between elliptic curves, their Lfunctions and congruent numbers at a more profound level, and gives a form
of characterization of congruent numbers. Part of his result may be stated as
follows.
Theorem 11.15. [Tunnell’s Theorem] Let n be an odd natural number
with no square factor. Say that n has property T if the number of integer
triples (x, y, z) satisfying the equation 2x2 + y 2 + 8z 2 = n is twice the number
of integer triples (x, y, z) satisfying 2x2 + y 2 + 32z 2 = n. Then
n is congruent =⇒ n has property T .
If (part of ) the Birch and Swinnerton-Dyer conjecture holds, then
n has property T
=⇒ n is congruent.
Notes to Chapter 11: The material in Sections 11.1 to 11.4 is classical, and complete treatments may be found in many of the references, including Davenport [40],
Hasse [76], Lang [96], and Weil [158]. Davenport’s book [40] approaches the class
number formula via quadratic forms, which is closer to Dirichlet’s original formulation. The results on the class number one problem mentioned appear in papers
of Baker [7], Heegner [78], and Stark [146]. We followed Davenport [40, Chapter 2]
closely in the proof of Theorem 11.10. The equivalence between ideal counting (Exercise 11.10) and Theorem 11.7 is explained, in the general case, in Lang [96, Chapters
VI and XV]. Hasse’s Theorem is proved in many places including Chahal [28] and
Silverman [139]; Silverman and Tate [143] give a particularly accessible treatment.
For more on Conjecture 11.12 and its connection to the conjecture of Birch and
Swinnerton-Dyer, see the paper of Goldfeld [69]. The work of Wiles and others on
Theorem 11.13 is discussed in many places, including the notes of Darmon [39],
which contains references to the original papers; the analytic continuation as we
have stated it is proved in the paper of Breuil, Conrad, Diamond and Taylor [23].
The conjectures of Birch and Swinnerton-Dyer, and the proofs of special cases, are
244
11 Converging Streams
discussed in (relatively) accessible form in a paper by Wiles [161] on the Clay Institute Web site and in a paper of Zagier [166]. They were originally formulated in
the papers [16] of Birch and Swinnerton-Dyer. Tunnell’s Theorem (Theorem 11.15)
is shown in his paper [155]; an accessible treatment is in Koblitz [89].
12
Computational Number Theory
Expressions such as “too slow” or “computationally infeasible” have been used
to describe methods for finding large primes and verifying their primality.
For example, we dismissed the sieve of Eratosthenes on these grounds. In
this chapter, we are going to discuss objective tests on algorithms to try to
quantify statements about how fast or slowly they may run. This comprises a
brief introduction to a field of growing importance. Three important themes
taken up in the books cited at the end of the chapter are the following. How
can advances in the speed of electronic computing devices be exploited by
number theorists? How can number theory be applied to improve the speed
of calculations? How can number theory contribute to the search for secure
methods of communication?
12.1 Complexity of Arithmetic Computations
To begin, let us analyze the basic arithmetic operations. A computer may
seem to take no time at all to carry out a simple calculation, but in fact
computer calculations run in a time that is approximately proportional to the
number of bit operations required. Computers work in binary so instead of
expressing integers in decimal notation, we use their binary equivalents: Just
as the string of decimal digits ak ak−1 · · · a0 is used to denote the integer
a0 + 10a1 + · · · + 10k ak ,
a string of binary (zero–one) digits bk bk−1 · · · b0 denotes the integer
b0 + 2b1 + · · · + 2k bk .
The first few binary numbers are shown in Table 12.1.
Arithmetic in binary is carried out as usual. For example, the long multiplication 120 · 30 = 3600 becomes
246
12 Computational Number Theory
Table 12.1. Binary numbers.
Decimal number Binary equivalent
1
1
2
10
3
11
4
100
5
101
6
110
7
111
1111000 × 11110 = 111000010000
by the calculation shown in Figure 12.1.
Notice that this only involves the repetition of a simple basic operation:
Add a binary digit 0 or 1 to a binary digit 0 or 1 and output a 0 or a 1
with a carry of a 0 or a 1 to the next column. This bit operation (bit is an
abbreviation of binary digit),
0 + 0 → 0,
1 + 0 → 1,
0 + 1 → 1,
1 + 1 → 0 carry 1,
is the building block from which many arithmetic operations can be built. It
will be useful to allow a carried digit to be included; for example, 1 plus 1
with a carried digit 1 outputs 1 and a carried 1 in one operation.
1111000
11110×
1111
111
11
1
0
1
1
1
0
0
1
1
0
00
000
1000
+
+
+
111000010000
Figure 12.1. Binary multiplication.
Definition 12.1. A bit operation is the process of adding or subtracting two
binary digits, taking account of any carried or borrowed digits, and outputting
an answer and a carry or a borrow.
Now consider the problem of adding two numbers m and n presented in
decimal form, outputting the sum in binary. There are two steps to carry out.
1. Convert m and n to binary. Without loss of generality, assume that m
has k bits and n has bits with k .
12.1 Complexity of Arithmetic Computations
247
2. Add the two numbers. Notice that no more than (k + 1) bit operations
will be required.
Definition 12.2. The complexity C of an arithmetic calculation is the number of bit operations required to carry it out.
The complexity of a calculation is an upper bound on the time it will take
to run on a computer.
Thus
C(add a k-bit number to an -bit number) k + 1 if k .
Example 12.3. The calculation 23 + 7 = 30 in binary takes five bit operations:
10111
111+
11110
Recall the following notation for functions f, g : N → R. We say that
f = O(g) or f (x) = O(g(x))
if there is a constant C 0 with
f (x) Cg(x) for all x ∈ N.
This is particularly convenient for complexity calculations because it is only
an upper bound: C = O(g) is a true statement as soon as we find an algorithm that runs in g bit operations, even if we have missed an ingenious
algorithm that runs much faster. Finding lower bounds for the complexity of
a calculation is a more subtle problem that we do not touch on here.
Lemma 12.4. For k ,
C(add a k-bit number to an -bit number) = O(k).
Subtracting one number from another has the same complexity.
Lemma 12.5. For k ,
C(subtract an -bit number from a k-bit number) = O(k).
Lemma 12.6. For k ,
C(multiply a k-bit number by an -bit number) = O(k 2 ).
248
12 Computational Number Theory
∗k ∗k−1 . . . . . . . . . . . . ∗1
∗ ∗−1 . . . ∗1
∗ ∗ ... ∗
∗ ∗ ...
..
.
..
.
∗
..
.
∗
∗
... ∗
Figure 12.2. Multiplying a k-bit number by an -bit number.
Proof. Notice that all we need is an upper bound for the number of bit
operations required. The long multiplication can be done in the shape shown
in Figure 12.2, with arbitrary bits denoted ∗. We have the following upper
bounds. There are no more than (k + 2) operations needed to add the lowest
two rows, resulting in an integer with no more than (k + 2) bits. Adding this
to the third lowest row requires no more than (k + 3) bit operations, resulting
in an integer with no more than (k + 3) bits, and so on. The total number of
bit operations does not exceed
(k + 2) + (k + 3) + · · · + ( + k + 1) k( + k + 1)
k(2k + 1) = O(k 2 ).
Dividing one integer by another results in a rational number. In arithmetic,
we are primarily interested in integer operations, so division means finding the
quotient and remainder.
Lemma 12.7. For k ,
C(divide a k-bit number by an -bit number) = O(k 2 ).
Proof. This calculation amounts to repeatedly subtracting an -bit number
from a k-bit number and testing to see if the answer is less than the -bit
number. By Lemma 12.5, this takes no more than k O(k) = O(k 2 ) bit operations.
12.1.1 Improving Complexity Estimates
We have been considering only the most obvious methods for doing arithmetic.
It is possible to speed up these basic algorithms considerably. For example,
Lemma 12.6 says that
12.1 Complexity of Arithmetic Computations
249
C(multiply a k-bit number by a k-bit number) = O(k 2 ).
Using a different algorithm to do the multiplying allows complexity
O(k · log k · log log k · log log log k).
We will not prove any sophisticated results of this kind, but some indication of how even simple algorithms can be improved is given by the following
result.
Theorem 12.8. C(multiply a k-bit number by a k-bit number) = O(k 1.59 ).
Proof. This relies on chopping up the integers m and n to be multiplied in
a clever way (as all these methods do). Assume for simplicity that k is even.
If m has k bits, then we can write
m = a · 2k/2 + b,
where a and b have at most k/2 bits. Similarly,
n = c · 2k/2 + d,
where c and d have at most k/2 bits.
Then
mn = ac2k + (bc + ad)2k/2 + bd
= ac2k + [ac + bd − (a − b)(c − d)]2k/2 + bd.
We have computed ac, bd, (a − b)(c − d), all of which are approximately
half the bit length of m and n. Notice that multiplying by 2k really constitutes
a shift of binary data and so can be ignored in complexity terms.
Write C(k) for the complexity of multiplying two k-bit numbers this way.
Then
C(k) 3C(k/2) + O(k).
This means that there is a constant C1 with
C(k) 3C(k/2) + C1 k.
We want to iterate this to find an estimate for C(k); the danger is that we cannot add O(·) terms carelessly. If we abuse notation slightly to also write C(k)
for an upper bound of the functions satisfying the relation for real values of k
as well as integral values, then
C(k) 3C(k/2) + C1 k
3 3C(k/22 ) + C1 (k/2) + C1 k
32 C(k/22 ) + (1 + 3/2)C1 k
..
.
3r C(k/2r ) + (1 + 3/2 + · · · + (3/2)r−1 ) for all r 1,
250
12 Computational Number Theory
so
C(k) 3r C(k/2r ) + C2 k(3/2)r for all r 1.
Choose r so large that k/2r 1; that is, r = log k/ log 2. It follows that
C(k) = O 3log k/ log 2
= O k log 3/ log 2
= O k 1.59 .
12.1.2 Polynomial Complexity
All the complexity calculations so far have involved the number of bits in the
numbers. Sometimes it is useful to relate this to the size of the numbers in
the usual sense. If n has bit length k, then
2k−1 n 1 + 2 + · · · + 2k−1 = 2k − 1 2k ,
so
(k − 1) log 2 log n k log 2.
This means that log n is interchangeable with k in all O-estimates. For example,
C(multiply m by n) = O(log m · log n).
The complexity estimates above have been logarithmic in the input variables. In general, a computation involving numbers n1 , . . . , nr is said to have
polynomial complexity (or to be a polynomial time calculation) if its complexity satisfies
C = O (log n1 )d1 · · · (log nr )dr ,
where the di are integers. As a rule of thumb, polynomial complexity estimates
are considered computationally feasible for large values of the variables. Of
course, in practice, the constant hidden in the O notation will influence the
running time. For practical implementation, it is desirable to make this as
small as possible. We do not touch on such issues here.
There are several quite basic arithmetic problems that are believed not to
be solvable in polynomial time, and this complexity is used in cryptography.
Example 12.9. [The Knapsack Problem] Given natural numbers n1 , . . . , nd
and N , decide if there is a subset I of the index set {1, 2, . . . , d} with the
property that
ni = N.
i∈I
The most naı̈ve approach is to simply try all the subsets, and there are 2d of
them. No algorithm is known to decide this problem in polynomial time.
12.2 Public-key Cryptography
251
12.2 Public-key Cryptography
Part of the great interest in computational number theory comes about because the arithmetic properties of large integers lie at the heart of all modern
methods of secure electronic communication. This is a large and active field
of research, and we will simply describe one very simple example, the RSA
cryptosystem. This is a method for encrypting a message, in such a way that
only the intended recipient can feasibly decrypt it. It also allows for digital
signatures that authenticate that you are who you claim to be. The scheme
is named after Ronald Rivest, Adi Shamir, and Leonard Adleman who developed it in 1977. What we will describe below is a method for parties to
send each other large numbers (that may be used to encode any form of message) in a secure fashion. Real implementations of RSA are a great deal more
complicated than the description here, but the basic idea of connecting the
difficulty of an arithmetic operation to secure communication should become
clear.
12.2.1 The RSA algorithm
Fix two large primes p and q, and compute their product n = pq. The number n is called the modulus of the resulting scheme. Choose a number e, the
public exponent, with
1 e < n;
gcd(e, (p − 1)(q − 1)) = 1.
Next compute d, the private exponent, with the property that
ed ≡ 1 (mod (p − 1)(q − 1)).
There is such a d since the requirement that gcd(e, (p − 1)(q − 1)) = 1 means
∗
that e ∈ (Z/pqZ) (see Corollary 1.25 on p. 37). Now publish openly the public
key, which is the pair of numbers (n, e). The private key (n, d) will be used
to decode messages. The original primes p and q are no longer needed – but
must be kept secret.
The security of the system is based on the following: It is very difficult to
compute d given e and n, unless you know p and q. Finding the inverse of e
modulo (p − 1)(q − 1) is easy if you know the value of φ(pq) = (p − 1)(q − 1),
but here we know the value of pq and can only work out (p − 1)(q − 1) by
finding p and q. A fast method to factorize large numbers would render the
method insecure. We will see later that the calculations involved in encryption
and decryption can be done very quickly (see Exercise 12.5).
Encrypting a message
Suppose that Albertina wants to send an integer m < n to Bill. Albertina
converts m into an encrypted number c (usually called the cyphertext) by
computing
252
12 Computational Number Theory
c = me
(mod n),
where the pair (n, e) is Bill’s published public key. She then sends the number c
to Bill over an open channel.
When Bill receives the encrypted message c, he uses his private key (n, d)
to compute
m = cd (mod n).
(12.1)
Exercise 12.1. Show that Equation (12.1) correctly recovers the original
message m.
Exercise 12.2. In order to use the RSA algorithm, a ready supply of primes
of specified size is needed. Use the Prime Number Theorem (Theorem 8.1) to
estimate the number of primes with N decimal digits for a large N .
Anyone other than Bill will only have access to the public key (n, e). In
order to recover the message m, they would have to somehow compute d,
which involves factorizing n.
Digital Signatures
Suppose now that Albertina wants to send a message m to Bill in such a way
that Bill is assured that the message has not been substituted by another
message, and that the message comes from Albertina and not from someone
else. Both of these problems are real because the public key allows anyone to
generate messages that appear authentic. In order to do this, both Albertina
and Bill create public and private keys as above.
Albertina creates a digital signature s by computing
sA = mdA
(mod nA ),
where (nA , dA ) is her private key. She sends m (using the system above) and
the signature sA to Bill. To verify the signature, Bill checks that the message m
is recovered by
m = seAA (mod nA ),
where (nA , eA ) are Albertina’s public key.
Exercise 12.3. Explain what happens if either the encrypted message or the
signature is tampered with.
This approach allows both the encryption of the message and the signing
to be done without any exchange of private keys. Each party only needs to
know the other’s public key and their own private key. Only someone with the
correct private key can decrypt received messages or sign outgoing messages.
12.3 Primality Testing: Euclidean Algorithm
253
12.3 Primality Testing: Euclidean Algorithm
The two most natural arithmetic problems of computational interest are the
following. Firstly, to decide if a given integer is a prime. Secondly, to factorize
an integer known to be composite.
Lucas proved that M67 = 147573952589676412927 is composite without
exhibiting any factors. In fact, M67 is a difficult number to factorize in the
sense that it has no small factors:
267 − 1 = 193707721 · 761838257287
is the prime factorization of M67 . Being able to prove that a number is composite without exhibiting any of its factors seems unlikely, but as we have
already seen in Section 1.5, Fermat’s Little Theorem sometimes allows this.
For example, the fact that 290 ≡ 64 modulo 91 implies that 91 must be a
composite number.
This idea – exploiting known properties of primes as a test for primality
– leads to very efficient tests for primality that only rarely give the wrong
answer. The first ingredients needed are complexity estimates for modular
arithmetic and the Euclidean Algorithm.
Lemma 12.10. C(multiply a modulo m by b modulo m) = O (log m)2 .
Proof. If m has k bits, then the residues of a and b modulo m are numbers
with no more than k bits. Computing a · b takes O(k 2 ) bit operations by
Lemma 12.6, and a · b has no more than 2k bits. Finding the residue of a · b
modulo m takes O(k 2 ), giving a total time of O(k 2 ) + O(k 2 ) = O(k 2 ).
The Euclidean Algorithm finds the greatest common divisor of two integers
without factorizing them. Given integers a > b > 0, the algorithm proceeds
by the following successive divisions:
a = bq1 + r1 , 0 < r1 < b
b = r1 q2 + r2 , 0 < r2 < r1
..
.
rn−2 = rn−1 qn + rn ,
0 < rn < rn−1
rn−1 = rn qn+1
and then
gcd(a, b) = rn .
Theorem 12.11. C(find gcd(a, b)) = O (log a)3 .
Proof. Each step involves a simple
division of integers that are less than a, so
each step has complexity O (log a)2 . We need to estimate how many steps are
taken before the algorithm terminates. Since the remainders ri are nonnegative
254
12 Computational Number Theory
integers, the number of steps can be bounded if we know that the remainders ri
are decreasing rapidly. We claim that
ri+2 12 ri for i = 1, 2, . . . .
(12.2)
There are two ways to see this. The claim follows by noticing that every two
steps, the bit length of the remainders must go down by at least 1. This
becomes clear with an explicit approach to long division; if after one step the
bit length has not gone down, then the next step involves dividing two binary
integers of the same bit length, and hence the quotient is 1 and the remainder
must be shortened.
To prove this in a more rigorous manner, consider the following possibilities
for a fixed i. If
ri+1 12 ri ,
(12.3)
then ri+2 < ri+1 then
1
2 ri ,
as required. If the inequality (12.3) does not hold,
ri+1 > 12 ri .
(12.4)
Now
ri = ri+1 qi+2 + ri+2 ,
so if qi+2 2 then ri 2ri+1 + ri+2 2ri+1 , which contradicts the inequality (12.4). It follows that qi+2 = 1 (as we predicted), so
ri+2 = ri − ri+1 < ri − 12 ri = 12 ri ,
proving the inequality (12.2).
The inequality (12.2) gives the following rapid decay in the size of the
remainders:
r3 < 12 r1 ,
r5 < 12 r3 < 14 r1 ,
..
.
r2n+1 <
1
2 n r1 .
It follows that
n>
log a
1
1
=⇒ n r1 < n a < 1 =⇒ r2n+1 = 0,
log 2
2
2
so the number of steps is n = O(log a).
Thus, the total time taken to complete the algorithm is O (log a)3 .
Corollary 12.12. Let a > b > 0 be integers. Then integers m, n can be found
with |m| b, |n| a, and
in complexity O (log a)3 .
am + bn = gcd(a, b)
12.3 Primality Testing: Euclidean Algorithm
255
Corollary 12.13. Let m 2 be an integer and let a, 1 a m, be an
integer with gcd(m,
a) = 1. Then the inverse of a modulo m can be found with
complexity O (log m)3 .
Example 12.14. Let m = 31 and a = 12. Since gcd(31, 12) = 1, there is an x
with 12x ≡ 1 modulo 31. Apply the Euclidean Algorithm:
31 = 12 · 2 + 7,
12 = 7 · 1 + 5,
7 = 5 · 1 + 2,
5 = 2 · 2 + 1.
It follows that
1 = 5−2·2
= 5 − 2(7 − 5) = 3 · 5 − 2 · 7
= 3(12 − 7) − 2 · 7 = 3 · 12 − 5 · 7
= 3 · 12 − 5(31 − 2 · 12) = 13 · 12 − 5 · 31,
so 12−1 ≡ 13 modulo 31.
We will exploit the speed of the Euclidean Algorithm in what follows.
The first goal is to extend our use of Fermat’s Little Theorem to construct
a primality test. This involves exponentiation modulo m, so the first step
is to find a way to compute bn modulo m for large values of m, n, and b.
Certainly, in a calculation such as 290 ≡ 64 modulo 91 the huge intermediate
number 290 should not be computed – all the intermediate steps should be
done modulo m. On the face of it, this still leaves n possible multiplications
and reductions modulo m of numbers as large as m. It turns out that there is
an additional simplification that reduces the order of complexity dramatically:
Carry out the multiplications using the Repeated Squaring algorithm.
Theorem 12.15. Given m 2 and b, n ∈ N, 0 b < m,
C(compute bn modulo m) = O (log m)2 log n) .
Proof. Assume that n is presented as a binary number
n = n0 + 2n1 + · · · + 2k nk
with bits ni ∈ {0, 1}. Now carry out the following calculations (all modulo m).
256
12 Computational Number Theory
Step 1: compute b0 = bn0 ;
Step 2: compute b2 ;
Step 3: compute b1 = b0 (b2 )n1 ;
Step 4: compute b4 = (b2 )2 ;
2
Step 5: compute b2 = b1 (b2 )n2 ;
2
Step 6: compute b8 = (b2 )2 ;
Step 7: compute b3 = b2 (b8 )n3 ;
..
.
Step 2k + 1: compute bn = bn .
The total number
of steps is 2k + 1 = O(k). The complexity of each step
is O (log m)2 because it involves multiplying two integers modulo m. It follows that the total complexity is
O(k) · O (log m)2 = O k(log m)2 = O (log m)2 log n .
We also need a general result about solutions to simultaneous linear congruences, the Chinese Remainder Theorem (see p. 61). We know how to solve
linear congruencies of the form
ax ≡ b
(mod m)
if gcd(a, m) = 1 using the Euclidean Algorithm. It is often useful to be able
to solve simultaneous congruences.
Theorem 12.16. Suppose that m1 , . . . , mr are integers with gcd(mi , mj ) = 1
for all i = j. Then the simultaneous congruences
x ≡ a1
x ≡ a2
..
.
x ≡ ar
(mod m1 )
(mod m2 )
(mod mr )
have a solution, and this solution is unique modulo M = m1 · · · mr .
The uniqueness means that if x and y are integers solving all the congruences, then x ≡ y modulo M .
Proof. If x and y both satisfy the congruences, then
x − y ≡ 0 (mod mi ) for i = 1, . . . , r.
12.3 Primality Testing: Euclidean Algorithm
257
Since the mi are all coprime, this means (x − y) is divisible by M as required.
We show that there is a solution by constructing one. Let
Mi =
M
= m1 · · · mi−1 mi+1 · · · mr for i = 1, . . . , r.
mi
Then gcd(Mi , mi ) = 1 for all i, so by the Euclidean Algorithm there are
integers Ni , 0 Ni mi , with
Mi Ni ≡ 1 (mod mi ) for i = 1, . . . , r.
Let
x=
r
ai Mi Ni .
i=1
Then x satisfies all the congruences.
Example 12.17. Consider the simultaneous congruences
x ≡ 2 (mod 3),
x ≡ 3 (mod 4),
x ≡ 4 (mod 5).
The Chinese Remainder Theorem predicts a solution that is unique modulo 60.
Working through the proof gives
M1 = 20 so 20N1 ≡ 1 (mod 3) and N1 = 2,
M2 = 15 so 15N2 ≡ 1 (mod 4) and N2 = 3,
M3 = 12 so 12N3 ≡ 1 (mod 5) and N3 = 3.
Hence a solution is
x = a1 M1 N1 + a2 M2 N2 + a3 M3 N3
= 2.20.2 + 3.15.3 + 4.12.3
= 80 + 135 + 144
= 359 ≡ 59 (mod 60)
Exercise 12.4. Find an estimate for C(compute (n − 1)! modulo n) (see the
discussion in Section 1.5 concerning the practicality of using al-Haytham’s
Theorem for primality testing).
Exercise 12.5. Estimate the complexity of the steps involved in the RSA
algorithm from Section 12.2
258
12 Computational Number Theory
12.4 Primality Testing: Pseudoprimes
Given an integer n, it is easy enough to construct a flawless test that will
certify the primality
of n. If n is not prime, then it must have a prime factor
√
no larger than n, so a test√for the primality of n is to see if n is divisible
by any prime smaller than n. This is completely impractical as soon as n
is large. To see why, consider what is involved in applying the method to a
relatively small number close to 1016 .
1. We need to know or find all the primes up to about 108 .
2. Even if we have a list of all those primes, there are about logNN primes up
to N , so we would have to trial divide by about 5 · 106 numbers.
This approach will never give complexity that is polynomial in n.
In general, any approach that requires knowledge of all the primes up to a
bound that is relatively large in relation to n in order to test the primality of n
is doomed to be impractical for large values of n. This includes Eratosthenes
for example, although this is useful for small values of n.
Let us go back to Fermat’s Little Theorem to construct a primality testing
algorithm free of this basic weakness. Suppose n is a large positive integer
(with 80 digits, say). Consider the following algorithm.
(1) Choose an integer b, 1 < b < n.
(2) Compute gcd(b, n) using the Euclidean Algorithm.
(3) If gcd(b, n) > 1, then we have found a nontrivial factor of n so n is not
prime.
(4) If gcd(b, n) = 1, then compute bn−1 modulo n using the Repeated Squaring
algorithm.
(5) If bn−1 ≡ 1 modulo n, then n is not prime because it does not satisfy
Fermat’s Little Theorem.
Example 12.18. Let n = 91 and choose b = 6. We check that gcd(6, 91) = 1,
and find that 690 ≡ 63 modulo 91, so 91 is not prime.
Suppose this algorithm is run several times, and for each of the chosen
values of b with gcd(b, n) = 1 we find that
bn−1 ≡ 1 (mod n).
(12.5)
How convinced should we be that n is prime?
Definition 12.19. A composite integer n with bn−1 ≡ 1 modulo n for some b
with gcd(b, n) = 1 is called a pseudoprime to base b.
Example 12.20. The fact that 91 is not a prime was readily detected by this
method. To see how recalcitrant some numbers can be, consider n = 561. The
first few numbers b with gcd(b, 561) = 1 are 2, 4, 5, 7, 8, 10, and we can easily
find that
12.4 Primality Testing: Pseudoprimes
259
2560 ≡ 1 (mod 561),
4560 ≡ 1 (mod 561),
5560 ≡ 1 (mod 561),
7560 ≡ 1 (mod 561),
8560 ≡ 1 (mod 561),
and
10560 ≡ 1 (mod 561).
Despite this prime-like behavior, 561 is composite: 561 = 3 · 11 · 17. We have
already seen that this number is a stubborn mimic of primality: It satisfies Equation (12.5) for every base b with gcd(b, 561) = 1 (see Exercise 1.25
on p. 34).
In order to start to understand how prevalent pseudoprimality is, we need
to look at the algebraic properties of pseudoprimes. It will be useful to write
x̄ ∈ {0, 1, . . . , n − 1}
for the residue modulo n of an integer x.
Theorem 12.21. Suppose that n is odd.
(1) n is a pseudoprime to base b if and only if the multiplicative order of b̄
in (Z/nZ)∗ divides (n − 1).
(2) If n is a pseudoprime to bases b1 and b2 , then n is a pseudoprime to the
−1
bases b1 b2 , b−1
1 , and b2 modulo n.
n−1
(3) If b
≡ 1 modulo n for some base b with gcd(n, b) = 1, then cn−1 ≡ 1
modulo n for at least half of all possible bases c.
Proof. (1) The congruence bn−1 ≡ 1 modulo n means that b̄n−1 is the
identity in the group (Z/nZ)∗ , which holds if and only if the order of b̄ divides (n − 1).
(2) This holds because (b1 b2 )n−1 ≡ bn−1
.bn−1
modulo n (and similarly for
1
2
the inverses).
(3) Let B = {b1 , . . . , bs } denote the set of bases with respect to which n is a
pseudoprime, and let b be a base for which n is not a pseudoprime. By (2) the
set {bb1 , . . . , bbs } consists of bases for which n is not a pseudoprime. Thus B
contains no more elements than its complement. Since there are φ(n) possible
bases, we must have
1
s < φ(n).
2
260
12 Computational Number Theory
It is tempting to argue as follows: If we find a base b for which n is a
pseudoprime (that is, bn−1 ≡ 1 modulo n), then the probability that n is
prime is at least 12 .
Unfortunately, this does not make sense unless we know that a composite number n will always have a base b for which bn−1 ≡ 1 modulo n. For
numbers n that do have such witnesses to their non-primality, it does make
sense to say that if n is a pseudoprime with respect to k different bases chosen
randomly,1 then the probability that n is prime exceeds 1 − 2−k .
For relatively modest values of k, this probability is so close to 1 that the
probability that we have passed a composite number as prime is comparable
with the probability of a numerical error in the computer itself.
However, if our candidate number n is composite but has the property
that bn−1 ≡ 1 modulo n for all b with gcd(n, b) = 1, then this test will always
fail.
12.5 Carmichael Numbers
Definition 12.22. A composite integer n with the property that bn−1 ≡ 1
modulo n for all b with gcd(n, b) = 1 is called a Carmichael number.
Example 12.23. Let n = 561. We saw in Example 12.20 that 561 is a pseudoprime with respect to the bases 2, 4, 5, 7, 8, and 10. Knowing that 561 =
3 · 11 · 17, we can use the Chinese Remainder Theorem to argue as follows.
Let b be any integer with gcd(b, 561) = 1. Then
gcd(b, 3) = 1 =⇒ b2 ≡ 1
(mod 3) =⇒ b560 = (b2 )280 ≡ 1
gcd(b, 11) = 1 =⇒ b
10
≡ 1 (mod 11) =⇒ b
560
gcd(b, 17) = 1 =⇒ b
16
≡ 1 (mod 17) =⇒ b
560
(mod 3),
10 56
≡ 1 (mod 11),
16 35
≡ 1 (mod 17),
= (b )
= (b )
so by the Chinese Remainder Theorem
b560 ≡ 1 (mod 561).
Thus 561 is a Carmichael number.
We shall see later that a Carmichael number cannot have a square factor. A great deal is known about the structure of Carmichael numbers: They
must have at least three prime factors, for example, and 561 is the smallest Carmichael number. A striking result of Alford, Granville and Pomerance
from 1993 is that there are infinitely many Carmichael numbers, indeed
|{n | n X and n is a Carmichael number}| > X 2/7
1
(12.6)
We are assuming here that the bases can be chosen “randomly” and in particular
independently of the property of pseudoprimality.
12.5 Carmichael Numbers
261
asymptotically as X → ∞ (see the notes at the end of the chapter for the
reference).
∗
For a prime p, Z/pZ is a finite field, so its multiplicative group (Z/pZ) is
2
cyclic. Despite the fact that the ring Z/p Z is not a field (it has zero-divisors
and should not be confused with the Galois field Fp2 ), its multiplicative group
is also cyclic.
∗
Lemma 12.24. If p is a prime, then G = Z/p2 Z is a cyclic group.
Proof. For p = 2, the invertible elements of Z/4Z are the residues 1 and 3,
so in this case G is a cyclic group with two elements. We may thus assume
that p is an odd prime.
We first claim that if gcd(a, p) = 1, then 1 + ap has order p in G. To see
this, notice that
p
p
(1 + ap)p = 1 + ap2 +
(ap)2 + · · · +
(ap)p−1 + (ap)p
2
p−1
≡1
(mod p2 ).
Since p is prime, this means that the order of 1 + ap is either 1 (which is
impossible, as 1 + ap ≡ 1 modulo p2 ) or p.
∗
We next claim that there is an element in G of order (p−1). Since (Z/pZ)
is cyclic, we can find an integer g, 1 < g < p, with
g n ≡ 1 (mod p) for 1 < n p − 1
only if n = p − 1. If g p−1 ≡ 1 modulo p2 , then the order of g modulo p2 will
still be (p − 1). If not, then g p−1 = 1 + bp modulo p2 with gcd(b, p) = 1 and
we claim that g1 = g(1 + bp) has order (p − 1) modulo p2 . The order of g1
cannot be less than (p − 1) since g1 ≡ g modulo p, so it is enough to check
that
g1p−1 = (g(1 + bp))p−1 = g p−1 (1 + bp)p−1
≡ (1 + bp)(1 + b(p − 1)p) (mod p2 )
≡ (1 + bp)(1 − bp) (mod p2 )
≡ 1 (mod p2 ).
Thus G has p(p − 1) elements and contains an element of order p and an
element of order (p − 1); since gcd(p, p − 1) = 1, the product of these two
elements has order p(p − 1), so G is cyclic.
In general, (Z/mZ)∗ is cyclic if m = pr or 2pr for an odd prime p.
When (Z/mZ)∗ is cyclic, any generator for it is called a primitive root modulo m.
Theorem 12.25. If n is a Carmichael number, then n is square-free.
262
12 Computational Number Theory
Proof. Let n be a Carmichael number with p2 n for some prime p. Let n
denote the p-primary part of n – that is, n is np−r if n is divisible by p
exactly r times. Let g, 1 < g < p2 , be a generator of (Z/p2 Z)∗ . By the
Chinese Remainder Theorem, the congruences
b≡g
(mod p2 ),
b ≡ 1 (mod n )
have a solution. Since
gcd(b, p) = gcd(b, n ) = 1,
we must have gcd(b, n) = 1. Now n is a Carmichael number, so bn−1 ≡ 1
modulo p2 which implies that g n−1 ≡ 1 modulo p2 since b ≡ g modulo p2 .
The order of g modulo p2 is p(p − 1), so p(p − 1)(n − 1). This is impossible
because n − 1 ≡ −1 modulo p, which completes the proof.
12.6 Probabilistic Primality Testing
Recall the Jacobi symbol defined in Definition 11.1. If p is an odd prime, then
Euler’s Criterion says that
a
≡ a(p−1)/2 (mod p).
(12.7)
p
Suppose now that n is an odd integer that we wish to test for primality.
Choose an integer a at random between 1 and n and compute gcd(a, n) = da .
If 1 < da < n, then n has a proper divisor and so is not prime.
If gcd(a, n) = 1, compute a(n−1)/2 modulo n using the Repeated Squaring
algorithm.
Next compute the Jacobi symbol na ; since gcd(a, n) = 1, this symbol is
either +1 or −1.
We have computed two numbers, a(n−1)/2 modulo n and na . If they differ
then n is not prime. (If n is prime, they cannot differ by Euler’s criterion Equation (12.7).)
As with Fermat’s Little Theorem, the important question is what it means
if the number n passes this test in the sense that the two numbers agree?
It turns out that there is no analog of the problem caused by Carmichael
numbers.
Theorem 12.26. If n > 1 is an odd composite number, then there is an
integer b, 1 < b < n, with gcd(b, n) = 1 and
b
= b(n−1)/2 (mod n).
n
12.6 Probabilistic Primality Testing
263
Proof. Assume first that there is a prime p with p2 n, and let b = 1 + np . By
the multiplicative property of the Jacobi symbol,
b
b
b
=
.
n
p
n/p
Now pb = 1 because b ≡ 1 modulo p. On the other hand, b ≡ 1 modulo n/p,
b
so n/p
= 1 and therefore nb = 1.
By the Binomial Theorem, for j 2,
j
j
2 j
n
n
n
n
=1+ j+
+ ··· +
bj = 1 +
p
p
p
p
2
n
≡ 1 + j (mod n)
p
since p2 n implies that ( np )2 , ( np )3 , . . . , ( np )j are all congruent to 0 modulo n.
Taking j = n−1
2 gives
n n−1
(n−1)/2
b
≡ 1 (mod n)
≡1+
p
2
since np n−1
≡ 0 modulo n, so we have found an integer b with gcd(b, n) = 1
2
and
b
= b(n−1)/2 (mod n).
n
If there is no prime
p with p2 n, then n is square-free. Suppose that p is an
odd prime with pn. Let a be a quadratic nonresidue modulo p, and consider
the congruences
x≡a
(mod p),
x≡1
(mod n/p).
Notice that gcd(p, n/p) = 1 because n is square-free, so by the Chinese Remainder Theorem there is a solution b with 1 b n. Now gcd(a, p) = 1,
so gcd(b, p) = 1 and therefore gcd(b, n) = 1. Notice that
b
b
b
=
n
p
n/p
a
1
=
p
n/p
= (−1) · 1
= −1.
(n−1)/2
Since b ≡ 1 modulo np , b(n−1)/2 = 1 + dn
≡ −1
p for some d ∈ Z. If b
(n−1)/2
modulo n, then we may write b
= −1 + en. Now
264
12 Computational Number Theory
dn
= −1 + en
p
=⇒ 2p = n(ep − d),
1+
which contradicts the fact that n is an odd composite number, so
b(n−1)/2 ≡ 1 (mod n)
and we have again found an integer b with gcd(b, n) = 1 and
b
= b(n−1)/2 (mod n).
n
Once we know there is some witness b to non-primality, there must be
many such witnesses.
Lemma 12.27. If n > 1 is an odd composite number, then at least half of all
the integers b, 1 b n, with gcd(b, n) = 1 will satisfy
b
= b(n−1)/2 (mod n).
n
Proof. This is essentially the same as the proof of Theorem 12.21(3). If b1
and b2 satisfy gcd(b1 , n) = gcd(b2 , n) = 1, but
b1
(n−1)/2
= b1
(mod n)
n
and
b2
n
(n−1)/2
= b2
b1 b 2
= (b1 b2 )n−1/2
n
The rest of the proof proceeds as before.
then
(mod n)
(mod n).
This gives a probabilistic algorithm for primality testing. Make k random
choices b1 , . . . , bk of distinct integers with 1 bi n and gcd(bi , n) = 1. If
bi
(n−1)/2
= bi
(mod n),
n
then we accept that n is probably prime in that the probability that a composite number would pass all those tests is 21k . In order to decide how practical
this is, we need to estimate the complexity of the ingredient steps. We know
the complexity of exponentiation modulo n using the Repeated Squaring algorithm; the only step we do not know is the complexity of computing the
Jacobi symbol.
12.6 Probabilistic Primality Testing
265
Theorem 12.28. Let b and n be integers with 1 b n. Then
b
= O (log n)3 .
C calculate the Jacobi symbol
n
Proof. We may assume that gcd(b, n) = 1. Choose b1 with b1 ≡ b modulo n
and − n2 b1 n2 . Then
b2
n
b
= ±
with 0 < b2 n
n
2
b2
1
.
= ±
n
n
If b2 = 2, then we can use the n2 formula. If b2 is odd, then apply quadratic
reciprocity to see that
1
n
b
= ±
n
n
b2
1
n (mod b2 )
= ±
n
b2
1
b3
n
= ±
with 1 b3 < b2 .
n
b2
2
At each repetition of this basic step, the denominator is reduced by a factor
of 2 so there are at most O(log n) steps. Each step involves finding
b, bi , bj n
(mod 4)
in O(log n) operations and finding bi n modulo bj n in O((log n)2 ) steps.
Thus the total complexity is
O(log n) · O (log n)2 = O (log n)3 .
This test is known as the Solovay–Strassen primality test and it has several refinements; the Miller–Rabin test is even faster and is widely used in
computer algebra packages. Typically, when such a package verifies the primality of an integer, it only guarantees that the probability that the integer is
composite is very small (usually less than 10−6 ). Given the random element
to choosing bases, repeating the test multiplies the probability – by this stage
it is pretty well certain the number is prime.
Exercise 12.6. This exercise introduces the Miller–Rabin test. Given n ∈ N,
write n − 1 = 2s t with t odd. Given b, 0 < b < n, gcd(b, n) = 1, n is called a
strong pseudoprime to base b if bt ≡ 1 modulo n or there is an r, 0 r < s
r
such that b2 t ≡ −1 modulo n. Prove that if n is an odd composite integer
then n is a strong pseudoprime for at most one quarter of the possible bases b
with 0 < b < n and gcd(b, n) = 1.
266
12 Computational Number Theory
We recommend using the primality testing command on a computer with
a number theory package. It is almost trivial now to construct an integer
with thousands of decimal digits that passes the Miller–Rabin test and is thus
almost certainly prime. Numbers which are constructed in this way are often
called industrial primes. For all practical purposes they behave like primes
and indeed are probably so with probability very close to 1. For example,
when the RSA cryptosystem is implemented, in order to produce a public
key, it relies upon the construction of an integer which is the product of two
large primes. The primes used are industrial primes in the above sense.
In 1983, Adleman, Pomerance, and Rumely found a sophisticated deterministic algorithm with complexity
(log n)O(log log log n) .
The exponent log log log n grows very slowly in n, and this algorithm is in
practice very fast. Other modern algorithms use elliptic curves and Abelian
varieties. In 1992 Adleman and Huang gave a probabilistic algorithm with
polynomial running time that after k iterations either gives a definitive answer
or gives no answer. The probability of no answer is 2−k . This algorithm will
never give a wrong answer but may with low probability fail to give an answer.
In Section 12.7.1 we will gave an account of a deterministic version of the
Solovay–Strassen test. This test relies upon a hard unproven hypothesis. Until
very recently, there was general frustration at the lack of a deterministic, polynomial time algorithm that does not rely upon unproven hypotheses. In 2003
there was great excitement when Agrawal, Kayal, and Saxena announced an
ingenious approach to primality testing that gives such an algorithm.
12.7 The Agrawal–Kayal–Saxena Algorithm
There are attractive and readable accounts of this brilliant work and its later
refinements in the references at the end of the chapter. All the methods considered so far begin with some theoretical characterization of primality that turns
out to be implementable in some practical way. The Agrawal–Kayal–Saxena
algorithm is no exception. We have seen that Fermat’s Little Theorem gives
a property of prime numbers that is shared by some composite numbers. A
similar property in polynomials does give a complete characterization of prime
numbers.
Lemma 12.29. Let n denote a positive integer and suppose a is an integer
with gcd(a, n) = 1. Then n is prime if and only if the following congruence
holds for polynomials
(x − a)n ≡ xn − a (mod n).
In other words, if and only if the equation
12.7 The Agrawal–Kayal–Saxena Algorithm
267
(x − a)n = xn − a
holds in the ring Z/nZ[x].
Exercise 12.7. Prove Lemma 12.29. (Hint: It is not much harder than the
proof of Fermat’s Little Theorem. However, on this occasion, we suggest the
use of congruences and the Binomial Theorem rather than Lagrange.)
Although this lemma is simple, the problem as it stands is that it cannot
be checked with polynomial complexity. The neat idea that unlocked this
beautiful result was to consider the congruence not just modulo n, but also
modulo a polynomial of the form xr − 1 for some prime r. However then a
also needs to vary in order to keep the integrity of the test. What Agrawal–
Kayal–Saxena proved is that r and various a can be chosen in such a way as
to yield a primality test whose complexity is polynomial in log n.
Here is a version of their algorithm as refined by Bernstein, taken from an
expository article by Bornemann.
Theorem 12.30. Suppose n ∈ N and s n. Suppose primes q and r are
chosen with the properties that q (r − 1), n(r−1)/q ≡ 0, 1 modulo r, and
√
q+s−1
n2
r .
s
If, for all a with 1 a < s,
(1) gcd(a, n) = 1, and
(2) (x − a)n = xn − a modulo (xr − 1, n) in the ring of polynomials Z[x],
then n is a prime power.
This gives a version of the Agrawal–Kayal–Saxena algorithm.
(1) Decide if n is a power of a natural number. If it is, go to (5).
(2) Choose integers q, r, and s satisfying the
hypotheses of Theorem 12.30.
(3) For a = 1, . . . , s − 1, do two checks. If an, go to (5). If (x − a)n = xn − a
modulo (xr − 1, n), go to (5).
(4) If you have reached this step, then n is prime.
(5) If you have reached this step, then n is composite.
12
The complexity of the original algorithm was a little overO (log n)
.
6
In 2003 Lenstra and Pommerance reduced this to a little over O (log n) and
it can be reduced further for special types of integers. Despite the apparent
simplicity, proving that the algorithm does indeed test for primality and that
the choices can all be made in polynomial complexity requires considerable
ingenuity. However, the mathematics involved is not much beyond the scope
of this book. We recommend that an interested reader follow the references
in the notes to this chapter.
268
12 Computational Number Theory
In one sense this algorithm resolves an outstanding problem. The existence of a polynomial time algorithm for determining primality is certainly
a theoretical result of major interest and importance. However, something
of a cloud hangs over the implementation of the algorithm as it stands. In
practice, on current understanding, it is rather slow, so there is a rather interesting kind of trade-off. The probabilistic algorithms we discussed are very
fast and very easy to implement, but give an uncertain answer, whereas the
deterministic algorithm is currently too slow to implement. This is an area
where much remains to be done and it is likely to see significant development.
To further complicate things, the next section discusses some powerful methods that work under the assumption of generalized versions of the Riemann
Hypothesis.
12.7.1 Deterministic Primality Testing
We will now show that the Solovay–Strassen primality test does have a deterministic form, but only under the assumption of an unproven hypothesis
related to the Riemann Hypothesis. We have already seen a remarkable phenomenon at work in the area of primality testing; new results often draw
upon earlier classical theorems from the literature, proven themselves simply
for interest’s sake. Here comes another example.
If you compose a list of quadratic residues and nonresidues for the first
few prime numbers, you are likely to wonder whether the distribution of
the quadratic residues obeys any predictable patterns. Since 1 is always a
quadratic residue, the most basic question concerns the smallest predictable
value of a quadratic nonresidue. This is a difficult problem: In particular, the
bounds that are provable seem much weaker than the data suggests. On the
other hand, a measure of the strength of such bounds is their applicability.
The strongest known bound is due to Ankeny, but it relies on the Extended
Riemann Hypothesis. This is stated in several different ways in the literature;
one form is the following.
Conjecture 12.31. [Extended Riemann Hypothesis] Let L(·, χ) denote
an L-function associated with a Dirichlet character χ. Then all the solutions
of L(s, χ) = 0 in the critical strip 0 (s) 1 have (s) = 1/2.
The following is a refined form of Ankeny’s original theorem.
Theorem 12.32. [Ankeny] Let p denote a prime and assume the Extended
Riemann Hypothesis holds for the L-function associated with the character
given by the Legendre symbol for p. Then the smallest quadratic residue modulo p is no larger than 2(log p)2 .
We are not going to prove Ankeny’s Theorem. His theorem has an easy generalization for a non-prime modulus which is used in the following result, which
is also not going to be proved here.
12.8 Factorizing
269
Theorem 12.33. Suppose n > 1 denotes any odd integer and the Extended
Riemann Hypothesis holds for L(·, χ) where χ denotes the Jacobi symbol n. .
Then either an integer a exists with a < 2(log n)2 such that gcd(a, n) = 1, or
a
= a(n−1)/2 (mod n).
n
Using this latter result it is easy to extend the Solovay–Strassen test to deterministic polynomial time form; one simply has to run the test for each
integer a > 1 below 2(log n)2 . The deterministic version of Solovay–Strassen
is an excellent test that is easy to implement and runs very quickly. The one
thing against it is that it relies upon an unproven hypothesis. Whether that
hypothesis will be proved or – contrary to expectation – disproved, soon, nobody can tell. Thus, for the moment, primality testing lies in a state of flux
and we urge the reader to watch for developments.
12.8 Factorizing
Given an integer n, the Fundamental Theorem of Arithmetic guarantees that n
can be factorized. A natural question is to ask if we can find polynomial complexity algorithms to do this, and the answer seems to be no. Much hangs
upon this question; the success of the RSA cryptosystem relies upon the apparent intractability of finding a fast factorizing algorithm. There are even
large rewards for finding factors of some apparently difficult numbers. RSA
Laboratories offers rewards of up to $200000 for factorizing certain RSA numbers, the name given to numbers with just two distinct prime factors. There
are methods that are substantially faster than trial division for factorizing
while not having polynomial complexity. On the other hand, they can be difficult to analyze; we simply describe three approaches whose implementation
is easily understood and that will enable the reader to be able to enter into
the literature with a good grasp of basic principles.
To give an idea about the way that primality testing and factorizing differ
in practice, it might be helpful to consider the relative sizes of the integers to
which the known methods can be applied with a hope of success. A well-chosen
integer with approximately 200 decimal digits has a good chance of resisting
factorization today, even using the best method, the number field sieve. (It
will become clear later what is meant by the term “well-chosen”.) By contrast,
modern primality testing techniques installed on a PC can verify (with the
earlier probabilistic caveat) the primality of integers with tens of thousands of
digits in a matter of seconds. The Cunningham Project is an online attempt
to tabulate factorizations of integers of the shape bn ± 1, where b is small (up
to 12) and n is very large – the Fermat and Mersenne numbers, for example.
The tables give a good indication of the successes, as well as the current
limitations of factoring techniques.
270
12 Computational Number Theory
In case this sounds pessimistic, it should be acknowledged how rapid developments in the field have been. Carl Pomerance has pointed out that in
the 1970s, even 20-digit integers were difficult to factorize. By 1980, the factorization of 50-digit integers was becoming commonplace and by 1990, the
record stood at 116-digits (in each case, these are difficult numbers, constructed as products of two large primes). In 1994, a 129-digit integer was
factorized, which was remarkable because in an article in 1976, this number
was predicted to be safe for 40 quadrillion years.
12.8.1 The Rho Method
This method generates test numbers as candidates to be factors of the given
number in a particular way, and the logical structure of the algorithm resembles the Greek letter ρ (rho), hence the name.
Start with a map f : Z/nZ → Z/nZ, typically a polynomial of degree
greater than 1. Pick a starting value x0 ∈ Z/nZ and compute differences
between iterates of f applied to x0 as follows. Let x1 = f (x0 ), x2 = f (x1 ) and
so on, and then compute gcd(x1 − x0 , n), gcd(x2 − x1 , n), and so on. The hope
is that the iterates of f are randomly distributed among the residue classes of
proper divisors of n, so among the calculations of gcd(xr − xr−1 , n) we hope
to quickly find a factor of n. The name “rho” is given to this method because
the algorithm is said to run in the shape of a letter ρ.
Example 12.34. Let n = 91 and f (x) = x2 + 1. Then take x0 = 1 and compute
x1 = 2, x2 = 5, x3 = 26, x4 = 40,
and so on. (Remember that these are residues modulo n.) We find
gcd(x3 − x2 , n) = gcd(26 − 5, 91) = gcd(21, 91) = 7.
Thus we have found a factor of n after carrying out relatively few calculations.
Example 12.35. Let n = 323 and f (x) = x2 + 1. Take x0 = 1, then
x1 = 2, x2 = 5, x3 = 26, x4 = 31.
We find that
gcd(x5 − x4 , n) = gcd(316 − 31, 323) = gcd(285, 323) = 19,
so 323 is divisible by 19.
Exercise 12.8. Apply this method with n = 437, f (x) = x2 + 1, and x0 = 1.
Confirm that gcd(x5 − x4 , n) finds a nontrivial factor of n.
12.8 Factorizing
271
Several questions present themselves immediately. How do you know which
iterates to compare? How do you pick f and x0 ? In the literature, the following
iterates are often compared:
x1 − x0 , x4 − x3 ,
x2 − x1 , x5 − x3 ,
x3 − x1 , x6 − x3 ,
x7 − x3 ,
x8 − x7 , x16 − x15 , x32 − x31 , . . .
x9 − x7 , x17 − x15 , x33 − x31 , . . .
x10 − x7 , x18 − x15 , x34 − x31 , . . .
x11 − x7 , x19 − x15 , x35 − x31 , . . .
x12 − x7 , x20 − x15 , x36 − x31 , . . .
..
.
x21 − x15 , x37 − x31 , . . .
..
.
x38 − x31 , . . .
x15 − x7 ,
..
.
...
x −x ,
31
15
x63 − x31 , . . .
and so on. This appears to leave gaps but, in fact, the gaps can be accounted
for.
Example 12.36. Let n = 4087 and f (x) = x2 +x+1. Pick x0 = 2 and compute
gcd(x1 − x0 , n) = gcd(7 − 2, 4087) = 1,
gcd(x2 − x1 , n) = gcd(57 − 7, 4087) = 1,
gcd(x3 − x1 , n) = gcd(3307 − 7, 4087) = 1,
gcd(x4 − x3 , n) = gcd(2745 − 3307, 4087) = 1,
gcd(x5 − x3 , n) = gcd(1343 − 3307, 4087) = 1,
gcd(x6 − x3 , n) = gcd(2626 − 3307, 4087) = 1,
gcd(x7 − x3 , n) = gcd(3734 − 3307, 4087) = 61;
we have found a factor of 4087.
Frustratingly, this method seems to work when it chooses. Deciding when
and how it will work – and how quickly it will work when it does – involves a
complicated and unsatisfactory argument. There are several reasons for this
difficulty.
1. It is difficult to make the “spreading” property (through the residue classes
modulo n) of f precise. Even having done so, it is difficult to check the
property for concrete functions.
2. Even if you satisfy yourself about the spreading property, the complexity
estimate is well away from being polynomial in log n.
3. It is a probabilistic approach in character (but this is not such a serious
difficulty).
272
12 Computational Number Theory
12.8.2 The Factor Base Method
This method has been quite successful in the sense that the largest nontrivial factorizations have been performed using it, often using many computers operating in tandem. It is not really viable in general, however,
as we will see. On the other hand, the number field sieve (often referred
to as NFS), extensively refined by many workers, is a successful factoring
method based on this method. The complexity of the NFS can be shown to
be O(exp[(log n)1/3 (log log n)2/3 ]).
The basic idea goes as follows. Suppose a large integer n can be expressed
in the form n = s2 − t2 with s, t ∈ Z. Then obviously (s + t) and (s − t) are
factors of n. Of course, there is no hope of doing this in general, and even
if such a representation exists, it is not clear how to find it. If we suppose
further that (s + t) and (s − t) are approximately equal √
(relative to the size
of n), then we could hope to find them. Start with x = n
and then check
(x + 1)2 − n, (x + 2)2 − n, (x + 3)2 − n,
and so on to see if the result is a square.
This is still not of much practical use because the hypothesis is much
too strong. In the RSA cryptosystem, the security depends on the practical
impossibility (at present) of finding the factors of a product pq = n, where p
and q are very large distinct primes. The user of such a system guarantees
that p and q are not close, giving no hope of expressing n in the form
n = s2 − t2
in a reasonable amount of time. However, the basic idea becomes quite workable if we replace equality by congruences. We try to solve
s2 − t2 ≡ 0 (mod n).
This is much easier to solve, and if we have a solution pair (s, t), then we
can compute gcd(s ± t, n) and hope this will yield a nontrivial factor. The
factor base method relies upon generating a large number of solutions of the
congruence that can be tested.
Definition 12.37. Given an integer n known to be composite (perhaps it has
failed a primality test or perhaps it is a known public key in an RSA cryptosystem), let B denote the set {−1, p1 , . . . , pn }, where the pi are distinct primes.
We say x is a B-number for n if x2 modulo n can be expressed as a product
of powers of elements of B.
Example 12.38. Let n = 4633 and choose B = {−1, 2, 3}. Then 67, 68, 69
are B-numbers:
672 ≡ −144 ≡ −24 32 (mod 4633),
682 ≡ −9 ≡ −32 (mod 4633),
12.8 Factorizing
273
and
692 ≡ 128 ≡ 27
(mod 4633).
Solve s2 ≡ t2 (mod n):
(−77)2 ≡ (67 · 68)2 ≡ 24 34 ≡ (22 32 )2 ≡ 362
(mod 4633).
We check gcd(−77 + 36, n) = 41 and gcd(−77 − 36, n) = 113, giving nontrivial
factors of n.
In general, given n, B, and several B-numbers b1 , . . . , bk , the problem
is turned into one in linear algebra. Each bi gives rise to a vector over the
field F2 = Z/2Z as follows. Assume that B = {−1, p1 , . . . , ph−1 }; then each of
the squares of the B-numbers has a unique factorization
e
h−1
(−1)e0 pe11 · · · ph−1
with ei ∈ N,
and we associate to this the vector (e0 , . . . , eh−1 ) modulo 2. Doing this for
each of the given B-numbers produces a k × h matrix over F2 . We then look
for linear dependence relations among the rows of the matrix.
Example 12.39. Let n = 4633 and B = {−1, 2, 3}. Then the calculation in
Example 12.38 gives
⎡
⎤ ⎡
⎤
142
100
⎣1 0 2⎦ ≡ ⎣1 0 0⎦ (mod 2)
070
010
and the first two rows give the desired relation.
Of course, a dependence relation will not necessarily yield a factorization,
so the more B-numbers we can find, the greater will be our chances of success.
Example 12.40. Let n = 1829 and B = {−1, 2, 3, 5, 7, 11, 13}. Then
42, 43, 61, 74, 85, 86 are B-numbers.
Factorizing them gives the vectors
422 ≡ −5 · 13 → (1, 0, 0, 1, 0, 0, 1),
432 ≡ 22 · 5 → (0, 2, 0, 1, 0, 0, 0),
612 ≡ 32 · 7 → (0, 2, 0, 0, 1, 0, 0),
742 ≡ −11 → (1, 0, 0, 0, 0, 1, 0),
852 ≡ −7 · 13 → (1, 0, 0, 0, 1, 0, 1),
862 ≡ 24 · 5 → (0, 4, 0, 1, 0, 0, 0),
from which we find the following reduced matrix:
274
12 Computational Number Theory
⎡
1
⎢0
⎢
⎢0
⎢
⎢1
⎢
⎣1
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
1
0
0
1
0
1
0
0
0
0
1
0
0
⎤
1
0⎥
⎥
0⎥
⎥.
0⎥
⎥
1⎦
0
Notice that r2 = r6 , so
(43 · 86)2 ≡ 26 52 ≡ (23 5)2 ≡ 402 .
Unfortunately, 43 · 86 ≡ 40, so we have found 402 ≡ 402 . Thus we find a
solution of the congruence but no factor. Try another dependence relation:
r1 + r 2 + r 3 + r 5
is the zero row. This gives a solution of the congruence,
(1459)2 ≡ (b1 b2 b3 b5 )2 ≡ (2 · 3 · 5 · 7 · 13)2 ≡ (901)2 .
We find
gcd(1459 + 901, 1827) = 59,
giving a nontrivial factor of n.
Example 12.41. Let n = 2201, and compute a few candidates for B-numbers
modulo n:
472 ≡ 8 = 23 ,
482 ≡ 103 has no small factors,
493 ≡ 200 = 23 · 52 ,
502 ≡ 299 = 13 · 23,
512 ≡ 400 = 24 · 52 ,
522 ≡ 503 has no small factors.
So choose B = {−1, 2, 5} and select as B-numbers 47, 49, and 51. The factorizations give the matrix
⎡
⎤ ⎡
⎤
010
010
⎣0 3 2⎦ ≡ ⎣0 1 0⎦ (mod 2),
042
000
which has the relation r1 = r2 . This gives
(47 · 49)2 ≡ 1600 = 402 ;
we find that 47 · 49 ≡ 102 modulo n, and
gcd(102 + 40, n) = 71
identifies the factor 71 of n.
12.8 Factorizing
275
One of the things that makes this method so effective is the ability of
computers to do linear algebra – row reduction in particular – over F2 very
rapidly.
12.8.3 Elliptic Curves
The third and final factorizing method we mention is the easiest to describe
but by far the hardest to analyze. It was invented by Lenstra and is implemented in many computer packages. Suppose you are given an integer n that
is known not to be prime. The idea is to choose an elliptic curve E defined
over Z, together with a rational nontorsion point P , and then try to compute
the sequence of points P, 2P, 3P, . . . modulo n. This sounds simple, yet it is
based on some hard analysis, and it does work for integers in a certain range.
The idea behind the method is that adding points on an elliptic curve using
the addition formulas always entails some division. Doing division modulo n
can only be achieved by inverting an integer modulo n, and this is only possible if that integer is coprime to n. Thus failure to produce the next point in
the sequence finds a factor of n.
In order to follow the discussion below, it may be helpful to review the
explicit formulas for addition on an elliptic curve from Section 5.3 and recall that addition, multiplication, and division (via the Euclidean Algorithm)
modulo n have polynomial complexity.
Example 12.42. Let n = 21, and choose the elliptic curve
E : y 2 + y = x3 − x
together with the point P = (0, 0). The first few multiples of P modulo 21
can be computed without any problem:
2P = (1, 0), 3P = (20, 20), 4P = (2, 18), 5P = (16, 2), 6P = (6, 14).
However, the computer will refuse to calculate 7P modulo 21. In order to
find this value, it would need to invert 6 modulo 21; this is done using the
Euclidean Algorithm, which rapidly detects the factor 3 of 21. Notice that if
the initial point is P = (0, 0), then it is gcd(x(kP ), n) that potentially reveals
a factor of n.
Example 12.43. For a slightly more impressive example, let n = 39701558597.
Working with the same elliptic curve and the same point, this time the computer will find
526P = (3341173047, 12476794460)
(mod n),
but refuses to go any further. The reason is that
gcd(3341173047, n) = 1049
yields a factor of n. The other factor, 37847053, is now easily found.
276
12 Computational Number Theory
Lenstra’s idea was that the flexibility of choosing curves E and points P ,
together with a suitable multiplier k, might make it possible to detect when
computing kP modulo n becomes impossible – in other words, when you
stumble onto a factor of n. There are much fuller accounts of this topic than
we are able to give in the notes at the end of the chapter. An important aspect
of the complexity of Lenstra’s method is that the running time depends on
the second largest prime factor of n, which may be much smaller than the
square root of n. Rather than seek to analyze this method, we suggest the
following as a worthwhile exercise.
Exercise 12.9. Using a computer package programmed with the arithmetic
of elliptic curves, try to factorize some large integers of your choosing using
Lenstra’s method.
12.8.4 Elliptic Curve Factorizing in Practice
As an example of how Lenstra’s method has been used in practice we include the following. A repeated theme in the book has been the phenomenon
whereby an integer can be known as a composite with only a partial factorization (or none at all). On April 25th 1998, the complete factorization of the
Mersenne number M589 was obtained. Since 589 is divisible by 19 and 31, two
obvious factors present themselves; M19 and M31 . There are two others, the
smallest of which is the 46 digit prime
2023706519999643990585239115064336980154410119.
The other prime factor has 227 decimal digits. In this instance, the second
largest prime factor is quite a bit smaller than the square root of the number
to be factored, which is an ideal situation for the application of Lenstra’s
method.
12.9 Complexity of Arithmetic in Finite Fields
For the results in this chapter, we used the complexity of the following operations over Fp = Z/pZ:
C(add a to b) = O(log p);
C(multiply a by b) = O (log p)2 ;
C(invert a = 0) = O (log p)3 .
These estimates can be extended to cover arithmetic in finite fields. The
elements of a finite field can be represented explicitly as polynomials over Fp .
The most important estimate we need is the complexity of performing the
Euclidean Algorithm in Fp [x] for a fixed prime p. The proofs of the following
results closely follow the earlier complexity arguments.
12.9 Complexity of Arithmetic in Finite Fields
277
Theorem 12.44. Suppose f and g are nonzero elements of Fp [x] whose degrees are bounded by n. Then we can
find q, r in Fp [x] with f = gq + r
and deg r < deg g, with complexity O n2 (log p)3 .
Corollary 12.45. Suppose Fq is a finite field, q = pr , in which multiplication
is determined by a monic irreducible polynomial of degree r (that is, Fq is presented as Fp [x]/f (x) · Fp [x] for a monic irreducible polynomial f of degree r).
Then, for a, b = 0 in Fq ,
(1) C(add a to b) = O(r logp) = O(log
q); (2) C(multiply a by b) = O r2 (log
p)2 = O (log
q)2 ;
3
3
(3) C(invert b = 0)= O r3 (log p)
= O (log q) ;
n
2
(4) C(find b ) = O log n(log q) .
Exercise 12.10. Prove Theorem 12.44 and Corollary 12.45.
Notes to Chapter 12: For this chapter we leaned heavily on Koblitz’s excellent
account of computational number theory [90]. Consult the books of Bressoud and
Wagon [21] and Bressoud [22] for more on this topic, as well as Cohen [32], Crandall
and Pomerance [36], von zur Gathen and Gerhard [66], and the references therein.
The lower bound (12.6) for Carmichael numbers is due to Alford, Granville, and
Pomerance [2]. The RSA algorithm from Section 12.2 appeared in a paper of Rivest,
Shamir and Adleman [129]. A few years before RSA was invented at MIT, Clifford
Cocks in the UK invented a public-key cryptography scheme using similar ideas. His
invention was classified until very recently. For details on the Agrawal–Kayal–Saxena
algorithm, see their original paper [1], Bernstein’s paper [12], or the announcement
by Bornemann [17] and the references therein. Lenstra and Lenstra wrote an account
of the Number Field Sieve in [99]. An excellent comparison of sieving methods, as
well as some interesting history, can be found in a survey article by Pomerance [117].
The current state of the Cunningham Project may be found on Wagstaff’s Web
site [156].
References
1. M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P,
www.cse.iitk.ac.in/users/manindra/primality.ps.
2. W. R. Alford, A. Granville, and C. Pomerance, There are infinitely many
Carmichael numbers, Ann. Math. (2) 139, no. 3 (1994), 703–722.
3. R. Apéry, Irrationalité de ζ(2) et ζ(3), Astérisque 61 (1979), 11–13.
4. T. M. Apostol, Introduction to Analytic Number Theory, Undergraduate Texts
in Mathematics, Springer-Verlag, New York, 1976.
5. T. M. Apostol, Modular Functions and Dirichlet Series in Number Theory,
second ed., Graduate Texts in Mathematics, vol. 41, Springer-Verlag, New
York, 1990.
6. E. Artin, The Gamma Function, Translated by Michael Butler, Athena Series:
Selected Topics in Mathematics, Holt, Rinehart and Winston, New York, 1964.
7. A. Baker, Linear forms in the logarithms of algebraic numbers. I, II, III, Mathematika 13 (1966), 204-216; 14 (1967), 102-107; 14 (1967), 220–228.
8. R. C. Baker, G. Harman, and J. Pintz, The difference between consecutive
primes. II, Proc. London Math. Soc. (3) 83, no. 3 (2001), 532–562.
9. K. Ball and T. Rivoal, Irrationalité d’une infinité de valeurs de la fonction zêta
aux entiers impairs, Invent. Math. 146, no. 1 (2001), 193–207.
10. E. J. Barbeau, Pell’s Equation, Problem Books in Mathematics, SpringerVerlag, New York, 2003.
11. V. Bergelson, Ergodic Ramsey theory—an update, in M. Pollicott and
K. Schmidt (eds.), Ergodic theory of Zd actions (Warwick, 1993–1994), London
Math. Soc. Lecture Note Ser., vol. 228, Cambridge Univ. Press, Cambridge,
1996, pp. 1–61.
12. D. J. Bernstein, Proving primality after Agrawal–Kayal–Saxena,
http://cr.yp.to/papers.html#aks.
13. F. Beukers, J. A. C. Kolk, and E. Calabi, Sums of generalized harmonic series
and volumes, Nieuw Arch. Wisk. (4) 11, no. 3 (1993), 217–224.
14. G. Billing and K. Mahler, On exceptional points on cubic curves, J. London
Math. Soc. 15 (1940), 32–43.
15. Y. Bilu, G. Hanrot, and P. M. Voutier, Existence of primitive divisors of Lucas
and Lehmer numbers, J. Reine Angew. Math. 539 (2001), 75–122, With an
appendix by M. Mignotte.
280
References
16. B. J. Birch and H. P. F. Swinnerton-Dyer, Notes on elliptic curves. I, II, J.
Reine Angew. Math. 212 (1963), 7–25; 218 (1965), 79–108.
17. F. Bornemann, PRIMES is in P: A Breakthrough for “Everyman”, Notices
Amer. Math. Soc. 50, no. 5 (2003), 545–552.
18. P. Borwein, Computational Excursions in Analysis and Number Theory, CMS
Books in Mathematics/Ouvrages de Mathématiques de la SMC, vol. 10,
Springer-Verlag, New York, 2002.
19. B. Brent, An expansion of ex off roots of one, Fibonacci Quart. 12 (1974), 208.
20. B. Brent, Functional equations with prime roots from arithmetic expressions
for Gα , Fibonacci Quart. 12 (1974), 199–207.
21. D. Bressoud and S. Wagon, A Course in Computational Number Theory, Key
College Publishing, Emeryville, CA, 2000.
22. D. M. Bressoud, Factorization and Primality Testing, Undergraduate Texts in
Mathematics, Springer-Verlag, New York, 1989.
23. C. Breuil, B. Conrad, F. Diamond, and R. Taylor, On the modularity of elliptic
curves over Q: wild 3-adic exercises, J. Amer. Math. Soc. 14, no. 4 (2001), 843–
939.
24. V. Brun, La série 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + 1/41 +
1/43 + 1/59 + 1/61 + · · · où les dénominateurs sont nombres premiers jumeaux
est convergente ou finie, Bull. Sci. Math. 43 (1919), 100–104, 124–128.
25. C. Caldwell, The Prime Pages, www.utm.edu/research/primes/.
26. J. W. S. Cassels, Mordell’s finite basis theorem revisited, Math. Proc. Cambridge Philos. Soc. 100, no. 1 (1986), 31–41.
27. J. W. S. Cassels, Lectures on Elliptic Curves, London Mathematical Society
Student Texts, vol. 24, Cambridge University Press, Cambridge, 1991.
28. J. S. Chahal, Topics in Number Theory, The University Series in Mathematics,
Plenum Press, New York, 1988.
29. K. Chandrasekharan, Elliptic Functions, Grundlehren der Mathematischen
Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 281,
Springer-Verlag, Berlin, 1985.
30. D. V. Chudnovsky and G. V. Chudnovsky, Sequences of numbers generated
by addition in formal groups and new primality and factorization tests, Adv.
Appl. Math. 7, no. 4 (1986), 385–434.
31. P. A. Clement, Congruences for sets of primes, Amer. Math. Monthly 56 (1949),
23–25.
32. H. Cohen, Advanced Topics in Computational Number Theory, Graduate Texts
in Mathematics, vol. 193, Springer-Verlag, New York, 2000.
33. J. B. Conrey, The Riemann Hypothesis, Notices Amer. Math. Soc. 50, no. 3
(2003), 341–353.
34. G. Cornell and J. H. Silverman (eds.), Arithmetic Geometry, Papers from the
Conference held at the University of Connecticut, Storrs, Connecticut, July
30–August 10, 1984, Springer-Verlag, New York, 1986.
35. G. Cornell, J. H. Silverman, and G. Stevens (eds.), Modular Forms and Fermat’s Last Theorem, Papers from the Instructional Conference on Number
Theory and Arithmetic Geometry held at Boston University, Boston, MA, August 9–18, 1995, Springer-Verlag, New York, 1997.
36. R. Crandall and C. Pomerance, Prime Numbers: A Computational Perspective,
Springer-Verlag, New York, 2001.
37. J. Cremona, Elliptic curve data, www.maths.nott.ac.uk/personal/jec/.
References
281
38. J. T. Cross, In the Gaussian integers, α4 + β 4 = γ 4 , Math. Mag. 66, no. 2
(1993), 105–108.
39. H. Darmon, Rational Points on Modular Elliptic Curves, CBMS Regional Conference Series in Mathematics, vol. 101, Published for the Conference Board of
the Mathematical Sciences, Washington, DC, 2004.
40. H. Davenport, Multiplicative Number Theory, Revised and with a preface by
Hugh L. Montgomery, third ed., Graduate Texts in Mathematics, vol. 74,
Springer-Verlag, New York, 2000.
41. P. Deligne, Preuve des conjectures de Tate et de Shafarevitch (d’après G.
Faltings), Astérisque (1985), nos. 121-122, 25–41, (Seminar Bourbaki, Vol.
1983/84).
42. L. E. Dickson, History of the Theory of Numbers. Vol. I: Divisibility and Primality, Chelsea Publishing Co., New York, 1966.
43. L. E. Dickson, History of the Theory of Numbers. Vol. II: Diophantine Analysis,
Chelsea Publishing Co., New York, 1966.
44. L. E. Dickson, History of the Theory of Numbers. Vol. III: Quadratic and
Higher Forms, with a Chapter on the Class Number by G. H. Cresse, Chelsea
Publishing Co., New York, 1966.
45. R. E. Dressler, A stronger Bertrand’s postulate with an application to partitions, Proc. Amer. Math. Soc. 33 (1972), 226–228.
46. U. Dudley, History of a formula for primes, Amer. Math. Monthly 76 (1969),
23–28.
47. H. M. Edwards, Riemann’s Zeta Function, Dover Publications Inc., Mineola,
NY, 2001, reprint of the 1974 original [Academic Press, New York].
48. M. Einsiedler, G. R. Everest, and T. Ward, Primes in elliptic divisibility sequences, London Math. Soc. J. Comput. Math. 4 (2001), 1–15.
49. N. D. Elkies, On A4 +B 4 +C 4
= D 4 , Math. Comp. 51, no. 184 (1988), 825–835.
∞
−n
50. N. D. Elkies, On the sums
, Amer. Math. Monthly 110,
k=−∞ (4k + 1)
no. 7 (2003), 561–573.
51. P. Erdös, Beweis eines Satzes von Tschebyschef, Acta Litt. Sci. Szeged 5 (1932),
194–198.
52. P. Erdös, On a new method in elementary number theory which leads to an
elementary proof of the prime number theorem, Proc. Nat. Acad. Sci. U.S.A.
35 (1949), 374–384.
53. Euclid, The Thirteen Books of Euclid’s Elements Translated from the Text of
Heiberg. Vol. I: Introduction and Books I, II. Vol. II: Books III–IX. Vol. III:
Books X–XIII and Appendix, Translated with Introduction and Commentary
by Thomas L. Heath, 2nd ed, Dover Publications Inc., New York, 1956.
54. G. R. Everest and H. King, Prime powers in elliptic divisibility sequences,
Comp. Math. to appear, 2005.
55. G. R. Everest, G. McLaren, and T. Ward, Divisibility in elliptic divisibility
sequences, arXiv:math.NT/0409540, 2004.
56. G. R. Everest, V. Miller, and N. Stephens, Primes generated by elliptic curves,
Proc. Amer. Math. Soc. 132, no. 1 (2004), 955–963.
57. G. R. Everest, P. Rogers, and T. Ward, A higher rank Mersenne problem,
Springer Lecture Notes in Computer Science 2369 (2002), 95–107.
58. G. R. Everest, A. J. van der Poorten, I. Shparlinski, and T. Ward, Recurrence Sequences, Mathematical Surveys and Monographs, vol. 104, American
Mathematical Society, Providence, RI, 2003.
282
References
59. G. R. Everest and T. Ward, Heights of Polynomials and Entropy in Algebraic
Dynamics, Springer-Verlag, London, 1999.
60. J.-H. Evertse, On sums of S-units and linear recurrences, Compositio Math.
53, no. 2 (1984), 225–244.
61. G. Faltings, Endlichkeitssätze für abelsche Varietäten über Zahlkörpern, Invent. Math. 73, no. 3 (1983), 349–366.
62. G. Faltings, Erratum: “Finiteness theorems for abelian varieties over number
fields”, Invent. Math. 75, no. 2 (1984), 381.
63. H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955),
353.
64. H. Furstenberg, Recurrence in Ergodic Theory and Combinatorial Number Theory, M. B. Porter Lectures, Princeton University Press, Princeton, NJ, 1981.
65. J. M. Gandhi, Formulae for the nth prime, in J. H. Jordan and W. A. Webb
(eds.), Proceedings of the Washington State University Conference on Number Theory (Washington State Univ., Pullman, Wash., 1971), Dept. Math.,
Washington State Univ., Pullman, Wash., 1971, pp. 96–106.
66. J. von zur Gathen and J. Gerhard, Modern Computer Algebra, second ed.,
Cambridge University Press, Cambridge, 2003.
67. C. F. Gauss, Disquisitiones Arithmeticae, Translated and with a preface by
Arthur A. Clarke, Revised by William C. Waterhouse, Cornelius Greither and
A. W. Grootendorst and with a preface by William C. Waterhouse, SpringerVerlag, New York, 1986.
68. S. S. Gelbart and S. D. Miller, Riemann’s zeta function and beyond, Bull.
Amer. Math. Soc. (N.S.) 41, no. 1 (2004), 59–112.
69. D. Goldfeld, Sur les produits partiels Eulériens attachés aux courbes elliptiques,
C. R. Acad. Sci. Paris Sér. I Math. 294, no. 14 (1982), 471–474.
70. D. Goldfeld, The elementary proof of the prime number theorem: an historical
perspective, in D. Chudnovsky, G. Chudnovsky and M. B. Nathanson (eds.),
Number theory (New York, 2003), Springer-Verlag, New York, 2004, pp. 179–
192.
71. S. W. Golomb, A direct interpretation of Gandhi’s formula, Amer. Math.
Monthly 81 (1974), 752–754.
72. W. T. Gowers, A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11,
no. 3 (2001), 465–588.
73. B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, arXiv:math.NT/0404188, 2004.
74. G. H. Hardy, Divergent Series, The Clarendon Press, Oxford University Press,
Oxford, 1949.
75. G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers,
fifth ed., The Clarendon Press, Oxford University Press, New York, 1979.
76. H. Hasse, Number Theory, Translated from the third German edition and with
a preface by Horst Günter Zimmer, Springer-Verlag, Berlin, 1980.
77. E. Hecke, Lectures on the Theory of Algebraic Numbers, Translated from the
German by George U. Brauer, Jay R. Goldman and R. Kotzen, Graduate Texts
in Mathematics, vol. 77, Springer-Verlag, New York, 1981.
78. K. Heegner, Diophantische Analysis und Modulfunktionen, Math. Z. 56 (1952),
227–253.
79. D. Husemoller, Elliptic Curves, With an Appendix by Ruth Lawrence, Graduate Texts in Mathematics, vol. 111, Springer-Verlag, New York, 1987.
References
283
80. A. E. Ingham, On the difference between consecutive primes, Quart. J. Math.
Oxford Ser. (2) 8 (1937), 255–266.
81. G. J. O. Jameson, The Prime Number Theorem, London Mathematical Society
Student Texts, vol. 53, Cambridge University Press, Cambridge, 2003.
82. W. Jänichen, Über die Verallgemeinerung einer Gaußschen Formel aus der
Theorie der höheren Kongruenzen, Sitzungsber. Berl. Math. Ges. 20 (1921),
23–29.
83. G. J. Janusz, Algebraic Number Fields, second ed., Graduate Studies in Mathematics, vol. 7, American Mathematical Society, Providence, RI, 1996.
84. G. A. Jones and J. M. Jones, Elementary Number Theory, Springer Undergraduate Mathematics Series, Springer-Verlag London Ltd., London, 1998.
85. J. P. Jones, D. Sato, H. Wada, and D. Wiens, Diophantine representation of
the set of prime numbers, Amer. Math. Monthly 83, no. 6 (1976), 449–464.
86. D. Joyce, Euclid’s Elements, aleph0.clarku.edu/˜djoyce/.
87. Y. Katznelson, An Introduction to Harmonic Analysis, corrected ed., Dover
Publications Inc., New York, 1976.
88. W. Keller, Fermat Factoring Status, www.prothsearch.net/fermat.html.
89. N. Koblitz, Introduction to Elliptic Curves and Modular Forms, Graduate
Texts in Mathematics, vol. 97, Springer-Verlag, New York, 1984.
90. N. Koblitz, A Course in Number Theory and Cryptography, Graduate Texts in
Mathematics, vol. 114, Springer-Verlag, New York, 1987.
91. L. J. Lander and T. R. Parkin, Counterexample to Euler’s conjecture on sums
of like powers, Bull. Amer. Math. Soc. 72 (1966), 1079.
92. L. J. Lander and T. R. Parkin, A counterexample to Euler’s sum of powers
conjecture, Math. Comp. 21 (1967), 101–103.
93. L. J. Lander, T. R. Parkin, and J. L. Selfridge, A survey of equal sums of like
powers, Math. Comp. 21 (1967), 446–459.
94. S. Lang, Elliptic Curves: Diophantine Analysis, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences],
vol. 231, Springer-Verlag, Berlin, 1978.
95. S. Lang, Elliptic Functions, second ed., Graduate Texts in Mathematics,
vol. 112, Springer-Verlag, New York, 1987, (with an appendix by J. Tate).
96. S. Lang, Algebraic Number Theory, second ed., Graduate Texts in Mathematics, vol. 110, Springer-Verlag, New York, 1994.
97. D. H. Lehmer, Factorization of certain cyclotomic functions, Ann. Math. 34
(1933), 461–479.
98. F. Lemmermeyer, Reciprocity Laws From Euler to Eisenstein, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2000.
99. A. K. Lenstra and H. W. Lenstra, Jr. (eds.), The Development of the Number
Field Sieve, Lecture Notes in Mathematics, vol. 1554, Springer-Verlag, Berlin,
1993.
100. W. J. LeVeque, Fundamentals of Number Theory, Addison-Wesley Publishing
Co., Reading, Mass.-London-Amsterdam, 1977.
101. E. Lutz, Sur l’equation y 2 = x3 − Ax − B dans les corps p–adic, J. Reine
Angew. Math. 177 (1937), 237–247.
102. K. Mahler, On the Chinese remainder theorem, Math. Nachr. 18 (1958), 120–
122.
103. K. Mahler, An application of Jensen’s formula to polynomials, Mathematika 7
(1960), 98–100.
284
References
104. K. Mahler, On some inequalities for polynomials in several variables, J. London
Math. Soc. 37 (1962), 341–344.
105. B. Mazur, Modular curves and the Eisenstein ideal, Inst. Hautes Études Sci.
Publ. Math. (1977), no. 47, 33–186.
106. T. Metsänkylä, Catalan’s conjecture: another old Diophantine problem solved,
Bull. Amer. Math. Soc. (N.S.) 41, no. 1 (2004), 43–57.
107. W. H. Mills, A prime-representing function, Bull. Amer. Math. Soc. 53 (1947),
604.
108. J. S. Milne, Abelian varieties, in G. Cornell and J. H. Silverman (eds.), Arithmetic Geometry, Papers from the Conference held at the University of Connecticut, Storrs, Connecticut, July 30–August 10, 1984, Springer-Verlag, New
York, 1986, pp. 103–150.
109. P. Monsky, Simplifying the proof of Dirichlet’s theorem, Amer. Math. Monthly
100, no. 9 (1993), 861–862.
110. L. J. Mordell, On the rational solutions of the indeterminate equations of the
third and fourth degrees, Proc. Cambridge Phil. Soc. 21 (1922), 179–192.
111. P. Moss, Algebraic Realizability Problems, Ph.D. thesis, University of East Anglia, 2003.
112. T. Nagell, Solution de quelque problèmes dans la théorie arithmétique des
cubiques planes du premier genre, Wid. Akad. Skrifter Oslo I 1 (1935).
113. J. J. O’Connor and E. F. Robertson, History of mathematics archive,
www-gap.dcs.st-and.ac.uk/˜history/.
114. A. M. Odlyzko and H. J. J. te Riele, Disproof of the Mertens conjecture, J.
Reine Angew. Math. 357 (1985), 138–160.
115. S. J. Patterson, An Introduction to the Theory of the Riemann Zeta-Function,
Cambridge Studies in Advanced Mathematics, vol. 14, Cambridge University
Press, Cambridge, 1988.
116. H. Poincaré, Sur les propriétés arithmétiques des courbes algébriques, J. de
Liouville 7 (1901), 166–233.
117. C. Pomerance, A tale of two sieves, Notices Amer. Math. Soc. 43, no. 12 (1996),
1473–1485.
118. A. J. van der Poorten, A proof that Euler missed. . .Apéry’s proof of the irrationality of ζ(3): An informal report, Math. Intelligencer 1, no. 4 (1978/79),
195–203.
119. A. J. van der Poorten, Notes on Fermat’s Last Theorem, Canadian Mathematical Society Series of Monographs and Advanced Texts, John Wiley & Sons
Inc., New York, 1996.
120. A. J. van der Poorten and H. P. Schlickewei, Additive relations in fields, J.
Austral. Math. Soc. Ser. A 51, no. 1 (1991), 154–170.
121. D. Ramakrishnan and R. J. Valenza, Fourier Analysis on Number Fields, Graduate Texts in Mathematics, vol. 186, Springer-Verlag, New York, 1999.
122. M. Reid, Undergraduate Algebraic Geometry, London Mathematical Society
Student Texts, vol. 12, Cambridge University Press, Cambridge, 1988.
123. P. Ribenboim, “1093”, Math. Intelligencer 5, no. 2 (1983), 28–34.
124. P. Ribenboim, Catalan’s Conjecture: Are 8 and 9 the Only Consecutive Powers?, Academic Press Inc., Boston, MA, 1994.
125. P. Ribenboim, The New Book of Prime Number Records, Springer-Verlag, New
York, 1996.
126. P. Ribenboim, Fermat’s Last Theorem for Amateurs, Springer-Verlag, New
York, 1999.
References
285
127. H.-E. Richert, Über Zerfällungen in ungleiche Primzahlen, Math. Z. 52 (1949),
342–343.
128. B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen
Grösse, Monatsberichte der Berliner Akademie (November 1859), 671–680,
www.maths.tcd.ie/pub/HistMath/People/Riemann/.
129. R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital
signatures and public-key cryptosystems, Comm. ACM 21, no. 2 (1978), 120–
126.
130. T. Rivoal, Irrationalité d’au moins un des neuf nombres ζ(5), ζ(7), . . . , ζ(21),
Acta Arith. 103, no. 2 (2002), 157–167.
131. P. Rogers, Topics in Elliptic Divisibility Sequences, M.Phil. thesis, University
of East Anglia, 2002.
132. M. Roitman, On Zsigmondy primes, Proc. Amer. Math. Soc. 125, no. 7 (1997),
1913–1919.
133. A. Schinzel, Primitive divisors of the expression An − B n in algebraic number
fields, J. Reine Angew. Math. 268/269 (1974), 27–33.
134. H. P. Schlickewei, S-unit equations over number fields, Invent. Math. 102, no. 1
(1990), 95–107.
135. K. Schmidt and T. Ward, Mixing automorphisms of compact groups and a
theorem of Schlickewei, Invent. Math. 111, no. 1 (1993), 69–76.
136. A. Selberg, An elementary proof of the prime-number theorem, Ann. Math.
(2) 50 (1949), 305–313.
137. J.-P. Serre, A Course in Arithmetic, Graduate Texts in Mathematics, vol. 7,
Springer-Verlag, New York, 1973.
138. V. Shoup, Searching for primitive roots in finite fields, Math. Comp. 58, no. 197
(1992), 369–380.
139. J. H. Silverman, The Arithmetic of Elliptic Curves, Graduate Texts in Mathematics, vol. 106, Springer-Verlag, New York, 1986.
140. J. H. Silverman, Wieferich’s criterion and the abc-conjecture, J. Number Theory
30, no. 2 (1988), 226–237.
141. J. H. Silverman, Taxicabs and sums of two cubes, Amer. Math. Monthly 100,
no. 4 (1993), 331–340.
142. J. H. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves, Graduate Texts in Mathematics, vol. 151, Springer-Verlag, New York, 1994.
143. J. H. Silverman and J. Tate, Rational Points on Elliptic Curves, Undergraduate
Texts in Mathematics, Springer-Verlag, New York, 1992.
144. N. J. A. Sloane, An on-line version of the encyclopedia of integer
sequences, Electron. J. Combin. 1 (1994), Feature 1, approx. 5 pp.,
www.research.att.com/˜njas/sequences/.
145. M. Spivak, Calculus, 2 ed., Publish or Perish, Berkeley, CA, 1980.
146. H. M. Stark, A complete determination of the complex quadratic fields of classnumber one, Michigan Math. J. 14 (1967), 1–27.
147. I. Stewart and D. Tall, Algebraic Number Theory and Fermat’s Last Theorem,
third ed., A K Peters Ltd., Natick, MA, 2002.
148. Z. Šuniḱ, An ideal functional equation with a ring, Mathematics Magazine
(October 2004), 310–313.
149. L. Szpiro, La conjecture de Mordell (d’après G. Faltings), Astérisque (1985),
no. 121-122, 83–103, (Seminar Bourbaki, Vol. 1983/84).
286
References
150. J. T. Tate, Fourier analysis in number fields, and Hecke’s zeta-functions, in
J. W. S. Cassels and A. Fröhlich (eds.), Algebraic Number Theory, Proceedings
of an Instructional Conference, Brighton, 1965, Thompson, Washington, DC,
1967, pp. 305–347.
151. P. L. Tchebychef, Oeuvres. Tomes I, II, Chelsea Publishing Co., New York,
1962.
152. P. L. Tchebychef, Mémoire sur les nombres premiers, J. de Math. Pures. Appl.
17 (1852), 366–390.
153. E. C. Titchmarsh, The theory of the Riemann Zeta-Function, edited and with
a preface by D. R. Heath-Brown, second ed., The Clarendon Press, Oxford
University Press, New York, 1986.
154. E. Trost, Primzahlen, Verlag Birkhäuser, Basel-Stuttgart, 1953.
155. J. B. Tunnell, A classical Diophantine problem and modular forms of weight
3/2, Invent. Math. 72, no. 2 (1983), 323–334.
156. S. Wagstaff, The Cunningham Project, www.cerias.purdue.edu/homes/ssw/.
157. A. Weil, Number theory: An Approach Through History, From Hammurapi to
Legendre, Birkhäuser Boston Inc., Boston, MA, 1984.
158. A. Weil, Basic Number Theory, Classics in Mathematics, Springer-Verlag,
Berlin, 1995, Reprint of the second (1973) edition.
159. A. Weil, Elliptic Functions According to Eisenstein and Kronecker, Classics in
Mathematics, Springer-Verlag, Berlin, 1999.
160. E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, Cambridge
Mathematical Library, Cambridge University Press, Cambridge, 1996.
161. A. Wiles, The Birch and Swinnerton-Dyer conjecture (Clay Mathematics Institute), www.claymath.org/prizeproblems/birchsd.pdf.
162. A. Wiles, Modular elliptic curves and Fermat’s last theorem, Ann. Math. (2)
141, no. 3 (1995), 443–551.
163. D. W. Wilson, The fifth taxicab number is 48988659276962496, J. Integer Seq.
2 (1999), Article 99.1.9, 1 HTML document.
164. H. Wußing, Implizite gruppentheoretische Denkformen in den Disquisitiones
Arithmeticae von Carl Friedrich Gauß, in Proceedings of the 2nd Gauss Symposium. Conference A: Mathematics and Theoretical Physics (Munich, 1993)
(Berlin), Sympos. Gaussiana, de Gruyter, 1995, pp. 179–185.
165. M. Yabuta, A simple proof of Carmichael’s theorem on primitive divisors,
Fibonacci Quart. 39, no. 5 (2001), 439–443.
166. D. Zagier, The Birch-Swinnerton-Dyer conjecture from a naive point of view, in
G. van der Geer, F. Oort, and J. H. M. Steenbrink (eds.), Arithmetic Algebraic
Geometry (Texel, 1989), Progr. Math., vol. 89, Birkhäuser Boston, Boston,
MA, 1991, pp. 377–389.
167. G. M. Ziegler, The great prime number record races, Notices Amer. Math. Soc.
51, no. 4 (2004), 414–416.
168. K. Zsigmondy, Zur Theorie der Potenzreste, Monatsh. Math. 3 (1892), 265–284.
169. J. A. Zuehlke, Fermat’s last theorem for Gaussian integer exponents, Amer.
Math. Monthly 106, no. 1 (1999), 49.
Index
(·), 84
·, 63
<
· ·, · >, 237
, 65, 226
·
C, 3
C(·), 247
∆, 78
δ = δ(d), 86
DK , 190
d(·), 159
E(C), 127
E(K), 127
E(Q), 104
e(·), 121
Fn , 29
·, 19, 35, 158
F∗q , 3
Fq , 3
{·}, 75
Γ (·), 183
γ, 10
gcd(·,·), 36
214
G,
I, 164
(·), 3
Int(γ), 173
I ∗ , 86
k(·, p), 19
L(·,·), 209
, 4
Mn , 25
µ(·), 161
N , 85
N, 3
N (I), 87
ν(·), 165
O(·), 3
o(·), 3
OK , OK∗ , 85
ordp (·), 27
P, 3
P2 (C), 126
P2 (Fp ), 239
℘, 122
φ(·), 61
φd (·), 167
π(·), 21, 157, 180
Q, 3
Q, 142
R∗ , 46
R, 3
(·), 3
S, 187
∼, 3
T , 85
T (·), 117
θ(·), 18
U (R), 46
Z, 3
ZS , 55
Abel, 5
Limit Theorem, 210, 223
Summation Formula, 219, 235
Abelian theorem, 224
Abelian variety, 97
additive reduction, 241
288
Index
adeles, 224
Agrawal–Kayal–Saxena algorithm, 266,
277
algebraic integers, 84, 226
prime ideals, 85
quadratic field, 84, 86, 91, 228, 231,
232
units, 77, 85
analytic
continuation, 175
L-function of elliptic curve, 242
Dedekind zeta function, 229, 230
L-function, 219
Riemann zeta function, 175, 178,
185
function, 170
Morera’s Theorem, 174
uniform convergence, 173
Ankeny, 268
arithmetic function, 60
completely multiplicative, 60
convolution, 164
multiplicative, 60
Artin, 5
conjecture, 65
associative law, 97
Bachet, 5
equation, 93, 118
Bernoulli number, 203
regular prime, 83
Bertrand’s Postulate, 19
improvements, 42
prime counting function, 22
sum of primes, 23
Bezout’s Lemma, 37
Bhaskaracharya, 5
Binomial Theorem, 24, 27, 63, 71, 180,
263, 267
Birch and Swinnerton-Dyer, 240
conjectures, 242
congruent numbers, 243
bit length, 250
bit operation, 246
Brahmagupta, 5
Brun’s constant, 33
canonical height, 146
Carmichael
number, 34, 260
square-free, 261
primitive divisors, 169
Theorem, 169
Catalan equation, 25, 57
Cataldi, 5, 24
Cauchy
formula, 173
Residue Theorem, 126, 131
sequence, 147, 223
character, 213, 229
Dirichlet, 217, 227, 268
group, 214
L-function, 217
orthogonality relations, 215
Chinese Remainder Theorem, 61, 81,
256
proof, 62
class group, 90
class number, 81, 89
formula, 225
via quadratic forms, 243
one, 243
Clay Mathematics Institute, 187, 243
Birch and Swinnerton-Dyer conjectures, 243
Riemann Hypothesis, 187
Clement, 33
Cole, 27
complexity, 247
O-notation, 247
polynomial, 250
congruent number, 98, 99, 103
elliptic curve, 100, 118
Zsigmondy’s Theorem, 116
problem, 101
Tunnell’s Theorem, 100, 243
conjugate, 86
convolution, 164
identity, 164
inverse, 164
coprime, 24, 36
critical strip, 185
cryptography, 251
digital signature, 252
public, private key, 251
Cunningham Project, 269, 277
cusp, 241
cyclotomic polynomial, 167
Index
Dedekind
zeta function, 229
analytic continuation, 230
Euler product, 229
residue formula, 230
digital signature, 252
Diophantine
approximation, 75
equation, 43
quadratic, 59
solutions modulo p, 67
sum of squares, 48
Diophantus, 5
Dirichlet, 5, 75, 225, 230
Theorem, 211
character, 217, 227, 268
multiplicative, 218
convolution, 164
Abelian group, 165
identity, 164
inverse, 164
kernel, 190
L-function, 217
elliptic curve, 241
series, 166, 180, 209, 221
Theorem, 224
discriminant
polynomial, 150
quadratic field, 92
quadratic form, 78
divisibility sequence, 112
duplication formula, 134, 141, 146
proof, 137
elementary proof, 180
elliptic
curve, 54, 56, 93, 127, 139
additive reduction, 241
affine part, 126
arithmetic conductor, 242
associative law, 97
canonical height, 146
complex torsion point, 129
congruent number, 100, 116
cusp, 241
defined over K, 127
duplication formula, 134
Fermat’s Last Theorem, 56
forms a group, 96
289
generalized Weierstrass equation,
109, 116
Hasse Theorem, 240
identity, 96
isogeny, 112
isomorphism, 132
K-points, 127
L-function, 241
magnified point, 113
m-isogenous, 113
naı̈ve height, 104
non-degeneracy, 129, 239
non-split multiplicative reduction,
241
other characteristics, 109
over a finite field, 238
points of order eleven, 110
rank, 105, 240
rank, generators, 155
rational torsion point, 129
split multiplicative reduction, 241
torsion point, 106, 128, 155
Weierstrass equation, 108
Weierstrass model, 108
divisibility sequence, 112, 118
function, 125, 132
fundamental domain, 125
periods, 125
Eratosthenes, 5, 16, 245
sieve, 16
Euclid, 5, 39, 41, 207
infinitely many primes, 8
Theorem, 7
analytic proof, 8
Goldbach’s proof, 40
original proof, 39
sum of prime reciprocals, 11
topological proof, 40
Euclidean Algorithm, 35–37, 46, 84,
138, 258, 275
complexity, 254
complexity over finite fields, 277
finite fields, 276
polynomial, 137
Euclidean ring, 45, 47
Euler, 5
and Mersenne primes, 24
conjecture, 57
Elkies, 58
290
Index
Lander and Parkin, 57
Criterion, 262
Fermat Theorem, 62
four-square identity, 50
functional equation, 206
infinitely many primes, 8
Mascheroni constant, 10, 159, 197,
199
phi-function, 61, 167, 218, 221, 259
prime values of polynomials, 18
product, 14, 170
Dedekind zeta function, 229
L-function, 218
L-function of elliptic curve, 241
Riemann zeta function, 14
Summation Formula, 158
divisor function, 160
harmonic series, 159
Stirling’s Formula, 160
Extended Riemann Hypothesis, 268
factor base method, 272
factorization
Γ function, 198
of a function, 2, 13, 183, 198, 205
Riemann zeta function, 15, 205
sine function, 202
factorizing
factor base method, 272
rho method, 270
Faltings’ Theorem, 104, 118
Fermat, 5
and Mersenne primes, 24
Last Theorem, 56, 83
Little Theorem, 24, 34, 253, 255
base, 34
combinatorial proof, 24
composite modulus, 162
proof using group theory, 25
numbers, 29, 40
Fibonacci, 5, 62
sequence, 112
Fourier, 5
analysis, 42, 181, 206, 234
coefficient, 189
periodic function, 189
series, 189
transform, 188
fudge, 234
fundamental domain, 125
Fundamental Theorem of Arithmetic, 7,
24, 51–54, 183, 269
by inverting primes, 55
Euclidean rings, 47
failure, 54, 73, 74
Fermat’s Last Theorem, 83
for ideals, 84, 89
Dedekind zeta function, 229
function-theoretic, 197, 205
Gaussian integers, 48
in other rings, 45, 47
inductive proof, 38
proof, 35
fundamental unit, 85, 228
class number formula, 227
Furstenberg
topological proof of Euclid, 40
Galois, 5
Galois theory, 167
Gamma function, 183, 197, 206
analytic continuation, 184
analyticity, 184
Stirling’s Formula, 204
Gandhi’s prime formula, 163
Gauss, 5, 32, 41
class number one problem, 228
function lies in S, 188
lemma, 153
sum, 70, 233, 235
Gaussian integers, 45, 57
and cubic Diophantine equations, 52
Fundamental Theorem of Arithmetic,
48
primes, 49
gcd(·,·), 3
generator, 112
Goldbach, 18, 40
greatest common divisor, 36, 46
group
of units, 46, 47, 54, 62, 75
group theory, 25, 26, 47, 60, 66, 106
Hadamard, 157
Hardy–Ramanujan number, 117
harmonic
analysis, 181
series, 9
Index
integral test, 10
Hasse
Theorem, 109, 240, 243
al-Haytham, 5, 62
Theorem, 32, 48, 60, 257
Clement’s extension, 33
Gauss’ generalization, 32
Hecke, 230
height, 104, 113
canonical, 146
logarithmic, 133
naı̈ve, 133
projective, 133
Hilbert, 5
10th problem, 18
Nullstellensatz, 136, 155
homogenous
coordinates, 110
polynomial, 135
defines a morphism, 135
degree, 135
l’Hôpital, 193, 201
Hurwitz’ Lemma, 87
Hypatia, 5
ideal, 84
class group, 90
class number, 89
conjugate, 86
divides, 88
Fundamental Theorem of Arithmetic,
89
maximal, 89
norm, 87
prime, 89
principal, 84
sum, product, 85
ideal class group, 90
industrial prime, 266
inert prime, 91
irreducible, 46
isogeny, 112
Jacobi, 5
symbol, 226, 269
Jiushao, 5
knapsack problem, 250
Kronecker
291
symbol, 227
Kronecker’s Lemma, 152
Kummer, 5, 83
L-function, 209, 268
analytic continuation, 219
attached to elliptic curve, 241
analytic continuation, 242
Euler product, 241
nonvanishing, 212
Lagrange, 5
and Wilson, al-Haytham Theorem, 32
Theorem in group theory, 25
Theorem on quadratic forms, 79
Theorem on sum of squares, 50
Lamé, 83
lattice, 121
Lefschetz principle, 108, 128, 132
Legendre, 5
sum of squares, 52
symbol, 65, 91, 268
Euler criterion, 65
multiplicative, 66
Lehmer, 5
polynomial, 154
Problem, 152
length, 113
L-function
associated with groups, 224
Liouville, 5, 83
Theorem, 126
Littlewood, 157
Lucas, 5, 253
Lehmer test, 27
primality of Mersenne numbers, 26
sequence, 169
magnified point, 113
Mahler, 5
measure, 134, 150, 155
ergodic, 151
hyperbolic, 151
logarithmic, 150
Mascheroni, 5
Mazur’s Theorem, 110, 118
Mersenne, 5, 23
number, 25, 269, 276
factorization, 167
Zsigmondy’s Theorem, 27
292
Index
prime, 24
largest known, 30
Lucas–Lehmer test, 27
perfect number, 25
sequence, 112
Mertens
conjecture, 186, 206
Odlyzko and te Riele, 186
Theorem, 13
Millennium Prize Problems, 187, 243
Birch and Swinnerton-Dyer conjectures, 243
Miller–Rabin test, 265
Minkowski, 90
Möbius, 5
function, 161
FLT for composite moduli, 162
inversion formula, 165
multiplicative, 163
Prime Number Theorem, 162
Riemann Hypothesis, 186
inversion formula, 27
Mordell, 5, 104
Theorem, 104
proof, 140
weak, 104, 138, 142, 155
Morera’s Theorem, 174
morphism, 135
multiplicative function, 60
naı̈ve height, 133
Neron–Tate height, 146
non-split multiplicative reduction, 241
notation, 3
Nullstellensatz, 136
number field sieve, 269, 272, 277
orthogonality relations, 215
parallelogram law, 147, 148
naı̈ve, 148
proof, 148
Pell’s equation, 58
perfect number, 25
pigeonhole principle, 50, 75
Poincaré, 5, 104
Poisson, 5
Summation Formula, 193, 196, 206,
236
polynomial
ergodic, 151
height, 150
hyperbolic, 151
length, 150
Mahler measure, 150
primality testing, 31
Agrawal–Kayal–Saxena, 266, 277
al-Haytham’s Theorem, 257
Carmichael numbers, 35
deterministic, 268
Euclidean Algorithm, 253
Jacobi symbol, 264
Legendre symbol, 72
Lenstra, 275
Miller–Rabin, 265
pseudoprimes, 258
Solovay–Strassen, 262, 268
deterministic form, 268
using Fermat’s Little Theorem, 34,
255
Wilson’s Theorem, 34
prime, 46
counting function, 21, 33, 41, 157, 162
qualitative, 40
quantitative, 21, 41, 162
formula, 18, 33
Bertrand’s Postulate, 22
Gandhi, 163
ideal, 89
industrial, 266
inert, ramified, split, 91
regular, 83
twin, 33
Wieferich, 25
Prime Number Theorem, 157, 162, 180,
252
elementary proof, 181
Tchebychef, 162
primitive
divisor, 27
Carmichael’s Theorem, 169
elliptic divisibility sequence, 116
Zsigmondy’s Theorem, 28
root, 60, 261
Artin conjecture, 65
smallest, 65
root of unity, 167, 236
solution, 44, 79
Index
principal ideal, 84
domain, 84
private key, 251
projective space, 126
pseudoprime, 258
strong, 265
public key, 251
Pythagoras, 5, 43
equation, 43
primitive solution, 44
Pythagorean triple, 44, 99–102
congruent number, 100
integral area, 99
parametrized, 43
primitive, 44
quadratic
field, 73
class number, 89
norm, trace, 85
principal ideal, 84
units, 75
form, 225
class number, 81
discriminant, 78
positive-definite, 81
reduced, 81
theorem of Lagrange, 78
nonresidue, 65, 268
residue, 65, 239, 268
Quadratic Reciprocity Law, 67, 73, 81
computing Legendre symbols, 72
Jacobi symbol, 226
rabbit, 199
Ramanujan, 5, 114
number, 117
ramified prime, 91
regular prime, 83
remainder theorem, 47
Repeated Squaring algorithm, 255
rho method, 270
Riemann, 5
Hypothesis, 162, 186, 205, 268
and the Möbius function, 186
Extended, 268
primality testing, 268
Lebesgue Lemma, 192
zeta function, 13, 157, 166, 230
293
analytic continuation, 175, 176
and Dedekind zeta function, 229
critical strip, 185
Euler product, 14
functional equation, 185
RSA cryptosystem, 251, 266, 269, 272
Schwartz space, 187
Fourier transform, 188
Serre, 68
Siegel, 5
Theorem, 54, 108, 155
strong form, 149
Silverman’s Theorem, 116
Solovay–Strassen test, 262, 268
deterministic form, 268
split multiplicative reduction, 241
split prime, 91
square-free, 55, 99
Carmichael number, 262
Stirling, 5
Formula, 160, 204
sum of primes, 23
sum of squares, 48, 50, 51
uniqueness, 49, 52
Sun Zi, 5
S-unit equation, 56
symbol
Jacobi, quadratic, 226
Kronecker, 227
Legendre, 65
Tate, 146
Tauberian theorem, 224
taxicab number, 117
Tchebychef, 5, 162
theta function, 194
functional equation, 194
torsion point, 106, 148, 149
order 11, 110
Tunnell’s Theorem, 243
twin primes, 33
uniform convergence, 125, 171
and integrals, 173
zeta function, 173
unit, 46, 53, 54, 74, 213
fundamental, 85, 228
class number formula, 227
in quadratic field, 75
294
Index
root
primitive, 167
Wieferich prime, pair, 25
Wilson’s Theorem, 32, 257
winding number, 132
de la Vallée Poussin, 157
weak Mordell Theorem, 104, 138, 145,
155
Weierstrass, 5, 197
℘-function, 122, 173
absolute convergence, 123
is periodic, 125
Laurent expansion, 129
parametrizes elliptic curves, 127
equation, generalized equation, 109
model for elliptic curve, 108
Weil, 5
zeta function
Dedekind, 229, 230
analytic continuation, 229
irrational values, 206
Poission Summation Formula, 206
Riemann, 13, 157
analytic continuation, 175, 178, 185
Zsigmondy’s Theorem, 27
Carmichael Theorem, 169
elliptic analog, 116
odd, even bound, 116
proof, 167