PMATH 340 Lecture Notes on Elementary Number Theory
Anton Mosunov
Department of Pure Mathematics
University of Waterloo
Winter, 2017

Contents
1 Introduction
2 Divisibility. Factorization of Integers. The Fundamental Theorem of Arithmetic
3 Greatest Common Divisor. Least Common Multiple. Bézout’s Lemma.
4 Diophantine Equations. The Linear Diophantine Equation ax + by = c
5 Euclidean Algorithm. Extended Euclidean Algorithm
6 Congruences. The Double-and-Add Algorithm
7 The Ring of Residue Classes $\mathbb{Z}_n$
8 Linear Congruences
9 The Group of Units $\mathbb{Z}_n^*$
10 Euler’s Theorem and Fermat’s Little Theorem
11 The Chinese Remainder Theorem
12 Polynomial Congruences
13 The Discrete Logarithm Problem. The Order of Elements in $\mathbb{Z}_n^*$
14 The Primitive Root Theorem
15 Big-O Notation
16 Primality Testing
  16.1 Trial Division
  16.2 Fermat’s Primality Test
  16.3 Miller-Rabin Primality Test
17 Public Key Cryptosystems. The RSA Cryptosystem
18 The Diffie-Hellman Key Exchange Protocol
19 Integer Factorization
  19.1 Fermat’s Factorization Method
  19.2 Dixon’s Factorization Method
20 Quadratic Residues
21 The Law of Quadratic Reciprocity
22 Multiplicative Functions
23 The Möbius Inversion
24 The Prime Number Theorem
25 The Density of Squarefree Numbers
26 Perfect Numbers
27 Pythagorean Triples
28 Fermat’s Infinite Descent. Fermat’s Last Theorem
29 Gaussian Integers
30 Fermat’s Theorem on Sums of Two Squares
31 Continued Fractions
32 Pell’s Equation
33 Algebraic and Transcendental Numbers. Liouville’s Approximation Theorem
34 Elliptic Curves

1 Introduction
This is a course on number theory, undoubtedly the oldest mathematical discipline known to the world. Number theory studies the properties of numbers. These may be integers, like −2, 0 or 7, or rational numbers like 1/3 or −7/9, or algebraic numbers like √2 or i, or transcendental numbers like e or π. Though most of the course will be dedicated to Elementary Number Theory, which studies congruences and various divisibility properties of the integers, we will also dedicate several lectures to Analytic Number Theory, Algebraic Number Theory, and other subareas of number theory. There are many interesting questions that one might ask about numbers.
In search of answers to these questions, mathematicians unravel fascinating properties of numbers, some of which are quite profound. Here are several curious facts about prime numbers:
1. Every odd number exceeding 5 can be expressed as a sum of three primes (Helfgott-Vinogradov Theorem, 2013. In 1937, Vinogradov proved the result for all odd n > B for some B, and in 2013 Helfgott demonstrated that one can take B = 5);
2. There are infinitely many pairs of distinct primes p and q such that |p − q| ≤ 246 (Zhang’s Theorem, 2013. Zhang proved the result with $7 \cdot 10^7$ in place of 246, and in 2014 the bound was reduced to 246 by the Polymath8b project, building on the work of Maynard and Tao);
3. For all $n \ge \exp(\exp(33.217))$ there always exists a prime between $n^3$ and $(n+1)^3$ (Ingham’s Theorem, 1937. Ingham proved the result for all n ≥ B for some B, and in 2014 Dudek demonstrated that one can take B as above);
4. There are infinitely many primes of the form $x^2 + y^4$ (Friedlander-Iwaniec Theorem, 1997);
5. For large x, there are “approximately” x/log x primes not exceeding x (Prime Number Theorem, 1896);
6. Given a positive integer d, there exist distinct prime numbers $p_1, p_2, \ldots, p_d$ which form an arithmetic progression (Green-Tao Theorem, 2004).
Despite the simplicity of their formulations, all of these results are highly nontrivial, and their proofs rely on some deep theories. For example, the Green-Tao Theorem relies on Szemerédi’s Theorem, which in turn uses the theory of random graphs.
There are many number theoretical problems that are still open. At the 1912 International Congress of Mathematicians, the German mathematician Edmund Landau listed the following four basic problems about primes, all of which remain unresolved:
1. Can every even integer greater than 2 be written as a sum of two primes? (Goldbach’s Conjecture, 1742);
2. Are there infinitely many pairs of primes p and q such that |p − q| = 2? (Twin Prime Conjecture, 1849);
3. Does there always exist a prime between two consecutive perfect squares?
(Legendre’s Conjecture, circa 1800);
4. Are there infinitely many primes of the form $n^2 + 1$? (see Bunyakovsky’s Conjecture, 1857).
It is widely believed that the answer to each of the questions above is “yes”. There is a lot of computational evidence towards each of them, and for some of them conjectural asymptotic formulas have been established. However, none of them has been proved.

Aside from being an interesting theoretical subject, number theory also has many practical applications. It is widely used in cryptographic protocols, such as RSA (Rivest-Shamir-Adleman, 1977), the Diffie-Hellman protocol (1976), and ECIES (Elliptic Curve Integrated Encryption Scheme). These protocols rely on certain fundamental properties of modular arithmetic and finite fields (RSA, D-H) and of elliptic curves defined over finite fields (ECIES). For example, consider the Discrete Logarithm Problem: given a prime p and integers c, m, one may ask whether there exists an integer d such that $c^d - m$ is divisible by p, and if so, what its value is. We may write this in the form of a congruence $c^d \equiv m \pmod{p}$. When p is extremely large (hundreds of digits) and c, m are chosen properly, this problem is widely believed to be intractable; that is, no modern computer can solve it in a reasonable amount of time (the computation would require billions of years). This property is used in many cryptosystems, including the Diffie-Hellman protocol and, in its elliptic curve version, ECIES; the security of RSA relies instead on the difficulty of integer factorization. Many cryptosystems, like RSA, could be broken by a sufficiently large quantum computer. The construction of protocols resistant to attacks by quantum computers is the subject of Post-Quantum Cryptography, and number theory plays a crucial role there (see Lattice-Based or Isogeny-Based Cryptography).

2 Divisibility. Factorization of Integers. The Fundamental Theorem of Arithmetic
Before we proceed, let us introduce some notation:
N = {1, 2, 3, ...} — the natural numbers;
Z = {0, ±1, ±2, ...} — the ring of integers;
Q = {m/n : m ∈ Z, n ∈ N} — the field of rational numbers;
R — the field of real numbers;
C = {a + bi : a, b ∈ R, i² = −1} — the field of complex numbers.
We call Z a ring because 0, 1 ∈ Z and a, b ∈ Z implies a ± b ∈ Z and a · b ∈ Z. In other words, Z is closed under addition, subtraction and multiplication. Note, however, that a, b ∈ Z with b ≠ 0 does not imply that a/b ∈ Z, so Z is not closed under division. A collection that is closed under addition, subtraction, multiplication and division by a non-zero element is called a field. According to this definition, every field is also a ring.

Exercise 2.1. Demonstrate the proper inclusions N ⊊ Z ⊊ Q ⊊ R ⊊ C. No proofs are required.

Definition 2.2. Let a, b ∈ Z. We say that a divides b, or that a is a factor of b, when b = ak for some k ∈ Z. We write a | b if this is the case, and a ∤ b otherwise.

Example 2.3. 3 | 12 because 12 = 3 · 4; 3 ∤ 13; −1 | 7 because 7 = (−1) · (−7); 0 ∤ 3.

Proposition 2.4.[1] Let a, b, c, x, y ∈ Z.
1. If a | b and b | c, then a | c;
2. If c | a and c | b, then c | ax ± by;
3. If c | a and c ∤ b, then c ∤ a ± b;
4. If a | b and b ≠ 0, then |a| ≤ |b|;
5. If a | b and b | a, then a = ±b;
6. If a | b, then ±a | ±b;
7. 1 | a for all a ∈ Z;
8. a | 0 for all a ∈ Z;
9. 0 | a if and only if a = 0.
Proof. Exercise.
[1] Proposition 1.2 in Frank Zorzitto, A Taste of Number Theory.

Definition 2.5. Let p ≥ 2 be a natural number. Then p is called prime if the only positive integers that divide p are 1 and p itself. It is called composite otherwise. We remark that 1 is neither prime nor composite. We will also use the above terminology only with respect to integers exceeding 1 (so according to this convention −3 is not prime and −6 is not composite).

Exercise 2.6. Which of the numbers −5, 1, 5, 6 are prime?

Theorem 2.7. For each integer n ≥ 2 there exists a prime p such that p | n.
Proof. We will prove this result using strong induction on n.
Base case.
For n = 2 we have 2 | n. Since 2 is prime, the theorem holds.
Induction hypothesis. Suppose that the theorem is true for n = 2, 3, ..., k.
Induction step. We will show that the theorem is true for n = k + 1. If n is prime, the result holds. Otherwise there exists a positive integer d such that d | n, d ≠ 1 and d ≠ n. By property 4 of Proposition 2.4 we have d ≤ n, and since d ≠ 1 and d ≠ n we conclude that 2 ≤ d ≤ n − 1 = k. Thus d satisfies the induction hypothesis, so there exists a prime p such that p | d. Since p | d and d | n, by property 1 of Proposition 2.4 we conclude that p | n.

Theorem 2.8. (Euclid’s Theorem, circa 300BC) There are infinitely many prime numbers.
Proof. Suppose not, so that there are only finitely many prime numbers, say $p_1, p_2, \ldots, p_k$. Consider the number $q = p_1 p_2 \cdots p_k + 1$. Since q ≥ 2, by Theorem 2.7 there exists some prime, say $p_i$, which divides q. On the other hand, since $p_i \mid p_1 p_2 \cdots p_k$ and $p_i \nmid 1$, by property 3 of Proposition 2.4 it is the case that $p_i \nmid q$. This is a contradiction. Hence there are infinitely many prime numbers.

There are many alternative proofs of this fact, suggested by Euler, Erdős, Furstenberg, and other mathematicians (see the Wikipedia page for Euclid’s Theorem). At the end of this section, we will see the proof given by Euler.

We will now turn our attention to the Fundamental Theorem of Arithmetic, which states that any integer greater than 1 can be written uniquely (up to reordering) as a product of primes.

Example 2.9. The number 60 can be written as $60 = 2^2 \cdot 3 \cdot 5$.

In order to prove the theorem, we will use the following tools:
1. Well-Ordering Principle. Let S be a non-empty subset of the natural numbers N. Then S contains a smallest element. To spell it out, there exists x ∈ S such that the inequality x ≤ y holds for every y ∈ S.
2. Generalized Euclid’s Lemma.[2] Let p be a prime number and $a_1, a_2, \ldots, a_k$ be integers.
If $p \mid a_1 a_2 \cdots a_k$, then there exists an index i, 1 ≤ i ≤ k, such that $p \mid a_i$.
[2] We will prove this result in Corollary 3.15 once we have introduced the notion of a greatest common divisor.

Theorem 2.10. (The Fundamental Theorem of Arithmetic) Any integer greater than 1 can be written uniquely (up to reordering) as a product of primes.
Proof. We will start by proving that every integer greater than 1 can be written as a product of primes. Let S denote the collection of all integers greater than 1 that cannot be written as a product of primes. Suppose that S is not empty. Since S ⊆ N and N is well-ordered, S contains a smallest element, say n. Clearly, n is not prime, since a prime is itself a product of primes (with a single factor). Thus there exists a positive integer d such that d | n, d ≠ 1 and d ≠ n. Hence both d and n/d are strictly less than n and greater than 1. Furthermore, at least one of d and n/d cannot be written as a product of primes, for otherwise n = d · (n/d) would be a product of primes. Thus d or n/d lies in S, which contradicts the fact that n is the smallest element of S. This means that S is empty, so every integer greater than 1 is a product of primes.
To prove uniqueness, consider two prime power decompositions
$n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k} = q_1^{b_1} q_2^{b_2} \cdots q_\ell^{b_\ell}$,
where all exponents $a_i$ and $b_j$ are positive. We will show that they are in fact the same.[3] Without loss of generality, we may assume that $p_1 < p_2 < \ldots < p_k$ and $q_1 < q_2 < \ldots < q_\ell$. Pick some index i such that 1 ≤ i ≤ k. Since $p_i \mid n = q_1^{b_1} q_2^{b_2} \cdots q_\ell^{b_\ell}$, by Generalized Euclid’s Lemma there exists some index j(i), $1 \le j(i) \le \ell$, such that $p_i \mid q_{j(i)}^{b_{j(i)}}$. Now apply Generalized Euclid’s Lemma once again to deduce that $p_i \mid q_{j(i)}$. Since $q_{j(i)}$ is prime, its only positive divisors are 1 and $q_{j(i)}$, and since $p_i \ge 2$, this means that $p_i = q_{j(i)}$. Since $p_1 < p_2 < \ldots < p_k$, we see that $j(i_1) \ne j(i_2)$ whenever $i_1 \ne i_2$.
From the above, for each i with 1 ≤ i ≤ k we have found an index j(i) with $1 \le j(i) \le \ell$, and distinct values of i give distinct values of j(i). Hence there are at least as many j’s as there are i’s, so $k \le \ell$. Applying Generalized Euclid’s Lemma once again, but with the roles of $p_i$ and $q_j$ reversed, we observe that for each j with $1 \le j \le \ell$ there is an index i(j) with $1 \le i(j) \le k$ and $q_j = p_{i(j)}$, where distinct values of j give distinct values of i(j), so $\ell \le k$. Since $k \le \ell$ and $\ell \le k$, it is the case that $k = \ell$. Moreover, since both sequences of primes are increasing, $i_1 < i_2$ implies $q_{j(i_1)} = p_{i_1} < p_{i_2} = q_{j(i_2)}$, so $j(i_1) < j(i_2)$; an increasing map from {1, 2, ..., k} to itself must be the identity, hence $p_i = q_i$ for all i. Finally, fix i and cancel $p_i^{\min\{a_i, b_i\}}$ from both decompositions of n. If $a_i \ne b_i$, then exactly one of the two resulting products would still contain the prime $p_i$, so $p_i$ would divide one side but, by Generalized Euclid’s Lemma, not the other, which is impossible. Hence $a_i = b_i$ for all i.
[3] Note that this is not a proof by contradiction, for we do not assume that these prime power decompositions are distinct.

The fact that the prime factorization is unique was used by Euler to provide an alternative proof of Euclid’s Theorem, which we restate here.

Theorem 2.8. (Euclid’s Theorem, circa 300BC) There are infinitely many prime numbers.
Proof. (Euler’s proof, 1700s) Consider the harmonic series
$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \ldots$
It is well known that this series diverges. Now let p > 1 and recall the formula for the infinite geometric series:
$\sum_{k=0}^{\infty} \frac{1}{p^k} = 1 + \frac{1}{p} + \frac{1}{p^2} + \ldots = \frac{1}{1 - 1/p}$.
Using this formula, we observe that
$\prod_{p \text{ prime}} \frac{1}{1 - 1/p} = \prod_{p \text{ prime}} \left(1 + \frac{1}{p} + \frac{1}{p^2} + \ldots\right) = \sum_{n=1}^{\infty} \frac{1}{n}$,
where the last equality holds by the Fundamental Theorem of Arithmetic: when the product is expanded, each term 1/n arises exactly once, from the unique prime factorization of n. If there were only finitely many primes, the product on the left-hand side would be finite, which contradicts the fact that the series on the right-hand side diverges.

3 Greatest Common Divisor. Least Common Multiple. Bézout’s Lemma.
When divisibility fails, we speak of quotients and remainders.

Theorem 3.1. (The Remainder Theorem)[4] Let a, b be integers, a > 0. Then there exist unique integers q and r such that b = aq + r, where 0 ≤ r < a.
Proof.
Recall that every real number x “sits” between two consecutive integers; that is, there exists a unique integer q such that q ≤ x < q + 1. Now set x = b/a. Then from the above inequality it follows that aq ≤ b < aq + a. But then 0 ≤ b − aq < a. If we now put r = b − aq, then b = aq + r and r satisfies 0 ≤ r < a. Conversely, if b = aq' + r' with 0 ≤ r' < a, then dividing by a gives q' ≤ b/a < q' + 1, so q' = q by the uniqueness of q, and then r' = b − aq' = r. Hence q and r are unique, and the result follows.
[4] Proposition 1.3 in Frank Zorzitto, A Taste of Number Theory.

Definition 3.2. Let a, b be integers, a > 0. Write b = aq + r, where 0 ≤ r < a. Then a is called the modulus, b is called the dividend, q is called the quotient and r is called the remainder.

Note that for a > 0 the expression a | b simply means that in b = aq + r the remainder r is equal to zero. Given a and b, one can easily compute q and r using a calculator. First, compute b/a; the greatest integer not exceeding b/a (that is, the floor $\lfloor b/a \rfloor$) is precisely q. For negative b this is not the same as discarding the digits after the decimal point: for example, b = −7 and a = 3 give q = −3 and r = 2. Then compute r with the formula r = b − aq.

Definition 3.3. Let a and b be integers. An integer d such that d | a and d | b is called a common divisor of a and b. When at least one of a and b is not zero, the largest integer with such a property is called the greatest common divisor of a and b and is denoted by gcd(a, b). When a = b = 0, we define gcd(a, b) := 0.

The greatest common divisor of a and b possesses many interesting properties. Let us demonstrate several of them.

Proposition 3.4. Let $a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and $b = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$, where $p_1, p_2, \ldots, p_k$ are distinct prime numbers and $e_1, e_2, \ldots, e_k, f_1, f_2, \ldots, f_k$ are integers ≥ 0. Then
$\gcd(a, b) = p_1^{\min\{e_1, f_1\}} p_2^{\min\{e_2, f_2\}} \cdots p_k^{\min\{e_k, f_k\}}$. (1)
Further, any common divisor c of a and b must also divide gcd(a, b).
Proof. Note that
$g = p_1^{\min\{e_1, f_1\}} p_2^{\min\{e_2, f_2\}} \cdots p_k^{\min\{e_k, f_k\}}$
divides both a and b. Also, by the Fundamental Theorem of Arithmetic, every positive common divisor c of a and b involves only the primes $p_1, p_2, \ldots, p_k$, so it can be written as $c = p_1^{g_1} p_2^{g_2} \cdots p_k^{g_k}$ with $g_i \ge 0$; if $g_i > \min\{e_i, f_i\}$ for some i, then c fails to divide either a or b.
Hence any positive common divisor $c = p_1^{g_1} p_2^{g_2} \cdots p_k^{g_k}$ of a and b satisfies $g_i \le \min\{e_i, f_i\}$ for all i, 1 ≤ i ≤ k, and therefore c divides g. In particular, c ≤ g, so g is indeed the greatest common divisor of a and b.

Note that Proposition 3.4 suggests one method for computing gcd(a, b). First, one has to factor a and b by writing them in the form
$a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ and $b = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$,
where the exponents $e_i$ and $f_j$ are allowed to be 0 (convince yourself that any two positive integers can be written in this form). Then one can simply apply formula (1). This approach works fine when the numbers are small and easily factorable, but unfortunately no efficient factorization algorithm is known for large numbers on classical computers (an efficient quantum algorithm exists, see Shor’s Algorithm). In fact, the security of the RSA public key cryptosystem is based on the difficulty of factorization.

Example 3.5. Let us compute the greatest common divisor of 440 and 300. The prime factorizations are $440 = 2^3 \cdot 5 \cdot 11$ and $300 = 2^2 \cdot 3 \cdot 5^2$. We see that $440 = 2^3 \cdot 3^0 \cdot 5^1 \cdot 11^1$ and $300 = 2^2 \cdot 3^1 \cdot 5^2 \cdot 11^0$. Thus
$\gcd(440, 300) = 2^{\min\{3,2\}} \cdot 3^{\min\{0,1\}} \cdot 5^{\min\{1,2\}} \cdot 11^{\min\{1,0\}} = 2^2 \cdot 3^0 \cdot 5^1 \cdot 11^0 = 20$.

Exercise 3.6. Let a and b be integers. An integer ℓ is called a common multiple of a and b if it satisfies a | ℓ and b | ℓ. When a and b are both non-zero, the smallest positive integer with such a property is called the least common multiple of a and b and is denoted by lcm(a, b); when a = 0 or b = 0, we define lcm(a, b) := 0. Given the statement as in Proposition 3.4, prove that
$\mathrm{lcm}(a, b) = p_1^{\max\{e_1, f_1\}} p_2^{\max\{e_2, f_2\}} \cdots p_k^{\max\{e_k, f_k\}}$ (2)
and that every common multiple c of a and b is divisible by lcm(a, b). That is, if a | c and b | c, then lcm(a, b) | c.

Exercise 3.7. Let a and b be non-negative integers. Prove that
ab = gcd(a, b) lcm(a, b). (3)

Exercise 3.8. Compute lcm(440, 300) using formulas (2) and (3).
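Formulas (1) and (2) translate directly into a short program. The Python sketch below is only an illustration: the function names `factorize` and `gcd_lcm_by_factorization` are our own, and factoring is done by naive trial division, which is practical only for small inputs, for exactly the reason discussed above. It reads off the exponent of each prime (a prime missing from one factorization gets exponent 0, as in Proposition 3.4) and takes minima for the gcd and maxima for the lcm; it may also be used to check an answer to Exercise 3.8.

```python
def factorize(n):
    """Prime factorization of an integer n >= 1 as a dict {prime: exponent}.

    Uses trial division, so it is only practical for small n.
    """
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # n has no prime factor below d and d * d > n, so n is prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def gcd_lcm_by_factorization(a, b):
    """Return (gcd(a, b), lcm(a, b)) for positive integers a, b via formulas (1) and (2)."""
    fa, fb = factorize(a), factorize(b)
    g, l = 1, 1
    for p in set(fa) | set(fb):
        # A prime missing from one factorization has exponent 0 there.
        e, f = fa.get(p, 0), fb.get(p, 0)
        g *= p ** min(e, f)
        l *= p ** max(e, f)
    return g, l


g, l = gcd_lcm_by_factorization(440, 300)
print(g)                    # 20, as in Example 3.5
print(440 * 300 == g * l)   # True, in agreement with formula (3)
```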
We will now address the following question: which integers c can be written in the form ax + by, where x and y are integers? In mathematical language, the identity c = ax + by means that c is an integer (linear) combination of a and b. Let us play around a little bit with the quantity ax + by. Clearly, a can be written in this form, since a = a · 1 + b · 0. The same applies to b, since b = a · 0 + b · 1. The number 0 can always be represented in this form, since 0 = a · 0 + b · 0. Note that, when at least one of a and b is not zero, ax + by represents at least one positive integer, because a · a + b · b > 0. It turns out that the least positive integer d represented by ax + by is precisely the greatest common divisor of a and b.

Example 3.9. Consider a = 7 and b = 15. Then the equation 7x + 15y = 1 has a solution (x, y) = (−2, 1). In fact, it has infinitely many solutions, as every pair of the form (x, y) = (−2 + 15n, 1 − 7n) with n ∈ Z is a solution, too. However, when a = 7 and b = 14, the equation 7x + 14y = 1 has no solutions, as the left-hand side is always divisible by 7, while the right-hand side is not. So the number 1 cannot be represented as an integer linear combination of 7 and 14. Hence the question: which numbers can be represented in this form?

Theorem 3.10. (Bézout’s lemma)[5] Let a, b be integers such that a ≠ 0 or b ≠ 0. If d is the least positive integer combination of a and b, then d divides every integer combination of a and b. Furthermore, d = gcd(a, b).
[5] Proposition 1.4 in Frank Zorzitto, A Taste of Number Theory.
Proof. Write d = ax + by for some x, y ∈ Z. Now consider some integer combination c = as + bt, where s, t ∈ Z. We want to show that d | c. By the Remainder Theorem, c = dq + r for some q, r ∈ Z, where 0 ≤ r < d. Thus
$0 \le r = c - dq = as + bt - (ax + by)q = a(s - xq) + b(t - yq) < d$.
We see that r is an integer combination of a and b which is non-negative and less than d.
Because d is the least positive integer combination of a and b, the only option is that r = 0. Hence d | c. In particular, d | a and d | b, because a and b are themselves integer combinations of a and b.
We will now show that d = gcd(a, b). On one hand, we know that d | a and d | b, so d is a common divisor of a and b. By Proposition 3.4, d must divide the greatest common divisor of a and b, i.e. d | gcd(a, b). On the other hand, writing $a = \gcd(a, b)\, a_1$ and $b = \gcd(a, b)\, b_1$ with $a_1, b_1 \in \mathbb{Z}$, we get $d = ax + by = \gcd(a, b)(a_1 x + b_1 y)$, so gcd(a, b) | d. Since d | gcd(a, b), gcd(a, b) | d and both numbers are positive, property 5 of Proposition 2.4 gives d = gcd(a, b).

With the help of Theorem 3.10 we can answer the question of which numbers can be represented in the form ax + by. Since gcd(a, b) = ax + by for some x, y ∈ Z, any integer c divisible by gcd(a, b) can be written in this form: for some k ∈ Z we have
c = k · gcd(a, b) = k(ax + by) = a(kx) + b(ky).
On the other hand, if gcd(a, b) ∤ c, then c cannot be written as an integer combination of a and b, because by Theorem 3.10 every such combination is divisible by gcd(a, b).

We will now use Bézout’s lemma to establish a few more properties of prime numbers. In particular, we will prove Euclid’s lemma, which we already saw in Section 2.

Definition 3.11. Let a and b be integers. We say that a and b are coprime if gcd(a, b) = 1.

Proposition 3.12. Let a, b, c be integers with a, b coprime. If a | c and b | c, then ab | c.
Proof. Since a and b are coprime, by Bézout’s lemma there exist integers x and y such that ax + by = 1. Thus a(cx) + b(cy) = c. After dividing both sides of the above equality by ab we obtain
$\frac{c}{b} \cdot x + \frac{c}{a} \cdot y = \frac{c}{ab}$.
Since a | c and b | c, the left-hand side of the above equality is an integer. Hence so is the right-hand side, so c/(ab) is an integer.

Proposition 3.13. Let a, b, c be integers with a, b coprime. If a | bc, then a | c.
Proof.
Since a and b are coprime, by Bézout’s lemma there exist integers x and y such that ax + by = 1. Thus a(cx) + b(cy) = c. After dividing both sides of the above equality by a we obtain
$c \cdot x + \frac{bc}{a} \cdot y = \frac{c}{a}$.
Since a | bc, the left-hand side of the above equality is an integer. Hence so is the right-hand side, so c/a is an integer.

Proposition 3.14. (Euclid’s lemma) If p is prime and p | ab for some integers a, b, then p | a or p | b.[6]
[6] The proof is from Frank Zorzitto’s “A Taste of Number Theory” (see Proposition 2.4 on page 31).
Proof. Suppose that p ∤ a, and let d = gcd(p, a). Since d is a positive divisor of p, the definition of primes forces d = 1 or d = p. Since d | a and p ∤ a, we have d ≠ p, so d = 1 and p and a are coprime. From Proposition 3.13 it follows that p | b.

Corollary 3.15. (Generalized Euclid’s lemma) Let p be a prime number and $a_1, a_2, \ldots, a_k$ be integers. If $p \mid a_1 a_2 \cdots a_k$, then there exists an index i, 1 ≤ i ≤ k, such that $p \mid a_i$.
Proof. The result clearly holds for k = 1, so assume that k ≥ 2. If $p \mid a_1$, we are done. If not, then p and $a_1$ are coprime (as in the proof of Proposition 3.14), so by Proposition 3.13 it must be the case that $p \mid a_2 a_3 \cdots a_k$. If $p \mid a_2$, we are done. If not, then p and $a_2$ are coprime, so by Proposition 3.13 it must be the case that $p \mid a_3 a_4 \cdots a_k$. Proceeding in the same fashion, we eventually reach $p \mid a_{k-1} a_k$, where we may apply Euclid’s lemma to draw the desired conclusion.

Exercise 3.16. Show that the coprimality condition cannot be removed from either Proposition 3.12 or Proposition 3.13.

4 Diophantine Equations. The Linear Diophantine Equation ax + by = c
An equation is called Diophantine if we are only concerned with its integer solutions. Any equation can be converted into its Diophantine form. For example, instead of looking at $x^2 + y^2 = 1$ for $(x, y) \in \mathbb{R}^2$ we may restrict our attention to $(x, y) \in \mathbb{Z}^2$. Note that in the former case there are infinitely many solutions (in fact, uncountably many).
These are all the points lying on the circle centered at the origin with radius 1. However, if we look at $(x, y) \in \mathbb{Z}^2$, then there are only four solutions, namely (±1, 0) and (0, ±1). (Do you see why?) Sometimes, converting an equation into its Diophantine form is not very interesting. This is the case for the equation $x^2 + y^2 = 1$. Another example is the equation $y = x\sqrt{2}$, which has no integer solutions aside from (0, 0) due to the irrationality of √2. But sometimes understanding integer solutions can be difficult, even extremely difficult. One reason is that, when considering an equation over the real numbers R or (even better!) over the complex numbers C, there are many analytical tools that we can use. Say, if we are looking at an equation f(x) = 0 for x ∈ R, we might use the fact that f(x) is continuous, or differentiable, or maybe even smooth. Another reason why it might be easier to analyze equations not only over R or C, but also over Q, is that all of them are fields. Quite often we can say a lot about a Diophantine equation by “lifting” it and considering it, for example, over Q: if there are only finitely many solutions over Q, then there are only finitely many solutions over Z. Such a technique applies to hyperelliptic equations, like $y^2 = x^5 + 2$ (see Faltings’ Theorem). However, sometimes there are infinitely many solutions over Q, but only finitely many (or even none!) over Z. The fact that Q is a field can be used to prove that there are infinitely many rational solutions to the elliptic equations $y^2 = x^3 + 46$ and $y^2 = x^3 - 2$. Note that the first equation has a solution (−7/4, 51/8), while the second equation has a solution (129/100, 383/1000). Unlike Q, R or C, the ring of integers Z is not closed under division by a non-zero element, so we need to use different techniques to study it. For example, the equation $y^2 = x^3 + 46$ has no solutions in integers, while the equation $y^2 = x^3 - 2$ has exactly two solutions (3, ±5).
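Claims of this kind can be checked with exact rational arithmetic. The Python sketch below (the helper names `on_curve` and `integer_points` are our own, chosen for illustration) uses the standard `fractions.Fraction` type to confirm that the two rational points quoted above lie on their curves, and searches for integer points of $y^2 = x^3 + k$ with |x| up to a chosen bound. Such a search can only find solutions; it cannot prove that no others exist, which is precisely why the integer case calls for different techniques.

```python
from fractions import Fraction
from math import isqrt


def on_curve(x, y, k):
    """Check whether (x, y) satisfies y^2 = x^3 + k, using exact arithmetic."""
    return y * y == x ** 3 + k


def integer_points(k, bound):
    """Integer solutions of y^2 = x^3 + k with |x| <= bound (a finite search, not a proof)."""
    points = []
    for x in range(-bound, bound + 1):
        rhs = x ** 3 + k
        if rhs < 0:
            continue  # a negative number is not a square
        y = isqrt(rhs)
        if y * y == rhs:
            points.extend([(x, -y), (x, y)] if y else [(x, 0)])
    return points


print(on_curve(Fraction(-7, 4), Fraction(51, 8), 46))         # True
print(on_curve(Fraction(129, 100), Fraction(383, 1000), -2))  # True
print(integer_points(-2, 1000))                               # [(3, -5), (3, 5)]
```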
Example 4.1. Let a, b, c, d, n be fixed integers, n ≥ 3, and x, y, z be integer variables. Here are several examples of Diophantine equations:
ax + by = c — Linear Diophantine equation in two variables;
$x^2 + y^2 = z^2$ — Pythagorean equation;
$x^2 - dy^2 = \pm 1$ — Pell equation;
$y^2 = x^3 + ax + b$ — Weierstrass equation;
$ax^n + by^n = c$ — Thue equation;
$ax^n + by^n = cz^n$ — Fermat type equation;
$x^2 + 7 = 2^y$ — Ramanujan-Nagell equation.

When analyzing equations, we would like to answer the following questions:
1. Do solutions exist?
2. If solutions exist, how many of them are there? (finitely many, countably many, uncountably many)
3. What are the solutions?
4. Are there algorithms which can generate solutions?
We address the same questions when analyzing Diophantine equations. Of course, in this case the number of solutions is at most countable.

We will now turn our attention to the linear Diophantine equation in two variables ax + by = c. Here a, b, c are fixed integers and x, y are integer variables. We will fully classify the solutions to this equation. The question of existence of a solution was fully resolved at the end of Section 3, where we established that solutions exist if and only if gcd(a, b) | c. Thus the only things left for us to do are to find all the solutions when they exist, and to come up with a procedure for computing them. As the following proposition shows, knowing one solution to ax + by = c we can deduce all of the solutions.

Proposition 4.2. Let a, b, c be integers, with a and b not both zero. Let (x, y) be a pair of integers such that ax + by = c. Then any pair of integers (x', y') such that c = ax' + by' must be of the form
$(x', y') = \left(x - \frac{b}{\gcd(a, b)}\, n,\ y + \frac{a}{\gcd(a, b)}\, n\right)$,
where n ranges over the integers.
Proof. Suppose that (x, y) and (x', y') are integer pairs such that c = ax + by = ax' + by'. Then a(x − x') = b(y' − y). Dividing both sides by gcd(a, b), we get
$\frac{a}{\gcd(a, b)}(x - x') = \frac{b}{\gcd(a, b)}(y' - y)$.
The integers $\frac{a}{\gcd(a, b)}$ and $\frac{b}{\gcd(a, b)}$ are coprime: by Bézout’s lemma gcd(a, b) = as + bt for some s, t ∈ Z, so $\frac{a}{\gcd(a, b)} s + \frac{b}{\gcd(a, b)} t = 1$, and any common divisor of these two integers divides 1. Hence Proposition 3.13 gives $\frac{a}{\gcd(a, b)} \mid (y' - y)$. This means that
$y' = y + n\frac{a}{\gcd(a, b)}$
for some n ∈ Z. Substituting this relation into the equation a(x − x') = b(y' − y), we see that
$a(x - x') = n\frac{ab}{\gcd(a, b)}$,
which means that
$x' = x - n\frac{b}{\gcd(a, b)}$.
Conversely, every such pair is a solution, since $a\left(x - n\frac{b}{\gcd(a, b)}\right) + b\left(y + n\frac{a}{\gcd(a, b)}\right) = ax + by = c$.

Thus from one solution to ax + by = c (if it exists) we may produce all solutions once we compute gcd(a, b). In order to determine one solution to this equation, we use the Extended Euclidean Algorithm. This algorithm computes a pair of integers (x, y) such that ax + by = gcd(a, b). This allows us to produce a solution to ax + by = c: if a solution exists, then gcd(a, b) | c, so c = k gcd(a, b) for some integer k, and
c = k gcd(a, b) = k(ax + by) = a(kx) + b(ky).
We may then use Proposition 4.2 to compute all solutions to the linear Diophantine equation ax + by = c. We will learn about the Extended Euclidean Algorithm in the following section.

Exercise 4.3. Let $a_1, a_2, \ldots, a_k$ be integers, at least one of which is not 0. The largest integer d such that $d \mid a_i$ for all i, 1 ≤ i ≤ k, is called the greatest common divisor of $a_1, a_2, \ldots, a_k$ and is denoted by $\gcd(a_1, a_2, \ldots, a_k)$. When $a_1 = a_2 = \ldots = a_k = 0$, we define $\gcd(a_1, a_2, \ldots, a_k) := 0$. Determine formulas for $\gcd(a_1, a_2, \ldots, a_k)$ and $\mathrm{lcm}(a_1, a_2, \ldots, a_k)$ that are analogous to (1) and (2). Does a formula similar to (3) hold? Explain why or why not.

Exercise 4.4. Let $a_1, a_2, \ldots, a_k$ be integers. We say that c ∈ Z can be represented as an integer linear combination of $a_1, a_2, \ldots, a_k$ if there exist $x_1, x_2, \ldots, x_k \in \mathbb{Z}$ such that $c = a_1 x_1 + a_2 x_2 + \ldots + a_k x_k$. Given integers $a_1, a_2, \ldots, a_k$, which integers can be written as an integer combination of $a_1, a_2, \ldots, a_k$?

5 Euclidean Algorithm. Extended Euclidean Algorithm
Let a, b be integers at least one of which is not 0. In Section 3, we found one formula for the computation of gcd(a, b), namely (1).
Though being useful, it is not very efficient, as the algorithm for fast integer factorization is 18 unknown.7 However, there exists a much more efficient algorithm to compute gcd(a, b), developed by Euclid in his fundamental work Elements. It is called the Euclidean Algorithm. We begin our explorations by first showing yet another interesting property of the greatest common divisor. In particular, if a, b are integers at least one of which is not zero, then gcd(a, b) does not change if we replace b with b + ak, where k is an arbitrary integer. Proposition 5.1. Suppose a, b are two integers. Then for any integer k it is the case that gcd(a, b) = gcd(a, b + ak). Proof. Let d1 = gcd(a, b) and d2 = gcd(a, b + ak). We will show that d1 | d2 and d2 | d1 . Since d1 | a and d1 | b, it is the case that d1 | (b + ak). Since d1 is a common divisor of a and b + ak, by Proposition 3.4 it must divide their greatest common divisor d2 . Thus d1 | d2 . Now observe that d2 | a and d2 | b + ak. Thus a = d2 r1 and b + ak = d2 r2 for some r1 , r2 ∈ Z. But then b = d2 r2 − ak = d2 r2 − d2 r1 k = d2 (r2 − r1 k). Hence d2 | b, which means that d2 is a common divisor of a and b. By Proposition 3.4 it must divide their greatest common divisor d1 . Thus d2 | d1 . Since d1 | d2 and d2 | d1 , we conclude that d1 = d2 . We will now describe the Euclidean Algorithm. Let a, b be positive integers such that ab 6= 0, since when ab = 0 it is easy to compute gcd(a, b). Without loss of generality, we suppose that a > b (if a < b we may interchange a and b, and if a = b then gcd(a, b) = a). We define the finite sequence of integers a1 , a2 , . . . as follows. Set r1 = a, r2 = b, and write r1 = q1 r2 + r3 . Note that the remainder r3 satisfies 0 ≤ r3 < r2 = b. Then compute r2 = q2 r3 + r4 , r3 = q3 r4 + r5 , 7 By “fast” we mean “polynomial time”. 19 and so on. Since the sequence of integers r1 > r2 > . . . 
consists of non-negative integers and is strictly decreasing, it reaches 0 after finitely many steps. Let $r_n$ be its smallest positive term, that is, the last nonzero remainder. We will show that $r_n$ is precisely $\gcd(a, b)$.

Why does this process compute $\gcd(a, b)$? By Proposition 5.1,
$$\gcd(r_1, r_2) = \gcd(r_1 - q_1r_2, r_2) = \gcd(r_3, r_2).$$
Let us compute one more step:
$$\gcd(r_3, r_2) = \gcd(r_3, r_2 - q_2r_3) = \gcd(r_3, r_4).$$
Proceeding in the same fashion, we see that
$$\gcd(a, b) = \gcd(r_1, r_2) = \gcd(r_2, r_3) = \cdots = \gcd(r_i, r_{i+1})$$
for all $i$ such that $1 \le i \le n - 1$. The calculations get easier with each step, and in the end we obtain
$$\gcd(a, b) = \gcd(r_1, r_2) = \cdots = \gcd(r_{n-1}, r_n) = \gcd(r_n, 0) = r_n.$$

Theorem 5.2. Let $a, b$ be positive integers with $a > b$, and let $r_1 > r_2 > \cdots$ be the finite sequence defined above. Let $r_n$ be the smallest positive integer in this sequence. Then $r_n = \gcd(a, b)$.

Proof. Recall that $d = \gcd(a, b) = \gcd(r_i, r_{i+1})$ for $i = 1, 2, \ldots, n - 1$. The remainder in the division $r_{n-1} = q_{n-1}r_n + r_{n+1}$ satisfies $0 \le r_{n+1} < r_n$. Since $r_n$ is the smallest positive integer in this strictly decreasing sequence, the only possibility is $r_{n+1} = 0$, which means that $r_n$ divides $r_{n-1}$. But then
$$r_n = \gcd(r_{n-1}, r_n) = \gcd(r_{n-2}, r_{n-1}) = \cdots = \gcd(r_1, r_2) = \gcd(a, b).$$

Let us consider several examples.

Example 5.3. Let us compute $\gcd(440, 300)$ using the Euclidean Algorithm. We have
$$\begin{aligned} 440 &= 1 \cdot 300 + 140\\ 300 &= 2 \cdot 140 + 20\\ 140 &= 7 \cdot 20 + 0. \end{aligned}$$
Thus $\gcd(440, 300) = 20$.

Example 5.4. Let us compute $\gcd(233, 144)$ using the Euclidean Algorithm. We have
$$\begin{aligned} 233 &= 1 \cdot 144 + 89\\ 144 &= 1 \cdot 89 + 55\\ 89 &= 1 \cdot 55 + 34\\ 55 &= 1 \cdot 34 + 21\\ 34 &= 1 \cdot 21 + 13\\ 21 &= 1 \cdot 13 + 8\\ 13 &= 1 \cdot 8 + 5\\ 8 &= 1 \cdot 5 + 3\\ 5 &= 1 \cdot 3 + 2\\ 3 &= 1 \cdot 2 + 1\\ 2 &= 2 \cdot 1 + 0. \end{aligned}$$
Thus $\gcd(233, 144) = 1$.

Note that both numbers in Example 5.4 are smaller than those in Example 5.3.
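The two computations above can be reproduced with a short Python sketch. It is our own illustration, not part of the original notes, and the function name `euclid_gcd` is hypothetical. It counts one step per division, including the final division with remainder 0, which is the convention used in Examples 5.3 and 5.4.

```python
def euclid_gcd(a: int, b: int) -> tuple[int, int]:
    """Return (gcd(a, b), number of division steps) for positive integers a > b."""
    steps = 0
    while b != 0:
        r = a % b      # remainder in a = q*b + r, with 0 <= r < b
        steps += 1     # each division counts as one step, including the last one
        a, b = b, r    # gcd(a, b) = gcd(b, r) by Proposition 5.1
    return a, steps


print(euclid_gcd(440, 300))  # Example 5.3: (20, 3)
print(euclid_gcd(233, 144))  # Example 5.4: (1, 11)
```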
Nevertheless, in Example 5.4 the Euclidean Algorithm terminated in 11 steps, while in Example 5.3 it terminated in only 3 steps. This is because in Example 5.4 we chose our integers to be the 12th and the 11th Fibonacci numbers, $F_{12} = 233$ and $F_{11} = 144$. Recall that the Fibonacci numbers are defined recursively by $F_1 = 1$, $F_2 = 2$ and $F_n = F_{n-1} + F_{n-2}$ for $n \ge 3$. It turns out that the slowest performance of the Euclidean Algorithm, relative to the size of its input, is achieved on consecutive Fibonacci numbers. Even so, the algorithm runs in polynomial time. In 1844, Gabriel Lamé proved that the number of steps required by the Euclidean Algorithm is at most five times the number of decimal digits of $\min\{a, b\}$.

Exercise 5.5. Let $F_1 = 1$, $F_2 = 2$, and for an integer $n \ge 3$ define $F_n = F_{n-1} + F_{n-2}$. The number $F_n$ is called the $n$-th Fibonacci number. Prove that computing $\gcd(F_{n+1}, F_n)$ with the Euclidean Algorithm requires $n$ steps.

So far, we can compute $\gcd(a, b)$, but we do not yet know how to produce integer solutions $(x, y)$ to the Diophantine equation $ax + by = \gcd(a, b)$. This can be achieved with the help of the Extended Euclidean Algorithm. It is essentially the same as the Euclidean Algorithm, but along with the sequence $r_1, r_2, \ldots$ we also keep track of two additional sequences $s_1, s_2, \ldots$ and $t_1, t_2, \ldots$. The algorithm is as follows. Set
$$r_1 = a,\ r_2 = b; \qquad s_1 = 1,\ s_2 = 0; \qquad t_1 = 0,\ t_2 = 1.$$
For $i \ge 2$, as long as $r_i \neq 0$, let $q_{i-1}$ be the quotient of $r_{i-1}$ divided by $r_i$, and compute
$$r_{i+1} = r_{i-1} - q_{i-1}r_i; \qquad s_{i+1} = s_{i-1} - q_{i-1}s_i; \qquad t_{i+1} = t_{i-1} - q_{i-1}t_i.$$
Note that, of the three formulas above, the Euclidean Algorithm computes only the first one. We claim that, if $r_n$ is the last nonzero term of the sequence $r_1, r_2, \ldots$, then the integers $s_n$ and $t_n$ satisfy $as_n + bt_n = \gcd(a, b)$.

Theorem 5.6. Let $a, b$ be positive integers with $a > b$. Let $r_1 > r_2 > \cdots > r_n > 0$, $s_1, s_2, \ldots, s_n$ and $t_1, t_2, \ldots, t_n$ be the sequences defined above. Then $as_n + bt_n = \gcd(a, b)$.

Proof.
We claim that $as_i + bt_i = r_i$ for all $i = 1, 2, \ldots, n$. Since Theorem 5.2 asserts that $r_n = \gcd(a, b)$, this would imply the result. We prove the claim by induction on $i$.
Base case. According to our setup, $r_1 = a$, $r_2 = b$, $s_2 = t_1 = 0$ and $s_1 = t_2 = 1$. Thus $as_1 + bt_1 = r_1$ and $as_2 + bt_2 = r_2$, so the claim holds for $i = 1, 2$.
Induction hypothesis. Assume that $as_i + bt_i = r_i$ for $i = k - 1, k$, where $2 \le k \le n - 1$.
Induction step. We show that the claim holds for $i = k + 1$:
$$\begin{aligned} r_{k+1} &= r_{k-1} - q_{k-1}r_k = (as_{k-1} + bt_{k-1}) - q_{k-1}(as_k + bt_k)\\ &= a(s_{k-1} - q_{k-1}s_k) + b(t_{k-1} - q_{k-1}t_k) = as_{k+1} + bt_{k+1}. \end{aligned}$$
We conclude that $as_i + bt_i = r_i$ for all $i = 1, 2, \ldots, n$, as claimed.

Using the Extended Euclidean Algorithm, we can finally solve the Diophantine equation $ax + by = c$.

Example 5.7. Let us determine all solutions to the Diophantine equation $440x + 300y = 80$ using the Extended Euclidean Algorithm. Set $r_1 = 440$, $r_2 = 300$; $s_1 = 1$, $s_2 = 0$; $t_1 = 0$, $t_2 = 1$.
Step 1. $440 = 1 \cdot 300 + 140$, so $q_1 = 1$ and $r_3 = 140$. Thus $s_3 = s_1 - q_1s_2 = 1 - 1 \cdot 0 = 1$ and $t_3 = t_1 - q_1t_2 = 0 - 1 \cdot 1 = -1$.
Step 2. $300 = 2 \cdot 140 + 20$, so $q_2 = 2$ and $r_4 = 20$. Thus $s_4 = s_2 - q_2s_3 = 0 - 2 \cdot 1 = -2$ and $t_4 = t_2 - q_2t_3 = 1 - 2 \cdot (-1) = 3$.
Step 3. Since $20 \mid 140$, the algorithm terminates.
We conclude that $440 \cdot (-2) + 300 \cdot 3 = 20$. After multiplying both sides of this equality by 4, we obtain a solution $(x, y) = (-8, 12)$ to the Diophantine equation $440x + 300y = 80$. By Proposition 4.2 with $a = 440$ and $b = 300$, all solutions to this Diophantine equation are of the form
$$\left(x - \frac{b}{\gcd(a, b)}\, n,\ y + \frac{a}{\gcd(a, b)}\, n\right) = (-8 - 15n,\ 12 + 22n),$$
where $n$ ranges over the integers.

Exercise 5.8. (a) Let $a, b, c$ be integers such that $a \neq 0$ or $b \neq 0$, and $\gcd(a, b) \mid c$. Consider the Diophantine equation $ax + by = c$.
Prove that there exists a unique solution $(x, y)$ such that $0 \le x < b/\gcd(a, b)$, and a unique solution $(x_0, y_0)$ such that $0 \le y_0 < a/\gcd(a, b)$;
(b) For $(x, y) \in \mathbb{R}^2$, let $\|(x, y)\| := \sqrt{x^2 + y^2}$ denote the Euclidean norm. Let $a, b, c$ be integers such that $c \neq 0$ and $\gcd(a, b) = 1$, and consider the linear Diophantine equation $ax + by = c$. Prove that the solution $(x, y) \in \mathbb{Z}^2$ of this equation with the smallest value of $\|(x, y)\|$ satisfies
$$\frac{|c|}{\|(a, b)\|} \le \|(x, y)\| \le \frac{|c|}{\|(a, b)\|} + \frac{\|(a, b)\|}{2}.$$

6 Congruences. The Double-and-Add Algorithm

Throughout this section, we fix a positive integer $n$, which we call the modulus.

Definition 6.1. We say that integers $a$ and $b$ are congruent modulo $n$ if $n$ divides $a - b$. We denote this by $a \equiv b \pmod{n}$.

To say that $a$ and $b$ are congruent modulo $n$ is the same as to say that their remainders after division by $n$ are the same. That is, if $a = q_1n + r_1$ and $b = q_2n + r_2$, where $0 \le r_1, r_2 < n$, then $a \equiv b \pmod{n}$ exactly when $r_1 = r_2$. A rather surprising fact is that the congruence relation $\equiv$ behaves much like the equality relation $=$.

Proposition 6.2. The congruence relation $\equiv$ is an equivalence relation. That is, it satisfies the following three axioms:
(a) Reflexivity. If $a$ is any integer, then $a \equiv a \pmod{n}$;
(b) Symmetry. If $a \equiv b \pmod{n}$, then $b \equiv a \pmod{n}$;
(c) Transitivity. If $a \equiv b \pmod{n}$ and $b \equiv c \pmod{n}$, then $a \equiv c \pmod{n}$.
Proof. Exercise.

Example 6.3. Let $n = 5$. Then the numbers 7 and 27 are congruent to each other modulo 5, because $5 \mid (27 - 7)$. Also note that 7 and 27 have the same remainder after division by 5: $7 = 1 \cdot 5 + 2$ and $27 = 5 \cdot 5 + 2$. In fact, there are infinitely many integers congruent to 7 modulo 5. Convince yourself that all of them belong to the set
$$\{5q + 2 : q \in \mathbb{Z}\} = \{\ldots, -8, -3, 2, 7, 12, \ldots\}.$$

Proposition 6.4.⁸ Let $n$ be a modulus, and suppose that $a \equiv a_1 \pmod{n}$ and $b \equiv b_1 \pmod{n}$. Then
$$a \pm b \equiv a_1 \pm b_1 \pmod{n}, \qquad ab \equiv a_1b_1 \pmod{n}.$$

Proof. Let us first show that $a + b \equiv a_1 + b_1 \pmod{n}$.
Note that $n \mid (a - a_1)$ and $n \mid (b - b_1)$. By property 2 of Proposition 2.4,
$$n \mid (a - a_1) + (b - b_1) = (a + b) - (a_1 + b_1),$$
so by definition $a + b \equiv a_1 + b_1 \pmod{n}$. An analogous proof works if we replace the plus sign with the minus sign. To see that $ab \equiv a_1b_1 \pmod{n}$, observe that
$$ab - a_1b_1 = ab - a_1b + a_1b - a_1b_1 = (a - a_1)b + a_1(b - b_1).$$
Since $n \mid (a - a_1)$ and $n \mid (b - b_1)$, once again by property 2 of Proposition 2.4 we have $n \mid (a - a_1)b + a_1(b - b_1) = ab - a_1b_1$. By definition, this means that $ab \equiv a_1b_1 \pmod{n}$.

If we now combine Propositions 6.2 and 6.4, we can draw the following conclusion. In any congruence that involves only addition, subtraction and multiplication of integers, we can replace $a$ with $a_1$ whenever $a \equiv a_1 \pmod{n}$. This is known as the replacement principle.

⁸ Proposition 3.3 in Frank Zorzitto, A Taste of Number Theory.

Example 6.5. Let $f(x) = x^5 - 10x + 7$. We can compute the remainder of $f(27)$ divided by 5 as follows. Note that $27 \equiv 2 \pmod{5}$. Since $f(x)$ involves only addition, subtraction and multiplication of integers, by the replacement principle we can compute $f(2)$ instead of $f(27)$, because $f(27) \equiv f(2) \pmod{5}$. Also, since $10 \equiv 0 \pmod{5}$ and $7 \equiv 2 \pmod{5}$, we see that
$$f(27) \equiv f(2) \equiv 2^5 - 10 \cdot 2 + 7 \equiv 2^5 - 0 \cdot 2 + 2 \equiv 34 \equiv 4 \pmod{5}.$$
Since $0 \le 4 < 5$, we conclude that 4 is the remainder of $f(27)$ divided by 5.

Example 6.6. Let us compute the last decimal digit of $307^{99}$. This is the same as finding the remainder of $307^{99}$ divided by 10. By the replacement principle, reading from left to right, we have
$$\begin{aligned} 307^{99} &\equiv 7^{99} \equiv (7^3)^{33} \equiv 343^{33} \equiv 3^{33} \equiv (3^3)^{11} \equiv 27^{11} \equiv 7^{11}\\ &\equiv 7^2 \cdot (7^3)^3 \equiv 49 \cdot 3^3 \equiv 9 \cdot 27 \equiv 9 \cdot 7 \equiv 63 \equiv 3 \pmod{10}. \end{aligned}$$
Thus 3 is the last decimal digit of $307^{99}$. Analogously, we can determine the last $k$ decimal digits of any number by applying the replacement principle modulo $10^k$ instead of 10.
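The replacement principle is what makes the following naive Python sketch work. The sketch is illustrative only, and the helper name `last_digits` is hypothetical. It reduces every intermediate product modulo $10^k$, so the numbers involved never exceed $10^{2k}$, but it still performs one multiplication per unit of the exponent.

```python
def last_digits(base: int, exponent: int, k: int) -> int:
    """Return base**exponent mod 10**k, i.e. the last k decimal digits."""
    modulus = 10 ** k
    b = base % modulus          # replacement principle: e.g. 307 may be replaced by 7 when k = 1
    result = 1
    for _ in range(exponent):   # one multiplication per unit of the exponent
        result = (result * b) % modulus  # reduce after every product
    return result


print(last_digits(307, 99, 1))                        # Example 6.6: 3
print(last_digits(307, 99, 3) == pow(307, 99, 1000))  # True
```

Python's built-in three-argument `pow` is used here only as an independent cross-check.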
However, as the modulus grows, the computations become more and more challenging. In practice, to compute $a^\ell \pmod{n}$ for a large exponent $\ell$, we use the so-called Double-and-Add Algorithm. The algorithm is as follows. First, write the integer $\ell$ in its binary expansion, i.e.
$$\ell = \sum_{i=0}^{k} c_i2^i = c_k2^k + c_{k-1}2^{k-1} + \cdots + c_1 \cdot 2 + c_0,$$
where $c_i \in \{0, 1\}$. Then
$$a^\ell \equiv a^{c_k2^k + c_{k-1}2^{k-1} + \cdots + c_1 \cdot 2 + c_0} \equiv \left(a^{2^k}\right)^{c_k} \cdot \left(a^{2^{k-1}}\right)^{c_{k-1}} \cdots \left(a^{2}\right)^{c_1} \cdot a^{c_0} \pmod{n}.$$
Now note that, for $j$ such that $1 \le j \le k$, we can deduce the value of $a^{2^j}$ from $a^{2^{j-1}}$ modulo $n$ as follows:
$$a^{2^j} \equiv \left(a^{2^{j-1}}\right)^2 \pmod{n}.$$
Therefore, starting from $a$, we can compute $a^2, a^{2^2}, \ldots, a^{2^k}$ modulo $n$ with $k$ successive squarings. We then multiply together those powers $a^{2^j}$ for which $c_j = 1$.

Example 6.7. Let us compute the integer $r$, $0 \le r < 23$, such that $r \equiv 7^{114} \pmod{23}$. Note that $114 = 64 + 32 + 16 + 2 = 2^6 + 2^5 + 2^4 + 2^1$. Then
$$\begin{aligned} 7^2 &\equiv 49 \equiv 3 \pmod{23};\\ 7^4 &\equiv (7^2)^2 \equiv 3^2 \equiv 9 \pmod{23};\\ 7^8 &\equiv (7^4)^2 \equiv 9^2 \equiv 81 \equiv 12 \pmod{23};\\ 7^{16} &\equiv (7^8)^2 \equiv 12^2 \equiv 144 \equiv 6 \pmod{23};\\ 7^{32} &\equiv (7^{16})^2 \equiv 6^2 \equiv 36 \equiv 13 \pmod{23};\\ 7^{64} &\equiv (7^{32})^2 \equiv 13^2 \equiv 169 \equiv 8 \pmod{23}. \end{aligned}$$
We can use the table above in our calculations:
$$7^{114} \equiv 7^{64+32+16+2} \equiv 7^{64} \cdot 7^{32} \cdot 7^{16} \cdot 7^2 \equiv 8 \cdot 13 \cdot 6 \cdot 3 \equiv 1872 \equiv 9 \pmod{23}.$$

We will now look at some interesting applications of modular arithmetic. For example, it can be used to show that certain Diophantine equations have no solutions.

Example 6.8. Let us show that the Diophantine equation $x^2 + y^2 = 4z + 3$ has no solutions. This is the same as solving the congruence $x^2 + y^2 \equiv 3 \pmod{4}$ in integers $x$ and $y$. Since every integer is congruent to 0, 1, 2 or 3 modulo 4, there are essentially 16 possible combinations of $x$ and $y$ to check. However, the problem becomes even simpler if we note that
$$0^2 \equiv 0, \quad 1^2 \equiv 1, \quad 2^2 \equiv 0, \quad 3^2 \equiv 1 \pmod{4}.$$
Thus every perfect square is congruent to either 0 or 1 modulo 4. Since we are dealing with the sum of two perfect squares, there are now only three options left to check, namely
$$0 + 0 \equiv 0, \quad 0 + 1 \equiv 1, \quad 1 + 1 \equiv 2 \pmod{4}.$$
As we can see, none of them is congruent to 3, which means that $x^2 + y^2 \equiv 3 \pmod{4}$ has no solutions in integers $x$ and $y$. Therefore there are no solutions to the Diophantine equation $x^2 + y^2 = 4z + 3$.

Exercise 6.9. (a) Show that the Diophantine equation $x^2 + y^2 + z^2 = 8t + 7$ has no solutions with $x, y, z, t \in \mathbb{Z}$;
(b) Let $\mathbb{Z}[\sqrt{2}] := \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$. Show that there exists a solution to $x^2 + y^2 + z^2 = 8t + 7$ with $x, y, z, t \in \mathbb{Z}[\sqrt{2}]$;
(c) Show that for integers $x, y, z$ there exists an integer $t$ with $x^2 + y^2 + z^2 = 8t + 3$ if and only if $x$, $y$ and $z$ are all odd.

In school, you probably heard of divisibility rules for various integers. For example, to check that an integer is divisible by 3, one just has to add up its decimal digits and verify that the resulting number is divisible by 3. To verify that an integer $n$ is divisible by 4, one just has to check that the number formed by the last two decimal digits of $n$ is divisible by 4. These divisibility rules are consequences of modular arithmetic.

Example 6.10. Let us prove the following divisibility rule for 3 and 9. Let $n$ be a positive integer, and let $m$ be the sum of the decimal digits of $n$. Then $3 \mid n$ if and only if $3 \mid m$, and $9 \mid n$ if and only if $9 \mid m$.
We prove the divisibility rule for 3, as the rule for 9 is analogous. We write the number $n$ in base 10:
$$n = \sum_{i=0}^{k} a_i10^i,$$
where $a_i \in \{0, 1, \ldots, 9\}$. Then, by definition, $m = a_k + a_{k-1} + \cdots + a_1 + a_0$. Since $10 \equiv 1 \pmod{3}$,
$$\begin{aligned} n &\equiv a_k10^k + a_{k-1}10^{k-1} + \cdots + a_1 \cdot 10 + a_0\\ &\equiv a_k \cdot 1^k + a_{k-1} \cdot 1^{k-1} + \cdots + a_1 \cdot 1 + a_0\\ &\equiv a_k + a_{k-1} + \cdots + a_1 + a_0 \equiv m \pmod{3}. \end{aligned}$$
We conclude that $3 \mid (n - m)$, so there exists an integer $k_1$ such that $n - m = 3k_1$. Now assume that $3 \mid m$. Then there exists an integer $k_2$ such that $m = 3k_2$. But then $3k_1 = n - m = n - 3k_2$ implies $n = 3(k_1 + k_2)$, which means that $3 \mid n$. Analogously, we can show that if $3 \mid n$, then $3 \mid m$. If we replace the modulus 3 with the modulus 9, the proof remains the same.
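As a quick sanity check of Example 6.10, the following Python sketch tests the rule for all $1 \le n < 10000$. The helper `digit_sum` is a hypothetical name. A finite check like this illustrates the statement but of course does not replace the proof.

```python
def digit_sum(n: int) -> int:
    """Sum m of the decimal digits of a non-negative integer n."""
    return sum(int(d) for d in str(n))


# Since 10 ≡ 1 (mod 9), the argument of Example 6.10 gives n ≡ m (mod 9), hence also (mod 3).
for n in range(1, 10_000):
    m = digit_sum(n)
    assert (n - m) % 9 == 0
    assert (n % 3 == 0) == (m % 3 == 0)
    assert (n % 9 == 0) == (m % 9 == 0)
print("rule confirmed for 1 <= n < 10000")
```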
Exercise 6.11. Prove the following divisibility rule for 11. Let $n$ be an integer, and let $m$ be the sum of the two-digit blocks of $n$, formed from right to left. Then $11 \mid n$ if and only if $11 \mid m$. Example: if $n = 3928881$, then $m = 3 + 92 + 88 + 81 = 264$, which is divisible by 11. Thus 3928881 is divisible by 11 as well.

7 The Ring of Residue Classes $\mathbb{Z}_n$

Recall that, according to our terminology, the set of integers $\mathbb{Z}$ forms a ring: $0, 1 \in \mathbb{Z}$, and for all $a$ and $b$ in $\mathbb{Z}$ we have $a \pm b \in \mathbb{Z}$ and $a \cdot b \in \mathbb{Z}$. Now let $n$ be a modulus. In this section, we will introduce our first example of a finite ring, $\mathbb{Z}_n$, and study its properties. As the name suggests, this ring has only finitely many elements. Just like the ring of integers $\mathbb{Z}$, it contains its own analogues of 0 and 1. We endow it with the operations of addition, subtraction and multiplication, which are very similar to the operations in $\mathbb{Z}$.

Definition 7.1. Let $a$ be an integer. The set
$$[a] := \{nq + a : q \in \mathbb{Z}\}$$
is called the residue class of $a$ modulo $n$. The integer $a$ is called a representative of the residue class $[a]$.

Note that $[a] = [b]$ if and only if $a \equiv b \pmod{n}$. Also, each residue class contains an integer $r$ such that $0 \le r < n$, and it is conventional to pick such integers as representatives. For example, if $n = 5$, one can denote the set of all integers congruent to 17 modulo 5 by $[17]$. However, we prefer to use $[2]$ instead, since $17 \equiv 2 \pmod{5}$ and $0 \le 2 < 5$. There are exactly $n$ integers $r$ with $0 \le r < n$, namely $0, 1, 2, \ldots, n - 1$, and each integer is congruent modulo $n$ to exactly one of them. Hence there are exactly $n$ residue classes modulo $n$.

Exercise 7.2. Let $n$ be a positive integer. Prove that the residue classes $[0], [1], \ldots, [n - 1]$ modulo $n$ partition the integers. That is, $[0] \cup [1] \cup \cdots \cup [n - 1] = \mathbb{Z}$, and $[a] \cap [b] \neq \emptyset$ implies $[a] = [b]$. Hint: use Proposition 6.2.
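The convention of picking the representative $r$ with $0 \le r < n$ is easy to mirror in code. In the Python sketch below, the helper names `representative` and `same_class` are hypothetical. The sketch relies on the fact that, for a positive modulus, Python's `%` operator returns the least non-negative remainder even for negative operands.

```python
def representative(a: int, n: int) -> int:
    """Conventional representative r of the residue class [a] modulo n, with 0 <= r < n."""
    # For a positive modulus n, Python's % returns a value in range(n), even for negative a.
    return a % n


def same_class(a: int, b: int, n: int) -> bool:
    """[a] = [b] modulo n exactly when n divides a - b."""
    return (a - b) % n == 0


print(representative(17, 5))   # 2, so [17] = [2] modulo 5
print(representative(-8, 5))   # 2, since -8 = 5*(-2) + 2
print(same_class(7, 27, 5))    # True (compare Example 6.3)
```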
We denote the collection of residue classes modulo $n$ by $\mathbb{Z}/n\mathbb{Z}$ or $\mathbb{Z}_n$.⁹ Since the notation $\mathbb{Z}_n$ is used in your course notes, we will stick with it in these lecture notes.

Proposition 7.3. Let $n$ be a positive integer and consider the collection $\mathbb{Z}_n$ of all residue classes modulo $n$. Define the binary operations $+$, $-$ and $\cdot$ as follows:
$$[a] \pm [b] := [a \pm b] \quad \text{and} \quad [a] \cdot [b] := [a \cdot b].$$
Then, under these binary operations, $\mathbb{Z}_n$ forms a ring.
Proof. Exercise. Hint: use Proposition 6.4.

Note that $\mathbb{Z}_n$ is a finite ring. When we carry out operations in $\mathbb{Z}_n$, we are doing modular arithmetic. We carry out the usual integer arithmetic and then replace the result with any integer congruent to it modulo $n$. Once again, we conventionally pick the representative $r$ with $0 \le r < n$.

⁹ The latter notation might be ambiguous, since when $p$ is prime the symbol $\mathbb{Z}_p$ is also used to represent the ring of $p$-adic integers.

Example 7.4. Here are two examples of modular arithmetic in $\mathbb{Z}_{17}$:
$$[33] + [12] = [16] + [12] = [28] = [11], \qquad [11] \cdot [19] = [11] \cdot [2] = [22] = [5].$$
Note that, in the case of addition, one could simplify the computations slightly by noting that $33 \equiv -1 \pmod{17}$:
$$[33] + [12] = [-1] + [12] = [11].$$
After all, dealing with $-1$ is much simpler than dealing with 16.

Although $\mathbb{Z}_n$ behaves much like $\mathbb{Z}$, some of its properties might be rather unpleasant. For example, $\mathbb{Z}$ has no zero divisors apart from 0. In other words, the identity $ab = 0$ implies that either $a = 0$ or $b = 0$. In general, this is not true for $\mathbb{Z}_n$.

Example 7.5. To see that $\mathbb{Z}_6$ contains zero divisors other than $[0]$, note that
$$[2] \cdot [3] = [6] = [0] = [2] \cdot [0].$$
Thus $[2] \cdot [3] = [0]$ in $\mathbb{Z}_6$, even though $[2] \neq [0]$ and $[3] \neq [0]$. The same is true for $\mathbb{Z}_{15}$:
$$[3] \cdot [5] = [15] = [0] = [3] \cdot [0].$$
This shows another major difference between $\mathbb{Z}$ and $\mathbb{Z}_n$. In $\mathbb{Z}$, the equation $ab = ac$ with $a \neq 0$ implies $b = c$. In general, however, this is no longer true in $\mathbb{Z}_n$.
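Zero divisors in a small ring $\mathbb{Z}_n$ can be found by brute force. The Python sketch below is illustrative only, and `zero_divisors` is a hypothetical helper. It lists the integers $0 < a < n$ whose class $[a]$ is a non-trivial zero divisor. It reproduces Example 7.5 for $n = 6$ and $n = 15$, and finds none for the prime $n = 13$.

```python
def zero_divisors(n: int) -> list[int]:
    """Integers a with 0 < a < n such that [a][b] = [0] in Z_n for some 0 < b < n."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


print(zero_divisors(6))    # [2, 3, 4]
print(zero_divisors(15))   # [3, 5, 6, 9, 10, 12]
print(zero_divisors(13))   # []
```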
It is not difficult to show that, in fact, $\mathbb{Z}_n$ has no non-trivial zero divisors if and only if $n$ is prime or $n = 1$.

8 Linear Congruences

Let $n$ be a modulus. We will now turn our attention to equations in $\mathbb{Z}_n$. Let $a, b$ be integers, and consider the linear equation $[a][x] = [b]$, where $x$ is an unknown integer.

Example 8.1. The linear equation $[7][x] = [3]$ has only one solution in $\mathbb{Z}_{13}$, namely $[x] = [6]$. As there are only finitely many possibilities, we may check all of them, from $[0]$ to $[12]$, in order to find a solution. Even though there is only one solution in $\mathbb{Z}_{13}$, there are infinitely many solutions in $\mathbb{Z}$. Indeed, any integer $y \in [6]$, that is, any integer of the form $y = 13q + 6$, satisfies $7y \equiv 3 \pmod{13}$.
The linear equation $[3][x] = [6]$ has three solutions in $\mathbb{Z}_9$, namely $[x] = [2]$, $[x] = [5]$ and $[x] = [8]$. This shows the principal difference between a linear equation in $\mathbb{Z}_n$ and the linear equation $cx = d$ in $\mathbb{Z}$. The only way $cx = d$ can have more than one solution is if $c = d = 0$.
Finally, the equation $[3][x] = [7]$ has no solutions in $\mathbb{Z}_9$. Once again, we can easily verify this by plugging in all the possible values $[x] = [0], [1], \ldots, [8]$.

It turns out that the tools we already have make it easy to solve linear congruences. Observe that $[a][x] = [ax] = [b]$, and this is the same as solving the congruence $ax \equiv b \pmod{n}$. By definition, $n$ has to divide $ax - b$, so there exists an integer $y$ such that $ax - b = n(-y)$. In other words, the linear congruence $[a][x] = [b]$ has a solution if and only if the Diophantine equation $ax + ny = b$ has a solution in integers $x$ and $y$. From what we have learned in Section 3, it immediately follows that the linear equation $[a][x] = [b]$ has no solutions if and only if $\gcd(a, n) \nmid b$. (Verify that this is the case for the last two equations in Example 8.1.) When solutions exist, we can use the Extended Euclidean Algorithm to find them.

Example 8.2.
Let us consider the linear equation $[440][x] = [80]$ in $\mathbb{Z}_{300}$. From Example 5.7 (after replacing $n$ with $-n$), we know that the solutions to $440x + 300y = 80$ in integers $x$ and $y$ are of the form $x = -8 + 15n$ and $y = 12 - 22n$, where $n$ is an integer. Thus $[440][-8 + 15n] = [80]$ in $\mathbb{Z}_{300}$. By evaluating $-8 + 15n$ at $n = 1, 2, \ldots, 20$, we obtain 20 distinct solutions in $\mathbb{Z}_{300}$, namely $[7], [22], [37], \ldots, [292]$. Note that $\gcd(440, 300) = 20$ and there are 20 distinct solutions. In Exercise 8.3, you are asked to prove that this phenomenon holds in general.

Exercise 8.3. Let $n \ge 1$ be a modulus, and let $a, b$ be integers such that $a \neq 0$. Prove that, if $\gcd(a, n) \mid b$, then the number of distinct residue classes $[x]$ satisfying $[a][x] = [b]$ is equal to $\gcd(a, n)$.

9 The Group of Units $\mathbb{Z}_n^*$

Let $n$ be a modulus and consider the finite ring $\mathbb{Z}_n$ of residue classes modulo $n$. Recall that, in general, the ring $\mathbb{Z}_n$ does not have the property that $[a][b] = [a][c]$ and $[a] \neq [0]$ imply $[b] = [c]$ (see Example 7.5). However, for special values of $[a]$, called units, this cancellation law does hold.

Definition 9.1. The residue class $[a]$ in $\mathbb{Z}_n$ is called a unit if there exists a solution to $[a][x] = [1]$ in $\mathbb{Z}_n$. If $[a]$ is a unit, we say that any integer $b \in [a]$ is invertible modulo $n$.

Proposition 9.2. The following statements are equivalent:
1. $[a]$ is a unit;
2. For all integers $b$ and $c$, $[a][b] = [a][c]$ implies $[b] = [c]$;
3. $a$ and $n$ are coprime.

Proof. Let us prove that 1 implies 2. Since $[a]$ is a unit, there exists an integer $x$ such that $[a][x] = [1]$. Now suppose that $[a][b] = [a][c]$ for some integers $b$ and $c$. Then $[x][a][b] = [x][a][c]$. Since $\mathbb{Z}_n$ is a commutative ring, $[x][a] = [a][x] = [1]$. Thus the above equality simplifies to $[1][b] = [1][c]$, which implies $[b] = [c]$.
To prove that 2 implies 3, suppose that the statement is false, so that $a$ and $n$ are not coprime. Without loss of generality, we may assume that $0 \le a < n$. Then there exists an integer $p > 1$ such that $a = pk_1$ and $n = pk_2$ for some integers $k_1$ and $k_2$.
Since $p > 1$, we conclude that $1 \le k_2 < n$, which in turn implies $k_2 \not\equiv 0 \pmod{n}$. But then
$$ak_2 = pk_1k_2 = pk_2k_1 = nk_1 \equiv 0 \equiv a \cdot 0 \pmod{n}.$$
Thus $[a][k_2] = [a][0]$, even though $[k_2] \neq [0]$. This contradicts our assumption, so $a$ and $n$ are coprime.
Finally, let us show that 3 implies 1. Since $a$ and $n$ are coprime, by Bézout's lemma there exist integers $x$ and $y$ such that $ax + ny = 1$. This means that $[a][x] = [1]$, so by Definition 9.1 the residue class $[a]$ is a unit.

Corollary 9.3. Let $[a]$ be a unit in $\mathbb{Z}_n$. Then for any integer $b$ the equation $[a][x] = [b]$ has a unique solution.

Proof. First, a solution exists. Since $[a]$ is a unit, there is an integer $u$ with $[a][u] = [1]$, and then $[x] = [u][b]$ satisfies $[a][x] = ([a][u])[b] = [b]$. Now suppose that $[x]$ and $[y]$ are two solutions, so $[a][x] = [b] = [a][y]$. By property 2 of Proposition 9.2, the identity $[a][x] = [a][y]$ implies $[x] = [y]$.

Note that the statements of Proposition 9.2 and Corollary 9.3 can be translated from the language of residue classes to the language of congruences. For example, property 1 states that $ax \equiv 1 \pmod{n}$ has a solution, while property 2 states that $ab \equiv ac \pmod{n}$ implies $b \equiv c \pmod{n}$. Finally, Corollary 9.3 implies that the congruence $ax \equiv b \pmod{n}$ has a unique solution $x$ with $0 \le x < n$, and that all integer solutions to this congruence are of the form $x + nq$ for $q \in \mathbb{Z}$.

Proposition 9.4. If $p$ is prime and $[a] \neq [0]$ in $\mathbb{Z}_p$, then $[a]$ is a unit. Furthermore, $\mathbb{Z}_p$ has no zero divisors apart from $[0]$ itself.

Proof. Since $[a] \neq [0]$, without loss of generality we may assume that $1 \le a < p$. This implies that $a$ and $p$ are coprime. Otherwise $d = \gcd(a, p) > 1$ would be a divisor of $p$ greater than 1, so $d = p$. But then $p = d \le a$ and $a < p$ at the same time, a contradiction. Since $\gcd(a, p) = 1$, by Bézout's lemma there exist integers $x$ and $y$ such that $ax + py = 1$. But then $[a][x] = [1]$, so by Definition 9.1 the residue class $[a]$ is a unit in $\mathbb{Z}_p$. Finally, suppose that $[a][b] = [0]$ with $[a] \neq [0]$. Then $[a][b] = [a][0]$, and since $[a]$ is a unit, the cancellation law in property 2 of Proposition 9.2 gives $[b] = [0]$. Hence $\mathbb{Z}_p$ has no zero divisors apart from $[0]$ itself.

Definition 9.5. Let $[a]$ be a unit in $\mathbb{Z}_n$.
The element $[x]$ satisfying $[a][x] = [1]$ is called the inverse of $[a]$ and is denoted by $[a]^{-1}$.

In the language of congruences, the fact that $a$ is invertible modulo $n$ means that there exists an integer, which we denote by $a^{-1}$, such that $a \cdot a^{-1} \equiv 1 \pmod{n}$.

Definition 9.6. The set of all units of $\mathbb{Z}_n$ is called the group of units of $\mathbb{Z}_n$ and is denoted by $\mathbb{Z}_n^*$.

Proposition 9.7. The set of all units of $\mathbb{Z}_n$ forms a group under multiplication. That is, it satisfies the following four group axioms:
1. Closure. For all $[a], [b] \in \mathbb{Z}_n^*$, $[a] \cdot [b] \in \mathbb{Z}_n^*$;
2. Associativity. $([a] \cdot [b]) \cdot [c] = [a] \cdot ([b] \cdot [c])$;
3. Identity element. For all $[a]$ in $\mathbb{Z}_n^*$, the element $[1]$ satisfies $[a] \cdot [1] = [1] \cdot [a] = [a]$;
4. Inverse element. For each $[a]$ in $\mathbb{Z}_n^*$ there exists an element $[a]^{-1}$ in $\mathbb{Z}_n^*$ such that $[a] \cdot [a]^{-1} = [a]^{-1} \cdot [a] = [1]$.
Furthermore, the group of units $\mathbb{Z}_n^*$ is finite and Abelian:¹⁰
5. Abelianness. For all $[a], [b] \in \mathbb{Z}_n^*$, $[a] \cdot [b] = [b] \cdot [a]$;
6. Finiteness. There are only finitely many elements in $\mathbb{Z}_n^*$.
Proof. Exercise.

¹⁰ In the context of groups, it is conventional to use the word “Abelian” instead of “commutative”.

Example 9.8. Let us compute $\mathbb{Z}_{10}^*$. By Proposition 9.2, it suffices to find all integers $m$, $0 \le m < 10$, that are coprime to 10. Thus $\mathbb{Z}_{10}^* = \{[1], [3], [7], [9]\}$. To convince ourselves that $\mathbb{Z}_{10}^*$ is closed under multiplication, let us construct the multiplication table:
$$\begin{array}{c|cccc} \cdot & 1 & 3 & 7 & 9 \\ \hline 1 & 1 & 3 & 7 & 9 \\ 3 & 3 & 9 & 1 & 7 \\ 7 & 7 & 1 & 9 & 3 \\ 9 & 9 & 7 & 3 & 1 \end{array}$$
All of the entries in the multiplication table are indeed in $\mathbb{Z}_{10}^*$. Furthermore, each row, as well as each column, of this table is a permutation of 1, 3, 7 and 9. In the future, we will see that this is not a coincidence.

10 Euler's Theorem and Fermat's Little Theorem

We will now prove our first non-trivial result, Euler's Theorem.

Definition 10.1. Let $\varphi(n)$ denote the number of integers $m$ such that $0 \le m < n$ and $\gcd(m, n) = 1$.
The function $\varphi$ is called Euler's totient function.

Exercise 10.2. Let $\#X$ denote the cardinality of a set $X$. Let $n$ be a modulus. Prove that $\varphi(n) = \#\mathbb{Z}_n^*$.

Theorem 10.3 (Euler's Theorem). If $[a] \in \mathbb{Z}_n^*$, then $[a]^{\varphi(n)} = [1]$.

Proof.¹¹ Let $k = \varphi(n)$, and let
$$[1] = [u_1], [u_2], \ldots, [u_k]$$
be the complete list of elements of $\mathbb{Z}_n^*$. Since $\mathbb{Z}_n^*$ is a group, all the elements
$$[a] \cdot [u_1], [a] \cdot [u_2], \ldots, [a] \cdot [u_k]$$
are in $\mathbb{Z}_n^*$. Furthermore, no element appears in this list twice. Indeed, if $[a] \cdot [u_i] = [a] \cdot [u_j]$ for some $i \neq j$, then $[u_i] = [u_j]$ by property 2 of Proposition 9.2. Hence the second list is just a permutation of $[u_1], [u_2], \ldots, [u_k]$. Thus
$$[u_1] \cdot [u_2] \cdots [u_k] = ([a] \cdot [u_1]) \cdot ([a] \cdot [u_2]) \cdots ([a] \cdot [u_k]).$$
Since $\mathbb{Z}_n^*$ is an Abelian group, we can rearrange the factors to obtain
$$[u_1] \cdot [u_2] \cdots [u_k] = [a]^k \cdot [u_1] \cdot [u_2] \cdots [u_k].$$
Finally, we use property 2 of Proposition 9.2 to cancel the unit $[u_1] \cdot [u_2] \cdots [u_k]$ and conclude that $[a]^k = [1]$.

¹¹ Theorem 3.16 in Frank Zorzitto, A Taste of Number Theory.

In the language of congruences, Euler's Theorem states that $a^{\varphi(n)} \equiv 1 \pmod{n}$ for every integer $a$ that is invertible modulo $n$.

Example 10.4. Let us prove that 1223 divides $623^{1222} - 1$. This becomes evident once we note two facts. First, 1223 is prime, so $\varphi(1223) = 1222$. Second, $\gcd(1223, 623) = 1$, so $[623]$ is a unit in $\mathbb{Z}_{1223}$. By Euler's Theorem, $623^{1222} \equiv 1 \pmod{1223}$, which means that 1223 divides $623^{1222} - 1$.

Corollary 10.5 (Fermat's Little Theorem). Let $p$ be prime. Then for any integer $a$ such that $p \nmid a$ we have $[a]^{p-1} = [1]$. In other words, $a^{p-1} \equiv 1 \pmod{p}$.

Proof. Every integer $m$ with $1 \le m < p$ satisfies $\gcd(m, p) = 1$, so $\varphi(p) = p - 1$. Since $p \nmid a$, we have $[a] \neq [0]$, so by Proposition 9.4 the class $[a]$ is a unit, i.e. $[a] \in \mathbb{Z}_p^*$. The result then follows from Euler's Theorem.

The theorems of Euler and Fermat give us a useful tool for raising integers to high powers modulo $n$.

Proposition 10.6.
¹² If $n$ is a modulus, $a$ is coprime to $n$, and $k, \ell$ are non-negative integers such that $k \equiv \ell \pmod{\varphi(n)}$, then $a^k \equiv a^\ell \pmod{n}$.

Proof. Say $k \le \ell$ (the case $k > \ell$ is symmetric). We are given that $\ell = q\varphi(n) + k$ for some $q \ge 0$. Then, by Euler's Theorem,
$$a^\ell = a^{q\varphi(n) + k} = \left(a^{\varphi(n)}\right)^q a^k \equiv 1^q a^k = a^k \pmod{n}.$$

¹² Proposition 3.20 in Frank Zorzitto, A Taste of Number Theory.

Example 10.7. Let us compute $17^{7^{155}}$ modulo 33. Note that $\varphi(33) = 20$. Since $\gcd(17, 33) = 1$, by Proposition 10.6 it makes sense to first reduce $7^{155}$ modulo 20. We can apply Euler's Theorem again here. Note that $\varphi(20) = 8$, and since $\gcd(7, 20) = 1$ we have $7^8 \equiv 1 \pmod{20}$. But then, by Proposition 10.6,
$$7^{155} = 7^{19 \cdot 8 + 3} \equiv 7^3 \equiv 343 \equiv 3 \pmod{20}.$$
Thus
$$17^{7^{155}} \equiv 17^3 \equiv 4913 \equiv 29 \pmod{33}.$$

Exercise 10.8. Compute the integer $n$, $0 \le n < 55$, such that $n \equiv 8^{13^{2134}} \pmod{55}$.

11 The Chinese Remainder Theorem

Now that we know how to solve linear congruences, let us try to understand how to work with systems of congruences. Since the congruence relation $\equiv$ behaves much like the equality relation $=$, solving a system of linear congruences with a single modulus is very similar to solving a system of linear equations. We already know how to handle the latter with the methods of linear algebra.
On the other hand, if the congruences in a system have different moduli, things get more interesting. We will consider only the simplest example of such systems, namely
$$\begin{cases} x \equiv a_1 \pmod{n_1}, \\ x \equiv a_2 \pmod{n_2}, \\ \quad\vdots \\ x \equiv a_k \pmod{n_k}, \end{cases}$$
where $a_1, a_2, \ldots, a_k$ are integers and $n_1, n_2, \ldots, n_k$ are pairwise coprime integers greater than 1. Our goal is to determine an $x$ that satisfies all $k$ congruences above. The existence of such an $x$ is asserted by the Chinese Remainder Theorem. Before stating it, let us recall Proposition 3.12 and the following consequence of it.

Proposition 11.1. Let $m$ and $n$ be integers greater than 1 that are coprime.
Then the congruence $a \equiv b \pmod{mn}$ holds if and only if both of the congruences
$$a \equiv b \pmod{m}, \qquad a \equiv b \pmod{n}$$
hold.

Proof. Suppose that $a \equiv b \pmod{mn}$. Then $mn \mid (a - b)$. But then $m \mid (a - b)$ and $n \mid (a - b)$, so, by definition, $a \equiv b \pmod{m}$ and $a \equiv b \pmod{n}$. To prove the converse, suppose that $a \equiv b \pmod{m}$ and $a \equiv b \pmod{n}$. Then $m \mid (a - b)$ and $n \mid (a - b)$. Since $\gcd(m, n) = 1$, we may apply Proposition 3.12 to conclude that $mn \mid (a - b)$. Thus $a \equiv b \pmod{mn}$.

Theorem 11.2 (The Chinese Remainder Theorem).¹³ If $m, n$ are coprime moduli and $a, b$ are any integers, then the congruences
$$x \equiv a \pmod{m}, \qquad x \equiv b \pmod{n}$$
have a common solution $x$. Furthermore, any two solutions $x, y$ to this pair of congruences satisfy $x \equiv y \pmod{mn}$.

¹³ Theorem 4.2 in Frank Zorzitto, A Taste of Number Theory.

Proof. Since $m$ and $n$ are coprime, by Bézout's lemma the equation $mt - ns = b - a$ can be solved in integers $s$ and $t$. Set $x = mt + a = ns + b$. Then $x \equiv a \pmod{m}$ and $x \equiv b \pmod{n}$, so $x$ is a solution to both congruences. If $y$ is another solution to the system of congruences, then
$$x \equiv y \pmod{m}, \qquad x \equiv y \pmod{n}.$$
By Proposition 11.1, we conclude that $x \equiv y \pmod{mn}$.

We can easily generalize this result to an arbitrary number of pairwise coprime moduli.

Theorem 11.3 (Generalized Chinese Remainder Theorem).¹⁴ Suppose $n_1, n_2, \ldots, n_k$ are moduli that are pairwise coprime, that is, $n_i$ and $n_j$ are coprime whenever $i \neq j$. If $a_1, a_2, \ldots, a_k$ are integers, then there exists an integer $x$ such that
$$\begin{cases} x \equiv a_1 \pmod{n_1}, \\ x \equiv a_2 \pmod{n_2}, \\ \quad\vdots \\ x \equiv a_k \pmod{n_k}. \end{cases}$$
Furthermore, if $x_0$ is such a solution, then the complete set of solutions is given by all $x \equiv x_0 \pmod{n_1n_2\cdots n_k}$.

¹⁴ Theorem 4.3 in Frank Zorzitto, A Taste of Number Theory.

Example 11.4. Let us solve the system of congruences
$$\begin{cases} x \equiv 3 \pmod{6}, \\ x \equiv 7 \pmod{13}. \end{cases}$$
Since 6 and 13 are coprime, by Bézout's lemma there exist integers $u$ and $v$ such that $6u + 13v = 1$. Note that $u = -2$ and $v = 1$ give us an answer.
Multiplying both sides of the above equality by $7 - 3 = 4$, we obtain a solution to $6x_0 + 13y_0 = 7 - 3$, namely $x_0 = 4 \cdot (-2) = -8$ and $y_0 = 4 \cdot 1 = 4$. After rearranging, we get $3 + 6x_0 = 7 - 13y_0 = -45$. Note that $-45 \equiv 3 \pmod{6}$ and $-45 \equiv 7 \pmod{13}$. Since 6 and 13 are coprime, by the Chinese Remainder Theorem the congruence $x \equiv -45 \equiv 33 \pmod{78}$ captures all integer solutions to the original system of congruences.

Exercise 11.5. Solve the system of congruences
$$\begin{cases} x \equiv 3 \pmod{5}, \\ x \equiv 5 \pmod{7}, \\ x \equiv 7 \pmod{11}. \end{cases}$$

12 Polynomial Congruences

The Chinese Remainder Theorem can also be used to solve polynomial congruences. Let $d$ be a positive integer and consider a polynomial
$$f(x) = c_dx^d + c_{d-1}x^{d-1} + \cdots + c_1x + c_0$$
with integer coefficients $c_0, c_1, c_2, \ldots, c_d$. Then a congruence of the form
$$f(x) \equiv 0 \pmod{n} \tag{4}$$
is called a polynomial congruence. We would like to find all integers $x$ that satisfy such a congruence. Suppose we replace the coefficients $c_i$ of $f(x)$ with their residue classes $[c_i]$, thus "reducing" our polynomial from $\mathbb{Z}$ to $\mathbb{Z}_n$. Then solving the congruence (4) is equivalent to solving the equation $f([x]) = [0]$ in $\mathbb{Z}_n$. If such an equation is satisfied by some residue class $[x_0]$, we say that $[x_0]$ is a root of $f(x)$ in $\mathbb{Z}_n$.

Let $n = p_1^{e_1}p_2^{e_2}\cdots p_k^{e_k}$ be the prime factorization of $n$. Then, as it turns out, there is a one-to-one correspondence between the solutions of the congruence (4) modulo $n$ and the $k$-tuples of solutions of the system of congruences
$$\begin{cases} f(x) \equiv 0 \pmod{p_1^{e_1}}, \\ f(x) \equiv 0 \pmod{p_2^{e_2}}, \\ \quad\vdots \\ f(x) \equiv 0 \pmod{p_k^{e_k}}. \end{cases}$$
This result follows from the next proposition, which is very similar to Proposition 11.1.

Proposition 12.1. Let $f(x) \in \mathbb{Z}[x]$ be a polynomial, and let $m$ and $n$ be coprime moduli. Then $f(x) \equiv 0 \pmod{mn}$ if and only if
$$\begin{cases} f(x) \equiv 0 \pmod{m}, \\ f(x) \equiv 0 \pmod{n}. \end{cases}$$

Proof. Suppose that $f(x) \equiv 0 \pmod{mn}$. Then $mn \mid f(x)$, which means that $m \mid f(x)$ and $n \mid f(x)$. Conversely, suppose that $f(x) \equiv 0 \pmod{m}$ and $f(x) \equiv 0 \pmod{n}$. Then $m \mid f(x)$ and $n \mid f(x)$.
Since $m$ and $n$ are coprime, it follows from Proposition 3.12 that $mn \mid f(x)$.

Coming back to our previous notation, let $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ be the prime factorization of $n$, and suppose integers $x_1, x_2, \dots, x_k$ satisfy $f(x_i) \equiv 0 \pmod{p_i^{e_i}}$ for $i = 1, 2, \dots, k$. Using the Generalized Chinese Remainder Theorem, we can find $x$ such that $x \equiv x_i \pmod{p_i^{e_i}}$ for all $i$. Since $x \equiv x_i$ implies $f(x) \equiv f(x_i) \pmod{p_i^{e_i}}$, such an $x$ satisfies $f(x) \equiv 0 \pmod{p_i^{e_i}}$ for all $i$, and therefore $f(x) \equiv 0 \pmod{n}$. It follows that if each congruence $f(x) \equiv 0 \pmod{p_i^{e_i}}$ has $s_i$ solutions, then the congruence $f(x) \equiv 0 \pmod{n}$ has $s_1 s_2 \cdots s_k$ solutions.

Now we would like to determine how many solutions a polynomial congruence $f(x) \equiv 0 \pmod{p^e}$ has. Due to time limitations, we will answer this question only in the case $e = 1$, and show that there are at most $d$ solutions, where $d$ is the degree of $f(x)$. We remark that for $e \geq 2$ the bound of $d$ solutions can fail in general. For instance, $x^2 \equiv 0 \pmod{9}$ has the three solutions $x \equiv 0, 3, 6 \pmod{9}$. The most accurate estimates on the number of solutions of polynomial congruences were established in 1991 by the Canadian mathematician Cameron L. Stewart, who is currently a professor at the University of Waterloo.

Proposition 12.2 [15]. If $p$ is prime and $f(x)$ is a nonzero polynomial of degree $d$ with coefficients in $\mathbb{Z}_p$, then $f(x)$ has at most $d$ roots in $\mathbb{Z}_p$.

Proof. We will prove this result by induction on the degree $d$ of $f(x)$.

Base case. Let $d = 0$. Then $f(x) = \alpha_0$ for some non-zero $\alpha_0$ in $\mathbb{Z}_p$. Clearly, this polynomial has $0 \leq d = 0$ roots, so the result holds.

Induction hypothesis. Suppose that the result is true for all polynomials of degrees $k = 0, 1, \dots, d - 1$.

Induction step. We will show that the result holds for every polynomial of degree $d$. Let $f(x) = \alpha_d x^d + \alpha_{d-1} x^{d-1} + \dots + \alpha_1 x + \alpha_0$, where $\alpha_d \neq 0$. If $f(x)$ has no roots, then surely $0 \leq d$. Otherwise $f(x)$ has a root, say $\beta$.
Then
$$f(x) = f(x) - 0 = f(x) - f(\beta) = \alpha_d (x^d - \beta^d) + \alpha_{d-1} (x^{d-1} - \beta^{d-1}) + \dots + \alpha_1 (x - \beta).$$
Now recall that for any integer $j \geq 2$,
$$x^j - \beta^j = (x - \beta)(x^{j-1} + x^{j-2}\beta + x^{j-3}\beta^2 + \dots + x\beta^{j-2} + \beta^{j-1}).$$
Hence we can factor out $(x - \beta)$ in the expression for $f(x)$ given above. This means that $f(x) = (x - \beta) g(x)$ for some polynomial $g(x)$ with coefficients in $\mathbb{Z}_p$. Comparing leading terms, $g(x)$ has degree $d - 1$ and leading coefficient $\alpha_d \neq 0$. We can therefore apply the induction hypothesis to conclude that $g(x)$ has at most $d - 1$ roots.

Let $\gamma \neq \beta$ be some root of $f(x)$. Then $0 = f(\gamma) = (\gamma - \beta) g(\gamma)$. We claim that $g(\gamma) = 0$. For assume otherwise, so that $g(\gamma) \neq 0$ and $\gamma - \beta \neq 0$. But then both $\gamma - \beta$ and $g(\gamma)$ are non-trivial zero divisors in $\mathbb{Z}_p$. This contradicts Proposition 9.4, which asserts that there are no non-trivial zero divisors in $\mathbb{Z}_p$ whenever $p$ is prime. We conclude that $g(\gamma) = 0$. Since every root of $f(x)$ is either equal to $\beta$ or one of the at most $d - 1$ roots of $g(x)$, we conclude that $f(x)$ has at most $d$ roots.

[15] Proposition 5.14 in Frank Zorzitto, A Taste of Number Theory.

Example 12.3. Let us solve the polynomial congruence
$$x^{49} + 2x^{33} + 24 \equiv 0 \pmod{119}.$$
Note that $119 = 7 \cdot 17$. By Proposition 12.1, there is a one-to-one correspondence between the solutions of the above congruence and the solutions of the system of congruences
$$\begin{cases} x^{49} + 2x^{33} + 24 \equiv 0 \pmod{7}; \\ x^{49} + 2x^{33} + 24 \equiv 0 \pmod{17}. \end{cases}$$
Let us solve each of these congruences separately.

Consider the case $n = 7$ with $\varphi(7) = 6$. Note that $x \equiv 0 \pmod{7}$ is not a solution. This means that $\gcd(x, 7) = 1$, so we may apply Euler's Theorem:
$$x^{49} + 2x^{33} + 24 \equiv x^{8 \cdot 6 + 1} + 2x^{5 \cdot 6 + 3} + 24 \equiv x + 2x^3 + 24 \equiv 2x^3 + x + 3 \pmod{7}.$$
Thus we need to solve the congruence $2x^3 + x + 3 \equiv 0 \pmod{7}$. After evaluating the left-hand side at $x = 1, 2, 3, 4, 5, 6$, we can convince ourselves that there are only two solutions, namely $x \equiv 2 \pmod{7}$ and $x \equiv 6 \pmod{7}$.

Consider the case $n = 17$ with $\varphi(17) = 16$.
Note that $x \equiv 0 \pmod{17}$ is not a solution. This means that $\gcd(x, 17) = 1$, so we may apply Euler's Theorem:
$$x^{49} + 2x^{33} + 24 \equiv x^{3 \cdot 16 + 1} + 2x^{2 \cdot 16 + 1} + 24 \equiv x + 2x + 24 \equiv 3x + 24 \pmod{17}.$$
Thus we need to solve the congruence $3x + 24 \equiv 0 \pmod{17}$. We see that $x \equiv -8 \equiv 9 \pmod{17}$ is a solution. Since 17 is prime, it follows from Proposition 12.2 that this is the only solution.

Since there are two solutions modulo 7 and only one solution modulo 17, we conclude that there are $2 \cdot 1 = 2$ solutions modulo $7 \cdot 17 = 119$. These solutions correspond to two systems of congruences:
$$\begin{cases} x \equiv 2 \pmod{7}, \\ x \equiv 9 \pmod{17}; \end{cases} \qquad \text{and} \qquad \begin{cases} x \equiv 6 \pmod{7}, \\ x \equiv 9 \pmod{17}. \end{cases}$$
We can compute solutions modulo 119 using the Extended Euclidean Algorithm.

Consider the first system of congruences. Since 7 and 17 are coprime, by Bézout's lemma there exists a solution to $7x + 17y = 1$, for example $x = 5$ and $y = -2$. Multiplying both sides of this equality by $9 - 2 = 7$ gives a solution to $7x_0 + 17y_0 = 9 - 2 = 7$, namely $x_0 = 7 \cdot 5 = 35$ and $y_0 = 7 \cdot (-2) = -14$. But then $x_1 = 2 + 7x_0 = 9 - 17y_0 = 247$ satisfies $x_1 \equiv 2 \pmod{7}$ and $x_1 \equiv 9 \pmod{17}$. Therefore $x_1 \equiv 247 \equiv 9 \pmod{119}$ is a solution. The second system of congruences can be solved analogously and gives the solution $x_2 \equiv 111 \pmod{119}$.

Exercise 12.4. Give examples of polynomials with coefficients in $\mathbb{Z}_8$ and $\mathbb{Z}_{15}$ for which the conclusion of Proposition 12.2 does not hold.

13 The Discrete Logarithm Problem. The Order of Elements in $\mathbb{Z}_n^*$

Let $n$ be a modulus. We have already looked at certain kinds of equations in $\mathbb{Z}_n$. For example, in Section 6, we learned that neither $[x]^2 + [y]^2 = [3]$ in $\mathbb{Z}_4$ nor $[x]^2 + [y]^2 + [z]^2 = [7]$ in $\mathbb{Z}_8$ has solutions. In Section 8, we studied the equation $[a][x] = [b]$ in $\mathbb{Z}_n$, and saw that the usual application of the Extended Euclidean Algorithm produces all of its solutions. Now we want to understand how to handle exponential equations in $\mathbb{Z}_n^*$.
In these kinds of equations, we are given residue classes $[a]$ and $[b]$ from $\mathbb{Z}_n^*$, and we want to determine all integer solutions $x$ to the equation $[a]^x = [b]$. This is the same as solving the congruence $a^x \equiv b \pmod{n}$. The problem of finding solutions to these exponential equations is known as the discrete logarithm problem, or DLP.

Example 13.1. In Section 10, we already saw an example of an exponential equation in $\mathbb{Z}_n^*$, namely $a^x \equiv 1 \pmod{n}$. According to Euler's Theorem, this equation always has a non-zero solution whenever $a$ and $n$ are coprime. In particular, any $x \equiv 0 \pmod{\varphi(n)}$ satisfies the above congruence. Indeed, if $x = \varphi(n) k$ for some integer $k$, then
$$a^x \equiv a^{\varphi(n) k} \equiv (a^{\varphi(n)})^k \equiv 1^k \equiv 1 \pmod{n}.$$
However, we do not yet know whether these are the only solutions. Depending on the choice of $a$, there might be other solutions as well.

In general, the discrete logarithm problem is hard to solve. This problem lies at the foundation of certain cryptosystems, which we will study later, such as the ElGamal encryption scheme and the Diffie-Hellman key exchange. There are algorithms for solving the discrete logarithm problem, such as Shanks's baby-step giant-step algorithm or the number field sieve, but none of them is known to run in polynomial time. However, just as for the problem of integer factorization, there are quantum algorithms which solve the discrete logarithm problem in polynomial time. In these notes, when solving the discrete logarithm problem, we will use brute force or apply Euler's Theorem. In order to understand what the solutions to $a^x \equiv b \pmod{n}$ look like, we need to understand certain fundamental properties of the group of units $\mathbb{Z}_n^*$.

Definition 13.2. If $\alpha \in \mathbb{Z}_n^*$, the order of $\alpha$ is the smallest exponent $k \geq 1$ such that $\alpha^k = 1$. The order is denoted by $k = \operatorname{ord}(\alpha)$ or, if $\alpha = [a]$ for some integer $a$, by $k = \operatorname{ord}(a)$.

From Euler's Theorem, it follows that $\operatorname{ord}(\alpha) \leq \varphi(n)$ for all $\alpha \in \mathbb{Z}_n^*$.
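Since these notes solve small discrete logarithm problems by brute force, a minimal Python sketch of that approach may help. It computes $\operatorname{ord}(a)$ directly from Definition 13.2, then tries every exponent $0 \leq x < \operatorname{ord}(a)$. The function names `multiplicative_order` and `brute_force_dlog` are our own illustrative choices. The method needs about $\operatorname{ord}(a)$ multiplications, so it is only practical for small moduli.

```python
from math import gcd
from typing import List


def multiplicative_order(a: int, n: int) -> int:
    """Return ord(a): the smallest k >= 1 with a^k ≡ 1 (mod n), by brute force."""
    if gcd(a, n) != 1:
        raise ValueError("[a] must lie in Z_n^*, i.e. gcd(a, n) = 1")
    k, power = 1, a % n
    while power != 1 % n:          # 1 % n also covers the degenerate modulus n = 1
        power = (power * a) % n
        k += 1
    return k


def brute_force_dlog(a: int, b: int, n: int) -> List[int]:
    """Return every exponent 0 <= x < ord(a) with a^x ≡ b (mod n)."""
    k = multiplicative_order(a, n)
    return [x for x in range(k) if pow(a, x, n) == b % n]


print(multiplicative_order(3, 17))   # 16
print(multiplicative_order(9, 17))   # 8
print(brute_force_dlog(3, 13, 17))   # [4], since 3^4 = 81 ≡ 13 (mod 17)
```

Searching only $0 \leq x < \operatorname{ord}(a)$ loses nothing, because $a^{x + \operatorname{ord}(a)} \equiv a^x \pmod{n}$. The loop in `multiplicative_order` terminates because Euler's Theorem guarantees $a^{\varphi(n)} \equiv 1 \pmod{n}$.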
In fact, a much stronger result holds.

Proposition 13.3 [16]. Let $\alpha \in \mathbb{Z}_n^*$. A positive integer $m$ satisfies $\alpha^m = 1$ if and only if $\operatorname{ord}(\alpha) \mid m$. Consequently, $\operatorname{ord}(\alpha) \mid \varphi(n)$.

Proof. Let $k = \operatorname{ord}(\alpha)$ and suppose that $\alpha^m = 1$. We apply the Remainder Theorem and write $m = kq + r$, where $0 \leq r < k$. Then, since $\alpha^k = 1$, we obtain
$$1 = \alpha^m = \alpha^{kq + r} = (\alpha^k)^q \alpha^r = 1^q \alpha^r = \alpha^r.$$
Since $k$ is the smallest positive integer satisfying $\alpha^k = 1$ and $0 \leq r < k$, it must be the case that $r = 0$, so $k \mid m$. For the converse, let $m = kq$. Then $\alpha^m = \alpha^{kq} = (\alpha^k)^q = 1^q = 1$. Finally, according to Euler's Theorem, $\alpha^{\varphi(n)} = 1$. It then follows from what we proved above that $\operatorname{ord}(\alpha) \mid \varphi(n)$.

Example 13.4. Let us determine $\operatorname{ord}(\alpha)$ in $\mathbb{Z}_n^*$ for $n = 17$ and $\alpha = [3]$. We have $\varphi(n) = 16$. Note that $D = \{1, 2, 4, 8, 16\}$ is the complete list of positive divisors of $\varphi(n)$. It follows from Proposition 13.3 that $\operatorname{ord}(\alpha) \in D$. Thus, in order to find the order of $\alpha$, we just need to go through the elements of $D$ in increasing order. The smallest element $d \in D$ satisfying $[3]^d = [1]$ is the order. We have
$$3^1 \equiv 3, \quad 3^2 \equiv 9, \quad 3^4 \equiv (3^2)^2 \equiv 9^2 \equiv 81 \equiv -4, \quad 3^8 \equiv (3^4)^2 \equiv (-4)^2 \equiv 16 \equiv -1, \quad 3^{16} \equiv (3^8)^2 \equiv (-1)^2 \equiv 1 \pmod{17}.$$
Thus we see that $\operatorname{ord}(\alpha) = 16$, which is the largest possible order that an element of $\mathbb{Z}_{17}^*$ can attain. Note that there was no need for us to compute $3^{16}$ modulo 17, because we know the result from Euler's Theorem. In contrast, consider the element $\beta = [9]$ in $\mathbb{Z}_{17}^*$. We have $1 \equiv 3^{16} \equiv (3^2)^8 \equiv 9^8 \pmod{17}$, which means that $\operatorname{ord}(\beta) \mid 8$. Convince yourself that, in fact, $\operatorname{ord}(\beta) = 8$.

[16] Proposition 5.5 in Frank Zorzitto, A Taste of Number Theory.

Proposition 13.3 allows us to classify all solutions to the exponential equation $[a]^x = [b]$.

Proposition 13.5. Let $[a], [b]$ be elements of $\mathbb{Z}_n^*$. If $x$ satisfies the equation $[a]^x = [b]$, then all solutions $x'$ to this equation satisfy $x' \equiv x \pmod{\operatorname{ord}(a)}$.

Proof. Let $x$ be a solution to $a^x \equiv b \pmod{n}$ and let $k = \operatorname{ord}(a)$.
By the Remainder Theorem, we can write $x = kq + r$, where $0 \leq r < k$. But then
$$a^x \equiv a^{kq + r} \equiv (a^k)^q \cdot a^r \equiv 1 \cdot a^r \equiv a^r \pmod{n}.$$
Thus, without loss of generality, we may assume that $0 \leq x < k$. Now suppose that there exists some other $x'$ such that $a^{x'} \equiv b \pmod{n}$. Once again, without loss of generality we may assume that $0 \leq x \leq x' < k$. But then $a^x \equiv b \equiv a^{x'} \pmod{n}$. Since $a$ is invertible modulo $n$, this implies
$$a^{x' - x} \equiv 1 \pmod{n}.$$
Since $0 \leq x' - x < k$, it must be the case that $x = x'$. Otherwise we would contradict the fact that $k$ is the smallest positive integer satisfying $a^k \equiv 1 \pmod{n}$. Therefore all solutions to $[a]^x = [b]$ are of the form $x' \equiv x \pmod{\operatorname{ord}(a)}$.

Example 13.6. Let us compare the solutions to the exponential equations $3^x \equiv 1 \pmod{17}$ and $9^y \equiv 1 \pmod{17}$. In the first case, the congruence $x \equiv 0 \pmod{16}$ captures all solutions. The second case is different. The congruence $y \equiv 0 \pmod{16}$ does provide solutions, but it does not cover all of the possibilities: for example, $y = 8$ also satisfies $9^y \equiv 1 \pmod{17}$. In fact, Proposition 13.5 implies that the solutions are of the form $y \equiv 0 \pmod{8}$.

We conclude this section with several general observations about orders of elements of $\mathbb{Z}_n^*$.

Proposition 13.7 [17]. If $\alpha \in \mathbb{Z}_n^*$ and $k = \operatorname{ord}(\alpha)$, then the list $\alpha, \alpha^2, \alpha^3, \dots, \alpha^k = 1$ does not repeat itself.

Proof. Suppose that we have a repetition $\alpha^i = \alpha^j$, where $1 \leq i < j \leq k$. Then $\alpha^{j-i} = 1$. Since $1 \leq j - i < k$, this contradicts the minimality of $k$ as the order of $\alpha$.

Proposition 13.8 [18]. If $\alpha \in \mathbb{Z}_n^*$ and $k = \operatorname{ord}(\alpha)$, then
$$\operatorname{ord}(\alpha^j) = \frac{k}{\gcd(j, k)}.$$

Proof. Let $\operatorname{ord}(\alpha^j) = \ell$. We will show that $\ell = k/\gcd(j, k)$. Note that $\alpha^{j\ell} = (\alpha^j)^\ell = 1$. It follows from Proposition 13.3 that $k \mid j\ell$, that is, $j\ell = ku$ for some integer $u$. But then
$$\frac{j}{\gcd(j, k)} \, \ell = \frac{k}{\gcd(j, k)} \, u.$$
Since $j/\gcd(j, k)$ and $k/\gcd(j, k)$ are coprime, it follows from Proposition 3.13 that $k/\gcd(j, k)$ divides $\ell$.
On the other hand, since $k$ is the order of $\alpha$,
$$(\alpha^j)^{k/\gcd(j,k)} = (\alpha^k)^{j/\gcd(j,k)} = 1^{j/\gcd(j,k)} = 1.$$
By Proposition 13.3 applied to the order of $\alpha^j$, we obtain that $\ell \mid k/\gcd(j, k)$. Since $k/\gcd(j, k) \mid \ell$ and $\ell \mid k/\gcd(j, k)$, we conclude that $\ell = k/\gcd(j, k)$.

Corollary 13.9 [19]. Let $\alpha$ be an element of $\mathbb{Z}_n^*$. Then $\operatorname{ord}(\alpha^j) = \operatorname{ord}(\alpha)$ if and only if $\gcd(j, \operatorname{ord}(\alpha)) = 1$.

Proposition 13.10 [20]. Let $\alpha, \beta$ in $\mathbb{Z}_n^*$ have orders $k$ and $\ell$, respectively. If $k$ and $\ell$ are coprime, then $\operatorname{ord}(\alpha\beta) = k\ell$.

[17] Proposition 5.6 in Frank Zorzitto, A Taste of Number Theory.
[18] Proposition 5.7 in Frank Zorzitto, A Taste of Number Theory.
[19] Proposition 5.9 in Frank Zorzitto, A Taste of Number Theory.
[20] Proposition 5.16 in Frank Zorzitto, A Taste of Number Theory.

Proof. Let $m = \operatorname{ord}(\alpha\beta)$. Since
$$(\alpha\beta)^{k\ell} = \alpha^{k\ell} \beta^{k\ell} = (\alpha^k)^\ell (\beta^\ell)^k = 1^\ell 1^k = 1,$$
we see from Proposition 13.3 that $m \mid k\ell$. We will now show that $k\ell \mid m$. Since $\gcd(k, \ell) = 1$, it follows from Proposition 3.12 that we only need to demonstrate $k \mid m$ and $\ell \mid m$. Since $(\beta^m)^\ell = (\beta^\ell)^m = 1^m = 1$, we have
$$(\alpha^m)^\ell = (\alpha^m)^\ell (\beta^m)^\ell = (\alpha^m \beta^m)^\ell = ((\alpha\beta)^m)^\ell = 1^\ell = 1.$$
Thus $\alpha^{m\ell} = 1$, and it follows from Proposition 13.3 that $k \mid m\ell$. Since $k$ and $\ell$ are coprime, Proposition 3.13 allows us to conclude that $k \mid m$. An analogous calculation shows that $(\beta^m)^k = 1$, which implies $\ell \mid m$. But then $k\ell \mid m$, and since we already demonstrated that $m \mid k\ell$, it must be the case that $m = k\ell$.

14 The Primitive Root Theorem

Let $n$ be a modulus. The elements $\alpha \in \mathbb{Z}_n^*$ whose order is equal to $\varphi(n)$ deserve special attention. According to Proposition 13.7, such an element generates the whole group $\mathbb{Z}_n^*$ through its powers $\alpha, \alpha^2, \dots, \alpha^{\varphi(n)} = 1$. Such elements are called primitive roots, and in this section we address the question of their existence in $\mathbb{Z}_n^*$.
We will answer this question only partially, by proving the Primitive Root Theorem.

Definition 14.1. An element $\alpha \in \mathbb{Z}_n^*$ is called a primitive root if $\operatorname{ord}(\alpha) = \varphi(n)$.

Example 14.2. Let us demonstrate that $\mathbb{Z}_{17}^*$ contains a primitive root. If we reduce the elements in the list $3, 3^2, 3^3, \dots, 3^{16}$ modulo 17, then the resulting list is
$$3, 9, 10, 13, 5, 15, 11, 16, 14, 8, 7, 4, 12, 2, 6, 1.$$
Note that all 16 elements are distinct and they constitute the whole of $\mathbb{Z}_{17}^*$. Not every element of $\mathbb{Z}_{17}^*$ is a primitive root. For example, the observation made above does not hold for the list $9, 9^2, 9^3, \dots, 9^{16}$ reduced modulo 17:
$$9, 13, 15, 16, 8, 4, 2, 1, 9, 13, 15, 16, 8, 4, 2, 1.$$
The first 8 elements are distinct, and starting from the 9th element the pattern repeats. Hence $9, 9^2, \dots, 9^{\varphi(17)} = 1$ do not produce $\mathbb{Z}_{17}^*$. This is not a surprise, because from Example 13.4 we know that $\operatorname{ord}(9) = 8$.

Some groups have no primitive roots at all. For example, there are no primitive roots in $\mathbb{Z}_n^*$ whenever $n$ has at least two distinct odd prime divisors. Examples include $\mathbb{Z}_{15}^*$, $\mathbb{Z}_{21}^*$ and $\mathbb{Z}_{35}^*$. We leave it as an exercise to the reader to verify that none of these three groups has a primitive root.

Before jumping into the proof of the Primitive Root Theorem, let us determine how many primitive roots there are in $\mathbb{Z}_n^*$.

Proposition 14.3 [21]. If $\mathbb{Z}_n^*$ has a primitive root, then the total number of primitive roots in $\mathbb{Z}_n^*$ is $\varphi(\varphi(n))$.

Proof. Let $\alpha$ be a primitive root, so that $\operatorname{ord}(\alpha) = \varphi(n)$ and $\alpha, \alpha^2, \dots, \alpha^{\varphi(n)} = 1$ cover all of $\mathbb{Z}_n^*$ without repetition. The other primitive roots are those powers $\alpha^j$ in the list for which $\operatorname{ord}(\alpha^j) = \varphi(n) = \operatorname{ord}(\alpha)$. According to Corollary 13.9, these are the powers $\alpha^j$ where $j$, ranging from 1 to $\varphi(n)$, is coprime to $\varphi(n)$. There are precisely $\varphi(\varphi(n))$ such $j$.

We are now ready to state the Primitive Root Theorem.

Theorem 14.4 (The Primitive Root Theorem) [22]. Let $p$ be prime. Then $\mathbb{Z}_p^*$ contains a primitive root.
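The following Python sketch searches for primitive roots numerically; the helper names are our own, hypothetical choices. Instead of listing all powers as in Example 14.2, it relies on Proposition 13.3. Since $\operatorname{ord}(a) \mid \varphi(n)$, the order of $a$ is smaller than $\varphi(n)$ exactly when $a^{\varphi(n)/q} \equiv 1 \pmod{n}$ for some prime $q \mid \varphi(n)$. Factorizations and $\varphi$ are computed by naive trial division, so the sketch is only meant for small $n$.

```python
from math import gcd
from typing import List, Set


def prime_divisors(m: int) -> Set[int]:
    """Distinct prime divisors of m >= 1, found by trial division."""
    divisors, q = set(), 2
    while q * q <= m:
        while m % q == 0:
            divisors.add(q)
            m //= q
        q += 1
    if m > 1:
        divisors.add(m)
    return divisors


def euler_phi(m: int) -> int:
    """phi(m) = m * prod(1 - 1/q) over the distinct primes q dividing m."""
    result = m
    for q in prime_divisors(m):
        result -= result // q
    return result


def is_primitive_root(a: int, n: int) -> bool:
    """True if [a] lies in Z_n^* and ord(a) = phi(n)."""
    if gcd(a, n) != 1:
        return False
    t = euler_phi(n)
    # ord(a) divides t, so ord(a) < t exactly when a^(t/q) ≡ 1 for some prime q | t.
    return all(pow(a, t // q, n) != 1 for q in prime_divisors(t))


def primitive_roots(n: int) -> List[int]:
    """All primitive roots a of Z_n^* with 1 <= a < n."""
    return [a for a in range(1, n) if is_primitive_root(a, n)]


print([pow(3, i, 17) for i in range(1, 17)])   # reproduces the list in Example 14.2
print(primitive_roots(17))                     # [3, 5, 6, 7, 10, 11, 12, 14]
print(len(primitive_roots(17)) == euler_phi(euler_phi(17)))  # True
print(primitive_roots(15))                     # []
```

For $n = 17$ the eight primitive roots match the count $\varphi(\varphi(17)) = \varphi(16) = 8$ predicted by Proposition 14.3.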
[21] Proposition 5.10 in Frank Zorzitto, A Taste of Number Theory.
[22] Theorem 5.17 in Frank Zorzitto, A Taste of Number Theory.

If you are familiar with the basics of group theory, then you can translate the statement of the theorem into group-theoretic language: the group $\mathbb{Z}_p^*$ is cyclic whenever $p$ is prime. In order to prove this result, we need one lemma.

Lemma 14.5 [23]. Let $p$ be prime. If $\alpha$ is an element of $\mathbb{Z}_p^*$ of order $k$, then $\alpha, \alpha^2, \dots, \alpha^{k-1}, \alpha^k = 1$ is the complete, non-repeating list of all $\beta$ in $\mathbb{Z}_p^*$ such that $\beta^k = 1$.

Proof. According to Proposition 13.7, the list $\alpha, \alpha^2, \dots, \alpha^k$ contains no repetitions. Every $\alpha^j$ in the list satisfies $(\alpha^j)^k = (\alpha^k)^j = 1^j = 1$. Hence every element in the list is a root of the polynomial $x^k - 1$. We have found $k$ distinct roots of $x^k - 1$, a polynomial of degree $k$, so it follows from Proposition 12.2 that there are no other roots.

[23] Proposition 5.15 in Frank Zorzitto, A Taste of Number Theory.

Proof (of Theorem 14.4). Let $\alpha$ be an element of $\mathbb{Z}_p^*$. If $\operatorname{ord}(\alpha) = p - 1$, then $\alpha$ is a primitive root, so we are done. Thus we may assume that $k = \operatorname{ord}(\alpha) < p - 1$. According to Lemma 14.5, the list $\alpha, \alpha^2, \dots, \alpha^k = 1$ picks up all roots of $x^k - 1$ in $\mathbb{Z}_p^*$. Since $k < p - 1$, there is some $\gamma$ in $\mathbb{Z}_p^*$ which is not on this list. Hence $\gamma^k \neq 1$. Let $\ell = \operatorname{ord}(\gamma)$. Notice that $\ell \nmid k$, for otherwise we would have $\gamma^k = (\gamma^\ell)^{k/\ell} = 1^{k/\ell} = 1$. This means that in the unique factorizations of $k$ and $\ell$, there is a prime number $q$ that appears more often in $\ell$ than it does in $k$. Therefore $k = q^d k_1$ and $\ell = q^e \ell_1$, where $0 \leq d < e$ and $q \nmid k_1$, $q \nmid \ell_1$.

Let $\beta = \alpha^{q^d} \gamma^{\ell_1}$. Then, according to Proposition 13.8,
$$\operatorname{ord}(\alpha^{q^d}) = \frac{k}{\gcd(k, q^d)} = \frac{k}{q^d} = k_1, \qquad \operatorname{ord}(\gamma^{\ell_1}) = \frac{\ell}{\gcd(\ell, \ell_1)} = \frac{\ell}{\ell_1} = q^e.$$
Since $k_1$ and $q^e$ are coprime, it follows from Proposition 13.10 that
$$\operatorname{ord}(\beta) = \operatorname{ord}(\alpha^{q^d} \gamma^{\ell_1}) = \operatorname{ord}(\alpha^{q^d}) \operatorname{ord}(\gamma^{\ell_1}) = q^e k_1 > q^d k_1 = k = \operatorname{ord}(\alpha).$$
Repeating this argument with $\beta$ in place of $\alpha$, we find elements of strictly increasing order in $\mathbb{Z}_p^*$. Since no order can exceed $\varphi(p) = p - 1$, the process must stop, and it can only stop at an element of order $p - 1$. By definition, this element is a primitive root.

In conclusion, we state the Generalized Primitive Root Theorem, which gives a full classification of the moduli $n$ such that $\mathbb{Z}_n^*$ contains a primitive root. Due to time limitations, we will not prove this result.

Theorem 14.6 (Generalized Primitive Root Theorem). The group of units $\mathbb{Z}_n^*$ contains a primitive root if and only if $n$ is $2$, $4$, an odd prime power, or twice an odd prime power.

15 Big-O Notation

Before we proceed to the discussion of primality tests and integer factorization algorithms, let us introduce several important definitions. When analyzing the performance of algorithms, we will often use the big-O notation and the notions of polynomial time, subexponential time and exponential time algorithms.

Definition 15.1. Let $f(n)$ and $g(n)$ be two functions of $n$. We say that $f(n) = O(g(n))$ if there exists a positive real number $M$ such that $|f(n)| \leq M |g(n)|$ for all sufficiently large $n$.

Example 15.2. Let $f(n) = n^2 + 4n + 7$ and $g(n) = n^3$. Note that
$$f(1) = 12 > g(1) = 1, \quad f(2) = 19 > g(2) = 8, \quad f(3) = 28 > g(3) = 27, \quad f(4) = 39 < g(4) = 64, \quad f(5) = 52 < g(5) = 125.$$
So $f(n)$ dominates $g(n)$ for $n = 1, 2, 3$, but the pattern changes for $n = 4, 5$. In fact, $f(n) < g(n)$ for all $n \geq 4$. Thus $f(n) = O(g(n))$. Note, however, that $g(n) \neq O(f(n))$.

Another example is $f(n) = e^n$ and $g(n) = 5e^n + e^{n/2}$. Evidently, $f(n) \leq g(n)$, so $f(n) = O(g(n))$. However, one may also notice that $g(n) = O(f(n))$. Indeed, $e^{n/2} \leq e^n$, and this implies that
$$g(n) = 5e^n + e^{n/2} \leq 5e^n + e^n = 6e^n = 6 f(n).$$
In this case, we say that $f(n)$ and $g(n)$ have the same asymptotic behaviour as $n$ approaches infinity.
The big-O notation is used to simplify $f(n)$ whenever we are interested not in its precise form, but rather in its behaviour for very large $n$. For example, the function $f(n) = n^5 + 2e^n + 3\log(n)$ simplifies to $f(n) = O(e^n)$, because $2e^n$ dominates all other summands (note that $3\log(n) < n^5 < 2e^n$ for sufficiently large $n$). Also, according to our definition, we may ignore the constant 2 in front of $2e^n$, because it is absorbed into the constant $M$ implicit in the expression $f(n) = O(e^n)$. Thus, when writing an expression in big-O form, we need to identify some "simple" function that dominates $f(n)$, chosen as tightly as possible. In the example above, we could have written $f(n) = O(e^{2n})$. However, this is a weaker estimate than $f(n) = O(e^n)$, because $e^{2n}$ grows much faster than $e^n$. Thus the expression $f(n) = O(e^n)$ tells us more about the function $f(n)$ than the expression $f(n) = O(e^{2n})$.

The most common types of functions that we will encounter are
• $O(1)$: at most constant growth;
• $O(\log n)$: at most logarithmic growth;
• $O(n^k)$: at most polynomial growth ($k > 0$);
• $O(\exp(c n^{1/k}))$: at most subexponential growth ($c > 0$, $k > 1$);
• $O(\exp(cn))$: at most exponential growth ($c > 0$).

When analyzing the performance of algorithms, the function $f(n)$ will represent the number of steps required for the algorithm to terminate given the input $n$. For example, Gabriel Lamé proved that the computation of $\gcd(a, b)$ with the Euclidean Algorithm requires at most five times as many division steps as there are decimal digits in $\min\{a, b\}$. This allows us to conclude that the running time of the Euclidean Algorithm is $O(\log(\min\{a, b\}))$. So the number of steps required for the algorithm to terminate grows at most logarithmically as $\min\{a, b\}$ approaches infinity.

Definition 15.3. Suppose that an algorithm takes a positive integer $n$ as its input.
We say that the algorithm works in polynomial time if there exists a positive real number $k$ such that the number of steps required for it to compute is $O((\log n)^k)$.

Once again, consider the Euclidean Algorithm. The number of steps required to compute $\gcd(a, b)$ is $O(\log(\min\{a, b\}))$, so we may take $k = 1$ to conclude that the algorithm works in polynomial time. This may seem a bit strange, because $(\log n)^k$ is not a polynomial in $n$ (compare it to, say, $n^2$ or $n^3 + n + 7$, which are polynomials). But when talking about an algorithm, we are interested in its performance not with respect to the input $n$ itself, but with respect to the size of the input. You may think of the size of $n$ as the number of decimal digits of $n$. This number equals $\lfloor \log_{10} n \rfloor + 1$, so it is logarithmic in terms of $n$. So, roughly speaking, if we provide $n = 1000000$ as an input to some algorithm, we would consider it efficient if it terminates in $7^k$ steps for some positive integer $k$ rather than in $1000000^k$ steps (note that 7 is the number of decimal digits of $n$). From this perspective, any algorithm which takes $O(n) = O(e^{\log n})$ steps would actually be considered an exponential time algorithm. Such algorithms can be used only for relatively small values of $n$.

Example 15.4. Here are some examples of famous algorithms and their asymptotic running times.
• One of the classical fast algorithms for integer multiplication is the Toom-Cook Multiplication Algorithm, which was invented in 1963. Let $a$ and $b$ be two positive integers and let $d = \log(\max\{a, b\})$. The simplest variant of this algorithm, which is essentially Karatsuba's algorithm, requires $O(d^{\log_2 3})$ steps, where $\log_2 3 \approx 1.585$, so it works in polynomial time. Asymptotically faster algorithms, such as the Schönhage-Strassen algorithm, are also known;
• Shanks's Baby-Step Giant-Step Algorithm, which was invented in 1971, allows one to compute discrete logarithms modulo $n$.
If $d = \log n$, then the running time of the algorithm is $O(\sqrt{n}) = O(e^{d/2})$, so it works in exponential time;
• The General Number Field Sieve is the fastest known algorithm for factoring large integers. If $n$ is an integer and $d = \log n$, the algorithm works in $O(\exp(2 d^{1/3} (\log d)^{2/3}))$ steps. The constant 2 in this expression is not optimal. We see that this algorithm is neither polynomial nor exponential. Algorithms of this type are called subexponential.

16 Primality Testing

For more details, please refer to the monograph by R. Crandall, C. Pomerance, Prime Numbers: A Computational Perspective, 2001.

As was mentioned in the introduction, number theory is heavily used in cryptography. In the upcoming sections, we will look at several cryptographic protocols, all of which involve primality testing in one way or another. For example, to ensure that communication via the RSA cryptosystem is secure, one has to be able to generate a pair of very large prime numbers (several thousands of bits). But how can we verify that a given number $n$ is prime, when factoring large integers is believed to be infeasible for classical computers? It turns out that there are several ways to verify that $n$ is prime which do not require the factorization of $n$. There are three kinds of primality tests:
1. Heuristic tests: tests that work well in practice, but rely on a heuristic explanation rather than on a proof (Fermat's Primality Test);
2. Probabilistic tests: given $n$, these tests verify whether $n$ is a pseudoprime, i.e., whether it is prime with a very large probability (Miller-Rabin Primality Test);
3. Deterministic tests: given $n$, these tests guarantee the primality or the compositeness of $n$ (trial division, AKS Primality Test, Elliptic Curve Primality Test).
In this section, we will study the trial division method, Fermat's Primality Test and the Miller-Rabin Primality Test.
We remark that the best known primality test, the AKS Primality Test, was invented by the Indian mathematicians Manindra Agrawal, Neeraj Kayal and Nitin Saxena in 2002. To this day, it is the only deterministic unconditional polynomial time algorithm for primality testing. In 2005, C. Pomerance and H. W. Lenstra, Jr. improved its asymptotic running time to $\tilde{O}((\log n)^6)$. Despite all of its benefits, the AKS test is used in practice less often than the probabilistic Miller-Rabin Primality Test. If $k$ denotes the number of times the algorithm has to run before we conclude that $n$ is a pseudoprime, the asymptotic running time of the Miller-Rabin Primality Test is $O(k (\log n)^3)$.

16.1 Trial Division

What is the most obvious way of determining whether a given integer $n \geq 2$ is composite? Well, one just has to find one of its non-trivial factors! That is, if we can show that there exists some integer $d$ such that $d \mid n$ and $1 < d < n$, then $n$ is composite. For example, if $n = 35$, we check that $2 \nmid 35$, $3 \nmid 35$, $4 \nmid 35$, until we find out that $5 \mid 35$. Therefore, 35 is a composite number. For $n = 37$, however, a problem arises. Now we have to check $2 \nmid 37, 3 \nmid 37, \dots, 36 \nmid 37$ before we can conclude that $n$ is prime. Fortunately, as the following proposition shows, there is no need to check all $n - 2$ numbers between 1 and $n$ to be certain that $n$ is prime.

Proposition 16.1. For any composite integer $n \geq 2$ there exists a divisor $d$ of $n$ such that $1 < d \leq \sqrt{n}$. Furthermore, we may assume that $d$ is prime.

Proof. Let $n = dk$ for some non-trivial divisors $d$ and $k$. If we now suppose that both $d > \sqrt{n}$ and $k > \sqrt{n}$, then $dk > n$, a contradiction. Therefore either $1 < d \leq \sqrt{n}$ or $1 < k \leq \sqrt{n}$ holds. Without loss of generality, assume the former. Theorem 2.7 asserts the existence of a prime $p$ dividing $d$. Since $d \leq \sqrt{n}$, we see that $1 < p \leq d \leq \sqrt{n}$.

Now we may adjust our primality test as follows. Let $\lfloor x \rfloor$ denote the largest integer $\leq x$.
According to Proposition 16.1, in order to verify that $n$ is prime, we just have to ensure that
$$2 \nmid n, \quad 3 \nmid n, \quad \dots, \quad \lfloor \sqrt{n} \rfloor \nmid n.$$
For example, in the case of $n = 37$, we have $\lfloor \sqrt{37} \rfloor = \lfloor 6.083\ldots \rfloor = 6$, and $2 \nmid 37, 3 \nmid 37, \dots, 6 \nmid 37$. Therefore 37 is prime. Thus we were able to reduce the number of steps in our primality test from $n - 2$ to $\lfloor \sqrt{n} \rfloor - 1$. Quite a significant improvement!

We can actually do slightly better. According to Proposition 16.1, we can limit ourselves to prime divisors of $n$. So, in the case of $n = 37$, there was no need to check divisibility by 4 or 6, since these numbers are composite. We could reach the same conclusion simply by testing $2 \nmid 37$, $3 \nmid 37$ and $5 \nmid 37$. In order to make this further improvement, we need to know all prime numbers $\leq \sqrt{n}$. Fortunately, there is a rather simple method called the Sieve of Eratosthenes, which produces all prime numbers up to $X$ in $O(X \log\log X)$ steps (see Assignment 3). The method was discovered by the Greek mathematician Eratosthenes of Cyrene (≈ 250 BC), and goes as follows:
1. Initialize a table $A$ of $X$ elements by setting $A[1] = 1$ and $A[i] = 0$ for all $2 \leq i \leq X$;
2. Let $p = 2$;
3. Set $A[2p] = 1$, $A[3p] = 1$, $A[4p] = 1$, and so on, for all multiples of $p$ in the table $A$ other than $p$ itself;
4. Change $p$ to the smallest index $k > p$ such that $A[k] = 0$. If $p > \sqrt{X}$, terminate. Otherwise, return to step 3.
In the end, the indices $i$ such that $A[i] = 0$ are exactly the prime numbers up to $X$. It follows from Mertens' Second Theorem that the asymptotic running time of the Sieve of Eratosthenes is $O(X \log\log X)$ (see Assignment 3). The sieve can be sped up by a constant factor if we start eliminating not from $2p$ (i.e. $2p, 3p, 4p$, and so on), but from $p^2$, thus crossing out $p^2, (p+1)p, (p+2)p$, etc. This works because by the time the algorithm reaches a prime $p$, the numbers $2p, 3p, \dots, (p-1)p$ have already been eliminated by some prime less than $p$. Note, however, that the bound $O(X \log\log X)$ stays the same.
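Here is a minimal Python sketch of the sieve, including the improvement of starting at $p^2$ described above. It is followed by trial division against the resulting table of primes. This is an illustration under our own conventions, not code from the notes. The table is a zero-indexed `bytearray` in which a 1 marks a crossed-out index, as in the table $A$. The names `sieve_of_eratosthenes` and `smallest_prime_factor` are hypothetical.

```python
from math import isqrt
from typing import List, Optional


def sieve_of_eratosthenes(limit: int) -> List[int]:
    """Return all primes p <= limit in increasing order."""
    if limit < 2:
        return []
    crossed_out = bytearray(limit + 1)      # plays the role of the table A
    crossed_out[0] = crossed_out[1] = 1     # index 0 is a placeholder; 1 is not prime
    for p in range(2, isqrt(limit) + 1):    # stop once p > sqrt(limit)
        if not crossed_out[p]:              # p is the next uncrossed index, hence prime
            # Cross out p^2, (p+1)p, (p+2)p, ...; smaller multiples are already gone.
            start = p * p
            crossed_out[start::p] = b"\x01" * len(range(start, limit + 1, p))
    return [i for i in range(2, limit + 1) if not crossed_out[i]]


def smallest_prime_factor(n: int, primes: List[int]) -> Optional[int]:
    """Smallest prime p <= sqrt(n) dividing n >= 2, or None if n is prime.

    Assumes `primes` contains every prime up to isqrt(n), in increasing order.
    """
    root = isqrt(n)
    for p in primes:
        if p > root:
            break
        if n % p == 0:
            return p
    return None


primes = sieve_of_eratosthenes(10**6)
print(len(primes))                          # 78498
print(smallest_prime_factor(35, primes))    # 5
print(smallest_prime_factor(37, primes))    # None, so 37 is prime
print(smallest_prime_factor(323, primes))   # 17
```

With the table of the 78498 primes below one million built once, each query for $n \leq 10^{12}$ needs at most that many divisions.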
Of course, it is impractical to run the Sieve of Eratosthenes up to $\sqrt{n}$ each time we try to factor $n$, as then the running time will always be at least of order $\sqrt{n}$. This is why in practice one usually runs the Sieve of Eratosthenes up to some large bound first, stores all prime numbers in a table, and later uses this table to factor integers. It follows from the Prime Number Theorem that the number of primes $\leq X$ is $O(X / \log X)$. So, assuming that the table of prime numbers up to $\sqrt{n}$ is given to us a priori, trial division will now take $O(\sqrt{n} / \log n)$ steps instead of $O(\sqrt{n})$. Note the power of this method: for example, given a number $n \leq 10^{12}$, we just have to check whether $p \mid n$ for all primes $p \leq 10^6$. Given the table containing the 78498 prime numbers less than a million, a computer can do this verification almost immediately. In fact, this method should work quite fast for all numbers with at most 18 decimal digits. However, when the number of digits of $n$ exceeds 18, things start to get more complicated: there are too many prime numbers to check, and it is difficult to fit all of them into memory at once.

16.2 Fermat's Primality Test

Another interesting way of demonstrating that a number $n$ is composite is to use Fermat's Little Theorem. It states that if $n$ is prime and $a$ is coprime to $n$, then $a^n \equiv a \pmod{n}$. Therefore, to prove that $n$ is composite, all we have to do is find $a$ such that $a^n \not\equiv a \pmod{n}$. If $a$ satisfies this property, we call it a witness for the non-primality of $n$. In practice, $a^n \bmod n$ can be computed relatively fast using the Double-and-Add Algorithm.

Example 16.2. Let us use Fermat's Primality Test to prove that $n = 323$ is not prime. Note that $323 = 2^8 + 2^6 + 2 + 1 = 256 + 64 + 2 + 1$. Now pick a random $a$ such that $1 < a < 323$, say $a = 5$. If $n$ is prime, then Fermat's Little Theorem should hold for $a$.
We use the Double-and-Add Algorithm to check whether this is the case:
$$5^2 \equiv 25, \quad 5^4 \equiv (5^2)^2 \equiv 302, \quad 5^8 \equiv (5^4)^2 \equiv 118, \quad 5^{16} \equiv (5^8)^2 \equiv 35,$$
$$5^{32} \equiv (5^{16})^2 \equiv 256, \quad 5^{64} \equiv (5^{32})^2 \equiv 290, \quad 5^{128} \equiv (5^{64})^2 \equiv 120, \quad 5^{256} \equiv (5^{128})^2 \equiv 188 \pmod{323}.$$
Thus
$$5^{323} \equiv 5^{256} \cdot 5^{64} \cdot 5^2 \cdot 5 \equiv 188 \cdot 290 \cdot 25 \cdot 5 \equiv 256 \cdot 125 \equiv 23 \not\equiv 5 \pmod{323}.$$
This result allows us to conclude that 323 is not prime. Note, however, that we would not be able to draw any conclusion about $n$ if we randomly picked $a = 18$, $152$, $170$, or any other number for which $a^{323} \equiv a \pmod{323}$ actually holds. Fortunately, for 323 there are only 7 values of $a$ with $1 < a < 323$ such that $a^{323} \equiv a \pmod{323}$, so the probability of this happening is relatively small. And even if this happens, we could just pick another random value of $a$, for which $a^{323} \not\equiv a \pmod{323}$ might be true.

Example 16.2 makes the general algorithm clear. Let $n$ be an integer, and let $k \geq 1$ be the maximal number of times that we are going to choose $a$ at random. Then do the following:
1. Set $i = 0$;
2. If $i = k$, conclude that $n$ is a pseudoprime. Otherwise pick a random integer $a$ such that $1 < a < n$;
3. Compute $a^n \bmod n$ using the Double-and-Add Algorithm;
4. If $a^n \not\equiv a \pmod{n}$, conclude that $n$ is composite. Otherwise increment $i$ and go back to step 2.
According to this algorithm, we conclude that $n$ is a pseudoprime whenever $k$ random choices of $a$ all result in $a^n \equiv a \pmod{n}$. In practice, this algorithm works quite well, even though it is purely heuristic. However, some special composite numbers do not admit witnesses of their non-primality at all.

Definition 16.3. A composite integer $n$ is called a Carmichael number whenever $a^n \equiv a \pmod{n}$ for all integers $a$.

There exist infinitely many Carmichael numbers, and the first 10 of them are
$$561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341.$$
They were discovered by the American mathematician Robert Carmichael.
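A minimal Python sketch of the four steps above might look as follows. It uses Python's built-in three-argument `pow`, which performs fast modular exponentiation, in place of the Double-and-Add Algorithm. The function name `fermat_test` and the default of 20 rounds are our own choices. The last lines check Example 16.2 and confirm, by brute force over all residues, that 561 has no witnesses at all.

```python
import random


def fermat_test(n: int, k: int = 20) -> bool:
    """Return False if a witness shows that n is composite,
    True if n survives k random bases (n is reported as a pseudoprime)."""
    if n < 4:
        return n in (2, 3)
    for _ in range(k):                  # steps 1-2: at most k random choices of a
        a = random.randrange(2, n)      # 1 < a < n
        if pow(a, n, n) != a:           # steps 3-4: compare a^n mod n with a
            return False                # a is a witness, so n is composite
    return True


print(pow(5, 323, 323))                 # 23, as in Example 16.2
print([a for a in range(2, 323) if pow(a, 323, 323) == a])
# [18, 152, 153, 170, 171, 305, 322]: the 7 non-witnesses for n = 323
print(all(pow(a, 561, 561) == a for a in range(561)))   # True: 561 has no witness
print(fermat_test(561))                 # True for every run, although 561 = 3 * 11 * 17
```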
What is interesting is that a criterion for recognizing Carmichael numbers was found by the German mathematician Alwin Korselt in 1899, even before any Carmichael numbers were discovered.

Theorem 16.4 (Theorem 5.21 in Frank Zorzitto, A Taste of Number Theory). An integer $n$ is a Carmichael number if and only if
1. $n = p_1 p_2 \cdots p_k$, where $k > 1$ and the $p_j$ are distinct primes;
2. every $p_j - 1$ divides $n - 1$.

Therefore every Carmichael number will always be regarded as a pseudoprime by Fermat’s Primality Test, and this is unavoidable.

16.3 Miller-Rabin Primality Test

This test was originally developed by Gary Miller in 1976 as a deterministic algorithm, but its correctness relied on a reasonable but unproved conjecture, called the Extended Riemann Hypothesis. In 1980, Michael Rabin converted it into an unconditional, but probabilistic, algorithm. This is the algorithm that we are going to learn about.

To understand the idea behind the Miller-Rabin primality test, recall that the congruence $x^2 \equiv 1 \pmod{p}$ has exactly two solutions, namely $x \equiv \pm 1 \pmod{p}$, whenever $p$ is an odd prime. This simply follows from Proposition 12.2 applied to the quadratic polynomial $x^2 - 1$ with coefficients in $\mathbb{Z}_p$.

Now let $n > 2$ be prime, and let $a$ be an integer coprime to $n$. Then $n - 1 = 2^s d$ for some positive integers $s$ and $d$, where $d$ is odd. According to Fermat’s Little Theorem,
$$a^{n-1} \equiv a^{2^s d} \equiv \left(a^{2^{s-1} d}\right)^2 \equiv 1 \pmod{n}.$$
Thus we see that $a^{2^{s-1} d}$ is a root of $x^2 - 1$ modulo $n$. Since $n$ is prime, $a^{2^{s-1} d} \equiv \pm 1 \pmod{n}$. If $a^{2^{s-1} d} \equiv -1 \pmod{n}$, we stop. Otherwise, we can extract the square root one more time, so that $a^{2^{s-2} d} \equiv \pm 1 \pmod{n}$, and so on, until we either reach $a^{2^r d} \equiv -1 \pmod{n}$ for some $r$ or $a^d \equiv 1 \pmod{n}$. We conclude that, if $n$ is prime, then
• either $a^d \equiv 1 \pmod{n}$; or
• $a^{2^r d} \equiv -1 \pmod{n}$ for some $r$ such that $0 \leq r \leq s - 1$.
Thus, if we can show that $a^d \not\equiv 1 \pmod{n}$ and $a^{2^r d} \not\equiv -1 \pmod{n}$ for all $r$ such that $0 \leq r \leq s - 1$, then $n$ has to be composite.
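The two bullet points above translate into a short Python sketch. The function names are hypothetical: `passes_miller_rabin` checks a single base $a$ exactly as derived above, and `miller_rabin` repeats the check for $k$ random bases. The sample calls use the Carmichael number 561 with base 2 and the number 323 with base 18; in both cases the base turns out to be a witness.

```python
import random

def decompose(n: int) -> tuple[int, int]:
    """Write n - 1 = 2^s * d with d odd."""
    s, d = 0, n - 1
    while d % 2 == 0:
        s += 1
        d //= 2
    return s, d

def passes_miller_rabin(n: int, a: int) -> bool:
    """Return True if the base a is NOT a witness for the compositeness of odd n > 2."""
    s, d = decompose(n)
    x = pow(a, d, n)                # a^d mod n
    if x == 1 or x == n - 1:        # a^d = 1, or a^(2^0 d) = -1
        return True
    for _ in range(s - 1):          # a^(2^r d) for r = 1, ..., s - 1
        x = (x * x) % n
        if x == n - 1:
            return True
    return False                    # a is a witness: n is composite

def miller_rabin(n: int, k: int = 20) -> str:
    """Repeat the single-base check with k random bases 1 < a < n - 1."""
    if n < 5 or n % 2 == 0:
        raise ValueError("this sketch assumes n is odd and n >= 5")
    for _ in range(k):
        if not passes_miller_rabin(n, random.randrange(2, n - 1)):
            return "composite"
    return "pseudoprime"

print(passes_miller_rabin(561, 2))    # False: 2 is a witness for 561
print(passes_miller_rabin(323, 18))   # False: 18^161 = 18 (mod 323)
print(miller_rabin(7919))             # pseudoprime (7919 is prime)
```

Only the squarings $a^{2^r d}$ for $r < s$ are examined, since reaching $a^{2^s d} = a^{n-1}$ would just repeat the weaker Fermat check.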
Note that with Fermat’s Primality Test we would only check whether $a^{2^s d} \equiv 1 \pmod{n}$, whereas in the Miller-Rabin primality test we examine the $s$ values $a^d, a^{2d}, \ldots, a^{2^{s-1} d} \pmod{n}$. As it turns out, this is enough to fix many of the problems that we saw with Fermat’s Primality Test. For example, Carmichael numbers can be recognized as composite. Furthermore, one can prove that, for an odd composite number $n > 9$, at least $3/4$ of the $a$’s coprime to $n$ are witnesses of its compositeness. Therefore, the probability that a single round of the Miller-Rabin test fails is at most $1/4$, which means that after $k$ rounds the probability that $n$ is composite while it is reported as a pseudoprime is at most $1/4^k$.

Unfortunately, one cannot do better than that and predict where the witnesses lie in $\mathbb{Z}/n\mathbb{Z}$. Their distribution can vary a lot, and this is why choosing $a$ at random is better than using $a = 2, 3, 5, \ldots$ in order. For example, Arnault found a 397-digit composite number for which none of the prime bases $a < 307$ is a witness. This number was reported to be prime by Maple’s isprime() function, because it tested the prime bases $a = 2, 3, 5, \ldots$ in order, rather than at random.

Example 16.5. Let us show that $n = 323$ is composite using the Miller-Rabin Primality Test with base $a = 18$. Note that $a^{323} \equiv a \pmod{n}$, so if we used Fermat’s Primality Test on $n$ with this base only, it would report $n$ as a pseudoprime. However, $322 = 2 \cdot 161$, and $18^{161} \equiv 18 \not\equiv \pm 1 \pmod{323}$, so $n = 323$ is reported as composite by the Miller-Rabin Primality Test.

17 Public Key Cryptosystems. The RSA Cryptosystem

For more details, please refer to the monograph by W. Trappe, L. C. Washington, Introduction to Cryptography with Coding Theory, 2nd edition, 2006.

Suppose that Alice wants to send a secret message to Bob, and because they are too far away from each other to meet in person, she needs to send this message over the internet.
The channel between Alice’s computer and Bob’s computer is unprotected. While travelling from one computer to the other, the message passes through many different routers, and it is possible to intercept it by listening on the channel. For example, this can be done with packet analyzers like Wireshark. Though interception of the message is hard to avoid, it is possible to protect the information itself through encryption.

Since antiquity, humanity has used what we now call private key cryptosystems. Perhaps the most famous example of private key encryption is the so-called Caesar cipher. According to Suetonius, Julius Caesar used this cipher to encrypt messages of military significance. The cipher shifts every letter of the message 3 positions to the left:
$$A \to X,\ B \to Y,\ C \to Z,\ D \to A,\ \ldots,\ Y \to T,\ Z \to V$$
(note that we use the classical 23-letter Latin alphabet, which has no J, U or W, instead of the English alphabet). For example, the phrase DEVS EX MACHINA is encrypted using Caesar’s cipher as follows:

ABRP BS IXZEFKX

This cipher is not terribly sophisticated, but in Caesar’s time it was considered quite complex, and the receiver had to know the magic number 3 in order to decrypt the message by shifting the letters three positions to the right. So, as we can see, the sender and the receiver must agree not only on the encryption/decryption procedure but also on a private key, which in this case is equal to 3. Many ciphers, such as the Vigenère cipher, the renowned Enigma cipher, or modern ciphers such as the Data Encryption Standard (DES) or Rijndael (AES), work according to this principle: once the sender and the receiver agree on a secret key, they can both encrypt and decrypt messages, and thus communicate securely.

But what if the sender and the receiver are too far away from each other? If Alice is in Australia and Bob is in Bulgaria, how can they agree on a secret key? One answer to this problem is public key cryptography.
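Returning briefly to the Caesar cipher described above, its shift is easy to reproduce. The Python sketch below assumes the 23-letter alphabet ABCDEFGHIKLMNOPQRSTVXYZ implied by the mapping $Y \to T$, $Z \to V$, leaves spaces untouched, and uses a hypothetical helper `caesar`; the private key is the shift 3.

```python
LATIN = "ABCDEFGHIKLMNOPQRSTVXYZ"   # assumed 23-letter classical Latin alphabet (no J, U, W)

def caesar(text: str, shift: int) -> str:
    """Shift every letter of text by `shift` positions in LATIN (negative = left).
    Characters outside LATIN, such as spaces, are copied unchanged."""
    result = []
    for ch in text:
        if ch in LATIN:
            result.append(LATIN[(LATIN.index(ch) + shift) % len(LATIN)])
        else:
            result.append(ch)
    return "".join(result)

ciphertext = caesar("DEVS EX MACHINA", -3)   # encrypt: shift 3 letters to the left
print(ciphertext)                            # ABRP BS IXZEFKX
print(caesar(ciphertext, 3))                 # DEVS EX MACHINA
```

Decryption is the same function with the opposite shift, which is exactly why both parties must know the key.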
Key insight: Alice and Bob do not even have to agree on a private key in order to send encrypted messages to each other!

The RSA cryptosystem was invented in 1977 by Ron Rivest, Adi Shamir and Leonard Adleman. It was the first practical, widely deployed public key cryptosystem.

This is how RSA works. Bob generates two really large distinct primes $p$ and $q$, and computes $n = pq$ as well as $\varphi(n) = (p-1)(q-1)$. Then he chooses an encryption exponent $e$ such that $\gcd(e, \varphi(n)) = 1$, and solves the congruence $de \equiv 1 \pmod{\varphi(n)}$ for $d$. Then he sends the public key $(n, e)$ to Alice. Alternatively, he can publish $(n, e)$ on his webpage, thus making this key publicly available to everyone. However, he does not release the private key $(p, q, d)$: no one except Bob knows the values of $p$, $q$ and $d$.

Now Alice can use Bob’s public key $(n, e)$ to send messages to Bob securely. Suppose that Alice wants to send a message written in English. First, she converts this message into a number $m$. For example, this can be done using the ASCII table, which assigns a number between 0 and 127 to every upper or lower case letter of the English alphabet, every digit, and some special characters such as *, !, % or the dollar sign. For example, in the message Hello! the letter ‘H’ corresponds to 72, the letter ‘e’ corresponds to 101, and so on:

Character   Base 10   Base 2
H           72        01001000
e           101       01100101
l           108       01101100
o           111       01101111
!           33        00100001

We concatenate the base 2 representations of the ASCII codes of our characters, thus obtaining a bigger number $m$:
$$m = (\underbrace{01001000}_{\text{H}}\,\underbrace{01100101}_{\text{e}}\,\underbrace{01101100}_{\text{l}}\,\underbrace{01101100}_{\text{l}}\,\underbrace{01101111}_{\text{o}}\,\underbrace{00100001}_{\text{!}})_2.$$
Note that each character fits into 1 byte = 8 bits. Since there are 6 characters in our message, the resulting number $m$ satisfies $0 \leq m < 2^{6 \cdot 8} = 2^{48}$.
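The byte-concatenation encoding just described, together with the reverse step of reading off 8 bits at a time, can be sketched in Python as below. The helper names `text_to_int` and `int_to_text` are hypothetical, and the sketch assumes the number of characters is known to the receiver (a leading character with code 0 would otherwise be lost).

```python
def text_to_int(message: str) -> int:
    """Concatenate the 8-bit ASCII codes of the characters into one integer m."""
    m = 0
    for ch in message:
        code = ord(ch)
        if code > 127:
            raise ValueError("only ASCII characters (codes 0 to 127) are supported")
        m = (m << 8) | code          # append the next 8 bits
    return m

def int_to_text(m: int, length: int) -> str:
    """Read m back 8 bits at a time; length is the number of characters."""
    return "".join(chr((m >> (8 * (length - 1 - i))) & 0xFF) for i in range(length))

m = text_to_int("Hello!")
print(m < 2 ** 48)            # True: 6 characters fit into 48 bits
print(int_to_text(m, 6))      # Hello!
print(text_to_int("Hi!"))     # 4745505
```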
Now, if Bob receives this number $m$, he can easily decode the message by reading off 8 bits at a time and matching them to the corresponding characters in the ASCII table.

Before encrypting the message, Alice needs to verify that $0 \leq m < n$, so that no information is lost during the transmission. If it so happens that $m \geq n$, she breaks the message into pieces $m_1, m_2, \ldots, m_k$ such that $0 \leq m_i < n$ for all $i$, $1 \leq i \leq k$ (for example, the $k = \lfloor \log_n m \rfloor + 1$ digits of $m$ in base $n$), and then sends $m_1, m_2, \ldots, m_k$ to Bob consecutively.

Suppose that $0 \leq m < n$. Alice uses Bob’s public key $(n, e)$ to compute the integer $c$, $0 \leq c < n$, such that $c \equiv m^e \pmod{n}$. This number $c$ is the result of RSA encryption, and Alice sends this encrypted message to Bob over the unprotected channel.

When Bob receives the encrypted message $c$, he can decrypt it and obtain the original message $m$ using the private key $d$:
$$c^d \equiv (m^e)^d \equiv m^{de} \equiv m \pmod{n}.$$
Note that here we used the fact that $de \equiv 1 \pmod{\varphi(n)}$.

Example 17.1. Suppose that Bob chose $p = 1597$ and $q = 4139$. Then
$$n = pq = 1597 \cdot 4139 = 6609983, \qquad \varphi(n) = (p-1)(q-1) = 1596 \cdot 4138 = 6604248.$$
Bob chooses the encryption exponent $e = 3263993$ and then computes
$$d \equiv e^{-1} \equiv 3263993^{-1} \equiv 2051801 \pmod{6604248}.$$
Now he keeps $p$, $q$ and $d$ secret, and makes $(n, e)$ publicly available.

Now, in order to send the message “Hi!” to Bob, Alice converts it into an integer $m$ using the ASCII table:
$$m = (\underbrace{01001000}_{\text{H}}\,\underbrace{01101001}_{\text{i}}\,\underbrace{00100001}_{\text{!}})_2 = 4745505.$$
Alice verifies that $0 \leq m < n$, and then computes the encrypted message $c$ with the Double-and-Add Algorithm using Bob’s encryption exponent $e$:
$$c \equiv m^e \equiv 4745505^{3263993} \equiv 673426 \pmod{6609983}.$$
Then Alice sends $c = 673426$ to Bob. When Bob receives $c$, he computes $m$ with the Double-and-Add Algorithm using his private key $d$:
$$m \equiv c^d \equiv 673426^{2051801} \equiv 4745505 \pmod{6609983}.$$
After that, Bob uses the ASCII table to convert the 3-byte number $m$ back into the three-character message “Hi!” that Alice sent him.
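Example 17.1 can be replayed with Python's built-in three-argument `pow`, which performs modular exponentiation, and `pow(e, -1, phi)`, which (from Python 3.8) computes a modular inverse. This is only a sketch for checking the arithmetic: the primes are far too small for real use, and no padding is applied. The ciphertext is printed rather than restated, so it can be compared with the value given in the example.

```python
from math import gcd

# Bob's key generation, with the parameters of Example 17.1
p, q = 1597, 4139
n = p * q                      # 6609983
phi = (p - 1) * (q - 1)        # 6604248
e = 3263993
assert gcd(e, phi) == 1        # e must be invertible modulo phi(n)
d = pow(e, -1, phi)            # modular inverse (Python 3.8+)
print(d)                       # 2051801

# Alice encrypts the ASCII encoding of "Hi!" with the public key (n, e)
m = 4745505
assert 0 <= m < n
c = pow(m, e, n)               # built-in modular exponentiation
print(c)

# Bob decrypts with the private key d
print(pow(c, d, n) == m)       # True
```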
Now why is this method of communication secure? Suppose that some malicious adversary Eve managed to eavesdrop on the unprotected channel and intercept the message $c$. Since Bob’s public key $(n, e)$ is available to everyone, Eve also knows both $n$ and $e$. Therefore Eve’s goal is to obtain $m$ from $(n, e)$ and $c$. The most obvious way to do this is to find an integer $d$ such that $de \equiv 1 \pmod{\varphi(n)}$. In order to do so, Eve has to compute $\varphi(n) = (p-1)(q-1)$ from $n$. Unfortunately for Eve, computing $\varphi(n)$ from a composite number $n$ is difficult, and requires a factorization of $n$. To this day, we do not know any polynomial time factorization algorithm. The best ones, namely the Quadratic Sieve and the General Number Field Sieve, are subexponential. Thus, if we choose $n$ large enough (the National Institute of Standards and Technology (NIST) recommends a modulus $n$ of at least 2048 bits), the factorization of $n$ becomes infeasible for modern electronic computers, even if the workload is distributed among several supercomputers.

Of course, Bob should choose $p$, $q$ and $e$ very carefully. For example, if either $p$ or $q$ is really small, then it can be found using trial division. If either $p$ or $q$ is really close to $\sqrt{n} = \sqrt{pq}$, say $|p - \sqrt{n}| \leq 2n^{1/4}$, then $n$ can be factored using Fermat’s Factorization Method. If the prime divisors of either $p - 1$ or $q - 1$ are all really small, then $n$ can be factored using Pollard’s $p - 1$ Algorithm (see Assignment 3). If $e$ is chosen so that $d$ is really small, say $d < \frac{1}{3} n^{1/4}$, then $d$ can be calculated in polynomial time $O(\log n)$ (see Section 6.2.1 in Trappe and Washington).

When sending the message, Alice has to be really cautious as well.
For example, if the number $m$ is relatively small compared to $n$, then even without knowing $d$ or the factorization of $n$, Eve can decrypt the message using the Short Plaintext Attack (see Section 6.2.2 in Trappe and Washington). To solve this problem, Alice can pad her message with some random characters either at the beginning or at the end. So, as you can see, there are many things that both Alice and Bob have to check before establishing secure communication.

The RSA cryptosystem can be used not only for secure communication, but also for authentication. Imagine a situation where Alice sends a message $m$ to Bob, and Bob cares not so much about the privacy of their communication, but rather about the authenticity of the sender. That is, he wants to be absolutely sure that the message $m$ was sent to him by Alice and no one else. This can be done using RSA as follows. Let $(n, e)$ be Alice’s public key and $d$ her private key. Alice puts a digital signature $s$ on the message $m$ using her private key $d$:
$$s \equiv m^d \pmod{n}.$$
Then she sends $(m, s)$ to Bob. When Bob receives the message with Alice’s signature, he can verify that it belongs to Alice by using her public key $e$ and checking that $m \equiv s^e \pmod{n}$.

Exercise 17.2. Use your favourite computer algebra system to encrypt the message $m = 12345$ with RSA using the public key $(n, e) = (786073, 221891)$. Then break the system by factoring $n = pq$, determining the private key $d$, and then decrypting the message $c = 547988$.

Exercise 17.3. Use your favourite computer algebra system to verify that the message $(m, s) = (100, 1580073)$ belongs to the owner of the public key $(n, e) = (5988889, 4324055)$. Then break the system and put a fake digital signature $s'$ on the message $m' = 1000000$, so that $(m', s')$ passes the verification with the public key $(n, e)$.

Exercise 17.4. (Exercise 7 in Trappe and Washington) Naive Nelson uses RSA to receive a single ciphertext $c$, corresponding to the message $m$. His public modulus is $n$ and his public encryption exponent is $e$.
Since he feels guilty that his system was used only once, he agrees to decrypt any ciphertext that someone sends him, as long as it is not $c$, and return the answer to that person. Eve sends him the ciphertext $2^e c \pmod{n}$. Show how this allows Eve to find $m$.

Exercise 17.5. (Exercise 8 in Trappe and Washington) In order to increase security, Bob chooses $n$ and two encryption exponents $e_1$, $e_2$. He asks Alice to encrypt her message $m$ to him by first computing $c_1 \equiv m^{e_1} \pmod{n}$, then encrypting $c_1$ to get $c_2 \equiv c_1^{e_2} \pmod{n}$. Alice then sends $c_2$ to Bob. Does this double encryption increase security over single encryption? Why or why not?

Exercise 17.6. (Exercise 10 in Trappe and Washington) The exponents $e = 1$ and $e = 2$ should not be used in RSA. Why?

18 The Diffie-Hellman Key Exchange Protocol

There are many benefits to using RSA, but there is one big problem: even though it runs in polynomial time, it is quite slow. Suppose that we want to compute $c \equiv m^e \pmod{n}$. The Double-and-Add Algorithm requires at most $\log_2 e$ squarings and at most $\log_2 e$ multiplications, resulting in at most $2\log_2 e \leq 2\log_2 n$ arithmetic operations in total. Each multiplication involves numbers of at most $\log_2 n$ bits. A fast multiplication algorithm commonly used in practice, the Toom-Cook (Toom-3) Algorithm, requires $O((\log n)^{1.465})$ steps to multiply two integers of size at most $\log n$. Since there are at most $2\log n$ multiplications, encryption and decryption require $O((\log n)^{2.465})$ steps. Roughly speaking, this means that if $n$ is a 2048-bit number, then one can encrypt or decrypt a message in about $2048^{2.465} \approx 1.45 \cdot 10^8$ steps.

Private key cryptosystems (also referred to as symmetric ciphers or block ciphers) are much faster, because their execution does not involve any complex mathematical computations. Instead, in order to encrypt a message they use logical operations such as AND, OR, NOT and XOR, as well as bit shifts and bit permutations.
The Caesar cipher is an example of a cipher which uses only shifts, but on letters of the alphabet rather than on bits. Anagrams, like “eHll!o”, are examples of permutations of letters. These operations are very simple and in fact require only $O(1)$ steps each (compare this to multiplication, which requires $O((\log n)^{1.465})$ steps). In the end, both encryption and decryption for these ciphers require $O(\log n)$ steps. The most widely deployed symmetric ciphers are 3-DES (Triple Data Encryption Standard) and AES (Advanced Encryption Standard), which is also commonly referred to as Rijndael.

As was mentioned in Section 17, in order to use private key cryptosystems two parties must agree on a secret key. So how can this be done when Alice and Bob are too far away from each other? Here is one way: Alice generates a secret key $K$, encrypts it using RSA with Bob’s public key, and then sends the encrypted message to Bob. Bob decrypts the message, and so now Alice and Bob share a common secret $K$. Then they may use whichever symmetric algorithm they want, such as 3-DES or AES.

But there is another way for Alice and Bob to agree on a common key. This procedure, called the Diffie-Hellman Key Exchange Protocol, was published by Whitfield Diffie and Martin Hellman in 1976. Its security is based on the Discrete Logarithm Problem, and it works as follows. Alice generates a large prime number $p$, an integer $g$ such that $0 \leq g < p$, and an integer $x$ such that $1 \leq x \leq p - 2$. She computes $g^x \pmod{p}$, and then sends $p$, $g$ and $g^x \pmod{p}$ to Bob. When Bob receives $p$, $g$ and $g^x \pmod{p}$, he generates an integer $y$ such that $1 \leq y \leq p - 2$, computes $g^y \pmod{p}$, and then sends it back to Alice. Finally, since Alice knows $x$ and $g^y \pmod{p}$, she can compute $g^{xy} \equiv (g^y)^x \pmod{p}$, and since Bob knows $y$ and $g^x \pmod{p}$, he can compute $g^{xy} \equiv (g^x)^y \pmod{p}$. So in the end Alice and Bob share a common secret, namely $g^{xy} \pmod{p}$.

Why is this secure?
If a malicious adversary Eve listened to the communication between Alice and Bob, she could intercept $p$, $g$, $g^x \pmod{p}$ and $g^y \pmod{p}$, and from this information she would have to compute $g^{xy} \pmod{p}$. This problem is called the Diffie-Hellman Problem, and it is no harder than the Discrete Logarithm Problem. That is, if Eve knew how to solve the Discrete Logarithm Problem, she would be able to solve the Diffie-Hellman Problem (see Assignment 3). However, it is not known whether these two problems are equivalent.

We do not know any polynomial time algorithm for computing discrete logarithms. A well-known subexponential algorithm, due to Adleman, uses index calculus. The discrete logarithm can be computed quite fast in some special cases, but if the parameters $p$, $g$, $x$ and $y$ are chosen properly, the problem becomes intractable for modern electronic computers. There are many things that need to be verified in order to ensure that the communication is secure, but we will just mention that the parameter $g$ should be chosen so that $\operatorname{ord}(g)$ in $\mathbb{Z}_p^*$ is sufficiently large. As a final remark, there exists an efficient quantum algorithm for computing discrete logarithms, which was invented by Peter Shor in 1994.

19 Integer Factorization

The next computational problem that we address is the integer factorization problem. That is, given a composite integer $n$, we would like to find a non-trivial divisor of $n$. Unlike the case of primality testing, we do not know any polynomial time algorithm for integer factorization. Many mathematicians believe that the integer factorization problem is hard, and several cryptographic protocols, such as RSA, rely on this assumption. If you want to become a famous mathematician, try inventing a polynomial time algorithm for integer factorization. Note, however, that there exists an efficient quantum algorithm for integer factorization, which was invented by Peter Shor in 1994.
There are many algorithms for integer factorization. The most obvious one, trial division, we studied in Section 16. This algorithm allows us to factor an integer $n$ in $O(\sqrt{n}) = O(e^{(\log n)/2})$ steps, so it is exponential and is no good for factoring large integers. In this section, we will study two factorization algorithms, namely Fermat’s Algorithm and its optimized variant, Dixon’s Algorithm. The former is an exponential algorithm and the latter is a subexponential algorithm. You will also learn about Euler’s Factorization Method in Assignment 3.

19.1 Fermat’s Factorization Method

Fermat’s Factorization Method was suggested by the French mathematician Pierre de Fermat in the 17th century. The idea is simple: given an integer $n$, the goal is to find integers $x$ and $y$ such that $n = x^2 - y^2$. Then $n = (x - y)(x + y)$, and if neither $x - y$ nor $x + y$ is equal to 1, this results in a non-trivial factorization of $n$.

Note that not every even number can be represented in this form: since every square is congruent to 0 or 1 modulo 4, a difference of two squares is never congruent to 2 modulo 4. However, we may easily disregard even numbers, since every even number greater than 2 has the non-trivial divisor 2. Odd integers, on the other hand, can always be represented as a difference of two perfect squares, for if $n = k\ell$, then
$$n = \left(\frac{k + \ell}{2}\right)^2 - \left(\frac{k - \ell}{2}\right)^2.$$
Since $n$ is odd, so are $k$ and $\ell$, which means that both $(k + \ell)/2$ and $(k - \ell)/2$ are integers. If $n = k\ell$ is a multiple of 4, such a representation is also possible once we choose both $k$ and $\ell$ even. From the formula above it is also evident that an integer can have many representations as a difference of two perfect squares.

Let $\lceil x \rceil$ denote the smallest integer $\geq x$. We now convert the observations made above into an algorithm:
1. Put $x := \lceil \sqrt{n} \rceil$ and then set $y := x^2 - n$;
2. If $y$ is a perfect square, return $x - \sqrt{y}$; otherwise proceed to Step 3;
3. Increase $x$ by 1 and then set $y := x^2 - n$;
4. Go back to Step 2.
Note that for odd $n$ the algorithm always terminates: at the latest when $x = (n+1)/2$, where $y = ((n-1)/2)^2$.
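The four steps above translate almost line for line into Python. The sketch below assumes $n$ is odd; `fermat_factor` is a hypothetical name, and `math.isqrt` (Python 3.8+) is used both to compute $\lceil \sqrt{n} \rceil$ and to test whether $y$ is a perfect square without floating-point error.

```python
from math import isqrt

def fermat_factor(n: int) -> int:
    """Fermat's Factorization Method for odd n >= 3: return x - sqrt(y)
    for the first x >= ceil(sqrt(n)) such that y = x^2 - n is a perfect square."""
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch assumes n is odd and n >= 3")
    x = isqrt(n)
    if x * x < n:
        x += 1                      # Step 1: x = ceil(sqrt(n))
    y = x * x - n
    while True:
        r = isqrt(y)
        if r * r == y:              # Step 2: y is a perfect square
            return x - r
        x += 1                      # Step 3: try the next x
        y = x * x - n

print(fermat_factor(8023))   # 71, since 8023 = 92^2 - 21^2 = 71 * 113
print(fermat_factor(97))     # 1, because 97 is prime
```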
Furthermore, if the algorithm returns 1, then $n$ must be prime.

Example 19.1. Let us use Fermat’s Algorithm to factor $n = 8023$. Note that $\sqrt{n} \approx 89.57$, so we begin with $x = 90$ and $y = x^2 - n = 90^2 - 8023 = 77$. We see that

x    y     Is y a perfect square?
90   77    no
91   258   no
92   441   yes

Since $\sqrt{441} = 21$, we see that $8023 = 92^2 - 21^2 = (92 - 21)(92 + 21) = 71 \cdot 113$. Thus Fermat’s Factorization Algorithm terminated in just three steps, resulting in the non-trivial factor $x - \sqrt{y} = 92 - 21 = 71$.

Exercise 19.2. Use Fermat’s Algorithm to factor the integers 4747 and 7303.

Now let us analyze the performance of the algorithm above. We will count a single computation of $x$ and $y$ as one step. If $n = k\ell$ and $k$ is the largest divisor of $n$ such that $k \leq \sqrt{n}$, then Fermat’s Algorithm will return $k$ as a result. In this case, the final value of $x$ is $(k + \ell)/2$, which means that the number of steps required for the computation is equal to
$$\frac{k + \ell}{2} - \lceil \sqrt{n} \rceil + 1.$$
The number of times that $x$ is increased can be bounded from above as follows:
$$\frac{k + \ell}{2} - \lceil \sqrt{n} \rceil \leq \frac{k + \ell}{2} - \sqrt{n} = \frac{(\sqrt{k} - \sqrt{\ell})^2}{2} = \frac{(\sqrt{n} - k)^2}{2k}.$$
We see that, if $n$ is prime, then $k = 1$ and the algorithm requires $O(n)$ steps. Therefore, in the worst case, the algorithm is exponential. Note that it is even worse than trial division, which requires $O(\sqrt{n})$ steps.

Why do we care about Fermat’s Factorization Method then? First of all, in some special cases it performs really well. Suppose that $k$ satisfies
$$\sqrt{n} - k \leq 2n^{1/4},$$
so it is relatively close to $\sqrt{n}$. Then for all $n > 1296 = 6^4$ we have
$$\frac{(\sqrt{n} - k)^2}{2k} \leq \frac{4\sqrt{n}}{2(\sqrt{n} - 2n^{1/4})} = \frac{2}{1 - 2n^{-1/4}} < 3,$$
which means that $x$ is increased at most twice, so Fermat’s Algorithm terminates within three steps! Of course, this is much faster than trial division. This is why Fermat’s Factorization Method is usually used in combination with trial division. First one chooses a constant $c > \sqrt{n}$ and runs Fermat’s Algorithm for all $x$ between $\sqrt{n}$ and $c$; since $(k + n/k)/2 \leq c$ exactly when $k \geq c - \sqrt{c^2 - n}$ (for $k \leq \sqrt{n}$), this detects every divisor $k$ of $n$ with $c - \sqrt{c^2 - n} \leq k \leq \sqrt{n}$.
After that, one only has to check for prime divisors of $n$ with trial division up to $c - \sqrt{c^2 - n}$ instead of $\sqrt{n}$. Even though this observation does not allow us to push the bound below $O(n^{1/2})$, it significantly decreases the constant implicit in the big-O notation. Further improvements can be made through sieving, and in 1974 Lehman combined all of these improvements into a factorization algorithm, based on Fermat’s Factorization Method and trial division, with asymptotic running time $O(n^{1/3})$.

Though Fermat’s Algorithm can be quite slow in the worst case, it lies at the foundation of the best factorization algorithms known to date, namely the quadratic sieve and the general number field sieve, which have subexponential asymptotic running time. Both of these algorithms evolved from the factorization method due to Dixon.

19.2 Dixon’s Factorization Method

Dixon’s Factorization Method was proposed in 1981 by the Canadian mathematician John D. Dixon, who is a professor emeritus at Carleton University, Ottawa.

Recall that in Fermat’s Factorization Method we were choosing an integer $x$ between 0 and $n$ and then evaluating $x^2 \pmod{n}$, hoping that the result would be a perfect square; that is, $x^2 \equiv y^2 \pmod{n}$. Unfortunately, there are only $\lfloor \sqrt{n} \rfloor$ perfect squares up to $n$, and so for very large $n$ the proportion of perfect squares less than $n$ tends to zero:
$$\frac{\lfloor \sqrt{n} \rfloor}{n} \leq \frac{\sqrt{n}}{n} = \frac{1}{\sqrt{n}} \longrightarrow 0.$$
Dixon’s method suggests that, instead of looking for a perfect square, we can construct one from many random samples. The idea is as follows: by picking distinct $x_1, x_2, \ldots$ between 0 and $n$ at random, we obtain relations of the form
$$x_1^2 \equiv z_1 \pmod{n}, \qquad x_2^2 \equiv z_2 \pmod{n}, \qquad \ldots,$$
where $z_1, z_2, \ldots$ are integers between 0 and $n$. One would then hope to select relations $i_1, i_2, \ldots, i_r$ so that the number $z_{i_1} z_{i_2} \cdots z_{i_r} = y^2$ is a perfect square.
But then $(x_{i_1} x_{i_2} \cdots x_{i_r})^2 \equiv y^2 \pmod{n}$, which means that one can compute a divisor $d$ of $n$ by evaluating
$$d = \gcd(x_{i_1} x_{i_2} \cdots x_{i_r} - y, n).$$
If it so happens that $d = 1$ or $d = n$, we construct a new set of random samples, or select a different $r$-tuple $i_1, i_2, \ldots, i_r$ with the property described above.

Now the main question is: how do we construct congruences $x_i^2 \equiv z_i \pmod{n}$ from which we can produce a non-trivial perfect square? The main idea here is to pick only those $x_i$ for which the resulting values $z_i$ are so-called $B$-smooth numbers.

Definition 19.3. Let $B \geq 2$ be a real number. An integer $n$ is called $B$-smooth if every prime $p \mid n$ satisfies $p \leq B$.

Example 19.4. The numbers 2, 3, 4, 5, 6, 8, 9, 10, 12 are all 5-smooth, because every prime $p$ dividing an integer from this list satisfies $p \leq 5$. The numbers 7 and 11, however, are not 5-smooth, but they are both 11-smooth.

Now every time we choose a random $x$ and evaluate $z \equiv x^2 \pmod{n}$ with $0 \leq z < n$, we need to verify that $z$ is $B$-smooth. One can check whether a given number $z$ is $B$-smooth in just $O(B)$ steps using trial division. Note that, if $p_1 < p_2 < \ldots < p_k$ are all the primes $\leq B$, then every $B$-smooth number can be written in the form
$$z = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k},$$
where $e_1, e_2, \ldots, e_k$ are non-negative integers. Thus we obtain a vector $v = (e_1, e_2, \ldots, e_k)$ in $\mathbb{Z}^k$. Further, we can reduce the entries of this vector modulo 2, thus obtaining a vector $\tilde{v} = (\tilde{e}_1, \tilde{e}_2, \ldots, \tilde{e}_k)$ in $\mathbb{Z}_2^k$ with $\tilde{e}_1, \tilde{e}_2, \ldots, \tilde{e}_k \in \{0, 1\}$. Because $\mathbb{Z}_2$ is a field (that is, division by a non-zero element is always allowed), the set $\mathbb{Z}_2^k$ is a $k$-dimensional vector space over $\mathbb{Z}_2$, which means that we can analyze it from the perspective of linear algebra. In particular, any collection of $k + 1$ vectors in $\mathbb{Z}_2^k$ is always linearly dependent.

Now suppose that for distinct values $x_1, x_2, \ldots, x_{k+1}$ we managed to compute $B$-smooth values $z_1, z_2, \ldots, z_{k+1}$, which correspond to vectors $\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}$ in $\mathbb{Z}_2^k$. Since $\mathbb{Z}_2^k$ has dimension $k$, the vectors $\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}$ must be linearly dependent in $\mathbb{Z}_2^k$. But then there must exist indices $i_1, i_2, \ldots, i_r$ for some $r \leq k + 1$ such that
$$v_{i_1} + v_{i_2} + \ldots + v_{i_r} \equiv 0 \pmod{2},$$
which means that $z_{i_1} z_{i_2} \cdots z_{i_r}$ is a perfect square. In order to find such linearly dependent vectors $\tilde{v}_{i_1}, \tilde{v}_{i_2}, \ldots, \tilde{v}_{i_r}$ in $\mathbb{Z}_2^k$, we row reduce the $(k+1) \times k$ matrix $M = [\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}]^T$, whose entries belong to $\mathbb{Z}_2$. Note that the row reduction requires $O(k^3) = O(B^3)$ steps. At this point, we can compute $y = \sqrt{z_{i_1} z_{i_2} \cdots z_{i_r}}$ and the value
$$d = \gcd(x_{i_1} x_{i_2} \cdots x_{i_r} - y, n),$$
and, in case $d = 1$ or $d = n$, repeat the procedure of choosing distinct random values $x_1, x_2, \ldots, x_{k+1}$ once again.

The only thing left for us to establish is the value of $B$. As it turns out, the optimal choice is $B = e^{O(\sqrt{\log n \log \log n})}$, so the asymptotic running time of Dixon’s algorithm is subexponential.

Exercise 19.5. In this exercise, we will use Dixon’s method to find a non-trivial factor of 34081.
(a) Factor the integers 15, 486, 24010 to verify that they are all 7-smooth;
(b) Suppose that the execution of Dixon’s Factorization Algorithm allowed us to find the congruences
$$805^2 \equiv 486 \pmod{34081}; \qquad 846^2 \equiv 15 \pmod{34081}; \qquad 954^2 \equiv 24010 \pmod{34081}.$$
Using the above congruences, as well as the factorizations obtained in Part (a), find integers $x$ and $y$ such that $x^2 \equiv y^2 \pmod{34081}$, and then use these $x$ and $y$ to compute a non-trivial factor of 34081.

20 Quadratic Residues

Let $n \geq 2$ be a modulus and let $a$, $b$, $c$ be arbitrary integers. We will now turn our attention to quadratic congruences
$$ax^2 + bx + c \equiv 0 \pmod{n}.$$
We require that $n \nmid a$, for otherwise the above congruence would reduce to the linear congruence $bx + c \equiv 0 \pmod{n}$. Also, if $n = 2$, then by Fermat’s Little Theorem $x^2 \equiv x \pmod{2}$ regardless of $x$.
Thus $ax^2 + bx + c \equiv (a + b)x + c \pmod{2}$, so once again we obtain a linear congruence. Thus it is reasonable to assume that $n \geq 3$. Finally, for simplicity of exposition, we will assume that $n$ is an odd prime, and we will indicate this by writing $p$ instead of $n$. Note that $\frac{p-1}{2}$ is then an integer.

In this section, we will not aim to solve quadratic congruences. Instead, we will investigate when solutions exist. Note that it follows from Proposition 12.2 that the polynomial $[a]X^2 + [b]X + [c]$ has at most 2 roots in $\mathbb{Z}_p$.

Proposition 20.1 (Proposition 6.1 in Frank Zorzitto, A Taste of Number Theory). Let $p$ be an odd prime, and let $a, b, c$ be integers where $p \nmid a$. The quadratic congruence $ax^2 + bx + c \equiv 0 \pmod{p}$ has a solution $x$ if and only if the congruence $y^2 \equiv b^2 - 4ac \pmod{p}$ has a solution $y$. In that case, $y \equiv 2ax + b \pmod{p}$.

Proof. Multiply both sides of the quadratic congruence by $4a$ to get
$$4a^2x^2 + 4abx + 4ac \equiv 0 \pmod{p}.$$
This can be rewritten as
$$(2ax + b)^2 - b^2 + 4ac \equiv 0 \pmod{p},$$
which is the same as
$$(2ax + b)^2 \equiv b^2 - 4ac \pmod{p}.$$
Conversely, suppose that $y$ is a solution to $y^2 \equiv b^2 - 4ac \pmod{p}$. Note that we can solve the linear congruence $2ax + b \equiv y \pmod{p}$ for $x$, because $[2a]$ is a unit in $\mathbb{Z}_p$. Thus
$$(2ax + b)^2 \equiv y^2 \equiv b^2 - 4ac \pmod{p},$$
which is the same as $4a^2x^2 + 4abx + 4ac \equiv 0 \pmod{p}$. Since $[4a]$ is a unit in $\mathbb{Z}_p$, we can multiply both sides of this congruence by $(4a)^{-1} \pmod{p}$ in order to obtain $ax^2 + bx + c \equiv 0 \pmod{p}$. Therefore any $x$ which satisfies $2ax + b \equiv y \pmod{p}$ is a solution to the original quadratic congruence.

Proposition 20.1 tells us that solving the quadratic congruence $ax^2 + bx + c \equiv 0 \pmod{p}$ is equivalent to solving the simpler quadratic congruence $x^2 \equiv d \pmod{p}$, where $d = b^2 - 4ac$. The integer $d$ is called the discriminant of the quadratic polynomial $aX^2 + bX + c$. Thus, in order to find solutions to $x^2 \equiv d \pmod{p}$, we need to understand which residue classes of $\mathbb{Z}_p$ are squares.

Definition 20.2.
A residue $\alpha$ in $\mathbb{Z}_p$ is called a quadratic residue when $\alpha \in \mathbb{Z}_p^*$ and $\alpha = \beta^2$ for some residue $\beta$ in $\mathbb{Z}_p^*$. If $\alpha \in \mathbb{Z}_p^*$ and no such $\beta$ exists, then $\alpha$ is called a quadratic nonresidue. In the language of congruences, we say that an integer $a$ is a quadratic residue modulo an odd prime $p$ if $p \nmid a$ and $a \equiv x^2 \pmod{p}$ for some integer $x$.

Example 20.3. Let us find all quadratic residues in $\mathbb{Z}_{13}^*$. We note that $[1]^2 = [1]$, $[2]^2 = [4]$, $[3]^2 = [9]$, $[4]^2 = [3]$, $[5]^2 = [12]$, $[6]^2 = [10]$, $[7]^2 = [10]$, $[8]^2 = [12]$, $[9]^2 = [3]$, $[10]^2 = [9]$, $[11]^2 = [4]$ and $[12]^2 = [1]$. Thus the quadratic residues are $[1], [3], [4], [9], [10], [12]$.

Exercise 20.4. Determine all quadratic residues in $\mathbb{Z}_{17}^*$, $\mathbb{Z}_{19}^*$ and $\mathbb{Z}_{23}^*$.

Proposition 20.5. Let $p$ be an odd prime. Then the group of units $\mathbb{Z}_p^*$ has exactly $(p-1)/2$ quadratic residues and exactly $(p-1)/2$ quadratic nonresidues.

Proof. Note that, for any $[a]$ in $\mathbb{Z}_p^*$, we have $[a]^2 = (-[a])^2$. Since every unit equals $\pm[a]$ for some $1 \leq a \leq (p-1)/2$, it is sufficient to look at the squares of such $a$. We now claim that all the elements in the collection
$$[1]^2, [2]^2, \ldots, \left[\tfrac{p-1}{2}\right]^2$$
are distinct. Suppose not, and $[a]^2 = [b]^2 = [c]$ for some $1 \leq a < b \leq (p-1)/2$ and some residue $[c]$. Then $[a]$ and $[b]$ are both roots of the polynomial $X^2 - [c]$ in $\mathbb{Z}_p$. By Proposition 12.2, such a polynomial has at most 2 roots in $\mathbb{Z}_p$. However, it has at least 4 distinct roots, namely $\pm[a]$ and $\pm[b]$; these are distinct because $a, b$ lie in $\{1, \ldots, (p-1)/2\}$ while $p - a, p - b$ lie in $\{(p+1)/2, \ldots, p - 1\}$. This is a contradiction. Therefore the above collection has no repetitions, so $\mathbb{Z}_p^*$ contains exactly $(p-1)/2$ quadratic residues. Since every element of $\mathbb{Z}_p^*$ which is not a quadratic residue is a quadratic nonresidue, we conclude that there are exactly $(p-1)/2$ quadratic nonresidues.

Definition 20.6. For an odd prime $p$ and an integer $a$ coprime to $p$, we let
$$\left(\frac{a}{p}\right) := \begin{cases} +1 & \text{if } a \text{ is a quadratic residue modulo } p; \\ -1 & \text{if } a \text{ is a quadratic nonresidue modulo } p. \end{cases}$$
The symbol $\left(\frac{a}{p}\right)$ is called the Legendre symbol of $a$ modulo $p$.

Example 20.7. Note that $\left(\frac{8}{17}\right) = +1$ while $\left(\frac{6}{17}\right) = -1$. Also, for any odd prime $p$ it is clear that $1$ is a quadratic residue, i.e. $\left(\frac{1}{p}\right) = +1$. However, the value of $\left(\frac{-1}{p}\right)$ varies with $p$. For example, $\left(\frac{-1}{13}\right) = +1$ while $\left(\frac{-1}{19}\right) = -1$.

We will now give an alternative proof of Proposition 20.5 using primitive roots.

Proof (of Proposition 20.5). Since $p$ is an odd prime, it follows from the Primitive Root Theorem that there exists a primitive root $\gamma$ in $\mathbb{Z}_p^*$. That is, for every residue $\alpha$ in $\mathbb{Z}_p^*$ there exists an integer $j$, $1 \leq j \leq p - 1$, such that $\alpha = \gamma^j$. First of all, let us demonstrate that it is impossible to represent $\alpha$ by both an odd and an even power of $\gamma$. Suppose that $\alpha = \gamma^i = \gamma^j$ for some $1 \leq i \leq j$. Then $\gamma^{j-i} = 1$. By Proposition 13.3, $\operatorname{ord}(\gamma) \mid j - i$. Since $\operatorname{ord}(\gamma) = p - 1$, the even number $p - 1$ divides $j - i$. Hence $j - i$ is even, which means that $i$ and $j$ are either both odd or both even.

Now recall that, since $\gamma$ is a primitive root in $\mathbb{Z}_p^*$, the elements $\gamma, \gamma^2, \ldots, \gamma^{p-1}$ are distinct, and exactly half of them are even powers of $\gamma$. These are the quadratic residues, since $\gamma^{2k} = (\gamma^k)^2$. On the other hand, all odd powers of $\gamma$ are quadratic nonresidues: if $\gamma^{2k+1} = \beta^2$ with $\beta = \gamma^m$, then $\gamma^{2k+1} = \gamma^{2m}$, which would represent the same element by both an odd and an even power of $\gamma$.

Proposition 20.8. Let $p$ be an odd prime and let $\alpha$ and $\beta$ be elements of $\mathbb{Z}_p^*$. Then
• If $\alpha$ and $\beta$ are quadratic residues, then $\alpha\beta$ is a quadratic residue;
• If $\alpha$ is a quadratic residue and $\beta$ is a quadratic nonresidue, then $\alpha\beta$ is a quadratic nonresidue;
• If $\alpha$ and $\beta$ are quadratic nonresidues, then $\alpha\beta$ is a quadratic residue.

Proof. Since $p$ is an odd prime, it follows from the Primitive Root Theorem that there exists a primitive root $\gamma$ in $\mathbb{Z}_p^*$. Then $\alpha = \gamma^i$ and $\beta = \gamma^j$, so $\alpha\beta = \gamma^{i+j}$. As we saw in the second proof of Proposition 20.5, if $\alpha$ and $\beta$ are quadratic residues then both $i$ and $j$ are even, so $i + j$ is even as well. Therefore $\alpha\beta = \left(\gamma^{(i+j)/2}\right)^2$ is a quadratic residue. The other two statements are proved analogously.

The propositions above suggest one algorithm for calculating the Legendre symbol $\left(\frac{a}{p}\right)$. First, find a primitive root $\gamma$ in $\mathbb{Z}_p^*$, and then determine the parity of $x$ in $\gamma^x = [a]$.
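The primitive-root procedure just described can be sketched in Python as follows. This is a minimal illustration added to the notes, not the notes' own method: the helper names `quadratic_residues`, `multiplicative_order`, `primitive_root` and `legendre_via_primitive_root` are hypothetical, and both the primitive root and the discrete logarithm are found by brute force, which is only practical for small odd primes $p$.

```python
def quadratic_residues(p):
    """Quadratic residues of Z_p^*, found by squaring every unit (Example 20.3)."""
    return sorted({(x * x) % p for x in range(1, p)})


def multiplicative_order(g, p):
    """Smallest k >= 1 with g^k = 1 (mod p); assumes p is prime and p does not divide g."""
    k, power = 1, g % p
    while power != 1:
        power = (power * g) % p
        k += 1
    return k


def primitive_root(p):
    """Smallest primitive root modulo an odd prime p, by brute force."""
    for g in range(2, p):
        if multiplicative_order(g, p) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def legendre_via_primitive_root(a, p):
    """(a/p) from the parity of x in gamma^x = [a], where gamma is a primitive root."""
    if a % p == 0:
        raise ValueError("p must not divide a")
    gamma = primitive_root(p)
    target = a % p
    x, power = 1, gamma
    while power != target:             # brute-force discrete logarithm
        power = (power * gamma) % p
        x += 1
    return 1 if x % 2 == 0 else -1     # even powers are exactly the quadratic residues


if __name__ == "__main__":
    print(quadratic_residues(13))      # [1, 3, 4, 9, 10, 12], as in Example 20.3
    print(primitive_root(13))          # 2
    print(legendre_via_primitive_root(8, 17), legendre_via_primitive_root(6, 17))  # 1 -1
```

The discrete logarithm step may take up to $p - 1$ multiplications, so this approach is mainly of conceptual value; it mirrors the second proof of Proposition 20.5 rather than offering an efficient algorithm.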
Fortunately, Euler came up with a much simpler procedure.

Proposition 20.9 (Euler's Test; Proposition 6.8 in Frank Zorzitto, A Taste of Number Theory). If $p$ is an odd prime and $a$ is an integer such that $p \nmid a$, then
$$a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod{p}.$$
In other words, if $a$ is a quadratic residue, then $a^{\frac{p-1}{2}} \equiv +1 \pmod{p}$, and if $a$ is a quadratic nonresidue, then $a^{\frac{p-1}{2}} \equiv -1 \pmod{p}$.

Proof. Let $[b]$ be a primitive root in $\mathbb{Z}_p^*$. Suppose that $a$ is a quadratic residue. Then, as in the second proof of Proposition 20.5, $a \equiv b^{2j} \pmod{p}$ for some non-negative integer $j$. Thus
$$a^{\frac{p-1}{2}} \equiv \left(b^{2j}\right)^{\frac{p-1}{2}} \equiv b^{(p-1)j} \equiv \left(b^{p-1}\right)^{j} \equiv 1 \pmod{p},$$
so $a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod{p}$ in this case, as claimed.

Now suppose that $a$ is a quadratic nonresidue. Then $a \equiv b^{2j+1} \pmod{p}$ for some non-negative integer $j$. Then
$$a^{\frac{p-1}{2}} \equiv \left(b^{2j+1}\right)^{\frac{p-1}{2}} \equiv b^{\frac{p-1}{2}} \, b^{(p-1)j} \equiv b^{\frac{p-1}{2}} \pmod{p}.$$
Note that
$$\left(b^{\frac{p-1}{2}}\right)^2 \equiv b^{p-1} \equiv 1 \pmod{p},$$
so the residue class $\left[b^{\frac{p-1}{2}}\right]$ is a root of the polynomial $X^2 - 1$ in $\mathbb{Z}_p$. Since $p$ is an odd prime, by Proposition 12.2 this polynomial has at most two roots. In fact, it has exactly two roots, namely $X = \pm[1]$. Therefore $b^{\frac{p-1}{2}} \equiv \pm 1 \pmod{p}$. It cannot happen that $b^{\frac{p-1}{2}} \equiv 1 \pmod{p}$, because then the order of $[b]$ would be strictly less than $p - 1 = \varphi(p)$, which contradicts the fact that $[b]$ is a primitive root in $\mathbb{Z}_p^*$. Therefore $b^{\frac{p-1}{2}} \equiv -1 \pmod{p}$, and so we conclude that, when $a$ is a quadratic nonresidue, $a^{\frac{p-1}{2}} \equiv -1 \pmod{p}$.

Therefore for any $a$ such that $p \nmid a$ we have $a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod{p}$.

Corollary 20.10 (Proposition 6.10 in Frank Zorzitto, A Taste of Number Theory). The integer $-1$ is a quadratic residue modulo an odd prime $p$ if and only if $p \equiv 1 \pmod{4}$.

Proof. By Euler's Test,
$$\left(\frac{-1}{p}\right) \equiv (-1)^{\frac{p-1}{2}} \pmod{p}.$$
Both sides of this congruence are equal to $\pm 1$, and $+1 \not\equiv -1 \pmod{p}$ because $p > 2$, so the congruence is actually an equality. The result then follows from the fact that
$$(-1)^{\frac{p-1}{2}} = \begin{cases} +1 & p \equiv 1 \pmod{4}; \\ -1 & p \equiv 3 \pmod{4}. \end{cases}$$

Example 20.11. Is $a = 138$ a quadratic residue modulo $p = 557$? We use Euler's Test to answer this question. Note that $\frac{p-1}{2} = 278$.
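Euler's Test turns the Legendre symbol into a single modular exponentiation. The following Python sketch is an illustration added to the notes, and the names `power_mod` and `legendre_euler` are hypothetical. It computes powers by repeated squaring, which plays the same role for powers that the Double-and-Add algorithm plays for multiples; Python's built-in three-argument `pow(a, e, n)` computes the same value and is used only as a cross-check.

```python
def power_mod(a, e, n):
    """Compute a^e mod n by repeated squaring (the multiplicative analogue
    of Double-and-Add), scanning the binary digits of e from the lowest."""
    result, base = 1, a % n
    while e > 0:
        if e & 1:                      # this binary digit of e is 1
            result = (result * base) % n
        base = (base * base) % n       # base now holds a^(2^k) mod n
        e >>= 1
    return result


def legendre_euler(a, p):
    """Legendre symbol (a/p) for an odd prime p not dividing a, via Euler's Test:
    a^((p-1)/2) is congruent to +1 or -1 modulo p."""
    if a % p == 0:
        raise ValueError("p must not divide a")
    r = power_mod(a, (p - 1) // 2, p)
    if r == 1:
        return 1
    if r == p - 1:                     # the residue class of -1
        return -1
    raise ValueError("p cannot be prime: Euler's Test gave neither +1 nor -1")


if __name__ == "__main__":
    print(legendre_euler(138, 557))                          # -1 (Example 20.11)
    print(power_mod(138, 278, 557) == pow(138, 278, 557))    # True
    # Corollary 20.10: -1 is a quadratic residue exactly when p = 1 (mod 4)
    for p in [3, 5, 7, 11, 13, 17, 19, 23, 29]:
        print(p, p % 4, legendre_euler(-1, p))
```

Only about $\log_2 p$ squarings are needed, in contrast with the brute-force discrete logarithm of the primitive-root approach.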
We can now compute $a^{\frac{p-1}{2}} \pmod{p}$ using the Double-and-Add algorithm:
$$a^{\frac{p-1}{2}} \equiv 138^{278} \equiv -1 \pmod{557}.$$
Therefore $138$ is not a quadratic residue modulo $557$.

Exercise 20.12. Compute $\left(\frac{364}{199}\right)$, $\left(\frac{51}{503}\right)$ and $\left(\frac{273}{461}\right)$ using Euler's Test.

At the end of this section, let us take a look at one curious application of the theory of quadratic residuosity.

Proposition 20.13 (Proposition 6.11 in Frank Zorzitto, A Taste of Number Theory). There are infinitely many primes congruent to $1$ modulo $4$.

Proof. Suppose we have a finite list of primes $p_1, p_2, \ldots, p_n$ congruent to $1$ modulo $4$. We will show how to produce another prime congruent to $1$ modulo $4$ that is not on this list. Let
$$x = (2 \cdot p_1 \cdot p_2 \cdots p_n)^2 + 1.$$
Let $q$ be any prime factor of $x$. If $q \in \{2, p_1, p_2, \ldots, p_n\}$, then $q \mid 1$, which is impossible. In particular, $q$ is an odd prime which is not on the list. Since $q$ divides $x$, we see that
$$-1 \equiv (2 \cdot p_1 \cdot p_2 \cdots p_n)^2 \pmod{q},$$
which means that $-1$ is a quadratic residue modulo $q$. But then it follows from Corollary 20.10 that $q \equiv 1 \pmod{4}$. Thus we have produced one more prime congruent to $1$ modulo $4$ which is not in the original list. Repeating this procedure with the list $p_1, p_2, \ldots, p_n, p_{n+1} = q$, we can produce yet another prime congruent to $1$ modulo $4$, and so on. Hence we can generate infinitely many distinct primes that are congruent to $1$ modulo $4$.

21 The Law of Quadratic Reciprocity

Let $p \geq 3$ be prime and let $a$ be an integer such that $p \nmid a$. We have already seen several approaches for computing $\left(\frac{a}{p}\right)$, for example Euler's Test. In this section, we will investigate one more approach, due to Gauss. In fact, he established what we now call the Law of Quadratic Reciprocity, which encapsulates very important properties of quadratic residues. We begin by proving the following proposition on the multiplicativity of the Legendre symbol.

Proposition 21.1 (Proposition 6.15 in Frank Zorzitto, A Taste of Number Theory). The Legendre symbol is multiplicative.
That is, if $p$ is an odd prime and $a, b$ are integers coprime to $p$, then
$$\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right).$$
Furthermore, if $a \equiv b \pmod{p}$, then
$$\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right).$$

Proof. The second statement is obvious, because whether an integer is a quadratic residue depends only on its residue class modulo $p$. To prove that $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$ for any $a$ and $b$ coprime to $p$, we apply Euler's Test (see Proposition 20.9):
$$\left(\frac{a}{p}\right)\left(\frac{b}{p}\right) \equiv a^{\frac{p-1}{2}} b^{\frac{p-1}{2}} \equiv (ab)^{\frac{p-1}{2}} \equiv \left(\frac{ab}{p}\right) \pmod{p}.$$
Since $\left(\frac{a}{p}\right)\left(\frac{b}{p}\right) = \pm 1$ and $\left(\frac{ab}{p}\right) = \pm 1$, and these two integers are congruent modulo $p > 2$, they must be equal.

By the Fundamental Theorem of Arithmetic, every integer $a > 1$ is a product of primes. That is, $a = q_1 q_2 \cdots q_n$ for some primes $q_1, q_2, \ldots, q_n$, with repetitions allowed. By Proposition 21.1,
$$\left(\frac{a}{p}\right) = \left(\frac{q_1}{p}\right)\left(\frac{q_2}{p}\right) \cdots \left(\frac{q_n}{p}\right).$$
Also, if $a$ is a negative integer, then $a = -1 \cdot b$ for some positive integer $b$, which means that
$$\left(\frac{a}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{b}{p}\right).$$
We conclude that, in order to determine the value of $\left(\frac{a}{p}\right)$, it suffices to know $\left(\frac{-1}{p}\right)$ and the values of $\left(\frac{q}{p}\right)$ for primes $q \neq p$. Essentially, for any fixed prime $q$, the Law of Quadratic Reciprocity allows us to understand what values the Legendre symbol $\left(\frac{q}{p}\right)$ takes as the odd prime $p$ varies. As a very simple example, let us explore the case $q = 2$.

Proposition 21.2 (Proposition 6.14 in Frank Zorzitto, A Taste of Number Theory). If $p$ is an odd prime, then
$$\left(\frac{2}{p}\right) = \begin{cases} +1 & p \equiv 1, 7 \pmod{8}; \\ -1 & p \equiv 3, 5 \pmod{8}. \end{cases}$$

Proof. Suppose $p = 8k + 1$ for some positive integer $k$. There are $4k = \frac{p-1}{2}$ even integers between $1$ and $p$, namely
$$2, 4, 6, \ldots, 4k - 2, 4k, 4k + 2, 4k + 4, \ldots, 8k - 2, 8k.$$
Let us compute their product:
$$\begin{aligned} x &= 2 \cdot 4 \cdot 6 \cdots (4k-2) \cdot (4k) \cdot (4k+2) \cdot (4k+4) \cdots (8k-2) \cdot (8k) \\ &= 2^{4k} \bigl(1 \cdot 2 \cdot 3 \cdots (2k) \cdot (2k+1) \cdot (2k+2) \cdots (4k-1) \cdot (4k)\bigr) \\ &= 2^{4k} (4k)!. \end{aligned}$$
However,
$$\begin{aligned} 4k + 2 &\equiv 1 - 4k \pmod{p}, \\ 4k + 4 &\equiv 3 - 4k \pmod{p}, \\ &\;\;\vdots \\ 8k - 2 &\equiv -3 \pmod{p}, \\ 8k &\equiv -1 \pmod{p}. \end{aligned}$$
Using the above information, we can compute $x \pmod{p}$ as follows:
$$\begin{aligned} x &\equiv 2 \cdot 4 \cdot 6 \cdots (4k-2) \cdot (4k) \cdot (1-4k) \cdot (3-4k) \cdot (5-4k) \cdots (-3) \cdot (-1) \\ &\equiv 2 \cdot 4 \cdot 6 \cdots (4k-2) \cdot (4k) \cdot (4k-1) \cdot (4k-3) \cdot (4k-5) \cdots 3 \cdot 1 \cdot (-1)^{2k} \\ &\equiv (4k)! \pmod{p}. \end{aligned}$$
We conclude that $2^{4k}(4k)! \equiv (4k)! \pmod{p}$. Since $p$ is prime and $4k < p$, the factorial $(4k)!$ is coprime to $p$, so after cancelling $(4k)!$ on both sides we obtain
$$2^{\frac{p-1}{2}} = 2^{4k} \equiv 1 \pmod{p}.$$
By Euler's Test, the integer $2$ is a quadratic residue modulo $p$. The cases $p \equiv 3, 5, 7 \pmod{8}$ can be treated analogously and are left as an exercise to the reader.

Since we managed to understand how $\left(\frac{q}{p}\right)$ behaves for the fixed prime $q = 2$, one would hope that a similar result can be established for all other primes $q$. Indeed, this can be achieved with the Law of Quadratic Reciprocity, proved by the German mathematician Carl Friedrich Gauss at the age of 19.

Theorem 21.3 (Gauss's Law of Quadratic Reciprocity; Theorem 6.16 in Frank Zorzitto, A Taste of Number Theory). Let $p$ and $q$ be distinct odd primes. Then
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$
In other words,
$$\left(\frac{q}{p}\right) = \begin{cases} \left(\frac{p}{q}\right) & \text{if } p \equiv 1 \pmod{4} \text{ or } q \equiv 1 \pmod{4}; \\ -\left(\frac{p}{q}\right) & \text{if } p \equiv 3 \pmod{4} \text{ and } q \equiv 3 \pmod{4}. \end{cases}$$
The proof is quite non-trivial, so due to time limitations we will not present it in class or in these notes. If you would like to see the proof, see Section 6.4 in Frank Zorzitto, A Taste of Number Theory.

Example 21.4. Let us examine how the value of $\left(\frac{3}{p}\right)$ depends on the odd prime $p \neq 3$. By the Law of Quadratic Reciprocity,
$$\left(\frac{p}{3}\right)\left(\frac{3}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{3-1}{2}} = (-1)^{\frac{p-1}{2}}.$$
Multiplying both sides of the above equality by $\left(\frac{p}{3}\right)$ and using $\left(\frac{p}{3}\right)^2 = 1$, we obtain
$$\left(\frac{3}{p}\right) = (-1)^{\frac{p-1}{2}} \left(\frac{p}{3}\right).$$
Now there are two cases to consider:

1. Suppose that $p \equiv 1 \pmod{4}$. Then $\left(\frac{3}{p}\right) = \left(\frac{p}{3}\right)$, so the value of $\left(\frac{3}{p}\right)$ depends on the congruence class of $p$ modulo $3$. Note that $\left(\frac{1}{3}\right) = +1$ and $\left(\frac{2}{3}\right) = -1$. We conclude that $\left(\frac{3}{p}\right) = +1$ if $p \equiv 1 \pmod{4}$ and $p \equiv 1 \pmod{3}$, and $\left(\frac{3}{p}\right) = -1$ if $p \equiv 1 \pmod{4}$ and $p \equiv 2 \pmod{3}$. Since $3$ and $4$ are coprime, we can apply the Chinese Remainder Theorem to conclude that $\left(\frac{3}{p}\right) = +1$ when $p \equiv 1 \pmod{12}$ and $\left(\frac{3}{p}\right) = -1$ when $p \equiv 5 \pmod{12}$.

2. Analogously, we can analyze the case $p \equiv 3 \pmod{4}$. We have $\left(\frac{3}{p}\right) = -\left(\frac{p}{3}\right)$, which means that $\left(\frac{3}{p}\right) = +1$ if $p \equiv 3 \pmod{4}$ and $p \equiv 2 \pmod{3}$, and $\left(\frac{3}{p}\right) = -1$ if $p \equiv 3 \pmod{4}$ and $p \equiv 1 \pmod{3}$. Applying the Chinese Remainder Theorem, we see that $\left(\frac{3}{p}\right) = +1$ when $p \equiv 11 \pmod{12}$ and $\left(\frac{3}{p}\right) = -1$ when $p \equiv 7 \pmod{12}$.

We conclude that
$$\left(\frac{3}{p}\right) = \begin{cases} +1 & p \equiv 1, 11 \pmod{12}; \\ -1 & p \equiv 5, 7 \pmod{12}. \end{cases}$$

Exercise 21.5. Determine for which odd primes $p$ the Legendre symbols $\left(\frac{\pm 5}{p}\right)$ and $\left(\frac{\pm 7}{p}\right)$ are equal to $+1$ or $-1$.

Exercise 21.6. Let us determine the value of $\left(\frac{247}{479}\right)$. Note that $247 = 13 \cdot 19$, $13 \equiv 1 \pmod{4}$, and $19 \equiv 479 \equiv 3 \pmod{4}$. Then we may use the multiplicativity of the Legendre symbol and the Law of Quadratic Reciprocity as follows:
$$\begin{aligned} \left(\frac{247}{479}\right) &= \left(\frac{13}{479}\right)\left(\frac{19}{479}\right) = \left(\frac{479}{13}\right) \cdot \left(-\left(\frac{479}{19}\right)\right) = -\left(\frac{11}{13}\right)\left(\frac{4}{19}\right) \\ &= -\left(\frac{11}{13}\right)\left(\frac{2}{19}\right)^2 = -\left(\frac{11}{13}\right) = -\left(\frac{13}{11}\right) = -\left(\frac{2}{11}\right) = 1. \end{aligned}$$
Note that the last equality holds because the only quadratic residues in $\mathbb{Z}_{11}^*$ are $[1], [3], [4], [5]$ and $[9]$. Since $[2]$ is not in this list, it is a quadratic nonresidue.

22 Multiplicative Functions

The last 16 sections were all devoted to the theory of congruences, and at this point it is time to switch gears and move towards other topics. In this section, we begin our first exposure to analytic number theory. In analytic number theory, we use the tools of real or complex analysis to answer questions in number theory. For example, the techniques of analytic number theory allow us to explain the asymptotic behaviour of the functions
$$\pi(x) = \#\{p \leq x : p \text{ is prime}\} \quad \text{or} \quad Q(x) = \#\{n \leq x : n \text{ is squarefree}\}.$$
Here $\#X$ denotes the cardinality of the set $X$. The study of analytic number theory begins with the introduction of multiplicative and totally multiplicative functions.

Definition 22.1. A non-zero function $f : \mathbb{N} \to \mathbb{C}$ is called multiplicative if for any coprime positive integers $m$ and $n$ we have $f(mn) = f(m) f(n)$.

Definition 22.2.
A non-zero function $f : \mathbb{N} \to \mathbb{C}$ is called totally multiplicative if for any positive integers $m$ and $n$, not necessarily coprime, we have $f(mn) = f(m) f(n)$.

Example 22.3. Here are some examples of multiplicative and totally multiplicative functions:
1. The indicator function $I(n)$ is totally multiplicative:
$$I(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{if } n \neq 1. \end{cases}$$
2. The constant function $\mathbf{1}(n)$ is totally multiplicative: $\mathbf{1}(n) = 1$ for all $n$.
3. The identity function $i(n)$ is totally multiplicative: $i(n) = n$ for all $n$.
4. The Legendre symbol $\left(\frac{n}{p}\right)$ for a fixed odd prime $p$ is totally multiplicative in accordance with Proposition 21.1 (extended to all positive integers $n$ by the usual convention $\left(\frac{n}{p}\right) := 0$ when $p \mid n$).
5. The Euler totient function $\varphi(n)$ is multiplicative, but not totally multiplicative.
6. The number of divisors function $\tau(n)$ is multiplicative, but not totally multiplicative:
$$\tau(n) = \#\{d : d \mid n,\ d > 0\}.$$
7. The sum of divisors function $\sigma(n)$ is multiplicative, but not totally multiplicative:
$$\sigma(n) = \sum_{\substack{d \mid n \\ d > 0}} d.$$
8. The Möbius function is multiplicative, but not totally multiplicative (you will prove this fact in Assignment 5):
$$\mu(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{if } n \text{ is not squarefree}; \\ (-1)^k, & \text{if } n \text{ is squarefree with } k \text{ distinct prime factors}. \end{cases}$$

We will now explore some properties of multiplicative functions.

Proposition 22.4 (Proposition 8.2 in Frank Zorzitto, A Taste of Number Theory). If $m$ and $n$ are coprime positive integers, then every positive divisor $d$ of their product $mn$ comes from a unique pair of positive integers $a$ and $b$ such that $a \mid m$, $b \mid n$ and $ab = d$.

Proof. If the unique factorizations of $m$ and $n$ are given by
$$m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad \text{and} \quad n = q_1^{f_1} q_2^{f_2} \cdots q_\ell^{f_\ell},$$
then the unique factorization of a positive divisor $d$ of $mn$ takes the form
$$d = p_1^{r_1} p_2^{r_2} \cdots p_k^{r_k} q_1^{s_1} q_2^{s_2} \cdots q_\ell^{s_\ell},$$
where $0 \leq r_i \leq e_i$ and $0 \leq s_j \leq f_j$. If we now set
$$a = p_1^{r_1} p_2^{r_2} \cdots p_k^{r_k} \quad \text{and} \quad b = q_1^{s_1} q_2^{s_2} \cdots q_\ell^{s_\ell},$$
it becomes clear that $a \mid m$, $b \mid n$ and $ab = d$. Now we need to confirm that the above $a$ and $b$ are unique.
Suppose that there exist positive integers $c$ and $e$ such that $c \mid m$, $e \mid n$ and $ce = d$. Then $ce = ab$. Since $c \mid m$, $b \mid n$ and $\gcd(m, n) = 1$, the integers $c$ and $b$ are coprime. Since $c \mid ab$, it follows that $c \mid a$. By a symmetric argument, $a \mid c$, whence $a = c$, and then $b = e$.

Proposition 22.5. Let $f : \mathbb{N} \to \mathbb{C}$ be a multiplicative function. Then
1. $f(1) = 1$;
2. The function $f(n)$ is fully determined by its values at prime powers;
3. The function $g(n)$ given by
$$g(n) := \sum_{\substack{d \mid n \\ d > 0}} f(d)$$
is multiplicative.

Proof. To establish property 1, note that $1$ and $n$ are coprime for every $n$, so $f(n) = f(1 \cdot n) = f(1) f(n)$. By definition, $f$ is non-zero, so there exists some $n$ such that $f(n) \neq 0$. For such $n$, we may cancel $f(n)$ on both sides of the above equality, leaving $f(1) = 1$.

To establish property 2, let $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ be the prime factorization of $n$. Then
$$\begin{aligned} f(n) &= f(p_1^{e_1}) f(p_2^{e_2} \cdots p_k^{e_k}) && \text{since } \gcd(p_1^{e_1}, p_2^{e_2} \cdots p_k^{e_k}) = 1; \\ &= f(p_1^{e_1}) f(p_2^{e_2}) f(p_3^{e_3} \cdots p_k^{e_k}) && \text{since } \gcd(p_2^{e_2}, p_3^{e_3} \cdots p_k^{e_k}) = 1; \\ &\;\;\vdots \\ &= f(p_1^{e_1}) f(p_2^{e_2}) \cdots f(p_k^{e_k}). \end{aligned}$$
Thus if we know the values of $f(p^e)$ for all prime powers $p^e$, we know the values of $f(n)$ for all positive integers $n$ (recall that $f(1) = 1$ by property 1).

To establish property 3, let $m$ and $n$ be coprime positive integers. Using Proposition 22.4, we obtain
$$\begin{aligned} g(mn) &= \sum_{d \mid mn} f(d) \\ &= \sum_{a \mid m,\ b \mid n} f(ab) && \text{by Proposition 22.4;} \\ &= \sum_{a \mid m,\ b \mid n} f(a) f(b) && \text{since } \gcd(a, b) = 1 \text{ and } f \text{ is multiplicative;} \\ &= \left(\sum_{a \mid m} f(a)\right)\left(\sum_{b \mid n} f(b)\right) \\ &= g(m) g(n). \end{aligned}$$
Moreover, $g(1) = f(1) = 1$, so $g$ is non-zero.

Proposition 22.6. The Euler totient function $\varphi(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then
$$\varphi(n) = \left(p_1^{e_1} - p_1^{e_1 - 1}\right)\left(p_2^{e_2} - p_2^{e_2 - 1}\right) \cdots \left(p_k^{e_k} - p_k^{e_k - 1}\right).$$

Proof. For an integer $x$, let us use the notation $[x]_n$ to indicate the residue class of $x$ modulo $n$. Let $m$ and $n$ be coprime integers exceeding $1$. We will show that $\mathbb{Z}_{mn}^*$ is in one-to-one correspondence with the Cartesian product
$$\mathbb{Z}_m^* \times \mathbb{Z}_n^* = \{(\alpha, \beta) : \alpha \in \mathbb{Z}_m^*,\ \beta \in \mathbb{Z}_n^*\}$$
via the map $[x]_{mn} \mapsto ([x]_m, [x]_n)$. Let $[x]_{mn} \in \mathbb{Z}_{mn}^*$. Then $\gcd(x, mn) = 1$, which means that $\gcd(x, m) = 1$ and $\gcd(x, n) = 1$.
But then $[x]_m$ and $[x]_n$ must be units, so $[x]_m \in \mathbb{Z}_m^*$ and $[x]_n \in \mathbb{Z}_n^*$. Conversely, if $[a]_m \in \mathbb{Z}_m^*$ and $[b]_n \in \mathbb{Z}_n^*$, then by the Chinese Remainder Theorem there exists a unique $[x]_{mn} \in \mathbb{Z}_{mn}$ such that $[x]_m = [a]_m$ and $[x]_n = [b]_n$. Therefore $x$ is coprime to both $m$ and $n$, and so $x$ is coprime to $mn$. Thus we conclude that $[x]_{mn} \in \mathbb{Z}_{mn}^*$. Now that we have seen that there is a one-to-one correspondence between $\mathbb{Z}_{mn}^*$ and $\mathbb{Z}_m^* \times \mathbb{Z}_n^*$, we can conclude that
$$\#\mathbb{Z}_{mn}^* = \#(\mathbb{Z}_m^* \times \mathbb{Z}_n^*).$$
Since the cardinality of a Cartesian product is equal to the product of the cardinalities of the individual sets, i.e. $\#(\mathbb{Z}_m^* \times \mathbb{Z}_n^*) = \#\mathbb{Z}_m^* \cdot \#\mathbb{Z}_n^*$, with the help of Exercise 10.2 we can conclude that
$$\varphi(mn) = \#\mathbb{Z}_{mn}^* = \#(\mathbb{Z}_m^* \times \mathbb{Z}_n^*) = \#\mathbb{Z}_m^* \cdot \#\mathbb{Z}_n^* = \varphi(m)\varphi(n).$$
(If $m = 1$ or $n = 1$, the identity $\varphi(mn) = \varphi(m)\varphi(n)$ is trivial.)

In order to establish the formula for $\varphi(n)$, recall that according to property 2 of Proposition 22.5 it is sufficient to compute $\varphi(p^e)$ for a prime power $p^e$. The only positive integers less than $p^e$ that are not coprime to it are
$$p, 2p, 3p, \ldots, (p^{e-1} - 1)p.$$
There are $p^{e-1} - 1$ such numbers in total, which means that
$$\varphi(p^e) = (p^e - 1) - (p^{e-1} - 1) = p^e - p^{e-1}.$$
Now that we know the formula for $\varphi(p^e)$ when $p^e$ is a prime power, it is straightforward to write down the general formula for $\varphi(n)$ because $\varphi$ is multiplicative.

Proposition 22.7. The number of divisors function $\tau(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then
$$\tau(n) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1).$$

Proof. Let $n \geq 2$ be an integer and consider its prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$. Every divisor $d$ of $n$ must be of the form
$$d = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k},$$
where $0 \leq f_i \leq e_i$ for all $i = 1, 2, \ldots, k$. Each $f_i$ has $e_i + 1$ possibilities, so there are exactly
$$\tau(n) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1)$$
possible divisors of $n$. Now suppose that
$$m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad \text{and} \quad n = q_1^{f_1} q_2^{f_2} \cdots q_\ell^{f_\ell}$$
are coprime, i.e. the primes $p_1, p_2, \ldots, p_k, q_1, q_2, \ldots, q_\ell$ are distinct. Then
$$\tau(mn) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1)(f_1 + 1)(f_2 + 1) \cdots (f_\ell + 1) = \tau(m)\tau(n),$$
which means that $\tau(n)$ is a multiplicative function.

Proposition 22.8. The sum of divisors function $\sigma(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then
$$\sigma(n) = \frac{p_1^{e_1 + 1} - 1}{p_1 - 1} \cdot \frac{p_2^{e_2 + 1} - 1}{p_2 - 1} \cdots \frac{p_k^{e_k + 1} - 1}{p_k - 1}.$$

Proof. To see that $\sigma(n)$ is multiplicative, note that
$$\sigma(n) = \sum_{\substack{d \mid n \\ d > 0}} d = \sum_{\substack{d \mid n \\ d > 0}} i(d),$$
where $i(n) = n$ is the identity function. Since the identity function $i(n)$ is multiplicative, it follows from property 3 of Proposition 22.5 that $\sigma(n)$ is multiplicative as well. In order to establish the formula for $\sigma(n)$, recall that according to property 2 of Proposition 22.5 it is sufficient to compute $\sigma(p^e)$ for a prime power $p^e$. The divisors of $p^e$ are $1, p, p^2, \ldots, p^e$, so
$$\sigma(p^e) = 1 + p + p^2 + \cdots + p^e = \frac{p^{e+1} - 1}{p - 1}.$$
The last equality holds because the sequence $1, p, \ldots, p^e$ is an $(e+1)$-term geometric progression with first term $1$ and common ratio $p$. Now that we know the formula for $\sigma(p^e)$ when $p^e$ is a prime power, it is straightforward to write down the general formula for $\sigma(n)$ because $\sigma$ is multiplicative.

23 The Möbius Inversion

From now on, when writing $d \mid n$, we will always assume that the divisor $d$ is positive. As we shall see, the Möbius function
$$\mu(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{if } n \text{ is not squarefree}; \\ (-1)^k, & \text{if } n \text{ is squarefree with } k \text{ distinct prime factors} \end{cases}$$
plays a crucial role in analytic number theory.

Proposition 23.1 (Proposition 8.6 in Frank Zorzitto, A Taste of Number Theory). For every $n \geq 1$,
$$\sum_{d \mid n} \mu(d) = I(n).$$

Proof. Let $g(n) = \sum_{d \mid n} \mu(d)$. Note that $g(1) = \mu(1) = 1 = I(1)$. Since $\mu(n)$ is multiplicative, it follows from property 3 of Proposition 22.5 that $g(n)$ is multiplicative as well. Since $I(n)$ is also multiplicative, by property 2 of Proposition 22.5 it suffices to check that $g(p^e) = 0 = I(p^e)$ for every prime power $p^e$ with $e \geq 1$. We have
$$\begin{aligned} g(p^e) &= \sum_{d \mid p^e} \mu(d) = \mu(1) + \mu(p) + \mu(p^2) + \cdots + \mu(p^e) \\ &= 1 - 1 + 0 + \cdots + 0 = 0 = I(p^e), \end{aligned}$$
so the result follows.

The Möbius function is important because it allows us to express the function $f$ in terms of $g$ whenever these two functions are connected by the relation
$$g(n) = \sum_{d \mid n} f(d).$$
The operation of expressing $f$ through $g$ is called Möbius inversion.

Proposition 23.2 (Theorem 8.7 in Frank Zorzitto, A Taste of Number Theory). If $f$ and $g$ are arbitrary functions, not necessarily multiplicative, that are defined on the set of positive integers and satisfy
$$g(n) = \sum_{d \mid n} f(d)$$
for all $n \geq 1$, then
$$f(n) = \sum_{d \mid n} g(d)\, \mu\!\left(\frac{n}{d}\right) = \sum_{d \mid n} g\!\left(\frac{n}{d}\right) \mu(d).$$

Proof. First, note that for a positive integer $n$ and a pair of positive integers $d, e$ we have $de \mid n$ if and only if $d \mid n$ and $e \mid n/d$. Second, note that
$$\begin{aligned} \sum_{d \mid n} g\!\left(\frac{n}{d}\right) \mu(d) &= \sum_{d \mid n} \sum_{e \mid \frac{n}{d}} f(e)\, \mu(d) = \sum_{\substack{d \mid n \\ e \mid \frac{n}{d}}} f(e)\, \mu(d) = \sum_{ed \mid n} f(e)\, \mu(d) \\ &= \sum_{\substack{e \mid n \\ d \mid \frac{n}{e}}} f(e)\, \mu(d) = \sum_{e \mid n} \Biggl(\sum_{d \mid \frac{n}{e}} \mu(d)\Biggr) f(e) = \sum_{e \mid n} I\!\left(\frac{n}{e}\right) f(e) = f(n), \end{aligned}$$
where the next-to-last equality is Proposition 23.1 and the last one holds because $I(n/e) = 1$ exactly when $e = n$.

Before proceeding to examples of Möbius inversion, let us prove the following fact about the Euler totient function $\varphi(n)$.

Proposition 23.3 (Proposition 8.4 in Frank Zorzitto, A Taste of Number Theory). For every positive integer $n$,
$$\sum_{d \mid n} \varphi(d) = n.$$

Proof. Let $g(n) = \sum_{d \mid n} \varphi(d)$. By property 3 of Proposition 22.5, the function $g(n)$ is multiplicative. Therefore, by property 2 of Proposition 22.5, it is sufficient to understand its values $g(p^e)$ for prime powers $p^e$. Using the formula given in Proposition 22.6, we obtain
$$\begin{aligned} g(p^e) &= \varphi(1) + \varphi(p) + \varphi(p^2) + \cdots + \varphi(p^e) \\ &= 1 + (p - 1) + (p^2 - p) + \cdots + (p^e - p^{e-1}) = p^e, \end{aligned}$$
since the sum telescopes. And now, since $g(n)$ is multiplicative, for any integer $n$ with prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ we may conclude that
$$g(n) = g(p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}) = g(p_1^{e_1}) g(p_2^{e_2}) \cdots g(p_k^{e_k}) = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} = n.$$
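The formulas of Propositions 22.6 to 22.8 and the identities of Propositions 23.1 and 23.3 are easy to test numerically. The Python sketch below is an illustration added to these notes, not part of the original text: it factors $n$ by trial division (adequate only for small $n$), evaluates $\varphi$, $\tau$, $\sigma$ and $\mu$ from the prime factorization, and compares them with brute-force definitions. The function names, including the helper `is_multiplicative_up_to`, are hypothetical, and a finite check like this is evidence rather than a proof.

```python
from math import gcd


def factorize(n):
    """Prime factorization of n >= 1 as a dict {p: e}, by trial division."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi(n):
    """Euler's totient via Proposition 22.6: product of p^e - p^(e-1)."""
    result = 1
    for p, e in factorize(n).items():
        result *= p**e - p**(e - 1)
    return result


def tau(n):
    """Number of divisors via Proposition 22.7: product of (e + 1)."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result


def sigma(n):
    """Sum of divisors via Proposition 22.8: product of (p^(e+1) - 1)/(p - 1)."""
    result = 1
    for p, e in factorize(n).items():
        result *= (p**(e + 1) - 1) // (p - 1)
    return result


def mobius(n):
    """Moebius function: 0 if some p^2 divides n, else (-1)^(number of primes)."""
    exponents = list(factorize(n).values())
    if any(e > 1 for e in exponents):
        return 0
    return (-1) ** len(exponents)


def divisors(n):
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]


def is_multiplicative_up_to(f, limit):
    """Check f(mn) = f(m) f(n) for all coprime 1 <= m, n < limit."""
    return all(f(m * n) == f(m) * f(n)
               for m in range(1, limit) for n in range(1, limit)
               if gcd(m, n) == 1)


if __name__ == "__main__":
    for n in range(1, 500):
        ds = divisors(n)
        assert phi(n) == sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)
        assert tau(n) == len(ds) and sigma(n) == sum(ds)
        assert sum(mobius(d) for d in ds) == (1 if n == 1 else 0)   # Prop. 23.1
        assert sum(phi(d) for d in ds) == n                          # Prop. 23.3
    print(all(is_multiplicative_up_to(f, 40) for f in (phi, tau, sigma, mobius)))
```

For example, `phi(4) == phi(2) * phi(2)` evaluates to `False` ($2 \neq 1$), which confirms that $\varphi$ is multiplicative but not totally multiplicative.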
Now that we have established the connection between the identity function $i(n)$ and the Euler totient function $\varphi(n)$, we can write down a new formula for $\varphi(n)$ via Möbius inversion.

Example 23.4. Let us prove that for every positive integer $n$,
$$\varphi(n) = n \sum_{d \mid n} \frac{\mu(d)}{d}.$$
By Proposition 23.3, the identity function $i(n)$ and the Euler totient function $\varphi(n)$ are connected by the relation
$$i(n) = \sum_{d \mid n} \varphi(d).$$
Now the Möbius inversion formula tells us that
$$\varphi(n) = \sum_{d \mid n} \mu(d)\, i\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \frac{n}{d} = n \sum_{d \mid n} \frac{\mu(d)}{d}.$$

Example 23.5. Note that
$$\sigma(n) = \sum_{d \mid n} d,$$
which means that there is a connection between the sum of divisors function $\sigma(n)$ and the identity function $i(n)$. It then follows from the Möbius inversion formula that
$$n = \sum_{d \mid n} \mu(d)\, \sigma\!\left(\frac{n}{d}\right).$$

Exercise 23.6. The von Mangoldt function, denoted by $\Lambda(n)$, is defined as
$$\Lambda(n) = \begin{cases} \log p, & \text{if } n = p^k \text{ for some prime } p \text{ and integer } k \geq 1; \\ 0, & \text{otherwise.} \end{cases}$$
Prove that
$$\log n = \sum_{d \mid n} \Lambda(d),$$
and then use Möbius inversion to establish the formula
$$\Lambda(n) = -\sum_{d \mid n} \mu(d) \log d.$$

24 The Prime Number Theorem

In 1797 or 1798, Legendre conjectured that the number of primes up to $x$ is approximated by the function $\frac{x}{A \log x + B}$, where $A$ and $B$ are some constants. According to the recollections of Gauss, “in the year 1792 or 1793”, when he was 15 or 16 years old, he made a similar observation. In simple terms, this conjecture states that up to $x$ there are “roughly” $\frac{x}{\log x}$ prime numbers. The Prime Number Theorem confirms the conjecture made by Legendre and Gauss. It is one of the most renowned results in analytic number theory. The Prime Number Theorem was proved independently by Jacques Hadamard and Charles Jean de la Vallée-Poussin in 1896.

Theorem 24.1 (The Prime Number Theorem). Let $\pi(x) := \#\{p \leq x : p \text{ is prime}\}$. Then
$$\lim_{x \to \infty} \frac{\pi(x)}{x / \log x} = 1.$$
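To get a feeling for Theorem 24.1, one can tabulate $\pi(x)$ and compare it with $x/\log x$. The sketch below is an illustration, not part of the original notes: it assumes the sieve of Eratosthenes (a standard method for listing primes) as the way to compute $\pi(x)$, and the function name `prime_counts` is hypothetical. It only prints the ratio from the theorem; a finite table cannot, of course, establish the limit.

```python
import math


def prime_counts(limit):
    """pi[k] = number of primes <= k for 0 <= k <= limit (limit >= 2),
    computed with the sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # cross out p^2, p^2 + p, ...; smaller multiples were crossed out earlier
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    pi, count = [], 0
    for flag in is_prime:
        count += flag
        pi.append(count)
    return pi


if __name__ == "__main__":
    pi = prime_counts(10**6)
    for k in range(1, 7):
        x = 10**k
        ratio = pi[x] / (x / math.log(x))
        print(f"x = 10^{k}: pi(x) = {pi[x]}, pi(x) / (x / log x) = {ratio:.4f}")
```

For $x = 10, 100, \ldots, 10^6$ the counts come out as $4, 25, 168, 1229, 9592, 78498$. The ratio decreases only slowly towards $1$; it is still about $1.08$ at $x = 10^6$, which hints at why a sharper main term is desirable.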
A more accurate statement of the Prime Number Theorem is the following one:
$$\pi(x) = \operatorname{Li}(x) + O\!\left(x e^{-a\sqrt{\log x}}\right),$$
where $a$ is a positive constant and
$$\operatorname{Li}(x) = \int_2^x \frac{dt}{\log t}.$$
Indeed, the function $\operatorname{Li}(x)$ describes the behaviour of the prime counting function more precisely than $\frac{x}{\log x}$. In this form, we also see the error term, which tells us how far the value of $\pi(x)$ is from the value of $\operatorname{Li}(x)$.

The analytic proof of the Prime Number Theorem relies heavily on complex analysis, so it is not “elementary”. More precisely, it requires a delicate analysis of the (non-trivial) zeros of the Riemann zeta function
$$\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s},$$
where the series converges for complex numbers $s$ with $\operatorname{Re}(s) > 1$; the zeros in question belong to the analytic continuation of $\zeta(s)$ to the rest of the complex plane. The elementary proof of the Prime Number Theorem was discovered half a century later, in 1948, by the Norwegian mathematician Atle Selberg. (On the history of the elementary proof of the Prime Number Theorem and Selberg's dispute with Erdős, see the article of D. Goldfeld, The elementary proof of the prime number theorem: an historical perspective, 2003.)

Since the proof was introduced, the error term $O\!\left(x e^{-a\sqrt{\log x}}\right)$ has been improved many times. If the Riemann Hypothesis is true, the error term can be improved to $O(\sqrt{x} \log x)$. The Riemann Hypothesis concerns the distribution of the non-trivial zeros of $\zeta(s)$. It is undoubtedly one of the hardest open problems in mathematics. At the University of Waterloo, there are several experts who work in analytic number theory and on problems related to the distribution of zeros of the Riemann zeta function, including Yu-Ru Liu and Michael Rubinstein.

It is worth mentioning a very interesting elementary argument of Erdős, which explains why the function $\frac{x}{\log x}$ “captures” the behaviour of $\pi(x)$. The proof does not involve any analytic techniques and should be quite accessible to second or third year undergraduate students in mathematics. To those who are interested in the subject, we recommend this proof for further reading.

Theorem 24.2 (Erdős, 1949). For $x \geq 2$,
$$\frac{3 \log 2}{8} \cdot \frac{x}{\log x} < \pi(x) < (6 \log 2) \cdot \frac{x}{\log x}.$$

Proof. See Theorem 4 in https://uwaterloo.ca/pure-mathematics/sites/ca.pure-mathematics/files/uploads/files/pmath440notes_0.pdf.
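The next sketch, again an illustration rather than part of the notes, compares $\pi(x)$ with $\operatorname{Li}(x)$ and with $x/\log x$, and checks the bounds of Theorem 24.2 on a finite range. It rests on two assumed techniques that the notes do not discuss: the sieve of Eratosthenes for $\pi(x)$, and Simpson's rule for the integral, applied after the substitution $t = e^u$ that turns $\operatorname{Li}(x)$ into $\int_{\log 2}^{\log x} \frac{e^u}{u}\, du$, whose integrand is smooth enough for a uniform grid. The names `prime_counts`, `li_offset` and `erdos_bounds` are hypothetical.

```python
import math


def prime_counts(limit):
    """pi[k] = number of primes <= k for 0 <= k <= limit (limit >= 2),
    computed with the sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    pi, count = [], 0
    for flag in is_prime:
        count += flag
        pi.append(count)
    return pi


def li_offset(x, steps=10_000):
    """Li(x) = integral from 2 to x of dt / log t, by Simpson's rule in u = log t.

    With t = e^u the integral becomes the integral of e^u / u over [log 2, log x].
    """
    steps += steps % 2                   # Simpson's rule needs an even count
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps

    def f(u):
        return math.exp(u) / u

    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def erdos_bounds(x):
    """Lower and upper bounds for pi(x) from Theorem 24.2 (valid for x >= 2)."""
    base = x / math.log(x)
    return 3 * math.log(2) / 8 * base, 6 * math.log(2) * base


if __name__ == "__main__":
    pi = prime_counts(10**6)
    for x in (10**4, 10**5, 10**6):
        print(x, pi[x], round(li_offset(x), 1), round(x / math.log(x), 1))
    for x in range(2, 10**5 + 1):        # finite check of Theorem 24.2
        lo, hi = erdos_bounds(x)
        assert lo < pi[x] < hi, x
```

At $x = 10^6$ the sketch should report $\pi(x) = 78498$, $\operatorname{Li}(x) \approx 78626.5$ and $x/\log x \approx 72382.4$, so $\operatorname{Li}(x)$ is by far the closer approximation, in line with the discussion above. The Erdős bounds, by contrast, are far from tight; their value lies in holding for every $x \geq 2$.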
25 The Density of Squarefree Numbers

In this section, we will see one basic analytic result on the density of squarefree numbers.

Theorem 25.1. Let $Q(x) = \#\{1 \leq n \leq x : n \text{ is squarefree}\}$. Then the natural asymptotic density of squarefree numbers is given by
$$\lim_{x \to \infty} \frac{Q(x)}{x} = \frac{6}{\pi^2} \approx 0.6079.$$
In other words, Theorem 25.1 tells us that over 60% of all positive integers are squarefree. Before proceeding to the proof, let us establish the following simple lemma.

Lemma 25.2. Let $f(n)$ be a multiplicative function such that the series
$$\sum_{n=1}^{\infty} |f(n)|$$
converges. Then
$$\sum_{n=1}^{\infty} f(n) = \prod_{p \text{ prime}} \left(1 + f(p) + f(p^2) + \cdots\right).$$

Proof. For a fixed positive number $y$, the following identity holds (expand the finite product, which is legitimate because all the series involved converge absolutely, and use the multiplicativity of $f$ together with unique factorization):
$$\prod_{\substack{p \text{ prime} \\ p < y}} \left(1 + f(p) + f(p^2) + \cdots\right) = \sum_{\substack{n \geq 1 \\ p \mid n \,\Rightarrow\, p < y}} f(n).$$
As $y$ approaches infinity, the right hand side approaches $\sum_{n=1}^{\infty} f(n)$, while the left hand side approaches the desired Euler product. To make this precise, note that since the series $\sum_{n=1}^{\infty} |f(n)|$ converges, we have
$$\sum_{n \geq y} |f(n)| \to 0$$
as $y$ approaches infinity. Every $n$ with a prime factor $p \geq y$ satisfies $n \geq y$, so, as $y$ approaches infinity,
$$\left| \sum_{n=1}^{\infty} f(n) - \sum_{\substack{n \geq 1 \\ p \mid n \,\Rightarrow\, p < y}} f(n) \right| = \left| \sum_{\substack{n \geq 1 \\ \exists\, p \mid n,\ p \geq y}} f(n) \right| \leq \sum_{n \geq y} |f(n)| \to 0.$$
This observation allows us to conclude that
$$\sum_{n=1}^{\infty} f(n) = \lim_{y \to \infty} \sum_{\substack{n \geq 1 \\ p \mid n \,\Rightarrow\, p < y}} f(n) = \lim_{y \to \infty} \prod_{\substack{p \text{ prime} \\ p < y}} \left(1 + f(p) + f(p^2) + \cdots\right) = \prod_{p \text{ prime}} \left(1 + f(p) + f(p^2) + \cdots\right).$$

Proof (of Theorem 25.1). Note that
$$\mu^2(n) = \begin{cases} 1, & \text{if } n \text{ is squarefree}; \\ 0, & \text{otherwise}, \end{cases}$$
which means that
$$Q(x) = \sum_{n \leq x} \mu^2(n).$$
Let $\ell(n)$ denote the largest integer such that $\ell(n)^2 \mid n$. If $n = p_1^{e_1} \cdots p_k^{e_k}$, then $\ell(n) = p_1^{\lfloor e_1/2 \rfloor} \cdots p_k^{\lfloor e_k/2 \rfloor}$, so $d^2 \mid n$ if and only if $d \mid \ell(n)$. Then it follows from Proposition 23.1 that
$$\mu^2(n) = \begin{cases} 1, & \text{if } \ell(n) = 1; \\ 0, & \text{otherwise} \end{cases} \;=\; I(\ell(n)) = \sum_{d \mid \ell(n)} \mu(d) = \sum_{d \,:\, d^2 \mid n} \mu(d).$$
As it turns out, this formula is much easier to analyze than $\mu^2(n)$.
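Now let $\{x\} := x - \lfloor x \rfloor$ denote the fractional part of $x$. Note that $0 \leq \{x\} < 1$ for any $x$. Then
$$\begin{aligned} Q(x) &= \sum_{n \leq x} \mu^2(n) = \sum_{n \leq x} \sum_{d \,:\, d^2 \mid n} \mu(d) = \sum_{d \leq \sqrt{x}} \mu(d) \sum_{\substack{n \leq x \\ d^2 \mid n}} 1 \\ &= \sum_{d \leq \sqrt{x}} \mu(d) \left\lfloor \frac{x}{d^2} \right\rfloor = \sum_{d \leq \sqrt{x}} \mu(d) \left( \frac{x}{d^2} - \left\{ \frac{x}{d^2} \right\} \right) \\ &= x \sum_{d \leq \sqrt{x}} \frac{\mu(d)}{d^2} - \sum_{d \leq \sqrt{x}} \mu(d) \left\{ \frac{x}{d^2} \right\}. \end{aligned}$$
Since $\left|\mu(d) \left\{ x/d^2 \right\}\right| < 1$, we conclude that
$$\begin{aligned} Q(x) &< x \sum_{d \leq \sqrt{x}} \frac{\mu(d)}{d^2} + \sum_{d \leq \sqrt{x}} 1 = x \sum_{d \leq \sqrt{x}} \frac{\mu(d)}{d^2} + \lfloor \sqrt{x} \rfloor \\ &\leq x \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} - x \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} + \sqrt{x}. \end{aligned}$$
Now observe that
$$\left| \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} \right| \leq \sum_{d > \sqrt{x}} \frac{1}{d^2} < \int_{\lfloor \sqrt{x} \rfloor}^{\infty} \frac{dt}{t^2} = \frac{1}{\lfloor \sqrt{x} \rfloor} \leq \frac{2}{\sqrt{x}}.$$
Above we used the fact that $\sqrt{x} \leq 2 \lfloor \sqrt{x} \rfloor$ for all $x \geq 2$. For convenience, define the constant $c$ as
$$c := \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2}.$$
Then
$$Q(x) \leq cx - x \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} + \sqrt{x} < cx + x \cdot \frac{2}{\sqrt{x}} + \sqrt{x} = cx + 3\sqrt{x}.$$
Through analogous observations, we can also establish a lower bound on $Q(x)$ and obtain the final relation
$$cx - 3\sqrt{x} < Q(x) < cx + 3\sqrt{x}.$$
Now the only thing left for us to do is to compute $c$. Recall that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$
This result was proved by Leonhard Euler in 1734. Further, by an argument analogous to the second proof of Theorem 2.10, we see that
$$\frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2} = \prod_{p \text{ prime}} \left(1 + \frac{1}{p^2} + \frac{1}{p^4} + \cdots \right) = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right)^{-1}.$$
The last equality holds due to the formula for the sum of an infinite geometric series. Since the function $\mu(n)/n^2$ is multiplicative and
$$\sum_{d=1}^{\infty} \left| \frac{\mu(d)}{d^2} \right| \leq \sum_{d=1}^{\infty} \frac{1}{d^2} = \frac{\pi^2}{6} < \infty,$$
we can apply Lemma 25.2 to the series $\sum_{d=1}^{\infty} \mu(d)/d^2$ in order to obtain
$$c = \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} = \prod_{p \text{ prime}} \left(1 + \frac{\mu(p)}{p^2} + \frac{\mu(p^2)}{p^4} + \cdots \right) = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right) = \left(\sum_{n=1}^{\infty} \frac{1}{n^2}\right)^{-1} = \frac{6}{\pi^2}.$$
Thus we conclude that
$$\frac{6}{\pi^2} x - 3\sqrt{x} < Q(x) < \frac{6}{\pi^2} x + 3\sqrt{x},$$
and further
$$\frac{6}{\pi^2} - \frac{3}{\sqrt{x}} < \frac{Q(x)}{x} < \frac{6}{\pi^2} + \frac{3}{\sqrt{x}}.$$
By letting $x$ tend to infinity, we see that the Squeeze Theorem implies
$$\lim_{x \to \infty} \frac{Q(x)}{x} = \frac{6}{\pi^2}.$$

26 Perfect Numbers

One of the oldest problems in mathematics concerns the existence of odd perfect numbers.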
Around 300 BC, these numbers were introduced by Euclid in his Elements (VII.22).

Definition 26.1. A positive integer $n$ is called perfect if the sum of its positive divisors is equal to $2n$, or in other words $\sigma(n) = 2n$.

The first eight perfect numbers are
$$6,\ 28,\ 496,\ 8128,\ 33550336,\ 8589869056,\ 137438691328,\ 2305843008139952128.$$
Aside from the fact that they grow quite quickly (which we shall explain later), we may notice one thing that they all have in common, namely that they are all even. But do odd perfect numbers exist? We do not know. This question has been studied thoroughly over the past two centuries, and quite a few things are known about odd perfect numbers. For example, if an odd perfect number $n$ exists, it must satisfy the following three (out of many other) criteria:
1. $n > 10^{1500}$;
2. $n$ has at least $101$ prime factors (counted with multiplicity) and at least $10$ distinct prime factors;
3. The largest prime factor of $n$ is greater than $10^8$.

In 2003, Carl Pomerance gave a heuristic argument for why the existence of odd perfect numbers is highly unlikely. Those who are interested can find his argument here: http://home.earthlink.net/~oddperfect/pomerance.html.

Unlike odd perfect numbers, we do know that even perfect numbers exist. Even more than that, we know exactly what even perfect numbers look like. However, we still do not know whether there are infinitely many even perfect numbers. As we shall see later, this problem is equivalent to showing that there are infinitely many Mersenne primes.

Definition 26.2. Let $M_n := 2^n - 1$. An integer $M_p = 2^p - 1$ is called a Mersenne prime if it is prime.

The first eight Mersenne primes are
$$3,\ 7,\ 31,\ 127,\ 8191,\ 131071,\ 524287,\ 2147483647.$$
As we will see in the proof of the Euclid-Euler Theorem, which was proved by Leonhard Euler in 1747, even perfect numbers and Mersenne primes are closely related.

Theorem 26.3.
(Euclid-Euler Theorem, 1747; Theorem 8.5 in Frank Zorzitto, A Taste of Number Theory). An even positive integer $n$ is a perfect number if and only if it has the form $n = 2^{p-1} M_p$, where $M_p$ is a Mersenne prime.

Proof. The sufficient condition was proved by Euclid around 300 BC. You are asked to reproduce his proof in Assignment 5, so we omit it in these lecture notes.

For the necessary condition, suppose that $n$ is even and perfect. Let us write $n = 2^{p-1} m$, where $m$ is odd; note that $p \geq 2$ because $n$ is even. We will show that $m = 2^p - 1$, and that $m$ is prime. Since $n$ is perfect,
$$\sigma(n) = 2n = 2^p m.$$
Because $2^{p-1}$ and $m$ are coprime and $\sigma$ is multiplicative, we also have $\sigma(n) = \sigma(2^{p-1}) \sigma(m)$. By adding up the divisors of $2^{p-1}$ we obtain
$$\sigma(2^{p-1}) = 1 + 2 + 2^2 + \cdots + 2^{p-1} = 2^p - 1.$$
We conclude that $\sigma(n) = (2^p - 1)\sigma(m)$, and so
$$2^p m = (2^p - 1)\sigma(m).$$
Since $2^p$ and $2^p - 1$ are coprime, $2^p - 1 \mid m$. So $m = (2^p - 1)d$ for some positive integer $d$.

Now we need to prove that $d = 1$. Substituting $m = (2^p - 1)d$ into the equality $2^p m = (2^p - 1)\sigma(m)$, we obtain $2^p (2^p - 1) d = (2^p - 1)\sigma(m)$, and thus $2^p d = \sigma(m)$. From $m = (2^p - 1)d$ and $2^p d = \sigma(m)$ we come to
$$m + d = 2^p d = \sigma(m).$$
Now suppose that $d > 1$. Since $2^p - 1 \geq 3$, we have $d < m$, so there are at least three distinct divisors of $m$, namely $1$, $d$ and $m$. So $\sigma(m) \geq m + d + 1$, which contradicts the fact that $\sigma(m) = m + d$. Therefore $d = 1$.

To see that $m$ is prime, note that $\sigma(m) = m + d = m + 1$. Since the divisors of $m$ add up to $m + 1$ and $m > 1$, the only divisors of $m$ are $1$ and $m$, which makes $m$ a prime. Hence our even perfect number $n$ is of the form $2^{p-1} M_p$, where $M_p = 2^p - 1$ is a Mersenne prime.

Though we do not know if there are infinitely many Mersenne primes, we do know quite a few of them.
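The Euclid-Euler Theorem turns the search for even perfect numbers into a search for Mersenne primes. The sketch below is not from the notes. To decide whether $M_p$ is prime, it uses the Lucas-Lehmer test, a standard criterion stated here without proof: for an odd prime $p$, $M_p$ is prime if and only if $s_{p-2} \equiv 0 \pmod{M_p}$, where $s_0 = 4$ and $s_{k+1} = s_k^2 - 2$. The function names are hypothetical.

```python
def is_prime_small(n):
    """Trial division; adequate for the small exponents p tried here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def lucas_lehmer(p):
    """Return True iff M_p = 2^p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True                     # M_2 = 3; the recurrence needs p odd
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def even_perfect_numbers(count):
    """First `count` even perfect numbers 2^(p-1) * M_p (Euclid-Euler Theorem).

    Only prime p need to be tried: if p = ab with a, b > 1, then 2^a - 1
    divides 2^p - 1, so M_p is composite.
    """
    found, p = [], 2
    while len(found) < count:
        if is_prime_small(p) and lucas_lehmer(p):
            found.append((1 << (p - 1)) * ((1 << p) - 1))
        p += 1
    return found


if __name__ == "__main__":
    print([p for p in range(2, 32) if is_prime_small(p) and lucas_lehmer(p)])
    print(even_perfect_numbers(8))
```

Under these assumptions, `even_perfect_numbers(8)` reproduces the eight perfect numbers listed after Definition 26.1, built from the eight Mersenne primes listed after Definition 26.2 (exponents $p = 2, 3, 5, 7, 13, 17, 19, 31$).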
On January 7th 2016, the Great Internet Mersenne Prime Search reported the discovery of the 49th known Mersenne prime, which is the largest Mersenne prime known to date. This prime is $M_{74207281}$, and it has 22338618 decimal digits. If you want to make a significant impact on Computational Number Theory, try to search for other Mersenne primes!

27 Pythagorean Triples

In Section 4, we learned how to solve the linear Diophantine equation $ax + by = c$. We will now turn our attention to equations of degree two or more. The analysis of such equations can be much more challenging, and many Diophantine equations, such as Thue equations, remain objects of active research. In this section, we will classify all positive integer solutions to the Pythagorean equation
$$x^2 + y^2 = z^2.$$
Note that if integers $x$, $y$ and $z$ satisfy the above equation, then so do the integers $dx$, $dy$ and $dz$ for any integer $d$. Thus it is only interesting to consider the case when $\gcd(x, y, z) = 1$. In this case, we call the solution $(x, y, z)$ primitive. The first three primitive solutions to the Pythagorean equation are $(x, y, z) = (3, 4, 5)$, $(5, 12, 13)$ and $(8, 15, 17)$.

Theorem 27.1. Suppose integers $x$, $y$ and $z$ satisfy the Pythagorean equation $x^2 + y^2 = z^2$. Then, after possibly interchanging $x$ and $y$, there exist integers $d$, $m$, $n$ such that
$$x = d(n^2 - m^2), \quad y = 2dmn, \quad z = d(n^2 + m^2).$$

Proof.[38] Let $d = \gcd(x, y, z)$. Then the triple $(x/d, y/d, z/d)$ is also a solution, so without loss of generality we may assume that $\gcd(x, y, z) = 1$, i.e. $(x, y, z)$ is a primitive solution. It follows that $x$ and $y$ have different parity. Indeed, if both were even, then $z^2$, and hence $z$, would be even, contradicting $\gcd(x, y, z) = 1$; and if both were odd, then $x^2 + y^2 \equiv 2 \pmod 4$, which contradicts the fact that $z^2 \equiv 0, 1 \pmod 4$ for any integer $z$. Without loss of generality, we may assume that $x$ is odd and $y$ is even, which means that $z$ is odd. Now we write
$$y^2 = z^2 - x^2 = (z - x)(z + x).$$

[38] The proof is taken from Section 1.1 of M. J. Jacobson, Jr. and H. C. Williams, Solving the Pell Equation, 2009.
If we let $g = \gcd(z - x, z + x)$, then $g = \gcd(2z, z + x) = \gcd(z - x, 2x)$ (see Proposition 5.1), so $g \mid 2z$ and $g \mid 2x$, which means that $g \mid \gcd(2z, 2x) = 2\gcd(z, x)$. Since $(x, y, z)$ is a primitive solution, it must be the case that $\gcd(z, x) = 1$: a common prime factor of $z$ and $x$ would divide $y^2 = z^2 - x^2$, hence $y$. This means that $g \mid 2$, and since $x$ and $z$ are odd, both $z - x$ and $z + x$ are even, so $g = 2$. Now we can write
$$\left(\frac{y}{2}\right)^2 = \frac{z - x}{2} \cdot \frac{z + x}{2}.$$
Since the left-hand side of the above equality is a perfect square and $\frac{z - x}{2}$, $\frac{z + x}{2}$ are coprime positive integers, both $\frac{z - x}{2}$ and $\frac{z + x}{2}$ must be perfect squares. Put
$$\frac{z - x}{2} = m^2 \quad \text{and} \quad \frac{z + x}{2} = n^2.$$
But then $x = n^2 - m^2$, $y = 2mn$ and $z = n^2 + m^2$. Multiplying back by $d$, the original solution is $(d(n^2 - m^2), 2dmn, d(n^2 + m^2))$, as claimed. Conversely, for any integers $d$, $m$, $n$ the identity
$$\left(d(n^2 - m^2)\right)^2 + (2dmn)^2 = \left(d(n^2 + m^2)\right)^2$$
holds, so every triple of this form is indeed a solution to $x^2 + y^2 = z^2$.

28 Fermat's Infinite Descent. Fermat's Last Theorem

Perhaps the most famous story in mathematics is the story of Fermat's Last Theorem. Around 1637, Fermat wrote his Last Theorem in the margin of his copy of Diophantus's Arithmetica. In modern terms, his claim reads as follows:

Theorem 28.1. (Fermat's Last Theorem) Let $n \geq 3$. Then the equation $x^n + y^n = z^n$ has no solutions in positive integers $x$, $y$ and $z$.

He claimed to have discovered a "truly marvellous" proof of this fact, but could not write it down because the margin of the book he was reading was too narrow to contain it. Many mathematicians tried to prove Fermat's Last Theorem. The case $n = 4$ was proved by Fermat himself in 1636. In 1753, Euler proved it for the case $n = 3$. Alternative proofs were given by Kausler, Legendre, Calzolari, Lamé, and many others. In his proof, Euler used Fermat's idea of infinite descent, which we shall discuss in this section. The case $n = 5$ was proved by Dirichlet and Legendre around 1825, and alternative proofs were given by Gauss, Lebesgue, Lamé, and others. The case $n = 7$ was proved by Gabriel Lamé in 1839.
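To see the parametrization of Theorem 27.1 at work, the Python sketch below (an illustration; the function names are hypothetical) compares a brute-force list of primitive triples, normalized so that $x$ is odd and $y$ is even, with the triples $(n^2 - m^2, 2mn, n^2 + m^2)$. We take $d = 1$, $n > m > 0$, $\gcd(m, n) = 1$ and $n - m$ odd; these are the conditions that the proof of Theorem 27.1 forces on a primitive triple, since $m^2$ and $n^2$ are coprime and $x = n^2 - m^2$ is odd.

```python
from math import gcd, isqrt


def brute_force_primitive(limit: int) -> set:
    """Primitive (x, y, z) with x odd, y even and z <= limit, by direct search."""
    found = set()
    for z in range(1, limit + 1):
        for x in range(1, z, 2):                  # x runs over odd values below z
            y = isqrt(z * z - x * x)
            if y * y == z * z - x * x and y % 2 == 0 and gcd(gcd(x, y), z) == 1:
                found.add((x, y, z))
    return found


def parametrized_primitive(limit: int) -> set:
    """Triples (n^2 - m^2, 2mn, n^2 + m^2) with n > m > 0, gcd(m, n) = 1, n - m odd."""
    found = set()
    for n in range(2, isqrt(limit) + 1):
        for m in range(1, n):
            if gcd(m, n) == 1 and (n - m) % 2 == 1 and n * n + m * m <= limit:
                found.add((n * n - m * m, 2 * m * n, n * n + m * m))
    return found


assert brute_force_primitive(200) == parametrized_primitive(200)
print(sorted(parametrized_primitive(20), key=lambda t: t[2]))
# [(3, 4, 5), (5, 12, 13), (15, 8, 17)]
```

The triple $(8, 15, 17)$ from the text appears here as $(15, 8, 17)$, which reflects the assumption in the proof of Theorem 27.1 that $x$ is odd and $y$ is even.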
In the 1820s, Sophie Germain developed an approach to attack the problem for several exponents at the same time. In particular, she managed to show that the first case of Fermat's Last Theorem (in which the prime exponent $n$ divides none of $x$, $y$, $z$) holds for all primes $n < 100$. In 1847, Gabriel Lamé suggested approaching the problem by factoring the equation $x^p + y^p = z^p$ for an odd prime $p$ as follows:
$$z^p = x^p + y^p = (x + y)(x + \zeta_p y)(x + \zeta_p^2 y) \cdots (x + \zeta_p^{p-1} y), \qquad (5)$$
where $\zeta_p = \exp(2\pi i/p)$ is a primitive $p$-th root of unity. If instead of the standard ring of integers $\mathbb{Z}$ one considers the ring
$$\mathbb{Z}[\zeta_p] = \{x_0 + x_1\zeta_p + x_2\zeta_p^2 + \dots + x_{p-1}\zeta_p^{p-1} : x_0, x_1, \dots, x_{p-1} \in \mathbb{Z}\},$$
then one would hope that such notions as unique factorization and coprimality work in $\mathbb{Z}[\zeta_p]$ just as they do in $\mathbb{Z}$. Assuming that this is the case, one could show that the algebraic integers $x + y, x + \zeta_p y, \dots, x + \zeta_p^{p-1} y$ are coprime, and since the expression (5) has the $p$-th power $z^p$ on its left-hand side, one could then hope that $x + \zeta_p^i y = q_i^p$ for some $q_i \in \mathbb{Z}[\zeta_p]$, where $i = 0, 1, \dots, p - 1$. In other words, each of the numbers $x + y, x + \zeta_p y, \dots, x + \zeta_p^{p-1} y$ would be a perfect $p$-th power, and one could then try to prove that this is impossible. Note how similar this idea is to the one presented in the proof of Theorem 27.1.

Unfortunately, there is a flaw in this argument: the ring $\mathbb{Z}[\alpha]$ for an algebraic number $\alpha$ need not have unique factorization. Perhaps the most famous example is the ring $\mathbb{Z}[\sqrt{-5}] = \{x_1 + x_2\sqrt{-5} : x_1, x_2 \in \mathbb{Z}\}$, in which the number 6 can be written as a product in two different ways:
$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}).$$
Unique factorization does in fact fail in $\mathbb{Z}[\zeta_p]$ for some primes (for example, already for $p = 23$), but Kummer found a way to repair the argument for many of them. The odd primes $p$ for which this repair does not work, namely those dividing the class number of $\mathbb{Q}(\zeta_p)$ (a measure of how badly unique factorization fails in $\mathbb{Z}[\zeta_p]$), are called irregular primes. They are called regular otherwise. The first eight irregular primes are 37, 59, 67, 101, 103, 131, 149, 157. Therefore the strategy of Lamé, in the refined form due to Kummer described below, applies to all primes $p < 100$, except for $p = 37$, 59 and 67.
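The two factorizations of 6 in $\mathbb{Z}[\sqrt{-5}]$ can be checked mechanically. In the Python sketch below (illustrative only; the names are ours), an element $a + b\sqrt{-5}$ is stored as the pair `(a, b)`, and we use the multiplicative norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$, the analogue for this ring of the norm used for Gaussian integers later in these notes. The factors $2$, $3$, $1 \pm \sqrt{-5}$ have norms 4, 9, 6, 6, so a proper factorization of any of them would need an element of norm 2 or 3; the search shows there is none, and since the only elements of norm 1 are $\pm 1$, the two factorizations of 6 are genuinely different.

```python
def mul(u, v):
    """(a1 + b1*s)(a2 + b2*s) with s = sqrt(-5), so s^2 = -5; elements are pairs (a, b)."""
    a1, b1 = u
    a2, b2 = v
    return (a1 * a2 - 5 * b1 * b2, a1 * b2 + b1 * a2)


def norm(u):
    """N(a + b*sqrt(-5)) = a^2 + 5*b^2 = (a + b*sqrt(-5)) * (a - b*sqrt(-5))."""
    a, b = u
    return a * a + 5 * b * b


two, three = (2, 0), (3, 0)
alpha, alpha_bar = (1, 1), (1, -1)             # 1 + sqrt(-5) and 1 - sqrt(-5)

print(mul(two, three), mul(alpha, alpha_bar))  # (6, 0) (6, 0)
print([norm(u) for u in (two, three, alpha, alpha_bar)])  # [4, 9, 6, 6]

# Any element of norm <= 3 has b = 0 and |a| <= 1, so this small box is exhaustive.
small = {norm((a, b)) for a in range(-2, 3) for b in range(-1, 2)}
print(2 in small, 3 in small)                  # False False
```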
Around 1850, Ernst Kummer proved that for every regular prime $p$ the Fermat equation $x^p + y^p = z^p$ has no solutions in positive integers. However, it is still unknown whether there are infinitely many regular primes. In 1964, Carl Ludwig Siegel conjectured that approximately 60.65% of all primes are regular. The techniques suggested by Lamé and Kummer (and Euler before that) evolved into a whole new area of mathematics, known nowadays as Algebraic Number Theory. The next few sections will contain a brief introduction to this subject.

Fermat's Last Theorem was proved by the English mathematician Andrew Wiles. His proof was published in 1995 in a special issue of Annals of Mathematics. The original paper is available here: https://math.stanford.edu/~lekheng/flt/wiles.pdf. As an exercise, try to understand at least the first page! Since the Fields Medal, which is one of the most important awards for mathematicians, is restricted to those under the age of 40, and Andrew Wiles proved Fermat's Last Theorem at the age of 41, he received a silver plaque from the International Mathematical Union instead of the Fields Medal.

The proof of Andrew Wiles combined many areas of number theory: it connects the theory of elliptic curves, the theory of modular forms, representation theory, Iwasawa theory, and many other mathematical subjects. In short, Andrew Wiles managed to do the following. Consider the equation $y^2 = x^3 + ax + b$, where $a$ and $b$ are complex numbers such that $4a^3 + 27b^2 \neq 0$. When $a$ and $b$ are real, such an equation defines a plane curve, called an elliptic curve. In 1985, the German mathematician Gerhard Frey pointed out that for an integer $n \geq 3$ the elliptic curve $y^2 = x(x - a^n)(x + b^n)$, where $a$ and $b$ are positive integers such that $a^n + b^n = c^n$ for some integer $c$, must be very special. In particular, he argued that such a curve would be semistable and non-modular (the non-modularity was proved by Ken Ribet in 1986).
The fact that it is non-modular would then contradict the so-called Taniyama-Shimura Conjecture, proposed by the Japanese mathematicians Yutaka Taniyama and Goro Shimura in 1957. The conjecture stated that every elliptic curve with rational coefficients, semistable or not, is modular. Andrew Wiles managed to prove this conjecture in the semistable case, and Fermat's Last Theorem follows from this result. The fact that all such elliptic curves, semistable or not, are modular was proved in 2001 by Christophe Breuil, Brian Conrad, Fred Diamond and Richard Taylor. This result is known as the Modularity Theorem.

It took more than 350 years for a proof of Fermat's Last Theorem to be discovered. Fermat claimed that he had a proof. Of course, it is highly unlikely that the argument he had in mind was as involved as the one given by Andrew Wiles. Most likely, Fermat believed that the theorem could be proved using the technique of infinite descent, which he developed. This technique allowed him to prove the theorem in the special case $n = 4$. We present a more general result in the following proposition.

The idea of infinite descent can be summarized as follows: when considering certain Diophantine equations, like $x^3 + 2y^3 + 4z^3 = 0$ or $x^4 + y^4 = z^2$, one can show that the existence of one solution leads to the existence of another solution which is "smaller" than the previous one. One would then obtain an infinite strictly decreasing sequence of positive integers $x_1 > x_2 > x_3 > \dots$, which is impossible because the positive integers are bounded below by 1. We will demonstrate this technique in two special cases. More examples can be found in the following survey of Keith Conrad: http://www.math.uconn.edu/~kconrad/blurbs/ugradnumthy/descent.pdf.

Proposition 28.2. (Fermat, 1636)[39] The equation $x^4 + y^4 = z^2$ has no solutions in positive integers $x$, $y$ and $z$.

Proof.
By Theorem 27.1 and its proof, every primitive solution $(x, y, z)$ to the equation $x^2 + y^2 = z^2$ with $x$ odd is of the form $x = n^2 - m^2$, $y = 2mn$, $z = n^2 + m^2$ for some coprime positive integers $m$ and $n$.

Assume that there is a solution to $x^4 + y^4 = z^2$, where $x$, $y$ and $z$ are positive integers. Without loss of generality, we may suppose that $\gcd(x, y) = 1$ (if $e = \gcd(x, y)$, then $e^4 \mid z^2$, so $e^2 \mid z$, and $(x/e, y/e, z/e^2)$ is again a solution), which means that $\gcd(x, z) = 1$ and $\gcd(y, z) = 1$. We will find a second positive integer solution $(x', y', z')$ with $\gcd(x', y') = 1$ that is smaller than $(x, y, z)$ in a suitable sense.

Since $\gcd(x, y) = 1$, the numbers $x$ and $y$ are not both even. They are not both odd either, since otherwise $z^2 = x^4 + y^4 \equiv 2 \pmod 4$, and as we saw before this congruence has no solutions. Without loss of generality, we may assume that $x$ is odd and $y$ is even. Then $z$ is odd. Since $(x^2)^2 + (y^2)^2 = z^2$ and $\gcd(x^2, y^2) = 1$, the triple $(x^2, y^2, z)$ is a primitive Pythagorean triple, so there exist coprime positive integers $m$ and $n$ such that
$$x^2 = n^2 - m^2, \quad y^2 = 2mn, \quad z = n^2 + m^2. \qquad (6)$$

[39] Theorem 3.1 in Keith Conrad, Proofs by descent.

Since $x^2 + m^2 = n^2$ and $\gcd(m, n) = 1$, we conclude that $(x, m, n)$ is another primitive Pythagorean triple. Since $x$ is odd, the formula for primitive Pythagorean triples once again tells us that
$$x = a^2 - b^2, \quad m = 2ab, \quad n = a^2 + b^2, \qquad (7)$$
where $a$ and $b$ are coprime positive integers. Substituting the values of $m$ and $n$ from (7) into the second equation of (6), we obtain $y^2 = 4(a^2 + b^2)ab$. Since $y$ is even,
$$\left(\frac{y}{2}\right)^2 = (a^2 + b^2)ab.$$
Since $\gcd(a, b) = 1$, the three factors on the right are pairwise coprime. Since they are all positive, each of them must be a perfect square:
$$a = x'^2, \quad b = y'^2, \quad a^2 + b^2 = z'^2.$$
Since $\gcd(a, b) = 1$, it must be the case that $\gcd(x', y') = 1$. Now the last equation can be rewritten as $x'^4 + y'^4 = z'^2$, so $(x', y', z')$ is another solution to our original equation with $\gcd(x', y') = 1$. Now we compare $z'$ to $z$. Since
$$0 < z' \leq z'^2 = a^2 + b^2 = n \leq n^2 < n^2 + m^2 = z,$$
we see that from one primitive solution $(x, y, z)$ to $x^4 + y^4 = z^2$ we can produce another primitive solution $(x', y', z')$ such that $z > z'$.
But then we could produce an infinite strictly decreasing sequence of positive integers $z > z' > z'' > \dots$, and this contradicts the fact that the positive integers are bounded below by 1.

Corollary 28.3. Fermat's Last Theorem holds for $n = 4$. In other words, the equation $x^4 + y^4 = z^4$ has no solutions in positive integers $x$, $y$ and $z$. Indeed, such a solution $(x, y, z)$ would give the solution $(x, y, z^2)$ to $x^4 + y^4 = z^2$.

Another example of a proof by infinite descent is the proof of the irrationality of $\sqrt{2}$. This proof was discovered by the Pythagoreans, who showed that the ratio of the diagonal of a square to its side cannot be represented as a ratio of two integers. The Pythagoreans kept the proof of this fact secret and, according to legend, its discoverer (possibly Hippasus of Metapontum) was murdered for divulging it.

Proposition 28.4. The number $\sqrt{2}$ is irrational. That is, there exist no integers $m$ and $n$ such that $\sqrt{2} = m/n$.

Proof. Suppose not, so there exist positive integers $m$ and $n$ such that $\sqrt{2} = m/n$. Then $m = \sqrt{2}\,n$. Squaring both sides of this equation, we obtain $m^2 = 2n^2$, so $(m, n)$ is a positive solution to the Diophantine equation $x^2 = 2y^2$. From the above equality we see that $2 \mid m^2$, which means that $2 \mid m$. So we can write $m = 2m'$ for some integer $m'$. Therefore $m^2 = (2m')^2 = 4m'^2 = 2n^2$, and by cancelling 2 on both sides we get $n^2 = 2m'^2$. Thus from the positive integer solution $(m, n)$ we obtain another positive integer solution $(n, m')$ to the Diophantine equation $x^2 = 2y^2$. Note that
$$m' = \frac{1}{2} m = \frac{1}{\sqrt{2}}\, n < n,$$
so the second coordinate of the solution $(m, n)$ is strictly greater than the second coordinate of the solution $(n, m')$. Thus, if there were a positive integer solution to the Diophantine equation $x^2 = 2y^2$, we could produce an infinite strictly decreasing sequence of positive integers $n > m' > m'' > \dots$, and this contradicts the fact that the positive integers are bounded below by 1.

Exercise 28.5. Let $k$ be a positive integer.
Prove that the number $\sqrt{k}$ is rational if and only if $k$ is a perfect square.

29 Gaussian Integers

Let $i$ denote one of the complex roots of the polynomial $x^2 + 1$. That is, the number $i$ satisfies the equation $i^2 = -1$. Notice that if $i$ is a root of $x^2 + 1$, then so is $-i$.

Definition 29.1. A complex number of the form $a + bi$, where $a, b \in \mathbb{Z}$, is called a Gaussian integer. The set of Gaussian integers is denoted by $\mathbb{Z}[i]$.

The notation $\mathbb{Z}[i]$ suggests that the set of Gaussian integers is analogous to the ring of rational integers $\mathbb{Z}$, where we now treat the numbers $i$ and $-i$ as (Gaussian) integers as well. The similarity between the two sets becomes even more apparent once we note that, just like the set of rational integers $\mathbb{Z}$, the set $\mathbb{Z}[i]$ forms a commutative ring under the standard operations of addition and multiplication.

Proposition 29.2. The set of Gaussian integers $\mathbb{Z}[i] := \{a + bi : a, b \in \mathbb{Z}\}$ forms a commutative ring under the standard operations of addition and multiplication.

Proof. Strictly speaking, to prove this result one would have to do the routine verification of the ring axioms and commutativity. We leave this part as an exercise. What is worth mentioning is that both $0 = 0 + 0 \cdot i$ and $1 = 1 + 0 \cdot i$ are elements of $\mathbb{Z}[i]$, and also that the operations of addition and multiplication are well-defined. That is, for all $a + bi, c + di \in \mathbb{Z}[i]$, their sum, difference and product are elements of $\mathbb{Z}[i]$:
$$(a + bi) \pm (c + di) = (a \pm c) + (b \pm d)i \in \mathbb{Z}[i];$$
$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i \in \mathbb{Z}[i].$$

Also, note that $\mathbb{Z} \subsetneq \mathbb{Z}[i]$, so every rational integer is also a Gaussian integer. We will see that Gaussian integers are of great help in answering the question of which integers can be represented as a sum of two squares. In other words, we will use Gaussian integers to solve the Diophantine equation $n = a^2 + b^2$, where $n$ is fixed and $a, b \in \mathbb{Z}$ are variables.
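The multiplication rule in the proof of Proposition 29.2 uses only integer arithmetic, which is easy to see in code. The Python sketch below (illustrative; `g_mul` is a name we introduce) stores $a + bi$ as the integer pair `(a, b)`, checks the rule against Python's built-in complex numbers on a small range, and shows how a sum of two squares $a^2 + b^2$ factors in $\mathbb{Z}[i]$.

```python
from itertools import product


def g_mul(u, v):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i for Gaussian integers stored as pairs."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)


# g_mul only uses integer operations, so products stay in Z[i];
# here we check that it agrees with Python's complex multiplication.
for a, b, c, d in product(range(-4, 5), repeat=4):
    assert complex(*g_mul((a, b), (c, d))) == complex(a, b) * complex(c, d)

# A sum of two squares factors as a conjugate pair: 13 = 2^2 + 3^2 = (2 + 3i)(2 - 3i).
print(g_mul((2, 3), (2, -3)))   # (13, 0)
print(g_mul((0, 1), (0, 1)))    # (-1, 0), i.e. i^2 = -1
```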
Note that if $n \in \mathbb{N}$ is representable as a sum of two squares, then $n = a^2 + b^2 = (a + bi)(a - bi)$, so we have factored the rational integer $n$, which is also a Gaussian integer, as a product of the two Gaussian integers $a + bi$ and $a - bi$.

Definition 29.3. Let $a, b \in \mathbb{Z}[i]$. We say that $a$ divides $b$, or that $a$ is a factor of $b$, when $b = ak$ for some $k \in \mathbb{Z}[i]$. We write $a \mid b$ if this is the case, and $a \nmid b$ otherwise.

Example 29.4. For example, $5 = (1 + 2i)(1 - 2i)$, so $(1 + 2i) \mid 5$ and $(1 - 2i) \mid 5$.

One of the most important invariants attached to a Gaussian integer $z$ is its norm, which we denote by $N(z)$.

Definition 29.5. The norm function is defined to be
$$N : \mathbb{Z}[i] \to \mathbb{N} \cup \{0\}, \quad a + bi \mapsto a^2 + b^2.$$

Definition 29.6. Let $z = a + bi$ be a Gaussian integer. The complex conjugate of $z$ is $\bar{z} = a - bi$. The absolute value of $z$ is
$$|z| := \sqrt{z\bar{z}} = \sqrt{(a + bi)(a - bi)} = \sqrt{a^2 + b^2}.$$

Note the obvious connection between the norm of a Gaussian integer $z = a + bi$ and the absolute value of $z$:
$$N(z) = N(a + bi) = a^2 + b^2 = \left(\sqrt{a^2 + b^2}\right)^2 = |z|^2.$$
The norm map has many nice properties. For example, it is multiplicative; that is, the norm of the product of two Gaussian integers is equal to the product of their norms:
$$N(zw) = |zw|^2 = (|z||w|)^2 = |z|^2|w|^2 = N(z)N(w).$$
We will see the usefulness of this property later. Another important thing to mention is that the only Gaussian integer whose norm is equal to zero is zero itself. That is, $N(z) = 0$ if and only if $z = 0$.

Now it is time to discuss the geometric interpretation of the Gaussian integers. Consider Figure 1,[40] which depicts the complex plane. The Gaussian integers $a + bi$ form a square grid of points $(a, b)$ whose coordinates are rational integers. If $z = a + bi$ is a Gaussian integer, then the point $(a, -b)$, which corresponds to the complex conjugate $\bar{z} = a - bi$ of $z$, is the reflection of the point $(a, b)$ across the $x$-axis. In turn, the absolute value $|z|$ is the distance from the point $(a, b)$ to the origin.
Note that it is equal to the distance from the point $(a, -b)$ to the origin.

The next important concept that we need to introduce is the concept of units.

Definition 29.7. A Gaussian integer $u$ is a unit of $\mathbb{Z}[i]$ when $u \mid w$ for all $w$ in $\mathbb{Z}[i]$.

In other words, the units are those very special numbers that divide every single element of the ring. The notion of a unit is not specific to the ring of Gaussian integers; it applies to any ring. For example, in $\mathbb{Z}$ the only units are 1 and $-1$. When talking about the prime factorization of rational integers, we always ignore the factor $\pm 1$. When doing so, we actually mean that the prime factorization is unique up to multiplication by $\pm 1$. We will see that the analogue of the Fundamental Theorem of Arithmetic holds for Gaussian integers, so every nonzero Gaussian integer that is not a unit has a unique prime factorization up to reordering and multiplication by units. In the next proposition, we prove that the only units in the ring of Gaussian integers are $\pm 1$ and $\pm i$.

[40] The picture is taken from https://upload.wikimedia.org/wikipedia/commons/7/7d/Gaussian_integer_lattice.png.

Figure 1: Gaussian integers

Proposition 29.8.[41] The following are equivalent:
1. $z$ is a unit in $\mathbb{Z}[i]$;
2. $N(z) = 1$;
3. $z \in \{1, -1, i, -i\}$;
4. the inverse complex number $z^{-1} := 1/z$ is also a Gaussian integer.

Proof. Suppose that $z$ is a unit. Then $z \mid 1$, since $z$ divides every Gaussian integer. Thus $1 = zw$ for some $w$ in $\mathbb{Z}[i]$. Then
$$1 = 1^2 + 0^2 = N(1) = N(zw) = N(z)N(w).$$
Since $N(z)$ and $N(w)$ are positive integers, we deduce that $N(z) = 1$ (and $N(w) = 1$).

[41] Proposition 7.9 in Frank Zorzitto, A Taste of Number Theory.

Suppose that $N(z) = 1$, where $z = a + bi$ for some $a, b \in \mathbb{Z}$. We have $a^2 + b^2 = 1$, which means that $a^2 = 1, b = 0$ or $a = 0, b^2 = 1$. In the first case, $a = \pm 1$ and $b = 0$, which means that $z = \pm 1$. In the second case, $a = 0$ and $b = \pm 1$, which means that $z = \pm i$.

If $z$ is one of $1, -1, i, -i$, its inverse is $1, -1, -i, i$, respectively, and these are again Gaussian integers.
Finally, suppose that $z$ and $z^{-1}$ are Gaussian integers. If $w$ is any Gaussian integer, then $z \mid w$, because $w = z(z^{-1}w)$ and $z^{-1}w$ is a Gaussian integer. Hence $z$ is a unit.

We now turn our attention to establishing the analogue of the Fundamental Theorem of Arithmetic in the ring of Gaussian integers. For this purpose, we need to introduce the definition of a Gaussian prime.

Definition 29.9. Let $z$ be a Gaussian integer. Then $z$ is called a Gaussian prime if it is not a unit and any factorization $z = wu$ in $\mathbb{Z}[i]$ forces $w$ or $u$ to be a unit.

Compare this definition to Definition 2.5, where we introduced the notion of a rational prime. The two are similar, since an ordinary rational prime can be factored in $\mathbb{Z}$ only if one of its factors is a unit, which in the case of $\mathbb{Z}$ means $\pm 1$.

Example 29.10. The integer 2 is a prime in $\mathbb{Z}$, but it is not a Gaussian prime because $2 = (1 + i)(1 - i)$, and neither $1 + i$ nor $1 - i$ is a unit. The number 3, however, is not only a rational prime, but also a Gaussian prime. For suppose that $3 = zw$ for some Gaussian integers $z$ and $w$. Then
$$9 = N(3) = N(zw) = N(z)N(w),$$
which means that $N(z) \mid 9$ and $N(w) \mid 9$, so $N(z), N(w) \in \{1, 3, 9\}$. If neither $N(z)$ nor $N(w)$ were equal to 1, we would have $N(z) = N(w) = 3$, and writing $z = a + bi$ we would get $3 = N(z) = a^2 + b^2$. However, we saw many times that integers congruent to 3 modulo 4 cannot be represented as a sum of two squares, so $N(z) \neq 3$. Hence $N(z) = 1$ or $N(w) = 1$, which by Proposition 29.8 means that either $z$ or $w$ is a unit.

Exercise 29.11. Prove that every rational prime $p$ such that $p \equiv 3 \pmod 4$ is a Gaussian prime.

The next step is to establish the analogue of the Remainder Theorem for Gaussian integers.

Proposition 29.12.[42] If $z$, $w$ are Gaussian integers and $z \neq 0$, then there exist Gaussian integers $q$ and $r$ such that $w = qz + r$, where $N(r) < N(z)$.

Proof.
Recall the geometric interpretation of the Gaussian integers given in Figure 1. The complex number $w/z$ is located somewhere in the complex plane $\mathbb{C}$, and it need not be a Gaussian integer. However, as Figure 2 demonstrates, it falls into one of the unit squares whose vertices are Gaussian integers.

Figure 2: Complex number in a square with Gaussian integers as vertices

We pick the Gaussian integer $q$ so that the distance between the point corresponding to $q$ and the point corresponding to $w/z$ is the smallest. By inspection, such a Gaussian integer $q$ must satisfy
$$\left| \frac{w}{z} - q \right| \leq \frac{1}{\sqrt{2}}.$$
Indeed, the point $w/z$ lies in one of the four boxes of side length $\frac{1}{2}$ that have $q$ as a corner, as shown in Figure 3, and the diagonal of each box has length
$$\sqrt{\left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^2} = \frac{1}{\sqrt{2}}.$$

Figure 3: Gaussian integer closest to a given complex number

We conclude that
$$\left| \frac{w}{z} - q \right|^2 \leq \frac{1}{2} < 1.$$
Therefore
$$\left| \frac{w - zq}{z} \right|^2 < 1,$$
and so $|w - zq|^2 < |z|^2$, which is the same as $N(w - zq) < N(z)$. Put $r := w - zq$ to obtain $w = zq + r$, where $N(r) < N(z)$.

[42] Proposition 7.11 in Frank Zorzitto, A Taste of Number Theory.

Example 29.13. Let us see how the Remainder Theorem for Gaussian integers works. Let $w = 4 + 7i$ and $z = 1 - 3i$. Then
$$\frac{w}{z} = \frac{4 + 7i}{1 - 3i} = \frac{(4 + 7i)(1 + 3i)}{(1 - 3i)(1 + 3i)} = \frac{-17 + 19i}{10} = -1.7 + 1.9i.$$
The nearest integer point to $(-1.7, 1.9)$ is $(-2, 2)$. Thus $q = -2 + 2i$. Then
$$r = w - qz = (4 + 7i) - (-2 + 2i)(1 - 3i) = (4 + 7i) - (4 + 8i) = -i.$$
We conclude that $4 + 7i = (-2 + 2i)(1 - 3i) - i$. Note that $N(-i) = 1 < 10 = N(z)$.

We will now prove the analogue of Bézout's lemma for Gaussian integers. For $a, b \in \mathbb{Z}[i]$, we call a number $ax + by$ with $x, y \in \mathbb{Z}[i]$ a Gaussian combination of $a$ and $b$. In the following proposition, it is crucial that for every Gaussian integer $a$ the value $N(a)$ is a non-negative integer.

Proposition 29.14. Let $a$, $b$ be Gaussian integers such that $a \neq 0$ or $b \neq 0$. If $d$ is a nonzero Gaussian combination of $a$ and $b$ such that $N(d)$ is minimal, then $d$ divides every Gaussian combination of $a$ and $b$.

Proof.
We know that $ax + by = d$ and that $N(d) > 0$ is minimal. Now consider any Gaussian combination $c = as + bt$, where $s, t \in \mathbb{Z}[i]$. We want to show that $d \mid c$. By Proposition 29.12, $c = dq + r$ for some $q, r \in \mathbb{Z}[i]$, where $N(r) < N(d)$. Thus
$$r = c - dq = as + bt - (ax + by)q = a(s - xq) + b(t - yq).$$
We see that $r$ is a Gaussian combination of $a$ and $b$ such that $N(r) < N(d)$. Because $d$ is a nonzero Gaussian combination of $a$ and $b$ with $N(d)$ minimal, the only option is that $N(r) = 0$, i.e. $r = 0$. Hence $d \mid c$. In particular, $d \mid a$ and $d \mid b$, because $a$ and $b$ are themselves Gaussian combinations of $a$ and $b$.

Definition 29.15. A Gaussian integer $d = ax + by$ such that $x$, $y$ are Gaussian integers, $d \mid a$ and $d \mid b$ is called a greatest common divisor of the Gaussian integers $a$ and $b$.

Exercise 29.16. Let $d_1$ and $d_2$ be greatest common divisors of Gaussian integers $a$ and $b$. Prove that $d_1 = ud_2$ for some unit $u$ in $\mathbb{Z}[i]$.

Finally, we prove the analogue of Euclid's lemma for Gaussian integers.

Proposition 29.17.[43] If $p$ is a Gaussian prime and $p \mid zw$ for some Gaussian integers $z$, $w$, then $p \mid z$ or $p \mid w$.

Proof. Suppose that $p \nmid z$. We will show that $p \mid w$. Let $u$ be a greatest common divisor of $p$ and $z$. Thus $u = pt + zs$ for some $t, s \in \mathbb{Z}[i]$, and $u \mid p$, $u \mid z$. Write $p = uk$ for some $k \in \mathbb{Z}[i]$. Since $p$ is a Gaussian prime, one of $u$ or $k$ is a unit in $\mathbb{Z}[i]$. If $k$ is a unit, then $u = pk^{-1}$ with $k^{-1} \in \mathbb{Z}[i]$, and so $p \mid u$. Since $p \mid u$ and $u \mid z$, it must be that $p \mid z$, contrary to our assumption. Thus $u$ is a unit with inverse $u^{-1} \in \mathbb{Z}[i]$. Now multiply $u = pt + zs$ by $wu^{-1}$:
$$w = ptwu^{-1} + zswu^{-1}.$$
Clearly, $p \mid ptwu^{-1}$, and since we are given that $p \mid zw$, also $p \mid zswu^{-1}$. Thus $p \mid w$.

[43] Proposition 7.13 in Frank Zorzitto, A Taste of Number Theory.

Exercise 29.18. Use the results established above to prove the Fundamental Theorem of Arithmetic for Gaussian integers: any nonzero Gaussian integer that is not a unit can be written uniquely (up to reordering and multiplication by units) as a product of Gaussian primes.

Exercise 29.19.
Compute the quotient and the remainder after division of $w$ by $z$, when $(w, z) = (6 + i, 2 - i)$, $(27 - 5i, 3 - 7i)$, $(4 + 7i, 8 - i)$.

Exercise 29.20. Let $\omega$ denote the primitive third root of unity
$$\omega = e^{2\pi i/3} = \frac{-1 + \sqrt{-3}}{2}.$$
Note that $\omega$ satisfies the equation $\omega^2 + \omega + 1 = 0$. The set $\mathbb{Z}[\omega] := \{a + b\omega : a, b \in \mathbb{Z}\}$ is called the ring of Eisenstein integers. For any Eisenstein integer $\alpha = a + b\omega$, where $a, b \in \mathbb{Z}$, the norm map is defined by
$$N(a + b\omega) := a^2 - ab + b^2. \qquad (8)$$
Just like the ring of Gaussian integers, the ring of Eisenstein integers is a Unique Factorization Domain. Geometrically, Eisenstein integers form a lattice in the complex plane (see Figure 4).

1. Prove that $\mathbb{Z}[\omega]$ is a ring by showing that $0, 1 \in \mathbb{Z}[\omega]$, and for all $\alpha, \beta \in \mathbb{Z}[\omega]$ it is the case that $\alpha \pm \beta \in \mathbb{Z}[\omega]$ and $\alpha \cdot \beta \in \mathbb{Z}[\omega]$;
2. Prove that the norm map defined in (8) is multiplicative. That is, for every $\alpha, \beta \in \mathbb{Z}[\omega]$ it is the case that $N(\alpha\beta) = N(\alpha)N(\beta)$. Explain why $N(\alpha) \geq 0$ for every $\alpha \in \mathbb{Z}[\omega]$ and why $N(\alpha) = 0$ if and only if $\alpha = 0$;
3. We say that $\upsilon \in \mathbb{Z}[\omega]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\omega]$. Prove that $\upsilon \in \mathbb{Z}[\omega]$ is a unit if and only if $N(\upsilon) = 1$;
4. Find all units in $\mathbb{Z}[\omega]$.

Figure 4: Eisenstein integers

Exercise 29.21. Let
$$\mathbb{Z}[\sqrt{2}] := \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}.$$
We say that $\upsilon \in \mathbb{Z}[\sqrt{2}]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\sqrt{2}]$. Prove that there are infinitely many units in $\mathbb{Z}[\sqrt{2}]$.

Hint: Consider the Pell equation $x^2 - 2y^2 = \pm 1$. Explain why, for every $(x_1, y_1)$ satisfying this Diophantine equation, the value $x_1 + y_1\sqrt{2}$ is a unit in $\mathbb{Z}[\sqrt{2}]$. Find a solution $(x_1, y_1)$ with $x_1, y_1 > 0$, and then prove that, for every positive integer $n$, the integer coefficients $x_n$ and $y_n$ of the number $x_n + y_n\sqrt{2} := (x_1 + y_1\sqrt{2})^n$ also satisfy the equation $x_n^2 - 2y_n^2 = \pm 1$.

Exercise 29.22. Consider the ring
$$\mathbb{Z}[\sqrt{-13}] = \{a + b\sqrt{-13} : a, b \in \mathbb{Z}\}.$$
For every $a, b \in \mathbb{Z}$, the norm map on $\mathbb{Z}[\sqrt{-13}]$ is defined by
$$N(a + b\sqrt{-13}) := a^2 + 13b^2.$$
You may assume that the norm is multiplicative.
We will show that unique factorization fails in $\mathbb{Z}[\sqrt{-13}]$. To solve this problem, you might want to refer to Section 2.3 in Frank Zorzitto, A Taste of Number Theory.

1. Prove that the only units of $\mathbb{Z}[\sqrt{-13}]$ are $\pm 1$.
Hint: Let $\upsilon = a + b\sqrt{-13}$ for $a, b \in \mathbb{Z}$. By definition, $\upsilon \in \mathbb{Z}[\sqrt{-13}]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\sqrt{-13}]$. Thus, in particular, $\upsilon \mid 1$. Explain why this fact implies the equality $a^2 + 13b^2 = 1$. What are the solutions to this Diophantine equation?
2. We say that a nonzero number $\gamma \in \mathbb{Z}[\sqrt{-13}]$ is prime if it is not a unit and the factorization $\gamma = \alpha\beta$ for $\alpha, \beta \in \mathbb{Z}[\sqrt{-13}]$ implies that either $\alpha$ is a unit or $\beta$ is a unit. Prove that the numbers $2$, $7$, $1 + \sqrt{-13}$ and $1 - \sqrt{-13}$ are prime in $\mathbb{Z}[\sqrt{-13}]$;
3. Using Part 2, explain why unique factorization fails in $\mathbb{Z}[\sqrt{-13}]$.

30 Fermat's Theorem on Sums of Two Squares

We will now turn our attention to the Diophantine equation $n = a^2 + b^2$, where $n$ is a fixed positive integer and $a$, $b$ are integer variables. On December 25th 1640, Fermat announced the following theorem in a letter to Mersenne, which is why in some sources it is called Fermat's Christmas Theorem. This theorem will allow us to explain which positive integers are representable as a sum of two squares, and how many solutions the equation $n = a^2 + b^2$ has.

Theorem 30.1. (Fermat's Theorem on Sums of Two Squares)[44] If $p$ is an odd rational prime and $p \equiv 1 \pmod 4$, then $p = a^2 + b^2$ for some rational integers $a$ and $b$.

Proof. (Richard Dedekind, circa 1894) Since $p \equiv 1 \pmod 4$, it follows from Corollary 20.10 that $-1$ is a quadratic residue modulo $p$. Thus $-1 \equiv x^2 \pmod p$ for some rational integer $x$. Thus $p \mid x^2 + 1$ in $\mathbb{Z}$, and so $p \mid (x + i)(x - i)$ in $\mathbb{Z}[i]$. Now note that $p \nmid x + i$, for if we assume that $x + i = p(c + di)$ for some Gaussian integer $c + di$, then by equating the imaginary parts we get $pd = 1$, which contradicts the fact that $p \nmid 1$. Likewise, $p \nmid x - i$.

[44] Theorem 7.14 in Frank Zorzitto, A Taste of Number Theory.
Since $p$ divides a product without dividing either of the factors, Proposition 29.17 tells us that $p$ is not a Gaussian prime. Since $p$ is not a unit either ($N(p) = p^2 \neq 1$), we can write $p = uv$, where $u, v \in \mathbb{Z}[i]$ are not units. But then $p^2 = N(p) = N(uv) = N(u)N(v)$, so $N(u) = 1$, $p$ or $p^2$. If $N(u) = 1$, then $u$ is a unit. If $N(u) = p^2$, then $N(v) = 1$, so $v$ is a unit. Hence $N(u) = N(v) = p$. But if we now write $u = a + bi$, then $p = N(u) = N(a + bi) = a^2 + b^2$, so $p$ is a sum of two squares of rational integers.

Now we know that, when $p$ is an odd prime, the equation $p = x^2 + y^2$ has a solution in positive integers $x$ and $y$ if and only if $p \equiv 1 \pmod 4$. Notice that it also has a solution when $p = 2$, because $2 = 1^2 + 1^2$. We would now like to generalize this result to all positive integers $n$. For this purpose, we need to prove the following lemma.

Lemma 30.2.[45] If $p$ in $\mathbb{Z}[i]$ is a Gaussian prime and $p^k \mid uv$ for some Gaussian integers $u$ and $v$ and exponent $k \geq 1$, then there are exponents $j, \ell \in \{0, 1, \dots, k\}$ such that $p^j \mid u$, $p^\ell \mid v$ and $j + \ell = k$.

Proof. We will prove this statement by induction on $k$.

Base case. For $k = 1$, the result is equivalent to Euclid's lemma for Gaussian integers, stated in Proposition 29.17.

Induction hypothesis. Suppose that the statement is true for $k - 1$.

Induction step. Let $p^k \mid uv$. Then $p \mid uv$, so $p \mid u$ or $p \mid v$ by Proposition 29.17. Suppose that $p \mid v$ (the case $p \mid u$ is symmetric). Write $v = wp$ for some $w$ in $\mathbb{Z}[i]$. Then $p^k \mid uwp$, which means that $p^{k-1} \mid uw$. According to the induction hypothesis, there exist integers $j$ and $m$, $0 \leq j, m \leq k - 1$, such that $p^j \mid u$, $p^m \mid w$, and $j + m = k - 1$. But then $p^{m+1} \mid wp = v$. If we now put $\ell = m + 1$, then $p^j \mid u$, $p^\ell \mid v$, and $j + \ell = k$, as claimed.

Proposition 30.3. Let $n$ be a positive integer. The Diophantine equation $n = x^2 + y^2$ has a solution if and only if $n$ has the prime factorization
$$n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_\ell^{2f_\ell},$$
where $p_j \equiv 1 \pmod 4$ for all $j = 1, 2, \dots, k$ and $q_j \equiv 3 \pmod 4$ for all $j = 1, 2, \dots, \ell$.
[45] Proposition 7.16 in Frank Zorzitto, A Taste of Number Theory.

Proof. Let $w = a + bi$ and $z = c + di$ be Gaussian integers. Since the norm map is multiplicative, it is the case that
$$(a^2 + b^2)(c^2 + d^2) = N(w)N(z) = N(wz) = N\big((ac - bd) + (ad + bc)i\big) = (ac - bd)^2 + (ad + bc)^2.$$
This identity shows that the product $mn$ of any two numbers $m = a^2 + b^2$ and $n = c^2 + d^2$ is again representable as a sum of two squares:
$$mn = (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2.$$
Since 2 is representable as a sum of two squares, as is every prime $p \equiv 1 \pmod 4$ (Theorem 30.1), we conclude that every integer $n$ with prime factorization $n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, where $p_j \equiv 1 \pmod 4$ for all $j = 1, 2, \dots, k$, is representable as a sum of two squares.

We know that for every rational prime $q \equiv 3 \pmod 4$ and every non-negative integer $f$, the Diophantine equation $q^{2f+1} = a^2 + b^2$ has no solutions, because $q^{2f+1} \equiv 3 \pmod 4$. However, every even power of $q$ is representable as a sum of two squares, because $q^{2f} = (q^f)^2 + 0^2$ for every positive integer $f$. Using the identity $(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$ once again, we conclude that every integer $n$ with prime factorization
$$n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_\ell^{2f_\ell},$$
where $p_j \equiv 1 \pmod 4$ for all $j = 1, 2, \dots, k$ and $q_j \equiv 3 \pmod 4$ for all $j = 1, 2, \dots, \ell$, is representable as a sum of two squares.

We will now show that these are the only numbers representable as a sum of two squares. To prove this, it suffices to show that whenever $n = x^2 + y^2$ and some prime $q \equiv 3 \pmod 4$ satisfies $n = q^k m$, where $m$ is an integer such that $q \nmid m$, the exponent $k$ has to be even. We see that
$$q^k \mid x^2 + y^2 = (x + yi)(x - yi).$$
Since every rational prime $q \equiv 3 \pmod 4$ is also a Gaussian prime (Exercise 29.11), it follows from Lemma 30.2 that there exist integers $j$ and $\ell$, $0 \leq j, \ell \leq k$, such that $j + \ell = k$, $q^j \mid (x + yi)$ and $q^\ell \mid (x - yi)$.
Suppose that $j \ge \ell$ (the case $\ell \ge j$ is symmetric, with $x - yi$ in place of $x + yi$). Then $x + yi = q^j(c + di)$ for some integers c and d. Therefore $x + yi = q^j c + q^j d\,i$, which means that $x = q^j c$ and $y = q^j d$. But then $n = x^2 + y^2 = q^{2j}c^2 + q^{2j}d^2 = q^{2j}(c^2 + d^2)$. Since $j \ge \ell$, we see that $2j = j + j \ge j + \ell = k$. On the other hand, $q^k$ is the highest power of q that divides n and $q^{2j} \mid n$, so $2j \le k$. We must conclude that $k = 2j$, which is an even number.

Now that we know for which positive integers n the Diophantine equation $n = x^2 + y^2$ has a non-trivial solution, there are only two questions left for us to discuss, namely how many solutions there are and how one computes them. Let $r_2(n)$ denote the number of integer solutions to $n = x^2 + y^2$, where $x, y \in \mathbb{Z}$ are allowed to be positive, negative or zero. As it turns out,
$$r_2(n) = 4\big(d_1(n) - d_3(n)\big),$$
where $d_1(n)$ and $d_3(n)$ denote the number of divisors of n congruent to 1 and 3 modulo 4, respectively. This formula can also be rewritten as follows:
$$r_2(n) = 4 \sum_{\substack{d \mid n \\ d \equiv 1, 3 \pmod 4}} (-1)^{\frac{d-1}{2}}.$$
From this formula it follows that for every prime $p \equiv 1 \pmod 4$ the Diophantine equation $p = x^2 + y^2$ has exactly $4(2 - 0) = 8$ integer solutions: if (x, y) is one of them, then all of them are $(\pm x, \pm y)$ and $(\pm y, \pm x)$. In particular, up to order and signs, p has only one representation as a sum of two squares.

As for the computation of the actual solutions, when $p \equiv 1 \pmod 4$ is prime, the computation of x and y such that $p = x^2 + y^2$ essentially reduces to finding a square root of −1 modulo p, that is, an integer z with $z^2 \equiv -1 \pmod p$. This can be done in (randomized) polynomial time using the Tonelli-Shanks algorithm. Once such a z is found, one can use the Euclidean algorithm for Gaussian integers to compute $x + yi = \gcd(z + i, p)$ (up to a unit). In order to find a solution to $n = x^2 + y^2$ for a composite integer n one would have to factor n first, and, as we know, integer factorization is in general a difficult problem. In fact, as we saw in Assignment 3, the ability to represent a composite integer n as a sum of two squares in two different ways yields a non-trivial factorization of n.
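The computations described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the method of the notes: Gaussian integers are stored as pairs (real, imag); division in $\mathbb{Z}[i]$ rounds $x/y$ to the nearest Gaussian integer; and the square root of −1 is found by the naive search $z = c^{(p-1)/4}$ over small c instead of a full Tonelli-Shanks implementation. The helper `euler_factor` tries $\gcd(n, ad \mp bc)$ for two representations $n = a^2 + b^2 = c^2 + d^2$; this works because $(ad - bc)(ad + bc) = n(d^2 - b^2)$. All function names are hypothetical.

```python
from math import gcd, isqrt


def sqrt_minus_one_mod(p: int) -> int:
    """Return z with z^2 = -1 (mod p) for a prime p = 1 (mod 4).
    If c is a quadratic non-residue then z = c^((p-1)/4) works; here we just
    try c = 2, 3, ... (a naive stand-in for Tonelli-Shanks)."""
    for c in range(2, p):
        z = pow(c, (p - 1) // 4, p)
        if z * z % p == p - 1:
            return z
    raise ValueError("p must be a prime congruent to 1 mod 4")


def g_mul(x, y):
    """Product of Gaussian integers stored as pairs (real, imag)."""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def g_rem(x, y):
    """Remainder of x modulo y in Z[i]: round x / y to the nearest Gaussian integer."""
    n = y[0] * y[0] + y[1] * y[1]
    re = x[0] * y[0] + x[1] * y[1]          # real part of x * conj(y)
    im = x[1] * y[0] - x[0] * y[1]          # imaginary part of x * conj(y)
    q = ((2 * re + n) // (2 * n), (2 * im + n) // (2 * n))
    qy = g_mul(q, y)
    return (x[0] - qy[0], x[1] - qy[1])


def g_gcd(x, y):
    """Euclidean algorithm in Z[i]; the answer is determined up to a unit."""
    while y != (0, 0):
        x, y = y, g_rem(x, y)
    return x


def two_squares(p: int):
    """Write a prime p = 1 (mod 4) as x^2 + y^2 via x + yi = gcd(p, z + i)."""
    z = sqrt_minus_one_mod(p)
    a, b = g_gcd((p, 0), (z, 1))
    return tuple(sorted((abs(a), abs(b)), reverse=True))


def euler_factor(n, rep1, rep2):
    """Given n = a^2 + b^2 = c^2 + d^2 in two different ways, try gcd(n, ad -+ bc)."""
    (a, b), (c, d) = rep1, rep2
    for t in (a * d - b * c, a * d + b * c):
        g = gcd(n, t)
        if 1 < g < n:
            return g
    return None


def r2_bruteforce(n):
    """Count integer solutions of x^2 + y^2 = n directly."""
    r = isqrt(n)
    return sum(1 for x in range(-r, r + 1) for y in range(-r, r + 1)
               if x * x + y * y == n)


def r2_formula(n):
    """r_2(n) = 4 * sum over odd divisors d of n of (-1)^((d-1)/2)."""
    return 4 * sum((-1) ** ((d - 1) // 2) for d in range(1, n + 1, 2) if n % d == 0)


print(two_squares(13))                                   # (3, 2)
print(euler_factor(1000009, (1000, 3), (972, 235)))      # 3413
print(r2_bruteforce(5), r2_formula(5))                   # 8 8
```

For instance, $1000009 = 1000^2 + 3^2 = 972^2 + 235^2$, and $\gcd(1000009,\ 1000 \cdot 235 - 3 \cdot 972) = 3413$.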
Such a method of factorization is called the Euler Factorization Method. Leonhard Euler used this method to factor the integer $1000009 = 293 \cdot 3413$ by using the fact that $1000009 = 1000^2 + 3^2 = 972^2 + 235^2$.

Exercise 30.4. Consider the setup as in Exercise 29.19. We say that $\gamma \ne 0$ is an Eisenstein prime if the factorization $\gamma = \alpha\beta$ for $\alpha, \beta \in \mathbb{Z}[\omega]$ implies that either α is a unit or β is a unit.
1. Prove that every rational prime $p \equiv 2 \pmod 3$ is also an Eisenstein prime. Hint: See Example 29.10.
2. Note that $3 = (1 - \omega)(1 - \omega^2)$, so 3 is not an Eisenstein prime. Also, it can be shown that every rational prime $p \equiv 1 \pmod 3$ is not an Eisenstein prime. Use this fact, as well as Parts (a) and (b), to show that every integer n with the prime factorization
$$n = 3^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_\ell^{2f_\ell},$$
where $p_i \equiv 1 \pmod 3$ for all $i = 1, 2, \ldots, k$ and $q_j \equiv 2 \pmod 3$ for all $j = 1, 2, \ldots, \ell$, admits a non-trivial solution (x, y) to the Diophantine equation $n = x^2 - xy + y^2$.

31 Continued Fractions

Even though most real numbers are not rational, to simplify calculations we approximate them by rationals. However, some rationals are better than others, so which ones should we pick? For example, we can truncate the decimal expansion of the number π = 3.1415926535... after the 9th digit, and approximate π by the rational number $3141592653/10^9$. However, after a careful investigation we discover that the rational number 103993/33102 also approximates π to 9 decimal digits while having a significantly smaller denominator. So we can ask the following question:

For a given real number α and a positive integer Q, which rational numbers p/q with $1 \le q \le Q$ correspond to the minimal value of $|\alpha - p/q|$?

This question lies at the core of the subarea of Number Theory called Diophantine Approximation.
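The question above can be answered by brute force for small Q, which is a useful baseline before developing the theory. In this Python sketch (an illustration, not an algorithm from the notes), for each q the best numerator is round(qα), and all comparisons use exact rational arithmetic. The assumption here is that `Fraction(math.pi)`, the exact value of the double nearest to π, is close enough to π for the denominators tested.

```python
import math
from fractions import Fraction


def best_approximation(alpha: Fraction, Q: int) -> Fraction:
    """Return p/q with 1 <= q <= Q minimizing |alpha - p/q| (exact arithmetic).
    For each q the best numerator is round(q * alpha); ties keep the smaller q."""
    best, best_err = None, None
    for q in range(1, Q + 1):
        p = round(q * alpha)
        err = abs(alpha - Fraction(p, q))
        if best_err is None or err < best_err:
            best, best_err = Fraction(p, q), err
    return best


# Exact value of the floating-point approximation of pi (about 16 correct digits).
ALPHA = Fraction(math.pi)
print(best_approximation(ALPHA, 7))     # 22/7
print(best_approximation(ALPHA, 113))   # 355/113
```

This search costs O(Q) operations; the continued fractions developed below produce the same answers with far less work.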
As we will find out, the best possible rational approximations to a non-zero real number α form a sequence $\{p_n/q_n\}_{n=0}^{\infty}$, namely the sequence of convergents of what is called the canonical continued fraction expansion of α. Every canonical continued fraction is a special case of what is called a partial fraction, whose properties we will now investigate.

Definition 31.1. Let $a_0, a_1, \ldots, a_N$ be real numbers such that $a_i > 0$ for all i satisfying $1 \le i \le N$. Define the partial fraction $[a_0, a_1, \ldots, a_N]$ by
$$[a_0, a_1, \ldots, a_N] := a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_N}}}.$$
The numbers $a_0, a_1, \ldots, a_N$ are called partial coefficients of $[a_0, a_1, \ldots, a_N]$. If n is an integer such that $0 \le n \le N$, the partial fraction $[a_0, a_1, \ldots, a_n]$ is called the n-th convergent to $[a_0, a_1, \ldots, a_N]$.

Note that in the definition of a partial fraction we let the $a_i$'s be real numbers such that $a_i > 0$ for all i satisfying $1 \le i \le N$. If we allow the $a_i$'s to be negative or complex, then not every choice of $a_i$'s is admissible, as the examples [1, 1, −1] or [i, i, i] demonstrate: in both cases a division by zero occurs. Soon we will introduce canonical continued fractions and restrict the domain of the $a_i$'s from real numbers to integers.

Example 31.2. Let us determine the value of $[\sqrt2, \sqrt2, \sqrt2]$. We have
$$[\sqrt2, \sqrt2, \sqrt2] = \sqrt2 + \cfrac{1}{\sqrt2 + \frac{1}{\sqrt2}} = \sqrt2 + \frac{\sqrt2}{3} = \frac{4\sqrt2}{3}.$$
Also, we see that
$$\frac{4\sqrt2}{3} = 1 + \cfrac{1}{1 + \cfrac{1}{\frac72 + 3\sqrt2}} = \left[1, 1, \tfrac72 + 3\sqrt2\right].$$
Thus several continued fractions can correspond to the same number. Some continued fractions, like
$$\frac{4\sqrt2}{3} = [1, 1, 7, 1, 2, 1, 7, 1, 2, 1, \ldots],$$
appear to be periodic, while some continued fractions, like
$$\sqrt[3]{3} = [1, 2, 3, 1, 4, 1, 5, 1, 1, 6, 2, 5, 8, \ldots],$$
seem to be aperiodic. They can also be infinite. Certain numbers have quite elegant continued fraction expansions. For example,
$$\tan(1) = [1, 1, 1, 3, 1, 5, 1, 7, 1, 9, 1, 11, 1, 13, \ldots].$$

Exercise 31.3. Compute [1, 2, 3, 4, 5] and $[\sqrt5, 2\sqrt5, 3\sqrt5]$. Give an example of a continued fraction of $\sqrt[3]{2}$ with at least five terms.
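A partial fraction is most easily evaluated from the innermost term outwards. The following Python sketch (the function name is illustrative) does this for floats or exact `Fraction`s and reproduces Example 31.2 numerically.

```python
import math
from fractions import Fraction


def cf_value(coeffs):
    """Evaluate [a_0, a_1, ..., a_N] = a_0 + 1/(a_1 + 1/(... + 1/a_N)).
    Works for floats or Fractions; raises ZeroDivisionError for inadmissible inputs."""
    value = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        value = a + 1 / value
    return value


s = math.sqrt(2)
print(cf_value([s, s, s]), 4 * s / 3)            # both are about 1.8856
print(cf_value([1.0, 1.0, 3.5 + 3 * s]))         # the same number again
print(cf_value([Fraction(3), Fraction(7), Fraction(15)]))  # 333/106
```

With `Fraction` inputs the result is exact; an inadmissible choice such as [1, 1, −1] fails with a division by zero, as noted above.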
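Some elementary properties of continued fractions are
$$[a_0, a_1, \ldots, a_n] = \left[a_0, a_1, \ldots, a_{n-1} + \frac{1}{a_n}\right], \qquad [a_0, a_1, \ldots, a_n] = [a_0, [a_1, \ldots, a_n]]$$
and, more generally,
$$[a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_{m-1}, [a_m, \ldots, a_n]].$$

Proposition 31.4. Let $a_0, a_1, \ldots, a_N$ be real numbers such that $a_i > 0$ for all i satisfying $1 \le i \le N$. For a non-negative integer $n \le N$, define the real numbers $p_n$ and $q_n$ by
$$p_0 = a_0, \quad q_0 = 1, \qquad p_1 = a_1 a_0 + 1, \quad q_1 = a_1, \qquad p_n = a_n p_{n-1} + p_{n-2}, \quad q_n = a_n q_{n-1} + q_{n-2} \quad (n \ge 2).$$
Then $[a_0, a_1, \ldots, a_n] = p_n/q_n$.

Proof. We will prove this statement using the principle of mathematical induction.

Base case. Clearly, we have $[a_0] = a_0 = \frac{a_0}{1} = \frac{p_0}{q_0}$ and $[a_0, a_1] = a_0 + \frac{1}{a_1} = \frac{a_0 a_1 + 1}{a_1} = \frac{p_1}{q_1}$, so the result holds for n = 0, 1.

Induction hypothesis. Suppose that the statement is true for n = m − 1 and n = m, where $1 \le m < N$, for every choice of positive partial coefficients $a_1, \ldots, a_m$.

Induction step. We will show that the result holds for n = m + 1. Note that $p_{m-1}, p_{m-2}, q_{m-1}, q_{m-2}$ depend only on $a_0, \ldots, a_{m-1}$, so we may apply the induction hypothesis to the partial fraction whose last coefficient $a_m$ is replaced by $a_m + 1/a_{m+1}$ (for m = 1 we use the convention $p_{-1} = 1$, $q_{-1} = 0$, which is consistent with the formulas for $p_1$ and $q_1$). We have
$$[a_0, \ldots, a_{m+1}] = \left[a_0, \ldots, a_{m-1}, a_m + \frac{1}{a_{m+1}}\right] = \frac{\left(a_m + \frac{1}{a_{m+1}}\right)p_{m-1} + p_{m-2}}{\left(a_m + \frac{1}{a_{m+1}}\right)q_{m-1} + q_{m-2}} = \frac{a_{m+1}(a_m p_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_m q_{m-1} + q_{m-2}) + q_{m-1}} = \frac{a_{m+1}p_m + p_{m-1}}{a_{m+1}q_m + q_{m-1}} = \frac{p_{m+1}}{q_{m+1}}.$$

Proposition 31.5. For any positive integer $n \le N$, it is the case that
$$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$$
or, equivalently,
$$\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}}.$$
Proof. See Assignment 6.

Proposition 31.6. For any integer n with $2 \le n \le N$, it is the case that
$$p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n$$
or, equivalently,
$$\frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{(-1)^n a_n}{q_n q_{n-2}}.$$
Proof. The result follows from Proposition 31.5 applied to n − 1:
$$\frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{a_n p_{n-1} + p_{n-2}}{a_n q_{n-1} + q_{n-2}} - \frac{p_{n-2}}{q_{n-2}} = \frac{q_{n-2}(a_n p_{n-1} + p_{n-2}) - p_{n-2}(a_n q_{n-1} + q_{n-2})}{q_{n-2}(a_n q_{n-1} + q_{n-2})} = \frac{a_n(p_{n-1} q_{n-2} - p_{n-2} q_{n-1})}{q_n q_{n-2}} = \frac{(-1)^{n-2} a_n}{q_n q_{n-2}} = \frac{(-1)^n a_n}{q_n q_{n-2}}.$$

Proposition 31.7. Let $a_0, a_1, \ldots, a_N$ be real numbers such that $a_i > 0$ for all i satisfying $1 \le i \le N$. Let $x_n = p_n/q_n$. Then the following hold:
1. It is the case that $x_0 < x_2 < x_4 < \ldots$ and $x_1 > x_3 > x_5 > \ldots$;
2.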
Every odd convergent is greater than any even convergent. That is, $x_{2k+1} > x_{2\ell}$ for any k and ℓ;
3. The N-th convergent $x_N$ is greater than or equal to every even convergent and less than or equal to every odd convergent.

Proof. Let us prove property 1. If $n \ge 2$ is even, then it follows from Proposition 31.6 that
$$x_n - x_{n-2} = \frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{a_n}{q_n q_{n-2}} > 0.$$
Therefore $x_{n-2} = p_{n-2}/q_{n-2} < p_n/q_n = x_n$ for all even n. Analogously, one can show that $x_{n-2} > x_n$ for all odd n.

To establish property 2, recall that by Proposition 31.5 we have $x_{2k+1} - x_{2k} = 1/(q_{2k+1}q_{2k}) > 0$, so $x_{2k+1} > x_{2k}$ for all $k \ge 0$. If $\ell \le k$, then $x_{2k} \ge x_{2\ell}$ by property 1, so $x_{2k+1} > x_{2\ell}$. If $\ell > k$, then $2\ell - 1 \ge 2k + 1$, so $x_{2k+1} \ge x_{2\ell-1}$ by property 1, and $x_{2\ell-1} > x_{2\ell}$ by Proposition 31.5 applied with $n = 2\ell$; hence $x_{2k+1} > x_{2\ell}$.

Finally, to see that property 3 holds, we note that if N is even then by property 1 we have $x_0 < x_2 < \ldots < x_N$. Thus $x_N$ is greater than or equal to every even convergent. On the other hand, by property 2, every even convergent, including $x_N$, is less than every odd convergent. The result then follows for all even N, and similarly one can argue that it is true when N is odd.

Example 31.8. Let us see an example of the phenomenon described in Proposition 31.7. Consider the following continued fraction expansion of $\sqrt7$:
$$\sqrt7 = [2, 1, 1, 1, 4, 1, 1, 1, 4, 1, \ldots] = 2.64575\ldots.$$
The first 10 convergents of $\sqrt7$ are
$$2,\ 3,\ \frac52,\ \frac83,\ \frac{37}{14},\ \frac{45}{17},\ \frac{82}{31},\ \frac{127}{48},\ \frac{590}{223},\ \frac{717}{271}.$$
We see that
$$2 < \frac52 < \frac{37}{14} < \frac{82}{31} < \frac{590}{223} < \ldots < \sqrt7 < \ldots < \frac{717}{271} < \frac{127}{48} < \frac{45}{17} < \frac83 < 3.$$
The n-th convergents to the left of $\sqrt7$ correspond to even n, while the n-th convergents to the right of $\sqrt7$ correspond to odd n.

Now let α be a real number. We construct the canonical continued fraction expansion of α as follows:
Step 1. Define $a_0 := \lfloor \alpha \rfloor$. If $\alpha = a_0$ then $\alpha = [a_0]$. Otherwise let $\alpha = a_0 + 1/\alpha_1$ for some $\alpha_1 > 1$.
Step 2. Let $a_1 := \lfloor \alpha_1 \rfloor$. If $\alpha_1 = a_1$ then $\alpha = a_0 + 1/a_1 = [a_0, a_1]$. Otherwise let $\alpha_1 = a_1 + 1/\alpha_2$ for some $\alpha_2 > 1$.
We repeat this procedure. If it stops after a finite number of steps then $\alpha = [a_0, \ldots, a_N]$.
Otherwise $\alpha = [a_0, a_1, \ldots]$ has an infinite canonical continued fraction expansion.

Example 31.9. Let us determine the first five terms in the canonical continued fraction expansion of π = 3.14159..., as well as the first five convergents of π.
Step 1. Define $a_0 := \lfloor \pi \rfloor = 3$. Then
$$\pi = [3, \alpha_1] = 3 + \frac{1}{\alpha_1},$$
where $\alpha_1 = 1/(\pi - 3) = 7.06251\ldots$.
Step 2. Define $a_1 := \lfloor \alpha_1 \rfloor = 7$. Then
$$\pi = [3, 7, \alpha_2] = 3 + \cfrac{1}{7 + \frac{1}{\alpha_2}},$$
where $\alpha_2 = 1/(\alpha_1 - 7) = 15.99659\ldots$. We see that the first convergent to π is
$$\frac{p_1}{q_1} = a_0 + \frac{1}{a_1} = 3 + \frac17 = \frac{22}{7}.$$
Step 3. Define $a_2 := \lfloor \alpha_2 \rfloor = 15$. Then
$$\pi = [3, 7, 15, \alpha_3] = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \frac{1}{\alpha_3}}},$$
where $\alpha_3 = 1/(\alpha_2 - 15) = 1.00342\ldots$. We see that the second convergent to π is
$$\frac{p_2}{q_2} = a_0 + \cfrac{1}{a_1 + \frac{1}{a_2}} = 3 + \cfrac{1}{7 + \frac{1}{15}} = \frac{333}{106}.$$
Proceeding in the same fashion, we see that π = [3, 7, 15, 1, 292, 1, ...], and the first five convergents of π after $p_0/q_0 = 3$ are
$$\frac{22}{7},\ \frac{333}{106},\ \frac{355}{113},\ \frac{103993}{33102},\ \frac{104348}{33215}.$$

Exercise 31.10. Determine the first five terms in the canonical continued fraction expansion of Euler's number e = 2.71828..., as well as the first five convergents of e.

Exercise 31.11. Prove that α has a finite canonical continued fraction expansion if and only if α is a rational number.

Proposition 31.12. Let α be a real number and let $p_n/q_n$ be the n-th convergent in the canonical continued fraction expansion of α. Then
$$|q_1\alpha - p_1| > |q_2\alpha - p_2| > |q_3\alpha - p_3| > \ldots.$$
Proof. Let $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$. Then
$$\alpha = \frac{\alpha_{n+1}p_n + p_{n-1}}{\alpha_{n+1}q_n + q_{n-1}}.$$
It follows from Proposition 31.5 that
$$|q_n\alpha - p_n| = \left|q_n\,\frac{p_n\alpha_{n+1} + p_{n-1}}{q_n\alpha_{n+1} + q_{n-1}} - p_n\right| = \frac{|q_n p_{n-1} - p_n q_{n-1}|}{|q_n\alpha_{n+1} + q_{n-1}|} = \frac{1}{q_n\alpha_{n+1} + q_{n-1}}.$$
Now note that, since $\alpha_{n+1} \ge 1$ and $\alpha_n < a_n + 1$,
$$q_n\alpha_{n+1} + q_{n-1} \ge q_n + q_{n-1} = a_n q_{n-1} + q_{n-2} + q_{n-1} = q_{n-1}(a_n + 1) + q_{n-2} > q_{n-1}\alpha_n + q_{n-2}.$$
The observation made above allows us to conclude that
$$|q_n\alpha - p_n| = \frac{1}{q_n\alpha_{n+1} + q_{n-1}} < \frac{1}{q_{n-1}\alpha_n + q_{n-2}} = |q_{n-1}\alpha - p_{n-1}|.$$

Proposition 31.13. Let α be a real number and let $p_n/q_n$ be the n-th convergent in the canonical continued fraction expansion of α.
Then
$$\frac{1}{(a_{n+1} + 2)q_n^2} < \left|\alpha - \frac{p_n}{q_n}\right| < \frac{1}{a_{n+1}q_n^2}.$$
Proof. Let $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$ for some $\alpha_{n+1}$ such that $a_{n+1} \le \alpha_{n+1} < a_{n+1} + 1$. Also, let $p_n/q_n = [a_0, a_1, \ldots, a_n]$ be the n-th convergent to α. Then it follows from the formula
$$\alpha = \frac{\alpha_{n+1}p_n + p_{n-1}}{\alpha_{n+1}q_n + q_{n-1}},$$
as well as from Proposition 31.5, that
$$\left|\alpha - \frac{p_n}{q_n}\right| = \frac{1}{q_n(\alpha_{n+1}q_n + q_{n-1})}.$$
Since $0 < q_{n-1} \le q_n$ (for $n \ge 1$), we can deduce the desired result by establishing the following inequalities:
$$a_{n+1}q_n < \alpha_{n+1}q_n + q_{n-1} < (a_{n+1} + 1)q_n + q_n = (a_{n+1} + 2)q_n.$$

Proposition 31.14. Let α be a real number and let $p_n/q_n$ be the n-th convergent in the canonical continued fraction expansion of α. Then for all integers p and q such that $0 < q < q_{n+1}$ it is the case that
$$|q\alpha - p| \ge |q_n\alpha - p_n|.$$
Proof. If $p/q = p_n/q_n$, then, since $\gcd(p_n, q_n) = 1$ by Proposition 31.5, we have $(p, q) = t(p_n, q_n)$ for some positive integer t, so $|q\alpha - p| = t|q_n\alpha - p_n| \ge |q_n\alpha - p_n|$ and the result holds. Thus we may assume that $p/q \ne p_n/q_n$. Recall from Proposition 31.5 that $p_n q_{n+1} - q_n p_{n+1} = (-1)^{n+1}$. Then the matrix
$$A = \begin{pmatrix} p_n & p_{n+1} \\ q_n & q_{n+1} \end{pmatrix}$$
has a non-zero determinant $\det A = (-1)^{n+1}$, which means that it is invertible. Furthermore, the inverse matrix is
$$A^{-1} = \frac{1}{\det A}\begin{pmatrix} q_{n+1} & -p_{n+1} \\ -q_n & p_n \end{pmatrix} = (-1)^{n+1}\begin{pmatrix} q_{n+1} & -p_{n+1} \\ -q_n & p_n \end{pmatrix}.$$
As we can see, the matrix $A^{-1}$ has integer entries. This means that the matrix equation
$$\begin{pmatrix} p \\ q \end{pmatrix} = A\begin{pmatrix} u \\ v \end{pmatrix}$$
can be solved in integers u and v, and the solution is
$$u = (-1)^{n+1}(q_{n+1}p - p_{n+1}q), \qquad v = (-1)^{n+1}(p_n q - q_n p).$$
Note that $v \ne 0$ and $u \ne 0$, for otherwise it would be the case that $p/q = p_n/q_n$ or $p/q = p_{n+1}/q_{n+1}$. Of course, the latter is impossible: if u = 0 then $q = vq_{n+1}$ for a non-zero integer v, which contradicts the hypothesis $0 < q < q_{n+1}$. Now consider the expressions
$$p = up_n + vp_{n+1}, \qquad q = uq_n + vq_{n+1}.$$
Note that $q = uq_n + vq_{n+1} < q_{n+1}$. We claim that u and v have opposite signs. If we assume that both u and v are negative, then q would be negative, which contradicts the assumption q > 0. On the other hand, if we assume that both u and v are positive, then q would have to exceed $q_{n+1}$. This would contradict the inequality established above.
Since neither u nor v can be zero, we see that our claim holds; that is, the numbers u and v have opposite signs.

Next, recall that according to property 3 of Proposition 31.7 either
$$\frac{p_n}{q_n} < \alpha < \frac{p_{n+1}}{q_{n+1}} \qquad\text{or}\qquad \frac{p_{n+1}}{q_{n+1}} < \alpha < \frac{p_n}{q_n}$$
must hold, depending on whether n is even or odd. In any case, $\alpha q_n - p_n$ and $\alpha q_{n+1} - p_{n+1}$ have opposite signs. Since u, v have opposite signs and $\alpha q_n - p_n$, $\alpha q_{n+1} - p_{n+1}$ have opposite signs, the signs of $u(q_n\alpha - p_n)$ and $v(q_{n+1}\alpha - p_{n+1})$ match. Hence
$$|q\alpha - p| = |\alpha(uq_n + vq_{n+1}) - (up_n + vp_{n+1})| = |u(q_n\alpha - p_n) + v(q_{n+1}\alpha - p_{n+1})| = |u(q_n\alpha - p_n)| + |v(q_{n+1}\alpha - p_{n+1})| \ge |u|\,|q_n\alpha - p_n| \ge |q_n\alpha - p_n|.$$
The fact that $u(q_n\alpha - p_n)$ and $v(q_{n+1}\alpha - p_{n+1})$ have the same sign was used to establish the third equality. In turn, the last inequality follows from the fact that u is a non-zero integer.

Corollary 31.15. Let p/q be a rational number with q > 0 and let α be a real number. Then the inequality
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{2q^2}$$
implies that $p/q = p_n/q_n$ for some non-negative integer n. That is, the number p/q appears as a convergent in the canonical continued fraction expansion of α.
Proof. See Assignment 6.

We conclude this section by discussing the question of periodicity of canonical continued fraction expansions.

Definition 31.16. Let α be a real number with the canonical continued fraction expansion
$$\alpha = [a_0, a_1, \ldots, a_n; b_1, b_2, \ldots, b_k, b_1, b_2, \ldots, b_k, b_1, \ldots].$$
In other words, at some point the elements of the continued fraction expansion start to repeat. We indicate this by writing $\alpha = [a_0, a_1, \ldots, a_n; \overline{b_1, b_2, \ldots, b_k}]$. A canonical continued fraction expansion of such kind is called preperiodic, and if the terms $a_0, a_1, \ldots, a_n$ are missing we say that it is periodic. The smallest number k such that the terms repeat is called the period of the continued fraction.
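The construction of the canonical expansion (Steps 1, 2, ...) and the recurrence of Proposition 31.4 translate directly into code. In the Python sketch below (function names are illustrative), the expansion is computed with exact rational arithmetic, so it terminates for rational inputs, in line with Exercise 31.11. For π we feed in `Fraction(math.pi)`, the exact value of the double nearest to π; this is an assumption that is only safe for the first several terms. The convergent recurrence starts from the standard convention $p_{-2} = 0$, $q_{-2} = 1$, $p_{-1} = 1$, $q_{-1} = 0$, which reproduces $p_0, q_0, p_1, q_1$ as defined in the notes.

```python
import math
from fractions import Fraction


def canonical_cf(alpha: Fraction, max_terms: int = 50):
    """Canonical continued fraction of a rational alpha: a_n = floor(alpha_n),
    alpha_{n+1} = 1 / (alpha_n - a_n). Exact arithmetic, so it terminates."""
    terms = []
    while len(terms) < max_terms:
        a = math.floor(alpha)
        terms.append(a)
        if alpha == a:
            break
        alpha = 1 / (alpha - a)
    return terms


def convergents(terms):
    """Return [(p_0, q_0), (p_1, q_1), ...] via p_n = a_n p_{n-1} + p_{n-2},
    q_n = a_n q_{n-1} + q_{n-2}, started from p_{-2}=0, p_{-1}=1, q_{-2}=1, q_{-1}=0."""
    p_prev, p = 0, 1
    q_prev, q = 1, 0
    out = []
    for a in terms:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
    return out


pi_terms = canonical_cf(Fraction(math.pi))[:6]
print(pi_terms)               # [3, 7, 15, 1, 292, 1]
print(convergents(pi_terms))  # [(3, 1), (22, 7), (333, 106), (355, 113), (103993, 33102), (104348, 33215)]
```

The same `convergents` function applied to [2, 1, 1, 1, 4, 1, 1, 1, 4, 1] reproduces the ten convergents of $\sqrt7$ from Example 31.8, alternating below and above $\sqrt7$ as Proposition 31.7 predicts.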
It was proved by Joseph-Louis Lagrange that a real number α has a preperiodic canonical continued fraction expansion if and only if it is a quadratic irrational. That is, $\alpha = a + b\sqrt d$ for some rational numbers a and $b \ne 0$ and some positive integer d that is not a perfect square.

Example 31.17. Let us determine the canonical continued fraction expansion of $\sqrt7$. By computing the first few terms, we see that
$$\sqrt7 = [2, 1, 1, 1, 4, 1, 1, 1, 4, 1, \ldots].$$
Thus we can guess that $\sqrt7 = [2; \overline{1, 1, 1, 4}]$. Let us prove this fact. Let $\theta = [\overline{1, 1, 1, 4}]$. Then
$$\theta = [\overline{1, 1, 1, 4}] = [1, 1, 1, 4, \theta] = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{4 + \frac{1}{\theta}}}} = \frac{14\theta + 3}{9\theta + 2}.$$
We see that θ satisfies the equation $3\theta^2 - 4\theta - 1 = 0$. This equation has two roots, $(2 \pm \sqrt7)/3$, but since θ > 0 we can conclude that
$$\theta = \frac{2 + \sqrt7}{3}.$$
Then
$$[2; \overline{1, 1, 1, 4}] = [2, \theta] = 2 + \frac{1}{\theta} = 2 + \frac{3}{2 + \sqrt7} = \frac{7 + 2\sqrt7}{2 + \sqrt7} = \sqrt7,$$
as claimed.

Exercise 31.18. Determine the canonical continued fraction expansions of $\frac{1 + \sqrt5}{2}$ and $\sqrt2$. Are they both preperiodic? Are they both periodic? What are the periods of their continued fraction expansions?

Exercise 31.19. Prove that if a real number α has a preperiodic canonical continued fraction expansion, then there exist rational integers a, b and c, not all zero, such that $a\alpha^2 + b\alpha + c = 0$.

32 Pell's Equation

For more details on the subject, we refer the reader to the monograph of M. J. Jacobson, Jr. and H. C. Williams, Solving the Pell Equation, 2009.

In 1773, Gotthold Ephraim Lessing was appointed librarian of the Herzog August Library in Wolfenbüttel, Germany. In this library, he discovered an ancient Greek manuscript with a poem of 44 lines, which contained an interesting arithmetical problem. This problem is attributed to Archimedes and is called Archimedes' Cattle Problem. The problem was to calculate the number of cattle in the herd of Helios, the god of the sun.
There were two parts to this problem, the first of which could be solved relatively easily by setting up a system of seven equations in eight unknowns, one unknown for each type of bulls and cows present in the herd. Much more challenging was the second part of the problem, which, in its essence, asked the reader to calculate a solution to the equation
$$x^2 - 4729494y^2 = 1$$
subject to an additional divisibility condition on y. Despite its innocent look, the smallest solution satisfying this condition has more than 100000 digits. In 1880, A. Amthor discovered that the smallest herd that could satisfy both parts of this problem had approximately $7.76 \times 10^{206544}$ cattle. In comparison, it is estimated that there are between $10^{78}$ and $10^{82}$ atoms in the known, observable universe.⁴⁶ Of course, Amthor himself did not calculate this number precisely. In 1965, the precise answer to Archimedes' Cattle Problem was given by Hugh Williams, Gus German and Robert Zarnke, who were University of Waterloo students at that time. To calculate the answer, they used a combination of the IBM 7040 and IBM 1620 computers. You can find a fascinating article about the history of computing at the University of Waterloo here: https://cs.uwaterloo.ca/40th/Chronology/printable.shtml.

An equation of the form $x^2 - dy^2 = \pm1$, where d is positive and is not a perfect square, is called Pell's equation. The name is due to Euler, who attributed the method of solving this equation to John Pell. It is widely believed that Euler actually made a mistake and confused John Pell with William Brouncker. The English mathematician William Brouncker discovered a general method for solving Pell's equation, which was based on continued fractions. He was able to apply it to the equation $x^2 - 313y^2 = 1$ and find the smallest positive solution x = 32188120829134849, y = 1819380158564160.

⁴⁶ According to http://www.universetoday.com/36302/atoms-in-the-universe/.
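Brouncker's computation can be reproduced with a short search over the convergents of $\sqrt d$; Theorem 32.1 below justifies that every solution appears among them. The Python sketch below is an illustration under stated assumptions: the exact recurrence for the complete quotients $\alpha_n = (m + \sqrt d)/s$ and the rule that the period of $\sqrt d$ ends with the term $2a_0$ are the standard textbook algorithm, not something proved in these notes, and the function names are hypothetical.

```python
from math import isqrt


def sqrt_cf(d: int):
    """Canonical continued fraction of sqrt(d) for a non-square d > 0, returned as
    (a_0, [b_1, ..., b_k]) with sqrt(d) = [a_0; b_1, ..., b_k repeated].
    Standard exact recurrence for alpha_n = (m + sqrt(d)) / s."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, s, a = 0, 1, a0
    period = []
    while a != 2 * a0:          # the period of sqrt(d) ends with 2 * a_0
        m = s * a - m
        s = (d - m * m) // s
        a = (a0 + m) // s
        period.append(a)
    return a0, period


def pell_fundamental(d: int):
    """Smallest positive solution of x^2 - d y^2 = 1, found by scanning the
    convergents p_n / q_n of sqrt(d) in increasing order."""
    a0, period = sqrt_cf(d)
    p_prev, p = 1, a0           # p_{-1}, p_0
    q_prev, q = 0, 1            # q_{-1}, q_0
    i = 0
    while p * p - d * q * q != 1:
        a = period[i % len(period)]
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        i += 1
    return p, q


print(sqrt_cf(7))              # (2, [1, 1, 1, 4])
print(pell_fundamental(7))     # (8, 3)
print(pell_fundamental(313))   # (32188120829134849, 1819380158564160)
```

The output for d = 7 also confirms the periodic expansion $\sqrt7 = [2; \overline{1, 1, 1, 4}]$ of Example 31.17.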
When writing to Frenicle de Bessy, who proposed this problem to him, Brouncker claimed that it took him only "an hour or two" to find the solution. In 1768, Joseph-Louis Lagrange managed to prove that Pell's equation has a solution different from (±1, 0) for every positive d that is not a perfect square. We will now apply Corollary 31.15 to show that every positive solution to Pell's equation $x^2 - dy^2 = \pm1$ must arise as a convergent of $\sqrt d$.

Theorem 32.1. Let d be a positive integer that is not a perfect square. Then every solution $(x, y) \ne (\pm1, 0)$ to Pell's equation $x^2 - dy^2 = \pm1$ must satisfy $|x|/|y| = p_n/q_n$ for some non-negative integer n, where $p_n/q_n$ is the n-th convergent of $\sqrt d$.

Proof. Suppose that $(x, y) \ne (\pm1, 0)$ is a solution. Then $y \ne 0$, and without loss of generality we may assume that x and y are positive. Then, since $y^2 \ge 1$,
$$x \ge \sqrt{dy^2 - 1} \ge y\sqrt{d - 1}.$$
Therefore
$$|x - \sqrt d\,y| = \frac{1}{|x + \sqrt d\,y|} \le \frac{1}{\sqrt d\,y\left(1 + \sqrt{1 - \frac1d}\right)} < \frac{1}{2y},$$
since $\sqrt d + \sqrt{d - 1} > 2$ for $d \ge 2$. Thus
$$\left|\sqrt d - \frac{x}{y}\right| < \frac{1}{2y^2}.$$
It follows from Corollary 31.15 that x/y is a convergent of $\sqrt d$.

33 Algebraic and Transcendental Numbers. Liouville's Approximation Theorem

In 1840, the French mathematician Joseph Marie Liouville proved the so-called Approximation Theorem, which allowed him to discover the first transcendental number $\sum_{k=0}^{\infty} 10^{-k!}$. This number is called the Liouville Number. You are asked to reproduce Liouville's proof for a different number in Exercise 33.7.

Definition 33.1. A complex number α is called algebraic if there exists a non-zero polynomial f(t) with rational coefficients such that f(α) = 0. Otherwise, it is called transcendental.

Definition 33.2. Let α be an algebraic number. Let $f(t) = c_d t^d + c_{d-1} t^{d-1} + \ldots + c_1 t + c_0$ be a polynomial such that
a) f(α) = 0;
b) $c_0, c_1, \ldots, c_d \in \mathbb{Z}$;
c) $c_d > 0$;
d) $\gcd(c_0, c_1, \ldots, c_d) = 1$;
e) the polynomial f(t) has the smallest degree among all non-zero polynomials satisfying a), b), c) and d).
Then f(t) is called the minimal polynomial of α.
It is a fact from algebraic number theory that such a polynomial is unique. We say that the algebraic number α has degree d if the degree of its minimal polynomial is equal to d, i.e. deg f = d.

Example 33.3. Consider the number $\sqrt2$. This number is algebraic, since $\sqrt2$ is a root of the polynomial $f(t) = t^2 - 2$, which has rational coefficients. Note that it is also a root of $f_1(t) = 0$, or $f_2(t) = t^3 + 3t^2 - 2t - 6$, or $f_3(t) = 6t^2 - 12$. However, none of these polynomials satisfy Definition 33.2.

Exercise 33.4. Explain why the numbers $\alpha = 0,\ 1/2,\ i,\ \sqrt{\sqrt2 + \sqrt3}$ are algebraic. For each α, find a non-zero monic polynomial f with rational coefficients such that f(α) = 0.

Exercise 33.5.
a) Prove that every rational number x/y has degree 1;
b) Prove that every quadratic irrational has degree 2. In other words, show that every number of the form $a + b\sqrt d$, where $a, b, d \in \mathbb{Q}$, $b \ne 0$ and $d \ne r^2$ for every $r \in \mathbb{Q}$, satisfies some polynomial f(x) of degree 2 with rational coefficients and does not satisfy any polynomial of degree 1.

Some properties of the minimal polynomial:
• For a given algebraic number α the minimal polynomial of α is unique;
• Every minimal polynomial f(t) is irreducible over the field of rational numbers. That is, if $g(t) \mid f(t)$ and $g(t) \in \mathbb{Q}[t]$, then $g(t) = c\,f(t)$ or $g(t) = c$ for some non-zero $c \in \mathbb{Q}$;
• Let α be a root of its minimal polynomial f(t). Then $f'(\alpha) \ne 0$. That is, in $\mathbb{C}[t]$ it is the case that $(t - \alpha) \mid f(t)$ while $(t - \alpha)^2 \nmid f(t)$.

Theorem 33.6 (Liouville's Approximation Theorem, 1840). Let α be a real irrational algebraic number (that is, a real algebraic number of degree d ≥ 2). Then there exists some constant C > 0, which depends only on α, such that for any $x \in \mathbb{Z}$, $y \in \mathbb{N}$ the following inequality holds:
$$\left|\alpha - \frac{x}{y}\right| \ge \frac{C}{y^d}.$$

Proof.⁴⁷ Let $f(t) = c_d t^d + \ldots + c_1 t + c_0$ be the minimal polynomial of α. Since f is irreducible over $\mathbb{Q}$ and is of degree d ≥ 2, it has no rational roots, so $f(x/y) \ne 0$ for any $x \in \mathbb{Z}$, $y \in \mathbb{N}$. Furthermore,
$$\left|f\left(\frac{x}{y}\right)\right| = \left|\sum_{k=0}^{d} c_k \left(\frac{x}{y}\right)^k\right| = \frac{1}{y^d}\Bigl|\underbrace{\sum_{k=0}^{d} c_k x^k y^{d-k}}_{\in\, \mathbb{Z} \setminus \{0\}}\Bigr| \ge \frac{1}{y^d}.$$
We now apply the Mean Value Theorem and observe that there exists some real ξ between α and x/y satisfying
$$f'(\xi) = \frac{f(\alpha) - f(x/y)}{\alpha - x/y} = \frac{-f(x/y)}{\alpha - x/y}.$$
In particular, $f'(\xi) \ne 0$. Rearranging the terms of the above equality, we get
$$\left|\alpha - \frac{x}{y}\right| = \left|f\left(\frac{x}{y}\right)\right| \cdot |f'(\xi)|^{-1} \ge \frac{|f'(\xi)|^{-1}}{y^d}.$$
For now, the quantity $|f'(\xi)|^{-1}$ depends on α and on x/y, but it is not hard to eliminate this dependency by slightly adjusting our constant. In particular, since f is minimal, the multiplicity of α is 1, which means that $f'(\alpha) \ne 0$. By continuity of f′, there is some δ > 0 such that $0 < |f'(\xi)| \le 2|f'(\alpha)|$ for all ξ in the neighbourhood $U_\alpha = (\alpha - \delta, \alpha + \delta)$. If $x/y \in U_\alpha$, then ξ, which lies between α and x/y, is in $U_\alpha$ as well, and therefore
$$\left|\alpha - \frac{x}{y}\right| \ge \frac{|f'(\xi)|^{-1}}{y^d} \ge \frac{|f'(\alpha)|^{-1}}{2y^d}.$$
If $x/y \notin U_\alpha$, then $|\alpha - x/y| \ge \delta \ge \delta/y^d$. Finally, we choose our constant C as the minimum of $2^{-1}|f'(\alpha)|^{-1}$ and δ. This concludes the proof.

⁴⁷ The proof is from P. Garrett, Liouville's theorem on diophantine approximation, 2013. See http://www.math.umn.edu/~garrett/m/mfms/notes_2013-14/04b_Liouville_approx.pdf. Note that there is an error in these notes: instead of estimating |f′(ξ)| from above, the author obtains the estimate from below.

Liouville's Approximation Theorem is a very elegant result which can be explained on a rather intuitive level. As y grows, we certainly expect our approximations x/y of α to be more precise. The question is, to what extent, and how can we measure the "quality" of our approximation? The theorem tells us that an irrational algebraic number cannot be approximated "too well" by rational numbers. One intuitive explanation of this is the following: no fraction x/y will approximate α to more than $d + \log_y(1/C)$ base-y places. For example, when $\alpha = \sqrt2$ one may take C = 1/4, and observe that for all $y \ge 2$
$$\left|\sqrt2 - \frac{x}{y}\right| \ge \frac{1}{4y^2}.$$
One of the ways to interpret the above inequality is as follows: no fraction x/y with y > 2 will approximate $\sqrt2$ significantly better than up to 2 base-y places.

Many more things can be said regarding Liouville's inequality. For example, one may ask what happens if we make C a function of y:
$$\left|\alpha - \frac{x}{y}\right| \ge \frac{C(y)}{y^d}.$$
It turns out that for d = 2 one cannot replace the constant C with some monotonically increasing unbounded function C(y), but for d ≥ 3 this can be done. The first improvement of this kind was introduced by Thue in 1909, who showed that one can take
$$C(y) = c_1 y^{\frac{d}{2} - 1 - \varepsilon}$$
for any ε > 0 and some constant $c_1$ which depends only on α and ε. This result allowed him to prove Thue's Theorem. Further improvements were developed by Siegel, Gelfond and Dyson, until in 1955 Roth showed that $C(y) = c_1 y^{d - 2 - \varepsilon}$ would do the job as well. In basic terms, his result states that for an algebraic number α of degree ≥ 3 there are only finitely many rational approximations x/y to α that are accurate to more than 2 + ε base-y places.

Exercise 33.7.
(a) Prove that, for every integer n ≥ 1, the number
$$\alpha := \sum_{k=0}^{\infty} \frac{1}{2^{k!}} = \frac12 + \frac12 + \frac14 + \frac{1}{64} + \frac{1}{16777216} + \ldots$$
satisfies the inequality
$$\alpha - \sum_{k=0}^{n} \frac{1}{2^{k!}} < \frac{1}{(2^{n!})^n}. \tag{9}$$
Hint: Note that
$$\sum_{k=n+1}^{\infty} \frac{1}{2^{k!}} < \sum_{k=(n+1)!}^{\infty} \frac{1}{2^k}.$$
Use the formula for the infinite geometric series afterwards.
(b) Use Liouville's Theorem and the inequality established in Part (a) to prove that the number α is either rational or transcendental. Hint: Suppose not. Then there exist a fixed integer d ≥ 2 and a constant C > 0 such that
$$\left|\alpha - \frac{x}{y}\right| \ge \frac{C}{y^d}$$
for all integers x and y > 0. Why does this inequality contradict inequality (9)?

34 Elliptic Curves

Let n be a squarefree positive integer. We say that n is congruent if there exists a right triangle with rational sides whose area is n. For example, the number 5 is congruent since it is the area of the right triangle with rational sides 20/3, 3/2 and 41/6.
The number 6 is also congruent, since it is the area of the right triangle with rational sides 3, 4 and 5. In contrast, the number 3 is not congruent. Also, note that if n is congruent, then any integer of the form $s^2 n$ also trivially arises as the area of a right triangle with rational sides (multiply all sides by s). That is why we restrict our attention only to squarefree n.

Given a squarefree number n, how can we find out whether it is congruent or not? Essentially, what we need to do is to solve the system of equations
$$\begin{cases} a^2 + b^2 = c^2, \\ \frac12 ab = n \end{cases}$$
for $a, b, c \in \mathbb{Q}$. Set
$$x = \frac{n(a + c)}{b}, \qquad y = \frac{2n^2(a + c)}{b^2}.$$
Then $y^2 = x^3 - n^2 x$, where $y \ne 0$. Thus, instead of the original system of equations we just have to find $x, y \in \mathbb{Q}$ with $y \ne 0$ such that $y^2 = x^3 - n^2 x$. If such rational x and y exist, one can easily obtain a solution to the original system of equations by setting
$$a = \frac{x^2 - n^2}{y}, \qquad b = \frac{2nx}{y}, \qquad c = \frac{x^2 + n^2}{y}.$$
Thus we just have to find a rational point (x, y) with $y \ne 0$ on the curve $y^2 = x^3 - n^2 x$. Such a curve is an example of an elliptic curve.

Definition 34.1. Let $F = \mathbb{F}_q, \mathbb{Q}, \mathbb{R}, \mathbb{C}$, where q is a prime power.⁴⁸ Let $a, b \in F$ be such that $4a^3 + 27b^2 \ne 0$. The collection
$$E(F) = \{(x, y) \in F^2 : y^2 = x^3 + ax + b\} \cup \{\infty\}$$
is called an elliptic curve, defined over the field F. Here ∞ denotes the point at infinity. The value $\Delta = -16(4a^3 + 27b^2)$ is called the discriminant of the elliptic curve E(F).

⁴⁸ Here $\mathbb{F}_q$ denotes the finite field of order q. We will not give a rigorous construction of $\mathbb{F}_q$ here. We remark though that when q is prime the finite field $\mathbb{F}_q$ is the same as $\mathbb{Z}_q$, the ring of residue classes modulo q.

Example 34.2. The graph of the elliptic curve $E_1 : y^2 = x^3 - 25x$ over $\mathbb{R}$ is depicted in Figure 5. This elliptic curve, aside from the trivial rational points (0, 0) and (±5, 0), contains the rational point (x, y) = (45, 300). This fact implies that the number 5 is congruent.
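The two substitutions above can be checked with exact rational arithmetic. The Python sketch below (function names are illustrative; inputs are assumed to be `Fraction`s so that no floating-point error enters) maps the triangle (20/3, 3/2, 41/6) of area 5 to the point (45, 300) on $y^2 = x^3 - 25x$ and back.

```python
from fractions import Fraction


def triangle_to_point(a, b, c, n):
    """Map a rational right triangle (legs a, b, hypotenuse c, area n)
    to the point (x, y) = (n(a+c)/b, 2n^2(a+c)/b^2) on y^2 = x^3 - n^2 x."""
    x = n * (a + c) / b
    y = 2 * n * n * (a + c) / (b * b)
    return x, y


def point_to_triangle(x, y, n):
    """Inverse map: a rational point with y != 0 gives legs a, b and hypotenuse c."""
    a = (x * x - n * n) / y
    b = 2 * n * x / y
    c = (x * x + n * n) / y
    return a, b, c


F = Fraction
print(triangle_to_point(F(20, 3), F(3, 2), F(41, 6), 5))  # (Fraction(45, 1), Fraction(300, 1))
print(point_to_triangle(F(45), F(300), 5))                # (Fraction(20, 3), Fraction(3, 2), Fraction(41, 6))
```

For n = 6, the triangle (3, 4, 5) corresponds to the point (12, 36) on $y^2 = x^3 - 36x$.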
Furthermore, one can show that in the case of $E_1(\mathbb{Q})$ the existence of one non-trivial rational point implies the existence of infinitely many rational points. In contrast, $E_2 : y^2 = x^3 - 9x$ has no non-trivial rational points, so the elliptic curve $E_2(\mathbb{Q})$ contains only four points, namely (0, 0), (±3, 0) and the point at infinity. Both curves $E_1(\mathbb{R})$ and $E_2(\mathbb{R})$ contain infinitely many points. Also, note that the graph of $E_1(\mathbb{R})$ consists of two connected components. This is because the discriminant of $E_1$, namely $\Delta(E_1) = 10^6$, is positive. In contrast, the discriminant of $E_3 : y^2 = x^3 - 2$ is equal to $\Delta(E_3) = -1728$ and is negative. The negative sign indicates that the graph of $E_3(\mathbb{R})$ has one connected component.

Figure 5: Elliptic curves $y^2 = x^3 - 25x$ and $y^2 = x^3 - 2$

Exercise 34.3. Find integers a and b such that the discriminant of the curve $y^2 = x^3 + ax + b$ is equal to zero. What does the graph of such a curve look like?

Many problems in number theory are actually connected to elliptic curves. For example, consider the Fermat equation $a^3 + b^3 = c^3$. The question of existence of non-trivial solutions to this Diophantine equation is equivalent to solving the equation $u^3 + v^3 = 1$ in rational numbers u and v. If we now let
$$x = 12(u^2 - uv + v^2), \qquad y = 36(u - v)(u^2 - uv + v^2),$$
then $y^2 = x^3 - 432$. Conversely, if some point $(x, y) \in \mathbb{Q}^2$ lies on the elliptic curve determined by the above equation, then it is straightforward to check that the numbers
$$u = \frac{36 + y}{6x}, \qquad v = \frac{36 - y}{6x}$$
are rational and satisfy $u^3 + v^3 = 1$. So once again the existence of a solution to some Diophantine equation reduces to the question of existence of a non-trivial rational point on some elliptic curve.

The first questions about elliptic curves date back to Diophantus of Alexandria, who looked at the Diophantine equation $y(6 - y) = x^3 - x$.
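The discriminants quoted above and the substitution between $u^3 + v^3 = 1$ and $y^2 = x^3 - 432$ are easy to verify exactly. The Python sketch below is illustrative (hypothetical function names, `Fraction` inputs assumed); it uses only the trivial solutions (1, 0) and (0, 1) of $u^3 + v^3 = 1$, and it says nothing about whether other rational points exist.

```python
from fractions import Fraction


def discriminant(a, b):
    """Delta = -16(4a^3 + 27b^2) for the curve y^2 = x^3 + a x + b."""
    return -16 * (4 * a ** 3 + 27 * b ** 2)


def cubic_to_curve(u, v):
    """Map a rational solution of u^3 + v^3 = 1 to a point on y^2 = x^3 - 432."""
    w = u * u - u * v + v * v
    return 12 * w, 36 * (u - v) * w


def curve_to_cubic(x, y):
    """Inverse map: a rational point on y^2 = x^3 - 432 with x != 0 gives u^3 + v^3 = 1."""
    return (36 + y) / (6 * x), (36 - y) / (6 * x)


print(discriminant(-25, 0), discriminant(0, -2))        # 1000000 -1728
x, y = cubic_to_curve(Fraction(1), Fraction(0))
print(x, y, y * y == x ** 3 - 432)                      # 12 36 True
print(curve_to_cubic(x, y))                             # (Fraction(1, 1), Fraction(0, 1))
```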
Fermat claimed that he knew how to solve the Diophantine equation $y^2 = x^3 + 1$, but did not provide his proof. The problem was fully resolved only one century later by Euler. The field of algebraic number theory was essentially born when Euler tried to solve the Diophantine equation $y^2 = x^3 - 2$ by writing
$$x^3 = y^2 + 2 = (y + \sqrt{-2})(y - \sqrt{-2})$$
and then claiming that $y + \sqrt{-2}$ and $y - \sqrt{-2}$ are "coprime", without rigorously explaining what coprimeness means in this setting. Of course, his intuition was correct: the ring $\mathbb{Z}[\sqrt{-2}]$ is a Unique Factorization Domain, and indeed one can show that $y + \sqrt{-2}$ and $y - \sqrt{-2}$ are coprime in $\mathbb{Z}[\sqrt{-2}]$, as long as $y \ne 0$.

Elliptic curves have been extensively studied over the past two centuries. The theory of elliptic curves truly blossomed with the prominent work of Weierstrass on elliptic functions, which connects elliptic curves defined over the field of complex numbers $\mathbb{C}$ to lattices in the complex plane. In fact, every elliptic curve over $\mathbb{C}$ arises from (or can be reduced to) a lattice in the complex plane! Elliptic curves are intimately connected to modular forms, and the development of the theories of elliptic curves and modular forms resulted in Andrew Wiles's proof of Fermat's Last Theorem (see Section 28 for more details).

Other prominent mathematicians who contributed a lot to the development of the theory of elliptic curves were Abel and Jacobi. By studying so-called elliptic integrals, they realized that, in fact, one can impose arithmetic on points of an elliptic curve. More precisely, such an arithmetic takes place whenever an elliptic curve E(F) is defined over a field F. This is why we restrict our attention only to $F = \mathbb{F}_q, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ and not, say, $\mathbb{Z}$ or $\mathbb{Z}/p^k\mathbb{Z}$ for p prime and k ≥ 2. The latter two collections are rings but not fields.

To explain what this means, consider for now some elliptic curve E defined over the field of real numbers $\mathbb{R}$. For two distinct points $P, Q \in E(\mathbb{R})$, we draw a line through P and Q.
Of course, this line is uniquely defined. For now, let us assume that this line is tangent to $E$ neither at $P$ nor at $Q$ (see the first picture in Figure 6).[49] Our line will intersect $E$ at some third point, say $R'$. The arithmetic on an elliptic curve is then defined as follows: $P + Q + R' = \infty$; that is, any three collinear points $P$, $Q$ and $R'$ on $E$ add up to $\infty$ (the point at infinity). Alternatively, if $R' = (x_{R'}, y_{R'})$, we can write $P + Q = -R'$, so by “adding” two points together we produce a third point, namely $-R' = (x_{R'}, -y_{R'})$. In Figure 6, the point at infinity is denoted by $0$. Soon we will see that there is a deep reason for this alternative notation.

Figure 6: Group law

[49] The picture is taken from Wikipedia: https://upload.wikimedia.org/wikipedia/commons/thumb/7/77/ECClines-2.0.svg/680px-ECClines-2.0.svg.png.

We can formalize the observations made above as follows. Let $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$, and suppose that $x_P \neq x_Q$. Let
$$s = \frac{y_P - y_Q}{x_P - x_Q}$$
denote the slope of the line passing through the points $P$ and $Q$. Then we define the point $R = (x_R, y_R) = P + Q$ as follows:
$$x_R = s^2 - x_P - x_Q, \quad y_R = -y_P + s(x_P - x_R).$$
It is straightforward to verify that $R$ indeed belongs to $E(\mathbb{R})$. Furthermore, if we look closer at the expressions for $x_R$ and $y_R$, we notice that they preserve the field of definition. That is, if $P$ and $Q$ are points in $\mathbb{R}^2$, then $R$ is also a point in $\mathbb{R}^2$; if $P$ and $Q$ are points in $\mathbb{Q}^2$, then so is $R$. This applies to any field, so addition of points is well-defined over any base field $F$. See Figure 7 for a demonstration that the field of definition remains unchanged: all the points in that example belong to $\mathbb{Z}^2$, and therefore to $\mathbb{Q}^2$ as well. Note, however, that in general the sum of two integer points on an elliptic curve need not be an integer point, but it is always a rational point or the point at infinity.[50]

We need to consider three special cases separately.
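The chord formulas for the generic case $x_P \neq x_Q$ translate directly into code. The sketch below uses exact rational arithmetic via `fractions.Fraction`; the helper names `on_curve` and `add_distinct_x` are our own, hypothetical choices, and the special cases are deliberately left out (the function raises an error if $x_P = x_Q$). It reproduces the example of Figure 7 and also shows that two integer points can add up to a non-integer rational point.

```python
from fractions import Fraction


def on_curve(P, a, b):
    """Check that P = (x, y) satisfies y^2 = x^3 + a*x + b."""
    x, y = P
    return y * y == x**3 + a * x + b


def add_distinct_x(P, Q):
    """Chord rule P + Q, valid only in the generic case x_P != x_Q."""
    (xP, yP), (xQ, yQ) = P, Q
    if xP == xQ:
        raise ValueError("x_P == x_Q is one of the special cases")
    s = Fraction(yP - yQ) / (xP - xQ)  # slope of the line through P and Q
    xR = s * s - xP - xQ
    yR = -yP + s * (xP - xR)
    return xR, yR


if __name__ == "__main__":
    a, b = -5, 4  # the curve y^2 = x^3 - 5x + 4 of Figure 7
    R = add_distinct_x((1, 0), (0, 2))
    print(R, on_curve(R, a, b))  # (Fraction(3, 1), Fraction(4, 1)) True
    S = add_distinct_x((0, 2), (3, 4))
    print(S, on_curve(S, a, b))  # (Fraction(-23, 9), Fraction(-8, 27)) True
```

Note that $b$ never enters the addition formulas; it is only needed to check that a point lies on the curve.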
For example, the second picture in Figure 6 shows the situation when the line is tangent to $E$ at the point $Q$. This picture corresponds to the following: if instead of distinct points $P$ and $Q$ we pick two identical points, i.e. $P = Q$, then we can think of the tangent line as the line which passes through both $P$ and $Q$. In this case, the slope of the line tangent to $E$ at $P = (x_P, y_P)$ is equal to
$$s = \frac{3x_P^2 + a}{2y_P},$$
and we may compute the point $R = (x_R, y_R) = P + P$ as
$$x_R = s^2 - 2x_P, \quad y_R = -y_P + s(x_P - x_R).$$
Once again, one can verify that $(x_R, y_R)$ indeed lies on $E$ and that the formulas for $x_R$ and $y_R$ above preserve the field of definition. That is, if $P \in F^2$, then $R = P + P \in F^2$. For short, we write $R = 2P$, and more generally
$$nP = \underbrace{P + P + \cdots + P}_{n \text{ times}}.$$

[50] The picture is taken from William Stein’s lecture notes, Chapter 6, Figure 6.3: http://wstein.org/simuw06/ch6.pdf.

Figure 7: The group law: $(1, 0) + (0, 2) = (3, 4)$ on $y^2 = x^3 - 5x + 4$

The only two special cases left to consider are when $P \neq Q$ with $x_P = x_Q$ (third picture in Figure 6) and when $P = Q$ with $y_P = 0$ (fourth picture in Figure 6). Both cases result in a vertical line, which has infinite slope. In the former case, we write $P + Q = \infty$, and in the latter case we write $2P = \infty$. At this point, we have covered all four cases that can arise. In this unorthodox way, we have defined the operation of addition “+” on $E(F)$. We can also define the operation of negation: if $P = (x_P, y_P)$, we write $-P = (x_P, -y_P)$. Also, we can define the operation of subtraction “−” as follows: $P - Q = P + (-Q)$. Notice that the point at infinity plays the role of zero, which explains the notation used in Figure 6. We summarize the observations made above (and introduce a few more) in Proposition 34.4.

Proposition 34.4. (The Group Law) Let $F$ be a field and $E(F)$ be an elliptic curve. The collection of points $E(F)$ forms a group under the operation of addition, called the Mordell-Weil group.
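Putting all four cases together gives a complete addition routine, and repeated doubling gives $nP$ efficiently, in the spirit of the double-and-add algorithm from earlier in these notes. The sketch below is our own illustration over $\mathbb{Q}$, not a definitive implementation: the encoding of $\infty$ as Python’s `None` (named `INFINITY`) and the helper names `neg`, `add` and `multiply` are assumptions. Only the coefficient $a$ of $y^2 = x^3 + ax + b$ enters the formulas.

```python
from fractions import Fraction

INFINITY = None  # our own (hypothetical) encoding of the point at infinity


def neg(P):
    """-P = (x_P, -y_P); the point at infinity is its own negative."""
    return INFINITY if P is INFINITY else (P[0], -P[1])


def add(P, Q, a):
    """P + Q on y^2 = x^3 + a*x + b, covering all four cases."""
    if P is INFINITY:
        return Q
    if Q is INFINITY:
        return P
    (xP, yP), (xQ, yQ) = P, Q
    if xP != xQ:                     # chord through two points
        s = Fraction(yP - yQ) / (xP - xQ)
    elif yP == -yQ:                  # vertical line: Q = -P, or P = Q with y_P = 0
        return INFINITY
    else:                            # P == Q with y_P != 0: tangent line
        s = Fraction(3 * xP * xP + a) / (2 * yP)
    xR = s * s - xP - xQ             # equals s^2 - 2 x_P when P == Q
    yR = -yP + s * (xP - xR)
    return xR, yR


def multiply(n, P, a):
    """nP for n >= 0 by double-and-add over the binary digits of n."""
    result, addend = INFINITY, P
    while n > 0:
        if n & 1:
            result = add(result, addend, a)
        addend = add(addend, addend, a)
        n >>= 1
    return result


if __name__ == "__main__":
    a = -5  # y^2 = x^3 - 5x + 4, as in Figure 7
    print(add((1, 0), (0, 2), a))       # (Fraction(3, 1), Fraction(4, 1))
    print(multiply(2, (0, 2), a))       # (Fraction(25, 16), Fraction(-3, 64))
    print(multiply(2, (1, 0), a))       # None, since y_P = 0
    print(add((0, 2), neg((0, 2)), a))  # None, since P + (-P) = infinity
```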
That is, it satisfies the following four group axioms:

1. Closure. For all $P, Q \in E(F)$, $P + Q \in E(F)$;
2. Associativity. For all $P, Q, R \in E(F)$, $(P + Q) + R = P + (Q + R)$;
3. Identity element. For all $P \in E(F)$, the element $\infty$ satisfies $P + \infty = \infty + P = P$;
4. Inverse element. For each $P \in E(F)$ there exists an element $-P \in E(F)$ such that $P + (-P) = (-P) + P = \infty$.

Furthermore, the group of points on an elliptic curve $E(F)$ is Abelian:

5. Abelianness. For all $P, Q \in E(F)$, $P + Q = Q + P$.

Theorem 34.5. (Mordell’s Theorem, 1922) For every elliptic curve $E$ defined over the field of rational numbers $\mathbb{Q}$, the group $E(\mathbb{Q})$ is a finitely generated Abelian group. That is,
$$E(\mathbb{Q}) \cong C \times \mathbb{Z}^r,$$
where $r$ is a non-negative integer and $C$ is a finite Abelian group.

The main point of the above theorem is that the number $r$ is finite. It is called the Mordell-Weil rank of the elliptic curve $E(\mathbb{Q})$. Such a nice classification is impossible when the base field is $\mathbb{R}$ or $\mathbb{C}$. In essence, the theorem says that even though there can be infinitely many rational points, there cannot be “too many” of them in a very precise sense.

To better explain Mordell’s theorem, let us recall the notion of the order of a group element. Just like for other groups that we studied, we say that a point $P \in E(F)$ has order $n$ if $n$ is the smallest positive integer such that $nP = \infty$. If such an integer does not exist, we say that $P$ has infinite order. According to Mordell’s Theorem, there exist $r$ points $P_1, P_2, \ldots, P_r$ of infinite order such that every point $P \in E(\mathbb{Q})$ can be written in the form
$$P = T + \sum_{i=1}^{r} n_i P_i,$$
where $n_1, n_2, \ldots, n_r$ are integers and $T$ is a point of finite order (such points are called torsion points).

An elliptic curve is the first interesting example of what is called an Abelian variety. In 1928, Mordell’s theorem was generalized by the French mathematician André Weil to all Abelian varieties over number fields.
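The definition of the order of a point suggests a direct computation: add $P$ to itself until $\infty$ appears. The sketch below is our own illustration, assuming the same encoding of $\infty$ as `None` and an arbitrary search cap `max_n`; the names `add` and `order` are hypothetical. A return value of `None` only says that no multiple up to the cap is $\infty$; by itself it does not prove that $P$ has infinite order.

```python
from fractions import Fraction

INFINITY = None  # our own encoding of the point at infinity


def add(P, Q, a):
    """Group law on y^2 = x^3 + a*x + b (chord, tangent and vertical cases)."""
    if P is INFINITY:
        return Q
    if Q is INFINITY:
        return P
    (xP, yP), (xQ, yQ) = P, Q
    if xP != xQ:
        s = Fraction(yP - yQ) / (xP - xQ)
    elif yP == -yQ:
        return INFINITY
    else:
        s = Fraction(3 * xP * xP + a) / (2 * yP)
    xR = s * s - xP - xQ
    return xR, -yP + s * (xP - xR)


def order(P, a, max_n=20):
    """Smallest n <= max_n with nP = infinity, or None if none is found."""
    Q = P  # invariant: Q == n*P at the start of iteration n
    for n in range(1, max_n + 1):
        if Q is INFINITY:
            return n
        Q = add(Q, P, a)
    return None


if __name__ == "__main__":
    print(order((0, 0), -9))    # 2: a torsion point of E2: y^2 = x^3 - 9x
    print(order((2, 3), 0))     # 6: a torsion point of y^2 = x^3 + 1
    print(order((-4, 6), -25))  # None: no multiple up to 20 is infinity
```

For $(-4, 6)$ on $y^2 = x^3 - 25x$ the search finds no $n \leq 20$ with $nP = \infty$, which is consistent with this point having infinite order; the numerators and denominators of $nP$ also grow quickly, which is typical of points of infinite order.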
We conclude this section with Siegel’s Theorem, which has profound consequences in the analysis of Diophantine equations related to elliptic curves.

Theorem 34.6. (Siegel’s Theorem, 1929) Every elliptic curve $E(\mathbb{C})$ contains only finitely many integer points. That is, for any numbers $a, b \in \mathbb{C}$ such that $4a^3 + 27b^2 \neq 0$, the Diophantine equation $y^2 = x^3 + ax + b$ has only finitely many solutions in integers $x$ and $y$.