MA 311 NUMBER THEORY
BUTLER UNIVERSITY
FALL 2008
SCOTT T. PARSELL

1. Introduction

Number theory could be defined fairly concisely as the study of the natural numbers: $1, 2, 3, 4, 5, 6, \ldots$. We usually denote this set by $\mathbb{N}$. The set of all integers (including 0 and the negatives) is denoted by $\mathbb{Z}$. Is there anything about the natural numbers that’s worth studying? It seems that we have a pretty good understanding of them once we’ve learned to count! Perhaps surprisingly, this turns out to be a rich and fascinating field of study, bursting with unsolved problems. A good starting point for our investigations is to look at how the natural numbers factor.

Primes. A prime number is a number greater than 1 that cannot be written as the product of two smaller natural numbers. The first few primes are $2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, \ldots$. Integers exceeding 1 that are not prime are called composite. The primes are important because each natural number greater than 1 can be written as a product of primes, and this factorization is unique (up to the order of the factors). For example, $24 = 2^3 \cdot 3$ and $105 = 3 \cdot 5 \cdot 7$. It is fairly easy to show that there are infinitely many prime numbers; we’ll prove this in a later section. However, there remain many interesting unsolved (or partially solved) questions about the primes and how they are distributed. For example,

∙ How precisely can we estimate the number of primes less than $x$? (We know that $x/\log x$ gives a good first approximation.) What about primes of the form $4n + 1$, of the form $4n + 3$, etc.?
∙ Are there infinitely many primes of the form $n^2 + 1$? How about of the form $2^n - 1$? Of the form $2^n + 1$?
∙ Is there an efficient algorithm for finding a number’s prime factorization or proving that a number is prime? (The difficulty of factoring efficiently is the basis of the security of RSA encryption.)
∙ Are there infinitely many pairs of “twin primes” (i.e., primes whose difference is two, such as 3 and 5)? If not, can anything be said about small gaps between primes asymptotically?
∙ (Goldbach’s problem) Can every even integer exceeding 2 be written as the sum of two primes?

Questions about the distribution of primes usually fall under the heading of analytic number theory because many of the techniques are based on real and complex analysis (i.e., mathematics related to calculus).

Divisibility and congruences. Along with the idea of factoring integers comes the notion of divisibility. We say that $a$ divides $b$ if there exists an integer $k$ such that $ak = b$. For example, 4 divides 24 since $4 \cdot 6 = 24$, and 15 divides 105 since $15 \cdot 7 = 105$. Divisibility leads to the important idea of congruences. We say that $a$ is congruent to $b$ modulo $n$ if $n$ divides $a - b$. In this case, we write $a \equiv b \pmod{n}$. For example, $3 \equiv 75 \pmod{24}$ and $8 \equiv 38 \pmod{10}$. Arithmetic with congruences (sometimes called modular arithmetic) is useful for detecting certain types of periodic phenomena. For example, one could use arithmetic mod 24 to keep track of the hour of day (in military time) without regard to minutes, seconds, or day. One could use arithmetic mod 10 to keep track of the last digit of a positive number (or mod 100 to keep track of the last two digits). If $n$ objects are arranged in a circle, then arithmetic mod $n$ can be used to keep track of the positions of the objects as they are rearranged. We’ll see some more interesting uses of congruences later on. For instance, they can be used to construct check-digit schemes to minimize errors in data entry. Facts about the computation of powers modulo $n$ form the basis for constructing an RSA cryptosystem.

Rings and fields. If one is doing arithmetic with congruences, say modulo 6, then effectively there are only 6 distinct “numbers” to work with, usually denoted by 0, 1, 2, 3, 4, and 5.
Under this scheme, the number 0 actually stands for the set $[0]_6 = \{\ldots, -24, -18, -12, -6, 0, 6, 12, 18, 24, \ldots\}$. Similarly, 1 stands for $[1]_6 = \{\ldots, -17, -11, -5, 1, 7, 13, 19, \ldots\}$, and so on. However, it is convenient to pick one small integer (usually either the smallest positive integer or the one of smallest absolute value) to represent each “congruence class”.

The integers themselves are an example of an abstract algebraic structure called a ring, which is basically a set equipped with addition and multiplication operations satisfying basic properties like associativity and the distributive law (we omit the precise definition of a ring here). The set of congruence classes $\{0, 1, 2, 3, 4, 5\}$ can be viewed as a ring in its own right, sometimes denoted by $\mathbb{Z}/6\mathbb{Z}$ or $\mathbb{Z}_6$, with addition and multiplication defined modulo 6. For example, $2 + 5 = 1$ and $2 \cdot 3 = 0$ in the ring $\mathbb{Z}_6$.

One defect of rings is that multiplicative inverses do not exist in general. For example, 2 does not have a multiplicative inverse in $\mathbb{Z}$, nor in $\mathbb{Z}_6$. However, 2 does have a multiplicative inverse in $\mathbb{Z}_7$, since $2 \cdot 4 = 1$ under mod 7 arithmetic. Special rings in which all nonzero elements have multiplicative inverses (such as the rational numbers, real numbers, and complex numbers) are called fields. It turns out that $\mathbb{Z}_n = \{0, 1, 2, \ldots, n - 1\}$ under arithmetic modulo $n$ is a field if and only if $n$ is prime. Our algebra with congruences will be influenced by these considerations. Just as the equation $2x = 1$ can be solved over the rationals but not over the integers, the congruence $2x \equiv 1 \pmod{n}$ can be solved when $n = 7$ but not when $n = 6$ (in other words, the equation $2x = 1$ has a solution over $\mathbb{Z}_7$ but not over $\mathbb{Z}_6$).

One can construct further examples of rings by “adjoining” irrational or complex numbers to the set of integers.
For example, if $i = \sqrt{-1}$, then the set $\mathbb{Z}[i]$ of all complex numbers of the form $a + bi$, where $a$ and $b$ are integers, forms a ring, known as the ring of Gaussian integers. One can ask whether such a ring has any number-theoretic properties in common with the integers, such as unique factorization. It turns out that this ring does have unique factorization, but not all the integer primes remain prime in $\mathbb{Z}[i]$. For instance, $2 = (1 + i)(1 - i)$, but 3 remains irreducible. The numbers $1 + i$ and $1 - i$ are primes in $\mathbb{Z}[i]$, and the number 6 has the unique prime factorization $6 = (1 + i) \cdot (1 - i) \cdot 3$.

If we let $\delta = \sqrt{-5}$, then we can construct the ring $\mathbb{Z}[\delta]$, which is the set of all complex numbers of the form $a + b\delta$, where $a$ and $b$ are integers. Something bizarre happens when we try to factor 6 in this ring. We obviously have $6 = 2 \cdot 3$ and $6 = (1 + \delta)(1 - \delta)$, and one can show that 2, 3, $1 + \delta$, and $1 - \delta$ are all irreducible in the ring $\mathbb{Z}[\delta]$. Thus we have two different factorizations for 6, which means that unique factorization fails in this ring! The study of primes and factorization in rings such as $\mathbb{Z}[i]$ and $\mathbb{Z}[\delta]$ forms the basis for much of algebraic number theory. Here one makes heavy use of general results from modern algebra, so we won’t pursue this branch of the subject very deeply.

Diophantine equations. One area of number theory that we hope to touch on later in the course overlaps with both analytic and algebraic number theory. A diophantine equation is simply an equation (usually a polynomial in two or more variables) for which we seek integer (or sometimes rational) solutions; a classic example is the equation $x^2 + y^2 = z^2$. This equation has many integer solutions, such as $(3, 4, 5)$ and $(5, 12, 13)$. In fact, it can be shown that there are infinitely many integer solutions, and all the solutions can be described by an explicit parametrization.
These are the so-called Pythagorean triples, which correspond to the lengths of the sides in right triangles. Interestingly, things become dramatically different if we change the equation to $x^3 + y^3 = z^3$. Here the only integer solutions are the “trivial” ones with $xyz = 0$. In fact, Fermat’s Last Theorem asserts that if $k$ is any integer exceeding 2 then the diophantine equation $x^k + y^k = z^k$ has only trivial solutions. This seemingly innocent conjecture remained unproven for over 300 years until deep work of Wiles resolved it in 1995.

As another example, consider the diophantine equation $y^2 = x^3 + 17$. This is an example of an elliptic curve, which more generally has the form $y^2 = f(x)$, where $f$ is a cubic polynomial. It turns out that the rational points lying on such a curve have an additive group structure, and this can be used as the basis for an encryption scheme and also for an efficient factoring algorithm. Wiles also exploited connections with elliptic curves in his proof of Fermat’s Last Theorem. All this work on diophantine equations in few variables uses primarily algebraic techniques, so the detailed study of these topics is best left for a more advanced course.

A theorem of Lagrange states that every positive integer can be expressed as the sum of four perfect squares. In other words, the diophantine equation $x_1^2 + x_2^2 + x_3^2 + x_4^2 = n$ can be solved for every positive integer $n$. For instance, when $n = 31$ we can take $x_1 = 5$, $x_2 = 2$, $x_3 = 1$, and $x_4 = 1$. A generalization of this question known as Waring’s problem asks what happens with higher powers. For instance, how large does $s$ have to be in order to represent all integers as sums of $s$ perfect cubes? (The answer turns out to be 9.) What if we only need to represent all sufficiently large integers? Here we know that 7 cubes suffice, but it’s conjectured that 4 would be enough!
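Lagrange’s four-square theorem is easy to test numerically for small $n$. The following Python sketch is only an illustration: the function name `four_squares` is hypothetical (it does not come from these notes), and the exhaustive search it uses is chosen for clarity rather than efficiency.

```python
from itertools import combinations_with_replacement
from math import isqrt


def four_squares(n):
    """Return one tuple (x1, x2, x3, x4) of non-negative integers whose
    squares sum to n, found by brute-force search (illustrative only)."""
    bound = isqrt(n)  # no x_i can exceed the integer square root of n
    # Try candidates in non-increasing order, largest values first.
    for combo in combinations_with_replacement(range(bound, -1, -1), 4):
        if sum(x * x for x in combo) == n:
            return combo
    return None  # Lagrange's theorem says this is never reached for n >= 1


if __name__ == "__main__":
    print(four_squares(31))
```

For $n = 31$ this search recovers the representation $5^2 + 2^2 + 1^2 + 1^2$ quoted above.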
The type of diophantine equation involved in Waring’s problem typically has enough variables that it can be attacked by analytic methods, and this has been a very active area of research over the past 20 years. We’ll discuss some of the underlying ideas later in the course.

In Waring’s problem, one could also ask what happens if the variables are restricted to be primes. For example, the Goldbach problem mentioned earlier amounts to solving the equation $p_1 + p_2 = n$ in primes $p_1$ and $p_2$ for every even $n > 2$. The general Waring-Goldbach problem considers the solubility of the diophantine equation $p_1^k + \cdots + p_s^k = n$ in primes $p_1, \ldots, p_s$ for every $n$ for which the underlying congruences are feasible.

A variation known as a diophantine inequality arises when attempting to approximate irrational numbers by rational numbers. For instance, if we want to find a rational number close to $\sqrt{2}$, then we are looking for integer solutions to the inequality $|x/y - \sqrt{2}| < \varepsilon$, where $\varepsilon$ is a small positive number. Dirichlet’s theorem on diophantine approximation actually tells us that we can solve this inequality with $\varepsilon$ replaced by an explicit function of the denominator, namely $1/y^2$. Thus we can solve the diophantine inequality $|x - \sqrt{2}\,y| < 1/y$. More general inequalities (for example, involving sums of $k$th powers) are a subject of current research interest.

Where do we begin? We’ve only scratched the surface of number theory by mentioning some of the important ideas and some of the interesting unsolved problems. In the next section, we’ll start laying the foundations for our study by developing some actual machinery on divisibility, primes, and congruences. This will lead us to our first main goal, which is to understand RSA cryptography. Following that, we hope to touch on some of the more advanced topics mentioned above, such as the distribution of primes, the algebraic structure of $\mathbb{Z}_n$, Waring’s problem, and diophantine approximation.

2. Divisibility

Recall that if $a, b \in \mathbb{Z}$, we say that $a$ divides $b$ (and write $a \mid b$) if there exists $k \in \mathbb{Z}$ such that $b = ak$. For example, 2 divides 6, but 4 does not divide 6. When $a$ divides $b$, we say that $b$ is a multiple of $a$ and that $b$ is divisible by $a$. Two easy properties of divisibility that we’ll find useful are given in the following lemma.

Lemma 2.1. Let $a$, $b$, and $c$ be integers.
(a) If $a \mid b$ and $b \mid c$, then $a \mid c$.
(b) If $a \mid b$ and $a \mid c$, then $a \mid (bs + ct)$ for all integers $s$ and $t$.

Proof. If $a \mid b$ and $b \mid c$, then we can write $b = ak$ and $c = bl$ for some integers $k$ and $l$. We then have $c = a(kl)$, which shows that $a \mid c$. Similarly, if $a \mid b$ and $a \mid c$, then we can write $b = ak$ and $c = al$ for some integers $k$ and $l$. If $s$ and $t$ are arbitrary integers, we have $bs + ct = aks + alt = a(ks + lt)$, which shows that $a \mid (bs + ct)$. □

The following divisibility exercise gives us a chance to review proof by mathematical induction.

Example 2.2. Prove that $n^5 - n$ is divisible by 5 for every positive integer $n$.

Solution. We proceed by induction on $n$. First of all, we have $1^5 - 1 = 0$, which is clearly divisible by 5, since $0 = 5 \cdot 0$. This establishes the base case. Now suppose that $n \ge 1$ is an integer and that $n^5 - n$ is divisible by 5. Then by the binomial theorem one has
$$(n + 1)^5 - (n + 1) = n^5 + 5n^4 + 10n^3 + 10n^2 + 5n + 1 - n - 1 = (n^5 - n) + 5(n^4 + 2n^3 + 2n^2 + n).$$
Here the first term on the right is divisible by 5 according to the induction hypothesis, and the second term is clearly divisible by 5 since $n^4 + 2n^3 + 2n^2 + n$ is an integer. We therefore deduce from part (b) of Lemma 2.1 that $(n + 1)^5 - (n + 1)$ is divisible by 5, and the result now follows by induction. □

In the future, we will not always be quite so pedantic in writing, but the above solution serves as a good model for constructing proofs of this type. In general, to prove that a statement $P(n)$ holds for all positive integers $n$, one must first establish $P(1)$ and then prove the implication $P(n) \implies P(n + 1)$.
This principle is one of the fundamental axioms about the integers. It is equivalent to the well-ordering principle, which states that every non-empty subset of the positive integers has a smallest element.

Greatest common divisors. The greatest common divisor of $a$ and $b$ is the largest positive integer that divides both $a$ and $b$. It is denoted by $\gcd(a, b)$, or sometimes just $(a, b)$ when there is no danger of confusion with an ordered pair. For example, $\gcd(4, 6) = 2$, $\gcd(12, 51) = 3$, and $\gcd(9, 16) = 1$. If $\gcd(a, b) = 1$, then we say that $a$ and $b$ are relatively prime (or coprime). We note that $\gcd(a, 0) = |a|$ for every non-zero integer $a$ and that $\gcd(0, 0)$ is undefined.

The least common multiple of $a$ and $b$ is the smallest positive integer that is a multiple of both $a$ and $b$. It is denoted by $\operatorname{lcm}(a, b)$ or $[a, b]$. For example, $\operatorname{lcm}(4, 6) = 12$. It is fairly easy to see that $\gcd(a, b) \operatorname{lcm}(a, b) = ab$ when $a$ and $b$ are positive.

When $a$ and $b$ are small, one can compute $\gcd(a, b)$ fairly easily by looking at the prime factorizations of $a$ and $b$ and picking out the parts in common. For instance, $24 = 2^3 \cdot 3$ and $180 = 2^2 \cdot 3^2 \cdot 5$, so $\gcd(24, 180) = 2^2 \cdot 3 = 12$. However, since factoring is expensive computationally, this is not an efficient method when $a$ and $b$ are large. A better method is based on the division with remainder algorithm learned in grade school.

Theorem 2.3. (Division with remainder) For any integers $a$ and $b$ with $b > 0$, there exist unique integers $q$ and $r$ such that $a = qb + r$ and $0 \le r < b$.

Proof. We first prove the existence of $q$ and $r$. Consider the list of integers
$$\ldots, a - 3b, a - 2b, a - b, a, a + b, a + 2b, a + 3b, \ldots.$$
Since $b > 0$, we can select one with the smallest non-negative value, say $r = a - qb$. If $r \ge b$, then we find that $r - b = a - qb - b = a - (q + 1)b$ is a non-negative number on our list with a smaller value than $r$, which contradicts the minimality of $r$. Thus we have $0 \le r < b$ and $a = qb + r$.
To check uniqueness, suppose there are integers $q_1$, $q_2$, $r_1$, and $r_2$ with
$$a = q_1 b + r_1 = q_2 b + r_2 \quad \text{and} \quad 0 \le r_1, r_2 < b.$$
Then we have $b(q_1 - q_2) = r_2 - r_1$, and we may suppose without loss of generality that $r_1 \le r_2$. Then $0 \le r_2 - r_1 < b - r_1 \le b$, and hence $0 \le b(q_1 - q_2) < b$, which implies that $q_1 - q_2 = 0$. Thus $q_1 = q_2$, and it follows that $r_1 = r_2$. □

For example, if $a = 48$ and $b = 9$, then we can write $48 = 5 \cdot 9 + 3$, so we can take $q = 5$ and $r = 3$ in Theorem 2.3. We call $q$ the quotient and $r$ the remainder. Notice that $r = 0$ if and only if $b$ divides $a$.

Theorem 2.4. Let $a$ and $b$ be nonzero integers. Then $\gcd(a, b)$ is the smallest positive integral linear combination of $a$ and $b$. That is, $\gcd(a, b)$ is the smallest positive value of $as + bt$, where $s$ and $t$ are integers.

Proof. By taking $s = a$ and $t = b$, we see that positive integral linear combinations exist, so we can let $g$ denote the smallest such value. Write $g = as_0 + bt_0$. By Theorem 2.3, we can write
$$a = qg + r = q(as_0 + bt_0) + r,$$
where $0 \le r < g$. Solving for $r$, we get $r = a(1 - qs_0) + b(-qt_0)$, so $r$ is an integral linear combination of $a$ and $b$, and since $r < g$, the minimality of $g$ implies that $r = 0$. Thus we see that $g$ divides $a$, and we can apply a similar argument to deduce that $g$ divides $b$. Thus $g$ is a common divisor of $a$ and $b$. Moreover, if $d$ is any common divisor of $a$ and $b$, then $d$ divides both $as_0$ and $bt_0$, so $d$ divides $g$. Thus we conclude that $g = \gcd(a, b)$. □

Corollary 2.5. The integers $a$ and $b$ are relatively prime if and only if there exist integers $s$ and $t$ such that $as + bt = 1$.

Proof. If $\gcd(a, b) = 1$, then it follows from Theorem 2.4 that $as + bt = 1$ for some integers $s$ and $t$. Conversely, suppose that 1 can be expressed as a linear combination of $a$ and $b$. Since Theorem 2.4 ensures that $\gcd(a, b)$ is the smallest positive integer with this property, we may conclude that $\gcd(a, b) = 1$. □

For example, we have $9 \cdot (-7) + 16 \cdot 4 = 1$, which shows that $\gcd(9, 16) = 1$.
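Theorems 2.3 and 2.4 can be checked numerically on small inputs. The Python sketch below is an illustration only: the function names and the search bound `bound` are assumptions made for this example, and the brute-force search finds the true minimum only when some pair of coefficients $s$, $t$ attaining the gcd lies within that bound.

```python
from math import gcd


def division_with_remainder(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < b, assuming b > 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    # For b > 0, Python's divmod uses floor division, so 0 <= r < b
    # holds even when a is negative.
    return divmod(a, b)


def smallest_positive_combination(a, b, bound=50):
    """Smallest positive value of a*s + b*t over |s|, |t| <= bound
    (exhaustive search; 'bound' is an illustrative assumption)."""
    values = (a * s + b * t
              for s in range(-bound, bound + 1)
              for t in range(-bound, bound + 1))
    return min(v for v in values if v > 0)


if __name__ == "__main__":
    print(division_with_remainder(48, 9))
    for a, b in [(4, 6), (12, 51), (9, 16), (630, 132)]:
        print(a, b, smallest_positive_combination(a, b), gcd(a, b))
```

In each pair printed by the loop, the smallest positive combination found by the search agrees with `math.gcd`, as Theorem 2.4 predicts.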
An efficient algorithm for computing $\gcd(a, b)$ is based on the following simple result.

Lemma 2.6. If $a = qb + r$, then $\gcd(a, b) = \gcd(b, r)$.

Proof. If $d$ divides both $a$ and $b$, then $d$ clearly divides $r = a - qb$, so $d$ is a common divisor of $b$ and $r$. Conversely, if $d$ divides both $b$ and $r$, then $d$ clearly divides $a = qb + r$, so $d$ is a common divisor of $a$ and $b$. Therefore the set of common divisors of $a$ and $b$ is identical to the set of common divisors of $b$ and $r$, so the greatest common divisors must be equal. □

The Euclidean Algorithm. We can compute the greatest common divisor very efficiently by successively applying Theorem 2.3 and Lemma 2.6. The gcd is the last non-zero remainder in this process. That is, to compute $\gcd(a, b)$, we write
$$\begin{aligned}
a &= b q_1 + r_1 & (0 < r_1 < b) \\
b &= r_1 q_2 + r_2 & (0 < r_2 < r_1) \\
r_1 &= r_2 q_3 + r_3 & (0 < r_3 < r_2) \\
&\;\;\vdots \\
r_{j-2} &= r_{j-1} q_j + r_j & (0 < r_j < r_{j-1}) \\
r_{j-1} &= r_j q_{j+1},
\end{aligned}$$
so that $\gcd(a, b) = r_j$.

Example 2.7. Use the Euclidean algorithm to compute $d = \gcd(630, 132)$, and find integers $s$ and $t$ such that $d = 630s + 132t$.

Solution. We have
$$\begin{aligned}
630 &= 132 \cdot 4 + 102 \\
132 &= 102 \cdot 1 + 30 \\
102 &= 30 \cdot 3 + 12 \\
30 &= 12 \cdot 2 + 6 \\
12 &= 6 \cdot 2,
\end{aligned}$$
so the algorithm terminates with $j = 4$, and we have $\gcd(630, 132) = r_4 = 6$. We can now work backwards through these equations to find the required integers $s$ and $t$. We have
$$\begin{aligned}
6 &= 30 - 12 \cdot 2 = 30 - (102 - 30 \cdot 3) \cdot 2 \\
&= 30 \cdot 7 - 102 \cdot 2 = (132 - 102) \cdot 7 - 102 \cdot 2 \\
&= 132 \cdot 7 - 102 \cdot 9 = 132 \cdot 7 - (630 - 132 \cdot 4) \cdot 9 \\
&= 132 \cdot 43 - 630 \cdot 9,
\end{aligned}$$
so we can take $s = -9$ and $t = 43$. □

There is another way to organize the computations in the Euclidean algorithm that produces $\gcd(a, b)$ and the integers $s$ and $t$ simultaneously. The idea is to set up an augmented matrix consisting of a $2 \times 2$ identity matrix, followed by $a$ and $b$ in the third column. One then subtracts a multiple of one row from the other until the entries in the third column divide one another. The multiples we use are exactly the quotients $q_1, q_2, \ldots, q_j$.
Thus Example 2.7 could be handled as follows:
$$\left[\begin{array}{rr|r} 1 & 0 & 630 \\ 0 & 1 & 132 \end{array}\right]
\to \left[\begin{array}{rr|r} 1 & -4 & 102 \\ 0 & 1 & 132 \end{array}\right]
\to \left[\begin{array}{rr|r} 1 & -4 & 102 \\ -1 & 5 & 30 \end{array}\right]
\to \left[\begin{array}{rr|r} 4 & -19 & 12 \\ -1 & 5 & 30 \end{array}\right]
\to \left[\begin{array}{rr|r} 4 & -19 & 12 \\ -9 & 43 & 6 \end{array}\right].$$
Every row $[x \;\; y \mid z]$ of every matrix in this computation has the property that $630x + 132y = z$, because this is satisfied by the initial matrix and is preserved by the row operations. Therefore, the required integers $s$ and $t$ appear to the left of $\gcd(a, b)$ in the final matrix.

In the worst case, the Euclidean algorithm takes on the order of $\log n$ steps to compute $\gcd(a, b)$, where $n = \max(|a|, |b|)$. The function $\log n$ grows very slowly as $n \to \infty$, so the algorithm runs very quickly on a computer.

Primes. Recall that an integer $n > 1$ is said to be prime if its only positive factors are 1 and $n$. One can generate all the primes up to $N$ using the Sieve of Eratosthenes to successively strike out all the proper multiples of 2, 3, 5, etc. If an integer not exceeding $N$ is not prime, then it has a prime divisor not exceeding $\sqrt{N}$, so one can terminate this process at $\sqrt{N}$. The integers that remain uncrossed are the primes up to $N$.

Lemma 2.8. (Euclid’s Lemma) Let $a$ and $b$ be integers, and let $p$ be a prime. If $p \mid ab$, then $p \mid a$ or $p \mid b$.

Proof. Suppose that $p$ divides $ab$ but that $p$ does not divide $a$. Since $p$ is prime, we must have $\gcd(a, p) = 1$, so by Theorem 2.4 there exist integers $s$ and $t$ such that $as + pt = 1$. Multiplying through by $b$, we obtain $abs + pbt = b$. Since $p \mid ab$ and $p \mid p$, we deduce from part (b) of Lemma 2.1 that $p \mid b$. □

Note that Lemma 2.8 fails if $p$ is not prime. For example, 6 divides $12 = 3 \cdot 4$, but 6 does not divide 3 or 4. One can easily show by induction that Lemma 2.8 can be extended to products of more than two integers. That is, if $p$ is a prime dividing the product $a_1 \cdots a_m$, then $p$ must divide at least one of the $a_i$. As a simple application of Euclid’s Lemma, we perform the following entertaining exercise.

Example 2.9. Prove that $\sqrt{2}$ is irrational.

Solution. We proceed by contradiction.
If $\sqrt{2}$ were rational, then we could write $\sqrt{2} = a/b$ for some positive integers $a$ and $b$ with $(a, b) = 1$. After squaring both sides and clearing denominators, we find that $2b^2 = a^2$, and hence in particular that $2 \mid a^2$. Since 2 is prime, it now follows from Euclid’s Lemma that $2 \mid a$, so we can write $a = 2c$ for some integer $c$. Substituting this into our previous equation yields $2b^2 = 4c^2$, or $b^2 = 2c^2$. Thus $2 \mid b^2$ and hence by Euclid’s Lemma we have $2 \mid b$. We have now deduced that both $a$ and $b$ are divisible by 2, contradicting our original assumption that $(a, b) = 1$. This contradiction forces us to conclude that $\sqrt{2}$ is in fact irrational. □

Note that there is little difficulty in generalizing the argument to handle $\sqrt{p}$, where $p$ is any prime. In fact it is not hard to see that $\sqrt{n}$ is irrational if and only if $n$ fails to be a perfect square, but this requires information about factoring composite integers.

The following result is the most important application of Euclid’s Lemma and, as its name suggests, is fundamental to our study of number theory.

Theorem 2.10. (Fundamental Theorem of Arithmetic) Every integer $n > 1$ can be written as a product of prime factors, and this factorization is unique up to the order of the factors.

Proof. The existence of factorizations follows easily by induction on the size of the integer $n$. For the base case, it suffices to note that $n = 2$ is prime. Now suppose that $n \ge 2$ and that every integer $k$ with $2 \le k \le n - 1$ has a factorization into primes. If $n$ is prime, then we are done. Otherwise, we may write $n = ab$ where $2 \le a, b \le n - 1$, and the induction hypothesis shows that $a$ and $b$ both have factorizations, which combine to produce a factorization of $n$.

To prove uniqueness, we induct on the number of factors. Suppose that $n = p_1 \cdots p_r = q_1 \cdots q_s$, where the $p_i$ and $q_i$ are primes, and we may assume without loss of generality that $r \le s$. If $r = 1$, then clearly $s = 1$, so $p_1 = q_1$.
Now let $r > 1$, and suppose that unique factorization holds for all integers with fewer than $r$ prime factors. Since $p_1 \mid q_1 \cdots q_s$, we have $p_1 \mid q_i$ (and hence $p_1 = q_i$) for some $i$ by an easy extension of Euclid’s Lemma. By relabeling, we may suppose that $i = 1$, and hence we may divide through by $p_1$ to get $p_2 \cdots p_r = q_2 \cdots q_s$. The induction hypothesis now implies that $r = s$ and that $p_2, \ldots, p_r$ is a permutation of $q_2, \ldots, q_s$, and the uniqueness follows. □

In rings where unique factorization fails, like $\mathbb{Z}[\sqrt{-5}]$, the problem is that the notions of “irreducible” and “prime” do not correspond. The property in Lemma 2.8 is used as the definition of prime, but there are irreducible elements that don’t satisfy this property. For example, 2 is irreducible in $\mathbb{Z}[\sqrt{-5}]$, but it is not prime in this ring because 2 divides $6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$, but 2 does not divide $1 + \sqrt{-5}$ or $1 - \sqrt{-5}$.

Theorem 2.11. There are infinitely many primes.

Proof. Assume to the contrary that there are only finitely many primes, say $p_1, p_2, \ldots, p_n$, and let
$$N = p_1 p_2 \cdots p_n + 1.$$
We know from Theorem 2.10 that $N$ has at least one prime factor, say $q$. We cannot have $q = p_i$ for some $i$ because this would imply that $q$ divides $1 = N - p_1 p_2 \cdots p_n$. This is a contradiction, so we conclude that there must be infinitely many primes. □

This theorem was first proved by Euclid, and we’ve given his original proof. Many other proofs have been discovered since Euclid’s time. A more general theorem of Dirichlet states that there are infinitely many primes of the form $p = qn + a$ whenever $q$ and $a$ are relatively prime. For example, there are infinitely many primes of the form $p = 4n + 1$ and also of the form $p = 4n + 3$. A weak version of the prime number theorem states that if $\pi(x)$ denotes the number of primes up to $x$, then $\pi(x) \sim x/\log x$ asymptotically, in the sense that
$$\lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1.$$
One could interpret this by saying that the probability that the integer $x$ is prime is roughly $1/\log x$. Throughout these notes $\log x$ denotes the natural (base $e$) logarithm.

Theorem 2.12. There are arbitrarily large gaps between consecutive primes.

Proof. Given an integer $n > 1$, we’ll construct a list of $n$ consecutive composite numbers. If we let $a = (n + 1)! + 2$, then the $n$ numbers
$$a, a + 1, a + 2, \ldots, a + n - 1$$
are all composite, since $k + 2$ divides $a + k = (n + 1)! + (k + 2)$ for $k = 0, 1, 2, \ldots, n - 1$. □

At the other extreme, the Twin Primes Conjecture states that there are infinitely many pairs of primes whose difference is 2, for instance
$$(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), \ldots.$$
Those familiar with analysis may wish to observe that if $p_n$ denotes the $n$th prime then Theorem 2.12 is equivalent to the statement that $\limsup (p_{n+1} - p_n) = \infty$, while the Twin Primes Conjecture asserts that $\liminf (p_{n+1} - p_n) = 2$. In spite of some recent breakthroughs in this area, at the time of writing (Fall 2008) we do not even know for sure that $\liminf (p_{n+1} - p_n) < \infty$. This indicates that we’re not very close to a proof of the Twin Primes Conjecture!

Perfect numbers and Mersenne primes. A positive integer is said to be perfect if it is the sum of its proper positive divisors (that is, not including the number itself). For example, $6 = 1 + 2 + 3$ and $28 = 1 + 2 + 4 + 7 + 14$ are perfect. The first few perfect numbers are 6, 28, 496, 8128, 33550336. It is believed that there are infinitely many perfect numbers, but this is not known. Another open problem is to determine whether there are any odd perfect numbers (it’s believed that the answer is no).

Theorem 2.13. A positive even integer $m$ is perfect if and only if we can write $m = 2^{n-1}(2^n - 1)$, where $2^n - 1$ is prime.

Proof. First suppose that $p = 2^n - 1$ is prime. We need to show that $m = 2^{n-1}p$ is perfect. The proper positive divisors of $m$ are $1, 2, 4, 8, \ldots, 2^{n-1}, p, 2p, 4p, 8p, \ldots$
, $2^{n-2}p$, so their sum is
$$2^n - 1 + p(2^{n-1} - 1) = p + (2^{n-1} - 1)p = 2^{n-1}p = m.$$
This shows that $m$ is perfect.

Conversely, suppose that $m$ is an even perfect number. We need to show that there is an integer $n$ such that $m = 2^{n-1}(2^n - 1)$ and $2^n - 1$ is prime. Since $m$ is even, we can write $m = 2^a t$, where $a \ge 1$ and $t$ is odd. Let $S$ denote the sum of all the positive divisors of $t$ (i.e., the sum of the odd positive divisors of $m$). Since $m$ is perfect, we know that the sum of all the positive divisors of $m$ is equal to $2m$, so we have
$$2m = S + 2S + 4S + 8S + \cdots + 2^a S = (2^{a+1} - 1)S,$$
and thus
$$S = \frac{2m}{2^{a+1} - 1} = \frac{2^{a+1}t}{2^{a+1} - 1} = \frac{(2^{a+1} - 1)t + t}{2^{a+1} - 1} = t + \frac{t}{2^{a+1} - 1}.$$
Since $S$ and $t$ are integers, we see that $u = t/(2^{a+1} - 1)$ is an integer, and $u < t$ since $a \ge 1$. Thus $u$ and $t$ are two distinct divisors of $t$. Since $S = t + u$ is the sum of all the positive divisors of $t$, it follows that they are the only positive divisors of $t$, whence $t$ is prime and $u = 1$. Thus we have $t = 2^{a+1} - 1$, so on setting $n = a + 1$ we get $m = 2^{n-1}t = 2^{n-1}(2^n - 1)$, where $2^n - 1$ is prime. □

Primes of the form $2^n - 1$ are called Mersenne primes. As a result of Theorem 2.13, finding even perfect numbers is equivalent to finding Mersenne primes. Notice that $6 = 2^1(2^2 - 1)$, $28 = 2^2(2^3 - 1)$, $496 = 2^4(2^5 - 1)$, $8128 = 2^6(2^7 - 1)$, and $33550336 = 2^{12}(2^{13} - 1)$. The following theorem restricts the possibilities somewhat.

Theorem 2.14. If $2^n - 1$ is prime, then $n$ is prime.

Proof. We prove the contrapositive. Suppose that $n$ is composite. Then we can write $n = ab$ for some integers $a$ and $b$ with $1 < a, b < n$. Then we have
$$2^n - 1 = 2^{ab} - 1 = (2^a)^b - 1 = (2^a - 1)(1 + 2^a + 2^{2a} + \cdots + 2^{(b-1)a}).$$
Here we have used the factorization
$$x^b - 1 = (x - 1)(1 + x + x^2 + \cdots + x^{b-1})$$
with $x = 2^a$. Since $1 < a < n$, we have $1 < 2^a - 1 < 2^n - 1$, and hence we conclude that $2^n - 1$ is composite. □

The converse of Theorem 2.14 is false. That is, there exist primes $p$ for which $2^p - 1$ is not prime. The smallest example is $2^{11} - 1 = 2047 = 23 \cdot 89$.
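Theorems 2.13 and 2.14 lend themselves to a quick numerical check for small exponents. The following Python sketch is illustrative only: the helper names `is_prime`, `is_perfect`, and `even_perfect_numbers` are hypothetical, and trial division is assumed to be adequate because the numbers involved are tiny. It is not how large Mersenne primes are actually tested.

```python
def is_prime(n):
    """Primality by trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_perfect(m):
    """True if m equals the sum of its proper positive divisors."""
    return m > 1 and sum(d for d in range(1, m) if m % d == 0) == m


def even_perfect_numbers(max_exponent):
    """Numbers 2^(n-1) * (2^n - 1) with 2^n - 1 prime and 2 <= n <= max_exponent."""
    return [2 ** (n - 1) * (2 ** n - 1)
            for n in range(2, max_exponent + 1)
            if is_prime(2 ** n - 1)]


if __name__ == "__main__":
    print(even_perfect_numbers(13))
    # The exponent 11 is prime, yet 2^11 - 1 is composite.
    print(is_prime(11), is_prime(2 ** 11 - 1), 23 * 89)
```

With `max_exponent = 13` the list produced is exactly the five perfect numbers quoted above, and the last line confirms that the prime exponent 11 gives the composite number $2047 = 23 \cdot 89$.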
There are 46 known Mersenne primes, the largest of which is $2^{43{,}112{,}609} - 1$. This was discovered in August 2008 and has 12,978,189 digits. The largest known perfect number is therefore $2^{43{,}112{,}608}(2^{43{,}112{,}609} - 1)$. This world-record prime was actually the 45th Mersenne prime to be discovered. The 46th one was found about two weeks later but has only 11,185,272 digits. To join the Great Internet Mersenne Prime Search (GIMPS), go to http://www.mersenne.org.

3. Congruences

Let $n$ be a positive integer, and let $a$ and $b$ be arbitrary integers. We say that $a$ and $b$ are congruent modulo $n$ if $n$ divides $a - b$. In this case, we write $a \equiv b \pmod{n}$. For example, we have $37 \equiv 2 \pmod{5}$, $37 \equiv -3 \pmod{5}$, and $24 \equiv 0 \pmod{6}$. Notice that $a \equiv 0 \pmod{n}$ if and only if $n \mid a$ and that $a \equiv b \pmod{n}$ if and only if we can write $a = b + kn$ for some integer $k$.

Lemma 3.1. If $a \equiv c \pmod{n}$ and $b \equiv d \pmod{n}$, then $a + b \equiv c + d \pmod{n}$ and $ab \equiv cd \pmod{n}$.

Proof. Suppose that $a \equiv c \pmod{n}$ and $b \equiv d \pmod{n}$. Then there exist integers $k$ and $l$ such that $a = c + kn$ and $b = d + ln$. We then have
$$a + b = c + d + (k + l)n \quad \text{and} \quad ab = cd + (kd + lc + kln)n,$$
which shows that $a + b \equiv c + d \pmod{n}$ and $ab \equiv cd \pmod{n}$. □

This lemma allows us to manipulate congruences algebraically as we do with equations.

Example 3.2. For what integers $x$ does the congruence $4x + 1 \equiv 3 \pmod{7}$ hold?

Solution. Subtracting 1 from both sides shows that the congruence is equivalent to $4x \equiv 2 \pmod{7}$. Multiplying both sides by 2 now gives $8x \equiv 4 \pmod{7}$, which is the same as $x \equiv 4 \pmod{7}$, since $8 \equiv 1 \pmod{7}$. Hence the congruence is satisfied by all integers $x$ of the form $x = 4 + 7k$, where $k$ is an integer. □

Lemma 3.3. (Cancellation) If $ab \equiv ac \pmod{n}$ and $(a, n) = 1$, then $b \equiv c \pmod{n}$.

Proof. Suppose that $ab \equiv ac \pmod{n}$ and $(a, n) = 1$. Then $n$ divides $ab - ac = a(b - c)$. Since $(a, n) = 1$, it follows by imitating the proof of Euclid’s Lemma that $n$ divides $b - c$ (exercise). Thus we have $b \equiv c \pmod{n}$.
□

Note that Lemma 3.3 may fail without the assumption that $(a, n) = 1$. For instance, we have $2 \cdot 5 \equiv 2 \cdot 14 \pmod{6}$, but $5 \not\equiv 14 \pmod{6}$.

Example 3.4. For what values of $x$ does the congruence $4x + 1 \equiv 5 \pmod{7}$ hold?

Solution. Here the congruence is equivalent to $4x \equiv 4 \pmod{7}$, and since $(4, 7) = 1$ we may apply Lemma 3.3 to conclude that $x \equiv 1 \pmod{7}$. Hence the congruence holds for all integers $x$ of the form $x = 1 + 7k$, where $k$ is an integer. □

Residue Classes. It is easy to see that congruence modulo $n$ defines an equivalence relation on the set of integers and therefore partitions the integers into equivalence classes. Our solutions to Examples 3.2 and 3.4 indicate how these are defined. In Example 3.4, for instance, the solution was the set of all integers congruent to 1 modulo 7, that is, all integers $x$ that can be expressed in the form $x = 1 + 7k$ for some integer $k$. We call this set the residue class of 1 modulo 7. It is sometimes denoted by $[1]$ or $[1]_7$. Thus
$$[1]_7 = \{\ldots, -20, -13, -6, 1, 8, 15, 22, \ldots\}.$$
Similarly, the solution of Example 3.2 is the set of all integers in the residue class
$$[4]_7 = \{\ldots, -17, -10, -3, 4, 11, 18, \ldots\}.$$
In general, we let $[a]$ or $[a]_n$ denote the residue class of $a$ modulo $n$, which is defined to be the set of all integers of the form $a + kn$, where $k \in \mathbb{Z}$.

It is often convenient to view each residue class as a single element in a number system. Therefore, we let $\mathbb{Z}_n$ denote the set of residue classes modulo $n$. Technically, we have
$$\mathbb{Z}_n = \{[0]_n, [1]_n, [2]_n, \ldots, [n - 1]_n\},$$
but Lemma 3.1 allows us to work with any set of representatives, such as $\{0, 1, 2, \ldots, n - 1\}$, when doing computations. Thus we often dispense with the brackets and just think of $\mathbb{Z}_n$ as the set $\{0, 1, 2, \ldots, n - 1\}$ under mod $n$ arithmetic. With this viewpoint, we could say that the congruence in Example 3.4 has the unique solution $x = 1$ in $\mathbb{Z}_7$.
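Because $\mathbb{Z}_n$ is finite, small congruences such as those in Examples 3.2 and 3.4 can be checked by simply trying every representative in $\{0, 1, \ldots, n - 1\}$. The Python sketch below is an illustration under that assumption; the function names `solve_congruence` and `residue_class` are hypothetical and do not come from these notes.

```python
def solve_congruence(a, b, c, n):
    """All x in {0, ..., n-1} with a*x + b ≡ c (mod n), by exhaustive search."""
    return [x for x in range(n) if (a * x + b - c) % n == 0]


def residue_class(a, n, k_range=range(-3, 4)):
    """A finite window of the residue class [a]_n: the integers a + k*n
    for k in k_range (the full class is infinite)."""
    return [a + k * n for k in k_range]


if __name__ == "__main__":
    print(solve_congruence(4, 1, 3, 7))   # Example 3.2
    print(solve_congruence(4, 1, 5, 7))   # Example 3.4
    print(residue_class(1, 7))
    # Cancellation can fail when (a, n) > 1: compare 2*5 and 2*14 modulo 6.
    print((2 * 5 - 2 * 14) % 6 == 0, (5 - 14) % 6 == 0)
```

Each solution returned is the representative in $\{0, \ldots, 6\}$ of the corresponding residue class, which is the viewpoint adopted when the brackets are dropped.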
Addition and multiplication in ℤ_7 can be represented by the following tables:

 + | 0 1 2 3 4 5 6
---+--------------
 0 | 0 1 2 3 4 5 6
 1 | 1 2 3 4 5 6 0
 2 | 2 3 4 5 6 0 1
 3 | 3 4 5 6 0 1 2
 4 | 4 5 6 0 1 2 3
 5 | 5 6 0 1 2 3 4
 6 | 6 0 1 2 3 4 5

 × | 0 1 2 3 4 5 6
---+--------------
 0 | 0 0 0 0 0 0 0
 1 | 0 1 2 3 4 5 6
 2 | 0 2 4 6 1 3 5
 3 | 0 3 6 2 5 1 4
 4 | 0 4 1 5 2 6 3
 5 | 0 5 3 1 6 4 2
 6 | 0 6 5 4 3 2 1

A set such as {0, 1, 2, . . . , 𝑛 − 1} that contains exactly one representative of each equivalence class is called a complete residue system modulo 𝑛. Complete residue systems are not unique; for instance {0, 1, 2, 3, 4, 5, 6} and {−3, −2, −1, 0, 1, 2, 3} are equally valid complete residue systems modulo 7, and either one could be used to represent ℤ_7.

Solving Linear Congruences. We want to develop a systematic procedure for finding the solutions of a congruence of the shape 𝑎𝑥 ≡ 𝑏 (mod 𝑛). The following lemma is an important starting point.

Lemma 3.5. (Multiplicative Inverses) If (𝑎, 𝑛) = 1, then there is an integer 𝑐 such that 𝑐𝑎 ≡ 1 (mod 𝑛). Moreover, the residue class of 𝑐 modulo 𝑛 is unique.

Proof. Since (𝑎, 𝑛) = 1, we know from Corollary 2.5 that there exist integers 𝑠 and 𝑡 with 𝑎𝑠 + 𝑛𝑡 = 1. We then have 𝑎𝑠 = 1 − 𝑛𝑡, which shows that 𝑎𝑠 ≡ 1 (mod 𝑛), so we can take 𝑐 = 𝑠. Now suppose that 𝑐′ is any other integer with 𝑐′𝑎 ≡ 1 (mod 𝑛). Then

𝑐′ ≡ 𝑐′(𝑐𝑎) ≡ (𝑐′𝑎)𝑐 ≡ 𝑐 (mod 𝑛),

and the uniqueness claim follows. □

If 𝑐𝑎 ≡ 1 (mod 𝑛), then we say that 𝑐 is the inverse of 𝑎 modulo 𝑛, and we sometimes write 𝑐 = 𝑎^{−1} or 𝑐 = 𝑎^{−1} mod 𝑛. Lemma 3.5 shows that when (𝑎, 𝑛) = 1, the congruence 𝑎𝑥 ≡ 𝑏 (mod 𝑛) has a unique solution in ℤ_𝑛, given by 𝑥 = 𝑎^{−1}𝑏. In view of Corollary 2.5, it is easy to see that Lemma 3.5 can be strengthened to an “if and only if” statement. That is, 𝑎 has a multiplicative inverse modulo 𝑛 if and only if (𝑎, 𝑛) = 1.

In order to find 𝑎^{−1} when (𝑎, 𝑛) = 1, we apply the Euclidean algorithm to find integers 𝑠 and 𝑡 with 𝑎𝑠 + 𝑛𝑡 = 1. We then have 𝑎𝑠 ≡ 1 (mod 𝑛), so 𝑠 ≡ 𝑎^{−1} (mod 𝑛).
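The procedure just described can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation; the names extended_gcd and mod_inverse are our own, and the iterative bookkeeping is one of several equivalent ways to run the extended Euclidean algorithm.

```python
def extended_gcd(a, b):
    """For non-negative a, b (not both 0), return (g, s, t) with g = gcd(a, b) = a*s + b*t."""
    s0, t0, s1, t1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        # Keep the invariant: current a equals (original a)*s0 + (original b)*t0.
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return a, s0, t0


def mod_inverse(a, n):
    """Return the inverse of a in Z_n as a residue in {0, ..., n-1}; requires (a, n) = 1."""
    g, s, _ = extended_gcd(a % n, n)
    if g != 1:
        raise ValueError("a has no inverse modulo n because (a, n) > 1")
    return s % n


inv = mod_inverse(4, 7)
print(inv)             # 2, the multiplier used in Example 3.2
print((inv * 2) % 7)   # 4, the solution of 4x ≡ 2 (mod 7)
```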
For small values of 𝑛, we can often find inverses by inspection without resorting to the Euclidean algorithm.

Example 3.6. Solve the congruence 4𝑥 ≡ 3 (mod 9).

Solution. Since (4, 9) = 1 we know that 4 has a multiplicative inverse modulo 9, and we find by inspection that 4^{−1} = 7 in ℤ_9 since 4 ⋅ 7 = 28 ≡ 1 (mod 9). Multiplying through by 7 now gives 𝑥 ≡ 21 ≡ 3 (mod 9), and hence 𝑥 = 3 is the unique solution in ℤ_9. □

Example 3.7. Solve the congruence 91𝑥 ≡ 5 (mod 64).

Solution. We can start by observing that 91 ≡ 27 (mod 64), so the congruence is equivalent to 27𝑥 ≡ 5 (mod 64). Since (27, 64) = 1, we can again find a unique solution modulo 64 by multiplying through by 27^{−1}, but finding the inverse by inspection is not quite as easy as it was in Example 3.6. Thus we apply the Euclidean algorithm:

\[
\begin{bmatrix} 1 & 0 & 64 \\ 0 & 1 & 27 \end{bmatrix}
\to
\begin{bmatrix} 1 & -2 & 10 \\ 0 & 1 & 27 \end{bmatrix}
\to
\begin{bmatrix} 1 & -2 & 10 \\ -2 & 5 & 7 \end{bmatrix}
\to
\begin{bmatrix} 3 & -7 & 3 \\ -2 & 5 & 7 \end{bmatrix}
\to
\begin{bmatrix} 3 & -7 & 3 \\ -8 & 19 & 1 \end{bmatrix}.
\]

This shows that 64 ⋅ (−8) + 27 ⋅ 19 = 1 and hence that 27 ⋅ 19 ≡ 1 (mod 64). Hence we have 27^{−1} = 19 in ℤ_64. Thus 𝑥 ≡ 5 ⋅ 19 ≡ 31 is the unique solution modulo 64. □

What, if anything, can we say about the solutions to the congruence 𝑎𝑥 ≡ 𝑏 (mod 𝑛) when (𝑎, 𝑛) > 1? The following theorem provides the answer.

Theorem 3.8. Write 𝑑 = (𝑎, 𝑛). The congruence 𝑎𝑥 ≡ 𝑏 (mod 𝑛) has a solution if and only if 𝑑 divides 𝑏. In this case, there are exactly 𝑑 solutions modulo 𝑛, spaced 𝑛/𝑑 apart.

Proof. If 𝑥 is a solution to the congruence, then we have 𝑎𝑥 = 𝑏 + 𝑘𝑛 for some integer 𝑘, and thus 𝑏 = 𝑎𝑥 − 𝑘𝑛. Since 𝑑∣𝑎 and 𝑑∣𝑛, we must have 𝑑∣𝑏 by Lemma 2.1. Therefore the congruence has no solution if 𝑑 does not divide 𝑏. Now suppose that 𝑑∣𝑏. Then since 𝑎𝑥 − 𝑏 = 𝑘𝑛 if and only if (𝑎/𝑑)𝑥 − 𝑏/𝑑 = 𝑘(𝑛/𝑑), we see that the congruence is equivalent to

(𝑎/𝑑)𝑥 ≡ 𝑏/𝑑 (mod 𝑛/𝑑).

Since (𝑎/𝑑, 𝑛/𝑑) = 1, Lemma 3.5 shows that there is a unique solution 𝑥_0 modulo 𝑛/𝑑 and hence 𝑑 distinct solutions modulo 𝑛, given by 𝑥 = 𝑥_0 + 𝑚(𝑛/𝑑) for 0 ≤ 𝑚 ≤ 𝑑 − 1. □

Example 3.9.
Describe the solutions of the congruence 6𝑥 ≡ 5 (mod 9).

Solution. We have (6, 9) = 3, which fails to divide 5, so Theorem 3.8 tells us that there is no solution. □

Example 3.10. Describe the solutions of the congruence 24𝑥 ≡ 9 (mod 33).

Solution. We have (24, 33) = 3, which divides 9, so the proof of Theorem 3.8 shows that the congruence is equivalent to 8𝑥 ≡ 3 (mod 11). Since 8^{−1} = 7 in ℤ_11, we find that 𝑥 = 10 is the unique solution modulo 11. It follows that there are exactly 3 solutions modulo 33, represented by the residue classes 𝑥 = 10, 𝑥 = 21, and 𝑥 = 32. □

Applications to check digit schemes. Congruences can be used to construct a method for reducing errors in data entry. Suppose we have a list of 9-digit identification numbers of the form 𝑥_1𝑥_2 . . . 𝑥_9 to enter into a computer. We can add a 10th digit 𝑥_10 satisfying the congruence

𝑥_10 ≡ 𝑥_1 + ⋯ + 𝑥_9 (mod 10);

that is, 𝑥_10 is the sum of the previous 9 digits modulo 10. We can now enter our ID numbers in the form 𝑥_1𝑥_2 . . . 𝑥_10 and program our computer to reject our entry if the above congruence is not satisfied. For example, the number 129-28-5468 would be entered as 129-28-5468-5. The number 𝑥_10 (in this case 5) is called a check digit. This scheme will catch any errors in which only a single digit is mistyped; for instance, the erroneous entry 126-28-5468-5 for the ID number above would be rejected. Many other errors will be caught as well, and this scheme can be applied to data strings of any length. One notable disadvantage is that it does not detect errors in which two digits are interchanged; for example, the entry 129-28-4568-5 would be accepted by our computer as a valid ID even though it may have resulted from mistyping 54 as 45.

In order to detect errors resulting from interchanging digits, one can employ a more sophisticated scheme. We illustrate by examining the International Standard Book Number (ISBN) system.
These numbers are 10 digits long and come in 4 blocks; for instance, the ISBN for Niven, Zuckerman, and Montgomery, Introduction to the Theory of Numbers, 5th edition, is 0-471-62546-9. The first digit indicates the country of publication, the second block encodes the publisher (Wiley), the third block identifies the title and edition, and the fourth block is a check digit. If the first nine digits are 𝑥_1, . . . , 𝑥_9, then the check digit 𝑥_10 is determined by the congruence

𝑥_10 ≡ ∑_{𝑖=1}^{9} 𝑖𝑥_𝑖 ≡ 𝑥_1 + 2𝑥_2 + 3𝑥_3 + ⋯ + 9𝑥_9 (mod 11).

Thus in the above case, we would compute

𝑥_10 ≡ 0 + 2 ⋅ 4 + 3 ⋅ 7 + 4 ⋅ 1 + 5 ⋅ 6 + 6 ⋅ 2 + 7 ⋅ 5 + 8 ⋅ 4 + 9 ⋅ 6 ≡ 196 ≡ 9 (mod 11).

We find 𝑥_10 by reducing the above expression modulo 11 to obtain one of the standard representatives 0, 1, 2, . . . , 9, 10. (In the event that 𝑥_10 = 10, the ISBN uses X instead.) It turns out that this scheme protects both against mistyping a single digit and against interchanging two unequal digits, as long as only one of these errors occurs in a given entry.

Theorem 3.11. If 𝐴 = 𝑥_1𝑥_2 . . . 𝑥_10 is a valid ISBN and 𝐵 = 𝑥′_1𝑥′_2 . . . 𝑥′_10 is obtained from 𝐴 by altering exactly one digit or interchanging two unequal digits, then 𝐵 is not a valid ISBN.

Proof. Note that since 10 ≡ −1 (mod 11) our check digit test for a valid ISBN is equivalent to the congruence

∑_{𝑖=1}^{10} 𝑖𝑥_𝑖 ≡ 0 (mod 11).

Suppose that 𝐵 is obtained from 𝐴 by replacing some digit 𝑥_𝑗 by 𝑥′_𝑗, where 𝑥_𝑗 ≠ 𝑥′_𝑗. Then

∑_{𝑖=1}^{10} 𝑖𝑥′_𝑖 = (∑_{𝑖=1}^{10} 𝑖𝑥_𝑖) − 𝑗𝑥_𝑗 + 𝑗𝑥′_𝑗 ≡ 𝑗(𝑥′_𝑗 − 𝑥_𝑗) ≢ 0 (mod 11)

by Euclid’s Lemma, since 11 does not divide 𝑗 or 𝑥_𝑗 − 𝑥′_𝑗. Suppose instead that 𝐵 is obtained from 𝐴 by interchanging the 𝑗th and 𝑘th digits, where 𝑗 ≠ 𝑘 and 𝑥_𝑗 ≠ 𝑥_𝑘. Then we can write 𝑥′_𝑗 = 𝑥_𝑘 and 𝑥′_𝑘 = 𝑥_𝑗, and hence

∑_{𝑖=1}^{10} 𝑖𝑥′_𝑖 ≡ (∑_{𝑖=1}^{10} 𝑖𝑥_𝑖) + 𝑗𝑥_𝑘 + 𝑘𝑥_𝑗 − 𝑗𝑥_𝑗 − 𝑘𝑥_𝑘 ≡ (𝑘 − 𝑗)(𝑥_𝑗 − 𝑥_𝑘) ≢ 0 (mod 11)

by Euclid’s Lemma, since 11 does not divide 𝑘 − 𝑗 or 𝑥_𝑗 − 𝑥_𝑘. □

Example 3.12.
The code number 5-382-14572-2 was obtained from a valid ISBN by interchanging two adjacent digits. What was the original ISBN?

Solution. Adopting the notation from the proof of Theorem 3.11, we have

∑_{𝑖=1}^{10} 𝑖𝑥′_𝑖 = 5 + 6 + 24 + 8 + 5 + 24 + 35 + 56 + 18 + 20 = 201 ≡ 3 (mod 11).

Suppose the adjacent digits 𝑥_𝑗 and 𝑥_{𝑗+1} were interchanged in the original ISBN. Then by applying the last displayed equation in the proof of Theorem 3.11 with 𝑘 = 𝑗 + 1, we see that

𝑥′_{𝑗+1} − 𝑥′_𝑗 = 𝑥_𝑗 − 𝑥_{𝑗+1} ≡ 3 (mod 11).

In the given code, we have 𝑥′_6 − 𝑥′_5 = 3, and there is no other pair of adjacent digits with this property, so these must be the ones that were interchanged. It follows that the original ISBN was 5-382-41572-2. □

In the above example, we were able to use the ISBN scheme not only to detect an error but also to correct it, assuming we were fairly confident that the error involved transposing adjacent digits. Of course, if there were more than one adjacent pair (𝑥′_𝑗, 𝑥′_{𝑗+1}) in the erroneous code with 𝑥′_{𝑗+1} − 𝑥′_𝑗 ≡ 3 (mod 11), then we’d be less successful.

Recently, the above system (known as ISBN-10) has been phased out in favor of a 13-digit code that is compatible with the UPC/EAN scheme. Here the check digit is determined by the congruence

𝑥_1 + 3𝑥_2 + 𝑥_3 + 3𝑥_4 + 𝑥_5 + ⋯ + 3𝑥_12 + 𝑥_13 ≡ 0 (mod 10),

and a 12-digit UPC is converted to this form by putting an extra 0 at the beginning. Since the arithmetic now occurs in ℤ_10, there is no need to allow X as a possible check digit. This scheme (known as ISBN-13) still detects all single-digit errors but unfortunately no longer detects all transpositions. Many recent books contain both the ISBN-10 and ISBN-13 codes.

Fermat’s Little Theorem. In many applications of congruences, it is important to be able to compute powers of an integer efficiently modulo some number 𝑛. In the case where 𝑛 is a prime, we have the following useful result.

Theorem 3.13.
(Fermat’s Little Theorem) If 𝑝 is a prime not dividing 𝑎, then 𝑎^{𝑝−1} ≡ 1 (mod 𝑝).

Proof. Suppose that 𝑝 does not divide 𝑎, and consider the product

𝑋 = 𝑎 ⋅ 2𝑎 ⋅ 3𝑎 ⋯ (𝑝 − 1)𝑎 = 𝑎^{𝑝−1}[1 ⋅ 2 ⋅ 3 ⋯ (𝑝 − 1)] = 𝑎^{𝑝−1}(𝑝 − 1)!.

Suppose that 1 ≤ 𝑖, 𝑗 ≤ 𝑝 − 1 and that 𝑖𝑎 ≡ 𝑗𝑎 (mod 𝑝). Since (𝑎, 𝑝) = 1, Lemma 3.3 implies that 𝑖 ≡ 𝑗 (mod 𝑝), and hence that 𝑖 = 𝑗. Moreover, none of the integers 𝑖𝑎 is divisible by 𝑝, by Euclid’s Lemma. Therefore, the integers 𝑎, 2𝑎, 3𝑎, . . . , (𝑝 − 1)𝑎 represent all the non-zero residue classes modulo 𝑝, and hence their product, 𝑋, must be congruent modulo 𝑝 to 1 ⋅ 2 ⋅ 3 ⋯ (𝑝 − 1) = (𝑝 − 1)!. That is, we have

𝑎^{𝑝−1}(𝑝 − 1)! ≡ (𝑝 − 1)! (mod 𝑝).

Now since all the prime factors of (𝑝 − 1)! are smaller than 𝑝, we find that 𝑝 and (𝑝 − 1)! are relatively prime, and thus Lemma 3.3 implies that 𝑎^{𝑝−1} ≡ 1 (mod 𝑝). □

We can use Fermat’s Little Theorem to compute powers modulo a prime very efficiently by applying division with remainder to the exponent. Usually we are interested in the least non-negative representative for a particular residue class; this is sometimes called the residue and denoted by the MOD symbol. For instance, the residue of 8 modulo 5 is 8 MOD 5 = 3.

Example 3.14. Compute 2^{2008} MOD 13.

Solution. Since 13 is prime and doesn’t divide 2, Theorem 3.13 implies that 2^{12} ≡ 1 (mod 13). Moreover, division with remainder yields 2008 = 12 ⋅ 167 + 4, so

2^{2008} = 2^{12⋅167+4} = (2^{12})^{167} ⋅ 2^4 ≡ 2^4 ≡ 3 (mod 13).

Thus we have 2^{2008} MOD 13 = 3. □

Fermat’s Little Theorem also yields a negative test for primality, which is often faster than trial division. If 𝑏 is a positive integer not divisible by 𝑛 and we can show that 𝑏^{𝑛−1} ≢ 1 (mod 𝑛), then we may conclude that 𝑛 is not prime. However, the converse of this is false. For example, 2^{340} ≡ 1 (mod 341), and yet 341 = 11 ⋅ 31 is not prime. So this does not give a way to prove that an integer is prime. We’ll return to this topic in the next section.

Reduced residues and Euler’s Theorem.
Recall that 𝑎 has a multiplicative inverse modulo 𝑛 if and only if (𝑎, 𝑛) = 1. When 𝑛 is prime, the residues with this property are just 1, 2, 3, . . . , 𝑛 − 1. In general, we write 𝜙(𝑛) for the number of positive integers less than or equal to 𝑛 that are relatively prime to 𝑛. This is known as Euler’s phi function. For instance, we have 𝜙(1) = 1, 𝜙(2) = 1, 𝜙(3) = 2, 𝜙(4) = 2, 𝜙(5) = 4, 𝜙(6) = 2, 𝜙(7) = 6, 𝜙(8) = 4, 𝜙(9) = 6, and 𝜙(10) = 4. Notice that 𝜙(𝑝) = 𝑝 − 1 whenever 𝑝 is prime.

The property of being relatively prime to 𝑛 depends only on the residue class of an integer, since (𝑎, 𝑛) = (𝑎 + 𝑘𝑛, 𝑛) for any integer 𝑘 by Lemma 2.6. Therefore, we can view 𝜙(𝑛) as the number of residue classes modulo 𝑛 that are relatively prime to 𝑛. Any set of representatives for these classes is called a reduced residue system modulo 𝑛. For instance, {1, 2, 3, 4} is a reduced residue system modulo 5, while {1, 3, 7, 9} and {−3, −1, 1, 3} are reduced residue systems modulo 10. We often use ℤ_𝑛^∗ to denote a reduced residue system modulo 𝑛. Those familiar with abstract algebra may wish to note that ℤ_𝑛^∗ forms a group under multiplication.

The following result generalizes Fermat’s Little Theorem to the case of composite moduli.

Theorem 3.15. (Euler’s Theorem) If 𝑎 and 𝑛 are positive integers with (𝑎, 𝑛) = 1, then 𝑎^{𝜙(𝑛)} ≡ 1 (mod 𝑛).

Proof. Let 𝑏_1, . . . , 𝑏_{𝜙(𝑛)} denote the positive integers less than or equal to 𝑛 that are relatively prime to 𝑛, and let 𝑟_𝑖 = 𝑎𝑏_𝑖 MOD 𝑛 be the residue of 𝑎𝑏_𝑖 modulo 𝑛. Suppose that 1 ≤ 𝑖, 𝑗 ≤ 𝜙(𝑛) and 𝑟_𝑖 = 𝑟_𝑗. Then 𝑎𝑏_𝑖 ≡ 𝑎𝑏_𝑗 (mod 𝑛), which implies that 𝑏_𝑖 ≡ 𝑏_𝑗 (mod 𝑛) since (𝑎, 𝑛) = 1. Since 𝑏_1, . . . , 𝑏_{𝜙(𝑛)} are distinct integers between 1 and 𝑛, we must have 𝑖 = 𝑗. This shows that 𝑟_1, . . . , 𝑟_{𝜙(𝑛)} are distinct. Moreover, it is clear that (𝑟_𝑖, 𝑛) = 1 for each 𝑖, so {𝑟_1, . . . , 𝑟_{𝜙(𝑛)}} is a reduced residue system modulo 𝑛. In particular, we have

𝑏_1 ⋯ 𝑏_{𝜙(𝑛)} ≡ 𝑟_1 ⋯ 𝑟_{𝜙(𝑛)} ≡ 𝑎𝑏_1 ⋯ 𝑎𝑏_{𝜙(𝑛)} ≡ 𝑎^{𝜙(𝑛)} 𝑏_1 ⋯ 𝑏_{𝜙(𝑛)} (mod 𝑛).
Since 𝑏_1 ⋯ 𝑏_{𝜙(𝑛)} is relatively prime to 𝑛, we conclude that 𝑎^{𝜙(𝑛)} ≡ 1 (mod 𝑛), as desired. □

Example 3.16. Compute 5^{999} MOD 12.

Solution. We have 𝜙(12) = 4 and (5, 12) = 1, so Theorem 3.15 implies that 5^4 ≡ 1 (mod 12). Since 999 = 4 ⋅ 249 + 3, we have

5^{999} = 5^{4⋅249+3} = (5^4)^{249} ⋅ 5^3 ≡ 5^3 ≡ 5 (mod 12).

Thus we have 5^{999} MOD 12 = 5. □

It turns out that 𝜙(𝑛) can be computed easily provided that the prime factorization of 𝑛 is known. This follows from the following important theorem about simultaneous congruences. We say that integers 𝑚_1, . . . , 𝑚_𝑟 are pairwise relatively prime if (𝑚_𝑖, 𝑚_𝑗) = 1 whenever 𝑖 ≠ 𝑗.

Theorem 3.17. (Chinese Remainder Theorem) Let 𝑚_1, . . . , 𝑚_𝑟 be pairwise relatively prime positive integers, and let 𝑏_1, . . . , 𝑏_𝑟 be any integers. There exists an integer 𝑥 satisfying the system of congruences

𝑥 ≡ 𝑏_1 (mod 𝑚_1), 𝑥 ≡ 𝑏_2 (mod 𝑚_2), . . . , 𝑥 ≡ 𝑏_𝑟 (mod 𝑚_𝑟),

and 𝑥 is unique modulo 𝑚_1 ⋯ 𝑚_𝑟.

Proof. Let 𝑀 = 𝑚_1 ⋯ 𝑚_𝑟, and for each 𝑖 write 𝑀_𝑖 = 𝑀/𝑚_𝑖. Since the 𝑚_𝑖 are pairwise relatively prime, we have (𝑀_𝑖, 𝑚_𝑖) = 1, and thus Theorem 3.8 shows that there is a unique integer 𝑠_𝑖 modulo 𝑚_𝑖 satisfying the congruence 𝑀_𝑖𝑠_𝑖 ≡ 𝑏_𝑖 (mod 𝑚_𝑖). It is easy to check that the integer

𝑥 = 𝑀_1𝑠_1 + 𝑀_2𝑠_2 + ⋯ + 𝑀_𝑟𝑠_𝑟

satisfies our system of congruences, since 𝑚_𝑖 divides 𝑀_𝑗 whenever 𝑗 ≠ 𝑖. If 𝑥′ is another solution to the system, then we have 𝑥 ≡ 𝑥′ (mod 𝑚_𝑖) for each 𝑖, and hence 𝑥 − 𝑥′ is divisible by 𝑚_𝑖. Since the 𝑚_𝑖 are pairwise relatively prime, it follows easily that 𝑥 − 𝑥′ is divisible by 𝑀, which establishes uniqueness modulo 𝑀. □

Example 3.18. Solve the system of congruences

𝑥 ≡ 1 (mod 5), 2𝑥 ≡ 4 (mod 6), 3𝑥 ≡ 2 (mod 7).

Solution. We first rewrite the system in a form to which Theorem 3.17 applies. In view of Theorem 3.8, we see that the system is equivalent to

𝑥 ≡ 1 (mod 5), 𝑥 ≡ 2 (mod 3), 𝑥 ≡ 3 (mod 7),

and we may now employ the proof of Theorem 3.17 with 𝑚_1 = 5, 𝑚_2 = 3, and 𝑚_3 = 7 to produce a unique solution modulo 𝑀 = 105.
We must find integers 𝑠_1, 𝑠_2, and 𝑠_3 satisfying the congruences

21𝑠_1 ≡ 1 (mod 5), 35𝑠_2 ≡ 2 (mod 3), 15𝑠_3 ≡ 3 (mod 7).

We see easily by inspection that 𝑠_1 = 1, 𝑠_2 = 1, and 𝑠_3 = 3 are solutions, and thus

𝑥 = 21 ⋅ 1 + 35 ⋅ 1 + 15 ⋅ 3 = 101

is the unique solution of the original system modulo 105. Hence the solutions are precisely the integers of the form 𝑥 = 101 + 105𝑘, where 𝑘 ∈ ℤ. □

The Chinese Remainder Theorem also allows us to deal with systems of congruences in which the moduli are not pairwise relatively prime. The technique is to convert the system to an equivalent one in which all the moduli are distinct prime powers.

Example 3.19. Find all solutions of the system

𝑥 ≡ 1 (mod 36) and 𝑥 ≡ 5 (mod 56).

Solution. By the Chinese Remainder Theorem, the first congruence is equivalent to the pair

𝑥 ≡ 1 (mod 4) and 𝑥 ≡ 1 (mod 9),

and the second congruence is equivalent to the pair

𝑥 ≡ 5 (mod 8) and 𝑥 ≡ 5 (mod 7).

The congruences modulo powers of 2 must contain either redundant or contradictory information, so we examine these more carefully. If 𝑥 ≡ 5 (mod 8), then we can write

𝑥 = 8𝑘 + 5 = 4(2𝑘 + 1) + 1

for some 𝑘 ∈ ℤ, and it follows that 𝑥 ≡ 1 (mod 4). Since 𝑥 ≡ 5 (mod 8) implies 𝑥 ≡ 1 (mod 4), the latter congruence is redundant and may be eliminated from consideration. We have therefore reduced to the system

𝑥 ≡ 5 (mod 8), 𝑥 ≡ 1 (mod 9), 𝑥 ≡ 5 (mod 7),

and here the moduli are pairwise relatively prime, so Theorem 3.17 applies. We know that the unique solution modulo 𝑀 = 504 is given by 𝑥 = 63𝑠_1 + 56𝑠_2 + 72𝑠_3, where 𝑠_1, 𝑠_2, and 𝑠_3 are integers satisfying

63𝑠_1 ≡ 5 (mod 8), 56𝑠_2 ≡ 1 (mod 9), 72𝑠_3 ≡ 5 (mod 7),

or equivalently,

7𝑠_1 ≡ 5 (mod 8), 2𝑠_2 ≡ 1 (mod 9), 2𝑠_3 ≡ 5 (mod 7).

We see that 𝑠_1 = 3, 𝑠_2 = 5, and 𝑠_3 = 6 satisfy these congruences, and thus

𝑥 = 63 ⋅ 3 + 56 ⋅ 5 + 72 ⋅ 6 = 901 ≡ 397 (mod 504)

is the unique solution modulo 504. □

Example 3.20. Find all solutions of the system

𝑥 ≡ 1 (mod 36) and 𝑥 ≡ 3 (mod 56).

Solution.
As in the previous example, the Chinese Remainder Theorem implies that the system is equivalent to

𝑥 ≡ 1 (mod 4), 𝑥 ≡ 3 (mod 8), 𝑥 ≡ 1 (mod 9), 𝑥 ≡ 3 (mod 7).

But if 𝑥 ≡ 3 (mod 8), then we have 𝑥 = 8𝑘 + 3 = 4(2𝑘) + 3 for some integer 𝑘, which shows that 𝑥 ≡ 3 (mod 4). Hence the first two congruences are inconsistent, and we conclude that the system has no solution. □

One way of viewing the Chinese Remainder Theorem is that it gives a bijection between the integers 𝑥 with 0 ≤ 𝑥 < 𝑀 and the integral 𝑟-tuples (𝑏_1, . . . , 𝑏_𝑟) with 0 ≤ 𝑏_𝑖 < 𝑚_𝑖. The correspondence is given by

𝑥 ↦ (𝑥 MOD 𝑚_1, . . . , 𝑥 MOD 𝑚_𝑟).

The CRT is what allows us to recover 𝑥 uniquely modulo 𝑀 from the numbers 𝑏_𝑖 = 𝑥 MOD 𝑚_𝑖. In fact, this yields a bijection between the 𝜙(𝑀) reduced residue classes modulo 𝑀 and the 𝜙(𝑚_1) ⋯ 𝜙(𝑚_𝑟) 𝑟-tuples of reduced residue classes modulo 𝑚_1, . . . , 𝑚_𝑟. This observation allows us to prove the following important multiplicative property of Euler’s phi function.

Theorem 3.21. If (𝑚, 𝑛) = 1, then 𝜙(𝑚𝑛) = 𝜙(𝑚)𝜙(𝑛).

Proof. By the Chinese Remainder Theorem, there is a one-to-one correspondence,

𝑥 ↦ (𝑥 MOD 𝑚, 𝑥 MOD 𝑛),

between the integers 𝑥 with 0 ≤ 𝑥 < 𝑚𝑛 and the pairs (𝑎, 𝑏) with 0 ≤ 𝑎 < 𝑚 and 0 ≤ 𝑏 < 𝑛. Now suppose that 𝑥 is one of the 𝜙(𝑚𝑛) integers with (𝑥, 𝑚𝑛) = 1. Then one clearly has (𝑥, 𝑚) = (𝑥, 𝑛) = 1, so Lemma 2.6 implies that (𝑥 MOD 𝑚, 𝑥 MOD 𝑛) is one of the 𝜙(𝑚)𝜙(𝑛) pairs (𝑎, 𝑏) with (𝑎, 𝑚) = (𝑏, 𝑛) = 1. On the other hand, if 𝑥 ≡ 𝑎 (mod 𝑚) and 𝑥 ≡ 𝑏 (mod 𝑛), where (𝑎, 𝑚) = (𝑏, 𝑛) = 1, then Lemma 2.6 shows that (𝑥, 𝑚) = (𝑥, 𝑛) = 1 and hence that (𝑥, 𝑚𝑛) = 1. It follows that the CRT bijection specializes to a bijection among reduced residue classes. □

To help visualize the correspondence used in the proof of Theorem 3.21, we illustrate it explicitly for the case 𝑚 = 8, 𝑛 = 9. In row 𝑖, column 𝑗 we write the unique integer 𝑥 with 0 ≤ 𝑥 < 72 that satisfies 𝑥 ≡ 𝑖 (mod 8) and 𝑥 ≡ 𝑗 (mod 9).
The reduced residues modulo 8, 9, and 72 are indicated by stars, and we see that 𝜙(72) = 24 = 4 ⋅ 6 = 𝜙(8)𝜙(9).

 𝑖 \ 𝑗 |   0   1∗   2∗    3   4∗   5∗    6   7∗   8∗
-------+---------------------------------------------
   0   |   0   64   56   48   40   32   24   16    8
   1∗  |   9   1∗  65∗   57  49∗  41∗   33  25∗  17∗
   2   |  18   10    2   66   58   50   42   34   26
   3∗  |  27  19∗  11∗    3  67∗  59∗   51  43∗  35∗
   4   |  36   28   20   12    4   68   60   52   44
   5∗  |  45  37∗  29∗   21  13∗   5∗   69  61∗  53∗
   6   |  54   46   38   30   22   14    6   70   62
   7∗  |  63  55∗  47∗   39  31∗  23∗   15   7∗  71∗

Corollary 3.22. If 𝑛 = 𝑝_1^{𝛼_1} ⋯ 𝑝_𝑘^{𝛼_𝑘}, where 𝑝_1, . . . , 𝑝_𝑘 are distinct primes, then

𝜙(𝑛) = (𝑝_1^{𝛼_1} − 𝑝_1^{𝛼_1−1}) ⋯ (𝑝_𝑘^{𝛼_𝑘} − 𝑝_𝑘^{𝛼_𝑘−1}) = 𝑛(1 − 1/𝑝_1) ⋯ (1 − 1/𝑝_𝑘).

Proof. Applying Theorem 3.21 repeatedly gives 𝜙(𝑛) = 𝜙(𝑝_1^{𝛼_1}) ⋯ 𝜙(𝑝_𝑘^{𝛼_𝑘}). Now if 𝑝 is prime, then the only positive integers less than or equal to 𝑝^𝑡 that are not relatively prime to 𝑝^𝑡 are the multiples of 𝑝, namely 𝑝, 2𝑝, 3𝑝, . . . , 𝑝^{𝑡−1}𝑝. Since there are 𝑝^{𝑡−1} such multiples, we have 𝜙(𝑝^𝑡) = 𝑝^𝑡 − 𝑝^{𝑡−1} = 𝑝^𝑡(1 − 1/𝑝), and the result follows. □

Example 3.23. Compute 𝜙(21000).

Solution. We have 21000 = 2^3 ⋅ 3 ⋅ 5^3 ⋅ 7, so Corollary 3.22 gives

𝜙(21000) = (2^3 − 2^2)(3 − 1)(5^3 − 5^2)(7 − 1) = 4 ⋅ 2 ⋅ 100 ⋅ 6 = 4800. □

Example 3.24. Find the last two digits of 3^{2008}.

Solution. The last two digits are determined by the residue class modulo 100. Since 100 = 2^2 ⋅ 5^2, we have 𝜙(100) = (4 − 2)(25 − 5) = 40 by Corollary 3.22. Moreover, one has 2008 = 40 ⋅ 50 + 8, so Euler’s Theorem gives

3^{2008} = (3^{40})^{50} ⋅ 3^8 ≡ 3^8 ≡ 61 (mod 100).

Therefore the last two digits are 61. □

4. Public-key cryptography

We can use Euler’s Theorem to devise a scheme for public-key encryption. In such a system, each individual creates and publishes some unique data (known as a public key) that allows them to receive encrypted messages from other users. The system we’ll describe was developed at MIT in 1977 by Rivest, Shamir, and Adleman and is commonly known as RSA.

To construct the code, we choose two large primes 𝑝 and 𝑞, say around 200 digits each.
We then compute 𝑛 = 𝑝𝑞 and use Corollary 3.22 to calculate 𝜙(𝑛) = (𝑝 − 1)(𝑞 − 1). Next we choose an integer 𝑒 > 1 that is relatively prime to 𝜙(𝑛) and use the Euclidean algorithm to find 𝑑 = 𝑒^{−1} MOD 𝜙(𝑛). We make the pair (𝑛, 𝑒) publicly available but keep 𝑑 secret. Obviously, we keep 𝑝 and 𝑞 secret as well, since knowing them would enable one to find 𝜙(𝑛), and hence 𝑑. The security of the system rests on the fact that it is essentially impossible to factor 𝑛 in a reasonable amount of time with current technology.

To encrypt a message to a user whose public key is (𝑛, 𝑒), we first create a digital version of the message, say 𝑀, using some character-to-integer scheme such as ASCII. For simplicity, we use the conversions

A = 01, B = 02, C = 03, D = 04, . . . , Y = 25, Z = 26,

so that each letter of the alphabet corresponds to a 2-digit integer, and we use 27 to represent a space. If desired, we could introduce additional integers to stand for punctuation marks and other symbols. If 𝑀 ≥ 𝑛, we break the message into blocks so that each block is smaller than 𝑛. We then encrypt the message by computing

𝐸 = 𝑀^𝑒 MOD 𝑛.

The recipient then decrypts the message by computing 𝐸^𝑑 MOD 𝑛, using Euler’s Theorem and the fact that 𝑑𝑒 = 1 + 𝑘𝜙(𝑛) for some integer 𝑘. One has

𝐸^𝑑 ≡ (𝑀^𝑒)^𝑑 ≡ 𝑀^{𝑑𝑒} ≡ 𝑀^{1+𝑘𝜙(𝑛)} ≡ (𝑀^{𝜙(𝑛)})^𝑘 ⋅ 𝑀 ≡ 𝑀 (mod 𝑛),

and thus 𝐸^𝑑 MOD 𝑛 = 𝑀. In applying Euler’s Theorem, we implicitly assumed that (𝑀, 𝑛) = 1, but the probability that this fails is negligible when 𝑛 is composed of two 200-digit primes. The point of RSA is that, given 𝑛 and 𝑒, one cannot compute the decryption key 𝑑 without knowing 𝜙(𝑛), which is equivalent to knowing the factorization 𝑛 = 𝑝𝑞.

Example 4.1. Decode the encrypted message 0828, which was generated using RSA with public key (4897, 19).

Solution. Here the integer 𝑛 = 4897 is far too small to create a secure cryptosystem, and after some trial division we easily obtain the factorization 𝑛 = 𝑝𝑞, where 𝑝 = 59 and 𝑞 = 83.
It now follows from Corollary 3.22 that 𝜙(𝑛) = 58 ⋅ 82 = 4756. Next we calculate the decryption key 𝑑 = 19^{−1} MOD 4756 using the Euclidean Algorithm:

\[
\begin{bmatrix} 1 & 0 & 4756 \\ 0 & 1 & 19 \end{bmatrix}
\to
\begin{bmatrix} 1 & -250 & 6 \\ 0 & 1 & 19 \end{bmatrix}
\to
\begin{bmatrix} 1 & -250 & 6 \\ -3 & 751 & 1 \end{bmatrix}.
\]

Hence we have 𝑑 = 751, so we can decrypt the message by computing 𝑀 = 828^{751} MOD 4897. This computation is doable on a calculator by successive squaring. We write the exponent 751 in binary as

1011101111_2 = 512 + 128 + 64 + 32 + 8 + 4 + 2 + 1

and square 𝑎 = 828 repeatedly modulo 4897. This gives 𝑎^2 ≡ 4, 𝑎^4 ≡ 16, 𝑎^8 ≡ 256, 𝑎^{16} ≡ 1875, 𝑎^{32} ≡ −421, 𝑎^{64} ≡ 949, 𝑎^{128} ≡ −447, 𝑎^{256} ≡ −968, and 𝑎^{512} ≡ 1697, and it follows that

𝑀 ≡ 𝑎^{512} 𝑎^{128} 𝑎^{64} 𝑎^{32} 𝑎^8 𝑎^4 𝑎^2 𝑎 ≡ 2515 (mod 4897),

so the message was “YO”. □

The method of successive squaring used in the above example gives a good way of performing fast modular exponentiation, which has been implemented in many software packages. For instance, Mathematica has the function PowerMod[a,b,n], which quickly computes 𝑎^𝑏 MOD 𝑛. A useful tool for finding modular inverses is the function ExtendedGCD[a,b], which returns gcd(𝑎, 𝑏), together with integers 𝑠 and 𝑡 satisfying gcd(𝑎, 𝑏) = 𝑎𝑠 + 𝑏𝑡. Mathematica has built-in arbitrary precision, so it is great for handling long integers without the fear of truncation. Programs that only store, say, the first 16 digits of a number give more than sufficient accuracy for many applications, but losing even a single digit of an integer is obviously devastating for number theory.

Digital Signatures. We can also apply the RSA encryption principle to authenticate digital signatures. If your public key is (𝑛, 𝑒) and you send me your signature 𝑆 in the form 𝐷 = 𝑆^𝑑 MOD 𝑛, where 𝑑 is your personal decryption key, then I can recover 𝑆 by computing

𝐷^𝑒 MOD 𝑛 = 𝑆^{𝑑𝑒} MOD 𝑛 = 𝑆^{1+𝑘𝜙(𝑛)} MOD 𝑛 = 𝑆.

Moreover, I know that the signature is authentic since you’re the only one who knows 𝑑.
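For readers without Mathematica, the decryption in Example 4.1 and the signature identity above can be replayed in Python, whose integers also have arbitrary precision. The sketch below is illustrative only: power_mod is our own name for a successive-squaring routine, and the toy key is the one from Example 4.1, reused here to sign the decoded message purely as a demonstration.

```python
def power_mod(a, b, n):
    """Compute a**b MOD n by successive squaring, reading the bits of b from the right."""
    result = 1 % n
    square = a % n              # holds a^(2^i) MOD n on the i-th pass
    while b > 0:
        if b & 1:               # 2^i occurs in the binary expansion of b
            result = (result * square) % n
        square = (square * square) % n
        b >>= 1
    return result


# Toy key from Example 4.1 (far too small to be secure).
n, e = 4897, 19
p, q = 59, 83
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)             # modular inverse; needs Python 3.8 or later

message = power_mod(828, d, n)            # decrypt the ciphertext 0828
signature = power_mod(message, d, n)      # sign with the private exponent d
recovered = power_mod(signature, e, n)    # verify with the public exponent e

print(d, message, recovered == message)   # 751 2515 True
```

Python’s built-in pow(a, b, n) performs the same computation as power_mod and would normally be preferred in practice.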
If the signature had been encrypted using an incorrect 𝑑 then I would most likely obtain gibberish when attempting to recover 𝑆.

Example 4.2. You receive the digital signature 20496 from a user with initials S. P. and public key (21311, 41). Does it appear to be authentic?

Solution. We compute 20496^{41} MOD 21311 using Mathematica and get

PowerMod[20496, 41, 21311] = 1916.

Since S = 19 and P = 16, the message must have come from S. P., or at least someone with access to his decryption key. Of course, the numbers are once again so small that anyone could have figured out the decryption key and sent a phony signature. □

The digital signature process described above presumes that the signer is not concerned about the possibility of his or her signature being viewed by a third party. The goal is simply to provide a method for the recipient to verify the signer’s identity. One can transmit sensitive information and verify the identity of the sender by nesting an encryption within the digital signature process.

Suppose that Alice, whose public key is (𝑛_𝐴, 𝑒_𝐴), wants to send a message 𝑀 to Bob, whose public key is (𝑛_𝐵, 𝑒_𝐵), and Bob wants to be sure that the message is really coming from Alice. Alice first “signs” the message using her own decryption key, 𝑑_𝐴, and then encrypts it using Bob’s public key. Thus she computes 𝑆 = 𝑀^{𝑑_𝐴} MOD 𝑛_𝐴 and sends Bob 𝐸 = 𝑆^{𝑒_𝐵} MOD 𝑛_𝐵. When Bob receives 𝐸, he uses his decryption key 𝑑_𝐵 to compute 𝑆 = 𝐸^{𝑑_𝐵} MOD 𝑛_𝐵 and then recovers the message as 𝑀 = 𝑆^{𝑒_𝐴} MOD 𝑛_𝐴. At this point, he knows the contents of the message and can be sure that it was sent by Alice and not someone pretending to be Alice.

Primality testing. One issue in implementing RSA is that we need to find large integers that are known to be prime. Fortunately, there are alternatives to trial division for investigating primality. As mentioned in §3, Fermat’s Little Theorem can be used to show that an integer is not prime.
For example, if 𝑝 is an odd prime, then the theorem tells us that 2^{𝑝−1} ≡ 1 (mod 𝑝). The converse of this statement is false; that is, there exist odd composite integers 𝑛 with the property that 2^{𝑛−1} ≡ 1 (mod 𝑛). However, it turns out that such integers are fairly rare, so there is a good chance that an integer 𝑛 satisfying this congruence will in fact be prime. An odd composite integer 𝑛 satisfying this congruence is called a pseudoprime. The only pseudoprimes less than 1000 are 341 = 11 ⋅ 31, 561 = 3 ⋅ 11 ⋅ 17, and 645 = 3 ⋅ 5 ⋅ 43.

More generally, if 𝑛 is an odd composite integer with (𝑏, 𝑛) = 1 and 𝑏^{𝑛−1} ≡ 1 (mod 𝑛), then we say that 𝑛 is a pseudoprime for the base 𝑏. If we want to know whether 𝑛 is prime, we could first test for divisibility by 2, 3, 5, and 7 and then compute 2^{𝑛−1} MOD 𝑛, 3^{𝑛−1} MOD 𝑛, 5^{𝑛−1} MOD 𝑛, and 7^{𝑛−1} MOD 𝑛, for instance. If any of these is not equal to 1, then Fermat’s Little Theorem implies that 𝑛 is not prime. If they are all equal to 1, then 𝑛 is very likely (but not certain) to be prime.

Interestingly, and perhaps unfortunately, there are odd composite integers 𝑛 that are pseudoprimes for every base 𝑏 with (𝑏, 𝑛) = 1. Such numbers are called Carmichael numbers, and the smallest one is 561. Carmichael numbers are very sparse (561 is the only one less than 1000), but it was proved in 1994 by Alford, Granville, and Pomerance that there are infinitely many! In fact, they showed that there are at least 𝑥^{2/7} Carmichael numbers not exceeding 𝑥 for all sufficiently large 𝑥.

The concept of pseudoprimes can be strengthened by making the following simple observation. Suppose that 𝑏^{𝑛−1} ≡ 1 (mod 𝑛). If 𝑛 is odd, we can write 𝑛 = 2𝑚 + 1 for some integer 𝑚, and we see that 𝑛 divides 𝑏^{2𝑚} − 1 = (𝑏^𝑚 − 1)(𝑏^𝑚 + 1). If 𝑛 is prime, it now follows from Euclid’s Lemma that 𝑛 divides 𝑏^𝑚 − 1 or 𝑏^𝑚 + 1. Thus if 𝑏^𝑚 ≢ ±1 (mod 𝑛) then we can conclude that 𝑛 is not prime.
On the other hand, if 𝑏^𝑚 ≡ 1 (mod 𝑛) and 𝑚 is even, then we can apply the same reasoning within the factorization 𝑏^𝑚 − 1 = (𝑏^{𝑚/2} − 1)(𝑏^{𝑚/2} + 1).

Example 4.3. Show how to deduce that 341 is not prime without using its prime factorization.

Solution. One easily computes that 2^{340} ≡ 1 (mod 341), so 341 divides

2^{340} − 1 = (2^{170} − 1)(2^{170} + 1) = (2^{85} − 1)(2^{85} + 1)(2^{170} + 1).

We further compute that 2^{170} ≡ 1 (mod 341) and 2^{85} ≡ 32 (mod 341). But if 341 were prime then it would have to divide 2^{85} − 1, 2^{85} + 1, or 2^{170} + 1 by Euclid’s Lemma, which would mean that 2^{85} ≡ ±1 (mod 341) or 2^{170} ≡ −1 (mod 341). Since none of these conclusions holds, we may conclude that 341 is not prime. □

In general, if 𝑛 is an odd integer exceeding 1, we can write 𝑛 = 2^𝑎𝑡 + 1, where 𝑡 is odd and 𝑎 ≥ 1. Then one has the factorization

𝑏^{𝑛−1} − 1 = (𝑏^𝑡 − 1)(𝑏^𝑡 + 1)(𝑏^{2𝑡} + 1)(𝑏^{4𝑡} + 1) ⋯ (𝑏^{2^{𝑎−1}𝑡} + 1).    (4.1)

In Example 4.3 we had 𝑎 = 2 and 𝑡 = 85. An odd composite integer 𝑛 = 2^𝑎𝑡 + 1 is called a strong pseudoprime for the base 𝑏 if (𝑏, 𝑛) = 1 and 𝑛 divides one of the factors on the right-hand side of (4.1). Any integer (prime or composite) with this property is said to have passed the strong pseudoprime test. Strong pseudoprimes are considerably more scarce than ordinary pseudoprimes. For the base 𝑏 = 2, for example, there are 5597 pseudoprimes up to 10^9 but only 1282 strong pseudoprimes, the smallest of which is 2047.

Example 4.4. Show that 2047 is a strong pseudoprime for the base 2.

Solution. We observe that 2046 is not divisible by 4, so we have 𝑎 = 1 and 𝑡 = 1023 in the above notation. Moreover, Mathematica shows that 2^{1023} ≡ 1 (mod 2047), so 2047 divides 2^{1023} − 1 and hence passes the strong pseudoprime test. Finally, we note that 2047 = 23 ⋅ 89 is composite and is therefore in fact a strong pseudoprime. □

The results are more striking if we apply the strong pseudoprime test for several different bases.
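To make the test concrete, here is a minimal Python sketch of the strong pseudoprime test based on the factorization (4.1), together with a helper that runs it for several bases. The function names passes_strong_test and passes_for_bases are our own, and the code assumes that 𝑛 is odd, 𝑛 ≥ 3, and (𝑏, 𝑛) = 1.

```python
def passes_strong_test(n, b):
    """Return True if odd n >= 3 divides one of the factors on the right of (4.1)."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer with n >= 3")
    # Write n - 1 = 2^a * t with t odd.
    a, t = 0, n - 1
    while t % 2 == 0:
        a += 1
        t //= 2
    x = pow(b, t, n)
    if x == 1 or x == n - 1:
        return True             # n divides b^t - 1 or b^t + 1
    for _ in range(a - 1):
        x = (x * x) % n         # x runs through b^(2t), b^(4t), ..., b^(2^(a-1) t)
        if x == n - 1:
            return True         # n divides b^(2^i t) + 1
    return False


def passes_for_bases(n, bases=(2, 3, 5, 7)):
    """Apply the strong pseudoprime test for each base in turn."""
    return all(passes_strong_test(n, b) for b in bases)


print(passes_strong_test(341, 2))    # False, as in Example 4.3
print(passes_strong_test(2047, 2))   # True, as in Example 4.4
print(passes_for_bases(2047))        # False: 2047 fails for the base 3
```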
There is only one integer less than 2.5 × 10^{10}, namely

3,215,031,751 = 151 × 751 × 28351,

that is a strong pseudoprime for bases 2, 3, 5, and 7. Moreover, there is no “strong” analogue of the Carmichael numbers. That is, every composite number 𝑛 fails the strong pseudoprime test for some base 𝑏 with (𝑏, 𝑛) = 1. Such a 𝑏 is called a witness to the compositeness of 𝑛. In fact, it can be shown that at least half of the bases 𝑏 ≤ 𝑛 with (𝑏, 𝑛) = 1 are witnesses when 𝑛 is composite, so this procedure can be used to identify primes with near certainty.

In 2004, Agrawal, Kayal, and Saxena developed an algorithm that proves the primality or compositeness of 𝑛 with a running time that is polynomial in log 𝑛. For comparison, trial division requires 𝑂(√𝑛) steps to prove that an integer is prime.

Many software packages have built-in functions that implement various primality tests. In Mathematica, for example, PrimeQ[n] returns true or false according to whether or not 𝑛 is prime, while Prime[k] returns the 𝑘th prime number.

Factorization algorithms. Attacks on RSA could be made if efficient factoring algorithms were known. As with primality testing, there are algorithms that are far more efficient than trial division, but no current algorithm comes close to breaking RSA with 200-digit primes, except in very special cases that can easily be avoided. We briefly explore some of the ideas involved in these factorization techniques.

Fermat’s factoring method was to try to express 𝑛 as the difference of two squares. If we can find positive integers 𝑥 and 𝑦 such that

𝑛 = 𝑥^2 − 𝑦^2 = (𝑥 − 𝑦)(𝑥 + 𝑦),

then we’ve found a factorization of 𝑛, provided that 𝑥 − 𝑦 ≠ 1. Kraitchik realized that one could apply the spirit of Fermat’s idea more efficiently by instead looking for integers 𝑥 and 𝑦 satisfying the weaker condition 𝑥^2 ≡ 𝑦^2 (mod 𝑛), so that 𝑛 divides (𝑥 − 𝑦)(𝑥 + 𝑦).
This no longer ensures a factorization of $n$, but there is a reasonable chance that both $x - y$ and $x + y$ contain some of the prime factors of $n$. For example, if $n$ is the product of two distinct primes $p$ and $q$, one would expect roughly a 50% chance that $p$ and $q$ split among the two factors $x - y$ and $x + y$. In this case $\gcd(x - y, n)$ will be a non-trivial factor of $n$, and this can be computed efficiently via the Euclidean algorithm. If both $p$ and $q$ divide the same factor, then one can simply try different values for $x$ and $y$. Powerful recent factoring methods like the quadratic sieve are based on finding suitable integers $x$ and $y$ to carry out this principle.

Pollard’s so-called “rho” method is based on generating a quasi-random sequence of numbers that are distinct modulo the integer $n$ to be factored but not distinct modulo its smallest prime divisor $p$. Suppose we generate “random” integers $x_1, \dots, x_k$, where $k$ is large by comparison with $\sqrt{p}$ but small by comparison with $n$. For example, we could take $k \approx 10 n^{1/4}$. Then the probability that the $x_i$ are distinct modulo $p$ is very small, so $\gcd(x_i - x_j, n)$ will most likely produce the factor $p$ for some $i$ and $j$. This leads to a factorization of $n$ with expected running time $O(n^{1/4})$. When the method works, the numbers $x_i \ \mathrm{MOD}\ p$ are eventually periodic and can thus be written in a shape resembling the Greek letter $\rho$.

Example 4.5. Use Pollard’s rho method to factor the integer $n = 36287$.

Solution. First note that $2^{36286} \equiv 35799 \not\equiv 1 \pmod{36287}$, so Fermat’s Little Theorem implies that $n$ is composite. We construct our quasi-random sequence of integers recursively by taking $x_0 = 1$ and $x_{i+1} = (x_i^2 + 1) \ \mathrm{MOD}\ n$. The first few terms of the sequence are
\[ 1,\ 2,\ 5,\ 26,\ 677,\ 22886,\ 2439,\ 33941,\ 24380,\ 3341,\ 22173,\ 25654,\ 26685. \]
In particular, one has $x_5 = 22886$ and $x_{12} = 26685$, which gives $\gcd(x_{12} - x_5, n) = 131$. Thus we obtain the factorization $36287 = 131 \cdot 277$.
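The search carried out in this solution can be sketched in Python as follows. This is a deliberately naive illustration, not an efficient implementation: the name `rho_pairwise` and the cap `max_terms` are our own assumptions, and the function compares each new term with every earlier one exactly as in the description above.

```python
from math import gcd


def rho_pairwise(n: int, x0: int = 1, max_terms: int = 1000):
    """Generate x_{i+1} = (x_i^2 + 1) MOD n and return the first triple
    (i, j, d) with i < j and 1 < d = gcd(x_j - x_i, n) < n, or None."""
    xs = [x0]
    for j in range(1, max_terms):
        xs.append((xs[-1] ** 2 + 1) % n)
        for i in range(j):
            d = gcd(xs[j] - xs[i], n)
            if 1 < d < n:          # non-trivial common factor found
                return i, j, d
    return None                    # no factor found within max_terms terms


print(rho_pairwise(36287))         # (5, 12, 131)
```

Practical implementations usually avoid storing the whole sequence, for instance by comparing $x_i$ with $x_{2i}$ only (Floyd’s cycle-finding idea), which is a different bookkeeping scheme from the one sketched here.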
□

Suppose that $n$ is a large composite integer with no small prime factors but that all the prime factors of $p - 1$ are small for some prime $p \mid n$. For example, suppose that $p - 1$ divides $10000!$. Then by Fermat’s Little Theorem one has $2^{10000!} \equiv 1 \pmod{p}$, and thus $p$ divides $\gcd(2^{10000!} - 1, n)$. Thus we can attempt to find $p$ by computing $\gcd(2^{i!} - 1, n)$ for various values of $i$. This is known as Pollard’s $p - 1$ method, and it can be applied with bases other than 2 as well.

Example 4.6. Use Pollard’s $p - 1$ method to factor the integer $n = 69841$.

Solution. We have $2^{n-1} \equiv 37073 \not\equiv 1 \pmod{n}$, so $n$ is composite. With $i = 5$ in the above notation, we obtain $\gcd(2^{120} - 1, 69841) = 331$, which gives us a nontrivial divisor of $n$. It is now easily checked that $n = 211 \cdot 331$ is the desired prime factorization. Note that the method was effective here because $p - 1 = 330 = 2 \cdot 3 \cdot 5 \cdot 11$ divides $11!$. One would typically expect to test up to $i = 11$ before finding $p$, but we happened to find it sooner. □

One important consequence of the Pollard $p - 1$ method is that RSA can be broken if the primes $p$ and $q$ are chosen in such a way that $p - 1$ or $q - 1$ has only small prime factors. Therefore, one must be careful to avoid this situation when constructing a public key.

We also mention that neither of Pollard’s algorithms will prove primality if they are applied to prime integers. Hence they should only be used on integers that are known to be composite, for example by failing a pseudoprime test.

The Mathematica function FactorInteger[n] implements a variety of advanced algorithms to attempt to determine the prime factorization of $n$, but it typically becomes extremely slow when the smallest prime factor of $n$ is large.

5. Primitive roots

When $(a, n) = 1$, we know from Euler’s Theorem that $a^{\phi(n)} \equiv 1 \pmod{n}$. However, there may be smaller powers of $a$ that are congruent to 1 modulo $n$.
We define the order of $a$ modulo $n$ (or the order of $a$ in $\mathbb{Z}_n^*$) to be the smallest positive integer $d$ such that $a^d \equiv 1 \pmod{n}$. For example, the elements 3, 5, and 7 all have order 2 in $\mathbb{Z}_8^*$. The elements 2 and 3 have orders 3 and 6, respectively, in $\mathbb{Z}_7^*$. In view of Euler’s Theorem, the order of an element in $\mathbb{Z}_n^*$ is at most $\phi(n)$.

Note that if $a^d \equiv 1 \pmod{n}$ for some positive integer $d$, then we can write $a^{d-1} a + kn = 1$ for some $k \in \mathbb{Z}$, so Corollary 2.5 implies that $(a, n) = 1$. Thus if $(a, n) > 1$ then there is no positive power of $a$ that is 1 modulo $n$. Hence order is not defined for the elements of $\mathbb{Z}_n$ that are not relatively prime to $n$.

Theorem 5.1. If $a$ has order $d$ in $\mathbb{Z}_n^*$ and $m$ is a positive integer with $a^m \equiv 1 \pmod{n}$, then $d$ divides $m$.

Proof. We use division with remainder (Theorem 2.3) to write $m = qd + r$, where $q$ and $r$ are integers with $0 \le r < d$. Then we have
\[ 1 \equiv a^m \equiv a^{qd + r} \equiv (a^d)^q \cdot a^r \equiv a^r \pmod{n}, \]
so the minimality of $d$ implies that $r = 0$, and hence $m = qd$, as required. □

Corollary 5.2. The order of every element of $\mathbb{Z}_n^*$ divides $\phi(n)$.

Proof. In view of Euler’s Theorem, this follows from Theorem 5.1 with $m = \phi(n)$. □

For example, it is easy to check directly that each element of $\mathbb{Z}_7^*$ has order 1, 2, 3, or 6 and that each element of $\mathbb{Z}_8^*$ has order 1 or 2. If the order of $a$ modulo $n$ happens to be $\phi(n)$ then we say that $a$ is a primitive root modulo $n$.

Example 5.3. Determine the primitive roots modulo 7 and modulo 8.

Solution. The elements 3 and 5 are primitive roots modulo 7 because they both have order $6 = \phi(7)$. It is easily checked that all other elements of $\mathbb{Z}_7^*$ have order less than 6, so there are no other primitive roots. Finally, there are no primitive roots modulo 8 because there are no elements of order $4 = \phi(8)$. □

A primitive root is sometimes called a generator because computing successive powers of it generates the whole of $\mathbb{Z}_n^*$.
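For small moduli, the orders and primitive roots discussed above can be found by brute force. The following Python sketch is only meant as a check on the definitions. The helper names `order`, `phi`, and `primitive_roots` are our own, the code assumes $n \ge 2$, and it makes no attempt at efficiency.

```python
from math import gcd


def order(a: int, n: int) -> int:
    """Smallest positive d with a^d ≡ 1 (mod n); requires n >= 2, (a, n) = 1."""
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("order is only defined when n >= 2 and (a, n) = 1")
    d, x = 1, a % n
    while x != 1:              # terminates because a is invertible modulo n
        x = x * a % n
        d += 1
    return d


def phi(n: int) -> int:
    """Euler phi function, computed directly from its definition."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def primitive_roots(n: int) -> list:
    """All a in Z_n^* whose order equals phi(n)."""
    target = phi(n)
    return [a for a in range(1, n) if gcd(a, n) == 1 and order(a, n) == target]


print(primitive_roots(7))      # [3, 5], as in Example 5.3
print(primitive_roots(8))      # [], as in Example 5.3
```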
For example, 3 is a generator for $\mathbb{Z}_7^*$ because $3^1 = 3$, $3^2 = 2$, $3^3 = 6$, $3^4 = 4$, $3^5 = 5$, and $3^6 = 1$. For this reason, we sometimes use the letter $g$ to denote a primitive root. In algebraic terms, the existence of a primitive root modulo $n$ means that $\mathbb{Z}_n^*$ is a cyclic group under multiplication. Example 5.3 shows that $\mathbb{Z}_7^*$ is cyclic but that $\mathbb{Z}_8^*$ is not. The following theorem shows that primitive roots are generators.

Theorem 5.4. If $g$ is a primitive root modulo $n$ and $(r, n) = 1$, then we have $r \equiv g^i \pmod{n}$ for some integer $i$ with $1 \le i \le \phi(n)$.

Proof. Consider the $\phi(n)$ integers $g, g^2, g^3, \dots, g^{\phi(n)}$. If $g^i \equiv g^j \pmod{n}$ for some $i$ and $j$ with $1 \le i < j \le \phi(n)$, then we would have $g^{j-i} \equiv 1 \pmod{n}$, which is impossible since $g$ has order $\phi(n)$ and $0 < j - i < \phi(n)$. Therefore, the integers $g, g^2, g^3, \dots, g^{\phi(n)}$ all lie in distinct residue classes modulo $n$. Since each $g^i$ is also relatively prime to $n$, we deduce that the set $\{g, g^2, g^3, \dots, g^{\phi(n)}\}$ forms a reduced residue system modulo $n$. Hence there is some exponent $i$ for which $r \equiv g^i \pmod{n}$. □

Theorem 5.5. If $a$ has order $d$ modulo $n$, then $a^i$ has order $d/(d, i)$ modulo $n$.

Proof. Let $e$ denote the order of $a^i$ modulo $n$. First of all, we have
\[ (a^i)^{d/(d,i)} \equiv (a^d)^{i/(d,i)} \equiv 1 \pmod{n}, \]
so Theorem 5.1 implies that $e$ divides $d/(d, i)$. Moreover, we have $a^{ei} \equiv (a^i)^e \equiv 1 \pmod{n}$, so Theorem 5.1 further implies that $d$ divides $ei$, and hence that $d/(d, i)$ divides $ei/(d, i)$. Since $d/(d, i)$ and $i/(d, i)$ are relatively prime, it follows from a homework exercise that $d/(d, i)$ divides $e$. Since $e$ divides $d/(d, i)$ and $d/(d, i)$ divides $e$, and both quantities are positive, we may conclude that $e = d/(d, i)$, as desired. □

Corollary 5.6. If $\mathbb{Z}_n^*$ contains a primitive root, then the total number of primitive roots in $\mathbb{Z}_n^*$ is $\phi(\phi(n))$. In other words, if $\mathbb{Z}_n^*$ is cyclic, then it has $\phi(\phi(n))$ generators.

Proof. If $g$ is a primitive root modulo $n$, then Theorem 5.5 shows that $g^i$ is a primitive root if and only if $(\phi(n), i) = 1$.
Hence there are $\phi(\phi(n))$ choices for $i$. By Theorem 5.4, all elements of $\mathbb{Z}_n^*$ can be expressed as $g^i$ for some $i$, so this completes the proof. □

The following theorem, due to Gauss, completely characterizes the integers $n$ for which $\mathbb{Z}_n^*$ has a primitive root.

Theorem 5.7. There exists a primitive root modulo $n$ if and only if $n = 1$, $2$, $4$, $p^k$, or $2p^k$, where $p$ is an odd prime and $k$ is a positive integer.

Example 5.8. What can you say about the existence of primitive roots modulo $n$ when $9 \le n \le 20$? How many primitive roots are there modulo 18 and 19?

Solution. In view of Theorem 5.7, there are primitive roots modulo 9, 10, 11, 13, 14, 17, 18, and 19, while there are no primitive roots modulo 12, 15, 16, or 20. By Corollary 5.6, the number of primitive roots modulo 18 is $\phi(\phi(18)) = \phi(6) = 2$, and the number of primitive roots modulo 19 is $\phi(\phi(19)) = \phi(18) = 6$. □

The full proof of Theorem 5.7 is somewhat time-consuming, although it is accessible with elementary techniques. Rather than giving the complete argument, which involves a number of separate cases, we will be content to prove the existence of primitive roots for prime moduli. Before doing this, we need some auxiliary results. The following theorem, due to Lagrange, concerns solutions of polynomial congruences modulo a prime.

Theorem 5.9. Let $f(x)$ be a polynomial of degree $d$ with integer coefficients, and let $p$ be a prime not dividing the leading coefficient of $f(x)$. Then the congruence $f(x) \equiv 0 \pmod{p}$ has at most $d$ distinct solutions modulo $p$.

Proof. We proceed by induction on $d$. When $d = 0$, the polynomial $f(x)$ is a constant not divisible by $p$, so the congruence has no solutions. Now suppose that $d > 0$ and that the result holds for all polynomials of degree less than $d$. Let $f(x)$ be a polynomial of degree $d$, with $p$ not dividing the leading coefficient, and suppose that $f(a) \equiv 0 \pmod{p}$.
Using division with remainder for polynomials, we can write
\[ f(x) = q(x)(x - a) + r, \]
where $q(x)$ is a polynomial of degree $d - 1$, and where $r$ is an integer. (Since $x - a$ has degree one, the remainder has degree zero.) Moreover, $p$ does not divide the leading coefficient of $q(x)$, since this is the same as the leading coefficient of $f(x)$. We have $r = f(a) \equiv 0 \pmod{p}$, which means that $p \mid r$, and thus for any integer $x$ we have
\[ f(x) \equiv q(x)(x - a) \pmod{p}. \]
Now if $f(b) \equiv 0 \pmod{p}$, then $p$ divides $q(b)(b - a)$, so Euclid’s Lemma implies that $p$ divides $q(b)$ or $p$ divides $b - a$. In the first case, we have $q(b) \equiv 0 \pmod{p}$, so the induction hypothesis ensures that there are at most $d - 1$ choices for $b$. In the second case, we have $b \equiv a \pmod{p}$, which gives one additional possibility. Thus $f(x) \equiv 0 \pmod{p}$ has at most $d$ solutions in total. □

Note that the theorem fails for composite moduli. For example, the congruence $x^2 - 1 \equiv 0 \pmod{8}$ has four solutions, $x = 1, 3, 5, 7$, but the polynomial $f(x) = x^2 - 1$ has degree two.

The next lemma establishes an interesting relationship between an integer and the Euler phi function of its divisors. To illustrate, notice that the positive divisors of 12 are 1, 2, 3, 4, 6, and 12 and that
\[ \phi(1) + \phi(2) + \phi(3) + \phi(4) + \phi(6) + \phi(12) = 1 + 1 + 2 + 2 + 2 + 4 = 12. \]
The positive divisors of 17 are 1 and 17, and we have $\phi(1) + \phi(17) = 1 + 16 = 17$. The positive divisors of 20 are 1, 2, 4, 5, 10, and 20, and we have
\[ \phi(1) + \phi(2) + \phi(4) + \phi(5) + \phi(10) + \phi(20) = 1 + 1 + 2 + 4 + 4 + 8 = 20. \]
We now show that this phenomenon occurs in general.

Lemma 5.10. Let $n$ be a positive integer, and let $d_1, d_2, \dots, d_t$ denote the positive divisors of $n$. Then
\[ \phi(d_1) + \phi(d_2) + \cdots + \phi(d_t) = \sum_{d \mid n} \phi(d) = n. \]

Proof. Let $S(d)$ denote the number of integers $a$ with $1 \le a \le n$ and $(a, n) = d$. Since $S(d) = 0$ unless $d$ is a divisor of $n$, we can write
\[ S(d_1) + S(d_2) + \cdots + S(d_t) = \sum_{d \mid n} S(d) = n. \]
Consider an integer $a$ counted by $S(d)$.
Then $(a, n) = d$, so in particular we have $d \mid a$ and $d \mid n$, and furthermore $(a/d, n/d) = 1$. Thus we can write $a = kd$ for a unique integer $k$ with $1 \le k \le n/d$ and $(k, n/d) = 1$. Hence the number of choices for $k$ is $\phi(n/d)$. Since this also gives the number of possibilities for $a$, we deduce that $S(d) = \phi(n/d)$. Notice that the numbers $n/d_1, n/d_2, \dots, n/d_t$ are just the divisors $d_1, d_2, \dots, d_t$, listed in a different order. Thus we have
\[ \sum_{d \mid n} \phi(d) = \sum_{d \mid n} \phi(n/d) = \sum_{d \mid n} S(d) = n, \]
as desired. □

We can now demonstrate the existence of primitive roots modulo a prime $p$. The following theorem actually makes the stronger assertion that $\mathbb{Z}_p^*$ contains elements of all orders dividing $p - 1$ (including $p - 1$ itself). Note that Corollary 5.2 implies that no other orders are permissible.

Theorem 5.11. If $p$ is prime and $d$ is a positive integer dividing $p - 1$, then there are exactly $\phi(d)$ elements of order $d$ in $\mathbb{Z}_p^*$.

Proof. Let $d$ be a divisor of $p - 1$, and let $N(d)$ be the number of elements of order $d$ in $\mathbb{Z}_p^*$. If $N(d) > 0$, then there exists some $a \in \mathbb{Z}_p^*$ of order $d$. The integers $a, a^2, a^3, \dots, a^d$ are distinct modulo $p$, since otherwise we would have $a^{j-i} \equiv 1 \pmod{p}$, where $0 < j - i < d$, violating the definition of order. Moreover, for each $i$ with $1 \le i \le d$ we have $(a^i)^d \equiv (a^d)^i \equiv 1 \pmod{p}$, so each $a^i$ is a solution of the congruence $x^d - 1 \equiv 0 \pmod{p}$, and Theorem 5.9 implies that these are the only solutions. Furthermore, every element of order $d$ satisfies the congruence and must therefore be a power of $a$. We know from Theorem 5.5 that $a^i$ has order $d$ if and only if $(d, i) = 1$, so there are exactly $\phi(d)$ elements of order $d$. Thus we’ve shown that either $N(d) = 0$ or $N(d) = \phi(d)$ whenever $d \mid (p - 1)$. Since there are $p - 1$ elements in $\mathbb{Z}_p^*$, we deduce from Lemma 5.10 that
\[ \sum_{d \mid (p-1)} N(d) = p - 1 = \sum_{d \mid (p-1)} \phi(d). \]
Since $N(d) \le \phi(d)$ for each $d$, we must actually have $N(d) = \phi(d)$ for each $d$, and this completes the proof. □

Corollary 5.12.
There are exactly $\phi(p - 1)$ primitive roots in $\mathbb{Z}_p^*$.

Proof. This follows immediately by taking $d = p - 1$ in Theorem 5.11. □

The Lucas primality test. Suppose the integer $n$ has passed a strong pseudoprime test and is therefore suspected to be prime. It turns out that we can then use primitive roots to try to prove that $n$ is prime. Suppose that $n$ passes the ordinary pseudoprime test for the base $b$, so that $b^{n-1} \equiv 1 \pmod{n}$, and further that we are able to factor $n - 1$, say $n - 1 = p_1^{e_1} \cdots p_r^{e_r}$. Theorem 5.1 implies that the order of $b$ modulo $n$ divides $n - 1$, so if we can show that $b^{(n-1)/p_i} \not\equiv 1 \pmod{n}$ for each $i$ then we may conclude that the order of $b$ is actually $n - 1$. On the other hand, we know from Euler’s Theorem that the order of $b$ cannot exceed $\phi(n)$, so we have $n - 1 \le \phi(n)$. But it follows easily from Corollary 3.22 that this can only happen if $n$ is prime, in which case $\phi(n) = n - 1$.

Example 5.13. Use the Lucas test to prove that $n = 631$ is prime.

Solution. We have $3^{630} \equiv 1 \pmod{631}$, so the order of 3 modulo 631 must divide 630. Moreover we have $630 = 2 \cdot 3^2 \cdot 5 \cdot 7$ and
\[ 3^{(n-1)/2} = 3^{315} \equiv -1 \pmod{631}, \qquad 3^{(n-1)/3} = 3^{210} \equiv -44 \pmod{631}, \]
\[ 3^{(n-1)/5} = 3^{126} \equiv 242 \pmod{631}, \qquad 3^{(n-1)/7} = 3^{90} \equiv 269 \pmod{631}, \]
which shows that the order of 3 modulo 631 is actually equal to 630. We may therefore conclude that 631 is prime and that 3 is a primitive root modulo 631. □

If we find an element $b$ of order $n - 1$ in $\mathbb{Z}_n^*$, then the above argument shows that $n$ is prime and hence that $b$ is a primitive root modulo $n$. So the success of the test depends in part on being able to find primitive roots quickly. However, Corollary 5.12 implies that there are $\phi(n - 1)$ primitive roots in $\mathbb{Z}_n^*$ when $n$ is prime, and a bit of elementary analytic number theory shows that $\phi(n) \approx \frac{6}{\pi^2} n$ on average. Hence the proportion of numbers $b \le n$ that are primitive roots modulo $n$ averages about $6/\pi^2 \approx 0.608$ for large prime values of $n$.
Thus we have a good chance of finding a suitable $b$ fairly quickly if $n$ is in fact prime. A more serious issue is that it may not be easy to factor $n - 1$. If we’re lucky, it will have several relatively small prime factors, but there might be a large factor remaining whose primality needs to be established. In this case, we can iterate the Lucas test until our numbers $n - 1$ are small enough to be factored by trial division.

The Diffie-Hellman key exchange. The first secure method for public-key cryptography was actually developed about two years before the RSA breakthrough. One of the fundamental problems with classical cryptography is the difficulty of agreeing on the key for a particular cipher without having this information intercepted. Diffie and Hellman resolved this by generating a large prime $p$ and then choosing a primitive root $s$ modulo $p$. Note that Corollary 5.12 ensures that there are $\phi(p - 1)$ possible choices for $s$. The pair $(p, s)$ is public information. Now if Alice wants to communicate with Bob, she chooses a number $a$ at random between 2 and $p - 2$ and sends him $\alpha = s^a \ \mathrm{MOD}\ p$. Bob then chooses a number $b$ at random between 2 and $p - 2$ and sends Alice $\beta = s^b \ \mathrm{MOD}\ p$. Since $s$ is a primitive root modulo $p$, we know that neither $\alpha$ nor $\beta$ will equal 1. We observe that
\[ k = s^{ab} \ \mathrm{MOD}\ p = \beta^a \ \mathrm{MOD}\ p = \alpha^b \ \mathrm{MOD}\ p \]
can now be calculated by both Alice and Bob and can be used as the key for whatever cryptosystem they employ.

Example 5.14. Using the public prime $p = 197$ and public base $s = 31$, show how Alice and Bob can agree on a common key for secure communication.

Solution. Suppose that Alice randomly chooses $a = 72$. Then she sends $\alpha = 31^{72} \ \mathrm{MOD}\ 197 = 76$ to Bob. If Bob randomly chooses $b = 109$, then he sends $\beta = 31^{109} \ \mathrm{MOD}\ 197 = 147$ to Alice. At this point, Alice computes $\beta^{72} \ \mathrm{MOD}\ 197 = 147^{72} \ \mathrm{MOD}\ 197 = 28$ and Bob computes $\alpha^{109} \ \mathrm{MOD}\ 197 = 76^{109} \ \mathrm{MOD}\ 197 = 28$, so they’ve agreed on the key $k = 28$.
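The arithmetic in this solution is easy to reproduce with Python’s built-in modular exponentiation. The sketch below is purely illustrative: the function name `diffie_hellman` is our own, and a real deployment would use a far larger prime and secret exponents drawn from a cryptographically secure random source rather than fixed values.

```python
def diffie_hellman(p: int, s: int, a: int, b: int):
    """Simulate one exchange with public (p, s) and secret exponents a, b.
    Returns (alpha, beta, key computed by Alice, key computed by Bob)."""
    alpha = pow(s, a, p)           # Alice sends s^a MOD p
    beta = pow(s, b, p)            # Bob sends s^b MOD p
    key_alice = pow(beta, a, p)    # Alice computes beta^a MOD p
    key_bob = pow(alpha, b, p)     # Bob computes alpha^b MOD p
    return alpha, beta, key_alice, key_bob


print(diffie_hellman(197, 31, 72, 109))   # (76, 147, 28, 28)
```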
□

The natural way for Eve (who is eavesdropping) to obtain the key $k$ would be to solve the two congruences
\[ s^a \equiv \alpha \pmod{p} \quad \text{and} \quad s^b \equiv \beta \pmod{p} \tag{5.1} \]
for $a$ and $b$. Solving either one of these is known as the discrete log problem, and there is no known efficient algorithm for handling it. It is believed that no such algorithm exists, but this has not been proven. What Eve really needs is an efficient algorithm for finding $s^{ab} \ \mathrm{MOD}\ p$ from $s^a$ and $s^b$, which is known as the Diffie-Hellman problem. Its solution would obviously follow from a solution to the discrete log problem, but it’s not known whether the two problems are equivalent.

In view of Theorem 5.4, the fact that $s$ is a primitive root modulo $p$ ensures that the congruences (5.1) have unique solutions $a$ and $b$ between 1 and $p - 1$ for every choice of $\alpha$ and $\beta$ between 1 and $p - 1$. The uniqueness of the solutions obviously makes it less likely that Eve will stumble upon one quickly by trial and error.

The ElGamal Cryptosystem. One disadvantage of the Diffie-Hellman method is that Alice has to wait for a response from Bob before she can calculate the key and initiate secure communication. However, ElGamal showed that the protocol can be adapted to create a self-contained public-key cryptosystem. In addition to the public prime $p$ and base $s$, suppose that Alice and Bob publish their numbers $\alpha$ and $\beta$ in a directory. If Alice wants to send a message $x$ to Bob, she generates a random session key $k$ between 2 and $p - 2$ and sends Bob
\[ t = s^k \ \mathrm{MOD}\ p \quad \text{and} \quad y = \beta^k x \ \mathrm{MOD}\ p. \]
He then recovers the message by computing
\[ y (t^b)^{-1} \ \mathrm{MOD}\ p = (x s^{bk}) \cdot (s^{kb})^{-1} = x, \]
provided that $x \le p - 1$. Longer messages can of course be broken into blocks prior to encryption.

Example 5.15. Suppose that Alice and Bob use ElGamal with public prime $p = 11881379$ and base $s = 23$, and that Alice has published $\alpha = 10442571$. How can Bob discreetly ask Alice to tea?

Solution. Bob first needs to pick a random session key, say $k = 101$.
He then converts TEA to digital form, say $x = 200501$, and calculates
\[ t = 23^{101} \ \mathrm{MOD}\ p = 3054634 \quad \text{and} \quad y = 200501 \cdot 10442571^{101} \ \mathrm{MOD}\ p = 3497868. \]
Bob now sends the pair $(3054634, 3497868)$ to Alice, and she recovers the message using her private key, $a = 8137$:
\[ 3497868 \cdot (3054634^{8137})^{-1} \ \mathrm{MOD}\ p = 3497868 \cdot 7225717 \ \mathrm{MOD}\ p = 200501. \]
□

6. Quadratic reciprocity

Quadratic Residues. Having studied linear congruences in §3, it is natural to ask about solving quadratic congruences. Let $p$ be a prime and let $a \in \mathbb{Z}_p^*$. We say that $a$ is a quadratic residue modulo $p$ if there exists $x$ such that $x^2 \equiv a \pmod{p}$. If no such $x$ exists, then $a$ is called a quadratic non-residue modulo $p$. We sometimes denote the sets of quadratic residues and non-residues in $\mathbb{Z}_p^*$ by $R$ and $N$, respectively.

Example 6.1. Identify the quadratic residues and non-residues in $\mathbb{Z}_5^*$, $\mathbb{Z}_7^*$, and $\mathbb{Z}_{11}^*$.

Solution. In $\mathbb{Z}_5^*$, we have $R = \{1, 4\}$ and $N = \{2, 3\}$. In $\mathbb{Z}_7^*$, we have $R = \{1, 2, 4\}$ and $N = \{3, 5, 6\}$. In $\mathbb{Z}_{11}^*$, we have $R = \{1, 3, 4, 5, 9\}$ and $N = \{2, 6, 7, 8, 10\}$. □

The next theorem shows that there are always equal numbers of quadratic residues and non-residues modulo an odd prime.

Theorem 6.2. If $p$ is an odd prime, then $|R| = |N| = \frac{1}{2}(p - 1)$.

Proof. If $x^2 \equiv y^2 \pmod{p}$, then $p$ divides $x^2 - y^2 = (x - y)(x + y)$, so Euclid’s Lemma implies that $p$ divides $x - y$ or $x + y$, and thus $x \equiv \pm y \pmod{p}$. Thus every quadratic residue in $\mathbb{Z}_p^*$ has exactly two distinct square roots, which implies that the set $R = \{x^2 : x \in \mathbb{Z}_p^*\}$ contains exactly half the elements of $\mathbb{Z}_p^*$. □

How do we determine whether a particular element of $\mathbb{Z}_p^*$ is a quadratic residue? One answer is given by the following theorem.

Theorem 6.3. (Euler’s Criterion) Let $p$ be an odd prime, and let $a \in \mathbb{Z}_p^*$. Then $a$ is a quadratic residue modulo $p$ if and only if $a^{(p-1)/2} \equiv 1 \pmod{p}$.

Proof. By Theorem 5.11, we know that there exists a primitive root $g$ modulo $p$. If $a$ is a quadratic residue modulo $p$, then there exists $x \in \mathbb{Z}_p^*$ with $x^2 \equiv a \pmod{p}$.
By Theorem 5.4, we have $x \equiv g^i \pmod{p}$ for some integer $i$, and thus $a \equiv g^{2i} \pmod{p}$. It follows that
\[ a^{(p-1)/2} \equiv (g^{2i})^{(p-1)/2} \equiv (g^{p-1})^i \equiv 1 \pmod{p}. \]
Conversely, if $a$ is a quadratic non-residue modulo $p$, then $a$ cannot be an even power of $g$, so Theorem 5.4 implies that $a \equiv g^{2j+1} \pmod{p}$ for some integer $j$. Thus we have
\[ a^{(p-1)/2} \equiv (g^{2j+1})^{(p-1)/2} \equiv (g^{p-1})^j g^{(p-1)/2} \equiv g^{(p-1)/2} \not\equiv 1 \pmod{p}, \]
since $g$ has order $p - 1$. In fact, we can deduce that $a^{(p-1)/2} \equiv -1 \pmod{p}$, since $a^{(p-1)/2}$ is a solution of the congruence $x^2 \equiv 1 \pmod{p}$. □

As a result, the congruence $x^2 \equiv a \pmod{p}$ has two solutions if $a^{(p-1)/2} \equiv 1 \pmod{p}$ and no solutions if $a^{(p-1)/2} \equiv -1 \pmod{p}$.

Example 6.4. Decide whether the congruence $x^2 \equiv 6 \pmod{37}$ has solutions.

Solution. We have $6^{18} \equiv (6^2)^9 \equiv (-1)^9 \equiv -1 \pmod{37}$, so Euler’s Criterion implies that the congruence has no solution. □

When $a$ is an integer and $p$ is an odd prime, we define the Legendre symbol $\left(\frac{a}{p}\right)$ by
\[ \left(\frac{a}{p}\right) = \begin{cases} 0 & \text{if } p \mid a \\ 1 & \text{if } a \in R \\ -1 & \text{if } a \in N. \end{cases} \]
Note that this definition only depends on the residue class of $a$ modulo $p$, so replacing $a$ by $a + kp$ does not change the value. For example, we have $\left(\frac{2}{7}\right) = 1$, $\left(\frac{3}{7}\right) = -1$, and $\left(\frac{7}{7}\right) = 0$. The Legendre symbol (sometimes read as “$a$ on $p$”) has the following useful properties:

Theorem 6.5. Let $a$ and $b$ be integers, and let $p$ be an odd prime. Then
(i) $\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod{p}$
(ii) $\left(\frac{-1}{p}\right) = (-1)^{(p-1)/2}$
(iii) $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$
(iv) $\left(\frac{a^2}{p}\right) = 1$ if $a$ is not divisible by $p$.

Proof. Fermat’s Little Theorem gives $a^{p-1} \equiv 1 \pmod{p}$ when $(a, p) = 1$, so property (i) follows immediately from Euler’s Criterion and Euclid’s Lemma. Properties (ii), (iii), and (iv) follow easily from (i). □

Note that property (ii) implies that $-1$ is a quadratic residue mod $p$ if and only if $p \equiv 1 \pmod{4}$. For example, the congruence $x^2 \equiv -1$ is solvable modulo 73 but not modulo 71.

The following criterion is a key ingredient in proving the law of quadratic reciprocity, which provides an efficient method for computing the Legendre symbol.

Theorem 6.6. (Gauss’ Criterion) Let $p$ be an odd prime, and let $a$ be a positive integer not divisible by $p$. For $1 \le i \le \frac{1}{2}(p - 1)$, let $r_i = a(2i - 1) \ \mathrm{MOD}\ p$, and let $t$ be the number of $r_i$ that are even. Then we have
\[ \left(\frac{a}{p}\right) = (-1)^t. \]

Example 6.7. Use Gauss’ criterion to calculate $\left(\frac{2}{7}\right)$, $\left(\frac{2}{11}\right)$, $\left(\frac{2}{13}\right)$, and $\left(\frac{2}{17}\right)$.

Solution. For $p = 7$, we have $r_1 = 2 \cdot 1 = 2$, $r_2 = 2 \cdot 3 = 6$, and $r_3 = 2 \cdot 5 \ \mathrm{MOD}\ 7 = 3$. Hence the number of even residues is $t = 2$, and Gauss’ Criterion gives $\left(\frac{2}{7}\right) = (-1)^2 = 1$. Similarly, for $p = 11$ we have $r_1 = 2$, $r_2 = 6$, $r_3 = 10$, $r_4 = 3$, and $r_5 = 7$, which yields $t = 3$ and $\left(\frac{2}{11}\right) = (-1)^3 = -1$. For $p = 13$, we get $r_1 = 2$, $r_2 = 6$, $r_3 = 10$, $r_4 = 1$, $r_5 = 5$, and $r_6 = 9$, so we again have $t = 3$ and $\left(\frac{2}{13}\right) = -1$. Finally, for $p = 17$ we have $r_1 = 2$, $r_2 = 6$, $r_3 = 10$, $r_4 = 14$, $r_5 = 1$, $r_6 = 5$, $r_7 = 9$, and $r_8 = 13$, which gives $t = 4$ and $\left(\frac{2}{17}\right) = 1$. □

The result of Example 6.7 may be generalized as follows.

Corollary 6.8. If $p$ is an odd prime, then $\left(\frac{2}{p}\right) = (-1)^{(p^2 - 1)/8}$.

Proof. It is an easy exercise to check that $(p^2 - 1)/8$ is even if $p \equiv \pm 1 \pmod{8}$ and odd if $p \equiv \pm 3 \pmod{8}$. The proof therefore splits into four cases. First of all, suppose that $p \equiv 1 \pmod{8}$ so that $p = 8k + 1$ for some positive integer $k$. The numbers $2 \cdot 1, 2 \cdot 3, 2 \cdot 5, \dots, 2(4k - 1)$ are all less than $p$ (since $8k - 2 < 8k + 1$), so their residues are all clearly even. On the other hand, the numbers $2(4k + 1), 2(4k + 3), 2(4k + 5), \dots, 2(8k - 1)$ all lie between $p$ and $2p$, so their residues are $1, 5, 9, \dots, 8k - 3$, which are all odd. The number of even residues in Gauss’ criterion is therefore $t = 2k$, since $2i - 1$ ranges from 1 to $4k - 1$ as $i$ ranges from 1 to $2k$, and thus $(2/p) = (-1)^{2k} = 1$. Next suppose that $p \equiv 3 \pmod{8}$, so that $p = 8k + 3$. Then the numbers $2 \cdot 1, 2 \cdot 3, 2 \cdot 5, \dots$
, $2(4k + 1)$ are all less than $p$ (since $8k + 2 < 8k + 3$), so their residues are even. The numbers $2(4k + 3), 2(4k + 5), 2(4k + 7), \dots, 2(8k + 1)$ all lie between $p$ and $2p$, so their residues are $3, 7, 11, \dots, 8k - 1$, which are all odd. We therefore have $t = 2k + 1$ and hence $(2/p) = (-1)^{2k+1} = -1$. The remaining two cases are left as exercises. □

Proof of Gauss’ criterion: Write $m = \frac{1}{2}(p - 1)$. We re-index the residues so that $r_1, r_2, \dots, r_t$ are even and $r_{t+1}, r_{t+2}, \dots, r_m$ are odd. Let $b_1, b_2, \dots, b_m$ be the positive odd integers less than $p$, re-ordered so that $r_i = a b_i \ \mathrm{MOD}\ p$. The numbers
\[ p - r_1,\ p - r_2,\ \dots,\ p - r_t,\ r_{t+1},\ r_{t+2},\ \dots,\ r_m \]
are positive odd integers less than $p$; we claim that they are distinct and hence a re-ordering of $b_1, \dots, b_m$. To show this, we consider three cases:
(i) If $r_i = r_j$, where $t + 1 \le i, j \le m$, then $a b_i \equiv a b_j \pmod{p}$, so Lemma 3.3 gives $b_i \equiv b_j \pmod{p}$. But $b_1, \dots, b_m$ are distinct positive integers less than $p$, so we deduce that $i = j$.
(ii) If $p - r_i = p - r_j$, where $1 \le i, j \le t$, then $r_i = r_j$, so the above argument gives $i = j$.
(iii) If $p - r_i = r_j$, where $1 \le i \le t$ and $t + 1 \le j \le m$, then $r_i + r_j \equiv 0 \pmod{p}$, so $a(b_i + b_j) \equiv 0 \pmod{p}$, and thus $b_i + b_j \equiv 0 \pmod{p}$. Since $0 < b_i + b_j < 2p$, it follows that $b_i + b_j = p$, which is impossible since $b_i + b_j$ is even.
We therefore have
\[ b_1 \cdots b_m \equiv (p - r_1) \cdots (p - r_t)\, r_{t+1} \cdots r_m \equiv (-1)^t r_1 \cdots r_m \equiv (-1)^t a^m (b_1 \cdots b_m) \pmod{p}. \]
Since $p$ does not divide $b_1 \cdots b_m$, we deduce that $a^m \equiv (-1)^t \pmod{p}$, and the result now follows from part (i) of Theorem 6.5. □

We are now ready to state the main theorem of this section, which is one of the most important and beautiful results in elementary number theory.

Theorem 6.9. (Quadratic Reciprocity) If $p$ and $q$ are distinct odd primes, then
\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{(p-1)(q-1)}{4}} = \begin{cases} 1 & \text{if } p \equiv 1 \pmod{4} \text{ or } q \equiv 1 \pmod{4} \\ -1 & \text{if } p \equiv q \equiv 3 \pmod{4}. \end{cases} \]
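Before turning to the proof, the statement of Theorem 6.9 can be checked numerically for small primes. The Python sketch below computes Legendre symbols through Euler’s Criterion (Theorem 6.5 (i)) rather than through reciprocity itself, so the check is not circular. The helper names `is_prime`, `legendre`, and `reciprocity_holds` are our own, and passing for a finite range is of course only a sanity check, not a proof.

```python
def is_prime(m: int) -> bool:
    """Primality by trial division; adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p via a^((p-1)/2) MOD p."""
    r = pow(a, (p - 1) // 2, p)    # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def reciprocity_holds(bound: int) -> bool:
    """Check Theorem 6.9 for all pairs of distinct odd primes below bound."""
    odd_primes = [p for p in range(3, bound) if is_prime(p)]
    for p in odd_primes:
        for q in odd_primes:
            if p == q:
                continue
            expected = -1 if (p % 4 == 3 and q % 4 == 3) else 1
            if legendre(p, q) * legendre(q, p) != expected:
                return False
    return True


print(legendre(6, 37))             # -1, in agreement with Example 6.4
print(reciprocity_holds(100))      # True
```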
The proof of Theorem 6.9 uses Gauss’ criterion but requires a somewhat technical argument to count the even residues $p(2i - 1) \ \mathrm{MOD}\ q$ and $q(2j - 1) \ \mathrm{MOD}\ p$. There are actually many ways of proving quadratic reciprocity; over 200 different proofs have appeared in print since Gauss’ original work in the early 1800s. Before launching into a proof, we illustrate with some typical applications.

Example 6.10. Use quadratic reciprocity to calculate $\left(\frac{11}{31}\right)$.

Solution. Since 11 and 31 are both primes congruent to 3 mod 4, quadratic reciprocity gives $\left(\frac{11}{31}\right) = -\left(\frac{31}{11}\right)$. Now since $31 \equiv 9 \pmod{11}$, we have $\left(\frac{31}{11}\right) = \left(\frac{9}{11}\right) = \left(\frac{3^2}{11}\right) = 1$. We therefore conclude that $\left(\frac{11}{31}\right) = -1$ and hence that 11 is a quadratic non-residue modulo 31. □

Example 6.11. Use quadratic reciprocity to calculate $\left(\frac{42}{61}\right)$.

Solution. We first apply Theorem 6.5 (iii) to write
\[ \left(\frac{42}{61}\right) = \left(\frac{2}{61}\right)\left(\frac{3}{61}\right)\left(\frac{7}{61}\right). \]
Since $61 \equiv 5 \pmod{8}$, Corollary 6.8 gives $\left(\frac{2}{61}\right) = -1$. Next, since $61 \equiv 1 \pmod{4}$, we may apply quadratic reciprocity to obtain
\[ \left(\frac{3}{61}\right) = \left(\frac{61}{3}\right) = \left(\frac{1}{3}\right) = 1 \quad \text{and} \quad \left(\frac{7}{61}\right) = \left(\frac{61}{7}\right) = \left(\frac{5}{7}\right) = -1, \]
by the result of Example 6.1. Thus we conclude that $\left(\frac{42}{61}\right) = (-1) \cdot (1) \cdot (-1) = 1$, and hence that 42 is a quadratic residue modulo 61. □

Quadratic reciprocity can be used to determine a general criterion for 3 to be a quadratic residue modulo a prime $p > 3$. The result is somewhat reminiscent of the analogous criterion for $(2/p)$ given in Corollary 6.8, except that here the conclusion depends on the residue class of $p$ modulo 12 rather than modulo 8.

Corollary 6.12. One has
\[ \left(\frac{3}{p}\right) = \begin{cases} 1 & \text{if } p \equiv \pm 1 \pmod{12} \\ -1 & \text{if } p \equiv \pm 5 \pmod{12}. \end{cases} \]

Proof. If $p \equiv 1 \pmod{12}$, then $p \equiv 1 \pmod{3}$ and $p \equiv 1 \pmod{4}$, so quadratic reciprocity gives
\[ \left(\frac{3}{p}\right) = \left(\frac{p}{3}\right) = \left(\frac{1}{3}\right) = 1. \]
Similarly, if $p \equiv -1 \pmod{12}$, then $p \equiv 2 \pmod{3}$ and $p \equiv 3 \pmod{4}$, so quadratic reciprocity yields
\[ \left(\frac{3}{p}\right) = -\left(\frac{p}{3}\right) = -\left(\frac{2}{3}\right) = -(-1) = 1. \]
We leave the remaining two cases as exercises.
□

A proof of quadratic reciprocity. We now describe an argument that leads from Gauss’ criterion to the conclusion of Theorem 6.9. Let $p$ and $q$ be distinct odd primes, and define
\[ r_i = q(2i - 1) \ \mathrm{MOD}\ p \quad \left(1 \le i \le \tfrac{p-1}{2}\right) \quad \text{and} \quad s_j = p(2j - 1) \ \mathrm{MOD}\ q \quad \left(1 \le j \le \tfrac{q-1}{2}\right). \]
By Theorem 6.6, we have $\left(\frac{q}{p}\right) = (-1)^t$, where $t$ is the number of even $r_i$, and $\left(\frac{p}{q}\right) = (-1)^u$, where $u$ is the number of even $s_j$. It follows that
\[ \left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{t+u}. \tag{6.1} \]
It therefore suffices to show that $t + u$ is odd if and only if $p \equiv q \equiv 3 \pmod{4}$.

We now let $X$ denote the set of all integers of the form $x = qa - pb$, where $a$ and $b$ are odd integers with $1 \le a < p$ and $1 \le b < q$. For example, if $p = 7$ and $q = 11$, each element of $X$ has the form $x = 11a - 7b$ where $a \in \{1, 3, 5\}$ and $b \in \{1, 3, 5, 7, 9\}$. Taking $a = 1$ gives $x = 4, -10, -24, -38, -52$, while $a = 3$ gives $x = 26, 12, -2, -16, -30$, and finally $a = 5$ gives $x = 48, 34, 20, 6, -8$, for a total of 15 elements.

Lemma 6.13. The elements of $X$ are nonzero even integers, and one has $|X| = \frac{1}{4}(p - 1)(q - 1)$.

Proof. Suppose that $x = qa - pb \in X$. Then $qa$ and $pb$ are odd, so $x$ is clearly even. Moreover, if $qa = pb$, then $p \mid qa$, and hence $p \mid a$, which is impossible since $1 \le a < p$. Finally, if $qa - pb = qa' - pb'$, then $q(a - a') = p(b - b')$, which implies that $p \mid (a - a')$ and hence that $a = a'$ and $b = b'$, since $-p < a - a' < p$. Hence these expressions are all distinct, and $|X|$ is just the number of ordered pairs $(a, b)$. □

Next, we let $Y = \{r \in X : -q < r < p\}$. For example, when $p = 7$ and $q = 11$, we have $Y = \{-10, -8, -2, 4, 6\}$.

Lemma 6.14. One has $|Y| = t + u$.

Proof. First suppose that $r \in Y$ and that $0 < r < p$. Then $r \equiv qa \pmod{p}$ for some odd integer $a$ with $1 \le a < p$, and we can write $a = 2i - 1$ with $1 \le i \le \frac{1}{2}(p - 1)$. But since $r < p$, we must actually have $r = r_i$, and Lemma 6.13 shows that this is one of the even residues counted by $t$.
On the other hand, if $r_i = q(2i - 1) \ \mathrm{MOD}\ p$ is even, then $0 < r_i < p$, and $r_i \equiv qa \pmod{p}$, where $a = 2i - 1$ is odd and $1 \le a < p$. It then follows that $qa - r_i = pb$ for some $b \in \mathbb{Z}$, and clearly $b$ must be odd and positive. Moreover, $pb < qa < qp$ and hence $b < q$. This shows that $r_i \in Y$. We may therefore conclude that the elements $r \in Y$ with $0 < r < p$ are precisely the even residues $r_i$ counted by $t$. A similar argument shows that the elements $s \in Y$ with $-q < s < 0$ are precisely the negatives of the even residues $s_j$ counted by $u$, and the lemma follows immediately. □

To determine whether $t + u$ is even, we attempt to pair up the elements of $Y$ via the correspondence
\[ qa - pb \mapsto q(p - 1 - a) - p(q - 1 - b). \tag{6.2} \]
For example, when $p = 7$ and $q = 11$, we have $11a - 7b \mapsto 11(6 - a) - 7(10 - b)$, which gives the pairs $(4, -8)$, $(-10, 6)$, and $(-2, -2)$. On the other hand, if $p = 5$ and $q = 7$, then $X = \{-18, -8, -4, 2, 6, 16\}$ and $Y = \{-4, 2\}$, so the correspondence $7a - 5b \mapsto 7(4 - a) - 5(6 - b)$ yields the obvious pair $(2, -4)$. We now aim to show that this correspondence gives the desired parity result for $|Y|$.

Lemma 6.15. The pairs arising from the correspondence (6.2) consist of distinct elements unless $p \equiv q \equiv 3 \pmod{4}$, in which case a single element is paired with itself.

Proof. We first note that if $qa - pb \in Y$ then one has
\[ -q = -q + p - p < -q + p - (qa - pb) < -q + p + q = p, \]
which shows that $q(p - 1 - a) - p(q - 1 - b) = -q + p - (qa - pb) \in Y$. Moreover, Lemma 6.13 shows that the expressions $qa - pb$ are distinct, so if an element is paired with itself in (6.2), we must have $a = p - 1 - a$ and $b = q - 1 - b$, which gives $a = \frac{1}{2}(p - 1)$ and $b = \frac{1}{2}(q - 1)$. But these values are both odd if and only if $p \equiv q \equiv 3 \pmod{4}$, and this completes the proof. □

The proof of quadratic reciprocity is now within our grasp. By Lemmas 6.14 and 6.15, we see that $|Y| = t + u$ is odd if and only if $p \equiv q \equiv 3 \pmod{4}$, so the result follows from (6.1).

The Jacobi symbol.
There is a generalization of the Legendre symbol, called the Jacobi symbol, that is defined whenever the bottom entry is odd. If $m = p_1 \cdots p_r$, where the $p_i$ are (not necessarily distinct) odd primes, then we define
\[
\left(\frac{a}{m}\right) = \left(\frac{a}{p_1}\right) \cdots \left(\frac{a}{p_r}\right),
\]
where the factors on the right are Legendre symbols. It turns out that the Jacobi symbol enjoys many of the same properties as the Legendre symbol, including the law of quadratic reciprocity.

Theorem 6.16. The results of Theorem 6.5 (ii), (iii), Corollary 6.8, and Theorem 6.9 hold with the Legendre symbol replaced by the Jacobi symbol and the odd primes $p$ and $q$ replaced by odd positive integers.

Proof. It suffices to write out the prime factorizations of the odd integers in question and apply the definition of the Jacobi symbol in combination with the corresponding properties of the Legendre symbol. We leave the details as an exercise. □

Note that part (iv) of Theorem 6.5 does not quite hold for the Jacobi symbol. The correct analogue is that $(n^2/m) = 1$ if $(m, n) = 1$. Theorem 6.16 often allows us to perform computations with Legendre symbols more efficiently than was previously possible. For instance, in Example 6.11, we could apply quadratic reciprocity for Jacobi symbols to obtain
\[
\left(\frac{21}{61}\right) = \left(\frac{61}{21}\right) = \left(\frac{19}{21}\right) = \left(\frac{21}{19}\right) = \left(\frac{2}{19}\right) = -1
\]
rather than dealing with $(3/61)$ and $(7/61)$ separately. Unfortunately, the Jacobi symbol $(a/m)$ does not tell us whether $a$ is a square mod $m$. For example, $(2/9) = (2/3)(2/3) = 1$, but 2 is not a square modulo 9.

7. Some diophantine equations

A diophantine equation usually refers to a polynomial equation with integer coefficients to which we seek integer solutions. As a simple example, consider the equation $9x + 6y = 20$. This is a linear diophantine equation in two variables. A moment’s thought reveals that this equation has no integer solutions, since $9x + 6y$ is divisible by 3 for any integers $x$ and $y$ while 20 is not divisible by 3.
From another point of view, notice that solving the above equation is equivalent to solving the congruence $9x \equiv 20 \pmod 6$, and we know from Theorem 3.8 that this has no solution since $(9, 6) = 3$ does not divide 20. On the other hand, the equation $2x + 3y = 7$ has infinitely many integer solutions, given by $x = -1 + 3k$ and $y = 3 - 2k$ for any $k \in \mathbb{Z}$. The following theorem characterizes the solutions of the linear diophantine equation $ax + by = c$.

Theorem 7.1. Let $a$, $b$, and $c$ be integers, and write $d = (a, b)$. The equation $ax + by = c$ has integer solutions if and only if $d \mid c$. Moreover, the set of solutions is given by
\[
x = x_0 + kb/d, \qquad y = y_0 - ka/d \qquad (k \in \mathbb{Z}),
\]
where $(x_0, y_0)$ is any particular solution.

Proof. The equation $ax + by = c$ is equivalent to the congruence $ax \equiv c \pmod b$, and Theorem 3.8 shows that this is solvable if and only if $(a, b)$ divides $c$. If $x_0$ is any solution of the congruence, then we have $ax_0 = c - by_0$ for some integer $y_0$, so $(x_0, y_0)$ solves the equation. Moreover, any solution $(x, y)$ satisfies the congruences
\[
\frac{a}{d}\,x \equiv \frac{c}{d} \pmod{\frac{b}{d}} \qquad\text{and}\qquad \frac{b}{d}\,y \equiv \frac{c}{d} \pmod{\frac{a}{d}},
\]
which have unique solutions modulo $b/d$ and $a/d$, respectively. Therefore we have $x = x_0 + kb/d$ and $y = y_0 + ma/d$ for some integers $k$ and $m$. Substituting into the equation $ax + by = c$, we find that $(x, y)$ is a solution if and only if $m = -k$. □

Example 7.2. Describe all integer solutions of the diophantine equations $35x + 49y = 64$ and $35x + 49y = 63$.

Solution. Here $d = (35, 49) = 7$. In view of Theorem 7.1, the equation $35x + 49y = 64$ has no integer solutions, since 7 does not divide 64, but the equation $35x + 49y = 63$ has solutions $x = -1 + 7k$ and $y = 2 - 5k$ for every $k \in \mathbb{Z}$. □

Notice that the solubility of our linear diophantine equation was closely connected to the solubility of the underlying congruences. This is a fairly general principle that is useful to keep in mind when studying higher degree equations.

Example 7.3. Determine all integer solutions of the diophantine equation $x^2 + y^2 = 1999$.

Solution.
Notice that 0 and 1 are the only perfect squares modulo 4, and no two of these add up to 3, which is congruent to 1999 modulo 4. We therefore conclude that the equation has no integer solutions. □

Example 7.4. Determine all integer solutions of the equation $x^2 + 7y^2 + 35z^2 = 70493$.

Solution. If $(x, y, z)$ were a solution, then $x$ would satisfy the congruence $x^2 \equiv 70493 \equiv 3 \pmod 7$. But 3 is a quadratic non-residue modulo 7, so we conclude that there are no integer solutions. □

Pythagorean triples. A famous quadratic diophantine equation in three variables is the Pythagorean equation
\[
x^2 + y^2 = z^2. \tag{7.1}
\]
Notice that this equation has many “trivial” solutions, $(0, y, \pm y)$ and $(x, 0, \pm x)$, obtained by setting one of the variables on the left-hand side equal to zero. These solutions are not very interesting. Of course, there are some well-known right triangles with integer side lengths, which give non-trivial solutions such as $(3, 4, 5)$ and $(5, 12, 13)$. A solution to (7.1) is sometimes called a Pythagorean triple.

The equation (7.1) also has a special property called homogeneity, which means that if $(x, y, z)$ is a solution, then so is $(kx, ky, kz)$ for any integer $k$. For this reason, we usually restrict attention to the so-called primitive solutions, in which $x$, $y$, and $z$ have no non-trivial common factors. It turns out that we can express all primitive solutions of this equation as a two-parameter family. It is easy to show that in any primitive Pythagorean triple we must have $z$ odd and exactly one of $x$ and $y$ even. By interchanging $x$ and $y$ if necessary, we may suppose without loss of generality that $x$ is even.

Theorem 7.5. If $(x, y, z)$ is a primitive Pythagorean triple, where $x$ is even and $x$, $y$, and $z$ are positive, then
\[
x = 2st, \qquad y = s^2 - t^2, \qquad\text{and}\qquad z = s^2 + t^2,
\]
for some relatively prime positive integers $s$ and $t$. Conversely, if $s$ and $t$ are relatively prime, $s > t > 0$, and $s$ or $t$ is even, then $(2st, s^2 - t^2, s^2 + t^2)$ is a primitive Pythagorean triple.

Proof.
Let $(x, y, z)$ be a positive primitive Pythagorean triple with $x$ even and $y$ and $z$ odd. Then we have $x^2 = z^2 - y^2 = (z + y)(z - y)$, and both $z + y$ and $z - y$ are even, so we can write
\[
\left(\frac{x}{2}\right)^2 = \left(\frac{z+y}{2}\right)\left(\frac{z-y}{2}\right),
\]
where all three factors are integers. Any common divisor of $(z + y)/2$ and $(z - y)/2$ would have to divide their sum and difference, $z$ and $y$, but we know that $z$ and $y$ are relatively prime and hence so are $(z + y)/2$ and $(z - y)/2$. Since their product is a perfect square, it follows easily that both $(z + y)/2$ and $(z - y)/2$ must be perfect squares, say
\[
\frac{z+y}{2} = s^2 \qquad\text{and}\qquad \frac{z-y}{2} = t^2.
\]
The equations $x = 2st$, $y = s^2 - t^2$, and $z = s^2 + t^2$ follow immediately.

Conversely, it is easy to check that $(2st)^2 + (s^2 - t^2)^2 = (s^2 + t^2)^2$. Moreover, any odd prime dividing both $2st$ and $s^2 - t^2$ would have to divide either $s$ or $t$ and either $s + t$ or $s - t$, and in all of these cases the prime would divide both $s$ and $t$. Finally, if $s$ and $t$ are relatively prime and one of them is even, then $s^2 - t^2$ is odd, so 2 is not a common factor either. Thus if $(s, t) = 1$ and $s$ or $t$ is even, then the above triple is primitive. □

Example 7.6. Find all positive primitive Pythagorean triples with one of the variables equal to 15.

Solution. Since 15 is odd and is not the sum of two squares, Theorem 7.5 implies that $y = s^2 - t^2$ is the only variable that could take the value 15. So we seek positive integers $s > t$ such that $s^2 - t^2 = (s + t)(s - t) = 15$. Clearly, the only possibilities are
\[
s + t = 15, \quad s - t = 1 \qquad\text{and}\qquad s + t = 5, \quad s - t = 3,
\]
which yield $s = 8$, $t = 7$ and $s = 4$, $t = 1$. Hence the only Pythagorean triples of this type are $(112, 15, 113)$ and $(8, 15, 17)$. □

Theorem 7.7. The equation $x^4 + y^4 = z^2$ has no integer solutions with $xyz \ne 0$.

Proof. If $(x, y, z)$ is a solution with $\gcd(x, y) = d$, then $z^2$ is divisible by $d^4$ and hence $z$ is divisible by $d^2$, so we obtain a new solution $(x/d, y/d, z/d^2)$ with the first two variables relatively prime. Therefore we may suppose that $(x, y, z)$ is a solution with $x$, $y$, and $z$ positive, $\gcd(x, y) = 1$ and $z$ as small as possible. We will show how to construct a solution with a smaller value of $z$, thereby producing a contradiction.
Since $(x^2, y^2, z)$ is a positive primitive Pythagorean triple, we may apply Theorem 7.5 (after possibly interchanging $x$ and $y$) to write
\[
x^2 = 2st, \qquad y^2 = s^2 - t^2, \qquad\text{and}\qquad z = s^2 + t^2,
\]
where $s > t > 0$ and $\gcd(s, t) = 1$. Since $y$ is odd, it follows that $s$ is odd and $t$ is even, so we in fact have $\gcd(s, 2t) = 1$, and thus $s$ and $2t$ are both perfect squares, say $s = u^2$ and $2t = v^2$. Furthermore, we have $t^2 + y^2 = s^2$, so $(t, y, s)$ is another primitive Pythagorean triple, and we can apply Theorem 7.5 again to write
\[
t = 2ST, \qquad y = S^2 - T^2, \qquad\text{and}\qquad s = S^2 + T^2,
\]
where $S > T > 0$ and $\gcd(S, T) = 1$. We now have $ST = t/2 = (v/2)^2$, which implies that $S$ and $T$ are both perfect squares, say $S = X^2$ and $T = Y^2$. But now
\[
X^4 + Y^4 = S^2 + T^2 = s = u^2 \qquad\text{and}\qquad u^2 = s < (s^2 + t^2)^2 = z^2,
\]
so $u < z$, and taking $Z = u$ gives a new solution $(X, Y, Z)$ with $Z < z$. □

Corollary 7.8. The equation $x^4 + y^4 = z^4$ has no integer solutions with $xyz \ne 0$.

Proof. If $(x, y, z)$ were a solution with $xyz \ne 0$, then we would have $x^4 + y^4 = (z^2)^2$, contradicting Theorem 7.7. □

Theorem 7.9. (Fermat’s Last Theorem) If $k$ is an integer with $k \ge 3$, then the equation $x^k + y^k = z^k$ has no integer solutions with $xyz \ne 0$.

Note that this follows easily from Corollary 7.8 when $k$ is a multiple of 4. The proof for arbitrary $k$ is extremely hard and was completed by Wiles only in 1995. The following symmetric generalization of Fermat’s Last Theorem is still unsolved.

Conjecture 7.10. If $k$ is an integer with $k \ge 5$, then the equation $x^k + y^k = z^k + w^k$ has no non-trivial integer solutions.

Notice that there are non-trivial solutions to this equation when $k = 2$, 3, and 4. For instance, one has
\[
1^2 + 7^2 = 5^2 + 5^2, \qquad 1^3 + 12^3 = 9^3 + 10^3, \qquad\text{and}\qquad 133^4 + 134^4 = 158^4 + 59^4.
\]

Equations in “many” variables. A general theme illustrated above is that diophantine equations in few variables (relative to the degree) tend to have few, if any, non-trivial solutions.
Conversely, equations in sufficiently many variables (relative to the degree) tend to have many non-trivial solutions. One of the most interesting problems here is to try to quantify the phrase “sufficiently many.” As an example, we look at the problem of representing integers as sums of $k$th powers.

Theorem 7.11. (Lagrange’s four squares theorem) Every positive integer can be written as the sum of four squares.

For example, we have $31 = 5^2 + 2^2 + 1^2 + 1^2$ and $120 = 10^2 + 4^2 + 2^2 + 0^2$. We leave it as an exercise to show that there are infinitely many positive integers that cannot be represented as sums of three squares.

Lemma 7.12. If $m$ and $n$ are sums of four squares, then so is $mn$.

Proof. Suppose that $m = x^2 + y^2 + z^2 + w^2$ and $n = a^2 + b^2 + c^2 + d^2$. Then it is easy (but somewhat tedious) to verify that
\[
mn = (xa + yb + zc + wd)^2 + (xb - ya + zd - wc)^2 + (xc - za + wb - yd)^2 + (xd - wa + yc - zb)^2.
\]
We leave this algebra as an exercise. □

Lemma 7.13. If $p$ is an odd prime, then there exist integers $x$, $y$, and $k$, with $0 < k < p$, such that $x^2 + y^2 + 1 = kp$.

Proof. It suffices to find integers $x$ and $y$ with
\[
x^2 + y^2 + 1 \equiv 0 \pmod p \qquad\text{and}\qquad x^2 + y^2 + 1 < p^2. \tag{7.2}
\]
We divide the proof into two cases.

If $p \equiv 1 \pmod 4$, then Theorem 6.5 (ii) implies that $\left(\frac{-1}{p}\right) = 1$, so $-1$ is a quadratic residue modulo $p$. Therefore we can find $x$ with $0 < x < p/2$ such that $x^2 \equiv -1 \pmod p$, and (7.2) is satisfied with $y = 0$.

If $p \equiv 3 \pmod 4$, then Theorem 6.5 (ii) implies that $\left(\frac{-1}{p}\right) = -1$. Now let $a$ be the smallest quadratic non-residue modulo $p$. Then we have
\[
\left(\frac{-a}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{a}{p}\right) = (-1)(-1) = 1
\]
by Theorem 6.5 (iii), so $-a$ is a quadratic residue modulo $p$. Therefore we can find $x$ with $0 < x < p/2$ such that $x^2 \equiv -a \pmod p$. Furthermore, the minimality of $a$ ensures that $a - 1$ is a quadratic residue modulo $p$, so we can find $y$ with $0 < y < p/2$ such that $y^2 \equiv a - 1 \pmod p$. It is easy to check that $x$ and $y$ satisfy (7.2), so this completes the proof.
□

Proof of Lagrange’s Theorem: In view of Lemma 7.12 and the fact that $2 = 1^2 + 1^2 + 0^2 + 0^2$, it suffices to prove that every odd prime $p$ is the sum of four squares. By Lemma 7.13, we can find integers $x$, $y$, $z$, and $w$ such that
\[
x^2 + y^2 + z^2 + w^2 = kp \tag{7.3}
\]
for some positive integer $k < p$. For instance, take $x$ and $y$ as in the lemma, $z = 1$, and $w = 0$. We employ a descent argument to show that we can find a solution to (7.3) with $k = 1$. To do this, we suppose that we have a solution with $k > 1$ and demonstrate how to construct a solution with a smaller value of $k$.

First of all, if $k$ is even, then an even number of the variables on the left-hand side are odd, so by relabeling if necessary we may suppose that $x \pm y$ and $z \pm w$ are even, and
\[
\left(\frac{x+y}{2}\right)^2 + \left(\frac{x-y}{2}\right)^2 + \left(\frac{z+w}{2}\right)^2 + \left(\frac{z-w}{2}\right)^2 = (k/2)p.
\]
If $k/2$ is even, then we can repeat the argument until we obtain a solution to (7.3) with $k$ odd, so we may suppose from now on that $k$ is odd.

Now let $a$, $b$, $c$, and $d$ denote the least absolute value residues of $x$, $y$, $z$, and $w$ modulo $k$. That is,
\[
a \equiv x \pmod k, \qquad b \equiv y \pmod k, \qquad c \equiv z \pmod k, \qquad d \equiv w \pmod k,
\]
where $|a|, |b|, |c|, |d| < k/2$ since $k$ is odd. Then we have
\[
a^2 + b^2 + c^2 + d^2 \equiv x^2 + y^2 + z^2 + w^2 \equiv 0 \pmod k,
\]
so we can write $a^2 + b^2 + c^2 + d^2 = km$ for some integer $m$, and $a^2 + b^2 + c^2 + d^2 < k^2$, so we have $m < k$. If $m = 0$ then we would have $a = b = c = d = 0$, which would imply that $kp = x^2 + y^2 + z^2 + w^2$ is divisible by $k^2$. This cannot occur when $1 < k < p$ since $p$ is prime, so we conclude that $m > 0$.

Now by the proof of Lemma 7.12 we can write
\[
(kp)(km) = X^2 + Y^2 + Z^2 + W^2,
\]
where
\[
X = xa + yb + zc + wd, \quad Y = xb - ya + zd - wc, \quad Z = xc - za + wb - yd, \quad W = xd - wa + yc - zb,
\]
and it is easy to check that $X$, $Y$, $Z$, and $W$ are each divisible by $k$. It follows that
\[
(X/k)^2 + (Y/k)^2 + (Z/k)^2 + (W/k)^2 = mp,
\]
which gives a solution of (7.3) with $0 < m < k$. This completes the descent and shows that there is in fact a solution with $k = 1$.
□

Waring’s problem. One might ask whether similar results exist for higher powers. That is, given a positive integer $k$, can we find a positive integer $s$ such that all positive integers $n$ can be written in the form
\[
n = x_1^k + x_2^k + \cdots + x_s^k \tag{7.4}
\]
for some non-negative integers $x_1, x_2, \ldots, x_s$? This question was posed by Waring in 1770 (around the same time as Lagrange’s Theorem was proved) and has received considerable attention over the past century.

The original version of the problem seeks to determine $g(k)$, which is defined to be the smallest integer $s$ such that the above equation can be solved for every positive integer $n$. For example, one has $g(2) = 4$. It is also known that $g(3) = 9$, $g(4) = 19$, and $g(5) = 37$ and that
\[
g(k) \ge 2^k + \left\lfloor \left(\tfrac{3}{2}\right)^k \right\rfloor - 2 \tag{7.5}
\]
for all $k$, where $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$.

Notice that the integer 23 really does require 9 cubes in order to achieve a representation. Since $23 < 3^3$ and $23 < 2^3 + 2^3 + 2^3$, the most efficient decomposition is
\[
23 = 2^3 + 2^3 + 1^3 + 1^3 + 1^3 + 1^3 + 1^3 + 1^3 + 1^3.
\]
Amazingly, it turns out that 23 and 239 are the only two integers that actually require 9 cubes. In fact, there are only finitely many integers that require 8 cubes, and it follows that every sufficiently large integer can be expressed as the sum of 7 cubes.

In general, we define $G(k)$ to be the smallest integer $s$ such that every sufficiently large integer $n$ can be represented in the form (7.4). For example, it is known that $G(2) = 4$, $4 \le G(3) \le 7$, $G(4) = 16$, $6 \le G(5) \le 17$, and $9 \le G(6) \le 24$. It turns out that $G(k)$ grows much slower than $g(k)$ as $k \to \infty$, reflecting the fact that the representation of small integers poses some unusual difficulties that do not persist in the long run. In fact, it was shown by Wooley in 1992 that $G(k)$ grows no faster than $k \log k$ asymptotically, whereas (7.5) shows that the growth of $g(k)$ is exponential in $k$.
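The small cases quoted above are easy to check by machine. The following Python sketch is only an illustration: the function names `g_lower_bound` and `min_powers_table` are our own and do not come from the notes. It evaluates the right-hand side of (7.5) in exact integer arithmetic and uses a simple dynamic programme to find the least number of positive $k$th powers needed for each integer up to a chosen limit.

```python
def g_lower_bound(k):
    """Right-hand side of (7.5): 2^k + floor((3/2)^k) - 2, in exact integers."""
    return 2**k + 3**k // 2**k - 2


def min_powers_table(limit, k):
    """Return a list best with best[n] = least number of positive k-th powers
    summing to n, for 0 <= n <= limit (illustrative dynamic programme)."""
    powers = []
    x = 1
    while x**k <= limit:
        powers.append(x**k)
        x += 1
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # Remove one k-th power p <= n and reuse the answer for n - p.
        best[n] = 1 + min(best[n - p] for p in powers if p <= n)
    return best


if __name__ == "__main__":
    print([g_lower_bound(k) for k in range(2, 6)])
    cubes = min_powers_table(1000, 3)
    print([n for n in range(1, 1001) if cubes[n] == 9])
```

For $k = 2, 3, 4, 5$ the bound evaluates to 4, 9, 19, 37, matching the values of $g(k)$ stated above, and the search up to 1000 finds that 23 and 239 are the only integers in that range needing 9 cubes.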
In the 1920s, Hardy and Littlewood devised a method for counting the number of representations of $n$ in the form (7.4) by using a definite integral. Refinements of this strategy due to Vinogradov, Davenport, Vaughan, Wooley, and others have led to the sharpest available upper bounds for $G(k)$ when $k \ge 3$. Notice that even in the cubic case, the existing technology still leaves fairly large gaps between what is conjectured and what can be proved!

We give a very brief outline of the Hardy-Littlewood method. When $\alpha$ is a real number and $i = \sqrt{-1}$, write
\[
e(\alpha) = e^{2\pi i \alpha} = \cos(2\pi\alpha) + i \sin(2\pi\alpha).
\]
If $m$ is an integer, then it is easy to verify the orthogonality relations
\[
\int_0^1 e(\alpha m)\, d\alpha = \int_0^1 \cos(2\pi\alpha m)\, d\alpha + i \int_0^1 \sin(2\pi\alpha m)\, d\alpha =
\begin{cases} 1 & \text{if } m = 0, \\ 0 & \text{if } m \ne 0. \end{cases}
\]
If we let $P = \lfloor n^{1/k} \rfloor$ and introduce the exponential sum
\[
f(\alpha) = \sum_{x=1}^{P} e(\alpha x^k),
\]
then the fact that $e(a)e(b) = e(a + b)$ gives
\[
\int_0^1 f(\alpha)^s e(-\alpha n)\, d\alpha = \sum_{x_1=1}^{P} \cdots \sum_{x_s=1}^{P} \int_0^1 e\bigl(\alpha(x_1^k + \cdots + x_s^k - n)\bigr)\, d\alpha,
\]
and the orthogonality relations show that each term in the sum is 1 or 0 according to whether or not $x_1^k + \cdots + x_s^k = n$. The integral on the left therefore counts the representations of $n$ in this form, and demonstrating the existence of representations amounts to showing that the integral is positive. This is a non-trivial task that involves dissecting the interval $[0, 1]$ into two subsets according to the nature of the rational approximations to $\alpha$ and applying several types of estimates for the exponential sum $f(\alpha)$.

Notice that as the real variable $\alpha$ runs from 0 to 1, the complex variable $z = e^{2\pi i \alpha}$ traces out the unit circle $|z| = 1$. The original set-up devised by Hardy and Littlewood actually takes the latter perspective, using integrals over circles in the complex plane. For this reason, the technique is often referred to as the circle method, and the two subsets mentioned above are called major and minor arcs.

8.
Irrationality and transcendence

We have already seen in §2 that irrational numbers exist; for instance, $\sqrt{2} \notin \mathbb{Q}$. In fact, almost all real numbers are irrational, since the rationals form a countable set while the reals are uncountable. On the other hand, given any two real numbers $\alpha < \beta$, we can find a rational number lying between them. To see this, let $n$ be an integer with $n > 1/(\beta - \alpha)$, so that $n\beta - n\alpha > 1$. Clearly there must be an integer $m$ between $n\alpha$ and $n\beta$, and it follows that the rational number $m/n$ lies between $\alpha$ and $\beta$. In particular, by choosing $\beta$ sufficiently close to $\alpha$, we can find a rational number that approximates $\alpha$ to any desired degree of accuracy. This property is often expressed by saying that the rationals are dense in the reals.

In number theory, we often desire more quantitative information about rational approximations. For instance, how does the quality of the approximation improve as we allow the denominator to increase? This is the type of information that determines how we dissect into major and minor arcs in the Hardy-Littlewood method. One simple answer is given by the following theorem.

Theorem 8.1. (Dirichlet’s theorem on diophantine approximation) Given a real number $\alpha$ and an integer $P \ge 2$, there exist integers $a$ and $q$ with $(a, q) = 1$ and $1 \le q \le P - 1$ such that
\[
\left| \alpha - \frac{a}{q} \right| \le \frac{1}{qP}.
\]

Proof. It suffices to prove the result for $\alpha \in [0, 1]$, since the general case can then be obtained by replacing $a/q$ by $\lfloor \alpha \rfloor + a/q$. We divide the interval $[0, 1]$ into $P$ subintervals, each of length $1/P$, and consider the values of $q\alpha - \lfloor q\alpha \rfloor$ as $q$ runs over the integers $1, 2, 3, \ldots, P - 1$. First of all, if $q\alpha - \lfloor q\alpha \rfloor$ lies in the interval $[0, 1/P]$ for some $q$, then taking $a = \lfloor q\alpha \rfloor$ gives $|q\alpha - a| \le 1/P$. Similarly, if $q\alpha - \lfloor q\alpha \rfloor$ lies in the interval $[1 - 1/P, 1]$ for some $q$, then taking $a = \lfloor q\alpha \rfloor + 1$ gives $|q\alpha - a| \le 1/P$.
If none of these $P - 1$ values lies in the first or last subinterval, then the pigeonhole principle ensures that two of them must lie in one of the remaining $P - 2$ subintervals. That is, we have
\[
\bigl| (q_2\alpha - \lfloor q_2\alpha \rfloor) - (q_1\alpha - \lfloor q_1\alpha \rfloor) \bigr| \le 1/P
\]
for some integers $q_1$ and $q_2$ with $1 \le q_1 < q_2 \le P - 1$. Taking $q = q_2 - q_1$ and $a = \lfloor q_2\alpha \rfloor - \lfloor q_1\alpha \rfloor$ again gives $|q\alpha - a| \le 1/P$. Finally, if $(a, q) = d$ then setting $a' = a/d$ and $q' = q/d$ gives $(q', a') = 1$ and $|q'\alpha - a'| \le 1/(dP) \le 1/P$, which completes the proof. □

Corollary 8.2. If $\alpha$ is an irrational number, then there are infinitely many rational numbers $a/q$ for which
\[
\left| \alpha - \frac{a}{q} \right| < \frac{1}{q^2}.
\]

Proof. If there were only finitely many such rational approximations to $\alpha$, then we could find one, say $a/q$, with $\delta = |\alpha - a/q|$ minimal. Since $\alpha \notin \mathbb{Q}$, we have $\delta > 0$, so we may let $P = \lfloor 1/\delta \rfloor + 1 > 1/\delta$. By Theorem 8.1, we can find a rational number $b/r$ with $1 \le r < P$ and
\[
\left| \alpha - \frac{b}{r} \right| \le \frac{1}{rP} < \frac{1}{r^2}.
\]
Since $1/(rP) < \delta$, this contradicts the minimality of $\delta$. □

Note that if $\alpha$ is rational then the inequality in Corollary 8.2 has only finitely many solutions. To see this, write $\alpha = m/n$ and note that if $m/n \ne a/q$ then we have
\[
\left| \frac{m}{n} - \frac{a}{q} \right| = \frac{|mq - an|}{nq} \ge \frac{1}{nq} \ge \frac{1}{q^2}
\]
whenever $q \ge n$, so the only possible solutions come from $1 \le q < n$. A theorem of Hurwitz shows that there are in fact infinitely many solutions of
\[
\left| \alpha - \frac{a}{q} \right| < \frac{1}{\sqrt{5}\, q^2}
\]
when $\alpha$ is irrational. This turns out to be best possible in the sense that the result fails if the constant $1/\sqrt{5}$ is replaced by anything smaller. However, the golden ratio $\alpha = \frac{1}{2}(1 + \sqrt{5})$ provides essentially the only counterexample!

Continued fractions. One way of generating good rational approximations to an irrational number $\alpha$ is to construct the continued fraction expansion
\[
\alpha = x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cfrac{1}{x_4 + \cdots}}}}.
\]
To save space, this is sometimes denoted by $\alpha = [x_0; x_1, x_2, x_3, x_4, \ldots]$.
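To make the bracket notation concrete, here is a small Python sketch that evaluates a truncated expansion $[x_0; x_1, \ldots, x_n]$ exactly, working from the innermost fraction outwards. The function name `evaluate_cf` is ours, chosen for illustration, and the choice of the all-ones expansion as a demonstration is also an assumption of this sketch rather than part of the notes.

```python
from fractions import Fraction


def evaluate_cf(partial_quotients):
    """Exact value of the finite continued fraction [x0; x1, ..., xn].

    Starts from the last entry and repeatedly forms x + 1/value,
    so every entry after x0 must be nonzero.
    """
    value = Fraction(partial_quotients[-1])
    for x in reversed(partial_quotients[:-1]):
        value = x + 1 / value
    return value


if __name__ == "__main__":
    golden = (1 + 5 ** 0.5) / 2
    for length in range(1, 11):
        truncated = evaluate_cf([1] * length)
        # Truncations of [1; 1, 1, ...] compared with the golden ratio.
        print(length, truncated, abs(float(truncated) - golden))
```

For instance, `evaluate_cf([1, 1, 1, 1, 1])` returns `Fraction(8, 5)`, and the printed errors shrink as more terms are kept, which illustrates how truncating an expansion produces rational approximations.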
We can construct continued fractions for rational numbers as well, but in this case the expansion is finite.

Example 8.3. Express the rational number $\frac{125}{54}$ as a finite continued fraction.

Solution. We first split off the integer part by writing
\[
\frac{125}{54} = 2 + \frac{17}{54}.
\]
Next we take the reciprocal of the fractional part and repeat the process. We have
\[
\frac{54}{17} = 3 + \frac{3}{17}, \qquad \frac{17}{3} = 5 + \frac{2}{3}, \qquad\text{and}\qquad \frac{3}{2} = 1 + \frac{1}{2}.
\]
Thus we have
\[
\frac{125}{54} = 2 + \cfrac{1}{3 + \cfrac{1}{5 + \cfrac{1}{1 + \cfrac{1}{2}}}} = [2; 3, 5, 1, 2].
\]
□

Example 8.4. What real number is represented by the continued fraction $[1; 1, 1, 1, 1, \ldots]$?

Solution. If $\alpha = [1; 1, 1, 1, 1, \ldots]$ then we have
\[
\alpha = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}} = 1 + \frac{1}{\alpha}.
\]
It follows that $\alpha^2 - \alpha - 1 = 0$, and since $\alpha$ is clearly positive we may conclude that
\[
\alpha = \frac{1 + \sqrt{5}}{2}.
\]
□

To generate the continued fraction for $\alpha$, we first take $x_0 = \lfloor \alpha \rfloor$ and then write
\[
\alpha_1 = \frac{1}{\alpha - x_0} \qquad\text{and}\qquad x_1 = \lfloor \alpha_1 \rfloor.
\]
In general, if $\alpha_n$ and $x_n$ have been defined, we take
\[
\alpha_{n+1} = \frac{1}{\alpha_n - x_n} \qquad\text{and}\qquad x_{n+1} = \lfloor \alpha_{n+1} \rfloor.
\]

Example 8.5. Compute the continued fraction for $\sqrt{2}$.

Solution. First of all, we have $x_0 = \lfloor \sqrt{2} \rfloor = 1$. Next, we have
\[
\alpha_1 = \frac{1}{\sqrt{2} - 1} = \sqrt{2} + 1,
\]
and hence $x_1 = 2$. Furthermore,
\[
\alpha_2 = \frac{1}{\alpha_1 - 2} = \frac{1}{\sqrt{2} - 1} = \alpha_1,
\]
and hence $x_2 = 2$. Since $\alpha_{n+1}$ depends only on $\alpha_n$ and $x_n$, we can conclude that $x_n = 2$ for all $n \ge 1$. Therefore, $\sqrt{2} = [1; 2, 2, 2, 2, \ldots] = [1; \overline{2}]$. □

By truncating the continued fraction obtained above, we can obtain rational approximations to $\sqrt{2}$, for instance
\[
\frac{p_1}{q_1} = 1 + \frac{1}{2} = \frac{3}{2}, \qquad
\frac{p_2}{q_2} = 1 + \cfrac{1}{2 + \cfrac{1}{2}} = \frac{7}{5}, \qquad\text{and}\qquad
\frac{p_3}{q_3} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2}}} = \frac{17}{12}.
\]
The rational number $p_n/q_n$ is called the $n$th convergent to $\alpha$, and the integer $x_n$ is called the $n$th partial quotient of $\alpha$. It turns out that the convergents satisfy some simple recurrence relations, which make them easy to compute once the partial quotients are known.

Theorem 8.6. If $\alpha$ has the continued fraction expansion $[x_0; x_1, x_2, x_3, \ldots]$, then the $n$th convergent to $\alpha$ is the rational number $p_n/q_n$ defined by the recurrence relations
\[
p_n = x_n p_{n-1} + p_{n-2} \qquad\text{and}\qquad q_n = x_n q_{n-1} + q_{n-2} \qquad (n \ge 0),
\]
where we take $p_{-1} = 1$, $q_{-1} = 0$, $p_{-2} = 0$, and $q_{-2} = 1$.

Proof. We regard the convergents $p_n/q_n$ as functions of the partial quotients. That is,
\[
p_n = p_n(x_0, x_1, \ldots, x_n) \qquad\text{and}\qquad q_n = q_n(x_0, x_1, \ldots, x_n).
\]
The result is clear for $n = 0$, since the recursions give $p_0 = x_0$ and $q_0 = 1$. Now suppose that $[x_0; x_1, \ldots, x_{n-1}] = p_{n-1}/q_{n-1}$. Then we can write
\[
[x_0; x_1, \ldots, x_{n-1}, x_n] = \Bigl[x_0; x_1, \ldots, x_{n-1} + \frac{1}{x_n}\Bigr]
= \frac{p_{n-1}\bigl(x_0, x_1, \ldots, x_{n-1} + \frac{1}{x_n}\bigr)}{q_{n-1}\bigl(x_0, x_1, \ldots, x_{n-1} + \frac{1}{x_n}\bigr)}.
\]
Applying the above recurrence relations, we obtain
\[
[x_0; x_1, \ldots, x_{n-1}, x_n]
= \frac{\bigl(x_{n-1} + \frac{1}{x_n}\bigr) p_{n-2} + p_{n-3}}{\bigl(x_{n-1} + \frac{1}{x_n}\bigr) q_{n-2} + q_{n-3}}
= \frac{x_n x_{n-1} p_{n-2} + p_{n-2} + x_n p_{n-3}}{x_n x_{n-1} q_{n-2} + q_{n-2} + x_n q_{n-3}}
\]
\[
= \frac{x_n (x_{n-1} p_{n-2} + p_{n-3}) + p_{n-2}}{x_n (x_{n-1} q_{n-2} + q_{n-3}) + q_{n-2}}
= \frac{x_n p_{n-1} + p_{n-2}}{x_n q_{n-1} + q_{n-2}}
= \frac{p_n}{q_n}.
\]
The result follows by induction. □

Example 8.7. Find the continued fraction expansion for $\sqrt{29}$, and compute the first 6 convergents.

Solution. We have $x_0 = \lfloor \sqrt{29} \rfloor = 5$, and thus
\[
\alpha_1 = \frac{1}{\sqrt{29} - 5} = \frac{\sqrt{29} + 5}{4} = 2 + \frac{\sqrt{29} - 3}{4}.
\]
It follows that $x_1 = 2$ and
\[
\alpha_2 = \frac{4}{\sqrt{29} - 3} = \frac{\sqrt{29} + 3}{5} = 1 + \frac{\sqrt{29} - 2}{5}.
\]
This in turn gives $x_2 = 1$ and
\[
\alpha_3 = \frac{5}{\sqrt{29} - 2} = \frac{\sqrt{29} + 2}{5} = 1 + \frac{\sqrt{29} - 3}{5},
\]
which yields $x_3 = 1$ and
\[
\alpha_4 = \frac{5}{\sqrt{29} - 3} = \frac{\sqrt{29} + 3}{4} = 2 + \frac{\sqrt{29} - 5}{4}.
\]
Now we have $x_4 = 2$ and
\[
\alpha_5 = \frac{4}{\sqrt{29} - 5} = \sqrt{29} + 5 = 10 + (\sqrt{29} - 5),
\]
and from this we see that $x_5 = 10$ and $\alpha_6 = \alpha_1$, which means that the continued fraction becomes periodic. We therefore conclude that $\sqrt{29} = [5; \overline{2, 1, 1, 2, 10}]$, and we can use Theorem 8.6 to compute the convergents. We have $p_0 = 5$, $p_1 = 2 \cdot 5 + 1 = 11$, $p_2 = 1 \cdot 11 + 5 = 16$, $p_3 = 1 \cdot 16 + 11 = 27$, $p_4 = 2 \cdot 27 + 16 = 70$, and $p_5 = 10 \cdot 70 + 27 = 727$. Similarly, we get $q_0 = 1$, $q_1 = 2$, $q_2 = 1 \cdot 2 + 1 = 3$, $q_3 = 1 \cdot 3 + 2 = 5$, $q_4 = 2 \cdot 5 + 3 = 13$, and $q_5 = 10 \cdot 13 + 5 = 135$.
Hence the first 6 convergents are
\[
5, \quad \frac{11}{2}, \quad \frac{16}{3}, \quad \frac{27}{5}, \quad \frac{70}{13}, \quad\text{and}\quad \frac{727}{135}.
\]
□

Algebraic and transcendental numbers. A real number that is a root of a non-trivial polynomial with integer coefficients is said to be algebraic. More precisely, if $\alpha$ is a root of a polynomial of degree $k$ with integer coefficients that is irreducible over $\mathbb{Q}$, then we say that $\alpha$ is algebraic of degree $k$. Note that any rational number $p/q$ is algebraic of degree one, since it is a root of the polynomial $f(x) = qx - p$. Any real number of the form $a \pm b\sqrt{d}$, where $a$, $b$, and $d$ are rational, $b \ne 0$, and $d$ is not a perfect square, is algebraic of degree two. For instance, $\frac{1}{2}(1 + \sqrt{5})$ is algebraic of degree two, since it is a root of $f(x) = x^2 - x - 1$. Algebraic numbers of degree two are sometimes called quadratic irrationals. It turns out that a number is a quadratic irrational if and only if it has an eventually periodic continued fraction expansion.

The set of algebraic numbers is closed under addition and multiplication, but the set of algebraic numbers of degree $k$ is not. For instance, $\sqrt{2}$ and $\sqrt{3}$ are algebraic of degree 2, but $\sqrt{2} + \sqrt{3}$ is algebraic of degree 4 and $\sqrt{2} \cdot \sqrt{2} = 2$ is algebraic of degree 1.

Example 8.8. Prove that $\alpha = \sqrt{2} + \sqrt{3}$ is algebraic.

Solution. First of all, we have $\alpha^2 = 2 + 2\sqrt{6} + 3$, and hence $\alpha^2 - 5 = 2\sqrt{6}$. Squaring both sides gives $\alpha^4 - 10\alpha^2 + 25 = 24$, or $\alpha^4 - 10\alpha^2 + 1 = 0$. Thus $\alpha$ is a root of the polynomial $f(x) = x^4 - 10x^2 + 1$ and hence is algebraic of degree at most 4. One can in fact show that $f$ is irreducible over $\mathbb{Q}$ and hence that $\alpha$ is algebraic of degree 4. □

Real numbers that are not algebraic are called transcendental. Probably the two most famous transcendental numbers are $e$ and $\pi$. Proving the transcendence of $e$ and $\pi$ is beyond the scope of the course; however, it is not too difficult to show that $e$ is irrational.

Theorem 8.9. The number $e$ is irrational.

Proof.
Suppose to the contrary that $e$ is rational, say $e = p/q$, where $p$ and $q$ are integers with $q \ge 1$. We recall that $e$ can be expressed as the infinite series
\[
e = \sum_{k=0}^{\infty} \frac{1}{k!}.
\]
Let $n \ge 2q$ be an integer, and let $e_n$ denote the $n$th partial sum of this series; that is,
\[
e_n = \sum_{k=0}^{n} \frac{1}{k!} = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \cdots + \frac{1}{n!}.
\]
Clearly $e_n$ is rational, and we can write $e_n = a/n!$ for some integer $a$. Moreover, we have $e > e_n$, and thus
\[
e - e_n = \frac{p}{q} - \frac{a}{n!} = \frac{p\,n! - aq}{q\,n!} \ge \frac{1}{q\,n!}.
\]
On the other hand, we have
\[
e - e_n = \sum_{k=n+1}^{\infty} \frac{1}{k!} = \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \frac{1}{(n+3)!} + \cdots
\]
\[
< \frac{1}{(n+1)!} \left( 1 + \frac{1}{n} + \frac{1}{n^2} + \cdots \right) = \frac{1}{(n+1)!} \cdot \frac{1}{1 - 1/n} \le \frac{2}{(n+1)!}
\]
since $n \ge 2$. Combining our two inequalities, we obtain
\[
\frac{1}{q\,n!} \le e - e_n \le \frac{2}{(n+1)!},
\]
which implies that $n \le 2q - 1$, a contradiction. □

The idea of the preceding proof may be summarized by saying that $e$ has rational approximations (namely $e_n$) that are “too good” to allow $e$ to be rational, since two distinct rationals differ by at least the reciprocal of the product of the denominators. The following theorem may be viewed as a generalization of this idea. It states that algebraic numbers cannot have fantastically good rational approximations.

Theorem 8.10. (Liouville’s Theorem) Suppose that $\alpha$ is an algebraic number of degree $k \ge 2$. Then there exists a positive constant $c_\alpha$ such that
\[
\left| \alpha - \frac{a}{q} \right| > \frac{c_\alpha}{q^k}
\]
for all integers $a$ and $q$ with $q \ge 1$.

Proof. Suppose that $\alpha$ is a root of the irreducible polynomial
\[
P(x) = b_k x^k + b_{k-1} x^{k-1} + \cdots + b_1 x + b_0,
\]
where $k \ge 2$, and let $a$ and $q$ be integers with $q \ge 1$. First of all, we note that $P(a/q) \ne 0$, since $P$ is irreducible of degree at least two. Furthermore, it is clear that $q^k P(a/q)$ is an integer and hence that $q^k |P(a/q)| \ge 1$. Since $\alpha$ is a root of $P$, we may write $P(x) = (x - \alpha)Q(x)$, where $Q$ is a polynomial of degree $k - 1$, not necessarily with integer coefficients.
Since $Q$ is a continuous function, we know that it attains maximum and minimum values on any closed, bounded interval. Therefore, there exists $M_\alpha > 0$ such that $|Q(x)| \le M_\alpha$ for all $x \in [\alpha - 1, \alpha + 1]$. We set $c_\alpha = (1 + M_\alpha)^{-1}$ and consider two cases. If $|\alpha - a/q| \le 1$, then we have
\[
q^{-k} \le |P(a/q)| = |\alpha - a/q|\,|Q(a/q)| \le |\alpha - a/q|\, M_\alpha < |\alpha - a/q|\, c_\alpha^{-1},
\]
which gives $|\alpha - a/q| > c_\alpha q^{-k}$, as required. If $|\alpha - a/q| > 1$, then the desired inequality follows from the observation that $c_\alpha \le 1$. □

Example 8.11. Find an admissible value for $c_\alpha$ in Liouville’s Theorem when $\alpha = \sqrt[3]{2}$.

Solution. In the notation of the above proof, we have
\[
P(x) = x^3 - 2 = (x - \alpha)(x^2 + \alpha x + \alpha^2) = (x - \alpha)Q(x).
\]
Since $Q'(x) = 2x + \alpha$, we find that $Q$ is increasing on the interval $[\alpha - 1, \alpha + 1]$ and hence that $Q(\alpha - 1) \le Q(x) \le Q(\alpha + 1)$ for all $x$ in the interval $[\alpha - 1, \alpha + 1]$. Since $Q(\alpha - 1) > 0$ and $Q(\alpha + 1) = 3\alpha^2 + 3\alpha + 1 < 9.542$, we can take $M_\alpha = 9.542$ and thus any $c_\alpha < (10.542)^{-1}$ is admissible. For example, one has
\[
\left| \sqrt[3]{2} - \frac{a}{q} \right| > \frac{1}{11 q^3}
\]
for all integers $a$ and all positive integers $q$. □

One might hope that the proof of Theorem 8.9 could be modified to show that $e$ is transcendental using the contrapositive of Liouville’s Theorem. However, the quality of the rational approximations $e_n$ is not sufficient to make this argument work. We note that $e_n$ has denominator $q = n!$, but $2/(n+1)! > 1/(n!)^2 = 1/q^2$, so the inequality $|e - e_n| < 2/(n+1)!$ doesn’t even rule out the possibility that $e$ is a quadratic irrational! Therefore a more sophisticated argument is required to prove that $e$ is transcendental. However, we can establish the existence of transcendental numbers by working with a series that converges much faster.

Theorem 8.12. The number
\[
\alpha = \sum_{j=1}^{\infty} 10^{-j!} = 0.11000100000000000000000100000000\ldots
\]
is transcendental.

Proof. We write
\[
\frac{a_n}{q_n} = \sum_{j=1}^{n} 10^{-j!}, \qquad\text{where } q_n = 10^{n!}.
\]
We then have
\[
\left| \alpha - \frac{a_n}{q_n} \right| = \sum_{j=n+1}^{\infty} 10^{-j!} = 10^{-(n+1)!} + 10^{-(n+2)!} + 10^{-(n+3)!} + \cdots
\]
\[
< 10^{-(n+1)!} \left( 1 + 10^{-1} + 10^{-2} + \cdots \right) = \frac{10}{9} \cdot 10^{-(n+1)!} = \frac{10}{9}\, q_n^{-(n+1)}.
\]
If $\alpha$ is algebraic of degree $k \ge 2$, then Liouville’s Theorem implies that there is a constant $c > 0$ such that $|\alpha - a_n/q_n| > c\,q_n^{-k}$ for all $n$. This statement holds for $k = 1$ as well, since $\alpha \ne a_n/q_n$ and hence $\alpha = b/r$ implies $|\alpha - a_n/q_n| \ge (r q_n)^{-1}$, whence we can take $c = 1/(r + 1)$. Thus if $\alpha$ is algebraic of degree $k$ we have
\[
c\,q_n^{-k} < |\alpha - a_n/q_n| < \frac{10}{9}\, q_n^{-(n+1)},
\]
and thus $q_n^{\,n+1-k} < 10/(9c)$. But $q_n \to \infty$ as $n \to \infty$, so we obtain a contradiction by taking $n$ sufficiently large in terms of $k$ and $c$. □

Some open questions. A real number $\alpha$ is said to be badly approximable if there is a positive constant $c_\alpha$ such that $|\alpha - a/q| > c_\alpha q^{-2}$ for all integers $a$ and $q$ with $q \ge 1$. Liouville’s Theorem shows that all algebraic numbers of degree two (i.e., all quadratic irrationals) are badly approximable. It is conjectured that no algebraic numbers of degree greater than two are badly approximable, but this has not been proven. It turns out that a number is badly approximable if and only if the partial quotients in its continued fraction expansion are bounded. For instance, $e$ is not badly approximable, for it can be shown that
\[
e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, \ldots].
\]
By contrast, it is unknown whether $\pi$ is badly approximable (the conjecture is that it’s not). We do know that most real numbers are not badly approximable, in the sense that the badly approximable numbers have measure zero in the real line. However, the set of badly approximable numbers is uncountable (like the reals), whereas the set of algebraic numbers is countable (like the integers and the rationals). Therefore, there are uncountably many badly approximable transcendental numbers, but producing a single specific example seems to be non-trivial.

On a more basic level, much is still unknown about the irrationality and transcendence of familiar numbers.
For instance, it is not known whether the numbers $\pi \pm e$, $\pi/e$, $\pi^e$, $\pi^{\sqrt{2}}$, $2^e$, and $\log \pi$ are irrational. As another example, consider the Riemann zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \]
for $s > 1$. When $s$ is a positive even integer, it is known that $\zeta(s)$ is a rational multiple of $\pi^s$ (and hence transcendental); for example, $\zeta(2) = \pi^2/6$. Much less is known when $s$ is odd. It was proved by Apéry in 1979 that $\zeta(3)$ is irrational, but it is unknown whether $\zeta(3)$ is transcendental. It is unknown whether $\zeta(5)$ is irrational, although it has been shown that at least one of $\zeta(5)$, $\zeta(7)$, $\zeta(9)$, and $\zeta(11)$ must be irrational (Zudilin, 2001). In fact, it is known that there are infinitely many odd integers $s$ for which $\zeta(s)$ is irrational (Rivoal, 2000), but the irrationality is not known for any particular odd $s > 3$.

9. The distribution of primes

Suppose that $p_n$ denotes the $n$th prime, so that $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, and so on. One of the central problems in analytic number theory is to obtain precise information about the behavior of $p_n$ as $n \to \infty$. A simple way to get an idea of how the sequence $(p_n)$ is distributed is to look at the sum of the reciprocals of the terms:
\[ \sum_{n=1}^{\infty} \frac{1}{p_n} = \frac12 + \frac13 + \frac15 + \frac17 + \frac{1}{11} + \frac{1}{13} + \frac{1}{17} + \cdots. \tag{9.1} \]
As a starting point, we recall from calculus that the harmonic series $1 + \frac12 + \frac13 + \frac14 + \cdots$ diverges. The following lemma provides quantitative information about the growth rate of the partial sums. Specifically, it says that the sum of the first $N$ terms is roughly $\log N$.

Lemma 9.1. For all positive integers $N$, one has
\[ 0 < 1 + \frac12 + \frac13 + \frac14 + \cdots + \frac1N - \log N \le 1. \]

Proof. By using a left-hand Riemann sum to over-estimate the area under the graph of $y = 1/t$, we find that
\[ 1 + \frac12 + \frac13 + \cdots + \frac1N > \int_1^{N+1} \frac{dt}{t} = \log(N+1) > \log N. \]
By considering a right-hand Riemann sum we similarly obtain
\[ 1 + \frac12 + \frac13 + \cdots + \frac1N \le 1 + \int_1^{N} \frac{dt}{t} = 1 + \log N, \]
and the result follows by subtracting $\log N$ from each side of the above inequalities.
□

It turns out that the quantity considered above actually approaches a limit as $N \to \infty$, known as Euler’s constant:
\[ \gamma = \lim_{N \to \infty} \left( 1 + \frac12 + \frac13 + \cdots + \frac1N - \log N \right) = 0.57721566490153286060651209008240243\ldots \]
Although Euler’s constant is known to over 1,000,000 decimal places, it is still unknown whether it is irrational. It is conjectured to be transcendental.

We now return to the prime harmonic series (9.1). It turns out that the $N$th partial sum of this series is on the order of $\log\log N$ rather than $\log N$. In what follows, it is useful to call an integer square-free if it is not divisible by the square of any prime. In other words, $n$ is square-free if we can write $n = p_1 \cdots p_r$, where $p_1, \ldots, p_r$ are distinct primes. For instance, the integer $42 = 2 \cdot 3 \cdot 7$ is square-free but $45 = 3^2 \cdot 5$ is not. From now on, the letter $p$ is reserved to denote a prime unless otherwise indicated.

Theorem 9.2. For every integer $N > 1$, one has
\[ \sum_{p \le N} \frac1p > \log\log N - \log 2. \]

Proof. Every positive integer $n$ can be written uniquely in the form $n = qm^2$, where $q$ and $m$ are positive integers with $q$ square-free. Using Lemma 9.1, we obtain
\[ \log N < \sum_{n \le N} \frac1n = \sum_{\substack{q \le N \\ q\ \text{squarefree}}} \ \sum_{m \le \sqrt{N/q}} \frac{1}{qm^2} \le \sum_{\substack{q \le N \\ q\ \text{squarefree}}} \frac1q \sum_{m=1}^{\infty} \frac{1}{m^2}. \]
Furthermore, we have
\[ \sum_{m=1}^{\infty} \frac{1}{m^2} < 1 + \int_1^{\infty} \frac{dt}{t^2} = 2, \]
and the inequality $1 + u \le e^u$ yields
\[ \sum_{\substack{q \le N \\ q\ \text{squarefree}}} \frac1q \le \prod_{p \le N} \left( 1 + \frac1p \right) \le \prod_{p \le N} e^{1/p} = \exp\left( \sum_{p \le N} \frac1p \right). \]
We therefore deduce that
\[ \log N < 2 \exp\left( \sum_{p \le N} \frac1p \right), \]
and taking logarithms gives
\[ \log\log N < \log 2 + \sum_{p \le N} \frac1p, \]
as required. □

Corollary 9.3. The prime harmonic series $\sum_{n=1}^{\infty} 1/p_n$ diverges.

Proof. This follows immediately from Theorem 9.2, since $\lim_{N \to \infty} (\log\log N) = \infty$. □

We may interpret the divergence of $\sum p_n^{-1}$ to mean that the primes are not all that sparsely distributed. For instance, if the primes were as sparse as the sequence of perfect squares then the series would converge by comparison with $\sum n^{-2}$.
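As a numerical sanity check on Lemma 9.1 and Theorem 9.2, the following Python sketch tabulates the harmonic sum and the prime reciprocal sum alongside $\log N$ and $\log\log N - \log 2$. The function names (`primes_up_to`, `harmonic_sum`, `prime_reciprocal_sum`) are illustrative choices rather than notation from these notes, and the primes are generated with a basic sieve of Eratosthenes.

```python
import math


def primes_up_to(limit):
    """Return the list of primes p <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Cross out the multiples of i, starting from i*i.
            count = len(range(i * i, limit + 1, i))
            is_prime[i * i :: i] = bytearray(count)
    return [n for n in range(2, limit + 1) if is_prime[n]]


def harmonic_sum(limit):
    """Return 1 + 1/2 + ... + 1/limit."""
    return sum(1.0 / n for n in range(1, limit + 1))


def prime_reciprocal_sum(limit):
    """Return the sum of 1/p over all primes p <= limit."""
    return sum(1.0 / p for p in primes_up_to(limit))


if __name__ == "__main__":
    print("N, H_N, log N, sum 1/p, log log N - log 2")
    for N in (10, 100, 10_000, 1_000_000):
        print(
            N,
            round(harmonic_sum(N), 4),
            round(math.log(N), 4),
            round(prime_reciprocal_sum(N), 4),
            round(math.log(math.log(N)) - math.log(2), 4),
        )
```

In each row the second column should exceed the third by at most 1 (Lemma 9.1), and the fourth column should exceed the fifth (Theorem 9.2).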
On the other hand, comparing the orders of growth of the partial sums in Lemma 9.1 and Theorem 9.2 indicates that the primes are, at least in some sense, significantly sparser than the integers themselves.

In order to obtain more precise information about the growth of $p_n$, it is useful to define $\pi(n)$ to be the number of primes $p$ with $p \le n$. We aim to derive some elementary bounds for $\pi(n)$ due to Chebyshev and then use our results to obtain bounds on $p_n$. We begin with two simple combinatorial lemmas.

Lemma 9.4. One has
\[ 2^n \le \binom{2n}{n} < 4^n \]
for all positive integers $n$.

Proof. By the binomial theorem, we have
\[ 4^n = (1 + 1)^{2n} = \sum_{k=0}^{2n} \binom{2n}{k} > \binom{2n}{n}. \]
The other inequality may be established by a simple induction argument and is left as an exercise. □

Lemma 9.5. One has
\[ \log n! = \sum_{p \le n} \sum_{m=1}^{\lfloor \log_p n \rfloor} \left\lfloor \frac{n}{p^m} \right\rfloor \log p. \]

Proof. Since $n! = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$, it is clear that there are no primes $p > n$ dividing $n!$, so we may write $n! = \prod_{p \le n} p^{\alpha_p}$, where $\alpha_p$ is a non-negative integer representing the exact power of $p$ that divides $n!$. Taking logarithms gives
\[ \log n! = \sum_{p \le n} \alpha_p \log p, \]
so it remains to find a formula for $\alpha_p$. Among the integers $1, 2, 3, \ldots, n-1, n$, there are $\lfloor n/p \rfloor$ multiples of $p$. Of these, $\lfloor n/p^2 \rfloor$ are also multiples of $p^2$, and in general $\lfloor n/p^m \rfloor$ of them are multiples of $p^m$. Since $p^m > n$ when $m > \log_p n$, we see that
\[ \alpha_p = \left\lfloor \frac{n}{p} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots = \sum_{m=1}^{\lfloor \log_p n \rfloor} \left\lfloor \frac{n}{p^m} \right\rfloor, \]
and this completes the proof. □

Theorem 9.6. For every integer $n \ge 2$ one has
\[ \frac{n}{6 \log n} < \pi(n) < \frac{6n}{\log n}. \]

Proof. By taking logarithms in the result of Lemma 9.4, we obtain
\[ n \log 2 \le \log (2n)! - 2 \log n! < n \log 4, \]
since $\binom{2n}{n} = (2n)!/(n!)^2$. Lemma 9.5 therefore gives
\[ n \log 2 \le \sum_{p \le 2n} \sum_{m=1}^{\lfloor \log_p 2n \rfloor} \left( \left\lfloor \frac{2n}{p^m} \right\rfloor - 2 \left\lfloor \frac{n}{p^m} \right\rfloor \right) \log p < n \log 4. \tag{9.2} \]
Now since $\lfloor 2x \rfloor - 2\lfloor x \rfloor$ is 0 if $0 \le x - \lfloor x \rfloor < 1/2$ and 1 otherwise, we find that
\[ \frac{n}{2} < n \log 2 \le \sum_{p \le 2n} (\log_p 2n)(\log p) = \pi(2n) \log 2n, \]
and hence $\pi(2n) > 2n/(4 \log 2n)$, which establishes the lower bound for even integers. Since $2n \ge \frac23 (2n+1)$ for $n \ge 1$, we also have
\[ \pi(2n+1) \ge \pi(2n) > \frac{2n}{4 \log 2n} > \frac{2n+1}{6 \log(2n+1)}, \]
which proves the lower bound for odd integers. For the upper bound, we delete all but the $m = 1$ term from (9.2) to obtain
\[ \sum_{p \le 2n} \left( \left\lfloor \frac{2n}{p} \right\rfloor - 2 \left\lfloor \frac{n}{p} \right\rfloor \right) \log p < n \log 4. \]
Let $\vartheta(n) = \sum_{p \le n} \log p$. Since $\lfloor 2n/p \rfloor - 2\lfloor n/p \rfloor = 1$ when $n < p \le 2n$, we deduce that
\[ \vartheta(2n) - \vartheta(n) = \sum_{n < p \le 2n} \log p < n \log 4. \]
Now if $n$ is a particular integer with $n \ge 2$, there is a positive integer $k$ such that $2^k \le n < 2^{k+1}$. We then have
\[ \vartheta(n) \le \vartheta(2^{k+1}) = \sum_{r=0}^{k} \left( \vartheta(2^{r+1}) - \vartheta(2^r) \right) < \sum_{r=0}^{k} 2^r \log 4 = (2^{k+1} - 1) \log 4 < 4n \log 2, \]
since the first summation telescopes and $\vartheta(1) = 0$. On the other hand, we have
\[ \vartheta(n) \ge \sum_{n^{2/3} < p \le n} \log p \ge \left( \pi(n) - \pi(n^{2/3}) \right) \log(n^{2/3}) \ge \tfrac23 \left( \pi(n) - n^{2/3} \right) \log n. \]
Combining the previous two inequalities yields $(\pi(n) - n^{2/3}) \log n < 6n \log 2$, and hence
\[ \pi(n) < \frac{6n \log 2}{\log n} + n^{2/3} = \frac{n}{\log n} \left( 6 \log 2 + \frac{\log n}{n^{1/3}} \right). \]
It is a simple calculus exercise to show that the function $(\log x)/x^{1/3}$ takes its maximum value at $x = e^3$ and hence that $(\log n)/n^{1/3} \le 3/e$ for all $n \ge 1$. We therefore have
\[ \pi(n) < \frac{n}{\log n} \left( 6 \log 2 + 3/e \right) < \frac{6n}{\log n}, \]
as required. □

We now deduce upper and lower bounds on the size of the $n$th prime.

Theorem 9.7. For every integer $n \ge 2$, one has
\[ \tfrac16\, n \log n < p_n < 18 n \log n. \]

Proof. Suppose that $p_n = k$. By Theorem 9.6, we have
\[ n = \pi(k) < \frac{6k}{\log k} = \frac{6p_n}{\log p_n}, \]
and thus $p_n > \frac16 n \log p_n > \frac16 n \log n$, which gives the lower bound. Similarly, Theorem 9.6 gives
\[ n = \pi(k) > \frac{k}{6 \log k} = \frac{p_n}{6 \log p_n}, \]
and thus $p_n < 6n \log p_n$. We recall from the proof of Theorem 9.6 that $\log x \le (3/e) x^{1/3}$, which gives $p_n < (18/e)\, n\, p_n^{1/3}$ and thus $p_n^{2/3} < 18n/e$.
Taking logarithms gives
\[ \tfrac23 \log p_n < \log n + \log(18/e) < 2 \log n, \]
provided that $n > 6$. Hence $\log p_n < 3 \log n$, and inserting this into the bound $p_n < 6n \log p_n$ gives $p_n < 18 n \log n$ when $n > 6$. It is easy to check that this holds for $2 \le n \le 6$ as well. □

Even more precise information is known about $\pi(n)$ and $p_n$ asymptotically as $n \to \infty$. Before mentioning some of these results, we discuss some of the common asymptotic notation. We say that $f(x) \sim g(x)$ as $x \to \infty$ if
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1. \]
Furthermore, we write $f(x) = o(g(x))$ if
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 0. \]
Finally, we write $f(x) = O(g(x))$ if there is a constant $M$ such that $|f(x)| \le M |g(x)|$ for all sufficiently large $x$. Notice that $f = o(g)$ implies that $f = O(g)$.

Theorem 9.8. (The Prime Number Theorem) As $n \to \infty$ one has
\[ \pi(n) \sim \frac{n}{\log n} \quad \text{and} \quad p_n \sim n \log n. \]

The proof of the prime number theorem is beyond the scope of the course, as the most direct method requires the theory of complex variables. If $\pi(n; q, a)$ denotes the number of primes $p \le n$ with $p \equiv a \pmod{q}$, then it is also known that
\[ \pi(n; q, a) \sim \frac{1}{\phi(q)}\, \pi(n) \sim \frac{n}{\phi(q) \log n} \]
whenever $(q, a) = 1$. This is called the prime number theorem for arithmetic progressions. In particular, it shows that there are infinitely many primes in each reduced residue class modulo $q$ and that the primes are equally distributed among the residue classes asymptotically. For example, roughly half of the odd primes are congruent to 1 mod 4 and roughly half are congruent to 3 mod 4.

The prime number theorem may be interpreted by saying that the probability that the integer $n$ is prime is roughly $1/\log n$. In fact, this interpretation leads to an approximation for $\pi(n)$ that is more accurate than $n/\log n$. It is known that $\pi(n) \sim \operatorname{li}(n)$, where
\[ \operatorname{li}(x) = \int_2^x \frac{dt}{\log t}. \]
We may think of $\operatorname{li}(x)$ as a sort of cumulative distribution function for the density function $f(t) = 1/\log t$. It is known that $|\pi(x) - \operatorname{li}(x)| = o(x)$ as $x \to \infty$, and in fact one can make the error term more explicit.
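To see how much closer $\operatorname{li}(n)$ is to $\pi(n)$ than $n/\log n$ is, the following Python sketch tabulates all three quantities. It is only an illustration under stated assumptions: `prime_count` and `li` are hypothetical helper names, $\pi(n)$ is computed by a plain sieve, and $\operatorname{li}(x)$ is approximated by composite Simpson's rule after the substitution $t = e^u$, which turns the integrand into $e^u/u$ on $[\log 2, \log x]$.

```python
import math


def prime_count(n):
    """pi(n): the number of primes p <= n, computed with a sieve."""
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return sum(is_prime)


def li(x, steps=20_000):
    """Approximate li(x), the integral from 2 to x of dt / log t.

    Substituting t = e^u gives the integral of e^u / u over
    [log 2, log x], evaluated here by composite Simpson's rule.
    """
    if x <= 2:
        return 0.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps
    total = math.exp(a) / a + math.exp(b) / b
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * math.exp(u) / u
    return total * h / 3


if __name__ == "__main__":
    print("n, pi(n), n/log n, li(n)")
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, prime_count(n), round(n / math.log(n), 1), round(li(n), 1))
```

The same table can be used to confirm the cruder Chebyshev bounds of Theorem 9.6 for the values of $n$ shown.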
The best known quantitative version of the prime number theorem states that
\[ |\pi(x) - \operatorname{li}(x)| = O\left( x \exp\left( -c \sqrt{\log x} \right) \right) \]
for some constant $c > 0$. However, it is easy to show that this error term grows more rapidly than $x^{1-\delta}$ for every $\delta > 0$, so this is actually a fairly weak result in some sense. It is conjectured that the true error term is just slightly larger than $\sqrt{x}$.

Conjecture 9.9. (The Riemann Hypothesis) One has
\[ |\pi(x) - \operatorname{li}(x)| = O(\sqrt{x} \log x). \]

This is one of the most notorious unsolved problems in mathematics, and even establishing an error term of $O(x^{1-\delta})$ for some positive $\delta$ would be considered a major breakthrough. The usual statement of the Riemann hypothesis concerns the zeta function mentioned at the end of §8. This is a function of a complex variable, which is defined by the infinite series
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s} \]
when $\operatorname{Re}(s) > 1$. The above series fails to converge when $\operatorname{Re}(s) \le 1$, but it turns out that the zeta function has a unique extension (called an analytic continuation) to the whole complex plane, apart from a simple pole at $s = 1$. This extension of $\zeta(s)$ has so-called “trivial” zeros at the negative even integers, and the Riemann hypothesis is equivalent to the assertion that all the remaining zeros of $\zeta(s)$ lie on the line $\operatorname{Re}(s) = 1/2$.

Twin Primes and Mersenne Primes. It is conjectured that $\pi_2(n)$, the number of twin prime pairs $(p, p+2)$ with $p + 2 \le n$, is asymptotic to $Cn/(\log n)^2$ for some constant $C > 0$, but we don’t even know that $\pi_2(n) \to \infty$. This latter statement is known as the Twin Prime Conjecture. In some sense, the twin primes are very sparse, as it can be shown that the sum of their reciprocals,
\[ \left( \frac13 + \frac15 \right) + \left( \frac15 + \frac17 \right) + \left( \frac{1}{11} + \frac{1}{13} \right) + \left( \frac{1}{17} + \frac{1}{19} \right) + \cdots, \]
converges, in contrast to the conclusion of Corollary 9.3. The value of the above sum, known as Brun’s constant, is quite difficult to estimate precisely because of the slow convergence; however, its value appears to be around 1.902160583.
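The slow convergence can be observed directly by computing partial sums of the series. The sketch below is a minimal illustration: the function name `twin_prime_reciprocal_sum` is a hypothetical choice, and the sum runs over twin prime pairs $(p, p+2)$ with $p + 2 \le$ `limit`, matching the definition of $\pi_2(n)$ above.

```python
import math


def twin_prime_reciprocal_sum(limit):
    """Sum of 1/p + 1/(p+2) over twin prime pairs (p, p+2) with p + 2 <= limit."""
    if limit < 5:
        return 0.0  # the first twin prime pair is (3, 5)
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    total = 0.0
    # Twin primes are odd, so it suffices to test odd p with p + 2 <= limit.
    for p in range(3, limit - 1, 2):
        if is_prime[p] and is_prime[p + 2]:
            total += 1.0 / p + 1.0 / (p + 2)
    return total


if __name__ == "__main__":
    for limit in (10**2, 10**3, 10**4, 10**5, 10**6):
        print(limit, round(twin_prime_reciprocal_sum(limit), 6))
```

Even with `limit` equal to one million, the partial sum remains well below the estimated value of Brun's constant, so direct summation alone is not a practical way to approximate it.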
In 1994, Nicely discovered inconsistencies in his computations of Brun’s constant, which turned out to result from a subtle flaw in Intel’s new Pentium processor. This led to an embarrassing recall and provided one of the more surprising applications of number theory. It is not known whether Brun’s constant is rational; of course, its irrationality would imply the Twin Prime Conjecture since a finite sum of rational numbers is rational.

Recall that the Mersenne numbers are integers of the form $2^p - 1$ where $p$ is prime. It is conjectured that the number of Mersenne primes up to $n$ is asymptotic to $e^{\gamma} \log_2(\log n)$, where $\gamma$ is Euler’s constant. However, only 46 Mersenne primes have been discovered as of November 2008, and proving that there are infinitely many seems completely out of reach. The computational evidence certainly suggests that the Mersenne primes are sparsely distributed among the Mersenne numbers; that is, for most primes $p$ the number $2^p - 1$ turns out to be composite. However, it also remains an open problem to establish that there are infinitely many composite Mersenne numbers. It seems inconceivable that this would fail, since then all sufficiently large Mersenne numbers would be prime! Nevertheless, the existing technology does not seem to be capable of generating a proof.