I will try to organize the work of this semester around several classical questions.
The first is, When is a prime p the sum of two squares? The question was raised
by Fermat who gave the correct answer in 1640 and claimed he had a proof but
never wrote it down; the first published proof, in 1747, is due to Euler. There is
a beautiful article in Wikipedia which you can reach by Googling “sum of two
squares”; it is well worth reading. Today there is a very short exceedingly clever
proof which requires practically no preparation but which also gives you no idea
of why the theorem should be true. Our approach here, however, will not be
to get to the result as speedily as possible but to bring in the many modern
algebraic ideas that lead to a conceptual understanding of why Fermat’s original
assertion is correct.
1 First Theme: Sums of Squares
When is an integer a sum of two squares, m = a^2 + b^2? First observation: if
m, n are both sums of two squares then so is mn. For going over to complex
numbers, m = a^2 + b^2 can be written as m = (a + bi)(a − bi). Similarly, if
n = c^2 + d^2 then we can write n = (c + di)(c − di). It follows that mn =
(a + bi)(c + di)(a − bi)(c − di) = [(ac − bd) + (ad + bc)i][(ac − bd) − (ad + bc)i], so
we must have mn = (ac − bd)^2 + (ad + bc)^2. An easy calculation shows that this
is correct. This suggests that we look first at the question of when a prime p
is a sum of two squares. Knowing which primes are sums of two squares won’t
completely answer the question, however. Certainly those numbers which can
be written as products of primes each of which is a sum of two squares will,
from what we have just seen, indeed be sums of two squares. It is still possible
that some other numbers are, too. In fact, any number which is itself a square
is trivially the sum of two squares, itself and zero. We will see that this really
exhausts the possibilities: to be a sum of two squares a number must be a
product of squares and of primes which are themselves a sum of two squares.
So which primes qualify?
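Before hunting for a pattern, it is reassuring to check the product identity by machine. Here is a quick Python sanity check (a throwaway sketch, nothing more):

```python
# Check the two-squares identity
#   (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2,
# which is just |zw|^2 = |z|^2 |w|^2 for z = a + bi, w = c + di.
from itertools import product

for a, b, c, d in product(range(-5, 6), repeat=4):
    m, n = a * a + b * b, c * c + d * d
    assert m * n == (a * c - b * d) ** 2 + (a * d + b * c) ** 2
print("identity holds on all test cases")
```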
Let’s try out some small primes: 2 = 1^2 + 1^2, 3 fails, 5 = 1^2 + 2^2, 7 fails,
11 fails, 13 = 2^2 + 3^2, 17 = 1^2 + 4^2, 19 fails. So far, except for 2, those primes p
which are congruent to 1 modulo 4, i.e., which leave a remainder of 1 when divided
by 4, are sums of two squares. This is written p ≡ 1 mod 4 or simply p ≡ 1 (4).
Those p with p ≡ 3(4) fail. And a number like 3 × 5 = 15 which has as a factor
a prime that fails, namely 3, (but not 32 ), is not a sum of two squares. In fact
we have the following
Theorem 1 1. A prime p is a sum of two squares if and only if either p = 2
or p ≡ 1 (4). 2. An integer m is a sum of two squares if and only if in the prime
factorization of m those primes p ≡ 3 (4) appear to an even power. □
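Both parts of the theorem are easy to test numerically before we prove anything. The following Python sketch brute-forces both sides of part 2 for all m up to 2000 (the function names are ad hoc, chosen just for this check):

```python
def is_sum_of_two_squares(m):
    """Brute force: is m = a^2 + b^2 for some integers a, b >= 0?"""
    a = 0
    while a * a * 2 <= m:          # may assume a <= b
        b = int((m - a * a) ** 0.5)
        if a * a + b * b == m or a * a + (b + 1) ** 2 == m:
            return True
        a += 1
    return False

def passes_criterion(m):
    """Does every prime p = 3 (mod 4) divide m to an even power?"""
    p = 2
    while p * p <= m:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        if p % 4 == 3 and e % 2:
            return False
        p += 1
    return m % 4 != 3              # leftover factor is 1 or a prime

for m in range(1, 2000):
    assert is_sum_of_two_squares(m) == passes_criterion(m)
print("Theorem 1, part 2, confirmed for m < 2000")
```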
There are classical computational proofs of this which do not involve any
higher algebra (cf. the Wikipedia article). The first few weeks of our course will
be devoted to developing enough modern algebra so that we can understand a
conceptual proof. Part of this will be understanding when prime factorizations
exist and are unique; it is implicit in the statement of the theorem that this is
the case for the ordinary integers.
Passing to the complex numbers has been useful but before going further,
let’s introduce some terminology and some notation. You have probably already
been introduced to the concept of a group G, but let me review it very briefly:
G consists of a set of elements together with a multiplication map G × G → G
which is associative; moreover (i) there is a unit element, denoted 1 or e (or 1G
if we want to emphasize that this is the unit element of the group G) with the
property that 1 · x = x · 1 = x for all x ∈ G, and (ii) for every x ∈ G there is an
inverse element x−1 such that x · x−1 = 1 = x−1 · x. This inverse is necessarily
unique (as is the unit element). We say that x and y commute if xy = yx. A
group in which all pairs of elements commute is called commutative or Abelian
(in honor of N. H. Abel). In that case, multiplication is frequently written as
addition and the ‘unit element’ is denoted by 0 and called the zero element; the
group is then usually called an additive group. The set of all permutations of an
arbitrary set S forms a group; it is non-Abelian whenever S has at least three
elements. The most important Abelian group is the additive group of integers,
here always denoted by Z. The integers, however, have more structure, there is
an associative multiplication. This leads to the following definition: a ring R is
an additive group with an associative multiplication that satisfies the distributive
laws, i.e., where x(y + z) = xy + xz and (y + z)x = yx + zx. We need both
of these statements since the multiplication need not be commutative. A good
example of such a ring is the set of all n × n matrices with real coefficients. We will
always denote the real numbers by R and this ring by Mn (R). (This notation
differs from that in the text but is more common.) Notice that Mn (R) has a
unit element for multiplication. We will generally assume without mentioning
it that the rings we deal with have a unit element but (unlike the text) not that
they are commutative.
A field is a commutative ring in which every non-zero element has a multiplicative inverse. The most important examples are the rational numbers
Q, the real numbers, R, and the complex numbers C. There do exist noncommutative rings in which every non-zero element has a multiplicative inverse.
These are called division rings, or skew fields, a term which the elder Artin
has contracted to sfield. The most important example is the quaternions,
generally denoted by H after their discoverer Hamilton. This algebra is a four-dimensional
vector space over R with basis elements 1, i, j, k and multiplication defined by
i^2 = j^2 = k^2 = −1, ij = k = −ji. It follows that jk = i = −kj, ki = j = −ik.
The general quaternion thus has the form q = a + bi + cj + dk, where a, b, c, d ∈ R.
To see that this is a division ring, define the conjugate q̄ = a − bi − cj − dk.
Then q · q̄ = q̄ · q = a^2 + b^2 + c^2 + d^2. Since a, b, c, d are real, this can’t vanish
unless all are zero, so we have q^{−1} = q̄/(q q̄). A ring which is a vector space over
a field is frequently called an algebra.
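The quaternion relations are easy to mis-state, so a small computational check is worthwhile. Here is a minimal Python model of H (an illustrative sketch; the class name Quat is ad hoc):

```python
class Quat:
    """A quaternion a + bi + cj + dk stored as the tuple (a, b, c, d)."""
    def __init__(self, a, b, c, d):
        self.v = (a, b, c, d)

    def __mul__(self, o):
        # Multiply out using i^2 = j^2 = k^2 = -1, ij = k = -ji, etc.
        a, b, c, d = self.v
        e, f, g, h = o.v
        return Quat(a*e - b*f - c*g - d*h,
                    a*f + b*e + c*h - d*g,
                    a*g - b*h + c*e + d*f,
                    a*h + b*g - c*f + d*e)

    def conj(self):
        a, b, c, d = self.v
        return Quat(a, -b, -c, -d)

i, j, k = Quat(0, 1, 0, 0), Quat(0, 0, 1, 0), Quat(0, 0, 0, 1)
assert (i * j).v == k.v and (j * i).v == (0, 0, 0, -1)   # ij = k = -ji
q = Quat(1, 2, 3, 4)
assert (q * q.conj()).v == (1 + 4 + 9 + 16, 0, 0, 0)     # q q̄ = a^2+b^2+c^2+d^2
```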
This brings us to the concept of a morphism (older name homomorphism).
Recall that a morphism between groups f : G → H is a mapping which preserves
the group multiplication, i.e., such that f (xy) = (f x)(f y). This implies that
f (1G ) = 1H and that f (x^{−1} ) = (f x)^{−1} .
Exercise 1 Prove the preceding assertions.
In the newer terminology, a morphism which is one-to-one, i.e., where x ≠ y
implies f x 6= f y, is called a monomorphism although I will frequently use
the older word as well. A morphism which is onto is now usually called an
epimorphism, but here, too, I will frequently use the older term. One which
is both is frequently called a bijection. It is easy to check that if we have
morphisms f : G → H and g : H → J, then the composite g ◦ f : G → J is again
a morphism. In elementary texts an isomorphism f : G → H is usually
defined to be a morphism which is one-to-one and onto, a bijection. This will
do for groups, rings, and all the structures you will encounter in this course, but
the ‘categorical’ definition is this: f is an isomorphism if there is a morphism
g : H → G such that g ◦ f is the identity map of G, denoted idG , and such that
f ◦ g = idH . The reason that the elementary definition works for groups and for
rings is that if f : G → H is a bijection then there is a set mapping g : H → G
sending every z ∈ H back to the unique x ∈ G with f x = z and this g is again
a morphism.
Exercise 2 Prove the preceding assertion.
The concept of a morphism extends to rings: a morphism f : R → S is a
mapping which preserves both the addition and the multiplication, i.e., such
that f (x + y) = f x + f y and f (xy) = (f x)(f y). While we certainly have
f (0) = 0 it need no longer be the case that f (1R ) = 1S . For example, let R
just be R and S = M2 (R). We can then define a morphism R → S by sending
every real number x to the 2 × 2 matrix with x in the upper left entry and
zeros elsewhere. This is certainly a morphism but
the image of the unit element 1 ∈ R is not the unit element of S. The image
of 1 is, however, an element whose square is itself; such an element is called
an idempotent. The general concept of a morphism is that it preserves all the
structure there is. Having a unit element is not part of the definition of a ring,
even though the rings we deal with here generally do have units. To be more
precise, we define a unital ring to be one with a unit element and a unital
morphism to be one preserving the unit element.
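The non-unital morphism R → M2 (R) above can be checked concretely. In the Python sketch below, 2 × 2 matrices are modeled as nested tuples (an ad hoc representation chosen just for this check):

```python
def f(x):
    """The morphism sending x to the matrix with x upper left, 0 elsewhere."""
    return ((x, 0.0), (0.0, 0.0))

def mat_add(A, B):
    return tuple(tuple(a + b for a, b in zip(ra, rb)) for ra, rb in zip(A, B))

def mat_mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

x, y = 3.0, 5.0
assert f(x + y) == mat_add(f(x), f(y))     # preserves addition
assert f(x * y) == mat_mul(f(x), f(y))     # preserves multiplication
e = f(1.0)
assert mat_mul(e, e) == e                  # f(1) is an idempotent...
assert e != ((1.0, 0.0), (0.0, 1.0))       # ...but not the unit of M2(R)
```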
If you want a simple example where a morphism which is a bijection is not
necessarily an isomorphism, consider partially ordered sets or posets, that is,
sets in which for some but not necessarily all pairs of elements x, y there is a
relation x ≺ y. This has to satisfy the axioms that x ≺ y and y ≺ z imply x ≺ z,
x ≺ x, and x ≺ y together with y ≺ x imply x = y. A morphism f : S → T
of partially ordered sets is a set map such that x ≺ y implies f x ≺ f y. Now
suppose that S carries a non-trivial partial order and that |S| is the same set
with the partial order wiped out, i.e., in which there is no pair x, y with x ≺ y
unless x = y. Now consider the identity map |S| → S carrying every element to
itself. This is trivially a morphism of partially ordered sets (since there are no
conditions to satisfy) and is trivially a bijection, but the inverse map S → |S|
is not a morphism of partially ordered sets.
The kernel K of a group morphism f : G → H is the set of all x ∈ G
such that f x = 1. It is a normal subgroup of G, i.e., a subgroup such that
xKx^{−1} = K, where xKx^{−1} is the set of all elements xkx^{−1} , k ∈ K. The kernel
of a ring morphism f : R → S is similarly the set I of all x ∈ R such that
f x = 0. This subset is an ideal of R, that is, (i) I is an additive subgroup
of R (ii) with the additional property that if x ∈ R and z ∈ I then xz and
zx are both in I. When we consider only the additive group structure of R
(forget the multiplication for the moment) we generally write R+ . Here is a
simple but fundamental example. Suppose that R is commutative and pick an
element a ∈ R. Then the set of all multiples of a, i.e. elements of the form ax
with x ∈ R, forms an ideal. Such an ideal is called principal ; it is the principal
ideal generated by a, written aR or sometimes simply as (a) when the ring is
understood. In particular, when R = Z if we pick an integer say 6, then the set
of all multiples of 6 forms an ideal. As we will see, in Z every ideal is principal.
Rings with this special property are generally called principal ideal rings. In
any ring, the entire ring and the subring reduced to the zero element alone are
ideals but we generally don’t count these; all other ideals are called proper. It
can happen that a ring has no proper ideals, in which case it is called simple.
The rings Mn (R) are simple.
Exercise 3 (i) Prove this. (ii) Prove that if I is an ideal of a ring R then
Mn (I) is an ideal of Mn (R) and (iii) that every ideal of Mn (R) has this form.
Recall that if we have a normal subgroup K of a group G then we can form
the quotient group G/K (the set of cosets xK). Every normal subgroup K
is actually the kernel of a group morphism, namely of the canonical morphism
G → G/K. Moreover, if we have a group morphism f : G → H and if the kernel
of f contains K then we can define a new morphism f¯ : G/K → H as follows:
An element of G/K is a coset xK; set f¯(xK) = f (x). The “representative” x
of the coset xK is not unique; it is just one of the elements of xK. However,
if we have one representative x (that is, if we have in hand one element x of
xK) then any other representative y must be a multiple xk of x. It follows that
f (y) = f (x)f (k) = f (x), so the map f¯ is well defined, and it is easy to see that
it is a group morphism. With this, f can be factored: it can be written as the
composite of the canonical morphism can : G → G/K with f̄ : G/K → H. The important thing now is
that we can do the same for rings. Suppose that I is an ideal of R. Since I
is an additive subgroup of R+ its cosets are written in the form x + I and we
define R/I to be the set of these cosets. This is again a ring with addition and
multiplication defined by (x+I)+(y +I) = (x+y)+I; (x+I)(y +I) = xy +I. In
our simple example of the multiples of 6 in Z, the ring Z/(6) (sometimes written
even more simply as Z/6) has exactly six elements, 0̄, 1̄, 2̄, . . . , 5̄, the cosets of 0
through 5, respectively. Addition and multiplication in Z/6 are easy but there
are some peculiarities: 2̄ · 3̄ = 0̄ so here we have a pair of elements, neither of
which is the zero element of the ring, but their product is zero; such elements
are called zero divisors. Also, (3̄)^2 = 3̄ · 3̄ = 3̄. This is another example of an
idempotent.
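A machine can enumerate these peculiarities of Z/6 directly; the following Python fragment (an illustrative sketch) finds all zero divisors and all idempotents:

```python
n = 6
elements = range(n)
zero_divisors = sorted({a for a in elements for b in elements
                        if a % n and b % n and (a * b) % n == 0})
idempotents = [a for a in elements if (a * a) % n == a]
print(zero_divisors)   # [2, 3, 4]
print(idempotents)     # [0, 1, 3, 4] -- note that 4̄ is idempotent as well
```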
The reason that Z/6 has zero divisors is that 6 is composite; 6 = 2 · 3. It
is a basic theorem (a special case of a deeper one) that if p is a prime number
then Z/p is a field. It is easy to check, for example, that Z/7 is a field: 2̄ · 4̄ =
3̄ · 5̄ = 6̄ · 6̄ = 1̄ (and, of course 1̄ is its own inverse), showing that every non-zero
element of Z/7 has a multiplicative inverse. It follows that the non-zero elements
form a group under multiplication. This group is cyclic; you can check that the
powers of 3̄ give all the non-zero elements of Z/7. This is not an accident. We
will prove the following
Theorem 2 Any finite subgroup of the multiplicative group of a
field is cyclic. □
Exercise 4 Give examples to show that this need not hold for (i) a skew field
or (ii) an infinite group.
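For fields of the form Z/p, at least, the theorem is easy to check by machine: the sketch below verifies that 3̄ generates the multiplicative group of Z/7 and hunts for a generator modulo a few other small primes (brute force, nothing clever):

```python
def generator_mod(p):
    """Smallest g whose powers exhaust all non-zero residues mod the prime p."""
    target = set(range(1, p))
    for g in range(2, p):
        if {pow(g, k, p) for k in range(1, p)} == target:
            return g

assert {pow(3, k, 7) for k in range(1, 7)} == {1, 2, 3, 4, 5, 6}
print({p: generator_mod(p) for p in (3, 5, 7, 11, 13)})
# {3: 2, 5: 2, 7: 3, 11: 2, 13: 2}
```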
To prove this we shall need some information about the structure of Abelian
groups. A group G (Abelian or not) is finitely generated if there is some finite
subset of elements g1 , . . . , gn such that every element of G can be written as
a product of these elements and their reciprocals. A group G which can be
generated by a single element g is cyclic. If G is finite, say #G = n, then
G = {1, g, g^2 , . . . , g^{n−1} }, where g^n = 1. In this case G is isomorphic (as a
group) to Z/n, the isomorphism being given by g^m → m̄. If G is infinite then
G = {. . . , g^{−2} , g^{−1} , 1, g, g^2 , . . . }, where no power of g other than the zeroth
is equal to the unit element. In this case G ≅ Z, the isomorphism being
given by g^m → m. Recall that the direct product or simply the product of
two groups, G × H consists of all ordered pairs (g, h), g ∈ G, h ∈ H with group
operation (g, h)(g′ , h′ ) = (gg′ , hh′ ). This construction extends in an obvious way
to a product of any finite number of groups. With this we have the following
fundamental theorem on finitely generated Abelian groups (a special case of a
slightly more general theorem which we will prove)
Theorem 3 Every finitely generated Abelian group is isomorphic to a unique
group of the form Z/d1 × Z/d2 × · · · × Z/dr × Z × · · · × Z where d1 |d2 | · · · |dr . □
The last condition means that d1 divides d2 , d2 divides d3 , etc. If the group
is finite, then the Z factors don’t appear. There are generally other ways to
decompose an Abelian group, but this is the shortest (least number of factors).
The di are called the principal divisors of the group. Note that if the group is
finite then the largest one, here denoted dr is the exponent of the group, i.e.,
the smallest integer e such that g^e = 1 for every g ∈ G. Notice that if a finite
group is not cyclic, i.e., if r ≥ 2, then writing dr = e, the number of elements
of the group satisfying the equation x^e = 1 is greater than e. In a field the
number of solutions to a polynomial equation can not exceed the degree of the
equation. This is why a finite subgroup of the multiplicative group of a field
must be cyclic. Sometimes, however, it is quite difficult to find a generator. For
example, we will see that if p is a prime then Z/p is a field, usually denoted
Fp . Since this field is finite and has exactly p elements, its multiplicative group
is of order p − 1 and cyclic, but the difficulty of finding a generator when p is
a large prime has sometimes been used in coding schemes. (The existence of
finite fields was discovered by E. Galois.)
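As a small illustration of the normal form, Z/2 × Z/3 looks like a product of two factors, but it is in fact cyclic of order 6 (generated by (1, 1)), so its canonical form in Theorem 3 is the single factor Z/6, with r = 1 and d1 = 6. A quick Python check:

```python
# Walk through the cyclic subgroup generated by (1, 1) in Z/2 x Z/3.
pairs = set()
g = (1, 1)
x = (0, 0)
for _ in range(6):
    pairs.add(x)
    x = ((x[0] + g[0]) % 2, (x[1] + g[1]) % 3)
assert len(pairs) == 6   # (1, 1) has order 6: the product group is cyclic
```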
The concept of direct product can also be extended to rings R and S except
that there it is frequently called the direct sum and denoted R ⊕ S. It is the
set of ordered pairs (r, s) with r ∈ R, s ∈ S with addition and multiplication
defined by (r, s) + (r0 , s0 ) = (r + r0 , s + s0 ), (r, s)(r0 , s0 ) = (rr0 , ss0 ).
Let’s get back (in a sophisticated way) to our problem of determining which
primes p (and more generally, which integers) are sums of two squares, but
first, one elementary observation: Any prime p ≡ 3(4) can not be a sum of two
squares. For observe that if n is an even integer then n^2 ≡ 0 (4), while if n is
odd, say n = 2m + 1, then n^2 = 4m^2 + 4m + 1 ≡ 1 (4). Therefore, a sum of two
squares can only be congruent to 0, 1, or 2 mod 4, hence never to 3.
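This congruence argument takes one line to confirm numerically (a trivial Python check):

```python
# Squares mod 4 are only 0 or 1, so a^2 + b^2 mod 4 is never 3.
assert {n * n % 4 for n in range(100)} == {0, 1}
assert {(a * a + b * b) % 4 for a in range(50) for b in range(50)} == {0, 1, 2}
```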
For deeper results we will look at a generalization of the concept of integer.
A special case is the set of all complex numbers of the form a + bi where a
and b are integers. These form a ring, the ring of Gaussian integers, denoted
Z[i]. Ordinary primes (henceforth called rational primes because they are the
primes in the field Q of rational numbers) may factor in Z[i]. For example,
2 = −i(1 + i)^2 , 5 = (1 + 2i)(1 − 2i). If p is a sum of two squares, p = a^2 + b^2 , a, b ∈
Z, then p = (a + bi)(a − bi) in Z[i], so being a sum of two squares implies
factorization in Z[i]. The converse is also true.
Lemma 1 A rational prime p factors in Z[i] if and only if it is a sum of two
squares, in which case it has exactly two non-trivial factors, i.e., factors other
than ±1, ±i.
Proof. Suppose that p is a prime which factors in a non-trivial way in Z[i]
with p = αβ. Taking conjugates we also have p = ᾱβ̄. Multiplying gives
p^2 = |α|^2 |β|^2 , a factorization of p^2 in Z, but the only possibility for this is
that |α|^2 = |β|^2 = p. It follows that neither α nor β can factor further since
this would give a factorization of p in Z. Since both have the same absolute
value and their product is real, one must be the conjugate of the other. So if
α = a + bi, a, b ∈ Z then β = a − bi and p = a^2 + b^2 . □
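In view of the lemma, factoring p in Z[i] amounts to finding a, b with p = a^2 + b^2, which for small p can be done by brute force (a naive Python sketch; the function name is ad hoc):

```python
def two_square_factorization(p):
    """Return (a, b) with p = a^2 + b^2, i.e. p = (a+bi)(a-bi), or None."""
    for a in range(1, int(p ** 0.5) + 1):
        b = int((p - a * a) ** 0.5)
        if a * a + b * b == p:
            return (a, b)
    return None

print({p: two_square_factorization(p) for p in (2, 3, 5, 7, 13, 17, 19)})
# {2: (1, 1), 3: None, 5: (1, 2), 7: None, 13: (2, 3), 17: (1, 4), 19: None}
```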
A complex number α which is a root of some polynomial f (x) = c_n x^n +
c_{n−1} x^{n−1} + · · · + c_1 x + c_0 with integer coefficients is called an algebraic number.
We could make this equation monic, i.e., have leading coefficient equal to 1
by dividing by c_n , but then the other coefficients would generally be rational
numbers and not integers. If α is a root of a monic polynomial with integer
coefficients then it is called an algebraic integer. If α is an algebraic number
then there is a unique monic polynomial f (x) of minimum degree which it
satisfies; that polynomial is called the minimum polynomial for α and its degree
is called the degree of α. Before considering the general case, consider the special
case where α = √d, where d is some square-free integer (i.e., not divisible by
the square of any integer other than 1) but which may be negative, a most
important case being d = −1. This is obviously algebraic and even an algebraic
integer since it satisfies the equation x^2 − d = 0. In fact, all complex numbers of
the form α = a + b√d with a, b ∈ Q are algebraic since α satisfies the equation
x^2 − 2ax + (a^2 − b^2 d) = 0. The set of all these numbers is denoted Q(√d)
and we claim that they form a field: it is clear that the sums, differences, and
products of two numbers of the form a + b√d are again numbers of the same
form. To see that the inverse is also, observe first that since d is not a square
the rational number a^2 − b^2 d can not vanish whenever either a or b is not 0.
From x^2 − 2ax + (a^2 − b^2 d) = 0 we have x(x − 2a) = −(a^2 − b^2 d), whence
x^{−1} = −(x − 2a)/(a^2 − b^2 d), so Q(√d) is indeed a field. It is also a vector space
over Q with basis {1, √d}, so it has dimension equal to 2 as a Q-space. Such
a field is usually called a quadratic field. We will see that in fact the set of all
algebraic numbers forms a field and that the set of all algebraic integers forms
a subring of this field. When is a number a + b√d, a, b ∈ Q, an (algebraic)
integer? Looking at the equation it satisfies, it is sufficient (and we will later
show, necessary) that 2a and a^2 − db^2 be integers. If a is an ordinary or rational
integer then db^2 must also be an integer, and since d is square-free it follows
that b is an integer, too. The only other possibility is that a be a “half-integer”,
i.e., of the form m + 1/2, m ∈ Z, in which case a^2 = m^2 + m + 1/4. But then
a^2 − db^2 can only be an integer if b is also a half-integer and d ≡ 1 (4). So in this
case the integers consist not just of all elements of the form m + n√d, m, n ∈ Z,
but also of the elements (m + 1/2) + (n + 1/2)√d.
Exercise 5 Prove that the integers of Q(√d) do form a ring.
Since −1 ≢ 1 (4) we see finally that the ring of Gaussian integers is precisely the
ring of algebraic integers inside the field Q(i).
We are now close to understanding why a prime p ≡ 1(4) must be a sum
of two squares. Observe that if p = a^2 + b^2 = (a + bi)(a − bi), a, b ∈ Z then
the rational prime p has factored inside the ring of Gaussian integers Z[i]. This
raises the question of whether Z[i] behaves like Z in that it has “primes” which
can not be factored and where every element can be factored “uniquely” into a
product of primes. Here is the reason for the quotes. Even in Z factorization
is not strictly unique because we could introduce factors of −1 so we make the
following definition: In a commutative ring R (with unit element) an element
u which has an inverse is called a unit. (This may not be the best terminology,
but it is the historical one.) The units form a group under multiplication,
usually denoted R× . In Z the group of units consists only of {+1, −1}; in Z[i]
it is {±1, ±i}. Elements x, y ∈ R with y = ux where u is a unit are called
associates; this is obviously an equivalence relation. In a commutative ring it
is meaningful to say that y divides x if x = yz for some z but it generally
does not follow that if x and y divide each other that they are associates, for
there may be zero divisors. A commutative ring R in which there are no zero
divisors is called an integral domain or simply a domain. (The older name was
“domain of integrity”.) In a domain, if x and y divide each other, x = yz and
y = xw, then we have x = xwz or x(1 − wz) = 0, so 1 − wz = 0. Thus w
and z are units and x and y are associates. An element which has no divisors
except itself and associates is called irreducible. When we speak of “unique
factorization” it means factorization into irreducibles which is unique up to the
order of the irreducible factors and multiplication by units (or replacement of
factors by associates). It is quite possible in a domain for an element to have
genuinely distinct factorizations into irreducibles. Consider, for example, the
ring of integers of Q(√−5); it consists of all m + n√−5 with m, n ∈ Z. In this
ring the elements 3, 7, 1 + 2√−5, 1 − 2√−5 are all irreducible.
Exercise 6 Prove this from first principles. (We will see more sophisticated
reasons later.)
Unfortunately unique factorization fails inside the ring of integers of Q(√−5),
for we have the two distinct factorizations into irreducible factors 21 = 3 · 7 =
(1 + 2√−5) · (1 − 2√−5). A domain in which we have unique factorization
into irreducibles is called a unique factorization domain, abbreviated UFD, or a
factorial domain. In these we sometimes call irreducible elements “primes”, but
bear in mind that with this definition +2 and −2 are both rational primes. It is
a basic theorem (and not too difficult) that Z[i] is a factorial domain. Obviously
in a factorial domain if an irreducible element divides a product then it must
divide one of the factors; in fact, this is a crucial property that one must prove to
show that a domain is factorial. If R is a factorial domain and π an irreducible
element of R then the quotient ring R/π is again a domain, and conversely. On
the other hand Z[√−5]/3 is not a domain since (1 + 2√−5) · (1 − 2√−5) = 3 · 7 ≡ 0
mod 3. We will prove that Z[i] is factorial, but for the moment let’s accept it.
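The norm N(m + n√−5) = m^2 + 5n^2 is multiplicative, and it makes the failure above easy to verify by machine. The sketch below is only a numerical check, not a proof (the norm argument for irreducibility is one of the "more sophisticated reasons" promised earlier):

```python
def norm(m, n):
    """Norm of m + n*sqrt(-5), i.e. |m + n*sqrt(-5)|^2 = m^2 + 5 n^2."""
    return m * m + 5 * n * n

# 21 = 3 * 7 = (1 + 2√-5)(1 - 2√-5): the norms confirm the second product.
assert norm(1, 2) == norm(1, -2) == 21

# No element of norm 3 or 7 exists, which is the key step in showing that
# 3 and 7 (of norms 9 and 49) are irreducible in Z[√-5].
small = {norm(m, n) for m in range(-10, 11) for n in range(-10, 11)}
assert 3 not in small and 7 not in small
```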
Lemma 2 A finite domain R is a field.
Proof. We must show that every non-zero element x ∈ R has an inverse.
Consider the set of all xy as y varies in R. No two of these can be identical, for
if xy = xy′ then x(y − y′ ) = 0, contradicting the assumption that R is a domain.
Since R is finite, the set of all xy with fixed x must be all of R, so there is a y
such that xy = 1. □
Exercise 7 We really don’t need the commutativity in the preceding Lemma or
even the existence of a unit element. Prove that if S is a finite set with an
associative multiplication with both the left and right cancelation property, i.e.,
xy = xy′ implies y = y′ and yx = y′ x also implies y = y′ , then S is in fact
a group (and in particular, there is a unit element for multiplication). (Hard)
Suppose only one of the two cancelation properties holds. Is S still a group?
(Give a proof or a counterexample, but don’t spend too much time on it.)
Finally (assuming some of the things we have not yet proven) we have the proof
of Theorem 1:
Proof. To prove assertion 1. we must show that if p ≡ 1(4) then p factors
in Z[i]. Suppose to the contrary that it remained irreducible. Since Z[i] is a
factorial domain it would follow that Z[i]/p is again a domain, with exactly p^2
elements, and being finite it would be a field. The equation x^4 = 1 could then
have no more than four roots in Z[i]/p. But Z[i]/p ⊃ Z/p and the latter is
a field with exactly p elements. Its multiplicative group is therefore a cyclic
group with p − 1 elements, and this is a multiple of 4, say p − 1 = 4m. If a is
any generator of this group then 1, a^m , a^{2m} and a^{3m} are four distinct elements
satisfying x^4 = 1. But (the classes of) i and −i are not amongst these and also
satisfy the equation, so there are too many roots. Therefore Z[i]/p can not be
a field, so p must factor. For 2., suppose that m is a sum of two squares, hence
of the form m = (a + bi)(a − bi) for some a, b ∈ Z and that a prime p ≡ 3(4)
divides m. Since p is still irreducible in Z[i] it must divide one of the two factors.
Suppose p^k ∥ (a + bi) (meaning that k is the precise power to which p divides
a + bi). Since p^k is real, taking conjugates we see that also p^k ∥ (a − bi), so p^{2k} ∥ m.
□
So one way to understand the “if” (hard) part of Fermat’s original assertion,
that an odd prime p is a sum of two squares if and only if p ≡ 1(4), is to say
that such a prime must factor in Z[i]. There are many details that must be
filled in; that will be our next project.
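One detail worth previewing: for p ≡ 1 (4), a square root of −1 modulo p always exists (take a^{(p−1)/4} for a generator a, exactly as in the proof above), and it is such a root that forces p to factor in Z[i]. A brute-force Python sketch (it only exhibits the root, not the descent to p = a^2 + b^2):

```python
def sqrt_minus_one(p):
    """For a prime p = 1 (mod 4), find r with r^2 = -1 (mod p)."""
    for a in range(2, p):
        r = pow(a, (p - 1) // 4, p)
        if (r * r) % p == p - 1:
            return r

for p in (5, 13, 17, 29):
    r = sqrt_minus_one(p)
    assert (r * r + 1) % p == 0   # p divides r^2 + 1 = (r + i)(r - i) in Z[i]
```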
Exercise 8 Show that if a, b are integers, not both zero, then Z[i]/(a + bi) always
has exactly a^2 + b^2 elements, and if a, b are relatively prime then there is an
isomorphism Z/(a^2 + b^2 ) → Z[i]/(a + bi) but not otherwise. What is the structure
of the additive group of Z[i]/(a + bi) when a and b are not relatively prime?
(Hint: You might want to do the relatively prime case first.)