HIGHER ALGEBRA
Ernest E. Shult
Contents

1 Basics
  1.1 Presumed results and conventions
    1.1.1 Presumed Jargon
    1.1.2 Sets and maps
    1.1.3 Partitions
    1.1.4 Zorn's Lemma and the Axiom of Choice
    1.1.5 Cardinal Numbers
    1.1.6 Special results on cardinalities
  1.2 Binary operations and Monoids

2 Partially Ordered Sets and Dependence Theories
  2.1 Introduction
  2.2 Basic definitions
    2.2.1 Definition of a partially ordered set
    2.2.2 Induced subposets
    2.2.3 Dual posets and dual concepts
    2.2.4 Maximal and minimal elements of induced posets
    2.2.5 Global maxima and minima: zero's and one's in posets
    2.2.6 Total orderings and chains
    2.2.7 Antichains
    2.2.8 Zornification
    2.2.9 Order ideals and filters
    2.2.10 Products of posets
    2.2.11 Examples and exercises
  2.3 Mappings among posets: closure operators and Galois connections
    2.3.1 Subposets
    2.3.2 Order-preserving mappings
    2.3.3 Closure operators
    2.3.4 Closure operators and Galois connections
  2.4 Chain conditions
    2.4.1 Saturated chains
    2.4.2 Algebraic intervals and the height function
    2.4.3 The ascending and descending chain conditions in arbitrary posets
  2.5 Posets with meets and/or joins
    2.5.1 Meets and joins
    2.5.2 Lower semilattices with the descending chain condition
    2.5.3 Lower semilattices with both chain conditions
    2.5.4 Examples and exercises
  2.6 The Jordan-Hölder Theory
    2.6.1 Lower semi-lattices and semimodularity
    2.6.2 Interval Measures on Posets
    2.6.3 The Jordan-Hölder Theorem
  2.7 Möbius inversion
    2.7.1 More locally finite posets
    2.7.2 The incidence algebra
    2.7.3 The Möbius function of a locally finite poset
    2.7.4 Special cases
    2.7.5 Remarks on other applications of the Möbius function
  2.8 Point measures in lattices; the modular law
    2.8.1 Introduction
    2.8.2 Dual lattices and sublattices
    2.8.3 Kolmogoroff functions on lattices
    2.8.4 The "inclusion-exclusion" principle
    2.8.5 Point-measure of a lattice
    2.8.6 Modular lattices
    2.8.7 Exercises concerning lattices
  2.9 Dependence Theories
    2.9.1 Introduction
    2.9.2 Dependence
    2.9.3 The extended definition
    2.9.4 Independence and spanning
    2.9.5 Dimension
    2.9.6 Matroids
    2.9.7 The rank function
    2.9.8 Dependence theories and matroids are the same thing
    2.9.9 Other formulations of dependence
    2.9.10 Exercises on Matroids

3 Review of Elementary Group Properties
  3.1 Introduction
  3.2 Definition and examples
    3.2.1 Orders of elements and groups
    3.2.2 Subgroups
  3.3 Homomorphisms of Groups
    3.3.1 Definitions and basic properties
    3.3.2 Automorphisms as right operators
    3.3.3 Examples of homomorphisms
  3.4 Factor Groups and the Fundamental Theorem of Homomorphisms
    3.4.1 Miscellaneous Exercises
  3.5 Direct products
  3.6 Other Variations: Semidirect Products and Subdirect Products
    3.6.1 Semidirect products
    3.6.2 Subdirect products
  3.7 Minimal normal subgroups of finite groups

4 Permutation Groups and Group Actions
  4.1 Group actions and orbits
  4.2 Permutations
    4.2.1 Cycle Notation
    4.2.2 Even and odd permutations
    4.2.3 Transpositions and Cycles: The Theorem of Feit, Lyndon and Scott
    4.2.4 Action on right cosets
    4.2.5 Equivalent actions
    4.2.6 The Fundamental Theorem of Transitive Actions
    4.2.7 Normal subgroups of transitive groups
    4.2.8 Double cosets
  4.3 Applications of transitive group actions
    4.3.1 Finite p-groups
    4.3.2 The Sylow Theorems
    4.3.3 Calculations of Group Orders
  4.4 Primitive and Multiply Transitive Actions
    4.4.1 Primitivity
    4.4.2 The rank of a group action
    4.4.3 Multiply transitive group actions
    4.4.4 Permutation characters
  4.5 Exercises for Chapter 3
    4.5.1 First proof
    4.5.2 Burnside's original proof

5 Normal Structure of Groups
  5.1 The Jordan-Hölder Theorem for Artinian Groups
    5.1.1 Subnormality
  5.2 Commutators
  5.3 The Derived Series and Solvability
  5.4 Central Series and Nilpotent Groups
    5.4.1 The upper and lower central series
    5.4.2 Finite nilpotent groups
  5.5 Coprime Action
  5.6 Exercises for Chapter 4

6 Generation in Groups
  6.1 Introduction
  6.2 The Cayley graph
    6.2.1 Definition
    6.2.2 Morphisms of various sorts of graphs
    6.2.3 Morphisms of groups induce morphisms of Cayley graphs
  6.3 Free groups
    6.3.1 The Universal Property
  6.4 The Brauer-Ree Theorem
  6.5 Appendix to Chapter 6: A Detective Story: An application of (k, l, m)-groups

7 Elementary Properties of Rings
  7.1 Elementary Facts about Rings
    7.1.1 Definitions
    7.1.2 Units in rings
    7.1.3 Homomorphisms
    7.1.4 Ideals and factor rings
    7.1.5 The fundamental theorems of ring homomorphisms
    7.1.6 The poset of ideals
    7.1.7 Examples

8 Elementary properties of Modules
  8.1 Basic theory
    8.1.1 Modules over rings
    8.1.2 Submodules
    8.1.3 Factor modules
    8.1.4 The fundamental theorems of homomorphisms for R-modules
    8.1.5 The Jordan-Hölder Theorem for R-modules
    8.1.6 Direct sums and products of modules
    8.1.7 Free modules
    8.1.8 Vector spaces
    8.1.9 Examples and exercises
  8.2 Consequences of chain conditions on modules
    8.2.1 Definitions: Noetherian and Artinian modules
    8.2.2 Effects of the chain conditions on endomorphisms
    8.2.3 Noetherian Modules and finite generation
    8.2.4 Right and left Noetherian rings
    8.2.5 Algebraic integers form a ring
  8.3 The Hilbert Basis Theorem

9 The Arithmetic of Integral Domains
  9.1 Introduction
  9.2 Localization in Commutative Rings
    9.2.1 Localization by a multiplicative set
    9.2.2 Special features of localization in domains
    9.2.3 Localization at a prime ideal
  9.3 Unique Factorization
    9.3.1 A Special Case
  9.4 Euclidean Domains
    9.4.1 Examples of Euclidean Domains
  9.5 Principal Ideal Domains are Unique Factorization Domains
  9.6 If D is a UFD, then so is D[x]
  9.7 Appendix to Chapter 8: The Arithmetic of Quadratic Domains
    9.7.1 Introduction
    9.7.2 Which quadratic domains are Euclidean, principal ideal or factorial domains?
    9.7.3 Subdomains of Euclidean domains which are not UFD's

10 Principal Ideal Domains and Their Modules
  10.1 Introduction
  10.2 Quick Review of PID's
  10.3 Free modules over a PID
  10.4 A technical result
  10.5 Finitely generated modules over a PID
  10.6 The uniqueness of the Invariant Factors
  10.7 Applications of the Theorem on PID Modules
    10.7.1 Classification of Finite Abelian Groups
    10.7.2 The Rational canonical form of a linear transformation
    10.7.3 The Jordan Form
    10.7.4 Information Carried by the Invariant Factors

11 Theory of Fields
  11.1 Introduction
  11.2 Elementary Properties of Field Extensions
    11.2.1 Algebraic and Transcendental Extensions
    11.2.2 Indices Multiply: Ruler and Compass Problems
    11.2.3 The Number of Zeros of a Polynomial
  11.3 Splitting Fields and Their Automorphisms
    11.3.1 Extending Isomorphisms Between Fields
    11.3.2 Splitting Fields
    11.3.3 Normal Extensions
  11.4 Some Applications to Finite Fields
    11.4.1 Prime Subfields
    11.4.2 Finite fields
    11.4.3 Automorphisms of finite fields
    11.4.4 Polynomial equations over finite fields: the Chevalley-Warning Theorem
  11.5 Separable Elements and Field extensions
    11.5.1 Separability
    11.5.2 p-th Power Mappings in Characteristic p
    11.5.3 Simple Roots
    11.5.4 Separable Elements and Separable Field Extensions
  11.6 Galois Theory
    11.6.1 Galois field extensions
    11.6.2 Independence of linear characters
    11.6.3 Group orders and degrees in Galois extensions
  11.7 Solvability of equations by radicals
    11.7.1 Introduction
    11.7.2 Roots of unity
    11.7.3 Radical extensions
    11.7.4 Galois' criterion for solvability by radicals
  11.8 Transcendental extensions
    11.8.1 Simple transcendental extensions
    11.8.2 Transcendence Degree
  11.9 Appendix to Chapter 10: Fields with a Valuation
    11.9.1 Introduction
    11.9.2 The Language of Convergence
    11.9.3 Completions exist
    11.9.4 The non-archimedean p-adic valuations
  11.10 Appendix to Chapter 10: Finite Division Rings

12 Semiprime Rings
  12.1 Introduction
  12.2 Complete Reducibility
  12.3 Homogeneous Components
    12.3.1 The action of the R-endomorphism ring
    12.3.2 The Socle of a Module is a Direct Sum of the Homogeneous Components
  12.4 Semiprime rings
    12.4.1 Introduction
    12.4.2 The semiprime condition
    12.4.3 Completely Reducible and Semiprimitive Rings are Species of Semiprime Rings
    12.4.4 Idempotents and Minimal Right Ideals of Semiprime Rings
    12.4.5 Idempotents and Left-Right Symmetries
  12.5 Completely Reducible Rings
  12.6 Exercises for Chapter 12
    12.6.1 Exercises relating to Sections 12.1 and 12.2
    12.6.2 Warm up Exercises
    12.6.3 Exercises concerning the Jacobson radical
    12.6.4 The Jacobson radical of Artinian rings
    12.6.5 Quasiregularity and the radical
    12.6.6 Exercises involving nil one-sided ideals in Noetherian rings

13 Categorical Notions
  13.1 Introduction
    13.1.1 Universal Mapping Properties
    13.1.2 Duality
    13.1.3 Functors
  13.2 Special Features of the Category of R-modules
    13.2.1 Exact Sequences
    13.2.2 The Direct Sum as a Universal Mapping Property
    13.2.3 The Direct Product as a Universal Mapping Property
  13.3 Projective Modules
  13.4 Injective Modules
  13.5 hom(A, B)
    13.5.1 hom(A, B) is an R-module
    13.5.2 Functorial properties of Hom
    13.5.3 Dual modules
  13.6 Tensor Products
    13.6.1 Elementary Consequences of the Definition
  13.7 Important applications of the tensor product
    13.7.1 Exchange of Rings
    13.7.2 Tensor products of R-algebras
    13.7.3 The tensor product of modules as a module for a tensor product of rings
  13.8 Graded Algebras
    13.8.1 Morphisms of graded modules
    13.8.2 Subalgebras of graded R-algebras
    13.8.3 Some examples of graded R-algebras
    13.8.4 Ideals and homomorphisms in associative graded R-algebras
  13.9 Exercises for Chapter 12
  13.10 Appendix to Chapter 12: Important R-algebras
    13.10.1 Other applications

14 Categorical Features of Vector Spaces
  14.1 Left and Right Vector Space Dimensions
  14.2 hom(V, W) as a Left Vector Space; the Dual Space
  14.3 Tensor Products of Vector Spaces Over Fields
  14.4 Tensor Products of (F, F)-bimodules
  14.5 Tensor Products of Linear Transformations
  14.6 The Universal Mapping Property of Multiple Tensor Products: Multilinear Mappings

15 Tensor Algebras and Their Offspring
  15.1 Algebras
    15.1.1 Nonassociative Algebras
    15.1.2 Associative Algebras
  15.2 The Tensor Algebra
  15.3 The Symmetric Algebra
  15.4 The Exterior Algebra
  15.5 Transformations Induced on the k-fold Wedge Products: The Determinant as Group Homomorphism
  15.6 Symmetric and Alternating Multilinear Mappings

16 Sesquilinear Forms
  16.1 σ-sesquilinear Forms
    16.1.1 Basic definitions
    16.1.2 Forms and functionals
    16.1.3 Descriptions of forms by Grammian matrices
  16.2 Reflexive forms
    16.2.1 Perps and radicals of reflexive forms
    16.2.2 (σ, ε)-Hermitian forms
    16.2.3 Totally isotropic subspaces; symplectic forms
  16.3 Finite-dimensional subspaces of spaces with a non-degenerate reflexive form
  16.4 The Witt index of Polar Systems
    16.4.1 Polar systems
    16.4.2 The radical of a polar system
    16.4.3 Finite rank, and the Witt index of a polar system
    16.4.4 A brief interlude with graph-theoretic concepts
    16.4.5 The distance metric on M(T)
  16.5 Decompositions of Forms of Finite Rank
    16.5.1 Introduction
    16.5.2 Orthogonal Decompositions
    16.5.3 Hyperbolic Decompositions of Finite Rank Sesquilinear Forms Over Fields
    16.5.4 Symmetric forms in characteristic 2
  16.6 Special cases of sesquilinear forms
    16.6.1 Symplectic forms
    16.6.2 Hermitian forms
    16.6.3 Special cases for orthogonal forms
  16.7 Sesquilinear forms obtained from others by reduction to a subfield
    16.7.1 Forms derived from a Hermitian form by field reduction
  16.8 Exercises for Chapter 15
    16.8.1 Miscellaneous warm-up problems
    16.8.2 Forms and algebras
    16.8.3 Forms induced on wedge products
    16.8.4 Special constructions involving symplectic forms

17 Quadratic Forms
  17.1 Quadratic Forms
    17.1.1 Definitions
    17.1.2 Forms of finite rank
    17.1.3 More definitions relative to a quadratic form
    17.1.4 The poset of totally singular subspaces
    17.1.5 Radicals of quadratic forms in characteristic 2
  17.2 Hyperbolic decompositions with respect to a quadratic form
  17.3 The oriflamme phenomenon
  17.4 Isometries and the Classical Groups
    17.4.1 Introduction
    17.4.2 Witt's Theorem
  17.5 Quadratic forms in special cases
    17.5.1 The real and complex cases
    17.5.2 The case of finite fields
  17.6 Quadratic forms over GF(2)
    17.6.1 Introduction
    17.6.2 Beginnings
    17.6.3 The definition of an orthogonal geometry over GF(2)
    17.6.4 Non-degenerate orthogonal geometries over GF(2) in odd dimension
    17.6.5 Non-degenerate orthogonal geometries over GF(2) in even dimension
    17.6.6 Some low dimensional cases
    17.6.7 Counting the singular and non-singular vectors
  17.7 Cocliques in Orthogonal Geometries over GF(2)
  17.8 Homogeneous cocliques of maximal size
  17.9 Observations obtained from homogeneous cocliques in small dimensions
    17.9.1 The case n = 3
    17.9.2 The case n = 4

18 The Klein Correspondence
  18.1 The Klein Quadric
  18.2 The space of Symplectic Forms
  18.3 Pure Vectors
  18.4 The Correspondence
  18.5 The correspondence and the oriflamme structure
  18.6 Consequences of the Klein Correspondence
    18.6.1 Introduction
    18.6.2 Classical isomorphisms
    18.6.3 The generalized quadrangles of types Sp(4, K) and O(5, K)
    18.6.4 The generalized quadrangles of types U(4, K) and O−(6, K)
Chapter 1
Basics

1.1 Presumed results and conventions

1.1.1 Presumed Jargon
The reader is assumed to have already acquired some familiarity with groups,
cosets, group homomorphisms, ring homomorphisms and vector spaces from
an undergraduate abstract algebra course or linear algebra course. Since most
abstract algebraic structures in these notes are treated from first principles,
we rely on these topics mostly as a source of familiar examples which can
aid the intuition as well as act as points of reference that will indicate the
direction various generalizations are taking.
1.1.2 Sets and maps
In the area of sets and maps we should be precise about the conventions.
First of all the reader should have a comfortable rapport with the following:
1. The set-theoretic notions of membership, containment and the operations of intersection and union over arbitrary collections of subsets of
a set.
2. The Cartesian product construction, A × B.
3. Relations on a set S (as a subset of S × S), especially equivalence relations and equivalence classes.
4. The fact that given a set X, there is a set 2^X of all subsets of X.
5. Mappings: The word “mapping” is intended to be indistinguishable
from the word “function” as it is used in most literature. A mapping
f : X −→ Y is any device that associates with any element x of set
X, a unique element, denoted f (x), of the set Y . We say “device
that associates” here, rather than “rule that associates” to indicate the
generality of the notion.[1] Such a device always exists. We may define
a mapping f : X → Y as a subset R ⊆ X × Y with the property that
for every x ∈ X, there exists exactly one y ∈ Y such that the pair
(x, y) is in R. Then the notation y = f (x) simply means (x, y) ∈ R
and we metaphorically express this fact by saying "the function f sends element x to y" – as if the function were actively doing something. The
suggestive metaphors continue when we also render this same fact –
that y = f (x) – by saying that y is the image of element x or
equivalently x is a preimage of y or that x lies in the fibre f −1 (y).
Two mappings are considered equal if they “do the same things”. Thus
if f and g are both mappings (or functions) from X to Y we say that
mapping f is equal to mapping g if and only if f (x) = g(x) for all
x in X.
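Since this definition is purely set-theoretic, it can be checked mechanically on small examples. Here is a minimal Python sketch (the sets X, Y and the relation R are invented for illustration):

    # A mapping f : X -> Y modeled as a set R of pairs (x, y).
    X = {1, 2, 3}
    Y = {'a', 'b'}
    R = {(1, 'a'), (2, 'a'), (3, 'b')}

    def is_mapping(R, X, Y):
        # exactly one pair (x, y) in R for each x in X, with y drawn from Y
        return (all(sum(1 for (u, v) in R if u == x) == 1 for x in X)
                and all(v in Y for (u, v) in R))

    def apply(R, x):
        # y = f(x) simply means (x, y) is in R
        [y] = [v for (u, v) in R if u == x]
        return y

    assert is_mapping(R, X, Y) and apply(R, 2) == 'a'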
6. Identity mappings. A very special example is the following: The mapping 1X : X → X which takes each element x of X to itself – i.e. 1X (x) = x – is called the identity mapping on set X. This mapping is very special and is uniquely defined just by specifying the set X.
7. Domains, codomains, restrictions and extensions of mappings. In defining a mapping f : X → Y , the sets X and Y are a vital part of the definition of a mapping or function. The set X is called the domain of the function; the set Y is called the codomain of the mapping or function.

A simple manipulation of both sets allows us to define new functions from old ones. For example, if A is a subset of the domain set X, and f : X → Y is a mapping, then we obtain a mapping

f |A : A → Y

which sends every element a ∈ A to f (a) (which is defined a fortiori). This new function is called the restriction of the function f to the subset A. If g = f |A , we say that f extends function g.

Similarly, if the codomain Y of the function f : X → Y is a subset of a set B (that is, Y ⊆ B), then we automatically inherit a function f |B : X → B just from the definition of "function". When f : X → Y = X is the identity mapping on X, the replacement of the codomain from X by a larger set B yields a mapping 1X |B called the containment mapping.[2]

[1] The word "rule" may elicit its ordinary-language meaning of an English sentence which instructs you how to compute an image – say, a power series formula for a function. But English sentences are finite and there are only countably many of them – while the number of functions between infinite sets is usually uncountable. You would be amazed how many dreadful Calculus texts define a function as a "rule" which associates a unique y for each x of the domain set.
8. The composition of mappings. If f : X → Y and g : Y → Z, there
is a well-defined mapping g ◦ f : X → Z which takes any x ∈ X to
g(f (x)) in Z. The student should understand the “associativity” of
the composition operation when all compositions are defined. Thus if
h : Z → W and f and g are as indicated, then h ◦ (g ◦ f ) = (h ◦ g) ◦ f .
The reader should know that if f : X → Y is any mapping, then
1Y ◦ f = f, and f ◦ 1X = f.
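These laws can be verified by direct computation on examples; a minimal Python sketch (the particular maps are invented):

    # Composition g o f as left operators, with associativity and identities.
    def compose(g, f):
        return lambda x: g(f(x))            # (g o f)(x) = g(f(x))

    f = lambda x: x + 1                     # f : X -> Y
    g = lambda y: 2 * y                     # g : Y -> Z
    h = lambda z: z - 3                     # h : Z -> W
    identity = lambda x: x                  # the identity mapping

    for x in range(5):
        # associativity: h o (g o f) = (h o g) o f
        assert compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x)
        # identity laws: 1_Y o f = f = f o 1_X
        assert compose(identity, f)(x) == f(x) == compose(f, identity)(x)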
9. Images and range. If f : X → Y is a mapping, the collection of all
“images” f (x) of elements of X is clearly a subset of Y which we call
the image or range of the function f and is denoted f (X).
10. One-to-one mappings (injections) and onto mappings (surjections). A
mapping f : X → Y is called onto or surjective if and only if Y =
f (X) as sets. That means that every element of set Y is the image of
some element of set X, or, put another way, the fibre f −1 (y) of each
element y of Y is nonempty.
A mapping f : X → Y is said to be one-to-one or injective if any
two distinct elements of the domain are not permitted to yield the same
image element. This is another way of saying that the fibre f −1 (y) of
each element y of Y is permitted to have at most one element.
[2] Unlike the notion of "restriction", this construction does not seem to enjoy a uniform name.
11. The reader should be familiar with the fact that the composition of two
injective mappings is injective and the composition of two surjective
mappings (performed in that chronological order) is surjective. She or
he should be able to prove that the restriction f |A : A → Y of any
injective mapping f : X → Y (here A is a subset of X) is an injective
mapping.
12. Bijections. A mapping f : X → Y is a bijection if and only if it is both injective and surjective – that is, both one-to-one and onto (in the pre-Bourbaki vernacular). When this occurs, every element y ∈ Y has its
fibre f −1 (y) containing a unique element which can unambiguously be
denoted f −1 (y). This notation allows us to define the unique function
f −1 : Y → X which we call the inverse of the bijection f . Note
that the inverse mapping possesses these properties,
f −1 ◦ f = 1X , and f ◦ f −1 = 1Y ,
where 1X and 1Y are the identity mappings on sets X and Y , respectively.
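On finite sets all of these conditions are mechanical to test, as in the following minimal Python sketch (example sets invented):

    # Testing the conditions of items 10-12 on finite sets.
    def image(f, X):
        return {f(x) for x in X}

    def is_surjective(f, X, Y):
        return image(f, X) == Y              # every fibre is nonempty

    def is_injective(f, X):
        return len(image(f, X)) == len(X)    # fibres have at most one element

    def inverse(f, X):
        # defined only when f is a bijection onto its codomain
        return {f(x): x for x in X}

    X, Y = {0, 1, 2}, {'a', 'b', 'c'}
    f = {0: 'a', 1: 'b', 2: 'c'}.__getitem__
    assert is_injective(f, X) and is_surjective(f, X, Y)
    f_inv = inverse(f, X)
    assert all(f_inv[f(x)] == x for x in X)  # the inverse undoes f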
13. Indexing families of subsets of a set X with the notation {Xα }I –
or {Xα }α∈I . This should be understood in its guise as a mapping
I −→ 2^X.[3]

[3] Note that in the notation, the "α" is ranging completely over I and so does not itself affect the collection being described; it is what Logicians call a "bound" variable.
14. The construction hom(A, B) as the set of all mappings A −→ B. (This
is denoted B^A in some parts of mathematics.) The reader should see
that if A is a finite set, say with n elements, then hom(A, B) is just the
n-fold Cartesian product of B with itself
B × B × · · · × B (with exactly n factors)
Also if N denotes the collection of all natural numbers, then hom(N, B)
is the set of all sequences of elements of B.
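For finite A this identification can be exhibited by brute enumeration, as in this minimal Python sketch (example sets invented):

    # hom(A, B) for finite A: one mapping per choice of an image for each
    # element of A, so |hom(A, B)| = |B|^|A|.
    from itertools import product

    A = (1, 2, 3)                  # n = 3 elements, in a fixed order
    B = {'x', 'y'}
    hom_AB = [dict(zip(A, values)) for values in product(B, repeat=len(A))]
    assert len(hom_AB) == len(B) ** len(A)   # 2^3 = 8 mappings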
Notation for the image of an element in the domain of a mapping.

There is a perennial awkwardness cast over common mathematical notation for the composition of two maps. Mappings are sometimes written as left operators and sometimes as right operators, and the awkwardness is not the same for both choices due to the asymmetric fact that English is read from left
to right. Because of this, right operators work much better for representing
the action of sets with a binary operation as mappings with compositions.
Then why are left operators used at all? There are two answers: Suppose there is a division of the operators on X into two sets – say A and B. Suppose also that if an operator a ∈ A is applied first in chronological order and an operator b ∈ B is applied afterwards, the result is always the same as if we had applied b first and then applied a later. Then we say that the two operations "commute" (at least in the time scale of their application, if not the temporal order in which the operators are read from left to right). This property can often be more conveniently rendered by having one set, say A, be the left operators, and the other set B be the "right" operators, and then expressing the "commutativity" as something that looks like an "associative law". So one needs both kinds of operators to do that.
The second reason for using left operators is not so mathematical: the
answer comes from the sociological accident of English usage. In the English
phrase “function of x”, the function comes first, and then its argument. This
is reflected in the left-to-right notation “f (x)” so familiar from calculus.
Then if the composition α◦β were to mean “α is applied first and then β”,
we must have (α ◦ β)(x) = β(α(x)). That is, we must reverse their “reading”
order.
On the other hand, if we say α ◦ β means β is applied first and α is applied second – so (α ◦ β)(x) = α(β(x)) – then things are nice as far as the treatment of parentheses is concerned, but we seem to be reading things in the reverse chronological order (unless we compensate by reading from right to left). Either way there is an inconvenience. A (not all that overwhelming) majority in the literature have chosen the latter as the lesser of the two evils.
Here, at least, we adopt the latter conventions insofar as left operators are
concerned. So we have this
(Rule for Composition of Left Operators) If α : X → Y and β : Y → Z
are left operators on their respective domains X and Y , then, for each
x ∈ X
(β ◦ α)(x) := β(α(x));
— so that β ◦ α : X → Z denotes the composition of the mappings,
viewed as a left operator on X.
The first argument certainly applies to discussions of representations of
groups and rings, where on the one hand, it is really handier to think of the
right action of groups or rings as induced mappings which are right operators.
At the same time, one desires “morphisms” which commute with all these
operators, and so it is easier to think of these morphisms as left operators:
so both types of operators must be used.
The convention of these notes is to write mappings as left operators, and
to indicate right operators by exponential notation or with explicit apologies
(as in the case of right R-modules), as right multiplications.
Thus in general we adopt these rules:
Rule #1 α ◦ β denotes the composition resulting from first applying β and
then α in chronological order. Thus
(α ◦ β)(x) = α(β(x)).
Rule #2 Exponential notation indicates right operators. Thus compositions have these images:

x^(α◦β) = (x^α)^β.
Exception to Rule #2 For right R-modules we indicate the right operators by right "multiplication", that is, right juxtaposition. Ring multiplication still gets represented the right way since we have (mr)s = m(rs) for module element m and ring elements r and s (it looks like an associative law). (The reason for eschewing the exponential notation in this case is that the law m^(r+s) = m^r + m^s for right R-modules would then not look like a right distributive law.)
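The contrast between the two conventions can be made concrete; in this minimal Python sketch (maps invented), under Rule #1 the right-hand factor acts first, while under Rule #2 the left-hand factor acts first:

    a = lambda x: x + 1
    b = lambda x: 2 * x

    # Rule #1 (left operators): (a o b)(x) = a(b(x)) -- b acts first.
    left = lambda x: a(b(x))

    # Rule #2 (right operators, exponential style): x^(a o b) = (x^a)^b,
    # read left to right in chronological order -- a acts first.
    right = lambda x: b(a(x))

    assert left(3) == 7 and right(3) == 8    # the two conventions differ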
1.1.3 Partitions
A partition of a set X is a collection {Xα }I of non-empty subsets Xα of X
satisfying
1. ∪α∈I Xα = X
2. The sets Xα are pairwise disjoint – that is, Xα ∩ Xβ = ∅ whenever α ≠ β.
The sets Xα of the union are called the components of the partition.[4]

[4] It does not seem to be useful to allow partitions to have empty components.
A partition may be described in another way: as a surjection π : X −→ I.
Then the collection of fibres – that is, the sets π −1 (α) := {x ∈ X|π(x) = α}
as α ranges over I – form the components of a partition. Conversely, if
{Xα }I is a partition of X, then there is a well-defined surjection π : X −→ I
which takes each element of X to the index of the unique component of the
partition which contains it.
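This correspondence is easy to exhibit on a small example; here is a minimal Python sketch in which π is taken (for illustration only) to be the parity map:

    # The components of a partition as the fibres of a surjection pi : X -> I.
    X = set(range(10))
    I = {0, 1}
    pi = lambda x: x % 2

    components = {i: {x for x in X if pi(x) == i} for i in I}
    assert set().union(*components.values()) == X      # the fibres cover X
    assert components[0].isdisjoint(components[1])     # pairwise disjoint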
In these lecture notes we shall write X = A + B (rather than X = A ∪̇ B or X = A ⊎ B) to express the fact that {A, B} is a partition of X with just two components A and B. Similarly we write
X = X1 + X2 + ... + Xn
when X possesses a partition with n components Xi , i = 1, 2, . . . , n. This
notation goes back to Galois’ rendering of a partition of a group by cosets
of a subgroup. The notation is very convenient since one doesn’t have to
“doctor up” a “cup” (or “union”) symbol. Unfortunately similar notation is
also used in more algebraic contexts with a different meaning. We resolve
the possible ambiguity in this way:
When a partition (rather than, say, a sum of ideals) is intended, the partition
will simply be introduced by the words “partition” or “decomposition”.
1.1.4 Zorn's Lemma and the Axiom of Choice.
The student ought to be acquainted with and to be able to use
The Axiom of Choice Suppose {Xα }I is a family of pairwise disjoint
non-empty sets. Then there exists a subset R of the union of the Xα , which
meets each component Xα in exactly a one-element set.
This assertion is about the existence of systems of representatives. If the index set I is finite, this can be proved from ordinary set theory. But the reader should be aware that in its full generality it is independent of set theory, and yet, is consistent with it. The reader is thus
encouraged to think of it as an adjunct axiom to set theory, to make a note
of each time it is used, and to quietly employ the appropriate guilt feelings
when using it.
Obviously it has many uses. For example it guarantees us a system of
coset representatives for any subgroup of any group. In fact we shall see an
application of it in the very next subsection on cardinal numbers.
In the presence of set theory (which is actually ever-present for the purposes of these notes) the Axiom of Choice is equivalent to another assertion
called Zorn’s Lemma, which should also be familiar to the reader. Since it
appears in the setting of partially ordered sets, its discussion is deferred to
a later section of this chapter.
The reader need not be required to know a proof of the equivalence of
the Axiom of Choice and Zorn’s Lemma, or a proof of their consistency with
set theory. At this stage just the psychological assurances are enough to
read these notes. For a good development of the many surprising equivalent
versions of the Axiom of Choice the truly curious student is encouraged to
peruse the excellent and readable monograph of Prof. Mary Ellen Rudin as
well as read another study in clarity: the book by Keith Devlin entitled The Joy of Sets.
1.1.5 Cardinal Numbers
Not all collections of things are actually amenable to the axioms of set theory, as Russell's paradox illustrates. Nonetheless certain operations and constructions on such collections can still exist. It is still possible that they possess equivalence relations, and that is true of the collection of all sets.
We have mentioned that a mapping f : A −→ B which is both an injection and a surjection is called a bijection or a one-to-one correspondence
in a slightly older language. In that case the partition of A defined by the surjection f (see above) is the trivial partition of A into its one-element subsets. This means the fibering f⁻¹ defines an inverse mapping f⁻¹ : B −→ A which is also a bijection, satisfying f⁻¹ ◦ f = 1A and f ◦ f⁻¹ = 1B , as described above. It is clear, using obvious facts about compositions of bijections and identity mappings, that the relation of two sets
having a bijection connecting them is an equivalence relation. The resulting
equivalence classes are called cardinal numbers and the equivalence class
containing set X is denoted |X| and is called the cardinality of X.
One recalls that one writes |X| ≤ |Y | if and only if there exists an injection
f : X −→ Y . This is a well-defined relation between cardinal numbers of sets, for if |X| = |X′| then there is a bijection g : X′ −→ X and the composition f ◦ g defines an injection from X′ into Y , so |X′| ≤ |Y | by definition.
These lecture notes presume and use the following three results concerning
cardinalities of sets:
Theorem 1 (The Bernstein-Schröder Theorem) If for two sets X and Y ,
|X| ≤ |Y | and |Y | ≤ |X|, then |X| = |Y |.
(A nice proof with pictures appears in the famous book of Birkhoff and
Mac Lane.)
Theorem 2 If there exists a surjection f : X −→ Y then |Y | ≤ |X|.
Proof: With the Axiom of Choice one gets a subset X′ containing exactly one element x_y from each fiber f⁻¹(y) as y ranges over Y . One then has an injection Y −→ X taking y to x_y .
Theorem 3 If I is an infinite set, then there exists a partition I = A + B
with both A and B in bijective correspondence with I – that is, |A| = |I| =
|B|.
1.1.6 Special results on cardinalities
As usual, let N denote the set of natural numbers, that is, the set {0, 1, 2, . . .}
of all integers which are positive or zero (non-negative). There is a useful
special result whose proof may not have appeared in the student’s undergraduate experience.
Theorem 4 If I is an infinite set, then |I × N| = |I|.
Proof: By Theorem 3 just above, there is a partition I = A + B and bijections α : I −→ A and β : I −→ B. Set B_1 := B = β(I), B_k := β^k(I), and A_k := β^k ◦ α(I), so that A_0 = α(I) = A. We then have the diagram

I −→ B_1 −→ B_2 −→ B_3 −→ · · ·

α ↓

A −→ A_1 −→ A_2 −→ A_3 −→ · · ·

in which every horizontal arrow is β and each A_k is contained in the B_k directly above it. For any n ∈ N we have a partition

I = A + A_1 + A_2 + · · · + A_n + B_{n+1}.

Setting Ã = A + A_1 + A_2 + · · · (an infinite union of pairwise disjoint sets) we see that the formula

ψ(x, n) := β^n ◦ α(x)

defines a mapping ψ : I × N −→ Ã whose restriction to I × {n} induces an injection I × {n} → A_n. Since the sets A_n are pairwise disjoint, ψ is an injection. Thus by definition |I × N| ≤ |I|. But as there is a bijection I → I × {1} ⊆ I × N we also have |I| ≤ |I × N|. The result now follows from the Bernstein-Schröder Theorem.
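For the special case I = N the theorem asserts |N × N| = |N|, and here an explicit bijection is classical: the Cantor pairing function (a standard fact, not extracted from the proof above). A minimal Python sketch testing it on a finite window:

    # The Cantor pairing (m, n) |-> (m+n)(m+n+1)/2 + n is a bijection
    # from N x N onto N; we check injectivity on a finite window.
    def cantor_pair(m, n):
        return (m + n) * (m + n + 1) // 2 + n

    pairs = [(m, n) for m in range(50) for n in range(50)]
    codes = {cantor_pair(m, n) for (m, n) in pairs}
    assert len(codes) == len(pairs)      # no two pairs share a code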
Corollary 5. Let {Xα}I be a family of finite non-empty subsets of a set S indexed by an infinite set I. Then

|∪α∈I Xα| ≤ |I|.

Proof: We may formally construct the disjoint union of the Xα by replacing each Xα by Xα × {α} and taking the union

F = ∪α (Xα × {α})

in the ambient set S × I. Then the projection on the first coordinate defines a surjection

F → U := ∪α Xα.

For each α there exists some surjection φα : N → Xα since Xα is a finite set. This family {φα} in turn defines a surjection

ψ : I × N → F

upon setting ψ(α, m) := (φα(m), α) ∈ F. Since the composition of the two surjections presented above is a surjection I × N → U we have

|I × N| ≥ |U|

by Theorem 2 above. But by Theorem 4 just preceding, |I × N| = |I|, and so |I| ≥ |U| follows.
1.2 Binary operations and Monoids.
Suppose X is a set and let X^(n) be the n-fold Cartesian product of X with itself. For n > 0, a mapping

X^(n) → X

is called an n-ary operation on X. If n = 1 it is just a mapping of X into itself. There are certain concepts that are brought to bear at the level of 2-ary (or binary) operations that fade away for larger n.
We say that set X admits a binary operation if there exists a 2-ary
function f : X × X → X. In this case, it is possible to indicate the operation
by a constant symbol (say “*”) inserted between the elements of an ordered
pair — thus one might write “x*y” to indicate f ((x, y)) (which we shall write
as f (x, y) to rid ourselves of one set of superfluous parentheses). Indeed one
might even use the empty symbol in this role – that is f (x, y) is represented
by “juxtaposition” of symbols: we write “xy” for f (x, y). This seems the
simplest way to describe the relevant properties.
The binary operation on X is said to be commutative if and only if
xy = yx, for all x and y in X.
The operation is associative if and only if
x(yz) = (xy)z for all x, y, z ∈ X.
A set admitting an associative binary operation is called a semigroup. A
semigroup is said to possess an identity element or two-sided identity if and only if there exists an element e ∈ X such that
ex = x = xe for all x ∈ X.
A semigroup with a two-sided identity is called a monoid. A semigroup (or
monoid) with respect to a commutative binary operation is simply called a
commutative semigroup (or commutative monoid).
For example the non-negative integers, N0 , form a commutative monoid
under ordinary addition of integers.
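Such axioms are finitely checkable on small examples; a minimal Python sketch (the example, addition modulo 4, is an invented illustration):

    # Brute-force check that ({0,1,2,3}, + mod 4) is a commutative monoid.
    X = range(4)
    op = lambda x, y: (x + y) % 4

    assert all(op(x, op(y, z)) == op(op(x, y), z)
               for x in X for y in X for z in X)            # associative
    assert all(op(x, y) == op(y, x) for x in X for y in X)  # commutative
    e = 0
    assert all(op(e, x) == x == op(x, e) for x in X)        # two-sided identity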
It is not our intention to venture into various algebraic structures at
such an early stage. But we are forced to single out monoids, since they are
always lurking around so many of the most basic definitions (for example,
interval measures on posets, just to name one).
Chapter 2
Partially Ordered Sets and Dependence Theories

2.1 Introduction
The reader is certainly familiar with examples of sets X possessing a transitive relation “≤” which is antisymmetric. Such a pair (X, ≤) is called a
partially ordered set. For any subset Y of X there is a partial ordering (Y, ≤)
induced on Y by the partially ordered set (X, ≤) which surrounds it: one
merely restricts the relation "≤" to the pairs of Y × Y . We then call (Y, ≤)
an induced poset of (X, ≤). Certainly one example familiar to most readers
is the poset P(X) := (2^X, ⊆) called the power poset of all subsets of X
under the containment relation.
Throughout this algebra course one will encounter sets X which are closed
under various n-ary operations subject to certain axioms – that is,
some species of “algebraic object”. Each such object X naturally produces a
partially ordered set whose members are the subsets of X closed under these operations – that is, the poset of algebraic subobjects of the same species.
Of course, if there are no particular operations or axioms to be preserved,
we have merely redescribed the power poset P(X) := (2^X, ⊆) mentioned in the preceding paragraph. If X is some algebraic species closed under
operations subject to axioms, then its poset of algebraic subobjects of the
same species is certainly an induced subposet of the power poset P(X). But
as a convention to distinguish the two, instead of writing “A ⊆ B” (the
simple containment relation) we write “A ≤ B” to indicate that A is an
algebraic subobject of B of the same algebraic species.
For example, if X is a group, then the poset of algebraic subobjects is
the poset of all subgroups of X. If R is a ring and M is a right R-module,
then we obtain the poset of submodules of M . Special cases are the posets
of vector subspaces of a right vector space V and the poset of right ideals
of a ring R. In turn these posets have special induced posets: Thus the
poset L(V ) of all finite-dimensional vector subspaces of a vector space V
is transparently a subposet of the poset of all vector subspaces of V . For
example from a group G one obtains the poset of normal subgroups of G, or
more generally, the poset of subgroups invariant under any fixed subgroup
A of the automorphism group of G. Similarly there are posets of invariant
subrings of polynomial rings, and invariant submodules for an R-module
admitting operators. Finally, there are induced posets of algebraic objects
which are closed with respect to a closure operator on a poset (perhaps
defined by a Galois connection).
All of these examples will be made precise later. The important thing to
note at this stage is that
1. Partially ordered sets underlie all of the algebraic structures discussed
in this book.
2. Many of the crucial conditions which make arguments work are basically properties of the underlying posets alone and do not depend on
the particular species of algebraic object with which one is working:
Here are the main examples:
(a) Zorn’s Lemma,
(b) the ascending and descending chain conditions,
(c) Galois connections and closure operators,
(d) interval measures on semimodular semilattices (the general Jordan-Hölder Theorem),
(e) Möbius inversion, and
(f) dependence theory (where “dimension” becomes matroid rank).
The purpose of this chapter is to introduce those basic arguments that arise
strictly from the framework of partially ordered sets, ready to be used for
the rest of this book.
2.2 Basic definitions.

2.2.1 Definition of a partially ordered set.
A partially ordered set, (P, ≤), hereafter called a poset, is a set P with
a transitive antisymmetric binary relation ≤. This means that for all elements
x, y and z of P ,
1. x ≤ y and y ≤ z together imply x ≤ z, and
2. x ≤ y and y ≤ x together imply x = y.
We write y ≥ x if x ≤ y and write x < y if x ≤ y and x ≠ y. In the latter
case we say that x is properly below y, or, equivalently, that y is properly
above x.
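These two axioms can be verified by exhaustion in small examples; a minimal Python sketch using the divisibility order (an invented illustration):

    # Divisibility on {1,...,12}: x <= y iff x divides y.
    P = range(1, 13)
    leq = lambda x, y: y % x == 0

    assert all(not (leq(x, y) and leq(y, z)) or leq(x, z)
               for x in P for y in P for z in P)        # transitive
    assert all(not (leq(x, y) and leq(y, x)) or x == y
               for x in P for y in P)                   # antisymmetric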
2.2.2 Induced subposets.
Suppose (P, ≤) is a poset, and X is a subset of P . Then we can agree to
induce the relation ≤ on the subset X — that is, for any two elements x and
y of the subset X, we agree to say that x ≤X y if and only if x ≤ y in the
poset (P, ≤). Then any child can verify that (x ≤ y) if and only if (x ≤X y)
– it is just a consequence of the definitions.
The empty set is considered to be an induced subposet of any poset.
Let X and Y be two subsets of P where (P, ≤) is a poset. We make these
observations:
1. If (X, ≤X ) is a partial ordering on X and if (Y, ≤Y ) is a partial ordering
on Y such that (X, ≤X ) is an induced subposet of (Y, ≤Y ) and (Y, ≤Y ) is
an induced subposet of (P, ≤), then (X, ≤X ) is also an induced subposet
of (P, ≤). This fact means that if X is any subset of P , and (P, ≤) is a
poset, we don’t need those little subscripts attached to the relation “≤”:
we can unambiguously write (X, ≤) to indicate the subposet induced
by (P, ≤) on X. In fact, when it is clear that we are speaking of induced
subposets, we may write X for (X, ≤) and speak of “the induced poset
X”.
2. If (X, ≤) and (Y, ≤) are induced subposets of (P, ≤) then we can form
the intersection of induced posets (X ∩Y, ≤). In fact, since this notion depends only on the underlying sets, one may form the intersection
of induced posets
(∩σ∈I Xσ , ≤),
from any family {(Xσ , ≤)|σ ∈ I} of induced posets of a poset (P, ≤).
Perhaps the most important example of an induced poset is the interval.
Suppose (P, ≤) is a poset and that x ≤ y for elements x, y ∈ P . Consider
the induced poset
[x, y]P := {z ∈ P |x ≤ z ≤ y}.
This is called the interval in (P, ≤) from x to y. If (P, ≤) is understood,
we would write [x, y] in place of [x, y]P . But we must be very careful: there
are occasions in which one wishes to discuss intervals within an induced poset
(X, ≤), in which case one would write [x, y]X for the elements z of X between
x and y. Clearly one could mix the notations and write [x, y]X := [x, y] ∩ X,
the intersection of two induced posets.
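Intervals are easy to compute in concrete posets. A minimal Python sketch in the divisibility poset (an invented illustration):

    # The interval [x, y] in the divisibility poset on {1,...,36}.
    P = range(1, 37)
    leq = lambda a, b: b % a == 0

    def interval(x, y):
        return {z for z in P if leq(x, z) and leq(z, y)}

    assert interval(2, 12) == {2, 4, 6, 12}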
2.2.3 Dual posets and dual concepts.
Suppose now that (P, ≤) is given. One may obtain a new poset, (P ∗ , ≤∗ )
whose elements are exactly those of P , but in which the partial ordering has
been reversed! Thus a ≤ b in (P, ≤) if and only if b ≤∗ a in (P ∗ , ≤∗ ). In this
case, (P ∗ , ≤∗ ) is called the dual poset of (P, ≤).
We might as well get used to the idea that for every definition regarding
(P, ≤), there is a “dual notion” — that is, the generically-defined property
or set in (P, ≤) resulting from defining the same property or set in (P ∗ , ≤∗ ).
Examples will follow.
2.2.4 Maximal and minimal elements of induced posets
An element x is said to be maximal in (X, ≤) if and only if there is no element in X which is strictly larger than x – that is, x ≤ y for y ∈ X implies x = y.
Note that this is quite different from the more specialized notion of a
global maximum (over X) which would be an element x in X for which
x0 ≤ x for all elements x0 of X. Of course defining something does not posit
its existence; (X, ≤) may not even possess maximal elements, or if it does, it
may not contain a global maximum.
We let max X denote the full (but possibly empty) collection of all maximal elements of (X, ≤).
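A minimal Python sketch computing max X for an induced subposet of the divisibility poset (an invented illustration); notice that the answer is an antichain and that no global maximum exists in this example:

    leq = lambda a, b: b % a == 0        # x <= y iff x divides y
    X = {2, 3, 4, 6}

    def max_elements(X):
        # x is maximal iff x <= y with y in X forces x = y
        return {x for x in X if all(not leq(x, y) or x == y for y in X)}

    assert max_elements(X) == {4, 6}     # an antichain; no global maximum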
By replacing the symbol "≤" by "≥" in the preceding definitions, we
obtain the dual notions of a minimal element and a global minimum of
an induced poset (X, ≤).
Of course some posets may contain no minimal elements whatsoever.
2.2.5 Global maxima and minima: zero's and one's in posets.
In some posets (P, ≤) there exists an element "0P" (just to give it a name) which is less-than-or-equal to all other elements of P . Clearly such an element is unique. We call it the zero-element of the poset. From its definition, a zero element would always be the global minimum.

Dually, there may exist a "one-element", denoted 1P , or something similar, which is an element in (P, ≤) with the property that all other elements are less-than-or-equal to it. Obviously this is the zero-element of (P∗, ≤∗).
Some posets have a “zero”, some have a “one”, some have both, and some
have neither.
But whether or not a “zero” is present in P , one can always adjoin a
new element declared to be properly below all elements of P to obtain a
new poset 0(P ). For example, if P contains just one element p, then 0(P ) consists of just two elements {0_1, p} with 0_1 < p – called a chain of length one. Iterating this construction with the same meaning of P , we see that 0^2(P ) introduces a "new zero element", 0_2, to produce a 3-element poset with 0_2 < 0_1 < p – that is, a chain of length two. Clearly 0^k(P ) would be a poset with elements (up to a renaming of the elements) arranged as

0_k < 0_{k−1} < · · · < 0_1 < p,
which we call a chain of length k. Of course this construction can also
be performed on any poset P so that 0^k(P ) in general becomes the poset
P with a tail of length k − 1 adjoined to it from below – a sort of attached
“kite-tail”.
Also, by dually defining 1^k(P ) one can attach a "stalk" of length k − 1
above an arbitrary poset P .
2.2.6 Total orderings and chains
A poset (P, ≤) is said to be totally ordered if {c, d} ⊆ P implies c ≤ d or d ≤ c. (Sometimes this notion is referred to as "simply ordered".) Obviously
any induced poset of a totally ordered set is also totally ordered. Any maximal (or minimal) element of a totally ordered set is in fact a global maximum
(or global minimum). Familiar examples of totally ordered sets are obtained
as induced posets of the real numbers under its usual ordering, for example (1) the rational numbers, (2) the integers, (3) the positive integers, (4) the open, half-closed or closed intervals (a, b), [a, b), (a, b], or [a, b] of the real number system, or (5) the intersections of sets in (4) with those in (1)-(3).
A chain of (P, ≤) is an induced poset (C, ≤) which happens to be totally
ordered.
The familiar real number system under the natural ordering is totally
ordered, and so are any of its induced subposets such as the integers, the
positive integers, any real open, half-open, or closed interval, or any intersection of these intervals with the integers or rational numbers etc.
2.2.7 Antichains.
The reader is certainly familiar with many posets which are not totally ordered, such as the set P(X) of all subsets of a set X of cardinality at least two. Here the relation “≤” is containment of sets. Again there are many structures that can be viewed as a collection of sets, and thus become a poset under this same containment relation: for example, subspaces of a vector space, subspaces of a point-line geometry, subgroups of a group, ideals in a ring, and R-submodules of a given R-module – in general, nearly any admissible subobject of an object admitting some specified set of algebraic properties.
Two elements x and y are said to be incomparable if both of the statements x ≤ y and y ≤ x are false. A set of pairwise incomparable elements in a poset is called an antichain.[1]
The set max(X, ≤), where (X, ≤) is an induced subposet of (P, ≤), is always an antichain.
[1] In a great deal of the literature, sets of pairwise incomparable elements are called independent. Despite this convention, the term “independent” has such a wide usage in mathematics that little is served by employing it at this point to indicate what we have called an antichain. However, some coherent sense of the term “independence” – of the sort useful for Algebra – is exposed in a section on Matroids later in this chapter.
2.2.8 Zornification
Although an induced poset (X, ≤) of a poset (P, ≤) may not have a maximum, it might have an upper bound – that is, an element m in P which is larger than anything in the subset X (precisely rendered by “m ≥ x for all elements x of X”). (Of course if it happened that such an element m were already in X, then it would be a global maximum of X.)
The dual notion of “lower bound” should be easy to formulate.
Of course it would be useful to know that induced posets (X, ≤) actually
have maximal elements. We mention here a criterion for asserting that such
maximal elements exist.
Zorn’s Lemma Suppose (P, ≤) is a poset for which any chain has an upper
bound. Then (P, ≤) possesses a maximal element.
It has had many equivalent formulations, and has gone by many names, such as “the Hausdorff maximum principle”, “Zorn’s Lemma”, and others.[2] However it is not a Lemma or even a Theorem. It does not follow from the axioms of set theory, nor does it contradict them. That is why we called it “a criterion for an assertion”. Using it can never produce a contradiction with formal set theory. But since its denial also cannot produce such a contradiction, one can apparently have it either way. The experienced mathematician, though not always eschewing its use, at least prudently keeps a running tab on how many appeals to the maximum principle he or she has invoked.[3]
In these notes the maximum principle will be used only very sparingly.
2.2.9 Order ideals and filters.
For this subsection fix a poset (P, ≤). An order ideal of P is an induced
subposet (J, ≤), with this property:
If y ∈ J and x is an element of P with x ≤ y, then x ∈ J.
[2] For a nice account of the history of this widely used principle, see Mary Ellen Rudin’s book.
[3] Certainly using the Maximum Principle should be accompanied by the appropriate feeling of guilt. Who knows? Each indulgence in Zornification might revisit some of you in another life.
In other words, an order ideal is a subset J of P with the property that once
some element belongs to J, then all elements of P below that element are
also in J.
Note that the empty subposet is an order ideal.
The reader may check that the intersection of any family of order ideals is an order ideal. (Since order ideals are a species of induced posets, we are using “intersection” here in the sense of the earlier subsection 2.2.2 on induced subposets.)
Then there is the dual notion. Suppose (F, ≤∗) were an order ideal of the dual poset (P, ≤∗). Then what sort of subset of P is F? It is characterized by being a subset of P with this property:
1. If x ∈ F and y is any element of P with x ≤ y, then y ∈ F .
Any subset of P with this property is called a filter.
There is an easy way to construct order ideals. Take any subset X of P .
Then define
PX := {y ∈ P |y ≤ x for some element x ∈ X}.
Note that always X ⊆ PX . In fact the order ideal PX is “generated” by
X in the sense that it is the intersection of all order ideals that contain X.
Also, we understand that P∅ = ∅.
In the particular case that X = {x} we write Px for P{x}, and call Px the principal order ideal generated by x (or just a principal order ideal if x is left unspecified).
Of course any order ideal has the form PX (all we have to do is set
X = PX ) but in general, we do not need all of the elements of X. For
example if x1 < x2 for xi ∈ X, and if we set X 0 := X − {x1 }, then PX = PX 0 .
Thus if we throw out elements of X each of which is below an element left
behind, the new set defines the same order ideal that X did. At first the
student might get the idea that we could keep throwing out elements until
we are left with an antichain. That is indeed true when X is a finite set, or
more generally if every element of X is below some member of max(X, ≤).
But otherwise it is generally false.
The dual notion of the filter generated by X should be transparent. It is the set P^X of elements of P which bound from above at least one element of X. It could be described as
P^X := {y ∈ P | Py ∩ X ≠ ∅}.
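Computationally, both constructions fall straight out of the definitions; here is a sketch of ours over the (hypothetical) poset of divisors of 12:

    # The divisors of 12 ordered by divisibility.
    P = {1, 2, 3, 4, 6, 12}
    def leq(a, b):
        return b % a == 0

    def order_ideal(X):    # P_X = {y in P : y <= x for some x in X}
        return {y for y in P if any(leq(y, x) for x in X)}

    def order_filter(X):   # P^X = {y in P : x <= y for some x in X}
        return {y for y in P if any(leq(x, y) for x in X)}

    print(sorted(order_ideal({4, 6})))    # [1, 2, 3, 4, 6]
    print(sorted(order_filter({4, 6})))   # [4, 6, 12]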
By duality, the intersection of any collection of filters is a filter.
An induced subposet (X, ≤) of (P, ≤) is said to be convex if, whenever x1 and x2 are elements of X with x1 ≤ x2 in (P, ≤), then in fact the entire interval [x1, x2]P is contained in X.
2.2.10 Products of posets.
Suppose (P1, ≤) and (P2, ≤) are two posets.[4]
The product poset (P1 × P2 , ≤) is the poset whose elements are the
elements of the Cartesian product P1 × P2 , where element (a1 , a2 ) is declared
to be less-than-or-equal to (b1 , b2 ) if and only if
a1 ≤ b1 and also a2 ≤ b2 .
It should be clear that this notion can be extended to any collection of posets {(Pσ, ≤) | σ ∈ I}. Its elements are the elements of the Cartesian product Π_{σ∈I} Pσ – that is, the functions f : I → U, where U is the disjoint union of the sets Pσ, with the property that at each σ in I, f assumes a value in Pσ. Then f ≤ g if and only if f(σ) ≤ g(σ) for each σ ∈ I.
Suppose now each poset Pσ contains its own “zero element” 0σ, less than or equal to all other elements of Pσ. We can define a direct sum of posets {(Pσ, ≤)} as the induced subposet of the direct product consisting only of the functions f ∈ Π_{σ∈I} Pσ for which f(σ) differs from 0σ for only finitely many σ. This poset is denoted Σ_{σ∈I} Pσ.
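A sketch (ours) of the componentwise order on the product of two finite chains; note how incomparable pairs arise even though each factor is totally ordered:

    P1 = [0, 1, 2]     # a 3-element chain
    P2 = [0, 1]        # a 2-element chain

    def leq_prod(a, b):
        # (a1, a2) <= (b1, b2) iff a1 <= b1 and a2 <= b2
        return a[0] <= b[0] and a[1] <= b[1]

    # (1, 0) and (0, 1) are incomparable in the product order:
    print(leq_prod((1, 0), (0, 1)), leq_prod((0, 1), (1, 0)))   # False False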
[4] Usually authors feel that the two poset relations should always have distinguished notation – that is, one should write (P1, ≤1) and (P2, ≤2) instead of what we wrote. At times this can produce intimidating notation that would certainly finish off any sleepy students with detectable breath. Of course that precaution certainly seems to be necessary if the two sets P1 and P2 are identical. But sometimes this is a little over-done. Since we already have posets denoted by pairs consisting of the set Pi and a symbol “≤”, the relation “≤” is assumed to be the one operating on set Pi, and we have no ambiguity except possibly when the ground sets (Grundmengen) Pi are equal. (I just made up that word! I doubt if decent Germans use it.) Of course in the case the two “ground-sets” are equal we do not hesitate for a moment to adorn the symbol “≤” with further distinguishing emblems. This is exactly what we did in defining the dual poset. But even in the case that P1 = P2 one could say that in the notation, the relation “≤” is determined by the name Pi of the set, rather than the actual set, so even then the “ordered pair” notation makes everything clear. I hope this makes everything clear.
2.2.11 Examples and exercises.
Since we will need to discern whether two examples of posets are the same
poset or not, we must at least define an isomorphism of posets. An isomorphism of poset (P1 , ≤1 ) to (P2 , ≤2 ) is a bijection f : P1 → P2 with the
property that for any two elements x and y of P1 ,
x ≤1 y if and only if f (x) ≤2 f (y).
As explained in the introduction of Chapter 1, the isomorphism simply
amounts to a change of names – each “x” in P1 is renamed “f(x)” and the
relation “≤1 ” is renamed “≤2 ”. We also say that (P1 , ≤1 ) is isomorphic
to (P2 , ≤2 ) when there is an isomorphism from the former to the latter.
Among the vast collection of all posets, the relation of being isomorphic is
an equivalence relation. Also, as noted in Chapter 1, algebraists tend to
abuse the language by saying that two posets “are the same poset” if and
only if they belong to the same equivalence class under isomorphisms.
This abuse can be rendered tautologous by thinking of an (abstract) poset
as the isomorphism class of posets itself, and the examples of posets as
presentations or models of the general (abstract) poset.
In general, in algebra, we want two objects to be isomorphic if and only if
one can get a complete description of the second object simply by changing
the names of the operating parts of the first object and the names of the
relations among them that must hold – and vice versa. This is the “alias”
point of view; the two structures are really the same thing with some name changes imposed.[5]
The other way is to form a bijection of the relevant parts of the first
object with the second such that relations hold among parts in the domain
if and only if the corresponding relations hold among their images.[6]
There is no logical distinction between the two approaches – only a psychological one. My wish is to get across to the student all that is expected when we speak of isomorphisms – we are merely dealing with the same situation, but under a new management which has renamed everything. Unfortunately “renaming” is a subjective human conceptualization that is awkward to define precisely – that is why, at the beginning, we like to make the definitions in terms of bijections rather than “renamings”, even though the latter is how many of us secretly think of it.
[5] I must confess that this is the view I normally prefer.
[6] I have been deliberately vague in talking about parts rather than “elements” for the sake of generality.
We shall meet this over and over again; it is the “abstractness” of Abstract
Algebra.[7]
Example 1. (Examples of totally ordered sets) Isomorphism classes of these sets are sometimes called order types (for well-ordered sets they are the ordinal numbers). The most familiar examples are these:
1. The natural numbers This is the set of positive integers,
N := {1, 2, 3, 4, 5, 6, . . .},
with the usual ordering
1 < 2 < 3 < 4 < 5···.
Rather obviously, as an ordered set, it is isomorphic to the chain
0 < 1 < 2 < ···
or even
k < k + 1 < k + 2 < k + 3 < ···,
k any integer, under the shift mapping n → n + k − 1.[8]
2. The system of integers
Z := {. . . < −2 < −1 < 0 < 1 < 2 < . . .}.
3. The system of the rational numbers with respect to the usual ordering – that is, a/b > c/d if and only if ad > bc, an inequality of integers (with the denominators b and d taken positive).
[7] There is a common misunderstanding of this word “abstract” that mathematicians seem condemned to suffer. To the man on the street, “abstract” seems to mean “having no relation to the world, no applications”. Unfortunately this is the overwhelming view of State Legislatures, pundits of Education, and University Administrators everywhere (one hears words like “Ivory Tower”, “Intellectuals on welfare”, etc.). On the contrary, a concept is “abstract” precisely because it has more than one application – not that it hasn’t any application, as one would expect from Ivory-Tower concepts remote from the world. They have got it just backwards!
[8] This isomorphism explains why it is commonplace to do an induction proof with respect to the second of these examples rather than the first. In enumerative combinatorics, for example, the “natural numbers” are defined to be all non-negative integers, not just the positive integers.
4. The real number system R.
5. Any induced subposet of a totally ordered set. We have already mentioned intervals of the real line. (Remark: the word “interval” here is used for the moment as it is used in Freshman College Algebra – open, closed, and half-open intervals such as (a, b] or [a, ∞). In this context, the intervals of posets that we defined earlier become the closed intervals [a, b] of the real line, with a consistency of notation.)
Here is an example: Consider the induced poset of the rational numbers (Q, ≤) consisting of those positive fractions less than or equal to 1/2 which (in lowest terms) have a denominator not exceeding the positive integer d. For d = 7 this is the chain

1/7 < 1/6 < 1/5 < 1/4 < 2/7 < 1/3 < 2/5 < 3/7 < 1/2.

This is called a Farey series. A curiosity is that if a/b and c/d are adjacent members from left to right in this series, then bc − ad = 1!
Example 2. (Examples of the classical locally finite (or finite) posets which
are not chains) A poset (P, ≤) is said to be a finite poset if and only if
it contains only finitely many elements – that is, |P | < ∞. It is said to be
locally finite if and only if every one of its intervals [x, y] is a finite poset.
1. The Boolean poset B(X) of all finite subsets of a set X, with the
containment relation (⊆) between subsets as the partial-ordering.
2. The divisor poset D = (N, |) of all positive integers N under the divisor relation – that is, we say a|b if and only if integer a divides integer b evenly, i.e. b/a is an integer.[9]
[9] There are variations on this theme: In an integral domain a non-unit a is said to be irreducible if and only if a = bc implies one of b or c is a unit. Let D be an integral domain in which each non-unit is a product of finitely many irreducible elements, and let U be its group of units. Let D∗/U be the collection of all non-zero multiplicative cosets Ux. Then for any two such cosets, Ux and Uy, either every element of Ux divides every element of Uy or else no element of Ux divides any element of Uy. In the former case write Ux ≤ Uy. Then (D∗/U, ≤) is a poset. If D is a unique factorization domain then, as above, (D∗/U, ≤) is locally finite, for it is again a product of chains (one factor in the product for each association class Up of irreducible elements).
One might ask what this poset looks like when D is not a unique factorization domain. Must it be locally finite? It’s something to think about.
3. Posets of vector subspaces. The partially ordered set L<∞(V; q) of all finite-dimensional vector subspaces of a vector space V over a finite field of q elements is a locally finite poset. (There is a generalization of this: the poset L<∞(M) of all finitely generated submodules of a right R-module M, and in particular the poset L<∞(V) of all finite-dimensional subspaces of a right vector space V over some division ring.)[10]
[10] In Aigner’s book [?, page 203], L<∞(V, q) is denoted L(∞, q) in the case that V has countable dimension. This makes sense when one’s plan is to relate the structure to certain types of generating functions (the q-series). But of course, it is a well-defined locally finite poset whatever the dimension of V.
4. The partition set: Πn . Suppose X is a set of just n elements. Recall
that a partition of X is a collection π := {Y1 , . . . , Yk } of non-empty
subsets Yj whose join is X but which pairwise intersect at the empty
set. The subsets Yj are called the components of the partition π.
Suppose π := {Yi | i ∈ I} is a partition of X and π′ = {Zk | k ∈ K} is a second partition. We say that partition π′ refines partition π if and only if there exists a partition K = ∪_{i∈I} Ji of the index set K such that
Yi = ∪{Zℓ | ℓ ∈ Ji} for each i ∈ I.
[We can state this another way: A partition π can be associated with a function fπ : X → I whose fibres (the preimages of points) are the components partitioning X, the same being true of π′ and a function fπ′ : X → K. We say that partition π′ refines partition π if and only if fπ factors through fπ′ – that is, fπ′(x) = fπ′(y) implies fπ(x) = fπ(y).]
For example {6}, {4, 9}, {2, 3, 8}, {1, 5, 10}, with four components, refines {1, 5, 6, 10}, {2, 3, 4, 8, 9}, with just two components.
Then Πn is the partially ordered set of all partitions of the n-set X under the refinement relation.[11]
[11] Its cardinality |Πn| is called the nth Bell number and will reappear later in the context of permutation characters.
5. The Poset of Finite Multisets. Suppose X is any non-empty set. A multiset is essentially a sort of inventory whose elements are drawn from X. For example: if X = {oranges, apples, bananas}, then m = {three oranges, two apples} is an inventory whose elements are multiple instances of elements from X. Letting O = oranges, A = apples, and B = bananas, one may represent the multiset m by the symbol 3 · O + 2 · A + 0 · B, or even by the sequence (3, 2, 0) (where the order of the coordinates corresponds to a total ordering of X). But both of
these notations can become inadequate when X is an infinite set. The
best way is to think of a multiset as a mapping. Precisely, a multiset
is a mapping
f : X → N0 := N ∪ {0},
from X into the set N0 of non-negative integers.
A multiset f is dominated by a multiset g (written f ≤ g) if and only
if
f (x) ≤ g(x) for all x ∈ X
(where the “≤” in the presented assertion reflects the standard total ordering of the integers).
The collection M(X) of all multisets of a set X forms a partially ordered set under the dominance relation. Since the multisets are actually mappings from X to N0, the dominance relation is exactly that used in comparing mappings in products. We are saying that the definition of the poset of multisets shows us that

(M(X), ≤) = Π_{x∈X} N0,    (2.1)

– that is, a product in which each “factor” Pσ in the definition of product of posets is the constant poset (N0, ≤) of non-negative integers.
The multiset f is said to be a finite multiset of magnitude |f| if and only if

f(x) > 0 for only finitely many values of x,    (2.2)

and

|f| = Σ_{x∈X, f(x)>0} f(x),    (2.3)

where the sum in the second equation is understood to be the integer 0 when the range of summation is empty (i.e. f(x) = 0 for all x ∈ X). Thus in the example concerning apples and oranges above, the multiset m is finite of magnitude 3 + 2 = 5.
Thus the collection M<∞(X) of all finite multisets forms an induced poset of (M(X), ≤). Next one observes that a mapping f : X → N0 is a finite multiset if and only if f(x) = 0 for all but a finite number of instances of x ∈ X. This means equation (2.1) above has a companion with the product replaced by a sum:

M<∞(X) = Σ_{x∈X} N0.    (2.4)
Example 3. Simplicial complexes. We begin with a definition that is a
little more general than is common. A poset (P, ≤) is called a simplicial
complex if and only if
1. The poset (P, ≤) contains a zero element, 0.
2. Given any element x ∈ P , the interval [0, x] is isomorphic to the
Boolean poset B(Sx ) where Sx is any set of finite cardinality r(x).
The non-negative integer r(x) is called the (simplicial) rank of the element
x, and, as may be expected, the function r : P → N0 taking x to r(x) is
called the simplicial rank function. Clearly the zero element 0 is the
unique element of rank zero.
For each non-negative integer k, let Pk denote the elements of the simplicial complex (P, ≤) which have rank exactly k. Then
P = ∪k Pk and P0 = {0}.
The elements of P1 are called vertices. We can then form a graph ∆ =
(P1 , P2 ) where vertex x ∈ P1 is incident with “edge” y ∈ P2 if and only if
x ≤ y in (P, ≤).
One sees that for any element x in P, the order ideal Px meets P1
in exactly r(x) elements — i.e. each poset element x dominates exactly r(x)
vertices. Indeed, we see that we can take Sx := Px ∩ P1 in the isomorphism
[0, x] ≅ B(Sx). [Note that S0 is the empty set.] With that convention we
have:
For all x, y ∈ P , x ≤ y if and only if Sx ⊆ Sy .
Now let S(P ) be the full collection of all subsets of P1 of the form Sx =
Px ∩ P1 for some element x of P . All of the sets in S(P ) are finite and so
define an induced poset (S(P ), ≤) of the Boolean poset B(P1 ).
Lemma 6 The following hold for any simplicial complex (P, ≤):
1. The poset (S(P ), ≤) is an order ideal of B(P1 ).
2. The mapping P → S(P) defined by x → Sx is a poset isomorphism (P, ≤) ≅ (S(P), ≤).
3. A poset (P, ≤) is a simplicial complex if and only if it is an order ideal
of some Boolean poset B(X) for some set X.
Exercise 1. Suppose ≲ is a reflexive and transitive relation on a set X. (Such a relation is called a pseudo-order.) For any two elements x and y of X, let us write x ∼ y if and only if x ≲ y and also y ≲ x.
1. Show that “∼” is an equivalence relation.
2. For each element x of X, let [x] be the ∼-equivalence class containing element x. Show that if x ≲ y then a ≲ b for every element a ∈ [x] and every element b ∈ [y]. (In this case we write “[x] ≤ [y]”.)
3. Let X/∼ denote the collection of all ∼-classes of X. Show that (X/∼, ≤) is a poset.
Exercise 2. Make a full list of the posets defined in Examples 1 and 2 above
which
1. Have a zero element.
2. Have a one-element.
3. Are locally finite.
Exercise 3. Let P be a fixed poset. If there is a bijection f : X → Y, prove that there exist isomorphisms of posets:

Π_{x∈X} P → Π_{y∈Y} P,
Σ_{x∈X} P → Σ_{y∈Y} P, if P has a zero element.

(This means that the index set X has only a weak effect on the definition of a product.)
Exercise 4. Prove the following
Theorem 7 The divisor poset D is isomorphic to the poset M<∞(N) of all finite multisets over the set N of positive integers. That is, we have an isomorphism

ε : (D, ≤) → Σ_{n∈N} N0.    (2.5)
[Hint: We sketch the argument: Notice that the right side of (2.5) consists of all sequences of non-negative integers of which only finitely many are positive (equivalently, all finite sequences of positive integers). The mapping ε is easy to define. First place the positive prime numbers in ascending (natural) order 2 = p1 < p2 < ···; for example, p6 = 13. Now any integer d ∈ D greater than one possesses a unique factorization into positive prime powers, with the primes placed from left to right in ascending order –

d = Π_{σ∈Jd} pσ^{aσ},

for some (possibly empty) ascending sequence Jd of positive integers. If one declares ε(d) to be the function f : N → N0 such that f(σ) = aσ if σ ∈ Jd and f(σ) = 0 if σ ∉ Jd, and declares ε(1) := 0, the constant function with all values 0, then one sees that ε(c) ≤ ε(d) in the sum on the right side of equation (2.5) if and only if c divides d evenly – i.e. c ≤ d in (D, ≤). Clearly ε is a bijection.]
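A computational rendering of this hint (ours; the helper names are invented) maps an integer to its multiset of prime exponents and compares exponent vectors coordinatewise:

    def exponents(d):
        """The finite multiset of prime exponents of d (a dict prime -> exponent)."""
        f, p = {}, 2
        while p * p <= d:
            while d % p == 0:
                f[p] = f.get(p, 0) + 1
                d //= p
            p += 1
        if d > 1:
            f[d] = f.get(d, 0) + 1
        return f

    def dominated(f, g):
        # coordinatewise comparison of exponent vectors
        return all(e <= g.get(p, 0) for p, e in f.items())

    # c divides d exactly when the exponent multiset of c is dominated by that of d:
    for c, d in [(12, 60), (12, 18), (1, 7)]:
        assert dominated(exponents(c), exponents(d)) == (d % c == 0)
    print("divisibility matches dominance")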
Exercise 5. Recall that if (P, ≤) is a poset and X ⊆ P, then the symbol PX denotes the order ideal
PX := {z ∈ P | z ≤ x for some x ∈ X}.
Show that PX = ∪{Px | x ∈ X}.
Exercise 6. Show that PX ∪ PY = PX∪Y.
Exercise 7. Give an example showing that if X and Y are subsets of P, then PX ∩ PY and PX∩Y may not be equal.
Exercise 8. Give an example of a poset (P, ≤) and an order ideal J of P such that J does not have the form PX for any antichain X.
Exercise 9. Give formal proofs of all three conclusions in Lemma 6.
Exercise 10. In any undirected graph G = (V, E) a clique is any collection
of pairwise adjacent vertices. Suppose (P, ≤) is a simplicial complex and let
∆ = (P1 , P2 ) be its graph on vertices. Show that (P, ≤) is isomorphic to the
poset of all finite cliques of the graph ∆ under set-inclusion.
Exercise 11. Let V be a vector space of infinite dimension. Let I be the collection of subsets of V consisting of the empty set together with every finite linearly independent set of vectors. Show that with respect to the containment relation, (I, ≤) is a simplicial complex.
2.3 Mappings among posets: closure operators and Galois connections

2.3.1 Subposets.
Now let (P, ≤) be a poset, and suppose X is a subset of P with a relation
≤X for which (X, ≤X ) is a partially ordered set. In general, there may be no
relation between (X, ≤X ) and the poset (X, ≤) that is induced on X.
But if it is true that for any x and y in X,
x ≤X y implies x ≤ y,
then we say that (X, ≤X ) is a subposet of (P, ≤). How does this compare
with the notion of induced subposet that we have been using so far? If (X, ≤X )
is an induced subposet of (P, ≤), then for any x and y in X,
x ≤X y if and only if x ≤ y.
Thus in a general subposet it might happen that two elements x1 and x2 of
(X, ≤X ) are incomparable with respect to the ordering ≤X even though one
is bounded by the other (say, x1 ≤ x2 ) in the ambient poset (P, ≤).
2.3.2 Order-preserving mappings
Let (P, ≤) and (Q, ≤) be posets. A mapping f : P → Q is said to be order preserving if and only if x ≤ y implies f(x) ≤ f(y), for all elements x and y of P.[12]
Now there are two ways to view the image set f (P ) as a subset of (Q, ≤):
(1) First one may view it as the induced poset (f (P ), ≤). (2) Secondly one
may think of it as an image poset (f (P ), ≤f ) where, for a, b ∈ f (P ), one
[12] The student familiar with such things from a topology class may recognize that the order preserving mappings among simplicial complexes are what are usually called simplicial mappings in that context.
writes a ≤f b if and only if there are elements x and y in the respective fibers
f −1 (a) and f −1 (b) such that x ≤ y. In this case we still have that a ≤f b
implies a ≤ b with respect to poset (Q, ≤). So we have the following:
If f : P → Q is an order preserving mapping between posets (P, ≤) and
(Q, ≤), then the image poset (f(P), ≤f) is a subposet (but not necessarily an induced subposet) of (Q, ≤).
If the order preserving mapping f : P → Q is injective, then the posets (P, ≤) and (f(P), ≤f) are isomorphic, and we say that f is an embedding of the poset (P, ≤) into (Q, ≤). The following result, although not used anywhere in this book, at least displays the fact that one is interested in cases in which the image poset of an embedding is not an induced subposet of the co-domain.
Any poset (P, ≤) can be embedded in a totally ordered set.[13]
We need a few other definitions related to order-preserving mappings.
A mapping f : P → Q is said to be order-reversing if and only if x ≤ y implies f(x) ≥ f(y).
Example 4. Suppose X = ∪σ∈I Aσ , a union of non-empty sets indexed by
I. Now there is a natural order reversing mapping among power posets:
α : P(X) → P(I),
which takes each subset Y of X, to
α(Y ) := {σ ∈ I|Y ⊆ Aσ }.
For example, if Y is contained in no Aσ , then α(Y ) = ∅.
Finally an order preserving mapping f : P → P is said to be monotone
non-decreasing if p ≤ f (p) for all elements p of P .
[13] Many books present an equivalent assertion: “any poset has a linear extension”. The proof is an elementary induction for finite posets. For infinite posets it requires some grappling with Zorn’s Lemma and ordinal numbers.
2.3.3 Closure operators.
A closure operator is a monotone non-decreasing mapping τ : P → P
which is idempotent, that is, τ (τ (x)) = τ (x) for all x ∈ P .
We call the images of τ the closed elements of P .
There are many contexts in which closure operators arise, and we list a
few.
1. The ordinary topological closure in the poset of subsets of a topological
space.
2. The mapping which takes a subset of a group (ring or R-module) to
the subgroup (subring or submodule, resp.) generated by that set in
the poset of all subsets of a group (ring or R-module).
3. The mapping which takes a set of points to the subspace which they
generate in a point-line geometry (P, L).
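As a small instance of item 2 in this list, the sketch below (ours; the choice of the cyclic group Z12 is arbitrary) closes a subset of Z12 under addition and inverses; the resulting map on subsets is monotone non-decreasing and idempotent:

    def generated_subgroup(S, n=12):
        """Closure operator on subsets of Z_n: the subgroup generated by S."""
        closed = set(S) | {0}
        while True:
            new = {(a + b) % n for a in closed for b in closed} | {(-a) % n for a in closed}
            if new <= closed:
                return frozenset(closed)
            closed |= new

    S = {4, 6}
    T = generated_subgroup(S)
    assert S <= T                          # monotone non-decreasing: S <= tau(S)
    assert generated_subgroup(T) == T      # idempotent: tau(tau(S)) = tau(S)
    print(sorted(T))                       # [0, 2, 4, 6, 8, 10]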
2.3.4 Closure operators and Galois connections.
One interesting context in which closure operators arise is that of Galois connections. Let (P, ≤) and (Q, ≤) be posets. A Galois connection (P, Q, α, β)
is a pair of order-reversing mappings α : P → Q and β : Q → P , such that
the two compositions β ◦ α : P → P and α ◦ β : Q → Q are both monotone
non-decreasing.
Galois connections arise in several contexts, especially when some algebraic object acts on something. For example:
• Groups acting on a set X.
• Groups acting on a field as in the classical Galois Theory.
• Rings acting as a ring of endomorphisms of an abelian group.
Another example arises in algebraic geometry. We say that a polynomial
p(x1 , . . . , xn ) vanishes at a vector v = (a1 , . . . , an ) in the vector space F (n)
of n-tuples over the field F if and only if p(v) := p(a1 , . . . , an ) = 0 ∈ F .
Let (P, ≤) be the poset of ideals in the polynomial ring F [x1 , . . . , xn ] with
respect to the containment relation and for each ideal I let α(I) be the set
of vectors in F (n) at which each polynomial of I vanishes. Let (Q, ≤) be the
poset of all subsets of F (n) with respect to containment. For any subset X
of F (n) , let β(X) be the set of all polynomials which vanish simultaneously
at every vector in X. Then (P, Q, α, β) is a Galois connection.
The Corollary following the Lemma below shows how closure operators
can arise from Galois connections.
Lemma 8 Suppose (P, Q, α, β) is a Galois connection. Then for any elements p and q of P and Q respectively
α(β(α(p))) = α(p) and β(α(β(q))) = β(q)
proof: For p ∈ P , p ≤ β(α(p)) since β ◦ α is monotone non-decreasing.
Thus α(β(α(p))) ≤ α(p) since α is order reversing.
On the other hand
α(p) ≤ α(β(α(p))) = (α ◦ β)(α(p))
since (α ◦ β) is monotone non-decreasing. By antisymmetry, we have the first
equation of the statement of the Lemma.
The second statement then follows from the symmetry of the definition
of Galois connection – that is (P, Q, α, β) is a Galois connection if and only
if (Q, P, β, α) is.
Corollary 9 If (P, Q, α, β) is a Galois connection, then τ := β ◦ α is a
closure operator on (P, ≤).
proof: Immediate upon taking the β-image of both sides of the first equation of the preceding Lemma.
Example 5. Once again consider the order reversing mapping α : P(X) →
P(I) of Example 4, where X was a union of non-empty sets Aσ indexed by
a parameter σ ranging over a set I. For every subset Y of X, α(Y ) was the
set of those σ for which Y ⊆ Aσ .
Now there is another mapping β : P(I) → P(X) defined as follows: If
J ⊆ I set β(J) := ∩σ∈J Aσ , with the understanding that if J = ∅, then
β(J) = X. Then β is easily seen to be an order reversing mapping between
the indicated power posets.
Now the mapping τ = β ◦ α : P(X) → P(X) takes each subset Y of X to
the intersection of all of the Aσ which contain it (with the convention that
empty intersections denote the set X itself).
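Here is a sketch (ours, with tiny hypothetical sets Aσ) of the maps α and β of Examples 4 and 5, together with the closure operator τ = β ◦ α:

    X = {1, 2, 3, 4}
    A = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3}}   # the sets A_sigma, indexed by I

    def alpha(Y):
        # the indices sigma with Y contained in A_sigma
        return {s for s, As in A.items() if Y <= As}

    def beta(J):
        # the intersection of the chosen A_sigma (all of X if J is empty)
        out = set(X)
        for s in J:
            out &= A[s]
        return out

    tau = lambda Y: beta(alpha(Y))   # the smallest intersection of A_sigma's containing Y
    print(tau({2}))                  # {2}: A_a, A_b, A_c all contain 2
    print(tau({1, 3}))               # {1, 2, 3}: only A_c contains {1, 3}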
2.4 Chain conditions

2.4.1 Saturated chains
Recall that a chain C of a poset (P, ≤) is simply a (necessarily induced) subposet (C, ≤P) which is totally ordered. In case C is a finite set, |C| − 1 is called the length of the chain.
We say that a chain C2 refines a chain C1 if and only if C1 ⊆ C2 .
The chains of a poset (P, ≤) themselves form a partially ordered set under the inclusion relation (the dual of the refinement relation), which we denote
(ch(P, ≤), ⊆).
Suppose now that
C0 ≤ C1 ≤ · · ·
is an ascending chain in the poset (ch(P, ≤), ⊆). Then the set-theoretic union ∪Ci is easily seen to be totally ordered and so is an upper bound in (ch(P, ≤), ⊆). An easy application of Zorn’s Lemma then shows that every element of the poset (ch(P, ≤), ⊆) lies below a maximal member. These maximal chains are called unrefinable (or saturated) chains. Thus
Theorem 10 If C is a chain in any poset (P, ≤), then C is contained in a
saturated chain of (P, ≤).
Of course we can restrict this in special ways to an interval. In a poset (P, ≤), a chain from x to y is a chain (C, ≤) with x as a “zero element” and y as a “one element”. Thus x ≤ y and {x, y} ⊆ C ⊆ [x, y].
The collection of all chains of (P, ≤) from x to y itself becomes a poset (ch[x, y], ⊆) under the containment (or “corefinement”) relation. (Note that (ch[x, y], ⊆) is not quite the same as ch([x, y], ≤), since the latter may contain chains which do not contain x or y.) We can still apply Theorem 10 for P = [x, y] and the chains C in it that do contain {x, y} to obtain:
Corollary 11 If C is chain from x to y in a poset (P, ≤), then there exists
a saturated chain C 0 from x to y which refines C. In particular, given an
interval [x, y] of a poset (P, ≤), there exists an unrefinable chain in P from
x to y.
2.4.2 Algebraic intervals and the height function.
We shall shortly meet a rather strong condition, (FC), which implies that every saturated chain is finite. In general, nothing of the sort occurs. Still, enough occurs to produce strong theories; I mention two examples:
1. The general Jordan-Hölder Theorem whose context is algebraic interval
measures in semimodular lower semilattices.
2. Locally finite posets, the natural domain of the theory of the Möbius
function.
Let (P, ≤) be any poset. An interval [x, y] of (P, ≤) is said to be algebraic (or of finite height) if and only if there exists a saturated chain from x to y of finite length.[14] Note that to assert that (a, b) is an algebraic interval does not preclude the simultaneous existence of infinite chains from a to b. The height of an algebraic interval [x, y] is then defined to be the minimal length of a saturated chain from x to y.[15]
If (a, b) is an algebraic interval its height, the smallest length of a finite unrefinable chain from a to b, is denoted h(a, b). Thus for an algebraic interval [a, b], h(a, b) is always a non-negative integer. We denote the collection of all algebraic intervals of (P, ≤) by the symbol AP.
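For a finite poset the height of an algebraic interval can be computed directly: build the cover relation and take a shortest path in the resulting digraph, since every chain of covers is saturated. A sketch of ours, on the poset of divisors of 60:

    from collections import deque

    divisors = [d for d in range(1, 61) if 60 % d == 0]
    def lt(a, b): return a != b and b % a == 0

    # Covers: a < b with nothing strictly between them.
    covers = {a: [b for b in divisors if lt(a, b)
                  and not any(lt(a, c) and lt(c, b) for c in divisors)]
              for a in divisors}

    def height(x, y):
        """Minimal length of a saturated chain from x to y (BFS; assumes x <= y)."""
        dist, queue = {x: 0}, deque([x])
        while queue:
            a = queue.popleft()
            if a == y:
                return dist[a]
            for b in covers[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)

    print(height(1, 60))   # 4, the number of prime divisors of 60 with multiplicity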
Proposition 12 The following hold:
1. If (a, b) and (b, c) are algebraic intervals of the poset (P, ≤), then (a, c) is also an algebraic interval.
2. The height function h : AP → N0 from the algebraic intervals of (P, ≤) to the non-negative integers satisfies this property: If (a, b) and (b, c) are algebraic intervals, then
h(a, c) ≤ h(a, b) + h(b, c).
[14] This adjective “algebraic” does not enjoy uniform usage. In Universal Algebra, elements which are the join of finitely many atoms are called algebraic elements (I suppose by analogy with the theory of field extensions). Here we are applying the adjective to an interval, not an element of a poset.
[15] The term “height” is used here instead of “length”, which is appropriate when all unrefinable chains have the same length, as in the semimodular lower semilattices that appear in the Jordan-Hölder Theorem, or in the case of simple chains, where we have already used the term.
Example 6. Recall that a poset (P, ≤) is said to be a simplicial complex
if
1. It contains an element 0 such that 0 ≤ p, ∀p ∈ P .
2. For each interval (0, p), the subposet [0, p] := ({r ∈ P |0 ≤ r ≤ p}, ≤)
is the poset of all subsets of some finite set Fp .
Clearly every interval (a, b) of a simplicial complex is algebraic with height
function h(a, b) = |Fb | − |Fa |.
2.4.3 The ascending and descending chain conditions in arbitrary posets.
Let (P, ≤) be any partially ordered set. We say that P satisfies the ascending chain condition or ACC if and only if every ascending chain of
elements of P stabilizes after a finite number of steps – that is, for any chain
p1 ≤ p2 ≤ p3 ≤ · · · ,
there exists a positive integer N , such that
pN = pN +1 = · · · = pk , for all integers k greater than N.
Put another way, (P, ≤) has the ACC if and only if every properly ascending
chain
p1 < p2 < · · ·
terminates after some finite number of steps. Finally, it could even be put a third way: (P, ≤) satisfies the ascending chain condition if and only if there is no countable sequence {pn | n = 1, 2, . . .} with pn < pn+1 for all natural numbers n.
Lemma 13 The poset P = (P, ≤) satisfies the ascending chain condition
(ACC) if and only if it satisfies
1. (The Maximum Condition) Every non-empty subset X of P (regarded
as an induced poset) contains a maximal member.
2. (The Second Form of the Maximum Condition) In particular, for every induced poset X of (P, ≤) and every element x ∈ X, there exists an element y ∈ X such that x is bounded by y and y is maximal in X. (To just make sure that we understand this: there exists an element y ∈ X such that
(a) x ≤ y;
(b) if u ∈ X and y ≤ u, then u = y.)
remark: The student is reminded: to say that “x is a maximal member of
a subset X of a poset P ” simply means x is an element of X which is not
properly less than any other member of X.
proof of the lemma: 1. (The Maximum Condition implies the ACC). Suppose the ACC fails. Then there must exist a properly ascending countably infinite chain p1 < p2 < · · ·. Then X := {pi | i = 1, 2, . . .} is a set with no maximal element. So the Maximum Condition fails.
(The ACC implies the Maximum Condition). Let X be any nonempty subset of P. By way of contradiction, assume that X contains no maximal member. Choose x1 ∈ X. Since x1 is not maximal in X, there exists an element x2 ∈ X with x1 < x2. By the same token x2 < x3 for some x3 ∈ X. Continuing in this way we obtain an infinite properly ascending chain
x1 < x2 < x3 < · · · ,
contrary to the assumption of the ACC.[16]
[16] In Devlin’s wonderful book The Joy of Sets one has the exact argument of this last line with the phrase “Proceeding inductively” replacing the initial phrase “Continuing in this way”. The graduate student has probably encountered arguments like this many times, where a sequence with certain properties is said to exist because after the first n members of the sequence are constructed, it is always possible to choose a suitable (n + 1)-st member. This has always struck the author as a leap of faith, for the sequence alleged to exist must exemplify infinitely many of these choices. At the very least we are involved in the Axiom of Choice in choosing the xi – but in a sense it appears worse. The sets are not just sitting there as if we had prescribed non-empty sets of socks in closets lined up in an infinite hallway (the traditional folk-way of explaining the Axiom of Choice). Here it is as if each new closet was being defined by our choice of sock in a previous closet. All I can feebly tell you is that it is basically equivalent to the Axiom of Choice – it does not matter whether the hallway of closets is replaced by a “tree” of closets, each full of socks, each sock bearing a prescription of a “successor closet” (indeed, almost a model of Wheeler’s “many-worlds” view of the Universe) – it is a theorem about the existence of infinite paths in suitable infinite graphs.
Accepting this, the Axiom of Choice serves up an entire properly-ascending sequence that never ends. I just wanted you to know that the Axiom of Choice may be involved whenever you (as a graduate student) purport to produce a chain on the basis that every member of a finite truncation of a proposed chain can be extended.
It remains to show that the two versions of the Maximum Condition are equivalent. Clearly the second (apparently stronger) version implies the first; one need only pick x ∈ X (X is non-empty), and the second version produces the desired y maximal in X.
Now assume only the first version of the maximum condition. Take a subset X and an element x ∈ X. Then set X′ = X ∩ P^x, where P^x := {z ∈ P | x ≤ z} is the principal filter generated by x. Then X′ is non-empty (it contains x) and so we obtain an element y maximal in X′ from the first version of the maximum condition. Now if y were not a maximal member of X there would exist an element y′ ∈ X with y < y′. But in that case x < y′, so y′ ∈ X ∩ P^x = X′. But that would contradict y being maximal in X′. Thus y is in fact maximal in X and dominates x, as desired.
Of course, merely by replacing the poset (P, ≤) by its dual P ∗ := (P, ≥),
and applying the above we have the dual development:
We say a poset (P, ≤) possesses the descending chain condition or
DCC if and only if every descending chain
p1 ≥ p2 ≥ · · · , pj ∈ P,
stabilizes after a finite number of steps. That is, there exists a positive integer
N such that pN = pN+1 = · · · = pN+k for all positive integers k.
We say that x is a minimal member of the subset X of P if and only
if x ≥ y for y ∈ X implies x = y. A poset (P, ≤) is said to possess the
minimum condition if and only if
(Minimum Condition) Every nonempty subset of elements of the poset
P = (P, ≤) contains a minimal member.
Then we have:
Lemma 14 A poset P = (P, ≤) has the descending chain condition (DCC)
1. if and only if it satisfies the minimum condition, or
2. if and only if it satisfies this version of the minimum condition: for any induced poset X and any x ∈ X, there exists an element y minimal in X which is bounded above by x.
Proof: Just the dual statement of Lemma 13.
Corollary 15 (The chain conditions are hereditary) If X is any induced
poset of (P, ≤), and P satisfies the ACC (or DCC), then X also satisfies the
ACC (or DCC, respectively).
proof: This is not really a Corollary at all. It follows from the definition of
“induced poset” and the chain-definitions directly. Any chain in the induced
poset is a fortiori a chain of its ambient parent. We mention it only to have
a signpost for future reference, allowing us to alert the reader to when this
principle is invoked.
Later, Lemma 14 will have a surprising consequence for lower semilattices
with the DCC.
Any poset satisfying both the ACC and the DCC also satisfies
(FC) Any saturated chain of P has finite length.
This is because patently one of the two chain conditions is violated by an
infinite saturated chain. Conversely, if a poset P satisfies condition (FC) then
there can be no properly ascending or descending chain of infinite length since
by Theorem 10 it would then lie in a saturated chain which was also infinite
against (FC). Thus
Lemma 16 The condition (FC) is equivalent to the assumption of both DCC
and ACC.
2.5 Posets with meets and/or joins.

2.5.1 Meets and joins
Let W be any subset of a poset (P, ≤). The join of the elements of W is
an element v in (P, ≤) with these properties:
1. w ≤ v for all w ∈ W .
2. If v 0 is an element of (P, ≤) such that w ≤ v 0 for all w ∈ W , then
v ≤ v0.
Similarly there is the dual notion: the meet of the elements of W in
P would be an element m in P such that
1. m ≤ w for all w ∈ W .
2. If m0 is an element of (P, ≤) such that m0 ≤ w for all w ∈ W , then
m0 ≤ m.
Of course, P may or may not possess a meet or a join of the elements of
W . But one thing is certain: if the meet exists it is unique; if the join exists,
it is unique. Because of this uniqueness we can give these elements names.
We write ∧_P(W) (or just ∧(W) if the ambient poset P is understood) for the meet of the elements of W in P (if it exists). Similarly, we write ∨_P(W) (or ∨(W)) for the join in P of all of the elements of W (if that exists).
In the case that W is the set {a, b}, we render the meet and join of a and
b by a ∧ b and a ∨ b, respectively. The reader may verify

a ∧ b = b ∧ a    (2.6)
a ∧ (b ∧ c) = (a ∧ b) ∧ c    (2.7)
(a ∧ b) ∨ (a ∧ c) ≤ a ∧ (b ∨ c)    (2.8)
and the three dual statements when the indicated meets and joins exist.
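In the divisor poset D, for instance, a ∧ b = gcd(a, b) and a ∨ b = lcm(a, b); the sketch below (ours) spot-checks (2.6)–(2.8) there:

    from math import gcd

    def meet(a, b): return gcd(a, b)              # a ∧ b in the divisor poset
    def join(a, b): return a * b // gcd(a, b)     # a ∨ b (the lcm)

    for a, b, c in [(4, 6, 10), (12, 18, 30)]:
        assert meet(a, b) == meet(b, a)                              # (2.6)
        assert meet(a, meet(b, c)) == meet(meet(a, b), c)            # (2.7)
        lhs = join(meet(a, b), meet(a, c))
        rhs = meet(a, join(b, c))
        assert rhs % lhs == 0                     # (2.8): lhs <= rhs, i.e. lhs divides rhs
    print("identities verified")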
We say that (P, ≤) is a lower semilattice if it is “meet-closed” – that is, the meet of any two of its elements exists. In that case (see Exercise 17) it is easy to show that meets exist for all finite subsets of P. Dually we can define an upper semilattice (it is join-closed).
A lattice is a poset (P, ≤) that is both a lower semilattice and an upper
semilattice – that is the meet and join of any two of its elements both exist
in P . Thus, a lattice is a self-dual concept: If P = (P, ≤) is a lattice, then
so is its dual poset P ∗ = (P, ≤∗ ).
If every non-empty subset U of P has a meet (join) we say that arbitrary meets (joins) exist. If both arbitrary meets and joins exist we say that P is a complete lattice.
Example 7.
1. Any totally ordered poset is a lattice. The meet of a finite set is its
minimal member; its join is its maximal member. Considering the open
real interval (0, 1) with its induced total ordering from the real number
system, it is easy to see
(a) that there are lattices with no “zero” or “one”,
(b) that there can be (infinite) subsets of a lattice with no lower bound
or no upper bound.
2. The poset P(X) of all subsets of a set X is a lattice, with the intersection of two sets being their meet, and the union of two sets being their join. This lattice is self-dual and is a complete lattice, meaning that any subset of elements of P(X), whether infinite or not, has a least upper bound and a greatest lower bound – i.e. a join and a meet. Of course the lattice has X as its “one” and the empty set as its “zero”.
2.5.2 Lower semilattices with the descending chain condition.
The proof of the following theorem is really due to Stanley, who presented it
only for the case that P is finite[?, Proposition 3.3.1, page 103].
Theorem 17 Suppose P is a meet-closed poset (that is, a lower semilattice) satisfying the (DCC) condition. Then the following assertions hold:

1. If X is a meet-closed induced poset of P, then X contains a unique minimal member.

2. Suppose, for some subset X of P, the filter
P^X := {y ∈ P | x ≤ y for all x ∈ X}
is non-empty. Then the join ∨(X) exists in P. In particular if (P, ≤) possesses a one-element 1̂, then for any arbitrary subset X of P there exists a join ∨(X). Put more succinctly, if 1̂ exists, unrestricted joins exist.

3. For any non-empty subset X of P, the universal meet ∧(X) exists and is expressible as a meet of a finite number of elements of X.

4. The poset P contains a 0̂.

5. If P contains a 1̂, then P is a complete lattice.
proof: 1. The set of minimal elements of X is non-empty (Lemma 14 of
the section on chain conditions.) Suppose there were at least two distinct
minimal members of X, say y1 and y2 . Then y1 ∧ y2 is also a member of X
by the meet-closed hypothesis. But by minimality of each yi , one has
y1 = y1 ∧ y2 = y2 .
Since any two minimal members of the set X are now equal (and the set
of them is non-empty) there exists a unique minimal member. The proof of
Part 1 is complete.
2. Let X be any subset of P for which the filter P^X is non-empty. One observes that P^X is a meet-closed induced poset and so, by part 1, contains a unique minimal member j(X). Then, by definition, j(X) is the join ∨(X). If 1̂ ∈ P then of course P^X is non-empty for all subsets X and the result follows.
3. Let W(X) be the collection of all elements of P which can be expressed as a meet of finitely many elements of X, viewed as an induced poset. Then W(X) is meet-closed, and so possesses a unique minimal member x0 by Part 1. In particular x0 ≤ x for all x ∈ X. Now suppose z ≤ x for all x ∈ X. Then z is less than or equal to any finite meet of elements of X, and so is less than or equal to x0. Thus x0 = ∧(X), by definition of a global meet.
Suppose ∧(X) < x0. Then there exists some element x ∈ X such that x ∧ x0 < x0 (otherwise x0 ≤ x for all x ∈ X, forcing x0 = ∧(X)). Then patently x ∧ x0 is expressible as a meet of finitely many members of X and so lies in W(X). This contradicts x0 being the minimum element of W(X). Thus we must conclude x0 = ∧(X), and we are done.
4. Since P is itself a meet-closed induced poset, part 1 shows that P contains a unique minimal element m. For any x ∈ P, the meet m ∧ x exists (by meet-closure) and satisfies m ∧ x ≤ m, so the minimality of m forces m ∧ x = m – that is, m ≤ x. Thus m is the element 0̂, and the proof is complete.
5. If 1̂ exists, P enjoys unrestricted joins by part 2, and by part 4, 0̂ exists. Thus for any subset X of P, the set LX of all lower bounds of X is non-empty (it contains 0̂). So we can form the join ∨(LX). This element – by definition – is the unique minimum of the filter of all upper bounds of LX. Since every element of X is an upper bound of LX, the aforesaid minimum is below each element of X, and hence belongs to LX itself. Thus ∨(LX) is the unique maximal element of LX. This means ∨(LX) is ∧(X), which therefore exists. Thus unrestricted meets exist and so now P is a complete lattice.
The proof is complete.
Example 8. The following posets are all lower semilattices with the DCC:
1. N+ = {0 < 1 < 2 < . . .} of all non-negative integers with the usual
ordering. This is a chain.
2. D, the positive integers under the partial ordering of “dividing” – that
is, a ≤ b if and only if a divides the integer b.
3. B(X), the poset of all finite subsets of a set X.
4. L<∞ (V ), the poset of finite-dimensional subspaces of a vector space V .
5. M<∞ (I), the finite multisets over a set I.
It follows that arbitrary meets exist and a 0̂ exists.
remark: Of course, also, we could adjoin a global maximum 1̂ to each of these examples and obtain complete lattices in each case.
2.5.3 Lower semilattices with both chain conditions
Recall from Section 1, that a poset has both the ACC and the DCC if and
only if it possesses condition
(FC) Every chain in P has finite length.
An observation is that if P satisfies (FC), then so does its dual P ∗ . Similarly, if P has finite height, then so does its dual. This is trivial. It depends
only on the fact that rewriting a finite saturated chain in descending order
produces a saturated chain of P ∗ .
We obtain at once
Lemma 18 Suppose P is a lower semilattice satisfying (FC).
1. P contains a 0̂. If P contains a 1̂ then P is a complete lattice.
2. Every element of P − {1̂} is bounded by a maximal member of P − {1̂} – that is, an element of max(P).
3. Every element of P − {0̂} is above a minimal element of P − {0̂} – that is, an element of min(P).
4. The meet of all maximal elements of P – that is, ∧(max(P)) (called the Frattini element or radical of (P, ≤)) – exists and is the meet of just finitely many elements of max(P).
5. If 1̂ ∈ P, the join of all atoms, ∨(min(P)) (called the socle of P and denoted soc(P)), exists and is the join of just finitely many atoms.

proof: Parts 1 and 4 follow from the previous subsection. Parts 2 and 3 are consequences of the chain conditions (see the first section) and Part 5 follows from Parts 1 and 3.
Example 9. Hopefully the reader will notice that the hypothesis that P
contained the element 1̂ played a role – even took a bow – in Parts 1. and
5. of the above Lemma. Is this necessary? After all, we have both the
DCC and the ACC. Well, there is an asymmetry in the hypotheses. P is
a lower semilattice, but not an upper semilattice, though this symmetry is
completely restored once 1̂ exists in P , because then we have arbitrary joins.
Consider any one of the posets D, B(X), L<∞ (V ). These are ranked
posets, with the rank of an element being (1) the number of prime factors,
(2) the cardinality of a set, or (3) the dimension of a subspace – respectively.
Select a positive integer r and consider the induced poset trr (P ) of all elements of rank at most r in P where P = D, B(X), L<∞ (V ). Then trr (P ) is
still a lower semilattice with both the DCC and the ACC. But it has no 1̂.
Indeed the examples conceivably proliferate for unranked lower semilattices P when we replace trr(P) by the order ideal generated by an antichain of elements x ∈ P of finite ultralength ul[0̂, x]. You see how easy it is?
2.5.4 Examples and exercises
Exercise 12. We let [n] be the chain {1 < 2 < · · · < n} in the usual total
ordering of the natural numbers. Show that the infinite sum
[1] ∪ [2] ∪ [3] ∪ · · ·
satisfies (FC) but does not possess finite height, and possesses neither a 1̂
nor a 0̂.
Exercise 13. If we adjoin 0̂ to the poset of Exercise 12, show that the Frattini element exists, and is 0̂.
Exercise 14. Show that
[1] × [2] × [3] × · · ·
does not satisfy (FC).
Exercise 15. Let P be a lower semilattice with 0̂ and 1̂ satisfying (FC). Set
B := max(P ) and let P̄ be the induced subposet generated by B – that is,
the set of elements of P expressible as a finite meet of elements of B. We
understand the empty meet to be the element 1̂, so the latter is an element
of P̄ .
1. Show that P̄ has the Frattini element φ(P ) as its zero element, 0̂P̄ .
2. For each x ∈ P − {1̂}, show that the induced poset P x ∩ P̄ has a unique
minimal member (which we shall call σ(x)).
3. Defining σ(1̂) = 1̂ show the following:
(a) The mapping σ : P → P̄ is a surjective morphism of posets.
(b) For each x ∈ P , one has σ(σ(x)) = σ(x).
(Any morphism onto an induced subposet which satisfies these last two
conditions is called a closure operator.)
Exercise 16. Must the induced subposet of all elements of finite ultralength
in a lower semilattice with DCC necessarily have the ACC?
Exercise 17. Suppose L is a lattice. Give an induction proof showing that for any finite collection {a1, . . . , an} of elements of L, the elements
a1 ∨ · · · ∨ an and a1 ∧ · · · ∧ an
exist and are respectively the least upper bound and the greatest lower bound in L of the set {a1, . . . , an}.
Exercise 18. A lattice L := (L, ≤) is said to be distributive if and only if, for any elements a, b and c in L, one always has:

a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c).    (2.9)

An example of a distributive lattice is the power poset P(X) of all subsets of a set X.
1. Prove that if L is a distributive lattice and (M, ≤) is an induced subposet closed under taking pair-wise meets and joins, then M is also a
distributive lattice.
2. Use the result of item 1. to prove the following:
(a) The Boolean poset B(X) of all finite subsets of X is a distributive lattice.
(b) Let J(P) be the poset of all order ideals of a poset (P, ≤) under the containment relation. Use item 1. to prove that J(P) is a distributive lattice. [One must define “meet” and “join” of order ideals and set L = P(P) in item 1.]
2.6 The Jordan-Hölder Theory
In the previous two sections we defined the notions of “algebraic interval”,
“height of an algebraic interval” and the meet-closed posets which we called
“lower semilattices”. They shall be used with their previous meanings without further comment.
This section concerns a basic theorem that emerges from a property of lower semilattices, that of semimodularity. Many important algebraic objects give rise to lower semilattices which are semimodular (for example the posets of subnormal subgroups of a finite group, or the submodule poset of an R-module), and each enjoys its own “Jordan-Hölder Theorem” – it always being understood that there was a general form of this theorem. It is in fact a very simple theorem about extending “semimodular functions” on the set of covers of a semimodular lower semilattice to an interval measure on that semilattice.[17] Sounds like a mouthful, but it is really quite simple.
Fix a poset (P, ≤). An unrefinable chain of length one is called a cover
and is denoted by (a, b) (which is almost the name of its interval [a, b] –
altered to indicate that we are talking about a cover). Thus (a, b) is a cover
if and only if a < b and there is no element c in P with a < c < b – i.e. a is a maximal element in the induced poset of all elements properly below b (the principal order ideal Pb minus {b}). The collection of all covers of (P, ≤) is denoted CovP. Note that CovP is a subset of AP, the collection of all algebraic intervals of (P, ≤).
[17] The prefix “semi-” is justified for several reasons. The term “modular function” has quite another meaning as a certain type of meromorphic function of a complex variable. Secondly, the function in question is defined in the context of a semimodular lower semilattice. So why not put in the “semi”? I do not guarantee that every term coined in this book has been used before.
2.6.1 Lower semilattices and semimodularity
A lower semilattice (P, ≤) is said to be semimodular if, whenever (x1, b) and (x2, b) are both covers with x1 ≠ x2, then both (x1 ∧ x2, x1) and (x1 ∧ x2, x2) are also covers.
Lemma 19 Suppose (P, ≤) is a semimodular lower semilattice. Suppose
(x, a) is algebraic and (b, a) is a cover, where x ≤ b. Then (x, b) is algebraic.
Proof: Since (x, a) is algebraic, there exists a finite unrefinable chain from
x to a, say A = (x = a0 , a1 , . . . , an = a). Clearly each interval (x, aj ) is
algebraic.
By hypothesis, x ≤ b and so there is a largest subscript i such that ai ≤ b.
Clearly i < n. If i = n − 1 then an−1 = b, so (x, b) is algebraic by the previous paragraph. Thus we may assume b ≠ an−1. Since both (an−1, a) and (b, a) are covers, then by semimodularity, both (an−1 ∧ b, an−1) and (an−1 ∧ b, b) are covers. Continuing in this way we obtain that (ak ∧ b, ak) and (ak ∧ b, ak−1 ∧ b) are covers for all k larger than i. Finally, as ai ≤ b, this previous statement yields ai+1 ∧ b = ai and (by semimodularity) (ai, ai+2 ∧ b) must be a cover (see Figure 2.1). Now (x = a0, . . . , ai, ai+2 ∧ b, ai+3 ∧ b, . . . , an−1 ∧ b, b) is a finite unrefinable chain since its successive pairs are covers. This makes (x, b) algebraic, completing the proof.

[Figure 2.1: The poset showing (x, b) is algebraic. The symbol “cov” on a depicted interval indicates that it is a cover.]
2.6.2 Interval Measures on Posets
Let M be a commutative monoid. This means M possesses a binary operation, say “◦”, which is commutative and associative, as well as an identity element, say “e”, such that m = e ◦ m = m ◦ e for all m ∈ M.

There is one monoid that plays an important role in the applications of our main theorem, and that is the monoid M<∞(X) of all finite multisets over a set X. We have met this object before in the guise of a locally finite poset (see the last item under Example 2 of this Chapter). However, it is also a commutative semigroup. We have seen that any finite multiset over X can be represented as a function f : X → N0 from X to the non-negative integers N0 whose “support” is finite – i.e. the function achieves a non-zero value in only finitely many instances as x wanders over X. Now, if f and g are two such functions, we may let “f + g” denote the function that takes x ∈ X to the non-negative integer f(x) + g(x), the sum of two integers. Clearly f + g has finite support, and so (M<∞(X), +) (under this definition of “plus”) becomes a commutative semigroup. But as the constant function 0X : X → {0} is an identity element with respect to this operation, (M<∞(X), +) is actually a monoid (a commutative semigroup with an identity element).
An interval measure µ of a poset (P, ≤) is a mapping µ : AP → M from the set of algebraic intervals of P into a commutative monoid M with identity element e such that

µ(a, a) = e for all a ∈ P,   (2.10)

µ(a, b) ◦ µ(b, c) = µ(a, c) whenever (a, b) and (b, c) are in AP.   (2.11)

[Notice that we have found it convenient to write µ(a, b) for µ([a, b]).]
Here are some examples of interval measures on posets:
Example 10. Let M be the multiplicative monoid of positive integers. Let (P, ≤) be the set of positive integers and write x ≤ y if x divides y evenly. Then every interval of (P, ≤) is algebraic. Define µ by setting µ(a, b) := b/a for every interval (a, b).
Example 11. Let (P, ≤) be as in Example 10, but now let M be the additive monoid of all non-negative integers. If we set µ(a, b) := the total number of prime divisors of b/a (counted with multiplicity), then µ is a measure.
Example 12. Let (P, ≤) be the poset of all finite-dimensional subspaces of some (possibly infinite-dimensional) vector space V, where “≤” is the relation of “contained in”.

(i) Let M be the additive monoid of all non-negative integers. If we define µ(A, B) := dim(B/A), then µ is a measure on (P, ≤).

(ii) If M is the multiplicative group {1, −1}, then setting µ(A, B) := (−1)^{dim(B/A)} for every algebraic interval (A, B) also defines a measure.
2.6.3 The Jordan-Hölder Theorem
Now we can prove
Theorem 20 (The Jordan-Hölder Theorem) Let (P, ≤) be a semimodular lower semilattice. Suppose µ : CovP −→ (M, +) is a mapping from the set of covers of P to a commutative monoid (M, +), and suppose this mapping is “semimodular” in the sense that

µ(b, y) = µ(a ∧ b, a) whenever (a, y) and (b, y) are distinct covers.   (2.12)

Then for any two finite unrefinable chains U = (u = u0, . . . , un = v) and V = (u = v0, . . . , vm = v) from u to v, we have

Σ_{i=0}^{n−1} µ(ui, ui+1) = Σ_{i=0}^{m−1} µ(vi, vi+1)   (2.13)

and n = m (the finite summation is taking place in the additive monoid M). In particular, µ extends to a well-defined measure µ̂ : AP −→ M.
Proof: If n = 0 or 1, then U = V , and the conclusion holds. We
therefore proceed by induction on the minimal length h(x, y) of an unrefinable
chain from x to y for any algebraic interval (x, y) – that is, the height of (x, y).
Figure 2.2: The main figure for the Jordan-Hölder Theorem.
If un−1 = vm−1, then h(u, un−1) = n − 1, so by induction applied to the two unrefinable chains (u0, . . . , un−1) and (v0, . . . , vm−1) we get

Σ_{i=0}^{n−2} µ(ui, ui+1) = Σ_{i=0}^{m−2} µ(vi, vi+1)

and n − 1 = m − 1; adding µ(un−1, v) = µ(vm−1, v) to both sides, the conclusion follows.

So assume un−1 ≠ vm−1. Set z = un−1 ∧ vm−1. Since (un−1, v) and (vm−1, v) are both covers, by semimodularity, so also are (z, un−1) and (z, vm−1) (see Figure 2.2).

Since u ≤ z, (z, un−1) is a cover, and (u, un−1) is algebraic, we see that Lemma 19 implies that (u, z) is also algebraic. So a finite unrefinable chain

Z = (u = z0, z1, . . . , zr = z)

from u to z exists. Since h(u, un−1) = n − 1 < n, by induction

( Σ_{i=0}^{r−1} µ(zi, zi+1) ) + µ(z, un−1) = Σ_{i=0}^{n−2} µ(ui, ui+1)   (2.14)

and r + 1 = n − 1. But h(u, vm−1) ≤ r + 1, so induction can also be applied to the algebraic interval (u, vm−1), so

( Σ_{i=0}^{r−1} µ(zi, zi+1) ) + µ(z, vm−1) = Σ_{i=0}^{m−2} µ(vi, vi+1)   (2.15)

and r + 1 = m − 1. Thus n = m.

But by (2.12), µ(z, un−1) = µ(vm−1, v) and µ(z, vm−1) = µ(un−1, v). The result (2.13) now follows from (2.14), (2.15) and the commutativity of M. The proof is complete.
There are many applications of this theorem. In the case of finite groups (R-modules) we let (M, +) be the commutative monoid of multisets of all finite simple groups (or all irreducible R-modules, resp.). These are the classical instances of the Jordan-Hölder Theorem for groups and R-modules.^18

^18 I beg the reader to notice that in the case of groups there is no need for Zassenhaus’ famous “butterfly lemma”, or to prove that the subnormal subgroups of a finite group form a lattice – the introduction of a paper by Wielandt in an early issue of the Journal of Algebra treated this as a necessity. Even if it were needed, that result is immediate from the one- or two-line proof in Theorem 17.
Exercise 19. (The Jordan-Hölder Theorem implies the Fundamental Theorem of Arithmetic) Let (P, ≤) be the poset of positive integers where a ≤ b if and only if integer a divides integer b evenly (Example 10 of this section).

1. Show that (P, ≤) is a semimodular lower semilattice with all intervals algebraic. Let µ : CovP → Z+ be the function which records the prime number b/a at every cover (a, b) of (P, ≤). Indicate why µ is a semimodular function in the sense given in the Jordan-Hölder Theorem.

2. Suppose integer a properly divides integer b. For every factorization b/a = p1 p2 · · · pr of b/a into primes, show that there exists a finite unrefinable chain (a = a0, a1, a2, . . . , ar = b) such that ai/ai−1 = pi for i = 1, . . . , r.

3. Conclude from the Jordan-Hölder Theorem that every positive integer possesses a factorization into primes which is unique up to the order of the factors.
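The content of this exercise is easy to observe experimentally. Below is a small Python sketch (ours, for illustration only) that walks two different unrefinable chains from 1 to the same integer in the divisibility poset and checks that the multiset of prime “cover labels” b/a is the same for both:

    from collections import Counter

    def chain_labels(n, primes):
        """Walk an unrefinable chain from 1 to n by dividing out the
        given primes in order; return the multiset of cover labels."""
        labels, m = Counter(), n
        for p in primes:
            while m % p == 0:
                labels[p] += 1    # the cover (m // p, m) has label p
                m //= p
        assert m == 1, "primes must exhaust n"
        return labels

    n = 360                       # 2^3 * 3^2 * 5
    assert chain_labels(n, [2, 3, 5]) == chain_labels(n, [5, 3, 2])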
2.7 Möbius inversion
We pass now from interval measures on posets to more general functions on the intervals of a poset. If our functions assume values in a commutative ring A, and the poset is locally finite, an interesting theory emerges which encodes many subtleties of the poset.
2.7.1 More locally finite posets.
Recall that a poset (P, ≤) is said to be locally finite if and only if, given any
interval [a, b] (where a ≤ b in P ) there are only finitely many distinct poset
elements z between a and b. In Example 2 of this chapter we gave a number
of examples of locally finite posets. The following examples are a little more
“generic” – they tell you how to produce new examples from known ones.
Example 13. If X is any subset of a locally finite poset, then the poset
induced by X is always locally finite. In particular, the induced poset [a, b]
of an interval is locally finite. The interval posets of the first example are all
just finite chains; moreover any finite chain (like any finite poset) is locally
finite, and any such chain can be realized as an interval poset of the natural
numbers.
An easy example could be formed by taking any subspace-closed collection
S of finite-dimensional subspaces of a vector space V , that is, an order ideal
in L<∞ (V ). For example S could be:
1. all subspaces of V which meet a fixed subspace W trivially. (This is
sometimes called “the geometry of attenuated spaces”.)
2. all finite-dimensional subspaces which are totally isotropic with respect to a non-degenerate reflexive σ-sesquilinear form, or totally singular with respect to a quadratic form. (These are the “polar space geometries” which we shall meet in Chapters ** and **.)
If V is over a finite field, then these two examples for S yield locally finite
posets.
Example 14. The dual P ∗ of any locally finite poset (P, ≤) is also locally
finite.
Example 15. Let (Pi, ≤), i = 1, 2, . . . , n, be a family of locally finite posets. The elements of the product of these posets are the sequences of length n in the Cartesian product P1 × · · · × Pn, whose i-th member is an element of Pi. A sequence a = (a1, . . . , an) is less than or equal to a sequence b = (b1, . . . , bn) if and only if ai ≤ bi for all i. It follows that the interval poset defined by (a, b) is the poset induced on

[a, b] = Π_i [ai, bi],

a finite set. Thus the product poset is also locally finite.
2.7.2 The incidence algebra
Let (P, ≤) be a locally finite poset and let R be any ring. From these two objects we are going to construct a new ring A = A(P, R), called the incidence algebra of (P, ≤) over the ring R.

Let IP be the collection of all intervals of P. (By local finiteness, they are all algebraic intervals.) The elements of A are the functions f : IP → R from the set of intervals into the ring R. Addition in A is point-wise – that is, if f and g are functions from intervals to R, then f + g is the function whose value at interval (a, b) is f(a, b) + g(a, b). Clearly, with respect to addition, (A, +) is a commutative group. (It is in fact a direct product of copies of (R, +) indexed by IP.)
Multiplication is not defined pointwise. Instead it is a sort of “convolution”, which we denote as a binary operation “◦”. If f and g are functions from intervals to R, that is, elements of A, then f ◦ g is defined to be the function whose value at interval (a, b) is

(f ◦ g)(a, b) = Σ_{a≤x≤b} f(a, x)g(x, b).   (2.16)

Notice that the definition of f ◦ g makes sense, since the sum in (2.16) is finite, and the multiplication within the summands is occurring in the ring R.
The operation “◦” is distributive with respect to addition, since

(f ◦ (g + h))(a, b) = Σ_{a≤x≤b} f(a, x)(g(x, b) + h(x, b))
                    = Σ_{a≤x≤b} f(a, x)g(x, b) + Σ_{a≤y≤b} f(a, y)h(y, b)
                    = (f ◦ g)(a, b) + (f ◦ h)(a, b)
                    = (f ◦ g + f ◦ h)(a, b)   (2.17)

for all intervals (a, b). Thus f ◦ (g + h) = f ◦ g + f ◦ h. The proof of the other distributive law, (g + h) ◦ f = g ◦ f + h ◦ f, is very similar.
Also the operation “◦” is associative, for both f ◦ (g ◦ h) and (f ◦ g) ◦ h have value

Σ_{a≤x≤y≤b} f(a, x)g(x, y)h(y, b)

at interval (a, b).
The function ε : IP → R defined by

ε(a, b) := 1 if a = b, and ε(a, b) := 0 if a < b,

at every interval (a, b) is easily seen to be the identity element of A.

Thus A = A(P, R) is a ring.
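To make the convolution concrete, here is a short Python sketch (our own illustration; the poset and the names are invented for the example) of the incidence algebra of a finite poset over the integers, with intervals represented as pairs (a, b):

    from itertools import product

    # A small poset: divisors of 12 under divisibility.
    P = [1, 2, 3, 4, 6, 12]
    leq = lambda a, b: b % a == 0
    intervals = [(a, b) for a, b in product(P, P) if leq(a, b)]

    def conv(f, g):
        """Convolution (2.16): (f o g)(a,b) = sum over a <= x <= b of f(a,x)g(x,b)."""
        return {(a, b): sum(f[(a, x)] * g[(x, b)]
                            for x in P if leq(a, x) and leq(x, b))
                for (a, b) in intervals}

    zeta = {iv: 1 for iv in intervals}                          # the zeta function
    eps = {(a, b): 1 if a == b else 0 for (a, b) in intervals}  # identity element

    assert conv(zeta, eps) == zeta and conv(eps, zeta) == zeta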
2.7.3 The Möbius function of a locally finite poset
There are two more very special elements in A. The zeta function, ζ :
IP → R is the function which has constant value 1, the identity element of
the ring R, at every interval.
Theorem 21 The zeta function has a 2-sided inverse in A.
Proof: We are going to inductively define a function µ_ℓ : IP → R which is a left inverse of ζ in A. Fix an element a in P. It suffices to give an appropriate definition of µ_ℓ(a, b) for all elements b such that b ≥ a (so that (a, b) is an interval). First set µ_ℓ(a, a) = 1. If (a, x) ∈ CovP, set µ_ℓ(a, x) = −1. Suppose, at some stage, that we have defined µ_ℓ(a, x) for all x ∈ [a, b] such that x < b. Then we define

µ_ℓ(a, b) := − Σ_{a≤x<b} µ_ℓ(a, x).   (2.18)

Notice that (2.18) provides an algorithm^19 for calculating µ_ℓ(a, b) for any b ≥ a.

[If we change the value of the lower end of the interval from a to some other element a′, a whole new series of calculations must ensue. These new calculations are in no way influenced by our previous calculations of the µ_ℓ(a, z)’s.]

Now from (2.18), we have

µ_ℓ ◦ ζ = ε.
Similarly, there exists in A a right inverse µ_r for ζ. It too is inductively defined, this time by the equations:

µ_r(b, b) = 1, and
µ_r(a, b) := − Σ_{a<y≤b} µ_r(y, b).   (2.19)
^19 The algorithm is “greedy” in the sense that it ignores the global environment of (a, b) in the poset P. Instead it only looks at the first values at hand, the µ_ℓ(a, x) for x in the interval poset [a, b] properly below b.
Thus ζ ◦ µ_r = ε is easily verified.
Now why is µ_r = µ_ℓ? In any ring, if both left and right inverses of an element exist, then they are equal. In this particular case, the argument is this:

µ_ℓ = µ_ℓ ◦ ε = µ_ℓ ◦ (ζ ◦ µ_r) = (µ_ℓ ◦ ζ) ◦ µ_r = ε ◦ µ_r = µ_r.
From this point onward we write µ or µP (if we wish to emphasize the poset P) for µ_ℓ = µ_r. The element µP of A is called the Möbius function of the poset (P, ≤).
Corollary 22 The Möbius function of the poset P over the ring R takes
values in the subring of R generated by the multiplicative identity. These
values are in the center of the ring R.
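The recursion (2.18) is directly programmable. The following Python sketch (ours; it uses the same small divisor poset as in the earlier sketch) computes µ by the greedy algorithm:

    P = [1, 2, 3, 4, 6, 12]              # divisors of 12 under divisibility
    leq = lambda a, b: b % a == 0

    mu = {}
    for a in P:
        for b in sorted(x for x in P if leq(a, x)):   # increasing order
            mu[(a, b)] = 1 if a == b else -sum(
                mu[(a, x)] for x in P if leq(a, x) and leq(x, b) and x != b)

    # e.g. mu[(1, 12)] == 0 since 12 is divisible by the square of a prime
    assert mu[(1, 12)] == 0 and mu[(1, 6)] == 1 and mu[(1, 2)] == -1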
We now observe

Theorem 23 (Möbius Inversion) Let (P, ≤) be a locally finite poset, and let f : IP → R be any R-valued function on the set of intervals. Suppose the functions F1 and F2 from IP to R are defined by setting

F1(a, b) := Σ_{a≤x≤b} f(a, x),   (2.20)
F2(a, b) := Σ_{a≤x≤b} f(x, b),   (2.21)

for any interval (a, b) in (P, ≤). Then for any interval (c, d), we have

f(c, d) = Σ_{c≤y≤d} F1(c, y) µ(y, d) = Σ_{c≤y≤d} µ(c, y) F2(y, d).   (2.22)
Proof: The definitions of F1 and F2 assert that F1 = f ◦ ζ and F2 = ζ ◦ f. Then

F1 ◦ µ = (f ◦ ζ) ◦ µ = f ◦ (ζ ◦ µ) = f ◦ ε = f.

Similarly,

µ ◦ F2 = µ ◦ (ζ ◦ f) = (µ ◦ ζ) ◦ f = ε ◦ f = f.
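Möbius inversion, too, can be checked mechanically. A minimal Python sketch (ours) on the divisor poset: summing any function f over divisors and then convolving with µ recovers f, which for intervals (1, n) is the classical number-theoretic inversion f(n) = Σ_{d|n} µ(n/d) F(d):

    def divisors(n):
        return [d for d in range(1, n + 1) if n % d == 0]

    def mobius(n):
        """Number-theoretic Moebius function, mu(n) = mu(1, n)."""
        m, count = n, 0
        for p in range(2, n + 1):
            if m % p == 0:
                m //= p
                if m % p == 0:
                    return 0          # square factor
                count += 1
        return (-1) ** count

    f = lambda n: n * n               # any integer-valued function
    F = lambda n: sum(f(d) for d in divisors(n))   # F = f convolved with zeta

    for n in range(1, 50):
        assert f(n) == sum(mobius(n // d) * F(d) for d in divisors(n))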
2.7.4 Special cases
Let C = (x0, x1, . . . , xn) be a chain – that is, a finite totally ordered poset. This is certainly one of the simplest posets that one could imagine. What is its Möbius function?

By (2.19), µ(x0, x0) = 1 and µ(x0, x1) = −1. But then (2.18) implies µ(x0, xi) = 0 if i > 1. Since any induced poset [xi, xj], with i ≤ j, is also a chain, we see that

µ(xi, xj) = 1 if i = j;  µ(xi, xj) = −1 if j = i + 1;  µ(xi, xj) = 0 if j > i + 1.
Recall that if {(Pσ, ≤σ) | σ ∈ J} is a family of posets, each with a “zero” element 0σ, then the direct sum of these posets is the poset (P, ≤) whose elements are the maps f : J → ∪_{σ∈J} Pσ from the index set into the disjoint union of the Pσ such that f(σ) ∈ Pσ for all σ ∈ J and such that f(σ) ≠ 0σ for only finitely many indices σ. The partial ordering is such that

f ≤ g if and only if f(σ) ≤σ g(σ) for every σ ∈ J.

Then it is easy to see that a direct sum of locally finite posets is also locally finite. We now show
Theorem 24 Suppose (P, ≤) is a direct sum of locally finite posets {(Pσ, ≤σ) | σ ∈ J}. Suppose (P, ≤) has Möbius function µ and each poset (Pσ, ≤σ) has Möbius function µσ. For each interval (f, g) in P, one has

µ(f, g) = Π_{σ∈J, f(σ)≠g(σ)} µσ(f(σ), g(σ)).   (2.23)
Proof: Fix an interval (f, g) in (P, ≤), and let K := {1, . . . , n} be the set of indices σ ∈ J for which g(σ) > 0σ in Pσ. We understand this to mean K = ∅ if n = 0. Now the equation (2.23) is true if f = g, for then µ(f, g) = 1 and f(σ) = g(σ) for every σ, so the right side is an empty product, which is also 1. So we may assume that f < g and that n = |K| is at least one.

As an induction hypothesis, we may assume that for each x ∈ [f, g] with x < g, we have

µ(f, x) = Π_{σ=1}^{n} µσ(f(σ), x(σ)).   (2.24)

Now by the definition of the Möbius function,

0 = µ(f, g) + Σ_{f≤x<g} µ(f, x) = µ(f, g) + Σ_{f≤x<g} ( Π_{σ=1}^{n} µσ(f(σ), x(σ)) ),   (2.25)

where the second equality applies the induction hypothesis (2.24).

On the other hand,

0 = Π_{σ=1}^{n} ( Σ_{f(σ)≤z(σ)≤g(σ)} µσ(f(σ), z(σ)) ),

since at least one factor vanishes: for any σ with f(σ) < g(σ) the inner sum is zero, by the defining recursion of µσ. Upon “multiplying out” (that is, using the distributive laws of R), this product is the sum of all possible n-fold monomials

Π_{σ=1}^{n} µσ(f(σ), z(σ)),

with the z(σ) ranging independently over their respective interval posets [f(σ), g(σ)]. But the summation on the right side of (2.25) is also the sum of all such monomials except for one: the one monomial summand

Π_{σ=1}^{n} µσ(f(σ), g(σ))

is missing. It follows from (2.25) that this missing monomial is µ(f, g), and the theorem is proved.
We look at two classic applications.

We saw in Theorem 7 that the poset D for the divisor relation among the positive integers was a direct sum of copies of the poset N+ (of the natural order relation among the positive integers) indexed by the natural numbers. All these posets are locally finite (indeed, this was our second item in Example 2, page 35). Since each interval (a, b) in D (representing number a dividing number b) is a product of finite chains (representing the divisors of those maximal prime-power divisors of b/a), we see that

µ(a, b) = 0 if b/a is divisible by the square of a prime, and
µ(a, b) = (−1)^n if b/a is the product of n distinct primes.

The function µ(n) := µ(1, n) is the traditional Möbius function from number theory.
Now let (P(n), ≤) be the poset of all subsets of an n-set X = {1, 2, . . . , n}, say, partially ordered by containment. This poset has a “zero” (the empty set), a “one” (the whole set X) and 2^n elements. Then (P(n), ≤) can be viewed as a direct sum of n copies of the 2-element poset {0 < 1}. This is because any subset Y of X can be regarded as its characteristic function y : X → Z, which has value 1 at elements of Y and value 0 otherwise. Then if A ⊆ B ⊆ X, so that (A, B) is an interval in (P(n), ≤), our theorem shows that

µ(A, B) = (−1)^{|B−A|}.
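A brute-force Python check of this formula on a small set (our own sketch; it recomputes µ from the recursion (2.18) over the subset lattice of a 4-set):

    from itertools import combinations

    X = frozenset(range(4))
    subsets = [frozenset(c) for r in range(5) for c in combinations(X, r)]

    mu = {}
    for A in subsets:
        for B in sorted((S for S in subsets if A <= S), key=len):
            mu[(A, B)] = 1 if A == B else -sum(
                mu[(A, S)] for S in subsets if A <= S < B)

    assert all(mu[(A, B)] == (-1) ** len(B - A)
               for A in subsets for B in subsets if A <= B)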
2.7.5 Remarks on other applications of the Möbius function
The relations are boundless. Here is a sampling:
1. If (P, Q, α, β) is a Galois connection, then under mild conditions, the
value of µ(0, 1) (the Euler Characteristic minus 1) can be determined
by just partial knowledge of the Möbius function of the other side.
2. When f : P → Q is monotone, and the fibre over every interval (0, q) contains a global maximum q′, then

Σ_{{y | f(y)=s}} µ(0, y) = 0,

provided s > 0.
3. Similarly, if F is a family of subsets of a finite set containing the empty set, closed under intersection, if X is their union, and if µ is the Möbius function of the poset (F ∪ {X}, ⊆), then

µ(0, 1) = µ(∅, X) = q2 − q3 + q4 − · · ·

where qk is the number of k-subsets of F which together intersect at the empty set.
2.8 Point measures in lattices; the modular law.

2.8.1 Introduction
In a recent article in the American Math Monthly [1] one finds this

Proposition 25 Let V1, . . . , Vk be a collection of subspaces of an n-dimensional space V. If

dim V1 + dim V2 + · · · + dim Vk > (k − 1)n,

then dim(V1 ∩ V2 ∩ · · · ∩ Vk) is positive.
Proof: Let U = V1 ⊕ · · · ⊕ Vk be the formal direct sum of the Vi and let W be the direct sum of k − 1 copies of V. Define the transformation T : U → W by the rule that if vi ∈ Vi, i = 1, . . . , k, then

T((v1, . . . , vk)) = (v1 − v2, v2 − v3, . . . , vk−1 − vk).

Then the kernel of T is just the space of all vectors (v, v, . . . , v) where v ∈ V1 ∩ · · · ∩ Vk. Thus as

(k − 1)n < Σ_{i=1}^{k} dim Vi = dim U = dim T(U) + dim(ker T) ≤ dim(W) + dim(ker T) = (k − 1)n + dim(ker T),

the result follows.
Remark: Of course one can always take V = V1 + V2 + · · · + Vk in the Proposition.

As one can see, this is a theorem about dimensions. Dimensions measure something about elements of the poset of subspaces of a vector space. In fact, the dimension obeys a rule called “Kolmogoroff’s Axiom” which applies to measures in many other contexts, such as probability, area, cardinality, etc. The object of this section is to display this fundamental concept, and in particular, to generalize the above lemma in this wider context. In this spirit, the vector-subspace poset is replaced by a “lattice”.
2.8.2 Dual lattices and sublattices.
Recall that a lattice is a poset (L, ≤) for which any two elements a and b of the lattice possess a greatest lower bound or meet a ∧ b and also possess a least upper bound or join a ∨ b. The following two “associative laws” hold:

a ∨ (b ∨ c) = (a ∨ b) ∨ c, and   (2.26)
a ∧ (b ∧ c) = (a ∧ b) ∧ c,   (2.27)

for any elements a, b and c of L. Thus one may omit parentheses in compositions that use only one of the two binary operations “∨” or “∧”. One sees that for any finite collection {a1, . . . , an} of elements of L, the elements

a1 ∨ · · · ∨ an and a1 ∧ · · · ∧ an

exist and are respectively the least upper bound and the greatest lower bound in L of the set {a1, . . . , an}. It follows from the definitions that the dual of a lattice (L, ≤) is also a lattice L∗.
A lattice is said to possess a zero element if and only if it has a global minimum – usually denoted by “0”. Similarly, it possesses a “one” if and only if it contains a global maximum – denoted by the symbol “1” as well as by the name “one”.
A lattice is said to be self-dual if and only if it is isomorphic to its dual.
Suppose L = (L, ≤) is a lattice. A sublattice of L is a subposet (P, ≤P) of L that is closed under the pairwise meets and joins that are taking place in L. In Exercise 21 the student is asked to prove the following
Lemma 26
1. Any sublattice of a lattice is an induced poset.
2. Suppose P = (P, ≤P ) is a subposet of a lattice (L, ≤). Suppose further
that (P, ≤P ) is a lattice in its own right. Then
(a) It is possible for P not to be an induced subposet of L.
(b) It is possible for P to be an induced subposet of L and still not be a sublattice of L.
2.8.3 Kolmogoroff functions on lattices
Let (L, ≤) be any lattice. In turn, let (A, +) be any commutative monoid with respect to the operation “+”, having identity element 0. Moreover, A is called a cancellation monoid if

a + c = b + c always implies a = b,

for all elements a, b and c in A.

A Kolmogoroff function is a mapping

f : L → A,

such that the following axiom holds:
(Kolmogoroff’s Axiom) For any two elements a and b of L,

f(a ∨ b) + f(a ∧ b) = f(a) + f(b).   (2.28)

Note that the “+” in these equations is occurring in the additive monoid A. The reader should recall from a previous linear algebra course that, for the poset of all subspaces of a finite-dimensional vector space V, the dimension function W ↦ dim W is a Kolmogoroff function into the additive monoid of the non-negative integers.
A basic observation is this:

Lemma 27 (Duality for Kolmogoroff functions.) Suppose f : L → A is a Kolmogoroff function from a lattice L into a commutative monoid A. Let L∗ be the dual lattice of (L, ≤). Both of the lattices possess the same set of elements, and so f may be regarded as a mapping from the set of elements of the dual lattice into A. In this sense f is a Kolmogoroff function on the dual lattice L∗.
It is almost an insult to ask anyone for a proof! The Kolmogoroff axiom is clearly self-dual.
Consider first the following

Lemma 28 Suppose f is a Kolmogoroff function on the lattice L into the commutative monoid A. For elements x1, . . . , xk of the lattice L, we have the following two identities:

1. (The basic identity)

Σ_{i=1}^{k} f(xi) = f(x2 ∨ x1) + f(x3 ∨ (x1 ∧ x2)) + f(x4 ∨ (x1 ∧ x2 ∧ x3)) + · · · + f(x1 ∧ x2 ∧ · · · ∧ xk)
= [ Σ_{j=2}^{k} f(xj ∨ (∧_{r=1}^{j−1} xr)) ] + f(x1 ∧ · · · ∧ xk).   (2.29)

2. (Vriar’s Identity)

Σ_{i=1}^{k} f(xi) = f(x2 ∧ x1) + f(x3 ∧ (x1 ∨ x2)) + f(x4 ∧ (x1 ∨ x2 ∨ x3)) + · · · + f(x1 ∨ x2 ∨ · · · ∨ xk)
= [ Σ_{j=2}^{k} f(xj ∧ (∨_{r=1}^{j−1} xr)) ] + f(x1 ∨ · · · ∨ xk).   (2.30)
Proof: The second identity is the complete dual of the first in the sense of Lemma 27. Thus it suffices to prove only the first one.

For k = 2 this is equation (2.28). Assume the result for k = n − 1 and add f(xn) to both sides. The left side is then the sum of the measures of the xi from i = 1 to n. The last two terms on the right side are

f(x1 ∧ · · · ∧ xn−1) + f(xn).   (2.31)

By equation (2.28) with a = x1 ∧ · · · ∧ xn−1 and b = xn, (2.31) becomes

f(xn ∨ (x1 ∧ · · · ∧ xn−1)) + f(x1 ∧ · · · ∧ xn),

and the sum on the right now has the desired form.
2.8.4 The “inclusion-exclusion” principle
A lattice L = (L, ≤) is said to be distributive if and only if, for any three elements a, b and c of L, one always has

a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c).   (2.32)

In this subsection we consider a Kolmogoroff function f : L → A from a distributive lattice L with values in a commutative monoid A. An “inclusion-exclusion” principle seeks to express the value of the Kolmogoroff function at the join of several elements in terms of its values at meets of various subsets of these elements. In order to express this we need to introduce some notation.
Fix a finite non-empty subset P = {ℓ1, . . . , ℓn} of elements of a lattice L. This set is indexed by [n] := {1, . . . , n}. For every non-empty subset J = {a1, . . . , ak} of [n], let

∧_J (P) := ℓ_{a1} ∧ ℓ_{a2} ∧ · · · ∧ ℓ_{ak}.

Next, for any integer k ∈ [n], set

Σ_k f(P) := Σ_{|J|=k} f(∧_J (P)),

where the sum on the right-hand side ranges over all subsets J of [n] of cardinality k – that is, the sum (in the monoid A) of the f-values at all possible meets of k-subsets of P. Thus

Σ_1 f(P) = f(ℓ1) + f(ℓ2) + · · · + f(ℓn),
Σ_2 f(P) = Σ_{1≤i<j≤n} f(ℓi ∧ ℓj),   (2.33)
Σ_n f(P) = f(ℓ1 ∧ ℓ2 ∧ · · · ∧ ℓn).
Now we may state

Theorem 29 (Inclusion-Exclusion) Let f : L → A be a Kolmogoroff function from the distributive lattice L to the commutative monoid (A, +). Let P = {ℓ1, . . . , ℓn} be any set of n elements chosen from L. Then, in the notation just preceding,

f(ℓ1 ∨ · · · ∨ ℓn) + Σ_{k even} Σ_k f(P) = Σ_{k odd} Σ_k f(P).   (2.34)

If, moreover, A is a cancellation monoid, one may write this:

f(ℓ1 ∨ · · · ∨ ℓn) = Σ_{k=1}^{n} (−1)^{k+1} Σ_k f(P).   (2.35)

The proof of (2.34) is left as an exercise – an easy induction utilizing the distributive law (2.32) and the Kolmogoroff axiom. Then (2.35) follows from the fact that in a cancellation monoid, if x + y = z with x and y elements of A, then “z − y” is a well-defined element of A.
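For the lattice of finite subsets with cardinality as the Kolmogoroff function, (2.35) is the familiar inclusion-exclusion formula. A quick Python check (our own sketch, with arbitrary example sets):

    from itertools import combinations
    from functools import reduce

    sets = [{1, 2, 3}, {2, 3, 4}, {3, 5}, {1, 5, 6}]
    n = len(sets)

    union = set().union(*sets)
    rhs = sum((-1) ** (k + 1) *
              sum(len(reduce(set.intersection, J)) for J in combinations(sets, k))
              for k in range(1, n + 1))
    assert len(union) == rhs     # equation (2.35) with f = cardinality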
2.8.5 Point-measure of a lattice
In order even to state an analogue of Proposition 25, we need to be able to compare images of a Kolmogoroff function f : L → A from a lattice L to a commutative monoid. That means that the monoid A itself must have a partial ordering.

A commutative monoid M is said to be a partially ordered monoid if and only if there is a partial ordering “≤” such that (M, ≤) is a poset and multiplication (or addition) by any element is order-preserving. Thus

a ≤ b implies ac ≤ bc,

for any elements a, b and c in M. It is said to be a partially ordered monoid with cancellation if, in addition,

ac ≤ bc always implies a ≤ b,

for all elements a, b and c in M.
A good example of a partially ordered monoid with cancellation is the additive monoid R+ of non-negative real numbers. We shall see from the exercises that the additive monoid of multisets is also a partially ordered cancellation monoid.
Let L be a lattice with a “0”. (As usual, we let a ∧ b and a ∨ b denote the meet and join of the elements a and b in L.) A point-measure on the lattice L is a function µ : L → A from the set of lattice elements to a partially ordered cancellation monoid A such that

1. µ(0) = 0, the identity of A,

2. µ is monotone non-decreasing, in that a ≤ b implies µ(a) ≤ µ(b), and

3. µ is a Kolmogoroff function, that is,

µ(a ∨ b) + µ(a ∧ b) = µ(a) + µ(b),   (2.36)

for any lattice elements a and b.
We can now present the generalization of the theorem at the beginning of this section:

Theorem 30 Suppose µ : L → A is an A-valued point-measure on the lattice L. If for k lattice elements a1, . . . , ak one has

µ(a1) + µ(a2) + · · · + µ(ak) > (k − 1)µ(a1 ∨ · · · ∨ ak),   (2.37)

then µ(a1 ∧ · · · ∧ ak) > 0.
Proof of Theorem 30: Let x1, . . . , xk satisfy the hypothesis of the Theorem and set X := x1 ∨ · · · ∨ xk. By the hypothesis and Lemma 28, we have

(k − 1)µ(X) < Σ_i µ(xi)
            = µ(xk) + Σ_{i=1}^{k−1} µ(xi)
            = µ(xk) + [ Σ_{j=2}^{k−1} µ(xj ∨ (∧_{r=1}^{j−1} xr)) ] + µ(x1 ∧ · · · ∧ xk−1)
            ≤ µ(xk) + (k − 2)µ(X) + µ(x1 ∧ · · · ∧ xk−1).   (2.38)

(The final inequality in (2.38) follows from the monotonicity of µ.) If (k − 2)µ(X) is cancelled from both sides, one obtains

µ(X) < µ(xk) + µ(x1 ∧ · · · ∧ xk−1) = µ(xk ∨ (x1 ∧ · · · ∧ xk−1)) + µ(x1 ∧ · · · ∧ xk).   (2.39)

But µ(X) ≥ µ(xk ∨ (x1 ∧ · · · ∧ xk−1)) by monotonicity, so µ(x1 ∧ · · · ∧ xk) > 0, completing the proof.
When strict inequality is replaced by equality, it is possible for the meet
to have trivial measure. For example, if n = k and the Vi of Proposition 25
are hyperplanes of V , it is possible for the Vi to intersect at the zero subspace.
Nonetheless, with equality, the situation is still very tight. Essentially the
same proof as that of Theorem 30 shows
Theorem 31 Suppose the inequality of Theorem 30 is replaced by equality. Then either

µ(∧_j xj) > 0   (2.40)

or, for each index i,

µ(∨_j xj) = µ(xi ∨ (x1 ∧ · · · ∧ xi−1 ∧ xi+1 ∧ · · · ∧ xk)).   (2.41)

If moreover µ is order-injective – that is, a ≤ b and µ(a) = µ(b) always imply a = b – then the µ’s may be removed in equations (2.40) and (2.41).
In a lattice of finite-dimensional vector spaces, “dimension” is always an order-injective measure.

Because L and µ are general, there are many applications of the inequality of Theorem 30 beyond the case that L is the lattice of all subspaces of a vector space and the measure is dimension, as in Proposition 25.
1. There are all sorts of height functions on matroids – for example, subsets of a finite set with cardinality as the measure.

Here is another example of this type: Let N denote the non-negative integers, and let M = N^(n) be the collection of all n-tuples (m1, . . . , mn) with the mi ∈ N. Under coordinate-wise addition, M is the monoid of multisets over a given n-set. M is a lattice under the order relation (a1, . . . , an) ≤ (b1, . . . , bn) iff ai ≤ bi for each i. The theorem applies with the measure defined by

µ(a1, . . . , an) = Σ_i ai.
2. Let L be any Boolean algebra of Lebesgue-measurable subsets in Euclidean space, and let µ be their measure (area, volume, etc.).
3. Let L be the lattice of finite dimensional submodules of an R-module,
where R is a K-algebra, K any field. Let the measure µ(N ) be the
number of composition factors of the submodule N which belong to
any specified class of irreducible R-modules.
As a result one has, for example:
Corollary 32

1. Let A1, . . . , Ak be finite subsets of a set X. If

|A1| + · · · + |Ak| > (k − 1)|A1 ∪ · · · ∪ Ak|,

then there exists an element x lying in every Ai.

2. Let A1, . . . , Ak be Lebesgue-measurable subsets of Euclidean space. If the sum of their volumes exceeds k − 1 times the volume of their union, then their intersection has a positive volume.
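Part 1 of the corollary is easy to test by brute force; a Python sketch (ours, with randomly generated example sets):

    import random

    for _ in range(1000):
        k = random.randint(2, 5)
        A = [set(random.sample(range(8), random.randint(1, 8))) for _ in range(k)]
        union = set().union(*A)
        if sum(len(a) for a in A) > (k - 1) * len(union):
            assert set.intersection(*A), "Corollary 32.1 predicts a common element"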
2.8.6 Modular lattices
Consider for the moment the following example:
Example 16. The poset (P, ≤) has element set P = {a, b, c} ∪ Z, where
Z is the set of integers, with ordering defined by these rules:
1. a ≤ p for all p ∈ P –i.e. a is the poset “zero”,
2. c ≥ p for all p ∈ P , so c is the poset “one”,
3. b is not comparable to any integer in Z, and
4. the integers Z are totally ordered in the natural way.
Notice that the meet and join of b and any integer are a and c, respectively. Thus (P, ≤) is a lattice with its “zero” and “one” connected by an unrefinable proper chain of length 2 and, at the same time, by an infinite unrefinable chain. The lattice is even lower semimodular, since the only covers are (a, b), (b, c) and (n, n + 1), for all integers n.
For the purposes of the Jordan-Hölder theory, such examples did not bother us, for the J-H theory was phrased as an assertion about measures which take values on algebraic intervals – that is to say, non-algebraic intervals could be ignored.

However, many of the most important posets in Algebra are actually lattices with a property, the modular law, which prevents elements x and y from being connected by both a finite unrefinable chain and an infinite chain. This modular law always occurs in posets of congruence subalgebras which are subject to the so-called “Second Fundamental Theorem of Homomorphisms”.
A lattice L is said to be a modular lattice or to be modular if and
only if
(M) for all elements a, b, c of L with a ≥ b,
a ∧ (b ∨ c) = b ∨ (a ∧ c).
The dual condition is
(M∗ ) for all elements a, b, c of L with a ≤ b,
a ∨ (b ∧ c) = b ∧ (a ∨ c).
But by permuting the roles of a and b it is easy to see that the two
conditions are equivalent. Thus
Lemma 33 A lattice satisfies (M) if and only if it satisfies (M∗ ). Put another way: a lattice L is modular if and only if its dual lattice L∗ is.
The most important property of a modular lattice is given in the following
Lemma 34 Suppose L is a modular lattice. Then for any two elements a
and b of L, there is a poset isomorphism
µ : [a, a ∨ b] → [a ∧ b, b],
taking each element x of the domain to x ∧ b.
Proof: As µ is defined, µ(a) = a ∧ b, µ(a ∨ b) = (a ∨ b) ∧ b = b, and if a ≤ x ≤ y ≤ a ∨ b, then µ(x) = x ∧ b ≤ y ∧ b = µ(y). Thus µ is monotone and takes values in [a ∧ b, b].

Suppose now that x, y ∈ [a, a ∨ b] and µ(x) = µ(y). Then x ∧ b = y ∧ b and so a ∨ (x ∧ b) = a ∨ (y ∧ b). Applying the law (M) to each side of the last equation, one obtains x = x ∧ (a ∨ b) = y ∧ (a ∨ b) = y, since a ∨ b dominates both x and y. Thus µ is injective.

Now suppose a ∧ b ≤ x ≤ b. Then by the first inequality, x = x ∨ (b ∧ a). Since x ≤ b, applying (M∗) (with x and a in the roles of a and c in (M∗)), x = b ∧ (x ∨ a) = µ(x ∨ a). Thus µ is onto.

Thus µ is a poset isomorphism.
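The divisor lattice (meet = gcd, join = lcm) is modular, and the modular law (M) can be observed directly. A small Python sketch (our own illustration):

    from math import gcd

    def lcm(a, b):
        return a * b // gcd(a, b)

    # Check (M) on divisors of 360: if b divides a (i.e. a >= b), then
    # a ^ (b v c) = b v (a ^ c), with ^ = gcd and v = lcm.
    D = [d for d in range(1, 361) if 360 % d == 0]
    for a in D:
        for b in D:
            if a % b != 0:        # require b | a, i.e. b <= a in the lattice
                continue
            for c in D:
                assert gcd(a, lcm(b, c)) == lcm(b, gcd(a, c))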
Remark: Just on the general principle that modularity is a relation between formulas in three variables with a partial order relation, it follows that any sublattice of a modular lattice is modular.^20
Corollary 35 Any modular lattice is a semi-modular lower semilattice and
so is subject to the Jordan-Hölder theory. In particular, any two finite unrefinable chains that may happen to connect two elements x and y of the lattice,
must have the same length, one depending only on the pair (x, y).
Proof: Apply the previous Lemma to the case where (a, a ∨ b) and (b, a ∨ b) are both covers.
The preceding Corollary only compared two finite unrefinable chains from x to y. Could one still have both a finite unrefinable chain from x to y as well as an infinite one, as in Example 16 at the beginning of this subsection? The next result shows that such examples are banned from the realm of modular lattices.
Theorem 36 Suppose L is a modular lattice and suppose a = a0 < a1 < · · · < an = b is an unrefinable proper chain of length n proceeding from a to b. Then every other proper chain proceeding from a to b has length at most n.
^20 Recall from the definition of “sublattice” on page 71 that the latter is closed under the join and meet operations of the parent lattice.
Proof: Of course, by Zorn’s lemma, any proper chain is refined by a proper unrefinable chain. So, if we can prove that all proper chains connecting a and b are finite, such a chain possesses a well-defined measure by the Jordan-Hölder theory. In particular, all such chains possess the same fixed length n. So it suffices to show that any proper chain connecting a and b must be finite.
On the other hand, we propose to accomplish this by induction on n. But this means that we shall have to keep track of the bound on length asserted by the induction hypothesis, in order to obtain new intervals [a′, b′] to which induction may be applied.

So we begin by considering a (possibly infinite) totally ordered subposet X of (L, ≤) having a as its minimal member and b as its maximal member. We must show that X is a finite proper chain. If n = 1, then X = {a, b} is a cover, and we are done.

If each x ∈ X satisfies x ≤ an−1, then X − {b} is finite by induction on n, and so X is finite. So we may suppose that some xβ ∈ X does not lie below an−1; since (an−1, b) is a cover, xβ ∨ an−1 = b. By Lemma 34, (an−1 ∧ xβ, xβ) is a cover.
Now the distinct members of the set

Y = {z ∧ an−1 | z ∈ X}

form a totally ordered set whose maximal member is an−1 = b ∧ an−1, and whose minimal member is a = a ∧ an−1. By induction on n, Y is a finite chain of length at most n − 1. On the other hand, it is the concatenation of the two proper chains

Y+ := {y ∈ Y | y ≥ an−1 ∧ xβ},
Y− := {y ∈ Y | y ≤ an−1 ∧ xβ},

whose lengths sum to at most n − 1.

Now the poset isomorphism µ : [xβ, b] → [an−1 ∧ xβ, an−1] takes the members of X dominating xβ into Y+. Thus

X ∩ [xβ, b]

is a chain of length at most that of Y+, and in particular is finite. It remains only to show that the remaining part of X, the chain X ∩ [a, xβ], is finite.
Let the index i be maximal with respect to the condition ai ≤ xβ. Then for each index j exceeding i but bounded by n − 1, either aj = aj−1 ∨ (aj ∧ xβ), in which case (aj−1 ∧ xβ, aj ∧ xβ) is a cover, or else this interval has length zero. It follows that a is connected to xβ by an unrefinable chain of length at most n − 1. By induction, X ∩ [a, xβ] is finite. The proof is complete.
Corollary 37 Let L be a modular lattice with minimum element 0 and maximum element 1. The following conditions are equivalent:
1. There exists a finite unrefinable chain proceeding from 0 to 1.
2. Every proper chain has length below a certain finite universal bound.
3. L has both the ascending and descending chain conditions.
Proof: The Jordan-Hölder theory shows all chains connecting 0 to 1 are
bounded by the length of any unrefinable one. How this affects all proper
chains is left as an exercise.
Any modular lattice which satisfies the conditions of Corollary 37 is said
to possess a composition series.
Example 17. Let L be the lattice D of all positive integers where a ≤ b
if and only if a divides b. Then D is a modular lattice which possesses the
DCC but not the ACC. There is a “zero” (the integer 1), but no lattice “one”.
However, every interval [a, b] possesses a composition series.
2.8.7 Exercises concerning lattices.
Exercise 20. Among the posets of Examples 2 through 5, which are lattices?
Exercise 21.
1. Prove statement 1 of Lemma 26 – i.e. any sublattice of a lattice is an induced poset.
2. Suppose P = (P, ≤P ) is a subposet of a lattice L = (L, ≤) such that P
is itself a lattice.
(a) Give an example of such P and L such that P is not an induced
poset.
(b) Give an example of such P and L such that P is an induced poset
but is still not a sublattice.
[Hint: The examples requested need not have more than five elements. In
the second example, the joins x ∨P y and meets x ∧P y in P somehow don’t
coincide with those in L.]
Exercise 22. Show that any multiplicatively closed subset M of the positive
real numbers R+ is a totally ordered cancellation monoid with respect to
multiplication and the order induced by the natural ordering of the reals.
In particular, any multiplicatively closed subset of the positive integers is a
totally ordered monoid with cancellation. Show that the set S of integers
which are the sum of two integer squares (at least one of which is not zero)
is such a set. [Later on, the reader might be interested to know that the set
Sp of numbers that can be the number of p-Sylow subgroups of some finite
group is also multiplicatively closed. This is not always all positive integers
congruent to one modulo the prime p (also multiplicatively closed) since it
will emerge that no finite group has 22 7-Sylow subgroups.]
Exercise 23. Let (M1, ≤) and (M2, ≤) be two partially ordered monoids. Define a binary operation on the Cartesian product M1 × M2 by coordinatewise multiplication: that is,

(a1, a2) ◦ (b1, b2) := (a1 b1, a2 b2).

Define a partial ordering on M1 × M2 by the rule:

(a1, a2) ≤ (b1, b2) if and only if both a1 ≤ b1 and a2 ≤ b2.

Show that with respect to these definitions of product and order, M1 × M2 is also a partially ordered monoid. Moreover, show that if both M1 and M2 are partially ordered cancellation monoids, then so is M1 × M2.
Exercise 24. Regarding Z+ as a totally ordered monoid, show that

Z+ × Z+ × · · · × Z+ (k factors)

is the commutative monoid of multisets over k objects. One distribution of objects Σ ai Oi is less than or equal to another Σ bi Oi if and only if integer ai is less than or equal to integer bi for each i. That is, the inventories of the second distribution are at least as large as those of the first in every category. Show how this result extends to infinite products. Conclude that the monoid of multisets, with this partial ordering, is a partially ordered cancellation monoid.
Exercise 25. Show that the product L1 × L2 of two modular lattices is modular. If the lattices Lσ are modular and possess a “zero”, does the modularity extend to their direct sum?
2.9 Dependence Theories

2.9.1 Introduction
If we say A “depends” on B, what does this mean? In ordinary language one
might intend several things: “A depends entirely on B” or that “A depends
just a little on B” – a statement so mild that it might suggest only that B
has a “slight influence” on A. But some syntax seems to be applied all the
same: thus if we say that A depends on B and that B in turn depends on
C, then A (we suppose to some small degree) depends on C.
In mathematics we also usually use the word “depends” in both senses.
When one asserts that f (x, y) depends (to some degree) upon the variable x,
one means only that f might be influenced by x. After all, it could be that f
is a “constant” function as far as x is concerned. But on the other hand the
mathematician also intends that f is entirely determined by the pair (x, y).
Thus we may consider the phrase “f(x, y) depends on x” as one borrowed from ordinary everyday speech. In its various guises, the stronger idea of total and entire dependence appears throughout mathematics with such a common strain of syntactical features as to deserve codification.
But as you will see, the theory here is highly special.
2.9.2 Dependence
Fix a set S. Let F(S) be the set of all finite subsets of the set S. A dependence relation is a relation “depends” on the Cartesian product S × F(S) such that

1. (The Reflexive Condition) Any element s of S depends upon any finite set which contains it as a member.

2. (The Transitivity Condition) If element x depends on the finite set A and every member of A depends on the finite set B, then x depends on B.

3. (The Exchange Condition) If z, a1, . . . , an are all elements of S such that z depends on the set {a1, . . . , an} but does not depend on the set {a1, . . . , an−1}, then an depends on the set {a1, . . . , an−1, z}.
Before concluding anything from this, let us consider some examples.
Example 18. (Linear dependence). Suppose V is a vector space over some field or division ring D. Let us say that a vector v depends on a finite set of vectors {v1, . . . , vn} if and only if v can be expressed as a linear combination of those vectors – that is, Σ_{i=1}^{n} δi vi = v for some choice of δi in D. One checks that this defines a dependence relation on V × F(V).
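As a concrete instance, here is a sketch in Python (ours, for illustration) testing linear dependence over the field GF(2), where vectors are encoded as bit-vectors and row reduction is just XOR:

    def depends_gf2(v, vectors):
        """Return True iff bit-vector v is a GF(2) linear combination of `vectors`."""
        basis = []
        for w in vectors:            # build a row-echelon basis by XOR elimination
            for b in basis:
                w = min(w, w ^ b)    # reduce w by b when that clears w's top bit
            if w:
                basis.append(w)
                basis.sort(reverse=True)
        for b in basis:              # now reduce v against the basis
            v = min(v, v ^ b)
        return v == 0

    # vectors encoded as integers, e.g. 0b101 = (1, 0, 1)
    assert depends_gf2(0b110, [0b100, 0b010])      # (1,1,0) = (1,0,0) + (0,1,0)
    assert not depends_gf2(0b001, [0b100, 0b110])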
Example 19. (Algebraic dependence) Let K be a field containing k as a subfield. (One might let K and k respectively be the complex and rational number fields.) We say that an element b of K depends on a finite subset X = {a1, . . . , an} of K if and only if b is the root of a polynomial equation whose coefficients are expressible as k-polynomial expressions in X. This means that there exists a polynomial p(x, x1, . . . , xn) in the polynomial ring k[x, x1, . . . , xn] which, when evaluated at x = b, xi = ai, 1 ≤ i ≤ n, yields 0. We shall see in Chapter 10 that all three axioms of a dependence theory hold for this definition of dependence among elements of the field K.
2.9.3 The extended definition
Our first task will be to extend the definition of “depends” from being a relation on S × F(S) to a relation on S × 2^S – so that it will be possible to speak of element x depending on a (possibly infinite) subset A of S. For any subset A of S, we say that element x depends on A if and only if there exists a finite subset A1 of A such that x depends on A1 (in the old sense). We leave it as an exercise for the reader to prove the following:
Lemma 38 In the following the sets A and B may be infinite sets:
1. Element x of S depends on any set A which contains it.
2. If A and B are subsets of S, if x depends on A and every element of
A depends on set B, then x depends on B.
3. If a ∈ A, a subset of S, and x is an element of S which depends on A but does not depend on A − {a}, then a depends on (A − {a}) ∪ {x}.
2.9.4 Independence and spanning.
Now let A be any subset of S. The flat generated by A is the set ⟨A⟩ := {y | y depends on A}. A subset X of S is called a spanning set if and only if ⟨X⟩ = S.

Next we say that a subset Y of S is independent if and only if, for each element y of Y, y does not depend on Y − {y}.

Now the collections of all spanning sets and all independent sets each form a partially ordered set under the containment relation. Thus it makes perfect sense to speak of a minimal spanning set and a maximal independent set (should such sets exist). We wish to show that these two concepts coincide.
Theorem 39
1. Every minimal spanning set is a maximal independent set.
2. Every maximal independent set is a minimal spanning set.
Proof: We prove 1. Suppose U is a minimal spanning set. If U is not independent, then there exists an element u in U such that u depends on a finite subset U1 of U − {u}. Thus every element of S depends on U and, by Lemma 38, part 1, and what we know of u, every element of U depends on U − {u}. Thus by Lemma 38, part 2, every element of S depends on U − {u}. Thus U − {u} is a spanning set, against the minimality of U. Thus U must be independent.

If it were not a maximal independent set, there would exist an element z in S − U which did not depend on U. But that is impossible as U spans S.

Next we show 2. If X is a maximal independent set, then by maximality, X spans S. If a proper subset X1 of X also spanned S, then any element x of X − X1 would depend on X1 and so would depend on X − {x} by the definition of dependence. But this contradicts the independence of X. Thus X is actually minimal as a spanning set. The proof is complete.
Theorem 40 Maximal independent subsets of S exist.
Proof: This is a straightforward application of Zorn’s Lemma. As remarked above, the collection J of all independent subsets of S forms a partially ordered set (J, ≤) under the inclusion relation.^21 Now consider any chain C = {Jα} of independent sets and form their union C̄ := ∪α Jα. If C̄ were not an independent set, there would exist an element x in C̄ and a finite subset F of C̄ − {x} such that x depends on F. But since the chain C is totally ordered and F ∪ {x} is finite, there exists an index σ such that F ∪ {x} ⊆ Jσ. But this contradicts the independence of Jσ. Thus we see that C̄ must be an independent set.

Thus every chain C in J possesses an upper bound C̄ in (J, ≤). By the Zorn Principle, J contains maximal elements. The proof is complete.

^21 Actually, it forms a simplicial complex, since it is an easy exercise to show that a subset of an independent set must be independent.
2.9.5 Dimension.
The purpose of this section is to establish
Theorem 41 If X and Y are two maximal independent sets, then |X| = |Y |.
Proof: By the Bernstein-Schroeder theorem, it suffices to show only that |X| ≥ |Y| for any two maximal independent sets X and Y.
The proof has a natural division into two cases: the case X is a finite set
and the case that X is infinite.
First assume X is finite. We proceed by induction on the parameter
k := |X| − |X ∩ Y |. If k = 0, then X = X ∩ Y ⊆ Y . But as Y is independent
and each of its elements depends on its subset X, we must have X = Y ,
whence |X| = |Y | and we are done. Thus we assume k > 0.
Assume X = {x1 , . . . , xn } with xi pairwise distinct elements of S and the
indexing chosen so {x1 , . . . , xr } = X ∩ Y , thereby making k = n − r. From
maximality of the independent set X, every element of Y − X depends on X.
Similarly, every element of X depends on Y . If every element of Y depended
on X0 := {x1 , . . . , xn−1 } then xn , which depends on Y , would depend on X0
by Lemma 38, part 2, above, against the independence of X. Thus there is some element y ∈ Y − X which depends on X but does not depend on X0. By the
exchange condition, xn depends on X1 := {x1 , . . . , xn−1 , y}. Moreover, if X1
were not independent, then either y depends on X1 − {y} = {x1 , . . . , xn−1 }
or some xi depends on (X0 − {xi }) ∪ {y}. The former alternative is ruled out
by our choice of y. If the latter dependence held, then as the independence of
X prevents xi from depending on X0 − {xi }, the exchange condition would
force y to depend on X0 , again contrary to the choice of y. Thus X1 is
an independent set of n distinct elements. But as xn depends on X1 , each
element of X depends on X1 . Thus as all elements of S depend on X, they
all depend on X1 as well. Thus X1 is a maximal independent set. But since
|X1 ∩ Y | = 1 + |X ∩ Y |, induction on k yields |X1 | ≥ |Y |. The result now
follows from |X| = |X1 |.
Now assume X is an infinite set. Since Y is a maximal independent set, it is a spanning set. Thus for each element x in X there is at least one finite non-empty subset Yx of Y on which it depends. Choose, if possible, y ∈ Y − ∪x∈X Yx. Since y depends on X and by construction every element of X depends on ∪x∈X Yx, we see that y depends on ∪x∈X Yx ⊆ Y, contradicting the independence of Y. Thus no such y can exist, and so Y = ∪x∈X Yx. Now by Corollary 5,

|∪x∈X Yx| ≤ |X|,

so |Y| ≤ |X| as required. The proof is complete.
This common cardinality of maximal independent sets is called the dimension of the dependence system S.
2.9.6 Matroids
We present here another view of dependence theory. At the same time, the development in this section illustrates many of the principles and defined structures encountered earlier in this chapter.

Fix a set X and let I be a family of subsets of X. The pair (X, I) is called a matroid if and only if the following axioms hold:

(M1) The family I is closed under taking subsets.

(M2) (Exchange axiom) If A, B ∈ I with |A| > |B|, then there exists an element a ∈ A − B such that {a} ∪ B ∈ I.
(M3) If every finite subset of a set A belongs to I, then A belongs to I.
Note that if A and B are maximal elements of (I, ⊆), then |A| = |B| is
an immediate consequence of axiom (M2). However, from axioms (M1) and
(M2) alone, it does not follow that maximal elements even exist. Consider
the following
Example 20.Let X be be the set of natural numbers and let I be all finite
subsets of X. Then certainly I is closed under taking subsets, so (M1) holds.
Also if A and B are finite sets with |A| > |B| there is certainly an element
a ∈ A − B such that {a} ∪ B is finite, Thus axiom (M2) holds.
But now it should be clear that I contains no maximal elements at all.
However, axiom (M3) avoids this problem.
Lemma 42 If (X, I) is a matroid, then every element of I lies in a maximal element of I.

Proof: This proof utilizes Zorn’s lemma. Let A0 be an arbitrary member of I. We examine the poset of all subsets in I which contain A0. Let {Aα} be a chain in this poset and let B be the union of all the sets Aα. Consider any finite subset F of B. Each element f ∈ F lies in some member Aα(f) of the chain; since F is finite and the chain is totally ordered, F lies in a single member Am of the chain. Since Am ∈ I, (M1) implies F ∈ I. But since F was an arbitrary finite subset of B, (M3) shows that B ∈ I. Thus every chain in the poset of elements of I which contain A0 has an upper bound in that poset, so that poset contains a maximal element M. Clearly M is a maximal member of I, since any member of I which contains M also contains A0, and so lies in the poset of which M is a maximal member. Thus A0 lies in a maximal member of I, as required.
Example 21. Let V be a vector space and let I be all linearly independent subsets of V. Then axioms (M1) through (M3) hold, so (V, I) is a matroid.
Example 22. Suppose (V, E) is a simple graph with vertex set V and
edge set E. (The adjective “simple” just means that edges are just certain unordered pairs of distinct vertices.) A cycle is a sequence of edges
(e0 , e1 , . . . , en ) such that ei shares just one vertex with ei+1 , indices i taken
modulo n. Thus en−1 shares a vertex with en = e0 , and the “length” of the
cycle, n, cannot be one or two. A graph with no cycles is called a forest –
its connected components are called trees. Now let I be the collection of
all subsets A of E such that the graph (V, A) is a forest. Then (V, I) is a
matroid.
Example 23. Let F be a fixed family of subsets of a set X. A finite subset {x1, x2, . . . , xn} of X is said to be a partial transversal of F if and only if there are pairwise distinct subsets A1, . . . , An of F such that xi ∈ Ai, i = 1, . . . , n — that is, the set {x1, . . . , xn} is a “system of distinct representatives” of the sets {Ai}. Now let I be the collection of all subsets of X all of whose finite subsets are partial transversals. Then (X, I) is a matroid.
Example 24. Suppose (X, I) is any given matroid, and Y is any subset of X. Let I(Y) be all members of I which happen to be contained in the subset Y. Then it is straightforward that the axioms (M1), (M2) and (M3) all hold for (Y, I(Y)). The matroid (Y, I(Y)) is called the induced matroid on Y.
2.9.7 The rank function
For any matroid M = (X, I), a maximal element of I is called a basis of
M . We have seen that maximal elements of I exist and that any two of them
have the same cardinality, which we denote by ρ(M ). But also any subset Y
of X can be regarded as an induced matroid Y = (Y, I(Y )) as in Example
24, and it too has a rank ρ(Y ) – the cardinality of a maximal member of
I(Y ) (basis of Y ). Thus rank is a function
ρ : subsets of X → cardinal numbers .
This is a useful function, and before going onward, we collect a few of its
properties.
Lemma 43 The rank function is monotone, that is, if A ⊆ B ⊆ X then
ρ(A) ≤ ρ(B) (as inequality is understood among cardinal numbers).
This is immediate from the fact that a maximal element of I(A) lies in
some maximal element of I(B).
Lemma 44 Suppose x and y are elements of a matroid M = (X, I) and B is a finite subset of X. Then

ρ({x} ∪ B) = ρ({y} ∪ B) = ρ(B)

implies

ρ({x, y} ∪ B) = ρ(B).

Proof: Let F be a maximal element of I({x, y} ∪ B), and suppose that

|F| = ρ({x, y} ∪ B) > ρ(B).

Then by (M2), for any basis B0 for B there exists an element z ∈ F − B0 such that {z} ∪ B0 ∈ I. Now if z were in B, the maximality of B0 in I(B) would be in peril. So z is x or y. Suppose z = x ∉ B. Then as B0 is finite and {x} ∪ B0 ∈ I, we would have

ρ({x} ∪ B0) ≤ ρ({x} ∪ B) = ρ(B),

which says

|{x} ∪ B0| = |B0|,

which is absurd for these finite sets. By symmetry, the alternative that z = y leads to a similar absurdity. Thus the assumption that |F| exceeds the rank of B is untenable. The monotonicity of the rank function shows that ρ(B) ≤ |F|, so we must have equality.
Corollary 45 Suppose A and B are finite subsets of X, and for each element
a in A, ρ({a} ∪ B) = ρ(B). Then
ρ(A ∪ B) = ρ(B).
Proof: This just follows by iterating the lemma through the finitely many
elements of A.
Theorem 46 Let ρ be the rank function of the matroid M = (X, I). Then it satisfies these three rules:

(R1) For each subset Y of X, 0 ≤ ρ(Y) ≤ |Y|.

(R2) (Monotonicity) If Y1 ⊆ Y2 ⊆ X, then ρ(Y1) ≤ ρ(Y2).

(R3) (The submodular inequality) If A and B are subsets of X, then

ρ(A ∪ B) + ρ(A ∩ B) ≤ ρ(A) + ρ(B).

Proof: The first two statements follow immediately from the definition of the rank function. So we need only prove the submodular inequality.

Let A and B be two subsets of X. Let F be a basis for A ∩ B, and let G be a basis of A ∪ B containing F. Then G ∩ A and G ∩ B belong to I(A) and I(B), respectively, so

|G ∩ A| ≤ ρ(A) and |G ∩ B| ≤ ρ(B).

Then

ρ(A) + ρ(B) ≥ |G ∩ A| + |G ∩ B|
            = |(G ∩ A) ∪ (G ∩ B)| + |(G ∩ A) ∩ (G ∩ B)|
            = |G ∩ (A ∪ B)| + |G ∩ A ∩ B|
            = |G| + |F|
            = ρ(A ∪ B) + ρ(A ∩ B).

The proof is complete.
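Rules (R1)–(R3) are easy to confirm by brute force for a small matroid. Below, a Python sketch (ours) for the uniform matroid U_{2,4} on X = {0, 1, 2, 3}, whose independent sets are the subsets of size at most 2, so that ρ(Y) = min(|Y|, 2):

    from itertools import combinations

    X = frozenset(range(4))
    subsets = [frozenset(c) for r in range(5) for c in combinations(X, r)]
    rho = lambda Y: min(len(Y), 2)     # rank in the uniform matroid U_{2,4}

    for A in subsets:
        assert 0 <= rho(A) <= len(A)                           # (R1)
        for B in subsets:
            if A <= B:
                assert rho(A) <= rho(B)                        # (R2)
            assert rho(A | B) + rho(A & B) <= rho(A) + rho(B)  # (R3)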
The properties (R1) – (R3) actually characterize matroids.
Theorem 47 Suppose r is a map from the set of all subsets of a set X into the cardinal numbers, satisfying the three rules (R1), (R2), (R3). Let I be the collection of all subsets Y of X with the property

(*) For every finite subset FY of Y we have |FY| = r(FY).

Then (X, I) is a matroid.
2.9.8 Dependence theories and matroids are the same thing
For any matroid (X, I), the subsets in I are usually called independent sets
– a name we have avoided up to now, since independent sets have already
been defined in the context of a dependence theory. This suggestive name is
in fact justified by the following theorem:
Theorem 48 Let X be a set and let F(X) be all finite subsets of X. Let D ⊆ X × F(X) be a dependence theory in the sense of the previous subsections. (We say “x depends on A” if x D A.) Let I be the collection of all subsets Y of X independent with respect to D.

Then (X, I) is a matroid.
Proof: Recall that in our dependence theory any (possibly infinite) subset Y of X is said to be an “independent set” if and only if:

For each element y ∈ Y, y does not depend on any finite subset of Y − {y}.

It is immediate that a subset of an independent set is independent. Equally immediate from this definition is the fact that a subset is independent if and only if every finite subset of it is independent. So (M1) and (M3) hold.

Now suppose A and B are independent subsets of X with |A| > |B|. Then in particular A − B is non-empty. Now assume by way of contradiction that the conclusion of (M2) were false. Then for every element a ∈ A − B, {a} ∪ B is not independent. Since B is independent by hypothesis, the Exchange Axiom forces a to depend on B. Thus every element of A − B depends on B. But also, every element of A ∩ B depends on B, simply by the Reflexive Property. Thus A ⊆ ⟨B⟩, the flat spanned by B. Since B is an independent spanning set for ⟨B⟩, B is a basis for ⟨B⟩. But as A is also an independent subset of ⟨B⟩, it lies in a maximal independent subset A1 of ⟨B⟩. By Theorem 41, |A| ≤ |A1| = |B|. This contradicts |A| > |B|. The proof is complete.
Now suppose we are given a matroid (X, I). Can we recover a dependence theory from this matroid? Clearly we need to have a definition of
“dependence”. Consider this definition:
Definition Let x be an element of X and let A be a finite subset of X. We
say that x depends on A if and only if there exists a subset A0 of A
which lies in I, such that either (i) x ∈ A0 , or (ii) {x} ∪ A0 is not in I.
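Rendered computationally (a hypothetical Python transcription, with the matroid supplied as an independence test), the definition reads as follows:

    from itertools import chain, combinations

    def subsets(A):
        A = list(A)
        return chain.from_iterable(combinations(A, k) for k in range(len(A) + 1))

    def depends(x, A, independent):
        # x depends on the finite set A if some A0 in I with A0 contained in A
        # satisfies: x in A0, or A0 together with x is not in I.
        for A0 in map(frozenset, subsets(A)):
            if independent(A0) and (x in A0 or not independent(A0 | {x})):
                return True
        return False

    # In U_{2,4} (independent = "at most 2 elements"): 3 depends on {0, 1}
    # but not on {0}.
    indep = lambda S: len(S) <= 2
    print(depends(3, {0, 1}, indep), depends(3, {0}, indep))   # True False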
One consequence of this definition is
Lemma 49 If B is a finite set, x depends on B if and only if
ρ({x} ∪ B) = ρ(B).    (2.42)
Proof: Suppose x depends on B. Then there exists a finite set B0 ∈ I(B)
such that x ∈ B0 or {x} ∪ B0 ∉ I. If x ∈ B0 ⊆ B, then ρ({x} ∪ B) = ρ(B) at
once, so we may assume {x} ∪ B0 ∉ I. Suppose by way of contradiction that
ρ({x} ∪ B) > ρ(B). Then every basis of {x} ∪ B contains x. Since B0 is in
I(B) it extends to a basis B′ of {x} ∪ B. But, as remarked, x ∈ B′, whence
(M1) forces {x} ∪ B0 ∈ I, a contradiction. Thus ρ({x} ∪ B) = ρ(B).
Now we must prove the converse statement. Suppose B is a finite set
such that ρ({x} ∪ B) = ρ(B). We must show that x depends on B.
Suppose first that x ∈ B. If {x} ∉ I, set Z = ∅ ∈ I, and if {x} ∈ I, set
Z = {x}. Then in either case, Z ∈ I(B), while x ∈ Z or {x} ∪ Z ∉ I. Thus
x depends on B, by our definition. Thus we may assume that x is not in B.
Let B0 be a maximal member of I(B), that is a basis for B. Now as x
is not in B, and B is finite, {x} ∪ B0 ∈ I would imply that ρ({x} ∪ B) >
|B0 | = ρ(B), contrary to assumption. Thus {x} ∪ B0 ∉ I, while B0 ∈ I, so
x depends on B, according to our definition.
Theorem 50 Given a matroid (X, I), let the relation of “dependence” between elements of X and finite subsets of X be given as in the preceding
definition. Then this notion satisfies the axioms of a dependence theory.
Proof: First suppose x ∈ A. Then ρ({x} ∪ A) = ρ(A) and so, as A is finite,
Lemma 49 shows that x depends on A. Thus the Reflexive Property holds.
The Transitivity Property: Suppose x depends on the finite set A, and
that every element of A depends on the set B. We must establish the
conclusion that x depends on B. If x ∈ A, the conclusion is already part of
the hypothesis. If x ∈ B, the conclusion follows from the reflexive property
already established. So without loss of generality, assume x ∉ A ∪ B.
Now for each element a ∈ A, there exists a finite subset Ba ∈ I(B), such
that {a} ∪ Ba ∉ I. Let B0 be the finite union of these finite sets, that is
B0 := ∪a∈A Ba .
Then by Lemma 49 it is correct to say that
ρ({a} ∪ B0 ) = ρ(B0 ) for each a ∈ A,
as well as
ρ({x} ∪ A) = ρ(A).
Since both A and B0 are finite, Corollary 45 shows
ρ(A ∪ B0 ) = ρ(B0 ).
Now, as x depends on A, x depends on the finite set A ∪ B0 . So
ρ({x} ∪ A ∪ B0 ) = ρ(A ∪ B0 ).
It follows that
ρ({x} ∪ B0 ) ≤ ρ({x} ∪ A ∪ B0 ) = ρ(A ∪ B0 ) = ρ(B0 ),
and monotonicity gives the reverse inequality, so ρ({x} ∪ B0 ) = ρ(B0 ).
Thus by Lemma 49, x depends on B0 and so x depends on B.
The Exchange Condition: Suppose B = {b1 , . . . , bn } and B′ := B − {bn },
and the element x depends on B but not on B′. Then
ρ({x} ∪ B) = ρ(B)
while
ρ({x} ∪ B′) = 1 + ρ(B′).
Now suppose ρ(B′) = ρ(B). Then
ρ({x} ∪ B′) ≤ ρ({x} ∪ B) = ρ(B) = ρ(B′),
from which Lemma 49 would imply that x depends on B′, against the hypothesis. Thus we have
1 + ρ(B′) = ρ(B) = ρ({x} ∪ B′).    (2.43)
Now everything can be assembled:
ρ({bn } ∪ ({x} ∪ B′)) = ρ({x} ∪ B)
= ρ(B)
= ρ({x} ∪ B′).
From the equality of the first and last terms and Lemma 49 applied to the
finite set {x} ∪ B′, we see that bn depends on {x} ∪ B′. That, of course, is
the desired conclusion of the exchange condition.
2.9.9
Other formulations of dependence
There are purely combinatorial ways to express the gist of a dependence
theory, and these are the various formulations of the notion of a matroid in
terms of flats, in terms of axioms on circles, or in terms of closure operators.
Let's consider flats for a moment. We have already defined them from
the point of view of a dependence theory. From a matroid point of view, the
flat hAi spanned by a subset A in the matroid (X, I) is the set of all elements x
for which {x} ∪ Ax ∉ I for some finite subset Ax of A. It is an easy exercise
to show that hhAii = hAi, for all subsets A of X. In fact
Theorem 51 The mapping τ which sends each subset A of X to the flat hAi
spanned by A, is a closure operator on the lattice of all subsets of X. The
image sets – or “closed” sets – form a lattice:
1. The intersection of two closed sets is closed, and is the meet in this
lattice.
2. The closure of the set-theoretic union of two closed sets is their join, that is, the
global minimum in the poset of all flats containing the two sets.
The reader should be able to prove this.
Proposition 52 Suppose P = P (X) is the poset of all subsets of a set X
and τ : P → P satisfies
(Increasing) For each subset A, A ⊆ τ (A).
(Closure) The mapping τ is a closure operator – that is, it is an idempotent
monotone mapping of a poset into itself.
(The Steinitz-MacLane Exchange Property) If A ⊆ X and y, z ∈ X − τ (A),
then y ∈ τ (A ∪ {z}) implies z ∈ τ (A ∪ {y}).
Then, setting
I = {A ⊆ X | x ∉ τ (A − {x}) for all x ∈ A},
we have that M := (X, I) is a matroid.
Conversely, if (X, I) is a matroid, then the mapping τ which takes each
subset A of X to the flat hAi which it spans, is a monotone increasing closure
operator P → P possessing the Steinitz-MacLane Exchange Property.
2.9.10
Exercises on Matroids
Exercise 24. Prove the three parts of Lemma 38 for the extended definition
of dependence allowing infinite subsets of S.
Exercise 25. Verify the axioms of a matroid for Example 22, on the edge-set
of a graph.
Exercise 26. Verify the axioms of a matroid for Example 23, on finite partial
transversals.
Exercise 27. Let Γ = (V, E) be a simple graph. Let M = (E, I) be the
matroid of Example 22. For any subset F of the edge set E, let
(V, F ) = ∪σ∈K (Vσ , Fσ )
be a decomposition of Γ into connected components. For each connected
component (Vσ , Fσ ), let Eσ be the collection of edges connecting two vertices
in Vσ . Thus (Vσ , Eσ ) is the subgraph induced on the vertex set Vσ . Prove
that in the matroid, the flat spanned by F is the union ∪σ∈K Eσ .
Exercise 28. Let M = (X, I) be a matroid. A set which is minimal with
respect to not lying in I is called a circuit. Show that any circuit is finite.
[Remark: There is also a characterization of matroids by circuits. See Oxley,
Prop. 1.3.10.]
Chapter 3
Review of Elementary Group
Properties
3.1
Introduction
In mathematics we often know various ways to construct something by inputting some data D to some procedure P . If we wish to classify these same
somethings up to isomorphism, we are faced with two problems: Does the
construction procedure P give all such objects? Can I tell when two distinct
data, D1 and D2 , yield isomorphic objects when input to P ? If the answer
to the first question is “No” then P is not an interesting procedure, and one
might think of discarding it. But if the answer to the first question is “yes ”
– or even almost “yes” – the second question becomes interesting. But this
question comes down to all the ways some one object might be “realized”
as the consequence of datum D and procedure P . Often it happens that
each way to do this is obtained by first applying an automorphism to the
something and then performing a “canonical” realization. In that case, to
classify the somethings, we need to know about their automorphisms.
This is just one of the reasons that group theory is needed, whatever field
of mathematics you might choose to enter. There are of course many other
reasons having to do with special uses of groups, such as “Polya counting”1 ,
quantum physics, invariant theory etc.). But overall, the need to classify
things seems the broadest reason that group theory is pertinent to all of
mathematics.
1 which has a long pre-Polya history.
Groups are defined by a hallowed set of axioms which are useful only
for the purpose of banishing all ambiguities from the subject. Somehow,
staring at the axioms of a group mesmerizes one away from their basic and
natural habitat. Every group that ever exists in the world is in fact the full
group of symmetries of something. That is what the theory of groups is: the
study of symmetries. But it is useful to know this only on a philosophical
plane. Knowing that there is a set of objects such that every group – or
even every finite group – is the full group of automorphisms of at least one
of these objects, is not very helpful for classifying groups. Moreover it is
evidence that the converse problem of classifying the objects which exhibit
these groups is not only not brought nearer to a classification; it points in
the opposite direction: these objects probably can't be classified.2
3.2
Definition and examples.
This is probably a good place to explain a convention. A group is a set G
with a binary operation with certain properties – of which there are a good
many examples. But familiar examples breed familiar customs. In some of
these examples the binary operation is written as “+”, in others as “◦” or
“×”. How then does one speak generally about all possible groups? The
standard solution is this: The group operation of an arbitrary group, will
be indicated by simple juxtaposition, the act of writing one symbol directly
after the other – thus, the group operation, applied to the ordered pair (x, y)
2 I hope I will not belabor the point in recalling the day that a colleague of mine
announced to me that he had just proved that for any finite group G, there existed a
projective plane P whose automorphism group was G. I know that I disappointed my
colleague by my lack of enthusiasm for the theorem – as much as I wished to encourage
him. The problem is that this just shows how hopeless the subject of projective planes
is. Every single projective plane he could exhibit whose automorphism group was a finite
group G was in fact an infinite projective plane. Far from being the tight object which
everyone had imagined it was, projective planes (especially the infinite ones) are such a
loose subject that every finite group can be presented as their automorphism group. This
is a theorem stating how bad the subject of projective planes really is. (I am sure his proof
was not easy, but it is doomed to be a non-useful theorem. My remarks apply in fact to
any Theorem of the form “For any finite group G, there exists an object X(G) in class C
(a class which has not been classified, such as finite groups or its subclass of p-groups), such
that Aut(X(G)) ≅ G”. In short, it does not advance the classification problem for groups to
construct a Zoo of objects in some equally unclassified category, exhibiting each of these
groups as their automorphism group. Enough of sermons!
will be denoted xy.
A group is a set G equipped with a binary operation G×G → G (denoted
here by juxtaposition) such that the following statements hold:
1. (The Associative Law) The binary operation is associative – that is
(ab)c = a(bc) for all elements a, b, c of G. (The parentheses indicate
the temporal order in which operations were performed. This axiom
more-or-less says that the temporal order of applying the group operations doesn’t matter, while, of course, the order of the elements being
operated on does matter.)
2. (Identity Element) There exists an element, say e, such that eg = g =
ge for all elements g in G.
3. (The Existence of Inverses) For each element x in G, there exists an
element x′ such that x′ x = xx′ = e, where e is the element referred to
in Axiom 2.3
One immediately deduces the following
Lemma 53 For any group G one has:
1. The identity element e is the unique element of G possessing the property of Axiom 2.
2. Given an element g in G, there is at most one element g′ such that gg′ =
g′ g = e.
3. In light of the preceeding item 2, we may denote the unique inverse of
g by the symbol g −1 . Then note that
(i) (ab)−1 = (b−1 )(a−1 ) for all elements a, b ∈ G.
(ii) (a−1 )−1 = a, for each element a ∈ G.
(iii) For any element x and natural number k, the product xx · · · x
(with k factors) is a unique element which we denote as xk . We
have (xk )−1 = (x−1 )k . (As a notational convention, this element
is also written as x−k .)
3 People have figured out how to “improve” these axioms, by hypothesizing only right
identities and left inverses and so on. There is no attempt here to make these axioms
independent or logically neat – indeed they overstate their case. Yet, a little redundancy
won’t hurt at this stage.
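Before turning to examples, note that in small cases the axioms can be checked by sheer enumeration. A minimal Python sketch, with the group (Z/6Z, +) chosen purely for illustration:

    n = 6
    G = range(n)
    op = lambda a, b: (a + b) % n        # the group operation

    # Associativity: (ab)c = a(bc) for all a, b, c.
    assert all(op(op(a, b), c) == op(a, op(b, c))
               for a in G for b in G for c in G)

    # Identity: e = 0 satisfies eg = g = ge for all g.
    e = 0
    assert all(op(e, g) == g == op(g, e) for g in G)

    # Inverses: every x admits a y with xy = yx = e.
    assert all(any(op(x, y) == e == op(y, x) for y in G) for x in G)
    print("(Z/6Z, +) satisfies the three group axioms")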
Example 1. Familiar Examples:
(a) The group (Z, +) of integers under the operation of addition.
(b) The group of non-zero rational numbers under multiplication.
(c) The group of non-zero complex numbers under multiplication.
(d) The cyclic group of order n. This can be thought of as the group
of rotations of a regular n-gon. This group consists of the elements
{1, x, x2 , . . . , xn−1 }, where x is a clockwise rotation of 2π/n radians and
“1” is the identity transformation. All multiplications of the powers of
x can be deduced solely from the algebraic identity xn = 1.
(e) The dihedral group of order 2n. This is the group of all rigid
symmetries of the regular n-gon. In addition to the group of n rotations
just considered, it also contains n reflections which are symmetries of
the n-gon. If n is an odd number, these reflections are about an axis
through a corner or vertex of the polygon and bisecting an opposite side.
If n is even, the axes of the reflections are of two sorts: those which go
from a vertex through its opposite vertex, and those which bisect a pair
of opposite sides. There are then n/2 of each type. The elements of the
group all have the form ti xj , where t is any reflection, x is the clockwise
rotation by 2π/n radians and 0 ≤ t ≤ 1 and 0 ≤ j ≤ n − 1. The results
of all multiplications of such elements can be deduced entirely from the
relations xn = 1, t2 = 1, tx = x−1 t (and its consequence, tx−1 = xt).
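The relations really do determine every product. A small Python sketch (illustrative only) encodes the element ti xj as the pair (i, j) and multiplies by the rules xn = 1, t2 = 1, and xt = tx−1 (a restatement of tx = x−1 t):

    n = 5                       # the regular pentagon, chosen only for illustration

    def mult(g, h):
        # g = (i, j) encodes t^i x^j; the relations give
        # (t^i x^j)(t^k x^l) = t^(i+k) x^(l + j(-1)^k).
        (i, j), (k, l) = g, h
        return ((i + k) % 2, (l + j * (-1) ** k) % n)

    D = [(i, j) for i in range(2) for j in range(n)]
    t, x = (1, 0), (0, 1)

    assert len(D) == 2 * n                        # the dihedral group has order 2n
    assert mult(t, t) == (0, 0)                   # t^2 = 1
    assert mult(t, x) == mult((0, n - 1), t)      # tx = x^(-1) t
    assert all(mult(mult(a, b), c) == mult(a, mult(b, c))
               for a in D for b in D for c in D)  # associativity, exhaustively
    print("the relations define a group of order", 2 * n)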
(f) The symmetric groups. Let X be a set. A permutation of the
elements of X is a bijective mapping X → X. This class of mappings
is closed under composition of mappings, inverses exists, and it is easy
to verify that they form a group with the identity mapping 1X which
takes each element to itself, as the group identity element. This group
is called the symmetric group on X and is denoted Sym(X). If
|X| = n it is well known from elementary counting principles that
there are exactly n! permutations of X. In this case one writes Sym(n)
for Sym(X), since the names or identities of the elements of X do not
really affect the nature of the group of all permutations.
(g) The group of rigid motions of a (regular) cube. Just imagine a wooden
cube on the desk before you. We consider the ways that cube can be
rotated so that it achieves a position congruent to the original one. The
result of doing one rotation after another is still some sort of rotation.
It should be clear that these rotations can have axes which are situated
in three different ways with respect to the cube. The axis of rotation
may pass through the centers of opposite faces, it may pass through the
midpoints of opposite edges, or it could pass through opposite
vertices.
(h) Let G = (V, E) be a simple graph. This means the edges E are pairwise
distinct 2-subsets of the vertex set V . Two (distinct) vertices are said
to be adjacent if and only if they are the elements of an edge. Now the
group of automorphisms of the graph G is the set of permutations
of the set of vertices V which preserve the adjacency relation. The
group operation is composition of the permutations.
(i) Let R be any ring with identity element e.4 An element u of R is said to
be a unit if and only if there exists a two-sided multiplicative inverse
in R – that is, an element u0 such that u0 u = e = uu0 . (Observe in this
case, that u0 itself must be a unit.) Then the set U (R) of all units of
ring R forms a group under ring multiplication. Some specific cases:
(a) The group U (Z) of units of the ring of integers Z, is the set of
numbers {+1, −1} under multiplication. Clearly this group behaves just like the cyclic group of order 2, one of those introduced
in Example (d) above.
(b) Let D = Z ⊕ Zi, where i2 = −1, the ring of Gaussian integers
{a + bi|a, b ∈ Z} as a multiplicatively and additively closed subset
(that is a sub-ring) of the ring of all complex numbers. Then the
reader may check that U (D) is the set {±1, ±i} under multiplication. This is virtually identical with the cyclic group of order four
as defined in Example (d).
(c) Let V be a left vector space over a division ring D. Let hom(V, V )
be the collection of all linear transformations f : V → V (viewed
as right operators on the set of vectors V ). This is an additive
4 It is presumed that the reader has met rings before. Not much beyond the usual
definitions are presumed here.
group under the operation “+” where, for all f, g ∈ hom(V, V ),
(f + g) is the linear transformation which takes each vector v to
vf + vg. With composition of such transformations as “multiplication”, the set hom(V, V ) becomes a ring.5 Now the group of
units of hom(V, V ) would be the set GL(V ) of all linear transformations t : V → V where t is bijective on the set of vectors (i.e., a
permutation of the set of vectors). (V is not assumed to be finite
or even finite-dimensional in this example.) The group GL(V ) is
called the general linear group of V .
(j) The group GL(n, F ). Let F be a field, and let G = GL(n, F ) be
the set of all invertible n × n matrices with entries in F . 6 This set is
closed under multiplication and forms a group. In fact it is the group
of units of the ring of all n ×n matrices with respect to ordinary matrix
multiplication and matrix addition.
3.2.1
Orders of elements and groups
Suppose x is an element of a group G. We inductively define xn+1 := x(xn ).
Thus x1 = x, x2 = xx, x3 = xx2 = xxx = x2 x (because of the associative
law), and so on. If there exists a natural number n such that xn = e, then x is
said to have finite order. Of course if there is no such natural number,
then it is an easy exercise to see that the members of G represented
by the infinite set {1, x, x2 , x3 , . . .} are pairwise distinct. In this case we
say that x has infinite order. But clearly the set
Nx := {n ∈ Z|xn = e} is an ideal in the ring Z. Since Z is a principal ideal
domain with only {±1} as its group of units, it follows that there is a unique
least natural number n in Nx , which we call the order of the element
x. We denote this number by the symbol o(x). Note that because Nx is
a principal ideal generated by o(x), we see that xm = e implies that o(x)
divides m. If x has finite order, the number o(x) is called the order of the
element x.
5 Actually it is more, for hom(V, V ) can be made to have the structure of a vector space,
and hence an algebra if D possesses an anti-automorphism – e.g. when D is a field. But
we can get into this later.
6 Recall from your favorite undergraduate linear algebra or matrix theory course that
if n is a natural number, then a matrix M has a right inverse if and only if it has a left
inverse (this is a consequence of the equality of row and column ranks).
Elements of order 2 play a special role in finite group theory, and so are
given a special name: any element of order two is called an involution.
The order of a group G is the number |G| of elements within it.
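Computing o(x) mechanically is straightforward. A small Python sketch (the choice of (Z/12Z, +) is only for illustration; the loop would not terminate for an element of infinite order):

    def order(x, op, e):
        # o(x): the least natural number k with x^k = e
        k, p = 1, x
        while p != e:
            p = op(p, x)
            k += 1
        return k

    # Orders in the cyclic group (Z/12Z, +); note o(x) = 12/gcd(x, 12).
    add12 = lambda a, b: (a + b) % 12
    print([order(x, add12, 0) for x in range(1, 12)])
    # [12, 6, 4, 3, 12, 2, 12, 3, 4, 6, 12]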
3.2.2
Subgroups.
Let G be a group. For any two subsets X and Y of G, the symbol XY
denotes the set of all group products xy where x ranges over X and y ranges
independently over Y . (It is not immediately clear just how many group elements are produced in this way, since a single element g might be expressible
as such a product in more than one way. At least we have |XY | ≤ |X| · |Y |.)
A second convention is to write X −1 for the set of all inverses of elements
of the subset X. Thus X −1 := {x−1 |x ∈ X}. This time |X −1 | = |X|, since
the correspondence x → x−1 defines a bijection X → X −1 (using Lemma 53
3(ii) here.)
A subset H of G is said to be a subgroup of G if and only if
1. HH ⊆ H, so that by restriction H admits the group operation of G,
and
2. with respect to this operation, H is itself a group.
One easily obtains the useful
Lemma 54 (Subgroup Criterion.) For any subset X of G, the following
are equivalent:
1. X is a subgroup of G.
2. X = X −1 and XX ⊂ X.
3. XX −1 ⊆ X.
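Criterion 3 is the cheapest to test mechanically. A hedged Python sketch, using subsets of (Z/12Z, +) as the illustration (for a nonempty finite subset X, the single condition XX−1 ⊆ X suffices):

    n = 12
    op  = lambda a, b: (a + b) % n        # group operation of Z/12Z
    inv = lambda a: (-a) % n

    def is_subgroup(X):
        # the criterion X X^(-1) contained in X, for a nonempty subset X
        return bool(X) and all(op(x, inv(y)) in X for x in X for y in X)

    print(is_subgroup({0, 3, 6, 9}))      # True:  the multiples of 3
    print(is_subgroup({0, 1, 2}))         # False: 1 - 2 = 11 is missing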
A remark on notation. Whenever subset X is a subgroup of G, we write
X ≤ G instead of X ⊆ G. This is in keeping with the general practice in
abstract algebra of writing A ≤ B whenever A is a subobject of the same
algebraic species as B. For example we write this if A is an R-submodule
of the R-module B, and its special case: when A is a vector subspace of a
vector space B. Usually the context should make it clear what species of
algebraic object we are talking about. Here it is groups.
Corollary 55
1. The set-intersection over any family of subgroups of G
is a subgroup of G.
2. If A and B are subgroups of G, then AB is a subgroup of G if and only
if AB = BA (an equality of sets).
3. For any subset X of G, the set hXiG of all finite products y1 y2 · · · yn ,
n ∈ N where the yi range independently over X ∪ X −1 , is a subgroup
of G, which is contained in any subgroup of G which contains subset
X. Thus we can also write
hXiG = ∩X⊆H≤G H
where the intersection on the right is taken over all subgroups of G
which contain X.
The subgroup hXiG (which is often written hXi when the “parent” group
G is understood) is called the subgroup generated by X. As is evident
from the Corollary, it is the smallest subgroup (in the poset of all subgroups
of G) which contains set X.
Example 2. Examples of Subgroups.
(a) Let Γ(V, E) be a graph with vertex set V and edge set E, and set H :=
Aut(Γ), the group of automorphisms of the graph Γ, as in Example (h)
of groups in the previous subsection. Then H is a subgroup of Sym(V ),
the symmetric group on the vertex set.
(b) Cyclic Subgroups Let x be an element of the group G. The set
hxi := {xn |n ∈ Z} (where we adopt the convention of Lemma 53 3(iii)
that x−n = (x−1 )n ) is clearly a subgroup of G (Corollary 55). Now one
sees that the order of the element x is the (group) order of the subgroup
hxi.
(c) Let Y be a subset of X. The set of all permutations of X which
map the subset Y onto itself forms a subgroup of Sym(X) called the
stabilizer of Y . If G is any subgroup of Sym(X), then the intersection
of the stabilizer of Y with G is called the stabilizer of Y in G and
is denoted StabG (Y ). This is clearly a subgroup of G. Thus it makes
sense to speak of the stabilizer of a specified vertex in the automorphism
group Aut(Γ) of a graph Γ. (By convention one writes Stab(y) instead
of Stab({y}). Note also that all these definitions apply when G =
Sym(X), as well.)
(d) Let us return once more to the example of the group of rotations of a
regular cube (example (g) of the previous subsection). We asserted that
each non-identity rigid motion was a rotation in ordinary Euclidean 3space about an axis symmetrically placed on the cube. One can see that
the group of rotations about an axis through the center of a square face
and through the center of its opposite face, is a cyclic group generated
by a rotation y of order 4. There are 6/2 = 3 such axes: in fact, they
can be taken to be the three pair-wise orthogonal coordinate axes of
the surrounding Euclidean space. This contributes 6 elements y of
order 4 and 3 elements of order 2 (such as y 2 ). Another type of axis
extends from the midpoint of one edge to the midpoint of an opposite
edge. Rotations about such an axis form a subgroup of order two. The
generating rotation t of order two does not stabilize any face of the cube,
and so is not any of the involutions (elements of order two) stabilizing
any of the previous “face-to-face” axes. Since there are 12 edges in six
opposite pairs, these “edge-to-edge” axes contribute 6 new involutions
to the group. Finally there are 8/2 = 4 “vertex-to-vertex” axes, the
rotations about which form cyclic subgroups generated by a rotation of
order 3. Thus each of these 4 groups contribute two elements of order
three.
Thus the group of rotations of the cube contains 1 identity element,
6 elements of order four, 3 elements of order two stabilizing a face, 6
involutions not stabilizing a face, and 8 elements of order three – a
total of 24 elements.
Cosets.
Suppose H is a subgroup of a group G. If x is any element of G, we write
Hx for the product set H{x} introduced in the last subsection. Such a set
Hx is a right coset of H in G. If x and y are elements of G and H ≤ G,
then y is an element of Hx if and only if Hy = Hx. This is an easy exercise.
It follows that all the elements of G are partitioned into right cosets as
G = ∪x∈T Hx, a disjoint union of right cosets, for an appropriate T.
Here T is merely a set consisting of one element from each right coset. Such
a set is called a system of right coset representatives of H in G, or
sometimes a (right) transversal of H in G.
The set of components of this partition – that is, the set {Hx|x ∈ T } for any
transversal T – is denoted G/H. It is just the collection of all right cosets
themselves. The cardinality of this set is called the index of H in G and is
denoted [G : H].
One notes that right multiplication by element x induces a bijection H →
Hx. Thus all right cosets have the same cardinality. Since they partition all
the elements of G we have the following:
Lemma 56 (Lagrange’s Theorem)
1. If H ≤ G, then |G| = [G : H] · |H|.
2. The order of any subgroup divides the order of the group.
3. The order of any element of G divides the order of G.
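As an optional illustration, the following Python sketch lists the right cosets of a subgroup of Sym(3) (permutations encoded as tuples; compose applies its first argument first, in keeping with the right-operator convention used later) and confirms Lagrange's Theorem:

    from itertools import permutations

    def compose(p, q):
        # apply p first, then q
        return tuple(q[p[i]] for i in range(len(p)))

    G = list(permutations(range(3)))          # Sym(3), of order 3! = 6
    H = [(0, 1, 2), (1, 0, 2)]                # the subgroup generated by a transposition

    cosets = {frozenset(compose(h, x) for h in H) for x in G}
    print(len(cosets))                        # the index [G : H] = 3
    assert len(G) == len(cosets) * len(H)     # Lagrange: |G| = [G : H] |H|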
We conclude with a useful result
Lemma 57 Suppose A and B are subgroups of the finite group G. Then
|AB| · |A ∩ B| = |A| · |B|.
Proof: Consider the mapping f : A × B → AB, which maps every element
(a, b) of the Cartesian product to the group product ab. This map is surjective, and the fibre f −1 (ab) contains all pairs {(ax, x−1 b)|x ∈ A ∩ B}. Thus
|A × B| ≥ |A ∩ B| · |AB|. On the other hand, if ab = a′ b′ for (a, b) and (a′ , b′ )
in A × B, then a−1 a′ = b(b′ )−1 = x ∈ A ∩ B. But then ax = a′ , x−1 b = b′ . So
the fibres are no larger than |A ∩ B|. This gives the inequality in the other
direction.
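The counting identity is easy to confirm numerically. A sketch, written additively for two subgroups of (Z/12Z, +) chosen for this illustration:

    n = 12
    A = {x for x in range(n) if x % 2 == 0}   # a subgroup of order 6
    B = {x for x in range(n) if x % 3 == 0}   # a subgroup of order 4

    AB = {(a + b) % n for a in A for b in B}  # the product set, written additively
    assert len(AB) * len(A & B) == len(A) * len(B)   # 12 * 2 == 6 * 4
    print(len(AB), len(A & B))                       # 12 2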
3.3
Homomorphisms of Groups
3.3.1
Definitions and basic properties.
Let G and H be groups. A mapping f : G → H is called a homomorphism
of groups if and only if
f (xy) = f (x)f (y) for all elements x, y ∈ G.    (3.1)
Here, as is our convention, we have represented the group operation of both
abstract groups G and H by juxtaposition. Of course in actual practice, the
operations may already possess some other notation customary for familiar
examples.
For any subset X, we set f (X) := {f (x)|x ∈ X}. In particular, the set
f (G) is called the homomorphic image of G.
We have the usual glossary for special properties of homomorphisms.
Suppose f : G → H is a homomorphism of groups. Then
1. f is an epimorphism if f is onto – that is, f is a surjection of the
underlying sets of group elements.
2. f is an embedding of groups whenever f is an injection of the underlying set of group elements.
3. f is an isomorphism of groups if and only if f is a bijection of the
underlying set of elements – that is, f is both an embedding and an
epimorphism.
4. f is an endomorphism of G if it is a homomorphism of G into itself.
5. f is an automorphism of a group G if it is an isomorphism G → G
of G to itself.
The following is an elementary exercise.
Lemma 58 Suppose f : G → H is a homomorphism of groups. Then
1. If 1G and 1H denote the unique identity elements of G, then f (1G ) =
1H .
2. For any element x of G, f (x−1 ) = (f (x))−1 .
3. The homomorphic image f (G) is a subgroup of H.
Lemma 59 Suppose f : G → H and g : H → K are group homomorphisms.
Then the composition g ◦ f : G → K is also a homomorphism of groups.
Moreover:
1. If f and g are epimorphisms, then so is g ◦ f .
2. If f and g are both embeddings of groups, then so is g ◦ f ,
3. If f and g are both isomorphisms, then so is g ◦ f , and the inverse
mapping f −1 : H → G,
4. If f and g are both endomorphisms (i.e. G = H = K), then so is
g ◦ f , and,
5. if f and g are both automorphisms of G then so are g ◦ f and f −1 .
Thus the set of automorphisms of a group G form a group under composition
of automorphisms. (This is called the automorphism group of G and is
denoted Aut(G)).
Finally we introduce an invariant associated with every homomorphism
of groups. The kernel of the group homomorphism f : G → H is the set
ker f := {x ∈ G|f (x) = 1H }.
The beginning reader should use the subgroup criterion to verify that ker f is
a subgroup of G. If f (x) = f (y) for elements x and y in G, then xy −1 ∈ ker f ,
or equivalently, (ker f )x = (ker f )y as cosets. Thus we see
Lemma 60 The group homomorphism f : G → H is an embedding if and
only if ker f = 1, the identity subgroup of G.
We shall have more to say about kernels later.
3.3.2
Automorphisms as right operators.
As noted just above, homomorphisms of groups may be composed when the
arrangement of domains and codomains allows this. In that case we wrote
(g ◦ f )(x) for g(f (x)) – that is, f is applied first, then g.
As remarked in Chapter 1, that notation is not very convenient if composition of mappings is to reflect a binary operation on the set of mappings
itself. We have finally reached such a case. The automorphisms of a group
G themselves form a group Aut(G). To represent how the group operation is
realized by composition of the induced mappings, it is notationally convenient
to represent them in “exponential notation” and view the composition as a
composition of right operators.
(The exponential convention) If σ is an automorphism of a group G,
and g ∈ G, we rewrite σ(g) as g σ . This way, passing from the group
operation (in Aut(G) ) to composition of the automorphisms does not
entail a reversal in the order of the group arguments: We have,
xστ = (xσ )τ .7
3.3.3
Examples of homomorphisms.
Symmetries that are induced by group elements on some object X are a
great source of examples of group homomorphisms. Where possible in these
examples we write these as left operators with ordinary composition “◦” —
but we will begin to render these things in exponentional notation here and
there, to get used to it. In the next Chapter on group actions, we will be
using the exponential notation uniformly when a group acts on anything.
Example 3. Examples of homomorphisms.
(a) Suppose there is a bijection between sets X and Y . Then there is an
isomorphism Sym(X) → Sym(Y ). This just amounts to changing the
names of the objects being permuted.
(b) Let R∗ be the multiplicative group of all nonzero real numbers, and let
R+∗ be the multiplicative group of the positive real numbers. Then the
“squaring” mapping, which sends each element to its square, defines
an epimorphism R∗ → R+∗ , and an endomorphism R∗ → R∗ . The kernel is the multiplicative group consisting
of the real numbers ±1 in both cases.
(c) In any intermediate algebra course (cf. books of Dean or Herstein, for
example), one learns that complex conjugation (which sends any complex number z = a + bi to z̄ := a − bi, a and b real) is an automorphism
of the field of complex numbers. It follows that the norm mapping
N : C∗ → R+∗ from the multiplicative group of non-zero complex
7 This is part of a general scheme in which elements of some ‘abstract group’ G (with
its own multiplication) induce a group of symmetries Aut(X) of some object X so that
group multiplication is represented by composition of the automorphisms. These are called
“group actions” and are studied carefully in the next chapter.
numbers to the multiplicative group of positive real numbers, satisfies N (z1 z2 ) = N (z1 )N (z2 ) and hence is a homomorphism of groups.
The kernel of the homomorphism is the group C1 of complex numbers of norm 1 – the so-called circle group. (When one considers
that N (a + bi) = a2 + b2 , a, b ∈ R, it is not mysterious that the set
of integers which are the sum of two perfect squares is closed under
multiplication.)
(d) (Part 1.) Now consider the group of rigid rotations of the (regular)
cube. There are four diagonal axes intersecting the cube from a vertex
to its opposite vertex. These four axes intersect at the center of the
cube, which we take to be the origin of Euclidean 3-space. The angle
α formed at the origin by the intersection of any two of these axes,
satisfies cos(α) = ±1/3. Let us label these four axes 1, 2, 3, 4 in any
manner. As we rotate the cube to a new congruent position, the four
axes are permuted among themselves. Thus we have a mapping
rotations of the cube → permutations of the labeled axes
which defines a group homomorphism
rigid rotations of the cube → Sym(4),
the symmetric group on the four labels of the axes. The kernel would
be a group of rigid motions which stabilizes each of the four axes. Of
course it is conceivable that some axes are reversed (sent end-to-end)
by such a motion while others are fixed pointwise by the same motion.
In fact if we had used the three face-centered axes, it would be possible
to reverse two of the axes while rigidly fixing the third. But with these
four vertex-centered axes, that is not possible. (Can you show why? It
has to do with the angle.) So the kernel here is the identity rotation
of the cube. Thus we have an embedding of the rotations of the cube
into Sym(4). But we have seen in the previous subsection that both of
these groups have order 24. Thus by the “pigeon-hole principle” the
homomorphism we have defined is an isomorphism.
(d) (Part 2.) Again G is the group of rigid rotations of the cube. There
are exactly three face-centered axes which are at right angles to one
another. A 90° rotation about one of these three axes fixes it, but
transposes the other two. Thus if we label the three face-centered
axes by the letters {1, 2, 3}, and send each rotation in the group G to
the permutation of the labels of the three face-centered axes which it
induces, we obtain an epimorphism of groups G → Sym(3), or, in view
of part 1 of (d), a homomorphism Sym(4) → Sym(3). The kernel is
the group K which stabilizes each of the three face-centered axes. This
group consists of the identity element, together with three involutions,
each being a 180° rotation about one of the three face-centered axes.
Multiplication in K is commutative.
(e) Linear groups to matrix groups. Now let V be a vector space over
a field F of finite dimension n. We have seen in the previous subsection
that the bijective linear transformations from V into itself form a group
which we called GL(V ), the general linear group on V . Now fix a
basis A = {v1 , . . . , vn } of V . Any linear transformation T : V → V ,
viewed as a right operator of V can be represented as a matrix
A T A := (pij ),
where
T (vi ) = pi1 v1 + pi2 v2 + · · · + pin vn
(so that the rows of the matrix depict the fate of the vector vi under
T )8 . Then we see that
A (T ◦ S)A = A (T )A · A (S)A
(where, chronologically, the notation intends that T is applied first, then
S, being right operators.) This way the symbolism does not transpose
the order of the arguments, so in fact the mapping
T → A T A
8 Thanks to the analysts’ notation for functions, combined with our habit of reading
from left to right, many linear algebra books make linear transformations left operators
of their vector spaces, so that their matrices are then the transpose of those you see here.
That is, the columns of their matrices record the fates of their basis vectors. However
as algebraists are aware, this is actually a very awkward procedure when one wants to
regard the composition of these transformations as a binary operation on any set of such
transformations.
defines a group homomorphism
fA : GL(V ) → GL(n, F )
of the group of linear bijections into the group of n × n invertible
matrices under ordinary matrix multiplication.
What is the kernel? This would be the group of linear transformations
which fix each basis element, hence every linear combination of them,
and hence every vector of V . Only the identity mapping can do this,
so we see that fA is an isomorphism of groups.
(f) The determinant homomorphism of matrix groups. The determinant associates with each n × n matrix a scalar which is non-zero if
the matrix is invertible. That determinants preserve matrix multiplication is not very obvious from the formulae expressing it as a certain
sum over the elements of a symmetric group. Taking it on faith, for
the moment, this would mean that the mapping
det : GL(n, F ) → F ∗ ,
taking each invertible matrix to its determinant is a group homomorphism into the multiplicative group of non-zero elements of the ground
field F . The kernel, then, is the group of all n × n matrices of determinant 1, which is called the special linear group and is denoted
SL(n, F ).
(g) Even and odd permutations and the sgn homomorphism. Now
consider the symmetric group on n letters. In view of Example (a)
above, the symmetric groups Sym(X) on finite sets X of size n are all
isomorphic to one another, and so are given a neutral uniform description: Sym(n) is the group of all permutations of the set of “letters”
{1, 2, . . . , n}. Subgroups of Sym(n) are called permutation groups
on n letters and are so useful in providing an environment for calculation, that many properties of finite groups are in fact proved by
such calculations. The way to transport arguments with symmetric
groups to general groups is to exploit homomorphisms G → Sym(n).
These are called “group actions” and are fully exploited in the next
subsection.
Now we can imagine that the neutral set of letters Ωn := {1, 2, . . . , n}
are formally a basis A of an n-dimensional vector space over any chosen
field F . Then any permutation becomes a permutation of the basis
elements of V , which extends to a linear transformation T of V , and
can then be rendered as a matrix A T A with respect to the basis A as
in example (f). Thus, by a composition of several isomorphisms that
we understand, together with their restrictions to subgroups, we have
obtained an isomorphism
Sym(n) → GL(n, F )
which represents each permutation by a matrix possessing exactly one
1 in each row and in each column, all other entries being zero. Such a
matrix is called a permutation matrix.
Now even the usual sum formula for the determinant shows that the
determinant of a permutation matrix is ±1. Now if we accept the
thesis of example (f) just above, that the determinant function is in
fact multiplicative, we obtain a useful group homomorphism:
sgn : Sym(n) → {±1}
into the multiplicative group Z2 of numbers ±1, which records the
determinant of each permutation matrix representing a permutation.
The kernel of sgn is called the alternating group, denoted Alt(n), and
its elements are called even permutations. All other permutations
are called odd permutations. Since sgn is a group homomorphism,
we see that
An even permutation times an even permutation is even.
An odd permutation times an odd permutation is even.
An even permutation times an odd permutation (in any order) is odd.
A transposition is a permutation which interchanges just two of the
n letters, and leaves all remaining letters fixed. Such a permutation
is represented by a permutation matrix having just two of its 1’s in
symmetric positions off the main diagonal, and all remaining 1’s on the
that diagonal. Clearly the determinant of such a matrix is −1. Thus
any transposition is an odd permutation. Obviously the product of
an even number of transpositions is even, while the product of an odd
number is odd. We shall learn later that any permutation displacing a
finite number of letters can be written as a product of transpositions,
so this criterion is definitive9 .
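Anticipating that alternative development, sgn can also be computed directly by counting inversions, with no determinants in sight. A Python sketch (illustrative only):

    from itertools import permutations

    def sgn(p):
        # parity of the number of inversions: pairs i < j with p[i] > p[j]
        inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
                         if p[i] > p[j])
        return -1 if inversions % 2 else 1

    def compose(p, q):
        return tuple(q[p[i]] for i in range(len(p)))

    # sgn is a homomorphism: sgn(pq) = sgn(p) sgn(q), checked over Sym(4).
    S4 = list(permutations(range(4)))
    assert all(sgn(compose(p, q)) == sgn(p) * sgn(q) for p in S4 for q in S4)

    # The kernel Alt(4) comprises half of the 24 permutations.
    print(sum(1 for p in S4 if sgn(p) == 1))   # 12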
(h) The automorphism group of a cyclic group of order n. Finally, perhaps, we should consider an example of an automorphism
group of a group. We consider here, the group Aut(Zn ), the group
of automorphisms of the cyclic group of order n, where n is any natural number. Suppose, then, that C is the additive group of integers
mod n – that is, the additive group of residue classes modulo n. Thus
C = {[j] := j + nZ}, j = 0, 1, . . . , n − 1. The addition rule becomes
[i] + [j] = [i + j] or [i + j − n], whichever lies between 0 and n − 1,
where 0 ≤ i, j ≤ n − 1. Then the element [1] generates this group. Indeed
so does [m] if and only if gcd(m, n) = 1. Thus, if f : C → C is an
automorphism of C it follows that f ([1]) = [m] where gcd(m, n) = 1.
Moreover, since f is a homomorphism, f ([k]) = [mk]. Thus the automorphism f is completely determined by the natural number m coprime to
and less than n. The number of such numbers is called the Euler φ-function, and its value at n is denoted φ(n). Thus φ(n) = |Aut(Zn )|.
It now follows that Aut(C) is isomorphic to the multiplicative group
Φ(n) of all residues mod n which consist only of numbers which are
relatively prime to n.10
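A short Python sketch (its function names are this example's own) tabulates Φ(n) and confirms that each unit m really induces an automorphism [k] → [mk] of (Z/nZ, +), here with n = 12:

    from math import gcd

    def units(n):
        # the residues m with gcd(m, n) = 1; each determines [k] -> [mk]
        return [m for m in range(1, n + 1) if gcd(m, n) == 1]

    n = 12
    Phi = units(n)
    print(len(Phi), Phi)        # phi(12) = 4 : [1, 5, 7, 11]

    for m in Phi:
        f = lambda k: (m * k) % n
        # f preserves addition mod n ...
        assert all(f((a + b) % n) == (f(a) + f(b)) % n
                   for a in range(n) for b in range(n))
        # ... and is a bijection of the residues.
        assert len({f(k) for k in range(n)}) == n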
(i) The inner automorphism group. Let G be a general abstract group
and let x be a fixed element of G. We define a mapping
ψx : G → G,
called conjugation by x, by the rule ψx (g) := x−1 gx, for all g ∈ G.
One can easily verify that ψx (gh) = ψx (g)ψx (h), and as ψx is a bijection,
9 Since the development of this example assumed the thesis of Example (f) — that the
determinant of a product of matrices is the product of their determinants — which may
not be known from first principles by some students, we shall give an alternative proof of
the existence of the sgn homomorphism in the section on Permutation Groups and Group
Actions in the next Chapter.
10 We will obtain a more exact structure of this group when we encounter the Sylow
theorems.
it is an automorphism of G. Any automorphism of G which is of the
form ψx for some x in G, is called an inner automorphism of G.
Now if {x, y, g} ⊆ G, one always has
y −1 (x−1 gx)y = (y −1 x−1 )g(xy) = (xy)−1 g(xy)    (3.2)
which means
ψy ◦ ψx = ψxy    (3.3)
for all x, y. Thus the set of inner automorphisms is closed under composition of morphisms. Setting y = x−1 in Equation (3.3) we have
ψx−1 = (ψx )−1 .    (3.4)
and so the set of inner automorphisms is also closed under taking inverses. It now follows that the set of inner automorphisms of a group G
forms a subgroup, which we denote Inn(G), of the full automorphism
group and call the inner automorphism group of G.
Now Equation (3.3) would suggest that there is a homomorphism from
G to Aut(G) except for one thing: the arguments of the ψ-morphisms
come out in the wrong order in the right side of the equation. That is
because the operation “◦” denotes composition of left operators.
This reveals the efficacy of using the exponential notation for denoting
automorphisms as right operators. We employ the following:
(Convention of writing conjugates in groups) If a and b are
elements of a group G, we write
a−1 ba in the exponential form ba .
In this notation Equation (3.2) reads as follows:
(g x )y = g xy    (3.5)
for all {g, x, y} ⊆ G. What could be simpler?
Then we understand Equation (3.3) to read
ψxy = ψx ψy ,    (3.6)
where the juxtaposition on the right hand side of the equation indicates
composition of right operators – that is the chronological order in which
the mappings are performed reads from left to right, ψx first and then
ψy .
Now Equation (3.6) provides us with a group homomorphism:
ψ : G → Aut(G)
taking element x to the inner automorphism ψx , the automorphisms
of Aut(G) being composed as right operators (as in the exponential
convention for isomorphisms on page 108).
What is the kernel of the homomorphism ψ? This would be the set of all
elements z ∈ G such that ψz = 1G , the identity map on G. Thus this
is the set Z(G) of elements z of G satisfying any one of the following
equivalent conditions:
(a) ψz = 1G , the identity map on G,
(b) z −1 gz = g for all elements g of G,
(c) zg = gz for all elements g of G.
The subgroup Z(G) is called the center of G. The identity element
is always in the center. If Z(G) = G, then multiplication in G is
“commutative” – that is xy = yx for all (x, y) ∈ G × G. Such a group
is said to be commutative, and is affixed with the adjective abelian.
Thus G is abelian if and only if G = Z(G).
Finally, we ask the student to prove that G/Z(G) is never a nontrivial
cyclic group. [Hint: If false, the cosets of the form Z(G)xk , k ∈ Z, partition G,
and elements in one of these commute with those in any other such
coset. So G can be shown to be abelian.]
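As a computational footnote to this example, the following Python sketch (Sym(3) is chosen only because it is small) computes the conjugation maps ψx, the center Z(G), and the number of distinct inner automorphisms:

    from itertools import permutations

    def compose(p, q):
        return tuple(q[p[i]] for i in range(len(p)))

    def inverse(p):
        q = [0] * len(p)
        for i, v in enumerate(p):
            q[v] = i
        return tuple(q)

    G = list(permutations(range(3)))          # Sym(3)

    def psi(x):
        # conjugation by x: g -> x^(-1) g x, as a dictionary of values
        xi = inverse(x)
        return {g: compose(compose(xi, g), x) for g in G}

    Z = [z for z in G if all(psi(z)[g] == g for g in G)]
    print(Z)                                  # only the identity: Z(Sym(3)) = 1

    # |Inn(G)| = |G| / |Z(G)| = 6 here.
    print(len({frozenset(psi(x).items()) for x in G}))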
A Glossary Expected to be Understood from the Examples.
1. homomorphism of groups.
2. epimorphism of groups.
3. embedding of groups.
4. isomorphism of groups.
5. endomorphism of a group.
6. automorphism of a group.
7. the kernel of a homomorphism, ker f .
8. the automorphism group of a group Aut(G).
9. the inner automorphism group, Inn(G), of a group, G.
10. an inner automorphism.
11. the center of a group, Z(G).
12. abelian groups.
3.4
Factor Groups and the Fundamental Theorem of Homomorphisms
Whenever I am obligated to present before a class a Theorem labeled as some
sort of “Fundamental Theorem”, I feel embarrassed. Nearly all the time such
a theorem is not fundamental at all.
Who on earth decided that the Fundamental Theorem of Algebra
should be the fact that the Complex Numbers form an algebraically closed
field? Who on earth decided that the Fundamental Theorem of Geometry should be the fact that an isomorphism between two Desarguesian
Projective Spaces of sufficient dimension is always induced by a semilinear
transformation of the underlying vector spaces?
The answer is obvious: it was so declared by persons outside the field
who never had to assume the burden of classifying the objects in that field.
So what is a “fundamental theorem”? At a minimum it is a proposition
with these properties:
1. It should be used so constantly in the daily life of a scholar of the field,
that quoting it becomes repetitive.
2. Its logical distance from the “first principles” of the field should be
short enough to bear a short explanation to a puzzled student (that is,
the alleged “fundamental theorem” should be “teachable”.)
We are lucky today! The fundamental theorems of homomorphisms of
groups actually meet both of these criteria.
The subgroups of a group are permuted among themselves by the
automorphisms of the group. Explicitly: if σ ∈ Aut(G), and K is a subgroup
of G, then K σ := {k σ |k ∈ K} is again a subgroup of G.11 Moreover, if H is
a subgroup of Aut(G), then any subgroup of G left invariant by the elements
of H is said to be H-invariant. Naturally we have special cases for special
choices of H. A subgroup K which is invariant under the full group Aut(G)
is said to be a characteristic subgroup of G. We write this K char G.
(The center which we previously encountered is certainly one of these.)
But let us drop down to a larger class of H-invariant subgroups by letting
H descend from the full automorphism group of G to the case that H is
simply the group Inn(G), encountered in the previous Section, Example 3,
part (i). A subgroup which is invariant under Inn(G) is said to be normal
in G or a normal subgroup of G.
Just putting definitions together one has
Lemma 61 The following are equivalent conditions for a subgroup K of G:
1. ψx (K) = K, for all x ∈ G,
2. x−1 Kx = K for every x ∈ G,
3. xK = Kx, for each x ∈ G.
(In this Lemma, ψx is the inner automorphism induced by the element x: see
Example 3 part (i) preceeding.)
As a notational convenience, the symbol K ⊴ G will always stand for the
assertion: “K is a normal subgroup of G” or, equivalently, “K is normal in
G”. This is always a relation between a subgroup K and some subgroup G
which contains it. It is not a transitive relation. It is quite possible for a
group G to possess subgroups L and K, with L ⊴ K and K ⊴ G, for which
L is not normal in G.12 Two immediate consequences of normality are the
following:
11 Note the convention of regarding automorphisms as right operators whose action is
denoted exponentially (see page 108).
12 In the group G of rigid rotations of a cube (Example 1 part (g) of Section 3.2), the
180° rotation about one of the three face-centered axes generates a subgroup L which
pointwise fixes its axis of rotation, but inverts the other two face-centered axes. This is a
normal subgroup of the abelian subgroup K of all rotations of the cube which leave each
of the three face-centered axes invariant. Then K is normal in G, being the kernel of the
homomorphism of Example 3 Part (d) of Section 3.3. But clearly L is not normal in G,
since otherwise its unique pointwise-fixed face-centered axis would also be fixed. But it
is not fixed, as G/K induces all permutations of Sym(3) on these axes.
Corollary 62 If N is a normal subgroup of the group G, and H is any
subgroup of G, then
1. N ∩ H is a normal subgroup of H. (As a special case, if H contains
N , then N is normal in H.)
2. N H = HN is a subgroup of G.
The proof is left as an exercise for the beginning student.
If the condition of Part 3. of the above Lemma 61 is true, one sees that
the subset xK is in fact the set Kx, for each x ∈ G. But it also asserts
that Kx · Ky = Kxy as subsets of G. Thus there is actually a multiplicative
group G/K whose elements are the right cosets of K in G, multiplication
unambiguously given by the rule
(Kx) · (Ky) = Kxy.
We have a name for the group of cosets of a normal subgroup N . It is called
the factor group G/N .
Now let f : G → H be a homomorphism of groups. Two special groups
associated with a group homomorphism f have been introduced earlier in
this Chapter: the range or image, f (G), and the kernel, ker f . Now if
y ∈ ker f , and x ∈ G, we see that
f (x−1 yx) = (f (x))−1 f (y)f (x) = (f (x))−1 · 1 · f (x) = 1 ∈ H
since f (y) = 1. Thus for all y ∈ ker f , and x ∈ G, x−1 yx ∈ ker f so ker f is
always a normal subgroup. What about the corresponding factor group?
Theorem 63 (The Fundamental Theorem of Homomorphisms)
1. If K is a normal subgroup of a group G, the mapping G → G/K which
maps each element x of G to the right coset Kx which contains it, is
an epimorphism of groups. Its kernel is K.
2. If f : G → H is a homomorphism of groups, there is an isomorphism
η : f (G) → G/(ker f ). Thus every homomorphic image f (G) is isomorphic to a factor group of G.
Proof: 1. That the mapping ν : G → G/K defined by ν(x) := Kx, is
a homomorphism follows from KxKy = Kxy, for all (x, y) ∈ G × G. By
definition of G/K it is onto. Since the coset K = K · 1 is the identity element
of G/K, the kernel of ν is precisely the subset {x ∈ G|Kx = K} = {x ∈
G|x ∈ K} = K. Thus ker ν = K.
2. Now set ker f := K. We propose a mapping η : f (G) → G/K which
takes an image element f (x) to the coset Kx. Since this recipe is formulated
in terms of a single element x, we must show that the proposed mapping is
well-defined. Suppose f (x) = f (y). We wish to show Kx = Ky. But the
hypothesis shows that f (x−1 y) = f (x−1 )f (y) = (f (x))−1 f (x) = 1 ∈ H, so
x−1 y ∈ ker f = K, whence Kx = Ky, as desired.
Now
η(f (x)f (y)) = η(f (xy)) = Kxy    (3.7)
= KxKy = η(f (x)) · η(f (y)).    (3.8)
Thus η is a homomorphism. If f (x) were in the kernel of η, then η(f (x)) =
Kx = K, the identity element of G/K. Thus x ∈ K, so f (x) = f (1), the
identity element of f (G). Finally, η is onto, since, for any g ∈ G, the coset
Kg = η(f (g)). Thus η is a bijection. The proof is complete.
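A numerical illustration of the theorem (the choices G = (Z/12Z, +) and f = reduction modulo 4, with ker f = {0, 4, 8}, are assumptions made for this example):

    n, m = 12, 4
    G = range(n)
    f = lambda x: x % m                  # a homomorphism (Z/12Z, +) -> (Z/4Z, +)

    K = [x for x in G if f(x) == 0]      # ker f = [0, 4, 8]
    cosets = {frozenset((k + x) % n for k in K) for x in G}

    # eta : f(x) -> Kx is well defined and bijective: f(G) <-> G/K.
    eta = {f(x): frozenset((k + x) % n for k in K) for x in G}
    assert len(eta) == len(cosets) == n // len(K)    # both sides have 4 elements
    print(sorted(map(sorted, cosets)))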
There are two further consequences, contained in the following
Corollary 64 (The So-called Second and Third Fundamental Theorems of
Homomorphisms)
1. Suppose K and N are both normal subgroups of G, with K contained in
N . Then N/K is a normal subgroup of G/K, and G/N is isomorphic
to the factor group (G/K)/(N/K).
2. If N is a normal subgroup of G, and H is any subgroup of G, HN/N
is isomorphic to H/(H ∩ N ).
Proof: 1. By Corollary 62, part 1, K is normal in N , so N/K is the multiplicative group of cosets {Kn|n ∈ N }. For any coset Kx of G/K, and coset
Kn of N/K, (Kx)−1 KnKx = K(x−1 nx), which is in N/K. Thus N/K ⊴ G/K, and there is a natural epimorphism f2 : G/K → (G/K)/(N/K) onto
the factor group as described in the Fundamental Theorem (Theorem 63, part
1). By the same token, there is a canonical epimorphism f1 : G → G/K. By
Lemma 59 of Section 3.3, the composition of these epimorphisms is again an
epimorphism:
f2 ◦ f1 : G → (G/K)/(N/K).
An element x of G is first mapped to the coset Kx and then to the coset
(Kx){Kn|n ∈ N } (which, viewed as a set of elements of G, is just N x). Thus
x maps to the identity element N/K of (G/K)/(N/K) if and only
if x ∈ N . Thus the kernel of epimorphism f2 ◦f1 is N . The result now follows
from the Fundamental Theorem of Homorphisms (Theorem 63 Part 2.).
2. Clearly N ⊴ HN and N ∩ H ⊴ H by Corollary 62, part 1. We propose
to define a morphism f : H → HN/N by sending element h of H to coset
N h ∈ HN/N . Clearly hh0 is sent to N hh0 = (N h)(N h0 ), so f is a group
homomorphism. Since every coset of N in HN = N H has the form N h for
some h ∈ H, f is an epimorphism of groups. Now if h ∈ H, then N h = N ,
the identity of HN/N , if and only if h ∈ H ∩N . Thus ker f = H ∩N and now
the conclusion follows from the Fundamental Theorem of Homomorphisms.
3.4.1
Miscellaneous Exercises.
Exercise 1. Suppose N is a normal subgroup of the group G.
1. Show that there is an isomorphism between the poset of all subgroups
H of G which contain N , and the poset of all subgroups of G/N (both
posets partially ordered by the containment relation). [Hint: The isomorphism takes H in the first poset to H/N in the second.]
2. Show that H is normal in G if and only if H/N is normal in G/N . Thus
the isomorphism of part 1 and its inverse both preserve the normality
relation. (Note that they need not preserve the property of being characteristic in either direction. The group H containing N is called the
inverse image of H/N .)
3. Using the fundamental theorem of homomorphisms conclude that if
f :G→L
is an epimorphism of groups, then there is a 1-1 correspondence of the
subgroups of L with the subgroups of G containing ker f preserving
containment and normality. The correspondence takes a subgroup L1
of L to the subset {g ∈ G|f (g) ∈ L1 }, also called the inverse image
of L1 .
Exercise 2. If X is a subset of the group G (X is not necessarily a subgroup),
define the centralizer in G of X to be the set of all group elements g of G
such that gx = xg for all x ∈ X. This set is denoted CG (X). Similarly, the
normalizer in G of X is the set of elements {g ∈ G|g −1 Xg = X}. Such
elements may permute the elements of X by conjugation, but they stabilize
X as a set. A subgroup H is said to normalize X if and only if H ≤ NG (X).
If NG (X) = G, then X is called a normal set in G.
1. Show that CG (X) and NG (X) are subgroups of G.
2. Show that if X is a normal set in G, then CG (X) is also normal in G.
3. Show that if N is a normal subgroup of G, then the inner automorphism
ψg : x → g −1 xg induces an automorphism of N .
Exercise 3.
1. Conclude that a characteristic subgroup of a normal subgroup of G is
normal in G, that is, K char N ⊴ G implies K ⊴ G.
2. Conclude that a characteristic subgroup of a characteristic subgroup of
G is characteristic in G, that is, L char K char G implies L char G.
3. Show that the mapping which sends element g to ψg |N , the restriction of the inner automorphism conjugation-by-g to N , defines a group
homomorphism
ψ N : G → Aut(N )
whose kernel is CG (N ). Conclude that the group of automorphisms induced on N by the inner automorphisms of G is isomorphic to G/CG (N ).
Exercise 4. Show that for any group G, Inn(G) ⊴ Aut(G). (The factor group
Out(G) := Aut(G)/Inn(G) is called the outer automorphism group of
G.)
Exercise 5. Prove the assertions of Corollary 62.
Exercise 6. A group is said to be a simple group if and only if its only
proper normal subgroup is the identity subgroup. (Note that the definition
forbids the identity group to be a simple group.)
1. Prove that any group of prime order is a simple group.
2. Suppose G is a group. A subgroup M is a maximal subgroup of G
if and only if it is maximal in the poset of all proper subgroups of G. A
subgroup M is a maximal normal subgroup of G if and only if it is
maximal in the poset of all proper normal subgroups of G. (Note that
it does not mean that it is a maximal subgroup which is normal. A
maximal normal subgroup may very well not be normal.) Prove that a
factor group G/N of G is a simple group if and only if N is a maximal
normal subgroup. [Hint: Use the result of Exercise 1. just above.]
3.5 Direct products.
A direct product is a formal construction for getting new groups in a rather
easy way. First let G = {Gσ |σ ∈ I} be a family of groups indexed by the
index set I. We can write elements of the cartesian product of the Gσ as
functions
f : I → ⋃_{σ∈I} Gσ   such that   f(σ) ∈ Gσ.
Of course, when I is countable, we can represent elements f in the usual way,
as sequences (f (1), f (2), . . .). Define a binary operation on the cartesian
product by declaring the “product” f1 f2 of f1 and f2 to be the function
whose value (f1 f2 )(σ) at σ ∈ I is f1 (σ)f2 (σ) – that is, the values fi (σ)
at the “coordinate” σ are multiplied (in Gσ ) to yield the σ-coordinate of
the “product”. This is termed “coordinatewise multiplication”, since for
sequences, the product of (a1 , a2 , . . .) and (b1 , b2 , . . .) is (a1 b1 , a2 b2 , . . .), by
this definition. The cartesian product with this coordinatewise multiplication
clearly forms a group, which we call the direct product over G and is
denoted by
∏_{σ∈I} Gσ    or    ∏_G (Gσ)
or, when I = {1, 2, . . . , n}, simply by
G1 × G2 × · · · × Gn .
It contains a subgroup consisting of all maps f in the definition of direct
product, for which f (σ) fails to be the identity element of the group Gσ only
finitely many times. This subgroup is called the weak direct product or
direct sum over G and is denoted
∑_{σ∈I} (Gσ)    or    ∑_G (Gσ).
When I is finite, there is no distinction between the direct product and direct
sum.
For every index τ in I, there is clearly a homomorphism πτ : ∏_G (Gσ) → Gτ which takes f to f(τ), for all f in the direct product. Such an epimorphism is called a projection onto the coordinate indexed by τ. This
epimorphism retains this name even when it is restricted to the direct sum.
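To make the coordinatewise rule concrete, here is a minimal Python sketch (all names are ad hoc illustrations, not notation from the text), multiplying in Z2 × Z3 and projecting onto a coordinate:

# A sketch of coordinatewise multiplication in a finite direct product,
# assuming each factor's operation is supplied as a binary function.
def direct_product_op(ops):
    def mult(f1, f2):
        # (f1 f2)(sigma) = f1(sigma) f2(sigma), coordinate by coordinate
        return tuple(op(a, b) for op, a, b in zip(ops, f1, f2))
    return mult

def projection(tau):
    # pi_tau : the projection onto the coordinate indexed by tau
    return lambda f: f[tau]

add_mod = lambda n: (lambda a, b: (a + b) % n)
mult = direct_product_op((add_mod(2), add_mod(3)))   # Z2 x Z3, written additively
assert mult((1, 2), (1, 2)) == (0, 1)
assert projection(1)((1, 2)) == 2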
Any permutation of the index set induces an obvious isomorphism of
direct products and direct sums. Thus G1 × G2 is isomorphic to G2 × G1
even though they are not formally the same group.
Similarly, any (legitimate) rearrangement of parentheses involved in constructing direct products yields isomorphisms; specifically
G1 × G2 × G3 ≅ (G1 × G2) × G3 ≅ G1 × (G2 × G3).
Now suppose A and B are two subgroups of a group G with A ∩ B = 1,
and B normalizing A (that is, B ≤ NG (A)). Then together they generate
a subgroup AB with order |A| · |B| (Corollary 55, part 2, and Lemma 57).
Now suppose in addition that A normalizes B. Then for any element a ∈ A
and element b ∈ B, the factorizations (aba−1 )b−1 = a(ba−1 b−1 ) show that
aba−1 b−1 ∈ A ∩ B = {1} and so ab = ba. Thus all elements of A commute
with all elements of B. In that case the mapping A × B → AB which takes
(a, b) ∈ A × B to ab, is a group homomorphism whose kernel is {(x, x−1 )|x ∈
A ∩ B}. Since A ∩ B = {1}, this map is an isomorphism. This is generalized
below.
Summarizing,
Lemma 65
1. For any permutation π in Sym(I), there is an isomorphism
G1 × G2 × · · · × Gn → Gπ(1) × Gπ(2) × · · · × Gπ(n)
although the two groups are not necessarily formally the same.
2. Any two well-formed groupings of the factors of a direct product into parentheses yield groups which are isomorphic to the original direct product and hence are isomorphic to each other.
3. (Internal Direct Products) Suppose A1 , A2 , . . . is a countable sequence
of subgroups of a group G.
(a) Suppose Aj normalizes Ai whenever i ≠ j, and
(b) A1 A2 · · · Ak−1 ∩ Ak = 1, for all k ≥ 2.
Then A1 A2 · · · An is isomorphic to the direct product A1 × · · · × An for
any finite initial segment {A1 , . . . , An } of the sequence of subgroups.
3.6 Other Variations: Semidirect Products and Subdirect Products.
3.6.1 Semidirect products.
Suppose N and H are two abstract groups and that we are given a homomorphism
f : H → Aut(N).
Then f defines a formal construction of a group N : H, which is called the semidirect product of N by H. (In some older books there is other notation for the semidirect product – some sort of decoration added to the “×” symbol. The notation N : H is the notation for a “split extension of groups” used in the Atlas of Finite Groups [?].) The elements of N : H are the elements of the cartesian product N × H. Multiplication proceeds according to this rule: for any two elements (n1, h1) and (n2, h2) in N × H:
(n1, h1)(n2, h2) := (n1 · n2^{f(h1⁻¹)}, h1h2).
Verification of the associative law for a triple product (n1, h1)(n2, h2)(n3, h3) comes down to the calculation
n3^{f((h1h2)⁻¹)} = (n3^{f(h2⁻¹)})^{f(h1⁻¹)}.
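The rule is easy to experiment with numerically. The following Python sketch (an ad hoc illustration; the choice f(h) : n → n · 2^h mod 7 is one concrete homomorphism Z3 → Aut(Z7), since 2³ ≡ 1 mod 7) builds Z7 : Z3 and checks the associative law on all triples:

# A sketch of the semidirect product Z7 : Z3, written additively.
def f(h):
    # f(h) is the automorphism n -> n * 2^h (mod 7) of (Z7, +)
    return lambda n: (n * pow(2, h % 3, 7)) % 7

def mult(x, y):
    (n1, h1), (n2, h2) = x, y
    # (n1, h1)(n2, h2) := (n1 + f(h1^{-1})(n2), h1 + h2)
    return ((n1 + f(-h1)(n2)) % 7, (h1 + h2) % 3)

G = [(n, h) for n in range(7) for h in range(3)]
assert len(G) == 21
# Exhaustive check of the associative law on all 21^3 triples:
assert all(mult(mult(x, y), z) == mult(x, mult(y, z))
           for x in G for y in G for z in G)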
Example 4.
1. If f : H → Aut(N ) is the trivial homomorphism – that is, f maps
every element of H to the identity automorphism of N – then N : H is just the ordinary direct product N × H of the previous section.
2. If A and B are subgroups of a group G for which A normalizes B and A ∩ B = 1, then the subgroup AB of G is isomorphic to the semidirect product B : A with respect to the morphism f : A → Aut(B)
which takes each element a of A to the automorphism of B induced by
conjugating the elements of B by a – that is ψa |B .
3. Let N be the additive group of integers mod n, and let H be any
subgroup of the multiplicative group of residues coprime to n (for example, the quadratic residues coprime to n). Let G be the group of all
permutations of N of the form
µ(m, h) : x → hx + m, for all x ∈ N.
Then G is the semidirect product N : H.
4. Let F be any field, let F ∗ be the multiplicative group of all nonzero
elements of F , and let H be any subgroup of F ∗ . Then the group of
all transformations of F of the form
x → hx + a, a ∈ F, h ∈ H,
is the semidirect product (F, +) : H, where (F, +) denotes the group
on F whose operation is addition in the field F .
5. (The Frobenius group of order 21.) The group is generated by elements
x and y subject only to the relations
x³ = 1,  y⁷ = 1,  x⁻¹yx = y²
and is a semidirect product Z7 : Z3 . (This is an example of a “presented
group”, which we shall meet in Chapter 6.)
6. A group is said to be a normal extension of a group N by H
(written G = H.N ) if and only if
N ⊴ G and G/N ≅ H.
The extension is said to be split if and only if there is a subgroup H1
of G such that
G = N H1 , N ∩ H1 = 1.
In this case, H1 ≅ H. One can see that
G is a split extension of N by H if and only if G is a semidirect product of N by H.
First, if the extension is split, every element of G has the form g = nh, (n, h) ∈ N × H1. Because N ∩ H1 = 1, the expression g = nh is unique for a given element g. We thus have a bijection
σ : G → N × H1
which takes g = nh to (n, τ (h)), where τ is the isomorphism H1 →
H = G/N . Now if we multiply two elements of G, we typically obtain
(n1h1)(n2h2) = n1(h1n2h1⁻¹)h1h2 = n1 · n2^{h1⁻¹} · h1h2,
where, by our convention on inner automorphisms, g^x := x⁻¹gx. The σ-value of this element is
(n1 · n2^{h1⁻¹}, τ(h1)τ(h2)).
Thus σ is an isomorphism of G with the semidirect product of N by
H, defined by composing τ −1 with the homomorphism
H1 → Aut(N )
induced by conjugation. (The latter is just the restriction to H1 of the
homomorphism ψ of Example 3, part (i).)
Conversely, if G is the semidirect product of N by H defined by some
homomorphism ρ : H → Aut(N ), then, setting
N0 := {(n, 1) | n ∈ N}
H0 := {(1, h)|h ∈ H}
then
N0 ⊴ G = N0H0 and N0 ∩ H0 = 1,
so G is the split extension of N0 by H0, and conjugation in G by an element h0 of H0 induces the automorphism ρ(h0) on N0. (Clearly, N0 ≅ N and H0 ≅ H.)
3.6.2 Subdirect products.
Something a little less precise and more general is the notion of a subdirect
product. We say that a group H is a subdirect product of the groups
{Gσ | σ ∈ I} if and only if (i) H is a subgroup of the direct product ∏_I Gσ and
(ii) for each index τ , the restriction of the projection map πτ to H is onto –
i.e. πτ (H) = Gτ . There may be many such subgroups, so the isomorphism
type of H is not determined by (i) and (ii) above.
Subdirect products naturally arise in this situation:
Suppose M and N are normal subgroups of a group G. Then G/(M ∩ N )
is a subdirect product of G/N and G/M . This is because G/(M ∩ N ) can be
embedded as a subgroup of (G/M )×(G/N ) by mapping each coset (N ∩M )x
to the ordered pair (M x, N x).
In fact something very general happens:
If {Nσ |σ ∈ I} is a family of normal subgroups of a group G, then
• G/(∩I Nσ) is a subdirect product of the groups {G/Nσ | σ ∈ I}.
• If F is “closed under taking subdirect products” – that is, any subdirect
product of members of F is in F (for example the class of abelian groups
is closed under subdirect products) – then there is a “smallest” normal
subgroup GF yielding a factor G/GF in F. Precisely G/GF ∈ F and
if N is normal with G/N ∈ F, then GF ≤ N .
3.7 Minimal normal subgroups of finite groups.
In this section the reader is reminded of the following definitions and result
from Exercise 6 above:
1. A simple group is a group G whose only proper normal subgroup is
the identity subgroup.
2. A maximal normal subgroup of a group G is a maximal element
in the partially ordered set of proper normal subgroups of G (ordered
by inclusion, of course).
3. A factor group G/N of G is a simple group if and only if N is a maximal
normal subgroup of G.
We begin with an elementary lemma.
Lemma 66 Suppose {Mi |i ∈ I} is a finite collection of maximal normal
subgroups of a group G. Then for some subset J of I,
G/(∩_{i∈I} Mi) = G/(∩_{j∈J} Mj) ≅ ∑_{j∈J} (G/Mj),
a direct product of simple groups.
proof: The result is true for |I| = 1, as G/M1 is simple. We use induction on
|I|. Renumbering the subscripts if necessary, the induction hypothesis allows
us to assume that for some subset J = {1, . . . , d} ⊆ I′ = {1, . . . , k − 1},
N := ∩_{i∈I′} Mi = ∩_{j∈J} Mj,   (3.9)
G/N ≅ (G/M1) ⊕ · · · ⊕ (G/Md).   (3.10)
We set I := {1, . . . , k} = I′ ∪ {k} and assume, without loss, that N is not contained in Mk. Then since NMk is a normal subgroup of G properly containing Mk, we must have G = NMk. Then
G/(N ∩ Mk) ≅ (G/N) ⊕ (G/Mk),
and the result for |I| = k follows upon substitution for G/N in the right-hand
side.
One may conclude
Corollary 67
1. Suppose G is a finite group with no non-trivial proper characteristic
subgroups. Then G is a direct product of pairwise isomorphic simple
groups.
2. If N is a minimal normal subgroup of a finite group G, then N is a
direct product of isomorphic simple groups.
proof: Part 1. Let M be a maximal normal subgroup of the finite group
G. Then G has a finite automorphism group, and so {M σ |σ ∈ Aut(G)} is
a finite collection of maximal normal subgroups of G whose intersection N
is a characteristic subgroup of G properly contained in G. By hypothesis,
N = 1. Then by the above Lemma 66, G = G/N is the direct product of simple groups. If G = 1, we have an empty direct product and there is nothing to prove. Otherwise, the product of those direct factors isomorphic to the first direct factor clearly forms a non-trivial characteristic subgroup, which, by hypothesis, must be the whole group.
Part 2. Since N is a minimal normal subgroup of the finite group G,
N is a non-trivial finite group with only the identity subgroup as a proper
characteristic subgroup, so the conclusion of Part 1. holds for N .
Chapter 4
Permutation Groups and Group Actions
4.1 Group actions and orbits
As defined earlier, a permutation of a set X is a bijective mapping X → X.
Because we will be dealing with groups of permutations it will be convenient
to use the “exponential” notation which casts permutations in the role of
right operators. Thus if π is a permutation of X, we write xπ for the image
of element x under the permutation π. Recall that composition of bijections
and inverses of bijections are bijections, so that the set of all bijections of
set X into itself form a group under composition which we have called the
symmetric group of X and have denoted Sym(X). Recall also that any
bijection ν : X → Y induces a group isomorphism ν̄ : Sym(X) → Sym(Y )
by the rule:
ν(x^π) = (ν(x))^{ν̄(π)}.
In order to emphasize the distinction between the elements of Sym(X) and
the elements of the set X which are being permuted, we often refer to the
elements of X by the neutral name “letters”. In view of the isomorphism
just recorded, the symmetric group on any finite set of n elements can be
thought of as permuting the set of symbols (or ‘letters’) Ωn := {1, 2, . . . , n}.
This group is denoted Sym(n) and has order n!.
Now suppose H is any subgroup of Sym(X). Say for the moment that
two elements of X are H-related if and only if there exists an element of
H which takes one to the other. It is easy to see that “H-relatedness” is
an equivalence relation on the elements of X. The equivalence classes with
respect to this relation are called the H-orbits of X. Within an H-orbit, it
is possible to move from any one letter to any other, by means of an element
of H. Thus X is partitioned into H-orbits. We say H is transitive on X if
and only if X comprises a single H-orbit.
We now extend these notions in a very useful way. We say that a group
G acts on set X if and only if there is a group homomorphism
f : G → Sym(X).
We refer to both f and the image f (G) as the action of G, and we shall
borrow almost any adjective of a subgroup f (G) of Sym(X), and apply it
to G. Thus we call an f (G)-orbit, a G-orbit, we say “G acts in k orbits on
X” if X partitions into exactly k f (G)-orbits, and we say “G is transitive on
X”, or “acts transitively on X” if and only if f (G) is transitive on X.
If the action f is understood, we write x^g instead of x^{f(g)}. Also the unique G-orbit containing letter x is the set {x^g | g ∈ G}, and is denoted x^G. Its cardinality is called the length of the orbit. The group action is said to be faithful if and only if ker f = 1 – that is, f is an embedding of G as a subgroup of the symmetric group.
The power of group actions derives from the fact that the same group can
have several actions, each of which can yield new information about a group.
We have already met several examples of group actions, although not in this
current language. We list a few
Examples of Group Actions
1. Recall that each rigid rotation of a regular cube induces a permutation
on the following sets:
(a) the four vertex-centered axes.
(b) the three face-centered axes.
(c) the six edge-centered axes.
(d) the six faces.
(e) the eight vertices.
(f) the twelve edges.
If Y is any one of the six sets just listed, and G is the full group of
rigid rotations of the cube, then we obtain a transitive action fY : G →
Sym(Y ) in each of these cases. Except for the case Y is the three
perpendicular face-centered axes (Case (b)), the action is faithful. The
action is an epimorphism onto the symmetric group only in Cases (a)
and (b).
2. If N is a normal subgroup of G, then G acts on the elements of N by
inner automorphisms of G. Thus n^g := g⁻¹ng for each element g of G and each element n of N, giving the group action f : G → Sym(N).
Clearly the identity element of G forms one of the G-orbits and has
length one.
If N = G, the G-orbits xG := {g −1 xg|g ∈ G}, x ∈ G are called conjugacy classes of G. If one is so inclined, one can then myopically
concentrate on the transitive action G → Sym(xG ) of a single conjugacy
class xG .
3. The image H σ of any subgroup H of G under an automorphism σ is
again a subgroup of G, and this is certainly true of inner automorphisms. Thus H g := g −1 Hg is a subgroup of G called a G-conjugate
of H or just conjugate of H. Thus there is an action G → Sym(S(G))
where S(G) is the collection of all subgroups of G. As in the previous
case, this restricts to the transitive action G → Sym(H G ) on the set
H G := {H g |g ∈ G} of all conjugates of the subgroup H.
4. Finally we can take any fixed subset X of elements of G and watch the
action G → Sym(X G ) on the collection X G := {g −1 Xg|g ∈ G} of all
G-conjugates of X.
5. Let Pk = Pk (V ) be the collection of all k-dimensional subspaces of the
vector space V , where k is a natural number. We assume dim V ≥ k
to avoid the possibility that Pk is empty. Since an invertible linear
transformation must preserve the dimension of any finite dimensional
subspace, we obtain an action
f : GL(V ) → Sym(Pk ).
It is necessarily transitive.
6. Similarly, if we have an action f : G → Sym(X), where |X| ≥ k, we
also inherit an action fY : G → Sym(Y ) on the following sets Y :
(a) The set X (k) of ordered k-tuples of elements of X.
(b) the set X (k)∗ of ordered k-tuples with pairwise distinct entries.
(c) the set X(k) of all (unordered) subsets of X of size k.
4.2 Permutations.
A permutation π ∈ Sym(X) is said to displace letter x ∈ X if and only if x^π ≠ x. A permutation is said to be finitary if and only if it displaces a finite number of letters. Products, inverses and conjugates of finitary permutations are always finitary, and so the set of all finitary permutations always forms a normal subgroup of the symmetric group. Any finitary permutation obviously has finite order.
In this subsection we shall study certain arithmetic properties of group
actions which are mostly useful for finite groups.
4.2.1 Cycle Notation.
Now suppose π is a finitary permutation in Sym(X). Letting H be the cyclic subgroup of Sym(X) generated by π, we may partition X into H-orbits. By our self-imposed finitary hypothesis, there are only finitely many H-orbits of length greater than one. Let O be one of these, say of length n. Then for any letter x in O, we see that O = x^H must consist of the set {x, x^π, x^{π²}, . . . , x^{π^{n−1}}}. We represent this permutation of O by the symbol:
(x x^π x^{π²} . . . x^{π^{n−1}})
called a cycle. Stated in this generality, the notation is not very impressive. But in specific instances it is quite useful. For example the notation
(1 2 4 7 6 3) denotes a permutation which takes 1 to 2, takes 2 to 4, takes 4
to 7, takes 7 to 6, takes 6 to 3, and takes 3 to 1. Thus everyone is moved forward one position along a circular trail indicated by the cycle. In particular
the cycle notation is not unique since you can begin anywhere in the cycle:
thus (7 6 3 1 2 4) represents the same permutation just given. Moreover,
writing the cycle with the reverse orientation of the circuit yields the inverse
permutation, (4 2 1 3 6 7) in this example.
Now the generator π of the cyclic group H acts on each H-orbit as a cycle.
This can be restated as the assertion that any finitary permutation can be
represented as a product of disjoint cycles, that is, cycles which pairwise
displace no common letter. Such cycles commute with one another so it
doesn’t matter in which order the cycles are written. Also, if the set X is known, the permutation is determined by its list of disjoint cycles of length greater than one – the cycles of length one (indicating letters that are fixed) need not be mentioned. Thus in Sym(9), the following expressions denote the same permutation:
(1 4 5 9 7)(2 6) = (6 2)(1 4 5 9 7)(8).
Now to multiply two such permutations we simply compose the two permutations – and here the order of the factors does matter. (It is absolutely
necessary that the student learn to compute these compositions.) Suppose
π = (7 6 1)(3 4 8)(2 5 9)   (4.1)
σ = (7, 10, 11)(9 2 4)   (4.2)
Then, applying π first and σ second, the composition of these right operators is
πσ = (1, 10, 11, 7, 6)(2 5)(3 9 4 8).
We simply compute ⟨πσ⟩-orbits. (Following Burnside, commas are introduced into the cycle notation only for the specific purpose of distinguishing a 2-or-more-digit entry from other entries.) The computation begins with an involved letter – say “1” – and asks what πσ does to it: 1^π = 7 and 7^σ = 10, so 1^{πσ} = 10. Then it asks what πσ does to 10 in turn: one gets 10^{πσ} = 11, 11^{πσ} = 7, and so on, until the ⟨πσ⟩-orbit is completed upon returning to 1. One then looks for a new letter displaced by at least one of the two permutations, but which is not involved in the orbit just calculated – say, “2” – and repeats the process.
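This bookkeeping is easily mechanized. The Python sketch below (function names are ad hoc) multiplies permutations given in cycle notation as right operators, recovers the disjoint cycles of πσ above, and also checks the order statement of the lemma that follows:

from math import gcd
from functools import reduce

def perm_from_cycles(cycles):
    p = {}
    for c in cycles:
        for i, x in enumerate(c):
            p[x] = c[(i + 1) % len(c)]
    return p

def compose(p, q):
    # right operators: x^(pq) = (x^p)^q; letters absent from a dict are fixed
    return {x: q.get(p.get(x, x), p.get(x, x)) for x in set(p) | set(q)}

def to_cycles(p):
    seen, cycles = set(), []
    for x in sorted(p):
        if x not in seen and p[x] != x:
            c = [x]
            while p[c[-1]] != x:
                c.append(p[c[-1]])
            seen.update(c)
            cycles.append(tuple(c))
    return cycles

pi    = perm_from_cycles([(7, 6, 1), (3, 4, 8), (2, 5, 9)])
sigma = perm_from_cycles([(7, 10, 11), (9, 2, 4)])
prod  = to_cycles(compose(pi, sigma))
assert prod == [(1, 10, 11, 7, 6), (2, 5), (3, 9, 4, 8)]
# Order of pi*sigma = lcm of the cycle lengths 5, 2, 4 (see the lemma below):
assert reduce(lambda a, b: a * b // gcd(a, b), map(len, prod)) == 20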
We make this observation:
Lemma 68 The order of a finitary permutation expressed as a product of
disjoint cycles is the least common multiple of the lengths of those cycles.
proof: This is an immediate consequence of two facts: (i) disjoint cycles commute, and (ii) the order of a cycle is its length. (See Section 2.3 for the fact that if x and y are commuting elements of finite order in a group, then o(xy) = lcm(o(x), o(y)).)
4.2.2 Even and odd permutations.
We begin with a technical result:
Lemma 69 If k and l are non-negative integers, then
1. (a b)(a x1 . . . xk b y1 . . . yl ) = (a y1 . . . yl )(b x1 . . . xk ), and
2. (a b)(a y1 . . . yl )(b x1 . . . xk ) = (a x1 . . . xk b y1 . . . yl ).
proof: The permutation on the left side of the first equation sends a to y1 ,
sends yi to yi+1 for i < l, and yl to a. Similarly it sends b to x1 , sends xj to
xj+1 for j < k, and sends xk to b. Thus the left side has been expressed as
the product of the two disjoint cycles on the right side of the first equation.
The second equation follows from the first by multiplying both sides of the first equation on the left by the 2-cycle (a b).
Let FinSym(X) denote the group of all finitary permutations of the
set of ‘letters’ X. Each element π of FinSym(X) is expressible as a finite
product of disjoint cycles of lengths d1 , . . . dk greater than one, together with
arbitrarily many cycles of length one. Define the sign of π to be the number
∏_{j=1}^{k} (−1)^{d_j − 1}.
Then the function sgn : FinSym(X) → {±1} is well-defined.
A permutation of Sym(X) which displaces exactly two letters is called
a transposition. Thus every transposition can be expressed in cycle notation by (a b), for the two displaced letters, a and b. Clearly, the sign of a
transposition is (−1)^{2−1} = −1.
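A small Python sketch of the sign computation from the disjoint cycle lengths (ad hoc names):

def sgn(cycle_lengths):
    # sign = product of (-1)^(d_j - 1) over the cycle lengths d_j
    s = 1
    for d in cycle_lengths:
        s *= (-1) ** (d - 1)
    return s

assert sgn([2]) == -1                      # a transposition is odd
assert sgn([5, 2, 4]) == 1                 # pi*sigma of the last section is even
assert sgn([6]) == -1 and sgn([7]) == 1    # an n-cycle has sign (-1)^(n-1)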
We now can state
Lemma 70 If g is a permutation in FinSym(X), and if t is a transposition,
then
sgn(tg) = −sgn(g) = sgn(gt).
proof: The permutation g is a product of disjoint cycles gj of length dj. Thus sgn(g) = ∏_j (−1)^{dj−1}. Suppose t is the transposition (a b). If the letters a and b are not involved in any of the cycles gj, then the result follows
from the formula for the sign of gt = tg since we have simply tacked on the
disjoint cycle (a b) in forming gt.
Suppose, on the other hand, that a and b appear in the same ⟨g⟩-orbit, say the one represented by the cycle gj. Then gj has the form (a x1 . . . xk b y1 . . . yl), where k and l are non-negative. By Lemma 69,
tgj = (a y1 . . . yl)(b x1 . . . xk).
Thus dj = k + l + 2, so sgn(gj) = (−1)^{k+l+1} while sgn(tgj) = (−1)^k (−1)^l, so sgn(tgj) = −sgn(gj). Since tg = g1 · · · (tgj) · · · gm, the result follows from the formula.
Now suppose a occurs in one cycle, say g1 = (a x1 . . . xk), and b occurs in another cycle, which we can take to be g2 = (b y1 . . . yl). Then by Lemma 69,
tg = (tg1g2)g3 · · · gm = (a y1 . . . yl b x1 . . . xk)g3 · · · gm.
Thus
sgn(tg) = sgn(tg1g2) ∏_{i>2} (−1)^{di−1} = (−1) · sgn(g1)sgn(g2) ∏_{i>2} (−1)^{di−1} = −sgn(g).
So the result follows in this case.
Now any cycle (x1 . . . xm) is a product of transpositions
(x1 xm)(xm xm−1) · · · (x3 x2).
Since each element of FinSym(X) is the product of disjoint cycles we see that
FinSym(X) is generated by its transpositions.
It now follows from Lemma 70 that no element can be both a product of
an even number of transpositions as well as a product of an odd number of
transpositions. Thus we have a partition
FinSym(X) = A+ + A−
of all the elements of FinSym(X) into the set A+ of all finitary permutations
which are a product of an even number of transpositions, and A− , all finitary
permutations which are a product of an odd number of transpositions. Right
multiplication by any finitary permutation at best permutes the two sets A+
and A− , and multiplication by a transposition clearly transposes them. Thus
we have a transitive permutation action
sgn : FinSym(X) → Sym({A+ , A− }).
The right side is Sym(2) ' Z2 , isomorphic to the multiplicative group {±1}.
The kernel of this action is the normal subgroup FinAlt(X) := A+ , the
finitary alternating group. Its elements are called the even permutations. The elements in the other coset A− , are called odd permutations.
These terms are used only for finitary permutations. If X is a finite set, there is no distinction gained by singling out the finitary permutations, and
the prefix “Fin” is dropped throughout. So in that case A+ is simply called
the alternating group and is denoted Alt(X) or Alt(n) when |X| = n.
The factorization of a cycle of length n given above shows that any cycle
of even length is an odd permutation and any cycle of odd length is an even
permutation. It only “sounds” confusing; by the formula, the sign of an
n-cycle is (−1)^{n−1}.
4.2.3 Transpositions and Cycles: The Theorem of Feit, Lyndon and Scott
Recall that a transposition of a symmetric group Sym(X) is an element
which displaces exactly two of the letters of the set X. These two letters
must clearly exchange places, otherwise we are dealing with the identity
permutation.
Let T be any collection of transpositions of Sym(X). We can then construct a simple graph G(T) := (X, T) with vertex set X and edge set consisting of the transposed 2-subsets determined by each transposition. (In the context of graphs – as opposed to groups – “simple” actually means something rather simple. A simple graph is one which is undirected without loops or multiple edges. That is, edges connect only distinct vertices (no loops), there is at most one edge connecting two distinct vertices, and there is no orientation to the edges – i.e. edges are 2-subsets of the vertex set.)
Lemma 71 Let T be a set of transpositions of Sym(X). Suppose G(T) := ⟨T⟩ ≤ Sym(X), the subgroup generated by the transpositions of T, acts on X with orbits Xσ of finite length. Then
G(T) ≅ ∑_{σ∈I} Sym(Xσ)
where the Xσ are the G(T)-orbits on X.
proof: First of all, G(T) is a finitary permutation group on X. Let Tσ be the set of transpositions which form an edge of the connected component Xσ of G(T). It is clear that G(T) is the direct sum of the groups
Gσ := ⟨Tσ⟩,
so that all that remains to show is that Gσ acts on Xσ as the full symmetric group on Xσ – a question about actions of finite groups.
So, without loss, we assume that G(T) is transitive on the finite set X of cardinality n. If |T| = 1, then |X| = 2 and there is nothing to prove. We may now proceed by induction on |T|, and we may assume that G(T) is a tree (if the connected graph G(T) contains a circuit, discarding one edge of the circuit leaves a connected graph, and by induction its transpositions already generate the full symmetric group). Now we are free to choose the transposition t ∈ T so that, as an edge of the graph G(T), t connects an “end point” a of valence one to a vertex b. Thus G(T − {t}) is a tree on the vertices X − {a}, and so by induction the stabilizer in G(T) of the vertex a has order (n − 1)!; and so, as G(T) is clearly transitive, it has order n!. Since the action is faithful (its elements fix every single letter of every other orbit), we have here the full symmetric group.
Theorem 72 (Feit, Lyndon and Scott) Let T be a minimal set of transpositions generating Sym(n). Then their product in any order is an n-cycle.
proof: We have seen from above that the minimality of T implies that the graph G(T) is a tree.
Let a = t1t2 · · · t_{n−1} be the product of these transpositions in any particular order. Consider t1 and remove the edge t1 from the graph G(T) to obtain a new graph G′ which is not connected, but has two connected component trees G1 and G2, connected in G(T) by the “bridging” edge t1. Now by an obvious induction, any order of multiplying the transpositions of
Ti := {s ∈ T | s is an edge in Gi},
i = 1, 2, yields a full cycle on the vertices of Gi. Thus
a = t1(x1, . . . , xk)(y1, . . . , yl),
where the second cycle b = (x1, . . . , xk) is the product of the transpositions of T1 as they are encountered in the factorization a = t1 · · · t_{n−1}, and the third cycle c = (y1, . . . , yl) is the product of the transpositions of T2 in the order in which they are encountered in the product given for a. (Recall that every transposition in T1 commutes with every transposition in T2.) Now these two cycles involve two separate orbits bridged by the transposition t1. That the product t1bc is an n-cycle follows directly from Lemma 69, Part 2, above. The proof is complete.
This strange theorem will bear fruit in the proof of the Brauer-Ree Theorem, which takes a bow in our chapter on generation of groups (Chapter
6).
4.2.4 Action on right cosets.
Let H be any subgroup of the group G, and let G/H be the collection {Hg | g ∈ G} of all right cosets of H in G. Note that we have extended our previous notation. G/H used to mean a group of right cosets of a normal subgroup
H of G. It is still the same set (of right cosets of H in G) even when H is
not normal in G. But it no longer has the structure of a group. Nonetheless,
as we shall see, it still admits a right G-action.
For any element g of G and coset Hx in G/H, Hxg is also a right coset
of H in G. Moreover, Hxg = Hyg implies Hx = Hy, while Hx = (Hxg −1 )g.
Thus right multiplication of all the right cosets of H in G by the element g
induces a bijection πg : G/H → G/H. Moreover for elements g, h ∈ G, the
identity Hx(gh) = ((Hx)g)h, forced by the associative law, implies πg πh =
πgh as right operators on G/H. Thus we have a group action
πH : G → Sym(G/H)
which takes g to the permutation πg , induced by right multiplication of right
cosets by g. This action is always transitive, for coset Hx is mapped to coset Hy by π_{x⁻¹y}.
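A small Python sketch of this coset action (the realization of G = Sym(3) as tuples, and the choice of H, are ad hoc):

from itertools import permutations

G = list(permutations(range(3)))           # Sym(3): p[i] is the image of i
mul = lambda p, q: tuple(q[p[i]] for i in range(3))   # right operators: p, then q
H = [p for p in G if p[2] == 2]            # the subgroup fixing the letter 2
coset = lambda x: frozenset(mul(h, x) for h in H)     # the right coset Hx
cosets = {coset(x) for x in G}
assert len(cosets) == 3                    # [G : H] = 3
for g in G:                                # pi_g : Hx -> Hxg permutes G/H
    assert {frozenset(mul(x, g) for x in c) for c in cosets} == cosets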
4.2.5 Equivalent actions.
Suppose a group G acts on sets X and Y – that is, there are group homomorphisms fX : G → Sym(X) and fY : G → Sym(Y ). We say that the two
actions are equivalent if and only if there is a bijection e : X → Y such
that for each letter x ∈ X, and element g ∈ G,
e(x^{fX(g)}) = (e(x))^{fY(g)}.
That is, the action is the same except that we have used the bijection e to
“change the names of the elements of X to those of Y ”.
4.2.6 The Fundamental Theorem of Transitive Actions.
First we make an observation:
Lemma 73 Let f : G → Sym(X) be a transitive action of the group G. For
each letter x ∈ X, let Gx be the subgroup of all elements of G which leave
the letter x fixed – i.e. Gx := {g ∈ G|xg = x}.
1. If x, y ∈ X, then Gx and Gy are conjugate subgroups of G. Precisely,
Gy = g −1 Gx g whenever xg = y.
2. The set {g ∈ G|xg = y} of all elements of G taking x to y, is a right
coset of Gx .
proof: 1. If x^g = y, then y^{g⁻¹Gxg} = x^{gg⁻¹Gxg} = x^{Gxg} = x^g = y, so g⁻¹Gxg ⊆ Gy. But since x = y^{g⁻¹}, we have Gy ⊆ g⁻¹Gxg by the same token. Thus the first containment is reversible and g⁻¹Gxg = Gy.
2. If x^g = y then clearly x^h = y for all h ∈ Gxg. Conversely, if x^g = x^h = y, then gh⁻¹ ∈ Gx, so Gxg = Gxh.
Theorem 74 (The Fundamental Theorem of Transitive Group Actions.)
Suppose f : G → Sym(X) is a transitive group action. Then for any letter x in X, f is equivalent to the action
πGx : G → Sym(G/Gx ),
of G on the right cosets of Gx by right multiplication.
proof: We must construct the bijection e : X → G/Gx compatible with
both actions. For each letter y ∈ X set e(y) := {g ∈ G|xg = y}. We have
seen in Lemma 73 that the latter is a right coset Gx h of Gx . Now for any
element g in G, e(y g ) is the set of all elements of G which take x to y g . This
clearly contains Gx hg; yet by Lemma 73, it must itself be a right coset of
Gx . Thus e(y g ) is the right multiple e(y)g, establishing the compatibility of
the actions.
Corollary 75
1. If G acts on the set X, then every G-orbit has length dividing the order
of G.
2. If G acts transitively on X, then for any subgroup H of G which also
acts transitively on X, one has G = Gx H, for any x ∈ X.
proof: The action of G on any G-orbit O is transitive. Applying the fundamental theorem we see that |O| = |G/Gx | for any letter x in O.
4.2.7 Normal subgroups of transitive groups.
If G acts transitively on a set X, we say that G acts regularly on X if and
only if for some x ∈ X, Gx = 1.
Lemma 76 Suppose G acts transitively on X.
1. If N is a normal subgroup of G, then all N -orbits on X have the same
length.
2. In particular, if N is a non-identity normal subgroup of G lying in a
subgroup Gx , for some x in X, then N acts trivially on X, and the
action is not faithful.
3. The faithful transitive action of an abelian group is always a regular
action.
4. Suppose N is a normal subgroup of G which acts regularly on X. Then
for any x ∈ X, G = Gx N , with Gx ∩ N = 1, and the action of
Gx on X − {x} (by restriction) is equivalent to that action of Gx on
N − {1}, the set of all non-identity elements, induced by conjugation
of the elements of N by the elements of Gx .
proof: 1. For any g ∈ G and x ∈ X, the mapping
x^N → (x^N)^g = x^{Ng} = x^{gN} = (x^g)^N
is a bijection of N -orbits. Since G is transitive, all N -orbits can be gotten
this way.
2. This is immediate from 1, since N has an orbit of length 1.
3. If A is an abelian subgroup acting transitively on X, for any x ∈ X,
Ax acts trivially on X by part 2. Since A acts faithfully, Ax = 1, so A has
the regular action.
4. That G = GxN and Gx ∩ N = 1 follows from Corollary 75, part 2, and the fact that N is regular.
Since the normal subgroup N is regular, there is a bijection e_x : N → X = x^N which sends element n ∈ N to x^n. For each element h ∈ Gx,
e_x(h⁻¹nh) = x^{h⁻¹nh} = (x^n)^h = e_x(n)^h.
So e_x defines an equivalence of the action of Gx on N by conjugation and the
action of Gx on X obtained by restriction. Throwing away the corresponding
Gx -orbits {1} and {x} of length 1 yields the last statements of 4.
4.2.8 Double cosets.
Let H and K be subgroups of a group G. Any subset of the form HgK is
called an (H, K)-double coset. Such a set is, on the one hand, a disjoint union of right cosets of H, and on the other, a disjoint union of left cosets of K. So its cardinality must be a common multiple of |H| and |K|. The
precise cardinality is given in the following:
Lemma 77 Let H and K be subgroups of G.
1. |HgK| = |H| · [K : K ∩ g⁻¹Hg] = |H||K| / |gKg⁻¹ ∩ H| = [H : H ∩ gKg⁻¹] · |K|.
2. The (H, K)-double cosets partition the elements of G.
proof: In the action πH of G on G/H by right multiplication, the K-orbits are precisely the right cosets of H within an (H, K)-double coset. Since G/H is partitioned by such orbits, and the right cosets of H partition G, part 2 follows.
The length of a K-orbit on G/H is the index of the subgroup of K fixing one of its “letters”, say Hg. This subgroup is {k ∈ K | Hgk = Hg} = K ∩ g⁻¹Hg. Since each right coset of H has |H| elements of G, the second and third terms of the equations in Part 1 give |HgK|. The last term appears from the symmetry of H and K.
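The count in part 1 can be checked by machine; here is an ad hoc Python sketch in G = Sym(3), with H and K the stabilizers of two letters:

from itertools import permutations

G = list(permutations(range(3)))
mul = lambda p, q: tuple(q[p[i]] for i in range(3))
inv = lambda p: tuple(sorted(range(3), key=lambda i: p[i]))
H = {p for p in G if p[0] == 0}
K = {p for p in G if p[2] == 2}
for g in G:
    HgK = {mul(mul(h, g), k) for h in H for k in K}
    gKg = {mul(mul(g, k), inv(g)) for k in K}        # the conjugate gKg^{-1}
    assert len(HgK) == len(H) * len(K) // len(gKg & H)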
4.3 Applications of transitive group actions.
For any subset X of a group G, the normalizer of X in G is the set of
all elements NG(X) := {g ∈ G | g⁻¹Xg = X}. Now G acts transitively by conjugation on X^G = {g⁻¹Xg | g ∈ G}, the set of distinct conjugates of X,
and the normalizer NG (X) is just the subgroup fixing one of the “letters” of
X G . Thus
Lemma 78
1. Let X be a subset of G. The number |X G | of distinct conjugates of X
in G is the index [G : NG (X)] of the normalizer of X in G. This holds
when X is a subgroup, as well.
2. The size of a conjugacy class xG in G is the index of the centralizer
CG (x) in G.
4.3.1 Finite p-groups.
A finite p-group is a group whose order is a power of a prime number p.
The trivial group of order 1 is always a p-group. We have
Lemma 79
1. If a p-group acts on a set X of cardinality not divisible by the prime p,
then it fixes a letter.
2. A non-trivial finite p-group has a nontrivial center.
3. If H is a proper subgroup of a finite p-group P , then H is properly
contained in its normalizer in P , that is NP (H).
proof: 1. Let P be a finite p-group, say of order pn . Then any P -orbit
on X has length a power of p. Thus if no orbit had length 1, p would divide
|X|, since X partitions into such orbits. But as p does not divide |X|, some
P -orbit must have length 1, implying the conclusion.
2. Let P be as in part 1, but of order pn > 1. Now P acts by conjugation
on the set P − {1} of pn − 1 non-identity elements. Since p does not divide
the cardinality of this set, part 1 implies there is a non-identity element z
left fixed by this action. That is, g −1 zg = z, for all g ∈ P . Clearly z is a
non-identity element of the center Z(P ), our conclusion.
3. Suppose H is a proper subgroup of a p-group P. By way of contradiction, assume that H = NP(H). Then certainly Z(P) ≤ H. By Part 2, Z(P) ≠ 1. If H = Z(P), then H ⊴ P and NP(H) = P > H, a contradiction. Thus, under the homomorphism
f : P → P̄ := P/Z(P),
H maps to a non-trivial proper subgroup f(H) := H̄ of the p-group P̄. By induction on |P̄|, N_{P̄}(H̄) properly contains H̄. But the preimage of the left side is
{x ∈ P | x⁻¹Hx ⊆ HZ(P) = H} = NP(H) = H,
while the preimage of H̄ itself is H; since the first preimage must properly contain the second, this contradicts NP(H) = H.
4.3.2 The Sylow Theorems
Theorem 80 (Sylow's Theorem) Let G be a group of order n and let p^k be the highest power of the prime p that divides the group order n. Then
the following hold:
1. (Existence) G contains a subgroup P of order pk .
2. (Covering and Conjugacy) Every p-subgroup R of G is contained in
some conjugate of P . In particular, any subgroup of order pk is conjugate in G to P .
3. (Arithmetic) The number of subgroups of order pk is congruent to
1 mod p.
proof: 1. Let Σ denote the collection of all subsets of G of cardinality p^k. The number of such subsets is the binomial coefficient |Σ| = (n choose p^k). Since, for each r with 0 ≤ r < p^k, the numbers n − r and p^k − r are divisible by the same highest power of p, the number |Σ| is not divisible by
the prime p. Now G acts on Σ by right multiplication – that is, if X ∈ Σ then
Xg ∈ Σ, for any group element g. Thus Σ partitions into G-orbits under
this action, and not all of these orbits can have length divisible by p, since
p doesn’t divide |Σ|. Let Σ1 be such a G-orbit of length not divisible by p.
By the Fundamental theorem of transitive group actions (Theorem 74), the
number of sets in Σ1 is the index of the subgroup GX fixing the “letter” X
in Σ1 . Since n = |G| = [G : GX ]|GX | = |Σ1 ||Gx |, we see that pk divides
|GX |. But by definition, XGX = X, so X is a union of left cosets of GX , so
|GX | divides |X| = pk . It follows that GX has order pk exactly. So GX is the
desired subgroup P of statement 1.
2. Let R be any p-subgroup of G. Then R also acts on Σ1 by right
multiplication. Since p does not divide |Σ1 |, Lemma 79, part 1., shows that
R must fix a letter Y in Σ1 . Thus R ≤ GY . But as G is transitive on Σ1 ,
Lemma 73 shows that GY is conjugate to P = GX . Thus R lies in a conjugate
of P as required.
3. Now let S denote the collection of all subgroups of G having order
pk exactly. By parts 1. and 2., already proved, S is non-empty, and G acts
transitively on S by conjugation. Thus by Lemma 78 part 1., |S| = [G : N ],
where N := NG (P ), the normalizer in G of a subgroup P in S. Now P itself
acts on S by conjugation, fixing itself, and acting on S − {P } in orbits of
p-power length. Suppose {R} were such a P -orbit in S − {P } of length one.
Then P normalizes R so P R is a subgroup of G. Then |P R| = |P |·[R : P ∩R],
(Lemma 63, Part 2.) a product of pk and [R : P ∩ R], another power of p.
Since |P R| divides n and pk is the largest p-power dividing n, one must
conclude that [R : P ∩ R] = 1, which forces P = R, contrary to the choice of R in S − {P}. Thus P acts on S − {P} in P-orbits of lengths divisible by p.
This yields |S| ≡ 1 mod p.
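The theorem is easily confirmed numerically in a small case. The ad hoc Python sketch below does so in G = Sym(4), where |G| = 24 = 2³ · 3: there are three 2-Sylow subgroups (3 ≡ 1 mod 2) and four 3-Sylow subgroups (4 ≡ 1 mod 3).

from itertools import permutations

G = list(permutations(range(4)))
mul = lambda p, q: tuple(q[p[i]] for i in range(4))
inv = lambda p: tuple(sorted(range(4), key=lambda i: p[i]))

def closure(gens):
    # the subgroup generated by gens, by repeated multiplication
    S = {tuple(range(4))} | set(gens)
    while True:
        new = {mul(a, b) for a in S for b in S} - S
        if not new:
            return frozenset(S)
        S |= new

P = closure([(1, 2, 3, 0), (2, 1, 0, 3)])   # <(0 1 2 3), (0 2)>: dihedral of order 8
Q = closure([(1, 2, 0, 3)])                 # <(0 1 2)>: order 3
assert len(P) == 8 and len(Q) == 3
conj = lambda S, g: frozenset(mul(mul(inv(g), s), g) for s in S)
assert len({conj(P, g) for g in G}) == 3    # three 2-Sylow subgroups
assert len({conj(Q, g) for g in G}) == 4    # four 3-Sylow subgroups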
The subgroups of G of maximal p-power order are called the p-sylow
subgroups of G, and the single conjugacy class which they form is denoted
Sylp (G).
Corollary 81 (The Frattini Argument) Suppose N is a normal subgroup of
the finite group G, and select P ∈ Sylp (N ). Then G = NG (P )N .
proof: We know that conjugation by elements of G induces automorphisms
of the normal subgroup N. Since Sylp(N) is the full collection of subgroups of N of their (maximal p-power) order, G acts on Sylp(N) by conjugation. But by Sylow's Theorem
(Theorem 80) the subgroup N is already transitive on Sylp(N). The result now follows from Corollary 75, part 2.
Corollary 82
1. Any normal p-Sylow subgroup of a finite group is in fact characteristic
in it.
2. Any finite abelian group is the direct product of its p-sylow subgroups,
that is, A ≅ S1 × S2 × · · · × Sn, where Si is the pi-Sylow subgroup of A, and {p1, . . . , pn} lists all the distinct prime divisors of |A| exactly once each. Moreover, we have:
Aut(A) ≅ Aut(S1) × Aut(S2) × · · · × Aut(Sn).
3. The Euler phi-function is multiplicative, that is,
φ(n) = φ(a)φ(b)
whenever n = ab and gcd(a, b) = 1.
proof: 1. Any normal p-sylow subgroup is the unique subgroup of its order,
and hence is characteristic.
2. Since in a finite abelian group each p-Sylow subgroup is normal, and has order coprime to that of the direct product of the remaining r-Sylow subgroups (r ≠ p), A is the described internal direct product by Lemma 65, part 3. Since by 1 each Si is characteristic in A, each automorphism σ of A
induces an automorphism σi of Si . But conversely, if we apply all possible
automorphisms σi to the direct factor Si and the identity automorphism to
all other p-sylow subgroups Sj , j 6= i, we obtain a subgroup Bi of Aut(A).
Now the internal direct product characterization of Lemma 65 applies to the
Bi to yield the conclusion.
3. Applying 2 when A is the cyclic group Zn of order n, one obtains
φ(n) = ∏_{i=1}^{r} φ(p_i^{a_i})
when n has prime factorization n = p1^{a1} p2^{a2} · · · pr^{ar}, upon equating group orders.
Lemma 83 (The Tail-Wags-the-Dog Lemma.) Suppose Γ = (V, E) is a
bipartite graph with vertices in two parts V1 and V2 , and each edge of E
involving one vertex from V1 and one vertex from V2. Suppose G is a group of automorphisms of the graph Γ acting on the vertex set V with V1 and V2 as
its two orbits. Then the following conditions are equivalent.
1. G acts transitively on the edges of Γ.
2. For each vertex v1 ∈ V1 , the subgroup G1 of G fixing v1 acts transitively
on the edges on v1
3. For each vertex v2 in V2 , the subgroup G2 of G fixing v2 acts transitively
on the edges of Γ on vertex v2 .
proof: By the symmetry of the Vi, it suffices to prove the equivalence of 1 and 2. If G transitively permutes the edges, any edge on v1 must be moved to any other edge on v1 by some element g. But since v1 is the only vertex of the G-orbit V1 incident with these edges, such an element g fixes v1. So 2 holds. Conversely, suppose the subgroup G1 transitively permutes the edges on v1. Since G is transitive on V1, any edge meeting V1 at a single vertex can be taken to any other such edge. But by hypothesis, all edges have this property.
Theorem 84 (The Burnside Fusion Theorem) Let G be a finite group, and
let X1 and X2 be two normal subsets of a p-sylow subgroup P of G. Then X1
and X2 are conjugate in G if and only if they are conjugate in the normalizer
in G of P . In particular, any two elements of the center of P are conjugate
if and only if they are conjugate in NG (P ).
proof: Obviously, if the Xi are conjugate in NG (P ), they are conjugate in
G. So assume the Xi belong to X := X1G . We now form a graph whose vertex
set is X ∪ S where S := Sylp (G). An edge will be any pair (Y, R) ∈ X × S
for which the subset Y is a normal subset of the p-sylow group R. Then
G acts by conjugation on the vertex set V with just two orbits, X and S
(the first by construction, the second by Sylow’s Theorem). Moreover the
conjugation action preserves the normal-subset relationship on X ×S, and so
G acts as a group of automorphisms of our graph, which is clearly bipartite.
Now by Sylow’s theorem, the subgroup G1 := NG (X1 ) acts transitively by
conjugation on Sylp (G1 ). But since X1 is normal in some p-sylow subgroup
(namely P ),
Sylp(G1) ⊆ Sylp(G).
Thus Sylp (G1 ) are the S-vertices of all the edges of our graph which lie on
vertex X1 . We now have the hypothesis 2. of the Tail-Wags-the-Dog Lemma.
So the assertion 3. of that lemma must hold. In this context, it asserts that
the stabilizer G2 of a vertex of S – for example, NG (P ) – is transitive on
the members of X which are normal in P . But that is exactly the desired
conclusion.
Theorem 85 (Thompson Transfer Theorem) Suppose N is a subgroup of
index 2 in a 2-sylow subgroup S of a finite group G. Suppose t is an involution
in S − N, which is not conjugate to any element of N. Then G contains a normal subgroup M of index 2.
proof: Let G act on G/N , the set of right cosets of N in G by right
multiplication. In this action, the involution t cannot fix a letter, for if
N xt = N x, then xtx−1 ∈ N , contrary to assumption. Thus t acts as [G :
S] 2-cycles on G/N . Since this makes t the product of an odd number of
transpositions in Sym(G/N), we see that the image of G in the action
f : G → Sym(G/N ) meets the alternating group Alt(G/N ) at a subgroup of
index 2. Its preimage M is the desired normal subgroup of G.
4.3.3 Calculations of Group Orders.
The Fundamental Theorem of Transitive Group Actions is quite useful in
calculating the orders of the automorphism groups of various objects. We
peruse a few examples here.
Figure 4.1: Two views of Petersen’s graph.
(Petersen's Graph) The reader is invited to verify that the two graphs which appear in Figure 4.1 are indeed isomorphic (vertices with corresponding numerical labels are matched under the isomorphism). Presenting them in these two ways reveals certain automorphisms. In the left hand figure, vertex 1 is adjacent to vertices 2, 3, and 4. The remaining vertices form a hexagon, with three antipodal pairs, (5, 8), (6, 9) and (7, 10), which are connected to 2, 4 and 3, respectively. Any symmetry of the hexagon induces a permutation of these
antipodal pairs, and hence induces a corresponding permutation of {2, 3, 4}.
Thus, if we rotate the hexagon clockwise we obtain an automorphism of the
graph which permutes the vertices as:
y = (1)(2 3 4)(5 10 9 8 7 6).
But if we reflect the graph about its vertical axis we obtain
t = (1)(2 4)(3)(5 9)(6 8)(7)(10).
Now any automorphism of the graph which fixes vertex 1 must induce some
symmetry of the hexagon on the six vertices not adjacent to vertex 1. An automorphism inducing the identity symmetry preserves the three antipodal pairs, and so fixes vertices 2, 3 and 4, as well as vertex 1 – i.e. it is the identity element.
Thus if G is the full automorphism group of the Petersen graph, G1 is the dihedral group ⟨y, t⟩ of order 12, acting in the orbits {1}, {2, 3, 4}, {5, 6, 7, 8, 9, 10}.
On the other hand, the right hand figure reveals an obvious automorphism
of order five, namely
u = (1 2 5 6 4)(3 8 10 7 9).
That G is transitive comes from overlaying the three orbits of G1 , and the
two orbits of ⟨u⟩ of length five. Together, one can travel from any vertex
to any other, by hopping a ride on each orbit, and changing orbits where
they overlap. This forces [G : G1 ] = 10, the number of vertices. Since
we have already determined |G1 | = 12, we have determined that the full
automorphism group of this graph has order |G| = [G : G1]|G1| = 10 · 12 = 120.
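Readers who like machine verification can confirm this count. The Python sketch below encodes the left-hand figure as described above (the six “spoke” edges are reconstructed from the antipodal-pair description, so that edge list is an assumption), checks that y, t and u are automorphisms, and finds that they already generate a group of order 120:

edges = {frozenset(e) for e in [
    (1, 2), (1, 3), (1, 4),                            # vertex 1 to 2, 3, 4
    (5, 6), (6, 7), (7, 8), (8, 9), (9, 10), (10, 5),  # the hexagon
    (2, 5), (2, 8), (4, 6), (4, 9), (3, 7), (3, 10)]}  # antipodal pairs to 2, 4, 3

def perm(cycles):
    p = {x: x for x in range(1, 11)}
    for c in cycles:
        for i, x in enumerate(c):
            p[x] = c[(i + 1) % len(c)]
    return tuple(p[x] for x in range(1, 11))

y = perm([(2, 3, 4), (5, 10, 9, 8, 7, 6)])
t = perm([(2, 4), (5, 9), (6, 8)])
u = perm([(1, 2, 5, 6, 4), (3, 8, 10, 7, 9)])

def is_aut(p):
    return all(frozenset((p[a - 1], p[b - 1])) in edges
               for a, b in map(tuple, edges))

assert all(is_aut(g) for g in (y, t, u))

mul = lambda p, q: tuple(q[p[v - 1] - 1] for v in range(1, 11))
group, frontier = set(), {tuple(range(1, 11))}
while frontier:                       # closure of <y, t, u> under multiplication
    group |= frontier
    frontier = {mul(a, g) for a in group for g in (y, t, u)} - group
assert len(group) == 120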
Figure 4.2: The configuration of the projective plane of order 2.
(The Projective Plane of Order 2.) Now consider a system P of seven points,
which we shall name by the integers {0, 1, 2, 3, 4, 5, 6}. We introduce a system
L of seven triplets of points, which we call lines:
L1 := [1, 2, 4]
L2 := [2, 3, 5]
L3 := [3, 4, 6]
L4 := [4, 5, 0]
L5 := [5, 6, 1]
L6 := [6, 0, 2]
L0 := [0, 1, 3]
This system is depicted by the six straight lines and the unique circle in Figure 4.2 above. Automorphisms of the system (P, L) are those permutations of the seven points which take lines to lines – i.e. preserve the system L. We shall determine the order of the group G of all automorphisms. First, from the way lines were defined above, as translates mod 7 of L1, we have an automorphism
u = (0 1 2 3 4 5 6) : (L1 L2 L3 L4 L5 L6 L0).
Thus G is certainly transitive on points, so the subgroup G1 fixing point 1 has index 7. There are three lines on point 1, namely L0, L1 and L5. There
is an obvious reflection of the configuration of Figure 4.2 about the axis
formed by line L5 , giving the automorphism
s = (1)(6)(5)(0 4)(3 2) : (L0 L1 )(L3 L6 )(L2 )(L4 )(L5 ).
But not quite so obvious is the fact that t = (0)(1)(3)(2 6)(4 5) induces the permutation (L0)(L1 L5)(L2 L3)(L4)(L6) and so is an automorphism fixing 1 and L5 and transposing L0 and L1. Thus G1 acts as the full symmetric group on the set {L0, L1, L5} of three lines on point 1. Consequently the kernel
of this action is a normal subgroup K of G1, of index 6. Now K stabilizes
all three lines {L0 , L1 , L5 }, and contains
r = (0 3)(1)(2 4)(5)(6) : (L0 )(L1 )(L2 L4 )(L3 L6 )(L5 ).
So K acts as a transposition on the remaining points of L0 beyond point 1.
Thus K in turn has a subgroup K1 fixing L0 pointwise, and stabilizing L1 and L5. K1, in turn, contains
v = (0)(1)(2 4)(3)(5 6) : (L0)(L1)(L5)(L2 L3)(L4 L6).
K1 in turn contains a subgroup of index 2 whose elements fix 0, 1, 3, 5
and 6. One can see that any such element stabilizes every line (for every pair
of fixed points determines a fixed line), and hence all intersections of such
lines, and hence fixes all points. Thus K1 has order only 2. We now have
|G| = [G : G1 ][G1 : K][K : K1 ]|K1 | = 7 · 6 · 2 · 2 = 168.
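As with the Petersen graph, the count can be confirmed by machine. The ad hoc sketch below checks that the five displayed permutations u, s, t, r, v preserve the line set and generate a group of order 168:

lines = {frozenset(L) for L in [
    (1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 0),
    (5, 6, 1), (6, 0, 2), (0, 1, 3)]}

def perm(cycles):
    p = {x: x for x in range(7)}
    for c in cycles:
        for i, x in enumerate(c):
            p[x] = c[(i + 1) % len(c)]
    return tuple(p[x] for x in range(7))

u = perm([(0, 1, 2, 3, 4, 5, 6)])
s = perm([(0, 4), (3, 2)])
t = perm([(2, 6), (4, 5)])
r = perm([(0, 3), (2, 4)])
v = perm([(2, 4), (5, 6)])
gens = (u, s, t, r, v)

# each generator must send every line to a line:
assert all(all(frozenset(p[x] for x in L) in lines for L in lines) for p in gens)

mul = lambda p, q: tuple(q[p[x]] for x in range(7))
group, frontier = set(), {tuple(range(7))}
while frontier:
    group |= frontier
    frontier = {mul(a, g) for a in group for g in gens} - group
assert len(group) == 168      # = 7 * 6 * 2 * 2, matching the count above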
4.4 Primitive and Multiply Transitive Actions.
4.4.1 Primitivity
Suppose G acts on a set X. A system of imprimitivity for the action of
G is a non-trivial partition of X which is stabilized by G. Thus X is a disjoint
union of two or more proper subsets Xi , not all of cardinality one, such that
for each g ∈ G, Xig = Xj for some j – that is, G permutes the components of
the partition among themselves. Of course, if G is transitive in its action on
X, the components Xi all have the same cardinality (evidently one dividing
the cardinality of X), and are transitively permuted among themselves.
As an example, the symmetric group Sym(n) acts on all of its elements
by right multiplication (the regular representation). But these elements were
partitioned as Sym(n) = A+ ∪ A− , into those elements which were a product
of an even number of transpositions (A+ ) and those which were a product of
an odd number of transpositions (A− ). Right multiplication permutes these
sets, and they form a system of imprimitivity when n > 2.
We have seen that when N is a non-trivial normal intransitive subgroup of
a group G acting faithfully on X, then the N -orbits form a system of imprimitivity (Lemma 76 considers the case of N -orbits when G acts transitively on
X.)
If a transitive group action preserves no system of imprimitivity, it is said to be a primitive group action. We understand this to mean that primitive
groups are always transitive. The immediate characterization is this:
Theorem 86 Suppose G acts transitively on X, where |X| > 1. Then G
acts primitively if and only if Gx is a maximal subgroup of G for some x in X.
proof: Since G is transitive, all the subgroups Gx are conjugate in G (see
Lemma 73, part 1). By the fundamental theorem of group actions 74, we
may regard this action as that of right multiplication of the right cosets of
Gx. Now if Gx < H < G, then each right coset of H is a union of right cosets of Gx. In this way the right cosets of H partition G/Gx to form a system of imprimitivity. Thus in the primitive case, no such H exists, so Gx is G or is maximal in G. The former case is ruled out by |X| > 1.
Conversely, if X = X1 + X2 + · · · is a system of imprimitivity, then all the Xi have the same cardinality, and for x ∈ X1, Gx stabilizes the set X1. But by transitivity of G, if L = StabG(X1) is the stabilizer in G of the component X1, then L must act transitively on X1. Since |X1| ≠ 1, Gx is a proper subgroup of L. Since X1 ≠ X, L is a proper subgroup of G. Thus transitive but imprimitive action forces Gx not to be maximal. That is, Gx maximal implies primitive action of G on G/Gx and hence on X.
We immediately have:
Corollary 87 If G has primitive action on X, then for any normal subgroup
N of G either (i) N acts trivially on X (so either N = 1 or the action is not
faithful), or (ii) N is transitive on X.
4.4.2 The rank of a group action.
Suppose G acts transitively on a set X. The rank of the group action
is the number of orbits which the group G induces on the cartesian product
X × X. Here an element g of G takes the ordered pair (x, y) of X × X to
(xg , y g ). Since G is transitive, one such orbit is always the diagonal orbital
D = {(x, x)|x ∈ X}. The total number of such “orbitals” – as the orbits
on X × X are called – is the rank of the group action.
Basically, the non-diagonal orbitals mark the possible binary relations on
the set X which could be preserved by G. Such a relation is defined by
asserting that letter x has relation Ri to letter y (which we write as xRiy) if and only if the ordered pair (x, y) is a member of the non-diagonal orbital Oi (a G-orbit on X × X). We could then represent this relation by a directed graph Γi = (X, Ei), where the ordered pair (x, y) is a directed edge of Ei if and only if xRiy holds. In that case G acts as a group of automorphisms of the graph Γi – that is, G ≤ Aut(Γi).
If Oi is an orbital, there is also an orbital Oi* consisting of all pairs (y, x) for which (x, y) belongs to Oi. Clearly Oi* has the same cardinality as Oi. The orbital Oi is symmetric if and only if (x, y) ∈ Oi implies (y, x) ∈ Oi, that is, Oi = Oi*. In this case the relation Ri is actually a symmetric relation, and if Oi ≠ D, then the graph Γi is a simple undirected graph (for graphs, the word “simple” means the graph has no multiple edges of the same orientation and no loops). Thus representing groups as groups of automorphisms of graphs has a natural setting in permutation groups of rank 3 or more.
Now fix a letter x in X. If (u, v) is in the orbital Oi, there exists by transitivity an element g such that u^g = x. Thus the orbital Oi possesses a representative (x, v^g), with first coordinate equal to x. Moreover, if (x, v) and (x, w) are two such elements of Oi with first coordinate x, the fact that Oi is a G-orbit shows that there is an element h in Gx taking (x, v) to (x, w). That is, v and w must belong to the same Gx-orbit on X. Thus there is a one-to-one correspondence between the orbitals of the action of G on X and Gx-orbits on X. The lengths of these orbits are called the subdegrees of the permutation action of G on X. By the Fundamental Theorem of Group Actions (Theorem 74), these Gx-orbits on X (or equivalently, on G/Gx) correspond to (Gx, Gx)-double cosets of G, and the number of right cosets of Gx in GxgGx is [Gx : Gx ∩ g⁻¹Gxg].
We summarize most of this in the following:
Theorem 88 Suppose G acts transitively on X.
1. Then the rank of this action is the number of Gx -orbits on X.
2. Their lengths are called subdegrees and are the numbers
[Gx : Gx ∩ g −1 Gx g]
as g ranges over a full system of (Gx , Gx )-double coset representatives.
3. The correspondence of an orbital Oi and a (Gx, Gx)-double coset is that (Gx, Gxh) ∈ Oi if and only if the coset Gxh is in the corresponding (Gx, Gx)-double coset, namely GxhGx. Then
|Oi| = |X| · [Gx : Gx ∩ h⁻¹Gxh].
Clearly Oi and Oi∗ correspond to Gx -orbits of the same size. In particular, if an orbital is not symmetric, then there must be two subdegrees
of the same size.
4. If a double coset Gx gGx contains an involution t, then the corresponding
orbital must be symmetric.
remark: All but item 4 was developed in the discussion preceding. If t is an involution in GxgGx, then t inverts the edge (Gx, Gxt) in the graph Γi.
Let’s look at a few simple examples.
1. (Grassmannian Action) Suppose V is a finite-dimensional vector space
of dimension at least 2. As we have done before, let Pk (V ) be the
collection of all k-dimensional subspaces, where k ≤ (dim V )/2. Then
the group GL(V ) acts transitively on Pk (V ), as noted earlier. But
because of the inequality on dimensions, a given k-subspace W may
intersect other k-subspaces at any dimension from 0 to k. Moreover,
the stabilizer H in GL(V ) of the subspace W is transitive on all further
k-subspaces which meet W at a specific dimension. Thus we see that
GL(V ) acts on Pk (V ) with an action which is rank k + 1.
2. (Symmetric Groups) If k ≤ n/2, the group G = Sym(n) transitively permutes A(k), the k-subsets of A = {1, 2, . . . , n}. Again, the stabilizer
H of a fixed k-subset U is transitive on the set of all k-subsets which meet U in a subset of fixed cardinality. Thus G = Sym(n) has rank k + 1 in its action on all k-subsets.
For example, G = Sym(7) acts faithfully on the 35 points of A(3), as
a rank four group with subdegrees 1, 12, 18, 4.
G = Sym(6) acts as a rank 4 group on the 20 letters of A(3), with
subdegrees 1, 9, 9, 1.
Similarly, G = Sym(5) acts as a rank 3 group on the 10 letters of
A(2), with subdegrees 1, 3, 6. If Oi is the orbital of 30 ordered pairs
corresponding to the subdegree 3, then, as this orbital is symmetric,
it contributes 15 undirected edges on the 10 vertices, yielding a graph
Γi = (A(2), Ei ) isomorphic to Petersen’s graph, whose automorphism
group of order 120 we met in the last section.
3. We have mentioned that G turns out to be a subgroup of the automorphism group of the graph Γi . It need not be the full automorphism
group, and whether it is or not depends on which orbital Oi one uses.
For example, we have just seen above that the group G = Sym(7) has
a rank 4 action on the 35 3-subsets chosen from the set A of 7 letters,
with subdegrees 1, 12, 18, 4. Now it happens that the graph Γ3 corresponding to the subdegree 18 has diameter 2, and its full automorphism
group is Sym(8).
4. This might be a good time to introduce an example of a rank 3 permutation group with two non-symmetric orbitals. Consider the group
G = AQ of all affine transformations
x → sx + y (x ∈ Zp ), where y ∈ Zp and s is a non-zero square,
where Zp denotes the additive group of residue classes modulo a prime p ≡ 3 mod 4. Here A denotes the additive group of translations {ty : x → x + y, y ∈ Zp }, and Q denotes the group of multiplications by quadratic
residue classes modulo p. Then G acts faithfully on G/Q, with A as a
normal regular abelian subgroup. Now the group G has order p(p−1)/2
which is odd. G is rank three with subdegrees 1, (p − 1)/2, (p − 1)/2,
and the non-diagonal orbitals are not symmetric. By Lemma 76, the
graph Γ can be described as follows: The vertices are the residue classes
[m] mod p. Two of them are adjacent if and only if they differ by a
quadratic residue class. Since −1 is not a quadratic residue here, this
adjacency relation is not symmetric.
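The failure of symmetry is easy to verify numerically. Here is a minimal Python check for the prime p = 7 (our own illustration); the non-zero squares mod 7 are {1, 2, 4}, and −1 ≡ 6 is not among them:

    p = 7
    squares = {(x * x) % p for x in range(1, p)}   # non-zero quadratic residues: {1, 2, 4}
    edges = {(a, b) for a in range(p) for b in range(p) if (b - a) % p in squares}
    # No edge is ever accompanied by its reverse: the orbital is not symmetric.
    assert all((b, a) not in edges for (a, b) in edges)
    print(len(edges))   # 21 = p(p-1)/2 ordered pairs, matching subdegree (p-1)/2 = 3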
A rank 1 permutation group is so special that it is trivial, for the diagonal
orbital D is the only orbital, forcing |X × X| = |X| which implies |X| = 1.
A rank 2 group action has just two orbitals, the diagonal and one other.
These are the doubly transitive group actions which are considered in the
next subsection. In fact
Corollary 89 The following are equivalent:
1. G acts as a rank 2 permutation group.
2. G transitively permutes the ordered pairs of distinct letters of X.
3. G acts transitively on X and for some letter x ∈ X, Gx acts with two
orbits: {x} and X − {x}.
4.4.3 Multiply transitive group actions.
Suppose G acts transitively on a set X. We have remarked in Section 3.1
that in that case, for every positive integer k ≤ |X|, G also acts on any
of the following sets:
1. X (k) , the set of ordered k-tuples of elements from X,
2. X (k)∗ , the set of ordered k-tuples of elements of X with entries pairwise
distinct.
3. X(k), the set of (unordered) subsets of X of size k.
For k > 1, the first set is of little interest to us; the group can never act transitively there, and such orbits as exist tend to be equivalent to orbits on the other two sets. A group G is said to act k-fold transitively on X if and only if it induces a transitive action on the set X (k)∗ of k-tuples of distinct letters. It is said to have a k-homogeneous action if and only if G induces a transitive action on the set of all k-sets of letters.
Lemma 90
1. If G acts k-transitively on X, then it acts k-homogeneously
on X. That is, k-transitivity implies k-homogeneity.
2. For 1 < k ≤ |X|, a group which acts k-transitively on X acts (k − 1)-transitively on X.
3. For 1 < k < |X| − 1, a group G which acts transitively on X acts k-fold transitively on X if and only if the subgroup fixing any (k − 1)-tuple (x1 , . . . , xk−1 ) transitively permutes the remaining letters, X − {x1 , . . . , xk−1 }.
proof: The first two parts are immediate from the definitions. The third
part follows from Corollary 89 when k = 2. So we assume 2 < k < |X| − 1,
and will apply induction.
Assume X = {1, 2, 3, . . .} (the notation is not intended to suggest that X
is countable; only that a countable few elements have been listed first), and
suppose G acts transitively on X and that for any finite ordered (k − 1)-tuple
(1, 2, . . . , (k−1)), the subgroup G1,...(k−1) that fixes the (k−1)-tuple pointwise,
is transitive on the remaining letters, X − {1, 2, . . . , (k − 1)}. Clearly, the
result will follow if we can first show that G is (k − 1)-fold transitive on
X. Now set U = X − {1, . . . , (k − 2)}. The subgroup H := G1,...,k−2 , of all
elements of G fixing {1, . . . , k − 2} pointwise, properly contains the group
G1,...,k−1 just discussed, and has this property:
• For every letter u in U , Hu fixes {u} and is transitive on U − {u}.
Since k < |X| − 1 forces |U | ≥ 3, the presented property implies H acts transitively on U . Thus we have the hypotheses of part 3 for k − 1, and so we can conclude by induction that G has a (k − 1)-fold transitive action on X. Then
the hypothesis on G1,...,k−1 implies the k-fold transitive action.
The converse implication in part 3, that k-fold transitive action causes
G1,...,k−1 to be transitive on all remaining letters, is straightforward from the
definitions.
remark: The condition that k < |X| − 1 is necessary. The group Alt(5)
acts transitively on five letters, and any subgroup fixing four of these letters is certainly transitive on the remaining letters (or more accurately “letter”,
in this case). Yet the group is not 4-transitive on the five letters: it is only
3-fold transitive.
Lemma 91 Suppose f : G → Sym(X) is a 2-transitive group action. Then
the following statements hold:
1. G is primitive in its action on X.
2. If x ∈ X, G = Gx + Gx tGx where t is congruent to an involution
mod ker f .
3. Suppose N is a finite regular normal subgroup of G. Then N has no non-trivial proper characteristic subgroups and so is a direct product of cyclic groups of prime order p. (Such a group is called an elementary abelian p-group.) It follows that |N | = |X| = p^a is a prime power.
proof: 1. If X1 were a non-trivial component of a system of imprimitivity, then 1 < |X1 | < |X| and, for any x ∈ X1 , the subgroup Gx stabilizes X1 . But Gx acts transitively on X − {x}, forcing X1 = X, a contradiction. So there
is no such system of imprimitivity, as was to be shown.
2. Since n(n − 1) = |X (2)∗ | divides |G/ker f |, the latter number is even.
So by Sylow’s theorem applied to G/ker f , there must exist an element z
of G which acts as an involution on X. Then z must induce at least one
2-cycle, say (a, b) on X. From transitivity, we can choose g ∈ G such that
a^g = x. Then t := g⁻¹zg is an involution (mod ker f ) displacing x and can be taken as the double coset representative.
3. By Lemma 76 part 4, Gx transitively permutes the non-identity elements of N by conjugation. Since the conjugation action induces automorphisms of N , N does not contain any non-trivial proper characteristic subgroups. Also, all non-identity elements of N are Gx -conjugate and so have the same finite order, which must be a prime p. This means Sylr (N ) = {1} if r ≠ p, so N is a p-group. By Lemma 79, N has a non-trivial center; since the center is characteristic, N is abelian. Since all its elements have order p, it is elementary abelian. Since it is regular, the statements about |X| follow.
Let us discuss a few examples:
1. The groups FinSym(X) are k-fold transitive for every natural number
k not exceeding |X|. However, an easy induction proof shows that
Alt(X) is (k − 2)-transitive for each natural number k less than |X|.
(Recall that the alternating group is by definition a subgroup of the
finitary symmetric group.)
2. The group GL(V ) has a 2-transitive action on the set P1 (V ) of all
1-subspaces of V .
This set is called the set of projective points of V ; the set P2 (V ) of all
2-dimensional spaces is called the set of projective lines. Incidence
between P1 (V ) and elements of P2 (V ) is the relation of containment of
a 1-subspace in a 2-subspace of V . Then P G(V ) := (P1 (V ), P2 (V )) is
an incidence system of points and lines with this axiom:
Linear Space Axiom. Any two points are incident with a unique
line.
If dim V = 2, GL(V ) is doubly transitive on the set P1 (V ), the projective
line. If the ground field for the vector space V is a finite field containing
q elements, then there are q + 1 projective points being permuted. The
group of actions induced – that is GL(V )/K where K is the kernel
of the action homomorphism – is the group PGL(2, q), and it is 3-transitive on P1 (V ).
If dim V = 3, it is true that any two distinct 2-subspaces of V meet in
a 1-subspace. Thus we have the additional property:
(Dual linear) Any two distinct lines meet at a point.
Any system of points and lines, satisfying both the Linear Space Axiom
and the Dual Linear Axiom is called a projective plane. Again, if
the ground field of V is a finite field of q elements then we see that
|P1 (V )| = 1 + q + q² = |P2 (V )|
and each projective point lies on q +1 projective lines, and, dually, each
projective line is incident with exactly q + 1 projective points.
In the case q = 2 (which means that the field is Z/(2), integers mod 2),
we get the system of seven points and seven lines which we have met as
an example in Section 3.4.4. We saw there that the full automorphism
group of the projective plane of order 2 had order 168.
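For the case q = 2 the whole configuration can be enumerated in a few lines of Python (our own illustration), confirming the counts 1 + q + q² = 7 and q + 1 = 3 above as well as the Linear Space Axiom:

    from itertools import product

    vecs = [v for v in product((0, 1), repeat=3) if any(v)]   # the 7 non-zero vectors

    def add(u, v):
        return tuple((a + b) % 2 for a, b in zip(u, v))

    # Over Z/(2) each 1-subspace is {0, u}, so projective points correspond to
    # non-zero vectors; the non-zero part of a 2-subspace is {u, v, u+v}.
    lines = {frozenset((u, v, add(u, v))) for u in vecs for v in vecs if u != v}
    print(len(vecs), len(lines))              # 7 7
    assert all(len(L) == 3 for L in lines)    # each line has q + 1 = 3 points
    # Linear Space Axiom: any two distinct points lie on exactly one line.
    assert all(sum(1 for L in lines if u in L and v in L) == 1
               for u in vecs for v in vecs if u != v)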
4.4.4 Permutation characters
The results in this section only make sense for actions on a finite set X.
Suppose f : G → Sym(X) is a group action on a finite set X. Then f (G)
is a finite group of permutations of X. Without loss we assume G itself is
finite. (We do this so that some sums taken over the elements of G are finite.
But this is no drawback, for the assertions we wish to obtain can in general
be applied to f (G) as a faithfully acting finite permutation group.)
Associated with the finite action of G is the function
π : G → Z
which maps each element g of G to the number π(g) of fixed points of g
– that is, the number of letters that g leaves fixed. This function is called
the permutation character of the (finite) action f , and, of course, is
completely determined by f . The usefulness of this function is displayed in
the following:
Theorem 92 (Burnside’s Lemma)
1. The number of orbits of G on X is the average
(1/|G|) Σ_{g∈G} π(g).
In particular, G is transitive if and only if the average value of π over the group is 1.
2. The number of orbits of G on X (k)∗ is
(1/|G|) Σ_{g∈G} π(g)(π(g) − 1) · · · (π(g) − k + 1).
3. In particular, if G acts 2-transitively, then
(1/|G|) Σ_{g∈G} π(g)² = 2.
Also, if G is 3-transitive,
(1/|G|) Σ_{g∈G} (π(g))³ = 5.
proof: 1. Count pairs (x, g) ∈ X × G with x^g = x. One way is to first choose x ∈ X and then find g ∈ Gx . This count yields
Σ_{x∈X} |Gx | = Σ_{orbits} |Gx ||Ox | = Σ_{orbits} |G| = |G| · (no. of orbits).
On the other hand we can count these pairs by choosing the group element g first, then x ∈ X in π(g) ways. Thus
|G| · (no. of orbits) = Σ_{g∈G} π(g).
2. The number of ordered k-tuples left fixed by element g is π(g) · (π(g) −
1) · · · (π(g) − k + 1) (since such a k-tuple must be selected from the fixed
point set of g). We then simply apply part 1. with X replaced by X (k)∗ .
3. If G is 2-transitive then the average value of π(π − 1) is 1. But as such a group is also transitive, the average value of π is also 1. Thus the average value of π² is
the average value of (π(π − 1) + π) = 1 + 1 = 2.
Similarly, if G is 3-transitive,
(1/|G|) Σ_{g∈G} (π(g))³ = (1/|G|) Σ_{g∈G} (π(g)(π(g) − 1)(π(g) − 2)) + (1/|G|) Σ_{g∈G} (3π(g)² − 2π(g))
= 1 + 2 · 3 − 2 · 1 = 5.
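The three formulas of Theorem 92 can be confirmed by brute force on a small case. The following Python fragment is our own check, using G = Sym(4) in its natural (2- and 3-transitive) action on four letters:

    from itertools import permutations

    G = list(permutations(range(4)))
    pi = [sum(1 for i in range(4) if g[i] == i) for g in G]   # fixed-point counts
    n = len(G)
    print(sum(pi) / n,                    # 1.0  (transitive)
          sum(f ** 2 for f in pi) / n,    # 2.0  (2-transitive)
          sum(f ** 3 for f in pi) / n)    # 5.0  (3-transitive)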
4.5 Exercises for Chapter 3.
Exercise 1. Show that in any finite group G of even order, there are an
odd number of involutions. [Hint: Choose t ∈ I, the set I = inv(G) of
all involutions in G. Then I partitions as {t} + Γ(t) + ∆(t), where Γ(t) := (CG (t) ∩ I) − {t} and ∆(t) := I − CG (t). Show why the last two sets have
even cardinality.]
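For the skeptical reader, here is a quick Python experiment (our own; Sym(4), of order 24, is an arbitrary choice of test group):

    from itertools import permutations

    def compose(g, h):
        # Apply g first, then h.
        return tuple(h[g[i]] for i in range(len(g)))

    G = list(permutations(range(4)))
    e = tuple(range(4))
    involutions = [g for g in G if g != e and compose(g, g) == e]
    print(len(involutions))    # 9, an odd number, as Exercise 1 predicts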
Exercise 2. (Herstein) Show that if exactly one-half of the elements of a
finite group are involutions, then G is dihedral of order 2n, n odd. [Hint:
By Exercise 1, the number of involutions is an odd number equal to one-half the group order, and by Sylow’s theorem all involutions are conjugate,
forcing CG (t) = ⟨t⟩. The result follows from any Transfer Theorem such as
Thompson’s.4 ]
Exercise 3. This has several parts:
1. Show that if p is a prime, then any group of order p2 is abelian. [Hint:
It must have a non-trivial center, and the group mod its center cannot
be a non-trivial cyclic group.]
2. Let G be a finite group of order divisible by the prime p. Show that
the number of subgroups of G of order p is congruent to one modulo p. [Hint: Make the same sort of decomposition of the set of subgroups of order p as was done for the subgroups of order 2 in Exercise 1. You need to
know something about the number of subgroups of order p in a group
of order p2 .]
3. If prime number p divides |G| < ∞, show that the number of elements
of order p in G is congruent to −1 modulo p.
4. (McKay’s proof of Exercises 1 and 3, part 2.) Let G be a finite group of order divisible by the prime p. Let S be the set of p-tuples (g1 , g2 , . . . , gp ) of elements of G such that g1 g2 · · · gp = 1.
(a) Show that S is invariant under the action of σ, a cyclic shift which
takes (g1 , g2 , . . . , gp ) to (gp , g1 , . . . , gp−1 ).
(b) As H = ⟨σ⟩ ≅ Zp acts on S, suppose there are a H-orbits of length 1 and b H-orbits of length p. Show that |G|^(p−1) = |S| = a + bp. As p divides the order of G, conclude that p divides a.
(c) Show that a ≠ 0 (the constant tuple (1, . . . , 1) lies in S). Conclude that a is the number of elements g of G with g^p = 1, and is a non-zero multiple of p.
(d) Let P1 , . . . , Pk be a full listing of the subgroups of order p in G.
Show that | ∪ Pj | = a ≡ 0 mod p, and deduce that k ≡ 1 mod p.
5. Suppose pk divides the order of G and P is a subgroup of order pk−1 .
Show that p divides the index [NG (P ) : P ]. [Hint: Set N := NG (P ) and
suppose, by way of contradiction that p does not divide [N : P ]. Then
4 I am convinced that this is the elementary proof that Herstein had in mind, rather than a number of diatribes on this subject which have appeared in the problems section of the American Mathematical Monthly.
P is the unique subgroup of N of order pk−1 and so is characteristic in
N . As pk divides |G|, we see that p divides the index [G : N ]. Now P
acts on the left cosets of N in G by left multiplication, with N comprising a P -orbit of length one. Show that there are at least p such orbits of length one, so that there is a left coset bN ≠ N with P bN = bN . Then b⁻¹P b ≤ N , forcing b⁻¹P b = P by the uniqueness of the subgroup of N of that order. But then b ∈ N , a contradiction.]
6. Suppose the prime power pk divides the order of the finite group G.
Define a p-chain of length k to be a chain of subgroups of G,
P1 < P2 < · · · < Pk ,
where Pj is a subgroup of G of order pj , j = 1, . . . , k. Show that the
number of p-chains of G of length k is congruent to one modulo p.
[Hint: Proceed by induction on k. The case k = 1 is part 4 (d) of this Exercise. One can assume the number m of p-chains of length k − 1 is congruent to 1 mod p, and these chains of length k − 1 partition the p-chains of length k according to their initial segments. If c is a chain of length k − 1 with terminal member Pk−1 , then any extension of c to a chain of length k is obtained by adjoining a group Pk of order pk lying in the normalizer NG (Pk−1 ). The number of such candidates is thus the number of subgroups of order p in NG (Pk−1 )/Pk−1 . Part 5 of this Exercise shows that p divides this group order, and so by part 4 (d), the number of Zp ’s in the factor NG (Pk−1 )/Pk−1 is congruent to one mod p. Show this forces the conclusion.]
7. From the previous result show that if pk divides |G|, for k > 0, then
the number of subgroups of order pk is congruent to 1 mod p. [Remark:
Note that parts 3 through 7 of this Exercise did not require either the Sylow theorems or even Cauchy’s Theorem – indeed, they essentially reprove them. Exercise 1 and part 2 of this Exercise, on the other hand, required Cauchy’s theorem in order to have Zp ’s to discuss.]
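As a sanity check on part 4 (our own illustration), McKay’s set S can be built explicitly for the small case G = Sym(3) and p = 3:

    from itertools import permutations, product

    def compose(g, h):
        return tuple(h[g[i]] for i in range(len(g)))

    G = list(permutations(range(3)))
    e = tuple(range(3))
    p = 3
    S = [t for t in product(G, repeat=p)
         if compose(compose(t[0], t[1]), t[2]) == e]
    assert len(S) == len(G) ** (p - 1)              # |S| = |G|^(p-1) = 36
    a = sum(1 for t in S if t[0] == t[1] == t[2])   # orbits of length 1
    b = (len(S) - a) // p                           # orbits of length p
    print(a, b)   # a = 3 (the identity and the two 3-cycles), b = 11; p divides a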
Exercise 4. Show that any group of order fifteen is cyclic. [Hint: Use
Sylow’s Theorem to show that the group is the direct product of its (necessarily cyclic) 3-Sylow and 5-Sylow subgroups.]
Exercise 5. Show that any group of order 30 has a characteristic cyclic
subgroup of order 15.
Exercise 6. Show that any group N which admits a group of automorphisms which is 2-transitive on its non-identity elements is an elementary abelian 2-group or a cyclic group of order 3.
Extend this result by showing that if there is a group of automorphisms of N which is 3-transitive on the non-identity elements of N , then |N | ≤ 4.
Exercise 7. Show that if G faithfully acts as a k-fold transitive group on a set X, where k ≥ 2, and Gx is a simple group, then any non-trivial proper normal subgroup N of G is regular on X. Using a basic lemma of this chapter, show that G is the semidirect product N Gx (here Gx ∩ N = 1, so Gx is a
complement to N in G). Moreover, Gx acts (k − 1)-fold transitively on the
non-identity elements of N .
Conclude that under these circumstances, G can be at most 4-fold transitive on X, and then only in the case that |X| = 5 and G = Alt(5).
Exercise 8. Show that the group Alt(5) acts doubly transitively on its six
5-sylow subgroups.
Exercise 9. Show that if N is a non-trivial normal subgroup of G = Alt(5),
then (as G acts primitively on both 5 and 6 letters) N either (i) has order
divisible by 30 or (ii) normalizes and hence centralizes (Why?) each 5-sylow
subgroup.
Exercise 10. Conclude that possibility (ii) of the previous exercise is impossible, by showing that a 5-sylow subgroup is its own centralizer.
Exercise 11. Show in just a few steps, that Alt(5) is a simple group.
Exercise 12. If G is a k-transitive group, and Gx1 ,...,xk−1 is simple, then, provided k ≥ 3 and |X| > 4, G must be a simple group. So in this case,
the property of simplicity is conferred from the simplicity of a subgroup.
(This often happens for primitive groups of higher permutation rank, but
the situation is not easy to generalize.)
Exercise 13. Organize the last few exercises into a proof that all alternating
groups Alt(n) are simple for n ≥ 5.
Exercise 14. (Witt’s Trick) Suppose K is a k-transitive group acting faithfully on the set X, where k ≥ 2.
1. Show that if H := Kα , for some letter α in X, then K = H + HtH for
some involution t in K.
2. Suppose ∞ is not a letter in X, set X′ := X ∪ {∞}, and set S := Sym(X′). Suppose now that S contains an involution s such that
(a) s transposes ∞ and α,
(b) s normalizes H – i.e. sHs = H, and
(c) o(st) = 3, that is, (st)3 = 1, while st 6= 1.
Then show that the subset Y = K ∪ KsK is a subgroup of S. [Hint: One
need only show that Y is closed under multiplication, and this is equivalent
to the assertion that sKs ⊂ Y . Decompose K; use the fact that sH = Hs
and that sts = tst.]
Exercise 15. Show that in the previous Exercise Y is triply transitive on X′.
Exercise 16. Suppose G is a simple group with an involution t such that
the centralizer CG (t) is a dihedral group of order 8. Show the following:
1. The involution t is a central involution – that is, t is in the center of
at least one 2-Sylow subgroup of G, or, equivalently, |tG | is an odd
number. Thus any 2-Sylow subgroup of G is just D8 .
2. There are exactly two conjugacy classes of fours groups – say V1G and
V2G , where V1 and V2 are the unique two fours subgroups of a 2-Sylow
subgroup D of G. Each of them is a TI-subgroup – that is a subgroup
which intersects its distinct conjugates only at the identity subgroup.
3. There is just one conjugacy class of involutions in G.
[Hints: This is basically an exercise in the use of the Burnside Fusion Theorem
and Sylow’s Theorem.
Part 1.: Take D = CG (t) ≅ D8 . Then its center is just ⟨t⟩, so NG (D) ≤ CG (t) = D.
Part 2.: Let Vi , i = 1, 2, be the two fours subgroups of D ∈ Syl2 (G).
By the Burnside Fusion theorem they could be conjugate in G only if this
were true in NG (D) = D, where it is patently untrue. Clearly two conjugates
of Vi could at best meet at a central involution, of which there is only one
class, tG . In that case they would be two fours subgroups of the same 2-Sylow
subgroup, and hence could not be conjugate.
Part 3.: Suppose false: Then at least one involution, say without loss v1 ∈ V1 , is not central; so its centralizer in G, C1 , contains V1 as a 2-Sylow subgroup. By the Thompson Transfer Theorem, the fact that the fours group V2 has index 2 in the 2-Sylow subgroup D and the fact that G possesses no normal subgroup of index 2 together imply that v1 is conjugate to some involution in V2 − {t}, say v2 . (At this stage t is conjugate to no further involution in D – a so-called Z∗-involution.5 ) But then by the Burnside Fusion Theorem, the normal subsets Xi := {vi , vi t} are conjugate in NG (D) = D, a contradiction.6 ]
Exercise 17. This exercise is really an optional student project. First show that if G is a 4-fold transitive group of permutations of the finite set X, then
(1/|G|) Σ_{g∈G} π(g)⁴ = 15.
The object of the project would be to show that in general, if G is k-fold transitive on X then
(1/|G|) Σ_{g∈G} π(g)^k = Bk ,
where Bk is the total number of partitions of a k-set (also known as the k-th Bell number). An analysis of the formulae involves expressing the number of sequences (with possible repetitions) of k elements from a set X in terms of ordered sequences with pairwise distinct entries. The latter are expressed in terms of the so-called “falling factorials” x(x − 1)(x − 2) · · · (x − ℓ + 1) in the variable x = |X|. The coefficients involve Stirling numbers and the Bell number can be reached by known identities. But a more direct argument can be made by sorting the general k-tuples over X into those having various sorts of repetitions; this is where the partitions of k come in.
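For those with a computer at hand, the following Python fragment (our own sketch) computes these moments for Sym(5) acting on five letters; as Sym(5) is k-fold transitive for every k ≤ 5, the outputs are the Bell numbers 1, 2, 5, 15, 52:

    from itertools import permutations

    G = list(permutations(range(5)))
    pi = [sum(1 for i in range(5) if g[i] == i) for g in G]
    for k in range(1, 6):
        print(k, sum(f ** k for f in pi) // len(G))   # 1, 2, 5, 15, 52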
5 There is a famous theorem – G. Glauberman’s Z∗-theorem – which asserts that no Z∗-involution can occur in a simple group of order different from 2. Of course we cannot invoke such things here.
6 Note the power here in using the Burnside Fusion theorem on normal subsets of a Sylow subgroup, not just normal subgroups.
Appendix to Chapter 4: Minimal normal subgroups of finite doubly transitive groups.
All groups in this section are finite groups.
We have seen in Section 3.7 that a minimal normal subgroup of any finite
group is a direct product of isomorphic simple groups. If one of these direct
factors is a cyclic group of prime order p (denoted Zp ), then the minimal
normal subgroup is a direct product of finitely many Zp ’s. Such a group
is called an elementary abelian p-group (cf. Lemma 22). Such a finite
group A is characterized by the properties of being abelian and of exponent
p – that is, x^p = 1 for all x ∈ A.
The main result of the section is a theorem of Burnside stating that a
minimal normal subgroup of a (finite) doubly transitive group is either a
regular elementary abelian p-group or a transitive finite non-abelian simple
group. I have always found the proof in Burnside’s book difficult to read.
Presentations of this lemma are infrequent in the textbook literature. Those
rare books on permutation groups which will even refer to the theorem invariably refer to the original in Burnside’s book (Wielandt [?], or Passman
[?]). We revisit this theorem for two purposes: (1) to see the theory we
have presented in action, and (2) to extract the theorem into a transparent
referable form. We present below two distinct proofs, the second being a
rewriting of Burnside’s original proof in modern language.7
4.5.1 First proof:
The version offered here closely exploits Sylow’s theorem.
7 My own history with this theorem is nearly idiotic: Until a very few years ago, I had always believed Burnside’s original proof was basically flawed. A few years ago, R. Abhyankar (learning group theory as fast as he could – that is, faster than most of us) asked me on the spot for a proof of Burnside’s result. It took a night but I coughed up the first proof below. It uses little more than Sylow’s theorem and the standard results about permutation groups given earlier in this chapter.
Recently my graduate student (Kasikova) remarked that she thought Burnside’s proof was correct. One of the things that I have learned is to always pay attention to her slightest mathematical remarks. So I looked it over and decided that indeed, Burnside’s original proof is correct, provided there is some straightening up of the pronoun references. As it stands it is a proof which is easily baffling, and for that reason alone it needs a reintroduction into the modern literature. Similarly, this proof uses only elementary concepts about permutation groups which have been presented in Chapter 3. The proof is not different from Burnside’s; it is just translated into modern language.
But first, a few elementary technical facts must be set down.
Lemma 93 Assume H is a proper subgroup of G which contains a p-Sylow
normalizer of G. Then for some element g ∈ G, H ∩ H^g does not contain a
p-Sylow subgroup of G.
proof: Assume, for every element g ∈ G, that H ∩ H^g contains a p-Sylow subgroup of G. For any g, let R ∈ Syl(H ∩ H^g ) ⊆ Syl(G). Then {R, R^(g⁻¹)} ⊆ Syl(H), so by Sylow’s theorem, R^(g⁻¹h) = R for some h ∈ H. Now R is conjugate within H to a p-Sylow subgroup of G whose normalizer in G lies in H, so NG (R) ≤ H. Then g⁻¹h ∈ NG (R) ≤ H, so g ∈ H. But since g was arbitrarily chosen, we have H = G, a contradiction. Thus our beginning assumption must be false, and so the conclusion holds.
Lemma 94 Suppose a group G is the direct product of two of its subgroups, A and B – so we write G = A × B. Suppose L is a subgroup of G of the form L = A1 × B1 , where A1 ≤ A and B1 ≤ B. Then for any element g = (g1 , g2 ) ∈ A × B = G,
L ∩ L^g = (A1 ∩ A1^g1 ) × (B1 ∩ B1^g2 ).     (4.3)
proof: An element y of L lies in L ∩ L^g if and only if y · y^(g⁻¹) ∈ L. So, writing y = ab, where a ∈ A1 , b ∈ B1 , and g = g1 g2 where g1 ∈ A and g2 ∈ B, the condition y ∈ L ∩ L^g is equivalent to asserting that
y · y^(g⁻¹) = ab(ab)^(g⁻¹) = ab a^(g1⁻¹) b^(g2⁻¹) = (a a^(g1⁻¹))(b b^(g2⁻¹))
is an element of L, and therefore is uniquely expressible as an element of A1 times an element of B1 , as well as uniquely expressible as an element of A times an element of B. Thus a a^(g1⁻¹) ∈ A1 and b b^(g2⁻¹) ∈ B1 , from which it follows that a ∈ A1 ∩ A1^g1 and b ∈ B1 ∩ B1^g2 . So the left side of equation (4.3) is contained in the right.
The reverse containment is easy.
If K is any finite group, and p is a prime number, the symbol |K|p will
denote the highest power of p dividing the group order |K| – that is, the
order of a p-Sylow subgroup of K.
Lemma 95 Suppose G = X1 × X2 , where the Xi are regarded as subgroups
of G. Suppose H is a subgroup of G, containing neither direct summand Xi ,
but containing nontrivial p-Sylow subgroups of both X1 and of X2 . Finally,
assume that H contains a p-Sylow normalizer of G. Then there exist elements
x, y ∈ G − H such that
|H ∩ H^x |p ≠ |H ∩ H^y |p .
proof: By hypothesis, H contains a full p-Sylow subgroup P = P1 × P2 , where 1 ≠ Pi ∈ Sylp (Xi ), i = 1, 2.
Set Hi := H ∩ Xi , a normal subgroup of H containing Pi as a nontrivial
p-Sylow subgroup of Hi , i = 1, 2. Then P is a p-Sylow subgroup of H lying
within its normal subgroup K := H1 H2 . It follows that K = H1 H2 contains
every p-Sylow subgroup of H – even every p-element of H.
Choose an element g = g1 g2 ∈ G − H where gi ∈ Xi , i = 1, 2. Let R be a p-Sylow subgroup of H ∩ H^g . Then as R is a p-subgroup of H, R must lie in K = H1 × H2 , as well as in K^g . Thus R is a p-Sylow subgroup of
K ∩ K^g = (H1 ∩ H1^g1 ) × (H2 ∩ H2^g2 ),
by Lemma 94. Thus
|H ∩ H^g |p = |H1 ∩ H1^g1 |p · |H2 ∩ H2^g2 |p .     (4.4)
Fix i ∈ {1, 2}. By hypothesis, Hi is a proper subgroup of Xi containing a non-trivial p-Sylow subgroup Pi of Xi . Moreover, as
NXi (Pi ) ≤ NG (P1 × P2 ) ∩ Xi ≤ H ∩ Xi = Hi ,
we see that (Xi , Hi , Pi ) satisfy the hypotheses on (G, H, P ) in Lemma 93. Thus it is possible to choose gi so that
|Hi ∩ Hi^gi |p < |Hi |p .
Now setting x = g1 and y = g1 g2 , we see that
|H ∩ H^y |p = |H1 ∩ H1^g1 |p · |H2 ∩ H2^g2 |p ,     (4.5)
|H ∩ H^x |p = |H1 ∩ H1^g1 |p · |H2 |p .     (4.6)
So
|H ∩ H^y |p < |H ∩ H^x |p < |H|p .
The last inequality shows that neither x nor y lies in H. The first inequality is the desired conclusion. This completes the proof.
Now we can proceed with the main result.
Theorem 96 Suppose G is a finite doubly transitive group of permutations
of a set X containing at least two objects. Then any minimal normal subgroup
of G is either
1. an elementary abelian p-group which acts regularly on X (so that |X|
must be a prime power), or
2. a non-abelian simple group.
proof: One first notes that since the class of elementary p-groups includes
the identity subgroup, we may assume |X| > 1 (or equivalently, that |G| > 1).
In any case, we may assume that N is a minimal normal (and hence
non-identity) subgroup of G. By Corollary 20 at the end of Chapter 3, N
is either an elementary abelian p-group or the direct product of isomorphic
simple nonabelian groups. Since double transitivity implies primitivity, N
must be a transitive group. In the former case (since G acts faithfully), N
must be a regular group and the first conclusion holds. Thus we may assume
N = X1 × · · · × Xt ,
where the direct factors Xi are non-abelian groups of a single isomorphism
type. It remains only to show that t = 1.
Assume by way of contradiction that t > 1. Fix a “letter” α in X. Then
by double transitivity, the one-point stabilizer, Gα , is transitive on X − {α},
and so its normal subgroup H := Gα ∩ N acts on X − {α} with orbits all of
the same length. This means that the index [H : H ∩ H^n ] does not depend on the choice of n ∈ N − H. Put another way,
|H ∩ H^x | is constant for all x ∈ N − H.     (4.7)
Moreover
(*) H cannot contain a non-trivial normal subgroup of N ,
since otherwise such a normal subgroup of the transitive group N , fixing a letter, would fix all letters and hence be trivial. In particular
H cannot contain any Xi .
Now suppose P is a p-Sylow subgroup of H. If NN (P ) is not contained in H, then, by choosing x ∈ NN (P ) − H in Equation (4.7), we see that |P | divides |H ∩ H^x | for all x ∈ N − H. Now if this happened for every prime p which divides |H|, we should have |H ∩ H^x | = |H| for all x ∈ N – that is, H = H^x for all x, so that H would be normal in N , against (*). (Note that H ≠ 1 here: otherwise N would be regular, hence elementary abelian by Lemma 91, contrary to the present case.) Thus there must be some prime p dividing |H| for which the p-Sylow subgroup P of H satisfies NN (P ) ≤ H.
We fix such a p-Sylow subgroup P . Now, since a proper subgroup of a finite p-group properly lies in its normalizer in that p-group, the condition
NN (P ) ≤ H in fact implies that P is a full p-Sylow subgroup of N . Then
P = P1 × P2 × · · · × Pt , where Pi is a full p-Sylow subgroup of Xi . Since
the Xi are isomorphic, all the Pi are non-trivial groups.
Now, noting that (*) prevents H from containing either X1 or B = X2 × · · · × Xt , the quintet of groups (N, X1 , B, H, P ) satisfies all of the hypotheses on (G, X1 , X2 , H, P ) in Lemma 95. This means we can find x, y ∈ N − H such that
|H ∩ H^x |p ≠ |H ∩ H^y |p .
This clearly contradicts (4.7). Thus t > 1 is impossible and the proof is complete.
4.5.2 Burnside’s original proof.
A point-line geometry (P, L) is a set P of points together with a collection
L of subsets of P called lines. Such a point-line geometry is called a linear
space if and only if any two distinct points lie in exactly one line, lines
contain at least two points, and there are at least two distinct lines. It
follows that each point must lie on at least two distinct lines.
The automorphism group of a point-line geometry (P, L), is the set
Aut(P, L) of all permutations of P which induce a permutation of the lines
– that is, they stabilize the collection L. We have used these as examples of
groups in Chapter 3.
Theorem 97 Suppose N is a point-transitive group of automorphisms of a
linear space (P, L) with the property that for each point p, its stabilizer Np ,
leaves invariant each line on point p. Then the stabilizer in N of two distinct
points is the identity subgroup – i.e. Np ∩ Nq = 1, for distinct points p and
q.
Proof: Suppose p and q are distinct points and L is the unique line of L
containing {p, q}. Then any point z not in L is the intersection of two lines
(one on z and p; the other on z and q) which are fixed by Np ∩ Nq . Thus the
latter fixes the non-empty set P − L pointwise. But then, any point u in L
is the intersection of the line on u and z with L and both of these are also
stabilized by Np ∩ Nz , which we have seen contains Np ∩ Nq . Thus Np ∩ Nq
fixes every point of P, and so contains only the identity permutation.
The next two lemmas involve direct products of groups and are slightly
more general than needed.
Lemma 98 Suppose G = A × B, where each factor is non-trivial. Suppose
H is a subgroup of G with the “Frobenius property” that
H ∩ H g = 1 for all elements g ∈ G − H
Then either H = 1 or H = G.
Proof: We identify A with subgroup A × 1 and B with subgroup 1 × B so
G = AB, and A ∩ B = 1.
Assume 1 ≠ x ∈ H ∩ A. Then x ∈ H ∩ H^b for all b ∈ B. The Frobenius property then implies B ≤ H. Since B ≠ 1, the same argument with A replacing B shows A ≤ H, whence H = G. So we can assume H ∩ A =
H ∩ B = 1.
Now if 1 ≠ h = ab ∈ H where a ∈ A and b ∈ B, then by the previous paragraph neither a nor b is in H. But h ∈ H ∩ H^b and the Frobenius property imply b ∈ H, a contradiction.
Thus H = 1, and the proof is complete.
Lemma 99 Suppose G = A × B is a group of permutations of a set X, with
each factor a non-identity group.
1. Suppose the subgroup A × 1 is transitive. Then the other factor 1 × B
is semiregular – that is, the only element of 1 × B which fixes a letter
is the identity permutation.
2. If G is primitive, then
(a) each factor acts regularly on X,
(b) each factor is a non-abelian simple group, and
(c) the two simple factors A and B are isomorphic
3. Suppose φ : A → B is an isomorphism of simple groups. Let H :=
{(a, φ(a))|a ∈ A}. Then H is a maximal subgroup of G = A × B.
Proof: Again we adopt the convention of writing A for A × 1 and B for
the subgroup 1 × B.
Part 1. Suppose A is transitive. Then for any x ∈ X, Bx fixes x^A = X pointwise, so Bx = 1.
Part 2. If G is primitive, each of the normal subgroups A and B is transitive, and so by Part 1 each is regular. Then for x ∈ X, we have G = AGx = BGx , and A ∩ Gx = B ∩ Gx = 1. Thus
A ≅ G/B ≅ Gx ≅ G/A ≅ B.
Now suppose N were a non-trivial proper normal subgroup of A. Then N is normal in G (as B centralizes A). Since G is primitive, N must be transitive. But that is impossible since N is a proper subgroup of a regularly acting group. Thus no such N exists and so A is a simple group. So is B ≅ A. If A were abelian, then G would be abelian, and Gx would be central and so would fix x^A = X pointwise. This would force Gx = 1, contrary to Gx ≅ A above. Thus A and B are both non-abelian simple groups.
Part 3. Suppose H < M ≤ A × B = G. Choose m = (a1 , b1 ) ∈ M − H; then h := (φ⁻¹(b1 ), b1 ) ∈ H ≤ M , and mh⁻¹ = (a1 φ⁻¹(b1 )⁻¹, 1) lies in M ∩ A, and is non-identity since m ∉ H. Thus M ∩ A contains a non-identity element a0 . Then for any g = (a, φ(a)) ∈ H, the commutator
a0 g a0⁻¹ g⁻¹ = a0 a a0⁻¹ a⁻¹
belongs to M . But the subgroup of A generated by the set {a0 a a0⁻¹ a⁻¹ | a ∈ A} is a normal subgroup of A (this is shown in the next Chapter), and it is non-trivial since the non-abelian simple group A has trivial center. Since A is a simple group, we have A ≤ M . Similarly B ≤ M , so M = G. This proves H is maximal in G.
Part 3 of this Lemma is only here to show that the direct product of two simple groups can act as a primitive group.
Now all the pieces are assembled to run through Burnside’s argument.
Theorem 100 (Burnside) Let G be a doubly transitive group of permutations of the set X. Let N be a minimal normal subgroup of G. Then either
1. N is an elementary abelian p-group acting regularly on X, or else
2. N is a non-abelian simple group.
Proof: We know that as N is minimal normal it is transitive and is either
an elementary abelian p-group, or is a product of isomorphic non-abelian
simple groups. In the former case N is regular.
So we may assume
N = S1 × · · · × St , t ≥ 2,
where all the Si are isomorphic to a fixed non-abelian simple group.
There are two cases:
Case 1: N is primitive on X. Thus setting A = S1 and B = S2 × · · · × St
in Lemma 99 (Part 2) we see that A and B are isomorphic simple groups,
so t = 2. Moreover each acts regularly on X. Then G contains a normal
subgroup
H = NG (S1 ) = NG (S2 ) = Hx S1 ,
for which Hx acts in at most two orbits on X − {x}. This means that in the
semidirect product S1 Hx , Hx acts in at most two orbits on the non-identity
elements of S1 . Thus, since S1 is not a p-group, we see that S1 has order pq,
p and q primes. But in that case there can be only one p-Sylow subgroup for
the larger prime, against S1 being simple.
Case 2: N is imprimitive on X. Since N is transitive, a system of imprimitivity for N must be a partition
π : X = X1 + X2 + · · · + Xm , m > 1,
where each Xi has the same cardinality. Among all such non-trivial partitions
stabilized by N choose π so the cardinality |X1 | is minimal (but larger than
1). Now if G stabilized π, it would not be primitive. Thus for some g ∈ G−N ,
π^g : X = X1^g + · · · + Xm^g ,
is distinct from π, and thus their common refinement (that is, their meet
π ∧ π g in the lattice of partitions of X) is the trivial partition. This means
for any 1 ≤ i ≤ j ≤ m and elements g1 and g2 in G, the two sets Xi^g1 and Xj^g2 either coincide or intersect in at most one letter. Thus setting P := X and
L := {Xi^g | 1 ≤ i ≤ m, g ∈ G},
we see that (P, L) is a point-line geometry in which any two lines intersect
in at most one point, and that each point lies on at least two lines (since L
consists of more than the components of π). In fact, since G acts as a group
of automorphisms of (P, L) which is doubly transitive on points, every pair
of distinct points is on a line. Thus (P, L) is a linear space.
Now the subgroup N , being normal in G and fixing π, stabilizes each “parallel class” π^g , g ∈ G. It follows that Nx stabilizes each line of L on point x. Thus by Theorem 97, Nx ∩ Ny = 1 for distinct points x and y. In particular, for any element g ∈ N − Nx , Nx^g = Ny where y = x^g ≠ x. This means that H := Nx is a subgroup of N with the Frobenius property. Now the assumption that t > 1 and Lemma 98 together imply that H = 1 or H = N . In the latter case, N fixes x, so X = {x} and N = 1, against its being minimal normal. In the former case N is regular on X. But in that case Gx is a complement to N in G which acts on N − {1} in one orbit, forcing N to be an elementary abelian p-group, contrary to the current case-division.
Thus we are forced to have t = 1.
This, then, is Burnside’s argument with all the smallest steps filled in, all
the “i’s” dotted.
Example Here is a simple example showing that a non-abelian minimal
normal subgroup of G need not be simple when the hypothesis of double
transitivity is weakened to being a primitive rank three permutation group.
Let Γ be the graph of the 5-by-5 grid (sometimes called a lattice graph), and let G be its full automorphism group. Then G contains the direct product of two copies of Sym(5) as a normal subgroup M of index 2. Outside this is an involution, t, conjugation by which interchanges the two direct factors of M . (The group G = ⟨M, t⟩ is called the wreath product of Sym(5) by Z2 .)
Now G acts as a rank 3 permutation group on the 25 vertices of the grid graph with subdegrees 1, 8 and 16, and it is easily checked to be primitive. But G has a unique minimal normal subgroup N ≅ Alt(5) × Alt(5), with each direct factor acting imprimitively.
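The subdegrees quoted here are confirmed by a four-line count (our own check): fix the vertex (0, 0) of the grid and separate the remaining 24 vertices according to whether they share a row or column with it.

    from itertools import product

    V = list(product(range(5), repeat=2))      # the 25 vertices of the grid
    x = (0, 0)
    adjacent = [v for v in V if v != x and (v[0] == x[0] or v[1] == x[1])]
    other = [v for v in V if v != x and v[0] != x[0] and v[1] != x[1]]
    print(1, len(adjacent), len(other))        # 1 8 16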
Chapter 5
Normal Structure of Groups
5.1 The Jordan-Hölder Theorem for Artinian Groups
5.1.1 Subnormality.
As recorded in the previous section, a group is simple if and only if it has
no normal subgroups other than the identity subgroup or the entire group.
That is, it has no non-trivial proper normal subgroups.
A subgroup H is said to be subnormal in G if and only if there exists a finite chain of subgroups
H = N0 ⊴ N1 ⊴ · · · ⊴ Nk = G,
each member of the chain normal in its successor. We express the assertion that H is subnormal in G by writing
H ⊴⊴ G.
Let SN (G) be the collection of all subnormal subgroups of G. This can
be viewed as a poset in two different ways: (1) First we can say that A is
“less than or equal to B” iff A is a subgroup of B. That is, SN has the
inherited structure of an induced poset of S(G), the poset of all subgroups
of G with respect to the containment relation. (2) We also have the option
of saying that A “is less than B” if and only if A is subnormal in B. The
second way of defining a poset on SN would seem to be more conservative
than the former. But actually the two notions are exactly the same, as can
be concluded from the following Lemma.
Lemma 101 (Basic Lemma on Subnormality) The following hold:
1. If A ⊴⊴ B and B ⊴⊴ G, then A ⊴⊴ G.
2. If A ⊴⊴ G and H is any subgroup of G, then A ∩ H ⊴⊴ H.
3. If A is a subnormal subgroup of G, then A is subnormal in any subgroup that contains it.
4. If A and B are subnormal subgroups of G, then so is A ∩ B.
proof: 1. This is immediate from the definition of subnormality: By assumption there are two finite ascending chains of subgroups, one running from A to B, and one running from B to G, each member of the chain normal in its successor in the chain. Clearly concatenating the two chains produces a chain of subgroups from A to G, each member of the chain normal in the member immediately above it. So by definition, A is subnormal in G.
2. If A ⊴⊴ G one has a finite subnormal chain A = A0 ⊴ A1 ⊴ · · · ⊴ Aj = G. Suppose H is any subgroup of G. It is then no trouble to see that A ∩ H ⊴ A1 ∩ H ⊴ A2 ∩ H ⊴ · · · ⊴ Aj ∩ H = G ∩ H = H is a subnormal series from A ∩ H to H.
3. This assertion follows from Part 2 where H contains A.
4. By Part 2, A ∩ B is subnormal in B. Since B is subnormal in G, the
conclusion follows by an application of Part 1.
At this point the student should be able to provide a proof that the two
methods (1) and (2) of imposing a partial order on the subnormal subgroups
lead to the same poset. We simply denote it (SN (G), ≤) (or simply SN if G
is understood) since it is an induced poset of the poset of all subgroups of
G. From Lemma 101, Part 4, we see that
SN (G) is a lower semilattice.
Finally it follows from the second or third isomorphism theorem that if A and B are distinct maximal normal subgroups of G = ⟨A, B⟩ = AB = BA, then G/A ≅ B/(A ∩ B). Thus SN (G) is a semimodular lower semilattice and the mapping
µ : Cov(SN ) → M
(which records for every covering factor X/Y , Y a maximal normal subgroup of X, the simple group X/Y ) is “semimodular” in the sense of Theorem 20, Section 2.6.3.
According to the general Jordan-Hölder theory advanced in Section 2.6.3, µ may be extended to an interval measure
µ : ASN → M
from all algebraic intervals of SN into the additive monoid M of all multisets of simple groups — that is, the free additive semigroup of all formal non-negative integer linear combinations of simple groups,
Σ_{G simple} aG G, of finite support.
If there is a finite unrefinable ascending chain in SN from 1 to G, then
such a chain of subnormal subgroups is called a composition series of G.
The multiset of simple groups Ai+1 /Ai appearing in such a composition series
1 = A0 ⊴ A1 ⊴ · · · ⊴ An = G is called the multiset of composition factors. The
theorem states that these are the same multisets for any two composition
series – the “measure” µ([1, G]).
If you have been alert, you will notice that no assumption that the involved groups are finite has been made. The composition factors may be
infinite simple groups.
But of course, in applying the Jordan-Hölder theorem, one is interested
in pinning down a class of algebraic intervals. This is easy in the case of
a finite group, for there, all intervals of SN (G) are algebraic. How does
one approach this for infinite groups? The problem is that one can have an
infinitely descending chain of subnormal subgroups, and also one can have
an infinitely ascending chain of proper normal subgroups (just think of an
abelian group with such an ascending chain).
However one can select certain subposets of SN which will provide us
with many of these algebraic intervals: Suppose from any SN (G) we extract
the set SN ∗ of all subnormal subgroups for which any downward chain of
subgroups each normally containing its successor, must terminate in a finite
number of steps. There are three other descriptions of this subcollection
SN ∗ , of subnormal subgroups: (i) they are the subnormal subgroups of G
which belong to a finite unrefinable subnormal chain extended downward
to the identity. (ii) They are the elements M of SN for which [1, M ] is
an algebraic interval of SN – see Chapter 2. This is the special class of
subgroups amenable to the Jordan-Holder Theorem. (iii) They are the subnormal subgroups of G which themselves possess a finite composition series.
We call the elements of SN ∗ the Artinian subgroups. They are closed
under the operation of taking pairwise meets in SN and so SN ∗ is clearly a
lower semilattice, all of whose intervals are algebraic.
As a special case, we have:
Theorem 102 (The Jordan-Hölder Theorem for general groups.) Let SN ∗ be the class of all Artinian subnormal subgroups of a group G. Then for any H in SN ∗ , the list (with multiplicities) of all simple groups which might appear as a factor in any finite unrefinable subnormal chain of H is independent of the particular chain.
Corollary 103 Any two unrefinable subnormal series of a finite group produce the same list of finite simple groups, with multiplicities respected.
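As a concrete illustration of the Corollary (our own verification sketch, with the chain chosen by hand), the following Python fragment checks that 1 < Z2 < V4 < Alt(4) < Sym(4) is a subnormal series with each member normal in the next and every factor of prime order 2, 2, 3, 2:

    from itertools import permutations

    def mul(g, h):
        # Apply g first, then h.
        return tuple(h[g[i]] for i in range(len(g)))

    def inv(g):
        r = [0] * len(g)
        for i, gi in enumerate(g):
            r[gi] = i
        return tuple(r)

    def even(g):
        # Parity via inversion count.
        return sum(1 for i in range(len(g)) for j in range(i + 1, len(g))
                   if g[i] > g[j]) % 2 == 0

    e = (0, 1, 2, 3)
    S4 = set(permutations(range(4)))
    A4 = {g for g in S4 if even(g)}
    V4 = {e, (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)}   # the fours group
    Z2 = {e, (1, 0, 3, 2)}
    chain = [S4, A4, V4, Z2, {e}]
    for big, small in zip(chain, chain[1:]):
        assert small < big                                  # proper containment
        assert all(mul(mul(inv(g), n), g) in small
                   for g in big for n in small)             # normality
        print(len(big) // len(small), end=' ')              # 2 3 2 2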
5.2 Commutators.
Let G be a group. A commutator in G is an element of the form x⁻¹y⁻¹xy and is denoted [x, y]. A triple commutator [x, y, z] is an element of shape [[x, y], z], which the student may happily work out to be
[x, y]⁻¹z⁻¹[x, y]z = y⁻¹x⁻¹yx z⁻¹ x⁻¹y⁻¹xy z.
If we write u^x := x⁻¹ux, the result of conjugation of u by x, then it is an exercise to verify these identities, for elements x, y, and z in any group G:
[x, y] = [y, x]⁻¹.     (5.1)
[xy, z] = [x, z]^y [y, z] = [x, z][x, z, y][y, z].     (5.2)
[x, yz] = [x, z][x, y]^z = [x, z][x, y][x, y, z].     (5.3)
1 = [x, y⁻¹, z]^y [y, z⁻¹, x]^z [z, x⁻¹, y]^x .     (5.4)
[x, y, z][y, z, x][z, x, y] = [y, x][z, x][z, y]^x × [x, y][x, z]^y [y, z]^x [x, z][z, x]^y .     (5.5)
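Identities (5.1) through (5.4) are easily verified by machine. The Python sketch below is our own; it fixes a concrete left-to-right composition convention, takes conjugation u^x = x⁻¹ux as in the text, and tests the identities on random triples from Sym(4):

    from itertools import permutations
    from random import choice

    def mul(g, h):                       # apply g first, then h
        return tuple(h[g[i]] for i in range(len(g)))

    def inv(g):
        r = [0] * len(g)
        for i, gi in enumerate(g):
            r[gi] = i
        return tuple(r)

    def conj(u, x):                      # u^x = x^-1 u x
        return mul(mul(inv(x), u), x)

    def comm(x, y):                      # [x, y] = x^-1 y^-1 x y
        return mul(mul(mul(inv(x), inv(y)), x), y)

    G = list(permutations(range(4)))
    e = tuple(range(4))
    for _ in range(100):
        x, y, z = choice(G), choice(G), choice(G)
        assert comm(x, y) == inv(comm(y, x))                               # (5.1)
        assert comm(mul(x, y), z) == mul(conj(comm(x, z), y), comm(y, z))  # (5.2)
        assert comm(x, mul(y, z)) == mul(comm(x, z), conj(comm(x, y), z))  # (5.3)
        assert mul(mul(conj(comm(comm(x, inv(y)), z), y),                  # (5.4)
                       conj(comm(comm(y, inv(z)), x), z)),
                   conj(comm(comm(z, inv(x)), y), x)) == e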
A commutator need not be a very typical element in a group. For example, in an abelian group, all commutators are equal to the identity.
If A, B and C are subgroups of a group G, then [A, B] is the subgroup of
G generated by all commutators of the form a⁻¹b⁻¹ab = [a, b] as (a, b) ranges
over A × B. Similarly we write [A, B, C] := [[A, B], C].
Theorem 104 (Basic Identities on Commutators of Subgroups.) Suppose
A, B and C are subgroups of a group G:
1. A centralizes B if and only if [A, B] = 1.
2. [A, B] = [B, A].
3. B always normalizes [A, B].
4. A normalizes B if and only if [A, B] ≤ B. In that case, [A, B] is a
normal subgroup of B.
5. If f : G → H is a group homomorphism, then [f (A), f (B)] = f ([A, B]).
6. (The Three Subgroups Lemma of Philip Hall) If at least two of the three subgroups [A, B, C], [B, C, A] and [C, A, B] are trivial, then so is the third.
proof: (1): A centralizes B if and only if every commutator of shape
[a, b], (a, b) ∈ A × B, is the identity. The result follows from this.
(2) A subgroup generated by a set X is also the subgroup generated by X⁻¹. Apply (5.1).
(3) If (a, b, b1 ) ∈ A × B × B, then it follows from (5.3) that
[a, b]^b1 = [a, b1 ]⁻¹[a, bb1 ] ∈ [A, B].
(4) The following statements are clearly equivalent:
1. A normalizes B
2. [a, b] ∈ B for all (a, b) ∈ A × B
3. [A, B] ≤ B.
The normality assertion follows from (3).
(5) This is obvious.
(6) Without loss assume that [A, B, C] and [B, C, A] are the identity
subgroup. Then all commutators [a, b, c] and [b, c, a] are the identity, no
matter how the triplet (a, b, c) is chosen in A × B × C. Now by Equation
(5.4), any commutator [c, a, b] is the identity, for any (a, b, c) ∈ A × B × C.
This proves that [C, A, B] has only the identity element for a generator.
Corollary 105 Suppose again that A and B are subgroups of a group G.
1. If A normalizes B and centralizes [A, B], then A/CA (B) is abelian. In particular, if A ≤ Aut(B) and A centralizes [A, B] := ⟨a⁻¹a^b | (a, b) ∈ A × B⟩, then A is an abelian group.
2. If A is a group of linear transformations of the vector space V , centralizing a subspace W as well as its factor space V /W , then A is an
abelian group.1
proof: All these results are versions of the following: By the Three Subgroups Lemma, [A, B, A] = 1 implies [A, A, B] = 1. That is the proof.
1 This seems to be a frequently rediscovered result. As recently as 1991 a very distinguished colleague in applied linear algebra and combinatorics relayed this discovery to me at an international meeting as a new result that he had learned from some Russians in Lie Representation Theory. With disrespect to no one, it shows how far politicians can isolate our sciences, and how far mathematicians can isolate themselves from neighboring fields. Perhaps one function of meetings is to discover that we are not as far apart as we thought we were. Philip Hall noted this theorem in his hand-written lecture notes 60 years ago. His notes were never published and not all of the world was lucky enough to have been in his class at that time. As a result they are not as widely known as they ought to be – except among group-theorists. (I got my (handwritten) copy of Philip Hall’s notes from John Thompson, but both of my advisors (Suzuki and Walter) were well acquainted with this in 1958.) Theorems are not discovered in the order of their utility or simplicity. Thus a very useful basic theorem may very well not be a recognized principle until someone reorganizes things late in the maturity of a field. For example the Thompson transfer theorem and the Thompson Order Formula were late arrivals. That’s just the way it is.
Of course anytime someone has some machinery, some mathematician
wants to iterate it. We have defined the commutator of two subgroups in
such manner that the result is also a subgroup. Thus inductively we may
define a multiple commutator of subgroups in this way: Suppose (A1 , A2 , . . .)
is a (possibly infinite) sequence of subgroups of a group G. Then for any
natural number k, we define
[A1 , . . . , Ak+1 ] := [[A1 , . . . , Ak ], Ak+1 ].
5.3 The Derived Series and Solvability.
From the previous section, we see that the subgroup [G, G] is a subgroup
which
1. is mapped into itself by any endomorphism f : G → G (a consequence
of Theorem 104 (5)). (This condition is called fully invariant. Since
inner automorphisms are automorphisms which are in turn endomorphisms, “fully invariant” implies “characteristic” implies “normal”.)
2. yields an abelian factor, G/[G, G] (let f : G → G/[G, G] and apply
parts 2. and 3. of Theorem 104). Moreover, if G/N is abelian, then
[G, G] ≤ N , so [G, G] is the smallest normal subgroup of G which possesses an abelian factor (see Chapter 3).
The group [G, G] is called the commutator subgroup or derived subgroup of G and is often denoted G′.
Since we have formed the group G′ := [G, G], we may iterate this definition to obtain the second derived group,
G″ := [[G, G], [G, G]].
Once on to a good thing, why not set G′ = G(1) and G″ = G(2) and define
G(k+1) := [G(k) , G(k) ]
for all natural numbers k? The resulting sequence
G ≥ G(1) ≥ G(2) ≥ · · ·
is called the derived series of G. Since each G(k+1) is characteristic in G(k) , which in turn is characteristic in G(k−1) , etc., we see that
Every G(k) is characteristic in G.
Note also that
G(k+l) = (G(k) )(l) .     (5.6)
A group G is said to be solvable if and only if the derived series
G ≥ G(1) ≥ G(2) ≥ · · ·
eventually becomes the identity subgroup after a finite number of steps.2 The smallest positive integer k such that G(k) = 1 is called the derived length of G. Thus a non-trivial group is abelian if and only if it has derived length 1.
Theorem 106
1. If H ≤ G and A ≤ G, then [A, H] ≤ [A, G]. In particular,
[H, H] ≤ [G, H] ≤ [G, G].
2. If H ≤ G, then H (k) ≤ G(k) . Thus, if G is solvable, so is H.
3. If f : G → H is a group homomorphism, then
f (G(k) ) = (f (G))(k) ≤ H (k) .
4. If N ⊴ G, then G is solvable if and only if both N and G/N are solvable.
5. For a direct product,
(A × B)(k) = A(k) × B (k) .
6. The class of solvable groups is closed under finite direct products and subgroups, and hence is closed under taking finite subdirect products. In particular,
if G is a finite group, there is a normal subgroup N of G minimal among
normal subgroups with respect to the property that G/N is solvable.
(Clearly this normal subgroup is characteristic; it is called the solvable
residual.)
2 In this case, for once, the name “solvable” is not accidental. The reason will be explained in Chapter 11 on Galois Theory.
proof: Part 1. follows from the containment
{[a, h]|(a, h) ∈ A × H} ⊆ {[a, g]|(a, g) ∈ A × G}.
Part 2. By Part 1, H′ ≤ G′. If we have H (k−1) ≤ G(k−1) , then
H (k) = [H (k−1) , H (k−1) ] ≤ [G(k−1) , G(k−1) ] = G(k) ,
utilizing H (k−1) and G(k−1) in the roles of H and G in the last sentence of
Part 1. The result now follows by induction.
Part 3. The equal sign is from Theorem 104, Part 5; the subgroup relation
is from Part 1 above of this Theorem.
Part 4. Suppose G is solvable. Then so are N and G/N by Parts 2 and 3. Now assume G/N and N are solvable. Then there exist natural numbers k and l such that (G/N )(k) = 1 and N (l) = 1. Then by Part 3, the former equation yields
G(k) N/N = 1 or G(k) ≤ N.
Then
G(k+l) = (G(k) )(l) ≤ N (l) ≤ 1,
by Part 2., so G is solvable.
Part 5. If the subgroups A and B of G centralize each other, it follows from Equations (5.2) and (5.3) that [ab, a′b′] = [a, a′][b, b′] for all (a, b), (a′, b′) ∈ A × B. This shows (A × B)′ ≤ A′ × B′. The reverse containment follows from Part 1. Thus the conclusion holds for k = 1. One need only iterate this result sufficiently many times to get the stated result for all k.
Part 6. This is a simple application of the other parts. A subgroup of a
solvable group is solvable because of the inequality of Part 2. A finite direct
product of solvable groups is solvable by Part 5. Thus, from the definition
of subdirect product, a finite subdirect product of solvable groups is solvable.
The last part follows from the remarks in Section 3.6.2.
remark: Suppose G1 , G2 , . . . is a countably infinite sequence of finite groups for which Gk has derived length k. (It is a fact that a group of derived length k exists for every positive integer k: perhaps one of those very rare applications of wreath products.) Then the direct sum ⊕_{k∈N} Gk = G1 ⊕ G2 ⊕ · · · does not have a finite derived length and so is not solvable. Thus solvable groups are not closed under infinite direct products.
Recall that a finite subnormal series for G is a chain
G = N0 ⊵ N1 ⊵ · · · ⊵ Nk = 1,
where each Nj+1 is a normal subgroup of its predecessor Nj , j = 0, . . . , k − 1, and k is a natural number. (Of course what we have written is actually a descending
subnormal series. We could just as well have written it backwards as an
ascending series.) Recall also that an unrefinable subnormal series is called
a composition series, and when such a thing exists, the factors Nj /Nj+1
are simple groups called the composition factors of G. That the list of
composition factors that appears is the same for all composition series, was
the substance of the Jordan-Hölder Theorem for groups.
These notions make their reappearance in the following:
Corollary 107
1. A group G is solvable of derived length at most k if
and only if it possesses a subnormal series
G = N0 N1 N1 · · · Nk = 1.
such that each factor Nj /Nj+1 , j = 0, . . . k − 1, is abelian.
2. A finite group is solvable if and only if it possesses a composition series
with all composition factors cyclic of prime order (of course, the prime
may depend on the particular factor taken).
proof: Part 1. If G is solvable, the derived series is a subnormal series with successive factors abelian. Conversely, if the subnormal series is as given, then N0 /N1 abelian implies [G, G] = G′ ≤ N1 . Inductively, if G(j) ≤ Nj , then Nj /Nj+1 abelian implies G(j+1) ≤ [Nj , Nj ] ≤ Nj+1 . So by induction, G(j+1) ≤ Nj+1 for all j = 0, 1, . . . , k − 1. Thus G(k) ≤ Nk = 1, and G is
solvable.
Part 2. If G possesses a composition series with all composition factors
cyclic of prime order, G is solvable by Part 1. On the other hand, if G is
finite and solvable it possesses a subnormal series
G = N0 ⊵ N1 ⊵ · · · ⊵ Nk = 1,
with each factor Nj /Nj+1 a finite abelian group. But in any finite abelian
group (where all subgroups are normal) we can form a composition series by
choosing a maximal subgroup M1 of A, a maximal subgroup M2 of M1 , and
so forth. Each factor will then be cyclic of prime order. Thus each interval
Nj Nj+1 can be refined to a chain with successive factors of prime order.
The result is a refinement of the original subnormal series to a composition
series with all composition factors cyclic of prime order.
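For small permutation groups the derived series is easy to compute by machine, and doing so makes the corollary concrete. The following sketch (plain Python, standard library only; the helper names are our own, and permutations are modelled as tuples) computes the derived series of Sym(4); the orders should read 24, 12, 4, 1, so Sym(4) has derived length 3.

from itertools import permutations

def compose(p, q):
    # Apply p first, then q; a permutation is a tuple with p[i] the image of i.
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def closure(gens, identity):
    # Subgroup generated by gens; in a finite group, closing under
    # multiplication alone suffices, since inverses are positive powers.
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def derived_subgroup(G, identity):
    # G' is generated by all commutators a^{-1} b^{-1} a b.
    comms = {compose(compose(inverse(a), inverse(b)), compose(a, b))
             for a in G for b in G}
    return closure(comms, identity)

identity = (0, 1, 2, 3)
G = set(permutations(range(4)))        # Sym(4), order 24
series = [G]
while len(series[-1]) > 1:
    series.append(derived_subgroup(series[-1], identity))
print([len(H) for H in series])        # expected: [24, 12, 4, 1]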
There are many very interesting facts about solvable groups, especially finite ones, which unfortunately we do not have space to reveal. Two such topics are these:
• Phillip Hall’s theory of Sylow Systems. In a finite solvable group there
is a representative Si of Sylpi (G), the pi -Sylow subgroups of G such
that G = S1 S2 · · · Sm , and the Si are permutable in pairs – that is
Si Sj = Sj Si where p1 , . . . pm is the list of all primes dividing the order
of G. Thus in any solvable group of order 120, for example, there must
exist a subgroup of order 15. (This is not true for example of Sym(5)).
• The theory of formations. Basically, a formation F is an isomorphism-closed class of finite solvable groups closed under finite subdirect products. Each finite group G then contains a characteristic subgroup GF which is unique in being minimal among all normal subgroups N of G which yield a factor G/N belonging to F. (GF is called the F-residual of G.) Whenever G is a finite group, let D(G) be the intersection of all the maximal subgroups of G. This is a characteristic subgroup which we shall meet shortly, called the Frattini subgroup of G. A formation F is said to be saturated if, for every finite solvable group G, G/D(G) ∈ F implies G ∈ F. If F is saturated, then there exists a special class of subgroups, the F-subgroups – subgroups in F described only by their embedding in G – which form a conjugacy class. Finite abelian groups comprise a formation, but not a saturated formation. The nilpotent groups, considered in the next section, do form a saturated formation. As a result one gets a theorem like this: "Suppose X and Y are two nilpotent subgroups of a finite solvable group G, each of which is its own normalizer in G. Then X and Y are conjugate subgroups."3 A bit surprising.
3 This conjugacy class of subgroups was first announced in a paper by Roger Carter; they are called the class of "Carter subgroups" of G. Usually mathematical objects bearing a mathematician's name are not named for the person who discovered or advanced the idea. They got it right for once!
But there is more to this business of names. An "inner automorphism" of a finite-dimensional Lie algebra is well-defined: it is one of shape i(n) := exp(ad(n)), where ad(n) is nilpotent. In the theory of finite-dimensional simple complex Lie algebras, there is a class of subalgebras C which are characterized by being nilpotent subalgebras which are "self-normalizing" in the sense that the only inner automorphisms which leave C invariant are those i(n) with n ∈ C. The theorem is that there is only one conjugacy class (one orbit under the inner automorphism group) of such self-normalizing nilpotent subalgebras. They are called the "Cartan subalgebras". Here in solvable groups we have one conjugacy class of self-normalizing nilpotent subgroups and these are called the "Carter subgroups".
5.4 Central Series and Nilpotent Groups.
5.4.1 The upper and lower central series.
Let G be a fixed group. Set Γ0 (G) = G, Γ1 (G) = [G, G], and inductively
define
Γk+1 (G) := [Γk (G), G],
for all positive integers k. Then the descending sequence of subgroups
G = Γ0 (G) ≥ Γ1 (G) ≥ Γ2 (G) ≥ · · · ,
is called the lower central series of the group G. Note that
Γk(G) = [G, G, . . . , G] (with k + 1 arguments).
Since any endomorphism f : G → G maps the arguments of the above multiple commutator into themselves, the endomorphism f maps each member of the lower central series into itself. A similar argument for homomorphisms allows us to state the elementary
Lemma 108
1. Each member Γk (G) of the lower central series is a fully
invariant subgroup of G.
2. If f : G → H is a homomorphism of groups then
f (Γk (G)) = Γk (f (G)).
3. If H is a subgroup of G, then Γk (H) ≤ Γk (G).
The proof is an easy exercise.
A group G is said to be nilpotent if and only if its lower central series
terminates at the identity subgroup in a finite number of steps. In this case
G is said to belong to nilpotence class k if and only if k is the smallest
positive integer such that Γk (G) = 1. Thus abelian groups are the groups of
nilpotence class 1.
The following is immediate from Lemma 108, and the definition of nilpotence.
Corollary 109
1. Every homomorphic image of a nilpotent group of nilpotence class at most k is a group of nilpotence class at most k.
2. Every subgroup of a nilpotent group of nilpotence class at most k is also
nilpotent of nilpotence class at most k.
3. Every finite direct product of nilpotent groups of nilpotence class at most
k is itself nilpotent of nilpotence class at most k.
The beginning student should be able to supply immaculate proofs of these assertions by this time. [Follow the lead of the similar result for solvable groups in the previous section; do not hesitate to utilize the commutator identities in Part 3.]
Clearly, the definitions show that
[Γk(G), Γk(G)] ≤ [Γk(G), G] = Γk+1(G),
so the factor groups (Γk (G))/(Γk+1 (G)) are abelian. As a consequence,
Corollary 110 Any nilpotent group is solvable.
The converse fails drastically: Sym(3) is a solvable group of derived length 2, but it is not nilpotent, since its lower central series drops from Sym(3) to its subgroup N ≅ Z3 and stabilizes at N for the rest of the series.
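This behaviour is easily confirmed by machine. The following sketch (plain Python, standard library only; the same permutation conventions as in the earlier sketch) computes the first few terms Γk(Sym(3)); the orders should read 6, 3, 3, 3, 3, exhibiting the stall at N.

from itertools import permutations

def compose(p, q):
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def closure(gens, identity):
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def bracket(H, G, identity):
    # [H, G]: the subgroup generated by commutators [h, g], h in H, g in G.
    comms = {compose(compose(inverse(h), inverse(g)), compose(h, g))
             for h in H for g in G}
    return closure(comms, identity)

identity = (0, 1, 2)
G = set(permutations(range(3)))             # Sym(3), order 6
Gamma = [G]                                 # Gamma_0(G) = G
for _ in range(4):
    Gamma.append(bracket(Gamma[-1], G, identity))
print([len(g) for g in Gamma])              # expected: [6, 3, 3, 3, 3]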
Now again fix G. Set Z0 (G) := 1, the identity subgroup; set Z1 (G) =
Z(G), the center of G (the characteristic subgroup of those elements of G
which commute with every element of G); and inductively define Zk (G) to
be that subgroup satisfying
Zk (G)/Zk−1 (G) = Z(G/Zk−1 (G)).
That is, Zk (G) is the inverse image of the center of G/Zk−1 (G) under the
natural homomorphism G → G/Zk−1 (G). For example:
Z2 (G) = {z ∈ G|[z, g] ∈ Z(G), for all g ∈ G}.
By definition Zk (G) ≤ Zk+1 (G), so we obtain an ascending series of characteristic subgroups
1 = Z0 (G) ≤ Z1 (G) ≤ Z2 (G) ≤ · · · ,
called the upper central series of G. Of course if G has a trivial center,
this series does not even get off the ground. But, as we shall see below, it
proceeds all the way to the top precisely when G is nilpotent.
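As a concrete illustration, the sketch below (plain Python; the dihedral group of order 8, realized as the symmetries of a square, is our choice of example) computes the upper central series straight from the definition: Zk+1(G) consists of those z with [z, g] ∈ Zk(G) for all g ∈ G. For this group the orders should read 1, 2, 8, so the series reaches the top in two steps.

def compose(p, q):
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def closure(gens, identity):
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def commutator(a, b):
    return compose(compose(inverse(a), inverse(b)), compose(a, b))

identity = (0, 1, 2, 3)
r, s = (1, 2, 3, 0), (0, 3, 2, 1)      # a rotation and a reflection of the square
G = closure([r, s], identity)          # the dihedral group of order 8

Z = [{identity}]                       # Z_0(G) = 1
while Z[-1] != G:
    # Z_{k+1} is the inverse image of the center of G/Z_k:
    nxt = {z for z in G if all(commutator(z, g) in Z[-1] for g in G)}
    if nxt == Z[-1]:
        break                          # the series stalls: G is not nilpotent
    Z.append(nxt)
print([len(layer) for layer in Z])     # expected: [1, 2, 8]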
Suppose now that G has nilpotence class k. This means Γk (G) = 1 while
Γk−1 (G) is non-trivial. Then as
[Γk−1 (G), G] = Γk (G) = 1,
G centralizes Γk−1 (G), that is
Γk−1 (G) ≤ Z1 (G).
Now assume for the purposes of induction, that
Γk−j (G) ≤ Zj (G).
Then
[Γk−j−1 , G] ≤ Γk−j (G) ≤ Zj (G),
so
Γk−j−1(G) ≤ {z ∈ G | [z, G] ≤ Zj(G)} = Zj+1(G).
Thus
Γk−j(G) ≤ Zj(G), for all 0 ≤ j ≤ k. (5.7)
Putting j = k in (5.7), we see that G = Γ0(G) ≤ Zk(G), so Zk(G) = G. That is, if the
lower central series has length k, then the upper central series terminates at
G in at most k steps.
Now assume G ≠ 1 has an upper central series terminating at G in exactly k steps – that is, Zk(G) = G while Zk−1(G) is a proper subgroup of G (as G ≠ 1). Then, of course, G/Zk−1(G) is abelian, and so Γ1(G) = [G, G] ≤ Zk−1(G).
Now for the purpose of an induction argument, assume that
Γj (G) ≤ Zk−j (G).
Then
Zk−j (G)/Zk−j−1 (G) = Z(G/Zk−j−1 )
implies
[Zk−j (G), G] ≤ Zk−j−1 (G).
so
Γj+1 (G) := [Γj (G), G] ≤ [Zk−j (G), G] ≤ Zk−(j+1) (G).
Thus
Γj(G) ≤ Zk−j(G), for all 0 ≤ j ≤ k. (5.8)
Thus if j = k in (5.8), Γk (G) ≤ Z0 (G) := 1. Thus if the upper central series terminates at G in exactly k steps, then the lower central series
terminates at the identity subgroup in at most k steps.
We can assemble these observations as follows:
Theorem 111 The following assertions are equivalent:
1. G belongs to nilpotence class k. (By definition, its lower central series
terminates at the identity in exactly k steps.)
2. The upper central series of G terminates at G in exactly k steps.
There is an interesting local property of nilpotent groups:
Theorem 112 If G is a nilpotent group, then any proper subgroup H of G
is properly contained in its normalizer NG (H).
proof: If G = 1, there is nothing to prove. Suppose H is a proper subgroup
of G. Then there is a minimal positive integer j such that Zj (G) is not
contained in H. Then there is an element z in Zj (G) − H. Then [z, H] ≤
[z, G] ≤ Zj−1 (G) ≤ H. Thus z normalizes H, but does not lie in it.
Corollary 113 Every maximal subgroup of a nilpotent group is normal.
remark There are many groups with no maximal subgroups at all. Some of
these are not even nilpotent. Thus one should not expect a converse to the
above Corollary. This raises the question, though, whether the property of
Theorem 112 characterizes nilpotent groups. Again one might expect that
this is false for infinite groups. However, in the next subsection we shall learn
that among finite groups it is indeed a characterizing property.
Here is a classical example of a nilpotent group. Suppose V is an n-dimensional vector space over a field F. A chamber is an ascending sequence of subspaces S = (V1, V2, . . . , Vn−1) where Vj has vector space dimension j. (This chain is unrefineable.) Now consider the group B(S) of all linear transformations T ∈ GL(V) such that T fixes each Vj and induces the identity transformation on each 1-dimensional factor-space Vj/Vj−1. Rendered as a group of matrices, these are upper triangular matrices with "1"s on the diagonal. The reader may verify that the k-th member of the lower central series of this group consists of upper triangular matrices still having 1's on the diagonal, but having more and more superdiagonals equal to zero as k increases. Eventually one sees that Γn−1(B(S)) contains only the identity matrix.
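The first of these claims is easy to test numerically. The sketch below (plain Python, exact integer arithmetic; the two sample matrices are arbitrary) forms the commutator of two 4 × 4 upper unitriangular matrices, using the fact that (I + N)−1 = I − N + N2 − N3 when N is strictly upper triangular; the first superdiagonal of the result should vanish.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B, sign=1):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def unitriangular_inverse(A):
    # For A = I + N with N strictly upper triangular and N^n = 0,
    # the inverse is the finite geometric series I - N + N^2 - ...
    n = len(A)
    I = [[int(i == j) for j in range(n)] for i in range(n)]
    N = mat_add(A, I, sign=-1)
    inv, term = I, I
    for k in range(1, n):
        term = matmul(term, N)
        inv = mat_add(inv, term, sign=(-1) ** k)
    return inv

def commutator(A, B):
    # [A, B] = A^{-1} B^{-1} A B
    return matmul(matmul(unitriangular_inverse(A), unitriangular_inverse(B)),
                  matmul(A, B))

A = [[1, 2, 5, 1], [0, 1, 3, 7], [0, 0, 1, 4], [0, 0, 0, 1]]
B = [[1, 1, 0, 2], [0, 1, 6, 0], [0, 0, 1, 5], [0, 0, 0, 1]]
for row in commutator(A, B):
    print(row)          # the entries just above the diagonal are all zero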
5.4.2 Finite nilpotent groups.
Let p be a prime number. As originally defined, a p-group is a group in which
each element has p-power order. Suppose P is a finite p-group. Then by the Sylow Theorem, the order of P cannot be divisible by a prime distinct from p. Thus we observe
A finite group is a p-group if and only if it has p-power order.
Now we have seen before that a non-trivial finite p-group must possess a non-trivial center. Thus if P ≠ 1 is a finite p-group, we see that Z(P) ≠ 1.
Now if Z(P ) = P , then P is abelian. Otherwise, P/Z(P ) has a nontrivial
center. Similarly, in general, if P is a finite p-group:
Either Zk (P ) = P or Zk (P ) properly lies in Zk+1 (P ).
As a result, a finite p-group possesses an upper central series of finite
length, and hence is a nilpotent group. Obviously, a finite direct product
of finite p-groups (where the prime p is allowed to depend upon the direct
factor) is a finite nilpotent group.
Now consider this
Lemma 114 If G is a finite group and P is a p-Sylow subgroup of G, then any subgroup H which contains the normalizer NG(P) is itself self-normalizing – i.e., NG(H) = H. In particular NG(P) is self-normalizing.
proof: Suppose x is an element of NG(H). Since P is a p-Sylow subgroup of G, it is also a p-Sylow subgroup of NG(H) and of H. Since P x ∈ Sylp(H), by Sylow's theorem there exists an element h ∈ H such that P x = P h. Then xh−1 ∈ NG(P) ≤ H, whence x ∈ H. (Actually this could be done in one line, by exploiting the "Frattini Argument" (Chapter 3, Corollary 14).) Now we have
Theorem 115 Let G be a finite group. Then the following statements are
equivalent.
1. G is nilpotent.
2. G is isomorphic to the direct product of its various Sylow subgroups.
3. Every Sylow subgroup of G is a normal subgroup.
proof: ( 2. implies 1.) Finite p-groups are nilpotent, as we have seen, and
so too are their finite direct products. Thus any finite group which is the
direct product of its Sylow subgroups is certainly nilpotent.
(2. if and only if 3.) Now it is easy to see that a group is isomorphic to the direct product of its various p-Sylow subgroups (p ranging over the prime divisors of the order of G) if and only if each Sylow subgroup of G is normal. Sylow subgroups of such a direct product are certainly normal (being direct factors), and conversely, if all p-Sylow subgroups are normal, the direct product criterion of Chapter 2 (Lemma 13, Part (3)) is satisfied, upon considering the orders of the intersections S1S2 · · · Sk ∩ Sk+1, where {Si} is the full collection of Sylow subgroups of G (one member for each prime).
(1. implies 3.) So it suffices to show that in a finite nilpotent group, each p-Sylow subgroup P is a normal subgroup of G. If NG(P) = G, there is
nothing to prove. But if NG (P ) is a proper subgroup of G, then it is a proper
self-normalizing subgroup by Lemma 114. This completely contradicts Theorem 112 which asserts that every proper subgroup of a nilpotent group is
properly contained in its normalizer.
Fix a group G. A non-generator of G is an element x of G such that whenever x is a member of a generating set X (that is, when x ∈ X and ⟨X⟩ = G), then X − {x} is also a generating set: ⟨X − {x}⟩ = G. Thus x is just one of those superfluous elements that you never need.4 Now it is not
easy to see from “first principles” – that is, from the defining property alone,
that in a finite group, the product of a non-generator and a non-generator is
again a non-generator. But we have this characterization of a non-generator:
Lemma 116 Suppose G is a finite group. Let D(G) be the intersection of
all maximal subgroups of G. Then D(G) is the set of all nongenerators of G.
proof: Suppose x is a non-generator, and let M be any maximal subgroup of G. Then if x is not in M we must have ⟨{x} ∪ M⟩ = G, while ⟨M⟩ = M, against the definition of "non-generator". Thus x ∈ D(G).
On the other hand, assume that x is an arbitrary element of D(G), and that X is a set of generators of G which counts x among its active participants. If X − {x} does not generate G, then, as G is finite, there is a maximal subgroup M containing H := ⟨X − {x}⟩. But by assumption x ∈ D(G) ≤ M, whence ⟨X⟩ ≤ M, a contradiction. Thus it is always the case that if x ∈ X, where X generates G, then also X − {x} generates G. Thus x is a non-generator. The proof is complete.
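At the scale of a group of order 8, the lemma can be checked by sheer brute force. The following sketch (plain Python; exhaustive enumeration over all subsets is our deliberately naive method, feasible only for tiny groups) computes D(G) for the dihedral group of order 8 and compares it with the set of non-generators taken directly from the definition.

from itertools import combinations

def compose(p, q):
    return tuple(q[p[i]] for i in range(len(p)))

def closure(gens, identity):
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for x in gens:
            h = compose(g, x)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return frozenset(group)

identity = (0, 1, 2, 3)
r, s = (1, 2, 3, 0), (0, 3, 2, 1)
G = closure([r, s], identity)                    # dihedral of order 8

# Every subgroup is the closure of some subset (feasible at order 8).
subgroups = {closure(sub, identity)
             for k in range(len(G) + 1) for sub in combinations(G, k)}
proper = [H for H in subgroups if H != G]
maximal = [H for H in proper if not any(H < K for K in proper)]
D = frozenset.intersection(*maximal)             # the Frattini subgroup

def is_non_generator(x):
    # x is a non-generator iff whenever x lies in a generating set X,
    # X minus {x} still generates.
    for k in range(len(G) + 1):
        for X in combinations(G, k):
            if x in X and closure(X, identity) == G:
                if closure(tuple(y for y in X if y != x), identity) != G:
                    return False
    return True

print(sorted(D) == sorted(x for x in G if is_non_generator(x)))   # True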
remark: This theorem really belongs to posets. Clearly finiteness can be replaced by the only distinctive property used in the proof: that for every subset X of this poset, there is an upper bound – that is, an element of the filter P X – which is either the 1-element of the poset or lies in a maximal member of the poset with the 1-element removed. Its natural habitat is therefore lattices with the ascending chain condition. (We need upper semilattices so that we can take suprema; we need something like a lower semilattice in order to define the Frattini element of the poset as the meet of all maximal elements. Finally, the ascending chain condition makes sure that the maximal elements fully cover all subsets whose supremum is not "1". So at least lattices with the ascending chain condition (ACC) are more than sufficient for this result.)
Theorem 117 (Global non-generators) If X ∪ D(G) generates G, then X
generates G.
4 Strangely, some College Presidents come to mind.
proof: The argument is the same and has the same poset-generalization. If the supremum of X is not G, it lies below a maximal member M of the poset, and then, since D(G) ≤ M as well, it would be impossible that sup(X ∪ D(G)) = G. Thus X has the "top" element G as its supremum.
The subgroup D(G) is called the Frattini subgroup of G.5 It is clearly
a characteristic subgroup of G. In the case of finite groups, something special
occurs which could be imitated only artificially in the context of posets:
Theorem 118 In a finite group, the Frattini subgroup D(G) is a characteristic nilpotent subgroup of G.
proof: The “characteristic” part is clear, and has been remarked on earlier.
Suppose P ∈ Sylp(D(G)). By the Frattini Argument, G = NG(P)D(G). It now follows from Theorem 117 that NG(P) = G, and so in particular P is normal in D(G). Thus every p-Sylow subgroup of D(G) is normal in it. Since D(G) is finite, it is a direct product of its Sylow subgroups and so is nilpotent.
Corollary 119 (Wielandt) For a finite group G the following are equivalent:
1. G is nilpotent.
2. Every maximal subgroup of G is normal.
3. G′ ≤ D(G).
proof: If M is a maximal subgroup of the finite group G which also happens
to be normal, then G/M is a simple group with no proper subgroups, and so
must be a cyclic group of order p, for some prime p. Thus if all maximal subgroups are normal, the commutator subgroup lies in each maximal subgroup
and so lies in the Frattini subgroup D(G). On the other hand, if G/D(G) is
abelian, every subgroup above D(G) is normal; so in particular, all maximal
subgroups are normal. Thus Parts 2 and 3 are equivalent.
But if M is a maximal subgroup of a nilpotent group, M properly lies in its normalizer in G, and so NG(M) = G. Thus the assertion of Part 1 implies Parts 2 and 3.
5 In some of the earlier literature one finds D(G) written as "Φ(G)". I do not understand fully the reason for the change.
So it remains only to show that if every maximal subgroup is normal, then
G is nilpotent. Since G is finite, we need only show that every Sylow subgroup
is normal, under these hypotheses. Suppose P is a p-Sylow subgroup of G.
Suppose NG (P ) 6= G. Then NG (P ) lies in some maximal subgroup M of G.
By Lemma 114, M is self-normalizing. But that contradicts our assumption
that every maximal subgroup is normal. Thus NG (P ) = G for every Sylow
subgroup P . Now G is nilpotent by Theorem 115.
5.5 Coprime Action
Suppose G is a finite group possessing a normal abelian subgroup A, and let
F be the factor group G/A. Any transversal of A – that is, a system of coset representatives of A in G – can be indexed by the elements of the group
F . Any transversal T := {tf |f ∈ F } determines a function
αT : F × F → A
(called a factor system) by the rule that for all (f, g) ∈ F × F,
tf · tg = tf g αT (f, g).
In some sense, the factor system measures how far off the transversal is from
being a group. Because of the associative law, one has
αT(f g, h)αT(f, g)h = αT(f, gh)αT(g, h), (5.9)
for all f, g, h ∈ F . (Notice that since A is abelian, conjugation by any element
of the coset h = Ax gives the same result ax = x−1 ax which we have denoted
by ah .)
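A toy computation may help fix these definitions. In the sketch below (plain Python; the choice G = Z8 with A = {0, 4} is our own, and since the group is written additively the conjugation exponent in (5.9) is invisible), a transversal of A indexed by F = G/A produces a factor system, and the identity (5.9) is verified exhaustively.

n, A = 8, {0, 4}                       # G = Z_8, A = <4>, F = G/A of order 4
transversal = {f: f for f in range(4)} # coset f + A is represented by f

def alpha(f, g):
    # Additive version of t_f t_g = t_{fg} alpha(f, g): the defect lies in A.
    value = (transversal[f] + transversal[g] - transversal[(f + g) % 4]) % n
    assert value in A
    return value

# Check the cocycle identity (5.9); conjugation is trivial since G is abelian.
ok = all((alpha((f + g) % 4, h) + alpha(f, g)) % n ==
         (alpha(f, (g + h) % 4) + alpha(g, h)) % n
         for f in range(4) for g in range(4) for h in range(4))
print(ok)                              # True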
Now, given a transversal T = {tf |f ∈ F }, any other transversal S =
{sf |f ∈ F } is uniquely determined by a function β : F → A, by the rule that
sf = tf β(f ), f ∈ F.
Then the new factor system αS for transversal S can be determined from the
old one αT by the equations
αS(f, g) = αT(f, g)[(β(f g))−1β(f )gβ(g)] for all f, g ∈ F. (5.10)
The group G is said to split over A if and only if there is a subgroup H
of G such that G = HA, H ∩ A = 1 — that is, if and only if some transversal
H forms a subgroup. Such a subgroup H is called a complement of A in
G. In that case, the factor system αH for H is just the constant function at
the identity element of A. Then, given G, A and transversal T = {tf |f ∈ F },
we can conclude from equation (5.10) that G splits over A if and only if there
exists a function β : F → A such that
αT(f, g) = β(f g)β(f )−gβ(g)−1, for all f, g ∈ F. (5.11)
Now suppose A is a normal abelian p-subgroup of the finite group G and suppose P is a Sylow p-subgroup of G with the property that "P splits over A". That means that P (which necessarily contains A) contains a subgroup B such that P = AB and B ∩ A = 1. Let X be a transversal of P in G.
Then T := BX := {bx|b ∈ B, x ∈ X} is a transversal of A in G whose factor
system αT satisfies
αT (b, t) = 1 for all (b, t) ∈ B × T.
Now let m be the p-Sylow index [G : P] = |X| in G. Since m is prime to |A|, there exists an integer k such that mk ≡ 1 (mod |A|). Since A is abelian,
the factors of any finite product of elements of A can be written in any order.
So there is a function γ : F → A defined by
γ(f ) := ∏x∈X αT(x, f ).
Now, multiplying each side of equation (5.9) as f ranges over F, one obtains
γ(h)γ(g)h = γ(gh)αT(g, h)m.
Then, setting β(g) := γ(g)−k for all g ∈ F , we obtain equation (5.11). Thus
G splits over A.
Noting that conversely, if G splits over A, so does P , we have in fact
proved,
Theorem 120 (Gaschütz) Suppose G is a finite group with a normal abelian
p-subgroup A. Then G splits over A if and only if some p-Sylow subgroup of
G splits over A.
Next suppose A is a normal abelian subgroup of the finite group G and
that H is a complement of A in G, so G = HA and H ∩ A = 1. Now
suppose some second subgroup K was also a complement to A. Then there
is a natural isomorphism µ : H → K factoring through G/A, and a function
β : H → A so that
µ(h) = hβ(h) for all h ∈ H.
Expressed slightly differently, K is a transversal {kh |h ∈ H} whose elements
can be indexed by H, and kh = hβ(h) for all h ∈ H. Since K is a group, it
is easy to see that for h, g ∈ H,
kg · kh = kgh = (gh)β(gh), and also
kg · kh = gβ(g) · hβ(h) = ghβ(g)hβ(h).
So
β(gh) = β(g)hβ(h), for all g, h ∈ H. (5.12)
Now assume m = [G : A] is relatively prime to |A|, and select an integer n such that mn ≡ 1 (mod |A|). Then as A is abelian, the product ∏H β(g), taken as g ranges over H, is a well-defined constant b, an element of A. Forming a similar product of the terms on both sides of equation (5.12) as g ranges over H, one now has
b = bh · β(h)m ,
so
β(h) = a · a−h , where a = bn , h ∈ H.
Then for every h ∈ H,
kh = hβ(h) = h · a−h · a = a−1 ha.
Thus K = a−1 Ha is a conjugate of H. So we have shown
Lemma 121 If A is a normal abelian subgroup of a group G of finite order,
and the order of A is relatively prime to its index [G : A], then any two
complements of A in G are conjugate.
The reader should be able to use Gaschütz' Theorem and the preceding Lemma to devise an induction proof of the following
Theorem 122 (The lower Schur-Zassenhaus Theorem) Suppose N is a solvable normal subgroup of the finite group G whose order is relatively prime to
its index [G : N ]. Then
1. there exists a complement H of N in G, and
2. any two complements of N in G are conjugate by an element of N .
On the other hand, using little more than Sylow’s theorem, a Frattini
argument, and induction on group orders one can easily prove
Theorem 123 (The upper Schur-Zassenhaus Theorem) Suppose N is a normal subgroup of the finite group G, such that |N | and its index [G : N ] are
coprime. If G/N is solvable, then
1. there exists a complement H of N in G, and
2. any two complements of N in G are conjugate by an element of N .
Remark: Both the lower and upper Schur-Zassenhaus theorems contain a
hypothesis of solvability, either on N or G/N . However this condition can
be dropped altogether, because of the famous Feit-Thompson theorem which
says that any finite group of odd order must be solvable. The hypothesis
that |N | and [G : N ] are coprime forces one of N or G/N to have odd order
and so at least one is solvable. Of course the Feit-Thompson theorem is far
beyond the sophistication of this relatively elementary course.
As an application of the Schur-Zassenhaus theorem, one can show
Theorem 124 (Glauberman) Let A be a group of automorphisms acting
on a group G leaving invariant some coset Hx of a subgroup H. Then the
following hold:
1. H is itself A-invariant.
2. If A and H have coprime orders (so that at least one is solvable), then
A fixes an element of the coset Hx.
Proof: This is an exercise.
5.6 Exercises for Chapter 5
Exercise 1. We say that a subgroup A of G is subnormal in G if and only
if there is a finite ascending chain of subgroups:
A = N0 < N1 < · · · < Nm = G
from A to G, with each member of the chain normal in its successor – i.e., Ni ⊴ Ni+1, i = 0, . . . , m − 1. Using the Corollary to the Fundamental Theorem of Homomorphisms, show that the poset SN(G) of subnormal subgroups of G is a semi-modular lower semi-lattice.
Exercise 2. Suppose N is a family of normal subgroups of G. Let ⟨N⟩ be defined to be the set of elements of G which are expressible as a finite product of elements each of which belongs to a subgroup in N.
1. Show that ⟨N⟩ is a subgroup of G.
2. We say that N is a normally closed family if and only if for any non-empty subset M ⊆ N, ⟨M⟩ ∈ N. A group is said to be a torsion group if and only if every element of G has finite order. (This does not mean that the group is finite – far from it: there are many infinite torsion groups.) Show that the family T(G) of all normal torsion subgroups of G is a normally closed family. [hint: When any element x is expressed as a finite product x = a1 · · · ak with ai ∈ Ni ∈ N, only a finite number of the groups Ni are involved. So the proof comes down to the case k = 2.]
Then the group
Tor(G) := ⟨T(G)⟩ ∈ T(G)
is a characteristic subgroup of G.
Exercise 3. Fix a prime number p. A group H is said to be a p-group if and only if every element of H has order a power of the prime p. Show that the collection of all normal p-subgroups of a (not necessarily finite) group G is normally closed. (In this case the unique maximal member of the collection of all normal p-subgroups of G is denoted Op(G).)
Exercise 4. Let G be any group, finite or infinite.
1. Show that any subnormal torsion subgroup of G lies in Tor(G).
2. Show that any subnormal p-subgroup of G lies in Op (G).
Exercise 5. Let Q denote the additive group of the rational numbers. The
group Zp∞ is defined to be the subgroup of Q/Z generated by the infinite
set {1/p, 1/p2, . . .}. Show that this is an infinite p-group. Also show that it is indecomposable – that is, it is not the direct product of two of its proper subgroups.
Exercise 6. Let F be a family of abstract groups which is closed under taking unrestricted subdirect products – that is, if H is a subgroup of the direct product ∏σ∈I Xσ, with each Xσ ∈ F, and the restriction to H of each canonical projection
πτ : ∏σ∈I Xσ → Xτ
is an epimorphism, then H ∈ F.
1. Show that there exists a unique normal subgroup NF such that NF is minimal with respect to being a normal subgroup which yields a factor G/NF ∈ F.6
2. Show that there is always a subgroup G0 of G minimal with respect to
these properties:
(a) G0 ⊴ G,
(b) G/G0 is abelian.
3. Explain why there is not always a subgroup N of G minimal with respect to being normal and having G/N a torsion group (or even a p-group). [Hint: Although torsion groups are closed under taking direct sums, they are not closed under taking direct products.]
4. If G is finite, and F is closed under finite subdirect products – that is, the set of possibilities for F is enlarged because of the weaker requirement that any subgroup H of a finite direct product of members of F which remains surjective under the restrictions of the canonical projections is still in F – then there does exist a subgroup NF which is minimal with respect to the two requirements: (i) NF ⊴ G, and (ii) G/NF ∈ F. (If π is any set of prime numbers, a group G is said to be a π-group if and only if every element of G has finite order divisible only by the primes in π. In particular, a π-group is a species of torsion group, and a p-group is a species of π-group.)
Show that the class of π-groups is closed under finite subdirect products. [In this case we write
NF = Oπ(G), Op(G), or Op′(G),
6 Note that as a class of abstract groups F can be viewed either as (i) a collection of groups closed under isomorphisms, or (ii) a disjoint union of isomorphism classes of groups. Under the former interpretation, the "membership sign", ∈, as in "G/NF ∈ F", is appropriate. But in the second case, "∈" need only be replaced by "an isomorphism with a member of F". One must get used to this.
where π = {p} or all primes except p, in the last two cases.]
Exercise 7. Suppose A and B are normal subgroups of a group G. Prove
the following:
1. If A and B normalize a subgroup C, then
[AB, C] = [A, C][B, C].
[Hint: Use the commutator identity (1.2) and the fact that B normalizes [A, C] to show the containment of the left side in the right.]
2. Show that the k-fold commutator [AB, AB, . . . , AB] is a normal product
∏ [X1, X2, . . . , Xk],
where each Xi = A or B, and the product extends over all 2^k possibilities for the k-tuple (X1, . . . , Xk). (Of course each factor in the normal product is a normal subgroup of G. But some of them occur more than once, since the cases with X1 = A and X2 = B are duplicated in the cases with X1 = B and X2 = A.)
3. Show that if A and B are nilpotent of class k and ℓ, respectively, then AB is nilpotent of class at most k + ℓ. [Hint: Consider Γk+ℓ+1(AB) and use the previous step.]
4. Prove that in a finite group, there exists a normal nilpotent subgroup which contains all other normal nilpotent subgroups of G. (This characteristic subgroup is called the Fitting subgroup of G and is denoted F(G).)
The next few exercises spell out a subtle but well-known theorem known as the Baer-Suzuki Theorem (not to be confused with the "Brauer-Suzuki Theorem"7). The proof considered here is actually due to Alperin and Lyons [?].
7 Still another instance in which Mathematical History, with a Nabokovian perverseness, tends to place name-similarities in the student's path. Theorem ??, on the existence and conjugacy of Carter subgroups of finite solvable groups, has the same hypotheses for subgroups as does the theorem on the existence and conjugacy of Cartan subalgebras of finite-dimensional complex Lie algebras. A folklore common in the graduate student "coffee-klatches" of the 50's was that Philip Hall knew the theorem in advance, and merely suggested it to a very bright student of a similar name when he or she (statistically) should eventually happen along. I reject this hypothesis as being sociologically unlikely: (1) with at most a handful of good students allotted by fate to any Professor, it is very improbable that one of these has a name closely resembling a given one, (2) Carter could have proved anything he liked, and (3) such an hypothesis would interfere with the lofty thesis of this footnote that there is a natural and charming and mystical tendency for this to happen. How else should it happen that the mathematician Andrew Wiles, who proved Fermat's last theorem, should have a name resembling Andre Weil?
In the following G is a finite group. There is some new but totally natural
notation due to Aschbacher. Suppose Σ is a collection of subgroups of G and
H is a subgroup of G. Then the symbol Σ ∩ H denotes the collection of all
subgroups in Σ which happened to be contained in the subgroup H. (Of
course this collection could very well be empty.) If Γ is any collection of
subgroups of G, the symbol NG (Γ) denotes the subgroup of G of all elements
which under conjugation of subgroups leave the collection Γ invariant.
Exercise 8. Suppose Ω is a G-invariant collection of p-subgroups of G.
Let Q be any p-subgroup of G and let Γ be any subset of Ω ∩ Q for which
Γ ⊆ NG (Γ). Show that either
1. Γ = Ω ∩ Q, or
2. there is a subgroup X ∈ (Ω ∩ Q) − Γ which also lies in NG(Γ).
[Hint: Without loss of generality, one may assume that Γ = Ω ∩ Q ∩ NG(Γ). Then every element of NQ(NQ(Γ)) leaves Γ = Ω ∩ NQ(Γ) invariant, so NQ(NQ(Γ)) = NQ(Γ). Since Q is a nilpotent group, Γ is a Q-invariant collection of subgroups of Q and the conclusion follows easily.]
Exercise 9. Prove the following:
Theorem 125 (The Baer-Suzuki Theorem) Suppose X is some p-subgroup
of the finite group G. Either X ≤ Op(G) or ⟨X, X g⟩ is not a p-subgroup for some conjugate X g ∈ X G.
[Hint: Set Ω = X G, choose P a p-Sylow subgroup of G which contains X, and set ∆ := Ω ∩ P. Since ∆ = Ω implies ⟨Ω⟩ is a normal p-group, forcing the conclusion X ≤ Op(G), we may assume Ω − ∆ ≠ ∅.
Now show
1. that for any subgroup Y ∈ Ω − ∆, ⟨Y, ∆⟩ cannot be a p-group. [If R is a p-Sylow subgroup containing ⟨Y, ∆⟩, compare |Ω ∩ R| and |Ω ∩ P|.]
Next consider the set S of all pairs (Y, Γ) where Y ∈ X G − ∆, Γ ⊆ ∆ and ⟨Y, Γ⟩ is a p-group. (We have just seen that this collection is non-empty.) Among these pairs, choose one, say (Y, Γ) ∈ S, so that |Γ| is maximal. Now if Γ = ∅, then ⟨X, Y⟩ is not a p-group and we are done. Thus we may assume Γ is non-empty.
Now set Q := ⟨Y, Γ⟩, a p-group. Then, noting that ⟨Y, ∆ ∩ Q⟩ is a p-group, the maximality of Γ forces Γ = ∆ ∩ Q. Now Γ̄ := ∪Z∈Γ Z ⊆ P ⊆ NG(∆) and Γ̄ ⊆ Q together imply
Γ̄ ⊆ NG(Q) ∩ NG(∆) ⊆ NG(Q ∩ ∆) = NG(Γ).
By the previous exercise, either Γ = Ω ∩ Q or there is a subgroup Y′ ∈ (Ω ∩ Q) − Γ with Y′ ⊆ NG(Γ). The former alternative is dead from the outset because of the existence of Y. Now Y′ belongs to X G − ∆, and ⟨Y′, Γ⟩ is a p-subgroup, as it is a subgroup of Q. Thus (Y′, Γ) is also one of the extreme pairs in S.
Similarly, since ⟨Y′, ∆⟩ is not a p-group, Γ ≠ ∆ and so, applying the previous exercise once more with P in place of Q, there must exist a subgroup Z ∈ ∆ − Γ with Z ∈ NP(Γ). Thus
{Y′, Z} ∪ Γ ⊆ Ω ∩ NG(Γ).
Now NG(Γ) is a proper subgroup of G (otherwise some member of X G lies in the normal p-subgroup ⟨Γ⟩). Now for the first time, we exploit induction on the group order to conclude that
⟨Y′, Γ ∪ {Z}⟩
is a p-group. Clearly this contradicts the maximality of |Γ|, completing the proof.]
Exercise 10. The proof just sketched is very suggestive of other generalizations, but there are limits. We say that a finite group G is p-nilpotent if and only if G/Op′(G) is a p-group.
Show that the collection of finite p-nilpotent groups is closed under finite normal products. (In a finite group, there is then a unique maximal normal p-nilpotent subgroup, usually denoted Op′p(G).)
Is the following statement true? Suppose X is a p-subgroup of the finite group G. Then either X ≤ Op′p(G) or else there exists a conjugate X g ∈ X G such that ⟨X, X g⟩ is not p-nilpotent. [Hint: Think about the fact that in any group, the group generated by two involutions is a dihedral group.]
Exercise 11. Suppose F is a family of groups closed under normal products. Suppose G possesses a normal subgroup N maximal among the normal subgroups belonging to F (such a subgroup exists when G is finite). Is it true that any subnormal subgroup of G belonging to F lies in N? Prove it, or give a counterexample.
Finally we add an "open problem".8
Exercise 12. Open problem: Prove the following
Theorem 126 (Brodkey’s Theorem Extended) Suppose G is a finite group
with Op (G) = 1 — that is, G has no nontrivial normal p-subgroups. Show
that there exist two p-Sylow subgroups P and P g such that P ∩ P g = 1.
[Yes, one could reduce to the case that G is simple and use the classification of
finite simple groups to solve the problem. But of course that would condemn
the theorem to a rather limited usefulness. At that level, it could never be
used to streamline the theory of the classification, and could be applied only
in fields willing to accept the classification – say coding theory or most of
the work on flag-transitive incidence geometries.
Moreover, the hypotheses are so elementary that one could easily imagine that the theorem could be proved just with the theorems developed so far. Such a principle is not a good guide, as the Fermat Theorem shows.
Moreover, there are other theorems in Group theory, such as the theorem
concerning the Frobenius Kernel, which are equally naive in their hypotheses,
yet seem to require character theory for no good reason.
Nonetheless, one cannot escape the feeling that there is some analogy between the Brodkey result and the Alperin-Lyons proof of the Baer-Suzuki
theorem. An analogue in sesquilinear forms (Chapter 16) is the fact that for
non-degenerate polar spaces of finite rank, there exist two maximal totally
isotropic (or totally singular) subspaces which intersect at 0.]
Exercise 13. There are other open generalizations of the Baer-Suzuki Theorem due to Chat-Yin Ho. Here is one of his examples: Suppose P is a
8 Probably the word "open" only means that the solutions are not apparent to the author; but this definition has such an overwhelming breadth that it dilutes all meaning.
p-Sylow subgroup of the finite group G. Suppose X is a p-subgroup with the property that ⟨X, Y⟩ is a p-subgroup for every subgroup Y in X G ∩ P (using the Aschbacher notation). Then necessarily X ⊆ P. Is this true?
Exercise 14. A student of this course should be able to explain exactly why
this generalization is logically as strong as the Baer-Suzuki theorem.
Chapter 6
Generation in Groups
6.1 Introduction
One may recall that for any subset X of a given group G, the subgroup
generated by X always has two descriptions:
1. It is the intersection of all subgroups of G which contain X;
2. It is the subset of all elements of G which are expressible as a finite product of elements of X or their inverses in G.
The two notions are certainly equivalent, for the set of elements described in item 2 is closed under group multiplication and taking inverses, and so is a subgroup of G (by the Subgroup Criterion) containing X. On the other hand it must be a subset of every subgroup containing X. So this set is the same subgroup which was denoted ⟨X⟩.
But there is certainly a difference in the way the two notions feel. Our point of view in this chapter will be in the spirit of the second of them.
6.2 The Cayley graph
6.2.1 Definition.
Suppose we are given a group G and a subset X. X is said to be a set of generators of G if and only if ⟨X⟩ = G. For any such set of generators X, we can obtain a directed graph C(G, X) without multiple edges as follows:
1. The vertex set of C := C(G, X) is the set of elements of G.
2. The directed edges are the ordered pairs of vertices: (g, gx), where
g ∈ G and x ∈ X. (It is our option to label such an edge by the symbol
“x”, since each edge could only be assigned in this way a unique label.)
The directed labelled graph C(G, X) is called the Cayley graph of the
generating set X of the group G.
One now notices that left multiplication of all the vertices of C (that is, the elements of G) by a given element h of G induces an automorphism πh of C (in fact one preserving the labels), since (hg, hgx) is an edge labelled by "x".
Let us look at a very simple example. The set X = {a = (12), b = (23)} is a set of generators of the group G = Sym(3), the symmetric group on the set {1, 2, 3}. The Cayley graph then has six vertices, and each vertex has two outgoing edges labeled "a" and "b", respectively, and two ingoing edges labeled "a" and "b" as well. Indeed, we could coalesce an outgoing and ingoing edge with the same label, as one undirected edge bearing that label. This only happens when the label indicates an involution of the group. In this case we get a hexagon with sides labelled "a" and "b" alternately.
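The hexagon can be produced mechanically. The sketch below (plain Python; the permutation conventions and helper names are our own, with the edges (g, gx) given by right multiplication) builds this Cayley graph and counts the vertices and the coalesced undirected edges.

from itertools import permutations

def compose(p, q):
    return tuple(q[p[i]] for i in range(len(p)))

a, b = (1, 0, 2), (0, 2, 1)                  # the transpositions (12) and (23)
vertices = list(permutations(range(3)))      # the six elements of Sym(3)

# Directed labelled edges (g, gx, label):
edges = [(g, compose(g, x), lbl)
         for g in vertices for x, lbl in [(a, "a"), (b, "b")]]

# Both generators are involutions, so (g, gx) and (gx, g) coalesce:
undirected = {frozenset((g, h)) for g, h, _ in edges}
print(len(vertices), len(undirected))        # 6 vertices, 6 edges: a hexagon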
6.2.2 Morphisms of various sorts of graphs.
Since we are talking about graphs, this may be a good time to discuss homomorphisms of graphs. But there are various sorts of graphs to discuss.
The broad categories are
1. Simple graphs Γ = (V, E) where edges are just subsets of V , the vertex
set, of size two. (These record all irreflexive symmetric relations on a
set V , for example the acquaintanceship relation at a large party.)
2. Undirected graphs with multiple edges. (This occurs when the edges themselves live an important life of their own – for example when they
themselves live an important life of their own – for example when they
are the bridges connecting islands, or doors connecting rooms.)
3. Simple graphs with directed edges. Here the edge set is a collection
of ordered pairs of vertices. (This occurs when one is recording asymmetric relations, for example, which football team beat which in a football league, assuming each pair of teams plays at most once. Or the comparison relation of elements in a poset. We even allow
two relations between points, for example (a, b) and (b, a) may both be
directed edges. This occurs, if we have sites in the “Alt Stadt” which
are connected by a network of both one-way and two-way streets.)
4. Any of the above, with labels attached to the edges. Cayley graphs,
for example, can be viewed as directed labeled graphs.
A graph homomorphism f : (V1, E1) → (V2, E2) is a mapping f : V1 → V2 such that if (a, b) ∈ E1 then either f(a) = f(b) or (f(a), f(b)) ∈ E2. If the two graphs are labeled, there are labelling mappings λi : Ei → Li assigning to each (directed) edge a label from a set of labels Li, i = 1, 2. Then we require an auxiliary mapping f′ : L1 → L2, so that if (a, b) ∈ E1 carries label x and (f(a), f(b)) is an edge, then it too carries label f′(x).
6.2.3 Morphisms of groups induce morphisms of Cayley graphs.
Now suppose f : G → H is a surjective morphism of groups. Then it is
easy to see that if X is a set of generators of G, then also f (X) is a set of
generators of f (G) = H. Now f is a fortiori a map from the vertex set of
the Cayley graph C(G, X) to the vertex set of the Cayley graph C(H, f (X)).
For (g, x) ∈ G × X, (g, gx) is a directed edge labelled x. Then clearly either f(g) = f(gx) or else (f(g), f(gx)) is a directed edge of C(H, f(X)), labelled
f (x). Thus
Lemma 127 If X is a set of generators of a group G and f : G → H is an epimorphism of groups, then f(X) is a set of generators of H and f induces a homomorphism of the Cayley graphs, f : C(G, X) → C(H, f(X)), as labeled directed graphs.
6.3 Free groups
The Cayley graphs of the previous section were defined by the existence of a
group G (among other things). Now we would like to reverse this; we should
like to start with a set X and obtain out of it a certain group which we shall
denote F (X).
We do this in stages.
First we give ourselves a fixed abstract set X. The free monoid on X is the collection M(X) of "words" x1x2 · · · xk spelled with "letters" xi chosen from the alphabet X.1 The number k is allowed to be zero – that is, there is a unique "word" spelled with no letters at all – which we denote by the symbol φ and call the empty word.2 The number k is called the length of the word which, as we have just seen, can be zero.
The concatenation of two words u = x1 · · · xk and w = y1y2 · · · yℓ, xi, yj ∈ X, is the word
u ∗ w := x1 · · · xk y1 · · · yℓ.
The word “bookkeeper” is the concatenation of “book” and “keeper”; 12345
is the cancatenation 123 ∗ 45 or 12 ∗ 345, and so forth. Clearly the concatenation operation “*” is a binary operation on the set W (X) of all words over the
alphabet X, and it is associative (though very non-commutative). Moreover,
for any word w ∈ W (X), φ ∗ w = w ∗ φ = w. So the empty word φ is a twosided identity with respect to the operation “∗”. Thus M (X) = (W (X), ∗)
is an associative semigroup with identity, – that is a monoid. Indeed, since
it is completely determined by the abstract set X alone, it is called the free
monoid on (alphabet) X.3
Now let σ : X → X be an involutory fixed-point-free permutation of X. This means σ is a bijection from X to itself such that
1. σ 2 = 1X , the identity mapping on X, and
2. σ(x) 6= x for all elements x in X.
We are going to construct a directed labelled graph Γ = F (X, σ), called
a frame which is completely determined by X and the involution σ.
Let W ∗(X) be the subset of W(X) of those words y1y2 · · · yr, r any non-negative integer, for which yi ≠ σ(yi+1), for any i = 1, . . . , r − 1. These are
1 Of course, to be formal about it, one can regard these as finite sequences (x1, . . . , xk), but the analogy with linguistic devices is helpful because the binary operation we use here is not natural for sequences.
2 I suppose one could say that blank pieces of paper are literally "literally" covered with some number of these words – but a philosophical paper on the issue of how many empty words are on a blank piece of paper is a medieval project that could best be described by such words.
3 Monoids were introduced in Chapter 1.
the words which never have a “factor” of the form yσ(y). We call them the
reduced words.
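Passing from an arbitrary word to a reduced one is a simple stack computation. The following sketch (plain Python; the encoding of a letter as a pair (symbol, exponent), with σ flipping the exponent, is our own device) deletes factors of the form yσ(y) until none remain.

def sigma(letter):
    sym, e = letter
    return (sym, -e)

def reduce_word(word):
    # One pass with a stack removes all factors y * sigma(y), however nested.
    stack = []
    for y in word:
        if stack and stack[-1] == sigma(y):
            stack.pop()        # the top of the stack cancels against y
        else:
            stack.append(y)
    return stack

w = [("a", 1), ("b", 1), ("b", -1), ("a", -1), ("a", 1), ("c", 1)]
print(reduce_word(w))          # [('a', 1), ('c', 1)], i.e. the word ac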
We now construct a labelled directed graph Γ whose vertices are these reduced words in W ∗(X). The set E of oriented labelled edges comprises two types:
1. Ordered pairs (w, wy) ∈ W ∗ (X) × W ∗ (X) are to be directed edges
labelled “y”. (Note that the nature of the domain forces the reduced
word w not to end in the letter σ(y) – otherwise wy would not be
reduced.)
2. Similarly those ordered pairs of the form (wy, w) (under the same hypothesis that w not end in σ(y)) are directed edges labeled by “σ(y)”.
The two types of directed edges are distinguished this way: in the first kind, the "head" of the directed edge is a word of length one more than its "tail", while the reverse is the case for the second kind of edge.
Then the directed graph Γ = (W ∗ (X), E) which we have defined and
have called a frame has these properties:
1. Let Vk be the set of words of W ∗ (X) of length k. Then V0 = {φ}, and
the vertex set partitions as W ∗ (X) = V0 + V1 + V2 + . . ..
2. Each vertex has all its outgoing edges labeled by X; it also has all its
ingoing edges labeled by X. Each directed edge (a, b) that is labeled
by y ∈ X, has its “transpose” (b, a) labelled by σ(y).
3. For k > 0, each vertex in Vk receives just one ingoing edge from, and gives just one outgoing edge to, a single vertex in Vk−1 (the same vertex in both cases). All further edges leaving or arriving at this vertex are to or from vertices of Vk+1.
4. Given any ordered pair of vertices (w1 , w2 ) (recall the wi are reduced
words) there is a unique directed path in Γ from vertex w1 to vertex
w2 . It follows that there are no circuits other than “backtracks” – that
is, circular directed walks of the form
(v1, v2, . . . , vm, . . . , v2m+1)
where vj = v2m+2−j, and (vj, vj+1) and (vj+1, vj) are all directed edges, j = 1, . . . , m.
At this point we define a monoid homomorphism
µ : M (X) → Aut(Γ).
For each y ∈ X and vertex w in W ∗(X) define
µ(y)(w) := yw, if w does not begin with the letter σ(y);
µ(y)(w) := w′, if w has the form σ(y)w′.
It is then quite clear that µ(y) is a label-preserving automorphism of Γ (this needs only be checked at vertices w of very short length, where "cancellation" affects the terminal vertices).
Notice that by definition
µ(σ(y)) = (µ(y))−1,
the inverse of the automorphism induced by y.
Next, for any word m = y1y2 · · · yk in the monoid M(X) we set
µ(m)(w) := µ(yk)(µ(yk−1)(· · · (µ(y1)(w)) · · ·)),
that is, µ(m) is the composition
µ(yk) ◦ · · · ◦ µ(y1)
of automorphisms of Γ. Clearly, µ(m1 m2 ) = µ(m1 )µ(m2 ) for all (m1 , m2 ) ∈
M (X) × M (X) and µ(φ) = 1W ∗ (X) . So µ is a monoid homomorphism
µ : W (X) → Aut(Γ).
Note that µ is far from being 1-to-1. For each y ∈ X, µ(yσ(y)) = 1Γ, the identity automorphism on Γ. Thus all compound expressions derived
from the empty word by successively inserting factors of the form yσ(y),
– such as abσ(c)deσ(e)σ(d)cf gσ(g)σ(f )σ(b)σ(a) – also induce the identity
automorphism of Γ.
We now have
Lemma 128 Let G = Aut Γ, the group of all label preserving automorphisms of the directed graph Γ.
1. The monoid homomorphism µ : M (X) → G is onto.
2. G acts regularly on the vertices of Γ.
3. The group G is generated by the set µ(Y ), where Y is any system of
representatives of the σ-orbits on X.
4. Identities of the form µ(m) = 1Γ occur if and only if the word m ∈
M (X) is formed by a sequence of insertions of factors yi σ(yi ) starting
with the empty word.
5. Γ is the Cayley graph of G with respect to the set of generators X.
proof: Suppose g is an element of G fixing a vertex w in Γ. Then, as each (in- or out-) edge on w bears a unique label, every vertex next to w is fixed. Since every vertex u is connected to the vertex φ, the empty word in W ∗(X), by an "undirected path" of length equal to the length ℓ(u), the undirected version of Γ is connected. This forces the element g of the first sentence of this paragraph to fix all vertices of Γ.
But now it is clear that for any word w ∈ W ∗ (X), regarded as a monoid
element, µ(w) is an element of G taking vertex φ to w. Thus µ(M (X)) = G,
and 1. and 2. are proved. So is the statement that µ(X) is a set of generators
of G in part 3.
Suppose now that w = y1y2 · · · yk ∈ M(X) satisfies µ(w) = 1Γ. Then successive premultiplications by yk, then yk−1, and so on, take the vertex φ on
eventually returning to φ via a directed edge (y1 , φ). At each stage of this
journey, the current image of φ is either taken to a vertex that is one length
further or else one length closer to its original “home”, vertex φ. So there is
a position
v = µ(yj yj+1 · · · yk )(φ)
in which a local maximum occurs in the distance from “home”. This means
that both v − = µ(yj+1 · · · yk )(φ) and v + = µ(yj−1 yj · · · yk )(φ) are one unit
closer to the starting vertex φ than was v. But as pointed out in the construction of Γ, each vertex z at distance d from φ has only one out-edge and
one in-edge to a vertex at distance d − 1 from φ, and the two closer vertices
are the same vertex –say z1 –, so that the two edges in and out of the vertex z
are two orientations of an edge on the same pair of vertices {z, z1 }. It follows
therefore that v − = v + and that yj , the label of directed edge (v − , v), is the
σ-image of the label yj−1 of the reverse directed edge (v, v + ). Thus
µ(w) = µ(y1y2 · · · yj−2 yj−1 yj yj+1 · · · yk) = µ(y1 · · · yj−2 yj+1 · · · yk) = 1Γ. (6.1)
Now one applies induction on the length of w′ := y1 · · · yj−2 yj+1 · · · yk to complete the proof of 4.
Part 5. is obvious.
Now let Y be a system of representatives of the σ-orbits on X. Then we
have
X = Y ∪ σ(Y ), and Y ∩ σ(Y ) = ∅.
The group G above is canonically defined (one sees) by the set Y alone. To recover X one need only formally define X to be the disjoint union of two copies of Y, and invoke a bijection between the two copies to define the involutory permutation σ : X → X.
Moreover, we see from Part 4 of Lemma 128 that the only relations that can exist among the elements of G are consequences of the relations µ(y) ◦ µ(σ(y)) = 1, y ∈ X. Accordingly the group G is called the free group on the set of generators Y and is denoted F(Y).4
remarks:
1. (Uniqueness.) As remarked, F(Y) is uniquely defined by Y, so if there is a bijection f : Y → Z, then there is an induced group isomorphism F(Y) ≅ F(Z).
2. For any frame Φ = F (Z, ζ), Aut(Φ) is a free group F (Z0 ), for any
system Z0 of representatives of the ζ-orbits on Z.
3. (The inverse mapping on Γ.) Among the reduced words comprising the vertices of Γ, there is a well-defined involutory "inverse mapping"
w ↦ w−1 : W ∗(X) → W ∗(X)
4 Although this is the customary notation, unfortunately, it is not all that distinctive in mathematics. For example F(X) can mean the field of rational functions in the indeterminates X, as well as a myriad of meanings in Analysis and Topology.
which fixes only the empty word φ and takes the reduced word w = y1y2 · · · yk to the reduced word w−1 := σ(yk)σ(yk−1) · · · σ(y1).
Thus the group-theoretic inverse mapping µ(w) → (µ(w))−1 is extended to the vertices of Γ by the rule that (µ(w))−1 := µ(w−1): that is, the inverse mapping of group elements induces a similar mapping on W ∗(X) via the bijection µ from the regular action of G = Aut(Γ) on the vertices of Γ.
Thus without ambiguity, we may write y −1 for σ(y) and Y −1 for σ(Y ),
so Y ∩ Y −1 = ∅.
(It is this mapping on X which defines the graph Γ; there is no mention of a special system of representatives of σ-orbits. Thus if Y′ is any other such system, then F(Y) = F(Y′) with "equals", rather than the weaker isomorphism sign.)
Corollary 129 Any subgroup of a free group is also a free group.
proof: Let F (Y ) be the free group on the set Y , let X = Y ∪ Y −1 , and let
Γ be the Cayley graph of F (Y ) with respect to the set of generators X. We
used Γ to define F(Y), for by the preceding lemma F(Y) is the group of all
label-and-direction-preserving automorphisms of Γ.
Let H be a non-trivial subgroup of F (Y ) and let ΓH be the H-orbit on
vertices of Γ which contains the vertex φ, the empty word. Now each vertex
in ΓH is a reduced word w and there is a unique directed path from φ to
that vertex; moreover the sequence of labels from Y encountered along this
directed path is in fact the sequence of letters which in their order "spell" the reduced word w comprising this vertex. An H-neighbor of φ is a vertex w of ΓH
such that along the unique directed path in Γ from φ to w, no intermediate
vertex of ΓH is encountered. Let Y H denote the collection of all H-neighbors
of φ. Note from the remarks above, and the fact that H is a subgroup of F(Y), that if the reduced word w is an H-neighbor of φ, then so is w−1.
Now we construct a new labelled graph on the vertex set ΓH . There is a
directed edge (x, y) ∈ ΓH × ΓH if and only if x = yw where yw is reduced
and w ∈ Y H . (This means x = µ(y)(w).) In a sense this graph can be
extracted from the original graph Γ in the following way: If we think of ΓH
as embedded in the graph Γ, we are drawing a directed edge from x to y
labelled w if and only if the reduced word formed by juxtaposing the labels
of the successive edges along the unique directed path in Γ from x to y is w
and w is in Y H .
Now the graph ΓH has as its vertices elements of W ∗ (Y H ). There is a
unique directed path in ΓH connecting any two vertices. Also for every two
vertices there are no directed edges or exactly two directed edges, (x, y) and
(y, x) respectively labelled w and w−1 for a unique w ∈ Y H . Thus the graph
ΓH has all of the properties that Γ had except that X has been replaced by Y H. Thus, as H is a regularly acting orientation- and label-preserving group of automorphisms of ΓH, H ≅ F(Y+H) where Y H = Y−H + Y+H is any partition of Y H such that for each w ∈ Y H, exactly one representative of {w, w−1} lies in Y+H. (Notice that we have used the inverse map on vertices of Γ in describing ΓH.) Thus the graph ΓH is a frame F(Y H, −1), and H, being transitive, is by Lemma 128 the free group on Y+H.
Exercise 1: Show that if |X| > 1, then it is possible to have |X| finite,
while |Y H | is infinite. That is, a free group on a finite number of
generators can have a subgroup which is a free group on an infinite
number of generators.
Exercise 2: This is a project. Is it possible for a group to be a free group
on more than one cardinality of generator? The question comes down
to asking whether F(X) ≅ F(Z) implies there is a bijection X → Z.
Clearly the graph-automorphism definition of a free group is the way
to go.
Exercise 3. Another project: What can one say about the automorphism group of the free group F(X)? Use the remarks above to show that it is larger than Sym(X).
6.3.1 The Universal Property
We begin with a fundamental theorem.
Theorem 130 Suppose G is any group, and suppose X is a set of generators
of G. Then there is a group epimorphism F (X) → G, from the free group
on the set X onto our given group G, taking X to X identically.
Proof: Here X is simply a set of generators of the group G, and so a subset
of G. But a free group F (X) can be constructed on any set, despite whatever
previous roles the set X had in the context of G. One constructs the formal
set Y = X + X −1 = two disjoint copies of X, with the notational convention
that the exponential operator −1 denotes both a bijection X → X −1 and its
inverse X −1 → X so that (x−1 )−1 = x. Recall that any arbitrary element of
the free group F (X) was a reduced word in the alphabet Y – that is,
an element of W ∗ (Y ), consisting of those words which do not have the form
w1 yy −1 w2 for any y ∈ Y . (Recall that these were the vertices of the Cayley
graph Γ = Γ(Y ) which we constructed in the previous subsection.)
Now there is a mapping f which we may think of as our "amnesia-recovering morphism". For each word w = y1 · · · yk ∈ W ∗(Y), f(w) is the
element of G we obtain when we “remember” that each yi is one of the generators of G or the inverse of one. Clearly f (w1 w2 ) = f (w1 )f (w2 ), since this
is how elements multiply when we remember that they are group elements.
Then f is onto because every element of G can be expressed as a finite product of elements of X and their inverses. There is really nothing more to
prove.
One should bear in mind that the epimorphism of the theorem completely
depends on the pair (G, X). If we replace X by another set of generators Y – possibly of smaller cardinality – one gets an entirely different epimorphism: F(Y) → G.
Generators and relations. Now consider the same situation as in the
previous subsection: A group G contains a set of generators X, and there is
an epimorphism f := fG,X : F (X) → G, taking X to X “identically”. But
there was on the one hand, a directed labelled graph Γ (the “frame”) that
came from the construction in Lemma 128. This was seen to be the Cayley
graph C(F (X), X). Since X ∪ X −1 is also a set of generators of G, one may
form the Cayley graph C(G, X ∪ X−1). Then by Lemma 127, there is a label-preserving vertex-surjective homomorphism of directed labelled graphs
f : Γ → C(G, X ∪ X−1),
from the frame Γ = F(X ∪ X−1, −1) to the Cayley graph.
Where Γ had no circuitous directed walks other than backtracks, the
Cayley graph C(G, X ∪ X −1 ) may now have directed circuits. In fact it
must have them if f is not an isomorphism. For suppose f (w1 ) = f (w2 ). If
w1 ≠ w2, there is in Γ a unique directed walk from w1 to w2. Its image is
then a directed circuit.
Since G acts regularly on its Cayley graph by left multiplication, every
directed circular walk in C(G, X ∪ X⁻¹) can be translated to one that begins
and ends at the identity element 1 of G. Ignoring the peripheral attached
backtracks, we can take such a walk to be a circuit
(1, x1, x1x2, x1x2x3, . . . , x1x2 · · · xk = 1),  xi ∈ X ∪ X⁻¹,

where x1x2 · · · xk is reduced in the sense that for i = 1, . . . , k − 1, xi ≠ (xi+1)⁻¹. The equation

x1 · · · xk = 1,

where a reduced word in the generators is asserted to be 1, is called a relation. In a set of relations, the words that are being asserted to be the identity are called relators. Obviously, the distinction is only of grammatical significance, since the relators determine the relations and vice versa.
Let R be a set of relators in the set of generators X; precisely, this means that R is a set of reduced words in W∗(X ∪ X⁻¹). Then the normal closure of R in the free group F(X) is the subgroup ⟨R^{F(X)}⟩, generated by the set of all conjugates in F(X) of all relators in R. Clearly, this is the intersection of all normal subgroups of F(X) which contain R. A group G is said to be presented by the generators X and relations R if and only if

G ≅ F(X)/⟨R^{F(X)}⟩.

This is expressed by writing

G = ⟨X | R⟩.
This is the “free-est” group which is generated by the set X and satisfies the relations R, in the very real sense that any other group which satisfies these relations is a homomorphic image of the presented group.
Let us look at some examples:
Example 1. (The infinite dihedral group D∞.) G = ⟨x, y | x² = y² = 1⟩. As you can see, we allow some latitude in the notation. The set of generators here is X = {x, y}, but we ignore the extra wavy brackets in displaying the presentation. The set of relators is R = {x², y²}, but we have actually written out the explicit relations.
This is a group generated by two involutions x and y. Nothing is hypothesized at all about the order of the product xy, so it generates an infinite
cyclic subgroup N = ⟨xy⟩. Now setting u := xy, the generator of N, we see that yx = u⁻¹ and that x(xy)x = yx, so xux = u⁻¹. Similarly yuy = u⁻¹. Moreover, uy = x shows Ny = Nx. Thus

xN = Nx = Ny = yN
and so N is a normal cyclic subgroup of index 2 in G. One of the standard
models of this group is given by its action on the integers.
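For instance (a sketch of ours, with one common choice of reflections; the particular maps below are our assumption, not the text's), one may take x: n ↦ −n and y: n ↦ 1 − n on the integers:

    x = lambda n: -n          # an involution: reflection of Z about 0
    y = lambda n: 1 - n       # an involution: reflection of Z about 1/2
    u = lambda n: x(y(n))     # u = xy is the translation n -> n - 1

    assert all(x(x(n)) == n and y(y(n)) == n for n in range(-5, 6))
    # the relation xux = u^{-1} derived above, checked pointwise:
    assert all(x(u(x(n))) == n + 1 for n in range(-5, 6))

The product of the two reflections is a translation of infinite order, which is exactly why N = ⟨u⟩ is infinite cyclic here.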
Example 2. (The dihedral group of order 2n, D2n.) Here G = ⟨x, y | x² = y² = (xy)ⁿ = 1⟩. Everything is just as above except that the element u = xy now has order n. This means the subgroup N = ⟨u⟩ (which is still normal) is cyclic of order n, so G = N + xN has order 2n. Note that this group is a homomorphic image of the group of Example 1. When n = 2, the group is an elementary abelian group of order 4 – the so-called Klein fours group.
Example 3. (Polyhedral or (k, l, m)-groups.) Here k, l, and m are integers greater than one. The presentation is

G := ⟨x, y | x^k = y^l = (xy)^m = 1⟩.
The case that k = l = 2 is the dihedral group of order 2m of the previous
example. We consider a few more cases:
(1) the case (k = 2, m = 2): Here u = xy has order 2. But then x and u are involutions, and the product xu = y has order l. So G is a homomorphic image of

⟨x, u | x² = u² = (xu)^l = 1⟩ ≅ D2l.
It is not difficult to see that this homomorphism is an isomorphism.
(2) the case (k = 2, l = 3, m = 3): There is a group satisfying the relations.
The alternating group on four letters contains a permutation x =
(1, 2)(3, 4) of order 2, an element y = (1, 2, 3) of order 3 for which
the product u = (1, 2)(3, 4)(1, 2, 3) = (1, 3, 4) has order three, and
these elements generate Alt(4). Thus there is an epimorphism of the
presented (2, 3, 3)-group onto Alt(4). But the reader can check that
the relations imply
x(yxy²) = y²xy.
The left side is the product of two involutions, while the right side is an
involution. It follows that the involutions x and x1 = yxy² commute.
But now the equations just presented show that

x^{y²} = x1,  and  x^y = x x1.
Thus the abelian fours group K := ⟨x, x1⟩ is normalized by y, so the presented (2, 3, 3)-group G has coset decomposition

G = K + yK + y²K,
and so has order 12.
It follows that G is isomorphic to Alt(4).
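The permutation arithmetic here is easily confirmed by machine. In this sketch (ours; the helper functions are invented for the occasion) we write permutations 0-indexed as tuples, take x = (1, 2)(3, 4) and y = (1, 2, 3) as in the text, and check the (2, 3, 3) relations together with the fact that x commutes with x1 = yxy².

    def compose(p, q):                  # (p*q)(i) = p(q(i))
        return tuple(p[q[i]] for i in range(len(p)))

    def power(p, k):
        r = tuple(range(len(p)))
        for _ in range(k):
            r = compose(r, p)
        return r

    e = (0, 1, 2, 3)
    x = (1, 0, 3, 2)                    # (1,2)(3,4), written on 0,1,2,3
    y = (1, 2, 0, 3)                    # (1,2,3)

    assert power(x, 2) == e and power(y, 3) == e
    assert power(compose(x, y), 3) == e            # (xy)^3 = 1
    x1 = compose(y, compose(x, power(y, 2)))       # x1 = y x y^2
    assert compose(x, x1) == compose(x1, x)        # the involutions commute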
(3) the case (k = 2, l = 3, m = 4): Here x² = y³ = (xy)⁴ = 1, so xyxy = y⁻¹xy⁻¹x and yxyx = xy⁻¹xy⁻¹. Now conjugation by x inverts y⁻¹y^x, since

(y⁻¹y^x)^x = (y⁻¹)^x · y = [y⁻¹ · ((y⁻¹)^x)⁻¹]⁻¹ = (y⁻¹y^x)⁻¹.

But

y⁻¹y^x = y² · xyx = y(yxyx) = y(xy⁻¹xy⁻¹) = (xy⁻¹)⁻¹ · y⁻¹ · (xy⁻¹)

and is a conjugate of y⁻¹ and so has order 3. But yx is conjugate to xy, so yx has order four. Thus

y has order 3, yxyx has order 2, and y · yxyx = y²xyx has order 3.

Thus,

H := ⟨y, yxyx⟩

is a (2, 3, 3)-group, and so by the previous case, has order at most 12. But obviously

G = ⟨x, y⟩ = ⟨x, H⟩ = H + Hx,
so |G| ≤ 24. But in the case of Sym(4), setting x := (12) and y := (134), we see that there is an epimorphism

G → Sym(4).    (6.2)

The orders force⁵ this to be an isomorphism, whence

any (2, 3, 4)-group is Sym(4).    (6.3)
(4) the case (k, l, m) = (2, 3, 5): So here goes. We certainly have

G = ⟨x, y | x² = y³ = (xy)⁵ = 1⟩.

Again set z := xy. Then y = xz has order three. Then for x and z, we have the relations

x² = z⁵ = 1,    (6.4)
(xz)³ = 1.      (6.5)

But the latter implies

xzx = z⁻¹xz⁻¹,    (6.6)
xz⁻¹x = zxz.      (6.7)
Now set t = zxz⁻¹, an involution. Then

(ty)³ = (zxz⁻¹ · xz)³ = (z(zxz)z)³ = (z²xz²)³    (6.8)
      = z²xz⁴xz⁴xz² = z²x(xzx)xz² = z⁵ = 1.      (6.9)
Thus H = ⟨t, y⟩ is a (2, 3, 3)-subgroup of order divisible by 6, and so is Alt(4). It contains a normal subgroup K with involutions t, r := t^y and t^{y²}, and setting Y = ⟨y⟩, one has H = Y + Y rY.
⁵ Of course this could also be worked out by showing that the Cayley graph must complete itself in a finite number of vertices. But the reasons seem equally ad hoc when done this way. I suppose only a brief contest of the efficacy of presenting it both ways might be interesting from a pedagogical point of view, but given that both methods yield the same result in about the same amount of time with the same degree of sophistication, there is no advantage in this.
Next let s be the involution z⁻¹(t^y)z which, upon writing t and y in terms of x and z, becomes

s = z⁻¹xz³xz.

Now

sy = z⁻¹xz³xz · xz = (xz)⁻¹(z³xz²)(xz)
   = (xz)⁻¹ z⁻¹(z⁻¹xz)z · xz

is a conjugate of x and so has order 2. Thus s inverts y.
Also

sr = st^y = (z⁻¹xz³xz)(xz³xz)
          = (xz)⁻¹ z³xzxz³ (xz)
          = (xz)⁻¹ z³(z⁻¹xz⁻¹)z³ (xz)
          = (xz)⁻¹ (z²xz²)(xz)

is conjugate to z²xz², which has order 3 by equation (6.8). Consider now the collection

M = H⟨s⟩H = H + HsH = H + Hs + Hst + Hst^y + Hst^{y²},
a set of 60 elements closed under taking inverses. We show that M is closed under multiplication. Using the three results

M = H + HsH,
H = Y + Y rY,
sY = Y s,

we have MM ⊆ M if and only if

srs ∈ M.

But this is true since (sr)³ = 1 implies

srs = rsr ∈ HsH ⊆ M.
Finally, to get G = M we must show that at least two of the three generators x, y or z lie in M. Now M already contains y, and

s = z⁻¹xz³xz = (xz)⁻¹z³(xz) = y⁻¹z³y,
and so M contains z³ and z = (z³)². Thus G = M has order 60. Since G contains a subgroup of index 5, there is a homomorphism ρ : G → Sym(5) whose image has order at least 30 = lcm{2, 3, 5}. Because of the known normal structure of Sym(5) obtained in the previous chapter, the only possibility is that ρ is a monomorphism and that G ≅ Alt(5).
Again, this could have been proved using the Cayley graph. Neither
proof is particularly enlightening.⁶
Example 4. (Coxeter Groups.) This class covers all groups generated by involutions whose only further relations are the declared orders of the pairwise products of these involutions. First we fix an abstract set X. A Coxeter Matrix M over X is a function M : X × X → N ∪ {∞} which associates with each ordered pair (x, y) ∈ X × X a natural number m(x, y) or the symbol “∞”, so that (1) m(x, x) = 1 and (2) m(x, y) = m(y, x) – that is, the matrix is “symmetric”.⁷
The Coxeter group W(M) is then defined to be the presented group

W(M) := ⟨X | R⟩,

where R = {(xy)^{m(x,y)} | (x, y) ∈ X × X, m(x, y) ≠ ∞}. Thus W(M) is generated by involutions (because of the 1's on the ‘diagonal’ of M). If X is non-empty the
group is always non-trivial. Coxeter classified the M for which W (M ) is a
finite group. These finite Coxeter groups have an uncanny way of appearing all over a large part of important mathematics: Lie Groups, Algebraic
Groups, Buildings, to name a few areas.
Remarks: There are many times in which the choice of relations R may very well imply that the presented group has order only 1. For example, if
⁶ The real situation is basically topological. It is a fact that a (k, l, m)-group is finite if and only if

1/k + 1/l + 1/m > 1.

The natural proof of this is in terms of manifolds and curvature. Far from discouraging anyone, such a connection shows the great and mysterious “interconnectedness” of mathematics. I find that joyful!
⁷ Of course if X is finite, it can be totally ordered so that the data provided by M can be encoded in an ordinary matrix. We just continue to say “matrix” in the infinite case – perhaps out of love of analogy.
G := ⟨x | x³ = x²³² = 1⟩, then G has order 1. In fact, if we were to select at random an arbitrary set of relations on a set X of generators, the odds are that the presented group has order 1. From this point of view, it seems improbable that a presented group is not the trivial group. But this is illusory, since any non-trivial group bears some set of relations, and so the presented group with these relations is also non-trivial in these cases. So most of the time, the non-triviality of a presented group is proved by finding an object on which the presented group can act in a non-trivial way.⁸
Coxeter group W (M ) is known not to be trivial for the reason that it can be
faithfully represented as an orthogonal group over the real field, preserving
a bilinear form B := BM : V × V → R defined by M . Thus Coxeter Groups
are non-trivially acting groups, in which each generating involution acts as a
reflection on some real vector space with some sort of inner product. That is
the trivial part of the result. The Real result is that the group acts faithfully
in this way. This means that there is no product of these reflections equal to
the identity transformation unless it is already a consequence of the relations
given in M itself. This is a remarkable theorem.
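For the curious reader, here is a small numerical sketch of that reflection representation (this is the standard construction, but the details below – in particular the cosine formula for the form – are our addition, not the text's). One takes a basis {α_s} indexed by X, the symmetric form B(α_s, α_t) = −cos(π/m(s, t)), and lets each generator act by v ↦ v − 2B(α_s, v)α_s; we verify numerically that for m(s, t) = 3 the product of the two reflections has order 3.

    import math

    gens = ['s', 't']
    m = {('s', 's'): 1, ('t', 't'): 1, ('s', 't'): 3, ('t', 's'): 3}

    def B(a, b):
        # Gram matrix entries of the form on the basis vectors
        return -math.cos(math.pi / m[(a, b)])

    def reflect(s, v):
        # v is a dict of coordinates; sigma_s(v) = v - 2 B(alpha_s, v) alpha_s
        c = 2 * sum(B(s, a) * v[a] for a in gens)
        return {a: v[a] - (c if a == s else 0.0) for a in gens}

    def apply_word(word, v):
        for s in reversed(word):        # rightmost generator acts first
            v = reflect(s, v)
        return v

    for b in gens:
        v0 = {a: float(a == b) for a in gens}
        v1 = apply_word(['s', 't'] * 3, v0)       # the word (st)^3
        assert all(abs(v1[a] - v0[a]) < 1e-9 for a in gens)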
Exercise 4. Suppose G is a (k, l, m)-group. Using only the directed Cayley graph C(G, X), where X is the set of two elements {x, y} of orders 2 and 4, show that
1. any (2, 4, 4)-group has order 24 and so is Sym(4).
2. any (2, 3, 5)-group has order 120
6.4 The Brauer-Ree Theorem
If G acts on a finite set Ω, and X is any subset of G, we let ν(X) be defined by the equation

ν(X) := Σ_i (|Γi| − 1),

where the sum ranges over the ⟨X⟩-orbits Γi induced on Ω. Thus if 1 is the
identity element, ν(1) = 0, and if t is an involution with exactly k orbits of
length 2, then ν(t) = k.
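A small computational sketch of this definition may help (it is ours, for illustration only). Given generating permutations as tuples on {0, . . . , n − 1}, the ⟨X⟩-orbits are the components obtained by joining i to p(i) for each generator p, and ν(X) falls out at once.

    def orbits(perms, n):
        # union-find on {0,...,n-1}: join i with p(i) for each generator p
        parent = list(range(n))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for p in perms:
            for i in range(n):
                ri, rj = find(i), find(p[i])
                if ri != rj:
                    parent[ri] = rj
        comps = {}
        for i in range(n):
            comps.setdefault(find(i), []).append(i)
        return list(comps.values())

    def nu(perms, n):
        return sum(len(orbit) - 1 for orbit in orbits(perms, n))

    t = (1, 0, 3, 2, 4)       # an involution with exactly two 2-cycles
    print(nu([t], 5))         # 2, matching the remark above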
⁸ It is the perfect analogue of Locke's notion of “substance”: something on which to hang properties.
Theorem 131 (Brauer-Ree) Suppose G = ⟨x1, . . . , xm⟩ acts on Ω and x1 · · · xm = 1. Then

Σ_{i=1}^{m} ν(xi) ≥ 2ν(G).
proof: Let Ω = Ω1 + · · · + Ωk be a decomposition of Ω into G-orbits, and for any subset X of G, set

νi(X) := Σ_j (|Γij| − 1),

where the sum ranges over the ⟨X⟩-orbits Γij in Ωi, so that ν(X) = Σ_i νi(X). By induction, if k > 1, then Σ_j νi(xj) ≥ 2(|Ωi| − 1) for each i, so

Σ_j ν(xj) = Σ_{i,j} νi(xj) ≥ Σ_i 2(|Ωi| − 1) = 2ν(G).
Thus we may assume k = 1, so G is transitive.
Now each xi induces a collection of disjoint cycles on Ω. We propose to replace each such cycle of length d by the product of d − 1 transpositions t1 · · · t_{d−1}. Then the contribution of that particular cycle of xi to ν(xi) is Σ ν(tj) = d − 1, and xi itself is the product of the transpositions which were used to make up each of its constituent cycles. We denote this collection of transpositions T(xi), so

ν(xi) = Σ_{t∈T(xi)} ν(t)  and  xi = Π_{t∈T(xi)} t,
in an appropriate order. Since x1 · · · xm = 1, we have

Π_i Π_{t∈T(xi)} t = 1,

or, letting T be the union of the T(xi), we may say Π_{t∈T} t = 1, with the transpositions in an appropriate order. But as i ranges over 1, . . . , m, the
cycles cover all points of Ω, since G is transitive. It follows that the graph
G(T) := (Ω, T), where the transpositions are regarded as edges, is connected. So by Lemma 23 of Section 3.4.3 of Chapter 3,

⟨T⟩ = Sym(Ω).
But Sym(Ω) is transitive and so ν(Sym(Ω)) = ν(G). Since Π_{t∈T} t = 1, ⟨T⟩ = Sym(Ω), and Σ_{t∈T} ν(t) = Σ_i ν(xi), it suffices to prove the asserted inequality for the case that G = Sym(Ω) and the xi are transpositions.
So now, G = Sym(Ω), x1 · · · xm = 1, where the xi are a generating set of transpositions. Among the xi we can select a minimal generating set {s1, . . . , sk}, taken in the order in which they are encountered in the product x1 · · · xm. We can move these si to the left in the product Π xi at the expense of conjugating the elements they pass over, to obtain

s1 s2 · · · sk y1 y2 · · · yl = 1,  k + l = m,

where the yj's are conjugates of the xi's. Now by the Feit-Lyndon-Scott Theorem, the product s1 · · · sk is an n-cycle (where n = |Ω|). Thus k ≥ n − 1. But then y1 · · · yl is the inverse n-cycle, and so l ≥ n − 1, also. But k + l = m = Σ_i ν(xi) and n − 1 = ν(G), so the theorem is proved.
Exercise 5. Use the Brauer-Ree Theorem to show that Alt(7) is not a (2, 3, 7)-group. [Hint: Use the action of G = Alt(7) on the 35 3-subsets of the seven letters. With respect to this action, show that an involution fixes exactly seven triplets, and so has ν-value 14; that an element of order 3 contributes ν = 20 or 22 (according as it is a single 3-cycle or a product of two 3-cycles); and that an element of order 7 fixes no triplet, acts in five 7-cycles, and contributes ν = 30.]
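The ν-values in the hint can be verified mechanically. In the sketch below (ours), we use the fact that for a single permutation p the ⟨p⟩-orbits are just the cycles of p, so ν(p) is the degree minus the number of cycles; note that the two cycle types of order-3 elements give the two different values 20 and 22, and with either value the Brauer-Ree bound 2ν(G) = 2 · 34 = 68 exceeds the sum.

    from itertools import combinations

    triples = [frozenset(c) for c in combinations(range(7), 3)]
    index = {T: i for i, T in enumerate(triples)}

    def induced(p):
        # the permutation of the 35 triples induced by p in Sym(7)
        return tuple(index[frozenset(p[i] for i in T)] for T in triples)

    def nu(p):
        # nu of a single permutation: degree minus the number of cycles
        n, seen, cycles = len(p), [False] * len(p), 0
        for i in range(n):
            if not seen[i]:
                cycles += 1
                j = i
                while not seen[j]:
                    seen[j], j = True, p[j]
        return n - cycles

    x  = (1, 0, 3, 2, 4, 5, 6)          # (1,2)(3,4): an involution
    y1 = (1, 2, 0, 3, 4, 5, 6)          # (1,2,3): a single 3-cycle
    y2 = (1, 2, 0, 4, 5, 3, 6)          # (1,2,3)(4,5,6)
    z  = (1, 2, 3, 4, 5, 6, 0)          # a 7-cycle

    print([nu(induced(g)) for g in (x, y1, y2, z)])   # [14, 20, 22, 30]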
6.5 Appendix to Chapter 6: A Detective Story: An application of (k, l, m)-groups
At 2030 hours it was brought in. It was an unidentified group. All we knew
was
1. It was simple.
2. It had order 360.
Not much to go on.
I had a hunch that it was Alt(6), which already had a known record of being simple and of order 360. We knew that if it was simple of at least this order then it could not have a subgroup H of index less than 6, for otherwise the standard homomorphism

πH : G → Sym(G/H)
would yield a proper non-trivial kernel. Also if this G had a subgroup H
of index exactly 6, simplicity would yield that πH is an embedding into
Alt(G/H) ≅ Alt(6) and comparing group orders we would have “made” our
group G as an isomorphic version of Alt(6).
So, at the request of Prosecuting Attorney Martha Clark, we made the working assumption⁹
(H’) G is not Alt(6). G has no subgroup of index less than or equal to six.
As usual, we methodically proceeded by a series of short steps, first exploiting the Sylow theory.
STEP 1. G has (or rather “had”) 36 5-Sylow subgroups. G contains 144 elements of order 5.
What were the possibilities? We knew that n5 := |Syl5(G)| was a divisor of 360 congruent to one modulo 5, and so was a divisor of 36 with the same properties. So the only possibilities were 1, 6, and 36.
index of the normalizer of a 5-Sylow subgroup, the first two cases are ruled
out by our working assumption (H 0 ). Since the 5-Sylow subgroups have
prime order 5, they pairwise intersect trivially (they are TI subgroups) and
so each contains 4 elements of order 5. In that way, we found that there were
exactly 36 × 4 = 144 elements of order 5 in its system.
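(For the skeptical reader: the arithmetic is a one-line check – this little sketch of ours lists the divisors of 360 that are congruent to 1 mod 5.)

    print([d for d in range(1, 361) if 360 % d == 0 and d % 5 == 1])
    # [1, 6, 36]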
STEP 2. The 3-Sylow subgroups of G were TI-subgroups, that is, each
intersected its other conjugates only at the identity.
We knew that these 3-Sylow subgroups of G had order 9 and so were
abelian. Thus if two of them intersected at a subgroup hxi of order 3, then
CG (x) would contain at least two of them. This would mean that |CG (x)| =
9d where a divisor m of d is m = |Syl3 (CG (x))|, is not 1, and satisfies m ≡
1(mod 3). If m = 40, CG (x) = G, against G simple. Now if m = 4, a 3-Sylow
subgroup of CG (x) must induce a 3-cycle on X := Syl(CG (x)), and so CG (x)
induces a 2-transitive group – either Alt(4) or Sym(X) – on X. Also, there
would have to be a subgroup F , normal in CG (x), representing the kernel of
⁹ The department has been a lot more sensitive recently about what working assumptions the police are actually using. We are now required to display them openly in our reports. I do not mind this as much as I used to, for I understand this is now done in quite a number of professions.
this action on X, so
|CG (x)| = |F | · 12 or |F | · 24.
By (H'), the only possibility was |F| = 1, with CG(x) inducing Alt(4) on X. But then there would be a Klein 4-group, K, normal in CG(x).
Since K was normal in some 2-Sylow subgroup, NG (K) has at least twice the
order of CG (x), yielding a subgroup of index 5 or less, against (H’). There
was no doubt about it: CG (x) could not contain two or more distinct 3-Sylow
subgroups, and so the TI-condition must have held.
STEP 3. Ω := Syl3 (G) has ten elements and any 3-Sylow subgroup P must
act regularly by conjugation on Ω − {P }. Thus G must have been doubly
transitive on Ω.
The possibilities allowed us by Sylow's Theorem were |Syl3(G)| = 1, 4, 10, 40, with the first two eliminated by our assumption (H'). But if there were 40 3-Sylow subgroups, they would contribute 40 × 8 = 320 elements of orders 3 or 9 (since they are TI-subgroups), which, together with the 144 elements of order 5, would have caused the subject's order to exceed 360. Thus |Ω| = 10 was the only possibility.

Now if P ∈ Syl3(G), P would act by conjugation on Ω, leaving itself fixed and acting on Ω − {P} in orbits of length 3 or 9. But an orbit of length 3 would reveal a non-trivial intersection between P and one of its conjugates, against Step 2. Thus P acts regularly, in a single orbit, on Ω − {P}.
Since the Sylow theory showed us that there was a small degree doubly
transitive representation, my blood began to tingle. Since G was simple,
involutions could only act as even permutations. Moreover this action would
really show up since the simplicity of G would force the transitive action on
Ω to be faithful.
We staked it out. Soon along came an involution t. We saw that it had to act as an even permutation whether it wanted to or not, and so had to fix at least 2 letters, and thus be represented by an involution in a 3-Sylow normalizer, say, as an involution t in NG(P). Then, of course, t must fix the unique point {P} of Ω fixed by P. And then it must act on the remaining points Ω − {P} exactly as it does on P by conjugation. (One of my clerks will put up the relevant file “Lemma 8 (iii)” at this point.) Thus t had to fix 1 + |CP(t)| letters – that is, 2 or 4 letters. But if it fixed 4 letters it would have been an odd permutation and would have been arrested at once. Thus we were left with
STEP 4. Every involution of G appears as an involution t in some 3-Sylow normalizer, NG(P). Moreover an involution t in NG(P) acts on P as a fixed-point-free automorphism of order 2, so t “inverts” P – that is, tyt = y⁻¹ for each y ∈ P.
The last statement is a standard ‘MO’ for involutions which act a little more fixed-point-freely than they should (we have a standard file on these offenders).
At this point we knew what we were expecting:
STEP 5.
1. A 2-Sylow subgroup of a 3-Sylow normalizer was a cyclic group of order
4.
2. All involutions that had ever lived in G had been conjugate in G.

3. G had never possessed elements of order 6, 15, or 10.
Suppose a 2-Sylow subgroup T of a 3-Sylow normalizer NG(P) contained two commuting involutions s and t. Then, according to Step 4, each of s, t and st must invert P. Clearly this is impossible: st would then both invert and centralize P, forcing y = y⁻¹ for every y ∈ P, absurd in a group of order 9. Thus T cannot be Z2 × Z2 and so must be cyclic.
Now any involution does appear in some 3-Sylow normalizer and hence
(by two applications of Sylow’s Theorem) is conjugate to the unique involution in the cyclic group T . So, since all involutions are conjugate to the same
involution, they are all conjugate.
Now since 3-Sylow subgroups are TI-subgroups, any element of order 6 or 15 would have to appear in a 3-Sylow normalizer, NG(P), not inverting P: absurd by Step 4. If u were an element of order 10, its square u² would be an element of order 5, which could only act on the ten elements of Ω as two disjoint 5-cycles. Then u could only have acted as a single 10-cycle, an odd permutation that could not live within G.
Things were happening quickly now.
STEP 6. G had never contained an element of order 8 or 9.

We knew this because first, if G had ever possessed an element of order 8, then G would have had a cyclic 2-Sylow subgroup. The file (Corollary 14)
clearly shows that in this case G cannot be a simple group.¹⁰

On the other hand, we know that if there had ever been an element of order 9, then any 3-Sylow subgroup of G would be cyclic, and so have an automorphism group of order 6. Thus an element of order 4 normalizing P (whose existence was certified by Step 5 (1)) would have its square centralizing P, yielding an element of order 6 against Step 5 (3).
Here, the trail went cold. We could add only a few immediate footnotes to
what we already knew above: (1) there were 45 involutions, each normalizing
exactly two 3-Sylow subgroups. (2) Each 3-Sylow subgroup was elementary
abelian – i.e. of type Z3 × Z3 . Not much else. There just didn’t seem to be
any way to turn. We didn’t have the slightest idea what the structure of a
2-Sylow subgroup of G might have been, without a case-by-case pathology scan on all five possible groups of order 8, and most of us were much too
close to retirement to do that. We only knew that a 2-Sylow subgroup could
not be cyclic of order 8.
At 0830 hours the next morning, one of my deputies (Gee, a Navaho)
knocked persistently on my door. He said,
“Sir? I have been looking at the file on (k, l, m)-groups, and I think it
may be of help.”
Normally I would have said something sarcastic like “How would you
know?” But I didn’t want to burden the Department with another one of
those required training sessions on Diversity, so I turned the subject to the
impersonal issue at hand. I said,
“How could that be? The integers k, l and m could be anything.”
“But Sir,” he said, bowing a little too deeply for my taste, “if we fixed an
involution t and looked at an element u of order 3, Steps 5 and 6 show that
there are only certain orders available to the element tu: it could only have orders 2, 3, 4 or 5. But these produce groups ⟨t, u⟩ which are homomorphic images of (k, l, m)-groups with k = 2, l = 3 and m = 2, 3, 4, or 5, whose order is divisible by the least common multiple of k, l, and m, since it contains elements of these orders.”
I really hadn’t time for this obfuscation so I said
“So?” (This always works.)
¹⁰ Perhaps the only useful information we ever received from questioning the regular representation of a group.
“Moreover,” he went on interminably, “the (2, 3, m)-groups for m = 2, 3, 4
and 5 are all known in the file.”
“You forgot the case m = 1.” I deliberately said this a little crossly, a
trick I had learned in management school, or one of those schools I went to
or was sent to.
“But in that case, Sir, we would have an element of order 2 times an
element of order 3 being an element of order 1.” It sounded familiar. I said,
“Certainly, I was just testing you, Deputy.”
“It is part of the same principle, Sir.” (Somehow I knew he was going to
preach to me.) “If an element t of order 2 times an element u of order 3 has
order m, then the group they generate must have order a multiple of 2, 3,
and m.”
“I knew that,” I snapped (I had learned to “snap” in management school; they said it maintains discipline). “But which (k, l, m)-groups have proper homomorphic images of these orders?”
“None, Sir. I looked it up. For k = 2, l = 3, we have this: If m = 2, then the group is Sym(3). If m = 3, it is a homomorphic image of a (2, 3, 3)-group – that is, Alt(4) – of order at least 6. Thus it is exactly Alt(4) in this case. If m = 4, it is a homomorphic image of Sym(4) of order at least 12, and hence all of Sym(4). If m = 5, it is a homomorphic image of Alt(5) of order at least 30. Thus in each case the group generated by t and u is the full abstract (2, 3, m)-group.”
“Good work, Deputy. What do you propose to do?”

“Well, sir, I suggest we fix an involution t (recall that all involutions are conjugate) as a sort of ‘sting’. Then we watch various Z3-subgroups ⟨u⟩ approach t and record the order of tu.” (I just knew Gee had no feeling for Departmental expenditures.) “This order (he went on) determines the group ⟨t, u⟩, the (2, 3, m)-group. We know that m cannot be 5, for otherwise ⟨t, u⟩ ≅ Alt(5) has index 6, against (H'). So something is bound to happen.”
I put down my notebook and excused the Deputy. In part, my job is to
make these vague ideas of my deputies precise.
We set everything up by fixing an involution t. The potential interactions of t with the set A of 40 Z3's were to be sorted out. We had

A2 := {A ∈ A | ⟨t, A⟩ ≅ Sym(3)}    (6.10)
A3 := {A ∈ A | ⟨t, A⟩ ≅ Alt(4)}    (6.11)
A4 := {A ∈ A | ⟨t, A⟩ ≅ Sym(4)}    (6.12)
A5 := {A ∈ A | ⟨t, A⟩ ≅ Alt(5)},   (6.13)
and the last set was empty by our working assumption. Then setting ai :=
|Ai |, i = 2, 3, 4, we knew that
a2 + a3 + a4 = 40.
The trap was set. I was worried; we still didn’t know anything about the
2-Sylow subgroups (except that they weren’t Z8 ’s).
First came A2 . If A were in A2 , then A would be normalized (and inverted) by t and hence also t would invert the unique 3-Sylow subgroup P
which contains A (Steps 2 and 4). Each such 3-Sylow contributes 4 elements of A2 and t normalizes just two 3-Sylow subgroups (letters of Ω), so
a2 = 2 × 4 = 8. We carefully entered this in the report as
STEP 7. |A2 | = 8.
None of the deputies seemed to know what to do with the subgroups
A ∈ A3 ∪ A4 . We only knew that there had to be some A coming along, and
patience was our game.
It was at this time that Deputy Gee insinuated himself once more.
“Any ideas?”, I asked.
“Are you still mad at me sir?” (At first I did not recall what he was
talking about, perhaps some abruptness on my part.)
“I don’t know what you are talking about. Besides, you responded to my
question with a question. What have you found out, Deputy?” (Always take
charge of the dialogue: the Management Manual (MM) p. 43.)
“Well sir, if A is in one of the classes A3 or A4 , then A and t act on some
Klein 4-group, K. In case A ∈ A3 , t actually lies in K; but if A ∈ A4 then
t does not lie in K and does not normalize A (otherwise A would be in A2 ).
Thus 4-groups of the latter sort contribute just two members of A4 while
4-groups of the former sort contribute four members of A3 .”
I broke the silence of my scratching in my notebook with this review:
“So, according to you, our prayers are answered if we know all of the
4-groups K which are normalized by t. But how in this bloody world are we
to do this if we don't even know what the 2-Sylow subgroups look like?”
I have never seen a more expressionless face than when he ventured that if a 2-Sylow subgroup S were abelian, the Burnside Fusion theorem together with the fact that all involutions were conjugate would eliminate most groups: If S = Z2 × Z2 × Z2 then 7 would divide 360 – not a popular idea; S = Z2 × Z4 has two species of involutions, one a square in S and two non-squares, so they cannot all be conjugate in the normalizer of S.
“Finally, sir, you told me yourself that Z8 had been eliminated. Thus S is one of the two non-abelian groups of order 8, and the quaternion group of order 8 is eliminated because of the presence of a 4-group.”
I was really glad I had thought of this, and am now encouraged to reconsider my promised retirement. At any rate I have once again made Gee’s
vague ideas more precise by writing
STEP 8. A 2-Sylow subgroup of G is dihedral of order 8.
Sometimes Gee can be a pest. He then insisted that we now enjoyed
(“enjoyed” was his word) the hypotheses:
(K1) G is a simple group.
(K2) There exists an involution t such that CG(t) ≅ D8.
Naturally I knew that; I even knew the name of the file (I think it was
called “A Geometry of Klein 4-groups.”) and I knew that he would cough
it up. I hadn’t time to read it fully, but he said that it contained these
statements:
STEP 9. (From a file) If G is a simple group and if t is an involution in G with CG(t) := D ≅ D8, the dihedral group of order 8, then
1. D contains exactly two 4-groups, K1 and K2 , both containing the
unique central involution t of D.
2. K1 is not conjugate to K2 in G.
3. Ki is a TI-subgroup of G – that is,

Ki ∩ Ki^g = Ki or 1, for all g ∈ G, i = 1, 2.
4. G has just one class of involutions.
5. NG(Ki) ≅ Sym(4), i = 1, 2.

6. And some other stuff about a partial linear space Γ := (K1^G, K2^G) with incidence being mutual normalization.
Perhaps after all, this would give us a handle on A3 and A4 . We now knew
all about normalizers of fours-groups. They were all conjugate to NG (K1 ) or
NG (K2 ), and Gee’s lucky guess that all (2, 3, 3)- and (2, 3, 4)-subgroups must
appear only inside such subgroups was going to pay off.
In calculating a3 we needed only look at the two 4-groups K1 and K2
containing t, and record four Z3 ’s in A3 for each. Thus
STEP 10. |A3 | = 8.
Next we had to examine the collection K(t) of all 4-groups K normalized by t but not containing it. For each K in K(t), CK(t) is a group ⟨s⟩ ≅ Z2 and there are just four choices of involution s in CG(t) − {t}. For each such choice, K is the unique 4-group on s distinct from the 4-group ⟨s, t⟩, so that t ∉ K. Thus |K(t)| = 4, with two members in K1^G and two members in K2^G.
Then, as noted above, each member of K(t) yields two members of A4 . Thus
STEP 11. |A4 | ≤ 8.
We write “≤” in case we might have counted some Z3 two times. Gee, with the confidence of his youth, insists that that does not happen.
At any rate we see that a2 + a3 + a4 ≤ 24 and so we have reached a
contradiction. There is no possible way G is not Alt(6). The case is closed.
Your Servant, Inspector van Naughty.
Chapter 7

Elementary Properties of Rings

7.1 Elementary Facts about Rings
In the next several parts of this course we shall take up the following topics
and their applications:
1. The Galois Theory.
2. The Arithmetic of Integral Domains.
3. Semisimple Rings.
4. Multilinear Algebra.
All of these topics require some basic facility in the language of rings and
modules.
For the purposes of this survey course all rings possess a multiplicative
identity. This is not true of all treatments of the theory of rings. But
virtually every major application of rings is one which uses the presence of
the identity element (for example, this is always part of the axiomatics of
Integral Domains, Field Theory, and appears in Artinian Rings and Finite-dimensional Associative Algebras). So in order to even have a language to
discuss these topics, we must touch base on a few elementary definitions and
constructions.
7.1.1 Definitions
A ring is a set – say R – endowed with two distinct binary operations, called (ring) addition and (ring) multiplication – which we denote by “+” and by juxtaposition or “·”, respectively – such that

Addition laws: (R, +) is an abelian group.

Multiplicative Rules: (R, ·) is a monoid.

Distributive Laws: For all triples (a, b, c) ∈ R × R × R,

a(b + c) = ab + ac    (7.1)
(a + b)c = ac + bc    (7.2)
Just to make sure we know what is being asserted here, for any elements
a, b and c of R:
1. a + (b + c) = (a + b) + c
2. a + b = b + a
3. There exists a “zero element” (which we denote by 0R ) with the property that 0R + a = a for all ring elements a.
4. Given a ∈ R, there exists an element (−a) in R such that a + (−a) = 0R.
5. a(bc) = (ab)c.
6. There exists an “identity element” (which we denote by 1R ) with the
property that 1R a = a1R = a, for all ring elements a ∈ R.
7. The distributive laws which we have just stated.
As we know from the group theory, the zero element 0R is unique in R. Also, given an element x in R, there is exactly one additive inverse −x, and always −(−x) = x. The zero element may not be distinct from the multiplicative identity 1R (although this distinctness is required for integral domains, as seen below); allowing 1R = 0R makes the class of rings closed under homomorphic images. When there is no confusion (and sometimes with multiple uses of the notation) we often write 0 for 0R and 1 for 1R.
Lemma 132 For any elements a, b in R:
1. (−a)b = −(ab) = a(−b), and
2. 0 · b = b · 0 = 0.
Some notation: Suppose X and Y are subsets of the set R of elements of a ring. We write

X + Y := {x + y | x ∈ X, y ∈ Y}    (7.3)
XY := {xy | x ∈ X, y ∈ Y}          (7.4)
−X := {−x | x ∈ X}                 (7.5)
There are various species of rings.

• A ring is said to be commutative if and only if multiplication is commutative in it.

• A division ring is a ring in which (R∗ := R − {0}, ·) is a group.

• A commutative division ring is a field.

• A commutative ring in which (R∗, ·) is a monoid – i.e. the non-zero elements are closed under multiplication and possess an identity – is called an integral domain.
There are many other species of rings – Artinian, Noetherian, Semiprime,
Primitive, etc. which we shall meet later, but there is no immediate need for
them at this point.
Finally, if R is a ring with multiplicative identity element 1, then a subring of R is a subset S of R such that
1. S is a subgroup of (R, +). Thus S = −S and S + S = S. In particular
0R ∈ S.
2. SS ⊆ S, –i.e. S is closed under multiplication.
3. 1 ∈ S.
Thus the ring of integers, Z, is a subring of any of the three fields Q, R
and C, of rational, real and complex numbers, respectively. But the set of
even numbers is not a subring of the ring of integers.
7.1.2 Units in rings
Suppose R is a ring with multiplicative identity element 1R. We say that an element x in R has a right inverse if and only if there exists an element x′ in R such that xx′ = 1R. Similarly, x has a left inverse if and only if there exists an element x′′ in R such that x′′x = 1R. (If R is not an integral domain these right and left inverses might not be unique.) An element in R
is called a unit if and only if it possesses both a right and a left inverse in R.
The element 1R , of course, is a unit, so we always have at least one of these.
We observe
Lemma 133
1. An element x has a right inverse if and only if xR = R.
Similarly, x has a left inverse if and only if Rx = R.
2. The right inverse and left inverse of a unit element are equal.
3. The set of units of any ring form a group under multiplication.
proof: Part 1. If x has a right inverse x′, then xR contains xx′ = 1R, and so xR contains every r = (xx′)r = x(x′r); thus xR = R. The converse, that xR = R implies xx′ = 1R for some element x′ in R, is transparent. The left inverse story follows upon looking at this argument through a mirror.
Part 2. Suppose xl and xr are respectively the left and right inverses of an element x. Then

xr = (1R)xr = (xl x)xr = xl (xxr) = xl (1R) = xl.

Part 3. Suppose x and y are units of the ring R with inverses x⁻¹ and y⁻¹, respectively. Then xy has y⁻¹x⁻¹ as both a left and a right inverse. Thus xy is a unit. Thus the set U(R) of all units of R is closed under an associative multiplication, and has an identity element with respect to which every one of its elements has a 2-sided inverse. Thus it is a group.
A ring in which every non-zero element is a unit must be a division ring
and conversely.
Example 1. Some examples of groups of units:
1. In the ring of integers, the group of units is {±1} , forming a cyclic
group of order 2.
2. The integral domain Z ⊕ Zi of all complex numbers of the form a + bi with a, b integers and i = √−1, is called the ring of Gaussian integers. The units of this ring are {±1, ±i}, forming the cyclic group of order four under multiplication.
3. The complex numbers of the form a + b(e^{2iπ/3}), a and b integers, form an integral domain called the Eisenstein numbers. The units of this ring are {1, z, z², z³ = −1, z⁴, z⁵}, where z = −e^{2iπ/3}, forming a cyclic group of order 6.
4. The number λ := 2 + √5 is a unit in the domain D := {a + b√5 | a, b ∈ Z}. Now λ is a real number, and since 1 < λ, we see that the successive powers of λ form a monotone increasing sequence, and so we may conclude that the group U(D) of units for this domain is an infinite group.
5. In the ring R := Z/Zn of residue classes of integers mod n, under addition and multiplication of these residue classes, the units are the residue classes of the form w + Zn where w is relatively prime to n. (Of course, if we wish, we may take 0 ≤ w ≤ n − 1.) They thus form a group of order φ(n), where φ denotes the Euler phi-function. Thus the units of R = Z/8Z are {1 + 8Z, 3 + 8Z, 5 + 8Z, 7 + 8Z}, and form the Klein fours group rather than the cyclic group of order four.
6. In the ring of n-by-n matrices with entries from a field F , the units are
the invertible matrices and they form the general linear group under
multiplication.
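Item 5 invites a quick experiment. A sketch (ours) for n = 8, using the characterization of the units mod n as the residues relatively prime to n:

    from math import gcd

    def units(n):
        return [w for w in range(n) if gcd(w, n) == 1]

    print(units(8))                            # [1, 3, 5, 7]
    print([(w * w) % 8 for w in units(8)])     # [1, 1, 1, 1]: every unit
                                               # squares to 1, so U(Z/8Z) is
                                               # the Klein fours group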
7.1.3 Homomorphisms
Let R and S be rings. A mapping f : R → S is called a ring homomorphism if and only if
1. For all elements a and b of R, f (a + b) = f (a) + f (b).
2. For all elements a and b of R, f (ab) = f (a)f (b).
A basic consequence of Part 1 of this definition is that f induces a homomorphism of the underlying additive groups, so, from our group theory
we know that f (−a) = −f (a), and f (0R ) = 0S . In fact any polynomial
expression formed from a finite set of elements using only the operations of
addition, multiplication and additive inverses gets mapped by f to a similar
expression with all the original elements replaced by their f -images. Thus:
f([(a + b)c² + (−c)bc]a) = [(f(a) + f(b))(f(c))² + (−f(c))(f(b)f(c))]f(a).
The homomorphism is onto (or an epimorphism, or a surjection), injective, or an isomorphism if and only if these descriptions apply to the induced group homomorphism of the underlying additive groups.
7.1.4 Ideals and factor rings.
Again let R be a ring. A left ideal is a subgroup (I, +) of the additive group (R, +) which is invariant under left multiplications from R – that is, RI ⊆ I. Similarly, a subgroup J ≤ (R, +) is a right ideal if JR ⊆ J. Finally, a (two-sided) ideal is a left ideal which is also a right ideal. The parentheses are to indicate that if we use the word “ideal” without adornment, a two-sided ideal is intended. A subset X of R is a two-sided ideal if and only if

X + X = X       (7.6)
−X = X          (7.7)
RX + XR ⊆ X.    (7.8)
If X is a two-sided ideal in R, then it is a (normal) subgroup of the commutative additive group (R, +). Thus we may form the factor group R/X := (R, +)/(X, +) whose elements are the additive cosets x + X in R.
Not only is R/X an additive group, but there is a natural multiplication on this set, induced from the fact that all products formed from two such additive cosets, x + X and y + X, lie in a unique further additive coset. This is evident from
(x + X)(y + X) ⊆ xy + xX + Xy + XX ⊆ xy + X + X + X = xy + X. (7.9)
With respect to addition and multiplication, R/X forms a ring which we
call the factor ring. Moreover, by equation (7.9), the homomorphism of
additive groups R → R/X is a ring epimorphism.
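A concrete sketch (ours) of this construction for R = Z and X = nZ: representing the coset a + nZ by a mod n, the induced operations are addition and multiplication mod n, and equation (7.9) is precisely what makes the multiplication well-defined.

    n = 6
    add = lambda a, b: (a + b) % n     # (a + X) + (b + X) = (a + b) + X
    mul = lambda a, b: (a * b) % n     # (a + X)(b + X) lies inside ab + X

    # changing coset representatives does not change the product coset:
    assert mul(2, 5) == mul(2 + n, 5 - n) == 4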
7.1.5 The fundamental theorems of ring homomorphisms
Suppose f : R → S is a homomorphism of rings. Then the kernel of the
homomorphism f is the set
ker f := {x ∈ R|f (x) = 0S }.
We have
Theorem 134 (The first fundamental theorem of ring homomorphisms.)
1. The kernel of any ring homomorphism is a two-sided ideal.
2. If f : R → S is a homomorphism of rings, then there is a natural
isomorphism between the factor ring R/ ker f and the image ring f (R),
which takes x + ker f to f (x).
proof: Part 1. Suppose f : R → S is a ring homomorphism and that x is an element of ker f. Then as ker f is the kernel of the group homomorphism of the underlying additive groups of the two rings, ker f is a subgroup of the additive group of R. Moreover, for any element r in R, f(rx) = f(r)f(x) = f(r) · 0S = 0S, and similarly f(xr) = 0S. Thus R(ker f) + (ker f)R ⊆ ker f, and so ker f is a two-sided ideal.
Part 2. The image ring f (R) is a subring of S formed from all elements
of S which are of the form f (x) for some element x in R. We wish to
define a mapping φ : R/(ker f ) → f (R) which will take the additive coset
x + ker f to the image element f (x). We must show that this mapping
is well-defined. Now if x + ker f = y + ker f , then x + (−y) ∈ ker f , so
0S = f (x + (−y)) = f (x) + (−f (y)), whence f (x) = f (y). So changing
the coset representative does not change the image, and so the mapping is
well-defined. (In fact the φ-image of x + ker f is found simply by applying f
to the entire coset, that is, f (x + ker f ) = f (x).)
The mapping φ is easily seen to be a ring homomorphism since

φ((x + ker f) + (y + ker f)) = φ(x + y + ker f) = f(x + y)
                             = f(x) + f(y) = φ(x + ker f) + φ(y + ker f),

φ((x + ker f)(y + ker f)) = φ(xy + ker f) = f(xy)
                          = f(x)f(y) = φ(x + ker f) · φ(y + ker f).
Clearly φ is onto since, by definition, f(x) = φ(x + ker f) for any image element f(x). If φ(x + ker f) = φ(y + ker f) then f(x) = f(y), so f(x + (−y)) = 0S, and this gives x + ker f = y + ker f. Thus φ is an injective mapping. We have assembled all the requirements that φ be an isomorphism.
Theorem 135 Suppose f : R → S is an epimorphism of rings. Then there
is a one-to-one correspondence between the left ideals of S and the left ideals of R which contain ker f . Under this correspondence, 2-sided ideals are
matched only with 2-sided ideals.
proof: If J is a left ideal of the ring S, set f⁻¹(J) := {r ∈ R | f(r) ∈ J}. Clearly if a and b are elements of f⁻¹(J), and r is an arbitrary element of R, then

f(ra) = f(r)f(a) ∈ J
f(−a) = −f(a) ∈ J
f(a + b) = f(a) + f(b) ∈ J.

So ra, −a, a + b all belong to f⁻¹(J). Thus f⁻¹(J) is a left ideal containing f⁻¹(0S) = ker f. If also J is a right ideal of S, we have JS ⊆ J. Then f(f⁻¹(J)R) ⊆ Jf(R) = JS ⊆ J, whence f⁻¹(J)R ⊆ f⁻¹(J). Thus f⁻¹(J) is a right ideal if and only if J is.
Now if I is a left ideal of R, then f(I) is a left ideal of f(R) = S. But if I contains ker f, then f⁻¹(f(I)) = I because, for any element x ∈ f⁻¹(f(I)), there is an element i ∈ I such that f(x) = f(i). But then x + (−i) ∈ ker f ⊆ I, whence x ∈ I.
We thus see that the operator f⁻¹ induces a mapping of the poset of left ideals of S onto the set of left ideals of R containing ker f. It remains only to show that this (containment-preserving) operator is one-to-one. Suppose I = f⁻¹(J1) = f⁻¹(J2). We claim J1 ⊆ J2. If not, there is an element y ∈ J1 − J2. Since f is an epimorphism, y has the form y = f(x) for some element x in R. Then x is in f⁻¹(J1) but not f⁻¹(J2), a contradiction. Thus we have J1 ⊆ J2. By symmetry, J2 ⊆ J1 and so J1 = J2. This completes the proof.
a second proof: We can take advantage of the fact that the kernel of the surjective ring homomorphism f : R → S is also the kernel of the homomorphism f⁺ : (R, +) → (S, +) of the underlying additive groups and exploit
the fact that the fundamental theorems of homomorphisms of group theory give us a poset isomorphism

A(R : ker f) → A(S : 0),

where A(M : N) denotes the poset of all subgroups of M which contain the subgroup N. The isomorphism is given by sending a subgroup A of A(R : ker f) to the subgroup f(A) of S.

We need only show that if X is a member of A(R : ker f), then

1. RX ⊆ X if and only if Sf(X) ⊆ f(X).

2. XR ⊆ X if and only if f(X)S ⊆ f(X).

Both of these are immediate consequences of the fact that f is a ring homomorphism, and S = f(R).
7.1.6 The poset of ideals.
We wish to make certain statements that hold for the set of all left ideals
of R as well as for the set of all right ideals and all (two-sided) ideals of R.
We do this by asserting “X holds for (left, right) ideals with property Y”.
This means the statement is asserted three times: once about ideals with
property Y, again about left ideals with property Y, and again about right
ideals with property Y. It does save a little space, and helps point out a
uniformity among the theorems. But on the other hand it makes statements
difficult to scan, and sometimes renders the statement a little less indelible in the student's memory. So I will keep this practice to a minimum.
Lemma 136 Suppose A and B are (left, right) ideals in the ring R. Then

1. A + B := {a + b | (a, b) ∈ A × B} is itself a (left, right) ideal and is the unique minimal such ideal which contains both A and B.

2. A ∩ B is a (left, right) ideal which is the maximal such ideal contained in both A and B.

3. The set of all (left, right) ideals of R forms a partially ordered set – that is, a poset – with respect to the containment relation. In view of the preceding two conclusions, this poset is a lattice.
As in the case of groups, when we say maximal (left, right) ideal, we
mean a maximal member in the poset of all proper (left, right) ideals. Thus
the ring R itself, though an ideal, is not a maximal ideal.
Lemma 137 Maximal (left, right) ideals exist.

proof: Let P be the full poset of all proper ideals, all proper left ideals, or all proper right ideals of the ring R. Let

J1 ≤ J2 ≤ · · ·

be an ascending chain in the poset P. Let U be the set-theoretic union of the Ji. Then any element of U belongs to some Jj. Thus any (left, right) multiple of that element by an element of R belongs to U. Thus it is easy to see that RUR, RU, or UR is a subset of U in the three respective cases of P. Similarly, −U = U and, using the total ordering of the chain, U + U ⊆ U. Thus U is a (left, right) ideal in R. If U = R, then 1R would belong to some Jj, whence RJj, JjR and RJjR would all coincide with R, against Jj ∈ P. Thus U is a member of P and is an upper bound to the ascending chain. Since any ascending chain in P has an upper bound, it follows from Zorn's Lemma that P contains maximal members.
A simple ring is a ring with exactly two (two-sided) ideals: 0 := {0R} and R. (Note that the “zero ring” in which 1R = 0R is not considered a simple ring here; simple rings are non-trivial.) Since every non-zero element of a division ring D is a unit, 0 is a maximal ideal of D. Thus any division ring is a simple ring. The really classical example of a simple ring, however, is the full matrix algebra over a division ring, which appears among the examples at the end of this section.
Lemma 138

1. A (two-sided) ideal A of R is maximal if and only if R/A is a simple ring.

2. An ideal A of a commutative ring R is maximal if and only if R/A is a field.
proof: Part 1 is a direct consequence of Theorem 135 of the previous subsection.
Part 2. Set R̄ := R/A. If R̄ is a field, it is a simple ring, and A is maximal
by Part 1. If A is a maximal ideal, the commutative ring R̄ has only 0 for a
proper ideal. Thus for any non-zero element x̄ of R̄, x̄R̄ = R̄x̄ = R̄. Thus
any non-zero element of R̄ is a unit. Thus R̄ is a commutative division ring,
and hence is a field.
Suppose now that A and B are ideals in a ring R. In a once-only departure from our standard notational convention – special for (two-sided) ideals – we let the symbol AB denote the additive subgroup of (R, +) generated by the set of elements {ab | (a, b) ∈ A × B} (the latter would ordinarily have been written as “AB” in our standard notation). That is, for ideals A and B, AB is the set of all finite sums of the form

Σ_{j∈I} aj bj,  where |I| < ∞, aj ∈ A, bj ∈ B.
If we multiply such an expression on the left by a ring element r, then by the
left distributive laws, each aj is replaced by raj which is in A since A is a left
ideal, yielding thus another such expression. Similarly, right multiplication
of such an element by any element of R yields another such element. Thus
AB is an ideal.¹
Such an ideal AB clearly lies in A ∩ B, but it might be even smaller.
Indeed, if B lies in the right annihilator of A, that is the set of elements
x of R such that Ax = 0, then AB = 0, while possibly A ∩ B is a non-zero
ideal (with multiplication table all zeroes, of course).
This leads us to still another construction of new ideals for old: Suppose A and B are right ideals in the ring R with B ≤ A. The right annihilator of A mod B is the set

AnnR(A : B) := {r ∈ R | Ar ⊆ B}.

Similarly, if B and A are left ideals of R, with B ≤ A, the same symbol AnnR(A : B) will denote the left annihilator of A mod B, the set of ring elements r such that rA ⊆ B.
Lemma 139 If B ≤ A is a chain of right (left) ideals of the ring R, then AnnR(A : B) is a two-sided ideal of R.
¹ Indeed, one can see that it would be a two-sided ideal even if A were only a left ideal and B were a right ideal. It is unfortunate to risk this duplication of notation: AB as ideal product, or set product, but I feel bound to hold to the standard notation of the literature – when there is a standard – even if there are small overlaps in that standard.
proof: We prove here only the right-hand version. Set W := AnnR (A : B).
Clearly W + W ⊆ W , W = −W , and W R ⊆ W , since B is a right ideal. It
remains to show RW ⊆ W . Since A is also a right ideal,
A(RW ) = (AR)W ⊆ AW ⊆ B,
so by definition, RW ⊆ W .
A prime ideal of the ring R is an ideal – say P – such that whenever two ideals A and B have the property that AB ⊆ P, then either A ⊆ P or B ⊆ P. (Note here AB is the product of two ideals as defined above. But of course, since P is an additive group, AB (as a product of two ideals) lies in P if and only if AB (the old set product) lies in P – so the multiple notation is not critical here.)
A rather familiar example of a prime ideal would be the set of all multiples
of a prime number – say 137, to be specific –in the ring of integers. This
set is both a prime ideal and a maximal ideal. The ideal {0} in the ring of
integers would be an example of a prime ideal which is not a maximal ideal.
Theorem 140 In any ring, a maximal ideal is a prime ideal.
proof: Suppose M is a maximal ideal in R and A and B are ideals such
that AB ⊆ M . Assume A is not contained in M . Then A + M = R, by
the maximality of M , but RB = AB + M B ⊆ M , so B belongs to the
two-sided ideal AnnR (R : M ). Since M is a left ideal, M also belongs to
this right annihilator. Thus we see that M + B lies in this right annihilator,
AnnR (R : M ). But the latter cannot be R since 1R does not annihilate R
mod M on the right. Thus M + B is a proper ideal of R and maximality of
M as a two-sided ideal forces B ⊆ M .
Thus either A ⊆ M or (as just argued) B ⊆ M . Thus M is a prime ideal.
7.1.7 Examples
Example 2. Constructions of fields. Perhaps the three most familiar
fields are the rational numbers Q, the real numbers R, and the complex
numbers C. In addition we have the field of integers modulo a prime. Later,
when we begin the Galois theory and the theory of fields, we shall meet a
number of ways to construct a field, namely
1. Forming the factor ring D/M where M is a maximal ideal in an integral
domain D.
2. Forming the ring of quotients of an integral domain.
3. Forming the topological completion of a field.
The first of these methods is contained in the theorems already encountered: any commutative ring, modulo a maximal ideal, is a field.
The other two must be deferred.
Example 3. The monoid algebras. Recall that a monoid is a set with
an associative operation (say, multiplication) and an identity element with
respect to that operation. Here are some monoids:
• The natural numbers (the positive integers) N under multiplication.
The integer 1 is the multiplicative identity.
• The multisets over a set X. The elements of this monoid consist of the integer zero together with formal positive-integer linear combinations

a1x1 + a2x2 + · · · + amxm,  m, a1, . . . , am ∈ N, xi ∈ X,
where addition is performed by adding coefficients of like elements of
X. For example if X = {a, b, o}, where a = apples, b = bananas, and
o = oranges, then (a+2b)+(b+2o) = a+3b+2o – that is, an apple and
two bananas added to a banana and two oranges is one apple, three
bananas and two oranges. Thus one can add apples and oranges in this
system! We used just such a multiset monoid over the simple groups
in proving the Jordan-Hölder Theorem for groups.
• The Free Monoid over X. This was the monoid whose elements are
the words of finite length over the alphabet X. These are sequences of
juxtaposed elements of X, x1 x2 · · · xm where m ∈ N and the “letters”
xi are elements of X, as well as the empty word which we denote by
φ. The associative operation is concatenation of words, the act of
writing one word right after another. The empty word is an identity
element. We utilized this monoid in constructing free groups.
The free monoid over the single letter alphabet X = {x}, consists of
φ, x, xx, xxx, . . . which we denote as 1, x, x2 , x3 . . ., etc. Clearly this
monoid is isomorphic to the monoid of non-negative integers Z+ :=
N ∪ {0} under addition.
• Of course any group is a monoid.
Now let F be any field, and let M be a monoid. The monoid algebra FM is the set of all formal finite F-linear combinations of the elements of M. Two such expressions are added in the usual way, by adding coefficients of the same elements of M to obtain the coefficient of that element in the sum. Similarly multiplication is defined by

(Σ_σ aσ σ)(Σ_τ bτ τ) = Σ_κ (Σ_{στ=κ} aσ bτ) κ.
Essentially this “convolution formula” results from applying every possible
distributive law in sight (assuming that scalars from F commute with monoid
elements) and then collecting terms.
• In the case that M is a group, F M is called a group algebra.
• In the case that M is the free monoid on the single generator {x}, then
F M is called the ring of polynomials over F and is denoted F [x].
• In the case that M is the monoid of multisets over the finite set
X = {x1 , x2 , . . . , xn }, the monoid algebra F M is called the ring of
polynomials in the indeterminates {x1 , . . . , xn } and is denoted
F [x1 , . . . , xn ].
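The convolution formula is easy to implement. In the following sketch (ours), an element of FM is represented as a Python dict from monoid elements to coefficients; taking the free monoid on {x} (monoid elements = exponents, operation = addition of exponents) recovers ordinary multiplication in F[x].

    def convolve(f, g, op):
        # (sum a_s s)(sum b_t t) = sum over k of (sum over st = k) a_s b_t, k
        h = {}
        for s, a in f.items():
            for t, b in g.items():
                k = op(s, t)
                h[k] = h.get(k, 0) + a * b
        return h

    p = {0: 1, 1: 1}                           # the polynomial 1 + x
    q = {0: 1, 1: -1}                          # the polynomial 1 - x
    print(convolve(p, q, lambda s, t: s + t))  # {0: 1, 1: 0, 2: -1} = 1 - x^2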
Example 4. Endomorphism rings. Let (A, +) be an additive abelian
group. Let Hom(A, A) denote the set of all mappings from A into itself. We
can define addition on this set by letting f + g be the mapping A → A,
whose value at any element a in A is f (a) + g(a) — here f and g can be
any mappings from A into A. With respect to this definition of addition,
the reader should be able to show that Hom(A, A) is an abelian group with
respect to addition. Notice that the zero mapping, which sends every element of A to its additive identity, is the identity element of Hom(A, A).
Now there is a second binary operation that can be performed on the
set Hom(A, A), and that is the composition of mappings: If f and g are in
Hom(A, A), then f ◦ g is the mapping which takes any element a of A to
f (g(a)). Compositions like this are always associative, and form a monoid
which has the identity mapping a → a as its identity element.
Is this composition distributive with respect to addition? The reader
should be able to see that it is not. The trouble is that f (g(x) + h(x)) may
not be f (g(x)) + f (h(x)), if f does not respect group addition.
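A two-line experiment (ours) makes the failure concrete on A = Z/4Z, where the squaring map is not additive:

    n = 4
    f = lambda a: (a * a) % n          # not an endomorphism of (Z/4Z, +)
    g = lambda a: a                    # the identity map
    h = lambda a: a

    left  = [f((g(a) + h(a)) % n) for a in range(n)]     # f o (g + h)
    right = [(f(g(a)) + f(h(a))) % n for a in range(n)]  # (f o g) + (f o h)
    print(left, right)                 # [0, 0, 0, 0] versus [0, 2, 0, 2]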
So instead let us restrict to a subgroup of Hom(A, A), namely HomZ (A, A)
which denotes the set of all endomorphisms of the group A into itself. The
reader can quickly verify the following:
1. HomZ(A, A) is indeed a subgroup of Hom(A, A) – that is, it is closed
under the addition operation and the taking of additive inverses.
2. HomZ (A, A) is closed under map compositions – that is, the composition of two group endomorphisms is an endomorphism.
3. Composition satisfies both distributive laws with respect to addition.
Thus we obtain a ring, and we write HomZ (A, A) as End(A) when we
wish to emphasize its ring structure. This is the endomorphism ring of
an abelian group.
Suppose now that A happens to be a right vector space over the division ring D. Then we can specialize even further. Instead of considering all
endomorphisms, as in End(A), we may just take those which are linear transformations of A. This set, HomD (A, A), is also a subgroup of HomZ (A, A)
which is closed under multiplication and contains both the “zero map” and
the identity mapping. Again, it is a ring, called the D-endomorphism ring
of the D-vector space A and is denoted EndD (A).
When A has finite dimension n, then by choosing a basis, every linear transformation of A into itself can be rendered uniquely as an n × n matrix.
Then addition and composition of linear transformations is duplicated as
matrix addition and multiplication. In this guise, EndD (A) is seen to be
isomorphic to the full matrix algebra over D – the ring Dn×n of all n × n
matrices over D, with respect to matrix addition and multiplication.
Each basis of A yields a ring isomorphism EndD (A) → Dn×n .
The algebra D^{n×n} is a simple ring for every natural number n and division ring D. We will not prove this until much later; it is only offered here as a “sociological” statement about the population of mathematical objects which are known to exist. It would be nearly sadistic to propose definitions to students which turn out to be inhabited by nothing – still one more description of the empty set. So I perceive some value in pointing out (without proof) that things exist.
Example 5. Opposite rings. Suppose we have a ring R with a given
multiplication table. We can define a new multiplication (let’s call it “*”)
by transposing this multiplication table. That is, a ∗ b := ba for all a and b
in R. The new operation is still distributive with respect to addition – both distributive laws hold – and it is easy for the student to verify that (R, +, ∗)
is a ring. (Remember, we need a multiplicative identity, and the associative
law should at least be checked.) We call this ring the opposite ring of R
and denote it by the symbol Opp(R).
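For a quick sketch (Python; 2 × 2 integer matrices stand in for a non-commutative ring, an illustrative assumption): the opposite multiplication a ∗ b := ba genuinely differs from the original multiplication, yet remains associative.

def matmul(a, b):                  # ordinary 2x2 matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def opp_mul(a, b):                 # multiplication in Opp(R): a * b := ba
    return matmul(b, a)

a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 0]]
c = [[2, 0], [0, 1]]
print(opp_mul(a, b) != matmul(a, b))                           # True: * differs from the product in R
print(opp_mul(opp_mul(a, b), c) == opp_mul(a, opp_mul(b, c)))  # True: * is still associative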
Example 6. Direct Product of rings. Suppose {Rσ }I is a family of
rings indexed by I. Let 1σ be the multiplicative identity element of Rσ . Now
we can form the direct product of the additive groups {(R_σ , +)}_I :

P := ∏_{σ∈I} (R_σ , +).
Recall that the elements of the product are functions f : I → ∪_I R_σ (the abstract disjoint union), with the property that f (σ) ∈ R_σ . We can then
convert this additive group into a ring by defining the multiplication of two
elements f and g of P by the rule that f g is the function I → ∪I Rσ defined
by
(f g)(σ) := f (σ)g(σ),
where the juxtaposition on the right indicates multiplication in the ring R_σ .
(This sort of multiplication is often called “coordinatewise multiplication”.)
It is an easy exercise that this multiplication is associative and is both left
and right distributive with respect to addition in P . Finally, one notes that
the function 1P whose value at σ is 1σ , the unique multiplicative identity of
the ring Rσ , comprises a multiplicative identity element for P , so P is indeed
a ring.
Note that if K is any subset of the index set I, then the subset P_K of all functions f : I → ∪_I R_σ which vanish on K forms a (two-sided) ideal of P .
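A small sketch (Python; the concrete factors Z/4Z and Z/6Z are illustrative assumptions) of a direct product of rings with coordinatewise operations, including a brute-force check that the elements vanishing on a subset K of the index set form a two-sided ideal.

from itertools import product

mods = (4, 6)                       # index set I = {0, 1}; R_0 = Z/4Z, R_1 = Z/6Z
P = list(product(range(4), range(6)))

def add(f, g): return tuple((x + y) % m for x, y, m in zip(f, g, mods))
def mul(f, g): return tuple((x * y) % m for x, y, m in zip(f, g, mods))

one = (1, 1)                        # the identity 1_P : sigma -> 1_sigma
assert all(mul(one, f) == f for f in P)

PK = [f for f in P if f[0] == 0]    # P_K for K = {0}: functions vanishing at 0
assert all(add(f, g) in PK for f in PK for g in PK)   # closed under addition
assert all(mul(f, g) in PK and mul(g, f) in PK        # absorbs multiplication
           for f in PK for g in P)
print("P_K is a two-sided ideal of P")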
Now that we understand a direct product of rings, is there a corresponding
direct sum of rings? The additive group (S, +) of such a ring should be
a direct sum ⊕_I (R_σ , +) of the additive groups of the rings. But in that case, the multiplicative identity element 1_S of the ring S must be uniquely expressible as a function which vanishes on all but a finite subset F of indices in I. It then follows that if multiplication is that defined by “pointwise multiplication” as for its ambient direct product, then in fact the direct sum of rings,

⊕_I R_σ = ⊕_F R_σ ,

is always a finite direct sum of rings. This is the expense one pays for maintaining a multiplicative identity.
Example 7. Integral domains. Let D be any integral domain. Recall
that this means that D is a commutative non-zero ring in which the product of two non-zero elements is again a non-zero element. In other words, D∗ := D − {0_D} is closed under multiplication (and hence, from the other ring axioms, D∗ is a multiplicative monoid).
Obviously, the example par excellence is the ring of integers. (Who knows? Perhaps this ring was the inspiration for the very term, “integral domain”.) But perhaps that is also why it should not be taken as the typical example.
The integers certainly enjoy every property that any integral domain does. But there are many further properties of the integers which are not shared by many integral domains. For example, every integer possesses an essentially unique factorization (up to the order of the factors or the replacing of any factor by a unit times that factor) into prime numbers. That is true of a large class of integral domains, but it is not true of many further integral domains (such as Z ⊕ √−7 Z).
Every ideal of Z has the form Zx where x is a particular element of Z. Any ideal of this form is called a principal ideal. (For an arbitrary ring R, a principal ideal is an ideal of the shape RxR for some element x.) If all ideals of an integral domain D have this shape, we say that D is a principal ideal domain or PID. Many integral domains don’t have this property. I am telling you this for your own good.
The characteristic of an integral domain.
Suppose now that D is an integral domain. We define the characteristic of D as the smallest natural number n such that nx (the n-fold sum x + x + · · · + x) is zero for every element x in D, provided such an n exists. We can make this concept very local by considering, for each x, the smallest natural number n_x such that x + x + · · · + x (with exactly n_x summands) is 0_D .
Then it should be obvious from the distributive laws that n_x divides n_1 : since 1 + 1 + · · · + 1 (with n_1 summands) is 0_D , we have x + · · · + x (with n_1 summands) = x(1 + · · · + 1) = x·0_D = 0_D . Thus n_x divides n_1 .
On the other hand, if x is nonzero, and if we consider x added to itself with exactly n_x summands, we obtain

0 = x + x + · · · + x = x(1 + 1 + · · · + 1),   (7.10)

with n_x summands in each sum. Since D∗ is closed under multiplication, either x = 0 or 1 + · · · + 1 (with n_x summands) is 0. The former alternative is forbidden by the choice of x. Thus n_1 divides n_x . Thus n_x = n_1 , since these are natural numbers, each dividing the other. We have shown
Lemma 141 For any integral domain D, one of the following two alternatives holds:
1. Either for every non-zero element x, no sum of x’s can ever be zero, or
2. There exists a natural number c such that cx = 0 for every element x of D, while c′D ≠ 0 for every smaller natural number c′ < c.
In the former case we say that D has characteristic zero, while in the latter case we say that D has characteristic c, where c is the number described in the second part. Since, in an integral domain, the product of two non-zero elements must remain non-zero, it follows that the number c := n_1 is a prime number: if c = ab with 1 < a, b < c, then (a1)(b1) = (ab)1 = c1 = 0 would force a1 = 0 or b1 = 0, against the minimality of c.
All subrings of an integral domain of characteristic c are integral domains
of the same characteristic c. Thus if D is any sort of subring of the complex numbers C, then it is an integral domain of characteristic 0. Thus the
Gaussian Integers, the Eisenstein numbers, or any other subring of the reals
or complex numbers, must be an integral domain of characteristic 0. The
field Z/pZ, where p is a prime number, clearly has characteristic p. Thus the
characteristic is either 0 or a positive prime number p. Of course every field
is also an integral domain, so they too possess a characteristic.
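These observations can be checked by brute force (Python; a minimal illustrative sketch): among the rings Z/nZ, those without zero divisors are exactly those whose characteristic – found by adding 1 to itself until the sum vanishes – is a prime.

def characteristic(n):
    s, c = 1 % n, 1
    while s != 0:                   # keep adding 1 until the sum is 0
        s = (s + 1) % n
        c += 1
    return c

def is_domain(n):                   # True if Z/nZ has no zero divisors
    return all((a * b) % n != 0 for a in range(1, n) for b in range(1, n))

for n in range(2, 20):
    if is_domain(n):
        print(n, characteristic(n)) # only primes are printed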
The cancellation law for integral domains.
Suppose D is an integral domain. Then for any three elements (a, b, c) ∈ D × D × D with b ≠ 0,

ab = cb   (7.11)

implies

a = c.   (7.12)
As a check on one’s ability to construct proofs that everyone might understand, the student should attempt to lay out an unassailable proof of the
above assertion.
Finite integral domains are fields.
Suppose now that D is a finite integral domain. Then for any non-zero element d, right multiplication by d must be an injective mapping D → D. For if ad = bd then (a − b)d = 0_D ; but if d ≠ 0 then, by the integral domain hypothesis, a = b. Thus right multiplication by any non-zero element is injective. Therefore, as D∗ is a finite set, it must be surjective (the famous “Pigeonhole Principle”). Thus xD = Dx = D for every non-zero element x. Of course, this means that x is a unit. Thus D is a commutative ring in which every non-zero element is right invertible, hence left invertible, hence a unit – i.e. D is a field.
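The pigeonhole step can be watched in a small sketch (Python; the choice D = Z/7Z is illustrative): right multiplication by any nonzero d permutes the nonzero elements, so in particular some element multiplies with d to give 1.

p = 7
nonzero = list(range(1, p))
for d in nonzero:
    image = {(a * d) % p for a in nonzero}    # right multiplication by d
    assert image == set(nonzero)              # injective, hence surjective
    inverse = next(a for a in nonzero if (a * d) % p == 1)
    print(f"{d}^-1 = {inverse} (mod {p})")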
Chapter 8
Elementary properties of Modules
8.1 Basic theory
8.1.1 Modules over rings.
Fix a ring R. A left module over R or left R-module is an additive group
M = (M, +), which admits a composition
R×M →M
(which we denote as a juxtaposition of arguments to indicate multiplication)
such that for all elements (a, b, m, n) ∈ R × R × M × M ,
a(m + n) = am + an   (8.1)
(a + b)m = am + bm   (8.2)
(ab)m = a(bm)   (8.3)
1 · m = m,   (8.4)
where 1 denotes the multiplicative identity element of R. It follows from
0R + 0R = 0R that for any element m of M , 0R m + 0R m = 0R m. Thus
0R m = 0M for any m ∈ M . Here of course 0R denotes the additive identity
of the additive group (R, +) while 0M denotes the additive identity of the
additive group (M, +).
A right R-module is an additive group M = (M, +) admitting a right
composition M × R → M , such that, subject to the same conventions of
denoting this composition by juxtaposition, we have for all (a, b, m, n) ∈
R × R × M × M,
(m + n)a = ma + na   (8.5)
m(a + b) = ma + mb   (8.6)
m(ab) = (ma)b   (8.7)
m1 = m.   (8.8)
Again we can easily deduce the identities
m0_R = 0_M   (8.9)
m(−a) = −(ma)   (8.10)
where 0R and 0M are the additive identity elements of the groups (R, +) and
(M, +), and m is an arbitrary element of M .
It should be clear from Equation (8.5) in the definition of a right R-module that right multiplication of all of M by an element r ∈ R induces an endomorphism f (r) of the additive group M . But if we denote the composition of endomorphisms in their chronological order, by left-to-right juxtaposition (that is, the one on the left is applied first), we see
f (ab) = f (a)f (b),   (8.11)
– right multiplication by a, followed by right multiplication by b, for (a, b) ∈ R × R. Also, Equation (8.6) tells us that addition of endomorphisms represents ring addition – i.e. f (a + b) = f (a) + f (b) for all (a, b) ∈ R × R. In
short, we have a ring homomorphism
R → End(M ).
Conversely, if we are given such a ring homomorphism f , and if we regard endomorphisms as being right operators of the group (M, +), then M
acquires the structure of a right R-module by setting mr := m · f (r), for all
(m, r) ∈ M × R.
We can take this point of view with left R-modules as well, but there
are two approaches to it: (1) using opposite rings, or (2) using the “circle”
convention for composition of endomorphisms.
(1) Here, given a left R-module M and regarding the composition of
endomorphisms to be indicated by juxtaposition in their chronological order
(reading left to right), we obtain a ring homomorphism
Opp(R) → End(M ).
(2) Here all endomorphisms of M are viewed as left operators. The “circle” convention for composing mappings asserts that
(α ◦ β)(m) := α(β(m)), for all m ∈ M.
Thus in writing α ◦ β we mean that left operator β is chronologically applied
first, and α is applied at some later date. Thus if left multiplication of the
entire left R-module M by element r ∈ R induces the endomorphism f (r) of
the additive group M , then we have
f (ab) = f (a) ◦ f (b).
But then, we obtain the ring homomorphism
R → (End(M ), ◦),
where “circle” indicates composition. In other words, the “opposite ring”
construction is built into the definition “◦”, for we have a ring isomorphism
(End(M ), ◦) ' Opp(End(M )).
Finally, there is a third type of module, an (R, S)-bimodule. Here M
is a left R-module and at the same time a right S-module, and the left and
right compositions are connected by the law
r(ms) = (rm)s, for all (r, m, s) ∈ R × M × S.   (8.12)
Although this resembles a sort of “associative law”, what it really says is that
every endomorphism of M induced by right multiplication by an element of
ring S commutes with every endomorphism induced by left multiplication by
an element of ring R.
Example 1. Let T be a ring and let R and S be subrings of T . Then
(T, +) is an additive group, which admits a composition T × R → T . It
follows directly from the ring axioms, that with respect to this composition,
T becomes a right R module. We denote this module by the symbol TR when
we wish to view T in this way.
Also, (T, +) admits a composition R × T → T with respect to which T
becomes a left R-module, which we denote by _RT .
Finally (T, +) admits a composition
S×T ×R→T
inherited directly from ring multiplication. With respect to this composition
T becomes an (S, R)-bimodule, which we denote by the symbol _ST_R . Notice that here, the special law for a bimodule (Equation 8.12) really is the
associative law of T (restricted to certain triples, of course).
8.1.2 Submodules
Fix a (right) R-module M . A submodule is an additive subgroup of (M, +)
which is invariant under right multiplication by elements of R – that is, right
multiplication of the subgroup by any element of R yields a subset of that
subgroup.
It is easy to see that a subset N is a submodule of the right R-module
M , if and only if
1. N + N = N
2. −N = N , and
3. N R ⊆ N .
It should be remarked that the last containment is actually the equality
N R = N , since right multiplication by the identity element 1R induces the
identity endomorphism of N .
Suppose now N = {Nσ |σ ∈ I} is a family of submodules of the right
R-module M . The intersection
∩_N N_σ
of all these submodules is clearly a submodule of M . This is the unique
supremum of the poset of all submodules contained in all of the submodules
in N .
Now let X be an arbitrary subset of the right R-module M . The submodule generated by X is defined to be the intersection of all submodules
of M which contain the subset X and is denoted hXiR . From the previous
paragraph, this intersection is the unique smallest submodule in the poset of
all submodules of M which contain X.
This object has another description. Let X̄ be the set of all elements
of M which are finite R-linear combinations of elements from X – that is
elements of the form x1 r1 + . . . + xn rn where n is any natural number, the xi
belong to X and the rj belong to the ring R. Then X̄ is a subgroup (recall
that x(−1R ) = −x) and X̄R = X̄, so X̄ is a submodule. On the other hand,
it must lie in any submodule which contains X and so must coincide with
hXiR .
In the case that X = {x1 , . . . , xn } is a finite set, then we see
X̄ = x1 R + · · · + xn R = hXiR .
A right R-module is said to be finitely generated if and only if it is generated by a finite set. Thus a right R-module N is finitely generated if and only if
N = x1 R + · · · + xn R
for some finite subset {x_1 , . . . , x_n } of its elements.
Now once again let N = {Nσ |σ ∈ I} be a family of submodules of the
right R-module M . The sum over N is the submodule of M generated
by the set-theoretic union ∪N of all the submodules in the family N . It is
therefore the set of all finite R-linear combinations of elements which lie in
at least one of the submodules in N . But of course, as each submodule is
invariant under right multiplication by elements of R, this simply consists of
all finite sums of elements lying in at least one module of the family. In any
event, this is the unique minimal member of the set of all submodules of M which contain every member of N .
[Notation: if N = {N1 , . . . , Nn } is a finite family of submodules of M , we
write N1 + · · · + Nn for the sum of these submodules. ]
We summarize this information in
Lemma 142
1. The intersection over any family of submodules of a given
module is a submodule.
2. The sum over any family of submodules is the submodule consisting of
all finite sums of elements each of which is in at least one member of
the family of modules.
3. The poset of all submodules of a module is a lattice with unrestricted
meets and joins, given by the intersection and sum operations on arbitrary families of submodules.
Everything that we have said in this subsection about submodules of a right R-module holds for submodules of a left R-module, with the phrase N R = N replaced by RN = N . For this reason the above lemma has been
stated without any reference to left or right. It is assumed that we are either
talking about the poset of all submodules of a right R-module or else the
poset of all submodules of a left R-module.
8.1.3 Factor modules.
Now suppose N is a submodule of the right R-module M . Since N is a
subgroup of the additive group (M, +), we may form the factor group M/N
whose elements are the additive cosets x+N, x ∈ M of N . Now a composition
M/N × R → M/N is well-defined since right multiplication of a coset of N
by a ring element r lies in a unique additive coset of N – as is seen from
the containment (x + N )r ⊆ xr + N (since N R = N ). This composition
obeys all the requisite axioms with respect to addition of cosets to endow
the factor group M/N with the structure of a right R-module. This is called
the factor module M/N .
Again, this construction can be duplicated for left R-modules: For each submodule N of a left R-module M , we obtain a left R-module M/N .
8.1.4 The fundamental theorems of homomorphisms for R-modules.
First the definition of the relevant morphisms: Suppose M and N are right
R-modules (for the same ring R). An R-homomorphism M → N is a
homomorphism f : (M, +) → (N, +) among the underlying additive groups,
such that
f (m)r = f (mr),   (8.13)
for all module elements m and ring elements r. In other words, these are
group homomorphisms which commute with the right multiplications by each
of the elements of the ring R. (The student should be able to draw the
relevant commutative square diagram on the board.)
The adjectives which accrue to an R-homomorphism are exactly those
which are applicable to it as a homomorphism of additive groups: Thus the
R-homomorphism f is
• an R-epimorphism
• an R-monomorphism
• an R-isomorphism
if and only if f is an epimorphism, monomorphism, or isomorphism, respectively, of the additive groups.
We define the kernel of an R-homomorphism f : M → N among
right R-modules, simply to be the kernel of f as a homomorphism of additive
groups. Explicitly, ker f is the set of all elements of M which are mapped to
the additive identity 0N of the additive group N . But the critical equation
0N r = 0N shows that if x lies in ker f , then so must the entire set xR since
f (xR) = f (x)R = 0N R = 0N . Thus the kernel of an R-homomorphism (such
as f ) must be an R-submodule.
The student should be able to restate the obvious analogues of the above
definitions for left R-modules as well.
Now we can state
Theorem 143 (The First Fundamental Theorem of Module Homomorphisms)
Suppose f : M → N is a homomorphism of right (left) R-modules.
1. ker f is a submodule of M .
2. The image f (M ) is a submodule of N .
3. There is a natural R-isomorphism φ : M/(ker f ) → f (M ) which takes
coset m + ker f to f (m).
proof: Part 1 was proved above.
Part 2 is immediate from the definition of R-homomorphism.
Part 3. By definition φ is already an isomorphism of additive groups. It
remains only to verify equation (8.13) – that is, to observe that
φ((x + ker f )r) = f (xr) = f (x)r = (φ(x + ker f ))r,
for all (x, r) ∈ M × R.
Theorem 144 (Second Fundamental Theorem of Module Homomorphisms.)
1. If f : M → N is an R-epimorphism of right (left) R-modules, then there
is a poset isomorphism between S(M, ker f ), the poset of all submodules
of M which contain ker f and the poset S(N, 0N ) of all submodules of
N.
2. Suppose A ≤ B ≤ M is a chain in the poset of submodules of a right
(left) R-module M . Then there is an R-isomorphism
M/B → (M/A)/(B/A)
Theorem 145 (The Third Fundamental Theorem of Module Homomorphisms.) Suppose A and B are submodules of the right (left) R-module M .
Then there is an R-isomorphism
(A + B)/A → B/(B ∩ A).
sketch of the proof of the two theorems: All isomorphisms mentioned are exactly those defined in the case of abelian groups. One need only
observe that when the arguments forming the poset or group isomorphisms
are R-modules, the isomorphisms in question commute with the action of the
endomorphisms induced by R. Thus sending an R-submodule C containing
ker f to f (C) yields an R-invariant subgroup of (N, +), and vice versa. The
isomorphisms M/B → (M/A)/(B/A), and (A + B)/A → B/(B ∩ A) are all
R-homomorphisms.
8.1.5 The Jordan-Hölder Theorem for R-modules.
An R-module is said to be irreducible if and only if its only submodules are itself and the zero submodule – that is, it has no non-zero proper submodules.
Let Irr(R) denote the collection of all isomorphism classes of non-zero irreducible R-modules. Suppose M is a given R-module, and let
P (M ) be the lattice of all its submodules. Recall that an interval [A, B] in
P (M ) is the set {C ∈ P (M )|A ≤ C ≤ B} (this is considered to be undefined
if A is not a submodule of B). A cover is an interval [A, B] such that A
is a maximal proper submodule of B – that is (A, B) is itself an unrefinable
chain of submodules of length one. Finally, recall that a non-empty interval
[A, B] is said to be algebraic if and only if there exists a finite unrefinable
chain of submodules from A to B.
Because of the Second Fundamental Theorem of Homomorphisms, we see
that (A, B) is a cover if and only if B/A is a non-trivial irreducible R-module.
Thus we have a well-defined mapping
µ : Cov(P (M )) → Irr(R),
from the set of all covers in the lattice P (M ) to the collection of all nonzero irreducible R-modules. If A1 and A2 are distinct proper submodules of
a submodule B, with each Ai maximal in B, then A1 + A2 = B, and by
the Third Fundamental Theorem of Homomorphisms, A2 /(A1 ∩ A2 ) (being
isomorphic to B/A1 ) is an irreducible R-module. Thus if {[A1 , B], [A2 , B]} ∈
Cov(P (M )) and A1 6= A2 , then [A1 ∩ A2 , Ai ], i = 1, 2 are also covers. This is
the statement that
• The poset P (M ) of all submodules of the R-module M is a semimodular lattice.
We can now conclude from the Jordan-Hölder Theorem for semi-modular
lower semilattices,
Theorem 146 (Jordan-Hölder Theorem for R-modules.) Let M be an arbitrary R-module, and let P (M ) be its poset of submodules.
1. Then the mapping
µ : Cov(P (M )) → Irr(R),
extends to an interval measure
µ : Alg(P (M )) → M(Irr(R)),
from the set of all algebraic intervals of P (M ) to the multiset monoid
over the collection of all isomorphism classes of non-trivial irreducible
R-modules.
2. Suppose M itself possesses a composition series – that is, there exists a
finite unrefinable chain of submodules beginning at the zero submodule
and terminating at M . The composition factors are the finitely many
non-trivial irreducible modules formed from the factors between successive members of the chain. Then the collection of irreducible modules
so obtained is the same with the same multiplicities, no matter what
composition series is used. In particular all composition series of M
have the same length.
8.1.6 Direct sums and products of modules.
Let I be any index set, and let {Mσ |σ ∈ I} be a family of right R-modules
indexed by I. Recall that the direct product of the additive groups (Mσ , +)
is the collection of all functions f : I → ∪I Mσ , from I into the disjoint
union of the M_σ , so that f (σ) ∈ M_σ . This had the structure of an additive group ∏_I M_σ with addition defined “coordinatewise”, so that f + g was the mapping σ → f (σ) + g(σ). Here, since each M_σ is an R-module, ∏_I M_σ can be endowed with the structure of a right R-module, by defining f r ∈ ∏_I M_σ by

f r : σ → f (σ)r, for all (f, r, σ) ∈ (∏_I M_σ ) × R × I.   (8.14)

When all the M_σ are R-modules, ∏_I M_σ is understood to be this R-module and is called the direct product over the family of R-modules {M_σ | σ ∈ I}.
Clearly its additive subgroup ∑_I M_σ , consisting of all functions f for which f (σ) ≠ 0_{M_σ} at only finitely many σ, is also a submodule. Thus again, when the M_σ are all R-modules, ∑_I M_σ is understood to be this R-submodule of ∏_I M_σ . This is the direct sum of the M_σ .
As was the case for groups, when I is a finite set {1, . . . , n}, the two notions of product and sum coincide, and we write

∏_I M_σ = M_1 ⊕ M_2 ⊕ · · · ⊕ M_n .
Lemma 147 (How To Recognize Direct Sums Internally) Let M1 , M2 , . . .
be a countable sequence of submodules of an R-module M which generate M .
Then M ≅ ∑_N M_i if and only if, for any natural number k,

(M_1 + M_2 + · · · + M_{k−1}) ∩ M_k = {0_M }.

proof: For each mapping f in ∑_N M_i , let supp(f ) be the set of indices i for which f (i) is not the identity element of M_i . Then by definition of the sum, supp(f ) is always a finite subset of the index set N, the natural numbers. We next define a mapping κ : ∑_N M_i → M by the recipe

κ : f → ∑_{j∈supp(f )} f (j).
Now it is easy to see that
κ(f + g) = κ(f ) + κ(g)   (8.15)
κ(−f ) = −κ(f )   (8.16)
so κ is a homomorphism of groups. Moreover the definition of κ reveals
κ(f r) = κ(f ) · r, for all r ∈ R.   (8.17)
Thus κ is actually an R-homomorphism.
Since M is generated by the Mj , each element of M is a finite sum of
elements each lying in at least one of the M_j ’s. Thus a typical element would be m = m_1 + · · · + m_t , where m_i lies in M_{p(i)} , for some index p(i) possibly depending on (but not necessarily uniquely determined by) i. By adding together summands in the same submodule, we may assume without loss that the indices p(i), 1 ≤ i ≤ t, are all distinct. Now there is a function f in ∑_{i∈N} M_i whose value at p(i) is m_i , for 1 ≤ i ≤ t, and is 0_M at all other indices. Clearly f has been devised so that κ(f ) = m. Thus κ is a surjective
mapping and so is an R-epimorphism.
It remains only to show that κ is one-to-one (injective) for in that case
κ is the desired isomorphism of R-modules. To that end, suppose f is an element of ∑_{i∈N} M_i ; set J := supp(f ), a finite subset of natural numbers; and suppose κ(f ) = 0, the additive identity of M .
Clearly J is empty if and only if f = 0. So assume J is non-empty and set k = max(J). Then κ(f ) = ∑_{j∈J} f (j) = 0. This means
−m_k = ∑_{j∈J−{k}} f (j) ∈ M_k ∩ (M_1 + · · · + M_{k−1}) = {0}   (8.18)

This is a contradiction, since m_k = f (k) and k ∈ J, the support of f , and so m_k cannot be 0. Thus J = ∅ and f = 0. We have shown then that ker κ (as a
group homomorphism) is trivial. It follows that the mapping κ is one-to-one
as desired. This completes the proof.
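The criterion is easy to watch in a toy case (Python; the choice of Z-module Z/6Z is an illustrative assumption): the submodules {0, 3} and {0, 2, 4} generate Z/6Z and meet trivially, so each element has a unique expression m_1 + m_2.

n = 6
M1, M2 = {0, 3}, {0, 2, 4}
sums = {(a + b) % n for a in M1 for b in M2}
print(sums == set(range(n)))        # True: M1 + M2 is all of Z/6Z
print(M1 & M2 == {0})               # True: M1 meets M2 trivially
for m in range(n):                  # hence unique internal expressions
    reps = [(a, b) for a in M1 for b in M2 if (a + b) % n == m]
    assert len(reps) == 1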
8.1.7 Free modules
A free right R-module over I is a direct sum ∑_{σ∈I} M_σ where each summand M_σ is isomorphic as a right R-module to the R-module R_R . This object is uniquely determined up to R-isomorphism by the index set I and so is here denoted by ∑_I R_R . Analogously, a free left R-module is one that is a direct sum of copies of _RR. If I indexes the list of summands, we write this ∑_I (_RR).
There is a special subset EI of the free right module over I which we call
the special R-basis over I. It is the set {ei |i ∈ I}, where ei denotes the
function from I into RR for which:
e_i (j) = 1_R if j = i, and e_i (j) = 0_R otherwise.
In keeping with our general notation, 1R is the multiplicative identity of R,
and 0R is the additive identity of R. There is a canonical bijection β : EI → I
in which ei is mapped to i. (Notice that there is nothing “natural” about the
definition of E_I or the bijection β. It all depends on how the free module
is presented as a direct sum. Clearly there may be R-automorphisms of the
free module in which the presented direct summand ei R may be sent to a
submodule which is not one of the presented direct summands at all.)
Despite the drawbacks of the preceding remark, there is a useful property of a special basis.

Lemma 148
1. Every element of the direct sum ∑_{i∈I} R_R has a unique expression as a finite R-linear combination of the elements of E_I .
2. Suppose E_I is a special basis for the free module ∑_I R_R . Then any arbitrary mapping f : E_I → M into a right R-module M can be extended to an R-homomorphism ∑_I R_R → M .
proof: Part 1. If there were two distinct ways to express some element of the direct sum in this way, then by subtracting one from the other, we would obtain a nontrivial expression

∑_{j∈J} e_j r_j = 0,   (8.19)

where J is a finite non-empty subset of I, and each r_j is non-zero. Now the right side of equation (8.19) is the function taking each member of I to 0_R . But for each k ∈ J, the left side is a function whose value at k is r_k ≠ 0_R , which is a contradiction.
Part 2. First, every element of ∑_I R_R is a finite R-linear combination of the elements of E_I since we are dealing with a direct sum of the submodules e_j R. Given f as described, we propose to define f (∑_{j∈J} e_j r_j ) to be ∑_{j∈J} f (e_j )r_j . But this is well-defined by the uniqueness of the representation of the R-linear expression. From its definition, it automatically defines an R-homomorphism with the desired properties.
The main result is this:
Theorem 149 Suppose M is a right (left) R-module generated by a subset
X – that is, M = hXi_R (respectively _RhXi).
1. Then there is an R-epimorphism κ from the free module over X to
M such that the restriction of κ to the special R-basis EX yields the
bijection βX : EX → X.
2. Moreover, if f : N → M is any R-epimorphism, then there exists an R-homomorphism h_f : ∑_{x∈X} R_R → N from the free module to N for which the composition f ◦ h_f is κ.
remark: The minimal use of “right” and “left” in this theorem is deliberate.
We are either talking throughout about morphisms in the category of right R-modules, or throughout about morphisms in the category of left R-modules.
proof: The proof is simpler than the statement of the theorem would suggest. The first part is an immediate consequence of the fact that the bijection β_X : E_X → X, when composed with the (somewhat metaphysical) containment mapping X ↪ M , can be extended to an R-homomorphism. This is clearly an R-epimorphism since X generates M as an R-module.
Suppose we are given the epimorphism f : N → M . In each fibre f^{−1}(x) (a coset of ker f mapping onto x), x ∈ X, choose a representative x̂ so that f (x̂) = x, and set X̂ := {x̂ | x ∈ X}. Then X̂ generates some submodule N′ of N with the property that f (N′) = M . Now by Part 1, the bijection E_X → X̂ extends to an R-homomorphism h_f which takes e_x to x̂, as x ranges over X. Then h_f has all the desired properties since
(f ◦ h_f )(∑ e_x r_x ) = f (∑ h_f (e_x ) r_x ) = f (∑ x̂ r_x ) = ∑ f (x̂) r_x = ∑ x r_x = ∑ κ(e_x r_x ) = κ(∑ e_x r_x ).   (8.20)

8.1.8 Vector spaces.
Let D be a division ring. A right D-module M is called a right vector space over D. Similarly, if F is a field, a right F -module is a right vector space over F .
The student should be able to convert everything that we define for a right vector space to the analogous notion for a left vector space. There is thus no
need to go through the discussion twice, once for left vector spaces and once
again for right vector spaces – so we will stick to right vector spaces for this
discussion.
The elements of the vector space are called vectors. We say that a vector
v linearly depends on a set {x_1 , . . . , x_n } of vectors if there exist elements α_1 , . . . , α_n of the field F such that

v = x_1 α_1 + · · · + x_n α_n

– that is, v is a member of the (right) submodule generated by x_1 , . . . , x_n .
Clearly from this definition, the following properties hold:
Reflexive Property x1 linearly depends on {x1 , . . . , xn }.
Transitivity If v linearly depends on {x1 , . . . , xn } and each xi linearly depends on {y1 , . . . , ym }, then v linearly depends on {y1 , . . . , ym }.
The Exchange Condition If vector v linearly depends on {x1 , . . . , xn } but
not on {x1 , . . . , xn−1 }, then xn linearly depends on {x1 , . . . , xn−1 , v}.
Thus linear dependence is a dependence relation as defined in the last section of Chapter 2. It follows from Theorem 40, page 86, and Theorem 41, page 86, that maximal linearly independent sets exist and any two maximal linearly independent sets in M have the same cardinality. Therefore this cardinality is an invariant of M and is called the dimension of M over F and is denoted dim_F (M ). Thus in particular, if M is a finitely generated F -module, then it is finite dimensional.
Recall from Chapter 2, Theorem 39, that a maximal independent set is
at the same time, a minimal spanning set. Such a set is called an F -basis for
M . In this case, every vector of M is uniquely expressible as a finite F -linear
combination of the vectors in this basis.
Suppose now v is any non-zero vector of the vector space M . Then v
generates an F -submodule vF which is a one-dimensional subspace. Then
it is easy to see that the mapping mv : FF → vF which sends field element α
to vα gives an F -isomorphism of F -modules. (The argument runs like this:
It is easily seen to be an F -homomorphism and its kernel is the two-sided
ideal {α ∈ F |vα = 0M }. But there is a severe shortage of two-sided ideals in
a field. The only possibilities are 0 and F itself. But the latter alternative
for ker mv is not available since v1F = v, by the definition of module. Thus
ker mv = 0F and so mv is injective.) Now we can say
Lemma 150 If B is any basis for the F -vector space M , then M is isomorphic to the direct sum ∑_{b∈B} bF . Since each bF is isomorphic to the right F -module F_F , M is a free F -module.
proof: The unique expression of each element as a finite F -linear combination of elements of B defines the group isomorphism

M → ∑_{b∈B} bF
which is easily seen to behave as an F -isomorphism.
A morphism V → W of right F -modules is called a linear transformation. Given a linear transformation T : V → V of a vector space into itself,
we can convert V into a left F [x]-module by defining
p(x) · v := p(T )(v)
for any p(x) ∈ F [x] and v ∈ V . (Note that F [T ] is the subring of the
F -endomorphism ring End_F (V ) generated by {T }.)
We will have occasion to examine finitely generated modules of this type
– even to classify them up to isomorphism.
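Here is a concrete sketch of this module structure (Python; the particular T and the coefficient-list convention for p(x) are illustrative assumptions): V = Q², T a fixed 2 × 2 matrix, and p(x) · v computed as p(T )(v).

from fractions import Fraction as Fr

T = [[Fr(0), Fr(-1)],
     [Fr(1), Fr(0)]]                # a linear transformation with T^2 = -I

def apply(A, v):                    # matrix times vector over Q
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

def act(p, v):
    """The F[x]-action: p = [c0, c1, ...] encodes c0 + c1*x + ..., and
    p(x).v is computed as c0*v + c1*T(v) + c2*T^2(v) + ..."""
    result, power = [Fr(0), Fr(0)], v[:]      # power runs through v, Tv, T^2 v, ...
    for c in p:
        result = [r + c * w for r, w in zip(result, power)]
        power = apply(T, power)
    return result

v = [Fr(1), Fr(0)]
print(act([Fr(1), Fr(0), Fr(1)], v))          # (1 + x^2).v = v + T^2(v) = 0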
remark: Note that we have defined the dimension of a right vector space. Similarly we can define linear dependence and dimension for a left vector space. But if V is a (D, D)-bimodule, we have both a left dimension and a right dimension. Must these dimensions be the same? Is a left-linearly independent set also a right-linearly independent set? More about this later.
8.1.9 Examples and exercises
Example 2. (Ideals as modules.) A submodule of RR is simply a right ideal
of R. Similarly the left ideals of R are just the submodules of the (free)
left R-module R R. This is certainly a trivial consequence of the definitions,
but it is a useful notion, for it allows one to apply module notions (such as
being finitely generated or semisimple) to the ideals of a ring to get a richer
language in which to discuss the parent ring itself.
Exercise 1. (Annihilators of modules as two-sided ideals.) Suppose that
M is a right R-module. Show that Ann(M ) := {r ∈ R|M r = 0M } is a
two-sided ideal in R.
Exercise 2. (Abelian groups are Z-modules.) Obviously Z-modules, like all
R-modules, are abelian groups. Conversely suppose A is an Abelian group.
A composition A × Z → A can be defined as follows:
na := a + a + · · · + a (with n summands), n ∈ N   (8.21)
(−n)a := (−a) + · · · + (−a) (with n summands), n ∈ N   (8.22)
0a := 0_A .   (8.23)
Show that A becomes a Z-module with respect to this composition.
So the two notions, Z-module and Abelian group, are coextensive.
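The composition is a direct transcription into code (Python; a minimal sketch with A = Z/5Z as an illustrative abelian group):

def z_action(n, a, mod=5):
    """n.a = a + a + ... + a (n summands), with (-n).a = n.(-a) and 0.a = 0."""
    if n < 0:
        n, a = -n, (-a) % mod
    total = 0
    for _ in range(n):
        total = (total + a) % mod
    return total

print(z_action(7, 2))    # 7.2 = 14 = 4 (mod 5)
print(z_action(-3, 2))   # (-3).2 = 3.(-2) = 4 (mod 5)
print(z_action(0, 2))    # 0.2 = 0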
Exercise 3. (The Jordan-Hölder theorem implies the fundamental theorem
of Arithmetic.) This exercise is going to occur in four steps to be demonstrated by the student:
1. A Z-module M is irreducible if and only if it is a cyclic group of prime
order.
2. If n is a natural number and p is a prime number, then nZ/pnZ is a
cyclic group of prime order p, and hence is an irreducible Z-module
3. Suppose the integer n has a factorization n = p_1^{a_1} · · · p_k^{a_k} for some set of prime numbers {p_1 , . . . , p_k } and natural numbers a_j . Then there exists a composition series A_0 := nZ < A_1 < · · · < A_m := Z, such that the composition factors involve Z/p_1 Z (exactly a_1 times), Z/p_2 Z (exactly a_2 times), . . . , Z/p_k Z (exactly a_k times). [Hint: apply the previous step repeatedly.]
4. By the Jordan-Hölder Theorem any two composition series of Z/nZ
yield the same multiset of composition factors. By the previous step
these are recorded in any prime-factorization. Hence up to order and
unit multiples (association classes) these are unique.
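Step 3 can be sketched computationally (Python; illustrative): trial division reads off one composition factor Z/pZ per division, and by the Jordan-Hölder Theorem the resulting multiset is the same for every composition series of Z/nZ.

def composition_factors(n):
    """Return the multiset of primes p contributing a factor Z/pZ."""
    factors, p = [], 2
    while n > 1:
        while n % p == 0:
            factors.append(p)       # one composition factor Z/pZ per division
            n //= p
        p += 1
    return factors

print(composition_factors(360))     # [2, 2, 2, 3, 3, 5]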
Example 3. (Ideals defined by varieties and vice versa.) Let V be the
n-dimensional vector space F (n) of all n-tuples over a field F . Let R :=
F [x1 , x2 , . . . , xn ] be the ring of all polynomials in n indeterminates {x1 , . . . , xn }
having coefficients in the field F .
Suppose we begin with an ideal J in R. Then the variety of zeroes of
J is defined to be the subset
V(J) := {(a1 , . . . , an ) ∈ F (n) |p(a1 , . . . , an ) = 0 ∀p ∈ J}
– that is, the vectors which are zeroes of each polynomial p(x_1 , . . . , x_n ) in J.
This produces an order-reversing mapping
V : the poset of ideals of R → the poset of subsets of V.
Conversely, for every subset X of F (n) , we may consider the collection
I(X) of all polynomials p(x1 , . . . , xn ) in F [x1 , x2 , . . . , xn ] such that p(a1 , . . . , an ) =
0 for all vectors (a1 , . . . , an ) ∈ X. Any polynomial r times such a polynomial
p ∈ I(X) is again such a polynomial. Similarly, the sum or difference of two
polynomials which vanish on X, also vanish on X. Thus I(X) is an ideal in
R. The result is an order-reversing mapping of these posets in the opposite
direction:
I : poset of subsets of V → poset of ideals of F [x1 , x2 , . . . , xn ].
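Over a finite field everything can be computed by brute force; here is a sketch of the map V (Python; representing polynomials as functions on F² and the choice F = Z/5Z are illustrative assumptions), exhibiting its order-reversing behaviour.

from itertools import product

F = range(5)                        # the field Z/5Z

def variety(polys):
    """V(J) for the ideal J generated by polys: their common zeroes in F^2."""
    return {pt for pt in product(F, F)
            if all(p(*pt) % 5 == 0 for p in polys)}

J1 = [lambda x, y: x * y]                          # generators of the ideal (xy)
J2 = [lambda x, y: x * y, lambda x, y: x + y]      # the larger ideal (xy, x + y)
V1, V2 = variety(J1), variety(J2)
print(sorted(V1))                   # the two coordinate axes
print(V2 <= V1)                     # True: a larger ideal has a smaller variety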
Moreover, we possess the monotone relations

J̄ := I(V(J)) ≥ J   (8.24)
X̄ := V(I(X)) ≥ X   (8.25)
for every ideal J in R and subset X within the vector space V . Thus the pair of mappings forms a Galois connection in the sense of Chapter 2, and the two “bar” operations, ideal J → J̄ and subset X → X̄, are closure operators (idempotent order-preserving maps on a poset). Their images – the “closed” objects – are a special subposet which defines a topology. If we start with an ideal J, then (for F algebraically closed) its closure J̄ is a particular ideal

rad(J) := {x ∈ R | x^n ∈ J, for some natural number n}.

This is the content of the famous “zeroes theorem” (Nullstellensatz) of David Hilbert.¹ An
analogue for varieties would be a description of the closure X̄ of X in terms of the original subset X. These image sets X̄ are called (affine) algebraic varieties. That there is not a uniform description of affine varieties in terms of V alone should be expected. The world of “V ” allows only such a primitive language to describe things that we cannot say words like “nilpotent”, “ideal”, etc. That is why the Galois connection is useful. The mystery of these “topology-inducing” closed sets can be pulled back to a better understood algebraic world, the poset of ideals.
¹ Note that rad(J), as defined, is not patently an ideal when F is replaced by a non-commutative division ring.
Is the new topology on V really new? Of course, it is not ordained in heaven that any vector space actually has a topology which is more interesting than the discrete one. In the case of finite vector spaces, one cannot expect anything better, and indeed every subset of a finite vector space is an affine algebraic variety. Also, if F is the field of complex numbers, this topology is strictly coarser than the usual metric topology, although comparing the two takes a non-trivial argument. But
there are cases where this topology on algebraic varieties differs from other
standard topologies on V , or, in some cases is the only available topology
known. Every time this happens, one has a new opportunity to do analysis
another way, and to deduce new theorems.
8.2 Consequences of chain conditions on modules
8.2.1 Definitions: Noetherian and Artinian modules
Let M be a right R-module. We are interested in the poset P (M ) := (P, ≤) of all submodules of M , partially ordered by inclusion. We have mentioned in a previous section that P (M ) is a semimodular lower semilattice, for that was all that was needed to activate the general Jordan-Hölder theory. Actually, the fundamental homomorphism theorems imply a much stronger property:
Theorem 151 The poset P (M ) = (P, ≤) of all submodules of a right R-module is a modular lattice with a minimal element 0 (the lattice “zero”) and a unique maximal element, M (the lattice “one”).
Recall from Chapter 2 that a lattice L is modular if and only if
(M) a ≥ b implies a ∧ (b ∨ c) = b ∨ (a ∧ c) for all a, b, c in L.
It is manifestly equivalent to its dual:
(M∗ ) a ≤ b implies a ∨ (b ∧ c) = b ∧ (a ∨ c) for all a, b, c in L.
In the poset P (M ) the condition (M) reads:
If A, B, and C are submodules with A contained in B, then
B ∩ (A + C) = A + (B ∩ C).
This is an easy argument, achieved by considering an arbitrary element of
each side.
We say that module M is Noetherian if and only if the poset P (M )
satisfies the ascending chain condition (ACC). Similarly, we say that M is
Artinian if and only if the poset P (M ) satisfies the descending chain condition (DCC). Obviously each of these properties is inherited by submodules and factor modules.
Finally, we say that the module M has a finite composition series
if and only if there is a finite unrefinable chain of submodules in P (M )
proceeding from 0 to M . Now since P (M ) is a modular lattice, so are each
of the intervals [A, B] := {X ∈ P (M )|A ≤ X ≤ B}. Moreover, from the
modular property, we have that any interval [A, A + B] is poset isomorphic
to [A ∩ B, B]. Using these facts the student should be able to prove
Lemma 152 The following three conditions for an R-module M are equivalent:
1. M has a finite composition series.
2. M is both Noetherian and Artinian.
3. Every unrefinable chain of submodules is finite.
Exercise 4. Prove the preceding Lemma.
Exercise 5. (Heredity of the Noetherian (Artinian) properties)
1. Show that any submodule or factor module of a Noetherian (Artinian) module is Noetherian (Artinian).
2. Suppose M = N1 ⊕ · · · ⊕ Nk , a direct sum of finitely many submodules
Ni . Show that if each submodule Ni is Noetherian (Artinian) then M
is Noetherian (Artinian).
3. Show that if M/A and M/B are Noetherian (Artinian) right modules,
then so is M/(A ∩ B).
4. Let R be a ring. Suppose the right R-module RR is Noetherian (Artinian) (in this case we say that the ring R is right Noetherian (Artinian)). Show that any finitely generated right R-module is Noetherian (Artinian). [Hint: Such a right module M is a homomorphic image
of a free module F = x1 R⊕· · ·⊕xk R over a finite number of generators.
Then apply other parts of this exercise.]
8.2.2 Effects of the chain conditions on endomorphisms.
The basic results
Let f : M → M be an endomorphism of the R-module M . As usual, we
regard f as operating from the left. Set f 0 to be the identity mapping on
M , f 1 := f and inductively define
f n = f ◦ f n−1 ,
for all positive integers n. Then f n is always an R-endomorphism, and so its
image Mn := f n (M ) and its kernel Kn := ker(f n ) are submodules. We then
have two series of submodules, one ascending and one descending:
0 = K 0 ≤ K1 ≤ K2 ≤ · · ·
M = M 0 ≥ M1 ≥ M 2 ≥ · · ·
Lemma 153 Let f be an endomorphism of M and let K_n := ker(f^n ) and
Mn := f n (M ), as above. The following statements hold:
1. Mn = Mn+1 implies M = Mn + Kn .
2. Kn = Kn+1 implies Mn ∩ Kn = 0.
3. If M is Noetherian, then for sufficiently large n, Mn ∩ Kn = 0.
4. If M is Artinian, then M = Mn + Kn for sufficiently large n.
5. If M is Noetherian and f is surjective, then f is an automorphism of
M.
6. If M is Artinian and f is injective, then again f is an automorphism
of M .
Proof: 1. Since Mn = f (Mn ) we have Mn = M2n . Suppose x is an arbitrary
element of M . Then, f n (x) = f 2n (y), for some element y. But we can write
x = f n (y) + (x − f n (y)). Now the latter summand is an element of Kn while
the former summand belongs to Mn . Thus x ∈ Mn + Kn and the result
follows.
2. Suppose Kn = Kn+1 , so that Kn = K2n . Choose x ∈ Kn ∩ Mn . Then
x = f n (y) for some element y, and so
0 = f n (x) = f 2n (y).
But as K2n = Kn , f 2n (y) = 0 implies f n (y) = 0 = x. Thus Kn ∩ Mn = 0.
3. If M is Noetherian, the ascending series K0 ≤ K1 ≤ · · · must stabilize
at some point, so there exists a natural number n such that Kn = Kn+1 .
Now by Part 2., Mn ∩ Kn = 0.
4. If M is Artinian, the descending series in the M_i eventually stabilizes
and so we have Mn = Mn+1 for sufficiently large n. Now apply Part 1.
5. If f is surjective, M = M0 = Mn for all n. But M is Noetherian,
so by Part 3., Mn ∩ Kn = 0 for some n. Thus Kn = 0. But since the
Ki are ascending, K1 = 0, which makes f injective. Hence f is a bijective
endomorphism, that is, an automorphism.
6. If f is injective and M is Artinian, Mn = Mn+1 for sufficiently large
n, and so M = Mn + Kn . But Kn = 0 since K1 = 0 by hypothesis. Thus
M = Mn , and since the Mi form a descending series, M = M1 and f is
surjective. Hence f is a bijection once more.
Corollary 154 Suppose M is both Artinian and Noetherian and f : M →
M is an endomorphism. Then for some natural number n,
M = M n ⊕ Kn .
Fitting's Lemma
A module M is said to be indecomposable if and only if it is not the direct
sum of two non-trivial submodules.
Theorem 155 (Fitting’s Lemma.) Suppose M 6= 0 is an indecomposable
right R-module which possesses a finite composition series. Then the following hold:
1. For any R-endomorphism f : M → M , f is either invertible (that is,
f is a unit in the endomorphism ring S := EndR (M )) or f is nilpotent
(which means that f n = 0, the trivial endomorphism, for some natural
number n).
2. The endomorphism ring S = EndR (M ) possesses a unique maximal
right ideal which is also the unique maximal left ideal, and which consists of all nilpotent elements of S.
Proof: 1. Recall that since P (M ) is a modular lattice, M has a finite
composition series if and only if it is both Artinian and Noetherian. It follows
from the preceding Corollary that for sufficiently large n, M = M_n + K_n
and Mn ∩ Kn = 0, so, in fact M = Mn ⊕ Kn . Since M is indecomposable,
either Mn = 0 or Kn = 0. In the former case f is nilpotent. In the latter
case f is injective, and it is also surjective since M = Mn . Thus f is an
automorphism of M and so possesses a two-sided inverse in S.
2. It remains to show that the nilpotent elements of S form a two-sided
ideal, and to prove the uniqueness assertions about this ideal.
Suppose f is nilpotent. Since M 6= 0, K1 = ker f 6= 0, and so for any
endomorphism s ∈ S, neither sf nor f s can be invertible (sf is not injective, and f s is not surjective). So, by the dichotomy
imposed by Part 1., both sf and f s must be nilpotent.
Now assume f and g are nilpotent endomorphisms while h := f + g is
not. Then by the dichotomy, h possesses a 2-sided inverse h−1 . Then
1M = hh−1 = f h−1 + gh−1 .
It follows that the two endomorphisms, s := f h−1 and t := gh−1 , commute
and, by the previous paragraph, are nilpotent. Then there is a natural number m large enough to force sm = tm = 0. Then
1_M = (1_M )^{2m} = (s + t)^{2m} = ∑_{j=0}^{2m} \binom{2m}{j} s^{2m−j} t^j .
Each monomial s^{2m−j} t^j = 0 since at least one of the two exponents exceeds
m − 1. But that yields 1M = 0, an absurdity as M 6= 0. Thus h must also be
nilpotent. Thus the set nil(S) of all nilpotent elements of S forms a two-sided
ideal.
That this ideal is the unique maximal member of the poset of proper right ideals
of S follows from the fact that S − nil(S) consists entirely of units. A similar
statement holds for the poset of left ideals of S. The proof is complete.
The Krull-Remak-Schmidt Theorem.
Lemma 156 Let f : A = A1 × A2 → B = B1 × B2 be an isomorphism of
Artinian modules, and define mappings α : A1 → B1 and β : A1 → B2 by the
equation
f (a1 , 0) = (α(a1 ), β(a1 ))
for all a1 ∈ A1 . Clearly α and β are homomorphisms of R-modules. If α is
an isomorphism, then there exists an isomorphism
g : A2 → B2 .
Proof: Clearly α = π1 ◦ f ◦ κ1 and β = π2 ◦ f ◦ κ1 , where πj is the canonical
projection B → Bj with kernel B3−j , j = 1, 2, and κ1 is the canonical
embedding A1 → A1 × A2 sending a1 to (a1 , 0).
Suppose first that β = 0. Then f restricted to the first summand has
its image equal to B1 × 0, and so induces an isomorphism A/(A1 × 0) →
B/(B1 × 0) – that is, an isomorphism A2 → B2 .
So our proof will be complete if we can construct from f another isomorphism h : A → B such that h(a_1 , 0) = (α(a_1 ), 0) for all a_1 ∈ A_1 . We define
h in the following way: If f (a1 , a2 ) = (b1 , b2 ), set
h(a1 , a2 ) := (b1 , b2 − βα−1 (b1 )).
Clearly h is an R-homomorphism. Since B is Artinian it suffices to show
that h is injective. Suppose h(a1 , a2 ) = 0. Then b1 = 0 = b2 so a1 = 0 = a2
since f is an isomorphism. Thus h has been constructed as desired.
Lemma 157 Suppose A is a right R-module that is Artinian. Then A is a
direct sum of finitely many indecomposable modules.
Proof: First we argue that A even possesses an indecomposable direct
summand. If A is already indecomposable then we are done. Otherwise
there is a non-trivial decomposition A = B_1 ⊕ A_1 . If A_1 is indecomposable we are
done. Otherwise we write A1 = B2 ⊕ A2 . If A2 is indecomposable we have
A = (B1 + B2 ) ⊕ A2 and we are done. As long as the process continues we
obtain a properly descending series A > A1 > A2 > · · ·. By the Artinian
condition the process must terminate at an indecomposable direct summand
An := D1 .
Now that we can write A = C_1 ⊕ D_1 , where D_1 is non-zero indecomposable, we may, noting that C_1 is Artinian, similarly write C_1 = C_2 ⊕ D_2 ,
and Ci = Ci+1 ⊕ Di+1 with Di+1 indecomposable, as long as Ci remains
decomposable. But the properly descending chain C1 > C2 > · · · must eventually terminate in some finite number m of steps. Then one has a direct
decomposition
A = Cm ⊕ Dm ⊕ Dm−1 ⊕ · · · ⊕ D1
into indecomposables.
Theorem 158 (The Krull-Remak-Schmidt Theorem) Let the Artinian and
Noetherian module A = A1 × · · · × Am be isomorphic to B = B1 × · · · ×
Bn where the Ai and Bj are indecomposable. Then m = n and there is a
permutation ρ of the index set I = {1, . . . , n} such that Ai is isomorphic to
Bρ(i) for all i ∈ I.
Proof: Without loss of generality, we may assume m ≥ n. We shall use
induction on min(n, m) = n since the result holds trivially if n = 1.
Let f : A → B be the assumed isomorphism. Let κ_i : A_i → A and κ′_j : B_j → B be the canonical embeddings associated with the direct sums. Similarly, let π_i : A → A_i and π′_j : B → B_j be the corresponding canonical projection epimorphisms associated with these direct sums. Then set

α_i := π′_1 ◦ f ◦ κ_i : A_i → B_1 ,
β_i := π_i ◦ f^{−1} ◦ κ′_1 : B_1 → A_i ,
each an R-homomorphism.
Then

∑_{i=1}^{m} α_i ◦ β_i
is the identity mapping on the indecomposable module B1 . By Fitting’s
Lemma (Theorem 155) not all the αi ◦ βi can be nilpotent. Reindexing if
necessary, we may assume α_1 ◦ β_1 is not nilpotent. Then it is in fact an automorphism of B_1 . Since then β_1 ◦ (α_1 ◦ β_1 )^{−1} ◦ α_1 is a non-zero idempotent endomorphism of the indecomposable module A_1 , it must be the identity, and it follows that α_1 : A_1 → B_1 is an isomorphism. Then we have
f (a1 , 0, . . . , 0) = (α1 (a1 ), . . .).
By the preceding Lemma 156, there is an isomorphism
A2 × · · · × Am ' B2 × · · · × Bn .
Now we apply the induction hypothesis to deduce that m = n, and that for
some permutation σ of J = {2, . . . , n}, A_i is isomorphic to B_{σ(i)} , i ∈ J. The
result now follows upon feeding in the isomorphism α1 .
8.2.3 Noetherian Modules and finite generation.
We can come to the point at once:
Theorem 159 Suppose M is a right R-module, for some ring R. Then
the poset of right R-submodules of M satisfies the ascending chain condition
(ACC) if and only if every right submodule of M is finitely generated.
proof: 1. ACC implies every submodule is finitely generated (fg). Assume
the poset of right R-submodules of M possesses the ACC. Suppose, by way
of contradiction that N is a submodule which is not finitely generated. Then
we claim that the collection N of all finitely generated submodules of N
possesses no maximal member. For if S ∈ N then S = a_1 R + a_2 R + · · · + a_m R for elements a_1 , · · · , a_m of N , since S is a finitely generated submodule of N . But as N is not finitely generated there exists an element a_{m+1} ∈ N − S. Then a_1 R + · · · + a_{m+1} R is a module in N which properly contains S. Thus
the maximal condition fails among the right R-submodules of M , and so by
the previous section, the ACC fails.
2. Every submodule of the right R-module M finitely generated implies
the ACC on submodules of M . Let M_1 < M_2 < · · · be a properly ascending chain of submodules of M . Then clearly the union

M_∞ := ∪_i M_i

is a submodule N of M and so, by hypothesis, is finitely generated. Then N = x_1 R + · · · + x_m R for some finite subset {x_i } ⊆ M_∞ . Now each x_j lies in some M_{f(j)} . Thus, setting k := max{f (j) | j ≤ m}, we see that M_∞ = M_k , so that M_k is the terminal member of the properly ascending sequence. Thus the ACC holds on the poset of right R-submodules of M .
Corollary 160 Suppose the right R-module M satisfies the ascending chain
condition on submodules. Then every generating set X – that is, a set such that
hXiR = M – contains a finite subset Y which generates M .
The proof is an exercise that the reader should be able to do at this point.
Although we have lost the “up-down” symmetry between a poset and its
dual by locking the ACC to finite-generatedness of submodules, we have not
lost right-left symmetry. Thus
A left R-module possesses the ascending chain condition on submodules
if and only if every left R-submodule is finitely generated.
A right R-module M which possesses either of the two equivalent conditions of the above theorem (the ACC – equivalently, the maximum condition – on the poset of submodules, or the condition that all submodules are finitely generated) has been called Noetherian. Similarly we call a left R-module Noetherian if and only if every left R-submodule is finitely generated. There is no need to invent two words (“left Noetherian” or “right Noetherian”) as long as we know that M is either a right R-module or a left R-module, but not both. These words will be required for bimodules such as _RR_R , since conceivably one of these things could be Noetherian as a left module without
being Noetherian as a right module. But that leads us into the definitions of
the next subsection.
8.2.4 Right and left Noetherian rings.
The reader shall soon discover that many properties of a ring are properties
which are lifted (in name) from the R-modules RR or R R. We say that a ring
R is a right Noetherian ring if and only if RR is a Noetherian R-module.
In view of Theorem 159 we are saying that a ring R is right Noetherian if
and only every right ideal is finitely generated.
Similarly, R is a left Noetherian ring if and only R satisfies the ACC
on left ideals – or, equivalently, every left ideal is finitely generated. Beleive
it or not, there really do exist rings which are right Noetherian but not left
Noetherian (Dieudonné). By taking opposite rings one may also have it the
other way round.
Of course on modules, there is quite a difference between saying, on the
one hand, that a module is finitely generated, and on the other hand, that
every submodule is finitely generated. That is, being finitely generated and
Noetherian are in general two different things in the world of modules. That
is where Noetherian rings come in: if M is a right (left) R-module over a
right (left) Noetherian ring, there is no difference between the two notions. Let’s
make this precise by stating:
Theorem 161 Suppose R is a ring which is right Noetherian – i.e. it satisfies the ascending chain condition on its right ideals. If M is a finitely generated right R-module, then M is Noetherian – i.e. every submodule of
M is also finitely generated.
proof: Assume R is right Noetherian, and suppose N is a submodule of
M = a_1 R + · · · + a_n R, a finitely generated right R-module. Then every element of N can be written as an “expression”

a_1 r_1 + a_2 r_2 + · · · + a_n r_n .   (8.26)
We next consider the totality of such expressions of the form in equation (8.26) which represent elements of N . We say that such an expression of the form (8.26) has length ≤ ℓ if and only if the last n − ℓ coefficients are all zero. Let I_ℓ be the set of coefficients r_ℓ of a_ℓ appearing among expressions of the form (8.26) which have length at most ℓ and represent elements of N . Thus

I_ℓ := {r_ℓ | a_1 r_1 + · · · + a_{ℓ−1} r_{ℓ−1} + a_ℓ r_ℓ ∈ N, r_i ∈ R}.
Since N is a right R-module itself, it is clear that

I_ℓ is a right ideal in R.   (8.27)
Moreover, from the definition of I_ℓ , every element b ∈ I_ℓ actually is realized as the ℓ-th coefficient of at least one expression of the form (8.26), say

B(b) := a_1 r_1 (b) + · · · + a_{ℓ−1} r_{ℓ−1} (b) + a_ℓ b,
to make the choice precise.
Now by the hypothesis, R has the ascending chain condition on right
ideals, and so the ideals I_ℓ are finitely generated. Thus we may write

I_ℓ = d_1^{(ℓ)} R + d_2^{(ℓ)} R + · · · + d_{t(ℓ)}^{(ℓ)} R   (8.28)

for some finite set of generators d_1^{(ℓ)} , d_2^{(ℓ)} , . . . , d_{t(ℓ)}^{(ℓ)} , depending on the choice of ℓ = 1, 2, · · · , n.
We can now form the t = ∑_{ℓ=1}^{n} t(ℓ) expressions B(d_j^{(ℓ)} ), 1 ≤ j ≤ t(ℓ), 1 ≤ ℓ ≤ n, and view these as the elements of N which they represent. We now claim that the t elements B(d_j^{(ℓ)} ) generate N as an R-module. To see this, first consider an element of N represented by an expression of length n, say

a_1 c_1 + a_2 c_2 + · · · + a_n c_n , with c_n ≠ 0.   (8.29)
Then c_n ∈ I_n and so from equation (8.28) is an R-linear combination

d_1^{(n)} s_1 + d_2^{(n)} s_2 + · · · + d_{t(n)}^{(n)} s_{t(n)} = c_n .

Then we see that

T = B(d_1^{(n)} )s_1 + B(d_2^{(n)} )s_2 + · · · + B(d_{t(n)}^{(n)} )s_{t(n)}
  = ∑_{i=1}^{t(n)} ( ∑_{j=1}^{n−1} a_j r_j (d_i^{(n)} ) ) s_i + a_n ∑_{i=1}^{t(n)} d_i^{(n)} s_i
  = (terms of length ≤ n − 1) + a_n c_n
is an element of N which is represented by an expression of the form (8.26)
whose n-th coefficient is also c_n. Thus it can be subtracted from the expression in (8.29) to transform it into an expression of length ≤ n − 1
still representing an element of N. This expression in turn can have its
length reduced to n − 2 by subtracting off a suitable R-linear combination of
B(d_1^{(n−1)}), . . . , B(d_{t(n−1)}^{(n−1)}). Continuing in this way, a_1c_1 + · · · + a_nc_n can eventually be reduced to an expression of length zero by successively subtracting
off R-linear combinations of the B(d_i^{(j)}). Since 0 is the only element represented by an expression of length zero, we see that any element of N is
an R-linear combination of the t elements B(d_i^{(j)}). Thus N is a finitely
generated R-module.
As N was an arbitrary submodule, the proof is complete.
The student is encouraged to state the left-hand version of the statement
of this theorem.
Although the above theorem reveals a nice consequence of Noetherian
rings, the applicability of the theorem appears limited until we know that
such rings exist.
One very important class is the principal ideal domains. These are the
integral domains such that every ideal is generated by a single element. Since
an integral domain is by definition commutative, it does not matter whether
we are speaking of right ideals or left ideals here. Our most important canonical example is the ring Z of integers.
Suppose K is a non-zero ideal in the ring of integers. Since K = −K, the ideal K
contains positive integers, and so, by the well-ordering property, contains a
unique smallest positive integer, say k. Suppose y is any integer in the ideal
K. Then the greatest common divisors of k and y consist of two integers d
and −d, where, without loss, d is positive. As is well known from Euclid’s
algorithm, d can be expressed as an integer linear combination d = ak + by.
Thus d lies in the ideal K. Now d is a positive integer in K dividing k, the
smallest positive integer in K. Thus d = k, and so y is an integer multiple
of k. Thus K = Zk. Hence
Lemma 162 The ring of integers Z is a principal ideal domain.
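The content of this lemma is easy to experiment with. In the sketch below (a minimal Python illustration for D = Z; the helper name ideal_generator is ours, not from the text), the ideal 12Z + 18Z + 30Z collapses to kZ, where k is the greatest common divisor of the generators – the smallest positive member of the ideal.

    from math import gcd
    from functools import reduce

    # The ideal generated by a finite set of integers is kZ,
    # where k is the gcd of the generators.
    def ideal_generator(gens):
        return reduce(gcd, gens)

    print(ideal_generator([12, 18, 30]))   # 6, so 12Z + 18Z + 30Z = 6Z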
This argument used only certain properties shared by all Euclidean domains (and there are many). All of them are principal ideal domains by
a proof almost identical to the above. But principal ideal domains are a
larger class of rings, and most of the important theorems (about their modules and the unique factorization of their elements into primes) encompass
basically the analysis of finite-dimensional linear transformations and the
elementary arithmetic of the integers. So this subject is given a separate
section. Nonetheless, just knowing that Z is a Noetherian ring yields the
important fact that algebraic integers form a ring.
8.2.5 Algebraic integers form a ring.
An algebraic integer is any complex number α which is the root of a
polynomial in Z[x] having lead coefficient one. Thus √3 is an algebraic
integer, but half of it is not. The complex number ω = (−1 + √−3)/2 is an
algebraic integer because ω² + ω + 1 = 0.
An easy way to tell whether a complex number α is an algebraic integer
is given in the following:
Lemma 163 A complex number α is an algebraic integer if and only if Z[α]
is a finitely generated Z-module.
proof: (⇒). Suppose α is an algebraic integer. Then α satisfies a polynomial
equation
αn + a1 αn−1 + · · · + an−1 α + an = 0.
Then αn is a Z-linear combination of lower powers of α (including the zero-th
power). By an induction argument, one can easily establish that for every
integer m larger than n,
    α^m ∈ Z + Zα + · · · + Zα^{n−1}.    (8.30)
First, it is true for m = 0, 1, . . . , n − 1. If the membership relation (8.30)
holds for m = k, then

    α^k = z_1 + z_2α + · · · + z_nα^{n−1}

for integers z_1, . . . , z_n. Then

    α^{k+1} = αα^k = α(z_1 + z_2α + · · · + z_nα^{n−1})    (8.31)
            = z_1α + z_2α² + · · · + z_{n−1}α^{n−1} + z_nα^n.    (8.32)
But since

    α^n = −a_n − a_{n−1}α − · · · − a_1α^{n−1}

we see that α^{k+1} is a Z-linear combination of {α^{n−1}, . . . , α, 1}. Thus equation
(8.30) holds for m = k + 1, and hence for all m. Since the powers of α generate the Z-module Z[α], (8.30)
implies

    Z[α] = Z + Zα + · · · + Zα^{n−1}

and so Z[α] is finitely generated as a Z-module.
(⇐) Suppose Z[α] is a finitely generated Z-module. Now Z is a principal
ideal domain (as noted at the end of a previous section) and so has the
ascending chain condition on ideals. Then by our theorem, the Z-module
Z[α] satisfies the ascending chain condition on submodules. But the set
X := {1, α, α², . . .} is a set of generators, and so by Lemma 160 there exists
a finite subset Y = {α^{n_1}, . . . , α^{n_k}} of X which generates Z[α]. Thus if
m := max{n_i}, then we have α^{m+1} equal to a Z-linear combination of the
α^{n_i}, i = 1, . . . , k, so

    α^{m+1} = b_1α^{n_1} + b_2α^{n_2} + · · · + b_kα^{n_k}, b_i ∈ Z,

so α is a root of the monic polynomial

    p(x) = x^{m+1} − b_1x^{n_1} − b_2x^{n_2} − · · · − b_kx^{n_k}

in Z[x] of degree m + 1 and having leading coefficient 1. Thus α is an algebraic
integer.
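The reader with a computer handy can test this criterion quickly. The sketch below assumes the SymPy library; its minimal_polynomial returns an integer polynomial with denominators cleared, so α is an algebraic integer exactly when the output is monic.

    from sympy import sqrt, symbols, minimal_polynomial

    x = symbols('x')
    print(minimal_polynomial(sqrt(3), x))     # x**2 - 3: monic, an algebraic integer
    print(minimal_polynomial(sqrt(3)/2, x))   # 4*x**2 - 3: not monic, not one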
Theorem 164 The algebraic integers form a ring.
proof: Certainly if α is a root of a monic polynomial with integer coefficients, then so is −α.
Let α and β be any two algebraic integers. It remains only to show that
α + β and αβ are algebraic integers. By the previous Lemma 163, both Z[α]
and Z[β] are finitely generated Z-modules, and so we may write:
    Z[α] = Zu_1 + Zu_2 + · · · + Zu_n    (8.33)
    Z[β] = Zv_1 + Zv_2 + · · · + Zv_m    (8.34)
Now form the ring Z[α, β] of all polynomial expressions in both complex
numbers α and β. This is generated as a Z-module by all monomials α^iβ^j, 0 ≤ i, j.
Since α^i and β^j lie in Z[α] and Z[β], from equations (8.33) and (8.34)
just above, the general monomial α^iβ^j is a Z-linear combination of the set

    X := {u_sv_t | 1 ≤ s ≤ n, 1 ≤ t ≤ m}.
Thus X is a set of generators of the Z-module Z[α, β].
Thus Z[α, β] is finitely generated. Since Z is Noetherian, every submodule
of Z[α, β] is finitely generated. Thus the submodules
Z[α + β] and Z[αβ]
are finitely generated. But by Lemma 163, this means that α + β and
αβ are algebraic integers.
In the opinion of the author, the spirit of abstract algebra is immortalized
in this maxim:
• A GOOD DEFINITION IS BETTER THAN MANY LINES OF PROOF.
Perhaps no theorem illustrates this principle more than the one just given. In
the straightforward way of proving this theorem one is obligated to produce
monic polynomials with integer coefficients for which α +β and αβ are roots;
not an easy task. Yet the above proof was not hard. That was because we
had exactly the right definitions!
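For contrast, the “straightforward way” can at least be mechanized: resultants eliminate a variable and produce a monic integer polynomial for α + β. A sketch, assuming SymPy is available (the polynomial choices are ours, for α = √2 and β = √3):

    from sympy import symbols, resultant

    x, y = symbols('x y')
    f = y**2 - 2                 # monic polynomial satisfied by alpha
    g = (x - y)**2 - 3           # if beta = x - y, then beta^2 = 3
    # Eliminating y gives a monic integer polynomial with alpha + beta as a root.
    print(resultant(f, g, y))    # x**4 - 10*x**2 + 1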
8.3 The Hilbert Basis Theorem
This section gives us a second important application of Theorem 161.
We begin with a few remarks about polynomial rings.
Previously, in attempting to supply ourselves with examples of rings, we
considered monoid rings M R where R was any ring and M was any monoid.
The elements of M R were finite (right) R-linear combinations of elements
of M , with component-wise addition and right multiplication by elements of
R – i.e. in terms of language we have since learned – M R is a free right
R-module. Multiplication in M R is defined by assuming two principles:
1. Elements of R commute with elements of M .
2. Every conceivable distributive law holds.
Thus M R = RM is an (R, R)-bimodule.
In particular, if M is the free monoid generated by an element x, then
M R is all expressions r_0 + xr_1 + · · · + x^nr_n, for all natural numbers n and
coefficients r_j in R, and is the ring of polynomials over R. The reason
this needs to be spelled out is that usually the notion of a ring of polynomials
in many books and contexts entails at least
1. that R is a commutative ring, or even
2. that R is a field.
Neither of these is assumed here: only that elements of the ring R commute
with powers of x.
Of course if M is the commutative multiset monoid generated by a finite
set {x1 , . . . xn }, then RM is the ring of polynomials over R and is denoted
R[x1 , . . . , xn ].
Theorem 165 (The Hilbert Basis Theorem) Let R be any (not necessarily commutative) ring and let R[x] be the ring of polynomials with coefficients in R.
If R has the ascending chain condition on right ideals, then so does R[x].
proof: Let A be a right ideal in R[x]. We wish to show that A is finitely
generated as a right R[x]-module. Let A⁺ be the collection of all “lead”
coefficients of all polynomials lying in A, where the “lead coefficient” of any
polynomial is the coefficient of the highest power of x appearing (with non-zero coefficient) in the polynomial, or is zero in the case of the zero polynomial.
Since A is a right ideal in the polynomial ring, A⁺ is a right ideal in R. Then
by Theorem 159 A⁺ is finitely generated, so we may write

    A⁺ = a_1R + a_2R + · · · + a_nR.

From the definition of A⁺, there exist polynomials p_i(x) in A having lead
coefficient a_i. Let n_i := degree p_i(x) and set m = max{n_i}. Now let R_m[x]
be the polynomials of degree at most m in R[x] and set

    A_m = A ∩ R_m[x].

We claim

    A = A_m + p_1(x)R[x] + p_2(x)R[x] + · · · + p_n(x)R[x].    (8.35)
Let f (x) be a polynomial in A of degree N larger than m. Then
    f(x) = αx^N + terms involving lower powers of x.
Then, as α is the leading coefficient of f(x), we have α ∈ A⁺. Thus

    α = a_1r_1 + a_2r_2 + · · · + a_nr_n.
Then
    f_1(x) = f(x) − x^{N−n_1}p_1(x)r_1 − x^{N−n_2}p_2(x)r_2 − · · · − x^{N−n_n}p_n(x)r_n
is a polynomial in A which has degree less than N . In turn, if the degree of
f1 (x) exceeds m this procedure can be repeated. Thus eventually we obtain
a polynomial h(x) in A of degree at most m, which is equal to the original
f (x) minus an R[x]-linear combination of {p1 (x), p2 (x), . . . , pn (x)}. But since
each p_i(x) and f(x) all lie in A, we see that h(x) ∈ A_m. Thus

    f(x) ∈ A_m + p_1(x)R[x] + · · · + p_n(x)R[x].
Since f (x) was an arbitrary member of A, equation ( 8.35) holds.
In view of equation (8.35), it remains only to show that A_mR[x] is a
finitely generated R[x]-module, and this is true if A_m is a finitely generated
R-module. But A_m is an R-submodule of R_m[x], which is certainly a finitely
generated R-module (for {1, x, . . . , x^m} is a set of generators). Thus by the
main theorem of the previous section, since R has the ACC on right ideals,
the submodule A_m is finitely generated. This completes the proof.
remark: Again there exists a left version of this theorem. The ring R[x]
is the same. It asserts that if R is left Noetherian, then R[x] is also a left
Noetherian ring.
Corollary 166 If R is right Noetherian, then so is R[x_1, . . . , x_n].
Chapter 9
The Arithmetic of Integral Domains.
9.1 Introduction.
Let us recall that an integral domain is a species of commutative ring with
the simple property
(ID) If D is an integral domain then there are no zero divisors – that is,
there is never a product of two non-zero elements equalling the zero
element of D.
Of course this is equivalent (in the realm of commutative rings) to the proposition
(ID’) The set D∗ of non-zero elements of D, is closed under multiplication,
and thus (D∗ , ·) is a commutative monoid under ring multiplication.
An immediate consequence is
Lemma 167 The collection of all non-zero ideals of an integral domain is
closed under intersection, and so forms a lattice under either the containment relation or its dual. Under the containment relation, the ‘meets’ are
intersections and the ‘joins’ are sums of two ideals.
proof: The submodules of DD form a lattice under the containment relation with ‘sum’ and ‘intersection’ playing the roles of “join” and “meet”.
Since such submodules are precisely the ideals of D, it remains only to show
that two nonzero ideals cannot intersect at the zero ideal in an integral domain. But if A and B are ideals carrying non-zero elements a and b, respectively, then ab is a non-zero element of A ∩ B.
We had earlier introduced integral domains as a class of examples in
Chapter 6 on basic properties of rings. One of the unique and identifying
properties of integral domains presented there was:
(The Cancellation Law) If (a, b, c) ∈ D∗ × D × D, then ab = ac implies
b = c.
It is this law alone which defines the most interesting aspects of integral
domains – their arithmetic – which concerns who divides whom among the
elements of D. (It does not seem to be an interesting question in general
rings with zero divisors – such as full matrix algebras.)
Of course for any commutative ring R we may extract the group of units,
U(R). This gives rise to an equivalence relation among the elements of R:
the associate relation, where one element is a (left or right) multiple of
another by a unit: clearly an equivalence relation.
Equivalence relations are fine, but how can we connect these up with “divisibility”? To make the notion precise, we shall say that element a divides
element b if and only if b = ca for some element c in R. Notice that this
concept is very sensitive to the ambient ring R which contains elements a
and b, for that is the “well” from which a potential element c is to be drawn.
By this definition,
• Every element divides itself.
• Zero divides only zero, while every element divides zero.
• If element a divides b and element b divides c, then certainly a divides c.
Thus the divisibility relationship is transitive and so we automatically
inherit a preorder which is trying to tell us as much as possible about the
question of who divides whom.
As one may recall from Chapter 1, for every preorder, there is an equivalence relation (that of being less-than-or-equal in both directions) whose
equivalence classes become the elements of a partially ordered set. Under the
divisibility preorder for an integral domain D, equivalent elements a and b
should satisfy

    a = sb, and b = ta,

for certain elements s and t in D. If either one of a or b is zero, then so is the
other, so zero can only be equivalent to zero. Otherwise, a and b are both
non-zero and so, by the Cancellation Law,

    a = s(ta) = (st)a and st = 1.

In this case, both s and t are units, and a and b are associates. Conversely, if a
and b are associates, they divide one another. Thus the equivalence classes for
the associate-relation in an integral domain D (which are the multiplicative
cosets xU(D) of the group of units in the multiplicative monoid (D, ·) of the
domain D) are also the equivalence classes of the divisibility preorder.
Thus from the definition of the word “divides”, we obtain a poset which
we denote Div(D) whose elements are the associate-classes of the integral
domain D. Moreover
Lemma 168 (Properties of the Divisibility Poset)
1. The additive identity element 0_D is an associate-class that is a global
maximum of the poset Div(D).
2. The group of units U(D) is an associate-class comprising the global
minimum of the poset Div(D).
One can also render this poset in another way. We have met above the
lattice of all ideals of D under inclusion – which we will denote here by the
symbol L. Its dual lattice (ideals under reverse containment) is denoted L∗ .
We are interested in a certain induced sub-poset of L∗. Ideals of the form
xD which are generated by a single element x are called principal ideals.
Let P(D) be the poset of L induced on the collection of all principal ideals,
including the zero ideal, 0 = 0D. We call this the principal ideal poset. To
obtain a comparison with the poset Div(D), we must pass to the dual. Thus
P(D)∗ is defined to be the subposet of L∗ induced on all principal ideals —
that is, the poset of principal ideals ordered by reverse inclusion.
We know the following:
Lemma 169 (Divisibility and principal ideals posets) In an integral domain D, the following holds:
1. Element a divides element b if and only if bD ⊆ aD.
2. Elements a and b are associates if and only if aD = bD.
3. So there is a bijection between the poset P(D)∗ of all principal ideals
under reverse containment, and the poset Div(D) of all multiplicative
cosets of the group of units (the association classes of D) under the
divisibility relation, given by the map f : xD → xU(D).
4. Moreover, the mapping
f : P (D)∗ → Div(D),
is an isomorphism of posets.
One should be aware that the class of principal ideals of a domain D need
not be closed under taking intersections and taking sums. This has a lot to
do – as we shall soon see – with the questions of the existence of “greatest”
common divisors, “least” common multiples and ultimately the success or
failure of unique factorization. Notice that here, the “meet” of two cosets
xU (D) and yU (D) would be a coset dU (D) such that dU (D) divides both
xU (D) and yU (D) and is the unique coset maximal in Div(D) having this
property. We pull this definition back to the realm of elements of D by
saying:
The element d is a greatest common divisor of two non-zero elements a
and b of an integral domain D if and only if
1. d divides both a and b.
2. If e divides both a and b, then e divides d.
Clearly, then, if a greatest common divisor of two elements of a domain
exists, it is unique up to taking associates (i.e. only a coset of U (D) is being
specified) and is unchanged upon replacing any of the two original elements
by any of their associates.
In fact it is easy to see that if d is the greatest common divisor of a and
b, then dD is the smallest principal ideal containing both aD and bD. Note
that this might not be the ideal aD + bD, the join in the lattice of all ideals
of D.
Similarly, lifting back the meaning of a “join” in the poset Div(D), we
say
The element m is a least common multiple of two elements a and b in
an integral domain D if and only if
1. a and b both divide m (i.e. m is a multiple of both a and b).
2. If n is also a common multiple of a and b, then n is a multiple of m.
Again, we note that if m is the least common multiple of a and b, then
mD is the largest principal ideal contained in both aD and bD. Again, this
does not mean that mD is aD ∩ bD, the meet of aD and bD in the full lattice
L of all ideals of D.
The next two lemmas display further instances in which a question on
divisibility in an integral domain refers back to properties of the poset of
principal ideals.
A factorization of an element of the integral domain D is simply a way
of representing it as a product of other elements of D. Of course one can write
an element as a product of a unit and another associate in as many ways as
there are elements in U (D). The more interesting factorizations are those in
which none of the factors are units. We call these proper factorizations.
Suppose we begin with an element x of D. If it is a unit, it has no proper
factorization. Suppose element x is not a unit, and has a proper factorization
x = x_1y_1 into non-zero non-units. We attempt, then, to factor each of
these factors into two further factors. If such a factorization is not possible,
one proceeds with the other proper factor. In this way one may imagine an
infinite schedule of such factorizations. In the graph of the divisibility poset
Div(D), this schedule would correspond to a downwardly growing binary tree.
An endnode to this tree results if and only if one obtains a factor y which
possesses no proper factorization. We call such an element “irreducible”.
Precisely, an element y of D∗ is irreducible if and only if it is a non-zero
non-unit with the property that if y = ab is a factorization, one of the factors
a or b is a unit. Applying the poset isomorphism f : P(D)∗ → Div(D)
given in Lemma 169 above, these comments force
Lemma 170 If D is an integral domain, an element y of D is an irreducible
element if and only if the principal ideal yD is non-zero and does not properly lie in any other principal ideal except D itself (that is, it is maximal
in the induced poset P(D) − {D, 0} of proper principal ideals of D under
containment).
Another important central Lemma of this type which holds for arbitrary
integral domains is this:
Lemma 171 Let D be an integral domain possessing the ascending chain
condition on its poset P(D) of principal ideals. Then every non-zero non-unit is the product of a unit and a finite number of irreducible elements.
proof: Suppose there were an element x ∈ D∗ − U (D) which was not the
product of a finite number of irreducible elements. Then the collection T
of principal ideals xD where x is not a product of finitely many irreducible
elements is non-empty. We can thus choose an ideal yD which is maximal
in T because T inherits the ascending chain condition from the poset of
principal ideals. Since y itself is not irreducible, there is a factorization
y = ab where neither a nor b is a unit. Then aD properly contains yD, since
otherwise aD = yD and y is an associate ua of a where u is a unit. Instantly,
the cancellation law yields b = u contrary to hypothesis. Thus certainly the
principal ideal aD does not belong to T . Then a is a product of finitely
many irreducible elements of D. Similarly, b is a product of finitely many
irreducible elements of D. Hence their product y must also be so, contrary
to the choice of y. The proof is complete.
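The downwardly growing tree of the preceding discussion can be watched in miniature for D = Z, where the ascending chain condition certainly holds. A minimal sketch (the helper names split and irreducible_factors are ours):

    # Repeatedly split any reducible factor; termination is the ACC at work.
    def split(n):
        for d in range(2, int(n**0.5) + 1):
            if n % d == 0:
                return d, n // d
        return None                  # n admits no proper factorization

    def irreducible_factors(n):
        stack, out = [n], []
        while stack:
            m = stack.pop()
            pair = split(m)
            if pair is None:
                out.append(m)        # an endnode of the tree: m is irreducible
            else:
                stack.extend(pair)
        return sorted(out)

    print(irreducible_factors(360))  # [2, 2, 2, 3, 3, 5]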
What about the converse to this lemma?
In any event, the divisibility structure of D is revealed by the divisibility
poset Div(D) and is in part controlled by the group of units. It is then
interesting to know that the group of units can be enlarged by a process
described in the following section.¹

¹ When this happens, the student may wish to verify that the new divisibility poset so
obtained is a homomorphic image of the former.
9.2 Localization in Commutative Rings.
9.2.1 Localization by a multiplicative set.
We say that a subset S of a ring is multiplicatively closed if and only
if SS ⊆ S and S does not contain the zero element of the ring. If S is
a non-empty multiplicatively closed subset of the ring R, we can define an
equivalence relation “∼” on R × S by the rule that (a, s) ∼ (b, t) if and only
if, for some element u in S, u(at − bs) = 0.
Let us show first that it truly is an equivalence relation. Obviously the
relation “∼” is symmetric and reflexive. Suppose now
(a, s) ∼ (b, t) ∼ (c, r) for {r, s, t} ⊆ S.
Then there exist elements u and v in S such that
u(at − bs) = 0 and v(br − tc) = 0.
Multiplying the first equation by vr we get vr(uat) = vr(ubs). But vr(ubs) =
us(vbr) = us(vtc), by the second equation. Hence
vut(ar) = vut(sc),
so (a, s) ∼ (c, r), since vut ∈ S. Thus ∼ is a transitive relation.
We now let the symbol a/s denote the ∼-equivalence class containing the
ordered pair (a, s). We call these equivalence classes fractions.
Next we show that these equivalence classes enjoy a ring structure. First
observe that if there is an element u in S such that u(as′ − sa′) = 0 – i.e.
(a, s) ∼ (a′, s′) – then u(abs′t − sta′b) = 0, so (ab, st) ∼ (a′b, s′t). Thus we can
unambiguously define the product of two equivalence classes – or ‘fractions’
– by setting (a/s) · (b/t) = ab/st.
Similarly, if (a, s) ∼ (a′, s′), so that for some u ∈ S, u(as′ − sa′) = 0, we
see that

    u(at + sb)s′t − u(a′t + s′b)st = 0,

so

    (at + bs)/st = (a′t + s′b)/s′t.

Thus ‘addition’, defined by setting

    a/s + b/t := (at + bs)/st,

is well defined, since this is also (a′t + s′b)/s′t.
The set of all ∼-classes on R × S is denoted RS . Now that multiplication
and addition are defined on RS , we need only verify that
1. (RS , +) is an abelian group with identity element 0/s – that is, the
∼-class containing (0, s) for any s ∈ S.
2. (RS , ·) is a commutative monoid with multiplicative identity s/s, for
any s ∈ S.
3. Multiplication is distributive with respect to addition in RS .
All of these are left as Exercise 1. The conclusion, then, is that

    (R_S, +, ·) is a ring.

This ring R_S is called the localization of R by S.
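These operations are easy to experiment with. Below is a minimal sketch for R = Z and S the powers of 3; since Z is a domain, the relation (a, s) ∼ (b, t) reduces to at = bs (the helper names equiv, add and mul are ours):

    # Elements of R_S are classes of pairs (a, s), with s a power of 3.
    def equiv(p, q):
        (a, s), (b, t) = p, q
        return a * t == b * s           # u(at - bs) = 0 with u = 1 suffices in a domain

    def add(p, q):
        (a, s), (b, t) = p, q
        return (a * t + b * s, s * t)   # a/s + b/t = (at + bs)/st

    def mul(p, q):
        (a, s), (b, t) = p, q
        return (a * b, s * t)           # (a/s)(b/t) = ab/st

    print(equiv(add((1, 3), (2, 9)), (5, 9)))   # True: 1/3 + 2/9 = 5/9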
9.2.2 Special features of localization in domains.
For each element s in the multiplicatively closed set S, there is a homomorphism of additive groups
ψs : (R, +) → (RS , +)
given by ψs (r) = r/s. This need not be an injective morphism. If a/s = b/s,
it means that there is an element u ∈ S such that us(a − b) = 0. Now if the
right annihilator of us is nontrivial, such an a − b exists with a 6= b, and ψs
is not one-to-one.
Conversely, we can say, however,
Lemma 172 If s is an element of S such that no element of sS is a “zero
divisor” – an element with a non-trivial annihilator – then ψs : R → RS is
an embedding of additive groups. The converse is also true. In particular,
we see that in an integral domain, where every non-zero element has only a
trivial annihilator, the map ψs is injective for each s ∈ S.
Lemma 173 Suppose S is a multiplicatively closed subset of non-zero elements of the commutative ring R and suppose no element of S is a zero divisor in R.
1. Suppose ac ≠ 0 and b, d are elements of S. Then the product (a/b)(c/d)
is non-zero in R_S.
2. If a is not a zero divisor in R, then for any b ∈ S, a/b is not a zero
divisor in R_S.
3. If R is an integral domain, then so is R_S.
4. If R is an integral domain, the mapping ψ_1 : R → R_S is an injective
homomorphism of rings (that is, an embedding of rings).
proof: Part 1. Suppose ac ≠ 0, but that for some {b, d, s} ⊆ S,

    (a/b)(c/d) = 0/s.    (9.1)

Then

    (ac, bd) ∼ (0, s).

Then by the definition of the relation “∼”, there is an element u ∈ S such
that u(acs − bd · 0) = 0, which means uacs = 0. But since u and s are elements
of S, they are not zero divisors, and so we obtain ac = 0, a contradiction.
Thus the assumption ac ≠ 0 forces equation (9.1) to be false. This proves
Part 1.
Parts 2. and 3. follow immediately from Part 1.
Part 4. Assume R is an integral domain. From the second statement of
Lemma 172, the mapping ψs : R → RS which takes element r to element
r/s is an injective homomorphism of additive groups for each s ∈ S. Now
put s = 1. We see that ψ1 (ab) = ψ1 (a)ψ1 (b) for all a, b ∈ R. Thus ψ1 is an
injective ring homomorphism.
9.2.3 Localization at a prime ideal.
Now let P be a prime ideal of the commutative ring R – that means P has
this property: If, for two ideals A and B of R, one has AB ⊆ P, then at
least one of A and B lies in P. For commutative rings, this is equivalent to
asserting either of the following (see the exercise at the end of Chapter 7):
(1) For elements a and b in R, ab ∈ P implies a ∈ P or b ∈ P .
(2) The set R − P is a multiplicatively closed set.
It should be clear that a prime ideal contains the annihilator of every element
outside it.
Now let P be a prime ideal of the integral domain D, and set S := D − P ,
which, as noted is multiplicatively closed. Then the localization of D by S is
called the localization of D at the prime ideal P . In the literature the
prepositions are important: The localization by S = D − P is the localization
of D at the prime ideal P .
Now we may form the ideal M := {p/s | (p, s) ∈ P × S} in D_S, for which
each element of D_S − M, being of the form s′/s, (s, s′) ∈ S × S, is a unit of
D_S. This forces M to be a maximal ideal of D_S, and in fact, every proper
ideal B of D_S must lie in it. Thus we see
DS has a unique maximal ideal M.
(9.2)
Any ring having a unique maximal ideal is called a local ring.
This discussion together with Part 4. of Lemma 173 yields
Theorem 174 Suppose P is a prime ideal of the integral domain D. Then
the localization at P (that is, the ring DS where S = D − P ) is a local ring
that is also an integral domain.
Example 1. The zero ideal {0} of any integral domain is a prime ideal.
Forming the localization at the zero ideal of an integral domain D thus
produces an integral domain with the zero ideal as the unique maximal ideal
and every non-zero element a unit. This localized domain is called the field
of fractions of the domain D.²
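For D = Z this construction simply reproduces the rational numbers, as a quick check with Python's built-in Fraction type (here standing in for the fractions a/s with s ∈ Z − {0}) illustrates:

    from fractions import Fraction

    # In the field of fractions every non-zero element is a unit.
    x = Fraction(3, 7)
    print(x * Fraction(7, 3) == 1)   # True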
9.3 Unique Factorization.
Let D be any integral domain. Recall that a non-zero non-unit of D is said to
be irreducible if it cannot be written as the product of two other non-units.
This means that if r = ab is irreducible, then either a or b is a unit.
² One finds “field of quotients” or even “quotient field” used in place of “field of fractions” here and there. Such usage is discouraged because we already have the term “quotient ring” commonly being used to mean “factor ring”.
A similar, but distinct notion is that of a prime element. A non-zero
non-unit r in D is said to be a prime if and only if, whenever r divides a
product ab, then either r divides a or r divides b.
In the case of the familiar ring of integers, the two notions coincide, and
indeed they are forced to coincide in a fairly large collection of domains.
An integral domain D is said to be a unique factorization domain (or
UFD) if and only if
(UFD1) Every non-zero non-unit is a product of a unit and a finite number
of irreducible elements.
(UFD2) Every irreducible element is prime.
We shall eventually show that a large class of domains – the principal
ideal domains – are UFD’s.
One of the very first observations to be made from these definitions is that
the notion of being prime is stronger than the notion of being irreducible.
Thus:
Lemma 175 In any integral domain, any prime element is irreducible.
proof: Suppose r is a prime element of the integral domain D. Suppose
r = ab. Then r divides ab, and so, as r is prime, r divides one of the
two factors a or b – say a = vr. Then r = ab = vrb = r(vb), so, by the
Cancellation Law, 1 = vb, whence b is a unit. Thus r is irreducible.
Lemma 176 The following are equivalent:
1. D is a UFD.
2. In D every non-zero non-unit is a product of a unit and a finite number
of primes.
proof: 1. ⇒ 2. is obvious since 1. implies each irreducible element is a
prime.
2. ⇒ 1. First we show that every irreducible element is prime. Let r
be irreducible. Then by 2., r is a product of a unit and a finite number of
primes. But since r is irreducible, r = up where u is a unit and p is a prime;
being an associate of the prime p, the element r is itself prime.
Now every non-zero non-unit is a product of a unit and finitely many
irreducible elements, since each prime is irreducible by Lemma 175. Thus
the two defining properties of a UFD follow from 2. and the proof is complete.
Theorem 177 (Unique Factorization Theorem) Let D be an integral domain
which is a UFD. Then every non-zero non-unit can be written as a product
of a unit and a set of primes. Such an expression is unique up to the unit,
the order of the factors, and the replacement of a prime by an associate –
that is, a multiple of that prime by a unit.
proof: Let r be a non-zero non-unit of the UFD D. Then by Lemma 176,
r can be written as a unit times a product of finitely many primes. Suppose
this could be done in more than one way, say,
r = up1 p2 · · · ps = vq1 q2 · · · qt
where u and v are units and the pi and qi are primes. If s = 0 or t = 0,
then r is a unit, contrary to the choice of r. So without loss we may assume
0 < s ≤ t. Now as p1 is a prime, it divides one of the right hand factors,
and this factor cannot be v. Rearranging the indexing if necessary, we may
assume p1 divides q1 . Since q1 is irreducible, p1 and q1 are associates, so
q1 = u1 p1 . Then
r = up1 · · · ps = (vu1 )p1 q2 · · · qt
so
up2 · · · ps = (vu1 )q2 · · · qt .
By induction on t, we have s = t and qi = ui pπ(i) for some unit ui , i = 2, . . . t
and permutation π of these indices. Thus the two factorizations involve the
same number of primes with the primes in one of the factorizations being
associates of the primes in the other written in some possibly different order.
We also have a local characterization of UFD’s.
Theorem 178 (Local Characterization of UFD’s) Let D be an integral domain. Let S be the collection of all elements of D which can be written as
a product of a unit and a finite number of prime elements. Then S is multiplicatively closed and does not contain zero; so the localization DS can be
formed.
D is a UFD if and only if DS is a field.
proof: (⇒) If D is a UFD, then, by Lemma 176, S comprises all non-zero non-units of D. One also notes that D_S = D_{S′} where S′ is S with the
set U(D) of all units of D adjoined. (By Exercise 2 of this chapter, this
holds for all domains.) Thus S′ = D − {0}. Then D_{S′} is the field of fractions
of D.
(⇐) It suffices to prove that S is all non-zero non-units of D (Lemma 176).
In fact, as S ⊆ D∗ − U(D), it suffices to show that D∗ − U(D) ⊆ S. Let r
be any non-zero non-unit – i.e. an element of D∗ − U(D) – and suppose by
way of contradiction that r ∉ S.
Since r is a non-unit, Dr is a proper ideal of D. Then

    (Dr)_S := {a/s | a ∈ Dr, s ∈ S}

is clearly a non-zero ideal of D_S. Since D_S is a field, it must be all of D_S. But
this means that 1/1 is an element of (Dr)_S – i.e. one can find (b, s) ∈ D × S
such that

    br/s = 1/1.

This means br = s, so

    Dr ∩ S ≠ ∅.
Now from the definition of S, we see that there must be multiples of r which
can be expressed as a unit times a non-zero number of primes. Let us
choose the multiple so that the number of primes that can appear in such
a factorization attains a minimum m. Thus there exists an element r′ of D
such that

    rr′ = up_1 · · · p_m

where u is a unit and the p_i are primes. We may suppose the indexing of the
primes to be such that p_1, . . . , p_d do not divide r′, while p_{d+1}, . . . , p_m do divide
r′. Then if d < m, p_{d+1} divides r′ so r′ = bp_{d+1}. Then we have

    rbp_{d+1} = up_1 · · · p_dp_{d+1} · · · p_m

so

    rb = up_1 · · · p_dp_{d+2} · · · p_m (m − 1 prime factors)

against the minimal choice of m. Thus d = m, and no p_i divides r′.
Then each p_i divides r. Thus r = a_1p_1, so a_1r′ = up_2 · · · p_m, upon cancelling
p_1. But again, as p_2 does not divide r′, p_2 divides a_1. Thus a_1 = a_2p_2,
so a_2r′ = up_3 · · · p_m. As no p_i divides r′, this argument can be
repeated, until finally one obtains a_mr′ = u when the primes run out. But
then r′ is a unit. Then

    r = ((r′)^{−1}u)p_1 · · · p_m ∈ S.

Thus, as r was arbitrarily chosen in D∗ − U(D), we have D∗ − U(D) ⊆ S and we are done.
9.3.1 A Special Case.
Of course we are accustomed to the domain of the integers, which is a UFD.
So it might be instructive to look at a more pathological case.
Consider the ring D = {a + b√−5 | a, b ∈ Z}. Being a subring of the field
of complex numbers C, it is an integral domain.
The mapping z → z̄, where z̄ is the complex conjugate of z, is an automorphism of C which leaves D invariant (as a set), and so induces an automorphism of D. We define the norm of a complex number ζ to be N(ζ) := ζζ̄.
Then N : C → R⁺, the positive real numbers, and N(ζψ) = N(ζ)N(ψ) for
all ζ, ψ ∈ C. Clearly, N(a + b√−5) = a² + 5b², so the restriction of N to D
has non-negative integer values. (Moreover, we obtain for free the fact that
the integers of the form a² + 5b² are closed under multiplication.)
Let us determine the units of D. Clearly if ν ∈ D is a unit, then there is
a µ in D so that νµ = 1 = 1 + 0·√−5, the identity of D. Then

    N(ν)N(µ) = N(νµ) = N(1) = 1² + 5 · 0² = 1.

But how can 1 be expressed as the product of two non-negative integers?
Obviously, only if N(ν) = 1 = N(µ). But if a² + 5b² = 1, it is clear that
b = 0 and a = ±1. Thus

    U(D) = {±1} = {d ∈ D | N(d) = 1}.
We can also use norms to locate irreducible elements of D. For example, if
ζ is an element of D of norm 9, then ζ is irreducible. For otherwise, one would
have ζ = ψη where ψ and η are non-units. But that means N(ψ) and N(η)
are both integers larger than one. Yet 9 = N(ψ)N(η), so N(ψ) = N(η) = 3,
which is impossible since 3 is not of the form a² + 5b².
But now note that

    9 = 3 · 3 = (2 + √−5)(2 − √−5)

are two factorizations of 9 into irreducible factors (irreducible, because
they have norm 9) and, as U(D) = {±1}, 3 is not an associate of either
factor on the right hand side.
Thus unique factorization fails in the domain D = Z ⊕ Z√−5. The reason
is that 3 is an irreducible element which is not a prime. It is not a prime
because 3 divides (2 + √−5)(2 − √−5), but does not divide either factor.
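The arithmetic behind this example is short enough to verify numerically. In the sketch below an element a + b√−5 is stored as the pair (a, b); the helper names norm and mul are ours:

    def norm(z):                          # N(a + b*sqrt(-5)) = a^2 + 5b^2
        a, b = z
        return a * a + 5 * b * b

    def mul(z, w):                        # multiplication in Z + Z*sqrt(-5)
        (a, b), (c, d) = z, w
        return (a * c - 5 * b * d, a * d + b * c)

    print(mul((2, 1), (2, -1)))           # (9, 0): (2 + sqrt(-5))(2 - sqrt(-5)) = 9
    print(norm((3, 0)), norm((2, 1)))     # 9 9: all four factors have norm 9
    # No element of norm 3 exists, so every factor of norm 9 is irreducible:
    print([(a, b) for a in range(-2, 3) for b in range(-1, 2) if norm((a, b)) == 3])   # []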
9.4 Euclidean Domains
An integral domain D is said to be a Euclidean Domain if and only if there
is a function g : D∗ → N into the non-negative integers³ satisfying
(ED1) For a, b ∈ D∗ := D − {0}, g(ab) ≥ g(a).
(ED2) If (a, b) ∈ D × D∗ , then there exist elements q and r such that
a = bq + r, where either r = 0 or else g(r) < g(b).
The notation in (ED2) is intentionally suggestive: q stands for “quotient”
and r stands for “remainder”.
Recall that in a ring R, an ideal I of the form xR is called a principal
ideal. An integral domain in which every ideal is a principal ideal is
called a principal ideal domain or PID. We have
Theorem 179 Every Euclidean domain is a principal ideal domain.
proof: Let D be a Euclidean domain with grading function g : D∗ → N.
Let J be any ideal in D. Among the non-zero elements of J we can find an
element x with g(x) minimal. Let a be any other element of J. Then a = qx + r as
in (ED2) where r = 0 or g(r) < g(x). Since r = a − qx ∈ J, minimality of
g(x) shows that the second alternative cannot occur, so r = 0 and a = qx.
Thus J ⊆ xD. But x ∈ J, an ideal, already implies xD ⊆ J so J = xD is a
principal ideal. Thus D is a PID.
We have mentioned in the introduction to this chapter that the divisibility
information about any integral domain is encoded in the divisibility poset
Div(D) of the divisibility relation ‘|’ among the cosets of the group of units
U (D) in the multiplicative monoid of D.
³ The term “natural numbers” includes zero.
In fact, Lemma 169 of the introduction showed that there is an isomorphism between the divisibility poset Div(D) and the poset P(D)∗ of all
principal ideals of D under reverse containment. Since, in any integral domain, the collection of all non-zero ideals is closed under intersections, the
non-zero ideals of a principal ideal domain form a lattice.
Thus in a principal ideal domain D the poset Div(D) becomes a lattice
– that is, it has “meets” and “joins”. Following the development in Section
8.1, we conclude
Theorem 180 In a principal ideal domain D:
1. Any two non-zero elements a and b of D possess a greatest common
divisor d which is a generator of the ideal aD + bD.
2. Any two non-zero elements a and b possess a least common multiple m
which is a generator of the ideal aD ∩ bD.
The interesting feature of a Euclidean domain is not only the algebraic
property that it is a PID and hence has greatest common divisors, but the
interesting computational feature that the greatest common divisors can
actually be computed! Behold the
Euclidean algorithm. Given a and b in D and D∗, respectively, by (ED2)
we have
a = bq1 + r1 ,
and if r1 6= 0,
b = r1 q2 + r2 ,
and if r2 6= 0,
r 1 = r 2 q3 + r 3 ,
etc., until we finally obtain a remainder rk = 0 in
rk−2 = rk−1 qk + rk .
Such a termination of this process is inevitable for g(r1 ), g(r2 ), . . . is a strictly
decreasing sequence of non-negative integers which can only terminate at
g(rk−1 ) when rk = 0.
We claim the number r_{k−1} (the last non-zero remainder) is a greatest
common divisor of a and b. First r_{k−2} = r_{k−1}q_k and also r_{k−3} = r_{k−2}q_{k−1} + r_{k−1}
are multiples of r_{k−1}. Inductively, if r_{k−1} divides both r_j and r_{j+1}, it divides
r_{j−1}. Thus all of the right hand sides of the equations above are multiples of
r_{k−1}; in particular, a and b are multiples of r_{k−1}. On the other hand if d′ is
a common divisor of a and b, d′ divides r_1 = a − bq_1 and in general divides
r_j = r_{j−2} − r_{j−1}q_j and so d′ divides r_{k−1}, eventually. Thus r_{k−1} is a greatest
common divisor.
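Here is the algorithm as a minimal Python sketch for the Euclidean domain Z, with g the absolute value (the helper name euclid_gcd is ours):

    def euclid_gcd(a, b):
        # Repeated division with remainder: a = bq + r with g(r) < g(b).
        while b != 0:
            a, b = b, a % b
        return a                  # the last non-zero remainder

    print(euclid_gcd(252, 198))   # 18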
Of course it follows from the poset isomorphism between Div(D) and the
poset of principal ideals under reverse containment that any greatest common
divisor d of a and b generates the ideal aD + bD. However an
explicit expression of r_{k−1} as a D-linear combination of a and b can be obtained
from the same sequence of equations. For

    d = r_{k−1} = r_{k−3} − r_{k−2}q_{k−1},

and each r_j similarly is a D-linear combination of r_{j−2} and r_{j−1}, j = 1, 2, . . ..
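This back-substitution can likewise be sketched for D = Z (the helper name extended_euclid is ours); it returns d together with coefficients expressing d as a D-linear combination of a and b:

    def extended_euclid(a, b):
        if b == 0:
            return a, 1, 0                       # base case: gcd(a, 0) = a = 1*a + 0*0
        d, x, y = extended_euclid(b, a % b)      # d = x*b + y*(a - (a//b)*b)
        return d, y, x - (a // b) * y

    d, x, y = extended_euclid(252, 198)
    print(d, x, y, 252 * x + 198 * y)            # 18 4 -5 18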
9.4.1 Examples of Euclidean Domains
1. The Gaussian integers and Eisenstein Integers.
These are respectively the domains D_1 := Z ⊕ Zi (where i² = −1)
and D_2 = Z ⊕ Zω (where ω = e^{2iπ/3}). The complex number ω is a root of
the irreducible polynomial x² + x + 1, so it is an algebraic integer which is a
primitive cube root of unity.
Both of these domains are subrings of C, the complex number field, and
each is invariant under complex conjugation. Thus both of them admit a
multiplicative norm function N : C → R where N (z) := z · z̄ records the
square of the Euclidean distance of z from zero in the complex plane.
We shall demonstrate that these rings are Euclidean domains with the
norm function in the role of g. That its values are taken in the non-negative
integers follows from the formulae
    N(a + bi) = a² + b²
    N(a + bω) = a² − ab + b².
As already remarked, the norm function is multiplicative so (ED1) holds.
To demonstrate (ED2) we consider two elements a and b of D with b ≠ 0.
Now the elements of D_1 form a square tessellation of the complex plane with
{0, 1, i, i + 1} as a fundamental square. Similarly, the elements of D_2 form the
equilateral triangle tessellation of the complex plane with {0, 1, ω^{1/2}} at the
corners of a fundamental triangle. We call the points where three or more
tiles of the tessellation meet lattice points.
When we superimpose the ideal bD on either one of these tessellations
of the plane, we are depicting a tessellation of the same type on a subset
of the lattice points with a possibly larger fundamental tile. Thus for the
Gaussian integers, bD is a tessellation whose fundamental square is defined
by the four “corner” points {0, b, ib, (1 + i)b}. The resulting lattice points are
a subset of the Gaussian integers closed under addition (vector addition in
the geometric picture). Similarly for the Eisenstein numbers, a fundamental
triangle of the tessellation defined by the ideal bD is the one whose “corners”
are in {0, b, bω 1/2 }. In either case the tiles of the tessellation bD cover the
plane, and so the element a must fall in some tile T – either a square or an
equilateral triangle – of the superimposed tessellation bD. Now let qb be a
corner point of T which is nearest a. Since T is a square or an equilateral
triangle, we have
(Nearest Corner Principle) the distance from a point of T to its nearest
corner is less than the length of a side of T.
Thus the distance |a − qb| = √N(a − qb) is less than |b| = √N(b). Using
the fact that the square function is monotone increasing on non-negative real
numbers, we have

    g(a − bq) = N(a − bq) < N(b) = g(b).

Thus setting r = a − qb, we have r = 0 or g(r) < g(b). Thus (with respect
to the function g), r serves to realize the condition (ED2).⁴
⁴ Usually boxing matches are held (oxymoronically) in square “boxing rings”. Even if
(as well) they were held in equilateral triangles, it is a fact that when the referee commands
a boxer to “go to a neutral corner”, the fighter really does not have far to go (even subject
to the requirement of “neutrality” of the corner). But in some ethnological fantasia, one
could well imagine a culture with such an intense emphasis on “gracious defeats” that
their boxing rings approached a more ring-like N-gon, for larger N. Woe the fighter who
achieves a knock-down near the center, for after the mandatory hike to a neutral corner,
there is a good probability that the fighter can only return to the boxing match totally
exhausted from the long trip. No Euclidean algorithm for these boxers.
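For the Gaussian integers the nearest-corner choice of q can be sketched with Python's built-in complex numbers (the helper names gauss_divmod and g are ours):

    def gauss_divmod(a, b):
        z = a / b                                   # exact complex quotient
        q = complex(round(z.real), round(z.imag))   # nearest corner of the tile
        return q, a - b * q

    def g(z):                                       # the norm N(z)
        return z.real ** 2 + z.imag ** 2

    a, b = complex(27, 23), complex(8, 1)
    q, r = gauss_divmod(a, b)
    print(q, r, g(r) < g(b))                        # (4+2j) (-3+3j) True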
2. Polynomial domains over a field. Let F be a field and let F [x] be
the ring of polynomials in indeterminate x and with coefficients from F . We
let g be the degree function, F [x] → N. Then if f (x) and g(x) are non-zero
polynomials,
deg(f g) = deg f + deg g,
so (ED1) holds. The condition (ED2) is the familiar long division algorithm
of College Algebra and grade school.
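That long division step can be sketched for F = Q, with a polynomial written as its list of coefficients, lowest degree first (the helper name poly_divmod is ours):

    from fractions import Fraction

    def poly_divmod(f, g):
        # Returns (q, r) with f = q*g + r and deg r < deg g; one divides by
        # the lead coefficient of g at each step, possible because F is a field.
        f = [Fraction(c) for c in f]
        g = [Fraction(c) for c in g]
        q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
        while len(f) >= len(g) and any(f):
            shift = len(f) - len(g)
            c = f[-1] / g[-1]
            q[shift] = c
            f = [fc - c * gc for fc, gc in zip(f, [0] * shift + g)]
            while f and f[-1] == 0:
                f.pop()
        return q, f

    # x^3 - 1 = (x^2 + x + 1)(x - 1) + 0:
    print(poly_divmod([-1, 0, 0, 1], [-1, 1]))   # quotient x^2 + x + 1, remainder 0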
9.5 Principal Ideal Domains are Unique Factorization Domains
The following theorem utilizes the fact that if an integral domain possesses
the ascending chain condition on principal ideals, then every non-zero non-unit is a
product of a unit and finitely many irreducible elements (Lemma 171).
Theorem 181 Any PID is a UFD.
proof: Assume D is a PID. Then D has the ascending chain condition on
all ideals, and so by Lemma 171, the condition (UFD1) holds.
It remains to show (UFD2), that every irreducible element is prime. Let
x be irreducible. Then Dx is a maximal ideal (Lemma 170). Now if x were
not a prime, there would exist elements a and b such that x divides ab but x
does not divide either a or b. Thus a ∉ xD, b ∉ xD, yet ab ∈ xD. Thus xD
is not a prime ideal, contrary to the conclusion of the previous paragraph
that xD is a maximal ideal, and hence a prime ideal.
9.6 If D is a UFD, then so is D[x].
Let D be a unique factorization domain (UFD), set S = D∗ = D − {0},
and let F := D_S be the field of fractions of D. Then D[x], the domain of
polynomials with coefficients from D, can be regarded as a subring of F[x]
by the device of regarding coefficients d of D as fractions d/1. We wish to
compare the factorization of elements of D[x] in D[x] and in F[x].
We begin with very elementary results.
Lemma 182 U (D[x]) = U (D).
proof: Since D is an integral domain, degrees add in the products of non-zero polynomials, and so the units of D[x] must have degree zero. This shows
that units have degree zero and so lie in D. That a unit of D is a unit of D[x]
depends only upon the fact that D is a subring of D[x].
Lemma 183 Let p be an irreducible element of D, with D regarded as a subring of
D[x]. If p divides the polynomial q(x) in D[x], then every coefficient of q(x)
is divisible by p.
proof: If p divides q(x), then q(x) = pf (x) for some f (x) ∈ D[x]. Then the
coefficients of q(x) are those of f (x) multiplied by p.
Lemma 184 (Irreducible elements of D are prime elements of D[x].)
Let p be an irreducible element of D. If p divides a product p(x)q(x) of two
elements of D[x], then either p divides p(x) or p divides q(x).
proof: Suppose the irreducible element p divides p(x)q(x) as hypothesized.
Write

    p(x) = Σ_{i=0}^{m} a_ix^i, and q(x) = Σ_{j=0}^{n} b_jx^j,

where a_i, b_j ∈ D. Suppose by way of contradiction that p does not divide
either p(x) or q(x). Then by the previous Lemma 183, p divides each coefficient of p(x)q(x), while there is a first coefficient a_r of p(x) not divisible by
p, and a first coefficient b_s of q(x) not divisible by p. Then the coefficient of
x^{r+s} in p(x)q(x) is

    c_{r+s} = Σ_{i+j=r+s} a_ib_j, subject to 0 ≤ i ≤ m and 0 ≤ j ≤ n.    (9.3)

Now if (i, j) ≠ (r, s) and i + j = r + s, then either i < r or j < s, and
in either case a_ib_j is divisible by p. Thus all summands a_ib_j on the right side
of equation (9.3) except for a_rb_s are divisible by p. But this would mean
that p does not divide c_{r+s}, against Lemma 183 and the fact that p divides
p(x)q(x).
Thus p must divide one of p(x) or q(x), completing the proof.
We say that a polynomial p(x) ∈ D[x] is primitive if and only if, whenever d ∈ D divides p(x), then d is a unit.
We now approach two lemmas that involve F [x] where F is the field of
fractions of D. First,
Lemma 185 If p(x) = αq(x), where p(x) and q(x) are primitive polynomials
in D[x], and α ∈ F , then α is a unit in D.
proof: Since D is a UFD, greatest common divisors and least common
multiples exist, and so there is a “lowest terms” representation of α as a
fraction t/s where the only common divisors of s and t are the units.
If s is a unit, then t divides each coefficient of p(x), and since p(x) is primitive,
t must also be a unit. Otherwise, there is a prime divisor p of s. Then p does
not divide t, and so from sp(x) = tq(x), we see that every coefficient of q(x)
is divisible by p, against q(x) being primitive.
Lemma 186 If f (x) ∈ F [x], then f (x) has a factorization
f (x) = rp(x)
where r ∈ F and p(x) is a primitive polynomial in D[x]. This factorization
is unique up to replacement of each factor by an associate.
proof: We prove this in two steps. First we establish
Step 1. There exists a scalar γ such that f (x) = γp(x), where p(x) is primitive in D[x].
Each non-zero coefficient of x^i in f(x) has the form a_i/b_i with b_i ∈ S.
Multiplying through by a least common multiple m of the b_i (recall that lcm's
exist in UFD's), we obtain a factorization

    f(x) = (1/m) Σ_i a_i(m/b_i)x^i,

whose second factor is clearly primitive in D[x].
Step 2. The factorization in Step 1 is unique up to associates.
proof: Suppose f (x) = γ1 p1 (x) = γ2 p2 (x) with γi ∈ F , and pi (x) primitive
in D[x]. Then p1 (x) and p2 (x) are associates in F [x] so
p1 (x) = γp2 (x), for γ ∈ F.
Then by Lemma 185, γ is a unit in D. But as γ = γ_2γ_1^{−1}, the result follows.
Lemma 187 (Gauss’ Lemma) Suppose p(x) ∈ D[x] is reducible in the ring
F [x]. Then p(x) is reducible in D[x].
proof: By way of contradiction, we may assume that p(x) is irreducible in
D[x]. In particular, by Lemma 183 and Lemma 184, we may assume that
p(x) is primitive.
Suppose p(x) = f (x)g(x) where f (x) and g(x) are polynomials in F [x].
Then by Step 1, we may write
f (x) = µf1 (x)
g(x) = γf2 (x).
where µ, γ ∈ F and f1 (x) and g1 (x) are primitive polynomials in D[x]. Now
by Lemma 184, f1 (x)g1 (x) is a primitive polynomial in D[x] so
p(x) = (µγ)(f1 (x)g1 (x),
and Lemma 185 shows that µγ is a unit in D. Then
p(x) = ((µγ)f1 (x)) · g1 (x)
is a factorization of p(x) in D[x], with factors of positive degree. The conclusion thus follows.
Theorem 188 If D is a UFD, then so is D[x].
proof: Let p(x) be any element of D[x]. We must show that p(x) has a
factorization into irreducible elements which is unique up to the replacement of
factors by associates. Since D is a unique factorization domain, a consideration of degrees shows that it is sufficient to do this for the case that p(x) is
primitive of positive degree.
Let S = D−{0} and form the field of fractions F = DS , regarding D[x] as
a subring of F [x] in the usual way. Now F [x] is a Euclidean domain, hence a
PID, and hence a unique factorization domain. Thus we have a factorization
p(x) = p1 (x)p2 (x) · · · pn (x)
where each pi (x) is irreducible in F [x]. Then by Lemma 186, there exist
scalars γi , i = 1, . . . , n, such that pi (x) = γi qi (x), where qi (x) is a primitive
polynomial in D[x]. Then
p(x) = (γ1 γ2 · · · γn )q1 (x) · · · qn (x).
Since the qi (x) are primitive, so is their product (Lemma 184). Then by
Lemma 185 the product of the γi is a unit u of D. Thus
p(x) = uq1 (x) · · · qn (x)
is a factorization in D[x] into irreducible elements.
If
p(x) = vr1 (x) · · · rm (x)
were another such factorization, the fact that F[x] is a UFD shows that
m = n and the indexing can be chosen so that r_i(x) is an associate of q_i(x)
in F[x] – i.e. there exist scalars ρ_i in F such that r_i(x) = ρ_iq_i(x), for all i.
Then as r_i(x) and q_i(x) are irreducible in D[x], they are primitive, and so
by Lemma 185, each ρ_i is a unit in D. Thus the two factorizations of p(x)
are alike up to replacement of the factors by associates in D[x]. The proof is
complete.
Corollary 189 If D is a UFD, then so is the ring
D[x1 , . . . , xn ].
proof: Repeated application of the theorem to

    D[x_1, . . . , x_{j+1}] ≅ (D[x_1, . . . , x_j])[x_{j+1}].
EXERCISES 9.1
The first few exercises concern localizations.
Exercise 1. Using the definitions of multiplication and addition for fractions, fill in all the steps in the proof of the following:
1. that (D_S, +) is an abelian group,
2. that the fractions under multiplication form a commutative monoid, and
3. that multiplication is distributive with respect to addition.
Exercise 2. Let D be an integral domain with group of units U(D). Show
that if S is a multiplicatively closed subset and S′ is either U(D)S or U(D) ∪
U(D)S, then the localizations D_S and D_{S′} are isomorphic.
Exercise 3. Below are given examples of a domain D and a multiplicatively
closed subset S of non-zero elements of D. Describe as best you can the
ring D_S.
1. D = Z; S is all powers of 3. Is DS a local ring?
2. D = Z; S consists of all numbers of the form 3^a5^b where a and b are
natural numbers and a + b > 0. Is the condition a + b > 0 necessary?
3. D = Z; S is all positive integers which are the sum of two squares.
4. D = Z[x]; S is all primitive polynomials of D.
Let D be a fixed unique factorization domain (UFD), and let p be a prime
element in D. Then the principal ideal (p) := pD is a prime ideal and
D/(p) is an integral domain. The next few exercises concern the following surjection
between integral domains:
mp : D[x] → (D/(p))[x],
which reduces mod p the coefficients of any polynomial of D[x].
Exercise 4. Show that mp is a ring homomorphism. Conclude that the
principal ideal in D[x] generated by p is a prime ideal and from this, that p
is a prime element in the ring D[x].
Exercise 5. Show that every maximal ideal M of Z[x] is generated by two
elements p and f (x), where p is a rational prime and f := f (x) is a polynomial
in Z[x] for which mp (f ) is irreducible in (Z/(p))[x]. [Hint: First show that if
M is a maximal ideal, then the field Z[x]/M cannot have characteristic zero.
This forces the presence of a unique rational prime p in M . Then use the
fact that (Z/(p))[x] is a Euclidean domain.]
Exercise 6. (Another proof of Gauss' Lemma.) Let D be a UFD. A polynomial h(x) in D[x] is said to be primitive if and only if there is no prime
element p of D dividing each of the D-coefficients of h(x). Using the morphism m_p of the first exercise, show that if f and g are primitive polynomials
in D[x], then the product fg is also primitive. [Hint: by assumption, there
is no prime in D dividing f or g in D[x]. If a prime p of D divides fg in
D[x] then m_p(fg) = 0 = m_p(f) · m_p(g). Then use the fact that (D/(p))[x] is
an integral domain.]
Exercise 7. (Eisenstein’s Criterion) Let D be a UFD. Suppose p is a prime
in D and
    f(x) = a_nx^n + · · · + a_1x + a_0
is a polynomial in D[x] with n > 1 and an 6= 0. Further, assume that
1. p does not divide the lead coefficient an ;
2. p divides each remaining coefficient ai for i = 0, . . . , n − 1; and
3. p2 does not divide a0 .
Show that f cannot factor in D[x] into polynomial factors of positive degree.
In particular f is irreducible in F [x], where F is the field of fractions of D.
[Hint: Suppose by way of contradiction that there was a factorization f = gh
in D[x] with g and h of degree at least one. Then apply the ring morphism
m_p to get

    m_p(f) = m_p(g) · m_p(h) = m_p(a_n)x^n ≠ 0.

Now by hypothesis 1) the polynomials f and m_p(f) have the same degree,
while the degrees of g and h are at least as large as the respective degrees
of their m_p images. Since f = gh, and deg m_p(f) = deg m_p(g) + deg m_p(h),
we must have the last two summands of positive degree. Since (D/(p))[x]
is a UFD, x must divide both m_p(g) and m_p(h). This means the constant
coefficients of g and h are both divisible by p. This contradicts hypothesis 3.]
Exercise 8. Show that under the hypotheses of the previous exercise, one
cannot conclude that f is an irreducible element of D[x]. [Hint: In Z[x],
the polynomial 2x + 6 satisfies the Eisenstein hypotheses with respect to the
prime 3. Yet it has a proper factorization in Z[x].]
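Exercises 7 and 8 are easy to experiment with together. Below is a small checker for the three Eisenstein hypotheses over D = Z, with a polynomial given by its coefficient list [a0, . . . , an] (the helper name eisenstein is ours):

    def eisenstein(coeffs, p):
        a0, an = coeffs[0], coeffs[-1]
        return (an % p != 0                               # hypothesis 1
                and all(a % p == 0 for a in coeffs[:-1])  # hypothesis 2
                and a0 % (p * p) != 0)                    # hypothesis 3

    print(eisenstein([-2, 0, 1], 2))   # True: x^2 - 2 admits no factors of positive degree
    print(eisenstein([6, 2], 3))       # True, and yet 2x + 6 = 2(x + 3) in Z[x] (Exercise 8)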
Exercise 9. Let p be a prime element of D. Using the morphism m_p show
that the subset

    B := {p(x) = a_0 + a_1x + · · · + a_nx^n | a_i ≡ 0 mod p, for i > 0}

of D[x] is a subdomain.
9.7 Appendix to Chapter 9: The Arithmetic of Quadratic Domains.
9.7.1 Introduction
We understand the term quadratic domain to indicate the ring of algebraic integers of a field K of degree two over the rational number field Q.
The arithmetic properties of interest are those which define Euclidean domains, principal ideal domains (or PID’s) and unique factorization domains
(or UFD’s, also called “factorial domains” in the literature).
Quadratic Fields.
Throughout Q will denote the field of rational numbers. Suppose K is a
subfield of the complex numbers. Then it must contain 1, 2 = 1 + 1, all
integers and all fractions of such integers. Thus K contains Q as a subfield
and so is a vector space over Q. Then K is a quadratic field if and only if
dim_Q(K) = 2.
Suppose now that K is a quadratic field, so as a Q-space,
K = Q ⊕ ωQ, for any chosen ω ∈ K − Q.
Then ω² = −bω − c for rational numbers b and c, so ω is a root of the monic
polynomial

    p(x) = x² + bx + c
in Q[x]. Evidently, p(x) is irreducible, for otherwise, it would factor into
two linear factors, one of which would have ω for a rational root, against our
choice of ω. Thus p(x) has two roots ω and ω̄ given by
    (−b ± √(b² − 4c))/2.
We see that K is a field generated by Q and either ω or ω̄ — a fact which
can be expressed by writing
K = Q(ω) = Q(ω̄).
Evidently

    K = Q(√d), where d = b² − 4c.
We anticipate a few elementary results on fields reviewed in the next chapter,
by observing that substitution of ω for the indeterminate x in each polynomial
of Q[x] produces an onto ring homomorphism,
vω : Q[x] → K,
whose kernel is the principal ideal p(x)Q[x]. But as vω̄ : Q[x] → K has the
same kernel, we see that the Q-linear transformation
σ : Q ⊕ ωQ → Q ⊕ ω̄Q,
which takes each vector a + bω to a + bω̄, a, b ∈ Q, is an automorphism of the
field K. Thus σ(ω) = ω̄, and σ² = 1K (why?).
Associated with the group ⟨σ⟩ = {1K , σ} of automorphisms of K are two
further mappings.
First is the trace mapping,
T r : K → Q,
which takes each element ζ of K to ζ + ζ^σ. Clearly this mapping is Q-linear.
The second mapping is the norm mapping,
N : K → Q,
which takes each element ζ of K to ζζ^σ. This mapping is not Q-linear,
but it follows from its definition (and the fact that multiplication in K is
commutative) that it is multiplicative – that is
N (ζψ) = N (ζ)N (ψ),
for all ζ, ψ in K.
Just to make sure everything is in place, consider the roots ω and ω̄ of the
irreducible polynomial p(x) = x² + bx + c. Since p(x) factors as (x − ω)(x − ω̄)
in K[x], we see that the trace is
Tr(ω) = Tr(ω̄) = −b ∈ Q,
and that the norm is
N (ω) = N (ω̄) = c, also in Q.
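Since the trace and norm are given by such simple closed formulas, they are easy to experiment with. The following is a minimal computational sketch in Python (the class name QuadElt is our own hypothetical label, not notation from the text), verifying the multiplicativity of the norm on a sample pair:

    from fractions import Fraction

    class QuadElt:
        """An element a + b*sqrt(d) of Q(sqrt(d)), with a, b rational."""
        def __init__(self, a, b, d):
            self.a, self.b, self.d = Fraction(a), Fraction(b), d

        def __mul__(self, other):
            # (a + b*sqrt(d))(a' + b'*sqrt(d)) = (aa' + bb'd) + (ab' + a'b)*sqrt(d)
            return QuadElt(self.a*other.a + self.b*other.b*self.d,
                           self.a*other.b + self.b*other.a, self.d)

        def conj(self):          # the automorphism sigma
            return QuadElt(self.a, -self.b, self.d)

        def trace(self):         # zeta + zeta^sigma = 2a
            return self.a + self.conj().a

        def norm(self):          # zeta * zeta^sigma = a^2 - d*b^2
            return self.a*self.a - self.d*self.b*self.b

    z, w = QuadElt(1, 2, 5), QuadElt(3, -1, 5)
    assert (z*w).norm() == z.norm() * w.norm()     # N is multiplicative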
Quadratic Domains
We wish to identify the ring A of algebraic integers of K. These are the
elements ω ∈ K which are roots of at least one monic irreducible polynomial
in Z[x]. At this stage the reader should be able to prove:
Lemma 190 An element ζ ∈ K − Q is an algebraic integer if and only if
its trace T (ζ) and its norm N (ζ) are both rational integers.
Now putting ζ = √d, where d is square-free and K = Q(√d), the lemma
implies:
Lemma 191 If d is a square-free integer, then the ring of algebraic integers
of K = Q(√d) is the Z-module spanned by
{1, √d} if d ≡ 2, 3 mod 4,
{1, (1 + √d)/2} if d ≡ 1 mod 4.
Thus the ring of integers for Q(√7) is the set {a + b√7 | a, b ∈ Z}. But if
d = −3, then (1 + √−3)/2 is a primitive sixth root of unity, and the ring of
integers for Q(√−3) is the domain of Eisenstein numbers.
9.7.2 Which quadratic domains are Euclidean, principal ideal or factorial domains?
Fix a square-free integer d and let A(d) denote the full ring of algebraic
integers of K = Q(√d). A great deal of effort has been devoted to the
following questions:
1. When is A(d) a Euclidean domain?
2. More generally, when is A(d) a principal ideal domain?
In this chapter we showed that the Gaussian integers, A(−1), and the
Eisenstein numbers, A(−3), were Euclidean by showing that the norm function could act as the function g : A(d) → Z with respect to which the
Euclidean algorithm is defined. In the case that the norm can act in this
role, we say that the norm is algorithmic.
The two questions have been completely answered when d is negative.
Proposition 192 The norm is algorithmic for A(d) when d = −1, −2, −3, −7
and −11. For all remaining negative values of d, the ring A(d) is not even
Euclidean. However, A(d) is a principal ideal domain for
d = −1, −2, −3, −7, −11, −19, −43, −67, −163,
and for no further negative values of d.
There are ways of showing that A(d) is a principal ideal domain, even
when the norm function is not algorithmic. For example:
Lemma 193 Suppose d is a negative square-free integer and the (positively
valued) norm function of A(d) has this property:
• If N (x) ≥ N (y) then either y divides x or else there exist elements u
and v (depending on x and y) such that
0 < N (ux − yv) < N (y).
Then A(d) is a principal ideal domain.
This lemma is used, for example, to show that A(−19) is a PID.⁵
For positive d (where Q(√d) is a real field) the status of these questions
is not so clear. We can only report the following:
Proposition 194 Suppose d is a positive square-free integer. Then the norm
function is algorithmic precisely for
d = 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73.
The domain A(d) is known to be a principal ideal domain for many further
values of d. But we don't even know whether this can occur for infinitely many
values of d! Moreover, among these PID's, how many are actually Euclidean
by some function g other than the norm function? Are there infinitely many?
⁵ See an account of this in Pollard's The Theory of Algebraic Numbers, Carus Monograph
no. 9, Math. Assoc. America, page 100, and in Wilson, J. C., A principal ideal domain
that is not Euclidean, Mathematics Magazine, vol. 46 (1973), pp. 34-48.
9.7.3 Subdomains of Euclidean domains which are not UFD's
Here it is shown that standard Euclidean domains such as the Gaussian
integers contain subdomains for which unique factorization fails. These nice
examples are due to P. J. Arpaia ("A note on quadratic Euclidean Domains",
Amer. Math. Monthly, vol. 75 (1968), pp. 864-865).
Let A0 (d) be the subdomain of A(d) consisting of all integral linear combinations
of 1 and √d, where d is a square-free integer. Thus A0 (d) = A(d)
if d ≡ 2, 3 mod 4, and is a proper subdomain of A(d) only if d ≡ 1 mod 4.
Let p be a rational prime. Let A0 /(p) be the ring whose elements belong
to the vector space
(Z/(p))1 ⊕ (Z/(p))√d,
where multiplication is defined by setting (√d)² to be the residue class modulo
p containing d. Precisely, A0 /(p) is the ring Z[x]/I where I is the ideal
generated by the prime p and the polynomial x² − d. It should be clear that
A0 /(p) is a field if d is non-zero and not a quadratic residue mod p, and is
the direct (ring) sum of two fields if d is a non-zero quadratic residue mod p. If p
divides d, it has a 1-dimensional nilpotent ideal. But in all cases, it contains
Z/(p) as a subfield containing the multiplicative identity element.
There is a straightforward ring homomorphism
np : A0 → A0 /(p),
which takes a + b√d to ā + b̄√d, where "bar" denotes the taking of residue
classes of integers mod p. Now set
B(d, p) := np⁻¹(Z/(p)).
Then B(d, p) is a ring, since it is the preimage under np of a subfield of the
image A0 /(p). In fact, as a set,
B(d, p) = {a + b√d ∈ A0 | b ≡ 0 mod p}.
Clearly it contains the ring of integers Z and so is a subdomain of A(d). Note
that the domains A0 and B(d, p) are both σ-invariant.
In the subdomain B(d, p), we will show that under a mild condition, there
are elements which can be factored both as a product of two irreducible
elements and as a product of three irreducible elements. Since d and p are
fixed, we write B for B(d, p) henceforward.
Lemma 195 An element of B of norm one is a unit in B.
Proof: Since B = B^σ, N (ζ) = ζζ^σ = 1 for ζ ∈ B implies ζ^σ is an inverse
in B for ζ.
Lemma 196 Suppose ζ is an element of A0 whose norm N (ζ) is the rational
prime p. Then ζ does not belong to B.
Proof: Write ζ = a + b√d, so that N (ζ) = a² − b²d = p. If p divided the
integer b, it would divide the integer a as well, so p² would in fact divide
N (ζ), an absurdity. Thus p does not divide b, and so ζ is not an element of B.
Corollary 197 If an element of B has norm p2 , then it is irreducible in B.
Proof: By Lemma 195, elements of norm one in B are units in B. So,
if an element of norm p² in B were to have a proper factorization, it must
factor into two elements of norm p. But the preceding lemma shows that B
contains no such factors.
Theorem 198 Suppose there exist integers a and b such that a² − db² = p
(that is, A0 contains an element of norm p). Then the number p³ admits
these two factorizations into irreducible elements of B:
p³ = p · p · p
p³ = (pa + pb√d) · (pa − pb√d).
Proof: The factors are irreducible by Corollary 197 together with Lemma 196:
p has norm p², while a proper factorization of pa ± pb√d would require a
factor of norm ±p in B.
We reproduce Arpaia’s table displaying p = a2 − db2 for all d for which
the norm function on A(d) is algorithmic.
d        p = a² − db²
−11      47 = 6² − (−11)·1²
−7       11 = 2² − (−7)·1²
−3       7 = 2² − (−3)·1²
−2       3 = 1² − (−2)·1²
−1       2 = 1² − (−1)·1²
2        −7 = 1² − 2·2²
3        −2 = 1² − 3·1²
5        11 = 4² − 5·1²
6        −5 = 1² − 6·1²
7        −3 = 2² − 7·1²
11       −7 = 2² − 11·1²
13       3 = 4² − 13·1²
17       −13 = 2² − 17·1²
19       −3 = 4² − 19·1²
21       −17 = 2² − 21·1²
29       −13 = 4² − 29·1²
33       −29 = 2² − 33·1²
37       107 = 12² − 37·1²
41       −37 = 2² − 41·1²
57       −53 = 2² − 57·1²
73       −37 = 6² − 73·1²
A Student Project
In the preceding subsection, a non-factorial subdomain B was produced for
each case in which the domain A(d) = alg. int.(Q(√d)) was Euclidean. But
there are only finitely many of these domains A(d). This raises the question
of whether one can find such a subdomain B in more standard generic Euclidean
domains. We know that no such subdomain can exist in the domain of
integers Z, due to the paucity of its subdomains. But are there non-factorial
subdomains B of the classical Euclidean domain of polynomials over a field?
This might be done by duplicating the development of the previous section,
replacing the ring of integers Z by an arbitrary unique factorization domain
D which is distinct from its field of fractions k. Suppose further that K is a
field of degree two over k generated by a root ω of an irreducible quadratic
polynomial x2 − bx − c in D[x] which has distinct roots {ω, ω̄}. (One must
assume that the characteristic of D is not 2.) Then the student can show
that, as before, K = k ⊕ ωk as a vector space over k and that the mapping
σ defined by
σ(u + ωv) = u + ω̄v, for all u, v ∈ k,
is a field automorphism of K of order two so that the trace and norm mappings T and N can be defined as before.
Now let A be the set of elements in K whose norm and trace are in D.
The student should be able to prove that if D is Noetherian, A is a ring. In
any event, A0 := {a + ωb|a, b ∈ D} is a subring.
Now since D is a UFD not equal to its quotient field, there is certainly a
prime p in D. It is easy to see that
Bp = {a + ωb | a ∈ D, b ∈ pD} = D ⊕ ωpD
is a σ-invariant integral domain. Then the proofs of Lemmas 195 and 196
and Corollary 197 go through as before (note that the fact that D is a UFD
is used here). The analogue of Theorem 198 will follow:
If A0 contains an element of norm p, then Bp is not factorial.
Can we do this so that between A and B lies the Euclidean ring F [x]?
The answer is “yes”, if the characteristic of the field F is not 2.
In fact we can pass directly from F [x] to the subdomain B and fill in the
roles of D, k, K, A, and A0 later.
Let p(x) be an irreducible polynomial in F [x], where the characteristic of
F is not 2. Then (using an obvious ring isomorphism) p(x²) is an irreducible
element of the subdomain F [x²]. Now the student may show that the set
Bp := F [x²] ⊕ xp(x²)F [x²]
is a subdomain of F [x].
Now D = F [x²] is a UFD with quotient field k = F (x²) – the so-called
"field of rational functions" over F . The quotient field K = F (x) of F [x] is of
degree two over k, with the indeterminate x playing the role of ω. Thus
F (x) = F (x²) ⊕ xF (x²)
since ω = x is a root of the polynomial y² − x², which is irreducible in
F [x²][y] = D[y]. Since the characteristic is not 2, the roots x and −x are
distinct. Thus we obtain our field automorphism σ which takes any element
ζ = r(x²) + xs(x²) of F (x) to r(x²) − xs(x²) (r and s are quotients of
polynomials in F [x²]). (Note that this is just the substitution of −x for x in
F (x).) Now the student can write out the norm and trace of such a generic
ζ. For example,
N (ζ) = r² − x²s².     (9.4)
At this point the ring Bp given above is not factorial if there exists a
polynomial ζ = a(x²) + xb(x²) in F [x] of norm p(x²). That is,
a(x²)² − x²b(x²)² = p(x²),
where p(x²) is irreducible in F [x²].
Here is an example: Take a(x²) = 1 = b(x²), so ζ = 1 + x. Then
N (1 + x) = 1 − x², which is irreducible in F [x²]. Then in Bp = F [x²] ⊕ x(x² − 1)F [x²],
we have the two factorizations
(1 − x²)³ = (1 − x²)(1 − x²)(1 − x²) = (1 + x − x² − x³)(1 − x − x² + x³)
into irreducible elements.
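A quick multiplication check of this last identity – a minimal sketch in Python, with a hypothetical helper polymul; coefficient lists are indexed by degree:

    def polymul(f, g):
        """Multiply two polynomials given as coefficient lists (f[i] = coeff of x^i)."""
        out = [0] * (len(f) + len(g) - 1)
        for i, fi in enumerate(f):
            for j, gj in enumerate(g):
                out[i + j] += fi * gj
        return out

    one_minus_x2 = [1, 0, -1]                      # 1 - x^2
    lhs = polymul(polymul(one_minus_x2, one_minus_x2), one_minus_x2)
    g = [1, 1, -1, -1]                             # 1 + x - x^2 - x^3
    h = [1, -1, -1, 1]                             # 1 - x - x^2 + x^3
    assert polymul(g, h) == lhs                    # (1 - x^2)^3 = g*h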
Chapter 10
Principal Ideal Domains and Their Modules
10.1 Introduction
Throughout this Chapter, D (and sometimes R) will be a principal ideal
domain (PID). Thus it is a commutative ring for which a product of two
elements is zero only if one of the factors is already zero – that is, the set
of non-zero elements is closed under multiplication. Moreover every ideal is
generated by a single element – and so has the form Dg for some element g
determined by the ideal.
Of course, as rings go, principal ideal domains are hardly typical. But
two of their examples, the ring of integers Z and the ring F [x] of all polynomials
in a single indeterminate x over a field F , are so pervasive throughout
mathematics that their atypical properties deserve special attention.
In the previous chapter we observed that principal ideal domains were
in fact unique factorization domains. In this chapter, the focus is no longer
on the internal arithmetic of such rings, but rather on the structure of all
finitely generated modules over these domains. As one will soon see, the
structure is rather precise.
10.2 Quick Review of PID's
Principal Ideal Domains (PID’s) are integral domains D for which each ideal
I has the form I = aD for some element a of D. One may then infer
from the fact that every ideal is finitely generated, that the Ascending Chain
Condition (ACC) holds for the poset of all ideals.
Recall that a divides b if and only if
aD ⊇ bD.
It follows – as we have seen in the arguments that a principal ideal domain
(PID) is a unique factorization domain (UFD) – that a is irreducible if and
only if aD is a maximal ideal, and that this happens if and only if a is a
prime element.
We also recall that for ideals,
xD = yD
if and only if x and y are associates. For this means x = yr and y = xs for
some r and s in D, so x = xrs, so sr = 1 by the cancellation law. Thus r
and s, being elements of D, are both units.
The least common multiple of two elements a, b ∈ D is a generator m
of the ideal
aD ∩ bD = mD
where m is unique up to associates.
The greatest common divisor of two non-zero elements a and b is a
generator d of
aD + bD = dD,
again up to the replacement of d by an associate. Two elements a and b are
relatively prime if and only if
aD + bD = D.
Our aim in this chapter will be to find the structure of finitely-generated
modules over a PID. Basically, our main theorem will say that if M is a
finitely-generated module over a principal ideal domain D, then
M ≅ (D/De1 ) ⊕ (D/De2 ) ⊕ · · · ⊕ (D/Dek )
as right D-modules, where k is a natural number and for 1 ≤ i ≤ k − 1, ei
divides ei+1 .
This is a very useful theorem. Its applications are (1) the classification of
finite abelian groups, and (2) the complete analysis of a single linear transformation T : V → V of a vector space into itself. It will turn out that the set
of elements ei which are not units is uniquely determined up to associates.
But all of that still lies ahead of us.
10.3 Free modules over a PID.
Recall that we have earlier defined a free module over the ring R to be a
direct sum of submodules Mσ , each of which is isomorphic to RR as a right
R-module.
We say that an R-module M has a basis if and only if there is a subset B
of M (called a basis) such that every element m of M has a unique expression
m = Σ bi ri ,   bi ∈ B, ri ∈ R,
as a finite R-linear combination of elements of B. The uniqueness is the key
item here.
Lemma 199 An R-module M is a free R-module if and only if it has a basis.
Proof: If M is free, it is isomorphic to a direct sum of submodules Mσ each
of which is isomorphic to RR as right R-module. Let mσ be the element of
Mσ which corresponds to the element 1 ∈ R under this R-isomorphism. The
reader can verify that the set B := {mσ |σ ∈ I}, where I indexes the direct
summands Mσ , is a basis for M .
Now suppose the R-module M has a basis B. Suppose b ∈ B. Then the
uniqueness of expression means that for any r, s ∈ R, br = bs implies r = s.
Thus the map which sends br to r is a well-defined isomorphism bR → RR of
right R-modules. The directness of the sum M = ⊕b∈B bR follows from the
uniqueness of expression. Thus M is a free R-module.
In general, there is no result asserting that two bases of a free module
have the same cardinality. However, the result is true when R is an integral
domain D. The reader may recall from Chapter 7 that this was true of any
K-module, where K was a field. Recall that we only needed to invoke our
general dependence theory (modeled by linear dependence) to achieve this
result: namely that vector spaces have a dimension. Since any K-module
is free, and any field K is certainly a PID, the theorem proposed below is
clearly a vast generalization of the vector space result. On the other hand it
readily stems from it.
Theorem 200 Let D be an integral domain, not necessarily a principal ideal
domain. Let F be a free D-module with basis X := {xi }i∈I . Then the
cardinality of I is uniquely determined.
proof: Let P be a maximal ideal of D (recall that at least one must exist).
Then the factor ring K := D/P is a field. Then F P := ⟨f r | f ∈ F, r ∈ P⟩
is a submodule of F . Then the factor module F̄ := F/F P is annihilated
by P and so becomes a K-vector space. We are done once we show that
the D-basis X for F maps to a K-basis of F̄ under the natural module
homomorphism F → F̄ . That it maps to a spanning set is no problem: it is
the K-linear independence of the x̄i which is at issue.
Suppose Σ r̄i x̄i = 0 in F̄ . Then Σ ri xi ∈ F P . This means Σ ri xi = Σ aj xj ,
where each coefficient aj lies in P . Since X is a D-basis, the uniqueness of
the expression for the right side forces the ri to lie in P . But then each
r̄i = 0, proving the K-independence.
We call the unique cardinality of a basis for a free D-module F , the
D-rank of F and denote it rkD (F ) or just rk(F ) if D is understood.
In general, a submodule of a free R-module need not be free. However, the result
holds if R is a principal ideal domain. This is proved in a later Chapter, where
projective modules are introduced.
10.4 A technical result
Theorem 201 Let A be an n × m matrix with coefficients from a principal
ideal domain D. Then there exists an n × n matrix P , invertible in D^(n×n) ,
and an m × m matrix Q, invertible in D^(m×m) , such that
P AQ = diag(d1 , d2 , . . . , dk ),
the n × m matrix whose only non-zero entries are d1 , d2 , . . . , dk down the main
diagonal, where di divides di+1 . The elements {d1 , d2 , . . . , dk }, where
k = min(n, m), are uniquely determined up to associates.
proof: For each non-zero element d of D let ℓ(d) be the number of prime
divisors occurring in a factorization of d into primes. (Recalling that a PID
is always a UFD, the function ℓ is seen to be well-defined.)
Our goal is to convert A into a diagonal matrix with successively dividing
diagonal elements by a series of right and left multiplications by invertible
matrices. These will allow us to perform several so-called elementary row
and column operations, namely:
(I) Add b times the i-th row (column) to the j-th row (column),
i ≠ j. This is left (right) multiplication by a suitably sized matrix
having b in the (j, i)-th (respectively (i, j)-th) position while having all diagonal
elements equal to 1D , and all further entries equal to zero.
(II) Multiply a row (or column) throughout by a unit u of D. This is
left (right) multiplication by a diagonal matrix diag(1, . . . , 1, u, 1, . . . , 1)
of suitable size.
(III) Permute rows (columns). This is left (right) multiplication by a
suitable permutation matrix.
First observe that there is nothing to do if A is the n × m matrix with
all entries zero. In this case each di = 0D , and they are unique.
Thus if we consider the entire collection of matrices
S = {P AQ | P, Q invertible in D^(n×n) and D^(m×m) , respectively},
among their non-zero entries there can be found a matrix element bij in some
matrix B ∈ S with ℓ(bij ) minimal. Then by rearranging rows and columns
of B if necessary, we may assume that this bij is in the (1, 1)-position of B –
that is, (i, j) = (1, 1).
We now claim
b11 divides b1k for k = 2, . . . , m.     (10.1)
To prove the claim, assume b11 does not divide b1k . Permuting a few
columns if necessary, we may assume k = 2. Since D is a principal ideal
domain, we may write
b11 D + b12 D = dD.
Then as d divides b11 and b12 , we see that
b12 = ds and −b11 = dt, with s, t ∈ D.
Then, as d = b11 x + b12 y for some x, y ∈ D, we see that
d = (−dt)x + (ds)y, so 1 = sy − tx,
by the cancellation law. Thus

( −t   s ) ( x  s )     ( 1  0 )
(  y  −x ) ( y  t )  =  ( 0  1 ) .
So, after multiplying B on the right by the invertible matrix

        ( x  s  0  . . .  0 )
        ( y  t  0  . . .  0 )
Q1  =   ( 0  0  1  . . .  0 )
        ( .  .  .  . . .  . )
        ( 0  0  0  . . .  1 ) ,
The first row of BQ1 is (d, 0, b13 , . . . , b1m ). Thus, as d now occurs as a nonzero
element of D in a matrix in S, we see that ℓ(b11 ) ≤ ℓ(d) ≤ ℓ(b11 ), so that
b11 is an associate of d. Thus b11 divides b12 as claimed.
Similarly, we see that b11 divides each entry bk1 in the first column of
B. Thus, subtracting suitable multiples of the first row from the remaining
rows of B, we obtain a matrix in S with first column (b11 , 0, . . . , 0)^T . Then,
subtracting suitable multiples of the first column from the remaining columns of
this matrix, we arrive at a matrix

         ( b11  0 . . . 0 )
         (  0             )
B0  :=   (  ·     B1      ) ,
         (  0             )

where B0 ∈ S. We set B1 = (cij ), where 1 ≤ i ≤ n − 1 and 1 ≤ j ≤ m − 1.
We now expand on this: we even claim that
b11 divides each entry cij of matrix B1 .
This is simply because addition of the (j + 1)-st row to the first row of the
matrix B0 just displayed yields a matrix B′ in S whose first row is
(b11 , cj1 , cj2 , . . . , cj,m−1 ). But we have just seen that in any matrix in S
with an element of minimal possible length in the (1, 1)-position, that
element divides every element of the first row and column. Since B′ is
such a matrix, b11 divides each cjk .
Now by induction on n + m, there exist invertible matrices P1 and Q1
such that P1 B1 Q1 = diag(d2 , d3 , . . .) with d2 dividing d3 dividing · · · dividing dk .¹
Moreover, by induction, the numbers d2 , . . . , dk are uniquely determined up
to associates by the matrix B1 . But in turn, the element b11 with minimal
length determined the matrix B1 , so that the pair (b11 , B1 ) is determined up to
the multiplication of each entry by a unit of D. Since the d2 , . . . , dk are in turn
determined by B1 and hence b11 , they are uniquely determined quantities.
¹ I apologize for the notation here: diag(d2 , d3 , . . .) should not be presumed to be a
square matrix. Of course it is still an n × m-matrix, with all entries zero except those that
bear a di at the (i, i)-position. If n < m the main diagonal hits the "floor" before it hits
the "east wall", and the other way round if n > m.
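The proof above is effective: it is exactly the elementary row and column reduction one would program. The following is a minimal sketch over D = Z (Python; the function names diagonalize and invariant_factors are our own labels, not notation from the text). The gcd/lcm clean-up pass at the end produces the divisibility chain d1 | d2 | · · · :

    from math import gcd

    def diagonalize(A):
        """Reduce an integer matrix to diagonal form by the row/column
        operations (I)-(III); returns the non-zero diagonal entries
        (zero diagonal entries -- the free part -- are omitted)."""
        A = [row[:] for row in A]                  # work on a copy
        n, m = len(A), len(A[0])
        diag, t = [], 0
        while t < min(n, m):
            # move a non-zero entry of least length (here: least |value|)
            # in the remaining block to the (t, t)-position
            entries = [(abs(A[i][j]), i, j) for i in range(t, n)
                       for j in range(t, m) if A[i][j] != 0]
            if not entries:
                break                              # remaining block is zero
            _, i, j = min(entries)
            A[t], A[i] = A[i], A[t]
            for row in A:
                row[t], row[j] = row[j], row[t]
            # clear column t and row t; a surviving non-zero remainder is
            # shorter than the pivot, so in that case repeat with the same t
            clean = True
            for i in range(t + 1, n):
                q = A[i][t] // A[t][t]
                A[i] = [a - q * b for a, b in zip(A[i], A[t])]
                clean = clean and A[i][t] == 0
            for j in range(t + 1, m):
                q = A[t][j] // A[t][t]
                for i in range(n):
                    A[i][j] -= q * A[i][t]
                clean = clean and A[t][j] == 0
            if clean:
                diag.append(abs(A[t][t]))
                t += 1
        return diag

    def invariant_factors(A):
        """Massage the diagonal so that d1 | d2 | ... (gcd/lcm exchanges)."""
        d = diagonalize(A)
        for i in range(len(d)):
            for j in range(i + 1, len(d)):
                g = gcd(d[i], d[j])
                if g:
                    d[i], d[j] = g, d[i] * d[j] // g
        return d

    assert invariant_factors([[2, 0], [0, 3]]) == [1, 6]
    assert invariant_factors([[4, 0], [0, 6]]) == [2, 12]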
10.5 Finitely generated modules over a PID
Theorem 202 (The Invariant Factor Theorem) Let D be a principal ideal
domain and let M be a finitely generated D-module. Then as right D-modules,
we have
M ≅ (D/Dd1 ) ⊕ (D/Dd2 ) ⊕ · · · ⊕ (D/Ddn ),
where di divides di+1 , i = 1, . . . , n − 1.
proof: Let M be a finitely generated D-module. Then there is a finitely
generated free module F and a D-epimorphism
f : F → M.
Then F has a D-basis {x1 , . . . , xn }, and as D is Noetherian, its submodule
ker f is finitely generated. Thus we have
F = x1 D ⊕ x2 D ⊕ · · · ⊕ xn D     (10.2)
ker f = y1 D + y2 D + · · · + ym D.     (10.3)
We then have m unique expressions
yi = x1 ai1 + x2 ai2 + · · · + xn ain , i = 1, . . . , m,
and an m × n matrix A = (aij ) with entries in D. By the technical result
of the previous subsection, there exist matrices P = (pij ) and Q = (qij ),
invertible in D^(m×m) and D^(n×n) respectively, such that
P AQ = diag(d1 , d2 , . . .)  (where di | di+1 ),
an m × n matrix whose only non-zero entries are on the principal diagonal,
and they proceed until they can't go any further (that is, they are defined up
to subscript min(m, n)).
If we set
Q⁻¹ = (q*ij ), and
y′i = Σ_j yj pij and x′i = Σ_j xj q*ij ,
then P AQ expresses the y′i in terms of the x′i , which now comprise a new
D-basis for F . Thus we have
y′1 = x′1 d1 , y′2 = x′2 d2 , . . . etc.,
(where, by convention, the expression dk is taken to be zero for min(m, n) <
k ≤ m). Also, since P and Q⁻¹ are invertible, the y′i also span ker f . So,
bearing in mind that some of the y′i = x′i di may be zero, and extending the
definition of the dk by setting dk := 0 if m < k ≤ n, we may write
F = x′1 D ⊕ x′2 D ⊕ · · · ⊕ x′n D
ker f = x′1 d1 D ⊕ x′2 d2 D ⊕ · · · ⊕ x′n dn D.
Then
M = f (F ) ≅ F/ ker f ≅ (D/d1 D) ⊕ (D/d2 D) ⊕ · · · ⊕ (D/dn D),
and the proof of the theorem is complete.
Remark The essential uniqueness of the elements {di } appearing in the
above theorem will soon emerge in Section 10.6 below. These elements are
called the invariant factors (or sometimes the elementary divisors) of
the finitely generated D-module M .
At this stage their uniqueness does not immediately follow from the fact
that they are uniquely determined by the matrix A since this matrix itself
depended on the choice of a finite spanning set for ker f and not all such finite
spanning sets can be moved to one another by the action of an invertible
transformation Q in homD (ker f, ker f ). Indeed they might have different
cardinalities.
An element m of a D-module M is called a torsion element if and only
if there exists a non-zero element d ∈ D such that md = 0. Recall that
Ann(m) = {d ∈ D|md = 0} is an ideal and that we have just said that m
is a torsion element if and only if Ann(m) ≠ 0. In an integral domain, the
intersection of two non-zero ideals is a non-zero ideal; it follows from this
that the sum of two torsion elements is a torsion element. In fact the set of
all torsion elements of M forms a submodule of M called torM .
Corollary 203 A finitely generated module M over a principal ideal domain
D is the direct sum of its torsion submodule torM and a free module.
proof: By the Invariant Factor Theorem,
M = (D/d1 D) ⊕ · · · ⊕ (D/dn D)
where di divides di+1 . Choose the index t so that i ≤ t implies di ≠ 0, and, for
i > t, di = 0. Then
torM = (D/d1 D) ⊕ · · · ⊕ (D/dt D)
and so
M = torM ⊕ DD ⊕ DD ⊕ · · · ⊕ DD ,
with n − t copies of DD .
If M = torM , we say that M is a torsion module.
Let M be a module over a principal ideal domain, D. Let p be a prime
element in D. We define the p-primary part of M as the set of elements
of M annihilated by some power of p, that is
Mp = {m ∈ M | mp^k = 0 for some natural number k = k(m)}.
It is easy to check that Mp is a submodule of M . We now have
Theorem 204 (Primary Decomposition Theorem) If M is a finitely generated torsion D-module, where D is a principal ideal domain, then there exists
a finite number of primes p1 , p2 , . . . , pN such that
M ≅ Mp1 ⊕ · · · ⊕ MpN .
proof: Since M is finitely generated, we have M = x1 D + · · · + xm D for some
finite generating set {xi }. Since M is a torsion module, each of the xi is
a torsion element, and so all of the ideals Ai := Ann(xi ) are non-zero; and
since D is a principal ideal domain,
Ai = Ann(xi ) = di D, where di ≠ 0, i = 1, . . . , m.
Also, as D is a unique factorization domain as well, each di is expressible as
a product of primes:
di = pi1^(ai1) pi2^(ai2) · · · pif(i)^(aif(i)) .
We let {p1 , . . . , pN } be a complete re-listing of all the primes pij which appear
in these factorizations. If f (i) = 1, then di is a prime power and so
xi D ⊆ Mpj for some j. If, on the other hand, f (i) ≠ 1, then, forming the
elements vij = di /pij^(aij) , we see that the greatest common divisor of the
elements {vi1 , . . . , vif (i) } is 1, and this means
vi1 D + vi2 D + · · · + vif (i) D = D.
Thus there exist elements b1 , . . . , bf (i) such that
vi1 b1 + vi2 b2 + · · · + vif (i) bf (i) = 1.
Note that
bj vij xi ∈ Mpij , since pij^(aij) vij bj xi = di bj xi = 0,
so
xi = xi · 1 = Σ_j bj vij xi ∈ Mpi1 + · · · + Mpif (i) .
Thus in all cases
xi D ⊆ Mp1 + · · · + MpN
for i = 1, . . . , m, and the sum on the right side is M . Now it remains only to
show that this sum is direct, and we do this by the criterion for direct sums
of D-modules. Consider an element
a ∈ (Mp1 + · · · + Mpk ) ∩ Mpk+1 .
Then the ideal Ann(a) contains an element m which is a product of primes
in {p1 , . . . , pk } (since a lies in the first summand), and it also contains a power
pk+1^ℓ of the prime pk+1 (since a ∈ Mpk+1 ). Since gcd(m, pk+1 ) = 1, we have
Ann(a) ⊇ mD + pk+1^ℓ D = D.
Thus 1 · a = a = 0. Thus we have shown that
(Mp1 + · · · + Mpk ) ∩ Mpk+1 = (0).
This proves that
M = Mp1 ⊕ Mp2 ⊕ · · · ⊕ MpN .
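Over D = Z the passage from invariant factors to the primary decomposition is just integer factorization. A small sketch (Python; elementary_divisors is our own hypothetical helper name):

    def elementary_divisors(invariant_factors):
        """Split each invariant factor d into its prime-power parts p^a."""
        def factor(n):
            powers, p = {}, 2
            while p * p <= n:
                while n % p == 0:
                    powers[p] = powers.get(p, 0) + 1
                    n //= p
                p += 1
            if n > 1:
                powers[n] = powers.get(n, 0) + 1
            return powers
        out = []
        for d in invariant_factors:
            out += [p**a for p, a in factor(d).items()]
        return sorted(out)

    # Z/2 + Z/12 has primary parts M_2 = Z/2 + Z/4 and M_3 = Z/3:
    assert elementary_divisors([2, 12]) == [2, 3, 4]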
10.6 The uniqueness of the Invariant Factors.
We have seen that a finitely generated module M over a principal ideal
domain has a direct decomposition as a torsion module and a finitely generated free module (corresponding to the number of invariant factors which
are zero). This free module is isomorphic to M/torM and so its rank is the
unique rank of M/torM (Theorem 200), and consequently is an invariant
of M . The torsion submodule, in turn, has a unique decomposition into
primary parts whose invariant factors determine those of tor(M ). Thus, in
order to show that the so-called invariant factors earn their name as genuine
invariants of M , we need only show this for a module M which is primary –
i.e. M = Mp for some prime p in D.
In this case, applying the invariant factor theorem, we have
M ≅ D/p^s1 D ⊕ · · · ⊕ D/p^st D     (10.4)
where s1 ≤ s2 ≤ · · · ≤ st , and we must show that the sequence S =
(s1 , . . . , st ) is uniquely determined by M . We shall accomplish this by producing another sequence Ω = {ωi }, whose terms are manifestly invariants
determined by the isomorphism type of M , and then show that S is determined by Ω.
For any p-primary D-module A, set
Ω1 (A) := {a ∈ A|ap = 0}.
Then Ω1 (A) is a D-submodule of A which can be regarded as a vector space
over the field F := D/pD. We denote the dimension of this vector space by
the symbol rk(Ω1 (A)). Clearly, Ω1 (A) and its rank are uniquely determined
by the isomorphism type of A.
Now let us apply these notions to the module M with invariants S =
(s1 , . . . , st ) as in equation (10.4). First we set ω1 := rk(Ω1 (M )), and K1 :=
Ω1 (M ). In general we set
Ki+1 /Ki := Ω1 (M/Ki ), and ωi+1 := rk(Ki+1 /Ki ).
The Ki form an ascending chain of submodules of M which ascends properly
until it stabilizes at i = st , the maximal element of S. (Note that Ann(M ) =
Dp^st , so that p^st can be thought of as the "exponent" of M .) Moreover, each
of the D-modules Ki , is completely determined by the isomorphism type of
M/Ki−1 , which is determined by Ki−1 , and ultimately by M alone. Thus we
obtain a sequence of numbers
Ω = (ω1 , ω2 , . . . , ωst ),
which are completely determined by the isomorphism type of M . One can
easily see
(*) ωj is the number of elements in the sequence S which are at least as
large as j.
Now the sequence S can be recovered from Ω as follows:
(**) For j = 0, 1, . . . , t − 1, st−j is the number of elements ωr of Ω which are
at least j + 1.
example: Suppose p is a prime in the principal ideal domain D and A is a
primary D-module with invariant factors
p, p, p², p³, p³, p⁴, p⁷, p⁷, p¹⁰,
so S = (1, 1, 2, 3, 3, 4, 7, 7, 10). Then, setting F = D/pD, K1 is an F -module
of rank 9, K2 /K1 is an F -module of rank 7, and in general Ω =
(9, 7, 6, 4, 3, 3, 3, 1, 1, 1). Then S = (1, 1, 2, 3, 3, 4, 7, 7, 10) is recovered by the
recipe in (**).
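The passage back and forth between S and Ω is the classical conjugate-partition computation, and is easy to script. A minimal sketch (Python; the function names are ours):

    def omega_from_S(S):
        """Rule (*): omega_j = number of terms of S that are at least j."""
        return [sum(1 for s in S if s >= j) for j in range(1, max(S) + 1)]

    def S_from_omega(O):
        """Rule (**): s_{t-j} = number of omega_r that are at least j + 1."""
        t = O[0]                       # t = omega_1, the number of terms of S
        s_reversed = [sum(1 for w in O if w >= j + 1) for j in range(t)]
        return s_reversed[::-1]

    S = [1, 1, 2, 3, 3, 4, 7, 7, 10]
    O = omega_from_S(S)
    assert O == [9, 7, 6, 4, 3, 3, 3, 1, 1, 1]
    assert S_from_omega(O) == S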
We conclude:
Theorem 205 The non-unit invariant factors of a finitely-generated module
over a principal ideal domain are completely determined by the isomorphism
type of the module – i.e., they are really invariants.
10.7 Applications of the Theorem on PID Modules.
10.7.1 Classification of Finite Abelian Groups.
As remarked before, an abelian group is simply a Z-module. Since Z is a
PID, any finitely generated Z-module A has the form
A ≅ Zd1 × Zd2 × · · · × Zdm
– that is, A is uniquely expressible as the direct product of cyclic groups, the
order of each dividing the next. The numbers (d1 , . . . , dm ) are thus invariants
of A, and the decomposition just presented is called the Jordan decomposition
of A. The number of distinct isomorphism types of abelian groups of
order n is thus the number of ordered sets (d1 , . . . , dm ) such that di divides
di+1 and d1 d2 · · · dm = n.
example: The number of abelian groups of order 36 is 4. The four
possible ordered sets are: (36), (2, 18), (3, 12), (6, 6).
This number is usually much easier to compute if we first perform the
primary decomposition. If we assume A is a finite abelian group of order n,
and if p is a prime dividing n, then Ap = {a ∈ A | ap^k = 0 for some k ≥ 0} is just
the set of elements of A of p-power order. Thus Ap is the unique p-Sylow
subgroup of A. Then the primary decomposition
A = Ap1 ⊕ · · · ⊕ Apt
is simply the additive version of writing A as a direct product of its Sylow
subgroups. Since the Api are unique, the isomorphism type of A is uniquely
determined by the isomorphism types of the Api . Thus if Api is an abelian
group of order p^s , we can ask for its Jordan decomposition:
Zp^s1 × Zp^s2 × · · · × Zp^sm
where p^si divides p^si+1 and p^s1 p^s2 · · · p^sm = p^s . But we can write this
condition as: si ≤ si+1 and s1 + s2 + · · · + sm = s. Thus each unordered set renders s
as a sum of natural numbers without regard to the order of the summands
– called an unordered partition of the integer s. The number of these is
usually denoted p(s), and p(s) is called the partition function. For example
p(2) = 2, p(3) = 3, p(4) = 5, p(5) = 7, etc. Here, we see that the number of
abelian groups of order p^s is just p(s), and that:
Corollary 206 The number of isomorphism classes of abelian groups of
order n = p1^t1 · · · pr^tr is
p(t1 )p(t2 ) · · · p(tr ).
Again, we can compute the number of abelian groups of order 36 = 2²3² as
p(2)p(2) = 2 · 2 = 4.
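Counting partitions is a routine recursion. A minimal sketch (Python; the helper names are ours), counting partitions of s by largest allowed part:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def partitions(s, largest=None):
        """p(s): the number of unordered partitions of s
        (recursing on partitions with parts of size at most `largest`)."""
        if largest is None:
            largest = s
        if s == 0:
            return 1
        return sum(partitions(s - k, k) for k in range(1, min(s, largest) + 1))

    def abelian_group_count(exponents):
        """Number of abelian groups of order p1^t1 * ... * pr^tr,
        given the list of exponents [t1, ..., tr] (Corollary 206)."""
        n = 1
        for t in exponents:
            n *= partitions(t)
        return n

    assert [partitions(s) for s in (2, 3, 4, 5)] == [2, 3, 5, 7]
    assert abelian_group_count([2, 2]) == 4        # order 36 = 2^2 * 3^2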
10.7.2 The Rational canonical form of a linear transformation
Let F be any field and let V be an n-dimensional vector space over F so
V = F ⊕ · · · ⊕ F (n copies) as an F -module. Now suppose T : V → V is a
linear transformation on V , which we regard as a right operator on V . Now
in homF (V, V ), the ring of linear transformations of V into V , multiplication
is composition of transformations. Thus for any such transformation T , T² :=
T ◦ T , and any polynomial in T is a well-defined linear transformation of V .
As is customary, F [x] will denote the ordinary ring of polynomials in an
indeterminate x. We wish to convert V into an F [x]-module by defining
v · p(x) := vp(T ) for any v ∈ V and p(x) ∈ F [x].
This module is finitely generated since an F -basis of V will certainly serve
as a set of generators of V as an F [x]-module.
It now follows from our main theorem that there are polynomials p1 (x),
p2 (x),. . . , pm (x) with pi (x) dividing pi+1 (x) in F [x] such that
VF [x] ≅ F [x]/(p1 (x)) ⊕ · · · ⊕ F [x]/(pm (x)),
where as usual (pi (x)) denotes the principal ideal pi (x)F [x]. Note that none
of the pi (x) is zero since otherwise VF [x] would have a (free) direct summand
F [x]F [x] which is infinite dimensional over F . Also, as usual, those pi (x)
which are units – that is, of degree zero – contribute nothing to the direct
decomposition of V . In fact we may assume that the polynomials pi (x)
are monic polynomials of degree at least one. Since they are now uniquely
determined polynomials, they have a special name: the invariant factors
of the transformation T .
Now since each pi (x) divides pm (x), we must have that V pm (x) = 0 and
so
pm (x)F [x] = AnnF [x] (V ).
For this reason pm (x), being a polynomial of smallest degree such that pm (T ) :
V → 0, is called the minimal polynomial of T . Such a polynomial is
determined up to associates, but bearing in mind that we are taking pm (x)
to be monic, the word "the" preceding "minimal polynomial" is justified.
The product p1 (x)p2 (x) · · · pm (x) is called the characteristic polynomial of T and we shall soon see that the product of the constant terms of
these polynomials is, up to sign, the determinant det T , familiar from linear
algebra courses.
How do we compute the polynomials pi (x)? To begin with, we must
imagine the transformation T : V → V given to us in such a way that it can
be explicitly determined, that is, by a matrix A describing the action of T
on V with respect to a basis B = (v1 , . . . , vn ). Specifically, as T is regarded
as a right operator,²
[T ]B = A = (aij ), so T (vi ) = Σ_{j=1}^{n} vj aij .
Then in forming the F [x]-module VF [x] , we see that application of the transformation
T to V corresponds to right multiplication of every element of V
by the indeterminate x. Thus
vi x = Σ_j vj aij ,
so
0 = v1 ai1 + · · · + vi−1 ai,i−1 + vi (aii − x) + vi+1 ai,i+1 + · · · + vn ain .
Now as in the proof of the Invariant Factor Theorem, we form the free module
Fr = x1 F [x] ⊕ · · · ⊕ xn F [x]
and the epimorphism f : Fr → V defined by f (xi ) = vi . Then the elements
ri := x1 ai1 + · · · + xi−1 ai,i−1 + xi (aii − x) + xi+1 ai,i+1 + · · · + xn ain
all lie in ker f .
² Note that the rows of A record the fate of each basis vector under T . Similarly, if
T had been a left operator rather than a right one, the columns of A would have been
recording the fate of each basis vector. The difference is needed only for the purpose of
making matrix multiplication represent composition of transformations. It has no real
effect upon the invariant factors.
Now for any subscript i, x times the element xi of Fr is
xi x = ri + Σ_j xj aij ,   i = 1, . . . , n.
Thus for any polynomial p(x) ∈ F [x], we can write
xi p(x) = Σ_k rk hik (x) + Σ_j xj bj
for some polynomials hik (x) ∈ F [x] and coefficients bj ∈ F .
Now if
w = x1 g1 (x) + x2 g2 (x) + · · · + xn gn (x)
were an arbitrary element of ker f , then it would have the form
w = Σ_{i=1}^{n} xi gi (x)     (10.5)
  = (Σ_{i=1}^{n} ri hi (x)) + Σ_{i=1}^{n} xi ci     (10.6)
for appropriate hi (x) ∈ F [x] and ci ∈ F . Then since f (w) = 0 = f (ri ) and
f (xi ) = vi we see that
f (w) = 0 = f (Σ_i ri hi (x)) + Σ_{i=1}^{n} vi ci     (10.7)
          = 0 + Σ_i vi ci .     (10.8)
So each ci = 0. This means
w ∈ r1 F [x] + · · · + rn F [x].
Thus the elements r1 , . . . , rn are a set of generators for ker f .
Thus we have:
Corollary 207 To find the invariant factors of the linear transformation
represented by the square matrix A = (aij ), it suffices to diagonalize the
following matrix in F [x]^(n×n) :

          ( a11 − x    a12        a13       . . .   a1n     )
          ( a21        a22 − x    a23       . . .   a2n     )
A − xI =  ( a31        a32        a33 − x   . . .   a3n     )
          ( ..         ..         ..        . . .   ..      )
          ( an1        . . .      . . .     . . .   ann − x )
by the elementary row and column operations to obtain the matrix
diag(p1 (x), p2 (x), . . . , pn (x))
with each pi (x) monic, and with pi (x) dividing pi+1 (x). Then the pi (x) which
are not units are the invariant factors of A.
Now, relisting the invariant factors of T (the non-unit invariant factors)
as p1 (x), . . . , pr (x), r ≤ n, the invariant factor theorem says that T acts on V
exactly as right multiplication by x acts on
V ′ := F [x]/(p1 (x)) ⊕ · · · ⊕ F [x]/(pr (x)).
The space V ′ has a very nice basis. Each pi (x) is a monic polynomial of
degree di > 0, say
pi (x) = b0i + b1i x + b2i x² + · · · + b(di−1),i x^(di−1) + x^di .
Then the summand F [x]/(pi (x)) of V ′ is non-trivial, and has the F -basis
1 + (pi (x)), x + (pi (x)), x² + (pi (x)), . . . , x^(di−1) + (pi (x)),
and right multiplication by x effects the action:
x^j + (pi (x)) → x^(j+1) + (pi (x)),   0 ≤ j < di − 1,     (10.9)
x^(di−1) + (pi (x)) → −b0i (1 + (pi (x))) − b1i (x + (pi (x))) − · · · .     (10.10)
Thus, with respect to this basis, right multiplication of F [x]/(pi (x)) by x
is represented by the matrix

             (   0      1      0     . . .    0           )
             (   0      0      1     . . .    0           )
Cpi (x) :=   (   ..     ..     ..    . . .    ..          )     (10.11)
             (   0      0      0     . . .    1           )
             (  −b0i   −b1i   −b2i   . . .   −b(di−1),i   )

called the companion matrix of pi (x). If pi (x) is a unit in F [x] – that is,
it is just a nonzero scalar – then our convention is to regard the companion
matrix as the "empty" (0 × 0) matrix. That convention is incorporated in
the following
Corollary 208 If p1 (x), . . . , pn (x) are the invariant factors of A, then A is
conjugate to the matrix

            ( Cp1 (x)    0          . . .   0         )
P⁻¹AP  =    ( 0          Cp2 (x)    . . .   0         )
            ( ..         ..         . . .   ..        )
            ( 0          0          . . .   Cpn (x)   ) .

This is called the rational canonical form of A.
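Building the rational canonical form from the invariant factors is mechanical once the companion matrices are in hand. A minimal sketch (Python; companion and block_diagonal are our own hypothetical helpers), following the sign convention of equation (10.11):

    def companion(p):
        """Companion matrix of a monic polynomial
        p(x) = b0 + b1*x + ... + b_{d-1}*x^{d-1} + x^d,
        given by its low-order coefficients [b0, b1, ..., b_{d-1}]."""
        d = len(p)
        C = [[0] * d for _ in range(d)]
        for i in range(d - 1):
            C[i][i + 1] = 1              # the superdiagonal of 1's
        C[d - 1] = [-b for b in p]       # last row: -b0, -b1, ..., -b_{d-1}
        return C

    def block_diagonal(blocks):
        """Assemble square blocks down the diagonal (zeros elsewhere)."""
        n = sum(len(B) for B in blocks)
        M = [[0] * n for _ in range(n)]
        pos = 0
        for B in blocks:
            for i, row in enumerate(B):
                M[pos + i][pos:pos + len(B)] = row
            pos += len(B)
        return M

    # invariant factors x - 1 and (x - 1)(x - 2) = x^2 - 3x + 2:
    rcf = block_diagonal([companion([-1]), companion([2, -3])])
    assert rcf == [[1, 0, 0], [0, 0, 1], [0, -2, 3]]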
Remark: Note that each companion matrix Cpi (x) is square, with di :=
deg pi (x) rows. Using expansion along the first column, one calculates
that Cpi (x) in equation (10.11) has determinant (−1)^(di−1) (−b0i ) = (−1)^di b0i .
Thus the determinant of A is completely determined by the invariant factors.
The determinant is just the product
det A = Π_{i=1}^{r} (−1)^di b0i = (−1)^n Π_{i=1}^{r} b0i ,
which is the product of the constant terms of the invariant factors times the
parity of n = Σ di , the dimension of the original vector space V .
It may be of interest to know that the calculation of the determinant of
matrix A by ordinary means uses on the order of n! steps. Calculating the
determinant by finding the invariant factors (diagonalizing A − xI) uses on
the order of n³ steps. The abstract way is ultimately faster.
10.7.3 The Jordan Form
This section involves a special form that is available when the ground field
of the vector space V contains all the roots of the minimal polynomial of
transformation T .
Suppose now F is a field containing all roots of the minimal polynomial
of a linear transformation T : V → V . Then the minimal polynomial, as well
as all of the invariant factors, completely factors into prime powers (x − λi )^ai .
Applying the primary decomposition to the right F [x]-module V , we see that
a “primary part” V(x−λ) of V is a direct sum of modules, each isomorphic to
F [x] modulo a principal ideal I generated by a power of x − λ. This means
that any matrix representing the linear transformation T is equivalent to
a matrix which is a diagonal matrix of submatrices which are companion
matrices of powers of linear factors. So one is reduced to considering F [x]-modules
of the form F [x]/I = F [x]/((x − λ)^m ), for some root λ. Now F [x]/I
has this F -basis:
1 + I, (x − λ) + I, (x − λ)² + I, . . . , (x − λ)^(m−1) + I.
Now right multiplying F [x]/I by x = λ + (x − λ) takes each basis element
to λ times itself plus its successor basis element – except for the last basis
element, which is merely multiplied by λ. That basis element, (x − λ)^(m−1) + I,
and its scalar multiples comprise a 1-space of "eigenvectors" of F [x]/I –
that is, vectors v such that vT = λv. The resulting matrix, with respect to
this basis, has the form:

( λ  1  0  . . .  0  0 )
( 0  λ  1  . . .  0  0 )
( ..    ..  . . .   .. )
( 0  0  0  . . .  λ  1 )
( 0  0  0  . . .  0  λ ) .
The matrix which results from assembling blocks according to the primary
decomposition, and rewriting the resulting companion matrices in the above
form, is called the Jordan form of the transformation T .
example: Suppose the invariant factors of a matrix A were given as follows:
x, x(x − 1), x(x − 1)³(x − 2)².
Here, F is any field in which the numbers 0, 1 and 2 are distinct. What is
the Jordan form?
First we separate the invariant factors into their elementary divisors:
prime x: {x, x, x}.
prime x − 2: {(x − 2)²}.
prime x − 1: {x − 1, (x − 1)³}.
Next we write out each Jordan block as above: the result being
( 0                  )
(   0                )
(     0              )
(       2 1          )
(         2          )
(           1        )
(             1 1    )
(               1 1  )
(                 1  )

(All unmarked entries in the above matrix are zero.)
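Once the elementary divisors (λ, multiplicity) are listed, assembling the Jordan matrix is purely mechanical. A minimal sketch (Python; jordan_matrix is our own helper name), reusing the convention that the 1's sit on the superdiagonal:

    def jordan_matrix(blocks):
        """blocks: list of (eigenvalue, size) pairs, one per elementary
        divisor (x - lam)^size; returns the assembled Jordan matrix."""
        n = sum(size for _, size in blocks)
        J = [[0] * n for _ in range(n)]
        pos = 0
        for lam, size in blocks:
            for k in range(size):
                J[pos + k][pos + k] = lam            # lam down the diagonal
                if k + 1 < size:
                    J[pos + k][pos + k + 1] = 1      # 1's on the superdiagonal
            pos += size
        return J

    # elementary divisors of the example: x, x, x, (x-2)^2, x-1, (x-1)^3
    J = jordan_matrix([(0, 1), (0, 1), (0, 1), (2, 2), (1, 1), (1, 3)])
    assert len(J) == 9 and J[3][3:5] == [2, 1]       # the (x-2)^2 block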
10.7.4 Information Carried by the Invariant Factors
What are some things one would like to know about a linear transformation
T : V → V , and how can the invariant factors provide this information?
1. We have already seen that the last invariant factor pr (x) is the minimal
polynomial of T – that is, the smallest-degree monic polynomial p(x)
such that p(T ) is the zero transformation V → 0V . (We recall that
the ideal generated by p(x) is Ann(VF [x] ) when V is converted to an
F [x]-module via T .)
2. Sometimes it is of interest to know the characteristic polynomial of
T , which is defined to be the product p1 (x) · · · pr (x) of the invariant
factors of T . One must bear in mind that by definition the invariant
factors of T are monic polynomials of positive degree. The characteristic polynomial must have degree n, the F -dimension of V . This is
because the latter is the sum of the dimensions of the F [x]/(pi (x)),
that is, the sum of the numbers deg pi (x) = di .
3. Since V has finite dimension, T is onto if and only if it is one-to-one.³
The dimension of ker T is called the nullity of T , and is given by the
number of invariant factors pi (x) which are divisible by x.
³ This is a well-known consequence of the so-called "nullity-rank" theorem for
transformations, but follows easily from the fundamental theorems of homomorphisms
for R-modules applied when R = F is a field. Specifically, there is an isomorphism
between the poset of subspaces of T (V ) and the poset of subspaces of V which contain
ker T . Since dimension is the length of an unrefinable chain in such a poset, one obtains
codimV (ker T ) = dim T (V ).
4. The rank of T is the F -dimension of T (V ). It is just n minus the
nullity of T (see the footnote just above).
5. V is said to be a cyclic F [x]-module if and only if there exists a module element v such that the set vF [x] spans V – i.e. v generates V as
an F [x]-module. This is true if and only if the only non-unit invariant factor is pr (x), the minimal polynomial (equivalently: the minimal
polynomial is equal to the characteristic polynomial).
6. Recall that an R-module is said to be irreducible if and only if it has
no proper non-zero submodules. (We met these in the Jordan-Hölder theorem
for R-modules.) Here, we say that T acts irreducibly on V if
and only if the associated F [x]-module is irreducible. This means there
exists no proper non-zero sub-vector space W of V which is T -invariant in the
sense that T (W ) ⊆ W . One can then easily see that T acts irreducibly
on V if and only if the characteristic polynomial is irreducible in F [x].⁴
7. An eigenvector of T associated with the root λ or λ-eigenvector
is a vector v of V such that vT = λv. A scalar is called an eigen-root
of T if and only if there exists a non-zero eigenvector associated with
this scalar. Now one can see that the eigenvectors associated with λ
for right multiplication of VF [x] by x, are precisely the module elements
killed by the ideal (x−λ)F [x]. Clearly they form a T -invariant subspace
of V whose dimension is the number of invariant factors of T which are
divisible by (x − λ).
8. A matrix is said to be nilpotent if and only if some power of it is the zero
matrix. Similarly, a linear transformation T : V → V is called a nilpotent transformation if and only if some power (number of iterations)
of T annihilates V . Quite obviously the following are equivalent:
• T is a nilpotent transformation.
• T can be represented by a nilpotent matrix with respect to some
basis of V .
⁴ This condition forces the minimal polynomial to equal the characteristic polynomial
(so that the module is cyclic) and both are irreducible polynomials.
• The characteristic polynomial of T is a power of x.
• The minimal polynomial of T is a power of x.
Chapter 11
Theory of Fields.
11.1 Introduction
In this chapter we are about to enter one of the most fascinating and historically
interesting regions of abstract algebra. I can think of at least three
reasons why this is so:
1. Fields underlie nearly every part of mathematics. Of course we could
not have vector spaces without fields (if only as the center of the coefficient
division ring). Independently of vector spaces, fields figure in
number theory, algebraic varieties and sesquilinear forms. So it is a
very basic subject.
2. There are presently many open problems involving fields – each motivated
by some application of fields to another part of mathematics.
3. Third, there is the strange drama by which so many field-theoretic questions
bear on old historic questions; indeed, the subject's history is a dramatic
story in its own right.
Recall that a field is an integral domain in which every non-zero element
is a unit. Examples are:
1. The field of rational numbers – in fact the “field of fractions” of an
arbitrary integral domain D (that is, the localization DS where S =
D − {0}). A special case is the field of fractions of the polynomial
domain F [x]. This field is called the field of rational functions
over F in the indeterminate x (even though they are not functions at
all).
2. The field of real numbers.
3. The field of complex numbers.
4. The field of integers modulo a prime – more generally, the field D/M
where M is a maximal ideal of an integral domain D.
Of course there are many further examples.
In the past few chapters, we have been passing from genus to species to
subspecies. We considered integral domains, then principal ideal domains,
and now fields. As we get more specialized by adding more axioms, one
might suppose that we are seeing the same theory, with the proofs getting
easier and easier. But what actually happens is that each theorem rises to
its appropriate level of generality, so that with each new speciation of rings,
an entirely new generation of questions becomes feasible. Thus for arbitrary
integral domains, divisibility is an important question, and hence the UFD
condition, which fails for general domains but works for PID’s, is of interest.
But for fields, divisibility questions are almost meaningless since every nonzero element divides every other such element. The new generation of feasible
questions for fields involves some of the oldest and longest-standing problems
in mathematics – they all involve the existence or non-existence of roots of
polynomial equations in one indeterminate.
11.2 Elementary Properties of Field Extensions.
Let K be a field. A subfield is a subset F of K which contains a non-zero
element, and is closed under addition, subtraction, multiplication, and the taking
of multiplicative inverses of non-zero elements. It follows at once that F is a field with respect to
these operations inherited from K, and that F contains the multiplicative
identity of K as its own multiplicative identity. Obviously, the intersection
of any collection of subfields of F is a subfield, and so one may speak of the
subfield generated by a subset X of a field F as the intersection of all
subfields of F containing set X. The subfield of K generated by F and X is
denoted F (X).¹
The subfield generated by zero is called the prime subfield of F . (By
definition, it contains the multiplicative identity as well as zero.) Since a field
is a species of integral domain, it has a characteristic. If the multiplicative
identity element 1, as an element of the additive group (F, +) has finite order,
then its order is a prime number p which we designate as the characteristic
of F in that case. Otherwise 1 has infinite order as an element of (F, +) and
we say that F has characteristic zero in that case. We write char(F ) =
p or 0 in the respective cases. In the former case, it is clear that the prime
subfield of F is just Z/pZ; in the latter case the prime subfield is isomorphic
to the field Q of rational numbers.
Since we shall often be dealing in this chapter with cases in which the
subfield is known but the over-field is not, it is handy to reverse the point of
view of the previous paragraph. We say that K is an extension of F if and
only if F is a subfield of K. A chain of fields F1 ≤ F2 ≤ · · · ≤ Fn , where
each Fi+1 is a field extension of Fi , is called a tower of fields.
If the field K is an extension of the field F , then K is a right module
over F , that is, a vector space over F . As such, it possesses a dimension as
an F -vector space, which we call the degree of the extension F ≤ K, and
denote by the symbol [K : F ].
11.2.1 Algebraic and Transcendental Extensions.
Let K be a field extension of the field F , and let p(x) be a polynomial in
F [x]. We say that an element α of K is a zero of p(x) (or a root of the
equation p(x) = 0) if and only if p(x) ≠ 0 and yet p(α) = 0.² If α is a
zero in K of a polynomial in F [x], then α is said to be algebraic over F .
Otherwise α is said to be transcendental over F .
¹ Of course this notation suffers the defect of leaving K out of the picture. Accordingly
we shall typically be using it when the ambient overfield K of F is clearly understood from
the context.
² Two remarks are appropriate at this point.
First note that the definition makes it "ungrammatic" to speak of a zero of the zero
polynomial. Had the non-zero-ness of p(x) not been inserted in the definition of root, we
should be saying that every element of the field is a zero of the zero polynomial. That is
such a bizarre difference from the situation with non-zero polynomials over infinite fields
that we should otherwise always be apologizing for that exceptional case in the statements
of many theorems. It thus seems better to utilize the definition to weed out this
awkwardness in advance.
Second, there are natural objections to speaking of a "root" of a polynomial. One
of my colleagues (an editor of the Intelligencer whose opinion I regard most highly) asks
whether it might not be better to follow the German example and write "zero" (Nullstelle)
for "root" (Wurzel), despite usage of the latter term in English versions of Galois Theory.
I will follow tradition in speaking of a "root" of an equation p(x) = 0, but the zero of a
polynomial.
If K is an extension of F , and if α is an element of K, then the symbol
F [α] denotes the subring of K generated by F ∪ {α}. This would be the set
of all elements of K which are F -linear combinations of powers of α. Note
that as this subring contains the field F , it is a vector space over F whose
dimension is again denoted [F [α] : F ].
There is then a ring homomorphism
ψ : F [x] → F [α] ⊆ K
which takes the polynomial p(x) to the field element p(α) in K. Clearly
ψ(F [x]) = F [α].
Suppose now ker ψ = 0. Then F [x] ≅ F [α] and α is transcendental since
any non-trivial polynomial of which it is a zero would be a polynomial of
positive degree in ker ψ.
On the other hand, if ker ψ ≠ 0, then, as F [x] is a principal ideal domain,
we have ker ψ = F [x]p(x), for some polynomial p(x). Then
F [α] = ψ(F [x]) ≅ F [x]/ ker ψ = F [x]/(p(x)).
Now, as F [α] is an integral domain, the principal ideal (p(x)) is a prime
ideal, and so p(x) is a prime element of F [x]. But since F [x] is a UFD
(not to mention even simpler reasons) p(x) is an irreducible element of F [x].
Consequently the ideal (p(x)) = ker ψ is actually a maximal ideal of F [x],
and so the subring F [α] is a field.
The irreducible polynomial p(x) determined by α is unique up to multiplication by a non-zero scalar. The associate of p(x) which is monic is denoted
IrrF (α). It is the unique monic polynomial of smallest possible degree
which has α for a zero.
Note that in this case, the F -dimension of F [α] is the F -dimension of
F [x]/(p(x)), which, by the previous chapter, can be recognized to be the
degree of the polynomial p(x).
Now conversely, assume that [F [α] : F ] is finite. Then there exists a
non-trivial finite F -linear combination of elements in the infinite list
{1, α, α², α³, . . .}
which is zero. It follows that α is algebraic in this case.
Summarizing, we have
Theorem 209 Let K be an extension of the field F and let α be an element
of K.
1. If α is transcendental over F , then the subring F [α] which it generates
is isomorphic to the integral domain F [x]. Moreover, the subfield F (α)
which it generates is isomorphic to the field of fractions of F [x], the
field of rational functions over F .
2. If α is algebraic over F , then the subring F [α] is a subfield intermediate
between K and F . As an extension of F its degree is
[F [α] : F ] = deg IrrF (α).
3. Thus dimF F [α] is finite or infinite according as α is algebraic or transcendental over F .
11.2.2 Indices Multiply: Ruler and Compass Problems.
Lemma 210 If F ≤ K ≤ L is a tower of fields, then the indices “multiply”
in the sense that
[L : F ] = [L : K][K : F ].
proof: Let A and B be bases of L as a vector space over K, and K as a
vector space over F , respectively. Let AB be the set of all products {ab|a ∈
A, b ∈ B}. We shall show that AB is an F -basis of L.
If α ∈ L, then α is a finite K-linear combination of elements of A, say
α = a1 α1 + · · · + am αm .
Now each coefficient αi , being an element of K, is a finite linear combination
αj = bj1 βj1 + · · · + bjmj βjmj
of elements bij of B (with the βij in F ). Substituting these expressions for
αj in the L-linear combination for α, we have expressed α as an F -linear
combination of elements of AB. Thus AB is an F -spanning set for L.
It remains to show that AB is an F -linearly independent set. So suppose
for some finite subset S, that
Σ_{(a,b)∈S} ab βa,b = 0.
Then as each βa,b is in F , and each b is in K, the left side of the presented
formula may be regarded as a K-linear combination of a's equal to zero, and
so, by the K-linear independence of A, each coefficient
Σ_b b βa,b
of each a is equal to zero. Hence each βa,b = 0, since B is F -linearly independent.
Thus AB is F -linearly independent and hence is an F -basis for L.
Since this entails that all the elements ab of AB are distinct, we see that
|AB| = |A × B| = |A||B|,
which proves the theorem.
Corollary 211 If F1 ≤ F2 ≤ · · · ≤ Fn is a finite tower of fields, then
[Fn : F1 ] = [Fn : Fn−1 ] · [Fn−1 : Fn−2 ] · · · [F2 : F1 ].
We can use this observation to sketch a proof of the impossibility of certain
ruler and compass constructions. Given a unit length 1, we can replicate it
m times, or divide it into n equal parts, and so can form all lengths which
are rational numbers. We can also form right-angled triangles inscribed in a
circle and so with ruler and compass, we can extract the square root of the
differences of squares, and hence any square root because of the formula
c = ( (c + 1)/2 )^2 − ( (c − 1)/2 )^2 .
If α = √c is such a square root, where c is a rational length, then by ruler and
compass we can form all the lengths in the ring Q[α], which (since α is a
zero of x^2 − c) is a field extension of Q of degree 1 or 2. Iterating these
constructions a finite number of times, the possible lengths that we could
encounter all lie in the uppermost field of a tower of fields
Q = F1 < F2 < · · · < Fn = K
with each Fi+1 an extension of degree two over Fi . By the Corollary above,
the index [K : Q] is a power of 2.
Now we come to the arm-waving part of the proof: One needs to know
that once a field K of constructible distances has been achieved, any
new number not in K constructed entirely from old numbers already in K is
in fact obtained by the construction of a missing side of a right triangle, two
of whose sides have lengths in K. (A really severe proof of that fact would
require a formalization of exactly what “ruler and compass” constructions are
– a logical problem beyond the field-theoretic applications whose interests are
being advanced here.) In this sketch, we assume that that has been worked
out.
This means, for example, that we cannot find by ruler and compass alone
a length α which is a zero of an irreducible polynomial whose degree n contains
an odd prime factor. For if we could, Q[α] would be a subfield of a field K of
degree 2^m over Q. On the other hand, the index n = [Q[α] : Q] must divide
[K : Q] = 2^m , by Lemma 210 (together with Theorem 209). But the odd prime in n cannot divide
2^m . That’s it! That’s the whole amazingly simple argument!
Thus one cannot solve the problem posed by the oracle of Delos, to “duplicate” (in volume) a cubic altar – i.e., to find a length α such that α^3 − 2 = 0
– at least not with ruler and compass.
Similarly, one cannot always trisect a given angle α with ruler and compass. If one could,
one could construct the length cos β where 3β = α. But cos(3β) = 4 cos^3 β −
3 cos β = λ = cos α. This means we could always find a zero of 4x^3 − 3x − λ,
and when α = 60°, so that λ = 1/2, setting y = 2x yields a constructed zero of
y^3 − 3y − 1, which is irreducible over Q. As observed above, this is impossible.
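The irreducibility of y^3 − 3y − 1 over Q invoked above is easily checked by hand (a cubic with no rational roots, as ±1 fail) or, as a sketch assuming sympy, by machine:

    from sympy import symbols, factor_list

    y = symbols('y')
    print(factor_list(y**3 - 3*y - 1))
    # (1, [(y**3 - 3*y - 1, 1)]): a single irreducible factor of degree 3.
    # Since 3 is odd, no tower of quadratic extensions contains a zero,
    # so the 60-degree angle cannot be trisected with ruler and compass.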
11.2.3 The Number of Zeros of a Polynomial
Lemma 212 Let K be an extension of the field F and suppose the element
α of K is a zero of the polynomial p(x) in F [x]. Then in the polynomial ring
F [α][x], x − α divides p(x).
proof: Since p(α) = 0 we may write
p(x) = a0 + a1 x + · · · + an x^n
     = p(x) − p(α)
     = a1 (x − α) + a2 (x^2 − α^2 ) + · · · + an (x^n − α^n )
     = (x − α)[ a1 + a2 (x + α) + a3 (x^2 + αx + α^2 ) + · · · + an (x^{n−1} + · · · + α^{n−1} ) ].
Theorem 213 Let K be an extension of the field F and let p(x) be a polynomial in F [x]. Then K has at most deg p(x) zeros of p(x) (even counting
multiplicities).
proof: There is no loss in assuming that F = K and that p(x) is monic.
Since K[x] is a UFD, p(x) has a factorization into irreducible factors as
p(x) = (x − α1 )n1 · · · (x − αr )nr g1 (x) · · · gk (x),
where the gi (x) are each irreducible in K[x] of degree at least 2. Then each
of the αi is a zero of p(x). Conversely, by Lemma 212, each zero α produces a linear
(degree one) factor (x − α) of p(x), and so α must be one of the αi by
uniqueness of the factorization.
The number ni is called the multiplicity of the zero αi , and clearly
∑ ni ≤ deg p(x),
proving the theorem.
For the next result, we require this definition: a group G is said to be
locally cyclic if and only if every finitely-generated subgroup of G is cyclic.
Clearly such a group is abelian.
Corollary 214 Let F be any field and let F ∗ be its multiplicative group of
non-zero elements. Then the torsion subgroup of F ∗ is locally cyclic.
proof: Any finitely-generated subgroup of the torsion subgroup of F ∗ is a
finite abelian group A. If A were not cyclic, it would contain a subgroup
isomorphic to Zp × Zp , for some prime p. In that case, the polynomial x^p − 1
would have at least p^2 zeros in F , contrary to Theorem 213.
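For a finite field the whole of F ∗ is torsion, so Corollary 214 says F ∗ is cyclic. A brute-force check in pure Python for F = Z/pZ (the helper is_generator below is ours, not standard library code):

    def is_generator(g, p):
        # g generates (Z/pZ)* iff its powers hit all p - 1 nonzero residues
        seen, x = set(), 1
        for _ in range(p - 1):
            x = (x * g) % p
            seen.add(x)
        return len(seen) == p - 1

    p = 23
    print([g for g in range(1, p) if is_generator(g, p)])
    # [5, 7, 10, 11, 14, 15, 17, 19, 20, 21] -- ten generators, as predicted:
    # a cyclic group of order 22 has phi(22) = 10 generators.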
Corollary 215 (Polynomial Identities) Suppose a(x), b(x), c(x) and d(x)
are polynomials in F [x]. Suppose a(b(α)) = c(d(α)) for all α ∈ F . If F
contains infinitely many elements, then
a(b(x)) = c(d(x))
is a polynomial identity — that is, both sides are the same element of F [x].
proof: We need only show that the polynomial J(x) := a(b(x)) − c(d(x))
is the zero polynomial. If it is not, then by Theorem 213 it possesses
only finitely many zeros. But by hypothesis the infinitely many elements of F would all be zeros of J(x), a contradiction.
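The hypothesis that F be infinite is essential, as a one-line pure-Python experiment over Z/pZ shows:

    # Over GF(p) the functions a -> a**p and a -> a agree at every point
    # (Fermat's little theorem), yet x**p and x are distinct polynomials.
    p = 5
    print(all(pow(a, p, p) == a % p for a in range(p)))   # True

So over a finite field, equality of polynomial functions does not imply equality of polynomials.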
11.3 Splitting Fields and Their Automorphisms
11.3.1 Extending Isomorphisms Between Fields
Suppose E is a field extension of F and that α is an element of E which is
algebraic over F . Then the development in the previous section showed that
there is a unique irreducible monic polynomial,
g(x) = IrrF (α) ∈ F [x],
having α as a zero. The subfield F (α) generated by F ∪ {α} is the subring
F [α], which is isomorphic to the factor ring F [x]/(g(x)).
Now suppose f : F1 → F2 is an isomorphism of fields. Then f can be
extended to a ring isomorphism
f ∗ : F1 [x] → F2 [x],
which takes the polynomial
p := a0 + a1 x + · · · + an x^n
to
f ∗ (p) := f (a0 ) + f (a1 )x + · · · + f (an )x^n .
Then of course f ∗ takes irreducible polynomials of F1 [x] to irreducible polynomials of F2 [x].
With this notation in mind we record
Lemma 216 (Fundamental Lemma on Extending Isomorphisms) Let E1 be
a field extension of F1 , and suppose f : F1 → E is an isomorphism which
embeds F1 as a subfield F2 of some other field E. Let f ∗ be the extension of
the field isomorphism f : F1 → F2 to a ring isomorphism F1 [x] → F2 [x], as
described above.
Finally, let α1 be an element of E1 which is algebraic over F1 , and let g1
be its monic irreducible polynomial in F1 [x].
Then the following assertions hold:
1. The embedding (injective morphism) f : F1 → E can be extended to an
embedding
fˆ : F1 (α1 ) → E,
if and only if the polynomial f ∗ (g1 ) := g2 has a zero in E.
2. Moreover, for each zero α2 of g2 in E, there is a unique embedding
fˆ : F1 (α1 ) → E, taking α1 to α2 and extending f .
3. The number of extensions fˆ : F1 (α1 ) → E of the embedding f is equal
to the number of distinct zeros of g2 to be found in E.
proof: 1. If there is an embedding f̂ : F1 (α1 ) → E extending the embedding
f , then fˆ(α1 ) is a zero of g2 in E.
Conversely, if α2 is a zero of g2 in E, then F2 (α2 ) is a subfield of E
isomorphic to F2 [x]/(g2 (x)). Similarly, F1 (α1 ) is isomorphic to F1 [x]/(g1 (x)).
But the ring isomorphism f ∗ takes the maximal ideal (g1 (x)) of F1 [x] to
the maximal ideal (g2 (x)) of F2 [x], and so induces an isomorphism of the
corresponding factor rings:
f ′ : F1 [x]/(g1 (x)) → F2 [x]/(g2 (x)).
Linking up these three isomorphisms,
F1 (α1 ) → F1 [x]/(g1 (x)) → F2 [x]/(g2 (x)) → F2 (α2 ) ⊆ E,
(the middle arrow being f ′ ) yields the desired embedding.
2. If there were two embeddings F1 (α1 ) → E, taking α1 to α2 and extending f , then one composed with the inverse of the other would fix F1 ∪ {α1 }
and hence F1 (α1 ) element-wise. Similarly, composition of the inverse of one
with the other would be the identity mapping on F2 (α2 ). It follows that the
two embeddings are equal.
3. This statement now follows from 1. and 2.
11.3.2 Splitting Fields.
Suppose E is an extension field of F . Then, of course, any polynomial in
F [x] can be regarded as a polynomial in E[x], and any factorization of it in
F [x] is a factorization in E[x]. Put another way, F [x] is a subdomain of E[x].
A polynomial p(x) in F [x] of positive degree is said to split over E if and
only if p(x) factors completely into linear factors (that is, factors of degree
1) in E[x]. Such a factorization has the form
p(x) = a (x − a1 )^{n1} (x − a2 )^{n2} · · · (x − ar )^{nr}    (11.1)
where a is a nonzero element of F , the exponents ni are positive integers,
and {a1 , . . . , ar } is the collection of distinct zeros of p(x) in E (also called
roots of p(x) = 0).
Obviously, if E ′ is a field containing E and p(x) splits over E, then it splits
over E ′ . Also, if E0 were a subfield of E containing F together with all of the
zeros {a1 , . . . , ar }, then p(x) splits over E0 since the factorization in equation
(11.1) can take place in E0 [x]. In particular, this is true if E0 = F (a1 , . . . , ar ),
the subfield of E generated by F and the zeros ai , i = 1, . . . , r.
An extension field E of F is called a splitting field of the polynomial
p(x) ∈ F [x] over F if and only if
1. F ⊆ E.
2. p(x) factors completely into linear factors in E[x] as
p(x) = a(x − a1 )(x − a2 ) · · · (x − an ).
3. E = F (a1 , . . . , an ), that is, E is generated by F and the zeros ai .
The following observations follow directly from the definition just given
and the proofs are left as an exercise.
Lemma 217 The following hold:
1. If E is a splitting field for p(x) ∈ F [x] over F , and L is a subfield of
E containing F , then E is a splitting field for p(x) over L.
2. Suppose p(x) and q(x) are polynomials in F [x]. Suppose E is a splitting
field for p(x) over F and K is a splitting field for q(x) over E. Then
K is a splitting field for p(x)q(x) over F .
Our immediate goals are to show that splitting fields exist, to show that
they are unique up to isomorphism, and to show that between any two such
splitting fields, the number of isomorphisms is bounded by a function of
the number of distinct zeros of f (x).
Theorem 218 (Existence of Splitting Fields) If f (x) ∈ F [x], then a
splitting field for f (x) over F exists. Its degree over F is at most d!, where
d is the sum of the degrees of the non-linear irreducible polynomial factors of
f (x).
proof: Let f (x) = f1 (x)f2 (x) · · · fm (x) be a factorization of f (x) into irreducible factors in F [x]. Set n = deg f (x), and proceed by induction on
k = n − m. If k = 0, then n = m, so each factor fi (x) is linear and clearly
E = F satisfies all the conclusions of the theorem. So assume k > 0. Then
some factor, say f1 (x), has degree greater than 1. Form the field
L = F [x]/(f1 (x))
(this is a field since f1 (x) is irreducible, and it is an extension of F via the
natural embedding of F into it), and observe that the coset a := x + F [x]f1 (x) is a zero
of f1 (x) (and hence of f (x)) in L. Thus in L[x], we have the factorizations:
f1 (x) = (x − a)h1 (x),
and
f (x) = (x − a)h1 (x)f2 (x) · · · fm (x),
which, upon factoring h1 (x) further if necessary, exhibits f (x) as a product of
ℓ > m irreducible factors in L[x]. Thus n − ℓ < n − m and so by induction,
there is a splitting field E of f (x) over L. Moreover, this degree is at most
(d − 1)! since the sum of the degrees of the non-linear irreducible factors of
f (x) over L[x] has been reduced by at least one because of the appearance of the
new linear factor x − a.
Now we claim that E is a splitting field of f (x) over F . Property (1.)
holds since L ⊆ E implies F ⊆ E. We already have (2.) from the definition
of E. It remains to see that E is generated by the zeros of f . Since (x − a)
is one of the factors of f (x) in the factorization
f (x) = (x − a1 )(x − a2 ) · · · (x − an ),
we may assume without loss of generality that a = a1 . We have E =
L(a1 , . . . , an ) from the definition of E. But L = F (a), so
E = L(a1 , . . . , an ) = F (a1 )(a1 , . . . , an ) = F (a1 , . . . , an ).
Thus E is indeed a splitting field for f (x) over F .
Finally, since [L : F ] = deg f1 (x) ≤ d, and [E : L] ≤ (d − 1)! we obtain
[E : F ] = [E : L][L : F ] ≤ (d − 1)! · d = d! as required.
Theorem 219 Let η : F → F̄ be an isomorphism of fields, which we extend
to a ring isomorphism η ∗ : F [x] → F̄ [x], and we write f̄ (x) for η ∗ (f (x)) for
each f (x) ∈ F [x]. Suppose E and Ē are splitting fields of f (x) over F and
of f̄ (x) over F̄ , respectively. Then η can be extended to an isomorphism
η̂ of E onto Ē, and the number of ways of doing this is at most the index
[E : F ]. There are exactly [E : F ] such isomorphisms if f̄ (x) has distinct
zeros in Ē; or, equivalently, f (x) has distinct zeros in E.
proof: We use induction on [E : F ]. If [E : F ] = 1, f (x) factors into linear
factors, and there is precisely one isomorphism extending η, namely η itself.
Assume [E : F ] > 1. Then f (x) contains an irreducible factor g(x) of
degree greater than 1. Since η ∗ is an isomorphism of polynomial rings, ḡ(x) is
an irreducible factor of f̄ (x) with the same degree as g(x). Thus Ē contains
a zero of ḡ(x), and E contains a zero r1 of g(x). Form K = F (r1 ). Then
by the Fundamental Lemma on Extensions (Lemma 216), the isomorphism
η has k extensions ζ1 , . . . , ζk : F (r1 ) → Ē, where k is the number of distinct
zeros of ḡ(x) in Ē.
Now if m = deg g(x), [K : F ] = m and k ≤ m. If the zeros of ḡ(x) are
distinct, k = m.
Now clearly E is a splitting field of f (x) over K, and Ē is a splitting field
of f̄ (x) over each ζi (K). Since [E : K] < [E : F ], induction implies each ζi
can be extended to an isomorphism E → Ē in at most [E : K] ways, and in
exactly [E : K] ways if f (x) has distinct zeros in E. This
yields at most k[E : K] ≤ [K : F ][E : K] = [E : F ] isomorphisms in general,
and exactly [E : F ] isomorphisms if f (x) has distinct zeros in E. Since we
possess at least one field isomorphism from E to Ē, the last statement is
equivalent to f¯(x) having distinct zeros in Ē. This proves the theorem.
Corollary 220 If E1 and E2 are two splitting fields for f (x) ∈ F [x] over F ,
there exists an F -linear field isomorphism:
σ : E1 → E2 .
Proof: This follows immediately from Theorem 219, taking η = 1F ,
the identity mapping on F = F̄ .
EXAMPLE. Suppose F = Q, the rational field, and p(x) = x^4 − 5,
irreducible by Eisenstein’s criterion. Then if α is a zero of p(x), α^2 is ±√5,
so the splitting field E of p(x) over Q must contain the subfield Q(√5). In
Q(√5)[x] we have a factorization
x^4 − 5 = (x^2 + √5)(x^2 − √5).
Noting that a complex square root of −√5 is i·5^{1/4}, the full factorization over
E[x] is
x^4 − 5 = (x + i·5^{1/4} )(x − i·5^{1/4} )(x + 5^{1/4} )(x − 5^{1/4} ),
where i^2 = −1 and 5^{1/4} denotes the positive real fourth root of 5. We see
that
E = Q(5^{1/4} , i)
and
[E : Q] = [E : Q(5^{1/4} )] · [Q(5^{1/4} ) : Q] = 2 · 4 = 8.
This shows that the bound [E : F ] ≤ d! need not be attained.
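As a sanity check of this example (a sketch assuming sympy), the element θ = 5^{1/4} + i generates E = Q(5^{1/4}, i), so its minimal polynomial over Q should come out with degree 8:

    from sympy import I, Rational, symbols, minimal_polynomial, degree

    x = symbols('x')
    theta = 5 ** Rational(1, 4) + I        # a primitive element of E
    print(degree(minimal_polynomial(theta, x), x))   # 8 = [E : Q]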
We close this subsection with a useful observation:
Lemma 221 Suppose K is an extension of a field F , and K contains a
subfield E which is a splitting field for a polynomial p(x) ∈ F [x] over F .
Then any automorphism of K which fixes F pointwise, must stabilize E.
Remark: The Lemma just says that the splitting field E is “characteristic”
among fields in K which contain F – that is, E is invariant under all F -linear
automorphisms of K.
Proof: If σ is an F -linear automorphism of K, then for any zero α of p(x)
in K, σ(α) is also a zero of p(x) in K. Thus σ permutes the zeros of p(x);
since E is generated by F together with these zeros, σ stabilizes E.
11.3.3 Normal Extensions.
So far the notion of splitting field is geared to a particular polynomial. The
purpose of this section is to show that it can be seen as a property of a field
extension itself, independent of any particular polynomial.
Before proceeding further, let us streamline our language concerning field
automorphisms.
Definition: Let K and L be fields containing a common subfield F . An
isomorphism K → L of K onto L is called an F -isomorphism if and only
if it fixes the subfield F elementwise. (Heretofore, we have been calling these
F -linear isomorphisms.) Of course, if K = L, an F -isomorphism:K → L
is called an F -automorphism. Finally, an F -isomorphism of K onto a
subfield of L is called an F -embedding.
We say that a finite extension E of F is normal over F if and only if E
is the splitting field over F of some polynomial of F [x]. Just as a reminder,
recall that this means that there is a polynomial p(x) which splits completely
into linear factors in E[x] and that E is generated by F and all the zeros of
p(x) that lie in E.
Note that if E is a normal extension of F and K is an intermediate field –
that is, F ≤ K ≤ E – then E is a normal extension of K.
We have a criterion for normality.
Theorem 222 (A characterization of normal field extensions³) A finite extension E of F is normal if and only if every irreducible polynomial g(x) in
F [x] which has one zero in E in fact splits completely over E.
proof: Suppose E has the described property concerning irreducible polynomials of F [x]. Since E is a finite extension of F , it has an F -basis, say
u1 , . . . , un . Set gi (x) = IrrF (ui ), and let p(x) be the product of the gi (x).
Then by hypothesis, every zero of p(x) is in E and p(x) splits into linear factors in E[x]. Since the {ui } are among these zeros and the ui and F generate
E, E is a splitting field of p(x) over F .
³ In many books, the characterizing property given in this theorem is taken to be the
definition of “normal extension”. This does not change the fact that the equivalence of
two distinct notions must be proved.
Suppose, on the other hand, that E is the splitting field of a polynomial
f (x) ∈ F [x]. Suppose, by way of contradiction, that g(x) is an irreducible
polynomial in F [x] with one zero a in E, and one zero b not in E. We may
regard b as a zero in a field E(b), properly extending E.
Now since g(x) is irreducible, there is an isomorphism σ : F (a) → F (b)
which is the identity mapping when restricted to F .
Now E is the splitting field for a polynomial f (x) ∈ F [x]. Thus E is a
splitting field for f (x) over F (a). But also E(b) is a splitting field for f (x) over
F (b) since E(b) is generated by F (b) and all zeros of f (x). By Theorem 219,
the isomorphism σ extends to an F -linear isomorphism τ : E → E(b). It
follows that E and E(b) have the same degree over F . But as E is a subfield
of E(b), we have E = E(b), so b ∈ E, contrary to our choice of b. Thus all
zeros of g(x) lie in E, as required.
Corollary 223 If E is a (finite) normal extension of F and K is any intermediate field, then any F -embedding
σ:K→E
can be extended to an automorphism of E fixing F .
proof: E is a splitting field over F for a polynomial f ∈ F [x]. Thus E is a
splitting field for f (x) over K as well as over σ(K). The result then follows from
Theorem 219.
Suppose K/F is a finite extension. Then K = F (a1 , . . . , an ) for some
finite set of elements {ai } of K (for example, an F -basis of K).
Let p(x) be the product of the gi (x) := IrrF (ai ). Now any normal
extension E/F containing K as an intermediate field contains every zero of
each gi (x) and hence every zero of p(x). Thus between K and E is a splitting field
S of p(x) over F . The splitting field S is the “smallest” normal extension
of F containing K in the sense that any other finite normal extension E ′ of
F which contains K contains an isomorphic copy of S – that is, there is
an injective morphism S → E ′ whose image contains K. Now this global
description of the field S is independent of the polynomial p(x), and so we
have
Corollary 224 (Normal Closure) If K is a finite extension of F , then there
is a normal extension S of F , containing K with the property that for every
normal extension E of F containing K, there is an F -isomorphism of S to
a subfield of E which is normal over F and contains K. Clearly S is unique
up to isomorphism.
The extension S of F so defined is called the normal closure of K over
F . The reader should bear in mind that this notion depends critically on F .
For example it may happen that K is not normal over F , but is normal over
some intermediate field L. Then the normal closure of K over L is just K
itself while the normal closure of K over F could be larger than K.
Another consequence of Corollary 223 is this:
Corollary 225 (Normality of invariant subfields) Suppose K is a normal
extension of the field F of finite degree, and let G be the full group of F -automorphisms of K. If L is a subfield of K containing F , then L is normal
over F if and only if it is G-invariant.
proof: If L is normal over F , then L is the splitting field for some polynomial
f (x) ∈ F [x]. But the roots of f (x) in K are permuted among themselves by
G, and so the subfield over F that they generate, namely L, is G-invariant.
On the other hand, assume L is G-invariant. Suppose p(x) is an irreducible polynomial in F [x] with at least one zero α in L. Suppose β were
another zero of p(x) in K. Then there is an F -isomorphism F [α] → F [β] of
subfields of K, which, by Corollary 223, can be extended to an element of G.
Since L is G-invariant, β ∈ L. Thus p(x) has all its zeros in L and so splits
completely over L. Now by Theorem 222, L is normal over F .
11.4 Some Applications to Finite Fields
11.4.1 Prime Subfields
For any field F there is a unique subfield P generated by the multiplicative
identity element 1. It is called the prime subfield of F and is uniquely
defined by the field F . Of course it must contain the following set of field
elements
N = {0, 1, 1 + 1, 1 + 1 + 1, . . .}
as well as the additive inverses of elements of this set, −N . There are two
cases: (1) N is a finite set, and (2) N is an infinite set. In the former case,
there exists a smallest positive integer n for which
1 + 1 + · · · + 1 with n summands
is the zero element of F . In the latter case, there exists no such integer n.
(In language that we have understood earlier, we are discussing the order of
the element 1 in the additive group (F, +) of the field F . In the one case 1
has finite order n, and in the second case 1 has infinite order.)
The reader should recognize that, since F is an integral domain, the two
cases being discussed simply describe the characteristic of F . In case (1), n
must be a prime number. But that means that the set N is itself a field, so
N = P ≅ Z/pZ,
where p is a prime number. In case (2), M := N ∪ −N is a copy of the
ring of integers contained in P . It follows that F contains the set M −1 ,
of multiplicative inverses of non-zero elements of M , and that M M −1 is
isomorphic to the field of rational numbers.
Lemma 226 In any field, the prime subfield is either the field Z/pZ for
some prime p, or the field of rational numbers.
The algebraic definition of the prime subfield as the subfield generated
by 1, forces an invariance under any isomorphism:
Corollary 227 Any field isomorphism φ : K → L induces an isomorphism
P (K) → P (L) between their prime subfields.
11.4.2 Finite fields
Suppose now that F is a finite field – that is, one which contains finitely many
elements. Then its prime subfield P is the field Z/pZ, where p
is a prime, and F is a finite-dimensional vector space over P –
say, of dimension n. Then clearly |F | = p^n = q, a prime power.
It now follows from Corollary 214 that the finite multiplicative group F ∗
of non-zero elements of F is a cyclic group of order q − 1. This means that
F contains all the q − 1 roots of the equation x^{q−1} − 1 = 0 in P [x], and since
F is generated by its set of non-zero elements, we have
Lemma 228 F is a splitting field of the polynomial x^q − x in P [x].
It follows immediately that F is uniquely determined up to P -isomorphism
by the prime-power q alone.
Corollary 229 For any given prime-power q, there is, up to isomorphism,
exactly one field with q elements.
One denotes any member of this isomorphism class by the symbol GF(q).
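For example, GF(4) can be realized concretely as GF(2)[t]/(t^2 + t + 1). A pure-Python model (our own throw-away representation: pairs (a, b) standing for a + bt):

    def add(u, v):                       # addition is coordinatewise mod 2
        return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

    def mul(u, v):                       # (a+bt)(c+dt), using t**2 = t + 1
        a, b = u; c, d = v
        bd = b * d
        return ((a * c + bd) % 2, (a * d + b * c + bd) % 2)

    def power(u, n):
        r = (1, 0)
        for _ in range(n):
            r = mul(r, u)
        return r

    elems = [(a, b) for a in (0, 1) for b in (0, 1)]
    # Every element is a zero of x**4 - x; equivalently u**3 = 1 for u != 0:
    print(all(power(u, 3) == (1, 0) for u in elems if u != (0, 0)))   # True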
11.4.3 Automorphisms of finite fields
Suppose F is a finite field with exactly q elements, that is, F ≅ GF(q).
We have seen that char (F ) = p, a prime number, where, for some positive
integer n, q = p^n . We know from Theorem 237 of Subsection 10.4.1 that the
“p-th power mapping” σ : F → F is a ring endomorphism. Its kernel is trivial
since F contains no non-zero nilpotent elements. Thus σ is injective, and so,
by the pigeon-hole principle, is bijective – that is, it is an automorphism of
F . Note that σ^n is the identity, since α^{p^n} = α^q = α for every α ∈ F .
Now suppose σ had order k ≤ n. Then α^{p^k} = α for all elements α of F ,
so all p^n elements of F are roots of x^{p^k} − x = 0. But as there are at
most p^k such roots, we see that k = n. Thus σ
generates a cyclic subgroup of Aut(F ) of order n.
Now let τ be an arbitrary automorphism of F . The multiplicative
group C of non-zero elements of F is cyclic of order q − 1. Let θ be a generator
of C. Then there exists an integer t, 0 < t < q − 1, such that τ (θ) = θ^t .
Then, as τ is an automorphism, τ (θ^i ) = (θ^i )^t . Since 0^t = 0 = τ (0), the
automorphism τ is the power mapping α → α^t on F . Since
α^t + 1 = (α + 1)^t = α^t + t α^{t−1} + · · · + t α + 1,
we have that if t > 1, all of the q − 1 elements α in C are roots of the equation
∑_{k=1}^{t−1} (t choose k) x^k = 0.    (11.2)
If the polynomial on the left were not identically zero, its degree t − 1 would
be at least as large as the number of its distinct zeros, which is at least q − 1.
That is outside the range of t. Thus each of the binomial coefficients in the
equation is zero. Thus either p divides t or t = 1.
Now if t = p^k · s, where s is not divisible by p, we could apply the argument
of the previous paragraph to ρ := τ ◦ σ^{−k} , which is the power mapping
α → α^s , to conclude that s = 1. Thus t is a power of p, and so τ is a power of σ.
We have shown the following:
Corollary 230 If F is the finite field GF(q), q = p^n , then the full automorphism group of F is cyclic of order n and is generated by the p-th power
mapping.
remarks: The argument above, that Aut(F ) = hσi, could be achieved in
one stroke by a principle to be introduced later, called the “independence of
F -characters”. The above argument uses only the most elementary properties
introduced so far.
We see at this point that [F : P ] = |Aut(F )|. In the language of a later
section, we would say that F is a “Galois extension” of P . It will have very
strong consequences for us. For one thing, it will eventually mean that no
irreducible polynomial of positive degree in GF(q)[x] can have repeated zeros
in any extension field.
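One can watch Corollary 230 in miniature on GF(9) = GF(3)[t]/(t^2 + 1), again with a hand-rolled pure-Python model: the 3rd-power map σ is additive and multiplicative, is not the identity, and σ^2 is.

    def add(u, v):
        return ((u[0] + v[0]) % 3, (u[1] + v[1]) % 3)

    def mul(u, v):                       # (a+bt)(c+dt) with t**2 = -1
        a, b = u; c, d = v
        return ((a * c - b * d) % 3, (a * d + b * c) % 3)

    def frob(u):                         # sigma(u) = u**3, the Frobenius map
        return mul(u, mul(u, u))

    elems = [(a, b) for a in range(3) for b in range(3)]
    print(all(frob(add(u, v)) == add(frob(u), frob(v)) for u in elems for v in elems))
    print(all(frob(mul(u, v)) == mul(frob(u), frob(v)) for u in elems for v in elems))
    print(any(frob(u) != u for u in elems))          # sigma is not the identity,
    print(all(frob(frob(u)) == u for u in elems))    # but sigma squared is: order n = 2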
11.4.4 Polynomial equations over finite fields: the Chevalley–Warning Theorem
The main result of this section is a side-issue insofar as it doesn’t really play
a role in the development of the Galois theory, as do the chain of sections
before and after this one. However, it represents a property of finite fields
which is too important not to mention before leaving a section devoted to
these fields. The student wishing to race on to the theory of Galois extensions
and solvability of equations in radicals may safely skip this section, hopefully
for a later revisitation.
For this subsection, fix a finite field K = GF(q) of characteristic p and
order q = p^ℓ . Now for any natural number k, let
S(k, q) := ∑_{a∈K} a^k ,
the sum of the k-th powers of the elements of the finite field K of q elements.
(Here we read a^0 = 1 for every a, including a = 0.)
Clearly, if k = 0, the sum is S(0, q) = q · 1 = 0, since the field has characteristic
p dividing q. If k is a multiple of q − 1 distinct from 0, then as K ∗ is a cyclic
group of order q − 1, a^k = 1 for every a ∈ K ∗ , and so
S(k, q) = ∑_{a∈K−{0}} 1 = q − 1 = −1.
Now suppose k > 0 is not a multiple of q − 1. Then there exists an element
b in K ∗ with b^k ≠ 1. Since multiplying on the left by b simply permutes the
elements of K, we see that
S(k, q) = ∑_{a∈K} a^k = ∑_{a∈K} (ba)^k = b^k S(k, q).
So (b^k − 1)S(k, q) = 0. Since the first factor is not zero, we have S(k, q) = 0
in this case.
We summarize this in
Lemma 231
S(k, q) = 0 if k = 0,
S(k, q) = −1 if k ≠ 0 and q − 1 divides k,
S(k, q) = 0 otherwise.
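Lemma 231 is easy to observe numerically for q = p prime (pure Python; we adopt the convention a^0 = 1 even for a = 0, as above):

    q = 7
    for k in range(13):
        s = sum(pow(a, k, q) for a in range(q)) % q
        print(k, s)
    # s = 0 for every k except k = 6 and k = 12 (the positive multiples
    # of q - 1 in range), where s = 6 = -1 in GF(7).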
Now fix a positive integer n. Let V = K^{(n)} , the vector space of n-tuples
with entries in K. For any vector v = (a1 , . . . , an ) ∈ V and polynomial
p = p(x1 , . . . , xn ) ∈ K[x1 , . . . , xn ],
the ring of polynomials in the indeterminates x1 , . . . , xn , we let the symbol
p(v) denote the result of substituting ai for xi in the polynomial, that is,
p(v) := p(a1 , . . . , an ).
Now suppose p = x1^{k1} x2^{k2} · · · xn^{kn} is a monomial of degree ∑ ki < n(q − 1).
Then
∑_{v∈V} p(v) = ∏_{i=1}^{n} S(ki , q) = 0,
since at least one of the exponents ki is less than q − 1 and so, by Lemma
231, introduces a factor of zero in the product. Since every polynomial p in
K[x1 , . . . , xn ] of degree less than n(q − 1) is a sum of such monomials, we
have
Lemma 232 If p ∈ K[x1 , . . . , xn ] has degree less than n(q − 1), then
S(p) := ∑_{v∈V} p(v) = 0.
Now we can prove the following
Theorem 233 (Chevalley-Warning) Let K be a finite field of order q and
characteristic p. Suppose p1 , . . . , pm is a family of polynomials of K[x1 , . . . , xn ],
the sum of whose degrees is less than n, the number of variables. Let X be
the collection
{v ∈ K (n) |pi (v) = 0, i = 1, . . . , m}
of common zeroes of these polynomials. Then
|X| ≡ 0 mod p.
proof: Let R := ∏_{i=1}^{m} (1 − pi^{q−1} ). Then R is a polynomial whose degree
∑ (q − 1) deg pi is less than n(q − 1). Now if v ∈ X, then v is a common zero
of the polynomials, so R(v) = 1. But if v ∉ X, then for some i, pi (v) ≠ 0, so
pi (v)^{q−1} = 1, introducing a zero factor in the definition of R(v). Thus we see
that the polynomial R induces the characteristic function of X. It follows
that
S(R) := ∑_{v∈V} R(v) ≡ |X| mod p.    (11.3)
But since deg R < n(q − 1), Lemma 232 forces S(R) = 0, which converts
(11.3) into the conclusion.
Corollary 234 Suppose p1 , . . . , pm is a collection of polynomials over K =
GF(q) in the indeterminates x1 , . . . , xn , each with a zero constant term. Then
there exists a common non-trivial zero for these polynomials. (Non-trivial
means one that is not the zero vector of V .)
In particular, if p is a homogeneous polynomial over K in more indeterminates than its degree, then it must have a non-trivial zero.
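A direct count makes the theorem concrete (pure Python). The form x^2 + y^2 + z^2 over GF(5) has degree 2 < 3 = n, so its zero set in GF(5)^3 has size divisible by 5 — and since it contains the trivial zero, Corollary 234 guarantees non-trivial ones too:

    p = 5
    count = sum(1 for x in range(p) for y in range(p) for z in range(p)
                if (x*x + y*y + z*z) % p == 0)
    print(count, count % p)   # 25 0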
11.5 Separable Elements and Field Extensions.
11.5.1 Separability.
A polynomial p(x) ∈ F [x] is said to be separable if and only if its irreducible
factors in F [x] each have all their roots distinct in a splitting field for p(x).
Clearly a splitting field for p(x) contains a splitting field for any one of its
factors (just take the subfield generated by the roots of that factor and note
that the factor splits there).
There is a simple way to tell whether a polynomial f (x) in F [x] is separable.
364
Lemma 235 The polynomial f (x) has distinct roots in its splitting field if
and only if
F [x]f (x) + F [x]f ′ (x) = F [x],
where f ′ (x) denotes the formal derivative of f (x) – that is, if and only if f
and f ′ are relatively prime in F [x]. In particular, f (x) is separable whenever
f and its derivative f ′ are relatively prime in F [x].
proof: If f (x) = a0 + a1 x + · · · + an x^n , then its formal derivative f ′ (x) is
defined to be the polynomial
a1 + 2a2 x + · · · + n an x^{n−1} .
For any field K, the mapping ∂ : K[x] → K[x] which takes each polynomial
to its formal derivative is a K-linear mapping which satisfies the “product
rule”:
∂(f · g) = f · ∂(g) + (∂(f )) · g.
Now if E is a splitting field for f (x) in F [x], we have a factorization of
f (x) in E[x] into linear factors:
f (x) = (x − a1 )^{b1} (x − a2 )^{b2} · · · (x − am )^{bm} ,
where m is the number of distinct roots and ∑ bi = n, the degree of f (x).
Then the product rule (iterated a number of times) renders the derivative as
f ′ (x) = ∑_{i=1}^{m} bi (x − ai )^{bi −1} ∏_{j≠i} (x − aj )^{bj} .
Now if the roots of f (x) are not distinct, some exponent bi exceeds 1, and
so (x − ai ) divides both f (x) and f ′ (x). Conversely, if there is a common
non-unit factor of f (x) and f ′ (x), then (by the UFD property of E[x]) there is a
common linear factor (x − ai ), forcing bi > 1, so that the root ai is repeated
in f (x).
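In practice the criterion of Lemma 235 is a gcd computation; a sketch assuming sympy:

    from sympy import symbols, gcd, diff

    x = symbols('x')
    f = x**3 - 2                   # separable: no repeated roots
    print(gcd(f, diff(f, x)))      # 1
    g = (x - 1)**2 * (x + 2)       # the root 1 is repeated
    print(gcd(g, diff(g, x)))      # x - 1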
This lemma has a very nice application when f (x) is irreducible.
A finite extension E of F is said to be separable if and only if, for each
element a ∈ E, the irreducible monic polynomial ga (x) := Irr F (a) of F [x]
which vanishes at a is separable.
11.5.2 p-th Power Mappings in Characteristic p
Before beginning with a discussion of simple and repeated roots of polynomials we need to familiarize ourselves with certain properties of the p-th power
mapping in domains and fields of characteristic p. So, for the moment, assume D is an integral domain of prime characteristic p. The p-th power
mapping on D is the mapping which takes each element x to its p-th power.
We immediately observe
Lemma 236 If D is an integral domain of characteristic p, then the p-th
power mapping is an isomorphism of D onto a subdomain D^p of D. In case
D is a field, the image D^p is a subfield of D.
Proof: For any two elements x and y of D, the “binomial theorem” implies
(x + y)^p = x^p + y^p ,
thanks to the fact that the intermediate binomial coefficients are all divisible
by the prime p. Since we have (xy)^p = x^p y^p for any x and y, we see that
the p-th power mapping is a ring endomorphism of D. Since D, being a
domain, has no non-zero elements whose p-th power is zero, the kernel of
this endomorphism is zero. Thus the p-th power mapping is an injective
endomorphism of D.
But if D is a field, the inverse of x^p is (x^{−1} )^p . Thus the set D^p of p-th
powers in D forms a subfield.
We shall have need of the following technical result:
Theorem 237 Suppose K is an extension of finite degree of the field F .
Suppose further that char K = p, and that K = F K^p – that is, K is the join
of its two subfields F and K^p . Then the p-th power mapping on K preserves
F -linear independence of subsets.
Proof: Since any F -linearly independent subset of K can be completed to an
F -basis for K, it suffices to prove that the p-th power mapping preserves the
linear independence of any F -basis for K. Suppose, then, that {a1 , . . . , an }
is an F -basis of K. Let c be a generic element of K. Then
c = α1 a1 + · · · + αn an , where αi ∈ F for all i.
Then
c^p = α1^p a1^p + · · · + αn^p an^p ,
so
K^p = F^p a1^p + · · · + F^p an^p .
Then
K = F K^p = F a1^p + · · · + F an^p .
Thus the p-th powers of the basis elements ai form an F -spanning set of K
of size n. Since n = dimF K, these n spanning elements must be F -linearly
independent.
Armed with these results, we can now carry out our discussion of simple
and repeated roots.
11.5.3 Simple Roots
Let f (x) be any polynomial in F [x]. We have learned from the previous
subsection that there exists a splitting field for f (x) and that any two such
fields are F -isomorphic. This means that there is a field K, distinct elements
a1 , . . . , ar , and positive integers n1 , . . . , nr such that
1. K = F (a1 , . . . , ar ), and
2. there is a complete factorization
f (x) = (x − a1 )n1 (x − a2 )n2 · · · (x − ar )nr
in K[x].
The elements ai are called the roots of the equation f (x) = 0 in K
and are uniquely determined in any extension field of K. The number ni is
called the multiplicity of the root ai in f (x). Obviously the collection of
multiplicities {n1 , . . . , nr }, is uniquely determined by the polynomial f (x).
But it is also true that the multiplicity ni attached to the root ai is unique up
to exchanging ai by some zero bi of the same irreducible monic polynomial
in F [x] that ai satisfies.
Suppose K1 and K2 are two splitting fields for f (x) over F . Then there
is an F -isomorphism σ : K1 → K2 which extends to a ring isomorphism
σ ∗ : K1 [x] → K2 [x].
This carries the factorization
f (x) = ∏_{i=1}^{r} (x − ai )^{ni}
to
f (x) = ∏_{i=1}^{r} (x − σ(ai ))^{ni}
in K2 [x]. Then ai and σ(ai ) are both zeros of the same irreducible polynomial
p(x) over F and possess the same multiplicity, although they may lie in
distinct splitting fields.
In particular we see that if f (x) is irreducible in F [x], then any two of its
zeros possess the same multiplicity in it.
Note that “multiplicity” of a root has been defined relative to the polynomial of which it is a zero. Thus 1 has multiplicity one in x − 1 but multiplicity
two in x^2 − 2x + 1. Most of the time we will be interested in this notion of
multiplicity of a zero relative to an irreducible polynomial.
A root ai of f (x) is said to be a simple zero of f (x) if and only if it
has multiplicity ni equal to one. Any root of f (x) = 0 of multiplicity greater
than one is called a repeated zero of f (x).
The following very elementary properties concerning roots and their multiplicities are useful:
Lemma 238 Let K contain the subfield F .
1. Suppose f (x) and g(x) are polynomials in F [x] which share a common
root a in K. Then they possess a non-unit greatest common divisor in
F [x] – that is, there exists a polynomial of degree at least one in F [x]
dividing both polynomials f and g.
2. If f (x) is irreducible in F [x] and f shares a root with g ∈ F [x], then f
divides g.
3. If f (x) is irreducible in F [x], then all of its roots have the same multiplicity in f (x).
4. Suppose f (x) and g(x) are polynomials in F [x] with the property that in
some splitting field for f (x), every root of f (x) is simple and moreover
is a root of g(x). Then f (x) divides g(x) in F [x].
368
Proof: 1. If the conclusion were false there would exist b(x) and c(x) in
F [x] such that b(x)f (x) + c(x)g(x) = 1. This is impossible when one applies
the homomorphism F [x] → K obtained by substituting a for x.
2. This is immediate from 1.
3. This was discussed above.
4. Let E be a splitting field for f (x)g(x). Then in E[x],
f (x) = (x − a1 ) · · · (x − ad )
where the roots ai are all simple and distinct. By hypothesis each ai is a root
of g(x), so
g(x) = (x − a1 ) · · · (x − ad )h(x),
a factorization in E[x]. Thus f (x) divides g(x) in E[x]. We claim that this
means f (x) divides g(x) in F [x]. For if not, the division algorithm yields
g(x) = q(x)f (x) + r(x), where 0 ≤ deg r(x) < deg f (x) = d.
But then each ai , i = 1, . . . , d, is a root of r(x). That is impossible, since the
d pairwise distinct ai would then exceed the degree of r(x) in number (Theorem 213).
In any ring of polynomials F [x], there is a mapping ∂ : F [x] → F [x]
which takes each polynomial
f (x) = a0 + a1 x + · · · + an x^n
to its formal derivative
∂(f ) = f ′ (x) := a1 + 2a2 x + · · · + n an x^{n−1} .
Without any recourse to analytic arguments it can be proved directly
that
1. the mapping ∂ is F -linear, and
2. for any two polynomials f (x) and g(x) in F [x],
∂(f g) = f ′ (x)g(x) + f (x)g ′ (x).
We now can prove this useful theorem:
369
Theorem 239 Let f (x) be an irreducible polynomial in F [x]. Then f (x)
has a repeated root if and only if its formal derivative is the zero polynomial
– that is, f ′ (x) = 0.
Proof: Suppose first that a is a repeated root of f (x). This means that in
K[x], where K is a splitting field for f (x) over F , one obtains a factorization
f (x) = (x − a)^m g(x), where g(a) ≠ 0 and m > 1.
Then f ′ (x) = m(x − a)^{m−1} g(x) + (x − a)^m g ′ (x) has a for a root, and so by
Lemma 238, part 2, f (x) divides f ′ (x). But since f ′ (x) has smaller degree
than f (x), this implies f ′ (x) = 0.
Now assume f (x) is irreducible in F [x] and that f ′ (x) = 0. Then it is easy
to see that F must have characteristic p, a rational prime, and that f (x) = g(x^p )
for some irreducible polynomial g(x) in F [x]. Now in K, the p-th power of
a root of f (x) is a root of g(x). Since the p-th power mapping is injective
(see Lemma 236), f (x) cannot have more distinct roots than g(x). But g(x) has
smaller degree than f (x), and so the number of distinct roots of f (x) is smaller than
its degree. It follows that some root must be repeated.
11.5.4 Separable Elements and Separable Field Extensions
Again K is an extension of the field F . We say that an element a in K is
separable over F if and only if a is a simple root of IrrF (a), the monic
irreducible polynomial of F [x] of which it is a root. Note that if a is already
an element of F then its monic irreducible polynomial is IrrF (a) = x − a, of
degree one. Clearly a is a simple root of this polynomial, so elements of F
are automatically separable over F .
There is a second observation: if L is a subfield of K containing F and
a is separable over F , then a is separable over L. The argument is this: Set
f (x) := IrrF (a) and g(x) := IrrL (a). Then there is a factorization f (x) =
g(x)h(x) in L[x]. Let E be a splitting field of f (x) over L; it contains a
splitting field of g(x) over L. Since a is separable over F and f (x) is irreducible,
all the roots of f (x) in E are distinct (Lemma 238, part 3). But all the roots of g(x)
are roots of f (x), and so are pairwise distinct. Thus a is separable over L.
We record these observations:
Lemma 240 Suppose K is an extension of the field F .
370
1. Every element of F is separable over F .
2. If L is a subfield such that F ⊆ L ⊆ K, and a ∈ K is separable over
F , then a is also separable over L.
We say that K is separable over F if and only if every element of K is
separable over F .
This definition also has several immediate consequences:
Lemma 241 Suppose K is an extension of F .
1. If char F = 0, then K is separable over F .
2. Suppose L is a subfield with F ⊆ L ⊆ K.
(a) If K is separable over F then L is separable over F .
(b) If K is separable over F then K is also separable over L.
Proof: 1. Suppose char F = 0. Choose a ∈ K and set f (x) := IrrF (a).
Since f (x) is monic of degree at least one, and the characteristic is zero, we
cannot have f ′ (x) = 0. Then by Theorem 239, f (x) has no repeated roots,
so a is a simple root of f (x). This means a is separable over F , and since a
was arbitrary, K is separable over F .
2. Part (a) is obvious and part (b) follows from Lemma 240, part 2.
We shall eventually show that if F ⊆ L ⊆ K is a tower of fields for which
K is separable over L and L is separable over F , then K is separable over F .
For the moment even the relation between separable elements and separable
field extensions is extremely one-sided. We do not even know that if a is an
element of K separable over F , that the subfield F (a) is separable over F .
As one can see from the above lemma, most of the trouble occurs in prime
characteristic. In fact it occurs precisely under the initial hypotheses of the
following theorem:
Theorem 242 Suppose F ⊆ K are fields of characteristic p, a prime. Suppose further that there exists an element a in K and an integer e such that
a^{p^e} = b ∈ F .
1. If p(x) = IrrF (a), then x^{p^e} − b = (p(x))^{p^r} for some integer r ≥ 0.
2. If b is not in F^p , then x^{p^e} − b is irreducible and so is equal to IrrF (a).
3. If b is not in F^p and moreover a is separable over F , then a is already an
element of F .
Proof: Set f (x) = x^{p^e} − b and let p(x) be any irreducible monic factor of
f (x) in F [x]. Let c be a root of p(x) and set L = F (c). Then f (c) = 0, and
so b = c^{p^e} . Using the additive property of the p-th power map in L[x], we
obtain
f (x) = (x − c)^{p^e} ,
as a factorization in L[x].
Now suppose q(x) is any other monic irreducible divisor of f (x) in F [x].
Then factoring q(x) in L[x] we see that q(x) is some positive power of (x − c)
and so q(c) = 0. Thus we see that p(x) and q(x) are irreducible in F [x] and
share a common root in an extension field of F . It follows from Lemma 238
that they divide each other. Since they are monic and irreducible, they are
equal. It thus follows that f (x) = p(x)^m . Since m divides deg f (x) = p^e , we
see that m = p^r , for some non-negative integer r. Thus part 1. holds.
Now suppose further that b is not in F^p . Let a0 = p(0), the constant
term of the polynomial p(x). Clearly a0 ∈ F . Then, comparing constant terms
in Part 1., a0^{p^r} = −b. If r > 0, this would make b = (−a0^{p^{r−1}} )^p a p-th power
of an element of F (for p = 2, note −1 = 1). Since b is not such a p-th power, we must have r = 0. Thus
f (x) = p(x) is irreducible in F [x], and so Part 2. has been established.
Now assume not only that b is not in F^p but also that the element a is separable
over F . Then by Theorem 239, f ′ (x) ≠ 0, where now f (x) = IrrF (a) by Part
2. But f (x) = x^{p^e} − b, so f ′ (x) = p^e x^{p^e −1} , which is the zero polynomial
in characteristic p unless e = 0. This forces e = 0. Thus
a = a^{p^0} = b ∈ F , as desired.
The proof is complete.
We now can deduce a useful result:
Theorem 243 (Separability criterion for fields of prime characteristic) Let
K be any algebraic field extension of the field F , where F (and hence K)
have prime characteristic p.
1. If K is a separable extension of F , then K = F K^p . (Note that the
degree [K : F ] need not be finite here.)
2. If K is a finite extension of F such that K = F K^p , then K is separable
over F .
Proof: Part 1. Assume K is separable over F . Then K is separable over
its subfield L := F K^p (Lemma 241). Suppose there exists an element a in K
that is not in L. Then its p-th power b = a^p would be an element of L. If b
were not the p-th power of an element of L, then (noting that a is separable
over L) Theorem 242, Part 3. (with L in the role of F in the Theorem) would
imply a ∈ L, a contradiction. Thus we must assume b = c^p for some c ∈ L.
Then a is a root of
x^p − b = (x − c)^p ∈ L[x],
and so IrrL (a) = x − c, whence a = c ∈ L, a contradiction once again.
We must conclude that no such element a exists, and so L = K, as we
were to prove.
Part 2. Now assume [K : F ] is finite, and that K = F K^p . Suppose, by
way of contradiction, that a is an element of K which is not separable over
F . Then if f (x) := IrrF (a), Theorem 239 implies f ′ (x) = 0. It follows that
f (x) = a0 + a1 x^p + · · · + am x^{pm} = g(x^p ),    (11.4)
for some irreducible g(x) in F [x].
Now from (11.4), the elements {1, a^p , a^{2p} , . . . , a^{mp} } are linearly dependent
over F . On the other hand, if {1, a, a^2 , . . . , a^m } were F -linearly dependent,
then there would exist a polynomial p(x) in F [x] of degree at most m having
the element a for a zero. But in that case f (x) = IrrF (a) would divide p(x),
which is impossible, as
deg p(x) ≤ m < pm = deg f (x).
Thus we must conclude that the elements {1, a, . . . , a^m } are linearly independent over F . But this too is impossible, for in that case the p-th power
mapping takes this linearly independent set to one that is F -linearly dependent, against Theorem 237 and the hypothesis that K = F K^p .
Thus no non-separable element exists in K, so K is separable over F .
Corollary 244 (Separability of simple extensions) Fix a field F of prime
characteristic p.
1. Suppose a is separable over F . Then F (a) is a separable field extension
of F , and F (a) = F (a^p ).
2. Conversely, if a is algebraic over F , and F (a^p ) = F (a), where p ≠ 0 is
the characteristic of F , then a is separable over F .
Proof: Part 1. Suppose first that a is separable over F and set L := F (a^p ).
We first show that F (a) = F (a^p ).
Suppose a is not in L, so that f (x) := IrrL (a) has degree greater than one.
Then clearly a^p ∈ L = F (a^p ), so by Theorem 242, Part 1, x^p − a^p = (f (x))^{p^r} .
The left side has degree p, and deg f (x) is a divisor of this prime greater than
one; hence f (x) = x^p − a^p and r = 0. Moreover b := a^p is then not a p-th power
of an element of L, for if b = c^p with c ∈ L, then f (x) = (x − c)^p would not be
irreducible. But by hypothesis a is separable over F , hence over L, and so by
Theorem 242, Part 3., a is an element of L, a contradiction.
Thus we must have a ∈ L, and so F (a) = F (a^p ).
Now, as every element of F (a) is a polynomial of F [x] evaluated at a, we
have (F (a))^p = F^p (a^p ), so that
F · (F (a))^p = F · F (a^p ) = F · F (a) = F (a).    (11.5)
By Theorem 243, Part 2., F (a) is separable over F .
Part 2. Conversely, F (a) = F (a^p ) implies the full equation (11.5), and so
again F (a) is separable over F as before. In particular, the element a is separable
over F .
Corollary 245 (Transitivity of separability among finite extensions) Suppose K is a finite separable extension of L and that L is a finite separable
extension of F . Then K is a finite separable extension of F .
Proof: That [K : F ] is finite is known from the first section of this chapter,
so we only need to prove the separability of K over F . By Lemma 241,
Part 1., such a conclusion follows if F has characteristic zero. So we assume
char F = p.
Now by Theorem 243, the separability hypotheses yield K = LK^p and
L = F L^p . Since L^p ⊆ K^p , we obtain K = (F L^p )K^p = F K^p . Then by Part
2. of Theorem 243 (noting that [K : F ] is finite), K is separable over F .
Corollary 246 Suppose K is an extension of F containing elements a1 , . . . , an
each of which is separable over F . Then the subfield F (a1 , . . . , an ) is separable
over F .
Proof: This is an easy induction using Corollaries 244 and 245.
Corollary 247 Let K be an arbitrary extension of F . Then
Ks = {a ∈ K|a is separable over F }
is a subfield of K containing F .
Proof: By Lemma 240 F ⊆ Ks . It remains to show that Ks is a subfield
of K. Suppose a and b are arbitrary elements of Ks . Then by Corollary 246,
F (a, b) is a separable extension of F so F (a, b) ⊆ Ks . But this means a ± b,
ab, and, if b 6= 0, ab−1 , all lie in Ks . Thus Ks is a subdomain of K closed
under taking multiplicative inverses of non-zero elements and so is a subfield
of K.
This field Ks is called the separable closure of F in K.
11.6 Galois Theory
11.6.1 Galois field extensions
Let K be any field, and let k be a subfield of K. The k-isomorphisms of K
with itself are called k-automorphisms of K. Under composition they form
a group which we denote by Gal(K/k). The group of all automorphisms of
K is denoted Aut(K), as usual.
Suppose now that G is any group of automorphisms of the field K. The
elements fixed by every automorphism of G form a subfield called the fixed
subfield of G and denoted Fix(G). If G ≤ Gal(K/k), then clearly k is a
subfield of Fix(G).
Now let S(G) be the poset of all subgroups of G, and let S(K/k) be the
poset of all subfields of K which contain k. In both cases the partial ordering
is containment. Now, the mapping
φ : S(Gal(K/k)) → S(K/k),
which takes each subgroup G of Gal(K/k) to Fix(G), reverses inclusion. That
is, if H ≤ G, then Fix(G) ≤ Fix(H). Similarly, the mapping
gal : S(K/k) → S(Gal(K/k)),
which takes each field L, with k ≤ L ≤ K, to Gal(K/L), also reverses inclusion. It follows that the composition
φ ◦ gal : S(K/k) → S(K/k),
is a closure operator on the poset of fields between k and K. The Galois
theory is concerned with the subposet of “closed” subfields.
Accordingly, we say that K is a Galois extension of L (or that K is
Galois over L) if and only if L is the fixed subfield of Gal(K/L).
Now we can always force this situation by “starting from the top down”.
That is, for any subgroup G of Gal(K/k), we can form the fixed field L =
Fix(G), and observe the following:
1. k ≤ L,
2. G ≤ Gal(K/L), and
3. K is Galois over L.
11.6.2 Independence of linear characters
Let K be any field and let G be any group. A linear K-character of
the group G is a group homomorphism λ : G → K ∗ of the group G into
the multiplicative group K ∗ of non-zero elements of the field K. Thus a
K-character is a very special vector in the left K-vector space Hom (G, K).
Theorem 248 (Independence of the linear K-characters of a group) For any
group G and field K, the linear K-characters of G are K-linearly independent
in Hom (G, K).
Proof: It suffices to show that any finite set of linear K-characters is linearly
independent. Suppose, by way of contradiction, that {σ1 , . . . , σn } is a collection
of linear K-characters of G of smallest cardinality n forming a K-linearly
dependent set. Since no linear K-character is the zero map, a single such
character comprises a linearly independent set. So n > 1. Then there exist
scalars a1 , . . . , an in K, not all zero, such that
( ∑_{i=1}^{n} ai σi )(g) = 0, for all g ∈ G.    (11.6)
Note that by the minimality of n, each scalar ai is non-zero. Since σ1 and σn
are distinct characters, there exists a group element h at which they differ —
i.e., σ1 (h) ≠ σn (h). Now, from the definition of linear K-character, we must
have σi (gh) = σi (g)σi (h), for all i. Then
0 = ∑_{i=1}^{n} ai σi (gh) = ∑_{i=1}^{n} ai σi (g)σi (h),    (11.7)
for all g ∈ G. But also, just multiplying equation (11.6) throughout by σ1 (h)
on the right, one obtains
0 = ∑_{i=1}^{n} ai σi (g)σ1 (h).    (11.8)
Now, subtracting equation (11.8) from equation (11.7), we obtain
0 = ∑_{i=2}^{n} ai σi (g)(σi (h) − σ1 (h)), for all g ∈ G.    (11.9)
Note that not all coefficients in equation (11.9) are zero, since the last
coefficient an (σn (h) − σ1 (h)) is non-zero. But then (11.9)
reveals that {σ2 , . . . , σn } is a K-linearly dependent set, against the minimal
choice of n.
We can parlay this into a lower bound on the degree of an extension in
terms of the number of k-isomorphisms of one extension of k into another. To
be more precise, let ℓ and k be fields and let σ : ℓ → k be an isomorphism of
fields. Next suppose L and K are fields containing ℓ and k, respectively. An
injective ring morphism L → K which extends σ is called an ℓ-isomorphism
extending σ, or just an ℓ-isomorphism if σ is fixed by the context.
Theorem 249 Let L and K be fields containing the fields ℓ and k, respectively. Suppose σ : ℓ → k is a field isomorphism and that {σi : L → K | i =
1, . . . , n} is a finite collection of n distinct ℓ-isomorphisms of L into K, each
extending σ. Then [L : ℓ] ≥ n.
Proof: Suppose, by way of contradiction, that {b1 , . . . , bm } is an ℓ-basis for
L, where m < n. Consider now the n-by-m matrix
B = ( σi (bj ) ),  1 ≤ i ≤ n, 1 ≤ j ≤ m,
whose i-th row is (σi (b1 ), . . . , σi (bm )).
The rows of B are n vectors in the m-dimensional space K^{1×m} of row vectors,
and, since m < n, they form a K-linearly dependent set of vectors.
Thus there are scalars a1 , . . . , an of K, not all zero, forming a linear combination of these rows equal to zero – that is, (a1 , . . . , an ) × B = 0 ∈
K^{1×m} . Thus for each j = 1, . . . , m indexing the basis elements,
∑_{i=1}^{n} ai σi (bj ) = 0.
Since the ℓ-span of the bj is L, we see that for every element v in L,
( ∑_{i=1}^{n} ai σi )(v) = 0 ∈ K.
That means ∑_{i=1}^{n} ai σi is the zero mapping L → K – that is, the set {σ1 , . . . , σn }
is a K-linearly dependent set of elements of Hom (L, K). Now observing that
the σi induce distinct linear K-characters of the multiplicative group L∗ of
non-zero elements of L, we have contradicted Theorem 248.
Thus we must have [L : ℓ] ≥ n.
Setting L = K, ℓ = k and σ = 1k in this theorem, we have
Corollary 250 Suppose K is an extension of k. Then
|Gal(K/k)| ≤ [K : k].
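The inequality can be strict. For K = Q(2^{1/3}) we have [K : Q] = 3, but any automorphism of K fixing Q must send 2^{1/3} to a root of x^3 − 2 lying inside K ⊆ R — and only one root is real (a sketch assuming sympy):

    from sympy import symbols, roots

    x = symbols('x')
    rs = roots(x**3 - 2, x, multiple=True)
    print([r for r in rs if r.is_real])   # [2**(1/3)]: the only real root
    # Hence Gal(K/Q) is trivial, of order 1 < 3 = [K : Q].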
11.6.3 Group orders and degrees in Galois extensions
Theorem 251 Let K be a field and let G be a finite subgroup of Aut(K). If
k is the fixed subfield of G, then the following is true:
1. K is a normal separable extension of k.
2. |G| = [K : k].
3. G = Gal(K/k).
Proof: First we show that K is normal and algebraic over k. Let a ∈ K.
Since G is finite, we can form the polynomial
f (x) := ∏_{g∈G} (x − a^g ),
where a^g denotes the image of a under g; all of the zeroes of f (x) are in K.
The coefficient of each power of x in f (x) is
clearly G-invariant, and so belongs to k. Thus f (x) is a polynomial in k[x]
having a as one of its zeroes. It follows that a is algebraic over k, and that if
p(x) = Irrk (a), then p(x) divides f (x). Thus the zeroes of p(x), being among
the zeroes of f (x), are all found in K. In short, K is algebraic over k, and if
an irreducible polynomial has at least one zero in K, then it has all its zeroes
in K. Thus K is a normal algebraic extension of k.
We leave the separability till later.
Now we list the elements of G as {σ1 , . . . , σn }. Suppose by way of contradiction that [K : k] ≥ n + 1, and let {a1 , . . . , an+1 } be a k-linearly independent
subset of K. We now form the n-by-(n + 1) matrix
B = ( σi (aj ) ),  1 ≤ i ≤ n, 1 ≤ j ≤ n + 1,
whose i-th row is (σi (a1 ), σi (a2 ), . . . , σi (an+1 )).
We let V be the set of all column vectors {x1 , . . . , xn+1 }T such that B ·
{x1 , . . . , xn+1 }T = 0. Clearly V is a vector space. In fact one may regard V
as the subspace of K (n+1)×1 consisting of all vectors whose (“dot”) product
with each row vector of B is zero. Since there are only n row vectors, the
space V is non-trivial. Now for each vector v = {x1 , . . . , xn+1 }T in V , and
element g ∈ G, set gv := {g(x1 ), . . . , g(xn+1 )}T . Then if rj ∈ K 1×(n+1) is the
j-th row of B, we have rj · v = 0 for each j – that is,
∑_{i=1}^{n+1} σj (ai ) xi = 0.
Now left multiplication by g^{−1} permutes the elements of G, so, given j, there
is an index π(j) such that g^{−1} σj = σπ(j) . But then for all j, the “dot product”
rj · gv is
rj · gv = ∑_{i=1}^{n+1} σj (ai ) · g(xi ) = g( ∑_{i=1}^{n+1} g^{−1} σj (ai ) · xi ) = g( ∑_{i=1}^{n+1} σπ(j) (ai ) · xi ) = g(rπ(j) · v) = g(0) = 0.
Thus, if v ∈ V , then gv ∈ V .
Now for each vector w in K (n+1)×1 let the weight of w be the number
of non-zero entries in w, and denote it as wt(w). In the subspace V choose
a non-zero vector u of minimal possible weight. Multiplying through by a
scalar from K, and rearranging the indices of the rows if necessary, we may
assume
u = (1, c2 , c3 , . . . cr , 0, . . . , 0)T ,
where each ci is non-zero. If all the ci were in k, then we should have
σj ( ∑_{i=1}^{n+1} ai ci ) = ∑_{i=1}^{n+1} σj (ai ) ci = rj · u = 0,  and hence  ∑ ai ci = 0,
against the fact that the ai were k-linearly independent. Thus we may assume
that r > 1 and that cr is not in k. Since k is the fixed field of G, this means
there is a σ ∈ G such that σ(cr ) 6= cr . Now consider σ(u) − u. This vector
lies in V , has zero in its first coordinate, and in all coordinates beyond the
r-th. Thus
wt(σ(u) − u) < wt(u).
So, from the fact that u was chosen non-zero with minimal weight, we see
that σ(u) − u is zero. But that is impossible since its r-th coordinate is
non-zero.
This contradiction proves that [K : k] ≤ |G|. But now Theorem 249
forces |G| ≤ [K : k], so equality holds.
To show that K is separable over k, we must show that in case k has
characteristic p, a prime, that kK p = K (see Theorem 29). Set L = kK p .
Clearly L is a G-invariant subfield of K. The group G thus induces a group of automorphisms of L, through a group homomorphism
f : G → Gal(L/k).
The kernel of f is GL := G ∩ Gal(K/L). We claim that Gal(K/L) = {1K },
the identity subgroup. Let g be any element of Gal(K/L) and let a be an element of K. If a ∈ L, then g fixes a. If a ∈ K − L, then a^p = b ∈ L. Thus a is a zero of x^p − b ∈ L[x]. But in K[x], x^p − b = (x − a)^p. That means a is the unique zero of x^p − b ∈ L[x] that the field K has to offer. For that reason a is fixed by g. Thus our claim that Gal(K/L) is trivial is true, and so f is
an injection. Thus G can be regarded as a subgroup of Gal(L/k). But since
L is G-invariant, it is normal over k, and so Theorem 249 applies to yield |G| ≤ [L : k]. But now the equality in part 2 forces
[K : k] = |G| ≤ [L : k] ≤ [K : k],
so [L : k] = [K : k] and K = L. The proof is complete.
Corollary 252 Suppose K is a separable normal extension of k of finite
degree. Then K is a Galois extension of k, and |Gal(K/k)| = [K : k].
Proof: Let G = Gal(K/k). In view of Theorem 251, it suffices to show that
Fix(G) = k. But suppose θ is any element of K − k. Then as [K : k] is finite,
θ is algebraic over k, and so f (x) := Irrk (θ) exists and has degree at least
two. Since K is separable over k, there are at least two distinct zeros, θ = a1
and a2 of f(x) in K. Thus there is a k-isomorphism k(a1) → k(a2) which,
by Corollary 223, can be extended to an automorphism g of the splitting field
K. Thus g is an element of Gal(K/k) which moves θ to a2 ≠ θ. It follows
that Fix(G) = k, and the proof is complete.
Theorem 253 (The Fundamental Theorem of Galois Theory) Suppose K
is a finite separable normal extension of the field k. Let G := Gal(K/k).
Then the following hold:
1. (the Galois Connection) There is a one-to-one order-reversing correspondence between the poset S(G) of all subgroups of G under inclusion
and the poset S(K/k) of all subfields L with k ≤ L ≤ K ordered by
inclusion, which takes every subgroup H of G to the subfield Fix(H).
We write this
Fix : S(G) → S(K/k).
The inverse mapping is Gal : S(K/k) → S(G), which takes each intermediate field L to Gal(K/L). Thus the two order-reversing poset
bijections, Fix and Gal, form a Galois connection.
2. (The connection between upper field indices and group orders) For each
intermediate field L of S(K/k), one has
[K : L] = |Gal(K/L)|.
3. (The connection between group indices and lower field indices) If H ≤
G, then [Fix(H) : k] = [G : H].
4. (The correspondence of normal fields and normal subgroups).
(a) Suppose L ∈ S(K/k) is normal over k. Then Gal(K/L) is a
normal subgroup of G = Gal(K/k).
(b) Suppose N is a normal subgroup of G. Then Fix(N ) is a normal
extension of k.
5. (Induced groups and normal factors) Suppose L is a normal extension
of k contained in K. Then Gal(L/k) is isomorphic to the factor group
Gal(K/k)/Gal(K/L).
Proof: Thanks to our preparation, the statement of this Theorem is almost
longer than its proof. Let's start with part 1. Since, by Lemma 241 part 2(b),
K is finite normal separable over every intermediate field L, k ≤ L ≤ K,
then by Corollary 252 and Theorem 251 we always have that
|Gal(K/L)| = [K : L].
By hypothesis both sides of this equation are natural numbers. A great deal
follows from this.
First, it implies that the mapping Gal : S(K/k) → S(G) is injective. For
if H = Gal(K/L1 ) = Gal(K/L2 ), then H fixes the subfield L3 generated
by L1 and L2 , and certainly H = Gal(K/L3 ). Then we see that |H| =
|Gal(K/L3 )| = |Gal(K/L1 )| = [K : L3 ] = [K : L1 ]. Since L1 ≤ L3 we
are forced to conclude that L3 = L1 so L1 = L2 . Thus Gal is an injective
mapping.
On the other hand, for each intermediate field L,
|Gal(K/k)| = [K : k]
= [K : L][L : k]
= |Gal(K/L)| · [L : k].
So
[Gal(K/k) : Gal(K/L)] = [L : k].
Similarly, suppose L := Fix(H1 ) = Fix(H2 ). Then if H is the subgroup of
G generated by H1 and H2 , we have L = Fix(H). Now K is finite normal and
separable over L, so, applying Theorem 251 to each group, |H1| = |H2| =
|H| = [K : L]. Since H1 ≤ H, we have H = H1 = H2 . Thus Fix :
S(Gal(K/k)) → S(K/k) is injective.
Now, the compositions Gal ◦ Fix : S(Gal(K/k)) → S(Gal(K/k)) and
Fix ◦ Gal : S(K/k) → S(K/k) are injective poset endomorphisms of finite
posets and so by the Pigeon-Hole Principle, are isomorphisms.
At this point we have proved all of parts 1, 2, and 3.
Now suppose L is a subfield of K which is normal over k. Then L is
invariant under the action of the Galois group, and Gal(K/L) is the kernel of
the homomorphism Gal(K/k) → Gal(L/k) which for every k-automorphism
of K, gives the k-automorphism induced on L. But by Corollary 223, every
k-automorphism of L extends to one of K, so this homomorphism is onto.
So parts 4(a) and 5 hold.
Finally, suppose N is a normal subgroup of Gal(K/k). Then Gal(K/k)
leaves Fix(N ) invariant, from which it follows that Fix(N ) is a normal extension of k. The proof is complete.
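As an illustration of the correspondence (a standard example, ours rather than the text's): take k = Q and K = Q(√2, √3), a separable normal extension of degree 4 whose group G = Gal(K/Q) ≃ Z₂ × Z₂ is generated by σ : √2 ↦ −√2 (fixing √3) and τ : √3 ↦ −√3 (fixing √2). The three subgroups of order two, ⟨σ⟩, ⟨τ⟩ and ⟨στ⟩, correspond under Fix to the three intermediate fields Q(√3), Q(√2) and Q(√6), respectively, and the indices match as parts 2 and 3 predict: for instance [K : Q(√2)] = 2 = |⟨τ⟩| while [Q(√2) : Q] = 2 = [G : ⟨τ⟩].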
11.7 Solvability of equations by radicals
11.7.1 Introduction
Certainly one remembers that point in one’s education that one first encountered the quadratic equation. If we are given the polynomial equation
ax2 + bx + c = 0,
where a ≠ 0 (to ensure that the polynomial is indeed “quadratic”), then the
roots (or zeros) of this equation are given by the formula
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
Later, perhaps, the formula is justified by the procedure known as “completing the square”. One adds some constant to both sides of the equation so that
the left side is the square of a linear polynomial, and then one takes square
roots. It is fascinating to realize that this idea of completing the square goes
back at least two thousand years to the Near East and India. It means that
at this early stage, there is the suggestion that there could be a realm where
square roots could always be taken, and the subtlety that there are cases in which square roots can never be taken because the radicand is negative.
A second method of solving the equation involves renaming the variable
x in a suitable way. One first divides the equation through by a so that it
has the form x² + b′x + c′ = 0. Then setting y := x + b′/2, the equation becomes
$$y^2 + c' - (b')^2/4 = 0,$$
from which extraction of the square root (if that can be done) yields
$$y = \pm\sqrt{(b')^2 - 4c'}\,/\,2.$$
The original zero named by x is obtained by x = y − b′/2, and the equations b′ = b/a and c′ = c/a.
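As a quick symbolic check of this substitution (a sketch of ours using the SymPy library; the symbols b and c stand for b′ and c′):

    from sympy import symbols, expand

    x, y, b, c = symbols('x y b c')  # b, c play the roles of b' and c'

    # Substitute x = y - b/2 into x**2 + b*x + c and expand.
    shifted = expand((y - b/2)**2 + b*(y - b/2) + c)
    print(shifted)  # y**2 - b**2/4 + c: the linear term in y is gone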
Certainly the polynomials in these equations were seen as generic expressions representing relations between some sort of possible numbers. But
these numbers were known not to be rational numbers in many cases. One
could say that solutions don't exist in those cases, or that there is a larger realm of numbers (for example fields like Q(√2) or the reals), and the latter view only began to emerge in the last three centuries.
Of course the quadratic formula is not valid over fields of characteristic
2 since dividing by 2 was used in deriving and expressing the formula. The
same occurs for the solutions of the cubic and quartic equations: fields of
characteristic 2 and 3 must be excluded.
Dividing through by the coefficient of u3 , the general cubic equation has
the form u³ + bu² + cu + d = 0, where u is the generic unknown root. Replacing
u by x − b/3 yields the simpler equation,
x3 + qx + r = 0.
Next, one puts x = y + z to obtain
$$y^3 + z^3 + (3yz + q)x + r = 0. \qquad (11.10)$$
There is still a degree of freedom allowing one to demand that yz = −q/3,
thus eliminating the coefficient of x in (11.10). Then z 3 and y 3 are connected
by what amounts to the norm and trace of a quadratic equation:
$$z^3 + y^3 = -r, \qquad\text{and}\qquad z^3 y^3 = -q^3/27.$$
One can then solve for z 3 and y 3 by the quadratic formula. Taking cube
roots (assuming that is possible) one gets a value for z and y. The possible
zeroes of x3 + qx + r are:
y + z, ωy + ω 2 z, ω 2 y + ωz,
where ω is a complex primitive cube root of unity. The formula seems to have
been discovered in 1515 by Scipione del Ferro and independently by Tartaglia.
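Here is a small numerical sketch of the recipe (our own illustration in Python, not part of the text), applied to x³ − 6x − 9, whose real zero is 3:

    import cmath

    q, r = -6.0, -9.0  # the polynomial x**3 + q*x + r = x**3 - 6*x - 9

    # z**3 and y**3 are the two roots of t**2 + r*t - q**3/27 = 0
    disc = cmath.sqrt(r*r + 4*q**3/27)
    z3 = (-r + disc) / 2
    y3 = (-r - disc) / 2   # shown for comparison; y is recovered from y*z = -q/3

    z = z3 ** (1/3.0)      # one cube root of z**3
    y = -q / (3*z)         # the cube root of y**3 satisfying y*z = -q/3

    x0 = y + z
    print(x0, x0**3 + q*x0 + r)  # approximately (3+0j) and 0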
Just thirty years later, when Cardano published the cubic formula, Ferrari discovered the formula for solving the general quartic equation. Here one seeks the roots of a monic fourth-degree polynomial, whose
cubic term can be eliminated by a linear substitution to yield an equation
$$x^4 + qx^2 + rx + s = 0.$$
We would like to re-express the left side as
(x2 − kx + u)(x2 + kx + v),
for suitable numbers k, u and v. From the three equations arising from equating coefficients, the first two allow u and v to be expressed in terms of k, and a substitution of these expressions for u and v into the third equation produces a cubic in k², which can be solved by the cubic formula.
Then u and v are determined and so the roots can be obtained from two
applications of the quadratic formula. It is hardly surprising, with all of this
success concentrated in the early sixteenth century, that the race would be
on to solve the general quintic equation. But this effort met with complete
frustration for the next 270 years.
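The elimination just described is mechanical enough to hand to a computer. The following SymPy sketch (our illustration; the symbols match the text's k, u, v, q, r, s) equates coefficients and recovers the resolvent cubic in k²:

    from sympy import symbols, expand, solve, simplify

    x, k, u, v, q, r, s = symbols('x k u v q r s')

    prod = expand((x**2 - k*x + u) * (x**2 + k*x + v))
    # Matching x**4 + q*x**2 + r*x + s gives three equations:
    #   u + v - k**2 = q,   k*(u - v) = r,   u*v = s
    uv = solve([u + v - k**2 - q, k*(u - v) - r], [u, v])
    resolvent = simplify(uv[u] * uv[v] - s)  # setting this to 0 is the third equation
    print(expand(resolvent * 4 * k**2))
    # k**6 + 2*q*k**4 + (q**2 - 4*s)*k**2 - r**2 : a cubic in k**2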
Looking back we can make a little better sense of this. First of all where
are we finding these square roots and cube roots that appear in these formulae? That can be answered easily from the concepts of this chapter. If we
wish to find the n-th root of a number w in a field F , we are seeking a root
of the polynomial xn − w ∈ F [x]. And of course we need only form the field
F [α] ' F [x]/p(x)F [x] where p(x) is any irreducible factor of xn − w. But in
that explanation, why should we start with a general field F ? Why not the
rational numbers? The answer is that in the formula for the general quartic
equation, one has to take square roots of numbers which themselves are roots
of a cubic equation. So in general, if one wishes to consider the possibility of
having a formula for an n-th degree equation, he or she must be prepared to
extract p-th roots of numbers which are already in an uncontrollably large
class of field extensions of the rational numbers, or whatever field we wish to
begin with.
Then there is this question of actually having an explicit formula that is
good for every n-th degree equation. To be sure, the sort of formula one has
in mind would involve only the operations of root extraction, multiplication
and addition and taking multiplicative inverses. But is it not possible that
although there is not one formula good for all equations, it might still be
possible that roots of a polynomial f (x) ∈ F [x] can be found in a splitting
field K which is obtained by a series of root extractions – put more precisely,
K lies in some field Fr which is the terminal member of a tower of fields
F = F0 < F1 < · · · < Fr ,
where F_{i+1} = F_i(α_i) and α_i^{n_i} ∈ F_i, for positive integers n_i, i = 0, ..., r − 1.
In this case we call Fr a radical extension of F . Certainly, if there is a
formula for the roots of the general n-th degree equation, there must be a
radical extension F_r/F which contains a copy of the splitting field K_{f(x)}
for every polynomial f (x) of degree n. Yet something more general might
happen: Conceivably, for each polynomial f (x), there is a radical extension
Fr /F , containing a copy of the splitting field of f (x), which is special for
that polynomial, without there being one universal radical extension which
does the job for everybody. Thus we say that f (x) is solvable by radicals
if and only if its splitting field lies in some radical extension. Then, if there
is a polynomial not solvable by radicals, there can be no universal formula.
Indeed that turned out to be the case for polynomials of degree at least five.
This field-theoretic view-point was basically the invention of Evariste Galois.
The next sections will attempt to describe this theory.
11.7.2 Roots of unity
Any zero of xn − 1 ∈ F [x] lying in some extension field of F is called an n-th
root of unity. If F has characteristic p, a prime, we may write n = m · p^a where p does not divide m and observe that
$$x^n - 1 = (x^m - 1)^{p^a}.$$
Thus all n-th roots of unity are in fact m-th roots of unity in this case.
Of course if α and β are n-th roots of unity lying in some extension field K of F, then (αβ)^n = α^n β^n = 1, so αβ is also an n-th root of unity.
Thus the n-th roots of unity lying in K always form a finite multiplicative
subgroup of K ∗ , which, by Corollary 214 is necessarily cyclic.
Suppose now that n is relatively prime to the characteristic p of F , or that
F has characteristic zero. Let K be the splitting field of xn − 1 over F . Then
x^n − 1 is relatively prime to its formal derivative nx^{n−1} ≠ 0. It follows that
K is a separable extension of F and that the n-th roots of unity in K form
a cyclic group of order n. A generator of this group is called a primitive
n-th root of unity. Since all other n-th roots are powers of some primitive
n-th root of unity α, we see that K = F [α]. Clearly Gal(K/F ) induces a
group of automorphisms of this cyclic group C. The subgroup inducing the
identity automorphism fixes all n-th roots, whose F -span is K, and so is the
identity group. Thus Gal(K/F ) is isomorphic to a subgroup of Aut(C), and
so is abelian.
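As a concrete illustration (ours, not the text's): in the finite field GF(11), whose multiplicative group has order 10, the 5-th roots of unity form a cyclic group of order 5, as a few lines of Python confirm:

    p, n = 11, 5  # n divides p - 1 = 10, so GF(11) contains all 5th roots of unity

    roots = [a for a in range(1, p) if pow(a, n, p) == 1]
    print(roots)  # [1, 3, 4, 5, 9]

    # The group is cyclic: 3 is a primitive 5th root of unity in GF(11).
    print([pow(3, i, p) for i in range(n)])  # [1, 3, 9, 5, 4]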
We summarize these observations in
Lemma 254 Let N be any positive integer and let F be any field. Write
N = n · p^k if F has prime characteristic p, or N = n, if F has characteristic
zero. Let K be the splitting field of xN − 1 over F . Then the following hold:
1. K is the splitting field of xn − 1 over F .
2. K is separable over F and is a simple extension K = F [α], where α is
a primitive n-th root of unity.
3. Gal(K/F ) is an abelian group.
11.7.3 Radical extensions
Suppose r is a prime. We say that a field F contains all r-th roots of
unity if and only if the polynomial xr −1 splits completely into linear factors
in F [x]. Now if F has prime characteristic equal to r, we have xr −1 = (x−1)r
and so it is true that F contains all r-th roots of unity in that case.
Lemma 255 Let F be a field, and let r be a prime. Suppose K = F[α] where α^r ∈ F. Suppose F contains all r-th roots of unity. Then
1. the extension K is normal over F , and
2. the Galois group Gal(K/F ) is abelian.
Proof: First suppose char F = r. Then α is the unique zero of xr − b where
b = αr . Thus K is a splitting field for this polynomial, and Gal(K/F ) is the
identity group.
Now suppose char F ≠ r. Then K is separable over F and, by assumption, F contains all r-th roots of unity. We may assume Gal(K/F) is non-trivial,
so α ∈ K − F . As before, set αr = b. Let ζ be an r-th root of unity in
F distinct from 1. Then {ζ^i α | i = 0, ..., r − 1} is the complete set of all r
zeros of p(x) := xr − b. Thus K is the splitting field for p(x) over F and so
is normal over F . Now let σ and τ be any two elements of Gal(K/F ). Then
σ(α) = ζ i α and τ (α) = ζ j α for some i and j. Then obviously both στ and
τ σ take α to ζ i+j α. Thus the commutator [σ, τ ] fixes α and hence fixes K
point-wise. So any two elements of Gal(K/F ) commute.
At this point the reader is reminded that if K is a finite extension of
F , then there exists a normal closure M of K over F , and that M is also
a finite extension of F and can be assumed to contain K. [The reader may recall that M can be formed by selecting an F-basis B = {α1, ..., αn} of K, and forming the splitting field M for the polynomial f_B(x) := ∏ f_i(x), where f_i(x) := Irr_F(α_i). Then, as there is an F-embedding K → M, we may assume that M contains K without any loss of generality.]
Lemma 256 Suppose K = F [α] where, for some prime r, αr = b ∈ F . Let
M be a normal closure of K over F . Then Gal(M/F ) is solvable.
Proof: If r = char F, F already contains all r-th roots of unity and K is already its own normal closure M. Thus Gal(M/F) = Gal(K/F) = ⟨1_M⟩.
So we may assume that r ≠ char F and that α is not in F. Then Irr_F(α) =
f (x) is an irreducible factor of xr − b in F [x] of degree at least two. Moreover
f (x) is separable over F , and so K is separable over F , so M contains two
distinct roots of f(x), say α and α′. Then ζ = α/α′ is a non-trivial r-th root of unity. Since r is a prime number, L = F[ζ] is normal over F. Now
Gal(M/F ) induces an abelian group of F -automorphisms of L (Lemma 254)
with the kernel of the action being Gal(M/L). But also, noting that M =
F (ζ, α) = L(α), Lemma 255 implies that Gal(M/L) is also abelian. It follows
that Gal(M/F ) is a metabelian group, and so is certainly solvable.
Now we introduce several concepts motivated by the introduction to this
section. Earlier we said that a field K was a radical extension of F if and
only if there exists a tower of fields F = F0 < F1 < · · · < Fm = K for which F_{i+1} = F_i[α_i], where α_i^{n_i} ∈ F_i for some positive integer n_i.
Consider, for example, one of these generic steps F_{i+1}/F_i. Basically, it is an extension L[β] of a field L where β^m ∈ L for some integer m. Suppose m = pm′ where p is a prime, and set γ = β^p. Then we have a refined
chain of fields L < L[γ] < L[β] where β^p ∈ L[γ]. This process may be repeated with γ in place of β. The end result would be a chain of subfields from L to L[β], each obtained by adjoining a p-th root of an element of its preceding field, where p is a prime depending on the position of the field in the chain.
This means that in considering the chain {Fi } used to define K as a
radical extension as above, by refining the chain of fields, we can assume
that each exponent ni in that definition is a prime number. We shall always
do so.
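The bookkeeping behind this refinement is simple enough to automate; here is a Python sketch (our illustration, not part of the text):

    def prime_refinement(m):
        """Primes p1, p2, ... with p1*p2*... == m, found by trial division.

        Adjoining an m-th root in a single step can be replaced by a chain
        of steps, each adjoining a root of prime order, as in the text."""
        primes = []
        d = 2
        while m > 1:
            while m % d == 0:
                primes.append(d)
                m //= d
            d += 1
        return primes

    print(prime_refinement(12))  # [2, 2, 3]: a 12th root via two square roots and a cube root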
For each polynomial p(x) in F [x], we say that the equation p(x) = 0 is
solvable by radicals if and only if the splitting field E for p(x) over F lies
in some radical extension of F . Our goal in this subsection is to prove that
if p(x) = 0 is solvable by radicals then Gal(E/F ) is a solvable group. In the
subsection following this, we shall prove the converse of this statement.
Theorem 257 Suppose K is a radical extension of F , and let M be the
normal closure of K over F . Then Gal(M/F ) is solvable.
Proof: For any radical extension K of F there exists a chain of subfields F = F_0 < F_1 < · · · < F_ℓ = K where F_{i+1} = F_i[α_i] with α_i^{p_i} ∈ F_i for some prime number p_i, i = 0, ..., ℓ − 1. The length of a shortest such chain will be christened the radical height of K over F and denoted rh(K). Note that if rh(K) = 0, then F = K = M and Gal(M/F) = 1, which is solvable in this case. The proof will proceed by induction on the radical height.
To fix notation, we assume here that {F_i} is such a shortest chain, so that ℓ = rh(K). Note that this means rh(F_{ℓ−1}) = ℓ − 1.
Next let M be a normal closure of K over F and set G := Gal(M/F).
Let Mi be the normal closure of Fi in M with respect to F . The fields Mi
are all normal over F . They actually have a simple description. For each αi
let α_i^G be the full G-orbit {σ(α_i) | σ ∈ G}. Then
$$M_i = F(\alpha_0^G \cup \cdots \cup \alpha_{i-1}^G).$$
Now F_{ℓ−1} is a radical extension of F of radical height ℓ − 1. By induction, Gal(M_{ℓ−1}/F) is solvable. But since M and M_{ℓ−1} are both normal over F,
$$\mathrm{Gal}(M/F)/\mathrm{Gal}(M/M_{\ell-1}) \simeq \mathrm{Gal}(M_{\ell-1}/F).$$
Thus to show that G is solvable, it remains only to show that the kernel N := Gal(M/M_{ℓ−1}) is a solvable group. We have that M = M_{ℓ−1}(α_{ℓ−1}^G). Next, α_{ℓ−1}^G decomposes into N-orbits with representatives {β_1, ..., β_t}. Then D_i := M_{ℓ−1}(β_i^N) is a normal extension of M_{ℓ−1}, for i = 1, ..., t, and M = D_1 D_2 ⋯ D_t. Now β_i^{p_{ℓ−1}} is in M_{ℓ−1}, p_{ℓ−1} is a prime, and D_i is the normal closure over M_{ℓ−1} of M_{ℓ−1}[β_i]. By Lemma 256, Gal(D_i/M_{ℓ−1}) is solvable. Since both M and D_i are normal over M_{ℓ−1}, there are epimorphisms
$$\varphi_i : \mathrm{Gal}(M/M_{\ell-1}) \to \mathrm{Gal}(D_i/M_{\ell-1})$$
with solvable images. The intersection of their kernels is
$$\bigcap_{i=1}^{t} \mathrm{Gal}(M/D_i),$$
which pointwise fixes D_1 ⋯ D_t = M, and so is trivial. Thus Gal(M/M_{ℓ−1}) is a subdirect product of the solvable groups Gal(D_i/M_{ℓ−1}) and so is itself solvable.
The proof is complete.
Corollary 258 Suppose p(x) is a polynomial in F [x] and that p(x) = 0 is
solvable by radicals. Then if E is a splitting field for p(x) over F , Gal(E/F )
is a solvable group.
Proof: By assumption there is a radical extension K of F containing E.
Let M be the normal closure of K over F . By Theorem 257, Gal(M/F ) is
solvable. Since E and M are both normal over F , there is an epimorphism
of groups
φ : Gal(M/F ) → Gal(E/F ).
The conclusion follows.
11.7.4 Galois' criterion for solvability by radicals
In this subsection we will prove a converse to Corollary 258 for polynomial
equations over a ground field in characteristic zero. (In characteristic p the
converse statement is not even true.) In order to accomplish this we require
two initial results, each bearing an unusual name.
Lemma 259 (Lemma on Accessory Irrationalities) Suppose K and E are
both normal extensions of a field F , and let K1 be a larger field generated by
K and E. Then K1 is normal over E and the group Gal(K1 /E) is isomorphic
to a subgroup of Gal(K/F ).
Proof: First, K is the splitting field over F of some polynomial f(x) ∈ F[x]. Let K′ be the splitting field of f(x) over E. As there is an F-embedding K → K′, we may assume K′ contains K. Then K′ = E(K) ≃ K1 – these are F-isomorphisms – so we see that K1 can be expressed as a splitting field over E. Thus K1 is normal over E. Since K is normal over F, it is invariant under Gal(K1/F). Under the action of its subgroup Gal(K1/E) we induce F-automorphisms of K, so there is a group homomorphism
$$\psi : \mathrm{Gal}(K_1/E) \to \mathrm{Gal}(K/F).$$
But any element of ker ψ fixes pointwise both E and K, and hence is the identity automorphism of K1. Thus ψ is injective. The proof is complete.
Now assume K is a Galois extension of F . Set G = Gal(K/F ). Then G
is finite and Fix(G) = F . Then, for any element α ∈ K we define the norm
of α as the right side of
$$N_{K/F}(\alpha) := \prod_{\sigma \in G} \sigma(\alpha).$$
The “norm” function has several important properties listed in this
Lemma 260 For any elements α and β in K,
1. NK/F (α) is fixed by G and so lies in F ,
2. NK/F (α−1 ) = NK/F (α)−1 , and
3. NK/F (αβ) = NK/F (α)NK/F (β).
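For instance (a standard example, ours rather than the text's): take K = Q(√2) and F = Q, so that G = {1, σ} with σ(√2) = −√2. Then
$$N_{K/F}(a + b\sqrt{2}) = (a + b\sqrt{2})(a - b\sqrt{2}) = a^2 - 2b^2,$$
which visibly lies in Q and is multiplicative in its argument.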
Lemma 261 (Hilbert’s Theorem 90) Suppose K is a Galois extension of F
and G = Gal(K/F ) is a cyclic group of order n with generator σ. An element
α in K can be written in the form βσ(β −1 ) for some element β ∈ K if and
only if NK/F (α) = 1.
Proof: First, if α = βσ(β −1 ), then NK/F (α) = 1 follows easily from the
properties of the norm listed in the previous Lemma.
Suppose now NK/F (α) = 1. We form a series of elements in K as follows:
δ0 := α, δ1 := ασ(α), δ2 := ασ(α)σ 2 (α), . . . ,
and in general
$$\delta_j := \prod_{i=0}^{j} \sigma^i(\alpha), \qquad j = 0, \ldots, n-1,$$
so that
δn−1 = ασ(α) · · · σ n−1 (α) = NK/F (α) = 1.
Note that from these definitions,
ασ(δi ) = δi+1 , i = 0, . . . , n − 2.
(11.11)
Since the elements {1, σ, σ 2 , . . . , σ n−1 } are a set of K-linearly independent
linear K ∗ -characters, the function
$$\sigma^{n-1} + \sum_{i=0}^{n-2} \delta_i \sigma^i,$$
from K to K is not the zero function. Thus there is an element γ at which
this function does not vanish, and we let β be the nonzero value assumed at
γ. Thus:
$$\beta := \sigma^{n-1}(\gamma) + \sum_{i=0}^{n-2} \delta_i \sigma^i(\gamma) \neq 0. \qquad (11.12)$$
Now using equation (11.11), one has
σ(β) = α−1 [δ1 σ(γ) + · · · + δn−1 σ n−1 (γ)] + σ n (γ).
But since σ n = 1, the last term on the right can be written as σ n (γ) = γ =
α−1 δ0 · γ. Factoring out the α−1 , we recognize the equation just above to
read
σ(β) = α−1 β.
So we see α = β · σ(β −1 ), as desired.
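For example (a standard illustration, not worked in the text): take K = Q(i) and F = Q, with σ complex conjugation, so n = 2. The element α = i has norm N_{K/F}(i) = i · (−i) = 1, and indeed, with β = 1 + i, one finds βσ(β⁻¹) = (1 + i)/(1 − i) = i.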
Corollary 262 In this corollary the ground field F has characteristic 0.
Suppose K is a Galois extension of F and that Gal(K/F ) is a cyclic group
of order n. Assume further that F contains all n-th roots of unity. Then
K = F (α) where αn ∈ F . In particular, K is a radical extension of F .
Remark: The condition that the fields in question have characteristic
zero is absolutely necessary at this point. Suppose F = GF (2) and K =
GF (4). Then K is a Galois extension of F and Gal(K/F ) is cyclic of order
two. But K is not a radical extension of F . In fact F is closed under the
operation of taking square roots.
Proof: Let σ be a generator of G := Gal(K/F ). Since F contains n-th
roots of unity and has characteristic zero, it contains a primitive n-th root of
unity, say ζ. Since |G| = n, the norm of ζ is ζ n = 1. By Hilbert’s Theorem 90
(Lemma 261) there exists an element β such that ζ = β −1 σ(β), so σ(β) = ζβ.
Then σ^i(β) = ζ^i β for all integers i, and (since ζ is a primitive n-th root) these are pairwise distinct elements for i = 0, 1, ..., n − 1. The formula for i = 1
yields σ(β n ) = (σ(β))n = ζ n β n = β n . Thus β n is fixed by G = hσi and
so is an element c in F . Then β is a root of xn − c = 0. Of course, the n
elements ζ^i β, 0 ≤ i ≤ n − 1, comprise the full set of roots of this equation. Since
G transitively permutes them, the polynomial xn − c is irreducible in F [x].
Thus [F [β] : F ] = n. But since K is Galois over F , [K : F ] = |G| = n, so
K = F [β], as in the stated conclusion.
Theorem 263 Suppose F is a field of characteristic zero and that f (x) is a
polynomial in F [x]. Let K be the splitting field of f (x) over F . If Gal(K/F )
is a solvable group then the equation f (x) = 0 is solvable in radicals.
Proof: Since F has characteristic zero, K is normal and separable over F
and so is a Galois extension of F of degree n, say. Then G := Gal(K/F ) is
a solvable group of order n.
We first prove the result under the assumption that F contains all n-th
roots of unity. As we are in characteristic zero, that means that F contains
a primitive n-th root of unity.
Since G is solvable, there is a composition series
$$G = G_0 > G_1 > \cdots > G_r = 1.$$
This means Gi+1 is a normal subgroup of Gi and the quotient Gi /Gi+1 is a
cyclic group of prime order pi . Let Fi := Fix(Gi ). Then by the Fundamental
Theorem of Galois Theory (Theorem 253) we have a tower of fields
F = F0 < F1 < · · · < Fr = K,
where Fi+1 is normal over Fi of degree pi . Moreover Gal(Fi+1 /Fi ) ' Gi /Gi+1
is cyclic of order pi . Clearly pi divides n = [K : F ]. Thus Fi contains a
primitive pi -th root of unity. Then by Corollary 262, Fi+1 = Fi [αi ] where
α_i^{p_i} is an element of F_i. Thus F_{i+1} is a radical extension of F_i. Since this assertion holds at each index i, K is a radical extension of F, and f(x) = 0 is solvable by radicals.
Now we drop the assumption that F contains all n-th roots of unity. Let
J be the splitting field of xn − 1 over K. Then as the characteristic is zero, J
contains a primitive n-th root ζ. Set E = F (ζ). Then E is a splitting field of
xn − 1 over F . In particular E is a radical extension of F ; that is important.
Since J = K[ζ], J is generated by E and K. Since both E and K
are normal over F , the Lemma on accessory irrationalities (Lemma 259)
informs us that Gal(J/E) is isomorphic to a subgroup of the solvable group
Gal(K/F ). This tells us two things: (1) m := [J : E] = |Gal(J/E)| divides
|Gal(K/F)| = n, and (2) Gal(J/E) is itself solvable. From (1), E contains all m-th roots of unity, and so, by the first part of this proof, (2) implies that J is a radical extension of E. But we have already emphasized that E is a radical extension of F. So J must be a radical extension of F. Since the splitting
field K lies in this extension J, f (x) is solvable by radicals. The proof is
complete.
Corollary 264 (Galois’ solvability criterion) Assume f (x) ∈ F [x] where F
has characteristic zero. Then f (x) = 0 is solvable by radicals if and only
if the Galois group Gal(K/F ) of the splitting field K for f (x) is a solvable
group.
Proof: This is just a combination of Theorems 257 and 263.
11.8 Transcendental extensions
11.8.1 Simple transcendental extensions
Suppose K is some extension field of F. Recall that an element α of K is said
to be algebraic over F if and only if there exists a polynomial p(x) ∈ F [x]
such that p(α) = 0. Otherwise, if there is no such polynomial we say that
α is transcendental over F and that F (α) is a simple transcendental
extension of F .
Recall further that a rational function over F is any element in the field
of quotients F (x) of the integral domain F [x]. Thus every rational function
r(x) is a quotient f (x)/g(x) of two polynomials in F [x], where g(x) is not the
zero polynomial, and f (x) and g(x) are relatively prime. For each element
α ∈ K, we understand r(α) to be the element
r(α) := f (α) · (g(α))−1 ,
provided α is not a zero of g(x). If α is transcendental over F , then r(α) is
always defined.
Lemma 265 If F (α) is a transcendental extension of F , then there is an
isomorphism F (α) → F (x) taking α to x.
Proof: Define the mapping ε : F(x) → F(α) by setting ε(r(x)) := r(α), for every r(x) ∈ F(x). Clearly ε is a ring homomorphism. Suppose
$$r_1(x) = \frac{a(x)}{b(x)}, \qquad r_2(x) = \frac{c(x)}{d(x)},$$
are rational functions with r1(α) = r2(α). We may assume a(x), b(x), c(x) and d(x) are all polynomials with b(x)d(x) ≠ 0. Then
$$a(\alpha)d(\alpha) - b(\alpha)c(\alpha) = 0.$$
Since α is transcendental, a(x)d(x) = b(x)c(x), so r1(x) = r2(x). Thus ε is injective and its image is a subfield of F(α) containing ε(x) = α. Thus ε is bijective, and so ε⁻¹ is the desired isomorphism.
Now suppose F (α) is a simple transcendental extension of F , and let β
be any further element of F (α). By Lemma 265 there exist a coprime pair
of polynomials (f(x), g(x)), with g(x) ≠ 0 in F[x], such that
β = f (α)(g(α))−1 .
Suppose
$$f(x) = a_0 + a_1 x + \cdots + a_n x^n,$$
$$g(x) = b_0 + b_1 x + \cdots + b_n x^n,$$
where n is the maximum of the degrees of f (x) and g(x), so at least one of
an or bn is non-zero. Then f (α) − βg(α) = 0, so
(a0 − βb0 ) + (a1 − βb1 )α + · · · + (an − βbn )αn = 0.
Thus α is a zero of the polynomial
hβ (x) := (a0 − βb0 ) + (a1 − βb1 )x + · · · + (an − βbn )xn
in F (β)[x].
Claim: The polynomial hβ (x) is the zero polynomial if and only if β ∈ F .
First, if β ∈ F , then hβ (x) = f (x) − βg(x) is in F [x]. If it were a non-zero
polynomial, then its zero α would be algebraic over F. Thus h_β(x) is the
zero polynomial.
On the other hand, if h_β(x) is the zero polynomial, each coefficient a_i − βb_i is zero. Now, since g(x) is not the zero polynomial, for some index j, b_j ≠ 0. Then β = a_j · b_j⁻¹, an element of F. The claim is proved.
Now suppose β is not in F . Then the coefficient (ai − βbi ) is zero if and
only if ai = 0 = bi , and this cannot happen for i = n as n was chosen. Thus
h_β(x) has degree n. So certainly α is algebraic over the intermediate field F(β). Since [F(α) : F] is infinite, and [F(α) : F(β)] is finite, [F(β) : F] is infinite, and so β is transcendental over F.
We shall now show that hβ (x) is irreducible in F (β)[x].
Let D := F [y] be the domain of polynomials in the indeterminate y, with
coefficients from F . (This statement is here only to explain the appearance
of the new symbol y.) Then F (y) is the quotient field of D. Since β is
transcendental over F, Lemma 265 yields an F-isomorphism
$$\eta : F(\beta) \to F(y)$$
of fields which can be extended to a ring isomorphism
$$\eta^* : F(\beta)[x] \to F(y)[x].$$
Then h(x) := η*(h_β(x)) = f(x) − y g(x) is a polynomial in D[x] = F[x, y].
Moreover, hβ (x) is irreducible in F (β)[x] if and only if h(x) is irreducible in
F (y)[x]. Now since D = F [y] is a UFD, by Gauss’ Lemma, h(x) ∈ D[x] is
irreducible in F(y)[x] if and only if it is irreducible in D[x]. Now, D[x] = F[y, x] is a UFD and h(x) has degree one in y. So, if h(x) had a non-trivial
factorization in D[x], one of the factors k(x) would be of degree zero in y
and hence a polynomial of positive degree in F [x]. The factorization is then
h(x) = f (x) − yg(x) = k(x)(a(x) + yb(x)),
where k(x),a(x) and b(x) all lie in F [x]. Then f (x) and g(x) possess the
common factor k(x) of positive degree, against the fact that f (x) and g(x)
were coprime in F [x]. We conclude that h(x) is irreducible in F (y)[x], and
so hβ (x) is irreducible of degree n in F (β)[x] as promised.
We now have the following
Theorem 266 (Lüroth’s Theorem) Suppose F (α) is a transcendental extension of F . Let β be an element of F (α) − F . Then we can write β =
f (α)/g(α) where f (x) and g(x) are a pair of coprime polynomials in F [x],
unique up to multiplying the pair through by a scalar from F . Let n be the
maximum of the degrees of f (x) and g(x). Then [F (α) : F (β)] = n. In
particular, α is algebraic over F (β).
Remark: A general consequence is that if E is an intermediate field distinct
from F – that is, F < E ≤ F (α) — then [E : F ] is infinite. In particular,
F (α) − F contains no elements which are algebraic over F (a fact that has
an obvious direct proof from first principles.)
Lüroth’s theorem gives us a handle on the Galois group Gal(F (α)/F )
of a transcendental extension. Suppose now that σ is an F -automorphism
F (α) → F (α). Then σ is determined by what it does to the generator
α. Thus, setting β = σ(α), for any rational function t(x) of F , we have
σ(t(α)) = t(β). Now in particular β = r(α) for a particular rational function
r(x) = f (x)/g(x) where (f (x), g(x)) is a coprime pair of polynomials in F [x].
Since σ is onto, F (α) = F (β). On the other hand, by Lüroth’s Theorem,
[F (α) : F (β)] is the maximum of the degrees of f (x) and g(x). Thus we can
write
$$\beta = \frac{a\alpha + b}{c\alpha + d}, \qquad (11.13)$$
where a, b, c and d are in F, a and c are not both zero, and ad − bc ≠ 0 to keep the polynomials ax + b and cx + d coprime. Actually the latter condition implies the former, that a and c are not both zero.
Now let G = Gal(F (α)/F ) and set O := αG , the orbit of α under G. Our
observations so far are that there exists a bijection between any two of the
following three sets:
1. the elements of the group Gal(F (α)/F ),
2. the orbit O, and
3. the set LF(F) of so-called “linear fractional” transformations,
$$\omega \mapsto \frac{a\omega + b}{c\omega + d}, \qquad a, b, c, d \in F, \quad ad - cb \ \text{non-zero},$$
viewed as a set of permutations of the elements of O.
A perusal of how these linear fractional transformations compose shows that LF(F) is a group, and that in fact there is a surjective group homomorphism
$$GL(2, F) \to LF(F),$$
defined by
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \longmapsto \Big[\ \text{the transformation } z \mapsto \frac{az + b}{cz + d} \text{ on the set } O\ \Big],$$
where the matrix has non-zero determinant – i.e. is invertible. The kernel of
this homomorphism is the subgroup of two-by-two scalar matrices γI, comprising the center of GL(2, F). As the reader will recall, the factor group GL(2, F)/Z(GL(2, F)) is called the “one-dimensional projective group over F” and is denoted PGL(2, F). Our observation, then, is that
$$\mathrm{Gal}(F(\alpha)/F) \simeq LF(F) \simeq PGL(2, F).$$
Notice how the fundamental theorem of Galois Theory falls to pieces in
the case of transcendental extensions. Suppose F is the field Z/pZ, the ring
of integers mod p, a prime number. If α is transcendental over F, then [F(α) : F] is infinite, while Gal(F(α)/F) is PGL(2, p), a group of order p(p² − 1) which, when p ≥ 5, contains a simple subgroup at index two.
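For a small prime this group can be counted directly. The sketch below (our illustration, in Python) realizes the linear fractional maps over GF(5) as permutations of the projective line P¹(GF(5)), the natural model for the orbit O, and confirms the order p(p² − 1) = 120:

    from itertools import product

    p = 5
    INF = 'inf'
    points = list(range(p)) + [INF]  # the projective line over GF(5)

    def act(a, b, c, d, z):
        # apply z -> (a*z + b)/(c*z + d) over GF(p), with the usual conventions at infinity
        if z == INF:
            return (a * pow(c, p - 2, p)) % p if c % p else INF
        num, den = (a * z + b) % p, (c * z + d) % p
        if den == 0:
            return INF
        return (num * pow(den, p - 2, p)) % p

    perms = set()
    for a, b, c, d in product(range(p), repeat=4):
        if (a * d - b * c) % p:  # only invertible matrices
            perms.add(tuple(act(a, b, c, d, z) for z in points))

    print(len(perms))  # 120 == p*(p**2 - 1) for p = 5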
11.8.2 Transcendence Degree
Let K be an extension of the field F.⁴ We define a notion of algebraic dependence over F on the elements of K as follows: We say that the element α algebraically depends (over F) on the finite subset {β1, ..., βn} if and only if
• α is algebraic over the subfield F(β1, ..., βn).
Theorem 267 Algebraic dependence over F is an abstract dependence relation on K in the sense of Chapter 1.
proof: We must show that the relation of algebraic dependence over F satisfies the three required properties of a dependence relation: reflexivity, transitivity and the exchange condition.
Clearly if β ∈ {β1 , . . . , βn }, then β lies in F (β1 , . . . , βn ) and so is algebraic
over it. So the relation is reflexive.
Now suppose γ is algebraic over F (α1 , . . . , αm ), and each αi is algebraic
over F (β1 , . . . , βn ). Then certainly, αi is algebraic over the field
F (β1 , . . . , βn , α1 , . . . , αi−1 ).
So, setting B := {β1 , . . . , βn }, we obtain a tower of fields,
$$F(B) \le F(B, \alpha_1) \le F(B, \alpha_1, \alpha_2) \le \cdots \le F(B, \alpha_1, \ldots, \alpha_m) \le F(B, \alpha_1, \ldots, \alpha_m, \gamma),$$
with each field in the tower of finite degree over its predecessor. Thus the
top field in the tower is a finite extension of the bottom field, and so γ is
algebraic over F (B).
We now address the exchange condition: we are to show that if γ is algebraic over F(β1, ..., βn) but is not algebraic over F(β1, ..., βn−1), then βn is algebraic over the field F(β1, ..., βn−1, γ).⁵
⁴ This subsection is drawn from notes of my colleague David Surowski, for which I am most grateful.
⁵ It is interesting to observe that in this context the Exchange Condition is essentially an “implicit function theorem”. The hypothesis of the condition, transferred to the context of algebraic geometry, says that the “function” γ is expressible locally in terms of {β1, ..., βn} but not in terms of {β1, ..., βn−1}. Intuitively, this means ∂γ/∂βn ≠ 0, from which we anticipate the desired conclusion.
By hypothesis, there is a polynomial relation
$$0 = a_k\gamma^k + a_{k-1}\gamma^{k-1} + \cdots + a_1\gamma + a_0 \qquad (11.14)$$
where a_k is non-zero, and each a_j is in F(β1, ..., βn), and so, at the very least, can be expressed as a rational function n_j/d_j, where the numerators n_j and denominators d_j are polynomials in F[β1, ..., βn]. We can then multiply equation (11.14) through by the product of the d_j, from which we see that, without loss of generality, we may assume the coefficients a_i are polynomials – explicitly:
$$a_i = \sum_j b_{ij}\beta_n^j, \qquad b_{ij} \in F[\beta_1, \ldots, \beta_{n-1}], \quad a_k \neq 0. \qquad (11.15)$$
Then substituting the formulae of equation (11.15) into equation (11.14), and collecting the powers of βn, we obtain the polynomial relation
$$0 = p_m\beta_n^m + p_{m-1}\beta_n^{m-1} + \cdots + p_1\beta_n + p_0, \qquad (11.16)$$
with each coefficient
$$p_t = \sum_{s=0}^{k} b_{st}\gamma^s, \qquad (11.17)$$
a polynomial in the ring F[β1, ..., βn−1, γ].
Now we get the required algebraic dependence of βn on {β1, ..., βn−1, γ} from equation (11.16), provided not all of the coefficients p_t are zero. Suppose, by way of contradiction, that they were all zero. Then as γ does not depend on {β1, ..., βn−1}, having each of the expressions in equation (11.17) equal to zero forces each coefficient b_st to be zero, so by equation (11.15), each a_i is zero. But that is impossible since certainly a_k is non-zero.
Thus the Exchange Condition holds and the proof is complete.
So, as we have just seen, if K is an extension of the field F, then the relation of being algebraically dependent over F is an abstract dependence relation, and we call any subset X of K which is independent with respect to this relation algebraically independent (over F). By the basic development of Chapter 1, Section 3, maximal algebraically independent sets exist, they “span” K, and any two such sets have the same cardinality. We call such subsets of K transcendence bases of K over F, and their common cardinality is called the transcendence degree of K over F, and is denoted tr.deg(K/F).
If X is a transcendence basis for K over F , then the “spanning” property
means that every element of K is algebraic over the subfield F (X) generated
by X. One calls F(X) a “purely transcendental” extension of F, although this notion conveys little since it is completely relative to X. For example, if x is a real number that is transcendental over Q, the field of rational numbers, then both Q(x) and Q(x²) are purely transcendental extensions of Q, while the former is algebraic over the latter.⁶
Next we observe that transcendence degrees add under iterated field extension just as ordinary degrees multiply. Specifically:
Corollary 268 Suppose we have a tower of fields
F ≤ K ≤ L.
Then
tr.deg[K : F] + tr.deg[L : K] = tr.deg[L : F].
proof: Let X be a transcendence basis of K over F and let Y similarly be a transcendence basis of L over K. Clearly then, X ∩ Y = ∅, every element of K is algebraic over F(X), and every element of L is algebraic over K(Y).
The corollary will be proved if we can show that X ∪ Y is a transcendence
basis of L over F .
First, we claim that X ∪ Y spans L – that is, every element of L is
algebraic over F (X ∪ Y ). Let λ be an arbitrary element of L. Then there is
an algebraic relation
$$\lambda^m + a_{m-1}\lambda^{m-1} + \cdots + a_1\lambda + a_0 = 0,$$
where m ≥ 1, and the coefficients a_i are elements of K(Y). This means that
each of them can be expressed as a quotient of two polynomials in K[Y ].
Let B be the set of all coefficients of the monomials in Y appearing in the
numerators and denominators of these quotients representing ai , as i ranges
over 0, 1, . . . , m. Then B is a finite subset of K, and λ is now algebraic over
⁶ In this connection, there is a notion that is very useful in looking at the subject of Riemann Surfaces from an algebraic point of view: A field extension K of F of finite transcendence degree over F is called an algebraic function field if and only if, for any transcendence basis X, K is a finite extension of F(X).
F (X ∪ Y ∪ B). Since each element of B is algebraic over F (X), F (X ∪ B)
is a finite extension of F (X), and so F (X ∪ Y ∪ B) is a finite extension of
F (X ∪ Y ). It then follows that F (X ∪ Y ∪ B, λ) is a finite extension of
F (X ∪ Y ), and so λ is algebraic over F (X ∪ Y ) as desired.
It remains to show that X ∪ Y is algebraically independent over F. But if not, there is a polynomial p ∈ F[z1, ..., z_{n+m}] and finite subsets {x1, ..., x_m} and {y1, ..., y_n} of X and Y, respectively, such that upon substitution of the x_i for the first m z_i's, and the y_i for the remaining n z_i's, one obtains
$$0 = p(x_1, \ldots, x_m, y_1, \ldots, y_n).$$
Now this can be rewritten as a polynomial whose monomial terms are monomials in the y_i, with coefficients which are polynomials c_k in F[z1, ..., z_m] evaluated at x1, ..., x_m. Since the y_i are algebraically independent over K, the coefficient polynomials c_k are each equal to zero when evaluated at x1, ..., x_m. But again, since the x_j are algebraically independent over F, we see that each polynomial c_k is identically 0. This means that the original polynomial p was the zero polynomial of F[z1, ..., z_{m+n}], establishing the algebraic independence of X ∪ Y. The proof is complete.
Corollary 269 An algebraic extension of an algebraic extension is algebraic.
Precisely, if F ≤ K ≤ L is a tower of fields with K an algebraic extension of
F and L an algebraic extension of K, then L is an algebraic extension of F .
proof: This follows from the previous theorem and the fact that zero plus
zero is zero.
remark: Before, we knew this corollary only with “finite degree” replacing the word “algebraic” in the statement.
11.9 Appendix to Chapter 10: Fields with a Valuation
11.9.1 Introduction
Let R denote the field of real numbers and let F be any field. A mapping
φ : F → R from F into the field of real numbers is a valuation on F if and
only if, for each α and β in F :
1. φ(α) ≥ 0, and φ(α) = 0 if and only if α = 0.
2. φ(αβ) = φ(α)φ(β), and
3. φ(α + β) ≤ φ(α) + φ(β).
This concept is a generalization of certain “distance” notions which are
familiar to us in the case that F is the real or complex field. For example,
in the case of the real field R, the absolute value function
$$|r| = \begin{cases} r & \text{if } r \ge 0 \\ -r & \text{if } r < 0 \end{cases}$$
is easily seen to be a valuation. Similarly, for the complex numbers, the
function “∥ ∥”, defined by
$$\|a + bi\| = \sqrt{a^2 + b^2},$$
where a, b ∈ R and the square root is non-negative, is also a valuation.
In this case, upon representing complex numbers by points in the complex
plane, kzk measures the distance of z from 0. The “subadditive property”
(3.) then becomes the triangle inequality.
A valuation φ : F → R is said to be archimedean if and only if there is
a natural number n such that if
$$s(n) := 1 + 1 + \cdots + 1 \quad (\text{with exactly } n \text{ summands}),$$
in F , then φ(s(n)) > 1.
These valuations of the real and complex numbers are clearly archimedean.
An example of a non-archimedean valuation is the trivial valuation for
which φ(0) = 0, and φ(α) = 1 for all non-zero elements α in F .
Here is another important example. Let F be the field of quotients of some
unique factorization domain D which contains a prime element p. Evidently,
F ≠ D. Then any nonzero element α of F can be written in the form
$$\alpha = \frac{a}{b} \cdot p^k,$$
where a, b ∈ D, p does not divide either a or b, and k is an integer. The integer k and the element a/b ∈ F are uniquely determined by the element α. Then set
$$\varphi(\alpha) = e^{-k},$$
where e is any fixed number larger than 1, such as 2.71 . . .. Otherwise we set
φ(0) = 0. The student may verify that φ is a genuine valuation of F . (Only
the subadditive property really needs to be verified.) From this construction,
it is clear that φ is non-archimedean.
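To make the construction concrete, here is a sketch of ours for the special case D = Z, F = Q (the function names vp and phi are our own); it also spot-checks the multiplicative property and the non-archimedean inequality established in Lemma 271 below:

    import math
    from fractions import Fraction

    def vp(alpha, p):
        """The integer k with alpha = (a/b) * p**k, p dividing neither a nor b."""
        assert alpha != 0
        num, den, k = alpha.numerator, alpha.denominator, 0
        while num % p == 0:
            num //= p; k += 1
        while den % p == 0:
            den //= p; k -= 1
        return k

    def phi(alpha, p, e=math.e):
        return 0.0 if alpha == 0 else e ** (-vp(alpha, p))

    a, b, p = Fraction(9, 4), Fraction(5, 6), 3
    print(math.isclose(phi(a * b, p), phi(a, p) * phi(b, p)))  # multiplicative: True
    print(phi(a + b, p) <= max(phi(a, p), phi(b, p)))          # ultrametric: True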
Lemma 270 If φ is a valuation of F , then the following statements hold:
1. φ(1) = 1.
2. If ζ is a root of unity in F , then φ(ζ) = 1. In particular, φ(−1) = 1
and so φ(−α) = φ(α), for all α in F .
3. The only valuation possible on a finite field is the trivial valuation.
4. For any α, β ∈ F ,
|φ(α) − φ(β)| ≤ φ(α − β).
The proof of this lemma is left as an exercise for the student.
Lemma 271 If φ is a non-archimedean valuation of F , then for any α, β ∈
F,
φ(α + β) ≤ max(φ(α), φ(β)).
proof: From the subadditivity, and the fact that for any number of summands φ(1 + 1 + · · · + 1) ≤ 1, we have
$$\varphi((\alpha + \beta)^n) = \varphi\Big(\sum_{j=0}^{n}\binom{n}{j}\alpha^{n-j}\beta^j\Big) \le \sum_{j=0}^{n}\varphi(\alpha)^{n-j}\varphi(\beta)^j \le (n+1)\max(\varphi(\alpha)^n, \varphi(\beta)^n).$$
Since the taking of positive n-th roots in the reals is monotone, we have
$$\varphi(\alpha + \beta) \le (n+1)^{1/n}\max(\varphi(\alpha), \varphi(\beta)).$$
Now since n is an arbitrary positive integer, and (n + 1)^{1/n} can be made arbitrarily close to 1, the result follows.
11.9.2 The Language of Convergence
We require a few definitions. Throughout them, φ is a valuation of the field
F.
First of all, a sequence is a mapping N → F , from the natural numbers
to the field F , but as usual it can be denoted by a displayed indexed set, for
example
{α0 , α1 , . . .} = {αk } or {βk }.
A sequence {α_k} is said to be a Cauchy sequence if and only if, for every positive real number ε, there exists a natural number N(ε), such that
$$\varphi(\alpha_m - \alpha_n) < \varepsilon,$$
for all natural numbers n and m larger than N(ε).
A sequence {α_k} is said to converge relative to φ if and only if there exists an element α ∈ F such that for any real ε > 0, there exists a natural number N(ε), such that
$$\varphi(\alpha - \alpha_k) < \varepsilon, \qquad (11.18)$$
for all natural numbers k exceeding N(ε).
It is an easy exercise to show that an element α satisfying (11.18) for each ε > 0 is unique. [Note that in proving uniqueness, the choice function N : R⁺ → N taking ε to N(ε) is posited for the assertion that α is a limit. To assert that β is a limit, an entirely different choice function M : ε → M(ε) may be posited.] In that case we say that α is the limit of the convergent
sequence {αk } or that the sequence {αk } converges to α.
A third easy exercise is to prove that every convergent sequence is a
Cauchy sequence.
A sequence which converges to 0 is called a null sequence. Thus {α_k} is a null sequence if and only if, for every real ε > 0, however small, there exists a natural number N(ε), such that φ(α_k) < ε, for all k > N(ε).
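For instance (a sketch of ours, reusing the 3-adic valuation on Q from the previous subsection, with e = 3): the sequence α_k = 3^k is a null sequence, and so the partial sums of Σ 3^k form a Cauchy sequence, even though they diverge in the usual absolute value:

    from fractions import Fraction

    def vp(alpha, p=3):
        num, den, k = alpha.numerator, alpha.denominator, 0
        while num % p == 0:
            num //= p; k += 1
        while den % p == 0:
            den //= p; k -= 1
        return k

    def phi(alpha, p=3):  # the 3-adic valuation, with e = 3
        return 0.0 if alpha == 0 else 3.0 ** (-vp(alpha, p))

    terms = [Fraction(3) ** k for k in range(10)]
    partial = [sum(terms[:k + 1]) for k in range(10)]

    print([phi(t) for t in terms[:4]])   # 1, 1/3, 1/9, 1/27 (as floats): a null sequence
    print(phi(partial[9] - partial[4]))  # 3**-5: distant partial sums are phi-close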
A field F with a valuation φ is said to be complete with respect to
φ if and only if every Cauchy sequence converges (with respect to φ) to an
element of F .
Now, finally, a completion of F (with respect to φ) is a field F̄ with
a valuation φ̄ having these three properties:
1. The field F̄ is an extension of the field F .
2. F̄ is complete with respect to φ̄.
3. Every element of F̄ is the limit of a φ̄-convergent sequence of elements of F. (We capture this condition with the phrase “F is dense in F̄”.)
So far our glossary entails
• Convergent sequence
• Limit
• Null sequence
• Cauchy sequence
• Completeness with respect to a valuation φ
• A completion of a field with a valuation
11.9.3 Completions exist
The stage is set. In this subsection our goal is to prove that a completion
(F̄ , φ̄) of a field with a valuation (F, φ) always exists.
In retrospect, one may now realize that the field of real numbers R is the
completion of the field of rational numbers Q with respect to the “absolute”
valuation, | |. The construction of such a completion should not be construed
as a “construction” of the real numbers. That would be entirely circular since
we used the existence of the real numbers just to define a valuation and to
deduce some of the elementary properties of it.
One may observe that the set of all sequences over F possesses the structure of a ring, namely the direct product of countably many copies of the
field F . Thus
$$S := F^{\mathbb{N}} = \prod_{k \in \mathbb{N}} F_k, \quad \text{where each } F_k = F.$$
Using our conventional sequence notation, addition and multiplication in S
are defined by the equations
$$\{\alpha_k\} + \{\beta_k\} = \{\alpha_k + \beta_k\}, \qquad \{\alpha_k\} \cdot \{\beta_k\} = \{\alpha_k \beta_k\}.$$
For each element α ∈ F , the sequence {αk } for which αk = α, for every
natural number k is called a constant sequence and is denoted ᾱ. The zero
element of the ring S is clearly the constant sequence 0̄. The multiplicative
identity of the ring S is the constant sequence 1̄ (where “1” denotes the
multiplicative identity of F ).
Obviously, the constant sequences form a subring of S which is isomorphic
to the field F . It should also be clear that every constant sequence is also
a Cauchy sequence. We shall show that the collection Ch of all Cauchy
sequences also forms a subring of S.
In order to do this, we first make a simple observation. A sequence {α_k} is said to be bounded if and only if there exists a real number b such that φ(α_k) < b for all natural numbers k. Such a real number b is called an upper bound of the sequence {α_k}. The observation is
Lemma 272 Every Cauchy sequence is bounded.
proof: Suppose {α_k} is a Cauchy sequence. Then there is a natural number N = N(1) such that φ(α_N − α_m) < 1, for all natural numbers m exceeding N. Now set
$$b := 1 + \max\{\varphi(\alpha_0), \varphi(\alpha_1), \ldots, \varphi(\alpha_N)\}.$$
If k ≤ N, φ(α_k) < b, by the definition of b. If k > N then φ(α_k) < b by the definition of N and the fact that φ(α_N) ≤ b − 1. So b is an upper bound.
Lemma 273 The collection Ch of all Cauchy sequences of S is closed under
addition and multiplication of sequences. Thus Ch is a subring of S, the ring
of all sequences over F .
proof: Recall that if {α_k} is a Cauchy sequence, there must exist an auxiliary function N : R⁺ → N such that if ε ∈ R⁺, then φ(α_n − α_m) < ε for all numbers n and m larger than N(ε). This auxiliary function N is not unique; it is just that at least one such function is available for each Cauchy sequence. If we wish to indicate an available auxiliary function N, we simply write ({α_k}, N) instead of {α_k}.
Now suppose ({α_k}, N) and ({β_k}, M) are two Cauchy sequences. For each positive real number ε set K(ε) := max(N(ε/2), M(ε/2)). Then for any natural numbers n and m exceeding K(ε), we have
$$\varphi(\alpha_n + \beta_n - \alpha_m - \beta_m) \le \varphi(\alpha_n - \alpha_m) + \varphi(\beta_n - \beta_m) < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
Thus ({α_k + β_k}, K) is a Cauchy sequence.
Again suppose ({α_k}, N) and ({β_k}, M) are two Cauchy sequences. By the preceding Lemma 272, we may assume that these sequences possess positive upper bounds b and c, respectively. For each positive real ε set
$$K(\varepsilon) := \max(N(\varepsilon/2c), M(\varepsilon/2b)).$$
Now suppose n and m are natural numbers exceeding K(ε). Then
$$\varphi(\alpha_m\beta_m - \alpha_n\beta_n) = \varphi(\alpha_m\beta_m - \alpha_m\beta_n + \alpha_m\beta_n - \alpha_n\beta_n) \le \varphi(\alpha_m)\varphi(\beta_m - \beta_n) + \varphi(\beta_n)\varphi(\alpha_n - \alpha_m) < b(\varepsilon/2b) + c(\varepsilon/2c) = \varepsilon.$$
Thus ({α_k β_k}, K) is a Cauchy sequence.
Finally, it should be obvious that {−αk } is the additive inverse in S of
{αk } and, from an elementary property of φ, that the former is Cauchy if
and only if the latter is.
It follows that the Cauchy sequences form a subring of S.
The null sequences of S are simply the sequences that converge to zero.
We have already remarked that every convergent sequence is a Cauchy sequence. Thus the set Z of all null sequences lies in Ch. Note that the sequences whose coordinates are nonzero at only finitely many places (that is, the direct sum $\bigoplus_{\mathbb{N}} F$) form a subset Z₀ of Z.
Clearly the sum or difference of two null sequences is null. Suppose {α_k} is a null sequence, so that for every positive real ε there is a natural number N(ε) such that φ(α_m) < ε for all m > N(ε). Next let {γ_k} be any Cauchy sequence with upper bound b. Then, for every real ε > 0, and n > N(ε/b), we see that
$$\varphi(\alpha_n\gamma_n) < b \cdot (\varepsilon/b) = \varepsilon.$$
Thus the product {α_k}{γ_k} is a null sequence. So we have
Lemma 274 Z is an ideal of the ring Ch.
A sequence {α_k} is said to possess a lower bound ℓ if and only if, for some real ℓ > 0, one has φ(α_n) > ℓ for every natural number n. The next lemma, though mildly technical, is important.
Lemma 275 The following assertions hold:
1. Suppose a sequence {γ_k} in S has the property that for every real ε > 0, φ(γ_k) > ε for only finitely many natural numbers k. Then {γ_k} is a null sequence.
2. Suppose {β_k} is a Cauchy sequence which is not a null sequence. Then there exists a real number λ > 0, and a natural number N such that φ(β_k) > λ for all k > N.
3. If {α_k} is a Cauchy sequence having a lower bound ℓ, then it is a unit in the ring Ch.
proof: 1. The reader may want to do this as an exercise. The hint is that the hypotheses allow us to define a mapping K : R⁺ → N where
$$K(\varepsilon) := \max\{k \in \mathbb{N} \mid \varphi(\gamma_k) > \varepsilon\}.$$
Then K serves the desired role in the definition of a null sequence.
2. Using the contrapositive of the statement in part 1, we see that if {β_k} is a Cauchy sequence which is not null, there must exist some ε > 0 such that φ(β_k) > ε infinitely often. Also, since the sequence is Cauchy, there exists a natural number N(ε/2) such that
$$\varphi(\beta_n - \beta_m) \le \varepsilon/2,$$
for all natural numbers n and m exceeding N(ε/2). So there is a natural number M exceeding N(ε/2) such that φ(β_M) > ε. Then for m > M,
$$\varphi(\beta_m) = \varphi(\beta_M - (\beta_M - \beta_m)) \ge \varphi(\beta_M) - \varphi(\beta_M - \beta_m) > \varepsilon - (\varepsilon/2) = \varepsilon/2.$$
Thus λ := ε/2 and N := M satisfy the conclusion of part 2.
3. For each ε > 0, let K(ε) := N(ε · ℓ²). Then for any natural numbers n and m exceeding K(ε), we have
$$\varphi\Big(\frac{1}{\alpha_n} - \frac{1}{\alpha_m}\Big) = \varphi\Big(\frac{1}{\alpha_n\alpha_m}\Big)\varphi(\alpha_m - \alpha_n) \le \ell^{-2}\cdot(\varepsilon\ell^2) = \varepsilon.$$
Thus the sequence of reciprocals {α_k⁻¹} is a Cauchy sequence and is a multiplicative inverse to {α_k} in Ch. The proof is complete.
Corollary 276 The factor ring F̄ := Ch/Z is a field.
proof: It suffices to show that every element of Ch − Z is congruent to a
unit modulo the ideal Z. Suppose {βk } is a non-null Cauchy sequence. Then
by Lemma 275, there is a positive real number λ and a natural number N
such that φ(β_n) > λ for all natural numbers n greater than N. Now let {ζ_k} be the sequence for which
$$\zeta_k = \begin{cases} 1 - \beta_k & \text{if } k \le N \\ 0 & \text{if } k > N. \end{cases}$$
Then ζ_k + β_k equals 1 for k ≤ N and equals β_k for k > N, so {ζ_k + β_k} is bounded below by ℓ := min(1, λ) and so, by Lemma 275 part 3, is a unit. Since {ζ_k} has only finitely many non-zero terms, it belongs to Z, and we have shown the sufficient condition announced at the beginning of this proof.
Note that we have a natural embedding F → F̄ taking each element α
to the constant sequence ᾱ, all of whose terms are equal to α.
Now we need to extend the valuation φ on (the embedded copy of) F
to a valuation φ̄ of F̄ . The key observation is that whenever we consider
a Cauchy sequence {αk }, then {φ(αk )} is itself a Cauchy sequence of real
numbers with respect to the absolute value. For, given real ε > 0, there is a number N(ε) such that φ(α_n − α_m) < ε for all n and m exceeding N(ε). But |φ(α_n) − φ(α_m)| ≤ φ(α_n − α_m) < ε with the same conditions on n and m.
Now Cauchy sequences {rn } of real numbers tend to a real limit which
we denote by the symbol
$$\lim_{k\to\infty} r_k.$$
Thus, for any Cauchy sequence {α_k} in Ch, we may define
$$\bar\varphi(\{\alpha_k\}) := \lim_{k\to\infty} \varphi(\alpha_k).$$
Now note that if {αk } happens to be a sequence that already converges
to α in F , then, by the definition of convergence, limk→∞ φ(αk ) = φ(α). In
short, on convergent sequences, φ̄ can be evaluated simply by taking φ of the existing limit. In particular this works for constant sequences, whereby we have arrived at the fact that φ̄ extends the mapping φ defined on the embedded subfield F.
embedded subfield F .
Now observe
Lemma 277 If {αk } and {βk } are two Cauchy sequences with the property
that
φ(αk ) ≤ φ(βk ),
for all natural numbers k larger than some number N , then
φ̄({αk }) ≤ φ̄({βk }).
proof: The student should be able to prove this, either by assuming the
result for real Cauchy sequences, or from first principles by examining what
it would mean for the conclusion to be false.
Corollary 278 If {αk } and {βk } are two Cauchy sequences,
φ̄({αk + βk }) ≤ φ̄({αk }) + φ̄({βk }).
proof: Immediate from Lemma 277.
Corollary 279 The following statements are true.
1. {αk } is a null sequence if and only if φ̄({αk }) = 0 ∈ R.
2. φ̄ assumes a constant value on each coset of Z in Ch.
proof: Another exercise in the meaning of the definitions.
Now for any coset γ := {βk } + Z of Z in Ch we can let φ̄(γ) be the
constant value of φ̄ on all Cauchy sequences in this coset. In effect φ̄ is now
defined on the factor ring F̄ = Ch/Z, which, as we have noted, is a field.
Moreover, from the preceeding Corollary φ̄ assumes the value 0 only on the
zero element 0̄ = 0 + Z = Z of F̄ .
At this stage – and with a certain abuse of notation – φ̄ is a non-negative real-valued function defined on two distinct domains, Ch and F̄. However the ring epimorphism f : Ch → F̄ preserves the φ̄-values. That is important
for three reasons.
First it means that if φ̄ is multiplicative on Ch, then it is on F̄. But it is indeed the case that φ̄ is multiplicative on Ch, since, for {α_k} and {β_k} in Ch,

lim_{k→∞} φ(α_k β_k) = lim_{k→∞} φ(α_k)φ(β_k) = (lim_{k→∞} φ(α_k)) · (lim_{k→∞} φ(β_k)).
Second, the fact that the homomorphism f preserves φ̄ means that if φ̄ is subadditive on Ch, then it is also subadditive on F̄. But the subadditivity on Ch is asserted in Corollary 278.
Third, since F ∩ Z = 0 in Ch (that is, the only constant sequence which is null is the zero constant sequence), the homomorphism f embeds F (in its transparent guise as the field of constant sequences) into F̄ without any change in the valuation.
These observations, taken together, yield
Corollary 280 The function φ̄ : F̄ → R+ is a valuation of F̄ , which extends
the valuation φ on its subfield F .
It remains to show that F̄ is complete with respect to the valuation φ̄ and that F is dense in F̄.
Recall that the symbol ᾱ denotes the constant sequence {α, α, . . .}. Its
image in F̄ is then f (ᾱ) = ᾱ+Z. Now suppose {αk } is any Cauchy sequence,
and let γ = {αk } + Z be its image in F̄ . We claim that the sequence
{f (ᾱ1 ), f (ᾱ2 ), . . .}
(of constant sequences mod Z) is a sequence of elements in F̄ converging to
γ relative to φ̄. Choose a positive real number ε. Then there exists a natural number N(ε) such that

φ(α_{N(ε)} − α_m) < ε for all m > N(ε).

Then, as f preserves φ̄,

φ̄(f(ᾱ_{N(ε)}) − γ) = φ̄(ᾱ_{N(ε)} − {α_k}) = lim_{n→∞} φ(α_{N(ε)} − α_n) ≤ ε.
So our claim is true. F is indeed dense in F̄ .
Now suppose {γ_ℓ} is a Cauchy sequence of elements of F̄ with respect to φ̄. For each index ℓ, we can write

γ_ℓ = {β_{ℓk}},

a Cauchy sequence over F indexed by k. So there exists a natural number N(ℓ) such that

φ(β_{ℓN(ℓ)} − β_{ℓm}) < 2^{−ℓ}    (11.19)
for all m exceeding N(ℓ). This allows us to define the following set of elements of F:

δ_ℓ := β_{ℓN(ℓ)}, for ℓ = 0, 1, . . . .

Then taking limits in equation (11.19), we have

φ̄(f(δ̄_ℓ) − γ_ℓ) ≤ 2^{−ℓ},

where, as usual, f(δ̄_ℓ) is the image of a constant sequence, namely the coset {δ_ℓ, δ_ℓ, δ_ℓ, . . .} + Z.
Now for any natural numbers n and m, we have

φ(δ_n − δ_m) = φ̄(f(δ̄_n) − f(δ̄_m))
= φ̄(f(δ̄_n) − γ_n + γ_n − γ_m + γ_m − f(δ̄_m))
≤ φ̄(f(δ̄_n) − γ_n) + φ̄(γ_n − γ_m) + φ̄(γ_m − f(δ̄_m))
≤ φ̄(γ_n − γ_m) + 2^{−n} + 2^{−m}.

But since {γ_ℓ} is a Cauchy sequence, the above inequalities tell us that δ := {δ_k} is a Cauchy sequence in Ch. Moreover,

φ̄(δ − γ_ℓ) = lim_{k→∞} φ(δ_k − β_{ℓk}) ≤ 2 · 2^{−ℓ} + lim sup_{k→∞} φ̄(γ_k − γ_ℓ),

and the right-hand side tends to 0 as ℓ → ∞ because {γ_ℓ} is Cauchy. Thus {γ_ℓ} converges to δ with respect to φ̄. It follows that F̄ is a complete field.
Summarizing all of the results of this subsection, one has the following
Theorem 281 The field F̄ = Ch/Z with valuation φ̄ is a completion of
(F, φ).
11.9.4 The non-archimedean p-adic valuations
Our classical example of a non-archimedean valuation was derived from the choice of a prime p in a unique factorization domain D. The valuation φ is defined on the field F of quotients of D by the rule that φ(r) = e^{−k} whenever we write r ∈ F in the canonical form r = (a/b)p^k, where a and b are elements of D which are not divisible by p, and k is an integer. Of course, to make this function subadditive, we must take e to be a fixed real number
greater than 1. (Many authors choose p itself. That way, the values never
overlap from one prime to another.)
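For readers who want to experiment, this valuation is easy to compute in the classical case D = Z, F = Q. The following Python sketch is an illustration only: the function names and the choice e = 2 are assumptions made here, not fixed by the text.

from fractions import Fraction

def p_adic_exponent(r: Fraction, p: int) -> int:
    # Return the integer k in the canonical form r = (a/b) * p**k,
    # where p divides neither a nor b.  Assumes r != 0.
    k, a, b = 0, r.numerator, r.denominator
    while a % p == 0:
        a //= p
        k += 1
    while b % p == 0:
        b //= p
        k -= 1
    return k

def phi(r: Fraction, p: int, e: float = 2.0) -> float:
    # The valuation phi(r) = e**(-k), with phi(0) = 0 by convention.
    return 0.0 if r == 0 else e ** (-p_adic_exponent(r, p))

# phi is multiplicative, and non-archimedean on sums:
r, s = Fraction(50, 3), Fraction(9, 20)
assert phi(r * s, 5) == phi(r, 5) * phi(s, 5)
assert phi(r + s, 5) <= max(phi(r, 5), phi(s, 5))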
Now notice that from Lemma 271, the collection of all elements α of F
for which φ(α) ≤ 1 is closed under addition as well as multiplication, and so forms a subring O of F called the valuation ring of (F, φ). From the
multiplicative properties of φ, it is clear that every non-zero element of F
either lies in O, or has its multiplicative inverse in O.
A somewhat smaller set is P := {α ∈ F |φ(α) < 1}. It is easy to see that
this is an ideal in O. In fact the set O − P = {u ∈ F |φ(u) = 1} is closed
under taking multiplicative inverses in F and so consists entirely of units of
O. Conversely, since φ(1) = 1, every unit of O must have φ-value 1, so in
fact
O − P = U(O),    (11.20)

the (multiplicative) group of units of O. Thus P is the unique maximal ideal of O, and O is a local ring.
Note that O and P have been completely determined by F and φ. The
factor ring O/P is a field called the residue field of F with respect to φ.
The same sort of argument used to show (11.20) can be used to show that
there exists an element π ∈ P , such that for any natural number k,
P^k − P^{k+1} = π^k U(O).
Taking the disjoint union of these set differences, one concludes that P is the
principal ideal P = Oπ.
Suppose p is a prime integer, D = Z, so F = Q, the field of rational numbers. Then O is the ring of all fractions a/b where a and b are integers and p does not divide b. Terribly odd things happen! For example, {1, p, p², . . .} is a null sequence (since φ(p^k) = e^{−k} for all positive integers k), and

1 + p + p² + · · ·

is a "convergent series" – that is, its sequence of partial sums is a Cauchy sequence.
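This "odd" convergence can be watched numerically. The sketch below reuses the phi helper from the previous illustration and checks that the partial sums s_k = 1 + p + · · · + p^k satisfy (1 − p)s_k − 1 = −p^{k+1}, so that φ((1 − p)s_k − 1) → 0; in other words the series converges to 1/(1 − p) with respect to φ. Again, this is only an illustrative computation.

from fractions import Fraction

p, e = 5, 2.0
for k in range(1, 6):
    s_k = sum(Fraction(p) ** i for i in range(k + 1))  # 1 + p + ... + p^k
    # (1 - p) * s_k - 1 telescopes to -p^(k+1):
    assert (1 - p) * s_k - 1 == -Fraction(p) ** (k + 1)
    print(k, phi((1 - p) * s_k - 1, p, e))             # 0.25, 0.125, 0.0625, ...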
Let us return to our classic example. There F was the field of fractions of a unique factorization domain D, p is a prime element of D, and φ(r) = e^{−k} when r ∈ F is written in the form (a/b)p^k, a, b ∈ D, k an integer. Let O be the valuation ring and let P be its unique maximal ideal. Similarly, let F̄, φ̄, Ō and P̄ be the completion of F, its extended valuation, valuation ring and unique maximal ideal, respectively.
Theorem 282 The following statements hold:
1. D ⊆ O, and D ∩ P = pD.
2. D + P = O.
3. F ∩ Ō = O, O ∩ P̄ = P .
4. O + P̄ = Ō.
5. The residue class fields of F and F̄ are both D/pD.
proof: 1. Clearly every element of D has valuation at most 1 so the first
statement holds. Those elements of D of value less than 1 are divisible by p,
and vice versa. So the second statement holds.
2. Suppose r ∈ O − P. Then r has the form a/b, where a and b are elements of D which are not divisible by p. Since p is a prime and D is a UFD, there are elements c and d of D such that cp + db = a. Then

r = a/b = (c/b) · p + d.    (11.21)

Now p does not divide d, otherwise it would divide a = cp + db, against our assumption about r. Thus d is an element of D also in O − P, and by equation (11.21), r ≡ d modulo P. This proves 2.
3. Both statements here just follow from the fact that φ̄ extends φ.
4. Now let ᾱ be any element in Ō − P̄, so φ̄(ᾱ) = 1. Since F is dense in F̄, there is an element r ∈ F such that φ̄(ᾱ − r) < 1. This means r − ᾱ is an element of P̄. Since Ō is an additive subgroup of F̄ containing P̄, r belongs to Ō ∩ F = O, and we can say r ≡ ᾱ mod P̄. This proves 4.
5. First using 3. and 4., and then 1. and 2.,

Ō/P̄ = (O + P̄)/P̄ ≅ O/(O ∩ P̄) = O/P, while
O/P = (D + P)/P ≅ D/(D ∩ P) = D/pD.
Now fix a system Y of coset representatives of D/pD and suppose 0 ∈ Y. For example, if D = Z, then p is a rational prime, the residue class field is Z/pZ, and we may take Y to be the set of integers

{0, 1, 2, . . . , p − 1}.
For another example, suppose D = K[x] where K is some field, and p =
x + a is a monic polynomial of degree one. Then the residue class field is
K[x]/(x + a) ≅ K. But as K is a subring of D, it makes perfect sense to
take Y = K (the polynomials of degree zero).
Now in the notation of Theorem 282, D + P̄ = (D + P) + P̄ = O + P̄ = Ō. So, for a fixed ᾱ ∈ Ō, there is a unique element a_0 ∈ Y such that ᾱ − a_0 ∈ P̄. (If ᾱ is already in P̄, this element is 0.) Now P̄ = pŌ, so

ᾱ_1 := (1/p)(ᾱ − a_0) ∈ Ō.

Then we repeat this procedure to obtain a unique element a_1 ∈ Y such that ᾱ_1 − a_1 ∈ pŌ. Then

ᾱ ≡ a_0 + pa_1 mod p²Ō.

Again, one can find a_2 ∈ Y such that if ᾱ_2 := (1/p)(ᾱ_1 − a_1), then ᾱ_2 − a_2 ∈ pŌ. So, after k repetitions of this procedure, we obtain a sequence of k + 1 unique elements of Y, (a_0, a_1, . . . , a_k), such that

ᾱ ≡ a_0 + a_1p + a_2p² + · · · + a_kp^k mod P̄^{k+1}.
Thus we can write any ᾱ ∈ F̄ in the "power series" form

ᾱ = p^k[a_0 + a_1p + a_2p² + · · ·],

where the a_i are uniquely determined elements of Y, and a_0 ≠ 0, so that k is determined by e^{−k} = φ̄(ᾱ).
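With D = Z and Y = {0, 1, . . . , p − 1}, the subtract-and-divide procedure above is easy to mechanize for a rational α whose denominator is prime to p (so that α lies in the valuation ring O). The helper name and interface in the following sketch are invented here for illustration.

from fractions import Fraction

def p_adic_digits(alpha: Fraction, p: int, n: int) -> list:
    # First n digits a_0, ..., a_{n-1} in Y = {0, ..., p-1} with
    # alpha = a_0 + a_1 p + a_2 p^2 + ...  Requires p not dividing
    # the denominator of alpha.
    digits = []
    for _ in range(n):
        # a_0 is the unique element of Y with alpha - a_0 in pO:
        a = (alpha.numerator * pow(alpha.denominator, -1, p)) % p
        digits.append(a)
        alpha = (alpha - a) / p   # the step alpha_1 = (1/p)(alpha - a_0)
    return digits

# 1/(1 - 5) = -1/4 should expand as 1 + 5 + 5^2 + ..., matching the
# geometric series discussed earlier:
print(p_adic_digits(Fraction(-1, 4), 5, 6))   # [1, 1, 1, 1, 1, 1]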
For example, if p = x ∈ K[x] = D, then, taking Y = K, we see that F̄ is the field of Laurent series over K – that is, the ring of power series of the form

b_{−k}x^{−k} + · · · + b_{−1}x^{−1} + b_0 + b_1x + b_2x² + · · · , k ∈ N,

where the b_i are in K and one follows the usual rules of adding and multiplying power series.
The ring of Laurent series is very important in the study of power series
generating functions. Be aware that the field K here is still arbitrary. It can
even be a finite field.
Note that if D = Z and p is a rational prime, the system of coset representatives Y is not closed under either addition or multiplication and so, in
order to restore the coefficients of the powers of p to their proper location in
Y , some local adjustments are needed after addition or multiplication. For
example if p = 5, then we must rewrite
(1 + 3(5))(1 + 4(5)) = 1 + 7(5) + 12(5²)

as

1 + 2(5) + 3(5²) + 2(5³).
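These local adjustments are nothing more than ordinary base-p carries, applied coefficient by coefficient from the low powers upward. A minimal sketch of the renormalization step follows; the function name and the list-of-coefficients representation are choices made for this illustration.

def normalize(coeffs, p):
    # Rewrite a_0 + a_1 p + a_2 p^2 + ... so that every coefficient lies
    # in Y = {0, ..., p-1}, propagating carries to higher powers of p.
    out, carry = [], 0
    for a in coeffs:
        carry, digit = divmod(a + carry, p)
        out.append(digit)
    while carry:
        carry, digit = divmod(carry, p)
        out.append(digit)
    return out

# The example in the text, with p = 5:
print(normalize([1, 7, 12], 5))   # [1, 2, 3, 2], i.e. 1 + 2(5) + 3(5^2) + 2(5^3)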
11.10 Appendix to Chapter 10: Finite Division Rings
In 1905 J. H. M. Wedderburn proved that every finite division ring is in fact
a field.
In van der Waerden's book (1930-31) [??, pp. 104-105] one finds a seemingly short proof that depends only on the easy exercise which proves that
no finite group can be the union of the conjugates of a proper subgroup.
But the proof is short only in the sense that the number of words required
to render it is not large. Measured from actual elementary first principles,
the proof spans a considerable “logical distance”. It uses the notion of a
splitting field for a division ring, and a certain structure theorem involving a
tensor product of algebras. Both of these notions are beyond what has been
developed in this and the preceding chapters.
Several other proofs are available. The first proof given below is due to
T. J. Kaczynski [??] and has several points of interest.
We begin with a small lemma.
Lemma 283 Every element in a finite field GF(q) is the sum of two squares. In particular, when p is a prime number, there exist numbers t and r such that t² + r² ≡ −1 mod p, with t ≢ 0.

proof: If q is even, all elements are squares, and hence each is the sum of 0² and a square. So suppose q is odd and let S be the set of non-zero squares. Then, for any non-square g ∈ GF(q), we have a partition GF(q)* = S ∪ Sg. Now using 0 as one of the summand squares, we see that every element of {0} ∪ S is the sum of two squares, and if (S + S) ∩ gS ≠ ∅ (where now "S + S" indicates all possible sums of two squares), then this is true of gS as well and the result is proved. Otherwise S + S ⊆ {0} ∪ S, and so {0} ∪ S is a subfield of GF(q). That is impossible since |{0} ∪ S| = (q + 1)/2 is not a prime power dividing q.
In particular, −1 can be represented as the sum of two squares, and one of them at least (say t) is non-zero.
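The lemma is easily confirmed by brute force for small primes. The sketch below finds a pair (t, r) with t² + r² ≡ −1 mod p and t ≢ 0, as promised; it is an illustration only, and the function name is invented here.

def two_squares_minus_one(p: int):
    # Find t, r with t^2 + r^2 = -1 (mod p) and t != 0 (mod p).
    for t in range(1, p):
        for r in range(p):
            if (t * t + r * r + 1) % p == 0:
                return t, r
    raise ValueError("no solution found; is p prime?")

for p in [3, 5, 7, 11, 13]:
    t, r = two_squares_minus_one(p)
    assert (t * t + r * r + 1) % p == 0 and t % p != 0
    print(p, (t, r))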
Theorem 284 (Wedderburn) Every finite division ring is a field.
proof: (Kaczynski, 1964) Let D be a finite division ring and let P be the prime subfield in its center. Also let D* be the multiplicative group of non-zero elements of D. For some prime r let S be an r-Sylow subgroup of D*, of
order r^a, say. Now S cannot contain an abelian subgroup A of type Z_r × Z_r, otherwise the subfield generated by P ∪ A would have too many roots of x^r − 1 = 0. Thus S is either cyclic, or else r = 2 and S is a generalized quaternion group. We shall show that the latter case is also forbidden.
Suppose S is generalized quaternion. Then the characteristic p is not 2, and S contains a subgroup S_0 which is quaternion of order 8, generated by two elements a and b of order 4 for which bab^{−1} = a^{−1} and z = a² = b² is its central involution. Then the extension P(z) is a subfield containing the two roots of x² − 1 = 0, namely 1 and −1. It follows that

a² = −1 = b²,

so a^{−1} = a³ = −a. Thus

ba = a^{−1}b = −ab.
Now let p = |P|, the characteristic of D. Then by the lemma we can find elements t and r in P (and hence central in D) such that t² + r² = −1. Then
((ta + b) + r)((ta + b) − r) = (ta + b)² − r²
= t²a² + t(ab + ba) + b² − r²
= −t² + b² − r² = 0.
So one of the factors on the left is zero. But then b = −ta ± r, and so, as t ≠ 0, b commutes with a, a contradiction.
Thus, regardless of the parity of p, every r-Sylow subgroup of D* must be cyclic. It follows that D* must be solvable. Now let Z* = Z(D*) be the center of the group D* and suppose Z* ≠ D*. Then we can find a normal subgroup A of D* containing Z* such that A/Z* is a minimal normal subgroup of D*/Z*. Since the latter is solvable with all Sylow subgroups cyclic, A/Z* is cyclic. Now Z* is in the center of A, and since no group has a non-trivial cyclic factor over its center, A is abelian.
Now we shall show that in general, D∗ cannot contain a normal abelian
subgroup A which is not in the center Z ∗ . Without loss of generality we can
take (as above) Z* < A ⊴ D*. Choose y ∈ A − Z*. Then there exists an
element x in D∗ which fails to commute with y. Then also x + 1 is not in A
as it fails to commute with y as well. Then, as A is normal,
(1 + x)y = a(1 + x),
for some element a ∈ A. Then

y − a = ax − xy = (a − xyx^{−1})x.    (11.22)

Now if y − a = 0, then a = y and a = xyx^{−1} by (11.22), so y = xyx^{−1}, against the fact that x and y do not commute. Thus (a − xyx^{−1}) is invertible, and from (11.22),

x = (a − xyx^{−1})^{−1}(y − a).    (11.23)
But a, y and xyx−1 all belong to A and so commute with y. Thus every
factor on the right side of (11.23) commutes with y, and so x commutes with
y, the same contradiction as before.
The proof is complete.
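The quaternion step in this proof can be made concrete. In the quaternion algebra over GF(p), with a² = b² = −1 and ba = −ab, an element ta + b built from a solution of t² + r² = −1 (Lemma 283) is a genuine zero divisor, so no division ring can contain such a configuration. Here is a numerical sketch in coordinates (1, i, j, k) mod p; the representation and names are assumptions made for this illustration.

def qmul(q1, q2, p):
    # Hamilton quaternion product mod p in coordinates (1, i, j, k).
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return ((w1*w2 - x1*x2 - y1*y2 - z1*z2) % p,
            (w1*x2 + x1*w2 + y1*z2 - z1*y2) % p,
            (w1*y2 - x1*z2 + y1*w2 + z1*x2) % p,
            (w1*z2 + x1*y2 - y1*x2 + z1*w2) % p)

p = 7
t, r = 2, 3                      # t^2 + r^2 = 13 = -1 (mod 7), with t != 0
u = (r % p, t, 1, 0)             # (t*a + b) + r, where a = i, b = j
v = ((-r) % p, t, 1, 0)          # (t*a + b) - r
print(qmul(u, v, p))             # (0, 0, 0, 0): two non-zero elements multiply to 0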
Remark: As noted, the last part of Kaczynski’s proof did not use finiteness
at all. As a result, one can infer
Corollary 285 (Kaczynski) Suppose D is any division ring with center Z.
Then any abelian normal subgroup of D∗ lies in Z.
One might prefer a proof that did not use the fact that a finite group
with all Sylow subgroups cyclic is solvable. This is fairly elementary to a
finite group theorist (one uses a transfer argument on the smallest prime and
applies induction) but it would not be as transparent to the average man or
woman on the street.
Another line of argument, due to I. N. Herstein [??], develops some truly beautiful arguments from first principles. Consider:

Lemma 286 Suppose D is a division ring of characteristic p, a prime, and let P be its prime subfield, which, of course, is contained in the center Z.
Suppose a is any element of D − Z which is algebraic over P . Then there
exists an element y ∈ D such that
yay^{−1} = a^i ≠ a
for some integer i.
proof: Fix the element a as described. The commutator mapping δ : D → D,
which takes an arbitrary element x to xa − ax, is clearly an endomorphism
of the additive group (D, +). Also, for any element λ of the finite subfield P(a) obtained by adjoining a to P, the fact that λ commutes with a yields

δ(λx) = λxa − aλx = λ(xa − ax) = λδ(x).

Thus, regarding D as a left vector space V over the field P(a), we see that δ is a linear transformation of V.
Now, for any x ∈ D and positive integer k, an easy computation reveals

δ^k(x) = Σ_{j=0}^{k} (−1)^j \binom{k}{j} a^j x a^{k−j}.    (11.24)

Now P(a) is a finite field of order p^m, say. Then a^{p^m} = a. Since D has characteristic p, putting k = p^m in (11.24) gives

δ^{p^m}(x) = x a^{p^m} − a^{p^m} x = xa − ax = δ(x).
Thus the linear transformation δ of V satisfies

δ^{p^m} = δ.

Thus, in the commutative subalgebra of Hom(V, V) generated by δ, we have

0 = δ^{p^m} − δ = ∏_{λ ∈ P(a)} (δ − λI),
where the λI are the scalar transformations V → V , λ ∈ P (a).
Now ker δ is the subalgebra C_D(a) of all elements of D which commute with a. (This is a proper subspace of V since a is not in the center Z.) Thus δ acts as a non-singular transformation of W := V/C_D(a), while ∏(δ − λI) (the product taken over all non-zero λ ∈ P(a)) vanishes on it. It follows that for some λ ∈ P(a) − {0}, δ − λI is singular on W. This means there is a coset b_0 + C_D(a) ≠ C_D(a) such that

δ(b_0) = λb_0 + c, c ∈ C_D(a).
Setting y = b_0 + λ^{−1}c, we have δ(y) = λy ≠ 0. This means

yay^{−1} = a + λ ∈ P(a).
Now if m is the order of a in the multiplicative group P(a)*, then yay^{−1} is one of the roots of X^m − 1 = 0 lying in P(a). But since these roots are just the powers of a, we have yay^{−1} = a^i ≠ a, for some i, 1 < i ≤ m − 1, as desired.
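The crux of this computation – that in characteristic p the middle binomial coefficients in (11.24) vanish, so that δ^{p^m}(x) = x a^{p^m} − a^{p^m} x – holds in any ring of characteristic p, and can be sanity-checked numerically. The sketch below verifies δ^p(x) = xa^p − a^p x for 2 × 2 matrices over GF(p); the matrix setting is an illustrative assumption, not the division ring of the lemma.

import numpy as np

p = 5

def delta(a, x):
    # The commutator map delta(x) = x a - a x over GF(p).
    return (x @ a - a @ x) % p

rng = np.random.default_rng(0)
a = rng.integers(0, p, (2, 2))
x = rng.integers(0, p, (2, 2))

y = x
for _ in range(p):        # apply delta p times
    y = delta(a, y)

a_p = np.linalg.matrix_power(a, p) % p
assert np.array_equal(y, delta(a_p, x))   # delta^p(x) = x a^p - a^p x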
Now we proceed with a proof of Wedderburn’s theorem. We may assume
that D is a non-commutative finite division ring in which all proper subalgebras are fields. In particular, for any element v ∈ D − Z, the centralizing
subalgebra CD (v) is a field. It follows that the commuting relation on D − Z
is an equivalence relation. For each commuting equivalence class A_i and element a_i ∈ A_i, we have

F_i := Z ∪ A_i = C_D(a_i).    (11.25)
Let D* be the multiplicative group of non-zero elements of D, so that Z* := Z − {0} is its center. Choose a coset aZ* so that its order in D*/Z* is
the smallest prime s dividing the order of D∗ /Z ∗ . The actual representative
element a can then be chosen to have s-power order.
Now by Herstein's lemma, there is an element y ∈ D such that

yay^{−1} = a^i ≠ a.
From (11.25) it follows that conjugation by y leaves the subfield F := CD (a)
invariant, and induces on it an automorphism whose fixed subfield is
CD (y) ∩ CD (a) = Z.
But the same conclusion must hold when y is replaced by a power y^k which is not in Z. It follows that Gal(F/Z) is cyclic of prime order r and conjugation by y induces an automorphism of order r. Thus y^r ∈ C_D(y) ∩ C_D(a) = Z, so r is the order of yZ* in the group D*/Z*. We can then choose the representative y to be in the r-Sylow subgroup of Z(y)*, so that it has r-power order.
Now, by the "(very) little Fermat Theorem", i^{s−1} ≡ 1 mod s. Thus

y^{s−1} a y^{−(s−1)} = a^{i^{s−1}} ≡ a mod Z*.

So, setting b := y^{s−1}, we have

bab^{−1} = az,
for some element z in Z which is also a power of a. Now from this equation,
z = 1 if and only if b ∈ CD (y) ∩ CD (a) = Z.
Case 1: z = 1 and b is in the center. Then y^{s−1} ∈ Z, so r, the order of yZ* in D*/Z*, divides s − 1. That is impossible since s was chosen to be the smallest prime divisor of |D*/Z*|.
Case 2: z ≠ 1 and b has order r mod Z*. Then b^r a b^{−r} = a = az^r, so z^r = 1. But since z is a non-trivial element of s-power order, we now have that r = s is the order of z.
Now at this stage, the multiplicative subgroup H := ⟨a, b⟩ of D* generated by a and b is a nilpotent group generated by two elements of r-power order, and so is an r-group. Since it is non-abelian without a subgroup of type Z_r × Z_r, it is generalized quaternion and so contains a quaternion subgroup of order 8. Herstein goes on to eliminate the presence of Q_8 by an argument that essentially anticipates by three years the argument given in the first part of Kaczynski's proof above.⁷
There is another fairly standard proof which comes closer to being a proof
from elementary principles. It uses an argument due to Witt [??] concerning
cyclotomic polynomials and the partition of the group D∗ into conjugacy
classes. One may consult Jacobson’s book (cited below), pages 431-2, for an
example of this proof.
References

Herstein, I. N., Wedderburn's theorem and a theorem of Jacobson, Amer. Math. Monthly 68 (1961), 249-251.

Jacobson, Nathan, Basic Algebra I, W. H. Freeman Co., San Francisco, 1974.

Kaczynski, T. J., Another proof of Wedderburn's theorem, Amer. Math. Monthly 71 (1964), 652-653.

Witt, Ernst, Über die Kommutativität endlicher Schiefkörper, Abh. Math. Sem. Univ. Hamburg 8 (1931), p. 413.

Wedderburn, J. H. M., A theorem on finite algebras, Trans. Amer. Math. Soc. 6 (1905), 349-352.

van der Waerden, B. L., Algebra, vol. II, Springer-Verlag, Berlin-New York, 1991.
⁷ Actually Herstein inadvertently omitted Case 1 by taking s = r from a misreading of the definition of s. As we see, this presents no real problem since his lemma applies when s is the smallest possible prime.
Chapter 12
Semiprime Rings
12.1 Introduction

In this chapter we develop the material necessary to reach the famous Artin-Wedderburn theorem which – in the version given here – classifies the Socle of a prime ring.
We begin with the important concept of complete reducibility of modules.
When transferred to rings, this notion and its equivalent incarnations give us
quick access to the Socle of a ring and its homogeneous components. Most of
the remaining development takes place in a general class of rings called the
semiprime rings. It has several important subspecies, the prime, primitive
and completely reducible rings, where the important role of idempotents
unfolds.
12.2 Complete Reducibility
One defines the radical rad M of a right R-module M as the intersection of all maximal submodules of M. If there are no maximal submodules – as in the Z-module

Z_{p^∞} = ⟨1/p, 1/p², . . .⟩ ≤ Q/Z

– then rad M = M, by definition.
There is a dual notion: One defines the socle Soc(M) as the submodule generated by all minimal submodules (that is, all irreducible submodules) of M. Similarly, if there are no irreducible submodules – as in the case of
the free Z-module ZZ – then by convention one sets Soc(M ) = 0, the zero
submodule.
Theorem 287 If M is a right R-module then
1. Soc(M ) = ⊕i∈J Mi , where {Mi }i∈J is a family of irreducible submodules
of M , and the sum is direct.
2. In addition, Soc(M ) is invariant under the endomorphism ring, E =
homR (M, M ).
proof: We prove the second statement first. Recall that for a right R-module M, hom_R(M, M) is regarded as acting on the left, so that the fact
that application of such endomorphisms commutes with the right action of
R can be expressed as an “associative law”, that is,
ψ(mr) = (ψm)r, for all (ψ, m, r) ∈ homR (M, M ) × M × R.
Now if M_j is an irreducible submodule of M, and ψ is an R-endomorphism of M, then ψ(M_j) = 0 or is irreducible, because of its extremely limited lattice of submodules. Thus for any element m lying in a finite sum Σ M_i of irreducible submodules of M, ψ(m) ∈ Σ ψ(M_i), where each ψ(M_i) is zero or irreducible. Since the hypothesis on m in the previous sentence characterizes the elements of Soc(M), we see that
ψ(Soc(M )) ≤ Soc(M )
for all ψ ∈ homR (M, M ), as desired.
Let the set I index the family of all irreducible submodules of M. Trivially, we may assume I is non-empty. We shall call a subset J of I a direct set if and only if Σ_{j∈J} M_j = ⊕_{j∈J} M_j is a direct sum. [Recall that this means that if a finite sum Σ_{k∈K} a_k = 0, where a_k ∈ M_k and K ⊆ J, then each a_k = 0. We note also that singleton subsets of I are direct, and so, as irreducible modules are assumed to exist, direct subsets of I really exist.]
Suppose

I_0 ⊂ I_1 ⊂ · · ·

is a properly ascending tower of direct sets, and let J be the union of the sets in this tower. If J were not direct, there would be a finite sum Σ_{k∈K} a_k = 0 with not all a_k equal to zero. But as each index k of the sum lies in some I_{f(k)}, the finite index set K for this sum is contained in some I_m in the tower, where m = max{f(k) | k ∈ K}. But then the existence of such a sum contradicts the assumption that each member I_j of the tower was a direct set.
Thus in the poset of all direct subsets of I, partially ordered by inclusion,
all simply ordered chains possess an upper bound. Thus by Zorn’s Lemma,
this poset contains a maximal member, say H.
We now claim that M_H := ⊕_{j∈H} M_j = Soc(M). Clearly, M_H ≤ Soc(M). Suppose M_H were properly contained in Soc(M). Then there would exist an irreducible submodule M_0 not in M_H. Then, as 0 is the only proper submodule of M_0, we see M_H ∩ M_0 = 0. Then the sum (Σ_H M_j) ⊕ M_0 is direct, for if not, there would be a finite subset K of H ∪ {0}, and elements a_k, k ∈ K, not all zero, such that Σ_{k∈K} a_k = 0. Since H is direct, K is not a subset of H, and so 0 ∈ K and

a_0 = − Σ_{k∈K−{0}} a_k ≠ 0.    (12.1)

But then both sides of equation (12.1) represent a non-zero element of M_H ∩ M_0, an impossibility. Thus H ∪ {0} is direct, and that contradicts the maximality of H. Thus Soc(M) = ⊕_{j∈H} M_j, and the proof is complete.
Corollary 288 The following conditions on an R-module are equivalent:
1. M = Soc(M ).
2. M is a sum of minimal submodules.
3. M is isomorphic to a direct sum of irreducible submodules.
An R-module M satisfying any of the above three conditions is said to
be completely reducible.
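Before moving on, the definitions can be tested in the simplest possible setting: M = Z/nZ as a Z-module, where every submodule is a cyclic subgroup. The sketch below lists the minimal submodules and checks that their sum – the socle – is direct, as Theorem 287 asserts. The code and the choice n = 12 are illustrative assumptions.

from math import prod

def submodule(d, n):
    # The subgroup of Z/nZ generated by d.
    return frozenset(d * k % n for k in range(n))

def minimal_submodules(n):
    subs = {submodule(d, n) for d in range(1, n)} - {frozenset({0})}
    return [A for A in subs if not any(B < A for B in subs)]

n = 12
mins = minimal_submodules(n)        # {0, 6} and {0, 4, 8}
socle = frozenset({0})
for A in mins:
    socle = frozenset((a + b) % n for a in socle for b in A)
print(sorted(socle))                # [0, 2, 4, 6, 8, 10], i.e. 2Z/12Z
assert len(socle) == prod(len(A) for A in mins)   # the sum is direct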
A submodule L of a right R-module M is called large if it has a non-zero intersection with every non-zero submodule. (Note that if M ≠ 0, M is a large submodule of itself.) We can produce (but not actually construct) many large submodules using the following
Lemma 289 If B is a submodule of M and C is maximal among submodules
of M meeting B at zero, then B + C is large.
proof: Assume B and C chosen as above and let D ≠ 0 be any non-trivial
submodule of M . Suppose (B + C) ∩ D = 0. If b, c, and d are elements of B,
C and D, respectively, and b = c + d, then d = b − c and this is zero by our
supposition. Thus b = c ∈ B ∩ C = 0, so b = c = 0. Thus B ∩ (C + D) = 0.
By maximality of C, D ⊆ C. Then D = (B + C) ∩ D = 0, a contradiction.
Thus always (B + C) ∩ D is non-zero, and so B + C is large.
That such a submodule C exists which is maximal with respect to meeting
B trivially, as in Lemma 289, is an easy application of Zorn’s Lemma.
If M is a right R-module, let L(M ) be the lattice of all submodules of M
(sum is the lattice “join”, intersection is the lattice “meet”). We say L(M )
is complemented if every submodule B of M has a complement in L(M) – i.e., there exists a submodule B′ such that B ∩ B′ = 0 and B + B′ = M (this means M = B ⊕ B′).
Lemma 290 If L(M ) is complemented, then so is L(N ) for any submodule
N of M .
proof: Let C be a submodule of N. By hypothesis there is a complement C′ to C in L(M). Then setting C″ := C′ ∩ N, we see

C″ ∩ C ⊆ C′ ∩ C = 0, and
C″ + C = (C′ ∩ N) + C = (C + C′) ∩ N (for C ≤ N) = M ∩ N = N.

Thus C″ is a complement of C in N.
remark: Of course the above is actually a lemma about any modular lattice:
if it is complemented, then so are any of its intervals above zero.
Lemma 291 If M is a right R-module for which L(M ) is complemented,
then rad(M ) = 0.
proof: Let m be a non-zero element of M and let A be maximal among the
submodules which do not contain m. (A exists by Zorn’s Lemma.) Suppose
now A < B < M, so that A is not maximal in L(M). Let B′ be a complement to B in L(M). Then by the modular law (Dedekind's Lemma)

A = (B ∩ B′) + A = B ∩ (B′ + A), as A < B.
Now as B < M implies B′ ≠ 0, and A < B implies B′ ∩ A = 0, we see that A is properly contained in B′ + A. Then the maximality of A implies m ∈ B′ + A and also m ∈ B. Thus m lies in B ∩ (B′ + A) = A, contrary to
the choice of A.
Thus no such B exists and A is in fact a maximal submodule of M . Then
we see that every non-zero module element is avoided by some maximal
submodule of M , and hence is outside rad(M ). It follows that rad(M ) = 0.
Theorem 292 The following conditions on an R-module M are equivalent:
1. M is completely reducible.
2. M contains no proper large submodule.
3. L(M ) is complemented.
proof: 1. implies 2. Assume M = Soc(M ). If A is a proper large submodule
of M , then there exists an irreducible submodule M0 not contained in A, since
the irreducibles generate Soc(M ). But then as M0 is irreducible, A ∩ M0 = 0.
But this contradicts the property that A is large.
2. implies 3. Let B be any proper submodule of M . Choose C maximal
so that C ∩ B = 0 (Zorn’s Lemma gives us this). Then by Lemma 289, C + B
is large and so by part 2., C + B = M . This means C is a complement to B
in L(M ).
3. implies 1. Since L(M ) is complemented, Soc(M ) has a complement
C in L(M ). Since we are trying to prove that Soc(M ) = M , let us assume
by way of contradiction that Soc(M ) is a proper submodule of M . Then its
complement C is non-zero.
Now by Lemma 290, the sublattice L(C) is also complemented. Then by
Lemma 291, we must have rad(C) = 0. Thus, for each nonzero element c of
C, there is a maximal submodule Cc of C not containing c. Then Cc has a
complement C′ in L(C), so C = C_c ⊕ C′. Since C_c is a maximal submodule, C′ ≅ C/C_c is irreducible. Thus C′ is a minimal submodule of M not lying
in Soc(M ), which is a contradiction.
Thus M = Soc(M ) so M is completely reducible. The proof is complete.
12.3 Homogeneous Components.

12.3.1 The action of the R-endomorphism ring.
Let M be a right R-module and let E = hom_R(M, M) be its R-endomorphism ring. As remarked earlier, we may regard M as a left E-module, _EM, and because of the "associative law"

e(mr) = (em)r for all (e, m, r) ∈ E × M × R,

we say that in this case M is naturally an (E, R)-bimodule, denoted _EM_R. (Obviously, by replacing E by a ring S with a morphism S → E, we obtain a general definition of an (S, R)-bimodule.)
Now let M_i be any irreducible submodule of M and let e be an endomorphism of M (i.e., e ∈ E). Then, as remarked in the proof of Theorem 287 of the previous section, either eM_i = 0 or else eM_i is irreducible. But in the latter case we should also notice that e, restricted to M_i, is an isomorphism M_i → e(M_i). This motivates the following definition: If M_i is an irreducible submodule of M, we call the sum Σ M_j over all irreducible submodules M_j of M that are isomorphic to M_i the homogeneous component of M containing M_i. Clearly the homogeneous components of M are submodules of Soc(M).
Lemma 293 If M is a completely reducible R-module and E = homR (M, M )
and Mi is any irreducible submodule of M , then EMi is the homogeneous
component of M containing Mi .
proof: If e ∈ E, then eM_i is either the zero submodule or is an irreducible module isomorphic to M_i. In either case eM_i lies in the homogeneous component containing M_i. Conversely, if M_k is a submodule isomorphic to M_i, then there is a complement M_i′ to M_i in the lattice L(M), so M = M_i ⊕ M_i′. Then there is an endomorphism e : M → M_k with kernel M_i′, obtained by composing the projection onto M_i with the isomorphism M_i → M_k.
12.3.2 The Socle of a Module is a Direct Sum of the Homogeneous Components.
For the moment, we don’t even know that the homogeneous components do
not intersect non-trivially, or that they are ever proper submodules. This is
clarified by the next
Lemma 294 Suppose C1 is the homogeneous component obtained as a sum
of all irreducible submodules of M isomorphic to the irreducible submodule
M1 . Suppose M0 is an irreducible submodule of M lying within C1 . Then
M_0 ≅ M_1.
proof: First, C_1 is a submodule of Soc(M), which is completely reducible. Thus L(C_1) is complemented. So there is a complementing submodule M_0′ to M_0 in L(C_1). Let F be the collection of all irreducible submodules of M which are isomorphic to M_1. Since these modules generate C_1, there exists one – say N_1 – not in M_0′. Then from irreducibility, N_1 ∩ M_0′ = 0, so

N_1 ≅ (N_1 + M_0′)/M_0′ ≤ C_1/M_0′ = (M_0 ⊕ M_0′)/M_0′ ≅ M_0.

Since M_0 is irreducible, this shows N_1 ≅ M_0 and so M_0 ∈ F.
Corollary 295 Soc(M) is the direct sum of the homogeneous components of M.
proof: Let I index the isomorphism types of the irreducible submodules of M, and for σ ∈ I let C_σ be the corresponding homogeneous component of M. As remarked above, EC_σ = C_σ ≤ Soc(M). Obviously Soc(M) = Σ_{σ∈I} C_σ, since every irreducible module lies in some C_σ.
We claim that for any finite subset K of the index set I, and σ ∈ I − K,

(Σ_{k∈K} C_k) ∩ C_σ = 0.

This will imply that the sum Σ_{σ∈I} C_σ = Soc(M) is direct, completing the proof.
Suppose the contrary, that for some σ and K as above,

C := (Σ_{k∈K} C_k) ∩ C_σ ≠ 0.

Since C is completely reducible, it contains an irreducible submodule M_0. Since M_0 is also an irreducible submodule of the completely reducible module Σ_K C_k, it has a complement M_0′ in it. Since Σ_K C_k is generated by its irreducible modules, there exists an index s ∈ K and an irreducible submodule M_s of C_s such that M_s does not lie in M_0′. Then, considering that M_0 is irreducible and M_0′ is maximal in Σ_K C_k, the latter can be expressed both as M_0 ⊕ M_0′ and as M_s ⊕ M_0′. Thus M_0 ≅ M_s. Now by Lemma 294, all the irreducible submodules of C_σ and C_s are of the same isomorphism type, against σ ∉ K. So the claim holds.
The next sections concern rings rather than modules. But we can already
apply a few of the results of this section to the module RR .
Note that because of the associative law in R, left multiplication of RR
by any element of R is an R-endomorphism of RR , and so by Lemma 293
leaves each homogeneous component of RR invariant. Thus we see
Corollary 296 Every homogeneous component of Soc(RR ) is a 2-sided ideal
of R. Also Soc(RR ) is itself a 2-sided ideal of R (which we write as Soc(R)).
12.4 Semiprime rings.

12.4.1 Introduction
We now pass to rings. The basic class of rings that we shall deal with are
the semiprime rings and certain subspecies of these. In order to state the
defining properties, we should touch base by reviewing three notions about
2-sided ideals that may or may not have been covered before.
If A and B are 2-sided ideals of the ring R, the product AB of ideals is
the additive group generated by the set of all products {ab|(a, b) ∈ A × B}.
This additive group is itself already a 2-sided ideal, which is denoted AB, and
which lies in the ideal A ∩ B. (Taking products among ideals is associative.)
We say that a (2-sided) ideal A is nilpotent if and only if there exists some positive integer n such that

A^n := A · A · · · A (n factors) = 0 := {0_R},

the zero ideal.
An ideal P is said to be a prime ideal if and only if, whenever A and B
are ideals such that AB ⊆ P , then either A ⊆ P or B ⊆ P .
12.4.2 The semiprime condition
Theorem 297 The following conditions on a ring R are equivalent:
1. 0 is the only nilpotent ideal.
2. 0 is the intersection of all prime ideals.
3. For any ideals A and B of R, AB = 0 implies A ∩ B = 0.
proof: 1. implies 2. Let a_0 = a ≠ 0. Then Ra_0R is not nilpotent. So its square Ra_0Ra_0R is non-zero, so there is a non-zero element a_1 in a_0Ra_0. In general we can iterate this process to find a non-zero element a_{i+1} in a_iRa_i. Set T = {a_n | n ∈ {0} ∪ N}. Before proceeding further, we note an important property of this sequence:

(S) If the element a_i is in an ideal J, then so are all of its successors in T.
Let P be an ideal maximal with respect to not containing any element
of T . (It exists by Zorn’s Lemma.) Now suppose A and B are ideals of R
neither of which is contained in P but AB ⊆ P . By the maximality of P the
ideals A + P and B + P both meet T non-trivially. So there are subscripts i
and j such that a_i ∈ A + P and a_j ∈ B + P. Set m = max(i, j). Then by the property (S), a_m lies in both P + A and P + B, so

a_{m+1} ∈ a_mRa_m ⊆ (A + P)(B + P) ⊆ AB + P ⊆ P,
against P ∩ T = ∅. Thus no such ideals A and B exist, so P is a prime ideal.
Thus, we have just argued, for each non-zero element a of R, there is a prime
ideal P = Pa not containing it. It follows that the intersection of all prime
ideals of R is the zero ideal 0 := {0R }.
2. implies 3. Assume A and B are ideals for which AB = 0. Then
AB ⊆ P for any prime ideal P so either A lies in P or B lies in P . In any
event A ∩ B ⊆ P for each prime ideal P . Thus A ∩ B is contained in the
intersection of all prime ideals, and so is zero.
3. implies 1. Suppose A is a nilpotent ideal. Then for some positive integer n, A^n = 0. Since A · A^{n−1} = 0, part 3. gives A^{n−1} = A ∩ A^{n−1} = 0. Iterating, A^{n−2} = 0, . . . , and finally A = 0. Thus 1. holds.
A ring satisfying any of the conditions above is called semiprime.
An easy way to characterize semiprime rings is given in
Lemma 298 The following conditions are equivalent:
1. R is semiprime.
2. For each non-zero element x in R, xRx ≠ 0.
proof: 1. implies 2. Suppose 1., but suppose that 2. fails. Then there is
a non-zero x such that xRx = 0. Then as x ∈ RxR, we see that RxR is a
non-trivial ideal which is nilpotent since
(RxR)² = RxRxR = R · 0 · R = 0.
This contradicts 1.
2. implies 1. Now, by way of contradiction, assume that 2. holds, but
that 1. fails. Then there is a non-trivial nilpotent ideal A. Let n be the smallest positive integer such that A^n = 0 (here, n ≥ 2). Then N := A^{n−1} is a non-trivial ideal and N² = 0. Thus if x is any non-zero element of N, we have xRx ⊆ N² = 0, contrary to our assumption 2.
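Lemma 298 turns semiprimeness into a finite test for finite rings. For the rings Z/nZ the sketch below confirms the expected answer – Z/nZ is semiprime exactly when n is square-free, since a square factor p² produces a non-zero nilpotent ideal generated by n/p. The code is supplied only as an illustration.

def is_semiprime_Zn(n: int) -> bool:
    # The condition of Lemma 298: for every x != 0 there is r with x*r*x != 0.
    return all(any(x * r * x % n != 0 for r in range(n))
               for x in range(1, n))

def squarefree(n: int) -> bool:
    return all(n % (d * d) != 0 for d in range(2, int(n ** 0.5) + 1))

for n in range(2, 40):
    assert is_semiprime_Zn(n) == squarefree(n)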
12.4.3 Completely Reducible and Semiprimitive Rings are Species of Semiprime Rings
At the beginning of this chapter we defined the radical of a right R-module M to be the intersection of all maximal submodules of M, or, if there
Rad(R), of the ring R to be rad(RR ) – the radical of the right module
RR . Accordingly, Rad(R) is the intersection of all maximal right ideals of R
and is called the Jacobson Radical of R. Although we do not require this
fact, it turns out that the Jacobson radical, despite its being defined in terms
of right ideals, is actually a 2-sided ideal. (For a neat proof of this fact see Lambek, Rings and Modules, page 57.) One can then "mod out" the radical
and observe that the factor ring has trivial Jacobson radical. But these are
side issues.
A ring is said to be semiprimitive if and only if its Jacobson radical is trivial. Finally, we say that a ring R is completely reducible if and only if the right module R_R is a completely reducible module. Now we can state
Theorem 299 The following implications hold among properties of a ring:
completely reducible ⇒ semiprimitive ⇒ semiprime
proof: The first implication is Lemma 291 applied to the module M = RR .
For the second implication suppose R is a ring which is semiprimitive but
not semiprime. Then Rad(R) = 0 and there exists a non-zero nilpotent ideal
N . By replacing N by an appropriate power if necessary, we may assume
N² = 0 ≠ N. Let M be a maximal right ideal and suppose N is not contained
in M . Then R = N + M by maximality of M . Then 1 = n + m, where n ∈ N
and m ∈ M . Then
n = 1 · n = n² + mn = 0 + mn ∈ M.
Thus 1 = n + m ∈ M which is impossible. Thus we see that N must lie in
every maximal right ideal, so N ⊆ Rad(R) = 0. This contradicts N ≠ 0.
Thus no such ring R exists, so the implication holds.
12.4.4 Idempotents and Minimal Right Ideals of Semiprime Rings.
The importance of semiprimeness for us is that it affects the structure of the
minimal right ideals of R. But first we need a lemma which holds for rings
generally.
Lemma 300 (Brauer's lemma) Let K be a minimal right ideal of a ring R. Then either

(i) K² = 0, or

(ii) K = eR where e² = e.
proof: Suppose K² ≠ 0. Then by the minimality of K, K = K² = kK for
some element k ∈ K. Now observe that AnnK (k) := {x ∈ K|kx = 0}, the
right annihilator of k in K, is a right ideal of R lying properly in K. Hence
it is the zero ideal. Then there is a unique element e such that ke = k. Then
k(e² − e) = (ke)e − ke = ke − ke = 0,

so e² − e ∈ Ann_K(k) = 0, so e² = e. Then 0 ≠ eR ⊆ K, so eR = K as K was minimal.
Corollary 301 If R is semiprime, then every minimal right ideal of R has
the form eR where e² = e.
proof: Let K be a minimal right ideal of R. Then by Brauer’s Lemma, the
conclusion holds or else K² = 0. But in the latter case, RK is a 2-sided ideal containing K which is nilpotent, since

(RK)² = R(KR)K ⊆ RK² = 0.
This contradicts the fact that R is semiprime.
12.4.5 Idempotents and Left-Right Symmetries
An element e in a ring R with the property that e² = e is called an idempotent. Clearly the defining property of an idempotent is left-right symmetric;
yet it makes its first appearance here in the two results of the previous subsection in connection with right ideals. We wish to exploit the symmetry of
idempotents to achieve results relating minimal right ideals to minimal left
ideals, right Socles to left Socles and so on.
First we require a technical observation:
Lemma 302 (Technical Lemma on Idempotents) Let e be an idempotent of
R and let f be any element of R. Then there is a bijective morphism of
additive groups
hom_R(eR, fR) → fRe.

If, moreover, f = e, this is an isomorphism of rings.
proof: We define a mapping
µ : fRe → hom_R(eR, fR),

where µ(fre) is left multiplication of eR by fre. Thus µ(fre) sends the element er′ of eR to (fre)er′ = frer′ ∈ fR. Clearly associativity of multiplication in R reveals that this left multiplication is an R-homomorphism of right R-modules: eR → fR. Clearly, fRe is a subgroup of (R, +), and hom_R(eR, fR) is an additive group in the usual way. Since
µ(fr_1e + fr_2e)(er′) = µ(f(r_1 + r_2)e)(er′)
= f(r_1 + r_2)er′ = fr_1er′ + fr_2er′
= µ(fr_1e)(er′) + µ(fr_2e)(er′)
= (µ(fr_1e) + µ(fr_2e))(er′) for all er′ ∈ eR,
we see that µ is a group homomorphism.
If µ(fre) is the zero R-homomorphism, then freR = 0; since fre = (fre)e ∈ freR, fre = 0. Thus ker µ = 0, and µ is injective. To show that µ is a surjective ("onto") mapping we need only show that any R-homomorphism φ ∈ hom_R(eR, fR) has the form of an appropriate left multiplication. Now φ(e) = fr for some element r ∈ R. Then for any r′ ∈ R,

φ(er′) = φ(e²r′) = φ(e) · er′ = frer′ = (fre)(er′)
as required. Thus µ is an isomorphism of the two additive groups.
For the last statement, we need only observe that when f = e, then
f Re = eRe is a subring of R, homR (eR, eR) is a ring with composition of
morphisms as its ring multiplication, and that µ preserves multiplication.
To demonstrate the latter, observe that for arbitrary elements r_1 and r_2, the action of µ((er_1e)(er_2e)) = µ(er_1er_2e) is left multiplication of eR by the element er_1er_2e. But as R is associative, the latter is left multiplication by er_2e followed by left multiplication by er_1e, the composition µ(er_1e) ∘ µ(er_2e).
The proof is complete.
Lemma 303 If R is semiprime and e² = e ∈ R, then eR is a minimal right ideal if and only if eRe is a division ring.

proof: If eR is an irreducible right R-module (minimal right ideal), then by Lemma 302 above, eRe ≅ hom_R(eR, eR), and the latter is a division ring by Schur's Lemma (which was Corollary 7.16 to Fitting's Lemma, Theorem 7.15).
Conversely, suppose eRe is a division ring. Let r be such that er ≠ 0. Then, as R is semiprime, erRer ≠ 0 (see Lemma 298). Thus erse ≠ 0 for some s ∈ R. But as erse is in the division ring eRe, it has an inverse ete. Then

erse(ete) = ersete = e.

Thus erR = eR. Now if there were a right ideal A such that 0 < A < eR, we could find a non-zero er ∈ A. But by the sentence before last, eR = erR ⊆ AR ⊆ A, a contradiction. Thus no such A may exist, and eR is a minimal right R-module.
Corollary 304 If R is semiprime and e² = e ∈ R, then eR is a minimal
right ideal if and only if Re is a minimal left ideal.
proof: In the preceding Lemma 303, the condition that eRe is a division
ring has complete left-right symmetry. Thus if it is equivalent to eR being a
minimal right ideal, it is also equivalent to Re being a minimal left ideal.
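Lemma 303 and Corollary 304 can be seen concretely in the (semiprime) full matrix ring R = M_2(GF(p)) with the idempotent e = E_{11}. Then eRe is a copy of the field GF(p) sitting in the (1,1) corner – a division ring – and eR, the set of matrices supported on the first row, is a minimal right ideal. A brute-force sketch follows; all names are chosen for this illustration.

import numpy as np
from itertools import product

p = 3
e = np.array([[1, 0], [0, 0]])

def mats():
    # All 2x2 matrices over GF(p).
    for entries in product(range(p), repeat=4):
        yield np.array(entries).reshape(2, 2)

# eRe = { e r e : r in R } is GF(p) placed in the (1,1) corner:
eRe = {tuple((e @ r @ e % p).ravel()) for r in mats()}
assert eRe == {(c, 0, 0, 0) for c in range(p)}

# eR consists of first-row matrices, and every non-zero x in eR
# regenerates all of eR, so eR is a minimal right ideal:
eR = {tuple((e @ r % p).ravel()) for r in mats()}
assert len(eR) == p * p
for x in eR:
    if any(x):
        X = np.array(x).reshape(2, 2)
        assert {tuple((X @ r % p).ravel()) for r in mats()} == eR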
We require one last technical result.
Lemma 305 Let e and f be idempotents of R. Then eR and f R are isomorphic as right R-modules if and only if there exist elements u and v in R
such that vu = e and uv = f .
proof: First assume there is an isomorphism of eR onto fR. By the Technical Lemma 302, this isomorphism corresponds to left multiplication of eR by an element u := fu′e in fRe. Since f and e are idempotents, fue = u, so without loss we may take u′ = u. Similarly, the inverse isomorphism corresponds to left multiplication of fR by an element v = evf. Then, because these isomorphisms are inverses of each other, vue = vu = e, and similarly, uv = f.
Conversely, assume that elements u and v exist such that vu = e and uv = f. Then ue = u(vu) = (uv)u = fu, and similarly vf = ev. Then left multiplication of eR by u, acting as er ↦ uer = fur, is an isomorphism eR → fR. Its inverse is left multiplication of fR by v, acting as fr′ ↦ vfr′ = evr′.
We obtain, then, a Corollary relating left ideals to right ideals:
Corollary 306 If e and f are idempotents of a ring R, then the right ideals
eR and f R are isomorphic as right R-modules if and only if the left ideals
Re and Rf are isomorphic as left R-modules.
proof: This follows from the left-right symmetry of the pair of conditions
uv = f and vu = e.
Theorem 307 If R is semiprime, then the left module _RR and the right module R_R have the same Socles.
Also, they have the same homogeneous components, and these components
are the minimal 2-sided ideals of R.
proof: S := Soc(R_R) = Σ eR, where e ranges over all idempotents e for which eRe is a division ring. Similarly, S′ := Soc(_RR) = Σ Re, where the sum extends over the same set of idempotents e. Now by Corollary 296, S is a 2-sided ideal of R, so S′ ⊆ S. By symmetry, S = S′.
Next let H be any homogeneous component of S = Soc(R_R). Then H = Σ eR, where now e ranges over those idempotents e such that eR is isomorphic as a right R-module to some fixed minimal right ideal gR, where g² = g. Now by Corollary 306 above, eR is isomorphic to gR as (irreducible) right R-modules if and only if Re is isomorphic to Rg as left R-modules. By Corollary 304, Rg is a minimal left ideal. Thus RH = H contains all Re such that Re is isomorphic to Rg, whose sum is a homogeneous component H′ of _RR. Thus H′ ⊆ H. By symmetry again, H = H′.
Now we show that H is a minimal 2-sided ideal. Suppose K is a non-zero ideal contained in H. Since H is a sum of minimal right ideals, it is a completely reducible right R-module, and so, therefore, is K. Thus K contains a minimal right ideal eR. Then by Lemma 294, all irreducible submodules of H are isomorphic to eR. Each of these has the form fR where f² = f, and since fR ≅ eR there must exist elements u and v such that f = uv = uev (Lemma 305). Thus fR = u(eR) ⊆ uK ⊆ K. Thus H, being a sum of all such fR, lies in K, whence H = K. This shows that H = H′ is a minimal ideal of R.
We apply these ideas to completely reducible rings in the next section.
12.5 Completely Reducible Rings

The left-right symmetry we developed in the previous section can be applied to the characterization of completely reducible rings.
Theorem 308 The following four statements are equivalent for any ring R.
1. Every right R-module is completely reducible.
2. RR is completely reducible.
3. Every left R-module is completely reducible.
4. _RR is completely reducible.
proof: Clearly 1. implies 2.
Assume 2., so R = Σ_I A_i, where the A_i are some minimal right ideals. Let M be any right R-module. Then clearly

M = Σ_{m∈M} mR = Σ_{m∈M} Σ_{i∈I} mA_i.    (12.2)
Now if N were a non-zero submodule of one of the summands mA_i, then the set B := {r ∈ A_i | mr ∈ N} is a right ideal of R contained in A_i, itself a minimal right ideal. Now non-zero elements of N have the form ma, a ∈ A_i, and such an a is in B. Thus B ≠ 0, so B = A_i and so N = mA_i. Since N
was arbitrary in mAi , we see that each summand in the right hand side of
equation (12.2) is zero or is an irreducible module. Then by Corollary 288
at the beginning of this Chapter, M is completely reducible.
So 1. and 2. are equivalent. Similarly, 3. and 4. are equivalent (just place the proof in the preceding paragraphs in front of a mirror).
We next show that 2. implies 4. Assume 2. Then R is semiprime, since by Theorem 299, complete reducibility implies semiprimeness of rings. But then, by Theorem 307, _RR and R_R have the same Socles. Since R = Soc(R_R), we see that Soc(_RR) = R, so _RR is completely reducible.
By symmetry, 4. implies 2. We are done.
There is an immediate application:
Corollary 309 Let D be a division ring. Then any right vector space V over D is completely reducible. In particular, V is a direct sum of copies of D_D.

proof: D_D is an irreducible module, so any right D-module V (for that is what a vector space is!) is completely reducible.
We need, for a moment, a very special notion. A ring is said to be a prime ring if and only if 0 is a prime ideal. This means that if A and B are ideals such that AB = 0, then either A = 0 or B = 0. Note that a prime ring is semiprime, for in a prime ring no non-zero ideal K satisfies K² = 0.
Lemma 310 Let R be a prime ring. Assume S = Soc(R_R) ≠ 0. If eR is a minimal right ideal of R, then hom_R(S, S) is isomorphic to the ring of all linear transformations hom_{eRe}(Re, Re).
proof: Since S = Soc(R_R) is a 2-sided ideal and is the direct sum of its homogeneous components, each of which is a 2-sided ideal, the prime hypothesis implies that S is itself a homogeneous component. Let S be written as a direct sum Σ e_iR of minimal right ideals with the e_i idempotent, and fix one of these, say e = e_1. Then each e_iR ≅ eR, so by Lemma 305 there exist elements u_i and v_i such that v_iu_i = e and u_iv_i = e_i. (Note that as e = e_1, we may take u_1 = v_1 = e.) Then

u_iev_i = u_iv_iu_iv_i = e_i² = e_i.
Now consider any φ in hom_R(S, S). We define

τ(φ) : Re → Re

by the rule

τ(φ)(re) = φ(re) = φ(re)e, for all r ∈ R.

Clearly τ(φ) is right eRe-linear. Also, for φ_1, φ_2 ∈ hom_R(S, S), we have τ(φ_1 + φ_2) = τ(φ_1) + τ(φ_2). Also,

τ(φ_1 ∘ φ_2)(re) = (φ_1 ∘ φ_2)(re) · e = φ_1(φ_2(re)e) · e = (τ(φ_1) ∘ τ(φ_2))(re).

So τ is a ring homomorphism.
Clearly, τ(φ) = 0 if and only if φ(re) · e = 0 for all r ∈ R. In that case φ(u_ie) = φ(u_ie)e = 0, so

φ(e_i) = φ(u_iev_i) = φ(u_ie)v_i = 0 · v_i = 0

for all indices i. Thus φ(S) = 0, so φ is the zero mapping. It follows that τ is one-to-one.
We shall show that τ is onto by producing a mapping σ : hom_{eRe}(Re, Re) → hom_R(S, S) such that τ ∘ σ = 1, the identity map on hom_{eRe}(Re, Re). Now fix ψ ∈ hom_{eRe}(Re, Re). Define σ(ψ) : S → S as follows: If s ∈ S, we can uniquely write

s = Σ e_ir_i = Σ u_iev_ir_i.

Then set

σ(ψ)(s) := Σ ψ(u_ie)v_ir_i,

which is completely determined by ψ and the unique r_i.
Now, in order to evaluate τ ∘ σ, we express an arbitrary element re of Re in the unique form Σ u_iev_ir_i. (This is a finite sum.) Now

τ(σ(ψ))(re) = σ(ψ)(re) · e
= (Σ ψ(u_ie) · v_ir_i) · e
= Σ ψ(u_ie²) · v_ir_i · e
= Σ ψ(u_ie) · ev_ir_ie
= ψ(Σ u_iev_ir_ie) = ψ(re · e) = ψ(re).

Notice that the last three lines of equations use the fact that ψ is an eRe-morphism in two places.
Thus τ is onto, and so we may conclude that

τ : hom_R(S, S) → hom_{eRe}(Re, Re)

is a ring isomorphism.
Theorem 311 (Wedderburn-Artin)
1. A ring R is completely reducible if and only if it is isomorphic to a
finite direct product of completely reducible simple rings.
2. A ring R is completely reducible and simple if and only if it is the ring
of all linear transformations of a finite dimensional vector space.
proof: Part 1. Let R be completely reducible, so R is a direct sum of minimal right ideals. Now the identity element 1 is a finite sum of elements in these direct summands, so 1 = e_1 + · · · + e_n, and 1² = 1 and the directness of the sum imply e_i² = e_i and R = e_1R ⊕ · · · ⊕ e_nR, each e_iR a minimal right ideal.
Collecting R = Soc(R_R) as a sum of homogeneous components C_σ, and again expressing 1 as a finite sum of elements from these components, we see that

1 = c_1 + c_2 + · · · + c_k, where c_i ∈ C_i.

Again 1² = 1 implies c_i² = c_i. But since C_i is a 2-sided ideal of R, the directness of the sum R = ⊕_{i=1}^{k} C_i allows us to infer from the equation r1 = 1r the equations rc_i = c_ir, i = 1, . . . , k, for all r ∈ R. Thus the c_i are central idempotents, and each component c_iR = Rc_i = C_i is itself a ring with identity element c_i.
Let C_i = H be any homogeneous component. Since C_i · C_j = 0 for i ≠ j, it is easy to see that the ideals and right ideals of R which lie in H are
precisely the ideals and right ideals of the ring H itself. Thus H is a sum of
minimal right ideals of H and so H is completely reducible.
Why is H simple? Because by Theorem 307 H is a minimal ideal of R.
Conversely, if R is a sum of ideals H which are sums of irreducible right
R-modules, RR is completely reducible.
Part 2. Let R be completely reducible and simple. Let eR be a minimal right ideal of R, where e² = e in R. Now R_R is its own socle, hence

R ≅ hom_R(R, R) ≅ hom_{eRe}(Re, Re).    (12.3)
The second isomorphism is the previous Lemma 310. The first isomorphism
sends element r to left multiplication of R by the element r. By the right
distributive law and the associative law, the latter is an R-homomorphism.
By the left distributive law, this morphism R → homR (R, R) is a homomorphism of rings. Finally, rR = 0 if and only if r = 0, so it is injective.
On the other hand, if φ is an arbitrary R-endomorphism of R_R, we may set f_φ := φ(1). Then for any element r in R,

φ(r) = φ(1 · r) = φ(1)r = f_φ · r

– that is, the action of φ on R_R is just left multiplication of all the elements of R by an element f_φ of R. Thus the first mapping in equation (12.3) is also surjective, and so depicts an isomorphism of rings. (In fact this part of the argument assumed nothing about R; the first isomorphism always holds.)
Clearly hom_{eRe}(Re, Re) is a full ring of linear transformations of the right vector space Re over eRe. It remains only to show that Re has finite dimension over the division ring eRe. It suffices to show that (Re)_{eRe} is Noetherian.
To prove this we note that, as R_R is a direct sum of minimal right R-modules M_j, we can write the identity element as a finite sum

1 = m_1 + m_2 + · · · + m_n,

where m_i ∈ M_i − {0}. Then clearly

R_R = 1 · R = ⊕ m_iR,

and so the increasing partial sums form a composition series of R_R of length n. By the Jordan-Hölder theorem, and the fact that the lattice of submodules of R_R is modular, any chain of submodules of R_R has length at most n, and so R_R is Noetherian.
Let K = {Kτ } be the family of all subspaces (that is, eRe-submodules) of
Re. For each Kτ ∈ K, Kτ R is a right R-submodule of (ReR)R . But as RR is
Noetherian, so is its submodule (ReR)R . Thus among the submodules Kτ R,
Kτ ∈ K, we can find one – say Kµ R which is maximal. Then if Kµ ≤ Kσ ,
then Kµ R = Kσ R. But, noting that as e is the identity of eRe, Kµ e = Kµ ,
we have

K_µ = K_µeRe = K_µRe = K_σRe = K_σeRe = K_σ.

Thus K_µ is a maximal element of K. The same argument shows that all ascending chains

K_1 < K_2 < · · ·

in K convert to chains K_1R ≤ K_2R ≤ · · · which stabilize at some K_mR, from which
it follows that Km is maximal in K. Thus Re is a Noetherian eRe-module,
and so has finite dimension.
12.6 Exercises for Chapter 12

12.6.1 Exercises relating to Sections 12.1 and 12.2
All of the results of this subsection possess a strong lattice-theoretic component. What I propose here is really a project – that means that the proposer does not know the answer but believes it answerable. So the basic thrust of these exercises is for the student to investigate to what extent it is possible to recast the theory of this section in the context of abstract lattices.
It would seem that minimally we must assume that L is a complete lattice. This means that L has a “zero” element, 0 bounding below all elements
of L, a “universal one” which bounds above all elements, and unrestricted
joins and meets, which means that any subset of L possesses a least upper bound and a greatest lower bound in L.
There are then two pertinent subsets of L: max(L) , the set of all elements
which are maximal with respect to being distinct from 1, and min(L), the
set of elements of L which are minimal with respect to being distinct from
the lattice zero (usually called the atoms of a poset with 0). Either of these
sets might be empty. We can then define:
rad(L) := ∧_{A ∈ max(L)} A if max(L) ≠ ∅, and rad(L) := 1 if max(L) = ∅;

Soc(L) := ∨_{A ∈ min(L)} A if min(L) ≠ ∅, and Soc(L) := 0 if min(L) = ∅.
Also, with these lattices, we may define the analogue of a direct sum. We say that d is the direct join of the set {a_i | i ∈ I} if and only if

1. d = ∨_{i∈I} a_i.

2. For each finite subset J ⊆ I, and σ ∈ I − J, we have a_σ ∧ (∨_{j∈J} a_j) = 0.
Exercise 1. With only these hypotheses on L, show that Soc(L) is the direct
join of some set of minimal elements. [One defines direct subsets of a set
I indexing min(L) and uses the Zornification to obtain a maximal direct set
as in the proof of Theorem 287.]
Exercise 2. Of course there is the usual notion of duality of posets under
which (P, ≤) becomes (P, ≥). The notion of a complete lattice as defined just above is self-dual. Define a notion of direct meet which would be the dual
of “direct sum” as above. Then one can define “codirect” subsets of the set
K indexing max(L). State the dual form of the result of Exercise 1.
More remarks: One notices that the later theorems of this section link
facts about rad(M ) with Soc(M ). There are two mediating concepts: L(M )
complemented and the notion of large submodules.
So let us say that a complete lattice L is complemented if and only if
for every element c of L, there exists an element c′ such that

1. c ∨ c′ = 1

2. c ∧ c′ = 0.
The second concept is the notion of a "large" submodule. So we say that an element l is large in L if and only if, for every non-zero element d in L, one has l ∧ d ≠ 0.
These two notions are easy to define, but do they work in the same way?
One of the lemmas employing large submodules in its proof relied on recognizing that a certain module C + B, where C was maximal with respect to
meeting B trivially, was large. Even to get such modules to exist, a Zornification was used. Still later, a Zornification argument chose a submodule
maximal with respect to not containing a module element m. I propose two
analogues of these arguments in a general lattice L: the Zorn Axioms:
(M1) For every non-zero element d of L, there exists an element m of L
maximal with respect to the property that m does not dominate d (that is, d ≰ m).
(M2) For every non-zero element d of L, there exists an element m of L
maximal with respect to d ∧ m = 0.
Finally, there is one further notion used throughout the arguments of this
section: the lattice of submodules of a module is modular. Recall from
Chapter 1 that a lattice (complete or not) is modular if and only if
The Modular Law (Dedekind’s Lemma) Whenever we have a ≤ b, then
for any c in L,
b ∧ (a ∨ c) = a ∨ (c ∧ b).
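A standard non-example may help fix ideas here: in the five-element “pentagon” lattice with 0 < a < b < 1 and 0 < c < 1, where a ∨ c = b ∨ c = 1 and a ∧ c = b ∧ c = 0, one has a ≤ b and yet

\[ b \wedge (a \vee c) = b \wedge 1 = b, \qquad \text{while} \qquad a \vee (c \wedge b) = a \vee 0 = a, \]

so the Modular Law fails there.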
Suppose L is a complete modular lattice satisfying hypotheses (M1) and
(M2). Investigate these questions:
Exercise 3. If L is complemented, does L = Soc(L)?
Exercise 4. Is it true that when rad(L) = 0, there are no large elements? (You may assume (M1).)
Exercise 5. Is there a way to get around the arguments involving individual
elements of M to adapt Lemma 289 to modular complete lattices satisfying
any desired Zorn-axiom, (M1) or (M2)?
Exercise 6. Write a report adapting the theory of this section to complete
modular lattices satisfying the Zorn Conditions (M1) and (M2). Explain
why it works or doesn’t work.
12.6.2 Warm-up Exercises
7. An element x of a ring R is nilpotent if and only if, for some positive
integer n, xⁿ = 0. Show that no nilpotent element is right or left
invertible.
8. Show that if x is a nilpotent element, then 1 − x is a unit of R.
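(A sketch of the intended computation: if xⁿ = 0, the finite geometric series supplies the inverse, since

\[ (1 - x)(1 + x + x^2 + \cdots + x^{n-1}) = 1 - x^n = 1, \]

and the same product taken in the other order is also 1.)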
9. A right ideal A of R is said to be nilpotent if and only if, for some
positive integer n (depending on A), Aⁿ = 0. Show that if A and B are
nilpotent right ideals of R, then so is A + B.
10. (An interesting example) Let F be any field. Let F⟨x, y⟩ denote the ring
of all polynomials in two “non-commuting” variables x and y. Precisely,
F⟨x, y⟩ is the monoid-ring F M where M is the free monoid generated
by the two-letter alphabet {x, y}. (We shall see later on that this ring
is the tensor algebra T(V) where V is a two-dimensional vector space
over F.) Let I be the 2-sided ideal generated by {x², y²}, and form the
factor ring R := F⟨x, y⟩/I.
(a) For each positive integer k, let

(x)_k := xyx · · · (k factors) + I,
(y)_k := yxy · · · (k factors) + I.

Show that the set {1, (x)_k, (y)_k | 0 < k ∈ ℤ} is a basis for R as a
vector space over F. Thus each element of R is a finite sum

\[ r = \alpha_0 \cdot 1 + \sum_{i \in J} \big( \alpha_i (x)_i + \beta_i (y)_i \big), \]

where J is a finite set of positive integers.
(b) Show that the sum (x+I)R +(y +I)R of two right ideals is direct,
and is a maximal two-sided ideal of R.
(c) An element (x)_k is said to be x-palindromic if it begins and ends
in x – i.e., if k is odd. Similarly (y)_k is y-palindromic if and only
if k is odd. Show that any linear combination of x-palindromic
elements is nilpotent.
(d) Let P_x (P_y) denote the vector subspace spanned by all x-palindromic
(y-palindromic) elements. Then the group of units of R contains
(1 + P_x) ∪ (1 + P_y) and hence all possible products of these elements.
(e) Show that for any positive integer k, (x + y)ᵏ = (x)_k + (y)_k. (Here is
a sum of two nilpotent elements which is certainly not nilpotent.)
11. A right ideal is said to be a nil right ideal if and only if all of its elements
are nilpotent. (If the nil right ideal is 2-sided, we simply call it a nil
ideal.)
(a) For any family F of nilpotent right ideals, show that the sum
\( \sum_{A \in \mathcal{F}} A \) of all ideals in F is a nil ideal.
(b) Show that if A is a nilpotent right ideal, then so is rA, for any element r ∈ R.
(c) Define N (R) to be the sum of all nilpotent right ideals of R. (This
invariant of R can be thought of as a certain kind of “radical” of
R.) Show that N (R) is a 2-sided nil ideal.
12.6.3 Exercises concerning the Jacobson radical
Recall that the Jacobson radical of a ring R (with identity) was defined to be
the intersection of all the maximal right ideals of R, and was denoted by
the symbol Rad(R).
12. Show that an element r of the ring R is right invertible if and only if it
belongs to no maximal right ideal.
13. Let “1” denote the multiplicative identity element of R. Show that the
following four conditions on an element r ∈ R are equivalent:
(a) r belongs to the Jacobson radical of R.
(b) For each maximal right ideal M , 1 does not belong to M + rR.
(c) For every element s in R, 1 − rs belongs to no maximal right ideal
of R.
(d) For every element s in R, 1 − rs is right invertible.
14. Show that if an element 1 − rt is right invertible in R, then so is 1 − tr.
[Hint: By hypothesis there exists an element u such that (1 − rt)u = 1.
So we have : u = 1 + rtu. Use this to show that the product
(1 − tr)(1 + tur) = 1 + tur − t(1 + rtu)r,
is just 1.]
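(To complete the hint’s computation: expanding, 1 + tur − t(1 + rtu)r = 1 + tur − tr − trtur, and substituting tur = t(1 + rtu)r = tr + trtur makes the last three terms cancel, leaving 1.)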
15. Demonstrate that the Jacobson radical Rad(R) is a two-sided ideal –
that is, if r ∈ Rad(R) and t ∈ R, then tr ∈ Rad(R).
16. Show that Rad(R) contains every nilpotent 2-sided ideal. [Hint: If false,
there is a nilpotent 2-sided ideal A and a maximal right ideal M such that
A² ⊆ M while R = A + M. Then 1 = a + m where a ∈ A and m ∈ M.
Thus 1 − a is on the one hand an element of M , and on the other hand
it is a unit by Exercise 8.]
12.6.4 The Jacobson radical of Artinian rings
A ring whose poset of right ideals satisfies the descending chain condition is
called an Artinian ring. The next group of exercises will lead the student
to a proof that for any Artinian ring, the Jacobson radical is a nilpotent
ideal.
17. We begin with something very elementary. Suppose b is a non-zero
element of the ring R. If b = ba, then there is a maximal right ideal
not containing a – in particular a is not in the Jacobson radical. [Hint:
Just show that 1 − a has no right inverse.]
18. Suppose R is an Artinian ring, and A is a non-zero ideal for which
A² = A. Show that there exists a non-zero element b and an element
a ∈ A such that ba = b. [Hint: Consider the collection F of all non-zero
right ideals C such that CA ≠ 0. A ∈ F, so this family is non-empty.
By the DCC there is a minimal element B in F. Then for some b ∈ B,
bA ≠ 0. Then as bA = bA², bA ∈ F. But minimality of B shows
bA = B, and the result follows.]
19. Show that if R is Artinian, then Rad(R) is a nilpotent ideal. [Hint:
J := Rad(R) is an ideal (Exercise 15). Suppose by way of contradiction
that J is not nilpotent. Use the DCC to show that for some positive
integer k, A := Jᵏ = Jᵏ⁺¹, so A² = A ≠ 0. Use Exercise 18 to obtain
a non-zero element b with ba = b for some a ∈ A. Use Exercise 17
to argue that a is not in J. It is apparent that the last assertion is a
contradiction.]
20. Show that an Artinian ring with no zero divisors is a division ring. [No
hints this time. Instead a challenge.]
12.6.5 Quasiregularity and the radical
Given a ring R, we define a new binary operation “◦” on R, by declaring
a ◦ b := a + b − ab, for all a, b ∈ R.
The next exercises investigate invertibility properties of elements with respect
to this operation, and use these facts to describe a new poset of right ideals
for which the Jacobson radical is a supremum. One reward for this will be a
dramatic improvement of the elementary result in Exercise 16: the Jacobson
radical actually contains every nil right ideal.
21. Show that (R, ◦) is a monoid with monoid identity element being the
zero element, 0, of the ring R.
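(A sketch of the verification: a ◦ 0 = a + 0 − a0 = a = 0 ◦ a, and both (a ◦ b) ◦ c and a ◦ (b ◦ c) expand to a + b + c − ab − ac − bc + abc.)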
22. An element x of R is said to be right quasiregular if it has a right
inverse in (R, ◦) – that is, there exists an element r such that x ◦ r =
0. Similarly, x is left quasiregular if ℓ ◦ x = 0 for some ℓ ∈ R.
The elements r and ℓ are called right and left quasi-inverses of x,
respectively. (They need not be unique.) You are asked here to verify
again a fundamental property of all monoids: Show that if x is both
left and right quasiregular, then any left quasi-inverse is equal to any
right quasi-inverse, so there is a unique element x◦ such that

x◦ ◦ x = x ◦ x◦ = 0.

Show also that (x◦)◦ = x, for any x ∈ R.
23. This exercise has four parts, all elementary.
(a) Show that for any elements a, b ∈ R,
a ◦ b = 1 − (1 − a)(1 − b).
(b) Show that x is right (left) quasiregular if and only if 1 − x is right
(left) invertible in R. From this, show that if x is quasiregular,
then 1 − x has 1 − x◦ for a two-sided multiplicative inverse.
(c) Conclude that x is quasiregular if and only if 1 − x is a unit.
(d) Prove that any nilpotent element a of R is quasiregular. [Hint:
Think about the geometric series in powers of a.]
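(Following the hint, one may check that if aⁿ = 0 then a◦ := −(a + a² + · · · + a^{n−1}) works:

\[ a \circ a^{\circ} = a - (a + a^2 + \cdots + a^{n-1}) + (a^2 + \cdots + a^n) = a^n = 0, \]

and symmetrically a◦ ◦ a = 0, since powers of a commute with one another.)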
24. Now we apply these notions to right ideals. Say that a right ideal is
quasiregular if and only if each of its elements is quasiregular. Prove
the following
Lemma 312 If every element of a right ideal K is right quasiregular,
then each of its elements is also quasiregular – that is, K is a quasiregular ideal.
[Hint: First observe that if x ∈ K and y is a right quasi-inverse of x,
then y ∈ K as well. So now y has both left and right quasi-inverses
which must be x (why?). Why does this make x quasiregular?]
25. Next prove
Lemma 313 Suppose K is a quasiregular right ideal and M is a maximal right ideal. Then K ⊆ M . (Thus every quasiregular right ideal
lies in the Jacobson radical.)
[Hint: If K does not lie in M, then R = M + K, so 1 = m + k where m ∈ M
and k ∈ K. Now k has a right quasi-inverse z. As above, z ∈ K. Then

0 = 1 · z − z = (m + k)z − z = mz + (kz − z) = mz + k,

forcing k = −mz ∈ M, an absurdity.]
26. Show that the Jacobson radical is itself a right quasiregular ideal. [Hint:
Show each element y ∈ Rad(R) is right quasiregular. If not, (1 − y)R
lies in some maximal right ideal M.] We have now reached the following.
Lemma 314 The radical Rad(R) is the unique supremum in the poset
of right quasiregular ideals of R.
27. Prove that every nil right ideal lies in the Jacobson radical Rad(R).
[Hint: It suffices to show that every nil right ideal is quasiregular.
There are previous exercises that imply this.]
28. Are there rings R (with identity) in which some nilpotent elements do
not lie in the Jacobson radical? Can you produce an example? (This
last is not a sociological question about your abilities — it is an interesting
subject no one (not even you) knows everything about. Go out and try
to find a real example.)
12.6.6 Exercises involving nil one-sided ideals in Noetherian rings
29. Show that if B is a nil right ideal, then for any element b ∈ B, Rb is a
nil left ideal.
30. For any subset X of ring R, let X ⊥ := {r ∈ R|Xr = 0}, the right
annihilator of X.
(a) Show that always X ⊥ R ⊆ X ⊥ .
(b) Show that X ⊥ ⊆ (RX)⊥ .
(c) If X is an additive subgroup of (R, +), then X ⊥ is a right ideal.
(d) Show that if X is a left ideal of R, then X ⊥ is a 2-sided ideal.
31. Now suppose L is a left ideal, and P is the poset of right ideals of the
form y ⊥ as y ranges over the non-zero elements of L (partially ordered
by the containment relation).
(a) Show that R cannot be a member of P .
(b) Show that if y ⊥ is a maximal member of P , then (ry)⊥ = y ⊥ for
all r ∈ R.
32. Suppose R is Noetherian – that is, it has the ascending chain condition
on right ideals.
(a) Show that if L is a left ideal, then there exists an element u ∈ L
such that u⊥ = (ru)⊥ for all r ∈ R.
(b) Suppose L is a nil left ideal and that u is chosen as in Part (a)
of this exercise. Show that uRu = 0. Conclude from this that
(RuR)² = 0, so u generates a nilpotent 2-sided ideal.
(c) Because R is Noetherian, there exists an ideal M which is maximal
among all nilpotent 2-sided ideals. Show that M contains every
nil left ideal. [Hint: If L is a nil left ideal, so is L + M , and so
(L + M )/M is a nil left ideal of the Noetherian ring R/M . Now if
L + M is not contained in M , Part (b) of this exercise applied to
R/M will yield a non-zero nilpotent 2-sided ideal in R/M . Why
is this absurd?]
(d) Show that the right ideal M contains every nil right ideal. [Hint:
Use Exercise 29 above and Part (c) of this exercise.]
(e) In a Noetherian ring, the sum of all nilpotent right ideals, N(R), is
a nil 2-sided ideal (Exercise 11, Part (c) above). Prove the following:

Theorem 315 (Levitzki) In a Noetherian ring, N(R) is a nilpotent ideal and it contains all left and right nil ideals.
Remarks: The reader is advised that most of the theory exposed in these
exercises also exists in “general ring theory”, where (unlike in this limited text)
rings are not required to possess a multiplicative identity element. The proofs
of the results without the assumption of an identity element – especially the
results about radicals – require a bit of ingenuity. There are three basic
ways to handle this theory without a multiplicative identity. (1) Embed
the general ring in a ring with identity (there is a rather ‘minimal’ way to do
this) and apply the theory for these rings. Only certain sorts of statements
about general rings can be extracted in this way. For example, quasiregularity
can be defined in any general ring. But to establish the relation connecting
quasiregularity of x with 1 − x being a unit, the embedding is necessary. (2)
Another practice is to exploit the plenitude of right ideals by paying attention
only to right ideals possessing properties they would have had if R had an
identity element. Thus a regular right ideal in a general ring is a right ideal
I for which there is an element e such that e − er ∈ I for all r in the
ring. The Jacobson radical is then defined to be the intersection of all such
regular right ideals. With this modified definition, nearly all of the previous
exercises regarding the Jacobson radical can be emulated in this more general
context. But even this is just “scratching the surface”; there is an extensive
literature on all kinds of radicals of general rings. (3) In Artinian general
rings, one can often find sub-(general)-rings containing an idempotent serving
as a multiplicative identity for that subring. Then one can apply any relevant
theorem about rings with identity to such subrings.
Two particularly good sources extending the material in this chapter are
the classic Rings and Modules by J. Lambek (Blaisdell, 1966) and the excellent and thorough book of J. Dauns, Modules and Rings (Cambridge U.
Press, 1994). (As indicated by the titles, they have this subject covered “coming and going”.)
Chapter 13
Categorical Notions
13.1 Introduction
A category is a triple (O, Hom, ◦) consisting of a class O of objects, an
assignment to each ordered pair of objects (A, B) ∈ O × O of a class
Hom(A, B) of morphisms, and a law of composition

◦ : Hom(A, B) × Hom(B, C) → Hom(A, C),

defined at every ordered triple (A, B, C) for which Hom(A, B) and Hom(B, C)
are non-empty, subject to these rules:
1. Where defined, the (binary) composition operation on morphisms is
associative.
2. For each object A, Hom (A, A) contains a special morphism, 1A , called
an identity morphism at A, whose composition on the left or right
with any other morphism (when defined) is just that other morphism.
Since in many instances the morphisms in a category are mappings between sets or classes, we denote composition of morphisms as if they were
left operators. Specifically, if α ∈ Hom (A, B) and β ∈ Hom (B, C), then
β ◦ α ∈ Hom (A, C) denotes the result of the composition.
A subcategory of a category C := (O, Hom, ◦) is a triple (O′, Hom′, ◦)
where O′ is a subset of O and, for every (A′, B′) ∈ O′ × O′, Hom′(A′, B′) is a
subset of Hom(A′, B′) such that all compositions (taken in C) of morphisms
defined by Hom′ are morphisms in Hom′. It is called an induced subcategory if and only if Hom′(A, B) = Hom(A, B) at every pair (A, B) ∈ O′ × O′.
Figure 13.1: A commutative diagram.
In general, of course, the objects O are not sets and the morphisms are
not mappings. This “abstract” approach is useful in several ways:
1. It separates those features of an argument which depend only on the associative composition of morphisms and the existence of identity morphisms, from the parts of an argument which depend on special internal
features of the mappings of sets.
2. It allows one the insight to recognize when an argument in one part
of mathematics is in fact a “categorical” argument which can be lifted
to other parts of mathematics where the hypotheses of the categorical
context of the argument may also be realized.
3. It allows one to define properties of an object, or subcategory, in terms
of the way they are embedded in a category.
It is this third advantage which will be of most use to us here, when we
consider universal mapping properties.
Discussions within a category can become complex and at times awkward
to state completely in words. It is therefore convenient to adopt a symbolic
convention which we call a diagram. A morphism α ∈ Hom(A, B) is represented as an arrow \( A \xrightarrow{\ \alpha\ } B \) with the morphism as label. If α ∈ Hom(A, B)
and β ∈ Hom(B, C), then the fact that γ = β ◦ α can be expressed by saying
that the diagram of Figure 13.1 commutes or is a commutative diagram.
So far, there does not seem to be a great saving of space offered by diagrams.
However “commuting” extends beyond triangles to squares and polygonal
diagrams, and complex diagrams can be built up from these. Thus to say that
the diagrams of Figure 13.2 commute just asserts that β ◦ α = γ ◦ δ in the
first instance, and γ ◦ β ◦ α = ε ◦ δ in the second.
Figure 13.2: More commutative diagrams.
Figure 13.3: A complex diagram.
Commutativity really is an assertion that compositions of morphisms along
two directed paths from object X to object Y in a diagram yield the same
morphism. The directed paths may be “homotopically” perturbed by commutative diagrams to obtain new instances of commutativity. For example,
in the diagram of Figure 13.3, the commutativity of the polygonal diagram
formed by the outer borders asserts that λ ◦ µ ◦ β ◦ α = ρ ◦ φ ◦ γ, and is
a consequence of the commutativity of the internal diagrams on the square
ABED and the triangles BCD, AHE, HEF, EDF, and DFG.
EXAMPLES
1. The category K(X) of all subsets of a fixed set X. The title describes
the objects. All possible mappings among these subsets form the morphisms. If we restrict ourselves to the finite subsets of X with all
possible morphisms between them, we obtain an induced (or “full”) subcategory. If, on
the other hand, we let the objects remain all subsets of X, but take
only the surjective mappings as morphisms, we obtain a subcategory
which is not an induced subcategory.
2. The category of all right (left) R-modules over a given ring R. The
morphisms are the R-homomorphisms among R-modules. This category entails a number of important cases:
(a) The category of Z-modules. (All Abelian groups.)
(b) The category of vector spaces over a field F . The linear transformations are the morphisms.
(c) Among the right R-modules can be found the set of all (S, R)bimodules, S MR , where the morphisms f must be morphisms
both as left S-module and right R-modules – that is, the law
f (smr) = sf (m)r, for all (s, m, r) ∈ S × M × R is in force.
An important induced subcategory of the category of right R-modules
is the category of finitely generated R-modules.
3. Let M be some fixed monoid. We form a category with M as its
sole object. Hom (M, M ) will be the set of monoid elements of M , and
composition of any two of these is defined to be their product under the
binary monoid operation. (Conversely, note that in any category with
object A, Hom (A, A) is always a monoid with respect to morphism
composition.)
4. Let G be a simple connected graph (V, E). This means that E is a
collection of 2-subsets of the vertex set V and, for any two vertices
x and y, there is a “walk” of some non-negative integral length n, say
w = (x = x₀, x₁, . . . , xₙ = y), where each successive pair {xᵢ, xᵢ₊₁} ∈ E
for i = 0, . . . , n − 1. Let us define a category as follows:
(a) The objects are the vertices of V .
(b) If x and y are vertices, then Hom (x, y) is the collection of all
walks that begin at x and end at y. (This is often an infinite set.)
Suppose now u = (x = x0 , x1 , . . . , xn = y) is a walk in Hom (x, y), and
v = (y = y0 , y1 , . . . , ym = z) is a walk in Hom(y, z). Then the composition of the two morphisms is defined to be their concatenation, that
is, the walk v ◦ u := (x = x0 , . . . xn = y0 , . . . ym = z) from x to z in
Hom (x, z).
The reader can verify that this is a category.
We say that a morphism α ∈ Hom(A, B) is an epimorphism if and
only if there is a morphism γ ∈ Hom(B, A) such that α ◦ γ = 1_B. In the
category of sets with all possible mappings as morphisms (and any of its
subcategories such as R-modules), the epimorphisms are surjections in the
usual sense. However, in some category whose objects are sets but in which,
for some reason, there is a severe restriction on the existence of morphisms, this
may not be true. For example, if α ∈ Hom(A, B) is surjective as a mapping,
but Hom(B, A) = ∅, α could not be an epimorphism.
Similarly, we say that a morphism α ∈ Hom (A, B) in the category C is a
monomorphism if and only if there exists a morphism β ∈ Hom (B, A) such
that β ◦ α = 1A . In any subcategory of the category of sets, the monomorphisms are precisely the injective mappings. However, as one can see, in the
abstract setting, these properties deal with the existence of right and left
inverses of morphisms.
In fact there are many constructions familiar to us in some nice category
such as the category of R-modules, which can be recreated in the realm of
abstract categories – things like:
zero objects
kernels and cokernels
direct sums and direct products.
But as we shall soon be turning to the category of R-modules for the rest of
this chapter anyway, there is no great advantage for the purpose of these notes
in recreating for all abstract categories constructions we already understand
in R-modules. We are, rather, interested in the converse process – bringing
abstract constructions to bear on the category of R-modules. The three main
categorical notions we shall exploit are these:
• Universal mapping properties.
• Duality.
• Functors.
13.1.1 Universal Mapping Properties.
Fix a category C, and consider a diagram ∆ – say, the one in Figure 13.4.
Possibly, we might want to adjoin to the diagram further assertions (P ) – for
Figure 13.4: An arbitrary diagram.
example, that the triangle ABC commutes, or that the morphism D → C
is an epimorphism or that A is a fixed previously designated object. By a
generic diagram of this type, we mean any diagram in C of this shape with
A, B, C, D replaced by other objects of C with the same assertions (P ) about
the diagram being maintained. (Note that in this way the original diagram
∆ has now become a variable with a range over certain diagrams of C.)
We say that a diagram ∆₁ with attached assertions (P) is embedded
in a diagram ∆₂ with attached assertions (R) if and only if ∆₂ can be
obtained from ∆₁ by adding more objects and arrows, and if the assertions
(R) include all of the old assertions (P). A universal mapping property
is a declaration something like this:
Every instance in C of a generic diagram (∆₁, (P)) can be embedded into
an instance of a generic diagram (∆2 , (R)) (perhaps in ways that are
unique in some respect).
This is intentionally broad, but it fits most situations.
Let us take a simple example. A source in a category C is an object S
such that for every object Y in the category, there is a morphism S → Y .
Then the statement that “S is a source” can be rendered as a universal
mapping property as follows:
The generic diagram ∆1 is just a pair {X, Y } of variable objects together
with the assertion (P ) that X = S.
The diagram ∆2 is X → Y together with the same assertion (P ).
If the assertion (R) included not only (P ) but the assertion that the morphism was unique, we should have the statement that “S is a zero object”.
It is sometimes possible to represent the whole universal mapping property by writing the diagram ∆2 with all new ingredients not in the original
∆₁ written in some other font or with arrows dashed. Thus the assertion
that S is a source can be rendered symbolically by

\[ S \dashrightarrow Y, \]
where, of course, the assertion that S is a particular object while Y is a
generic object of C must still be stated in words.
One more example: To say that the morphism α ∈ Hom(A, B) is surjective can be rendered as a universal mapping property by setting ∆₁ to be
the diagram

\[ X \xrightarrow{\ \alpha\ } Y \xrightarrow{\ f,\,g\ } Z \]

(with two morphisms f and g from Y to Z), along with the assertion that f ◦ α = g ◦ α. Then ∆₂ is the
same diagram with the extra assertion (R) that f = g. Thus compositions
with α are right-cancellable.
13.1.2 Duality.
Suppose (O, Hom, ◦) is some category. One can obtain a new category C̄ by
“turning all the arrows backwards”. Of course, this phrase is in quotation
marks since it has to be made precise. We are here constructing a new
category whose set of objects Ō is the same as that for C – that is, Ō = O.
For each pair (A, B) in O × O, and each morphism α ∈ Hom(A, B) in C,
we declare a morphism ᾱ ∈ Hom_C̄(B, A) in category C̄ such that

• if β ◦ α ∈ Hom(A, C) is the composition in C of α ∈ Hom(A, B) and
β ∈ Hom(B, C), then \( \overline{\beta \circ \alpha} \in \mathrm{Hom}_{\bar{C}}(C, A) \)
is the composition in C̄ of β̄ ∈ Hom_C̄(C, B) and ᾱ ∈ Hom_C̄(B, A) –
that is,

\[ \overline{\beta \circ \alpha} = \bar{\alpha} \circ \bar{\beta}. \]

This means that each monoid Hom(A, A) is replaced by its opposite
monoid, Hom(A, A)^{opp} = Hom_C̄(A, A), in C̄.
The same applies to diagrams, viewed as miniature categories – each
diagram ∆ can be replaced by a dual diagram ∆̄ with arrows running the
other way. In particular, if a universal mapping property is the statement that
any diagram ∆₁ with assertions (P) can be embedded in a diagram ∆₂ with
assertions (R), then there is a dual universal mapping property in which
the dual diagram ∆̄₁ (with (P̄) the assertions of (P) applied to the same objects but
to the dual morphisms they used to be asserted of) can be embedded in the dual
diagram ∆̄₂ (with assertions (R̄)). Thus
• for every notion describable by a universal mapping property, there is
also a dual notion.
Thus, the dual of the notion of source would be an object T such that for
every object Y, Hom(Y, T) ≠ ∅. Similarly, the dual of epimorphism is
monomorphism; the dual of kernel is cokernel; and, as we shall see, the dual
of projective is injective and the dual of direct sum is direct product.
13.1.3 Functors.
A functor from category C to category C 0 is first of all a mapping
F : O → O0 from the class of objects of C to the class of objects of C 0 , together
with a collection of mappings FA,B : Hom (A, B) → Hom (F (A), F (B)) for
each pair (A, B) for which Hom (A, B) 6= ∅ such that:
• If for morphisms, α ∈ Hom (A, B) and β ∈ Hom (B, C), in C we have
the composition γ = β ◦ α ∈ Hom (A, C), then its image FA,C (γ) ∈
Hom (F (A), F (C)) is the composition
FB,C (β) ◦ FA,B (α),
in category C 0 .
It follows that, for every object A ∈ C, there is a monoid homomorphism
FA,A : Hom (A, A) → Hom (F (A), F (A)).
Note: We often delete the elaborate and redundant subscripts, writing F
for FA,B .
The first of the examples below shows that the mappings FA,B need not
be surjections.
There is another variation on this theme. Suppose A and B are categories
and that B̄ is the dual of the category B. Then a functor F : A → B̄ is called
a contravariant functor from A to B. This may seem a little confusing,
but basically the property being emphasized is that if f ∈ Hom (A1 , A2 ) is
a morphism in category A, then F (f ) is a morphism from F (A2 ) to F (A1 )
(rather than a morphism from F (A1 ) to F (A2 ) as would be the case if F
were merely a functor as defined above). Since it is a functor into the dual
category, the image of the compositions of morphisms in the domain category
A still maps to the composition of their images, but the arrows are now going
in the opposite direction.
The functors in the earlier paragraphs above – especially when they are
in an environment which also contains contravariant functors – are called
covariant functors to distinguish them. But generally, here, “functor”
without any further adjectives means “covariant functor”.
EXAMPLES:
1. The category C(X) of pointed sets. Fix a set X. The objects of this
category are pairs (u, U ) where u ∈ U and U is a subset of X. The
morphisms of Hom ((u, U ), (v, V )) are the mappings U → V which take
u to v.
There is a second category K(X) of all non-empty subsets of X with
all possible mappings between them as morphisms.
Now there is a functor F : C(X) → K(X) which “forgets” about the
“point”. Precisely, our functor takes each object (u, U ) of C(X) to the
subset U, viewed as an object in the category K(X). Any morphism
f : (u, U) → (v, V) is already a mapping U → V, which is a morphism
F(f) in the category K(X). This is clearly a functor.
2. Let A be the category of abelian groups – that is, the objects are all
the abelian groups and the morphisms are the group homomorphisms
that exist between them.
Of course A is also the category of right Z-modules in a natural way.
Now let MR be the category of right R-modules, with R-homomorphisms
as the morphisms.
Then there is a functor F : MR → A which regards each R-module M
as an additive abelian group (M, +) with respect to the operation of
module addition. That is, the right action from R is simply ignored.
F is a functor, because every R-homomorphism is automatically a
homomorphism of the underlying abelian groups.
3. The ‘Hom’ functors in the category of R-modules. As above, let MR
be the category of right R-modules for a fixed ring R. Fix a right R-module A. Then Hom(A, B) (also written HomR(A, B) when the category MR needs to be emphasized) is the collection of all R-homomorphisms
from right R-module A to right R-module B. Now, as observed earlier, Hom (A, B) itself has the structure of a right R-module under the
following conventions:
(f + g)(a) := f(a) + g(a)   (13.1)
(f r)(a) := (f(a))r   (13.2)
where f, g ∈ Hom (A, B), and (a, r) ∈ A × R, arbitrarily.
Now suppose f : B → C is a homomorphism among right R-modules.
Then post-composing every morphism of Hom (A, B) by f induces a
mapping HA (f ) : Hom (A, B) → Hom (A, C) which is easily seen to
be an R-homomorphism of right R-modules. One needs to check only
that
f ◦ (g + h) = f ◦ g + f ◦ h, and   (13.3)
f ◦ (gr) = (f ◦ g)r.   (13.4)
Now setting HA (B) := Hom (A, B) for all right R-modules B in MR ,
we see that HA is a (covariant) functor MR → MR .
4. Now suppose f ∈ Hom(B, C). Then there is a mapping Hom(C, A) →
Hom(B, A) induced by pre-composing every morphism g from C to
A by f to yield g ◦ f ∈ Hom(B, A). This is an R-homomorphism
H^A(f) : Hom(C, A) → Hom(B, A) of right R-modules. Setting
H^A(B) := Hom(B, A) for all right R-modules B, one sees that H^A
is a contravariant functor MR → MR.
13.2 Special Features of the Category of R-modules.
13.2.1 Exact Sequences
Let f : A → B be a morphism in the category MR of right R-modules. One
calls A the domain of f, and calls B the codomain of f. Recall that

Im f := f(A) = {f(a) | a ∈ A}

is called the image of f, and that

ker f := f⁻¹(0_B) = {a ∈ A | f(a) = 0_B}

is called the kernel of f.
Now consider the composition g ◦ f ∈ Hom(A, C) of R-homomorphisms
f ∈ Hom(A, B) and g ∈ Hom(B, C) as depicted in the diagram

\[ A \xrightarrow{\ f\ } B \xrightarrow{\ g\ } C. \]
The pair (f, g) is said to be exact if and only if Im f = ker g. That is, an
element b of B has the form f (a) for some a ∈ A if and only if g(b) = 0C .
Similarly, if

{f_j : A_j → A_{j+1} | j in some interval of ℤ}   (13.5)

is a sequence of R-homomorphisms

\[ \cdots \xrightarrow{\ f_{j-1}\ } A_j \xrightarrow{\ f_j\ } A_{j+1} \xrightarrow{\ f_{j+1}\ } \cdots, \]

the sequence is said to be exact if and only if each successive pair (f_j, f_{j+1})
is exact. (The phrase “interval of ℤ” appearing in the presented range of the bound variable j in equation (13.5) is intended to allow intervals that are infinite on the left, or on the right, or both.)
In the category MR, the trivial module 0 is a “zero object”. This means
that for any right R-module A, there is exactly one R-homomorphism 0 → A
and one R-homomorphism A → 0.
Now we can say that a homomorphism f : A → B is injective if and only if
the sequence

\[ 0 \to A \xrightarrow{\ f\ } B \]
is exact. Similarly, f is surjective exactly with the exactness of the sequence

\[ A \xrightarrow{\ f\ } B \to 0. \]
Moreover f is a bijection if and only if the sequence

\[ 0 \to A \xrightarrow{\ f\ } B \to 0 \]
is exact.
Finally the sequence

\[ 0 \to A \xrightarrow{\ f\ } B \xrightarrow{\ g\ } C \to 0 \]
is called a short exact sequence if and only if this is an exact sequence.
In this case f is a monomorphism, g is an epimorphism, and C is isomorphic
to B/f (A), as an R-module.
Such a short exact sequence is said to split if and only if there are also
morphisms g′ : C → B and f′ : B → A such that

g ◦ g′ = 1_C   (13.6)
f′ ◦ f = 1_A.   (13.7)
In this case B ≅ A ⊕ C, a direct sum. This is just part of a larger scheme
to characterize the direct sum by mapping properties.
13.2.2 The Direct Sum as a Universal Mapping Property
Recall that the formal direct sum over a family {M_σ | σ ∈ I} of right R-modules is the set of all mappings f : I → ∪_σ M_σ, with f(σ) ∈ M_σ, and
f(σ) non-zero for only finitely many indices σ. We think of the indices
as indicators of coordinates, and the value f(σ) as the σ-coordinate of the
element f. This is an R-module \( \sum_{\sigma \in I} M_\sigma \) under the usual coordinate-wise
addition and application of right operators from R.
There is at hand a canonical collection of injective R-homomorphisms
\( \kappa_\sigma : M_\sigma \to \sum_{\sigma \in I} M_\sigma \) defined by setting
\[ (\kappa_\sigma(a_\sigma))(\tau) = \begin{cases} a_\sigma & \text{if } \sigma = \tau \\ 0 & \text{if } \sigma \neq \tau \end{cases} \]
where aσ is an arbitrary element of Mσ . That is, κσ takes an element a of
Mσ to the unique element of the direct sum whose coordinate at σ is a, and
whose other coordinates are all zero. We now state
Theorem 316 Suppose \( M = \sum_{\sigma \in I} M_\sigma \) is the direct sum of modules {M_σ | σ ∈
I} with canonical homomorphisms κ_σ : M_σ → M. Then, for every module
B, and family of homomorphisms φ_σ : M_σ → B, there exists a unique homomorphism φ : M → B such that φ ◦ κ_σ = φ_σ for all σ ∈ I. In fact, this
property characterizes the direct sum up to isomorphism.
proof: We define φ : M → B by the equation

\[ \phi(a) := \sum_{\sigma \in I} \phi_\sigma(a(\sigma)). \]
The summands on the right are elements of B since a(σ) ∈ Mσ , by definition; only finitely many of them are non-zero, so the sum describes a unique
element of B. It is elementary to verify that φ, as we have defined it, is an
R-homomorphism. Moreover, if a_τ is an arbitrary element of M_τ, then

\[ \phi \circ \kappa_\tau(a_\tau) = \phi(\kappa_\tau(a_\tau)) = \sum_{\sigma \in I} \phi_\sigma\big((\kappa_\tau(a_\tau))(\sigma)\big) = \phi_\tau(a_\tau). \]
Thus we have φ ◦ κτ = φτ , for all τ .
Now suppose there were a second homomorphism λ : M → B, with the
property that λ ◦ κτ = φτ , for all τ ∈ I. Then
\[ \lambda(a) = \sum_{\sigma \in I} \lambda(\kappa_\sigma(a(\sigma))) = \sum_{\sigma \in I} \phi_\sigma(a(\sigma)) = \phi(a). \]
Thus λ = φ. So the homomorphism φ which we have produced is unique.
Now assume that the R-module N , and the collection of morphisms λσ :
Mσ → N satisfy the universal mapping conditions of the last part of the
theorem.
First apply this property to M when B = N and φσ = λσ . Then there is
a homomorphism λ : M → N with λ ◦ κσ = λσ .
Similarly, there is a homomorphism κ : N → M for which κ ◦ λσ = κσ .
It follows that
κ ◦ κσ = κ ◦ λσ = κσ = 1M ◦ κσ ,
where 1M is the identity morphism on M . So by the uniqueness of the
homomorphism, κ ◦ λ = 1M . Similarly, reversing the roles, λ ◦ κ = 1N .
Therefore κ is an isomorphism.
We have seen from our earlier definition of the direct sum \( M = \sum_{\sigma \in I} M_\sigma \)
that there are projection mappings π_σ : M → M_σ which, for any element
m ∈ M, read off its σ-coordinate m(σ). One can see that these projection
mappings are related to the canonical mappings κ_σ by the equations
π_i ◦ κ_i = 1 (the identity mapping on M_i)   (13.8)
π_i ◦ κ_j = 0 if i ≠ j.   (13.9)
We remark that the existence of the projection mappings π_i with the
compositions just listed could have been deduced directly from the universal
mapping property which characterizes the direct sum. Let δ_{ij} be the morphism M_j → M_i which is the zero map if i ≠ j, but is the identity mapping
on M_i if i = j. If we apply the universal property with M_i and δ_{ij} in the respective roles of B and φ_i, we obtain the desired morphism φ = π_i : M → M_i
with π_i ◦ κ_j = δ_{ij}.
13.2.3 The Direct Product as a Universal Mapping Property
We can now characterize the direct product of modules by a universal property which is the complete dual of that for the direct sum.
Theorem 317 Suppose \( M = \prod_{\sigma \in I} M_\sigma \) is the direct product of the modules
{M_σ | σ ∈ I} with canonical projections π_σ : M → M_σ. Then, for every
module B, and family of homomorphisms φ_i : B → M_i, there exists a unique
homomorphism φ : B → M such that π_i ◦ φ = φ_i, for each i ∈ I. This
property characterizes the direct product up to isomorphism.
proof: The canonical projection mapping πi , recall, reads off the i-th coordinate of each element of M . Thus, if m ∈ M , m is a mapping I → ∪Mσ
with m(i) ∈ Mi , and πi (m) := m(i).
Now define the mapping φ : B → M by the rule:
φ(b)(i) := φi (b),
for all (i, b) ∈ I × B. Then certainly φ is an R-homomorphism. Moreover,
one checks that
(πi ◦ φ)(b) = πi (φ(b)) = (φ(b))(i) = φi (b).
Thus πi ◦ φ = φi .
Now if ψ : B → M were an R-homomorphism which also satisfied πi ◦ψ =
φi , for all i ∈ I, then we should have
(ψ(b))(i) = πi (ψ(b)) = (πi ◦ ψ)(b) = φi (b) = (φ(b))(i),
at each i, so ψ(b) = φ(b) at each b ∈ B – i.e. ψ = φ as mappings.
Now suppose N were some R-module equipped with morphisms π′_i : N →
M_i, i ∈ I, satisfying this mapping property: whenever there is a module B
with morphisms φ_i : B → M_i, then there is a morphism φ_B : B → N such
that π′_i ◦ φ_B = φ_i.
First we use the property for M. Setting B = N and φ_i = π′_i, we obtain
(in the role of φ) a morphism π′ : N → M such that π_i ◦ π′ = π′_i.
Similarly, using the same property for N (with M and π_i in the roles of B
and the φ_i), we obtain a homomorphism π : M → N such that π′_i ◦ π = π_i.
It follows that

π_i ◦ π′ ◦ π = π′_i ◦ π = π_i = π_i ◦ 1_M.
By the uniqueness, π′ ◦ π = 1_M. Similarly π ◦ π′ = 1_N. Therefore π is an
isomorphism.
As a final remark, the universal mapping property can be used to show
the existence of mappings \( \kappa_i : M_i \to \prod_{i \in I} M_i \) such that π_i ◦ κ_j is the identity
map on M_i if i = j, and is the zero mapping M_j → M_i if i ≠ j.
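The distinction between the direct sum and the direct product is, of course, only visible when I is infinite. For example, in \( \prod_{n \in \mathbb{N}} \mathbb{Z} \) the element (1, 1, 1, . . .) has all coordinates non-zero, and so lies in the direct product but not in the direct sum \( \sum_{n \in \mathbb{N}} \mathbb{Z} \).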
13.3 Projective Modules
An R-module M is said to be projective if and only if, for every epimorphism
π : B → A, any homomorphism φ : M → A can be “lifted” to a homomorphism ψ : M → B so that π ◦ ψ = φ. This is a universal mapping property
of the module M, depicted by the diagram in Figure 13.5.
Recall that a free module over R is simply a direct sum of copies of the
module RR . We have:
Theorem 318 Every free module is projective.
proof: Let M_R be a free R-module with basis {m_i | i ∈ I}, let π : B → A
be an epimorphism, and let φ : M → A be a homomorphism. Since π is an
Figure 13.5: The universal mapping property of projective modules.
epimorphism, for each index i ∈ I, there is an element b_i ∈ B such that
π(b_i) = φ(m_i). Now for every general element \( m = \sum_{i \in I} m_i r_i \) we define

\[ \psi(m) := \sum_{i \in I} b_i r_i. \]
(In these sums, all but a finite number of the summands are zero.) To
show that ψ is well defined we note that the expression of m as an R-linear
combination of the mi is unique. Clearly ψ is an R-homomorphism from M
into B, and the composition with π yields φ.
Theorem 319 If M is the direct sum of a family of modules {Mi |i ∈ I},
then M is projective if and only if each direct summand Mi is projective.
proof: Assume first that each Mi is projective. Now assume the classic
“test situation” for projectivity of M – that is, we have an epimorphism
π : B → A and an R-homomorphism φ : M → A. Recall that we have
canonical monomorphisms κi : Mi → M with a certain universal property.
Since the Mi are projective, each homomorphism φ ◦ κi : Mi → A can be
lifted to a morphism ψi : Mi → B such that π ◦ ψi = φ ◦ κi . Now by
the universal property of the direct sum, there exists a unique morphism
ψ : M → B such that ψ ◦ κi = ψi for all i ∈ I.
Since
π ◦ ψ ◦ κi = π ◦ ψi = φ ◦ κi ,
for all i ∈ I, we may conclude that π ◦ ψ = φ. Thus M is projective.
Now, conversely, assume M is projective. Let πi : M → Mi be the
canonical projections. Suppose, as in the test situation, that π : B → A is
an epimorphism as above, and let φi : Mi → A be a fixed morphism. We
must show that φi ‘lifts’.
Since M is projective φi ◦ πi : M → A lifts to a morphism ψ : M → B
such that π ◦ ψ = φi ◦ πi . But recall that πi ◦ κi is the identity mapping on
Mi , so π ◦ ψ ◦ κi = φi , and so φi has been lifted to ψ ◦ κi . Therefore Mi is
projective.
Theorem 320 A module M is projective if and only if every short exact
sequence
0 → K → B → M → 0,
splits – that is, M is isomorphic to a direct summand of every module B of
which it is a homomorphic image.
proof: Suppose M is projective and π : B → M is an epimorphism. Then
the identity mapping M → M can be lifted to a morphism κ : M → B such
that π ◦ κ = 1M . So the short exact sequence splits.
Conversely, assume M is a direct summand of every module B of which it
is a homomorphic image. We know from Chapter 7 that M is a homomorphic
image of a free module. So we may take B to be free, hence projective in
view of Theorem 318; so M is a direct summand of a projective module.
Then by the preceding Theorem 319, M is projective.
Corollary 321 A module is projective if and only if it is a direct summand of a free
module.
remark: Not every projective module is itself a free module, however.
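A standard example illustrating the remark: take R = ℤ/6ℤ. By the Chinese Remainder Theorem,

\[ R_R = 2R \oplus 3R, \qquad 2R \cong \mathbb{Z}/3\mathbb{Z}, \quad 3R \cong \mathbb{Z}/2\mathbb{Z}, \]

so ℤ/2ℤ is a direct summand of the free module R_R and hence projective; but it is not free, since every non-zero free ℤ/6ℤ-module has at least six elements.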
One might wish to know which rings R have the property that
all of their R-modules are projective. This is answered in
Theorem 322 A ring R has all its right R-modules projective if and only if
RR is a completely reducible module.
Figure 13.6: The universal mapping property for injective modules.
proof: First suppose all right R-modules are projective. Then for any right
ideal K, the factor group R/K is a projective right R-module which is a
homomorphic image of the R-module R_R. By Theorem 320, K is a direct
summand of R_R as R-modules. Thus R_R is completely reducible.
Conversely, assume R_R is completely reducible. Let M be any right R-module, and assume that π : B → M is an epimorphism. Let K be the kernel of
π (so M ≅ B/K). Now B is completely reducible (as are all R-modules when
R_R is completely reducible). Hence there is a submodule L
of B complementing K, and clearly M is isomorphic to L, a direct summand of
B. Now, by Theorem 320, M is a projective module. The proof is complete.
13.4 Injective Modules
We now consider the notion which is dual to the notion of projectivity just
considered. A module M is said to be injective if and only if it satisfies the
following universal mapping property: Whenever κ : A → B is a monomorphism, any homomorphism φ : A → M can be “extended” to a homomorphism ψ : B → M – that is, ψ ◦ κ = φ. This universal property is depicted
in the diagram in Figure 13.6. Some results about injective modules can
now be obtained simply by “dualizing” – that is, reversing the arrows, and
transposing the words “surjective” and “injective” – in the proofs of results
about projective modules. Thus an immediate consequence of Theorem 319
is:
Theorem 323 If the module M is the direct product of a family of modules
{Mi |i ∈ I}, then M is injective if and only if each Mi is injective.
Similarly, since the “exactness” of a short exact sequence is a self-dual
concept, we have from Theorem 320:
Theorem 324 A module K is injective if and only if every exact sequence
0 → K → B → M → 0,
splits.
This says (in one direction) that an injective module is complemented in
any module which contains it.
13.5 hom(A, B)
13.5.1 hom(A, B) is an R-module.
Let A and B be right R-modules. By our convention, any R-homomorphism
f from A to B is taken to be a left operator on A, so that basic commutativity
between the application of f and the right action of R can be expressed by
an “associative law”:
f (ar) = f (a)r, for all (a, r) ∈ A × R.
We have also noted that the collection of all R-homomorphisms from A
to B, denoted hom(A, B), forms an additive group under “pointwise addition”.
Thus, if f and g are elements of hom(A, B), then f + g is the mapping defined
by the rule

(f + g)(a) = f(a) + g(a), for all a ∈ A.
Clearly f + g is an R-homomorphism from A to B and hom(A, B) becomes
an additive group under this operation.
The point of this brief subsection is that hom(A, B) naturally possesses
the structure of a left R-module. What is sometimes overlooked is that this
can occur in several ways.
The first (standard) construction
Given an R-homomorphism f : A → B, and ring element r, let rf be the
mapping from A to B defined by
(rf )(a) := f (ar), for any a ∈ A.
Now we note that for a ∈ A, and ring elements r and s in R,
(r(sf ))(a) = (sf )(ar) = f (ar · s) = f (a(rs)) = ((rs)f )(a).
In other words,
r(sf ) = (rs)f
as maps in hom(A, B). Finally it is easy to check that r(f + g) = rf + rg
for all f, g ∈ hom(A, B) and r ∈ R. Thus
Lemma 325 Suppose A and B are right R-modules. Let hom(A, B) be the
set of all R-homomorphisms from A to B (viewed as left operators on elements of A). Define addition and left multiplication by R on hom(A, B) by
the rules

(f + g)(a) = f(a) + g(a)
(rf)(a) = f(ar), for all f, g ∈ hom(A, B), (a, r) ∈ A × R.

Then hom(A, B) has the structure of a left R-module.
A second construction
The asserted module structures in the next two lemmas are routine to verify.
Lemma 326 Again, suppose A and B are right R-modules and that B is also
a left S-module (but not necessarily an (S, R)-bimodule). Then homR (A, B)
is a left S-module by the rule that for any s ∈ S, f ∈ homR (A, B), and
element a in A,
(sf )(a) := s · f (a).
One notes that for s, t ∈ S,
((st)f )(a) = (st) · f (a)
= s(tf (a))
= (s(tf ))(a).
Thus
(st)f = s(tf ).
Now the first and second constructions can be combined. Consider
Lemma 327 Let A be a right R-module and let B be an (S, R)-bimodule.
Let S ⊕ R be the direct sum of the rings S and R. Then homR (A, B) has the
structure of a left S ⊕ R-module by the rule:
((s, r)f )(a) := sf (ar),
for all (s, r, a) ∈ S × R × A and f ∈ homR (A, B).
A third construction
Here, if A is a left S-module, we obtain a right S-module structure on
homR(A, B). (Although less frequently encountered, this construction was important enough to appear, albeit in the context of left R-modules, as Theorem 17a, page 66, of Chevalley’s book Fundamental Concepts of Algebra, Academic Press, 1956.)
Lemma 328 Let A be an (S, R)-bimodule and B a right R-module. Then
homR(A, B) attains the structure of a right S-module by the rule that for
f ∈ homR(A, B) and s ∈ S, f s is the mapping A → B which takes each
a ∈ A to f(sa) – that is, (f s)(a) = f(sa).
proof: One only has to verify the module property:
(f s)t = f (st)
for all f ∈ homR (A, B) and s, t ∈ S. This is left as an exercise.
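(For the uncertain reader, the verification runs

\[ ((fs)t)(a) = (fs)(ta) = f(s(ta)) = f((st)a) = (f(st))(a), \]

using the left S-module structure of A; and f s is again an R-homomorphism since (f s)(ar) = f(s(ar)) = f((sa)r) = f(sa)r = ((f s)(a))r, by the bimodule axiom s(ar) = (sa)r.)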
13.5.2 Functorial properties of Hom
Now suppose A, B and D are all right R-modules. For the moment, we think
of A and B ranging over the whole category MR of right R-modules, while D is
a fixed right R-module. Suppose φ : A → B is an R-homomorphism. Then
for each f in hom(B, D) we obtain an element f ◦ φ : A → D of hom(A, D),
taking a ∈ A to f(φ(a)). We check that for all r ∈ R:

(f + g) ◦ φ = f ◦ φ + g ◦ φ,   (13.10)
(rf) ◦ φ = r(f ◦ φ),   (13.11)
which means that the mapping

hom(φ, D) : hom(B, D) → hom(A, D)

sending f ∈ hom(B, D) to f ◦ φ is a morphism of left R-modules.
Moreover the construction preserves compositions in reverse order. Thus
if we have R-homomorphisms φ : A → B and β : B → C, then we obtain
morphisms

hom(φ, D) : hom(B, D) → hom(A, D),
hom(β, D) : hom(C, D) → hom(B, D).
Then we see that

hom(β ◦ φ, D) = hom(φ, D) ◦ hom(β, D).   (13.12)

This can all be stated succinctly in
Lemma 329 Let MR and RM denote the categories of right and left R-modules, respectively. Let D be a right R-module.

1. The mapping hom(−, D), which sends each right R-module A to the left
R-module hom(A, D), is a contravariant functor MR → RM.

2. The mapping hom(D, −), which sends each right R-module A to hom(D, A),
is a covariant functor MR → RM.

The steps verifying the second assertion are entirely similar to the observations above and are left to an interested reader.
13.5.3 Dual modules
In the special case that A is a right R-module and B = R_R, the first construction (presented in Lemma 325) endowed homR(A, R_R) with the structure
of a left R-module, which we denote as A* := homR(A, R_R) and call the dual
module of A.
It is interesting to note that, since R is an (R, R)-bimodule, the second
construction (which made its debut in Lemma 326) also converts homR(A, R)
into a left R-module, which could equally well seem to be a candidate for the
definition of “dual module”. What is the difference between these constructions? In the first case (which we have chosen to christen the “dual module”),
for each f ∈ homR(A, B), r ∈ R and a ∈ A,

(rf) : a ↦ f(ar) = f(a)r,   (13.13)

while in the second construction,

(rf) : a ↦ rf(a).   (13.14)

Now if R is not commutative, the two left modules need not be isomorphic;
while, if R is commutative, either construction gives the same left module.
Functorial properties of the dual construction
Suppose f : A → B is a homomorphism of right R-modules. Then by
Lemma 329, we obtain an induced morphism f* : B* → A* of left R-modules. Taking duals clearly produces a contravariant functor from the
category of right R-modules to the category of left R-modules; but this is
just Lemma 329 with D = R_R.
We shall have much more to say about duals in the category of right
vector spaces over a division ring or field K, where bases, dimension and
matrices are available for discussion.
13.6 Tensor Products.
13.6.1 Elementary Consequences of the Definition.
Let A be a right R-module and let B be any left R-module. Then let Z(A×B)
be the free Z-module with the Cartesian product A × B as a Z-basis.
Now let L(A, B) be the additive subgroup of Z(A × B) generated by all
elements of the following forms:

(a₁ + a₂, b) − (a₁, b) − (a₂, b)
(a, b₁ + b₂) − (a, b₁) − (a, b₂)
(ar, b) − (a, rb),

for all a₁, a₂, a ∈ A and b₁, b₂, b ∈ B and r ∈ R.
The factor group Z(A × B)/L(A, B) is denoted A_R ⊗_R B (or just A ⊗_R B
if the left and right module structure of A and B is understood). The image
of (a, b) under the natural surjective morphism t : Z(A × B) → A ⊗_R B of
groups is denoted a ⊗ b := t(a, b).
Theorem 330 (The Universal Property of the Tensor Product.) Suppose
A_R and _R B are respectively right and left R-modules. Suppose f : A_R × _R B →
C is a mapping into an additive abelian group C which is “R-linear” in the
sense that
f(a₁ + a₂, b) = f(a₁, b) + f(a₂, b)
f(a, b₁ + b₂) = f(a, b₁) + f(a, b₂)
f(ar, b) = f(a, rb),

for all (a, a₁, a₂, b, b₁, b₂, r) ∈ A × A × A × B × B × B × R.
Then the mapping f factors through the natural morphism t : A × B →
Z(A × B)/L(A, B) = A ⊗_R B – i.e., there is a morphism f̄ : A ⊗_R B → C
such that the diagram commutes. That means f = f̄ ◦ t.

proof: Obvious upon observing that the kernel of the ℤ-linear extension of f to Z(A × B) contains L(A, B).
This yields an immediate
Corollary 331 Suppose B, A1 , . . . , An are (R, R)-bimodules. Suppose
f : A1 × · · · × An → B
is an R-linear mapping, that is, for each coordinate place i:
f(a₁, . . . , a_{i−1}, a_i + b_i, a_{i+1}, . . . , a_n)
= f(a₁, . . . , a_n) + f(a₁, . . . , a_{i−1}, b_i, a_{i+1}, . . . , a_n), and   (13.15)
f(a₁, . . . , a_{i−1}, a_i r, a_{i+1}, . . . , a_n)
= f(a₁, . . . , a_i, r a_{i+1}, . . . , a_n),   (13.16)
for all a_i ∈ A_i and r ∈ R. Then the mapping f “factors through the tensor
product” – that is, there is a morphism (of right R-modules, as we shall see)

f̄ : A₁ ⊗ A₂ ⊗ · · · ⊗ Aₙ → B,

such that f = f̄ ◦ ι, where ι : A₁ × · · · × Aₙ → A₁ ⊗ · · · ⊗ Aₙ is an extension of
the canonical mapping; that is, it takes the n-tuple (a₁, . . . , aₙ) to a₁ ⊗ · · · ⊗ aₙ.
Good definitions do not always engender good exercises, for the reason that,
once formulated, their immediate uses are almost obvious. The following is
just such a situation: the proofs of the following lemmata are relegated to
exercises, but the author urges any student with uncertainties to sketch all
the proofs.
The first immediate consequence of the definition is that when each or
either of A and B enjoys a bimodule structure (and they do at least for
the ring of integers) then the tensor product itself assumes this bimodule
structure.
Lemma 332 (Extended Module Structure of A ⊗ B) If A is an (S, R)-bimodule and B is an (R, T)-bimodule, where R, S, and T are rings, then

_S A_R ⊗_R B_T

has the structure of an (S, T)-bimodule in a well-defined way.
[The left and right multiplications are defined by the formula
s(a ⊗ b)t = (sa) ⊗ (bt)
with s, a, b and t lying where their generic name suggests, and since the
“pure” elements a ⊗ b span A ⊗ B, one needs to know that this definition
consistently extends to finite sums of pure elements. Then, verify the bimodule axioms.]
If X and Y are rings, let _X M_Y denote the category of (X, Y)-bimodules
(this notation was introduced in the list of examples in the introduction of
this Chapter).
Lemma 333 (Functorial Properties of the Tensor Product) Let S, R and T
be rings.
1. Suppose f : A → B is an (S, R)-homomorphism of (S, R)-bimodules.
Similarly suppose g : C → D is an (R, T)-homomorphism of (R, T)-bimodules. Then the mapping

{a ⊗ c | (a, c) ∈ A × C} → {f(a) ⊗ g(c) | (a, c) ∈ A × C}

extends uniquely to an (S, T)-homomorphism

f ⊗ g : A ⊗_R C → B ⊗_R D.
2. Now let C be a fixed (R, T)-bimodule. Then the mapping which takes
every (S, R)-bimodule A to the (S, T)-bimodule _S A_R ⊗_R C_T is a covariant functor

− ⊗_R C : _S M_R → _S M_T

taking every morphism f in _S M_R to the morphism f ⊗ 1_C in _S M_T.
3. Similarly, if A is a fixed (S, R)-bimodule, there is a functor

A ⊗_R − : _R M_T → _S M_T,

taking each (R, T)-bimodule C to A ⊗_R C.
Lemma 334 (Associativity of the Tensor Product) There is a unique isomorphism

(A_R ⊗_R B)_S ⊗_S C_T → A_R ⊗_R (_R B_S ⊗_S C_T)

which on the spanning “pure elements” takes (a ⊗ b) ⊗ c to a ⊗ (b ⊗ c).
Note that Lemma 332 provides the requisite bimodule structure for the terms enclosed in parentheses. So this is really an isomorphism of
(Q, T)-bimodules when A is a (Q, R)-bimodule.
Lemma 335 (Distributivity of the Tensor Product over Direct Sums) Suppose A is a fixed (S, R)-bimodule, and that {B_i | i ∈ I} is a family of (R, T)-bimodules. Then, with the usual direct sum notation, there is a canonical
isomorphism

\[ {}_S A_R \otimes_R \Big( \bigoplus_{i \in I} (B_i)_T \Big) \;\cong\; \bigoplus_{i \in I} {}_S (A \otimes_R B_i)_T \]

which obeys the same formula on “pure elements” (that is, one may change
upper case letters in the formula to lower case letters representing generic
elements).
Theorem 336 Let A be any right R-module. Then there is an isomorphism
A ⊗R R ' A as right R-modules. Similarly, if B is a left R-module, then
R ⊗R B ' B as left R modules.
proof: We prove only the right-hand version. First, as R is an (R, R)-bimodule and A is a right R-module, we see that A ⊗_R R is a right R-module.
Now we have a bilinear mapping

α₀ : A × R → A

taking (a, r) ∈ A × R to ar. Since (a, sr) and (as, r) have the same image
under this mapping, the fundamental property of the tensor product reveals
that the mapping α₀ factors through the tensor product, giving us a mapping
α : A ⊗_R R → A taking each “pure” tensor a ⊗ r to ar. Then

α((a ⊗ r)s) = α(a ⊗ (rs)) = a(rs) = (α(a ⊗ r))s.

Thus α is a morphism of right R-modules.
Next define a mapping β : A → A ⊗_R R taking each element a of A to
the pure tensor element a ⊗ 1_R, where as usual 1_R denotes the multiplicative
identity element of R. Then β is easily seen to be a morphism of right R-modules. But next one observes that β is both a left and right inverse of α.
Thus for any elements a ∈ A and r ∈ R, we have

\[ \alpha \circ \beta = 1_A : \quad a \xrightarrow{\ \beta\ } a \otimes 1_R \xrightarrow{\ \alpha\ } a 1_R = a, \tag{13.17} \]
\[ \beta \circ \alpha = 1_{A \otimes_R R} : \quad a \otimes r \xrightarrow{\ \alpha\ } ar \xrightarrow{\ \beta\ } ar \otimes 1_R = a \otimes r. \tag{13.18} \]

Since β is the inverse mapping of α, the latter is an isomorphism of
R-modules.
13.7 Important applications of the tensor product
13.7.1 Exchange of Rings
One of the many useful constructions obtainable from the tensor product is
that of the so-called “exchange of rings”. Suppose M is a right R-module
and S is a ring containing R. We can then regard S as an (R, S)-bimodule.
Then M ⊗R S is a right S-module (see Exercise 1).
Thus starting with a right R-module M we obtain a new module M′ =
M ⊗_R S which is a right module for the larger ring S. Also, for any set of
generators U of the R-module M, the set U′ = U ⊗ 1 is a set of generators of
M ⊗_R S = M′. This is because U′S contains U′R, so ⟨U′S⟩ ⊇ M ⊗_R S = M′.
We give two applications of this construction:

Application 1. Let F ⊆ K be an extension of fields. Let V be an n-dimensional vector space over F with basis B = {v1, . . . , vn}. Then V ⊗F K = V′ is a vector space over K and has the elements v1 ⊗ 1, . . . , vn ⊗ 1 as a basis: thus dimK(V ⊗F K) = dimF V = n. This is no different from the process of enlarging the field of a vector space V by regarding V as the set of all F-linear combinations of the vectors v1, . . . , vn, and then forming the space V′ of all K-linear combinations of the same vectors. This is something the student has probably been doing for years without any formalism such as the tensor product.
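The bookkeeping in Application 1 is easy to see by machine in the concrete case F = R, K = C. Below is a minimal numpy sketch (the library choice and the particular matrix are my own assumptions for illustration): an R-linear map keeps the very same matrix after the scalars are enlarged, so dimK(V ⊗F K) = dimF V.

```python
import numpy as np

# Extension of scalars for F = R inside K = C: a real matrix T represents
# an R-linear map on V = R^n with respect to the basis {v_1, ..., v_n}; the
# same matrix represents the extended C-linear map on V' = V tensor_R C = C^n
# with respect to the basis {v_i tensor 1}.
T = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # rotation by 90 degrees on R^2

v_real = np.array([1.0, 2.0])        # an element of V
v_ext  = np.array([1 + 2j, 3 - 1j])  # an element of V' = C^2

print(T @ v_real)   # the original action over R
print(T @ v_ext)    # the extended action over C: same matrix, larger scalars

# dim_C(V tensor_R C) = dim_R(V): the basis {v_i tensor 1} still has n elements.
print(len(v_ext) == len(v_real))
```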
Application 2. Suppose G is a group and let K be any field. A representation of the group G as a group of linear transformations of a vector space V is a group homomorphism G → GL(V), where GL(V) is the group of units of the ring HomK(V, V). We may then regard V as a right KG-module, where KG is the group ring. Conversely, any right KG-module V affords a representation of G.
Suppose we have on hand a representation of a subgroup H of G over the field K – i.e., we have a right KH-module V. How can we obtain from V a representation of the whole group G?
The answer is simple. We may regard KH as a subring of the group ring KG, and follow the “exchange of rings” recipe, forming in this way the KG-module
V′ = V ⊗KH (KG).
This KG-module V′ is said to afford a representation of G that is induced from the representation of H afforded by V.
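As a small illustration, here is a hypothetical computational sketch of induction for G = S3 and H = A3 with a one-dimensional CH-module. It uses the standard block-matrix description of the induced module relative to coset representatives (stated for left cosets, a harmless change of side from the right-module conventions of the text); all names in the code are my own.

```python
import numpy as np
from itertools import permutations

def compose(p, q):            # (p*q)(i) = p(q(i)): composition of permutations
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):               # group inverse of a permutation tuple
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

G = list(permutations(range(3)))              # the group S3
e, c = (0, 1, 2), (1, 2, 0)                   # identity and the 3-cycle (0 1 2)
w = np.exp(2j * np.pi / 3)
rho = {e: 1 + 0j, c: w, compose(c, c): w**2}  # a 1-dimensional rep of H = A3

reps = [e, (1, 0, 2)]         # coset representatives: G = eH + (0 1)H

def induced(g):
    # Block formula: Ind(rho)(g)[i, j] = rho(t_i^{-1} g t_j) when that element
    # lies in H, and 0 otherwise.
    M = np.zeros((len(reps), len(reps)), dtype=complex)
    for i, ti in enumerate(reps):
        for j, tj in enumerate(reps):
            h = compose(inverse(ti), compose(g, tj))
            if h in rho:
                M[i, j] = rho[h]
    return M

# The homomorphism property Ind(g)Ind(g') = Ind(gg') holds on all of S3:
for g1 in G:
    for g2 in G:
        assert np.allclose(induced(g1) @ induced(g2), induced(compose(g1, g2)))
print("dim V' =", len(reps), "= [G:H] * dim V")
```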
This process maps the category MKH to the category MKG in some fashion, and it preserves direct sums, by the distributivity in Lemma 335 (see footnote 3).
13.7.2 Tensor products of R-algebras.
Definition of R-algebra.
Let A be a right R-module. Then A is called a right R-algebra if and only if there is a mapping
g : A × A → A
such that the following hold:
Footnote 3: It is in fact the functor − ⊗KH KG, and it induces a ring homomorphism G(MKH) → G(MKG) of the corresponding Grothendieck rings — whatever they are!
(A1)
1. For each element x ∈ A the mapping g(x, −) : A → A which
takes each element a ∈ A to g(x, a) is an R-endomorphism of A
as a right R-module.
2. Similarly, for any x ∈ A the mapping g(−, x) taking any element
a of A to g(a, x) is also an R-endomorphism of A.
Remark Note that from this definition alone one cannot say that the
mapping g factors through the tensor product – indeed, unless A is somehow
also a left R-module, the expression A ⊗ A is just meaningless.
Of course there is a left-handed version of this definition, where a left
R-algebra is a left R-module A with a mapping A × A → A which is a
homomorphism of left modules in each coordinate.
However the distinction between left and right R-algebras is not as sharp
as it would seem. To explain this, suppose for the moment that A is a right
R-algebra. We refer to R as the ring of scalars, in order to distinguish the
elements of R from the elements of A. Let A2 be the (right) R-submodule of
A generated by all “products” {g(x, y) | x, y ∈ A}. For any submodule B of A, let Ann(B) be the set of all elements r ∈ R such that Br = 0. This is the right annihilator of B. We have met this before: the right annihilator of
any submodule B of A is always a two-sided ideal of the ring R. We say A
is a faithful R module if and only if Ann(A) = 0. Finally, we say that an
element e of the R-algebra A is a multiplicative identity element if and
only if g(e, x) = g(x, e) = x for all elements x ∈ A.
These definitions all play a role in the following
Theorem 337 Let A be a right R-algebra. Then the following hold:
1. R/Ann(A2 ) is a commutative ring.
2. If A2 = A, then A can be assigned the structure of an (R, R)-bimodule
where left multiplication by a scalar is defined to be right multiplication
by it. Moreover if A is a faithful R-module, then R is a commutative
ring.
proof: Let a, b ∈ A and let α and β be arbitrary scalars from R. Condition
(A1) implies
g(aα, b) = g(a, b)α,   (13.19)
g(a, bβ) = g(a, b)β.   (13.20)
Now, for arbitrary x and y in A, we have
g(x, y)(αβ) = (g(x, y)α)β   (right R-module axiom for A)
            = g(xα, y)β     (by (13.19))
            = g(xα, yβ)     (by (13.20))
            = g(x, yβ)α     (by (13.19))
            = (g(x, y)β)α   (by (13.20))
            = g(x, y)(βα)   (right R-module axiom for A).
Comparison of first and last terms yields
g(x, y)(αβ − βα) = 0,
for all (x, y, α, β) ∈ A × A × R × R. Thus the arbitrary commutator [α, β] =
αβ − βα lies in the right annihilator of A2 . This proves 1.
Part 2. Suppose A2 = A. Then from part 1.,
x(αβ) = x(βα)   (13.21)
for all (x, α, β) ∈ A × R × R. Now let us define left multiplication of a module
element by a scalar by agreeing to write
αa := aα for every α ∈ R and a ∈ A.
It is clear from the definition that left multiplication by a scalar distributes
over module addition.
But, to show that this makes A into an (R, R)-bimodule, we need only
verify these formulae:
(αβ)m = α(βm)
(αm)β = α(mβ).
The student can easily check that both of these follow from the identity
(13.21).
Finally, if A2 = A and A is a faithful R-module, we have Ann(A) = 0. Then part 1 and the first statement of part 2 show that R = R/Ann(A) is commutative, as required.
The general use of the tensor product in defining an algebra.
Suppose A is an (R, R)-bimodule. We have already seen that A⊗R A is also
an (R, R)-bimodule (see Lemma 332). Now suppose f : A⊗R A → A is
an (R, R)-bimodule homomorphism. This defines a sort of “multiplication”
“◦” : A × A → A, defined by
a ◦ b := f (a ⊗ b), for all a, b ∈ A.
Since the mapping is R-bilinear, and since A ⊗R A is an (R, R)-bimodule, the
usual distributive laws hold, and the right and left multiplications from the
ring R associate with the multiplication “◦”. Does this make A into either a
left or right R-algebra?
No it doesn’t. What we need are the identities implied by (A1). They
are these:
xα ◦ y = (x ◦ y)α,   (13.22)
which, in terms of f , is simply
f (xα ⊗ y) = f (x ⊗ y)α,
and of course the identity x ◦ (yα) = (x ◦ y)α, which automatically follows
from the way A ⊗ A is defined to be an (R, R)-bimodule in Lemma 332.
Putting these two together, we see that in order for the (R, R)-homomorphism
f : A ⊗R A → A to define a right R-algebra we must have
x ⊗ (αy − yα) = 0, for all x, y ∈ A and α ∈ R.
We thus have
Lemma 338 Suppose A is an (R, R)-bimodule, and suppose f : A ⊗R A → A is an (R, R)-homomorphism. Let A′ be the additive subgroup generated by all elements of A of the form αx − xα. Then A′ is an (R, R)-submodule of A. Given f : A ⊗ A → A as above, we can define a sort of “right radical” A⊥ := {r ∈ A | A ⊗ r = 0}. Then A⊥ is also an (R, R)-submodule of A.
1. Then “multiplication” defined by f confers upon A the structure of a
right R-algebra if and only if
A ⊗ A′ = 0.   (13.23)
2. Suppose now that A⊥ = 0 and that the condition (13.23) holds so that
f yields the right R-algebra structure. Then A is an (R, R)-bimodule
by the rule
αx = xα, for all (x, α) ∈ A × R.
Of course that means R/Ann(A) is a commutative ring.
proof: Everything except the assertion that A′ and A⊥ are (R, R)-submodules of A has already been demonstrated above. The two assertions themselves are left as exercises.
Please note that, even when the multiplication “◦” can be defined, one does not in general obtain a ring from this process, since (1) the multiplication “◦” may not be associative, and (2) there might not be a multiplicative identity. If A is a ring under this multiplication, we say that A is an associative R-algebra (see footnote 4).
Why R-algebras are often over a commutative ring.
Suppose A is a right R-module. We have remarked that Ann(A) = {r ∈
R|Ar = 0}, the right annihilator of A, is a two-sided ideal of the ring
R. We said that A was a faithful R-module if and only if Ann(A) = 0.
But of course we can then always regard A as a right R̄-module where R̄ =
R/Ann(A). In fact it will always be a faithful R̄-module.
In short, the elements of Ann(A) simply don’t matter in speaking of A as a right R-module or in other constructs obtained from it, such as the
notion of a right R-algebra. This implies the metamathematical fact that
the theory of R-algebras is not constrained in any way if we assume A is a
faithful R-module – we just exchange R for R̄.
In this case, as observed in Theorem 337, if A2 = A, then R is a commutative ring. The condition A2 = A is implied by any single one of a host
of conditions: for example, it is true if A contains either a left identity element or contains a right identity. Now we understand why most of the time
R-algebras are viewed as being over commutative rings.
In the next several chapters our right R-modules A will be right vector
spaces over a division ring D. Then of course, for non-zero vector spaces A we
always have Ann(A) = 0. Then we see that a discussion of right D-algebras
Footnote 4: The reader should be warned that many authors use the term R-algebra to refer to what we have called an “associative R-algebra”.
A for which A2 = A requires that D be commutative – that is, that it be a field.
From this point onward, we specialize our previous notion of right R-algebra to a 2-sided notion: a (two-sided) R-algebra is a symmetric (R, R)-bimodule A — that is, αa = aα for all (a, α) ∈ A × R — which is also a right R-algebra (see footnote 5). Obviously it would be an insult to any student to request the inference that A is a left R-algebra from these hypotheses – it is immediate from the symmetric module assumption. Of course, if such an R-algebra A is a faithful right module, then R is necessarily commutative.
Tensor products of R-algebras.
Here we finally come to the main point of this subsection.
Let A and B be R-algebras as defined in the previous subsection. (Each
is a symmetric (R, R)-bimodule, so in fact, by Lemma 332, the tensor product
A ⊗ B can be formed and is itself a symmetric (R, R)-bimodule.) We denote
the multiplications in each algebra A or B by juxtaposition, as we do for
any ring. We do this particularly in order to create a contrast with a new
operation we are going to introduce on A ⊗ B. It is proposed to define a
multiplication “◦” on A ⊗R B so that
(a ⊗ b) ◦ (a′ ⊗ b′) = aa′ ⊗ bb′,   (13.24)
for all a, a′ ∈ A and b, b′ ∈ B. Now notice that, for an arbitrary element r of the ring R,
ara′ ⊗ bb′ = (ar ⊗ b) ◦ (a′ ⊗ b′)
           = (a ⊗ rb) ◦ (a′ ⊗ b′)
           = aa′ ⊗ rbb′
           = aa′r ⊗ bb′.

Thus the recipe (13.24) is consistent with the identification ar ⊗ b = a ⊗ rb; using the universal property one checks that “◦” is well defined, and it makes A ⊗R B into an R-algebra.

13.7.3 The tensor product of modules as a module for a tensor product of rings
Suppose A and B are right R-modules. Is there a natural way that A ⊗S B can be regarded as a right R-module?
Footnote 5: As usual, the parentheses in the definition indicate that the adjective “two-sided” is intended to apply precisely when it is omitted. What a strange world!
Note that we need some intermediate ring S – possibly Z – just to define
the tensor product. If R were commutative, we could convert both A and B
into (R, R)-bimodules, and use the extended module structure (Lemma 332)
to make A ⊗R B a right R-module.
The construction of this section is a completely different variation which
does not assume that R is commutative. In fact it encompasses two rings R
and S.
Let A be a right R-module, and suppose B is a (K, S)-bimodule, where K is some subring containing 1R = 1S and lying in the center of both R and S. Then we can form A ⊗K B.
Now right multiplication of B by an element s of S is a K-endomorphism
of B which we denote adB (s). Similarly, right multiplication of A by an
element r of R is a K-endomorphism of A which we denote by adA (r). We
then declare the pair (r, s) of the Cartesian product of rings R × S to act on
A ⊗K B as the K-endomorphism
adA (r) ⊗ adB (s),
defined in Lemma 333, Part 1. That is, we have a mapping
R × S → End(A ⊗K B)
that is (1) linear in each argument and (2) for any element t ∈ K, and any
(r, s) ∈ R × S, the endomorphism induced by the pair (rt, s) is the same as
that induced by the pair (r, ts). It follows from the “universal property of the
tensor product” (Theorem 330) that the mapping R × S → End(A ⊗K B) factors through the ring R ⊗K S. Thus A ⊗K B becomes a right R ⊗K S-module
where the action on “pure tensors” by right multiplication is consistently
described by
(a ⊗ b)(r ⊗ s) := ar ⊗ bs,   (13.25)
for all (a, b, r, s) ∈ A × B × R × S. Note that by the previous subsection, R ⊗K S is a K-algebra, and composition of right multiplications is right multiplication by a product. This is easily checked on a class of generators of both the module and the K-algebra as follows:
((a ⊗ b)(r ⊗ s))(r′ ⊗ s′) = (ar ⊗ bs)(r′ ⊗ s′)
= (arr′ ⊗ bss′) = (a ⊗ b)(rr′ ⊗ ss′)
= (a ⊗ b)((r ⊗ s)(r′ ⊗ s′)),
for all (a, b, r, r′, s, s′) ∈ A × B × R × R × S × S.
We have thus shown:
Lemma 339 If A and B are respectively right R- and right S-modules, and
if K is a central subring of both R and S, then A ⊗K B is a right (R ⊗K S)-module by the action given in equation (13.25).
Application. This device also provides a way to get a representation
of the direct product of two groups from representations of each factor.
Suppose G = G1 × G2 , the direct product of two multiplicative groups.
Suppose K is a field, and A and B are right modules for the group rings
KG1 and KG2 , respectively. Then K can be regarded as a subring of the
center of both rings. Now it is an easy exercise to show that the group ring KG = K(G1 × G2) is isomorphic to the tensor product of the respective group algebras – that is, KG ≅ KG1 ⊗K KG2 as K-algebras. Thus, using this identification, Lemma 339 tells us that A ⊗K B is a KG-module. In fact, if A and B are irreducible modules for KG1 and KG2 respectively, then A ⊗K B is an irreducible KG-module.
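On matrices this action is realized by Kronecker products: if ρ1(g1) and ρ2(g2) are the matrices of g1 on A and g2 on B, then (g1, g2) acts on A ⊗K B by the Kronecker product ρ1(g1) ⊗ ρ2(g2). A quick numpy sketch (the matrices below are random placeholders, an assumption of convenience) checks the mixed-product identity that makes this a representation, which is just equation (13.25) on pure tensors:

```python
import numpy as np

# Mixed-product property: kron(A, C) @ kron(B, D) = kron(A @ B, C @ D).
# Reading A, B as matrices of g1, g1' and C, D as matrices of g2, g2', this
# says that (g1, g2) -> kron(rho1(g1), rho2(g2)) is multiplicative.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
C, D = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

lhs = np.kron(A, C) @ np.kron(B, D)
rhs = np.kron(A @ B, C @ D)
print(np.allclose(lhs, rhs))   # True
```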
13.8 Graded Algebras

13.8.1 Morphisms of graded modules
Let (M, +) be a (not necessarily commutative) monoid with identity element
“0M ”. A right R-module A is said to be graded by M if and only if A can
be written as a direct sum of submodules indexed by M – that is
A = ⊕σ∈M Aσ.   (13.26)
The elements of M are called the “degrees” of the graded module.
Now suppose A and B are right R-modules which are graded by the
same monoid (M, +). An R-homomorphism f ∈ homR (A, B) is said to be
of degree γ if and only if, for all degrees σ, f (Aσ ) ⊆ Bσ+γ . Note that the
identity mapping on A is always homogeneous of degree zero – more precisely,
degree 0M .
Lemma 340 Let A and B be right R-modules graded by the same additive
monoid M . Then of course hom(A, B) is a left R-module by the standard
construction whereby (rf )(a) = f (ar), for f ∈ hom(A, B) and (a, r) ∈ A×R.
1. Let Hγ denote the elements of hom(A, B) which are morphisms of degree γ. Then Hγ is a left submodule of hom(A, B).
2. Suppose {fσ} is a finite set of R-homomorphisms from A to B with fσ ∈ Hσ, each of a distinct degree. If M is a cancellation monoid, then
Σ fσ = 0 ∈ hom(A, B) implies each fσ = 0,
and Σσ∈M Hσ is a direct sum of left R-modules.
Proof: Simply from the definitions, Hσ is an additive subgroup of hom(A, B).
If a ∈ Aγ and f ∈ Hσ , then f (a) ∈ Bσ+γ and also f (a)r = f (ar) = (rf )(a)
lies in Bσ+γ . Thus rf also has degree σ. So Hσ is a left submodule of
hom(A, B).
Now let the family {fσ} be as in part two. Choose any element a ∈ Aγ. Then
0 = Σ fσ(a).
Now fσ(a) belongs to Bσ+γ. Since M is a cancellation monoid, if σ ≠ τ, then σ + γ ≠ τ + γ. Thus each summand on the right belongs to a unique homogeneous component Bσ+γ, and so each summand is zero because of the uniqueness of such a sum. Since a was an arbitrary homogeneous element, and since any element of A is a finite sum of such elements, each fσ is the zero mapping.
It follows that any morphism in Σσ∈M Hσ has a unique expression as a sum of elements in pairwise distinct Hσ. Thus the sum is direct.
Warning: Be aware that Σσ∈M Hσ need not be all of hom(A, B).

13.8.2 Subalgebras of graded R-algebras
Now suppose A is an R-algebra where R is a commutative ring, and ra = ar
for all (a, r) ∈ A × R. Suppose also, that as a right R-module, A is graded
by the (not necessarily commutative) additive monoid M so that A has the
direct decomposition depicted in (13.26). Then A is called a graded algebra
(graded by M ) if and only if, for any σ and τ in M ,
Aσ Aτ ⊆ Aσ+τ,   (13.27)
where, as usual, for right submodules X and Y of AR, the juxtaposition
XY is the right submodule of all R-linear combinations of products xy of an
element of X with an element of Y .
If A is a graded algebra, graded by M , the elements of Aσ are said to
be homogeneous of degree σ. An element is said to be homogeneous
if and only if it is homogeneous of some degree in M . Thus the element 0
of the algebra is homogeneous of every possible degree, while the non-zero
elements are homogeneous of a unique degree or are not homogeneous at all.
If σ ≠ τ in M, and s and t are non-zero elements of Aσ and Aτ, respectively, then s + t is not homogeneous. Note that the set of homogeneous elements of a graded algebra is closed under multiplication.
Lemma 341 Suppose A is a graded R-algebra, graded by the monoid (M, +).
1. Then every non-zero element has a unique expression as a finite sum
of homogeneous elements.
2. The set of homogeneous elements is closed under algebra multiplication.
3. If M is a left cancellation monoid, and if the graded R-algebra has a
multiplicative identity element 1, then 1 is a homogeneous element of degree 0M.
Proof: The first two observations are straightforward from the definitions. For the third, express the multiplicative identity of A as a sum of homogeneous elements, say as 1 = Σγ∈J aγ, where J is a finite set of degrees and aγ is homogeneous of degree γ. Now let fγ be the R-homomorphism AR → AR obtained by left multiplication by aγ. Then fγ is a morphism of degree γ. But left multiplication by 1 is the identity mapping 1A of A, which has degree 0M. Thus
0 = (1A − f0M) + Σγ∈J−{0M} fγ.
From Lemma 340, Part 2, each summand on the right side must be zero. So if γ ≠ 0M, left multiplication by aγ is the zero mapping. But in that case, aγ = aγ · 1 = 0A. Thus we see that 1 = a0M is homogeneous of degree zero.
Theorem 342 Let R be a commutative ring, and let A be an R-algebra
graded by the (not necessarily commutative) monoid (M, +) with identity
element 0M. We say that a subalgebra B (that is, a subset of A closed under
addition, algebra multiplication, and multiplications on either side from the
ring R) is a homogeneous subalgebra if and only if
B = ⊕σ∈M (B ∩ Aσ).   (13.28)
The conclusion is that B is a homogeneous subalgebra if and only if, as an
R-algebra, B is generated by homogeneous elements.
A Sketch of the Proof: For each degree σ ∈ M , let Bσ := B ∩ Aσ ,
the homogeneous elements of degree σ which lie in B. Clearly, if B is a
homogeneous subalgebra, it is generated as an R-module by its homogeneous elements – and so is generated by them as a subalgebra.
Now suppose S is a set of homogeneous elements of A and let B be
the algebra that they generate. Then every monomial expression in S is a
homogeneous element of A lying in B, and so is any R-multiple of it. Now the collection of all R-linear combinations of such monomials is a subalgebra of A contained in B and containing S, and so, as a set, coincides with B. Thus each
element of B is a finite sum of homogeneous elements of B – which means
that B is a sum of the Bσ . The sum is direct from the criterion of uniqueness
of expression.
13.8.3 Some examples of graded R-algebras
Not too many non-associative examples come to mind. I can tell you that a
classical simple Lie algebra L over the complex numbers always has a direct
decomposition into subspaces Lr which are indexed by a collection of vectors
in real Euclidean space called the “root system”. These roots generate an
integral lattice M which is a discrete additive subgroup of n-dimensional
Euclidean space. The Lie algebra can then be seen to be an algebra graded
by M since always Lσ Lτ ⊆ Lσ+τ . Here, if σ ∈ M , but is not a root, then
Lσ = 0.
The degree zero elements, as usual, form a subalgebra. In this example
it is called “the Cartan subalgebra”.
Here are some associative examples which may be more familiar.
1. Let G be any group (infinite or not, commutative or not), and let A be
the group ring KG where K is any field. Then clearly A is an algebra
graded by the group G.
2. Again, let K be any field and consider the polynomial ring, K[X], where
X is a (possibly infinite) collection of commuting indeterminates.
3. More generally, let M be any monoid, and form the monoid-ring KM ,
of all K-linear combinations of elements of the monoid M . Multiplication takes place using the distributive laws to reduce the product to a
sum of products of elements of M . Clearly the group-ring is an example
with M = G. But also the polynomial ring is an example with M the
free commutative monoid generated by X. Here M is the monoid of
multisets on X. Finally, one can take M to be the free monoid generated by the alphabet X. In that case we obtain a tensor-algebra, which
will be introduced in Chapter 14.
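A tiny sketch of example 3 in code (plain Python; the data layout is my own bookkeeping): elements of K[M], for M the free commutative monoid on {x, y}, are stored as finitely supported coefficient dictionaries keyed by exponent vectors, and the product visibly sends a piece of degree σ times a piece of degree τ into degree σ + τ, as required by (13.27).

```python
from collections import defaultdict

def multiply(f, g):
    # Product in the monoid ring: coefficients multiply, degrees add in M.
    h = defaultdict(float)
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            m = (m1[0] + m2[0], m1[1] + m2[1])   # the monoid operation of M
            h[m] += c1 * c2
    return dict(h)

# f = 2xy and g = x^2 + 3y, with exponent vector (a, b) standing for x^a y^b.
f = {(1, 1): 2.0}
g = {(2, 0): 1.0, (0, 1): 3.0}
print(multiply(f, g))   # {(3, 1): 2.0, (1, 2): 6.0}: degrees (1,1)+(2,0) and (1,1)+(0,1)
```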
13.8.4 Ideals and homomorphisms in associative graded R-algebras
As usual R is a commutative ring. In this section A will be an associative
R-algebra with a multiplicative identity element 1. We wish to consider
homomorphisms and ideals of these algebras which relate well to the structure
of the grading.
Actually, the associativity is somewhat unnecessary, and the author apologizes for this. The only reason that we impose associativity is that up to this
point, we have not discussed homomorphisms and ideals of non-associative
algebras.
We can now state the ideal analogue of Theorem 342 of the previous
subsection.
Theorem 343 An ideal J of the graded algebra A is homogeneous if and
only if it is generated (as an ideal of A) by a set of homogeneous elements.
Sketch of proof: First, if J is homogeneous, then it is a direct sum of
its R-submodules Jσ = J ∩ Aσ and so is certainly generated as an ideal by
these submodules.
Conversely, suppose S is a collection of homogeneous elements. It is easy
to see that the two-sided ideal J generated by S consists of all R-linear
combinations of monomials of the form a1s1a2s2 · · · akskak+1, where the si are in S, the ai are arbitrary elements of A, and k is any positive integer (so at least one si appears as a factor). Now each ai is itself a finite sum of homogeneous
elements of A and expanding the products out one obtains a finite sum of
such monomial expressions, but this time with the ai homogeneous elements
of A. Since homogeneous elements are closed under multiplication, these new
monomials all belong to Jσ for various degrees σ. We have thus shown that J = Σσ Jσ, and we are done.
Theorem 344 Suppose L is a homogeneous two-sided ideal of the graded
associative algebra A, graded by M . Let π : A → Ā := A/L be the natural
ring homomorphism. For each degree σ let Āσ := π(Aσ ). Then
Ā = ⊕M Āσ .
Moreover,
Āσ Āτ ⊆ Āσ+τ .
Thus Ā is an algebra graded by M .
Proof: Obviously A = ⊕Aσ implies Ā = Σσ Āσ.
Suppose J were a finite subset of M, and that a sum Σσ∈J aσ represented
an element ` of the ideal L. Then as L is homogeneous, ` has a unique expression as a sum of homogeneous elements of L. But it also has a unique
expression as a sum of homogeneous elements of A and we have just been
presented with this sum. Since any homogeneous element of L is an homogeneous element of A, the expressions coincide, term by term. Thus each aσ
already lies in Lσ . This shows that Ā is a direct sum of the Āσ .
Now it is clear that if a and b are homogeneous elements of degrees σ and
τ , respectively, then ā := a + L and b̄ := b + L are typical elements of Āσ
and Āτ, respectively. Clearly (a + L)(b + L) ⊆ ab + L ∈ Āσ+τ. Thus Ā is a graded algebra.
Remark: Obviously the set deg(L) of degrees of the non-zero elements of
the ideal L generates some submonoid ML of M . For those degrees σ which
are not in ML, one has an isomorphism Āσ ≅ Aσ. Always, we have
Āσ ≅ Aσ/(L ∩ Aσ) = Aσ/Lσ.
13.9 Exercises for Chapter 13
Exercise 1. Let R and S be rings, and let ρ and σ be antiautomorphisms
of R and S, respectively. Given an (R, S)-bimodule M , define new multiplications
S × M → M and M × R → M
by the rules that
sm = m(sσ ) and mr = (rρ )m,
for all (s, m, r) ∈ S × M × R.
1. Show that with respect to these new multiplications, the additive group
(M, +) is endowed with the structure of an (S, R)-bimodule, which we
denote as σ M ρ .
2. Show that if Z is the ring of integers, any abelian additive group (M, +)
is a (Z, Z)-bimodule where the action of an integer n on module element
m (from the right or left) is to add m or −m to itself |n| times, according
as n is positive or negative, and to set 0m = m0 = 0. With this
interpretation, any left R-module is an (R, Z)-bimodule, and any right
S-module is already a (Z, S)-bimodule.
3. By applying part one with one of (R, ρ) or (S, σ) set equal to (Z, 1Z), the
ring of integers with the identity antiautomorphism, show that any left
R-module M can be converted into a right R-module M ρ . Similarly,
any right S-module N can be converted into a left S-module, σ N , by
the recipe of part 1.
4. Suppose N is a monoid with anti-automorphism µ. Let K be a field
and let KN denote the monoid-ring – that is, a K-vector space with
the elements of N as a basis, and all multiplications of K-linear combinations of these basis elements determined (via the distributive laws)
by the multiplication table of N . Define µ̂ : KN → KN by applying µ
to each basis element of N in each K-linear combination, while leaving
the scalars from K unaffected. Show that µ̂ is an anti-automorphism
of KN .
5. Suppose now N = G, a group. Show that every anti-automorphism σ
of G has the form g → (g −1 )α , where α is a fixed automorphism of G.
Exercise 2. Let A be any R-algebra, where R is a commutative ring.
1. Show that the commutator mapping A × A → A, which takes (x, y)
to xy − yx is a bilinear mapping.
2. Show that the mapping α : A × A × A → A, called the associator,
which takes (x, y, z) to the element (xy)z − x(yz) of A is a trilinear
form – that is, a multilinear form in three variables.
13.10 Appendix to Chapter 13: Important R-algebras
These notes have had to hold to a certain mainstream, and one of the mainstream themes has been rings. As a consequence, there is not space (see footnote 6) to do
justice to a fair treatment of non-associative algebras. At least, before recommending some references, we should point out two important reasons for
considering algebras that may not be associative.
• They are absolutely vital to certain branches of mathematics.
• Many of the theorems regarding semisimplicity etc., gain an extra
meaningful life in the context of alternative algebras.
Lie algebras
Let R be a commutative ring. A Lie algebra over R is an R-algebra L
(whose products x ◦ y are traditionally rendered as [x, y]) satisfying these
identities:
(Anticommutativity)  [x, x] = 0, which implies [x, y] = −[y, x];
(The Jacobi Identity)  0 = [[x, y], z] + [[y, z], x] + [[z, x], y],   (13.29)
for all elements x, y, z ∈ L. Of course, as for any R-algebra, the algebra
product “[ , ]” is R-bilinear – in particular, the left and right distributive
laws hold.
Footnote 6: either on these pages or in the brains of the author.
Lie algebras arise naturally when one is dealing with so called “derivation
mappings”. Let A be an R-algebra. An (R, R)-bimodule morphism
δ:A→A
is called a derivation if and only if
δ(ab) = δ(a)b + aδ(b),
for all a, b ∈ A. In this context one can add and compose mappings, so, given
derivations, δ1 and δ2 , one can define the “Lie product”
δ3 = [δ1 , δ2 ] := δ2 ◦ δ1 − δ1 ◦ δ2 .
One may check that δ3 is again a derivation (four of eight terms cancel).
Since the set of derivations itself forms an (R, R)-bimodule, we see that it
forms a Lie algebra over R, with respect to the Lie product of mappings just
defined.
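A quick symbolic check of this fact, assuming sympy is available (the two derivations chosen here are merely illustrative):

```python
import sympy as sp

x = sp.symbols('x')
d1 = lambda f: sp.diff(f, x)            # the derivation d/dx of Q[x]
d2 = lambda f: x * sp.diff(f, x)        # the derivation x * d/dx
d3 = lambda f: d2(d1(f)) - d1(d2(f))    # their Lie product, with the text's signs

f, g = x**3 + 1, x**2 - 2*x
# The Leibniz rule holds for d3 on a product:
assert sp.expand(d3(f * g)) == sp.expand(d3(f) * g + f * d3(g))
print(sp.simplify(d3(f)))   # here [d1, d2] = -d/dx, visibly a derivation again
```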
Historically, Lie algebras were first introduced in the study of Lie groups,
which may be roughly described as topological spaces whose elements form
a group whose left multiplications are “continuous in every direction” (analytic) and preserve a local Euclidean-space structure over the real or complex number systems at each point (the tangent spaces). Group elements are
thus analytic functions whose derivatives at the identity induce a collection
of transformations of the tangent space at the identity which is closed under
the vector-space operations and the Lie product of transformations.
Since the time of Sophus Lie, the useful and sometimes deep applications
of Lie algebras have spread all over mathematics. In algebra, one finds Lie
GF (p)-algebras arising in the study of p-groups, where the group commutator is employed to define a Lie ring on the direct sum of the sections of a
central series. One finds Lie Z-algebras at the heart of the geometric objects
called “Buildings” which provide the geometries for the classical groups, and
all finite simple nonabelian groups except the alternating groups and the 26
sporadic simple groups. In addition, buildings provide a purely synthetic
framework for the simple complex Lie groups. This is important since otherwise there are only exotic constructions (depending on triality, trilinear and quadrilinear forms and whatnot) for the exceptional Lie groups, and no format explaining why these are the only types. Yet this emerges from the Lie
Z-algebra “spherical building” point of view. In geometry the effect has been
enormous: the “Building” concept introduced a central theory for classifying
reasonable incidence geometries – of the sort formed from the coset geometries of Lie Groups. These connections were at the heart of Tits’s wonderful
discovery of Buildings, and the connections with point-line geometries were
first explored by Cooperstein. Finally, there is an enormous connection between Lie Algebras and Algebraic groups. I think it is fair to say that the
subject of Lie Algebras forms a vital cross-roads of mainstream mathematics.
Non-associative algebras in projective geometry
A projective plane is an ordered pair of sets (P, L) and a relation I ⊆ P ×L
(called “incidence”) such that
1. Any two distinct points are together incident with just one line.
2. Any two distinct lines are together incident with exactly one point.
3. At least two lines have at least three points, or, equivalently (in view
of the first two axioms), at least two points are incident with at least
three lines.
Here is a classical example: Suppose V is a three-dimensional vector space
over a division ring K. Let P and L be the respective collections of 1- and
2-dimensional subspaces of V . The incidence relation is containment. Then
the axioms of a projective plane are satisfied. However, there are tons of
projective planes which are not classical.
The historical route to the study of projective planes has been the creation of an algebraic object which depends on the plane and the choice of
three points in the plane. It is called a coordinatizing ring and is a set T
with a rather cryptic ternary operation about which not a lot can be said.
But with the supposition of automorphisms, more can be said. If there are
enough elations, centered over a point, the ternary operation crystallizes as a
formula in two operations: as ab+c. If every point of some line is a non-trivial
elation center, (T, +) is an additive group, and the plane is called a translation plane. As more automorphisms called homologies are introduced, one
obtains several Z-algebras which are not associative – for example, alternative algebras, Moufang rings, and in particular, the algebra of octonions. In
the classical case, the coordinatizing algebra recreates the underlying division
ring K of the defining vector space.
These arguments have a venerable history going back to von Staudt,
Hilbert and the book of Veblen and Young.
One of the special coordinatizing rings has a special hold on a life of its own. I refer to the algebra of octonions. But to describe this algebra, one must first recall the algebra of quaternions. Sir William Rowan Hamilton, it is said, suddenly experienced a “Hadamard event” (see footnote 7) in which the simple
structure-constants of this division-ring were revealed to him while strolling
across a foot-bridge. He whipped out a pocket knife and carved
i · j = k, j · k = i, k · i = j, x · y = −y · x,
whenever x and y are distinct members of {i, j, k}, and
i2 = j 2 = k 2 = −1,
perhaps not in those precise words. There is also a multiplicative identity 1,
independent of i, j, and k, and the division ring H = 1·R⊕iR⊕jR⊕kR, has
dimension four over the real field R. Of course he was thinking about what this meant for optics, and that insight can only add to the excitement of his Hadamard Experience. But the reader who has read this chapter must see that each product of two basis elements is plus or minus some basis element, so that the basic relations carved on Hamilton’s bridge are those of a Z-form – that is, a Z-algebra – and so one can define a “quaternion algebra” over any field K in place of R. Of course, for most choices of K, it is no longer a division algebra and has most probably lost its relevance to optics.
Now here is what bothers me. Later, Cayley discovered the octonions O, an eight-dimensional algebra over the real field which is two-dimensional over the quaternions – i.e., O = H ⊕ Hℓ. This turns out to be a non-associative algebra with no zero divisors. The basis over the real field is
1, i, j, k = ij, ℓ, iℓ, jℓ, kℓ.
The product of two of these basis elements is always plus or minus a third. So once again, we obtain a Z-form. With eight basis elements there were only 64 products to remember. Perhaps Cayley did this work at home with pen and paper, so there was no need to indelibly carve these facts into a wooden
Footnote 7: This new word refers to the signal experiences of mathematical discovery surveyed in The Psychology of Invention in the Mathematical Field by Jacques Hadamard (Dover, 1956).
bridge. Hamilton, with only 16 products to remember – well, actually only
nine, since one already knows how the multiplicative identity multiplies –
well, actually only three, once one knows these squares are minus one and
the other products are anticommutative — still felt compelled to embed these multiplications into the surface of the bridge in case he should forget all three of them before he should return home. I suppose the idea was that should the worst happen, and he couldn’t remember anything when he got home, there was the comforting thought that these products could be retrieved from the bridge. In other words, the bridge was a convenient mnemonic device to record just three products!
I am having a lot of trouble believing this folk tale concerns an entirely sober person.
But the tale fades when one realizes that (for those of us who do not care about “Optiks”) the octonions play a more important role in our universe than the quaternions – a role depending only on their basic structure constants. From these constants alone one learns that the octonions are an example of an algebra with a quadratic form Q : O → K which preserves multiplication – i.e., Q(ab) = Q(a)Q(b). Because of this, one can show a very remarkable fact – called “triality”.
If one has a 2n-dimensional vector space which admits a non-degenerate quadratic form (see Chapter 17) with n-dimensional totally singular subspaces, then these maximal totally singular subspaces fall into two classes: subspaces A and B belong to different classes if and only if dim(A/(A ∩ B)) is odd. Let us call these two classes Mi, i = 1, 2. Elements from different classes are said to be “incident” if and only if they meet at an (n − 1)-space. Triality says that if n = 4, the incidence structure (P, M1, M2) is isomorphic to the structure (M1, P, M2). One could prove this over finite fields by observing that the right hand side (when truncated to the first two arguments) obeys the axioms of a polar space, and then observing that these are classified and uniquely determined by the finite parameters. Such a proof can convince a student that this is a real phenomenon (one sees it used that way as an aside in Cameron’s wonderful teaching notes [??]). But if one desires a proof good for all fields, one needs the Z-algebra of the octonions.
One should also mention that Moufang projective planes (those with lots of elations) can be coordinatized by the octonion K-algebras. Even
when K is the real numbers R, such planes arise from compact forms of
the exceptional Lie group or algebra of type E6 . I suppose every important
mathematical construction has a role to play, and the Octonions enjoy a
special role in the basic nature of things.
13.10.1 Other applications
Often one gains a lot by creating an algebra which is functorially related (a
better explication of our older word “canonically related”) to a non-algebra
problem. The aim is a very basic one: embedd your geometry or combinatorial
situation into a better understood one where there is much more to discuss.
Suppose I can embedd an abstractly defined geometry into an algebra of
matrices. Suddenly one encounters matrices, a vector space, eigen-subspaces
– lots of interrelated extra structure.
Example 13.1. Suppose Γ = (V, E) is a simple undirected graph. Then
there is a symmetric “adjacency matrix” A of 0’s and 1’s whose rows and
columns are indexed by the vertices, and whose (i, j)-th entry aij is 1 if and
only if the i-th vertex is (distinct from) and adjacent to the j-th vertex. There
is a natural place for A to act. Suppose W is the free vector space over field K
with basis V , the vertices of the given graph. Then right multiplication by A
maps each basis vector v to the sum of the basis vectors which are adjacent to
it in the graph – that is, it records a certain linear transformation. The beauty
of it is that so far, one has complete freedom about the choice of the ground
field K. For example, if the graph Γ is finite (has finitely many vertices),
then the vector space W has finite dimension |V | and the transformation
TΓ : W → W which is represented by right multiplication by the matrix A,
must satisfy some minimal polynomial p(x), and K can be chosen to be the
splitting field for this polynomial. That means (assuming a semisimplicity
which will be seen to follow from the fact that A is a symmetric matrix),
that W decomposes into a direct sum of eigen-spaces for the matrix A.
Regularity constraints on the graph Γ can force A to satisfy a polynomial
of low degree.
Example. Suppose Γ is a graph with the following properties:
1. There are v vertices, each adjacent to exactly k other vertices.
2. For any two distinct adjacent vertices x and y, there are λ vertices z
such that {x, y, z} is a 3-clique – i.e., a “triangle”.
3. For any two non-adjacent vertices x and y, there are exactly µ vertices
adjacent to both x and y.
A graph with the above properties is called a strongly regular graph (SRG) with parameters (v, k, λ, µ). But there are some relations. For every vertex x, let x⊥ be x together with all vertices adjacent to x. Now we see that |x⊥ ∩ y⊥| = k + 1, λ + 2, or µ, according as x = y, x is adjacent to y, or y ∉ x⊥, respectively. It easily follows that A² is a linear combination of A, I and J – where I is the identity matrix and J is the square matrix of the same size |V| with every entry one.
It follows that A satisfies a cubic equation. More than that, the valence k is an eigenvalue of multiplicity exactly one, so that for the two remaining eigenvalues θ1, θ2, occurring with multiplicities (eigenspace-dimensions) m1 and m2, we have
m1 + m2 = v − 1
θ1m1 + θ2m2 + k = 0.
The latter simply follows from the fact that the trace of the matrix A is zero.
The fact that the numbers mi must be integers puts a considerable constraint
on possible parameter sets.
For example, if Γ is a so-called Moore graph – that is, an SRG with λ = 0 and µ = 1 – then Γ has 1 + k² vertices and k = 1, 2, 3, 7 or 57, a fact that could probably not be deduced by graph-theoretic arguments alone.
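Here is a sketch of that integrality computation in plain Python (the reduction to the two displayed equations above is the only input): for a Moore graph, A² + A − (k − 1)I vanishes off the all-ones eigenvector, so θ = (−1 ± √(4k − 3))/2, and one solves for m1, m2 and demands integrality.

```python
import math

# Feasible valences k for Moore graphs (SRGs with lambda = 0, mu = 1, v = k^2 + 1).
# From m1 + m2 = k^2 and theta1*m1 + theta2*m2 = -k, with s = sqrt(4k - 3):
#   m1 - m2 = (k^2 - 2k)/s, so s must divide k^2 - 2k and m1 must be an integer;
# if s is irrational we need m1 = m2, forcing k^2 - 2k = 0, i.e. k = 2 (pentagon).
feasible = []
for k in range(1, 1000):
    disc = 4 * k - 3
    s = math.isqrt(disc)
    if s * s == disc:
        if (k * k - 2 * k) % s == 0 and (k * k + (k * k - 2 * k) // s) % 2 == 0:
            feasible.append(k)
    elif k * k - 2 * k == 0:
        feasible.append(k)
print(feasible)   # [1, 2, 3, 7, 57]
```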
Similarly, but more generally, the representation of the commutative algebra C[A] conveys further delicate information about the graph.
All this can be generalized to association schemes and coherent configurations. Suppose Γ = (V, E) is a directed complete graph, so that E = V × V − D, where D := {(v, v) | v ∈ V} is the diagonal set. For each subset X ⊆ V × V, let X∗ = {(y, x) | (x, y) ∈ X}. Suppose there is a partition
V × V = D + E1 + E2 + · · · + Er−1
of the ordered pairs, such that the following hold:
1. D = {(x, x)|x ∈ V }, the “diagonal elements”.
2. For each i = 1, . . . , r−1, Ei∗ = Ei or else Ei∗ = Ej for a unique subscript
j distinct from i. (In the latter case Ej∗ = Ei and the two sets of edges
are said to be paired. In the former case Ei is said to be self-paired.)
3. Each graph (V, Ei) is regular (that means constant valence, or constant out- or in-valence), but more importantly, given a triple of indices (i, j, k), whenever (x, y) ∈ Ei, then the number of vertices z such that (x, z) ∈ Ej and (y, z) ∈ Ek is a constant a_i^{j,k} which does not depend upon the particular representative pair (x, y) chosen from Ei.
Such an object, basically a decomposition of the complete graph into
some nice subgraphs, is called a coherent configuration (Don Higman),
and if each Ei is self-paired, it is given an older name which goes back to Bose
(1956): an r-class association scheme. An SRG is just a special case of
a three-class association scheme, where E1 is the set of edges of the strongly
regular graph, and E2 is the set of all ordered pairs of distinct non-adjacent
vertices – so that (V, E2 ) is the complementary graph.
For coherent configurations, there is a complete analogue of the algebra K[A] for SRG’s. This is the Hecke algebra or (to use its original name in this context) the Bose-Mesner algebra [?][?].
Let Ai be the adjacency matrix whose (j, k)-th entry is 1 if and only if the ordered pair formed by the j-th and k-th vertices belongs to Ei. Thus Ai
is symmetric if Ei is self-paired, and the transpose of Ai is the adjacency
matrix for Ei∗ in any case. (For example, in the case of a connected SRG,
A2 = J − I − A1 , and J and I (and hence A2 ) lie in K[A1 ].)
The Hecke algebra is simply the algebra generated by the adjacency matrices, that is, K[I, A1, A2, . . . , Ar−1]. The conditions about the constants a_i^{j,k} force the Ai to satisfy polynomial relations, and the whole machinery of matrix algebra is now available to exploit these relations.
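As a sanity check of this machinery in the SRG case, the following numpy sketch verifies for the Petersen graph — built here, as a matter of convenience, as the Kneser graph on the 2-subsets of a 5-set — that A1² already lies in the span of I, A1 and A2, so K[I, A1, A2] is closed under multiplication:

```python
import numpy as np
from itertools import combinations

# The Petersen graph is an SRG(10, 3, 0, 1): vertices are 2-subsets of
# {0,...,4}, adjacent exactly when disjoint.
V = list(combinations(range(5), 2))
n = len(V)
A1 = np.array([[1 if not set(a) & set(b) else 0 for b in V] for a in V])
I, J = np.eye(n, dtype=int), np.ones((n, n), dtype=int)
A2 = J - I - A1

# SRG relation: A1^2 = k*I + lambda*A1 + mu*A2 with (k, lambda, mu) = (3, 0, 1).
print(np.array_equal(A1 @ A1, 3 * I + 0 * A1 + 1 * A2))   # True
```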
These days, there are several good books on this topic. The book of P.-H. Zieschang [?] is the most “algebraic” and most general of the books. The interplay with orthogonal polynomials and geometric hypotheses is a little more manifest in the two books of Bannai-Ito and Brouwer-Cohen-Neumaier. The latter book is even more special, for it concerns distance regular graphs; these are association schemes with a special subgraph (V, E1) controlling the other Ei in the sense that (x, y) ∈ Ei if and only if d(x, y) = i, with distance measured by shortest paths in (V, E1). Being special in this way is not a drawback, for even tighter relations hold (since most of the a_i^{j,k} are zero) and the relation to groups and buildings is significantly close enough to expect classification theorems. This book is a goldmine of special constructions and sporadic examples (see footnote 8).
Footnote 8: If I am sentenced to prison for writing up these class notes, I will make sure that Brouwer-Cohen-Neumaier is smuggled in with me.
Hecke algebras also make their appearance in chamber systems, where
they can be related to deformations and quantum groups – well beyond the
scope of this book. It is enough to know that applications exist.
Example 13.2. We have seen from the appendices to Chapter 1 that one can associate with every locally finite poset (P, ≤), and field K, an incidence
algebra A(P ) whose elements are functions f : I(P ) → K where I(P ) is the
collection of all intervals of P — that is, pairs (a, b) ∈ P × P with a ≤ b.
Example 13.3. Suppose you were a number theorist with a combinatorial bent. Suppose that you wanted to study a collection of numbers {An,k} parameterized by non-negative integers with the property that An,n = 1 and that An,k = 0 if k > n. Then the matrix M = (An,k) is a lower triangular matrix
whose rows and columns are indexed by the non-negative integers. Such a
matrix is invertible (why?). Let us call the entries An,k of M numbers of
the second kind, and then define the entries of M −1 = (dn,k ) to be the
numbers of the first kind. Suppose there is a recursion relation R which
relates a number An,k of the second kind as an integer linear combination of
Ai,j where i + j ≤ n + k (and j > k in case of equality), where the coefficients entering the linear combination are independent of n and k. (Of course we must make accommodations for such a recursion relation for very small n and k, by
declaring certain marginal values of An,k to be zero when one of the indices
is negative.) My question is whether it is then true that the numbers dn,k of the first kind must also obey some recursion relation R^t determined by R.
Consider these examples: An,k = C(n, k), the binomial coefficient, or An,k = Sn,k, the Stirling numbers of the second kind. In both cases there is a well-known recursion relation R expressing An,k in terms of two earlier terms. In the first case the entries of M −1 are the numbers (−1)^(n−k) C(n, k); in the second case they are the Stirling numbers sn,k of the first kind. In each of these cases there is a well-known recursion relation R^t satisfied by the numbers of the first kind – the
entries of M −1 . The recursion relations on the An,k can be represented as an
equation among matrices like M with some of the rows or columns shifted.
In attempting to convert such a relation among matrices exhibiting row-column shifts of the inverse of M, one is confronted with the need to take
the inverse of matrices with a row or column of zeroes – certainly the sort
of frustrating experience one dreams about. Now one inherits those zeroes
because one used matrices indexed by the positive integers, and positive shifts
in rows and columns pull in columns or rows of zeroes because we thought
that was how An,k should be defined on negative indices. But we could have
worked in the algebra of matrices whose rows and columns are indexed by
all integers Z. The recurrence relation R can sometimes be used to produce
lower triangular Z × Z matrices M̂ extending the original M while still
satisfying the recurrence relation R in its entries. Then the shift operators
can be expressed by left or right multiplication by positive or negative powers
of the matrix X = (xi,j ) where xi,j is 1 if and only if j = i − 1 and is zero
otherwise. The relation would look like
M = P (X)M R(X),
where P and R are polynomials in X and X −1 – i.e., Laurent polynomials.
That means one is working in the following algebra (say, over Q):
M is the collection of all matrices with rows and columns indexed by the integers Z, with the property that with each matrix M = (mi,j) there is an integer g(M) such that mi,j = 0 if j > i + g(M). (This means that to the north-east of some super-diagonal a finite distance away from the main diagonal, all entries are zero.)
Notice that the calculation of any entry in the product of two of these
matrices is a finite sum, so multiplication is well-defined. Let Mk := {M ∈
M | g(M) ≤ k}. Then we have
Mk Mℓ ⊆ Mk+ℓ.
Of course this equation resembles what goes on in graded algebras. But
there is one big difference: there is no direct sum into subspaces of homogeneous objects. There is a name for this more general notion. Let P = (P, ≤)
be any complete upper semilattice. We say that an R-algebra A is filtered
by poset P if and only if there is a “grading function” g : A → P such that
if x, y ∈ A, then g(xy) ≤ g(x) ∨ g(y).
Thus if As := {a ∈ A|g(a) ≤ s ∈ P }, then
As At ⊆ As∨t .
The student might be curious about two things:
1. It is well-known that the units of M are precisely the matrices which at
their highest non-zero subdiagonal have only invertible elements. Can
this be generalized to all filtered algebras?
2. Surely the student can make precise the apparent relation between
algebras filtered by a poset P and incidence algebras.
Sources about coordinatizing rings
Two very good sources are these:
1. An Introduction to Nonassociative Algebras by R. D. Schafer.
2. Introduction to Lie Algebras and Representation Theory by J. Humphreys.
but one might find edifying reading in
1. Chapter 16 of the late Marshall Hall’s book Theory of Groups
2. Veblen and Young: Projective Geometry
Chapter 14
Categorical Features of Vector Spaces
14.1 Left and Right Vector Space Dimensions
Throughout the first few sections we fix a possibly non-commutative division
ring D. A right (left) vector space over D is simply a right (left) D-module. If V is a right vector space, right linear dependence of a vector v ∈ V on a set X of vectors in V means v is a finite right linear combination of vectors in X, v = Σx∈X xαx (all but a finite number of the αx being zero), or equivalently, v is in the submodule ⟨XD⟩ of V generated by X.
As observed, this notion satisfies the three axioms of a dependence theory, and so V possesses a (right) basis B, which is a maximal independent spanning set, and any two bases possess the same invariant cardinality called
the (right) dimension of V .
Obviously the same notions hold for a left vector space over D, using
left linear dependence, and so these spaces possess a left dimension (or
left D-dimension, if D needs to be emphasized). We shall need to deal with
both right and left vector spaces and even (D1 , D2 )-bimodules, in order to
attribute vector space structure to objects like Hom(V, W ), V ⊗D W etc.
Now if D is a division ring, a (D, D)-bimodule V is both a left vector
space over D and at the same time a right vector space over D. Thus V
possesses both a left D-dimension and a right D-dimension. Need these two
dimensions be the same? The answer is in fact “no”, despite the fact that
there is the identity (αv)β = α(vβ), for all (α, v, β) ∈ D × V × D connecting
right and left multiplications by elements of D.
There are ways, of course, to convert a right D-module V into a left D-module. This is possible if there is an antiautomorphism σ : D → D.
This means σ is an automorphism of the additive group (D, +) which “reverses” multiplication in the sense that σ(α1 α2 ) = σ(α2 )σ(α1 ) for all α1 , α2 ∈
D. Note that if σ is an antiautomorphism, then σ 2 is an automorphism. If
σ 2 = 1D, the identity automorphism, then σ is said to be an involutory
antiautomorphism.
An example: The quaternions HK over K (where K is a subfield of
the real field R) is a K-vector space HK with basis B = {1, i, j, k} with
multiplication defined by the rules
1 · i = i, 1 · j = j, 1 · k = k, 1² = 1,
i · j = −(j · i) = k, i² = j² = k² = −1,
j · k = −(k · j) = i,
k · i = −(i · k) = j.
Then the mapping σ : HK → HK defined by
σ : 1a + ib + jc + kd → 1a − ib − jc − kd
is an antiautomorphism of HK . Noting that the “norm function” N (x) :=
x · xσ takes its values in the center 1 · K of HK , we deduce that N (xy) =
N (x)N (y). Also, since N (x) is the sum of the squares of the coefficients of
1, i, j and k in x, it is easy to see that N (x) = 0 if and only if x = 0. This
may then be used to show that any non-zero x has as its inverse xσ/N(x). Thus HK is a
division ring and σ is an involutory antiautomorphism.
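These identities are easy to test by machine. Below is a sketch over K = Q in exact rational arithmetic (the representation of a quaternion as a 4-tuple of coefficients is my own bookkeeping); it checks that σ reverses multiplication, that N(xy) = N(x)N(y), and that xσ/N(x) inverts any non-zero x:

```python
from fractions import Fraction
import random

def qmul(x, y):
    # Hamilton's relations: i^2 = j^2 = k^2 = -1, ij = k, jk = i, ki = j.
    a, b, c, d = x
    e, f, g, h = y
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

def sigma(x):              # conjugation: 1a + ib + jc + kd -> 1a - ib - jc - kd
    a, b, c, d = x
    return (a, -b, -c, -d)

def norm(x):               # N(x) = x * sigma(x), a scalar (its 1-coordinate)
    return qmul(x, sigma(x))[0]

random.seed(1)
rq = lambda: tuple(Fraction(random.randint(-5, 5)) for _ in range(4))
x, y = rq(), rq()
assert sigma(qmul(x, y)) == qmul(sigma(y), sigma(x))   # sigma is an antiautomorphism
assert norm(qmul(x, y)) == norm(x) * norm(y)           # N(xy) = N(x)N(y)
if norm(x):
    inv = tuple(t / norm(x) for t in sigma(x))         # x^{-1} = sigma(x)/N(x)
    assert qmul(x, inv) == (1, 0, 0, 0)
print("quaternion checks pass")
```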
In general if D is a division ring with an involutory antiautomorphism
σ, any right D-module V can be converted into a left D-module by defining
αv := vασ , for all (α, v) ∈ D × V.
We need only verify
(α1 + α2)v = α1v + α2v,   (14.1)
α(v1 + v2) = αv1 + αv2,   (14.2)
α(βv) = (αβ)v,   (14.3)
for all v, v1, v2 ∈ V and all α1, α2, α, β ∈ D. The first two are trivial to verify, and the third follows from
α(βv) := (βv)ασ = (vβ σ )ασ = v(β σ ασ )
= v(αβ)σ = (αβ)v.
Thus V now has the structure of both a right D-module and a left D-module.
We now ask two questions:
(1) Is the left-dimension of V the same as its right dimension?
(2) Is V a (D, D)-bimodule?
The answer to the first question is “yes”. If B = {vτ} is a right basis of V, then
any left-linear combination of elements of B can be converted to a right-linear
combination and vice versa. Moreover, a left-linear combination of elements
of B that is equal to zero converts to a right linear combination equal to
zero, so its coefficients are all zero. But as these coefficients are σ-images of
the left-coefficients, they too are all zero. Thus B is also a left-basis over D.
The answer to the second question is “no” – unless σ is also an automorphism; in which case D is a field. To be a bimodule we must have
(αv)β = α(vβ) for all α, β ∈ D and v ∈ V , whence
vασβ = (vβ)ασ, so
0 = v(ασβ − βασ).
But as the right annihilator of v in D is an ideal of the division ring D not
containing 1, it must be 0D , so ασ commutes with any β in D. Since α was
an arbitrary element of D, D is commutative under multiplication and so is
a field.
14.2 hom(V, W) as a Left Vector Space; the Dual Space
We now introduce
Lemma 345 Let V and W be right vector spaces over a division ring D and
suppose at the same time that W is a left vector space over the division ring
E. (It is not required that W be an (E, D)-bimodule.) Then homD (V, W ) is
a left vector space over E, by the rule
(εf)(v) := ε(f(v)) for all (ε, v, f) ∈ E × V × homD(V, W).
Moreover,
dimE(homD(V, W)) = right dimD(V) · left dimE(W),   (14.4)
provided the first factor on the right is finite.
proof: Verifying the left E-module axioms for H := homD (V, W ) is straightforward.
Let B be a right D-basis of V and let C be a left E-basis of W . For any
pair (v, w) of basis elements in B × C, define the right D-homomorphism εv,w : V → W whose value on the basis B is given by
εv,w(u) := w if v = u, and εv,w(u) := 0W if v ≠ u.
This means that if u = Σb∈B bαb, then εv,w(u) = wαv.
Now choose f ∈ homD(V, W). Then for each v ∈ B,
f(v) = Σw∈C βv,w w,
where, for fixed v, all but a finite number of the E-scalars βv,w are zero. In that case, we may form the finite double sum
g := Σv,w βv,w εv,w in homD(V, W),
and calculate that its values agree with those of f on the basis B. Thus f = g is an E-linear combination of the set of D-homomorphisms X := {εv,w | (v, w) ∈ B × C}. So X spans H.
If there were an E-linear relation
0 = Σ(v,w)∈B×C γv,w εv,w
in H, with only a finite number of the γv,w nonzero, then evaluating it at a basis vector v ∈ B gives Σw γv,w w = 0, so each γv,w = 0, since C is E-independent. So, as v is arbitrary in B, all coefficients γv,w = 0. Thus X is an E-independent subset of H.
One concludes that X := {εv,w | (v, w) ∈ B × C} is an E-basis of H =
homD (V, W ). Clearly |X| = |B × C|, and the proof is complete.
The special case that E = D, and W = D as a (D, D)-bimodule is of
interest. In this case,
V ∗ := homD (V, D)
is a left D-module, called the dual space of V .
Corollary 346 If V is a right D-module, then its dual space V ∗ is a left
D-module. If V has finite dimension then
right dimD (V ) = left dimD (V ∗ ).
remark: When V is infinite dimensional, always
left dimD (V ∗ ) > right dimD (V ).
14.3 Tensor Products of Vector Spaces Over Fields
One may recall from the last section of the previous chapter that the tensor product was given a right T-module structure when the right-hand tensor factor had the appropriate bimodule structure. In trying to consider tensor products in the category of vector spaces over a division ring D, we are presented with the problem of how to make a (D, D)-bimodule when the division ring D is not commutative. In the first part of this chapter we saw
that the device of using an antiautomorphism gave both a left and a right
vector space structure, but did not yield a bimodule.
It is for this reason, that in the rest of this section and the next on
tensor algebras we assume that D is a (commutative) field and that all right
D-spaces V are regarded as symmetric (D, D)-bimodules by the rule:
αv := vα, for all (α, v) ∈ D × V.
For this reason it will be useful to restate the universal mapping property
of the tensor product (Theorem 330) for (F, F )-bimodules where F is a field,
as follows:
Lemma 347 (Universal Mapping Property of ⊗) Suppose V , W and U are
(F, F )-bimodules. Suppose the mapping f : V × W → U is bilinear in the
sense that for any (v, w) ∈ V × W ,
1. f (−, w) : V → U and f (v, −) : W → U are F -linear mappings (as
either right or left vector space morphisms), and
2. f(vα, w) = f(v, αw) for all α ∈ F.
Then f = f¯ ◦ t where t : V × W → V ⊗F W is the canonical projection which
defines the tensor product, and
f̄ : V ⊗F W → U
is F -linear. That is, f can be factored through f¯.
In this subsection we wish to show that if V and W are finite-dimensional
vector spaces over a field F , then V ⊗ W has dimension equal to the product
of the dimensions of V and W .
For general rings, it is not true that the tensor product of two nonzero R-modules is even nonzero. A Z-module M is said to be divisible if, for
each element x ∈ M , and each positive integer n, there is an element y in M
such that ny = x. The additive group of the rational numbers (Q, +) is an
example of a divisible group.
Now let A be a divisible right Z-module, and let B be a torsion left
Z-module. Then
A⊗Z B = 0.
This is because A⊗Z B is generated by the elements a⊗b, where (a, b) ∈ A×B.
Since B is torsion, there exists a positive integer n(b) such that n(b) · b = 0B .
Since A is divisible, there exists an element c in A such that c · n(b) = a.
Then
a ⊗ b = (c · n(b)) ⊗ b = c ⊗ (n(b) · b) = c ⊗ 0 = 0.
So the generating set of the tensor product is just zero.
So it is at least non-trivial to assert the following:
Lemma 348 Let V and W be finite dimensional vector spaces over a field
F , each regarded as an (F, F )-bimodule. Then
dimF (V ⊗F W ) = dimF (V ) · dimF (W ).
proof: We regard the vector spaces as left or right vector spaces interchangeably, according to our convention for fields. Let B = {v1 , . . . , vn } be a right
F -basis for V and let C = {w1 , . . . , wm } be a left F -basis for W . Let X be
the collection of elements
{vi ⊗ wj | 1 ≤ i ≤ n, 1 ≤ j ≤ m}.
Then X spans V ⊗ W , since the latter is spanned by the pure elements v ⊗ w,
(v, w) ∈ V × W , and each of these is clearly a linear combination of elements
of X.
It is not so easy, however, to show directly that X is a linearly independent
set. It will follow if we can show that V ⊗ W has dimension at
least |X| = nm.
Now homF (V, W ) is a left vector space over F as prescribed in Lemma 345.
Then, as described in the proof of Lemma 345, the set E := {ε_{ij} | 1 ≤ i ≤
n, 1 ≤ j ≤ m} is an F -basis for homF (V, W ), where, as one will recall,

    ε_{ij}(v_k) = δ_{ik} w_j , for all i, j, k

(δ_{ik} is Kronecker's delta; it is 1 if i = k, and 0 otherwise).
We now define a mapping

    φ : V × W → homF (V, W )

as follows: If v = Σ_i v_i α_i and w = Σ_j β_j w_j are the unique expressions for
v ∈ V and w ∈ W in terms of the (right) F -basis B of V and the (left) F -basis
C of W , then we set

    φ(v, w) = Σ_{i,j} α_i β_j ε_{ij} .

Then φ is a biadditive map – that is, φ(v1 + v2 , w) = φ(v1 , w) + φ(v2 , w)
and φ(v, w1 + w2 ) = φ(v, w1 ) + φ(v, w2 ). Moreover,
    φ(vα, w) = φ(Σ_i v_i (α_i α), w) = Σ_{i,j} ((α_i α)β_j ) ε_{ij}
             = Σ_{i,j} α_i (αβ_j ) ε_{ij} = φ(v, αw), for all α ∈ F.
By the universal property of the tensor product, the mapping φ factors
through the canonical projection t : V × W → V ⊗F W . So we obtain
the factorization φ = φ̄ ◦ t where
φ̄ : V ⊗F W → homF (V, W )
is an appropriate morphism of left F -vector spaces. Now, by its definition,
φ(v_i , w_j ) = ε_{ij}, so the F -basis elements of E are all image elements. It follows
from the factorization that φ̄ is onto. Thus
left dimF (V ⊗F W ) ≥ left dimF (homF (V, W )) = nm.
The proof is complete.
14.4 Tensor Products of (F, F )-bimodules
As assumed throughout this and subsequent sections, F is a field and any
F -vector space is regarded as an (F, F )-bimodule by adopting the rule
    vα = αv for all (α, v) ∈ F × V.   (14.5)
From the previous section, V ⊗F W (that is, (F VF ) ⊗F (F WF )) is both a
right and a left vector space.
In fact it satisfies equation ( 14.5). This is clear since it is generated as
an additive group by the “pure” elements {v ⊗ w}, v ∈ V , w ∈ W , and for
any scalar α in F ,
    (v ⊗ w)α = v ⊗ (wα)   (V ⊗ W as right F -module)
             = v ⊗ (αw)   (by (14.5) for W )
             = (vα) ⊗ w   (definition of ⊗)
             = (αv) ⊗ w   (by (14.5) for V )
             = α(v ⊗ w)   (V ⊗ W as left F -module),
and these multiplications by α, being biadditive, extend to all of V ⊗ W by
the universal property.
Moreover if B = {x1 , . . . , xn } and C = {y1 , . . . , ym } are bases of V and W
respectively, then the set of “pure” tensors, B ⊗ C := {b ⊗ c|(b, c) ∈ B × C}
is an F -basis (both left and right) of V ⊗F W .
Finally, as (−⊗F W ) and (V ⊗F −) are both covariant functors in the
category of (F, F )-bimodules satisfying equation ( 14.5), we see that if σ and
τ are (right-operating) linear transformations of V into itself and W into
itself, respectively, then there is a unique induced F -linear transformation
σ ⊗ τ : V ⊗F W → V ⊗F W , satisfying
    (v ⊗ w)(σ ⊗ τ ) = (vσ) ⊗ (wτ ).   (14.6)
14.5 Tensor Products of Linear Transformations.
We continue with the notation of the previous subsection.
Suppose now that S := [σ]B = (αij ) and T := [τ ]C = (βij ) are the
matrices representing the actions of σ and τ on V and W with respective
bases B and C. Since σ and τ are taken to be right operators, the rows of
these matrices reveal the fate of each basis element: thus

    x_i σ = Σ_j x_j α_ij ,    y_s τ = Σ_t y_t β_st .
If the basis B ⊗ C is given the “row ordering”
(x1 ⊗ y1 , . . . , x1 ⊗ ym , x2 ⊗ y1 , . . .)
obtained by writing the rows of the array (xi ⊗ yj ) one after another proceeding from top to bottom, then
    [σ ⊗ τ ]B⊗C =
        | α11 T  α12 T  . . .  α1n T |
        |  ...    ...   . . .   ...  |
        | αn1 T  αn2 T  . . .  αnn T |

is an nm × nm matrix, called the Kronecker product of S and T .
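The following is a small numerical sketch (Python with numpy; the random
matrices and the test itself are ours, not the text's) of the fact just stated:
in the row-ordered basis B ⊗ C, the matrix of σ ⊗ τ is np.kron(S, T).

    import numpy as np

    # Check that row i*m + s of np.kron(S, T) is the Kronecker product of
    # row i of S with row s of T -- i.e. that (x_i (x) y_s)(sigma (x) tau)
    # = (x_i sigma) (x) (y_s tau), with vectors as rows and operators
    # acting on the right, as in the text.
    rng = np.random.default_rng(0)
    n, m = 3, 2
    S = rng.integers(-3, 4, size=(n, n)).astype(float)  # S = [sigma]_B
    T = rng.integers(-3, 4, size=(m, m)).astype(float)  # T = [tau]_C

    K = np.kron(S, T)  # candidate matrix for sigma (x) tau
    for i in range(n):
        for s in range(m):
            assert np.allclose(K[i * m + s], np.kron(S[i], T[s]))
    print("np.kron(S, T) represents sigma (x) tau in the row ordering")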
14.6 The Universal Mapping Property of Multiple Tensor Products: Multilinear Mappings
We know from Lemma 334, on the associativity of the tensor product, that
a multiple tensor product produces one basic bimodule, regardless of the
placement of parentheses. We apply this construction to the category of
(F, F )-bimodules satisfying equation (14.5). In the subsection before last
it was observed that this class of bimodules is closed under taking tensor
products – i.e. equation (14.5) holds for the tensor product of two such
bimodules. Also, any morphism between them as right vector spaces is also
automatically a morphism as left vector spaces, and is simply called an
F -linear mapping.
Now suppose V1 , . . . , Vn , U are all (F, F )-bimodules satisfying (14.5). A
mapping
f : V1 × V2 × · · · × Vn → U
is said to be n-multilinear (or just multilinear) if and only if, for each index
i and each (v1 , . . . , vn ) ∈ V1 × · · · × Vn :
1. f (v1 , . . . , vi−1 , −, vi+1 , . . . , vn ) : Vi → U is F -linear, and
2. f (v1 , . . . , vi−1 α, vi , . . . , vn ) = f (v1 , . . . , vi−1 , αvi , . . . , vn ) for all α ∈ F .
Theorem 349 (The Universal Mapping Property and Multilinear Maps.)
Suppose
f : V1 × V2 × · · · × Vn → U
is an n-multilinear mapping of (F, F )-bimodules satisfying ( 14.5). Then f
factors through an F -linear mapping
f¯ : V1 ⊗F V2 ⊗F · · · ⊗F Vn → U,
that is, f = f¯◦ t(n) where t(n) is the map from the n-fold Cartesian product of
the Vi to their n-fold tensor product which takes (v1 , . . . , vn ) to v1 ⊗ · · · ⊗ vn .
proof: The case n = 2 is the universal property for bilinear maps enunciated
in the previous section (Lemma 347). The rest is an easy induction from this,
using the associativity of the tensor product.
The same sort of induction immediately yields
Corollary 350 Let fi be an F -linear transformation of the vector space Vi ,
i = 1, . . . , n. These transformations induce a unique F -linear transformation
of V1 ⊗F · · · ⊗F Vn into itself taking each pure vector v1 ⊗ · · · ⊗ vn to f1 (v1 ) ⊗
· · · ⊗ fn (vn ).
Chapter 15

Tensor Algebras and Their Offspring

15.1 Algebras

15.1.1 Nonassociative Algebras
Usually, when we tack on an adjective to a noun, the denotation (that is, the
set of objects to which the altered noun may refer) decreases. The topic of this
subsection represents an exception to this rule: “Nonassociative Algebra” is
a more general notion than that of “Algebra”. An Algebra is a vector space
which is at the same time a ring, with a compatibility condition connecting
ring multiplication and scalar multiplication.1 A Nonassociative Algebra
is a vector space A over a field F , with a "rule of multiplication"
µ:A×A→A
which is bilinear, that is
µ(αa1 + βa2 , a) = αµ(a1 , a) + βµ(a2 , a)
µ(a, αa1 + βa2 ) = αµ(a, a1 ) + βµ(a, a2 )
µ(a, αb) = µ(aα, b),
1 Formerly, an Algebra was called a Linear Algebra; but this term so strongly resembled
the term for the study of vector spaces and their morphisms, that some modification was
necessary. Today they are just “Algebras”.
where α and β are scalars of F , and a, b, a1 , a2 are all elements of the
(F, F )-bimodule A. Of course if we write a · b or ab for µ(a, b) we recognize
here the right and left distributive laws and the compatibility laws

    α(a · b) = (αa) · b = a · (αb) for all (α, a, b) ∈ F × A × A.   (15.1)
Note that there is no assumption of an associative law, nor is there the
assumption that a multiplicative identity element exists. In fact, any F -linear mapping µ̄ : A ⊗F A → A defines for us a "multiplication" µ : A × A →
A by setting
µ(a, b) := µ̄(a ⊗ b).
Finally, if B := {a1 , . . . , an } is a finite F -basis for A, then multiplication
defines a collection SB := {γijk } of n³ scalars, called the structure constants,
defined by the equations:

    a_i · a_j = Σ_k a_k γ_ijk .
Conversely, the multiplication table of A is completely determined by the set
SB of structure constants.
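As an illustration (not from the text: the helper function and the two-dimensional
example algebra F [t]/(t²) are our own choices), here is a minimal Python sketch
of how the structure constants recover the multiplication:

    import numpy as np

    def multiply(x, y, gamma):
        """Product of coordinate vectors x, y in a basis with structure
        constants gamma[i, j, k], i.e. a_i a_j = sum_k a_k gamma_ijk."""
        return np.einsum('i,j,ijk->k', x, y, gamma)

    # Toy example: the 2-dimensional algebra F[t]/(t^2) with basis
    # a_1 = 1 and a_2 = t, so that t*t = 0 and 1 acts as identity.
    gamma = np.zeros((2, 2, 2))
    gamma[0, 0, 0] = 1  # 1*1 = 1
    gamma[0, 1, 1] = 1  # 1*t = t
    gamma[1, 0, 1] = 1  # t*1 = t
    x = np.array([1.0, 2.0])  # 1 + 2t
    y = np.array([3.0, 1.0])  # 3 + t
    print(multiply(x, y, gamma))  # [3. 7.], i.e. (1+2t)(3+t) = 3 + 7t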
Very important classes of non-associative algebras abound throughout
mathematics:
1. The (associative) Algebras.
2. The Lie Algebras.
3. The Jordan Algebras.
4. The Cayley Algebras – in particular, the Octonions (over the reals).
We shall pursue only certain instances of the first category, since these
make their appearance in several guises in multilinear algebra. For the student
who wishes to look further into the other categories, the author strongly
recommends two classics: the book of Schafer, Introduction to Nonassociative
Algebras [?], and the book by Humphreys, Lie Algebras and their Representations [?].
15.1.2 Associative Algebras
As indicated in the previous section, an (associative) algebra is a vector
space A equipped with a binary multiplicative operation A×A → A (denoted
by “·” or juxtaposition) such that
(1) (A, +, ·) is a nonassociative F -algebra in the sense of the previous subsection. In particular, the compatibility laws (15.1) hold.
(2) (A, +, ·) is a ring.
As usual, we regard A as a symmetric (F, F )-bimodule, so

    αv = vα for all (α, v) ∈ F × A.   (15.2)
We inherit most of the ring-theoretic notions, except that now they are
restricted by the additional vector space requirements. Thus:
An F -algebra homomorphism is a ring homomorphism which is F -linear.
An F -subalgebra is a subring which is also an F -vector subspace.
An algebra right ideal is a right ideal of the ambient ring which is an F -vector space. Since F · 1 is a subring in the center, all right A-modules
are vector spaces over F . In particular, right ideals in the ring sense
are automatically right algebra ideals.
Of course right multiplication of A by any element a induces a linear
transformation ρ(a) : A → A. Now suppose A and B are two algebras over
F . Then the mapping
A × B × A × B → A⊗F B
which takes (a1 , b1 , a2 , b2 ) to a1 a2 ⊗ b1 b2 is multilinear in each argument and
so factors through (A⊗F B) × (A⊗F B), and thus, by the comments of the
previous subsection, defines a unique “multiplication” on A⊗F B. Thus we
have
Lemma 351 (Tensor Products of Algebras) If A and B are algebras over F ,
then A⊗F B is itself an algebra over F with multiplication uniquely defined
by the following multiplication on pure vectors:
(a ⊗ b) · (c ⊗ d) := ac ⊗ bd.
remark: Note that right multiplication by c ⊗ d induces on (A⊗F B) the
tensor product of transformations: ρ(c) ⊗ ρ(d).
Also, if B is a field extending the ground field F , the definition of A ⊗F B
as an F -algebra can be extended to make it a B-algebra. Behold:
Lemma 352 (Extension of the Field of an Algebra) Suppose K is an extension field of the field F and let A be an algebra over F . Then in a very
natural way, A⊗F K is a K-algebra. Multiplication is defined by
    (a ⊗ α) · (b ⊗ β) := ab ⊗ αβ,
just as for F -algebras; one only adds the observation that as K is commutative, this multiplication is compatible with the K-vector space structure of
A ⊗ K.
remark: As noted before, an F -basis for the algebra A is also a K-basis for
the K-vector space A ⊗ K. So, here, the set SB of structure constants defined
by this basis is also a set of structure constants of A ⊗ K. It is just that the
field of scalars has been enlarged.
If the K-algebra B = A⊗K is obtained from A in this way, then A is
called an F -form of B.
Now suppose A is an algebra over the field F . We say that A is a graded
algebra with grading monoid M if and only if the F -vector space A is a
direct sum of vector subspaces Aσ indexed by the elements of M , such that
any element of Aσ multiplied on the right by an element of Aτ is an element
of Aστ .² We write this as

    A = ⊕_{σ∈M} Aσ ,   (15.3)

    Aσ Aτ ⊆ Aστ .   (15.4)
If A is graded by the monoid M , the elements which reside within one of
the components Aσ of the grading are called the homogeneous elements
of grade σ. A homogeneous element is simply a homogeneous element
of grade σ for some unspecified σ. The additive identity element 0A is homogeneous of any degree. But every non-zero homogeneous element belongs
to a specific grade. The homogeneous elements form a multiplicatively closed
set.

examples:
2 Often the monoid is taken to be commutative, whether A is a commutative ring or
not. Our definition allows Aστ to be distinct from Aτ σ .
1. If K is an extension field of F , then K is an algebra over F .
2. The ring of functions X → F . First this forms an F -vector space
A. Multiplication is coordinatewise. (Of course the basis X is special
here since it forms a complete system of orthogonal indecomposable
idempotents. If we change the basis, the recipe yields a different ring
multiplication.)
3. If V is a vector space over F , then the F -endomorphism ring homF (V, V )
is an algebra over F (multiplication is composition of transformations).
4. The polynomial ring F [X] of all polynomials in a finite number of indeterminates chosen from the (possibly infinite) set X of pair-wise commuting indeterminates. This ring is graded by the collection of degrees,
which form the monoid of all non-negative integers under addition.
5. A Generalization of the Monoid Algebra. Let M be any monoid with its
binary operation written multiplicatively. Finally, let A be any algebra
over F . Now we let B be the direct sum of copies of the algebra A
indexed by M . Formally this means that an arbitrary element b of B is
a function b : M → A which assumes a non-zero value in A only finitely
often. Its value at σ ∈ M is denoted b(σ). The homogeneous elements
of degree σ are those functions which vanish on M − {σ}. The product
in B of two functions is then defined by specifying the products of these
homogeneous elements. If f has value a ∈ A − {0} at σ and is zero otherwise
(i.e. it is homogeneous of degree σ) and, similarly, g is homogeneous of degree
τ with value b ∈ A at τ , then f · g is the function which vanishes at all κ ≠ στ
and has value ab at στ . Extending by linearity from homogeneous elements,
this defines a product on ⊕_σ Aσ , making it a graded algebra over F with M
as its grading monoid. (A small computational sketch of this multiplication
appears after these examples.)
In the case that A = F , this is the Monoid Algebra introduced as an
example of a ring in Chapter 6. The Group Algebra F G is a special
case, with M = G any group.
In the case that M is the group Z/(2), and A is any algebra over F ,
the construction gives the A-superalgebra.
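Here is the promised sketch (Python; the encoding of elements as finitely
supported dictionaries and the helper names are our own) of the multiplication
in this generalized monoid algebra:

    from collections import defaultdict

    def convolve(x, y, op):
        """Product in the generalized monoid algebra: x, y are finitely
        supported dicts M -> A, and op(s, t) is the monoid operation."""
        z = defaultdict(int)
        for s, a in x.items():
            for t, b in y.items():
                z[op(s, t)] += a * b  # homogeneous product lands in grade s*t
        return dict(z)

    # Usage: the group algebra Q[Z/2], writing an element as {0: ., 1: .}.
    op = lambda s, t: (s + t) % 2
    x = {0: 1, 1: 2}  # 1 + 2g
    y = {0: 3, 1: 1}  # 3 + g
    print(convolve(x, y, op))  # {0: 5, 1: 7}, i.e. 5 + 7g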
15.2 The Tensor Algebra
Let V be a fixed F -vector space. We define an algebra T (V ) called the
Tensor Algebra over V whose vector-space structure is given by
T (V ) := F ⊕ V ⊕ (V ⊗ V ) ⊕ (V ⊗ V ⊗ V ) ⊕ . . .
and whose multiplication is defined by “⊗”. Thus every element of T (V ) is
a finite F -linear combination of monomial pure tensors
v1 ⊗ v2 ⊗ · · · ⊗ vn
where the vi are vectors of V and n is any non-negative integer. Indeed, since
V has a basis, the vi may be assumed to be members of a basis B of V . The
members of the monomial need not be distinct.
Considering the way tensor monomials multiply under “⊗”, it is not difficult to see
Lemma 353 The tensor algebra T (V ) is isomorphic to the Monoid Algebra
F M (B), where M (B) is the free monoid generated by a basis B of V .
Algebraically it does not depend upon which basis is used: the isomorphism type of the entire algebra T (V ) is determined only by F and dimF (V ).
It is exactly like the rings of polynomials except the indeterminates do not
commute, nor enjoy any equational identities. The tensor monomials with d
factors generate the component of homogeneous elements of degree d. Clearly
the assigning of degrees in this way produces a grading with respect to the
monoid of non-negative integers under addition.
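A minimal computational sketch of this picture (our own Python encoding,
illustrating Lemma 353 above): basis monomials of T (V ) are words in the
basis B, i.e. tuples, and they multiply by concatenation.

    from collections import defaultdict

    def tensor_mult(x, y):
        """Multiply two elements of T(V), encoded as finitely supported
        dicts from words (tuples over the basis B) to coefficients."""
        z = defaultdict(int)
        for u, a in x.items():
            for v, b in y.items():
                z[u + v] += a * b  # concatenation realizes (x) on monomials
        return dict(z)

    v, w = ('v',), ('w',)
    x = {v: 1, w: 1}  # the element v + w of degree 1
    print(tensor_mult(x, x))
    # {('v','v'): 1, ('v','w'): 1, ('w','v'): 1, ('w','w'): 1}:
    # v(x)w and w(x)v remain distinct -- the factors do not commute.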
We observe
Lemma 354 If f : V → V is an F -linear transformation, then f induces an
algebra endomorphism f̂ of T (V ), whose restriction to the direct summand

    V ⊗ V ⊗ · · · ⊗ V (with d tensor factors),

the homogeneous part T (V )d of degree d, is the transformation

    f ⊗ f ⊗ · · · ⊗ f (with d tensor factors).

Thus GL(V ) ≤ Aut T (V ).
We now display one of the most important properties of the Tensor Algebra.
Theorem 355 (The Universal Property of the Tensor Algebra.) Let A be
any F -algebra generated by a set X. (This means no proper subalgebra contains X.) Let V be the free F -module over X – that is, V has X as a
basis, and consists of all formal F -linear combinations of the elements of X.
Then there exists a unique algebra epimorphism tX : T (V ) → A for which
tX (x) = x for all x ∈ X.
sketch of proof: The desired homomorphism is defined by its restriction
to its homogeneous parts. The part of T (V ) of degree k is
T (V )k := V ⊗ · · · ⊗ V (with k factors ).
It has dimension n^k , where n = dimF (V ). The mapping which sends

    (x1 , . . . , xk ) → x1 x2 · · · xk ∈ A,
is k-multilinear and so induces an F -linear mapping
fk : T (V )k → Ak ,
where Ak is the F -subspace of A spanned by the k-fold products of elements
of X. Then two things remain: to show that the mapping f = ⊕fk defined
by the fk is (1) F -linear and (2) preserves the ring multiplication.
15.3 The Symmetric Algebra
Let T2 := V ⊗ V be the homogeneous part of T (V ) of degree 2, and let S2
be the F -subspace of T2 generated by the set
{a ⊗ b − b ⊗ a|(a, b) ∈ V × V }.
We observe
Lemma 356
1. If B := {vσ } is a basis for V , then
B2 := {vσ ⊗ vτ − vτ ⊗ vσ | each 2-set {σ, τ } listed just once}
is a basis for S2 . If dimF V = n, then dimF S2 = n(n − 1)/2.
2. As noted hom(V, V ) acts on T2 = V ⊗ V , with each f ∈ hom(V, V )
acting as f ⊗ f . The subspace S2 is invariant under this action.
proof: If a = Σ_σ vσ ασ and b = Σ_σ vσ βσ (with only a finite number of the
coefficients non-zero in each case), then

    a ⊗ b − b ⊗ a = Σ_{σ,τ} (vσ ⊗ vτ − vτ ⊗ vσ )ασ βτ ,   (15.5)

the sum extending over all ordered pairs (σ, τ ). Combining the terms for (σ, τ )
and (τ, σ), each unordered pair {σ, τ } contributes the element
(vσ ⊗ vτ − vτ ⊗ vσ )(ασ βτ − ατ βσ ). It follows that B2 spans S2 .
Suppose a finite linear combination of the elements of B2 were zero. Since
each unordered pair {σ, τ } occurs just once in the indexing of B2 , this would
be a finite linear combination of the basis B ⊗ B of T2 with the same coefficient
at vσ ⊗ vτ and at −vτ ⊗ vσ . Since B ⊗ B is a linearly independent set, all these
coefficients are zero. Hence B2 is linearly independent.
If |B| = dimF V = n is finite, the dimension of S2 is the number of
2-subsets of B, which is n(n − 1)/2.
Now suppose f ∈ hom(V, V ). Then for any (a, b) ∈ V × V ,

    (a ⊗ b − b ⊗ a)(f ⊗ f ) = (af ) ⊗ (bf ) − (bf ) ⊗ (af ).

Thus f ⊗ f leaves the set {a ⊗ b − b ⊗ a | (a, b) ∈ V × V } invariant, and so
leaves its F -span S2 invariant. The proof is complete.
Now let I := T (V )S2 T (V ) be the 2-sided ideal generated by S2 . Clearly
this also is invariant under the action of hom(V, V ) on T (V ). We let s :
T (V ) → T (V )/I be the canonical algebra homomorphism, and let S(V ) :=
T (V )/I be its image. Then S(V ) is an algebra over F which is commutative
since it is generated by a set {v + I|v ∈ T (V )1 = V } of pairwise commuting
elements.
For any two elements a := x + I and b := y + I, x, y ∈ T (V ), we write
a ∨ b for x ⊗ y + I. Thus "∨" is the multiplication induced on the factor
algebra S(V ). The algebra S(V ) is called the symmetric algebra over V .
Now once again consider a basis B = {vσ } of V . For each σ set sσ :=
s(vσ ) = vσ + I, and set S := s(B) = {sσ }. Since the ideal I contains only
elements of degree 2 or more, S is a basis for S(V )1 := s(T (V )1 ) = s(V ) ' V .
We wish to discuss monomial expressions in S(V ) without being encumbered with unwieldy double subscripts; so, for this purpose we impose a total
ordering on the set S. Now for every finite non-decreasing sequence
τ = (s1 , . . . , sn ), s1 ≤ s2 ≤ · · · ≤ sn ,
set τ̄ = s1 ∨ · · · ∨ sn . Let F be the complete set of all finite non-decreasing
sequences and set
X := {τ̄ |τ ∈ F}.
Now, any finite tensor monomial v1 ⊗ · · · ⊗ vm is congruent modulo I to
a tensor monomial in the same vi but with the factors arranged in a nondecreasing finite sequence τ . Thus as the tensor monomials form an F -basis of
T (V ), their images under the natural projection s : T (V ) → S(V ), comprise
the set X, and so at the very least:
(*) The F -span of X is all of S(V ).
(Of course it will emerge that it is actually an F -basis, but we do not require
this fact at this time.)
Now let R := F [B], the ring of all polynomials over F with indeterminates
being the set B. (Since the ring of polynomials is a formal construction, the
set of indeterminates can be anything we want.) Clearly B together with
the identity element of degree zero is a set of generators of R. But since
the free F -space spanned by B is V , the universal mapping property for
tensor algebras (Theorem 355) implies the existence of a surjective algebra
homomorphism
f : T (V ) → R,
which maps B (as a subset of V = T1 ) to B (as the set of monomials in the
polynomial ring R of degree one) by the identity mapping.
Since R is commutative, the expressions a ⊗ b − b ⊗ a, for a, b ∈ V , all lie in
ker f . As a consequence, B2 , S2 and hence I = T (V )S2 T (V ) lie in ker f . Thus the
mapping f “factors” through S(V ) = T (V )/I, and so induces a surjective
algebra homomorphism
f¯ : S(V ) → R,
taking sσ to the monomial vσ in R, and so, for any finite sequence (s1 , . . . , sm ),
one has
f¯(s1 ∨ · · · ∨ sm ) = v1 v2 · · · vm .
It follows that any linear combination

    Σ_{τ ∈J} τ̄ ατ ,
where J is a finite subset of the set F of non-decreasing sequences in S,
is mapped by f¯ to a polynomial with exactly the same collection {ατ } of
coefficients. Since, as noted in (*) above, any element of S(V ) is such a
linear combination, the previous sentence implies ker f¯ = 0, making f¯ an
algebra isomorphism.
One notes that these arguments demonstrate that the universal property
of the tensor algebra specializes to a universal property of S(V ) for commutative F -algebras. Summarizing:
Theorem 357 (Fundamental and Universal Properties of the Symmetric Algebra.) The following hold:
1. The algebra S(V ) is isomorphic as an F -algebra to the polynomial ring
F [B] with indeterminates being any F -basis of V .
2. For every commutative F -algebra A with generating set B, there is a
surjective algebra homomorphism from the ring of polynomials over F
with B as a set of indeterminates,
π : F [B] → A,
which is the identity mapping when restricted to B.
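For instance (a sketch assuming the sympy library; the particular polynomials
are our own), part 1 of the theorem says that computation in S(V ) for a
3-dimensional V is ordinary polynomial arithmetic in commuting indeterminates:

    import sympy as sp

    # Working in S(V) for dim V = 3 via the isomorphism with F[x1, x2, x3]:
    x1, x2, x3 = sp.symbols('x1 x2 x3')
    p = sp.expand((x1 + x2) * (x1 + x3))  # (s1 + s2) v (s1 + s3) in S(V)
    print(p)  # x1**2 + x1*x2 + x1*x3 + x2*x3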
15.4 The Exterior Algebra
As in the previous subsection T2 will denote the subspace T (V )2 of the tensor
algebra T (V ) consisting of all tensors of degree two. It contains a subspace
E2 which is the F -span of the set of all pure tensors {u ⊗ u|u ∈ V }. Let
J := T (V )E2 T (V ) be the two sided ideal in T (V ) generated by E2 and let
E(V ) be the factor algebra T (V )/J. Since J is generated by homogeneous
elements it follows that the natural algebra homomorphism
e : T (V ) → E(V ) := T (V )/J
induces a grading

    E(V ) = ⊕_k e(T (V )k ).
The factor algebra E(V ) is called the exterior algebra of V . The
induced multiplication is denoted by the “wedge” symbol “∧”. Specifically
if a = â + J and b = b̂ + J, then a ∧ b is defined to be â ⊗ b̂ + J.
Two facts are immediate:
1. E1 := e(V ) is isomorphic to V as a vector space, and, with the identity
element of degree zero, generates E(V ) as an algebra.
2. For any vectors v and u of V , one has
    u ∧ u = 0 and u ∧ v = −(v ∧ u).   (15.6)
Now for every non-negative integer k, we set ⋀^k (V ) := e(T (V )k ). One
realizes that if k = 0, then ⋀^0 (V ) is just a copy of the ground field F .
Now let B = {vσ } be a basis of V . Again, to simplify notation, we assume
a total ordering (B, ≤) and let 𝒜 be the full collection of all finite properly
ascending sequences of elements of B. Thus if τ = (u1 , u2 , · · · , um ) ∈ 𝒜,
then ui ∈ B and ui < ui+1 , i = 1, . . . , m − 1. Given such a τ , we write
τ̂ = u1 ⊗ u2 ⊗ · · · ⊗ um and write τ ∗ := e(u1 ) ∧ e(u2 ) ∧ · · · ∧ e(um ) = e(τ̂ ),
an element of E(V ).
We know from a previous subsection that the collection of all tensor
monomials, X = {a1 ⊗ · · · ⊗ an |n ∈ N ∪ {0}, ai ∈ B} form an F -basis of
T (V ) (where the grade 0 element 1 ∈ F , is the intended tensor monomial
when n = 0). Now using equation (15.6), we see that each such monomial is
congruent modulo J to zero or to ±1 times a tensor monomial in the same
factors, but appearing in strictly ascending order from left to right in the
tensor monomial. Thus if a monomial factor is repeated, then the term lies
in J and maps to 0 under e : T (V ) → E(V ). Otherwise, if τ is the sequence
of distinct factors of a monomial u1 ⊗ · · · ⊗ un placed in strictly ascending
order, then e(u1 ⊗ · · · ⊗ un ) = ±τ ∗ . It follows that the set X ∗ = {τ ∗ | τ ∈ 𝒜}
spans E(V ) as a vector space.
Now we construct an F -algebra A(B), starting with a totally ordered
set (B, ≤). Again we let 𝒜(B) be the set of all strictly increasing finite
sequences of elements of B, including the empty sequence ∅. We say that
two such sequences σ and τ intersect if and only if they contain a common
member of B. If σ = (a1 , · · · , am ) and τ = (b1 , · · · , bn ) are elements of 𝒜(B)
which do not intersect, then there is a unique permutation π on {ai } ∪ {bi }
which rearranges (a1 , · · · , am , b1 , · · · , bn ) into a strictly ascending sequence
στ ∈ 𝒜(B). Since π is uniquely determined by σ and τ , we can write s(σ, τ )
for sgn(π).
We can now define the algebra A(B). Its elements are the vectors of the free
F -bimodule generated by 𝒜(B) – that is, all formal F -linear combinations of
elements of 𝒜(B). The algebra is then determined by its structure constants
– that is, by displaying exactly how the basis elements in 𝒜(B) multiply. If
σ, τ ∈ 𝒜(B), then

    σ · τ = 0 if σ and τ intersect, and σ · τ = (στ )s(σ, τ ) otherwise.
It is easy to verify that the associative law holds for triples of products
among the basis elements in 𝒜(B), and so A(B) is an associative algebra.
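Here is a small Python sketch of this multiplication rule (our own encoding:
sequences are tuples over a basis ordered as integers), including the computation
of the sign s(σ, τ ) as the sign of the merge permutation π:

    def merge_sign(sigma, tau):
        # sgn(pi) for the permutation sorting the concatenation: each pair
        # (a, b) with a in sigma, b in tau and a > b is one inversion.
        inversions = sum(1 for a in sigma for b in tau if a > b)
        return (-1) ** inversions

    def product(sigma, tau):
        """Product of two basis sequences of A(B): zero if they intersect,
        otherwise the sorted merge weighted by the sign s(sigma, tau)."""
        if set(sigma) & set(tau):
            return 0, ()  # intersecting sequences multiply to zero
        return merge_sign(sigma, tau), tuple(sorted(sigma + tau))

    print(product((1, 3), (2,)))  # (-1, (1, 2, 3)): one transposition
    print(product((1, 2), (2,)))  # (0, ()): a repeated member kills it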
Clearly, the empty sequence is a multiplicative identity element. If we
define the degree of any ascending sequence of 𝒜(B) to be the length of the
sequence, then degrees add for non-zero products of the basis elements. In
this way A(B) becomes a graded algebra. Thus A(B)0 = F ∅ and A(B)1 = V ,
the free vector space spanned by B. Clearly any element of 𝒜(B) is an
algebra product of elements of B, so B generates the algebra A(B).
Moreover, from the definition of the multiplication, for any two elements
b1 , b2 ∈ B, one has (products taken in A(B)):

    b1 b2 = −b2 b1   (15.7)
    b1 b1 = 0   (15.8)
By the universal mapping property, there is an onto algebra homomorphism
f : T (V ) −→ A(B).
It follows from equations (15.7) and (15.8) that for any vector v in V ,
v ⊗ v ∈ ker f . Thus the ideal J = ker e lies in ker f . It follows that the
surjective algebra homomorphism f “factors through” E(V ). In other words,
there is a surjective algebra homomorphism
f¯ : E(V ) → A(B)
such that f = f̄ ◦ e. From this composition one sees that for each ascending
sequence τ ∈ 𝒜(B),

    f̄ (e(τ̂ )) = f̄ (τ ∗ ) = τ.
It follows from this formula, and the fact that 𝒜(B) is an F -basis for A(B),
that ker f̄ = 0. Thus f̄ is an isomorphism. That E(V ) is graded by the
⋀^k (V ) follows from the grading of A(B).
The above argument also yields a universal mapping property for the
exterior algebra. Summarizing, we have
Theorem 358 (Universal Property of the Exterior Algebra).
1. Let V be a vector space over the field F . Then for any F -basis B of V ,
and total ordering (B, ≤) of B, we have an algebra isomorphism
ι : E(V ) → A(B)
between the exterior algebra and the algebra of ascending sequences,
A(B).
2. Conversely, if (B, ≤) is a totally ordered set, F is a field and V is
the free F -bimodule generated by B, then the isomorphism ι exists as
above.
3. If (A, +, ·) is an associative algebra over a field F , generated by the
multiplicative identity and a subset B satisfying
b1 · b1 = 0 and b1 · b2 = −b2 · b1 , for all b1 , b2 ∈ B,
then for any total ordering of B, there is a surjective algebra homomorphism
A(B) −→ A
which takes every ascending sequence τ = (b1 , · · · , bm ) in 𝒜(B), to the
product b1 · b2 · · · · · bm in A.
15.5 Transformations Induced on the k-fold Wedge Products: The Determinant as Group Homomorphism.
It has been pointed out in Lemma 354, page 521, that any f ∈ homF (V, V )
induces a grade-preserving endomorphism f̂ = ⊕_k (⊗^k f ) of T (V ) which
induces

    ⊗^k f = f ⊗ · · · ⊗ f (k factors)

on each homogeneous part T (V )k . Thus any transformation f in homF (V, V )
induces a transformation f ⊗ f on T2 . It takes u ⊗ u to f (u) ⊗ f (u), for any
u ∈ V , and so preserves the generating set of E2 , and hence E2 itself. Thus
the ideal

    J := T (V )E2 T (V )

is invariant under the induced action of homF (V, V ) on T (V ). This means
the endomorphism f̂ leaves the ideal J invariant. Thus f̂ induces an
endomorphism ⋀(f ) of E(V ) which leaves each graded part invariant, inducing
at each degree k an F -transformation

    ∧(k) f : ⋀^k (V ) −→ ⋀^k (V ).
Since ∧(k) f yields the mapping

    b1 ⊗ · · · ⊗ bk + J −→ f (b1 ) ⊗ · · · ⊗ f (bk ) + J,
we deduce
Lemma 359
(∧(k) g) ◦ (∧(k) f ) = ∧(k) (g ◦ f ) for all f, g ∈ homF (V, V )
remark: Unfortunately, for k ≥ 2, ∧(k) (f1 + f2 ) is not ∧(k) (f1 ) + ∧(k) (f2 ),
and so, if V is a left A-module for some F -algebra A, ⋀^k (V ) does not usually
assume the structure of an A-module under this action.
In the case that V has finite dimension n over F , the exterior algebra also
has finite dimension. We have

    E(V ) = F ⊕ V ⊕ ⋀^2 (V ) ⊕ · · · ⊕ ⋀^n (V ),

where each graded part ⋀^k (V ) has a basis consisting precisely of the elements
τ ∗ where τ ranges over the ascending sequences of length k in a totally
ordered basis B of V . Thus

    dimF (⋀^k (V )) = (n choose k) = n!/(k!(n − k)!),

the number of k-subsets of an n-set.
Note in particular that ⋀^n (V ) has dimension only 1. It follows from
Lemma 354 that for any f ∈ homF (V, V ), the transformation ∧(n) f is scalar
multiplication by a uniquely determined scalar det f , called the determinant
of f . Now by Lemma 359, we have
Corollary 360
1. For any σ, τ ∈ homF (V, V ),
det(σ ◦ τ ) = det(σ) det(τ ).
2. Suppose V is any finite-dimensional left A-module where A is an F algebra. Let U (A) be the group of units of A. Then the formula:
χV (u) := det u defines a group homomorphism
χV : U (A) → F ∗
into the multiplicative group of non-zero elements of the field F .
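As a numerical sketch (Python; our own construction, not the text's), one can
expand f (e1 ) ∧ · · · ∧ f (en ) in the single basis vector e1 ∧ · · · ∧ en ; the
resulting scalar is the familiar Leibniz sum, and it agrees with the usual
determinant:

    from itertools import permutations
    import numpy as np

    def sgn(p):
        # sign of a permutation via its inversion count
        inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
        return (-1) ** inv

    def top_wedge_scalar(F):
        """Scalar by which wedge^n f acts on the 1-dimensional space
        wedge^n(V), read off by antisymmetrizing the rows of F."""
        n = F.shape[0]
        return sum(sgn(p) * np.prod([F[i, p[i]] for i in range(n)])
                   for p in permutations(range(n)))

    F = np.array([[2.0, 1.0], [1.0, 3.0]])
    print(top_wedge_scalar(F), np.linalg.det(F))  # both (approximately) 5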
15.6 Symmetric and Alternating Multilinear Mappings.
We fix an integer k and F -vector spaces V and W . A k-multilinear mapping
f : V × · · · × V → W was defined to be a function which is linear in each
of its k arguments (see subsection 13.7). Such a mapping, as we have seen
in Theorem 349, factors through the tensor product V ⊗ · · · ⊗ V . In this
subsection we wish to prove a similar theorem for symmetric and alternating
mappings, in which the tensor products are replaced by the products "∨" or
"∧" in the symmetric or exterior algebras, respectively.
We say that a k-multilinear mapping f is symmetric if and only if f
is unchanged by the transposition of its i-th and (i + 1)-st variables for any
i = 1, · · · , k − 1. As a consequence, if f is symmetric and k-multilinear and
π is any permutation of the indices {1, · · · , k}, then
f (u1 , · · · , uk ) = f (uπ(1) , · · · , uπ(k) ).
We say that a k-multilinear mapping f is alternating if and only if, for
any k-tuple of vectors uj of V ,
f (u1 , · · · , uk ) = 0 whenever two of the ui are equal.
It is easy to deduce from this the following properties of an alternating
multilinear mapping f :

1. For each j = 1, · · · , k − 1, the transposition of uj and uj+1 changes the
sign of f (u1 , · · · , uk ).

2. For any permutation π of the indices {1, · · · , k},

    f (u1 , · · · , uk ) = f (uπ(1) , · · · , uπ(k) ) · sgn π.

3. f (u1 , · · · , uk ) = 0 whenever the {ui } are F -linearly dependent.
The main goal of this section is to prove the following:
Theorem 361
1. If one is given a symmetric k-multilinear mapping
f : V × · · · × V → W,
then there exists a linear transformation

    f̄ : V ∨ V ∨ · · · ∨ V (with k factors) → W

such that for each k-tuple (u1 , · · · , uk ) of vectors from V ,

    f̄ (u1 ∨ u2 ∨ · · · ∨ uk ) = f (u1 , · · · , uk ).

2. Suppose f is an alternating k-multilinear mapping,

    f : V × · · · × V → W.

Then there exists a linear transformation f ∗ : ⋀^k (V ) → W which, for
each k-tuple (u1 , · · · , uk ) of vectors of V , satisfies

    f ∗ (u1 ∧ u2 ∧ · · · ∧ uk ) = f (u1 , · · · , uk ).
Before proving Theorem 361, it will be helpful to review what we know
about Symmetric and Exterior Algebras, comparing the two as we go.
We had introduced two Hom(V, V )-invariant ideals of the tensor algebra
T (V ):
    I = T (V )S2 T (V ),   S2 = ⟨u ⊗ v − v ⊗ u | v, u ∈ V ⟩F ,

and

    J = T (V )E2 T (V ),   E2 = ⟨u ⊗ u | u ∈ V ⟩F .
These produced two algebra homomorphisms
    s : T (V ) → T (V )/I := S(V ), the symmetric algebra,
    e : T (V ) → T (V )/J := E(V ), the exterior algebra,

which induced products "∨" and "∧" defined by the equations

    s(a) ∨ s(b) := s(a ⊗ b),
    e(a) ∧ e(b) := e(a ⊗ b).
The images of the graded component T (V )k under s and e are:

    S^k (V ) := V ∨ · · · ∨ V (k factors) := s(T (V )k ) = (T (V )k + I)/I ≃ T (V )k /(T (V )k ∩ I),
    ⋀^k (V ) := V ∧ · · · ∧ V (k factors) := e(T (V )k ) = (T (V )k + J)/J ≃ T (V )k /(T (V )k ∩ J).
These spaces are spanned by the sets
{s(u1 ) ∨ · · · ∨ s(uk )|ui ∈ V } and {e(u1 ) ∧ · · · ∧ e(uk )|ui ∈ V },
respectively, and these are the domains for the functions f¯ and f ∗ appearing
in Theorem 361.
Now choose a basis B of V endowed with a total ordering (B, ≤). As
before we let Fk be the set of all nondecreasing sequences σ = (b1 , . . . , bk )
where b1 ≤ · · · ≤ bk , bj ∈ B. For such a σ we write

    σ̂ := b1 ⊗ b2 ⊗ · · · ⊗ bk ∈ T (V ), and
    σ̄ := s(b1 ) ∨ · · · ∨ s(bk ) ∈ S(V ).

Similarly, let Ak be the set of all properly ascending sequences τ = (b1 , · · · , bk )
of length k – thus b1 < b2 < · · · < bk , bj ∈ B. For such a τ we write

    τ ∗ := e(b1 ) ∧ · · · ∧ e(bk ).
The import of Theorems 357 and 358 was the following.
Lemma 362
1. The set B̄k := {σ̄|σ ∈ Fk } is an F -basis for S k (V ).
2. The set B∗k := {τ ∗ | τ ∈ Ak } is an F -basis for ⋀^k (V ).
To prove Theorem 361 it will be useful to establish two technical lemmas.
Lemma 363 For each σ = (b1 , · · · , bk ) ∈ Fk and index j = 2, · · · , k, set
gj (σ) := b1 ⊗ b2 ⊗ · · · ⊗ bj−2 ⊗ (bj−1 ⊗ bj − bj ⊗ bj−1 ) ⊗ bj+1 ⊗ · · · ⊗ bk .
Then the collection
{gj (σ)|σ ∈ Fk , j = 2, · · · , k}
spans T (V )k ∩ I as a vector space.
proof: It is clear that each gj (σ) lies in T (V )k ∩ I. Let U be the F -span of
the elements gj (σ), for all j ≥ 2 and σ ∈ Fk .
Then any tensor monomial, b1 ⊗ · · · ⊗ bk (with factors not necessarily in
non-decreasing order) is congruent modulo U to one in which the (j − 1)-st
and j-th factors have been transposed, j = 2, . . . , k. It follows that it is
congruent modulo U to one which has its factors in non-decreasing order.
Thus any element a ∈ (T (V )k ∩ I) − U has the form

    a = Σ_σ σ̂ · ασ + u   (15.9)
where, for σ = (b1 , . . . , bk ) ∈ Fk , σ̂ := b1 ⊗ · · · ⊗ bk , only finitely many of
the ασ are non-zero, and u is some vector in the subspace U . Now we apply
the algebra homomorphism s to both sides of equation (15.9). Since a ∈ I,
s(a) = 0 and so one obtains
    s(a) = 0 = Σ_σ s(σ̂)ασ = Σ_σ σ̄ασ .

Since B̄k is a basis for S^k (V ), the σ̄ are linearly independent, and so each
coefficient ασ is zero. Thus a ∈ U , contrary to its choice. Thus U = T (V )k ∩ I.
Similarly
Lemma 364 Let F′k be the collection of all non-decreasing sequences σ =
(b1 , . . . , bk ) of elements of B which are not all distinct – that is, bj = bj+1
for at least one index j ≤ k − 1. Also, for each strictly ascending sequence
τ = (b1 , . . . , bk ) ∈ Ak and index j = 2, . . . , k, set
hj (τ ) := b1 ⊗ · · · ⊗ bj−2 ⊗ (bj−1 ⊗ bj + bj ⊗ bj−1 ) ⊗ bj+1 ⊗ · · · ⊗ bk .
Then the collection
{hj (τ ) | τ ∈ Ak , j = 2, . . . , k} ∪ {σ̂ | σ ∈ F′k }
spans T (V )k ∩ J.
proof: This proof is very similar to that of the previous lemma. It is clear
that for each σ ∈ F′k , σ̂ lies in J, and that each element hj (τ ), τ ∈ Ak , lies in
J ∩ T (V )k . Again let U be the F -span of these elements. Now an arbitrary
tensor monomial b1 ⊗· · · ⊗ bk , (bi ∈ B) is congruent modulo U to the additive
inverse of the monomial obtained from it by transposing two adjacent tensor
factors. It follows that if π is any permutation of the indices {1, . . . , k},
b1 ⊗ · · · ⊗ bk ≡ bπ(1) ⊗ bπ(2) ⊗ · · · ⊗ bπ(k) · sgn(π) mod U.
Thus any tensor monomial of degree k in the elements of B, is congruent
modulo U to ±1 times one in which the factors are in non-decreasing order
(reading from left to right). If two of these factors are equal, the element is
±σ̂ for some σ in F′k , and hence is in U . Thus, finally, we can say that any
tensor monomial of degree k in the elements of B, which is not already in
U , is congruent (modulo U ) to ±τ̂ , where τ = (b1 , . . . , bk ) ∈ Ak is strictly
increasing.
Now suppose b is an element of (T (V )k ∩ J) − U . Since the monomials of
degree k in the elements of B form a basis for T (V )k , and each such monomial
is either in U or an element of U plus some τ̂ , τ ∈ Ak , we have
    b = Σ_τ τ̂ βτ + u,
where τ ranges over a finite subset of Ak , the βτ are non-zero scalars, and u
is some element of U . Applying the algebra homomorphism e to both sides
and noting that b ∈ J implies e(b) = 0, we have
    0 = Σ_τ e(τ̂ )βτ = Σ_τ τ ∗ βτ .
Now by Lemma 362, B∗k is a basis for ⋀^k (V ) = e(T (V )k ), and so the elements
τ ∗ in the sum are linearly independent. Hence each βτ is zero, and this yields
b ∈ U , a contradiction.
Thus U = T (V )k ∩ J.
proof of theorem 361: Now suppose
f : V × V × ··· × V → W
is a k-multilinear mapping. The universal property of the tensor product
shows that there is a mapping
T (f ) : T (V )k → W
such that for each k-tuple (u1 , . . . , uk ) ∈ V × · · · × V = V (k) ,
T (f )(u1 ⊗ · · · ⊗ uk ) = f (u1 , . . . , uk ).
Suppose now f is symmetric. Then clearly T (f ) vanishes at all vectors
gj (σ), σ ∈ Fk . Since these vectors span I ∩ T (V )k (Lemma 363), it follows that
T (f ) vanishes on T (V )k ∩ I. It follows that T (f ) induces a homomorphism

    f̄ : S^k (V ) = T (V )k /(T (V )k ∩ I) −→ W

taking u1 ∨ · · · ∨ uk to f (u1 , . . . , uk ) for all (u1 , . . . , uk ) ∈ V (k) .
On the other hand, if f is alternating, T (f ) vanishes on the σ̂, σ ∈ F′k , and
on all hj (τ ), τ ∈ Ak , and hence by Lemma 364, vanishes on J ∩ T (V )k . It
follows that T (f ) induces

    f ∗ : ⋀^k (V ) = T (V )k /(T (V )k ∩ J) −→ W

taking u1 ∧ · · · ∧ uk to f (u1 , . . . , uk ) for all (u1 , . . . , uk ) ∈ V (k) .
Chapter 16

Sesquilinear Forms.

16.1 σ-sesquilinear Forms.

16.1.1 Basic definitions
At the very beginning, we can return temporarily to the generality of division
rings.
Our starting point will be a right vector space V over a division ring D.
Recall that an antiautomorphism of D is an automorphism σ of the additive
group (D, +) such that (ab)σ = bσ aσ for all (a, b) ∈ D × D. A D-linear
transformation, as we know, is a D-homomorphism of right vector spaces.
A σ-semilinear transformation is an additive group homomorphism t :
V → W from a right D-vector space V into a left D-vector space W such
that
    t(vα) = ασ t(v) for all v ∈ V, α ∈ D.

In particular, a semilinear functional of V is a group homomorphism
τ : (V, +) → (D, +) such that, for some antiautomorphism σ of D, we have

    τ (va) = aσ τ (v) for all v ∈ V, a ∈ D.
(When σ = id, the identity map on D, so that D is a field, a semilinear
functional becomes a (linear) functional as previously defined.) 1
1 Note that we cannot write 1D for id, the identity mapping on D, since the symbol 1D
already indicates the multiplicative identity of D in our previous notation on rings.
A mapping f : V × V → D is said to be biadditive if and only if, for
any (u, v, w) ∈ V × V × V ,
    f (u, v + w) = f (u, v) + f (u, w)
    f (u + v, w) = f (u, w) + f (v, w).
A mapping f : V × V → D is said to be a σ-sesquilinear form if and
only if f is biadditive, and
    f (xa, yb) = aσ f (x, y)b.   (16.1)
Some immediate consequences of the definitions are recorded here for
future use:
Lemma 365 If the σ-sesquilinear form f is not the zero mapping, then the
following statements are true:
1. There exists a pair (x, y) ∈ V × V such that f (x, y) = 1.
2. The antiautomorphism σ is uniquely determined by f .
3. For any non-zero scalar c in D, the function cf : V × V → D (whose
value at (u, v) is c · f (u, v)) is also a non-zero τ -sesquilinear form, where
τ is the antiautomorphism taking any scalar a ∈ D to c aσ c−1 – that
is, τ is the composition of σ followed by conjugation by c−1 . (In this
case we say that the form cf is proportional to f .)
16.1.2 Forms and functionals
Note that there are associated with f two families of semilinear functions.
For each vector u ∈ V , the mapping λu : V → D which takes any vector
v to f (u, v) is a (linear) functional, while the mapping ρu : V → D which
takes any vector v to f (v, u) is a semilinear functional associated with the
antiautomorphism σ.
Now suppose f : V × V → D is a σ-sesquilinear form. We have just
observed that for each vector u ∈ V , the “left mapping” λu : V → D defined
by the formula λu (v) = f (u, v), is D-linear, and so λu ∈ V ∗ , the dual space.
The important observation is this:
Lemma 366 The mapping λ(f ) : V → V ∗ which takes each vector u to λu
is a σ-semilinear transformation. Conversely, if λ : V → V ∗ is σ-semilinear,
and f : V × V → F is defined by the formula
f (u, v) := λ(u)(v),
(that is, the functional λ(u) applied to vector v), then f is a σ-sesquilinear
form.
16.1.3 Descriptions of forms by Grammian matrices
We also remark that when V has finite D-dimension, any σ-sesquilinear
form f can be described by a matrix. First suppose B = {v1 , . . . , vn } is an
ordered basis of V as a right D-module. One can form the matrix G = [f ]B
whose (i, j)-th entry is f (vi , vj ), called the Grammian of f with respect to
the basis B. Then G determines all values of f , for if

    u = Σ_i vi αi and w = Σ_j vj βj ,

then

    f (u, w) = Σ_{i,j} αiσ f (vi , vj )βj .
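A short computational sketch (numpy; the Grammian and the vectors are
invented, with F = C and σ = complex conjugation):

    import numpy as np

    # Once G = (f(v_i, v_j)) is known, every value of the form follows
    # from coordinates: f(u, w) = sum_{i,j} conj(alpha_i) G[i,j] beta_j.
    G = np.array([[1, 1j], [-1j, 2]])  # an invented Grammian
    alpha = np.array([1, 2j])          # u = v1 + v2*(2i)
    beta = np.array([1, 1])            # w = v1 + v2
    print(np.conj(alpha) @ G @ beta)   # the single scalar f(u, w)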
16.2 Reflexive forms.

16.2.1 Perps and radicals of reflexive forms.
A σ-sesquilinear form is reflexive if the relation
f (x, y) = 0, (x, y ∈ V )
is symmetric.
If f is reflexive then for any x ∈ V we put:
x⊥ = {y ∈ V : f (x, y) = f (y, x) = 0}
and for any non-empty subset S of V ,
    S ⊥ = ∩x∈S x⊥ .
We also denote V ⊥ by Rad(f ), the radical of f , and we say that f is nondegenerate if Rad(f ) = 0, while otherwise f is degenerate.
Note that for any x ∈ V , the functional λx : V → D is D-linear, with
kernel equal to x⊥ . Thus, codimV (x⊥ ) ≤ 1.
Theorem 367 Suppose f is a reflexive σ-sesquilinear form over the right
D-vector space V . Let V̄ := V /Rad(f ). Then there is an induced form
f¯ : V̄ × V̄ → D which is non-degenerate.
proof: The elements of vector space V̄ are cosets u + Rad(f ). Then, setting

    f̄ (u + Rad(f ), v + Rad(f )) := f (u, v),

it is easy to see that this value is independent of the choices of coset
representatives u and v, and that f̄ is a reflexive σ-sesquilinear form on V̄ .
If w̄ = w + Rad(f ) ∈ Rad(f̄ ), then f (w + Rad(f ), V + Rad(f )) = f (w, V ) = 0,
so w ∈ Rad(f ). Thus w̄ = 0̄, and f̄ is non-degenerate.
16.2.2 (σ, ε)-Hermitian forms.
Theorem 368 Let f be a reflexive, σ-sesquilinear form on V , and assume
that codimV (Rad(f )) ≥ 2. Then there exists ε in D such that:

    f (y, x) = f (x, y)σ ε for all x, y ∈ V.   (16.2)

Moreover, we then have

    εσ = ε−1 , and   (16.3)

    aσ² = ε a ε−1 for all a ∈ D.   (16.4)
Proof: Let x ∈ V − Rad(f ), choose y ∈ V with f (x, y) = 1, and set
ε = f (y, x). For any z ∈ V we can write z = ya + w with w ∈ x⊥ and
a ∈ D. Then f (x, z) = f (x, ya) = a, while f (z, x) = f (ya, x) = aσ f (y, x) =
f (x, z)σ ε. That is:

    f (z, x) = f (x, z)σ ε for all z ∈ V.   (16.5)
We may write ε = εx to emphasize that ε depends only on x. Observe
that for any t ∈ x + y ⊥ we have f (t, y) = 1 and f (y, t) = ε. This yields:

    εx = εt if t ∈ x + y ⊥ .   (16.6)

But if we replace y by an element y′ in y + x⊥ , the new y′ enjoys the same
properties:

    f (x, y′ ) = 1, and f (y′ , x) = ε,

which, for y, implied (16.6). Thus we may strengthen that equation to read:

(*) For each y′ ∈ y + x⊥ and each t ∈ x + (y′ )⊥ , one has εx = εt .
Now suppose by way of contradiction that there is a vector u in V − Rad(f )
such that εu ≠ εx . Choose v ∈ V so that f (u, v) = 1. If one could find a
vector w, say, in (x + (y′ )⊥ ) ∩ (u + v ⊥ ), then by (*) and by equation (16.6)
(with (u, v) in the role of (x, y)), one could conclude εx = εw = εu , a
contradiction. Thus, for all y′ ∈ y + x⊥ , we must have (x + (y′ )⊥ ) ∩ (u + v ⊥ )
empty, and as this is the intersection of two cosets of hyperplanes, this can
only happen if they are cosets of the same hyperplane. This means

    v ⊥ = (y′ )⊥ for all y′ ∈ y + x⊥ .   (16.7)

But equation (16.7) declares that any element of v ⊥ is perpendicular to all
elements of y + x⊥ and hence to the entire subspace they generate. The
latter subspace contains all differences of elements of y + x⊥ (which form the
hyperplane x⊥ itself) as well as y ∉ x⊥ , and so is all of V . Thus v ⊥ ≤ V ⊥ =
Rad(f ), contrary to the assumption that codimV Rad(f ) ≥ 2.
Thus equation (16.2) of the statement of the theorem holds.
Now 1 = f (x, y) = f (y, x)σ ε = εσ ε, so εσ = ε−1 . Also, for any a ∈ D,
a = f (x, ya) and f (ya, x) = aσ ε. So

    a = f (x, ya) = f (ya, x)σ ε = εσ aσ² ε, whence aσ² = ε a ε−1 .

The proof is complete.
Any sesquilinear form satisfying equation (16.2) is called a (σ, ε)-Hermitian
form.
remark: Suppose f is a reflexive form satisfying equation (16.2) for some
constant ε. If dimD V is finite, and f has the Grammian G with respect to
some basis, then

    Gᵗ = Gσ ε,

where Gᵗ denotes the transpose of G and Gσ is G with the antiautomorphism
σ applied to each entry.
16.2.3 Totally isotropic subspaces; symplectic forms.
Given a σ-sesquilinear form, f , we say that a vector v (or the 1-space hvi) is
isotropic (relative to f ) if and only if f (v, v) = 0. Notice that this definition
makes perfect sense even when f is not reflexive. A subspace A of V is said
to be totally isotropic if and only if f (A, A) = 0. (When f is reflexive,
so that the “perp” symbol makes sense, one can say this another way: A is
totally isotropic if and only if A ⊆ A⊥ .)
We say that a σ-sesquilinear form is symplectic if and only if it is reflexive,
σ = id (from which it follows that D is a field), and that every vector is
isotropic (from which it then follows that f (u, v) = −f (v, u) for all (u, v) ∈
V × V ).
Here is a characterization of symplectic forms.
Theorem 369 Let V be a right vector space over a division ring D and let
f : V × V → D be a non-zero σ-sesquilinear form. If every vector in V is
isotropic, then D is a field and f is a symplectic form.
proof: Since f is not zero, there exist two vectors, u and v, such that
f (u, v) = γ ≠ 0. Since every vector is isotropic, Rad(f ) cannot have
codimension 0 or 1 in V (a form all of whose vectors are isotropic, and whose
radical has codimension at most 1, is identically zero); thus codimV (Rad(f )) ≥ 2.
Moreover, for every scalar α ∈ D,

    0 = f (u + vα, u + vα) = f (vα, u) + f (u, vα).

Thus

    ασ f (v, u) = −f (u, v)α for all α ∈ D.   (16.8)

Since this holds for α = 1,

    f (u, v) = −f (v, u)
for all (u, v) such that f (u, v) ≠ 0. But of course it holds otherwise as well.
This identity shows that f is reflexive (so that Theorem 368 applies), that
σ = id, and that the ε of Theorem 368 is −1. Thus f is symplectic.
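For example, on V = F² the form f (u, v) := u1 v2 − u2 v1 , whose Grammian
with respect to the standard basis is ((0, 1), (−1, 0)), has every vector
isotropic, and is the prototypical symplectic form.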
Corollary 370 Suppose f is a non-zero (σ, ε)-Hermitian form associated
with antiautomorphism σ and constant ε as in equation (16.2) in Theorem
368. Assume σ ≠ id if ε = −1. Then there exists an element δ of D such
that ε = δ σ δ −1 .
proof: If ε = 1, take δ = 1. If ε = −1 ≠ 1, then D has odd characteristic and
σ acts on D (as a vector space over its center) as an element of order 2 (for σ²
is conjugation by the central element −1), and σ ≠ id, by assumption. Then
σ possesses an eigenvector for the eigenvalue −1. Letting δ be such an
eigenvector, we have δ σ = −δ, as desired. So we may assume f is not
symplectic. Then there exists a vector u such that f (u, u) = δ −1 ≠ 0. Then
δ −1 = (δ −1 )σ ε, whence ε = δ σ δ −1 .
Corollary 371 Assume that F is a (commutative) field. Then, in the preceding
Theorem 368, we have σ² = id, and one of the following holds:

1. σ = id and ε = ±1.

2. σ ≠ id and f is proportional to a σ-sesquilinear form g such that
g(y, x) = g(x, y)σ for all x, y ∈ V .

Proof: Equation (16.4) of the theorem says that σ² = id, and equation
(16.3) says that if σ = id then ε = ±1. Suppose that σ ≠ id. By Corollary 370
and the fact that F is commutative, there is then an element δ of F with
δ −1 δ σ = ε. Put g = δf . Then g(x, y)σ = f (x, y)σ δ σ , and g(y, x) = δ f (y, x) =
δ f (x, y)σ ε = f (x, y)σ δ σ . Thus, (2) holds.
16.3 Finite-dimensional subspaces of spaces with a non-degenerate reflexive form.
Let f : V × V → D be a σ-sesquilinear form. As usual, V is a right vector
space over D and σ is an antiautomorphism of D. Recall that the form f
is said to be reflexive if and only if for any vectors u, v ∈ V , f (u, v) = 0 if
and only if f (v, u) = 0. In this case, for any subspace W of V one writes
W ⊥ for the set of vectors v such that f (v, W ) := {f (v, w)|w ∈ W } = 0.
This is always a subspace of V : one writes Rad(f ) for V ⊥ , and says that f
is non-degenerate exactly when Rad(f ) = 0. One notes that f induces a
non-degenerate reflexive σ-sesquilinear form on V̄ := V /Rad(f ) in a natural
way (Theorem 367). Finally, there is this structural result (Theorem 368):

If Rad(f ) has codimension at least 2 in V , then there exists a constant
ε in D such that f (u, v) = f (v, u)σ ε for all vectors u, v ∈ V . Moreover,
εσ = ε−1 and σ² is conjugation by ε−1 .
We have more or less summarized the previous section.
From this point onward, we consider only (F, F )-bimodules over a field
F satisfying
    αv = vα for all (α, v) ∈ F × V.   (16.9)
We shall refer to all such objects simply as “vector spaces” over F .2
Now suppose f : V × V → F is a reflexive σ-sesquilinear form. Then,
of course, we have a definition of “⊥”. For each vector u ∈ V , u⊥ = {v ∈
V |f (u, v) = 0} = ker λu has codimension at most one in V . As before, for
any subset X ⊆ V , we write X ⊥ := ∩x∈X x⊥ .
We say that a subspace U of V is non-degenerate (with respect to the
reflexive σ-sesquilinear form f ) if and only if
U ∩ U ⊥ = 0.
At the other extreme, if U ⊆ U ⊥ , we say that U is totally isotropic.
Now, something new: If U and W are subspaces of V , we write U ⊥ W
if U + W = U ⊕ W and U ⊆ W ⊥ – that is U ∩ W = 0 and f (u, w) = 0 for
all (u, w) ∈ U × W .
We now state
Theorem 372 Suppose U is a finite dimensional non-degenerate subspace of
a (possibly infinite-dimensional) vector space V with respect to some reflexive
non-degenerate σ-sesquilinear form on V . Then
V = U ⊕ U ⊥.
proof: Let f be the given reflexive σ-sesquilinear form. Let λ : V → U ∗
be the σ-semilinear mapping which sends v ∈ V to λv |U . This means λ =
resU ◦ λ(f ) where resU : V ∗ → U ∗ is the mapping which restricts a functional
of V to the subspace U . Now the kernel of λ is U ⊥ and the restriction of λ to
U has kernel U ∩ U ⊥ = 0. But since U is finite-dimensional, dim U = dim U ∗ .
Thus
codimV (U ⊥ ) = codimV (ker λ) ≤ dim U ∗ = dim U.
It follows that U + U ⊥ = V and the conclusion holds.
2 We called these "symmetric" bimodules.
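A computational sketch of the theorem (numpy; the symmetric form and the
subspace below are invented for illustration): over F = R with Grammian G,
the perp of U = rowspace(M) is the null space of M G, and for a non-degenerate
U the dimensions of U and U⊥ add up to dim V.

    import numpy as np

    def perp(M, G):
        """Rows spanning the f-perp of rowspace(M), where the reflexive
        form f has (symmetric) Grammian G: u is in it iff (M G) u = 0."""
        A = M @ G
        _, s, Vt = np.linalg.svd(A)
        rank = int(np.sum(s > 1e-10))
        return Vt[rank:]  # an orthonormal basis of the null space of A

    G = np.diag([1.0, 1.0, 1.0, -1.0])    # non-degenerate symmetric form
    M = np.array([[1.0, 0.0, 0.0, 0.0]])  # U = <e1>, non-degenerate
    print(perp(M, G).shape[0])            # 3, so dim U + dim Uperp = 4 = dim V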
remarks: (1) A version of this theorem also holds for σ-sesquilinear forms
on right vector spaces over a non-commutative division ring D. However
one then has the awkwardness of having to discuss a sort of transformation
V → V ∗ from a right vector space to a left vector space. That would require
still another generation of definitions without adding any real insights beyond
those introduced here, and so I have elected to leave that cranny to the
curious student.
(2) This theorem fails completely when U has infinite dimension. There
is a non-degenerate form (V, f ) with a non-degenerate subspace U of codimension 2 in V for which U + U ⊥ still has codimension 1 in V (Exercise
??).
Theorem 373 Let V be a (possibly infinite dimensional) vector space over
field F and let (V, f ) be a non-degenerate reflexive sesquilinear form. Suppose
A is a finite-dimensional totally isotropic subspace of V . Then the following
hold:
1. There is a linear surjection ζ : V → A∗ with kernel A⊥ . As a result,
dim V ≥ 2 dim A.
2. Rad(A⊥ /A) = 0, or, equivalently, A⊥⊥ = A.
3. There exists a non-degenerate subspace C containing A, having dimension 2 × dim A.
4. A⊥ = A ⊥ X where X is a non-degenerate subspace.
Remark: Note that in part 4, we cannot conclude that V = X ⊥ X ⊥ since
X may be infinite-dimensional.
proof: 1. For each vector v ∈ V let ζ(v) be the functional of A which takes
each vector a in A to the scalar f (v, a). Then ζ is a linear transformation,
V → A∗ , into the dual space of A. Since A has finite dimension, if ζ(V ) were
a proper subspace of A∗ , the intersection of the kernels of the functionals in
ζ(V ) would contain a non-zero vector a0 . Then clearly a0 ∈ V ⊥ = Rad(V ),
against (V, f ) being non-degenerate. Thus ζ is a surjection. The elements
of V which map to the zero-functional of A clearly comprise A⊥ . It follows
that
    dim A = dim A∗ = dim ζ(V ) = dim(V / ker ζ) = codimV (A⊥ ).
Since A ≤ A⊥ , dim V ≥ dim A + codimV (A⊥ ) = 2 × dim A.
2. and 3. Choose a finite basis a1 , . . . , an for the given totally isotropic
subspace A. Since ζ is a surjection, there are elements v1 , . . . , vn of V which
map onto a dual basis of the {ai }. This means exactly that
1. for 1 ≤ i, j ≤ n, f (vi , aj ) = δij , and
2. the cosets vi + A⊥ are a basis for V /A⊥ ,
where δij is the familiar Kronecker “delta”.
Since A is totally isotropic, A ⊆ A⊥ , so A ⊆ A⊥⊥ . Now suppose y were
an element of A⊥⊥ − A. Set γi := f (vi , y), i = 1, . . . , n, and then set
    u := y − Σ_{j=1}^n aj γj .

Then u is also an element of A⊥⊥ − A, and hence non-zero, and

    f (vi , u) = f (vi , y) − Σ_j f (vi , aj )γj = γi − γi = 0, i = 1, . . . , n.
Since also u is in A⊥⊥ , it is now perpendicular to every element of
    V = F v1 ⊕ F v2 ⊕ · · · ⊕ F vn ⊕ A⊥ .
This contradicts V being non-degenerate.
So no such vector y exists, and A = A⊥⊥ .
Now set C := A ⊕ F v1 ⊕ F v2 ⊕ · · · ⊕ F vn . The restriction of ζ to C still
induces a surjection C → A∗ , and so A = A⊥ ∩ C. Then Rad(C) = C ⊥ ∩ C ⊆
A⊥ ∩ C = A. But if r = Σ_i ai βi were in Rad(C), then 0 = f (vi , r) = βi for all
i, so r = 0. Thus Rad(C) = 0, so C is non-degenerate.
4. Since vector spaces over a division ring are semisimple, they are complemented modules. Thus A has a vector space complement X in A⊥ . Now,
since part 2 implies A = Rad(A⊥ ), the induced form (A⊥ /A, f̄ ) is non-degenerate by Theorem 367. But there is an isometry
(A⊥ /A, f¯) → (X, f |X ),
and so X is non-degenerate.
16.4 The Witt index of Polar Systems.
In the next chapter we shall study quadratic forms. In characteristic two
these provide new geometric systems which cannot be described in terms
of sesquilinear forms alone. Yet, there are many features which they have
in common with the geometry of sesquilinear forms. Both structures can be
approached uniformly through the notion of a “polar system”.
Throughout this section, (V, f ) will be a reflexive σ-sesquilinear form on
a right vector space V over a division ring D. In the notation of the previous
section, X ⊥ := {v ∈ V | f (x, v) = 0 for all x ∈ X} for any subset X of V ,
and we write radf (V ) = V ⊥ . By Theorem 368, if dim(V /radf (V )) ≥ 2,
there exists a constant scalar ε such that for any two vectors x and y,

    f (x, y) = f (y, x)σ ε.
If W is a subspace of V , then fW is the sesquilinear form on W obtained
by restriction of the domain of f to W × W , and for any subspace R of
rad(W ), fW/R is the form on W/R induced by fW .
16.4.1 Polar systems
A polar system with respect to (V, f ) is a collection T of totally isotropic
subspaces subject to these two properties:
1. If A ≤ B ∈ T then A ∈ T , that is, T is closed under taking subspaces.
2. If A, B ∈ T , and A ⊆ B ⊥ , then A + B ∈ T .
Some examples:
1. The collection of all totally isotropic subspaces of V .
2. Suppose T is a polar system with respect to (V, f ) and suppose that
W and R are subspaces of V with R ≤ W ⊥ .
(a) The collection TW of all members of T which are subspaces of W
is a polar system with respect to both (V, f ) and (W, f |W ).
(b) The collection, T(W +R)/R , of all subspaces (T + R)/R for which
T ∈ TW +R , is a polar system with respect to ((W + R)/R, f(W +R)/R ).
3. If D is a field and Q is a quadratic form whose associated bilinear form
is (V, f ), then the collection of all subspaces on which Q vanishes forms
a polar system (see the next chapter).
4. If f is the zero form, all subspaces of V are totally isotropic. In that
case the collection of all finite dimensional subspaces of V forms a polar
system.
Of course, a polar system forms a poset under the inclusion relation. Does
this system have maximal elements? The last example shows that it may not.
Exercise ?? at the end of this chapter gives an example of a polar system
in which maximal elements exist, but it is not true that every member of the
system lies in a maximal element.
16.4.2 The radical of a polar system
If T is a polar system, let M(T ) denote the collection of all maximal elements
of the polar system T . As just noted, M(T ) could very well be empty.
The intersection over all members of M(T ) is called the radical of T and is denoted by the symbol rad(T ). If M(T ) is empty, the radical is understood to be the space ⟨T ⟩ generated by all subspaces in T .
The polar system is non-degenerate if and only if rad(T ) = 0. (Note
that if no maximal elements of T exist, then it cannot be non-degenerate.)
Lemma 374 Let T be a polar system relative to the reflexive sesquilinear
form (V, f ). Suppose V = hT i. Then the following statements are true:
1. If T contains at least one maximal member, then the sesquilinear radical
radf (V ) contains a unique subspace R maximal with respect to belonging
to T .
2. Moreover, if every member of T lies in a maximal member of (T , ≤),
then this maximal subspace R coincides with rad(T ).
3. If the radical of T lies in the sesquilinear radical, then the induced polar
system TV /rad(T ) is non-degenerate.
proof: Let A be a maximal member of the polar system T . Set R = A ∩
radf (V ). Then if T is an element of T contained in radf (V ), we see that
T + A ∈ T and so T ≤ R. Thus R is the unique maximal element of the
poset of subspaces of T which lie in the sesquilinear radical.
Also, the argument involving T and A in the previous paragraph can be
repeated with R and an arbitrary maximal element of T in these respective
roles, to conclude that R ≤ rad(T ).
Now assume that every member of T lies in a maximal member. Then the
radical of T lies in the “perp” of every member of T and so lies in radf (V ).
Thus rad(T ) ≤ R in this case, and so the two are equal.
Suppose R = rad(T ) lies in radf (V ), the sesquilinear radical. Then there
is an induced form fV /R on V /R, and an induced polar system TV /R whose
maximal elements are all of the form A/R where A is maximal in T . Since
such A intersect at R, the last part of the lemma is proved.
16.4.3 Finite rank, and the Witt index of a polar system
Again T is a polar system with respect to a reflexive sesquilinear form (V, f ).
Theorem 375 Suppose A and B are two members of M(T ).
1. Then A⊥ ∩ B = A ∩ B.
2. Suppose A/(A ∩ B) has finite dimension. Then dim(B/(A ∩ B)) ≤ dim(A/(A ∩ B)).
3. If rad(T ) has finite codimension in at least one member of M(T ), then it has the same finite codimension in all members of M(T ). (This constant codimension is called the Witt index of the polar system T .)
proof: 1. Since the members of T are totally isotropic subspaces, A ⊆ A⊥ .
Thus A∩B ⊆ A⊥ ∩B. Now by the first property of a polar system, A⊥ ∩B ∈
T . Then, by the second property, we see that
A + (A⊥ ∩ B) ∈ T .    (16.10)
But since A is a maximal element of (T , ≤), we have that the left side of
equation (16.10) is just A itself. Thus A⊥ ∩ B ≤ A ∩ B. This proves part 1.
2. We define a linear transformation α : B → (A/(A ∩ B))∗ as follows:
For each element b ∈ B, α(b) is the functional on A/(A ∩ B) which sends each coset a + A ∩ B to
f (b, a + A ∩ B) = f (b, a),
which is a single scalar, since B is totally isotropic with respect to f . Now
the kernel of α is the set B ∩ A⊥ , which by part 1 is B ∩ A. Thus α induces
an embedding B/(B ∩ A) into the dual space (A/(A ∩ B))∗ . Since A/(A ∩ B)
is finite-dimensional by the hypotheses of this part of the theorem, we have
dim B/(A ∩ B) ≤ dim(A/(A ∩ B))∗ ≤ dim A/(A ∩ B).
It follows that dim(B/(A ∩ B)) ≤ dim(A/(A ∩ B)).
3. Suppose R = rad(T ), the radical of T , has finite codimension in some A ∈ M(T ). Then, for any further element B ∈ M(T ), A ∩ B contains R, and so dim A/(A ∩ B) is finite. Then by part 2, dim B/(A ∩ B) ≤ dim A/(A ∩ B), so dim(B/(A ∩ B)) is also finite. That being so, the roles of A and B can be reversed to also yield dim(A/(A ∩ B)) ≤ dim(B/(A ∩ B)). Thus A ∩ B has the same codimension in both A and B. Since R has finite codimension in A ∩ B, it follows that R has the same finite codimension in both A and B. The proof is complete.
We say that T has finite rank or has finite Witt index if and only
if the radical of T has finite codimension in some member of M(T ). Note
that this hypothesis entails the fact that M(T ) is non-empty. In fact more
is true.
Corollary 376 Suppose the polar system T has finite rank. Then every
element of T lies in a maximal member.
proof: By hypothesis there is a subspace A ∈ M(T ) such that dim(A/rad(T )) is a finite number d. Suppose there were a member T of T which was contained in no member of M(T ). Then the same is true of T + rad(T ). So, without loss, we may assume that T contains rad(T ). Then there must be a properly ascending infinite chain of members of T beginning at T . It follows that there is a subspace S ∈ T , containing T with codimension larger than 2d.
Now A⊥ ∩ S = A ∩ S is the kernel of the linear transformation
α : S → (A/(A ∩ S))∗ ,
defined by setting
α(s)(a + (A ∩ S)) = f (s, a),
for each s ∈ S and a ∈ A. Thus
dim(S/(A ∩ S)) ≤ dim((A/(A ∩ S))∗ ) = dim(A/(A ∩ S)) ≤ d.    (16.11)
Thus
dim(S/(A ∩ T )) = dim(S/T ) + dim(T /(A ∩ T ))
= dim(S/(A ∩ S)) + dim((A ∩ S)/(A ∩ T ))
≤ d + d = 2d.
So dim(S/T ) is bounded by 2d. This contradicts the choice of S.
16.4.4 A brief interlude with graph-theoretic concepts
In the section following, we wish to reveal some special features of the system
M(T ) of maximal members of a polar system of finite rank. These are most
easily described in terms of the distance metric of a graph. Although most
of these concepts are familiar to most readers, the important concept of
“gatedness” may not be quite so familiar.
In any simple graph (V ′ , E ′ ), a walk of length d from vertex x to vertex y is a sequence of vertices (x = x0 , . . . , xd = y), where each {xi , xi+1 } is a member of the edge set E ′ . A graph is connected if and only if, given any two vertices, there is a walk of some finite length from one to the other. For a connected graph G = (V ′ , E ′ ), there is a natural metric dG : V ′ × V ′ → N,
which, at any pair of vertices, records the length of a shortest possible walk
from one to the other. Thus, for x and y in V ′ ,
1. dG (x, y) is a non-negative integer,
2. dG (x, y) = dG (y, x) and
3. dG (x, y) = 0 if and only if x = y.
For any subset X of the vertex set V ′ , let E ′X be the set of all edges of E ′ whose incident vertices both lie in X. Then (X, E ′X ) (with the restricted
incidence of vertices and edges) is called the subgraph induced by X or
simply an induced subgraph of (V ′ , E ′ ). A clique is an induced subgraph any two of whose vertices form an edge.
Suppose L = (X, E ′X ) is an induced subgraph of G := (V ′ , E ′ ). We say that L is gated with respect to vertex t if and only if there exists in L a vertex gt (whose choice may depend on t) such that for any vertex x in X,
dG (t, x) = dG (t, gt ) + dG (gt , x).    (16.12)
This means that the distance in G from t to any vertex x of X can be
“measured through the vertex” gt . For that reason, the latter is called the
gate of L with respect to vertex t.
Right away one notices two things: (1) Given that L is gated relative to t, the "gate" gt is unique. (2) If t is itself a vertex of X, then t is itself its own
gate.
An induced subgraph is said to be gated if and only if it is gated with respect to every vertex t of its ambient graph.³
Lemma 377 Gated induced subgraphs are convex and isometrically embedded.
The adjective convex means that for any shortest walk connecting two
vertices of the subgraph, the intermediate vertices lie in the subgraph. In
fact since the subgraph is an induced subgraph, successive vertices in such
a shortest walk are already adjacent in the subgraph, and so the walk is
a shortest walk of the subgraph. This means that the distance between
two vertices of the subgraph as measured by the metric of the subgraph is
the same as their distance as measured by the distance metric of the entire
graph — which is the substance of the adjective isometrically embedded
for subgraphs.
The proof of the Lemma is left as an exercise.
In order to approach the so-called "oriflamme phenomenon" of the next
chapter, we record for later use this simple result:
Lemma 378 Suppose (V ′ , E ′ ) is a connected simple graph whose edges are
gated. Then it is a bipartite graph. (Thus, “being at even distance” is an
equivalence relation on vertices.)
One simply shows that the graph cannot have cycles of odd length, a
well-known characterization of bipartite graphs.
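For readers who like to experiment, here is a small computational sketch of the gate condition (16.12), not part of the text proper: a breadth-first search computes the metric dG , and a vertex subset X is tested for a gate with respect to a vertex t. The graphs, function names, and cycle examples are illustrative choices only.

```python
from collections import deque

def bfs_dist(adj, s):
    """All graph distances from s in an undirected graph given as an
    adjacency dict {vertex: set of neighbours}."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def gate(adj, X, t):
    """Return the gate of the induced subgraph on the vertex set X
    with respect to vertex t, or None if no gate exists."""
    d_t = bfs_dist(adj, t)
    for g in X:
        d_g = bfs_dist(adj, g)
        if all(d_t[x] == d_t[g] + d_g[x] for x in X):
            return g
    return None

# Edges of an even cycle are gated; edges of an odd cycle are not.
even_cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
odd_cycle  = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(gate(even_cycle, {0, 1}, 3))   # prints 1: the unique gate
print(gate(odd_cycle, {0, 1}, 3))    # prints None: no gate exists
```

The failure on the odd cycle is just Lemma 378 in miniature: a graph whose edges are all gated can contain no odd cycles.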
These concepts are used in the next subsection.
³ This concept was introduced to the mathematical world by A. Dress and R. Scharlau, though it seems to have appeared once in a paper in Physical Chemistry. This concept can be adapted to subgraphs which are not induced as well, where it is the basis of R. Scharlau's beautiful characterization of Buildings belonging to a diagram.
16.4.5 The distance metric on M(T )
Now suppose T is a polar system with respect to the reflexive sesquilinear form f on V . We define a graph G = (M(T ), ∼) with vertex set M(T ). Two vertices A and B will be defined to be adjacent if and only if they intersect at a hyperplane – that is, a codimension one subspace – of each of them. (Of course by Theorem 375, it is enough to know that the intersection has codimension one in at least one of A or B.)
Theorem 379 (The polygonal structure of maximal elements of a polar
system) Let T be a polar system of finite rank, relative to the sesquilinear
form (V, f ). Let G = (M(T ), ∼) be the adjacency graph on the maximal
elements of T .
1. The graph G is connected.
2. For any two vertices A and B of M(T ), the distance dG (A, B) in
the graph G is the codimension of A ∩ B in A.
3. Let A and B be elements of M(T ), and let H be a hyperplane of B
which contains rad(V ). Then, among the elements of M(T ) which
contain H, there is a unique element which is nearest A in the graph
G.
proof: The first two parts are proved together.
Since T has finite rank, rad(T ) has finite codimension in every element
of M(T ). Choosing A and B in M(T ) we see that d = codimA (A ∩ B) =
dim(A/(A ∩ B)) is finite. We shall prove part 2 by induction on d.
If d = 1, then A and B are adjacent vertices in G. So we assume d > 1.
Choose a vector b in B − (A ∩ B). Then b⊥ ∩ A = H is a hyperplane of A containing A ∩ B. Now form the subspace A1 := ⟨b, H⟩. We claim that A1 is an element of M(T ). First of all, A1 ∈ T since ⟨b⟩ and H both belong to T and ⟨b⟩ ⊆ H ⊥ . Second, A1 must be maximal in T . For suppose otherwise. Then A1 is properly contained in a space T belonging to T . Thus H has codimension at least two in T , and, as A = ⟨H, a⟩ for any vector a ∈ A − H, we see that A⊥ ∩ T is a hyperplane of T . But since (A⊥ ∩ T ) + A ∈ T and A is maximal in T , we must have that A⊥ ∩ T ≤ A, so A itself is a hyperplane of T . But in that case A is not maximal in T , a contradiction. Thus, we see that A1 ∈ M(T ).
Now A1 is adjacent to A in the graph G. On the other hand, since A1 ∩ B ⊇ ⟨b, A ∩ B⟩ properly contains A ∩ B, we may apply induction on the codimension of A1 ∩ B in B to see that dG (A1 , B) < d. In particular B is in the same connected component of G as are A1 and A, and the distance dG (A, B) is at most d.
On the other hand, we claim that dG (A, B) is at least d = dim(A/(A ∩
B)). Suppose, by way of contradiction, that A and B were connected by a
walk (A, A1 , A2 , . . . , Ak = B) of length k less than d. Then A ∩ B contains
A ∩ A1 ∩ · · · ∩ Ak . But since Ai+1 ∩ Ai is a hyperplane of Ai , the codimension
of A ∩ A1 ∩ · · · ∩ Ai+1 in A can be at most one more than the codimension
of A ∩ A1 ∩ · · · ∩ Ai in A. Thus A ∩ A1 ∩ · · · ∩ Ak has codimension at most
k in A. But since A ∩ B contains this intersection, we have
codimA (A ∩ B) ≤ k < d = codimA (A ∩ B),
an absurdity.
Thus we see that
dG (A, B) = d = dim(B/(A ∩ B)) = dim(A/(A ∩ B)),
exactly.
Now we address part 3. Let A and B be maximal elements of (T , ≤),
and let H be a hyperplane of B. We have already seen that the finite rank
assumption forces A ∩ B to have finite codimension in A, and that there is
an isomorphism
ᾱ : B/(A ∩ B) → (A/(A ∩ B))∗ .
It follows that H ∩ A has finite codimension e in A, and codimension e − 1
in H. This difference in codimensions shows that H ⊥ ∩ A contains H ∩ A as
a hyperplane. Then B0 = H + (H ⊥ ∩ A) is an element of M(T ) which meets
A at H ⊥ ∩ A with codimension e − 1 in A. Thus B0 has distance e − 1 from
A in the graph G.
Conversely, suppose B 0 were a member of M(T ) containing H and having
distance e − 1 from A. Then by part 2, B 0 ∩ A has codimension e − 1 in
A and contains H ⊥ ∩ A, also at codimension e − 1 in A. It follows that
B 0 = H + (H ⊥ ∩ A) = B0 . So there is a unique element of M(T ) containing
H and having distance e − 1 from A.
There is one last relevant detail.
Theorem 380 Suppose T is a polar system of Witt index n > 0 relative
to the sesquilinear form (V, f ). Then the following statements are true.
1. Fix a maximal space A in M(T ). Then there exists a maximal space
B in M(T ) such that A ∩ B = rad(T ). (In that case we say that B is
opposite A.)
2. Let H be a hyperplane of some element of M(T ) chosen so that H
contains rad(T ). Then H lies in at least two members of M(T ).
3. Fix an integer k < n. Then in the graph G, every geodesic path of length
k extends to one of length k + 1.
proof: 1. In proving this part, we may assume that rad(T ) = 0 and that all members of M(T ) have dimension n. Fix A in M(T ) and choose B ∈ M(T ) so that the dimension of A ∩ B is as small as possible. If A ∩ B = 0 we are done,
so assume A ∩ B > 0. Since rad(T ) = 0 there is a maximal element D in
M(T ) which does not contain A ∩ B, and hence an element x in D so that
x⊥ does not contain A ∩ B.
Now ⟨x⟩ ∈ T , and x does not lie in B ⊥ . Thus C := ⟨x, x⊥ ∩ B⟩ is in T and, because its dimension is n, lies in M(T ). We claim that C ∩ A is properly contained in B ∩ A. Suppose there were a 1-space ⟨z⟩ in C ∩ A − B. Then
z ⊥ would contain A ∩ B and the hyperplane x⊥ ∩ B = C ∩ B of B, and so
would contain B = (C ∩ B) + (A ∩ B). But this would force z ∈ B ⊥ so
hz, Bi ∈ T . Since B is maximal in T we must then have z in B, contrary to
our choice of z. Thus C ∩ A ≤ B ∩ A ∩ x⊥ which lies properly in A ∩ B. But
that contradicts the minimality of the intersection A ∩ B. Thus the minimal
intersection with A must be 0.
2. Suppose H is a hyperplane of B ∈ M(T ). By part 1, there exists a maximal element A of T such that B ∩ A = rad(T ). It follows that H ⊥ ∩ A contains rad(T ) as a hyperplane, and that C := ⟨H ⊥ ∩ A, H⟩ is a member of M(T ) containing H which is different from B.
3. Suppose A and B are vertices of G = (M(T ), ∼) at distance k < n.
Then A ∩ B has codimension k in B and so properly contains rad(T ). Thus
we can select a hyperplane H of B containing rad(T ) but not containing
A ∩ B. Now by part 2, H lies in a maximal element C ∈ M(T ) distinct from B. Now as H ∩ A has codimension k + 1 in A, and H ∩ A is either equal
to or is a hyperplane of C ∩ A, C ∩ A cannot have codimension k − 1 in A.
Applying Theorem 379, part 2, all the elements of M(T ) which contain H
have distance at least k from A. But by part 3 of the same Theorem, among
the elements of M(T ) which contain H, there is a unique one closest to A.
Evidently that is B. Since C ≠ B we must have dG (A, C) = k + 1, and the theorem is proved.
Remarks: Suppose L is a collection of subsets of a set P, each subset containing at least two elements. Then (P, L) is called a point-line geometry: the elements of the set P are called points while the elements of L
are called lines. Associated with each point-line geometry is a collinearity
graph (P, ∼) whose vertices are the points, any two being adjacent if and
only if they are distinct yet contained together in some line.
In the case of a polar system T of finite rank, we have a point-line geometry Γ = (P, L) whose points are the set M(T ) of maximal elements of
T and whose lines are the collections LH of elements of M(T ) containing a
hyperplane H of some element of M(T ). The requirement that lines contain
at least two points is Theorem 380, part 2. The associated point-collinearity
graph is just our graph G = (M(T ), ∼) above.
Now Theorem 379, parts 2 and 3, show that the lines of this geometry Γ are always gated in the collinearity graph. A point-line geometry all of whose
lines are gated is called a near polygon. Examples of near polygons are
the generalized polygons other than the projective planes. The class of near
polygons with all lines having just two points is coextensive with the class of
bipartite graphs.
Suppose, now, (P, L) is a near polygon all of whose lines have at least
three points. If x and y are non-collinear points for which there are at least
two points adjacent to both of them, then there is a convex generalized
quadrangle containing {x, y}. If this occurs for all distance-two pairs (x, y)
we say that "quads exist for (P, L)". A theorem of Peter Cameron states
that if, for a near polygon of finite diameter at least three with quads, each
quad is itself a gated subgraph of the point-collinearity graph, then the near
polygon is one obtained from a polar system T as at the beginning of these
remarks.
16.5 Decompositions of Forms of Finite Rank

16.5.1 Introduction
We say that the reflexive form f has finite rank if and only if Rad(f ) has finite
codimension in V . The results of this section attempt to describe special
decompositions of V . The decompositions that one has in mind are relative
to an orthogonal basis or to a hyperbolic basis – at least to the extent that
this is possible. For indeed, sometimes, it is not possible. Here, the results
are sensitive to the algebraic properties of the division ring D which underlies
everything. For example, it will emerge that it is possible for a symmetric
form in characteristic 2 to have all of its isotropic vectors occurring within
one unique totally isotropic subspace, without that form being degenerate –
a fact rarely advertised.
It will be useful to recall a few definitions before getting into specifics.
Throughout, f is at least a reflexive sesquilinear form on a right D-space V .
First, a key notion for defining decompositions: Suppose V is the direct sum
of two subspaces A and B. If, in addition, f (A, B) = 0 (equivalently, A ⊆ B ⊥ or B ⊆ A⊥ ), we write
V = A ⊥ B,
with the “big perp” symbol, and say that V is the orthogonal direct sum
of A and B.⁴
In Section 1 we said that a vector v was isotropic if and only if f (v, v) =
0. Now we say that a subspace W of V is totally isotropic if and only if
f (W, W ) = 0, or equivalently W ⊆ W ⊥ . (Note that this is much stronger
than just saying all vectors of W are isotropic.) At the other extreme, a
subspace W is said to be anisotropic if and only if 0 is the only isotropic
vector in W .
Recall that a σ-sesquilinear form f was symplectic if and only if every
vector of V was isotropic. In that case f (u, v) = −f (v, u) for all u, v ∈ V .
The sesquilinear form f is said to be symmetric if and only if f (u, v) =
f (v, u) for all u, v ∈ V . In this case, either f = 0 or σ = id and D is a field.
⁴ This notation goes back to Artin's wonderful book: Geometric Algebra.
16.5.2 Orthogonal Decompositions.
Theorem 381 Suppose f : V × V → D is a reflexive σ-sesquilinear form of
finite rank. Assume the following:
1. f is not a symplectic form.
2. In the case that f is a symmetric form and D is a field of characteristic
2, then V /Rad(f ) is anisotropic.
Then there exist non-isotropic vectors v1 , . . . , vn such that
V = ⟨v1 ⟩ ⊥ ⟨v2 ⟩ ⊥ · · · ⊥ ⟨vn ⟩ ⊥ Rad(f ).
proof: Without loss, we may assume f is non-degenerate (Theorem 367),
so V has finite dimension, f is not symplectic, and V is anisotropic when f is
symmetric and D is a field of characteristic 2. Now since f is not symplectic,
there exists a non-isotropic vector v. Then v ⊥ is a hyperplane of V not containing v, so V = ⟨v⟩ ⊥ v ⊥ . If v ⊥ = 0 there is nothing to prove. So v ⊥ is a non-degenerate space. At this stage Rad(f ) has codimension at least 2, and so by Theorem 368 the constant ε exists.
Now suppose the restriction of f to v ⊥ were symplectic. Then v ⊥ , being non-degenerate, contains two vectors x and y such that f (x, y) = 1 = −f (y, x), so ε = −1, σ = id and D is a field. Then f (v, v) = −f (v, v) = γ ≠ 0, so D is a field of characteristic 2. Then f is a symmetric form. But in that case V is anisotropic, and so v ⊥ cannot be a non-zero symplectic space.
Next suppose the restriction of f to v ⊥ were symmetric and D were a field of characteristic 2. Taking x and y in v ⊥ as in the previous paragraph, we see once again that σ = id and ε = 1 = −1, and once again, f is symmetric on V in characteristic 2. Then V is anisotropic, and so then is v ⊥ .
Otherwise, the restriction of f to v ⊥ is neither symplectic nor symmetric
in characteristic 2.
In either case represented by the two previous paragraphs, the restriction
of f to v ⊥ satisfies the hypotheses of the theorem, and so by induction on the
dimension, v ⊥ is an orthogonal direct sum of 1-subspaces generated by nonisotropic vectors. Adjoining hvi to this list of 1-spaces yields the conclusion.
This theorem leaves two cases to be cleared up: (1) the symplectic case
and (2) the symmetric case in characteristic two. For the symplectic case,
the desired decomposition is in terms of a so-called hyperbolic basis. In the
symmetric case in characteristic 2, such bases are needed, but they do not
tell the whole story. In both cases we are concerned only with fields. A
principle used to handle hyperbolic bases over fields concerns decomposition
with respect to a finite-dimensional non-degenerate subspace. In the next subsection we content ourselves with doing this over fields. The only complicating
factor then is the automorphism σ.
16.5.3 Hyperbolic Decompositions of Finite Rank Sesquilinear Forms Over Fields.
In this subsection we are dealing only with σ-sesquilinear forms of finite rank
defined over fields. Throughout, all vector spaces are (F, F )-bimodules over
a field F as in the previous subsection.
Before addressing the issue of hyperbolic bases, we require a minor technical Lemma and its Corollary.
Lemma 382 Let σ be an automorphism of order 2 of a field F . Suppose Fσ
is the subfield of elements fixed by σ. Then the trace mapping
trσ : F → Fσ ,
which takes each element α to α + ασ , is an onto mapping.
proof: First note that σ : F → F is an invertible Fσ -linear mapping of
order 2. Therefore the trace map is a non-zero Fσ -linear mapping into the
1-space Fσ , and hence is onto.
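As a concrete instance of Lemma 382, one can tabulate the trace map for the smallest case F = GF(4), Fσ = GF(2), with σ the squaring (Frobenius) automorphism. The sketch below is only an illustration; the bit-pair encoding of GF(4) is ours, not the text's.

```python
# Elements of GF(4) are encoded as polynomials c1*w + c0 over GF(2),
# i.e. pairs (c1, c0), where w satisfies w^2 = w + 1.
def mul(a, b):
    """Multiplication in GF(4)."""
    a1, a0 = a
    b1, b0 = b
    # (a1 w + a0)(b1 w + b0) = a1 b1 w^2 + (a1 b0 + a0 b1) w + a0 b0
    c1 = (a1 * b1 + a1 * b0 + a0 * b1) % 2   # using w^2 = w + 1
    c0 = (a1 * b1 + a0 * b0) % 2
    return (c1, c0)

def tr(a):
    """Trace map tr(a) = a + a^sigma, with sigma the squaring map."""
    s = mul(a, a)
    return ((a[0] + s[0]) % 2, (a[1] + s[1]) % 2)

print({a: tr(a) for a in [(0, 0), (0, 1), (1, 0), (1, 1)]})
# Both values (0,0) = 0 and (0,1) = 1 of GF(2) occur: tr is onto.
```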
Corollary 383 Suppose f is a reflexive non-degenerate sesquilinear form
on a vector space V over a field F . Suppose f is not a symmetric form in
characteristic 2 unless it is already a symplectic form. If v is a non-zero
isotropic vector not in the radical, then V contains another isotropic vector
u such that f (u, v) = 1.
proof: Since v is not in the radical, there is a vector n in V , with f (v, n)
non-zero. Replacing n with a scalar multiple if necessary, one may assume
f (v, n) = 1. If n is isotropic, we are done. So we assume that f (n, n) = γ ≠ 0.
In particular, f is not a symplectic form.
Since dim V is at least 2 and (V, f ) is non-degenerate, f is a (σ, ε)-Hermitian form by Theorem ??. Since F is a field, Corollary ?? provides us with just three possibilities:
1. f is proportional to a form g for which σ is an automorphism of order two and the constant ε (for g) is 1.
2. σ = id, ε = 1.
3. σ = id, ε = −1.
Now for any non-zero scalar α, we calculate
f (vα + n, vα + n) = ασ + εα + γ,    (16.13)
where γ = f (n, n) has the property that γ σ = γ.
First consider case 1. A vector that is isotropic for f is isotropic with
respect to any form proportional to f . Suppose g = δf is chosen so that the constant ε for g is 1 — that is, g(x, y) = g(y, x)σ for all vectors x, y. Then
g(v, v) = 0, g(v, n) = δ, g(n, n) = δγ ≠ 0,
and
g(vα + n, vα + n) = 0 + ασ δ + δ σ α + δγ
= (αδ σ )σ + (αδ σ ) + δγ.
Now, by Lemma 382, there exists an element β such that β σ + β = −δγ.
Setting α = β(δ σ )−1 , we see that the vector u := vα + n satisfies
g(u, u) = 0, g(v, u) = δ ≠ 0.
Then, for the original form f = δ −1 g, we have
f (v, v) = f (u, u) = 0, and f (u, v) = 1,
as required.
Similarly, in case 2, when the characteristic is odd, we can set α = −γ/2 to obtain the desired u = vα + n.
In case 3, when the characteristic is odd, all vectors are isotropic, so f is
a symplectic form, a case already eliminated.
In the only remaining case, we have σ = id, ε = 1 and f is a symmetric
form over a field F of characteristic 2 which is not a symplectic form. But
that is “beyond the scope” of the hypotheses.
We say that a collection of paired isotropic vectors {u1 , v1 }, . . . , {un , vn } is a hyperbolic basis of a vector space V relative to the reflexive σ-sesquilinear form f , if and only if
V = ⟨u1 , v1 ⟩ ⊥ · · · ⊥ ⟨un , vn ⟩,
and f (ui , vi ) = 1, for each i = 1, . . . , n. (This implies that f (ui , uj ), f (vi , vj ) and f (ui , vk ) are all zero, for all i, j and all k distinct from i.)
Theorem 384 Suppose f is a σ-sesquilinear form of finite rank over the
field F . Suppose f is not a symmetric form in characteristic 2 unless it is
already a symplectic form. Then
V = A ⊥ H ⊥ Rad(f ),
where A is anisotropic, and H possesses a hyperbolic basis.
proof: Without loss of generality we may once again assume that Rad(f ) =
0 so that f is non-degenerate and dimF (V ) is finite. Now if V is anisotropic
to start with, there is nothing to prove. So suppose v1 is a non-zero isotropic
vector. Then v1 does not lie in Rad(f ). Since f is not symmetric in characteristic 2, Corollary 383 ensures the existence of a second isotropic vector u1 with f (v1 , u1 ) = 1. By Theorem 372, we have a decomposition
V = U1 ⊥ (U1 )⊥ , where U1 := ⟨v1 , u1 ⟩.
Now if (U1 )⊥ = 0, we are done. Also, if (U1 )⊥ were anisotropic, we should also be finished. So we may assume (U1 )⊥ contains an isotropic 1-space. Clearly if dim(U1 )⊥ = 1, then (U1 )⊥ would be the radical of V , contrary to assumption. So dimF (U1 )⊥ ≥ 2. This is enough to see that σ and ε are controlled by (U1 )⊥ . Thus if the
restriction of f to (U1 )⊥ were symmetric in characteristic 2, the same would
be true of V . Thus induction applies to the lower-dimensional space (U1 )⊥ ,
and we see that it in turn is the orthogonal direct sum of a hyperbolic
space (one with a hyperbolic basis) and an anisotropic subspace. Adjoining
the subspace U1 , the decomposition extends to V . The proof is complete.
remark: The availability of large anisotropic subspaces to be presented as
a summand in the conclusion of Theorem 384 depends critically upon the
field. We shall examine some of the possibilities which arise in a later section
on special cases. From the point of view of decompositions alone, they do
not present a problem. Each of them has an orthogonal basis.
16.5.4 Symmetric forms in characteristic 2
By now the reader should have noticed one elusive case so far just beyond
the grasp of the theorems of this section. That case is that of a symmetric
form of finite rank in characteristic 2 where f is not symplectic. We address
that now.
Suppose then that f is a non-degenerate symmetric form f : V × V → F ,
where F is a field of characteristic 2. The first observation is
The set of all isotropic vectors forms a subspace N of V .
This is easy to see. If f (u, u) = f (v, v) = 0 then
f (u + v, u + v) = f (u, u) + f (u, v) + f (v, u) + f (v, v) = 2f (u, v) = 0.
Clearly the vectors of V − N are not isotropic, and so any complement A of N in V is anisotropic.
But the vectors in N are not necessarily all alike. One may form a graph Γ with vertex set the isotropic 1-spaces of V , any two being adjacent if and only if they are distinct and perpendicular. The radical, R, of this graph consists of all those 1-spaces which are adjacent to all others. Clearly the set T of all vectors which lie within a 1-space of R comprises an F -subspace of N – the subspace of all isotropic vectors which are perpendicular to
all other isotropic vectors. This is clearly a subspace which is invariant in
the sense that it is defined only by global properties of (V, f ).
Now we wish to approach a certain canonical subspace A ⊕ T of V . We
first consider any non-zero isotropic vector u1 not in T . Then there exists
another isotropic vector v1 such that f (u1 , v1 ) = 1. Then by Theorem 372, upon setting U1 := ⟨u1 , v1 ⟩, we have
V = U1 ⊥ (U1 )⊥ .
Now (U1 )⊥ contains all of T . Again we ask if there are any isotropic vectors of (U1 )⊥ which do not lie in T . If so, we choose one, say u2 , find v2 isotropic such that f (u2 , v2 ) = 1, form U2 := ⟨u2 , v2 ⟩, and obtain another decomposition
V = U1 ⊥ U2 ⊥ V2 ,
where still V2 contains T . Since dimF (V ) is finite, this process eventually terminates with a collection {ui , vi }, i = 1, . . . , n, of paired vectors so that, setting Ui := ⟨ui , vi ⟩, one has
V = U1 ⊥ U2 ⊥ · · · ⊥ Un ⊥ B,
where B has this peculiar property: all non-zero isotropic vectors of B are perpendicular to all other isotropic vectors of V . So they form a totally isotropic subspace of B which is evidently T .
At this stage
V = U1 ⊥ U2 ⊥ · · · ⊥ Un ⊥ (T ⊕ A).
Obviously if there were a non-zero isotropic vector x in V outside the space
spanned by all but the last summand, then it would not lie in T , and so
could be used to build still another direct orthogonal summand Un+1 . Thus
the definition of n as a terminating step forces
N = U1 ⊥ U2 ⊥ · · · ⊥ Un ⊥ T
and
T = Rad(N ),
which I think we knew anyway.
Let us take a look at B = T ⊕ A.
We have
Theorem 385 Suppose f is a non-degenerate symmetric form of finite rank
in characteristic 2 on the F -vector space V . Then
V = H ⊥ (T ⊕ A),
where
1. N = H ⊕ T is the symplectic space of all isotropic vectors.
2. H is a perpendicular direct sum of hyperbolic 2-spaces.
3. T = Rad(f |(H⊕T ) ) = H ⊥ ∩ N .
4. T ⊕ A is non-degenerate, and T is the set of all isotropic vectors of
T ⊕ A.
The first summand of the theorem is easy to understand. It has a hyperbolic basis and so yields a unique Grammian. Casting that aside, what is
one to do with a space such as W = T ⊕ A?
First, despite the fact that all its isotropic vectors comprise all the vectors
of T , the unique maximal totally isotropic subspace of B = T ⊕ A, these
vectors do not lie in the radical, nor is f degenerate on B.
Do such non-degenerate subspaces exist? We can construct as many such
subspaces B as the field allows us to have anisotropic subspaces. Suppose
A is an anisotropic space under the symmetric form f1 over a field F of
characteristic 2. Fix a non-degenerate subspace A0 , and let A1 = (A0 )⊥ ∩ A, so A = A0 ⊥ A1 . Let π : A → T be a vector space epimorphism with kernel A1 . There is then an injection µ : T → A0 which is a 'partial inverse' of π; that is, (µ ◦ π)(a) = a for any a ∈ A0 . Formally set B := T ⊕ A.
We extend the symmetric form f1 to a form f on B by this rule:
f (t1 + a1 , t2 + a2 ) = f1 (µ(t1 ), a2 ) + f1 (a1 , µ(t2 )) + f1 (a1 , a2 ).
Then f is a symmetric bilinear form on B. Suppose an element z = t0 + a0
were in Rad(f ), where t0 ∈ T and a0 ∈ A. Then at the very least 0 =
f (z, t) = f (t0 + a0 , t) = f1 (µ(t), a0 ) so a0 lies in (µ(T ))⊥ ∩ A = A1 . On the
other hand, such an element z is in A⊥ (with respect to f ) so
f (z, a) = f (t0 + a0 , a) = f1 (µ(t0 ), a) + f1 (a0 , a) = 0,
for all a ∈ A. This means
f1 (µ(t0 ), a) = −f1 (a0 , a) for all a ∈ A.
Now consider: We have shown that a0 lies in A1 = (A0 )⊥ ∩ A. So as a ranges over A0 , we get f1 (µ(t0 ), a) = 0. It follows that µ(t0 ) lies in (A0 )⊥ ∩ µ(T ) = (A0 )⊥ ∩ A0 = 0, as A0 was chosen non-degenerate. Thus t0 = 0. This forces a0 ∈ A⊥ = 0,
so z = 0. We have just shown that the extended f is non-degenerate.
Now let us show that all the isotropic vectors of (B, f ) lie in its summand
T , in this construction. We can write an arbitrary vector outside T in the form v = t + a where t ∈ T and a is a non-zero vector of A. Then
f (t + a, t + a) = f (t, t) + 2f (t, a) + f (a, a) = f (a, a) ≠ 0,
since f is symmetric, F has characteristic 2, T is totally isotropic, and A is anisotropic – each of these four facts is used. Thus we see
Corollary 386 For every anisotropic space (A, f1 ) where f1 is a symmetric
bilinear form in characteristic 2, and for every non-degenerate k-subspace A0
of A, there exists a non-degenerate space (B, f ), where
1. dimF (B) = dimF (A) + dimF (A0 ).
2. f is a non-degenerate symmetric form in characteristic 2,
3. (B, f ) contains (A, f1 ) as a nondegenerate subspace under the restriction of f ,and miraculously,
4. all isotropic vectors of B form a subspace complement (but not an orthogonal complement) to A.
Conversely, we leave it as an exercise that if (V, f ) is a non-degenerate
symmetric form over a finite dimensional vector space V in characteristic
2 with the property that all of its isotropic vectors comprise one totally
isotropic subspace T , then V arises by the above construction.
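The smallest instance of the construction is worth recording. Over F = GF(2) the diagonal map x ↦ f1 (x, x) of a symmetric bilinear form is linear, so an anisotropic space has dimension at most 1; take A = A0 = ⟨a⟩ with f1 (a, a) = 1 and T = ⟨t⟩, giving B = T ⊕ A with Grammian [[0, 1], [1, 1]] in the basis (t, a). The brute-force check below (a sketch in Python; the encoding is ours, not the text's) confirms that this B is non-degenerate and that its only non-zero isotropic vector is t.

```python
from itertools import product

# Grammian of f on B = T + A over GF(2), in the basis (t, a):
# f(t,t) = 0, f(t,a) = f1(mu(t), a) = 1, f(a,a) = f1(a,a) = 1.
G = [[0, 1], [1, 1]]

def f(u, v):
    return sum(u[i] * G[i][j] * v[j] for i in range(2) for j in range(2)) % 2

vectors = [v for v in product(range(2), repeat=2) if any(v)]
print([v for v in vectors if f(v, v) == 0])          # only t = (1, 0)
print(all(any(f(u, v) for v in vectors) for u in vectors))  # non-degenerate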
final remark: If Rad(f ) has infinite codimension, then hardly a single remark in this section remains valid. In fact the number of isomorphism types of non-degenerate forms on a countably-infinite-dimensional space V is the cardinality of the continuum. This result uses model theory. (I
recommend the papers of J. I. Hall and Cameron and Hall to the inquisitive
student.)
16.6 Special cases of sesquilinear forms

16.6.1 Symplectic forms

As one might glean from the preceding sections, the case of symplectic forms is the simplest, at least for forms of finite rank.
If such a form is non-degenerate, then one has a hyperbolic basis and the
Grammian for this basis consists of n copies of the 2 × 2 matrix
⎛  0  1 ⎞
⎝ −1  0 ⎠
centered along its main diagonal. Thus dimF V = 2n is even. The particular
field offers no complication here. In fact, the parameter n alone determines
the non-degenerate symplectic forms over F up to isomorphism.
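A quick way to exhibit such a Grammian is as a Kronecker product of an identity matrix with the 2 × 2 block above. The following sketch (Python with numpy, purely illustrative) builds the 2n × 2n matrix and confirms non-degeneracy for n = 3.

```python
import numpy as np

def symplectic_grammian(n):
    """Grammian of a rank-2n symplectic form in a hyperbolic basis:
    n copies of [[0, 1], [-1, 0]] centered along the main diagonal."""
    block = np.array([[0, 1], [-1, 0]])
    return np.kron(np.eye(n, dtype=int), block)

G = symplectic_grammian(3)
print(G)                                    # 6 x 6 block-diagonal matrix
print(int(round(np.linalg.det(G))) == 1)    # non-degenerate: True
```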
16.6.2 Hermitian forms

Next in degree of complication are the Hermitian⁵ forms f : V × V → F . Here F is a field, and there is an involutory field automorphism σ such that for any vectors u, v of V ,
f (uα, vβ) = ασ f (u, v)β, α, β ∈ F,    (16.14)
f (u, v) = f (v, u)σ .    (16.15)
Setting Fix(σ) := Fσ , we see that [F : Fσ ] = 2. There are two important
mappings F → Fσ to consider.
1. The trace mapping tr : F → Fσ where tr (α) = α + ασ for all α ∈ F .
This mapping is Fσ -linear and is surjective.
2. The norm mapping N : F → Fσ where N (α) = αασ for all α ∈ F .
This mapping is multiplicative in that N (αβ) = N (α)N (β), for all
α, β ∈ F . It is not in general surjective.
Suppose the Hermitian form h has finite rank and is non-degenerate.
Then there are two decompositions available:
⁵ A recent fashion has been to spell this adjective "Hermitean" in honor of the silent final "e" in Hermite's surname.
(1) V = ⟨v1 ⟩ ⊥ · · · ⊥ ⟨vn ⟩, h(vi , vi ) ≠ 0,
(2) V = H ⊥ A,
where H is a perpendicular direct sum of hyperbolic 2-spaces, and A is a nondegenerate subspace possessing at most one isotropic 1-space. Of course the
subspace H is uniquely determined up to isomorphism of forms (isometry).
The presence or absence of this dangling remainder A – even the set of its
permissible dimensions – is very sensitive to the field F and depends entirely
on the norm mapping. First of all, A itself has an orthogonal decomposition
as does V in (1), so let us suppose V has a basis {vi } as in (1) and set h(vi , vi ) = ωi . Clearly each ωi lies in Fσ . Now the availability of a non-zero isotropic vector depends on whether
h(Σ vi αi , Σ vi αi ) = Σ ωi N (αi )    (16.16)
can be zero without all αi being zero.
The complex Hermitian forms
Suppose F = C, the field of complex numbers and σ is complex conjugation.
Then Fσ is the real field R and the norm mapping takes C∗ = (C − {0}, ·)
onto the multiplicative group of all positive real numbers. Now if the constants ωi are
all positive, the only way the sum in (16.16) can be zero is if all αi are zero.
Thus the entire space V in (1) is anisotropic in this case. We say then that
the form h is a positive definite (or anisotropic) Hermitian form – in
this case, over C. Note that since the subgroup of positive reals has index 2
in the multiplicative group of all non-zero reals with coset representatives 1
or −1, we can assume without loss of generality that ωi = ±1. Thus in the
positive definite case, we take all ωi = 1 and this sort of Hermitian form is
unique up to isometry. (This is an important case, so much so that when an
analyst refers to a “Hermitian” form, he or she means the positive definite
one (although finite dimension may not be assumed).
This leaves, of course, the remaining much neglected Hermitian forms in
which some of the ωi might be −1. Replacing h by its proportional form −h
if necessary, we may assume that exactly d ≤ [n/2] of the ωi are equal to −1;
the rest are equal to +1. Rearranging the order of the bases, we may suppose
ωi = −1 if i ≤ d and ωi = +1 if i > d. Then the vectors wi = vi + vd+i ,
i = 1, . . . , d are isotropic, pairwise orthogonal and linearly independent, and
so span a totally isotropic subspace T of dimension d. Then T is clearly a
maximal totally isotropic subspace, and its dimension d is called the Witt
index of (V, h). Similarly, the subspace T ′ spanned by the independent vectors ui = vi − vd+i , i = 1, 2, . . . , d, also forms a totally isotropic d-space, and
T ⊕ T ′ = ⟨v1 ⟩ ⊕ · · · ⊕ ⟨v2d ⟩
is non-degenerate. Thus V = (T ⊕ T ′ ) ⊥ A where A is spanned by vi , 2d + 1 ≤
i ≤ n, and is anisotropic, since h|A is positive definite.
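The computations above are easy to verify numerically. The following sketch (Python with numpy; the parameters n = 5, d = 2 are an arbitrary illustration, not anything fixed by the text) checks that each wi = vi + vd+i is isotropic for the Hermitian form with Grammian diag(−1, −1, 1, 1, 1).

```python
import numpy as np

n, d = 5, 2
H = np.diag([-1.0, -1.0, 1.0, 1.0, 1.0]).astype(complex)  # Grammian of h

def h(u, v):
    # Hermitian form, conjugate-linear in the first argument
    return np.conj(u) @ H @ v

v = np.eye(n, dtype=complex)            # orthogonal basis, norms +-1
for i in range(d):
    w = v[i] + v[i + d]                 # w_i = v_i + v_{d+i}
    print(np.isclose(h(w, w), 0))       # isotropic: True, True
```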
Hermitian forms when the norm-mapping is surjective.
One can quickly see that the study of isotropy for these Hermitian forms generates questions about the index of the norm group N (F ∗ ) in Fσ∗ . For example, suppose F = Q(√2) and σ is its unique automorphism of order two. Here Q denotes the field of rational numbers and, for a and b rational, N (a + b√2) = a² − 2b² . Does N (F ∗ ) have infinite index in its overgroup (Q∗ , ·)? Can an anisotropic subspace of a σ-Hermitian form over a Q(√2)-space have arbitrarily large dimension?
Suppose now the best of all worlds has arrived: the norm mapping N : F ∗ → Fσ∗ is surjective. There is then only one non-degenerate Hermitian
form for each finite rank n. That is because each ωi in equation (16.16) can
be taken to be 1 — that is, V has an orthonormal basis.
In this case it is impossible to have an anisotropic space A of dimension two or higher. If A were non-degenerate of dimension 2 we should have A = ⟨v1 , v2 ⟩ with v1 , v2 orthonormal as above. Then as N is surjective, we can find α with N (α) = −1. Then h(v1 α + v2 , v1 α + v2 ) = N (α) + 1 = 0, so v1 α + v2 is isotropic. Indeed, we have as many such vectors of this form as we have scalars α of norm −1; these elements form a coset of the multiplicative group comprising the kernel K ∗ of the norm homomorphism. We argue that K ∗ ≠ 1. Suppose β ∈ F − Fσ . Then α = β −1 β σ ≠ 1, and ασ = α−1 , so N (α) = 1. Thus ⟨v1 , v2 ⟩ contains a pair of non-perpendicular isotropic 1-subspaces and so is a hyperbolic space. Thus in this case:
1. V is hyperbolic, provided dimF V is even, or else
2. V = H ⊥ ⟨v⟩ where H is hyperbolic and h(v, v) = 1, whenever dimF V
is odd.
In either case, h is uniquely determined up to isometry.
The hypotheses of the previous paragraph hold whenever F = GF(q 2 ),
the finite field having q 2 elements, and σ is the q-th power mapping.
Hermitian forms over GF(4).
The Hermitian form is particularly easy to calculate when F = GF(22 ). Here
the norm mapping is the cubing map, α → α3 , and it has value 1 if and only
if its argument is non-zero. Thus for an orthonormal basis {vi }, if
u = Σ vi αi , and v = Σ vi βi ,
then
h(u, v) = Σ αi ² βi .
In particular,
h(u, u) = Σ (αi )³
and is zero if and only if an even number of the αi are non-zero — that is, the
vector u has "even weight"⁶. Thus if V = GF(2²)(5) then V has the five
weight one vectors as an orthonormal basis, and so the singular vectors are
the vectors of weights 2 or 4. Thus there are
C(5, 2) · 3 + C(5, 4) · 3³ = 165 = 11 · 15
isotropic 1-spaces in this vector space⁷, where C(n, k) denotes a binomial coefficient.
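The count of 165 can be confirmed by brute force, using the observation just made that h(u, u) is the Hamming weight of u read modulo 2. A short sketch (Python; it encodes GF(4) only by which coordinates are non-zero, which is all this computation needs):

```python
from itertools import product

# Count isotropic 1-spaces of the Hermitian form h(u,u) = sum(a_i^3)
# on GF(4)^5.  The cube of every non-zero scalar of GF(4) is 1, so
# h(u,u) = (Hamming weight of u) mod 2: u is isotropic exactly when
# its weight is even.
n = 5
isotropic_vectors = sum(
    1
    for v in product(range(4), repeat=n)           # all 4^5 vectors
    if any(v) and sum(1 for a in v if a) % 2 == 0  # non-zero, even weight
)
# Each 1-space contains exactly 3 non-zero vectors (its non-zero
# scalar multiples), so divide by 3.
print(isotropic_vectors // 3)   # expected: 165
```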
16.6.3 Special cases for orthogonal forms.
Like the Hermitian forms, this case is very sensitive to special properties of
the particular ground field. We consider again only non-degenerate symmetric bilinear forms of finite rank, but this time in fields of characteristic
not two (to avoid something that differs in only one codimension from the
symplectic case).
Now, for such a non-degenerate form (V, f ) of finite rank, we have
V = ⟨v1 ⟩ ⊥ · · · ⊥ ⟨vn ⟩.    (16.17)
⁶ Here, I am commandeering a term from coding theory.
⁷ Yes, the group of isometries of this Hermitian form, U (5, 2²), contains an element of order 11 acting irreducibly on V .
Set γi = f (vi , vi ). By replacing the basis element v = vi by a scalar multiple u = αv, we still have u perpendicular to all the other vj , but now f (u, u) = γi ′ = α²γi . Thus, if R is a complete system of coset representatives of the
subgroup S of squares in the multiplicative group (F ∗ , ·), each γi can be made
to be an element of R.
Orthogonal forms over algebraically closed fields
Actually, the title of this subsubsection overstates the needed hypothesis: we
want a field K which admits no algebraic extensions of degree two. In such
a field it is clear that for every element γ, there exists a “square root”, that
is, an element β such that β 2 = γ.
Suppose K is such a field, that V is a finite-dimensional vector space over K, and that b : V × V → K is a non-degenerate orthogonal form. Then, V is an orthogonal direct sum of 1-spaces as in equation 16.17. Now there is a square root βi of each γi ⁻¹, and so replacing each vi by vi βi we obtain a new basis for V for which the Grammian matrix is the identity matrix. So b is
equivalent to the ordinary “dot product”.
Assume the characteristic is not 2, and let i be the square root of −1 in
the field K. Suppose n = dim V , and set m = n/2 or (n − 1)/2 according as
n is even or odd. Then there exists a subspace
M = { (v1 + v2 i)a1 + · · · + (v2m−1 + v2m i)am | aj ∈ K }
which is totally isotropic of dimension m. Since b is non-degenerate, M is a
maximal totally isotropic subspace and so m is the Witt index.
Orthogonal forms over the real numbers
In the case that K = R, the field of real numbers, the set S of non-zero
squares (which is just the positive real numbers) is a subgroup of index two
in the multiplicative group (R∗ , ·) of non-zero real numbers with R = {1, −1}
as a system of coset representatives. Thus if b is an orthogonal form of finite
rank on the real vector space V , then, in the notation at the beginning of
this subsection, the orthogonal basis {vi } can be chosen so that each basis
norm γi = b(vi , vi ) = ±1, i = 1, . . . , n. Replacing b with −b, if necessary,
we can assume the norm “+1” occurs at least as often as the norm “−1”.
Rearranging the order of the basis elements, we can assume all basis elements
of norm −1 occur first. Then the Grammian matrix G has the form
G = diag(−1, −1, . . . , −1, 1, . . . , 1),
with the first b = n−a diagonal entries equal to −1, and the last a equal to 1,
where a ≥ b. Thus we see that the orthogonal form is completely determined
up to isometry, by just two parameters: the dimension n, and the parameter
a (or equivalently b = n − a).
Now all vectors
wj := vj + vj+b , j = 1, . . . , b,
are isotropic and pairwise orthogonal, and so generate a totally isotropic
subspace M of dimension b. Similarly, the vectors wj ′ := vj − vj+b , j = 1, . . . , b, are pairwise orthogonal and isotropic, and so span another totally isotropic space M ′ . Noting that
b(wj , wk ′ ) = −2δjk ,
it is easy to see that M ⊕ M ′ is a perpendicular direct sum of hyperbolic 2-subspaces, whose perpendicular complement A = (M ⊕ M ′ )⊥ has a Grammian GA which is the identity matrix. Thus (A, b) is just a Euclidean space, and so A is anisotropic. It follows that M and M ′ are maximal isotropic
The most well-known cases are the following:
• Euclidean n-space En : Here b = 0 and G is the identity matrix.
The form is said to be positive definite which, like the complex Hermitian case, means b(v, v) ≥ 0 for all vectors v and equality holds only
when v = 0.
• Lorentzian n-space: Here b = 1, so G = diag(−1, 1, . . . , 1). When
n = 4, it is the Lorentz space where special relativity is played out.
When n = 3, the isotropic vectors form a cone.
Orthogonal forms over GF (3)
For orthogonal forms over finite fields, calculation is perhaps easiest in the
case of the field of three elements. Here the constants γi are ±1, and as we
did for the real case, we may assume the basis norm 1 occurs at least as often
as −1. This time we re-order the orthogonal basis so those of norm 1 occur
first. Thus G = diag(1, 1, . . . , 1, −1, . . . , −1). Let a be the number of +1’s
which appear.
Do we get a different orthogonal geometry for each value of the parameter
a ≥ n/2, as we did for the real numbers? The answer is "No".
With the Grammian G defined relative to the orthogonal basis {vi } as described just above, we see that the norm of a general vector u = Σi vi αi is
b(u, u) = (α1 ² + · · · + αa ²) − (αa+1 ² + · · · + αn ²).    (16.18)
Now here is the advantage of the field GF (3). Each summand αj ² is 1 if and only if αj is non-zero. Thus the vector u is isotropic if and only if the
difference between the number of non-zero coefficients appearing among the
first a and the number occurring after the a-th one, is a multiple of three.
Let’s look at some examples and count the number of isotropic 1-spaces.
We can represent a general vector u = Σi vi αi by an n-tuple
(α1 , . . . , αa ; αa+1 , . . . , αn ),
where the semicolon indicates the position after which the basis norm switches
from +1 to −1. We call this position between the a-th and (a + 1)-th coefficients the "break". Of course, if a = n, so all these norms are +1, we do not write the semicolon. A vector u is said to be of shape
(∗c , 0d ; ∗e , 0f ), where c + d = a, e + f = b = n − a,
if and only if exactly c of the coefficients occurring before the break are nonzero and exactly e of the coefficients after the break are non-zero. The order
of these coefficients on either side of the break is not specified. Then for a
vector of this shape to be singular, we need only have c ≡ e mod 3.
Examples in dimension 4. Suppose first that the Grammian G = diag(1, 1, 1, 1). Then a vector is isotropic if and only if it is of shape (∗3 , 0). There are C(4, 1) = 4 ways to choose the position of the zero, and 2³ = 8 ways to choose the non-zero entry ±1 at each "star" ∗. But since we are counting 1-spaces, not vectors, we must divide the latter number by 2. Thus the number of isotropic 1-spaces in this geometry is
4 × 2² = 16.
Now suppose G = (1, 1, 1, −1) – a sort of finite version of a Lorentz
space. Then the possible shapes of isotropic vectors are listed below, each
accompanied by the count of isotropic 1-spaces they produce:
(∗3 ; 0) :  2³/2 = 4,
(∗, 02 ; ∗) :  C(3, 1) × 2 × 2/2 = 6.
So now there are just 4 + 6 = 10 isotropic 1-spaces. Evidently, this geometry
differs from the first.
Finally, suppose G = (1, 1, −1, −1). The isotropic vectors are distributed among the following shapes:
(∗, 0; ∗, 0) :  2 × 2 × 2 × 2/2 = 8 1-spaces,
(∗2 ; ∗2 ) :  2² × 2²/2 = 8 1-spaces.
So we are back to the same count, 16, as in the first example. In fact this
geometry is isometric to the first one, but we leave that issue to an exercise.
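All three counts are small enough to confirm by brute force. In the sketch below (Python; the helper name isotropic_lines is ours), a vector over GF(3) is a tuple with entries 0, 1, 2, and each 1-space is counted once by dividing the vector count by 2.

```python
from itertools import product

def isotropic_lines(diag):
    """Count the isotropic 1-spaces of b(u,u) = sum(d_i * u_i^2)
    over GF(3), for the diagonal Grammian `diag`."""
    n = len(diag)
    count = sum(
        1
        for v in product(range(3), repeat=n)
        if any(v) and sum(d * x * x for d, x in zip(diag, v)) % 3 == 0
    )
    return count // 2   # each 1-space over GF(3) has 2 non-zero vectors

for g in [(1, 1, 1, 1), (1, 1, 1, -1), (1, 1, -1, -1)]:
    print(g, isotropic_lines(g))   # expected counts: 16, 10, 16
```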
16.7 Sesquilinear forms obtained from others by reduction to a subfield.

16.7.1 Forms derived from a Hermitian form by field reduction
For every Hermitian form h, there are two related forms, both defined over
the subfield Fσ . If V is an n-dimensional F -space with basis {v1 , v2 , . . . , vn },
then V may be regarded as a 2n-dimensional space W over Fσ with basis
{v1 , ψv1 , v2 , ψv2 , . . . , vn , ψvn },
where F = Fσ ⊕ ψFσ as an Fσ -space.
First, the function
Hh : W → Fσ
which takes each vector u ∈ W to h(u, u) is a quadratic form whose associated
bilinear form B is defined by
B(u, v) = h(u, v) + h(v, u) = tr (h(u, v))
for all u, v ∈ W . Note that every 1-space ⟨v⟩ which is non-isotropic with respect to the Hermitian form h becomes, in W , an anisotropic 2-dimensional
subspace.
Secondly, one can define a symplectic form B − on W as follows: Since σ has order 2, there must exist a scalar θ in F such that θσ = −θ. (In characteristic 2 one may take θ = 1.) Then B − : W × W → Fσ is defined by
B − (u, v) = θ−1 (h(u, v) − h(v, u)).
This form assumes values only in Fσ since any scalar β with β σ = −β (such as h(u, v) − h(v, u)) is an Fσ -scalar multiple of θ. Writing F = Fσ ⊕ ψFσ as above, where ψ σ − ψ = θ, we see that {v1 , ψv1 , . . . , vn , ψvn } is a hyperbolic basis for (W, B − ) with
W = ⟨v1 , ψv1 ⟩ ⊥ · · · ⊥ ⟨vn , ψvn ⟩.
Now every isometry of (V, h) is a fortiori an isometry of (W, B − ). This
gives a containment among classical groups:
U (n, F ) ≤ Sp(2n, Fσ ),
where Sp(2n, Fσ ) is just the symplectic group of isometries of (W, B − ) and
the unitary group U (n, F ) is realized as the Hermitian isometries of (V, h).
Note that in either of these two derived forms, any two subspaces which
were orthogonal in V with respect to the Hermitian form h become orthogonal
subspaces of W with respect to either of the forms B or B − .
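For F = C and Fσ = R these two derived forms are easy to test numerically: with σ complex conjugation one may take θ = i, so that B(u, v) = 2 Re h(u, v) and B − (u, v) = 2 Im h(u, v). A sketch (Python with numpy; the random test vectors are only an illustration, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

def h(u, v):
    # Hermitian dot product, conjugate-linear in the first argument
    return np.vdot(u, v)

u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

B      = h(u, v) + h(v, u)          # trace form: 2 Re h(u,v), real-valued
Bminus = (h(u, v) - h(v, u)) / 1j   # theta = i:  2 Im h(u,v), real-valued
print(np.isclose(B.imag, 0), np.isclose(Bminus.imag, 0))   # True True
print(np.isclose(Bminus, 2 * h(u, v).imag))                # True
```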
16.8 Exercises for Chapter 15

16.8.1 Miscellaneous warm-up problems
1. Let X be a vector space over a field F with a countable basis B =
{b1 , b2 , . . .}. Let us define a (σ, ε)-hermitian form on X by declaring
f (bn , bn ) = 0, and
f (b2n−1 , b2n ) = 1, for all positive integers n,
f (bi , bj ) = 0, for all other pairs (i, j), i < j, and using
f (x, y) = ε f (y, x)σ , for all basis pairs (x, y).
Thus X is an orthogonal direct sum of countably many hyperbolic 2-subspaces. Next form the vector space V = uF ⊕ vF ⊕ X and extend
the form fX on X to a form f on V by declaring
f (u, v) = 1,
f (v, bn ) = 0, and
f (u, bn ) = 1, for all positive n.
(a) Write out a formal proof showing that the form f is non-degenerate.
(b) Show that X ⊥ = ⟨v⟩ and so V is not a direct sum X ⊕ X ⊥ as in
Theorem 372.
2. Let h1 , . . . , hd be a collection of functionals on a vector space V over a
field F . Let F (d) be the vector space of d-tuples of elements of F .
(a) Show that the mapping γ : V → F (d) which sends each vector v
to the row vector
γ(v) := ((v)h1 , . . . , (v)hd )
is a linear transformation.⁸
(b) If the image space γ(V ) has dimension less than d, show that the
functionals h1 , . . . , hd are linearly dependent in V ∗ . [Hint: Since
γ(V ) is a k-dimensional subspace of the space of row vectors of
length d with k less than d, there exists a non-zero column vector
a = (a1 , . . . , ad )T such that r · a = 0 for every row vector r ∈ γ(V ). Conclude that Σ ai hi is the zero functional.]
3. Assume f is a reflexive sesquilinear form on the F -vector space V ,
where F is a field. Let W be a d-subspace of V and select any ordered
basis (b1 , . . . , bd ) of W . Define the function γ : V → F (d) by the
equation
γ(v) = (f (v, b1 ), f (v, b2 ), . . . , f (v, bd )).
(a) Show that γ is a linear transformation with ker γ = W ⊥ .
⁸ For convenience, the functionals act on the right here.
(b) Show that if the codimension of W ⊥ in V is less than d, then
Radf (V ) ∩ W 6= 0. In particular, f is not non-degenerate. [Hint:
First observe that part (a) implies dim γ(V ) = codim(W ⊥ ). If this
number is less than d, then one can conclude from the preceeding
Exercize 5(b) that the functionals
f ( , b1 ), . . . , f ( , bd ),
are linearly dependent in V ∗ .]
[The reader should compare the result of part (b) with the results of
Theorems 372 and 373 of Section 15.3.]
16.8.2 Forms and algebras.
4. Suppose A = (V, ◦) is an F -algebra. Let f ∈ A∗ – that is, f is a functional
on A. Show that the function fA : V × V → F whose value at the
ordered pair (u, v) is f (u ◦ v), is a bilinear form on V .
5. (a) Show that fA is reflexive if and only if, for all a, b ∈ A, a ◦ b ∈ ker f exactly when b ◦ a ∈ ker f . In particular, in that case ker f must contain the commutator space
[A, A] := ⟨x ◦ y − y ◦ x | x, y ∈ A⟩
(b) For any n × n matrix, M = (mij ), the sum of the diagonal entries
of M is called the trace of the matrix M . Show that taking
traces defines a functional tr : F n×n → F .
(c) If A = F n×n is the associative algebra of all n × n matrices under ordinary matrix multiplication, Exercise 4 produces the form trA on A = F n×n , whose value at a pair of matrices (M, N ) is tr(M N ). This form on V (or any subspace U of V ) is called the trace form on V (or U ). Show that the trace form is a symmetric bilinear form.
(d) Show that this form is non-degenerate. [Hint: Form the standard
basis B = {Eij } for the algebra F n×n where Eij is the matrix
with (i, j)-th entry equal to 1 and all remaining entries equal to
0. Then Eij ◦ Ekℓ = Eiℓ or 0, depending on whether j = k or not.
For part (c) show that trA is symmetric when evaluated on pairs of standard basis elements. For part (d) show that
trA (Eij ◦ Ekℓ ) = 1, if j = k and ℓ = i; and 0, otherwise.
What does this say about the Grammian matrix of trA with respect to the basis B?]
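As a numerical companion to parts (c) and (d) of Exercise 5 (a sketch only, with F = R and n = 2 chosen for convenience), one can assemble the Grammian of trA over the standard basis {Eij } and observe that it is a permutation matrix, hence non-singular:

```python
import numpy as np

n = 2
# Standard basis E_ij of the matrix algebra F^{n x n}
basis = []
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        basis.append(E)

# Grammian of the trace form (M, N) -> tr(MN) with respect to {E_ij}
G = np.array([[np.trace(M @ N) for N in basis] for M in basis])
print(G)                          # a permutation matrix
print(np.linalg.det(G) != 0)      # non-degenerate: True
```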
6. Suppose now that (A, ◦) is an algebra over F , and g : A × A → F is a
bilinear form on A. We say that g is associative if and only if:
g(a ◦ b, c) = g(a, b ◦ c),
for all (a, b, c) ∈ A(3) . (For example, if fA were a form on an algebra A obtained from a functional f as in Exercise 4, then fA would be an
associative form. In particular, the trace form of any matrix algebra
(that is, a subalgebra of F n×n ) is associative. However, there are associative forms on non-associative algebras such as the Killing form on a
finite-dimensional Lie algebra.)
(a) Show that if g is an associative form on the (not necessarily associative) algebra (A, ◦) over F , then the subspace S := Radg (A) is
a 2-sided ideal in A.
(b) Now suppose A is an algebra over F (that is, an F -algebra which
is a ring). Let Assoc(A) be the vector space of all associative
forms on A. Show that there is an isomorphism
ζ : Assoc(A) → A∗ .
[Hint: Show that there is a natural mapping
Assoc(A) → hom(A ⊗A A, F )
and note that A ⊗A A ≅ A.]
(c) Using part (b), conclude that every associative form g ∈ Assoc(A) has the form g = fA for some functional f ∈ A∗ (the notation fA is from Exercise 4).
1. (Forms from field traces) Let K be a finite separable normal extension
of a subfield k and let G = Gal(K/k) be its Galois group. Then, as we
know, [K : k] = |Gal(K/k)| and there is a trace mapping tr : K → k
defined by
tr(α) = Σg∈G αg .
Since K is a vector space over k, we can define an associative symmetric
bilinear form B : K × K → k by setting B(α, β) := tr(αβ).
(a) Show that B is a non-degenerate form.
(b) Suppose K = Q(i), where i2 = −1 and k = Q, the field of rational
numbers. Is K anisotropic with respect to B? Explain.
(c) Answer the same question when K = Q(√2) and k = Q.
Recipe for constructing alternating forms.
7. Fix two vector spaces V and W over a field F . Let H be the vector
space of all functions V (n) → W . We let the group Sym(n) act on H
in the following way: For each permutation π ∈ Sym(n) and function
h ∈ H, let hπ be the function defined by
hπ (x1 , . . . , xn ) := h(xπ(1) , xπ(2) , . . . , xπ(n) ).
(Clearly H is a right F G-module.) Now for each function g ∈ H, let
E(g) := g Alt(n) be the orbit of g under the action of the group of all
even permutations. (Note that the orbit length, |E(g)| can assume
various cardinalities; for example if g is a constant function the orbit length is just one.) Finally we define the orbit sum,
A(g) := Σh∈E(g) h.
(a) Show that the n-multilinear functions form a Sym(n)-invariant subspace of H.
(b) Show that if g is n-multilinear, then A(g) is an n-multilinear function that is fixed under the action of the alternating group.
(c) Suppose g is an n-multilinear function, and that for some transposition t in Sym(n),
g t = −g.
Then prove that A(g) is an alternating n-linear mapping V (n) → W . (Of course, when W = F we call this an alternating "form", rather than a "mapping". However, we will have need of the more general construction in Exercise 18.)
16.8.3 Forms induced on wedge products
Suppose f is a reflexive sesquilinear form on a vector space V over a field F .
8. Let π1 and π2 be any two k-dimensional subspaces of V . Show that
π1⊥ ∩ π2 = 0 if and only if π2⊥ ∩ π1 = 0. [In this case π1 and π2 are said
to be opposite.]
9. Next let x⃗ = (x1 , . . . , xk ) and y⃗ = (y1 , . . . , yk ) be any two ordered k-sets of vectors of V . Then let M (x⃗, y⃗) denote the matrix





⎛ f (x1 , y1 ) · · · f (x1 , yk ) ⎞
⎜ f (x2 , y1 ) · · · f (x2 , yk ) ⎟
⎜     · · ·            · · ·    ⎟
⎝ f (xk , y1 ) · · · f (xk , yk ) ⎠ .

Show that the mapping
ĝ : V (2k) := V × V × · · · × V (2k factors) → F
defined by
ĝ(x1 , . . . , xk , y1 , . . . , yk ) := det M (x⃗, y⃗)
has these properties:
(a) It is alternating in the first k variables (that is, transposing two
of the xi changes the sign).
(b) It is alternating in the last k variables, the yi .
(c) It is 2k-multilinear.
Justify the conclusion that the mapping ĝ factors through
Vk
(V ) to induce a sesquilinear form
g:
^k
(V ) ×
578
^k
(V ) → F.
Vk
(V ) ×
[This form g is uniquely determined by the triple (V, f, k) and is called here the form induced on ∧^k(V) by f. It can suggestively be denoted ∧^k f, whatever that avails us.]
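On pure wedges the induced form is quite concrete: in the bilinear case, if F is the Gram matrix of f and the columns of X and Y are the xi and yj, then g(x1 ∧ · · · ∧ xk, y1 ∧ · · · ∧ yk) = det(X^T F Y). A minimal numerical sketch (numpy; names hypothetical):

import numpy as np

def induced_form(F, X, Y):
    """g on pure wedges: det of the k x k matrix of pairings f(x_i, y_j),
    for a bilinear f with Gram matrix F; the columns of X, Y are the vectors."""
    return np.linalg.det(X.T @ F @ Y)

# The dot product on R^3 with k = 2: (e1 ^ e2) pairs with itself to det(I_2) = 1.
F = np.eye(3)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
print(induced_form(F, X, X))   # 1.0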
10. Show that

M(x, y) =   M(y, x)^T          if f is symmetric,
            −M(y, x)^T         if f is symplectic,
            (M(y, x)^σ)^T      if f is Hermitian.

11. Show that the sesquilinear form g on ∧^k(V) defined at the end of Exercise 9 is
(a) symmetric if f is symmetric,
(b) symmetric if f is symplectic and k is even,
(c) symplectic if f is symplectic and k is odd, and is
(d) Hermitian if f is Hermitian.
12. Let g be the sesquilinear form on ∧^k(V) defined in Exercise 9 above. Show that the pure vector x1 ∧ · · · ∧ xk is not isotropic with respect to g if and only if ⟨x1, . . . , xk⟩ is a non-degenerate k-subspace of V with respect to f.
13. Show that if ∧x := x1 ∧ · · · ∧ xk and ∧y := y1 ∧ · · · ∧ yk are (pure) vectors of ∧^k(V), then g(∧x, ∧y) is non-zero if and only if ⟨x1, . . . , xk⟩ and ⟨y1, . . . , yk⟩ are k-spaces of V which are opposite with respect to f (see Exercise 8 above).
14. Suppose π = ⟨x1, . . . , xk⟩ is a k-subspace of V so that ∧x = x1 ∧ · · · ∧ xk is a typical pure vector of ∧^k(V). Show that if ∧x ∈ Rad_g(∧^k(V)), then f is degenerate. [Hint: Show that if ∧x is in the radical of g, then by the previous exercise π has no opposite k-subspace. Show that this forces π⊥ to have codimension less than k. Then use Exercise 6(b) above to reach the conclusion.]
It is easy to show that if f is degenerate then so is the induced form g (the student should be able to produce a radical element for g from a non-zero vector in the radical of f, noting as usual that k < dim V). What about the converse? The previous exercise suggests the following conjecture: if (V, f) is non-degenerate, then so is (∧^k(V), g = ∧^k f). At least that exercise says that if f is non-degenerate, then Rad_g(∧^k(V)) contains no non-zero pure vectors. The next two exercises imply that g is non-degenerate when (V, f) is a certain kind of finite-dimensional non-degenerate form.
15. Let f be a non-degenerate symplectic form on a 4-space V, with hyperbolic decomposition ⟨e1, e2⟩ ⊥ ⟨e3, e4⟩, where f(e1, e2) = 1 = f(e3, e4). Compute the Grammian matrix of g with respect to the ordered basis B = {e12, e13, e14, e23, e24, e34}, where, as usual, eij := ei ∧ ej.
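Here is a short numerical version of that computation (numpy; purely illustrative). Each entry of the Grammian is a 2 × 2 determinant: g(eij, ekl) = f(ei, ek)f(ej, el) − f(ei, el)f(ej, ek).

import numpy as np
from itertools import combinations

# Gram matrix of f for <e1, e2> perp <e3, e4>, f(e1, e2) = f(e3, e4) = 1.
F = np.array([[ 0, 1, 0, 0],
              [-1, 0, 0, 0],
              [ 0, 0, 0, 1],
              [ 0, 0, -1, 0]], dtype=float)

pairs = list(combinations(range(4), 2))       # e12, e13, e14, e23, e24, e34
G = np.array([[np.linalg.det(F[np.ix_([i, j], [k, l])]) for (k, l) in pairs]
              for (i, j) in pairs])
print(G.astype(int))   # one non-zero entry per row, as Exercise 16 predicts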
16. Now suppose (V, f) has a hyperbolic decomposition

V = ⟨e1, e2⟩ ⊥ · · · ⊥ ⟨e2n−1, e2n⟩.

For each k-subset J of the index set [1, . . . , 2n], let e(J) be the pure vector ∧_{j∈J} ej with the wedge factors taken in the natural ascending order of the indices. Generalize the calculation of the previous exercise by showing that with respect to any total ordering of the standard basis B = {e(J)} of ∧^k(V), the Grammian matrix M for g has exactly one non-zero entry in each row. Since M^T = M, −M or M^σ, conclude that this is also true of the columns. This makes M non-singular and g non-degenerate.
17. Now suppose (V, f) is non-degenerate with a finite orthogonal basis. Say

V = ⟨e1⟩ ⊥ · · · ⊥ ⟨en⟩.

For each k-subset J of Ωn := [1, . . . , n], define e(J) as in Exercise 16. Show that for k < n, the standard basis {e(J) | J ⊆ Ωn, |J| = k} is an orthogonal basis for (∧^k(V), g). So again, g is non-degenerate.
16.8.4 Special constructions involving symplectic forms
Here f will always be a symplectic form on a vector space V over a field F .
18. Suppose k is a positive integer greater than 1, but less than (the possibly infinite number) dim V. Suppose W is another vector space and form the space H of all functions V^(k) → W. Then, as in Exercise 7 (with n = k), H is a Sym(k)-module. We wish to apply that exercise to the case in which the vector space W is ∧^{k−2}(V) and g ∈ H is the function defined by

g(x1, . . . , xk) = (x1 ∧ · · · ∧ x_{k−2}) f(x_{k−1}, xk),

for all k-tuples of vectors x = (x1, · · · , xk). As before, let E(g) be the orbit of g under the action of Alt(k). Notice that this orbit has length exactly (k choose 2) = k(k − 1)/2. Let A(g) be the orbit sum of these functions.
(a) Show that A(g) is k-multilinear. (Very easy.)
(b) Show that A(g) is an alternating mapping. [Hint: Note that g^t = −g if t is the transposition (k − 1, k). Apply Exercise 7(c).]
(c) Conclude that A(g) factors through ∧^k(V) to produce a linear transformation

λ(k) : ∧^k(V) → ∧^{k−2}(V),

which is equivariant under the isometry group Aut(V, f). [Here is an example: λ(3) takes a ∧ b ∧ c to af(b, c) + bf(c, a) + cf(a, b). Similarly λ(4) takes a ∧ b ∧ c ∧ d to a sum of six terms.]
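As a sanity check on the example, here is λ(3) computed in coordinates (numpy; the helper names are hypothetical):

import numpy as np

F = np.array([[ 0, 1, 0, 0],
              [-1, 0, 0, 0],
              [ 0, 0, 0, 1],
              [ 0, 0, -1, 0]], dtype=float)   # a symplectic Gram matrix

def f(x, y):
    return x @ F @ y

def lam3(a, b, c):
    """lambda(3): a ^ b ^ c  ->  a f(b,c) + b f(c,a) + c f(a,b)."""
    return a * f(b, c) + b * f(c, a) + c * f(a, b)

e = np.eye(4)
print(lam3(e[0], e[1], e[2]))   # [0. 0. 1. 0.] = e3: only f(e1, e2) = 1 survives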
19. Suppose (V, f) is any symplectic form. We have two mappings

∧ : ∧^2(V) × ∧^2(V) → ∧^4(V), and
λ(4) : ∧^4(V) → ∧^2(V),

the first induced by the grading of the exterior algebra, and the second provided by the previous exercise. That of course means that there is an algebra A(V, f) = (∧^2(V), ◦) defined by these two mappings: for any two elements a and b of ∧^2(V) we have

a ◦ b = λ(4)(a ∧ b).
This algebra depends only on the symplectic space (V, f) and so is called here a symplectic algebra.9 Show that for dim V ≥ 4, the symplectic algebras are commutative but generally nonassociative.
9 This algebra probably exists in the literature, but I have not seen it mentioned in any theorems, so I suppose either it hasn't been thoroughly investigated or, at least if it has, news of its successes hasn't reached Kansas yet. Considering the recent cancellation of all but a few token journals, possibly it will never reach Kansas.
20. Suppose V is a vector space over the field F . Fix an integer k less
than the (possibly infinite) dimension of V . Suppose H is a collection
of k-subspaces of V with these two properties:
(a) There exists a k-subspace W not belonging to H.
(b) For any pair of subspaces (U, X) of V with U ≤ X, dim U = k − 1 and dim X = k + 1, either exactly one of the k-subspaces between U and X belongs to H, or else all such k-subspaces belong to H.
Then H is said to be a geometric hyperplane of the Grassmann
space Ak,n−1 . (Not all these terms are going to be justified here: I am
just saving time.)
This notion is intimately related to alternating k-linear forms, and so deserves its place in this chapter. Let's start from the other end and suppose h is an alternating k-linear form. As we all know, that means

h = sgn(π) h^π for all π ∈ Sym(k),

in the notation of Exercise 7. Suppose v = (v1, . . . , vk). Then we let P(v) denote the subspace generated by these vi. Of course, if h is an alternating k-linear form and P(v) is not a k-space, then h(v) = 0. So the big question is what happens when P(v) is a k-space.
We say that an alternating k-linear form h kills a k-subspace U of V if and only if, for some basis v = (v1, . . . , vk) of U,

h(v) = 0.
(a) One observes that if this is true of one basis of U, it is true of all bases of U. (Justify this assertion.)
(b) Suppose h is a non-zero alternating k-linear form (k is an integer at least two and less than n). Let H be the collection of all k-subspaces of V which are killed by h. Show that H is a geometric hyperplane of the Grassmann space Ak,n−1. [No hints.]
Now it is time to record the theorem in the other direction:
Theorem 387 (Sh??) Suppose V is a vector space. Fix a positive integer k and suppose H is a collection of k-subspaces of V satisfying the conditions (a) and (b) listed in Exercise 20. Then there exists an alternating k-linear form h such that H is precisely the set of k-subspaces of V which are killed by h.
21. Now suppose f is a symplectic form on the n-space V . Again choose
a positive integer k less than n. Now suppose (U, S, W ) is a chain of
subspaces such that dim U = k − 1, dim S = k, dim W = k + 1. (Then
S and (U, W ) are an incident point-line pair of the Grassmann space
Ak,n−1.)
Suppose S is a non-degenerate k-subspace with respect to the form f .
Then, as f is symplectic, k is even.
(a) Show that there exist non-zero vectors r1, r2, r3, and a non-degenerate subspace D such that

U = ⟨r1⟩ ⊥ D,
S = ⟨r1, r2⟩ ⊥ D,
W = ⟨r3⟩ ⊥ ⟨r1, r2⟩ ⊥ D = ⟨r3⟩ ⊥ S.
(b) Show that if S is a non-degenerate k-space with respect to a symplectic form f on V, then for any such chain (U, S, W), either exactly one or all of the k-subspaces between U and W are degenerate with respect to f.
Now according to the theorem cited above, if k is even, there must exist
an alternating k-linear form which kills exactly the k-subspaces which
are degenerate with respect to f . What is this form?
22. (Kasikova) Let (V, f) be a symplectic form and let k = 2m be an even integer. Let ρ : V^(k) → F be the function defined by the formula

ρ(x1, . . . , xk) = f(x1, x2) f(x3, x4) · · · f(x_{2m−1}, x_{2m}).

Next let R be the complete orbit ρ^{Alt(k)}. (Note that this orbit length |R| is the product of the first m odd integers: 1 · 3 · 5 · · · (2m − 1).) Finally, let h be the orbit sum:

h := Σ_{r∈R} r.
(a) Show that h is an alternating k-linear form. [Hint: Apply Exercise
7(c) of this chapter.]
(b) Show that h assumes a non-zero value on a k-tuple (x1 , . . . , xk ) if
and only if hx1 , . . . , xk i is a non-degenerate k-dimensional space.
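A brute-force sketch of h in the smallest interesting case k = 4, m = 2 (numpy and sympy; names hypothetical). Summing ρ over all twelve even permutations counts each of the three orbit elements four times, so we divide by the stabilizer size 12/3 = 4:

import numpy as np
from itertools import permutations
from sympy.combinatorics import Permutation

F = np.array([[ 0, 1, 0, 0],
              [-1, 0, 0, 0],
              [ 0, 0, 0, 1],
              [ 0, 0, -1, 0]], dtype=float)   # non-degenerate symplectic Gram

def rho(x1, x2, x3, x4):
    return (x1 @ F @ x2) * (x3 @ F @ x4)

def h(*xs):
    """Orbit sum of rho over Alt(4): sum over even permutations, divided
    by the stabilizer size |Alt(4)| / (2m-1)!! = 12 / 3 = 4."""
    evens = [p for p in permutations(range(4)) if Permutation(list(p)).is_even]
    return sum(rho(*(xs[i] for i in p)) for p in evens) / 4

e = np.eye(4)
print(h(e[0], e[1], e[2], e[3]))   # 1.0: <e1, e2, e3, e4> is non-degenerate
print(h(e[0], e[1], e[1], e[3]))   # 0.0: h vanishes on repeated arguments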
23. Again suppose (V, f) is a symplectic form. Suppose k is an even positive integer, and let X = {x1, . . . , xk} be a set of k linearly independent vectors. Form a graph Γ(X) whose vertices are the k vectors xi, two of them being adjacent if and only if the symplectic inner product between them is non-zero. A complete matching of a graph is a collection of k/2 vertex-disjoint edges covering all k vertices. Some graphs do not possess a complete matching – for example, one whose connected components are pentagons. Show that if Γ(X) contains no complete matching, then ⟨X⟩ is a degenerate k-space. [Hint: Consider Kasikova's form h of the previous exercise.]
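The pentagon example is easy to see numerically (numpy; the data is of course illustrative): a Gram matrix supported on two disjoint 5-cycles has rank 8 < 10, so the span of the ten vectors is degenerate, in line with the exercise.

import numpy as np

def pentagon():
    """Antisymmetric Gram block for five vectors whose graph is a 5-cycle."""
    A = np.zeros((5, 5))
    for i in range(5):
        A[i, (i + 1) % 5] = 1.0
        A[(i + 1) % 5, i] = -1.0
    return A

G = np.zeros((10, 10))
G[:5, :5] = pentagon()
G[5:, 5:] = pentagon()
print(np.linalg.matrix_rank(G))   # 8 < 10: the span of X is degenerate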
Chapter 17
Quadratic Forms
17.1 Quadratic Forms.
17.1.1 Definitions
Let V be a vector space over a field F . A function Q : V → F is called a
quadratic form if and only if
(1) Q(vα) = Q(v)α² for all (v, α) ∈ V × F, and
(2) B(u, v) := Q(u+v)−Q(u)−Q(v) defines a bilinear form B : V ×V → F .
The bilinear form B defined by the formula given in (2) is called the
bilinear form associated with Q. Clearly a scalar multiple αQ (whose
value at the vector v is (αQ)(v) := αQ(v)) is also a quadratic form and its associated bilinear form is αB, proportional to B. Similarly, the sum Q1 + Q2
of two quadratic forms is a quadratic form whose associated bilinear form is
B1 + B2 , where Bi is the bilinear form associated with Qi , i = 1, 2. Thus
all quadratic forms on V form a vector space quad(V ) – something nice
to know; but we have little need for it here.1 Patently the bilinear form B
associated with Q is a symmetric form, but in characteristic 2 there is more:
B is then a symplectic form, not just a symmetric form. (One calculates
1 It is interesting to note, however, that when F has characteristic 2, then the act of
“taking the associated bilinear form” produces a semilinear transformation, τ : quad(V ) →
symp(V ), into the space of all symplectic forms on V .
that in characteristic 2,
B(u, u) = Q(2u) − 2Q(u) = Q(0) − 0 = 0 for all u ∈ V,
so all vectors are isotropic.)
A vector v is said to be singular if and only if Q(v) = 0. Otherwise
v is said to be non-singular. Obviously 0 is a singular vector, and if v is a singular vector, any scalar multiple of it is also singular. Thus the 1-dimensional subspaces of V are segregated into two sets: the singular 1-spaces and the non-singular 1-spaces.
A subspace W of V is said to be totally singular if and only if Q(W ) = 0.
If W is totally singular, then for any vectors w1, w2 ∈ W,
B(w1 , w2 ) = Q(w1 + w2 ) − Q(w1 ) − Q(w2 ) = 0 − 0 − 0 = 0,
so B(W, W ) = 0. Thus:
For all subspaces W , totally singular implies totally isotropic.
If the characteristic of F is not 2, the converse even holds. Not only is B
defined by Q, but knowing B, one can recover Q by the formula
Q(u) = B(u, u)/2.
Thus in characteristic distinct from 2, the theory of quadratic forms does not
add much to the theory of symmetric bilinear forms presented in the previous
section. So there is a natural emphasis on characteristic 2 in this subject.
17.1.2 Forms of finite rank
We treat two common examples of quadratic forms of finite rank.
EXAMPLE 1: Let V be an F-vector space with basis X = {x1, · · · , xn}. Let p ∈ F[X1, · · · , Xn] be a homogeneous polynomial of degree 2 in the indeterminates X1, . . . , Xn. We define a function Qp : V → F by writing each vector v (uniquely) as v = Σ xi αi and then setting

Qp(v) = p(α1, · · · , αn),

where we have substituted the coefficient αi for Xi. Then Qp is a quadratic form. (To see this, observe that p is a linear combination of monomial terms Xi Xj, and that the mapping V → F sending v to αi αj is a quadratic form. From the definition, any sum Q1 + Q2 of quadratic forms is a quadratic form, and so Qp is a linear combination of quadratic forms.2)
Conversely, if X is a finite basis for V , and Q : V → F is a quadratic
form, then we can find a polynomial p such that Q = Qp as above. (We leave
these details to the exercises.)
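The two defining properties of a quadratic form are easy to verify symbolically for a sample p (sympy; a sketch with hypothetical names):

import sympy as sp

a1, a2, b1, b2, t = sp.symbols('a1 a2 b1 b2 t')
p = lambda u, v: u**2 + 3*u*v + 5*v**2          # homogeneous of degree 2

Q = lambda w: p(w[0], w[1])                     # coordinates in the basis x1, x2
u, v = (a1, a2), (b1, b2)

# (1) Q(v t) = Q(v) t^2:
print(sp.expand(Q((t*a1, t*a2)) - t**2 * Q(u)))            # 0

# (2) B(u, v) = Q(u+v) - Q(u) - Q(v) is bilinear:
B = sp.expand(Q((a1 + b1, a2 + b2)) - Q(u) - Q(v))
print(B)   # 2*a1*b1 + 3*a1*b2 + 3*a2*b1 + 10*a2*b2, linear in each slot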
EXAMPLE 2: There is another way to formally construct a quadratic form
of finite rank from certain data on a basis.
DATA IN CHARACTERISTIC NOT 2: This consists simply of a symmetric n × n matrix G = (gij) with entries from F. This matrix can be used to define a symmetric bilinear form B on V so that gij = B(vi, vj), where V is the free F-module over (v1, · · · , vn). Then set Q(vi) = B(vi, vi)/2. So here our basic data is

gij = B(vi, vj), i ≤ j, and Q(vi) is determined.
DATA IN CHARACTERISTIC 2: This consists of a symmetric matrix G = (gij) with entries from F, but with all diagonal entries gii = 0. Then, as above, if V is a free F-module over {v1, . . . , vn}, G can be regarded as the Grammian of a symplectic form B : V × V → F, so gij = B(vi, vj). We next assign the values Q(vi) arbitrarily. So our basic data is

gij = B(vi, vj), i < j, and Q(vi) arbitrarily assigned.
Then in either case, we have determined a function Q : V → F from the basic data by this rule:

If u = Σ vi αi, then Q(u) = Σ_{i<j} B(vi, vj) αi αj + Σ_i Q(vi) αi².    (17.1)
If B is to be the bilinear form associated with Q, then this is the only possible formula for Q, since it follows from the data and the repeated application of the basic formula

Q(a + b) = B(a, b) + Q(a) + Q(b)

to the sum u = Σ vi αi. So we have a quadratic form Q : V → F formally constructed from the initial data in each case.
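A direct transcription of rule (17.1) (plain Python; the helper is hypothetical and works over the integers for simplicity; over a field of characteristic 2 one would reduce accordingly):

def Q_from_data(g, q_diag, alpha):
    """Rule (17.1): g[i][j] = B(v_i, v_j) for i < j, q_diag[i] = Q(v_i),
    alpha = coordinates of u in the basis v_1, ..., v_n."""
    n = len(alpha)
    cross = sum(g[i][j] * alpha[i] * alpha[j]
                for i in range(n) for j in range(i + 1, n))
    return cross + sum(q_diag[i] * alpha[i] ** 2 for i in range(n))

# Example data: B(v1, v2) = 1 and the assigned values Q(v1) = Q(v2) = 0
# give Q(v1 + v2) = 1.
print(Q_from_data([[0, 1], [0, 0]], [0, 0], [1, 1]))   # 1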
2 I was wrong. We did use the fact that quad(V) was a vector space.
17.1.3 More definitions relative to a quadratic form
Now let B be the symmetric bilinear form associated with a quadratic form
Q : V → F . We still employ all of the terminology about B that we had for
sesquilinear forms. Thus the “perp” symbol ⊥ is used relative to B; we say
that subspaces U and W are orthogonal if and only if U ⊆ W ⊥ ; vector v is
isotropic if and only if B(v, v) = 0; and we say that B is non-degenerate
if and only if Rad(B) = 0.
However the definitions regarding Q are just a little different. First the
radical of Q is the set of all singular vectors which lie inside Rad(B), and
is denoted Rad(Q). (We shall see below that Rad(Q) is always a subspace.)
We then say that Q is non-degenerate if and only if Rad(Q) = 0. Now
when the characteristic is not 2, we always have Rad(Q) = Rad(B). But
if char F = 2, it can happen that Rad(Q) is a proper subspace of Rad(B), so the distinction is only relevant in this characteristic.
Also for quadratic forms Q : V → F, it is a convention to call a subspace anisotropic with respect to Q if and only if the subspace contains no non-zero singular vectors. Again, in characteristic distinct from 2, where isotropy and singularity coincide, anisotropy with respect to Q is equivalent to anisotropy with respect to B. But in characteristic 2, where B is symplectic, all vectors are isotropic, so there are no non-zero subspaces which are anisotropic with respect to B – so that notion is quite useless. On the other hand, the notion of being anisotropic with respect to Q becomes a valuable distinction in this characteristic.
17.1.4 The poset of totally singular subspaces
Now let Q : V → F be a quadratic form in arbitrary characteristic. Let B
be its associated bilinear form. Suppose a and b are singular vectors which
are perpendicular. Then
Q(a + b) = B(a, b) + Q(a) + Q(b) = 0 + 0 + 0 = 0.
Setting b′ = bα, we see that (a, b′) enjoys the same hypotheses that we imposed upon (a, b), so Q(a + bα) = 0. Since every 1-space of ⟨a, b⟩ is spanned either by b or by a vector of the form a + bα, it follows that ⟨a, b⟩ is a totally B-isotropic subspace with all its 1-spaces singular.
Conversely, assume ⟨a, b⟩ is totally singular. Then 0 = Q(a + b) = B(a, b) + Q(a) + Q(b) = B(a, b), so a and b are orthogonal, and ⟨a, b⟩ is totally B-isotropic. Thus we have
Lemma 388 Let Q : V → F be a quadratic form with associated bilinear
form B. Then one has:
1. Any totally singular subspace is totally isotropic with respect to B.
2. If S is any collection of pairwise orthogonal singular vectors, then ⟨S⟩ is a totally singular subspace of V. In fact, the sum over any family of pairwise orthogonal totally singular subspaces is totally singular. In particular, every maximal totally singular subspace (if there are any) contains Rad(Q).
3. Rad(Q) is a subspace of Rad(B). (They are equal if char F ≠ 2.)
4. Q, when restricted to Rad(B), is an additive function (that is, a homomorphism of additive groups)
Q|Rad(B) : (Rad(B), +) → (F, +),
with kernel equal to Rad(Q).
The following corollaries are immediate consequences of the preceding lemma; the proofs are left as exercises for the student.
Corollary 389 Let R be any subspace of Rad(Q). Set V̄ := V /R and let
B̄ : V̄ × V̄ → F be the bilinear form induced on V̄ by B. Then the function
Q̄ : V̄ → F given by Q̄(u + R) = Q(u), u ∈ V is well-defined, and is a
quadratic form on V̄ with B̄ as its associated bilinear form.
Corollary 390 Suppose Q is a quadratic form over the F -vector space V
with associated bilinear form B.
1. Then the collection T of all totally singular subspaces of V forms a
polar system with respect to (V, B), in the sense of Subsection 16.2.1 of
the previous Chapter.
2. If ⟨T⟩ = V and every member of T lies in a maximal member of T ,
then Rad(T ) = Rad(Q).
17.1.5 Radicals of quadratic forms in characteristic 2
Let us refine conclusion 3 of Lemma 388 by examining the relationship
between Rad(Q) and Rad(B) in characteristic 2.
We first note that, in characteristic 2, the set F² of all squares in F forms a subfield of F (it is the image of the Frobenius endomorphism α ↦ α²). Of course, in some cases F² = F – for example, when F is a finite field of order q = 2^m. On the other hand, if F = K(x), where x is transcendental over the subfield K, then F² ≠ F, since x has no square root in F.
Let us suppose W = Rad(B). Then Q restricted to W is an additive function

Q|W : (W, +) → (F, +),

with ker Q|W = Rad(Q), and

W = Rad(Q) ⊥ A,

where A is anisotropic with respect to Q. For the moment, A is simply any vector space complement of Rad(Q). How large can A be? The answer is that A can have dimension as large as the field index [F : F²]. Suppose {ωσ}I is an F²-basis for F, as a vector space over F², indexed by I. Suppose A has an F-basis {aσ}J, where J is a subset of I, and Q is formally defined from data (as in Example 2) by setting Q(aσ) = ωσ. Then, as B is zero here, for all ασ ∈ F,

Q(Σ aσ ασ) = Σ ασ² ωσ,

which, thanks to the F²-independence of the ωσ, cannot be zero unless all ασ = 0. This makes A anisotropic. We thus have:
Lemma 391 If Q : V → F is a quadratic form with char F = 2, then
Rad(B) = Rad(Q) ⊥ A,
where A is anisotropic. The codimension of Rad(Q) in Rad(B) is bounded
by the field index [F : F 2 ] and this bound is best possible.
In particular, if F 2 = F (as is the case with finite fields), then Rad(B) =
Rad(Q) or else Rad(Q) is a hyperplane of Rad(B).
17.2 Hyperbolic decompositions with respect to a quadratic form.
We wish to discuss next what happens when there are singular vectors not
in Rad(Q). To do this we must introduce a slight refinement of our previous
notion of “hyperbolic space” for sesquilinear forms that is useful for quadratic
forms.
Suppose Q : V → F is a quadratic form with associated symmetric form B. We say that V is a hyperbolic space (of the quadratic form) if and only if

V = ⟨s1, t1⟩ ⊥ · · · ⊥ ⟨sn, tn⟩,

where “⊥” is taken with respect to the form B, B(si, ti) = 1, and each of the si and ti is a singular vector. Under the older notion of “hyperbolic for B”, it was required that si and ti be isotropic for all i; here they are required to be singular. Of course there is no dif