A Beginner’s Guide
to Modern Set Theory
Martin Dowd
Product of
Hyperon Software
PO Box 4161
Costa Mesa, CA 92628
www.hyperonsoft.com
Copyright © 2010 by Martin Dowd
1. Introduction
2. Formal logic
3. Axioms of equality
4. The integers
5. Informal set theory
6. Structures and models
7. Models of Peano arithmetic
8. The real numbers
9. Computability
10. Independence
11. ZFC
12. Proper classes
13. Ordinals and cardinals
14. The real numbers (II)
15. The continuum hypothesis
16. Absoluteness
17. Admissible sets
18. Formalization of syntax
19. Constructible sets
20. CH is true in L
21. Forcing
22. ¬CH is consistent
23. Clubs, stationary sets, and diamond
24. Trees
25. The Suslin hypothesis
26. Diamond implies ¬SH
27. Iterated forcing
28. Martin’s axiom
29. SH is consistent
30. Inaccessible cardinals
31. Mahlo cardinals
32. Greatly Mahlo cardinals
33. Reflection principles
34. Indescribable cardinals
35. Ultrapowers
36. Measurable cardinals
37. Indiscernibles
38. 0#
39. Relative constructibility
40. Direct limits
41. L[U] and iterated ultrapowers
42. The sharp operator
43. Cardinals larger than measurable
44. Kunen’s theorem
45. Rudimentary functions
46. The Jensen hierarchy
47. Fine structure
48. Upward extension
49. Fine structural ultrapowers
50. The covering lemma
51. Cardinal arithmetic
52. Square
53. Independence of AC
54. Proper forcing
55. Core models
56. Consistency strength
57. Descriptive set theory
58. Determinacy
59. Determinacy and descriptive set theory
60. Determinacy and 0#
61. Determinacy and large cardinals
62. Forcing axioms
63. Some observations
Appendix 1. Axioms for plane geometry
Appendix 2. Computability (II)
References
Index
Index of symbols
1. Introduction.
As the title suggests, this book is intended to provide an introduction to modern set theory, to readers with little or no knowledge of
mathematical logic. As such, it should be useful to anyone interested
in learning about modern set theory, without having to wade through
an entire text such as the “Millennium Edition” [Jech2]. Readers might
fall into two categories: those who are not interested in reading further,
and those who are. For the latter, this book hopefully provides useful
orientation.
It is hoped that advanced high school students will find this book
useful. Admittedly only the most intrepid student would finish it in high
school; but the first 15 chapters, and the two appendices, are hopefully
fairly accessible. Resources for advanced high school mathematics are
mainly in calculus and linear algebra, with some resources in other areas. Resources in mathematical logic have typically been scarce, one
example being a 1958 book on Gödel’s proof [NagNew]. The website
[Wiki, Mathematical logic] has overviews of various topics, and links to
additional resources.
The present book contains an introduction to mathematical logic
sufficient for its purposes, and thus should serve as a useful introduction
for other purposes. Various other topics are covered for the same reason,
so that the book is fairly self-contained.
Set theory, like any branch of contemporary mathematics, consists
of an overwhelming volume of technical definitions and arguments. On
the other hand, non-technical introductions sometimes engage in circumlocutions intended to avoid technical detail, so convoluted that they
become confusing. The present book pursues an intermediate course,
covering technical details in outline and giving references, so that the
main content can be given with some discussion of technical details.
The book consists of a series of sections, each covering a particular
topic. The table of contents gives a list of the sections. The end of a
proof is denoted using the symbol “⊳”. The author thanks Dr. Herbert
Enderton for reading a draft of the manuscript.
2. Formal logic.
It is a discovery of late 19th and early 20th century mathematics,
that mathematical theorems can be stated and proved in formal logic.
This discovery did not change the way mathematics is done; theorems
are proved by working mathematicians using informal logic, which other
mathematicians can follow, and which may refer to extensive amounts of
material already accepted as fact. Rather, formal logic brought complete
precision to the analysis of mathematical reasoning, clarified various
issues which had been under debate, and established formal logic as a
branch of mathematics in its own right.
Formal logic relies on the fact that statements of mathematics can
be specified in a formal language. Indeed, this observation holds in
other areas, and formal logic has found uses in addition to its use in
mathematics. Statements are finite strings of symbols, each symbol
being chosen from an “alphabet” of symbols. For this reason, formal
logic is also called symbolic logic.
The alphabet of the formal language of mathematics is divided into
groups of symbols, as follows.
Logical symbols
    Punctuation marks           ( ) ,
    Propositional connectives   ¬ ∧ ∨ ⇒ ⇔
    Quantifiers                 ∀ ∃
    Variables                   x_0, x_1, . . .
Non-logical symbols
    Predicate symbols           P_0^n, P_1^n, . . .
    Function symbols            f_0^n, f_1^n, . . .
    Constant symbols            c_0, c_1, . . .
The superscript n in predicate and function symbols is an integer giving its “valency”, i.e., the number of arguments it applies to; it will
invariably be omitted.
Not every string of symbols is “legal”; those that are, are called
formulas. These may be defined by giving rules for building them up, as
follows. A term is either a variable, a constant symbol, or f (t1 , . . . , tn )
where f is a function symbol of valency n and t1 , . . . , tn are terms. An
atomic formula is a formula P (t1 , . . . , tn ) where P is a predicate symbol
of valency n and t1 , . . . , tn are terms. A formula is either an atomic
formula, ¬F , F1 ∧ F2 , F1 ∨ F2 , F1 ⇒ F2 , F1 ⇔ F2 , ∀xF , or ∃xF , where
F , F1 , and F2 are formulas and x is a variable.
The preceding style of definition, where objects which are already
built up can be used to build up new objects, is called “recursive”. A
“shortcut” has been taken; the subformulas in the definition of a formula
should be enclosed in parentheses, to avoid ambiguity, although some
of the parentheses can be made optional (requiring a more laborious
recursive definition).
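The recursive definition of terms and formulas can be mirrored by a recursive data type. The following is a minimal sketch in Python; the class names (Var, Const, Func, Atom, Not, BinOp, Quant) and the printing function show are illustrative choices made here, not notation from the text. Every subformula is printed fully parenthesized, which is one way of avoiding the ambiguity just mentioned.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Func:
    symbol: str
    args: Tuple["Term", ...]  # the valency is len(args)


Term = Union[Var, Const, Func]


@dataclass(frozen=True)
class Atom:
    symbol: str
    args: Tuple[Term, ...]


@dataclass(frozen=True)
class Not:
    body: "Formula"


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "∧", "∨", "⇒", "⇔"
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Quant:
    q: str  # "∀" or "∃"
    var: Var
    body: "Formula"


Formula = Union[Atom, Not, BinOp, Quant]


def show(e) -> str:
    """Print a term or formula, following the recursive definition case by case."""
    if isinstance(e, (Var, Const)):
        return e.name
    if isinstance(e, (Func, Atom)):
        return f"{e.symbol}({', '.join(show(a) for a in e.args)})"
    if isinstance(e, Not):
        return f"¬({show(e.body)})"
    if isinstance(e, BinOp):
        return f"({show(e.left)}) {e.op} ({show(e.right)})"
    if isinstance(e, Quant):
        return f"{e.q}{e.var.name}({show(e.body)})"
    raise TypeError(f"not a term or formula: {e!r}")


# Example: the formula ∀x(P(x) ⇒ Q(f(x), c)).
x = Var("x")
example = Quant("∀", x, BinOp("⇒", Atom("P", (x,)),
                              Atom("Q", (Func("f", (x,)), Const("c")))))
print(show(example))
```

Only objects built by these constructors count as formulas, which corresponds to the statement that not every string of symbols is legal.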
The notion of the free and bound occurrences of variables in a formula is an important one, and may be defined recursively as follows.
In an atomic formula, all occurrences of variables are free. In a propositional combination of formulas, all occurrences of variables are free
or bound as they are in the constituent subformulas. In ∀xF or ∃xF ,
any free occurrence of x in F becomes bound; all other occurrences are
free or bound as they are in F . A sentence is a formula in which all
occurrences of variables are bound.
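The recursive definition of free occurrences translates directly into a recursive computation. The sketch below is an illustration under simplifying assumptions: formulas are encoded as nested Python tuples with tags such as "atom", "implies", and "forall" (an encoding chosen here, not taken from the text), and the function returns the set of variables having at least one free occurrence rather than tracking individual occurrences.

```python
def term_vars(t):
    """Variables of a term: ("var", name), ("const", name) or ("fn", symbol, [args])."""
    kind = t[0]
    if kind == "var":
        return {t[1]}
    if kind == "const":
        return set()
    if kind == "fn":
        return set().union(*(term_vars(a) for a in t[2]))
    raise ValueError(f"unknown term: {t!r}")


def free_vars(f):
    """Set of variables with at least one free occurrence in the formula f."""
    kind = f[0]
    if kind == "atom":  # ("atom", symbol, [terms]): every occurrence is free
        return set().union(*(term_vars(t) for t in f[2]))
    if kind == "not":  # ("not", F)
        return free_vars(f[1])
    if kind in ("and", "or", "implies", "iff"):  # (tag, F1, F2)
        return free_vars(f[1]) | free_vars(f[2])
    if kind in ("forall", "exists"):  # (tag, x, F): free occurrences of x become bound
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown formula: {f!r}")


def is_sentence(f):
    """A sentence is a formula with no free occurrences of variables."""
    return not free_vars(f)


# ∀x(P(x) ⇒ Q(x, y)): x is bound, y is free.
body = ("implies", ("atom", "P", [("var", "x")]),
        ("atom", "Q", [("var", "x"), ("var", "y")]))
phi = ("forall", "x", body)
print(free_vars(phi), is_sentence(phi), is_sentence(("exists", "y", phi)))
```

Note that a variable may occur both free and bound in the same formula, as in P(x) ∧ ∀xP(x); the function above reports x as free in that case.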
Later it will be seen that, given an interpretation in a mathematical
setting of the non-logical symbols, a meaning can be assigned to any
formula. Some discussion is useful here. In general, a formula defines a
“predicate” on the “universe of discourse”: if values from the universe
are assigned to the free variables, the formula takes on the value of either
true or false. In particular, a sentence is a statement which is either true
or false.
A brief statement of the meaning of the propositional connectives
and quantifiers can be given, as follows.
¬F means “not F ” (negation)
F1 ∧ F2 means “F1 and F2 ” (conjunction)
F1 ∨ F2 means “F1 or F2 ” (disjunction)
F1 ⇒ F2 means “if F1 then F2 ” (implication)
F1 ⇔ F2 means “F1 if and only if F2 ” (bi-implication)
∀xF means “for all x, F ” (universal quantification)
∃xF means “there exists x, F ” (existential quantification)
Having a formal definition of a mathematical statement, a formal
definition can now be given of a proof. Certain formulas are specified
as “axioms”, and rules are given for deducing formulas from formulas
already deduced. Some axioms are axioms of formal logic, and are
called “logical”. Other axioms are specific to a particular setting, and
are called “non-logical”. The rules are all logical.
The logical axioms of formal logic are chosen so that they are true
in any setting, and in any setting the rules produce true statements from
statements already known to be true. The non-logical axioms are true
in settings of interest.
Even though the principles are clear without giving one, an example
of a system of logical axioms and rules will be given. It will be given
for a smaller alphabet. A larger alphabet is more expressive, but a
smaller alphabet results in fewer axioms and rules. The variation is
inessential; in particular, the larger alphabet can be expressed in
terms of the smaller one. The connectives and quantifiers used in the
axioms and rules will be ¬, ⇒, and ∀.
In the following, let F, G, H be formulas. If F is a formula, x a
variable, and t a term, F_{t/x} will denote the formula obtained from F by
replacing each free occurrence of x by t. There are three propositional
logical axioms.
F ⇒ (G ⇒ F )
(H ⇒ (F ⇒ G)) ⇒ ((H ⇒ F ) ⇒ (H ⇒ G))
(¬F ⇒ G) ⇒ ((¬F ⇒ ¬G) ⇒ F )
There is one propositional rule.
From F and F ⇒ G, deduce G.
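That the three propositional axioms are true in any setting, and that the propositional rule produces true statements from true statements, can be checked mechanically with truth tables. The following sketch treats F, G, and H as truth values only, so it checks propositional validity and says nothing about the quantifier axiom or rule; the function names are illustrative.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth table of ⇒: false only when a is true and b is false."""
    return (not a) or b


def axiom1(f, g, h):
    # F ⇒ (G ⇒ F)
    return implies(f, implies(g, f))


def axiom2(f, g, h):
    # (H ⇒ (F ⇒ G)) ⇒ ((H ⇒ F) ⇒ (H ⇒ G))
    return implies(implies(h, implies(f, g)),
                   implies(implies(h, f), implies(h, g)))


def axiom3(f, g, h):
    # (¬F ⇒ G) ⇒ ((¬F ⇒ ¬G) ⇒ F)
    return implies(implies(not f, g),
                   implies(implies(not f, not g), f))


def is_tautology(schema) -> bool:
    """True if the schema holds under all 8 assignments of truth values."""
    return all(schema(f, g, h) for f, g, h in product([False, True], repeat=3))


def modus_ponens_is_sound() -> bool:
    """Whenever F and F ⇒ G are both true, G is true."""
    return all(g for f, g in product([False, True], repeat=2)
               if f and implies(f, g))


print([is_tautology(a) for a in (axiom1, axiom2, axiom3)], modus_ponens_is_sound())
```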
There is one quantifier axiom.
F ⇒ ∀xG |= F ⇒ G_{t/x}, provided no occurrence of a variable of t
becomes bound.
There is one quantifier rule.
From F ⇒ G deduce F ⇒ ∀xG, provided x does not occur free in
F.
Note that arbitrary formulas may occur in a proof, and not just
sentences. This is an artifact of the method; quantifiers get introduced
as the formulas of the proof become more complex. A formula with free
variables is considered to be true if it is true regardless of the values
assigned to its free variables (that is, if its “universal closure” is a
true sentence).
As has been seen, the “syntax” of formal (or mathematical) logic
consists of an alphabet, and rules for building formulas. Statements
of mathematics are proved to be true using the axioms and rules of a
formal system for making deductions. The semantics of mathematical
logic consists of assigning in a rigorous manner a meaning to each formula; this requires some additional concepts, and is left to section 6.
Once all this is specified, theorems may be proved about mathematical
logic itself, which delineate the way in which it captures mathematical
reasoning.
There are a number of introductions to mathematical logic, among
them [Belaniuk], [Enderton], [Mendelson], [Magnus], and chapter 11 of
the author’s self-published advanced undergraduate algebra text
[Dowd1]. As will be seen in section 11, formal logic is an essential
ingredient of modern set theory. Historically, early developments in
mathematical logic and set theory overlapped and influenced each other.
A relatively recent development in mathematical logic is the use of
computers to produce “formal proofs” of mathematical theorems, using
a known “informal proof” as a starting point. The December 2008 issue
of the Notices of the American Mathematical Society contains several
articles on the subject.
3. Axioms of equality.
The equality predicate, for which the symbol = is used, has a special
status in formal logic. It is a binary (valency 2) predicate. As for many
common binary predicates, the notation x = y is used in mathematical
writing rather than =(x, y).
In settings where equality is present, it is meant to be interpreted
as equality, that is, x = y holds only when x and y are assigned the
same value. There are some subtleties in handling the special status of
the equality predicate; and some variations in how this is done. More
will be said in section 6.
If equality is present, the axioms for it may be considered to be
added as “quasi-logical” (standardized non-logical) axioms. These are
as follows.
x = x
x = y ⇒ y = x
x = y ⇒ y = z ⇒ x = z
x1 = y1 ⇒ · · · ⇒ xn = yn ⇒ P (x1 , . . . , xn ) ⇒ P (y1 , . . . , yn ), for
any valency n predicate symbol P .
x1 = y1 ⇒ · · · ⇒ xn = yn ⇒ f (x1 , . . . , xn ) = f (y1 , . . . , yn ), for any
valency n function symbol f .
In the foregoing, x, y, etc. denote variables. Also, the abbreviation F1 ⇒
· · · ⇒ Fk is used for F1 ⇒ (· · · ⇒ Fk ); this may also be written as
(F1 ∧ · · · ∧ Fk−1 ) ⇒ Fk , or just F1 ∧ · · · ∧ Fk−1 ⇒ Fk .
The axioms of equality are written without quantifiers, all variables
being implicitly universally quantified. This is a serendipitous coincidence between common use in mathematical writing, and a convention
of formal logic.
4. The integers.
The integers are fundamental mathematical objects, which are familiar from everyday life. With modern machinery, a theory of the
integers can be given either for all the integers, including negative integers; or for only the non-negative integers. Historically, the theory
of the non-negative integers has been important in the development of
mathematical logic, and it continues to play a significant role.
The non-negative integers 0,1,2,. . . comprise a universe of discourse
concerning which mathematical statements can be made. A set of
non-logical symbols which turns out to be satisfactory for the formal
language of such statements is as follows:
a constant 0;
a valency 1 function s, the successor function;
a valency 2 function +, addition;
a valency 2 function ·, multiplication; and
the equality predicate =.
The notation x^s will be used for the successor function; x^s equals x + 1.
Even though it is not ordinarily used in mathematical writing, it is
convenient and traditional to have it as one of the symbols of the formal
language in this setting.
The above symbols comprise the language of Peano arithmetic. Let
F denote a formula in this language, and let x, y, etc., denote variables.
The following formulas are known as Peano’s axioms.
1. x^s = y^s ⇒ x = y
2. ¬(x^s = 0)
3. x + 0 = x
4. x + y^s = (x + y)^s
5. x · 0 = 0
6. x · y^s = (x · y) + x
7_F. F_{0/x} ∧ ∀x(F ⇒ F_{x^s/x}) ⇒ ∀xF.
Again, axioms 1 to 6 are written without quantifiers, and all variables are implicitly universally quantified. Peano’s axioms are clearly
basic facts about the non-negative integers. In accordance with the axiomatic method, they are taken as true, and more complex statements
deduced to be true by mathematical reasoning.
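Axioms 3 to 6 can be read as recursive definitions of addition and multiplication in terms of the successor function. The sketch below illustrates this reading in Python, using the built-in non-negative integers as the intended universe; the function names s, add, and mul are chosen here for illustration, and y − 1 is used only to recover the number whose successor is y.

```python
def s(x: int) -> int:
    """Successor function on the non-negative integers."""
    return x + 1


def add(x: int, y: int) -> int:
    """Addition computed from axioms 3 and 4 alone."""
    if y == 0:
        return x                # axiom 3: x + 0 = x
    return s(add(x, y - 1))     # axiom 4: x + y^s = (x + y)^s


def mul(x: int, y: int) -> int:
    """Multiplication computed from axioms 5 and 6 alone."""
    if y == 0:
        return 0                      # axiom 5: x · 0 = 0
    return add(mul(x, y - 1), x)      # axiom 6: x · y^s = (x · y) + x


# The recursive definitions agree with the built-in operations on small inputs.
assert all(add(x, y) == x + y and mul(x, y) == x * y
           for x in range(8) for y in range(8))
print(add(3, 4), mul(6, 7))
```

Agreement on finitely many inputs is of course only a sanity check; that the definitions determine + and · on all non-negative integers is an instance of the induction scheme.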
Axiom 7 is an infinite family of axioms, one for each formula F
(and variable x). Such a system of axioms is called an axiom scheme,
and these occur frequently in mathematical logic. Note that x is not
required to occur free in F ; some authors do require this, but it is
unnecessary to do so. This axiom scheme is a formal statement of the
principle of mathematical induction. Mathematical induction may be
stated in a version using sets of integers; but the formal machinery given
so far does not provide for this, and Peano’s axioms provide a method
for giving axioms for the non-negative integers within the confines of
basic formal logic. Historically, this was a reason for their introduction.
They remain a topic of considerable interest in mathematical logic, even
though they are subsumed by formal set theory, as will be seen.
In particular, the “logical strength” of Peano’s axioms is of great
interest. As will be noted in section 10, not every true statement about
the integers can be proved using them (this is in fact the case for any
formal system for arithmetic which proves only true statements); but
stronger systems can be given. Whether a particular true statement
about the non-negative integers can be proved using Peano’s axioms is
a topic of interest in mathematical logic.
Of course, Peano’s axioms are of interest because they are strong
enough that a wide variety of basic facts about the non-negative integers
can be proved using them. Treatments of this topic can be found in
[Mendelson] and [Shoenfield1]. Among these facts are the following.
- The basic properties of + and · are provable.
- There is a formula defining the order relation ≤ (indeed, x ≤ y if
and only if ∃w(y = x + w)), and its basic properties are provable.
- The “division law” states that for any nonnegative integer x and
positive integer d there are unique nonnegative integers q and r
such that x = q · d + r and r < d; this is provable.
- The exponential function is definable, that is, there is a formula
E(x, y, z) which is true if and only if z = x^y. The basic properties
of the exponential function are provable.
- More generally, any of the class of functions known as the primitive
recursive functions (see appendix 2) is definable.
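Two of the facts listed above can be illustrated by direct search over small numbers. In the sketch below, leq tests the defining formula ∃w(y = x + w) by searching for a witness w, and division_witnesses lists all pairs q, r with x = q · d + r and remainder r smaller than d. The function names are illustrative, and a finite search is only a sanity check, not a proof from Peano's axioms.

```python
def leq(x: int, y: int) -> bool:
    """x ≤ y if and only if there is a w with y = x + w; any witness is at most y."""
    return any(y == x + w for w in range(y + 1))


def division_witnesses(x: int, d: int):
    """All pairs (q, r) with x = q·d + r and r < d, for a positive integer d."""
    if d <= 0:
        raise ValueError("d must be a positive integer")
    # Since d >= 1, any valid q satisfies q <= x, so the search is bounded.
    return [(q, r) for q in range(x + 1) for r in range(d) if x == q * d + r]


# The order relation defined by the formula agrees with the usual one.
assert all(leq(x, y) == (x <= y) for x in range(10) for y in range(10))

# Existence and uniqueness of quotient and remainder on a small range.
assert all(division_witnesses(x, d) == [divmod(x, d)]
           for x in range(30) for d in range(1, 8))

print(division_witnesses(17, 5))
```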
Another result of mathematical logic of interest concerning Peano’s
axioms, is that there is no finite set of axioms from which the statements
provable are exactly those provable in Peano arithmetic. [Shoenfield1]
has a proof of this.
5. Informal set theory.
Informal set theory has become so indispensable to mathematical
discourse that it is now taught early in mathematical education. Like
the integers, the sets are mathematical objects which comprise a mathematical universe of discourse. Indeed, they comprise a single universe
of discourse for all of mathematics. This is a more advanced topic, but
in view of this fact, it should not be surprising that the notion of a set
is useful throughout mathematics.
Basic set theory and logic are both tools used throughout mathematics, in particular in the consideration of each other. This results in
the need for “forward references” in the presentation of the two topics,
which various authors handle in various ways. A formal definition of the
meaning of formulas has been deferred to section 6, and until then the
reader’s existing knowledge will be relied on, indeed already has been
in the preceding section.
The language of set theory has a single binary predicate symbol,
called “membership” and denoted ∈. The fact that x ∈ y is stated
variously as, x is a member of y, x is an element of y, or x belongs to
y. The notation x ∉ y is used to abbreviate ¬(x ∈ y). The equality
predicate will also be considered a basic symbol, although in set theory
it can be defined. The formula
x = y ⇔ ∀w(w ∈ x ⇔ w ∈ y)
is called the extensionality axiom. It is assumed as an axiom of set
theory if equality is considered to be a predicate symbol; or it may be
taken as the definition of equality.
The concepts of informal set theory can all be defined in terms
of membership and equality. However, it is necessary to posit that
certain construction operations can be carried out to obtain new sets
from already known sets. The axioms of set theory give formal rules for
these constructions.
For example, if objects x1 , . . . , xk are given then there is a set
{x1 , . . . , xk } whose elements are exactly these objects. In set theory
there is no distinction between an object and a set; but in specific settings it may be convenient to make such a distinction. For example, one
can consider the integers as objects, and then consider sets of integers.
The integers can be defined within set theory as specific sets, in a way
which by now is standard; this will be discussed further in section 13.
The set containing no elements is called the empty set and denoted
∅. The axioms of set theory ensure that it exists and is unique. It plays
a role in set theory analogous to 0 in arithmetic.
The main topics of informal set theory can be organized into the
following areas.
- Subsets, the power set, and operations on the power set.
- Ordered n-tuples and the Cartesian product.
- Relations.
- Functions.
Each of these will be considered in turn. The website [Wiki, Naive set
theory] is one of numerous references covering these topics, and has links
to additional resources. Introductory set theory books such as [Monk1]
cover them also, deriving basic facts from the axioms. Textbooks in
other areas of mathematics frequently review informal set theory in
introductory material, [Dowd1] for example.
A set x is said to be a subset of a set y, written x ⊆ y, if w ∈ x ⇒
w ∈ y. By the extensionality axiom, x = y if and only if x ⊆ y and
y ⊆ x. If x ⊆ y but x ≠ y then x is said to be a proper subset of y, and
this is written x ⊂ y. It should be noted that, as usual, the foregoing is
just one of various notational conventions in use.
If x is a set then the collection of all its subsets comprises a set,
called the power set of x, and denoted Pow(x). This is one of the
construction principles provided in the axioms of set theory (indeed,
it is the power set axiom). Note that ∅ ⊆ x (the defining formula
holds “vacuously”, since there are no w satisfying w ∈ ∅); and hence
∅ ∈ Pow(x) for any set x.
Suppose U is a set; then the following operations may be defined
on Pow(U ).
- union: w ∈ x ∪ y if and only if w ∈ x or w ∈ y.
- intersection: w ∈ x ∩ y if and only if w ∈ x and w ∈ y.
- complement: w ∈ x^c if and only if w ∈ U and w ∉ x.
The following formulas are the axioms for the structures known as
Boolean algebras, with the binary functions ∪ and ∩, the unary function
^c, and the constants ∅ and U (structures are defined in section 6).
- x ∪ y = y ∪ x, x ∩ y = y ∩ x
- x ∪ (y ∪ z) = (x ∪ y) ∪ z, x ∩ (y ∩ z) = (x ∩ y) ∩ z
- x ∪ (y ∩ z) = (x ∪ y) ∩ (x ∪ z), x ∩ (y ∪ z) = (x ∩ y) ∪ (x ∩ z)
- x ∪ ∅ = x, x ∩ U = x
- x ∪ x^c = U, x ∩ x^c = ∅
It is easy to verify that Pow(U ) forms a Boolean algebra with the operations given above. Further identities involving these operations may
be proved from the axioms, with the advantage that they then have
been shown not only for Pow(U ), but for any Boolean algebra. Such
identities may be found in various references, including [Dowd1].
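For a small finite set U, the claim that Pow(U) satisfies the Boolean algebra axioms can be checked by brute force. The sketch below represents subsets of U as Python frozensets and tests each of the five groups of axioms over all subsets; the function names are illustrative, and the check covers only the particular U supplied, so it is a sanity check rather than a proof.

```python
from itertools import combinations


def powerset(u):
    """List of all subsets of u, each as a frozenset."""
    items = list(u)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def is_boolean_algebra(u) -> bool:
    """Check the Boolean algebra axioms for Pow(u) with ∪, ∩, complement, ∅ and u."""
    U = frozenset(u)
    empty = frozenset()
    P = powerset(U)

    def comp(x):
        return U - x  # complement relative to U

    for x in P:
        # identity and complement laws
        if not (x | empty == x and x & U == x):
            return False
        if not (x | comp(x) == U and x & comp(x) == empty):
            return False
        for y in P:
            # commutative laws
            if not (x | y == y | x and x & y == y & x):
                return False
            for z in P:
                # associative laws
                if not (x | (y | z) == (x | y) | z and x & (y & z) == (x & y) & z):
                    return False
                # distributive laws
                if not (x | (y & z) == (x | y) & (x | z)
                        and x & (y | z) == (x & y) | (x & z)):
                    return False
    return True


print(len(powerset({0, 1, 2})), is_boolean_algebra({0, 1, 2}))
```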
The operations x∪y and x∩y are in fact defined for any pair of sets.
A generalization of the union operation is important in the development
of formal set theory. The complementation operation however is only
defined on the subsets of a given set. The relative complement, or
difference, x − y may be defined for any sets x and y: w ∈ x − y if and
only if w ∈ x and w ∉ y.
The use of the minus sign for both subtraction of real numbers and
relative complement causes no confusion. The context makes clear which
is intended, with rare exceptions which can be clarified explicitly. For
readers familiar with the concept of “overloading” from programming
languages, the minus sign is overloaded, and may have arguments which
are real numbers (or more generally elements of a commutative group);
or sets.
Additional terminology includes the following. A set y is said to be
a superset of x, written y ⊇ x, if x is a subset of y. Sets x and y are
said to be disjoint if x ∩ y = ∅. A set z is the disjoint union of sets x
and y if z = x ∪ y and x ∩ y = ∅. The symmetric difference x ⊕ y of two
sets equals (x − y) ∪ (y − x).
As noted above, if x and y are objects there is a set {x, y} such
that w ∈ {x, y} if and only if w = x or w = y. This in fact is the axiom
of pairing. If x and y are the same object then {x, y} only contains a
single object, otherwise it contains two objects. Also, {x, y} and {y, x}
are the same set.
One of the basic constructions of set theory is that of the ordered
pair ⟨x, y⟩ of two objects x and y. This is designed to have the property
that ⟨x1, y1⟩ = ⟨x2, y2⟩ if and only if x1 = x2 and y1 = y2. It is not
necessary to add this as a basic construction principle; ⟨x, y⟩ may be
defined to be {{x}, {x, y}}. It follows using the axioms of extensionality
and pairing that with this definition ⟨x, y⟩ has the desired property. A
history of the notion of ordered pair can be found in [Kanamori1]; the
modern definition is therein credited to Kuratowski.
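The definition of the ordered pair as {{x}, {x, y}} can be tried out with Python frozensets, which behave extensionally (two frozensets are equal exactly when they have the same elements). The sketch below checks the characteristic property on a small range of values; the function names are illustrative, and the finite check stands in for the general argument.

```python
from itertools import product


def pair(x, y):
    """The ordered pair of x and y, defined as the set {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def has_pair_property(values) -> bool:
    """Check that pair(x1, y1) == pair(x2, y2) exactly when x1 == x2 and y1 == y2."""
    for x1, y1, x2, y2 in product(values, repeat=4):
        same_pair = pair(x1, y1) == pair(x2, y2)
        same_coordinates = (x1 == x2 and y1 == y2)
        if same_pair != same_coordinates:
            return False
    return True


# Unlike the unordered pair {x, y}, the ordered pair distinguishes the order.
print(pair(1, 2) == pair(2, 1), has_pair_property(range(4)))
```

When x and y are the same object the pair collapses to {{x}}, since {x, x} is the same set as {x}; the characteristic property still holds in that case.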
The Cartesian product x × y of two sets x and y is defined to be the
set such that w ∈ x × y if and only if w = ⟨w1, w2⟩ where w1 ∈ x and
w2 ∈ y. In a more convenient notation, the definition may be written
as
x × y = {⟨w1, w2⟩ : w1 ∈ x, w2 ∈ y}.
From here on such notation will be used without further comment. In
formal set theory the Cartesian product is proved to exist from the axioms. In informal set theory the existence may be accepted as intuitively
obvious; note, however, that x × y ⊆ Pow(Pow(x ∪ y)), and this fact is
part of the formal existence proof.
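The containment x × y ⊆ Pow(Pow(x ∪ y)) can be observed concretely when ordered pairs are taken to be sets of the form {{a}, {a, b}}. The sketch below, with illustrative function names and small example sets chosen here, builds the Cartesian product from such pairs and tests the containment directly.

```python
from itertools import combinations


def pair(a, b):
    """Ordered pair of a and b as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def powerset(s):
    """The set of all subsets of s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(frozenset(c) for k in range(len(items) + 1)
                     for c in combinations(items, k))


def cartesian(x, y):
    """The Cartesian product of x and y as a set of ordered pairs."""
    return {pair(a, b) for a in x for b in y}


x = frozenset({0, 1})
y = frozenset({1, 2})
# {a} and {a, b} are subsets of x ∪ y, so each pair is a set of such subsets.
print(cartesian(x, y) <= powerset(powerset(x | y)))
```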
The Cartesian product x1 × · · · × xn of n sets may be defined recursively to be x1 × (x2 × · · · × xn ). There is an obvious correspondence
between ⟨w1, ⟨w2, w3⟩⟩ and ⟨⟨w1, w2⟩, w3⟩, which can usually be ignored,
and the triple written as ⟨w1, w2, w3⟩, which in tedious formality is the
first version. Similar remarks hold for other nested Cartesian products.
An n-ary relation on a set x is defined to be a subset of x × · · · × x,
where there are n factors of x. If n = 1 the relation is called unary; a
unary relation is the same thing as a subset. If n = 2 the relation is
called binary.
A function f from a set x to a set y is a subset of x × y, such that
for all u ∈ x there exists a unique v ∈ y, such that ⟨u, v⟩ ∈ f. A function
assigns an element of y to each element of x. [Kanamori1] notes that
the definition of a function in this generality was an early triumph of
set theory, with Felix Hausdorff being a major contributor. Having a
definition such as this, a function may be considered as an object, as is
done in calculus for example.
The notation f : x ↦ y is used to denote that f is a function from x
to y. Basic definitions concerning such a function include the following.
- f(u) = v may be written, rather than ⟨u, v⟩ ∈ f; similarly f(u)
may be used for v in formulas.
- In mathematical writing, the terminology “graph of f” is used for
the relation f, although in formal set theory f as an object is the
relation.
- The domain of f is x; Dom(f) will be used to denote it.
- For x′ ⊆ x, f[x′] denotes {v : ∃u ∈ x′ (f(u) = v)}.
- The range of f equals f[x]; Ran(f) will be used to denote it.
- If x′ ⊆ x the restriction of f to x′ is the set {⟨u, v⟩ ∈ f : u ∈ x′}.
This is a function from x′ to y, which is denoted f ↾ x′.
- f is said to be injective, or 1-1, if f(u1) = f(u2) implies u1 = u2.
- f is said to be surjective, or onto, if its range is y.
- f is said to be bijective, or a 1-1 correspondence, if it is both injective and surjective.
- If f : x ↦ y and g : y ↦ z then there is a function g ◦ f : x ↦ z,
defined by the formula (g ◦ f)(u) = g(f(u)). This function is called
the composition of g and f.
- An n-ary function on a set x is just a function from x × · · · × x to
x, where there are n factors of x in the domain.
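The definitions in the list above can be tried out on small finite examples by representing a function as a set of pairs. In the sketch below, Python tuples (u, v) stand in for ordered pairs, which is a convenience assumed here rather than the set-theoretic definition, and all function names are illustrative.

```python
def is_function(f, x, y) -> bool:
    """f is a subset of x × y and each u in x has exactly one v with (u, v) in f."""
    return (all(u in x and v in y for (u, v) in f)
            and all(sum(1 for (a, _) in f if a == u) == 1 for u in x))


def value(f, u):
    """The unique v with (u, v) in f, written f(u) in the text."""
    (v,) = [b for (a, b) in f if a == u]
    return v


def domain(f):
    return {u for (u, _) in f}


def image(f, xs):
    """f[x′] = {v : there is u in x′ with f(u) = v}."""
    return {v for (u, v) in f if u in xs}


def restrict(f, xs):
    """The restriction of f to x′: the pairs of f whose first entry is in x′."""
    return {(u, v) for (u, v) in f if u in xs}


def is_injective(f) -> bool:
    # For a function, distinct arguments give distinct values exactly when
    # the range has as many elements as there are pairs.
    return len(image(f, domain(f))) == len(f)


def is_surjective(f, y) -> bool:
    return image(f, domain(f)) == set(y)


def compose(g, f):
    """The composition g ◦ f, defined by (g ◦ f)(u) = g(f(u))."""
    return {(u, value(g, v)) for (u, v) in f}


x, y, z = {0, 1, 2}, {"a", "b"}, {10, 20}
f = {(0, "a"), (1, "b"), (2, "a")}
g = {("a", 10), ("b", 20)}
print(is_function(f, x, y), is_injective(f), is_surjective(f, y))
print(sorted(compose(g, f)))
```

Here f is surjective but not injective, while its restriction to {0, 1} is a bijection onto y.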
A function is also called a mapping or map, emphasizing the fact that,
in addition to constituting an object itself, it has an “active” aspect.
The function from X1 × X2 to Xi where i is 1 or 2, which maps
⟨x1 , x2 ⟩ to xi , is called a projection function. These functions are quite
convenient, and will be denoted as π1 and π2 . Note that, for example,
Dom(f ) = π1 [f ].
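The definitions above can be tried out on finite sets. The following is a minimal sketch, assuming a function is represented as a Python set of pairs; the helper names (is_function, image, restrict, compose, proj) are illustrative and do not come from the text.

```python
# A function from x to y modelled as a set of ordered pairs (u, v).

def is_function(f, x, y):
    """True if f is a subset of x × y with exactly one pair (u, v) for each u in x."""
    if not all(u in x and v in y for (u, v) in f):
        return False
    return all(sum(1 for (u, _) in f if u == u0) == 1 for u0 in x)

def image(f, xs):
    """f[x'] = {v : there is u in x' with f(u) = v}."""
    return {v for (u, v) in f if u in xs}

def restrict(f, xs):
    """The restriction of f to x', again a set of pairs."""
    return {(u, v) for (u, v) in f if u in xs}

def is_injective(f):
    """Assumes f is a function: no value may be hit by two arguments."""
    values = [v for (_, v) in f]
    return len(values) == len(set(values))

def is_surjective(f, y):
    """The range of f (the image of its whole domain) must equal y."""
    return {v for (_, v) in f} == set(y)

def compose(g, f):
    """g ∘ f, defined by (g ∘ f)(u) = g(f(u))."""
    return {(u, w) for (u, v) in f for (v2, w) in g if v == v2}

def proj(i, pairs):
    """The projection π_i (i is 1 or 2) applied to a set of pairs."""
    return {p[i - 1] for p in pairs}

x = {0, 1, 2}
y = {"a", "b"}
f = {(0, "a"), (1, "b"), (2, "a")}
g = {("a", 10), ("b", 20)}
```

With these values, proj(1, f) recovers the domain x, which is the observation Dom(f ) = π1 [f ] made above.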
In formal set theory, a notion of the “size”, or “cardinality”, of an
arbitrary set may be defined; this was an early triumph of set theory, due
to Cantor. A treatment will be given in section 13; here a few facts are
noted which will be needed before section 13. Given two cardinalities,
one is greater than or equal to the other; and given any cardinality there
are larger ones.
For a nonnegative integer n, let N_n be the set {0, . . . , n − 1}; N_0
is the empty set (it will be seen in section 13 that in set theory the
notation N_n is unnecessary). A set x is said to be finite if there is a
bijection from N_n to x for some n. It may be shown by induction on k
that if f : N_k ↦ N_l is a bijection then l = k. It follows that n is unique;
this unique n is said to be the cardinality of x.
A set is said to be infinite if it is not finite. Letting N denote the
set of all natural numbers, a set x is said to be countably infinite if there
is a bijection f : N ↦ x. Such a set is infinite.
Rather than attempting to be encyclopedic in this section, additional definitions of basic set theory will be introduced as needed.
6. Structures and models.
As already noted, set theory is a tool required in the development
of mathematical logic. The notion of a universe of discourse referred to
in earlier sections can be formalized using it.
A first-order language is defined to be a set of predicate, function,
and constant symbols. Each predicate or function symbol has a valency
associated with it. For many purposes, the set may be finite; however
there are contexts where infinite sets are used, and the definition may
easily be given in this generality. In a special case of frequent interest, there may be an infinite set of constants, while the predicates and
functions are a fixed finite set.
Given a first-order language L, a structure for L consists of a
nonempty set D, called the domain or universe of the structure, together with the following.
- For each n-ary predicate symbol P of L, an n-ary relation P̂ on D.
- For each n-ary function symbol f of L, an n-ary function f̂ on D.
- For each constant symbol c of L, an element ĉ of D.
The relation, function, or constant assigned to a symbol is called its interpretation. Predicate symbols are also called relation symbols. In this
section, if = is in L, initially no restriction is placed on its interpretation.
Set-theoretically, a structure is a domain D, together with a function assigning to each symbol of L its interpretation. A frequently used
notational abbreviation is to let D denote the structure, with the function understood, and let P̂ , etc. denote the interpretation of P according
to the structure.
The interpretation of a valency n predicate symbol is an n-ary relation. From here on a valency n predicate symbol will be called n-ary.
Likewise, a valency n function symbol will be called n-ary.
A formal definition of the meaning of a formula F in a first order
language, in a structure D for the language, will now be given. Typically of mathematical logic, the definition is a tedious and long-winded
formalization of a fact which is completely obvious.
To begin with, the semantics of the propositional connectives must
be specified. Let {t, f } be the two element set of “truth values” true
and false. A propositional connective denotes a function on this set;
the same symbol will be used to denote this function as the connective
itself. For ¬ the function is unary, with ¬t = f and ¬f = t. For the
other connectives the function is binary, as follows.
X  Y  |  X∧Y  X∨Y  X⇒Y  X⇔Y
t  t  |   t    t    t    t
t  f  |   f    t    f    f
f  t  |   f    t    t    f
f  f  |   f    f    t    t
Given a structure D and a set of variables V , an assignment to V
is defined to be a function α which assigns to each x ∈ V an element of
D. For a term t, let Vt be the variables which occur in t. Similarly for
a formula F let VF be the variables which occur free in F .
Given a structure D, the interpretation t̂ of a term t is a function
from assignments to Vt , to D. It is defined recursively as follows.
- If t is a variable x then t̂ is the function which assigns to the assignment α to {x}, the value α(x).
- If t is a constant c then t̂ is the function which assigns to the empty
assignment, the value ĉ of c in the interpretation. The ambiguity
of the notation causes no confusion.
- If t = f (t1 , . . . , tn ) and α is an assignment to Vt , for 1 ≤ i ≤ n
let αi be the assignment to Vti induced by α, i.e., α ↾ Vti . Then
t̂(α) = fˆ(t̂1 (α1 ), . . . , t̂n (αn )).
Similarly, given a structure D, the interpretation F̂ of a formula F
is a function from assignments to VF , to {t, f }. It is defined recursively
as follows.
- If F is an atomic formula P (t1 , . . . , tn ) then
F̂ (α) = P̂ (t̂1 (α1 ), . . . , t̂n (αn )), where αi is as for terms.
For the remaining cases let αi = α ↾ VFi .
- If F is ¬F1 then F̂ (α) = ¬F̂1 (α1 ).
- If F is F1 ∧ F2 then F̂ (α) = F̂1 (α1 ) ∧ F̂2 (α2 ).
- If F is F1 ∨ F2 then F̂ (α) = F̂1 (α1 ) ∨ F̂2 (α2 ).
- If F is F1 ⇒ F2 then F̂ (α) = F̂1 (α1 ) ⇒ F̂2 (α2 ).
- If F is F1 ⇔ F2 then F̂ (α) = F̂1 (α1 ) ⇔ F̂2 (α2 ).
- If F is ∀xF1 then F̂ (α) = t if and only if F̂1 (β) = t for all assignments β to VF1 such that β ↾ VF = α.
- If F is ∃xF1 then F̂ (α) = t if and only if F̂1 (β) = t for some
assignment β to VF1 such that β ↾ VF = α.
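For a finite structure the recursive definition can be executed directly. The following is a minimal sketch under stated assumptions: terms and formulas are encoded as nested tuples with tags chosen here for illustration, and, as a simplification, an assignment is a dictionary that may cover more variables than the free ones rather than being restricted to VF. The example structure Z4 (the integers modulo 4) is likewise hypothetical.

```python
# Terms:    ("var", x) | ("const", c) | ("fn", f, t1, ..., tn)
# Formulas: ("pred", P, t1, ..., tn) | ("not", F) | ("and", F, G) | ("or", F, G)
#           | ("implies", F, G) | ("iff", F, G) | ("forall", x, F) | ("exists", x, F)

def eval_term(t, struct, alpha):
    """Interpretation of a term under the assignment alpha."""
    kind = t[0]
    if kind == "var":
        return alpha[t[1]]
    if kind == "const":
        return struct["constants"][t[1]]
    if kind == "fn":
        args = [eval_term(s, struct, alpha) for s in t[2:]]
        return struct["functions"][t[1]](*args)
    raise ValueError("unknown term: %r" % (t,))

# The binary connectives as functions on truth values (the truth table).
CONNECTIVES = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "implies": lambda a, b: (not a) or b,
    "iff": lambda a, b: a == b,
}

def eval_formula(F, struct, alpha):
    """Truth value of the formula F under the assignment alpha."""
    kind = F[0]
    if kind == "pred":
        args = tuple(eval_term(s, struct, alpha) for s in F[2:])
        return args in struct["predicates"][F[1]]
    if kind == "not":
        return not eval_formula(F[1], struct, alpha)
    if kind in CONNECTIVES:
        left = eval_formula(F[1], struct, alpha)
        right = eval_formula(F[2], struct, alpha)
        return CONNECTIVES[kind](left, right)
    if kind in ("forall", "exists"):
        var, body = F[1], F[2]
        # Each extension of alpha by a value for var plays the role of beta.
        results = (eval_formula(body, struct, {**alpha, var: d})
                   for d in struct["domain"])
        return all(results) if kind == "forall" else any(results)
    raise ValueError("unknown formula: %r" % (F,))

Z4 = {
    "domain": range(4),
    "constants": {"zero": 0},
    "functions": {"plus": lambda a, b: (a + b) % 4},
    "predicates": {
        "eq": {(a, a) for a in range(4)},
        "le": {(a, b) for a in range(4) for b in range(4) if a <= b},
    },
}

x, y, zero = ("var", "x"), ("var", "y"), ("const", "zero")
has_inverse = ("forall", "x", ("exists", "y",
               ("pred", "eq", ("fn", "plus", x, y), zero)))
has_greatest = ("exists", "x", ("forall", "y", ("pred", "le", y, x)))
le_zero = ("pred", "le", x, zero)
```

Quantifiers are handled by trying every element of the domain, which is only possible because the domain is finite; the definition in the text makes no such restriction.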
Some basic definitions from mathematical logic are as follows. Fix
a first order language L.
- A formula is said to be a formula in (or over) L if its non-logical
symbols are all in L.
- If A is a set of formulas in L, and F is a formula in L, the notation
A ⊢ F is used to denote the fact that there is a proof of F in formal
logic, using axioms from A, where all formulas of the proof are in
L.
- Given a structure D for L, and a formula F in L, |=D F is used to
denote the fact that F is true in D.
- Given a set A of formulas in L, the fact that |=D F holds for every
F ∈ A is denoted |=D A, and D is said to be a model of (or for) A.
- A set of formulas A is said to be consistent if for no sentence F do
both F and ¬F have proofs from A.
Suppose |=D A, and A ⊢ F . It is straightforward (if tedious) to
show that |=D F . This fact is called the “soundness” of formal logic;
it states that the logical axioms and rules are “sound”. A proof of this
fact may be found in any of various introductory logic texts, including
[Enderton], [Mendelson], and chapter 11 of [Dowd1]. Note that “extra”
symbols may be allowed in a proof; this follows by simply enlarging (the
technical term is “expanding”) L.
Suppose for any D, if |=D A then |=D F ; then A ⊢ F . This fact
is called the “completeness” of formal logic. Not only does formal logic
prove only true statements, it proves all statements which follow “by
logic alone” from the non-logical axioms. That is, either a formula is true
in some models and false in others (so additional axioms are needed); or
it follows from the axioms by formal logic. The completeness theorem
was first proved by Kurt Gödel in 1929; a proof may be found in any of
the above cited references.
Given a proof of F from A, let A0 be the formulas of A which
occur in the proof; this set is finite. Let L0 be the symbols of L which
occur in A0 or F . A model D for A0 in L0 may be considered a model
in L; and since there is a proof of F , it is true in D considered as
a model in L, whence it is true in D considered as a model in L0 .
By completeness, then, there is a proof of F from A0 which uses only
symbols from L0 . There are “syntactic” proofs of facts such as this,
using “Gentzen systems” for example; see [Smullyan].
If a set A of formulas has a model then it is consistent, since for a
sentence F only one of F and ¬F can be true in the model, so only one
can be provable. It follows by completeness that if a set A of formulas
is consistent then it has a model. In fact, this is usually proved first,
and completeness deduced from it.
In some cases, a system of axioms A is intended to be used to prove
theorems about a particular structure; Peano’s axioms are an example. It is a fact of mathematical logic, however, that such systems will
generally have other models than the intended one. Indeed, it follows
from the “Löwenheim-Skolem” theorem that if A has infinite models
then it has a model of any infinite cardinality greater than or equal to
the cardinality of the language. A proof of this may be found in the
above cited references, and a version is given in section 20; see [Wiki,
Lowenheim-Skolem theorem] for some historical comments. In the next
section, a few comments will be made on models of Peano’s axioms.
On the other hand, some systems of axioms A are intended to be
used to prove theorems about any of a variety of structures, namely
those which are models of the axioms. This is a basic tool of abstract
algebra; a system of axioms for structures of a certain type is specified,
and the theory of these developed by deducing facts from the axioms.
An example has already been seen, namely Boolean algebras in section
5; additional examples will be seen in section 8.
If the language contains the equality predicate, say that a model is
an E-model if = is interpreted as equality. By completeness, a consistent
set A of formulas, which includes the axioms of equality, has a model M .
M need not be an E-model; however an E-model can be constructed
from M . It follows that, in considering systems of axioms where = is
in the language and the axioms of equality are assumed, only E-models
need be considered. An outline of the construction of an E-model will
be given; see for example [Dowd1] for details.
A binary relation satisfying the first three axioms of equality is
called an equivalence relation. Given an equivalence relation ≡, let
[x] = {y : y ≡ x}; [x] is called the equivalence class of x. By the axioms,
x ∈ [x], and two equivalence classes are either disjoint or equal.
A binary relation on the domain of a structure D which satisfies all
the axioms of equality is called a congruence relation. A structure D/≡
may be constructed, called the quotient of D by ≡. This has as the elements
of its domain, the equivalence classes. The value P ([x1 ], · · · , [xn ])
for a predicate symbol P may be defined as P (x1 , . . . , xn ); the axioms
ensure that the value depends only on the equivalence classes, and not
the particular choice x1 , . . . , xn of “representatives” of the classes. Similarly f ([x1 ], · · · , [xn ]) may be defined as [f (x1 , · · · , xn )].
If α is an assignment in D, let α′ be the assignment in D/≡ which
assigns to x the value [α(x)]. A straightforward induction shows that
for any formula F , F̂ (α) in D equals F̂ (α′ ) in D/≡.
In particular, if M is a model of A and ≡ is the interpretation of
=, then M/≡ is a model of A. Clearly, it is an E-model.
Assignments are somewhat cumbersome, and are used in mathematical logic for the definition of the semantics of formulas, etc. There
is a more convenient method of referring to the semantics of a formula,
which is in common use and will be used in this text (assignments will
be used occasionally also).
Suppose F is a formula, and v⃗ = v1 , . . . , vk is a list of variables
which includes the free variables of F . Given elements x⃗ = x1 , . . . , xk in
a structure S, let F_v⃗ (x⃗) be F̂ (a) where a assigns xi to vi for 1 ≤ i ≤ k.
It is common practice to use F (x⃗) as an abbreviation for F_v⃗ (x⃗), when
the explicit list of the variables is not needed. Another variation in use
is F (x̊1 , . . . , x̊k ); the variables are x1 , . . . , xk , and x̊i is assigned to xi for
1 ≤ i ≤ k.
k will frequently be used to denote the length of a list v⃗ . Thus, F_v⃗ is
a k-ary predicate on S. A predicate P which is F_v⃗ for some F and v⃗ is said
to be definable; the formula F defines P in S. For a formula to define a
predicate, a correspondence must be given between the argument places
of the predicate and the free variables of the formula. The value of
the predicate depends only on the values assigned to the free variables;
additional variables are allowed for convenience.
7. Models of Peano arithmetic.
Models of Peano arithmetic have become a topic of interest in mathematical logic, [Kaye] being one reference on the subject. Let LA denote
the language 0 s + · =. Let N denote the structure of the non-negative
integers over this language. This may be defined in set theory; facts to
be given here provide some description of it. For a nonnegative integer
n, let n be the term, 0 followed by n occurrences of s; this is called the numeral for n.
Given a structure D in a language L, let Th(D) be the set of formulas in L which are true in D. Th(D) is called the theory of the structure
D. Let PA denote the formulas which are provable from Peano’s axioms. Let Q denote the formulas which are provable from the first 6 of
Peano’s axioms, and the formula x ≠ 0 ⇒ ∃y(x = y^s ).
Let D1 and D2 be structures for a language L. Let ˆ denote
the interpretation in D1 , and ˜ the interpretation in D2 . D2 is said
to be a substructure of D1 if the domain of D2 is a subset of the domain
of D1 and the following requirements hold, where x1 , . . . , xn ∈ D2 .
- For each predicate P , P̃ (x1 , . . . , xn ) if and only if P̂ (x1 , . . . , xn ).
- For each function f , f˜(x1 , . . . , xn ) = fˆ(x1 , . . . , xn ).
- For each constant c, c̃ = ĉ.
A function h : D1 ↦ D2 is said to be a homomorphism if the following
requirements hold, where x1 , . . . , xn ∈ D1 .
- For each predicate P , P̃ (h(x1 ), . . . , h(xn )) if and only if
P̂ (x1 , . . . , xn ).
- For each function f , f˜(h(x1 ), . . . , h(xn )) = h(fˆ(x1 , . . . , xn )).
- For each constant c, c̃ = h(ĉ).
The third requirement is redundant, since a constant is a 0-ary function
symbol. Some authors (such as [Dowd1]) weaken the requirement for
predicates, and call a homomorphism as above a strong homomorphism;
others (such as [Sacks1]) give the above definition.
It is readily seen that if h is a homomorphism then h[D1 ] may be
made into a substructure of D2 in a unique way (or see [Dowd1]). If h
is an injection then it is called an isomorphic embedding of D1 in D2 .
If h is a bijection then it is called an isomorphism of D1 with D2 .
If D is any structure for LA , the predicate x ≤ y is defined by the
formula ∃w(y = x + w).
The following are some basic facts concerning the above defined
concepts. Let M denote a model of Q.
1. Th(N ) has models other than N ; such models are called nonstandard.
2. Q⊆PA⊆Th(N ).
3. The map h defined by the formula h(n) = n̂ is an isomorphic embedding of N in M .
4. If y ∈ M and y ≤ h(n) for some n ∈ N then y = h(m) for some
m ∈ N (h[N ] is said to be an initial segment of M ).
5. Suppose M satisfies the “second order induction axiom”, that is,
for any subset S ⊆ M , if 0 ∈ S, and ∀x(x ∈ S ⇒ x^s ∈ S), then
∀x(x ∈ S). Then h is an isomorphism.
Fact 1 was first observed by T. Skolem in 1933; a proof is as follows. Let ∞ be a new non-logical symbol, and add to Th(N ) the formulas n < ∞ for each integer n. If the enlarged set of formulas were
inconsistent, there would be some finite set of the added formulas which,
when added to Th(N ), would result in an inconsistent system. But this
is impossible, because the ordinary integers with a large enough value
assigned to ∞ would be a model. Thus, the enlarged set has a model,
and this is a model of Th(N ) which contains an element greater than
every “standard” integer.
To prove fact 2 it is only necessary to give a proof in PA of x ≠
0 ⇒ ∃y(x = y^s ); this is an easy exercise, or may be found in [Yasuhara].
Fact 3 follows from the following facts, where ⊢ denotes provability
in Q.
- If k + l = m then ⊢ k + l = m.
- If k · l = m then ⊢ k · l = m.
- If k ≠ l then ⊢ k ≠ l.
Fact 4 follows from the additional fact
- ⊢ x ≤ k ⇒ (x = 0 ∨ · · · ∨ x = k).
Proofs of these facts can be found in [Yasuhara].
To prove fact 5, let S be h[N ]. The axiom of fact 5 is called
“second order” because it involves the use of subsets of the universe
of discourse, and must be formalized within set theory (or at least an
adequate fragment of it). Together with the preceding facts, it may
be seen that second order methods are stronger than strict first order
methods.
8. The real numbers.
Like the integers, the real numbers are fundamental mathematical
objects, which are familiar from everyday life, and form a mathematical universe of discourse. The real numbers may be constructed from
the non-negative integers N in informal set theory, and second order
axioms can be given which completely characterize the structure. It is
valuable to first construct some substructures which are themselves fundamental mathematical objects. The structures to be constructed are
the integers Z, the rational numbers Q, and the real numbers R. Some
families of structures will be defined, of which the preceding structures
are important examples.
The language of commutative rings is 0 1 + · =. The axioms for
commutative rings are
C1 (x + y) + z = x + (y + z)
C2 x + y = y + x
C3 x + 0 = x
C4 For all x there exists y such that x + y = 0
C5 (x · y) · z = x · (y · z)
C6 x · y=y · x
C7 x · 1 = x
C8 x · (y + z) = x · y + x · z
Various additional facts can be shown readily from the axioms; these
may be found in any of numerous introductions to abstract algebra,
including [Dowd1]. In particular, subtraction may be defined, and its
basic laws proved.
N is not a commutative ring, because axiom C4 does not hold. N
can easily be enlarged to a structure which is a commutative ring, by
adding the negative integers. One method of doing this is as follows.
On N × N , define the binary functions
- ⟨m1, n1⟩ + ⟨m2, n2⟩ = ⟨m1 + m2, n1 + n2⟩ and
- ⟨m1, n1⟩ · ⟨m2, n2⟩ = ⟨m1 m2 + n1 n2, m1 n2 + m2 n1⟩;
and the binary predicate
- ⟨m1, n1⟩ ≡ ⟨m2, n2⟩ if and only if m1 + n2 = n1 + m2.
By straightforward if tedious calculation ≡ is verified to be a congruence
relation on N × N with + ·. The equivalence class [⟨m, n⟩] will represent
m − n.
In the quotient (N × N )/≡, + and · are defined by the above
equations. Another straightforward calculation shows that the quotient
is a commutative ring, with [⟨0, 0⟩] as 0, [⟨1, 0⟩] as 1, and [⟨m, n⟩] +
[⟨n, m⟩] = 0. This is the ring Z. The function h where h(n) = [⟨n, 0⟩]
is an isomorphic embedding of N in Z.
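The construction can be spot-checked numerically. The following is a minimal sketch, assuming a pair of Python integers (m, n) stands for the pair of non-negative integers representing m − n; the names z_add, z_mul, z_equiv, z_embed and z_canon are illustrative, and the canonical representative is a convenience not used in the text.

```python
# Pairs (m, n) of non-negative integers; the class of (m, n) represents m - n.

def z_add(p, q):
    (m1, n1), (m2, n2) = p, q
    return (m1 + m2, n1 + n2)

def z_mul(p, q):
    (m1, n1), (m2, n2) = p, q
    return (m1 * m2 + n1 * n2, m1 * n2 + m2 * n1)

def z_equiv(p, q):
    """The congruence: (m1, n1) ≡ (m2, n2) iff m1 + n2 = n1 + m2."""
    (m1, n1), (m2, n2) = p, q
    return m1 + n2 == n1 + m2

def z_embed(n):
    """The embedding h of N in Z, h(n) = [(n, 0)]."""
    return (n, 0)

def z_canon(p):
    """A representative of the class of p having a zero component."""
    m, n = p
    k = min(m, n)
    return (m - k, n - k)

minus_one = (0, 1)
product = z_mul(minus_one, minus_one)   # represents (-1) * (-1)
```

A check over a finite sample of pairs is evidence, not a proof, that ≡ is a congruence; the general verification is the calculation referred to above.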
A binary predicate ≤ on a set D is said to be a partial order if the
following hold.
1. x ≤ x (reflexive law)
2. x ≤ y and y ≤ z imply x ≤ z (transitive law)
3. x ≤ y and y ≤ x imply x = y (antisymmetry law)
A partial order is a linear order if the following also holds.
4. x ≤ y or y ≤ x
The subset order on Pow(U ) for a set U is an example of a partial order
which is not a linear order (provided U has at least two elements). The
relation ≤ on N , defined to hold if ∃w(y = x + w), is a linear order.
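The two examples just mentioned can be checked by brute force on small sets. The following is a minimal sketch, assuming an order is given as a Python function le(x, y) on a finite domain; the function names are illustrative, and the check of ≤ on N covers only an initial segment.

```python
from itertools import combinations

def powerset(u):
    """All subsets of u, as frozensets."""
    items = list(u)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def is_partial_order(dom, le):
    """Check the reflexive, transitive and antisymmetry laws on a finite domain."""
    reflexive = all(le(x, x) for x in dom)
    transitive = all(le(x, z) for x in dom for y in dom for z in dom
                     if le(x, y) and le(y, z))
    antisymmetric = all(x == y for x in dom for y in dom
                        if le(x, y) and le(y, x))
    return reflexive and transitive and antisymmetric

def is_linear_order(dom, le):
    """A partial order in which any two elements are comparable."""
    return is_partial_order(dom, le) and all(
        le(x, y) or le(y, x) for x in dom for y in dom)

def subset_le(a, b):
    """The subset order on frozensets."""
    return a <= b

def nat_le(x, y):
    """x ≤ y iff there is w with y = x + w; a witness w never exceeds y."""
    return any(y == x + w for w in range(y + 1))

subsets = powerset({0, 1})
```

For U = {0, 1} the subsets {0} and {1} are incomparable, which is why the subset order fails to be linear once U has at least two elements.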
Given a partial order ≤, the predicate x < y may be defined by the
formula x ≤ y ∧ x ≠ y. This relation is called the strict part of the
partial order, and satisfies the transitive law and x ≮ x. On the other
hand, given such a predicate the relation x < y ∨ x = y is a partial
order.
If ≤ is a partial order on D and S ⊆ D then x ∈ S is said to be a
least element of S if x ≤ y for all y ∈ S. An element x ∈ D is said to
be an upper bound for S if y ≤ x for all y ∈ S. An upper bound x for
S is a least upper bound if x ≤ x′ whenever x′ is an upper bound.
An ordered commutative ring is one where a unary predicate P
(positive) has been added to the language, and satisfying the following
additional axioms.
O1 ¬P (0).
O2 if x ≠ 0, exactly one of P (x) or P (−x) holds.
O3 P (x) ∧ P (y) ⇒ P (x + y).
O4 P (x) ∧ P (y) ⇒ P (x · y).
Properties which follow immediately include the following:
- 1 is positive (unless 0=1 and the ring is trivial);
- the relation P (x − y) is the strict part x > y of a linear order x ≥ y
on the ring;
- if x < y then x + z < y + z, and if x ≤ y then x + z ≤ y + z; and
- if x < y then −y < −x, and if x ≤ y then −y ≤ −x.
The absolute value |x| is defined to be x if x is positive or 0, else −x.
This satisfies the triangle inequality |x + y| ≤ |x| + |y|. Axioms can
be given using the order predicate; using positivity results in a slightly
simpler set of axioms.
Z is an ordered commutative ring; the elements [⟨n, 0⟩] for n ≠ 0
constitute a set of positive elements. If M is any ordered commutative
ring, mapping 0 and 1 to 0 and 1 induces a unique isomorphic embedding of Z in M . The following second order axiom ensures that the
embedding is in fact an isomorphism.
- If S ⊆ M is nonempty and bounded below then S has a least
element.
A proof will be outlined.
Call elements of the image of the embedding “integers”. There can
be no element greater than every integer. If there were, let S be the set
of such elements; S is nonempty and bounded below (by 0), so it has a
least element a. Since a − 1 is not in S, a − 1 ≤ m for some integer
m, whence a ≤ m + 1, a contradiction. There can be no element less
than every integer; if a is such then −a is greater than every integer.
Suppose m < a < m + 1 where m is an integer. Then 0 < b < 1 where
b = a − m. The set {b^j : j ∈ N } is a nonempty set which is bounded below
but has no least element, contradicting the axiom.
A field is a commutative ring which satisfies the following additional
axioms.
F1 For all x, if x ≠ 0 then there exists y such that x · y = 1
F2 0 ≠ 1
An ordered field is an ordered commutative ring satisfying F1 and F2.
Z is not a field, because there is no x such that 2 · x = 1, as may
be easily verified. Z may be enlarged, to construct a field, as follows
(in fact this construction may be carried out in any “integral domain”,
which is a commutative ring satisfying some additional axioms). Let
Z^≠ denote the nonzero elements of Z. On Z × Z^≠ , define the binary
functions
- ⟨m1, n1⟩ + ⟨m2, n2⟩ = ⟨m1 n2 + m2 n1, n1 n2⟩ and
- ⟨m1, n1⟩ · ⟨m2, n2⟩ = ⟨m1 m2, n1 n2⟩;
and the binary predicate
- ⟨m1, n1⟩ ≡ ⟨m2, n2⟩ if and only if m1 n2 = m2 n1 .
By straightforward calculation ≡ is verified to be a congruence relation
on Z × Z^≠ with + ·. The equivalence class [⟨m, n⟩] will represent m/n.
In the quotient (Z × Z^≠ )/≡, + and · are defined by the above
equations. Another straightforward calculation shows that the quotient
is a field, with [⟨0, 1⟩] as 0, [⟨1, 1⟩] as 1, and, provided m ≠ 0, [⟨m, n⟩] ·
[⟨n, m⟩] = 1. This is the field Q. The function h where h(n) = [⟨n, 1⟩]
is an isomorphic embedding of Z in Q.
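As with Z, the equations can be tried on concrete pairs. The following is a minimal sketch, assuming a pair of Python integers (m, n) with n nonzero stands for m/n; the names q_add, q_mul, q_equiv and q_embed are illustrative only.

```python
# Pairs (m, n) of integers with n != 0; the class of (m, n) represents m/n.

def q_add(p, q):
    (m1, n1), (m2, n2) = p, q
    return (m1 * n2 + m2 * n1, n1 * n2)

def q_mul(p, q):
    (m1, n1), (m2, n2) = p, q
    return (m1 * m2, n1 * n2)

def q_equiv(p, q):
    """The congruence: (m1, n1) ≡ (m2, n2) iff m1 n2 = m2 n1."""
    (m1, n1), (m2, n2) = p, q
    return m1 * n2 == m2 * n1

def q_embed(n):
    """The embedding h of Z in Q, h(n) = [(n, 1)]."""
    return (n, 1)

half, third = (1, 2), (1, 3)
total = q_add(half, third)          # represents 1/2 + 1/3
inverse_check = q_mul((2, 3), (3, 2))
```

Here total is the pair (5, 6), and inverse_check is (6, 6), which is equivalent to (1, 1) as the text asserts for [⟨m, n⟩] · [⟨n, m⟩].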
Q is an ordered field; the elements [⟨m, n⟩] where m, n > 0 constitute a set of positive elements. If M is any ordered field, mapping 0
and 1 to 0 and 1 induces a unique isomorphic embedding of Q in M .
Clearly Q is the unique ordered field which is isomorphically embedded
in any ordered field; this seems to be the best uniqueness property for
Q.
The rational numbers suffer from a deficiency. Let S = {q ∈ Q :
q² < 2}; it is not difficult to show that if S has a least upper bound r
then r² = 2; and there is no r ∈ Q such that r² = 2 (this is proved in
the ancient Greek text “Euclid’s Elements”). Thus, S does not have a
least upper bound in Q.
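One half of the argument can be made concrete with exact rational arithmetic. The following is a minimal sketch, not the proof referred to in the text: it assumes the averaging step r ↦ (r + 2/r)/2, which sends a positive rational r with r² > 2 to a smaller positive rational whose square still exceeds 2, so no such r can be the least upper bound of S. The function name smaller_upper_bound is illustrative.

```python
from fractions import Fraction

def smaller_upper_bound(r):
    """For rational r > 0 with r*r > 2, return a smaller rational r2 with r2*r2 > 2."""
    return (r + 2 / r) / 2

r = Fraction(2)          # 2 is an upper bound for S, since 2*2 > 2
bounds = [r]
for _ in range(4):
    r = smaller_upper_bound(r)
    bounds.append(r)
# bounds now begins 2, 3/2, 17/12, ...
```

Every entry of bounds is a rational upper bound for S, and each is strictly smaller than the one before it.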
Q can be enlarged, so that the deficiency just mentioned is eliminated. This was an important issue in the history of mathematics, and
its resolution was important to early set theory. See [MacTutor, Real
numbers] for remarks on the history of the subject. The construction
to be outlined below can be found in numerous references, [Rudin] for
example.
A linearly ordered set D is said to have the least upper bound
property if, whenever S ⊆ D is nonempty and has an upper bound,
then S has a least upper bound. Q does not have this property. One
method of constructing the real numbers is to enlarge Q to a linearly
ordered set which does have the property. It turns out that there is
exactly one way to do this.
If D is a set with a partial order on it, say that a subset S ⊆ D is
≤-closed if x ∈ S ∧ w ≤ x ⇒ w ∈ S. Considering Q with its usual order,
define a cut to be a set of rationals which is nonempty, bounded above,
≤-closed, and has no greatest element. Let R be the set of cuts; R will
be equipped with interpretations for 0 1 + · = P , to produce a structure
for this language. For q ∈ Q let q^< denote {r ∈ Q : r < q}; this is
readily seen to be a cut.
To begin with, some facts about R will be proved using only the
order ≤ on Q; these facts are of interest in themselves. A linear order
is said to be a dense linear order without endpoints if it satisfies the
additional axioms
∀x∀y(x < y ⇒ ∃z(x < z < y)),
∀x∃y(y < x), and ∀x∃y(y > x).
Later in the section it will be shown that if such a structure is countably
infinite then it is isomorphic as a linear order to Q; for now only the
easily verified fact that Q is such an order is needed.
The notation sup(S) is commonly used for the least upper bound
of a subset S of a partially ordered set; henceforth it will be adopted.
The notation is derived from the fact that “supremum” is a synonym
for “least upper bound”. The notation inf(S) is used for the greatest
lower bound (infimum).
A map h between linear orders is said to be order-preserving if x ≤
y ⇒ h(x) ≤ h(y); suppose h is such a map. It is easy to see that h is
an isomorphic embedding if and only if x < y ⇒ h(x) < h(y); and in
this case h(x) < h(y) ⇒ x < y. Such a map will be said to be strictly
order-preserving.
A subset S of a linear order is said to be order-dense if whenever
x < y then there is a q ∈ S such that x < q < y.
The subset relation induces a partial order on R. To simplify the
notation, let p, q, r denote elements of Q and x, y, z elements of R. If
q ∉ x then q is an upper bound for x; for if r ∈ x, q ≤ r cannot hold,
else q ∈ x, whence r < q. Thus, given x, y, and q ∈ y − x, x ⊂ y follows;
this shows that ⊆ is a linear order on R.
Suppose x ⊂ y; then there is some q ∈ y − x. Clearly q^< ⊂ y. Also,
x ⊆ q^< , and x = q^< if and only if q = sup(x). In the latter case, x ∪ {q}
must be a proper subset of y, because y has no greatest element, and
therefore there is an r such that q < r and r ∈ y. Replacing q by r if
necessary, a q has been found such that x ⊂ q^< ⊂ y; in particular, the
subset order on R is dense.
If x ∈ R then x is bounded above, so there is a q with x ⊂ q^< .
Also, there is a q ∈ x, and q^< ⊂ x. In particular, the subset order on R
has no endpoints.
Let hR denote the map from Q to R, where hR (q) = q^< . Using
facts already observed, it follows that hR is an isomorphic embedding
of linear orders.
R has the least upper bound property. Indeed, if S is a nonempty
set of cuts which has an upper bound let b = {q : ∃x ∈ S(q ∈ x)}
(readers who are familiar with infinite unions will recognize that this
is just the union of the members of S). Then b is nonempty, bounded
above, ≤-closed, and has no greatest element; that is, it is a cut. It
is the least upper bound because the infinite union is the least upper
bound in the subset order; this will be shown in section 11.
To summarize, R has the following properties.
1. It is a dense linear order without endpoints.
2. It has the least upper bound property.
3. There is an isomorphic embedding of Q.
4. The image of this embedding is an order-dense subset.
Suppose that M is any linear order having properties 1-4 above, and
let h denote the embedding. For x ∈ M let Cx = {q ∈ Q : h(q) < x}.
Then Cx is nonempty (there is some q with h(q) < x), Cx is bounded
above (there is some q with h(q) > x), Cx is ≤-closed (r < q ⇒ h(r) <
h(q)), and Cx contains no largest element (if h(q) < x then there is an
r such that h(q) < h(r) < x).
Thus, Cx is a cut, and so there is a map g such that the following
hold.
a. g : M ↦ R.
b. If x < y then g(x) < g(y) (since x < h(q) < y for some q).
c. g ◦ h = hR (g(h(q)) = {r : h(r) < h(q)} = {r : r < q} = hR (q)).
In fact, if g is any map having properties a-c then g(x) = Cx . This
follows because q ∈ Cx if and only if h(q) < x if and only if g(h(q)) <
g(x) if and only if hR (q) < g(x) if and only if q ∈ g(x).
Suppose X ∈ R; then h[X] is nonempty and bounded above, so sup(h[X])
exists. Letting x = sup(h[X]), it is not difficult to verify that g(x) = X,
that is, q ∈ X if and only if h(q) < sup(h[X]).
To summarize, if M is any linear order having properties 1-4 above,
where h is the isomorphic embedding, then there is a unique map g
having properties a-c above, and it is an isomorphism. Also, any x ∈ M
equals sup(h[Cx ]) where Cx is as above; this may be seen because it is
true of hR .
Define a commutative group to be a structure in the language 0 + =
which satisfies axioms C1-C4. An ordered commutative group is a commutative group, with P added to the language, satisfying axioms O1-O3.
Given h : Q ↦ M with properties 1-4, there is a unique commutative
group structure on M which makes h an isomorphic embedding of ordered commutative groups. Indeed (writing q for h(q)), q < x + y if and
only if ∃r, s(r < x ∧ s < y ∧ q = r + s), and it follows that x + y must
equal sup({r + s : r < x ∧ s < y}).
In particular, having constructed R as the Dedekind cuts in Q, considered as a linear order, there is a unique way of defining the function
+ on R so that hR is an isomorphic embedding of ordered commutative
groups.
Multiplication may be handled similarly; given h : Q ↦ M with
properties 1-4, positive elements x, y, and positive q, q < x·y if and only if
∃r, s(0 < r < x ∧ 0 < s < y ∧ q = r · s), and it follows that x · y must equal
sup({r · s : 0 < r < x ∧ 0 < s < y}). The function · may then be extended uniquely by
algebra (that is, by logic from the axioms for ordered fields) to all pairs
x, y ∈ M .
Given an ordered field M , it is easily seen that there is a unique
isomorphic embedding of Q in M . From these facts, any two ordered
fields having the least upper bound property are isomorphic by a unique
isomorphism. R is such a structure, and this may be taken as a formal
definition of the real numbers.
Returning to the topic of countable dense linear orders without
endpoints, let A and B be two such. The assumption that A is countably
infinite means that there is a bijection a : N ↦ A. The convention of
writing a_n for a(n) is a frequently used one, and one may say, “let A be
enumerated as a_0 , a_1 , . . .”. Likewise, let B be enumerated as b_0 , b_1 , . . ..
The following procedure (the “back and forth” procedure) produces
a 1-1 correspondence between A and B, which is an order isomorphism.
Let Ad be the elements of A which have been assigned a value so far,
and similarly for Bd . Repeat the following.
a. Let m be smallest such that a_m ∉ Ad . Assign to a_m an element of
B which bears the same relation to Bd which a_m bears to Ad .
b. Proceed similarly, with B and A exchanged.
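Finitely many rounds of the procedure can be run on concrete orders. The following is a minimal sketch under stated assumptions: A is taken to be the rationals strictly between 0 and 1 and B the dyadic rationals strictly between 0 and 1, each with an enumeration chosen here for illustration, and the element assigned in each step is the first suitable one in the enumeration (the text does not prescribe which suitable element to pick).

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals_in_unit_interval():
    """Enumerate A: the rationals strictly between 0 and 1, without repetition."""
    for d in count(2):
        for n in range(1, d):
            if gcd(n, d) == 1:
                yield Fraction(n, d)

def dyadics_in_unit_interval():
    """Enumerate B: the dyadic rationals strictly between 0 and 1."""
    for k in count(1):
        for n in range(1, 2 ** k, 2):
            yield Fraction(n, 2 ** k)

def first_match(candidates, target, pairs, used):
    """First unused candidate lying among the second components of pairs
    the way target lies among the first components."""
    for c in candidates():
        if c not in used and all((s < target) == (t < c) for s, t in pairs):
            return c

def back_and_forth(enum_a, enum_b, rounds):
    """Run the given number of rounds; return the list of pairs (a, b) assigned."""
    pairs = []
    for _ in range(rounds):
        used_a = {a for a, _ in pairs}
        used_b = {b for _, b in pairs}
        # Step a: the first element of A not yet assigned a value.
        a = next(x for x in enum_a() if x not in used_a)
        b = first_match(enum_b, a, pairs, used_b)
        pairs.append((a, b))
        used_a.add(a)
        used_b.add(b)
        # Step b: the same with A and B exchanged.
        b = next(y for y in enum_b() if y not in used_b)
        a = first_match(enum_a, b, [(t, s) for s, t in pairs], used_a)
        pairs.append((a, b))
    return pairs

pairs = back_and_forth(rationals_in_unit_interval, dyadics_in_unit_interval, 5)
first_five_a = list(islice(rationals_in_unit_interval(), 5))
```

The search in first_match terminates because both orders are dense and have no endpoints, so a suitable element always exists. After k rounds the first k elements of each enumeration have been handled, which is why the full (infinite) procedure yields a bijection.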
Although of peripheral interest, the relation between a rigorous
theory of the real numbers, and a rigorous theory of the plane of plane
geometry, is of sufficient interest that it was considered by David Hilbert,
after whom one system of axioms for plane geometry is named. A treatment of this topic may be found in Appendix 1.
Another topic of peripheral interest concerns weaker systems than
full set theory in which the theory of the real numbers can be developed.
This topic has been of recent interest; see [Simpson].
9. Computability.
Computability theory is concerned with mechanical procedures involving formal objects. The need for such a theory was already evident
in 1900, when David Hilbert asked whether there was “a process according to which it can be determined in a finite number of operations”,
whether a polynomial with integer coefficients had an integer solution.
A negative answer to this question (Hilbert’s tenth problem) was given
in 1970 by Yuri Matijasevic; the formal theory of computability developed in the mid 1930’s was necessary to its solution. Computability
theory has also proved useful in mathematical logic, as will be seen in
the next section.
A mechanical procedure might do any of the following.
1. Enumerate a set of integers, or more generally a predicate as a set of
n-tuples. Such a procedure “runs forever”, outputs only elements of
the set, eventually outputs every element of the set, and in general
may output the same element more than once.
2. Decide whether an integer is in a set (more generally whether an
n-ary predicate holds). Such a procedure always halts after a finite
number of steps, with a “yes” or “no” answer.
3. Compute the value of an n-ary function. Such a procedure always
halts after a finite number of steps, and produces a numeric value.
4. Compute the value of an n-ary “partial function”. Such a procedure
may halt after a finite number of steps and produce a numeric value;
or run forever and produce no output, in which case the value of
the partial function is undefined.
The notion of a partial function is useful in computability theory,
and occasionally in other discussions. An (n + 1)-ary predicate $P(\vec{x}, y)$
(where $\vec{x}$ is used to denote $x_1, \ldots, x_n$) is said to be single-valued if for
all $\vec{x}$ there is at most one y such that $P(\vec{x}, y)$. P is said to be total if for
all $\vec{x}$ there is at least one y such that $P(\vec{x}, y)$. Thus, an n-ary function is
just an (n + 1)-ary predicate which is both single-valued and total. An
n-ary partial function is only required to be single-valued.
Shortly, formal definitions of the predicates and functions computable using procedures of the above types will be given. An informal
discussion is of value, though. In particular, informal arguments can be
given showing how a procedure of one type can be converted into a procedure of another type. More generally, procedures can be (and usually
are) given “informally”, relying on the experience of mathematics to
conclude that the predicate or function computed by the procedure has
a proper formal procedure. This principle (that informal procedures can
be translated to formal ones) is often called “Church’s thesis” (although
Church’s original statement was more specialized).
A predicate computed by a procedure P of type 1 (an enumeration
procedure) can be computed by a procedure Q which halts on a given
input if and only if the predicate holds (a semi-decision procedure).
Given P and an input $\vec{x}$, run P, and halt if $\vec{x}$ appears. On the other
hand, given a semi-decision procedure Q, at stage n run Q for n steps
on all inputs $\vec{x}$ where $x_i \le n$ for all i, and output $\vec{x}$ if Q halts on $\vec{x}$. The set
of inputs on which a semi-decision procedure halts is called its domain.
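The two conversions just described can be sketched in Python for unary predicates. This is only an illustration under stated assumptions: a semi-decision procedure is modelled as a generator function that yields once per step and returns when it halts, and `square_search` is a hypothetical example procedure, not one from the text.

```python
from itertools import count

def square_search(x):
    """Hypothetical semi-decision procedure: halts iff x is a perfect square.
    Each `yield` marks the end of one step."""
    k = 0
    while k * k != x:
        k += 1
        yield

def halts_within(q, x, n):
    """Run q on input x for at most n steps; report whether it halted."""
    run = q(x)
    for _ in range(n):
        try:
            next(run)
        except StopIteration:
            return True
    return False

def enumerate_domain(q):
    """At stage n, run q for n steps on every input x <= n; output x if q halted."""
    for n in count():
        for x in range(n + 1):
            if halts_within(q, x, n):
                yield x          # the same x is output again at later stages

def semi_decide(enumerator, x):
    """Halt (returning True) iff x appears in the enumeration; otherwise run forever."""
    for y in enumerator():
        if y == x:
            return True
```

The enumeration repeats elements, which is permitted for a type 1 procedure. Calling `semi_decide` on an input outside the set does not return, as expected of a semi-decision procedure.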
It is interesting to note that already some characteristics of a “mechanical procedure” are apparent. There should be some record of a
finite amount of data, and some operations which can be performed on
it in a step. It should also be noted that the “informal” transformation
just given can be formalized, once a formal definition of a mechanical
computation has been given.
A set (or predicate) which is the output of an enumeration procedure (or the domain of a semi-decision procedure) is called variously
“computably enumerable”, “recursively enumerable”, or “semi-decidable”. Until recently (e.g., as in [Rogers]), the term “recursively enumerable” was preferred. However, this terminology is a “historical artifact”,
and progressively since [Soare] the term “computably enumerable” has
been seen as carrying less linguistic baggage.
A procedure of type 2 is called a decision procedure. A predicate
computed by such a procedure is called variously “computable”, “decidable”, or “recursive”. Again, the term “computable” is displacing the
older standard “recursive”. The term “decidable” remains in current
use, especially in certain contexts, in particular logical theories.
If a predicate R is computable, then ¬R is, by simply reversing the
roles of “yes” and “no”. Also, R is computably enumerable, by running
forever instead of halting and answering “no”. If R is enumerated by
procedure P, and ¬R is enumerated by procedure Q, a procedure for
deciding R is as follows. Given an input $\vec{x}$, at alternate stages, run
a step of P, or a step of Q. When $\vec{x}$ is enumerated, answer “yes” if P
enumerated it, and “no” if Q enumerated it.
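The alternating procedure can be sketched as follows, assuming both enumerations are modelled as Python generators that never terminate (repeating outputs if necessary). The enumerators `evens` and `odds` are hypothetical examples chosen for illustration.

```python
from itertools import count

def evens():
    """Hypothetical enumeration procedure P for R = 'x is even'."""
    for k in count():
        yield 2 * k

def odds():
    """Hypothetical enumeration procedure Q for the complement of R."""
    for k in count():
        yield 2 * k + 1

def decide(enum_r, enum_not_r, x):
    """Alternate one output of P with one output of Q until x shows up."""
    p, q = enum_r(), enum_not_r()
    while True:
        if next(p) == x:
            return True      # P enumerated x: answer "yes"
        if next(q) == x:
            return False     # Q enumerated x: answer "no"
```

Termination relies on every integer appearing in exactly one of the two enumerations, so the loop always reaches x.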
A partial function computable by a procedure of type 4 is said to
be a “computable partial function” or “partial recursive function”, with
the first terminology being the more recent. If φ is an n-ary partial
function computed by procedure P, then as a set of (n + 1)-tuples, φ is
enumerated by the following procedure. At stage n, run P for n steps
on all inputs $\vec{x}$ with $x_i \le n$ for all i. Whenever P yields a value y,
output $(\vec{x}, y)$. On the other hand, if Q is a procedure enumerating φ as a
set of (n + 1)-tuples, φ is computed by the following type 4 procedure.
On input $\vec{x}$, enumerate the (n + 1)-tuples of φ. If an (n + 1)-tuple $(\vec{x}, y)$ appears,
produce y as the value and halt.
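The second direction (from an enumeration of the graph to a type 4 procedure) is short enough to sketch directly. The example partial function, "half of x, defined only for even x", and the name `graph_of_half` are assumptions made for illustration.

```python
from itertools import count

def graph_of_half():
    """Enumerate the pairs (x, y) with x = 2*y: the graph of a partial
    function that is undefined on odd inputs."""
    for y in count():
        yield (2 * y, y)

def compute(enum_graph, x):
    """Type 4 procedure: search the enumeration for a pair (x, y).
    If no such pair exists the search runs forever, so the value is undefined."""
    for a, y in enum_graph():
        if a == x:
            return y
```

For example `compute(graph_of_half, 10)` halts with a value, while `compute(graph_of_half, 3)` would never return.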
A function computable by a procedure of type 3 is said to be a
“computable function” or “recursive function”, with the first terminology being the more recent. A computable function is just a computable
partial function, which is total.
Needless to say, there are many ways of giving a formal definition of
computability. Further, as can be seen from the foregoing discussion, a
formal definition can be given of, say, computable partial functions, and
other classes of predicates and functions defined in terms of this, instead
of more directly from the formal definition. Likewise, a formal definition
might instead be given for a computably enumerable predicate.
In fact, a definition of the latter type is already available. Using
this as the basic definition is uncommon; usually some other definition
is given, and the one to be given here shown to be equivalent. An
outline of such methods will be given in appendix 2. More extensive
treatments of this topic can be found in any of numerous introductions
to computability theory, including [Mendelson], [Yasuhara], and chapter
12 of [Dowd1]. Early workers in the subject include (in alphabetical
order) Church, Gödel, Kleene, Markov, Post, and Turing.
The formal definition will involve formulas in the language 0 s + · =,
called $L_A$ in section 7. Recall that for $n \in N$, $\bar{n}$ is the numeral with
value n. Also, for a formula F, variables $x_1, \ldots, x_k$, and terms $t_1, \ldots, t_k$,
$F_{t_1/x_1, \ldots, t_k/x_k}$ denotes the formula obtained from F by replacing each
free occurrence of $x_i$ by $t_i$. Let ⊢ denote provability from the axioms
of the system Q defined in section 7. A k-ary predicate P is said to be
computably enumerable if there is a formula F, with k free variables
$x_1, \ldots, x_k$, such that $P(n_1, \ldots, n_k)$ if and only if $\vdash F_{\bar{n}_1/x_1, \ldots, \bar{n}_k/x_k}$. A
predicate P is computable if and only if both P and ¬P are computably
enumerable. A partial function or total function is computable if and
only if it is computably enumerable as a set of (n + 1)-tuples.
The formal definition is consistent with the informal one. The formulas provable from the axioms of Q (or any similar finite set of axioms)
can be mechanically enumerated, as follows. A list is maintained of all
formulas proved so far. At stage n, formulas which follow from those
in the list by a rule are added; and also axioms involving the first n
variables.
A technical point of considerable significance has been ignored in
the previous paragraph. A computably enumerable set is a subset of
N , whereas formulas are finite strings over an alphabet, which as given
contains infinitely many variables, even if there are only finitely many
non-logical symbols.
Mapping formulas to integers is called “arithmetization of syntax”.
This is an omnipresent ingredient of mathematical logic, first carried
out by Gödel, in the course of proving the incompleteness theorems, to
be described in the next section. A mapping needs to be given, such
that functions and predicates on the formulas correspond to computable
functions and predicates on N .
There are many ways of doing this. One simple method relies on a
correspondence between strings over a finite alphabet and non-negative integers. Let m be the size of the alphabet. The letters of
the alphabet may be taken as the integers 1, . . . , m. The finite string
$l_{t-1} \ldots l_0$ is mapped to the integer $\sum_i l_i m^i$. This notation is called m-adic
notation, and has advantages over the more familiar “m-ary” notation,
where the letters are considered to be 0, . . . , m − 1. In m-adic notation,
the map from strings to N is a 1-1 correspondence; the empty string
denotes 0. In m-ary notation there are multiple strings corresponding to
the same integer, unless leading 0’s are disallowed; and 0 rather than
the empty string customarily denotes 0.
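The m-adic correspondence is easy to compute in both directions. The following Python sketch represents a string as a list of letters, each an integer from 1 to m, with the most significant letter first; the function names are illustrative.

```python
def madic_encode(letters, m):
    """Map the string l_{t-1} ... l_0 (letters in 1..m) to sum of l_i * m**i."""
    value = 0
    for letter in letters:
        if not 1 <= letter <= m:
            raise ValueError("letters must lie in 1..m")
        value = value * m + letter
    return value

def madic_decode(value, m):
    """Inverse map: recover the unique string whose m-adic value is `value`."""
    letters = []
    while value > 0:
        letter = value % m
        if letter == 0:          # there is no letter 0; use m and borrow
            letter = m
        letters.append(letter)
        value = (value - letter) // m
    return letters[::-1]
```

With m = 2 the strings of length at most 2 are sent to 0, 1, 2, ..., 6 with no gaps or repetitions, which illustrates the 1-1 correspondence claimed above; the empty list encodes to 0.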
A countably infinite alphabet can be replaced by a finite alphabet.
In the case of a first order language with finitely many non-logical symbols, the variable $x_i$ can be replaced by the string xN where N is the
2-adic notation for i.
It is sometimes useful to be able to distinguish between a string s
over an m letter alphabet, and the numeric value corresponding to it
under m-adic notation. In such cases, a common notation is $\ulcorner s \urcorner$, and
the integer is called the Gödel number of the string.
In appendix 2, an outline will be given of an argument that various functions and predicates are computable. This is an illustration
of Church’s thesis, in that these functions and predicates are readily
computed by informally given procedures; readers with experience with
a programming language for example should be able to see that such
procedures could be expressed in the language.
In particular, there is a computably enumerable binary predicate
U, whose pairs are the pairs ⟨f, n⟩, such that n is in the set enumerated
by the formula F with one free variable, where $f = \ulcorner F \urcorner$. It follows from
this that the unary predicate K(x) = U(x, x) is computably enumerable;
indeed, if F is a formula for U in the free variables x and y then $F_{x/y}$
is a formula for K.
Theorem 1. $K^c$ is not computably enumerable.
Proof: Suppose to the contrary that F was a formula which yielded
an enumeration of $K^c$, and let $f = \ulcorner F \urcorner$. Then $f \in K^c$ if and only if
U(f, f) (because U with first argument f enumerates $K^c$), if and only
if f ∈ K (by definition of K). Thus, the assumption of the existence of
F leads to a contradiction, so F does not exist. ⊳
As a corollary, K is not decidable. This is a fact of interest; but
K is a “contrived” set (although a basic one in computability theory).
Undecidability of a more “naturally occurring” set will be proved in the
next section.
10. Independence.
The emergence of formal logic in the late 19th century showed that
mathematical reasoning could be reduced to the formal manipulation of
strings of symbols. The question then arose, what could be proved about
formal logic using “finitary” methods, which involve reasoning directly
about the formal objects without introducing abstract concepts. In the
1920’s there were attempts to make the notion of finitary methods more
precise; see [SEP, Hilbert’s Program].
David Hilbert asked (Hilbert’s second problem, 1900) whether a
system of axioms could be given for arithmetic, and a proof given of
the consistency of the axioms, which was finitary. Later (Hilbert’s program, 1920) he asked the same question for a system of axioms for all
of mathematics.
It is generally recognized that the second incompleteness theorem,
which was proved by Kurt Gödel in 1931, provided a negative answer to
both questions. The first incompleteness theorem, proved in the same
paper, already showed that the application of formal logic to mathematics was unable to provide a method of proving all true statements
of mathematics.
As observed in section 6, the axioms and rules of formal logic are
complete. Thus, the insufficiency is due to the fact that no allowable
system of non-logical axioms for all of mathematics can be given. It
must be specified what systems of axioms are allowable; the set of all
true statements for example should not be. It has become clear that
the correct notion of an allowable system of axioms is one which is
computably enumerable.
Fix a first order language L. A theory over L is defined to be a set
S of formulas, which is closed under logical consequence, that is, such
that if F1 , . . . , Fk are in S and F1 , . . . , Fk ⊢ G then G is in S. The
notation $\vdash_T F$ will be used to denote that the formula F is in T.
Recall from section 6 that a theory is consistent if for no sentence
F are both F and ¬F in S; and that a theory is consistent if and
only if it has a model. A theory is inconsistent if it is not consistent.
A theory S is said to be complete if for all sentences F, either F
or ¬F is in S. Recall from section 7 that for a structure D for L,
Th(D) is the set of formulas which are true in D. Th(D) is an example
of a consistent and complete theory. A theory is incomplete if it is not
complete.
A sentence F is independent of a theory S if neither F nor ¬F is in
S. A theory is clearly incomplete if and only if there is an independent
sentence. However the fact that a particular sentence is independent
may be of particular interest.
If L has only finitely many non-logical symbols (this restriction can
of course be relaxed) a method was given in the preceding section for
assigning integers (Gödel numbers) to formulas. A theory is said to be
computably enumerable if its set of Gödel numbers is; and decidable if
its set of Gödel numbers is.
Recall the theory PA from section 7. Since PA is a subset of Th(N ),
it is consistent. It will next be shown that PA is not complete; this is
the “first incompleteness theorem”. Let Sub(x, y) be the function whose
value is $\ulcorner F_{t/v} \urcorner$ if $x = \ulcorner F \urcorner$ where F is a formula with one free variable
v, and $y = \ulcorner t \urcorner$ where t is a term. Let Num(x) be the function whose
value is $\ulcorner \bar{x} \urcorner$, where as in section 7 $\bar{x}$ is the term representing x. It is
shown in appendix 2 that these are computable.
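Lemma 1. Suppose F is a formula with one free variable v. Then
there is a sentence G such that $\vdash_Q G \Leftrightarrow F(N_G)$, where $N_G = \overline{\ulcorner G \urcorner}$ is the
numeral for the Gödel number of G.
Proof: The notation will be abused by writing y = e(x) for the
formula $E_{x/x, y/y}$, where E is the formula for e. Let H be
$\exists w(w = \mathrm{Sub}(v, \mathrm{Num}(v)) \wedge F_{w/v})$, let $N_H = \overline{\ulcorner H \urcorner}$, and let G be $H_{N_H/v}$. By
definition of Num, $\ulcorner N_H \urcorner = \mathrm{Num}(\ulcorner H \urcorner)$; and since Num is computable,
$\vdash_Q \overline{\ulcorner N_H \urcorner} = \mathrm{Num}(\overline{\ulcorner H \urcorner})$, or $\vdash_Q \overline{\ulcorner N_H \urcorner} = \mathrm{Num}(N_H)$. By definition of
Sub, $\ulcorner G \urcorner = \mathrm{Sub}(\ulcorner H \urcorner, \ulcorner N_H \urcorner)$; and since Sub is computable,
$\vdash_Q \overline{\ulcorner G \urcorner} = \mathrm{Sub}(\overline{\ulcorner H \urcorner}, \overline{\ulcorner N_H \urcorner})$. By predicate logic, $\vdash_Q \mathrm{Sub}(N_H, \mathrm{Num}(N_H)) = N_G$.
Again by predicate logic, $\vdash_Q F_{N_G/v} \Leftrightarrow H_{N_H/v}$, and the right side is G.
⊳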
Let $\mathrm{Prf}_T(x)$ be the function which is 0 if x is a proof in T, else 1.
Let Thm(x) be the function which is the last formula of the proof x;
Thm is shown to be computable in appendix 2.
Theorem 2. Suppose T is an extension of Q, possibly in a language with finitely many additional non-logical symbols, which is consistent, and such that $\mathrm{Prf}_T(x)$ is computable. Let F be the formula
$\neg\exists y(\mathrm{Prf}_T(y) \wedge \mathrm{Thm}(y) = x)$, and let G be the sentence obtained as in
lemma 1. Then $\nvdash_T G$.
Proof: Suppose G were provable; then the following would hold.
a. $\vdash_T G$.
b. For some $m \in N$, $\vdash_T \mathrm{Prf}_T(m) \wedge \mathrm{Thm}(m) = N_G$ (by computability
of $\mathrm{Prf}_T$).
c. $\vdash_T G \Leftrightarrow \neg\exists y(\mathrm{Prf}_T(y) \wedge \mathrm{Thm}(y) = N_G)$ (by lemma 1 and the hypothesis Q ⊆ T).
By predicate logic, T is inconsistent, contradicting the hypotheses. ⊳
Corollary 3. PA is not complete.
Proof: N is a model of PA (this is provable in set theory). Since G
is not provable, it is true in N (since it says it’s not provable). Therefore
¬G is not provable either. ⊳
The methods of this section can be used to show that the axioms of
set theory are incomplete, indeed will remain so if they are expanded. A
great quantity of modern set theory is concerned with showing that specific statements of set theory are independent, and implications which
hold between independent statements.
Many other facts about PA and other theories containing Q can be
proved. A discussion of two such will be given here; further discussion
can be found in any of several references, including chapter 12 of [Dowd1]
and [Shoenfield1].
Theorem 4. Suppose T is an extension of Q, possibly in a language
with finitely many additional non-logical symbols, which is consistent;
then T is undecidable.
Proof: It is shown in appendix 2 that if P is a computable predicate
then there is a formula F with free variable v such that if P(n) then
$\vdash_Q F_{\bar{n}/v}$ and if ¬P(n) then $\vdash_Q \neg F_{\bar{n}/v}$; say that F represents P. Let
$U_T(f, n)$ be the predicate which is true if and only if Sub(f, Num(n)) is
in T, considered as a set of integers; the predicate U mentioned in section
9 is $U_Q$. It is easily seen that if F represents a computable predicate
P, then $U_T(\ulcorner F \urcorner, n) = U(\ulcorner F \urcorner, n)$ for all n. Let $K_T(x) = U_T(x, x)$. As
in theorem 9.1, there is no f such that $K_T^c(n) = U_T(f, n)$ for all n. It
follows that $K_T^c$ is not computable. From this it follows that T is not
computable, since a decision procedure for T would yield one for $K_T^c$.
⊳
In particular, PA is undecidable. Q was invented by A. Tarski, and
used to prove undecidability of various theories. Some discussion may
be found in chapter 12 of [Dowd1].
Call a theory T which satisfies the hypotheses of theorem 2 a $G_1$
theory. Given such, let $\Box F$ abbreviate $\exists w(\mathrm{Prf}_T(w) \wedge \mathrm{Thm}(w) = v_F)$;
the “placeholder” $v_F$ will be used in various ways. The “derivability
conditions” for T are the following formulas.
1. If $\vdash_T F$ then $\vdash_T \Box F$.
2. $\vdash_T \Box(F \Rightarrow G) \Rightarrow \Box F \Rightarrow \Box G$.
3. $\vdash_T \Box F \Rightarrow \Box\Box F$.
In condition 1 $v_F$ denotes $\ulcorner F \urcorner$. In condition 2, $v_F$ and $v_G$ are variables
and $\Box(F \Rightarrow G)$ is an abbreviation for a function giving this formula from
its subformulas. In condition 3, $v_F$ is a variable and the inner $\Box F$ is an
abbreviation for a function giving this formula from F.
Call a G1 theory T a G2 theory if it satisfies the derivability conditions. It was observed in the proof of theorem 2 that the first condition
holds for a G1 theory. The above notation is used, because proofs can
be made more readable; it is the notation used in “provability logic”.
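The following theorem is called Löb’s theorem.
Theorem 4. Suppose T is a $G_2$-theory. If $\vdash_T \Box F \Rightarrow F$ then $\vdash_T F$.
Proof: Using lemma 1 let G be such that $\vdash G \Leftrightarrow (\Box G \Rightarrow F)$; then
$\vdash \Box(G \Rightarrow \Box G \Rightarrow F)$, whence $\vdash \Box G \Rightarrow \Box\Box G \Rightarrow \Box F$. But
$\vdash \Box G \Rightarrow \Box\Box G$, so $\vdash \Box G \Rightarrow \Box F$, and so $\vdash \Box G \Rightarrow F$. Thus $\vdash G$, so $\vdash \Box G$, so $\vdash F$.
⊳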
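The proof of Löb’s theorem is compressed, and it may help to see it laid out line by line. The display below is a sketch under one assumption about notation: $\Box F$ stands for the provability formula $\exists w(\mathrm{Prf}_T(w) \wedge \mathrm{Thm}(w) = v_F)$, and “condition 1, 2, 3” refer to the derivability conditions read as: if $\vdash F$ then $\vdash \Box F$; $\vdash \Box(F \Rightarrow G) \Rightarrow (\Box F \Rightarrow \Box G)$; and $\vdash \Box F \Rightarrow \Box\Box F$. All provability is in T.

```latex
\begin{align*}
1.\;& \vdash G \Leftrightarrow (\Box G \Rightarrow F)
      && \text{lemma 1} \\
2.\;& \vdash \Box\bigl(G \Rightarrow (\Box G \Rightarrow F)\bigr)
      && \text{condition 1, applied to the forward direction of 1} \\
3.\;& \vdash \Box G \Rightarrow (\Box\Box G \Rightarrow \Box F)
      && \text{condition 2, used twice on 2} \\
4.\;& \vdash \Box G \Rightarrow \Box\Box G
      && \text{condition 3} \\
5.\;& \vdash \Box G \Rightarrow \Box F
      && \text{from 3 and 4} \\
6.\;& \vdash \Box G \Rightarrow F
      && \text{from 5 and the hypothesis } \vdash \Box F \Rightarrow F \\
7.\;& \vdash G
      && \text{from 6 and the backward direction of 1} \\
8.\;& \vdash \Box G
      && \text{condition 1, applied to 7} \\
9.\;& \vdash F
      && \text{from 6 and 8}
\end{align*}
```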
Let Con(T) be the sentence $\neg\Box(F_0 \wedge \neg F_0)$ for some arbitrarily chosen sentence $F_0$.
Corollary 5. If T is a $G_2$-theory then Con(T) is not provable in T.
Proof: Apply Löb’s theorem with F being $F_0 \wedge \neg F_0$. ⊳
Corollary 5 is called the second incompleteness theorem. Although
this will not be proved here, PA is a G2 theory, and so Con(PA) is not
provable in PA. Con(PA) is provable in set theory; in fact it is provable in weaker theories, and this is a topic of interest in mathematical
logic. A proof that a theory is a G2 theory involves a fair amount of
labor; [HajPud], [Monk2], and [Smorynski] are among various references
containing treatments.
11. ZFC.
The language of set theory is ∈, = . The axioms of equality are
axioms of set theory. The other non-logical axioms are as follows; some
make use of abbreviations which will be defined later, and are familiar
from informal set theory.
Extensionality:
∀w(w ∈ x ⇔ w ∈ y) ⇒ x = y
Pairing:
∃x∀w(w ∈ x ⇔ w = u ∨ w = v)
Union:
∃x∀w(w ∈ x ⇔ ∃u(u ∈ y ∧ w ∈ u))
Power Set:
∃x∀w(w ∈ x ⇔ w ⊆ y)
Separation or subset: For any formula F where x does not occur free,
∃x∀w(w ∈ x ⇔ w ∈ y ∧ F )
Replacement: For any formula F ,
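$\forall x \forall y \forall z(F \wedge F_{z/y} \Rightarrow y = z) \Rightarrow \forall u \exists v \forall y(y \in v \Leftrightarrow \exists x(x \in u \wedge F))$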
Infinity:
∃x(∃w(w ∈ x ∧ IsEmpt(w)) ∧ ∀w(w ∈ x ⇒ ∃v(v ∈ x ∧ IsSuc(w, v))))
Foundation or regularity:
∃w(w ∈ x) ⇒ ∃w(w ∈ x ∧ ∀u(u ∈ w ⇒ ¬u ∈ x))
Choice:
∀w(w ∈ x ⇒ ∃v(v ∈ w)) ⇒ ∃f (IsChoiceFunc(f, x))
This set of axioms is known as ZFC, “Zermelo-Fraenkel with
Choice”. Fraenkel added replacement to Zermelo’s original set, and
choice is considered separately because of philosophical issues; in particular ZF is the axioms, with choice omitted. ZFC has been considered
the “official” axiom system for mathematics since the 1930’s and even
earlier.
The extensionality axiom defines how ∈ and = are related; the
converse implication follows by the axioms of equality. The axioms
pairing, union, and power set describe how to build up new sets. The
separation axiom describes how to cut down a set, to those elements
having some property. The replacement axiom states that, if a formula
gives a partial function on the universe of all sets, then the image of a
set is a set. The axiom of infinity states that an infinite set exists.
Further discussion of each axiom will be given. Some definitions
will be given as well; these are useful in the formal development, and
include several concepts used in informal set theory. The treatment will
be as brief as possible; various references, including [Monk1] and [Jech2],
provide a more extensive treatment.
The pairing axiom states that given sets u and v, there is a set x
such that the 3-ary predicate ∀w(w ∈ x ⇔ w = u ∨ w = v) holds. Using
extensionality, it follows that this set is unique; the notation {u, v} is
used for it. In writing formulas, x = {u, v} is used as an abbreviation
for the 3-ary predicate. This is common throughout mathematics, and
is known as introduction by definition of function symbols.
The union axiom states that given a collection y of sets, there is a set
x, where w ∈ x if and only if w ∈ u for some set u in the collection. By
extensionality this set is unique; it is called the union of the collection
y, and the notation ∪y is used to denote it. It may also be denoted
$\bigcup_{u \in y} u$.
A set w is said to be a subset of the set x, written w ⊆ x, if
∀u(u ∈ w ⇒ u ∈ x). The power set axiom states that given a set y,
there is a set x whose members are the subsets of y. The power set is
unique; in this text the notation Pow(y) is used to denote the power set.
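For finite sets, the operations asserted to exist by the pairing, union, and power set axioms can be computed outright. The following Python sketch models hereditarily finite sets as nested `frozenset` values; this modelling choice and the function names are assumptions made for illustration, and the code says nothing about infinite sets.

```python
from itertools import combinations

EMPTY = frozenset()

def pair(u, v):
    """The set {u, v} of the pairing axiom."""
    return frozenset({u, v})

def big_union(y):
    """The union of the collection y: all w with w in u for some u in y."""
    return frozenset(w for u in y for w in u)

def is_subset(w, x):
    """w is a subset of x: every element of w is an element of x."""
    return all(u in x for u in w)

def power_set(y):
    """Pow(y): the set of all subsets of y."""
    items = list(y)
    return frozenset(frozenset(c)
                     for r in range(len(items) + 1)
                     for c in combinations(items, r))
```

For instance, `power_set(power_set(EMPTY))` has exactly two elements, the empty set and the set whose only element is the empty set.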
The set x stated to exist in the axiom of separation is unique, and
x = {w ∈ y : F } may be written. The reader might wonder why the
axiom is not simply ∃x∀w(w ∈ x ⇔ F). This axiom is incorrect: taking
F to be ¬w ∈ w yields Russell’s paradox. There
is an axiom system where it is correct, called Bernays-Gödel set theory
(a treatment may be found in [Jech2]). This treats proper classes, to
be defined in the next section, in a different manner than ZFC; but is
essentially equivalent to ZFC for practical purposes.
The hypothesis of the replacement axiom requires that for any x
there can be at most one y; degenerate cases where x or y does not
occur free in F are allowed. Given a set u of x’s, the corresponding y’s
can be collected into a set. Notation such as v = F [u] is introduced by
some authors.
The notation w ∉ x is used for ¬w ∈ x. A set x is said to be empty
if ∀w(w ∉ x). That there is a set follows by predicate logic; ∃x(x = x)
is provable. By separation and extensionality there is a unique empty
set; ∅ is used to denote it. The notation IsEmpt(x) in the axiom of
infinity can be replaced by x = ∅, and will no longer be used. Likewise,
IsSuc(w, v) will no longer be used; it can be replaced by v = w ∪ {w}.
This axiom implies the existence of an infinite set; but this requires a
fair amount of formal development and further discussion is deferred to
section 13.
By separation the set {w : w ∈ x ∧ w ∈ y} exists; it is called the
intersection of x and y, and denoted x ∩ y. The requirement on w in the
axiom of foundation is thus w ∩ x = ∅; w and x are said to be disjoint
in this case. If w ∈ x is thought of as implying that w is simpler than
x, the axiom of foundation states that a nonempty set x has an element
which is as simple as possible among the elements of x. Such an element
will be called ∈-minimal.
A set x is an ordered pair if and only if ∃u∃v(x = {{u}, {u, v}}); u
is the first component and v is the second. The notation ⟨u, v⟩ is used
for an ordered pair. Nested use of defined functions is handled by introducing existentially quantified variables. For example x = {{u}, {u, v}}
can be written as ∃s∃t(s = {u} ∧ t = {u, v} ∧ x = {s, t}).
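The definition of an ordered pair as {{u}, {u, v}} can be checked on finite examples. The sketch below again models sets as Python `frozenset` values (an assumption for illustration) and shows that both components can be recovered from the pair, including the degenerate case u = v where the pair collapses to {{u}}.

```python
def ordered_pair(u, v):
    """The set {{u}, {u, v}}."""
    return frozenset({frozenset({u}), frozenset({u, v})})

def first(p):
    """The first component is the one element common to all members of p."""
    common = frozenset.intersection(*p)
    return next(iter(common))

def second(p):
    """The second component is the other element of the union of p,
    or the first component again when the pair is of the form <u, u>."""
    rest = frozenset.union(*p) - frozenset.intersection(*p)
    return next(iter(rest)) if rest else first(p)
```

The point of the construction is that `ordered_pair(u, v)` and `ordered_pair(v, u)` differ whenever u and v differ, unlike the unordered pair {u, v}.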
A relation is a set of ordered pairs. The domain of a relation is
the set of its first components (a more formal definition is left to the
reader), and the range is the set of its second components. A function
is a relation f which is single-valued in the second component, i.e.,
∀x∀y∀z(⟨x, y⟩ ∈ f ∧ ⟨x, z⟩ ∈ f ⇒ y = z). A choice function on a set x of
nonempty sets is a function f whose domain is x, such that if ⟨u, v⟩ ∈ f
then v ∈ u. The terminology “f is a choice function on x” will be used,
rather than IsChoiceFunc(f, x) as written in the statement of the axiom
of choice. The axiom of choice states that a system of choices can be
made, of elements from the sets of a collection of nonempty sets.
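The definitions of relation, function, and choice function can be tested on small finite examples. In this sketch Python tuples stand in for ordered pairs and the sets being chosen from are finite sets of integers; both are simplifying assumptions. For such families a choice function can be written down explicitly (take the least element), so no appeal to the axiom of choice is involved.

```python
def domain(rel):
    """The set of first components of a relation."""
    return frozenset(a for a, _ in rel)

def is_function(rel):
    """Single-valued in the second component."""
    values = {}
    for a, b in rel:
        if a in values and values[a] != b:
            return False
        values[a] = b
    return True

def is_choice_function(f, x):
    """f is a function with domain x, and <u, v> in f implies v in u."""
    return is_function(f) and domain(f) == x and all(v in u for u, v in f)

def least_element_choice(x):
    """An explicit choice function for a family of nonempty sets of integers."""
    return {(u, min(u)) for u in x}
```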
The pairing axiom is actually redundant; it is included for historical
reasons, and because it is so fundamental. To prove it, first use the power
set axiom to prove that Pow(Pow(∅)) exists and has two elements. Then
use the replacement axiom with the formula (x = ∅ ∧ y = u) ∨ (x = {∅} ∧ y = v).
Likewise the separation axiom is redundant. Given F as in the
separation axiom, apply replacement with the formula F ∧ y = x (with
variables renamed as necessary).
Some authors include an axiom stating the existence of the empty
set; as already seen this is redundant.
ZFC is accepted as the axiom system for mathematics on the basis
of experience. It is possible to give arguments that the axioms are true
facts about sets; see [Shoenfield2].
Basic facts of set theory may be proved using ZFC; indeed, this is
evidence that ZFC is adequate. Outlines of such proofs will be given as
necessary in later sections. Here the fact mentioned in section 8, that
∪x is the least upper bound of the elements of x in the subset order,
will be shown.
Let y = ∪x (which recall equals {v : ∃w(w ∈ x ∧ v ∈ w)}). For
w ∈ x, if v ∈ w then v ∈ y by definition of y; thus w ⊆ y. This shows
that y is an upper bound. If z is any upper bound, and v ∈ y, then by
definition v ∈ w for some w ∈ x. By the assumption that z is an upper
bound, w ⊆ z, and so v ∈ z. Thus, y ⊆ z has been shown, and y is the
least upper bound.
The greatest lower bound also exists, namely {v : ∀w(w ∈ x ⇒ v ∈
w)}, provided x is nonempty. This is a set, because it equals {v ∈ $w_0$ :
∀w(w ∈ x ⇒ v ∈ w)} where $w_0$ is any member of x. It is denoted ∩x.
Letting y = ∩x, it may be seen that y is a lower bound, and for any
lower bound z, z ⊆ y.
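The bound properties of ∪x and ∩x can be observed on a finite family. The sketch below uses Python `frozenset` values as stand-ins for sets (an assumption for illustration); `big_intersection` follows the text in carving the result out of one fixed member of x, mirroring the use of separation.

```python
def big_union(x):
    """All v with v in w for some w in x."""
    return frozenset(v for w in x for v in w)

def big_intersection(x):
    """All v in a fixed member w0 of x with v in every w in x; x must be nonempty."""
    members = list(x)
    if not members:
        raise ValueError("the intersection of the empty family is not a set")
    w0 = members[0]
    return frozenset(v for v in w0 if all(v in w for w in members))
```

Python's `<=` on frozensets is the subset relation, so statements such as "every member of x is a subset of ∪x" can be checked directly.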
While ZFC has proved adequate for contemporary mathematics, as
will be seen it suffers from the deficiency that basic questions of mathematics are independent of it. To settle independent questions ZFC must
be enlarged. In current mathematics there is intensive research underway, as to how this should be done. Some references are: [Bagaria],
[Dowd2], [Foreman], [Friedman], [Koellner1], [Steel1]. Of course, enlarging ZFC would settle some questions; but as observed in section 10,
others would remain independent.
12. Proper classes.
If x ∈ x then {x} violates the axiom of foundation; thus, x ∉ x is
a theorem of ZFC. The universe of discourse of set theory, that is, the
collection of all sets, is denoted by the symbol V . This cannot be a set,
else V ∈ V .
Mathematics has dealt with this situation by considering V to be
some sort of collection which is not a set, so that special reasoning must
be applied to such collections. Indeed, Cantor realized the need for such
caution in his later work. In current common usage, such collections are
called proper classes.
A proper class is a “large subset of V ”. Suppose F is a formula.
It might be provable in ZFC that {x : F } is a set. On the other hand
it might be provable that it is not; indeed, it was just proved that
{x : x = x}, which is V , is not a set. Other proper classes are commonly
encountered in the development of set theory.
From the point of view of ZFC, a proper class is {x : F }, which is
not a set. From a more general perspective these are only very few of
the proper classes; however this perspective must remain intuitive, at
least if arguments are to be carried out in ZFC.
Note that a set is a “small subset of V ”, because the elements
of a set are themselves sets, since in set theory every object is a set.
This sometimes seems unreasonable to beginning students; but it is a
fundamental fact of mathematics that taking this approach yields an
axiom system for all of mathematics.
Since V is not a set, the application of formal logic to set theory
encounters complications. This concerned some set theorists in the early
stages of discussion, for example Skolem, but modern set theory accepts
them as a fact of life. In particular, if ZFC is consistent then it has a
countable model M , which is a set; and “uncountable” sets exist in M
only as objects which satisfy the relevant formula, and are not “actually”
uncountable.
V is the domain of the “natural” model of ZFC. The logical complications arise because this is not a set. The domains of mathematics
are all sets, so the problem generally arises only in set theory. Other
proper classes may be models of set theory, but in a sense which must
be specified. An example is given in section 19.
The term “class” may be used for {x : F }, even if it will be shown
later to be a set. If F is a formula with free variable x defining a
class C, a convenient notational device is to write x ∈ C instead of the
subformula F , in a formula. Also, the term “class” is sometimes used,
rather than proper class; the context should clarify the usage.
13. Ordinals and cardinals.
The theory of ordinal and cardinal numbers has been a topic of set
theory of fundamental importance since the earliest days of set theory,
Cantor’s work in the 1870’s. The modern treatment involves some technical concepts, and even in an informal discussion some discussion of
these must be given.
In developing basic set theory, facts must be proved in a certain
order; for example there is yet no definition of the non-negative integers,
so the notion of an infinite sequence $x_i$ for i ∈ N cannot yet be used.
The ordinal numbers must be defined first.
A set x is called transitive if ∈ satisfies a version of the transitivity
law, namely, v ∈ w ∧ w ∈ x ⇒ v ∈ x. Informally speaking, x is closed
under the iterated operation of taking elements. Although the notion
of a transitive set is a technical one, it has turned out to be quite useful
in set theory.
A transitive set x is called an ordinal if in addition the trichotomy
law holds, that is, if v ∈ x ∧ w ∈ x ⇒ (v ∈ w ∨ v = w ∨ w ∈ v). Greek
letters α, β, γ, δ will be used to denote ordinals, as is commonly done in
set theory.
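For hereditarily finite sets the two defining properties of an ordinal can be checked mechanically. The following Python sketch models sets as nested `frozenset` values; that representation and the function names are assumptions made for illustration.

```python
def is_transitive(x):
    """v in w and w in x implies v in x."""
    return all(v in x for w in x for v in w)

def is_ordinal(x):
    """Transitive, and any two elements are related by membership or equal."""
    return is_transitive(x) and all(
        v in w or v == w or w in v for v in x for w in x)

def successor(a):
    """The set a ∪ {a}."""
    return a | frozenset({a})
```

A set can be transitive without being an ordinal: with 0 = ∅ and 1 = {0}, the set {0, 1, {1}} is transitive, but 0 and {1} are not related by membership, so trichotomy fails.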
Lemma 1.
a. If x ⊆ α is transitive then x is an ordinal.
b. If x ∈ α then x is an ordinal.
c. If β ⊂ α then β ∈ α.
d. Either α ⊆ β or β ⊆ α.
Proof: Part a follows because if v, w ∈ x then v, w ∈ α, so they are
related by ∈. For part b, x ⊆ α because α is transitive. If w ∈ x and
v ∈ w then v, x ∈ α, so v ∈ x, v = x, or x ∈ v; but the latter two
possibilities contradict foundation. For part c, suppose γ is a ∈-minimal
element of α − β. If δ ∈ γ then δ ∈ α, so δ ∈ β else γ is not ∈-minimal.
If δ ∈ β then δ ∈ γ, since δ = γ or γ ∈ δ both imply γ ∈ β. Thus, β and
γ have the same elements and so are the same set (by extensionality),
and β ∈ α as was to be shown. For part d, α ∩ β is readily verified to
satisfy the defining properties of an ordinal. If α ∩ β = α then α ⊆ β,
and if α ∩ β = β then β ⊆ α; in the remaining case by part c α ∩ β ∈ α
and α ∩ β ∈ β, so α ∩ β ∈ α ∩ β, a contradiction. ⊳
The collection of ordinals is denoted Ord. If Ord were a set then
by the lemma it would be an ordinal, contradicting Ord ∉ Ord. Thus,
Ord is a proper class. The notation α < β is used for α ∈ β. It follows
from the lemma that < satisfies the axioms for the strict part of a linear
order on Ord.
Lemma 2.
a. If x is a transitive set of ordinals then x is an ordinal.
b. If x is a set of ordinals then ∪x is an ordinal.
Proof: For part a, if α, β ∈ x then trichotomy holds because α, β
are ordinals. For part b, suppose α ∈ ∪x, say α ∈ β where β ∈ x; and
γ ∈ α. Then γ ∈ β, so γ ∈ ∪x. Thus, ∪x is transitive, and it is a set of
ordinals, so by part a it is an ordinal. ⊳
Lemma 3.
a. ∅ is an ordinal, denoted 0. 0 ≤ α for any α.
b. α ∪ {α} is an ordinal, called the successor of α and denoted α + 1.
If α ≤ β ≤ α + 1 then either β = α or β = α + 1.
Proof: For part a, ∅ satisfies the requirements for an ordinal vacuously; and ∅ ⊆ α. For part b, the requirements are readily verified, and
α ⊆ β ⊆ α ∪ {α}. ⊳
An ordinal α is called a successor ordinal if there is a β < α such
that α = β + 1. An ordinal α is called a limit ordinal if it is not 0, and
for all β < α, β + 1 < α. If α is not 0 or a successor ordinal then α is a
limit ordinal, since if β < α then β + 1 cannot equal α, so must be less
than α.
The axiom of infinity states that there is a set x with the property
(∗) ∅ ∈ x ∧ ∀w(w ∈ x ⇒ w ∪ {w} ∈ x).
The subsets of x having property (∗) form a nonempty set y, and ∩y is
the smallest set having property (∗). Let ω denote this set.
Theorem 4. ω is the smallest limit ordinal.
Proof: Let S = {α ∈ ω : α ∈ Ord}. S is readily seen to have
property (∗), whence ω ⊆ S; since clearly S ⊆ ω, S = ω, and ω is a
set of ordinals. Let T = {α ∈ ω : α ⊆ ω}. T is readily seen to have
property (∗), so ω ⊆ T, so α ∈ ω ⇒ α ⊆ ω. Thus, ω is a transitive set of
ordinals, whence it is an ordinal. Since ω has property (∗) it is a limit
ordinal. Any limit ordinal β has property (∗), so ω ⊆ β. ⊳
The elements of ω are the ordinals 0+1+· · ·+1, where 0 denotes the
empty set and there are n 1’s added. That is, ω is a copy of the integers,
with n+1 the successor function. It is easily seen that n = {0, . . . , n−1}.
As promised in section 5, in set theory there is no need to define N_n,
since it is the same thing as n.
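The identification of n with {0, . . . , n−1} can be checked mechanically for small n. The following is a minimal Python sketch, not part of the original text; it models hereditarily finite sets as frozensets, and the names successor, von_neumann, is_transitive, has_trichotomy, and is_ordinal are chosen here purely for illustration.

```python
def successor(a: frozenset) -> frozenset:
    """Return a ∪ {a}, the successor of a."""
    return a | frozenset([a])


def von_neumann(n: int) -> frozenset:
    """Build the set 0+1+...+1 (n ones), starting from the empty set."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


def is_transitive(x: frozenset) -> bool:
    """Every element of x is also a subset of x."""
    return all(w <= x for w in x)


def has_trichotomy(x: frozenset) -> bool:
    """Any two elements v, w of x satisfy v ∈ w, v = w, or w ∈ v."""
    return all(v in w or v == w or w in v for v in x for w in x)


def is_ordinal(x: frozenset) -> bool:
    return is_transitive(x) and has_trichotomy(x)


five = von_neumann(5)
assert is_ordinal(five)
# n = {0, ..., n-1}: the elements of 5 are exactly 0, 1, 2, 3, 4.
assert five == frozenset(von_neumann(k) for k in range(5))
# {1} is a set of ordinals but is not transitive, hence not an ordinal.
assert not is_ordinal(frozenset([von_neumann(1)]))
```

Only finite sets can be represented this way, so the sketch illustrates the elements of ω but not ω itself.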
The ordinals are in many ways a generalization of the integers.
For example, there is an induction principle for ordinals, called transfinite induction, stated in the following theorem. To state it, a notational convenience will be adopted. If F is a formula let ∀αF denote
∀α(α ∈ Ord ⇒ F ), so that α stands for a variable ranging over ordinals.
Similarly ∃αF denotes ∃α(α ∈ Ord ∧ F ).
Theorem 5. ∀α(∀β(β < α ⇒ F_{β/α}) ⇒ F) ⇒ ∀αF.
Proof: By contraposition and replacing F by ¬F it suffices to prove
∃αF ⇒ ∃α(F ∧ ∀β(β < α ⇒ ¬F_{β/α})). To prove this, suppose F holds
at γ. Then {α ≤ γ : F} is a nonempty set of ordinals. Let α be an
∈-minimal element; then F ∧ ∀β(β < α ⇒ ¬F_{β/α}) holds. ⊳
Another fundamental tool making use of the ordinals is transfinite
recursion, where a function F from Ord to V is defined by a recursion
involving a function G from V to V . The use of the term “function”
needs to be clarified; G is a proper class given by a formula, which has
free variables x and y. For each value for x there is a unique value for
y such that G holds (i.e., this has been proved in ZFC). Likewise F is
a proper class, its formula is derived from that for G, and it may be
proved in ZFC that it is a function in the same sense.
F (α) = y if and only if there exists a function f with domain α such
that f (β) = G(f ↾ β) for all β < α, and y = G(f ). Using transfinite
induction it can be shown that for all α there is a unique f and y; details
are omitted.
A binary relation < on a set S is said to be well-founded if for
every nonempty subset T ⊆ S there is an element x ∈ T such that for any y ∈ T,
y ≮ x. Such an element x of T is said to be a minimal element; <
is well-founded if and only if every nonempty subset contains a minimal element.
An infinite descending chain in S is a function f : ω ↦ S such that
f(i + 1) < f(i) for all i ∈ ω.
Theorem 6. A binary relation < on S is well-founded if and only if
there is no infinite descending chain.
Proof: Suppose < is well-founded, and f : ω ↦ S. Then f[ω] ⊆ S,
so there is a minimal element x ∈ f[ω], say x = f(i). In particular,
f(i + 1) ≮ f(i), so f is not an infinite descending chain. Suppose < is
not well-founded; then there is a nonempty subset T ⊆ S such that ∀x ∈ T ∃y ∈
T (y < x). Using the axiom of choice there is a function g : T ↦ T such
that g(x) < x. Using recursion, with f(i + 1) = g(f(i)), there is a function f : ω ↦ T such that
f(i + 1) < f(i) for all i ∈ ω. ⊳
If S is a set then ∈ can be considered as a binary relation on S.
By the axiom of foundation this relation is well-founded, and so there
is no infinite descending chain x_i of elements of S, i.e., with x_{i+1} ∈ x_i
for all i ∈ ω. To show that there is no infinite descending chain at all,
it suffices to show that given any x_0 there is a transitive set S with
x_0 ⊆ S. In fact there is a smallest such S. Define sets S_i for i ∈ ω by
the recursion S_0 = x_0, S_{i+1} = ∪S_i; and let S = ∪_{i∈ω} S_i.
A binary relation < on a set S is said to be a well-order if it is the
strict part of a linear order, and is well-founded. For example, if α is an
ordinal then ∈ is a well-order on α.
Lemma 7. If f : α ↦ β is an order isomorphism then α = β and f
is the identity function, that is, f(γ) = γ for all γ < α.
Proof: If not there is a least γ < α such that f(γ) ≠ γ, say f(γ) = δ.
If δ < γ then f (δ) = δ, a contradiction. If δ > γ then f (ζ) = γ, for
some ζ, and ζ < γ must hold, whence f (ζ) = ζ, again a contradiction.
Thus, f (γ) = γ for all γ < α, whence β = {γ : γ < α}, whence β = α.
⊳
Theorem 8. If < is a well-order on S then there is a unique ordinal
α such that there is an order isomorphism f : α ↦ S; further the
isomorphism is unique.
Proof: Similarly to a definition in section 8, say that a subset T ⊆ S
is <-closed if x ∈ T ∧ y < x ⇒ y ∈ T. Let C be the class of functions
g : α ↦ S such that g is an order isomorphism from α to g[α] and
g[α] is <-closed. Suppose g_1 : α_1 ↦ S and g_2 : α_2 ↦ S are in C.
Then if β ∈ α_1 ∩ α_2 then g_1(β) = g_2(β). Suppose not, and let β be
the smallest counterexample; suppose without loss of generality that
g_1(β) = x < y = g_2(β) where x, y ∈ S. Since g_2[α_2] is <-closed,
g_2(γ) = x for some γ, and γ < β must hold. But then g_1(γ) = g_2(γ) = x,
a contradiction. Thus, g_1 ⊆ g_2 or g_2 ⊆ g_1. Since there are only a set
of initial segments of S, it follows using lemma 7 and replacement that
C is a set. Let f = ∪C. If g : α ↦ S is in C and β < α then g ↾ β
is in C; it follows that Dom(f) is an ordinal α. It is easy to check that
f is strictly order-preserving; to show that it is an order isomorphism
it suffices to show that it is surjective. Suppose not, and let x be least
such that x ∉ Ran(f). Then f : α ↦ S is in C, and if y < x then
y ∈ f[α]. Let f′ be the function with domain α + 1, which is the same
as f below α, and with f′(α) = x. It is easy to check that f′ is in
C. This yields a contradiction showing that x does not exist; and f is
surjective. Uniqueness of α and f follows using lemma 7. ⊳
The ordinal α is called the order type of the well-order. Thus, the
ordinals are a “system of representatives” for the well-orders. This was
a topic of concern in set theory, until a nice definition of the ordinals
was given (by John von Neumann). The “equivalence relation” that
two well-orders are order isomorphic, is a proper class, so one cannot
define the ordinals by taking a quotient. The modern definition of the
ordinals has a number of additional desirable properties, and proofs of
basic facts are simple.
Theorem 9. If S is a set then there is an ordinal α and a bijection
f : α ↦ S.
Proof: Let g be a choice function for Pow(S) − {∅}. By transfinite
recursion define f so that f(β) = g(S − f[β]). Since f is injective and
its range is a set, by replacement its domain is a set, in fact a transitive
set of ordinals, that is, an ordinal. ⊳
That is, “any set can be well-ordered”, a fact known as the well-ordering principle. Cantor believed this was true, but did not know how
to prove it. In fact, given the other axioms of set theory the well-ordering
principle is equivalent to the axiom of choice; a proof is omitted.
Note that α will in general depend on g, and the ordinal of the
well-ordering is not unique (theorem 8 only says that it is unique if S
already is equipped with some well-order). For example ω can be well-ordered in “natural” order; or by listing the even integers in natural
order, followed by the odd integers in natural order. This latter order
is denoted as ω + ω, or ω · 2.
Indeed, the following functions on the ordinals may be defined by
transfinite recursion.
- α + 0 = α, α + (β + 1) = (α + β) + 1,
α + β = ∪_{γ<β} (α + γ) when β is a limit ordinal.
- α · 0 = 0, α · (β + 1) = (α · β) + α,
α · β = ∪_{γ<β} (α · γ) when β is a limit ordinal.
Basic facts about “ordinal arithmetic” include the following. Proofs
may be found in introductory treatments of set theory, in [Jech2] or
[Monk1] for example.
- The type of the order obtained by appending an order of type β to
an order of type α is α + β.
- The type of the order obtained by appending β copies of an
order of type α, one after another, is α · β.
- On ω the operations + and · are the usual integer operations.
- α + (β + γ) = (α + β) + γ and α · (β · γ) = (α · β) · γ.
- + is not commutative; for example 1 + ω = ω ≠ ω + 1.
- · is not commutative; for example 2 · ω = ω ≠ ω · 2.
- α · (β + γ) = α · β + α · γ.
- If δ > 0 then for any α there are unique β and γ with γ < δ such that α =
δ · β + γ.
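Some of these facts can be tried out on a small fragment of the ordinals. The following Python sketch is an illustration only and makes a simplifying assumption not found in the text: it represents just the ordinals below ω · ω, writing ω · a + b as a pair (a, b) of natural numbers. The names Small, add, and mul are hypothetical, and mul refuses products that leave this fragment.

```python
from typing import NamedTuple


class Small(NamedTuple):
    """The ordinal ω·a + b, with a and b natural numbers."""
    a: int
    b: int


def add(x: Small, y: Small) -> Small:
    """x + y: an order of type y appended to an order of type x."""
    if y.a > 0:
        # The finite tail of x is absorbed by the ω-part of y.
        return Small(x.a + y.a, y.b)
    return Small(x.a, x.b + y.b)


def mul(x: Small, y: Small) -> Small:
    """x · y: y copies of x, one after another (result must stay below ω·ω)."""
    zero = Small(0, 0)
    if x == zero or y == zero:
        return zero
    if x.a == 0:
        # For finite n > 0, n·ω = ω, so n·(ω·c + d) = ω·c + n·d.
        return Small(y.a, x.b * y.b)
    if y.a > 0:
        raise ValueError("product is at least ω·ω, outside this representation")
    # (ω·a + b)·d = ω·(a·d) + b for finite d > 0.
    return Small(x.a * y.b, x.b)


ONE, TWO, OMEGA = Small(0, 1), Small(0, 2), Small(1, 0)

assert add(ONE, OMEGA) == OMEGA                   # 1 + ω = ω
assert add(OMEGA, ONE) == Small(1, 1)             # ω + 1 is larger than ω
assert mul(TWO, OMEGA) == OMEGA                   # 2 · ω = ω
assert mul(OMEGA, TWO) == add(OMEGA, OMEGA)       # ω · 2 = ω + ω
# Distribution on the right fails: (1 + 1) · ω = ω, but 1 · ω + 1 · ω = ω · 2.
assert mul(add(ONE, ONE), OMEGA) != add(mul(ONE, OMEGA), mul(ONE, OMEGA))
```

Since Small is a tuple, Python's built-in tuple comparison agrees with the ordering < on these ordinals.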
The cardinal numbers play a role regarding the size of sets, similar
to the role played by the ordinals regarding the well-ordering of sets.
Two sets are considered to be the “same size” if there is a 1-1 correspondence between them. An ordinal is said to be a cardinal if and only
if it is not in 1-1 correspondence with any smaller ordinal. The following
theorem is known as the Bernstein-Cantor-Schröder theorem.
Theorem 10. Suppose injective functions f : S ↦ T and g : T ↦ S
are given; then there is a bijection from S to T.
Proof: By identifying T with its image under g, we may consider T
to be a subset of S; we thus have T ⊆ S and f : S ↦ T injective. Define
S_0 = S, S_{i+1} = f[S_i]; and T_0 = T, T_{i+1} = f[T_i]. Since f is injective,
f[S_i − T_i] = f[S_i] − f[T_i]. It follows that each point either belongs to
some S_i − T_i, some T_i − S_{i+1}, or every S_i. We may map points x in the
first category to f(x), and the remaining points to themselves. ⊳
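The map constructed in the proof can be computed in a concrete case. The sketch below is illustrative only and rests on assumptions chosen here: S is the set of natural numbers, T is S − {0}, and f(n) = n + 2, which is an injection from S into T. A point lies in some S_i − T_i exactly when pulling it back along f finitely often reaches a point of S − T; in this example the pull-back always terminates, which would not be true for points lying in every S_i. The names in_T, f_inverse, moves, and h are hypothetical.

```python
def in_T(x: int) -> bool:
    """T = S − {0}, where S is the natural numbers."""
    return x >= 1


def f(x: int) -> int:
    """An injection from S into T."""
    return x + 2


def f_inverse(y: int):
    """The unique preimage of y under f, or None if y is not in f[S]."""
    return y - 2 if y >= 2 else None


def moves(x: int) -> bool:
    """True if x belongs to some S_i − T_i."""
    y = x
    while y is not None:
        if not in_T(y):          # reached a point of S − T
            return True
        y = f_inverse(y)
    return False


def h(x: int) -> int:
    """Points of some S_i − T_i go to f(x); all other points stay fixed."""
    return f(x) if moves(x) else x


# h maps {0, ..., 19} injectively onto {1, ..., 20}: the value 0 is never
# taken, consistent with h being a bijection from S to T.
assert sorted(h(x) for x in range(20)) == list(range(1, 21))
```

Here the even numbers form the first category and are shifted by 2, while the odd numbers are left alone.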
It follows that an ordinal is a cardinal if and only if it has no
injection to a smaller ordinal, if and only if it has no surjection from a
smaller ordinal.
The class of cardinal numbers will be denoted Card; Greek letters
κ, λ, µ, ν are customarily used to denote cardinals. The order relation
< on Ord induces an order relation on Card; given two cardinals κ and
λ, either κ < λ, κ = λ, or κ > λ, and κ ≤ λ if and only if there is an
injection from κ to λ. If κ and λ are distinct cardinals there is no 1-1
correspondence between them. Every ordinal is in 1-1 correspondence
with exactly one cardinal, namely the least such ordinal.
By the well-ordering principle every set S is in 1-1 correspondence
with some ordinal, and hence in 1-1 correspondence with exactly one
cardinal κ; κ is called the cardinality of S, and denoted |S|. As noted
in section 5 for −, the use of |r| for the absolute value of a real number,
and |S| for the cardinality of a set, ordinarily causes no confusion.
The following theorem was already noted in section 5; it is called
the pigeonhole principle.
Theorem 11. If m and n are integers with m > n then there is no
injection from m to n.
Proof: The proof is by induction on m. If m = 0 the theorem is
vacuously true. If f : m + 1 ↦ n is an injection then there is a bijection g : n ↦ n
such that if f′ = g ◦ f then f′(m) = n − 1. Then f′ ↾ m contradicts the
induction hypothesis. To obtain g, if f(m) ≠ n − 1 let g(n − 1) = f(m)
and g(f (m)) = n − 1; in other cases let g(i) = i. ⊳
Theorem 12. If x is a set of cardinals then ∪x is a cardinal.
Proof: By lemma 2 ∪x is an ordinal. Let κ = | ∪ x|. If κ ∈ ∪x then
κ ∈ λ for some λ ∈ x. Then λ ⊆ ∪x, so λ ≤ | ∪ x| = κ, a contradiction.
Thus, κ = ∪x. ⊳
By theorem 11 the integers are distinct cardinals. By theorem 12
ω is a cardinal. As noted in section 5, a set is said to be finite if its
cardinality is an integer, else infinite. A set is said to be countably
infinite if its cardinality is ω.
Theorem 13. If κ is an infinite cardinal then κ is a limit ordinal.
Proof: If α + 1 is an infinite successor ordinal then ω ⊆ α, and an
injective map from α + 1 to α can be constructed. Namely, for i ∈ ω
map i to i + 1, map α to 0, and map other elements to themselves. ⊳
Theorem 14. For any set x, |Pow(x)| > |x|.
Proof: Let f : x ↦ Pow(x), and let y = {w ∈ x : w ∉ f(w)}.
Suppose f(w) = y; then w ∈ f(w) if and only if w ∈ y if and only if
w ∉ f(w). Hence w does not exist, so f is not a surjection. Since f was
arbitrary, there is no surjection from x to Pow(x). ⊳
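For a finite set x the diagonal argument can be verified exhaustively. The following Python sketch is an illustration under assumptions made here: x = {0, 1, 2}, a function f : x ↦ Pow(x) is stored as a dict, and the names powerset, diagonal, and missed_every_time are hypothetical. For each of the 8^3 functions it checks that the diagonal set y = {w ∈ x : w ∉ f(w)} is not a value of f.

```python
from itertools import combinations, product


def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def diagonal(x, f):
    """The set {w ∈ x : w ∉ f(w)}, for f a dict from x to subsets of x."""
    return frozenset(w for w in x if w not in f[w])


x = [0, 1, 2]
subsets = powerset(x)

# Enumerate every function f from x to Pow(x) as a tuple of images.
missed_every_time = all(
    diagonal(x, dict(zip(x, images))) not in images
    for images in product(subsets, repeat=len(x))
)
assert missed_every_time
```

The check is finite and so proves nothing about infinite sets; it only shows the mechanism of the proof at work.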
This is another of the many basic theorems of set theory due to
Cantor. By theorems 12 and 14 Card is a proper class; if it were a set
there would be a cardinal larger than any cardinal in the set. It also
follows that for any infinite cardinal κ there is a next larger cardinal,
which will be denoted κ^+.
By transfinite recursion the function ℵ from Ord to Card can be
defined, where writing ℵ_α for the value at α,
ℵ_0 = ω,
ℵ_{α+1} = ℵ_α^+, and
ℵ_α = ∪_{β<α} ℵ_β when α is a limit ordinal.
It is readily verified that every infinite cardinal is ℵ_α for some α.
For the next theorem a well-order on the ordered pairs of ordinals
will be defined. This well-order has a variety of other uses. Note that
α ∪ β is the larger of α and β. Letting γ_i = α_i ∪ β_i for i = 1, 2, say that
⟨α_1, β_1⟩ <_OP ⟨α_2, β_2⟩ if and only if
γ_1 < γ_2 or
γ_1 = γ_2 ∧ α_1 < α_2 or
γ_1 = γ_2 ∧ α_1 = α_2 ∧ β_1 < β_2.
Lemma 15. <_OP satisfies the axioms for a well-order.
Proof: To simplify the notation, for an integer i let P_i = ⟨α_i, β_i⟩,
and let γ_i = α_i ∪ β_i. Suppose P_1 <_OP P_2 <_OP P_3. If γ_1 < γ_2 or γ_2 < γ_3
then P_1 <_OP P_3; otherwise if α_1 < α_2 or α_2 < α_3 then P_1 <_OP P_3;
otherwise β_1 < β_2 and β_2 < β_3 so P_1 <_OP P_3. Thus, <_OP is transitive.
Clearly it is irreflexive. If P_1 ≠ P_2 then either γ_1 < γ_2 or γ_2 < γ_1, or
α_1 < α_2 or α_2 < α_1, or β_1 < β_2 or β_2 < β_1; thus, <_OP is a linear order.
If ⟨P_i⟩ is a nonincreasing infinite sequence then γ_i must
eventually become constant, then α_i must, then β_i must. Hence, <_OP
is a well-order. ⊳
Let L_OP(α, β) denote {⟨α′, β′⟩ : ⟨α′, β′⟩ <_OP ⟨α, β⟩}. Let Γ(α, β)
be the order type of L_OP(α, β). Γ is a proper class order isomorphism
from Ord × Ord (the proper class of ordered pairs of ordinals) ordered
by <_OP, to Ord.
Theorem 16. For an infinite cardinal κ, Γ[κ × κ] = κ.
Proof: For an integer n, |Γ[n × n]| = n^2, and the claim follows for κ = ω. Suppose inductively that the claim holds for infinite cardinals λ < κ.
If ζ < κ for an infinite ordinal ζ, let λ = |ζ|; then |ζ × ζ| equals |λ × λ|
(because there is a bijection), equals |Γ[λ × λ]| (because Γ is a bijection),
equals λ (by the induction hypothesis); thus, |ζ × ζ| < κ. Suppose
⟨α, β⟩ ∈ κ × κ, and let γ = α ∪ β; then L_OP(α, β) ⊆ (γ + 1) × (γ + 1). It
follows that |Γ(α, β)| = |L_OP(α, β)| < κ, whence Γ(α, β) < κ. Thus,
Γ[κ × κ] ⊆ κ has been shown. Clearly, |Γ[κ × κ]| ≥ κ, and Γ[κ × κ] is an
ordinal, and Γ[κ × κ] = κ follows. ⊳
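On pairs of integers the order <_OP and the function Γ can be computed directly, which makes the base case of the proof concrete. The Python sketch below is illustrative only; the names op_key, gamma_table, and gamma_closed are hypothetical, and the closed form in gamma_closed is a formula worked out here for integer arguments rather than one stated in the text.

```python
def op_key(pair):
    """Sort key realizing <_OP: compare the larger entry, then the first, then the second."""
    a, b = pair
    return (max(a, b), a, b)


def gamma_table(n: int) -> dict:
    """Γ restricted to n × n: each pair mapped to its position in the <_OP order."""
    pairs = sorted(((a, b) for a in range(n) for b in range(n)), key=op_key)
    return {p: i for i, p in enumerate(pairs)}


def gamma_closed(a: int, b: int) -> int:
    """Γ(a, b) for integers: g*g pairs have larger entry below g = max(a, b)."""
    g = max(a, b)
    if a < g:                      # then b = g; predecessors at this level are (a', g), a' < a
        return g * g + a
    return g * g + g + b           # all (a', g) with a' < g, then (g, b') with b' < b


table = gamma_table(4)
# Γ[n × n] = n^2: the pairs below 3 in both coordinates receive exactly 0..8.
assert {table[(a, b)] for a in range(3) for b in range(3)} == set(range(9))
assert all(table[p] == gamma_closed(*p) for p in table)
```

Positions computed inside n × n agree with Γ because every <_OP-predecessor of a pair in n × n again lies in n × n.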
There are other ordinals α for which Γ[α × α] = α; see theorem
46.13.
Computing the cardinality of sets is frequently done in set theory.
For this purpose, and other purposes, κ + λ is defined as the cardinality
of the disjoint union of κ and λ, and κ · λ is the cardinality of κ × λ. It
is easily seen using theorem 16 that if κ and λ are nonzero and at least one is infinite then κ + λ = κ · λ = sup(κ, λ) = κ ∪ λ.
Again, the use of + and · for both the ordinal and cardinal operations
rarely causes confusion; however some authors use different symbols for
the ordinal operations.
As an example of computing cardinalities, if S has infinite cardinality κ then S^k (the set of ordered k-tuples) does also, for any k ≥ 1. The
set of finite sequences does also; there is an injection to ω × κ mapping
a sequence of length k to the pair ⟨k, α⟩ where α is the “code” (ordinal
rank in an enumeration of S^k) for the k-tuple.
Ordinal and cardinal numbers are used throughout mathematics,
including set theory. Additional properties will be given as they are
needed. Also, further details of various basic arguments, omitted so far,
will be given in section 17.
14. The real numbers (II).
Section 1 of [Miller] is titled, “What are the reals, anyway?” At
least five answers to the question of what a real number is, are commonly
encountered:
1. An element of R.
2. An element of the completion of Q.
3. A subset of N .
4. A function from N to {0, 1}.
5. A function from N to N .
Definitions 1 and 2 are equivalent, and either may be taken as
the “official” definition. After earlier work, both constructions were
published in 1872, the first by Dedekind and the second by Cantor.
Definitions 3 to 5 are used in specific circumstances as a matter of
convenience. As will be seen, they are “nearly equivalent” to the official
definition; further, in these cases additional structure may be imposed
on the collection of reals.
The main purpose of this section is to show that |R| = |Pow(ω)|.
Other facts of interest will also be shown.
R has already been given as the unique ordered field having the
least upper bound property. A brief description will be given of the
second construction; further details may be found in any of numerous
references, including chapter 17 of [Dowd1].
A distance function or metric function (often called simply a “metric”) on a set S is a binary function d such that the following hold.
1. d(x, y) ≥ 0.
2. d(x, y) = 0 if and only if x = y.
3. d(x, y) = d(y, x).
4. d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
The function |x − y| is readily verified to be a metric on R, and a fortiori
on Q.
A metric space is defined to be a set S equipped with a metric
function. The “open ball” B_{xε} in S is defined to be {y ∈ S : d(x, y) < ε}.
By an infinite sequence in S is meant a function f : ω ↦ S; such is
frequently written ⟨x_i : i ∈ ω⟩, or simply ⟨x_i⟩. A point x is said to be a
limit of the infinite sequence ⟨x_i⟩ if every open ball B_{xε} contains all but
finitely many points of the sequence. In this case, the sequence is said
to converge to x.
An infinite sequence converges to at most one point, since given
two distinct points x and y there are disjoint open balls B_{xε} and B_{yε}.
However there may be no point.
Certainly ⟨i⟩ does not converge. A
more relevant example is ⟨⌊√2 · i⌋/i : i > 0⟩ (the reader is assumed to be
familiar with the “greatest integer” function ⌊x⌋ on R). This converges
in R but not in Q.
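The terms of this sequence are rational numbers and can be computed exactly. The following Python sketch is an illustration, with the function name term chosen here; it uses the fact that ⌊√2 · i⌋ is the integer square root of 2i^2, so no floating point arithmetic is needed. Each term lies within 1/i below √2, which is why the terms crowd together even though no rational number is the limit.

```python
from fractions import Fraction
from math import isqrt


def term(i: int) -> Fraction:
    """The i-th term ⌊√2·i⌋ / i, for i > 0, as an exact rational."""
    return Fraction(isqrt(2 * i * i), i)


# Each term squares to less than 2, but adding 1/i overshoots 2.
assert all(term(i) ** 2 < 2 < (term(i) + Fraction(1, i)) ** 2 for i in range(1, 200))

# Consequently any two terms with index at least 100 differ by less than 1/100.
assert all(
    abs(term(i) - term(k)) < Fraction(1, 100)
    for i in range(100, 150)
    for k in range(100, 150)
)
```

For example term(5) is 7/5, since ⌊√50⌋ = 7.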
An infinite sequence ⟨x_i⟩ in a metric space is called a Cauchy sequence if for all ε > 0 there is an n such that d(x_i, x_j) < ε if i, j ≥ n. A
metric space is called complete if every Cauchy sequence converges to
some limit.
It is not difficult to see that R is complete. Let ⟨x_n⟩ be a Cauchy
sequence. The set of values taken on by the sequence is bounded above;
choose any ε > 0, choose an N so that if i, j ≥ N then |x_i − x_j| < ε,
and consider x_N + ε. This bounds above x_n for n ≥ N, so an upper
bound can be obtained by considering the maximum of this and the x_n
for n < N. A similar argument shows that the set of values is bounded
below. It is clear that sup{x_k : k ≥ n} exists for each n. Letting b_n
denote this value, it is clear that inf{b_n} exists. Let x denote this value;
the sequence x_n converges to x. Given ε, N ≤ M ≤ L may successively
be chosen so that |x − b_N| < ε/3, |b_N − x_M| < ε/3, and |x_M − x_n| < ε/3
for n ≥ L.
A function d satisfying the defining properties of a metric, with
2 replaced by d(x, x) = 0, is called a pseudo-metric. A pseudo-metric
space is a set equipped with a pseudo-metric. Given such, the relation
d(x, y) = 0 is an equivalence relation, and the distance d([x], [y]) between
two classes may be defined as d(x, y). This yields the “quotient” metric
space.
Suppose X is a metric space, with metric d; let X_1 be the set of
Cauchy sequences. If x = ⟨x_i⟩ and y = ⟨y_i⟩ are two such, using the
triangle inequality, ⟨d(x_i, y_i)⟩ is a Cauchy sequence in R. The limit
thus exists; define d_1(x, y) to be this limit. The function d_1 is a pseudo-metric. The quotient metric space is called the completion
of X. It may be shown that it is the essentially unique complete metric
space which contains X as a “dense” subspace. Since R is complete and
Q is a dense subspace of R, R is the completion of Q (i.e., there is an
isomorphism of metric spaces).
The use of the term “dense” above deserves clarification. A topological space is defined to be a set S, equipped with a family T of subsets of S
with the following properties.
- ∅, S ∈ T;
- if Q ⊆ T then ∪Q ∈ T;
- if S_1, S_2 ∈ T, then S_1 ∩ S_2 ∈ T.
T is called a topology on S, and its members are called open sets. Let
T_b be a set of subsets of S and let T = {∪Q : Q ⊆ T_b}. T is a topology
if and only if ∪T_b = S and for all U, V ∈ T_b and all x ∈ U ∩ V there is a W ∈ T_b with
x ∈ W ⊆ U ∩ V. Under these conditions T_b is called a base for T.
The open balls B_{xε} form a base for a topology on a metric space
S, called the metric topology. A set U ⊆ S is open if and only if for all
x ∈ U there is an open ball B_{xε} such that B_{xε} ⊆ U. A subset Q of a
topological space S is said to be dense if for every nonempty open set U, Q ∩ U ≠ ∅.
In the case of a metric space, it suffices that Q ∩ U ≠ ∅ for every open
ball U.
The last characterization of course does not require the introduction
of topological spaces. However there is an alternative characterization of
the topology on R, which shows that the foregoing definition of a dense
subset agrees with that given in section 8. If S is a linear order without
endpoints, let (x, y) denote the “open interval” {w : x < w < y}. These
form a base for a topology on S, called the order topology. R is both a
metric space and a linear order without endpoints, and the metric and
order topologies are the same.
In set theory the notation x^y is heavily overloaded. If x and y are
sets, x^y denotes {f : f : y ↦ x}. If κ and λ are cardinals, κ^λ denotes
|{f : f : λ ↦ κ}|. Which is intended can be confusing. Here 2^ω will be
used in the former sense, and 2^{ℵ_0} in the latter.
There is an obvious bijection from 2^y to Pow(y), mapping f to
{w ∈ y : f(w) = 1}. In particular, |Pow(ω)| = |2^ω| = 2^{ℵ_0}. Also, reals
as in definition 3 may be identified with reals as in definition 4. The
notation c (the cardinality of the continuum) is frequently used for 2^{ℵ_0}.
If t : n ↦ 2 is a finite string of 0’s and 1’s, let U_t denote {f ∈
2^ω : f ↾ n = t}. The sets U_t form the base for a topology on 2^ω. This
topological space is called Cantor space; the notation C will be used to
denote it.
A function f : S_1 ↦ S_2 between topological spaces is said to be
continuous if for any open set U ⊆ S_2, f^{−1}[U] is an open set. For
metric spaces, this is equivalent to the “ε-δ” characterization: given
x ∈ S_1 and ε > 0 there is a δ > 0 such that d(f(x), f(y)) < ε whenever
d(x, y) < δ. A homeomorphism is a bijection f such that both f and
f^{−1} are continuous. A homeomorphic embedding is an injection which
gives a homeomorphism to its range.
In R, let [a, b] denote the “closed interval” {r : a ≤ r ≤ b}.
Theorem 1. There is a homeomorphic embedding of C in [0, 1].
Proof: For f ∈ C let j(f) = Σ_{i∈ω} (2·f(i))/3^{i+1}; it is easy to show
that j(f) ∈ [0, 1]. Given distinct f_0 and f_1 let i be the first position
where they differ, and suppose without loss of generality that f_0(i) = 0
and f_1(i) = 1; it is easy to show that j(f_0) < j(f_1). Given an open
interval V in [0, 1] and an f with j(f) ∈ V, a sufficiently long prefix t
of f may be found so that j[U_t] ⊆ V;
this shows that j is continuous. To see that j is a homeomorphic
embedding it suffices to show that for any finite 0-1 sequence t there
is an open interval V in R such that j[C] ∩ V = j[U_t] (since this shows
that j is an “open” map to j[C], meaning it maps open sets to open
sets, and the open sets of j[C] are those of the “relative” or “subspace”
topology, namely the intersections of the subspace with the open sets
of the parent space). The left endpoint r of j[U_t] is contained in some
open interval V ⊆ R, such that if s < r and s ∈ V then s ∉ j[C]; and
a similar fact holds for the right endpoint. (It can be concluded that j
is a homeomorphic embedding by a general fact of topology, since C is
compact and [0, 1] is Hausdorff.) ⊳
The image of C under the embedding given above is called the
“Cantor”, or “Cantor middle thirds”, set. It can be described as obtained from [0, 1] by successively removing the middle third of remaining closed intervals, for infinitely many stages (the intervals removed are
open). This set has a number of properties which make it an interesting
example in topology; a few such will be given below. In the following,
C will be used ambiguously to denote the image, a subset of [0, 1].
A subset K of a topological space is said to be closed if K^c is open;
equivalently if given x ∉ K there is an open set U such that x ∈ U and
K ∩ U = ∅. C is a closed subset of [0, 1]; this may be seen since it equals
[0, 1] − ∪_{i∈ω} V_i where V_i is open. It is easy to verify that a closed subset
of a complete space is complete; hence C is complete. It follows that C,
given as 2^ω with a topology, is “completely metrizable”, meaning that
a metric can be defined, whose metric topology is the given topology,
and the resulting metric space is complete.
The closure of a subset W of a topological space S is the set of points
x ∈ S, such that any open set containing x has nonempty intersection
with W . This is the smallest closed set containing W , and if W is closed
the closure of W is W . The interior of a subset W of S is the set of
x ∈ W such that there is an open set U such that x ∈ U ⊆ W . This is
the largest open set contained in W , and if W is open the interior of W
is W .
A subset of S is said to be nowhere dense if its closure has empty
interior. It is not difficult to see that C is nowhere dense in [0, 1].
A topological space S is said to be totally disconnected if for any
two distinct points x and y there are disjoint open sets U and V , such
that x ∈ U, y ∈ V, and S = U ∪ V. C is readily seen to be totally
disconnected (consider the space on 2^ω).
Theorem 2. There is a continuous surjection from C to [0, 1].
Proof: For f ∈ C let e(f) = Σ_{i∈ω} f(i)/2^{i+1}. If r ∈ [0, 1] let q_i
be the largest rational number of the form m/2^{i+1} such that q_i ≤ r.
It is easily seen that ⟨q_i⟩ yields an f such that e(f) = r. If V is an
open interval in [0, 1] and r ∈ V, let f be such that e(f) = r. A
sufficiently long prefix t of f can be found, such that e[U_t] ⊆ V. Thus,
e is continuous. ⊳
The surjection of the proof (“binary notation”) is almost bijective;
reals r ∈ (0, 1) of the form m/2^{i+1} have two representations, t1 followed
by all 0’s, and t0 followed by all 1’s; all other r have a single representation.
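The contrast between the binary map e and the ternary map j of theorem 1 can be seen by evaluating both exactly on sequences that are eventually constant. The Python sketch below is illustrative only and uses a representation assumed here: a sequence is given as a finite prefix of bits followed by a constant tail bit, and the infinite tail is summed as a geometric series. The function names e and j follow the text; the argument names prefix and tail are hypothetical.

```python
from fractions import Fraction


def e(prefix, tail: int) -> Fraction:
    """Σ f(i)/2^(i+1), where f is `prefix` followed by the bit `tail` forever."""
    n = len(prefix)
    head = sum(Fraction(bit, 2 ** (i + 1)) for i, bit in enumerate(prefix))
    return head + tail * Fraction(1, 2 ** n)     # Σ_{i≥n} 1/2^(i+1) = 1/2^n


def j(prefix, tail: int) -> Fraction:
    """Σ 2·f(i)/3^(i+1), where f is `prefix` followed by the bit `tail` forever."""
    n = len(prefix)
    head = sum(Fraction(2 * bit, 3 ** (i + 1)) for i, bit in enumerate(prefix))
    return head + tail * Fraction(1, 3 ** n)     # Σ_{i≥n} 2/3^(i+1) = 1/3^n


# With t = 01: "t1 then all 0's" and "t0 then all 1's" are the same real under e ...
assert e((0, 1, 1), 0) == e((0, 1, 0), 1) == Fraction(3, 8)
# ... but j keeps them apart, on either side of a removed middle third.
assert j((0, 1, 0), 1) == Fraction(7, 27)
assert j((0, 1, 1), 0) == Fraction(8, 27)
```

The two values of j are the endpoints of the open interval (7/27, 8/27), which is one of the intervals deleted in forming the middle thirds set.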
There is a homeomorphism from (0, 1) to R, for example (2x −
1)/(1 − (2x − 1)^2). It has been shown that |C| ≤ |R| = |(0, 1)| ≤ |[0, 1]| ≤
|C|, and so |R| = |C|. By theorem 13.14, |Pow(ω)| > |ω| = ℵ_0. Since
|C| = |Pow(ω)|, |R| = |C| = 2^{ℵ_0} > ℵ_0. A set is said to be uncountable
if its cardinality is greater than ℵ_0; thus, R, and also C, is uncountable.
Letting ω^ω denote {f : f : ω ↦ ω}, a topology may be defined on
ω^ω in much the same manner as the topology on 2^ω. If t : n ↦ ω is a
finite string of non-negative integers let U_t denote {f ∈ ω^ω : f ↾ n = t}.
The sets U_t form the base for a topology. This topological space is called
Baire space; the notation N will be used to denote it.
Theorem 3. There is a homeomorphic embedding of N in R.
Proof: Define sets S_i of open intervals in R for i ∈ ω. S_0 consists of
(−∞, ∞), the entirety of R. At stage i + 1, given an interval I in
S_i choose rationals q_j for j ∈ ω such that q_{2j} for j ≥ 0 increases to the right
endpoint; q_{2j+1} for j ≥ 0 decreases to the left endpoint; and q_1 < q_0.
Choose the q_j so that q′ − q < 2^{−(i+1)} for two successive chosen rationals.
For each interval of S_i, add to S_{i+1} the intervals (q, q′) for each successive
chosen pair of rationals within the interval. To ensure that every rational
is eventually chosen, enumerate the rationals in some countably infinite
sequence r_i, and ensure that r_i is chosen in stage i. An element f of N
determines a nested sequence of intervals I_i, namely let I_0 = R, and if
f(i) = j choose that interval within I_i whose left endpoint is q_j as I_{i+1}.
Since the length of the intervals decreases to 0 a unique real r is singled
out; let e(f) = r. Note that r is not an endpoint of any interval. That
e is a homeomorphism follows because e[U_t] is the intersection of the
image of the embedding, with an open interval in R. ⊳
The construction ensures that the image of the embedding is
R − Q, the set of “irrational” real numbers. For this reason Baire space
is sometimes called the irrationals.
There is a “classical” embedding of N in R, which has additional
properties of interest in basic number theory (see [HardWr]). Briefly,
there is an embedding of P^ω in (1, ∞), as follows, where P = ω −
{∅} is the positive integers. For such a sequence ⟨a_i⟩, define p_0 = a_0,
q_0 = 1, p_1 = a_1 a_0 + 1, q_1 = a_1, and for n ≥ 2, p_n = a_n p_{n−1} + p_{n−2}
and q_n = a_n q_{n−1} + q_{n−2}. The image of ⟨a_i⟩ under the embedding is
lim_{n→∞} (p_n/q_n).
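The recursion for p_n and q_n is easy to run on a finite initial segment of a sequence. The following Python sketch is an illustration only; the generator name convergents is chosen here, and it returns the fractions p_n/q_n for as many terms a_n as are supplied. The sample sequence 1, 2, 2, 2, ... is used on the assumption, standard in number theory, that it is the sequence associated with √2.

```python
from fractions import Fraction


def convergents(a):
    """Yield p_n/q_n for a nonempty finite list a = [a_0, a_1, ...] of positive integers."""
    p_prev, q_prev = a[0], 1                     # p_0, q_0
    yield Fraction(p_prev, q_prev)
    if len(a) == 1:
        return
    p, q = a[1] * a[0] + 1, a[1]                 # p_1, q_1
    yield Fraction(p, q)
    for a_n in a[2:]:
        # p_n = a_n p_{n-1} + p_{n-2}, and likewise for q_n.
        p_prev, p = p, a_n * p + p_prev
        q_prev, q = q, a_n * q + q_prev
        yield Fraction(p, q)


values = list(convergents([1, 2, 2, 2]))
assert values == [Fraction(1), Fraction(3, 2), Fraction(7, 5), Fraction(17, 12)]
# The values alternate around the limit: here 1 < 7/5 < 17/12 < 3/2.
assert values[0] < values[2] < values[3] < values[1]
```

With every a_n equal to 1 the same code produces the ratios of successive Fibonacci numbers.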
There is an obvious injection from C to N, and it follows that |N| =
2^{ℵ_0}. This can be shown more easily of course; in particular it is not
difficult to construct an injection from N to C. Also, it may be seen
using the following facts of cardinal arithmetic.
Lemma 4.
a. (κ · λ)^µ = κ^µ · λ^µ.
b. κ^{λ+µ} = κ^λ · κ^µ.
c. κ^{λ·µ} = (κ^λ)^µ.
d. If κ_1 ≤ κ_2 then κ_1^λ ≤ κ_2^λ.
e. If λ_1 ≤ λ_2, and either λ_1 ≠ 0 or κ ≠ 0, then κ^{λ_1} ≤ κ^{λ_2}.
Proof: These facts are stated following lemma 3.3 of [Jech2]. Let
S, T, and U be sets. For part a, there is a bijection from (S × T)^U to
S^U × T^U; f corresponds to ⟨π_1 ◦ f, π_2 ◦ f⟩. For part b, if T and U are
disjoint there is a bijection from S^{T∪U} to S^T × S^U; f corresponds to
⟨f ↾ T, f ↾ U⟩. For part c, there is a bijection from S^{T×U} to (S^T)^U; f
corresponds to f̄, where f̄(u)(t) = f(t, u). For part d, if S ⊆ T then
there is an injection from S^U to T^U; f corresponds to itself. For part
e, if T ⊆ U and either S or T is nonempty then there is an injection
from S^T to S^U; f corresponds to f′ where f′ is any function such that
f′ ↾ T = f. ⊳
Theorem 5. If κ is infinite and 2 ≤ λ ≤ κ then λ^κ = 2^κ.
Proof: 2^κ ≤ λ^κ ≤ (2^λ)^κ = 2^{λ·κ} = 2^κ. ⊳
15. The continuum hypothesis.
In section 14 it was proved that |R| = 2^{ℵ_0}. Since it is a cardinal,
2^{ℵ_0} (the “cardinality of the continuum”) equals ℵ_α for some α. It was
shown by Paul Cohen in 1963 that the value of α cannot be determined
from ZFC. The continuum hypothesis (CH) is the statement “2^{ℵ_0} = ℵ_1”.
The terminology “continuum hypothesis” arises from the use of the
term “continuum” to denote the real line. In more recent usage, the
term “continuum” denotes any dense linear order without endpoints
which has the least upper bound property. The real line is such an
object, indeed the unique such containing the rationals as a dense subset;
but there are others, of varying cardinalities. Thus, the terminology
“continuum hypothesis” has become a bit of a historical artifact.
The history of the continuum hypothesis goes back to Cantor, who
discovered the problem of the cardinality of the continuum, and made
great efforts to prove that it was ℵ_1, but could not do so. He did prove
that every closed uncountable subset of R has cardinality 2^{ℵ_0}, and later
results along this line were important advances.
In 1934 Sierpinski published a monograph giving various mathematical consequences of the continuum hypothesis. Although the independence of consequences of CH has been an ongoing topic of research,
it is a surprising fact that these are fairly technical, and no question of
great importance in everyday mathematics seems to require CH for its
proof. In what follows, six examples of implications of CH will be given.
Implications of CH for the real numbers occur concerning the meager, and the measure 0, subsets of R. These are both families of “small”
subsets, which are of interest to “set-theoretic topology”, as well as
topology itself. A set is meager if it is a countable union of nowhere
dense sets. The Lebesgue measure is a function assigning nonnegative
real numbers to some subsets of the real line, which assigns b − a to an
interval [a, b] where a < b; and which has various geometrically motivated properties. By a set of measure 0 is meant one whose Lebesgue
measure is 0.
For the first example, some facts about the topology of R are
needed. First, there are 2^ℵ0 open sets, and so 2^ℵ0 closed sets. To see
this, there are ℵ0 open intervals (q, r) with q and r rational numbers,
and any open set is a union of countably many of these; so there are at
most ℵ0^ℵ0 = 2^ℵ0 open sets. There are at least this many, since the sets
R − {x} for x ∈ R are distinct open sets. Second, a nowhere dense set N is
contained in a closed nowhere dense set, namely its closure: N is nowhere
dense when its closure has empty interior, and the closure of a closed set
K equals K.
Example 1, a Lusin set exists. A Lusin set is an uncountable subset
S ⊆ R, such that for any meager set N , |S ∩ N | ≤ ℵ0 .
Theorem 1. If CH is true a Lusin set exists.
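Proof: Under CH the closed nowhere dense sets may be enumerated
as ⟨Nα : α < ℵ1⟩. Let xα be a real number which is not in ∪β<α Nβ,
and is distinct from xβ for β < α. Such a real exists: α is countable, so
the set to be avoided is meager, and by the Baire category theorem R is not
meager. Let S = {xα : α < ℵ1}. Then S is uncountable, and S ∩ Nβ ⊆
{xα : α ≤ β} is countable for each β. Since any meager set is contained in
a countable union of sets Nβ, S is a Lusin set. ⊳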
For the second example, some facts about the Lebesgue measure
will be needed, which will be stated without proof. A Gδ subset of R is
one which is a countable intersection of open sets. As above, there are
2^ℵ0 Gδ sets. Any measure 0 set is a subset of a Gδ measure 0 set. Also,
the measure 0 sets are closed under countable union.
Example 2, a Sierpinski set exists. A Sierpinski set is an uncountable subset S ⊆ R, such that for any measure 0 set N , |S ∩ N | ≤ ℵ0 .
The proof that such a set exists if CH is true is similar to the proof of
theorem 1, but using an enumeration of the Gδ measure 0 sets.
Example 3, cardinal invariants are all ℵ1 . Cardinal invariants are
cardinals associated with families of subsets of the real numbers, or
functions on the real numbers. They are between ℵ1 and 2^ℵ0, so if CH
holds they all equal ℵ1. The Cichoń diagram is a diagram of inequalities
which hold between ten such invariants. Various combinations of strict
inequalities which are permitted by the Cichoń diagram are consistent
with ZFC; see [Bart], and [Jech2] for some discussion.
Example 4, the iterated integrals can be unequal. The following was
proved by Sierpinski in 1920. It was proved in 1980 that the question is
independent of ZFC.
Theorem 2. If CH holds, then there is a function f : [0, 1] × [0, 1] ↦
[0, 1] such that the iterated Lebesgue integrals ∫_0^1 ∫_0^1 f(x, y) dx dy and
∫_0^1 ∫_0^1 f(x, y) dy dx are unequal.
Remarks on proof: Let ≤ be a well-ordering of [0, 1] of order-type
ℵ1 (by CH such exists). Let f(x, y) = 1 if x ≤ y, else 0. For fixed y the
set {x : f(x, y) = 1} is countable, because it is a proper initial segment
of ℵ1. Since a countable set has measure 0, it follows that
∫_0^1 f(x, y) dx = 0, whence ∫_0^1 ∫_0^1 f(x, y) dx dy = 0.
For fixed x the set {y : f(x, y) = 0} is countable, and it follows that
∫_0^1 f(x, y) dy = 1, whence ∫_0^1 ∫_0^1 f(x, y) dy dx = 1. ⊳
Example 5, Kaplansky’s problem. This problem is only mentioned;
it is a problem in analysis of whether a “discontinuous homomorphism”
exists, in a certain setting. In 1976 it was shown that the existence of
such follows from CH; and also that it is consistent with ZFC that there
is not one.
Example 6, no atomless measure. Say that a function µ : Pow([0, 1])
↦ [0, 1] is an atomless measure if it is countably additive, µ([0, 1]) = 1,
and µ(S) = 0 for every singleton set S. Banach and Kuratowski proved
in 1929 that if CH holds then no such measure exists; see corollary 10.17
of [Jech2].
The generalized continuum hypothesis (GCH) is the statement
“2^ℵα = ℵ_{α+1} for all ordinals α”. One consequence is that the cardinal
exponentiation function ℵα^ℵβ is determined (see [Jech2]). Another fact of
interest, proved by Sierpinski in 1946, is that in ZF, the axiom of choice
(AC) follows from GCH. A proof may be found in [TakZar1].
16. Absoluteness.
A structure for the language of set theory has (in addition to equality) a binary relation ∈̂ on the domain D, which is the interpretation of
the membership predicate. Although there are some contexts where an
arbitrary relation is of interest, in common contexts ∈̂ is the membership
relation of V. A structure of this type is called by various names,
such as “standard” or an “∈-structure”. An ∈-structure is thus a set
D, considered as a structure for the language of set theory, by letting
the interpretation of the nonlogical symbol ∈ be simply the membership
relation.
In such a structure, the formula x ∈ y holds between two elements
of D, if and only if it holds between them in V . In general, a formula F
is said to be absolute for a set (or class) D (considered as a structure)
if whenever elements of D are assigned to the free variables of F , the
formula is true in D if and only if it is true in V . F is absolute for a
family of structures (sets or classes), if it is absolute for each structure
in the family. Absoluteness was defined by Kurt Gödel in 1940, and has
proved to be a useful definition in set theory.
The foregoing requires clarification; for example “true in V ” must
be made precise. This is accomplished as usual, by “formalizing” the
required facts as formulas of set theory, and proving them in ZFC. One
might suppose that the definition of the truth of a formula in a structure
might be formalized. Indeed this can be done; but a simpler approach
can be used, and includes cases such as class structures.
A common abbreviation in set theory is to use ∀w ∈ xF for ∀w(w ∈
x ⇒ F ) and ∃w ∈ xF for ∃w(w ∈ x ∧ F ). These are called bounded universal and existential quantification respectively. They occur in many
contexts, and have useful properties, such as theorem 1 below. A formula in the language of set theory is said to be ∆0 if all its quantifiers
are bounded (i.e., the clauses in the recursive definition of a formula
involving the quantifiers are modified to require bounded quantifiers).
The “relativization” of a formula F to the set D is obtained by
replacing all quantifiers ∀wF by ∀w ∈ DF , and similarly for ∃xF . This
can be more formally specified by giving a recursion on the formation
of F . The relativization can be to a class D, recalling that in this case
x ∈ D is an abbreviation for a formula. Also, a transitive class is a class
S such that x ∈ y ∧ y ∈ S ⇒ x ∈ S. Let Tran(x) be the formula stating
that x is transitive.
Theorem 1. Suppose D is a set or class, and F is a ∆0 formula
with free variables x1, . . . , xk. Then
Tran(D) ∧ x1 ∈ D ∧ · · · ∧ xk ∈ D ⇒ (F ⇔ F^D).
Proof: By induction on F, the required formula follows by predicate
logic. If F is an atomic formula then F^D is F. For ¬F, (¬F)^D equals
¬(F^D) and the claim follows easily; similarly for the other propositional
connectives. For a bounded universal quantifier, let (1) be ∀w(w ∈ x ⇒ F)
and let (2) be ∀w(w ∈ x ∧ w ∈ D ⇒ F). (1)⇒(2) follows directly; and
(2)⇒(1) follows using the hypotheses x ∈ D and w ∈ x ∧ x ∈ D ⇒ w ∈ D.
The bounded existential quantifier is handled similarly. ⊳
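The content of theorem 1 can be tried out on hereditarily finite sets. The following is a minimal sketch, not part of the text: the tuple encoding of formulas, the function name holds, and the sample sets are all assumptions made for illustration. Passing a domain evaluates the relativized formula, in which every bounded quantifier is further restricted to the domain.

```python
# Formulas are nested tuples; variable names are strings looked up in `env`.
#   ("in", a, b)       a ∈ b
#   ("eq", a, b)       a = b
#   ("not", F)         ¬F
#   ("and", F, G)      F ∧ G
#   ("all", w, x, F)   ∀w ∈ x F   (bounded quantifier)

def holds(formula, env, domain=None):
    """Evaluate a Δ0 formula; with `domain` given, evaluate its relativization."""
    op = formula[0]
    if op == "in":
        return env[formula[1]] in env[formula[2]]
    if op == "eq":
        return env[formula[1]] == env[formula[2]]
    if op == "not":
        return not holds(formula[1], env, domain)
    if op == "and":
        return holds(formula[1], env, domain) and holds(formula[2], env, domain)
    if op == "all":
        _, var, bound, body = formula
        # Relativizing keeps only those witnesses that lie in the domain.
        candidates = [w for w in env[bound] if domain is None or w in domain]
        return all(holds(body, {**env, var: w}, domain) for w in candidates)
    raise ValueError(f"unknown connective: {op}")

ZERO = frozenset()
ONE = frozenset({ZERO})
TWO = frozenset({ZERO, ONE})

IS_EMPTY = ("all", "w", "x", ("not", ("eq", "w", "w")))   # x = ∅
IS_SUBSET = ("all", "w", "x", ("in", "w", "y"))           # x ⊆ y

TRANSITIVE_D = {ZERO, ONE, TWO}      # a transitive set
S = frozenset({ONE})                 # nonempty, but its element is not in D below
NON_TRANSITIVE_D = {S}               # not transitive

# In the transitive domain, truth in D and truth in V agree.
agree = all(
    holds(IS_SUBSET, {"x": x, "y": y}) == holds(IS_SUBSET, {"x": x, "y": y}, TRANSITIVE_D)
    for x in TRANSITIVE_D for y in TRANSITIVE_D
)
```

In the non-transitive domain the formula for x = ∅ holds of S after relativization, although S is not empty; this is the failure that the hypothesis Tran(D) rules out.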
Many predicates may be shown to be absolute using this theorem,
by simply writing ∆0 formulas for them. Examples include the following.
- x = ∅: ∀w ∈ x(w ≠ w).
- x ⊆ y: ∀w ∈ x(w ∈ y).
- s = {u, v}: u ∈ s ∧ v ∈ s ∧ ∀w ∈ s(w = u ∨ w = v).
- p = ⟨u, v⟩: ∃s ∈ p∃t ∈ p(s = {u} ∧ t = {u, v} ∧ p = {s, t}).
- p is an ordered pair: ∃u ∈ s∃v ∈ t(p = ⟨u, v⟩)
where s, t are as in the previous formula.
- u = π1(p), v = π2(p), π1(p1) = π1(p2), π2(p1) = π2(p2):
∃v ∈ t(p = ⟨u, v⟩), etc.
- r is a relation: ∀p ∈ r(p is an ordered pair).
- x = π1 [r], y = π2 [r]:
∀w ∈ x∃p ∈ r(w = π1 (p)) ∧ ∀p ∈ r∃w ∈ x(w = π1 (p)), etc.
- f is a function: f is a relation ∧
∀p1 ∈ f ∀p2 ∈ f (π1 (p1 ) = π1 (p2 ) ⇒ π2 (p1 ) = π2 (p2 )).
- w = f(v): ∃p ∈ f(p = ⟨v, w⟩).
- g = f ↾ x: ∀p ∈ g(p ∈ f ∧ π1(p) ∈ x) ∧ ∀p ∈ f(π1(p) ∈ x ⇒ p ∈ g).
- z = ∪x: ∀v ∈ x∀u ∈ v(u ∈ z) ∧ ∀u ∈ z∃v ∈ x(u ∈ v).
- z = x ∩ y: ∀v ∈ x(v ∈ y ⇒ v ∈ z) ∧ ∀v ∈ z(v ∈ x ∧ v ∈ y).
- z = x − y: ∀v ∈ x(v ∉ y ⇒ v ∈ z) ∧ ∀v ∈ z(v ∈ x ∧ v ∉ y).
- z = ∩x:
∃w ∈ x∀u ∈ w(∀v ∈ x(u ∈ v) ⇒ u ∈ z) ∧ ∀u ∈ z∀v ∈ x(u ∈ v).
- z = x × y:
∀p ∈ z∃u ∈ x∃v ∈ y(p = ⟨u, v⟩) ∧ ∀u ∈ x∀v ∈ y∃p ∈ z(p = ⟨u, v⟩).
- x is transitive: ∀v ∈ x(v ⊆ x).
- x is an ordinal: x is transitive ∧ ∀u ∈ x∀v ∈ x(u ∈ v ∨ u = v ∨ v ∈ u).
- x is a limit ordinal: x is an ordinal ∧ x ≠ ∅ ∧ ∀u ∈ x∃v ∈ x(u ∈ v).
- x ∈ ω: x is an ordinal ∧ x is not a limit ordinal ∧
∀u ∈ x(u is not a limit ordinal).
- x = ω: x is a limit ordinal ∧ ∀u ∈ x(u is not a limit ordinal).
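Several of the ∆0 formulas in the list can be transcribed almost literally into code over hereditarily finite sets, with each bounded quantifier becoming a loop over the elements of a set. The sketch below is illustrative only; the function names are assumptions, and sets are modeled as Python frozensets (so well-foundedness holds automatically).

```python
def pair(u, v):
    """The ordered pair ⟨u, v⟩ = {{u}, {u, v}}."""
    return frozenset({frozenset({u}), frozenset({u, v})})

def is_transitive(x):
    # ∀v ∈ x (v ⊆ x)
    return all(v <= x for v in x)

def is_ordinal(x):
    # x is transitive ∧ ∀u ∈ x ∀v ∈ x (u ∈ v ∨ u = v ∨ v ∈ u)
    return is_transitive(x) and all(u in v or u == v or v in u for u in x for v in x)

def is_limit_ordinal(x):
    # x is an ordinal ∧ x ≠ ∅ ∧ ∀u ∈ x ∃v ∈ x (u ∈ v)
    return is_ordinal(x) and len(x) > 0 and all(any(u in v for v in x) for u in x)

def is_natural(x):
    # x ∈ ω: an ordinal which is not a limit and has no limit ordinal as a member
    return (is_ordinal(x) and not is_limit_ordinal(x)
            and all(not is_limit_ordinal(u) for u in x))

def is_union(z, x):
    # z = ∪x: ∀v ∈ x ∀u ∈ v (u ∈ z) ∧ ∀u ∈ z ∃v ∈ x (u ∈ v)
    return (all(u in z for v in x for u in v)
            and all(any(u in v for v in x) for u in z))

def von_neumann(n):
    """The finite ordinal n = {0, 1, ..., n-1}."""
    x = frozenset()
    for _ in range(n):
        x = x | {x}
    return x

ZERO, ONE, TWO, THREE = (von_neumann(n) for n in range(4))
```

For instance, is_union(TWO, THREE) holds because ∪3 = 2, while the set {1} is not an ordinal because it is not transitive.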
Theorem 1 illustrates the utility of transitive sets, as well as bounded quantifiers. The next theorem shows that a transitive structure can
be obtained from a structure satisfying a milder hypothesis. It is called
the Mostowski collapsing lemma, and is a technical tool of considerable
significance. It will be used many times throughout the remainder of
the text, lemma 20.6 being a notable example.
Theorem 2. Suppose D is a set satisfying the axiom of extensionality. Then there is a transitive set T, and a bijection π : D ↦ T, such
that ∀x, y ∈ D(x ∈ y ⇔ π(x) ∈ π(y)) (π is an ∈-isomorphism). Further
T and π are unique.
Remarks on proof: π is defined by recursion on the well-founded
partial order ∈, in a manner similar to definition by transfinite recursion
as described in section 13, to be the unique function on D such that
π(x) = {π(w) : w ∈ x ∩ D}. That is, π(x) = y if and only if x ∈ D and
there exists a function f with domain x such that f (w) = G(f ↾ w) for
all w ∈ x, and y = G(f ↾ x), where G(g) = {t : ∃s ∈ D(⟨s, t⟩ ∈ g)}. It
follows by ∈-induction that there is a unique π satisfying the recursion
equation.
Let T = Ran(π). Using the axiom of replacement T is a set. Suppose x′ ∈ T ∧ v′ ∈ x′; then ∃x ∈ D(x′ = π(x)), so ∃v ∈ x ∩ D(v′ = π(v)),
so v′ ∈ T. This shows that T is transitive. Also, if v ∈ x for v, x ∈ D then
π(v) ∈ π(x) (π is an ∈-homomorphism). This much follows without the
extensionality hypothesis.
Suppose π is not bijective, so that {x1 ∈ D : ∃x2 ∈ D(x2 ≠ x1 ∧
π(x2) = π(x1))} is nonempty. Let x1 be an ∈-minimal element, and let x2
be as in the definition of the set. By the
hypothesis of extensionality, either (1) ∃w1 ∈ D(w1 ∈ x1 ∧ w1 ∉ x2)
or (2) ∃w2 ∈ D(w2 ∉ x1 ∧ w2 ∈ x2). In case 1, π(w1) ∈ π(x1), so
π(w1) ∈ π(x2), so ∃w2 ∈ D ∩ x2(π(w1) = π(w2)). Since w1 ∉ x2,
w1 ≠ w2. In case 2, exchanging the roles of 1 and 2 again yields w1, w2
with w1 ∈ x1, w2 ∈ x2, w1 ≠ w2, and π(w1) = π(w2). In either case, w1
contradicts the minimality of x1. Thus, π is bijective.
Suppose π(v) ∈ π(x). Then there is a v′ ∈ x ∩ D such that π(v′) =
π(v). Since π is injective v′ = v, so v ∈ x. This shows that π is an
∈-isomorphism.
If π′ is any ∈-isomorphism on D whose range is transitive, and w ∈ x
for w, x ∈ D, then π′(w) ∈ π′(x); thus, {π′(w) : w ∈ x ∩ D} ⊆ π′(x). If
w′ ∈ π′(x) for x ∈ D, since the range of π′ is transitive ∃w ∈ D(π′(w) =
w′), and π′(w) ∈ π′(x), whence w ∈ x. Thus, {π′(w) : w ∈ x ∩ D} =
π′(x). Thus, π′ = π, and it follows that T is unique also.
For further details see theorem 6.15 of [Jech2]. ⊳
The set T of the theorem is called the transitive collapse (or Mostowski collapse), and the map π is called the collapsing isomorphism.
Theorem 2 gives the transitive collapse of an ∈-structure. The transitive
collapse may be applied to more general structures; see section 36.
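For a finite set D of hereditarily finite sets, the recursion π(x) = {π(w) : w ∈ x ∩ D} can be carried out directly. The following sketch is an illustration under that finiteness assumption; the function name collapse and the sample sets are hypothetical, and sets are modeled as Python frozensets.

```python
def collapse(domain):
    """Return the map π on `domain` with π(x) = {π(w) : w ∈ x ∩ domain}."""
    domain = frozenset(domain)
    pi = {}

    def go(x):
        # Recursion on ∈; it terminates because frozensets are well-founded.
        if x not in pi:
            pi[x] = frozenset(go(w) for w in x if w in domain)
        return pi[x]

    for x in domain:
        go(x)
    return pi

ZERO = frozenset()
ONE = frozenset({ZERO})
TWO = frozenset({ZERO, ONE})

# An extensional but non-transitive D: TWO occurs inside B and C but is not in D.
B = frozenset({ZERO, TWO})
C = frozenset({B, TWO})
D = {ZERO, B, C}

PI = collapse(D)
T = set(PI.values())

# A non-extensional example: both members look empty from inside the domain.
BAD = {frozenset({ONE}), frozenset({TWO})}
PI_BAD = collapse(BAD)
```

Here π(B) = {∅} and π(C) = {{∅}}, so T is transitive and π preserves membership in both directions. In the second example extensionality fails in the domain, and π sends both members to ∅, so it is not injective.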
A formula of set theory is said to be Π1 (resp. Σ1) if it is of the
form ∀x⃗ F (resp. ∃x⃗ F) for a ∆0 formula F. Suppose D is a transitive
class. A formula F is said to be down-absolute (resp. up-absolute) if for
any assignment of members of D to the free variables of F , if F holds in
V then it holds in D (resp. if F holds in D then it holds in V ). It is not
difficult to show, using the methods of theorem 1, that a Π1 formula is
down-absolute and a Σ1 formula is up-absolute.
The predicate z = Pow(x) is Π1; it holds if and only if ∀w(w ∈ z ⇔
w ⊆ x). Thus, it is down-absolute; that is, if the power set of x is an
element of D then it is the power set in D. However, it may not be in
D. Further, the predicate is not absolute. This follows because there
are countable models of the power set axiom. Indeed, it can be shown
(theorem 12.14 of [Jech2]) that there is a countable transitive model of
any finite set of axioms of ZFC.
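A finite toy analogue may help to show how z = Pow(x) can hold in a transitive set without holding in V. This is only an illustration and is not the countable-model argument referred to above; the function names and the choice of D are assumptions.

```python
from itertools import combinations

def powerset(x):
    """The true power set of a hereditarily finite set x."""
    items = list(x)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def powerset_in(domain, x):
    """{w ∈ D : w ⊆ x}: the subsets of x that the structure D can see."""
    return frozenset(w for w in domain if w <= x)

def von_neumann(n):
    """The finite ordinal n = {0, 1, ..., n-1}."""
    x = frozenset()
    for _ in range(n):
        x = x | {x}
    return x

ZERO, ONE, TWO, THREE = (von_neumann(n) for n in range(4))
D = frozenset({ZERO, ONE, TWO, THREE})   # the ordinal 4, a transitive set

SEEN_IN_D = powerset_in(D, TWO)          # what D takes the subsets of 2 to be
TRUE_POWERSET = powerset(TWO)            # contains {{∅}}, which is not in D
```

Inside D the formula ∀w(w ∈ 3 ⇔ w ⊆ 2) holds, so D regards 3 as the power set of 2, while the true power set of 2 has four elements. In the other direction, Pow(0) = 1 is an element of D and is also the power set of 0 in D, as down-absoluteness requires.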
The predicate “x is a cardinal” is Π1; it holds if and only if x is an
ordinal and for all functions f : α ↦ x where α ∈ x, f[α] ≠ x. This
predicate is not absolute either, as uncountable cardinals can be proved to
exist using only finitely many axioms, and these axioms have a countable
transitive model.
17. Admissible sets.
The predicate z = x × y is absolute; however, in a set D, the
Cartesian product may not always exist. If D is a model of ZFC then it
will; however there are various contexts in mathematical logic where it is
convenient to consider models of theories which are subtheories of ZFC. The
models of such theories are shown to be closed under various functions
f, that is, ∀x∃yF holds in the models, where F is the formula defining
the function. The class of models of interest varies, but using a weaker
theory than needed for one application avoids re-doing the work when
considering a larger class of models.
Although there are others, the two main classes of models in use
are the rudimentarily closed sets, and the admissible sets (see [Mathias],
[Rathjen1] for other classes). The former are important in the branch of
set theory known as constructibility theory, and are discussed in section
45. The latter have continued to find applications in many areas in
mathematical logic.
The system of axioms KP (Kripke-Platek) consists of the following
axioms. For brevity bounded quantifiers are used (which was not done
for ZFC).
- extensionality, pairing, union
- foundation axiom scheme: For any formula F ,
∃wF ⇒ ∃w(F ∧ ∀u ∈ w¬F_{u/w})
- ∆0 separation: For any ∆0 formula F where x does not occur free,
∃x∀w(w ∈ x ⇔ w ∈ y ∧ F )
- ∆0 collection: For any ∆0 formula F where w does not occur free,
∀x ∈ z∃yF ⇒ ∃w∀x ∈ z∃y ∈ wF
As noted in section 13, the foundation axiom scheme is provable in ZF.
The axiom of ∆0 collection is also (see theorem 5 below). Thus, KP is
a subtheory of ZF. It lacks the axiom of infinity, and also the power set
axiom.
As will be seen, in basic constructibility theory, certain functions
must be shown to exist (that is, the existence condition ∀x∃yF must
be shown), and proofs given that the functions have various properties.
If proofs are given in ZFC then the functions have been shown to exist
and have the properties, in models of ZFC, and in various cases this is
really all that is required. However, even in some such cases, proofs can
be given in KP. Theorems proved using KP hold in arbitrary structures
satisfying KP, and not just the universe of sets. This is a useful fact in
several branches of mathematical logic, including set theory itself.
Recall from section 5 that a k-tuple ⟨x1, x2, . . .⟩ is formally defined
as ⟨x1, ⟨x2 . . .⟩⟩. Any formula ∃x⃗ G where G is ∆0 is (provably in KP,
in fact weaker systems) equivalent to ∃pG′ where G′ is ∃s1 ∈ p∃x1 ∈
s1 · · · (p = ⟨x1, . . .⟩ ∧ G). This transformation is called “contraction of
quantifiers”. Note also that if ⟨x, y⟩ ∈ w then x ∈ ∪ ∪ w; this may be
seen by applying ∪ twice to {{{x}, {x, y}}} ⊆ w.
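The remark about ∪ ∪ w can be checked on a small example. The sketch below assumes the pair ⟨x, y⟩ = {{x}, {x, y}} as in the list of section 16; the helper names are hypothetical.

```python
def pair(x, y):
    """The ordered pair ⟨x, y⟩ = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

def union(s):
    """∪s: the set of members of members of s."""
    return frozenset(u for v in s for u in v)

ZERO = frozenset()
ONE = frozenset({ZERO})
TWO = frozenset({ZERO, ONE})

# A set of pairs; applying ∪ twice recovers every component of every pair.
W = frozenset({pair(ZERO, ONE), pair(ONE, TWO)})
COMPONENTS = union(union(W))
```

The first application of ∪ yields the sets {x} and {x, y}, and the second yields the components themselves, which is why a bound for the contracted tuple p gives a bound for y in the proof of Σ1 collection.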
Theorem 1 (Σ1 collection). For any Σ1 formula F where w does
not occur free,
⊢KP ∀x ∈ z∃yF ⇒ ∃w∀x ∈ z∃y ∈ wF .
Proof: Suppose F is ∃v⃗ G where G is ∆0; then ∀x ∈ z∃y∃v⃗ G. Using
contraction of quantifiers ∀x ∈ z∃pG′, so by ∆0 collection ∃w′∀x ∈
z∃p ∈ w′G′, so ∃w∀x ∈ z∃y ∈ wF where w = ∪ ∪ w′. The above
argument can be formalized in KP. ⊳
A predicate is said to be Σ1-definable (resp. Π1-definable) if there
is a Σ1 (resp. Π1) formula defining it. A predicate is said to be ∆1-definable if there are both a Σ1 and a Π1 formula defining it. For
brevity, the term “Σ1” alone may be used to abbreviate “Σ1-definable”;
and similarly for Π1 and ∆1 . The following theorem is known as ∆1
separation.
Theorem 2. Suppose ⊢KP F ⇔ G where F is a Σ1 formula, G is a
Π1 formula, and w does not occur free in F or G. Then ⊢KP ∃x∀w(w ∈
x ⇔ w ∈ y ∧ F ).
Proof: By contraction of quantifiers it may be assumed that F
is ∃vF′ and G is ∀vG′, where F′ and G′ are ∆0. It follows from F ⇔ G by
predicate logic that (1) ∃v(F′ ∨ ¬G′). It follows from (1) using ∆0
collection that (2) ∃u∀w ∈ y∃v ∈ u(F′ ∨ ¬G′). It follows by ∆0 separation
that (3) ∃x∀w(w ∈ x ⇔ w ∈ y ∧ ∃v ∈ uF′). If ∃v ∈ uF′ then ∃vF′. On the other
hand, if w ∈ y and ∃vF′, then ∀vG′, so by (2) ∃v ∈ uF′. That is, (4)
w ∈ y ⇒ (∃v ∈ uF′ ⇔ ∃vF′). The claim follows by (3) and (4). ⊳
Note that a predicate defined by a formula F as in the theorem,
often called a ∆1^KP predicate, is absolute for transitive sets which are
models of KP (and similarly if KP is replaced by other theories, such
as ZFC). The utility of theorems 1 and 2 is enhanced by methods for
showing that predicates are Σ1 or ∆1^KP; the next theorem is one such.
Theorem 3. Suppose G and H are Σ1 (resp. Π1 ) formulas, and F
is either G ∧ H, G ∨ H, ∀x ∈ yG, or ∃x ∈ yG. Then there is a Σ1 (resp.
Π1 ) formula F ′ such that ⊢KP F ⇔ F ′ .
Proof: All cases follow by predicate logic, except ∀x ∈ y∃v⃗ G′ and
∃x ∈ y∀v⃗ G′. The second follows from the first by applying ¬. For the
first, by contraction of quantifiers the formula may be assumed to be
∀x ∈ y∃vG′ . By ∆0 collection the formula ∃u∀x ∈ y∃v ∈ uG′ follows,
and in fact this formula is equivalent, as can be seen by predicate logic.
⊳
Formulas as in this theorem are often called Σ1^KP (resp. Π1^KP)
formulas. Even though they are not in the required form, they are provably
equivalent to formulas that are, and can be used in proofs as if they were.
Another useful method for showing that predicates are Σ1 or ∆1
is “substitution”. Given a predicate P (y) and a function f (x), the
predicate P (f (x)) is commonly considered; the methods for dealing with
such substitutions formally are standard, but a bit involved.
Suppose the function f (x) is defined by a formula F , expressing the
predicate y = f (x). To use f as a function in proofs, it is necessary to
make use of the formula stating that f is a function, that is, “for all x
there is a unique y such that F holds”. This occurs so commonly that
the abbreviation “∀x∃!yF ” is used for it.
The existence condition is (omitting the leading universal quantifier) ∃yF. The uniqueness condition may be expressed as either
∃z∀y(F ⇒ y = z) or F ∧ F_{z/y} ⇒ z = y, where z does not occur
free in F; the two are equivalent by predicate logic. The existence and
uniqueness conditions may be combined into a single formula in various
ways. Further discussion is omitted, except to note that the exercises in
predicate logic involved require some mastery, which should be acquired
in a course in predicate logic.
Using the existence and uniqueness conditions, it follows in predicate logic that ∃y(F ∧ G) is equivalent to ∀y(F ⇒ G). Thus, if G is a
formula for P , there is a y with y = f (x) and P (y) if and only if, P (y)
holds whenever y = f (x). If F is Σ1 , then if G is Σ1 then ∃y(F ∧ G) is
Σ1; and if G is Π1 then ∀y(F ⇒ G) is Π1. If G is ∆1^KP then so is G,
“with f(x) substituted for y”, when f is a Σ1-definable function whose
existence and uniqueness conditions are provable in KP. The same is
true in various other theories, such as ZFC or PA.
Many of the ∆0 predicates listed in section 16 are definitions of
functions. For all of these, the existence and uniqueness conditions are
provable in KP; some examples will be given. For g = f ↾ x, rewrite
the predicate as g = {p ∈ f : π1 (p) ∈ x}. Existence follows by ∆0
separation. Uniqueness follows by supposing g1 and g2 both satisfy the
predicate, and showing that if p ∈ g1 then p ∈ g2 ; whence by symmetry
g1 = g2 .
For z = x × y, using ∆0 collection, ∀u ∈ x∃p_u A where A is ∀v ∈
y∃w ∈ p_u(w = ⟨u, v⟩). Again using ∆0 collection, ∃p′∀u ∈ x∃p_u ∈
p′A. Let p = ∪p′; then ∀u ∈ x∀v ∈ y∃w ∈ p(w = ⟨u, v⟩). Using
∆0 separation, the existence condition follows. Uniqueness follows by
proceeding as in the first example.
In the proof of theorem 16.2, a definition by recursion on ∈ was
given. With suitable hypotheses, such definitions can be given in models of KP. In the following, f = F ↾ x is an abbreviation for ∀w ∈
x∀y(F_{w/x} ⇔ f(w) = y).
Theorem 4. Given a Σ1 formula G, let F be the formula ∃f(I ∧
G), where I is the formula “f is a function and π1[f] = x and ∀w ∈
x∃y∃g(y = f(w) ∧ g = f ↾ w ∧ G_{w/x,g/f})”. Then ∃!yF is provable in KP
from ∃!yG; F ⇔ ∃f(f = F ↾ x ∧ G) is also provable.
Proof: Suppose I, I_{f′/f}, and w ∈ x. Suppose inductively that
f′(w′) = f(w′) for w′ ∈ w; then f′ ↾ w = f ↾ w. Using this and the
uniqueness condition for G, f′(w) = f(w). Thus, by ∈-induction, (1)
I ∧ I_{f′/f} ⇒ f′ = f. By a similar argument using the existence condition
for G, (2) ∃f I.
The uniqueness condition for F follows by (1) and the uniqueness
condition for G.
Suppose inductively that ∀x ∈ x0∃yF, that is, ∀x ∈ x0∃y∃f(I ∧ G).
Then by Σ1 collection, for some c_y and c_f, ∀x ∈ x0∃y ∈ c_y∃f ∈ c_f(I ∧ G).
I is ∆1^KP, so using ∆1 separation, let f0 = ∪{f ∈ c_f : ∃x ∈ x0 I}. Using
(1) and (2), f0 is a function with domain x0. I_{x0/x,f0/f} is readily seen
to hold, whence ∃y(I_{x0/x,f0/f} ∧ G_{x0/x,f0/f}) holds. ∃yF follows by ∈-induction.
For the last claim, it suffices to show that I ⇔ f = F ↾ x; since
∃!f(f = F ↾ x) it suffices to show I ⇒ f = F ↾ x. Suppose I ∧ w ∈ x.
From the definition of f0 above, I_{w/x,g/f} ⇒ g = f ↾ w, from which
I_{w/x,g/f} ⇔ g = f ↾ w. By the definition of I, ∃y(y = f(w) ∧ F_{w/x}),
whence F_{w/x} ⇔ f(w) = y. ⊳
Recall from section 13 the definition of the smallest transitive set
containing a set x as a subset; it is called the transitive closure of x.
Informally, it equals x ∪ (∪x) ∪ (∪ ∪ x) · · ·, that is, the sets which are
members of x, members of members of x, etc. It follows by the theorem
that the function TC(x), whose value at x is the transitive closure of x,
is Σ1^KP.
Basic properties of the TC operation include the following.
- x ⊆ TC(x)
- x = TC(x) if and only if x is transitive.
- If w ∈ x then TC(w) ⊂ TC(x).
- TC(x) = x ∪ (∪w∈x TC(w))
It clearly follows that |x| ≤ |TC(x)|, and |w| ≤ |TC(x)| for all w ∈ x.
On the other hand, if κ is an infinite cardinal, |x| ≤ κ, and |TC(w)| ≤ κ
for all w ∈ x, then |TC(x)| ≤ κ.
This follows since by the last fact above, |TC(x)| ≤ κ + κ · κ = κ.
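For hereditarily finite sets the last of the basic properties, TC(x) = x ∪ (∪w∈x TC(w)), can be used directly as a recursive definition. The following is a small sketch under that assumption; the names tc and is_transitive are chosen for illustration, and sets are modeled as Python frozensets.

```python
def tc(x):
    """Transitive closure via TC(x) = x ∪ ⋃ {TC(w) : w ∈ x}."""
    result = set(x)
    for w in x:
        result |= tc(w)
    return frozenset(result)

def is_transitive(x):
    """Every member of x is a subset of x."""
    return all(v <= x for v in x)

ZERO = frozenset()
ONE = frozenset({ZERO})
TWO = frozenset({ZERO, ONE})

# X = {{2}} is far from transitive; its closure adds {2}'s member and below.
X = frozenset({frozenset({TWO})})
CLOSURE = tc(X)
```

Here TC(X) = {{2}, 2, 1, 0}, which is transitive and contains X as a subset, while TC(2) = 2 because 2 is already transitive.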
An admissible set is defined to be a transitive set which is a model
of KP. The use of the term “model” requires clarification. A definition
has been given in informal set theory; this can be given in formal set
theory, indeed this will be done shortly. On the other hand, in set theory
D can be said to be a model of a sentence F if F D is true in V . If D is
a set the two definitions agree (this is proved in the next section); the
latter definition can be used when D is a class.
If the existence and uniqueness conditions have been proven in KP
for a function f with a Σ1 definition, then an admissible set D is “closed”
under f , that is, if x1 , . . . xk ∈ D then f (~x) ∈ D.
If D is a transitive set, D ∩ Ord is an ordinal, indeed the least ordinal
which is not an element of D. This ordinal is called the ordinal of D;
the notation “o(D)” is sometimes used to denote it, but here D ∩ Ord
will be used. An ordinal α is said to be admissible if and only if there is
an admissible set D with α = D ∩ Ord. In early stages of admissible set
theory admissible ordinals had been given various “intrinsic” characterizations; the characterization in terms of admissible sets has turned out
to be a useful one.
Some examples of admissible sets will shortly be given. In addition,
some material of general interest will be covered, including Vα , the rank
function, Hκ , cofinality, and regular and singular cardinals.
Let V0 = ∅, V_{α+1} = Pow(Vα), and Vα = ∪β<α Vβ for limit ordinals
α. The following are basic facts about these sets.
1. Every Vα is transitive.
2. If β ≤ α then Vβ ⊆ Vα .
3. For every x ∈ V there is an α such that x ∈ Vα .
For fact 1, Pow(S) is transitive for any transitive set S, and the union of
transitive sets is transitive, and the claim follows by transfinite induction. Fact
2 follows by transfinite induction on α; for example Vα ∈ V_{α+1}, so by
fact 1 Vα ⊆ V_{α+1}. For fact 3, if every element of x is in some Vβ then
using the axiom of replacement there is a Vα containing all of them, and
x ∈ V_{α+1}. Thus, the claim follows by induction on ∈.
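The finite levels of the hierarchy are small enough to compute, which gives a concrete check of facts 1 and 2. The sketch below is illustrative; the function names are assumptions, and only the levels V0 through V4 are built, since V5 already has 65536 elements.

```python
from itertools import combinations

def powerset(x):
    """All subsets of the hereditarily finite set x."""
    items = list(x)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def v(n):
    """The level V_n for a finite n: V_0 = ∅ and V_{k+1} = Pow(V_k)."""
    level = frozenset()
    for _ in range(n):
        level = powerset(level)
    return level

def is_transitive(x):
    """Every member of x is a subset of x."""
    return all(w <= x for w in x)

LEVELS = [v(n) for n in range(5)]
SIZES = [len(level) for level in LEVELS]
```

The sizes are 0, 1, 2, 4, 16, each level is transitive, and each level is a subset of the next.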
Any set D is already a model of the foundation scheme. Suppose
C is the set of w ∈ D such that F is true of w in D. If C is nonempty
then there is an ∈-minimal element w′ in C, and this has the necessary
properties in D. Any transitive set D is already a model of extensionality. If x, y ∈ D then all elements of either x or y are also in D, so if
these sets are the same they are the same in D.
Thus for any α, Vα , being transitive, satisfies the extensionality
axiom and the foundation axiom scheme. If α is a limit ordinal, Vα also
satisfies the pairing, union, and power set axioms, because applying
these operations to elements of Vα yields an element of Vα . Vα satisfies
the separation axiom, because any subset of a member of Vα is a member
of Vα . Vα satisfies the axiom of choice, because given a set of nonempty
sets in Vβ for some β < α, there is a choice function in V_{β+i} where i is
a small integer. If α > ω then Vα satisfies the axiom of infinity, because
ω ∈ Vα .
The rank ρ(x) of a set is defined to be the least α such that x ∈
V_{α+1}. Vα may be seen as the αth “level” of the “cumulative hierarchy”
of sets. Each set is a set of objects, which themselves are simpler sets.
The rank gives a quantitative meaning to the notion of simpler.
The following are basic facts about the rank function.
1. Vα = {x : ρ(x) < α}. If ρ(x) = β < α then x ∈ V_{β+1} ⊆ Vα; and if
ρ(x) = α then x ∉ Vα.
2. If w ∈ x then ρ(w) < ρ(x). If ρ(x) = α then x ∈ V_{α+1}, so x ⊆ Vα,
so w ∈ Vα, so ρ(w) < α.
3. Vα ∩ Ord = α. This follows by transfinite induction. For example
at successor stages, inductively α ⊆ Vα, so α ∈ V_{α+1}, so α + 1 =
α ∪ {α} ⊆ V_{α+1}. If α + 1 ∈ V_{α+1} then α + 1 ⊆ Vα, so α ∈ Vα,
contradicting the induction hypothesis.
4. ρ(α) = α. This follows by fact 3.
5. ρ(x) = sup{ρ(w) + 1 : w ∈ x}. If ρ(x) = α and w ∈ x then
ρ(w) < α, so ρ(w) + 1 ≤ α; thus, sup{ρ(w) + 1 : w ∈ x} ≤ α. If
α = β + 1 there must be some w ∈ x with ρ(w) = β, else x ⊆ Vβ; so
sup{ρ(w) + 1 : w ∈ x} = α in this case. If α is a limit ordinal then
for all β < α there must be a w ∈ x with ρ(w) ≥ β, else x ⊆ Vβ for
some β < α; thus sup{ρ(w) + 1 : w ∈ x} = α in this case also.
6. If ρ(x) = α and γ < α then ρ(w) = γ for some w ∈ TC(x). This
follows by induction on α. If α = β + 1 then as in the proof of
fact 5 there is a w ∈ x with ρ(w) = β, and for γ < β the claim
follows inductively. If α is a limit ordinal then there is a w ∈ x with
ρ(w) = β and γ < β, and the claim follows inductively.
7. |ρ(x)| ≤ |TC(x)|. This follows by the previous fact, which gives for
each γ < ρ(x) an element of TC(x) of rank γ.
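For hereditarily finite sets, fact 5 serves as a recursive definition of the rank, and facts 4, 6, and 7 can then be checked on examples. This is a sketch only; the function names are hypothetical, and the helper tc uses the recursion for the transitive closure given earlier in this section.

```python
def rank(x):
    """ρ(x) = sup{ρ(w) + 1 : w ∈ x}, with the empty supremum equal to 0."""
    return max((rank(w) + 1 for w in x), default=0)

def tc(x):
    """Transitive closure via TC(x) = x ∪ ⋃ {TC(w) : w ∈ x}."""
    result = set(x)
    for w in x:
        result |= tc(w)
    return frozenset(result)

def von_neumann(n):
    """The finite ordinal n = {0, 1, ..., n-1}."""
    x = frozenset()
    for _ in range(n):
        x = x | {x}
    return x

TWO = von_neumann(2)
X = frozenset({frozenset({TWO})})           # X = {{2}}
RANKS_BELOW = {rank(w) for w in tc(X)}      # ranks realized inside TC(X)
```

Here ρ(X) = 4, every smaller rank is realized by some element of TC(X), and TC(X) has exactly four elements, in agreement with fact 7.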
Item 5 may be used to give a Σ1^KP definition of the rank function in
any admissible set; thus, it is defined and absolute for admissible sets.
The following theorem shows how the rank function may be used, to
prove a basic fact about KP.
Theorem 5. Given the other axioms of KP, Σ1 replacement follows
from ∆0 collection, and ∆0 collection follows from Σ2 replacement (the
replacement axiom with F restricted to Σ2 formulas).
Remarks on proof: The replacement axiom differs from the collection axiom in two ways. In the collection axiom Fxy is required to
define a relation which is total when restricted to the domain u, and it
is asserted that there exists a set v containing the range of the restricted
relation. In the replacement axiom Fxy is required to define a relation
which is single valued, and it is asserted that if the domain is restricted
to a set u then the range v is a set.
Given a Σ1 formula F which defines a single valued relation, and a
set u, by the collection axiom there is a set w which contains the range.
The required set v is then {y ∈ w : ∃x ∈ uF }. Since F is single valued
it may be written in Π1 form, and by theorem 3 ∃x ∈ uF may be also.
By ∆1 separation (theorem 2) v is a set.
Let KP′ be KP, with ∆0 collection replaced by Σ1 replacement.
Theorem 4 may be proved in KP′ by suitably modifying the proof in
KP (since the uniqueness condition holds for F , replacement may be
used instead of collection to prove the existence condition).
Given a ∆0 formula Fxy let G(x, β) be the formula, “∃y(ρ(y) ≤
β ∧ F ) ∧ ∀y(F ⇒ ρ(y) ≤ β)”. Then G is Σ2 , and the uniqueness
condition is provable, so by Σ2 replacement the image of u is a set. ⊳
By modifying the above proof, the axiom of collection for any formula F follows in ZFC. Also, let “strong Σ1 collection” be the axiom
scheme “∃v∀x ∈ u(∃yF ⇒ ∃y ∈ vF)” for Σ1 F where v does not occur
free; this follows from Σ2 replacement.
For an infinite cardinal κ let Hκ be {x : |TC(x)| < κ}. Using fact 7
above, if x ∈ Hκ then |TC(x)| < κ, so ρ(x) < κ, so x ∈ Vκ.
That is, Hκ ⊆ Vκ. In particular, Hκ is a set.
Hκ is transitive; it follows from the definition that if x ∈ Hκ and
w ∈ x then w ∈ Hκ . Similarly to the case of Vα for limit α, Hκ satisfies
extensionality, the foundation scheme, pairing, union, separation, and
choice.
Before considering collection in Hκ , the notion of cofinality will be
introduced. Let α be a limit ordinal. A subset S ⊆ α is said to be
“unbounded” if for all β < α there exists γ ∈ S such that γ ≥ β. The
“cofinality” cf(α) of α is the smallest ordinal β such that there is a
function f : β ↦ α whose range is unbounded in α. If β is a successor
ordinal then f [β] cannot be unbounded; it follows that cf(α) is a limit
ordinal.
For example, the cofinality of ℵω is ω. The map n ↦ ℵn shows that
it is at most ω, and ω is the smallest limit ordinal.
A strictly order-preserving map between linear orders is also called
“increasing”. It does not matter whether the function f in the definition
of cf(α) is required to be increasing. Given an arbitrary f with domain
β and range unbounded in α, a strictly order-preserving f ′ with domain
β ′ ≤ β and unbounded range can be defined. Using transfinite recursion,
let f ′ (γ) be the least δ ∈ f [β] which is greater than any element of f ′ [γ],
if any. A function which is increasing and has unbounded range will be
said to be increasing and unbounded.
The cofinality cf(α) of a limit ordinal α is a cardinal number κ with
κ ≤ α. Indeed, if f : β ↦ α, f[β] is unbounded, and g : κ ↦ β is a
bijection where κ is the cardinality of β, then (f ◦ g)[κ] is unbounded;
it follows that the smallest β must be a cardinal.
Suppose α, β, and γ are limit ordinals, f : β ↦ α, and g : γ ↦ β.
It is easy to verify that if f and g are increasing then f ◦ g is increasing;
and that if f [β] and g[γ] are unbounded then (f ◦ g)[γ] is unbounded.
Supposing that f is increasing and unbounded, it follows that cf(α) ≤
cf(β). In fact, cf(β) = cf(α). Suppose cf(α) = κ and h : κ ↦ α is
increasing and unbounded. Let g : κ ↦ β be constructed inductively,
letting g(δ) = µζ(f(ζ) ∉ h[δ]). Since κ = cf(α) the recursion must
continue until the domain κ of g is exhausted, and ζ must eventually
exceed any element of β. This shows that cf(β) ≤ cf(α) also.
A cardinal κ is said to be regular if cf(κ) = κ; otherwise it is said
to be singular. cf(α) is a regular cardinal. Indeed, if f : κ ↦ α is
increasing and unbounded, and g : λ ↦ κ is increasing and unbounded,
then as noted above f ◦ g is increasing and unbounded.
Suppose κ = ℵα is an infinite cardinal. It is said to be a successor
cardinal if and only if κ = λ^+ for some λ, or equivalently if α is a
successor ordinal. Otherwise, α = 0 or α is a limit ordinal, and κ is a
union of smaller cardinals; in this case κ is said to be a limit cardinal.
Theorem 6. A successor cardinal is regular.
Proof: Suppose f : λ ↦ ℵ_{α+1} where λ ≤ ℵα. Then |f(δ)| ≤ ℵα
for all δ < λ, whence | ∪δ<λ f (δ)| ≤ λ · ℵα ≤ ℵα . Thus, f [λ] is not
unbounded. ⊳
For limit cardinals other than ω, cf(ℵα ) = cf(α). The question
arises of whether there are any regular limit cardinals other than ω.
This question is independent of ZFC; more will be said in section 30.
Lemma 7. Suppose κ is regular, λ < κ, and |Sα | < κ for α < λ.
Then | ∪α<λ Sα | < κ.
Proof: Because κ is regular, if µ = sup{|Sα | : α < λ} then µ < κ;
so | ∪α<λ Sα | ≤ λ · µ < κ. ⊳
“Full collection” is the axiom scheme ∀u ∈ x∃vF ⇒ ∃y∀u ∈ x∃v ∈
yF for any formula F .
Theorem 8. If κ is a regular cardinal then Hκ satisfies full collection,
and is an admissible set.
Proof: Suppose ∀u ∈ x∃vF , and for each u ∈ x let vu be such that
F holds at u and vu ; let y = {vu : u ∈ x}. Since x ∈ Hκ , |x| < κ, so
|y| < κ. Since vu ∈ Hκ , |TC(v)| < κ for each v ∈ y. Using lemma 7,
|TC(y)| < κ, so y ∈ Hκ . ⊳
Hω is known as the collection of hereditarily finite sets. Since ω
is a regular cardinal, Hω is an admissible set. It is easy to see that
any set in Vω is hereditarily finite, whence Vω = Hω . Further, if D
is any admissible set, then Vω ⊆ D must hold, using the pairing and
union axioms. That is, the hereditarily finite sets comprise the smallest
admissible set.
Vω is a model of all the axioms of ZFC, except the axiom of infinity.
That replacement holds can be verified by proving full collection as in
theorem 8, and using the observation following theorem 5 (alternatively
a direct proof can readily be given).
In fact, Hκ is admissible for any infinite cardinal κ. The proof
requires additional methods, and will be omitted. The textbooks [Barwise] and [Devlin] have proofs. From this, it follows that every infinite
cardinal is an admissible ordinal. There are many more, though; indeed
there are many admissible countable ordinals.
18. Formalization of syntax.
Just as in the theory of PA, formalization of syntax plays a role in
the theory of ZFC. A predicate Sat can be defined, so that Sat(D, f,
a) is true if and only if the formula F, where f = ⌜F⌝, is true in the
set D (considered as a structure for the language of set theory), with
assignment a to the free variables of F . In the notation of section 6,
Sat(D, f, a) if and only if F̂(a) = t in the structure D. The “Gödel
numbering” of formulas used differs from that of arithmetic, and uses
sets in Vω rather than integers representing strings. This predicate is
called the “satisfaction predicate”. It has various uses; one will be given
in the next section.
It is important to show that this predicate is ∆_1^{ZF}. Since it is little
more work, it will be shown to be ∆_1^{KP}. This latter fact has applications
in admissible set theory. A function is said to be Σ_1^{KP} if the predicate
“f(~x) = y” is Σ_1^{KP}. If the existence and uniqueness conditions are
provable in KP then the predicate is in fact ∆_1^{KP}, since it is provably
equivalent to ∀w(Fw/y ⇒ w = y).
The language of set theory has no constants or function symbols.
Thus, a formula F may be given a code ⌜F⌝ as follows.
- xn ∈ xm : ⟨0, ⟨n, m⟩⟩
- xn = xm : ⟨1, ⟨n, m⟩⟩
- ¬F : ⟨2, ⌜F⌝⟩
- F ◦ G: ⟨i, ⟨⌜F⌝, ⌜G⌝⟩⟩ where i is 3,4,5,6 when ◦ is ∧ ∨ ⇒ ⇔ respectively
- Qxn F : ⟨i, ⟨n, ⌜F⌝⟩⟩ where i is 7,8 when Q is ∀ ∃ respectively
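As an informal illustration, the coding can be mimicked in Python, with tuples standing in for ordered pairs. This is only a sketch: the helper names (mem, eq, neg, binop, quant, is_code) are assumptions made here, and the recognizer is a plain recursion rather than the formal definition in KP.

```python
# Sketch only: Python tuples stand in for ordered pairs, and every
# function name below is hypothetical (not taken from the text).

CONNECTIVES = {"and": 3, "or": 4, "implies": 5, "iff": 6}
QUANTIFIERS = {"forall": 7, "exists": 8}

def mem(n, m):
    """Code of the atomic formula x_n ∈ x_m."""
    return (0, (n, m))

def eq(n, m):
    """Code of the atomic formula x_n = x_m."""
    return (1, (n, m))

def neg(f):
    """Code of the negation of the formula coded by f."""
    return (2, f)

def binop(op, f, g):
    """Code of F ∘ G for a binary connective named by op."""
    return (CONNECTIVES[op], (f, g))

def quant(q, n, f):
    """Code of Q x_n F for a quantifier named by q."""
    return (QUANTIFIERS[q], (n, f))

def is_code(x):
    """Decide recursively whether x has the shape of a formula code."""
    if not (isinstance(x, tuple) and len(x) == 2):
        return False
    tag, body = x
    if not (isinstance(body, tuple) and len(body) == 2) and tag != 2:
        return False
    if tag in (0, 1):
        return all(isinstance(k, int) and k >= 0 for k in body)
    if tag == 2:
        return is_code(body)
    if tag in (3, 4, 5, 6):
        return is_code(body[0]) and is_code(body[1])
    if tag in (7, 8):
        return isinstance(body[0], int) and body[0] >= 0 and is_code(body[1])
    return False

# The axiom of extensionality, ∀x2(x2 ∈ x0 ⇔ x2 ∈ x1) ⇒ x0 = x1.
EXT = binop("implies",
            quant("forall", 2, binop("iff", mem(2, 0), mem(2, 1))),
            eq(0, 1))
```

Nested tuples make the recursion on codes direct: the code of a compound formula literally contains the codes of its parts, which is the property the generalized recursion on TC(x) exploits.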
Theorem 17.4 does not allow the above recursion to be formalized immediately; however a generalization does, wherein F may be
defined from its values at elements of TC(x), rather than members of
x. The corresponding induction rule (lemma 1) is needed for the proof
of the theorem (theorem 2) showing that the recursively defined function exists. The abbreviation “∀w ∈ TC(x)Fw/x” is used for “∃t(t =
TC(x) ∧ ∀w ∈ tFw/x)”, and similar abbreviations are used.
Lemma 1. ∀x(∀w ∈ TC(x)Fw/x ⇒ F ) ⇒ ∀xF .
Proof: Suppose ∀w ∈ TC(x)Fw/x ⇒ F. Let F′ be ∀u ∈ TC(x ∪
{x})Fu/x. Suppose ∀w ∈ xF′w/x; then ∀u ∈ TC(x)Fu/x, so F, so F′.
Thus, F′ follows by ∈-induction, and F follows from F′. ⊳
Theorem 2. Given a Σ1 formula G, let F be the formula ∃f (I ∧ G),
where I is the formula “f is a function and π1 [f ] = TC(x) and ∀w ∈
TC(x)∃y∃g(y = f (w) ∧ g = f ↾ TC(x) ∧ Gw/x,g/f )”. Then ∃!yF is
provable in KP from ∃!yG; F ⇔ ∃f (f = F ↾ TC(x)∧G) is also provable.
Proof: The theorem may be proved by modifying the proof of theorem 17.4; a few observations on the modifications will be made. The
proof of the existence and uniqueness conditions for I make use of lemma
1, and the existence and uniqueness conditions for TC. In the proof of
the existence condition for F let f0 = ∪{f ∈ cf : ∃x ∈ TC(x0 )I};
then Ix0 /x,f0 /f follows. Lemma 1 is used in concluding ∃yF . For the
last claim, I ⇒ f = F ↾ TC(x) is shown. Suppose I ∧ w ∈ TC(x).
Iw/x,g/f ⇒ g = f ↾ w follows, from which Iw/x,g/f ⇔ g = f ↾ TC(w).
Fw/x ⇔ f (w) = y follows as before. ⊳
The predicate IsForm(x) can readily be defined by a recursion as
in theorem 2. It holds if and only if x = ⟨0, ⟨n, m⟩⟩ or x = ⟨1, ⟨n, m⟩⟩ or
∃w ∈ TC(x)(x = ⟨2, w⟩ or . . ..
There is a slight technicality, in that a 0-1 valued function IsFormF
must be defined first; IsForm(x) then equals IsFormF (x, 1), or
∀y(IsFormF (x, y) ⇒ y = 1). Alternatively, versions of theorems 17.4
and 2 can be given for predicates.
Using theorem 2 it is a routine exercise to show that the following
functions and predicates are ∆_1^{KP}, and that the existence and uniqueness
conditions for the functions are provable in KP.
- FrVar(f ), the set of free variables of the formula coded by the set
f (or ∅ if f is not the code of a formula).
- IsAsn(D, s, a), stating that a : s ↦ D where s is a set of variables.
- Sat(D, f, a).
As an example, Sat in the case where F is ∀xi G, is ∀u ∈ D Sat(D, g, a′)
where a′ is a ∪ {⟨i, u⟩} (i.e., the value of a formula which defines this).
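For a finite set D the recursion for Sat can be carried out directly. The following is a minimal sketch under stated assumptions: formula codes are nested Python tuples with the tags 0 to 8 of the coding above, elements of D are frozensets, and an assignment is a dict from variable indices to elements of D. The name sat is hypothetical, and the dict update {**a, i: u} replaces any earlier value of the variable, which the union a ∪ {⟨i, u⟩} in the text leaves implicit.

```python
# Sketch only: a satisfaction check over a finite set D of frozensets.

def sat(D, f, a):
    """Return True if the formula coded by f holds in D under assignment a."""
    tag, body = f
    if tag == 0:                      # x_n ∈ x_m
        n, m = body
        return a[n] in a[m]
    if tag == 1:                      # x_n = x_m
        n, m = body
        return a[n] == a[m]
    if tag == 2:                      # negation
        return not sat(D, body, a)
    if tag in (3, 4, 5, 6):           # ∧, ∨, ⇒, ⇔
        g, h = body
        p, q = sat(D, g, a), sat(D, h, a)
        return {3: p and q, 4: p or q, 5: (not p) or q, 6: p == q}[tag]
    if tag in (7, 8):                 # ∀, ∃ range over D only
        i, g = body
        values = (sat(D, g, {**a, i: u}) for u in D)
        return all(values) if tag == 7 else any(values)
    raise ValueError("not a formula code")

EMPTY = frozenset()
ONE = frozenset([EMPTY])
TWO = frozenset([EMPTY, ONE])
D = [EMPTY, ONE, TWO]

# "x0 has no members": ∀x1 ¬(x1 ∈ x0)
IS_EMPTY = (7, (1, (2, (0, (1, 0)))))
# "there is an empty set": ∃x0 ∀x1 ¬(x1 ∈ x0)
HAS_EMPTY = (8, (0, IS_EMPTY))
```

With these definitions, sat(D, IS_EMPTY, {0: EMPTY}) holds, sat(D, IS_EMPTY, {0: ONE}) fails, and the sentence HAS_EMPTY holds under the empty assignment.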
If F is a formula with free variables xi1 , . . . , xik let AsnF be a
formula with the additional free variables a, which is true if and only
if a is the assignment to xi1 , . . . , xik , determined by their values (i.e.,
such that a(ij ) = xij for 1 ≤ j ≤ k). The following theorem uses
abbreviations which should be familiar by now. It shows that for sets,
the two notions of a model for a sentence of set theory mentioned in
section 17 are equivalent.
Theorem 3. For any formula F, ⊢KP Sat(D, ⌜F⌝, AsnF(~x)) ⇔ F^D.
Proof: The proof is a straightforward if tedious induction on F . ⊳
19. Constructible sets.
Suppose S is a set, F is a formula, xi is a free variable of F , and a
is an assignment to the remaining free variables of F . Then the subset
T = {u ∈ S : Sat(S, ⌜F⌝, a′) where a′ = a ∪ {⟨i, u⟩}} is said to be the
subset defined by F and a in S. It should be clear from the preceding
section that the function T = DefBy(S, f, a) stating that this is the case
is ∆_1^{KP}. The existence condition follows using ∆1 collection.
Let Def(S) be {T : ∃f ∃a(T = DefBy(S, f, a))}. This is the subset
of Pow(S), consisting of those subsets which are defined in S by some F
and a. In model theory, the definable sets (or more generally predicates)
in a structure are similarly defined. A definition of a set T is said to have
parameters if a is nonempty, else to be parameter-free. Thus, Def(S) is
just the subsets of the structure S for the language of set theory, which
have a definition with parameters.
Theorem 1. The function Def(S) is ∆_1^{KP}.
Proof: The existence condition can be proved, because f can be
limited to Vω , and a to x<ω , the “finite sequences” in x, i.e., the functions from n to x for some integer n. See the proof of theorem 6 below
for further details. ⊳
The following definition was given by Kurt Gödel in 1939, and is
one of the most important definitions in set theory.
L0 = ∅
Lα+1 = Def(Lα )
Lα = ∪β<α Lβ for limit ordinals α
In the following, the notation x ↦ t will be used. Here, t is a term
involving x, and x ↦ t denotes the function whose value at x is t. This
is a convenient method for denoting a function, without having to give
it a name.
Theorem 2. The function α ↦ Lα is ∆_1^{KP}.
Proof: This follows from theorem 1 using theorem 17.4. ⊳
Theorem 3.
a. Lα is transitive.
b. Lβ ∈ Lα if β < α.
c. Lα ⊆ Vα .
d. Ln = Vn for n ∈ ω, and Lω = Vω .
e. Lα ∩ Ord = α.
Proof: Part a follows by induction using the fact that if S is transitive then Def(S) is transitive; indeed, if w ∈ x ∈ Def(S) then x ⊆ S, so
w ∈ S, so w is defined by xi ∈ xj and {⟨j, w⟩}. For part b, S ∈ Def(S)
for any set S, so Lα ∈ Lα+1 , and by part a Lα ⊆ Lα+1 . The claim,
together with Lβ ⊆ Lα , follow by induction on α. Part c follows by
induction, since Def(S) ⊆ Pow(S), and X ⊆ Y ⇒ Pow(X) ⊆ Pow(Y).
For part d, if S is finite then it is easily seen that Def(S) = Pow(S),
and the claim follows. For part e, by part c Lα ∩ Ord ⊆ α. On the other
hand it is easily seen that if Lα ∩ Ord = α then α ∈ Lα+1 , and the claim
follows. ⊳
Although it is a secondary topic, some further remarks may be
made about KP. Let KP∞ be KP, with the axiom of infinity added.
Theorems 1 and 2 hold with ZF replaced by KP∞ , and also theorem 6
below. The only admissible set not satisfying the axiom of infinity is
Vω . Any other admissible set D contains ω, Vω , and x<ω for any set
x ∈ D. Theorem 3 holds in any admissible set (except the second claim
of part e in Vω ). Suppose D is an admissible set and D ∩ Ord = α. By
absoluteness, Lβ in D is the same as Lβ for β < α. Lα is a subset of D.
Lα is an admissible set (this follows by arguments similar to theorem
4), and is the smallest admissible set D with D ∩ Ord = α.
Let L be the class ∪α Lα . As usual, this is an abbreviation for a
formula stating that x is in some Lα . Note that L is a transitive class;
it is known as the class of constructible sets. Theorem 5 below shows
that it is a model of ZF in the sense stated in section 17. For lemma
4, the notation “LimOrd” is introduced for the class of limit ordinals.
Also, the notion of a “subformula” of a formula F is required; the set of
these is defined by an obvious recursion. Neither lemma 4 nor theorem
5 require the axiom of choice.
Lemma 4. Suppose F is a formula with free variables x1 , . . . , xk .
Then ⊢ZF ∀α∃β ≥ α(β ∈ LimOrd ∧ ∀x1, . . . , xk ∈ Lβ (F^L ⇔ F^{Lβ})).
Proof: For a formula G with free variables w, x1, . . . , xk let BGw(α)
be the least β > α such that for all x1, . . . , xk ∈ Lα, if ∃w ∈ L G^L then
∃w ∈ Lβ G^L; such a β exists by full collection. Let α0 = α; let αi+1 be
the supremum over subformulas G of F and free variables w of G, of
BGw(αi); and let β = ∪i αi. It will be shown by induction on G that if G
is a subformula of F, with free variables y1, . . . , yl, and y1, . . . , yl ∈ Lβ,
then G^L ⇔ G^{Lβ}. The claim is immediate for atomic formulas, and
follows trivially for propositional connectives. It suffices to show the
claim for ∃wG. Suppose ∃w ∈ L G^L; then ∃w ∈ Lβ G^L, so inductively
∃w ∈ Lβ G^{Lβ}. Suppose ∃w ∈ Lβ G^{Lβ}; then inductively ∃w ∈ Lβ G^L, so
∃w ∈ L G^L. ⊳
Theorem 5. If F is an axiom of ZF then ⊢ZF F^L.
Proof: Since L is transitive, the axioms of extensionality and foundation hold in L. If a, b ∈ Lα then {a, b} is defined in Lα by x = a∨x = b,
so pairing holds. If a ∈ Lα then ∪a is defined in Lα by ∃b ∈ a(x ∈ b),
so union holds. ω ∈ Lω+1 , so infinity holds.
For separation, let F be a formula, and suppose the free variables
are assigned values in L; then for some α the values are in Lα . By
lemma 4 β may be chosen such that F^L ⇔ F^{Lβ} when the free variables
are assigned values in Lβ. Let x = {w ∈ y : F^{Lβ}}; by the definition of
Lβ+1 and theorem 18.3, x ∈ Lβ+1. By choice of β, x = {w ∈ y : F^L}.
Since x ∈ L, the axiom of separation for F has been shown.
For power set, suppose y ∈ L. Let x = {w ⊆ y : w ∈ L}. By power
set in V x is a set, and is a subset of L. Using replacement in V , x ⊆ Lα
for some α. By separation in L (which has just been shown), x ∈ L.
For replacement, suppose F defines a single valued predicate in L,
and u ∈ L. Let v = {y ∈ L : ∃x ∈ u F^L}. By replacement in V v is a
set, and is a subset of L. As for power set, it follows that v ∈ L. ⊳
Theorem 5 is a remarkable fact. The axioms of set theory seem clear
enough, and there is no reason to suspect that such a proper class might
exist. Gödel’s discovery of it in 1939 raised questions which remain
under study to this day, the most obvious being, whether every set is
constructible. The statement that this is so is called the hypothesis of
constructibility, and V = L is frequently written for it. It will be seen
later that it is independent of ZFC.
Suppose A is a transitive model of ZF, and α = A ∩ Ord (or Ord if
A is a proper class). Then x ∈ L^A if and only if ∃β ∈ A(x ∈ (Lβ)^A). Using
absoluteness x ∈ L^A if and only if ∃β ∈ α(x ∈ Lβ). That is, L^A = Lα.
If A is a proper class, L^A = L. Thus, L is the smallest transitive proper
class which is a model of ZF. Also, L^L = L, and (since clearly V^L = L),
V = L holds in L.
Let ⊥ denote an obviously contradictory statement. Suppose V =
L ⊢ZF ⊥. Modify the proof by relativizing each statement to L. The
resulting sequence of formulas may be expanded to a proof of ⊥^L
in ZF; this follows using theorem 5, some facts about the predicate
calculus, and ⊢ZF (V = L)^L. By suitable formalization, it follows that
Con(ZF+V = L) is provable in ZF (in fact PA) from
Con(ZF).
The hypothesis of constructibility is a very strong statement. In
[Shelah2] the statement is made that “A major preliminary obstacle to
this dream is the lack of a good candidate to be a test problem, since
so many questions have already been settled under the assumption V =
L”. V = L is so powerful because L can be shown to have a variety of
properties; these are not provable in ZFC for “all” sets.
For example, L has a “definable well-order”. The notion of a (strict)
well-order < on a class S needs to be clarified. It is a proper class of
ordered pairs, such that for each pair p, π1(p) ∈ S and π2(p) ∈ S. It
satisfies the axioms for a strict partial order, namely x < y ∧ y < z ⇒ x <
z and x ≮ x; and trichotomy x ∈ S ∧ y ∈ S ⇒ (x < y ∨ x = y ∨ y < x),
so that it is a strict linear order on S.
For the remaining restrictions usage varies. At the least, every
nonempty subset x ⊆ S must contain a <-minimal element v (i.e.,
∀u ∈ x(u ≮ v)). The additional requirement that {u : u < v} be a
set for all v ∈ S may be imposed. Since relations satisfying only the
first restriction have uses, these will be called well-orders; if the second
restriction holds also the well-order will be said to have small extensions.
Theorem 6. There are binary predicates <α on Lα such that, for
α < β, <α ⊆ <β, and x ∈ Lα ∧ y ∈ Lβ − Lα ⇒ x <β y. The map α ↦ <α
is Σ1 and the existence and uniqueness conditions are provable in KP.
Proof: The proof is an exercise in definition by Σ1 recursion; an
outline is given below. ⊳
Write x <L y for ∃α(x <α y). It is readily seen from the theorem
that <L is a well-order on L with small extensions, and has a Σ1 definition. As a consequence, the axiom of choice holds in L, indeed there
is a definable function on the nonempty sets in L, which assigns to each
nonempty set x its <L -least element.
It follows that AC is provable in ZF from V = L, from which it
follows that if ZF is consistent then ¬AC is not provable. It is also
true that (<L)^L (the predicate defined by the relativized formula) equals
<L. Since the formula is Σ1, x (<L)^L y ⇒ x <L y. Since <L satisfies
trichotomy, x <L y if and only if x ∈ L ∧ y ∈ L ∧ x ≠ y ∧ y ≮L x, and
x (<L)^L y follows.
The well-orders <α are defined by recursion.
<0 = ∅,
x <α+1 y if and only if x ∈ Lα+1 ∧ y ∈ Lα+1 ∧
    (x ∈ Lα ∧ y ∈ Lα ∧ x <α y ∨
     x ∈ Lα ∧ y ∉ Lα ∨
     x ∉ Lα ∧ y ∉ Lα ∧ P(x, y)).
The predicate P (x, y) holds, if there is a definition of x which precedes
the first definition of y; an outline of its definition will be given below.
For a limit ordinal α, <α = ∪β<α <β .
For the definition of P some functions of general use will be defined first. These have Σ1 definitions and the existence and uniqueness conditions are provable in ZF. If v is a collection of subsets of
x let Add1(v, x) equal {s ∪ {a} : s ∈ v, a ∈ x}. Let [x]≤n denote
the subsets s ⊆ x with |s| ≤ n; this may be defined by the recursion
[x]≤0 = {∅}, [x]≤n+1 = [x]≤n ∪ Add1([x]≤n, x). The set of finite subsets [x]<ω
equals ∪n∈ω [x]≤n. A similar recursion may be used to define the set xn
of sequences in x of length n; and the set x<ω of finite sequences.
The finite rank sets may be defined without the power set axiom,
by the recursion V0 = ∅, Vn+1 = [Vn ]<ω , and Vω = ∪n Vn .
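The recursion for the finite rank sets can be run for small n. The sketch below rests on two assumptions that should be read as part of the sketch rather than of the text: the subsets of size at most 0 are taken to be {∅}, and each stage of the recursion for [x]≤n keeps the previous stage. The function names are hypothetical, and for a finite set x the finite subsets are obtained after |x| steps.

```python
# Sketch only: hereditarily finite sets as nested frozensets.

def add1(v, x):
    """Add1(v, x): every s in v extended by one element a of x."""
    return frozenset(s | frozenset([a]) for s in v for a in x)

def subsets_upto(x, n):
    """Subsets of x with at most n elements (assumed base case {∅})."""
    level = frozenset([frozenset()])
    for _ in range(n):
        level = level | add1(level, x)   # assumed: keep the earlier stage
    return level

def finite_subsets(x):
    """All finite subsets of a finite set x; |x| steps suffice."""
    return subsets_upto(x, len(x))

def V(n):
    """V_n for a small integer n, without using a power set operation."""
    level = frozenset()
    for _ in range(n):
        level = finite_subsets(level)
    return level
```

The sizes of V(0), ..., V(4) come out as 0, 1, 2, 4, 16, as expected from |V_{n+1}| = 2^{|V_n|}; V(5) already has 65536 elements, so the sketch is only practical for very small n.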
Let <fn be the well-order on Vn defined recursively as follows.
<f0 = ∅,
x <fn+1 y if and only if x ∈ Vn+1 ∧ y ∈ Vn+1 ∧
    (x ∈ Vn ∧ y ∈ Vn ∧ x <fn y ∨
     x ∈ Vn ∧ y ∉ Vn ∨
     x ∉ Vn ∧ y ∉ Vn ∧ ∃s ∈ Vn (s ∉ x ∧ s ∈ y ∧
         ∀t ∈ Vn (t <fn s ⇒ (t ∈ x ⇔ t ∈ y)))).
Let <f denote ∪n <fn ; this is a well-order on Vω .
Suppose ≤ is a linear order on a set x, and γ is an ordinal. Let <lex
be the relation on x^γ, such that f <lex g if and only if ∃β < γ(f(β) <
g(β) ∧ ∀α < β(f(α) = g(α))). A straightforward verification shows that
this relation is the strict part of a linear order. This order has various
uses in mathematics, including set theory, and is called the lexicographic
order.
If < is a well-order, in general <lex is not; however if γ is a finite
ordinal n then <lex is a well-order. To see this, given any set s ⊆ xn ,
define a “leftmost” element f inductively, by letting f (i) be the least
value such that ⟨f(0), . . . , f(i)⟩ is a prefix of an element of s.
Given a well-order < on a set x, let <l be the binary relation on
x<ω, defined as follows. For s, t ∈ x<ω let m = Dom(s) and n = Dom(t).
Then s <l t if and only if m < n ∨ (m = n ∧ s <lex t). This relation is
easily seen to be a well-order on x<ω .
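The two orders on finite sequences, and the "leftmost element" argument, can be illustrated on tuples. This is a sketch under assumptions: sequences are Python tuples, the underlying strict order is passed in as a function less, and the names lex_less, shortlex_less and leftmost are hypothetical.

```python
# Sketch only: lexicographic and length-first orders on finite sequences.

def lex_less(s, t, less):
    """s <lex t for tuples of the same length: decide at the first difference."""
    for a, b in zip(s, t):
        if less(a, b):
            return True
        if less(b, a):
            return False
    return False  # the sequences are equal

def shortlex_less(s, t, less):
    """s <l t: shorter sequences first, then lexicographic order."""
    if len(s) != len(t):
        return len(s) < len(t)
    return lex_less(s, t, less)

def leftmost(S, less):
    """Least element of a nonempty set S of tuples of one common length,
    found coordinate by coordinate as in the well-order argument."""
    candidates = list(S)
    for i in range(len(candidates[0])):
        best = candidates[0][i]
        for c in candidates:
            if less(c[i], best):
                best = c[i]
        # keep only the sequences whose prefix agrees with the least choices
        candidates = [c for c in candidates if c[i] == best]
    return candidates[0]
```

At each coordinate leftmost picks the least value that still extends to an element of S, which is exactly why finiteness of the length matters: the process stops after n steps.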
To define P(x, y), an element of Lα is considered to be given by
a pair d = ⟨f, a⟩ where f ∈ Vω and a ∈ (Lα)<ω. Let D denote the
set of these; for d1, d2 ∈ D let d1 <D d2 if and only if f1 <f f2 or
f1 = f2 and a1 <lα a2; and let VD(d) denote DefBy(Lα, f, a). Then P
is ∃d1, d2 ∈ D(x = VD(d1) ∧ y = VD(d2) ∧ ∀d3 ∈ D(d3 <D d2 ⇒ y ≠
VD(d3)) ∧ d1 <D d2).
20. CH is true in L.
A constructible x ⊆ ω must be constructed by some stage Lα . The
nature of L allows using “model theoretic” arguments to first construct
a substructure containing x, second collapse it using the Mostowski
collapsing lemma, and third show that the result is Lβ for some β < ℵ_1^L.
The same argument works for any infinite cardinal κ, and it follows that
GCH holds in L.
There are other methods of proving this. For example Pow(κ) ⊆
Hκ+ , and by a model theoretic construction it can be shown that in L,
Hκ+ ⊆ Lκ+ . This proof may be found in [Drake]. The first proof will
be given here, because all three steps are of wide use in set theory. The
third is called the “condensation lemma”.
If S is a structure for a first order language, and T ⊆ S is a substructure, then T is said to be an elementary substructure if, for every
formula F (and suitable list of variables), and vector ~x with xi ∈ T for
1 ≤ i ≤ k, F̂ (~x) has the same value in T as it has in S. The notation
T ≺ S is used to denote this.
For future use, in the case of set theory (and other settings where
bounded quantifiers are present), if the requirement need only hold for
Σn formulas, T is said to be a Σn -elementary substructure, and this
is denoted T ≺n S. ∆0 will be used for Σ0 ; the cases ∆0 and Σ1 are
frequently of special interest. Note that if T ≺n S then T is also a
“Πn -elementary substructure”, that is, the truth value of a Πn formula
with free variables assigned values in T is the same, in either S or T .
The first step is accomplished by constructing an elementary substructure of Lα , which contains x and has small cardinality. This is a
standard construction in mathematical logic (e.g., proposition 3.3.2 of
[ChaKei]); lemma 2 is a version for set theory. Lemma 1 is known as
Tarski’s criterion, for a substructure to be elementary.
Lemma 1. Suppose T ⊆ S is a substructure. Suppose for each
formula F , and each ~t with ti ∈ T for 1 ≤ i ≤ k, if F (s, ~t) is true in S
for some s ∈ S then F (s, ~t) is true in S for some s ∈ T . Then T is an
elementary substructure.
Proof: The proof is by induction on F . For ¬G, since the value
in T and S are the same for G they are the same for ¬G; the other
propositional connectives are similar. For ∃wG, if the value is true in
T then (using the induction hypothesis) it is true in S, and if it is
true in S then (using the induction hypothesis and the hypothesis of the
lemma) it is true in T . ⊳
Lemma 2. Suppose S is a structure, and X ⊆ S. Then there is a
structure T with X ⊆ T ≺ S, and |T | = max(|X|, ℵ0 ).
Proof: Suppose F is a formula, w is a free variable, and x1 , . . . xk
are the remaining free variables in alphabetic order. Let σF w (~s) be a
value r such that F is true in S with r assigned to w and ~s to ~x, if such an r exists, else ∅. Let T be the
“closure” of X under the functions σF w . That is, let T0 = X; let Tn+1
equal Tn , with σF w (~s) added for all F , w, and ~s; and let T = ∪n Tn . It
is easy to see by lemma 1 that T is an elementary substructure, and the
cardinality claim is also easily seen. ⊳
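The closure step of the proof is a fixpoint iteration, which can be sketched for a finite carrier. The functions passed in below are arbitrary finitary operations standing in for Skolem functions; genuine Skolem functions are obtained from formulas, which this sketch does not attempt. The name hull and the representation of each function as a pair (arity, callable) are assumptions.

```python
# Sketch only: closing a set X under finitely many finitary functions.
from itertools import product

def hull(X, functions):
    """Least set containing X and closed under the given functions.

    functions is a list of (arity, f) pairs; the loop terminates when
    the carrier reachable from X is finite.
    """
    T = set(X)
    while True:
        # T_{n+1}: add every value of every function on arguments from T_n
        new = {f(*args)
               for arity, f in functions
               for args in product(T, repeat=arity)} - T
        if not new:
            return T
        T |= new
```

For example, hull({1}, [(1, lambda x: (x + 4) % 12)]) is {1, 5, 9}. Each stage adds at most one value per function and argument tuple, which is the source of the cardinality bound in the lemma.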
The function σF w is called a Skolem function. A substructure constructed as in the lemma is called a Skolem hull. The axiom of choice is
used in the proof of lemma 2; this can be avoided in the case S = Lα .
Namely, let σF w be the <L -least r. With these Skolem functions, T
is the smallest elementary substructure of Lα containing X; see lemma
II.5.3 of [Devlin].
Lemma 3. The predicate “y = Lγ ” is absolute for any Lα where
α ∈ LimOrd and α > ω. Also, the existence condition for y holds in Lα .
Remarks on proof: The existence condition follows by theorem
19.3.b. Let G be the functions y = f (~x) such that there is a formula
∃wG defining f where G is ∆0 , and an integer i, such that when γ < α,
γ > ω, and xj ∈ Lγ for all j, then y ∈ Lγ+i ; and there is a w ∈ Lγ+i .
The lemma may be proved by showing that the functions Sat, DefBy,
Def, y = Lγ , and any others that are needed are in G. Since the definitions have only been sketched, a detailed proof is omitted. Note that
in a definition by recursion, if a bound γ + i on the rank of y is known
then a bound on, say, F ↾ x is also, since F ↾ x is a definable subset of
Lγ+i+2 . A detailed proof can be found in [Devlin]. ⊳
Lemma 4. Suppose S is a set satisfying the axiom of extensionality,
π : S ↦ T is the collapsing isomorphism, and X ⊆ S is transitive. Then
π(x) = x for all x ∈ X.
Proof: This follows by induction on ∈: π(x) = {π(w) : w ∈ x∩S} =
{w : w ∈ x} = x. ⊳
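For a finite extensional set of hereditarily finite sets the collapsing map can be computed from the formula π(x) = {π(w) : w ∈ x ∩ S}. The following is a sketch with sets represented as nested frozensets; the name collapse is hypothetical, and extensionality of S is assumed rather than checked.

```python
# Sketch only: the transitive collapse of a finite extensional set S.

def collapse(S):
    """Return the map π as a dict, where π(x) = {π(w) : w ∈ x ∩ S}."""
    S = frozenset(S)
    memo = {}

    def pi(x):
        if x not in memo:
            memo[x] = frozenset(pi(w) for w in x if w in S)
        return memo[x]

    return {x: pi(x) for x in S}

EMPTY = frozenset()
ONE = frozenset([EMPTY])
TWO = frozenset([EMPTY, ONE])
THREE = frozenset([EMPTY, ONE, TWO])

# S is extensional but not transitive: THREE ∈ x while THREE ∉ S.
x = frozenset([ONE, THREE])
S = {EMPTY, ONE, x}
PI = collapse(S)
```

Here the transitive subset {EMPTY, ONE} of S is fixed pointwise, as lemma 4 states, while x is collapsed to {ONE}; the image {EMPTY, ONE, {ONE}} is transitive.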
Lemma 5. Suppose N ≺1 M , F is a Σ1 formula, and |=M ∀uF ;
then |=N ∀uF .
Proof: If u ∈ N then |=M F(u), so |=N F(u). ⊳
Note that if S satisfies extensionality (for example if it is transitive),
and T ≺0 S, then T satisfies extensionality, and so the transitive collapse
of T may be taken.
Lemma 6 (Condensation lemma). Suppose α ∈ LimOrd, S ≺1 Lα ,
and T is the transitive collapse of S. Then T = Lβ for some β ∈ LimOrd
with β ≤ α.
Proof: Let β = T ∩ Ord; then by lemma 5 and the fact that T is
∈-isomorphic to S, β ∈ LimOrd. Let ∃wG be a formula as in lemma 3
defining “y = Lγ ”. Then |=Lα ∀γ∃y∃wG, so by lemma 5 and the fact
that T is ∈-isomorphic to S, |=T ∀γ∃y∃wG. Thus, Lγ ∈ T for all γ < β
(because T is transitive, G is ∆0 , and Ord is ∆0 ), and so ∪γ<β Lγ ⊆ T .
Also |=Lα ∀x∃γ∃y∃w(G ∧ x ∈ y), whence |=S ∀x∃γ∃y∃w(G ∧ x ∈ y),
and T ⊆ ∪γ<β Lγ . Since Lβ = ∪γ<β Lγ the lemma is proved. ⊳
Lemma 7. For α ≥ ω, |Lα | = |α|.
Proof: By theorem 19.3.e, |α| ≤ |Lα |. That |Lα | ≤ |α| follows by
induction on α. First, |Lω | = ℵ0 . Second, |Lα+1 | ≤ ℵ0 · |Lα | ≤ |α + 1|.
Third, for α ∈ LimOrd | ∪β<α Lβ | ≤ |α| · |α| = |α|. ⊳
Theorem 8. GCHL .
Proof: Suppose x ⊆ κ and x ∈ L; then x ∈ Lα for some α ≥ κ.
Let S be such that S ≺1 Lα , κ ∪ {x} ⊆ S, and |S| = κ. Let Lγ be
the transitive collapse of S; then γ < κ+ , and by lemma 4 κ ⊆ Lγ ,
whence π(x) = x and x ∈ Lγ . Hence, every constructible subset of κ is
in Lκ+ . The foregoing is an argument in ZF, and it follows that in L,
|Pow(κ)| = κ+ . ⊳
From this, V = L ⊢ZF GCH follows, and also, if ZF is consistent
then so is ZFC+GCH.
21. Forcing.
That GCH is consistent with ZFC is proved by showing that there
is a proper class (namely L) which is a model of ZFC+GCH. It is not
possible to show that ¬GCH is consistent with ZFC by this method,
called the method of inner models, where an inner model is a transitive
class which is a model of ZF. Indeed, it is not possible to show that
V ≠ L is consistent using an inner model. Suppose M is a transitive
proper class, the axioms of ZF hold in M, and ⊢ZF (V ≠ L)^M. Then
since V = L ⊢ZF M = L, V = L ⊢ZF (V ≠ L)^L, so V = L ⊢ZF V ≠ L,
so ⊢ZF V ≠ L, so ZF is inconsistent. The transitivity requirement can
be removed; if M is any model then ∪α TC(M ∩ Vα ) is transitive and
∈-isomorphic to M .
An alternative approach, which turns out to work, is to start with
a transitive model M of ZFC. A set G which is not in M is then added,
to obtain a new model M [G]. G can be constructed in such a way that
various statements in the language of set theory (for example ¬GCH)
can be “forced” to hold in M [G]. This approach was discovered by Paul
Cohen in 1963, and has been in extensive use ever since.
There are some logical subtleties, which are discussed at the end
of the section. For example, it is not provable in ZFC that there is an
∈-model of ZFC. However, these complications can be circumvented, in
a manner well-known to set theorists, who therefore usually argue in the
above manner. To quote [Geschke]: “In order to prove the consistency
of ZFC+¬CH we pretend that there is a transitive set M such that
(M ,∈) is a model of ZFC. Using M we construct another transitive set
N such that (N ,∈) satisfies ZFC but not CH.” The method is quite
flexible; M can be a transitive class, even though if V = L there is no
way of “actually” enlarging M .
Say that a partially ordered set is a pair hP, ≤i where P is a set
and ≤ is a partial order on P . As usual, P rather than hP, ≤i is often
used to denote a partially ordered set. Also, the abbreviation “poset”
is in common use. To construct G, a partially ordered set P which is
a member of M is constructed. The pair hM, P i is called a “notion of
forcing”, or a “setting for forcing”. A definition is given of when a subset
of P is “generic”. Supposing G is a generic subset, a model M [G], the
“generic extension” of the “ground model” M by G, is defined. In a
generic extension, sentences will be true if and only if they are “forced”
to be, where this is a predicate which can be defined using the partial
order.
To specify further details some notions from partial order theory are
needed. First it should be noted that there are two conventions in use
for notions of forcing. Elements of P are called forcing conditions, and
there is a notion of one condition being stronger than another. Some
authors use p ≤ q to denote that p is stronger than q, and some use
p ≥ q. [Jech2] uses p ≤ q, and this will be used here.
If P is a partially ordered set say that a subset S ⊆ P is ≤-closed if
p ∈ S and q ≤ p imply q ∈ S. For p ∈ P let p≤ = {q : q ≤ p}. Then S is
≤-closed if and only if S = ∪p∈S p≤. ≥-closed sets are defined similarly,
and also p≥ , p< , etc.
A subset D ⊆ P is said to be dense if ∀p ∈ P ∃q ∈ D(q ≤ p).
This definition may be given a topological interpretation. Recall that
a subset of a topological space is said to be dense if it intersects every
nonempty open set. In a partially ordered set P , the ≤-closed sets form
a topology, with the sets p≤ for p ∈ P forming a base. D is dense as
defined above if and only if it is dense in this topology.
A subset F ⊆ P is said to be a filter if it is nonempty, ≥-closed,
and whenever p, q ∈ F there is an r ∈ F with r ≤ p and r ≤ q.
Given a notion of forcing hM, P i, a filter G ⊆ P is said to be
M-generic if G ∩ D ≠ ∅ for any dense D ⊆ P such that D ∈ M.
A basic example of an M -generic filter is as follows. Let P be the
set of functions p : d 7→ {0, 1} where d is a finite subset of ω. Let the
partial order on P be ⊇, so that a condition is stronger if its value is
defined for further integers. If M is a transitive model of ZFC then
P ∈ M . Suppose G ⊆ P is an M -generic filter.
- Since G is a filter, it follows that f = ∪G is a function with
Dom[f ] ⊆ ω.
- For any n ∈ ω the set {p ∈ P : n ∈ Dom[p]} is a dense subset of P ,
and is in M . Since G is generic, f must be defined at n. That is,
Dom[f ] = ω.
The function f is called a “Cohen generic real”. This is an example of
the use of the term “real” in sense 4 of chapter 14.
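The combinatorics of this example can be sketched in code, with one important caveat: an M-generic filter must meet every dense set in M, which no finite computation achieves. The sketch only builds a descending chain of conditions meeting the dense sets for n below a chosen bound. Conditions are Python dicts from integers to {0, 1}; all function names, and the bits parameter that chooses new values, are assumptions.

```python
# Sketch only: finite partial functions from ω to {0, 1} as dicts.

def stronger(p, q):
    """p ≤ q in the forcing order, i.e. p ⊇ q as sets of pairs."""
    return all(k in p and p[k] == v for k, v in q.items())

def compatible(p, q):
    """p and q have a common extension iff they agree where both are defined."""
    return all(q[k] == v for k, v in p.items() if k in q)

def meet_dense(p, n, bit):
    """Witness density of {q : n ∈ Dom(q)}: a condition q ≤ p defined at n."""
    return p if n in p else {**p, n: bit}

def chain_meeting(p0, bound, bits):
    """Descending chain from p0 meeting the dense sets for n < bound."""
    chain = [dict(p0)]
    for n in range(bound):
        chain.append(meet_dense(chain[-1], n, bits(n)))
    return chain
```

The last condition of chain_meeting({3: 1}, 5, lambda n: n % 2) is defined on 0, ..., 4 and still takes the value 1 at 3; it is a finite approximation to the function f = ∪G, not a generic real.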
A generic extension M [G] will contain G as an element. Under mild
restrictions on P, G ∉ M, and M[G] is a proper extension. Elements
p, q in a partially ordered set P are said to be compatible if they have a
common extension, that is, if there is an r ∈ P such that r ≤ p and
r ≤ q; this may also be stated as p≤ ∩ q≤ ≠ ∅. Elements which are not
compatible are said to be incompatible. Suppose P has the property
that for any p, p≤ contains incompatible elements. Suppose F ⊆ P is
a filter and F ∈ M. Given p, if p ∉ F then clearly p≤ ∩ (P − F) is
nonempty; otherwise there are incompatible elements r1, r2 ∈ p≤, and
at most one of them can be in F. Thus, P − F is dense, and is an
element of M, so G ∩ (P − F) ≠ ∅, so G ≠ F.
There are various methods of constructing the model M [G]. The
method of Boolean valued models, at least as treated in [Jech2], consists
of the following steps.
1. A Boolean algebra B is constructed from P .
2. A “Boolean valued model” V B is defined, by giving a recursive
definition of a class. An element is a function mapping elements of
lower rank to B.
3. Let M B be the class defined in M by the definition of V B ; each
element of M B is a “name”, and is an element of M .
4. For each name x ∈ M B , a value xG is defined by a recursive definition; and M [G] = {xG : x ∈ M B }.
The forcing predicate p ⊩ F is defined for each forcing condition p and
formula F in the “forcing language”, i.e., the language of set theory with
M B names added as constants.
There are more direct methods, such as that used in [Kunen1].
1’. A class V P is defined recursively; an element is a relation τ ⊆ N × P,
where N is the set of elements of lower rank. Alternatively, it is
a function whose domain is a set of elements of lower rank, with
f (w) a subset of P .
2’. Let M P be the class defined in M by the definition of V P , each
element of M P is a “name”, and is an element of M .
3’. For each name τ ∈ M P, a value τG is defined by a recursive definition; and M[G] = {τG : τ ∈ M P}.
The forcing predicate p ⊩ F is defined for each forcing condition p and
formula F in the “forcing language”, i.e., the language of set theory with
M P names added as constants.
The first method has the disadvantage that the extra machinery of
the Boolean algebra needs to be introduced; and the advantage that the
definition of the forcing relation is simpler, and many proofs are also.
Hereafter, the first approach will be used.
Recall the definition of a Boolean algebra from section 5; the symbols ⊔, ⊓, † , 0, 1 will be used for the operations, rather than ∪, ∩, c , ∅,
and U , with the latter reserved for their usual meaning on Pow(U ) for
a set U .
In a partially ordered set P , an element x ∈ P is said to be an
upper bound for a subset S ⊆ P if x ≥ y for all y ∈ S. An upper
bound x is a least upper bound (or supremum) if x ≤ x′ for any upper
bound x′. If a supremum exists then it is unique, as is easily seen.
The notions of lower bound and greatest lower bound (or infimum) are
defined “dually”; (x ≤ y for all y ∈ S, x ≥ x′ for any lower bound).
Lemma 1. Suppose B is a Boolean algebra, and x, y ∈ B.
a. x ⊔ x = x, x ⊓ x = x, x ⊔ 1 = 1, x ⊓ 0 = 0, x ⊔ (x ⊓ y) = x, and
x ⊓ (x ⊔ y) = x.
b. x ⊔ y = y if and only if x ⊓ y = x.
c. Defining x ≤ y to hold if and only if x ⊔ y = y, ≤ is a partial order
on B.
d. x ⊔ y is the least upper bound of {x, y} in the order ≤, and x ⊓ y is
the greatest lower bound.
Proof: x⊔x = (x⊔x)⊓1 = (x⊔x)⊓(x⊔x† ) = x⊔(x⊓x† ) = x⊔0 = x.
The proof that x ⊓ x = x is dual. x ⊔ 1 = x ⊔ (x ⊔ x† ) = x ⊔ x† = 1.
The proof that x ⊓ 0 = 0 is dual. x ⊔ (x ⊓ y) = (x ⊓ 1) ⊔ (x ⊓ y) =
x ⊓ (1 ⊔ y) = x ⊓ 1 = x. The proof that x ⊓ (x ⊔ y) = x is dual. If
x ⊔ y = y then x ⊓ y = x ⊓ (x ⊔ y) = x. The argument that x ⊓ y = x
implies x ⊔ y = y is dual. x ≤ x since x ⊔ x = x. If x ⊔ y = y and
y ⊔ z = z then x ⊔ z = x ⊔ (y ⊔ z) = (x ⊔ y) ⊔ z = y ⊔ z = z. If
x ⊔ y = y and y ⊔ x = x then x = y. This shows that ≤ is a partial
order. That x ≤ x ⊔ y follows since x ⊔ (x ⊔ y) = (x ⊔ x) ⊔ y = x ⊔ y;
y ≤ x ⊔ y follows similarly. If x, y ≤ u then x ⊔ u = y ⊔ u = u; hence
(x ⊔ y) ⊔ u = x ⊔ (y ⊔ u) = x ⊔ u = u, so x ⊔ y ≤ u. This shows that x ⊔ y
is the least upper bound. The argument that x ⊓ y is the greatest lower
bound is dual. ⊳
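Lemma 1 can be checked mechanically on a small example. The following is a minimal sketch in Python, assuming the Boolean algebra is Pow(U) for the hypothetical three-element set U = {0, 1, 2}, with ⊔ and ⊓ interpreted as ∪ and ∩; the names join, meet, leq and check_lemma_1 are illustrative only. It confirms the lemma for this one algebra and is not a proof.

```python
from itertools import combinations

U = frozenset({0, 1, 2})  # hypothetical underlying set
# All subsets of U, i.e. the elements of the Boolean algebra Pow(U).
B = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]
ZERO, ONE = frozenset(), U


def join(x, y):
    return x | y


def meet(x, y):
    return x & y


def leq(x, y):
    # The order of lemma 1.c: x <= y iff x join y = y.
    return join(x, y) == y


def check_lemma_1():
    for x in B:
        # Part a, identities involving x alone.
        assert join(x, x) == x and meet(x, x) == x
        assert join(x, ONE) == ONE and meet(x, ZERO) == ZERO
        for y in B:
            # Part a, absorption laws.
            assert join(x, meet(x, y)) == x
            assert meet(x, join(x, y)) == x
            # Part b.
            assert (join(x, y) == y) == (meet(x, y) == x)
            # Part d: join is the least upper bound, meet the greatest lower bound.
            uppers = [u for u in B if leq(x, u) and leq(y, u)]
            lowers = [l for l in B if leq(l, x) and leq(l, y)]
            assert join(x, y) in uppers
            assert all(leq(join(x, y), u) for u in uppers)
            assert meet(x, y) in lowers
            assert all(leq(l, meet(x, y)) for l in lowers)
    return True
```

Calling check_lemma_1() runs every identity over all 8 elements of Pow(U) and raises an AssertionError if any of them fails.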
A Boolean algebra is said to be complete if any subset S ⊆ B has
a least upper bound (denoted ⊔S) and a greatest lower bound (denoted
⊓S).
The construction of the Boolean algebra B of step 1 above makes
use of the notion of the regular open sets of a topological space X.
Recall the definitions of the interior and closure of a subset W ⊆ X,
given in section 14; these will be denoted W int and W cl respectively.
These operations have the following properties.
- If W1 ⊆ W2 then W1int ⊆ W2int and W1cl ⊆ W2cl .
- For an open set U , U ⊆ (U cl )int (since U ⊆ U cl and U int = U ).
- For a closed set K, (K int )cl ⊆ K (since K int ⊆ K and K cl = K).
- For an open set U , ((U cl )int )cl = U cl (since U cl ⊆ ((U cl )int )cl ⊆
U cl ).
- For a closed set K, ((K int )cl )int = K int (since K int ⊇ ((K int )cl )int
⊇ K int ).
An open set U is said to be regular open if (U cl )int = U . It is easily
seen from the above facts that if U is an open set then (U cl )int is a
regular open set. An example of an open set which is not regular open
is R − {0}; the closure of this is R, and the interior of the closure is
again R.
If T is a topological space let ro(T ) denote the regular open sets,
equipped with the following operations.
- x ⊔ y = ((x ∪ y)cl )int .
- x ⊓ y = x ∩ y.
- x† = (xc )int .
- 0 = ∅.
- 1 = T.
Theorem 2. ro(T ) is a complete Boolean algebra.
Remarks on proof: A structure with operations ⊔ and ⊓ is said to
be a lattice if it satisfies the axioms
- x ⊔ x = x, x ⊔ y = y ⊔ x, x ⊔ (y ⊔ z) = (x ⊔ y) ⊔ z
- x ⊓ x = x, x ⊓ y = y ⊓ x, x ⊓ (y ⊓ z) = (x ⊓ y) ⊓ z
- x ⊔ (x ⊓ y) = x, x ⊓ (x ⊔ y) = x
The definition of ≤ in a Boolean algebra given above may be given in
any lattice. An element l such that l ≤ x for all x is unique if it exists,
and is called a 0 element. An element g such that g ≥ x for all x is
unique if it exists, and is called a 1 element.
A lattice is said to be complete if the greatest lower bound and
least upper bound exist for every subset. It suffices that the greatest
lower bound exist: Suppose S is a subset, and let U be the set of upper
bounds of S. The greatest lower bound of U is the least upper bound
of S.
A Heyting algebra is a lattice with a 1 element, and a binary operation →, such that the following axioms hold.
- x → x = 1;
- x ⊓ (x → y) = x ⊓ y;
- y ⊓ (x → y) = y;
- x → (y ⊓ z) = (x → y) ⊓ (x → z).
In a Heyting algebra, x → y is the largest element among the elements
z such that x ⊓ z ≤ y. Conversely if such an element always exists, the
operation → may be defined.
Suppose L is a Heyting algebra with a 0 element. The pseudocomplement xp of an element is defined to be x → 0. The following may
be shown.
- If x ≤ y then xp ≥ y p , and x ≤ y p if and only if xp ≥ y.
- x ≤ xpp , xppp = xp , and (x ⊔ y)p = xp ⊓ y p
- Letting Lp denote {xp : x ∈ L}, Lp = {x ∈ L : xpp = x}.
- Lp is a Boolean algebra, with greatest lower bound x ⊓ y, least
upper bound (xp ⊓ y p )p = (x ⊔ y)pp , and complement xp .
A complete lattice has a Heyting algebra → operation if and only if
it satisfies the distributive law x⊓(⊔i yi ) = ⊔i (x⊓yi ) for any family {yi }.
Such a lattice is called a complete Heyting algebra. If L is a complete
Heyting algebra then the greatest lower bound in Lp is the same as the
greatest lower bound in L. It follows that Lp is a complete Boolean
algebra.
Further details regarding the above stated facts may be found in
[Dowd1].
The open sets of a topological space T , with the subset order, form
a complete Heyting algebra; ∪ is the least upper bound, ∩ is the greatest
lower bound, ∅ is a 0 element, and T is a 1 element. Let Ω(T ) denote
this algebra.
For U ∈ Ω(T ), U p = (U c )int since by definition this is the largest
open set contained in U c . If U is regular open then V p = U where
V = (U cl )c , and if V p = U for some open V then U is regular. That is,
the regular open sets are exactly the complete Boolean algebra Ω(T )p .
⊳
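The relation between Ω(T ), the pseudocomplement, and the regular open sets can be seen on a very small space. The sketch below is in Python and assumes a hypothetical topology on X = {0, 1, 2} whose open sets are ∅, {0}, {2}, {0, 2} and X; the open set {0, 2} plays the role of R − {0} in the example given earlier. The function names are illustrative only.

```python
X = frozenset({0, 1, 2})
# Hypothetical topology on X (closed under unions and intersections).
OPENS = [frozenset(s) for s in ([], [0], [2], [0, 2], [0, 1, 2])]


def interior(w):
    # Union of all open sets contained in w.
    return frozenset().union(*[u for u in OPENS if u <= w])


def closure(w):
    # Complement of the interior of the complement.
    return X - interior(X - w)


def pseudo(u):
    # Pseudocomplement in the Heyting algebra of open sets.
    return interior(X - u)


def arrow(x, y):
    # Largest open z such that (x meet z) <= y.
    return frozenset().union(*[z for z in OPENS if (x & z) <= y])


regular_open = [u for u in OPENS if interior(closure(u)) == u]
pseudo_image = {pseudo(u) for u in OPENS}
```

In this space regular_open consists of ∅, {0}, {2} and X, which is exactly the set of pseudocomplements pseudo_image, while {0, 2} is open but not regular open.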
As noted above, the ≤-closed sets in a partial order P form a topology on P . Let ro(P ) denote the set of regular open sets in this topology.
By theorem 2 ro(P ) is a complete Boolean algebra. The next two lemmas give some facts about partial orders and Boolean algebras, which
will be required. Say that a map f : P ↦ Q between partially ordered
sets is order-preserving if x ≤ y ⇒ f (x) ≤ f (y).
Lemma 3. Let P be a partially ordered set, with the ≤-closed sets
as the open sets of a topology on P . Suppose S ⊆ P .
a. For S ⊆ P , w ∈ S cl if and only if w≤ ∩ S ≠ ∅.
b. An open subset U ⊆ P is regular open if and only if p≤ ⊆ U cl ⇒
p ∈ U.
c. If i : P ↦ Q is a bijection of partially ordered sets, with inverse
function j : Q ↦ P , and both i and j are order preserving, then i
and j are order isomorphisms.
d. If i : P 7→ Q is an order isomorphism of partially ordered sets, then
i preserves all greatest lower bounds and least upper bounds which
exist.
Proof: For part a, w ∈ S cl if and only if U ∩ S ≠ ∅ for any open
set U with w ∈ U ; the latter is clearly true if and only if w≤ ∩ S ≠ ∅.
Part b follows because p ∈ (U cl )int if and only if p≤ ⊆ U cl . For part
c, that i and j are bijective is a well-known fact of informal set theory.
If i(p1 ) ≤ i(p2 ) then p1 = j(i(p1 )) ≤ j(i(p2 )) = p2 ; and similarly for j.
For part d, if b is a least upper bound for S, then i(b) is an upper bound
for i[S]. If i(b′ ) is any other upper bound, then b′ is an upper bound for
S, so b′ ≥ b, so i(b′ ) ≥ i(b). The argument for greatest lower bounds is
dual. ⊳
Lemma 4.
a. In a Boolean algebra, x ⊓ y ≤ z ⇔ y ≤ x† ⊔ z.
b. In a complete Boolean algebra, x ⊓ (⊔Y ) = ⊔y∈Y (x ⊓ y).
c. An order isomorphism between Boolean algebras is a Boolean algebra isomorphism.
Proof: For part a, if y ≤ x† ⊔ z then x ⊓ y ≤ x ⊓ (x† ⊔ z) = x ⊓ z ≤ z.
Also, y = (x ⊔ x† ) ⊓ y = (x ⊓ y) ⊔ (x† ⊓ y), so if x ⊓ y ≤ z then
y ≤ z ⊔ (x† ⊓ y) ≤ x† ⊔ z. For part b, let j = ⊔Y ; since y ≤ j for y ∈ Y ,
x ⊓ y ≤ x ⊓ j for y ∈ Y . Suppose x ⊓ y ≤ d for y ∈ Y ; then y ≤ x† ⊔ d
for y ∈ Y , so j ≤ x† ⊔ d, so x ⊓ j ≤ x ⊓ d ≤ d. This shows that x ⊓ j is the
least upper bound of {x ⊓ y : y ∈ Y }. For part c, if i is the map then
by lemma 3.d i preserves ⊔ and ⊓. Since i(0) ≤ i(b) for all b, and i is
surjective, i preserves 0; similarly i preserves 1. Since c = b† if and only
if b ⊔ c = 1 and b ⊓ c = 0, i preserves † as well. ⊳
A map e : P ↦ Q from a partial order P to a partial order Q is a
dense embedding if it satisfies the following requirements.
a. x ≤ y ⇒ e(x) ≤ e(y).
b. e(x) and e(y) are compatible if and only if x and y are.
c. e[P ] is a dense subset of Q.
A map e : P ↦ B from a partial order P to a Boolean algebra B is said
to be a dense embedding if it is a dense embedding of P in B − {0}.
Theorem 5. Let e0 : P ↦ ro(P ) be the map where e0 (p) =
((p≤ )cl )int . Then e0 is a dense embedding of P in ro(P ).
Proof: For a regular open set U , p ∈ U if and only if p≤ ⊆ U if
and only if e0 (p) ⊆ U . In particular, p ∈ e0 (p), and e0 (p) ≠ ∅. Since
x ≤ y ⇒ x≤ ⊆ y ≤ , and the closure and interior operations preserve
inclusion, requirement a for a dense embedding follows. If w ≤ x and
w ≤ y then e0 (w) ≤ e0 (x) and e0 (w) ≤ e0 (y), so compatible elements map to
compatible elements (this much follows by requirement a). If e0 (x) and
e0 (y) are compatible then there is a w ∈ P such that w ∈ ((x≤ )cl )int and
w ∈ ((y ≤ )cl )int . Then w≤ ⊆ (x≤ )cl and w≤ ⊆ (y ≤ )cl . But u ∈ (x≤ )cl if
and only if u is compatible with x; thus there is a u1 ≤ w with u1 ≤ x,
and a u2 ≤ u1 with u2 ≤ y, so x and y are compatible. Requirement b
is thus proved. An element of ro(P ) is a ≤-closed subset S ⊆ P such
that (S cl )int = S. For such an S, x ∈ S if and only if x≤ ⊆ S if and only
if e0 (x) = ((x≤ )cl )int ⊆ (S cl )int = S. Requirement c follows by choosing
any x ∈ S. ⊳
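For a finite partial order the algebra ro(P ) and the map e0 can be computed by brute force. The following Python sketch assumes a hypothetical six-element partial order of binary strings, where p ≤ q means that p extends q; the interior and closure are computed using the characterizations in lemma 3, and all function names are illustrative only.

```python
from itertools import combinations

# Hypothetical finite partial order: p <= q means that the string p extends q.
P = ["", "0", "1", "00", "10", "11"]


def leq(p, q):
    return p.startswith(q)


def down(p):
    # The set of all q with q <= p (written with a superscript ≤ in the text).
    return frozenset(q for q in P if leq(q, p))


def interior(s):
    # p is interior to s iff its smallest open neighbourhood down(p) lies in s.
    return frozenset(p for p in P if down(p) <= s)


def closure(s):
    # Lemma 3.a: w is in the closure of s iff down(w) meets s.
    return frozenset(p for p in P if down(p) & s)


def e0(p):
    return interior(closure(down(p)))


def compatible(p, q):
    return any(leq(r, p) and leq(r, q) for r in P)


SUBSETS = [frozenset(c) for r in range(len(P) + 1) for c in combinations(P, r)]
# Regular open sets: open (equal to their interior) and equal to the
# interior of their closure.
RO = [s for s in SUBSETS if interior(s) == s and interior(closure(s)) == s]


def is_dense_embedding():
    # Requirements a, b and c for e0 as a map into ro(P) minus the empty set.
    a = all(e0(x) <= e0(y) for x in P for y in P if leq(x, y))
    b = all(compatible(x, y) == bool(e0(x) & e0(y)) for x in P for y in P)
    c = all(any(e0(p) <= u for p in P) for u in RO if u)
    return a and b and c
```

In this example ro(P ) has 8 elements, and e0("0") = e0("00") because "00" is the only proper extension of "0"; this illustrates that a dense embedding need not be injective.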
Lemma 6. Suppose e : P ↦ B is a dense embedding of P in a
complete Boolean algebra B. For b ∈ B − {0} let Ub = {p ∈ P : e(p) ≤
b}. Then Ub is a nonempty regular open set, and ⊔e[Ub ] = b.
Proof: Ub is nonempty by requirement c for a dense embedding. If
p ∈ Ub and q ≤ p then e(q) ≤ e(p) ≤ b; thus Ub is open. Suppose p ∉ Ub .
Then e(p) ≰ b, so e(p) ⊓ b† ≠ 0, so e(q) ≤ e(p) ⊓ b† for some q ∈ P .
In particular e(q) and e(p) are compatible, so p and q are compatible.
Choose r ∈ P with r ≤ q and r ≤ p. Then e(r) ≤ e(q) ≤ b† , whence
r≤ ∩ Ub = ∅, whence r ∉ (Ub )cl . Thus, p≤ ⊈ (Ub )cl has been shown, and so
Ub is regular by lemma 3.b. Let b0 = ⊔e[Ub ]. Clearly b0 ≤ b. If b0 < b,
it is easily seen that b ⊓ (b0 )† ≠ 0, so e(p) ≤ b ⊓ (b0 )† for some p ∈ P . Then
e(p) ≤ b, so p ∈ Ub and e(p) ≤ b0 ; since also e(p) ≤ (b0 )† , e(p) = 0, a
contradiction. Thus, b0 = b. ⊳
Theorem 7. If e : P ↦ B is a dense embedding of P in a complete
Boolean algebra B then B is isomorphic to ro(P ).
Proof: Let i : ro(P ) − {∅} ↦ B be the map where i(U ) = ⊔e[U ]. If
U is a regular open set and p ∈ U then e(p) ≤ ⊔e[U ], whence Ran(i) is
contained in B − {0}. Let j : B − {0} ↦ ro(P ) − {∅} where j(b) = Ub ,
where Ub is as in lemma 6. By lemma 6 i is surjective, indeed j is
a right inverse. To show that j is a left inverse, it suffices to show
that for p ∈ P , e(p) ≤ ⊔e[U ] if and only if p ∈ U . If p ∈ U then clearly
e(p) ≤ ⊔e[U ]. Suppose e(p) ≤ ⊔e[U ]. If e(p)⊓e(q) = 0 for all q ∈ U then
e(p) ⊓ (⊔e[U ]) = 0, a contradiction; and it follows that p is compatible
with q for some q ∈ U , whence p≤ ∩ U ≠ ∅. The same argument holds for
any p′ ≤ p, so p≤ ⊆ U cl . By lemma 3.b, p ∈ U . Clearly S ⊆ T ⇒ i(S) ≤ i(T ), and i
is order-preserving. Clearly b ≤ c ⇒ Ub ⊆ Uc , and j is order-preserving.
Extending i by i(∅) = 0 gives an order isomorphism of ro(P ) with B by
lemma 3.c, and the theorem follows by lemma 4.c.
⊳
Thus, a partial order P can be “converted” to a complete Boolean
algebra B = ro(P ) in a canonical way. This fact permits a simple
construction of M [G], where G ⊆ P is a generic filter. The filter G can
be “converted” to a subset GB of B as follows: let GB = {b ∈ B : ∃p ∈
G(e(p) ≤ b)}; it is readily seen that GB is the smallest filter in B − {0}
containing e[G].
Let B be a complete Boolean algebra. The class V B is defined
recursively as follows.
- V_0^B = ∅
- V_{α+1}^B = {f : f : d ↦ B where d ⊆ V_α^B }.
- For a limit ordinal α, V_α^B = ∪β<α V_β^B
- V^B = ∪α∈Ord V_α^B
This definition “mimics” the definition of Vα , except elements of Vα+1
have elements of Vα as members, according to elements of B, rather
than {0, 1}.
The formula for V B has the parameter B. Basic theorems concerning the class are still theorems of ZFC, though, as they are true for any
B.
As already indicated, in any transitive model M of ZFC (set or
class) containing P , the formula for V B defines a class in M , which will
be denoted M B . Elements of M B are called “names”, as they are to be
used as names for the elements of M [G]. Indeed, given an M -generic
filter G ⊆ P , the value xG of a name x is defined recursively as follows.
- ∅G = ∅
- xG = {wG : w ∈ Dom[x] ∧ x(w) ∈ GB }
The model M [G] equals {xG : x ∈ M B }.
M is not directly a substructure of M [G]; however there is a “canonical embedding”, wherein each x ∈ M is assigned a name x̌. This is
defined recursively as follows.
- ∅̌ = ∅
- x̌ is the function with domain {w̌ : w ∈ x}, and x̌(w̌) = 1 for all
w ∈ x.
Of course, it remains to show that this is an embedding; this will be
shown below, along with various other basic facts.
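The recursive definitions of the value of a name and of the canonical name x̌ can be illustrated with a finite Boolean algebra. The Python sketch below makes several assumptions that are not in the text: B is the four-element algebra Pow({0, 1}), a name is represented as a frozenset of pairs (w, b) standing for the graph of a function from names to B, and an ultrafilter of B is used in place of the set GB (in a finite algebra every ultrafilter meets every dense set). The names value, check and ultrafilter are illustrative only.

```python
ZERO, ONE = frozenset(), frozenset({0, 1})
# The four-element Boolean algebra Pow({0, 1}) (an assumption of this sketch).
B = [ZERO, frozenset({0}), frozenset({1}), ONE]


def ultrafilter(atom):
    # The ultrafilter of all elements of B containing the given atom.
    return {b for b in B if atom in b}


def value(name, GB):
    # The value of a name: keep w's value exactly when name(w) lies in GB.
    return frozenset(value(w, GB) for (w, b) in name if b in GB)


def check(x):
    # Canonical name of a hereditarily finite set x: every member gets value 1.
    return frozenset((check(w), ONE) for w in x)


empty = frozenset()
one = frozenset({empty})          # the set {∅}
two = frozenset({empty, one})     # the set {∅, {∅}}

# A name whose value depends on the filter.
a = frozenset({(check(empty), frozenset({0})), (check(one), frozenset({1}))})
```

Here value(check(two), GB) is two for either ultrafilter, while value(a, ultrafilter(0)) is {∅} and value(a, ultrafilter(1)) is {{∅}}.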
The forcing language may be defined to be the pairs ⟨F_v⃗ , x⃗⟩ where,
as in section 20, F is a formula in the language of set theory, v⃗ is a list of
variables including the free variables of F , and x⃗ is a corresponding list
of names (elements of M B , or in some cases V B ). Similarly to remarks
in section 6, the notation F (x1 , . . . , xn ) will be used for ⟨F_v⃗ , x⃗⟩, when
no confusion results.
Recall from the proof of theorem 2 the operation → in a Heyting
algebra. A Boolean algebra is a Heyting algebra when x → y is defined
to be x† ⊔ y. As will be seen, this operation is convenient when giving
the technical details of forcing.
Given a formula F (x1 , . . . , xn ) in the forcing language, its “truth
value” ⟦F (x1 , . . . , xn )⟧, an element of B, may be defined. First, a recursive definition is
given for atomic formulas, as follows.
- ⟦x ∈ y⟧ = ⊔v∈π1 [y] (⟦v = x⟧ ⊓ y(v))
- ⟦x = y⟧ = Vxy ⊓ Vyx where Vxy = ⊓w∈π1 [x] (x(w) → ⟦w ∈ y⟧)
The value for any formula may be defined by recursion on formulas, as
follows.
- ⟦¬F (x⃗)⟧ = ⟦F (x⃗)⟧†
- ⟦F (x⃗) ∧ G(x⃗)⟧ = ⟦F (x⃗)⟧ ⊓ ⟦G(x⃗)⟧
- ⟦F (x⃗) ∨ G(x⃗)⟧ = ⟦F (x⃗)⟧ ⊔ ⟦G(x⃗)⟧
- ⟦F (x⃗) ⇒ G(x⃗)⟧ = ⟦F (x⃗)⟧ → ⟦G(x⃗)⟧
- ⟦∀vF (x⃗)⟧ = ⊓w∈M B ⟦F (w, x⃗)⟧
- ⟦∃vF (x⃗)⟧ = ⊔w∈M B ⟦F (w, x⃗)⟧
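The recursion for atomic formulas can be carried out explicitly when B and the names are finite. The Python sketch below is an illustration under stated assumptions: B is the four-element algebra Pow({0, 1}), a name is a frozenset of pairs (w, b), π1 [y] is taken to be the domain of y, the join is a supremum and the two factors of the value of an equation are infima, and an ultrafilter of B stands in for the generic filter. The functions mem, eq, imp, value and agrees_with_values are hypothetical names.

```python
from functools import lru_cache
from itertools import product

ZERO, ONE = frozenset(), frozenset({0, 1})
B = [ZERO, frozenset({0}), frozenset({1}), ONE]


def imp(a, b):
    # The Heyting arrow of a Boolean algebra: complement of a, joined with b.
    return (ONE - a) | b


@lru_cache(maxsize=None)
def mem(x, y):
    # Truth value of "x in y": join over v in dom(y) of [v = x] meet y(v).
    out = ZERO
    for (v, t) in y:
        out = out | (eq(v, x) & t)
    return out


@lru_cache(maxsize=None)
def eq(x, y):
    # Truth value of "x = y": meet of x(w) -> [w in y] and y(w) -> [w in x].
    out = ONE
    for (w, t) in x:
        out = out & imp(t, mem(w, y))
    for (w, t) in y:
        out = out & imp(t, mem(w, x))
    return out


def ultrafilter(atom):
    return {b for b in B if atom in b}


def value(name, GB):
    return frozenset(value(w, GB) for (w, t) in name if t in GB)


# A few sample names.
n0 = frozenset()
na = frozenset({(n0, frozenset({0}))})
nb = frozenset({(n0, frozenset({1}))})
nc = frozenset({(n0, ONE)})
nd = frozenset({(na, ONE), (nc, frozenset({1}))})
NAMES = [n0, na, nb, nc, nd]


def agrees_with_values():
    # Compare truth values with actual membership and equality of the values.
    for GB in (ultrafilter(0), ultrafilter(1)):
        for x, y in product(NAMES, repeat=2):
            if (value(x, GB) in value(y, GB)) != (mem(x, y) in GB):
                return False
            if (value(x, GB) == value(y, GB)) != (eq(x, y) in GB):
                return False
    return True
```

For the sample names, eq(na, nc) is {0} and eq(na, nb) is 0, and agrees_with_values() checks that a truth value lies in the ultrafilter exactly when the corresponding statement holds of the values.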
The forcing relation is now easy to define. For a condition p and a
formula F of the forcing language, p ⊩ F if and only if e(p) ≤ ⟦F ⟧. From
here on proofs will be omitted. They may be found in any of numerous
standard references; specific references to [Jech2] will be given. The
following lemma gives some basic facts about the truth value function.
Lemma 8.
a. ⟦x = x⟧ = 1
b. ⟦x = y⟧ = ⟦y = x⟧
c. ⟦x = y⟧ ⊓ ⟦y = z⟧ ≤ ⟦x = z⟧
d. ⟦x′ = x⟧ ⊓ ⟦x ∈ y⟧ ≤ ⟦x′ ∈ y⟧
e. ⟦y′ = y⟧ ⊓ ⟦x ∈ y⟧ ≤ ⟦x ∈ y′⟧
f. ⟦∀w(w ∈ x ⇔ w ∈ y)⟧ ≤ ⟦x = y⟧
g. If S ⊆ B is a set of pairwise disjoint elements, and for each b ∈ S xb is an
element of M B , then there is an element y ∈ M B such that for each
b ∈ S, b ≤ ⟦y = xb ⟧.
h. If F is a formula of the forcing language, with free variable v, then
there is an element w ∈ M B such that ⟦∃vF (x⃗)⟧ = ⟦F (w, x⃗)⟧
i. If F (x⃗) is a ∆0 formula with values xi ∈ M then |=M F (x⃗) if and
only if ⟦F (x̌1 , . . . , x̌k )⟧ = 1.
Remarks on proof: Part a is lemma 14.15. Part b is immediate from
the definition. Parts c-e are lemma 14.16. Parts f-h are lemmas 14.17 to
14.19. Part i is lemma 14.21. ⊳
The next lemma gives a fundamental relation between truth in
M [G] and the truth value function.
Lemma 9. Suppose x, xi , y ∈ M B , and F (x⃗) is a formula of the
forcing language.
a. xG ∈ y G if and only if ⟦x ∈ y⟧ ∈ GB
b. xG = y G if and only if ⟦x = y⟧ ∈ GB
c. |=M[G] F (x_1^G , . . . , x_k^G ) if and only if ⟦F (x⃗)⟧ ∈ GB
Remarks on proof: Parts a and b are lemma 14.28. Part c is theorem
14.29. This is proved by an induction on formulas, using parts a and
b for atomic formulas, and some properties of GB . In particular, for a
Boolean algebra B, an M -generic filter H in B − {0} is an M -generic
ultrafilter in B (exercise 14.10). H is an ultrafilter if and only if for any
b ∈ B, either b ∈ H or b† ∈ H. H is M -generic if and only if for any
subset S ⊆ H such that S ∈ M , ⊓S ∈ H. ⊳
The next lemma shows that the axioms of ZFC are “true” in V B
(or M B ).
Lemma 10. If F is an axiom of ZFC then ⟦F ⟧ = 1.
Remarks on proof: This is theorem 14.24. Elements may be shown
to exist by explicitly constructing them. ⊳
Having proved sufficiently many lemmas, the main theorems of forcing theory can be proved. These are the main tools for using forcing,
although the lemmas are often useful also. The following theorem is
known as the “forcing theorem”.
Theorem 11. If F (x⃗) is a sentence of the forcing language then
|=M[G] F (x_1^G , . . . , x_k^G ) if and only if ∃p ∈ G(p ⊩ F (x⃗)).
Remarks on proof: This is theorem 14.6. It follows easily from
lemma 9.c above, and the definition of the forcing relation. ⊳
The next theorem states basic properties of the forcing relation.
Theorem 12. For formulas F , G of the forcing language, and p ∈ P ,
the following hold.
a. If p ⊩ F and q ≤ p then q ⊩ F
b. For no p do both p ⊩ F and p ⊩ ¬F hold
c. For every p there is a q ≤ p such that either q ⊩ F or q ⊩ ¬F
d. p ⊩ F ∧ G if and only if p ⊩ F and p ⊩ G
e. p ⊩ ¬F if and only if for all q ≤ p, q ⊮ F
f. p ⊩ ∃vF (x⃗) if and only if ∀q ≤ p∃r ≤ q∃w ∈ M B (r ⊩ F (w, x⃗))
g. p ⊩ F ∨ G if and only if ∀q ≤ p∃r ≤ q(r ⊩ F or r ⊩ G)
h. p ⊩ ∀vF (x⃗) if and only if ∀w ∈ M B (p ⊩ F (w, x⃗))
Remarks on proof: This is theorem 14.7. It can be proved from the
definition of ⊩, along with some facts from lemma 8 above. In some
treatments, items d to f are used in defining ⊩. ⊳
The next theorem is called the “generic model theorem”.
Theorem 13. Suppose M is a transitive model of ZFC, and G is a
generic filter in a notion of forcing in M .
a. M [G] is a model of ZFC
b. M ⊆ M [G]
c. G ∈ M [G]
d. If N is a transitive model of ZFC such that M ⊆ N and G ∈ N
then N = M [G]
e. OrdM[G] = OrdM
Remarks on proof: This is theorem 14.5. Part a follows by lemma
10 above and the fact that 1 ∈ GB for any generic filter G. Part b
follows because x ↦ x̌G preserves ∈ and M is transitive. For part c,
let Ġ be the name where Ġ(b̌) = b for all b ∈ B. Then ĠG = GB , and
G = {p ∈ P : e(p) ∈ GB }. A proof of this last fact, and many other
useful facts about filters in partial orders and Boolean algebras, can be
found in [TakZar2]. ⊳
Suppose M is a standard model of ZFC, P is the notion of forcing
for adding a Cohen generic real described above, and G is a generic
ultrafilter. By theorem 13 and the absoluteness of α ↦ Lα , L^M =
L^M[G] . Since G ∉ M , G ∉ L^M[G] . Since G ∈ M [G], in M [G], V ≠ L.
The above observation is only a statement about models of ZFC.
It can’t immediately be used to conclude that V ≠ L is consistent with
ZFC, because the existence of M , or of G, has not been established (in
fact M cannot be proved to exist). G does exist if M is countable; see
section 28.
There are various ways of “converting” a construction of a model
M [G] of a sentence F , to a consistency proof. A simple one is to note
that the construction can be transformed into a proof that ⟦F ⟧ ≠ 0
(indeed, ordinarily ⟦F ⟧ = 1). On the other hand, if ⊢ZFC ¬F then
⟦¬F ⟧ = 1.
22. ¬CH is consistent.
Adding a single Cohen generic real produces a model where V = L
is false. Adding a large quantity of them produces a model where 2^ℵ0
is as large as desired. However, there are restrictions on what 2^ℵ0 can
be; see section 51.
In this section let P be a notion of forcing in a transitive model M
of ZFC, and let G be a generic set. Forcing arguments often make use of
properties of P , to prove properties of M [G]. A frequently encountered
such property is the following. P is said to satisfy the κ-chain condition
(κ-c.c.), if whenever S ⊆ P and any two elements of S are incompatible,
then |S| < κ. The ℵ1 -chain condition is commonly called the countable
chain condition (c.c.c.).
Authors (for example [Fremlin2]) have observed that the terminology “chain condition” is strained, as chains (defined below) are not
involved; but it is in such wide use that this fact is ignored. Further, it
may be seen that a complete Boolean algebra satisfies the κ-c.c. if and
only if there is no ascending chain of length κ.
Theorem 1. Suppose P satisfies the κ-c.c., λ ≥ κ, and cf(α) = λ in
M . Then cf(α) = λ in M [G].
Proof: Let λ′ denote cf(α) in M [G]. If f : λ ↦ α is cofinal in M then
f : λ ↦ α is cofinal in M [G] because M ⊆ M [G], so λ′ ≤ λ. Suppose µ < λ and
f : µ ↦ α in M [G]; it suffices to show that the range of f is bounded.
Let f˙ be a name for f , and let p ∈ G be such that p ⊩ f˙ : µ̌ ↦ α̌. For
each β < µ let Sβ = {γ < α : ∃q ≤ p(q ⊩ f˙(β̌) = γ̌)}. For each γ ∈ Sβ choose
a qγ ≤ p such that qγ ⊩ f˙(β̌) = γ̌. Then the qγ are pairwise incompatible,
so since P satisfies the κ-c.c. there are fewer than κ of them, so
|Sβ | < κ. Since λ is regular in M , κ ≤ λ, and µ < λ, it follows that | ∪β<µ Sβ | < λ,
whence ∪β<µ Sβ ⊆ δ for some δ < α, whence p ⊩ f˙[µ̌] ⊆ δ̌, whence
|=M[G] f [µ] ⊆ δ. ⊳
Corollary 2. Suppose P satisfies the κ+ -c.c., and λ ≥ κ+ . Then λ
is a cardinal in M if and only if λ is a cardinal in M [G].
Proof: As noted in section 16, the property of being a cardinal
is down-absolute, and one direction follows. The other direction follows
by induction. If λ is a successor cardinal in M then it is regular, so by the theorem
it is a regular cardinal in M [G]. If λ is a limit cardinal in M then in
M [G] it is a union of cardinals, so is a cardinal. ⊳
For any cardinal κ, κ many Cohen reals may be added by considering the notion of forcing, where the elements are the functions
p : d ↦ {0, 1} where d is a finite subset of κ × ω. As in the case of
a single Cohen real, if G is a generic filter then ∪G is a function with
domain κ × ω.
Lemma 3 (∆-system lemma). Let S be an uncountable set of finite
sets. Then there is a finite set r, and an uncountable subset D ⊆ S,
such that for any s, t ∈ D, s ∩ t = r.
Remarks on proof: This is a classic theorem, proved by N. Shanin
in 1946. It is theorem 9.18 of [Jech2]. ⊳
A system D where any two sets intersect in the same set r is called
a ∆-system, with root r.
Corollary 4. If S is an uncountable set of functions f : d ↦ {0, 1}
where d is a finite set then S satisfies the c.c.c.
Proof: By lemma 3 there is an uncountable set S2 ⊆ S such that
given any two functions in S2 , their domains intersect in a fixed set r.
In turn there is an uncountable subset S3 ⊆ S2 , such that f ↾ r is the
same for all f ∈ S3 . Any two f ∈ S3 are compatible, since they agree on
the intersection r of their domains. ⊳
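The argument of corollary 4 can be followed step by step on a finite family of conditions. The Python sketch below is only a finite analogue (lemma 3 itself concerns uncountable families): conditions are represented as dictionaries from pairs (α, n) to {0, 1}, a largest ∆-system among their domains is found by brute force, and the conditions agreeing on the root are then checked to be pairwise compatible. The sample family conds and all function names are hypothetical.

```python
from itertools import combinations


def compatible(p, q):
    # Two finite partial functions are compatible iff they agree wherever
    # both are defined; their union is then a common extension.
    return all(p[k] == q[k] for k in p.keys() & q.keys())


def is_delta_system(sets):
    # Every two distinct members must intersect in the same root.
    sets = list(sets)
    if len(sets) < 2:
        return True
    root = sets[0] & sets[1]
    return all(s & t == root for s, t in combinations(sets, 2))


def largest_delta_system(sets):
    # Brute force search; only feasible for very small families.
    sets = list(sets)
    for size in range(len(sets), 0, -1):
        for sub in combinations(sets, size):
            if is_delta_system(sub):
                return list(sub)
    return []


# Hypothetical sample conditions with domains contained in κ × ω.
conds = [
    {(0, 0): 1, (1, 0): 0},
    {(0, 0): 1, (2, 0): 1},
    {(0, 0): 1, (3, 5): 0},
    {(0, 0): 0, (4, 1): 1},
    {(1, 0): 1, (2, 0): 0},
]
domains = [frozenset(p) for p in conds]
D = largest_delta_system(domains)
root = D[0] & D[1]
# Conditions with domain in D that agree with conds[0] on the root.
S3 = [p for p in conds
      if frozenset(p) in D and all(p[k] == conds[0][k] for k in root)]
```

In this family the first four domains form a ∆-system with root {(0, 0)}, and the three conditions in S3 are pairwise compatible.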
Lemma 5. Suppose GCH holds. If λ < cf(κ) then κ^λ = κ.
Proof: If f : λ ↦ κ then since λ < cf(κ), f is bounded. It follows
that κ^λ = ∪α<κ α^λ . Now, |α^λ | ≤ 2^(|α|·λ) = (|α| · λ)+ ≤ κ, and the lemma
follows. ⊳
Let P be a partially ordered set. A chain is defined to be a subset
C ⊆ P , which is linearly ordered, i.e., such that if x, y ∈ C then x ≤ y
or y ≤ x. A maximal element in P is an element p ∈ P such that if
q ≥ p then q = p.
Lemma 6 (Zorn’s lemma). Suppose P is a partially ordered set such
that for every chain C ⊆ P there is an element of P which is an upper
bound for C. Then P contains a maximal element.
Proof: Using the axiom of choice, there is a function c which assigns
to each S ⊆ P a strict upper bound if it has one, else ∅. Define by transfinite induction a sequence Cα of chains, where Cα+1 = Cα ∪ {c(Cα )}, if
Cα has a strict upper bound; and Cα = ∪β<α Cβ if α ∈ LimOrd. Using
the axiom of replacement eventually a Cα must be obtained with no
strict upper bound. If p is an upper bound for Cα then p is a maximal
element of P . ⊳
Lemma 7. If P is a partially ordered set and S ⊆ P , then there is
a pairwise incompatible subset T ⊆ S, which is maximal among the set
of such, ordered by inclusion.
Proof: Suppose C is a chain of pairwise incompatible subsets. Let
T = ∪C, and suppose x, y ∈ T are distinct. Then x ∈ Sx and y ∈ Sy for some
Sx , Sy ∈ C. Since C is a chain, either Sx ⊆ Sy or Sy ⊆ Sx , so for some
S ∈ C, x, y ∈ S, and so x and y are incompatible. Thus T is an upper
bound for C, and the lemma follows by Zorn’s lemma. ⊳
Lemma 8. Suppose P is a partially ordered set, U ⊆ P is a regular
open set, and M is a maximal pairwise incompatible subset of U . Then
U = ⊔{e(p) : p ∈ M }.
Proof: Let V = ⊔{e(p) : p ∈ M }. Clearly V ⊆ U . If V ⊂ U then
there is a q ∈ P such that e(q) ⊆ U ∩(V c )int . From this, it follows that q
is incompatible with every element of M , contradicting the maximality
of M . ⊳
Theorem 9. Suppose M is a model of GCH. Suppose κ has uncountable cofinality (in M ). Let M [G] be a generic extension by the
notion of forcing adding κ Cohen generic reals. Then in M [G], 2^ℵ0 = κ.
Proof: Let f = ∪G and for α < κ let fα : ω ↦ {0, 1} be the function
where fα (n) = f (α, n). Then for any α ≠ β, {p ∈ P : ∃n(p(α, n) ≠
p(β, n))} is dense, and since G is generic, |=M[G] fα ≠ fβ . From this, it
follows that |=M[G] 2^ℵ0 ≥ κ, since there is an injection in M [G] and by
corollaries 2 and 4 κ is a cardinal in M [G].
Given any generic extension, and any cardinal λ in M , let µ1 = |2^λ |
in M [G], let µ′1 = |2^λ | in M , and let µ2 = |B|^λ in M . For each S ⊆ λ
in M [G] let Ṡ be a name, and let gS : λ ↦ B be the function where
gS (α) = ⟦α̌ ∈ Ṡ⟧. If gS = gT then S = T , since whether α ∈ S is
determined by ⟦α̌ ∈ Ṡ⟧. Thus, in M µ′1 ≤ µ2 . µ1 ≤ µ′1 by down-absoluteness of cardinality.
Using lemma 8 and corollary 4, |B| ≤ |P |^ℵ0 = κ^ℵ0 . Using GCH in
M , |B| ≤ κ. Finally, |=M[G] 2^ℵ0 ≤ |B|^ℵ0 = κ. ⊳
Only the first paragraph of the proof is required to show that ¬GCH
holds in M [G]. The additional argument gives the exact size of 2^ℵ0 in
M [G]. The hypothesis of GCH is not needed to do so; in general it is
κ^ℵ0 (see [Jech2]).
Theorem 9 can be generalized to larger cardinals, by using the
functions f : d ↦ {0, 1} where d ⊆ λ and |d| < κ. If κ is regular and
2^<κ = κ (this requirement follows from GCH) then P satisfies the κ+ -c.c. To show that no new cardinals ≤ κ are introduced, the notion of
a κ-closed partially ordered set is introduced. If P is κ-closed, then no
new cardinals ≤ κ are introduced; further the above forcing notion is
< κ-closed. It follows that if λ > κ and λ^κ = λ (which follows from
GCH if κ < cf(λ)) then |=M[G] 2^κ = λ. See chapter 15 of [Jech2].
23. Clubs, stationary sets, and diamond.
In 1972 R. Jensen defined a principle about sets, which has proved
to be of importance. Before giving the definition, it is necessary to first
give some preliminary definitions, which have many uses in set theory.
Suppose S is a set of ordinals. An ordinal β is said to be a limit
point of S if S ∩ β is unbounded below β. A limit point β must be a
limit ordinal, since if β = γ + 1 and δ ∈ S ∩ β then δ ≤ γ, so S ∩ β is bounded
below β.
Suppose α is a limit ordinal. A subset S ⊆ α is said to be closed if,
whenever β < α is a limit point of S then β ∈ S. A subset S which is
closed and unbounded is said to be a club subset, or just a club.
The notion of an ideal in a partially ordered set P is dual to that
of a filter, that is, a subset I ⊆ P is an ideal if it is nonempty, ≤-closed,
and whenever p, q ∈ I there is an r ∈ I with r ≥ p and r ≥ q. If P
is a family of subsets of a set X, ordered by inclusion, it is a common
notion that a filter is a collection of “large” subsets, and an ideal is a
collection of “small” subsets. For example the meager and measure 0
subsets of R are ideals of small sets.
The club subsets are of most interest when α is an uncountable cardinal (as will be seen in the proof of theorem 2, though, the completely
general definition is sometimes needed). In this case, the club subsets
are closed under intersection (this will be shown in section 31). Thus,
the subsets which contain a club subset form a filter. The complements
of subsets in the club filter form an ideal, and are called thin subsets;
a subset T is thin if there is a club subset C such that T ∩ C = ∅. A
subset which is not thin is called stationary. Thus, S is stationary if
and only if for any club subset C, S ∩ C ≠ ∅.
The diamond principle (often denoted ♦) is as follows. There is a
system of subsets ⟨Sα ⟩ of ℵ1 such that
1. Sα ⊆ α for any α < ℵ1 , and
2. for any subset X ⊆ ℵ1 , {α < ℵ1 : X ∩ α = Sα } is stationary.
Theorem 1. ♦ ⊢ZFC CH.
Proof: If X ⊆ ω then it follows from ♦ that for some ordinal α with
ω ≤ α < ℵ1 , X = X ∩ α = Sα . Thus every subset of ω is among the ℵ1
sets Sα , so 2^ℵ0 ≤ ℵ1 . ⊳
Theorem 2. V = L ⊢ZFC ♦.
Remarks on proof: If ♦ does not hold for ⟨Sα ⟩, there is a set S ⊆ ℵ1
and a club C ⊆ ℵ1 such that S ∩ α ≠ Sα for all α ∈ C. The idea of
the proof is to construct ⟨Sα ⟩ to rule out every “condensed” version of
such an occurrence; a sequence Cα , where Cα is a club subset of α for
α ∈ LimOrd, is constructed along with the Sα .
Let C0 = S0 = ∅. Let Cα+1 = Sα+1 = α + 1. For α ∈ LimOrd, if
there is an S ⊆ α and a club C ⊆ α such that S ∩ ξ ≠ Sξ for all ξ ∈ C,
let ⟨Sα , Cα ⟩ be the <L -least such pair; otherwise let Cα = Sα = α.
If ♦ does not hold for ⟨Sα ⟩, let ⟨S, C⟩ be the <L -least pair which is a
counterexample. Then ⟨Sα ⟩, ⟨Cα ⟩, S, and C are in Lℵ2 , and satisfy their
defining formulas in Lℵ2 . Let M be a countable elementary substructure
of Lℵ2 . Then ⟨Sα ⟩, ⟨Cα ⟩, S, and C are in M . Also, M ∩ ℵ1 = δ for
some δ < ℵ1 .
Suppose π : M ↦ Lβ is the collapsing isomorphism. Then π(ℵ1 ) =
δ, π(⟨Sα ⟩) = ⟨Sα : α < δ⟩, π(⟨Cα ⟩) = ⟨Cα : α < δ⟩, π(S) = S ∩ δ, and
π(C) = C ∩ δ. In Lδ , and hence in L, ⟨S ∩ δ, C ∩ δ⟩ is the <L -least pair
⟨S′ , C′ ⟩ such that C′ is club and S′ ∩ ξ ≠ Sξ for all ξ ∈ C′ . Hence by the
definition of ⟨Sα ⟩, S ∩ δ = Sδ . On the other hand C ∩ δ is unbounded
below δ, and C is closed, so δ ∈ C. This is a contradiction.
For more details of the proof, see [Devlin] or [Jech2]. ⊳
There are generalizations of ♦ to larger cardinals. In particular,
suppose κ is a cardinal of uncountable cofinality, and E ⊆ κ is stationary; then ♦κ (E) is the principle that there exists a sequence ⟨Sα : α ∈
E⟩ with Sα ⊆ α, such that for any X ⊆ κ, {α ∈ E : X ∩ α = Sα } is a
stationary subset of κ.
For κ a regular uncountable cardinal ♦κ (E) follows from V = L;
this was proved by Jensen in 1972, and a proof may be found in [Devlin].
Stronger results have been proved since. To state these, a definition is
required, which has various uses. Suppose κ is a regular uncountable
cardinal, and λ < κ is a regular cardinal. Let E_λ^κ denote {α < κ :
cf(α) = λ}. Let Card denote the class of cardinals.
Theorem 3. E_λ^κ is stationary.
Proof: Let C be a club in κ. Let f : κ ↦ κ be the function that
enumerates C in increasing order. By remarks in section 17, cf(f (λ)) =
cf(λ) = λ, so f (λ) ∈ C ∩ E_λ^κ . ⊳
In 1976 J. Gregory showed that for κ regular uncountable and λ
regular, ♦κ+ (E_λ^{κ+} ) follows from 2^κ = κ+ and κ^λ = κ. A proof may
be found in [Jech2]. It was shown by S. Shelah in 2007 that for κ
uncountable, ♦κ+ (E) follows from 2^κ = κ+ , for any stationary subset
E ⊆ {α < κ+ : cf(α) ≠ cf(κ)}. A proof may be found in [Komjath].
¬♦ is consistent with CH; more will be said about this in section
29.
24. Trees.
The term “tree” is used in many areas of mathematics, and what
are called trees in one area may not be the same as what are called trees
in another. In set theory, there is a general notion of a tree; in some
contexts, however, a more specialized type may be meant. According to
[Kanamori1], general transfinite trees were first studied systematically
in the 1935 Ph. D. thesis of D. Kurepa. They have become indispensable
in modern set theory.
A tree is a partially ordered set T , such that for all x ∈ T , x< is well-ordered by ≤. Thus, x< is a chain, and there are no infinite descending
sequences. Following are some basic notions concerning trees.
- Elements of T are called nodes.
- A node x is said to be of level α if α is the order type of x< ; Tα will
be used to denote the set of nodes of level α.
- The height of a tree T is the least α such that Tα = ∅.
- A tree is said to be a κ-tree if its height is κ, and |Tα | < κ for each
α.
Some authors impose additional restrictions on a κ-tree; here such will
be stated explicitly. Although the definition has been given in general,
the notion of a κ-tree is less interesting when κ is singular, and from
here on it will be assumed that κ is regular.
A tree will be said to be rooted if it has a single node of level 0;
in which case the node is called the root. Often this is required, but the
most general definition omits the requirement.
A branch of a tree is a maximal chain. Clearly it is well-ordered,
and its order type is called its length. Elements x, y in a partial order
are said to be comparable if x ≤ y or y ≤ x; else they are incomparable.
An antichain is a set of pairwise incomparable elements. An application
of Zorn’s lemma shows that any antichain is a subset of some maximal
antichain.
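The notions of level, height, branch and antichain can be computed directly for a finite tree. The Python sketch below assumes a hypothetical tree whose nodes are tuples of integers, ordered by letting x < y when x is a proper initial segment of y; since the tree is finite, the order type of the set of predecessors of x is just its size. The function names are illustrative only.

```python
# Hypothetical finite tree: x < y iff x is a proper initial segment of y.
T = [(), (0,), (1,), (0, 0), (0, 1), (1, 0), (0, 1, 0)]


def lt(x, y):
    return len(x) < len(y) and y[:len(x)] == x


def below(x):
    # The set of strict predecessors of x in T.
    return [y for y in T if lt(y, x)]


def level(x):
    # For a finite tree the order type of below(x) is its cardinality.
    return len(below(x))


def height():
    # Least level at which the tree has no nodes.
    return max(level(x) for x in T) + 1


def branches():
    # In a finite tree a maximal chain is a node with no successor in T
    # together with all of its predecessors.
    leaves = [x for x in T if not any(lt(x, y) for y in T)]
    return [sorted(below(x) + [x], key=len) for x in leaves]


def is_antichain(nodes):
    return all(not lt(x, y) and not lt(y, x)
               for x in nodes for y in nodes if x != y)
```

This tree is rooted, has height 4, and has three branches, of lengths 3, 3 and 4; each level is an antichain.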
Theorem 1 (König’s infinity lemma). An ℵ0 -tree has an infinite
branch.
Proof: Each level is finite, and the tree is infinite. Let x0 be a node
at level 0 such that {y : x0 ≤ y} is infinite. Inductively, let xi+1 be a
node at level i + 1 with xi < xi+1 , and {y : xi+1 ≤ y} infinite. ⊳
A regular uncountable cardinal κ is said to have the tree property
if any κ-tree has a branch of length κ. A basic question of set theory
is whether there are any such cardinals. A counterexample to the tree
property, that is, a κ-tree with no branch of length κ, is called a κ-Aronszajn tree. A κ-tree with no branch of length κ and no antichain
of size κ is called a κ-Suslin tree.
Following are some known facts. Some will be considered in later
sections. Inaccessible cardinals are defined in section 30, and Π^1_1-indescribable cardinals in section 34.
- If κ = ℵ1 then it is provable in ZFC that an Aronszajn tree exists
([Jech2], theorem 9.16). If V = L then a Suslin tree exists (section
26). It is consistent that there is no Suslin tree (section 29).
- If V = L and κ > ℵ1 is a successor cardinal, then a κ-Suslin tree exists
([Devlin], theorem 2.4; some further remarks are made in section
52).
- If κ = λ^+ and 2^{<λ} = λ then a κ-Aronszajn tree exists (see [Jackson]).
- If κ is an inaccessible cardinal, then κ has the tree property if and
only if κ is Π^1_1-indescribable (see section 34).
- If κ is an inaccessible cardinal, and V = L, then there is no κ-Suslin
tree if and only if κ is Π^1_1-indescribable ([Devlin], theorem VII.1.3).
- If ℵ2 has the tree property then ℵ2 is Π^1_1-indescribable in L ([Jech2],
theorem 28.23).
- If there is a Π^1_1-indescribable cardinal then there is a generic extension in which ℵ2 has the tree property ([Jech2], theorem 28.24).
The question of the consistency of the tree property for successors of
singular cardinals is a matter of current research, [MagShel] for example.
Independent questions arise for a third type of tree. A κ-Kurepa
tree is a κ-tree with at least κ+ branches of length κ. In the case when
κ is an inaccessible cardinal, to avoid triviality an additional restriction
is imposed, namely, |Tα| ≤ |α| for infinite ordinals α. Following are
some known facts. See [Devlin] for the definition of an ineffable cardinal.
- If V = L and κ is a successor cardinal then there is a κ-Kurepa tree
([Devlin], theorem IV.3.3).
- If there exists an inaccessible cardinal then there is a generic extension in which there is no ℵ1 -Kurepa tree ([Jech2], theorem 27.9).
- If there is no ℵ1 -Kurepa tree then ℵ2 is inaccessible in L ([Jech2],
exercise 27.5).
- If κ is an ineffable cardinal then there is no κ-Kurepa tree. ([Devlin],
theorem VII.2.6).
- If V = L, κ is an inaccessible cardinal, and there is no κ-Kurepa
tree then κ is an ineffable cardinal ([Devlin], theorem VII.2.7).
25. The Suslin hypothesis.
It follows from results in section 8 that the real line is the unique
order R (up to order isomorphism) such that
1. R is a dense linear order without endpoints,
2. R has the least upper bound property, and
3. R contains a countable order-dense subset.
It follows from property 3 that
4. a set of pairwise disjoint open intervals in R is at most countable;
indeed any open interval contains an element of the dense subset.
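The step from property 3 to property 4 can be written out as an injection into the dense subset. In this sketch D denotes a countable order-dense subset of R and 𝓘 a family of pairwise disjoint nonempty open intervals; both names are introduced here and are not notation from the text.

```latex
% Each interval of the family meets D, so a point of D can be picked in it
% (for instance the first one in a fixed enumeration of D):
\[
\forall I \in \mathcal{I}\ \ (I \cap D \neq \emptyset)
\quad\Longrightarrow\quad
\text{choose } d_I \in I \cap D .
\]
% Disjoint intervals receive distinct points, so I \mapsto d_I is injective:
\[
I \neq J \ \Rightarrow\ I \cap J = \emptyset \ \Rightarrow\ d_I \neq d_J ,
\qquad\text{hence}\qquad
|\mathcal{I}| \le |D| \le \aleph_0 .
\]
```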
In 1920 Suslin asked whether, if property 3 is replaced by property
4, there is still only a unique order. A counterexample, that is, an order
with properties 1, 2, and 4 which does not have property 3, is called a
Suslin line. The Suslin hypothesis (SH) is the hypothesis that no Suslin
line exists.
In his 1935 dissertation on trees, D. Kurepa reduced the question
of the existence of a Suslin line to that of the existence of a Suslin tree,
where by the latter is meant an ℵ1 -Suslin tree. The question was finally
shown to be independent of ZFC in 1968 to 1971 by T. Jech, R. Jensen,
R. Solovay, and S. Tennenbaum.
Theorem 1. If there is a Suslin line then there is a Suslin tree.
Proof: A sequence Iα, for α < ℵ1, of closed intervals of the given Suslin line will be
defined, where Iα = [aα, bα] with aα < bα. Let I0 be arbitrary. For
α > 0, let S = {aβ : β < α} ∪ {bβ : β < α}. Then S is countable,
so is not dense, so there is a closed interval disjoint from S; let [aα, bα]
be any such. If, for some β < α, Iβ and Iα intersect, then Iβ ⊇ Iα. Letting
T = {Iα : α < ℵ1 }, with I ≤ J if and only if I ⊇ J, T is a tree.
An antichain consists of disjoint intervals, so is countable (consider the
interiors). In particular the levels of T are countable, and |T | = ℵ1 , so
the height of T is ℵ1 . In a branch, the left endpoints form an increasing
sequence lα , and the intervals (lα , lα+1 ) are disjoint; it follows that the
branch is countable. ⊳
Lemma 2. If there is an order with properties 1 and 4 but not
property 3 then there is a Suslin line.
Proof: Let R be the given order. The set of cuts R̄ may be defined
as in section 8 for the rationals, and ordered by ⊆. As in the case of
the rationals, R̄ is a dense linear order without endpoints, has the least
upper bound property, and contains (an isomorphic copy of) R as an
order-dense subset. Any open interval in R̄ contains an open interval
with endpoints in R. It follows that any collection of pairwise disjoint
open intervals in R̄ is at most countable. If R̄ had property 3 it would
be isomorphic to the real numbers; but then R would have property
3, since it is a dense subset of the real numbers (iterate the process of
taking a point in between successive points). ⊳
By a subtree of a tree T will be meant any subset T ′ ⊆ T , equipped
with the inherited order. If b′ ⊆ T ′ is a branch then there is a branch
b ⊆ T such that b′ ⊆ b. First, close b′ downward in T , and then extend
the result in any way to a branch. If a′ ⊆ T ′ is an antichain then a′ is
an antichain in T .
For a node x of level α in a tree T let sons(x) denote the set of elements y of
T of level α + 1 with x ≤ y. Define the following “normality” properties
of a κ-tree T .
N1. For any x ∈ T of level α, and any β > α, there is a y ∈ T of level
β, with x ≤ y.
N2. T is rooted.
N3. T has unique limits, that is, if x is a node at level α where α ∈
LimOrd, there is no distinct node y at level α such that x^< = y^<.
N4. For any x ∈ T , |sons(x)| ≥ 2.
N5. For any x ∈ T , sons(x) is infinite.
Lemma 3. Suppose T is a κ-tree, where κ is a regular cardinal. Then
there is a subtree T3 ⊆ T such that T3 is a κ-tree having properties N1-N3.
Proof: In any κ-tree, if |x^≥| = κ, x ∈ Tα, and β > α, then there
is a y > x with y ∈ Tβ and |y^≥| = κ, else |x^≥| would be < κ. Let
T1 = {x ∈ T : |x^≥| = κ}. It follows that the level of a node in T1 does not
change from T, and T1 is a κ-tree. Further, T1 has property N1. To
ensure property N2, choose any x at level 0 of T1 and let T2 = x^≥ (taken in T1). To ensure
property N3, proceed inductively on α ∈ LimOrd to choose a single
node x from each set {y ∈ lev_α : y^< = x^<}. ⊳
Lemma 4. Suppose T0 is a Suslin tree. Then there is a subtree
T5 ⊆ T0 such that T5 is a Suslin tree having properties N1-N5.
Proof: Let T3 ⊆ T0 be as in lemma 3; then
T3 is a Suslin tree. For any x ∈ T3, |{y ∈ x^≥ : |sons(y)| ≥ 2}| = ℵ1,
since otherwise there would be a branch of length ℵ1. Let T4 = {x ∈
T3 : |sons(x)| ≥ 2}. By transfinite recursion a function f : ℵ1 ↦ ℵ1 may
be defined, so that nodes in T4 of level α have level ≤ f(α) in T3. It
follows that T4 is an ℵ1-tree. It has properties N1-N4, and is a Suslin
tree. Suppose x ∈ Tα in T4; then {y ∈ T4 : y ∈ T_{α+ω} and x ≤ y} is
infinite. It suffices to note that if S ⊆ {0,1}^ω and |S| = k then at most
k of the finite strings of length l, where 2^l > k, can occur as prefixes of
members of S. Let T5 be the set of nodes of T4 whose level is 0 or a limit
ordinal. It follows that T5 is as required. ⊳
Theorem 5. If there is a Suslin tree then there is a Suslin line.
Proof: Let T be a Suslin tree with properties N1-N5. For each
x ∈ T choose a bijection of sons(x) with Q, inducing an order of type Q
on sons(x). Let R be the branches of T ; order these lexicographically.
It is easy to see that under this order R is dense and without endpoints.
For x ∈ T let Bx = {b ∈ R : x ∈ b}. If (b1 , b2 ) is an open interval in R
then there is an x ∈ T such that Bx ⊆ (b1 , b2 ). Given a set S of pairwise
disjoint intervals, any set of such x is an antichain in T , hence at most
countable, whence S is at most countable. Suppose S is a countable
subset of R. Then there is a level α such that no branch in S has an
element at level α in T . For any x ∈ T at level α, Bx does not contain
an element of S. Thus, S is not dense, and since S was arbitrary R does
not have a countable dense subset. ⊳
26. Diamond implies ¬SH.
That ¬SH is consistent was first proved by constructing a generic
extension in which there is a Suslin line. Later, Jensen showed that ¬SH
follows from V = L, indeed from ♦.
If T is a κ-tree and α < κ let T_{<α} denote ∪_{β<α} Tβ. Properties
N1-N4 of section 25 will be used. Note that these can be defined for
trees of height α where α ∈ LimOrd, and not just κ-trees.
Lemma 1. Suppose α is a countable limit ordinal and T is a tree of
height α having countable levels and properties N1-N4. Suppose x ∈ T .
Then there is a branch b ⊆ T of length α with x ∈ b.
Proof: Suppose x is of level β0. Choose an increasing sequence βi
with i ∈ ω, unbounded below α. Let x0 = x, and inductively let x_{i+1}
be such that x_{i+1} ∈ T_{β_{i+1}} and x_{i+1} > x_i; this is possible by property N1.
Let b be the union of the sets x_i^≤. ⊳
Lemma 2. Suppose T is an ℵ1 -tree having properties N1-N4, and
no uncountable antichains. Then T is a Suslin tree.
Proof: Suppose b is a branch. For each x ∈ b let yx be a son of
x, other than the son in b. Then {yx : x ∈ b} is an antichain, so is
countable, and it follows that b is countable. ⊳
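The claim that the chosen sons form an antichain can be checked as follows. This is a sketch: lev(x) is used here for the level of a node (notation assumed for the illustration), and the existence of a son of x outside b uses properties N1 and N4.

```latex
% Let x < x' be nodes of b. Comparing levels:
\[
\operatorname{lev}(y_x) = \operatorname{lev}(x) + 1 \le \operatorname{lev}(x')
< \operatorname{lev}(x') + 1 = \operatorname{lev}(y_{x'}),
\]
% so y_{x'} \le y_x is impossible. If y_x \le y_{x'}, then y_x and x' both lie
% below y_{x'}, hence are comparable (the predecessors of a node form a chain),
% and by levels y_x \le x'. A branch is closed downward, so
\[
y_x \le y_{x'} \ \Longrightarrow\ y_x \le x' \ \Longrightarrow\ y_x \in b ,
\]
% contradicting the choice of y_x as a son of x outside b.
\[
\text{Hence } \{\, y_x : x \in b \,\} \text{ is an antichain, and } x \mapsto y_x
\text{ is injective, so } |b| \le \aleph_0 .
\]
```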
Lemma 3. Suppose T is an ℵ1 -tree having properties N1-N4. Suppose A ⊆ T is a maximal antichain. Let C = {α : A ∩ T<α is a maximal
antichain in T<α }. Then C is a club subset of ℵ1 .
Proof: Suppose A ∩ T_{<βi} is a maximal antichain in T_{<βi}, where
βi is an increasing sequence of ordinals with limit α. If x ∈ T_{<α} then
x ∈ T_{<βi} for some i, so x is comparable with y for some y ∈ A ∩ T_{<βi}.
It follows that C is closed. Choose any α0 < ℵ1. Given αi, since T_{<αi}
is countable there is some α_{i+1} with αi < α_{i+1} < ℵ1, such that every
x ∈ T_{<αi} is comparable with some y ∈ A ∩ T_{<α_{i+1}}. Let α = ∪_{i∈ω} αi; then
A ∩ T_{<α} is a maximal antichain in T_{<α}, so α ∈ C and C is unbounded. ⊳
Theorem 4. ♦ ⊢_ZFC ¬SH.
Proof: A Suslin tree will be constructed, by constructing T_{<α} recursively for α < ℵ1; T_{<α} will have properties N1-N4. Let T0 be a single
node. To add T_{α+1} to T_{<α+1}, give each node of Tα two sons.
Suppose α ∈ LimOrd. By lemma 1, for each x ∈ T_{<α} there is a
branch b ⊆ T_{<α} of length α having x as a member. Tα is obtained by
choosing for each x ∈ T_{<α} a branch bx having x as a member, and
adding a node at level α “extending” bx.
To specify how bx is chosen, the nodes of T_{<α} for α ∈ LimOrd
will be enumerated as νξ for ξ < α. This can be achieved recursively,
by enumerating the nodes in ∪_{α≤β<α+ω} Tβ as ν_{α+i} for i ∈ ω in some
manner (for example using the pairing function of appendix 2 to obtain
i for the jth node of the kth subtree).
Let ⟨Sα : α < ℵ1⟩ be a diamond sequence. Identifying the nodes of
T_{<α} with their indices in the enumeration, suppose Sα happens to be
a maximal antichain in T_{<α}. Then for any x ∈ T_{<α} there is a y ∈ Sα
which is comparable with x, and bx may be chosen to contain such a
y. If Sα is not a maximal antichain in T_{<α}, each bx may be chosen
arbitrarily.
Letting T = ∪_α T_{<α}, clearly T is an ℵ1-tree having properties N1-N4. By lemma 2, and the fact that any antichain can be enlarged to
a maximal one, to show that T is a Suslin tree it suffices to show that
any maximal antichain is countable. Let A be a maximal antichain in
T. Let C = {α ∈ LimOrd : A ∩ T_{<α} is a maximal antichain in T_{<α}}.
By lemma 3, C is a club set (the proof shows that α ∈ LimOrd may be
required).
Since ⟨Sα⟩ is a diamond sequence there is an α ∈ C such that A ∩ α =
Sα. Since α ∈ C, A ∩ T_{<α} is a maximal antichain in T_{<α}. Suppose x ∈ T
is of level ≥ α; then there is a y ∈ Tα such that y ≤ x. Since A ∩ α = Sα
is a maximal antichain in T_{<α}, by construction there is a z ∈ A ∩ α such
that z ≤ y. It follows that A ∩ T_{<α} is a maximal antichain in T, whence
A ∩ T_{<α} = A, whence A is countable. ⊳
27. Iterated forcing.
Many forcing arguments make use of “successive” extensions of the
ground model. There are two variations: the successive partial orders
are in the ground model, or in the preceding extension. These are called
“product forcing” and “iterated forcing” respectively.
In either case, the overall extension turns out to be a generic extension of the ground model. As in the case of a simple generic extension,
properties of the overall forcing notion may be proved, necessary to ensure properties of M [G]. These in turn may be seen to hold by general
arguments, from properties of the partial orders Pi where Pi is used at
stage i.
The simplest example is the product of two forcing notions P1 and
P2. In partial order theory, the product P1 × P2 is the Cartesian product (the set of ordered pairs ⟨p1, p2⟩), with the product order, where
⟨p1, p2⟩ ≤ ⟨q1, q2⟩ if and only if p1 ≤ q1 and p2 ≤ q2.
Lemma 1. Suppose P1 and P2 are notions of forcing in a transitive
model M . Let G be a subset of P1 ×P2 . Let Gi = πi [G] for i = 1, 2. Then
G is an M -generic filter if and only if G1 is an M -generic filter and G2 is
an M [G1 ]-generic filter, if and only if G2 is an M -generic filter and G1 is
an M [G2 ]-generic filter. In this case, M [G] = M [G1 ][G2 ] = M [G2 ][G1 ].
Remarks on proof: This is lemma 15.9 of [Jech2]. ⊳
One application of the product of two notions may be found in
lemma 15.19 of [Jech2], a fact used in the proof of Easton’s theorem
(stated in section 51). Various products of a family Pi where i ranges
over an index set I are considered. In such cases, often each Pi is
required to have a largest element 1. The support of a sequence ⟨pi :
i ∈ I⟩ in the Cartesian product is {i ∈ I : pi ≠ 1}. The overall notion
of forcing is a subset of the Cartesian product, namely those sequences
whose support satisfies some restriction. For example, the notion
of forcing that adds κ Cohen reals is the same as the product with finite
support of κ copies of the notion that adds a single Cohen real. (The
largest element is the function with empty domain.)
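The identification of the two notions can be sketched as follows. The sketch assumes (as is standard, though not stated in this section) that the notion adding a single Cohen real is the set Fn(ω, 2) of finite partial functions from ω to {0,1} ordered by reverse inclusion, and that the notion adding κ Cohen reals is Fn(κ × ω, 2); the map Φ is a name introduced here.

```latex
\[
\mathrm{Fn}(X, 2) = \{\, p : p \text{ is a finite partial function from } X \text{ to } \{0,1\} \,\},
\qquad p \le q \iff p \supseteq q .
\]
% Send a condition p on \kappa \times \omega to the sequence of its "rows":
\[
\Phi(p) = \langle p_i : i < \kappa \rangle, \qquad
p_i(n) = p(i, n) \ \text{ whenever } (i, n) \in \operatorname{dom}(p).
\]
% p is finite, so p_i is the empty function (the largest element 1)
% for all but finitely many i, and the order is computed coordinatewise:
\[
\operatorname{supp}(\Phi(p)) = \{\, i < \kappa : p_i \neq \emptyset \,\} \ \text{is finite},
\qquad
p \supseteq q \iff \forall i < \kappa\ (p_i \supseteq q_i).
\]
```

Under these assumptions Φ is an order isomorphism of Fn(κ × ω, 2) onto the finite support product of κ copies of Fn(ω, 2).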
By lemma 1, forcing with P1 × P2 is the same as forcing with P1 ,
and then with P2 (or P2 and then P1 ). P2 is an element of the ground
model M , and the generic filter for the second forcing is a subset of P2 .
More generally, forcing with P2 , where this is an element of M [G1 ], can
be considered. Forcing in this manner is called iterated forcing. It was
first used in 1971 by R. Solovay and S. Tennenbaum, and has since seen
extensive use.
To define a notion of forcing in M which is equivalent to the iterated
forcing, a name for P2 must be used. Recall that this is an element of
M^{B1} where B1 = ro(P1); Ṗ2 will be used to denote it. The notation
P1 ∗ Ṗ2 is used to denote the notion equivalent to a two-step iteration.
The elements of P1 ∗ Ṗ2 are ordered pairs ⟨p1, ṗ2⟩ where p1 ∈ P1 and ⟦ṗ2 ∈
Ṗ2⟧ = 1. The truth value is taken with respect to the first forcing, and
is an element of B1. This is the definition used in [Jech2]; other authors
use variations, which are mostly inessential, although some differences
may result.
To define the partial order on P1 ∗ Ṗ2, let ⟨p1, ṗ2⟩ ≤ ⟨q1, q̇2⟩ if and
only if p1 ≤ q1 and p1 ⊩ ṗ2 ≤ q̇2.
Ṗ2 is required to be a name for a partial order, that is, the statement
“Ṗ2 is a partial order” is required to have truth value 1, whence it is
forced by any condition. Using this, it is easy to verify that the relation
≤ on P1 ∗ Ṗ2 is a partial order. For example, suppose ⟨p1, ṗ2⟩ ≤ ⟨q1, q̇2⟩
and ⟨q1, q̇2⟩ ≤ ⟨r1, ṙ2⟩. Then p1 ≤ q1 and q1 ≤ r1, whence p1 ≤ r1 since
P1 is a partial order. Also p1 ⊩ ṗ2 ≤ q̇2 and q1 ⊩ q̇2 ≤ ṙ2, whence, since
p1 ≤ q1, p1 ⊩ ṗ2 ≤ q̇2 ∧ q̇2 ≤ ṙ2. By the requirement on Ṗ2, p1 ⊩ ṗ2 ≤ ṙ2.
The reflexivity and antisymmetry properties may be similarly verified.
Lemma 2. Suppose G1 ⊆ P1 is an M-generic filter, P2 = Ṗ2^{G1}, and
G2 ⊆ P2 is an M[G1]-generic filter. Let G1 ∗ G2 = {⟨p1, ṗ2⟩ : p1 ∈ G1
and ṗ2^{G1} ∈ G2}. Then G1 ∗ G2 ⊆ P1 ∗ Ṗ2 is an M-generic filter, and
M[G1 ∗ G2] = M[G1][G2].
Remarks on proof: This is lemma 16.2(i) of [Jech2]. ⊳
Lemma 3. Suppose G ⊆ P1 ∗ Ṗ2 is an M-generic filter. Let G1 =
π1[G], P2 = Ṗ2^{G1}, and G2 = {ṗ2^{G1} : ṗ2 ∈ π2[G]}. Then G1 ⊆ P1 is an
M-generic filter, G2 ⊆ P2 is an M[G1]-generic filter, and G = G1 ∗ G2.
Remarks on proof: This is lemma 16.2(ii) of [Jech2]. ⊳
It should not come as a surprise that the foregoing two-step iteration can be used at successor stages of a transfinite iteration. To specify
an iterated forcing in general, an overall poset P is defined as a poset
of sequences of length η for some ordinal η > 0. The notation Pα will
be used for {p ↾ α : p ∈ P}, for 1 ≤ α ≤ η. To simplify the notation, for
a partial order P let M^P denote M^{ro(P)}.
P must have various properties, which are specified inductively using the Pα. At stages where α is a successor ordinal β + 1, a two-step
iteration is done, using Pβ and a name Q̇β in M^{Pβ} for a notion of forcing. The case α = 1 is special, and involves a notion of forcing Q0 in
M. Q0 and the Q̇α are all required to contain a largest element, which
will be denoted 1.
The notation ≤_α is used for the order on Pα. Likewise ⟦F⟧_α denotes
the truth value in M^{Pα}, and ⊩_α the corresponding forcing relation. The
requirements that Pα be the partial order of a two-step iteration when
α = β + 1 are as follows, where p, q ∈ Pα.
1. p ↾ β ∈ Pβ and ⟦p(β) ∈ Q̇β⟧_β = 1.
2. p ≤_α q if and only if p ↾ β ≤_β q ↾ β and p ↾ β ⊩_β p(β) ≤ q(β).
This differs from the general definition slightly, in that Pβ × M^{Pβ} is
identified with Pα.
P1 is just the set of length 1 sequences p with p(0) ∈ Q0 ; p ≤1 q if
and only if p(0) ≤ q(0).
When α is a limit ordinal, p ∈ Pα only if p ↾ β ∈ Pβ for β < α; and
p ≤ q if and only if p ↾ β ≤ q ↾ β for β < α. The exact definition of
Pα varies, depending on the type of iterated forcing. Some “regularity”
properties may be required. In [Jech2], P is required to contain the
sequence p where p(α) = 1 for all α < η. Also, if β < α, p ∈ Pα , q ∈ Pβ ,
and q ≤ p ↾ β, then p′ ∈ Pα , where p′ (γ) equals q(γ) if γ < β, else p(γ).
As above, the support of a sequence p is {α < η : p(α) ≠ 1}. A
general type of iteration considers those p whose support is in a specified
ideal I ⊆ Pow(η) which contains the finite sets; such a notion of forcing is
called an iteration with I-support. At limit stages, all such p are taken.
The simplest type of iterated forcing is iteration with finite support,
which is iteration with I-support, where I is the finite sets.
The properties which P must have, and the methods used to prove
that they hold, again vary with the type of forcing.
Theorem 4. Suppose κ is a regular uncountable cardinal. Suppose
P is a finite support iteration of length η, where ⟦“Q̇α satisfies the κ-c.c.”⟧_α = 1 for all α < η. Then P satisfies the κ-c.c.
Remarks on proof: This is lemma 16.9 of [Jech2]. ⊳
28. Martin’s axiom.
As noted in section 21, it is of little interest in forcing arguments
whether a generic filter actually exists. It has turned out that the existence question is of interest in set theory. The following fact is a classic
theorem of partial order theory, called by some authors the Rasiowa-Sikorski lemma.
Theorem 1. Suppose P is a poset, and C is a countable collection
of dense subsets of P. Then there is a filter G ⊆ P, such that G ∩ D ≠ ∅
for every D ∈ C.
Proof: Enumerate C as D0, D1, . . .. Let p0 be an arbitrary element of
P. Given pn, let pn+1 be any element p ∈ Dn such that p ≤ pn. Let
G = {q : q ≥ pn for some n}. ⊳
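The verification that G is as required can be sketched as follows, under the convention (the one used in the proof of lemma 2 of this section) that q ≤ p means q is the stronger condition, so that D is dense if every p ∈ P has some q ≤ p in D, and a filter is upward closed. With this convention the chosen sequence is descending and G is its upward closure; under the dual convention the same argument applies with the order reversed.

```latex
\[
p_0 \ge p_1 \ge p_2 \ge \cdots, \qquad p_{n+1} \in D_n, \qquad
G = \{\, q \in P : \exists n\ (q \ge p_n) \,\}.
\]
% Upward closed: if q \in G and q \le r, then r \ge q \ge p_n for some n, so r \in G.
% Directed: if q \ge p_m and r \ge p_n, put k = \max(m, n); then
\[
p_k \le p_m \le q, \qquad p_k \le p_n \le r, \qquad p_k \in G .
\]
% G meets every member of C:
\[
p_{n+1} \in G \cap D_n \quad \text{for every } n .
\]
```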
In accordance with the terminology of forcing, a set G as in the
theorem is often said to be “C-generic”.
Martin’s axiom (MA) is a statement more general than theorem 1:
Suppose P is a poset satisfying the c.c.c., and C is a collection of dense
subsets of P, with |C| < 2^ℵ0. Then there is a filter G ⊆ P, such that
G ∩ D ≠ ∅ for every D ∈ C.
MA follows from CH by theorem 1; indeed the hypothesis that P
be c.c.c. is unnecessary. If CH is false, however, then MA asserts the
existence of generic filters in additional cases. MA+¬CH has turned out
to be an assumption of interest in considering independent statements;
it settles a variety of them, sometimes in the same manner as V = L,
and sometimes in the opposite manner.
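The remark that MA follows from CH amounts to the following short computation, given here as a sketch of the reasoning.

```latex
\[
\mathrm{CH} \ \Longrightarrow\ 2^{\aleph_0} = \aleph_1
\ \Longrightarrow\ \bigl(\, |C| < 2^{\aleph_0} \iff |C| \le \aleph_0 \,\bigr).
\]
% So under CH the hypothesis of MA on C is exactly the hypothesis of theorem 1,
% which produces the filter G for an arbitrary poset P, c.c.c. or not.
```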
To begin the discussion, MA will be reduced to a special case,
where only “small” P need be considered. Let MA_s denote the following
statement:
Suppose P is a poset satisfying the c.c.c. and |P| < 2^ℵ0, and C is a
collection of dense subsets of P, with |C| < 2^ℵ0. Then there is a filter
G ⊆ P, such that G ∩ D ≠ ∅ for every D ∈ C.
Lemma 2. MA follows from MA_s.
Proof: Given any P and C, a poset P′ ⊆ P with |P′| ≤ sup(ℵ0, |C|) will
be constructed, in such a way that MA_s may be applied. Let p be an
element of P. Let fc : P × P ↦ P be such that if p and q are compatible
and r = fc(p, q) then r ≤ p, q. For D ∈ C let fD : P ↦ D be such that
fD(p) ≤ p. P′ will be taken as the smallest subset of P containing
p and closed under fc and fD for D ∈ C. By standard arguments
|P′| ≤ sup(ℵ0, |C|). A subset of P′ which is pairwise incompatible in
P′ is pairwise incompatible in P, so is countable. Each set D ∩ P′ is
dense in P′. Thus, by MA_s there is a filter F′ ⊆ P′ which has nonempty
intersection with every D ∩ P′ for D ∈ C. Let F be the upward closure
of F′ in P; then F is a filter which has nonempty intersection with every
D ∈ C. ⊳
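The “standard arguments” bounding the size of P′ can be sketched as a closure in ω stages. The stage sets P′_n and the abbreviation μ are names introduced here for the illustration.

```latex
\[
P'_0 = \{p\}, \qquad
P'_{n+1} = P'_n \,\cup\, f_c[P'_n \times P'_n] \,\cup \bigcup_{D \in C} f_D[P'_n],
\qquad
P' = \bigcup_{n < \omega} P'_n .
\]
% With \mu = \sup(\aleph_0, |C|), induction on n gives |P'_n| \le \mu, since
\[
|P'_{n+1}| \le |P'_n| + |P'_n|^2 + |C| \cdot |P'_n|
\le \mu + \mu \cdot \mu + \mu \cdot \mu = \mu ,
\qquad
|P'| \le \aleph_0 \cdot \mu = \mu .
\]
```

Since |C| < 2^ℵ0, this gives |P′| < 2^ℵ0, as MA_s requires.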
Theorem 3. Suppose M is a transitive model of ZFC+GCH. Suppose that (in M) κ > ℵ1 is a regular cardinal. There is a notion of
forcing P satisfying the c.c.c., such that a generic extension M[G] satisfies MA, and 2^ℵ0 = κ.
Remarks on proof: P will be an iteration with finite support of
length κ. The idea of the proof is to choose Q̇α such that each small
poset Q in M [G] gets used as Q̇α at some stage. Pα+1 will then contain
a generic filter for Q. Some further details will be given; for complete
details see theorem 16.13 of [Jech2].
Q̇α is chosen so that “Q̇α satisfies the c.c.c.” and “|Q̇α| < κ” have
truth value 1 in V^{Pα}. From this, and GCH in M, the following can be
concluded: Pα and P satisfy the c.c.c., |Pα| ≤ κ, there are at most κ
candidates for Q̇α, and “2^ℵ0 ≤ κ” has truth value 1 in V^P.
Recalling the function Γ from section 13, let α = Γ(β, γ), and at
stage α let Q̇α be the γth name in V^{Pβ} of a suitable small poset.
Let G be a generic filter for P , and let Gα = G ↾ Pα . It may be
shown that if λ ≤ κ, X ⊆ λ, and X ∈ M [G] then X ∈ M [Gα ] for some
α < κ. Using this, it follows that if Q ∈ M [G] is a suitable poset, and
C ∈ M[G] is a collection of dense subsets with |C| < κ, then there is a
C-generic filter F ∈ M[G] for Q. This shows that M[G] satisfies MA_s.
Further, if X ⊆ {0,1}^ω and |X| < κ, Q and C may be chosen to
conclude that X ≠ {0,1}^ω. Thus, 2^ℵ0 ≥ κ, whence 2^ℵ0 = κ, as 2^ℵ0 ≤ κ
was shown already. ⊳
In what follows a brief list of consequences of MA will be given.
There is an entire book [Fremlin1] devoted to the subject. In some
cases the consequence might follow from a weaker hypothesis.
Recall from section 15 the notions of meager and measure 0 subsets
of R. It is provable in ZFC that these are closed under countable unions.
Hence, if CH is true, they are closed under unions of size < 2^ℵ0. In fact
it follows from MA that they are closed under unions of size < 2^ℵ0. A
proof may be found in [Jech2], theorem 26.39, or [Ciesielski], theorems
8.2.6 and 8.2.7.
It is independent whether the measure 0 sets are closed under unions
of size < 2^ℵ0. Indeed, if M is countable, P is the notion of forcing for
adding ℵ2 reals, and G ⊆ P is M-generic, then in M[G] the interval [0, 1]
is a union of < 2^ℵ0 measure 0 sets. This is corollary 9.4.7 of [Ciesielski].
It follows that ¬MA is consistent with ¬CH.
Recall from section 15 the notion of a cardinal invariant. It follows
from MA that these all equal 2^ℵ0. Discussions may be found in [Jech2]
and [Roitman]. A related result concerns scales, which are certain families of functions f : ω ↦ ω. MA implies that these exist, and all have
cardinality 2^ℵ0; see [Roitman].
CH implies that there are partial orders satisfying the c.c.c., whose
product does not satisfy it (theorem 8.1.12 of [Ciesielski]). MA+¬CH
implies that for any two partial orders satisfying the c.c.c., their product
satisfies it (theorem 8.2.10 of [Ciesielski]).
Ultrafilters were mentioned in section 21. An ultrafilter in a Boolean algebra B is a filter F , such that for each x ∈ B, either x ∈ F
or x† ∈ F . A p-point is an ultrafilter in Pow(ω), which has certain
properties. It was shown in 1970 that MA implies the existence of p-points; [Jech2] gives a proof in theorem 16.27. It had been shown by
W. Rudin in 1956 that CH implies the existence of p-points. In 1982 it
was shown by S. Shelah that there are models of ZFC where there are
no p-points.
The notion of a commutative (or Abelian) group was defined in
section 8. Free Abelian groups are an important type. Whitehead
groups are defined as groups which have an important property of free
groups (trivial extension group). In the 1950’s the problem was raised
of whether every Whitehead group was free. This was shortly shown
to be the case for countable groups. The problem remained open for
arbitrary cardinality until 1974, when S. Shelah showed the following.
1. If V = L, or in fact if a certain ♦ principle holds, then every Whitehead
group is free.
2. If MA+¬CH holds then there are Whitehead groups which are not
free.
Finally, SH follows from MA+¬CH; this is shown in the next section.
29. SH is consistent.
SH was the first principle proved consistent using iterated forcing.
It was almost immediately observed that SH follows from MA+¬CH,
that the consistency of MA+¬CH follows by a basic iterated forcing
argument, and that MA+¬CH had various other consequences.
It is sometimes valuable to refine the hypothesis MA+¬CH. For an
infinite cardinal κ, MAκ is the following statement.
Suppose P is a poset satisfying the c.c.c., and C is a collection of
dense subsets of P, with |C| ≤ κ. Then there is a filter G ⊆ P, such
that G ∩ D ≠ ∅ for every D ∈ C.
MA is the statement ∀κ < 2^ℵ0 (MAκ). MA_{ℵ0} is true, by theorem
28.1.
Theorem 1. MAκ can only hold if κ < 2^ℵ0.
Proof: Let P be the poset for adding a single real. For each f ∈
{0,1}^ω let Df = {p ∈ P : ∃n ∈ ω (p(n) ≠ f(n))}. Suppose G is a filter
generic for this collection. Since G is a filter, h = ∪G exists. But this
is a contradiction, since h cannot equal any f ∈ {0,1}^ω. ⊳
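The contradiction can be made explicit as follows. This sketch assumes that the poset for adding a single real is the set of finite partial functions from ω to {0,1}, ordered by reverse inclusion, and it adds countably many auxiliary dense sets E_n (a name introduced here) so that h is defined on all of ω.

```latex
% P: finite partial functions from \omega to \{0,1\}, with p \le q iff p \supseteq q.
\[
D_f = \{\, p \in P : \exists n \in \operatorname{dom}(p)\ (p(n) \neq f(n)) \,\},
\qquad
E_n = \{\, p \in P : n \in \operatorname{dom}(p) \,\}.
\]
% Both are dense: a finite p can be extended at a new point n with the value
% 1 - f(n) (for D_f), and at n itself with any value if n is not yet in dom(p) (for E_n).
% If a filter G meets every E_n and every D_f, then h = \bigcup G satisfies
\[
h \in \{0,1\}^{\omega}
\qquad\text{and}\qquad
\forall f \in \{0,1\}^{\omega}\ \exists n\ (h(n) \neq f(n)),
\]
% which is contradictory (take f = h). P is countable, hence c.c.c., and the
% family of dense sets has size 2^{\aleph_0}, so MA_\kappa fails for \kappa \ge 2^{\aleph_0}.
```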
Theorem 2. MAℵ1 implies SH.
Proof: Suppose T is a Suslin tree, with property N1 of section 25.
Let P be T with the order reversed, so that p < q if p is farther down
a branch of the tree than q. If two elements of P are incompatible in
P , then they are incomparable in T , and since any pairwise incomparable subset of T is countable, any pairwise incompatible subset of P is
countable. That is, P satisfies the c.c.c. For α < ℵ1 let Dα be the set
of nodes of T of level > α. It is easy to see using property N1 that Dα
is a dense subset of P. Since |P| = |T| = ℵ1, by MA_{ℵ1} there is a filter
G which intersects every Dα. Since G is a filter, its union must be a
branch of T , and since it intersects every Dα , it must have length ℵ1 ,
contradicting the hypothesis that T is a Suslin tree. ⊳
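The density of Dα, described in the proof as easy to see, can be sketched as follows. Here lev(p) denotes the level of p in T, and ≤_T and ≤_P are written for the tree order and its reverse; this notation is introduced for the illustration only.

```latex
% Order of P: q \le_P p iff p \le_T q.
\[
D_\alpha = \{\, q \in T : \operatorname{lev}(q) > \alpha \,\}.
\]
% Given p \in P with lev(p) = \beta, put \gamma = \max(\alpha, \beta) + 1 > \beta.
% Property N1 gives a node q of level \gamma with p \le_T q, hence
\[
q \le_P p \qquad\text{and}\qquad \operatorname{lev}(q) = \gamma > \alpha ,
\quad\text{that is, } q \in D_\alpha .
\]
```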
From theorems 23.1 and 26.4, CH+¬SH is consistent with ZFC.
From theorem 28.3 and theorems 1 and 2, ¬CH+SH is consistent with ZFC.
That ¬CH+¬SH is consistent with ZFC follows by a theorem of
Shelah, that adding a Cohen real adds a Suslin tree. This is theorem
28.12 of [Jech2], or theorem 46 of [Roitman].
That CH+SH is consistent with ZFC was shown by Jensen. The
proof is quite involved, and may be found in [DevJohn]. Later, Shelah
gave a proof using “proper forcing”, which according to [Kanamori2]
he invented partly to prove this theorem. Still later, the proof was
simplified [AbrShel]. It was remarked in section 23 that CH+¬♦ is
consistent with ZFC; this follows because CH+SH is, and SH ⇒ ¬♦.
30. Inaccessible cardinals.
Recall that a cardinal κ is a limit cardinal if λ^+ < κ whenever λ < κ.
A cardinal κ is said to be a strong limit cardinal if 2^λ < κ whenever
λ < κ. If GCH holds then the two notions are equivalent, but since
GCH is independent the two notions must be considered separately.
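The equivalence under GCH is a one-line computation in each direction; it is sketched here for infinite λ, the finite case being trivial when κ is infinite.

```latex
% Under GCH, 2^\lambda = \lambda^+ for every infinite cardinal \lambda, so
\[
\kappa \text{ a limit cardinal},\ \lambda < \kappa
\ \Longrightarrow\ 2^{\lambda} = \lambda^{+} < \kappa .
\]
% Conversely, with no extra hypothesis, \lambda^+ \le 2^\lambda by Cantor's theorem, so
\[
\kappa \text{ a strong limit cardinal},\ \lambda < \kappa
\ \Longrightarrow\ \lambda^{+} \le 2^{\lambda} < \kappa .
\]
```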
Recall also that a cardinal κ is regular if there is no (increasing)
map f : α ↦ κ from an ordinal α < κ, whose range is unbounded. A
cardinal κ is said to be weakly inaccessible (resp. strongly inaccessible)
if it is uncountable, regular, and a limit (resp. strong limit) cardinal.
Many authors use the term “inaccessible” by itself to mean “strongly inaccessible”, and this will be done here.
Lemma 1. Suppose κ is an inaccessible cardinal.
a. If α < κ then |Vα | < κ.
b. |Vκ | = κ.
Proof: Part a is proved by induction on α. At successor stages,
|V_{α+1}| = 2^{|Vα|}; |Vα| < κ by induction, so |V_{α+1}| < κ since κ is a strong
limit cardinal. At limit stages, |Vβ| < κ for β < α by induction, so ∪_{β<α} |Vβ| < κ
since κ is regular and α < κ, so |Vα| < κ. For part b, using part a,
|Vκ| = ∪_{α<κ} |Vα| ≤ κ; and clearly κ ≤ |Vκ| for any cardinal κ. ⊳
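The cardinal arithmetic behind the two inductive steps and part b can be displayed as follows. This is a sketch; the abbreviation μ is introduced here.

```latex
% Successor step (\kappa a strong limit cardinal):
\[
|V_{\alpha+1}| = |\mathrm{Pow}(V_\alpha)| = 2^{|V_\alpha|} < \kappa
\quad\text{since } |V_\alpha| < \kappa .
\]
% Limit step (\kappa regular, \alpha < \kappa): put \mu = \sup_{\beta<\alpha} |V_\beta|.
% Then \mu < \kappa, since otherwise \beta \mapsto |V_\beta| would map \alpha
% unboundedly into \kappa, contradicting regularity.
\[
|V_\alpha| = \Bigl|\, \bigcup_{\beta<\alpha} V_\beta \Bigr|
\le |\alpha| \cdot \mu < \kappa .
\]
% Part b (\kappa \subseteq V_\kappa gives the lower bound):
\[
\kappa \le |V_\kappa| = \Bigl|\, \bigcup_{\alpha<\kappa} V_\alpha \Bigr|
\le \kappa \cdot \kappa = \kappa .
\]
```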
It was observed in section 17 that if α is a limit ordinal and α >
ω then Vα is a model of the axioms of set theory, with the possible
exception of the replacement axiom. The replacement axiom can be
given in a “second order” form,
∀F (“F is a partial function”⇒ ∀u∃v(v = F [u])).
Theorem 2. If κ is an inaccessible cardinal then Vκ is a model of
second order replacement, and hence a model of ZFC.
Proof: If u ∈ Vκ then u ∈ Vλ for some λ < κ, so u ⊆ Vλ , so
|u| ≤ |Vλ | < κ. Let r(x) equal ρ(F (x)) if F (x) exists, else 0. Since
κ is regular there is an α < κ such that x ∈ u ⇒ r(x) < α, whence
F [u] ⊆ Vα , whence F [u] ∈ Vα+1 , whence F [u] ∈ Vκ . This shows that Vκ
is a model of second order replacement. A fortiori Vκ is a model of the
replacement axiom scheme, hence by remarks above a model of ZFC. ⊳
Lemma 3. The following predicates are Π1 :
a. κ is a cardinal.
b. κ is a regular cardinal.
c. κ is a limit cardinal.
d. κ is a strong limit cardinal.
Further, if κ is an inaccessible cardinal and λ < κ has one of these
properties in Vκ then λ has the property.
Proof: Part a was already observed in section 16. For part b, a
cardinal κ is regular if and only if ¬∃f ∃α < κ, the domain of f is a
subset of α and f[α] is unbounded in κ. For part c, a cardinal κ is a
limit cardinal if and only if for all α < κ, ∃λ < κ, α < λ and λ is a
cardinal. For part d, a cardinal κ is a strong limit cardinal if and only
if ¬∃f ∃α < κ, any element in the domain of f is a subset of α and
f[α] = κ. For the second claim, suppose λ is an ordinal and f : α ↦ λ
where α < λ and f[α] = λ. Since every ordered pair of f is in Vλ,
f ∈ Vκ. Similarly if f is a function contradicting that λ is a regular
cardinal or a strong limit cardinal then f ∈ Vκ. Finally if λ = µ^+ then
µ ∈ Vκ. ⊳
Theorem 4. If ZFC is consistent then it is not provable in ZFC that
an inaccessible cardinal exists.
Proof: Let I be the statement that an inaccessible cardinal exists.
Let F be the statement “for all M, if M is a model of ZFC then M is a
model of I”. Using lemma 3, it follows from I that there is a model of
ZFC in which ¬I holds, namely Vκ where κ is the smallest inaccessible
cardinal. Writing ⊢ for ⊢_ZFC, it has been shown that ⊢ I ⇒ ¬F. On
the other hand, if ⊢ I then ⊢ F. Thus if ⊢ I then both ⊢ F and ⊢ ¬F,
so ZFC is inconsistent. ⊳
Theorem 5. If ZFC is consistent then it is not provable in ZFC that
a weakly inaccessible cardinal exists.
Proof: Let W be the statement that a weakly inaccessible cardinal
exists. By an argument similar to the proof of theorem 4, if W is
provable in ZFC+V = L then ZFC+V = L is inconsistent. But if W is
provable in ZFC then W is provable in ZFC+V = L, and if ZFC+V = L
is inconsistent then ZFC is inconsistent. ⊳
Set theorists have long suspected that inaccessible cardinals exist. [Godel] contains arguments in favor of their existence. [Hauser]
states that “Their existence is intrinsically plausible on the basis of
. . . the doctrine that the universe of all sets V is beyond any determination”. [Bagaria] states that the axioms for set theory which should be
found intuitively obvious include ZFC “plus, perhaps, some small large-cardinal existence axioms”. See also remarks at the end of chapter 1 of
[Kanamori3], including a quote from a 1930 paper of Zermelo.
A compelling argument can be given via “meta-logical” considerations concerning the universe of discourse of set theory. This behaves
in many respects like a set, indeed like a domain of discourse of mathematics which is a set, satisfying some axioms. A type structure can be
erected on top of it, proper classes being “type 1” collections. Axioms
which are intuitively obvious can be given. Among these is the axiom
which states that the universe satisfies second order replacement.
These observations can be seen as putting the existence of inaccessible cardinals on the same footing as the existence of sets in general.
Since there is “something” that behaves in this way, there is a set that
does. The universe is a concept more vague than a set, and aspects of
the situation are “reflected” by the existence of inaccessible cardinals.
In addition, a universe without an inaccessible cardinal can be obtained by “truncating” a universe with inaccessible cardinals, at the
smallest such. This truncation seems arbitrary, once the existence of
inaccessible cardinals is considered.
A “large cardinal” is a cardinal at least as large as the smallest
inaccessible cardinal. Throughout the history of set theory various types
of large cardinals have been defined, and their properties studied. While
there is still debate on which types of large cardinals should be accepted
as existing, most set theorists would probably agree that the existence
of inaccessible cardinals should be added to the axioms of set theory.
This has not yet been done, though, probably because it makes little
difference to the rest of mathematics.
As a corollary of lemma 3, if κ is an inaccessible cardinal then it
remains one in L. As will be seen, some types of large cardinals have
this property, and some do not.
31. Mahlo cardinals.
In the preceding section, it was observed that inaccessible cardinals
can be argued to exist, because the universe behaves like a set, so there
is a set that behaves like the universe. This principle might be called
“collecting the universe”. At stages of the cumulative hierarchy ⟨Vα⟩
where second order replacement is satisfied, the collection process may
be continued.
It should be clear that according to the principle of collecting the
universe, the inaccessible cardinals form a proper class. The inaccessible cardinals should be unbounded, and the universe should retain the
characteristics of an inaccessible cardinal. Continuing, there are inaccessible cardinals κ where the inaccessible cardinals are unbounded below
κ. Such cardinals are called hyperinaccessible.
To continue further, it is useful to introduce the operator Lim on
classes of ordinals. If X is a class of ordinals let Lim(X) = {α : X ∩
α is unbounded below α} denote the limit points of X, as defined in
section 23. This is standard notation, used for example in [Jech2]. The
notation Lim′ (X) will be used for X ∩ Lim(X) (there doesn’t seem to
be a standard notation for this).
Let Inac denote the class of inaccessible cardinals. The hyperinaccessible cardinals are clearly the class Lim′ (Inac). Applying Lim′ again
results in the hyper-hyperinaccessible cardinals, etc. It seems clear that
continuing to apply the operation Lim′ results in cardinals which can
be argued to exist using the principle of collecting the universe. A more
complete discussion of this topic may be found in [Dowd2].
Mahlo cardinals are named after P. Mahlo, who defined weakly
Mahlo cardinals in 1911. There had been suspicion that Mahlo cardinals represented some sort of “limit” of iterating the operation Lim′ ;
see [Drake] for example. A result to this effect was proved in 1967 in
[Gaifman]. The proof has been simplified by the author and others. It
will be given here, since it provides evidence for the existence of Mahlo
cardinals, and introduces various basic facts about clubs and stationary
sets.
An inaccessible cardinal κ is said to be Mahlo if the inaccessible
cardinals below κ are stationary. A weakly inaccessible cardinal κ is
said to be weakly Mahlo if the weakly inaccessible cardinals below κ are
stationary; weakly Mahlo cardinals will not be further considered.
To begin with, some additional facts about Lim will be noted. The
operation Lim is “local”; given any ordinal α, there is an operation Lim
acting on the subsets of α, where Lim(X ∩ α) = Lim(X) ∩ α. Recalling
a definition from section 23, a class X of ordinals is closed if and only if
Lim(X) ⊆ X. Clearly this is the case if and only if Lim(X) = Lim′ (X).
More generally, X is said to be closed in Y if Lim(X) ∩ Y ⊆ X.
Lemma 1. The operation Lim satisfies
a. X ⊆ Y ⇒ Lim(X) ⊆ Lim(Y ),
b. Lim(Lim(X)) ⊆ Lim(X), and
c. Lim(X ∪ Y ) = Lim(X) ∪ Lim(Y ).
Proof: Part a is obvious. For part b, suppose α ∈ Lim(Lim(X)). If
β < α then ∃γ ∈ Lim(X) ∩ α(β < γ), whence ∃δ ∈ X ∩ γ(β < δ). This
shows that α ∈ Lim(X). For part c, Lim(X) ∪ Lim(Y ) ⊆ Lim(X ∪ Y )
by part a. If α ∈ Lim(X ∪ Y ) then (X ∪ Y ) ∩ α is unbounded below α,
whence X ∩ α or Y ∩ α must be. ⊳
Next, some additional facts about club subsets and stationary subsets of an uncountable cardinal κ will be noted. The diagonal intersection △ξ<κ Xξ of an indexed sequence ⟨Xξ : ξ < κ⟩ of subsets of κ is the subset
which contains an ordinal α < κ if and only if α ∈ Xξ for all ξ < α.
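As an informal illustration only, the definition can be tried out with a finite number n in place of κ. This is an assumption made purely for the sake of a small example; clubs and stationary sets have no finite analogue. In the following Python sketch the name diagonal_intersection is hypothetical. It shows that a diagonal intersection can be large even when the ordinary intersection of the same family is empty.

```python
def diagonal_intersection(sets, n):
    """Finite analogue of the diagonal intersection.

    sets[xi] plays the role of X_xi for xi < n.  An element alpha < n is kept
    if and only if alpha belongs to sets[xi] for every xi < alpha.
    """
    return {alpha for alpha in range(n)
            if all(alpha in sets[xi] for xi in range(alpha))}


n = 6
# X_xi = {alpha : xi < alpha < n}, a "tail" of n.
tails = [set(range(xi + 1, n)) for xi in range(n)]

plain = set.intersection(*tails)          # ordinary intersection of all X_xi
diag = diagonal_intersection(tails, n)    # diagonal intersection

print(plain)  # the ordinary intersection is empty
print(diag)   # the diagonal intersection is all of {0, ..., n-1}
```

The tails are the finite counterpart of the club sets {α : ξ < α < κ}, whose intersection over all ξ < κ is empty while their diagonal intersection is all of κ.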
Lemma 2. Suppose κ is an uncountable cardinal.
a. If Cξ is a club subset for ξ < η where η < κ then ∩ξ<η Cξ is a club
subset.
b. If Cξ is a club subset for ξ < κ then △ξ<κ Cξ is a club subset.
c. If C is a club subset then Lim(C) is a club set.
Proof: For part a, Lim(∩ξ Cξ ) ⊆ Lim(Cξ ) ⊆ Cξ , so Lim(∩ξ Cξ ) ⊆
∩ξ Cξ ; thus, ∩ξ Cξ is closed. Given α < κ, at stage η · i + ξ (for i < ω and ξ < η) choose
an element of Cξ greater than the elements chosen so far (greater than
α at stage 0). Let γ be the supremum of the elements chosen. Then
γ ∈ Lim(Cξ ) for each ξ, so γ ∈ Cξ for each ξ, so γ ∈ ∩ξ Cξ . For part
b, suppose C = △ξ<κ Cξ and α ∈ Lim(C). If ξ < α then there is an
increasing sequence ⟨αζ⟩ with ξ < αζ < α and αζ ∈ C, whose supremum
is α. Then αζ ∈ Cξ for all ζ, whence since Cξ is closed α ∈ Cξ . Since
ξ was arbitrary, α ∈ △ξ<κ Cξ , and it has been shown that C is closed.
Given α < κ, choose β0 > α with β0 ∈ C0 . At stage n + 1, choose
βn+1 > βn with βn+1 ∈ C′_{βn}, where C′_ξ = ∩ζ≤ξ Cζ . Let β = sup{βn }. If
ξ < β then ξ < βn for some n. For k > n, βk ∈ C′_{βn}, and so β ∈ C′_{βn},
and so β ∈ Cξ . Since ξ was arbitrary, β ∈ △ξ<κ Cξ , and it has been
shown that C is unbounded. For part c, since Lim(Lim(C)) ⊆ Lim(C),
Lim(C) is closed. If X ⊆ κ is unbounded then Lim(X) is unbounded:
Given α ∈ κ, choose an ascending chain α0 < α1 · · · of length ω, of
elements of X, with α0 > α. The limit of this chain is in Lim(X), and
is greater than α. ⊳
If β is a limit ordinal of uncountable cofinality the notion of a club
subset of β is still a sensible notion. Parts a and c of the lemma may be
seen to hold, with essentially the same proofs, where η < cf(β) in part
a.
Lemma 3. Suppose κ is an uncountable cardinal. Suppose X ⊆
Y ⊆ κ, Y is stationary, and X is closed in Y and unbounded; then the
following hold.
a. X is stationary.
b. Lim′ (X) is closed in Y and unbounded.
Proof: Since X is unbounded Lim(X) is a club subset (see the
proof of lemma 2.c). Given a club subset C, Lim(X) ∩ C is a club
subset, so Lim(X) ∩ Y ∩ C is nonempty. Since X is closed in Y , Lim(X) ∩ Y ⊆ X, so X ∩ C is nonempty. This proves part a. For part
b, X ∩ Lim(X) ∩ Y ⊆ X ∩ Lim(X), and X ∩ Lim(X) ∩ Y = Lim(X) ∩ Y
is unbounded, so X ∩ Lim(X) is unbounded. ⊳
The Lim′ operation may be iterated through ordinals, by taking
intersections at limit ordinals. To avoid triviality, at a stage α where
cf(α) < κ, an intersection of length η < κ should be taken; and if
cf(α) = κ a diagonal intersection should be taken.
These observations may be captured in the notion of a “scheme” in
κ; this is a specification of the intersections and diagonal intersections to
be taken. A length is given, which is a successor ordinal ρ + 1 < κ+ . For
each α ≤ ρ which is a limit ordinal, an increasing unbounded sequence
with domain η ≤ κ is given. It follows that η = κ if and only if cf(α) = κ.
For a scheme Σ in κ, an operation F on subsets of κ, and a subset
X of κ, the result F^Σ(X) of using Σ to iterate F on X may be defined. Indeed,
for α ≤ ρ let Xα be the result of applying F through α steps according
to Σ. This is defined by recursion, with the definition falling into 4
cases, where in cases 2 and 3 α is a limit ordinal and ⟨αξ⟩ is the sequence with domain η given by Σ for α:
0. X0 = X.
1. Xβ+1 = F (Xβ ).
2. Xα = ∩ξ<η Xαξ if η < κ.
3. Xα = △ξ<κ Xαξ if η = κ.
F^Σ(X) equals Xρ .
Lemma 4. If Y ⊆ κ is stationary then for any scheme Σ in κ,
Lim′^Σ(Y ) is closed in Y and unbounded.
Proof: The proof is by induction on Σ. The basis is trivial. The
claim follows for Lim′(Lim′^Σ(Y )) from the claim for Lim′^Σ(Y ) by lemma
3. Suppose Xξ is closed in Y and unbounded for ξ < η where η < κ.
Clearly ∩ξ<η Xξ is closed in Y . Also, ∩ξ<η Lim(Xξ ) is club, so Y ∩
(∩ξ<η Lim(Xξ )) is unbounded. Suppose Xξ is closed in Y for ξ < κ, and
suppose α ∈ Lim(△ξ Xξ ) ∩ Y . Let αη be a sequence in △ξ Xξ converging
to α. If ξ < α then some suffix of the sequence converges in Xξ to
α, so α ∈ Xξ . But this shows that α ∈ △ξ Xξ . The argument for
unboundedness is similar to the intersection case. ⊳
Lemma 5. If Y ⊆ κ is not stationary then for some scheme Σ in κ,
Lim′^Σ(Y ) = ∅.
Proof: Let Z ⊆ κ be a club set disjoint from Y . Enumerate Z in
natural order as ⟨αγ : γ < κ⟩. Choose any scheme of rank κ where the
limiting sequence for κ is ⟨αγ⟩. By induction Yα ∩ α = ∅ for α < κ. It
follows that α ∉ Yκ for α ∈ Lim, whence Lim′(Yκ ) = ∅. ⊳
Theorem 6. Suppose κ ∈ Inac; the following are equivalent.
a. κ is Mahlo.
b. For any scheme Σ in κ, Lim′^Σ(Inac ∩ κ) is closed in Inac and unbounded.
c. For any scheme Σ in κ, Lim′^Σ(Inac ∩ κ) is stationary.
d. For any scheme Σ in κ, Lim′^Σ(Inac ∩ κ) ≠ ∅.
Proof: b follows from a by lemma 4. c follows from b by lemma 3.
d follows from c immediately. a follows from d by lemma 5. ⊳
Thinking of κ as Ord, the theorem indicates that Ord has the Mahlo
property exactly if no iteration of Lim′ exhausts the inaccessible cardinals. Since it is reasonable to suppose that this is the case, it is reasonable to suppose that the universe has the Mahlo property, whence it is
reasonable to suppose that Mahlo cardinals exist.
The theorem can be recast in terms of filters. Let F be the subsets
X ⊆ κ which contain a club set. As mentioned in section 23, it follows
by lemma 2 that F is a filter, called the club filter. Say that a filter is
proper, or nontrivial, if it does not contain ∅; clearly F is proper. Say
that a filter of subsets of κ is κ-complete if it is closed under intersections of length less than κ; and normal if it is closed under diagonal
intersections. Lemma 2 shows that F is a κ-complete normal proper
filter.
Theorem 7. Suppose κ ∈ Inac. Then κ is Mahlo if and only if there
is a κ-complete normal proper filter of subsets of κ, containing Inac ∩ κ
and closed under Lim′ .
Proof: Suppose κ is Mahlo. Using theorem 6.d, let F0 be the sets
Lim′^Σ(Inac ∩ κ) for Σ a scheme in κ. Let F be the sets containing a
set in F0 . Then F is a filter with the required properties. If F is a
filter with the required properties then F0 ⊆ F and the requirement in
theorem 6.d holds. ⊳
The statements “X is a closed subset” and “X is an unbounded
subset” of an ordinal α are readily verified to be ∆0 . The statement “κ
is a Mahlo cardinal” is Π1 ; κ is a Mahlo cardinal if and only if κ is an
inaccessible cardinal, and any club subset of κ contains an inaccessible
cardinal. Thus, if κ is a Mahlo cardinal then κ is a Mahlo cardinal in
L.
Mahlo cardinals appear in various topics in set theory. Some examples may be found in [Jech2], for example corollary 18.4. A fairly recent
result concerning Mahlo cardinals and Aronszajn trees can be found in
[Todor1]. In [Friedman3] a statement about the integers is given, whose
proof requires the existence of Mahlo cardinals. Large cardinals, including weakly Mahlo cardinals, have been used in “ordinal analysis”; see
[Rathjen2] for a survey.
32. Greatly Mahlo cardinals.
Like Lim′ of the previous section, the Mahlo operation is defined
on Pow(κ) for an inaccessible cardinal κ. Several variations have been
considered; the definition used here will be as follows: H(X) = {λ ∈
Inac ∩ X : X ∩ λ is stationary below λ}.
Say that an inaccessible cardinal κ is greatly Mahlo if and only if
there is a κ-complete normal proper filter of subsets of κ, containing
Inac ∩ κ and closed under H.
Theorem 1. Suppose κ ∈ Inac. Then κ is greatly Mahlo if and only
if for any scheme Σ in κ, HΣ (Inac ∩ κ) ≠ ∅.
Proof: The proof is just like the proof of theorem 31.6. ⊳
The greatly Mahlo cardinals were so-named in [BTW]. They had
already been considered, along with even larger Mahlo-type cardinals, in
[Gaifman]. The considerations of the previous section can be continued,
suggesting that it is reasonable to suppose that greatly Mahlo cardinals
exist. No attempt will be made to give a detailed argument here, but the
Mahlo cardinals should be a proper class, and generalizing using theorem 31.6, applying Mahlo’s operation should not exhaust the cardinals.
Iterating Mahlo’s operation using schemes should not either.
Theorem 2. If a cardinal κ is greatly Mahlo then this is true in L.
Remarks on proof: First, the predicate “Σ is a scheme” is ∆0 . A
scheme Σ is a function whose domain is an ordinal. Σ(α) = 0 if α is
a successor, else it is an increasing unbounded sequence with domain
≤ κ. By induction on Σ, HΣ (Inac ∩ κ) ⊆ (HΣ (Inac ∩ κ))^L . The basis
follows because “λ is inaccessible” is Π1 . At successor stages, Y = H(X)
where inductively X ⊆ X^L . If λ ∈ Y then λ ∈ X, so λ ∈ X^L ; and
X is stationary below λ, so X^L is. “X is stationary below λ” is Π1 ,
so X^L is stationary below λ in L, that is, λ ∈ Y^L . Intersection and
diagonal intersection are straightforward. It follows that if HΣ (Inac ∩ κ)
is nonempty for any Σ, then this is true in L. ⊳
Greatly Mahlo cardinals have been appearing in topics in set theory.
[Jech2] mentions one example.
A further characterization of greatly Mahlo cardinals can be given,
using an additional method, first introduced in [Jech1]. This method
has continued to find uses since, so an outline will be given.
For the rest of the section suppose κ is an inaccessible cardinal,
although various facts hold more generally. For X, Y ∈ Pow(κ) let
X ⊆t Y denote that X − Y is thin. A binary relation which is reflexive
and transitive will be called a quasi-order; other terminology is in use,
such as preorder.
Lemma 3. The relation X ⊆t Y on Pow(κ) is a quasi-order.
Proof: X ⊆t X is immediate. If X ⊆t Y and Y ⊆t Z then X ⊆ Y ∪
T1 and Y ⊆ Z ∪ T2 where T1 and T2 are thin. Then X ⊆ Z ∪ T1 ∪ T2 ,
and T1 ∪ T2 is thin since the thin sets form the ideal dual to the club
filter. ⊳
Let Hi (X) = {λ ∈ Inac : X ∩ λ is stationary below λ}. Then
H(X) = X ∩ Hi (X); note the resemblance to Lim′ and Lim.
Lemma 4.
a. If X ⊆ Y then Hi (X) ⊆ Hi (Y ).
b. Hi (Hi (X)) ⊆ Hi (X).
c. Hi (X ∪ Y ) = Hi (X) ∪ Hi (Y ).
Proof: Part a is obvious. For part b, suppose λ ∉ Hi (X) where
λ ∈ Inac. Then there is a club subset C ⊆ λ such that C ∩ X =
∅. If µ ∈ Lim(C) ∩ Inac then C ∩ µ is a club subset of µ, whence
X ∩ µ is a thin subset of µ, whence µ ∉ Hi (X). Since µ was arbitrary,
Lim(C) ∩ Hi (X) = ∅, whence λ ∉ Hi (Hi (X)). Part c follows just as
lemma 31.1.c. ⊳
Lemma 5.
a. If X is thin then Hi (X) is thin.
b. If X ⊆t Y then Hi (X) ⊆t Hi (Y ).
c. If X ⊆t Y then H(X) ⊆t H(Y ).
Proof: For part a, suppose Hi (X) is stationary. Let C be a club
subset; then Lim(C) is a club subset, so there is some λ ∈ Lim(C) ∩
Hi (X). Since λ ∈ Lim(C), C ∩ λ is a club subset of λ; and since
λ ∈ Hi (X), X ∩ λ is a stationary subset of λ. Thus, C ∩ X is nonempty,
and since C was arbitrary X is stationary. For part b, if X ⊆t Y then
X ⊆ Y ∪ T where T is thin, so Hi (X) ⊆ Hi (Y ∪ T ) = Hi (Y ) ∪ Hi (T );
since Hi (T ) is thin by part a, Hi (X) ⊆t Hi (Y ). Part c follows from part b, and the fact that the
operation ∩ (and also ∪) respects ⊆t , whose proof is left to the reader.
⊳
Lemma 6. If Xξ is a family of subsets of κ then Lim(∩ξ Xξ ) ⊆
∩ξ Lim(Xξ ).
Proof: This follows by lemma 31.1.a, and ∩ξ Xξ ⊆ Xξ . ⊳
Lemma 7. If X is a stationary subset of κ and C is a club subset,
then X ∩ C is a stationary subset.
Proof: If D is a club subset then X ∩ C ∩ D is nonempty because
C ∩ D is a club subset. ⊳
Lemma 8. If X, Y are subsets of κ and C is a club subset, and
X ⊆ H(Y ), then X ∩ Lim(C) ⊆ H(Y ∩ C).
Proof: An element λ of X ∩ Lim(C) is an element of H(Y ) also,
whence it is an inaccessible cardinal, and λ ∈ C and λ ∈ Y . C ∩ λ is a
club subset of λ, and Y ∩ λ is a stationary subset of λ, so by lemma 7
Y ∩ C ∩ λ is a stationary subset of λ. It follows that λ ∈ H(Y ∩ C). ⊳
The binary relation X ≺ Y on the stationary subsets of an inaccessible cardinal κ will be defined to hold if and only if Y ⊆t H(X). This
is a variation from [Jech1] in that a variation of the Mahlo operation is
used.
Lemma 9. The relation X ≺ Y is transitive.
Proof: If Y ⊆t H(X) and Z ⊆t H(Y ) then H(Y ) ⊆t H(H(X)) by
lemma 5, and H(H(X)) ⊆ H(X), whence Z ⊆t H(X) by lemma 3. ⊳
Lemma 10. The relation X ≺ Y is well-founded.
Proof: Suppose X0 ≻ X1 ≻ X2 ≻ · · · is an infinite descending
chain. Let Ci be a club set disjoint from Xi − H(Xi+1 ), so that Xi ∩ Ci ⊆
H(Xi+1 ). Let C′_i = ∩j≥0 Lim^(j)(Ci+j ) where Lim^(j)(X) denotes Lim,
applied to X j times. Let X′_i = Xi ∩ C′_i . By lemma 8, Xi ∩ Ci ∩
Lim(C′_{i+1}) ⊆ H(Xi+1 ∩ C′_{i+1}). Using lemma 6, Ci ∩ Lim(C′_{i+1}) ⊆ C′_i .
Thus, X′_i ⊆ H(X′_{i+1}). Let λi be the least element of X′_i . Then λi ∈
H(X′_{i+1}), so λi > λi+1 , which yields an infinite descending chain of
ordinals. ⊳
Given a well-founded relation on a set S, for an ordinal ρ, the set
Sρ of nodes of rank ρ is defined by the transfinite recursion, Sρ equals
the minimal elements of S − ∪ν<ρ Sν . For x ∈ S let ρ(x) be the unique
ρ such that x ∈ Sρ . The function ρ : S ↦ Ord will be referred to as
the canonical rank function of the relation. If a well-founded relation
< on a set S is transitive the canonical rank function has the following
properties.
1. If ν < ρ(y) then there is a z < y with ρ(z) = ν. (y is not minimal
in S1 = S − ∪ξ<ν Sξ , so there is a minimal z ∈ S2 = {w ∈ S1 : w < y}; z is
easily seen to be minimal in S1 , so ρ(z) = ν.)
2. If y > x then ρ(y) > ρ(x). (If ρ(x) = ρ(y) then x and y are
incomparable. If ρ(x) > ρ(y) then there is a y ′ with ρ(y ′ ) = ρ(y)
and x > y ′ ; but then y > y ′ , a contradiction.)
3. If x, y are such that y > z ⇒ x > z then ρ(y) ≤ ρ(x). (If ρ(y) >
ρ(x) then there is a z such that y > z and ρ(z) = ρ(x); but then
x ≯ z.)
4. The range of ρ is an ordinal α. Since ρ : S ↦ α is surjective,
α < |S|+ .
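The recursion defining the canonical rank function can be carried out mechanically on a finite set. The following Python sketch is offered as an illustration under that assumption; the names canonical_rank and properly_divides are hypothetical, and proper divisibility on {1, ..., 12} is chosen only as a convenient transitive well-founded relation. At each step the minimal elements of what remains are given the next rank.

```python
def canonical_rank(elements, less):
    """Rank each element of a finite set under a well-founded relation.

    less(a, b) should return True exactly when a < b.  S_rho is taken to be
    the set of minimal elements of S minus the union of the earlier S_nu.
    """
    rank = {}
    remaining = set(elements)
    level = 0
    while remaining:
        minimal = {x for x in remaining
                   if not any(less(z, x) for z in remaining)}
        if not minimal:
            raise ValueError("relation is not well-founded on this set")
        for x in minimal:
            rank[x] = level
        remaining -= minimal
        level += 1
    return rank


def properly_divides(a, b):
    """a < b in the example order: a is a proper divisor of b."""
    return a != b and b % a == 0


rho = canonical_rank(range(1, 13), properly_divides)
print(rho)
```

In this example 1 has rank 0, the primes have rank 1, and 8 and 12 have rank 3. Properties 2 and 4 above can be checked directly: the rank strictly increases along the relation, and the set of ranks is an initial segment of the integers.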
Let S denote the stationary subsets of (Inac ∩ κ) ∪ {0}. Let ρ≺
denote the rank function of the order ≺ restricted to S. This rank has
the following properties.
1. If X ⊆t Y then ρ≺ (X) ≥ ρ≺ (Y ). (This follows since then Y ≺ Z
implies X ≺ Z.)
2. If ρ≺ (Y ) ≥ α, and Z ⊆t Y whenever ρ≺ (Z) ≥ α, then ρ≺ (Y ) = α.
(If ρ≺ (Y ) > α and Z is such that ρ(Z) = α and Y ≺ Z then Z ⊆t Y
and Y ⊆t H(Z), a contradiction).
A stationary set M ∈ S will be said to be maximal of rank ρ if
ρ≺ (M ) = ρ, and X ⊆t M whenever X ∈ S and ρ≺ (X) ≥ ρ.
Lemma 11. Suppose X ′ and Xξ are elements of S.
0. If Inac ∩ κ is stationary then it is in S and has rank 0.
1a. If X = H(X ′ ) is stationary then X ∈ S and ρ≺ (X) ≥ ρ≺ (X ′ ) + 1;
and
1b. if X ′ is maximal then X is maximal and ρ≺ (X) = ρ≺ (X ′ ) + 1.
2a. Suppose η < κ is a limit ordinal, ρξ for ξ < η is an increasing
sequence of ordinals with limit ρ, ρ≺ (Xξ ) = ρξ for ξ < η, X =
∩ξ<η Xξ , and X is stationary. Then X ∈ S and ρ≺ (X) ≥ ρ; and
2b. if Xξ is maximal for ξ < η then X is maximal and ρ≺ (X) = ρ.
3a. Suppose ρξ for ξ < κ is an increasing sequence of ordinals with
limit ρ, ρ≺ (Xξ ) = ρξ for ξ < κ, X = △ξ<κ Xξ , and X is stationary.
Then X ∈ S and ρ≺ (X) ≥ ρ; and
3b. if Xξ is maximal for ξ < κ then X is maximal and ρ≺ (X) = ρ.
Proof: Part 0 is immediate. For part 1a, X ∈ S is clear, and
since X ≺ X ′ , ρ≺ (X) > ρ≺ (X ′ ). For part 1b, if ρ≺ (Y ) ≥ ρ + 1 then
ρ≺ (Y ) > ρ, so there is a Z such that ρ≺ (Z) = ρ and Y ≺ Z. Inductively,
Z ⊆t X ′ , so Y ⊆t H(Z) ⊆t H(X ′ ). For part 2a, again X ∈ S is clear,
and since X ⊆ Xξ for each ξ, ρ≺ (X) ≥ ρξ for each ξ, whence ρ≺ (X) ≥ ρ.
For part 2b, if ρ≺ (Y ) ≥ ρ then ρ≺ (Y ) ≥ ρξ for all ξ < η, so Y ⊆t Xξ
for all ξ < η, so Y ⊆t X. Parts 3a and 3b are similar to part 2, noting
that 0 ∈ X is allowed by the definition of S. ⊳
Corollary 12. For any scheme Σ in κ if HΣ (Inac ∩ κ) is stationary
then it is maximal.
Proof: The proof is by induction on Σ. ⊳
The rank ρ≺ (κ) of κ is defined to be the least ρ such that for every
ν < ρ, there is a stationary set S ⊆ Inac with ρ≺ (S) ≥ ν. With the
foregoing conventions, any κ has rank ≥ 0, and the rank of κ is ≥ 1 if
and only if Inac ∩ κ is stationary, if and only if κ is Mahlo, if and only
if S is nonempty. If arbitrary stationary sets rather than sets in S were
allowed, the rank would always be at least 1.
Theorem 13. An inaccessible cardinal κ is greatly Mahlo if and only
if ρ≺ (κ) ≥ κ+ .
Proof: κ is greatly Mahlo if and only if HΣ (Inac ∩ κ) is nonempty
for all schemes Σ in κ. Lemma 31.5 clearly holds with Lim′ replaced by
H, whence κ is greatly Mahlo if and only if HΣ (Inac ∩ κ) is stationary
for all schemes Σ in κ. By corollary 12 κ is greatly Mahlo if and only if
there is a stationary set of rank ρ for all ρ < κ+ , which is so if and only
if its Mahlo rank is ≥ κ+ . ⊳
33. Reflection principles.
A reflection principle in set theory is a statement to the effect that
if some statements hold in a model then they hold in a submodel.
There are a variety of such principles, some provable in ZFC, and some
stronger than ZFC. One example, lemma 20.2, has already been given.
It is a theorem of ZFC that a single formula can be reflected. Theorem 1.8.1 of [Devlin] is a general theorem of this type. Namely, suppose
W = ∪α∈Ord Wα , where Wα is a transitive set, the predicate x ∈ Wα is
definable, α ≤ β ⇒ Wα ⊆ Wβ , and if α ∈ LimOrd then Wα = ∪β<α Wβ .
Let F be a formula, with free variables among x⃗. Then for any α there
is a limit ordinal β > α such that ∀x⃗ ∈ Wβ (F^W ⇔ F^Wβ ). The version
where W = V and Wα = Vα is proved in theorem 12.14 of [Jech2]; and
theorem 3.6.3 of [Drake], where it is called the Montague-Levy reflection
principle.
Inaccessible cardinals can be characterized in terms of a reflection
principle. Let L be the first order language ⟨=, ∈, P⟩ where P is a unary
predicate. By a structure for L will be meant a pair ⟨M, X⟩ where M
is a transitive set and X = P̂ , where as in section 6 P̂ denotes the
interpretation of P . A substructure is a pair ⟨N, X ∩ N⟩ where N ⊆ M .
Say that an ordinal α is Π^1_0-indescribable if for every sentence F in
L, if |=Vα ,X F then for some β < α, |=Vβ ,X∩Vβ F .
Theorem 1. α is an inaccessible cardinal if and only if α is Π^1_0-indescribable.
Remarks on proof: Suppose α is an inaccessible cardinal κ. Let
X ⊆ Vκ be an interpretation for P , and let {f } be a set of Skolem
functions for the formulas of the language L. Since κ is inaccessible,
for any β < κ there is a γ with β < γ < κ, such that for all x⃗ ∈ Vβ
and all Skolem functions f , f (x⃗) ∈ Vγ ; let e(β) be the least such γ.
Let γ0 = 0, and given γn let γn+1 = e(γn ). Let γ = sup{γn }. Using
lemma 20.1, it follows that ⟨Vγ , X ∩ Vγ⟩ is an elementary substructure
of ⟨Vκ , X⟩. The opposite implication is proved by showing that if α fails
to have a property of an inaccessible cardinal, then a sentence F can be
found which is not reflected. Sentences using more than one predicate
may be used, since these can be converted to sentences using a single
predicate (code two classes X1 , X2 as the single class ∪i=1,2 {⟨i, x⟩ : x ∈
Xi }). If α = γ + 1 let X = {γ}, and let F be ∃xP (x). If γ < α,
f : γ ↦ α, and f [γ] is unbounded, let X1 = {γ} and X2 = f , let G(x)
be “P2 is a function with domain x taking ordinal values”, and let F
be ∃x(P1 (x) ∧ G(x)). If γ < α, f : Pow(γ) ↦ α, and f is surjective,
let X1 = {γ} and X2 = f , let G(x) be “P2 is a function with domain
Pow(x) taking ordinal values”, and let F be ∃x(P1 (x) ∧ G(x)). If α = ω
let F be “∀x∃y(x ∈ y)”. For further details see theorem 9.1.3 of [Drake].
⊳
34. Indescribable cardinals.
New types of large cardinals can be obtained by generalizing the
formulas that are considered in the reflection principle of theorem 33.1.
The language of the more general formulas has two sorts of variables,
“first order”, which range over elements of the universe of discourse, and
“second order”, which range over subsets.
Second order variables are sometimes more convenient than unary
predicate symbols, provided some technical details are attended to. The
atomic formulas are
x ∈ y, x = y, x ∈ Y , X = Y ,
where x, y are first order variables and X, Y are second order variables.
Some authors continue to use Y (x) rather than x ∈ Y .
Second order variables may be quantified over. The notion of a
structure has already been indicated; second order variables range over
subsets, and the satisfaction definition only needs to be modified accordingly. The restriction of the value X of a free second order variable
to a substructure N is X ∩ N .
A formula with no bound second order variables is called ∆^1_0; it
is convenient to consider these to be the Π^1_0 and Σ^1_0 formulas as well.
This differs slightly from the definition in section 33, in that
several free second order variables are allowed, rather than coding them
into a single one. A Π^1_{n+1} (resp. Σ^1_{n+1}) formula is obtained from a
Σ^1_n (resp. Π^1_n) formula by preceding it with universal (resp. existential)
quantifications of second order variables.
Variables of order higher than second order can be considered. This
is omitted here; see [Koellner2] for one discussion.
Using notation from chapter 20, an inaccessible cardinal κ is said to
be Π^1_n-indescribable if for every Π^1_n formula F and suitable variable list
W⃗, if |=Vκ F_W⃗(X⃗) then for some α < κ, |=Vα F_W⃗(X1 ∩ Vα , . . . , Xk ∩ Vα ).
By theorem 33.1, there is no loss of generality in requiring κ to be an
inaccessible cardinal. For simplicity the free variables are required to be
second order; these can be used to “simulate” first order free variables.
To avoid repetition, let F̄ denote a formula, together with a suitable list of variables and assignment X1 , . . . , Xk of subsets of Vκ to the
variables; and for α < κ let F̄ ↾α denote the result of replacing Xi by
Xi ∩ Vα .
There is no point to defining Σ^1_n-indescribable cardinals. Omitting
free variables, if F̄ is ∃X Ḡ, and |=Vκ F̄ , then |=Vκ Ḡ where Ḡ extends
the assignment to the variable X in some way, so |=Vα Ḡ for some α < κ,
so |=Vα F̄ .
If κ is a Π^1_n-indescribable cardinal, a subset Q ⊆ κ is said to be Π^1_n-enforceable if there is an F̄ such that |=Vκ F̄ , and {α : |=Vα F̄ ↾α} ⊆ Q.
By the definition of a Π^1_n-indescribable cardinal, a Π^1_n-enforceable set
is nonempty. In fact, such a set is stationary. If F̄ witnesses that Q is
Π^1_n-enforceable, and C is a club subset of κ, let Ḡ be the formula “C
is unbounded and F̄ ”. Then |=Vκ Ḡ, so |=Vα Ḡ↾α for some α. Since
C is a club subset, α ∈ C; and since F̄ holds, α ∈ Q. Thus, Q ∩ C is
nonempty, and since C was arbitrary, Q is stationary.
A Π^1_n formula and assignment F̄ may be coded as a subset of Vκ ;
F̃ will be used to denote such a code. The predicate “F̃ is true” will be
denoted “Tru_n(F̃ )”. There is a Π^1_n formula which defines this predicate
in any Vκ for κ an inaccessible cardinal (in fact, in any Vα where α ∈
LimOrd). This predicate is Π^1_n. Only a brief discussion will be given
here; see [Drake] for further details. By usual methods, the predicate
Tru_0(F̃ ) is ∆^1_1. Tru_1(F̃ ) states that “for all G̃, if G̃ is properly derived
from F̃ then Tru_0(G̃)”; and similarly for additional quantifiers.
The predicate “F̃ is true in M ” will be denoted Sat_{s.o.}(M, F̃ ).
There is a Π^1_n formula which defines this predicate in any Vα where
α ∈ LimOrd.
Using the preceding predicates, it may be shown that the set of
Π^1_n-indescribable cardinals is a Π^1_{n+1}-enforceable set. Let En be the
sentence “∀F̃ (Tru_n(F̃ ) ⇒ ∃α(Sat_{s.o.}(Vα , F̃ )))”. En is Π^1_{n+1}, and an
inaccessible cardinal λ is a Π^1_n-indescribable cardinal if and only if |=Vλ
En . In particular, if κ is a Π^1_{n+1}-indescribable cardinal then |=Vκ En .
From here on only Π^1_1-indescribable cardinals will be considered.
Theorem 1. Suppose κ is a Π^1_1-indescribable cardinal. The collection of Π^1_1-enforceable sets is a proper normal filter which contains
Inac ∩ κ and is closed under H.
Remarks on proof: The formula witnessing that Inac ∩ κ is enforceable is “∃x(x = ω) and ∀x(Pow(x) is a set) and ∀F ∀x(if F is
a function then F [x] is a set)”. If F̄ witnesses that Q is enforceable
then “F̄ and ∀C(C club implies C ∩ Q nonempty)” witnesses that
H(Q) is enforceable. Suppose that for ξ < κ, Qξ ⊆ κ is witnessed
by F̄ξ . Let F̃ = {⟨ξ, x⟩ : x ∈ F̃ξ }. Then △ξ<κ Qξ is witnessed by
∃x(x = ω) ∧ ∀x∃y(x ∈ y) ∧ ∀ξ(G̃ = F̃ξ ⇒ Tru′ (ξ, c, P )), where G̃ = F̃ξ is
an abbreviation for a formula involving G̃ and F̃ . For the case ∩ξ<µ Qξ ,
replace ∃x(x = ω) by ∃x(x = µ), and ∀ξ by ∀ξ < µ. For further details
see section 9.1 of [Drake]. ⊳
Thus, a Π^1_1-indescribable cardinal is greatly Mahlo. It may be
shown that the greatly Mahlo cardinals below a Π^1_1-indescribable cardinal comprise a Π^1_1-enforceable set. It is a topic of current research, what
ρ≺ (κ) is for a Π^1_1-indescribable cardinal, and whether a determination
of this would provide evidence that Π^1_1-indescribable cardinals exist.
Theorem 2. If κ is a Π^1_1-indescribable cardinal then κ has the tree
property.
Remarks on proof: This is theorem 9.15 of [Drake], and follows
from theorem 17.18 of [Jech2]. Suppose κ is inaccessible and T is a κ-tree, which recall from section 24 is a tree whose height is κ, and where
|Tα | < κ for each α. By enumerating the levels successively, it may
be assumed that Tα ⊆ Vβ , where the function α ↦ β is an increasing
function from κ to κ. If T has no branch of length κ then the formula “T
is unbounded and for all X, if X is a branch of T then X is bounded”
is true in Vκ but not in Vα for any α < κ. ⊳
As mentioned in section 24, if an inaccessible cardinal has the tree
property then it is Π^1_1-indescribable. Since the proof is somewhat involved it will be omitted. However some comments will be made.
First, some standard notation for “Ramsey-theoretic” properties of
sets will be introduced. Let [x]^n denote the set of n element subsets
of a set x, where n is an integer. If f : [x]^n ↦ I, a subset y ⊆ x is
said to be homogeneous for f if there is an i ∈ I such that f (s) = i for
all s ∈ [y]^n . For cardinals κ, µ, and λ, and an integer n, the notation
κ → (λ)^n_µ is used for the statement “for any function f : [κ]^n ↦ µ, there
is a homogeneous subset of κ of cardinality λ”.
A classic example of a Ramsey-theoretic theorem states that 6 →
(3)^2_2 . That is, if the lines between 6 points are colored red and green,
then either there is a red triangle or a green triangle.
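The finite statement just mentioned is small enough to confirm by exhaustive search. The following Python sketch is an illustration only; the function names are hypothetical. It tries every coloring of the 15 lines between 6 points with 2 colors and looks for a homogeneous set of size 3, then repeats the search for 5 points, where the well-known pentagon coloring shows that the arrow relation fails.

```python
from itertools import combinations, product


def has_homogeneous_triple(n, colour):
    """True if some 3 element subset of range(n) has all its pairs one colour."""
    return any(colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
               for a, b, c in combinations(range(n), 3))


def arrows_3_2_2(n):
    """Check n -> (3)^2_2 by trying every 2-colouring of the pairs of range(n)."""
    pairs = list(combinations(range(n), 2))
    return all(has_homogeneous_triple(n, dict(zip(pairs, colours)))
               for colours in product((0, 1), repeat=len(pairs)))


six_arrows = arrows_3_2_2(6)    # 2**15 colourings are examined
five_arrows = arrows_3_2_2(5)   # stops at the first colouring with no triangle

print(six_arrows, five_arrows)
```

The search is only feasible because the sets involved are tiny; nothing of this kind is available for the infinite relation κ → (κ)^2_2 considered below.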
Following is a list of properties which a cardinal might have.
1. Inaccessible and tree property
2. Uncountable and κ → (κ)^2_2
3. Inaccessible and a certain infinitary language has a certain compactness property
4. A certain elementary extension property
5. Π^1_1-indescribable
These properties are equivalent. Following are references to the proof in
[Jech2]. See also theorem 10.2.1 of [Drake]. 1⇔2 is proved in lemmas 9.9
and 9.26. 1⇔3 is proved in theorem 17.13.(i). 3⇔4 is proved in lemma
17.17. 4⇔5 is proved in theorem 17.18. If the compactness property of
property 3 is modified slightly the requirement of inaccessibility becomes
redundant; see proposition 4.4 of [Kanamori3].
If a cardinal is Π^1_1-indescribable then it remains so in L. Again,
the proof is somewhat involved, and may be found in theorem 17.22 of
[Jech2]. The claim is proved for the property κ → (κ)^2_2 .
35. Ultrapowers.
Ultrapowers are a construction from mathematical logic which have
various uses in set theory; one will be seen in the next section. Ultrapowers are a special case of a slightly more general construction, ultraproducts.
Recall that a filter of subsets of a set S is a nonempty subset F of
Pow(S) such that if A ∈ F and A ⊆ B then B ∈ F, and if A, B ∈ F
then A ∩ B ∈ F. A filter is
- proper if ∅ ∉ F,
- µ-complete for a cardinal µ if the intersection of fewer than µ sets
in F is in F,
- an ultrafilter if it is proper and for any A ⊆ S, either A ∈ F or
A^c ∈ F, and
- principal if there is some A_0 ⊆ S such that A ∈ F if and only if
A_0 ⊆ A.
It is readily verified that in a principal ultrafilter, A_0 must be a singleton
set, i.e., of the form {x} for some x ∈ S. Indeed, if x ∈ A_0 and {x}^c ∈ F
then A_0 − {x} ∈ F, which is impossible since A_0 is not a subset of A_0 − {x}; thus {x} ∈ F, and so A_0 = {x}.
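On a finite set the definitions above can be checked mechanically. The following sketch is an illustration added here, not taken from the text, and the helper names are assumptions. It enumerates all families of subsets of a 3-element set, keeps those that are ultrafilters, and confirms that each is principal with a singleton generator. Nonprincipal ultrafilters exist only on infinite sets and cannot be found this way.

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def is_filter(F, S):
    """Nonempty, closed under supersets within S, closed under pairwise intersection."""
    if not F:
        return False
    subsets = powerset(S)
    upward = all(B in F for A in F for B in subsets if A <= B)
    meets = all((A & B) in F for A in F for B in F)
    return upward and meets

def is_ultrafilter(F, S):
    """Proper filter that contains A or its complement for every A ⊆ S."""
    S = frozenset(S)
    proper = frozenset() not in F
    decides = all(A in F or (S - A) in F for A in powerset(S))
    return is_filter(F, S) and proper and decides

def ultrafilters(S):
    """Exhaustive search over all 2^(2^|S|) families; only feasible for tiny S."""
    return [F for F in powerset(powerset(S)) if is_ultrafilter(F, S)]

S = {0, 1, 2}
for F in ultrafilters(S):
    core = frozenset.intersection(*F)  # the generator A_0 of a principal filter
    print(sorted(core), len(F))        # a singleton, and 4 member sets each time
```

Each ultrafilter found consists of exactly the subsets containing one fixed point, in agreement with the remark that A_0 must be a singleton.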
Let {M_i : i ∈ I} be a family of structures for a first order language.
The Cartesian product ×_{i∈I} M_i is the set of sequences ⟨x_i : i ∈ I⟩ where
x_i ∈ M_i. Let U be an ultrafilter on I. A relation ≡_U is defined on
×_{i∈I} M_i by the requirement, ⟨x_i⟩ ≡_U ⟨y_i⟩ if and only if {i ∈ I : x_i =
y_i} ∈ U.
Lemma 1. ≡_U is an equivalence relation.
Proof: ⟨x_i⟩ ≡_U ⟨x_i⟩ because I ∈ U. If ⟨x_i⟩ ≡_U ⟨y_i⟩ then obviously
⟨y_i⟩ ≡_U ⟨x_i⟩. If {i ∈ I : x_i = y_i} = A where A ∈ U and {i ∈ I : y_i =
z_i} = B where B ∈ U then {i ∈ I : x_i = z_i} ⊇ A ∩ B. ⊳
Let (×_{i∈I} M_i)/≡_U denote the set of equivalence classes. This is
called the ultraproduct of the M_i, by the ultrafilter U.
Lemma 2. (×_{i∈I} M_i)/≡_U may be made into a structure M for the
language by setting
- c^M = [⟨c^{M_i}⟩] for a constant c,
- f^M([⟨x_{1i}⟩], ..., [⟨x_{ki}⟩]) = [⟨f^{M_i}(x_{1i}, ..., x_{ki})⟩] for a function f, and
- P^M([⟨x_{1i}⟩], ..., [⟨x_{ki}⟩]) if and only if {i : P^{M_i}(x_{1i}, ..., x_{ki})} ∈ U.
Proof: Suppose for 1 ≤ j ≤ k that {i : x_{ji} = y_{ji}} = A_j where
A_j ∈ U. Then for i ∈ ∩_j A_j, f^{M_i}(x_{1i}, ..., x_{ki}) = f^{M_i}(y_{1i}, ..., y_{ki}) and
P^{M_i}(x_{1i}, ..., x_{ki}) if and only if P^{M_i}(y_{1i}, ..., y_{ki}). ⊳
If = is a symbol of the language, and is interpreted as equality in M_i,
then the interpretation of = in M is equality, since if {i : x_i = y_i} ∈ U
then [⟨x_i⟩] = [⟨y_i⟩].
Theorem 3. Suppose (×_{i∈I} M_i)/≡_U is the ultraproduct of the M_i,
by the ultrafilter U. Suppose F is a formula, with a suitable variable
list. Suppose [⟨x_{1i}⟩], ..., [⟨x_{ki}⟩] are elements of M. Then
⊨_M F([⟨x_{1i}⟩], ..., [⟨x_{ki}⟩]) if and only if {i : ⊨_{M_i} F(x_{1i}, ..., x_{ki})} ∈ U.
Proof: First, for a term t with k variables in order, the value of t
in M at [⟨x_{1i}⟩], ..., [⟨x_{ki}⟩] equals [⟨t̂_i⟩], where t̂_i is the value in M_i of
t at x_{1i}, ..., x_{ki}; this follows by induction on t. Second, for an atomic
formula F = P(t_1, ..., t_l), ⊨_M F if and only if {i : P^{M_i}(t̂_{1i}, ..., t̂_{li})} ∈ U,
and P^{M_i}(t̂_{1i}, ..., t̂_{li}) if and only if ⊨_{M_i} F. Third, for F = ¬G, ⊨_M F
if and only if ¬ ⊨_M G, if and only if {i : ⊨_{M_i} G} ∉ U, if and only if
{i : ⊨_{M_i} ¬G} ∈ U. Fourth, for F = G ∧ H, ⊨_M F if and only if ⊨_M G
and ⊨_M H, if and only if {i : ⊨_{M_i} G} ∈ U and {i : ⊨_{M_i} H} ∈ U, if and
only if {i : ⊨_{M_i} G and ⊨_{M_i} H} ∈ U, if and only if {i : ⊨_{M_i} G ∧ H} ∈ U.
Fifth, suppose F = ∃vG. If ⊨_M F then ⊨_M G where a value [⟨w_i⟩] is
assigned to v. Then {i : ⊨_{M_i} G} ∈ U, whence {i : ⊨_{M_i} F} ∈ U. If on
the other hand A ∈ U where A = {i : ⊨_{M_i} F}, for each i ∈ A ⊨_{M_i} G,
where a value w_i is assigned to v. Then ⊨_M G where v is assigned [⟨w_i⟩]
(w_i being arbitrary if i ∉ A), and so ⊨_M F. ⊳
The preceding theorem is called Łoś's theorem. It states that F
holds in M if and only if F holds in M_i for “almost all” i, i.e., the set
of such i is in the filter of measure 1 sets.
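The bookkeeping in the ultraproduct construction can be traced on a toy case. The sketch below is an illustration added here under stated assumptions: the index set is {0, 1, 2}, the structures are the additive groups Z_2, Z_3, Z_4, and U is the principal ultrafilter of sets containing a chosen index i0 (on a finite index set every ultrafilter is principal, so the ultraproduct is just a copy of one factor). It builds the equivalence classes, defines addition on classes by representatives, and compares the two sides of the theorem for the sentence ∃x (x ≠ 0 ∧ x + x = 0).

```python
from itertools import product

MODULI = [2, 3, 4]            # M_i is the additive group Z_n with n = MODULI[i]
INDEX = range(len(MODULI))    # the index set I = {0, 1, 2}

def los_check(i0):
    """Compare truth in the ultraproduct with membership of the truth set in U,
    where U is the principal ultrafilter generated by {i0}."""
    def in_U(A):
        return i0 in A

    sequences = list(product(*[range(n) for n in MODULI]))

    def cls(x):
        # Equivalence class of x under: {i : x_i = y_i} is in U.
        return frozenset(
            y for y in sequences
            if in_U({i for i in INDEX if x[i] == y[i]})
        )

    universe = {cls(x) for x in sequences}
    zero = cls(tuple(0 for _ in INDEX))

    def add(a, b):
        # Addition on classes, computed coordinatewise on any representatives.
        x, y = next(iter(a)), next(iter(b))
        return cls(tuple((x[i] + y[i]) % MODULI[i] for i in INDEX))

    holds_in_ultraproduct = any(a != zero and add(a, a) == zero for a in universe)

    def holds_in_factor(i):
        n = MODULI[i]
        return any(x != 0 and (x + x) % n == 0 for x in range(n))

    truth_set = {i for i in INDEX if holds_in_factor(i)}
    return holds_in_ultraproduct, in_U(truth_set), len(universe)

for i0 in INDEX:
    print(i0, los_check(i0))
```

For i0 = 1 the ultraproduct has 3 elements and the sentence fails on both sides, since it fails in Z_3; for i0 = 0 and i0 = 2 it holds on both sides.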
If M_i = M for all i the ultraproduct is called an ultrapower. This
may be denoted M^I/≡_U. The ultrapower construction provides a construction of “elementary extensions” of M, as the following theorem
shows.
A map j : M ↦ N between structures is said to be an elementary
embedding if j is an injective homomorphism, and j[M] is an elementary
substructure of N. Equivalently, for any formula F, and x_1, ..., x_k ∈ M,
⊨_M F(x_1, ..., x_k) if and only if ⊨_N F(j(x_1), ..., j(x_k)). The requirement that j be injective is redundant if equality is present, since then
x = y if and only if j(x) = j(y).
Theorem 4. Suppose M^I/≡_U is the ultrapower of I copies of M, by
the ultrafilter U. Suppose F is a formula. Suppose j : M ↦ M^I/≡_U
is the map where j(x) equals [C_x] where C_x is the sequence which is
constantly x. Then j is an elementary embedding.
Proof: The induction in the proof of theorem 3 shows that for
x_1, ..., x_k ∈ M, ⊨_{M^I/≡_U} F(j(x_1), ..., j(x_k)) if and only if
{i : ⊨_{M_i} F(x_1, ..., x_k)} = I, if and only if ⊨_M F(x_1, ..., x_k). Since
C_x ≡_U C_y only if x = y, j is injective. ⊳
36. Measurable cardinals.
Measurable cardinals were first defined by S. Ulam in 1930. In 1962
some facts about the size of a measurable cardinal were established. In
1961 D. Scott had already shown that the existence of a measurable
cardinal had implications for statements of set theory which were not
concerned with large cardinals. This was another milestone of set theory
dating from the early 1960’s. Measurable cardinals are among the smallest large cardinals for which this is the case, and a considerable amount
of research since has concerned implications which stronger types of
large cardinals have.
A measurable cardinal is defined to be an uncountable cardinal κ,
such that there is a κ-complete nonprincipal ultrafilter U ⊆ Pow(κ).
The name arose from connections with measure theory; see [Jech2] for
a discussion of this topic.
If U is a κ-complete nonprincipal ultrafilter on κ, and |A| < κ, then
A ∉ U. This follows because {α} ∉ U for any α < κ, and if |A| < κ
then A is the union of fewer than κ such sets.
Suppose κ is a measurable cardinal, and U ⊆ Pow(κ) is a κ-complete nonprincipal ultrafilter. Even though V is a proper class, the
ultrapower V^κ/≡_U may be defined. It is a proper class. Rather than
taking the entire equivalence class of a sequence ⟨x_α : α < κ⟩, only the
elements of least rank are taken. The predicate [⟨x_i⟩] ∈̂ [⟨y_i⟩] holds if
and only if {i : x_i ∈ y_i} ∈ U.
Recall from section 19 that a binary relation < which is a class has
small extensions if {u : u < v} is a set for all v ∈ S. It will be said to
be well-founded if there are no infinite descending chains.
Lemma 1. If κ is a cardinal and U ⊆ Pow(κ) is a countably complete
ultrafilter then V^κ/≡_U has small extensions and is well-founded.
Proof: Suppose [⟨x_α⟩] ∈̂ [⟨y_α⟩]. Then A ∈ U where A = {α : x_α ∈
y_α}. Let x′_α equal x_α if α ∈ A, else ∅. Then [⟨x′_α⟩] ∈̂ [⟨y_α⟩] and ρ(⟨x′_α⟩) ≤
ρ(⟨y_α⟩). It follows that V^κ/≡_U has small extensions. Suppose that [s_i]
for i ∈ ω were a sequence of elements with [s_{i+1}] ∈̂ [s_i], where s_i = ⟨x_{iα}⟩.
Then A_i ∈ U where A_i = {α : x_{i+1,α} ∈ x_{iα}}, so ∩_i A_i is in U, and for
any α ∈ ∩_i A_i, ⟨x_{iα} : i ∈ ω⟩ is an infinite descending ∈-chain, which is impossible. ⊳
The elementary embedding of theorem 35.4 may be adapted to this
case. For a formula F , the statement that F is true in a class may be
cast as a statement in the language of set theory, using relativization to
a class, as discussed preceding theorem 16.1. Theorem 35.4 may then be
proved “formula by formula”, so that it is replaced by an infinite set of
formulas, all of which are provable in ZFC. The terminology “elementary
embedding” is in common use to describe this situation, in addition to
its use for maps between structures which are sets. See the end of the
section for further discussion.
Theorem 16.2 holds with D a class, with an arbitrary relation which
has small extensions and is well-founded, and satisfies the axiom of
extensionality; T and π are classes. The modifications to the proof
given in section 16 are minimal. Recursion on the well-founded relation
is used; this is a generalization of recursion on ∈. Theorem 6.15 of
[Jech2] gives a complete treatment.
If κ is a measurable cardinal and U ⊆ Pow(κ) is a κ-complete
nonprincipal ultrafilter, let Ult^0_U denote V^κ/≡_U; and let j^0_U : V ↦
Ult^0_U be the canonical embedding of theorem 35.4. Let Ult_U denote
the transitive collapse of Ult^0_U, and let j_U denote π ◦ j^0_U where π is the
collapsing isomorphism.
Theorem 2. j_U : V ↦ Ult_U is an elementary embedding.
Proof: j^0_U is an elementary embedding and π is an isomorphism.
⊳
Some useful observations about an elementary embedding j are as
follows.
- If y = f (x) is defined by a formula, then y = f (x) if and only if
j(y) = f (j(x)); thus, j(f (x)) = f (j(x)).
- If j : V ↦ M where M is a transitive class then M is a model of
ZFC, and by adapting an argument in section 17, the rank function
on M is absolute.
The identity map is called the trivial embedding; any other embedding
of structures is said to be nontrivial.
Lemma 3. Suppose j : V ↦ M is a nontrivial elementary embedding of V in a transitive class. Then there is a least ordinal α such that
j(α) > α; j ↾ V_α is the identity map.
Proof: Since β < α if and only if j(β) < j(α), j ↾ Ord is increasing,
so j(α) ≥ α. Inductively, if j(β) = β for β < α then j(x) = x for
x ∈ V<α . This is clear if α = 0 or α is a limit ordinal. If α = γ + 1
and x ∈ V<α then x ⊆ Vγ , whence j(x) ⊆ j(Vγ ) = Vj(γ) = Vγ . Thus,
if w ∈ j(x) then j(w) = w, whence j(w) ∈ j(x), whence w ∈ x. Thus,
j(x) ⊆ x. If w ∈ x then w = j(w) ∈ j(x); thus, x ⊆ j(x) also. If
j(α) = α for all α then j is the identity map. It follows that j(α) > α
for some α; since j(β) = β for β < α, j ↾ Vα is the identity map. ⊳
Theorem 4. Suppose j : V ↦ M is an elementary embedding of V
in a transitive class, and j is not the identity map.
a. Let α be the least ordinal such that j(α) > α. Then α is a cardinal
κ.
b. Let U = {X ⊆ κ : κ ∈ j(X)}. Then U is a κ-complete nonprincipal
ultrafilter on κ, and κ is a measurable cardinal.
Proof: To begin with, let U = {X ⊆ α : α ∈ j(X)}. If X, Y ∈ U
then α ∈ j(X) ∩ j(Y ), and j(X) ∩ j(Y ) = j(X ∩ Y ), so X ∩ Y ∈ U . If
X ∈ U and X ⊆ Y then j(X) ⊆ j(Y ), so α ∈ j(Y ), so Y ∈ U . Since
j(α − X) = j(α) − j(X) and α ∈ j(α), either α ∈ j(X) or α ∈ j(α − X)
but not both. If β < α then j({β}) = {j(β)} = {β}, so {β} ∉ U. Thus,
U ⊆ Pow(α) is a nonprincipal ultrafilter.
Suppose X_ξ ∈ U for ξ < η where η < α. Then j(⟨X_ξ⟩) is a
sequence of length j(η) whose j(ξ)-th element is j(X_ξ). Since j(η) = η
and j(ξ) = ξ for ξ < η, j(⟨X_ξ⟩) = ⟨j(X_ξ)⟩. It follows that j(∩_ξ X_ξ) =
∩_ξ j(X_ξ), whence ∩_ξ X_ξ ∈ U. That is, U is “α-complete”.
Let κ = |α|; then α is the union of κ singleton sets, so if κ were less
than α then α ∉ U by α-completeness, a contradiction. Thus, α = κ.
It has already been shown that U is a κ-complete nonprincipal
ultrafilter on κ. To complete the proof that κ is measurable, it need
only be shown that it is uncountable. For α ≤ ω, j(α) = α, since α is
definable and j is elementary. Thus, κ > ω. ⊳
In the remainder of the section, κ_j and U_j will be used to denote κ
and U of the theorem.
Recall the definitions of diagonal intersection and normal filter from
section 31. A function f : S ↦ Ord where S is a set of ordinals is said to
be regressive if f (α) < α for every nonzero α ∈ S. Say that a function
f is constant on a subset S of its domain if and only if f (x1 ) = f (x2 )
for all x1 , x2 ∈ S. In the case of the club filter, the forward direction of
the following lemma is known as Fodor’s theorem.
Lemma 5. Suppose κ is a regular uncountable cardinal and F ⊆
Pow(κ) is a κ-complete filter. Let I = {X^c : X ∈ F} be the dual
ideal. Then F is normal if and only if (∗) for every regressive function
f : X ↦ κ where X ∉ I, there is a subset Y ⊆ X such that Y ∉ I and
f is constant on Y.
Proof: Suppose F is normal. Let X_ξ = {α ∈ X : f(α) = ξ}.
Suppose X_ξ ∈ I for every ξ < κ. Let D = △_{ξ<κ} X_ξ^c. Since F is normal,
D ∈ F; and since X ∉ I, D ∩ X ≠ ∅. But if α ∈ D ∩ X then α ∈ X_ξ^c
for all ξ < α, whence f(α) ≠ ξ for all ξ < α, whence f(α) ≥ α, a
contradiction. Suppose property (∗) holds, and X_ξ ∈ F for ξ < κ. Let
D = △_{ξ<κ} X_ξ. If α ∈ D^c then α ∉ X_ξ for some ξ < α; let f(α) be any
such ξ. Suppose D ∉ F. By property (∗), there is a subset E ⊆ D^c and
a ξ < κ, such that E ∉ I and f(α) = ξ for all α ∈ E. Since E ∉ I and
X_ξ ∈ F, X_ξ ∩ E is nonempty. But if α ∈ E ∩ X_ξ then f(α) = ξ, so
α ∉ X_ξ, a contradiction. Thus, D must be in F. ⊳
Define the diagonal sequence in an ultrapower V^κ/≡_U to be [⟨x_ξ⟩],
where x_ξ = ξ.
Lemma 6. Suppose j : V ↦ M is an elementary embedding. Suppose U ⊆ Pow(κ) is a κ-complete nonprincipal ultrafilter.
a. U_j is normal.
b. κ_{j_U} = κ.
c. Suppose U is normal. Then κ is the image under the collapsing
isomorphism of the diagonal sequence.
Proof: For part a, suppose κ is the least ordinal moved by j, X ⊆ κ,
X^c ∉ U_j, and f : X ↦ κ is a regressive function. Then X ∈ U_j since
U_j is an ultrafilter, j(f) : j(X) ↦ j(κ), j(f) is a regressive function,
and κ ∈ j(X) by definition of U_j. Let γ = j(f)(κ); then γ < κ. Let
Y = {α : f(α) = γ}. Then κ ∈ j(Y), so Y ∈ U_j; and f is constant on
Y.
For part b, suppose α < κ and β < j_U(α) in M. Then [⟨x_ξ⟩] ∈̂ [⟨α⟩],
where [⟨x_ξ⟩] is the preimage of β under the collapsing isomorphism. It
follows that ∪_{γ<α} {ξ : x_ξ = γ} ∈ U, and it follows by κ-completeness
that β = j_U(γ) for some γ < α. It then follows by induction that
j_U(α) = α for α < κ.
In V^κ/≡_U let d be the diagonal sequence. It is easily seen that
[⟨α⟩] ∈̂ d and d ∈̂ [⟨κ⟩], and so j_U(κ) > κ. This completes the proof of part
b.
For part c, let d be the diagonal sequence. If [⟨x_ξ⟩] ∈̂ d then ξ ↦ x_ξ
is a regressive function, and by normality ⟨x_ξ⟩ ≡_U ⟨γ⟩ for some γ < κ.
⊳
By parts a and b, if κ is a measurable cardinal then there is a
normal κ-complete nonprincipal ultrafilter on κ.
Theorem 7. Suppose U ⊆ Pow(κ) is a normal κ-complete nonprincipal ultrafilter. Suppose F is a second order formula with free
variables among W⃗, and ⊨_{V_κ} F_{W⃗}(X⃗). Let R_F = {α < κ : ⊨_{V_α}
F_{W⃗}(X_1 ∩ V_α, ..., X_k ∩ V_α)}. Then R_F ∈ U.
Remarks on proof: Some preliminary facts are required. First, if κ
is measurable then κ is regular. Indeed, suppose κ were singular, say
κ = ∪_{ξ<λ} S_ξ for some λ < κ where |S_ξ| < κ for all ξ < λ. Then S_ξ ∉ U
for all ξ < λ, whence κ ∉ U, a contradiction.
Second, if κ is measurable then κ is inaccessible. Indeed, suppose
S is a set of functions f : λ ↦ {0, 1} where λ < κ and |S| = κ. Let
D_{ξ,i} = {f ∈ S : f(ξ) = i} for ξ < λ and i = 0, 1. Let D_ξ = D_{ξ,0} if
D_{ξ,0} ∈ U, else D_{ξ,1}. Let D = ∩_{ξ<λ} D_ξ. Then D ∈ U; but |D| = 1, a
contradiction.
Third, suppose j : V ↦ M and κ = κ_j. Then j[V_κ] ⊆ M, and by
lemma 3 j[V_κ] = V_κ. It follows that V_κ^M = {x ∈ M : ρ(x) < κ} = V_κ.
Also, if X ⊆ V_κ then j(X) ∩ V_κ = X, and so X ∈ M, and so V_{κ+1}^M =
V_{κ+1}.
Fourth, under the hypotheses of the theorem, a subset X ⊆ V_κ is
represented by [⟨X ∩ V_ξ⟩]. Indeed, if x ∈ X then x ∈ V_ξ for some ξ < κ,
so [⟨x⟩] ∈̂ [⟨X ∩ V_ξ⟩]. On the other hand suppose [⟨x_ξ⟩] ∈̂ [⟨X ∩ V_ξ⟩], and let
A = {ξ : x_ξ ∈ X ∩ V_ξ}. Then A ∈ U, and ρ(x_ξ) < ξ for ξ ∈ A, whence
since U is normal there is a B ⊆ A and a γ < κ such that B ∈ U and
ρ(x_ξ) = γ for ξ ∈ B. Thus, x_ξ ∈ V_γ for ξ ∈ B. Since κ is inaccessible
|V_γ| < κ, so by κ-completeness there is a w ∈ V_γ and a C ⊆ B such that
C ∈ U and x_ξ = w for ξ ∈ C.
Let N denote Ult_U. Let F′ be F, with first (resp. second) order
variables limited to V_κ (resp. V_{κ+1}). Let G state that κ is a limit ordinal
and F′ holds in V_κ. Using the third fact above, ⊨_N G(κ, X⃗). Using the
fourth fact, ⊨_{V^κ/≡_U} G([⟨ξ⟩], [⟨X_1 ∩ V_ξ⟩], ..., [⟨X_k ∩ V_ξ⟩]). Using Łoś's
theorem, {ξ < κ : G(ξ, X_1 ∩ V_ξ, ..., X_k ∩ V_ξ)} is in U, and this is a
subset of R_F.
See theorem 9.3.1 of [Drake] or lemma 17.15 of [Jech2] for further
details. ⊳
As a corollary, if κ is measurable then κ is Π^1_n-indescribable for
any n. By extending the argument it may be shown that κ is Π^2_1-indescribable; see theorem 9.3.1 of [Drake]. It is easily seen that club
subsets of κ are Π^1_0-enforceable. Thus as another corollary, club subsets
of κ are in a normal ultrafilter U, whence sets in U are stationary.
It is a theorem of D. Scott that if V = L then there are no measurable cardinals. A stronger result will be proved in section 38. A direct
proof may be found in theorem 17.1 of [Jech2].
The following theorem gives a relationship between j and j_{U_j}; it
will be useful in section 44.
Theorem 8. Given an elementary embedding j : V ↦ M, there is
an elementary embedding k : Ult_{U_j} ↦ M, such that j = k ◦ j_{U_j}.
Proof: Write κ for κ_j and U for U_j. A map k′ on V^κ/≡_U will
be defined; k = k′ ◦ π^{−1} where π is the collapsing isomorphism. Let
k′([⟨x_ξ⟩]) = j(⟨x_ξ⟩)(κ). Let X = {ξ : x_ξ = y_ξ}; if X ∈ U then κ ∈ j(X),
and j(⟨x_ξ⟩)(κ) = j(⟨y_ξ⟩)(κ) follows. Thus, k′ is a well-defined function.
Suppose ⊨_{V^κ/≡_U} F([⟨x_{1ξ}⟩], ..., [⟨x_{kξ}⟩]). Then there is a D ∈ U
such that F(x_{1ξ}, ..., x_{kξ}) is true for ξ ∈ D. It follows that κ ∈ j(D),
whence ⊨_M F(j(⟨x_{1ξ}⟩)(κ), ..., j(⟨x_{kξ}⟩)(κ)), whence
⊨_M F(k′([⟨x_{1ξ}⟩]), ..., k′([⟨x_{kξ}⟩])), which shows that k′ is elementary.
Finally, k(j_U(x)) = k′([⟨x⟩]) = j(⟨x⟩)(κ) = j(x), where ⟨x⟩ is the sequence which is constantly x. ⊳
Some further comments on elementary embeddings which are proper classes will be made. If j, M, and N are proper classes, it is safe
to say that j is an elementary embedding if for every formula φ_{x⃗} in
the language of set theory, the formula E_φ is true (which fact would
be demonstrated by proving it in ZFC), where E_φ is y_1 = j(x_1) ∧ ··· ∧
y_k = j(x_k) ∧ φ^M ⇒ φ^N_{y_1/x_1,...,y_k/x_k} and y_1, ..., y_k are new free variables.
Section 6.2 of [Drake] uses this method.
The above method does not, however, yield a single statement of
ZFC which states that j is an elementary embedding. Such a statement
can be given, making use of the observation that truth for Σ1 formulas
is definable. A discussion may be found in section 5 of [Kanamori3].
For any n there is a single statement E_n which states that j is a Σ_n-elementary embedding. If E_1 holds then E_n holds for any n, and E_φ
holds for any formula φ.
There are types of cardinals “in between” Π^1_n-indescribable cardinals and measurable cardinals. Recent interest has been in cardinals
larger than measurable; smaller cardinals will generally not be covered
here. Discussions may be found in [Drake] and [Jech2]. Types considered include the following. The notation [X]^{<ω} will be used to denote
the finite subsets of X.
1. Totally indescribable cardinals
2. ν-indescribable cardinals
3. Cardinals for which κ → (ω)^{<ω}_2
4. Cardinals for which κ → (ℵ_1)^{<ω}_2
5. Jónsson cardinals
6. Rowbottom cardinals
7. Ramsey cardinals, cardinals for which κ → (κ)^{<ω}_2
In some cases, a cardinal of a higher numbered type is a cardinal of a
lower numbered type, but not always. Types 1-3 are consistent with
V = L, and higher numbered types are not. Types 3 and 4 are called
Erdős, or partition, cardinals.
Subtle and ineffable cardinals are related to indescribable cardinals.
They are consistent with V = L. They have found various uses; see for
example [Friedman1]. Ineffable cardinals were mentioned in section 24.
Because the fact will be used later (section 42), it will be shown
that measurable cardinals are Ramsey.
Theorem 9. If κ is a measurable cardinal then κ → (κ)^{<ω}_2.
Proof: This is theorem 10.22 of [Jech2]. It suffices to show that if
U ⊆ Pow(κ) is a κ-complete normal ultrafilter then for any partition of
[κ]^n into finitely many parts there is a homogeneous subset H ⊆ κ with
H ∈ U. The proof is by induction on n. For the basis n = 1, at least
one of the parts must be in U, because the dual ideal is closed under
finite union. Suppose P_i for 1 ≤ i ≤ t are the parts of a partition of
[κ]^{n+1}. Suppose α < κ. Let P_i^α be the sets in P_i with least element α,
and let Q_i^α be these sets with α removed. Let T = κ − (α + 1), and
let U′ = {S ∩ T : S ∈ U}; it is readily seen that U′ ⊆ Pow(T) is a κ-complete normal ultrafilter, and U′ ⊆ U. It follows using the induction
hypothesis that there is a J_α ∈ U such that [J_α]^n ⊆ Q_i^α for some i,
which will be denoted i_α. Let R_i = {α : i_α = i}; then there is a j such
that R_j ∈ U. Let H = R_j ∩ △_α J_α. If x⃗ is a set in [H]^{n+1} let α = x_1
and y⃗ = x_2, ..., x_{n+1}. Then x⃗ ⊆ R_j, so i_α = j; and y⃗ ⊆ J_α. Thus
[J_α]^n ⊆ Q_j^α, so y⃗ ∈ Q_j^α, so x⃗ is in P_j; this shows that H is homogeneous.
⊳
37. Indiscernibles.
Like ultrapowers, indiscernibles are a construction from mathematical logic with uses in set theory. Suppose M is a structure for some
first order language. Suppose I ⊆ M and < is a linear order on I. I is
said to be a set of indiscernibles for M if, for any formula F and suitable
list of variables, and sequences x_1 < ··· < x_n and y_1 < ··· < y_n of elements of I, ⊨_M F(x⃗)
if and only if ⊨_M F(y⃗).
If I is a set of indiscernibles for M, the set of formulas {F : ⊨_M F(x⃗)
for x_1 < ··· < x_k ∈ I} may be considered. This set will be called
the EM-set, where EM is an abbreviation for Ehrenfeucht-Mostowski;
other names include EM-formula, theory of indiscernibles, and character. Note that each F must be given with a suitable list of variables; this
can be formally defined in a straightforward manner, say as a (k+1)-tuple
⟨F, v⃗⟩. An EM-set is defined to be a set of formulas which is the EM-set
for some structure M and set of indiscernibles I ⊆ M.
A standard theorem states that if T is a theory with infinite models,
and I is a set with a linear order <, then there is a model M of T which
has (an isomorphic copy of) I as a set of indiscernibles (theorem 3.3.10
of [ChaKei]). Since this is not needed, a proof will be omitted. The
following, called the stretching lemma, is another standard fact.
Theorem 1. Suppose I is an infinite set of indiscernibles for a structure M , and Σ is the EM-set. Suppose J is an infinite linearly ordered
set. Then there is a structure containing J as a set of indiscernibles and
having Σ as its EM-set.
Remarks on proof: This is theorem 3.3.11.b of [ChaKei]. Introduce
additional constants c_j for j ∈ J, and consider the theory of M together
with the formulas F_{v⃗}(c_{j_1}, ..., c_{j_k}) for F in Σ, where j_1 < ··· < j_k. Since I is infinite,
any finite subset of the enlarged theory T has a model, so T is consistent,
so T has a model. The interpretations of the c_j are a copy of J, and
may be linearly ordered according to the indexes j. It is readily seen
that J is a set of indiscernibles and Σ is the EM-set. ⊳
Indiscernibles have additional properties of interest for structures
with an additional property. A structure M is said to have “definable
Skolem functions” if for every formula F there is a formula F^s such that
1. ⊨_M ∃!y F^s; and
2. ⊨_M F^s ⇒ F.
A standard theorem states that any structure can be “expanded” to one
with definable Skolem functions (proposition 3.3.4 of [ChaKei]). Various
particular structures already have them, in particular some structures
of interest in set theory, as will be seen in the next section.
Recalling the definition of the theory of a structure from section
7, it is clear that if M1 and M2 have the same theory, and M1 has
definable Skolem functions, then M2 has definable Skolem functions,
indeed defined by the same formulas.
Recalling the proof of lemma 2 of section 20 and following comments, given a structure M with definable Skolem functions, and X ⊆ M,
if M′ is the Skolem hull of X obtained using the definable Skolem functions,
then M′ ≺ M. M′ will be called the definable hull of X.
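The closure process behind a definable hull can be illustrated on a finite toy structure. The sketch below is an assumption-laden illustration added here, not the construction of section 20: the universe is {0, ..., 15} with its usual order, only two formulas F(x, y) are considered, each Skolem function picks the least witness y (or the least element of the universe if there is none, playing the role of ∅), and the hull of X is the closure of X under these functions. A genuine Skolem hull closes under functions of every arity, one for each formula of the language.

```python
def skolem(universe, relation):
    """Least-witness Skolem function for a binary relation F(x, y):
    returns the least y with F(x, y), or the least element if none exists."""
    ordered = sorted(universe)

    def f(x):
        for y in ordered:
            if relation(x, y):
                return y
        return ordered[0]  # default value when there is no witness

    return f

def hull(generators, functions):
    """Closure of the generators under the given unary functions."""
    closed = set(generators)
    frontier = list(generators)
    while frontier:
        x = frontier.pop()
        for f in functions:
            y = f(x)
            if y not in closed:
                closed.add(y)
                frontier.append(y)
    return closed

UNIVERSE = range(16)
# Hypothetical formulas chosen for illustration: "y*y = x" and "2y = x or 2y+1 = x".
root = skolem(UNIVERSE, lambda x, y: y * y == x)
half = skolem(UNIVERSE, lambda x, y: 2 * y == x or 2 * y + 1 == x)

print(sorted(hull({9}, [root, half])))  # [0, 1, 2, 3, 4, 9]
```

Because the witnesses are chosen by a definable rule (least element), two structures satisfying the same sentences produce hulls described by the same terms, which is the point used in the theorem that follows.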
Theorem 2. Suppose the following hold.
1. For j = 1, 2, M_j is a structure, and I_j ⊆ M_j with order <_j is a set
of indiscernibles.
2. Σ is the EM-set in both cases.
3. M_1 (hence M_2) has definable Skolem functions.
4. f : I_1 ↦ I_2 is strictly order-preserving.
5. For j = 1, 2, H_j is the definable hull of I_j in M_j.
Then there is a unique elementary embedding f̄ : H_1 ↦ H_2, whose
restriction to I_1 is f. Further, f̄[H_1] is the definable hull of f[I_1].
Remarks on proof: This is theorem 3.3.11.d of [ChaKei]. Each
element of H_1 is given by a term, involving applying Skolem functions
to elements of I_1. Such a term determines an element of H_2, by replacing
an element x ∈ I_1 by f(x), and using the “same” Skolem functions, i.e.,
those defined in M_2 by the same formulas.
If t_1 and t_2 determine the same element of H_1, there is a formula
stating that t_1 = t_2, which holds in M_1 at certain x ∈ I_1. By hypotheses
2 and 4, replacing each x by f(x), the formula holds in M_2. Thus, f̄(w)
may be defined to be the value determined in M_2 by t, where t is any
term determining w in M_1.
Given a formula F, and values x⃗ in H_1, there is a formula G, and
values y⃗ in I_1, such that ⊨_{H_1} F(x⃗) if and only if ⊨_{H_1} G(y⃗). Let y⃗_∗
denote the vector where y_{∗i} = f(y_i); then ⊨_{H_1} G(y⃗) if and only if ⊨_{H_2}
G(y⃗_∗). Let x⃗_∗ denote the vector where x_{∗i} = f̄(x_i); then ⊨_{H_2} G(y⃗_∗) if
and only if ⊨_{H_2} F(x⃗_∗). This shows that f̄ is an elementary embedding.
Let H′_2 denote the definable hull of f[I_1]. It is readily seen from the
definition of f̄(w) that f̄[H_1] ⊆ H′_2. If w′ ∈ H′_2 then there is a term
t determining it in M_2, involving elements x⃗_∗ where x_{∗i} = f(x_i) with
x_i ∈ I_1. Let w be the element of H_1 determined in M_1 by replacing x_{∗i}
by x_i in t; then f̄(w) = w′, so H′_2 ⊆ f̄[H_1]. ⊳
38. 0#.
0# may be defined as an EM-set with certain properties. It will
be seen that if V = L then there is no such set. In particular, unless
ZFC is inconsistent, it is not provable in ZFC that 0# exists. Thus,
the principle “0# exists”, discovered by Solovay and Silver in the late
1960’s, extends ZFC in a manner contrary to V = L.
It will be seen that “0# exists” follows from “there exists a measurable cardinal”. As mentioned in section 36, that the existence of a
measurable cardinal implies V ≠ L was first proved directly, by Scott
in 1961. Various other principles imply “0# exists”, and hence that
V ≠ L. Since its introduction, the notion of 0# has found many uses.
In this section, by “EM-set” will be meant the EM-set for a structure L_γ in the language of set theory, where γ ∈ LimOrd, and an infinite
set I ⊆ γ of indiscernibles, with the natural order. The structure L_γ has
definable Skolem functions; indeed F^s is the formula, “y is the <_L-least
y such that F if there is any such y, else ∅”.
Theorem 1. Suppose Σ is an EM-set. For each infinite ordinal
α there is a structure M_{Σ,α}, unique up to isomorphism, and a set of
indiscernibles I ⊆ Ord^M, such that I has order type α, Σ is the EM-set,
and M_{Σ,α} is the definable hull of I.
Remarks on proof: This is theorem 18.7 of [Jech2]. Existence follows by taking the definable hull of I in the structure given by theorem
37.1. The proof of theorem 37.1 needs to be modified, by adding the
formulas “c_j is an ordinal”, and c_{j_1} ∈ c_{j_2} for j_1 < j_2, to T. Uniqueness
follows by theorem 37.2. ⊳
There are three important properties an EM-set might have. An
EM-set may be
1. well-founded,
2. unbounded, or
3. remarkable.
These will be discussed in turn. The statement “0# exists” is the statement “there is an EM-set with properties 1-3”. Theorem 7 below gives
an equivalent condition.
By a Skolem term will be meant a term involving Skolem functions. Say that a structure M_{Σ,α} is well-founded if the relation ∈̂ is
well-founded.
Lemma 2. If MΣ,α is well-founded then MΣ,α is isomorphic to Lγ
for some γ ∈ LimOrd. Further the set of indiscernibles for Lγ is uniquely
determined.
Remarks on proof: Using remarks following lemma 36.1, MΣ,α is
isomorphic to a transitive set. Also, by the hypotheses on Σ it is elementarily equivalent to Lγ0 for some γ0 ∈ LimOrd. By a version of
the condensation lemma, lemma 20.6 (see section V.2 of [Devlin] for
example), the transitive set is Lγ for some γ ∈ LimOrd. Let I and J be
two sets of indiscernibles. There is a unique order isomorphism f from
I to J. This induces an isomorphism f̄ from L_γ to L_γ, which extends
f. Since f̄ must be the identity, f is. ⊳
Say that an EM-set Σ is well-founded if every structure MΣ,α is
well-founded.
Lemma 3. An EM-set Σ is well-founded if and only if there is an
uncountable ordinal α such that MΣ,α is well-founded.
Remarks on proof: This is theorem 18.9 of [Jech2]. Suppose M_{Σ,α} is well-founded where α is uncountable. Suppose for
some α_1 that M_{Σ,α_1} is not well-founded, and ⟨a_i⟩ is an infinite descending chain, i.e., a_{i+1} ∈̂ a_i for all i < ω. Let a_i be defined from indiscernibles
by the Skolem term t_i. Let I_0 be the indiscernibles involved in any t_i.
Then I_0 is countable; let α_2 be its order type. The definable hull of I_0
in M_{Σ,α_1} is not well-founded, since it contains the a_i. Also, it is M_{Σ,α_2}.
However, M_{Σ,α_2} is the definable hull in M_{Σ,α} of the first α_2 indiscernibles, and hence, being a substructure of a well-founded structure, is
well-founded. This is a contradiction to the hypothesis that α_1 exists.
⊳
For α ∈ LimOrd, say that M_{Σ,α} is unbounded if the indiscernibles
are unbounded in Ord^M. Say that an EM-set Σ is unbounded if for
every α ∈ LimOrd, M_{Σ,α} is unbounded.
In the statement of the following lemma, and later in the section,
if t is a Skolem term, the formula defining it will be denoted as t also.
Lemma 4. For an EM-set Σ, the following are equivalent.
a. Σ is unbounded.
b. For some α ∈ LimOrd, M_{Σ,α} is unbounded.
c. Suppose t_{w⃗} is a Skolem term. Then Σ contains the formula F_{w⃗,v},
“if t_{w⃗} is an ordinal then t_{w⃗} < v”.
Remarks on proof: This is theorem 18.10 of [Jech2]. a⇒b is trivial.
For b⇒c, let x⃗ be any increasing finite sequence of indiscernibles. If,
in M_{Σ,α}, t_{w⃗}(x⃗) is an ordinal, then there is some y with x_n < y and
t_{w⃗}(x⃗) < y. Thus, F_{w⃗,v}(x⃗, y) is true, so F_{w⃗,v} is in Σ. For c⇒a, let α be
a limit ordinal, let I be the set of indiscernibles in M_{Σ,α}, and let a be an
ordinal. Then a = t where t is a Skolem term involving the increasing
sequence x⃗ of indiscernibles. If y is any indiscernible greater than x_n,
then by hypothesis F_{w⃗,v} is in Σ, whence F_{w⃗,v}(x⃗, y) is true. ⊳
For α ∈ LimOrd, say that M = M_{Σ,α} is remarkable if it is unbounded, and whenever b is a limit point (in M) of the set of indiscernibles I, every c ∈̂ b is in the definable hull of I ∩ b (in M). Say that
an EM-set Σ is remarkable if for every α ∈ LimOrd, M_{Σ,α} is remarkable.
In the statement of the following lemma, and later in the section,
if t_{v⃗} is a Skolem term with variables as indicated, t_{u⃗} denotes the term,
where v_i is replaced by u_i.
Lemma 5. For an unbounded EM-set Σ, the following are equivalent.
a. Σ is remarkable.
b. For some α ∈ LimOrd with α > ω, if b is the ωth indiscernible of
M_{Σ,α}, every c ∈̂ b is in the definable hull of I ∩ b.
c. Suppose t_{w⃗,v⃗} is a Skolem term. Then Σ contains the formula F_{w⃗,v⃗,u⃗},
“if t_{w⃗,v⃗} is an ordinal and t_{w⃗,v⃗} < v_1 then t_{w⃗,v⃗} = t_{w⃗,u⃗}”.
Remarks on proof: This is theorem 18.11 of [Jech2]. a⇒b is trivial.
For b⇒c, let x⃗, y⃗, z⃗ be an increasing finite sequence of indiscernibles,
where x⃗ is the first k indiscernibles, y_1 is the ωth. Given a Skolem term
t, let a = t_{w⃗,v⃗}(x⃗, y⃗). If a is an ordinal and a < y_1 then by hypothesis
a = s where s is a term involving a finite set J of indiscernibles, each
less than y_1. There is a formula G which, at the values x⃗, J, and
y⃗ in increasing order, states that t = s, so is true. Since the values
are indiscernibles, G is true when y_i is replaced by z_i. It follows that
F_{w⃗,v⃗,u⃗}(x⃗, y⃗, z⃗) is true, whence F_{w⃗,v⃗,u⃗} is in Σ. For c⇒a, let α be a limit
ordinal, let I be the set of indiscernibles in M_{Σ,α}, let b be a limit point
of I, and suppose c ∈̂ b. Then for some Skolem term t = t_{w⃗,v⃗}, c = t(x⃗, y⃗)
where y_1 = b. Choose p⃗ and q⃗ of the same length as y⃗ such that x⃗, p⃗, y⃗, q⃗
is an increasing sequence of indiscernibles. Since F_{w⃗,v⃗,u⃗} is in Σ, and
c < y_1, t(x⃗, y⃗) = t(x⃗, q⃗). Applying indiscernibility to the appropriate
formula then gives t(x⃗, y⃗) = t(x⃗, p⃗), so c is in the definable hull of I ∩ b. ⊳
Lemma 6. Suppose Σ is an EM-set with properties 1-3. Suppose κ
is an uncountable cardinal, and MΣ,κ is isomorphic to Lγ . Then γ = κ.
Remarks on proof: This is corollary 18.13 of [Jech2]. Suppose I
is the set of indiscernibles. Since I ⊆ γ and |I| = κ, γ ≥ κ. Suppose
γ > κ. Since MΣ,κ is unbounded and I has uncountable order type,
there is a β ∈ Lim(I) so that κ < β. Since MΣ,κ is remarkable, β is a
subset of the definable hull H of I ∩ β. Let α be the order type of I ∩ β;
then α < κ, so |H| < κ, which is a contradiction since κ ⊆ H. ⊳
Theorem 7. There is an EM-set with properties 1-3, if and only if
there is a γ ∈ LimOrd such that Lγ has an uncountable set of indiscernibles.
Remarks on proof: This is corollary 18.18 of [Jech2]. If the EM-set exists, let κ be any uncountable cardinal; by lemma 6 Lκ has an
uncountable set of indiscernibles. Suppose I is an uncountable set of
indiscernibles for Lγ ; choose γ as small as possible. Let Σ be the EM-set,
and let H be the definable hull of I in Lγ . Σ is well-founded because H is.
If H is not unbounded, there is a Skolem term t~v (~x) whose value γ ′ in H
is greater than any element of I. It may be assumed that γ ′ ∈ LimOrd;
if not, subtract an integer from it. Let I ′ = {x ∈ I : x > xk } where xk
is the last (largest) element of ~x. Since Lγ ′ is definable from γ ′ in Lγ ,
there is a formula G~v ,w~ such that for y1 , . . . , yl ∈ I ′ , G~v,w~ (~x, ~y ) is true in
Lγ if and only if Fw~ (~y ) is true in Lγ ′ . Since I is a set of indiscernibles for
Lγ , I ′ is a set of indiscernibles for Lγ ′ . Since γ ′ < γ this is impossible,
so H is unbounded, and so Σ is. Since γ is as small as possible, H is
isomorphic to Lγ . By applying the collapsing isomorphism to I, H = Lγ
may be assumed. Choose I so that the ωth element iω of I is as small
as possible. Suppose Lγ is not remarkable. Then there is a Skolem term
t = t~v,w~ such that if ~x, ~y , ~z is an increasing sequence of indiscernibles,
then in Lγ , t(~x, ~y) is an ordinal, t(~x, ~y ) < y1 , and t(~x, ~y) ≠ t(~x, ~z). If ~v
has length k and ~w has length l, partition I into a vector ~x of length k,
followed by successive vectors ~yζ of length l. Let ηζ = t(~x, ~yζ ). The ηζ
must form either a decreasing or increasing sequence, and the former is
impossible. The set J = {ηζ } is readily seen to be a set of indiscernibles
for Lγ , since a formula involving elements of J can be transformed into
one involving elements of I. Since iω = yω,1 , ηω = t(~x, ~yω ) < iω . If π is
the collapsing isomorphism, π[J] is a set of indiscernibles for Lγ with a
smaller ωth element than I. Thus, Σ is remarkable. ⊳
Next, some consequences of “0# exists” will be given. First, a
lemma about remarkable EM-sets is given.
Lemma 8. Suppose Σ is a remarkable EM-set, M = MΣ,α where
α ∈ LimOrd, and I is the set of indiscernibles. Then in M , I is a closed
and unbounded class of ordinals.
Remarks on proof: This is lemma 18.12 of [Jech2]. By definition I
is unbounded. Suppose β < α is a limit ordinal and b is the βth element
of I. It suffices to show that b is the limit of I ∩b. Let H be the definable
hull in M of I ∩ b; then H is MΣ,β . Since M is remarkable, b is the limit
of the ordinals in H. Also, H is unbounded, and the claim follows. ⊳
Theorem 9. Let Σ be an EM-set with properties 1-3. For each
uncountable cardinal κ, let Iκ ⊆ κ be the set of indiscernibles for Lκ .
Suppose λ < κ is an uncountable cardinal.
a. Iλ = Iκ ∩ λ.
b. Lλ is the definable hull of Iλ in Lκ .
c. Lλ ≺ Lκ
d. λ ∈ Iκ .
e. Σ is uniquely determined.
Remarks on proof: Parts a and b are lemma 18.14 of [Jech2]. Let l
be the λth element of Iκ , and let H be the definable hull of Iκ ∩ l. Then
H equals MΣ,λ , hence it is isomorphic to Lλ , hence the ordinals of H
equal λ. Also, l is the limit of the ordinals of H, so l = λ. Now, H is
closed under the definable function α ↦ Lα , whence Lλ ⊆ H, whence
Lλ = H since the collapsing isomorphism must be the identity. Part
b is proved; part a follows because Iκ ∩ λ is a set of indiscernibles for
H = Lλ . Part c follows because H ≺ M since H is a Skolem hull (see
lemma 20.2). Part d has already been shown, since l = λ and l ∈ Iκ .
For part e, by the foregoing each ℵn for n < ω is in Iℵω . Thus, F~v is in
Σ if and only if |=Lℵω F~v (ℵ1 , . . . , ℵk ). ⊳
The union of the Iκ is a class of ordinals, called the Silver indiscernibles.
Corollary 10. Suppose 0# exists, and κ is an uncountable cardinal. Then “Lκ ≺ L”, that is, for any formula F and x1 , . . . , xk ∈ Lκ ,
$F_{\vec v}(\vec x)^{L} \Leftrightarrow F_{\vec v}(\vec x)^{L_\kappa}$.
Remarks on proof: By the reflection principle referred to in section
33, there is an uncountable cardinal λ > κ such that $F_{\vec v}(\vec x)^{L} \Leftrightarrow F_{\vec v}(\vec x)^{L_\lambda}$.
The corollary follows by theorem 9.c, which gives $L_\kappa \prec L_\lambda$. ⊳
Corollary 11. Suppose 0# exists. For any cardinal κ, |Vκ ∩ L| = κ.
Proof: Let F be the formula “w ∈ x if and only if ρ(w) < κ”.
F (x, κ) holds in L if and only if x = Vκ ∩ L. ∃xF (x, κ) holds in L, so it
holds in Lκ+ . Thus, Vκ ∩ L ∈ Lκ+ , and the corollary follows. ⊳
In particular, there are only countably many constructible subsets
of ω, and certainly V ≠ L.
There are various other basic facts concerning 0#, including the
following.
- If 0# exists and κ is an uncountable cardinal, then in L, $\kappa \to (\omega)^{<\omega}_2$
(theorem V.2.15.ii of [Devlin]).
- There is a Π1 formula F with free variable v, such that Fv (x) holds if
and only if x is 0#. It follows that 0# is not constructible (theorem
V.3.1 and corollary V.3.2 of [Devlin]).
- There is a $\Pi^1_2$ formula F of arithmetic with a free second order
variable V , such that FV (X) holds if and only if X is 0#, considered
as a set of integers (lemma 25.30 of [Jech2]).
- 0# exists if and only if there is a nontrivial elementary embedding j : L ↦ L
(this will be shown in section 44).
It will be seen in section 42 that if a measurable cardinal exists then
0# exists (in fact stronger facts hold). Among other principles which
imply “0# exists” are Chang’s conjecture (Corollary 18.28 of [Jech2]),
and the existence of a Jónsson cardinal (Corollary 18.29 of [Jech2]).
Another example will be seen in section 60.
39. Relative constructibility.
The constructible sets may be generalized, and the resulting construction has numerous applications. If L is a proper subclass of V
then models of interest may be constructed by starting with some sets
or classes of interest, and “closing up” under the constructibility process. Notation varies; a general construction along the lines of [TakZar2]
will be given first, and some special cases with more or less standard
notation defined in terms of it.
In this section the language of set theory may be expanded with
unary predicate symbols P1 , . . . , Pk . These may occur in the formulas
in the separation and replacement axiom schemes; the expanded axiom
system will be denoted $\mathrm{ZFC}_{P_1,\ldots,P_k}$. This is done for convenience; if
the Pi are interpreted as sets or classes defined by formulas, the added
predicates may be removed.
The predicate Sat(D, f, a) of section 18 can be generalized, to consider formulas F in the language expanded with unary predicate symbols
P1 , . . . , Pk , to the predicate Sat(D, A1 , . . . , Ak , f, a). This is true as in
the case of Sat(D, f, a), with the added proviso that for 1 ≤ i ≤ k,
Ai ⊆ D and Pi is interpreted as Ai .
The function DefBy(S, $\vec A$, f, a) generalizing the function
DefBy(S, f, a) of section 19 is readily defined. Let $\mathrm{Def}_{\vec P}(S)$ be {T :
∃f ∃a(T = DefBy(S, P1 ∩ S, . . . , Pk ∩ S, f, a))}. Theorem 19.1 holds, in
$\mathrm{ZFC}_{\vec P}$.
Given a class A, and a transitive class B, let
L0 (A; B) = ∅,
Lα+1 (A; B) = Def A,B (Lα (A; B)) ∪ (B ∩ Vα+1 ), and
Lα (A; B) = ∪β<α Lβ (A; B) for limit ordinals α.
Let L(A; B) = ∪α Lα (A; B). A and B are used as additional predicates
in adding sets to L(A; B), and also the sets of B are added.
Theorem 19.2 holds, in ZFCA,B . Theorem 19.3 also holds; for the
proof, it is only necessary to observe that part a still follows, because B
is transitive.
Theorem 1. L(A; B) is the smallest class M having the following
properties.
a. M is transitive.
b. Ord ⊆ M .
c. M is a model of ZF.
d. For any x ∈ M , x ∩ A ∈ M .
e. For any x ∈ M , x ∩ B ∈ M .
f. B ⊆ M .
Remarks on proof: That L(A; B) has properties a and b has already
been observed. To prove property c, only minimal changes are needed
to the proofs of lemma 19.4 and theorem 19.5. A similar claim is proved
in theorem 7.25 of [TakZar2]. For part d, if x ∈ L(A; B) then for
some α, x ∈ Lα (A; B). By part a, x ⊆ Lα (A; B). It follows that
x ∩ A = {w ∈ Lα (A; B) :|=Lα (A;B) w ∈ x ∧ A(w)}, from which it follows
that x ∩ A ∈ Lα+1 (A; B). Part e is similar. For part f, if x ∈ B has rank
α then x will be added to Lα+1 (A; B). Suppose M has properties a-f.
Using absoluteness and the hypotheses, it follows readily by induction
that Lα (A; B)M = Lα (A; B), whence L(A; B) = L(A; B)M ⊆ M . ⊳
Although usage varies, L[A] is commonly used to denote L(A; ∅)
(the usual definition involves only a single predicate symbol in the extended language). The proof of theorem 19.6, with hardly any modification, shows that L[A] satisfies AC.
For another useful fact, letting à = A ∩ L[A], it follows readily by
induction that Lα [Ã] = Lα [A], whence L[Ã] = L[A]. If A is a set then
à ∈ L[Ã] follows.
A version of the condensation lemma (lemma 20.6) holds for L[A].
It is necessary to add the hypothesis “TC(A ∩ Lα ) ⊆ S”. The proof
of lemma 20.6 may then be adapted, since π[A ∩ Lα ] = A ∩ Lα , and
the relativizing predicate does not change due to the collapsing, so that
T = Lβ [A]. (See exercise II.4.A of [Devlin]).
Using this condensation lemma it may be shown that in L[A] for a
set A, for “sufficiently large” κ, $2^\kappa = \kappa^+$. One version is used in showing
that if U is a normal κ-complete ultrafilter on a measurable cardinal κ
then the GCH holds in L[U ] (theorem 19.3 of [Jech2]).
For a set b, L(b) denotes L(∅; TC({b})). This may be defined by
the recursion
L0 (b) = TC({b}),
Lα+1 (b) = Def(Lα (b)), and
Lα (b) = ∪β<α Lβ (b) for limit ordinals α.
L(b) is the smallest class with properties a-c of theorem 1, which contains
b as an element. If there is a well-ordering of b in L(b) then L(b) satisfies
AC. The most commonly encountered example, however, is L(R), and
as will be seen in section 53 it is independent of ZF whether R can be
well-ordered.
For a transitive model M of ZFC which is a class, and a set x which
is a subset of M , L(∅; M ∪ {x}) is commonly denoted M [x]. M [x] is the
smallest class N with properties a-c of theorem 1, such that M ⊆ N
and x ∈ N .
M [x] may be defined for M a set. Lα (A; B) is defined as usual for
α ∈ M , and M [x] is the union over these ordinals. The proof of theorem
1.c may be modified to show that M [x] is a model of ZF.
The model M [x] is sometimes of use in forcing arguments. For
example, if x is a Cohen real then M [x] and M [G] are the same, since x
and G are definable from each other. This follows using theorem 21.13.d.
40. Direct limits.
In this section some facts from model theory will be covered, which
are among the many facts of model theory that are useful in set theory.
One use will be seen in the next section.
A poset P is said to be directed if for any x, y ∈ P there is a
z ∈ P such that x ≤ z and y ≤ z. A chain is an example of a directed
poset. For many applications of the constructions of this section a chain
suffices, but the general construction is only slightly more involved.
Recall the definition of a homomorphism h : S ↦ T from a structure
S for a first order language to a structure T . An injective homomorphism is called a monomorphism.
A direct system of structures is a family {Di : i ∈ P } of structures,
where P is a directed poset, together with homomorphisms hij : Di ↦
Dj for i ≤ j, such that hii is the identity map, and hjk ◦ hij = hik
whenever i ≤ j ≤ k.
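As a finite illustration of these two definitions, the following Python sketch checks directedness of a poset and the identity and composition conditions on a family of maps. The function names, the dictionary encoding of the maps, and the example chain are assumptions made for this illustration only; the structures carry no predicates or functions here, so the homomorphism condition itself is not tested.

```python
from itertools import product

def is_directed(elements, leq):
    """Return True if every pair x, y has an upper bound z in the poset."""
    return all(
        any(leq(x, z) and leq(y, z) for z in elements)
        for x, y in product(elements, repeat=2)
    )

def is_direct_system(index, leq, maps):
    """Check h_ii = identity and h_jk o h_ij = h_ik for i <= j <= k.

    maps[(i, j)] is a dict sending each element of D_i to an element of D_j;
    the domain of maps[(i, i)] is taken to be D_i.
    """
    for i in index:
        if any(maps[(i, i)][x] != x for x in maps[(i, i)]):
            return False
    for i, j, k in product(index, repeat=3):
        if leq(i, j) and leq(j, k):
            for x in maps[(i, i)]:
                if maps[(j, k)][maps[(i, j)][x]] != maps[(i, k)][x]:
                    return False
    return True

# Hypothetical example: the chain 0 <= 1 <= 2 with D_i = {0, ..., i}
# and inclusion maps h_ij(x) = x.
index = [0, 1, 2]
leq = lambda i, j: i <= j
maps = {(i, j): {x: x for x in range(i + 1)}
        for i in index for j in index if i <= j}
```

A two-element antichain is a poset that is not directed, which is why chains are only a special case of the general construction.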
Suppose {Di : i ∈ P } is a direct system of structures. Let U be
the disjoint union of the Di (that is, $\{\langle i, x\rangle : i \in P, x \in D_i\}$). Let ≡ be the
relation on U , where $\langle i, x\rangle \equiv \langle j, y\rangle$ if and only if for some k ≥ i, j,
$h_{ik}(x) = h_{jk}(y)$.
Lemma 1. With notation as above, ≡ is an equivalence relation on
U.
Proof: It is immediate that $\langle i, x\rangle \equiv \langle i, x\rangle$, and if $\langle i, x\rangle \equiv \langle j, y\rangle$
then $\langle j, y\rangle \equiv \langle i, x\rangle$. Suppose $\langle i_1, x_1\rangle \equiv \langle i_2, x_2\rangle$ and $\langle i_2, x_2\rangle \equiv \langle i_3, x_3\rangle$.
There is a $k_a$ with $k_a \ge i_1, i_2$ and $h_{i_1 k_a}(x_1) = h_{i_2 k_a}(x_2)$. Similarly
there is a $k_b$ with $k_b \ge i_2, i_3$ and $h_{i_2 k_b}(x_2) = h_{i_3 k_b}(x_3)$. There is a k
with $k \ge k_a, k_b$. Then $h_{i_1 k}(x_1) = h_{k_a k}(h_{i_1 k_a}(x_1)) = h_{k_a k}(h_{i_2 k_a}(x_2)) = h_{i_2 k}(x_2) = h_{k_b k}(h_{i_2 k_b}(x_2)) = h_{k_b k}(h_{i_3 k_b}(x_3)) = h_{i_3 k}(x_3)$. ⊳
Let D∞ be the set of equivalence classes of U under ≡.
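For a finite direct system the equivalence classes can be computed directly. The sketch below is only an illustration under stated assumptions: the function name `direct_limit`, the dictionary encoding, and the three-stage example (in which the second map identifies two elements) are hypothetical, and the underlying sets carry no predicates or functions. Comparing each pair with a single representative of a class relies on lemma 1.

```python
def direct_limit(index, leq, domains, maps):
    """Return the equivalence classes of the disjoint union under the relation
    (i, x) ~ (j, y) iff h_ik(x) == h_jk(y) for some k with i <= k and j <= k.
    """
    union = [(i, x) for i in index for x in domains[i]]

    def equivalent(p, q):
        (i, x), (j, y) = p, q
        return any(
            leq(i, k) and leq(j, k) and maps[(i, k)][x] == maps[(j, k)][y]
            for k in index
        )

    classes = []
    for p in union:
        for cls in classes:
            # One representative suffices because ~ is an equivalence relation.
            if equivalent(p, cls[0]):
                cls.append(p)
                break
        else:
            classes.append([p])
    return [frozenset(cls) for cls in classes]

# Hypothetical system: D_0 = {0, 1}, D_1 = {0, 1, 2}, D_2 = {0, 1},
# where h_01 is inclusion and h_12 sends 2 to 1.
index = [0, 1, 2]
leq = lambda i, j: i <= j
domains = {0: [0, 1], 1: [0, 1, 2], 2: [0, 1]}
maps = {
    (0, 0): {0: 0, 1: 1},
    (1, 1): {0: 0, 1: 1, 2: 2},
    (2, 2): {0: 0, 1: 1},
    (0, 1): {0: 0, 1: 1},
    (1, 2): {0: 0, 1: 1, 2: 1},
    (0, 2): {0: 0, 1: 1},
}
limit = direct_limit(index, leq, domains, maps)
```

In this example the pairs (1, 1) and (1, 2) fall into the same class, since their images agree at stage 2.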
Lemma 2. Suppose that $[\langle i_t, x_t\rangle] \in D_\infty$ for 1 ≤ t ≤ k. Suppose
$j_1$ and $j_2$ both satisfy $j_s \ge i_t$ for 1 ≤ t ≤ k. Let $x^s_t = h_{i_t j_s}(x_t)$.
Then for any predicate symbol P , $P(x^1_1, \ldots, x^1_k)$ holds in $D_{j_1}$ if and
only if $P(x^2_1, \ldots, x^2_k)$ holds in $D_{j_2}$; and for any function symbol f , $x^1_k = f(x^1_1, \ldots, x^1_{k-1})$ holds in $D_{j_1}$ if and only if $x^2_k = f(x^2_1, \ldots, x^2_{k-1})$ holds in
$D_{j_2}$.
Proof: Choose $j_3 \ge j_1, j_2$. Letting $x^3_t = h_{i_t j_3}(x_t)$, for s = 1, 2, $x^3_t = h_{j_s j_3}(x^s_t)$. Thus, $P(x^s_1, \ldots, x^s_k)$ holds in $D_{j_s}$ if and only if $P(x^3_1, \ldots, x^3_k)$
holds in $D_{j_3}$. The argument for functions is similar. ⊳
Thus, a structure may be defined on $D_\infty$, by letting $P([\langle i_1, x_1\rangle], \ldots, [\langle i_k, x_k\rangle])$ hold if and only if $P(x'_1, \ldots, x'_k)$ holds in $D_j$ for some (and
hence any) j with $j \ge i_t$ for 1 ≤ t ≤ k, where $x'_t = h_{i_t j}(x_t)$. Similarly $[\langle i_k, x_k\rangle] = f([\langle i_1, x_1\rangle], \ldots, [\langle i_{k-1}, x_{k-1}\rangle])$ if and only if $x'_k = f(x'_1, \ldots, x'_{k-1})$ in $D_j$ where $j \ge i_t$ for 1 ≤ t ≤ k and $x'_t = h_{i_t j}(x_t)$.
The structure $D_\infty$ is called the direct limit of the direct system
S = {Di }, and denoted dirlim(S). The “canonical homomorphism”
$h_{i\infty} : D_i \mapsto D_\infty$ is defined to be the map where for x ∈ Di , $h_{i\infty}(x) = [\langle i, x\rangle]$.
Lemma 3.
a. With notation as above, $h_{i\infty}$ is a homomorphism. Also, for i ≤ j,
$h_{j\infty} \circ h_{ij} = h_{i\infty}$.
b. Suppose D′ is a structure, and $h'_i : D_i \mapsto D'$ are homomorphisms
with $h'_j \circ h_{ij} = h'_i$. Then there is a unique homomorphism $h'_\infty : D_\infty \mapsto D'$ such that $h'_\infty \circ h_{i\infty} = h'_i$ for all i ∈ P .
Proof: For part a, by definition, for a predicate P , and x1 , . . . , xk ∈
Di , $P(h_{i\infty}(x_1), \ldots, h_{i\infty}(x_k))$ holds in $D_\infty$ if and only if $P(x_1, \ldots, x_k)$
holds in $D_i$. A similar claim holds for functions. Thus, $h_{i\infty}$ is a homomorphism. If i ≤ j then $h_{i\infty}(x) = [\langle i, x\rangle] = [\langle j, h_{ij}(x)\rangle] = h_{j\infty}(h_{ij}(x))$.
For part b, $h'_\infty([\langle i, x\rangle]) = h'_i(x)$ must hold, and straightforward calculation shows that this yields a well-defined homomorphism. ⊳
If E is another structure having the properties of D∞ as stated in
the lemma, then E is isomorphic to D∞ . Indeed, there are homomorphisms in both directions, and by the uniqueness of the map in part b,
it may be seen that their compositions are the identity. Thus, the properties of the lemma characterize the direct limit. This is an example of a
fact from category theory, the lemma stating the existence of the colimit
of a direct system in the category of structures and homomorphisms; see
[Dowd1].
Lemma 4. If $h_{ij}$ is a monomorphism for all i ≤ j then $h_{i\infty}$ is a
monomorphism for all i.
Proof: If $h_{i\infty}(x) = h_{i\infty}(y)$ then $\langle i, x\rangle \equiv \langle i, y\rangle$, so for some k ≥ i,
$h_{ik}(x) = h_{ik}(y)$. Since $h_{ik}$ is a monomorphism, x = y. ⊳
An important special case occurs when Di ⊆ Dj for i ≤ j, and
hij : Di ↦ Dj is the “inclusion map”, that is, hij (x) = x. In this case
D∞ may be taken as ∪i Di , and hi∞ as the inclusion map. Indeed, to
determine the value of a predicate P (x1 , . . . , xk ) in D∞ , choose j such that xt ∈ Dj
for 1 ≤ t ≤ k, and take the value in Dj ; and similarly for functions. In
particular, in the case of a chain, this produces the union of the chain
as a structure.
Theorem 5. If hij is an elementary embedding for all i ≤ j then
hi∞ is an elementary embedding for all i.
Proof: This is theorem 10.1 of [Sacks1], where it is attributed to
Tarski and Vaught. See also lemma 12.2 of [Jech2]. It follows by induction on the formula F that for all i, if x1 , . . . , xk ∈ Di then F (x1 , . . . , xk )
holds in Di if and only if F (x′1 , . . . x′k ) holds in D∞ , where x′t = hi∞ (xt )
for 1 ≤ t ≤ k. For convenience let ~x denote x1 , . . . , xk , and similarly for
~x′ . If F is atomic the claim follows because hi∞ is a homomorphism.
If F is ¬G then F (~x) is true in Di if and only if G(~x) is false in Di , if
and only if (using the induction hypothesis) G(~x′ ) is false in D∞ , if and
only if F (~x′ ) is true in D∞ . The claim follows similarly for the other
propositional connectives. Suppose F is ∃vG. If F (~x) is true in Di then
G(w, ~x) is true in Di for some w ∈ Di , so by induction G(hi∞ (w), ~x′ )
is true in D∞ , so F (~x′ ) is true in D∞ . If F (~x′ ) is true in D∞ then
G(hj∞ (w), ~x′ ) is true in D∞ for some j ≥ i and w ∈ Dj , so by induction G(w, hij (x1 ), . . . , hij (xk )) is true in Dj , so F (hij (x1 ), . . . , hij (xk ))
is true in Dj . Since hij is elementary, F (x1 , . . . , xk ) is true in Di . ⊳
As for elementary substructures (as noted in section 20), the notion
of a Σn -elementary embedding may be defined, by requiring the defining
property to hold only for formulas which are Σn (and it holds for Πn
formulas also). The cases ∆0 and Σ1 are often of particular interest.
Theorem 5 holds when “elementary” is replaced by “∆0 -elementary”, or “Σ1 -elementary”. The proof requires only minor adjustments.
41. L[U ] and iterated ultrapowers.
Class models of ZFC have been of considerable interest in modern
set theory. An early example of such is L[U ] where U is a κ-complete
nonprincipal ultrafilter on a cardinal κ. Its study helped spur later
developments. Chapter 19 of [Jech2] contains a thorough treatment of
the basic theory. Here, some facts of interest will be stated; the reader
is referred to [Jech2] for proofs and further facts. See also sections 8
and 9 of [KanMag].
Let Ũ = L[U ] ∩ U . As noted in section 39, L[U ] = L[Ũ].
Lemma 1. In L[U ], Ũ is a κ-complete nonprincipal ultrafilter on κ.
If U is normal then Ũ is normal.
Remarks on proof: This is lemma 19.1 of [Jech2]; see also theorem
6.5.1 of [Drake]. ⊳
By results of section 39, L[U ] is a model of ZFC.
Lemma 2. L[U ] is a model of GCH.
Remarks on proof: This is lemma 19.2 of [Jech2]; see also lemma
6.5.2 and theorem 6.5.3 of [Drake]. ⊳
Theorem 5 below shows that L[U ] is “tied” to the measure U ; the
following lemma is an initial such fact. Also, if V has many measurable
cardinals then L[U ] is quite unlike V .
Lemma 3. In L[U ], κ is the only measurable cardinal.
Remarks on proof: This is lemma 19.4 of [Jech2]. ⊳
Further properties of L[U ] may be shown using iterated ultrapowers, which were first considered by Gaifman, and subsequently by
Kunen. Suppose U is a κ-complete nonprincipal ultrafilter on a cardinal
κ. For an ordinal α the α-th iterated ultrapower Ultα may be defined by
recursion, together with elementary embeddings iβα : Ultβ ↦ Ultα for
β < α. The embeddings will satisfy the compatibility condition, that
for γ < β < α, iβα ◦ iγβ = iγα . The cardinal κα may be defined as i0α (κ)
and the κ-complete nonprincipal ultrafilter Uα on κα as i0α (U ).
To start the recursion, Ult0 = V , κ0 = κ, and U0 = U . Let Ult′α+1
be the ultrapower of Ultα by Uα . If this is not well-founded, the iteration
stops. Otherwise, Ultα+1 is the transitive collapse of Ult′α+1 , and iα,α+1
is the composition of the transitive collapse map with the canonical
embedding (the remaining iβ,α+1 are determined by the compatibility
requirement). If α is a limit ordinal, let Ult′α be the direct limit of the
system {Ultβ : β < α}, where the map from Ultβ ′ to Ultβ for β ′ < β
is iβ ′ β . If this is not well-founded, the iteration stops. Otherwise, Ultα
is the transitive collapse of Ult′α , and iβ,α is the composition of the
transitive collapse map with the direct limit map.
Theorem 4. Every iterated ultrapower Ultα exists.
Remarks on proof: This is theorem 19.7 of [Jech2]. The proof makes
use of the “factor lemma”, lemma 19.5, which states that when the β-th
iterate is taken inside the model Ultα , the result is Ultα+β . ⊳
Theorem 5.
a. If U is a normal measure on κ in L[U ] then U is the only such.
b. For every ordinal κ there is at most one U ⊆ Pow(κ) such that U
is a normal measure on κ in L[U ].
c. Suppose κ1 < κ2 are ordinals, and for i = 1, 2, Ui is a normal
measure on κi in L[Ui ]. Then there is an ordinal α such that
L[U2 ] = Ultα in L[U1 ], and U2 = i0α (U1 ).
Remarks on proof: This is theorem 19.14 of [Jech2]. Lemma 19.13
(the representation lemma) gives a characterization of Ultα as an ultrapower defined by an ultrafilter on a certain Boolean algebra. Using this,
certain values of i0α can be computed (lemma 19.15). In turn, Ultλ in
L[U ] can be characterized for certain λ (lemma 19.17), and uniqueness
can then be shown (lemma 19.18). Part c is shown by iterating in L[U1 ]
“at least to U2 ”, and proving a lemma (lemma 19.19) which shows that
the iteration in fact “hits U2 ”. ⊳
42. The sharp operator.
By generalizing the definition of 0#, the notion of x# where x ⊆ ω
may be defined. Thus, # may be considered an operator on subsets of ω.
The sharp operator is occasionally useful in various discussions. Various
facts of section 38 continue to hold, merely expanding the language of
set theory with a unary predicate symbol, and considering structures
Lγ [x] in the definition of an EM-set, etc.
In particular lemma 38.2 holds with this modification, for limit
ordinals γ > ω. This may be seen using the appropriate condensation
lemma, for example lemma 1.7 of [Mitchell1], noting that x ∪ {x} ∈
Lγ [x].
The proofs of the remaining facts (with Lγ replaced by Lγ [x]), in
particular theorem 38.9, require few if any changes.
In fact, x# can be defined for any set of ordinals; see [KanMag]
and [Mitchell1] for some remarks.
Theorem 1. If there is a cardinal κ such that $\kappa \to (\aleph_1)^{<\omega}_2$ then a#
exists for all a ⊆ ω.
Remarks on proof: This is theorem 9.19 of [Kanamori3]. By the
relativized version of theorem 38.7, it suffices to show that there is an
uncountable set of indiscernibles for Lκ [a]. Number the formulas with
free variables indicated, so that in Fn,~v , ~v has length at most n. Partition
[κ]<ω into two parts, where a set corresponding to the sequence ξ1 <
· · · < ξn is in one part if Fn,~v (ξ1 , . . . , ξn ) is true, else in the other part.
A homogeneous subset is a set of indiscernibles. ⊳
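The notion of a homogeneous set used in this argument can be illustrated on a finite toy case. The Python sketch below is not the partition relation of the theorem, which concerns all finite subsets of an infinite cardinal and cannot be executed; it only searches, by brute force, for a largest subset of a small finite set on which a given two-valued colouring of n-element subsets is constant. The function names and the sample colouring are assumptions made for illustration.

```python
from itertools import combinations

def is_homogeneous(H, colour, n):
    """True if colour is constant on the n-element subsets of H."""
    values = {colour(t) for t in combinations(sorted(H), n)}
    return len(values) <= 1

def largest_homogeneous(universe, colour, n):
    """Brute-force search for a largest subset homogeneous for colour on n-sets."""
    points = sorted(universe)
    for size in range(len(points), 0, -1):
        for H in combinations(points, size):
            if is_homogeneous(H, colour, n):
                return set(H)
    return set()

# Hypothetical colouring of pairs from {0, ..., 5}: True if the sum is even.
colour = lambda pair: (pair[0] + pair[1]) % 2 == 0
H = largest_homogeneous(range(6), colour, 2)
```

In the proof above the colouring is given by the truth value of a formula at an increasing sequence of ordinals, so that homogeneity for all n at once is exactly indiscernibility.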
In some topics, the hypothesis “a# exists for all a ⊆ ω” is of interest. By theorem 1, theorem 36.9, and the fact that if $\kappa \to (\kappa)^{<\omega}_2$
then $\kappa \to (\aleph_1)^{<\omega}_2$, this hypothesis follows from the hypothesis that a
measurable cardinal exists.
43. Cardinals larger than measurable.
In this section, let “$j : V \mapsto_\kappa M$” denote that “j : V ↦ M is an
elementary embedding of V in the transitive class M , and κ is the least
ordinal moved”. By results of section 36, if j is nontrivial then κ exists
and is a measurable cardinal. The term “critical point” is often used for
κ.
A cardinal κ is said to be
- measurable if $\exists j, M\,(j : V \mapsto_\kappa M)$;
- strong if $\forall\alpha\,\exists j, M\,(j : V \mapsto_\kappa M \wedge V_\alpha \subseteq M)$;
- superstrong if $\exists j, M\,(j : V \mapsto_\kappa M \wedge V_{j(\kappa)} \subseteq M)$; and
- supercompact if $\forall\alpha\,\exists j, M\,(j : V \mapsto_\kappa M \wedge M^\alpha \subseteq M)$.
The first definition agrees with that given previously by results of section
36. All of these types of cardinals are measurable.
As observed in the proof of theorem 36.7, for a measurable cardinal,
Vκ+1 ⊆ M . Suppose Vκ+2 ⊆ M . Then the ultrafilter determined by j is
in M , so κ is measurable in M , so there is a measurable cardinal below
j(κ) in M , so there is a measurable cardinal below κ in V . The above
list of types of large cardinals elaborates on this theme, by imposing
various requirements on M , to the effect that it “more closely resemble”
V , to obtain cardinals which are “larger than measurable”.
A cardinal κ is said to be Woodin if for all A ⊆ Vκ there is a cardinal
λ < κ such that $\forall\alpha < \kappa\,\exists j, M\,(j : V \mapsto_\lambda M \wedge V_\alpha \subseteq M \wedge \alpha < j(\lambda) \wedge A \cap V_\alpha = j(A) \cap V_\alpha)$. As will be seen below, though Woodin cardinals
need not be measurable, they lie between strong and superstrong in
“consistency strength”.
The above definitions are not formalizable in the first order language of set theory, since they involve quantifying over the proper class
j. As has been seen in section 36, for measurable cardinals a first order
definition can be given (and it can then be shown that there is a definable embedding). However, using just a single ultrafilter U results in
an ultrapower M = UltU where U ∉ M (lemma 17.9 of [Jech2]), and so
Vκ+2 ⊈ M .
In the 1980’s it was discovered that systems of ultrafilters can be
used to characterize cardinals larger than measurable. The ultrafilters
in a system are indexed by elements of $[\lambda]^{<\omega}$ where λ is some ordinal
with κ < λ. The ultrafilter $E_a$ for $a \in [\lambda]^{<\omega}$ is an ultrafilter on $[\zeta]^{|a|}$ for
some ordinal ζ ≥ κ.
It is convenient to adopt the convention that, when a finite set of
ordinals is written as $\{\alpha_1, \ldots, \alpha_n\}$, $\alpha_1 < \cdots < \alpha_n$ is understood. If
$a = \{\alpha_1, \ldots, \alpha_n\}$, $b = \{\beta_1, \ldots, \beta_m\}$, and a ⊆ b, a map $t : \{1, \ldots, n\} \mapsto \{1, \ldots, m\}$ is induced, where $\alpha_i = \beta_{t(i)}$. This in turn induces a map
$\pi : [\zeta]^{|b|} \mapsto [\zeta]^{|a|}$, where $\{\xi_1, \ldots, \xi_m\}$ maps to $\{\xi_{t(1)}, \ldots, \xi_{t(n)}\}$. When
necessary, π may be written as $\pi_{ba}$. It is easily checked that for a ⊆ b ⊆
c, $\pi_{ca} = \pi_{ba} \circ \pi_{cb}$.
The following convenient notation will be introduced. Suppose a ⊆
b. Given $X \subseteq [\zeta]^{|a|}$ let $X_{ab} = \pi_{ba}^{-1}[X]$ (X is transformed to a subset
of $[\zeta]^{|b|}$). Given a function f with domain $[\zeta]^{|a|}$ let $f_{ab}$ be $f \circ \pi_{ba}$ (f is
transformed to a function with domain $[\zeta]^{|b|}$). For the following lemma,
for a finite set of ordinals s let $s_i$ denote the ith element in increasing
order (so that $s = \{s_1, \ldots, s_n\}$); and for ξ ∈ s let i(ξ, s) be that i such
that $\xi = s_i$.
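The projection maps and the two lifting operations can be tried out on small finite ordinals. The following Python sketch is an illustration only: finite sets of ordinals are represented as increasing tuples of integers, ζ is a small integer, and the names `pi`, `lift_set` and `lift_function` are hypothetical. Positions are counted from 0 rather than from 1.

```python
from itertools import combinations

def pi(b, a):
    """Return the map pi_ba sending an increasing |b|-tuple to a |a|-tuple.

    a and b are increasing tuples of ordinals (here integers) with a a subset of b.
    """
    positions = [b.index(alpha) for alpha in a]  # the induced map t, 0-based
    return lambda s: tuple(s[p] for p in positions)

def lift_set(X, a, b, zeta):
    """X_ab: the preimage of X under pi_ba, a subset of [zeta]^{|b|}."""
    proj = pi(b, a)
    return {s for s in combinations(range(zeta), len(b)) if proj(s) in X}

def lift_function(f, a, b):
    """f_ab = f o pi_ba, a function on [zeta]^{|b|}."""
    proj = pi(b, a)
    return lambda s: f(proj(s))

a, b, c = (3, 7), (3, 5, 7), (1, 3, 5, 7)
s = (10, 20, 30, 40)  # an element of [zeta]^{|c|} for any zeta > 40
```

For instance, `pi(c, a)(s)` and `pi(b, a)(pi(c, b)(s))` both pick out the entries of s in the positions that 3 and 7 occupy in c, which is the composition identity stated above.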
Lemma 1. Suppose j : V ↦ M is an elementary embedding with
critical point κ; λ > κ is an ordinal; and ζ is the least ordinal with ζ ≥ κ
and λ ≤ j(ζ). For $a \in [\lambda]^{<\omega}$ let $E_a = \{X \subseteq [\zeta]^{|a|} : a \in j(X)\}$. The
following hold.
a. ζ ≥ κ and $[\zeta]^{|a|} \in E_a$.
b. $E_a$ is a κ-complete ultrafilter on $[\zeta]^{|a|}$.
c. For some a, $E_a$ is not $\kappa^+$-complete.
d. If β < ζ then there is an a such that $E_a$ is not $\kappa^+$-complete, and
$\{s \in [\zeta]^{|a|} : \beta \in s\} \in E_a$.
e. For a ⊆ b and $X \subseteq [\zeta]^{|a|}$, $X \in E_a$ if and only if $X_{ab} \in E_b$.
f. Suppose $f : [\zeta]^{|a|} \mapsto V$, and $\{s \in [\zeta]^{|a|} : f(s) < \max(s)\} \in E_a$.
Then for some b with a ⊆ b, $\{s \in [\zeta]^{|b|} : f_{ab}(s) \in s\} \in E_b$.
g. Suppose for each i ∈ ω, $a_i \in [\lambda]^{<\omega}$ and $X_i \in E_{a_i}$. Then there is a
function $d : \bigcup_i a_i \mapsto \zeta$ such that for all i ∈ ω, $d[a_i] \in X_i$.
Remarks on proof: This is stated following (20.39) of [Jech2]; it
is also exercise 26.3.a of [Kanamori3]. Since λ ≤ j(ζ), $a \in [\lambda]^{|a|} \subseteq [j(\zeta)]^{|a|} = j([\zeta]^{|a|})$, proving part a. For part b, the proof of theorem
36.4 may be adapted, using part a. For part c, for α < κ let Xα =
{{ξ} : ξ ≥ α}. Since j(α) = α, j(Xα ) = {{ξ} : ξ ≥ α}, whence
{κ} ∈ j(Xα ), so Xα ∈ Eκ . Since {κ} ∈ [j(κ)]1 = j([κ]1 ), [κ]1 ∈ Eκ .
Letting Yα = Xα ∩ [κ]1 , Yα ∈ Eκ . But ∩α<κ Yα = ∅, so Eκ is not κ+ -complete. For part d, A = B ∪ C where A = {{ξ1 , ξ2 } : β ∈ {ξ1 , ξ2 }},
B = {{ξ1 , β}}, and C = {{β, ξ2 }}. First suppose β ≥ κ. Since β < ζ,
j(β) < λ; let a = {κ, j(β)}. Then a ∈ j(B), whence B ∈ Ea , so
A ∈ Ea . For α < κ let Xα = {{ξ, β} : α ≤ ξ < κ}; then a ∈ j(Xα );
this family shows that Ea is not κ+ -complete. Now suppose β < κ,
whence j(β) = β; let a = {β, κ}. Then a ∈ j(C), whence C ∈ Ea , so
A ∈ Ea . For α < κ let Xα = {{β, ξ} : α ≤ ξ < κ}; this family shows
that Ea is not κ+ -complete. For part e, since b ∈ j([ζ]|b| ), it suffices to
show that a ∈ j(X) if and only if $b \in \pi_{ba}^{-1}[j(X)]$. This is clear, since
πba (b) = a. For part f, let X denote {s ∈ [ζ]|a| : f (s) < max(s)}.
Then ∀s ∈ X∃β < max(s)(f (s) = β), whence in M the formula is
true with X replaced by j(X) and f replaced by j(f ). By hypothesis
a ∈ j(X), whence there is a β such that β < max(a) and j(f )(a) = β.
Let P (s, a, β) be the predicate “β < max(a) and s ∈ [µa ]a∪{β} and
fab (s) = si(β,b) ”. Let Y denote {s ∈ [ζ]|b| : fab (s) = si(β,b) }. Then
∀s∀a∀β(P ⇒ s ∈ Y ), so replacing f by j(f ) and Y by j(Y ) it is true in
M . Choosing β as above, Y ∈ Eb ; part f follows. For part g, for each
i, ∃s(s : n 7→ λ ∧ s[n] ∈ j(Xi )), whence ∃s(s : n 7→ j(ζ) ∧ s[n] ∈ j(Xi )),
whence ∃s(s : n 7→ ζ ∧ s[n] ∈ Xi ). Choosing si as the value for Xi , d
can be constructed from the si . ⊳
Given a cardinal κ and an ordinal λ > κ, a system of ultrafilters
satisfying the properties of lemma 1 is called a (κ, λ)-extender. It should
be noted that [Jech2] only considers the case ζ = κ and λ ≤ j(κ).
Extenders where λ > j(κ) and ζ > κ are called “long”; it will shortly
be seen that they are useful. Also, [Kanamori3] considers j : N 7→ M
with N not necessarily V ; but this will not be needed here.
Given an extender, an ultraproduct may be taken. Let $U_0 = \{\langle a, f\rangle : a \in [\lambda]^{<\omega}, f : [\zeta]^{|a|} \mapsto V\}$. Let $\equiv_0$ be the binary relation on $U_0$,
where $\langle a, f\rangle \equiv_0 \langle b, g\rangle$ if and only if $\{s \in [\zeta]^{|c|} : f_{ac}(s) = g_{bc}(s)\} \in E_c$,
where c = a ∪ b. Let $\in_0$ be the binary relation, where $\langle a, f\rangle \in_0 \langle b, g\rangle$ if
and only if $\{s \in [\zeta]^{|c|} : f_{ac}(s) \in g_{bc}(s)\} \in E_c$, where c = a ∪ b.
Lemma 2. $\equiv_0$ is a congruence relation on $U_0$, equipped with $\in_0$.
Remarks on proof: This construction is mentioned preceding lemma
20.29 of [Jech2]. First, for a ⊆ b, if f (s) = g(s) for s ∈ X where
$X \in E_a$ then $f_{ab}(s) = f(\pi_{ba}(s)) = g(\pi_{ba}(s)) = g_{ab}(s)$ for $s \in X_{ab}$,
where $X_{ab} \in E_b$. That $\equiv_0$ is reflexive follows from $[\zeta]^{|a|} \in E_a$. That
$\equiv_0$ is symmetric is immediate. Given a, b, c let $c_1 = a \cup b$, $c_2 = b \cup c$, and $c_3 = a \cup b \cup c$. Suppose $f_{ac_1}(s) = g_{bc_1}(s)$ for $s \in X_{c_1}$ where
$X_{c_1} \in E_{c_1}$, and $g_{bc_2}(s) = h_{cc_2}(s)$ for $s \in X_{c_2}$ where $X_{c_2} \in E_{c_2}$. Then
$f_{ac_3}(s) = g_{bc_3}(s)$ for $s \in X_{c_1c_3}$ and $g_{bc_3}(s) = h_{cc_3}(s)$ for $s \in X_{c_2c_3}$,
whence $f_{ac_3}(s) = h_{cc_3}(s)$ for $s \in X_{c_1c_3} \cap X_{c_2c_3}$. This proves that $\equiv_0$ is
transitive. Given a, b, a′, b′ let $c_1 = a \cup b$, $c_2 = a \cup a'$, $c_3 = b \cup b'$, and
$c_4 = a \cup b \cup a' \cup b'$. Suppose $f_{ac_1}(s) \in g_{bc_1}(s)$ for $s \in X_{c_1}$ where $X_{c_1} \in E_{c_1}$,
$f_{ac_2}(s) = f'_{a'c_2}(s)$ for $s \in X_{c_2}$ where $X_{c_2} \in E_{c_2}$, and $g_{bc_3}(s) = g'_{b'c_3}(s)$
for $s \in X_{c_3}$ where $X_{c_3} \in E_{c_3}$; arguing as above, $f'_{a'c_4}(s) \in g'_{b'c_4}(s)$ for
s ∈ Y where $Y \in E_{c_4}$, which proves that $\equiv_0$ respects $\in_0$. ⊳
Letting E denote the extender, let $\mathrm{Ult}^0_E$ be the quotient of $U_0$ by
$\equiv_0$. This is a structure for the language of set theory. To simplify the
notation, write [a, f ] for $[\langle a, f\rangle]$.
Lemma 3 (Łoś theorem). Suppose φ is a $\Delta_0$ formula. Letting N
denote $\mathrm{Ult}^0_E$, suppose $[a_i, f_i]$ is an element of N for 1 ≤ i ≤ k. Let
$c = \bigcup_i a_i$. Then
a. $\models_N \varphi([a_1, f_1], \ldots, [a_k, f_k])$
if and only if
b. $X_\varphi \in E_c$ where $X_\varphi = \{s \in [\zeta]^{|c|} : \models_V \varphi((f_1)_{a_1c}(s), \ldots, (f_k)_{a_kc}(s))\}$.
Proof: The proof is by induction on the formation of φ. The claim
holds for atomic formulas by definition. The claim for φ = ¬ψ follows
from the claim for ψ, since Xφ = [ζ]|c| − Xψ . The claim for φ = ψ ∧ θ follows
from the claim for ψ and θ, since Xφ = Xψ ∩ Xθ . It might be necessary
to add variables to the list for ψ or θ; that this is permissible follows by
property (e) of a (κ, λ)-extender.
Suppose $\varphi_{y,\vec x}$ is $\exists z \in y\, \psi_{z,\vec x}$. If $\models_N \varphi([b, g], [a_1, f_1], \ldots, [a_k, f_k])$ then
for some c, h, $\models_N [c, h] \in [b, g] \wedge \psi([c, h], [a_1, f_1], \ldots, [a_k, f_k])$. Using
the induction hypothesis, and letting $c_1 = \bigcup_i a_i \cup b$ and $c_2 = c_1 \cup c$,
$\{s : \models_V h_{cc_2}(s) \in g_{bc_2}(s) \wedge \psi(h_{cc_2}(s), (f_1)_{a_1c_2}(s), \ldots, (f_k)_{a_kc_2}(s))\} \in E_{c_2}$,
whence $\{s : \models_V \varphi(g_{bc_1}(s), (f_1)_{a_1c_1}(s), \ldots, (f_k)_{a_kc_1}(s))\} \in E_{c_1}$.
Suppose $\{s : \models_V \varphi(g_{bc_1}(s), (f_1)_{a_1c_1}(s), \ldots, (f_k)_{a_kc_1}(s))\} \in E_{c_1}$; and let
$h : [\zeta]^{|c_1|} \mapsto \bigcup \mathrm{Ran}(g)$ be a function where h(s) is some $y \in \bigcup \mathrm{Ran}(g)$
such that $\models_V y \in g_{bc_1}(s) \wedge \psi(y, (f_1)_{a_1c_1}(s), \ldots, (f_k)_{a_kc_1}(s))$ if such a y exists,
else ∅. Then $\{s : \models_V h(s) \in g_{bc_1}(s) \wedge \psi(h(s), (f_1)_{a_1c_1}(s), \ldots, (f_k)_{a_kc_1}(s))\} \in E_{c_1}$.
Using the induction hypothesis, $\models_N [c_1, h] \in [b, g] \wedge \psi([c_1, h], [a_1, f_1], \ldots, [a_k, f_k])$; and a follows. ⊳
Lemma 4. $\mathrm{Ult}^0_E$ has small extensions and is well-founded.
Proof: Given [a, f ], to consider its elements it suffices to consider
[b, g] with $b \in [\lambda]^{<\omega}$ and g where either g(s) ∈ f (s) or g(s) = ∅; this
shows that $\mathrm{Ult}^0_E$ has small extensions. Suppose $\mathrm{Ult}^0_E$ is not well-founded, and let $[a_i, f_i]$ be such that $[a_{i+1}, f_{i+1}] \in [a_i, f_i]$ for i ∈ ω.
Let $c_i = a_i \cup a_{i+1}$ and $X_i = \{s \in [\zeta]^{|c_i|} : (f_{i+1})_{a_{i+1}c_i}(s) \in (f_i)_{a_ic_i}(s)\}$.
Then $X_i \in E_{c_i}$, so using property (g) of an extender, let $d : \bigcup_i a_i \mapsto \zeta$ be such that $d[c_i] \in X_i$. Then $(f_{i+1})_{a_{i+1}c_i}(d[c_i]) \in (f_i)_{a_ic_i}(d[c_i])$,
$(f_{i+2})_{a_{i+2}c_{i+1}}(d[c_{i+1}]) \in (f_{i+1})_{a_{i+1}c_{i+1}}(d[c_{i+1}])$, and $\pi_{c_ia_{i+1}}(d[c_i]) = \pi_{c_{i+1}a_{i+1}}(d[c_{i+1}])$, yielding an infinite descending chain of sets. ⊳
Since $E_a$ is an ultrafilter, there is an ultrapower $V^{[\zeta]^{|a|}} / \equiv_{E_a}$. Let $\mathrm{Ult}_{E_a 0}$ denote this, and $j^{E_a 0} : V \mapsto \mathrm{Ult}_{E_a 0}$ the canonical embedding.
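Lemma 5. The map $f \mapsto f_{ab}$ induces an elementary embedding $j^0_{ab} : \mathrm{Ult}_{E_a 0} \mapsto \mathrm{Ult}_{E_b 0}$; these maps together with the objects $\mathrm{Ult}_{E_a 0}$ form a direct system. $\mathrm{Ult}_{E0}$ is the direct limit; the map $j_{a0} : \mathrm{Ult}_{E_a 0} \mapsto \mathrm{Ult}_{E0}$ is that induced by $f \mapsto \langle a, f \rangle$.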
Remarks on proof: This is a routine verification; an outline will be given. If $f \equiv_{E_a} g$ then $f_{ab} \equiv_{E_b} g_{ab}$, whence $j^0_{ab}$ is well-defined. Using the Łoś theorem twice, it follows that $\models_{\mathrm{Ult}_{E_a 0}} \varphi([a, f_1], \ldots, [a, f_k])$ if and only if $\models_{\mathrm{Ult}_{E_b 0}} \varphi([b, f_{1ab}], \ldots, [b, f_{kab}])$; that is, $j^0_{ab}$ is elementary. Since $j^0_{ac} = j^0_{bc} \circ j^0_{ab}$, the $j^0_{ab}$ are the maps of a direct system. If $f \equiv_{E_a} g$ then $\langle a, f \rangle \equiv_E \langle a, g \rangle$, whence $j_{a0}$ is well-defined. That it is elementary again follows using the Łoś theorem twice. For $a \subseteq b$, $[a, f] = [b, f_{ab}]$, and $j_{a0} = j_{b0} \circ j^0_{ab}$. Suppose $U'$ is another structure with maps $j'_a : \mathrm{Ult}_{E_a 0} \mapsto U'$ satisfying $j'_a = j'_b \circ j^0_{ab}$. If there is a map $j_\infty : \mathrm{Ult}_{E0} \mapsto U'$ such that $j_\infty \circ j_{a0} = j'_a$ for all $a$, then it must be the case that $j_\infty([a, f]) = j'_a([f])$.
By remarks following lemma 40.3, to complete the proof that $\mathrm{Ult}_{E0}$ is the direct limit, it suffices to show that this prescription yields a well-defined elementary embedding. But if $[a, f] = [b, g]$ then, letting $c = a \cup b$, $[c, f_{ac}] = [c, g_{bc}]$, whence $[f_{ac}] = [g_{bc}]$; and so $j'_a([f]) = j'_c([f_{ac}]) = j'_c([g_{bc}]) = j'_b([g])$. Finally, let $c = \bigcup_i a_i$. Since $[a_1, f_1] = [c, f_{1a_1c}] = j_{c0}([f_{1a_1c}])$ and $j'_c([f_{1a_1c}]) = j'_{a_1}([f_1]) = j_\infty([a_1, f_1])$, $\models_{\mathrm{Ult}_{E0}} \varphi([a_1, f_1], \ldots)$ if and only if $\models_{\mathrm{Ult}_{E0}} \varphi(j_{c0}([f_{1a_1c}]), \ldots)$ if and only if $\models_{\mathrm{Ult}_{E_c 0}} \varphi([f_{1a_1c}], \ldots)$ if and only if $\models_{U'} \varphi(j'_c([f_{1a_1c}]), \ldots)$ if and only if $\models_{U'} \varphi(j_\infty([a_1, f_1]), \ldots)$. ⊳
The preceding lemma is not needed, but is included because many
authors define UltE0 as the direct limit. It follows that for each a the
ultrapower UltEa 0 is well-founded. It also follows that, letting UltEa and
UltE denote the transitive collapses, and jab and ja denote the maps
composed with the appropriate transitive collapse isomorphisms or their
inverse, UltE is the direct limit of the UltEa .
For x ∈ V let Cx denote the function {⟨∅, x⟩}. Let j E0 : V ↦
UltE0 be the map given by j E0 (x) = [∅, Cx ]. Let U : [ζ]1 ↦ ζ be the
function {⟨{ξ}, ξ⟩ : ξ < ζ}. For n ∈ ω let In denote the identity function
on [ζ]n .
Lemma 6.
a. j E0 is an elementary embedding.
b. For α < κ, the elements of [∅, Cα ] are the elements [∅, Cβ ] for β < α
(whence [∅, Cα ] is the ordinal α in UltE0 ).
c. For α < λ, the elements of [{α}, U ] are the elements [{β}, U ] for
β < α. (whence [{α}, U ] is the ordinal α in UltE0 ).
d. For a ∈ [λ]<ω , the elements of [a, I|a| ] are the elements [{α}, U ] for
α ∈ a. (whence [a, I|a| ] is the set a in UltE0 ).
e. For all [a, f ], [∅, f ]([a, I|a| ]) = [a, f ].
f. For all a ∈ [λ]<ω , X ∈ Ea if and only if a ∈ j E0 (X).
g. The critical point of j E0 equals κ.
h. ζ is the least ordinal with ζ ≥ κ and λ ≤ j E0 (ζ).
Remarks on proof: These claims are proved in [Kanamori3], many
of them in the proof of lemma 26.2. For part a, in the notation of
lemma 3, Xφ (Cx1 , . . . , Cxk ) equals {⟨∅, . . . , ∅⟩} if |=V φ(x1 , . . . ,
xk ), else ∅. That j E0 is elementary follows using lemma 3, and since
equality is present j E0 is injective. For part b, it is straightforward
to verify that [∅, Cβ ] is an element of [∅, Cα ]. Suppose [a, f ] ∈ [∅, Cα ];
then {s : f (s) ∈ α} ∈ Ea . Since Ea is a κ-complete ultrafilter, there
is some β < α such that {s : f (s) = β} ∈ Ea , and [a, f ] = [∅, Cβ ].
The last claim follows inductively. For part c, it is straightforward to
verify that [{β}, U ] is an element of [{α}, U ]. Suppose [a, f ] ∈ [{α}, U ].
Let c1 = a ∪ {α} and X = {s : fac1 (s) ∈ U{α}c1 (s) }; then X ∈ Ec1 .
But U{α}c1 (s) is some member of s, so for s ∈ X, fac1 (s) < max(s).
Using property (f) of a (κ, λ)-extender, for some c2 ⊇ c1 , {s ∈ [ζ]|c2 | :
fac2 (s) ∈ s} ∈ Ec2 , whence for some i, {s ∈ [ζ]|c2 | : fac2 (s) = si } ∈ Ec2 ,
for some β < max(s). Since x ∈ y ⇒ ¬(y ∈ x ∨ y = x) holds in V ,
it holds in UltE0 . It follows that β < α must hold. For part d, it is
straightforward to verify that [{α}, U ] is an element of [a, I|a| ]. Suppose
[b, f ] ∈ [a, I|a| ]. Let c1 = a ∪ b; then {s : fac1 (s) ∈ I|a|ac1 } ∈ Ec1 . It
follows that for some i of a position of a in c1 , {s : fac1 (s) = si } ∈
Ec1 . It follows from this that for some α ∈ a, [b, f ] = [{α}, U ]. For
part e, {s ∈ [ζ]|a| : f (s) = (Cf ∅a (s))(I|a| (s))} equals [ζ]|a| . For part f,
X ∈ Ea if and only if {s ∈ [ζ]|a| : I|a| (s) ∈ CX∅a (s)} ∈ Ea if and only
if [a, I|a| ] ∈ [∅, CX ]. For part g, suppose κ̄ is the critical point of j E0 . If
ν < κ̄ and Xα ∈ Ea for α < ν then by part f, a ∈ j E0 (Xα ) for α < ν,
whence a ∈ ∩α j E0 (Xα ). Since ν < κ̄ it follows as usual that
a ∈ j E0 (∩α Xα ). This shows that Ea is κ̄-complete. By property (c)
of an extender, κ < κ̄ cannot hold, whence by part b κ̄ = κ. For part
h, ζ ≥ κ since otherwise Ea would be principal and hence κ+ -complete
for all a. If α < λ then {{ξ} : ξ ∈ ζ} ∈ E{α} , so [{α}, U ] ∈ [∅, Cζ ]; it
follows that λ ≤ j E0 (ζ). Suppose κ ≤ ζ ′ < ζ. By property (d) of an
extender, [∅, ζ ′ ] ∈ [a, I|a| ] for some a. It follows that j E (ζ ′ ) ∈ a ⊆ λ. ⊳
Let π denote the transitive collapse map for UltE0 , and let j E =
π ◦ j E0 (so that j E : V ↦ UltE ). By part b, j E (α) = α for α < κ. By
part c, π([{α}, U ]) = α for α < λ. By part d, π([a, I|a| ]) = a. By part
f, Ea = {X ⊆ [ζ]|a| : a ∈ j E (X)}. By part e, UltE = {j E (f )(a) : f :
[ζ]|a| ↦ V, a ∈ [λ]<ω }. j E0 may be replaced by j E in parts g and h.
Using lemma 5 it is easy to verify that j E0 = ja0 ◦ j Ea 0 . Composing
with the appropriate transitive collapses and their inverses, j E = ja ◦
j Ea . See remarks preceding lemma 26.1 of [Kanamori3].
Lemma 7. Suppose j : V ↦κ M , λ > κ, and E is the (κ, λ)-extender derived from j.
a. The map ⟨a, f ⟩ ↦ j(f )(a) induces an elementary embedding k :
UltE 7→ M .
b. k ◦ j E = j.
c. If |Vγ | ≤ λ then Vγ) ⊆ Ran(k), whence k ↾ Vj E (β) is the identity.
d. k ↾ λ is the identity.
Remarks on proof: These claims are proved in the proof of 26.1
of [Kanamori3]. For part a, given a formula φ and ⟨ai , fi ⟩ for 1 ≤ i ≤
k, let X = {s ∈ [ζ]|c| : φ(f1a1 c (s), · · ·)} where c = ∪i ai . Then s ∈
X ⇔ φ(f1a1 c (s), · · ·), and it follows that s ∈ j(X) ⇔ φ(j(f1 )a1 c (s), · · ·).
Letting φ be equality, c ∈ j(X) if and only if j(f )(a) = j(g)(b), showing
k is well-defined. Letting φ be arbitrary, it follows that k is elementary.
Part b follows because j(Cx )(∅) = j(x).
For part c, the claim is clear for β < ω, so suppose β ≥ ω. It is easy
to see that there is a bijection gβ : |Vβ | ↦ Vβ , such that for ω ≤ γ ≤ β,
gβ [|Vγ |] = Vγ . For example to extend gβ to Vβ+1 , use a bijection from
|Vβ+1 | to Vβ+1 − Vβ . Let f be the function where f ({ξ}) = gβ (ξ), for
ξ < min(ζ, |Vβ |). Then if [|Vγ |]1 ⊆ Dom(f ) then f [[|Vγ |]1 ] = Vγ . Hence
this is true in UltE for j(f ). If |Vγ | ≤ λ then [|Vγ |]1 ⊆ Dom(f ). The
first claim follows by part a, and the second claim follows because k is
an elementary embedding.
For part d, if α < λ then α is represented by [{α}, U ], whence
k(α) = j(U )({α}) = α. ⊳
Theorem 8. The following are equivalent, for a cardinal κ and
ordinal α.
a. ∃j, M (j : V ↦κ M ∧ Vκ+α ⊆ M ).
b. For some λ there is a (κ, λ)-extender E such that Vκ+α ⊆ UltE .
Remarks on proof: For related results see exercise 26.7 of [Kanamori3]. Suppose a holds. Let λ = |Vκ+α | and let E be the (κ, λ)-extender
derived from j. By lemma 7, Vκ+α ⊆ UltE . Suppose (b) holds. Letting
j = jE and M = UltE , it is immediate that (a) holds. ⊳
Thus, κ is strong if and only if (b) holds for all α. This latter fact is
expressible in the language of set theory. A similar argument shows that
κ is superstrong if and only if, for some λ there is a (κ, λ)-extender E
such that Vj(κ) ⊆ UltE , whence this also is expressible in the language
of set theory.
Given a cardinal λ, an ordinal α, and a set A, say that λ is α-strong
for A if ∃j, M (j : V ↦λ M ∧ Vα ⊆ M ∧ α < j(λ) ∧ A ∩ Vα = j(A) ∩ Vα ).
Theorem 9. For a cardinal κ, consider the following properties.
a. ∀A ⊆ Vκ {λ < κ : ∀α < κ(λ is α-strong for A)} is nonempty.
b. ∀A ⊆ Vκ {λ < κ : ∀α < κ(λ is α-strong for A)} is stationary.
c. For any function f : κ ↦ κ there is a cardinal λ < κ with f [λ] ⊆ λ,
and a j, M such that j : V ↦λ M , and Vj(f )(λ) ⊆ M .
d. For any function f : κ ↦ κ there is a cardinal λ < κ with f [λ] ⊆
λ, and an extender E ∈ Vκ , such that j E has critical point λ,
Vj E (f )(λ) ⊆ M e , and j E (f )(λ) = f (λ).
If property c holds then κ is Mahlo and there is a stationary set of
measurable cardinals below κ. All 4 properties are equivalent.
Remarks on proof: See exercise 26.10 and lemma 26.14 of [Kanamori3], or lemma 34.2 of [Jech2]. ⊳
Property (a) is just the definition of a Woodin cardinal given earlier. Property (d) is expressible in the language of set theory. In fact,
the property of being a Woodin cardinal is “Π11 -describable”; whence
the smallest Woodin cardinal is not Π11 -indescribable, and hence not
measurable.
An uncountable regular cardinal κ is said to be strongly compact if,
for any set S, every κ-complete filter on S can be extended to a κ-complete ultrafilter on S.
If A is a set with |A| ≥ κ, and x ∈ [A]<κ , let x̂ = {y ∈ [A]<κ :
x ⊆ y}. Let F be the filter on [A]<κ generated by {x̂ : x ∈ [A]<κ }.
Lemma 10. For an uncountable regular cardinal κ, F is κ-complete.
Proof: If x ⊆ y then ŷ ⊆ x̂. Given x̂ξ for ξ < µ where µ < κ, let
u = ∪ξ xξ ; then u ∈ [A]<κ since κ is regular, and û ⊆ ∩ξ x̂ξ . ⊳
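The inclusions used in this proof can be checked mechanically in a finite stand-in. The following Python sketch is only an illustration under stated assumptions: it replaces [A]<κ by the set of all subsets of a four-element set, and the names subsets and hat are hypothetical helpers, not notation from the text. It checks that the cone of the union u lies inside the intersection of the cones of the given sets.

```python
from itertools import combinations

def subsets(A):
    """All subsets of the finite set A, as frozensets."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def hat(x, universe):
    """The cone of x: all y in the universe with x a subset of y."""
    return frozenset(y for y in universe if x <= y)

A = {0, 1, 2, 3}
P = subsets(A)  # finite stand-in for [A]^{<kappa} (an assumption of this sketch)

xs = [frozenset({0}), frozenset({1}), frozenset({0, 2})]
u = frozenset().union(*xs)                       # u = union of the x's
cone_u = hat(u, P)                               # cone of u
meet = frozenset.intersection(*(hat(x, P) for x in xs))  # intersection of cones

# The inclusion used in the lemma: cone of u is inside the intersection.
inclusion_holds = cone_u <= meet
```

In this finite setting the inclusion happens to be an equality; the lemma only needs the inclusion.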
A fine measure on [A]<κ is defined to be a κ-complete ultrafilter
which extends F .
Let Lκ,ω denote a language where conjunctions and disjunctions of
µ subformulas are allowed, for µ < κ. Also, there are κ variables. A
formal definition is straightforward and will be omitted.
Theorem 11. For an uncountable regular cardinal κ, the following
are equivalent.
a. κ is strongly compact.
b. For any set A with |A| ≥ κ, there exists a fine measure on [A]<κ .
c. Suppose S is a set of sentences in the language Lκ,ω , and every
subset T ⊆ S with |T | < κ has a model; then S has a model.
Remarks on proof: This is lemma 20.2 of [Jech2]. ⊳
Say that a fine measure U on [A]<κ is normal if whenever f : [A]<κ ↦ A
is such that f (x) ∈ x for x ∈ S where S ∈ U , then f is
constant on some S ′ ⊆ S with S ′ ∈ U .
Theorem 12. For an uncountable regular cardinal κ, the following
are equivalent.
a. κ is supercompact.
b. For any set A with |A| ≥ κ, there exists a normal fine measure on
[A]<κ .
Remarks on proof: This follows by lemma 20.14 of [Jech2]. ⊳
Property (b) is expressible in the language of set theory. There is
an embedding characterization of strongly compact cardinals, namely, a
cardinal κ is strongly compact if and only if ∀α∃j, M (j : V ↦κ M ∧ ∀X ⊆
M (|X| ≤ α ⇒ ∃Y (Y ∈ M ∧ X ⊆ Y ∧ |=M |Y | < j(κ)))). A proof may
be found in theorem 22.17 of [Kanamori3].
Given types of large cardinals T1 and T2 , some relationships which
might hold include the following.
- T2 (κ) ⇒ T1 (κ)
- ∃κT2 (κ) ⇒ ∃κT1 (κ)
- If ∃κT2 (κ) then in an inner model, ∃κT1 (κ) holds
- Con(ZF C + ∃κT2 (κ)) ⇒ Con(ZF C + ∃κT1 (κ))
(That is, these statements are provable in ZFC). Write these as T2 ⇒ T1 ,
T2 ⇒e T1 , T2 ⇒i T1 , and T2 ⇒c T1 respectively. All four implications
are transitive. It is easily seen that if T2 ⇒ T1 then T2 ⇒e T1 ; and
if T2 ⇒e T1 then T2 ⇒i T1 . It is also true that if T2 ⇒i T1 then
T2 ⇒c T1 . Indeed, for sentences φ, ψ, to show that Con(ZF C + φ) ⇒
Con(ZF C + ψ), it suffices to show (in ZFC) that φ implies that ψ has
an inner model (see remarks preceding theorem II.4.1 of [Devlin]).
Theorem 13.
a. supercompact ⇒ superstrong
b. superstrong ⇒ Woodin
c. Woodin ⇒i strong
d. supercompact ⇒ strong
e. Woodin ⇒e measurable
f. supercompact ⇒ strongly compact
g. strongly compact ⇒i Woodin
With the exception of Woodin cardinals, all the above types of cardinals
are measurable.
Remarks on proof: Say that a cardinal κ is α-supercompact if
∃j, M (j : V ↦κ M ∧ M α ⊆ M ). If x ∈ M and j witnesses that κ is 2|x| -supercompact
then Pow(x) ∈ M , because Pow(x) can be constructed from a well-ordering of x and a string of 0’s and 1’s of length |x| · 2|x| . Parts a
and d follow. For part b, see proposition 26.12 of [Kanamori3] or
exercise 34.3 of [Jech2]. Part c follows because each λ in the
definition of a Woodin cardinal is strong in Vκ . Part e follows because
each λ in the definition of a Woodin cardinal is measurable. Part f follows by theorems 11 and 12. Part g follows using sophisticated results
in inner model theory; see [SchSt]. It has already been observed that
supercompact, superstrong, and strong cardinals are measurable. That
strongly compact cardinals are measurable follows by theorem 11.b. ⊳
It is not true that a superstrong cardinal is strong; see exercise 26.9
of [Kanamori3]. It is considered a major open question of set theory,
whether strongly compact ⇒c supercompact.
An “order of measurability”, called the Mitchell order, can be defined for a cardinal κ. For normal ultrafilters U1 , U2 on κ, say that
U1 < U2 if U1 ∈ UltU2 .
Theorem 14. < is transitive and well-founded.
Remarks on proof: Suppose U2 < U1 < U0 . U1 is represented by
a function Ũ1 , such that Ũ1 (α) is an ultrafilter in α, on a set I1 ∈ U0 .
W.l.g., I1 can be assumed to be a set of cardinals, since the cardinals are
a club subset of κ and U0 is normal. Since U0 is normal κ is represented
by the diagonal function. A subset x ⊆ κ is represented by the function
whose value at α is x∩α. Letting U1p (x) denote {λ ∈ I1 : x∩λ ∈ Ũ1 (λ)},
x ∈ U1 if and only if U1p (x) ∈ U0 . Similarly y ∈ U2 if and only if
U2p (y) ∈ U1 . So y ∈ U2 if and only if U1p (U2p (y)) ∈ U0 . For t ⊆ λ let
U2pλ (t) = {µ ∈ I2 ∩ λ : t ∩ µ ∈ Ũ2 (µ)}; then U2p (x) ∩ λ = U2pλ (x ∩ λ). Thus
λ ∈ U1p (U2p (y)) if and only if λ ∈ I1 and U2p (y) ∩ λ ∈ Ũ1 (λ), if and only if
λ ∈ I1 and U2pλ (y ∩ λ) ∈ Ũ1 (λ). Let W̃ (λ) = {t ⊆ λ : U2pλ (t) ∈ Ũ1 (λ)}.
Then y ∈ U2 if and only if {λ ∈ I1 : y ∩ λ ∈ W̃ (λ)} ∈ U0 , so to show U2 < U0 it
suffices to show that for λ ∈ I1 , W̃ (λ) is a λ-complete normal ultrafilter
in λ. Let J denote U1p (I2 ), i.e., {λ ∈ I1 : I2 ∩ λ ∈ Ũ1 (λ)}. Since
I2 ∈ U1 , J ∈ U0 . For the following assume λ ∈ J, whence λ ∈ I1
and I2 ∩ λ ∈ Ũ1 (λ) (although λ ∈ I1 suffices for some cases). Suppose
t ∈ W̃ (λ) and t ⊆ s. Then U2pλ (t) ∈ Ũ1 (λ), and U2pλ (t) ⊆ U2pλ (s),
so U2pλ (s) ∈ Ũ1 (λ), so s ∈ W̃ (λ). Suppose η < λ and tξ ∈ W̃ (λ)
for ξ < η. Let Kξ = U2pλ (tξ ) = {µ ∈ I2 ∩ λ : tξ ∩ µ ∈ Ũ2 (µ)}; then
Kξ ∈ Ũ1 (λ). Let K = (∩ξ<η Kξ ) ∩ (η, λ); then K ∈ Ũ1 (λ). If µ ∈ K
then η < µ and tξ ∩ µ ∈ Ũ2 (µ) for all ξ < η, so (∩ξ<η tξ ) ∩ µ ∈ Ũ2 (µ).
Thus, ∩ξ<η tξ ∈ W̃ (λ). For any µ ∈ I2 ∩ λ, either t ∩ µ ∈ Ũ2 (µ) or
tc ∩ µ ∈ Ũ2 (µ). So if K1 = {µ ∈ I2 ∩ λ : t ∩ µ ∈ Ũ2 (µ)} and K2 = {µ ∈
I2 ∩ λ : tc ∩ µ ∈ Ũ2 (µ)}, then I2 ∩ λ = K1 ∪ K2 . Since I2 ∩ λ ∈ Ũ1 (λ),
either K1 ∈ Ũ1 (λ) or K2 ∈ Ũ1 (λ). Suppose tξ ∈ W̃ (λ) for ξ < λ. Let
Kξ = U2pλ (tξ ) = {µ ∈ I2 ∩ λ : tξ ∩ µ ∈ Ũ2 (µ)}; then Kξ ∈ Ũ1 (λ). Let
K = △ξ<λ Kξ ; then K ∈ Ũ1 (λ). If µ ∈ K then tξ ∩ µ ∈ Ũ2 (µ) for all
ξ < µ, so △ξ<µ (tξ ∩ µ) ∈ Ũ2 (µ), so (△ξ<λ tξ ) ∩ µ ∈ Ũ2 (µ). Thus,
△ξ<λ tξ ∈ W̃ (λ). This completes the proof that < is transitive. The
proof that it is well-founded may be found in lemma 19.31 of [Jech2]. ⊳
Let ρ< denote the rank function for < defined in section 32. For a
cardinal κ let ρ< (κ) = sup{ρ< (U ) + 1 : U is a normal ultrafilter on κ}.
Clearly ρ< (κ) > 0 if and only if κ is measurable. ρ< (κ) > 1 if and only
if there is a normal ultrafilter U on κ such that ρ< (U ) ≥ 1. By lemma
19.33 of [Jech2], this is so if and only if {λ : ρ< (λ) ≥ 1} ∈ U , i.e., the
measurable cardinals below κ comprise a set which is in U .
In general ρ< is a measure of the “order of measurability” of κ.
The following may be shown.
- ρ< (κ) ≤ (2^κ )+ ; if GCH holds then ρ< (κ) ≤ κ++ (remarks following
lemma 19.34 of [Jech2]).
- If κ is strong then ρ< (κ) ≤ κ++ (exercise 20.17 of [Jech2]).
From one point of view, the types of large cardinals considered
above comprise the most important types larger than measurable. Even
larger types are considered, though. A cardinal κ which was the critical
point of an elementary embedding j : V ↦ V would be very large.
However in 1971 it was shown that there is no such (definable proper
class) j. This is theorem 17.7 of [Jech2]; and also theorem 23.12 of
[Kanamori3], where three proofs are given.
It became of interest what types of cardinals can be defined, for
which it was not known whether the existence of such cardinals was
demonstrably false. The main such which have been defined are I0, I1,
I2, and I3 (the rank-into-rank types); huge cardinals and related types;
and extendible cardinals and related types. The main properties of
these types of cardinals can be found in [Kanamori3]; [Jech2] has some
discussion. Although not of as great interest as the types considered
above, these types continue to be studied.
44. Kunen’s theorem.
As has been seen in section 38, the principle “0# exists” is greatly
at variance with the principle “V = L”. In 1974 R. Jensen proved the
“covering lemma”, which is a statement to the effect that if 0# does not
exist (written “¬0#”) then there are restrictions on how greatly V can
differ from L. In particular, the “singular cardinals hypothesis”, which
follows from GCH and hence from V = L, follows from ¬0#.
The proof of the covering lemma is quite involved, no matter how it
is done. It was originally proved using the “fine structure theory”. Later,
proofs were given which did not require this. More recently, proofs
using fine structure theory have been recognized as being of additional
interest, in particular to generalizations of the covering lemma to “core
models”. An introduction to these methods may be found in [Mitchell2].
An overview of a proof of the covering lemma for L will be given in
section 50; the sections from here through section 49 discuss preliminary
results. The singular cardinal hypothesis will be discussed in section 51.
As seen in section 36, the existence of a measurable cardinal is
equivalent to the existence of a non-trivial elementary embedding of
V . A theorem of Kunen states that “0# exists” is equivalent to the
existence of a non-trivial elementary embedding of L. This is clearly a
fact of interest in itself, and will be needed in section 50.
An elementary embedding j : M ↦ N between transitive models
(sets or classes) of ZFC is defined as usual. In this section, only the
case that M is a proper class will be considered, in which case N is also.
The case that M is a set is also of use; a discussion may be found in
[Cummings] for example.
Various facts from the case M = V are readily adapted. In particular, an ultrafilter may be defined from j. It need not be in M , though,
so the following definition is needed: If κ is a cardinal in M , an M -ultrafilter on κ is an ultrafilter in the Boolean algebra Pow(κ)M . Say
that U is M -κ-complete if ∩ξ<η Xξ ∈ U whenever η < κ and ⟨Xξ : ξ < η⟩
is an element of M . Say that U is M -normal if △ξ<κ Xξ ∈ U whenever
⟨Xξ : ξ < κ⟩ is an element of M .
Given an M -ultrafilter U on a cardinal κ of M , the ultrapower
(M κ )M / ≡U can be constructed as follows. (M κ )M is the class of functions f
in M with domain κ. Say that two such are equivalent if they are equal
on a set in U . An element of the ultrapower is the set of elements of least rank
of an equivalence class. The predicate ∈̂ is defined as in section 36, and
theorems 35.3 and 35.4 hold.
By the same argument as in the proof of lemma 36.1, (M κ )M / ≡U
has small extensions. In general it is not necessarily well-founded, even
if U is M -κ-complete. If U is M -κ-complete then κ is a regular cardinal in M ; this follows by an argument given in the proof of theorem
36.7. In general ensuring further properties of κ requires placing further
restrictions on U ; see [Kunen2].
Suppose j : M ↦ N is an elementary embedding where M (and
hence N ) is a proper class which is a model of ZFC. Lemma 36.3 may be
adapted, replacing Vα by VαM . Theorem 36.4 may be adapted, letting U
in the proof equal {X ⊆ α : X ∈ M and α ∈ j(X)}. Lemma 36.5 may
be adapted; U is M -normal if and only if for every regressive function
f : X ↦ κ where f ∈ M and X ∈ U , there is a subset Y ⊆ X such that
Y ∈ U and f is constant on Y . Lemma 36.6 may be adapted. Theorem
36.8 may be adapted; rather than from MUj , the map constructed is
from (M κ )M / ≡Uj .
Suppose j is as above. It follows by the adaptation of theorem 36.8
that (M κj )M / ≡Uj is well-founded, since a descending chain would yield
one in N .
Theorem 1. If 0# exists then there is a non-trivial elementary
embedding j : L ↦ L.
Proof: Let I = ∪Iκ be the Silver indiscernibles (defined in section
38). Let j0 : I ↦ I be any order-preserving map other than the identity. By theorem 38.9
every element of L equals t~v (~x) where t is a Skolem term and ~x is
an increasing sequence of Silver indiscernibles. Define j(t~v (~x)) to be
t~v (j0 (x1 ), . . . , j0 (xk )). Standard arguments show that j is a well-defined
function, and is an elementary embedding. ⊳
Suppose j : L ↦ L is a non-trivial elementary embedding. The
facts outlined above can be used to transform j to an embedding with
an additional property. Write κ for κj and U for Uj . (M κ )M / ≡U is well-founded, and its transitive collapse is a model of set theory containing
the ordinals, so equals L, giving rise to an elementary embedding jU :
L ↦ L; by the adaptation of lemma 36.6, κjU = κj .
Lemma 2. If λ is a limit cardinal of cofinality greater than κj then
jU (λ) = λ.
Remarks on proof: This is lemma 18.23 of [Jech2]. ⊳
From here on j will be assumed to have the property of lemma 2.
The next definition makes use of an operator on classes of ordinals
which has uses in various settings. If X is a class of ordinals, there
is a function eX on the ordinals, where eX (ξ) is the ξth element of X
in increasing order. A “fixed point” of eX is an ordinal ξ such that
eX (ξ) = ξ. The fixed points of eX are also called the fixed points of X.
Clearly, ξ is a fixed point of X if and only if the order type of X ∩ ξ
equals ξ. If X is a class of cardinals then λ is a fixed point if and only
if |X ∩ λ| = λ. Let FP(X) denote the set of fixed points of X.
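The enumeration function and its fixed points can be illustrated on a finite set of natural numbers. The following Python sketch is an illustration only; the function names are hypothetical, and a finite X stands in for a class of ordinals, so eX is defined only on indices below the size of X. It computes the fixed points from the definition eX (ξ) = ξ and compares this, for elements of X, with the order-type description.

```python
def enumerate_increasing(X):
    """e_X for a finite set X of naturals: position xi -> xi-th element of X."""
    return sorted(X)

def fixed_points(X):
    """All xi with e_X(xi) = xi, directly from the definition."""
    e = enumerate_increasing(X)
    return [xi for xi in range(len(e)) if e[xi] == xi]

def order_type_below(X, xi):
    """Order type of X ∩ xi; for a finite set this is just its size."""
    return sum(1 for x in X if x < xi)

X = {0, 1, 2, 5, 7}
fp = fixed_points(X)

# For elements of X, being a fixed point agrees with otp(X ∩ xi) = xi.
agreement = all((x in fp) == (order_type_below(X, x) == x) for x in X)
```

Here 5 is not a fixed point, because only three elements of X lie below it.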
Given j, let X0 be the set of limit cardinals of cofinality greater
than κj . Let Xα+1 = FP(Xα ). For α ∈ LimOrd let Xα = ∩β<α Xβ .
Lemma 3. Each Xα is a proper class.
Remarks on proof: This is proved in remarks preceding lemma 18.24
of [Jech2]. ⊳
Now let κl be any element of Xℵ1 . Let jl be the restriction of
j to Lκl . Since j(κl ) = κl , jl is an elementary embedding of Lκl in
Lκl . For α < ℵ1 let Mα be the definable hull in Lκl of κj ∪ (Xα ∩ κl ).
Since |Xα ∩ κl | = κl , its transitive collapse is Lκl ; let πα be the collapsing
isomorphism. Let γα = πα−1 (κj ).
Lemma 4. The set {γα : α < ℵ1 } is a set of indiscernibles for Lκl .
Remarks on proof: This is proved in lemmas 18.24 to 18.26 of
[Jech2]. ⊳
Theorem 5. If there is a non-trivial elementary embedding j : L ↦
L then 0# exists.
Remarks on proof: This follows by lemma 4 and theorem 38.7. ⊳
45. Rudimentary functions.
The rudimentary functions were introduced independently in
[Gandy] and [Jensen] in the early 1970’s, and have since been of considerable use in set theory. The rudimentary functions are functions
f : V k ↦ V . As usual, these are proper classes, so each is defined by
a formula. However an “informal” style may be used in defining them;
the definition could be “translated” to a more formal one, at the cost of
great tedium.
The rudimentary functions may be initially defined to be the smallest class of functions containing those in clauses 1-3 below; and closed
under the operations of clauses 4 and 5.
1. F (x1 , . . . , xk ) = xi , for any k and 1 ≤ i ≤ k.
2. F (x1 , . . . , xk ) = {xi , xj }, for any k and 1 ≤ i, j ≤ k.
3. F (x1 , . . . , xk ) = xi − xj , for any k and 1 ≤ i, j ≤ k.
4. F (x1 , . . . , xk ) = ∪w∈x1 G(w, x2 , . . . , xk ) (union on the first argument).
5. F (x1 , . . . , xk ) = G(H1 (x1 , . . . , xk ), . . . , Hl (x1 , . . . , xk ))
(composition).
The composition operator of clause 5 is of a strict variety. A more general composition operator allows F (x1 , . . . , xk ) to equal G(t1 , . . . , tl ) where ti is
Hi (xi1 , . . . , xiki ), or some xj . It is easily seen by induction that allowing
the more general composition operator does not change the collection
of rudimentary functions.
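The five clauses can be tried out on hereditarily finite sets. The following Python sketch is an illustration under stated assumptions, not part of the formal definition: sets are modelled as nested frozensets, and the helper names proj, pair, diff, union_over_first and compose are hypothetical. It builds ∪x, x ∪ y and {x} from the clauses alone.

```python
EMPTY = frozenset()

def proj(i):
    """Clause 1: F(x1, ..., xk) = xi (1-based index)."""
    return lambda *xs: xs[i - 1]

def pair(i, j):
    """Clause 2: F(x1, ..., xk) = {xi, xj}."""
    return lambda *xs: frozenset({xs[i - 1], xs[j - 1]})

def diff(i, j):
    """Clause 3: F(x1, ..., xk) = xi - xj."""
    return lambda *xs: xs[i - 1] - xs[j - 1]

def union_over_first(G):
    """Clause 4: F(x1, ..., xk) = union of G(w, x2, ..., xk) over w in x1."""
    def F(*xs):
        out = EMPTY
        for w in xs[0]:
            out = out | G(w, *xs[1:])
        return out
    return F

def compose(G, *Hs):
    """Clause 5: F(xs) = G(H1(xs), ..., Hl(xs))."""
    return lambda *xs: G(*(H(*xs) for H in Hs))

big_union = union_over_first(proj(1))          # x -> union of x
binary_union = compose(big_union, pair(1, 2))  # (x, y) -> x ∪ y, as ∪{x, y}
singleton = pair(1, 1)                         # x -> {x}

one = frozenset({EMPTY})        # the ordinal 1
two = frozenset({EMPTY, one})   # the ordinal 2
```

For instance, binary_union(one, singleton(one)) evaluates to the ordinal 2, so the successor operation x ∪ {x} is obtained by composition.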
The initial definition may be used to prove basic properties of the
rudimentary functions. In this section various of these will be stated;
proofs will be omitted, and can be found in the original paper [Jensen],
[Devlin], or [Dodd]. There is a brief treatment of rudimentary functions
in [Jech2], which omits proofs.
A predicate P (~x) is said to be rudimentary if there is a rudimentary
function f such that P (~x) ⇔ f (~x) ≠ ∅. Lemma VI.1.1 of [Devlin] shows
that various functions and predicates are rudimentary, and shows the
following.
- A predicate P (~x) is rudimentary if and only if its characteristic
function χP (~x) is rudimentary, where χP (~x) = 1 if P (~x), else 0.
- The predicates x = y and x ∈ y are rudimentary.
- The rudimentary predicates are closed under Boolean operations
and bounded quantification. In particular a ∆0 predicate (i.e., one
definable by a ∆0 formula) is rudimentary.
- Various standard functions with ∆0 graphs are rudimentary, including p = ⟨u, v⟩, u = π1 (p), and v = π2 (p); z = x × y; and standard
functions concerning relations and functions.
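The first two items can be made concrete on hereditarily finite sets. The following Python sketch is an illustration only, with hypothetical helper names and nested frozensets standing in for sets. It uses x ∩ y = x − (x − y), the set {x} ∩ y as a witness for x ∈ y, and the union of {∅} over the members of z as an indicator that z is nonempty; each of these is built from difference, pairing and union.

```python
EMPTY = frozenset()
ONE = frozenset({EMPTY})   # the ordinal 1, used as the value "true"

def intersect(x, y):
    """x ∩ y written with difference only: x - (x - y)."""
    return x - (x - y)

def member_witness(x, y):
    """{x} ∩ y, which is nonempty exactly when x ∈ y."""
    return intersect(frozenset({x}), y)

def indicator(z):
    """Union of {∅} over w in z: equals 1 if z is nonempty, else 0."""
    out = EMPTY
    for _ in z:
        out = out | ONE
    return out

def chi_member(x, y):
    """Characteristic function of the predicate x ∈ y."""
    return indicator(member_witness(x, y))

def chi_equal(x, y):
    """Characteristic function of x = y: 1 minus the indicator of the symmetric difference."""
    sym = (x - y) | (y - x)
    return ONE - indicator(sym)
```

The last function also shows how negation is handled: the complement of a predicate with characteristic function χ has characteristic function 1 − χ.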
Lemma 1. A rudimentary function has a ∆0 graph.
Remarks on proof: Say that a function f (~x) is simple if R(f (~x), ~y )
is ∆0 whenever R(w, ~y ) is ∆0 . After a preliminary lemma, it follows by
induction on f that every rudimentary function is simple. The proof
may be found in lemmas VI.1.2 and VI.1.3 of [Devlin]. ⊳
As a corollary, a k-ary predicate on V is rudimentary if and only
if it has a ∆0 definition in the language of set theory (if and only if the
characteristic function has a ∆0 graph).
The rudimentary functions can be characterized using “basis functions”. These are as follows, where recall that ⟨x1 , x2 , . . . , xk ⟩ denotes
⟨x1 , ⟨x2 , . . . , xk ⟩⟩.
0. {x, y}
1. x − y
2. x × y
3. {⟨u1 , u2 , u3 ⟩ : u2 ∈ x ∧ ⟨u1 , u3 ⟩ ∈ y}
4. {⟨u1 , u2 , u3 ⟩ : u3 ∈ x ∧ ⟨u1 , u2 ⟩ ∈ y}
5. ∪x
6. π1 (x)
7. ∈ ∩(x × x)
8. {π2 (x ↾ z) : z ∈ y}
Lemma 2. A function is rudimentary if and only if it can be obtained by (general) composition from the basis functions.
Remarks on proof: Let B be the collection of functions which can
be obtained by general composition from the basis functions. It is not
difficult to see that the basis functions, and hence all the functions in
B, are rudimentary. For the converse, if F is a formula of set theory
together with a suitable list of k variables, let dF be the function of a
single argument u, whose value is the subset of uk defined in u by F . It
follows by induction on F that dF is in B. Given a function f : V k ↦ V
let f ∗ : V ↦ V be the function where f ∗ (u) = f [uk ]. It follows by
induction (using dF ’s) that if f is rudimentary then f ∗ is in B. From
this, it follows that if f is rudimentary then f ∈ B. For details see
lemma VI.1.11 of [Devlin]. ⊳
A set X is said to be “rudimentarily closed” (abbreviated “rud-closed”) if whenever x1 , . . . , xk ∈ X and f is a rudimentary function
then f (x1 , . . . , xk ) ∈ X. Strictly speaking this definition is unsatisfactory, since it quantifies over a set of proper classes. However, it is
intuitively correct, and the difficulties are removable. For example, it
suffices that X be closed under the basis functions. Lemma 2 is subject to similar comment. It is not a theorem of ZFC; rather, it is a
“meta-theorem” about ZFC.
It is easy to see that if α is a limit ordinal then Vα is rud-closed; and
that the intersection of a family of rud-closed sets is rud-closed. Thus,
for any set X there is a smallest rud-closed set Y such that X ⊆ Y . Y
is called the “rudimentary closure” (abbreviated rud-closure) of X. As
usual it may be described as ∪i∈ω Xi where Xi+1 is obtained from Xi by
adding to Xi the result of applying each of the basis functions in every
possible way to elements of Xi .
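The stage-by-stage description can be imitated for the first few stages. The following Python sketch is an illustration under a loud simplifying assumption: it closes only under three of the basis functions (pairing, difference and union), so it produces a subset of each true stage Xi , and the helper names are hypothetical. Sets are modelled as nested frozensets.

```python
from itertools import product

EMPTY = frozenset()

def big_union(x):
    """The union of the members of x."""
    out = EMPTY
    for w in x:
        out = out | w
    return out

def next_stage(X):
    """Add the results of pairing, difference and union applied to elements of X.

    Assumption of this sketch: only three of the basis functions are used.
    """
    new = set(X)
    for x, y in product(X, repeat=2):
        new.add(frozenset({x, y}))   # basis function {x, y}
        new.add(x - y)               # basis function x - y
    for x in X:
        new.add(big_union(x))        # basis function ∪x
    return frozenset(new)

X0 = frozenset({EMPTY})
X1 = next_stage(X0)
X2 = next_stage(X1)
X3 = next_stage(X2)
```

Starting from {∅}, the stages grow strictly at each step shown, which reflects the fact that the rud-closure of a nonempty set is infinite.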
Lemma 3. If X is a transitive set then the rud-closure of X is
transitive.
Remarks on proof: Let Y be the rud-closure. Say that x ∈ Y is
valid if TC({x}) ⊆ Y . It suffices to show that if f is rudimentary, and
xi is valid for 1 ≤ i ≤ k, then f (~x) is valid. Indeed, if x ∈ X then x is
valid; and it follows inductively that any x ∈ Y is valid. The claim for
f is proved by induction on the formation of f according to the initial
definition of a rudimentary function. For details see lemma VI.1.7 of
[Devlin]. ⊳
The following two technical lemmas will be used in section 46. If
M is a structure, a subset S ⊆ M will be said to be definable in M if
it is defined by some formula without parameters, and definable in M
from parameters if it is defined by some formula with parameters.
Lemma 4. Suppose X is transitive and rud-closed, and Y ≺1 X.
Then Y is rud-closed and satisfies the axiom of extensionality. Suppose
π : Y ↦ Z is the transitive collapse; then π commutes with rudimentary
functions (i.e., π(f (~x)) = f (π(x1 ), . . . , π(xk ))).
Remarks on proof: That Y satisfies extensionality follows because
Y ≺1 X, and x ≠ y can be written in Σ1 form. That Y is rud-closed follows because the existence condition is Σ1 , by lemma 1. The final claim
follows by induction on f . For details see lemma VI.1.22 of [Devlin]. ⊳
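The collapsing map π itself is easy to compute for a finite extensional set. The following Python sketch illustrates only the map π(y) = {π(z) : z ∈ y ∩ Y }; the example Y is a small hypothetical set chosen for illustration, and it is neither rud-closed nor an elementary substructure, so the sketch does not illustrate the lemma's hypotheses. Sets are modelled as nested frozensets.

```python
EMPTY = frozenset()

def is_extensional(Y):
    """Distinct members of Y must have distinct traces y ∩ Y."""
    traces = [y & Y for y in Y]
    return len(set(traces)) == len(Y)

def collapse(Y):
    """The transitive collapse: pi(y) = {pi(z) : z in y and z in Y}."""
    memo = {}
    def pi(y):
        if y not in memo:
            memo[y] = frozenset(pi(z) for z in y if z in Y)
        return memo[y]
    return {y: pi(y) for y in Y}

one = frozenset({EMPTY})
a = frozenset({EMPTY, frozenset({one})})   # a = {∅, {{∅}}}
Y = frozenset({EMPTY, a})                  # not transitive: {{∅}} is missing

pi = collapse(Y)
Z = frozenset(pi.values())                 # the collapsed, transitive set
```

Here a is sent to {∅}, since ∅ is the only member of a that lies in Y .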
Lemma 5. If f is a rudimentary function then there is an integer b
such that
ρ(f (~x)) ≤ max(ρ(x1 ), . . . , ρ(xk )) + b.
Proof: This follows by induction from the fact that it is true of the
basis functions, which is easily checked. ⊳
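The rank bound can be spot-checked for a few of the basis functions on hereditarily finite sets. The following Python sketch is an illustration only, with hypothetical helper names and nested frozensets standing in for sets. It checks the bound with b = 1 for pairing and b = 0 for difference and union, on the finite ordinals below 5.

```python
EMPTY = frozenset()

def rank(x):
    """Set-theoretic rank of a hereditarily finite set."""
    return max((rank(w) + 1 for w in x), default=0)

def numeral(n):
    """The von Neumann ordinal n, which has rank n."""
    x = EMPTY
    for _ in range(n):
        x = x | frozenset({x})
    return x

def big_union(x):
    """The union of the members of x."""
    out = EMPTY
    for w in x:
        out = out | w
    return out

samples = [numeral(n) for n in range(5)]
pairs = [(x, y) for x in samples for y in samples]

pair_bound = all(rank(frozenset({x, y})) <= max(rank(x), rank(y)) + 1
                 for x, y in pairs)
diff_bound = all(rank(x - y) <= max(rank(x), rank(y)) for x, y in pairs)
union_bound = all(rank(big_union(x)) <= rank(x) for x in samples)
```

A check on samples is of course not a proof; it only illustrates what the constant b measures.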
In many applications of the rudimentary functions it is necessary to
consider them, relativized to an additional unary predicate A. Examples
will be seen in section 47. As in section 39, for convenience ZFCA can
be used, and A removed when it is definable or a set. To define the A-relative rudimentary functions, add to the initial definition the following
clause.
6. A ∩ x
This is a unary function, for any A.
Lemma 6. If f is an A-relative rudimentary function then f equals
a general composition of rudimentary functions and the function A ∩ x.
Remarks on proof: The claim follows by induction on f . All cases
are straightforward except clause 4. For details see lemma VI.1.8 of
[Devlin]. ⊳
Note that this is a meta-theorem of ZFCA , and the claim holds
for any A (“uniformly in A”). As a corollary, if the function A ∩ x
is added as a 9th basis function, the A-relative rudimentary functions
are those obtained from the expanded set of basis functions by general
composition.
Expanding a remark preceding lemma 1, note that if x ∈ A then
A ∩ {x} = {x}, else A ∩ {x} = ∅. It follows that if a predicate is ∆0
in A then it is an A-relative rudimentary function; further the function
definition depends only on the formula and not on A.
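The remark about A ∩ {x} can be illustrated directly. The following Python sketch is an illustration only; a_cap and in_A are hypothetical names, A is a small finite set chosen for the example, and sets are modelled as nested frozensets. It recovers membership in A from the single function x ↦ A ∩ x.

```python
EMPTY = frozenset()
ONE = frozenset({EMPTY})

def a_cap(A):
    """The extra unary function x -> A ∩ x, for a fixed A."""
    return lambda x: A & x

def in_A(A, x):
    """Membership in A: A ∩ {x} is {x} if x ∈ A, else ∅."""
    return a_cap(A)(frozenset({x})) != EMPTY

A = frozenset({EMPTY, frozenset({ONE})})   # A = {∅, {{∅}}}
```

The same definition of in_A works for every A, which is the sense in which the definition depends on the formula and not on A.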
The “A-relative rud-closure” of a set X is defined as the rud-closure,
except the additional function A ∩ x is included in the basis functions.
A structure ⟨M, A⟩ for the language, expanded with a unary predicate symbol, is said to be amenable if A ∩ x ∈ M for all x ∈ M . The
structure will be said to be transitive or rud-closed if M is. If ⟨M, A⟩
is rud-closed and amenable then M is closed under the A-relative rudimentary functions; and if a predicate is ∆0 relative to A then it is given
by an A-relative rudimentary function, acting on M .
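The amenability condition can be tested mechanically on small hereditarily
finite sets. The following is a minimal sketch, not part of the original
development: sets are modelled as Python frozensets, and the names powerset,
is_amenable, V3 and THREE are introduced here for illustration only. The
finite level V3 is closed under subsets of its elements, so it is amenable
for any A (it is of course not rud-closed); the ordinal 3 is not amenable
for A = {{∅}}.

```python
from itertools import combinations

def powerset(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )

def is_amenable(M, A):
    """Check that A ∩ x is an element of M for every x in M."""
    return all((A & x) in M for x in M)

EMPTY = frozenset()
ONE = frozenset({EMPTY})              # {∅}
TWO = frozenset({EMPTY, ONE})         # {∅, {∅}}

V3 = powerset(TWO)                    # the finite level V_3 (illustrative)
THREE = frozenset({EMPTY, ONE, TWO})  # the ordinal 3 as a set

A = frozenset({ONE})                  # the predicate A = {{∅}}
```

Here is_amenable(V3, A) holds, while is_amenable(THREE, A) fails because
A ∩ 2 = {{∅}} is not an element of 3.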
Like an admissible set, a transitive rud-closed set is a model of
a fragment of set theory sufficient for definitions to be given, and
“computations” carried out, which are valid in any such set (lemma 8
below will be an example). However, rud-closure is not as stringent
a requirement as admissibility, and more care is needed. (Just as for
admissible sets, there are axiom systems for the rud-closed sets, but
this will not be considered; see [Dodd], [Gandy], or [Mathias].)
As in the case of admissible sets, it is useful to give Σ1 definitions,
which hold in any transitive rud-closed set. It is easily seen that the
set Vω of hereditarily finite sets is rud-closed, and is a subset of any
rud-closed set. Suppose ∃w P (w, ~x) is a predicate on Vω where P is ∆0 ;
if w can be taken to be in Vω when it exists, then the definition holds in
any transitive rud-closed set. Lemma 7 uses this observation to provide
some useful predicates. Recall from section 16 that the predicate “x is
an integer” is ∆0 .
Lemma 7. For every recursively enumerable predicate P on ω there
is a Σ1 formula which defines P in any transitive rud-closed set.
Remarks on proof: First, k = i + j has such a formula; a witness is
a function f with domain j + 1, such that f (0) = i, f (t + 1) = f (t) + 1
for all t ∈ j, and k = f (j). The required f is in Vω . Second, there is a
Σ1 formula Nm (n, f ) which holds if f is the m-adic notation of n (f (0)
being the low-order digit). This states the existence of a sequence v of
integers, and w of witnesses, where if Dom(f ) = l then Dom(v) = l + 1,
Dom(w) = l, v(0) = 0, v(l) = n, and for t < l, w(t) witnesses that
v(t + 1) = m · v(t) + f(t). Again, the required witnesses are in Vω. If the
state of a Turing machine (see appendix 2) is coded as a sequence
over a finite alphabet, the step predicate and “halted” predicate are
rudimentary. An input integer can be transformed to the initial state
using N2 . Thus, the predicate determined by the Turing machine is
defined by a Σ1 formula, where the witnesses may all be taken in Vω . ⊳
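The first step of the proof, the witness for k = i + j, can be made
concrete. The sketch below is illustrative only; the function names are
introduced here, and a Python dict stands in for the finite function f.
The point is that checking a proposed witness involves only bounded
searches, while the existence of the witness is the single unbounded
quantifier.

```python
def addition_witness(i, j):
    """Build f with domain j + 1, f(0) = i and f(t + 1) = f(t) + 1."""
    f = {0: i}
    for t in range(j):
        f[t + 1] = f[t] + 1
    return f

def checks_sum(f, i, j, k):
    """Bounded check that f witnesses k = i + j."""
    return (
        set(f) == set(range(j + 1))                       # Dom(f) = j + 1
        and f[0] == i                                     # f(0) = i
        and all(f[t + 1] == f[t] + 1 for t in range(j))   # successor steps
        and f[j] == k                                     # k = f(j)
    )
```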
There is a well-known bijection E : ω ↦ Vω. This may be defined
by the following recursion.
E(0) = ∅, E(2^{i0} + i1) = {E(i0)} ∪ E(i1).
Using the methods indicated above, it may be seen that E is Σ1 , i.e., has
a Σ1 definition which holds in any transitive rud-closed set. Thus, any
function from Vω to Vω , such that the induced function on the integer
codes is Σ1 , is Σ1 also.
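The recursion for E can be run directly. The following sketch assumes
that in E(2^{i0} + i1) the exponent i0 is the position of the leading
binary digit, so that i1 < 2^{i0}; this reading makes the recursion
well-defined, and is an assumption of the sketch. Hereditarily finite
sets are modelled as frozensets, and E_inverse is a name introduced here.

```python
def E(n):
    """Decode the integer n into a hereditarily finite set."""
    if n == 0:
        return frozenset()
    i0 = n.bit_length() - 1     # largest i0 with 2**i0 <= n
    i1 = n - (1 << i0)          # remainder, so i1 < 2**i0
    return frozenset({E(i0)}) | E(i1)

def E_inverse(x):
    """Encode a hereditarily finite set as the integer sum of 2**E_inverse(y)."""
    return sum(1 << E_inverse(y) for y in x)
```

For example E(3) = {∅, {∅}}, since 3 = 2^1 + 1, E(1) = {∅} and
E(0) = ∅.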
Lemma 8. There is a Σ1 formula in the language expanded with a
unary predicate symbol A, which defines the predicate “the ∆0 formula
φ is true in ⟨M, A⟩, with assignment a to the free variables”, in any
transitive rud-closed amenable structure ⟨M, A⟩. There is also such a
Π1 formula.
Remarks on proof: This is lemma 1.12 of [Jensen]; see also definition 1.16 of [Dodd]. Using the function E defined after lemma 7, it is
irrelevant whether φ is coded as an integer or a hereditarily finite set;
integer codes will be assumed. As observed preceding lemma 1, given a
∆0 formula φ defining the predicate Q, there is a rudimentary function
definition dφ such that Q holds if and only if dφ = 1. By lemma 2 dφ may
be translated to a term tφ involving the basis functions. The proofs
provide explicit methods, allowing the conclusion that the function φ ↦ tφ
is recursive (indeed primitive recursive). These remarks extend readily
to the relativized case. By lemma 7 the predicate P1 (t, φ), which holds
if t = tφ , is Σ1 . The amenability requirement ensures that every term t
in the basis functions defines a total function. The predicate P2 (y, t, a),
which holds if y is the value of the term t at the assignment a to the
variables of t, may be seen to be Σ1 . The final predicate P is then
∃t(P2 ∧ P1 ). This may be written in Π1 form as ¬P (¬φ, a). ⊳
Corollary 9. For n ≥ 1, there is a Σn formula in the expanded
language, which defines the predicate “the Σn formula φ is true in ⟨M, A⟩,
with assignment a to the free variables”, in any transitive rud-closed
amenable structure ⟨M, A⟩.
Remarks on proof: See lemma II.6.4 of [Devlin] for a related result.
Let P1 (φ, a) be the predicate of lemma 8. Let P2 (θ, ā, a1 , . . . , an , φ, a) be
the following predicate: “θ is the matrix of φ, ai is an assignment to the
ith block of bound variables, and ā is the concatenation of a and the ai ”.
P2 may be seen to have a Σ1 definition, which holds in any transitive
rud-closed set. If n is odd the final predicate is ∃a1 · · · ∃an ∃θ∃ā(P2 ∧P1 ).
If n is even the final predicate is ∃a1 · · · ∀an ∀θ∀ā(P2 ⇒ P1′ ) where P1′ is
the Π1 form. ⊳
The following fact will be used in section 46.
Lemma 10. Suppose M is transitive and rud-closed, and A ⊆ M is
∆0 -definable in M from parameters. Then A ∩ x ∈ M for all x ∈ M .
Remarks on proof: A is rudimentary, so A ∩ x is rudimentary;
the parameters are in M , and M is rud-closed. For details see lemma
VI.1.6.v of [Devlin]. ⊳
46. The Jensen hierarchy.
The Jensen hierarchy is an alternative to Gödel’s Lα hierarchy
for the constructible sets. In various circumstances facts about L can
be proven more straightforwardly using the Jensen hierarchy than the
Gödel hierarchy, and it has become widely used since Jensen introduced
it in 1972. Again, in this section proofs will mostly be omitted, and may
be found in [Jensen] or [Devlin].
Let Rud(X) denote the rud-closure of X ∪ {X}. For an ordinal α,
define the set Jα by transfinite recursion as follows.
J0 = ∅
Jα+1 = Rud(Jα )
Jα = ∪β<α Jβ for limit ordinals α
Although the Lα hierarchy can be dispensed with entirely ([Schindler] for example does so), both hierarchies are widely used. Lemma 5
below gives a relation between them. The definition of the Lα hierarchy using the “Def” operator has the advantage of yielding a quite
straightforward definition of L.
If U is transitive then U ∪ {U } is, so Rud(U ) is. Thus, each Jα is
transitive. Since clearly Jα ∈ Jα+1 , it follows that Jβ ∈ Jα if β < α,
and Jβ ⊆ Jα if β ≤ α.
Each Jα is clearly rud-closed. This is a significant advantage of
the Jensen hierarchy. For example, Lα is not even closed under ordered
pairs in general (it is if α is a limit ordinal). As observed in the previous
section, Vω is rud-closed, from which it follows that J1 = Vω .
Lemma 1. Jα ∩ Ord = ω · α.
Proof: By lemma 45.5 Jα+1 ∩ Ord ≤ (Jα ∩ Ord) + ω. It is easy to see that
equality holds. The lemma follows by induction on α. ⊳
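The induction may be written out as follows. This is only a sketch of the
three cases, taking for granted the equality at successor stages asserted
in the proof.

```latex
\begin{align*}
J_0 \cap \mathrm{Ord} &= \emptyset = \omega\cdot 0,\\
J_{\alpha+1}\cap \mathrm{Ord} &= (J_\alpha \cap \mathrm{Ord}) + \omega
  = \omega\cdot\alpha + \omega = \omega\cdot(\alpha+1),\\
J_\lambda \cap \mathrm{Ord} &= \bigcup_{\beta<\lambda} (J_\beta\cap\mathrm{Ord})
  = \sup_{\beta<\lambda}\,\omega\cdot\beta = \omega\cdot\lambda
  \qquad (\lambda \text{ a limit ordinal}).
\end{align*}
```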
Some authors adopt the convention that Jα receives the index ω · α
rather than α. The indexing here is that used by [Jensen], [Devlin], and
[Jech2].
For the following let ∆0 -Def(S) be the collection of subsets of S
which are defined in S by a ∆0 formula with parameters from S (as
opposed to Def(S), which allows any formula).
Lemma 2. For a set S, ∆0 -Def(S ∪ {S}) ∩ Pow(S) = Def(S).
Remarks on proof: If A ⊆ S is defined in S by F_x, let F′_{vx} be F with
all quantifiers replaced by quantifiers restricted to range over v where
v is a new variable; then A is defined in S ∪ {S} by x ∈ v ∧ F′, with
S assigned to v. For the converse, it follows by induction on F that
there is a formula F ′ , such that if F defines X in S and F ′ defines X ′
in S ∪ {S}, then X = X ′ ∩ S. For details see lemma VI.1.17 of [Devlin].
⊳
Lemma 3. For a transitive set S, Rud(S) ∩ Pow(S) = ∆0-Def(S ∪
{S}) ∩ Pow(S).
Proof: Suppose A ⊆ S is defined in S ∪ {S} by the ∆0 formula F .
Since S ∪{S} and Rud(S) are both transitive, A is defined in Rud(S) by
F. By lemma 45.10, A = A ∩ S is in Rud(S). Conversely, suppose A ∈ Rud(S) and
A ⊆ S. Then for some rudimentary function f and p ∈ S, A = f (p, S).
As noted in the proof of lemma 45.1, f is simple, so x ∈ f(p, S) is
defined by a ∆0 formula Fv , with parameters p, S. By absoluteness, it
is so defined in S ∪ {S}; that is, A is in ∆0 -Def(S ∪ {S}) ∩ Pow(S). ⊳
Corollary 4. For a transitive set S, Rud(S) ∩ Pow(S) = Def(S).
Proof: Immediate by lemmas 2 and 3. ⊳
It follows that a subset of Jα is definable in Jα from parameters
if and only if it is an element of Jα+1. The following provides a
comparison of the Gödel and Jensen hierarchies. Neither inclusion should
be surprising; the first is reasonable by corollary 4, and the second by
lemma 1.
Lemma 5. For any ordinal α, Lα ⊆ Jα ⊆ Lω·α .
Proof: See lemmas VI.2.3 and VI.2.4 of [Devlin]. ⊳
In particular, L = ∪α Jα . Also, if α = ω ·α then Jα = Lα . The class
of ordinals for which α = ω · α is a large one; it is closed and unbounded
for example.
The following definition is often useful in developing the theory of
the Jensen hierarchy. Let Bi for 0 ≤ i ≤ 8 denote the ith basis function
of section 45, and let ki denote its valency. Let
s(u) = u ∪ ⋃_{i=0}^{8} B_i[u^{k_i}]
S(u) = s(u ∪ {u})
S0 = ∅
Sα+1 = S(Sα )
Sα = ∪β<α Sβ for limit ordinals α
Lemma 6. Rud(u) = ⋃_{i<ω} S^i(u).
Proof: Clearly ⋃_{i<ω} s^i(u) is the rud-closure of u. Also, if u ⊆ v then
s(u) ⊆ s(v), whence s(u) ⊆ S(u). The inclusion Rud(u) ⊆ ⋃_{i<ω} S^i(u)
follows. For the opposite inclusion, by the proof of lemma 45.2, B_i^* is
rudimentary, where B_i^*(u) = B_i[u^{k_i}]. Thus, if v ∈ Rud(u) then B_i^*(v) ∈
Rud(u). It is easily seen that if v, w ∈ Rud(u) then v ∪ w ∈ Rud(u).
Thus, if v ∈ Rud(u) then s(v) ∈ Rud(u). But if v ∈ Rud(u) then
v ∪ {v} ∈ Rud(u), so if v ∈ Rud(u) then S(v) ∈ Rud(u). It is now easy
to see by induction that S^i(u) ⊆ Rud(u) for all i. Indeed, if v = S^i(u)
then S^{i+1}(u) = s(v ∪ {v}), and v ∪ {v} ⊆ Rud(u), so s(v ∪ {v}) ⊆ Rud(u).
⊳
It follows by induction that Jα = Sω·α .
Lemma 7. For the following predicates, there is a Σ1 formula which
defines the predicate in Jα for any α.
a. y = Sβ
b. y = Jβ
Remarks on proof: There is a ∆0 predicate P (f ) which states that
“f is a function whose domain is an ordinal; f (0) = 0; at successor stages
f(γ + 1) = S(f(γ)); and at limit stages f(γ) = ∪δ<γ f(δ)”. y = Sβ is
defined by “∃f (P (f ) ∧ y = f (β))”. There is at most one f with a given
domain. That there is an f , and part a of the lemma, may be shown by
induction on α. For the induction step at successor stages, the function
y = Sβ is Σ1 -definable in Jα , so is in Jα+1 . Thus, there is an f with
domain ω · α, and it follows that there is an f with domain ω · α + n for
any n ∈ ω. Part b follows by giving a suitable definition of the predicate
“β = ω · γ”. Further details may be found in lemma 2.2 of [Jensen]; a
more involved proof may be found in lemma VI.2.5 of [Devlin]. ⊳
Lemma 8. There is a rudimentary function W , such that if r is a
well-order of v then W (v, r) is a well-order of s(v). Further, W (v, r) is
an end-extension of r, i.e., r ⊆ W(v, r), and if x ∈ v and y ∉ v then x
precedes y in W(v, r).
Remarks on proof: Given x and y, say that x precedes y if one of
the following holds, where a clause presumes that preceding clauses do
not hold.
- x and y are in v and x precedes y in r.
- x ∈ v and y ∉ v.
- The smallest i such that x ∈ B_i[v^{k_i}] precedes the smallest j such
that y ∈ B_j[v^{k_j}].
- The smallest i and j are equal, and the least a⃗ such that x = B_i(a⃗)
precedes the least b⃗ such that y = B_i(b⃗), in the lexicographic order
on v^{k_i}.
The predicate “precedes” is ∆0 , and W (v, r) is the subset of s(v) × s(v)
defined by it. For further details see lemma 1.21 of [Devlin]. ⊳
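The four clauses amount to sorting the new elements of s(v) by a key. The
following sketch illustrates this on hereditarily finite sets. It is not
the function W itself: the two functions pair and union are stand-ins
chosen for illustration, not the nine basis functions of section 45, and
the names extend_order and BASIS are introduced here.

```python
from itertools import product

def extend_order(v_sorted, basis):
    """Given v listed in increasing r-order, list s(v) in the extended order.

    basis is a list of (function, arity) pairs standing in for the B_i.
    """
    rank = {x: n for n, x in enumerate(v_sorted)}
    best = {}   # new element -> (least index i, least argument tuple)
    for i, (fn, arity) in enumerate(basis):
        for args in product(v_sorted, repeat=arity):
            y = fn(*args)
            if y in rank:
                continue                    # old elements keep their place
            key = (i, tuple(rank[a] for a in args))
            if y not in best or key < best[y]:
                best[y] = key
    # old elements first, then new elements by (index, lexicographic args)
    return list(v_sorted) + sorted(best, key=best.get)

def pair(x, y):
    return frozenset({x, y})

def union(x, y):
    return x | y

BASIS = [(pair, 2), (union, 2)]   # illustrative stand-ins only
EMPTY = frozenset()
ONE = frozenset({EMPTY})

stage1 = extend_order([EMPTY], BASIS)
stage2 = extend_order(stage1, BASIS)
```

Each stage lists the previous one as an initial segment, which is the
end-extension property of the lemma.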
Define well-orders <Sα as follows.
< S0 = ∅
<Sα+1 = W (<Sα )
<Sα = ∪β<α <Sβ for limit ordinals α
Let <Jα equal <Sω·α . Let <J equal ∪α <Jα .
Lemma 9. For the following predicates, there is a Σ1 formula which
defines the predicate in Jα for any α.
a. y = <Sβ
b. y = <Jβ
Remarks on proof: The proof of lemma 7 need only be modified as
needed. ⊳
It should come as no surprise to the reader that there are Skolem
functions for the Jα with a variety of additional properties. As will be
seen, it is a fact of considerable importance that such can be defined
when the language has an additional unary predicate symbol A, in a
manner which holds for any A meeting the amenability requirement.
For the following lemma (and for later use) let φi denote the ith
formula in some fixed computable (and hence Σ1 in Vω ) enumeration of
the Σ1 formulas with free variables y and x, in the language expanded
with a unary predicate symbol A.
Lemma 10. There is a Σ1 formula in the expanded language, which
defines a Σ1 Skolem function in any structure M = ⟨Jα, A⟩ which is
amenable. That is, it defines a partial function h^M_J(i, x), such that if
∃y φi(x̊) is true in M, then φi(h(i, x̊), x̊) is true.
Remarks on proof: Define the following predicates:
- P1 (φ, y, x) is the predicate of corollary 45.9; this is Σ1 .
- P2 (i, y, x) = P1 (φi , y, x); this is also Σ1 .
- Write P2 as ∃w P3 (w, i, y, x) where P3 is ∆0 .
- P4 (z, i, x) = P3 (π1 (z), i, π2 (z), x); this is ∆0 .
- P5 (z, i, x) holds if and only if z is the <J -least z ′ such that
P4 (z ′ , i, x) holds; this is Σ1 .
Then y = hJ (i, x) if and only if ∃z(P5 (z, i, x)∧y = π2 (z)). To see that P5
is Σ1, write it as P4(z, i, x) ∧ ∃v(v = {z′ : z′ <J z} ∧ ∀z′ ∈ v ¬P4(z′, i, x));
that v = {z ′ : z ′ <J z} is Σ1 follows by lemma 9. ⊳
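The construction selects the second component of the least pair z which
satisfies the matrix. A toy version may clarify the idea. In the sketch
below, which is illustrative only, a finite range of integers replaces
Jα, the lexicographic order on pairs replaces <J, and the two clauses of
P3 are hypothetical stand-ins for the enumerated formulas φi.

```python
from itertools import product

def skolem(universe, P3, i, x):
    """Return y from the least pair z = (w, y) with P3(w, i, y, x), else None."""
    for w, y in product(universe, repeat=2):   # pairs in increasing order
        if P3(w, i, y, x):
            return y
    return None                                # h is a partial function

def P3(w, i, y, x):
    """Hypothetical matrices; w plays the role of the Σ1 witness."""
    if i == 0:      # "y is a square root of x", with w recording y * y
        return w == y * y and w == x
    if i == 1:      # "y is a proper divisor of x", with w the cofactor
        return 1 < y < x and w * y == x
    return False

UNIVERSE = range(20)
```

Note that skolem(UNIVERSE, P3, 1, 12) returns 6 rather than 2, because
the least pair is (2, 6): the order is on pairs ⟨w, y⟩, not on y alone.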
The superscript M may be omitted when it is clear. As the following lemma shows, a Skolem function such as hJ permits taking a
Skolem hull with a single function application. A more general fact will
be proved, which is useful in fine structure theory.
Let M̄ denote a structure ⟨M, A⟩ where A is a unary predicate
(only a single unary predicate is considered here, but the case of several
unary predicates is essentially the same). A function h : D ↦ M where
D ⊆ ω × M is said to be a Σn Skolem function for M̄ if, whenever
∃y F_{yx}(x̊) is true in M̄ then F_{yx}(h(i, x̊), x̊) is true for some i. This is a
slight generalization of the hypothesis of lemma 10, where F = φi ; the
more general hypothesis suffices.
Recall from section 5 that in set theory, a function of several variables is an abbreviation for a function on a Cartesian product. In particular, to simplify the notation, angle brackets may be omitted in the
argument list of a function. This abbreviation is used in the following
lemma, and occasionally subsequently.
Lemma 11. Suppose M̄ = ⟨M, A⟩ is a structure as above with
M rud-closed. Suppose h is a Σn Skolem function, which is defined
in M by a Σn formula with parameter p ∈ M. Suppose X ⊆ M is
nonempty and closed under ordered pairs. Let N = h[ω × {p} × X], and
let N̄ = ⟨N, A ∩ N⟩. Then X ⊆ N and N̄ ≺n M̄.
Proof: Suppose x ∈ X, and consider the formula “w = π2(⟨p, x⟩)”.
The unique value of w satisfying this is x, from which it follows that
x = h(i, ⟨p, x⟩) for some i, and so x ∈ N. Suppose F(w, q1, …, qn) is
true for some w, where qj ∈ N for 1 ≤ j ≤ n. Then qj = h(ij, ⟨p, xj⟩)
for some xj ∈ X. A Σn formula F′ can thus be defined, so that for
any w, F(w, q1, …, qn) holds in M̄ if and only if F′(w, ⟨p, ⟨x1, …, xk⟩⟩)
(the integers ij may be defined by formulas). The latter holds for w =
h(i, ⟨p, ⟨x1, …, xk⟩⟩) for some i, which is in N. Thus, F holds for some
w ∈ N . By Tarski’s criterion, lemma 20.1, suitably modified for Σn
formulas, it follows that N̄ ≺n M̄ . See also remarks preceding lemma
2.8 of [Jensen]. ⊳
The parameter p is unnecessary for the function hJ , but is needed
in other situations. From the proof it is easily seen that p ∈ N̄ .
There is a condensation lemma for the Jα hierarchy. For the following, by lemma 45.4, if N ≺1 Jα for some α then the transitive collapse
of N may be taken.
Lemma 12. Suppose α ∈ Ord and N ≺1 Jα , and let π be the
collapsing isomorphism. Then π[N ] = Jβ where β ≤ α. Further, for
x ∈ N , π(x) ≤J x.
Remarks on proof: By lemma 7.a, if γ ∈ ω · α ∩ N then Jγ ∈ N ,
since the formula giving Jγ from γ is Σ1 , so holds in N , and Σ1 formulas
are up-absolute, so the formula defines Jγ from γ in N . Using lemma
45.4 and induction on γ, it may be seen that π(Sγ ) = Sπ(γ) . Letting
β = π[N ∩ Ord], that Jβ ⊆ π[N] is straightforward. The opposite
inclusion holds because if x = π(w) then ∃γ(w ∈ Sγ ) is true in Jα , so
is true in N , and x ∈ Sπ(γ) follows. For the second claim, since <J is
uniformly Σ1 , x <J y if and only if π(x) <J π(y). If x <J π(x) for
some x, let x be the <J -least such. Since π(x) ∈ Jβ , x ∈ Jβ , and so
x = π(w) for some w ∈ N . Then w <J x, so π(w) ≤J w by choice of x,
so x ≤J w, a contradiction. For details see lemma VI.2.9 of [Devlin]. ⊳
If ⟨N, B⟩ ≺0 ⟨M, A⟩ then B = A ∩ N, since A(x) is a ∆0 formula.
Thus, if M̄ = ⟨Jα, A⟩, and N̄ ≺1 M̄ where N̄ = ⟨N, A ∩ N⟩, then the
conclusion of the lemma holds.
Following are some further lemmas, which will be needed later.
Recall the function Γ defined in section 13. An ordinal number α is
called a δ-number if whenever β, γ < α then β · γ < α. A treatment of
δ-numbers may be found in [Monk1].
Lemma 13.
a. Γ has a Σ1 definition, which is provably total in KP.
b. If α is an infinite δ-number then Γ[α × α] = α.
c. For each α there is a surjection g : ω · α ↦ (ω · α) × (ω · α), which
is Σ1-definable in Jα from parameters.
d. If p is a parameter such that there is a function g as in part c, which
has a definition from p, then Jα = hJ(ω × {p} × ω · α).
e. For each α there is a surjection g : ω · α ↦ Jα, which is Σ1-definable
in Jα from parameters.
Remarks on proof: Part a is proved in lemma II.6.6 of [Devlin]. For
part b, it follows by ordinal arithmetic that Γ(β, γ) ≤ (max(β, γ) + 1)^2;
see lemma 1 of [Linden]. Thus if α is an infinite δ-number and β, γ < α
then Γ(β, γ) ≤ α. That Γ[α × α] ≥ α follows because the map γ ↦
Γ(0, γ) is increasing. This proves part b. Part c is proved in lemma
VI.3.15 of [Devlin]. Part d is proved in lemma 2.10 of [Jensen]. Part e
follows using part d. ⊳
Suppose α is an infinite δ-number. It is easily seen that either α = ω
or ω · α = α. The function Γ^{−1}, which has a parameter-free definition,
may be used as g in part c. Likewise, there is a function which has a
parameter-free Σ1 definition, which may be used as g in part e. Note
that an admissible ordinal is a δ-number, since the existence condition
for ordinal multiplication is provable in KP.
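The bound and the monotonicity used in the proof of part b can be checked
on the finite ordinals. The sketch below assumes that Γ is the pairing
function induced by Gödel's canonical well-order of pairs (compare
maxima first, then first components, then second components); the
definition in section 13 may break ties differently, in which case the
closed form changes but the two properties checked remain of the same
kind. The names gamma and gamma_by_sorting are introduced here.

```python
def gamma(a, b):
    """Position of (a, b) in the canonical order, for natural numbers a, b."""
    m = max(a, b)
    # The m*m pairs with maximum below m come first; then (0,m), ..., (m-1,m),
    # then (m,0), ..., (m,m).
    return m * m + a if a < m else m * m + m + b

def gamma_by_sorting(n):
    """Tabulate the same function on n x n by explicitly sorting the pairs."""
    pairs = sorted(
        ((a, b) for a in range(n) for b in range(n)),
        key=lambda p: (max(p), p[0], p[1]),
    )
    return {p: k for k, p in enumerate(pairs)}
```

On this assumption gamma(0, c) = c^2, which is increasing in c, and the
largest value with maximum m is gamma(m, m) = (m + 1)^2 − 1.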
A function φ ⊆ X × Y is said to uniformize a relation R ⊆ X × Y
if π1[φ] = π1[R] and R(φ(y⃗), y⃗). A structure M is said to be
Σn-uniformizable if and only if, whenever R is a predicate which is
Σn-definable in M from parameters, then there is a function φ, which is
Σn-definable in M from parameters, such that φ uniformizes R.
Uniformization is useful in constructibility theory, and will be encountered
again in section 58.
Lemma 14. Suppose ⟨M, A⟩ is a transitive rud-closed amenable
structure. If ⟨M, A⟩ is Σn-uniformizable then it has a Σn-definable Σn
Skolem function.
Remarks on proof: This is lemma VI.3.12 of [Devlin]. It follows
using corollary 45.9. ⊳
Recall the strong Σ1 collection axiom from section 17. Say that an
admissible set is strongly admissible if it satisfies this axiom. Say that
an ordinal α is nonprojectible if there is no function f , Σ1 -definable in
Jα from parameters, with domain a set of ordinals bounded below ω · α,
and range ω · α.
Lemma 15.
a. Jα is admissible if and only if there is no function f , definable in
Jα from parameters, with domain an ordinal γ < ω · α, and range
unbounded in ω · α.
b. Jα is strongly admissible if and only if there is no function f , definable in Jα from parameters, with domain a set of ordinals bounded
below ω · α, and range unbounded in ω · α.
c. If ω · α is nonprojectible then Jα is strongly admissible.
Remarks on proof: These statements, and some further statements,
can be found in lemmas 2.11 to 2.13 of [Jensen]. ⊳
The relativized Jensen hierarchy J^A_α is defined by the same recursion
as the unrelativized hierarchy, except J^A_{α+1} = Rud^A(J^A_α), where
Rud^A(X) denotes the A-relativized rud-closure of X ∪ {X}, as defined
in section 45. Likewise, s^A(u) = u ∪ ⋃_{i=0}^{9} B_i[u^{k_i}], and S^A(u) and S^A_α are
defined in terms of sA . Discussion of this hierarchy will be cursory, but
it is of considerable importance in modern set theory. [Dodd] contains
a comprehensive introduction.
Some properties of the unrelativized hierarchy continue to hold. In
particular:
- Every J^A_α is transitive. Indeed, if the set of basis functions is
enlarged then every S^A_α is transitive. (See lemma 2.2 of [Dodd].)
- Lemmas 7 to 9 continue to hold. (See lemmas 1.10 and 1.11 of
[SchZem].)
- Every structure ⟨J^A_α, A ∩ J^A_α⟩ is amenable (remarks following
definition 1.9 of [SchZem]).
- There is a uniform Σ1 Skolem function, which will be denoted h^M_J
(lemma 1.15 of [SchZem]).
- Lα[A] ⊆ J^A_α ⊆ L_{ω·α}[A]. (The proof of lemma VI.2.4 of [Devlin]
may be modified as necessary.)
In particular, L[A] = ⋃_α J^A_α.
Lemma 16.
a. If A is definable in Jα then J^A_{α+1} = J_{α+1}.
b. If ⟨Jα, A⟩ is amenable then J^A_α = Jα.
Proof: For part a, A ∈ J_{α+1}, so by induction S^A_γ ⊆ J_{α+1} for
γ < ω · (α + 1). Part b is similar. ⊳
47. Fine structure.
The “fine structure theory” of the Jensen hierarchy involves defining
a system of structures. These are used in various ways, once a definition
has been given. Fine structure theory may be considered for L, or more
generally L[B]. Initially, the case of L will be of principal interest, but
the case of L[B] will be of interest later, so some preliminary facts will
be given for both cases.
A structure M = ⟨J^B_α, B ∩ J^B_α, A⟩ where α > 0 will be called a
J-structure. ⟨Jα, A⟩ and Jα are special cases of interest. As in [Dodd],
the notation J̄^B_α will be used to denote ⟨J^B_α, B ∩ J^B_α⟩; ⟨J̄^B_α, A⟩ denotes
M. The notation “p ∈ M” will be used for “p ∈ J^B_α”.
Let M be a J-structure. The notation “Σ^M_n” will be used to abbreviate
“Σn-definable in M from parameters”.
a. Let ρ^a_M be the largest ordinal ρ such that, if X ⊆ J^B_ρ is Σ^M_1, then
⟨J^B_ρ, B ∩ J^B_ρ, A ∩ J^B_α⟩ is amenable.
b. Let ρ^b_M be the smallest ordinal ρ such that there is a subset X ⊆ ω · ρ
which is Σ^M_1, such that X ∉ J^B_α.
c. Let ρ^c_M be the smallest ordinal ρ such that there is a partial function
f which is Σ^M_1, such that Dom(f) ⊆ ω · ρ and Ran(f) = ω · α.
Theorem 1.
a. ρ^a_M ≤ ρ^b_M.
b. ρ^b_M ≤ ρ^c_M.
c. If M = Jα then ρ^c_M ≤ ρ^a_M.
Proof: For part a, suppose η < ρ^a_M, and X ⊆ ω · η is Σ^M_1. ⟨J_{ρ^a_M}, X⟩
is amenable, X = X ∩ Jη, and Jη ∈ J_{ρ^a_M}, so X ∈ J_{ρ^a_M}, so X ∈ Jα. Thus,
η < ρ^b_M; since this is so whenever η < ρ^a_M, ρ^a_M ≤ ρ^b_M.
For part b, suppose f is as in the definition of ρ^c_M. Let g : ω · α ↦ Jα
be a surjection as in lemma 46.13. Let f′ = g ◦ f; then f′ is Σ^M_1. Let
X = {γ < ω · ρ^c_M : γ ∉ f′(γ)}; then X is Σ^M_1. If X ∈ Jα then X = f′(γ0)
for some γ0 < ω · ρ^c_M. But then γ0 ∈ X if and only if γ0 ∈ f′(γ0) if
and only if γ0 ∉ X, a contradiction. Thus, X ∉ Jα, and this shows that
ρ^b_M ≤ ρ^c_M.
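The argument for part b is a diagonal argument, and its combinatorial core
can be seen on a finite table. In the sketch below, which is illustrative
only, TABLE plays the role of f′ (an enumeration of sets indexed by
ordinals) and diagonal computes the set X; the names are introduced here.

```python
def diagonal(f_prime, domain):
    """Return X = {g in domain : g not in f_prime(g)}."""
    return frozenset(g for g in domain if g not in f_prime(g))

# A hypothetical enumeration of four sets of indices.
TABLE = {
    0: frozenset(),
    1: frozenset({0, 1}),
    2: frozenset({1}),
    3: frozenset({0, 2, 3}),
}

X = diagonal(TABLE.__getitem__, TABLE)
```

X differs from the set with index g at the point g itself, so X is not
in the range of the enumeration.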
For any ordinal γ > 0 there is a Σ^M_1 function g : ω · γ ↦ ω · (γ + 1);
namely map 2m to m, map 2m + 1 to ω · γ + m, and remaining values to
themselves. From this it is easily seen that if ρ^c_M > 1 then ρ^c_M ∈ LimOrd.
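The map g can be checked for finite γ. In the following sketch, which is
illustrative only, the ordinal ω · c + n is represented by the pair
(c, n), and surjectivity is verified only on a finite window of each
copy of ω.

```python
def g(c, n, gamma):
    """Send the ordinal ω·c + n, where c < gamma, into ω·(gamma + 1)."""
    if c == 0:
        # finite ordinals: 2m goes to m, and 2m + 1 goes to ω·gamma + m
        return (0, n // 2) if n % 2 == 0 else (gamma, n // 2)
    return (c, n)   # remaining values are sent to themselves

GAMMA, WINDOW = 3, 10
image = {g(c, n, GAMMA) for c in range(GAMMA) for n in range(2 * WINDOW)}
targets = {(c, n) for c in range(GAMMA + 1) for n in range(WINDOW)}
```

Every pair in targets is in image, which is the finite shadow of the
statement that g maps ω · γ onto ω · (γ + 1).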
For part c, it suffices to show that if X ⊆ J_{ρ^c_M} is Σ^M_1, then ⟨J_{ρ^c_M}, X⟩
is amenable. If ρ^c_M = 1 then J_{ρ^c_M} = Vω, and the claim follows, because
a subset of an element of Vω is in Vω. Thus, it may be supposed that
ρ^c_M is a limit ordinal. Suppose γ < ρ^c_M. Let Y = X ∩ Jγ; then Y is
defined in Jα by a Σ1 formula with parameter q ∈ Jα.
Let Z = hJ[ω × {q} × Jγ]. By lemma 46.11, it follows that Z ≺1 Jα,
Jγ ⊆ Z, and q ∈ Z. By lemma 46.12, π[Z] = Jᾱ for some ᾱ, where π is
the collapsing isomorphism. Since Y ⊆ Jγ, π ↾ Y is the identity map.
Let q̄ = π(q). Since π is an isomorphism, Y has a Σ1 definition in Jᾱ
with parameter q̄. It follows that Y ∈ Jᾱ+1.
Using the definition of Z, and the partial function π[hJ ∩ (ω × Z) × Z],
it is easy to see that there is a partial function f1 which is Σ^{Jᾱ}_1, such that
Dom(f1) ⊆ Jγ and Ran(f1) = Jᾱ. Let f2 be a partial function which
is Σ^{Jα}_1, such that Dom(f2) ⊆ ω · ρ^c_M and Ran(f2) = ω · α. If ᾱ ≥ ρ^c_M
held, then using lemma 46.13, f1, an easily defined surjection from Jᾱ
to ω · ᾱ, and f2, a Σ1 partial function from ω · γ to ω · α could be defined,
contradicting γ < ρ^c_M. Thus, ᾱ < ρ^c_M, whence Y ∈ J_{ρ^c_M}.
Thus far it has been shown that if γ < ρ^c_M then X ∩ Jγ ∈ J_{ρ^c_M}.
If x ∈ J_{ρ^c_M} then x ∈ Jγ for some γ < ρ^c_M. Then x = x ∩ Jγ, so
x ∩ X = x ∩ Jγ ∩ X; since x and Jγ ∩ X are in J_{ρ^c_M}, x ∩ Jγ ∩ X is. This
completes the proof that ρ^c_M ≤ ρ^a_M. ⊳
Note that the proof that ρ^c_M ≤ ρ^a_M does not apply in general,
because it can only be concluded that Y has a definition in ⟨J^{π[B]}_ᾱ, π[A]⟩.
In the case M = Jα , the common value of the theorem is called the
Σ1 -projectum of α; ρα will be used to denote it. As will be seen, the
projectum is central to fine structure theory. It is also used in the branch
of higher recursion theory known as “α-recursion theory”; see [Sacks2].
Jρα is a “reduct” of Jα , and has properties not held by an arbitrary Jα ,
for example the following.
Lemma 2. For any J-structure M, ω · ρ^c_M is nonprojectible.
Proof: By definition there is no Σ^M_1 partial function f such that
Dom(f) is a bounded subset of ω · ρ^c_M and Ran(f) = ω · ρ^c_M. A fortiori
there is no such f which is Σ1 in Jα. ⊳
By lemma 46.15.c, ω · ρ^c_M is admissible, and hence as noted in section
46 is a δ-number.
It is useful to have a definition of the projectum ρ_M for any
J-structure M. In [DoddJen1] (and [SchZem] and [Mitchell2]), ρ^b_M is used,
and will be here. A value p ∈ M is called a good parameter if there is
a subset X ⊆ ω · ρ_M which is Σ1-definable in M from the parameter
p, such that X ∉ Jα. Since ρ_M is defined as ρ^b_M, a good parameter
exists. Let p_M be the <J-least good parameter. The fact that p_M can
be defined in this way for any M is a main advantage of using ρ_M as
the definition of the projectum.
[DoddJen1], and various other authors, allow only finite sets of ordinals
as parameters, while others, such as [Welch1], allow any element
of Jα, as has been done here. The less restrictive definition permits
connecting the recursion equations given below with the original definitions
(as found in [Devlin]). The more restrictive definition has additional
properties of use in some applications. They are equivalent to some
extent, since J^B_α = h^M_J[ω × (ω · α)^{<ω}] (see lemma 2.36 of [Dodd]).
The projectum of Jα may be iterated, provided structures M =
⟨Jα, A⟩ are considered. For any such M, for any p ∈ Jα let A^p_M be the
set of pairs ⟨i, x⟩ such that x ∈ J_{ρ_M} and φi(x, p) is true in M, where φi
is the enumeration of formulas as in lemma 46.10. Using corollary 45.9,
this is readily seen to be Σ1-definable in M, from the parameter ⟨p, ρ_M⟩,
or p if ρ_M = α. Let A_M be A^p_M where p = p_M. (These quantities can
be defined in the relativized case, but the definition is more complicated;
see below.)
Let P(M) denote ⟨J_{ρ_M}, A_M⟩.
- P0(Jα) = ⟨Jα, ∅⟩
- ρ^{n+1}_α = ρ_{Pn(Jα)}
- p^{n+1}_α = p_{Pn(Jα)}
- A^{n+1}_α = A_{Pn(Jα)}
- Pn+1(Jα) = P(Pn(Jα))
It is convenient to define ρ^0_α = α, p^0_α = ∅, and A^0_α = ∅. A simple
induction shows that Pn(Jα) = ⟨J_{ρ^n_α}, A^n_α⟩. Also, for n > 0, p^n_α ∈ J_{ρ^{n−1}_α}
and A^n_α ⊆ J_{ρ^n_α}. ρ^n_α is called the Σn-projectum of α, p^n_α the standard
parameter, and A^n_α the standard code (or master code).
Originally (in [Jensen]), these values were defined as those having
certain properties, and various basic facts, including the above recursion equations, proved to hold. This will be done here, following the
presentation in [Devlin]. Some additional facts are needed, since the
definition of p_M differs from that given in [Devlin]. Let ρ^{na}_α, ρ^{nb}_α, and
ρ^{nc}_α be defined as in theorem 1, but for Σn formulas.
Theorem 3. Suppose M = Jα and n ≥ 1. Then M is Σn-uniformizable.
Remarks on proof: References will be to [Devlin]. Let an be the
statement of the theorem for a particular n. Let bn be the statement
that ⟨Jα, X⟩ is amenable for any X ⊆ J_{ρ^{nc}_α} which is Σ^M_n. Let cn be
the statement that, if R(x, y⃗) is Σ^M_n and u ∈ J_{ρ^{nc}_α} then ∀x ∈ u R(x, y⃗)
is Σ^M_{n+1}. Lemma VI.3.13 states that a1. Lemma VI.4.3 states that
an ⇒ bn . Lemma VI.4.4 states that cn ∧ an ⇒ an+1 . Using these, it
suffices to show that cn follows from cj for 1 ≤ j < n. The proof of this
may be found in theorem VI.4.5; by the induction hypotheses and facts
above, aj and bj may be assumed for 1 ≤ j ≤ n. ⊳
A proof of theorem 3, for Lα where α is an admissible ordinal, may
be found in theorem 1.27 of [Chong].
Theorem 4. ρ^{na}_α = ρ^{nb}_α = ρ^{nc}_α.
Remarks on proof: By theorem 1 ρ^{na}_α ≤ ρ^{nb}_α and ρ^{nb}_α ≤ ρ^{nc}_α. That
ρ^{nc}_α ≤ ρ^{na}_α follows mutatis mutandis as part c of theorem 1 (see lemma
VI.4.3 of [Devlin]). By theorem 3 and lemma 46.14, there is a Σn Skolem
function which is Σn-definable in Jα from some parameter p; p must be
included in taking the Skolem hull Z, as in lemma 46.11. ⊳
Note that ρ^n_α is defined earlier; it will be seen in theorem 8 that it
equals the common value of the theorem.
In a structure S, a formula Fx is said to define an element x̊ if
Fx (ẘ) is true if and only if ẘ = x̊. If S is an ∈-structure there is some
ambiguity in the use of the term “definable” for an element, in that it
might be defined as an element, or as a unary predicate; usually this
causes no confusion.
Lemma 5. For any J-structure M, ordinal ρ ≤ α, and element
p ∈ M, the following are equivalent.
a. Every x ∈ M has a Σ1 definition in M from parameters in J^B_ρ ∪ {p}.
b. J^B_α = h^M_J[ω × {p} × Jρ].
c. There is a partial function f which is Σ1-definable in M from p,
such that f[Jρ] = J^B_α.
If ρ is an infinite δ-number then the preceding hold if and only if
d. there is a partial function f which is Σ1-definable in M from p,
such that f[ω · ρ] = Jα.
Proof: Write p̊ for the element, and x for the variable, etc. Suppose
ψ(x, ⟨p̊, ẘ⟩) defines x̊ in M, where ẘ ∈ J^B_ρ. Then ∃xψ(x, ⟨p̊, ẘ⟩) is true
in M. Since there is exactly one possible value for x, x̊ = h^M_J(i, ⟨p̊, ẘ⟩),
where φi = ∃xψ. Thus, a⇒b. For b⇒c, let f(v) = hJ(i, p, w) if v is
of the form hi, wi. For c⇒a, the formula x = f (ẘ) defines x̊ when x̊ =
f (ẘ). For part d, let g be a function as in lemma 46.13.e, which may be
taken to have a parameter-free definition by the additional hypothesis.
Given f as in part c, f ◦ g is as in part d; and the opposite implication
is trivial. ⊳
A parameter p ∈ M is said to be very good if J^B_α = h^M_J[ω × {p} ×
J_{ρ_M}].
Lemma 6. A very good parameter is good.
Remarks on proof: This is lemma 3.0.4 of [Welch1]. Given a very
good parameter, an f as in the definition of ρ^c_M can be constructed,
showing that ρ^c_M ≤ ρ^b_M, whence ρ^b_M = ρ^c_M, whence ω · ρ_M is a δ-number.
Let f be as in part d of lemma 5. Let X = {γ < ω · ρ^M_1 : γ ∉ f(γ)}; by
the argument in the proof of theorem 1.b, X ∉ Jα. ⊳
The following additional observations may be made.
- If p_M is a very good parameter then it is the <J-least very good
parameter (by lemma 6).
Suppose M = ⟨Jα, A⟩.
- If there is a very good parameter then ρ^a_M = ρ^b_M = ρ^c_M (as in
theorem 4).
- If there is a very good parameter then P(M) is amenable (since
A_M is Σ^{Jα}_1 and ρ^a_M = ρ^b_M).
To avoid complication, the following lemma will be given only for
the unrelativized case, although a suitably formulated version holds in
the relativized case. Let P(M, p) denote ⟨J_{ρ_M}, A^p_M⟩.
Lemma 7. Suppose M = ⟨Jα, A⟩ is an amenable structure, and p ∈
Jα is a good parameter. Suppose j : K ↦ P(M, p) is a ∆0-elementary
embedding. Then there is an amenable structure N = ⟨Jβ, B⟩ and a
very good parameter q ∈ Jβ, such that K = P(N, q). Further, there is
a Σ1-elementary embedding ĵ : N ↦ M such that j ⊆ ĵ and ĵ(q) = p.
Finally, if p equals p^M_1 then q = p^N_1.
Remarks on proof: This is (the unrelativized case of) lemma 3.3 of
[SchZem] (with less restricted parameters).
Let X = Ran(j), let Y = h^M_J[ω × {p} × X], and let π be the
transitive collapse map for Y. It is easily seen that X is closed under
ordered pairs, whence by lemmas 46.10 and 46.11 Y ≺1 M, whence by
lemma 46.12 π[Y] equals Jβ for some β ≤ α. Let B equal π[A ∩ Y]. Let
ĵ be the inverse of π. Write K as ⟨Jρ̄, Ā⟩.
Suppose x ∈ X, y ∈ Y , and y ∈ x. Then x ∈ P(M, p) since
X ⊆ P(M, p), whence y ∈ P(M, p) since P(M, p) is transitive. Also,
y = hJ (i, p, z) for some z ∈ X. This may be expressed as AMp (k, y, z)
(which recall is an abbreviation for AMp (⟨k, ⟨y, z⟩⟩)) for some k, and
∃y ∈ xAMp (k, y, z) is true. Let x̄, z̄ be such that x = j(x̄) and z = j(z̄);
since j is ∆0 -elementary, ∃ȳ ∈ x̄Ā(k, ȳ, z̄) is true; choosing some ȳ,
AMp (k, j(ȳ), z) is true, so j(ȳ) = hJ (i, p, z), so j(ȳ) = y, and y ∈ X.
It follows by the preceding paragraph that $\pi \upharpoonright X$ is the inverse
map to $j$, and $j \subseteq \hat\jmath$. Let $q = \pi(p)$. It also follows that
$N = h_J^N[\omega \times \{q\} \times J_{\bar\rho}]$.
From this it follows that $\rho_N \le \rho^c_N \le \bar\rho$.
For x ∈ Jρ̄ and i ∈ ω, φi (x, q) holds in N if and only if φi (j(x), p)
holds in M if and only if AMp (i, j(x)) if and only if Ā(i, x).
Suppose $P \subseteq J_{\bar\rho}$ is $\Sigma_1^N$. Since
$N = h_J^N[\omega \times \{q\} \times J_{\bar\rho}]$ there is
a w ∈ Jρ̄ and a Σ1 formula Q such that P (x) ⇔ Q(x, w, q). By the
preceding paragraph there is an i such that P (x) if and only if Ā(i, x, w).
If γ < ω · ρ̄ then {w ∈ Jγ : Ā(i, x, w)} is an element of Jρ̄ , and it follows
that P ∩ Jγ is an element of Jρ̄ . As noted in the proof of theorem 1, it
follows that hJρ̄ , P i is amenable. Since P was arbitrary, ρ̄ ≤ ρaN ≤ ρN .
Thus, ρ̄ = ρN . By the above noted fact about Ā, Ā = AN q ; and so
K = P(N, q). That q is a very good parameter has already been noted.
The proof of the last claim may be found in the proof of lemma 3.6
of [Mitchell2]. Certainly $p_N \le_J q$. Suppose $p_N <_J q$. Since $q$ is very
good, $p_N$ is $\Sigma_1$-definable in $J_\beta$ from $q$. Since $\hat\jmath$ is
$\Sigma_1$-elementary, $\hat\jmath(p_N)$ is $\Sigma_1$-definable in $J_\alpha$ from
$\hat\jmath(q) = p_M$. This implies that $\hat\jmath(p_N)$ is a very good
parameter, which is a contradiction since
$\hat\jmath(p_N) <_J \hat\jmath(q) = p_M$. ⊳
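Theorem 8. For $\alpha > 0$ and $n > 0$ the following hold.
a. $\rho^n_\alpha = \rho^{nc}_\alpha$.
b. $p^n_\alpha$ is a very good parameter.
c. $A^n_\alpha$ is $\Sigma_n^{J_\alpha}$.
d. A subset of $J_{\rho^n_\alpha}$ is $\Sigma_{n+1}^{J_\alpha}$ if and only if it is
$\Sigma_1^{P_n(J_\alpha)}$.
e. Suppose $N = \langle J_\beta, B \rangle$ and $j : N \mapsto P_n(J_\alpha)$ is a
$\Delta_0$-elementary embedding. There is an $\bar\alpha \le \alpha$ such that
$N = P_n(J_{\bar\alpha})$. There is a $\Sigma_n$-elementary embedding
$\hat\jmath : J_{\bar\alpha} \mapsto J_\alpha$ such that $j \subseteq \hat\jmath$ and
$\hat\jmath(p^{\bar\alpha}_i) = p^\alpha_i$ for $1 \le i \le n$.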
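Remarks on proof: The proof is by induction on $n$. $M$ will be used
to denote $P_{n-1}(J_\alpha)$.
First, since $\rho^{nc}_\alpha$ is a δ-number there is a $\Sigma_n^{J_\alpha}$
partial function $f$ such that $\mathrm{Dom}(f) \subseteq \omega \cdot \rho^{nc}_\alpha$
and $\mathrm{Ran}(f) = J_\alpha$. Let
$\bar f = f \cap (\omega \cdot \rho^{nc}_\alpha \times J_{\rho^{n-1}_\alpha})$.
Using lemma 46.9, $\bar f$ is $\Sigma_n^{J_\alpha}$, whence by part d of the
induction hypothesis, $\bar f$ is $\Sigma_1^M$. It follows that
$\rho^c_M \le \rho^{nc}_\alpha$.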
Second, $\rho^{na}_\alpha = \rho^a_M$. This follows using part d of the induction
hypothesis, as in the proof of lemma VI.5.4 of [Devlin].
Part a follows from the preceding two paragraphs and theorem 1.a
and 1.b.
Part b is lemma 3.4 of [Mitchell2]; see also lemma 9.2 of [SchZem].
Let $Y = h_J^M[\omega \times \{p_M\} \times J_{\rho_M}]$, and let $\pi$ be the
transitive collapse map for $Y$.
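If $n = 1$, as in lemma 7 $\pi[Y]$ equals $J_{\bar\alpha}$ for some
$\bar\alpha \le \alpha$. Now, $\rho^1_\alpha = \rho_\alpha$, whence there is an
$X \subseteq \omega \cdot \rho^1_\alpha$, which is $\Sigma_1$-definable in $J_\alpha$
from $p^1_M$, such that $X \notin J_\alpha$. Since $J_{\rho^1_M} \subseteq Y$ and
$p^1_M \in Y$, it follows that $X$ is $\Sigma_1$-definable in $J_{\bar\alpha}$ (from
$\pi(p^1_M)$). If $\bar\alpha < \alpha$ then $X \in J_\alpha$ would follow, a
contradiction. Thus, $\bar\alpha = \alpha$. By lemma 46.12,
$\pi(p^1_M) \le p^1_M$. Since $A^1_M$ is $\Sigma_1$-definable in $J_\alpha$ from
$\pi(p^1_M)$, it follows that $\pi(p^1_M) = p^1_M$. It follows that $\pi$ is the
identity map and $Y = J_\alpha$, and so $p^1_M$ is a very good parameter.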
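For $n > 1$, by lemma 7 with $j$ the identity map on $P_n(J_\alpha)$, $\pi[Y]$
equals $J_{\bar\rho}$ for some $\bar\rho \le \rho^{n-1}_\alpha$. Let
$\bar A = \pi[A^{n-1}_\alpha]$. By part e of the induction hypothesis,
$\bar\rho = \rho^{n-1}_{\bar\alpha}$ and $\bar A = A^{n-1}_{\bar\alpha}$ for some
$\bar\alpha \le \alpha$.
By definition of $\rho^n_\alpha$, there is an $X \subseteq \omega \cdot \rho^n_\alpha$,
which is $\Sigma_1^M$, such that $X \notin J_{\rho^{n-1}_\alpha}$. Since
$J_{\rho^1_M} \subseteq Y$ and $p^1_M \in Y$, it follows that $X$ is
$\Sigma_1^{\langle J_{\bar\rho}, \bar A \rangle}$. It follows that $X$ is
$\Sigma_1^{J^{\bar A}_{\bar\rho}}$, whence $X \in J^{\bar A}_{\bar\rho+1}$. Since
$\langle J_{\bar\rho}, \bar A \rangle$ is amenable, it follows by lemma 46.16 that
$X \in J_{\bar\rho+1}$. As in the case $n = 1$, $\bar\rho = \rho^{n-1}_\alpha$;
and further $\pi(p^1_M) = p^1_M$, whence $\pi$ is the identity map.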
In remarks preceding lemma VI.5.3 of [Devlin], pnα is defined as the
least very good parameter, as may be seen using lemma 5. By part b
and lemma 6, this equals pnα as defined here. Parts c and d follow by
the proof of lemma VI.5.3.
For part e, using lemma 7 there is a $\Sigma_1$ embedding
$j_1 : \langle J_\gamma, C \rangle \mapsto P_{n-1}(J_\alpha)$ which extends $j$, and
with $j_1(q) = p^n_\alpha$ where $q$ is a very good parameter in $J_\gamma$. By part
e of the induction hypothesis (or lemma 46.12 if $n = 1$),
$\gamma = \rho^{n-1}_{\bar\alpha}$ for some $\bar\alpha$, and
$C = A^{n-1}_{\bar\alpha}$. It follows that $\beta = \rho^n_{\bar\alpha}$.
It also follows that $q = p_{\bar\alpha}$, by the last claim of lemma 7.
$\hat\jmath$ is obtained from $\hat\jmath_1$ (or is the inverse of the transitive
collapse if $n = 1$). The proof that $\hat\jmath$ is $\Sigma_n^{J_\alpha}$ follows
inductively using lemma 3.2 of [SchZem]. ⊳
Theorem 8.d is a principal fact of fine structure theory. In Jα , the
standard code “codes” the Σn+1 subsets of Jρ as Σ1 subsets, where ρ is
the n-th projectum. Some additional facts will be given in the next few
paragraphs.
Theorem 8.d can be strengthened. For any $l \ge 1$, a subset of
$J_{\rho^n_\alpha}$ is $\Sigma_{n+l}^{J_\alpha}$ if and only if it is
$\Sigma_l^{P_n(J_\alpha)}$. See lemma VI.5.3 of [Devlin].
The map $\hat\jmath$ of lemma 7 is unique. Indeed, suppose
$j : P(N, q) \mapsto P(M, p)$ is a $\Delta_0$-elementary embedding, and $q$ is a
very good parameter. Then there is a unique $\Delta_0$-elementary embedding
$\hat\jmath : N \mapsto M$ such that $j \subseteq \hat\jmath$ and
$\hat\jmath(q) = p$; further $\hat\jmath$ is $\Sigma_1$-elementary. See lemma 3.1 of
[SchZem].
Theorem 8.e can be strengthened. Both $\bar\alpha$ and $\hat\jmath$ are unique. If
$j$ is $\Sigma_m$-elementary, then for $0 \le i \le n$,
$\hat\jmath \upharpoonright J_{\rho^i_{\bar\alpha}}$ is a
$\Sigma_{n-i+m}$-elementary embedding of $P_i(J_{\bar\alpha})$ in $P_i(J_\alpha)$.
See theorem 8.6 of [Devlin].
As already indicated, suitably reformulated, various facts of fine
structure theory continue to hold for the relativized Jensen hierarchy.
Relativized fine structure theory is an essential ingredient of the branch
of set theory known as “core model theory”.
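A J-structure $M$ is said to be acceptable if the following holds:
- Suppose $\beta < \alpha$, $\gamma < \omega \cdot \beta$, and
$\mathrm{Pow}(\gamma) \cap (J^B_{\beta+1} - J^B_\beta) \ne \emptyset$; then in
$J^B_{\beta+1}$, there is a function $f \in J^B_{\beta+1}$ which is a surjection
from $\gamma$ to $\omega \cdot \beta$.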
In [DoddJen1] the notion defined above is called “strong acceptability”, and shown to imply a property called “acceptability”; various facts
can be proved from the weaker notion. In [Dodd] the weaker notion is
given as an axiom of the system RA, and various facts proved in RA.
Some facts which follow from the assumption that M is acceptable
will be stated without proof; unless otherwise specified references are to
[SchZem]. For a cardinal ρ of M (where M is a transitive ∈-structure),
HρM denotes {x ∈ M : |T C(x)|M < ρ}.
- A version of GCH holds in M (corollary 2.13 of [DoddJen1]).
- Suppose $\rho \in M$ is an infinite cardinal in $M$, and $a \subseteq u$ where
$u \in J^B_\rho$ and $a \in J^B_\alpha$; then $a \in J^B_\rho$ (1.23).
- Suppose $\rho \in M$ is an infinite limit cardinal in $M$; then
$J^A_\rho = H^M_{\omega\rho}$.
- If $\rho_M < \alpha$ then $\rho_M$ is a cardinal in $M$ (2.2).
- If $X \subseteq J_{\rho_M}$ is $\Sigma_1^M$, then
$\langle J_{\rho_M}, X \rangle$ is amenable (2.4.a).
- If $p^M_1$ is a very good parameter then every good parameter is very
good (6.8).
- Each Jα is acceptable (9.1).
48. Upward extension.
Lemma 47.7 states how an embedding of an amenable structure K
into P(M, p) may be extended to an embedding of a larger structure N
into M . It is often called a “downward extension” lemma, since N is
obtained by a collapse. The “dual” problem of extending an embedding
of P(M, p) into an amenable structure K, to an embedding of M into a
larger structure N , is called the “upward extension” problem.
Suppose for this section that M = hJα , Ai and K = hJγ , Ci are
amenable structures (as for lemma 47.4, more general structures can be
considered), p ∈ Jα is a very good parameter, and j : P(M, p) 7→ K is
Σ1 -elementary. Also, let P denote the unary predicate symbol of the
expanded language.
To construct N , the pair hi, xi for i ∈ ω and x ∈ Jγ will represent
hJ (i, q, x) where q is an appropriate parameter. The construction is
based on the assumption that C is AN q , which can later be justified.
The pair hi, xi can represent an element of N only if ∃y(y = hJ (i, q, x)),
which can be expressed as C(hiv , hi, xii) for some iv ; such a pair will
be called valid. Similarly, the following predicates may be defined.
- Let ≡K be the relation which holds for the valid pairs hi1 , x1 i and
hi2 , x2 i, if and only if hJ (i1 , q, x1 ) = hJ (i2 , q, x2 ); let i≡ be such
that this holds if and only if C(hi≡ , hi1 , x1 , i2 , x2 ii).
- Let ∈K be the relation which holds for the valid pairs hi1 , x1 i and
hi2 , x2 i, if and only if hJ (i1 , q, x1 ) ∈ hJ (i2 , q, x2 ); let i∈ be such
that this holds if and only if C(hi∈ , hi1 , x1 , i2 , x2 ii).
- Let PK be the relation which holds for the valid pair hi, xi, if and
only if P (hJ (i, q, x)); let iP be such that this holds if and only if
C(hiP , hi, xii).
Lemma 1. The relation ≡K is an equivalence relation on the valid
pairs, and respects ∈K and PK .
Remarks on proof: See the proof of lemma 4.2 of [SchZem]. The
relation ≡P(M,p) , defined from P(M, p) in the same way as ≡K , is an
equivalence relation, since the role of C is played by AMp . Using i≡ ,
this may be expressed as a Π1 formula in the expanded language. Thus,
the formula holds of C in K since j is Σ1 -elementary by hypothesis, and
thus preserves Π1 formulas as well. The argument that ≡K respects ∈K
and PK is similar. ⊳
Let K̂ denote the set of equivalence classes of the valid pairs under ≡K . K̂ will also be used to denote the structure with this set as
its domain, and where ∈ is interpreted as ∈K and P is interpreted as
PK . Let ix (resp. ip ) be the number of the formula y = π2 (x) (resp.
y = π1 (x)). Then in K̂, hix , xi represents x and hip , ∅i represents the
parameter. Let pK denote [hip , ∅i].
If S is a structure for the language of set theory, with membership
relation ∈S , a substructure T ⊆ S is said to be an initial substructure
of S if w ∈ T whenever x ∈ T and w ∈S x. S is also said to be an end
extension of T . See definition I.8.2 of [Barwise].
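For a finite structure the definition of an initial substructure can be checked directly. The following Python sketch is only an illustration under assumed names: the function is_initial_substructure and the encoding of ∈S as a set of pairs are hypothetical and not taken from the text. The toy structure S is {0, 1, 2, 3} with w ∈S x read as w < x, as for finite von Neumann ordinals.

```python
def is_initial_substructure(T, membership):
    """Return True if T is closed downward under the membership relation.

    membership is a set of pairs (w, x), read as "w is an S-member of x".
    T is initial if every S-member of an element of T is again in T.
    """
    return all(w in T for (w, x) in membership if x in T)

# Assumed toy structure: S = {0, 1, 2, 3}, with w in_S x iff w < x.
S = {0, 1, 2, 3}
membership = {(w, x) for w in S for x in S if w < x}

print(is_initial_substructure({0, 1, 2}, membership))  # downward closed
print(is_initial_substructure({0, 2}, membership))     # 1 is missing from T
```

In this toy case the initial substructures are exactly the initial segments of S, and S is an end extension of each of them.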
For part c of the following lemma, let R′ be the system of axioms
in the language of set theory expanded with a unary predicate symbol,
consisting of extensionality, foundation, and the existence conditions for
the basis functions for the rudimentary functions; see remarks following
lemma 1.4 of [Dodd].
Lemma 2.
a. There is a well-defined map jK : M 7→ K̂, where jK (hJ (i, p, x)) =
[hi, j(x)i].
b. jK is Σ2 -elementary.
c. In K̂ the axioms of the system R′ defined above hold; V = L also
holds.
d. The map x 7→ [hix , xi] is a bijection from Jγ to an initial substructure of K̂.
In the remaining clauses K will be identified with its image under this
map.
e. jK is an extension of j.
f. jK (p) = pK .
g. C(hi, xi) is true in K if and only if φi (x, pK ) is true in K̂.
h. Every x ∈ K̂ equals hJ (i, pK , w) for some w ∈ K.
Remarks on proof: See the proof of lemma 4.2 of [SchZem]. For
part a, if hJ (i1 , p, x1 ) = hJ (i2 , p, x2 ) then this is attested to by AMp ,
and hi1 , x1 i ≡K hi2 , x2 i follows. For part b, a formula ∀x∃yφ(x, y,z̊)
with z̊ ∈ M may be rewritten as ∀ix ∀xψ(ix , x,ı̊z ,z̊), where ψ is Σ1 ,
underlined variables and constants are in P(M, p), and z̊ = hJ (ı̊z ,z̊).
ψ is equivalent to a ∆0 formula, and the second claim follows. Part
c is immediate from part b, since these are all expressed by sentences
which are at worst Π2 , and they hold in M . Part d follows because the
formulas hix , x1 i ≡K hix , x2 i ⇒ x1 = x2 and hix , wi ∈K hix , xi ⇒ ∃u ∈
x(hix , ui ≡K hix , wi) may be expressed in Π1 form, and hold in P(M, p).
For part e, for x ∈ P(M, p) jK (x) = jK (hJ (ix , p, x)) = [hix , j(x)i]. For
part f, jK (p) = jK (hJ (ip , p, x)) = [hip , ∅i] = pK .
Let T (i, x, q) be the Σ1 predicate defined as in lemma 45.8, for i
the number of a Σ1 formula in two free variables. The hypotheses of
lemma 45.8 are needlessly restrictive, and T in fact defines the truth
predicate in any model of R′ . See definition 1.16 of [Dodd]. Also, that
T (i, x, q) ↔ φi (x, q) for each i is provable in R′ (exercise for the reader;
see for example theorem 16.49 of [TakZar1]).
The formula for T may be translated into a formula Ť (i, i1 , x1 , i2 ,
x2 ), which holds in P(M, p) if and only if T (i, hJ (i1 , x1 ), hJ (i2 , x2 ))
holds in M . The sentence P (i, x) ⇔ Ť (i, ix , x, ip , ∅) holds in P(M, p),
and may be written in Π1 form, so it holds in K. Part g follows.
The formula hJ (i, p, x) = hJ (i, hJ (ip , p, ∅), hJ (ix , p, x)) is expressible
(using AMp ) as a ∆0 formula with free variables i and x. Its
universal quantification is true in P(M, p), so is true in K. Using part
g, part h follows. ⊳
Lemma 3. Suppose K̂ is well-founded. There is a β and a “transitive collapse” map π : K̂ 7→ Jβ . Let B = π[PK ], let q = π(pK ), and let
N = hJβ , Bi. Then K equals P(N, q), q is a very good parameter, and
N and q are unique.
Remarks on proof: See the proof of lemma 4.2 of [SchZem]. By
remarks following lemma 36.1, the transitive collapse of K̂ may be taken.
Since K̂ is rud-closed and satisfies V = L, an argument similar to the
proof of lemma 46.12 shows that π[K̂] is Jβ for some β, where π is the
collapse map. The transitive collapse of the image in K̂ of Jγ equals Jγ
since it is an initial substructure.
There is a map from ρM onto JρM , which is Σ1 -definable without
parameters, and it follows that there is a map from γ onto Jγ , which is
$\Sigma_1$-definable without parameters. Using this and lemma 2.h, it follows
that $\rho^c_N \le \gamma$. Suppose $X \subseteq J_\gamma$ is $\Sigma_1^N$. Then $X$ is
$\Delta_0^K$, so by lemma 45.10 if $x \in J_\gamma$ then $x \cap X \in J_\gamma$.
This shows that $\gamma \le \rho^a_N$. Since
γ ≤ ρaN ≤ ρcN ≤ γ, γ = ρN . It now follows by lemma 2.g that C equals
AN q , and by lemma 2.h that q is a very good parameter.
Suppose N1 and q1 , and N2 and q2 , satisfy the conclusion of the
lemma. By lemma 2.g, AN1 q1 and AN2 q2 both equal C. Let σ : N1 ↦
N2 be such that σ(hJ (i, q1 , w)) = hJ (i, q2 , w) for w ∈ Jγ . Using C and
translating Σ1 to ∆0 as usual, it follows that σ is a well-defined function
from N1 to N2 , and is an isomorphism. Also, σ(q1 ) = q2 . ⊳
49. Fine structural ultrapowers.
There are different varieties of fine structural ultrapowers, and they
have many uses in modern set theory. An example of their use will be
seen in the next section, in a proof of the covering lemma. They are
also used in “core model” theory.
One variety of fine structural ultrapowers are called “extender ultrapowers”. These are fashioned after the extender ultrapowers of large
cardinal theory, described in section 43. The version given in [Schindler]
will be given here; other authors give different versions.
Suppose j : M 7→ N is a ∆0 -elementary embedding. [Schindler]
assumes that M and N are acceptable structures, which recall are
amenable structures $\langle J^B_\alpha, A \rangle$ satisfying a certain requirement (and are
models of the axiom system RA of [Dodd]). This is a convenient assumption; in particular, the structures will be transitive, rud-closed,
amenable, and models of V = L[B] (lemma 5.17 of [Schindler]). ∆0-separation holds (lemma 1.2 of [Dodd]). In applications here, B is invariably ∅.
Say that j is ∈-cofinal if, for all x ∈ N there is some w ∈ M
such that x ∈ j(w). [Schindler] assumes that j is ∈-cofinal. However,
given any j, let N ′ be the downward closure under ∈ of j[M ]. Then
j, considered as a map to N ′ (such a map is called a co-restriction) is
∈-cofinal. It is also ∆0 -elementary, and sup(j[M ] ∩ Ord) = N ′ ∩ Ord.
Lemma 1. If j : M 7→ N is ∈-cofinal and j is a ∆0 -elementary
embedding then j is Σ1 -elementary.
Proof: Suppose $F_{x,\vec y}$ is a ∆0 formula, and x̊ ∈ N and ẙ1 , . . . , ẙn ∈ M
are such that |=N F (x̊, j(ẙ1 ), . . . , j(ẙn )). Since j is ∈-cofinal, there is
some ẘ ∈ M such that |=N ∃x ∈ j(ẘ)F (x, j(ẙ1 ), . . . , j(ẙn )). Since j is
∆0 -elementary, |=M ∃x ∈ ẘF (x, ẙ1 , . . . , ẙn ), whence |=M ∃xF (x, ẙ1 , . . . ,
ẙn ). ⊳
Thus, suppose j : M 7→ N is ∆0 -elementary (if needed, its corestriction to N ′ as above can be considered). Suppose that j is nontrivial, and κ is the least ordinal moved. Suppose λ < sup(j[M ] ∩ Ord).
For each a ∈ [λ]<ω let µa be the smallest µ such that a ⊆ j(µ); and let
Ea = {X ⊆ [µa ]|a| : X ∈ M, a ∈ j(X)}.
Given $a \subseteq b$ the map $\pi_{ba}$ may be defined as in section 43, except
now $\pi_{ba} : [\mu_b]^{|b|} \mapsto [\mu_a]^{|a|}$. For $X \subseteq [\mu_a]^{|a|}$
let $X_{ab} = \pi_{ba}^{-1}[X] \cap [\mu_b]^{|b|}$ (a subset of $[\mu_b]^{|b|}$).
Given a function $f$ with domain $[\mu_a]^{|a|}$ let $f_{ab}$ be
$f \circ \pi_{ba}$ (a function with domain $[\mu_b]^{|b|}$).
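A finite illustration of these maps may help. The Python sketch below assumes the usual convention from extender theory, since the definition in section 43 is not reproduced here: if a ⊆ b and s is a |b|-element set of ordinals, then πba (s) keeps the elements of s at the positions that the elements of a occupy in the increasing enumeration of b. The names pi_ba, subsets_of_size, pull_back_set and pull_back_fn are hypothetical.

```python
from itertools import combinations

def pi_ba(b, a, s):
    """Project s (a |b|-element set of ordinals) down to |a| elements.

    Assumed convention: keep the elements of s sitting at the positions
    that the elements of a occupy in the increasing enumeration of b.
    """
    b_sorted, s_sorted = sorted(b), sorted(s)
    assert set(a) <= set(b) and len(s_sorted) == len(b_sorted)
    return frozenset(s_sorted[b_sorted.index(x)] for x in a)

def subsets_of_size(mu, n):
    """[mu]^n: all n-element subsets of {0, ..., mu - 1}."""
    return [frozenset(c) for c in combinations(range(mu), n)]

def pull_back_set(X, a, b, mu_b):
    """X_ab: those s in [mu_b]^|b| whose projection lies in X."""
    return {s for s in subsets_of_size(mu_b, len(b)) if pi_ba(b, a, s) in X}

def pull_back_fn(f, a, b):
    """f_ab = f o pi_ba, a function on [mu_b]^|b|."""
    return lambda s: f(pi_ba(b, a, s))

a, b = {2, 7}, {2, 5, 7}
s = frozenset({1, 3, 4})
f = lambda t: max(t)                 # a toy function on 2-element sets

print(sorted(pi_ba(b, a, s)))        # elements of s at positions 0 and 2
print(pull_back_fn(f, a, b)(s))      # f applied to the projection of s
```

Here 2 and 7 are the first and third elements of b, so the projection of s keeps its first and third elements.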
Recall the definition of an M -ultrafilter from section 44; and the
notation si and i(ξ, s) used in the proof of lemma 43.1.
Lemma 2.
a. µa is the least µ ∈ M such that [µ]|a| ∈ Ea .
b. Ea is an M -ultrafilter on [µa ]|a| , which is κ-complete for sequences
in M .
c. For X ⊆ [µa ]|a| with X ∈ M , X ∈ Ea if and only if Xab ∈ Eb .
d. µ{κ} = κ.
e. Suppose $f : [\mu_a]^{|a|} \mapsto \mu_a$ where $f \in M$, and suppose
$\{s \in [\mu_a]^{|a|} : f(s) < \max(s)\} \in E_a$. Then for some
$\beta < \max(a)$, $\{s \in [\mu_b]^{|b|} : f_{ab}(s) = s_{i(\beta,b)}\} \in E_b$
where $b = a \cup \{\beta\}$.
Remarks on proof: This is lemma 10.18 of [Schindler], where the
proof is left as an exercise. For part a, write n for |a|; [µ]n ∈ Ea if
and only if a ∈ j([µ]n ). Also, j([µ]n ) = [j(µ)]n . It follows readily that
a ⊆ j(µ) if and only if a ∈ j([µ]n ). For part b, the proof of theorem 36.4
may be adapted, as already indicated in section 44 for M -ultrafilters.
That [µa ]|a| ∈ Ea is needed, which was already proved in part a. Part c
follows as lemma 43.1.e, noting that b ∈ j([µb ]|b| ). Part d follows because
κ ∉ j(α) for α < κ, but κ ∈ j(κ). Part e follows as lemma 43.1.f, with
the following modifications. Let X denote {s ∈ [µa ]|a| : f (s) < max(s)}.
∀s ∈ X∃β < max(s)(f (s) = β) is true in M , and after the substitution
is true in N . Let Y denote {s ∈ [µb ]|b| : fab (s) = si(β,b) }. ∀s∀a∀β(P ⇒
s ∈ Y ) is true in M , so replacing f by j(f ) and Y by j(Y ) it is true in
the reduced codomain N ′ . ⊳
In this section, by a (κ, λ)-extender over M is meant a system of
ultrafilters with the properties of lemma 2. As in section 43, given such,
an ultraproduct may be taken. U0 , ≡0 , and ∈0 are defined as in section
43. [ζ]|a| is replaced by [µa ]|a| , etc.; and a function f must be in M . In
addition, let P0 be the unary predicate, where P0 (ha, f i) if and only if
{s ∈ [µa ]|a| : P (f (s))} ∈ Ea .
Lemma 3. ≡0 is a congruence relation on U0 , equipped with ∈0 and
P0 .
Remarks on proof: This is stated in the proof of theorem 10.20
of [Schindler]. The proof is as that for lemma 43.2, with the following
modifications. That ≡0 is reflexive follows from [µa ]|a| ∈ Ea . Given
a, a′ let c = a ∪ a′ . Suppose P (f (s)) on s ∈ X where X ∈ Ea , and
fac (s) = fa′ ′ c (s) for s ∈ Xc where Xc ∈ Ec . Then fac (s) = fa′ ′ c (s) for
s ∈ Xac ∩ Xc , which proves that ≡0 respects P0 . ⊳
As in section 43, let E denote the extender and let UltE0 (M ) be the
quotient of U0 by ≡0 . This is a structure for the language of set theory
expanded with a unary predicate symbol. To simplify the notation,
write [a, f ] for [ha, f i].
Lemma 4 (Łoś theorem). Suppose φ is a ∆0 formula. Letting N
denote UltE0 (M ), suppose [ai , fi ] is an element of N for 1 ≤ i ≤ k. Let
c = ∪i ai . Then
a. |=N φ([a1 , f1 ], . . . , [ak , fk ])
if and only if
b. Xφ ∈ Ec where Xφ = {s ∈ [µc ]|c| :|=M φ(f1a1 c (s), . . . , fkak c (s))}.
Remarks on proof: This is “claim 1” in the proof of theorem 10.20
of [Schindler]. The proof is the same as that of lemma 43.3, with some
modifications. V is replaced by M . The claim for φ = ¬ψ follows using
Xφ = [µc ]|c| − Xψ . Free variables may be added to subformulas using
property (c) of an extender.
Suppose φy,~x is ∃z ∈ y ψz,~x . To prove b⇒a, suppose {s :|=M
φ(g cc1 (s), f1a1 c1 (s), . . . , fkak c1 (s))} ∈ Ec1 ; and let h : [µc1 ]|c1 | ↦ ∪Ran(g)
be the function where h(s) is the <J -least y ∈ ∪Ran(g) such that
|=M y ∈ g(s) ∧ ψ(y, f1a1 c1 (s), . . . , fkak c1 (s)) if such a y exists, else ∅. By
the assumptions on M and the fact that g ∈ M , it follows that h ∈ M .
Further, {s :|=M h(s) ∈ g(s) ∧ ψ(h(s), f1a1 c1 (s), . . . , fkak c1 (s))} ∈ Ec1 .
Using the induction hypothesis, |=N [c1 , h] ∈ [b, g] ∧ ψ([c1 , h], [a1 , f1 ],
. . . , [ak , fk ]); and a follows. ⊳
For x ∈ M let Cx denote the function {h∅, xi} (this is a map from
[µ∅ ]0 to M ). Let j E0 : M 7→ UltE0 (M ) be the map given by j E0 (x) =
[∅, Cx ]. Let Uα be the function {⟨{ξ}, ξ⟩ : ξ < µ{α} }, with domain [µ{α} ]1 . For a ∈
[λ]<ω let Ia denote the identity function on [µa ]|a| . Clearly Uα , Ia ∈ M .
Lemma 5.
a. j E0 is a ∆0 -elementary embedding.
b. j E0 is ∈-cofinal.
c. For α < κ, the elements of [∅, Cα ] are the elements [∅, Cβ ] for β < α.
d. For α < λ, the elements of [{α}, Uα ] are the elements [{β}, Uβ ] for
β < α.
e. The elements of [a, Ia ] in UltE0 (M ) are the elements [{α}, Uα ] for
α ∈ a.
f. For all [a, f ] in UltE0 (M ), [∅, f ]([a, Ia ]) = [a, f ].
g. For all a ∈ [λ]<ω , X ∈ Ea if and only if a ∈ j E0 (X).
h. The critical point of j E0 equals κ.
Remarks on proof: This is proved in the proof of theorem 10.20
of [Schindler]. For part a, in the notation of the proof of lemma 4,
Xφ (Cx1 , . . . , Cxk ) equals {h∅, . . . , ∅i} if |=M φ(x1 , . . . , xk ), else ∅. That
j E0 is ∆0 -elementary follows using lemma 4, and since equality is present
j E0 is injective. For part b, suppose [ha, f i] is an element of UltE0 (M ).
Let x = Ran(f ). It is easily seen that X∈ (f, Cx ) = [µa ]|a| , whence
by lemma 4 [a, f ] ∈ [∅, Cx ]. Part c is proved as lemma 43.6.b, except
the fact that Ea is an M-κ-complete M -ultrafilter is used. Part d is
proved as lemma 43.6.c, except Uα is used rather than U ; property (e)
of a (κ, λ)-extender over M is used; [µa ]|a| is used rather than [ζ]|a| , etc.;
c2 = c1 ∪{β}; and x ∈ y ⇒ ¬(y ∈ x∨y = x) holds in UltE0 (M ) by parts
a and b and lemma 1. Part e is proved as lemma 43.6.d. Parts f and g are
proved as lemmas 43.6.e and 43.6.f, except s ∈ [µa ]|a| . For part h, by
parts c and d it suffices to show that [{κ}, Uκ ] ∈ [∅, Cκ ]. By property (d)
of a (κ, λ)-extender over M , µ{κ} = κ, whence {s ∈ [µ{κ} ]1 : Uκ (s) ∈ κ}
equals [µ{κ} ]1 and is in E{κ} . ⊳
Theorem 10.20 of [Schindler] states that j E0 has some properties,
easily derived from the foregoing. It also states that UltE0 (M ) and jE0
are characterized by these properties. If UltE0 (M ) is well-founded its
transitive collapse may be taken; UltE (M ) will be used to denote this;
j E denotes the composition of the transitive collapse map with j E0 .
If UltE0 (M ) is well-founded the comments following lemma 43.6 apply
(with M rather than V , etc.; in particular Ea = {X ⊆ [ζ]|a| : X ∈ M
and a ∈ j E (X)}).
A second variety of fine structural ultrapowers, called pseudo-ultrapowers, will be briefly covered. An early construction of these may be
found in [DoddJen2]; [Welch2] contains a more recent treatment. For
simplicity only amenable structures of the form hJα , Ai will be considered; but the treatment may be extended to acceptable structures.
Suppose
- j : Jγ̄ 7→ Jγ is a ∆0 -elementary embedding,
- M = hJα , Ai is an amenable structure with α ≥ γ̄,
- F is a set of functions f with f ∈ Jα and Dom(f ) ∈ Jγ̄ , and
- λ ≤ ω · γ is an ordinal.
Let U0 = {ha, f i : f ∈ F , a ∈ [λ]<ω , and a ∈ j(Dom(f ))}. Given
f, g ∈ F let D= (f, g) = {hu, vi : f (u) = g(v)}, let D∈ (f, g) = {hu, vi :
f (u) ∈ g(v)}, and let DP (f ) = {u : P (f (u))}. Say that F is good
for γ̄ if for all f, g ∈ F , D= (f, g), D∈ (f, g), and DP (f ) are in Jγ̄ .
Supposing F to be good for γ̄, let ≡0 be the binary relation on U0 , where
ha, f i ≡0 hb, gi if and only if ha, bi ∈ j(D= (f, g)). Let ∈0 be the binary
relation, where ha, f i ∈0 hb, gi if and only if ha, bi ∈ j(D∈ (f, g)). Let P0
be the unary predicate, where P0 (ha, f i) if and only if a ∈ j(DP (f )).
Lemma 6. ≡0 is a congruence relation on U0 , equipped with ∈0 and
P0 .
Remarks on proof: This is stated without proof in remarks preceding proposition 3.9 of [Mitchell2]; see also remarks following definition
3.4 in [DoddJen2]. Let ds = {hu, ui : u ∈ s}; that t = ds is expressible by a ∆0 formula, so j(ds ) = dj(s) . Thus, if a ∈ j(Dom(f )) then
ha, ai ∈ j(dDom(f ) ). Since dDom(f ) ⊆ D= (f, f ), ha, ai ∈ j(D= (f, f )),
which shows that ≡0 is reflexive. If r is a binary relation then j(r)
is a binary relation, and j commutes with the transpose operation. It
follows that ≡0 is symmetric. Similarly, j commutes with composition
of binary relations, and it follows that ≡0 is transitive. The composition of D= (f ′ , f ), D∈ (f, g), and D= (g, g ′ ) is contained in D∈ (f ′ , g ′ ). It
follows that if ⟨a′ , a⟩ ∈ j(D= (f ′ , f )), ⟨a, b⟩ ∈ j(D∈ (f, g)), and ⟨b, b′ ⟩ ∈
j(D= (g, g ′ )) then ha′ , b′ i ∈ j(D∈ (f ′ , g ′ )). That ≡0 respects P0 follows
similarly, since the composition of D= (g, f ) and DP (f ) is contained in
DP (g). ⊳
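The containments between the relations D= and D∈ used in this argument are purely combinatorial, and can be checked on small examples. The following Python sketch is an illustration only: hereditarily finite sets stand in for elements of M, the functions f, f1, g, g1 are arbitrary toy choices, and the helper names d_eq, d_in, compose and transpose are hypothetical. The embedding j plays no role in the sketch.

```python
def d_eq(f, g):
    """D_=(f, g): pairs (u, v) with f(u) = g(v)."""
    return {(u, v) for u in f for v in g if f[u] == g[v]}

def d_in(f, g):
    """D_in(f, g): pairs (u, v) with f(u) an element of g(v)."""
    return {(u, v) for u in f for v in g if f[u] in g[v]}

def compose(r, s):
    """Relational composition: (u, w) with (u, v) in r and (v, w) in s."""
    return {(u, w) for (u, v) in r for (v2, w) in s if v == v2}

def transpose(r):
    """Swap the two coordinates of every pair in r."""
    return {(v, u) for (u, v) in r}

# Hereditarily finite sets as frozensets: 0 = {}, 1 = {0}, 2 = {0, 1}.
zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

# Assumed toy functions, given as dicts from a domain to such sets.
f, f1 = {0: zero, 1: one}, {"a": one, "b": zero}
g, g1 = {0: one, 1: two}, {"x": two, "y": one}

chain = compose(compose(d_eq(f1, f), d_in(f, g)), d_eq(g, g1))
print(chain <= d_in(f1, g1))                 # composition is contained in D_in(f1, g1)
print(transpose(d_eq(f, g)) == d_eq(g, f))   # transposing D_= swaps its arguments
```

Since j commutes with transpose and composition, the same containments hold between the images under j, which is what the proof uses.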
Let Ultjλ0 (M ) be the quotient of U0 by ≡0 , provided this exists,
that is, provided F is good for γ̄. This is a structure for the language of
set theory expanded with a unary predicate symbol. For a ∆0 formula
φ over this language, and f1 , . . . , fn ∈ F , let Dφ (f1 , . . . , fn ) denote
{hu1 , . . . , un i : |=M φ(f1 (u1 ), . . . , fn (un ))}. As previously, write [a, f ]
for [ha, f i].
Lemma 7. Suppose F is good for γ̄, and let N denote Ultjλ0 (M ).
Suppose φ is a ∆0 formula, f1 , . . . , fn ∈ F , and ai ∈ [λ]<ω ∩ j(Dom(fi ))
for 1 ≤ i ≤ n. Then Dφ (f1 , . . . , fn ) ∈ Jγ̄ , and
a. |=N φ([a1 , f1 ], . . . , [an , fn ])
if and only if
b. ha1 , . . . , an i ∈ j(Dφ (f1 , . . . , fn )).
Proof: Write M ′ for Jγ̄ . The claim holds for atomic formulas by definition.
The claim for φ = ¬ψ follows using the claim for ψ, and the fact
that Dφ (f1 , . . . , fn ) = Dom(f1 ) × · · · × Dom(fn ) − Dψ (f1 , . . . , fn ) (and
the fact that j commutes with Cartesian product, set difference, etc.).
The claim for φ = ψ ∧ θ follows using Dφ (f1 , . . . , fn ) = Dψ (f1 , . . . , fn ) ∩
Dθ (f1 , . . . , fn ) (and the fact that Dψ (f1 , . . . , fn ) may be obtained from
Dψ (fi1 , . . . , fit ) and Dom(fi ) for 1 ≤ i ≤ n by a rudimentary function,
etc.).
Suppose φ is ∃z ∈ y ψ(z, x1 , . . . , xn ), and g, f1 , . . . , fn ∈ M have
domains in M ′ . Let h be the function with domain Dom(g)×Dom(f1 )×
· · · × Dom(fn ), where h(v, u1 , . . . , un ) equals the <J -least z ∈ g(v) such
that |=M ψ(z, f1 (u1 ), . . . , fn (un )), or ∅ if there is no such z. Clearly
Dom(h) ∈ M ′ . To see that h ∈ M , in M let r = {hz, v, ~ui : z ∈
g(v) ∧ ψ(z, f1 (u1 ), . . . , fn (un ))}; then r ∈ M follows by ∆0 -separation.
Thus, r ∈ Sν for a sufficiently large ν ∈ M . The restriction o of <J to
Sν is in M (lemma 2.7.iii of [Devlin], or lemma 2.23 of [Dodd] for more
general structures). Finally w = h(v, ~u) if and only if r(w, v, ~u) ∧ ∀z ∈
g(v)(hz, wi ∈ o ⇒ ¬r(z, v, ~u)).
Let θ denote z ∈ y ∧ ψ(z, x1 , . . . , xn ). Using the induction hypothesis, it follows that Dθ (h, g, f~) ∈ M ′ , from which Dφ (g, f~) ∈ M ′ follows, since it is the composition with w = hv, ~ui, restricted to Ran(g) ×
Dom(g) × Dom(f1 ) × · · · × Dom(fn ).
If b holds then h witnesses that a does. Suppose a holds (with
[b, g] for the value of y), and let [c, h] be a value of z witnessing the
fact. Using the induction hypothesis it follows that hc, a1 , . . . , an i ∈
j(Dψ (f1 , . . . , fn )) and hc, bi ∈ j(D∈ (h, g)); hb, a1 , . . . , an i ∈ j(Dφ (g,
f1 , . . . , fn )) follows. ⊳
For x ∈ M let Cx denote the function {h∅, xi}. Let j jλ0 : M 7→
Ultjλ0 (M ) be the map given by j jλ0 (x) = [∅, Cx ].
Lemma 8.
a. j jλ0 is a ∆0 -elementary embedding.
b. j jλ0 is ∈-cofinal.
Proof: For part a, Dφ (Cx1 , . . . , Cxn ) equals {h∅, . . . , ∅i} if |=M φ(x1 ,
. . . , xn ), else ∅. The result follows using lemma 7. For part b, suppose
[a, f ] is an element of Ultjλ0 (M ). Let x = Ran(f ). Then D∈ (f, Cx ) =
Dom(f ) × {∅}, so ha, ∅i ∈ j(D∈ (f, Cx )), that is, [a, f ] ∈ j jλ0 (x). ⊳
Lemma 9. Suppose j : Jα 7→ Jβ is a Σ1 -elementary embedding,
γ ≤ α, and j ↾ ω · γ is the identity map. Then j ↾ Jγ is the identity
map.
Proof: Since ξ 7→ Sξ is Σ1 , j(Sξ ) = Sξ for ξ < ω · γ. It follows by
induction that for such ξ, j ↾ Sξ is the identity. ⊳
Lemma 10. If j is nontrivial then j jλ0 is nontrivial.
Proof: Consider the co-restriction of j as in remarks preceding
lemma 1. Using lemmas 1 and 9, it follows that j ↾ ω · γ̄ is not the
identity. As in the proof of lemma 36.3, there is a least ordinal κ such
that j(κ) ≠ κ, and j(κ) > κ. For x ∈ Jγ̄ let Ix denote the identity
function restricted to x. Let x = {{ζ} : ζ < κ}. Then ∀ζ(ζ < κ ⇒
{ζ} ∈ x), so ∀ζ(ζ < j(κ) ⇒ {ζ} ∈ j(x)), so {κ} ∈ j(x). Thus [{κ}, Ix ]
is in Ultjκ0 (M ). D= (Ix , Cy ) is empty unless y ∈ x, in which case it
equals {hy, ∅i}. It follows that ha, Ix i ≡0 h∅, Cy i if and only if y ∈ x and
a = j(y); and thus [{κ}, Ix ] does not equal [∅, Cy ] for any y ∈ Jγ̄ . ⊳
Lemma 11. Suppose N = Ultjκ0 (M ) is well-founded and let π :
N 7→ Jᾱ be the collapsing isomorphism. Let γ ′ = sup(j[ω · γ̄]). Then
β ∩ γ ′ ⊆ Jᾱ ; and if ξ ∈ Jγ̄ and j(ξ) < β then π(j jλ0 (ξ)) = j(ξ).
Proof: Suppose ζ ∈ β ∩γ ′ ; then there is a θ ∈ Jγ̄ such that ζ < j(θ).
Let Jθ = {h{ζ}, {ζ}i : ζ < θ}. Since D= (Jθ1 , Jθ2 ) = {h{ζ}, {ζ}i : ζ <
min(j(θ1 ), j(θ2 ))}, h{ζ1 }, Jθ1 i ≡0 h{ζ2 }, Jθ2 i if and only if ζ1 = ζ2 and
ζ1 < min(j(θ1 ), j(θ2 )). Let φ(x) be the ∆0 formula stating that x is
a singleton set whose element is an ordinal. Then Dφ (Jθ ) = Dom(Jθ ),
and using lemma 7 it follows that if ζ < j(θ) then |=N φ([{ζ}, Jθ ]). Let
ψ(x1 , x2 ) be the ∆0 formula stating that x1 = {ζ1 }, x2 = {ζ2 }, and
ζ1 < ζ2 . Again using lemma 7, it follows that if ζ1 < ζ2 < j(θ) then |=N
ψ([{ζ1 }, Jθ ], [{ζ2 }, Jθ ]). Let ζ ′ be the ordinal such that π([{ζ}, Jθ ]) =
{ζ ′ }. Thus, if ζ1 < ζ2 then ζ1′ < ζ2′ . It follows that ζ ′ = ζ. That
j({ξ}) = [j({ζ}), Jθ ]) follows by facts given in the proof of lemma 11,
and the last claim follows. ⊳
In applications of pseudo-ultrapowers, various methods are used to
ensure that the collection of functions F is good for γ̄.
50. The covering lemma.
As mentioned in section 40, modern proofs of the covering lemma
make use of fine structure theory. The proof given in [Schindler] will be
outlined here. It involves the use of fine structural extender ultrapowers.
A proof of the covering lemma using fine structural pseudo-ultrapowers
may be found in [Rasch].
The proof makes use of two definitions, that of an F -dense set, and
that of a specific set W . Lemmas 5 and 6 will prove properties of these,
which will be used in the proof of the covering lemma in theorem 7.
Suppose κ is an uncountable cardinal and A is a set with |A| ≥ κ.
Recall that [A]κ denotes the set of subsets of A of cardinality κ. Say
that a subset S ⊆ [A]κ is F -dense if, whenever {fγ : γ < η} for η ≤ κ
is a set of functions from A to A, then there is a set x ∈ S such that
fγ [x] ⊆ x for all γ < η.
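The closure condition fγ [x] ⊆ x can be illustrated in a finite setting. The Python sketch below is only an analogue under stated assumptions: A is the finite set {0, ..., 11}, the two functions are arbitrary toy choices, and the names close_under and is_closed are hypothetical. It computes the least set containing a seed and closed under the given functions; it does not address the requirements that x have cardinality κ and belong to S.

```python
def close_under(seed, functions):
    """Smallest superset of seed that is closed under every given function."""
    closed = set(seed)
    frontier = list(seed)
    while frontier:
        u = frontier.pop()
        for f in functions:
            v = f(u)
            if v not in closed:
                closed.add(v)
                frontier.append(v)
    return closed

def is_closed(x, functions):
    """Check f[x] is a subset of x for every f in functions."""
    return all(f(u) in x for f in functions for u in x)

# Assumed toy functions from A = {0, ..., 11} to A.
fs = [lambda u: (u + 4) % 12, lambda u: (2 * u) % 12]

x = close_under({1}, fs)
print(sorted(x))
print(is_closed(x, fs))
```

The loop terminates because A is finite; in the infinite setting the analogous closure of a set of size κ under at most κ functions again has size κ.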
For the definition of W , some preliminary definitions are needed. As
in earlier sections, Σn will be used as an abbreviation for “Σn -definable
in Jβ from parameters”. For ordinals α ≤ ω ·β and n ∈ ω, α is said to be
a Σn -cardinal in Jβ if there is no Σn function f , with Dom(f ) a bounded
subset of α and Ran(f ) = ω·β. In the case n = 0, equivalently, if there is
no function f ∈ Jβ , with Dom(f ) an ordinal γ < α and Ran(f ) = ω · β.
α will be said to be a cardinal in Jβ if, as usual, α ∈ Jβ and the preceding
holds.
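Lemma 1. Suppose $\alpha \le \beta$, $n > 0$, and in $J_\beta$, $\omega \cdot \alpha$ is
a $\Sigma_{n-1}$-cardinal but not a $\Sigma_n$-cardinal. Then
$\rho^n_\beta < \alpha \le \rho^{n-1}_\beta$. Further $\omega \cdot \rho^n_\beta$ is the
least $\rho$ such that there is a $\Sigma_n$ map of a subset of $\rho$ onto
$\omega \cdot \alpha$.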
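Remarks on proof: The first two claims are lemma 2.1.i of [DevJen].
If $\omega \cdot \alpha$ is a $\Sigma_l$-cardinal then there is no map onto
$\omega \cdot \alpha$, whence clearly there is none onto $\omega \cdot \beta$.
$\rho^l_\beta \ge \alpha$ follows by theorem 47.4 (characterization (c) of
$\rho^n_\beta$). Let $\rho$ be least such that there is a $\Sigma_n$ map from a
subset of $\omega \cdot \rho$ onto $\omega \cdot \alpha$; by hypothesis
$\rho < \alpha$. Again using theorem 47.4, $\rho^n_\beta \ge \rho$. Let $f$ be
$\Sigma_n$, such that $\mathrm{Dom}(f) \subseteq \omega \cdot \rho$ and
$\mathrm{Ran}(f) = J_\alpha$. Let
$a = \{\nu \in \mathrm{Dom}(f) : \nu \notin f(\nu)\}$. If $a \in J_\alpha$ then
$a = f(\nu)$ for some $\nu$, and $\nu \in f(\nu) \Leftrightarrow \nu \notin f(\nu)$;
thus, $a \notin J_\alpha$. Since $a$ is $\Sigma_n$, it follows by theorem 47.4
(characterization (b) of $\rho^n_\beta$) that $\rho^n_\beta \le \rho$. Thus,
$\rho = \rho^n_\beta$ and $\rho^n_\beta < \alpha$. ⊳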
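The set a in this proof is a diagonal set. A finite Python sketch of the same construction follows; the family of sets and the name diagonal are hypothetical, chosen only to show that the diagonal set differs from every value of f.

```python
def diagonal(f, domain):
    """The set of v in the domain with v not an element of f(v)."""
    return frozenset(v for v in domain if v not in f(v))

# Assumed toy family: f maps each index to a subset of {0, 1, 2, 3}.
family = {
    0: frozenset({0, 1}),
    1: frozenset(),
    2: frozenset({0, 2}),
    3: frozenset({1, 2}),
}

a = diagonal(family.__getitem__, family.keys())
print(sorted(a))
print(all(a != family[v] for v in family))   # a is not in the range of f
```

For each index v, the sets a and f(v) disagree about v itself, which is the contradiction obtained in the proof from assuming a = f(ν).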
Lemma 2. If $n > 0$ then $\rho^n_\beta$ is a $\Sigma_n$-cardinal in $J_\beta$.
Proof: This follows by theorem 47.4; if there is a map
$f : \gamma \mapsto \omega \cdot \rho^n_\beta$ then there is a map
$g : \gamma \mapsto \omega \cdot \beta$. ⊳
Suppose β is the largest ordinal such that κ is a cardinal in Jβ .
Then κ is not a cardinal in Jβ+1 , so there exists a surjection f : γ ↦ κ
with γ < κ and f ∈ Jβ+1 . Such an f is Σn for some n. Let n be least
such that there is such an f . Then in Jβ , κ is a Σn−1 -cardinal but not
a Σn -cardinal.
Let η be an ordinal and let κi : i < α be the cardinals of Jη . For
i < α let βi be the largest β such that κi is a cardinal in Jβi if such
exists, else ∞. Let κα equal ω · η. Define βα as for i < α, except that it
equals η if ω · η is not a cardinal in Jη+1 .
Lemma 3. Suppose i, j ≤ α.
a. βi ≥ η.
b. If η ≤ β < βi then for all n, ρnβ ≥ κi .
c. If βi < ∞ then for some n, ρnβi < κi .
d. If i ≤ j then βj ≤ βi .
e. {βi : i < α} is finite.
Remarks on proof: These observations are made preceding lemma
10.32 of [Schindler]. Part a is immediate. Parts b and c follow by lemma
1 and the remarks following it. For part d, there is nothing to prove
if βi = ∞, so suppose otherwise. Then for some n, ρ^n_{β_i} < κi < κj .
However if βi < βj then ρ^n_{β_i} ≥ κj for all n. Part e is an immediate
consequence of part d. ⊳
Let ni be the least n such that ρ^{n+1}_{β_i} < κi ≤ ρ^n_{β_i} if βi < ∞, else 0.
Lemma 4. If i ≤ j and βi = βj then nj ≤ ni .
Remarks on proof: This observation is also made preceding lemma
10.32 of [Schindler]. The claim is immediate if βi = ∞. Otherwise,
ρ^{n_i+1}_{β_i} < κi ≤ κj ≤ ρ^{n_i}_{β_i} and the claim follows. ⊳
By lemma 2 each ρ^{n_i+1}_{β_i} equals κl for some l; let IY be the set of
such l, together with α. By lemmas 3 and 4 IY is finite.
Suppose that µ is a regular cardinal, Y ⊆ Hµ is an elementary
substructure, and j : X ↦ Hµ is the inverse of the transitive collapse
map for Y . Suppose j is nontrivial and let κc denote the critical point.
- Let η = X ∩ Ord, and let α, and κi and βi for i ≤ α, be defined as
above.
- For i ≤ α, if κc < κi let νi = sup(j[κi ]), and let Ei be the (κc , νi )-extender derived from j ↾ Jκi .
- Recall the definition of Pn from section 47. Since κi ≤ ρ^{n_i}_{β_i} , there is
a map ji : P_{n_i}(J_{β_i}) ↦ Ult_{E_i}(P_{n_i}(J_{β_i})).
- If Ult_{E_i}(P_{n_i}(J_{β_i})) is well-founded let i : Jβi ↦ Mi denote the map
given by lemma 48.2.
Suppose κ is an uncountable cardinal, θ ≥ κ is an ordinal, and
µ > θ is a regular cardinal. With notation as above, let W denote
the set of elementary substructures Y ⊆ Hµ such that Mi exists and is
well-founded for all i ∈ IY with κc < κi .
Lemma 5. {Y ∩ θ : Y ∈ W and |Y ∩ θ| = κ} is an F -dense subset
of [θ]κ .
Remarks on proof: This is proved in lemma 10.33 of [Schindler] for
κ a regular cardinal; the general case then follows by lemma 10.34. ⊳
Lemma 6. If there is no nontrivial embedding e : L ↦ L then
{Y ∩ Ord : Y ∈ W } ∈ L.
Remarks on proof: This is lemma 10.32 of [Schindler]. ⊳
It should be mentioned that various details are omitted from the
proofs of lemmas 5 and 6 cited above; no effort to provide these will be
made here, though. Various references have other proofs of the covering
lemma, including [DevJen], [Devlin], [Jech2]; some of these do not use
fine structure theory.
Theorem 7. The following are equivalent.
a. There is no non-trivial elementary embedding j : L ↦ L.
b. If κ is an uncountable cardinal and θ is a cardinal such that θ ≥ κ,
then [θ]κ ∩ L is an F-dense subset of [θ]κ .
c. If x is an uncountable set of ordinals then there is a constructible
set of ordinals y such that x ⊆ y and |y| = |x|.
d. 0# does not exist.
Remarks on proof: That a⇒b follows by lemmas 5 and 6. For b⇒c,
suppose x as in c is given. Let κ = |x| and let θ ≥ κ be a cardinal such
that x ⊆ θ. Let γ ↦ xγ be a bijection from κ to x. Let fγ be the
function which is constantly xγ on θ. By part b, there is a constructible
set y ∈ [θ]κ such that fγ [y] ⊆ y, and hence xγ ∈ y, for all γ < κ, so that x ⊆ y. The proof that c⇒d
may be found in remarks preceding Corollary 18.31 of [Jech2]. Suppose
0# exists. First, if κ is an uncountable cardinal then κ is regular in
L. This follows because the Silver indiscernibles are “L-indiscernibles”,
the uncountable cardinals are Silver indiscernibles, and ℵ1 is regular in
L (using lemma 30.3); see theorem 2.15 of [Devlin], and also theorem
38.9. In particular ℵω is regular in L. Thus, ℵ1 ∪ {ℵn : n ∈ ω} cannot
be covered by any constructible set of ordinals of cardinality less than
ℵω . Part d implies part a by theorem 44.5. ⊳
The implication d⇒c is the classical covering lemma. An “official”
proof first appeared in [DevJen]; slightly earlier, handwritten notes containing a proof had been provided by Jensen. ¬0# is a weaker hypothesis than V = L. When part c was seen to follow from it, various
consequences for the universe of sets were seen to hold, some of which
will be given in the next section.
The implication a⇒d was already proved, in theorem 44.1.
51. Cardinal arithmetic.
The notions of κ+ , ℵα , κ + λ, and κ · λ are defined in section 13;
κ^λ is defined in section 14. Various basic facts about these operations
have already been given. κ + λ and κ · λ are determined by ZFC. By
results of sections 20 and 22, however, 2^{ℵ0} is not determined.
Suppose ⟨κi : i ∈ I⟩ is a set of cardinal numbers. Let Σ_{i∈I} κi = |D|
where D is the disjoint union of the κi . Let Π_{i∈I} κi = |C| where C is
the Cartesian product of the κi , that is, the set of functions f : I ↦ ∪i κi
such that f (i) ∈ κi for all i ∈ I.
Theorem 1 (Konig’s theorem). If κi < λi for all i ∈ I then Σ_{i∈I} κi <
Π_{i∈I} λi .
Proof: This is theorem 5.10 of [Jech2]. Let C be the Cartesian
product of the λi . Let πi be the projection from C to the ith coordinate.
Suppose Di ⊆ C is a subset with |Di | ≤ κi , for each i ∈ I. Then
|πi [Di ]| ≤ κi < λi , so πi [Di ] ⊂ λi . Thus there is an f ∈ C such that
f (i) ∉ πi [Di ] for any i, and so f ∉ Di for any i. Hence C is not the
union of the Di ; since the Di were arbitrary, |C| is not at most Σ_{i∈I} κi . ⊳
Corollary 2. κ^{cf(κ)} > κ.
Proof: This is corollary 5.14 of [Jech2]. If κ = Σ_{i<cf(κ)} κi where
κi < κ then κ < Π_{i<cf(κ)} κ = κ^{cf(κ)} . ⊳
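As an illustration of corollary 2, the following is a sketch of the instance κ = ℵω, using the cofinal sequence of the ℵn (a choice made here only for concreteness; it is not taken from the text).

```latex
% Instance of Konig's theorem with I = \omega, \kappa_n = \aleph_n, \lambda_n = \aleph_\omega.
\aleph_\omega \;=\; \sum_{n<\omega} \aleph_n \;<\; \prod_{n<\omega} \aleph_\omega
\;=\; \aleph_\omega^{\aleph_0},
\qquad \text{since } \aleph_n < \aleph_\omega \text{ for all } n<\omega
\text{ and } \operatorname{cf}(\aleph_\omega) = \aleph_0 .
```

In particular ℵω raised to the power ℵ0 is strictly larger than ℵω in any model of ZFC.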
There are restrictions on the behavior of 2^λ and κ^λ which are provable in ZFC. In particular:
- If κ1 ≤ κ2 then 2^{κ1} ≤ 2^{κ2} .
- κ < cf(2^κ ).
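The second restriction can be obtained from corollary 2. The following is a sketch of the usual argument for infinite κ; the symbol µ is introduced here only as an abbreviation for 2^κ.

```latex
% Let \mu = 2^\kappa and suppose, toward a contradiction, that cf(\mu) \le \kappa.
% The first inequality is corollary 2 applied to \mu.
\mu \;<\; \mu^{\operatorname{cf}(\mu)} \;\le\; \mu^{\kappa}
\;=\; (2^{\kappa})^{\kappa} \;=\; 2^{\kappa\cdot\kappa} \;=\; 2^{\kappa} \;=\; \mu .
% This is impossible, so \kappa < cf(2^\kappa).
```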
Theorem 3 (Easton’s theorem). Suppose F is a function from cardinals to cardinals such that if κ1 ≤ κ2 then F (κ1 ) ≤ F (κ2 ), and
κ < cf(F (κ)). Then there is a model of ZFC in which 2^κ = F (κ)
for any regular cardinal κ.
Remarks on proof: This is theorem 15.18 of [Jech2]. The proof
uses “Easton forcing”, a type of forcing with a class of conditions, over
a ground model satisfying GCH. ⊳
The situation for singular cardinals is more complicated.
Theorem 4.
a (Bukovsky-Hechler). Suppose κ is a singular cardinal. If 2^λ = µ
for all sufficiently large λ < κ then 2^κ = µ.
b (Silver). Suppose κ is a singular cardinal of uncountable cofinality.
If 2^λ = λ+ for all cardinals λ < κ then 2^κ = κ+ .
c (Galvin-Hajnal). Suppose κ = ℵα is a strong limit singular cardinal
of uncountable cofinality. Then 2^κ < ℵγ where γ = (2^{|α|})+ .
d (Shelah). Suppose ℵω is a strong limit cardinal. Then 2^{ℵω} < ℵ_{ℵ4} .
Remarks on proof: These may all be found in [Jech2]. Part a is
corollary 5.17, part b is theorem 8.12, part c is theorem 24.1, and part
d is theorem 24.33. ⊳
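To make part c concrete, here is a sketch of its smallest instance, κ = ℵ_{ω1}; the choice of this cardinal is an illustration and is not made in the text.

```latex
% Galvin-Hajnal bound for \kappa = \aleph_{\omega_1}:
% here \alpha = \omega_1, so |\alpha| = \aleph_1 and \gamma = (2^{\aleph_1})^{+}.
\aleph_{\omega_1} \text{ a strong limit cardinal}
\;\Longrightarrow\;
2^{\aleph_{\omega_1}} \;<\; \aleph_{(2^{\aleph_1})^{+}} .
```

The hypothesis of uncountable cofinality holds since cf(ℵ_{ω1}) = ℵ1.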
The proof of theorem 4.d uses “PCF theory”, a theory concerning
ultraproducts of sets of regular cardinals, which has found a variety
of applications. Additional consequences for cardinal arithmetic have
been stated by various authors (although these were doubtless known
to Shelah). For example,
- Suppose δ is a limit ordinal and |δ|^{cf(δ)} < ℵδ . Then ℵδ^{cf(δ)} < ℵγ
where γ = |δ|^{+4}
(theorem 7.3 of [AbrMag]).
The “singular cardinals problem” is to give a set of rules describing
the behavior of the function 2^κ on singular cardinals. This turns out to
depend on what types of large cardinals are allowed. See [Gitik] for a
survey.
The function λ^κ is not determined by the function 2^κ . It is determined by the function κ^{cf(κ)} ; see corollary 5.18 and corollary 5.21 of
[Jech2]. It is determined if GCH holds, as follows.
Theorem 5. Suppose GCH holds and κ, λ are infinite cardinals.
- If κ < cf(λ) then λ^κ = λ.
- If cf(λ) ≤ κ < λ then λ^κ = λ+ .
- If λ ≤ κ then λ^κ = κ+ .
Remarks on proof: This is theorem 5.15 of [Jech2]; the third claim
was already proved in theorem 14.5. ⊳
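The three cases of theorem 5 can be illustrated as follows. This is a sketch with sample values of λ and κ chosen here for concreteness; GCH is assumed throughout.

```latex
% Sample evaluations of \lambda^\kappa under GCH, one for each case of theorem 5.
\begin{align*}
\aleph_2^{\aleph_0} &= \aleph_2
  && \text{case 1: } \aleph_0 < \operatorname{cf}(\aleph_2) = \aleph_2, \\
\aleph_\omega^{\aleph_0} &= \aleph_{\omega+1}
  && \text{case 2: } \operatorname{cf}(\aleph_\omega) = \aleph_0 \le \aleph_0 < \aleph_\omega, \\
\aleph_1^{\aleph_3} &= \aleph_4
  && \text{case 3: } \aleph_1 \le \aleph_3, \text{ so the value is } \aleph_3^{+}.
\end{align*}
```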
By corollary 2 κ^{cf(κ)} ≥ κ+ . If GCH holds then κ^{cf(κ)} = κ+ (theorem 5). The singular cardinals hypothesis (SCH) is the statement that
κ^{cf(κ)} = κ+ for any cardinal κ such that 2^{cf(κ)} < κ. Note that for the
hypothesis to hold, κ must be singular. SCH was isolated around 1970
as being of particular interest. If SCH holds then the function λ^κ is
determined by the function 2^κ , as follows.
Theorem 6. Suppose SCH holds and κ, λ are infinite cardinals.
- If λ > 2^κ and κ < cf(λ) then λ^κ = λ.
- If λ > 2^κ and κ ≥ cf(λ) then λ^κ = λ+ .
- If λ ≤ 2^κ then λ^κ = 2^κ .
Remarks on proof: This is theorem 5.22.ii of [Jech2]. ⊳
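As an illustration of theorem 6, the following sketch evaluates λ^κ for κ = ℵ0 in a hypothetical model of SCH in which 2^{ℵ0} = ℵ2. The value of the continuum is an assumption made here only for the example.

```latex
% Assume SCH and 2^{\aleph_0} = \aleph_2 (hypothetical value, for illustration only).
\begin{align*}
\aleph_3^{\aleph_0} &= \aleph_3
  && \aleph_3 > 2^{\aleph_0} \text{ and } \aleph_0 < \operatorname{cf}(\aleph_3), \\
\aleph_\omega^{\aleph_0} &= \aleph_{\omega+1}
  && \aleph_\omega > 2^{\aleph_0} \text{ and } \aleph_0 \ge \operatorname{cf}(\aleph_\omega), \\
\aleph_1^{\aleph_0} &= \aleph_2
  && \aleph_1 \le 2^{\aleph_0}, \text{ so the value is } 2^{\aleph_0}.
\end{align*}
```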
Theorem 7. Suppose ¬0#.
a. SCH holds.
b. If κ is singular then κ is singular in L.
c. If κ is singular then (κ+ )L = κ+ .
Remarks on proof: Part a is corollary 18.33 of [Jech2]. Suppose
2^{cf(κ)} < κ (whence κ is singular). Let C = {Y ⊆ κ : Y ∈ L and
|Y | = max(ℵ1 , cf(κ))}. Then |C| ≤ |Pow^L (κ)| = |(κ+ )^L | ≤ κ+ . Also, for Y ∈ C,
|[Y ]^{cf(κ)} | = max(ℵ1 , cf(κ))^{cf(κ)} = 2^{cf(κ)} . If X ∈ [κ]^{cf(κ)} then by the
covering lemma there is a Y ∈ C with X ⊆ Y . Using the hypothesis
2^{cf(κ)} < κ, it follows that |[κ]^{cf(κ)} | ≤ κ+ . Now, κ^{cf(κ)} is the number of
functions from cf(κ) to κ. It follows that κ^{cf(κ)} ≤ |[κ]^{≤cf(κ)} | · cf(κ)^{cf(κ)} .
It is easily seen that |[κ]^{≤cf(κ)} | ≤ |[κ]^{cf(κ)} | · cf(κ), and 2^{cf(κ)} ≤ |[κ]^{cf(κ)} |;
κ^{cf(κ)} = |[κ]^{cf(κ)} | follows. Part b is corollary 18.31 of [Jech2]. Part c is
corollary 18.32 of [Jech2]. ⊳
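The counting in the remarks on part a can be laid out as two chains of inequalities. This is a sketch under the hypotheses stated there (¬0#, 2^{cf(κ)} < κ, and C as defined in the remarks); it only rearranges the steps already given.

```latex
% Every X in [\kappa]^{cf(\kappa)} lies in [Y]^{cf(\kappa)} for some Y in C (covering lemma).
\begin{align*}
\bigl|[\kappa]^{\operatorname{cf}(\kappa)}\bigr|
  &\le \sum_{Y\in C} \bigl|[Y]^{\operatorname{cf}(\kappa)}\bigr|
   \le |C|\cdot 2^{\operatorname{cf}(\kappa)}
   \le \kappa^{+}\cdot\kappa = \kappa^{+}, \\
\kappa^{\operatorname{cf}(\kappa)}
  &\le \bigl|[\kappa]^{\le\operatorname{cf}(\kappa)}\bigr|\cdot
       \operatorname{cf}(\kappa)^{\operatorname{cf}(\kappa)}
   \le \bigl|[\kappa]^{\operatorname{cf}(\kappa)}\bigr|\cdot
       \operatorname{cf}(\kappa)\cdot 2^{\operatorname{cf}(\kappa)}
   = \bigl|[\kappa]^{\operatorname{cf}(\kappa)}\bigr|
   \le \kappa^{+}.
\end{align*}
```

Together with the lower bound from corollary 51.2 this gives κ^{cf(κ)} = κ+.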
Part c is called the “weak covering” principle. It is of interest in
“core model theory”, since if L is replaced by a core model K, the weak
covering principle might hold, even though the full covering principle
does not. Some other conclusions which follow from ¬0# via the covering lemma will be given in the next section.
52. Square.
Like the diamond principle ♦, the square principle □ is a “combinatorial” principle which follows from V = L, and has various consequences. It was defined in 1972 by R. Jensen in the same paper
([Jensen]) as the diamond principle.
Recall from section 23 that the club subsets of α are defined for any
limit ordinal α. If S is a set of ordinals let Otp(S) denote the order type
of S, with the order inherited from Ord. Let κ be an infinite cardinal,
and let E be a subset of κ+ . The principle □κ(E) will be defined; this
is a generalization of □κ useful in developing the theory. □κ(E) is the
statement that the following holds:
There is a sequence ⟨Cα⟩ for α < κ+ with α ∈ LimOrd, such that
the following hold.
1. Cα is a club subset of α;
2. if cf(α) < κ then Otp(Cα ) < κ; and
3. if β ∈ Lim(Cα ) then Cβ = Cα ∩ β, and β ∉ E.
□κ denotes □κ(∅). It is readily seen that requirement 2 may be alternatively stated as, |Cα | < κ.
Theorem 1. If V = L then □κ holds.
Remarks on proof: A proof is given in section IV.5 of [Devlin]. Let
S = {α < κ+ : α > κ, ω · α = α, and ∀γ < α(|γ|Jα < κ)}. S may be
seen to be a club subset of κ+ . Using fine structure theory, a sequence
⟨Cα⟩ for α ∈ S is constructed, such that
1. Cα is a closed subset of α;
2. if cf(α) > ω then Cα is unbounded in α;
3. the order type of Cα is at most κ; and
4. if β ∈ Cα then Cβ = Cα ∩ β.
Via the order preserving bijection S ↦ Lim(κ), ⟨Cα⟩ becomes a
sequence ⟨Bα⟩ for α < κ+ with α ∈ LimOrd, such that
1. Bα is a closed subset of Lim(α);
2. if cf(α) > ω then Bα is unbounded in α;
3. the order type of Bα is at most κ; and
4. if β ∈ Bα then Bβ = Bα ∩ β.
The sequence ⟨Bα⟩ can in turn be converted to a sequence of sets
as required for □κ (lemma IV.5.1). ⊳
Recall the principle ♦κ (E) defined in section 23.
Lemma 2. Let W denote {α < κ+ : cf(α) = ω}.
a. Suppose κ is uncountable and □κ holds; then there is a stationary
set E ⊆ W such that □κ(E) holds, and if ♦κ+ (W ) holds then
♦κ+ (E) holds.
b. Suppose GCH holds, κ is uncountable, cf(κ) = ω, and □κ holds;
then ♦κ+ (W ) holds.
c. Suppose GCH holds, κ is infinite, and cf(κ) > ω; then ♦κ+ (W )
holds.
Remarks on proof: These may be found in [Devlin]. Part a is lemma
IV.2.10. Part b is lemma IV.2.8. Part c is lemma IV.2.7. ⊳
Theorem 3. Suppose κ is an infinite cardinal, and there is a stationary subset E ⊆ κ+ such that □κ(E) and ♦κ+ (E) both hold. Then
there is a κ+ -Suslin tree.
Remarks on proof: This is theorem IV.2.5 of [Devlin]. ⊳
Corollary 4. Suppose GCH holds, κ is uncountable, and □κ holds;
then there exists a κ+ -Suslin tree.
Remarks on proof: This is theorem IV.2.10 of [Devlin]. Using
lemma 2 there is a stationary subset E ⊆ κ+ such that □κ(E) and
♦κ+ (E) both hold. ⊳
It follows using theorem 1 and corollary 4 that if V = L then for any
infinite cardinal κ, a κ+ -Suslin tree exists. Using the covering lemma,
the following may be shown.
Theorem 5. Suppose ¬0#. Let κ be a singular cardinal.
a. □κ holds.
b. Suppose that GCH holds also. Then there is a κ+ -Suslin tree.
Remarks on proof: Part a is theorem V.5.6 of [Devlin]. Part b
(theorem V.5.7 of [Devlin]) follows from part a by theorem 3. ⊳
The following result will be of interest in section 56.
Theorem 6. For any cardinal κ, if □κ is false then κ+ is Mahlo in
L.
Remarks on proof: This was noted in [Jensen]. First, by modifying
the proof that □κ holds in L it may be shown that, if A ⊆ κ+ and
∀α < κ+ (|α|^{L[A∩α]} ≤ κ) then □κ holds in L[A]. See exercise IV.5 of
[Devlin] and theorem 6.36 of [BJW]. The theorem then follows (exercise
IV.6 of [Devlin]). ⊳
Corollary 7. If ¬□κ for a singular cardinal κ then 0# exists.
Remarks on proof: This follows from theorem 6 using theorem
51.7.c. ⊳
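The argument for corollary 7 can be spelled out as a chain of implications. This is a sketch of one way to fill in the step; the intermediate observation that a Mahlo cardinal is a limit cardinal is standard and is supplied here.

```latex
% \kappa singular, \neg\square_\kappa assumed.
\begin{align*}
\neg\square_\kappa
  &\;\Longrightarrow\; \kappa^{+} \text{ is Mahlo in } L
     && \text{(theorem 6)} \\
  &\;\Longrightarrow\; \kappa^{+} \text{ is a limit cardinal in } L \\
  &\;\Longrightarrow\; (\kappa^{+})^{L} < \kappa^{+}
     && \text{(since } \kappa \text{ is a cardinal in } L \text{ below } \kappa^{+}) \\
  &\;\Longrightarrow\; 0^{\#} \text{ exists}
     && \text{(contrapositive of theorem 51.7.c).}
\end{align*}
```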
The square principle, like many other statements of modern set
theory, has become of interest in various topics. [CFM] is one example
of research in this area.
53. Independence of AC.
As noted in theorem 21.13, if M [G] is a generic extension of a
transitive model M of ZFC then M [G] is a model of ZFC. To construct
a model in which the axiom of choice fails, additional steps are needed.
A set x is said to be ordinal definable if
(∗) there is a formula F (u, p⃗), and ordinals α⃗, such that u ∈ x if and only
if F (u, α⃗) holds.
The notion of ordinal definability was discussed by K. Godel in 1947,
and has found various uses in set theory since. Discussions can be found
in [Drake], [Jackson], [Jech2], [Schindler], and other references.
The definition (∗) cannot be formalized in ZFC. A definition which can
be formalized is easy to give, however. Let OD be the class of sets x such that there
is an ordinal β such that x is definable in Vβ from ordinal parameters.
Using a reflection principle it follows that x ∈ OD if and only if (∗)
holds (this is noted following lemma 13.25 of [Jech2]).
OD can be characterized in various other ways. In [Jech2] it is
characterized as those x which are obtained from elements Vα by “Godel
operations”, also called “fundamental operations”. The exact set of
operations varies from author to author. The basis functions of section
45 can be used, since theorem 13.4 of [Jech2] holds for these; this may
be seen from results of section VI.1 of [Devlin].
Let HOD be the sets x which are “hierarchically” OD, that is,
such that x ∈ OD and TC(x) ⊆ OD. Basic properties of these classes
include the following.
Theorem 1.
a. OD has a definable well-ordering.
b. L ⊆ HOD ⊆ OD ⊆ V .
c. If HOD = OD then V = HOD.
d. HOD is a model of ZFC.
Remarks on proof: Part a is lemma 13.25 of [Jech2]. Since {Vβ :
β < α} is well-ordered, so is its closure under the Godel operations (see
for example lemma 46.8). For part b, since L is transitive it is only necessary to show that
L ⊆ OD; this follows because there is a definable bijection between
Ord and L (see lemma 5.8.4 of [Drake]). For part c, if HOD = OD
then every Vα is HOD and hence every set is OD (remarks following
theorem 8.8 of [Drake]). Part d may be proved by various methods; that
of theorem 13.26 of [Jech2] uses a general fact of interest. Namely, a
transitive class which is closed under the Godel operations, and “almost
universal”, is a model of ZF (theorem 13.9 of [Jech2], theorem 14.11 of
[TakZar1]). ⊳
Possible inequalities which are not excluded by theorem 1.b and
theorem 1.c are consistent; see [Drake] for further remarks. Since HOD
satisfies AC, to violate AC the method must be generalized. For a set
A, let OD_A be the sets x such that
- there is a formula F , ordinals α⃗, and elements ai ∈ A, such that
u ∈ x if and only if F (u, α⃗, a⃗) holds.
Let HODA be the sets x such that x ∈ ODA and TC(x) ⊆ ODA . By
arguments similar to those already given, ODA is definable in ZFC,
and HODA is a model of ZF. The notation HOD(A) will be used for
HODA∪{A} , as in [Jech2].
Let M be a ground model in which V = L holds. Let P be the
partial order, whose elements are functions f : s ↦ {0, 1} where s is a
finite subset of ω × ω, with f < g if and only if f ⊃ g. Let G be an
M -generic subset of P . For i ∈ ω let ai be the set of j ∈ ω such that
f (i, j) = 1 for some f ∈ G. Let A = {ai : i ∈ ω}.
Theorem 2. In the model HOD(A) where A is defined above, there
is no well-order of the real numbers.
Remarks on proof: This follows from lemma 14.39 of [Jech2]. ⊳
54. Proper forcing.
Properness is a property of notions of forcing, which has found
many applications since it was defined by S. Shelah in 1982. Since it
will be referred to in subsequent sections, a brief treatment will be given.
See chapter 31 of [Jech2] and [Abraham] for more extensive treatments.
Some variations of proper forcing which have subsequently been seen
also to be of interest will be described as well.
First, the notion of the club filter in [A]µ , for a cardinal µ of uncountable cofinality and a set A with |A| ≥ µ, will be defined. This
notion was first defined by T. Jech in 1972, and has since become commonly used. Chapter 8 of [Jech2] includes a discussion of this topic.
The set [A]µ becomes a poset when ordered by the subset relation.
A subset X ⊆ P of a quasi-order P is said to be cofinal if for any p ∈ P
there is an x ∈ X with p ≤ x. The term “unbounded” is also used. A
subset C ⊆ [A]µ is said to be a club subset if it is cofinal, and closed
under unions of ascending chains of length some ordinal α ≤ µ.
Lemma 1. Suppose µ and A are as above. The subsets of [A]µ
which contain a club subset comprise a filter.
Proof: It suffices to show that if C0 and C1 are club subsets then
C0 ∩ C1 is a club subset. Suppose s0 is any element of [A]µ . Define sij
for i ∈ ω and j = 0, 1 inductively. Let s00 be any f ∈ C0 such that
s0 ⊆ f . Let si1 be any f ∈ C1 such that si0 ⊆ f . Let si+1,0 be any
f ∈ C0 such that si1 ⊆ f . Let s = ∪i,j sij . Then s0 ⊆ s and s ∈ C0 ∩ C1 .
Given an ascending chain of elements of C0 ∩ C1 of length < µ+ , its
union is in both C0 and C1 , so is in C0 ∩ C1 . ⊳
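The interleaving construction in the proof of lemma 1 may be pictured as a single ascending chain. The following sketch only restates the construction above, using the same names s0 and sij.

```latex
% Alternate between C_0 and C_1, starting from an arbitrary s_0 in [A]^\mu.
s_0 \subseteq s_{00} \subseteq s_{01} \subseteq s_{10} \subseteq s_{11} \subseteq \cdots,
\qquad s_{i0} \in C_0, \quad s_{i1} \in C_1 \quad (i \in \omega),
\\[4pt]
s \;=\; \bigcup_{i\in\omega} s_{i0} \;=\; \bigcup_{i\in\omega} s_{i1} .
% The first union shows s \in C_0 and the second shows s \in C_1,
% since each C_j is closed under unions of ascending chains of length \omega.
```

Since s0 was arbitrary, this shows that C0 ∩ C1 is cofinal.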
The terminology “club subset” is used because facts concerning the
usual club filter may be adapted to this filter. In particular, a subset is
called thin if its complement is in the club filter; and stationary if it is
not thin.
A notion of forcing ⟨M, P ⟩ is said to be proper if, whenever, in
M , S is a stationary subset of [λ]ω for an uncountable cardinal λ, S is
stationary in the generic extension M [G].
Some sufficient conditions for properness include the following.
- If P satisfies the countable chain condition then P is proper (lemma
31.2 of [Jech2]).
- If P is ω-closed then P is proper (lemma 31.3 of [Jech2]).
- If P “satisfies axiom A” then P is proper (lemma 31.10 of [Jech2]).
There is an important equivalent formulation, which involves some
preliminary definitions. Suppose (N, P ) is a notion of forcing.
- A cardinal λ is sufficiently large if λ > 2|P | .
- For λ a sufficiently large cardinal, by an elementary substructure
of Hλ will be meant one in an expanded structure including any
items of interest, for example P .
- A set D ⊆ P is predense if every p ∈ P is compatible with some
q ∈ D.
- If M is an elementary substructure of Hλ where λ is a sufficiently
large cardinal, a condition q ∈ P is said to be (M, P )-generic if for
every maximal antichain A ∈ M the set A ∩ M is predense below
q.
The following holds:
- P is proper if and only if for all sufficiently large cardinals λ there is
a club subset of [Hλ ]ω of countable elementary submodels M such
that for all p ∈ M there is a q ≤ p which is (M, P )-generic (theorem
31.7 of [Jech2]).
One of the most important properties of proper forcing is the following. Recall from section 27 that in an iteration having countable
support, the overall forcing notion involves sequences which are not 1
at only countably many places.
- If Pα is a countable support iteration of {Q̇β : β < α} such that
every Q̇β is a proper forcing notion in M Pβ , then Pα is proper
(theorem 31.15 of [Jech2]).
A weakened version of properness has been seen to have various
applications. Suppose M is an elementary substructure of Hλ where λ
is a sufficiently large cardinal.
- For a condition q ∈ P , q is (M, P )-generic if and only if, for any
name α̇ ∈ M for an ordinal (i.e., such that ⟦α̇ is an ordinal⟧ = 1), q ⊩ α̇ ∈ M (lemma 31.6 of [Jech], fact 18.31 of [Roitman]).
- Say that a condition q ∈ P is (M, P )-semigeneric if for any name
α̇ ∈ M for a countable ordinal, q ⊩ α̇ ∈ M .
- P is defined to be semiproper if, for all sufficiently large cardinals λ
there is a club subset of [Hλ ]ω of countable elementary submodels
M such that for all p ∈ M there is a q ≤ p which is (M, P )-semigeneric.
Properties of interest of semiproper notions of forcing include the
following.
- If ⟨M, P ⟩ is semiproper then every stationary set S ⊆ ℵ1 remains
stationary in M [G] (theorem 34.4 of [Jech2]).
A notion of forcing is said to be “stationary set preserving” if it has the
foregoing property.
- In general, countable support iteration does not preserve semiproperness. There is a type of iteration, called “revised countable
support” (RCS) iteration, which does. See chapter 37 of Jech for
some discussion.
55. Core models.
Core models are inner models which are constructed by certain
methods, and have certain properties. There is no precise definition,
but several models have been constructed, which set theorists agree are
examples of core models. The first construction appeared in [DoddJen1],
of the “Dodd-Jensen” core model, which will be denoted K^DJ . An
alternative treatment may be found in [Dodd], and a brief overview in
[Jech2]. An overview will be given here.
A notion central to core model theory is that of a “premouse”. The
definition depends on the type of core model, and even for the same type
there are variations between authors. The definition in [DoddJen1] will
be considered here.
Recalling the definition of the relativized Jensen hierarchy from section 46, structures of the form J^A_α = ⟨J^A_α , A ∩ J^A_α ⟩ will be considered
(on the left J^A_α names the structure, on the right its underlying set). As
noted in section 46, such structures are amenable; they are a restricted
type of J-structure, as defined in section 47. A premouse at κ is defined
to be a structure N = J^U_α where in N , κ is a cardinal and U is a normal
N -ultrafilter on κ. (U is a subset of J^U_α , but not necessarily an element).
Suppose N is a premouse at κ. On N^κ ∩ N (the functions from κ to N which are elements of N ), let
- f ≡0 g if and only if {α < κ : f (α) = g(α)} ∈ U ;
- f ∈0 g if and only if {α < κ : f (α) ∈ g(α)} ∈ U ; and
- U0 (f ) if and only if {α < κ : U (f (α))} ∈ U .
By arguments as given in earlier sections, ≡0 is a congruence relation
on N^κ ∩ N , equipped with ∈0 and U0 . Let Ñ denote the quotient, with
∈Ñ and UÑ the induced relations.
Using familiar arguments, the following may be shown. Los’ theorem holds for Σ0 formulas. For x ∈ N let Cx denote the function on κ
where Cx (α) = x for all α < κ; the map x 7→ [Cx ] is Σ0 -elementary. It
is also ∈-cofinal, so in fact is Σ1 -elementary.
If Ñ is well-founded, let N + be the transitive collapse, and let
jN : N 7→ N + be x 7→ [Cx ] composed with the transitive collapse
map π. Again, by familiar arguments, jN ↾ κ is the identity map,
Pow(κ) ∩ N + = Pow(κ) ∩ N , and if x = π([f ]) then jN (f )(κ) = x.
Lemma 1. N+ = J^{U+}_{α+} where α+ = sup(j[α]) and U+ (π([f ])) if and
only if UÑ ([f ]). Letting κ′ denote jN (κ), κ′ is a cardinal in N+ and U+
is a normal N+ -ultrafilter on κ′ . In particular, N+ is a premouse at κ′ .
Remarks on proof: This is lemma 3.8 of [DoddJen1]. All claims
follow using the fact that jN is Σ1 -elementary. ⊳
Lemma 1 permits iterating the step N ↦ N+ , provided Ñ is well-founded. Say that N is 0-iterable, N0 = N , and j_{0,0} is the identity. If
N is α-iterable and Ñα is well-founded say that N is (α + 1)-iterable,
N_{α+1} = (Nα )+ , for β ≤ α j_{β,α+1} = j_{Nα} ◦ j_{β,α} , and j_{α+1,α+1} is the identity.
If α is a limit ordinal and N is β-iterable for all β < α, let Nl be the
direct limit of the Nβ with the maps j_{γ,β} for γ < β < α. If Nl is well-founded say that N is α-iterable, let Nα be the transitive collapse of
Nl , and let j_{β,α} be the direct limit map, composed with the transitive
collapse map.
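The recursion just described can be summarized in displayed form. This is a sketch; the last line (that the maps commute) is the usual property expected of such a system and is stated here as an assumption rather than quoted from [DoddJen1].

```latex
\begin{align*}
N_0 &= N, \qquad j_{0,0} = \mathrm{id}, \\
N_{\alpha+1} &= (N_\alpha)^{+}, \qquad
  j_{\beta,\alpha+1} = j_{N_\alpha} \circ j_{\beta,\alpha} \quad (\beta \le \alpha), \\
N_\lambda &= \text{transitive collapse of } \varinjlim_{\beta<\lambda} N_\beta
  \quad (\lambda \text{ a limit ordinal, if the limit is well-founded}), \\
j_{\beta,\gamma} &= j_{\alpha,\gamma} \circ j_{\beta,\alpha}
  \quad (\beta \le \alpha \le \gamma).
\end{align*}
```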
A premouse N is said to be iterable if it is α-iterable for all ordinals
α. If M is an iterable premouse at κ the sequence ⟨Mi , jij , κi ⟩, where
κi = j0i (κ), is called the iteration of M . The following may be verified
(lemma 3.12 of [DoddJen1]):
- jij is Σ1 -elementary and cofinal.
- jij ↾ κi is the identity,
- If i < j then κi < κj and Pow(κi ) ∩ Mj = Pow(κi ) ∩ Mi .
- κi is the critical point of jij .
- Any element of Mj is definable by a Σ0 formula with parameters
from Ran(jij ) ∪ {κh : i ≤ h < j}.
The core model K^DJ (written simply K if there is no possibility of
confusion) is constructed by singling out certain iterable premice, called
mice, which satisfy certain “fine structural” requirements. It turns out
that the existence of a mouse, the existence of an iterable premouse,
and the existence of 0#, are equivalent (see chapter 12 of [Dodd]).
The definition of mice and the derivation of their relevant properties
in [DoddJen1] is lengthy, and only some remarks will be made here,
referring to the treatment there. For convenience, conventions following
[DoddJen1] will be used, which are slightly at variance with those used
in section 47. M , N will generally be used to denote structures of the form
J^A_α . These may be required to satisfy additional hypotheses, e.g., to be
acceptable or to be a premouse. The term “strongly acceptable” will be
used for the notion of “acceptable” defined in section 47. The parameter
pM is defined to be the least finite set of ordinals (under a well-order on
such). A^M is defined to be {⟨i, x⟩ ∈ J_{ρM} : ⊨M φi (x, pM )}. M∗ denotes
J^{A^M}_{ρM} .
Lemma 2.
a. M ∗ is strongly acceptable.
b. J^{A^M}_{ρM} = H^M_{ωρM} .
Remarks on proof: Part a is lemma 2.18 of [DoddJen1], and part b
is lemma 2.19. ⊳
Let hM denote the uniformly defined Σ1 Skolem function for M
(denoted h^M_J in section 46). A structure M = J^A_α is said to be sound if
J^A_α = hM [ω × (J_{ρM} ∪ {pM })] (that is, making suitable adjustments to
the definitions, if pM is a very good parameter).
Lemma 3. Suppose A ⊆ J^A_{ρM} .
a. M is sound.
b. J^{A^M}_{ρM} = J^A_{ρM} .
Remarks on proof: See lemma 4.2 and remarks following lemma
4.4 of [DoddJen1]. Part a may be proved as the case n = 0 of lemma
9.7 of [Dodd]. For part b, by remarks following lemma 47.6 ⟨J^A_α , A^M ⟩
is amenable. The claim follows by arguments as in the proof of lemma
46.16. ⊳
Definition 4.5 of [DoddJen1] gives the recursion equations for iterating the projectum, as follows. Let M^(0) = M , ρ^0_M = α, p^0_M = ∅,
and A^0_M = A. Let M^(i+1) = (M^(i) )∗ , ρ^{i+1}_M = ρ_{M^(i)} , p^{i+1}_M = p_{M^(i)} , and
A^{i+1}_M = A^{M^(i)} . It follows by induction that M^(i) = J^{A^i_M}_{ρ^i_M} .
M is said to be n-sound if M^(i) is sound for i < n. Using lemma
3 and standard methods such as may be found in the proof of theorem 47.8, it may be seen that if A ⊆ J^A_{ρ^n_M} then M is n-sound, and,
suitably reformulated, theorems 47.8.c and 47.8.d hold (lemma 4.6.a of
[DoddJen1]).
A premouse N = J^U_α at κ is said to be critical if N is acceptable,
and there is a subset of κ, definable from parameters, which is not an
element of N .
Lemma 4. Suppose N is critical, and n is least such that there is a
subset of κ, Σn -definable from parameters, which is not an element of
N . Then ρn+1
≤ κ < ρnN .
N
Remarks on proof: This is stated following Definition 5.1 of [DoddJen1]. Let δN denote the least δ ≤ α such that U ⊆ JαU . Using facts
given above, ρn+1
< δN ≤ ρnN . The claim follows using lemma 4.8 of
M
[DoddJen1]. ⊳
The integer n of lemma 4 is called the critical number, and denoted
n(N ). Let N′ = N^(n) ; it follows by facts above that N′ = J^U_{ρ′} where
ρ′ = ρ^n_N . A premouse N = J^U_α at κ is said to be a mouse (at κ) if
- N is critical,
- N′ is iterable with iteration ⟨(N′ )i , (j′ )ij , (κ′ )i ⟩, and
- for each i ∈ Ord there is a critical premouse Ni such that n(Ni ) =
n(N ) and (Ni )′ = (N ′ )i .
Lemma 5. Suppose N is a mouse and σ : M ↦ N′ is Σ1 -elementary.
Then there is a unique N̄ such that N̄ ′ = M . Further, N̄ is a mouse
and n(N̄ ) = n(N ).
Remarks on proof: See lemma 5.6 of [DoddJen1]. ⊳
If M is the transitive collapse of hN ′ [ω × (JρN ′ ∪ {pN ′ })], by the
preceding lemma there is a unique mouse N̄ with N̄ ′ = M . N̄ is called
the core of N .
It may be shown that, for a mouse N , the iteration of N ′ may
be “lifted” to a system of structures and Σn+1 -elementary embeddings,
starting at N (remarks following Definition 5.4 of [DoddJen1]; and also
lemma 9.15 of [Dodd]). This system is called the mouse iteration of N .
Let CN = ∩(U ∩ hN′ [ω × (JρN′ ∪ {pN′ })]). It follows that CN is
the set of Σ1 generating indiscernibles for N ′ with suitable constants
added to the language, which is unique, and equals {κ̄i : i < λ}, where
the κ̄i are the critical points of the mouse iteration of the core N̄ of
N and N = N̄λ in the iteration (see remarks following lemma 5.14 of
[DoddJen1]).
Lemma 6. For any κ, there is at most one N such that N is a
mouse at κ and |CN | = ω.
Remarks on proof: See lemma 6.2 of [DoddJen1]. ⊳
Let
D = {hξ, κi : there is a mouse N at κ with |CN | = ω and ξ ∈ CN };
Kα = JαD ; and
K = ∪α Kα = L[D].
K is the (Dodd-Jensen) core model.
After proving various lemmas about the above constructs, the following may be shown (references are to [DoddJen1]).
- if β is an infinite cardinal in K then Kβ = HβK (lemma 6.9).
- |=K GCH (corollary 6.10).
- if 0# does not exist then K = L (Examples 1 of section 6).
- If 0# exists then 0# ∈ K (remarks following Definition 5.4).
- If 0# exists and β is an uncountable cardinal in K then Kβ =
∪{N ∈ Kβ : N is a mouse} (lemma 6.11).
- If there is a set U such that U is a normal measure in L^U then
K = ∪i H^{L^{Ui}}_{κi} = ∩i L^{Ui} where ⟨L^{Ui} , κi ⟩ is the iteration (Examples 3
of section 6).
- It follows by standard observations that the hypothesis of the preceding item holds if and only if there is an inner model containing
a measurable cardinal.
- In K, there is no set U such that U is a normal measure in L^U
(remarks in fourth paragraph).
- An iterable premouse is acceptable (lemma 5.21).
K is “between” L and L[U ] for a normal measure U . It exists even if
there is no such U , and contains 0# if and only if 0# exists.
A standard method in applications of core models is as follows.
- Let P be a property of cardinals, to the effect that they are “large”.
- Let H be the statement, “there does not exist an inner model of
∃κP (κ)”
- Let K be a core model.
- Let SK be a statement involving K, such that H ⇒ SK .
- Let SV be a statement not involving K, such that SK ⇒ SV .
- It follows that if ¬SV is consistent then ∃κP (κ) is consistent.
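The logical skeleton of the method can be displayed as follows. This is a sketch of the inference only; it uses the symbols H, S_K, S_V and P from the list above and adds nothing about any particular core model.

```latex
% H: there is no inner model of \exists\kappa\, P(\kappa).
\begin{align*}
& H \Rightarrow S_K, \qquad S_K \Rightarrow S_V
  && \text{(the two facts to be established)} \\
& \neg S_V \;\Rightarrow\; \neg S_K \;\Rightarrow\; \neg H
  && \text{(contrapositives)} \\
& \neg S_V \;\Rightarrow\; \text{there is an inner model of } \exists\kappa\, P(\kappa) \\
& \mathrm{Con}(\mathrm{ZFC} + \neg S_V) \;\Rightarrow\;
  \mathrm{Con}(\mathrm{ZFC} + \exists\kappa\, P(\kappa)).
\end{align*}
```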
The statement H is called the “anti-large cardinal hypothesis”. K must
be constructed so that H ⇒ SK follows, for various SK of interest.
This is often stated, “K is the core model below, or up to, a cardinal
with property P ”. K has a “maximality” property, in that considerable
“large cardinal structure” up to ∃κP (κ) is incorporated; it has a “minimality” property in that this is done in a minimal fashion, invariably
as L[E] for some class E.
That H ⇒ SK is not as written a statement of set theory; what
is actually proved is the contrapositive ¬SK ⇒ (∃κP(κ))^M for some inner model M (so that ZFC^M also). The consistency implication then
follows as in remarks preceding theorem 43.13.
In the introduction to part four of [Dodd] it is stated that “The
covering lemma and the SCH are the most important applications of
K”. This application is an example (indeed the first) of the above
described method:
- P is “measurable”.
- K is K^DJ.
- SK is the “covering property”, that if x is an uncountable set of
ordinals then there is a set of ordinals y ∈ K such that x ⊆ y and
|y| = |x|.
- SV is SCH.
That H ⇒ SK was first proved in [DoddJen2]; a proof can also be found
in lemma 18.26 of [Dodd]. That SK ⇒ SV is lemma 21.10 of [Dodd].
There is a covering lemma for L[U ], and consequences for SCH; see
[DoddJen3].
“Higher” core models, that is, up to larger cardinals, have been
of considerable interest in modern set theory. Chapter 35 of [Jech2]
mentions three such, up to a measurable cardinal κ of Mitchell order κ
(K^m), up to a strong cardinal (K^strong), and up to a Woodin cardinal
(K^steel). The methods needed for constructing successive core models
become increasingly complex. Roughly, K^m requires coherent sequences
of measures, K^strong requires coherent sequences of extenders, and K^steel
requires iteration trees. Iteration trees are used for various purposes, including constructing fine structural inner models which need not be core
models. Further remarks are omitted. Several articles in The Handbook
of Set Theory provide further information.
56. Consistency strength.
If T1 and T2 are first order theories, T1 is said to have consistency
strength at least that of T2 if Con(T1 ) ⇒ Con(T2 ). This notion has already been encountered in section 43, where the theories are specialized
to “ZFC+∃κP (κ)” where P defines a type of large cardinal.
The question arises of what methods may be allowed in proving the
implication. For extensions of ZFC, ZFC may be allowed. For weaker
theories (and even set theory), though, the methods might be restricted,
say to primitive recursive arithmetic (for which see [Smorynski]).
T1 and T2 are said to be equiconsistent if Con(T1 ) ⇒ Con(T2 ) and
Con(T2 ) ⇒ Con(T1 ). T1 is said to have consistency strength greater
than that of T2 if Con(T1 ) ⇒ Con(T2 ), but ¬(Con(T2 ) ⇒ Con(T1 )).
Some simple examples are as follows.
- ZFC+GCH is equiconsistent with ZFC (since L is an inner model).
- ZFC+¬GCH is equiconsistent with ZFC (since there is a generic
extension in which CH is false).
Let I be the statement “there exists an inaccessible cardinal”.
- ZFC+I has consistency strength strictly greater than ZFC (since
Con(ZFC) is provable in ZFC+I, and using the second incompleteness theorem).
- ZFC+¬I is equiconsistent with ZFC (since a model of ZFC+I may
be “truncated” to a model of ZFC+¬I).
Certain statements of set theory may be singled out as “large cardinal axioms”. Often these are of the form “∃κP (κ)” where P is a
predicate singling out some class of large cardinals. Some other statements are considered as “having large cardinal strength”, though, for
example “0# exists” (see theorem 9.17 of [Kanamori]).
The following are empirical observations of modern set theory.
- If C1 and C2 are large cardinal axioms then either Con(ZFC +
C1) ⇒ Con(ZFC + C2) or Con(ZFC + C2) ⇒ Con(ZFC + C1).
- If S is a statement of interest then there is a large cardinal axiom
C such that ZFC+S is equiconsistent with ZFC+C.
- That Con(ZFC + C) ⇒ Con(ZFC + S) gives an “upper bound” on
the consistency strength of S. Upper bound proofs often involve
constructing a generic extension satisfying S, from a ground model
satisfying C. Sometimes, the implication C ⇒ S can be shown.
- That Con(ZFC + S) ⇒ Con(ZFC + C) gives a “lower bound” on
the consistency strength of S. Lower bound proofs often involve
constructing an inner model (in many cases of interest a core model)
satisfying C, starting with a model of S.
- Sometimes bounds are shown for Con(ZF + S).
For an example, the negation of SCH is equiconsistent with the existence of a measurable cardinal κ of Mitchell order κ^{++}. The existence
of a measurable cardinal κ such that 2^κ > κ^+ also is. These results were
proved by Mitchell and Gitik, in the period from 1984 to 1991. For further information see corollary 35.18 and theorem 36.1 of [Jech2], and
theorem 4.41 of [Mitchell2]. For a survey of subsequent developments
see [Kanamori4].
For another example, if there exists a Mahlo cardinal then there
is a forcing model in which ¬□_{ℵ1} holds (exercise 27.2 of [Jech2]). By
theorem 52.6, ¬□_{ℵ1} is equiconsistent with the existence of a Mahlo
cardinal.
Further examples of consistency strength bounds will be given in
subsequent sections.
57. Descriptive set theory.
Descriptive set theory is a subject dating to the last half of the
1910’s. Its subject matter is sets of real numbers, which satisfy any of
various definability conditions. It has been found convenient to consider
the real numbers in the alternative form mentioned in section 14, as
elements of Baire space N, that is, ω^ω with basic open sets U_t where t ∈ ω^{<ω}.
Definable subsets are a topic of interest in any topological space.
Spaces called “Polish spaces” are of particular interest. Descriptive set
theory in more general spaces is thoroughly covered in [Kechris] and
[Moschovakis]. For some applications, including set theory, descriptive
set theory in some particular spaces is all that is required. Such treatments may be found in [Jech2] and [Kanamori3].
Many facts of descriptive set theory, when suitably formulated, hold
for Polish spaces in general. For various applications the cases of greatest
interest are R, N, and C. No two of these are homeomorphic. (For
readers familiar with some topology, R is connected and not compact,
N is totally disconnected and not compact, and C is totally disconnected
and compact. See [Dowd1] for definitions.) Any two uncountable Polish
spaces are however “Borel isomorphic” (see theorem 15.6 of [Kechris]).
Proofs might be given for any Polish space. Alternatively, as a matter of convenience, a proof might be given for a particular Polish space.
This can then be “transferred” to an arbitrary Polish space via Borel
isomorphism. It can also be transferred to some other particular Polish
space by a specific method, avoiding the need to develop the theory of
Borel isomorphism. Most results given here will be for particular spaces.
The starting point of descriptive set theory is the Borel sets. These
were first defined in the 19th century by E. Borel. The class of Borel
subsets may be defined for any topological space. A “σ-algebra” of
subsets of some set is defined to be a family of subsets which is closed
under the operations of countable union and complement. The Borel
sets are defined as the smallest σ-algebra containing the open sets
(the σ-algebra generated by the open sets).
If the space satisfies some restrictions, the Borel sets may be stratified into the Borel hierarchy. In [Kechris] the restriction of “metrizability” is imposed (a topological space is metrizable if there is a metric
whose metric topology is the given topology). In a metrizable space:
- A subset is defined to be Σ^0_1 if it is open.
- For α ≥ 1 a subset is Π^0_α if it is the complement of a Σ^0_α set.
- For α > 1 a subset is Σ^0_α if it is a union of finitely or countably
many sets, each of which is Π^0_β for some β < α.
- For α ≥ 1 a subset is ∆^0_α if it is both Σ^0_α and Π^0_α.
The use of “boldface” to denote these classes is standard. Proposition
I.3.7 of [Kechris] shows that in a metrizable space, every closed subset
is a countable intersection of open sets. It follows by induction that for
α < β, Σ^0_α, Π^0_α ⊆ ∆^0_β. It is not difficult to see that ∪_{α<ℵ1} Σ^0_α is closed
under countable union and complement, and equals the class of Borel
sets.
In developing descriptive set theory it is convenient to consider the
space A^ω of sequences from any set A. Generalizing a definition from
section 14, for t ∈ A^n let U_t denote {f ∈ A^ω : f↾n = t}. The sets U_t
form the base for a topology on A^ω. This space is metrizable; indeed,
the function d(s, t) = 1/2^n where n is least such that s(n) ≠ t(n)
(and d(s, t) = 0 if s = t), is readily verified to be a metric.
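The metric can be illustrated by a short computation. The following is only a sketch: sequences are represented as Python functions on the non-negative integers, and the name `dist` and the cutoff `bound` are assumptions made for the illustration. Two sequences which agree below the cutoff are reported at distance 0, so the value is exact only when a difference is found.

```python
from fractions import Fraction

def dist(s, t, bound=64):
    """Approximate d(s, t) = 1/2^n, n least with s(n) != t(n).

    s and t are functions from non-negative integers to values.
    Only indices below `bound` are inspected (an assumption of this
    sketch); if no difference is found there, 0 is returned.
    """
    for n in range(bound):
        if s(n) != t(n):
            return Fraction(1, 2 ** n)
    return Fraction(0)

zero = lambda n: 0
spike3 = lambda n: 1 if n == 3 else 0
print(dist(zero, spike3))  # 1/8
```

Sequences sharing a long common prefix are close in this metric, which is the reason the sets U_t form a base for the metric topology.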
It is also convenient to consider product spaces. In general, the
product of topological spaces X1 , . . . , Xk has as underlying set the Cartesian product X1 × · · · × Xk . The sets U1 × · · · × Uk where Ui is an
open subset of Xi may be taken as the base of a topology (in fact the
Ui may be required to be basic, given a base for the topology on Xi
for each i). If for each i Xi is a metric space, with metric function
di , then the product space may be given any of several metrics, with
the resulting metric topology being the product topology, the function
d(x, y) = max(d_1(x_1, y_1), . . . , d_k(x_k, y_k)) for example.
A product space A_1^ω × · · · × A_k^ω is readily seen to be homeomorphic
to (A_1 × · · · × A_k)^ω, via the map ⟨⟨x_{1i}⟩, . . . , ⟨x_{ki}⟩⟩ ↦ ⟨⟨x_{1i}, . . . , x_{ki}⟩⟩.
In descriptive set theory, it is convenient to blur the distinction. For
example, ⟨x_1, . . . , x_k⟩ may be used to denote an element of (A_1 × · · · ×
A_k)^ω, so that no special notation is needed for this. Also, in a product
A_1^ω × A_2^ω, a factor may itself be a product space; more generally, the
usual identifications may be invoked, so that the order of association of
a product may be ignored.
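The identification of a tuple of sequences with a sequence of tuples can be sketched as follows. Infinite sequences are modelled by Python iterators; the names `merge` and `split` are hypothetical, and `split` acts only on a finite prefix.

```python
from itertools import count, cycle, islice

def merge(*seqs):
    """Map (<x_1i>, ..., <x_ki>) to <(x_1i, ..., x_ki)>, lazily."""
    return zip(*seqs)

def split(prefix, k):
    """Inverse direction on a finite prefix: a list of k-tuples to k lists."""
    if not prefix:
        return [[] for _ in range(k)]
    return [list(column) for column in zip(*prefix)]

# Merge <0, 1, 2, ...> and <0, 1, 0, 1, ...>, then look at four terms.
prefix = list(islice(merge(count(), cycle([0, 1])), 4))
print(prefix)            # [(0, 0), (1, 1), (2, 0), (3, 1)]
print(split(prefix, 2))  # [[0, 1, 2, 3], [0, 1, 0, 1]]
```

The first n terms of the merged sequence are determined by the first n terms of each factor and conversely, which is why the map and its inverse are both continuous.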
By a “standard space” will be meant a space A^ω where A = A_1 × · · · × A_k
and each A_i is 2 or ω. A standard space is a product of copies of N and C.
Many authors only consider spaces N^k; allowing factors of C as well is
sometimes convenient.
Another homeomorphism of interest arises between A^ω and B^ω
when |A| = |B|. Let f : A → B be any bijection, and map ⟨x_i⟩ to
⟨f(x_i)⟩. Letting ≅ denote homeomorphism, it follows that for infinite
A, (A^ω)^k ≅ (A^k)^ω ≅ A^ω. For example, N^k ≅ N.
In many areas in mathematics, including descriptive set theory, it is
useful to specify a method of “coding” a finite sequence of integers as a
single integer. The “prime power coding” is a standard such (discussed
in [Yasuhara] for example). It makes use of the “fundamental theorem
of arithmetic”, that every integer n ≥ 2 may be uniquely expressed in
the form p_1^{e_1} · · · p_k^{e_k} where p_1 < · · · < p_k are “prime numbers” and each e_i ≥ 1. A proof
of this may be found in [Dowd1].
The code for a finite sequence s = ⟨s_0, . . . , s_{l−1}⟩ will be taken as
2^{s_0+1} 3^{s_1+1} · · · p^{s_{l−1}+1} where p is the l-th prime number in increasing
numerical order. The empty sequence is coded by the empty product,
whose value is by a standard convention equal to 1. The code for s
will be denoted Cd(s). The standard operations on sequences translate
to primitive recursive functions on their codes; see [Yasuhara] for some
examples.
Let FS(i) be the sequence whose code is i if such exists, else the
empty sequence. This provides a computable listing of the finite sequences of integers. It is readily seen that |FS(i)| ≤ i, a fact which is
sometimes useful.
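The coding and decoding functions are readily programmed. The following sketch uses the hypothetical names `cd` and `fs` for Cd and FS, and generates primes by trial division, which is adequate only for small examples.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (fine for small examples)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def cd(s):
    """Code of the finite sequence s: the product of p_j^(s_j + 1)."""
    code = 1
    for p, x in zip(primes(), s):
        code *= p ** (x + 1)
    return code

def fs(i):
    """Sequence coded by i, or the empty sequence if i is not a code."""
    if i < 1:
        return []
    s = []
    for p in primes():
        if i == 1:
            return s
        e = 0
        while i % p == 0:
            i //= p
            e += 1
        if e == 0:
            return []  # a prime was skipped, so i is not a code
        s.append(e - 1)

print(cd([1, 0]))  # 12, that is 2^2 * 3^1
print(fs(12))      # [1, 0]
print(fs(5))       # [] since 5 skips the primes 2 and 3
```

Since a code of a sequence of length l is at least 2^l, the bound |FS(i)| ≤ i can be checked directly for small i with `all(len(fs(i)) <= i for i in range(1, 200))`.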
Theorem 1 (Parameterization theorem). Let A^ω be a standard
space. For any α > 0 there is a Σ^0_α subset Ũ ⊆ N × A^ω such that for
any Σ^0_α subset X ⊆ A^ω there is an element s ∈ N such that X = {x :
⟨s, x⟩ ∈ Ũ}.
Remarks on proof: The proof may be given for A = ω; modifications
for the general case are slight. For α = 1 let Ũj = {hs, xi : x ∈ UFS(j) };
and let Ũ = ∪j Ũj . Ũj is open, since Us↾(j+1) × UFS(j) ⊆ Ũj ; hence Ũ
is open. If X is open then X = ∪j UFS(j) for some s, and the theorem
follows when α = 1. The claim for α > 1 may be proved by induction;
see lemma 11.2 of [Jech2]. ⊳
Corollary 2. In a standard space,
a. there is a subset which is Σ^0_α but not Π^0_α; and
b. if α < β then Σ^0_α, Π^0_α ⊂ ∆^0_β.
Remarks on proof: Again, the case ω^ω is typical. Given α, let Ũ
be as in the theorem. Let D = {s : ⟨s, s⟩ ∈ Ũ}. D is the inverse image
of Ũ under the function s ↦ ⟨s, s⟩. This function is readily verified to
be continuous. The inverse image of a Σ^0_α subset under a continuous
function is Σ^0_α (theorem 1C.2 of [Moschovakis]). Thus, D is Σ^0_α. If D
were Π^0_α then for some s_0, D^c equals {x : ⟨s_0, x⟩ ∈ Ũ}. But then s_0 ∈ D
if and only if ⟨s_0, s_0⟩ ∈ Ũ, if and only if s_0 ∈ D^c. This contradiction
shows that D is not Π^0_α. If α < β then Π^0_α ⊆ ∆^0_β; Σ^0_α ⊂ ∆^0_β follows.
Taking complements, Π^0_α ⊂ ∆^0_β also. ⊳
A second hierarchy of sets is defined in a standard space Aω as
follows.
- A subset X is Σ^1_n if there is a subset Y ⊆ N × A^ω, which is closed
if n = 1, or Π^1_{n−1} if n > 1, such that X = π_2[Y]. X is said to be
the projection of Y; x ∈ X if and only if ∃w(⟨w, x⟩ ∈ Y).
- A subset is Π^1_n if it is the complement of a Σ^1_n set.
- A subset is ∆^1_n if it is both Σ^1_n and Π^1_n.
These classes are called the classes of the projective hierarchy, and their
elements are called projective sets. The sets in Σ^1_1 are called analytic
sets. See the introduction of [Moschovakis] for remarks on the history
of the definition of the analytic and projective sets.
The definition of an analytic subset applies in any Polish space
X, that is, a subset S ⊆ X is analytic if and only if there is a closed subset
W ⊆ N × X such that S = π_2[W]. Lemmas 11.6 and 11.7 of [Jech2]
give some other characterizations.
Lemma 3. Suppose A^ω is a standard space and let Γ_A denote the
class of analytic subsets of A^ω.
a. Γ_A is closed under countable union and intersection.
b. Γ_A contains the closed sets and the open sets.
c. If W ∈ Γ_{ω×A} and X is the projection of W then X ∈ Γ_A.
d. If W ∈ Γ_A and f : A^ω → B^ω is continuous then f[W] ∈ Γ_B.
Remarks on proof: A proof of part a may be found in the proof of
lemma 11.6 of [Jech2]. A closed set X equals the projection of N × X.
Since N has a countable base of sets which are both open and closed,
any open set is a countable union of closed sets, so using part a Γ_A
contains the open sets. For part c, ∃u∃vW may be converted to ∃wW′,
homeomorphically. For part d, since W is the projection of a closed set,
W may be assumed to be closed. The graph of f may be verified to be
a closed subset of A^ω × B^ω, and f[W] may be obtained by projection. ⊳
Theorem 4. Let A^ω be a standard space. For any n > 0 there is a
Σ^1_n subset Ũ ⊆ N × A^ω such that for any Σ^1_n subset X ⊆ A^ω there is
an element s ∈ N such that X = {x : ⟨s, x⟩ ∈ Ũ}.
Remarks on proof: See lemma 11.8 of [Jech2]. ⊳
Corollary 5. In a standard space,
a. there is a subset which is Σ^1_n but not Π^1_n; and
b. if n < m then Σ^1_n, Π^1_n ⊂ ∆^1_m.
Remarks on proof: Similar to corollary 2. ⊳
By lemma 3 a Borel set is ∆^1_1. By corollary 5 there are analytic
sets which are not Borel. Say that disjoint subsets X, Y of a set are
separated by a subset Z if X ⊆ Z and Y ⊆ Z^c.
Theorem 6. If X and Y are disjoint analytic subsets then there is
a Borel subset Z which separates them.
Remarks on proof: See lemma 11.11 of [Jech2]. Other proofs
may be found in proposition 13.4 of [Kanamori3] and theorem 2E.1
of [Moschovakis]. These involve further concepts, some of which will be
discussed below. ⊳
Theorem 7. A subset is ∆^1_1 if and only if it is Borel.
Remarks on proof: It has already been observed that Borel subsets
are ∆^1_1. If X is ∆^1_1 then both X and X^c are analytic, whence they are
separated by some Borel set, which must be X. ⊳
There are “effective” (“lightface”) versions of classes of sets (“boldface”) defined above. A simple method for defining these is to consider
a language with three sorts of variables, intended to range over ω (integer variables), ω^ω (function variables), and 2^ω (set variables). The
functions and relations of the language consist of 0,1,+,·,≤ on integer
values; function application, which may be written as the term f (n) as
usual; and predicate application, which may be written as the atomic
formula X(n) as usual. Set variables may only occur free in formulas.
As mentioned previously, many authors omit set variables.
The quantifier complexity of formulas in this language is defined “as
usual” (see section 34). A formula of the form Q_1 x⃗_1 · · · Q_n x⃗_n G, where
the blocks of quantifiers alternate in type, is said to be Σ^0_n (resp. Π^0_n)
if the quantifiers are number quantifiers, G has only bounded number
quantifiers, and Q_1 is ∃ (resp. ∀). The formula is said to be Σ^1_n (resp. Π^1_n)
if the quantifiers are function quantifiers, G has only number quantifiers,
and Q_1 is ∃ (resp. ∀).
A formula with free variables f1 , . . . , fk , where fi denotes either
a function or set variable, defines a predicate on some standard space.
This predicate is said to be Σ^0_n, Π^0_n, Σ^1_n, or Π^1_n, if there is a formula of
the specified type defining it. The ∆^0_n and ∆^1_n predicates are defined as
usual.
Lemma 8.
a. A Σ^1_1 subset may be defined by a formula ∃g⃗ G where G is Π^0_1.
b. A subset is Σ^0_1 if and only if it can be defined in the form
∃n R(Cd(f_1↾n), . . . , Cd(f_k↾n), n)
where R is a computable predicate.
c. A subset is Σ^1_1 if and only if it can be defined in the form
∃g∀n R(Cd(f_1↾n), . . . , Cd(f_k↾n), Cd(g↾n), n)
where R is a computable predicate.
Remarks on proof: See section 12 of [Kanamori3]. ⊳
Part a is an effective version of the fact that an analytic set is the
projection of a closed set. Part c is used by some authors (such as
[Jech2]) as the definition of a Σ^1_1 subset. Using part b it is easy to show
that (when k = 1) a subset is Σ^0_1 if and only if it is empty or is of the
form ∪_i U_{FS(e(i))} for some computable function e.
The lightface classes may be defined “relative to” an arbitrary h ∈
ω^ω. This is readily accomplished by adding a symbol for the function
to the language (so that h(t) may occur in a formula; but h may not be
quantified). The resulting classes are denoted Σ^0_n(h), etc. Lemma 8.c
“relativizes”, i.e., a Σ^1_1(h) subset may be defined in such form, where R
is computable from h.
Theorem 9. A subset of a standard space is boldface Σ^0_1 if and only if it is
lightface Σ^0_1(h) for some h. The same claim holds for the other classes.
Remarks on proof: See proposition 12.6 of [Kanamori3]. ⊳
One may say that a lightface subset is one definable without parameters; and a boldface subset is one definable with parameters. The
method of the proof of the theorem may be used to strengthen theorem
1 when α is a nonzero integer n, to require that Ũ be lightface Σ^0_n; theorem 4
may similarly be strengthened (see proposition 12.7 of [Kanamori3]).
As a consequence, the lightface hierarchies are strict, indeed there is a
lightface Σ^0_n subset which is not lightface Π^0_n, etc.
There is a notion of an effective transfinitely Borel set, and an
effective version of theorem 7; see corollary 27.4 of [Miller].
For s ∈ A^{<ω}, the length of s is just the cardinality of s as a set of
ordered pairs, so that |s| may be used to denote the length of s.
The Lebesgue measure was mentioned in section 15. A self-contained treatment may be found in chapter 23 of [Dowd1]; an overview
may be found in chapter 11 of [Jech2]. Some subsets of R^n (the “measurable” sets) are assigned a “volume”. The Lebesgue measurable sets
form a σ-algebra containing the Borel sets. A measure on C, which is
also called the Lebesgue measure, may be defined by assigning the measure 2^{−|t|} to U_t. This is readily generalized, to a Lebesgue measure on
C^k for any k > 0.
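For a finite union of basic open sets of C the measure can be computed exactly. The following is a sketch under the assumption that each t is given as a string of the characters 0 and 1; the name `measure` is hypothetical.

```python
from fractions import Fraction

def measure(ts):
    """Measure of the union of the U_t for t in ts (bit strings like "01").

    A string is dropped when a proper prefix of it is also listed, since
    its basic open set is then contained in a larger one. The remaining
    strings are pairwise incomparable, so the corresponding sets are
    pairwise disjoint and their measures 2^-|t| add.
    """
    ts = set(ts)
    minimal = [t for t in ts
               if not any(t[:n] in ts for n in range(len(t)))]
    return sum((Fraction(1, 2 ** len(t)) for t in minimal), Fraction(0))

print(measure(["0", "01", "11"]))  # 3/4
```

In the example U_01 is contained in U_0, so only U_0 and U_11 contribute, giving 1/2 + 1/4.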
Recall the definition of a meager set from section 15. A subset X of
R^k is said to have the property of Baire if for some open set U, X ⊕ U is
meager. The sets with the property of Baire form a σ-algebra containing
the Borel sets.
A subset of R^k is said to be perfect if it is nonempty, closed, and
contains no isolated points. A subset is said to have the perfect set
property if it is countable, or contains a perfect subset.
The following theorem is a version for particular spaces. Suitably
stated, it holds for more general spaces; see exercises 2H.8, 2H.5, and
2C.2 of [Moschovakis]. In particular, measures may be defined on Baire
space; but this will be omitted, so theorems regarding measure will
be stated for the Lebesgue measure on C^k.
Theorem 10. Every analytic subset of R^k or C^k is Lebesgue measurable, has the property of Baire, and has the perfect set property.
Remarks on proof: See theorem 11.18 of [Jech2]. ⊳
As a corollary, Π^1_1 subsets also are Lebesgue measurable and have
the property of Baire. This theorem was proved early in the history of
descriptive set theory. No further progress was made until 1938, when Gödel
announced some parts of theorem 16 below. Before giving this, some
further methods of descriptive set theory will be outlined, in particular
trees.
For a set A, a subset T ⊆ A^{<ω} which is closed under prefix (i.e.,
if s ∈ T and t is a prefix of s then t ∈ T ) is a type of tree. To distinguish
this type from general trees, authors use adjectives, such as “sequential”
in [Jech2]. For the rest of this section, by a tree will be meant one of
this type. These trees are extremely useful in descriptive set theory. For
example, closed subsets of A^ω can be characterized in terms of trees.
For a set X ⊆ A^ω the set {s : s is a prefix of f for some f ∈ X} is
a tree; let Pr(X) denote it. For a tree T let Br(T) denote {f ∈ A^ω : for
all n, f↾n ∈ T}. An element of Br(T) is called a branch of T.
Lemma 11. For any tree T , Br(T ) is closed. For any subset X,
Br(Pr(X)) is the closure of X.
Proof: If f is not a branch of T then there is an n such that f↾n ∉ T,
whence U_{f↾n} contains f and is disjoint from Br(T). This shows that
Br(T) is closed. If f ∈ X then every prefix of f is in Pr(X), so f is a
branch of Pr(X). This shows that X ⊆ Br(Pr(X)). If f ∈ Br(Pr(Y))
then for any n, f↾n is a prefix of some g_n ∈ Y. The g_n converge to f, so if
Y is closed f ∈ Y . This shows that if Y is closed then Y = Br(Pr(Y )).
Thus, if Y is closed and X ⊆ Y then Br(Pr(X)) ⊆ Br(Pr(Y )) = Y .
This shows that Br(Pr(X)) is the closure of X. ⊳
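The operations Pr and Br can only be approximated on a computer, since branches are infinite. The following sketch works to a finite depth: sequences are Python functions, `pr` collects prefixes up to a given length, and `agrees` checks the defining condition of Br(T) up to that length. All names are hypothetical, and agreement to finite depth is necessary but not sufficient for being a branch.

```python
def spike(k):
    """The sequence that is 1 at position k and 0 elsewhere."""
    return lambda i: 1 if i == k else 0

def restrict(f, n):
    """The finite sequence f|n as a tuple."""
    return tuple(f(i) for i in range(n))

def pr(X, depth):
    """Prefixes of length <= depth of the sequences in the list X."""
    return {restrict(f, n) for f in X for n in range(depth + 1)}

def is_tree(T):
    """True if T is closed under prefix."""
    return all(s[:n] in T for s in T for n in range(len(s)))

def agrees(f, T, depth):
    """True if f|n is in T for every n <= depth."""
    return all(restrict(f, n) in T for n in range(depth + 1))

zero = lambda i: 0
T = pr([spike(k) for k in range(3)], 3)
print(is_tree(T))                              # True
print(agrees(zero, T, 2), agrees(zero, T, 3))  # True False
```

The constant sequence 0 is not one of the three spikes, and it leaves their prefix tree at depth 3. If all spikes were included it would agree to every depth, reflecting the fact that it lies in the closure of the set of spikes.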
The relation ⊃ (proper extension) is a transitive irreflexive relation on
A^{<ω}. A tree is said to be well-founded if this relation is, that is, if there
are no infinite branches, or Br(T) = ∅.
Lemma 12. A tree T is well-founded if and only if there is a function
f : T → Ord such that if s ⊃ t then f(s) < f(t). Further, the range of
f may be taken to be a subset of |T |+ .
Proof: If f exists then clearly T is well-founded. If T is well-founded
let f be the canonical rank function defined in section 32. ⊳
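For a finite tree, which is always well-founded, a rank function as in the lemma can be computed directly. The sketch below is only an illustration in the finite case, with trees given as sets of tuples closed under prefix; the name `rank` is hypothetical, and the values are natural numbers rather than arbitrary ordinals.

```python
def rank(T):
    """Rank function on a finite tree T (a set of tuples closed under prefix).

    f(s) is the maximum of f(t) + 1 over the one-step extensions t of s
    in T, and 0 if there are none, so f(s) < f(t) whenever s properly
    extends t.
    """
    f = {}
    for s in sorted(T, key=len, reverse=True):  # longest sequences first
        f[s] = max((f[t] + 1 for t in T
                    if len(t) == len(s) + 1 and t[:-1] == s),
                   default=0)
    return f

T = {(), (0,), (1,), (0, 0)}
f = rank(T)
print(f[()], f[(0,)], f[(1,)], f[(0, 0)])  # 2 1 0 0
```

For infinite well-founded trees the same recursion is carried out along the well-founded relation, and the values may be infinite ordinals.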
For a product space A_1^ω × · · · × A_k^ω, a tree may be defined as a
subset of (A_1 × · · · × A_k)^{<ω} which is closed under prefix. Some authors
define such a tree as a subset of the set of k-tuples of finite sequences
⟨s_1, . . . , s_k⟩, where s_i ∈ A_i^l for all i, that is, all s_i are the same length;
this is readily seen to amount to the same thing. In particular, sequences
s_i in A_i, all of the same length l, may be combined into a sequence in
(A_1 × · · · × A_k)^l. As for infinite sequences, the notation ⟨s_1, · · · , s_k⟩ may
be used ambiguously to denote this operation.
If T is a tree on A × B, and x ∈ B^ω, let T⊘x = {s ∈ A^{<ω} :
⟨s, x↾|s|⟩ ∈ T} (the notation T_x is in common use for this operation).
It is easy to see that ⟨w, x⟩ ∈ Br(T) if and only if w ∈ Br(T⊘x). B
is usually ω, although the case B = 2 is also of interest. A is often ω
also; however letting it be a larger cardinal adds a fundamental tool to
descriptive set theory.
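The operation T⊘x can be illustrated on a finite fragment of a tree. In the sketch below a tree on A × B is represented, following the second convention mentioned above, as a set of pairs of tuples of equal length, and x is a Python function; the name `section` is hypothetical.

```python
def section(T, x):
    """The tree of all s such that <s, x|len(s)> is in T.

    T is a finite set of pairs (s, u) of equal-length tuples, standing
    in for a tree on A x B; x is a function giving an element of B^omega.
    """
    return {s for (s, u) in T
            if u == tuple(x(i) for i in range(len(s)))}

T = {((), ()), ((0,), (1,)), ((0,), (0,)), ((0, 5), (1, 1))}
ones = lambda i: 1
print(sorted(section(T, ones)))  # [(), (0,), (0, 5)]
```

The pair ((0,), (0,)) is discarded because its second component is not a prefix of the constant sequence 1.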
For an infinite cardinal κ, a subset S of a space A^ω is said to be
κ-Suslin if there is a tree T on κ × A, such that S = π_2[Br(T)], i.e.,
x ∈ S if and only if ∃w ∈ κ^ω (⟨w, x⟩ ∈ Br(T)). This last requirement
can be restated variously as follows:
- Br(T⊘x) ≠ ∅.
- T⊘x is not well-founded.
- ∃y ∈ κ^ω (⟨x, y⟩ ∈ K) where K is a closed subset of (ω × κ)^ω,
equipped with the usual topology (this follows using lemma 11).
In particular, a set is analytic if and only if it is ω-Suslin.
The following lemma is a fundamental fact of descriptive set theory.
Lemma 13. Suppose h ∈ N. A subset S of a standard space A^ω is
Π^1_1(h) if and only if there is a tree T on ω × A which is computable
from h, such that x ∈ S if and only if T⊘x is well-founded.
Remarks on proof: See theorem 13.1 of [Kanamori3], and also theorem 25.3 of [Jech2]. ⊳
Lemma 14. Suppose T is a tree on ω, M is a model of a sufficient
fragment of ZFC, and T ∈ M . Then T is well-founded if and only if it
is well-founded in M .
Remarks on proof: See lemma 25.4 of [Jech2]. ⊳
An absoluteness theorem for Π^1_1 predicates follows, called Mostowski’s absoluteness lemma. This, and lemma 13, are used in the proof of
the following lemma.
Lemma 15. Suppose S is a subset of a standard space A^ω. Then S
is Σ_1-definable without parameters in H_{ℵ1} if and only if S is Σ^1_2.
Remarks on proof: Theorem 25.25 of [Jech2] proves this for subsets
of N; the argument for subsets of Aω requires only minor changes. See
also theorem 19.1 of [Miller]. ⊳
Theorem 16. Suppose V = L.
a. There is a ∆^1_2 subset of C^2 which is neither Lebesgue measurable
nor has the property of Baire.
b. There is a Π^1_1 subset of C which does not have the perfect set
property.
Remarks on proof: For part a, <_L ∩ C^2 is such a subset. Suppose
x ∈ H_{ℵ1} ∩ L. Let y = TC(x ∪ {x}); then |y| < ℵ1 and y ∈ L. As in the
proof of theorem 20.8, and since y is transitive, y ∈ L_α for some α < ℵ1.
It follows that <_L ∩ C^2 = <_{L_{ℵ1}} ∩ C^2 is Σ_1-definable without parameters
in L_{ℵ1}, whence is Σ_1-definable without parameters in H_{ℵ1}. By lemma
15 <_L ∩ C^2 is Σ^1_2. See corollary 25.28 of [Jech2] for the rest of the proof,
and also corollary 13.10 of [Kanamori3]. For part b, see theorem 13.12
of [Kanamori3]; this provides a more direct proof than that given for
corollary 25.37 of [Jech2]. ⊳
It is a classic result that it follows in ZF that if R can be wellordered then there is a subset of R which does not have the perfect set
property, is not Lebesgue measurable, and does not have the property
of Baire (theorem 11.4 of [Kanamori3]; see also exercises 10.1 and 11.7
of [Jech2]).
Models of ZF in which AC fails are of interest in various topics. It
may be of interest that a weakened version of AC holds. An example of
such is the principle of dependent choices, denoted DC. This states that
if R is a binary relation on a nonempty set S, and ∀x ∈ S∃y ∈ S R(x, y),
then there is an infinite sequence ⟨x_i : i ∈ ω⟩ of elements of S such that
∀i ∈ ω R(x_i, x_{i+1}).
Theorem 17. Suppose M is a transitive model of ZFC containing
an inaccessible cardinal. There is a generic extension M [G] with the
following properties. Let S be the class in M [G] of infinite sequences of
ordinals.
a. HOD(S) is a model of ZF+DC, in which every set of reals is
Lebesgue measurable, has the property of Baire, and has the perfect
set property.
b. OD(S) is a model of ZFC, in which every projective set of reals is
Lebesgue measurable, has the property of Baire, and has the perfect
set property.
Remarks on proof: This is theorem 26.14 of [Jech2]. The proof
makes use of methods which have many applications, for example Levy
collapse and random reals. ⊳
This is one of the earliest examples of a consistency strength bound.
“There is an inaccessible cardinal” is an upper bound on the consistency
strength of “every projective set is Lebesgue measurable” and “ZF +
every set is Lebesgue measurable”. In [Shelah1] it is shown that if every
Σ^1_3 set of reals is Lebesgue measurable then ℵ1 is an inaccessible cardinal
in L, so the bound is exact.
One further fact about the projective hierarchy will be given, which
illustrates additional basic methods of descriptive set theory.
Lemma 18. Suppose A^ω is a standard space, κ is an infinite cardinal, and Y ⊆ N × A^ω is κ-Suslin. Then X = π_2[Y] is κ-Suslin.
Remarks on proof: See proposition 13.13.d of [Kanamori3]. Using
a bijection of κ × ω with κ, two existentially quantified variables can be
combined into one. ⊳
Theorem 19. Suppose A^ω is a standard space, h ∈ N, and X ⊆ A^ω
is Σ^1_2(h). Then X is the projection of a tree T̂ ∈ L[h] on ℵ1 × A, and in
particular is ℵ1-Suslin.
Remarks on proof: See theorem 13.14 of [Kanamori3]. By the proof
of lemma 18, it suffices to prove the claim for X which is Π^1_1(h); the bijection used
in the proof can be taken to be defined by a sufficiently simple formula.
Let T ∈ L[h] be a tree on ω × A such that x ∈ X if and only if T⊘x
is well-founded. Let T̂ be the tree on ℵ1 × A, such that ⟨s, t⟩ ∈ T̂ if and
only if ∀i, j < |s| (⟨FS(i), t↾|FS(i)|⟩ ∈ T ∧ FS(i) ⊃ FS(j) ⇒ s(i) < s(j)).
This definition is sensible since, using a fact noted above, |FS(i)| ≤ i <
|s| = |t|.
Given a branch ⟨ŵ, x⟩ of T̂, for t ∈ T⊘x let f(t) = ŵ(Cd(t)). Using
lemma 12, it follows that T ⊘x is well-founded. Conversely, if T ⊘x is
well-founded, let f be as in lemma 12, and let ŵ be any function such
that f (t) = ŵ(Cd(t)) for t ∈ T ⊘x. ⊳
There has been considerable effort in modern set theory to determine relations between independent questions of descriptive set theory,
and other questions, in particular concerning the determinacy of two-person games. Some discussion will be given in the next four sections.
In particular, it will be shown in section 60 that if there is a measurable
cardinal then theorem 10 holds for Σ^1_2 subsets. An alternative proof of
this fact may be found in theorems 8.G.4,9 of [Moschovakis].
58. Determinacy.
The theory of infinite games dates back to the 1930’s (see [Telgarsky]). Papers concerning relations with set theory appeared as early
as 1953, with additional results appearing throughout the 1960’s. The
subject has continued to evolve, and has become a major one in modern
set theory.
The “plays” of an infinite two-person game on a set A are the
elements of A^ω. A “position” is an element s ∈ A^{<ω}; if |s| is even (resp.
odd) “player 1” (resp. 2) plays, by appending an element of A to s. The
game is specified by giving a subset W ⊆ A^ω, which are the plays in
which player 1 wins. The notation G_A(W) is in common use to denote
this game. When A is fixed this may be abbreviated to G(W ).
A strategy for player 1 (resp. 2) in G_A(W) is a function σ_1 :
∪_{l even} A^l → A (resp. σ_2 : ∪_{l odd} A^l → A). A play ⟨x_l⟩ accords with σ_1
(resp. σ_2) if x_l = σ_1(⟨x_0, . . . , x_{l−1}⟩) whenever l is even (x_l = σ_2(⟨x_0, . . . ,
x_{l−1}⟩) whenever l is odd). A strategy σ_1 (resp. σ_2) is a winning strategy
if any play that accords with the strategy is in W (resp. A^ω − W). Thus,
no matter how the other player plays, a player playing the strategy wins.
A game G_A(W), or the set W, is said to be determined if either player
1 or player 2 has a winning strategy.
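The definitions of a strategy and of a play according with it can be sketched for finite initial segments of plays. Strategies are Python functions on tuples; the names `play` and `accords` and the two example strategies are hypothetical.

```python
def play(sigma1, sigma2, length):
    """First `length` moves of the play in which both players follow
    their strategies; player 1 moves at positions of even length."""
    x = []
    for l in range(length):
        sigma = sigma1 if l % 2 == 0 else sigma2
        x.append(sigma(tuple(x)))
    return x

def accords(x, sigma, player):
    """True if the finite play x accords with sigma for the given player."""
    start = 0 if player == 1 else 1
    return all(x[l] == sigma(tuple(x[:l]))
               for l in range(start, len(x), 2))

sigma1 = lambda s: len(s)      # player 1 plays the current length
sigma2 = lambda s: s[-1] + 1   # player 2 plays one more than the last move

print(play(sigma1, sigma2, 4))            # [0, 1, 2, 3]
print(accords([0, 9, 2, 0], sigma1, 1))   # True
print(accords([0, 9, 2, 0], sigma2, 2))   # False
```

A play accords with σ_1 regardless of what is played at odd positions, which is why the second example is accepted for player 1 but rejected for player 2.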
Recall from section 57 that A^ω may be considered a topological
space, where a basis for the topology is {U_t : t ∈ A^{<ω}}.
Theorem 1. If W is a closed subset of A^ω then G(W) is determined.
Proof: Say that a position s is viable if player 2 does not have
a winning strategy from then on, i.e., no σ_2 with which s accords is
winning. If ⟨x_0, . . . , x_{l−1}⟩ is viable and l is even then there must be some x_l such
that for any x_{l+1}, ⟨x_0, . . . , x_{l−1}, x_l, x_{l+1}⟩ is viable. Suppose player 2
does not have a winning strategy, so that the empty sequence is viable.
Let σ_1 be any strategy where at ⟨x_0, . . . , x_{l−1}⟩, player 1 plays any x_l as
described above. Suppose f ∈ A^ω accords with σ_1. If f ∉ W then since
W is closed, there is some position s ⊆ f such that f ∈ U_s ⊆ W^c. But
then s is not viable, contradicting the assumption that f accords with
σ_1. Hence f ∈ W. This shows that σ_1 is a winning strategy. ⊳
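A finite analogue of the argument may help. The sketch below is not the proof for closed games; it assumes a finite set of moves and an outcome fixed after a fixed number of moves, in which case the viable positions are exactly those from which player 1 can force a win, and they can be computed by working backward from the end of the game. The names `solve`, `viable`, `moves`, `horizon` and `wins` are hypothetical.

```python
from functools import lru_cache

def solve(moves, horizon, wins):
    """Return a function telling whether player 1 can force a win from a
    position (a tuple), in a game on the finite set `moves` whose outcome
    is fixed after `horizon` moves by the predicate `wins`."""
    @lru_cache(maxsize=None)
    def viable(s):
        if len(s) == horizon:
            return wins(s)
        nxt = (viable(s + (a,)) for a in moves)
        # Player 1 needs one good move; every move of player 2 must be safe.
        return any(nxt) if len(s) % 2 == 0 else all(nxt)
    return viable

even_sum = lambda s: sum(s) % 2 == 0
print(solve((0, 1), 3, even_sum)(()))  # True: player 1 moves last
print(solve((0, 1), 2, even_sum)(()))  # False: player 2 moves last
```

In the first game player 1 makes the final move and can always fix the parity; in the second player 2 does so, and the empty position is not viable.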
D. Martin proved in 1975 that every Borel subset of A^ω is determined. A proof may be found in [Kechris]; some remarks will be given
here. Before proceeding, some basic facts about games will be noted.
- Games G1 and G2 are said to be equivalent if player 1 (resp. 2) has
a winning strategy in G1 if and only if he has one in G2 .
- A tree T is said to be pruned if it has no finite maximal branch,
i.e., for any s ∈ T there is a t ∈ T with t ⊃ s.
- The notion of a game may be generalized. Given a pruned tree L
of “legal” positions, and W ⊆ Br(L), G(L, W) is the game restricted to those plays where
all positions (finite prefixes) are in L. (For an “unrestricted” game,
L = A^{<ω}.)
- A strategy on $L$ is a function $f$ whose domain is the even length, or
odd length, sequences which are elements of $L$, and such that if
$s = \langle x_0, \ldots, x_{l-1}\rangle$ is in the domain then $\langle x_0, \ldots, x_{l-1}, f(s)\rangle \in L$.
- A game $G(L, W)$ can be converted to an equivalent game $G(W')$,
by adding to $W$ those plays $f$ where the least $l$ such that $f \restriction l \notin L$
is even. Thus, $G_A(L, W)$ may be considered an abbreviation.
- If $\pi : A \mapsto B$ is a bijection then $G_B(W')$ is equivalent to $G_A(W)$,
where $W' = \{\pi \circ f : f \in W\}$.
- If A ⊆ B then GA (W ) is equivalent to GB (Aω , W ).
- A game $G_\omega(W)$ can be converted to an equivalent game $G_2(W')$.
Let $c^0_n$ (resp. $c^1_n$) be the sequence of $2n$ 0’s followed by 10 (resp.
01). Given $f \in \omega^\omega$ let $f'$ be the concatenation of the strings $c^{p_i}_{n_i}$
where $n_i = f(i)$ and $p_i$ equals 0 if $i$ is even, and 1 if $i$ is odd.
$W' = W_1 \cup W_2$ where $W_1$ is $\{f' : f \in W\}$, and $W_2$ is the “illegal”
strings where the first mistake is made by player 2.
- A game $G(W)$ can be converted to a game $G(W')$ such that player
1 (resp. 2) has a winning strategy in $G(W)$ if and only if player 2
(resp. 1) has a winning strategy in $G(W')$. Given $f$, let $f'$ be the
sequence where $f'(n) = f(n + 1)$ (i.e., the first element is deleted);
then $W' = \{f : f' \in W^c\}$.
- Theorem 1 generalizes to $G(L, W)$ where $W$ is a closed or open
subset of the subspace $\mathrm{Br}(L) \subseteq A^\omega$ (theorem 20.1 of [Kechris]).
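The coding of a game on $\omega$ by a game on 2, described in the list above, can be made concrete on finite sequences. The following Python sketch is an illustration only; the function names encode and decode are hypothetical, and only finite initial segments of a play are handled.

```python
def encode(moves):
    """Code a finite sequence of naturals as a 0/1 string.

    The i-th value n becomes 2n zeros followed by '10' if i is even
    (a move of player 1) or by '01' if i is odd (a move of player 2).
    """
    return "".join("0" * (2 * n) + ("10" if i % 2 == 0 else "01")
                   for i, n in enumerate(moves))

def decode(bits):
    """Invert encode on well-formed strings; raise ValueError otherwise."""
    moves, pos = [], 0
    while pos < len(bits):
        marker = "10" if len(moves) % 2 == 0 else "01"
        zeros = 0
        while bits[pos:pos + 2] == "00":
            zeros += 2
            pos += 2
        if bits[pos:pos + 2] != marker:
            raise ValueError("illegal block at offset %d" % pos)
        moves.append(zeros // 2)
        pos += 2
    return moves
```

Every block has even length, so in a block coding a move of player 1 the single 1 falls at an even offset (a bit chosen by player 1), and in a block coding a move of player 2 it falls at an odd offset. This is one way to see why the player whose move is being coded controls its value.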
There is a standard method for proving determinacy, which cannot
be better described than by the following quotation from [Kechris]:
“The idea is . . . to associate to the game $G(T, X)$ an auxiliary game
$G(T^*, X^*)$, which is known to be determined, usually a closed or
open game, in such a way that a winning strategy for any of the
players in $G(T^*, X^*)$ gives a winning strategy for the corresponding
player in $G(T, X)$.”
Typically, a position of the auxiliary game has “additional information”;
this may be discarded, producing a position of the original game.
In the case of Borel determinacy, the auxiliary game is constructed
using the notion of a “cover” of a set L of legal positions. Such is given
by the following:
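- a set $\tilde L$ of legal positions of the auxiliary game;
- a map $\pi : \tilde L \mapsto L$; and
- a map $\phi$ from strategies on $\tilde L$ to strategies on $L$.
These must satisfy the following restrictions.
- For $\tilde f \in \mathrm{Br}(\tilde L)$, $|\pi(\tilde f)| = |\tilde f|$.
- For strategies $\tilde\sigma, \tilde\tau$ on $\tilde L$, and any $n$, if $\tilde\sigma$ and $\tilde\tau$ agree on
positions $s$ with $|s| \le n$ then so do $\phi(\tilde\sigma)$ and $\phi(\tilde\tau)$.
- If $f \in \mathrm{Br}(L)$ accords with $\phi(\tilde\sigma)$ then there is an $\tilde f \in \mathrm{Br}(\tilde L)$ which
accords with $\tilde\sigma$, such that $\pi(\tilde f) = f$.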
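Lemma 2. Suppose $\langle \tilde L, \pi, \phi\rangle$ is a cover of $L$, and $W \subseteq \mathrm{Br}(L)$. If $\tilde\sigma$
is a winning strategy in $G(\tilde L, \pi^{-1}[W])$ then $\phi(\tilde\sigma)$ is a winning strategy
in $G(L, W)$.
Proof: Suppose $\tilde\sigma_j$ wins in $G(\tilde L, \pi^{-1}[W])$ where $j = 1, 2$, and
$f \in \mathrm{Br}(L)$ accords with $\phi(\tilde\sigma_j)$. By the definition of a cover there is
an $\tilde f \in \mathrm{Br}(\tilde L)$ which accords with $\tilde\sigma_j$, such that $\pi(\tilde f) = f$. Then since
$\tilde\sigma_j$ is winning, if $j = 1$ then $\tilde f \in \pi^{-1}[W]$, and if $j = 2$ then
$\tilde f \in \pi^{-1}[W^c]$. Thus, if $j = 1$ then $f \in W$, and if $j = 2$ then
$f \in W^c$. ⊳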
A subset of a topological space is said to be “clopen” if and only
if it is both closed and open. The basic open sets Ut ⊆ Aω are readily
seen to be clopen.
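For an integer $k \ge 0$ a cover is said to be a $k$-cover if
$\tilde L \cap \omega^{\le 2k} = L \cap \omega^{\le 2k}$ and $\pi \restriction (\tilde L \cap \omega^{\le 2k})$ is the identity. A cover is
said to unravel $W \subseteq \mathrm{Br}(L)$ if $\pi^{-1}[W]$ is a clopen subset of $\mathrm{Br}(\tilde L)$.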
Lemma 3. Suppose L is a nonempty pruned tree and W ⊆ Br(L)
is closed. For each k ≥ 0 there is a k-cover of L that unravels W .
Remarks on proof: This is lemma 20.7 of [Kechris]. Its proof comprises the bulk of the work in proving Borel determinacy. ⊳
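Lemma 4. Suppose $k \in \omega$, and $\langle T_{i+1}, \pi_{i+1}, \phi_{i+1}\rangle$ is a $(k+i)$-cover of $T_i$
for all $i \in \omega$. Then there is a pruned tree $T_\omega$, and for each $i \in \omega$ maps
$\pi_{\omega i}$ and $\phi_{\omega i}$, such that for all $i \in \omega$, $\langle T_\omega, \pi_{\omega i}, \phi_{\omega i}\rangle$ is a $(k+i)$-cover
of $T_i$, $\pi_{i+1} \circ \pi_{\omega,i+1} = \pi_{\omega i}$, and $\phi_{i+1} \circ \phi_{\omega,i+1} = \phi_{\omega i}$.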
Remarks on proof: This is lemma 20.8 of [Kechris]. ⊳
Theorem 5. Suppose L is a nonempty pruned tree and W ⊆ Br(L)
is Borel. For each k ≥ 0 there is a k-cover of L that unravels W .
Remarks on proof: This is theorem 20.6 of [Kechris]. ⊳
The following theorem was proved in 1953, in the same paper as
theorem 1.
Theorem 6. There is a set of reals W such that Gω (W ) is not
determined.
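Proof: The number of strategies is $\aleph_0^{\aleph_0} = 2^{\aleph_0}$. Let $\sigma_i^\alpha$ for $\alpha < 2^{\aleph_0}$
be an enumeration of the $\sigma_i$ for $i = 1, 2$. Sets $W_i = \{w_i^\alpha : \alpha < 2^{\aleph_0}\}$
for $i = 1, 2$ may be constructed by transfinite recursion as follows. At
stage $\alpha$, first add to $W_2$ some $w \in \omega^\omega - (W_1 \cup W_2)$ so that $\sigma_1^\alpha$ is
defeated, then add to $W_1$ some $w \in \omega^\omega - (W_1 \cup W_2)$ so that $\sigma_2^\alpha$ is
defeated. Such $w$ exist, because there are $2^{\aleph_0}$ elements of N according
with a given strategy, and fewer than $2^{\aleph_0}$ elements have been added at
earlier stages. Clearly neither player has a winning strategy in
$G_\omega(W_1)$. ⊳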
The axiom of determinacy (AD) states that every subset of N is
determined. This can only hold in models where AC fails. By facts
noted above AD holds if and only if games in $2^\omega$ are determined.
Various questions regarding determinacy have been of considerable
interest in modern set theory, including the following.
- Does determinacy hold for sets of higher complexity than Borel?
- What effect does the assumption of determinacy for further types
of sets have on the universe, in particular the regularity properties
of sets of reals?
- What properties must models of ZF+AD have?
Chapters 27 to 32 of [Kanamori3] contain a survey of this work; other
references include [Jech2]. Some discussion will be given in the next
three sections.
Theorem 7. In ZF, AD implies that every countable family of
nonempty subsets of ω ω has a choice function.
Remarks on proof: See lemma 33.2 of [Jech2]. ⊳
In consideration of models where AD holds, sometimes the above
theorem suffices as a substitute for AC, and sometimes DC is assumed
to hold.
59. Determinacy and descriptive set theory.
Lebesgue measurability, the property of Baire, and the perfect set
property, for a set of reals, are known as regularity properties. As seen in
the preceding section, analytic sets have these properties. It was realized
in 1964, and earlier for the property of Baire, that these properties could
be formulated in terms of games.
The Banach-Mazur game has alphabet $A = 2^{<\omega} - \{\emptyset\}$. Any play is
legal. A play $p$ may be transformed to an element $f \in 2^\omega$ by
concatenating the sequences $p(i)$. Let $\chi$ denote the map $p \mapsto f$.
Theorem 1. In the game as above, suppose $X \subseteq C$ and
$W = \chi^{-1}[X]$.
a. Player 1 has a winning strategy in $G_A(W)$ if and only if $U_t - X$ is
meager for some $t \in 2^{<\omega}$.
b. Player 2 has a winning strategy in $G_A(W)$ if and only if $X$ is meager.
c. $G_A(\chi^{-1}[X - \bigcup_t \{U_t : U_t - X \text{ is meager}\}])$ is determined if and only
if $X$ has the property of Baire.
Remarks on proof: See proposition 27.3 and corollary 27.4 of [Kanamori3]. These are proved for N; the changes for C are minimal. ⊳
The perfect set game has alphabet A = 2<ω ∪ 2. A play is legal if
elements in even (resp. odd) positions are in 2<ω (resp. 2). A play p may
be transformed to an element f ∈ 2ω by concatenating the sequences
p(i) (elements of 2 being considered sequences of length 1). Let χ denote
the map $p \mapsto f$.
Theorem 2. In the game as above, suppose X ⊆ C, L is the legal
positions, and W = χ−1 [X].
a. Player 1 has a winning strategy in GA (L, W ) if and only if X has
a perfect subset.
b. Player 2 has a winning strategy in GA (L, W ) if and only if X is
countable.
c. GA (L, W ) is determined if and only if X has the perfect set property.
Remarks on proof: For parts a and b see proposition 27.5 of [Kanamori3]. Part c follows easily. ⊳
The covering game has alphabet $2 \cup (2^{<\omega})^{<\omega}$, and a real parameter
$\epsilon > 0$. Given an element $\tilde t = \langle t_1, \ldots, t_k\rangle \in (2^{<\omega})^{<\omega}$, let $N_{\tilde t} = \bigcup_{i=1}^{k} U_{t_i}$.
Let $\mu$ denote the Lebesgue measure. A play $p$ is legal if:
1. elements in even (resp. odd) positions are in 2 (resp. $(2^{<\omega})^{<\omega}$); and
2. if $p(2n + 1) = \tilde t$ then $\mu(N_{\tilde t}) < \epsilon/2^{2(n+1)}$.
Given a legal play $p$, let $r_p$ be the concatenation of the values in the
even positions, and let $O_p$ be the union of the neighborhoods in the odd
positions. Given $X \subseteq 2^\omega$, let $W = \{p : r_p \in X - O_p\}$. $Y$ is said to be a
minimal cover of $X$ if $X \subseteq Y$, $Y$ is measurable, and $\mu(Y)$ is as small as
possible among such $Y$ (it is not difficult to show that such $Y$ exists).
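The legality condition bounds the measure of $O_p$: the bounds $\epsilon/2^{2(n+1)}$ sum to $\epsilon/3$, so the open set built by player 2 always has measure less than $\epsilon$. The following Python sketch is an illustration only; it represents elements of $2^{<\omega}$ as strings of '0' and '1', and the function names measure_of_union and legal_bound are hypothetical.

```python
from fractions import Fraction

def measure_of_union(strings):
    """Lebesgue measure of the union of the basic open sets U_t, t in strings."""
    uniq = set(strings)
    # Keep only strings having no proper prefix in the set; the corresponding
    # basic open sets are pairwise disjoint and have the same union.
    minimal = [t for t in uniq
               if not any(t[:k] in uniq for k in range(len(t)))]
    return sum(Fraction(1, 2 ** len(t)) for t in minimal)

def legal_bound(eps, n):
    """The bound eps / 2^(2(n+1)) imposed on the move of player 2 at 2n+1."""
    return Fraction(eps) / 2 ** (2 * (n + 1))

def is_legal_cover_move(strings, eps, n):
    """Check condition 2 for a move of player 2 at position 2n+1."""
    return measure_of_union(strings) < legal_bound(eps, n)
```

For example, with $\epsilon = 1$ the first move of player 2 must cover a set of measure less than 1/4, so the single neighborhood determined by the string 000 is legal there, while that determined by 00 is not.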
Theorem 3. In the game as above, suppose $X \subseteq C$, and $L$ is the
legal positions.
a. If player 1 has a winning strategy in $G_{A,\epsilon}(L, W)$ then there is a
measurable $Y \subseteq X$ such that $\mu(Y) > 0$.
b. If player 2 has a winning strategy in $G_{A,\epsilon}(L, W)$ then there is an
open $Y \supseteq X$ such that $\mu(Y) < \epsilon$.
c. Let $Y$ be a minimal cover of $X$, and suppose that for any $\epsilon > 0$
$G_{A,\epsilon}(L, Y - X)$ is determined; then $X$ is Lebesgue measurable.
Remarks on proof: Proposition 27.7 and corollary 27.8 of [Kanamori3] give a version for N; this is readily adapted to the case of C.
⊳
Theorem 4. It follows in ZF from AD that every subset of C has
the property of Baire, has the perfect set property, and is Lebesgue
measurable.
Proof: By remarks in section 58, the games of theorems 1, 2, and
3 are determined. One may verify that this follows in ZF; for example
there is a definable well-order on the positions. ⊳
The theorem is stated for subsets of C so that the Lebesgue measure
may be used; as usual more general facts hold. This theorem may be
seen to follow for various “pointclasses” Γ, i.e., if determinacy holds for
all sets in Γ then the regularity properties hold for all sets in Γ. The
notion of a pointclass is frequently used in descriptive set theory. A
pointclass is a set of subsets of one or more Polish spaces. Some authors
(such as [Kanamori3] and [Kechris]) call them classes.
Theorem 5. It follows in ZF from determinacy for $\Gamma$ that every
set in $\Gamma$ has the property of Baire, has the perfect set property, and is
Lebesgue measurable, where $\Gamma$ is the $\Sigma^1_n(h)$ or $\Pi^1_n(h)$ subsets of C for
$n \ge 1$ and $h \in N$.
Remarks on proof: The transformations involved in proving theorem 4 are all computable. ⊳
Note that by remarks in section 58 determinacy for $\Sigma^1_n(h)$ is equivalent to determinacy for $\Pi^1_n(h)$. For a more general version of the theorem
see for example exercises 6A.12,16,19 of [Moschovakis]. The theorem follows for the boldface classes by theorem 56.9 (or directly). Projective
determinacy (PD) is the statement that determinacy holds for any projective set; it follows from PD that any projective subset of C has the
regularity properties.
Theorem 6. It follows in ZF from determinacy for the $\Pi^1_n(h)$ subsets
of C that every $\Sigma^1_{n+1}(h)$ subset of C has the property of Baire, has the
perfect set property, and is Lebesgue measurable, for $n \ge 1$ and $h \in N$.
Remarks on proof: This is proved for the boldface classes in N,
in theorem 27.14 of [Kanamori3]. The method of proof is to define an
“unfolded” version of the games given above, where player 1’s plays are
augmented with a value y(i), where y will be the value of the leading
existentially quantified variable. See also sections 21.B,C of [Kechris]. ⊳
Various other properties of pointclasses are decided by assuming
determinacy. Among the most important are reduction and uniformization. The modern theory of these involves the use of norms and scales.
These in turn have become mainstays of descriptive set theory and related areas.
In what follows a pointclass is assumed to be “suitable”, i.e., to
have any required properties. The classes of the lightface or boldface
projective hierarchy of subsets of standard spaces as considered in section 57 are suitable. Suppose $\Gamma$ is a pointclass. Given $X \in \Gamma$ let $X^c$
denote $T - X$ where $T$ is the space of which $X$ is a subset. Let $\neg\Gamma$ denote
$\{X^c : X \in \Gamma\}$, the “dual” pointclass. Let $\Delta_\Gamma$ denote $\Gamma \cap \neg\Gamma$. Let $\exists^1\Gamma$
denote the sets $X$ which can be written in the form $\{x : \exists w(\langle w, x\rangle \in Y)\}$
where $Y \in \Gamma$ and $Y \subseteq N \times T$ for some space $T$. $\forall^1\Gamma$ is similarly defined.
Recall from section 32 that a quasi-order is a reflexive and transitive
binary relation. If $\le$ is a quasi-order then the relation $x \le y \wedge y \not\le x$ is a
transitive irreflexive relation, called the strict part, and denoted $x < y$.
A pre-well-order is defined to be a quasi-order such that $x \le y \vee y \le x$
for all $x, y$, and $<$ is well-founded.
In descriptive set theory, a norm on a set $X$ is a function $f : X \mapsto \mathrm{Ord}$.
Given such, the relation $f(x) \le f(y)$ is readily verified to be a
pre-well-ordering. On the other hand, given a pre-well-ordering ≤, the
canonical rank function ρ for < (defined in section 32) is a norm; further,
as is easily verified, x ≤ y if and only if ρ(x) ≤ ρ(y).
For a pointclass Γ a norm ρ on a set X ∈ Γ is said to be a Γ-norm
if the relation “x ∈ X ∧ ρ(x) ≤ ρ(y)” is in ∆Γ . Γ is said to have the
pre-well-ordering property if every X ∈ Γ has a Γ-norm.
A pointclass Γ is said to have the reduction property if whenever
X, Y ∈ Γ there are X ′ , Y ′ ∈ Γ such that X ′ ⊆ X, Y ′ ⊆ Y , X ′ ∪ Y ′ =
X ∪ Y , and X ′ ∩ Y ′ = ∅. A pointclass Γ is said to have the separation
property if whenever X, Y ∈ Γ and X ∩ Y = ∅ there is a Z ∈ ∆Γ such
that X ⊆ Z and Y ⊆ Z c .
Some facts about the foregoing pointclass properties will be stated
without proof; references are given to proofs in [Kanamori3].
- Γ has the reduction property if and only if ¬Γ has the separation
property (29.2).
- If Γ has the reduction property then it does not have the separation
property (assuming that Γ has universal sets) (29.3).
- If Γ has the pre-well-ordering property then it has the reduction
property (29.7).
- For h ∈ ω ω , Π11 (h) has the pre-well-ordering property (29.8).
- Suppose $V = L$. For $h \in \omega^\omega$ and $n \ge 2$, $\Sigma^1_n(h)$ has the
pre-well-ordering property (29.11).
- (First periodicity theorem.) It follows in ZF+DC that if determinacy holds for ∆Γ and ∃1 Γ ⊆ Γ, then if Γ has the pre-well-ordering
property then ∀1 Γ does (29.13).
- Suppose PD holds. For h ∈ ω ω , the classes Π1n (h) for n odd and
Σ1n (h) for n even and nonzero, have the pre-well-ordering property
(29.14).
A semiscale on a set $X$ is a sequence $\langle \rho_i : i \in \omega\rangle$ of norms on $X$ such
that, if $\langle x_i : i \in \omega\rangle$ is a sequence in $X$, $x = \lim_{i\to\infty} x_i$, and for each $n$
there is a $\lambda_n$ such that $\rho_n(x_i) = \lambda_n$ for sufficiently large $i$, then $x \in X$.
A scale is a semiscale such that in addition, for all $n$, $\rho_n(x) \le \lambda_n$. (The
term “scale” is used for other purposes in set theory also.)
For a pointclass $\Gamma$ a scale $\langle \rho_i \rangle$ on a set $X \in \Gamma$ is said to be a $\Gamma$-scale
if the 3-ary relation “$x \in X \wedge \rho_n(x) \le \rho_n(y)$” is in $\Delta_\Gamma$. $\Gamma$ is said to have
the scale property if every $X \in \Gamma$ has a $\Gamma$-scale.
Recall the definition of a function uniformizing a relation from section 46. Γ is said to have the uniformization property if whenever R is
a binary relation in Γ, there is a function f in Γ which uniformizes R.
Again, the following will be stated without proof, with references
being to [Kanamori3] unless otherwise indicated.
- If Γ has the uniformization property then Γ has the reduction property (exercise 1C.8 of [Moschovakis]).
- If Γ has the scale property and ∀1 Γ ⊆ Γ then Γ has the uniformization property (30.4).
- For h ∈ ω ω , Π11 (h) has the scale property (see theorem 8 below).
- Suppose V = L. For h ∈ ω ω and n ≥ 2, Σ1n (h) has the scale
property (30.5).
- (Second periodicity theorem.) It follows in ZF+DC that if determinacy holds for ∆Γ and if ∃1 Γ ⊆ Γ, then if Γ has the scale property
then ∀1 Γ does (30.8).
- Suppose PD holds. For h ∈ ω ω , the classes Π1n (h) for n odd and
Σ1n (h) for n even and nonzero, have the scale property (30.9).
An even stronger hypothesis than PD is $\mathrm{AD}^{L(R)}$. Consequences
of AD have become of interest, since if $\mathrm{AD}^{L(R)}$ holds then such
consequences hold in $L(R)$, and if they are absolute, in $V$. The reader is
referred to [Kanamori3] and [Jackson] for surveys of this extensive and
ongoing work.
To illustrate the use of scales, an outline of a proof of a fact stated
above will be given. To begin with, note that for an ordinal $\gamma$ and
a nonempty subset $X \subseteq \gamma^\omega$, if $X$ is closed then $X$ has a lexicographically
least (“leftmost”) branch. To see this, let $s_0$ be the empty sequence.
Letting $*$ denote concatenation, let $s_{i+1} = s_i * \alpha$ where $\alpha$ is least such
that $s_i * \alpha \in \mathrm{Pr}(X)$.
By induction, $s_i$ is in $\mathrm{Pr}(X)$ and is lexicographically less than or equal
to $x \restriction i$ for any $x \in X$. Since $X$ is closed $x_l = \bigcup_i s_i$ is in $X$, and is the
leftmost branch.
The leftmost branch $x_l$ of a subset $X \subseteq \gamma^\omega$ has the property that,
for any $x \in X$ with $x \ne x_l$, there exists an $i \in \omega$ such that $x_l(i) < x(i)$.
A branch $x_0 \in X$ is said to be the honest leftmost branch of $X$ if, for
any $x \in X$, $x_0(i) \le x(i)$ for all $i \in \omega$. Such is clearly the leftmost branch
of $X$ if it exists.
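The greedy construction of the leftmost branch, and the difference between a leftmost and an honest leftmost branch, can be seen in a finite analogue. The following Python sketch is an illustration only; it assumes $X$ is a nonempty finite set of tuples of natural numbers, all of the same length, and the function names are hypothetical.

```python
def prefixes(X):
    """Pr(X): all initial segments of members of X."""
    return {x[:k] for x in X for k in range(len(x) + 1)}

def leftmost(X):
    """Greedy leftmost branch: extend by the least value that stays in Pr(X)."""
    pr = prefixes(X)
    depth = len(next(iter(X)))
    s = ()
    for _ in range(depth):
        alpha = min(t[-1] for t in pr
                    if len(t) == len(s) + 1 and t[:-1] == s)
        s = s + (alpha,)
    return s

def is_honest_leftmost(X, x0):
    """True iff x0 is in X and is coordinatewise below every member of X."""
    return x0 in X and all(x0[i] <= x[i] for x in X for i in range(len(x0)))

X1 = {(0, 2), (1, 0)}          # leftmost branch (0, 2) is not honest
X2 = {(0, 1), (0, 3), (2, 1)}  # leftmost branch (0, 1) is honest
```

In X1 the branch (1, 0) is lexicographically larger than (0, 2) but has a smaller second coordinate, so no honest leftmost branch exists.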
For an ordinal $\gamma$, a norm $f : X \mapsto \mathrm{Ord}$ is said to be “into $\gamma$” if
$f[X] \subseteq \gamma$. A scale is said to be “into $\gamma$” if its norms are. Recall the
notion of a standard space from section 57.
Lemma 7. Let Aω be a standard space. For any subset X ⊆ Aω ,
there is a scale on X into γ if and only if there is a tree T on γ × A,
such that X = π2 [Br(T )], and Br(T ⊘x) has an honest leftmost branch
for all x ∈ X.
Remarks on proof: See proposition 30.2 of [Kanamori3]. ⊳
Theorem 8. For h ∈ ω ω , the class of Π11 (h) subsets of a standard
space has the scale property.
Remarks on proof: Suppose X ⊆ Aω , and T is a tree on ω × A
such that x ∈ X if and only if T ⊘x is well-founded. Let T̂ be the tree
on ℵ1 × A, derived from T as in the proof of theorem 56.18. Suppose
x ∈ X. Let ρ be the canonical rank function on T ⊘x. Let ŵ be the
function such that ŵ(i) = ρ(FS(i)) if FS(i) ∈ T ⊘x, else 0. This function
is the honest leftmost branch of Br(T̂ ⊘x). To see this, note that if ρ0 is
the canonical rank function on a well-founded tree, and ρ is any other
rank function, then it follows by induction on ρ0 (t) that ρ0 (t) ≤ ρ(t).
See exercise 30.3 of [Kanamori3] for further details. ⊳
The following are noted without proof; see exercise 30.7 of [Kanamori3].
- If Γ has the scale property and ∀1 Γ ⊆ Γ then ∃1 Γ has the scale
property.
- For h ∈ ω ω , the class of Σ12 (h) subsets of a standard space has the
scale property.
60. Determinacy and 0#.
Consider the following two statements.
1. $a^\#$ exists for all subsets $a \subseteq \omega$.
2. $\Pi^1_1$ games on $\omega$ are determined.
D. Martin proved in 1970 that 1⇒2. L. Harrington proved in 1978 that
2⇒1. In fact lightface versions can be proved.
These results show that before determinacy beyond $\Pi^1_1$ can be
assumed, the existence of $0^\#$ must first be; in particular the consequences
of determinacy for higher complexity sets have no bearing on the latter
question. An overview of these results will be given in this section; this
will involve a treatment of further basic methods in descriptive set
theory. The underlying space will be N for the first statement, and C for
the second.
For an ordinal $\gamma$, let $<_{KB}$ be the relation on $\gamma^{<\omega}$, where $s <_{KB} t$
if and only if $s \supset t$ or for some $i \in \mathrm{Dom}(s) \cap \mathrm{Dom}(t)$,
$\forall j < i\,(s(j) = t(j)) \wedge s(i) < t(i)$. This order is called the Kleene-Brouwer
order. A routine verification shows that it is a linear order.
Lemma 1. Suppose T is a tree on γ. Then T is well-founded if and
only if T is well-ordered by <KB .
Remarks on proof: See exercise 13.2 of [Kanamori3]. ⊳
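The Kleene-Brouwer order is easily computed on finite sequences. The following Python sketch is an illustration only; sequences are represented as tuples of natural numbers, and the names kb_less and kb_sorted are hypothetical.

```python
from functools import cmp_to_key

def kb_less(s, t):
    """s <_KB t: s properly extends t, or s is smaller at the first difference."""
    for a, b in zip(s, t):
        if a != b:
            return a < b
    # No difference on the common domain: the longer sequence is smaller.
    return len(s) > len(t)

def kb_sorted(tree):
    """List the nodes of a finite tree in increasing Kleene-Brouwer order."""
    def compare(s, t):
        return -1 if kb_less(s, t) else (1 if kb_less(t, s) else 0)
    return sorted(tree, key=cmp_to_key(compare))
```

For the tree with nodes (), (0,), (0, 0), (1,), the increasing order is (0, 0), (0,), (1,), (); the root is always the greatest element. An infinite branch would yield an infinite descending sequence, which is the easy direction of lemma 1.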
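Suppose $X$ is $\Pi^1_1$, and $T$ is a tree on $\omega \times \omega$ such that $x \in X$ if and only
if $T \oslash x$ is well-founded. For $x \in \omega^\omega$ let $<_x$ be the relation on $\omega$ where
$i <_x j$ if and only if
$\mathrm{FS}(i) \notin T \oslash x \wedge \mathrm{FS}(j) \notin T \oslash x \wedge i < j$, or
$\mathrm{FS}(i) \notin T \oslash x \wedge \mathrm{FS}(j) \in T \oslash x$, or
$\mathrm{FS}(i) \in T \oslash x \wedge \mathrm{FS}(j) \in T \oslash x \wedge \mathrm{FS}(i) <_{KB} \mathrm{FS}(j)$.
The codes of non-members of $T \oslash x$ come first, well-ordered by $<$; these
are followed by the codes of members of $T \oslash x$, ordered by $<_{KB}$ on the
sequences they code. It follows that $<_x$ is a well-order if and only if
$<_{KB}$ restricted to $T \oslash x$ is a well-order, if and only if $x \in X$.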
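For $t \in \omega^{<\omega}$ let $T \oslash t = \{s \in \omega^{<\omega} : |s| < |t| \wedge \langle s, t \restriction |s|\rangle \in T\}$; let $<_t$
be the relation on $\{i : i < |t|\}$ defined as $<_x$, except using $T \oslash t$ rather
than $T \oslash x$. It readily follows that $<_t$ is a linear order, if $t_1 \subseteq t_2$ then
$<_{t_1} \subseteq <_{t_2}$, and $<_x = \bigcup_{t \subseteq x} <_t$.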
Let $T^*$ be the tree on $\aleph_1 \times \omega$, such that $\langle s, t\rangle \in T^*$ if and only if
$\forall i, j < |t|\,(i <_t j \Rightarrow s(i) < s(j))$. It readily follows that $<_x$ is a well-order
if and only if there is a map $w : \omega \mapsto \aleph_1$ such that $i <_x j \Rightarrow w(i) < w(j)$,
if and only if there is a $w \in \aleph_1^\omega$ such that $\langle w, x\rangle \in \mathrm{Br}(T^*)$.
For the following, an element h ∈ ω ω can be considered as its code,
a real; h# may be used to denote h̃#.
Theorem 2. Suppose h# exists where h ∈ ω ω . Then Π11 (h) subsets
of N are determined.
Remarks on proof: See theorem 31.2 of [Kanamori3]. Let A be
the alphabet (ℵ1 × ω) ∪ ω. A position is legal if the elements in even
(resp. odd) positions are in ℵ1 × ω (resp. ω). Let M be the set of legal
positions. Given a play $p$, let $x$ be the element of $\omega^\omega$ where $x(i) =
\pi_2(p(i))$ for $i$ even, and $p(i)$ for $i$ odd; and let $w$ be the element of $\aleph_1^\omega$
where $w(i) = \pi_1(p(2i))$. Let $W$ be the plays such that $\langle w, x\rangle \in \mathrm{Br}(T^*)$.
It is readily seen that W is closed in Br(M ) (if player 1 loses there
is some i such that w ↾ 2i is not order-preserving). By theorem 20.1 of
[Kechris], GA (M, W ) is determined. By lemma 56.13 T can be chosen
in L[h]. Given such a T , it follows readily that T ∗ and W are in L[h],
so in fact GA (M, W ) is determined in L[h].
Suppose σ1 ∈ L[h] is a winning strategy for player 1 in GA (M, W )
in $L[h]$. If there is a play in $V$ according with $\sigma_1$, which is in $\mathrm{Br}(M) - W$,
then since Br(M ) − W is open there are such plays in L[h]. Thus, σ1 is
a winning strategy for player 1 in V . Let σ1′ be the strategy for Gω (X),
where player 1 reconstructs the values w(i) and then plays according to
σ1 ; this is a winning strategy for player 1 in Gω (X).
Suppose σ2 ∈ L[h] is a winning strategy for player 2 in GA (M, W )
in L[h]. Let P ⊆ M be the positions according with σ2 which are
“viable” for player 1, i.e., there is a play extending the position which
player 1 wins. P must be well-founded in L[h]. By absoluteness of
well-foundedness (lemma 0.3 of [Kanamori3]), P is well-founded in V .
It follows that σ2 is a winning strategy for player 2 in V .
By the hypothesis that h# exists, there is a Skolem term defining
σ2 involving Silver indiscernibles of L[h]; further the indiscernibles may
be taken as less than γ where γ is an ordinal less than ℵ1 . Given a
position u of length 2i + 1, let s and t be defined as w and x for a play
p as above. If s(j) ≥ γ for all j < 2i + 1, and the elements s(j) are
distinct, then σ2 (u) does not depend on s; let σ2′ be the strategy for
player 2 in Gω (X), where σ2′ (t) equals σ2 (u) for any such s.
Suppose $x$ accords with $\sigma_2'$. If $x \in X$ then there is a $w \in \aleph_1^\omega$
such that $\langle w, x\rangle$ is in $\mathrm{Br}(T^*)$ and accords with $\sigma_2$. But then $x \notin X$, a
contradiction. Thus, $\sigma_2'$ is a winning strategy for player 2 in $G_\omega(X)$. ⊳
By theorem 56.9, if $h^\#$ exists for all $h \in \omega^\omega$, then any game in
$\boldsymbol{\Sigma}^1_1$ is determined. By theorem 59.6, it follows that every $\Sigma^1_2$ set has the
regularity properties. As noted in section 42, the hypothesis follows if a
measurable cardinal exists. This proves a fact mentioned at the end of
section 57.
The converse of theorem 2 also holds (theorem 9 below). An outline of a proof will be given, following [Harrington]. This will be for
the unrelativized lightface pointclass; but as noted in [Harrington], the
argument may be adapted to the lightface pointclass relative to an oracle. The proof makes use of a notion of forcing, that of tagged trees;
see [Sami] for a forcing-free proof.
By a real will be meant an element of C, indifferently considered
as a subset of ω. Given reals a and b, a ≤T b will denote that a is
computable from b; and a ≡T b that a ≤T b ∧ b ≤T a. A set X of reals
is said to be Turing closed if a ∈ X ∧ a ≡T b ⇒ b ∈ X. A Turing cone
is a set of the form {b : a ≤T b}.
Various sets may be coded as a real. A binary relation R on ω
may be coded as {GP(x, y) : R(x, y)} (GP is defined in appendix 2).
The code for a function (in particular a strategy) is a special case. A
countable structure for the language of set theory may be coded by the
code for the membership relation, after choosing some enumeration of
the domain. A tree T may be coded as the set of codes of the sequences
in T .
Lemma 3. Suppose X is Turing closed. If player 1 (resp. 2) has a
winning strategy in G2 (X) then X (resp. X c ) contains a Turing cone.
Proof: Let f be a winning strategy for player 1, coded as a real.
Suppose f ≤T g. Let h be the play according with f , where player 2
plays g. It is easily seen that h ≡T g. Also, h ∈ X, whence g ∈ X since
X is Turing closed. The proof for player 2 is similar. ⊳
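The computation behind the claim $h \equiv_T g$ can be seen on finite initial segments. The following Python sketch is an illustration only; a strategy for player 1 is assumed to be given as a function from even length positions (tuples of bits) to bits, and the names play_against and copycat are hypothetical.

```python
def play_against(strategy, g_bits):
    """Initial segment of the play h in which player 1 follows `strategy`
    and player 2 plays the bits of g in order."""
    h = []
    for bit in g_bits:
        h.append(strategy(tuple(h)))  # move of player 1 at an even position
        h.append(bit)                 # move of player 2 at an odd position
    return h

def copycat(position):
    """A sample strategy for player 1: repeat the last move of player 2."""
    return position[-1] if position else 0

g = [1, 0, 1, 1]
h = play_against(copycat, g)
```

Here g is recovered from h by reading the odd positions, so $g \le_T h$; conversely h is computed from g together with the strategy, and the strategy is computable from g by hypothesis.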
For the rest of this section, for a real $a$ let $\omega_1(a)$ denote the smallest
ordinal $\alpha > \omega$ such that $L_\alpha[a]$ is admissible. Let $A_0 = \{x \in C : \exists y \le_T
x\ (y \text{ codes an end extension of } L_{\omega_1(x)})\}$.
Lemma 4. A0 is Turing closed and Σ11 .
Remarks on proof: This is stated without proof following definition
3.1 of [Harrington]. Clearly A0 is Turing closed. The statement that
y codes a structure for the language of set theory having the required
properties is first order with free second order variables x and y. ⊳
Lemma 5. A0 is cofinal in the quasi-order ≤T on C.
Remarks on proof: See lemma 3.2 of [Harrington]. Given x, using
facts from model theory, choose a model M of KP such that M contains
no nonstandard integers, x ∈ M , and there is an element α ∈ M which
is a nonstandard countable ordinal in M . Let w be a real which codes
Lα in M . Let y as in the definition of A0 be GP(x, w). ⊳
Lemma 6. If A0 is determined then it contains a Turing cone.
Proof: This follows by lemmas 3 and 5. ⊳
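For a limit ordinal $\alpha$, let $Q(\alpha)$ be the poset whose elements are the
pairs $\langle t, r\rangle$ where $t$ is a finite tree on $\omega$ and $r : t \mapsto \alpha \cup \{\infty\}$ is a function
with the following properties:
1. if $u, v \in t$, $v \supset u$, and $r(u) \ne \infty$ then $r(v) < r(u)$; and
2. $r(\emptyset) = \infty$.
The relation $\langle t_1, r_1\rangle \ge_{Q(\alpha)} \langle t_2, r_2\rangle$ holds if $t_1 \subseteq t_2$ and $r_1 = r_2 \restriction t_1$.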
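The conditions of this poset and their order can be checked mechanically when the tags are natural numbers. The following Python sketch is an illustration only; it assumes a tree is given as a set of tuples, a tagging as a dictionary, and uses floating point infinity to stand for the tag $\infty$. The names is_condition and extends are hypothetical.

```python
INF = float("inf")  # stands in for the tag "infinity"

def is_condition(t, r):
    """Check that (t, r) is a tagged tree: t is closed under initial segments,
    r is defined exactly on t, the root is tagged INF, and below any node
    with a tag other than INF the tags strictly decrease."""
    if () not in t or set(r) != set(t) or r[()] != INF:
        return False
    if any(v[:k] not in t for v in t for k in range(len(v))):
        return False
    return all(r[u] == INF or r[v] < r[u]
               for v in t for u in (v[:k] for k in range(len(v))))

def extends(t2, r2, t1, r1):
    """(t2, r2) is below (t1, r1): t1 is a subset of t2 and r2 agrees with r1 on t1."""
    return set(t1) <= set(t2) and all(r2[u] == r1[u] for u in t1)

t1 = {(), (0,), (0, 1)}
r1 = {(): INF, (0,): 2, (0, 1): 0}
t2 = t1 | {(1,)}
r2 = dict(r1)
r2[(1,)] = INF
```

A node tagged INF places no constraint on its descendants, which is what allows the generic tree to have infinite branches through such nodes.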
Given a notion of forcing $\langle M, Q(\alpha)\rangle$ (where $M$ is a transitive set
which is a model of some fragment of set theory), an $M$-generic filter in
$Q(\alpha)$ yields a tree $T$ and an extended norm $R : T \mapsto \alpha \cup \{\infty\}$. It follows by
basic facts about forcing that if $R(t) \ne \infty$ then $R(t) = \sup\{R(s) + 1 : s$
is a son of $t\}$.
A proof of the last mentioned fact will be outlined. In a poset $P$,
for $p \in P$ let $p_\le$ denote $\{q : q \le p\}$; this may be considered a sub-poset,
with the inherited order. A subset $X \subseteq P$ is said to be dense below $p$
if $X$ is a dense subset of $p_\le$, i.e., $X \subseteq p_\le$ and $\forall q \le p\, \exists r \in X (r \le q)$.
If $G$ is an $M$-generic filter for a notion of forcing $\langle M, P\rangle$, $p \in G$,
and $D$ is dense below $p$, then $D \cap G \ne \emptyset$. This may be seen by noting
that $D' = D \cup \{q \in P : p$ and $q$ are incompatible$\}$ is dense.
Suppose $\langle t, r\rangle \in Q(\alpha)$, $u \in t$ is a node, $r(u) > \beta$, and $\beta > r(v)$ for
any son $v$ of $u$. Then the set of conditions extending $\langle t, r\rangle$ in which $u$ has
a son labeled $\beta$ is readily seen to be dense below $\langle t, r\rangle$. The claim for $R$
follows.
Further discussion of this notion of forcing will be omitted; see
[Harrington] and [Steel2].
Lemma 7. Suppose {x : a0 ≤T x} ⊆ A0 , and α > ω is a countable
ordinal such that Lα [a0 ] is admissible. Then α is a cardinal in L.
Remarks on proof: See lemma 3.4 of [Harrington]. Suppose κ < α
is a cardinal in L, X ⊆ κ, and X ∈ L. Choose a countable limit ordinal
ξ > α with X ∈ Lξ . Choose an Lξ -generic filter in Q(α). Let T be its
tree, and let $b = \mathrm{GP}(a_0, T)$; then $\xi < \omega_1(b)$.
Since b ∈ A0 , let N be an end-extension of Lξ , whose code is
computable from b. Then N ∈ Lκ+1 . From this it may be shown that
X ∈ Lκ·3 . It then follows that X ∈ Lα [a0 , T ]. Since T is generic a
forcing argument may be given to conclude that X ∈ Lα [a0 ].
Thus far it has been shown that if X is a constructible subset of κ
then $X \in L_\alpha[a_0]$. An ordinal less than $\kappa^{+L}$ is coded by such an $X$, so
is in Lα [a0 ] since Lα [a0 ] is admissible, so is in L. Thus, α is a cardinal
in L. ⊳
Corollary 8. Suppose {x : a0 ≤T x} ⊆ A0 , and α > ω is an ordinal
such that Lα [a0 ] is admissible. Then α is a cardinal in L.
Remarks on proof: See lemma 7.22 of [MansWeit]. ⊳
Theorem 9. Suppose Σ11 subsets of C which are Turing closed are
determined. Then 0# exists.
Remarks on proof: This is theorem 4.1 of [Harrington]. Let A0 be
as above. By hypothesis and lemma 4 A0 is determined, whence by
lemma 6 some a0 with {x : a0 ≤T x} ⊆ A0 exists.
In $L[a_0]$, an elementary substructure $X \subseteq L_{\aleph_3}[a_0]$ may be chosen so
that $|X| = \aleph_1$, $\aleph_2 \in X$, and $X^\omega \subseteq X$. Let $j : L_\alpha[a_0] \mapsto L_{\aleph_3}[a_0]$ be the
inverse of the transitive collapse. It follows that $L_\alpha[a_0]$ is admissible,
whence by corollary 8 $\alpha$ is a cardinal in $L$. Also, $\alpha < \aleph_2$.
Since |X| = ℵ1 , j is nontrivial; let κ be the critical point. Let
U = {Y ⊆ κ : Y ∈ L, κ ∈ j(Y )}. It follows that U is a countably
complete L-ultrafilter, and the theorem follows by theorem 44.5 and
theorem 36.2 adapted to L-ultrafilters. ⊳
61. Determinacy and large cardinals.
It has already been seen in section 60 that Π11 determinacy has
large cardinal strength. An investigation into the connection between
determinacy and large cardinals led to dramatic developments in modern
set theory. To quote [Neeman1]:
“In 1985 the faith in this connection was fully justified. A sequence of results of Foreman, Magidor, Martin, Shelah, Steel, and
Woodin . . . brought the identification of a new class of large cardinals, known now as Woodin cardinals, new structures of iterated
ultrapowers, known now as iteration trees, and new proofs of determinacy, including a proof of $\mathrm{AD}^{L(R)}$. Additional results later on
obtained Woodin cardinals from determinacy axioms, and indeed
established a deep and intricate connection between the descriptive set theory of L(R) under AD, and inner models for Woodin
cardinals.”
The articles [Neeman1] and [KoeWood] contain an extensive survey
of these results. Some brief remarks will be made on these articles.
The article [Neeman1] contains self-contained proofs of the following.
- Suppose that there are $n$ Woodin cardinals and a measurable cardinal above them. Let $A \subseteq \omega^\omega$ be $\Pi^1_{n+1}$. Then $G_\omega(A)$ is determined.
(Corollary 5.30).
- Suppose that there are ω Woodin cardinals and a measurable cardinal above them. Then AD holds in L(R). (Theorem 8.24.)
The methods used in the proofs have many other uses, and an extensive
history. These include
homogeneous trees, homogeneously Suslin sets, extender models,
iteration trees.
Some of these topics are covered in [Jech2] and [Kanamori3].
The article [KoeWood] contains a self-contained proof of the following.
- Assume ZF+AD. Then there is a model of ZFC+“There are $\omega$-many Woodin cardinals”. (See theorem 6.2.)
62. Forcing axioms.
A forcing axiom is a statement which asserts the existence of a
generic set for a family of notions of forcing. One example, Martin’s
axiom, has already been seen in section 28. As with Martin’s axiom, it
is of interest what facts follow from a forcing axiom.
The following are among the most commonly encountered forcing
axioms:
- PFA: If $P$ is a proper notion of forcing and $\mathcal{D}$ is a family of predense
subsets of $P$ with $|\mathcal{D}| \le \aleph_1$ then there is a filter $G \subseteq P$ such that
$G \cap D \ne \emptyset$ for all $D \in \mathcal{D}$.
- SPFA: As for PFA, but with $P$ semiproper.
- MM: As for PFA, but with $P$ stationary set preserving.
- The “bounded versions” BPFA, BSPFA, and BMM of the above,
where the sets $D \in \mathcal{D}$ are restricted to have $|D| \le \aleph_1$.
It is virtually immediate from the definitions that
MM   ⇒  SPFA   ⇒  PFA
 ⇓        ⇓         ⇓
BMM  ⇒  BSPFA  ⇒  BPFA.
In the unbounded cases, “predense” can be replaced by “dense” in the
definition (exercise 14.4 of [Jech2]).
After earlier results, it was shown in [Moore] that BPFA ⇒ $2^{\aleph_0} = \aleph_2$.
This result is one reason for interest in forcing axioms, since no
known large cardinal axiom settles CH one way or the other.
Recall the definition of $\mathrm{MA}_{\aleph_1}$ from section 29. As observed in
section 54, if $P$ is c.c.c. then $P$ is proper, and it follows that PFA ⇒
$\mathrm{MA}_{\aleph_1}$. By the preceding paragraph, it then follows that PFA ⇒ MA.
As seen in section 28, the consistency of MA follows from the consistency of ZFC. BPFA, on the other hand, was shown in [GoldShe] to
be equiconsistent with the existence of a “Σ1 -reflecting” cardinal, which
is stronger than inaccessible but weaker than Mahlo. It was shown in
1988 that if there is a supercompact cardinal then there is a forcing
model in which SPFA holds (theorem 37.9 of [Jech2]). It is also true
that SPFA ⇒ MM (theorem 37.10 of [Jech2]). Better upper and lower
bounds are known for the consistency strength of various forcing axioms.
This is an active area of research; see [Neeman2] for some remarks.
Several of these axioms, and related axioms, have notable consequences. Among them are the following.
- In [Steel4] it is shown that PFA implies AD^L(R).
- In [Todor2] it is shown that PFA implies ¬□_κ for any cardinal
κ ≥ ℵ1 (and hence, by corollary 52.7, that 0^# exists).
- In [Vaile] a statement CP is defined, and shown to follow from
the statements MRP, PID, and “there exists a strongly compact
cardinal”. Both MRP and PID were already known to follow from
PFA. CP in turn implies SCH.
63. Some observations.
In a poll taken in 2000, 31 out of 31 set theorists replied that they
did not believe that V = L. The author has been maintaining that the
situation should be re-assessed, and the possibility that V = L taken
more seriously.
Since Cohen’s invention of forcing in 1963, set theory has undergone
a series of dramatic advances. Various of the most complex of these
involve statements which are false if V = L, and set theorists argue that
this complexity is indicative of the fact that V = L “impoverishes” the
universe and should be rejected. That large cardinal hypotheses and
PD result in a “richer” universe is “a posteriori” evidence in favor of
them.
It is clear, though, that independent questions must be resolved by
more fundamental considerations. Set theorists since Gödel have been
attempting to come to grips with what such might be, and the explosion
of knowledge has not clarified the situation in the least.
One question which seems fundamental is the existence of Suslin
trees. The question of their existence was raised in 1920, and shown to
be independent of ZFC in the early 1970’s.
The proof of theorem 26.4 “constructs” a Suslin tree by judiciously
choosing a countable set of branches at limit stages.
On the other hand, the proof of theorem 29.2 involves a “blanket”
assumption, which has as a consequence that an uncountable branch
exists.
It is clear that the first proof has at least equal claim to representing
the truth about the real numbers as the second proof.
Another fundamental question is whether ω1^L = ω1. This assumption does not require assigning a value to |Pow(ω)|. In the process of
obtaining constructible bijections of ω with countable ordinals, ω1 steps
of the constructibility process may be performed. Although most set
theorists currently disagree, one view is that assumptions which limit
the number of constructible bijections to be countable must somehow
be “pathological”.
Appendix 1. Axioms for plane geometry.
In a modern system of axioms for the Euclidean plane, the plane is
considered to consist of a set P of points. In addition, certain subsets of
P , called lines, are given. Roman letters x, y, . . . will be used to denote
points. Greek letters α, β, . . . will be used to denote lines.
The language of plane geometry is ∈, B, C. B(x, y, z) is intended to
hold if y lies on the line through x and z, and is strictly between them.
C(w, x, y, z) is intended to hold if the distance between w and x equals
the distance between y and z.
Before giving the axioms it is useful to introduce some defined concepts and notation (used only in this appendix), as follows. To simplify
the notation, x-y-z is used for B(x, y, z), and wx≡yz for C(w, x, y, z).
- Lines α and β are said to intersect, at the point x, if α ∩ β = {x};
and to be parallel if α ∩ β = ∅.
- Three points are said to be collinear if they lie on a common line.
- Given points x and y let (xy) denote {z : B(x, z, y)}.
- Let [xy] denote (xy) ∪ {x, y}; this is called the line segment, or
simply segment, between x and y, and (xy) is called its interior.
- If wx≡yz the line segments [wx] and [yz] are said to be congruent.
- The notation xyz is used as an abbreviation for ⟨x, y, z⟩.
- The notation xyz≡x′y′z′ denotes that xy≡x′y′, xz≡x′z′, and yz≡y′z′
all hold.
- An ordered triple xyz is called a triangle if the points are distinct
and non-collinear. The points x, y, and z are called its vertices,
and the line segments [xy], [xz], and [yz] its sides. Side [xy] is said
to be opposite z, side [xz] opposite y, and side [yz] opposite x. A
side not opposite a vertex is called adjacent to it.
- Two triangles xyz and x′y′z′ are called congruent if xyz≡x′y′z′.
The axioms can be divided up into several groups. The first group
are the “incidence axioms”, giving the restrictions on the way in which
points can lie in lines.
I1 Given two distinct points there is exactly one line containing them.
I2 Each line contains at least three distinct points.
I3 There is a set of three distinct non-collinear points.
These axioms may not seem to say much, but already they have some
consequences. For example two distinct lines intersect at at most one
point; for if there were two points x and y then by axiom I1 there could
only be one line. Also, given any line α there is a point x which does not
lie on it, else axiom I3 would be contradicted since all points would lie
on α. We use the notation α(xy) to denote the unique line containing
the distinct points x and y.
The next axiom is called the axiom of parallels.
P Given a line α and a point x not on α there is exactly one line β
such that x is in β and β is parallel to α.
As an immediate consequence of the parallel axiom, if α is parallel to
β, and β is parallel to γ, then α is parallel to γ or equals γ. For if α
and γ intersect at the point x, then α and γ would both contain x and
be parallel to β, and so by axiom P they would be identical.
The next group of axioms concerns the betweenness relation.
B1 If x-y-z then x, y, z are distinct collinear points.
B2 If x, y, z are distinct collinear points then exactly one of x-y-z, y-x-z,
y-z-x holds.
B3 If x-y-z then z-y-x.
B4 Suppose xyz is a triangle; α is a line containing none of its points;
and α intersects (yz). Then either α intersects (xy) or α intersects
(xz).
Axiom B4 is called Pasch’s axiom; it states that if a line intersects the
interior of one side of a triangle then it intersects the interior of a second,
provided the line does not contain any vertex.
The axioms concerning equality of distance are as follows.
D1 xx≡yz if and only if y = z.
D2 xy≡xy.
D3 xy≡yx.
D4 If uv≡wx then wx≡uv.
D5 If uv≡wx and wx≡yz then uv≡yz.
D6 If x-y-z, x′-y′-z′, xy≡x′y′, and yz≡y′z′ then xz≡x′z′.
D7 Suppose xyz and x′y′z′ are triangles with xyz≡x′y′z′; w, y, z are
collinear; w′, y′, z′ are collinear; yw≡y′w′; and wz≡w′z′. Then
xw≡x′w′.
D8 Given x in α and distinct y, z there are exactly two points u = u1, u2
in α such that xu≡yz. Further u1-x-u2.
D9 If xyz is a triangle and y′z′≡yz then there are exactly two points
x′ = x′1, x′2 such that xyz≡x′y′z′. Further α(y′z′) intersects (x′1 x′2).
The final axiom requires some definitions. A cut of a line α consists
of two nonempty subsets A and B of α such that
1. every point of α is in exactly one of A or B, and
2. whenever x and z are both in the same set and x-y-z, then y is in
that set also.
Given a cut, a cutpoint is defined to be a point p such that whenever
x ∈ A and y ∈ B then either x = p, y = p, or x-p-y.
C If A and B form a cut of a line α then there is a cutpoint.
A model for these (second order) axioms can be given, with P =
R × R, which is also denoted as R2 . The following conventions will be
adopted for points in R2 .
- A point x equals ⟨x1, x2⟩ for some x1, x2 ∈ R.
- The sum x + y of two points equals ⟨x1 + y1, x2 + y2⟩.
- The scalar multiple rx of a point by a real number r equals ⟨rx1, rx2⟩.
- For distinct x, y, let α(xy) denote {x + t(y − x) : t ∈ R}.
- The “Euclidean distance” d(x, y) is defined to be
√((x1 − y1)² + (x2 − y2)²).
The lines of the model are taken as the sets α(xy) for distinct x, y (the
set of such sets; different pairs might yield the same line). B(x, y, z) is
taken to hold if ∃t(0 < t < 1 ∧ y = x + t(z − x)). C(w, x, y, z) is taken
to hold if d(w, x) = d(y, z).
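The interpretation just described can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the text's development: the function names `line_contains`, `between`, `dist_sq`, and `congruent` are hypothetical, points are pairs of integers or `Fraction` values so that the arithmetic is exact, and distances are compared through their squares to avoid square roots.

```python
from fractions import Fraction

def line_contains(x, y, z):
    """True if z lies on alpha(xy) = {x + t(y - x) : t in R}, for distinct x, y."""
    # z - x is a scalar multiple of y - x exactly when this determinant vanishes.
    return (y[0] - x[0]) * (z[1] - x[1]) - (y[1] - x[1]) * (z[0] - x[0]) == 0

def between(x, y, z):
    """B(x, y, z): y = x + t(z - x) for some t with 0 < t < 1."""
    if x == z or not line_contains(x, z, y):
        return False
    # Recover t from a coordinate in which x and z differ.
    i = 0 if z[0] != x[0] else 1
    t = Fraction(y[i] - x[i]) / Fraction(z[i] - x[i])
    return 0 < t < 1

def dist_sq(x, y):
    """Square of the Euclidean distance d(x, y)."""
    return (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2

def congruent(w, x, y, z):
    """C(w, x, y, z): d(w, x) = d(y, z), tested on squared distances."""
    return dist_sq(w, x) == dist_sq(y, z)
```

For example, `between((0, 0), (1, 1), (2, 2))` holds, while `between((0, 0), (2, 2), (1, 1))` does not, in line with axiom B2.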
Theorem 1. R2 , with lines, B, and C interpreted as above, satisfies
the axioms for the Euclidean plane.
Proof: The proof is a straightforward but tedious verification that
all the axioms hold. Proofs will assume basic facts about the real numbers, which are well-known and can easily be proved from the axioms.
In particular, for a non-negative r ∈ R there is a unique non-negative
s ∈ R such that s² = r, called the square root of r and denoted √r.
For axiom I1, clearly x and y are elements of α = α(xy). Suppose
x, y ∈ β where β = α(x′y′), say x = x′ + t1(y′ − x′) and y = x′ + t2(y′ − x′);
then y − x = (t2 − t1)(y′ − x′). Thus, x + t(y − x) = x′ + (t1 + t(t2 −
t1))(y′ − x′), showing that α ⊆ β. Also x′ = x − (t1/(t2 − t1))(y − x)
and y′ = x + ((1 − t1)/(t2 − t1))(y − x), so x′ ∈ α and y′ ∈ α, whence
β ⊆ α also.
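The reparametrization in the argument for axiom I1 can be checked on a concrete instance. The following is a small sketch with hypothetical helper names (`add`, `sub`, `scale`, `point_on`) and arbitrarily chosen sample values; it recovers x′ and y′ from x and y using the parameters −t1/(t2 − t1) and (1 − t1)/(t2 − t1) along α(xy).

```python
from fractions import Fraction as F

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def scale(r, p):
    return (r * p[0], r * p[1])

def point_on(a, b, t):
    """The point a + t(b - a) of the line alpha(ab)."""
    return add(a, scale(t, sub(b, a)))

# Sample data (arbitrary): two points x', y' and two parameters t1 != t2.
xp, yp = (F(1), F(2)), (F(4), F(-1))
t1, t2 = F(3), F(-2)

x = point_on(xp, yp, t1)   # x = x' + t1 (y' - x')
y = point_on(xp, yp, t2)   # y = x' + t2 (y' - x')

# Recover x' and y' as points of alpha(xy).
xp_recovered = point_on(x, y, -t1 / (t2 - t1))
yp_recovered = point_on(x, y, (1 - t1) / (t2 - t1))
```

Since x′ and y′ are recovered as points of α(xy), the inclusion β ⊆ α holds in this instance.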
For axiom I2, (1/2)(x + y) is a third point. For axiom I3, ⟨0, 0⟩,
⟨1, 0⟩, and ⟨0, 1⟩ are non-collinear.
For axiom P, let α(x1 y1) and α(x2 y2) be two lines. Let vi = yi − xi
for i = 1, 2. A point of intersection would be determined by values t1
and t2 such that x1 + t1 v1 = x2 + t2 v2, and t1 and t2 would be a solution
to the linear equations
\[
\begin{pmatrix} v_{11} & -v_{21} \\ v_{12} & -v_{22} \end{pmatrix}
\begin{pmatrix} t_1 \\ t_2 \end{pmatrix}
=
\begin{pmatrix} x_{21} - x_{11} \\ x_{22} - x_{12} \end{pmatrix},
\]
where v_{ij} and x_{ij} denote the j-th coordinates of vi and xi.
By linear algebra there are three cases: v2 is not a scalar multiple of
v1 and the lines intersect at a single point; v2 = sv1 for some nonzero
s ∈ R and x2 = x1 + tv2 for some t ∈ R and the lines are the same;
or the remaining case, in which the lines are parallel. Suppose in axiom P
α = α(x1 y1). Then any α(x2 y2) containing x and parallel to α must
have v2 a scalar multiple of v1. Given two such α(x2 y2) and α(x′2 y′2), x
lies on both and v′2 is a scalar multiple of v2, so the lines are the same.
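The three cases can be made concrete with Cramer's rule applied to the linear system above. The sketch below is illustrative only; the function name `classify` and its return convention are assumptions, and the inputs are taken to be pairs of integers or `Fraction` values so that the tests for zero are exact.

```python
from fractions import Fraction

def classify(x1, y1, x2, y2):
    """Compare the lines alpha(x1 y1) and alpha(x2 y2) in the R^2 model."""
    v1 = (y1[0] - x1[0], y1[1] - x1[1])
    v2 = (y2[0] - x2[0], y2[1] - x2[1])
    b = (x2[0] - x1[0], x2[1] - x1[1])          # right-hand side x2 - x1
    # Determinant of the coefficient matrix with columns v1 and -v2.
    det = v1[0] * (-v2[1]) - (-v2[0]) * v1[1]
    if det != 0:
        # v2 is not a scalar multiple of v1: a single intersection point.
        t1 = Fraction(b[0] * (-v2[1]) - (-v2[0]) * b[1], det)
        return ("intersect", (x1[0] + t1 * v1[0], x1[1] + t1 * v1[1]))
    # v2 is a multiple of v1; the lines coincide iff x2 lies on alpha(x1 y1).
    if v1[0] * b[1] - v1[1] * b[0] == 0:
        return ("same", None)
    return ("parallel", None)
```

For instance, `classify((0, 0), (2, 2), (0, 2), (2, 0))` reports an intersection at ⟨1, 1⟩.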
By the definitions, if B(x, y, z) then x, y, z are collinear; further
they are distinct, since x + t(z − x) takes on distinct values as t does.
This proves axiom B1. Axiom B2 follows by algebra, by considering
the cases t < 0, 0 < t < 1, and t > 1. Axiom B3 follows because
x + t(z − x) = z + (1 − t)(x − z).
For axiom B4, suppose without loss of generality that α intersects
(yz) at w = y + t(z − y) where 0 < t < 1. If α is not parallel to α(xy)
or α(xz) then for some p, q, r, s, v
y + r(x − y) = w + pv and z + s(x − z) = w + qv.
Given v there is exactly one solution, and in terms of p and q it satisfies
r = (p/q)(1 − t) + t and s = (1 − t) + (q/p)t. If s ≥ 1 then q/p ≥ 1 so
0 < p/q ≤ 1 and so t < r ≤ 1; and if s ≤ 0 then q/p ≤ −(1 − t)/t
so −t/(1 − t) ≤ p/q < 0 and so 0 ≤ r < t. But r ≠ 0, 1 and so if
0 < s < 1 is false then 0 < r < 1 is true. If α is parallel to α(xy) then
z + s(x − z) = w + q(x − y) where q = s = 1 − t. The case α parallel to
α(xz) is similar.
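The relations between r, s, p, and q used in the argument for axiom B4 can be checked numerically. The following sketch uses hypothetical names (`solve2`, `sub`, `pasch`) and exact rational arithmetic; it solves the two vector equations for a sample triangle and direction v, on the assumption that the line is parallel to neither α(xy) nor α(xz).

```python
from fractions import Fraction

def solve2(a, b, c):
    """Solve r*a + p*b = c for (r, p) by Cramer's rule (a, b independent)."""
    det = a[0] * b[1] - a[1] * b[0]
    r = Fraction(c[0] * b[1] - c[1] * b[0], det)
    p = Fraction(a[0] * c[1] - a[1] * c[0], det)
    return r, p

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

def pasch(x, y, z, t, v):
    """Return (r, s, p, q) with y + r(x - y) = w + p v and z + s(x - z) = w + q v,
    where w = y + t(z - y) is the point at which the line meets (yz)."""
    w = (y[0] + t * (z[0] - y[0]), y[1] + t * (z[1] - y[1]))
    neg_v = (-v[0], -v[1])
    r, p = solve2(sub(x, y), neg_v, sub(w, y))
    s, q = solve2(sub(x, z), neg_v, sub(w, z))
    return r, s, p, q

# Sample triangle and line (arbitrary choices for illustration).
t = Fraction(1, 2)
r, s, p, q = pasch((0, 3), (0, 0), (4, 0), t, (1, 1))
```

In this sample the line misses (xy) and meets (xz), since r falls outside the interval (0, 1) and s falls inside it.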
Recalling from section 14 the definition of a metric function, it is
next shown that the Euclidean distance is a metric function. Note first
that x² ≥ 0, and the sum of non-negative values is non-negative, so
d(x, y) is defined for any two points x, y. Requirement 1 follows because
√x ≥ 0 if it is defined. Requirement 2 follows because x² > 0 if x > 0,
and other basic properties of ≤. Requirement 3 is immediate.
The proof of M4 benefits from some additional definitions. The
“inner product” x · y of two elements of R2 is defined to be x1 y1 + x2 y2,
and the “norm” |x| is defined to be √(x · x); then d(x, y) = |x − y|. M4
follows from
|x + y| ≤ |x| + |y|     (1),
since then |x − z| ≤ |x − y| + |y − z|.
Both sides of (1) are nonnegative, so squaring both sides yields
an equivalent relation; after canceling terms it remains to show that
x · y ≤ |x||y|. Indeed |x · y| ≤ |x||y| holds, a fact known as the Cauchy-Schwarz inequality. This follows by observing that
\[
0 \le \left| \frac{x}{|x|} \pm \frac{y}{|y|} \right|^2 = 2 \pm 2\,\frac{x}{|x|} \cdot \frac{y}{|y|}.
\]
It is easily seen from the above proof that equality holds in (1) if
and only if x · y = |x||y|, and that this holds for nonzero x, y if and only
if x/|x| = y/|y|, or equivalently y = cx for some scalar c > 0. Equality
holds in M4 for distinct points if and only if y − x is a positive scalar
multiple of z − y, or equivalently if x-y-z.
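The inequality and its equality case can be tested exactly by working with squares, which avoids square roots. This is a minimal sketch with hypothetical function names, assuming integer or rational coordinates.

```python
def dot(x, y):
    """Inner product x . y of two elements of R^2."""
    return x[0] * y[0] + x[1] * y[1]

def cauchy_schwarz_gap(x, y):
    """|x|^2 |y|^2 - (x . y)^2, which the inequality asserts is never negative."""
    return dot(x, x) * dot(y, y) - dot(x, y) ** 2

def positive_multiple(x, y):
    """For nonzero x, y: True exactly when x . y = |x||y|, i.e. y = cx with c > 0."""
    return cauchy_schwarz_gap(x, y) == 0 and dot(x, y) > 0
```

For example, `positive_multiple((1, 2), (3, 6))` holds, while `positive_multiple((1, 2), (-1, -2))` fails because the inner product is negative.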
Axioms D1-D5 follow easily from the fact that d is a metric function
(and the axioms of equality for R).
For axiom D6, if x-y-z then as observed in the proof of M4, d(x, y) +
d(y, z) = d(x, z); similarly d(x′, y′) + d(y′, z′) = d(x′, z′). By hypothesis
d(x, y) = d(x′, y′) and d(y, z) = d(y′, z′), so d(x, z) = d(x′, z′).
For axiom D7, from the hypotheses it follows that w = y + t(z − y)
and w′ = y′ + t′(z′ − y′) where t′ = t. Thus
(d(x, w))² = ((x1 − y1) − t(z1 − y1))² + ((x2 − y2) − t(z2 − y2))²
= ((x1 − y1)² + (x2 − y2)²)
− 2t((x1 − y1)(z1 − y1) + (x2 − y2)(z2 − y2))
+ t²((z1 − y1)² + (z2 − y2)²)
= ((x1 − y1)² + (x2 − y2)²)
− t(((x1 − y1)² + (x2 − y2)²) − ((x1 − z1)² + (x2 − z2)²)
+ ((y1 − z1)² + (y2 − z2)²))
+ t²((z1 − y1)² + (z2 − y2)²)
where the last step uses the equality 2(x − y)(z − y) = (x − y)² − (x −
z)² + (y − z)². A similar equation is obtained for (d(x′, w′))², with
all variables primed. By the hypotheses the right sides are equal, so
d(x′, w′) = d(x, w).
For axiom D8, let w be any other point of α and let u1 = x+t(w−x),
u2 = x − t(w − x), where t = d(y, z)/d(w, x). Then d(ui , x) = td(w, x) =
d(y, z) for i = 1, 2; further x = u1 + (1/2)(u2 − u1 ), so u1 -x-u2 .
For axiom D9, let d = d(y, z), v = ⟨−(z2 − y2), z1 − y1⟩, and
\[
t = \frac{(x_1 - y_1)(z_1 - y_1) + (x_2 - y_2)(z_2 - y_2)}{d^2}, \qquad
u = \frac{-(x_1 - y_1)(z_2 - y_2) + (x_2 - y_2)(z_1 - y_1)}{d^2}.
\]
Then it is a straightforward exercise to show that x − y = t(z − y) + uv,
(d(x, y))² = (t² + u²)d², and (d(x, z))² = ((1 − t)² + u²)d². These facts hold
similarly with all quantities primed, and so xyz≡x′y′z′ if and only if
t² + u² = t′² + u′² and (1 − t)² + u² = (1 − t′)² + u′², which holds if
and only if t′ = t and u′ = ±u. Finally (x′1 x′2) intersects α(y′z′) at
y′ + t(z′ − y′), because this point is x′1 + (1/2)(x′2 − x′1).
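The decomposition behind axiom D9 can be illustrated as follows. In this sketch the names `decompose` and `apexes` are hypothetical, coordinates are integers or `Fraction` values, and the segment [y′z′] is assumed to be congruent to [yz]; the code computes the coefficients t and u of x − y along z − y and the perpendicular vector v, and then builds the two candidate points x′ over the new base.

```python
from fractions import Fraction

def decompose(x, y, z):
    """Coefficients (t, u) with x - y = t(z - y) + u v, where v is z - y
    rotated by a quarter turn: v = <-(z2 - y2), z1 - y1>."""
    d2 = (z[0] - y[0]) ** 2 + (z[1] - y[1]) ** 2
    t = Fraction((x[0] - y[0]) * (z[0] - y[0]) + (x[1] - y[1]) * (z[1] - y[1]), d2)
    u = Fraction(-(x[0] - y[0]) * (z[1] - y[1]) + (x[1] - y[1]) * (z[0] - y[0]), d2)
    return t, u

def apexes(t, u, yp, zp):
    """The two points y' + t(z' - y') +/- u v' over the base [y'z']."""
    w = (zp[0] - yp[0], zp[1] - yp[1])
    v = (-w[1], w[0])
    return [(yp[0] + t * w[0] + sign * u * v[0],
             yp[1] + t * w[1] + sign * u * v[1]) for sign in (1, -1)]

def dist_sq(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# Sample triangle xyz and a congruent base y'z' (arbitrary choices).
x, y, z = (1, 2), (0, 0), (4, 0)
yp, zp = (0, 0), (0, 4)
t, u = decompose(x, y, z)
x1p, x2p = apexes(t, u, yp, zp)
```

The midpoint of the two computed points is y′ + t(z′ − y′), which lies on α(y′z′), matching the final clause of axiom D9.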
The relation B(x, y, z) may be defined in R by the formula x <
y < z ∨ x > y > z. Define a cut-pair in R to be a pair of subsets A
and B as in the definition of a cut in a line; and a cutpoint also. It may
be assumed without loss of generality that elements of A are less than
elements of B. It follows by algebra and the least upper bound property
that either A has a least upper bound, or B has a greatest lower bound.
Further, that point is a cutpoint.
For axiom C, let α = α(xy). The cut of α induces a cut-pair of R;
t lies in the subset in R corresponding to the subset that x + t(y − x)
lies in. The cutpoint of this pair corresponds to a cutpoint for the cut
in α.
This concludes the proof of theorem 1. ⊳
Before stating the next theorem, some definitions and lemmas
needed in its proof will be given. To distinguish points of the Euclidean
plane from points of R2 , capital letters will be used for the former. Facts
about the Euclidean plane will be proved using the axioms for it.
Lemma 2. If XY Z is a triangle and α is a line not containing any
of its vertices then it is not the case that α intersects all of (XY ), (XZ),
and (Y Z).
Proof: Suppose to the contrary that U, V, W are the points of intersection; suppose also without loss of generality that U -V -W . Then
the triangle Y U W and the line α(XZ) contradict axiom B4. ⊳
Given any line α, say that two points X and Y not lying on α are
on opposite sides of α if α intersects (XY ), and are on the same side
otherwise. Using axioms I3 and D9, there are points X and Y not on α
and lying on opposite sides.
Lemma 3. Suppose X and Y , and Y and Z, are on the same side
of α, or X and Y , and Y and Z, are on opposite sides of α. Then X
and Z are on the same side of α.
Proof: Suppose the first possibility holds. Consider the triangle
XY Z; if α intersects (XZ) it must intersect (XY ) or (Y Z). If the
second possibility holds the claim follows similarly using lemma 2. ⊳
Lemma 4. For O, A, X, Y in α suppose X-O-A and Y -O-A, or
¬X-O-A and ¬Y -O-A; then ¬X-O-Y .
Proof: The case X-O-A and Y -O-A follows from the case ¬X-O-A
and ¬Y -O-A by reversing the roles of O and A; so consider the latter
case. The claim is straightforward if X or Y equals O or A, so suppose
not. Choose P not in α. Choose Q in α(AP), Q ≠ A, P. If A-P-Q,
considering the triangle QAX together with the line α(OP) there is
an R in α(OP) with Q-R-X. Similarly there is an S in α(OP) with
Q-S-Y. By lemma 2, considering the triangle QXY and the line α(OP),
¬X-O-Y . If A-Q-P , reverse the roles of P and Q in the preceding
argument. If Q-A-P , by lemma 3 X and Y are on the same side of
α(OP ), whence ¬X-O-Y . ⊳
Lemma 5. Given a line α, an order < can be defined on it such
that X-Y -Z if and only if either X < Y < Z or Z < Y < X. There are
exactly two such orders, <1 and <2 , and X <1 Y if and only if Y <2 X.
Proof: Suppose α = α(O, U ). A unique order may be defined such
that O < U , which is consistent with B in the sense of the lemma.
Partition α into the 5 classes λ = {X : X-O-U}, o = {O}, µ = {X :
O-X-U}, υ = {U}, and ρ = {X : O-U-X}. Writing χ1 < χ2 for
∀X ∈ χ1 ∀Y ∈ χ2 (X < Y ), the order on the classes must be λ < o <
µ < υ < ρ.
Add the pairs X < Y to the relation whenever X and Y are in
different classes. If X, Y ∈ λ add X < Y if X-Y -O, else add Y < X.
If X, Y ∈ µ add X < Y if O-X-Y , else add Y < X. If X, Y ∈ ρ add
X < Y if O-X-Y , else add Y < X.
Clearly, for distinct X and Y exactly one of X < Y or Y < X holds;
and X ≮ X. It remains to verify the transitivity law X < Y ∧ Y < Z ⇒
X < Z. This is clear unless X, Y, and Z are all in the same class. In
these cases either X-O-A and Y-O-A or ¬X-O-A and ¬Y-O-A, and so
by lemma 4 ¬X-O-Y; similarly ¬X-O-Z.
Suppose X-Y -O and Y -Z-O; then ¬O-Y -Z, and so if ¬X-Y -Z then
¬X-Y -O. Hence, X-Y -Z, so ¬Z-X-Y ; also ¬O-X-Y and so ¬O-X-Z.
Together with ¬X-O-Z this shows X-Z-O.
Suppose O-X-Y and O-Y -Z; then ¬O-Y -X, and so if ¬Z-Y -X then
¬O-Y -Z. Hence, Z-Y -X, so ¬X-Z-Y ; also ¬O-Z-Y and so ¬O-Z-X.
Together with ¬X-O-Z this shows O-X-Z. ⊳
Lemma 6. If XYZ≡X′Y′Z′ and X, Y, Z are collinear then
X′, Y′, Z′ are collinear.
Proof: Suppose without loss of generality that X-Y-Z. If X′, Y′, Z′
are not collinear let Z1 be a point with XYZ1≡X′Y′Z′, and let X1 be
a point on α(YZ1) with XYZ≡X1YZ1. Then XYX1≡X1YX, X-Y-Z,
X1-Y-Z1, XZ≡X1Z1, and YZ≡YZ1, so by axiom D7 XZ1≡X1Z. By
assumption, XZ≡XZ1; XZX1≡XZ1X1 follows, and so by axiom D9
α(XX1) intersects (ZZ1). By axiom B4 α(XX1) intersects either (YZ)
or (YZ1); but this is impossible. ⊳
Lemma 7. Suppose X-Y-Z, XY≡X′Y′ and XZ≡X′Z′, and Y′
and Z′ are on the same side of X′. Then X′-Y′-Z′ and YZ≡Y′Z′.
Proof: Under the stated hypotheses there are by axiom D8 two
points Z′1 and Z′2 with YZ≡Y′Z′i; only one of these, say Z′1, is on the
opposite side of Y′ from X′. By axiom D6 XZ≡X′Z′1. Also both Z′
and Z′1 are on the same side of X′ as Y′. Using axiom D8 again it must
be the case that Z′ and Z′1 are identical. ⊳
Lemma 8. Given distinct points X and Y there is a unique point
M , called the midpoint of the segment [XY ], such that X-M -Y and
XM ≡M Y .
Proof: Choose W not in α(XY); choose W′ on the opposite side of
α(XY) with XWY≡YW′X, and let α(WW′) intersect α(XY) in M.
Let M′ in α(XY) satisfy XMY≡YM′X. Using axiom D7 WM≡W′M′
and WM′≡W′M, and hence WMW′≡W′M′W. By lemma 6 M′ is in
α(WW′), and so M′ = M. If, say, X-Y-M then since MX≡MY and
MY≡MY, by lemma 7 XY≡YY would hold, which contradicts the
assumption that X and Y are distinct. Similarly M-X-Y is impossible,
so X-M-Y. To prove uniqueness, order the line so that X < M < Y,
and suppose M′ is another midpoint; if M′ ≠ M suppose without loss
of generality that M < M′. Then the point M′′ such that MM′Y≡
MM′′X must be identical to M′, which is impossible. ⊳
Lemma 9. Given a line α = α(OU ), ordered so that O < U , there
is a unique order-preserving bijection P : R ↦ α with the following
properties, where Pr is written for P(r).
1. P0 = O, P1 = U, and
2. Pr Ps≡Pt Pu if and only if |s − r| = |u − t|.
Proof: Let Qd denote the rational numbers whose denominator is
a power of 2. There is a unique order-preserving map P : Qd ↦ α such
that properties 1 and 2 hold; this may be obtained by first “marking
off” the integers, and then successively taking midpoints. Since Qd
is a dense linear order without endpoints, a unique order-preserving
bijection g : R ↦ α is induced; let P be the inverse of g.
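The “marking off” construction can be imitated in the model R2, where midpoints and reflections are given by explicit formulas. The sketch below is only an illustration under that assumption; the names `midpoint`, `reflect`, and `dyadic_point` are hypothetical, the points O and U have integer or `Fraction` coordinates, and the simple recursion is meant for small inputs only.

```python
from fractions import Fraction

def midpoint(a, b):
    """Midpoint of the segment [ab] in the R^2 model."""
    return (Fraction(a[0] + b[0], 2), Fraction(a[1] + b[1], 2))

def reflect(a, c):
    """The point b with a-c-b and ac congruent to cb."""
    return (2 * c[0] - a[0], 2 * c[1] - a[1])

def dyadic_point(r, O, U):
    """P_r for a dyadic rational r, with P_0 = O and P_1 = U."""
    r = Fraction(r)
    if r.denominator == 1:
        n = r.numerator
        if n == 0:
            return O
        if n == 1:
            return U
        if n > 1:   # mark off the next integer point to the right
            return reflect(dyadic_point(n - 2, O, U), dyadic_point(n - 1, O, U))
        # n < 0: mark off the next integer point to the left
        return reflect(dyadic_point(n + 2, O, U), dyadic_point(n + 1, O, U))
    # r = k/2^m with k odd: midpoint of its neighbours (k - 1)/2^m and (k + 1)/2^m,
    # both of which have smaller denominators.
    step = Fraction(1, r.denominator)
    return midpoint(dyadic_point(r - step, O, U), dyadic_point(r + step, O, U))
```

With O = ⟨0, 0⟩ and U = ⟨4, 2⟩, the point computed for r = 3/4 is ⟨3, 3/2⟩, which agrees with O + r(U − O).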
It remains to show that property 2 holds in general. Let ⊕ be the
binary function on α, where S = X ⊕ Y if and only if XS≡OY and
S > X, S = X, or S < X according to whether Y > O, Y = O, or
Y < O.
To see that ⊕ satisfies axiom C1, let S = X + Y , T = Y + X,
U = (X + Y ) + Z, and V = X + (Y + Z). Then XS≡OY and SU ≡Y T ,
so (considering signs and using axiom D6 and lemma 7) XU ≡OT ; also
OX≡V T , so OU ≡OV .
To see that ⊕ satisfies axiom C2, let S = X + Y and T = Y + X.
Then XS≡OY and Y T ≡OX, so (considering signs and using axiom D6
and lemma 7) OS≡OT .
That ⊕ satisfies axiom C3, with O for 0, follows by axiom D1.
To see that ⊕ satisfies axiom C4, let X ′ be the other point on α
such that OX≡OX ′ . Let S = X + X ′ ; OX≡XS follows, whence O = S
since O and S are on the same side of X.
That ⊕ satisfies axiom O3 follows, since if S = X + Y where X and
Y are positive, then O < X < X + Y .
Thus, α forms an ordered commutative group with ⊕ and <, which
has the least upper bound property. It is readily verified that ⊕ and +
agree on Qd ; and hence they agree on α. It follows that Pr Pr+s ≡P0 Ps ,
and property 2 follows. ⊳
It is useful to have a definition of the distance between two points;
when this is used it is assumed that two distinct points O and U have
been selected, to give the “unit length”. By the lemma this defines for
each r ∈ R a point Pr ∈ α(OU ). Define the distance D(X, Y ) between
points X and Y to be the unique real number r ≥ 0 such that XY ≡
OPr . Note that W X≡Y Z if and only if D(W, X) = D(Y, Z).
Lemma 10. If α is parallel to β, and β is parallel to γ, then α is
parallel to γ, or α equals γ.
Proof: If α and γ intersected in the point X, then α and γ would
both contain X and be parallel to β, and so by axiom P would be
identical. ⊳
Lemma 11. Suppose that OX1Y1≡O′X′1Y′1, X2 (resp. Y2, X′2, Y′2) is
on the same side of O as X1 (resp. Y1, X′1, Y′1), OX2≡OX′2, and OY2≡
OY′2. Then X2Y2≡X′2Y′2.
Proof: By axiom D6 or lemma 7 OY2≡O′Y′2, and so by axiom D7
X1Y2≡X′1Y′2. Repeating the argument with the triangles OX1Y2 and
O′X′1Y′2 yields X2Y2≡X′2Y′2. ⊳
It is useful to introduce some further notation. Given distinct points
X and Y, let α>(XY) denote the points of α(XY) which are on the
same side of X as Y. Given noncollinear points O, X, and Y write
∠OXY for the triple OXY, when it is to be considered as an “angle”.
Say that angles ∠OXY and ∠O′X′Y′ are congruent, written ∠OXY≡
∠O′X′Y′, if X1Y1≡X′1Y′1 for some (and hence any) X1 ∈ α>(OX),
X′1 ∈ α>(O′X′), Y1 ∈ α>(OY), and Y′1 ∈ α>(O′Y′), with OX1≡O′X′1
and OY1≡O′Y′1. It is easy to verify that this is an equivalence relation,
and ∠OXY≡∠OYX.
Lemma 12. Suppose Z1 and Z2 are on the same side of α(XY),
and ∠XYZ1≡∠XYZ2. Then α(YZ1) and α(YZ2) coincide.
Proof: Let Z3 ∈ α>(YZ2) be such that YZ3≡YZ1. Then XYZ1≡
XYZ3. Z1 and Z3 are on the same side of α(XY), so by axiom D9
Z3 = Z1. Thus, α(YZ1) = α(YZ3) = α(YZ2). ⊳
For readers familiar with plane geometry, lemma 11 may be seen as
a version of the “side-angle-side” criterion for congruent triangles. The
next lemma is the “angle-side-angle” criterion.
Lemma 13. Given triangles XYZ and X′Y′Z′, if XY≡X′Y′,
∠XYZ≡∠X′Y′Z′, and ∠YXZ≡∠Y′X′Z′ then XYZ≡X′Y′Z′.
Proof: Let Z′2 ∈ α>(X′Z′) be such that XZ≡X′Z′2. By lemma 11
XYZ≡X′Y′Z′2, and ∠X′Y′Z′≡∠X′Y′Z′2 follows. By lemma 12 α(Y′Z′)
= α(Y′Z′2). Hence Z′2 and Z′ both lie on α(Y′Z′) and α(Y′Z′2),
and hence Z′2 = Z′. ⊳
Two distinct lines intersecting at a point determine four angles.
Pairs of angles which share a common side are said to be supplementary;
there are four of these. Pairs of angles which have no common side are
said to be opposite; there are two of these.
Lemma 14. Suppose ∠OXY and ∠OYZ are supplementary,
∠O′X′Y′ and ∠O′Y′Z′ are supplementary, and ∠OXY≡∠O′X′Y′.
Then ∠OYZ≡∠O′Y′Z′.
Proof: It may be assumed that OX≡O′X′, OY≡O′Y′, and OZ≡
O′Z′. By hypothesis XY≡X′Y′. Using axiom D7, YZ≡Y′Z′. ⊳
Corollary 15. If ∠OXY and ∠OX′Y′ are opposite angles then
∠OXY≡∠OX′Y′.
Proof: ∠OXY and ∠OXY′, and ∠OXY′ and ∠OX′Y′, are supplementary. ⊳
Lemma 16. Suppose X1, X2, and X3 are points on the line α with
X1-X2-X3, and Y1 and Y2 are points on the same side of α. Then
∠X1 X2 Y1≡∠X2 X3 Y2 if and only if α(X1 Y1) and α(X2 Y2) are parallel.
Proof: Suppose α(X1 Y1) and α(X2 Y2) intersect at W1. Let W2 be
the midpoint of [X1 X2], and let W3 ∈ α(W1 W2) be such that W1 W2≡
W2 W3 and W1 ≠ W3. Then by congruence of opposite angles and
side-angle-side, W1 X1 W2≡W3 X2 W2. Further W3 is not in α(W1 X2),
because W2 lies in α(W1 W3) and W2 and X2 are distinct. Choose W4
with X2-X1-W4. Choose W5 ∈ α(X1 Y1) and W6 ∈ α(X2 Y2) on the
same side of α as W3. By congruence of opposite angles, ∠X1 X2 Y1≡
∠X1 W5 W4 and ∠X2 X3 Y2≡∠X2 W6 W4; and ∠X1 W1 W2≡∠X2 W3 W2 has
already been shown. Since ∠X2 W4 W3 ≢ ∠X2 W4 W6, ∠X1 X2 Y1 ≢
∠X2 X3 Y2 follows.
Suppose now that α(X1 Y1) and α(X2 Y2) are parallel. Let Y3 be a
point on the same side of α(X1 X2) as Y2 such that ∠X1 X2 Y1≡∠X2 X3 Y3
(such exists, using axioms D8 and D9). Then by what was just shown,
α(X2 Y3) is parallel to α(X1 Y1). By axiom P α(X2 Y2) and α(X2 Y3) are
identical, whence ∠X1 X2 Y1≡∠X2 X3 Y2. ⊳
The ordered sequence WXYZ of four points is said to be the vertices of a parallelogram if α(WX) and α(YZ) are parallel, and
α(WZ) and α(XY) are parallel. It follows that the points must be
distinct, with no three collinear. The segments [W X], [XY ], [Y Z] and
[ZW ] are called the sides of the parallelogram. The segments [W Y ] and
[XZ] are called the diagonals.
Lemma 17. In a parallelogram WXYZ, WX≡YZ and XY≡ZW.
Proof: By lemma 16 and opposite angles, ∠WXY≡∠YZW and
∠WZY≡∠YXW. By angle-side-angle, the lemma follows. ⊳
Lemma 18. Suppose W, X, Y, Z are distinct points such that α(WX)
and α(YZ) are parallel, W and Z are on the same side of α(XY), and
WX≡YZ. Then WXYZ is a parallelogram.
Proof: Let β be the line through W parallel to α(XY). Let Z′ be
the point where β intersects α(YZ). Then WXYZ′ is a parallelogram,
so by lemma 17 WX≡YZ′, so Z = Z′. ⊳
Lemma 19. The diagonals of a parallelogram W XY Z intersect.
Proof: Let M be the midpoint of [W X] and let N be the midpoint
of [YZ]. It is easy to see that WM≡ZN and MX≡NY (let N′ be
the point on α>(ZY) with WM≡ZN′; by lemmas 17 and 7 MX≡
N′Y). By lemma 18 WMNZ and MXYN are parallelograms. W and
Y are on opposite sides of α(M N ), so [W Y ] intersects α(M N ); let V
be the point of intersection. By angle-side-angle, W M V ≡Y N V , so V is
the midpoint of [M N ]. A similar argument shows that [XZ] intersects
[M N ] in the midpoint. ⊳
Say that a point W is inside ∠XYZ if W is on the same side of
α(XY) as Z, and W is on the same side of α(XZ) as Y.
Lemma 20. The following are equivalent.
a. W is inside ∠XYZ.
b. W ∈ (Y′Z′) for some Y′ ∈ α>(XY) and Z′ ∈ α>(XZ).
c. α(XW) intersects (YZ), and if V is the intersection point then
V ∈ α>(XW).
Proof: Suppose a holds. Let β be the line through X, parallel to
α(YZ). Y and Z are on the same side of β. Since α(XW) is not parallel
to β it is not parallel to α(YZ) either, so intersects it at some point V.
V, indeed α>(XW), is on the Y, Z side of β, the Z side of α(XY), and
the Y side of α(XZ). In particular c holds. Further, let γ be the line
through W parallel to β, and let Y′ be the intersection with α(XY)
and Z′ the intersection with α(XZ). Y′ and Z′ are on the Y, Z side
of β, so Y′ ∈ α>(XY) and Z′ ∈ α>(XZ). That is, b holds also. If b
holds then W and Z′ are on the same side of α(XY), and Z′ and Z are
also, whence W and Z are. Similarly W and Y are on the same side of
α(XZ). If c holds then W and V are on the same side of α(XY), and
V and Z are also, whence W and Z are. Similarly W and Y are on the
same side of α(XZ). ⊳
Lemma 21. If W is inside ∠XYZ, W′ is inside ∠X′Y′Z′, ∠XYZ≡
∠X′Y′Z′, and ∠XYW≡∠X′Y′W′, then ∠XWZ≡∠X′W′Z′.
Proof: Let V be the intersection point of α(XW) and (YZ); and
let V′ be the intersection point of α(X′W′) and (Y′Z′). The lemma
follows by lemma 7. ⊳
Say that ∠O1 X1 Y1 < ∠O2 X2 Y2 if there is a point W inside
∠O2 X2 Y2 such that ∠O1 X1 Y1≡∠O2 X2 W.
Lemma 22. If ∠O1 X1 Y1 < ∠O2 X2 Y2 and ∠O2 X2 Y2 < ∠O3 X3 Y3
then ∠O1 X1 Y1 < ∠O3 X3 Y3.
Proof: Without loss of generality it may be assumed that O1 =
O2 = O3, X1 = X2 = X3, Y1 is inside ∠O1 X1 Y2, and Y2 is inside
∠O1 X1 Y3. The lemma follows by considering the segment (Y1 Y3). ⊳
Lemma 23. Suppose XYZ is a triangle and W-Y-Z.
Then ∠WXZ < ∠YXZ.
Proof: Let α be the line through W parallel to α(XY ) and β the
line through X parallel to α(Y Z). Then Y XV W is a parallelogram,
where V is the point of intersection of α and β. By lemma 19 (W X) is
inside ∠WYV, which is congruent to ∠YXZ. ⊳
Suppose OUV is a triple of distinct non-collinear points, and let U′ be
the other point of α(OU) such that OU≡OU′. Say that OUV forms
a right angle if VU≡VU′. Using lemma 16 if this is so then all four
angles at the intersection O of α(OU ) and α(OV ) are right angles. It
is readily verified that two right angles are congruent. If all four angles
at the intersection of lines α and β are right angles, the lines α and β
are said to be perpendicular. Using lemma 16, if β1 and β2 are distinct
lines perpendicular to α then β1 and β2 are parallel; also if β1 is a
perpendicular and β2 is parallel to β1 then β2 is a perpendicular.
Lemma 24. Given a point X and a line α, there is a unique line β
perpendicular to α, such that X ∈ β.
Proof: Suppose α = α(OU), and let U′ be the distinct point of α
with U′O≡OU. Let W be a point not in α, and let W′ be the point on
the same side of α as W with UWU′≡U′W′U. Let V be the midpoint
of [WW′]. Using the triangles UWW′ and U′W′W it follows by axiom
D7 that UV≡U′V. This shows that there is a perpendicular β1 to α
with O ∈ β1. Such a perpendicular is unique by lemma 12. If X is on
β1 let β = β1; otherwise let β be the line through X parallel to β1. ⊳
Lemma 25. Suppose XYZ is a triangle with ∠XYZ a right angle.
Then ∠YXZ < ∠XYZ and ∠ZXY < ∠XYZ.
Proof: Let α be the line through Y parallel to α(XZ) and β the line
through Z parallel to α(XY). Then WYXZ is a parallelogram, where
W is the point of intersection of α and β. Further ∠YXW is a right angle
and Z is inside it, so ∠YXZ < ∠YXW; similarly ∠ZXY < ∠ZXW
where ∠ZXW is a right angle. ⊳
Lemma 26. Suppose XYZ is a triangle with ∠XYZ a right angle.
Suppose α is the perpendicular to α(YZ) passing through X, and that
α and α(YZ) intersect in the point W. Then Y-W-Z.
Proof: W = Y or W = Z are ruled out by lemma 25. Y-Z-W and
W-Y-Z are ruled out by lemmas 23, 25, and 22. ⊳
Theorem 2. Suppose OU V is a right angle with OU ≡OV . Then
there is a unique bijection from the Euclidean plane to R2 with the
following properties, where X̃ is used to denote the value at X.
a. Õ = ⟨0, 0⟩, Ũ = ⟨1, 0⟩, and Ṽ = ⟨0, 1⟩.
b. Z ∈ α(XY) if and only if Z̃ = X̃ + t(Ỹ − X̃) for some t ∈ R.
c. D(X, Y) = d(X̃, Ỹ).
d. X-Z-Y if and only if Z̃ = X̃ + t(Ỹ − X̃) for some t ∈ R such that
0 < t < 1.
By the proof of lemma 9, the requirements imply that for $P_r$ in
α(OU), $\tilde{P}_r = \langle r, 0\rangle$. Likewise, for $P_r$ in α(OV), $\tilde{P}_r = \langle 0, r\rangle$. Let α
be the line through $P_r$ on α(OU) which is parallel to α(OV); let X
be another point on α. Then X̃ must be ⟨r, s⟩ for some s ∈ R, else
α would intersect α(OV). Choose X to be the point such that OV ≡
$P_r X$, which is on the same side of α(OU) as V. Then (since VX can’t
intersect $OP_r$), X̃ must be ⟨r, 1⟩. This determines X̃ for all X ∈ α.
Finally, every X occurs in some α, namely that α with X ∈ α which is
parallel to α(OV); this must intersect α(OU), else it is parallel to both
α(OU) and α(OV), which intersect but are not equal.
Call α(OU ) the x-axis, and α(OV ) the y-axis. It has been shown
that if there is a bijection with the required properties then it must be
as just described, indeed it suffices that it have the required properties
on the x-axis and all lines parallel to the y-axis. It remains to show that
this map has the required properties in all other cases.
Let $P_{xy}$ denote the point such that $\tilde{P}_{xy} = \langle x, y\rangle$. Using lemma 19,
$P_{xy}$ lies on the line through $P_{0y}$ which is parallel to the x-axis; further
this line has properties b, c (meaning it holds for all X and Y on the
line), and d.
Let α be a line not parallel to the x-axis or the y-axis. Let $P^\alpha_x$
be the point where α intersects the line through $P_{x0}$ and parallel to
the y-axis. Let $y_0$ be such that $P^\alpha_0 = P_{0,y_0}$, and let s be such that
$P^\alpha_1 = P_{1,y_0+s}$. Given $x_1 < x_2$, let $y_1$ be such that $P^\alpha_{x_1} = P_{x_1,y_1}$, let $y_2$
be such that $P^\alpha_{x_2} = P_{x_2,y_2}$, and let $y_3$ be such that $P^\alpha_{x_2-x_1} = P_{x_2-x_1,y_3}$.
Using lemma 16 and angle-side-angle, $y_2 = y_1 + (y_3 - y_0)$. By induction,
$P^\alpha_n = P_{n,y_0+ns}$ for n ∈ Z. It then follows that $P^\alpha_x = P_{x,y_0+xs}$ for
$x \in Q_d$. Finally, this holds for all x ∈ R using axiom C.
A similar argument shows that $D(P_{x,y_0+xs}, P_{00}) = D(P_{1,y_0+s}, P_{00})$.
Properties b and d for α follow readily. To prove that property c
holds, it suffices to show that $l^2 = x^2 + y^2$ where $l = D(P_{00}, P_{xy})$ (the
“Pythagorean theorem”); it may be assumed that x > 0 and y > 0. Let
β be the line perpendicular to $\alpha(P_{00}P_{xy})$, with $P_{x0} \in \beta$. Let Q be the
point of intersection of β and $\alpha(P_{00}P_{xy})$. By lemma 26, $P_{00}$-Q-$P_{xy}$. Let
$l_1 = D(Q, P_{00})$, $l_2 = D(Q, P_{xy})$, and $l_3 = D(Q, P_{x0})$; then $l = l_1 + l_2$.
Using the facts that $\angle P_{00}P_{x0}Q \equiv \angle P_{00}QP_{x0}$, and $\angle P_{x0}P_{00}P_{xy}$ and
$\angle QP_{00}P_{x0}$ are both right angles, it follows that $l_1/x = l_3/y = x/l$.
Similarly $l_3/x = l_2/y = y/l$. Thus, $x^2 = l_1 l$ and $y^2 = l_2 l$, and using
$l = l_1 + l_2$, $l^2 = x^2 + y^2$ follows.
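The final algebraic step can be written out explicitly. The display below only restates the two proportions just given and adds them; no further assumptions are made.

```latex
\begin{align*}
\frac{l_1}{x} = \frac{x}{l} &\;\Longrightarrow\; x^2 = l_1 l, \\
\frac{l_2}{y} = \frac{y}{l} &\;\Longrightarrow\; y^2 = l_2 l, \\
x^2 + y^2 &= (l_1 + l_2)\, l = l \cdot l = l^2 .
\end{align*}
```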
This concludes the proof of theorem 2. ⊳
Theorems 1 and 2 show that the second order axioms for the Euclidean plane characterize it as an essentially unique structure.
Appendix 2. Computability (II).
In applications of computability theory it is necessary to have a
repertoire of functions which have been shown to be computable, in
particular various “syntactic functions” involved in the arithmetization
of syntax, and other string manipulations. Showing that such functions
are computable is somewhat lengthy no matter how it is done, and
usually involves giving additional characterizations of the computable
functions.
The following classes of functions will be defined: representable,
µ-recursive, primitive recursive, and Turing computable. Outlines will
be given of proofs of the following facts.
1. A representable function is computable.
2. A µ-recursive function is representable.
3. A primitive recursive function is µ-recursive.
4. Various syntactic functions are primitive recursive.
5. A Turing computable function is µ-recursive.
6. Further syntactic functions are Turing computable.
7. A computable function is Turing computable.
In particular, µ-recursive and Turing computable are the same as
computable. There are methods which do not involve as many overall
steps; in [Shoenfield1] for example the Turing computable functions are
not defined and step 6 is carried out using µ-recursion directly. The
proof is still lengthy, however; and the coding of strings is more technical.
The method here provides additional facts. In particular, it will be noted
that the syntactic functions are members of a class of functions which
is considerably smaller than the entire class of computable functions.
Many omitted details can be found in chapter 12 of [Dowd1].
A k-ary function f is said to be representable if there is a formula
F, with k + 1 free variables $x_1, \ldots, x_k, y$, such that $f(n_1, \ldots, n_k) = m$
if and only if $\vdash F_{n_1/x_1,\ldots,n_k/x_k} \Leftrightarrow y = m$ (in this section ⊢ denotes
provability from the axioms of Q). It is readily seen that fact 1 holds.
Representability is a slightly stronger concept than computability, of
only technical interest.
The µ-recursive functions are defined by a recursive definition. To
streamline the definition some preliminary definitions are helpful. Variables may be assumed to be ordered in “alphabetical” order $x_0, x_1, \ldots$.
A term t involving functions which have already been assigned a meaning
determines a function $f_t$. If t has k variables then $f_t$ is k-ary; argument
position i corresponds to the ith variable in the alphabetical order. A
recursive definition may be given, similar to the definition of t̂ given in
section 6.
If $g(\vec{x}, y)$ is a (k + 1)-ary function, the minimization (or minimalization) of g (on y) is the k-ary partial function $\varphi(\vec{x})$ such that $\varphi(\vec{x}) = y$
if and only if y is the least value such that $g(\vec{x}, y) = 0$. In general
φ is only a partial function, since there might
be no y such that $g(\vec{x}, y) = 0$.
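As a rough illustration, minimization can be sketched as an unbounded search. The Python below is only a sketch: the names `mu` and `isqrt_test`, and the optional `limit` cutoff, are assumptions made for the example. The cutoff is not part of the definition; it merely keeps the demonstration from looping forever.

```python
def mu(g, *xs, limit=None):
    """Return the least y with g(*xs, y) == 0.

    With limit=None the search is unbounded, so the call never returns
    when no such y exists; this mirrors the fact that the minimization
    of g is in general only a partial function.
    """
    y = 0
    while limit is None or y < limit:
        if g(*xs, y) == 0:
            return y
        y += 1
    return None  # gave up: no y below the (hypothetical) limit works


def isqrt_test(x, y):
    """Return 0 exactly when (y + 1)^2 exceeds x, else 1."""
    return 0 if (y + 1) * (y + 1) > x else 1


# The least y with (y + 1)^2 > 10 is the integer square root of 10.
print(mu(isqrt_test, 10))
# A g that is never 0 has no minimization value; the cutoff reports None.
print(mu(lambda x, y: 1, 10, limit=50))
```

Minimizing `isqrt_test` on y yields the integer square root, a total function; the second call shows how partiality arises.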
The µ-recursive functions are defined as follows.
1. The functions s, +, and · are µ-recursive.
2. If t is a term involving 0 and already defined µ-recursive functions
then $f_t$ is µ-recursive.
3. If f is obtained from the already defined µ-recursive function g by
minimization then f is µ-recursive.
Note that if t has no variables then $f_t$ is a constant; for reasons such as
this it is convenient to consider constants to be 0-ary functions.
Theorem 1. If f is µ-recursive then f is representable.
Proof: Recall from section 7 that if k + l = m then ⊢ k + l = m, and
if k · l = m then ⊢ k · l = m, where ⊢ denotes provability from the axioms
of Q. It follows using these, the axioms of equality, and propositional
logic that 0 is represented by y = 0, s by $y = x^s$, + by $y = x_1 + x_2$,
and · by $y = x_1 \cdot x_2$. The reader may verify this, or see the “Equality
theorem” of [Shoenfield1].
Suppose f is represented by F, and for 1 ≤ i ≤ k, $t_i$ is represented
by $G_i$. By renaming variables, the argument variables of F may be
assumed to be $z_1, \ldots, z_k$, and the value variable of $G_i$ to be $z_i$. Then
$f(t_1, \ldots, t_k)$ is represented by $\exists z_1 \ldots \exists z_k (G_1 \wedge \cdots \wedge G_k \wedge F)$. The required
formulas for this formula may be derived from the inductively assumed
formulas by predicate logic; details may be found in [Shoenfield1].
Recall the definition of ≤ from section 7; and that $\vdash x \le k \Rightarrow (x = 0 \vee \cdots \vee x = k)$.
Also, $\vdash x \le n \vee n \le x$ (see [Yasuhara]), and $\neg\, n \le x \Rightarrow (x = 0 \vee \cdots \vee x = n - 1)$
follows. If $h(w, \vec{x})$ is represented by $H(w, \vec{x}, y)$ then
$\mu w(h(w, \vec{x}) = 0)$ is represented by $H(w, \vec{x}, 0) \wedge \forall v(H(v, \vec{x}, 0) \Rightarrow w \le v)$.
This follows by predicate logic from the induction hypothesis and the
above facts. ⊳
Given a k-ary function g and a (k + 2)-ary function h, there is a
unique (k + 1)-ary function f such that $f(0, \vec{y}) = g(\vec{y})$ and
$f(x^s, \vec{y}) = h(x, \vec{y}, f(x, \vec{y}))$. In this case f is said to be obtained by primitive recursion from the functions g and h. The existence and uniqueness of f can
be proved in basic set theory.
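The scheme can be sketched as a higher-order function. This is an illustration only: `prim_rec` is a hypothetical name, and the loop simply unwinds the two defining equations from 0 up to x.

```python
def prim_rec(g, h):
    """Return f with f(0, *ys) = g(*ys) and f(x + 1, *ys) = h(x, *ys, f(x, *ys))."""
    def f(x, *ys):
        value = g(*ys)                # f(0, ys)
        for i in range(x):
            value = h(i, *ys, value)  # f(i + 1, ys) from f(i, ys)
        return value
    return f


# Addition: add(0, y) = y, add(x + 1, y) = add(x, y) + 1.
add = prim_rec(lambda y: y, lambda x, y, prev: prev + 1)
# Multiplication: mul(0, y) = 0, mul(x + 1, y) = add(mul(x, y), y).
mul = prim_rec(lambda y: 0, lambda x, y, prev: add(prev, y))
# Factorial: fact(0) = 1, fact(x + 1) = mul(x + 1, fact(x)).
fact = prim_rec(lambda: 1, lambda x, prev: mul(x + 1, prev))

print(add(3, 4), mul(3, 4), fact(5))
```

Here k = 1 for `add` and `mul`, and k = 0 for `fact`, so g is a constant in the last case.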
The primitive recursive functions are defined as follows.
1. The function s is primitive recursive.
2. If t is a term involving 0 and already defined primitive recursive
functions then ft is primitive recursive.
3. If f is obtained from already defined primitive recursive functions
g and h by primitive recursion then f is primitive recursive.
Some facts about the integers are required for the proof of the next
theorem. If n and d are elements of N with d ≠ 0, there is a unique
r ∈ N such that n = dq + r for some q ∈ N and r < d. The notation Rem(n, d) will be
used for r. Other facts required concern prime factorization, properties
of relative primality, and the Chinese remainder theorem. Discussions
of these topics can be found in various sources, including [Dowd1].
The notation n! is used for the factorial function. Let GP(x, y) =
((x + y)(x + y + 1))/2 + y. This is a bijection from N × N to N, as the
reader may verify. It is called either the Gödel or the Cantor pairing function. Let $\mathrm{GP}_1(x)$ and $\mathrm{GP}_2(x)$ denote the first and second components
of the pair corresponding to x.
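The pairing function and its two component functions can be checked numerically. In the sketch below the inversion via an integer square root is one convenient way to recover the pair, not a method taken from the text, and the names `gp`, `unpair`, `gp1`, and `gp2` are chosen for the example.

```python
from math import isqrt


def gp(x, y):
    """GP(x, y) = ((x + y)(x + y + 1))/2 + y."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z):
    """Return the unique (x, y) with gp(x, y) == z."""
    w = (isqrt(8 * z + 1) - 1) // 2  # w = x + y, index of the diagonal
    y = z - w * (w + 1) // 2         # offset of z along that diagonal
    return w - y, y


def gp1(z):
    return unpair(z)[0]


def gp2(z):
    return unpair(z)[1]


# The first few codes enumerate the pairs diagonal by diagonal.
print([unpair(z) for z in range(6)])
```

Checking `gp(*unpair(z)) == z` over an initial segment of N, together with `unpair(gp(x, y)) == (x, y)` on a grid, gives some empirical support for the bijection claim.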
Theorem 2. If f is obtained from µ-recursive functions g and h by
primitive recursion then f is µ-recursive.
Proof: Let β(c, d, i) = Rem(c, 1 + (i + 1)d). Given a sequence
$a_0, \ldots, a_{k-1}$ of nonnegative integers there are c and d such that
$\beta(c, d, i) = a_i$ for 0 ≤ i < k. Indeed, let d be a multiple of (k − 1)! which is at least as large as every $a_i$; then the numbers
1 + (i + 1)d are pairwise relatively prime. This follows because a prime
divisor of 1 + (i + 1)d must be greater than k − 1; and a prime divisor
of two such values must divide their difference. Now by the Chinese
remainder theorem c may be chosen. Let $\gamma(x, i) = \beta(\mathrm{GP}_1(x), \mathrm{GP}_2(x), i)$,
so that c and d are coded by a single value. Suppose f is defined
by $f(0, \vec{y}) = g(\vec{y})$ and $f(x^s, \vec{y}) = h(x, \vec{y}, f(x, \vec{y}))$. Using µ to denote
minimization, let
$$\mathrm{Vseq}_f(x, \vec{y}) = \mu s\bigl(\gamma(s, 0) = g(\vec{y}) \wedge \mu i(\gamma(s, i + 1) \ne h(i, \vec{y}, \gamma(s, i))) \ge x\bigr).$$
Then $f(x, \vec{y}) = \gamma(\mathrm{Vseq}_f(x, \vec{y}), x)$. ⊳
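The sequence coding can be tried out numerically. The sketch below is an illustration under stated assumptions: it takes d = n! with n at least k and at least every entry of the sequence (a common choice that keeps each modulus larger than the value it must store), and it finds c by an incremental Chinese remainder computation. The names `encode`, `gp`, and `unpair` are hypothetical helpers.

```python
from math import factorial, isqrt


def beta(c, d, i):
    """beta(c, d, i) = Rem(c, 1 + (i + 1) d)."""
    return c % (1 + (i + 1) * d)


def encode(seq):
    """Return (c, d) with beta(c, d, i) == seq[i] for every index i."""
    k = len(seq)
    n = max([k] + list(seq))
    d = factorial(n)  # assumption: d = n!, so the moduli are pairwise coprime
    c, modulus = 0, 1
    for i, a in enumerate(seq):
        m_i = 1 + (i + 1) * d
        # Solve c + modulus * t = a (mod m_i); the inverse exists because
        # m_i is relatively prime to the product of the earlier moduli.
        t = ((a - c) * pow(modulus, -1, m_i)) % m_i
        c += modulus * t
        modulus *= m_i
    return c, d


def gp(x, y):
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z):
    w = (isqrt(8 * z + 1) - 1) // 2
    y = z - w * (w + 1) // 2
    return w - y, y


def gamma(s, i):
    """gamma(s, i) = beta(GP1(s), GP2(s), i): one number s codes c and d."""
    c, d = unpair(s)
    return beta(c, d, i)


seq = [3, 1, 4, 1, 5]
c, d = encode(seq)
s = gp(c, d)
print([gamma(s, i) for i in range(len(seq))])
```

The three-argument `pow` with exponent -1 computes a modular inverse and requires Python 3.8 or later.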
The method of sequence coding used in the proof of theorem 2 is
due to Gödel. As an almost immediate consequence of the theorem,
every primitive recursive function is µ-recursive. The primitive recursive functions are an easily defined class of computable functions which
contains functions of interest for arithmetization of syntax and similar
technical details. Gödel used them for this purpose.
Some basic primitive recursive functions relevant to string manipulation are as follows, where m ≥ 2.
Predecessor:
$\mathrm{Pred}(0) = 0$,
$\mathrm{Pred}(x^s) = x$.
Limited difference:
$\mathrm{Ldif}(x, 0) = x$,
$\mathrm{Ldif}(x, y^s) = \mathrm{Pred}(\mathrm{Ldif}(x, y))$.
Conditional:
$\mathrm{Cond}(0, y, z) = y$,
$\mathrm{Cond}(x^s, y, z) = z$.
m-ary conditional:
$\mathrm{Cond}_m(x, y_0, \ldots, y_{m-1}) = \mathrm{Cond}(x, y_0, \mathrm{Cond}(\mathrm{Pred}(x), y_1, \cdots))$.
Right digit of m-adic notation:
$\mathrm{Rdig}_m(0) = 0$,
$\mathrm{Rdig}_m(x^s) = \mathrm{Cond}_{m+1}(\mathrm{Rdig}_m(x), 1, \mathrm{Rdig}_m(x) + 1, \ldots, \mathrm{Rdig}_m(x) + 1, 1)$.
Trim right digit of m-adic notation:
$\mathrm{Trdig}_m(0) = 0$,
$\mathrm{Trdig}_m(x^s) = \mathrm{Cond}_{m+1}(\mathrm{Rdig}_m(x), 0, \mathrm{Trdig}_m(x), \ldots, \mathrm{Trdig}_m(x)^s)$.
Trim digits from right of m-adic notation:
$\mathrm{Trdigs}_m(0, y) = y$,
$\mathrm{Trdigs}_m(x^s, y) = \mathrm{Trdig}_m(\mathrm{Trdigs}_m(x, y))$.
Length of m-adic notation up to limit:
$\mathrm{Lenl}_m(0, x) = 0$,
$\mathrm{Lenl}_m(w^s, x) = \mathrm{Cond}(\mathrm{Trdigs}_m(w, x), \mathrm{Lenl}_m(w, x), \mathrm{Lenl}_m(w, x)^s)$.
Length of m-adic notation:
$\mathrm{Len}_m(x) = \mathrm{Lenl}_m(x, x)$.
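To get a feel for m-adic notation (digits 1, ..., m, with 0 coded by the empty string), the digit functions can be computed directly by arithmetic. The sketch below does not follow the primitive recursive definitions above; it is an independent computation against which such definitions can be compared, and all function names are chosen for the example.

```python
def app(x, i, m):
    """Append the digit i (1 <= i <= m) on the right: m * x + i."""
    return m * x + i


def rdig(x, m):
    """Rightmost m-adic digit of x, or 0 when x == 0 (empty string)."""
    return 0 if x == 0 else (x - 1) % m + 1


def trdig(x, m):
    """Remove the rightmost m-adic digit of x."""
    return 0 if x == 0 else (x - 1) // m


def trdigs(n, y, m):
    """Remove n digits from the right of y."""
    for _ in range(n):
        y = trdig(y, m)
    return y


def length(x, m):
    """Number of digits in the m-adic notation of x."""
    count = 0
    while x:
        x = trdig(x, m)
        count += 1
    return count


def digits(x, m):
    """List of m-adic digits of x, leftmost first."""
    out = []
    while x:
        out.append(rdig(x, m))
        x = trdig(x, m)
    return out[::-1]


# 12 = 2*4 + 1*2 + 2 has 2-adic notation 212.
print(digits(12, 2), length(12, 2))
```

Every positive x satisfies `app(trdig(x, m), rdig(x, m), m) == x`, which is the sense in which x is "the string trdig(x) followed by the digit rdig(x)".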
Let $\mathrm{App}_{im}$ be the function x + · · · + x + i where x is repeated
m times. A function f is said to be defined from functions g and $h_i$
for 1 ≤ i ≤ m by recursion on m-adic notation if $f(0, \vec{y}) = g(\vec{y})$ and
$f(\mathrm{App}_{im}(x), \vec{y}) = h_i(x, \vec{y}, f(x, \vec{y}))$.
Lemma 3. For m ≥ 2 the primitive recursive functions are closed
under recursion on m-adic notation.
Proof: Define by primitive recursion the function $k(w, x, \vec{y})$ which
is the value of $f(u, \vec{y})$ where u consists of the leftmost w digits of x. Thus
(omitting the subscript m), let Ldigs(x, y) = Trdigs(Ldif(Len(y), x), y);
and let $k(0, x, \vec{y}) = g(\vec{y})$, and
$$k(w + 1, x, \vec{y}) = \mathrm{Cond}_{m+1}(\mathrm{Rdig}(\mathrm{Ldigs}(w + 1, x)), 0, t_1, \ldots, t_m)$$
where $t_i = h_i(\mathrm{Ldigs}(w, x), \vec{y}, k(w, x, \vec{y}))$. Then $f(x, \vec{y}) = k(x, x, \vec{y})$. ⊳
The definition of Turing computability involves the notion of a
Turing machine. A Turing machine has a tape alphabet A, and a “head
state” alphabet Q. One of the tape symbols is distinguished as a blank
symbol, denoted b. There is a set of rules, of the form qs → q′s′d where
q, q′ ∈ Q, s, s′ ∈ A, and d is L (left) or R (right). A Turing machine is
called deterministic if no pair qs occurs more than once on the left side
of a rule; Turing machines will here be required to be deterministic.
The state of the machine is represented by a string over Q ∪ A,
which contains exactly one symbol from Q. If the state is of the form
αqsβ, the rule with left side qs (if any) is applied to update the state,
resulting in a step of the computation. If the right side is q′s′R the new
state is αs′q′β, which may be described as, “if the tape head is reading
s in state q then it overwrites s with s′, switches to state q′, and moves
one square to the right”.
The remaining types of transitions are as follows. If the state is αq
and there is a rule with left side qb and right side q′s′R the new state
is αs′q′ (the tape grows a blank square). If the rule is qs → q′s′L then
αtqsβ becomes αq′ts′β and qsβ becomes q′bs′β; and if s = b then αtq
becomes αq′ts′ and q becomes q′bs′.
A computation is a sequence of states, where each follows from
the previous according to a rule. The transition from one state to the
next is called a step of the computation. Note that blanks may be
added to the beginning or end of the initial state, without affecting
the computation. A state may be reached which has no successor, in
which case the computation has halted. Alternatively a computation
may continue forever.
To view a Turing machine as computing a partial function from N
to N, conventions for coding the input and output must be adopted.
A simple choice is to represent a nonnegative integer n by its 2-adic
notation; the initial state is then the input preceded by the initial head
state. If the computation halts in state αqβ, the output value is the
longest valid integer following q (the longest string of 1’s and 2’s). A
Turing machine may be viewed as computing a k-ary partial function
$\varphi(x_1, \ldots, x_k)$, by adding a comma to the tape alphabet.
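The transition rules and the input/output convention can be exercised with a small simulator. The following is a sketch under stated assumptions: states are Python strings with one character per symbol, the example machine (which exchanges the digits 1 and 2 in its input and then returns the head to the left end) is invented for illustration, and `max_steps` is a safety cutoff that is not part of the definition.

```python
BLANK = "b"


def step(state, rules, heads):
    """Apply one rule to a state string; return None if no rule applies."""
    pos = next(i for i, c in enumerate(state) if c in heads)
    q = state[pos]
    alpha, rest = state[:pos], state[pos + 1:]
    # A head at the right end of the string is reading a blank.
    s, beta = (rest[0], rest[1:]) if rest else (BLANK, "")
    if (q, s) not in rules:
        return None
    q2, s2, d = rules[(q, s)]
    if d == "R":
        return alpha + s2 + q2 + beta          # a q s b  ->  a s' q' b
    if alpha:
        return alpha[:-1] + q2 + alpha[-1] + s2 + beta  # a t q s b -> a q' t s' b
    return q2 + BLANK + s2 + beta              # q s b  ->  q' b s' b


def run(state, rules, heads, max_steps=10_000):
    """Iterate step until the computation halts; return the halted state."""
    for _ in range(max_steps):
        nxt = step(state, rules, heads)
        if nxt is None:
            return state
        state = nxt
    raise RuntimeError("no halt within max_steps")


def output(state, heads):
    """Longest string of 1's and 2's immediately following the head state."""
    pos = next(i for i, c in enumerate(state) if c in heads)
    out = ""
    for c in state[pos + 1:]:
        if c not in "12":
            break
        out += c
    return out


# Hypothetical machine: in state r sweep right exchanging 1 and 2, in state l
# sweep back left, and finish in state h (which has no rules) before the result.
HEADS = {"r", "l", "h"}
RULES = {
    ("r", "1"): ("r", "2", "R"),
    ("r", "2"): ("r", "1", "R"),
    ("r", "b"): ("l", "b", "L"),
    ("l", "1"): ("l", "1", "L"),
    ("l", "2"): ("l", "2", "L"),
    ("l", "b"): ("h", "b", "R"),
}

final = run("r12", RULES, HEADS)
print(final, output(final, HEADS))
```

Starting from the initial state `r12` the machine passes through `2r2`, `21r`, `2l1b`, `l21b`, `lb21b`, and halts in `bh21b`, so the output string is `21`.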
Lemma 4. For any Turing machine M the step function $\mathrm{Step}_M(x)$
is primitive recursive.
Proof: The following functions are primitive recursive, where S
is a subset of {1, . . . , m}. The abbreviation xi is used for $\mathrm{App}_{im}(x)$,
1 ≤ i ≤ m.
Concatenation:
Concat(x, 0) = x,
Concat(x, yi) = Concat(x, y)i.
Leftmost member of S:
$\mathrm{Lftm}_S(0) = 0$,
$\mathrm{Lftm}_S(xi) = \mathrm{Lftm}_S(x)$ if i ∉ S,
$\mathrm{Lftm}_S(xi) = \mathrm{Cond}(\mathrm{Lftm}_S(x), i, \mathrm{Lftm}_S(x))$ if i ∈ S.
Similarly the functions $\mathrm{SlLftm}_S(x)$ (substring to the left of it) and
$\mathrm{SrLftm}_S(x)$ (substring to the right of it) may be defined. Remaining
details are left to the reader. ⊳
Theorem 5. If a Turing machine M computes a function f then f
is µ-recursive.
Proof: The following functions are primitive recursive:
- $\mathrm{Halted}_M(x)$, which is 0 if x is a halted state, else 1.
- $\mathrm{Comp}_M(x, y)$, which is the state starting at x, after y steps, with
steps after halting doing nothing.
- $\mathrm{In}_M(x)$, which translates a 2-adic integer to the initial state.
- $\mathrm{Out}_M(x)$, which translates a state to the output value.
The function f is a composition of the above functions, and
$\mu y(\mathrm{Halted}_M(\mathrm{Comp}_M(x, y)))$. ⊳
To show that a function is computable it suffices to show that it is
Turing computable, and this is often easier than showing directly that
it is µ-recursive. According to [Soare], Gödel was not convinced that
a correct formal definition of computation had been given, until the
appearance of Turing machines.
As an example, the function IsTerm(x), which is 0 if x is (the
Gödel number of) a term, else 1, is Turing computable. A Turing machine which computes this function repeats the following step, until the
leftmost function symbol is checked. Find the leftmost symbol which is
either 0; x; or a function symbol followed by a left parenthesis, checked
symbols or commas with at least one checked symbol after the left parenthesis and each comma, and a right parenthesis. In the first case check 0;
in the second check x and following 1’s and 2’s; and in the third check all
symbols from the function symbol to the right parenthesis. The checkmarks needed for this can be placed on a separate “track”; the tape
alphabet letters can be considered as tuples, with the ith element being
the symbol in the ith track.
A more detailed consideration of “Turing machine programming”
can be found in numerous references, for example [AHU]. After some
labor, it should be easy to outline programs to compute the following
functions.
- Sub(x, y), which equals $\ulcorner F_{t/v} \urcorner$ if $x = \ulcorner F \urcorner$ where F is a formula
with one free variable v, and $y = \ulcorner t \urcorner$ where t is a term.
- Num(x), which equals $\ulcorner \bar{x} \urcorner$ where as in section 7 $\bar{x}$ is the numeral
for x.
- $\mathrm{Prf}_Q(x)$, which is 0 if x is a proof in Q, else 1.
- Thm(x), which is the last formula of the proof x.
Theorem 6.
a. If P is computably enumerable then there is a Turing machine which
halts on input $\vec{x}$ if and only if $P(\vec{x})$.
b. If f is computable then f is Turing computable.
Proof: For part a, let F be a formula showing P is computably
enumerable. On input $\vec{n}$, the Turing machine successively computes
Prf(y) for y = 0, 1, . . .. If y is a proof, the machine checks whether it
is a proof of $F_{n_1/x_1,\ldots,n_k/x_k}$, and halts if so. For part b, the Turing machine checks for
a proof of $F_{n_1/x_1,\ldots,n_k/x_k,m/y}$ for some m, and produces m as output
when one is found. ⊳
Theorem 7. If P is a computable k-ary predicate then there is
a formula H with free variables $x_1, \ldots, x_k$ such that if $P(\vec{n})$ then
$\vdash H_{n_1/x_1,\ldots,n_k/x_k}$, and if $\neg P(\vec{n})$ then $\vdash \neg H_{n_1/x_1,\ldots,n_k/x_k}$.
Proof: To simplify the notation the proof will be given for k = 1.
Suppose P is a computable predicate. Let F be the formula such that
P(n) if and only if $\vdash F_{n/x}$, and let G be the formula such that ¬P(n)
if and only if $\vdash G_{n/x}$. The function $f_P(n)$, which equals 0 if P(n)
and 1 if ¬P(n), is µ-recursive. Indeed, for a formula F let
$w_F(w, x) = \mathrm{Prf}_Q(w, \mathrm{Sub}(N_F, \mathrm{Num}(x)))$ be the µ-recursive function which states that
w is a proof that F holds at x. Then
$f_P(x) = w_F(\mu w(w_F(w, x) = 0 \vee w_G(w, x) = 0), x)$. Suppose H′ is the formula with free variables x
and y, representing $f_P$. Let H be $H'_{0/y}$. ⊳
The predicate U(f, n) mentioned in section 9 can be defined as
follows. U(f, n) if and only if for some w, w is a proof of $F_{n/v}$, where
$f = \ulcorner F \urcorner$ and v is the free variable of F.
The length ℓ(x) of a string x over a finite alphabet is defined to
be the number of symbols comprising the string. Let ∗ denote 2-adic
concatenation, which is defined by x ∗ 0 = x, x ∗ yi = (x ∗ y)i for i = 1, 2.
The operation of concatenating x with itself ℓ(y) times is defined by
x ⊛ 0 = 0, x ⊛ yi = x ∗ (x ⊛ y). This is an easily defined function with
ℓ(x ⊛ y) = ℓ(x) · ℓ(y).
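The two string operations are easy to experiment with when 2-adic strings are held as integers. The sketch below computes them arithmetically rather than by the recursions on notation just given; the names `length2`, `concat`, and `smash` are chosen for the example.

```python
def length2(x):
    """Length of the 2-adic notation of x (digits 1 and 2, 0 is empty)."""
    return (x + 1).bit_length() - 1


def concat(x, y):
    """x * y: shift x left by the length of y, then add y."""
    return x * 2 ** length2(y) + y


def smash(x, y):
    """x (*) y: x concatenated with itself length2(y) times."""
    out = 0
    for _ in range(length2(y)):
        out = concat(x, out)
    return out


# 5 is the string 21 and 4 is the string 12, so 5 (*) 4 is 2121, i.e. 25.
print(concat(5, 4), smash(5, 4), length2(smash(5, 4)))
```

The length identity makes ⊛ useful for building strings whose length is a product of input lengths, and hence, by iteration, any polynomial in them.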
A function f is said to be defined from functions g, $h_i$ for 1 ≤ i ≤ m,
and b by bounded recursion on m-adic notation if $f(0, \vec{y}) = g(\vec{y})$ and
$f(xi, \vec{y}) = \min(b(x, \vec{y}), h_i(x, \vec{y}, f(x, \vec{y})))$. The class L is defined as the
least class containing x1, x2, ∗, and ⊛; and closed under definition by
terms and bounded recursion on notation.
Cobham’s theorem states that a function $f(\vec{x})$ is in L if and only if
there is a Turing machine M computing f and a polynomial p such that
M halts within $p(\ell(x_1) + \cdots + \ell(x_k))$ steps. This is stated here without
proof (see [Dowd1] for additional details), and also the following.
Various functions defined above are in L, including + and · ; the
string manipulation functions; $\mathrm{Step}_M$, $\mathrm{Halted}_M$, $\mathrm{In}_M$, $\mathrm{Out}_M$; and Sub,
Num, $\mathrm{Prf}_Q$, and Thm.
References
[AHU] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis
of Computer Algorithms, Addison-Wesley, 1974.
[Abraham] U. Abraham, “Proper Forcing”, In Handbook of Set Theory, ed. M. Foreman and A. Kanamori, to appear.
[AbrMag] U. Abraham and M. Magidor, “Cardinal Arithmetic”, In
Handbook of Set Theory, ed. M. Foreman and A. Kanamori, to
appear.
[AbrShel] U. Abraham and S. Shelah, “A ∆22 Well-Order of the Reals
And Incompactness of L(QMM )”, manuscript, 1998.
[BJW] A. Beller, R. Jensen, and P. Welch, “Coding the Universe”,
London Mathematical Society Lecture Notes No. 47, 1982.
[Bagaria] J. Bagaria, “Natural Axioms of Set Theory and the Continuum Problem”, in Proceedings of the 12th International Congress
of Logic, Methodology, and Philosophy of Science, King’s College
London, 2005, 43–64.
[Bart] T. Bartoszynski, “Invariants of measure and category”, in Handbook of Set Theory, ed. M. Foreman and A. Kanamori, to appear.
[Barwise] J. Barwise, Admissible Sets and Structures, Springer-Verlag,
1971.
[Belaniuk] S. Bilaniuk, A Problem Course in Mathematical Logic, euclid.trentu.ca/math/sb/pcml, 2003.
[BTW] J. Baumgartner, A. Taylor, and S. Wagon, “On splitting stationary subsets of large cardinals”, J. Symb. Logic 42 (1977) 203–214.
[CFM] J. Cummings, M. Foreman, and M. Magidor, “Squares, scales
and stationary reflection”, Journal of Mathematical Logic, 1 (2001),
35–99.
[ChaKei] C. Chang and H. Keisler, Model Theory, North-Holland Publishing Company, 1973.
[Chong] C. Chong, Techniques of Admissible Recursion Theory, Lecture Notes in Mathematics 1106, Springer-Verlag, 1984.
[Ciesielski] K. Ciesielski, Set Theory for the Working Mathematician,
London Math Society Student Texts 39, Cambridge University
Press, 1997.
[Cummings] J. Cummings, “Iterated Forcing and Elementary Embeddings”, In Handbook of Set Theory, ed. M. Foreman and A. Kanamori, to appear.
[DevJen] K. Devlin and R. Jensen, “Marginalia to a theorem of Silver”,
Lecture Notes in Mathematics 499, Springer, 1975, 115–142.
[DevJohn] K. Devlin and H. Johnsbraten, The Souslin Problem, Lecture Notes in Mathematics 405, Springer-Verlag, 1974.
[Devlin] K. Devlin, Constructibility, Springer-Verlag, 1984.
[Dodd] A. Dodd, The Core Model, Cambridge University Press, 1982.
[DoddJen1] A. Dodd and R. Jensen, “The Core Model”, Annals of
Mathematical Logic 20 (1981), 43–75.
[DoddJen2] A. Dodd and R. Jensen, “The Covering Lemma for K”,
Annals of Mathematical Logic 22 (1982), 1–30.
[DoddJen3] A. Dodd and R. Jensen, “The Covering Lemma for L[U ]”,
Annals of Mathematical Logic 22 (1982), 127–135.
[Dowd1] M. Dowd, Introduction to Algebra, Topology, and Category
Theory, www.hyperonsoft.com, 2006.
[Dowd2] M. Dowd, “Some New Axioms for Set Theory”, submitted to
IJPAM.
[Drake] F. Drake, Set Theory, An Introduction to Large Cardinals,
North Holland, 1974.
[Enderton] H. Enderton, A Mathematical Introduction to Logic, Academic Press, 1972.
[Foreman] M. Foreman, “Generic Large Cardinals: New Axioms for
Mathematics?”, manuscript, 2001,
www.math.uiuc.edu/documenta/xvol-icm/01/01.html.
[Friedman1] H. Friedman, “Subtle Cardinals and Linear Orderings”,
manuscript, 1998.
[Friedman2] H. Friedman, “Does Mathematics Need New Axioms?”,
manuscript, 2000.
[Friedman3] H. Friedman, “Boolean Relation Theory and the Incompleteness Phenomena” manuscript, 2007.
[Fremlin1] D. Fremlin, Consequences of Martin’s Axiom, Cambridge
University Press, 1984.
[Fremlin2] D. Fremlin, Measure Theory, manuscript, 2006.
[Gaifman] H. Gaifman, “A generalization of Mahlo’s method for obtaining large cardinal numbers”, Israel J. Math. 5 (1967) 188–200.
[Gandy] R. Gandy, “Set-theoretic functions for elementary syntax”, in
Proceedings of Symposia in Pure Mathematics 13, Part II, American Mathematical Society, 1974, 103–126.
[Geschke] S. Geschke, “Models of Set Theory”, manuscript, 2008.
[Gitik] M. Gitik, “The Power Set Function”, Proceedings of the International Conference of Mathematics, 2002, 507–513.
[Godel] K. Gödel, “What is Cantor’s continuum problem?”, 1947.
[GoldShe] M. Goldstern and S. Shelah, “The Bounded Proper Forcing
Axiom” J. Symbolic Logic 60 (1995), 58–73.
[HajPud] P. Hájek and P. Pudlák, Metamathematics of First-Order Arithmetic, Springer-Verlag, 1993.
[Harrington] L. Harrington, “Analytic determinacy and 0#”, J. Symbolic Logic 43 (1978), 685–693.
[HardWr] G. Hardy and E. Wright, An Introduction to the Theory of
Numbers, Oxford University Press, 1968.
[Hauser] K. Hauser, “Gödel’s Program Revisited Part I: the Turn to
Phenomenology”, The Bulletin of Symbolic Logic 12 (2006), 529–590.
[Jackson1] S. Jackson, Math 6010 Notes, manuscript, 2003.
[Jackson2] S. Jackson, “Structural Consequences of AD”, In Handbook
of Set Theory, ed. M. Foreman and A. Kanamori, to appear.
[Jech1] T. Jech, “Stationary subsets of inaccessible cardinals”, in Axiomatic Set Theory, Contemporary Mathematics 31 (1984), 115–142,
Ed. by J. Baumgartner, D. Martin, and S. Shelah, American
Mathematical Society.
[Jech2] T. Jech, Set Theory, Springer, 2003.
[Jensen] R. Jensen, “The fine structure of the constructible hierarchy”,
Ann. Math. Logic 4 (1972), 229–308.
[KanMag] A. Kanamori and M. Magidor, “The Evolution of Large Cardinal Axioms in Set Theory” in Higher Set Theory, Lecture Notes
in Mathematics 669, Springer 1978, 99–275.
[Kanamori1] A. Kanamori, “Set Theory From Cantor to Cohen”,
manuscript, 2007.
[Kanamori2] A. Kanamori, “Tennenbaum and Set Theory”,
manuscript, 2007.
[Kanamori3] A. Kanamori, The Higher Infinite, Springer, 2003.
[Kanamori4] A. Kanamori in “Reviews”, The Bulletin of Symbolic Logic 9 (2003), 237–241.
[Kaye] R. Kaye, Models of Peano Arithmetic, Clarendon Press, 1991.
[Kechris] A. Kechris, Classical Descriptive Set Theory, Springer-Verlag,
1995.
[Komjath] P. Komjath, “Shelah’s proof of diamond”, manuscript, 2008.
[KoeWood] P. Koellner and H. Woodin, “Large Cardinals from Determinacy”, In Handbook of Set Theory, ed. M. Foreman and A.
Kanamori, to appear.
[Koellner1] P. Koellner, “The Search for New Axioms”, Ph. D. thesis,
Massachusetts Institute of Technology, 2003.
[Koellner2] P. Koellner, “On Reflection Principles” manuscript, 2008.
[Kunen1] K. Kunen, “Some applications of iterated ultrapowers in set
theory”, Annals of Mathematical Logic 1 (1970), 179–227.
[Kunen2] K. Kunen, Set Theory: an Introduction to Independence
Proofs, North-Holland, 1980.
[Linden] T. Linden, “Equivalences between Gödel’s definitions of constructibility”, in Sets, Models, and Recursion Theory, J. Crossley,
Ed., North-Holland, 1967.
[MacTutor] The MacTutor History of Mathematics archive, University
of St Andrews, www-groups.dcs.st-and.ac.uk/ history
[MagShel] M. Magidor and S. Shelah, “The tree property at successors
of singular cardinals”, manuscript, 2003.
[Magnus] P. Magnus, forall x: An Introduction to Formal Logic,
manuscript, 2008.
[MansWeit] R. Mansfield and G. Weitkamp, Recursive Aspects of Descriptive Set Theory, Oxford University Press, 1985.
[Mathias] A. Mathias, “Weak systems of Gandy, Jensen and Devlin”,
manuscript, 2006.
[Mendelson] E. Mendelson, Introduction to Mathematical Logic, van
Nostrand, 1964.
[Miller] A. Miller, Descriptive Set Theory and Forcing, Lecture Notes,
University of Wisconsin, 1995.
[Mitchell1] W. Mitchell, “Beginning Inner Model Theory”, In Handbook of Set Theory, ed. M. Foreman and A. Kanamori, to appear.
[Mitchell2] W. Mitchell, “The Covering Lemma”, In Handbook of Set
Theory, ed. M. Foreman and A. Kanamori, to appear.
[Monk1] J. Monk, Introduction to Set Theory, McGraw-Hill, 1969.
[Monk2] J. Monk, Mathematical Logic, Springer-Verlag, 1976.
[Moore] T. Moore, “Set Mapping Reflection”, J. Math. Logic 5 (2005),
87–97.
[Moschovakis] Y. Moschovakis, Descriptive Set Theory, North-Holland,
1980.
[NagNew] E. Nagel and J. Newman, Gödel’s Proof, Routledge, 1989.
[Neeman1] I. Neeman, “Determinacy in L(R)”, In Handbook of Set
Theory, ed. M. Foreman and A. Kanamori, to appear.
[Neeman2] I. Neeman, “Hierarchies of forcing axioms II”, Journal of
Symbolic Logic 73 (2008), 522-542.
[Rasch] T. Rasch, “Erweiterbarkeit von Einbettungen”, Diploma thesis,
Humboldt University in Berlin, 2000.
[Rathjen1] M. Rathjen, “A proof-theoretic characterization of the
primitive recursive set functions”, Journal of Symbolic Logic, 1992.
[Rathjen2] M. Rathjen, “The Higher Infinite in Proof Theory”,
manuscript, 1995.
[Rogers] H. Rogers, Theory of Recursive Functions and Effective Computability, McGraw-Hill (1967).
[Roitman] J. Roitman, “Notes on Forcing”, manuscript, 2005.
[Rudin] W. Rudin, Principles of Mathematical Analysis, McGraw-Hill,
1964.
[SEP] Stanford Encyclopedia of Philosophy, plato.stanford.edu.
[Sacks1] G. Sacks, Saturated Model Theory, W. A. Benjamin, 1972.
[Sacks2] G. Sacks, Higher Recursion Theory, Springer-Verlag, 1990.
[Sami] R. Sami, “Analytic determinacy and 0#: A forcing-free proof
of Harrington’s theorem”, Fundamenta Mathematicae 160 (1999),
153–159.
[SchSt] E. Schimmerling and J. Steel “Fine Structure for Tame Inner
Models”, The Journal of Symbolic Logic 61 (1996), 621–639. Transactions of the American Mathematical Society 351 (), 3119–3141.
[Schindler] R. Schindler, Set Theory, manuscript, 2007.
[SchZem] R. Schindler and M. Zeman, “Fine structure”, In Handbook
of Set Theory, ed. M. Foreman and A. Kanamori, to appear.
[Shelah1] S. Shelah, “Can you take Solovay’s inaccessible away?” Israel
J. Math. 48 (1984), 1–47.
[Shelah2] S. Shelah, “Logical Dreams”, Bulletin of the American Mathematical Society 40 (2003), 203–228.
[Shoenfield1] J. Shoenfield, Mathematical Logic, Addison-Wesley,
1967.
[Shoenfield2] J. Shoenfield, “Axioms of Set Theory”, in Handbook of
Mathematical Logic, ed. J. Barwise, North-Holland, 1977.
[Simpson] S. Simpson, Subsystems of Second Order Arithmetic, to appear
[Smorynski] C. Smorynski, Self-Reference and Modal Logic, Springer-Verlag, 1985.
[Smullyan] R. Smullyan, First-Order Logic, Springer-Verlag, 1968.
[Soare] R. Soare, “Computability and Recursion”, manuscript, 1996.
[Steel1] J. Steel, “Mathematics Needs New Axioms”, manuscript, 2000.
[Steel2] J. Steel, “Forcing with Tagged Trees”, Annals of Mathematical
Logic 15 (1978), 55-74.
[Steel3] J. Steel, “An Outline of Inner Model Theory”, In Handbook of
Set Theory, ed. M. Foreman and A. Kanamori, to appear.
[Steel4] J. Steel, “PFA implies ADL(R) ”, manuscript, 2007.
[TakZar1] G. Takeuti and W. M. Zaring, “Introduction to Axiomatic
Set Theory”, Springer-Verlag, 1971.
[TakZar2] G. Takeuti and W. M. Zaring, “Axiomatic Set Theory”,
Springer-Verlag, 1973.
[Telgarsky] R. Telgarsky, “Topological Games: on the 50th Anniversary
of the Banach-Mazur game”, Rocky Mountain Journal of Mathematics 17 (1987), 227–276.
[Todor1] S. Todorcevic, “Coherent Sequences”, In Handbook of Set
Theory, ed. M. Foreman and A. Kanamori, to appear.
[Todor2] S. Todorcevic, “A note on the Proper Forcing Axiom”, in
Axiomatic Set Theory, 1983, 209–218.
[Viale] M. Viale, “Applications of the Proper Forcing Axiom to Cardinal Arithmetic”, Ph. D. thesis, University of Paris, 2006.
[Welch1] P. Welch, “An introduction to inner model theory”, lecture
notes.
[Welch2] P. Welch, “Σ∗ Fine Structure”, In Handbook of Set Theory,
ed. M. Foreman and A. Kanamori, to appear.
[Wiki] Wikipedia, the free encyclopedia, en.wikipedia.org
[Yasuhara] A. Yasuhara, Recursive Function Theory & Logic, Academic Press, 1971.
Index.
absolute, 50
absolute value, 19
acceptable J-structure, 164
AD, 202
admissible ordinal, 57
admissible set, 57
amenable structure, 149
analytic set, 193
anti-large cardinal hypothesis, 187
antichain, 86
antisymmetry, 18
Aronszajn tree, 87
assignment, 12
atomic formula, 2
auxiliary game, 201
axiom, 3
axiom of determinacy, 202
axiom scheme, 6
axioms of equality, 4
closed interval, 45
closed subset, 46, 84
closure of a subset, 46
club filter, 103
club subset, 84, 182
cofinal, 182
cofinality, 60
collapsing isomorphism, 53
commutative group, 22
commutative ring, 17
comparable elements, 86
compatible elements, 72
complement, 8
complete, 13
complete Boolean algebra, 74
complete Heyting algebra, 75
complete lattice, 74
complete metric space, 43
complete theory, 28
composition, 10
computable function, 25
computable partial function, 25
computable predicate, 25
computably enumerable, 25
condensation lemma, 70
congruence relation, 14
consistency strength, 188
consistent, 13, 14
constructible set, 65
continuous, 45
continuum hypothesis, 48
contraction of quantifiers, 54
core, 186
core model, 183
countable chain condition, 81
countably infinite, 11
covering lemma, 176
critical number, 186
critical point, 133
critical premouse, 186
Baire space, 47
base for topology, 44
bijective, 10
boldface class, 193
Boolean algebra, 8
Boolean valued model, 72
Borel determinacy, 201
Borel hierarchy, 190
Borel set, 190
bound variable, 2
bounded quantifier, 51
c.c.c., 81
canonical rank function, 106
Cantor space, 45
cardinal, 40
cardinal arithmetic, 47
cardinality, 11
Cartesian product, 9
chain, 82
choice function, 33
Church’s thesis, 24
clopen subset, 201
DC, 198
definable, 148
definable element, 160
definable from parameters, 148
definable hull, 121
definable Skolem functions, 120
definition with parameters, 63
dense embedding, 76
dense linear order, 20
dense subset, 71
descending chain, 37
determined, 199
diagonal intersection, 101
diamond principle, 84
direct limit, 129
direct system, 129
directed poset, 128
disjoint sets, 9
disjoint union, 9
domain, 10, 11
down-absolute, 53
downward extension, 164
game, 199
GCH, 50
generic extension, 72
generic filter, 72
generic model theorem, 80
Gödel number, 27
good parameter, 159
greatest lower bound, 73
greatly Mahlo cardinal, 104
Heyting algebra, 75
homeomorphism, 45
homomorphism, 16
hypothesis of constructibility, 66
inaccessible cardinal, 97
incomparable elements, 86
incompatible elements, 72
independent sentence, 28
indiscernibles, 120
induction, 6
infimum, 21, 73
infinite, 11
infinite sequence, 43
infinite two-person game, 199
injective, 10
inner model, 70
integer, 5, 17
integral domain, 19
interior of a subset, 46
interpretation, 11
intersection, 8
irrational number, 47
isomorphic embedding, 16
isomorphism, 16
iterable premouse, 184
iterated forcing, 92
iterated ultrapower, 131
elementary embedding, 113
elementary substructure, 68
EM-set, 120
empty set, 8
equality predicate, 4
equiconsistent, 188
equivalence class, 14
equivalence relation, 14
existence condition, 54, 55
expansion, 13
extender, 135
field, 19
filter, 71
fine structure theory, 157
finite, 11
first incompleteness theorem, 28
first-order language, 11
forcing axiom, 212
forcing condition, 71
forcing language, 78
forcing relation, 79
forcing theorem, 80
formula, 2
free variable, 2
full collection, 61
function, 8, 10
Jensen hierarchy, 151
Kleene-Brouwer order, 208
KP, 54
Kurepa tree, 87
open interval, 45
open set, 44
order topology, 45
order type, 38
order-dense, 21
order-preserving, 21
ordered n-tuple, 8
ordered commutative group, 22
ordered commutative ring, 18
ordered field, 19
ordered pair, 9
ordinal, 35
ordinal arithmetic, 39
ordinal definable, 180
lattice, 74
least element, 18
least upper bound, 18, 73
least upper bound property, 20
Lebesgue measure, 48
legal position, 200
lexicographic order, 67
lightface class, 193
limit, 43
limit cardinal, 60
limit ordinal, 36
limit point, 84
linear order, 18
logical axioms, 3
Łoś’ theorem, 113
Löwenheim-Skolem theorem, 14
lower bound, 73
partial function, 24
partial order, 18
Peano’s axioms, 5
perfect set property, 195
perfect subset, 195
pigeonhole principle, 40
play, 199
player, 199
Polish space, 190
poset, 71
position, 199
positive, 18
power set, 8
pre-well-order, 205
predense subset, 182
predicate, 3
premouse, 184
prime power coding, 191
principal filter, 112
principle of dependent choices, 198
product forcing, 91
product order, 91
projection function, 11
projective hierarchy, 192
projective set, 192
projectum, 159, 160
proper class, 34
proper filter, 103
proper forcing, 182
property of Baire, 195
Mahlo cardinal, 100
Martin’s axiom, 94
master code, 160
maximal antichain, 86
maximal element, 82
meager, 48
measurable cardinal, 114
measure 0, 49
metric function, 43
metric space, 43
metric topology, 44
minimal element, 37
Mitchell order, 142
model, 13
monomorphism, 129
Mostowski collapse, 53
mouse, 186
mouse iteration, 186
name, 78
nonprincipal filter, 112
nonprojectible ordinal, 156
norm, 204
notion of forcing, 71
nowhere dense subset, 46
Skolem term, 122
small extensions, 66
sound, 13
sound structure, 185
square principle, 178
standard code, 160
standard parameter, 160
standard space, 191
stationary set preserving, 183
stationary subset, 84, 182
strategy, 199
strict partial order, 18
strictly order-preserving, 21
strong Σ1 collection, 59
strong cardinal, 133
strong limit cardinal, 97
strongly admissible set, 156
strongly compact cardinal, 140
structure, 11
subset, 8
substructure, 16
successor cardinal, 60
successor function, 5
successor ordinal, 36
supercompact cardinal, 133
superset, 9
superstrong cardinal, 133
support, 92
supremum, 21, 73
surjective, 10
Suslin hypothesis, 88
Suslin line, 88
Suslin tree, 87
symmetric difference, 9
propositional connective, 2
pruned tree, 200
pseudo-complement, 75
quantifier, 2
quasi-order, 105
quotient, 14
range, 10
rank, 58
rational number, 17
real number, 17
recursive definition, 2
reduction, 204
reflexive, 18
regressive function, 116
regular cardinal, 60
regular open set, 74
regularity property, 202
relation, 8, 11
relative complement, 9
relative rudimentary function, 148
relativization, 51
restriction, 10
rud-closed, 147
rud-closure, 147
rudimentary functions, 145
scale, 204
SCH, 177
scheme, 102
second incompleteness theorem, 28
second order variable, 109
semiproper forcing, 183
semiscale, 206
sentence, 2
separated subsets, 193
Silver indiscernibles, 126
single-valued, 24
singular cardinal, 60
singular cardinals hypothesis, 177
Skolem function, 69
Skolem hull, 69
term, 2
theory, 15
thin subset, 84
topological space, 44
topology, 44
total, 24
totally disconnected, 46
transfinite induction, 37
transfinite recursion, 37
transitive, 18
transitive class, 51
universal closure, 4
universe, 11
universe of discourse, 3
up-absolute, 53
upper bound, 18, 73
upward extension, 164
transitive closure, 57
transitive collapse, 53
transitive set, 35
tree, 196
tree property, 86
triangle inequality, 19
Turing closed, 209
Turing cone, 209
Turing machine, 231
valency, 2
variable, 2
very good parameter, 161
ultrafilter, 96
ultrapower, 113
ultraproduct, 112
unbounded subset, 60
uncountable set, 47
uniformization, 204
uniformize, 156
union, 8, 32
uniqueness condition, 55
well-founded, 37
well-order, 38
well-ordering, 39
winning strategy, 199
Woodin cardinal, 133
ZFC, 31
Zorn’s lemma, 83
Index of symbols.
κ^λ, 45
x^y, 45
c, 45
N, 47
∈-structure, 50
∆0 collection, 54
Σ1 collection, 54
∆_1^KP, 55
∆_1^ZF, 55
∃!, 55
TC, 57
V_α, 58
ρ, 58
H_κ, 60
Def, 63
Sat, 63
L, 64
L_α, 64
LimOrd, 65
<_L, 66
[x]^{≤n}, 67
Σ_n-elementary substructure, 68
≺, 68
≺_n, 68
p< , 71
p≥ , 71
p≤ , 71
M[G], 72
ro, 74
M^B, 78
V^B, 78
ˇ, 78
⊩, 79
⟦ ⟧, 79
κ-c.c., 81
κ-chain condition, 81
∆-system, 82
♦, 84
κ-closed, 84
♦_κ(E), 85
=, 4
Pow, 8
∩, 8
∪, 8
∅, 8
⊂, 8
⊆, 8
^c, 8
−, 9
⟨ ⟩, 9
⊕, 9
⊇, 9
×, 9
Dom, 10
Ran, 10
◦, 10
↾, 10
f[x′], 10
π_1, 11
π_2, 11
N, 11, 15
|=, 13
⊢, 13
E-model, 14
F_{\vec v}(\vec x), 15
Q, 17
R, 17
Z, 17
inf, 21
sup, 21
m-adic notation, 26
⌜ ⌝, 27, 62
∈-minimal, 33
Ord, 36
ω, 36
+, 39, 42
·, 39, 42
Card, 40
ℵ, 41
C, 45
S_α^A, 157
AM , 159
AMp , 159
ρ_α, 159
pM , 159
P(M), 159
A^n_α, 160
p^n_α, 160
∈-cofinal, 167
□, 178
HOD, 180
OD, 180
K^DJ, 183
n-sound structure, 186
n(N), 186
σ-algebra, 190
Cd, 192
FS, 192
T ⊘x, 196
Br, 196
Pr, 196
κ-Suslin, 197
<_KB, 208
GP, 229
Lim, 100
△, 101
Π^1_0-indescribable cardinal, 108
Π^1_n-indescribable cardinal, 109
Π^1_n-enforceable subset, 110
[x]^n, 111
κ → (λ)^n_µ, 111
Ult_U, 115
0^#, 122
L[A], 127
L(b), 128
M [x], 128
dirlim, 129
L[U], 131
Σ_n-elementary embedding, 131
x^#, 132
(κ, λ)-extender, 135
Ult_E, 137
M-ultrafilter, 144
J_α, 151
Rud, 151
S_α, 152
δ-number, 155
J-structure, 157
J_α^A, 157