Peano and Heyting Arithmetic
3. Peano Arithmetic
3.1. Language and Axioms.
Definition 3.1. The language of arithmetic consists of:
• A 0-ary function symbol (i.e. a constant) 0,
• A unary function symbol S,
• Two binary function symbols +, ·,
• Two binary relation symbols =, <,
• For each n, infinitely many n-ary predicate symbols X^n_i.
We often abbreviate ¬(x = y) by x ≠ y and sometimes ¬(x < y) by x ≮ y. We write x ≤ y as an abbreviation for x < y ∨ x = y and s + t, s · t as “abbreviations” for +st and ·st.
We intend these symbols to represent their usual meanings regarding arithmetic. S is the successor operation. The predicate symbols X^n_i intentionally have no fixed meaning; their purpose is that if we prove a formula φ containing one of them, then not only have we proven φ[ψ/X^n_i] (the formula where we replace X^n_i with the formula ψ) for any ψ in our language, we have proven φ[ψ/X^n_i] for any formula ψ in any extension of the language of arithmetic.
Definition 3.2. P⁻ consists of the formulas:
• ∀x(x = x),
• ∀x∀y(x = y → φ[x/z] → φ[y/z]) where φ is atomic and x and y are substitutable for z in φ,
• ∀x(Sx ≠ 0),
• ∀x∀y(Sx = Sy → x = y),
• ∀x∀y(x < Sy ↔ x ≤ y),
• ∀x(x ≮ 0),
• ∀x∀y(x < y ∨ x = y ∨ y < x),
• ∀x(x + 0 = x),
• ∀x∀y(x + Sy = S(x + y)),
• ∀x(x · 0 = 0),
• ∀x∀y(x · Sy = x · y + x).
The second equality axiom is a bit subtle. In particular, note that φ is
allowed to contain x or y, so we can easily derive
x = y → x = x → y = x
(taking φ to be z = x) and
y = x → y = w → x = w
(taking φ to be z = w).
Finally, in order to prove anything interesting, we need to add an induction scheme.
Definition 3.3. The axioms of arithmetic, Γ_PA, consist of P⁻ plus, for every formula φ and each variable x, the formula
φ[0/x] → ∀x(φ → φ[Sx/x]) → ∀xφ.
We write PA ⊢ Γ ⇒ Σ if Fc ⊢ Γ_PA, Γ ⇒ Σ and HA ⊢ Γ ⇒ Σ if Fi ⊢ Γ_PA, Γ ⇒ Σ.
PA stands for Peano Arithmetic while HA stands for Heyting arithmetic.
Definition 3.4. The numerals are the terms built only from 0 and S. If n is a natural number, we write n for the corresponding numeral, given recursively by:
• the numeral for 0 is the term 0,
• the numeral for n + 1 is S followed by the numeral for n.
3.2. Basic Properties. We will generally not try to give even simple proofs
explicitly in the sequent calculus: even very simple arguments rapidly become infeasible. (Consider, for example, that the most basic arguments
involving substitution generally take several inference rules.)
Instead, we will accept from our previous work that Fc already captures
most ordinary logical reasoning, and we will give careful arguments from the
axioms, using informal logic.
Theorem 3.5. HA proves that addition is commutative:
∀x∀y x + y = y + x.
Proof. By induction on x. It suffices to show ∀y(0 + y = y + 0) and ∀y(x + y = y + x) → ∀y(Sx + y = y + Sx).
For the first of these, since y + 0 = y, it suffices to show that 0 + y = y;
we show this by induction on y. 0 + 0 = 0 is an instance of an axiom, and
if 0 + y = y then 0 + Sy = S(0 + y) = Sy.
Now assume that ∀y(x + y = y + x). Again, we go by induction on y.
Sx + 0 = Sx and we have already shown that 0 + Sx = Sx. Suppose
Sx + y = y + Sx; then
Sx + Sy = S(Sx + y) = S(y + Sx) = SS(y + x) = SS(x + y) = S(x + Sy) = S(Sy + x) = Sy + Sx.
(Consider just how many inference rules it would take to completely formalize applying the transitivity of = over seven equalities, which forms only one of the three inductive arguments in the proof.)
By similar arguments, HA (and so also PA) proves all the standard facts
about the arithmetic operations. These systems are (more than) strong
enough to engage in sensible coding of more complicated (but still finite)
objects. The details of how to accomplish this sort of coding are tedious and
sufficiently described elsewhere, but we will briefly describe what it means
to code something in the language of arithmetic with an example.
One of the first things that needs to be coded is the notion of a finite
sequence of natural numbers. What we mean by this is that we wish to
informally set up a correspondence between finite sequences and natural
numbers. Let us name a function π which is an injective map from finite sequences to natural numbers. The range of π should be definable; that is, there should be a formula φ_π such that HA ⊢ φ_π(n) when n = π(σ) for some σ and HA ⊢ ¬φ_π(n) when n is not in the range of π.
Then we need the natural operations on sequences to be definable. For instance, we would like to be able to take a sequence σ and a natural number n and define the sequence σ⌢⟨n⟩ which consists of appending n to the sequence σ. To code this, we should have a formula φ_⌢ such that:
• If m = π(σ⌢⟨n⟩) then HA ⊢ φ_⌢(π(σ), n, m),
• If m ≠ π(σ⌢⟨n⟩) then HA ⊢ ¬φ_⌢(π(σ), n, m),
• HA ⊢ ∀x, y, z, z′(φ_⌢(x, y, z) ∧ φ_⌢(x, y, z′) → z = z′), and
• HA ⊢ ∀x, y(φ_π(x) → ∃z φ_⌢(x, y, z)).
The first two clauses state that HA proves that φ_⌢ correctly identifies π(σ⌢⟨n⟩) for actual sequences σ and natural numbers n. But this isn’t enough to give the last two clauses, because HA can’t actually prove that the numerals n are the only numbers. So the last two clauses say that HA can actually prove that φ_⌢ represents a well-defined function. (For instance, the last two clauses ensure that in a nonstandard model, which has “nonstandard sequences”, the ⌢ operation still has a sensible interpretation.)
Coding sequences is crucial to the power of HA because we can carry out induction along sequences. In particular, this lets us define exponentiation: we say x = y^z if there is a sequence σ of length z such that σ(0) = y, σ(i + 1) = σ(i) · y for each i, and the last element of σ is equal to x. And once we have done this, we could define iterated exponentiation, and so on.
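To make the coding concrete, here is a minimal sketch in Python of one possible choice of π—coding a sequence by prime-power exponents—together with the ⌢ operation and the definition of exponentiation by sequences just described. This particular coding, and all the helper names, are illustrative assumptions rather than the coding used by any particular development of HA.

    def nth_prime(i):
        """Return the i-th prime (0-indexed); trial division is fine for small examples."""
        count, n = -1, 1
        while count < i:
            n += 1
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                count += 1
        return n

    def encode(seq):
        """pi(sigma): code a finite sequence by prod_i p_i^(sigma(i)+1); injective, empty sequence -> 1."""
        code = 1
        for i, x in enumerate(seq):
            code *= nth_prime(i) ** (x + 1)
        return code

    def length(code):
        """Number of entries of the coded sequence: how many initial primes divide the code."""
        i = 0
        while code % nth_prime(i) == 0:
            i += 1
        return i

    def append(code, n):
        """The frown operation: the code of the sequence with n appended at the end."""
        return code * nth_prime(length(code)) ** (n + 1)

    def exp_by_sequences(y, z):
        """x = y^z in the sense of the text: build sigma with sigma(0) = y and
        sigma(i+1) = sigma(i) * y, of length z, and return its last element
        together with the code pi(sigma)."""
        assert z >= 1                      # the text glosses over the case z = 0
        last, code = y, encode([y])
        for _ in range(z - 1):
            last *= y
            code = append(code, last)
        return last, code

    # exp_by_sequences(3, 4) == (81, encode([3, 9, 27, 81]))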
Once we can code sequences, it also becomes much easier to define other
notions, since we can use sequences to combine multiple pieces of information
in a single number. For instance, we could define a finite group to consist of
a quadruple ⟨G, e, +_G, ⁻¹⟩ where G is a number coding a finite set, e is an element of G, +_G and ⁻¹ are numbers coding finite sets of pairs, and then
write down a long formula describing what has to happen for this quadruple
to properly define a group.
To illustrate just how much is provable, we quote Harvey Friedman’s
“grand conjecture”
Every theorem published in the Annals of Mathematics whose
statement involves only finitary mathematical objects (i.e.,
what logicians call an arithmetical statement) can be proved
in EFA. EFA is the weak fragment of Peano Arithmetic based
on the usual quantifier-free axioms for 0, 1, +, ×, exp, together
with the scheme of induction for all formulas in the language
all of whose quantifiers are bounded.
(We will define the notion of a bounded quantifier below.) In other words,
almost all of conventional combinatorics, number theory, finite group theory,
and so on can be coded up and then proven, not only inside PA, but in a
comparatively small fragment of PA. (Later on we’ll have a way to quantify
how strong fragments of PA are, and we’ll learn that EFA is very small
indeed.)
3.3. The Arithmetical Hierarchy.
Definition 3.6. We write ∀x < y φ as an abbreviation for ∀x(x < y → φ)
and ∃x < y φ as an abbreviation for ∃x(x < y ∧ φ).
Note that PA ⊢ ¬∀x < y ¬φ ↔ ∃x < y φ and PA ⊢ ¬∃x < y ¬φ ↔ ∀x < y φ, just as we would expect. We call these bounded quantifiers. As we will
see, formulas in which all quantifiers are bounded behave like quantifier-free
formulas. We call other quantifiers unbounded.
Because HA can describe sequences in a single number, there is no real
difference between a single quantifier ∃x and a block of quantifiers of the
same type, ∃x1 ∃x2 · · · ∃xn —anything said with the latter could be coded up
and expressed with a single quantifier. Furthermore, all the coding necessary
can be done using only bounded quantifiers. Therefore we will generally
simply write a single quantifier, knowing that it could stand for multiple
quantifiers of the same type.
Definition 3.7. The ∆0 formulas are those in which all quantifiers are
bounded. Σ0 and Π0 are alternate names for ∆0.
The Σn+1 formulas are formulas of the form
∃xφ
(possibly with a block of several existential quantifiers) where φ is Πn .
The Πn+1 formulas are formulas of the form
∀xφ
(possibly with a block of several universal quantifiers) where φ is Σn .
In particular, the truth of ∆0 formulas is computable, in the sense that given numeric values for the free variables of a ∆0 formula, we can easily run a computer program which checks in finite time whether the formula is true (under the intended interpretation in the natural numbers).
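To illustrate the claim, here is a minimal sketch of such a checker in Python; the tuple encoding of terms and formulas is an ad hoc choice made only for this example.

    def term_value(t, env):
        """Evaluate a term: an int literal, a variable name, or ("S"|"+"|"*", ...)."""
        if isinstance(t, int):
            return t
        if isinstance(t, str):
            return env[t]
        op = t[0]
        if op == "S":
            return term_value(t[1], env) + 1
        if op == "+":
            return term_value(t[1], env) + term_value(t[2], env)
        if op == "*":
            return term_value(t[1], env) * term_value(t[2], env)
        raise ValueError(op)

    def delta0_true(f, env):
        """Decide a Delta_0 formula in the standard model, given values for its
        free variables; this terminates because every quantifier is bounded."""
        op = f[0]
        if op == "=":
            return term_value(f[1], env) == term_value(f[2], env)
        if op == "<":
            return term_value(f[1], env) < term_value(f[2], env)
        if op == "not":
            return not delta0_true(f[1], env)
        if op == "and":
            return delta0_true(f[1], env) and delta0_true(f[2], env)
        if op == "or":
            return delta0_true(f[1], env) or delta0_true(f[2], env)
        if op == "imp":
            return (not delta0_true(f[1], env)) or delta0_true(f[2], env)
        if op in ("all", "ex"):
            _, x, bound, body = f
            values = (delta0_true(body, {**env, x: n}) for n in range(term_value(bound, env)))
            return all(values) if op == "all" else any(values)
        raise ValueError(op)

    # Example: y is composite iff  exists x < y exists z < y. (SSx)*(SSz) = y.
    composite = ("ex", "x", "y",
                 ("ex", "z", "y", ("=", ("*", ("S", ("S", "x")), ("S", ("S", "z"))), "y")))
    print([n for n in range(2, 30) if not delta0_true(composite, {"y": n})])
    # prints the primes below 30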
By the same argument that shows every formula is equivalent in Fc to
a prenex formula, PA shows that every formula is equivalent to a formula
with its unbounded quantifiers in front, which must be Σn or Πn for some
n.
Lemma 3.8. If t is a closed term then there is a natural number k such that HA ⊢ t = k.
Proof. By induction on the construction of the term t. If t = 0 then this is
trivial since 0 = 0 is derivable.
If t = St′ then by IH we have HA ⊢ t′ = k′, and therefore we can derive St′ = Sk′. To keep the formulas manageable, we write ψ(y) as an abbreviation for t′ = y → St′ = St′ → St′ = Sy; this is an instance of the substitution axiom.
The sequent derivation, written out linearly, runs roughly as follows: from the axiom ψ(k′) ⇒ ψ(k′), applications of L∀ (instantiating the universally quantified substitution axiom in Γ_PA at t′ and then at k′) give Γ_PA ⇒ ψ(k′); similarly, from the axiom St′ = St′ ⇒ St′ = St′ and L∀ applied to the reflexivity axiom we obtain Γ_PA ⇒ St′ = St′. Combining the inductive hypothesis Γ_PA ⇒ t′ = k′ with Γ_PA ⇒ ψ(k′) yields Γ_PA ⇒ St′ = St′ → St′ = Sk′, and combining this with Γ_PA ⇒ St′ = St′ yields Γ_PA ⇒ St′ = Sk′.
If t = t0 + t1 then by IH we have HA ⊢ t0 = k0 and HA ⊢ t1 = k1, and then by induction on k1 we can construct a deduction of HA ⊢ t = k where k = k0 + k1. The t = t0 · t1 case is similar.
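In effect, Lemma 3.8 says that closed terms can be evaluated; here is a minimal sketch of that evaluation (using the same kind of ad hoc tuple encoding of terms as above), whose recursion mirrors the induction on terms in the proof.

    def numeral(n):
        """The numeral for n: the term S(S(...S(0)...)) with n occurrences of S."""
        t = "0"
        for _ in range(n):
            t = ("S", t)
        return t

    def value(t):
        """Evaluate a closed term; this is the k with HA proving t = numeral(k)."""
        if t == "0":
            return 0
        op = t[0]
        if op == "S":
            return value(t[1]) + 1
        if op == "+":
            return value(t[1]) + value(t[2])
        if op == "*":
            return value(t[1]) * value(t[2])
        raise ValueError(op)

    t = ("+", ("S", "0"), ("*", numeral(2), numeral(3)))   # S0 + (2 * 3)
    print(value(t))                                        # 7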
Lemma 3.9. If φ is atomic then HA ⊢ φ ∨ ¬φ.
Proof. We first observe that not only does HA have excluded middle in the
form t = 0 ∨ t ≠ 0, HA even has the slightly stronger form t = 0 ∨ ∃y t = Sy.
This is due to the presence of the induction axiom: certainly 0 = 0 ∨ ∃y 0 =
Sy, and in the inductive case we ignore the hypothesis entirely and note
that Sx = 0 ∨ ∃y Sx = Sy. This means we can argue by cases: if we show
φ(0) and ∀xφ(Sx) then we have ∀xφ(x).
The only atomic formulas are those of the form t0 = t1 or t0 < t1. We first consider the case of =. We proceed by induction on t0. In the case where t0 = 0, we split into cases: 0 = 0, so 0 = 0 ∨ 0 ≠ 0, and 0 ≠ Sy so 0 = Sy ∨ 0 ≠ Sy.
Suppose ∀y(x = y ∨ x ≠ y). Again, we split into cases. Sx ≠ 0, so Sx = 0 ∨ Sx ≠ 0. Sx = Sy is equivalent to x = y, and we assumed that x = y ∨ x ≠ y, so also Sx = Sy ∨ Sx ≠ Sy.
The case for x < y is even simpler, using the fact that x = y ∨ x ≠ y: we already have x < y ∨ x = y ∨ y < x. If x = y then we have x ≮ y, and if x ≠ y then we have x < y ∨ y < x, the latter of which implies x < y ∨ x ≮ y.
Theorem 3.10. If φ is ∆0 then HA ⊢ φ ∨ ¬φ.
Proof. By induction on φ. For φ atomic, this is the previous lemma. Observe that from φ ∨ ¬φ and ψ ∨ ¬ψ, we can derive (φ ~ ψ) ∨ ¬(φ ~ ψ) for ~ any of the connectives ∧, ∨, →.
Suppose the formula is ∃x < t φ. We show by induction that
∀y(∃x < y φ ∨ ¬∃x < y φ).
For y = 0, this is derivable, since we can show that ¬∃x x < 0.
Suppose ∃x < y φ ∨ ¬∃x < y φ. We must show ∃x < Syφ ∨ ¬∃x < Sy φ.
If ∃x < y φ then clearly ∃x < Sy φ. Also if φ(y) then ∃x < Sy φ. Otherwise
we have ¬∃x < y φ and ¬φ(y), and since x < Sy implies x < y or x = y, we
have ¬∃x < Sy φ.
The ∀x < t φ case is similar.
3.4. The Friedman-Dragalin Translation. One interpretation of the last
theorem of the previous section is that ∆0 formulas behave like classical
ones, even in intuitionistic logic. A consequence is that classical logic and
intuitionistic logic have to agree on simple formulas:
Theorem 3.11. If φ is Π2 and PA ⊢ φ then HA ⊢ φ.
This statement is not true if we deduce a sequent of Π2 formulas instead
of a single formula.
For the proof, we need another translation of formulas:
Definition 3.12. Fix a formula θ.
• ⊥^FD is θ,
• If p is atomic and not ⊥, p^FD is p ∨ θ,
• (φ ~ ψ)^FD is φ^FD ~ ψ^FD,
• (Qxφ)^FD is Qx(φ^FD).
Note that this is the result of the ∗ translation from intuitionistic to
minimal logic followed by replacing every occurrence of ⊥ with θ.
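Viewed as an operation on syntax trees, the translation is a short recursion; here is a minimal sketch in Python, again over an ad hoc tuple encoding of formulas (with "bot" standing for ⊥), used only for illustration.

    def fd(formula, theta):
        """The Friedman-Dragalin translation: replace bottom by theta, disjoin theta
        onto other atomic formulas, and commute with connectives and quantifiers."""
        op = formula[0]
        if op == "bot":
            return theta
        if op == "atom":                       # e.g. ("atom", "x < y")
            return ("or", formula, theta)
        if op in ("and", "or", "imp"):
            return (op, fd(formula[1], theta), fd(formula[2], theta))
        if op in ("all", "ex"):                # ("all", x, body)
            return (op, formula[1], fd(formula[2], theta))
        raise ValueError(op)

    theta = ("atom", "theta")
    phi = ("ex", "y", ("imp", ("atom", "x < y"), ("bot",)))
    print(fd(phi, theta))
    # ('ex', 'y', ('imp', ('or', ('atom', 'x < y'), ('atom', 'theta')), ('atom', 'theta')))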
Lemma 3.13. If Fm ⊢ Γ ⇒ Σ and Γ[θ/⊥], Σ[θ/⊥] are the result of replacing every occurrence of ⊥ with the formula θ, then Fm ⊢ Γ[θ/⊥] ⇒ Σ[θ/⊥].
Proof. Proof sketch: this follows from the fact that ⊥ has no special properties in minimal logic. We proceed by induction on deductions, and the only way ⊥ can be introduced is by weakening or by the axiom ⊥ ⇒ ⊥.
Theorem 3.14. If HA ⊢ Γ ⇒ Σ then HA ⊢ Γ^FD ⇒ Σ^FD.
Proof. We proved for first-order logic in general that if Fi ⊢ Γ_PA, Γ ⇒ Σ then Fm ⊢ Γ_PA^*, Γ^* ⇒ Σ^*. The previous lemma then shows that Fm ⊢ Γ_PA^FD, Γ^FD ⇒ Σ^FD.
Furthermore, we have already seen that Fi ⊢ φ → φ^*, and by the same argument, Fi ⊢ φ → φ^FD. In particular, we may apply cuts over all the axioms of Γ_PA actually used in the original proof to obtain Fi ⊢ Γ_PA, Γ^FD ⇒ Σ^FD, and therefore HA ⊢ Γ^FD ⇒ Σ^FD.
Lemma 3.15. If φ is ∆0 and no free variable in θ appears bound in φ then HA ⊢ φ^FD → φ ∨ θ.
Proof. By induction on φ. If φ is ⊥, this is trivial since ⊥^FD is θ. If φ is atomic then φ^FD is exactly φ ∨ θ.
If φ = ψ0 ∨ ψ1, it is easy to derive ψ0 ∨ θ ⇒ (ψ0 ∨ ψ1) ∨ θ and ψ1 ∨ θ ⇒ (ψ0 ∨ ψ1) ∨ θ. Since the inductive hypothesis gives ψ0^FD → ψ0 ∨ θ and ψ1^FD → ψ1 ∨ θ, we can conclude ψ0^FD ∨ ψ1^FD → (ψ0 ∨ ψ1) ∨ θ.
The cases for ∧, → are similar.
Suppose φ is ∃x < t ψ. We show by induction that ∀y((∃x < y ψ)^FD → (∃x < y ψ ∨ θ)). Note that (∃x < y ψ)^FD is ∃x((x < y ∨ θ) ∧ ψ^FD). If y = 0 then since x ≮ 0, the premise immediately implies θ. Suppose the claim holds for y, and we set out to show it for Sy. Assume ∃x((x < Sy ∨ θ) ∧ ψ^FD); using the main inductive hypothesis, we have ∃x((x < Sy ∨ θ) ∧ (ψ ∨ θ)). This easily implies (∃x < Sy ψ) ∨ θ.
The case for ∀x < t ψ is similar.
Theorem 3.16. If φ is Π2 and PA ⊢ φ then HA ⊢ φ.
Proof. We have φ = ∀xθ where θ = ∃yψ is Σ1. We have a deduction of θ in PA. Using the double negation translation yields a deduction HA ⊢ (∀y(ψ → ⊥)) → ⊥.
Applying the Friedman-Dragalin translation gives us
HA ⊢ ∀y(ψ^FD → θ) → θ.
We have by the previous lemma HA ⊢ ψ^FD → ψ ∨ θ, and since ψ → θ, we actually have HA ⊢ ψ^FD → θ and so HA ⊢ ∀y(ψ^FD → θ). Combining these, we obtain a deduction of HA ⊢ θ.
3.5. Ordinals. In order to discuss cut-elimination for Peano Arithmetic, it
is helpful to have a theory of ordinals.
We will be concerned with linear orders which can be defined in HA—
that is, there is a formula ≺ (x, y) with exactly the two listed free variables,
where HA can prove that ≺ is a linear order. We will in fact primarily
be interested in the case where ≺ is ∆0 . We will write x ≺ y in place of
≺ (x, y). We are mostly interested in the interpretation of ≺ as an ordering
on the actual natural numbers, and so we will sometimes equate formulas
which define orderings with the ordering itself.
Definition 3.17. A definable linear ordering of ω is a formula ≺ (x, y) with
exactly the two listed free variables such that HA deduces:
• x ⊀ x,
• If x ≺ y and y ≺ z then x ≺ z,
• If x ≠ y, either x ≺ y or y ≺ x.
≺ is a well-ordering if there is no infinite descending sequence n1 ≻ n2 ≻ · · · .
The statement that ≺ is a well-ordering can’t be directly expressed in the
language of arithmetic, but we can make a coherent attempt. We use the
presence of the fresh predicate symbols to represent the idea of quantifying
over all sequences: we view a binary predicate X as a sequence, saying
X(s, t) holds if s is the t-th element of the sequence. Then the statement
WO(≺) is:
∃x∀y∀z((X(x, y) ∧ X(Sx, z)) → z ⊀ y).
In other words, X does not list an infinite descending sequence in ≺. If
HA ⊢ WO(≺) then it is actually true that, in the standard model, ≺ describes a well-ordering. (In a nonstandard model, this may not be the case
because in such models X describes sequences of “nonstandard length”.)
Of course, there are many examples of formulas ≺ which actually describe
well-orderings, but where HA cannot prove WO(≺).
Being a well-ordering is equivalent to saying that every non-empty set
contains a least element. We can’t quite state this inside arithmetic, so we
prove it externally.
Theorem 3.18. ≺ is a well-ordering iff whenever Y is non-empty, there is
a ≺-least element of Y .
Proof. Suppose Y is non-empty but has no ≺-least element. Let x1 ∈ Y .
Since x1 is not ≺-least, there is an x2 ≺ x1 with x2 ∈ Y . Similarly, x2 is
not ≺-least in Y . Iterating, we obtain an infinite decreasing sequence in ≺,
which shows that ≺ is not well-ordered.
Conversely if ≺ is not well-ordered then there is an infinite descending sequence x1 ≻ x2 ≻ · · · , and clearly {xn} is a non-empty set with no ≺-least element.
In particular, every well-ordering other than the one with empty domain
has a least element, which we generally call 0.
One special feature of well-orderings is that they are precisely the orders
on which transfinite induction makes sense.
Theorem 3.19. Suppose (X, ≺) is a non-empty well-ordering. Let Z ⊆ X
be a set such that 0 ∈ Z and such that for any x ∈ X, if every y ≺ x belongs
to Z then x belongs to Z. Then Z = X.
Proof. Suppose Z ⊊ X. Then X \ Z is non-empty, and therefore has a
≺-least element x ∈ X \ Z. But then for every y ≺ x, y ∈ Z, and therefore
x ∈ Z, a contradiction.
Moreover, transfinite induction can be stated inside arithmetic (in the
rough way that being a well-ordering can be stated): we write T I(≺, X) for
the formula
∀x[(∀y ≺ xX(y)) → X(x)] → ∀xX(x).
We can write T I(≺, φ) if we are interested in particular cases of transfinite
induction, or T I(≺, X) to indicate the statement with one of our fresh predicates X. Note that if we can prove T I(≺, X) with X a fresh predicate then
we can prove T I(≺, φ) for any formula φ.
Another key property of well-orderings is that they are in some sense
unique.
Definition 3.20. An initial segment of X (under ≺) is a set Z ⊆ X such
that whenever z ∈ Z and x ≺ z, x ∈ Z.
Theorem 3.21. Let (X, ≺) and (Y, ≺′) be well-orderings. Then either there
is an order-preserving bijection from X to an initial segment of Y , or an
order-preserving bijection from Y to an initial segment of X.
Proof. If either is empty, this is trivial. Otherwise, we will define, by transfinite recursion, a function f from an initial segment of X to an initial
segment of Y which is a bijection on these initial segments and which is
order-preserving (so f (x) ≺ f (y) iff x ≺ y).
Initially we set f(0) = 0. Suppose X′ ⊆ X, Y′ ⊆ Y are initial segments and we have defined an order-preserving bijection f : X′ → Y′. If X′ = X then f is an order-preserving bijection from X to an initial segment of Y. If Y′ = Y then f⁻¹ is an order-preserving bijection from Y to an initial segment of X.
Otherwise there is a least x ∈ X \ X′ and a least y ∈ Y \ Y′, and we extend f by setting f(x) = y. Clearly X′ ∪ {x} and Y′ ∪ {y} are initial segments and the extended f is an order-preserving bijection.
This means that even though the underlying sets X and Y might be
different, we can find a copy of one of these orderings inside the other.
In particular, this allows us to induce an ordering on well-orderings themselves: (X, ≺) is less than or equal to (Y, ≺′) if there is an order-preserving
bijection from (X, ≺) to an initial segment of Y . (The initial segment could
be all of Y , so we allow for “equality”.) In fact, this is a well-ordering on
the well-orders!
We use the term ordinal to mean an equivalence class of well-orderings—
that is to say, the order itself, rather than some particular description of the
order.
Let’s consider some concrete examples of well-orders which are definable
in PA. Each finite number is an ordinal, and since there is only one linear
ordering on a finite set (up to isomorphism), there is a unique finite ordinal of
each size. In other words, 0 is the smallest ordinal, 1 (the ordinal consisting
of a single point) is the next smallest, then 2 (the ordinal with two points,
one smaller than the other), and so on.
Above all these ordinals is the ordering of the natural numbers, which we
call ω. This ordinal has infinitely many elements ordered in a row. Clearly
ω is definable, by the formula x < y.
A more interesting ordering is given by
x ≺_{ω+1} y ↔ [(0 < x ∧ x < y) ∨ (0 < x ∧ y = 0)].
The smallest element in this order is 1, followed by 2, then 3, and so on,
with 0 larger than any positive number. In other words, this ordering looks
like ω, but with an extra element tacked on at the end, larger than any finite
element.
Next we could define ω+2, which looks like ω+1 but with another number
added on after. In general, if α is any ordering, we could define α + 1 to be
the ordering that looks like α, but with one additional element larger than
any element of α.
Theorem 3.22. If HA proves that α is well-ordered then HA proves that
α + 1 is well-ordered.
Proof. It suffices to show that if X is an infinite descending sequence in
α + 1 then we can define from X an infinite descending sequence in α.
This is easily done: take the sequence x ↦ X(x + 1) (that is, the formula
Y (x, y) ↔ X(x + 1, y)). Certainly every element of the sequence X after the
first must be below the largest element, and therefore must belong to the
ordering α.
We could keep going, and eventually get
x ≺_{ω+ω} y ↔ [(x and y are either both even or both odd and x < y) ∨ (x is odd and y is even)].
This ordering starts with all the odd numbers in their usual order—which
looks like a copy of ω—and then above them is another copy of ω.
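Here is a minimal sketch of these two orderings as Python predicates, transcribing the defining formulas above; the final lines just check that sorting an initial segment of the natural numbers by ≺_{ω+ω} lists the odds before the evens.

    import functools

    def prec_omega_plus_1(x, y):
        """x comes before y in the ordering 1, 2, 3, ..., 0: a copy of omega with 0 placed on top."""
        return (0 < x and x < y) or (0 < x and y == 0)

    def prec_omega_plus_omega(x, y):
        """x comes before y in the ordering 1, 3, 5, ..., 0, 2, 4, ...: the odds, then the evens."""
        return (x % 2 == y % 2 and x < y) or (x % 2 == 1 and y % 2 == 0)

    cmp = lambda x, y: -1 if prec_omega_plus_omega(x, y) else (1 if prec_omega_plus_omega(y, x) else 0)
    print(sorted(range(8), key=functools.cmp_to_key(cmp)))   # [1, 3, 5, 7, 0, 2, 4, 6]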
We note that there is a significant difference between well-orderings like
ω and ω + ω on the one hand, and well-orderings like ω + 7 on the other.
Some well-orderings have largest elements, and some do not. (0 is a special
case.)
Definition 3.23. We say α is a successor ordinal if there is some β such
that α = β + 1. If α is neither 0 nor a successor ordinal, we say α is a limit
ordinal.
Definition 3.24. Let β1 < β2 < · · · be an increasing sequence of ordinals.
We define supn βn to be the least ordinal larger than any βn .
Note that supn βn is well-defined, since the ordinals are themselves well-ordered, so there is a least such ordinal.
Lemma 3.25. Suppose α is a limit ordinal which can be represented with
domain the natural numbers. Then there is a sequence β1 < β2 < · · · such
that α = supn βn .
Proof. Consider some representation of α as a well-ordering ≺ on the natural
numbers. For each n, define γn = {m | m ≺ n}. γn is an initial segment
of α, so is itself a well-ordering. γn does not include n, so γn is a proper initial
segment, in particular γn < α. Define β0 = γ0 and given βn , define βn+1 to
be γm where m is least such that βn < γm .
We have supn βn ≤ α since each βn < α. Suppose δ < α; then δ may
be mapped to some proper initial segment of α, so in particular α \ δ is
non-empty, and there must be some least k belonging to α \ δ. Then δ =
{m | m ≺ k}, and therefore δ < γk+1 ≤ βk+1 . This holds for every δ < α,
so α ≤ supn βn .
We can define addition on well-orderings:
Definition 3.26.
• α + 0 = α,
• α + (β + 1) = (α + β) + 1,
• If λ = supn βn is a limit, α + λ = supn (α + βn ).
An immediate consequence of this definition is that addition is not commutative. It is easy to see why: addition corresponds to the operation of
placing one ordering after another. So ω < ω + 1, because adding a new element at the end of ω gives a larger ordering. But 1 + ω = ω, since ω
already has an infinite increasing sequence, and adding an element to the
beginning doesn’t change its length.
We can similarly define multiplication as iterated addition:
Definition 3.27.
• α · 0 = 0,
• α · (β + 1) = α · β + α,
• If λ = supn βn is a limit, α · λ = supn (α · βn ).
Again, this is not commutative. For instance, ω · 2 = ω + ω is two copies
of ω, as we have already seen. But 2 · ω is infinitely many pairs, which is
really the same as ω.
To consider the first really non-trivial example, ω · ω = ω 2 consists of a
copy of ω, followed by a second copy of ω, followed by a third, and so on.
An easy representation is in terms of pairs: we think of the pair (n, m) as representing ω · n + m, so (n, m) < (n′, m′) if either n < n′ or n = n′ and m < m′.
Although we will not prove it, both addition and multiplication are still
associative.
Naturally, the next step is exponentiation.
Definition 3.28.
• α^0 = 1,
• α^(β+1) = α^β · α,
• If λ = supn βn then α^λ = supn(α^βn).
We will really only use the cases where α = 2 or α = ω. It is important to note that ordinal exponentiation is not cardinal exponentiation. In particular, 2^ω = ω, which is very different from 2^ℵ0.
It turns out that there is a natural representation of exponentiation.
Lemma 3.29. Consider the collection X of functions x : β → α (here we equate α and β with the set of smaller ordinals) such that x(γ) is non-zero at only finitely many values γ. We may order such functions by setting x ≺ y if, when γ < β is largest such that x(γ) ≠ y(γ), we have x(γ) < y(γ). Then X is a representation of the ordinal α^β.
Choosing γ largest here is possible since x and y are non-zero at only finitely many places.
Proof. By induction on β. When β = 0, |X| = 1, since it contains only the empty function. Suppose the claim holds for β, and we show it for β + 1: each function x ∈ X can be viewed as a pair (γ_x, x′) where γ_x < α and x′ is a function from β to α. Clearly x ≺ y if either γ_x < γ_y or γ_x = γ_y and x′ ≺ y′. Therefore X can be viewed as α copies of X′ (the corresponding collection for β) in order, which is exactly α^β · α.
If λ = supn βn, observe that every element of X_λ is an element of X_βn for some n.
One special feature of all these operations is that they have fixed points.
Definition 3.30. α is additively principal if whenever β, γ < α, β + γ < α.
α is multiplicatively principal if whenever β, γ < α, β · γ < α.
α is exponentially principal if whenever β, γ < α, β^γ < α.
Lemma 3.31. α > 0 is additively principal iff α = ω^β for some β.
Proof. By induction on β. If β = 0 then α = ω^0 = 1, and the claim is obvious. Suppose the claim holds for β; if γ, δ < ω^(β+1) = ω^β · ω then there must be n, m < ω such that γ < ω^β · n and δ < ω^β · m. Then γ + δ < ω^β · n + ω^β · m = ω^β · (n + m) < ω^(β+1).
If λ = supn βn and γ, δ < ω^λ, then, since the claim holds for each βn, there is some n such that γ, δ < ω^βn, and therefore γ + δ < ω^βn < ω^λ.
Similarly, α > 2 is multiplicatively principal iff α = ω^(ω^β) for some β. (0, 1, 2 are multiplicatively principal as well.)
The first exponentially principal ordinal greater than ω is named ε₀, and it has a special relationship with PA. ε₀ is the limit of taking exponents: define ω0 = 0, ωn+1 = ω^(ωn). Then ε₀ = supn ωn.
Our next step will be obtaining a description of ε₀ inside arithmetic. We will do this by providing a normal form—a canonical way of writing the ordinals below ε₀.
Lemma 3.32. If α is additively principal and β < α then β + α = α.
Proof. If α = 0 or α = 1, this is trivial. Otherwise α is a limit, say α = supn αn, so β + α = supn(β + αn); each β + αn < α since α is additively principal, while αn ≤ β + αn, so this supremum is exactly α.
Lemma 3.33. If α is not additively principal, there are β, γ < α such that
β + γ = α.
Proof. Choose β, γ < α such that β + γ ≥ α. Let γ′ be least such that β + γ′ ≥ α; clearly γ′ ≤ γ < α. If γ′ = δ + 1 then we have β + δ < α, so β + γ′ ≤ α, and therefore β + γ′ = α. If γ′ = supn δn then for each n, β + δn < α, and therefore supn(β + δn) ≤ α, so again β + γ′ = α.
Lemma 3.34. Suppose β, γ are additively principal, α < γ, α < β, and γ + α = β + α. Then γ = β.
Proof. Suppose the claim fails, and let γ be smallest so that this fails, so α < γ, α < β, γ, β are additively principal, and γ + α = β + α, but γ ≠ β. If β < γ then β would be an example of an ordinal smaller than γ for which the same statement holds, so we must have γ < β.
But since γ < β, α < β, and β is additively principal, γ + α < β ≤ β + α, a contradiction.
Theorem 3.35 (Additive Normal Form). For any α, there is a unique
sequence of additively principal ordinals α1 ≥ α2 ≥ · · · ≥ αn such that
α = α1 + α2 + · · · + αn .
Proof. We define the sequence explicitly as follows. We let α1 be the largest
additively principal ordinal ≤ α. To see that this exists, observe that the
supremum of additively principal ordinals is itself additively principal, so we
may take α1 to be the supremum of all additively principal ordinals ≤ α.
Suppose we have chosen α1 ≥ · · · ≥ αk so that α1 + · · · + αk ≤ α. If
these are equal, we are done, so suppose α1 + · · · + αk < α. Let αk+1 be
the largest additively principal ordinal such that α1 + · · · + αk + αk+1 ≤ α
(again, the largest such ordinal exists by taking it to be the supremum of
all such ordinals). We have αk+1 ≤ αk since if αk+1 > αk ,
α1 + · · · + αk + αk+1 = α1 + · · · + αk−1 + αk+1 ≤ α
contradicting the maximality of αk .
It remains to show that this process terminates. Since the ordinals are
well-founded, the sequence α1 ≥ · · · ≥ αk · · · cannot be strictly decreasing
infinitely many times, so in order for the process to fail to terminate, there
would have to be some k so that αk = αk+n for all n. That is,
α1 + · · · + αk · n ≤ α
for all n. But then α1 + · · · + αk · ω ≤ α, and since αk · ω is additively
principal and αk < αk · ω, we contradict the maximality of αk .
Now we need to show uniqueness. Suppose β1 ≥ · · · ≥ βm , each βi is
additively principal, and β1 + · · · + βm = α. We will show by induction on
i that βi = αi . Suppose βj = αj for j < i. If αi < βi then, by maximality
of αi ,
α < α1 + · · · + αi−1 + βi = β1 + · · · + βi−1 + βi ≤ β1 + · · · + βm ,
contradicting the assumption that β1 + · · · + βm = α. If βi < αi then βi′ < αi for every i′ ≥ i, and therefore
β1 + · · · + βi−1 + βi + · · · + βm = α1 + · · · + αi−1 + βi + · · · + βm < α1 + · · · + αi−1 + αi ≤ α,
and so β1 + · · · + βm < α, again contradicting the assumption.
Theorem 3.36. Suppose 0 < α < ε₀. Then there is a unique sequence of ordinals α1 ≤ α2 ≤ · · · ≤ αn < α such that α = ω^αn + · · · + ω^α1.
Definition 3.37. We define the Cantor normal forms as follows:
• 0 is a Cantor normal form,
• If α1 ≥ α2 ≥ · · · ≥ αn are in Cantor normal form then so is
ω^α1 + ω^α2 + · · · + ω^αn.
Since each Cantor normal form is in additive normal form, the Cantor
normal form is unique. Note that it is easy to code the Cantor normal form
in arithmetic using sequences.
We need one more arithmetic operation, a modification of addition which
is commutative.
Definition 3.38. The natural or commutative sum of α and β, written #,
is given as follows. Suppose the additive normal forms of α and β are
α = α1 + · · · + αn
and
β = αn+1 + · · · + αn+m .
Then
α#β = απ(1) + · · · + απ(n+m)
where π : [1, n + m] → [1, n + m] is a permutation such that απ(i+1) ≤ απ(i)
for all i < n + m.
For instance, 1#ω = ω + 1. More elaborately,
(ω^ω + ω^2 + 1)#(ω^3 + ω^2 + ω) = ω^ω + ω^3 + ω^2 · 2 + ω + 1.
This choice of permutation π is precisely the choice that makes α#β as large
as possible.
Lemma 3.39.
(1) α#β = β#α,
(2) α < β implies α#γ < β#γ and γ#α < γ#β,
(3) # is associative,
(4) If α is additively principal and β, γ < α then β#γ < α.
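Ordinals below ε₀ can be manipulated directly through their Cantor normal forms. Here is a minimal sketch, representing an ordinal by the descending list of the exponents in its normal form (each exponent again such a list); this representation and the function names are one illustrative choice. The natural sum is then just a merge of the two exponent lists.

    import functools

    # [] is 0, [[]] is omega^0 = 1, [[[]]] is omega^1 = omega, and in general
    # [a1, a2, ...] with a1 >= a2 >= ... stands for omega^a1 + omega^a2 + ...

    def cmp_ord(a, b):
        """Compare two Cantor normal forms, returning -1, 0, or 1."""
        for x, y in zip(a, b):
            c = cmp_ord(x, y)
            if c != 0:
                return c
        return (len(a) > len(b)) - (len(a) < len(b))

    def natural_sum(a, b):
        """alpha # beta: merge the exponent lists of alpha and beta into descending order."""
        return sorted(a + b, key=functools.cmp_to_key(cmp_ord), reverse=True)

    zero, one = [], [[]]
    omega = [one]                                              # omega^1
    print(natural_sum(one, omega))                             # [[[]], []]  =  omega + 1
    print(natural_sum(one, omega) == natural_sum(omega, one))  # commutativity, Lemma 3.39(1)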
3.6. Cut-Elimination. The cut-elimination theorem for first-order logic
applies to Peano Arithmetic, but it isn’t very useful: given a deduction of Γ_PA, Γ ⇒ Σ, there is a cut-free deduction, but since the axioms in Γ_PA include induction instances for every formula, we lose all the useful properties of cut-elimination. What we would really like is to be able to obtain a deduction without induction axioms—that is, given a deduction of Γ_PA, Γ ⇒ Σ, we would like a cut-free deduction of P⁻, Γ ⇒ Σ. This would have two benefits: first, it would give us (most of) the consequences of cut-elimination back, since the axioms of P⁻ are very simple formulas. Further, such a result says something about the consistency of PA: in particular, if PA ⊢ ⊥ then ⊢ P⁻ ⇒ ⊥. Since the axioms of P⁻ are essentially just definitions, the latter is impossible, so we could conclude that PA is consistent.
There is one problem: the proposed theorem isn’t quite true. However it
is true if we restrict ourselves to formulas of a specific form. In particular,
we will show that if PA ⊢ Σ where Σ consists only of Σ1 formulas then Fc ⊢ P⁻ ⇒ Σ.
The proof of cut-elimination for Peano Arithmetic is a step beyond anything we have done so far. In order to simplify the proof, we will take a
strange route. We will introduce a new sequent calculus which allows infinitary rules—that is, rules which have infinitely many branches. We will
show how to embed proofs from regular Peano Arithmetic into this infinitary system, and then we will prove that a form of cut-elimination holds in this infinitary system. Specifically, we will prove that if Fc ⊢ Γ_PA, Γ ⇒ Σ then F∞^cf ⊢ P⁻, Γ ⇒ Σ.
In order to complete the proof, we will have to move from the infinitary
system back to regular Peano Arithmetic. In general, this is not possible,
but it will be possible when the statement we have proven consists only of
Σ1 formulas.
We first define the system F∞, which is obtained from Fc by replacing the R∀ rule and the L∃ rule with two new rules, known as the ω rules:
Rω: from the premises Γ ⇒ Σ, φ[0/x], Γ ⇒ Σ, φ[1/x], . . . , Γ ⇒ Σ, φ[n/x], . . . (one for each numeral), infer Γ ⇒ Σ, ∀xφ.
Lω: from the premises Γ, φ[n/x] ⇒ Σ for every numeral n, infer Γ, ∃xφ ⇒ Σ.
We also add the requirement that all sequents consist entirely of sentences—that is, there are no free variables.
Observe that in F∞, the induction rule is derivable! First, note that we can derive Fc ⊢ φ[0/x], ∀x(φ → φ[Sx/x]) ⇒ φ[n/x] for any n. We show this by induction on n: for n = 0, this is trivial. Suppose the claim holds for n. Then from the inductive hypothesis φ[0/x], ∀x(φ → φ[Sx/x]) ⇒ φ[n/x] and the axiom φ[Sn/x] ⇒ φ[Sn/x], an application of L→ gives φ[0/x], ∀x(φ → φ[Sx/x]), φ[n/x] → φ[Sn/x] ⇒ φ[Sn/x], and an application of L∀ then gives φ[0/x], ∀x(φ → φ[Sx/x]) ⇒ φ[Sn/x].
Now the induction axiom follows from a single application of the ω-rule followed by two applications of R→.
Theorem 3.40. If Fc ⊢ Γ ⇒ Σ where Γ and Σ have no free variables then F∞ ⊢ Γ ⇒ Σ.
Proof. By induction on deductions, we show:
If Fc ` Γ ⇒ Σ, where x1 , . . . , xn are the free variables in
Γ ⇒ Σ, then whenever t1 , . . . , tn are closed terms, there is a
deduction F∞ ` Γ[t1 /x1 ] · · · [tn /xn ] ⇒ Σ[t1 /x1 ] · · · [tn /xn ].
If the last inference is anything other than L∃ or R∀ then the claim follows
immediately from IH, since all other inference rules of Fc are also rules of
F∞ .
Suppose the final rule is R∀. Then the preceding step was Γ ⇒ Σ, φ[y/x] for some variable y. By IH, for each n, there is a deduction of Γ ⇒ Σ, φ[n/x], and therefore the claim follows by an application of Rω. The Lω case is similar.
So suppose we have a deduction of Fc ⊢ Γ_PA, Γ ⇒ Σ. By compactness, we may assume we used finitely many axioms from Γ_PA, and in particular, finitely many induction axioms, say Γ₀. By the previous theorem, there is a deduction of F∞ ⊢ Γ₀, P⁻, Γ ⇒ Σ. We may then apply finitely many cuts with derivations of the induction axioms to conclude that F∞ ⊢ P⁻, Γ ⇒ Σ.
Definition 3.41. The height of a deduction in F∞ is given recursively by:
• The height of an axiom is 1,
• If a deduction d is formed from subdeductions {d_i} then the height of d is the smallest ordinal greater than the height of every d_i.
We write ⊢^α_r Γ ⇒ Σ if there is a deduction of Γ ⇒ Σ such that all cuts in this deduction have rank < r and the height is ≤ α.
We still have our old friends the inversion lemmas:
Lemma 3.42.
(1) Suppose ⊢^α_r Γ ⇒ Σ, φ ∧ ψ. Then ⊢^α_r Γ ⇒ Σ, φ and ⊢^α_r Γ ⇒ Σ, ψ.
(2) Suppose ⊢^α_r Γ, φ ∨ ψ ⇒ Σ. Then ⊢^α_r Γ, φ ⇒ Σ and ⊢^α_r Γ, ψ ⇒ Σ.
(3) Suppose ⊢^α_r Γ, φ → ψ ⇒ Σ. Then ⊢^α_r Γ, ψ ⇒ Σ and ⊢^α_r Γ ⇒ Σ, φ.
(4) Suppose ⊢^α_r Γ ⇒ Σ, ∀xφ. Then for any n, ⊢^α_r Γ ⇒ Σ, φ[n/x].
(5) Suppose ⊢^α_r Γ, ∃xφ ⇒ Σ. Then for any n, ⊢^α_r Γ, φ[n/x] ⇒ Σ.
And the reduction lemmas:
Lemma 3.43.
(1) Suppose ⊢^α_r Γ ⇒ Σ, φ ∧ ψ and ⊢^β_r Γ, φ ∧ ψ ⇒ Σ where rk(φ ∧ ψ) ≤ r. Then ⊢^(α#β)_r Γ ⇒ Σ.
(2) Suppose ⊢^α_r Γ, φ ∨ ψ ⇒ Σ and ⊢^β_r Γ ⇒ Σ, φ ∨ ψ where rk(φ ∨ ψ) ≤ r. Then ⊢^(α#β)_r Γ ⇒ Σ.
(3) Suppose ⊢^α_r Γ, φ → ψ ⇒ Σ and ⊢^β_r Γ ⇒ Σ, φ → ψ where rk(φ → ψ) ≤ r. Then ⊢^(α#β)_r Γ ⇒ Σ.
(4) Suppose ⊢^α_r Γ ⇒ Σ, ∀xφ and ⊢^β_r Γ, ∀xφ ⇒ Σ where rk(∀xφ) ≤ r. Then ⊢^(α#β)_r Γ ⇒ Σ.
(5) Suppose ⊢^α_r Γ, ∃xφ ⇒ Σ and ⊢^β_r Γ ⇒ Σ, ∃xφ where rk(∃xφ) ≤ r. Then ⊢^(α#β)_r Γ ⇒ Σ.
Proof. We prove the first of these; the others are similar. We proceed by induction on β. We consider two cases.
For the first case, suppose the last inference of the deduction of Γ, φ ∧ ψ ⇒ Σ had main formula φ ∧ ψ. Then the immediate subdeduction must have been a deduction of either Γ, φ ∧ ψ, φ ⇒ Σ or of Γ, φ ∧ ψ, ψ ⇒ Σ, and had height δ < β for some δ. Without loss of generality, we assume the former. By IH, there is a deduction of Γ, φ ⇒ Σ of height α#δ, and by inversion there is a deduction of Γ ⇒ Σ, φ of height α. We obtain a deduction of Γ ⇒ Σ by applying a cut over φ; its height is the least ordinal greater than max{α#δ, α}. Since β > δ, α#β > α#δ, and also β > 0 so α#β > α. Therefore this deduction has height at most α#β.
For the second case, if the last inference has some other main formula, we apply IH to each immediate subdeduction (paired with the given deduction of Γ ⇒ Σ, φ ∧ ψ) and then reapply the same inference; the height bound follows as above.
Lemma 3.44. Suppose ⊢^α_{r+1} Γ ⇒ Σ. Then ⊢^(2^α)_r Γ ⇒ Σ.
Proof. By induction on α. If the last inference of the deduction is anything other than a cut over a formula of rank r, the claim follows by applying IH to all subdeductions and then applying the same inference. All subdeductions have height < α, so IH gives deductions of height < 2^α.
Suppose the last inference is a cut over a formula of rank r. The two subdeductions have heights β, β′ < α, and by IH, there are deductions of height at most 2^β, 2^β′ with all cuts having rank < r. We then apply the previous lemma, obtaining a deduction of Γ ⇒ Σ of height at most 2^β # 2^β′ ≤ 2^(max{β,β′}+1) ≤ 2^α.
Definition 3.45. Define 2^α_0 = α and 2^α_(r+1) = 2^(2^α_r).
Theorem 3.46. If ⊢^α_r Γ ⇒ Σ then ⊢^(2^α_r)_0 Γ ⇒ Σ.
For arbitrary sequents Γ ⇒ Σ, having a cut-free proof in F∞ doesn’t do
us much good.
Theorem 3.47. Consider a deduction of Γ ⇒ Σ in F∞^cf where every formula in Γ has the form ∀xφ with φ quantifier-free and every formula in Σ has the form ∃xψ with ψ quantifier-free. Then this is a deduction in Fc^cf.
Proof. Easily seen since, by the generalized subformula property, the ω rules
do not appear in such a deduction.
3.7. Consequences of Cut-elimination. We can ask what it would take
to formalize the argument just given—that is, to carry it out, not in ordinary
mathematics, but inside some sequent calculus. PA includes more than
enough knowledge about natural numbers to code deductions and make
statements about PA itself.
Very careful work shows that the following is enough. IΣ1 is the restriction of PA in which the only induction axioms allowed are those where φ is
Σ1 .
(It is usual to use, in place of IΣ1 , an even weaker theory, PRA (“primitive recursive arithmetic”), in which there are no quantifiers in the language—
and therefore, none in the induction axioms—but where some additional
functions—the primitive recursive functions—are added to make enough
coding definable.)
Definition 3.48. Let α be a description of an ordinal in the language of arithmetic (that is, an injection π : α → N such that there are formulas r and <_α such that r(n) holds iff n is in the range of π, and <_α(n, m) holds iff n and m are in the range of π and π⁻¹(n) < π⁻¹(m)). We write TI(α, φ) for the formula
∀x[(∀y <_α x φ(y)) → φ(x)] → ∀x φ(x).
Theorem 3.49. IΣ1 + {TI(ε₀, φ) | φ is Σ1} proves that PA is 1-consistent.
Idea of the proof: With great care, one can actually carry out the proof of cut-elimination just described entirely within the formal system of PRA together with induction up to ε₀ on quantifier-free formulas. This isn’t at all
obvious—after all, the proof given involved infinite objects. However when
the sequent being proven is Σ1 , the ω-rule can be systematically replaced
by a constructive ω-rule, in which there is a computable function f with
the property that for each n, f (n) is a code describing a deduction of Γ ⇒
Σ, φ[n/x]. This code might have to reference other functions coding other
ω rules, so the details are quite complicated.
Since Gödel’s Incompleteness Theorem applies to PA, it follows that the
argument just given cannot be carried out inside PA, nor in any fragment
of it. Therefore we have:
Corollary 3.50. PA does not prove TI(ε₀, X) for any representation of the ordinal ε₀.
Indeed, the following is true:
Theorem 3.51. For every α < ε₀, there is a representation of α such that PA ⊢ TI(α, X).
In fact, PA ⊢ TI(α, X) for the “natural” representations of α. However
there are “artificial” representations of even, say, ω, such that PA cannot
prove transfinite induction. For instance, consider the following ordering:
x ≺ y if either x < y and PA is consistent, or y < x and PA
is not consistent.
If PA is consistent, this is a representation of ω, but if PA is not consistent,
this is a representation of the ordering which is ω reversed, which obviously
has an infinite decreasing sequence 0 ≻ 1 ≻ 2 ≻ · · · . So if PA could prove
transfinite induction for this ordering, it could prove its own consistency.
Extensions T of IΣ1 , such as PA and its extensions and fragments, often
have an ordinal α for which the following are all true:
• α is the supremum of those ordinals β such that there is some representation of β for which T ⊢ TI(β),
• α is the least ordinal such that T ⊬ TI(α),
• α is least such that IΣ1 + TI(α) proves that T is 1-consistent (to say that a theory is 1-consistent means that every Σ1 sentence it proves is actually true),
• If T ⊢ ∀x∃yφ(x, y) where φ is ∆0 then the function mapping x to the least such y is “≺α-computable” (this means that the function is not only computable, but computable by a machine which, at each step, decrements a timer, where the timer is always an ordinal < α, and where the machine always finishes by the time the timer reaches 0),
• If T ⊢ ∀x∃yφ(x, y) where φ is ∆0 then the function mapping x to the least such y is bounded by some fast-growing function (see below) f_β with β < α, and T proves that each f_β for β < α is total.
It is possible to contrive artificial theories in which these properties do not
align, but for “natural” theories, these properties all occur at the same
ordinal. We call this the proof-theoretic ordinal of T.
There is an analogous approach to proof-theoretic ordinals for theories of
sets (specifically, weak fragments of ZFC) rather than theories of arithmetic;
in this case the proof-theoretic ordinal generally aligns with the least α such
that every Π2 formula provable in the theory is satisfied at Lα , the α-th
level of the constructible hierarchy.
Proof-theoretic ordinals sort theories into a rough hierarchy of strength.
If the ordinal of S is less than the ordinal of T (and both are theories of—
possibly extensions of—the language of arithmetic) then any Π2 consequence
of S (in their common language) will typically also be a consequence of T.
This is one of the reasons for the special role of Π2 formulas, and computable
functions, in proof-theory.
Definition 3.52. Suppose that α is a countable ordinal and for every
limit ordinal λ ≤ α we have fixed an increasing sequence λ[n] such that
λ = supn λ[n]. Then we define the fast-growing hierarchy of functions by
recursion on ordinals α:
• f_0(x) = x + 1,
• f_{β+1}(x) = f_β^x(x) (the x-fold iterate of f_β applied to x),
• f_λ(x) = f_{λ[x]}(x).
Observe that f_1(x) = f_0^x(x) = x + x = 2x and f_2(x) = f_1^x(x) = 2^x · x. As a result, these functions grow very quickly indeed!
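For finite indices the recursion can be run directly. Here is a minimal sketch covering only the finite levels f_0, f_1, f_2, . . .; the limit clause would require fixing fundamental sequences λ[n] and is omitted.

    def f(beta, x):
        """The fast-growing hierarchy at a finite index: f_0(x) = x + 1, and
        f_{beta+1}(x) is the x-fold iterate of f_beta applied to x."""
        if beta == 0:
            return x + 1
        for _ in range(x):
            x = f(beta - 1, x)
        return x

    print([f(1, x) for x in range(5)])   # [0, 2, 4, 6, 8]    f_1(x) = 2x
    print([f(2, x) for x in range(5)])   # [0, 2, 8, 24, 64]  f_2(x) = 2^x * x
    # f(3, x) is already a tower-like function; evaluating it even at x = 3 is hopeless.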
One consequence of these results is that there is a cap to how quickly
functions which PA can prove total are allowed to grow, and therefore one
way to show that something cannot be proven in PA is to prove that it
grows faster than f_α for every α < ε₀. (“Grows faster” here could mean f_α(x) < g(x) for infinitely many x.)
3.8. Goodstein’s Theorem. All this leads to an example of a “natural” statement unprovable in PA.
Definition 3.53. We define a hereditary base n notation for a number inductively by:
• 0 is a hereditary base n notation,
• If for each i ≤ k, a_i is a hereditary base n notation and i < j implies a_i ≤ a_j, then
n^(a_k) + n^(a_{k−1}) + · · · + n^(a_0)
is a hereditary base n notation.
This is a generalization of the usual way of writing a number in base n,
with the addition that the exponents themselves must also be written in
base n.
For example, in hereditary base 2, the first few numbers are:
2^0, 2^(2^0), 2^(2^0) + 2^0, 2^(2^(2^0)), 2^(2^(2^0)) + 2^0, 2^(2^(2^0)) + 2^(2^0), 2^(2^(2^0)) + 2^(2^0) + 2^0, . . .
For a larger example, to write 221 in hereditary base 3, we first write 221 in regular base 3:
221 = 3^4 + 3^4 + 3^3 + 3^3 + 3 + 1 + 1
and then we rewrite each exponent itself in base 3:
221 = 3^(3+1) + 3^(3+1) + 3^3 + 3^3 + 3 + 1 + 1
finally obtaining:
221 = 3^(3^(3^0) + 3^0) + 3^(3^(3^0) + 3^0) + 3^(3^(3^0)) + 3^(3^(3^0)) + 3^(3^0) + 3^0 + 3^0.
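Here is a minimal sketch that prints the hereditary base n notation of a number, written with repeated terms as in the definition above; the string format is an arbitrary choice.

    def hereditary(x, n):
        """The hereditary base n notation of x, written with repeated terms n^(...)."""
        if x == 0:
            return "0"
        terms, e = [], 0
        while x > 0:
            x, d = divmod(x, n)
            terms = [f"{n}^({hereditary(e, n)})"] * d + terms
            e += 1
        return " + ".join(terms)

    print(hereditary(221, 3))
    # 3^(3^(3^(0)) + 3^(0)) + 3^(3^(3^(0)) + 3^(0)) + 3^(3^(3^(0))) + 3^(3^(3^(0))) + 3^(3^(0)) + 3^(0) + 3^(0)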
Definition 3.54. We define the function ι_{a,b}(x) to be the result of writing the number x in hereditary base a notation and then replacing every occurrence of a with b.
For example
ι_{3,4}(221) = ι_{3,4}(3^(3^(3^0) + 3^0) + 3^(3^(3^0) + 3^0) + 3^(3^(3^0)) + 3^(3^(3^0)) + 3^(3^0) + 3^0 + 3^0)
= 4^(4^(4^0) + 4^0) + 4^(4^(4^0) + 4^0) + 4^(4^(4^0)) + 4^(4^(4^0)) + 4^(4^0) + 4^0 + 4^0
= 4^5 + 4^5 + 4^4 + 4^4 + 4 + 1 + 1
= 2566.
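The function ι_{a,b} can be computed by recursing through the base a digits rather than by manipulating notations as strings; a minimal sketch:

    def iota(a, b, x):
        """iota_{a,b}(x): write x in hereditary base a and replace every a by b."""
        result, e = 0, 0
        while x > 0:
            x, d = divmod(x, a)
            result += d * b ** iota(a, b, e)
            e += 1
        return result

    print(iota(3, 4, 221))   # 2566
    print(iota(2, 3, 4))     # 27  (2^2 becomes 3^3)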
Definition 3.55. For any x, the Goodstein sequence starting with x is the sequence a1, a2, . . . where:
• a1 = x,
• a_{k+1} = ι_{k+1,k+2}(a_k) − 1.
More generally, a generalized Goodstein sequence is a sequence a1, a2, . . . together with an auxiliary sequence h1, h2, . . . such that for every k, h_k < h_{k+1} and
a_{k+1} < ι_{h_{k+1},h_{k+2}}(a_k).
For example, the Goodstein sequence starting with 3 is the sequence
• a1 = 3 = 2^1 + 2^0,
• a2 = ι_{2,3}(2^1 + 2^0) − 1 = 3^1 + 3^0 − 1 = 3,
• a3 = ι_{3,4}(3) − 1 = 4^1 − 1 = 3,
• a4 = ι_{4,5}(4^0 + 4^0 + 4^0) − 1 = 2,
• a5 = ι_{5,6}(5^0 + 5^0) − 1 = 1,
• a6 = 0.
On the other hand, the Goodstein sequence starting with 4 begins:
• a1 = 4 = 2^2,
• a2 = ι_{2,3}(2^2) − 1 = 3^3 − 1 = 26 = 3^2 + 3^2 + 3 + 3 + 1 + 1,
• a3 = ι_{3,4}(26) − 1 = 4^2 + 4^2 + 4 + 4 + 1 = 41,
• a4 = ι_{4,5}(41) − 1 = 5^2 + 5^2 + 5 + 5 = 60.
In fact, this sequence will eventually start decreasing, and will eventually reach 0—after 3 · 2^402653211 − 2 steps!
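With ι in hand, the Goodstein sequence itself is only a few lines; a minimal sketch (ι is repeated so the snippet stands alone), which reproduces the two sequences computed above:

    def iota(a, b, x):
        """Write x in hereditary base a and replace every a by b."""
        result, e = 0, 0
        while x > 0:
            x, d = divmod(x, a)
            result += d * b ** iota(a, b, e)
            e += 1
        return result

    def goodstein(x, steps):
        """The first values a_1, a_2, ... of the Goodstein sequence starting with x
        (stopping early once 0 is reached)."""
        seq, base = [x], 2
        while len(seq) < steps and seq[-1] > 0:
            seq.append(iota(base, base + 1, seq[-1]) - 1)
            base += 1
        return seq

    print(goodstein(3, 10))   # [3, 3, 3, 2, 1, 0]
    print(goodstein(4, 6))    # [4, 26, 41, 60, 83, 109]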
Theorem 3.56. For every h and every x, the h-Goodstein sequence starting
with x eventually reaches 0.
Proof. We prove this by transfinite induction up to ε₀. For any number x, we may define ι_{a,ω}(x), the result of replacing a in the hereditary base a notation with ω. The result is always an ordinal in Cantor normal form, and in particular, an ordinal < ε₀. For instance, consider the Goodstein sequence starting with 4:
sequence starting with 4:
•
•
•
•
ι2,ω (a1 ) = ω ω ,
ι3,ω (a2 ) = ω 2 + ω 2 + ω + ω + 1 + 1,
ι4,ω (a3 ) = ω 2 + ω 2 + ω + ω + 1,
ι5,ω (a4 ) = ω 2 + ω 2 + ω + ω.
21
This suggests the main point: no matter what h is, ι_{h_{k+2},ω}(a_{k+1}) < ι_{h_{k+1},ω}(a_k). This is easily seen, since ι_{b,ω}(ι_{a,b}(x)) = ι_{a,ω}(x), and therefore
ι_{h_{k+2},ω}(a_{k+1}) < ι_{h_{k+2},ω}(a_{k+1} + 1) ≤ ι_{h_{k+2},ω}(ι_{h_{k+1},h_{k+2}}(a_k)) = ι_{h_{k+1},ω}(a_k).
Therefore the sequence ι_{h_{k+1},ω}(a_k) is a strictly decreasing sequence of ordinals below ε₀, and therefore eventually must hit 0.
Theorem 3.57. Suppose that for every h and every x, the h-Goodstein
sequence starting with x eventually reaches 0. Then ε₀ is well-founded.
Proof. Suppose g were an infinite descending sequence below ε₀, g(1) >
g(2) > · · · . We can easily choose an h so that ιk+1,ω (ak ) = g(k) for all
k > 1, simply by setting h(k) = ιω,k+1 (g(k)) − ιω,k+1 (g(k + 1)).
In particular, it follows that PA cannot prove that every h-Goodstein
sequence eventually terminates. In fact, with a bit more care, it is possible
to show that the function mapping x to the number of steps in the Goodstein
sequence starting with x grows at roughly the speed of f_ε₀, and therefore
PA cannot even prove that regular Goodstein sequences terminate.