Advanced Topics
in Markov chains
J.M. Swart
May 16, 2012
Abstract This is a short advanced course in Markov chains, i.e., Markov
processes with discrete space and time. The first chapter recalls, without proof, some of the basic topics such as the (strong) Markov property, transience, recurrence, periodicity, and invariant laws, as well as
some necessary background material on martingales. The main aim of
the lecture is to show how topics such as harmonic functions, coupling,
Perron-Frobenius theory, Doob transformations and intertwining are all
related and can be used to study the properties of concrete chains, both
qualitatively and quantitatively. In particular, the theory is applied to
the study of first exit problems and branching processes.
Notation

N          natural numbers {0, 1, . . .}
N+         positive natural numbers {1, 2, . . .}
N̄          N ∪ {∞}
Z          integers
Z̄          Z ∪ {−∞, ∞}
Q          rational numbers
R          real numbers
R̄          extended real numbers [−∞, ∞]
C          complex numbers
B(E)       Borel-σ-algebra on a topological space E
1A         indicator function of the set A
A ⊂ B      A is a subset of B, which may be equal to B
Ac         complement of A
A\B        set difference
Ā          closure of A
int(A)     interior of A
(Ω, F, P)  underlying probability space
ω          typical element of Ω
E          expectation with respect to P
σ(. . .)   σ-field generated by sets or random variables
‖f‖∞       supremum norm ‖f‖∞ := supx |f(x)|
µ ≪ ν      µ is absolutely continuous w.r.t. ν
fk ≪ gk    lim fk/gk = 0
fk ∼ gk    lim fk/gk = 1
o(n)       any function such that o(n)/n → 0
O(n)       any function such that supn O(n)/n < ∞
δx         delta measure in x
µ ⊗ ν      product measure of µ and ν
⇒          weak convergence of probability laws
Contents

0 Preliminaries
  0.1 Stochastic processes
  0.2 Filtrations and stopping times
  0.3 Martingales
  0.4 Martingale convergence
  0.5 Markov chains
  0.6 Kernels, operators and linear algebra
  0.7 Strong Markov property
  0.8 Classification of states
  0.9 Invariant laws

1 Harmonic functions
  1.1 (Sub-) harmonicity
  1.2 Random walk on a tree
  1.3 Coupling
  1.4 Convergence in total variation norm

2 Eigenvalues
  2.1 The Perron-Frobenius theorem
  2.2 Subadditivity
  2.3 Spectral radius
  2.4 Existence of the leading eigenvector
  2.5 Uniqueness of the leading eigenvector
  2.6 The Jordan normal form
  2.7 The spectrum of a nonnegative matrix
  2.8 Convergence to equilibrium
  2.9 Conditioning to stay inside a set

3 Intertwining
  3.1 Intertwining of Markov chains
  3.2 Markov functionals
  3.3 Intertwining and coupling
  3.4 First exit problems revisited

4 Branching processes
  4.1 The branching property
  4.2 Generating functions
  4.3 The survival probability
  4.4 First moment formula
  4.5 Second moment formula
  4.6 Supercritical processes
  4.7 Trimmed processes

A Supplementary material
  A.1 The spectral radius
Chapter 0
Preliminaries
0.1 Stochastic processes
Let I be a (possibly infinite) interval in Z. By definition, a stochastic process
with discrete time is a collection of random variables X = (Xk )k∈I , defined on
some underlying probability space (Ω, F, P) and taking values in some measurable
space (E, E). We call the random function
I ∋ k ↦ Xk(ω) ∈ E
the sample path of the process X. The sample path of a discrete-time stochastic
process is in fact itself a random variable X = (Xk )k∈I , taking values in the product
space (E I , E I ), where
E I := {x = (xk )k∈I : xk ∈ E ∀k ∈ I}
is the space of all functions x : I → E and E I denotes the product-σ-field. It
is well-known that a probability law on (E I , E I ) is uniquely characterized by its
finite-dimensional marginals, i.e., even if I is infinite, the law of the sample path
X is uniquely determined by the finite dimensional distributions
P[(Xk, . . . , Xk+n) ∈ · ]   ({k, . . . , k + n} ⊂ I)
of the process. Conversely, if (E, E) is a Polish space equipped with its Borel-σ-field, then by the Daniell-Kolmogorov extension theorem, any consistent collection
of probability measures on the finite-dimensional product spaces (E J , E J ), with
J ⊂ I a finite interval, uniquely defines a probability measure on (E I , E I ). Polish
spaces include many of the most commonly used spaces, such as countable spaces equipped with the discrete topology, R^d, separable Banach spaces, and many more. Moreover, open or closed subsets of Polish spaces are Polish, as are countable Cartesian products of Polish spaces, equipped with the product topology.
0.2 Filtrations and stopping times
As before, let I be an interval in Z. A discrete filtration is a collection of σ-fields
(Fk )k∈I such that Fk ⊂ Fk+1 for all k, k + 1 ∈ I. If X = (Xk )k∈I is a stochastic
process, then
FkX := σ({Xj : j ∈ I, j ≤ k})   (k ∈ I)
is a filtration, called the filtration generated by X. For any filtration (Fk )k∈I , we
set
F∞ := σ(⋃_{k∈I} Fk).
In particular, F∞X = σ((Xk)k∈I).
A stochastic process X = (Xk)k∈I is adapted to a filtration (Fk)k∈I if Xk is Fk-measurable for each k ∈ I. Then (FkX)k∈I is the smallest filtration that X is
adapted to, and X is adapted to a filtration (Fk )k∈I if and only if FkX ⊂ Fk for all
k ∈ I.
Let (Fk)k∈I be a filtration. An Fk-stopping time is a function τ : Ω → I ∪ {∞}
such that the {0, 1}-valued process k 7→ 1{τ ≤k} is Fk -adapted. Obviously, this is
equivalent to the statement that
{τ ≤ k} ∈ Fk   (k ∈ I).
If (Xk )k∈I is an E-valued stochastic process and A ⊂ E is measurable, then the
first entrance time of X into A
τA := inf{k ∈ I : Xk ∈ A}
with inf ∅ := ∞ is an FkX -stopping time. More generally, the same is true for the
first entrance time of X into A after σ
τσ,A := inf{k ∈ I : k > σ, Xk ∈ A},
where σ is an Fk -stopping time. Deterministic times are stopping times (w.r.t.
any filtration). Moreover, if σ, τ are Fk -stopping times, then also
σ ∨ τ and σ ∧ τ
are Fk -stopping times. If f : I ∪ {∞} → I ∪ {∞} is measurable and f (k) ≥ k for
all k ∈ I, and τ is an Fk -stopping time, then also f (τ ) is an Fk -stopping time.
If X = (Xk )k∈I is an Fk -adapted stochastic process and τ is an Fk -stopping time,
then the stopped process
ω ↦ Xk∧τ(ω)(ω)   (k ∈ I)
is also an Fk-adapted stochastic process. If τ < ∞ a.s., then moreover ω ↦ Xτ(ω)(ω) is a random variable. If τ is an Fk-stopping time defined on some filtered
probability space (Ω, F, (Fk )k∈I , P) (with Fk ⊂ F for all k ∈ I), then the σ-field of
events observable before τ is defined as
Fτ := {A ∈ F∞ : A ∩ {τ ≤ k} ∈ Fk ∀k ∈ I}.
Exercise 0.1 If (Fk )k∈I is a filtration and σ, τ are Fk -stopping times, then show
that Fσ∧τ = Fσ ∧ Fτ .
Exercise 0.2 Let (Fk )k∈I be a filtration, let X = (Xk )k∈I be an Fk -adapted
stochastic process and let τ be an FkX-stopping time. Let Yk := Xk∧τ denote the stopped process. Show that the filtration generated by Y is given by
FkY = Fk∧τX   (k ∈ I ∪ {∞}).
In particular, since this formula holds also for k = ∞, one has
FτX = σ((Xk∧τ)k∈I),
i.e., FτX is the σ-algebra generated by the stopped process.
0.3 Martingales
By definition, a real stochastic process M = (Mk )k∈I , where I ⊂ Z is an interval,
is an Fk -submartingale with respect to some filtration (Fk )k∈I if M is Fk -adapted,
E[|Mk |] < ∞ for all k ∈ I, and
E[Mk+1 | Fk] ≥ Mk   ({k, k + 1} ⊂ I).   (0.1)
We say that M is a supermartingale if the reverse inequality holds, i.e., if −M
is a submartingale, and a martingale if equality holds in (0.1), i.e., M is both a
submartingale and a supermartingale. By induction, it is easy to show that (0.1)
holds more generally when k, k + 1 are replaced by more general times k, m ∈ I
with k ≤ m.
If M is an Fk-submartingale and (F′k)k∈I is a smaller filtration (i.e., F′k ⊂ Fk for all k ∈ I) that M is also adapted to, then

E[Mk+1 | F′k] = E[E[Mk+1 | Fk] | F′k] ≥ E[Mk | F′k] = Mk   ({k, k + 1} ⊂ I),

which shows that M is also an F′k-submartingale. In particular, a stochastic process M is a submartingale with respect to some filtration if and only if it is a
submartingale with respect to its own filtration (FkM )k∈I . In this case, we simply
say that M is a submartingale (resp. supermartingale, martingale).
Let (Fk)k∈I be a filtration and let (Fk−1)k∈I be the filtration shifted one step to the left, where we set Fk−1 := {∅, Ω} if k − 1 ∉ I. Let X = (Xk)k∈I be a real Fk-adapted stochastic process such that E[|Xk|] < ∞ for all k ∈ I. By definition, a compensator of X w.r.t. the filtration (Fk)k∈I is an Fk−1-adapted real process K = (Kk)k∈I such that E[|Kk|] < ∞ for all k ∈ I and (Xk − Kk)k∈I is an Fk-martingale. It is not hard to show that K is a compensator if and only if K is Fk−1-adapted, E[|Kk|] < ∞ for all k ∈ I, and

Kk+1 − Kk = E[Xk+1 | Fk] − Xk   ({k, k + 1} ⊂ I).
It follows that any two compensators must be equal up to an additive ⋂_{k∈I} Fk−1-measurable random constant. In particular, if I = N, then because of the way we have defined F−1, such a constant must be deterministic. In this case, it is customary to put K0 := 0, i.e., we call

Kn := Σ_{k=1}^{n} (E[Xk | Fk−1] − Xk−1)   (n ≥ 0)
the (unique) compensator of X with respect to the filtration (Fk )k∈N . We note
that X is a submartingale if and only if its compensator is a.s. nondecreasing.
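To make the compensator concrete, here is a small numerical illustration (the example is ours, not from the notes): for simple random walk S on Z, the process Xk := Sk² satisfies E[Xk+1 | Fk] = Xk + 1, so the compensator is Kn = n and Mn := Sn² − n is a martingale. A minimal Monte Carlo check in Python:

    import random

    # Illustrative sketch (not from the notes): for simple random walk S on Z,
    # X_k := S_k^2 has compensator K_n = n, so M_n := S_n^2 - n is a martingale
    # and E[M_n] = 0.  We check this by Monte Carlo.
    random.seed(0)
    n, trials = 50, 100_000
    total = 0.0
    for _ in range(trials):
        s = 0
        for _ in range(n):
            s += random.choice((-1, 1))
        total += s * s - n
    print("empirical E[S_n^2 - n] =", total / trials)  # close to 0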
The proof of the following basic fact can be found in, e.g., [Lach12, Thm 2.4].
Proposition 0.3 (Optional stopping) Let I ⊂ Z be an interval, (Fk )k∈I a
filtration, let τ be an Fk -stopping time and let (Mk )k∈I be an Fk -submartingale.
Then the stopped process (Mk∧τ )k∈I is an Fk -submartingale.
The following proposition is a special case of [Lach12, Prop. 2.1].
Proposition 0.4 (Conditioning on events up to a stopping time) Let I ⊂ Z
be an interval, (Fk )k∈I a filtration, let τ be an Fk -stopping time and let (Mk )k∈I
be an Fk -submartingale. Then
E[Mk | Fk∧τ] ≥ Mk∧τ   (k ∈ I).
0.4 Martingale convergence
If F, Fk (k ≥ 0) are σ-fields, then we say that Fk ↑ F if Fk ⊂ Fk+1 (k ≥ 0) and F = σ(⋃_{k≥0} Fk). Note that this is the same as saying that (Fk)k≥0 is a filtration and F = F∞, as we have defined it above. Similarly, if F, Fk (k ≥ 0) are σ-fields, then we say that Fk ↓ F if Fk ⊃ Fk+1 (k ≥ 0) and F = ⋂_{k≥0} Fk.
Exercise 0.5 Let (Fk )k∈N be a filtration and let τ be an Fk -stopping time. Show
that
Fk∧τ ↑ Fτ as k ↑ ∞.
The following proposition says that conditional expectations are continuous w.r.t.
convergence of σ-fields. A proof can be found in, e.g., [Lach12, Prop. 4.12], [Chu74,
Thm 9.4.8] or [Bil86, Thms 3.5.5 and 3.5.7].
Proposition 0.6 (Continuity in σ-field) Let X be a real random variable defined on a probability space (Ω, F, P) and let F∞ , Fk ⊂ F (k ≥ 0) be σ-fields.
Assume that E[|X|] < ∞ and Fk ↑ F∞ or Fk ↓ F∞ . Then
E[X | Fk] −→ E[X | F∞] as k → ∞, a.s. and in L¹-norm.
Note that if Fk ↑ F and E[|X|] < ∞, then Mk := E[X | Fk ] defines a martingale. Proposition 0.6 says that such a martingale always converges. Conversely,
we would like to know for which martingales (Mk )k≥0 there exists a final element
X such that Mk = E[X | Fk ] . This leads to the problem of martingale convergence. Since each submartingale is the sum of a martingale and a nondecreasing
compensator and since nondecreasing functions always converge, we may more or
less equivalently ask the same question for submartingales. For a proof of the
following fact we refer to, e.g., [Lach12, Thm 4.1].
Proposition 0.7 (Submartingale convergence) Let (Mk)k∈N be a submartingale such that sup_{k≥0} E[Mk ∨ 0] < ∞. Then there exists a random variable M∞ with E[|M∞|] < ∞ such that

Mk −→ M∞ as k → ∞ a.s.
In particular, this implies that nonnegative supermartingales converge almost
surely. The same is not true for nonnegative submartingales: a counterexample is
one-dimensional random walk reflected at the origin.
In general, even if M is a martingale, it need not be true that E[M∞ ] ≥ E[M0 ] (a
counterexample is random walk stopped at the origin). We recall that a collection
of random variables (Xk )k∈I is uniformly integrable if
lim_{n→∞} sup_{k∈I} E[|Xk| 1_{|Xk|≥n}] = 0.
Sufficient for this (and, by the De la Vallée-Poussin theorem, also necessary) is that sup_{k∈I} E[ψ(|Xk|)] < ∞, where ψ : [0, ∞) → [0, ∞) is
nonnegative, increasing, convex, and satisfies limr→∞ ψ(r)/r = ∞. Possible choices
are for example ψ(r) = r2 or ψ(r) = (1 + r) log(1 + r) − r. It is well-known that
uniform integrability and a.s. convergence of a sequence of real random variables
imply convergence in L1 -norm. For submartingales, the following result is known
[Lach12, Thm 4.8].
Proposition 0.8 (Final element) In addition to the assumptions of Proposition 0.7, assume that (Mk )k∈N is uniformly integrable. Then
E[|Mk − M∞|] −→ 0 as k → ∞
and E[M∞ |Fk ] ≥ Mk for all k ≥ 0. If M is a martingale, then E[M∞ |Fk ] = Mk
for all k ≥ 0.
Note that in particular, if M is a martingale, then Proposition 0.8 says that Mk =
E[M∞ |Fk ], which shows that all information about the martingale M is hidden in
its final element M∞ .
Combining Propositions 0.8 and 0.3, we see that if τ is an Fk-stopping time such that τ < ∞ a.s., (Mk)k∈N is an Fk-submartingale, and (Mk∧τ)k∈N is uniformly integrable, then E[Mτ] = lim_{k→∞} E[Mk∧τ] ≥ E[M0].
There also exist convergence results for ‘backward’ martingales (Mk )k∈{−∞,...,0} .
0.5 Markov chains
Proposition 0.9 (Markov property) Let (E, E) be a measurable space, let I ⊂
Z be an interval and let (Xk )k∈I be an E-valued stochastic process. For each n ∈ I,
set In− := {k ∈ I : k ≤ n} and In+ := {k ∈ I : k ≥ n}, and let FnX := σ((Xk )k∈In− )
be the filtration generated by X. Then the following conditions are equivalent.
(i) P[(Xk)k∈In− ∈ A, (Xk)k∈In+ ∈ B | Xn] = P[(Xk)k∈In− ∈ A | Xn] P[(Xk)k∈In+ ∈ B | Xn] a.s. for all A ∈ E^{In−}, B ∈ E^{In+}, n ∈ I.

(ii) P[(Xk)k∈In+ ∈ B | FnX] = P[(Xk)k∈In+ ∈ B | Xn] a.s. for all B ∈ E^{In+}, n ∈ I.

(iii) P[Xn+1 ∈ C | FnX] = P[Xn+1 ∈ C | Xn] a.s. for all C ∈ E, {n, n + 1} ⊂ I.
Remarks Property (i) says that the past and future are conditionally independent
given the present. Property (ii) says that the future depends on the past only
through the present, i.e., after we condition on the present, conditioning on the
whole past does not give any extra information. Property (iii) says that it suffices
to check (ii) for single time steps.
Proof of Proposition 0.9 Set GnX := σ((Xk )k∈In+ ). If (i) holds, then, for any
A ∈ FnX and B ∈ GnX , we have
E[1A P[B | Xn]] = E[E[1A P[B | Xn] | Xn]]
= E[E[1A | Xn] P[B | Xn]] = E[P[A | Xn] P[B | Xn]]
= E[P[A ∩ B | Xn]] = P[A ∩ B],
where we have pulled the σ(Xn )-measurable random variable P[B | Xn ] out of the
conditional expectation. Since this holds for arbitrary A ∈ FnX and since P[B |Xn ]
is FnX -measurable, it follows from the definition of conditional expectations that
P[B | FnX ] = P[B |Xn ],
which is just another way of writing (ii). Conversely, if (ii) holds, then for any
C ∈ σ(Xn ),
E[P[A | Xn] P[B | Xn] 1C] = E[P[B | Xn] 1C E[1A | Xn]]
= E[E[P[B | Xn] 1C 1A | Xn]] = E[1A∩C P[B | FnX]] = P[A ∩ B ∩ C].
Since this holds for any C ∈ σ(Xn ), it follows from the definition of conditional
expectations that
P[A ∩ B | Xn ] = P[A | Xn ]P[B | Xn ] a.s.
To see that (iii) is sufficient for (ii), one first proves by induction that
P[Xn+1 ∈ C1, . . . , Xn+m ∈ Cm | FnX] = P[Xn+1 ∈ C1, . . . , Xn+m ∈ Cm | Xn],
and then uses that events of this sort uniquely determine conditional probabilities
of events in GnX .
If a process X = (Xk )k∈I satisfies the equivalent conditions of Proposition 0.9,
then we say that X has the Markov property. For processes with countable state
spaces, there is an easier formulation.
Proposition 0.10 (Markov chains) Let I ⊂ Z be an interval and let X =
(Xk )k∈I be a stochastic process taking values in a countable space S. Then X has
the Markov property if and only if for each {k, k + 1} ⊂ I there exists a probability
kernel Pk,k+1 (x, y) on S such that
P[Xk = xk, . . . , Xk+n = xk+n]
= P[Xk = xk] Pk,k+1(xk, xk+1) · · · Pk+n−1,k+n(xk+n−1, xk+n)   (0.2)
for all {k, . . . , k + n} ⊂ I, xk , . . . , xk+n ∈ S.
Proof See, e.g., [LP11, Thm 2.1].
If I = N, then Proposition 0.10 shows that the finite dimensional distributions, and hence the whole law of a Markov chain X, are determined by its initial law P[X0 ∈ · ] and its transition probabilities Pk,k+1(x, y). If the initial law and transition probabilities are given, then it is easy to see that the probability laws defined by (0.2) are consistent, hence by the Daniell-Kolmogorov extension theorem, there exists a Markov chain X, unique in distribution, with this initial law and transition probabilities.
We note that conversely, a Markov chain X determines its transition probabilities
Pk,k+1 (x, y) only for a.e. x ∈ S w.r.t. the law of Xk . If it is possible to choose
the transition kernels Pk,k+1 in such a way that they do not depend on k, then
we say that the Markov chain is homogeneous. We are usually not interested in
the problem of finding Pk,k+1 given X, but typically we start with a given probability
kernel P on S and are interested in all Markov chains that have P as their transition
probability in each time step, and arbitrary initial law. Often, the word Markov
chain is used in this more general sense. Thus, we often speak of ‘the Markov chain
with state space S that jumps from x to y with probability. . . ’ without specifying
the initial law. From now on, we use the convention that all Markov chains are
homogeneous, unless explicitly stated otherwise. Moreover, if we don’t mention the
initial law, then we mean the process started in an arbitrary initial law.
We note from Proposition 0.9 (i) that the Markov property is symmetric under time reversal, i.e., if (Xk)k∈I has the Markov property and I′ := {−k : k ∈ I}, then the process X′ = (X′k)k∈I′ defined by X′k := X−k (k ∈ I′) also has the Markov property. It is usually not true, however, that X′ is homogeneous if X is. An exception is the class of stationary processes, which leads to the useful concept of reversible
laws.
Exercise 0.11 (Stopped Markov chain) Let X = (Xk )k≥0 be a Markov chain
with countable state space S and transition kernel P , let A ⊂ S and let τA :=
inf{k ≥ 0 : Xk ∈ A} be the first entrance time of A. Let Y be the stopped process
Yk := Xk∧τA (k ≥ 0). Show that Y is a Markov chain and determine its transition
kernel. If we replace τA by the second entrance time of A, is Y then still Markov?
By definition, a random mapping representation of a probability kernel P on a
countable state space S is a probability space (E, E, µ) together with a measurable
function f : S × E → S such that if Z1 is a random variable taking values in (E, E)
with law µ, then
P(x, y) = P[f(x, Z1) = y]   (x, y ∈ S).
If (Zk )k≥1 are i.i.d. random variables with values in (E, E) and common law µ
and X0 is a random variable taking values in S, independent of the (Zk )k≥1 , then
setting, inductively,
Xk := f(Xk−1, Zk)   (k ≥ 1)
defines a Markov chain with transition kernel P and initial state X0 . Random
mapping representations are essential for simulating Markov chains on a computer.
In addition, they have plenty of theoretical applications, for example for coupling
Markov chains with different initial states. (See Section 1.3 for an introduction
to coupling.) One can show that each probability kernel has a random mapping
representation, but such a representation is far from unique. Often, the key to a
good proof is choosing the right one.
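As a concrete illustration (the chain and the function f below are our own choices, not from the notes), here is a minimal random mapping representation in Python, simulating a lazy random walk on the cycle Z5 from i.i.d. uniform variables Z1, Z2, . . .:

    import random

    # Sketch of a random mapping representation: E = [0,1) with the uniform
    # law mu, and f(x, z) realizes the kernel P of lazy random walk on Z_5.
    def f(x, z):
        if z < 0.5:        # stay put with probability 1/2
            return x
        elif z < 0.75:     # step right with probability 1/4
            return (x + 1) % 5
        else:              # step left with probability 1/4
            return (x - 1) % 5

    random.seed(1)
    X = [0]                                    # X_0 := 0
    for k in range(20):
        X.append(f(X[-1], random.random()))    # X_k := f(X_{k-1}, Z_k)
    print(X)

Different choices of f with the same one-step law give different couplings of the same chain, which is exactly the freedom exploited in Section 1.3.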
We note that it is in general not true that functions of Markov chains are themselves
Markov chains. An exception is the case when
P[f(Xk+1) ∈ · | FkX]
depends only on f (Xk ). In this case, we say that f (X) is an autonomous Markov
chain.
Lemma 0.12 (Autonomous Markov functional) Let X = (Xk)k∈I be a Markov chain with countable state space S and transition kernel P. Let S′ be a countable set and let f : S → S′ be a function. Assume that there exists a probability kernel P′ on S′ such that

P′(x′, y′) = Σ_{y: f(y)=y′} P(x, y)   (x ∈ S, f(x) = x′).
Then f(X) := (f(Xk))k∈I is a Markov chain with state space S′ and transition kernel P′.
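The condition of Lemma 0.12 can be checked mechanically. A hedged sketch (the example is ours): for simple random walk on Z and f(x) = |x|, the sums Σ_{y: f(y)=y′} P(x, y) depend on x only through |x|, so |X| is itself Markov, as used for the tree walk in Section 1.2:

    # Sketch (our example): verify the condition of Lemma 0.12 for simple
    # random walk on Z and f(x) = |x|, on a finite window of states.
    def P(x, y):
        return 0.5 if abs(x - y) == 1 else 0.0

    states = range(-10, 11)
    for xp in range(1, 9):             # levels x' = |x| away from the window edge
        for yp in (xp - 1, xp + 1):
            sums = {sum(P(x, y) for y in states if abs(y) == yp)
                    for x in states if abs(x) == xp}
            assert len(sums) == 1      # same value for x = x' and x = -x'
            print("P'(%d,%d) = %.1f" % (xp, yp, sums.pop()))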
0.6 Kernels, operators and linear algebra
Let X = (Xk )k∈I be a stochastic process taking values in a countable space S, and
let P be a probability kernel on S. Then it is not hard to see that X is a Markov
process with transition kernel P (and arbitrary initial law) if and only if
P[Xk+1 = y | FkX] = P(Xk, y) a.s.   (y ∈ S, {k, k + 1} ⊂ I),
where (FkX )k∈I is the filtration generated by X. More generally, one has
P[Xk+n = y | FkX] = P^n(Xk, y) a.s.   (y ∈ S, n ≥ 0, {k, k + n} ⊂ I),
where P n denotes the n-th power of the transition kernel P . Here, if K, L are
probability kernels on S, then we define their product as
X
KL(x, z) :=
K(x, y)L(y, z)
(x, z ∈ S),
y∈S
which is again a probability kernel on S. Then K n is defined as the product of n
times K with itself, where K 0 (x, y) := 1{x=y} . We may associate each probability
kernel on S with a linear operator, acting on bounded real functions f : S → R,
defined as
X
Kf (x) :=
K(x, y)f (y)
(x ∈ S).
y∈S
Then KL is just the composition of the operators K and L, and for each bounded
f : S → R, one has
E[f(Xk+n) | FkX] = P^n f(Xk) a.s.   (n ≥ 0, {k, k + n} ⊂ I),   (0.3)
and this formula holds more generally provided the expectations are well-defined
(e.g., if E[|f (Xk+n )|] < ∞ or f ≥ 0).
If µ is a probability measure on S and K is a probability kernel on S, then we
may define a new probability measure µK on S by
µK(y) := Σ_{x∈S} µ(x)K(x, y)   (y ∈ S).
In this notation, if X is a Markov process with transition kernel P and initial law
P[X0 ∈ · ] = µ, then P[Xn ∈ · ] = µP n is its law at time n.
We may view transition kernels as (possibly infinite) matrices that act on row
vectors µ or column vectors f by left and right multiplication, respectively.
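A quick numerical illustration (the three-state kernel is our own example): with kernels viewed as matrices, µP^n is a left-multiplication and P^n f a right-multiplication:

    import numpy as np

    # Sketch: a 3-state kernel acting on a row vector mu and a column vector f.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.3, 0.5],
                  [0.0, 0.4, 0.6]])
    mu = np.array([1.0, 0.0, 0.0])   # initial law delta_0
    f = np.array([0.0, 1.0, 2.0])    # a bounded function on S = {0, 1, 2}

    Pn = np.linalg.matrix_power(P, 4)
    print("mu P^n =", mu @ Pn)       # law of X_n
    print("P^n f  =", Pn @ f)        # x -> E^x[f(X_n)]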
0.7 Strong Markov property
We assume that the reader is familiar with some basic facts about Markov chains,
as taught in the lecture [LP11]. In particular, this concerns the strong Markov
property, which we formulate now.
Let X = (Xk )k≥0 be a Markov chain with countable state space S and transition
kernel P . As usual, it goes without saying that X is homogeneous (i.e., we use the
same P in each time step) and when we don’t mention the initial law, we mean
the process started in an arbitrary initial law. Often, it is notationally convenient
to assume that our process X is always the same, while the dependence on the
initial law is expressed in the choice of the probability measure on our underlying
probability space.
More precisely, we assume that we have a measurable space (Ω, F) and a collection
X = (Xk )k≥0 of measurable maps Xk : Ω → S, as well as a collection (Px )x∈S of
probability measures on (Ω, F), such that under the measure Px , the process X is
a Markov chain with initial law Px[X0 = x] = 1 and transition kernel P. In this set-up, if µ is any probability measure on S, then under the law P := Σ_{x∈S} µ(x)Px,
the process X is distributed as a Markov chain with initial law µ and transition
kernel P .
If X, P, Px are as just described and (FkX)k≥0 is the filtration generated by X, then it follows from Proposition 0.9 (ii) and homogeneity that

P[(Xn+k)k≥0 ∈ · | FnX] = PXn[(Xk)k≥0 ∈ · ]   a.s.   (0.4)
Here, for fixed n ≥ 0, we consider (Xn+k )k≥0 as a random variable taking values
in S N (i.e., this is the process Y defined by Yk := Xn+k (k ≥ 0)). Since S N is a
nice (in particular Polish) space, we can choose a regular version of the conditional
probability on the left-hand side of (0.4), i.e., this is a random probability measure
on S N . Since Xn is random, the same is true for the right-hand side. In words,
formula (0.4) says that given the process up to time n, the process after time n is
distributed as the process started in Xn . The strong Markov property extends this
to stopping times.
Proposition 0.13 (Strong Markov property) Let X, P, Px be as defined above.
Then, for any FkX -stopping time τ such that τ < ∞ a.s., one has
P[(Xτ+k)k≥0 ∈ · | FτX] = PXτ[(Xk)k≥0 ∈ · ]   a.s.   (0.5)
Proof This follows from [LP11, Thm 2.3].
Remark 1 Even if P[τ = ∞] > 0, formula (0.5) still holds a.s. on the event
{τ < ∞}.
Remark 2 Homogeneity is essential for the strong Markov property, at least in
the (useful) formulation of (0.5).
Since this is closely related to formula (0.4), we also mention the following useful
principle here.
Proposition 0.14 (What can happen must eventually happen) Let X =
(Xk )k≥0 be a Markov chain with countable state space S. Let B ⊂ S N be measurable
and set ρ(x) := Px [(Xk )k≥0 ∈ B]. Then the event
{(Xn+k)k≥0 ∈ B for infinitely many n ≥ 0} ∪ {ρ(Xn) −→ 0 as n → ∞}
has probability one.
Proof Let A denote the event that (Xn+k )k≥0 ∈ B for some n ≥ 0. Then by
Proposition 0.6,
ρ(Xn) = PXn[(Xk)k≥0 ∈ B] = P[(Xn+k)k≥0 ∈ B | FnX]
≤ P[A | FnX] −→ P[A | F∞X] = 1A a.s. as n → ∞.
In particular, this shows that ρ(Xn ) → 0 a.s. on the event Ac . Applying the same
argument to Am := {(Xn+k )k≥0 ∈ B for some n ≥ m}, we see that the event
Am ∪ {ρ(Xn ) → 0} has probability one for each m. Letting m ↑ ∞ and observing
that Am ↓ {(Xn+k )k≥0 ∈ B for infinitely many n ≥ 0}, the claim follows.
0.8 Classification of states
Let X be a Markov chain with countable state space S and transition kernel P .
For each x, y ∈ S, we write x ⇝ y if P^n(x, y) > 0 for some n ≥ 0. Note that x ⇝ y ⇝ z implies x ⇝ z. Two states x, y are called equivalent if x ⇝ y and y ⇝ x. It is easy to see that this defines an equivalence relation on S. A Markov chain / transition kernel is called irreducible if all states are equivalent.
A state x is called recurrent if
Px [Xk = x for some k ≥ 1] = 1,
otherwise it is called transient. If two states are equivalent and one of them is
recurrent (resp. transient), then so is the other. Fix x ∈ S, let τ0 := 0 and let
τk := inf{n > τk−1 : Xn = x}   (k ≥ 1)

be the time of the k-th visit to x after time zero. Consider the process started
in X0 = x. If x is recurrent, then τ1 < ∞ a.s. It follows from the strong Markov
property that τ2 − τ1 is equally distributed with and independent of τ1 . By induction, (τk − τk−1 )k≥1 are i.i.d. In particular, τk < ∞ for all k ≥ 1, i.e., the process
returns to x infinitely often.
On the other hand, if x is transient, then by the same sort of argument we see that the number Nx := Σ_{k≥1} 1_{Xk=x} of returns to x is geometrically distributed:

Px[Nx = n] = θ^n(1 − θ), where θ := Px[Xk = x for some k ≥ 1].
In particular, Ex [Nx ] < ∞ if and only if x is transient.
Lemma 0.15 (Recurrent classes are closed) Let X be a Markov chain with
countable state space S and transition kernel P. Assume that S′ ⊂ S is an equivalence class of recurrent states. Then P(x, y) = 0 for all x ∈ S′, y ∈ S\S′.

Proof Imagine that P(x, y) > 0 for some x ∈ S′, y ∈ S\S′. Then, since S′ is an equivalence class, we cannot have y ⇝ x, i.e., the process cannot return from y to x. Since P(x, y) > 0, this shows that the process started in x has a positive probability never to return to x, a contradiction.
A state x is called positively recurrent if
Ex [inf{n ≥ 1 : Xn = x}] < ∞.
Recurrent states that are not positively recurrent are called null recurrent. If two
states are equivalent and one of them is positively recurrent (resp. null recurrent),
then so is the other. From this, it is easy to see that a finite equivalence class of
states can never be null recurrent.
The following lemma is an easy application of the principle ‘what can happen must
happen’ (Proposition 0.14).
Lemma 0.16 (Finite state space) Let X = (Xk )k≥0 be a Markov chain with
finite state space S and transition kernel P . Let Spos denote the set of all positively
recurrent states. Then P[Xk ∈ Spos for some k ≥ 0] = 1.
By definition, the period of a state x is the greatest common divisor of {n ≥ 1 : P^n(x, x) > 0}. Equivalent states have the same period. States with period one are
called aperiodic. If x is an aperiodic state, then there exists an m ≥ 1 such that
P n (x, x) > 0 for all n ≥ m. Irreducible Markov chains are called aperiodic if one,
and hence all states have period one. If X = (Xk )k≥0 is an irreducible Markov
chain with period n, then Xk0 := Xkn (k ≥ 0) defines an aperiodic Markov chain
X 0 = (Xk0 )k≥0 .
The following example is of special importance.
Lemma 0.17 (Recurrence of one-dimensional random walk) The Markov
chain X with state space Z and transition kernel P(k, k − 1) = P(k, k + 1) = 1/2 is null recurrent.
Proof Note that this Markov chain is irreducible and has period two, as it takes values alternately in the even and odd integers. Using Stirling’s formula, it is not
hard to show that (see [LP11, Example 2.9])
P^{2k}(0, 0) ∼ 1/√(πk)   as k → ∞.
In particular, this shows that the expected number of returns to the origin E0[N0] = Σ_{k=1}^∞ P^{2k}(0, 0) is infinite, hence X is recurrent.
To prove that X is not positively recurrent, we jump a bit ahead. By Theorem 0.18 (a) and (b) below, an irreducible Markov chain is positively recurrent
if and only if it has an invariant law. It is not hard to check that any invariant
measure for X must be infinite, hence X has no invariant law, so it cannot be
positively recurrent.
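The asymptotics used in this proof are easy to check numerically (this check is ours, not part of the notes):

    import math

    # Check P^{2k}(0,0) = C(2k,k) 2^{-2k} against the Stirling asymptotics
    # 1/sqrt(pi k) for simple random walk on Z.
    for k in (10, 100, 1000):
        exact = math.comb(2 * k, k) / 4 ** k
        print(k, exact, 1 / math.sqrt(math.pi * k))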
We will later see that, more generally, random walks on Zd are recurrent in dimensions d = 1, 2 and transient in dimensions d ≥ 3.
0.9 Invariant laws
By definition, an invariant law for a Markov process with transition kernel P and
countable state space S is a probability measure µ on S that is invariant under
left-multiplication with P, i.e., µP = µ, or, written out per coordinate,

Σ_{y∈S} µ(y)P(y, x) = µ(x)   (x ∈ S).
More generally, a (possibly infinite) measure µ on S satisfying this equation is
called an invariant measure. A probability measure µ on S is an invariant law
if and only if the process (Xk )k≥0 started in the initial law P[X0 ∈ · ] = µ is
(strictly) stationary. If µ is an invariant law, then there also exists a stationary
process X = (Xk )k∈Z , unique in distribution, such that X is a Markov process with
transition kernel P and P[Xk ∈ · ] = µ for all k ∈ Z (including negative times).
A detailed proof of the following theorem can be found in [LP11, Thms 2.10 and
2.26].
Theorem 0.18 (Invariant laws) Let X be a Markov chain with countable state
space S and transition kernel P . Then
(a) If µ is an invariant law and x is not positively recurrent, then µ(x) = 0.
(b) If S′ ⊂ S is an equivalence class of positively recurrent states, then there exists a unique invariant law µ of X such that µ(x) > 0 for all x ∈ S′ and µ(x) = 0 for all x ∈ S\S′.
(c) The invariant law µ from part (b) is given by

µ(x) = Ex[inf{k ≥ 1 : Xk = x}]^{−1}.   (0.6)
Sketch of proof For any x ∈ S, define µ(x) as in (0.6), with 1/∞ := 0. Since
consecutive return times are i.i.d., it is not hard to prove that
lim_{n→∞} (1/n) Σ_{k=1}^{n} Px[Xk = x] = µ(x),   (0.7)
i.e., the process started in x spends a µ(x)-fraction of its time in x. As a result, it
is not hard to show that if x is transient or null-recurrent, then the process started
in any initial law satisfies P[Xn = x] → 0 for n → ∞, hence no invariant law can
give positive probability to such a state.
On the other hand, if S′ ⊂ S is an equivalence class of positively recurrent states,
then one can check that (0.7) holds more generally for the process started in any
initial law on S′. It follows that for any such process, the Cesàro limit

µ = lim_{n→∞} (1/n) Σ_{k=1}^{n} P[Xk ∈ · ]
exists and does not depend on the initial law. In particular, if we start in an
invariant law, then this limit must be µ, which proves uniqueness. It is not hard
to check that any such Cesàro limit must be an invariant law, from which we obtain
existence.
Remark Using Lemma 0.15, it is not hard to prove that a general invariant law
of the process is a convex combination of invariant laws that are concentrated on
one equivalence class of positively recurrent states.
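Both statements of Theorem 0.18 are easy to test on a small example (the kernel below is ours): solve µP = µ for the invariant law and compare with the reciprocal mean return time from formula (0.6):

    import numpy as np

    # Sketch: invariant law of a small irreducible kernel, checked against
    # mu(x) = 1 / E^x[inf{k >= 1 : X_k = x}] from (0.6).
    rng = np.random.default_rng(0)
    P = np.array([[0.1, 0.9, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.3, 0.3, 0.4]])

    A = np.vstack([P.T - np.eye(3), np.ones(3)])   # mu P = mu, sum(mu) = 1
    b = np.array([0.0, 0.0, 0.0, 1.0])
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    print("mu =", mu)

    # Monte Carlo estimate of 1 / E^0[return time to 0]:
    state, steps, hits, total = 0, 0, 0, 0
    for _ in range(200_000):
        state = rng.choice(3, p=P[state])
        steps += 1
        if state == 0:
            hits += 1
            total += steps
            steps = 0
    print("1 / E^0[return time] ~", hits / total)   # should be close to mu[0]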
Theorem 0.19 (Convergence to invariant law) Let X be an irreducible positively recurrent Markov chain with invariant law µ. Then the process started in
any initial law satisfies
(1/n) Σ_{k=0}^{n−1} P[Xk = x] −→ µ(x) as n → ∞   (x ∈ S).
If X is moreover aperiodic, then one also has that
P[Xk = x] −→ µ(x) as k → ∞   (x ∈ S).
On the other hand, if X is a Markov chain such that all states of X are transient
or null recurrent, then the process started in any initial law satisfies
P[Xk = x] −→ 0 as k → ∞   (x ∈ S).
Proof See [LP11, Thm 2.26].
If µ is an invariant law and X = (Xk)k∈Z is a stationary process such that P[Xk ∈ · ] = µ for all k ∈ Z, then by the symmetry of the Markov property w.r.t. time reversal, the process X′ = (X′k)k∈Z defined by X′k := X−k (k ∈ Z) is also a Markov process. By stationarity, X′ is moreover homogeneous, i.e., there exists a transition kernel P′ such that the transition probabilities P′k,k+1 of X′ satisfy P′(x, y) = P′k,k+1(x, y) for a.e. x w.r.t. µ. In general, it will not be true that P′ = P. We say that µ is a reversible law if µ is invariant and, in addition, the stationary processes X and X′ are equal in law. One can check that this is equivalent to the detailed balance condition

µ(x)P(x, y) = µ(y)P(y, x)   (x, y ∈ S),

which says that the process X started in P[X0 ∈ · ] = µ satisfies P[X0 = x, X1 = y] = P[X0 = y, X1 = x]. More generally, a (possibly infinite) measure µ on S satisfying detailed balance is called a reversible measure. If µ is a reversible measure and we define a (semi-) inner product of real functions f, g : S → R by

⟨f, g⟩µ := Σ_{x∈S} f(x)g(x)µ(x),

then P is self-adjoint w.r.t. this inner product:

⟨f, Pg⟩µ = ⟨Pf, g⟩µ.
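For a finite chain, detailed balance and the resulting self-adjointness are a one-line check (the reflected walk below is our own example):

    import numpy as np

    # Sketch: random walk on {0,1,2} with reflecting boundaries; the uniform
    # law is reversible.  Check detailed balance and <f,Pg>_mu = <Pf,g>_mu.
    P = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.5, 0.5]])
    mu = np.full(3, 1 / 3)
    D = np.diag(mu)

    print(np.allclose(D @ P, (D @ P).T))   # mu(x)P(x,y) = mu(y)P(y,x)

    rng = np.random.default_rng(2)
    f, g = rng.normal(size=3), rng.normal(size=3)
    print(np.isclose(f @ D @ (P @ g), (P @ f) @ D @ g))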
Chapter 1
Harmonic functions
1.1 (Sub-) harmonicity
Let X be a Markov chain with countable state space S and transition kernel P .
As we have seen, an invariant law of X is a vector that is invariant under left-multiplication with P. Harmonic functions¹ are functions that are invariant under right-multiplication with P. More precisely, we will say that a function h : S → R is subharmonic for X if

Σ_y P(x, y)|h(y)| < ∞   (x ∈ S),

and

h(x) ≤ Σ_y P(x, y)h(y)   (x ∈ S).
We say that h is superharmonic if −h is subharmonic, and harmonic if it is both
subharmonic and superharmonic.
Lemma 1.1 (Harmonic functions and martingales) Assume that h is subharmonic for the Markov chain X = (Xk)k≥0 and that E[|h(Xk)|] < ∞ (k ≥ 0). Then Mk := h(Xk) (k ≥ 0) defines a submartingale M = (Mk)k≥0 w.r.t. the filtration (FkX)k≥0 generated by X.
¹Historically, the term harmonic function was first used, and is still commonly used, for a smooth function f : U → R, defined on some open domain U ⊂ R^d, that solves the Laplace equation Σ_{i=1}^d ∂²f/∂x_i²(x) = 0. This is basically the same as our definition, but with our Markov chain X replaced by Brownian motion B = (Bt)t≥0. Indeed, a smooth function f solves the Laplace equation if and only if (f(Bt))t≥0 is a local martingale.
Proof This follows by writing (using (0.3)),

E[h(Xk+1) | FkX] = Σ_y P(Xk, y)h(y) ≥ h(Xk)   (k ≥ 0).
We will say that a state x is an absorbing state or trap for a Markov chain X if
P (x, x) = 1.
Lemma 1.2 (Trapping probability) Let X be a Markov chain with countable
state space S and transition kernel P , and let z ∈ S be a trap. Then the trapping
probability

h(x) := Px[Xk = z for some k ≥ 0]

is a harmonic function for X.
Proof Since 0 ≤ h ≤ 1, integrability is not an issue. Now

h(x) = Px[Xk = z for some k ≥ 0]
= Σ_y Px[Xk = z for some k ≥ 0 | X1 = y] Px[X1 = y]
= Σ_y P(x, y) Py[Xk = z for some k ≥ 0] = Σ_y P(x, y)h(y).
Lemma 1.3 (Trapping estimates) Let X be a Markov chain with countable
state space S and transition kernel P , and let T := {z ∈ S : z is a trap}. Assume
that the chain gets trapped a.s., i.e., P[∃n ≥ 0 s.t. Xn ∈ T ] = 1 (regardless of the
initial law). Let z ∈ T and let h : S → [0, 1] be a subharmonic function such that
h(z) = 1 and h ≡ 0 on T \{z}. Then
h(x) ≤ Px[Xk = z for some k ≥ 0].
If h is superharmonic, then the same holds with the inequality sign reversed.
Proof Since h is subharmonic, Mk := h(Xk ) is a submartingale. Since h is
bounded, M is uniformly integrable. Therefore, by Propositions 0.7 and 0.8,
Mk → M∞ a.s. and in L1 -norm, where M∞ is some random variable such that
Ex [M∞ ] ≥ M0 = h(x). Since the chain gets trapped a.s., we have M∞ = h(Xτ ),
where τ := inf{k ≥ 0 : Xk ∈ T } is the trapping time. Since h(z) = 1 and h ≡ 0
on T \{z}, we have M∞ = 1{Xτ =z} and therefore Px [Xτ = z] = Ex [M∞ ] ≥ h(x). If
h is superharmonic, the same holds with the inequality sign reversed.
Remark 1 If S′ ⊂ S is a ‘closed’ set in the sense that P(x, y) = 0 for all x ∈ S′, y ∈ S\S′, then define φ : S → (S\S′) ∪ {∗} by φ(x) := ∗ if x ∈ S′ and φ(x) := x if x ∈ S\S′. Now (φ(Xk))k≥0 is a Markov chain that gets trapped in ∗ if and only
if the original chain enters the closed set S 0 . In this way, Lemma 1.3 can easily be
generalized to Markov chains that eventually get ‘trapped’ in one of finitely many
equivalence classes of recurrent states. In particular, this applies when S is finite.
Remark 2 Lemma 1.3 tells us in particular that, provided that the chain gets
trapped a.s., the function h from Lemma 1.2 is the unique harmonic function
satisfying h(z) = 1 and h ≡ 0 on T \{z}. For a more general statement of this
type, see Exercise 1.7 below.
Remark 3 Even in situations where it is not feasible to calculate trapping probabilities exactly, Lemma 1.3 can sometimes be used to derive lower and upper bounds for these trapping probabilities.
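One way to compute such trapping probabilities in practice (a sketch with our own example, gambler's ruin on {0, . . . , 5}): solve the linear system h = Ph on the non-trap states with the boundary values of Lemma 1.3:

    import numpy as np

    # Sketch: trapping probabilities for gambler's ruin on {0,...,N} with
    # up-probability p; traps are 0 and N, and h(x) = P^x[trapped in N].
    p, N = 0.4, 5
    P = np.zeros((N + 1, N + 1))
    P[0, 0] = P[N, N] = 1.0
    for x in range(1, N):
        P[x, x + 1], P[x, x - 1] = p, 1 - p

    A = np.eye(N + 1) - P          # (I - P)h = 0 on the interior
    A[0, :], A[N, :] = 0.0, 0.0
    A[0, 0] = A[N, N] = 1.0        # boundary rows: h(0) = 0, h(N) = 1
    b = np.zeros(N + 1)
    b[N] = 1.0
    h = np.linalg.solve(A, b)
    print(h)   # matches ((q/p)^x - 1)/((q/p)^N - 1) with q = 1 - p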
The following transformation is usually called an h-transform or Doob’s h-transform. Following [LPW09], we will simply call it a Doob transform.²
Lemma 1.4 (Doob transform) Let X be a Markov chain with countable state
space S and transition kernel P , and let h : S → [0, ∞) be a nonnegative harmonic
function. Then setting S′ := {x ∈ S : h(x) > 0} and

P^h(x, y) := P(x, y)h(y)/h(x)   (x, y ∈ S′)

defines a transition kernel P^h on S′.
Proof Obviously P^h(x, y) ≥ 0 for all x, y ∈ S′. Since

Σ_{y∈S′} P^h(x, y) = h(x)^{−1} Σ_{y∈S′} P(x, y)h(y) = h(x)^{−1} Ph(x) = 1   (x ∈ S′),

P^h is a transition kernel.
²The term h-transform is somewhat inconvenient for several reasons. First of all, having mathematical symbols in names of chapters or articles causes all kinds of problems for referencing. Secondly, if one performs an h-transform with a function g, then should one speak of a g-transform or an h-transform? The situation becomes even more confusing when there are several functions around, one of which may be called h.
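Continuing the gambler's-ruin example from above (ours, not from the notes), the Doob transform of the walk by the trapping probability h gives the kernel of the walk conditioned to end up in N, anticipating Proposition 1.5:

    import numpy as np

    # Sketch: Doob transform P^h of gambler's ruin on {0,...,N} (p = 0.4) by
    # the harmonic function h(x) = P^x[trapped in N]; S' = {x : h(x) > 0}.
    p, N = 0.4, 5
    P = np.zeros((N + 1, N + 1))
    P[0, 0] = P[N, N] = 1.0
    for x in range(1, N):
        P[x, x + 1], P[x, x - 1] = p, 1 - p

    q = (1 - p) / p
    h = (q ** np.arange(N + 1) - 1) / (q ** N - 1)   # harmonic, h(0)=0, h(N)=1
    S1 = h > 0                                       # S' = {1,...,N}
    Ph = P[np.ix_(S1, S1)] * h[S1] / h[S1][:, None]  # P^h(x,y) = P(x,y)h(y)/h(x)
    print(Ph.sum(axis=1))                            # each row sums to 1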
Proposition 1.5 (Conditioning on the future) Let X = (Xk )k≥0 be a Markov
chain with countable state space S and transition kernel P , and let z ∈ S be a
trap. Set S′ := {y ∈ S : y ⇝ z} and assume that P[X0 ∈ S′] > 0. Then, under the conditional law

P[(Xk)k≥0 ∈ · | Xm = z for some m ≥ 0],

the process X is a Markov process in S′ with Doob-transformed transition kernel P^h, where

h(x) := Px[Xm = z for some m ≥ 0]

satisfies h(x) > 0 if and only if x ∈ S′.
Proof Using the Markov property (in its strong form (0.4)), we observe that

P[Xn+1 = y | (Xk)0≤k≤n = (xk)0≤k≤n, Xm = z for some m ≥ 0]
= P[Xn+1 = y | (Xk)0≤k≤n = (xk)0≤k≤n, Xm = z for some m ≥ n + 1]
= P[Xn+1 = y, Xm = z for some m ≥ n + 1 | (Xk)0≤k≤n = (xk)0≤k≤n]
  / P[Xm = z for some m ≥ n + 1 | (Xk)0≤k≤n = (xk)0≤k≤n]
= P(xn, y) Py[Xm = z for some m ≥ 0] / Pxn[Xm = z for some m ≥ 1] = P^h(xn, y)

for each (xk)0≤k≤n and y such that P[(Xk)0≤k≤n = (xk)0≤k≤n] > 0 and xn, y ∈ S′.
Remark At first sight, it is surprising that conditioning on the future may preserve the Markov property. What is essential here is that being trapped in z is a tail event, i.e., an event measurable w.r.t. the tail-σ-algebra

T := ⋂_{k≥0} σ(Xk, Xk+1, . . .).

Similarly, if we condition a Markov chain (Xk)0≤k≤n that is defined on a finite time interval on its final state Xn, then under the conditional law, (Xk)0≤k≤n is still Markov, although no longer homogeneous.
Exercise 1.6 (Sufficient conditions for integrability) Let h : S → R be any function. Assume that E[|h(X0)|] < ∞ and that there exists a constant K < ∞ such that Σ_y P(x, y)|h(y)| ≤ K|h(x)| (x ∈ S). Show that E[|h(Xk)|] < ∞ (k ≥ 0).
Exercise 1.7 (Boundary conditions) Let X be a Markov chain with countable
state space S and transition kernel P , and let T := {z ∈ S : z is a trap}. Assume
that the chain gets trapped a.s., i.e., P[∃n ≥ 0 s.t. Xn ∈ T ] = 1 (regardless
of the initial law). Show that for each real function φ : T → R there exists a
unique bounded harmonic function h : S → R such that h = φ on T . Hint: take
h(x) := E[φ(Xτ )], where τ := inf{k ≥ 0 : Xk ∈ T } is the trapping time.
Exercise 1.8 (Conditions for getting trapped) If we do not know a priori that
a Markov chain eventually gets trapped, then the following fact is often useful. Let
X be a Markov chain with countable state space S and transition kernel P , and
let h : S → [0, 1] be a sub- or superharmonic function. Assume that for all ε > 0
there exists a δ > 0 such that
Px[|h(X1) − h(x)| ≥ δ] ≥ δ whenever ε ≤ h(x) ≤ 1 − ε.
Show that limk→∞ h(Xk ) ∈ {0, 1} a.s. Hint: use martingale convergence to prove
that limk→∞ h(Xk ) exists and use the principle ‘what can happen must happen’
(Proposition 0.14) to show that the limit cannot take values in (0, 1).
Exercise 1.9 (Trapping estimate) Let X, S, P, and h be as in Exercise 1.8. Assume that (h(Xk))k≥0 is a submartingale and that there is a point z ∈ S such that h(z) = 1
and supx∈S\{z} h(x) < 1. Show that
h(x) ≤ Px [Xk = z for some k ≥ 0].
Exercise 1.10 (Compensator) Let X = (Xk)k≥0 be a Markov chain with countable state space S and transition kernel P, and let f : S → R be a function such that Σ_y P(x, y)|f(y)| < ∞ for all x ∈ S. Assume that, for some given initial law, the process X satisfies E[|f(Xk)|] < ∞ for all k ≥ 0. Show that the compensator of (f(Xk))k≥0 is given by

Kn = Σ_{k=0}^{n−1} (Pf(Xk) − f(Xk))   (n ≥ 0).
Exercise 1.11 (Expected time till absorption: part 1) Let X be a Markov
chain with countable state space S and transition kernel P , and let T := {z ∈ S :
z is a trap}. Let τ := inf{k ≥ 0 : Xk ∈ T } and assume that Ex [τ ] < ∞ for all
x ∈ S. Show that the function
f (x) := Ex [τ ]
satisfies P f (x) − f (x) = −1 (x ∈ S\T ) and f ≡ 0 on T .
Exercise 1.12 (Expected time till absorption: part 2) Let X be a Markov
chain with countable state space S and transition kernel P , let T := {z ∈ S :
z is a trap}, and set τ := inf{k ≥ 0 : Xk ∈ T }. Assume that f : S → [0, ∞)
satisfies P f (x) − f (x) ≤ −1 (x ∈ S\T ) and f ≡ 0 on T . Show that
Ex[τ] ≤ f(x)   (x ∈ S).
Hint: show that the compensator K of (f (Xk ))k≥0 satisfies Kn ≤ −(n ∧ τ ).
Exercise 1.13 (Absorption of random walk) Consider a random walk W = (Wk)k≥0 on Z that jumps from x to x + 1 with probability p and to x − 1 with the remaining probability q := 1 − p, where 0 < p < 1. Fix n ≥ 1 and set τ := inf{k ≥ 0 : Wk ∈ {0, n}}. Calculate, for each 0 ≤ x ≤ n, the probability P[Wτ = n].
Exercise 1.14 (First occurrence of a pattern: part 1) Let (Xk)k≥0 be i.i.d. Bernoulli random variables with P[Xk = 0] = P[Xk = 1] = 1/2 (k ≥ 0). Set

τ110 := inf{k ≥ 0 : (Xk, Xk+1, Xk+2) = (1, 1, 0)},

and define τ010 similarly. Calculate P[τ010 < τ110]. Hint: Set X̃k := (Xk, Xk+1, Xk+2) (k ≥ 0). Then (X̃k)k≥0 is a Markov chain with transition probabilities as in the accompanying picture. Now the problem amounts to calculating the trapping probabilities for the chain stopped at τ010 ∧ τ110.

[Figure: transition diagram of (X̃k)k≥0 on the eight states 000, 001, 010, 011, 100, 101, 110, 111]
Exercise 1.15 (First occurrence of a pattern: part 2) In the set-up of the previous exercise, calculate E[τ110] and E[τ111]. Hints: you need to solve a system of linear equations. To find the solution, it helps to use Theorem 0.18 (c) and the fact that the uniform distribution is an invariant law. In the case of τ111, it also helps to observe that Ex[τ111] depends only on the number of ones at the end of x.
1.2 Random walk on a tree
In this section, we study random walk on an infinite tree in which every vertex has
three neighbors. Such random walks have many interesting properties. At present
they are of interest to us because they have many different bounded harmonic
functions. As we will see in the next section, the situation for random walks on
Zd is quite different.
Let T2 be an infinite tree (i.e., a connected graph without cycles) in which each vertex has degree 3 (i.e., there are three edges incident to each vertex). We will be interested in the Markov chain whose state space is the set of vertices of T2 and that
jumps in each step with equal probabilities to one of the three neighboring sites.
We first need a convenient way to label vertices in such a tree. Consider a finitely
generated group with generators a, b, c satisfying a = a−1 , b = b−1 and c = c−1 .
More formally, we can construct such a group as follows. Let G be the set of all
finite sequences x = x(1) · · · x(n) where n ≥ 0 (we allow for the empty sequence ∅), x(i) ∈ {a, b, c} for all 1 ≤ i ≤ n, and x(i) ≠ x(i + 1) for all 1 ≤ i ≤ i + 1 ≤ n. We define a product on G by concatenation, where we apply the rule that any two
a’s, b’s or c’s next to each other cancel each other, inductively, till we obtain an
element of G. So, for example,
(abacb)(cab) = abacbcab,
(abacb)(bab) = abacbbab = abacab,
and (abacb)(bcb) = abacbbcb = abaccb = abab.
With these rules, G is a group with unit element ∅, the empty sequence, and
inverse (x(1) · · · x(n))−1 = x(n) · · · x(1). Note that G is not abelian, i.e., the
group product is not commutative.
We can make G into a graph by drawing an edge between two elements x, y ∈ G
if x = ya, x = yb, or x = yc. It is not hard to see that the resulting graph
is an infinite tree in which each vertex has degree 3; see Figure 1.1.³ We let
|x| = |x(1) · · · x(n)| := n denote the length of an element x ∈ G. It is not hard to see that this is the same as the graph distance of x to the ‘origin’ ∅, i.e., the length of the shortest path connecting x to ∅.
³This is a special case of a much more general construction. Let G be a finitely generated group and let ∆ ⊂ G be a finite, symmetric (in the sense that a ∈ ∆ implies a^{−1} ∈ ∆) set of elements that generates G. Draw an edge between two elements a, b ∈ G if a = cb for some c ∈ ∆ (or equivalently, by the symmetry of ∆, if b = c′a for some c′ ∈ ∆). The resulting graph is called the left Cayley graph associated with G and ∆. This is a very general method of making graphs with some sort of translation-invariant structure.

[Figure 1.1: The regular tree T2]
Let X = (Xk)k≥0 be the Markov chain with state space G and transition probabilities

P(x, xa) = P(x, xb) = P(x, xc) = 1/3   (x ∈ G),

i.e., X jumps in each step to a uniformly chosen neighboring vertex in the graph.
We call X the nearest neighbor random walk on G.
We observe that if X is such a random walk on G, then |X| = (|Xk |)k≥0 is a
Markov chain with state space N and transition probabilities given by
Q(n, n − 1) := 1/3 and Q(n, n + 1) := 2/3   (n ≥ 1),

and Q(0, 1) := 1.
For each x = x(1) · · · x(n) ∈ G, let us write x(i) := ∅ if i > n. The following
lemma shows that the random walk X is transient and walks away to infinity in a
well-defined ‘direction’.
Lemma 1.16 (Transience) Let X be the random walk on G described above,
started in any initial law. Then there exists a random variable X∞ ∈ {a, b, c}N+
such that
lim_{n→∞} Xn(i) = X∞(i) a.s.   (i ∈ N+).
Proof We may compare |X| to a random walk Z = (Zk )k≥0 on Z that jumps from
n to n − 1 or n + 1 with probabilities 1/3 and 2/3, respectively. Such a random
walk has independent increments, i.e., (Zk − Zk−1 )k≥1 are i.i.d. random variables
that take the values −1 and +1 with probabilities 1/3 and 2/3. Therefore, by the
strong law of large numbers, (Zn − Z0 )/n → 1/3 a.s. and therefore Zn → ∞ a.s.
In particular Z visits each state only finitely often, which shows that all states are
transient. It follows that the process Z started in Z0 = 0 has a positive probability
of not returning to 0. Since Zn → ∞ a.s. and since |X| has the same dynamics as
Z as long as it is in N+ , this shows that the process started in X0 = a satisfies
Pa[|Xk| ≥ 1 ∀k ≥ 1] = P1[Zk ≥ 1 ∀k ≥ 1] > 0.
This shows that a is a transient state for X. By irreducibility, all states are
transient and |Xk | → ∞ a.s., which is easily seen to imply the lemma.
We are now ready to prove the existence of many bounded harmonic functions for the Markov chain X. Let

∂G := {x ∈ {a, b, c}^{N+} : x(i) ≠ x(i + 1) ∀i ≥ 1}.
Elements in ∂G correspond to different ways of walking to infinity. Note that ∂G
is an uncountable set. In fact, if we identify elements of ∂G with points in [0, 1]
written in base 3, then ∂G corresponds to a sort of Cantor set. We equip ∂G with
the product-σ-field, which we denote by B(∂G). (Indeed, one can check that this
is the Borel-σ-field associated with the product topology.)
Proposition 1.17 (Bounded harmonic functions) Let φ : ∂G → R be bounded
and measurable, let X be the random walk on the tree G described above, and let
X∞ be as in Lemma 1.16. Then
h(x) := Ex[φ(X∞)]   (x ∈ G)
defines a bounded harmonic function for X. Moreover, the process started in an
arbitrary initial law satisfies
h(Xn) −→ φ(X∞) as n → ∞, a.s.
Proof It follows from the Markov property (in the form (0.4)) that

h(x) = Ex[φ(X∞)] = Σ_y P(x, y) Ey[φ(X∞)] = Σ_y P(x, y)h(y)   (x ∈ G),
which shows that h is harmonic. Since ‖h‖∞ ≤ ‖φ‖∞, the function h is bounded. Moreover, by (0.4) and Proposition 0.6,

h(Xn) = EXn[φ(X∞)] = E[φ(X∞) | FnX] −→ E[φ(X∞) | F∞X] = φ(X∞) a.s. as n → ∞.
For example, in Figure 1.2, we have drawn a few values of the harmonic function
h(x) := Px[X∞(1) = a]   (x ∈ G).
Although Proposition 1.17 proves that each bounded measurable function φ on
∂G yields a bounded harmonic function for the process X, we have not actually
shown that different φ’s yield different h’s.
Lemma 1.18 (Many bounded harmonics) Let µ be the probability measure
on ∂G defined by µ := P∅ [X∞ ∈ · ]. Let φ, ψ : ∂G → R be bounded and measurable
and let
h(x) := Ex[φ(X∞)] and g(x) := Ex[ψ(X∞)]   (x ∈ G).
Then h = g if and only if φ = ψ a.s. w.r.t. µ.
Proof Let us define more generally µx := Px[X∞ ∈ · ]. Since

µx(A) = Σ_z P^n(x, z) Pz[X∞ ∈ A] ≥ P^n(x, y)µy(A)
[Figure 1.2: A bounded harmonic function — values of h(x) = Px[X∞(1) = a] on the tree]
(x, y ∈ G, n ≥ 0, A ∈ B(∂G)) and P is irreducible, we see that µy ≪ µx for all x, y ∈ G, hence the measures (µx)x∈G are all equivalent. Thus, if φ = ψ a.s. w.r.t. µ, then they are a.s. equal w.r.t. µx for each x ∈ G, and therefore

h(x) = ∫ φ dµx = ∫ ψ dµx = g(x)   (x ∈ G).
On the other hand, if the set {φ 6= ψ} has positive probability under µ, then by
Proposition 1.17
P∅[lim_{n→∞} h(Xn) ≠ lim_{n→∞} g(Xn)] > 0,
which shows that there must exist x ∈ G with h(x) ≠ g(x).
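The harmonic functions of Proposition 1.17 can also be probed by simulation (a hedged sketch of ours; reduced words represent group elements, and we stop once the walk is so far out that the first letter is essentially frozen):

    import random

    # Sketch: Monte Carlo estimate of h(x) = P^x[X_infty(1) = a] for the walk
    # on the tree.  Once |X_n| = 30, the chance of ever returning to the empty
    # word (the only way the first letter can change) is negligible.
    def estimate_h(start, trials=20_000, depth=30):
        random.seed(3)
        hits = 0
        for _ in range(trials):
            w = list(start)
            while len(w) < depth:
                g = random.choice("abc")
                if w and w[-1] == g:
                    w.pop()        # gg = empty word: the letter cancels
                else:
                    w.append(g)
            hits += w[0] == "a"
        return hits / trials

    print(estimate_h(""))    # h(empty word) = 1/3 by symmetry
    print(estimate_h("a"))   # compare with the value 2/3 in Figure 1.2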
Exercise 1.19 (Escape probability) Let Z = (Zk )k≥0 be the Markov chain
with state space Z that jumps in each step from n to n − 1 with probability 1/3
and to n + 1 with probability 2/3. Calculate P1 [Zk ≥ 1 ∀k ≥ 0]. Hint: find a
suitable harmonic function for the process stopped at zero.
Exercise 1.20 (Independent increments) Let (Yk )k≥1 be i.i.d. and uniformly
distributed on {a, b, c}. Define (Xn )n≥0 by the random group product (in the group
G)
Xn := Y1 · · · Yn   (n ≥ 1),
with X0 := ∅. Show that X is the Markov chain with transition kernel P as defined
above.
1.3 Coupling
For any x = (x(1), . . . , x(d)) ∈ Z^d, let |x|1 := Σ_{i=1}^d |x(i)| denote the ‘L¹-norm’ of x. Set ∆ := {x ∈ Z^d : |x|1 = 1}. Let (Yk)k≥1 be i.i.d. and uniformly distributed on ∆ and let

Xn := Σ_{k=1}^{n} Yk   (n ≥ 1),

with X0 := 0. (Here we also use the symbol 0 to denote the origin 0 = (0, . . . , 0) ∈ Z^d.) Then, just as in Exercise 1.20, X is a Markov chain that jumps in each time step from its present position x to a uniformly chosen position in x + ∆ = {x + y : y ∈ ∆}. We call X the symmetric nearest neighbor random walk on the integer lattice Z^d. Sometimes X is also called simple random walk.
Let P denote its transition kernel. We will be interested in bounded harmonic
functions for P . We will show that in contrast to the random walk on the tree,
the random walk on the integer lattice has very few bounded harmonic functions.
Indeed, all such functions are constant. We will prove this using coupling, which
is a technique of much more general interest, with many applications.
Usually, when we talk about a random variable X (which may be the path of a
process X = (Xk )k≥0 ), we are not so much interested in the concrete probability
space (Ω, F, P) that X is defined on. Rather, all that we usually care about is
the law P[X ∈ · ] of X. Likewise, when we have in mind two random variables X
and Y (for example, one binomially and the other normally distributed, or X and
Y may be two Markov chains with possibly different initial states or transition
kernels), then we usually do not a priori know what their joint distribution is,
even if we know their individual distributions. A coupling of two random variables
X and Y , in the most general sense of the word, is a way to construct X and
Y together on one underlying probability space (Ω, F, P). More precisely, if X
and Y are random variables defined on different underlying probability spaces,
then a coupling of X and Y is a pair of random variables (X 0 , Y 0 ) defined on one
underlying probability space (Ω, F, P), such that X 0 is equally distributed with X
and Y 0 is equally distributed with Y . Equivalently, since the laws of X and Y
are all we really care about, we may say that a coupling of two probability laws
µ, ν defined on measurable spaces (E, E) and (F, F), respectively, is a probability
measure ρ on the product space (E × F, E ⊗ F) such that the first marginal of ρ
is µ and its second marginal is ν.
Obviously, a trivial way to couple any two random variables is to make them
independent, but this is usually not what we are after. A typical coupling is
designed to compare two random variables, for example by showing that they are
close, or one is larger than the other. The next exercise gives a simple example.
Exercise 1.21 (Monotone coupling) Let X be uniformly distributed on [0, λ]
and let Y be exponentially distributed with mean λ > 0. Show that X and
Y can be coupled such that X ≤ Y a.s. (Hint: note that this says that you
have to construct a probability measure on [0, λ] × [0, ∞) that is concentrated on
{(x, y) : x ≤ y} and has the ‘right’ marginals.) Use your coupling to prove that
E[X α ] ≤ E[Y α ] for all α > 0.
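Before moving on, here is a minimal numerical sketch of one possible monotone coupling (an illustration only, not part of the notes; it assumes Python with numpy, and the particular coupling used — X := λU, Y := −λ log(1 − U) for a common uniform U — is one standard choice):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
lam = 2.0
U = rng.uniform(size=100_000)

X = lam * U                  # uniformly distributed on [0, lam]
Y = -lam * np.log(1.0 - U)   # exponentially distributed with mean lam

# Since u <= -log(1 - u) for u in [0, 1), the coupling is monotone:
assert np.all(X <= Y)

# Monotonicity is inherited by moments: E[X^a] <= E[Y^a] for all a > 0.
for a in (0.5, 1.0, 3.0):
    print(a, (X**a).mean(), (Y**a).mean())
```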
Now let ∆ ⊂ ℤ^d be as defined at the beginning of this section and let P be the transition kernel on ℤ^d defined by
$$P(x,y) := \tfrac{1}{2d}\, 1_{\{y - x \in \Delta\}} \qquad (x, y \in \mathbb{Z}^d).$$
We are interested in bounded harmonic functions for P, i.e., bounded functions h : ℤ^d → ℝ such that Ph = h. It is somewhat inconvenient that P is periodic.⁴ In light of this, we define a ‘lazy’ modification of our transition kernel by
$$P_{\mathrm{lazy}}(x,y) := \tfrac{1}{2} P(x,y) + \tfrac{1}{2}\, 1_{\{x=y\}}.$$
Obviously $P_{\mathrm{lazy}} f = \tfrac{1}{2} P f + \tfrac{1}{2} f$, so a function h is harmonic for P if and only if it is harmonic for $P_{\mathrm{lazy}}$.
Proposition 1.22 (Coupling of lazy walks) Let X^x and X^y be two lazy random walks, i.e., Markov chains on ℤ^d with transition kernel P_lazy, and initial states X_0^x = x and X_0^y = y, x, y ∈ ℤ^d. Then X^x and X^y can be coupled such that
$$\exists\, n \ge 0 \ \text{s.t.}\ X_k^x = X_k^y \quad \forall k \ge n \quad \text{a.s.}$$

⁴Indeed, the Markov chain with transition kernel P takes values alternately in $\mathbb{Z}^d_{\mathrm{even}} := \{x \in \mathbb{Z}^d : \sum_{i=1}^d x(i) \text{ is even}\}$ and $\mathbb{Z}^d_{\mathrm{odd}} := \{x \in \mathbb{Z}^d : \sum_{i=1}^d x(i) \text{ is odd}\}$.
Proof We start by choosing a suitable random mapping representation. Let (U_k)_{k≥1}, (I_k)_{k≥1}, and (W_k)_{k≥1} be collections of i.i.d. random variables, each collection independent of the others, such that for each k ≥ 1, U_k is uniformly distributed on {0, 1}, I_k is uniformly distributed on {1, …, d}, and W_k is uniformly distributed on {−1, +1}. Let e_i ∈ ℤ^d be defined as e_i(j) := 1_{i=j}. Then we may construct X^x inductively by setting X_0^x := x and
$$X_k^x = X_{k-1}^x + U_k W_k e_{I_k} \qquad (k \ge 1).$$
Note that this says that Uk decides if we jump at all, Ik decides which coordinate
jumps, and Wk decides whether up or down.
To construct also X^y on the same probability space, we define inductively X_0^y := y and
$$X_k^y = \begin{cases} X_{k-1}^y + (1 - U_k) W_k e_{I_k} & \text{if } X_{k-1}^y(I_k) \ne X_{k-1}^x(I_k),\\[2pt] X_{k-1}^y + U_k W_k e_{I_k} & \text{if } X_{k-1}^y(I_k) = X_{k-1}^x(I_k), \end{cases} \qquad (k \ge 1).$$
Note that this says that X^x and X^y always select the same coordinate I_k ∈ {1, …, d} that is allowed to move. As long as X^x and X^y differ in this coordinate, they jump at different times, but after the first time they agree in this coordinate, they always increase or decrease this coordinate by the same amount at the same time. In particular, these rules ensure that
$$X_k^x(i) = X_k^y(i) \quad \text{for all } k \ge \tau_i := \inf\{n \ge 0 : X_n^x(i) = X_n^y(i)\}.$$
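The construction above is easy to simulate. The following sketch (an illustration only, assuming Python with numpy; the function and variable names are our own) implements the coupled update rule literally:

```python
import numpy as np

def coupled_lazy_walks(x, y, steps, rng):
    """Simulate the coupling from the proof of Proposition 1.22.
    Both walks share U_k (lazy step or not), I_k (coordinate) and W_k (direction);
    while they disagree in coordinate I_k, exactly one of the two walks moves."""
    x, y = np.array(x, dtype=int), np.array(y, dtype=int)
    d = len(x)
    for _ in range(steps):
        U = rng.integers(0, 2)      # U_k: uniform on {0, 1}
        I = rng.integers(0, d)      # I_k: uniform coordinate (0-based here)
        W = rng.choice((-1, 1))     # W_k: uniform on {-1, +1}
        if x[I] != y[I]:
            x[I] += U * W           # X^x jumps when U_k = 1 ...
            y[I] += (1 - U) * W     # ... while X^y jumps when U_k = 0
        else:
            x[I] += U * W           # once they agree, they move together
            y[I] += U * W
    return x, y

rng = np.random.default_rng(0)
print(coupled_lazy_walks([0, 0], [3, -1], 10_000, rng))  # typically equal
```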
Since (X_k^x, X_k^y)_{k≥0} is defined in terms of i.i.d. random variables (U_k, I_k, W_k)_{k≥1} by a random mapping representation, the joint process (X^x, X^y) is clearly a Markov chain. We have already seen that X^x, on its own, is also a Markov chain, with the right transition kernel P_lazy. It is straightforward to check that $\mathbb{P}[X_k^y = z \mid (X_{k-1}^x, X_{k-1}^y)] = P_{\mathrm{lazy}}(X_{k-1}^y, z)$ a.s. In particular, this transition probability depends only on X_{k−1}^y, hence by Lemma 0.12, X^y is an autonomous Markov chain with transition kernel P_lazy.
In view of this, our claim will follow provided we show that τ_i < ∞ a.s. for each i = 1, …, d. Fix i and define inductively σ_0 := 0 and
$$\sigma_k := \inf\{n > \sigma_{k-1} : I_n = i\} \qquad (k \ge 1).$$
Consider the difference process
$$D_k := X_{\sigma_k}^x(i) - X_{\sigma_k}^y(i) \qquad (k \ge 0).$$
Then D = (Dk )k≥0 is a Markov process on Z that in each step jumps from z to
z + 1 or z − 1 with equal probabilities, except when it is in zero, which is a trap.
In other words, this says that D is a simple random walk stopped at the first time
it hits zero. By Lemma 0.17, there a.s. exists some (random) k ≥ 0 such that
Dk = 0 and hence τi = σk < ∞ a.s.
As a corollary of Proposition 1.22, we obtain that all bounded harmonic functions for nearest-neighbor random walk on the d-dimensional integer lattice are constant.

Corollary 1.23 (Bounded harmonic functions are constant) Let $P(x,y) = (2d)^{-1} 1_{\{|x-y|_1 = 1\}}$ be the transition kernel of nearest-neighbor random walk on the d-dimensional integer lattice ℤ^d. If h : ℤ^d → ℝ is bounded and satisfies Ph = h, then h is constant.
Proof Couple X^x and X^y as in Proposition 1.22. Since h is harmonic and bounded, (h(X_k^x))_{k≥0} and (h(X_k^y))_{k≥0} are martingales. It follows that
$$|h(x) - h(y)| = \big|\mathbb{E}[h(X_k^x)] - \mathbb{E}[h(X_k^y)]\big| = \big|\mathbb{E}[h(X_k^x) - h(X_k^y)]\big| \le 2\|h\|_\infty\, \mathbb{P}\big[X_k^x \ne X_k^y\big] \underset{k\to\infty}{\longrightarrow} 0$$
for each x, y ∈ ℤ^d, proving that h is constant.
Remark Actually, a much stronger statement than Corollary 1.23 is true: for
nearest-neighbor random walk on Zd , all nonnegative harmonic functions are constant. This is called the strong Liouville property, see [Woe00, Corollary 25.5]. In
general, the problem of finding all positive harmonic functions for a Markov chain
leads to the (rather difficult) problem of determining the Martin boundary of a
Markov chain.
1.4 Convergence in total variation norm
In this section we turn our attention away from harmonic functions and instead
show another application of coupling. We will use coupling to give a proof of the
statement in Theorem 0.19 (stated without proof in the Introduction) that any
aperiodic, irreducible, positively recurrent Markov chain is ergodic, in the sense
that regardless of the initial state, its law at time n converges to the invariant law
as n → ∞.
Recall that the total variation distance between two probability measures µ, ν on a countable set S is defined as
$$\|\mu - \nu\|_{\mathrm{TV}} := \max_{A \subset S} \big|\mu(A) - \nu(A)\big|.$$
The following lemma gives another formula for ∥·∥_TV.

Lemma 1.24 (Total variation distance) For any probability measures µ, ν on a countable set S, one has
$$\|\mu-\nu\|_{\mathrm{TV}} = \sum_{x:\, \mu(x) \ge \nu(x)} \big(\mu(x)-\nu(x)\big) = \sum_{x:\, \mu(x) < \nu(x)} \big(\nu(x)-\mu(x)\big) = \tfrac{1}{2}\sum_{x \in S} \big|\mu(x)-\nu(x)\big|.$$
Proof Set S+ := {x ∈ S : µ(x) > ν(x)}, S0 := {x ∈ S : µ(x) = ν(x)}, and
S− := {x ∈ S : µ(x) < ν(x)}. Define a finite measure ρ by ρ(x) := |µ(x) − ν(x)|.
Then
µ(A) − ν(A) = ρ(A ∩ S+ ) − ρ(A ∩ S− ).
It follows that
−ρ(S− ) ≤ µ(A) − ν(A) ≤ ρ(S+ )
where either inequality may be an equality for a suitable choice of A (A = S− or
A = S_+, respectively). Here
$$\rho(S_+) - \rho(S_-) = \sum_{x \in S} \big(\mu(x) - \nu(x)\big) = 1 - 1 = 0,$$
so
$$\sum_{x:\, \mu(x) \ge \nu(x)} \big(\mu(x) - \nu(x)\big) = \rho(S_+) = \rho(S_-) = \sum_{x:\, \mu(x) < \nu(x)} \big(\nu(x) - \mu(x)\big).$$
In particular, ∥µ − ν∥_TV = ρ(S_+) = ½(ρ(S_+) + ρ(S_−)) = ½ ∑_{x∈S} |µ(x) − ν(x)|, which completes the proof.
Theorem 1.25 (Convergence to invariant law) Let X be an irreducible, aperiodic, positively recurrent Markov chain with transition kernel P, state space S, and invariant law µ. Then the process started in any initial law satisfies
$$\big\|\,\mathbb{P}[X_n \in \cdot\,] - \mu\,\big\|_{\mathrm{TV}} \underset{n\to\infty}{\longrightarrow} 0.$$
Proof We take the existence of an invariant law µ as proven. Uniqueness will follow from our proof. Let X and X̄ be two independent Markov chains with transition kernel P, where X is started in an arbitrary initial law and P[X̄_0 ∈ · ] = µ. It is easy to see that the joint process (X, X̄) = (X_k, X̄_k)_{k≥0} is a Markov process with state space S × S. Let us denote its transition kernel by P_2, i.e., by independence,
$$P_2\big((x,\bar x),(y,\bar y)\big) = P(x,y)\,P(\bar x,\bar y) \qquad (x, \bar x, y, \bar y \in S).$$
We claim that P_2 is irreducible. Fix x, x̄, y, ȳ ∈ S. Since P is irreducible and aperiodic, it is not hard to see that there exists an m_1 ≥ 1 such that P^n(x, y) > 0 for all n ≥ m_1. Likewise, there exists an m_2 ≥ 1 such that P^n(x̄, ȳ) > 0 for all n ≥ m_2. Choosing n ≥ m_1 ∨ m_2, we see that
$$P_2^n\big((x,\bar x),(y,\bar y)\big) = P^n(x,y)\,P^n(\bar x,\bar y) > 0,$$
proving that P_2 is irreducible.
By Theorem 0.18 (a) and (b), an irreducible Markov chain is positively recurrent
if and only if it has an invariant law. Obviously, the product measure µ ⊗ µ is an
invariant law for P2 , so P2 is positively recurrent. In particular, this proves that
the stopping time
$$\tau := \inf\{k \ge 0 : X_k = \bar X_k\}$$
is a.s. finite and has, in fact, finite expectation. Let X′ = (X′_k)_{k≥0} be the process defined by
$$X'_k := \begin{cases} X_k & \text{if } k < \tau,\\ \bar X_k & \text{if } k \ge \tau. \end{cases}$$
It is not hard to see that X′ is a Markov chain with transition kernel P and initial law P[X′_0 ∈ · ] = P[X_0 ∈ · ], hence X′ is equal in law with X. Now
$$\big\|\mathbb{P}[X_n \in \cdot\,] - \mu\big\|_{\mathrm{TV}} = \sup_{A \subset S} \big|\mathbb{P}[X'_n \in A] - \mu(A)\big| = \sup_{A \subset S} \big|\mathbb{P}[X'_n \in A] - \mathbb{P}[\bar X_n \in A]\big| \le \mathbb{P}\big[X'_n \ne \bar X_n\big] = \mathbb{P}[n < \tau] \underset{n\to\infty}{\longrightarrow} 0,$$
where in the inequality we have used that the events {X′_n ∈ A} and {X̄_n ∈ A} can differ only on the event {X′_n ≠ X̄_n}.
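As a numerical illustration of Theorem 1.25 (a sketch assuming Python with numpy; the 3-state kernel below is an arbitrary example, not taken from the notes), one can iterate an initial law under P and evaluate the distance to µ with the formula from Lemma 1.24:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])   # irreducible and aperiodic

# invariant law mu: left eigenvector of P for the eigenvalue 1
w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1.0))])
mu /= mu.sum()

law = np.array([1.0, 0.0, 0.0])   # start deterministically in state 0
for n in range(1, 16):
    law = law @ P
    tv = 0.5 * np.abs(law - mu).sum()   # total variation, Lemma 1.24
    if n % 5 == 0:
        print(n, tv)                    # decays to 0
```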
Exercise 1.26 (Periodic kernels) Show that the probability kernel P2 in the
proof of Theorem 1.25 is not irreducible if P is periodic.
Chapter 2
Eigenvalues
2.1 The Perron-Frobenius theorem
In this section, we recall the classical Perron-Frobenius theorem about the leading eigenvalue and eigenvector of a nonnegative matrix. This theorem is usually viewed as a part of linear algebra and proved in that context. We will see that the theorem has close connections with Markov chains and some of its statements can even be proved by probabilistic methods.

Let S be a finite set (S = {1, …, n} in the traditional formulation of the Perron-Frobenius theorem) and let A : S × S → ℂ be a function. We view such functions as matrices, equipped with the usual matrix product, or, equivalently, we identify A with the linear operator A : ℂ^S → ℂ^S given by $Af(x) := \sum_{y \in S} A(x,y) f(y)$, where ℂ^S denotes the linear space consisting of all functions f : S → ℂ. We
say that A is real if A(x, y) ∈ R, and nonnegative if A(x, y) ≥ 0, for all x, y ∈ S.
Note that probability kernels are nonnegative matrices. A nonnegative matrix A is
called irreducible if for each x, y ∈ S there exists an n ≥ 1 such that An (x, y) > 0.
For probability kernels, this coincides with our earlier definition of irreducibility.
We let spec(A) denote the spectrum of A, i.e., the collection of (possibly complex) eigenvalues of A, and we let ρ(A) denote its spectral radius
$$\rho(A) := \sup\{|\lambda| : \lambda \in \mathrm{spec}(A)\}.$$
If ∥·∥ is any norm on ℂ^S, then we define the associated operator norm ∥A∥ of A as
$$\|A\| := \sup\big\{\|Af\| : f \in \mathbb{C}^S,\ \|f\| = 1\big\}.$$
It is well-known that for any such operator norm
$$\rho(A) = \lim_{n\to\infty} \|A^n\|^{1/n}. \tag{2.1}$$
The following version of the Perron-Frobenius theorem can be found in [Gan00,
Section 8.3] (see also, e.g., [Sen73, Chapter 1]). In this section, we will give our
own proof, with a distinctly probabilistic flavor, of this theorem and the remark
below it.
Theorem 2.1 (Perron-Frobenius) Let S be a finite set and let A : CS → CS
be a linear operator whose matrix is nonnegative and irreducible. Then
(i) There exist a function f : S → [0, ∞), unique up to multiplication by a positive constant, and a unique constant α ∈ ℝ such that Af = αf and f is not identically zero.
(ii) f (x) > 0 for all x ∈ S.
(iii) α = ρ(A) > 0.
(iv) The algebraic multiplicity of α is one. In particular, if A is written in its
Jordan normal form, then α corresponds to a block of size 1 × 1.
Remark We define periodicity for nonnegative matrices in the same way as for
probability kernels. If A is moreover aperiodic, then there exists some n ≥ 1
such that An (x, y) > 0 for all x, y ∈ S (i.e., the same n works for all x, y). Now
Perron’s theorem [Gan00, Section 8.2] implies that all other eigenvalues λ of A
satisfy |λ| < α. If A is not aperiodic, then it is easy to see that this statement fails
in general. (This is stated incorrectly in [DZ98, Thm 3.1.1 (b)].)
We call the constant α and function f from Theorem 2.1 the Perron-Frobenius
eigenvalue and eigenfunction of A, respectively. We note that if A† (x, y) := A(y, x)
denotes the transpose of A, then A† is also nonnegative and irreducible. It is wellknown that the spectra of a matrix and its transpose agree: spec(A) = spec(A† ),
and therefore also ρ(A) = ρ(A† ), which implies that the Perron-Frobenius eigenvalues of A and A† are the same. The same is usually not true for the corresponding
Perron-Frobenius eigenvectors. We call eigenvectors of A and A† also right and left
eigenvectors, respectively, and write φA := A† φ, consistent with earlier notation
for probability kernels.
2.2 Subadditivity
By definition, a function f : ℕ_+ → ℝ is subadditive if
$$f(n + m) \le f(n) + f(m) \qquad (n, m \ge 1).$$
The following simple lemma has many applications. Proofs can be found in many
places, e.g. [Lig99, Thm B.22].
Lemma 2.2 (Fekete’s lemma) If f : N+ → R is subadditive, then the limit
1
1
f (n) = inf f (n)
n→∞ n
n≥1 n
lim
exists in [−∞, ∞).
Proof Note that we can always extend f to a subadditive function f : ℕ → ℝ by setting f(0) = 0. Fix m ≥ 1 and for each n ≥ 0 write n = k_m(n)m + r_m(n) where k_m(n) ≥ 0 and 0 ≤ r_m(n) < m, i.e., k_m(n) is n/m rounded down and r_m(n) is the remainder. Setting s_m := sup_{1≤r<m} f(r), we see that
$$\frac{f(n)}{n} = \frac{f\big(k_m(n)m + r_m(n)\big)}{k_m(n)m + r_m(n)} \le \frac{k_m(n)f(m) + s_m}{k_m(n)m} \underset{n\to\infty}{\longrightarrow} \frac{f(m)}{m},$$
which proves that
$$\limsup_{n\to\infty} \frac{f(n)}{n} \le \frac{f(m)}{m} \qquad (m \ge 1).$$
Taking the infimum over m we conclude that
$$\limsup_{n\to\infty} \frac{f(n)}{n} \le \inf_{m \ge 1} \frac{f(m)}{m}.$$
This shows in particular that the limit superior is less than or equal to the limit inferior, hence the limit exists. Moreover, the limit (which equals the limit superior) is given by the infimum. Since f takes values in ℝ, the infimum is clearly less than +∞.
2.3 Spectral radius
Let S be a finite set and let L(ℂ^S, ℂ^S) denote the space of all linear operators A : ℂ^S → ℂ^S. Let ∥·∥ be any norm on ℂ^S and let ∥A∥ denote the associated operator norm of an operator A ∈ L(ℂ^S, ℂ^S). Then obviously
$$\|Af\| = \big\|A\big(f/\|f\|\big)\big\|\,\|f\| \le \|A\|\,\|f\| \qquad (f \in \mathbb{C}^S),$$
where the inequality between the left- and right-hand sides holds also for f = 0 even though in this case the intermediate steps are not defined. It follows that ∥ABf∥ ≤ ∥A∥ ∥B∥ ∥f∥, which in turn shows that
$$\|AB\| \le \|A\|\,\|B\| \qquad \big(A, B \in L(\mathbb{C}^S, \mathbb{C}^S)\big). \tag{2.2}$$
Exercise 2.3 (Operator norm induced by supremumnorm) If ∥·∥_∞ denotes the supremumnorm on ℂ^S, then show that the associated operator norm on L(ℂ^S, ℂ^S) is given by
$$\|A\|_\infty = \sup_{x \in S} \sum_{y \in S} |A(x,y)|.$$
Lemma 2.4 (Spectral radius) For each operator A ∈ L(ℂ^S, ℂ^S), the limit
$$\rho(A) := \lim_{n\to\infty} \|A^n\|^{1/n}$$
exists in [0, ∞). For each ε > 0 there exists a C_ε < ∞ such that
$$\rho(A)^n \le \|A^n\| \le C_\varepsilon\big(\rho(A) + \varepsilon\big)^n \qquad (n \ge 0). \tag{2.3}$$
Proof It follows from (2.2) that ∥A^{n+m}∥ ≤ ∥A^n∥ ∥A^m∥. In other words, this says that the function n ↦ log ∥A^n∥ is subadditive. By Lemma 2.2, it follows that the limit
$$\lim_{n\to\infty} \tfrac{1}{n} \log\|A^n\| = \inf_{n\ge 1} \tfrac{1}{n} \log\|A^n\|$$
exists in [−∞, ∞). Applying the exponential function to both sides of this equation, we see that the limit
$$\rho(A) := \lim_{n\to\infty} \|A^n\|^{1/n} = \inf_{n\ge 1} \|A^n\|^{1/n}$$
exists in [0, ∞). In particular, this shows that ρ(A) ≤ ∥A^n∥^{1/n} (n ≥ 1) and for each ε > 0 there exists an N ≥ 0 such that
$$\|A^n\|^{1/n} \le \rho(A) + \varepsilon \qquad (n \ge N).$$
Raising both sides of our inequalities to the n-th power and choosing
$$C_\varepsilon := 1 \vee \sup_{0 \le n < N} \|A^n\|\big(\rho(A)+\varepsilon\big)^{-n}$$
yields the desired statement (where we observe that ρ(A)^0 = 1 = ∥1∥ = ∥A^0∥).
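Lemma 2.4 is easy to observe numerically. In the sketch below (illustrative only, assuming Python with numpy and an arbitrary nonnegative matrix), ∥Aⁿ∥_∞^{1/n} is computed with the row-sum formula from Exercise 2.3 and compared with the spectral radius obtained from the eigenvalues (anticipating Gelfand's formula, Lemma 2.10):

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [0.5, 1.0]])
rho = max(abs(np.linalg.eigvals(A)))      # reference value via the spectrum

An = np.eye(len(A))
for n in range(1, 31):
    An = An @ A
    norm = np.abs(An).sum(axis=1).max()   # ||A^n||_inf, Exercise 2.3
    if n % 6 == 0:
        print(n, norm ** (1.0 / n), rho)  # the two values approach each other
```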
Exercise 2.5 (Choice of the norm) Show that the limit ρ(A) from Lemma 2.4
does not depend on the choice of the norm on CS . Hint: you can use the fact that
on a finite-dimensional space, all norms are equivalent. In particular, if k · k and
k · k0 are two different norms on the space L(CS , CS ), then there exist constants
0 < c < C < ∞ such that ckAk ≤ kAk0 ≤ CkAk for all A ∈ L(CS , CS ).
2.4 Existence of the leading eigenvector
Let A be a nonnegative real matrix with coordinates indexed by a finite set S.
The Perron-Frobenius theorem tells us that if A is irreducible, then A has a unique
nonnegative eigenvector with eigenvalue ρ(A). In this section, we will prove the
existence of such an eigenvector.
Lemma 2.6 (Existence of eigenvector) Let S be a finite set, let A : CS → CS
be a linear operator whose matrix is nonnegative, and let ρ(A) be its spectral radius
as defined in Lemma 2.4. Then there exists a function h : S → [0, ∞) that is not
identically zero such that Ah = ρ(A)h.
Proof We will treat the cases ρ(A) = 0 and ρ(A) > 0 separately. We start with the latter. In this case, for each z ∈ [0, 1/ρ(A)), let us define a function f_z : S → [0, ∞) by
$$f_z := \sum_{n=0}^\infty z^n A^n 1,$$
where 1 denotes the function on S that is identically one. Note that for each z < 1/ρ(A), this series is absolutely summable in any norm on ℂ^S by (2.3). It follows that
$$A f_z = \sum_{n=0}^\infty z^n A^{n+1} 1 = z^{-1}(f_z - 1) \qquad (0 < z < \rho(A)^{-1}). \tag{2.4}$$
By Exercise 2.3 and the nonnegativity of A (and hence of A^n for any n ≥ 0),
$$\|A^n 1\|_\infty = \sup_{x\in S} \sum_{y\in S} A^n(x,y) = \|A^n\|_\infty.$$
Let ∥f∥_1 := ∑_{x∈S} |f(x)| denote the ℓ¹-norm of a function f : S → ℂ. Then, by nonnegativity and (2.3),
$$\|f_z\|_1 = \sum_{n=0}^\infty z^n \|A^n 1\|_1 \ge \sum_{n=0}^\infty z^n \|A^n 1\|_\infty = \sum_{n=0}^\infty z^n \|A^n\|_\infty \ge \sum_{n=0}^\infty z^n \rho(A)^n. \tag{2.5}$$
Let h_z := f_z/∥f_z∥_1. Since h_z takes values in the compact set [0, 1]^S, we can choose z_n ↑ ρ(A)^{-1} such that h_{z_n} → h, where h is a nonnegative function with ∥h∥_1 = 1. Now (2.4) tells us that
$$A h_{z_n} = z_n^{-1}\big(h_{z_n} - \|f_{z_n}\|_1^{-1}\, 1\big).$$
By (2.5), ∥f_z∥_1 → ∞ as z ↑ ρ(A)^{-1}, so letting n → ∞ we obtain a nonnegative function h such that Ah = ρ(A)h and ∥h∥_1 = 1.

We are left with the case ρ(A) = 0. We will show that this implies A^n = 0 for some n ≥ 1. Indeed, by the finiteness of the state space, if A^{|S|} ≠ 0, then by nonnegativity there must exist x_0, …, x_n ∈ S with x_0 = x_n and A(x_{k-1}, x_k) > 0 for each k = 1, …, n (a path of positive entries of length |S| must visit some state twice, yielding such a cycle). But then A^{nm}(x_0, x_0) ≥ η^m where η := ∏_{k=1}^n A(x_{k-1}, x_k) > 0, which is easily seen to imply that ρ(A) ≥ η^{1/n} > 0. Thus, we see that ρ(A) = 0 implies A^n = 0 for some n. Let m := inf{n ≥ 1 : A^n 1 = 0}. Then h := A^{m-1} 1 ≠ 0 while Ah = 0 = ρ(A)h.
2.5 Uniqueness of the leading eigenvector
In the previous section, we have shown that each nonnegative matrix A has a
nonnegative eigenvector with eigenvalue ρ(A). In this section, we will show that
if A is irreducible, then such an eigenvector is unique. We will use probabilistic
methods. Our proof will be based on the following observation.
Lemma 2.7 (Generalized Doob transform) Let S be a finite set and let A : ℂ^S → ℂ^S be a linear operator whose matrix is nonnegative. Assume that h : S → [0, ∞) is not identically zero, α > 0, and Ah = αh. Then setting S′ := {x ∈ S : h(x) > 0} and
$$A^h(x,y) := \frac{A(x,y)\,h(y)}{\alpha\,h(x)} \qquad (x, y \in S')$$
defines a transition kernel A^h on S′.
Proof Obviously A^h(x,y) ≥ 0 for all x, y ∈ S′. Since
$$\sum_{y\in S'} A^h(x,y) = (\alpha h(x))^{-1} \sum_{y\in S'} A(x,y)h(y) = (\alpha h(x))^{-1}\, Ah(x) = 1 \qquad (x \in S'),$$
A^h is a transition kernel. (Here we used that h vanishes outside S′, so the sum over S′ equals the sum over S.)
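In matrix form, the generalized Doob transform is a one-line operation. The following sketch (illustrative only, assuming Python with numpy; the matrix and the numerically computed eigenpair are our own example) builds A^h and checks that its rows sum to one:

```python
import numpy as np

def doob_transform(A, h, alpha):
    """A^h(x, y) = A(x, y) h(y) / (alpha h(x)) on S' = {x : h(x) > 0} (Lemma 2.7)."""
    keep = h > 0
    B = A[np.ix_(keep, keep)]
    hh = h[keep]
    return B * hh[None, :] / (alpha * hh[:, None])

A = np.array([[1.0, 2.0],
              [3.0, 0.5]])
w, V = np.linalg.eig(A)
i = np.argmax(w.real)
alpha, h = w[i].real, np.abs(V[:, i].real)   # Perron-Frobenius pair; abs fixes
                                             # the arbitrary sign numpy may return
print(doob_transform(A, h, alpha).sum(axis=1))   # [1. 1.]
```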
We need one more preparatory lemma.
Lemma 2.8 (Eigenfunctions of positively recurrent chains) Let P be a
probability kernel on a finite or countably infinite space S. Assume that the Markov
chain X with transition kernel P is irreducible and positively recurrent. Let f :
S → (0, ∞) be a bounded function and let α > 0 be a constant. Assume that
P f = αf . Then α = 1 and f is constant.
Proof Since P is irreducible and positively recurrent, by Theorem 0.18 (b), it has a unique invariant law µ, which satisfies µ(x) > 0 for all x ∈ S. Now µf = µPf = αµf, which by the fact that f > 0 (and hence 0 < µf < ∞, f being bounded) proves that α = 1, so f is harmonic. Let X denote the Markov chain with transition kernel P, let P^x denote the law of the chain started in X_0 = x, and let E^x denote expectation w.r.t. P^x. Let δ_x denote the delta measure in x, i.e., the probability measure on S such that δ_x({y}) := 1_{x=y}. Then
$$f(x) = \frac{1}{n}\sum_{k=0}^{n-1} P^k f(x) = \frac{1}{n}\sum_{k=0}^{n-1} \delta_x P^k f = \frac{1}{n}\sum_{k=0}^{n-1} \mathbb{E}^x[f(X_k)] \underset{n\to\infty}{\longrightarrow} \mu f \qquad (x \in S),$$
where in the last step we have used Theorem 0.19. In particular, since the limit does not depend on x, the function f is constant.
We can now prove parts (i)–(iii) of the Perron-Frobenius theorem.
Proposition 2.9 (Perron-Frobenius) Let S be a finite set and let A : CS → CS
be a linear operator whose matrix is nonnegative and irreducible. Then
(i) There exist a function f : S → [0, ∞), unique up to multiplication by a positive constant, and a unique constant α ∈ ℝ such that Af = αf and f is not identically zero.
(ii) f (x) > 0 for all x ∈ S.
(iii) α = ρ(A) > 0.
Proof Assume that f : S → [0, ∞) is not identically zero and satisfies Af = αf
for some α ∈ R. Since f is nonnegative and not identically zero, we can choose
x ∈ S such that f (x) > 0. Since A is nonnegative, Af (x) ≥ 0 and therefore α ≥ 0.
In fact, since A is irreducible, we can choose some n ≥ 1 such that An (x, x) > 0
and hence An f (x) > 0, so we must have α > 0. Moreover, by irreducibility, for
each y ∈ S there is an m such that Am (y, x) > 0 and hence Am f (y) > 0, which
shows that f (y) > 0 for all y ∈ S.
Now let h : S → [0, ∞) and β ∈ R be another function and real constant such
that h is not identically zero and Ah = βh. We will show that this implies f = ch
for some c > 0 and α = β. In particular, choosing h as in Lemma 2.6, this then
implies the statements of the proposition.
By what we have already proved, β > 0 and h(y) > 0 for all y ∈ S. Let A^h be the probability kernel on S′ = S (since h > 0 everywhere) defined in Lemma 2.7. Note that A^h is irreducible since A is and since h > 0 everywhere. Set g := f/h. We observe that
$$A^h g(x) = \sum_{y\in S} \frac{A(x,y)h(y)}{\beta h(x)}\,\frac{f(y)}{h(y)} = \frac{Af(x)}{\beta h(x)} = \frac{\alpha}{\beta}\, g(x).$$
Since A^h is positively recurrent by the finiteness of the state space, Lemma 2.8 implies that α/β = 1 and g(x) = c (x ∈ S) for some c > 0.
2.6 The Jordan normal form
In this section, we recall some facts from linear algebra. Let V be a finite-dimensional linear vector space over the complex numbers and let L(V, V) denote the space of linear operators A : V → V. If {e_1, …, e_d} is a basis for V, then each vector φ ∈ V can in a unique way be written as
$$\phi = \sum_{i=1}^d \phi(i)\, e_i,$$
where the φ(i) are complex numbers, the coordinates of φ with respect to the basis {e_1, …, e_d}. There exist complex numbers A(i, j) such that for any φ ∈ V, the coordinates of Aφ are given by
$$A\phi(i) = \sum_{j=1}^d A(i,j)\,\phi(j) \qquad (i = 1, \dots, d).$$
We call (A(i,j))_{1≤i,j≤d} the matrix of A w.r.t. the basis {e_1, …, e_d}. We warn the reader that whether a matrix is nonnegative (i.e., whether its entries A(x, y) are all nonnegative) very much depends on the choice of the basis.¹
The spectrum of A is the set of eigenvalues of A, i.e.,
$$\mathrm{spec}(A) := \big\{\lambda \in \mathbb{C} : \exists\, 0 \ne \phi \in V \text{ s.t. } A\phi = \lambda\phi\big\}.$$
The function ℂ ∋ z ↦ p_A(z) defined by
$$p_A(z) := \det(A - z)$$
is the characteristic polynomial of A. We may write
$$p_A(z) = \prod_{i=1}^n (\lambda_i - z)^{m_a(\lambda_i)},$$
where spec(A) = {λ_1, …, λ_n}, with λ_1, …, λ_n all different, and m_a(λ) is the algebraic multiplicity of the eigenvalue λ. We need to distinguish the algebraic multiplicity of an eigenvalue λ from its geometric multiplicity m_g(λ), which is the dimension of the corresponding eigenspace {φ ∈ V : Aφ = λφ}. By definition, φ ∈ V is a generalized eigenvector of A with eigenvalue λ if (A − λ)^k φ = 0 for some k ≥ 1. Then the algebraic multiplicity m_a(λ) of λ is the dimension of the generalized eigenspace {φ ∈ V : (A − λ)^k φ = 0 for some k ≥ 1}. Note that m_g(λ) ≤ m_a(λ).
It is possible to find a basis {e_1, …, e_d} such that the matrix of A w.r.t. this basis has the Jordan normal form, i.e., the matrix of A has the block diagonal form
$$A = \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_m \end{pmatrix}, \qquad\text{with}\qquad B_k = \begin{pmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{pmatrix}.$$
We can read off both the algebraic and geometric multiplicity of an eigenvalue λ from the Jordan normal form of A. Indeed, m_a(λ) is simply the number of times λ occurs on the diagonal, while m_g(λ) is the number of blocks having λ on the diagonal. Note that
$$\begin{pmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \lambda \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

¹The statement that the matrix of a linear operator A is nonnegative should not be confused with the statement that A is nonnegative definite. The latter concept, which is defined only for linear spaces that are equipped with an inner product ⟨φ, ψ⟩, means that ⟨φ, Aφ⟩ is real and nonnegative for all φ. One can show that A is nonnegative definite if and only if there exists an orthonormal basis of eigenvectors of A and all eigenvalues are nonnegative. Operators whose matrix is nonnegative w.r.t. some basis, by contrast, need not be diagonalizable and their eigenvalues can be negative or even complex.
Each block in the Jordan normal form of A corresponds to an invariant subspace of
generalized eigenvectors that contains exactly one eigenvector of A. The subspaces
corresponding to all blocks with a given eigenvalue span the generalized eigenspace
corresponding to this eigenvalue. To give a concrete example: if A has the Jordan
normal form
$$A = \begin{pmatrix}
\lambda_1 & 1 & & & & & & \\
 & \lambda_1 & 1 & & & & & \\
 & & \lambda_1 & & & & & \\
 & & & \lambda_1 & 1 & & & \\
 & & & & \lambda_1 & & & \\
 & & & & & \lambda_2 & & \\
 & & & & & & \lambda_3 & 1 \\
 & & & & & & & \lambda_3
\end{pmatrix},$$
then the eigenspace and generalized eigenspace corresponding to the eigenvalue λ_1 consist of all vectors of the forms
$$\begin{pmatrix} \phi(1) \\ 0 \\ 0 \\ \phi(4) \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} \phi(1) \\ \phi(2) \\ \phi(3) \\ \phi(4) \\ \phi(5) \\ 0 \\ 0 \\ 0 \end{pmatrix},$$
respectively.
Recall that in Lemma 2.4, we defined ρ(A) as ρ(A) := limn→∞ kAn k1/n . The
next lemma relates ρ(A) to the spectrum of A. This fact is well-known, but for
completeness we include a proof in Appendix A.1.
Lemma 2.10 (Gelfand’s formula) Fix any norm on V and let kAk denote the
associated operator norm of an operator A ∈ L(V, V ). Let ρ(A) be defined as in
2.7. THE SPECTRUM OF A NONNEGATIVE MATRIX
51
Lemma 2.4. Then
ρ(A) = sup{|λ| : λ ∈ spec(A)}
2.7
A ∈ L(V, V ) .
The spectrum of a nonnegative matrix
We note that in the proof of Proposition 2.9, which contains the main conclusions of the Perron-Frobenius theorem, we have used very little linear algebra. In fact, we have only rarely used the fact that our state space S is finite. We used the finiteness of S several times in the proof of Lemma 2.6 (existence of the leading eigenvalue), for example when we used that ∥1∥_1 < ∞ or when we used that any sequence of functions that is bounded in ℓ¹-norm has a convergent subsequence. Later, in the proof of Proposition 2.9, we needed the finiteness of S when we used that any irreducible transition kernel is positively recurrent. In both of these places, our arguments can sometimes be replaced by more ad hoc calculations to show that certain infinite-dimensional nonnegative matrices also have a Perron-Frobenius eigenvector that is unique up to a multiplicative constant, in a suitable space of functions.

In the present section, we will use the finiteness of S in a more essential way to prove the missing part (iv) of Theorem 2.1, as well as the remark below it. In particular, we will use that each finite-dimensional square matrix can be brought into its Jordan normal form.
Lemma 2.11 (Spectrum and Doob transform) Let S be a finite set and let A : ℂ^S → ℂ^S be a linear operator whose matrix is nonnegative. Assume that h : S → (0, ∞) and α > 0 satisfy Ah = αh, and let A^h be the probability kernel defined in Lemma 2.7. Then A^h is irreducible, resp. aperiodic, if and only if A has these properties. Moreover, a function f : S → ℂ is an eigenvector, resp. generalized eigenvector, of A corresponding to some eigenvalue λ ∈ ℂ if and only if f/h is an eigenvector, resp. generalized eigenvector, of A^h corresponding to the eigenvalue λ/α.
Proof Let us use the symbol h to denote the linear operator on ℂ^S that maps a function f into hf. Similarly, let h^{-1} denote pointwise multiplication with the function 1/h(x). Then Lemma 2.7 says that A^h = α^{-1}h^{-1}Ah. Recall that f is an eigenvector, resp. generalized eigenvector, of A corresponding to the eigenvalue λ if and only if (A − λ)f = 0, resp. (A − λ)^k f = 0 for some k ≥ 1. Now A^h − λ/α = α^{-1}h^{-1}Ah − α^{-1}h^{-1}λh = α^{-1}h^{-1}(A − λ)h and therefore
$$\big(A^h - \lambda/\alpha\big)^k = \underbrace{\alpha^{-1}h^{-1}(A-\lambda)h \cdots \alpha^{-1}h^{-1}(A-\lambda)h}_{k \text{ times}} = \alpha^{-k}\, h^{-1} (A-\lambda)^k h.$$
It follows that (A^h − λ/α)(f/h) = α^{-1}h^{-1}(A − λ)f is zero if and only if (A − λ)f = 0, and more generally (A^h − λ/α)^k(f/h) = α^{-k}h^{-1}(A − λ)^k f is zero if and only if (A − λ)^k f = 0.
Exercise 2.12 (Non-diagonalizable kernel) Give an example of a probability
kernel P that is not diagonalizable. Hint: It is easy to give an example of a
nonnegative matrix that is not diagonalizable. Now apply Lemma 2.11.
Remark Regarding the exercise above: I do not know how to give an example such that P is irreducible.
Proposition 2.13 (Spectrum of a nonnegative matrix) Let S be a finite set,
let A : CS → CS be a linear operator whose matrix is nonnegative and irreducible,
and let α = ρ(A) be its Perron-Frobenius eigenvalue. Then the algebraic multiplicity of α is one. If A is moreover aperiodic, then all other eigenvalues λ of A
satisfy |λ| < α.
Proof By Lemma 2.11 it suffices to prove the proposition for the case that A is a probability kernel. Writing P instead of A to remind us of this fact, we first observe that since the constant function 1 is nonnegative and satisfies P1 = 1, by Proposition 2.9, this is the Perron-Frobenius eigenfunction of P and its eigenvalue is 1. In particular, α = ρ(P) = 1.

Let µ be the unique invariant law of P, which satisfies µ(x) > 0 for all x ∈ S.² Set V := {f ∈ ℂ^S : µf = 0}. Since f ∈ V implies µPf = µf = 0, we see that P maps the space V into itself, i.e., V is an invariant subspace of P. We observe that for any f ∈ ℂ^S, f − (µf)1 ∈ V. In fact, since f − c1 ∉ V for all c ≠ µf, each function in ℂ^S can in a unique way be written as a linear combination of a function in V and the constant function 1, i.e., the space V has codimension one.

²Note that, actually, µ is the left Perron-Frobenius eigenvector of P.

Let P|_V denote the restriction of the linear operator P to the space V. We claim that 1 is not an eigenvalue of P|_V. To prove this, imagine that f ∈ V satisfies
Pf = f. Then, by the reasoning in Lemma 2.8,
$$f(x) = \frac{1}{n}\sum_{k=0}^{n-1} P^k f(x) \underset{n\to\infty}{\longrightarrow} \mu f = 0 \qquad (x \in S).$$
If P is aperiodic, a stronger conclusion can be drawn. In this case, if 0 ≠ f ∈ V satisfies Pf = λf for some λ ∈ ℂ, then by Theorem 0.19,
$$\lambda^n f(x) = P^n f(x) = \delta_x P^n f = \mathbb{E}^x[f(X_n)] \underset{n\to\infty}{\longrightarrow} \mu f = 0 \qquad (x \in S),$$
which is possible only if |λ| < 1.
Now imagine that f ∈ ℂ^S (not necessarily f ∈ V) is a generalized eigenvector of P corresponding to some eigenvalue λ, i.e., (P − λ)^k f = 0. Then we may write f in a unique way as f = g + c1 where g ∈ V and c ∈ ℂ. Now 0 = (P − λ)^k f = (P − λ)^k g + c(1 − λ)^k 1, where (P − λ)^k g ∈ V by the fact that V is an invariant subspace. Since each function in ℂ^S can in a unique way be written as a linear combination of a function in V and the constant function 1, this implies that both (P − λ)^k g = 0 and c(1 − λ)^k 1 = 0. We now distinguish two cases.

If λ = 1, then since we have just proved that 1 ∉ spec(P|_V), the operator P|_V has no generalized eigenvectors corresponding to the eigenvalue λ, and hence g = 0. This shows that the constant function 1 is (up to constant multiples) the only generalized eigenvector of P corresponding to the eigenvalue 1, i.e., the algebraic multiplicity of the eigenvalue 1 is 1.

If λ ≠ 1, then we must have c = 0 and hence f ∈ V. In this case, if P is aperiodic, then by what we have just proved we must have |λ| < 1.
Exercise 2.14 (Periodic kernel) Let P be the probability kernel on {0, …, n − 1} defined by
$$P(x,y) = 1_{\{y = x + 1 \ \mathrm{mod}\ n\}} \qquad (0 \le x, y < n).$$
Show that P is irreducible but not aperiodic. Determine the spectrum of P.
2.8 Convergence to equilibrium
Now that the Perron-Frobenius theorem is proved, we can start to reap some of its useful consequences. The next proposition says that for irreducible, aperiodic Markov chains with finite state space, convergence to the invariant law always happens exponentially fast.
Proposition 2.15 (Exponential convergence to equilibrium) Let S be a finite set and let P be an irreducible, aperiodic probability kernel on S. Then the Perron-Frobenius eigenvalue of P is ρ(P) = 1 and its associated eigenvector is the constant function 1. Set
$$\gamma := \sup\big\{|\lambda| : \lambda \in \mathrm{spec}(P),\ \lambda \ne 1\big\},$$
and let µ denote the unique invariant law of P. Then γ < 1 and for any norm on ℂ^S and ε > 0, there exists a constant C_ε < ∞ such that
$$\|P^n f - (\mu f)1\| \le C_\varepsilon (\gamma + \varepsilon)^n\, \|f - (\mu f)1\| \qquad (n \ge 0,\ f \in \mathbb{C}^S).$$
Proof Obviously P1 = 1 and 1 is a positive function, so by Proposition 2.9, this is the Perron-Frobenius eigenfunction of P and its eigenvalue is 1. As in the proof of Proposition 2.13, we set V := {f ∈ ℂ^S : µf = 0}. We observe that V is an invariant subspace of P and f − (µf)1 ∈ V for all f ∈ ℂ^S. Let P|_V denote the restriction of the linear operator P to V.

By Lemma 2.4, for any norm on ℂ^S and its associated operator norm, and for any ε > 0, there exists a C_ε < ∞ such that for any f ∈ ℂ^S,
$$\|P^n f - (\mu f)1\| = \big\|P^n\big(f - (\mu f)1\big)\big\| \le \big\|(P|_V)^n\big\|\,\|f - (\mu f)1\| \le C_\varepsilon\big(\rho(P|_V) + \varepsilon\big)^n \|f - (\mu f)1\|.$$
As we have seen near the end of the proof of Proposition 2.13, if f is an eigenvector of P corresponding to an eigenvalue λ ≠ 1, then we must have f ∈ V. By Proposition 2.13, the algebraic multiplicity of 1 is 1, hence 1 ∉ spec(P|_V). It follows that
$$\mathrm{spec}(P)\setminus\{1\} = \mathrm{spec}(P|_V),$$
and therefore γ = ρ(P|_V). By Proposition 2.13, all eigenvalues λ ≠ 1 of P satisfy |λ| < 1, hence γ < 1.
Remark If γ is as in Proposition 2.15, then the positive quantity 1 − γ is called the absolute spectral gap of the operator P. If X = (X_k)_{k≥0} denotes the Markov chain with transition kernel P, then Proposition 2.15 says that the expectation E^x[f(X_n)] of any function f at time n converges to its equilibrium value µf exponentially fast:
$$\big|\mathbb{E}^x[f(X_n)] - \mu f\big| \le C_\varepsilon\, e^{-(\eta - \varepsilon)n}, \qquad \text{where } \eta := -\log \gamma > 0.$$
Here C_ε can even be chosen uniformly in x (if we take ∥·∥ to be the supremumnorm).
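The exponential rate is easy to see numerically. In the sketch below (illustrative only, assuming Python with numpy and an arbitrary irreducible aperiodic kernel of our own choosing), the error P^n f − (µf)1 is compared with γ^n:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.5, 0.4]])   # irreducible, aperiodic

gamma = sorted(np.abs(np.linalg.eigvals(P)))[-2]   # second-largest modulus

w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1.0))]); mu /= mu.sum()

f = np.array([1.0, -2.0, 0.5])
err = f - (mu @ f)                 # f - (mu f) 1, an element of V
for n in range(1, 25):
    err = P @ err                  # equals P^n f - (mu f) 1
    if n % 6 == 0:
        print(n, np.abs(err).max(), gamma ** n)   # same order of magnitude
```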
2.9 Conditioning to stay inside a set
Let X be a Markov chain with countable state space S and transition kernel P, and let S′ ⊂ S be a finite set of states such that

• P(x, y) > 0 for some x ∈ S′, y ∉ S′, i.e., it is possible to leave S′.

• The restriction of P to S′ is irreducible, i.e., it is possible to go from any state x ∈ S′ to any other state y ∈ S′ without leaving S′.

We will be interested in the process X started in any initial state in S′ and conditioned not to leave S′ till time n, in the limit that n → ∞. Obviously, our assumptions imply that starting from anywhere in S′, there is a positive probability to leave S′ at some time in the future. Since S′ is finite, this probability is uniformly bounded from below, so by the principle ‘what can happen must happen’ (Proposition 0.14), we have that
$$\mathbb{P}\big[X_k \in S' \ \forall k \ge 0\big] = 0.$$
Let P′ := P|_{S′} denote the restriction of P to S′, i.e., the matrix (P(x,y))_{x,y∈S′} and its associated linear operator acting on ℂ^{S′}. Since P′ is nonnegative and irreducible, it has a unique Perron-Frobenius eigenvalue α and associated left and right eigenfunctions η and h, unique up to multiplicative constants, i.e., η and h are strictly positive functions on S′ such that
$$\eta P' = \alpha\eta \qquad\text{and}\qquad P' h = \alpha h.$$
We choose our normalization of η and h such that
$$\pi(x) := \eta(x)h(x) \qquad (x \in S')$$
is a probability measure on S′. It turns out that π is the invariant law of the Doob-transformed probability kernel (P′)^h defined as in Lemma 2.7.
Lemma 2.16 (Invariant law of Doob transformed kernel) Let S′ be a finite set and let A : ℂ^{S′} → ℂ^{S′} be a linear operator whose matrix is irreducible and nonnegative. Let α, η and h be its Perron-Frobenius eigenvalue and left and right eigenvectors, respectively, normalized such that π(x) := η(x)h(x) (x ∈ S′) is a probability law on S′. Then π is the invariant law of the probability kernel A^h defined in Lemma 2.7.
Proof Note that since A^h is irreducible, its invariant law is unique. Now
$$\pi A^h(x) = \sum_{y\in S'} \eta(y)h(y)\,\frac{A(y,x)h(x)}{\alpha h(y)} = \alpha^{-1} h(x) \sum_{y\in S'} \eta(y)A(y,x) = \alpha^{-1}(\eta A)(x)\,h(x) = \pi(x),$$
which proves that π is an invariant law.
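Lemma 2.16 can be checked numerically. In the following sketch (illustrative only, assuming Python with numpy; the substochastic matrix is an arbitrary example and the eigenpairs are computed by numpy rather than by hand), π = ηh is verified to be invariant for A^h:

```python
import numpy as np

A = np.array([[0.3, 0.4],
              [0.5, 0.2]])        # substochastic: mass can escape from S'

wr, Vr = np.linalg.eig(A)
wl, Vl = np.linalg.eig(A.T)
alpha = wr[np.argmax(wr.real)].real
h   = np.abs(Vr[:, np.argmax(wr.real)].real)   # right PF eigenvector
eta = np.abs(Vl[:, np.argmax(wl.real)].real)   # left PF eigenvector
eta /= eta.sum()                  # normalization sum(eta) = 1
h   /= eta @ h                    # so that pi = eta * h sums to 1

Ah = A * h[None, :] / (alpha * h[:, None])     # Doob transform (Lemma 2.7)
pi = eta * h
print(pi @ Ah - pi)               # ~ [0, 0]: pi is invariant for A^h
```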
For simplicity, we assume below that P′ is not only irreducible but also aperiodic.
Theorem 2.17 (Process conditioned not to leave a set) Let X be a Markov chain with countable state space S and transition kernel P, and let S′ ⊂ S be finite. Assume that P′ := P|_{S′} is irreducible and aperiodic. Let α and η, h denote its Perron-Frobenius eigenvalue and left and right eigenvectors, respectively, normalized such that ∑_{x∈S′} η(x) = 1 and ∑_{x∈S′} η(x)h(x) = 1. Set
$$\tau := \inf\big\{k \ge 0 : X_k \notin S'\big\}.$$
Then, for any x ∈ S′,
$$\lim_{n\to\infty} \alpha^{-n}\, \mathbb{P}^x[n < \tau] = h(x). \tag{2.6}$$
Moreover, for each m ≥ 1 and x ∈ S′,
$$\mathbb{P}^x\big[(X_k)_{0\le k\le m} \in \cdot \,\big|\, n < \tau\big] \underset{n\to\infty}{\Longrightarrow} \mathbb{P}^x\big[(X_k^h)_{0\le k\le m} \in \cdot\,\big],$$
where X^h denotes the Markov chain with state space S′ and Doob-transformed transition kernel
$$P^h(x,y) := \frac{P'(x,y)\,h(y)}{\alpha\,h(x)} \qquad (x, y \in S').$$
Proof We observe that, setting x_0 := x,
$$\mathbb{P}^x[n < \tau] = \sum_{x_1\in S'} \cdots \sum_{x_n\in S'} \prod_{k=1}^n P(x_{k-1}, x_k) = \sum_{x_n\in S'} (P')^n(x_0, x_n) = (P')^n 1(x).$$
As in the proof of Lemma 2.11, we let h and h^{-1} also denote the operators that correspond to multiplication with these functions. Then P^h = α^{-1}h^{-1}P′h and therefore
$$\alpha^{-n}\,\mathbb{P}^x[n < \tau] = \alpha^{-n} (P')^n 1(x) = \alpha^{-n}\big(\alpha\, h P^h h^{-1}\big)^n 1(x) = h(x)\,\big((P^h)^n h^{-1}\big)(x) \underset{n\to\infty}{\longrightarrow} h(x)\,\pi h^{-1} = h(x),$$
where
$$\pi h^{-1} = \sum_{x\in S'} \pi(x)h^{-1}(x) = \sum_{x\in S'} \eta(x)h(x)h^{-1}(x) = \sum_{x\in S'} \eta(x) = 1$$
is the average of the function h^{-1} with respect to the invariant law π of P^h.
To prove the remaining statement of the theorem, we observe that for any 0 ≤ m < n, x_{m+1} ∈ S′, and (x_0, …, x_m) ∈ (S′)^{m+1} such that P[(X_0, …, X_m) = (x_0, …, x_m)] > 0, we have
$$\mathbb{P}^{x_0}\big[X_{m+1} = x_{m+1} \,\big|\, (X_0,\dots,X_m) = (x_0,\dots,x_m),\ n < \tau\big] = \frac{\sum_{x_{m+2}\in S'}\cdots\sum_{x_n\in S'}\ \prod_{k=1}^{n} P(x_{k-1},x_k)}{\sum_{x'_{m+1}\in S'}\cdots\sum_{x'_n\in S'}\ \big(\prod_{k=1}^{m} P(x_{k-1},x_k)\big)\big(\prod_{k=m+1}^{n} P(x'_{k-1},x'_k)\big)} = \frac{P(x_m,x_{m+1})\,(P')^{\,n-m-1}1(x_{m+1})}{(P')^{\,n-m}1(x_m)},$$
where in the denominator we set x′_m := x_m. This shows that conditional on the event n < τ, the process (X_0, …, X_n) is a time-inhomogeneous Markov chain whose transition kernel in the (m + 1)-th step is given by
$$P^{(n)}_{m,m+1}(x,y) = \frac{P(x,y)\,(P')^{\,n-m-1}1(y)}{(P')^{\,n-m}1(x)} = \frac{P(x,y)\,\alpha^{-(n-m-1)}(P')^{\,n-m-1}1(y)}{\alpha\cdot\alpha^{-(n-m)}(P')^{\,n-m}1(x)}.$$
Since we have already shown that α^{-n}(P′)^n 1 → h as n → ∞, the result follows.
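Equation (2.6) can also be checked numerically: since (P′)ⁿ1(x) = P^x[n < τ], the rescaled survival probabilities α^{-n}(P′)ⁿ1 should approach h. A sketch (illustrative only, assuming Python with numpy and an arbitrary substochastic matrix of our own choosing):

```python
import numpy as np

Pp = np.array([[0.3, 0.4],
               [0.5, 0.2]])       # P' = P restricted to S'; rows sum to < 1

wr, Vr = np.linalg.eig(Pp)
wl, Vl = np.linalg.eig(Pp.T)
alpha = wr[np.argmax(wr.real)].real
h   = np.abs(Vr[:, np.argmax(wr.real)].real)
eta = np.abs(Vl[:, np.argmax(wl.real)].real)
eta /= eta.sum()
h   /= eta @ h                    # normalization of Theorem 2.17

for n in (5, 15, 30):
    surv = np.linalg.matrix_power(Pp, n) @ np.ones(2)   # P^x[n < tau]
    print(n, surv / alpha**n)     # converges to h
print("h =", h)
```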
Exercise 2.18 (Behavior at typical times) Let 0 ≤ m_n ≤ n be such that m_n → ∞ and n − m_n → ∞ as n → ∞. Show that for any x ∈ S′,
$$\mathbb{P}\big[X_{m_n} \in \cdot \,\big|\, n < \tau\big] \underset{n\to\infty}{\Longrightarrow} \pi,$$
where π = ηh is the invariant law of the Doob-transformed probability kernel P^h.

Exercise 2.18 shows that conditional on the unlikely event that n < τ where n is large, most of the time up to n, the process X is approximately distributed according to the invariant law π of the Doob-transformed probability kernel P^h. We will see below that at the time n, the situation is different.
Let X be a Markov chain with countable state space S and transition kernel P, and let S′ ⊂ S be finite. By definition, we say that a probability measure ρ on S′ is a quasi-stationary law³ of P on S′ if
$$\mathbb{P}[X_0 \in \cdot\,] = \rho \quad\text{implies}\quad \mathbb{P}\big[X_1 \in \cdot \,\big|\, X_1 \in S'\big] = \rho.$$
It seems that quasi-stationary laws were first introduced by Darroch and Seneta in [DS67].
Proposition 2.19 (Quasi-stationary law) Let X be a Markov chain with countable state space S and transition kernel P, and let S′ ⊂ S be finite. Assume that P′ := P|_{S′} is irreducible and aperiodic. Let α and η, h denote its Perron-Frobenius eigenvalue and left and right eigenvectors, respectively, normalized such that ∑_{x∈S′} η(x) = 1 and ∑_{x∈S′} η(x)h(x) = 1. Then η is the unique quasi-stationary law of P on S′ and the process started in any initial law satisfies
$$\lim_{n\to\infty} \mathbb{P}\big[X_n \in \cdot \,\big|\, n < \tau\big] = \eta,$$
where τ := inf{k ≥ 0 : X_k ∉ S′}.
Proof Define finite measures µ_n on S′ by
$$\mu_n(x) := \mathbb{P}[X_n = x,\ n < \tau] \qquad (x \in S',\ n \ge 0).$$
Then
$$\mu_{n+1}(x) = \mathbb{P}[X_{n+1} = x,\ n+1 < \tau] = \sum_{y\in S'} \mathbb{P}[X_n = y,\ n < \tau]\,P(y,x) = \mu_n P'(x)$$
(x ∈ S′), i.e., µ_{n+1} = µ_n P′. As before, we let h and h^{-1} denote the operators that correspond to multiplication with these functions. Then P^h = α^{-1}h^{-1}P′h and therefore
$$\mu_n = \mu_0 (P')^n = \mu_0 \big(\alpha h P^h h^{-1}\big)^n = \alpha^n\, \mu_0\, h (P^h)^n h^{-1}. \tag{2.7}$$
Let us define a probability measure ν on S′ by
$$\nu(x) := (\mu_0 h)^{-1}\, \mu_0(x)\,h(x) \qquad (x \in S').$$

³Usually, the terms ‘stationary law’ and ‘invariant law’ can be used interchangeably. In this case, this may lead to confusion, however, since the term ‘quasi-invariant measure’ is normally used in ergodic theory for a measure that is mapped into an equivalent measure by some transformation.
We observe that
$$\alpha^n\, \mu_0 h\, (P^h)^n = \alpha^n (\mu_0 h)\, \nu (P^h)^n,$$
where ν(P^h)^n is just the law at time n of a Markov chain with transition kernel P^h and initial law ν. Since P^h is irreducible and aperiodic, ν(P^h)^n converges as n → ∞ to the invariant law π given by π(x) = η(x)h(x). Inserting this into (2.7), we see that
$$\alpha^{-n}\mu_n = (\mu_0 h)\,\nu(P^h)^n h^{-1} \underset{n\to\infty}{\longrightarrow} (\mu_0 h)\,\eta h h^{-1} = (\mu_0 h)\,\eta.$$
It follows that
$$\mathbb{P}\big[X_n \in \cdot \,\big|\, n < \tau\big] = \frac{\mu_n}{\mu_n 1} = \frac{\alpha^{-n}\mu_n}{\alpha^{-n}\mu_n 1} \underset{n\to\infty}{\Longrightarrow} \frac{(\mu_0 h)\,\eta}{(\mu_0 h)\,\eta 1} = \eta,$$
where we have used that η1 = 1 by the choice of our normalization.
Chapter 3
Intertwining
3.1 Intertwining of Markov chains
In linear algebra, an intertwining relation is a relation between linear operators of
the form
AB = B Ã.
In particular, if B is invertible, then this says that A = B ÃB −1 , or, in other words,
that the matrices A and à are similar. In this case, we may associate B with a
change of basis, and à corresponds to the matrix A written in terms of a different
basis. In general, however, B need not be invertible. It is especially in this case
that the word intertwining is used.
We will be especially interested in the case that A, Ã and B are (linear operators corresponding to) probability kernels. So let us assume that we have probability kernels P, P̃ on countable spaces S, S̃, respectively, and a probability kernel K from S to S̃ (i.e., a function S × S̃ ∋ (x, y) ↦ K(x, y)), such that
$$P K = K \tilde P. \tag{3.1}$$
Note that P, P̃ and K correspond to linear operators P : ℂ^S → ℂ^S, P̃ : ℂ^{S̃} → ℂ^{S̃}, and K : ℂ^{S̃} → ℂ^S, so both sides of the equation (3.1) correspond to a linear operator from ℂ^{S̃} into ℂ^S. We need some examples to see that this sort of relation between probability kernels can really happen.
Exclusion process. Fix n ≥ 2 and let Cn := Z/n, i.e., Cn = {0, . . . , n − 1}
with addition modulo n. (I.e., Cn is the cyclic group with n elements.) Let
S := {0, 1}Cn , i.e., S consists of all finite sequences x = (x(0), . . . , x(n − 1))
indexed by Cn . Let (Ik )k≥1 be i.i.d. and uniformly distributed on Cn . For each
x ∈ S, we may define a Markov chain X = (Xk )k≥0 , started in X0 = x and with
values in S, by setting (writing I := I_{k+1} for the position selected in step k + 1)
$$X_{k+1}(i) := \begin{cases} X_k(I+1) & \text{if } i = I,\\ X_k(I) & \text{if } i = I+1,\\ X_k(i) & \text{otherwise}. \end{cases} \tag{3.2}$$
In words, this says that in each time step, we choose a uniformly distributed
position I ∈ Cn and exchange the values of X in I and I + 1 (where we calculate
modulo n). Note that we have described our Markov chain in terms of a random
mapping representation. In particular, it is clear from this construction that X is
a Markov chain.
Thinning Let C be any countable set and let S := {0, 1}C be the set of all x =
(x(i))i∈C with x(i) ∈ {0, 1} for all i ∈ C, i.e., S is the set of all sequences of zeros
and ones indexed by C. Fix 0 ≤ p ≤ 1, and let (χi )i∈C be i.i.d. Bernoulli (i.e.,
{0, 1}-valued) random variables with P[χi = 1] = p. We define a probability kernel
K_p from S to S by
$$K_p(x, \cdot\,) = \mathbb{P}\big[\big(\chi_i\, x(i)\big)_{i\in C} \in \cdot\,\big] \qquad (x \in S). \tag{3.3}$$
Note that (χ_i x(i))_{i∈C} is obtained from x by setting some coordinates of x to
zero, independently for each i ∈ C, where each coordinate x(i) that is one has
probability p to remain one and probability 1 − p to become a zero. We describe
this procedure as thinning the ones with parameter p.
Exercise 3.1 (Thinning of exclusion processes) Let P be the transition kernel of the exclusion process on C_n described above, with state space S = {0, 1}^{C_n}, and for 0 ≤ p ≤ 1, let K_p be the kernel from S to S corresponding to thinning with parameter p. Show that
$$P K_p = K_p P \qquad (0 \le p \le 1).$$
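For small n, Exercise 3.1 can be verified by brute force. The sketch below (illustrative only, assuming Python with numpy; it is a numerical check, not a proof) builds P and K_p as 2ⁿ × 2ⁿ matrices for n = 3 and confirms the intertwining relation:

```python
import numpy as np
from itertools import product

n, p = 3, 0.4
states = list(product((0, 1), repeat=n))
idx = {s: k for k, s in enumerate(states)}
N = len(states)

# exclusion kernel (3.2): swap positions I and I+1 (mod n), I uniform
P = np.zeros((N, N))
for s in states:
    for I in range(n):
        t = list(s)
        t[I], t[(I + 1) % n] = t[(I + 1) % n], t[I]
        P[idx[s], idx[tuple(t)]] += 1.0 / n

# thinning kernel (3.3): each 1 survives independently with probability p
K = np.zeros((N, N))
for s in states:
    for t in states:
        q = 1.0
        for a, b in zip(s, t):
            q *= {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 1 - p, (1, 1): p}[(a, b)]
        K[idx[s], idx[t]] = q

print(np.abs(P @ K - K @ P).max())   # ~ 1e-16: the kernels intertwine
```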
Sum of two processes Let P be a transition kernel on a countable space S and let
X(1) = (Xk (1))k≥0 and X(2) = (Xk (2))k≥0 be two independent Markov chains
with transition kernel P and possibly different deterministic initial states X0 (i) =
x(i) (i = 1, 2). Then (X(1), X(2)) = (Xk (1), Xk (2))k≥0 is a Markov chain with
values in the product space S × S. We may view (X(1), X(2)) as two particles,
3.1. INTERTWINING OF MARKOV CHAINS
63
walking around in the space S. Now maybe we are not interested in which particle
is where, but only in how many particles are on which place. In that case, we may
look at the process
Yk (i) := 1{Xk (1)=i} + 1{Xk (2)=i}
(i ∈ S, k ≥ 0),
which takes values in the space S̃ consisting of all functions y : S → {0, 1, 2} such that ∑_{i∈S} y(i) = 2. Note that Y just counts how many particles are present on each site i ∈ S. It is not hard to see that Y = (Y_k)_{k≥0} is an autonomous Markov chain.
Exercise 3.2 (Counting process) Let (X(1), X(2)) and Y be the Markov chains
with state spaces S × S and S̃ described above. Let P2 be the transition kernel of
(X(1), X(2)) and let P̃ be the transition kernel of Y . For each y ∈ S̃, let K(y, · )
be the uniform distribution on the set
$$U_y := \big\{\big(x(1), x(2)\big) \in S \times S : y = 1_{x(1)} + 1_{x(2)}\big\},$$
where 1_x(i) := 1_{\{x=i\}}. Note that K is a probability kernel from S̃ to S × S. From
y, we can see that there are two particles and where these are, but not which is
the first and which is the second particle. All K does is arbitrarily ordering the
particles in y. Show that
P̃ K = KP2 .
This example can easily be generalized to any number of independent Markov
chains (all with the same transition kernel).
Conditioning on the future As a third example, we look at a Markov chain X with
finite state space S and transition kernel P . We assume that P is such that all
states in S are transient, except for two states z0 and z1 , which are traps. We set
$$h_i(x) := \mathbb{P}^x\big[X_k = z_i \text{ for some } k \ge 0\big] \qquad (i = 0, 1).$$
By Lemma 1.2, we know that h0 and h1 are harmonic functions. Since all states
except z0 and z1 are transient and our state space is finite, we have h0 + h1 = 1.
By Proposition 1.5, the process X conditioned to be eventually trapped in zi is
itself a Markov chain, with state space {x ∈ S : hi (x) > 0} and Doob transformed
transition kernel P hi . Set
$$\tilde S := \big\{(x, i) : x \in S,\ i \in \{0,1\},\ h_i(x) > 0\big\}.$$
We define a probability kernel P̃ on S̃ by
$$\tilde P\big((x,i),(y,j)\big) := 1_{\{i=j\}}\, P^{h_i}(x,y) \qquad (x, y \in S,\ i, j \in \{0,1\}).$$
Note that if (X̃, I) = (X̃k , Ik )k≥0 is a Markov chain with transition kernel P̃ , then
I never changes its value, and depending on whether Ik = 0 for all k ≥ 0 or Ik = 1
for all k ≥ 0, the process X̃ is our original Markov chain X conditioned to be
trapped in either z0 or z1 .
Exercise 3.3 (Conditioning on the future) In the example above, define a
probability kernel K from S to S̃ by
$$K\big(x,(y,i)\big) := 1_{\{x=y\}}\, h_i(x) \qquad (x, y \in S,\ i \in \{0,1\}).$$
Show that
P K = K P̃.
Returning to our general set-up, we observe that the intertwining relation (3.1) implies that for any probability measure µ on S,
$$\mu P^n K = \mu K \tilde P^n \qquad (n \ge 0).$$
This formula has the following interpretation. If we start the Markov chain with transition kernel P in the initial law µ, run it till time n, and then apply the kernel K to its law, then the result is the same as if we start the Markov chain with transition kernel P̃ in the initial law µK, run it till time n, and look at its law. More concretely, in our three examples, this says:
• If we start an exclusion process in some initial law, run it till time n, and
then thin it with the parameter p, then the result is the same as when we
thin the initial state with p, and then run the process till time n.
• If we start the counting process Y in any initial law, run it to time n, and
then arbitrarily order the particles, then the result is the same as when we
first arbitrarily order the particles, and then run the process (X(1), X(2))
till time n.
• If we run the two-trap Markov chain X till time n, and then assign it a value
0 or 1 according to the probability, given its present state Xn , that it will
eventually get trapped in z0 or z1 , respectively, then the result is the same
as when we assign such values at time zero, and then run the appropriate
Doob transformed Markov chain till time n.
None of these statements comes as a big surprise, but we will later see less trivial examples of this phenomenon.

3.2 Markov functionals
We already know that functions of Markov chains usually do not have the Markov
property. An exception, as we have seen in Lemma 0.12, is the case when a function
of a Markov chain is autonomous. Let us quickly recall what this means. Let X
be a Markov chain with countable state space S and transition kernel P , and let
ψ : S → S̃ be a function from S into some other countable set S̃. Then we say
that (Yk )k≥0 := (ψ(Xk ))k≥0 is an autonomous Markov chain if
$$\mathbb{P}\big[\psi(X_{k+1}) = y \,\big|\, X_k = x\big]$$
depends on x only through ψ(x). Equivalently, this says that there exists a transition kernel P̃ on S̃ such that
$$\tilde P(y, y') = \sum_{x':\, \psi(x') = y'} P(x, x') \qquad \forall x \in S \text{ s.t. } \psi(x) = y. \tag{3.4}$$
If (3.4) holds, then, regardless of the initial law of X, one has that the process
ψ(X) = (ψ(Xk ))k≥0 is a Markov chain with transition kernel P̃ .
The next theorem shows that sometimes, a function ψ(X) of a Markov chain X can
be a Markov chain itself, even when it is not autonomous. In this case, however,
this is usually only true for certain special initial laws of X. It seems this result
is due to Rogers and Pitman [RP81]. For better comparison with other results, in
the theorem below, it will be convenient to interchange the roles of X and Y , i.e.,
X will be a function of Y and not the other way around. Note that if X and Y
are random variables such that P[Y ∈ · |X] = K(X, · ), then formula (3.5) below
says that ψ(Y ) = X a.s. Thus, the probability kernel K from S to S̃ is in a sense
the ‘inverse’ of the function ψ : S̃ → S.
Theorem 3.4 (Markov functionals) Let Y be a Markov chain with countable
state space S̃ and transition kernel P , and let ψ : S̃ → S be a function from S̃ into
some other countable set S. Let K be a probability kernel from S to S̃ such that
$$\big\{y \in \tilde S : K(x,y) > 0\big\} \subset \big\{y \in \tilde S : \psi(y) = x\big\} \qquad (x \in S), \tag{3.5}$$
and let P̃ be a transition kernel on S such that
P̃ K = KP.
(3.6)
Then
$$\mathbb{P}\big[Y_0 = y \,\big|\, \psi(Y_0)\big] = K\big(\psi(Y_0), y\big) \quad\text{a.s.} \qquad (y \in \tilde S) \tag{3.7}$$
implies that
$$\mathbb{P}\big[Y_n = y \,\big|\, \psi(Y_0), \dots, \psi(Y_n)\big] = K\big(\psi(Y_n), y\big) \quad\text{a.s.} \qquad (y \in \tilde S,\ n \ge 0). \tag{3.8}$$
Moreover, (3.7) implies that the process ψ(Y ) = (ψ(Yk ))k≥0 , on its own, is a
Markov chain with transition kernel P̃ .
Proof Formula (3.8) says that
$$\mathbb{P}\big[Y_n = y \,\big|\, (\psi(Y_0), \dots, \psi(Y_n)) = (x_0, \dots, x_n)\big] = K(x_n, y) \quad\text{a.s.}$$
for all n ≥ 0, (x_0, …, x_n) ∈ S^{n+1} such that
$$\mathbb{P}\big[(\psi(Y_0), \dots, \psi(Y_n)) = (x_0, \dots, x_n)\big] > 0,$$
and y ∈ S̃ such that ψ(y) = xn . We will prove this by induction. Assume that the
statement holds for some n ≥ 0. We wish to prove the statement for n + 1. Let
(x_0, …, x_{n+1}) be such that
$$\mathbb{P}\big[(\psi(Y_0), \dots, \psi(Y_{n+1})) = (x_0, \dots, x_{n+1})\big] > 0,$$
and let y ∈ S̃ be such that ψ(y) = x_{n+1}. Then
$$\begin{aligned}
&\mathbb{P}\big[Y_{n+1} = y \,\big|\, (\psi(Y_0), \dots, \psi(Y_{n+1})) = (x_0, \dots, x_{n+1})\big]\\
&\quad= \frac{\mathbb{P}\big[Y_{n+1} = y,\ \psi(Y_{n+1}) = x_{n+1} \,\big|\, (\psi(Y_0), \dots, \psi(Y_n)) = (x_0, \dots, x_n)\big]}{\mathbb{P}\big[\psi(Y_{n+1}) = x_{n+1} \,\big|\, (\psi(Y_0), \dots, \psi(Y_n)) = (x_0, \dots, x_n)\big]}\\
&\quad= \frac{\mathbb{P}\big[Y_{n+1} = y \,\big|\, (\psi(Y_0), \dots, \psi(Y_n)) = (x_0, \dots, x_n)\big]}{\mathbb{P}\big[\psi(Y_{n+1}) = x_{n+1} \,\big|\, (\psi(Y_0), \dots, \psi(Y_n)) = (x_0, \dots, x_n)\big]}.
\end{aligned} \tag{3.9}$$
We will treat the numerator and denominator separately. To shorten notation, let us denote the conditional law given (ψ(Y_0), …, ψ(Y_n)) = (x_0, …, x_n) by
$$\mathbb{P}_{(x_0,\dots,x_n)}\big[\,\cdot\,\big] := \mathbb{P}\big[\,\cdot \,\big|\, (\psi(Y_0), \dots, \psi(Y_n)) = (x_0, \dots, x_n)\big].$$
Then
$$\mathbb{P}_{(x_0,\dots,x_n)}\big[Y_{n+1} = y\big] = \sum_{y':\, \psi(y') = x_n} \mathbb{P}_{(x_0,\dots,x_n)}\big[Y_{n+1} = y \,\big|\, Y_n = y'\big]\, \mathbb{P}_{(x_0,\dots,x_n)}\big[Y_n = y'\big].$$
Here, by our induction assumption,
$$\mathbb{P}_{(x_0,\dots,x_n)}\big[Y_n = y'\big] = K(x_n, y') \qquad \forall y' \in \tilde S \text{ s.t. } \psi(y') = x_n,$$
while by the Markov property of Y,
$$\mathbb{P}_{(x_0,\dots,x_n)}\big[Y_{n+1} = y \,\big|\, Y_n = y'\big] = \mathbb{P}\big[Y_{n+1} = y \,\big|\, Y_n = y',\ (\psi(Y_0), \dots, \psi(Y_n)) = (x_0, \dots, x_n)\big] = \mathbb{P}\big[Y_{n+1} = y \,\big|\, Y_n = y'\big] = P(y', y).$$
Thus, we see that
$$\mathbb{P}_{(x_0,\dots,x_n)}\big[Y_{n+1} = y\big] = \sum_{y':\, \psi(y') = x_n} K(x_n, y')P(y', y) = \sum_{y' \in \tilde S} K(x_n, y')P(y', y) = KP(x_n, y) = \tilde P K(x_n, y),$$
where we have used that K(x_n, ·) is concentrated on {y′ : ψ(y′) = x_n} and in the last step we have applied our intertwining assumption.
We observe that by (3.5) and the fact that ψ(y) = x_{n+1},
$$\tilde P K(x_n, y) = \sum_{x \in S} \tilde P(x_n, x)K(x, y) = \sum_{x \in S} \tilde P(x_n, x)K(x, y)\,1_{\{\psi(y) = x\}} = \tilde P(x_n, x_{n+1})\,K(x_{n+1}, y),$$
and therefore
$$\mathbb{P}_{(x_0,\dots,x_n)}\big[Y_{n+1} = y\big] = \tilde P(x_n, x_{n+1})\,K(x_{n+1}, y).$$
It follows that
$$\mathbb{P}_{(x_0,\dots,x_n)}\big[\psi(Y_{n+1}) = x_{n+1}\big] = \sum_{y:\, \psi(y) = x_{n+1}} \mathbb{P}_{(x_0,\dots,x_n)}\big[Y_{n+1} = y\big] = \sum_{y:\, \psi(y) = x_{n+1}} \tilde P(x_n, x_{n+1})\,K(x_{n+1}, y) = \tilde P(x_n, x_{n+1}).$$
Inserting our last two formulas into (3.9), we obtain that
$$\mathbb{P}\big[Y_{n+1} = y \,\big|\, (\psi(Y_0), \dots, \psi(Y_{n+1})) = (x_0, \dots, x_{n+1})\big] = \frac{\tilde P(x_n, x_{n+1})\,K(x_{n+1}, y)}{\tilde P(x_n, x_{n+1})} = K(x_{n+1}, y). \tag{3.10}$$
This completes our proof of (3.8). Moreover, formula (3.10) shows that ψ(Y ) is a
Markov chain with transition kernel P̃ .
Let us see how Theorem 3.4 relates to the examples developed in the previous section.

Counting process Let (X(1), X(2)) be two independent Markov processes, where each process takes values in S and has transition kernel P, as in Exercise 3.2. Let Y and S̃ be as defined there and let ψ : S × S → S̃ be the function
$$\psi\big(x(1), x(2)\big) := 1_{x(1)} + 1_{x(2)},$$
so that Y = ψ(X(1), X(2)). Then the kernel K defined in Exercise 3.2 satisfies
$$\big\{(x(1), x(2)) : K\big(y, (x(1), x(2))\big) > 0\big\} \subset \big\{(x(1), x(2)) : \psi\big(x(1), x(2)\big) = y\big\}.$$
Since moreover P̃K = KP_2, all assumptions of Theorem 3.4 are fulfilled, so we find that
$$\mathbb{P}\big[(X_0(1), X_0(2)) = (x(1), x(2)) \,\big|\, Y_0\big] = K\big(Y_0, (x(1), x(2))\big) \quad\text{a.s.} \tag{3.11}$$
for all (x(1), x(2)) ∈ S × S implies that
$$\mathbb{P}\big[(X_n(1), X_n(2)) = (x(1), x(2)) \,\big|\, (Y_0, \dots, Y_n)\big] = K\big(Y_n, (x(1), x(2))\big) \quad\text{a.s.} \tag{3.12}$$
for all (x(1), x(2)) ∈ S × S and n ≥ 0, and under the same assumption, Y =
ψ(X(1), X(2)) is a Markov chain with transition kernel P̃ . Note that this latter conclusion holds in fact even without the assumption (3.11), since Y is autonomous. Formula (3.12) tells us that if initially we do not know the order of the
particles, then by observing the process Y up to time n, we obtain no information
about the order of the particles.
Conditioning on the future Let X and (X̃, I) be Markov chains with state spaces S and S̃ and transition kernels P and P̃ as in Exercise 3.3. Thus, X is a Markov chain that eventually gets trapped in one of two traps z_0 and z_1, and (X̃, I) is the same process where the second coordinate I tells us from the beginning in which of the two traps we will end up. Define ψ : S̃ → S by
$$\psi(x, i) := x \qquad \big((x, i) \in \tilde S\big),$$
i.e., ψ is just projection on the first coordinate. Then the kernel K from Exercise 3.3 satisfies
$$\big\{(\tilde x, i) \in \tilde S : K\big(x, (\tilde x, i)\big) > 0\big\} \subset \big\{(\tilde x, i) \in \tilde S : \psi(\tilde x, i) = x\big\}$$
and PK = KP̃, so all assumptions of Theorem 3.4 are fulfilled. We therefore conclude that
$$\mathbb{P}\big[(\tilde X_0, I_0) = (\tilde x, i) \,\big|\, \tilde X_0\big] = K\big(\tilde X_0, (\tilde x, i)\big) \quad\text{a.s.} \qquad \big((\tilde x, i) \in \tilde S\big)$$
implies that
$$\mathbb{P}\big[(\tilde X_n, I_n) = (\tilde x, i) \,\big|\, (\tilde X_0, \dots, \tilde X_n)\big] = K\big(\tilde X_n, (\tilde x, i)\big) \quad\text{a.s.}$$
for all n ≥ 0 and (x̃, i) ∈ S̃. More simply formulated, this says that
$$\mathbb{P}\big[I_0 = i \,\big|\, \tilde X_0\big] = h_i(\tilde X_0) \quad\text{a.s.} \qquad (i = 0, 1)$$
implies that
$$\mathbb{P}\big[I_n = i \,\big|\, (\tilde X_0, \dots, \tilde X_n)\big] = h_i(\tilde X_n) \quad\text{a.s.} \qquad (n \ge 0,\ i = 0, 1). \tag{3.13}$$
Moreover, under the same assumption, the process X̃, on its own, is a Markov chain with transition kernel P. Note that this is more surprising than in the previous example, since in the present set-up, X̃ is not autonomous as part of the joint Markov chain (X̃, I). Indeed, X̃ evolves according to the transition kernel P^{h_0}, resp. P^{h_1}, depending on whether I = 0 or I = 1.

To understand how it is possible that X̃ has the Markov property even though it is not autonomous, we observe that by (3.13), if we observe the whole path of the process X̃ up to time n, then we do not gain more information about the present state of I than we would get from knowing X̃_n. As a result,
$$\mathbb{P}\big[\tilde X_{n+1} = x_{n+1} \,\big|\, (\tilde X_0, \dots, \tilde X_n) = (x_0, \dots, x_n)\big] = \mathbb{P}\big[\tilde X_{n+1} = x_{n+1} \,\big|\, (\tilde X_n, I_n) = (x_n, 0)\big]\, h_0(x_n) + \mathbb{P}\big[\tilde X_{n+1} = x_{n+1} \,\big|\, (\tilde X_n, I_n) = (x_n, 1)\big]\, h_1(x_n),$$
which depends only on x_n and not on (x_0, …, x_{n-1}).
Thinning of exclusion processes This example does not satisfy condition (3.5),
hence Theorem 3.4 is not applicable. In view of this and other examples, we will
prove a more general theorem in the next section.
3.3 Intertwining and coupling
In Theorem 3.4 we have seen how intertwining is related to the problem of Markov
functionals, i.e., the question whether certain functions of a Markov chain themselves have the Markov property. The intertwinings in Theorem 3.4 are of a special
kind, because of the condition (3.5). In this section, we will see that intertwining
relations between two Markov chains X and Y in general give rise to couplings between X and Y such that (3.8) holds. The following result is due to Diaconis and
Fill [DF90, Thm 2.17]; the continuous-time analogue has been proved in [Fil92,
Thm. 2].
Theorem 3.5 (Intertwining coupling) Let P and P̃ be probability kernels on countable state spaces S and S̃, respectively. Assume that K is a probability kernel from S to S̃ such that
$$P K = K \tilde P.$$
Let
$$P_{y'}(x, x') := \frac{P(x, x')\,K(x', y')}{P K(x, y')} \qquad \big(x, x' \in S,\ y' \in \tilde S,\ PK(x, y') > 0\big), \tag{3.14}$$
and choose for P_{y'}(x, ·) any probability law on S if PK(x, y′) = 0. Then P_{y'} is a probability kernel on S for each y′ ∈ S̃ and
$$\hat P(x, y;\, x', y') := \tilde P(y, y')\, P_{y'}(x, x') \qquad \big((x, y), (x', y') \in \hat S\big) \tag{3.15}$$
defines a probability kernel on Ŝ := {(x, y) ∈ S × S̃ : K(x, y) > 0}, where P̂ does not depend on the freedom in the choice of the P_{y'}. If (X, Y) is the Markov chain with transition kernel P̂ started in an initial law such that
$$\mathbb{P}\big[Y_0 \in \cdot \,\big|\, X_0\big] = K(X_0, \cdot\,),$$
then
$$\mathbb{P}\big[Y_n \in \cdot \,\big|\, (X_k)_{0 \le k \le n}\big] = K(X_n, \cdot\,) \qquad (n \ge 0),$$
and X, on its own, is a Markov chain with transition kernel P.
Remark It is clear from (3.15) that in the joint Markov chain (X, Y), the second component Y is autonomous with transition kernel P̃, but X is in general not autonomous, unless P_{y'} can be chosen so that it does not depend on y'.
Proof of Theorem 3.5 We observe that

Σ_{x'∈S} P_{y'}(x, x') = P K(x, y') / P K(x, y') = 1   (P K(x, y') > 0),

which shows that P_{y'} is a probability kernel; consequently, Σ_{(x',y')∈Ŝ} P̂(x, y; x', y') = Σ_{y'∈S̃} P̃(y, y') = 1, so P̂ is a probability kernel on Ŝ. To show that the definition of P̂ does not depend on how we define P_{y'}(x, ·) when P K(x, y') = 0, we observe that
(x, y) ∈ Ŝ and P̃(y, y') > 0 imply that K(x, y)P̃(y, y') > 0 and hence P K(x, y') = K P̃(x, y') > 0.
To prove the remaining statements, let K̂ be the probability kernel from S to Ŝ defined by

K̂(x; (x', y')) := 1_{x=x'} K(x, y')   (x ∈ S, (x', y') ∈ Ŝ),

and define ψ : Ŝ → S by ψ(x, y) := x. If we show that

P K̂ = K̂ P̂,

then the remaining claims follow from Theorem 3.4. We observe that

P K̂(x; x', y') = Σ_{x''∈S} P(x, x'') 1_{x''=x'} K(x'', y') = P(x, x') K(x', y')

and

K̂ P̂(x; x', y') = Σ_{(x'',y'')∈Ŝ} 1_{x=x''} K(x, y'') P̂(x'', y''; x', y') = Σ_{y''∈S̃} K(x, y'') P̂(x, y''; x', y'),

so that P K̂ = K̂ P̂ can be written coordinatewise as

P(x, x') K(x', y') = Σ_{y''∈S̃} K(x, y'') P̂(x, y''; x', y')   (x ∈ S, (x', y') ∈ Ŝ).
To check this, we write

Σ_{y''∈S̃} K(x, y'') P̂(x, y''; x', y') = Σ_{y''∈S̃} K(x, y'') P̃(y'', y') · P(x, x') K(x', y') / P K(x, y')
  = (K P̃(x, y') / P K(x, y')) · P(x, x') K(x', y') = P(x, x') K(x', y'),

where we have used our assumption that K P̃ = P K.
Thinning of exclusion processes In this example,

Ŝ = {(x, y) : x, y ∈ {0, 1}^{C_n}, y ≤ x},

and the joint evolution of (X, Y) can be described in the same way as in (3.2), using the same random pair I for both X and Y. While this example is rather trivial (we could have invented this coupling without knowing anything about intertwining!), we will see in the next section that there exist less trivial examples of intertwinings where the coupling of Theorem 3.5 is much more surprising.
3.4 First exit problems revisited
In this section we return to the set-up of Section 2.9. We will be interested in the behavior of a Markov chain X until it leaves some finite set S'. For simplicity, we will assume that the whole state space of X consists just of S' plus one additional point z, which is a trap. More generally, the results we derive can be used to study Markov chains stopped at the first time they leave a finite set.
Thus, we let X be a Markov chain with finite state space S = S' ∪ {z}, where z is a trap. We let P denote the transition kernel of X, we write P' for the restriction of P to S', and assume that P' is irreducible. It follows that P' has a unique Perron-Frobenius eigenvalue α and an associated positive right eigenvector h, which is unique up to a multiplicative constant. If we extend h to S by putting h(z) := 0, then h is an eigenvector of P. Indeed, since h(z) = 0, we have

P h(x) = Σ_{y∈S} P(x, y) h(y) = Σ_{y∈S'} P'(x, y) h(y) = P'h(x) = α h(x)   (x ∈ S'),

while P h(z) = Σ_{y∈S} P(z, y) h(y) = h(z) = 0 by the fact that z is a trap.
Proposition 3.6 (Intertwining for a chain with one trap) Let X be a Markov chain with finite state space S = S' ∪ {z} and transition kernel P. Assume that z is a trap and that the restriction of P to S' is irreducible. Let h be the, up to a multiplicative constant unique, right eigenvector of P such that h(z) = 0 and h > 0 on S', and assume that h is normalized such that sup_{x∈S'} h(x) ≤ 1. Let α be the eigenvalue associated with h and let Y be a Markov process with state space S̃ := {1, 2} and transition kernel

P̃ = ( P̃(1,1)  P̃(1,2) )  =  ( α   1−α )
    ( P̃(2,1)  P̃(2,2) )     ( 0    1  ).

Let K be the probability kernel from S to S̃ defined by

K(x, y) := { h(x)       if y = 1,
             1 − h(x)   if y = 2.

Then

P K = K P̃.
Proof This follows by writing

P K(x, y) = Σ_{x'∈S} P(x, x') K(x', y) = { P h(x) = α h(x)             if y = 1,
                                           P(1 − h)(x) = 1 − α h(x)    if y = 2,

while

K P̃(x, y) = Σ_{y'∈S̃} K(x, y') P̃(y', y)
          = { K(x, 1) α = α h(x)                                            if y = 1,
              K(x, 2) + K(x, 1)(1 − α) = 1 − h(x) + (1 − α) h(x) = 1 − α h(x)  if y = 2,

which shows that P K = K P̃.
By Theorem 3.5 and the remark following it, we see that the processes X and Y from Proposition 3.6 can be coupled in such a way that their joint transition probabilities are given by

P̂(x, y; x', y') = P̃(y, y') P_{y'}(x, x')   ((x, y), (x', y') ∈ Ŝ),

where Ŝ := {(x, y) ∈ S × S̃ : K(x, y) > 0} and P_{y'} is given by

P_{y'}(x, x') := P(x, x') K(x', y') / P K(x, y')   (P K(x, y') > 0).
Filling in the definition of K and using our formulas for P K, we see that

P_1(x, x') = P(x, x') K(x', 1) / P K(x, 1) = P(x, x') h(x') / (α h(x)) = P^h(x, x')   (x, x' ∈ S')

is the generalized Doob transform of P' as defined in Lemma 2.7 and more specifically in Theorem 2.17. Thus, as long as the autonomous process Y stays in 1, the process X jumps according to the generalized Doob transformed transition kernel P^h. Recall that P^h is concentrated on S', hence X cannot get trapped as long as Y = 1. Starting with the time step in which Y jumps to the state 2, and from then on, the process X jumps according to the transition kernel

P_2(x, x') = P(x, x') K(x', 2) / P K(x, 2) = P(x, x')(1 − h(x')) / (1 − α h(x)),

which is well-defined for all x ∈ S since α < 1 and h ≤ 1. Since 1 − h is not an eigenvector of P, this is not a kind of transform we have seen so far.
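To make this concrete, here is a minimal numeric sketch (Python with numpy; the three-state chain and all numbers are hypothetical, not taken from the text) that computes α and h for a chain with one trap, builds the kernel K of Proposition 3.6, and checks the intertwining P K = K P̃ together with the two conditional kernels P_1 = P^h and P_2.

```python
import numpy as np

# Hypothetical 3-state chain: S' = {0, 1}, trap z = 2.
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.0, 0.0, 1.0]])
P0 = P[:2, :2]                                # restriction P' of P to S'

w, V = np.linalg.eig(P0)                      # Perron-Frobenius data of P'
k = np.argmax(w.real)
alpha = w[k].real
h = np.abs(V[:, k].real)
h = np.append(h / h.max(), 0.0)               # normalize sup h <= 1 and set h(z) = 0

K = np.column_stack([h, 1.0 - h])             # K(x,1) = h(x), K(x,2) = 1 - h(x)
Ptil = np.array([[alpha, 1.0 - alpha],
                 [0.0,   1.0]])
assert np.allclose(P @ K, K @ Ptil)           # the intertwining P K = K Ptil

# Conditional kernels of the coupling: P1 = P^h (rows x in S' only) and P2.
P1 = (P[:2] * h[None, :]) / (alpha * h[:2])[:, None]
P2 = (P * (1.0 - h)[None, :]) / (1.0 - alpha * h)[:, None]
print(alpha, h)
print(P1.sum(axis=1), P2.sum(axis=1))         # every row sums to 1
```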
Remark The coupling of X and Y immediately gives us the estimate

P^x[X_n ∈ S'] ≥ K(x, 1) P^1[Y_n = 1] = α^n h(x).   (3.16)

In view of (2.6), this estimate is quite good, but it is not completely sharp, since the function h in Section 2.9 is normalized in a different way than in the present section. Indeed, in Section 2.9 we chose h such that Σ_{x∈S'} η(x) h(x) = 1, where η is a probability measure, while at present we need sup_{x∈S'} h(x) ≤ 1. Thus (unless h is constant on S'), the h in our present section is always smaller than the one in Section 2.9.
Interesting examples of intertwining relations for birth-and-death processes can be
found in [DM09, Swa11]. Lower estimates in the spirit of (3.16) but in the more
complicated set-up of hierarchical contact processes have been derived in [AS10].
The ‘evolving set process’ in [LPW09, Thm 17.23] (which can be defined for quite
general Markov chains) provides another nontrivial example of an intertwining
relation.
Chapter 4
Branching processes

4.1 The branching property
Let S be a countable set and let N(S) be the set of all functions x : S → N such that Σ_{i∈S} x(i) < ∞. It is easy to check that N(S) is a countable set (even though the set of all functions x : S → N is uncountable). We interpret x ∈ N(S) as a collection of finitely many particles or individuals, where x(i) is the number of individuals of type i, where S is the type space, or sometimes also the number of particles at the position i, where S represents physical space. Each x ∈ N(S) can be written as

x = Σ_{β=1}^{|x|} δ_{i_β},   (4.1)

where |x| := Σ_i x(i), i_1, ..., i_{|x|} ∈ S, and δ_i ∈ N(S) is defined as

δ_i(j) := 1_{i=j}   (i, j ∈ S).
This way of writing x is of course not unique but depends on the way we order the individuals. We will be interested in Markov processes with state space N(S), where in each time step, each individual, independently of the others, is replaced by a finite number of new individuals (its offspring). Let Q be a probability kernel from S to N(S). For a given x ∈ N(S) of the form (4.1), we can construct independent N(S)-valued random variables V^1, ..., V^{|x|} such that

P[V^β ∈ ·] = Q(i_β, ·)   (β = 1, ..., |x|).

Then

P(x, ·) := P[Σ_{β=1}^{|x|} V^β ∈ ·]

defines a probability law on N(S). Doing this for each x ∈ N(S) defines a probability kernel P on N(S). By definition, the Markov chain X with state space N(S) and transition kernel P is called the multitype branching process with offspring distribution Q.
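To make the construction concrete, the following minimal simulation sketch (Python; the two-type Poisson offspring law is a hypothetical stand-in for a given kernel Q) performs one time step by replacing each individual independently by a sample from Q(i, ·).

```python
import numpy as np

rng = np.random.default_rng(1)

def branching_step(x, sample_Q):
    """One step of a multitype branching process: x is a dict {type: count};
    sample_Q(i) returns one draw from Q(i, .) as a dict {type: count}."""
    z = {}
    for i, n in x.items():
        for _ in range(n):                     # each individual, independently,
            for j, m in sample_Q(i).items():   # is replaced by its offspring
                z[j] = z.get(j, 0) + m
    return z

# Hypothetical offspring law on S = {0, 1}: Poisson(1.1) children,
# each independently keeping the parent's type or switching, w.p. 1/2.
def sample_Q(i):
    out = {}
    for flip in rng.integers(0, 2, size=rng.poisson(1.1)):
        j = i if flip == 0 else 1 - i
        out[j] = out.get(j, 0) + 1
    return out

x = {0: 5}
for _ in range(10):
    x = branching_step(x, sample_Q)
print(x)                                       # population after 10 generations
```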
Lemma 4.1 (Branching property) Let X = (X_k)_{k≥0} and Y = (Y_k)_{k≥0} be independent branching processes with the same type space S and offspring distribution Q. Then Z = (Z_k)_{k≥0}, defined by

Z_k(i) := X_k(i) + Y_k(i)   (k ≥ 0, i ∈ S)   (4.2)

is a branching process with type space S and offspring distribution Q.
Proof We need to check that

P[Z_{k+1} = z | F_k^Z] = P(Z_k, z) a.s.   (k ≥ 0, z ∈ N(S)),

where (F_k^Z)_{k≥0} is the filtration generated by Z. We will show that actually

P[Z_{k+1} = z | F_k^{(X,Y)}] = P(Z_k, z) a.s.   (k ≥ 0, z ∈ N(S)),

where (F_k^{(X,Y)})_{k≥0} is the filtration generated by (X, Y) = (X_k, Y_k)_{k≥0}. Since F_k^Z ⊂ F_k^{(X,Y)} and P(Z_k, z) is F_k^Z-measurable, this then implies that

P[Z_{k+1} = z | F_k^Z] = E[P[Z_{k+1} = z | F_k^{(X,Y)}] | F_k^Z] = E[P(Z_k, z) | F_k^Z] = P(Z_k, z).

Thus, by the Markov property of (X, Y), it suffices to show that

P[Z_{k+1} = z | X_k = x, Y_k = y] = P(x + y, z)   (k ≥ 0, x, y, z ∈ N(S)).

Here

P[Z_{k+1} = z | X_k = x, Y_k = y]
  = Σ_{x'≤z} P[X_{k+1} = x', Y_{k+1} = z − x' | X_k = x, Y_k = y] = Σ_{x'≤z} P(x, x') P(y, z − x'),

where we have used that, by the independence of X and Y, the process (X, Y) is a Markov chain with transition kernel P_2(x, y; x', y') := P(x, x') P(y, y').
Thus, we are left with the task of showing that

P(x + y, z) = Σ_{x'≤z} P(x, x') P(y, z − x')   (x, y, z ∈ N(S)).   (4.3)

Let us write

x = Σ_{β=1}^{|x|} δ_{i_β}  and  y = Σ_{γ=1}^{|y|} δ_{j_γ},

and let V^1, ..., V^{|x|} and W^1, ..., W^{|y|} be all independent of each other such that

P[V^β ∈ ·] = Q(i_β, ·)   (β = 1, ..., |x|),
P[W^γ ∈ ·] = Q(j_γ, ·)   (γ = 1, ..., |y|).

Then

P(x + y, z) = P[Σ_{β=1}^{|x|} V^β + Σ_{γ=1}^{|y|} W^γ = z]
  = Σ_{x'≤z} P[Σ_{β=1}^{|x|} V^β = x', Σ_{γ=1}^{|y|} W^γ = z − x'] = Σ_{x'≤z} P(x, x') P(y, z − x'),

as required.
Remark We may define the convolution of two probability laws µ, ν on N(S) by

µ ∗ ν(z) := Σ_{x'≤z} µ(x') ν(z − x').

Then (4.3) says that

P(x + y, ·) = P(x, ·) ∗ P(y, ·)   (x, y ∈ N(S)).   (4.4)

In general, any probability kernel on N(S) with this property is said to have the branching property. It is not hard to see that a Markov process with state space N(S) has the branching property (4.2) if and only if its transition kernel has the branching property (4.4). In particular, (4.4) implies that if

x = Σ_{β=1}^{|x|} δ_{i_β},

then

P(x, ·) = P(δ_{i_1}, ·) ∗ ··· ∗ P(δ_{i_{|x|}}, ·),

where P(δ_i, ·) = Q(i, ·). Thus, each Markov process that has the branching property is a branching process, and its transition probabilities are uniquely characterized by the offspring distribution.
4.2 Generating functions
For each function φ : S → R and x ∈ N(S), let us write

φ^x := Π_{i∈S} φ(i)^{x(i)} = Π_{β=1}^{|x|} φ(i_β)  where x = Σ_{β=1}^{|x|} δ_{i_β},

and where φ^0 := 1. It is easy to see that

φ^{x+y} = φ^x φ^y   (x, y ∈ N(S), φ : S → R).

Because of all the independence coming from the branching property, the linear operator P associated with the transition kernel of a branching process maps such 'multiplicative functions' into multiplicative functions. We will especially be interested in the case that φ takes values in [0, 1]. We let [0, 1]^S denote the space of all functions φ : S → [0, 1].
Lemma 4.2 (Generating operator) Let P denote the transition kernel of a multitype branching process with type space S and offspring distribution Q. Let U be the nonlinear operator defined by

1 − Uφ(i) := Σ_{x∈N(S)} Q(i, x)(1 − φ)^x   (i ∈ S, φ ∈ [0, 1]^S).

Then

P f_φ = f_{Uφ}   (φ ∈ [0, 1]^S),

where for any φ ∈ [0, 1]^S, we define f_φ : N(S) → [0, 1] by f_φ(x) := (1 − φ)^x.
Proof If X is started in X_0 = x with x = Σ_{β=1}^{|x|} δ_{i_β}, then X_1 is equal in distribution to Σ_{β=1}^{|x|} V^β, where the V^β's are independent with distribution Q(i_β, ·). It follows that

P f_φ(x) = E[(1 − φ)^{Σ_{β=1}^{|x|} V^β}] = E[Π_{β=1}^{|x|} (1 − φ)^{V^β}]
        = Π_{β=1}^{|x|} E[(1 − φ)^{V^β}] = Π_{β=1}^{|x|} (1 − Uφ)(i_β) = f_{Uφ}(x).
Remark 1 It would seem that the formulation of the lemma is simpler if we replace φ by 1 − φ everywhere, but as we will see later there are good reasons to formulate things in terms of 1 − φ.

Remark 2 Our assumption that 0 ≤ φ ≤ 1 guarantees that the sum Σ_x Q(i, x)(1 − φ)^x in the definition of Uφ(i) is finite. Under more restrictive assumptions on Q, we can define Uφ also for more general real-valued φ.
We call the nonlinear operator U from Lemma 4.2 the generating operator of the branching process with offspring distribution Q. By induction, Lemma 4.2 shows that P^n f_φ = f_{U^n φ}, or, in other words,

E^x[(1 − φ)^{X_n}] = (1 − U^n φ)^x   (n ≥ 0, φ ∈ [0, 1]^S).   (4.5)

The advantage of the operator U is that it acts on functions 'living' on the space S, while P acts on functions on the much larger space N(S). The price we pay for this is that U, unlike P, is not linear.
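For a finitely supported offspring law, U can be evaluated directly from its definition. The sketch below (plain Python; the two-type law Q is hypothetical) computes Uφ and iterates it; as explained in Section 4.3 below, the iterates U^n 1(i) are the probabilities P^{δ_i}[X_n ≠ 0].

```python
# Generating operator for a finitely supported, hypothetical offspring law
# on S = {0, 1}; a configuration x in N(S) is a pair (x(0), x(1)).
Q = {
    0: [((0, 0), 0.3), ((1, 1), 0.7)],  # type 0: dies, or one child of each type
    1: [((0, 0), 0.2), ((0, 2), 0.8)],  # type 1: dies, or two type-1 children
}

def U(phi):
    """Apply U, using 1 - U(phi)(i) = sum_x Q(i, x) (1 - phi)^x."""
    out = []
    for i in (0, 1):
        s = sum(p * (1 - phi[0]) ** x0 * (1 - phi[1]) ** x1
                for (x0, x1), p in Q[i])
        out.append(1.0 - s)
    return tuple(out)

phi = (1.0, 1.0)
for _ in range(100):
    phi = U(phi)     # U^n 1(i) = P[X_n != 0 | X_0 = delta_i], cf. Section 4.3
print(phi)           # converges to the survival probabilities (rho(0), rho(1))
```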
The next lemma shows that U contains, in a sense, 'all information we need'.

Lemma 4.3 (Generating functions are distribution determining) Let µ, ν be probability measures on N(S) such that

Σ_{x∈N(S)} µ(x)(1 − φ)^x = Σ_{x∈N(S)} ν(x)(1 − φ)^x   (φ ∈ [0, 1]^S).

Then µ = ν.

Proof We will prove the statement first under the assumption that S is finite. Let N(S) ∪ {∞} be the one-point compactification of N(S). For each ψ ∈ [0, 1)^S, define g_ψ(x) := ψ^x and g_ψ(∞) := 0. Then the g_ψ's are continuous functions on the compact space N(S) ∪ {∞}. It is not hard to see that they separate points, i.e., for each x ≠ x' there exists a ψ ∈ [0, 1)^S such that g_ψ(x) ≠ g_ψ(x'). Since g_ψ g_{ψ'} = g_{ψψ'}, the class {g_ψ : ψ ∈ [0, 1)^S} is closed under multiplication. Let H be the space of all linear combinations of functions from this class and the constant function 1. Then H is an algebra that separates points, hence by the Stone-Weierstrass theorem H is dense in the space of continuous functions on N(S) ∪ {∞}, equipped with the supremum norm. By linearity and because µ and ν are probability measures, Σ_x µ(x)f(x) = Σ_x ν(x)f(x) for all f ∈ H. Since H is dense, it follows that µ = ν.

If S is not finite, then by applying our argument to functions ψ that are zero outside a finite set, we see that the finite-dimensional marginals of µ and ν agree, which shows that µ = ν in general.
There is a nice suggestive way of writing the relation (4.5). Generalizing (3.3), for any φ ∈ [0, 1]^S, we define a probability kernel K_φ from N(S) to N(S) by

K_φ(x, ·) := P[Σ_{β=1}^{|x|} χ_β δ_{i_β} ∈ ·],   (4.6)

where x = Σ_{β=1}^{|x|} δ_{i_β} and the χ_1, ..., χ_{|x|} are independent Bernoulli random variables with P[χ_β = 1] = φ(i_β). Thus, if Z is distributed according to the law K_φ(x, ·), then Z is obtained from x by independent thinning, where a particle of type i is kept with probability φ(i) and thrown away with the remaining probability. Note that K_φ has the branching property (4.4) and corresponds in fact to the offspring distribution

Q_φ(i, y) := K_φ(δ_i, y) = φ(i) 1_{y=δ_i} + (1 − φ(i)) 1_{y=0}   (i ∈ S, y ∈ N(S)).

Let Thin_φ(x) denote a random variable with law K_φ(x, ·). Then

P[Thin_φ(x) = 0] = (1 − φ)^x   (x ∈ N(S), φ ∈ [0, 1]^S),
where 0 denotes the configuration in N(S) with no particles. In view of this,

1 − Uφ(i) = Σ_{x∈N(S)} Q(i, x)(1 − φ)^x = δ_i Q K_φ(0)

and hence

Uφ(i) = δ_i Q K_φ(N(S)\{0}).   (4.7)

Note that this says that if we start with one particle of type i, let it produce offspring, and then thin with φ, then Uφ(i) is the probability that we are left with at least one individual. Likewise, we may rewrite (4.5) in the form

δ_x P^n K_φ(0) = δ_x K_{U^n φ}(0),   (4.8)

where δ_x P^n K_φ is the law obtained by starting the branching process in x, running it till time n, and then applying the kernel K_φ, while δ_x K_{U^n φ} is the law obtained by thinning x with U^n φ.
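The thinning kernel K_φ of (4.6) is straightforward to simulate, since it acts independently on each particle. The sketch below (hypothetical types and probabilities; numpy's binomial sampler) also gives a quick Monte Carlo sanity check of the composition rule K_φ K_ψ = K_{φψ} of Exercise 4.4 below, by comparing the empirical counts with the means x(i)φ(i)ψ(i).

```python
import numpy as np

rng = np.random.default_rng(0)

def thin(x, phi):
    """Sample from K_phi(x, .): keep each type-i particle independently
    with probability phi[i]."""
    return {i: int(rng.binomial(n, phi[i])) for i, n in x.items()}

x = {'a': 1000, 'b': 500}
phi = {'a': 0.6, 'b': 0.9}
psi = {'a': 0.5, 'b': 0.2}

# Thinning with phi and then with psi is distributed as a single thinning
# with the pointwise product phi*psi (Exercise 4.4 below).
print(thin(thin(x, phi), psi))
print({i: x[i] * phi[i] * psi[i] for i in x})   # the corresponding means
```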
Exercise 4.4 (Repeated thinning) Show that Kφ Kψ = Kφψ (φ, ψ ∈ [0, 1]S ).
Exercise 4.5 (Thinning characterization) Let µ, ν be probability laws on
N (S) such that µKφ (0) = νKφ (0) for all φ ∈ [0, 1]S . Show that µ = ν.
4.3 The survival probability
Let X be a branching process with type space S and generating operator U. We observe that

P^{δ_i}[X_n ≠ 0] = 1 − P^{δ_i}[Thin_1(X_n) = 0] = 1 − P[Thin_{U^n 1}(δ_i) = 0] = 1 − (1 − U^n 1(i)) = U^n 1(i),

where we use the symbol 1 also to denote the function that is constantly one. Since 0 is a trap for any branching process,

P^{δ_i}[X_n ≠ 0] = P^{δ_i}[X_k ≠ 0 ∀ 0 ≤ k ≤ n] → P^{δ_i}[X_k ≠ 0 ∀ k ≥ 0] as n → ∞,

where we have used the continuity of our probability measure with respect to decreasing sequences of events. In view of this, let us write

ρ(i) := lim_{n→∞} U^n 1(i) = P^{δ_i}[X_k ≠ 0 ∀ k ≥ 0]   (4.9)

for the probability to survive starting from a single particle of type i.
Lemma 4.6 (Survival probability) The function ρ in (4.9) is the largest solution (in [0, 1]^S) of the equation

Uρ = ρ,

i.e., ρ solves this equation and any other solution ρ' of this equation, if it exists, satisfies ρ' ≤ ρ.

Proof Since

Uφ(i) = Σ_x Q(i, x)(1 − (1 − φ)^x),

and since φ ↦ 1 − (1 − φ)^x is a nondecreasing function, we see that

φ ≤ ψ implies Uφ ≤ Uψ.   (4.10)

By monotone convergence, we see moreover that

φ_n ↓ φ implies Uφ_n ↓ Uφ.   (4.11)

Since 1 ≥ U1, we see by (4.10) and induction that U^n 1 ≥ U^{n+1} 1, and hence U^n 1 ↓ ρ, which was in fact also clear from our probabilistic interpretation. By (4.11), it follows that

Uρ = U(lim_{n→∞} U^n 1) = lim_{n→∞} U^{n+1} 1 = ρ.

Now if ρ' ∈ [0, 1]^S is any other fixed point of U, then by (4.10),

ρ' ≤ 1 implies ρ' = U^n ρ' ≤ U^n 1 → ρ as n → ∞,

which shows that ρ' ≤ ρ.
Exercise 4.7 (Galton-Watson process) Let X be a branching process whose type space S = {1} consists of a single point, and let Q be its offspring distribution. We identify N({1}) ≅ N. Since there is only one type of individual, we only need to know with which probability a single individual produces n offspring (n ≥ 0). Thus, we simply write

Q(n) := Q(1, nδ_1),

which is a probability law on N. Assume that Q has a finite second moment and let

a := Σ_{n=0}^∞ n Q(n)

denote its mean. We identify [0, 1]^S ≅ [0, 1] and let U : [0, 1] → [0, 1] be the generating operator of X, which is now just a (nonlinear) function from [0, 1] to [0, 1].

(a) Show that U is a concave function and that U'(0) = a.

(b) Assume that Q(1) < 1. Show that

P^1[X_k ≠ 0 ∀ k ≥ 0] > 0 if and only if a > 1.
Remark Single-type branching processes as in Exercise 4.7 are called Galton-Watson processes after the seminal paper [WG75] from 1875, where the authors proved that the survival probability ρ solves Uρ = ρ but incorrectly concluded from this that the process dies out for all a ≥ 0, since 'obviously the solution of this equation is ρ = 0'.
Exercise 4.8 (Spatial branching) Let (I_k)_{k≥0} be i.i.d. with P[I_k = −1] = 1/2 = P[I_k = 1], and let N be a Poisson distributed random variable with mean a, independent of (I_k)_{k≥0}. Let X be the branching process with type space Z and offspring distribution Q given by

Q(i, ·) := P[Σ_{k=1}^N δ_{i+I_k} ∈ ·]   (i ∈ Z),

which says that a particle at i produces Pois(a) offspring which are independently placed on either i − 1 or i + 1, with equal probabilities. Show that

P^{δ_0}[X_k ≠ 0 ∀ k ≥ 0] > 0 if and only if a > 1.
Exercise 4.9 (Two-type process) Let X be a branching process with type
space S = {1, 2} and the following offspring distribution. Individuals of both
types produce a Poisson number of offspring, with mean a. If the parent is of
type 1, then its offspring are, independently of each other, of type 1 or 2 with
probability 1/2 each. All offspring of individuals of type 2 are again of type 2.
Starting with a single individual of type 1, for what values of a is there a positive
probability that there will be individuals of type 1 at all times?
Exercise 4.10 (Poisson offspring) Let X be a Galton-Watson process where each individual produces a Poisson number of offspring with mean a, and let

ρ_n := P^1[X_n = 0]   (n ≥ 0)

be the probability that the process started with a single individual is extinct after n steps. Prove that ρ_{n+1} = e^{a(ρ_n − 1)}.
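A quick numeric companion to this exercise (plain Python; the value a = 1.5 is an arbitrary supercritical choice): iterating ρ_{n+1} = e^{a(ρ_n − 1)} from ρ_0 = 0 gives the extinction probabilities after n steps, and the limit is the extinction probability of the process.

```python
import math

# Extinction probabilities of the Poisson(a) Galton-Watson process
# via the recursion rho_{n+1} = exp(a * (rho_n - 1)).
a = 1.5                  # hypothetical supercritical mean offspring number
rho = 0.0                # rho_0 = P^1[X_0 = 0] = 0
for n in range(100):
    rho = math.exp(a * (rho - 1.0))
print(rho, 1.0 - rho)    # extinction and survival probability
# For a <= 1 the iteration converges to 1 (extinction a.s., cf. Exercise 4.7).
```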
4.4 First moment formula
Proposition 4.11 (First moment formula) Let X be a branching process with type space S and offspring distribution Q. Assume that the matrix

A(i, j) := Σ_{x∈N(S)} Q(i, x) x(j)   (i, j ∈ S)   (4.12)

satisfies

sup_{i∈S} Σ_{j∈S} A(i, j) < ∞.   (4.13)

Then

E^x[X_n(j)] = Σ_i x(i) A^n(i, j)   (x ∈ N(S), j ∈ S, n ≥ 0).
Proof We first prove the statement for n = 1. If x = Σ_{β=1}^{|x|} δ_{i_β}, then X_1 is equal in distribution to Σ_{β=1}^{|x|} V^β, where the V^β's are independent with distribution Q(i_β, ·). Therefore,

E^x[X_1(j)] = E[Σ_{β=1}^{|x|} V^β(j)] = Σ_{β=1}^{|x|} E[V^β(j)]
           = Σ_{β=1}^{|x|} Σ_y Q(i_β, y) y(j) = Σ_{β=1}^{|x|} A(i_β, j) = Σ_i x(i) A(i, j).

By induction, it follows that

E^x[X_{n+1}(j)] = Σ_{x'} P^x[X_n = x'] E^x[X_{n+1}(j) | X_n = x']
  = Σ_{x'} P^x[X_n = x'] Σ_i x'(i) A(i, j) = Σ_i A(i, j) Σ_{x'} P^x[X_n = x'] x'(i)
  = Σ_i A(i, j) E^x[X_n(i)] = Σ_i A(i, j) Σ_k x(k) A^n(k, i) = Σ_k x(k) A^{n+1}(k, j),

where all expressions are finite by (4.13).
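The formula lends itself to a direct numerical check. In the sketch below (a hypothetical two-type offspring law with explicit first moment matrix A; numpy assumed), a Monte Carlo average of X_n agrees with x A^n.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-type offspring law:
# type 0 -> one child of each type w.p. 0.7, else none;
# type 1 -> two type-1 children w.p. 0.4, else none.
def sample_Q(i):
    if i == 0:
        return np.array([1, 1]) if rng.random() < 0.7 else np.zeros(2, int)
    return np.array([0, 2]) if rng.random() < 0.4 else np.zeros(2, int)

A = np.array([[0.7, 0.7],
              [0.0, 0.8]])                # A(i, j) = E[V^i(j)]

def step(x):
    z = np.zeros(2, dtype=int)
    for i in (0, 1):
        for _ in range(x[i]):
            z += sample_Q(i)
    return z

n, x0, trials = 5, np.array([3, 0]), 10000
mean = np.zeros(2)
for _ in range(trials):
    x = x0.copy()
    for _ in range(n):
        x = step(x)
    mean += x
print(mean / trials)                          # empirical E^x[X_n(.)]
print(x0 @ np.linalg.matrix_power(A, n))      # x A^n, Proposition 4.11
```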
Lemma 4.12 (Subcritical processes) Let X be a branching process with finite type space S and offspring distribution Q. Assume that its first moment matrix A defined in (4.12) is irreducible and satisfies (4.13), and let α be its Perron-Frobenius eigenvalue. If α < 1, then

P^x[X_k ≠ 0 ∀ k ≥ 0] = 0   (x ∈ N(S)).
Proof For any f : S → R, let l_f : N(S) → R denote the 'linear' function

l_f(x) := Σ_{i∈S} x(i) f(i)   (x ∈ N(S)).

Then, by Proposition 4.11,

P^n l_f(x) = E^x[l_f(X_n)] = Σ_i f(i) E^x[X_n(i)]
  = Σ_i f(i) Σ_j x(j) A^n(j, i) = Σ_j x(j) A^n f(j) = l_{A^n f}(x).

In particular, if h is the (strictly positive) right eigenvector of A with eigenvalue α, then

P^n l_h = l_{A^n h} = l_{α^n h},

which says that

E^x[l_h(X_n)] = α^n l_h(x),

which tends to zero by our assumption that α < 1. Since h is strictly positive, it follows that P^x[X_n ≠ 0] → 0.
Proposition 4.13 (Critical processes) Let X be a branching process with finite type space S and offspring distribution Q. Assume that its first moment matrix A defined in (4.12) is irreducible and satisfies (4.13), and let α be its Perron-Frobenius eigenvalue. If α = 1 and there exists some i ∈ S such that Q(i, 0) > 0, then

P^x[X_k ≠ 0 ∀ k ≥ 0] = 0   (x ∈ N(S)).
Proof Let A := {X_n ≠ 0 ∀ n ≥ 0} and let

ρ(i) := P^{δ_i}(A).

By the branching property,

P^x(A^c) = (1 − ρ)^x   (x ∈ N(S)).

By the principle 'what can happen must happen' (Proposition 0.14), we have

(1 − ρ)^{X_n} → 0 as n → ∞, a.s. on the event A.

By irreducibility and the fact that Q(j, 0) > 0 for some j, it is not hard to see that ρ(i) < 1 for all i ∈ S. By the finiteness of S, it follows that sup_{i∈S} ρ(i) < 1 and hence

inf_{x∈N(S), |x|≤N} (1 − ρ)^x > 0   (N ≥ 0).

It follows that

|X_n| → ∞ as n → ∞, a.s. on the event A,   (4.14)

i.e., the only way for the process to survive is to let the number of particles tend to infinity.
Let h be the (strictly positive) right eigenvector of the first moment matrix A with eigenvalue α. Then

E[l_h(X_{n+1}) | F_n^X] = (P l_h)(X_n) = l_{αh}(X_n) = α l_h(X_n),

which shows that (provided that E[l_h(X_0)] < ∞, which is satisfied for processes started in deterministic initial states) the process

M_n := α^{−n} Σ_{i∈S} h(i) X_n(i)   (n ≥ 0)   (4.15)

is a nonnegative martingale. By martingale convergence, it follows that there exists a random variable M_∞ such that

M_n → M_∞ a.s. as n → ∞.   (4.16)

In particular, if α = 1, then l_h(X_n) = M_n converges a.s. to a finite limit, so |X_n| does not tend to infinity a.s., which by (4.14) implies that P(A) = 0.
4.5 Second moment formula
We start with some general Markov chain theory. For any probability law µ on a countable set S and functions f, g : S → R, let us write

Cov_µ(f, g) := µ(fg) − (µf)(µg)

for the covariance of f and g, whenever this is well-defined. Note that if X is a random variable with law µ, then Cov_µ(f, g) = E[f(X)g(X)] − E[f(X)]E[g(X)], in accordance with the usual formula for the covariance of two functions.
Lemma 4.14 (Covariance formula) Let X be a Markov chain with countable state space S and transition kernel P. Let µ be a probability law on S and let C be a class of functions f : S → R such that

(i) f ∈ C implies P f ∈ C,

(ii) Σ_{x,y} µ(x) P(x, y) |f(y) g(y)| < ∞ for all f, g ∈ C.

Then, for any f, g ∈ C,

Cov_{µP^n}(f, g) = Cov_µ(P^n f, P^n g) + Σ_{k=1}^n µP^{n−k} Γ(P^{k−1} f, P^{k−1} g),   (4.17)

where

Γ(f, g) := P(fg) − (Pf)(Pg)   (f, g ∈ C).
Remark 1 If X = (X_k)_{k≥0} is the Markov chain with initial law µ and transition kernel P, then

Cov_{µP^n}(f, g) = E[f(X_n) g(X_n)] − E[f(X_n)] E[g(X_n)] =: Cov(f(X_n), g(X_n)),

and similarly

Cov_µ(P^n f, P^n g) = Cov((P^n f)(X_0), (P^n g)(X_0)).

Remark 2 The assumptions of the lemma are trivially fulfilled if we take for C the class of all bounded real functions on S. Often, we also need the lemma for certain unbounded functions, but in this case we need to find a class C satisfying the assumptions of the lemma to ensure that all second moments are finite.
Proof of Lemma 4.14 The statement is trivial for n = 0. Fix n ≥ 1 and define a function H : {0, ..., n} → R by

H(k) := P^k((P^{n−k} f)(P^{n−k} g))   (0 ≤ k ≤ n).

Then

µ(H(n) − H(0)) = µP^n(fg) − µ((P^n f)(P^n g))
  = [µP^n(fg) − (µP^n f)(µP^n g)] − [µ((P^n f)(P^n g)) − (µP^n f)(µP^n g)]
  = Cov_{µP^n}(f, g) − Cov_µ(P^n f, P^n g).

It follows that

Cov_{µP^n}(f, g) − Cov_µ(P^n f, P^n g) = Σ_{k=1}^n µ(H(k) − H(k − 1))
  = Σ_{k=1}^n µ(P^k((P^{n−k} f)(P^{n−k} g)) − P^{k−1}((P^{n−k+1} f)(P^{n−k+1} g)))
  = Σ_{k=1}^n µP^{k−1} Γ(P^{n−k} f, P^{n−k} g).

Changing the summation order (setting k' := n − k + 1), we arrive at (4.17).
We now apply this general formula to branching processes. To simplify matters, we will only look at finite type spaces.

Proposition 4.15 (Second moment formula) Let X be a branching process with finite type space S and offspring distribution Q. Let V^i denote a random variable with law Q(i, ·) and assume that

A(i, j) := E[V^i(j)],   C(i; j, k) := Cov(V^i(j), V^i(k)) = E[V^i(j)V^i(k)] − A(i, j)A(i, k)   (4.18)

are finite for all i, j, k ∈ S. Let A be the linear operator with matrix A(i, j) and for functions f, g : S → R, let C(f, g) : S → R be defined by

C(f, g)(i) := Σ_{j,k∈S} C(i; j, k) f(j) g(k).

For x ∈ N(S) and f : S → R, let xf := Σ_i x(i) f(i). Then, for functions f, g : S → R, one has

E^x[X_n f] = x A^n f,
Cov^x(X_n f, X_n g) = Σ_{k=1}^n x A^{n−k} C(A^{k−1} f, A^{k−1} g),   (4.19)

where Cov^x denotes covariance w.r.t. the law P^x.
Proof The first formula in (4.19) has already been proved in Proposition 4.11. As in the proof of Lemma 4.12, for any real function f on S, let l_f : N(S) → R denote the 'linear' function l_f(x) := Σ_i f(i) x(i). Then the first formula in (4.19) says that

P^n l_f = l_{A^n f}   (f ∈ R^S),

which motivates us to take for the class C in Lemma 4.14 the class of 'linear' functions l_f with f : S → R any function. Using the fact that the C(i; j, k) are finite for all i, j, k ∈ S, it is not hard to prove that this class satisfies the assumptions of Lemma 4.14. Let x = Σ_{β=1}^{|x|} δ_{i_β} and let V^1, ..., V^{|x|} be independent such that V^β is distributed according to Q(i_β, ·). We calculate

Γ(l_f, l_g)(x) = (P(l_f l_g) − (P l_f)(P l_g))(x)
  = Cov(Σ_{j,β} f(j) V^β(j), Σ_{k,γ} g(k) V^γ(k)) = Σ_{j,k} f(j) g(k) Σ_β Cov(V^β(j), V^β(k))
  = Σ_{i,j,k} x(i) f(j) g(k) C(i; j, k) = l_{C(f,g)}(x),

where we have used that Cov(V^β(i), V^γ(j)) = 0 for β ≠ γ by independence. Then Lemma 4.14 tells us that

Cov_{δ_x P^n}(l_f, l_g) = Σ_{k=1}^n δ_x P^{n−k} Γ(P^{k−1} l_f, P^{k−1} l_g) = Σ_{k=1}^n δ_x P^{n−k} Γ(l_{A^{k−1} f}, l_{A^{k−1} g})
  = Σ_{k=1}^n δ_x P^{n−k} l_{C(A^{k−1} f, A^{k−1} g)} = Σ_{k=1}^n l_{A^{n−k} C(A^{k−1} f, A^{k−1} g)}(x),

which proves the second formula in (4.19).
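As with the first moment formula, (4.19) can be checked numerically. The sketch below (same hypothetical two-type law as in the first-moment sketch; numpy assumed) evaluates the right-hand side of the variance formula and compares it with an empirical variance.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical two-type law: type 0 -> (1,1) w.p. 0.7, else (0,0);
# type 1 -> (0,2) w.p. 0.4, else (0,0).
def sample_Q(i):
    if i == 0:
        return np.array([1, 1]) if rng.random() < 0.7 else np.zeros(2, int)
    return np.array([0, 2]) if rng.random() < 0.4 else np.zeros(2, int)

A = np.array([[0.7, 0.7], [0.0, 0.8]])
C = np.zeros((2, 2, 2))          # C[i, j, k] = Cov(V^i(j), V^i(k))
C[0] = 0.7 - 0.7 ** 2            # type 0: all four entries equal 0.21
C[1, 1, 1] = 4 * 0.4 - 0.8 ** 2  # type 1: Var(V^1(1)) = 0.96

def var_formula(x, f, n):
    """Var^x(X_n f) via the second formula in (4.19)."""
    total = 0.0
    for k in range(1, n + 1):
        g = np.linalg.matrix_power(A, k - 1) @ f
        Cgg = np.einsum('ijk,j,k->i', C, g, g)   # C(A^{k-1}f, A^{k-1}f)
        total += x @ np.linalg.matrix_power(A, n - k) @ Cgg
    return total

x0, f, n = np.array([3, 1]), np.array([1.0, 1.0]), 4
vals = []
for _ in range(10000):
    x = x0.copy()
    for _ in range(n):
        z = np.zeros(2, dtype=int)
        for i in (0, 1):
            for _ in range(x[i]):
                z += sample_Q(i)
        x = z
    vals.append(x @ f)
print(np.var(vals), var_formula(x0, f, n))       # should roughly agree
```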
4.6 Supercritical processes
The aim of this section is to prove that supercritical branching processes survive.

Proposition 4.16 (Supercritical process) Let X be a branching process with finite type space S and offspring distribution Q. Assume that the first and second moments A(i, j) and C(i; j, k) of Q, defined in (4.18), are all finite and that A is irreducible. Assume that the Perron-Frobenius eigenvalue α of A satisfies α > 1. Then

P^x[X_k ≠ 0 ∀ k ≥ 0] > 0   (x ∈ N(S), x ≠ 0).
Proof Let h be the (strictly positive) right eigenvector of A with eigenvalue α. We have already seen in formulas (4.15)–(4.16) in the proof of Proposition 4.13 that M_n = α^{−n} X_n h is a nonnegative martingale that converges to an a.s. limit M_∞. By Proposition 0.8, if we can show that M is uniformly integrable, then

E^x[M_∞] = xh > 0

for all x ≠ 0, which shows that the process survives with positive probability. Thus, it suffices to show that

sup_{n≥0} E^x[M_n^2] < ∞.

We write

E^x[M_n^2] = (E^x[α^{−n} X_n h])^2 + Var^x(α^{−n} X_n h),

where by the first formula in (4.19)

(E^x[α^{−n} X_n h])^2 = (α^{−n} x A^n h)^2 = (xh)^2

is clearly bounded uniformly in n. We observe that since h is strictly positive and S is finite, we can find a constant K < ∞ such that

C(h, h) ≤ K h.

Therefore, applying the second formula in (4.19), we see that

Var^x(α^{−n} X_n h) = α^{−2n} Σ_{k=1}^n x A^{n−k} C(A^{k−1} h, A^{k−1} h)
  = α^{−2n} Σ_{k=1}^n x A^{n−k} α^{2(k−1)} C(h, h) ≤ α^{−2n} K Σ_{k=1}^n x A^{n−k} α^{2(k−1)} h
  = α^{−2n} K Σ_{k=1}^n α^{n−k} α^{2(k−1)} xh = K xh Σ_{k=1}^n α^{k−n−2} ≤ K xh α^{−2} Σ_{k=0}^∞ α^{−k} < ∞,

uniformly in n ≥ 1, where we have used that α > 1.
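The martingale M_n is easy to watch numerically. The sketch below (a single-type illustration with Poisson(a) offspring and a hypothetical choice a = 1.3, so that α = a and h ≡ 1) prints independent samples of α^{−n}|X_n| for moderately large n; zero values correspond to extinction, positive values approximate M_∞.

```python
import numpy as np

rng = np.random.default_rng(4)

# Single-type illustration: Poisson(a) offspring with hypothetical a = 1.3,
# so alpha = a, h = 1, and M_n = a^{-n} |X_n| is the martingale from (4.15).
a, n_steps = 1.3, 25
for _ in range(8):                      # eight independent runs
    x = 1
    for _ in range(n_steps):
        x = int(rng.poisson(a, size=x).sum()) if x > 0 else 0
    print(round(x / a ** n_steps, 3), end=' ')
print()
# Zero values correspond to extinction; positive values are (approximate)
# samples of M_infinity, which is > 0 exactly on survival (cf. Exercise 4.22).
```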
4.7 Trimmed processes
Let X be a multitype branching process with countable type space S and offspring distribution Q. Recall from Lemma 4.6 that

ρ(i) := P^{δ_i}[X_k ≠ 0 ∀ k ≥ 0]   (i ∈ S)

is the largest solution of the equation Uρ = ρ, where U is the generating operator of X. Let ρ ∈ [0, 1]^S be any solution of Uρ = ρ. The aim of the present section is to prove a result similar to conditioning on the future (as in Proposition 1.5) or intertwining of processes with one trap (Proposition 3.6), but now on the level of the individuals in a branching process. More precisely, we will divide the population into two 'sorts' of individuals, with probabilities (depending on the type) ρ(i) and 1 − ρ(i). In particular, if ρ is the survival probability, then we divide the population at each time k into those individuals which we know are going to survive (or, more precisely, that have living descendants at all times), and those whose descendants are going to die out completely.
To this aim, let x = Σ_{β=1}^{|x|} δ_{i_β} ∈ N(S) and let χ_1, ..., χ_{|x|} be independent Bernoulli random variables with P[χ_β = 1] = ρ(i_β). Doing this for any x ∈ N(S), we define a probability kernel L_ρ from N(S) to N(S × {0, 1}) by

L_ρ(x, ·) := P[Σ_{β=1}^{|x|} δ_{(i_β, χ_β)} ∈ ·].

Another way to describe L_ρ is to note that L_ρ has the branching property (4.4) and

L_ρ(δ_i, y) = ρ(i) 1_{y=δ_{(i,1)}} + (1 − ρ(i)) 1_{y=δ_{(i,0)}}   (i ∈ S, y ∈ N(Ŝ)).

Note that this is very similar to the thinning kernel K_ρ defined in (4.6).
We set

Ŝ := {(i, σ) : i ∈ S, σ ∈ {0, 1}, L_ρ(δ_i, δ_{(i,σ)}) > 0},

i.e.,

Ŝ := S_0 ∪ S_1,  where  S_0 := {(i, 0) : 1 − ρ(i) > 0}  and  S_1 := {(i, 1) : ρ(i) > 0}.

Then L_ρ is in effect a probability kernel from N(S) to N(Ŝ).
For any x ∈ N(S) and S' ⊂ S, let us write

x|_{S'} := (x(i))_{i∈S'} ∈ N(S')

for the restriction of x to S' (and similarly for subsets of Ŝ). With this notation, if Y is a random variable with law L_ρ(x, ·), then Y|_{S_1} (resp. Y|_{S_0}) is a thinning of x with the function ρ (resp. 1 − ρ).
Recall that the offspring distribution Q of X is a probability kernel from S to N(S). Let

E := {y ∈ N(Ŝ) : y|_{S_1} = 0}.

Observe that Q L_ρ is a probability kernel from S to N(Ŝ). For given i ∈ S, the probability of E under Q L_ρ(i, ·) is given by

Q L_ρ(i, E) = Σ_{x∈N(S)} Q(i, x)(1 − ρ)^x = 1 − Uρ(i) = 1 − ρ(i),   (4.20)

where we have used that Uρ = ρ. We define a probability kernel Q̂ from Ŝ to N(Ŝ) by

Q̂(i, σ; ·) := { Q L_ρ(i, · | E)    if σ = 0,
                Q L_ρ(i, · | E^c)  if σ = 1,

where we are conditioning the probability law Q L_ρ(i, ·) on the event E and its complement, respectively. By (4.20), these conditional probabilities are well-defined, i.e., Q L_ρ(i, E) > 0 for all (i, 0) ∈ S_0 and Q L_ρ(i, E^c) > 0 for all (i, 1) ∈ S_1.
In words, our definition of Q̂ says that an individual of type (i, 0) ∈ S0 produces
offspring in the following manner. First, we produce offspring according to the
law Q(i, · ) of our original branching process. Next, we assign to these individuals
independent ‘signs’ 0, 1 with probabilities depending on the type of the individual
through the function ρ. Finally, we condition on producing only offspring with sign
0. Individuals of type (i, 1) ∈ S1 reproduce similarly, but in this case we condition
on producing at least one offspring with sign 1. Note that individuals of a type in
S1 always produce at least one offspring in S1 , and possibly also offspring in S0 .
Individuals in S0 produce only offspring in S0 , and possibly no offspring at all.
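The conditioning in the definition of Q̂ can be implemented by simple rejection. The sketch below (a hypothetical single-type example with Poisson offspring, where ρ is computed by iterating U as in Lemma 4.6; numpy assumed) samples the offspring, with signs, of an individual of sign σ.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical single-type example: Poisson(a) offspring, for which
# U(phi) = 1 - exp(-a*phi), so rho solves rho = 1 - exp(-a*rho).
a = 1.5
rho = 1.0
for _ in range(200):
    rho = 1.0 - np.exp(-a * rho)         # U^n 1 decreases to rho (Lemma 4.6)

def sample_Qhat(sigma):
    """Rejection sampler for Q^(., sigma; .): returns (# sign-0 children,
    # sign-1 children) of a parent with sign sigma."""
    while True:
        n = rng.poisson(a)                       # offspring drawn from Q
        k1 = int((rng.random(n) < rho).sum())    # independent signs, P[1] = rho
        if (sigma == 0) == (k1 == 0):            # condition on E resp. E^c
            return n - k1, k1

print(sample_Qhat(0), sample_Qhat(1))
# A sign-1 parent always has at least one sign-1 child, while a sign-0
# parent has only sign-0 children, matching the description above.
```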
Theorem 4.17 (Distinguishing surviving particles) Let X = (X_k)_{k≥0} be a branching process with finite type space S and irreducible offspring distribution Q. Assume that X survives (with positive probability). Let ρ, L_ρ, Ŝ, and Q̂ be as defined above. Then X can be coupled to a branching process Y with type space Ŝ and offspring distribution Q̂, in such a way that

P[Y_n = y | (X_k)_{0≤k≤n}] = L_ρ(X_n, y)   (y ∈ N(Ŝ), n ≥ 0).   (4.21)
Proof We apply Theorem 3.5. In fact, for our present purpose Theorem 3.4 is sufficient, where the function ψ occurring there is given by

ψ(y)(i) := y(i, 0) + y(i, 1)   (y ∈ N(Ŝ), i ∈ S).

Let P and P̂ denote the transition kernels of X and Y, respectively. We need to check that

P L_ρ = L_ρ P̂.

Since P, L_ρ, and P̂ have the branching property (4.4), it suffices to check that

δ_i P L_ρ = δ_i L_ρ P̂   (i ∈ S).

Indeed, by our definition of Q̂,

δ_i L_ρ P̂ = (1 − ρ(i)) Q̂(i, 0; ·) + ρ(i) Q̂(i, 1; ·)
         = (1 − ρ(i)) Q L_ρ(i, · | E) + ρ(i) Q L_ρ(i, · | E^c) = Q L_ρ(i, ·) = δ_i P L_ρ,

where we have used (4.20).
Proposition 4.18 (Trimmed process) Let X and Y be the branching processes with type spaces S and Ŝ of Theorem 4.17, let ρ be as in that theorem, and let U be the generating operator of X. Then

(Y_k|_{S_1})_{k≥0}

is a branching process with type space S_1 and generating operator U^ρ given by

U^ρ φ(i) = ρ(i)^{−1} U(ρφ)(i)   (i ∈ S_1, φ ∈ [0, 1]^{S_1}),

where ρφ denotes the pointwise product of ρ and φ, which is extended to a function on S by setting ρφ(j) := 0 for all j ∈ S\S_1.

Proof Since individuals in S_0 never produce offspring in S_1, it is clear that the restriction of Y to S_1 is a branching process. Let Q' be the offspring distribution of the restricted process. Then Q'(i, ·) is just Q K_ρ(i, ·) conditioned on producing at least one offspring, where K_ρ is the thinning kernel defined in (4.6). Setting G := N(S_1)\{0}, we have by (4.7)

U^ρ φ(i) = δ_i Q' K_φ(G) = δ_i Q K_ρ K_φ(G) / δ_i Q K_ρ(G) = δ_i Q K_{ρφ}(G) / δ_i Q K_ρ(G) = U(ρφ)(i) / Uρ(i) = ρ(i)^{−1} U(ρφ)(i),

where we have used that Uρ = ρ.
Remark In particular, if ρ is the survival probability, then the branching process Y|_{S_1} from Proposition 4.18 has been called the trimmed tree of X in [FS04], which deals with continuous type spaces and continuous time. Similar constructions have been used in branching theory long before this paper, but it is hard to find a good reference.
Exercise 4.19 (Nonbranching process) Let S be a countable set, let P' be a probability kernel on S, and let Q be the offspring distribution defined by

Q(i, δ_j) := P'(i, j)   (i, j ∈ S),

and Q(i, x) := 0 if |x| ≠ 1. In other words: a particle at the position i produces exactly one offspring, at a position that is distributed according to P'(i, ·). Let X be the branching process with type space S and offspring distribution Q and let U be its generating operator. Show that U = P'. In particular, this says that a function ρ ∈ [0, 1]^S solves Uρ = ρ if and only if ρ is harmonic for P', and U^ρ = (P')^ρ is then just the classical Doob transform of P'.
If X is a multitype branching process with type space S and offspring distribution Q, then let us write i → j if Q(i, {x : x(j) > 0}) > 0 and i ⇝ j if there exist i = i_0 → ··· → i_n = j. We say that Q is irreducible if i ⇝ j for all i, j ∈ S. In particular, if Q has finite first moments, then this is equivalent to irreducibility of the matrix A(i, j) in (4.18). We also define aperiodicity of Q in the obvious way, i.e., an irreducible Q is aperiodic if the greatest common divisor of {n ≥ 1 : P^n(δ_i, {x : x ≥ δ_i}) > 0} is one for some, and hence for all, i ∈ S. If Q is irreducible and

ρ(i) = P^{δ_i}[X_k ≠ 0 ∀ k ≥ 0]   (i ∈ S),   (4.22)

then it is not hard to see that either ρ(i) > 0 for all i ∈ S, or ρ(i) = 0 for all i ∈ S. In the first case, we say that X survives, while in the second case we say that X dies out.
Exercise 4.20 (Immortal process) Let X be a branching process with finite type space S, offspring distribution Q, and generating operator U. Assume that Q is irreducible and aperiodic and that Q(i, 0) = 0 for each i ∈ S, i.e., each individual always produces at least one offspring. Assume also that Q(i, {x : |x| ≥ 2}) > 0 for at least one i ∈ S. Then it is not hard to show that

P^{δ_i}[X_n(j) ≥ N] → 1 as n → ∞   (i, j ∈ S, N < ∞).

Use this to show that for any φ ∈ [0, 1]^S that is not identically zero,

U^n φ(i) → 1 as n → ∞   (i ∈ S).

In particular, this shows that the equation Uρ = ρ has only two solutions: ρ = 0 and ρ = 1.
Exercise 4.21 (Fixed points of generating operator) Let X be a branching process with finite type space S, offspring distribution Q, and generating operator U. Assume that Q is irreducible and aperiodic and that Q(i, {x : |x| ≥ 2}) > 0 for at least one i ∈ S. Assume that the survival probability ρ in (4.22) is positive for some, and hence for all, i ∈ S. Show that for any φ ∈ [0, 1]^S that is not identically zero,

U^n φ(i) → ρ(i) as n → ∞   (i ∈ S).

In particular, this shows that ρ is the only nonzero solution of the equation Uρ = ρ. Hint: use Proposition 4.18 to reduce the problem to the set-up of Exercise 4.20.
Exercise 4.22 (Exponential growth) In the set-up of Exercise 4.21, assume that Q has finite first moments. Let α be the Perron-Frobenius eigenvalue of the first moment matrix A defined in (4.12) and let h > 0 be the associated right eigenvector. Assume that α > 0, let M = (M_n)_{n≥0} be the martingale defined in (4.15), and let M_∞ := lim_{n→∞} M_n. Set

ρ'(i) := P^{δ_i}[M_∞ > 0].

Prove that ρ' = ρ, where ρ is the survival probability defined in (4.22). Hint: show that Uρ' = ρ'.
Appendix A
Supplementary material

A.1 The spectral radius
Proof of Lemma 2.10 Suppose that λ ∈ spec(A) and let f be an associated eigenvector. Then ‖A^n f‖ = |λ|^n ‖f‖, which shows that ‖A^n‖^{1/n} ≥ |λ| and hence ρ(A) ≥ |λ|.

To complete the proof, it suffices to show that ρ(A) ≤ λ_+, where λ_+ := sup{|λ| : λ ∈ spec(A)}. We start with the case that A can be diagonalized, i.e., there exists a basis {e_1, ..., e_d} of eigenvectors with associated eigenvalues λ_1, ..., λ_d. By Exercise 2.5 the choice of our norm on V is irrelevant. We choose the ℓ^1-norm with respect to the basis {e_1, ..., e_d}, i.e.,

‖φ‖ := Σ_{i=1}^d |φ(i)|,

where φ(1), ..., φ(d) are the coordinates of φ w.r.t. this basis. Then

‖A^n φ‖ = ‖Σ_{i=1}^d φ(i) λ_i^n e_i‖ = Σ_{i=1}^d |φ(i)| |λ_i|^n ≤ λ_+^n ‖φ‖,

which proves that ‖A^n‖ ≤ λ_+^n for each n ≥ 1 and hence ρ(A) ≤ λ_+.
In general, A need not be diagonalizable, but we can choose a basis such that the matrix of A w.r.t. this basis has a Jordan normal form. Then we may write A = D + E, where D is the diagonal part of A and E has ones only on some places just above the diagonal and zeroes elsewhere. One can check that E is nilpotent, i.e., E^m = 0 for some m ≥ 1. Moreover, E commutes with D and ‖E‖ ≤ 1 if we choose the ℓ^1-norm with respect to the basis {e_1, ..., e_d}. Now

A^n = (D + E)^n = Σ_{k=0}^{m−1} (n choose k) D^{n−k} E^k

and therefore

‖A^n‖ ≤ Σ_{k=0}^{m−1} (n choose k) ‖D^{n−k}‖ ≤ Σ_{k=0}^{m−1} (n choose k) λ_+^{n−k} = λ_+^n Σ_{k=0}^{m−1} (n choose k) λ_+^{−k} =: q(n) λ_+^n,

where

q(n) = 1 + n λ_+^{−1} + ½ n(n − 1) λ_+^{−2} + ···

is a polynomial in n of degree m − 1. In particular, this shows that ‖A^n‖/(λ_+ + ε)^n → 0 as n → ∞ for all ε > 0, which again yields that ρ(A) ≤ λ_+.
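Gelfand's formula is also easy to observe numerically. The sketch below (a hypothetical 2×2 Jordan block; numpy assumed) shows ‖A^n‖^{1/n} approaching ρ(A) = max{|λ| : λ ∈ spec(A)}, slowly because of the polynomial factor q(n).

```python
import numpy as np

# Gelfand's formula numerically: ||A^n||^(1/n) -> rho(A). A hypothetical
# non-diagonalizable example: a single Jordan block with lambda = 0.5.
A = np.array([[0.5, 1.0],
              [0.0, 0.5]])
for n in (1, 10, 100, 1000):
    An = np.linalg.matrix_power(A, n)
    print(n, np.linalg.norm(An, 1) ** (1.0 / n))   # induced l^1 (column-sum) norm
print(max(abs(np.linalg.eigvals(A))))              # rho(A) = 0.5
```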
Bibliography

[AS10] S.R. Athreya and J.M. Swart. Survival of contact processes on the hierarchical group. Probab. Theory Relat. Fields 147(3), 529–563, 2010.

[Bil86] P. Billingsley. Probability and Measure. Wiley, New York, 1986.

[Chu74] K.L. Chung. A Course in Probability Theory, 2nd ed. Academic Press, Orlando, 1974.

[DF90] P. Diaconis and J. Fill. Strong stationary times via a new form of duality. Ann. Probab. 18(4), 1483–1522, 1990.

[DM09] P. Diaconis and L. Miclo. On times to quasi-stationarity for birth and death processes. J. Theor. Probab. 22, 558–586, 2009.

[DS67] J.N. Darroch and E. Seneta. On quasi-stationary distributions in absorbing continuous-time finite Markov chains. J. Appl. Probab. 4, 192–196, 1967.

[DZ98] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications, 2nd ed. Applications of Mathematics 38. Springer, New York, 1998.

[Fil92] J.A. Fill. Strong stationary duality for continuous-time Markov chains. I. Theory. J. Theor. Probab. 5(1), 45–70, 1992.

[FS04] K. Fleischmann and J.M. Swart. Trimmed trees and embedded particle systems. Ann. Probab. 32(3A), 2179–2221, 2004.

[Gan00] F.R. Gantmacher. The Theory of Matrices, Vol. 2. AMS, Providence RI, 2000.

[Lach12] P. Lachout. Diskrétní martingály. Skripta MFF UK, 2012.

[Lig10] T.M. Liggett. Continuous Time Markov Processes. AMS, Providence, 2010.

[Lig99] T.M. Liggett. Stochastic Interacting Systems: Contact, Voter and Exclusion Processes. Springer-Verlag, Berlin, 1999.

[LL10] G.F. Lawler and V. Limic. Random Walk: A Modern Introduction. Cambridge Studies in Advanced Mathematics 123. Cambridge University Press, Cambridge, 2010.

[LP11] P. Lachout and Z. Prášková. Základy náhodných procesů I. Skripta MFF UK, 2011.

[LPW09] D.A. Levin, Y. Peres and E.L. Wilmer. Markov Chains and Mixing Times. AMS, Providence, 2009.

[RP81] L.C.G. Rogers and J.W. Pitman. Markov functions. Ann. Probab. 9(4), 573–582, 1981.

[Sen73] E. Seneta. Non-Negative Matrices: An Introduction to Theory and Applications. George Allen & Unwin, London, 1973.

[Swa11] J.M. Swart. Intertwining of birth-and-death processes. Kybernetika 47(1), 1–14, 2011.

[WG75] H.W. Watson and F. Galton. On the probability of the extinction of families. Journal of the Anthropological Institute of Great Britain 4, 138–144, 1875.

[Woe00] W. Woess. Random Walks on Infinite Graphs and Groups. Cambridge Tracts in Mathematics 138. Cambridge University Press, Cambridge, 2000.

[Woe09] W. Woess. Denumerable Markov Chains. Generating Functions, Boundary Theory, Random Walks on Trees. EMS Textbooks in Mathematics. EMS, Zürich, 2009.
Index
absorbing state, 24
adaptedness, 6
algebraic multiplicity, 49
autonomous Markov chain, 13, 65
generalized eigenspace, 49
generalized eigenvector, 49
generating operator, 79
geometric multiplicity, 49
Bernoulli random variable, 62
branching process, 76
branching property, 77
h-transform, 25
harmonic function, 23
homogeneous Markov chain, 12
characteristic polynomial, 49
codimension, 52
compensator, 8
convergence of σ-fields, 9
convolution, 77
coupling, 34
increments
independent, 31
integer lattice, 34
intertwining, 61
invariant
law, 19
measure, 19
invariant subspace, 52
irreducibility, 17, 41, 94
detailed balance, 21
Doob transform, 25
generalized, 46, 56
eigenspace, 49
eigenvector
left or right, 42
equivalence of states, 17
escape probability, 33
extinction, 94
Fekete’s lemma, 43
filtered probability space, 7
filtration, 6
first entrance time, 6
Galton-Watson process, 83
Gelfand’s formula, 50
Jordan normal form, 49
Markov
chain, 12
functional, 13
property, 10, 15
strong property, 16
martingale, 7
convergence, 9
multiplicity
algebraic, 49
geometric, 49
nonnegative definite, 49
nonnegative matrix, 41
null recurrence, 18
offspring distribution, 76
operator norm, 41
optional stopping, 8
period of a state, 18
periodicity, 18
Perron-Frobenius theorem, 42
positive recurrence, 17, 19
quasi-stationary law, 58
random mapping representation, 13
random walk
on integer lattice, 34
on tree, 30
real matrix, 41
recurrence, 17
null, 18
positive, 17, 19
reversibility, 21
sample path, 5
self-adjoint transition kernel, 21
similar matrices, 61
spectral gap
absolute, 54
spectral radius, 41
spectrum, 41
stochastic process, 5
stopped
Markov chain, 13
stochastic process, 7
submartingale, 8
stopping time, 6
strong Markov property, 16
subadditivity, 43
subharmonic function, 23
submartingale, 7
convergence, 9
superharmonic function, 23
supermartingale, 7
convergence, 9
survival, 94
thinning, 62, 80
total variation distance, 38
transience, 17
transition
kernel, 12
probabilities, 12
transposed matrix, 42
trap, 24
type space, 75
uniformly integrable, 10