Lecture 5
1 Markov chain: definition
Definition 1.1 [Markov chain] A sequence of random variables (Xn)n≥0 taking values
in a measurable state space (S, S) is called a (discrete time) Markov chain if, for Fn :=
σ(X0, · · · , Xn),

P(Xn+1 ∈ A|Fn) = P(Xn+1 ∈ A|Xn) for all n ≥ 0 and A ∈ S.   (1.1)
If we interpret the index n ≥ 0 as time, then a Markov chain simply requires that the future
depends only on the present and not on the past.
Examples: 1. Random walks; 2. Branching processes; 3. Polya’s urn.
Remark. Note that any stochastic process (Xn)n≥0 taking values in S can be turned into
a Markov chain if we enlarge the state space from S to ∪_{n∈N} S^n, and change the process
from (Xn)n≥0 to (X̃n)n≥0 with X̃n = (X0, X1, · · · , Xn) ∈ S^{n+1}; namely, the process becomes
Markov if we take its entire past to be its present state.
A more concrete way of characterizing Markov chains is by transition probabilities.
Definition 1.2 [Markov chain transition probabilities] A function p : S × S → [0, 1] is
called a transition probability, if
(i) For each x ∈ S, A → p(x, A) is a probability measure on (S, S).
(ii) For each A ∈ S, x → p(x, A) is a measurable function on (S, S).
We say a Markov chain (Xn )n≥0 has transition probabilities pn if
P(Xn ∈ A|Fn−1) = pn(Xn−1, A)   (1.2)
almost surely for all n ∈ N and A ∈ S. If pn ≡ p for all n ∈ N, then we call (Xn )n≥0 a
time-homogeneous Markov chain, or a Markov chain with stationary transition probabilities.
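To make Definition 1.2 concrete, here is a minimal simulation sketch (not part of the original notes): on a finite state space, row x of a stochastic matrix plays the role of the measure p(x, ·), and a time-homogeneous chain is sampled by repeatedly drawing the next state from the row of the current state. The matrix, initial law, and function names below are illustrative choices only, assuming NumPy.

import numpy as np

def simulate_chain(p, mu, n_steps, rng=None):
    # Sample X_0, ..., X_{n_steps} of a time-homogeneous Markov chain on
    # {0, ..., k-1}: p[x, y] is the one-step transition probability and
    # mu is the initial distribution.
    rng = np.random.default_rng() if rng is None else rng
    k = len(mu)
    path = np.empty(n_steps + 1, dtype=int)
    path[0] = rng.choice(k, p=mu)              # X_0 ~ mu
    for n in range(n_steps):
        # the next state depends only on the current one (Markov property)
        path[n + 1] = rng.choice(k, p=p[path[n]])
    return path

p = np.array([[0.5, 0.5, 0.0],                 # row x is the measure p(x, .)
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
mu = np.array([1.0, 0.0, 0.0])                 # start at state 0
print(simulate_chain(p, mu, 10))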
If the underlying state space (S, S) is nice, then the distribution of a Markov chain X
satisfying (1.1) can be characterized by the initial distribution µ of X0 and the transition
probabilities (pn)n∈N. In particular, if S is a complete separable metric space with Borel σ-algebra S, then regular conditional probability distributions always exist, which guarantees the
existence of transition probabilities pn . Conversely, a given family of transition probabilities
pn and an initial law µ for X0 uniquely determine a consistent family of finite dimensional
distributions:
Pµ(Xi ∈ Ai, 0 ≤ i ≤ n) = ∫_{A0} µ(dx0) ∫_{A1} p1(x0, dx1) · · · ∫_{An} pn(xn−1, dxn),   (1.3)
which are the finite-dimensional distributions of (Xn )n≥0 . When (S, S) is a Polish space
with Borel σ-algebra S, by Kolmogorov’s extension theorem (see [1, Section A.7]), the law of
(Xn)n≥0, regarded as a random variable taking values in (S^N0, S^N0), is uniquely determined.
Here N0 := {0} ∪ N and S^N0 is the Borel σ-algebra generated by the product topology on S^N0.
Theorem 1.3 [Characterization of Markov chains via transition probabilities] Suppose that (S, S) is a Polish space equipped with the Borel σ-algebra. Then to any collection
of transition probabilities pn : S × S → [0, 1] and any probability measure µ on (S, S), there
corresponds a Markov chain (Xn )n≥0 with state space (S, S), initial distribution µ, and finite
dimensional distributions given as in (1.3). Conversely if (Xn )n≥0 is a Markov chain with
initial distribution µ, then we can construct a family of transition probabilities (pn )n∈N such
that the finite dimensional distributions of X satisfy (1.3).
From (1.3), it is also easily seen that Pµ(·) = ∫_S Px(·) µ(dx), where Px denotes the law of the
Markov chain starting at X0 = x.
Remark. When there is no other randomness involved besides the Markov chain (Xn )n≥0 ,
it is customary to let (S^N0, S^N0, Pµ) be the canonical probability space for X with initial
distribution µ.
From the one-step transition probabilities (pn )n∈N , we can easily construct the transition
probabilities between times k < l, i.e., P(Xl ∈ A|Fk ). Define
pk,l(x, A) = ∫_S · · · ∫_S pk+1(x, dyk+1) pk+2(yk+1, dyk+2) · · · pl(yl−1, A).
It is an easy exercise to show that
Theorem 1.4 [Chapman-Kolmogorov equations] The transition probabilities (pk,m )0≤k<m
satisfy the relations
pk,n(x, A) = ∫_S pk,m(x, dy) pm,n(y, A)   (1.4)
for all k < m < n, x ∈ S and A ∈ S. In convolution notation, this reads
pk,n = pk,m ∗ pm,n .
In particular, for any 0 ≤ m < n,
P(Xn ∈ A|Fm) = pm,n(Xm, A)   a.s.
Time-homogeneous Markov chains are determined by their one-step transition probabilities
p = pn−1,n for all n ∈ N. We call p(k) = pn,n+k the k-step transition probabilities. The
Chapman-Kolmogorov equation (1.4) then reads
p(m+n) = p(m) ∗ p(n) .
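On a finite state space this convolution is just matrix multiplication, so the k-step kernel is the k-th matrix power of p and (1.4) becomes associativity of matrix powers. A quick numerical sanity check of this, assuming NumPy, with an arbitrarily chosen stochastic matrix:

import numpy as np

p = np.array([[0.2, 0.8, 0.0],                 # an arbitrary stochastic matrix
              [0.5, 0.0, 0.5],
              [0.3, 0.3, 0.4]])
m, n = 2, 3
p_m = np.linalg.matrix_power(p, m)             # p(m)
p_n = np.linalg.matrix_power(p, n)             # p(n)
p_mn = np.linalg.matrix_power(p, m + n)        # p(m+n)
assert np.allclose(p_mn, p_m @ p_n)            # Chapman-Kolmogorov (1.4)
assert np.allclose(p_mn.sum(axis=1), 1.0)      # each row stays a probability measure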
2 The Markov and strong Markov property
We now restrict ourselves to time-homogeneous Markov chains. The Markov property asserts
that given the value of Xn , the law of (Xn , Xn+1 , · · · ) is the same as that of a Markov chain
starting from Xn , while the strong Markov property asserts that the same is true if we replace
n by a stopping time τ . When the stopping time is a hitting time of a particular point x0 ∈ S,
the strong Markov property tells us that the process renews itself and has no memory of the
past. Such renewal structures are particularly useful in the study of Markov chains.
We will formulate the Markov property as an equality in law in terms of conditional
expectations of bounded measurable functions.
Theorem 2.1 [The Markov property] Let (S^N0, S^N0, Pµ) be the canonical probability space
of a homogeneous Markov chain X with initial distribution µ, and let Fn = σ(X0, · · · , Xn).
Let θn : S^N0 → S^N0 denote the shift map with (θn X)m = Xm+n for m ≥ 0. Then for any
bounded measurable function f : S^N0 → R,

Eµ[f(θn X)|Fn] = EXn[f],   (2.5)
where Eµ (resp. EXn ) denotes expectation w.r.t. the Markov chain with initial law µ (resp.
δXn ).
Proof. It suffices to show that for all A ∈ Fn and all bounded measurable f ,
Eµ[f(θn X) 1A] = Eµ[EXn[f] 1A].   (2.6)
We can use the π-λ theorem to restrict our attention to sets of the form A = {ω ∈ S^N0 : ω0 ∈
A0, ω1 ∈ A1, · · · , ωn ∈ An}, and use the monotone class theorem to restrict our attention to
functions of the form f(ω) = ∏_{i=0}^{k} gi(ωi) for some k ∈ N and bounded measurable gi : S → R.
For A and f of the forms specified above, by successive conditioning and the fact that the
transition probabilities p of the Markov chain are regular conditional probabilities,
Eµ[f(θn X) 1A] = Eµ[gk(Xn+k) · · · g0(Xn) 1An(Xn) · · · 1A0(X0)]
             = ∫_{A0} µ(dx0) ∫_{A1} p(x0, dx1) · · · ∫_{An} p(xn−1, dxn) g0(xn)
                 · ∫_S p(xn, dxn+1) g1(xn+1) · · · ∫_S p(xn+k−1, dxn+k) gk(xn+k)
             = Eµ[EXn[g0 · · · gk] 1A] = Eµ[EXn[f] 1A].   (2.7)
Given f = ∏_{i=0}^{k} gi(ωi), the collection of sets A ∈ Fn which satisfy (2.7) is a λ-system, while
the sets of the form A = {ω ∈ S^N0 : ω0 ∈ A0, · · · , ωn ∈ An} form a π-system. Therefore, by the
π-λ theorem, (2.7) holds for all A ∈ Fn.
Now we fix A ∈ Fn . Let H denote the set of bounded measurable functions for which
(2.7) holds. We have shown that H contains all functions of the form f(ω) = ∏_{i=0}^{k} gi(ωi).
In particular, H contains the indicator functions of sets of the form A = {ω ∈ S^N0 : ω0 ∈
A0, · · · , ωk ∈ Ak}, which form a π-system that generates the σ-algebra S^N0. Clearly H is closed
under addition, scalar multiplication, and increasing limits. Therefore by the monotone class
theorem, H contains all bounded measurable functions.
Theorem 2.2 [Monotone class theorem] Let Π be a π-system which contains the full set
Ω, and let H be a collection of real-valued functions satisfying
(i) If A ∈ Π, then 1A ∈ H.
(ii) If f, g ∈ H, then f + g ∈ H, and cf ∈ H for any c ∈ R.
(iii) If fn ∈ H are non-negative, and fn ↑ f where f is bounded, then f ∈ H.
Then H contains all bounded measurable functions w.r.t. the σ-algebra generated by Π.
The monotone class theorem is a simple consequence of the π-λ theorem. See e.g. Durrett [1]
for a proof.
Theorem 2.3 [The strong Markov property] Following the setup of Theorem 2.1, let
τ be an (Fn)n≥0 stopping time, and let (fn)n≥0 be a sequence of uniformly bounded measurable
functions from S^N0 to R. Then

Eµ[fτ(θτ X)|Fτ] 1{τ<∞} = EXτ[fτ] 1{τ<∞}   a.s.   (2.8)
Proof. Let A ∈ Fτ . Then
Eµ[fτ(θτ X) 1A∩{τ<∞}] = ∑_{n=0}^{∞} Eµ[fn(θn X) 1A∩{τ=n}].
Since A ∩ {τ = n} ∈ Fn , by the Markov property (2.5), the right hand side equals
∑_{n=0}^{∞} Eµ[EXn[fn] 1A∩{τ=n}] = Eµ[EXτ[fτ] 1A∩{τ<∞}],
which proves (2.8).
To illustrate the use of the strong Markov property and the reason for introducing the
dependence of the functions fn on n, we prove the following.
Example 2.4 [Reflection principle for simple symmetric random walks] Let Xn =
∑_{i=1}^{n} ξi, where the ξi are i.i.d. with P(ξi = ±1) = 1/2. Then for any a ∈ N,

P(max_{1≤i≤n} Xi ≥ a) = 2P(Xn ≥ a + 1) + P(Xn = a).   (2.9)
Proof. Let τa = inf{0 ≤ k ≤ n : Xk = a} with τa = ∞ if the set is empty. Then
max_{1≤i≤n} Xi ≥ a if and only if τa ≤ n. Therefore

P(max_{1≤i≤n} Xi ≥ a) = P(τa ≤ n) = P(τa ≤ n, Xn < a) + P(τa ≤ n, Xn > a) + P(τa ≤ n, Xn = a).
Note that P(τa ≤ n, Xn > a) = P(Xn > a) because X is a nearest-neighbor random walk, and
similarly P(τa ≤ n, Xn = a) = P(Xn = a), while
P(τa ≤ n, Xn < a) = E[1{τa ≤n} P(Xn < a|Fτa )] = E[1{τa ≤n} Pa (Xn−τa < a)],
where we have applied (2.8) with fk = 1{Xn−k <a} if 0 ≤ k ≤ n and fk = 0 otherwise. By
symmetry, conditional on τa , we have Pa (Xn−τa < a) = Pa (Xn−τa > a). Therefore
P(τa ≤ n, Xn < a) = P(τa ≤ n, Xn > a) = P(Xn > a),
which then implies (2.9).
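Identity (2.9) is exact, so it can be checked against simulation. The following Monte Carlo sketch (the choices n = 20, a = 4, the seed, and the sample size are arbitrary, and NumPy is assumed) estimates both sides; they should agree up to sampling error.

import numpy as np

rng = np.random.default_rng(0)
n, a, trials = 20, 4, 200_000                  # arbitrary illustrative choices
steps = rng.choice([-1, 1], size=(trials, n))  # the i.i.d. increments xi_i
X = steps.cumsum(axis=1)                       # X_1, ..., X_n along each row
lhs = (X.max(axis=1) >= a).mean()              # P(max_{1<=i<=n} X_i >= a)
rhs = 2 * (X[:, -1] >= a + 1).mean() + (X[:, -1] == a).mean()
print(lhs, rhs)                                # agreement up to Monte Carlo error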
Remark. The proof of Theorem 2.3 shows that a discrete time Markov chain is always
strong Markov. However, this conclusion is false for continuous time Markov processes. The
reason is that there are uncountably many times which may conspire together to make
the strong Markov property fail, even though the Markov property holds almost surely at
deterministic times. One way to guarantee the strong Markov property is to require the
transition probabilities pt (x, ·) to be continuous in t and x, which is called the Feller property.
3 Markov chains with a countable state space
We now focus on time-homogeneous Markov chains with a countable state space S. Let
(p(x, y))x,y∈S denote the 1-step transition probability kernel of the Markov chain (Xn)n≥0,
which is a matrix with non-negative entries and ∑_{y∈S} p(x, y) = 1 for all x ∈ S. Such matrices
are called stochastic matrices. The n-step transition probability kernel of the Markov chain is
then given by the n-th power of p, i.e., p(n)(x, y) = ∑_{z∈S} p(n−1)(x, z) p(z, y). We first consider
the following subclass of Markov chains.
Definition 3.1 [Irreducible Markov chains] A Markov chain with a countable state space
S is called irreducible if for all x, y ∈ S, p(n) (x, y) > 0 for some n ≥ 0.
In other words, every state communicates with every other state. A Markov chain fails to be
irreducible either because the state space can be partitioned into non-communicating disjoint
subsets, or because there are subsets of the state space acting as sinks: once the Markov chain
enters such a subset, it can never leave it.
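For a finite state space, irreducibility is a purely graph-theoretic property: the directed graph with an edge x → y whenever p(x, y) > 0 must be strongly connected. A minimal sketch of this check, assuming NumPy (the example matrix is an arbitrary reducible one, not taken from the notes):

import numpy as np

def is_irreducible(p):
    # Irreducible iff every state reaches every other; compute the
    # transitive closure of the reachability relation by repeated
    # boolean squaring of the adjacency matrix.
    k = p.shape[0]
    reach = np.eye(k, dtype=bool) | (p > 0)    # paths of length 0 or 1
    for _ in range(k):
        reach = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
    return bool(reach.all())

p = np.array([[0.5, 0.5, 0.0],                 # {0, 1} is a closed class
              [0.5, 0.5, 0.0],
              [0.2, 0.2, 0.6]])
print(is_irreducible(p))                       # False: 2 is unreachable from 0, 1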
Definition 3.2 [Transience, null recurrence, and positive recurrence] Let τy := inf{n >
0 : Xn = y} be the first hitting time (after time 0) of the state y ∈ S by the Markov chain X.
Any state x ∈ S can then be classified into the following three types:
(i) Transient, if Px (τx < ∞) < 1.
(ii) Null recurrent, if Px (τx < ∞) = 1 and Ex [τx ] = ∞.
(iii) Positive recurrent, if Px (τx < ∞) = 1 and Ex [τx ] < ∞.
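The quantities in this classification can be estimated by simulation. The sketch below (all numerical choices, including the truncation cap, are illustrative, and NumPy is assumed) samples the return time τ0 of a small chain; a finite irreducible chain is in fact positive recurrent (a standard fact not proved in these notes), so the empirical mean approximates E0[τ0].

import numpy as np

def sample_return_time(p, x, rng, cap=10_000):
    # One sample of tau_x = inf{n > 0 : X_n = x} for the chain started at x;
    # returns inf if x is not revisited within `cap` steps (a truncation).
    state, k = x, p.shape[0]
    for n in range(1, cap + 1):
        state = rng.choice(k, p=p[state])
        if state == x:
            return n
    return np.inf

rng = np.random.default_rng(1)
p = np.array([[0.5, 0.5, 0.0],                 # a small irreducible chain
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])
taus = np.array([sample_return_time(p, 0, rng) for _ in range(5_000)])
print((taus < np.inf).mean())                  # estimates P_0(tau_0 < infinity)
print(taus[taus < np.inf].mean())              # estimates E_0[tau_0]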
It turns out that for an irreducible Markov chain, all states are of the same type. Therefore
transience, null recurrence and positive recurrence will also be used to classify irreducible
Markov chains. Before proving this claim, we first prove some preliminary results.
Lemma 3.3 Let ρxy = Px(τy < ∞) for x, y ∈ S, and let G(x, y) = ∑_{n=0}^{∞} Px(Xn = y) =
∑_{n=0}^{∞} p(n)(x, y). If y is transient, then

G(x, y) = ρxy/(1 − ρyy) if x ≠ y,   and   G(x, y) = 1/(1 − ρyy) if x = y.   (3.10)
If y is recurrent, then G(x, y) = ∞ for all x ∈ S with ρxy > 0.
Proof. Assuming X0 = y, let Ty^0 = 0, and define inductively Ty^k = inf{i > Ty^{k−1} : Xi = y}.
Namely, the Ty^k are the successive return times to y. By the strong Markov property, Py(Ty^k <
∞ | Ty^{k−1} < ∞) = Py(Ty^1 < ∞) = ρyy. By successive conditioning, we thus have Py(Ty^k < ∞) =
ρyy^k. Therefore

G(y, y) = ∑_{k=0}^{∞} Py(Ty^k < ∞) = ∑_{k=0}^{∞} ρyy^k = 1/(1 − ρyy).   (3.11)

Therefore G(y, y) = ∞ if and only if ρyy = 1, i.e., y is recurrent.
For x ≠ y, we first have to wait until X visits y; since Px(Ty^1 < ∞) = ρxy, the strong Markov
property gives Px(Ty^k < ∞) = ρxy ρyy^{k−1} for k ≥ 1, and hence

G(x, y) = ∑_{k=1}^{∞} Px(Ty^k < ∞) = ∑_{k=1}^{∞} ρxy ρyy^{k−1} = ρxy/(1 − ρyy).   (3.12)

This completes the proof of the lemma.
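For a finite chain whose transient states form a set T (for instance, when the complement is absorbing), the series defining G converges on T × T to (I − Q)^{-1}, where Q is the restriction of p to T, and (3.10)-(3.11) then let us read off ρyy and ρxy from G. A sketch under these assumptions, with an arbitrary example matrix (states 0, 1 transient, state 2 absorbing) and NumPy assumed:

import numpy as np

p = np.array([[0.4, 0.4, 0.2],                 # states 0, 1 are transient;
              [0.3, 0.3, 0.4],                 # state 2 is absorbing
              [0.0, 0.0, 1.0]])
Q = p[:2, :2]                                  # transitions among transient states
G = np.linalg.inv(np.eye(2) - Q)               # G(x, y) = sum_n Q^n (x, y)
rho_00 = 1 - 1 / G[0, 0]                       # return probability, from (3.11)
rho_01 = G[0, 1] / G[1, 1]                     # hitting probability, from (3.10)
print(G)
print(rho_00, rho_01)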
Lemma 3.4 If x ∈ S is recurrent, y ≠ x, and ρxy := Px(τy < ∞) > 0, then Px(τy < τx) > 0,
ρyx := Py(τx < ∞) = 1 = ρxy, and y is also recurrent.
Proof. If Px (τy < τx ) = 0 so that the Markov chain starting from x returns to x before
visiting y almost surely, then when it returns to x, it starts afresh and will not visit y before
a second return to x. Iterating this reasoning, the Markov chain will visit x infinitely often
before visiting y, which means it will never visit y, contradicting the assumption.
Suppose that ρyx < 1. Let k = inf{i > 0 : p(i)(x, y) > 0}, which is finite since ρxy > 0. Then
there exist y1, · · · , yk−1 ∈ S, all distinct from x and y, such that p(x, y1)p(y1, y2) · · · p(yk−1, y) >
0. Then

Px(τx = ∞) ≥ p(x, y1) · · · p(yk−1, y)(1 − ρyx) > 0,
which contradicts the recurrence of x. Hence ρyx = 1.
Since upon each return to x, with probability Px (τy < τx ) > 0, the Markov chain will
visit y before returning to x, it follows that ρxy = 1 because the Markov chain returns to x
infinitely often by recurrence, and the events that y is visited between different consecutive
returns to x are independent by the strong Markov property. Since ρyx = ρxy = 1, almost
surely the Markov chain starting from y will visit x and then return to y. Therefore y is also
recurrent.
We are now ready to prove
Theorem 3.5 For an irreducible Markov chain, all states are of the same type.
Proof. Lemma 3.4 has shown that if x is recurrent, then so is any other y ∈ S by the
irreducibility assumption. It remains to show that if x is positive recurrent, then so is any
y ∈ S. Let p = Px (τy < τx ), which is positive by Lemma 3.4. Then
Ex[τx] ≥ Px(τy < τx) Ey[τx].

Therefore Ey[τx] ≤ (1/p) Ex[τx] < ∞. On the other hand,

Ex[τy] ≤ Ex[1{τy<τx} τx] + Ex[1{τx<τy} τy] = Ex[1{τy<τx} τx] + Ex[1{τx<τy} E[τy|Fτx]]
       = Ex[1{τy<τx} τx] + Ex[1{τx<τy} (τx + Ex[τy])]
       = Ex[τx] + (1 − p) Ex[τy].

Therefore Ex[τy] ≤ (1/p) Ex[τx], and

Ey[τy] ≤ Ey[τx] + Ex[τy] ≤ (2/p) Ex[τx] < ∞,
which proves the positive recurrence of y.
Remark. Theorem 3.5 allows us to classify an irreducible countable state space Markov chain
as either transient, null recurrent, or positive recurrent, according to the type of its states.
References
[1] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, Belmont,
California, 1996.