Lecture 5

1 Markov chain: definition

Definition 1.1 [Markov chain] A sequence of random variables (Xn)n≥0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if for Fn := σ(X0, · · · , Xn),

    P(Xn+1 ∈ A | Fn) = P(Xn+1 ∈ A | Xn)   ∀ n ≥ 0 and A ∈ S.    (1.1)

If we interpret the index n ≥ 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Examples: 1. Random walks; 2. Branching processes; 3. Polya's urn.

Remark. Note that any stochastic process (Xn)n≥0 taking values in S can be turned into a Markov chain if we enlarge the state space from S to ∪_{n∈N} S^n, and change the process from (Xn)n≥0 to (X̃n)n≥0 with X̃n = (X0, X1, · · · , Xn) ∈ S^{n+1}; namely, the process becomes Markov if we take its entire past to be its present state.

A more concrete way of characterizing Markov chains is by transition probabilities.

Definition 1.2 [Markov chain transition probabilities] A function p : S × S → [0, 1] is called a transition probability, if

(i) For each x ∈ S, A → p(x, A) is a probability measure on (S, S).
(ii) For each A ∈ S, x → p(x, A) is a measurable function on (S, S).

We say a Markov chain (Xn)n≥0 has transition probabilities pn if

    P(Xn ∈ A | Fn−1) = pn(Xn−1, A)    (1.2)

almost surely for all n ∈ N and A ∈ S. If pn ≡ p for all n ∈ N, then we call (Xn)n≥0 a time-homogeneous Markov chain, or a Markov chain with stationary transition probabilities.

If the underlying state space (S, S) is nice, then the distribution of a Markov chain X satisfying (1.1) can be characterized by the initial distribution µ of X0 and the transition probabilities (pn)n∈N. In particular, if S is a complete separable metric space with Borel σ-algebra S, then regular conditional probability distributions always exist, which guarantees the existence of the transition probabilities pn.
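The defining property (1.1), that the next step is drawn from a kernel depending only on the present state, can be illustrated for a finite state space. This is a minimal sketch, not part of the notes; the two-state kernel p and initial law mu below are hypothetical.

```python
import random

def simulate_chain(p, mu, n_steps, rng):
    """Sample a path X_0, ..., X_n of a time-homogeneous Markov chain.

    p[x][y] is the one-step transition probability p(x, {y});
    mu[x] is the initial distribution of X_0.
    """
    states = list(mu)
    # Draw X_0 from the initial law mu.
    x = rng.choices(states, weights=[mu[s] for s in states])[0]
    path = [x]
    for _ in range(n_steps):
        # Draw X_{n+1} from the kernel p(X_n, .), using only the present state.
        x = rng.choices(states, weights=[p[x][s] for s in states])[0]
        path.append(x)
    return path

rng = random.Random(0)
# A hypothetical two-state chain on S = {0, 1}, started at 0.
p = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}
mu = {0: 1.0, 1: 0.0}
path = simulate_chain(p, mu, 10, rng)
```

Note that the sampler never looks at `path`, only at the current state x; that is precisely the Markov property.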
Conversely, a given family of transition probabilities pn and an initial law µ for X0 uniquely determine a consistent family of finite-dimensional distributions:

    Pµ(Xi ∈ Ai, 0 ≤ i ≤ n) = ∫_{A0} µ(dx0) ∫_{A1} p1(x0, dx1) · · · ∫_{An} pn(xn−1, dxn),    (1.3)

which are the finite-dimensional distributions of (Xn)n≥0. When (S, S) is a Polish space with Borel σ-algebra S, by Kolmogorov's extension theorem (see [1, Section A.7]), the law of (Xn)n≥0, regarded as a random variable taking values in (S^N0, S^N0), is uniquely determined. Here N0 := {0} ∪ N and S^N0 is the Borel σ-algebra generated by the product topology on S^N0.

Theorem 1.3 [Characterization of Markov chains via transition probabilities] Suppose that (S, S) is a Polish space equipped with the Borel σ-algebra. Then to any collection of transition probabilities pn : S × S → [0, 1] and any probability measure µ on (S, S), there corresponds a Markov chain (Xn)n≥0 with state space (S, S), initial distribution µ, and finite-dimensional distributions given as in (1.3). Conversely, if (Xn)n≥0 is a Markov chain with initial distribution µ, then we can construct a family of transition probabilities (pn)n∈N such that the finite-dimensional distributions of X satisfy (1.3).

From (1.3), it is also easily seen that Pµ(·) = ∫ Px(·) µ(dx), where Px denotes the law of the Markov chain starting at X0 = x.

Remark. When there is no other randomness involved besides the Markov chain (Xn)n≥0, it is customary to let (S^N0, S^N0, Pµ) be the canonical probability space for X with initial distribution µ.

From the one-step transition probabilities (pn)n∈N, we can easily construct the transition probabilities between times k < l, i.e., P(Xl ∈ A | Fk). Define

    pk,l(x, A) = ∫_S · · · ∫_S pk+1(x, dy_{k+1}) pk+2(y_{k+1}, dy_{k+2}) · · · pl(y_{l−1}, A).
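For a countable (here finite) state space, the integrals in (1.3) reduce to a sum over paths x0 ∈ A0, · · · , xn ∈ An of µ(x0) p(x0, x1) · · · p(xn−1, xn). A small sketch of this sum-over-paths computation, with a hypothetical two-state kernel:

```python
from itertools import product

def fdd(mu, p, events):
    """P_mu(X_i in A_i, 0 <= i <= n), computed as the sum-over-paths
    form of (1.3): sum over x_0 in A_0, ..., x_n in A_n of
    mu(x_0) p(x_0, x_1) ... p(x_{n-1}, x_n)."""
    total = 0.0
    for path in product(*events):
        w = mu[path[0]]
        for a, b in zip(path, path[1:]):
            w *= p[a][b]
        total += w
    return total

# hypothetical two-state chain
p = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
mu = {0: 0.5, 1: 0.5}
# P(X_0 in {0}, X_1 in {0,1}, X_2 in {1})
prob = fdd(mu, p, [{0}, {0, 1}, {1}])
```

Taking every Ai = S recovers total mass 1, which is the consistency underlying Kolmogorov's extension theorem.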
It is an easy exercise to show that

Theorem 1.4 [Chapman-Kolmogorov equations] The transition probabilities (pk,m)0≤k<m satisfy the relations

    pk,n(x, A) = ∫_S pk,m(x, dy) pm,n(y, A)    (1.4)

for all k < m < n, x ∈ S and A ∈ S. In convolution notation, this reads pk,n = pk,m ∗ pm,n.

In particular, for any 0 ≤ m < n, P(Xn ∈ A | Fm) = pm,n(Xm, A) a.s. Time-homogeneous Markov chains are determined by their one-step transition probabilities p = pn−1,n for all n ∈ N. We call p^(k) = pn,n+k the k-step transition probabilities. The Chapman-Kolmogorov equation (1.4) then reads p^(m+n) = p^(m) ∗ p^(n).

2 The Markov and strong Markov property

We now restrict ourselves to time-homogeneous Markov chains. The Markov property asserts that given the value of Xn, the law of (Xn, Xn+1, · · · ) is the same as that of a Markov chain starting from Xn, while the strong Markov property asserts that the same is true if we replace n by a stopping time τ. When the stopping time is a hitting time of a particular point x0 ∈ S, the strong Markov property tells us that the process renews itself and has no memory of the past. Such renewal structures are particularly useful in the study of Markov chains.

We will formulate the Markov property as an equality in law in terms of conditional expectations of bounded measurable functions.

Theorem 2.1 [The Markov property] Let (S^N0, S^N0, Pµ) be the canonical probability space of a homogeneous Markov chain X with initial distribution µ, and let Fn = σ(X0, · · · , Xn). Let θn : S^N0 → S^N0 denote the shift map with (θn X)m = Xm+n for m ≥ 0. Then for any bounded measurable function f : S^N0 → R,

    Eµ[f(θn X) | Fn] = E_{Xn}[f],    (2.5)

where Eµ (resp. E_{Xn}) denotes expectation w.r.t. the Markov chain with initial law µ (resp. δ_{Xn}).

Proof. It suffices to show that for all A ∈ Fn and all bounded measurable f,

    Eµ[f(θn X) 1_A] = Eµ[E_{Xn}[f] 1_A].    (2.6)
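For a finite state space, the Chapman-Kolmogorov relation p^(m+n) = p^(m) ∗ p^(n) is simply associativity of matrix multiplication, and can be checked directly. A sketch with a hypothetical 3-state kernel:

```python
def mat_mul(a, b):
    """Convolution of two kernels: (a * b)[x][y] = sum_z a[x][z] b[z][y]."""
    n = len(a)
    return [[sum(a[x][z] * b[z][y] for z in range(n)) for y in range(n)]
            for x in range(n)]

def mat_pow(p, k):
    """k-step transition probabilities p^(k), the k-th matrix power of p."""
    n = len(p)
    out = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]  # p^(0) = identity
    for _ in range(k):
        out = mat_mul(out, p)
    return out

# hypothetical 3-state stochastic matrix
p = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
lhs = mat_pow(p, 5)                          # p^(2+3)
rhs = mat_mul(mat_pow(p, 2), mat_pow(p, 3))  # p^(2) * p^(3)
```

The two matrices agree entry by entry, and each row of p^(5) is again a probability vector.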
We can use the π-λ theorem to restrict our attention to sets of the form A = {ω ∈ S^N0 : ω0 ∈ A0, ω1 ∈ A1, · · · , ωn ∈ An}, and use the monotone class theorem to restrict our attention to functions of the form f(ω) = ∏_{i=0}^k gi(ωi) for some k ∈ N and bounded measurable gi : S → R.

For A and f of the forms specified above, by successive conditioning and the fact that the transition probabilities p of the Markov chain are regular conditional probabilities,

    Eµ[f(θn X) 1_A] = Eµ[gk(Xn+k) · · · g0(Xn) 1_{An}(Xn) · · · 1_{A0}(X0)]
        = ∫_{A0} µ(dx0) ∫_{A1} p(x0, dx1) · · · ∫_{An} p(xn−1, dxn) g0(xn)
            · ∫ p(xn, dxn+1) g1(xn+1) · · · ∫ p(xn+k−1, dxn+k) gk(xn+k)
        = Eµ[E_{Xn}[g0 · · · gk] 1_A] = Eµ[E_{Xn}[f] 1_A].    (2.7)

Given f = ∏_{i=0}^k gi(ωi), the collection of sets A ∈ Fn which satisfy (2.7) is a λ-system, while the sets of the form A = {ω ∈ S^N0 : ω0 ∈ A0, · · · , ωn ∈ An} form a π-system. Therefore by the π-λ theorem, (2.7) holds for all A ∈ Fn.

Now we fix A ∈ Fn. Let H denote the set of bounded measurable functions for which (2.7) holds. We have shown that H contains all functions of the form f(ω) = ∏_{i=0}^k gi(ωi). In particular, H contains indicator functions of sets of the form A = {ω ∈ S^N0 : ω0 ∈ A0, · · · , ωk ∈ Ak}, which form a π-system that generates the σ-algebra S^N0. Clearly H is closed under addition, scalar multiplication, and increasing limits. Therefore by the monotone class theorem, H contains all bounded measurable functions.

Theorem 2.2 [Monotone class theorem] Let Π be a π-system which contains the full set Ω, and let H be a collection of real-valued functions satisfying

(i) If A ∈ Π, then 1_A ∈ H.
(ii) If f, g ∈ H, then f + g ∈ H, and cf ∈ H for any c ∈ R.
(iii) If fn ∈ H are non-negative, and fn ↑ f where f is bounded, then f ∈ H.

Then H contains all bounded measurable functions w.r.t. the σ-algebra generated by Π.

The monotone class theorem is a simple consequence of the π-λ theorem. See e.g. Durrett [1] for a proof.
Theorem 2.3 [The strong Markov property] Following the setup of Theorem 2.1, let τ be an (Fn)n≥0 stopping time. Let (fn)n≥0 be a sequence of uniformly bounded measurable functions from S^N0 to R. Then

    Eµ[fτ(θτ X) | Fτ] 1_{τ<∞} = E_{Xτ}[fτ] 1_{τ<∞}   a.s.    (2.8)

Proof. Let A ∈ Fτ. Then

    Eµ[fτ(θτ X) 1_{A∩{τ<∞}}] = ∑_{n=0}^∞ Eµ[fn(θn X) 1_{A∩{τ=n}}].

Since A ∩ {τ = n} ∈ Fn, by the Markov property (2.5), the right-hand side equals

    ∑_{n=0}^∞ Eµ[E_{Xn}[fn] 1_{A∩{τ=n}}] = Eµ[E_{Xτ}[fτ] 1_{A∩{τ<∞}}],

which proves (2.8).

To illustrate the use of the strong Markov property and the reason for introducing the dependence of the functions fn on n, we prove the following.

Example 2.4 [Reflection principle for simple symmetric random walks] Let Xn = ∑_{i=1}^n ξi, where the ξi are i.i.d. with P(ξi = ±1) = 1/2. Then for any a ∈ N,

    P(max_{1≤i≤n} Xi ≥ a) = 2P(Xn ≥ a + 1) + P(Xn = a).    (2.9)

Proof. Let τa = inf{0 ≤ k ≤ n : Xk = a}, with τa = ∞ if the set is empty. Then max_{1≤i≤n} Xi ≥ a if and only if τa ≤ n. Therefore

    P(max_{1≤i≤n} Xi ≥ a) = P(τa ≤ n) = P(τa ≤ n, Xn < a) + P(τa ≤ n, Xn > a) + P(τa ≤ n, Xn = a).

Note that P(τa ≤ n, Xn > a) = P(Xn > a) because X is a nearest-neighbor random walk, and similarly P(τa ≤ n, Xn = a) = P(Xn = a), while

    P(τa ≤ n, Xn < a) = E[1_{τa≤n} P(Xn < a | Fτa)] = E[1_{τa≤n} Pa(X_{n−τa} < a)],

where we have applied (2.8) with fk = 1_{X_{n−k}<a} if 0 ≤ k ≤ n and fk = 0 otherwise. By symmetry, conditional on τa, we have Pa(X_{n−τa} < a) = Pa(X_{n−τa} > a). Therefore

    P(τa ≤ n, Xn < a) = P(τa ≤ n, Xn > a) = P(Xn > a),

which then implies (2.9).

Remark. The proof of Theorem 2.3 shows that a discrete time Markov chain is always strong Markov. However, this conclusion is false for continuous time Markov processes. The reason is that there is an uncountable number of times, which may conspire together to make the strong Markov property fail, even though the Markov property holds almost surely at each deterministic time.
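Identity (2.9) can be verified exactly for small n: the left-hand side by dynamic programming with an absorbing barrier at level a (absorbing probability mass at the first hitting time τa), the right-hand side from the binomial law Xn = 2·Bin(n, 1/2) − n. A sketch, not part of the notes:

```python
from math import comb

def lhs(n, a):
    """P(max_{1<=i<=n} X_i >= a) for the simple symmetric random walk,
    via dynamic programming with an absorbing barrier at level a."""
    dist = {0: 1.0}  # law of X_i restricted to paths that have not yet hit a
    hit = 0.0
    for _ in range(n):
        new = {}
        for x, q in dist.items():
            for step in (-1, 1):
                y = x + step
                if y >= a:
                    hit += q / 2  # the path reaches level a for the first time
                else:
                    new[y] = new.get(y, 0.0) + q / 2
        dist = new
    return hit  # = P(tau_a <= n)

def rhs(n, a):
    """2 P(X_n >= a+1) + P(X_n = a), using X_n = 2*Bin(n, 1/2) - n."""
    p_ge = sum(comb(n, k) for k in range(n + 1) if 2 * k - n >= a + 1) / 2 ** n
    p_eq = comb(n, (n + a) // 2) / 2 ** n if (n + a) % 2 == 0 else 0.0
    return 2 * p_ge + p_eq
```

For instance, with n = 3 and a = 1 both sides equal 5/8, and the two computations agree for every n and a.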
One way to guarantee the strong Markov property is to require the transition probabilities pt(x, ·) to be continuous in t and x, which is called the Feller property.

3 Markov chains with a countable state space

We now focus on time-homogeneous Markov chains with a countable state space S. Let (p(x, y))x,y∈S denote the 1-step transition probability kernel of the Markov chain (Xn)n≥0, which is a matrix with non-negative entries and ∑_{y∈S} p(x, y) = 1 for all x ∈ S. Such matrices are called stochastic matrices. The n-step transition probability kernel of the Markov chain is then given by the n-th power of p, i.e., p^(n)(x, y) = ∑_{z∈S} p^(n−1)(x, z) p(z, y).

We first consider the following subclass of Markov chains.

Definition 3.1 [Irreducible Markov chains] A Markov chain with a countable state space S is called irreducible if for all x, y ∈ S, p^(n)(x, y) > 0 for some n ≥ 0. In other words, every state communicates with every other state.

A Markov chain fails to be irreducible either because the state space can be partitioned into non-communicating disjoint subsets, or because there are subsets of the state space acting as sinks: once the Markov chain enters such a subset, it can never leave it.

Definition 3.2 [Transience, null recurrence, and positive recurrence] Let τy := inf{n > 0 : Xn = y} be the first hitting time (after time 0) of the state y ∈ S by the Markov chain X. Any state x ∈ S can then be classified into one of the following three types:

(i) Transient, if Px(τx < ∞) < 1.
(ii) Null recurrent, if Px(τx < ∞) = 1 and Ex[τx] = ∞.
(iii) Positive recurrent, if Px(τx < ∞) = 1 and Ex[τx] < ∞.

It turns out that for an irreducible Markov chain, all states are of the same type. Therefore transience, null recurrence and positive recurrence will also be used to classify irreducible Markov chains. Before proving this claim, we first prove some preliminary results.

Lemma 3.3 Let ρxy = Px(τy < ∞) for x, y ∈ S, and let G(x, y) = ∑_{n=0}^∞ p^(n)(x, y) = ∑_{n=0}^∞ Px(Xn = y).
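For a finite state space, irreducibility in the sense of Definition 3.1 is a reachability condition on the directed graph with edge set {(x, y) : p(x, y) > 0}, and can be checked by breadth-first search. The two kernels below are hypothetical examples; in the second, the subset {0, 1} acts as a sink:

```python
from collections import deque

def reachable(p, x):
    """States reachable from x (in n >= 0 steps) along edges with p(x, y) > 0."""
    seen = {x}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v, prob in enumerate(p[u]):
            if prob > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_irreducible(p):
    """A finite chain is irreducible iff every state reaches every other state."""
    n = len(p)
    return all(reachable(p, x) == set(range(n)) for x in range(n))

# hypothetical examples: a deterministic 3-cycle (irreducible),
# and a chain where {0, 1} is a sink that state 2 falls into
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
sink  = [[0.5, 0.5, 0.0], [0.3, 0.7, 0.0], [0.0, 0.5, 0.5]]
```

Graph reachability captures only whether p^(n)(x, y) > 0 for some n, which is exactly what Definition 3.1 asks; the actual values of the positive entries are irrelevant.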
If y is transient, then

    G(x, y) = ρxy / (1 − ρyy)   if x ≠ y,
    G(x, y) = 1 / (1 − ρyy)     if x = y.    (3.10)

If y is recurrent, then G(x, y) = ∞ for all x ∈ S with ρxy > 0.

Proof. Assuming X0 = y, let Ty^0 = 0, and define inductively Ty^k = inf{i > Ty^{k−1} : Xi = y}. Namely, the Ty^k are the successive return times to y. By the strong Markov property,

    Py(Ty^k < ∞ | Ty^{k−1} < ∞) = Py(Ty^1 < ∞) = ρyy.

By successive conditioning, we thus have Py(Ty^k < ∞) = ρyy^k. Therefore

    G(y, y) = ∑_{k=0}^∞ Py(Ty^k < ∞) = ∑_{k=0}^∞ ρyy^k = 1/(1 − ρyy).    (3.11)

Therefore G(y, y) = ∞ if and only if ρyy = 1, i.e., y is recurrent.

For x ≠ y, we first have to wait till X visits y, and

    G(x, y) = ∑_{k=1}^∞ Px(Ty^k < ∞) = ρxy / (1 − ρyy),    (3.12)

where we used the fact that Px(Ty^1 < ∞) = ρxy. This completes the proof of the lemma.

Lemma 3.4 If x ∈ S is recurrent, y ≠ x, and ρxy := Px(τy < ∞) > 0, then Px(τy < τx) > 0, ρyx := Py(τx < ∞) = 1 = ρxy, and y is also recurrent.

Proof. If Px(τy < τx) = 0, so that the Markov chain starting from x returns to x before visiting y almost surely, then when it returns to x, it starts afresh and will not visit y before a second return to x. Iterating this reasoning, the Markov chain will visit x infinitely often before visiting y, which means it will never visit y, contradicting the assumption that ρxy > 0. Hence Px(τy < τx) > 0.

Suppose that ρyx < 1. Since Px(τy < τx) > 0, there exist k ≥ 1 and y1, · · · , yk−1 ∈ S, all distinct from x and y, such that p(x, y1) p(y1, y2) · · · p(yk−1, y) > 0. Then

    Px(τx = ∞) ≥ p(x, y1) · · · p(yk−1, y)(1 − ρyx) > 0,

which contradicts the recurrence of x. Hence ρyx = 1.

Since upon each return to x, with probability Px(τy < τx) > 0, the Markov chain will visit y before returning to x, it follows that ρxy = 1: the Markov chain returns to x infinitely often by recurrence, and the events that y is visited between different consecutive returns to x are independent by the strong Markov property.
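Formula (3.10) can be checked numerically on a small chain with a transient state. In the hypothetical 3-state kernel below, state 2 is absorbing, so state 0 is transient; ρ00 is obtained by first-step analysis (worked out by hand in the comments), and G(0, 0) by truncating the series ∑_n p^(n)(0, 0), whose tail is geometrically small:

```python
def step(dist, p):
    """One step of the chain: push the distribution through the kernel."""
    n = len(p)
    return [sum(dist[x] * p[x][y] for x in range(n)) for y in range(n)]

# hypothetical chain on {0, 1, 2} where state 2 is absorbing, so 0 is transient
p = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.0, 0.0, 1.0]]

# G(0,0) = sum_n p^(n)(0,0), truncated at 200 terms; the mass remaining in
# {0, 1} shrinks by a factor <= 0.75 per step, so the tail is negligible.
dist = [1.0, 0.0, 0.0]
G = 0.0
for _ in range(200):
    G += dist[0]
    dist = step(dist, p)

# First-step analysis: h1 = P_1(tau_0 < infinity) solves h1 = 0.25 + 0.5*h1,
# so h1 = 0.5, and rho_00 = p(0,0) + p(0,1)*h1 = 0.5 + 0.25*0.5 = 0.625.
# Lemma 3.3 then predicts G(0,0) = 1/(1 - rho_00) = 8/3.
rho_00 = 0.625
```

The truncated sum matches 1/(1 − ρ00) to well within the truncation error, as (3.10) predicts.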
Since ρyx = ρxy = 1, almost surely the Markov chain starting from y will visit x and then return to y. Therefore y is also recurrent.

We are now ready to prove

Theorem 3.5 For an irreducible Markov chain, all states are of the same type.

Proof. Lemma 3.4 has shown that if x is recurrent, then so is any other y ∈ S by the irreducibility assumption. It remains to show that if x is positive recurrent, then so is any y ∈ S.

Let p = Px(τy < τx), which is positive by Lemma 3.4. Then Ex[τx] ≥ Px(τy < τx) Ey[τx]. Therefore Ey[τx] ≤ (1/p) Ex[τx] < ∞. On the other hand,

    Ex[τy] ≤ Ex[1_{τy<τx} τx] + Ex[1_{τx<τy} τy]
           = Ex[1_{τy<τx} τx] + Ex[1_{τx<τy} E[τy | Fτx]]
           = Ex[1_{τy<τx} τx] + Ex[1_{τx<τy} (τx + Ex[τy])]
           = Ex[τx] + (1 − p) Ex[τy].

Therefore Ex[τy] ≤ (1/p) Ex[τx], and

    Ey[τy] ≤ Ey[τx] + Ex[τy] ≤ (2/p) Ex[τx] < ∞,

which proves the positive recurrence of y.

Remark. Theorem 3.5 allows us to classify an irreducible countable state space Markov chain as either transient, null recurrent, or positive recurrent, depending on the type of its states.

References

[1] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, Belmont, California, 1996.