Lecture 7

1 Stationary measures of a Markov chain

We now study the long-time behavior of a Markov chain: in particular, the existence and uniqueness of stationary measures, and the convergence of the distribution of the Markov chain to its stationary measure as time tends to infinity.

1.1 Existence and uniqueness of the stationary measure

Definition 1.1 [Stationary measure] Let $X$ be an irreducible Markov chain with countable state space $S$ and transition matrix $\Pi$. A measure $\mu$ on $S$ is called a stationary measure for $X$ if
\[
(\mu\Pi)(x) := \sum_{y\in S} \mu(y)\Pi(y,x) = \mu(x) \quad \text{for all } x \in S, \tag{1.1}
\]
or equivalently,
\[
\langle \mu, \Pi f\rangle = \langle \mu, f\rangle \quad \text{for all bounded } f, \tag{1.2}
\]
where $\langle \mu, f\rangle = \sum_{x\in S}\mu(x)f(x)$. When $\mu$ is a probability measure, we say $\mu$ is a stationary distribution. The equivalence comes from the fact that $\mu$ is uniquely determined by its action on bounded test functions, while $\langle \mu, \Pi f\rangle = \langle \mu\Pi, f\rangle$.

Example 1.2 A random walk on $\mathbb{Z}^d$ (regardless of the distribution of its increments) has $\mu \equiv 1$ as a stationary measure, by virtue of the translation invariance of $\mathbb{Z}^d$. Any irreducible finite-state Markov chain admits a unique stationary distribution, which is a left eigenvector of $\Pi$ with eigenvalue $1$.

We are interested in the long-time behavior of the Markov chain. If the chain is transient, then for any $x, y \in S$, $G(x,y) = \sum_{n\ge 0}\Pi^n(x,y) < \infty$. In particular, $\Pi^n(x,y) \to 0$ as $n \to \infty$. This rules out both the existence of a stationary probability distribution and convergence to one. The more interesting cases are the null recurrent and positive recurrent Markov chains.

Theorem 1.3 [Existence of a stationary measure for recurrent Markov chains] Let $X$ be an irreducible recurrent Markov chain with countable state space $S$ and transition matrix $\Pi$. Then for any $x \in S$, with $\tau_x = \inf\{n \ge 1 : X_n = x\}$, the measure
\[
\mu(y) := \sum_{n=0}^{\infty} P_x(X_n = y,\ n < \tau_x) = E_x\Big[\sum_{n=0}^{\tau_x-1} 1_{\{X_n = y\}}\Big], \qquad y \in S,
\]
is a stationary measure for $X$, and $\sum_{y\in S}\mu(y) = E_x[\tau_x]$.
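Before turning to the proof, the identities in Theorem 1.3 can be checked numerically. The following is a minimal sketch of the "cycle trick": it simulates excursions from a base point $x$ and averages the visit counts. The 3-state transition matrix `PI` is a hypothetical example chosen for illustration, not a chain from these notes.

```python
import random

# Hypothetical 3-state transition matrix (irreducible, hence positive recurrent).
PI = [[0.5, 0.3, 0.2],
      [0.2, 0.5, 0.3],
      [0.3, 0.3, 0.4]]

def step(state, rng):
    """Sample the next state from the row PI[state] by inverse transform."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(PI[state]):
        acc += p
        if u < acc:
            return j
    return len(PI) - 1

def cycle_trick_estimate(x, n_cycles, seed=0):
    """Estimate mu(y) = E_x[sum_{n=0}^{tau_x - 1} 1_{X_n = y}] by averaging
    visit counts over n_cycles excursions from x back to x."""
    rng = random.Random(seed)
    visits = [0.0] * len(PI)
    total_len = 0
    for _ in range(n_cycles):
        state = x
        while True:
            visits[state] += 1          # a visit at some time n < tau_x
            total_len += 1
            state = step(state, rng)
            if state == x:              # the chain has returned: excursion over
                break
    mu = [v / n_cycles for v in visits]
    return mu, total_len / n_cycles     # (mu estimate, estimate of E_x[tau_x])

mu, mean_tau = cycle_trick_estimate(x=0, n_cycles=20000)
print(mu, mean_tau)
```

By construction the estimator satisfies `mu[0] == 1` and `sum(mu) == mean_tau` exactly, mirroring the identities $\mu(x) = 1$ and $\sum_y \mu(y) = E_x[\tau_x]$ of Theorem 1.3.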
Remark. In words, $\mu(y)$ is the expected number of visits to $y$ before the Markov chain returns to $x$. Note that $\mu(x) = 1$. This is sometimes called the cycle trick.

Proof. First we show that $\mu(y) < \infty$ for all $y \in S$. Since $\mu(x) = 1$, let $y \ne x$. Since the Markov chain is irreducible and recurrent, $P_y(\tau_x < \tau_y) > 0$. Therefore, starting from $y$, the number of visits to $y$ before the chain visits $x$ is geometrically distributed. In particular, the expected number of visits to $y$ before $\tau_x$ is finite, and so is $\mu(y)$.

For each $y \ne x$,
\[
\mu(y) = \sum_{n=1}^{\infty} P_x(X_n = y,\ n < \tau_x)
       = \sum_{n=1}^{\infty} \sum_{z\in S} P_x(X_{n-1} = z,\ X_n = y,\ n < \tau_x)
\]
\[
       = \sum_{n=1}^{\infty} \sum_{z\in S} P_x(X_{n-1} = z,\ n-1 < \tau_x)\,\Pi(z,y)
       = \sum_{z\in S} \mu(z)\,\Pi(z,y),
\]
which verifies the stationarity of $\mu$ at all $y \ne x$. On the other hand, by the recurrence of $X$ and a similar decomposition,
\[
1 = \sum_{n=1}^{\infty} P_x(\tau_x = n)
  = \sum_{n=1}^{\infty} \sum_{y\in S} P_x(X_{n-1} = y,\ n-1 < \tau_x,\ X_n = x)
\]
\[
  = \sum_{y\in S} \sum_{n=1}^{\infty} P_x(X_{n-1} = y,\ n-1 < \tau_x)\,\Pi(y,x)
  = \sum_{y\in S} \mu(y)\,\Pi(y,x),
\]
which verifies the stationarity of $\mu$ at $x$.

Theorem 1.4 [Uniqueness of stationary measures for recurrent Markov chains] Let $X$ be an irreducible recurrent Markov chain with countable state space $S$. Then the stationary measure $\mu$ for $X$ is unique up to a constant multiple.

Proof. Let $\mu$ be the stationary measure defined in Theorem 1.3 with $\mu(x) = 1$, and let $\nu$ be any stationary measure with $\nu(x) = 1$. For any $y \ne x$,
\[
\nu(y) = \nu(x)\Pi(x,y) + \sum_{z_1 \ne x} \nu(z_1)\Pi(z_1,y) \tag{1.3}
\]
\[
       = \nu(x)\Pi(x,y) + \sum_{z_1 \ne x} \nu(x)\Pi(x,z_1)\Pi(z_1,y)
         + \sum_{z_1, z_2 \ne x} \nu(z_2)\Pi(z_2,z_1)\Pi(z_1,y),
\]
where we have substituted (1.3) into itself. Iterating the substitution indefinitely then gives
\[
\nu(y) \ge \Pi(x,y) + \sum_{n=1}^{\infty} \sum_{z_1,\dots,z_n \ne x}
           \Pi(x,z_1)\Big(\prod_{i=1}^{n-1}\Pi(z_i,z_{i+1})\Big)\Pi(z_n,y)
       = \sum_{n=1}^{\infty} P_x(X_n = y,\ n < \tau_x) = \mu(y).
\]
Now suppose that $\nu(y) > \mu(y)$ for some $y \in S$. By irreducibility, there exists $n \in \mathbb{N}$ with $\Pi^n(y,x) > 0$. The stationarity of $\mu$ and $\nu$ implies
\[
\sum_{z\in S} \mu(z)\Pi^n(z,x) = \mu(x) = 1 = \nu(x) = \sum_{z\in S} \nu(z)\Pi^n(z,x).
\]
Therefore
\[
0 = \sum_{z\in S} (\nu(z) - \mu(z))\,\Pi^n(z,x) \ge (\nu(y) - \mu(y))\,\Pi^n(y,x) > 0,
\]
which is a contradiction. Therefore $\nu = \mu$.

Combining Theorems 1.3 and 1.4 with the observation that transient irreducible Markov chains do not admit stationary probability distributions, we have the following.

Corollary 1.5 [Stationary distributions] An irreducible Markov chain admits a stationary probability distribution $\mu$ (which is necessarily unique) if and only if it is positive recurrent, in which case $\mu(x) = \frac{1}{E_x[\tau_x]}$ for all $x \in S$.

1.2 Convergence of the Markov chain

We now proceed to the study of the convergence of an irreducible Markov chain: what is the limit of the probability measure $\Pi^n(x,\cdot)$ as $n \to \infty$ for each $x \in S$? When the chain is transient, we have seen that $\Pi^n(x,y) \to 0$ for all $x, y \in S$. If the chain is null recurrent, then there is a unique (up to a constant multiple) stationary measure, which has infinite mass. Since $\Pi^n(x,\cdot)$ corresponds to the Markov chain starting with unit mass at $x$, we expect this mass to spread out and approximate a multiple of the stationary measure, hence $\Pi^n(x,y) \to 0$ for all $x, y \in S$. If the chain is positive recurrent, then it is natural to expect that $\Pi^n(x,y) \to \mu(y)$, the mass of the unique stationary distribution $\mu$ at $y$.

The last statement is almost true, except for the issue of periodicity. To illustrate the problem, take a simple random walk on the torus $S := \{0, 1, \dots, 2m\}$, where $0$ and $2m$ are identified. Clearly the Markov chain is irreducible, and the uniform distribution on $S$ is the unique stationary distribution. However, $\Pi^n(0,\cdot)$ is supported on the even sites when $n$ is even, and on the odd sites when $n$ is odd. So $\Pi^n(0,\cdot)$ does not converge to the uniform distribution on $S$. Therefore we first need to address the issue of periodicity.

Definition 1.6 [Period of a Markov chain] Let $X$ be an irreducible Markov chain with countable state space $S$ and transition matrix $\Pi$.
For $x \in S$, let $D_x := \{n : \Pi^n(x,x) > 0\}$ and let $d_x$ be the greatest common divisor (gcd) of $D_x$. Then $d_x$ is independent of $x \in S$; we denote it simply by $d$ and call it the period of the Markov chain. When $d = 1$, we say the chain is aperiodic.

In the definition above, we have used part of the following result.

Lemma 1.7 Let $X$ be an irreducible Markov chain with countable state space $S$. Then $d_x = d_y$ for all $x, y \in S$. Furthermore, for any $x \in S$, $D_x$ contains all sufficiently large multiples of $d_x$.

Proof. By irreducibility, there exist $K, L \in \mathbb{N}$ with $\Pi^K(x,y) > 0$ and $\Pi^L(y,x) > 0$. Therefore
\[
\Pi^{K+L}(x,x) \ge \Pi^K(x,y)\,\Pi^L(y,x) > 0,
\]
and hence $d_x \mid (K+L)$, i.e., $d_x$ divides $K+L$. For any $m \in D_y$, $\Pi^m(y,y) > 0$, therefore
\[
\Pi^{K+L+m}(x,x) \ge \Pi^K(x,y)\,\Pi^m(y,y)\,\Pi^L(y,x) > 0.
\]
So $d_x \mid (K+L+m)$. Since $d_x \mid (K+L)$, we have $d_x \mid m$ for all $m \in D_y$. Therefore $d_x \mid d_y$. Similarly we also have $d_y \mid d_x$, and hence $d_x = d_y$.

Since $d_x$ is the greatest common divisor of $D_x$, it is the gcd of a finite subset $n_1, \dots, n_k \in D_x$. By the properties of the gcd, there exist $a_1, \dots, a_k \in \mathbb{Z}$ such that
\[
\sum_{i=1}^{k} a_i n_i = d_x.
\]
Note that $D_x$ is closed under addition, since $\Pi^{a+b}(x,x) \ge \Pi^a(x,x)\,\Pi^b(x,x)$. Moving the terms with negative $a_i$ to the right-hand side then shows that there exists $m \in \mathbb{N}$ with $m d_x, (m+1)d_x \in D_x$. For any $n \ge m^2$, we can write
\[
n d_x = (lm + r)\,d_x = (l - r)\,m d_x + r\,(m+1)d_x,
\]
where $r$ is the remainder of $n$ after dividing by $m$, and $l \ge m > r$ by assumption. Therefore $n d_x \in D_x$ for all $n \ge m^2$, which proves the lemma.

We are now ready to state the convergence result for irreducible aperiodic Markov chains.

Theorem 1.8 [Convergence of transition kernels] Let $X$ be an irreducible aperiodic Markov chain with countable state space $S$. If the chain is transient or null recurrent, then
\[
\lim_{n\to\infty} \Pi^n(x,y) = 0 \qquad \forall\, x, y \in S. \tag{1.4}
\]
If the chain is positive recurrent with stationary distribution $\mu$, then
\[
\lim_{n\to\infty} \Pi^n(x,y) = \mu(y) \qquad \forall\, x, y \in S. \tag{1.5}
\]

Theorem 1.8 follows from the renewal theorem.

Theorem 1.9 [Renewal Theorem] Let $f$ be a probability distribution on $\mathbb{N} \cup \{\infty\}$ with mean $m = \sum_n n f(n) \in [1, \infty]$.
Assume further that $D := \{n \ge 1 : f(n) > 0\}$ has greatest common divisor $1$. A renewal process $(U_n)_{n\ge 0}$ with renewal time distribution $f$ is a homogeneous Markov chain with state space $\{0, 1, \dots\} \cup \{\infty\}$ and transition probabilities $p(x, x+n) = f(n)$ for all $x \ge 0$, and $p(\infty, \infty) = 1$. Then we have
\[
\lim_{n\to\infty} P_0(U_i = n \text{ for some } i \in \mathbb{N}) = \frac{1}{m}. \tag{1.6}
\]

Proof of Theorem 1.8. For a Markov chain $X$ starting from $x \in S$, if we let $U_0 = 0$ and $U_n$ be the successive return times of $X$ to $x$, then clearly $(U_n)$ is a renewal process with $f(n) = P_x(\tau_x = n)$, $m = E_x[\tau_x]$, and $\Pi^n(x,x) = P_0(U_i = n \text{ for some } i \in \mathbb{N})$. Equations (1.4)–(1.5) with $y = x$ then follow from the renewal theorem, since $E_x[\tau_x] = \infty$ when $X$ is transient or null recurrent, and $\mu(x) = \frac{1}{E_x[\tau_x]} = \frac{1}{m}$ when the chain is positive recurrent. When $x \ne y$, note that
\[
\Pi^n(x,y) = \sum_{i=1}^{n} P_x(\tau_y = i)\,\Pi^{n-i}(y,y).
\]
Equations (1.4)–(1.5) then follow from the case $x = y$ and the dominated convergence theorem.

Remark 1.10 Not surprisingly, the renewal theorem can conversely be deduced from Theorem 1.8. Given a renewal process $U$ on $\{0, 1, \dots\}$ with renewal time distribution $f$ on $\mathbb{N} \cup \{\infty\}$, we can construct an irreducible aperiodic Markov chain $X$ on $\{0, 1, \dots\} \cup \{\infty\}$ as follows. Let $\Pi(0, l) = f(l+1)$ for $l \in \{0, 1, \dots\} \cup \{\infty\}$, $\Pi(i, i-1) = 1$ for $i \ge 1$, and $\Pi(\infty, \infty) = 1$. Then the successive return times of $X$ to $0$ are distributed as $U$, and $P_0(U_i = n \text{ for some } i \in \mathbb{N})$ is precisely $\Pi^n(0,0)$. Since $m = \sum_{n=1}^{\infty} n f(n) = E_0[\tau_0]$, (1.6) follows from (1.4)–(1.5).

Proof of Theorem 1.9. If $f(\infty) > 0$, then $\tau_\infty < \infty$ almost surely for the Markov chain $U$, and (1.6) clearly holds (both sides are then $0$). From now on, we assume $f(\infty) = 0$, so that $\sum_{n\in\mathbb{N}} f(n) = 1$. Let $p(n) = P_0(U_i = n \text{ for some } i \in \mathbb{N})$. By decomposing in terms of the first renewal time, $p(n)$ satisfies the recursive relation (known as the renewal equation)
\[
p(n) = \sum_{i=1}^{n} f(i)\,p(n-i). \tag{1.7}
\]
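The renewal equation lends itself to direct computation. The sketch below uses a hypothetical renewal-time distribution $f$ supported on $\{1, 2, 3\}$ (so the gcd of the support is $1$ and $m < \infty$); iterating the recursion with $p(0) = 1$ (a renewal occurs at time $0$) shows $p(n)$ approaching $1/m$, as (1.6) asserts.

```python
# Hypothetical renewal-time distribution on {1, 2, 3}; gcd of the support is 1.
f = {1: 0.5, 2: 0.3, 3: 0.2}
m = sum(n * w for n, w in f.items())    # mean renewal time m = sum_n n f(n)

# Renewal equation: p(n) = sum_{i=1}^{n} f(i) p(n - i), with p(0) = 1.
p = [1.0]
for n in range(1, 51):
    p.append(sum(w * p[n - i] for i, w in f.items() if i <= n))

print(p[50], 1 / m)   # p(n) settles near 1/m
```

Here the aperiodicity assumption is doing real work: if the support of $f$ were, say, $\{2, 4\}$, then $p(n)$ would vanish for all odd $n$ and no limit could exist.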
Summing over $1 \le n \le N$, we obtain
\[
\sum_{n=1}^{N} p(n) = \big(f(1)+\cdots+f(N)\big) + \big(f(1)+\cdots+f(N-1)\big)p(1) + \cdots + f(1)\,p(N-1)
\]
\[
= \sum_{n=1}^{N} p(N-n)\sum_{i=1}^{n} f(i) = \sum_{n=1}^{N} p(N-n)\big(1 - T(n+1)\big),
\]
where $T(n+1) = \sum_{i=n+1}^{\infty} f(i)$ and $p(0) = 1$. Rearranging terms then gives
\[
\sum_{n=1}^{N} T(n)\,p(N-n+1) = 1 - T(N+1) = \sum_{n=1}^{N} f(n). \tag{1.8}
\]
Note that $\sum_{n=1}^{\infty} T(n) = m$. By dominated convergence, if $\lim_{n\to\infty} p(n)$ exists, then it must be $\frac{1}{m}$.

Let $a = \limsup_{n\to\infty} p(n)$, which is bounded by $1$ since $p(n) \le 1$. By Cantor diagonalization, we can find a sequence $(n_j)_{j\in\mathbb{N}}$ along which $p(n_j + i) \to q(i)$ for all $i \in \mathbb{Z}$, with $q(0) = a$. We claim that $q \equiv a$. Assuming the claim, taking the limit $N \to \infty$ in (1.8) along the sequence $n_j$ shows that $a = 0$ when $m = \infty$ by Fatou's lemma, and $a = \frac{1}{m}$ when $m < \infty$ by dominated convergence.

It remains to verify $q \equiv a$. Applying the dominated convergence theorem along the sequence $n_j + k$ in (1.7) gives
\[
q(k) = \sum_{i=1}^{\infty} f(i)\,q(k-i). \tag{1.9}
\]
In particular,
\[
a = \sum_{i=1}^{\infty} f(i)\,q(-i).
\]
Since by the definition of $a$ we have $q(-i) \le a$ for all $i \in \mathbb{Z}$, it follows that $q(-i) = a$ for all $i \in D := \{n \in \mathbb{N} : f(n) > 0\}$. The same argument applied to (1.9) shows that $q(-i) = a$ for all $i \in \oplus^2 D := \{n = x + y : x, y \in D\}$, and inductively for all $i \in \oplus^k D$, $k \in \mathbb{N}$, with $\oplus^k D$ defined analogously. Since the gcd of $D$ is $1$, the proof of Lemma 1.7 shows that $q(-i) = a$ for all sufficiently large $i$. Substituting these values of $q$ into (1.9) shows that $q \equiv a$. The same argument can be used to show that $\liminf_{n\to\infty} p(n) = \frac{1}{m}$ when $m < \infty$, which proves Theorem 1.9.
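To close, a small numerical illustration of Theorem 1.8 in the finite-state case. For a hypothetical irreducible aperiodic 3-state transition matrix (an example chosen for illustration, not from the notes), every row of $\Pi^n$ converges to the same vector, namely the unique stationary distribution $\mu$.

```python
# Hypothetical irreducible aperiodic 3-state transition matrix.
PI = [[0.5, 0.3, 0.2],
      [0.2, 0.5, 0.3],
      [0.3, 0.3, 0.4]]

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = PI
for _ in range(49):                     # P = PI^50
    P = mat_mul(P, PI)

# The rows of PI^50 are nearly identical; the common row approximates mu.
spread = max(abs(P[i][j] - P[0][j]) for i in range(3) for j in range(3))
mu = P[0]
print(mu, spread)
```

Aperiodicity is essential here: for the periodic walk on the torus from Section 1.2, the powers $\Pi^n(0,\cdot)$ oscillate between the two parity classes and do not converge.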