Lecture 10
1 Ergodic decomposition of invariant measures
Let T : (Ω, F) → (Ω, F) be measurable, and let M denote the space of T -invariant probability
measures on (Ω, F). Then M is a convex set, although it might be empty. We will show that
any measure µ ∈ M can be decomposed as mixtures of extremal elements of M, which are
exactly the ergodic measures for T .
Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic
for T if and only if it is an extremal point in M.
Proof. If µ ∈ M is not ergodic, then there exists A ∈ F with µ(A) ∈ (0, 1) and A is
an invariant set for T . Let µA (resp. µAc ) denote the restriction of µ to A (resp. Ac ) and
normalized to be a probability measure, i.e.,
µA(·) = µ(A ∩ ·)/µ(A).
Then µA and µAc are distinct invariant probability measures for T , and
µ = αµA + (1 − α)µAc ,
where α = µ(A) ∈ (0, 1), which shows that µ is not extremal.
Conversely, if µ ∈ M is not extremal, then µ = αµ1 + (1 − α)µ2 for some α ∈ (0, 1)
and distinct µ1 , µ2 ∈ M. If µ were ergodic, then by the ergodic theorem, for any bounded
measurable f on (Ω, F),
An f(ω) = (f(ω) + f(Tω) + · · · + f(T^{n−1}ω))/n −→ Eµ[f],   µ a.s. and in L1(Ω, F, µ).
In particular, An f (ω) also converges to Eµ [f ] almost surely w.r.t. µ1 (resp. µ2 ), and hence
Eµ1 [f] = Eµ2 [f] = Eµ [f]. Since f is an arbitrary bounded measurable function, this implies that
µ1 = µ2 = µ, a contradiction. Therefore, if µ is not extremal, it cannot be ergodic.
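The dichotomy in the proof can be seen numerically on a minimal example (a toy sketch of our own, not from the lecture): take Ω = {0, 1, 2, 3} with T swapping 0 ↔ 1 and 2 ↔ 3. The uniform measures on {0, 1} and on {2, 3} are ergodic, while their mixture µ is invariant but not ergodic: the Birkhoff averages An f converge to different limits on the two invariant components, not to Eµ[f].

```python
# Toy illustration (our own example): T swaps 0 <-> 1 and 2 <-> 3,
# so {0, 1} and {2, 3} are invariant sets.
def T(x):
    return {0: 1, 1: 0, 2: 3, 3: 2}[x]

def birkhoff_average(f, x, n):
    """A_n f(x) = (f(x) + f(Tx) + ... + f(T^{n-1} x)) / n."""
    total, y = 0.0, x
    for _ in range(n):
        total += f(y)
        y = T(y)
    return total / n

f = lambda x: 1.0 if x in (0, 1) else 0.0   # indicator of the invariant set {0, 1}
a01 = birkhoff_average(f, 0, 1000)          # orbit inside {0, 1}
a23 = birkhoff_average(f, 2, 1000)          # orbit inside {2, 3}
# Under mu = (1/2) uniform{0,1} + (1/2) uniform{2,3} we have E_mu[f] = 1/2,
# yet A_n f -> 1 on {0, 1} and A_n f -> 0 on {2, 3}: mu is invariant but not
# ergodic, and its ergodic components are the two uniform measures.
print(a01, a23)   # 1.0 0.0
```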
By applying the ergodic theorem to suitable test functions, one can prove:
Lemma 1.1 [Singularity of ergodic measures] Distinct ergodic measures µ1 , µ2 ∈ M are
mutually singular. More specifically, there exists A ∈ I s.t. µ1 (A) = µ2 (Ac ) = 1.
Choquet’s Theorem (see Lax [2, Section 13.4]) provides a decomposition of a metrizable
compact convex subset K of a locally convex topological vector space in terms of the extremal
points of K. Since the set of invariant probability measures M in general may not be compact,
we will not appeal to Choquet’s theorem. Instead, we will assume that (Ω, F) is a complete
separable metric space with Borel σ-algebra, and appeal to the existence of regular conditional
probability distributions.
Theorem 1.2 [Ergodic decomposition] Let Ω be a complete separable metric space with
Borel σ-algebra F. Let T be a measurable transformation on (Ω, F) and let M denote the
set of probability measures on (Ω, F) invariant w.r.t. T . Then for any µ ∈ M, there exists a
probability measure ρµ on the set of ergodic measures Me such that
µ = ∫_{Me} ν ρµ(dν).   (1.1)
Remark. The σ-algebra we use for defining ρµ on M is the Borel σ-algebra induced by the
weak topology on M, i.e., µn → µ in M w.r.t. the weak topology if and only if for all bounded
continuous functions f : Ω → R, we have ∫ f dµn → ∫ f dµ. Such convergence of probability
measures on (Ω, F) is called weak convergence.
Proof. Since (Ω, F) is Polish, there exists a regular conditional probability µω of µ conditional
on the invariant σ-field I. Provided we can show that µω ∈ Me almost surely, we can regard
µω as a map from Ω to Me , and denote the distribution of µω by ρµ . The decomposition (1.1)
then follows readily.
We now verify that µω (·) := µ(·|I) is ergodic µ a.s. First we show invariance, i.e., µ a.s.,
µω(A) = µω(T⁻¹A)   ∀ A ∈ F.   (1.2)
A priori, there are uncountably many sets in F, and the exceptional sets may pile up.
However, by our assumption that (Ω, F) is Polish, F can be generated by a countable collection
of sets F0 , and hence it suffices to verify (1.2) for A ∈ F0 since µω is a probability measure
a.s. Since µω(·) = µ(·|I), given A ∈ F0, µω(A) = µω(T⁻¹A) a.s. (i.e., µ(A|I) = µ(T⁻¹A|I)
a.s.) if and only if
µ(A ∩ E) = µ(T⁻¹A ∩ E)   ∀ E ∈ I,
which holds since E ∈ I implies that µ(E∆T⁻¹E) = 0 and µ(A ∩ E) = µ(T⁻¹(A ∩ E)). This
proves the a.s. invariance of µω for T .
For the a.s. ergodicity of µω , it suffices to show that for µ a.s. every µω ,
∀ A ∈ F,   An 1A(ω) := (1A(ω) + 1A(Tω) + · · · + 1A(T^{n−1}ω))/n → µω(A)   a.s. w.r.t. µω.   (1.3)
Approximating A ∈ F by sets that are finitely generated from F0 , it suffices to verify (1.3) for
A ∈ F0 . For such an A, the ergodic theorem applied to 1A w.r.t. µ implies that An 1A (ω) →
µ(A|I) = µω (A) a.s. w.r.t. µ. Since µω is the regular conditional probability of µ given I,
(1.3) must hold.
2 Structure of stationary Markov chains
We now apply the ergodic decomposition theorem for stationary measures to stationary
Markov chains. Let Π(x, dy) be a transition probability kernel on the state space (S, S).
In this section, we will consider a general Polish space (S, S). A Markov process (Xn )n∈N is
stationary if and only if its marginal distribution µ is stationary for Π. More precisely,
µ ∈ M := {ν : ν(S) = 1, ν(A) = ∫_S Π(x, A) ν(dx) ∀ A ∈ S}.
Given marginal law µ ∈ M, we can embed the stationary Markov process (Xn )n∈N in a doubly
infinite stationary sequence (Xn )n∈Z . The process (Xn )n∈Z can be regarded as a random
variable taking values in the sequence space (S^Z, S^Z), where S^Z denotes the product σ-algebra
on the product space S^Z. Given marginal law µ ∈ M, let Pµ denote the law of (Xn)n∈Z on
(S^Z, S^Z). Let T denote the coordinate shift map on S^Z. Then each µ ∈ M determines a
Pµ ∈ M̃, where M̃ is the family of probability measures on (S^Z, S^Z) invariant for the shift
map T . Our goal is to show that the ergodic components of a stationary Markov process Pµ
are stationary Markov processes Pν with ν ∈ M, where ν are the extremal components of µ in
M. (Note that in general, the ergodic decomposition of a stationary process gives ergodic processes
which need not be Markov).
Theorem 2.1 [Ergodic decomposition of stationary Markov processes] Given µ ∈
M, Pµ is ergodic for the shift map T if and only if µ ∈ Me , i.e., µ is extremal in the family
of invariant measures M for the Markov chain. Furthermore, for any µ ∈ M, there exists a
probability measure ρµ on Me such that
µ = ∫_{Me} ν ρµ(dν)   and   Pµ = ∫_{Me} Pν ρµ(dν).   (2.1)
The extremal elements of M are called the extremal or ergodic invariant measures. When M
is a singleton, we say the Markov chain is ergodic.
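For a finite state space, the structure of M in the theorem can be checked directly. The following sketch (a toy 4-state kernel of our own choosing) exhibits a chain with two closed classes: M is then the segment of mixtures of the two ergodic stationary laws ν1, ν2, and every nontrivial mixture is invariant without being extremal.

```python
import numpy as np

# Toy reducible chain (our own example): {0, 1} and {2, 3} are closed classes.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.9, 0.1],
              [0.0, 0.0, 0.3, 0.7]])

def stationary_on(states):
    """Stationary distribution of P restricted to a closed class,
    embedded back into the full state space."""
    Q = P[np.ix_(states, states)]
    w, v = np.linalg.eig(Q.T)                 # left eigenvector for eigenvalue 1
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    full = np.zeros(len(P))
    full[states] = pi
    return full

nu1 = stationary_on([0, 1])   # ergodic: supported on {0, 1}
nu2 = stationary_on([2, 3])   # ergodic: supported on {2, 3}
mu = 0.3 * nu1 + 0.7 * nu2    # invariant, but a nontrivial mixture
assert np.allclose(nu1 @ P, nu1) and np.allclose(nu2 @ P, nu2)
assert np.allclose(mu @ P, mu)   # mu lies in M, yet mu is not extremal/ergodic
```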
Proof. If µ ∈ M is not extremal, then neither is Pµ extremal in M̃, which is equivalent to
Pµ not being ergodic by Theorem 1.1. The key to proving the converse is the following result.
Lemma 2.1 Let µ ∈ M, and let I be the invariant σ-field on (S^Z, S^Z) for the shift map T
and the measure Pµ (note that we define I modulo sets of Pµ measure 0). Then modulo sets
of Pµ measure 0, I ⊂ F_0^0, where F_m^n := σ(xm, xm+1, · · · , xn) on S^Z = {(xi)i∈Z : xi ∈ S}.
Proof. The lemma asserts that, for any E ∈ I, there exists A ∈ S such that E = {(xn)n∈Z :
x0 ∈ A} modulo sets of Pµ measure zero. The proof relies on the fact that invariant sets lie
both in the infinite future F_∞^∞ := ∩n F_n^∞ and in the infinite past F_{−∞}^{−∞} := ∩n F_{−∞}^{−n}, and that the
past and the future of a Markov process are independent conditioned on the present. Thus
for E ∈ I,
Pµ[E|F_0^0] = Pµ[E ∩ E|F_0^0] = Pµ[E|F_0^0]².
Therefore Pµ[E|F_0^0] = 0 or 1 µ a.s. Let A ∈ S be the set on which Pµ[E|F_0^0] = 1 a.s. Then by
the invariance of E under the shift T, we have E = A^Z := {(xn)n∈Z ∈ S^Z : xn ∈ A ∀ n ∈ Z}
modulo sets of Pµ measure zero, while Ec = (Ac)^Z. In particular, for µ almost all x ∈ S,
if x ∈ A (resp. x ∈ Ac), then the Markov chain starting at x never leaves A (resp. Ac).
Therefore, E = {(xn)n∈Z ∈ S^Z : x0 ∈ A} modulo sets of Pµ measure zero, which proves the
lemma.
With Lemma 2.1, we can conclude the proof of Theorem 2.1. Suppose that Pµ is not
ergodic. Then Pµ is a mixture of the measures Pµ[·|I], which are ergodic measures on (S^Z, S^Z).
Since I ⊂ F_0^0 by Lemma 2.1, the measures Pµ[·|I] are almost surely mixtures of Pµ[·|F_0^0],
which are laws of the Markov chain with specified values at time 0. Hence Pµ[·|I] are stationary Markov
processes with marginal laws in M, and µ is a mixture of these marginal laws, which means
that µ is not extremal in M. The same reasoning also allows us to deduce (2.1) from the
ergodic decomposition of Pµ .
Remark. Note that extremal measures in M must be singular w.r.t. each other, since the
associated ergodic Markov processes are singular w.r.t. each other by Theorem 2.1.
Remark. A sufficient condition to guarantee the uniqueness of a stationary distribution (if
it exists) for a Markov chain is to have some form of irreducibility. If M is not a singleton,
then we can find two extremal invariant measures with disjoint supports U1 and U2 in the state
space, such that the Markov chain makes no transitions between U1 and U2 . Any irreducibility
condition that breaks such a partition of the state space will guarantee the existence of at
most one stationary distribution. One such condition is if Π(x, dy) has a positive density
p(x, y) w.r.t. a common reference measure α(dy) for all x in the state space.
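In the finite-state analogue, the positive-density condition says that the transition matrix has strictly positive entries, and Perron–Frobenius then makes M a singleton. A small numerical check (the matrix below is an arbitrary positive kernel of our own choosing):

```python
import numpy as np

# A strictly positive stochastic matrix (arbitrary example): eigenvalue 1 of
# the transpose is simple, so the stationary distribution is unique and M is
# a singleton, i.e., the chain is ergodic.
rng = np.random.default_rng(0)
P = rng.random((5, 5)) + 0.1              # strictly positive entries
P = P / P.sum(axis=1, keepdims=True)      # normalize rows to a transition kernel

w, v = np.linalg.eig(P.T)
is_one = np.isclose(w, 1.0)
assert is_one.sum() == 1                  # eigenvalue 1 is simple
pi = np.real(v[:, is_one.argmax()])
pi = pi / pi.sum()
assert np.allclose(pi @ P, pi)            # pi is stationary
assert (pi > 0).all()                     # and fully supported
```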
3 Harris chains
So far we have studied mostly countable state Markov chains, although the ergodic decomposition of stationary Markov chains was developed for a general Polish space. We now discuss
briefly the theory of general state space Markov chains. One class of Markov chains that
admit a similar treatment as the countable state space case is the so-called Harris chains.
Definition 3.1 (Harris Chains) A Markov chain (Xn)n≥0 with state space (S, S) and transition kernel Π(·, ·) is called a Harris chain if there exist A, B ∈ S, ε > 0, and a probability
measure ρ with ρ(B) = 1 such that:
(i) If τA := inf{n ≥ 0 : Xn ∈ A}, then Pz (τA < ∞) > 0 for all z ∈ S.
(ii) If x ∈ A, then Π(x, C) ≥ ε ρ(C) for all C ∈ S with C ⊂ B.
The conditions of a Harris chain allow us to construct an equivalent Markov chain X̄ with
state space S̄ := S ∪ {α} and σ-algebra S̄ := {B, B ∪ {α} : B ∈ S}, where α is an artificial
atom that the chain X̄ will visit. More precisely, define X̄ with transition probability kernels
Π̄, such that
If x ∈ S\A:   Π̄(x, C) = Π(x, C) for C ∈ S.
If x ∈ A:   Π̄(x, {α}) = ε, and Π̄(x, C) = Π(x, C) − ε ρ(C) for C ∈ S.
If x = α:   Π̄(α, D) = ∫ ρ(dx) Π̄(x, D) for D ∈ S̄.
X̄n being in the state α corresponds to Xn being distributed as ρ on B. This correspondence
allows us to go from the distribution of X to X̄ and vice versa. Having a macroscopic atom
α allows us to define transience, recurrence, periodicity, and use the cycle trick to construct
stationary measures for recurrent Harris chains, and use coupling to prove convergence of
positive recurrent Harris chains to their unique stationary distributions.
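The construction of X̄ can be made concrete on a simple example. The following sketch simulates the split chain for the kernel X_{n+1} = X_n/2 + U_n with U_n ~ Uniform(−1, 1); the choices of A, B, ε, ρ below are our own, made for illustration. For x ∈ A = B = [−1, 1], the one-step density is 1/2 on [x/2 − 1, x/2 + 1], an interval that always contains [−1/2, 1/2], so the minorization Π(x, ·) ≥ ε ρ(·) holds with ε = 1/2 and ρ = Uniform(−1/2, 1/2).

```python
import random

EPS = 0.5   # minorization constant for this example

def step_split(x, rng):
    """One step of the split chain X-bar; the string 'alpha' plays the atom."""
    if x == "alpha":                    # Pi-bar(alpha, .) = int rho(dx) Pi-bar(x, .)
        x = rng.uniform(-0.5, 0.5)      # draw x ~ rho; note rho is supported in A
    if -1.0 <= x <= 1.0:                # x in A: the minorization applies
        if rng.random() < EPS:
            return "alpha"              # Pi-bar(x, {alpha}) = eps
        while True:                     # residual kernel (Pi - eps*rho)/(1 - eps):
            y = x / 2.0 + rng.uniform(-1.0, 1.0)
            if not (-0.5 <= y <= 0.5):  # uniform on [x/2-1, x/2+1] \ [-1/2, 1/2]
                return y
    return x / 2.0 + rng.uniform(-1.0, 1.0)   # x outside A: original kernel Pi

# The atom alpha is hit over and over: this is a recurrent Harris chain.
rng = random.Random(1)
state, visits = "alpha", 0
for _ in range(5000):
    state = step_split(state, rng)
    if state == "alpha":
        visits += 1
print("atom visits in 5000 steps:", visits)
```

Here the residual kernel happens to be uniform on [x/2 − 1, x/2 + 1] \ [−1/2, 1/2], which the rejection loop samples exactly.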
Definition 3.2 (Recurrence, transience, and periodicity) Let τα := inf{n ≥ 1 : X̄n =
α}. X is called a recurrent Harris chain if Pα (τα < ∞) = 1, and transient otherwise. The
gcd d of D := {n ≥ 1 : Pα(X̄n = α) > 0} is called the period of the Harris chain, with d = 1
corresponding to aperiodicity.
Note that Definition 3.1 (i) guarantees that Px (τα < ∞) > 0 for all x ∈ S̄, which is a form
of irreducibility for the chain X̄. The theory we developed for countable state Markov chains
can be adapted to Harris chains. See e.g. [1] for more details.
Theorem 3.1 (Stationary measures) If X is a recurrent Harris chain, then there exists
a unique (modulo constant multiple) stationary measure. If X is furthermore aperiodic with
stationary distribution π, then for any x ∈ S with Px(τα < ∞) = 1, we have ‖Π^n(x, ·) − π(·)‖ →
0, where ‖ · ‖ denotes the total variation norm of a signed measure.
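For a finite chain with strictly positive transition matrix (a stand-in, of our own choosing, for an aperiodic recurrent Harris chain), the total variation convergence in Theorem 3.1 can be observed directly:

```python
import numpy as np

# Stand-in example (our own): a 3-state aperiodic chain with positive entries.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()                        # stationary distribution

def tv(n, x=0):
    """Total variation distance ||Pi^n(x, .) - pi|| = (1/2) sum_y |Pi^n(x, y) - pi(y)|."""
    row = np.linalg.matrix_power(P, n)[x]
    return 0.5 * np.abs(row - pi).sum()

assert tv(1) > tv(5) > tv(50)   # the distance decays with n (TV is contracted by P)
assert tv(50) < 1e-8            # Pi^n(x, .) -> pi in total variation
```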
We next give some sufficient conditions for a Harris chain to be positive recurrent, i.e.,
Eα[τα] < ∞, which are based on the existence of certain Lyapunov functions.
Theorem 3.2 (Sufficient conditions for positive recurrence) Let X be a Harris chain
satisfying the conditions in Definition 3.1, where we further assume that A = B. Assume that
there exists a function g : S → [0, ∞) with supx∈A Ex [g(X1 )] < ∞, such that
(i) either g : S → [1, ∞) and there exists r ∈ (0, 1) s.t. Ex [g(X1 )] ≤ rg(x) for all x ∈ Ac ,
(ii) or Ex[g(X1)] ≤ g(x) − ε for all x ∈ Ac,
then Eα [τα ] < ∞ and X is a positive recurrent Harris chain.
Proof. Since every time the Markov chain X̄ enters the set A = B, there is probability ε of
entering the state α in the next step, to show Eα[τα] < ∞, it suffices to show that
sup_{x∈A} Ex[τA] < ∞,   where τA := min{n ≥ 1 : Xn ∈ A}.   (3.1)
Note that condition (i) implies that g(Xn∧τA) r^{−(n∧τA)} is a supermartingale. Therefore, since g ≥ 1,
g(x) ≥ Ex[g(Xn∧τA) r^{−(n∧τA)}] ≥ Ex[r^{−(n∧τA)}].
Letting n → ∞ then gives
Ex[r^{−τA}] ≤ g(x)   ∀ x ∈ Ac.   (3.2)
By the Markov inequality, this further implies that
Ex[τA] = ∑_{n=1}^∞ Px(τA ≥ n) ≤ ∑_{n=1}^∞ r^n g(x) < g(x)/(1 − r)   ∀ x ∈ Ac.   (3.3)
Similarly, condition (ii) implies that g(Xn∧τA) + ε(n ∧ τA) is a supermartingale. Therefore
g(x) ≥ Ex[g(Xn∧τA) + ε(n ∧ τA)] ≥ ε Ex[n ∧ τA].
Letting n → ∞ then gives
Ex[τA] ≤ g(x)/ε   ∀ x ∈ Ac.   (3.4)
Using (3.3) or (3.4), we note that for x ∈ A,
Ex[τA] = 1 + ∫_{Ac} Π(x, dy) Ey[τA] ≤ 1 + (1/c) ∫_{Ac} Π(x, dy) g(y) ≤ 1 + (1/c) Ex[g(X1)],
where c = 1 − r under assumption (i) and c = ε under assumption (ii). Taking sup_{x∈A} on
both sides then yields (3.1) by the assumption that sup_{x∈A} Ex[g(X1)] < ∞.
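The drift condition (ii) can be verified concretely for the kernel X_{n+1} = X_n/2 + U, U ~ Uniform(−1, 1), with A = [−1, 1] (an illustrative chain of our own, as are the choices of g and ε below). Take g(x) = |x| and write m = |x|/2: a direct computation gives Ex[g(X1)] = E|m + U| = (1 + m²)/2 for m ≤ 1 and = m for m > 1, so Ex[g(X1)] ≤ g(x) − 3/8 for all x ∈ Ac, i.e., condition (ii) holds with ε = 3/8, while sup_{x∈A} Ex[g(X1)] ≤ 5/8 < ∞.

```python
import random

# Illustrative Lyapunov check (our own example): X_{n+1} = X_n/2 + U,
# U ~ Uniform(-1, 1), A = [-1, 1], g(x) = |x|. With m = |x|/2,
#   E_x[g(X_1)] = E|m + U| = (1 + m^2)/2  if m <= 1,  and  m  if m > 1.
def exact_drift(x):
    m = abs(x) / 2.0
    return (1.0 + m * m) / 2.0 if m <= 1.0 else m

def mc_drift(x, n=200_000, seed=0):
    """Monte Carlo estimate of E_x[g(X_1)], as a sanity check of the formula."""
    rng = random.Random(seed)
    return sum(abs(x / 2.0 + rng.uniform(-1.0, 1.0)) for _ in range(n)) / n

for x in [1.0, 1.5, 2.0, 5.0, -3.0]:
    assert exact_drift(x) <= abs(x) - 3.0 / 8.0 + 1e-12  # condition (ii), eps = 3/8
    assert abs(mc_drift(x) - exact_drift(x)) < 0.01      # formula matches simulation
```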
References
[1] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, Belmont,
California, 1996.
[2] P. Lax, Functional Analysis, John Wiley & Sons, 2002.