LECTURE 3 Basic Ergodic Theory

3.1 STATIONARITY AND ERGODICITY

Let X = (Xₙ)_{n∈N} be a random process on 𝒳 with indices in N, and let (Ω, F, P) be the associated probability space, where Ω = 𝒳^N is the sample space (the set of outcomes), F = σ((Xₙ)_{n∈N}) is the σ-algebra generated by X (the set of events), and P is the probability measure. Note that Xₙ(ω) = ωₙ, where the latter denotes the n-th coordinate of ω ∈ Ω. The index set N is typically one of ℕ = {1, 2, . . .}, ℤ₊ = {0, 1, . . .}, and ℤ = {. . . , −1, 0, 1, . . .}, and is omitted when it is irrelevant or clear from the context.

Let T be the time shift operator on Ω, defined by (Tω)ₙ = ωₙ₊₁ for every n; that is, if ω = (. . . , ω₋₁, ω₀, ω₁, . . .), then Tω = (. . . , ω₀, ω₁, ω₂, . . .), so T shifts a sequence one step to the left.

The process X = (Xₙ) is said to be stationary if

    P(T⁻¹A) = P({ω : Tω ∈ A}) = P(A)

for every A ∈ F. This definition is equivalent to the standard definition of stationarity, namely,

    P{X₁ ≤ x₁, X₂ ≤ x₂, . . . , Xₙ ≤ xₙ} = P{X₁₊ₘ ≤ x₁, X₂₊ₘ ≤ x₂, . . . , Xₙ₊ₘ ≤ xₙ}

for every m, n and every x₁, x₂, . . . , xₙ ∈ 𝒳. If the process is double-sided (i.e., N = ℤ), stationarity is equivalently defined by P(A) = P(TA).

For example, if A = {ω : ω₀ = 1}, then

    T⁻¹A = {ω : (Tω)₀ = 1} = {ω : ω₁ = 1},
    TA = {Tω : ω₀ = 1} = {ω : ω₋₁ = 1}.

Thus P(A) = P(X₀ = 1), while P(T⁻¹A) = P(X₁ = 1) and P(TA) = P(X₋₁ = 1).

An event A is said to be shift-invariant if A = T⁻¹A. The process X = (Xₙ) is said to be ergodic if every shift-invariant event A is trivial, namely, P(A) = 0 or 1.

Example 3.1. Let

    X = . . . 0101 . . .  with probability (w.p.) 1/2,
        . . . 1010 . . .  w.p. 1/2.

Since the shift T maps each of these two sequences to the other, A = T⁻¹A implies A = ∅ or A = Ω (up to null events). Thus P(A) ∈ {0, 1} and X is ergodic.

Example 3.2. Let X be defined as in Example 3.1, let Z be an independent copy of X, and let Y = (X, Z). Consider the event A = {y = (x, z) : xᵢ = zᵢ for all i}, which is shift-invariant. However, P(A) = P(X = Z) = 1/2, so Y is not ergodic.
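To make Example 3.2 concrete, here is a minimal Monte Carlo sketch (our illustration, not part of the original notes; the helper name `sample_alternating` is ours) that estimates P(X = Z) for two independent copies of the alternating process of Example 3.1:

```python
import random

def sample_alternating(n, rng):
    """One realization of Example 3.1 restricted to n coordinates:
    0101... or 1010..., each with probability 1/2."""
    phase = rng.randrange(2)               # 0 -> 0101..., 1 -> 1010...
    return [(i + phase) % 2 for i in range(n)]

rng = random.Random(0)
trials = 10_000
n = 8
agree = 0
for _ in range(trials):
    x = sample_alternating(n, rng)
    z = sample_alternating(n, rng)         # independent copy of X
    if x == z:                             # the event {x_i = z_i for all i}
        agree += 1

print(agree / trials)                      # close to P(X = Z) = 1/2
```

Note that for this particular process, agreement on a single coordinate already forces agreement everywhere, so checking a finite window of n coordinates is exact rather than an approximation.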
Thus, a composite of two independent ergodic processes is not necessarily ergodic.

Example 3.3. Let X₁, X₂, . . . be i.i.d. Then X = (Xₙ) is ergodic. To prove this, first observe that any shift-invariant A must be in the tail σ-algebra T = ⋂_{n=1}^∞ σ(Xₙ, Xₙ₊₁, . . .); see Problem 3.2. Since X₁, X₂, . . . are independent, by the Kolmogorov 0–1 law, P(A) = 0 or 1, and hence X is ergodic.

Example 3.4. Let

    Θ = 1/3 w.p. 1/2,
        2/3 w.p. 1/2,

and given {Θ = θ}, let X₁, X₂, . . . be i.i.d. Bern(θ). Note that X₁, X₂, . . . are unconditionally dependent. Consider the shift-invariant event

    A = {ω : lim supₙ→∞ (1/n) ∑ᵢ₌₁ⁿ Xᵢ(ω) ≤ 1/2}
      = {ω : lim supₙ→∞ (1/n) ∑ᵢ₌₂ⁿ⁺¹ Xᵢ(ω) ≤ 1/2} = T⁻¹A.

Then by the strong law of large numbers,

    P(A | Θ = θ) = 1 if θ = 1/3, and 0 otherwise,

and thus

    P(A) = (1/2) P(A | Θ = 1/3) + (1/2) P(A | Θ = 2/3) = 1/2.

Thus, X is not ergodic. In general, a mixture of ergodic processes is not ergodic. However, every stationary process can be viewed as a mixture of stationary ergodic processes.

If X = (Xₙ)_{n=−∞}^∞ is stationary and ergodic, then so is Y = (Yₙ)_{n=−∞}^∞, where Yₙ(ω) = f(Tⁿω) for some (measurable) function f.

3.2 MIXING

Suppose that X = (Xₙ) is stationary. We introduce the notion of mixing (Bradley 2005), which states that the memory of the process fades away in the sense that (X₁, . . . , Xₖ) and (Xₙ₊₁, . . . , Xₙ₊ₖ) are asymptotically independent. More precisely, X is said to be strongly mixing if for every A, B ∈ F,

    limₙ→∞ P(T⁻ⁿA ∩ B) = P(A) P(B),    (3.1)

where T⁻ⁿ denotes the n-fold composition of T⁻¹.

Let A be shift-invariant. Then, by setting B = A in (3.1) and using T⁻ⁿA = A, we have

    limₙ→∞ P(T⁻ⁿA ∩ A) = P(A) = (P(A))²,

or equivalently, P(A) = 0 or 1. Hence, strong mixing implies ergodicity. In fact, the notion of ergodicity can be rewritten in a form similar to (3.1).

Theorem 3.1. Suppose that X = (Xₙ) is stationary. Then X is ergodic iff for every A, B ∈ F,

    limₙ→∞ (1/n) ∑ᵢ₌₁ⁿ P(T⁻ⁱA ∩ B) = P(A) P(B).    (3.2)
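A quick simulation (a sketch we add for illustration, not part of the notes) makes the failure of ergodicity in Example 3.4 visible: each realized time average settles near the realized value of Θ, never near the ensemble mean E(X₁) = 1/2.

```python
import random

rng = random.Random(1)
n = 100_000
averages = []
for _ in range(20):
    # Draw Theta = 1/3 or 2/3, each w.p. 1/2, then sample X_1, ..., X_n i.i.d. Bern(theta).
    theta = 1/3 if rng.random() < 0.5 else 2/3
    avg = sum(rng.random() < theta for _ in range(n)) / n
    averages.append(avg)

# Every time average is close to 1/3 or 2/3 (the realized Theta), never to 1/2,
# so the a.s. limit is the random variable Theta, not a constant.
print([round(a, 3) for a in averages])
```

This is exactly the sense in which X is a mixture of two ergodic Bernoulli processes: conditioned on Θ, the time average is deterministic, but unconditionally it remains random in the limit.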
As an intermediate level between strong mixing and ergodicity, we say that a process is weakly mixing if for every A, B ∈ F,

    limₙ→∞ (1/n) ∑ᵢ₌₁ⁿ |P(T⁻ⁱA ∩ B) − P(A) P(B)| = 0.    (3.3)

Strong mixing implies weak mixing, which, in turn, implies ergodicity; see Problem 3.5.

Example 3.5. A stationary irreducible Markov chain is ergodic. If, in addition, the chain is aperiodic, then it is strongly mixing.

Example 3.6. Suppose that X = (Xₙ)_{n=1}^∞ is stationary and ergodic, Z = (Zₙ)_{n=1}^∞ is stationary and weakly mixing, and X and Z are independent. Let Yₙ = f(Xₙ, Zₙ), n = 1, 2, . . ., for some function f. Then Y = (Yₙ)_{n=1}^∞ is stationary and ergodic. In particular, if X is a stationary irreducible Markov chain and Z is i.i.d., then Y is hidden Markov and ergodic.

3.3 ERGODIC THEOREMS

The strong law of large numbers states that the time average of an i.i.d. random sequence converges to the ensemble average with probability one. This deterministic limiting behavior arises more generally for ergodic processes.

Theorem 3.2 (Pointwise ergodic theorem (Birkhoff 1931)). Let X = (Xₙ)_{n=1}^∞ be stationary and ergodic with E(|X₁|) < ∞. Then

    limₙ→∞ (1/n) ∑ᵢ₌₁ⁿ Xᵢ = E(X₁)  a.s.    (3.4)

If E(X₁²) < ∞, then the convergence in (3.4) also holds in the L² sense, which is often referred to as von Neumann's mean ergodic theorem.

The following result is an analog of the ergodic theorem for information theory.

Theorem 3.3 (Shannon (1948), McMillan (1953), Breiman (1957)). If X = (Xₙ)_{n=1}^∞ is stationary and ergodic, then

    limₙ→∞ (1/n) log (1 / p(Xⁿ)) = H(X)  a.s.,

where H(X) denotes the entropy rate of X. Roughly speaking, the theorem states that every realized sequence xⁿ has probability nearly equal to 2^{−nH(X)} (the asymptotic equipartition property). For the proof of the theorem, refer to Cover and Thomas (2006).

The following generalizes the Shannon–McMillan–Breiman theorem to the relative entropy rate between densities.

Theorem 3.4 (Barron (1985)).
If P is a stationary ergodic probability measure, Q is a stationary ergodic Markov probability measure, and P is absolutely continuous w.r.t. Q, then

    limₙ→∞ (1/n) log (p(Xⁿ) / q(Xⁿ)) = D(P ‖ Q)  P-a.s.

PROBLEMS

3.1. Shift-invariant sets. Show that the collection I of shift-invariant sets is a σ-algebra.

3.2. Tail σ-algebra. Let X = (Xₙ)_{n=1}^∞ be a random process and T = ⋂ₙ σ(Xₙ, Xₙ₊₁, . . .).

(a) Show that every shift-invariant A ∈ F is in T, i.e., I ⊆ T.

(b) Does the converse hold, that is, T ⊆ I?

3.3. Multiple time shifts. Let X = (Xₙ)_{n=1}^∞ be ergodic and Yₙ = X₂ₙ, n = 1, 2, . . .. Is Y = (Yₙ)_{n=1}^∞ ergodic? Prove or provide a counterexample.

3.4. Mixture of ergodic processes. Let U = (Uₙ) and V = (Vₙ) be two stationary ergodic processes on the same alphabet 𝒳, but with different entropy rates H(U) and H(V), respectively. Let X = (Xₙ) be a random process that is either U or V uniformly at random, i.e.,

    X = U w.p. 1/2,
        V w.p. 1/2.

(a) Is X stationary?

(b) Find its entropy rate H(X) in terms of H(U) and H(V).

(c) Does the random sequence (1/n) log (1 / p(Xⁿ)) converge almost surely? If so, characterize the limiting random variable.

3.5. Cesàro summation. Let {aₙ} be a sequence of real numbers. We say that aₙ converges if limₙ→∞ aₙ = a for some finite a; that aₙ converges in the strong Cesàro sense if

    limₙ→∞ (1/n) ∑ᵢ₌₁ⁿ |aᵢ − a| = 0;

and that aₙ converges in the Cesàro sense if

    limₙ→∞ (1/n) ∑ᵢ₌₁ⁿ aᵢ = a.

(a) Show that convergence implies strong Cesàro convergence, which, in turn, implies Cesàro convergence.

(b) Use part (a) to show that strong mixing implies weak mixing, which, in turn, implies ergodicity.

BIBLIOGRAPHY

Barron, A. R. (1985). The strong ergodic theorem for densities: Generalized Shannon–McMillan–Breiman theorem. Ann. Probab.

Birkhoff, G. D. (1931). Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA.

Bradley, R. C. (2005). Basic properties of strong mixing conditions: A survey and some open questions. Probab. Surv.
Breiman, L. (1957). The individual ergodic theorem of information theory. Ann. Math. Statist. Correction (1960).

Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. 2nd ed. Wiley, New York.

McMillan, B. (1953). The basic theorems of information theory. Ann. Math. Statist.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J.
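As a closing numerical aside (our illustration, not part of the original notes), the Shannon–McMillan–Breiman theorem is easy to check for the simplest case of an i.i.d. Bern(p) process, whose entropy rate is the binary entropy H(p): the normalized log-probability of a long realization nearly coincides with H(p).

```python
import math
import random

def h2(p):
    """Binary entropy H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rng = random.Random(2)
p, n = 0.3, 200_000
k = sum(rng.random() < p for _ in range(n))          # number of ones in the realization X^n
# (1/n) log2(1 / p(X^n)) for an i.i.d. Bern(p) realization with k ones:
empirical_rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
print(round(empirical_rate, 4), round(h2(p), 4))     # the two values nearly coincide
```

Here the realized sequence has probability p^k (1−p)^(n−k), so the computation is exact given k; the closeness of the two printed values is the asymptotic equipartition property in action.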