LECTURE 3
Basic Ergodic Theory
3.1 STATIONARITY AND ERGODICITY
Let X = (Xn)n∈N be a random process on the alphabet X with index set N, and let (Ω, F, P) be the
associated probability space, where Ω = X^N is the sample space (the set of outcomes),
F = σ((Xn)n∈N) is the σ-algebra generated by X (the set of events), and P is the probability measure. Note that Xn(ω) = ωn, where the latter denotes the n-th coordinate
of ω ∈ Ω. The index set N is typically one of ℕ = {1, 2, . . .}, ℤ+ = {0, 1, . . .}, and ℤ =
{. . . , −1, 0, 1, . . .}, and is omitted when it is irrelevant or clear from the context.
Let T be a time shift operator on Ω, that is, if
ω = (. . . , ω−1, ω0, ω1, . . .),
then
Tω = (. . . , ω0, ω1, ω2, . . .),
where the zeroth coordinate of Tω is ω1; equivalently, (Tω)n = ωn+1.
The process X = (Xn) is said to be stationary if
P(T⁻¹A) = P({ω : Tω ∈ A}) = P(A)
for every A ∈ F . Note that this definition is equivalent to the standard definition of
stationarity, namely,
P{X1 ≤ x1 , X2 ≤ x2 , . . . , Xn ≤ xn } = P{X1+m ≤ x1 , X2+m ≤ x2 , . . . , Xn+m ≤ xn }
for every m, n and every x1, x2, . . . , xn ∈ X. If the process is double-sided (i.e., N = ℤ),
stationarity is equivalently defined by P(A) = P(TA). For example, if A = {ω : ω0 = 1},
then
T⁻¹A = {ω : (Tω)0 = 1} = {ω : ω1 = 1},
TA = {Tω : ω0 = 1} = {ω : ω−1 = 1}.
Thus P(A) = P(X0 = 1), while P(T⁻¹A) = P(X1 = 1) and P(TA) = P(X−1 = 1).
An event A is said to be shift-invariant if A = T⁻¹A. The process X = (Xn) is said to
be ergodic if every shift-invariant event A is trivial, namely, P(A) = 0 or 1.
Example 3.1. Let
X = { . . . 0101 . . .   with probability (w.p.) 1/2,
    { . . . 1010 . . .   w.p. 1/2.
Then A = T⁻¹A implies A = ∅ or A = Ω. Thus, P(A) ∈ {0, 1} and X is ergodic.
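As a quick sanity check of Example 3.1 (a sketch in Python; the finite truncation of the two sample paths is our own choice), the time average along either realization equals the ensemble mean E(X0) = 1/2:

```python
# Truncate the two periodic sample paths ...0101... and ...1010... to a
# finite window and compute their time averages; both match the ensemble
# mean 1/2, consistent with ergodicity.
n = 10**6                                  # window length (even)
path_a = [i % 2 for i in range(n)]         # 0, 1, 0, 1, ...
path_b = [(i + 1) % 2 for i in range(n)]   # 1, 0, 1, 0, ...

avg_a = sum(path_a) / n
avg_b = sum(path_b) / n
print(avg_a, avg_b)  # 0.5 0.5 (exactly, since n is even)
```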
Example 3.2. Let X be defined as in Example 3.1, Z be an independent copy of X, and
Y = (X, Z). Consider the event
A = {y = (x, z) : xi = zi for all i},
which is shift-invariant. However, P(A) = P(X = Z) = 1/2 and Y is not ergodic. Thus, a
composite of two independent ergodic processes is not necessarily ergodic.
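A small Monte Carlo sketch of Example 3.2 (the phase bits below are our own encoding): each of X and Z is determined by an independent fair phase bit selecting one of the two alternating sequences, and {X = Z} occurs exactly when the two phases agree, so its probability is 1/2 rather than 0 or 1.

```python
# Each process picks its phase (which of the two alternating sequences)
# independently and uniformly; X = Z iff the phases coincide.
import random

random.seed(0)
trials = 100_000
hits = sum(1 for _ in range(trials)
           if random.choice([0, 1]) == random.choice([0, 1]))
ratio = hits / trials
print(ratio)  # close to 0.5
```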
Example 3.3. Let X1, X2, . . . be i.i.d. Then X = (Xn) is ergodic. To prove this, first observe
that any shift-invariant A must be in the tail σ-algebra T = ⋂_{n=1}^∞ σ(Xn, Xn+1, . . .); see
the tail σ-algebra problem at the end of this lecture. Since X1, X2, . . . are independent, the Kolmogorov 0–1 law gives P(A) = 0 or 1,
and X is ergodic.
Example 3.4. Let
Θ = { 1/3   w.p. 1/2,
    { 2/3   w.p. 1/2,
and given {Θ = θ}, let X1, X2, . . . be i.i.d. Bern(θ). Note that X1, X2, . . . are unconditionally
dependent. Consider the shift-invariant event
A = {ω : lim sup_{n→∞} (1/n) Σ_{i=1}^n Xi(ω) ≤ 1/2}
  = {ω : lim sup_{n→∞} (1/n) Σ_{i=2}^{n+1} Xi(ω) ≤ 1/2} = T⁻¹A.
Then by the strong law of large numbers,
P(A | Θ = θ) = { 1 if θ = 1/3,
              { 0 otherwise,
and
P(A) = (1/2) P(A | Θ = 1/3) + (1/2) P(A | Θ = 2/3) = 1/2.
Thus, X is not ergodic. In general, a mixture of ergodic processes is not ergodic. However,
every stationary process can be viewed as a mixture of stationary ergodic processes.
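The failure of ergodicity in Example 3.4 can be seen numerically (a simulation sketch; the sample sizes and seed are arbitrary): each realization's time average locks onto the realized value of Θ, never onto the ensemble mean 1/2.

```python
# Draw Theta, then an i.i.d. Bern(theta) sequence given Theta; the
# empirical mean tracks theta (1/3 or 2/3), not E[X_1] = 1/2.
import random

random.seed(1)
results = []
for _ in range(4):
    theta = random.choice([1/3, 2/3])   # Theta ~ Unif{1/3, 2/3}
    n = 200_000
    avg = sum(1 for _ in range(n) if random.random() < theta) / n
    results.append((theta, avg))
    print(round(theta, 3), round(avg, 3))   # avg tracks theta, not 0.5
```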
If X = (Xn)_{n=−∞}^∞ is stationary and ergodic, then so is Y = (Yn)_{n=−∞}^∞, where Yn(ω) =
f(Tⁿω) for some (measurable) function f.
3.2 MIXING
Suppose that X = (Xn) is stationary. We introduce the notion of mixing (Bradley 2005),
which captures the idea that the memory of the process fades away, so that (X1, . . . , Xk) and (Xn+1, . . . , Xn+k)
become asymptotically independent. More precisely, X is said to be strongly mixing if for every
A, B ∈ F,
lim_{n→∞} P(T⁻ⁿA ∩ B) = P(A) P(B),
where T⁻ⁿ denotes the n-fold composition of T⁻¹.
Let A be shift-invariant. Then, by setting B = A in the definition of strong mixing, we have
lim_{n→∞} P(T⁻ⁿA ∩ A) = P(A) = (P(A))²,
or equivalently, P(A) = 0 or 1. Hence, strong mixing implies ergodicity. In fact, the notion
of ergodicity can be rewritten in a form similar to the strong mixing condition.
Theorem 3.1. Suppose that X = (Xn) is stationary. Then X is ergodic iff for every
A, B ∈ F,
lim_{n→∞} (1/n) Σ_{i=1}^n P(T⁻ⁱA ∩ B) = P(A) P(B).
As an intermediate level between strong mixing and ergodicity, we say that a process
is weakly mixing if for every A, B ∈ F,
lim_{n→∞} (1/n) Σ_{i=1}^n |P(T⁻ⁱA ∩ B) − P(A) P(B)| = 0.
Strong mixing implies weak mixing, which, in turn, implies ergodicity; see the Cesàro summation problem at the end of this lecture.
Example 3.5. A stationary irreducible Markov chain is ergodic. If, in addition, the chain is
aperiodic, then it is strongly mixing.
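To see the gap in Example 3.5 concretely, consider a two-state chain of period 2 (our own toy example, sketched below): with A = B = {ω : ω0 = 0}, the quantity P(T⁻ⁿA ∩ A) = π(0) Pⁿ(0, 0) oscillates and has no limit, so the chain is not strongly mixing, yet its Cesàro average converges to 1/4 = P(A)², consistent with the Cesàro characterization of ergodicity.

```python
# Deterministic period-2 chain 0 -> 1 -> 0 with stationary law (1/2, 1/2).
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])    # transition matrix
pi0 = 0.5                     # stationary probability of state 0

joint = []                    # joint[n-1] = P(X_n = 0, X_0 = 0)
Pn = np.eye(2)
for n in range(1, 2001):
    Pn = Pn @ P
    joint.append(float(pi0 * Pn[0, 0]))

print(joint[:4])                 # [0.0, 0.5, 0.0, 0.5] -- no limit
print(sum(joint) / len(joint))   # 0.25 = P(A)^2
```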
Example 3.6. Suppose that X = (Xn)_{n=1}^∞ is stationary and ergodic, Z = (Zn)_{n=1}^∞ is stationary and weakly mixing, and X and Z are independent. Let Yn = f(Xn, Zn), n = 1, 2, . . .,
for some function f. Then Y = (Yn)_{n=1}^∞ is stationary and ergodic. In particular, if X is a
stationary irreducible Markov chain and Z is i.i.d., then Y is a hidden Markov process and is ergodic.
3.3 ERGODIC THEOREMS
The strong law of large numbers states that the time average of an i.i.d. random sequence
converges to the ensemble average with probability one. This deterministic behavior arises
more generally for ergodic processes.
Theorem . (Pointwise ergodic theorem (Birkhoff )). Let X = (Xn )∞
n=1 be stationary and ergodic with E(|X1 |) < ∞. Then
1
󵠈 X = E(X1 )
n i=1 n
n
lim
n→∞
a.s.
(.)
If E(X12 ) < ∞, then the convergence in (.) also holds in the L2 sense, which is often
referred to as von Neumann’s mean ergodic theorem.
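A numerical illustration of the pointwise ergodic theorem (a sketch with our own example chain and parameters): simulate a stationary ergodic two-state Markov chain and compare the time average with the ensemble average.

```python
# Two-state chain with P(0->1) = 0.3 and P(1->0) = 0.2; its stationary
# probability of state 1 is 0.3/(0.3+0.2) = 0.6, so the time average of
# X_n should approach E(X_1) = 0.6.
import random

random.seed(0)
p01, p10 = 0.3, 0.2
pi1 = p01 / (p01 + p10)                 # 0.6

x = 1 if random.random() < pi1 else 0   # start from the stationary law
n, total = 500_000, 0
for _ in range(n):
    total += x
    p_next_one = p01 if x == 0 else 1 - p10
    x = 1 if random.random() < p_next_one else 0
avg = total / n
print(avg)  # close to 0.6
```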
The following result is an analog of the ergodic theorem for information theory.
Theorem . (Shannon (), McMillan (), Breiman ()). If X = (Xn )∞
n=1 is
stationary and ergodic, then
lim
n→∞
1
1
log
= H(X)
n
p(X n )
a.s.
Roughly speaking, the theorem states that almost every realized sequence xⁿ has a probability
nearly equal to 2^{−nH(X)} (the asymptotic equipartition property). For the proof of the theorem,
refer to Cover and Thomas (2006).
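The asymptotic equipartition property can be checked numerically for a memoryless source (a sketch; the parameter p = 0.3 and sample size are our own choices): for an i.i.d. Bern(p) sequence, −(1/n) log₂ p(Xⁿ) should approach the binary entropy H(p) ≈ 0.8813 bits.

```python
# Sample an i.i.d. Bern(0.3) sequence and compare the normalized
# log-probability of the realization with the binary entropy.
import math
import random

random.seed(0)
p, n = 0.3, 200_000
k = sum(1 for _ in range(n) if random.random() < p)   # number of ones

log2_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
print(-log2_prob / n, entropy)  # the two values nearly agree
```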
The following generalizes the Shannon–McMillan–Breiman theorem to the relative
entropy rate between densities.
Theorem . (Barron ()). If P is a stationary ergodic probability measure, Q is a
stationary ergodic Markov probability measure, and P is absolutely continuous w.r.t. Q,
then
p(X n )
1
lim log
= D(P ‖ Q) P −a.s.
n→∞ n
q(X n )
PROBLEMS
3.1. Shift-invariant sets. Show that the collection I of shift-invariant sets is a σ-algebra.
3.2. Tail σ-algebra. Let X = (Xn)_{n=1}^∞ be a random process and T = ⋂_n σ(Xn, Xn+1, . . .).
(a) Show that every shift-invariant A ∈ F is in T, i.e., I ⊆ T.
(b) Does the converse hold, that is, T ⊆ I?
3.3. Multiple time shifts. Let X = (Xn)_{n=1}^∞ be ergodic and Yn = X2n, n = 1, 2, . . .. Is
Y = (Yn)_{n=1}^∞ ergodic? Prove or provide a counterexample.
3.4. Mixture of ergodic processes. Let U = (Un) and V = (Vn) be two stationary ergodic processes on the same alphabet X, but with different entropy rates H(U)
and H(V), respectively. Let X = (Xn) be a random process that is equal to either U or V
uniformly at random, i.e.,
X = { U   w.p. 1/2,
    { V   w.p. 1/2.
(a) Is X stationary?
(b) Find its entropy rate H(X) in terms of H(U) and H(V).
(c) Does the random sequence
(1/n) log (1/p(Xⁿ))
converge almost surely? If so, characterize the limiting random variable.
..
Cesàro summation. Let {an } be a sequence of real numbers. We say that an converges if limn→∞ an = a < ∞; that an converges in the strong Cesàro sense if
1
󵠈 |a − a| = 0;
n i=1 n
n
lim
n→∞
and that an converges in the Cesàro sense if
1
󵠈 a = a.
n i=1 i
n
lim
n→∞
(a) Show that convergence implies strong Cesàro convergence, which, in turn, implies Cesàro convergence.
(b) Use part (a) to show that strong mixing implies weak mixing, which, in turn,
implies ergodicity.
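A quick numeric illustration of the hierarchy in part (a) (our own example, not a solution to the problem): an = (−1)ⁿ has no limit and is not strongly Cesàro convergent to 0, since |an − 0| = 1 for all n, yet it converges to 0 in the Cesàro sense.

```python
# a_n = (-1)^n: the running averages converge to 0, while the averages
# of |a_n - 0| stay at 1.
n = 100_000
a = [(-1) ** i for i in range(1, n + 1)]
cesaro = sum(a) / n                  # 0.0
strong = sum(abs(x) for x in a) / n  # 1.0
print(cesaro, strong)
```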
Bibliography
Barron, A. R. (1985). The strong ergodic theorem for densities: Generalized Shannon–McMillan–Breiman theorem. Ann. Probab., 13(4), 1292–1303.
Birkhoff, G. D. (1931). Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA, 17(12), 656–660.
Bradley, R. C. (2005). Basic properties of strong mixing conditions: A survey and some open questions. Probab. Surv., 2, 107–144.
Breiman, L. (1957). The individual ergodic theorem of information theory. Ann. Math. Statist., 28(3), 809–811. Correction (1960), 31(3), 809–810.
Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. 2nd ed. Wiley, New York.
McMillan, B. (1953). The basic theorems of information theory. Ann. Math. Statist., 24(2), 196–219.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J., 27(3), 379–423; 27(4), 623–656.