Lecture 10
1 Ergodic decomposition of invariant measures
Let T : (Ω, F) → (Ω, F) be measurable, and let M denote the space of T -invariant probability
measures on (Ω, F). Then M is a convex set, although it might be empty. We will show that
any measure µ ∈ M can be decomposed as mixtures of extremal elements of M, which are
exactly the ergodic measures for T .
Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic
for T if and only if it is an extremal point in M.
Proof. If µ ∈ M is not ergodic, then there exists A ∈ F with µ(A) ∈ (0, 1) and A is
an invariant set for T . Let µA (resp. µAc ) denote the restriction of µ to A (resp. Ac ) and
normalized to be a probability measure, i.e.,
µA(·) = µ(A ∩ ·)/µ(A).
Then µA and µAc are distinct invariant probability measures for T , and
µ = αµA + (1 − α)µAc ,
where α = µ(A) ∈ (0, 1), which shows that µ is not extremal.
Conversely, if µ ∈ M is not extremal, then µ = αµ1 + (1 − α)µ2 for some α ∈ (0, 1)
and distinct µ1 , µ2 ∈ M. If µ were ergodic, then by the ergodic theorem, for any bounded
measurable f on (Ω, F),
An f(ω) = (f(ω) + f(Tω) + · · · + f(T^{n−1}ω))/n −→ Eµ[f],   µ a.s. and in L1(Ω, F, µ).
In particular, An f (ω) also converges to Eµ [f ] almost surely w.r.t. µ1 (resp. µ2 ), and hence
Eµ1 [f] = Eµ2 [f] = Eµ [f]. Since f is an arbitrary bounded measurable function, this implies that
µ1 = µ2 = µ, a contradiction. Therefore, if µ is not extremal, it cannot be ergodic.
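The dichotomy in the proof can be seen numerically on a minimal example (a toy sketch of our own, not from the lecture): take Ω = {0, 1, 2, 3} with T swapping 0 ↔ 1 and 2 ↔ 3. The uniform measures on {0, 1} and on {2, 3} are ergodic, while their mixture µ is invariant but not ergodic: the Birkhoff averages An f converge to different limits on the two invariant components, not to Eµ[f].

```python
# Toy illustration (our own example): T swaps 0 <-> 1 and 2 <-> 3,
# so {0, 1} and {2, 3} are invariant sets.
def T(x):
    return {0: 1, 1: 0, 2: 3, 3: 2}[x]

def birkhoff_average(f, x, n):
    """A_n f(x) = (f(x) + f(Tx) + ... + f(T^{n-1} x)) / n."""
    total, y = 0.0, x
    for _ in range(n):
        total += f(y)
        y = T(y)
    return total / n

f = lambda x: 1.0 if x in (0, 1) else 0.0   # indicator of the invariant set {0, 1}
a01 = birkhoff_average(f, 0, 1000)          # orbit inside {0, 1}
a23 = birkhoff_average(f, 2, 1000)          # orbit inside {2, 3}
# Under mu = (1/2) uniform{0,1} + (1/2) uniform{2,3} we have E_mu[f] = 1/2,
# yet A_n f -> 1 on {0, 1} and A_n f -> 0 on {2, 3}: mu is invariant but not
# ergodic, and its ergodic components are the two uniform measures.
print(a01, a23)   # 1.0 0.0
```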
By applying the ergodic theorem to suitable test functions, one can prove:
Lemma 1.1 [Singularity of ergodic measures] Distinct ergodic measures µ1 , µ2 ∈ M are
mutually singular. More specifically, there exists A ∈ I s.t. µ1 (A) = µ2 (Ac ) = 1.
Choquet’s Theorem (see Lax [2, Section 13.4]) provides a decomposition of a metrizable
compact convex subset K of a locally convex topological vector space in terms of the extremal
points of K. Since the set of invariant probability measures M in general may not be compact,
we will not appeal to Choquet’s theorem. Instead, we will assume that (Ω, F) is a complete
separable metric space with Borel σ-algebra, and appeal to the existence of regular conditional
probability distributions.
Theorem 1.2 [Ergodic decomposition] Let Ω be a complete separable metric space with
Borel σ-algebra F. Let T be a measurable transformation on (Ω, F) and let M denote the
set of probability measures on (Ω, F) invariant w.r.t. T . Then for any µ ∈ M, there exists a
probability measure ρµ on the set of ergodic measures Me such that
µ = ∫_{Me} ν ρµ(dν).   (1.1)
Remark. The σ-algebra we use for defining ρµ on M is the Borel σ-algebra induced by the
weak topology on M, i.e., µn → µ in M w.r.t. the weak topology if and only if for all bounded
continuous functions f : Ω → R, we have ∫ f dµn → ∫ f dµ. Such convergence of probability
measures on (Ω, F) is called weak convergence.
Proof. Since (Ω, F) is Polish, there exists a regular conditional probability µω of µ conditional
on the invariant σ-field I. Provided we can show that µω ∈ Me almost surely, we can regard
µω as a map from Ω to Me , and denote the distribution of µω by ρµ . The decomposition (1.1)
then follows readily.
We now verify that µω (·) := µ(·|I) is ergodic µ a.s. First we show invariance, i.e., µ a.s.,
µω(A) = µω(T⁻¹A)   ∀ A ∈ F.   (1.2)
A priori, there are uncountably many sets in F, and the exceptional sets may pile up.
However, by our assumption that (Ω, F) is Polish, F can be generated by a countable collection
of sets F0 , and hence it suffices to verify (1.2) for A ∈ F0 since µω is a probability measure
a.s. Since µω(·) = µ(·|I), given A ∈ F0, µω(A) = µω(T⁻¹A) a.s. (i.e., µ(A|I) = µ(T⁻¹A|I)
a.s.) if and only if
µ(A ∩ E) = µ(T⁻¹A ∩ E)   ∀ E ∈ I,
which holds since E ∈ I implies that µ(E∆T⁻¹E) = 0 and µ(A ∩ E) = µ(T⁻¹(A ∩ E)). This
proves the a.s. invariance of µω for T .
For the a.s. ergodicity of µω , it suffices to show that for µ a.s. every µω ,
∀ A ∈ F,   An 1A(ω) := (1A(ω) + 1A(Tω) + · · · + 1A(T^{n−1}ω))/n → µω(A)   a.s. w.r.t. µω.   (1.3)
Approximating A ∈ F by sets that are finitely generated from F0 , it suffices to verify (1.3) for
A ∈ F0 . For such an A, the ergodic theorem applied to 1A w.r.t. µ implies that An 1A (ω) →
µ(A|I) = µω (A) a.s. w.r.t. µ. Since µω is the regular conditional probability of µ given I,
(1.3) must hold.
2 Structure of stationary Markov chains
We now apply the ergodic decomposition theorem for stationary measures to stationary
Markov chains. Let Π(x, dy) be a transition probability kernel on the state space (S, S).
In this section, we will consider a general Polish space (S, S). A Markov process (Xn )n∈N is
stationary if and only if its marginal distribution µ is stationary for Π. More precisely,
µ ∈ M := {ν : ν(S) = 1, ν(A) = ∫_S Π(x, A) ν(dx) ∀ A ∈ S}.
Given marginal law µ ∈ M, we can embed the stationary Markov process (Xn )n∈N in a doubly
infinite stationary sequence (Xn )n∈Z . The process (Xn )n∈Z can be regarded as a random
variable taking values in the sequence space (S^Z, S^Z), where S^Z denotes the product σ-algebra
on the product space S^Z. Given marginal law µ ∈ M, let Pµ denote the law of (Xn)n∈Z on
(S^Z, S^Z). Let T denote the coordinate shift map on S^Z. Then each µ ∈ M determines a
Pµ ∈ M̃, where M̃ is the family of probability measures on (S^Z, S^Z) invariant for the shift
map T . Our goal is to show that the ergodic components of a stationary Markov process Pµ
are stationary Markov processes Pν with ν ∈ M, where ν are the extremal components of µ in
M. (Note that in general, the ergodic decomposition of a stationary process gives ergodic processes
which need not be Markov).
Theorem 2.1 [Ergodic decomposition of stationary Markov processes] Given µ ∈
M, Pµ is ergodic for the shift map T if and only if µ ∈ Me , i.e., µ is extremal in the family
of invariant measures M for the Markov chain. Furthermore, for any µ ∈ M, there exists a
probability measure ρµ on Me such that
µ = ∫_{Me} ν ρµ(dν)   and   Pµ = ∫_{Me} Pν ρµ(dν).   (2.1)
The extremal elements of M are called the extremal or ergodic invariant measures. When M
is a singleton, we say the Markov chain is ergodic.
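For a finite state space, the structure of M in the theorem can be checked directly. The following sketch (a toy 4-state kernel of our own choosing) exhibits a chain with two closed classes: M is then the segment of mixtures of the two ergodic stationary laws ν1, ν2, and every nontrivial mixture is invariant without being extremal.

```python
import numpy as np

# Toy reducible chain (our own example): {0, 1} and {2, 3} are closed classes.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.9, 0.1],
              [0.0, 0.0, 0.3, 0.7]])

def stationary_on(states):
    """Stationary distribution of P restricted to a closed class,
    embedded back into the full state space."""
    Q = P[np.ix_(states, states)]
    w, v = np.linalg.eig(Q.T)                 # left eigenvector for eigenvalue 1
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    full = np.zeros(len(P))
    full[states] = pi
    return full

nu1 = stationary_on([0, 1])   # ergodic: supported on {0, 1}
nu2 = stationary_on([2, 3])   # ergodic: supported on {2, 3}
mu = 0.3 * nu1 + 0.7 * nu2    # invariant, but a nontrivial mixture
assert np.allclose(nu1 @ P, nu1) and np.allclose(nu2 @ P, nu2)
assert np.allclose(mu @ P, mu)   # mu lies in M, yet mu is not extremal/ergodic
```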
Proof. If µ ∈ M is not extremal, then neither is Pµ extremal in M̃, which is equivalent to
Pµ not being ergodic by Theorem 1.1. The key to proving the converse is the following result.
Lemma 2.1 Let µ ∈ M, and let I be the invariant σ-field on (S^Z, S^Z) for the shift map T
and the measure Pµ (note that we define I modulo sets of Pµ measure 0). Then modulo sets
of Pµ measure 0, I ⊂ F_0^0, where F_m^n := σ(xm, xm+1, · · · , xn) on S^Z = {(xi)i∈Z : xi ∈ S}.
Proof. The lemma asserts that, for any E ∈ I, there exists A ∈ S such that E = {(xn)n∈Z :
x0 ∈ A} modulo sets of Pµ measure zero. The proof relies on the fact that invariant sets lie
both in the infinite future F_∞^∞ := ∩n F_n^∞ and in the infinite past F_{−∞}^{−∞} := ∩n F_{−∞}^{−n}, and that the
past and the future of a Markov process are independent conditioned on the present. Thus
for E ∈ I,
Pµ[E|F_0^0] = Pµ[E ∩ E|F_0^0] = Pµ[E|F_0^0]².
Therefore Pµ[E|F_0^0] = 0 or 1 µ a.s. Let A ∈ S be the set on which Pµ[E|F_0^0] = 1 a.s. Then by
the invariance of E under the shift T, we have E = A^Z := {(xn)n∈Z ∈ S^Z : xn ∈ A ∀ n ∈ Z}
modulo sets of Pµ measure zero, while Ec = (Ac)^Z. In particular, for µ almost all x ∈ S,
if x ∈ A (resp. x ∈ Ac), then the Markov chain starting at x never leaves A (resp. Ac).
Therefore, E = {(xn)n∈Z ∈ S^Z : x0 ∈ A} modulo sets of Pµ measure zero, which proves the
lemma.
With Lemma 2.1, we can conclude the proof of Theorem 2.1. Suppose that Pµ is not
ergodic. Then Pµ is a mixture of the measures Pµ[·|I], which are ergodic measures on (S^Z, S^Z).
Since I ⊂ F_0^0 by Lemma 2.1, the measures Pµ[·|I] are almost surely mixtures of Pµ[·|F_0^0],
which are laws of the Markov chain with specified values at time 0. Hence Pµ[·|I] are stationary Markov
processes with marginal laws in M, and µ is a mixture of these marginal laws, which means
that µ is not extremal in M. The same reasoning also allows us to deduce (2.1) from the
ergodic decomposition of Pµ .
Remark. Note that extremal measures in M must be singular w.r.t. each other, since the
associated ergodic Markov processes are singular w.r.t. each other by Theorem 2.1.
Remark. A sufficient condition to guarantee the uniqueness of a stationary distribution (if
it exists) for a Markov chain is to have some form of irreducibility. If M is not a singleton,
then we can find two extremal invariant measures with disjoint supports U1 and U2 in the state
space, such that the Markov chain makes no transitions between U1 and U2 . Any irreducibility
condition that breaks such a partition of the state space will guarantee the existence of at
most one stationary distribution. One such condition is if Π(x, dy) has a positive density
p(x, y) w.r.t. a common reference measure α(dy) for all x in the state space.
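In the finite-state analogue, the positive-density condition says that the transition matrix has strictly positive entries, and Perron–Frobenius then makes M a singleton. A small numerical check (the matrix below is an arbitrary positive kernel of our own choosing):

```python
import numpy as np

# A strictly positive stochastic matrix (arbitrary example): eigenvalue 1 of
# the transpose is simple, so the stationary distribution is unique and M is
# a singleton, i.e., the chain is ergodic.
rng = np.random.default_rng(0)
P = rng.random((5, 5)) + 0.1              # strictly positive entries
P = P / P.sum(axis=1, keepdims=True)      # normalize rows to a transition kernel

w, v = np.linalg.eig(P.T)
is_one = np.isclose(w, 1.0)
assert is_one.sum() == 1                  # eigenvalue 1 is simple
pi = np.real(v[:, is_one.argmax()])
pi = pi / pi.sum()
assert np.allclose(pi @ P, pi)            # pi is stationary
assert (pi > 0).all()                     # and fully supported
```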
3 Harris chains
So far we have studied mostly countable state Markov chains, although the ergodic decomposition of stationary Markov chains was developed for a general Polish space. We now discuss
briefly the theory of general state space Markov chains. One class of Markov chains that
admit a similar treatment as the countable state space case is the so-called Harris chains.
Definition 3.1 (Harris Chains) A Markov chain (Xn)n≥0 with state space (S, S) and transition kernel Π(·, ·) is called a Harris chain if there exist A, B ∈ S, ε > 0, and a probability
measure ρ with ρ(B) = 1 such that:
(i) If τA := inf{n ≥ 0 : Xn ∈ A}, then Pz (τA < ∞) > 0 for all z ∈ S.
(ii) If x ∈ A, then Π(x, C) ≥ ε ρ(C) for all C ∈ S with C ⊂ B.
The conditions of a Harris chain allow us to construct an equivalent Markov chain X̄ with
state space S̄ := S ∪ {α} and σ-algebra S̄ := {B, B ∪ {α} : B ∈ S}, where α is an artificial
atom that the chain X̄ will visit. More precisely, define X̄ with transition probability kernels
Π̄, such that
If x ∈ S\A:   Π̄(x, C) = Π(x, C) for C ∈ S.
If x ∈ A:   Π̄(x, {α}) = ε, and Π̄(x, C) = Π(x, C) − ε ρ(C) for C ∈ S.
If x = α:   Π̄(α, D) = ∫ ρ(dx) Π̄(x, D) for D ∈ S̄.
X̄n being in the state α corresponds to Xn being distributed as ρ on B. This correspondence
allows us to go from the distribution of X to X̄ and vice versa. Having a macroscopic atom
α allows us to define transience, recurrence, periodicity, and use the cycle trick to construct
stationary measures for recurrent Harris chains, and use coupling to prove convergence of
positive recurrent Harris chains to their unique stationary distributions.
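The construction of X̄ can be made concrete on a simple example. The following sketch simulates the split chain for the kernel X_{n+1} = X_n/2 + U_n with U_n ~ Uniform(−1, 1); the choices of A, B, ε, ρ below are our own, made for illustration. For x ∈ A = B = [−1, 1], the one-step density is 1/2 on [x/2 − 1, x/2 + 1], an interval that always contains [−1/2, 1/2], so the minorization Π(x, ·) ≥ ε ρ(·) holds with ε = 1/2 and ρ = Uniform(−1/2, 1/2).

```python
import random

EPS = 0.5   # minorization constant for this example

def step_split(x, rng):
    """One step of the split chain X-bar; the string 'alpha' plays the atom."""
    if x == "alpha":                    # Pi-bar(alpha, .) = int rho(dx) Pi-bar(x, .)
        x = rng.uniform(-0.5, 0.5)      # draw x ~ rho; note rho is supported in A
    if -1.0 <= x <= 1.0:                # x in A: the minorization applies
        if rng.random() < EPS:
            return "alpha"              # Pi-bar(x, {alpha}) = eps
        while True:                     # residual kernel (Pi - eps*rho)/(1 - eps):
            y = x / 2.0 + rng.uniform(-1.0, 1.0)
            if not (-0.5 <= y <= 0.5):  # uniform on [x/2-1, x/2+1] \ [-1/2, 1/2]
                return y
    return x / 2.0 + rng.uniform(-1.0, 1.0)   # x outside A: original kernel Pi

# The atom alpha is hit over and over: this is a recurrent Harris chain.
rng = random.Random(1)
state, visits = "alpha", 0
for _ in range(5000):
    state = step_split(state, rng)
    if state == "alpha":
        visits += 1
print("atom visits in 5000 steps:", visits)
```

Here the residual kernel happens to be uniform on [x/2 − 1, x/2 + 1] \ [−1/2, 1/2], which the rejection loop samples exactly.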
Definition 3.2 (Recurrence, transience, and periodicity) Let τα := inf{n ≥ 1 : X̄n =
α}. X is called a recurrent Harris chain if Pα (τα < ∞) = 1, and transient otherwise. The
gcd d of D := {n ≥ 1 : Pα(X̄n = α) > 0} is called the period of the Harris chain, with d = 1
corresponding to aperiodicity.
Note that Definition 3.1 (i) guarantees that Px (τα < ∞) > 0 for all x ∈ S̄, which is a form
of irreducibility for the chain X̄. The theory we developed for countable state Markov chains
can be adapted to Harris chains. See e.g. [1] for more details.
Theorem 3.1 (Stationary measures) If X is a recurrent Harris chain, then there exists
a unique (modulo constant multiple) stationary measure. If X is furthermore aperiodic with
stationary distribution π, then for any x ∈ S with Px(τα < ∞) = 1, we have ‖Π^n(x, ·) − π(·)‖ →
0, where ‖ · ‖ denotes the total variation norm of a signed measure.
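For a finite chain with strictly positive transition matrix (a stand-in, of our own choosing, for an aperiodic recurrent Harris chain), the total variation convergence in Theorem 3.1 can be observed directly:

```python
import numpy as np

# Stand-in example (our own): a 3-state aperiodic chain with positive entries.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()                        # stationary distribution

def tv(n, x=0):
    """Total variation distance ||Pi^n(x, .) - pi|| = (1/2) sum_y |Pi^n(x, y) - pi(y)|."""
    row = np.linalg.matrix_power(P, n)[x]
    return 0.5 * np.abs(row - pi).sum()

assert tv(1) > tv(5) > tv(50)   # the distance decays with n (TV is contracted by P)
assert tv(50) < 1e-8            # Pi^n(x, .) -> pi in total variation
```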
We next give some sufficient conditions for a Harris chain to be positive recurrent, i.e.,
Eα[τα] < ∞, which are based on the existence of certain Lyapunov functions.
Theorem 3.2 (Sufficient conditions for positive recurrence) Let X be a Harris chain
satisfying the conditions in Definition 3.1, where we further assume that A = B. Assume that
there exists a function g : S → [0, ∞) with supx∈A Ex [g(X1 )] < ∞, such that
(i) either g : S → [1, ∞) and there exists r ∈ (0, 1) s.t. Ex [g(X1 )] ≤ rg(x) for all x ∈ Ac ,
(ii) or Ex[g(X1)] ≤ g(x) − ε for all x ∈ Ac,
then Eα [τα ] < ∞ and X is a positive recurrent Harris chain.
Proof. Since every time the Markov chain X̄ enters the set A = B, there is probability ε of
entering the state α in the next step, to show Eα[τα] < ∞, it suffices to show that
sup_{x∈A} Ex[τA] < ∞,   where τA := min{n ≥ 1 : Xn ∈ A}.   (3.1)
Note that condition (i) implies that g(Xn∧τA) r^{−(n∧τA)} is a supermartingale. Therefore, since g ≥ 1,
g(x) ≥ Ex[g(Xn∧τA) r^{−(n∧τA)}] ≥ Ex[r^{−(n∧τA)}].
Letting n → ∞ then gives
Ex[r^{−τA}] ≤ g(x)   ∀ x ∈ Ac.   (3.2)
By the Markov inequality, this further implies that
Ex[τA] = ∑_{n=1}^∞ Px(τA ≥ n) ≤ ∑_{n=1}^∞ r^n g(x) < g(x)/(1 − r)   ∀ x ∈ Ac.   (3.3)
Similarly, condition (ii) implies that g(Xn∧τA) + ε(n ∧ τA) is a supermartingale. Therefore
g(x) ≥ Ex[g(Xn∧τA) + ε(n ∧ τA)] ≥ ε Ex[n ∧ τA].
Letting n → ∞ then gives
Ex[τA] ≤ g(x)/ε   ∀ x ∈ Ac.   (3.4)
Using (3.3) or (3.4), we note that for x ∈ A,
Ex[τA] = 1 + ∫_{Ac} Π(x, dy) Ey[τA] ≤ 1 + (1/c) ∫_{Ac} Π(x, dy) g(y) ≤ 1 + (1/c) Ex[g(X1)],
where c = 1 − r under assumption (i) and c = ε under assumption (ii). Taking sup_{x∈A} on
both sides then yields (3.1) by the assumption that sup_{x∈A} Ex[g(X1)] < ∞.
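The drift condition (ii) can be verified concretely for the kernel X_{n+1} = X_n/2 + U, U ~ Uniform(−1, 1), with A = [−1, 1] (an illustrative chain of our own, as are the choices of g and ε below). Take g(x) = |x| and write m = |x|/2: a direct computation gives Ex[g(X1)] = E|m + U| = (1 + m²)/2 for m ≤ 1 and = m for m > 1, so Ex[g(X1)] ≤ g(x) − 3/8 for all x ∈ Ac, i.e., condition (ii) holds with ε = 3/8, while sup_{x∈A} Ex[g(X1)] ≤ 5/8 < ∞.

```python
import random

# Illustrative Lyapunov check (our own example): X_{n+1} = X_n/2 + U,
# U ~ Uniform(-1, 1), A = [-1, 1], g(x) = |x|. With m = |x|/2,
#   E_x[g(X_1)] = E|m + U| = (1 + m^2)/2  if m <= 1,  and  m  if m > 1.
def exact_drift(x):
    m = abs(x) / 2.0
    return (1.0 + m * m) / 2.0 if m <= 1.0 else m

def mc_drift(x, n=200_000, seed=0):
    """Monte Carlo estimate of E_x[g(X_1)], as a sanity check of the formula."""
    rng = random.Random(seed)
    return sum(abs(x / 2.0 + rng.uniform(-1.0, 1.0)) for _ in range(n)) / n

for x in [1.0, 1.5, 2.0, 5.0, -3.0]:
    assert exact_drift(x) <= abs(x) - 3.0 / 8.0 + 1e-12  # condition (ii), eps = 3/8
    assert abs(mc_drift(x) - exact_drift(x)) < 0.01      # formula matches simulation
```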
References
[1] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, Belmont,
California, 1996.
[2] P. Lax, Functional Analysis, John Wiley & Sons, 2002.