SOME TOOLS FROM MEASURE THEORY
Abstract. Some more measure theory.
1. Dealing with sigma-algebras
1.1. Measurable functions. Let (Ω, F) be a measurable space. Given
a collection C of subsets of Ω, we let σ(C) denote the smallest
sigma-algebra that contains the sets in C; we also say that C generates
the sigma-algebra σ(C). Thus the open sets generate the Borel sets and
B = σ(G), where G is the collection of open sets and B is the set of
Borel sets. Recall that a function f : Ω → R is measurable if f^{-1}(B) =
{f^{-1}(B) : B ∈ B} ⊂ F; that is, f^{-1}(B) ∈ F for all B ∈ B. We claimed
that this condition is equivalent to checking the easier condition that
f^{-1}(C) ⊂ F, in the case that C is the set of intervals of the form
(−∞, x) for x ∈ R.
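To make the definition of σ(C) concrete, here is a small computational sketch (ours, not part of the original notes): when Ω is finite, countable unions reduce to finite unions, so σ(C) can be computed by brute force, closing C under complements and pairwise unions until the family stabilizes. The function name and the example collection are our own.

```python
from itertools import combinations

def generate_sigma_algebra(omega, collection):
    """Smallest sigma-algebra on the finite set omega containing
    every set in collection: close under complements and pairwise
    unions until nothing new appears (on a finite omega, countable
    unions reduce to finite unions)."""
    omega = frozenset(omega)
    sets = {frozenset(c) for c in collection} | {frozenset(), omega}
    changed = True
    while changed:
        changed = False
        for a in list(sets):
            if omega - a not in sets:       # close under complements
                sets.add(omega - a)
                changed = True
        for a, b in combinations(list(sets), 2):
            if a | b not in sets:           # close under unions
                sets.add(a | b)
                changed = True
    return sets

# C = {{1}, {2}} on omega = {1, 2, 3} has atoms {1}, {2}, {3},
# so sigma(C) is the full power set, with 2^3 = 8 members.
print(len(generate_sigma_algebra({1, 2, 3}, [{1}, {2}])))  # 8
```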
Lemma 1. Let C be the set of intervals of the form (−∞, x) for x ∈ R.
Then σ(C) = B.
Proof. First, we argue that every open interval is in σ(C). Let a < b
be real numbers. Clearly, (−∞, b) ∈ σ(C), and
(−∞, a] = ⋂_{n>0} (−∞, a + n^{-1}) ∈ σ(C).
Hence (−∞, a]^c = (a, ∞) ∈ σ(C), from which we can conclude that
(a, ∞) ∩ (−∞, b) = (a, b) ∈ σ(C). Second, we note that it is a theorem
that every open subset of R is a countable disjoint union of open
intervals, from which it follows that σ(C) contains every open set, and
thus B = σ(G) ⊂ σ(C). Finally, since each interval (−∞, x) is open,
we also have σ(C) ⊂ B, so that σ(C) = B.
Exercise 1.1. Let (Ω, F) be a measurable space. Let C be a collection
of subsets of R that generates B. Show that f : Ω → R is measurable if
and only if f^{-1}(C) ⊂ F.
We will also speak of random variables that are not real-valued, for
example random vectors or random sequences. In general, if (Ω, F)
is a measurable space and (S, S) is another measurable space, we say
that f : Ω → S is measurable if f^{-1}(S) ⊂ F.
Exercise 1.2. Consider the measurable space (N, 2^N); here 2^N is the
set of all subsets of N. Check that the set of all singletons {n},
n ∈ N, generates 2^N.
Exercise 1.3. Prove a version of Exercise 1.1 for the case of general
measurable functions.
Let (Ω, F, P) be a probability space. We say that a sigma-algebra
T ⊂ F is trivial if for every A ∈ T , we have P(A) ∈ {0, 1}.
Exercise 1.4. Let X be a real-valued random variable. Show that if
σ(X) is trivial, then there exists a constant c ∈ R such that P(X = c) =
1.
Exercise 1.5. Let (Ω, F, P) be a probability space, and let (S, S) be a
measurable space.
(a) Show that if S = {∅, S}, then every function X : Ω → S is a
random variable, and σ(X) is trivial.
(b) Show that if σ(X) is trivial, then there does not exist a partition
of S given by S1 ∪ S2 = S with S1, S2 ∈ S such that X takes values
in both S1 and S2.
(c) Assume that S contains all the singletons of S; that is, all sets of
the form {s}, with s ∈ S. Show that if X is discrete and σ(X) is
trivial, then there exists c ∈ S such that P(X = c) = 1.
(d) What happens if we do not know a priori that X is discrete?
1.2. Measures. Recall that in elementary probability courses, we said
that two random variables X and Y are independent if
P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y)
for all x, y ∈ R. This condition is equivalent to the condition that
P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B)
for all A, B ∈ B. One way to justify this fact is via π-systems. Let Ω
be a set. A collection I of subsets of Ω is a π-system if it is closed
under finite intersections; that is, if A, B ∈ I, then A ∩ B ∈ I. For
example, the intervals (−∞, x], x ∈ R, form a π-system, since
(−∞, x] ∩ (−∞, y] = (−∞, min{x, y}].
Theorem 2 (Uniqueness via π-systems). Let (Ω, F) be a measurable
space and let µ and ν be finite measures on (Ω, F) with µ(Ω) = ν(Ω).
If µ and ν agree on a π-system that generates F, then µ and ν are
equal on all of F.
Exercise 1.6. We claimed that there exists a nice ‘Borel’ measure λ
on (R, B), with properties such as λ((a, b)) = b − a and translation-invariance.
Show that there is only one such Borel measure.
Let (Ω, F, P) be a probability space. Let G and H be sub-sigma-algebras of F; that is, G, H ⊂ F. We say that G and H are independent if
P(A ∩ B) = P(A)P(B)
for all A ∈ G and all B ∈ H. Recall that X^{-1}(B) = {X^{-1}(B) : B ∈ B}
is a sigma-algebra. Write σ(X) = X^{-1}(B). We say that X and Y are
independent if σ(X) is independent of σ(Y ).
Exercise 1.7. Check that this definition agrees with the usual definition.
Exercise 1.8. Let f be a Borel measurable function. Check that Y =
f(X) is measurable with respect to σ(X); that is, Y^{-1}(B) ∈ σ(X) for
all B ∈ B.
Exercise 1.9. Check that if X and Y are independent, then f(X) is
independent of g(Y), for Borel measurable functions f and g.
Exercise 1.10. Let f : [0, 1] → R be a continuous function. Let Ui
be a sequence of i.i.d. random variables uniformly distributed in [0, 1].
Show that
(1/n) ∑_{i=1}^{n} f(U_i) → ∫_0^1 f(x) dx.
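Numerically, Exercise 1.10 is Monte Carlo integration. Here is a quick sketch (our own illustration, not part of the notes), using f = cos, whose integral over [0, 1] is sin(1) ≈ 0.8415.

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo(f, n):
    """Average f over n i.i.d. Uniform[0,1] samples; by the strong
    law of large numbers this converges to the integral of f."""
    return f(rng.uniform(0.0, 1.0, size=n)).mean()

for n in (10**2, 10**4, 10**6):
    print(n, monte_carlo(np.cos, n))  # approaches sin(1) = 0.8414...
```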
Exercise 1.11. Let X be a Bernoulli random variable. Let Y ∈ σ(X)
be a real-valued random variable; that is, let Y be measurable with
respect to σ(X). Show that Y is a discrete random variable with at
most two distinct values.
2. Approximating and constructing measures
2.1. Caratheodory’s extension theorem. Let Ω be a set. An algebra A on Ω is a collection of subsets of Ω that contains Ω and is
closed under complements and finite unions; that is, Ω ∈ A; if A ∈ A,
then A^c ∈ A; and if A, B ∈ A, then A ∪ B ∈ A.
Theorem 3 (Caratheodory’s extension theorem). Let Ω be a set, and
let A be an algebra on Ω. If µ̃ is a measure on A, then there exists a
measure µ on the measurable space (Ω, σ(A)) such that µ = µ̃ on A.
The proof of Theorem 3 involves defining a set function on all subsets
of Ω using µ̃: set
µ*(E) = inf ∑_{i=1}^{∞} µ̃(A_i),
where the infimum is taken over all sequences (A_i) for which E ⊂
⋃_i A_i and A_i ∈ A. The set function µ* is called an outer measure.
We have that µ∗ = µ̃ on A, but the outer measure is only countably
subadditive: even if the E_i are disjoint sets, we only have
µ*(⋃_i E_i) ≤ ∑_i µ*(E_i);
however, restricting µ* to σ(A) results in a measure µ. Using outer
measure to construct measures has the benefit that measurable sets
can be approximated by more basic sets.
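As a numerical illustration of the infimum over covers (our own sketch, not part of the notes, using Ω = [0, 1) and the interval algebra discussed next): covering a set E by the dyadic intervals [k/2^m, (k+1)/2^m) that meet it gives one admissible value of ∑ µ̃(A_i), and refining the grid drives this value down toward µ*(E).

```python
def dyadic_cover_length(intervals, m):
    """Total length of the dyadic intervals [k/2^m, (k+1)/2^m)
    that meet E, where E is a finite union of half-open intervals
    given as (a, b) pairs; this is one admissible cover in the
    infimum defining the outer measure mu*(E)."""
    n = 2 ** m
    return sum(
        any(a < (k + 1) / n and k / n < b for a, b in intervals)
        for k in range(n)
    ) / n

E = [(0.2, 0.5), (0.7, 0.8)]  # mu*(E) = 0.4
for m in (4, 8, 12):
    print(m, dyadic_cover_length(E, m))  # 0.4375, 0.4023..., -> 0.4
```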
In the case of Borel measure on [0, 1), we can take A to be the set of all
finite unions of intervals of the form [a, b), and define µ̃([a, b)) = b − a.
Corollary 4. Let ε > 0. If E ∈ σ(A), then there exists A ∈ A such
that µ(A △ E) < ε. Here A △ E = (A \ E) ∪ (E \ A).
Exercise 2.1. In Theorem 3, show that if µ̃(Ω) < ∞, then the extension µ is unique.
Exercise 2.2. Let f : [0, 1] → R be a measurable function. Show that
there exists a sequence of step functions fn such that fn → f almost
surely with respect to Borel measure. Here a step function is any finite
linear combination of indicator functions of intervals.
Exercise 2.3. Check that the set A of all finite unions of intervals of
the form [a, b), for 0 ≤ a < b ≤ 1, together with the empty set, is an
algebra on [0, 1). Show that A is not a sigma-algebra.
Solution. Notice that if A = [a, b) ∪ [c, d), where a < b < c < d, then
Ac = [0, a) ∪ [b, c) ∪ [d, 1).
Thus it is easy to see that A is closed under complements, as well as
finite unions. Notice also that A does not even contain an interval of
the form (a, b), even though (a, b) = ⋃_{n>0} [a + n^{-1}, b) is a countable
union of members of A; hence A is not closed under countable unions.
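The closure under complements in this solution is easy to mechanize. The following sketch (ours, not from the notes) represents a member of A as a list of disjoint [a, b) pairs, and computes µ̃ and complements within [0, 1).

```python
def measure(intervals):
    """mu-tilde of a finite disjoint union of [a, b) intervals:
    the sum of their lengths."""
    return sum(b - a for a, b in intervals)

def complement(intervals):
    """Complement within [0, 1) of a finite disjoint union of
    [a, b) intervals; the result is again such a union, so A is
    closed under complements."""
    result, prev = [], 0.0
    for a, b in sorted(intervals):
        if prev < a:
            result.append((prev, a))
        prev = b
    if prev < 1.0:
        result.append((prev, 1.0))
    return result

A = [(0.1, 0.3), (0.5, 0.8)]
print(complement(A))  # [(0.0, 0.1), (0.3, 0.5), (0.8, 1.0)]
print(measure(A) + measure(complement(A)))  # 1.0 (up to float rounding)
```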
A semialgebra A0 is a collection of subsets of Ω that is closed under
finite intersections and has the property that for any A ∈ A0, the
complement A^c is a finite disjoint union of members of A0; note that
A^c itself does not have to be in A0. The algebra generated by A0 is
the collection A containing the empty set and all finite disjoint unions
of sets in A0.
2.2. Product spaces and Kolmogorov’s extension theorem. Let
(Ω1, F1) and (Ω2, F2) be measurable spaces. We define the product
space to be (Ω1 × Ω2, F1 ⊗ F2), where F1 ⊗ F2 is defined to be the
sigma-algebra generated by all sets of the form A × B with A ∈ F1 and
B ∈ F2. Often we will abuse notation and write F1 × F2 both for this
collection of product sets and for the sigma-algebra σ(F1 × F2) = F1 ⊗ F2
that it generates.
Theorem 5 (Product measures). Let (Ω1, F1, µ1) and (Ω2, F2, µ2) be
finite measure spaces. There exists a unique measure µ on the product
space (Ω1 × Ω2, F1 ⊗ F2) such that µ(A × B) = µ1(A)µ2(B) for all
A ∈ F1 and B ∈ F2.
The proof of Theorem 5 is harder than you might guess, but it is a
corollary of Theorem 3.
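For discrete spaces, the product measure of Theorem 5 is just an outer product; the following sketch (our own, with hypothetical probability vectors) checks the rectangle formula numerically.

```python
import numpy as np

# Hypothetical discrete example on Omega_1 = Omega_2 = {0, 1, 2}:
# the product of two probability vectors is their outer product.
mu1 = np.array([0.2, 0.3, 0.5])
mu2 = np.array([0.6, 0.1, 0.3])
mu = np.outer(mu1, mu2)              # mu({i} x {j}) = mu1({i}) mu2({j})

A, B = [0, 2], [1, 2]                # A x B is a measurable rectangle
print(mu[np.ix_(A, B)].sum())        # mu(A x B), approx. 0.28
print(mu1[A].sum() * mu2[B].sum())   # mu1(A) mu2(B), the same number
```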
In the case of infinite product spaces, we consider the set Ω =
∏_{i∈Z+} Ω_i, which is the set of all sequences ω such that ω(i) ∈ Ω_i for
all i ∈ Z+, together with the sigma-algebra generated by all the finite-dimensional
sets, which are sets given by finite intersections of sets of the form
{ω ∈ Ω : ω(i) ∈ F_i}, where F_i ∈ F_i; such sets are sometimes also
called cylinder sets.
Note that by de Morgan’s laws, the cylinder sets form a semialgebra;
for example,
({ω ∈ Ω : ω(1) ∈ F1} ∩ {ω ∈ Ω : ω(2) ∈ F2})^c =
{ω ∈ Ω : ω(1) ∈ F1^c} ∪ {ω ∈ Ω : ω(2) ∈ F2^c},
and the right-hand side can in turn be written as a finite disjoint
union of cylinder sets.
Sometimes even members of the algebra generated by the cylinder sets
are called cylinder sets.
Exercise 2.4. Use Theorem 3 to construct a probability space for an
infinite sequence of i.i.d. fair coin flips.
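A simulation sketch of Exercise 2.4 (ours, not part of the notes): although we can only ever sample finitely many coordinates of a point of {0, 1}^N, a cylinder set depends on finitely many coordinates, so its probability can be estimated directly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample the first k coordinates of n points of {0, 1}^N; the
# cylinder set {omega : omega(0) = 1, omega(1) = 0} has
# probability (1/2) * (1/2) = 1/4 under the fair-coin measure.
n, k = 100_000, 8
flips = rng.integers(0, 2, size=(n, k))
in_cylinder = (flips[:, 0] == 1) & (flips[:, 1] == 0)
print(in_cylinder.mean())  # close to 0.25
```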
Exercise 2.5 (Ergodicity). Let X = (Xi)_{i∈Z} be i.i.d. Bernoulli random
variables. Let Q = P(X ∈ ·) be the law of X, so that Q is a probability
measure on the space of bi-infinite sequences Ω = {0, 1}^Z endowed with
the product sigma-algebra F. Define the left-shift T via (Tω)_i = ω_{i+1}.
An event A ∈ F is said to be translation-invariant if Q(A △ T^{-1}(A)) = 0.
(a) Argue that Q is translation-invariant; that is, Q ◦ T^{-1} = Q.
(b) Show that for any two cylinder sets C1 and C2, there exists a finite
N > 0 such that Q(C1 ∩ T^{-N} C2) = Q(C1)Q(C2).
(c) Show that every translation-invariant event is trivial, in the sense
that it has probability zero or one. Hint: Show that Q(A) =
Q(A)^2. Approximate A by a finite disjoint union of cylinder sets.
A sequence of probability measures µn is consistent if
µ_{n+1}(F_1 × · · · × F_n × Ω_{n+1}) = µ_n(F_1 × · · · × F_n),
for all F_i ∈ F_i.
Theorem 6 (Kolmogorov extension theorem). If (µn) is a sequence of
consistent probability measures on (R^n, B^n), then there exists a unique
probability measure µ on the infinite product space such that µ agrees
with the µn on the cylinder sets.
Remark 1. Theorem 6 also holds in the case of an infinite product of
a general measurable space (Ω, F), provided that it is a standard Borel
space; that is, there exists a bijection φ from Ω onto a Borel subset of
R such that both φ and φ^{-1} are measurable.
Exercise 2.6. Use Theorem 6 to construct a Markov chain X with a
transition matrix P on a countable state space S, started from an
initial probability measure µ0.
Solution. Without loss of generality, we may assume that S = N ⊂ R.
Thus we can start with the probability measure µ0 defined on (R, B).
We can define µ1 on (R^2, B^2) by
µ1(a_0, a_1) = µ0(a_0) p_{a_0,a_1}.
It follows from the fact that P is a transition matrix that µ1 is also
a probability measure supported on S × S, and µ1({a_0} × R) = µ0(a_0).
Similarly, we can define µn on (R^{n+1}, B^{n+1}) via
µn(a_0, a_1, . . . , a_n) = µ0(a_0) p_{a_0,a_1} · · · p_{a_{n−1},a_n},
and we obtain a sequence of consistent probability measures. From
the Kolmogorov extension theorem, there exists a unique probability
measure P on (R^N, B^N) that agrees with the µn on the finite-dimensional
sets. Let X be the random variable on the probability space (R^N, B^N, P)
defined by X(ω) = ω for all ω ∈ R^N. Set Xi(ω) = X(ω)i = ωi. If you
are not convinced we are done, then consider the following calculation.
Let a, b ∈ S. We have that
P(Xn = a) = µn(R^n × {a}) = ∑_{(a_0,...,a_n) ∈ S^{n+1} : a_n = a} µ0(a_0) p_{a_0,a_1} · · · p_{a_{n−1},a_n}.
Similarly, we have
P(X_{n+1} = b, Xn = a) = µ_{n+1}(R^n × {a} × {b})
= ∑_{(a_0,...,a_{n+1}) ∈ S^{n+2} : a_n = a, a_{n+1} = b} µ0(a_0) p_{a_0,a_1} · · · p_{a_{n−1},a_n} p_{a_n,a_{n+1}}
= p_{a,b} ∑_{(a_0,...,a_n) ∈ S^{n+1} : a_n = a} µ0(a_0) p_{a_0,a_1} · · · p_{a_{n−1},a_n}
= p_{a,b} P(Xn = a).
Hence we obtain that
P(X_{n+1} = b | Xn = a) = P(X_{n+1} = b, Xn = a) / P(Xn = a) = p_{a,b},
as required.
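The construction above is easy to test by simulation. Here is a sketch (our own, with a hypothetical two-state chain) that checks P(X_{n+1} = b | Xn = a) = p_{a,b} empirically.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical chain on S = {0, 1} with initial law mu0.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mu0 = np.array([0.5, 0.5])

def sample_path(n):
    """Sample (X_0, ..., X_n), mirroring the finite-dimensional
    measures mu_n of the construction above."""
    x = [rng.choice(2, p=mu0)]
    for _ in range(n):
        x.append(rng.choice(2, p=P[x[-1]]))
    return x

# Estimate P(X_6 = 1 | X_5 = 0); it should be close to p_{0,1} = 0.1.
n, trials = 5, 50_000
paths = [sample_path(n + 1) for _ in range(trials)]
at_a = [path for path in paths if path[n] == 0]
print(sum(path[n + 1] == 1 for path in at_a) / len(at_a))  # ~ 0.1
```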
Exercise 2.7. Let X = (X0 , X1 , . . .) be a Markov chain with transition
matrix P , started at a stationary distribution. Extend X to include all
negative integer times.
Exercise 2.8. Define a stationary Markov chain Y = (. . . , Y−1, Y0, Y1, . . .)
such that there exists a translation-invariant event A that is not trivial;
see Exercise 2.5.
3. Stopping times and the strong Markov property, again
Let X be a Markov chain taking values in a countable state space S,
defined on a probability space (Ω, F, P). Set Fn = σ(X0, . . . , Xn). Let
T be a stopping time. You will soon be able to prove that {T = n} ∈
Fn; in fact, this is the usual definition. We set FT to be the sigma-algebra
of events F ∈ F such that F ∩ {T = n} ∈ Fn for all n ≥ 0.
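For intuition, a hitting time is the canonical example of a stopping time: whether T = n is determined by (X0, . . . , Xn) alone. A small sketch (ours, not from the notes):

```python
def hitting_time(path, s):
    """First hitting time T of state s along a (finite prefix of a)
    trajectory. Whether T = n can be decided from path[0], ...,
    path[n] alone, which is exactly the property {T = n} in F_n."""
    for n, x in enumerate(path):
        if x == s:
            return n
    return None  # s not yet hit: T exceeds the observed horizon

print(hitting_time([2, 0, 1, 0, 1], 1))  # 2: determined by X_0, X_1, X_2
```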
Lemma 7. The stopping time T is measurable with respect to FT; that
is, T^{-1}(B) ∈ FT for all B ⊂ N.
Proof. If n ∈ B, then T^{-1}(B) ∩ {T = n} = {T = n} ∈ Fn; otherwise,
T^{-1}(B) ∩ {T = n} = ∅ ∈ Fn.
Lemma 8. Let X be a Markov chain taking values in a countable state
space S. Let T be a stopping time. Add to S a symbol △ to obtain S△.
Set
Z = (X0, X1, . . . , XT, △, △, . . .),
so that Z takes values in S△^N. Then Z is measurable with respect to FT.
Proof. Let n ∈ N. It suffices to check that Z^{-1}(B) ∩ {T = n} ∈ Fn
for all sets B of the form B = {(b0, b1, . . . , bk, △, △, . . .)}, where bi ∈ S.
Note that by definition {T = n} ∈ Fn. If k ≠ n, then the intersection
is empty, and thus in Fn; otherwise, k = n, and Z^{-1}(B) ∩ {T = n} =
{X0 = b0, . . . , Xn = bn} ∩ {T = n} ∈ Fn.
Exercise 3.1. Let X be a Markov chain taking values in a countable
state space S, defined on a probability space (Ω, F, P). Let T1 and T2
be stopping times.
(a) Check that F_{T1} is indeed a sigma-algebra.
(b) Check that min{T1, T2} is a stopping time.
(c) Check that T1 + T2 is also a stopping time.
Theorem 9. Let X be a Markov chain taking values in a state space S
with a transition matrix P . Let T be a stopping time, with P(T < ∞) =
1. Let s ∈ S. Conditional on XT = s, we have that Y = (X_{T+k})_{k=0}^{∞} is a
Markov chain started at s with transition matrix P that is independent
of Z = (X_k)_{k=0}^{T}.
Proof. Note that it suffices to check that conditional on {XT = s}, we
have that Y is a Markov chain started at s that is independent of FT.
Let C ∈ FT. We should check that for all measurable A ⊂ S^N, we have
that
P(Y ∈ A, C | XT = s) = P(X ∈ A | X0 = s)P(C | XT = s).    (1)
Hopefully, you will believe (see Exercise 3.2) that we have enough measure
theory to justify that we only need to consider cylinder sets A of
the form
A = {a ∈ S^N : a0 = z0, . . . , ak = zk}.
Note that C is given by the disjoint union of Bn = C ∩ {T = n}, where
Bn ∈ Fn . Let z0 , . . . , zk ∈ S. Check using the Markov property that
P(XT = z0, . . . , XT+k = zk, Bn, T = n, XT = s) =
P(X0 = z0, . . . , Xk = zk | X0 = s)P(Bn, T = n, XT = s).
Summing over all n ≥ 0, we have
P(XT = z0, . . . , XT+k = zk, C, XT = s) =
P(X0 = z0, . . . , Xk = zk | X0 = s)P(C, XT = s),
and dividing by P(XT = s), we obtain the required result.
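A simulation sketch of Theorem 9 (ours, with a hypothetical two-state chain): conditional on XT = s, the next step X_{T+1} should be distributed like one step of a fresh chain started at s.

```python
import numpy as np

rng = np.random.default_rng(3)

P = np.array([[0.5, 0.5],
              [0.2, 0.8]])

def step(x):
    """One transition of the chain from state x."""
    return rng.choice(2, p=P[x])

# T = first hitting time of s = 1 (finite almost surely here).
trials, ones = 50_000, 0
for _ in range(trials):
    x = 0
    while x != 1:      # run until X_T = s = 1
        x = step(x)
    ones += step(x)    # X_{T+1}
print(ones / trials)   # close to p_{1,1} = 0.8
```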
Exercise 3.2. Check that it really is enough to just check cylinder sets
in the proof of Theorem 9.
Solution. This is a consequence of the π-system lemma. Fix C ∈ FT
and s ∈ S. The random variables X and Y take values in (S^N, F),
where F is the product sigma-algebra generated by the cylinder sets.
Consider the finite measures µ and ν defined on (S^N, F) via
µ(A) = P(Y ∈ A, C | XT = s)
and
ν(A) = P(X ∈ A | X0 = s)P(C | XT = s);
these are just the left- and right-hand sides of (1). We checked in the
proof of Theorem 9 that
µ(A) = ν(A)
for all cylinder sets A. Note that the cylinder sets are a π-system that
generates F. Hence Theorem 2 gives that µ = ν on all of F.
4. Stationary stochastic processes
Let (Ω, F, P) be a probability space. Let X = (Xi)_{i∈Z} be a bi-infinite
sequence of random variables taking values in R. Let T : R^Z → R^Z be
given by (Tx)_i = x_{i+1} for all i ∈ Z; thus (TX)_i = X_{i+1} for all i ∈ Z.
We say that X is stationary if T X =^d X, that is, if T X has the same
distribution as X; equivalently,
P(X ∈ A) = P(T X ∈ A)
for all A ∈ B^Z. We can also use the same definition in the case that
X = (Xi)_{i∈N} is a unilateral sequence of random variables.
Exercise 4.1. Check that X is stationary if and only if for all ` ∈ Z
and any finite collection of n1 , . . . , nk ∈ Z and Borel sets B1 , . . . , Bk ∈
B, we have
P(Xn1 ∈ B1 , . . . , Xnk ∈ Bk ) = P(Xn1 +` ∈ B1 , . . . , Xnk +` ∈ Bk ).
Exercise 4.2. Show that a Markov chain started at a stationary
distribution is indeed a stationary process.
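A numerical sanity check of Exercise 4.2 (our own sketch with a hypothetical chain): started from its stationary distribution π, the joint law of (X0, X1) should match that of (X1, X2).

```python
import numpy as np

rng = np.random.default_rng(4)

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])      # solves pi P = pi
assert np.allclose(pi @ P, pi)

def sample_path(n):
    """Sample (X_0, ..., X_n) started from pi."""
    x = [rng.choice(2, p=pi)]
    for _ in range(n):
        x.append(rng.choice(2, p=P[x[-1]]))
    return x

# Both probabilities should be near pi_0 * p_{0,1} = 0.08.
paths = np.array([sample_path(2) for _ in range(100_000)])
print(np.mean((paths[:, 0] == 0) & (paths[:, 1] == 1)))  # P(X0=0, X1=1)
print(np.mean((paths[:, 1] == 0) & (paths[:, 2] == 1)))  # P(X1=0, X2=1)
```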
Exercise 4.3. Let X be a stationary real-valued stochastic process on
(Ω, F, P). Show that if we set µ to be the law of X, so that µ(A) =
P(X ∈ A) for all A ∈ B^Z, then (R^Z, B^Z, µ, T) is a measure-preserving
system.
Exercise 4.4. Let (R^Z, B^Z, µ, T) be a probability measure-preserving
system, where (Tx)_i = x_{i+1}. Show that if we set Xi(x) = xi for all
i ∈ Z and x ∈ R^Z, then X = (Xi)_{i∈Z} is a stationary process.
Exercise 4.5. Let X be an irreducible Markov chain on a finite state
space, so that it has a unique stationary distribution. Use the Poincaré
recurrence theorem to show that if X is started at the stationary
distribution, then it will visit every state almost surely.