SOME TOOLS FROM MEASURE THEORY

Abstract. Some more measure theory.

1. Dealing with sigma-algebras

1.1. Measurable functions. Let (Ω, F) be a measurable space. Given a collection C of subsets of Ω, we let σ(C) denote the smallest sigma-algebra that contains the sets of C; we also say that C generates the sigma-algebra σ(C). Thus the open sets generate the Borel sets, and B = σ(G), where G is the collection of open sets and B is the set of Borel sets.

Recall that a function f : Ω → R is measurable if f⁻¹(B) = {f⁻¹(B) : B ∈ B} ⊂ F; that is, f⁻¹(B) ∈ F for all B ∈ B. We claimed that this condition is equivalent to checking the easier condition that f⁻¹(C) ⊂ F, in the case that C is the set of intervals of the form (−∞, x) for x ∈ R.

Lemma 1. Let C be the set of intervals of the form (−∞, x) for x ∈ R. Then σ(C) = B.

Proof. First, we argue that every open interval is in σ(C). Let a < b be real numbers. Clearly, (−∞, b) ∈ σ(C), and

  (−∞, a] = ⋂_{n>0} (−∞, a + 1/n) ∈ σ(C).

Hence (−∞, a]ᶜ = (a, ∞) ∈ σ(C), from which we can conclude that (a, ∞) ∩ (−∞, b) = (a, b) ∈ σ(C). Second, we note that it is a theorem that every open subset of R is a countable disjoint union of open intervals, from which it follows that σ(C) contains every open set. Finally, since B was defined to be the smallest sigma-algebra containing the open sets, we have that σ(C) = B. □

Exercise 1.1. Let (Ω, F) be a measurable space. Let C be a collection of subsets of R that generates B. Show that f : Ω → R is measurable if and only if f⁻¹(C) ⊂ F.

We will also speak of random variables that are not real-valued, for example random vectors or random sequences. In general, if (Ω, F) is a measurable space and (S, S) is another measurable space, we say that f : Ω → S is measurable if f⁻¹(S) ⊂ F.

Exercise 1.2. Consider the measurable space (N, 2^N); here 2^N is the set of all subsets of N. Check that the set of all singletons {n} such that n ∈ N generates 2^N.

Exercise 1.3.
Prove a version of Exercise 1.1 for the case of general measurable functions.

Let (Ω, F, P) be a probability space. We say that a sigma-algebra T ⊂ F is trivial if for every A ∈ T, we have P(A) ∈ {0, 1}.

Exercise 1.4. Let X be a real-valued random variable. Show that if σ(X) is trivial, then there exists a constant c ∈ R such that P(X = c) = 1.

Exercise 1.5. Let (Ω, F, P) be a probability space, and let (S, S) be a measurable space.
(a) Show that if S = {∅, S}, then every function X : Ω → S is a random variable, and σ(X) is trivial.
(b) Show that if σ(X) is trivial, then there does not exist a partition of S given by S₁ ∪ S₂ = S with S₁, S₂ ∈ S such that X takes values in both S₁ and S₂.
(c) Assume that S contains all the singletons of S; that is, all sets of the form {s}, with s ∈ S. Show that if X is discrete and σ(X) is trivial, then there exists c ∈ S such that P(X = c) = 1.
(d) What happens if we do not know a priori that X is discrete?

1.2. Measures. Recall that in elementary probability courses, we said that two random variables X and Y are independent if

  P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y) for all x, y ∈ R.

This condition is equivalent to the condition that

  P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B) for all A, B ∈ B.

One way to justify this fact is via π-systems. Let Ω be a set. A collection I of subsets of Ω is a π-system if it is closed under finite intersections; that is, if A, B ∈ I, then A ∩ B ∈ I.

Theorem 2 (Uniqueness via π-systems). Let (Ω, F) be a measurable space, and let µ and ν be finite measures on Ω with µ(Ω) = ν(Ω). If µ and ν agree on a π-system that generates F, then µ and ν are equal on all of F.

Exercise 1.6. We claimed that there exists a nice 'Borel' measure λ on (R, B), with properties such as λ(a, b) = b − a and translation-invariance. Show that such a measure is unique.

Let (Ω, F, P) be a probability space. Let G and H be sub-sigma-algebras of F; that is, G, H ⊂ F. We say that G and H are independent if P(A ∩ B) = P(A)P(B) for all A ∈ G and all B ∈ H.
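On a finite set, both the generated sigma-algebra σ(C) and a finite instance of Theorem 2 can be checked by brute force. The following Python sketch is illustrative only: the set Ω, the π-system, and the two measures are hypothetical choices, not taken from the text. It closes a collection under complements and finite unions, and verifies that two probability measures agreeing on a generating π-system agree on the whole generated sigma-algebra.

```python
def generate_sigma_algebra(omega, collection):
    """Smallest sigma-algebra on the finite set omega containing collection."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(c) for c in collection}
    while True:
        new = set(sets)
        for a in sets:
            new.add(omega - a)          # closed under complements
            for b in sets:
                new.add(a | b)          # closed under (finite) unions
        if new == sets:
            return sets
        sets = new

omega = {0, 1, 2, 3}
pi_system = [{0}, {0, 1}]               # closed under intersection
F = generate_sigma_algebra(omega, pi_system)
# The atoms of F are {0}, {1}, and {2, 3}, so F has 2**3 = 8 members.

def measure(weights, A):
    return sum(weights[x] for x in A)

# Two probability measures that agree on the pi-system (and have the same
# total mass): by Theorem 2 they must then agree on every set in F.
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
nu = {0: 0.1, 1: 0.2, 2: 0.35, 3: 0.35}
agree = all(abs(measure(mu, A) - measure(nu, A)) < 1e-12 for A in F)
```

Note that µ and ν differ on the singletons {2} and {3}, which lie outside σ(C); the theorem only forces agreement on the generated sigma-algebra.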
Recall that X⁻¹(B) = {X⁻¹(B) : B ∈ B} is a sigma-algebra. Write σ(X) = X⁻¹(B). We say that X and Y are independent if σ(X) is independent of σ(Y).

Exercise 1.7. Check that this definition agrees with the usual definition.

Exercise 1.8. Let f be a Borel measurable function. Check that Y = f(X) is measurable with respect to σ(X); that is, Y⁻¹(B) ∈ σ(X) for all B ∈ B.

Exercise 1.9. Check that if X and Y are independent, then f(X) is independent of g(Y), for Borel measurable functions f and g.

Exercise 1.10. Let f : [0, 1] → R be a continuous function. Let Uᵢ be a sequence of i.i.d. random variables uniformly distributed in [0, 1]. Show that

  (1/n) ∑_{i=1}^{n} f(Uᵢ) → ∫₀¹ f(x) dx.

Exercise 1.11. Let X be a Bernoulli random variable. Let Y be a real-valued random variable that is measurable with respect to σ(X). Show that Y is a discrete random variable with at most two distinct values.

2. Approximating and constructing measures

2.1. Caratheodory's extension theorem. Let Ω be a set. An algebra A on Ω is a collection of subsets of Ω that contains Ω and is closed under complements and finite unions; that is, Ω ∈ A; if A ∈ A, then Aᶜ ∈ A; and if A, B ∈ A, then A ∪ B ∈ A.

Theorem 3 (Caratheodory's extension theorem). Let Ω be a set, and let A be an algebra on Ω. If µ̃ is a measure on A, then there exists a measure µ on the measurable space (Ω, σ(A)) such that µ = µ̃ on A.

The proof of Theorem 3 involves defining a set function on all subsets of Ω using µ̃: set

  µ*(E) = inf ∑_{i=1}^{∞} µ̃(Aᵢ),

where the infimum is taken over all sequences (Aᵢ) for which E ⊂ ∪ᵢ Aᵢ and Aᵢ ∈ A. The set function µ* is called an outer measure. We have that µ* = µ̃ on A, but the outer measure is only countably subadditive: even if the Eᵢ are disjoint sets, we only have

  µ*(∪ᵢ Eᵢ) ≤ ∑ᵢ µ*(Eᵢ);

however, restricting µ* to σ(A) results in a measure µ. Using outer measure to construct measures has the benefit that measurable sets can be approximated by more basic sets.
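Exercise 1.10 is a law of large numbers in disguise, and it is easy to probe numerically. Here is a minimal Python sketch; the choice f(x) = x², whose integral over [0, 1] is 1/3, is an illustrative example and not from the text.

```python
import random

def mc_integral(f, n, seed=0):
    """Monte Carlo estimate of the integral of f over [0, 1]:
    average f over n i.i.d. Uniform[0, 1] samples."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

# f(x) = x**2 has integral 1/3 over [0, 1]; the estimates should
# approach 1/3 as n grows.
estimates = [mc_integral(lambda x: x * x, n) for n in (100, 10_000, 1_000_000)]
```

The fluctuation of the estimate around 1/3 is of order n^{-1/2}, which is the content of the central limit theorem rather than of the exercise itself.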
In the case of Borel measure on [0, 1), we can take A to be the set of all finite unions of intervals of the form [a, b), and define µ̃[a, b) = b − a.

Corollary 4. Let ε > 0. If E ∈ σ(A), then there exists an A ∈ A such that µ(A△E) < ε. Here A△E = (A \ E) ∪ (E \ A).

Exercise 2.1. In Theorem 3, show that if µ̃(Ω) < ∞, then the extension µ is unique.

Exercise 2.2. Let f : [0, 1] → R be a measurable function. Show that there exists a sequence of step functions fₙ such that fₙ → f almost surely with respect to Borel measure. Here a step function is any finite linear combination of indicator functions of intervals.

Exercise 2.3. Check that the set A of all finite unions of intervals of the form [a, b), for 0 ≤ a < b < 1, together with the empty set, is an algebra on [0, 1). Show that A is not a sigma-algebra.

Solution. Notice that if A = [a, b) ∪ [c, d), where a < b < c < d, then Aᶜ = [0, a) ∪ [b, c) ∪ [d, 1). Thus it is easy to see that A is closed under complements, as well as finite unions. Notice also that A does not even contain an interval of the form (a, b).

A semialgebra A₀ is a collection of sets that is closed under finite intersections and has the property that for any A ∈ A₀, the complement Aᶜ is a finite disjoint union of members of A₀; note that Aᶜ does not have to be in A₀. The algebra generated by A₀ is the collection A containing the empty set and all finite disjoint unions of sets in A₀.

2.2. Product spaces and Kolmogorov's extension theorem. Let (Ω₁, F₁) and (Ω₂, F₂) be measurable spaces. We define the product space to be (Ω₁ × Ω₂, F₁ ⊗ F₂), where F₁ ⊗ F₂ is defined to be the sigma-algebra generated by all sets of the form F₁ × F₂ with F₁ ∈ F₁ and F₂ ∈ F₂. Often we will abuse notation, and write F₁ × F₂ = σ(F₁ × F₂).

Theorem 5 (Product measures). Let (Ωᵢ, Fᵢ, µᵢ) be σ-finite measure spaces (for example, finite measure spaces, with µᵢ(Ωᵢ) < ∞). Then there exists a unique measure µ on the product space (Ω₁ × Ω₂, F₁ ⊗ F₂) such that µ(A × B) = µ₁(A)µ₂(B) for all A × B ∈ F₁ × F₂.
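Exercise 2.2 can be visualized for a continuous f, where left-endpoint step functions on dyadic intervals already converge uniformly (the general measurable case requires the approximation machinery above). A Python sketch, with math.sin on [0, 1) as an illustrative choice:

```python
import math

def step_approx(f, n):
    """Step-function approximation of f on [0, 1): constant on each dyadic
    interval [k/2**n, (k+1)/2**n), using the left-endpoint value of f."""
    vals = [f(k / 2 ** n) for k in range(2 ** n)]
    def fn(x):
        k = min(int(x * 2 ** n), 2 ** n - 1)
        return vals[k]
    return fn

f = math.sin
errors = []
for n in (2, 4, 8):
    g = step_approx(f, n)
    # sup-norm error sampled on a grid of 1000 points in [0, 1)
    errors.append(max(abs(f(x / 1000) - g(x / 1000)) for x in range(1000)))
# errors should decrease as the dyadic mesh is refined
```

For Lipschitz f the sampled error is bounded by the mesh width 2^{-n} times the Lipschitz constant, so refining the mesh drives it to zero.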
The proof of Theorem 5 is harder than you might guess, but it is a corollary of Theorem 3.

In the case of infinite product spaces, we consider the set Ω = ∏_{i∈Z⁺} Ωᵢ, which is the set of all sequences ω such that ω(i) ∈ Ωᵢ for all i ∈ Z⁺, and the sigma-algebra generated by all the finite dimensional sets, which are sets given by finite intersections of sets of the form {ω ∈ Ω : ω(i) ∈ Fᵢ}, with Fᵢ ∈ Fᵢ; such sets are sometimes also called cylinder sets. Note that by de Morgan's laws, the cylinder sets form a semialgebra; for example,

  ({ω ∈ Ω : ω(1) ∈ F₁} ∩ {ω ∈ Ω : ω(2) ∈ F₂})ᶜ = {ω ∈ Ω : ω(1) ∈ F₁ᶜ} ∪ {ω ∈ Ω : ω(2) ∈ F₂ᶜ}.

Sometimes even members of the algebra generated by the cylinder sets are called cylinder sets.

Exercise 2.4. Use Theorem 3 to construct a probability space for an infinite sequence of i.i.d. fair coin flips.

Exercise 2.5 (Ergodicity). Let X = (Xᵢ)_{i∈Z} be i.i.d. Bernoulli random variables. Let Q = P(X ∈ ·) be the law of X, so that Q is a probability measure on the space of bi-infinite sequences Ω = {0, 1}^Z endowed with the product sigma-algebra F. Define the left-shift T via (Tω)ᵢ = ω_{i+1}. An event A ∈ F is said to be translation-invariant if Q(A△T⁻¹(A)) = 0.
(a) Argue that Q is translation-invariant; that is, Q ∘ T⁻¹ = Q.
(b) Show that for any two cylinder sets C₁ and C₂, there exists a finite N > 0 such that Q(C₁ ∩ T⁻ᴺC₂) = Q(C₁)Q(C₂).
(c) Show that every translation-invariant event is trivial, in the sense that it has probability zero or one. Hint: Show that Q(A) = Q(A)². Approximate A by a finite disjoint union of cylinder sets.

A sequence of probability measures µₙ is consistent if

  µ_{n+1}(F₁ × ··· × Fₙ × Ω_{n+1}) = µₙ(F₁ × ··· × Fₙ), for all Fᵢ ∈ Fᵢ.

Theorem 6 (Kolmogorov extension theorem). If µₙ is a sequence of consistent probability measures on (Rⁿ, Bⁿ), then there exists a unique probability measure µ on the infinite product space such that µ agrees with the µₙ on the cylinder sets.

Remark 1.
Theorem 6 also holds in the case of an infinite product of a general measurable space (Ω, F), provided that it is a standard Borel space; that is, there exists a bijection φ from Ω onto a Borel subset of R such that both φ and φ⁻¹ are measurable.

Exercise 2.6. Use Theorem 6 to construct a Markov chain X with a transition matrix P on a countable state space S, started at the probability measure µ₀.

Solution. Without loss of generality, we may assume that S = N ⊂ R. Thus we can start with the probability measure µ₀ defined on (R, B). We can define µ₁ on (R², B²) by

  µ₁(a₀, a₁) = µ₀(a₀) p_{a₀,a₁}.

It follows from the fact that P is a transition matrix that µ₁ is also a probability measure, supported on S × S, and µ₁(a₀, R) = µ₀(a₀). Similarly, we can define µₙ on (Rⁿ⁺¹, Bⁿ⁺¹) via

  µₙ(a₀, a₁, . . . , aₙ) = µ₀(a₀) p_{a₀,a₁} ··· p_{a_{n−1},aₙ},

and we obtain a sequence of consistent probability measures. From the Kolmogorov extension theorem, there exists a unique probability measure P on (R^N, B^N) that agrees with the µₙ on the finite dimensional sets.

Let X be the random variable on the probability space (R^N, B^N, P) defined by X(ω) = ω for all ω ∈ R^N. Set Xᵢ(ω) = X(ω)ᵢ = ωᵢ. If you are not convinced we are done, then consider the following calculation. Let a, b ∈ S. We have that

  P(Xₙ = a) = µₙ(Rⁿ × {a}) = ∑_{(a₀,...,aₙ) ∈ Sⁿ⁺¹ : aₙ = a} µ₀(a₀) p_{a₀,a₁} ··· p_{a_{n−1},aₙ}.

Similarly, we have

  P(X_{n+1} = b, Xₙ = a) = µ_{n+1}(Rⁿ × {a} × {b})
    = ∑_{(a₀,...,a_{n+1}) ∈ Sⁿ⁺² : aₙ = a, a_{n+1} = b} µ₀(a₀) p_{a₀,a₁} ··· p_{a_{n−1},aₙ} p_{aₙ,a_{n+1}}
    = p_{a,b} ∑_{(a₀,...,aₙ) ∈ Sⁿ⁺¹ : aₙ = a} µ₀(a₀) p_{a₀,a₁} ··· p_{a_{n−1},aₙ}
    = p_{a,b} P(Xₙ = a).

Hence we obtain that

  P(X_{n+1} = b | Xₙ = a) = P(X_{n+1} = b, Xₙ = a) / P(Xₙ = a) = p_{a,b},

as required.

Exercise 2.7. Let X = (X₀, X₁, . . .) be a Markov chain with transition matrix P, started at a stationary distribution. Extend X to include all negative integer times.

Exercise 2.8. Define a stationary Markov chain Y = (. . . , Y₋₁, Y₀, Y₁, . . .)
such that there exists a translation-invariant event A that is not trivial; see Exercise 2.5.

3. Stopping times and the strong Markov property, again

Let X be a Markov chain taking values in a countable state space S, defined on a probability space (Ω, F, P). Set Fₙ = σ(X₀, . . . , Xₙ). Let T be a stopping time. You will soon be able to prove that {T = n} ∈ Fₙ; in fact, this is the usual definition. We set F_T to be the sigma-algebra of events F ∈ F such that F ∩ {T = n} ∈ Fₙ for all n ≥ 0.

Lemma 7. The stopping time T is measurable with respect to F_T; that is, T⁻¹(B) ∈ F_T for all B ⊂ N.

Proof. If n ∈ B, then T⁻¹(B) ∩ {T = n} = {T = n} ∈ Fₙ; otherwise, T⁻¹(B) ∩ {T = n} = ∅ ∈ Fₙ. □

Lemma 8. Let X be a Markov chain taking values in a countable state space S. Let T be a stopping time. Add to S a symbol △ to obtain S_△. Set

  Z = (X₀, X₁, . . . , X_T, △, △, . . .),

so that Z takes values in S_△^N. Then Z is measurable with respect to F_T.

Proof. Let n ∈ N. It suffices to check that Z⁻¹(B) ∩ {T = n} ∈ Fₙ for all sets B of the form B = {(b₀, b₁, . . . , b_k, △, △, . . .)}, where bᵢ ∈ S. Note that by definition {T = n} ∈ Fₙ. If k ≠ n, then the intersection is empty, and thus in Fₙ; otherwise, k = n, and

  Z⁻¹(B) ∩ {T = n} = {X₀ = b₀, . . . , Xₙ = bₙ} ∩ {T = n} ∈ Fₙ. □

Exercise 3.1. Let X be a Markov chain taking values in a countable state space S, defined on a probability space (Ω, F, P). Let T₁ and T₂ be stopping times.
(a) Check that F_{T₁} is indeed a sigma-algebra.
(b) Check that min{T₁, T₂} is a stopping time.
(c) Check that T₁ + T₂ is also a stopping time.

Theorem 9. Let X be a Markov chain taking values in a state space S with a transition matrix P. Let T be a stopping time, with P(T < ∞) = 1. Let s ∈ S. Conditional on X_T = s, we have that Y = (X_{T+k})_{k=0}^{∞} is a Markov chain started at s with transition matrix P that is independent of Z = (X_k)_{k=0}^{T}.

Proof.
Note that it suffices to check that, conditional on {X_T = s}, we have that Y is a Markov chain started at s that is independent of F_T. Let C ∈ F_T. We should check that for all measurable A ⊂ S^N, we have that

  P(Y ∈ A, C | X_T = s) = P(X ∈ A | X₀ = s) P(C | X_T = s).   (1)

Hopefully, you will believe (see Exercise 3.2) that we have enough measure theory to justify that we only need to consider cylinder sets A of the form

  A = {a ∈ S^N : a₀ = z₀, . . . , a_k = z_k}.

Note that C is given by the disjoint union of the sets Bₙ = C ∩ {T = n}, where Bₙ ∈ Fₙ. Let z₀, . . . , z_k ∈ S. Check using the Markov property that

  P(X_T = z₀, . . . , X_{T+k} = z_k, Bₙ, T = n, X_T = s) = P(X₀ = z₀, . . . , X_k = z_k | X₀ = s) P(Bₙ, T = n, X_T = s).

Summing over all n ≥ 0, we have

  P(X_T = z₀, . . . , X_{T+k} = z_k, C, X_T = s) = P(X₀ = z₀, . . . , X_k = z_k | X₀ = s) P(C, X_T = s),

and dividing by P(X_T = s), we obtain the required result. □

Exercise 3.2. Check that it really is enough to check cylinder sets in the proof of Theorem 9.

Solution. This is a consequence of the π-system lemma. Fix C ∈ F_T and s ∈ S. The random variables X and Y take values in (S^N, F), where F is the product sigma-algebra generated by the cylinder sets. Consider the finite measures µ and ν defined on (S^N, F) via

  µ(A) = P(Y ∈ A, C | X_T = s)  and  ν(A) = P(X ∈ A | X₀ = s) P(C | X_T = s);

these are just the left and right hand sides of (1). We checked in the proof of Theorem 9 that µ(A) = ν(A) for all cylinder sets A. Note that the cylinder sets are a π-system that generates F. Hence Theorem 2 gives that µ = ν on all of F.

4. Stationary stochastic processes

Let (Ω, F, P) be a probability space. Let X = (Xᵢ)_{i∈Z} be a bi-infinite sequence of random variables taking values in R. Let T : R^Z → R^Z be given by (Tx)ᵢ = x_{i+1} for all i ∈ Z; thus (TX)ᵢ = X_{i+1} for all i ∈ Z. We say that X is stationary if TX and X have the same distribution; that is, P(X ∈ A) = P(TX ∈ A) for all A ∈ B^Z.
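The definition can be probed empirically for the simplest stationary process, an i.i.d. sequence: the law of (X₀, X₁) should match the law of ((TX)₀, (TX)₁) = (X₁, X₂). A Python sketch, where the Bernoulli parameter p = 0.3 and the window length are illustrative choices:

```python
import random
from collections import Counter

def sample_window(rng, p=0.3, length=3):
    """Three consecutive values of an i.i.d. Bernoulli(p) sequence."""
    return tuple(1 if rng.random() < p else 0 for _ in range(length))

rng = random.Random(3)
trials = 50_000
orig, shifted = Counter(), Counter()
for _ in range(trials):
    w = sample_window(rng)
    orig[w[0:2]] += 1          # empirical law of (X0, X1)
    shifted[w[1:3]] += 1       # empirical law of (X1, X2) = ((T X)0, (T X)1)

# the two empirical distributions should agree up to sampling error
diffs = {k: abs(orig[k] - shifted[k]) / trials
         for k in set(orig) | set(shifted)}
```

The same comparison applied to a non-stationary process, such as a random walk, would show the two window distributions drifting apart.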
We can also use the same definition in the case that X = (Xᵢ)_{i∈N} is a unilateral sequence of random variables.

Exercise 4.1. Check that X is stationary if and only if for all ℓ ∈ Z, any finite collection n₁, . . . , n_k ∈ Z, and Borel sets B₁, . . . , B_k ∈ B, we have

  P(X_{n₁} ∈ B₁, . . . , X_{n_k} ∈ B_k) = P(X_{n₁+ℓ} ∈ B₁, . . . , X_{n_k+ℓ} ∈ B_k).

Exercise 4.2. Show that a Markov chain started at a stationary distribution is indeed a stationary process.

Exercise 4.3. Let X be a stationary real-valued stochastic process on (Ω, F, P). Show that if we set µ to be the law of X, so that µ(A) = P(X ∈ A) for all A ∈ B^Z, then (R^Z, B^Z, µ, T) is a measure-preserving system.

Exercise 4.4. Let (R^Z, B^Z, µ, T) be a probability measure-preserving system, where (Tx)ᵢ = x_{i+1}. Show that if we set Xᵢ(x) = xᵢ for all i ∈ Z and x ∈ R^Z, then X = (Xᵢ)_{i∈Z} is a stationary process.

Exercise 4.5. Let X be an irreducible Markov chain on a finite state space, so that it has a unique stationary distribution. Use the Poincaré recurrence theorem to show that if X is started at the stationary distribution, then it will visit every state almost surely.
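Exercise 4.2 can be checked numerically for a small chain. In the Python sketch below, the two-state transition matrix is an illustrative assumption; its stationary distribution π = (4/7, 3/7) solves πP = π, and starting the chain from π makes the marginal of Xₙ the same for every n:

```python
import random

# Hypothetical two-state chain: transition matrix P and its stationary
# distribution pi, which solves pi P = pi (here pi = (4/7, 3/7)).
P = [[0.7, 0.3], [0.4, 0.6]]
pi = [4 / 7, 3 / 7]

def stationary_path(rng, n):
    """Sample (X0, ..., Xn) with X0 drawn from pi."""
    x = 0 if rng.random() < pi[0] else 1
    path = [x]
    for _ in range(n):
        x = 0 if rng.random() < P[x][0] else 1
        path.append(x)
    return path

rng = random.Random(7)
trials = 50_000
paths = [stationary_path(rng, 10) for _ in range(trials)]
freq_0 = sum(p[0] for p in paths) / trials    # estimate of P(X0 = 1)
freq_10 = sum(p[10] for p in paths) / trials  # estimate of P(X10 = 1)
# both frequencies should be close to pi[1] = 3/7
```

Starting from any other initial distribution, freq_0 would differ from 3/7 while freq_10 would already be close to it, reflecting convergence to stationarity rather than stationarity itself.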