Jan van Neerven

Measure, Integral, and Conditional Expectation

Handout, May 2, 2012

1 Measure and Integral

In this lecture we review elements of the Lebesgue theory of integration.

1.1 σ-Algebras

Let $\Omega$ be a set. A σ-algebra in $\Omega$ is a collection $\mathscr{F}$ of subsets of $\Omega$ with the following properties:

(1) $\Omega \in \mathscr{F}$;
(2) $A \in \mathscr{F}$ implies $\complement A \in \mathscr{F}$;
(3) $A_1, A_2, \dots \in \mathscr{F}$ implies $\bigcup_{n=1}^\infty A_n \in \mathscr{F}$.

Here, $\complement A = \Omega \setminus A$ is the complement of $A$. A measurable space is a pair $(\Omega, \mathscr{F})$, where $\Omega$ is a set and $\mathscr{F}$ is a σ-algebra in $\Omega$. The above three properties express that $\mathscr{F}$ is non-empty, closed under taking complements, and closed under taking countable unions. From
\[ \bigcap_{n=1}^\infty A_n = \complement \bigcup_{n=1}^\infty \complement A_n \]
it follows that $\mathscr{F}$ is closed under taking countable intersections. Clearly, $\mathscr{F}$ is closed under finite unions and intersections as well.

Example 1.1. In every set $\Omega$ there is a minimal σ-algebra, $\{\emptyset, \Omega\}$, and a maximal σ-algebra, the set $\mathscr{P}(\Omega)$ of all subsets of $\Omega$.

Example 1.2. If $\mathscr{A}$ is any collection of subsets of $\Omega$, then there is a smallest σ-algebra, denoted $\sigma(\mathscr{A})$, containing $\mathscr{A}$. This σ-algebra is obtained as the intersection of all σ-algebras containing $\mathscr{A}$ and is called the σ-algebra generated by $\mathscr{A}$.

Example 1.3. Let $(\Omega_1, \mathscr{F}_1), (\Omega_2, \mathscr{F}_2), \dots$ be a sequence of measurable spaces. The product
\[ \Big( \prod_{n=1}^\infty \Omega_n,\ \prod_{n=1}^\infty \mathscr{F}_n \Big) \]
is defined as follows: $\prod_{n=1}^\infty \Omega_n = \Omega_1 \times \Omega_2 \times \dots$ is the cartesian product, and $\prod_{n=1}^\infty \mathscr{F}_n = \mathscr{F}_1 \times \mathscr{F}_2 \times \dots$ is the σ-algebra generated by all sets of the form
\[ A_1 \times \dots \times A_N \times \Omega_{N+1} \times \Omega_{N+2} \times \dots \]
with $N = 1, 2, \dots$ and $A_n \in \mathscr{F}_n$ for $n = 1, \dots, N$. Finite products $\prod_{n=1}^N (\Omega_n, \mathscr{F}_n)$ are defined similarly; here one takes the σ-algebra in $\Omega_1 \times \dots \times \Omega_N$ generated by all sets of the form $A_1 \times \dots \times A_N$ with $A_n \in \mathscr{F}_n$ for $n = 1, \dots, N$.

Example 1.4. The Borel σ-algebra of $\mathbb{R}^d$, notation $\mathscr{B}(\mathbb{R}^d)$, is the σ-algebra generated by all open sets in $\mathbb{R}^d$. The sets in this σ-algebra are called the Borel sets of $\mathbb{R}^d$. As an exercise, check that
\[ (\mathbb{R}^d, \mathscr{B}(\mathbb{R}^d)) = \prod_{n=1}^d (\mathbb{R}, \mathscr{B}(\mathbb{R})). \]
Hint: Every open set in $\mathbb{R}^d$ is a countable union of rectangles of the form $[a_1, b_1) \times \dots \times [a_d, b_d)$.

Example 1.5. Let $f: \Omega \to \mathbb{R}$ be any function. For Borel sets $B \in \mathscr{B}(\mathbb{R})$ we define
\[ \{f \in B\} := \{\omega \in \Omega : f(\omega) \in B\}. \]
The collection $\sigma(f) = \big\{ \{f \in B\} : B \in \mathscr{B}(\mathbb{R}) \big\}$ is a σ-algebra in $\Omega$, the σ-algebra generated by $f$.

1.2 Measurable functions

Let $(\Omega_1, \mathscr{F}_1)$ and $(\Omega_2, \mathscr{F}_2)$ be measurable spaces. A function $f: \Omega_1 \to \Omega_2$ is said to be measurable if $\{f \in A\} \in \mathscr{F}_1$ for all $A \in \mathscr{F}_2$. Clearly, compositions of measurable functions are measurable.

Proposition 1.6. If $\mathscr{A}_2$ is a subset of $\mathscr{F}_2$ with the property that $\sigma(\mathscr{A}_2) = \mathscr{F}_2$, then a function $f: \Omega_1 \to \Omega_2$ is measurable if and only if $\{f \in A\} \in \mathscr{F}_1$ for all $A \in \mathscr{A}_2$.

Proof. Just notice that $\{A \in \mathscr{F}_2 : \{f \in A\} \in \mathscr{F}_1\}$ is a sub-σ-algebra of $\mathscr{F}_2$ containing $\mathscr{A}_2$. □

In most applications we are concerned with functions $f$ from a measurable space $(\Omega, \mathscr{F})$ to $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$. Such functions are said to be Borel measurable. In what follows we summarise some measurability properties of Borel measurable functions (the adjective 'Borel' will be omitted when no confusion is likely to arise).

Proposition 1.6 implies that a function $f: \Omega \to \mathbb{R}$ is Borel measurable if and only if $\{f > a\} \in \mathscr{F}$ for all $a \in \mathbb{R}$. From this it follows that linear combinations, products, and quotients (if defined) of measurable functions are measurable. For example, if $f$ and $g$ are measurable, then $f + g$ is measurable since
\[ \{f + g > a\} = \bigcup_{q \in \mathbb{Q}} \{f > q\} \cap \{g > a - q\}. \]
This, in turn, implies that $fg$ is measurable, as we have
\[ fg = \tfrac{1}{2}\big[(f+g)^2 - (f^2 + g^2)\big]. \]
If $f = \sup_{n \ge 1} f_n$ pointwise and each $f_n$ is measurable, then $f$ is measurable since
\[ \{f > a\} = \bigcup_{n=1}^\infty \{f_n > a\}. \]
It follows from $\inf_{n \ge 1} f_n = -\sup_{n \ge 1} (-f_n)$ that the pointwise infimum of a sequence of measurable functions is measurable as well.
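On a finite sample space these notions can be made completely concrete. The sketch below (illustrative only; the four-point $\Omega$ and the function are invented for the example) builds $\sigma(f)$ of Example 1.5 as the collection of preimages $\{f \in B\}$ and checks the σ-algebra axioms, where countable unions reduce to finite ones:

```python
from itertools import chain, combinations

def sigma_generated_by(f, omega):
    """sigma(f) = { {f in B} : B subset of f(Omega) }, as a set of frozensets."""
    values = sorted(set(f(w) for w in omega))
    subsets = chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))
    return {frozenset(w for w in omega if f(w) in B) for B in map(set, subsets)}

def is_sigma_algebra(F, omega):
    """Check the three axioms; on a finite Omega countable unions are finite unions."""
    om = frozenset(omega)
    if om not in F:
        return False
    if any(om - A not in F for A in F):            # closed under complements
        return False
    return all(A | B in F for A in F for B in F)   # closed under unions

omega = range(4)
f = lambda w: w % 2            # f takes two values, so sigma(f) has 2^2 = 4 sets
F = sigma_generated_by(f, omega)
print(len(F), is_sigma_algebra(F, omega))  # → 4 True
```

The four sets produced are exactly $\emptyset$, $\{0,2\}$, $\{1,3\}$ and $\Omega$, matching the fact that $\sigma(f)$ only distinguishes points that $f$ separates.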
From this we get that the pointwise limes superior and limes inferior of measurable functions are measurable, since
\[ \limsup_{n \to \infty} f_n = \lim_{n \to \infty} \sup_{k \ge n} f_k = \inf_{n \ge 1} \sup_{k \ge n} f_k \]
and $\liminf_{n \to \infty} f_n = -\limsup_{n \to \infty} (-f_n)$. In particular, the pointwise limit $\lim_{n \to \infty} f_n$ of a sequence of measurable functions is measurable.

In these results, it is implicitly understood that the suprema and infima exist and are finite pointwise. This restriction can be lifted by considering functions $f: \Omega \to [-\infty, \infty]$. Such a function is said to be measurable if the sets $\{f \in B\}$, $B \in \mathscr{B}(\mathbb{R})$, as well as the sets $\{f = \pm\infty\}$, are in $\mathscr{F}$. This extension will also be useful later on.

1.3 Measures

Let $(\Omega, \mathscr{F})$ be a measurable space. A measure is a mapping $\mu: \mathscr{F} \to [0, \infty]$ with the following properties:

(1) $\mu(\emptyset) = 0$;
(2) for all disjoint sets $A_1, A_2, \dots$ in $\mathscr{F}$ we have $\mu\big(\bigcup_{n=1}^\infty A_n\big) = \sum_{n=1}^\infty \mu(A_n)$.

A triple $(\Omega, \mathscr{F}, \mu)$, with $\mu$ a measure on a measurable space $(\Omega, \mathscr{F})$, is called a measure space. A measure space $(\Omega, \mathscr{F}, \mu)$ is called finite if $\mu$ is a finite measure, that is, if $\mu(\Omega) < \infty$. If $\mu(\Omega) = 1$, then $\mu$ is called a probability measure and $(\Omega, \mathscr{F}, \mu)$ is called a probability space. In probability theory, it is customary to use the symbol $\mathbb{P}$ for a probability measure.

The following properties of measures are easily checked:

(a) if $A_1 \subseteq A_2$ in $\mathscr{F}$, then $\mu(A_1) \le \mu(A_2)$;
(b) if $A_1, A_2, \dots$ in $\mathscr{F}$, then $\mu\big(\bigcup_{n=1}^\infty A_n\big) \le \sum_{n=1}^\infty \mu(A_n)$
(hint: consider the disjoint sets $A_{n+1} \setminus (A_1 \cup \dots \cup A_n)$);
(c) if $A_1 \subseteq A_2 \subseteq \dots$ in $\mathscr{F}$, then $\mu\big(\bigcup_{n=1}^\infty A_n\big) = \lim_{n \to \infty} \mu(A_n)$
(hint: consider the disjoint sets $A_{n+1} \setminus A_n$);
(d) if $A_1 \supseteq A_2 \supseteq \dots$ in $\mathscr{F}$ and $\mu(A_1) < \infty$, then $\mu\big(\bigcap_{n=1}^\infty A_n\big) = \lim_{n \to \infty} \mu(A_n)$
(hint: take complements).

Note that in (c) and (d) the limits (in $[0, \infty]$) exist by monotonicity.

Example 1.7. Let $(\Omega, \mathscr{F})$ and $(\tilde\Omega, \tilde{\mathscr{F}})$ be measurable spaces and let $\mu$ be a measure on $(\Omega, \mathscr{F})$. If $f: \Omega \to \tilde\Omega$ is measurable, then
\[ f_*(\mu)(\tilde A) := \mu\{f \in \tilde A\}, \qquad \tilde A \in \tilde{\mathscr{F}}, \]
defines a measure $f_*(\mu)$ on $(\tilde\Omega, \tilde{\mathscr{F}})$. This measure is called the image of $\mu$ under $f$.
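On a discrete space the image measure of Example 1.7 is computed by pushing point masses forward; a small sketch (the weights and the map below are invented sample data):

```python
from collections import defaultdict

def pushforward(mu, f):
    """f_*(mu)(A~) = mu{f in A~}: push the point masses of a discrete measure through f."""
    nu = defaultdict(float)
    for omega, mass in mu.items():
        nu[f(omega)] += mass
    return dict(nu)

# a probability measure on Omega = {0, 1, 2, 3}, pushed through f(w) = w mod 2
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
nu = pushforward(mu, lambda w: w % 2)
print(nu)
```

The total mass is preserved, and $\nu\{0\} = \mu\{0\} + \mu\{2\}$, $\nu\{1\} = \mu\{1\} + \mu\{3\}$, exactly as the definition $f_*(\mu)(\tilde A) = \mu\{f \in \tilde A\}$ prescribes.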
In the context where $(\Omega, \mathscr{F}, \mathbb{P})$ is a probability space, the image of $\mathbb{P}$ under $f$ is called the distribution of $f$.

Concerning existence and uniqueness of measures we have the following two fundamental results.

Theorem 1.8 (Carathéodory). Let $\mathscr{A}$ be a non-empty collection of subsets of a set $\Omega$ that is closed under taking complements and finite unions. If $\mu: \mathscr{A} \to [0, \infty]$ is a mapping satisfying properties (1) and (2) in the definition of a measure (for those unions that belong to $\mathscr{A}$), then there exists a unique measure $\tilde\mu: \sigma(\mathscr{A}) \to [0, \infty]$ such that $\tilde\mu|_{\mathscr{A}} = \mu$.

The proof is lengthy and rather tedious. Since in the typical situation the measures one works with are given in advance, the proof is omitted. The uniqueness part is a consequence of the next result, which is useful in many other situations as well.

Theorem 1.9 (Dynkin). Let $\mu_1$ and $\mu_2$ be two finite measures defined on a measurable space $(\Omega, \mathscr{F})$. Let $\mathscr{A} \subseteq \mathscr{F}$ be a collection of sets with the following properties:

(1) $\mathscr{A}$ is closed under finite intersections;
(2) $\sigma(\mathscr{A})$, the σ-algebra generated by $\mathscr{A}$, equals $\mathscr{F}$.

If $\mu_1(A) = \mu_2(A)$ for all $A \in \mathscr{A}$, then $\mu_1 = \mu_2$.

Proof. Let $\mathscr{D}$ denote the collection of all sets $D \in \mathscr{F}$ with $\mu_1(D) = \mu_2(D)$. Then $\mathscr{A} \subseteq \mathscr{D}$ and $\mathscr{D}$ is a Dynkin system, that is,

• $\Omega \in \mathscr{D}$;
• if $D_1 \subseteq D_2$ with $D_1, D_2 \in \mathscr{D}$, then also $D_2 \setminus D_1 \in \mathscr{D}$;
• if $D_1 \subseteq D_2 \subseteq \dots$ with all $D_n \in \mathscr{D}$, then also $\bigcup_{n=1}^\infty D_n \in \mathscr{D}$.

The finiteness of $\mu_1$ and $\mu_2$ is used to prove the second property. We have $\mathscr{D} \subseteq \mathscr{F} = \sigma(\mathscr{A})$; we will show that $\sigma(\mathscr{A}) \subseteq \mathscr{D}$. To this end let $\mathscr{D}_0$ denote the smallest Dynkin system in $\mathscr{F}$ containing $\mathscr{A}$. We will show that $\sigma(\mathscr{A}) \subseteq \mathscr{D}_0$. In view of $\mathscr{D}_0 \subseteq \mathscr{D}$, this will prove the theorem.

Let $\mathscr{C} = \{D_0 \in \mathscr{D}_0 : D_0 \cap A \in \mathscr{D}_0 \text{ for all } A \in \mathscr{A}\}$. Then $\mathscr{C}$ is a Dynkin system and $\mathscr{A} \subseteq \mathscr{C}$, since $\mathscr{A}$ is closed under taking finite intersections. It follows that $\mathscr{D}_0 \subseteq \mathscr{C}$, since $\mathscr{D}_0$ is the smallest Dynkin system containing $\mathscr{A}$. But obviously $\mathscr{C} \subseteq \mathscr{D}_0$, and therefore $\mathscr{C} = \mathscr{D}_0$. Now let $\mathscr{C}' = \{D_0 \in \mathscr{D}_0 : D_0 \cap D \in \mathscr{D}_0 \text{ for all } D \in \mathscr{D}_0\}$.
Then $\mathscr{C}'$ is a Dynkin system, and the fact that $\mathscr{C} = \mathscr{D}_0$ implies that $\mathscr{A} \subseteq \mathscr{C}'$. Hence $\mathscr{D}_0 \subseteq \mathscr{C}'$, since $\mathscr{D}_0$ is the smallest Dynkin system containing $\mathscr{A}$. But obviously $\mathscr{C}' \subseteq \mathscr{D}_0$, and therefore $\mathscr{C}' = \mathscr{D}_0$. It follows that $\mathscr{D}_0$ is closed under taking finite intersections. But a Dynkin system with this property is a σ-algebra. Thus $\mathscr{D}_0$ is a σ-algebra, and now $\mathscr{A} \subseteq \mathscr{D}_0$ implies that also $\sigma(\mathscr{A}) \subseteq \mathscr{D}_0$. □

Example 1.10 (Lebesgue measure). In $\mathbb{R}^d$, let $\mathscr{A}$ be the collection of all finite unions of rectangles $[a_1, b_1) \times \dots \times [a_d, b_d)$. Define $\mu: \mathscr{A} \to [0, \infty]$ by
\[ \mu\big([a_1, b_1) \times \dots \times [a_d, b_d)\big) := \prod_{n=1}^d (b_n - a_n) \]
and extend this definition to finite disjoint unions by additivity. It can be verified that $\mathscr{A}$ and $\mu$ satisfy the assumptions of Carathéodory's theorem. The resulting measure $\mu$ on $\sigma(\mathscr{A}) = \mathscr{B}(\mathbb{R}^d)$, the Borel σ-algebra of $\mathbb{R}^d$, is called the Lebesgue measure.

Example 1.11. Let $(\Omega_1, \mathscr{F}_1, \mathbb{P}_1), (\Omega_2, \mathscr{F}_2, \mathbb{P}_2), \dots$ be a sequence of probability spaces. There exists a unique probability measure $\mathbb{P} = \prod_{n=1}^\infty \mathbb{P}_n$ on $\mathscr{F} = \prod_{n=1}^\infty \mathscr{F}_n$ which satisfies
\[ \mathbb{P}(A_1 \times \dots \times A_N \times \Omega_{N+1} \times \Omega_{N+2} \times \dots) = \prod_{n=1}^N \mathbb{P}_n(A_n) \]
for all $N \ge 1$ and $A_n \in \mathscr{F}_n$, $n = 1, \dots, N$. Finite products of arbitrary measures $\mu_1, \dots, \mu_N$ can be defined similarly. For example, the Lebesgue measure on $\mathbb{R}^d$ is the product measure of $d$ copies of the Lebesgue measure on $\mathbb{R}$.

As an illustration of the use of Dynkin's lemma we conclude this section with an elementary result on independence. In probability theory, a measurable function $f: \Omega \to \mathbb{R}$, where $(\Omega, \mathscr{F}, \mathbb{P})$ is a probability space, is called a random variable. Random variables $f_1, \dots, f_N: \Omega \to \mathbb{R}$ are said to be independent if for all $B_1, \dots, B_N \in \mathscr{B}(\mathbb{R})$ we have
\[ \mathbb{P}\{f_1 \in B_1, \dots, f_N \in B_N\} = \prod_{n=1}^N \mathbb{P}\{f_n \in B_n\}. \]

Proposition 1.12. The random variables $f_1, \dots, f_N: \Omega \to \mathbb{R}$ are independent if and only if $\nu = \prod_{n=1}^N \nu_n$, where $\nu$ is the distribution of $(f_1, \dots, f_N)$ and $\nu_n$ is the distribution of $f_n$, $n = 1, \dots, N$.
Proof. By definition, $f_1, \dots, f_N$ are independent if and only if $\nu(B) = \prod_{n=1}^N \nu_n(B_n)$ for all $B = B_1 \times \dots \times B_N$ with $B_n \in \mathscr{B}(\mathbb{R})$, $n = 1, \dots, N$. Since these sets are closed under finite intersections and generate $\mathscr{B}(\mathbb{R}^N)$, Dynkin's lemma shows that the latter identity is equivalent to $\nu = \prod_{n=1}^N \nu_n$. □

1.4 The Lebesgue integral

Let $(\Omega, \mathscr{F})$ be a measurable space. A simple function is a function $f: \Omega \to \mathbb{R}$ that can be represented in the form
\[ f = \sum_{n=1}^N c_n 1_{A_n} \]
with coefficients $c_n \in \mathbb{R}$ and disjoint sets $A_n \in \mathscr{F}$, $n = 1, \dots, N$.

Proposition 1.13. A function $f: \Omega \to [-\infty, \infty]$ is measurable if and only if it is the pointwise limit of a sequence of simple functions. If in addition $f$ is non-negative, we may arrange that $0 \le f_n \uparrow f$ pointwise.

Proof. Take, for instance,
\[ f_n = \sum_{m=-2^{2n}}^{2^{2n}} \frac{m}{2^n}\, 1_{\{f \in [m/2^n,\, (m+1)/2^n)\}}. \qquad \Box \]

1.4.1 Integration of simple functions

Let $(\Omega, \mathscr{F}, \mu)$ be a measure space. For a non-negative simple function $f = \sum_{n=1}^N c_n 1_{A_n}$ we define
\[ \int_\Omega f \, d\mu := \sum_{n=1}^N c_n \mu(A_n). \]
We allow $\mu(A_n)$ to be infinite; this causes no problems because the coefficients $c_n$ are non-negative (we use the convention $0 \cdot \infty = 0$). It is easy to check that this integral is well-defined, in the sense that it does not depend on the particular representation of $f$ as a simple function. Also, the integral is linear with respect to addition and multiplication by non-negative scalars,
\[ \int_\Omega (a f_1 + b f_2) \, d\mu = a \int_\Omega f_1 \, d\mu + b \int_\Omega f_2 \, d\mu, \]
and monotone in the sense that if $0 \le f_1 \le f_2$ pointwise, then
\[ \int_\Omega f_1 \, d\mu \le \int_\Omega f_2 \, d\mu. \]

1.4.2 Integration of non-negative functions

In what follows, a non-negative function is a function with values in $[0, \infty]$. For a non-negative measurable function $f$ we choose a sequence of simple functions $0 \le f_n \uparrow f$ and define
\[ \int_\Omega f \, d\mu := \lim_{n \to \infty} \int_\Omega f_n \, d\mu. \]
The following lemma implies that this definition does not depend on the approximating sequence.
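The approximating sequence from the proof of Proposition 1.13 can be tried out numerically. The sketch below implements the dyadic functions $f_n$ for a non-negative $f$ and checks, on an invented discrete measure (the atoms, weights, and the choice $f(x) = x^2$ are arbitrary sample data), that the integrals of the $f_n$ increase to the integral of $f$:

```python
import math

def dyadic(f, n):
    """f_n = sum_m (m/2^n) 1{f in [m/2^n, (m+1)/2^n)}, with |m| <= 2^(2n)."""
    def fn(x):
        m = math.floor(f(x) * 2 ** n)
        return m / 2 ** n if abs(m) <= 2 ** (2 * n) else 0.0
    return fn

f = lambda x: x * x
mu = {x: 0.2 for x in [0.0, 0.3, 0.7, 1.1, 1.9]}   # a discrete probability measure

def integral(g):
    """Integral with respect to the discrete measure mu."""
    return sum(g(x) * w for x, w in mu.items())

approx = [integral(dyadic(f, n)) for n in range(1, 8)]
exact = integral(f)
# 0 <= f_n <= f, the approximations increase, and the error is at most 2^-n
assert all(a <= b + 1e-12 for a, b in zip(approx, approx[1:]))
assert 0.0 <= exact - approx[-1] <= 2 ** -7
```

The monotonicity in $n$ comes from the refinement of the dyadic grid, and the final error bound holds because each $f_n$ undershoots $f$ by less than $2^{-n}$ wherever the truncation $|m| \le 2^{2n}$ is not active.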
Lemma 1.14. For a non-negative measurable function $f$ and non-negative simple functions $f_n$ and $g$ such that $0 \le f_n \uparrow f$ and $g \le f$ pointwise, we have
\[ \int_\Omega g \, d\mu \le \lim_{n \to \infty} \int_\Omega f_n \, d\mu. \]

Proof. First consider the case $g = 1_A$. Fix $\varepsilon > 0$ arbitrary and let $A_n = \{1_A f_n > 1 - \varepsilon\}$. Then $A_1 \subseteq A_2 \subseteq \dots \uparrow A$ and therefore $\mu(A_n) \uparrow \mu(A)$. Since $1_A f_n \ge (1 - \varepsilon) 1_{A_n}$,
\[ \lim_{n \to \infty} \int_\Omega 1_A f_n \, d\mu \ge (1 - \varepsilon) \lim_{n \to \infty} \mu(A_n) = (1 - \varepsilon)\mu(A) = (1 - \varepsilon) \int_\Omega g \, d\mu. \]
Since $\varepsilon > 0$ was arbitrary, this proves the lemma for $g = 1_A$. The general case follows by linearity. □

The integral is linear and monotone on the set of non-negative measurable functions. Indeed, if $f$ and $g$ are such functions and $0 \le f_n \uparrow f$ and $0 \le g_n \uparrow g$, then for $a, b \ge 0$ we have $0 \le a f_n + b g_n \uparrow a f + b g$ and therefore
\[ \int_\Omega (a f + b g) \, d\mu = \lim_{n \to \infty} \int_\Omega (a f_n + b g_n) \, d\mu = a \lim_{n \to \infty} \int_\Omega f_n \, d\mu + b \lim_{n \to \infty} \int_\Omega g_n \, d\mu = a \int_\Omega f \, d\mu + b \int_\Omega g \, d\mu. \]
If such $f$ and $g$ satisfy $f \le g$ pointwise, then by noting that $0 \le f_n \le \max\{f_n, g_n\} \uparrow g$ we see that
\[ \int_\Omega f \, d\mu = \lim_{n \to \infty} \int_\Omega f_n \, d\mu \le \lim_{n \to \infty} \int_\Omega \max\{f_n, g_n\} \, d\mu \le \int_\Omega g \, d\mu. \]

The next theorem is the cornerstone of integration theory.

Theorem 1.15 (Monotone Convergence Theorem). Let $0 \le f_1 \le f_2 \le \dots$ be a sequence of non-negative measurable functions converging pointwise to a function $f$. Then
\[ \lim_{n \to \infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu. \]

Proof. First note that $f$ is non-negative and measurable. For each $n \ge 1$ choose a sequence of simple functions $0 \le f_{nk} \uparrow_k f_n$ and set $g_{nk} := \max\{f_{1k}, \dots, f_{nk}\}$. For $m \le n$ we have $g_{mk} \le g_{nk}$. Also, for $k \le l$ we have $f_{mk} \le f_{ml}$, $m = 1, \dots, n$, and therefore $g_{nk} \le g_{nl}$. We conclude that the functions $g_{nk}$ are monotone in both indices. From $f_{mk} \le f_m \le f_n$, $1 \le m \le n$, we see that $f_{nk} \le g_{nk} \le f_n$, and we conclude that $0 \le g_{nk} \uparrow_k f_n$. From
\[ f_n = \lim_{k \to \infty} g_{nk} \le \lim_{k \to \infty} g_{kk} \le f \]
we deduce that $0 \le g_{kk} \uparrow f$. Recalling that $g_{kk} \le f_k$, it follows that
\[ \int_\Omega f \, d\mu = \lim_{k \to \infty} \int_\Omega g_{kk} \, d\mu \le \lim_{k \to \infty} \int_\Omega f_k \, d\mu \le \int_\Omega f \, d\mu. \qquad \Box \]

Example 1.16. In the situation of Example 1.7 we have the following substitution formula.
For any measurable $f: \Omega_1 \to \Omega_2$ and non-negative measurable $\varphi: \Omega_2 \to \mathbb{R}$,
\[ \int_{\Omega_1} \varphi \circ f \, d\mu = \int_{\Omega_2} \varphi \, d f_*(\mu). \]
To prove this, note that the identity is trivial for indicator functions $\varphi = 1_A$ with $A \in \mathscr{F}_2$. By linearity, it extends to non-negative simple functions $\varphi$, and by monotone convergence (using Proposition 1.13) to non-negative measurable functions $\varphi$.

From the monotone convergence theorem we deduce the following useful corollary.

Theorem 1.17 (Fatou's Lemma). Let $(f_n)_{n=1}^\infty$ be a sequence of non-negative measurable functions on $(\Omega, \mathscr{F}, \mu)$. Then
\[ \int_\Omega \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int_\Omega f_n \, d\mu. \]

Proof. From $\inf_{k \ge n} f_k \le f_m$, $m \ge n$, we infer
\[ \int_\Omega \inf_{k \ge n} f_k \, d\mu \le \inf_{m \ge n} \int_\Omega f_m \, d\mu. \]
Hence, by the monotone convergence theorem,
\[ \int_\Omega \liminf_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_\Omega \inf_{k \ge n} f_k \, d\mu \le \lim_{n \to \infty} \inf_{m \ge n} \int_\Omega f_m \, d\mu = \liminf_{n \to \infty} \int_\Omega f_n \, d\mu. \qquad \Box \]

1.4.3 Integration of real-valued functions

A measurable function $f: \Omega \to [-\infty, \infty]$ is called integrable if
\[ \int_\Omega |f| \, d\mu < \infty. \]
Clearly, if $f$ and $g$ are measurable and $|g| \le |f|$ pointwise, then $g$ is integrable if $f$ is integrable. In particular, if $f$ is integrable, then the non-negative functions $f^+$ and $f^-$ are integrable, and we define
\[ \int_\Omega f \, d\mu := \int_\Omega f^+ \, d\mu - \int_\Omega f^- \, d\mu. \]
For a set $A \in \mathscr{F}$ we write
\[ \int_A f \, d\mu := \int_\Omega 1_A f \, d\mu, \]
noting that $1_A f$ is integrable. The monotonicity and additivity properties of the integral carry over to this more general situation, provided we assume that the functions we integrate are integrable.

The next result is among the most useful in analysis.

Theorem 1.18 (Dominated Convergence Theorem). Let $(f_n)_{n=1}^\infty$ be a sequence of integrable functions such that $\lim_{n \to \infty} f_n = f$ pointwise. If there exists an integrable function $g$ such that $|f_n| \le |g|$ for all $n \ge 1$, then
\[ \lim_{n \to \infty} \int_\Omega |f_n - f| \, d\mu = 0. \]
In particular,
\[ \lim_{n \to \infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu. \]
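The inequality in Fatou's lemma can be strict, and the domination assumption in Theorem 1.18 cannot be dropped; the classical example is $f_n = 1_{\{n\}}$ on $\mathbb{N}$ with the counting measure. A finite truncation of it:

```python
# f_n = indicator of {n} on {0, ..., N-1}: f_n -> 0 pointwise, yet each integral is 1.
N = 50
fn = [[1.0 if k == n else 0.0 for k in range(N)] for n in range(N)]

integrals = [sum(row) for row in fn]                                    # all equal 1
pointwise_liminf = [min(fn[n][k] for n in range(N)) for k in range(N)]  # 0 everywhere

assert all(i == 1.0 for i in integrals)     # liminf of the integrals is 1 ...
assert sum(pointwise_liminf) == 0.0         # ... but the integral of the liminf is 0
# No integrable majorant exists: sup_n f_n equals 1 at every point, and the
# constant 1 is not integrable for the counting measure on N, so the dominated
# convergence theorem does not apply -- consistent with the integrals not tending to 0.
```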
Proof. We make the preliminary observation that if $(h_n)_{n=1}^\infty$ is a sequence of non-negative measurable functions such that $\lim_{n \to \infty} h_n = 0$ pointwise and $h$ is a non-negative integrable function such that $h_n \le h$ for all $n \ge 1$, then by Fatou's lemma
\[ \int_\Omega h \, d\mu = \int_\Omega \liminf_{n \to \infty} (h - h_n) \, d\mu \le \liminf_{n \to \infty} \int_\Omega (h - h_n) \, d\mu = \int_\Omega h \, d\mu - \limsup_{n \to \infty} \int_\Omega h_n \, d\mu. \]
Since $\int_\Omega h \, d\mu$ is finite, it follows that $0 \le \limsup_{n \to \infty} \int_\Omega h_n \, d\mu \le 0$ and therefore
\[ \lim_{n \to \infty} \int_\Omega h_n \, d\mu = 0. \]
The theorem follows by applying this to $h_n = |f_n - f|$ and $h = 2|g|$. □

Let $(\Omega, \mathscr{F}, \mathbb{P})$ be a probability space. The integral of an integrable random variable $f: \Omega \to \mathbb{R}$ is called the expectation of $f$, notation
\[ \mathbb{E} f = \int_\Omega f \, d\mathbb{P}. \]

Proposition 1.19. If $f: \Omega \to \mathbb{R}$ and $g: \Omega \to \mathbb{R}$ are independent integrable random variables, then $fg: \Omega \to \mathbb{R}$ is integrable and $\mathbb{E}(fg) = \mathbb{E}f \, \mathbb{E}g$.

Proof. We first observe that for any non-negative Borel measurable $\varphi: \mathbb{R} \to \mathbb{R}$, the random variables $\varphi \circ f$ and $\varphi \circ g$ are independent. Taking $\varphi(x) = x^+$ and $\varphi(x) = x^-$, it follows that we may consider the positive and negative parts of $f$ and $g$ separately. In doing so, we may assume that $f$ and $g$ are non-negative. Denoting the distributions of $f$, $g$, and $(f, g)$ by $\nu_f$, $\nu_g$, and $\nu_{(f,g)}$ respectively, by the substitution formula (with $\varphi(x, y) = xy \, 1_{[0,\infty)^2}(x, y)$) and the fact that $\nu_{(f,g)} = \nu_f \times \nu_g$ we obtain
\[ \mathbb{E}(fg) = \int_{[0,\infty)^2} xy \, d\nu_{(f,g)}(x, y) = \int_{[0,\infty)^2} xy \, d(\nu_f \times \nu_g)(x, y) = \int_{[0,\infty)} \int_{[0,\infty)} xy \, d\nu_f(x) \, d\nu_g(y) = \mathbb{E}f \, \mathbb{E}g. \]
This proves the integrability of $fg$ along with the desired identity. □

1.4.4 The role of null sets

We begin with a simple observation.

Proposition 1.20. Let $f$ be a non-negative measurable function.

1. If $\int_\Omega f \, d\mu < \infty$, then $\mu\{f = \infty\} = 0$.
2. If $\int_\Omega f \, d\mu = 0$, then $\mu\{f \ne 0\} = 0$.

Proof. For all $c > 0$ we have $0 \le c \, 1_{\{f = \infty\}} \le f$ and therefore
\[ 0 \le c \, \mu\{f = \infty\} \le \int_\Omega f \, d\mu. \]
The first result follows from this by letting $c \to \infty$.
For the second, note that for all $n \ge 1$ we have $\frac{1}{n} 1_{\{f > 1/n\}} \le f$ and therefore
\[ \frac{1}{n} \mu\Big\{f > \frac{1}{n}\Big\} \le \int_\Omega f \, d\mu = 0. \]
It follows that $\mu\{f > \frac{1}{n}\} = 0$. Now note that $\{f > 0\} = \bigcup_{n=1}^\infty \{f > \frac{1}{n}\}$. □

Proposition 1.21. If $f$ is integrable and $\mu\{f \ne 0\} = 0$, then $\int_\Omega f \, d\mu = 0$.

Proof. By considering $f^+$ and $f^-$ separately we may assume that $f$ is non-negative. Choose simple functions $0 \le f_n \uparrow f$. Then $\mu\{f_n > 0\} \le \mu\{f > 0\} = 0$ and therefore $\int_\Omega f_n \, d\mu = 0$ for all $n \ge 1$. The result follows from this. □

What these results show is that in everything that has been said so far, we may replace pointwise convergence by pointwise convergence µ-almost everywhere, where the latter means that we allow an exceptional set of µ-measure zero. This applies, in particular, to the monotonicity assumption in the monotone convergence theorem and to the convergence and domination assumptions in the dominated convergence theorem.

1.5 Lp-spaces

Let $\mu$ be a measure on $(\Omega, \mathscr{F})$ and fix $1 \le p < \infty$. We define $\mathscr{L}^p(\mu)$ as the set of all measurable functions $f: \Omega \to \mathbb{R}$ such that
\[ \int_\Omega |f|^p \, d\mu < \infty. \]
For such functions we define
\[ \|f\|_p := \Big( \int_\Omega |f|^p \, d\mu \Big)^{1/p}. \]
The next result shows that $\mathscr{L}^p(\mu)$ is a vector space.

Proposition 1.22 (Minkowski's inequality). For all $f, g \in \mathscr{L}^p(\mu)$ we have $f + g \in \mathscr{L}^p(\mu)$ and
\[ \|f + g\|_p \le \|f\|_p + \|g\|_p. \]

Proof. By elementary calculus one checks that for all non-negative real numbers $a$ and $b$,
\[ (a + b)^p = \inf_{t \in (0,1)} \big[ t^{1-p} a^p + (1-t)^{1-p} b^p \big]. \]
Applying this identity to $|f(x)|$ and $|g(x)|$ and integrating with respect to $\mu$, for each fixed $t \in (0,1)$ we obtain
\[ \int_\Omega |f(x) + g(x)|^p \, d\mu(x) \le \int_\Omega \big(|f(x)| + |g(x)|\big)^p \, d\mu(x) \le t^{1-p} \int_\Omega |f(x)|^p \, d\mu(x) + (1-t)^{1-p} \int_\Omega |g(x)|^p \, d\mu(x). \]
Stated differently, this says that
\[ \|f + g\|_p^p \le t^{1-p} \|f\|_p^p + (1-t)^{1-p} \|g\|_p^p. \]
Taking the infimum over all $t \in (0,1)$ gives the result. □

In spite of this result, $\|\cdot\|_p$ is not a norm on $\mathscr{L}^p(\mu)$, because $\|f\|_p = 0$ only implies that $f = 0$ µ-almost everywhere.
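Both the infimum identity used in the proof of Minkowski's inequality and the inequality itself are easy to sanity-check numerically (the exponent and sample values below are arbitrary):

```python
# Check (a+b)^p = inf_t [ t^(1-p) a^p + (1-t)^(1-p) b^p ]; the infimum is at t = a/(a+b).
p = 3.0
a, b = 2.0, 5.0
t_star = a / (a + b)
lhs = (a + b) ** p
rhs = t_star ** (1 - p) * a ** p + (1 - t_star) ** (1 - p) * b ** p
assert abs(lhs - rhs) < 1e-9

# a grid search confirms that no other t does better
grid = [k / 1000 for k in range(1, 1000)]
assert all(t ** (1 - p) * a ** p + (1 - t) ** (1 - p) * b ** p >= lhs - 1e-9 for t in grid)

# Minkowski's inequality for a three-point space with uniform weights
f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
norm = lambda h: sum(abs(x) ** p for x in h) ** (1 / p)
assert norm([x + y for x, y in zip(f, g)]) <= norm(f) + norm(g) + 1e-12
```

Plugging $t^* = a/(a+b)$ into the right-hand side gives $a(a+b)^{p-1} + b(a+b)^{p-1} = (a+b)^p$, which is the "elementary calculus" behind the identity.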
In order to get around this imperfection, we define an equivalence relation $\sim$ on $\mathscr{L}^p(\mu)$ by
\[ f \sim g \iff f = g \ \text{µ-almost everywhere}. \]
The equivalence class of a function $f$ modulo $\sim$ is denoted by $[f]$. On the quotient space $L^p(\mu) := \mathscr{L}^p(\mu)/\sim$ we define scalar multiplication and addition in the natural way:
\[ c[f] := [cf], \qquad [f] + [g] := [f + g]. \]
We leave it as an exercise to check that both operations are well defined. With these operations, $L^p(\mu)$ is a normed vector space with respect to the norm
\[ \|[f]\|_p := \|f\|_p. \]
Following common practice we shall make no distinction between a function in $\mathscr{L}^p(\mu)$ and its equivalence class in $L^p(\mu)$.

The next result states that $L^p(\mu)$ is a Banach space (which, by definition, is a complete normed vector space).

Theorem 1.23. The normed space $L^p(\mu)$ is complete.

Proof. Let $(f_n)_{n=1}^\infty$ be a Cauchy sequence with respect to the norm $\|\cdot\|_p$ of $L^p(\mu)$. By passing to a subsequence we may assume that
\[ \|f_{n+1} - f_n\|_p \le \frac{1}{2^n}, \qquad n = 1, 2, \dots \]
Define the non-negative measurable functions
\[ g_N := \sum_{n=0}^{N-1} |f_{n+1} - f_n|, \qquad g := \sum_{n=0}^\infty |f_{n+1} - f_n|, \]
with the convention that $f_0 = 0$. By the monotone convergence theorem,
\[ \int_\Omega g^p \, d\mu = \lim_{N \to \infty} \int_\Omega g_N^p \, d\mu. \]
Taking $p$-th roots and using Minkowski's inequality we obtain
\[ \|g\|_p = \lim_{N \to \infty} \|g_N\|_p \le \lim_{N \to \infty} \sum_{n=0}^{N-1} \|f_{n+1} - f_n\|_p = \sum_{n=0}^\infty \|f_{n+1} - f_n\|_p \le 1 + \|f_1\|_p. \]
It follows that $g$ is finitely-valued µ-almost everywhere, which means that the sum defining $g$ converges absolutely µ-almost everywhere. As a result, the sum
\[ f := \sum_{n=0}^\infty (f_{n+1} - f_n) \]
converges on the set $\{g < \infty\}$. On this set we have
\[ f = \lim_{N \to \infty} \sum_{n=0}^{N-1} (f_{n+1} - f_n) = \lim_{N \to \infty} f_N. \]
Defining $f$ to be zero on the null set $\{g = \infty\}$, the resulting function $f$ is measurable. From
\[ |f_N|^p = \Big| \sum_{n=0}^{N-1} (f_{n+1} - f_n) \Big|^p \le \Big( \sum_{n=0}^{N-1} |f_{n+1} - f_n| \Big)^p \le g^p \]
and the dominated convergence theorem we conclude that
\[ \lim_{N \to \infty} \|f - f_N\|_p = 0. \]
We have proved that a subsequence of the original Cauchy sequence converges to $f$ in $L^p(\mu)$.
As is easily verified, this implies that the original Cauchy sequence converges to $f$ as well. □

In the course of the proof we obtained the following result.

Corollary 1.24. Every convergent sequence in $L^p(\mu)$ has a µ-almost everywhere convergent subsequence.

For $p = \infty$ the above definitions have to be modified a little. We define $\mathscr{L}^\infty(\mu)$ to be the vector space of all measurable functions $f: \Omega \to \mathbb{R}$ such that
\[ \sup_{\omega \in \Omega} |f(\omega)| < \infty \]
and define, for such functions,
\[ \|f\|_\infty := \sup_{\omega \in \Omega} |f(\omega)|. \]
The space $L^\infty(\mu) := \mathscr{L}^\infty(\mu)/\sim$ is again a normed vector space in a natural way. The simple proof that $L^\infty(\mu)$ is a Banach space is left as an exercise.

We continue with an approximation result, the proof of which is an easy application of Proposition 1.13 together with the dominated convergence theorem.

Proposition 1.25. For $1 \le p \le \infty$, the simple functions in $L^p(\mu)$ are dense in $L^p(\mu)$.

We conclude with an important inequality, known as Hölder's inequality. For $p = q = 2$, it is known as the Cauchy-Schwarz inequality.

Theorem 1.26. Let $1 \le p, q \le \infty$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. If $f \in L^p(\mu)$ and $g \in L^q(\mu)$, then $fg \in L^1(\mu)$ and
\[ \|fg\|_1 \le \|f\|_p \|g\|_q. \]

Proof. For $p = 1$, $q = \infty$ and for $p = \infty$, $q = 1$, this follows by a direct estimate. Thus we may assume from now on that $1 < p, q < \infty$. The inequality is then proved in the same way as Minkowski's inequality, this time using the identity
\[ ab = \inf_{t > 0} \Big[ \frac{t^p a^p}{p} + \frac{b^q}{q t^q} \Big]. \qquad \Box \]

1.6 Fubini's theorem

A measure space $(\Omega, \mathscr{F}, \mu)$ is called σ-finite if we can write $\Omega = \bigcup_{n=1}^\infty \Omega^{(n)}$ with sets $\Omega^{(n)} \in \mathscr{F}$ satisfying $\mu(\Omega^{(n)}) < \infty$.

Theorem 1.27 (Fubini). Let $(\Omega_1, \mathscr{F}_1, \mu_1)$ and $(\Omega_2, \mathscr{F}_2, \mu_2)$ be σ-finite measure spaces. For any non-negative measurable function $f: \Omega_1 \times \Omega_2 \to \mathbb{R}$ the following assertions hold:

(1) the non-negative function $\omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \omega_2) \, d\mu_2(\omega_2)$ is measurable;
(2) the non-negative function $\omega_2 \mapsto \int_{\Omega_1} f(\omega_1, \omega_2) \, d\mu_1(\omega_1)$ is measurable;
(3) we have
\[ \int_{\Omega_1 \times \Omega_2} f \, d(\mu_1 \times \mu_2) = \int_{\Omega_2} \int_{\Omega_1} f \, d\mu_1 \, d\mu_2 = \int_{\Omega_1} \int_{\Omega_2} f \, d\mu_2 \, d\mu_1. \]
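On finite spaces Theorem 1.27 reduces to interchanging two finite sums, which can be checked directly (the weights and the function below are invented sample data):

```python
# Two finite measures and a non-negative f on the product space.
mu1 = {0: 0.5, 1: 1.5}
mu2 = {10: 2.0, 20: 0.25}
f = lambda x, y: x * y + 1.0          # non-negative on the support

# integral over the product measure, and the two iterated integrals
product = sum(f(x, y) * mu1[x] * mu2[y] for x in mu1 for y in mu2)
iter12 = sum(mu1[x] * sum(f(x, y) * mu2[y] for y in mu2) for x in mu1)
iter21 = sum(mu2[y] * sum(f(x, y) * mu1[x] for x in mu1) for y in mu2)
assert abs(product - iter12) < 1e-9 and abs(product - iter21) < 1e-9
```

For non-negative integrands the rearrangement is always legitimate; the σ-finiteness assumption in the theorem is what allows the same reduction to finite pieces in the general case.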
We shall not give a detailed proof, but sketch the main steps. First, for $f(\omega_1, \omega_2) = 1_{A_1}(\omega_1) 1_{A_2}(\omega_2)$ with $A_1 \in \mathscr{F}_1$, $A_2 \in \mathscr{F}_2$, the result is clear. Dynkin's lemma is used to extend the result to functions $f = 1_A$ with $A \in \mathscr{F} = \mathscr{F}_1 \times \mathscr{F}_2$; one has to assume first that $\mu_1$ and $\mu_2$ are finite, the σ-finite case then following by approximation. By taking linear combinations, the result extends to simple functions. The result for arbitrary non-negative measurable functions then follows by monotone convergence.

A variation of Fubini's theorem holds if we replace 'non-negative measurable' by 'integrable'. In that case, for $\mu_1$-almost all $\omega_1 \in \Omega_1$ the function $\omega_2 \mapsto f(\omega_1, \omega_2)$ is integrable, and for $\mu_2$-almost all $\omega_2 \in \Omega_2$ the function $\omega_1 \mapsto f(\omega_1, \omega_2)$ is integrable. Moreover, the functions $\omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \omega_2) \, d\mu_2(\omega_2)$ and $\omega_2 \mapsto \int_{\Omega_1} f(\omega_1, \omega_2) \, d\mu_1(\omega_1)$ are integrable and the above identities hold.

1.7 Notes

The proof of the monotone convergence theorem is taken from Kallenberg [1]. The proofs of the Minkowski and Hölder inequalities, although not the ones that are usually presented in introductory courses, belong to the folklore of the subject.

2 Conditional expectations

In this lecture we introduce conditional expectations and study some of their basic properties.

We begin by recalling some terminology. A random variable is a measurable function on a probability space $(\Omega, \mathscr{F}, \mathbb{P})$. In this context, $\Omega$ is often called the sample space and the sets in $\mathscr{F}$ the events. For an integrable random variable $f: \Omega \to \mathbb{R}$ we write
\[ \mathbb{E} f := \int_\Omega f \, d\mathbb{P} \]
for its expectation.

Below we fix a sub-σ-algebra $\mathscr{G}$ of $\mathscr{F}$. For $1 \le p \le \infty$ we denote by $L^p(\Omega, \mathscr{G})$ the subspace of all $f \in L^p(\Omega)$ having a $\mathscr{G}$-measurable representative. With this notation, $L^p(\Omega) = L^p(\Omega, \mathscr{F})$. We will be interested in the problem of finding good approximations of a random variable $f \in L^p(\Omega)$ by random variables in $L^p(\Omega, \mathscr{G})$.

Lemma 2.1. $L^p(\Omega, \mathscr{G})$ is a closed subspace of $L^p(\Omega)$.
Proof. Suppose that $(f_n)_{n=1}^\infty$ is a sequence in $L^p(\Omega, \mathscr{G})$ such that $\lim_{n \to \infty} f_n = f$ in $L^p(\Omega)$. We may assume that the $f_n$ are pointwise defined and $\mathscr{G}$-measurable. After passing to a subsequence (when $1 \le p < \infty$) we may furthermore assume that $\lim_{n \to \infty} f_n = f$ almost surely. The set $C$ of all $\omega \in \Omega$ where the sequence $(f_n(\omega))_{n=1}^\infty$ converges is $\mathscr{G}$-measurable. Put $\tilde f := \lim_{n \to \infty} 1_C f_n$, where the limit exists pointwise. The random variable $\tilde f$ is $\mathscr{G}$-measurable and agrees almost surely with $f$. This shows that $f$ defines an element of $L^p(\Omega, \mathscr{G})$. □

Our aim is to show that $L^p(\Omega, \mathscr{G})$ is the range of a contractive projection in $L^p(\Omega)$. For $p = 2$ this is clear: we have the orthogonal decomposition
\[ L^2(\Omega) = L^2(\Omega, \mathscr{G}) \oplus L^2(\Omega, \mathscr{G})^\perp \]
and the projection we have in mind is the orthogonal projection, denoted by $P_{\mathscr{G}}$, onto $L^2(\Omega, \mathscr{G})$ along this decomposition. Following common usage we write
\[ \mathbb{E}(f|\mathscr{G}) := P_{\mathscr{G}}(f), \qquad f \in L^2(\Omega), \]
and call $\mathbb{E}(f|\mathscr{G})$ the conditional expectation of $f$ with respect to $\mathscr{G}$. Let us emphasise that $\mathbb{E}(f|\mathscr{G})$ is defined as an element of $L^2(\Omega, \mathscr{G})$, that is, as an equivalence class of random variables.

Lemma 2.2. For all $f \in L^2(\Omega)$ and $G \in \mathscr{G}$ we have
\[ \int_G \mathbb{E}(f|\mathscr{G}) \, d\mathbb{P} = \int_G f \, d\mathbb{P}. \]
As a consequence, if $f \ge 0$ almost surely, then $\mathbb{E}(f|\mathscr{G}) \ge 0$ almost surely.

Proof. By definition we have $f - \mathbb{E}(f|\mathscr{G}) \perp L^2(\Omega, \mathscr{G})$. If $G \in \mathscr{G}$, then $1_G \in L^2(\Omega, \mathscr{G})$ and therefore
\[ \int_\Omega 1_G \big(f - \mathbb{E}(f|\mathscr{G})\big) \, d\mathbb{P} = 0. \]
This gives the desired identity. For the second assertion, choose a $\mathscr{G}$-measurable representative of $g := \mathbb{E}(f|\mathscr{G})$ and apply the identity to the $\mathscr{G}$-measurable set $\{g < 0\}$. □

Taking $G = \Omega$ we obtain the identity $\mathbb{E}(\mathbb{E}(f|\mathscr{G})) = \mathbb{E}f$. This will be used in the next lemma, which asserts that the mapping $f \mapsto \mathbb{E}(f|\mathscr{G})$ is $L^1$-bounded.

Lemma 2.3. For all $f \in L^2(\Omega)$ we have $\mathbb{E}|\mathbb{E}(f|\mathscr{G})| \le \mathbb{E}|f|$.

Proof. It suffices to check that $|\mathbb{E}(f|\mathscr{G})| \le \mathbb{E}(|f| \,|\, \mathscr{G})$, since then the lemma follows from $\mathbb{E}|\mathbb{E}(f|\mathscr{G})| \le \mathbb{E}\mathbb{E}(|f| \,|\, \mathscr{G}) = \mathbb{E}|f|$.
Splitting $f$ into positive and negative parts, almost surely we have
\[ |\mathbb{E}(f|\mathscr{G})| = |\mathbb{E}(f^+|\mathscr{G}) - \mathbb{E}(f^-|\mathscr{G})| \le |\mathbb{E}(f^+|\mathscr{G})| + |\mathbb{E}(f^-|\mathscr{G})| = \mathbb{E}(f^+|\mathscr{G}) + \mathbb{E}(f^-|\mathscr{G}) = \mathbb{E}(|f| \,|\, \mathscr{G}). \qquad \Box \]

Since $L^2(\Omega)$ is dense in $L^1(\Omega)$, this lemma shows that the conditional expectation operator has a unique extension to a contractive linear operator on $L^1(\Omega)$, which we also denote by $\mathbb{E}(\cdot|\mathscr{G})$. This operator is a projection (this follows by noting that it is a projection in $L^2(\Omega)$ and then using that $L^2(\Omega)$ is dense in $L^1(\Omega)$) and it is positive in the sense that it maps positive random variables to positive random variables; this follows from Lemma 2.2 by approximation.

Lemma 2.4 (Conditional Jensen inequality). If $\varphi: \mathbb{R} \to \mathbb{R}$ is convex, then for all $f \in L^1(\Omega)$ such that $\varphi \circ f \in L^1(\Omega)$ we have, almost surely,
\[ \varphi \circ \mathbb{E}(f|\mathscr{G}) \le \mathbb{E}(\varphi \circ f|\mathscr{G}). \]

Proof. If $a, b \in \mathbb{R}$ are such that $at + b \le \varphi(t)$ for all $t \in \mathbb{R}$, then the positivity of the conditional expectation operator gives
\[ a\mathbb{E}(f|\mathscr{G}) + b = \mathbb{E}(af + b|\mathscr{G}) \le \mathbb{E}(\varphi \circ f|\mathscr{G}) \]
almost surely. Since $\varphi$ is convex we can find real sequences $(a_n)_{n=1}^\infty$ and $(b_n)_{n=1}^\infty$ such that $\varphi(t) = \sup_{n \ge 1} (a_n t + b_n)$ for all $t \in \mathbb{R}$; we leave the proof of this fact as an exercise. Hence, almost surely,
\[ \varphi \circ \mathbb{E}(f|\mathscr{G}) = \sup_{n \ge 1} \big( a_n \mathbb{E}(f|\mathscr{G}) + b_n \big) \le \mathbb{E}(\varphi \circ f|\mathscr{G}). \qquad \Box \]

Theorem 2.5 (Lp-contractivity). For all $1 \le p \le \infty$ the conditional expectation operator extends to a contractive positive projection on $L^p(\Omega)$ with range $L^p(\Omega, \mathscr{G})$. For $f \in L^p(\Omega)$, the random variable $\mathbb{E}(f|\mathscr{G})$ is the unique element of $L^p(\Omega, \mathscr{G})$ with the property that for all $G \in \mathscr{G}$,
\[ \int_G \mathbb{E}(f|\mathscr{G}) \, d\mathbb{P} = \int_G f \, d\mathbb{P}. \tag{2.1} \]

Proof. For $1 \le p < \infty$ the $L^p$-contractivity follows from Lemma 2.4 applied to the convex function $\varphi(t) = |t|^p$. For $p = \infty$ we argue as follows. If $f \in L^\infty(\Omega)$, then $0 \le |f| \le \|f\|_\infty 1_\Omega$ and therefore $0 \le \mathbb{E}(|f| \,|\, \mathscr{G}) \le \|f\|_\infty 1_\Omega$ almost surely. Hence $\mathbb{E}(|f| \,|\, \mathscr{G}) \in L^\infty(\Omega)$ and $\|\mathbb{E}(|f| \,|\, \mathscr{G})\|_\infty \le \|f\|_\infty$.

For $2 \le p \le \infty$, (2.1) follows from Lemma 2.2.
For $f \in L^p(\Omega)$ with $1 \le p < 2$ we choose a sequence $(f_n)_{n=1}^\infty$ in $L^2(\Omega)$ such that $\lim_{n \to \infty} f_n = f$ in $L^p(\Omega)$. Then $\lim_{n \to \infty} \mathbb{E}(f_n|\mathscr{G}) = \mathbb{E}(f|\mathscr{G})$ in $L^p(\Omega)$ and therefore, for any $G \in \mathscr{G}$,
\[ \int_G \mathbb{E}(f|\mathscr{G}) \, d\mathbb{P} = \lim_{n \to \infty} \int_G \mathbb{E}(f_n|\mathscr{G}) \, d\mathbb{P} = \lim_{n \to \infty} \int_G f_n \, d\mathbb{P} = \int_G f \, d\mathbb{P}. \]
If $g \in L^p(\Omega, \mathscr{G})$ satisfies $\int_G g \, d\mathbb{P} = \int_G f \, d\mathbb{P}$ for all $G \in \mathscr{G}$, then $\int_G g \, d\mathbb{P} = \int_G \mathbb{E}(f|\mathscr{G}) \, d\mathbb{P}$ for all $G \in \mathscr{G}$. Since both $g$ and $\mathbb{E}(f|\mathscr{G})$ are $\mathscr{G}$-measurable, as in the proof of the second part of Lemma 2.2 this implies that $g = \mathbb{E}(f|\mathscr{G})$ almost surely.

In particular, $\mathbb{E}(\mathbb{E}(f|\mathscr{G})|\mathscr{G}) = \mathbb{E}(f|\mathscr{G})$ for all $f \in L^p(\Omega)$ and $\mathbb{E}(f|\mathscr{G}) = f$ for all $f \in L^p(\Omega, \mathscr{G})$. This shows that $\mathbb{E}(\cdot|\mathscr{G})$ is a projection onto $L^p(\Omega, \mathscr{G})$. □

The next two results develop some properties of conditional expectations.

Proposition 2.6.

(1) If $f \in L^1(\Omega)$ and $\mathscr{H}$ is a sub-σ-algebra of $\mathscr{G}$, then almost surely
\[ \mathbb{E}(\mathbb{E}(f|\mathscr{G})|\mathscr{H}) = \mathbb{E}(f|\mathscr{H}). \]
(2) If $f \in L^1(\Omega)$ is independent of $\mathscr{G}$ (that is, $f$ is independent of $1_G$ for all $G \in \mathscr{G}$), then almost surely
\[ \mathbb{E}(f|\mathscr{G}) = \mathbb{E}f. \]
(3) If $f \in L^p(\Omega)$ and $g \in L^q(\Omega, \mathscr{G})$ with $1 \le p, q \le \infty$, $\frac{1}{p} + \frac{1}{q} = 1$, then almost surely
\[ \mathbb{E}(gf|\mathscr{G}) = g\,\mathbb{E}(f|\mathscr{G}). \]

Proof. (1): For all $H \in \mathscr{H}$ we have $\int_H \mathbb{E}(\mathbb{E}(f|\mathscr{G})|\mathscr{H}) \, d\mathbb{P} = \int_H \mathbb{E}(f|\mathscr{G}) \, d\mathbb{P} = \int_H f \, d\mathbb{P}$ by Theorem 2.5, first applied to $\mathscr{H}$ and then to $\mathscr{G}$ (observe that $H \in \mathscr{G}$). Now the result follows from the uniqueness part of the theorem.

(2): For all $G \in \mathscr{G}$ we have $\int_G \mathbb{E}f \, d\mathbb{P} = \mathbb{E}1_G \, \mathbb{E}f = \mathbb{E}(1_G f) = \int_G f \, d\mathbb{P}$, and the result follows from the uniqueness part of Theorem 2.5.

(3): For all $G, G' \in \mathscr{G}$ we have $\int_G 1_{G'} \mathbb{E}(f|\mathscr{G}) \, d\mathbb{P} = \int_{G \cap G'} \mathbb{E}(f|\mathscr{G}) \, d\mathbb{P} = \int_{G \cap G'} f \, d\mathbb{P} = \int_G f 1_{G'} \, d\mathbb{P}$. Hence $\mathbb{E}(f 1_{G'}|\mathscr{G}) = 1_{G'} \mathbb{E}(f|\mathscr{G})$ by the uniqueness part of Theorem 2.5. By linearity, this gives the result for simple functions $g$, and the general case follows by approximation. □

Example 2.7. Let $f: (-r, r) \to \mathbb{R}$ be integrable and let $\mathscr{G}$ be the σ-algebra of all symmetric Borel sets in $(-r, r)$ (we call a set $B$ symmetric if $B = -B$). Then
\[ \mathbb{E}(f|\mathscr{G}) = \tfrac{1}{2}(f + \check{f}) \quad \text{almost surely}, \]
where $\check{f}(x) = f(-x)$. This is verified by checking the condition (2.1).
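A discrete analogue of Example 2.7 (a symmetric four-point grid with counting measure instead of $(-r, r)$ with Lebesgue measure; the function is an arbitrary sample) lets one check condition (2.1) exactly:

```python
# G consists of the symmetric subsets (B = -B); the candidate for E(f|G) is (f + f_check)/2.
points = [-2, -1, 1, 2]
f = {x: x ** 3 + x ** 2 for x in points}            # arbitrary sample function
g = {x: 0.5 * (f[x] + f[-x]) for x in points}       # (f + f_check)/2

assert all(g[x] == g[-x] for x in points)           # g is G-measurable (symmetric)
symmetric_sets = [set(), {-1, 1}, {-2, 2}, {-2, -1, 1, 2}]
for B in symmetric_sets:                            # condition (2.1) over every set in G
    assert sum(g[x] for x in B) == sum(f[x] for x in B)
```

The verification works because summing $f(-x)$ over a set with $B = -B$ gives the same total as summing $f(x)$, which is precisely the change-of-variables step in the continuous case.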
Example 2.8. Let $f: [0, 1) \to \mathbb{R}$ be integrable and let $\mathscr{G}$ be the σ-algebra generated by the sets $I_n := [\frac{n-1}{N}, \frac{n}{N})$, $n = 1, \dots, N$. Then
\[ \mathbb{E}(f|\mathscr{G}) = \sum_{n=1}^N m_n 1_{I_n} \quad \text{almost surely}, \]
where
\[ m_n = \frac{1}{|I_n|} \int_{I_n} f(x) \, dx. \]
Again this is verified by checking the condition (2.1).

If $f$ and $g$ are random variables on $(\Omega, \mathscr{F}, \mathbb{P})$, we write $\mathbb{E}(g|f) := \mathbb{E}(g|\sigma(f))$, where $\sigma(f)$ is the σ-algebra generated by $f$.

Example 2.9. Let $f_1, \dots, f_N$ be independent integrable random variables satisfying $\mathbb{E}(f_1) = \dots = \mathbb{E}(f_N) = 0$, and put $S_N := f_1 + \dots + f_N$. Then, for $1 \le n \le N$,
\[ \mathbb{E}(S_N|f_n) = f_n \quad \text{almost surely}. \]
This follows from the following almost sure identities:
\[ \mathbb{E}(S_N|f_n) = \mathbb{E}(f_n|f_n) + \sum_{m \ne n} \mathbb{E}(f_m|f_n) = f_n + \sum_{m \ne n} \mathbb{E}(f_m) = f_n. \]

Example 2.10. Let $f_1, \dots, f_N$ be independent and identically distributed integrable random variables and put $S_N := f_1 + \dots + f_N$. Then $\mathbb{E}(f_1|S_N) = \dots = \mathbb{E}(f_N|S_N)$ almost surely, and therefore
\[ \mathbb{E}(f_n|S_N) = S_N/N \quad \text{almost surely}, \qquad n = 1, \dots, N. \]
It is clear that the second identity follows from the first. To prove the first, let $\mu$ denote the (common) distribution of the random variables $f_n$. By independence, the distribution of the $\mathbb{R}^N$-valued random variable $(f_1, \dots, f_N)$ equals the $N$-fold product measure $\mu^N := \mu \times \dots \times \mu$. Then, for any Borel set $B \in \mathscr{B}(\mathbb{R})$ we have, using the substitution formula and Fubini's theorem,
\[ \int_{\{S_N \in B\}} \mathbb{E}(f_n|S_N) \, d\mathbb{P} = \int_{\{S_N \in B\}} f_n \, d\mathbb{P} = \int_{\{x_1 + \dots + x_N \in B\}} x_n \, d\mu^N(x) = \int_{\mathbb{R}^{N-1}} \int_{\{y \in B - z_1 - \dots - z_{N-1}\}} y \, d\mu(y) \, d\mu^{N-1}(z_1, \dots, z_{N-1}), \]
which is independent of $n$.

2.1 Notes

We recommend Williams [2] for an introduction to the subject. The bible for modern probabilists is Kallenberg [1]. The definition of conditional expectations starting from orthogonal projections in $L^2(\Omega)$ follows Kallenberg [1]. Many textbooks prefer the shorter (but less elementary and transparent) route via the Radon-Nikodým theorem.
In this approach, one notes that if $f$ is an integrable random variable on $(\Omega, \mathscr{F}, \mathbb{P})$ and $\mathscr{G}$ is a sub-σ-algebra of $\mathscr{F}$, then
\[ Q(G) := \int_G f \, d\mathbb{P}, \qquad G \in \mathscr{G}, \]
defines a measure $Q$ on $(\Omega, \mathscr{G})$ when $f \ge 0$ (and a finite signed measure in general) which is absolutely continuous with respect to $\mathbb{P}|_{\mathscr{G}}$. Hence, by the Radon-Nikodým theorem, $Q$ has an integrable density with respect to $\mathbb{P}|_{\mathscr{G}}$. This density equals the conditional expectation $\mathbb{E}(f|\mathscr{G})$ almost surely.

References

1. O. Kallenberg, "Foundations of Modern Probability", second ed., Probability and its Applications (New York), Springer-Verlag, New York, 2002.
2. D. Williams, "Probability with Martingales", Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991.