Jan van Neerven
Measure, Integral, and
Conditional Expectation
Handout
May 2, 2012
1 Measure and Integral
In this lecture we review elements of the Lebesgue theory of integration.
1.1 σ-Algebras
Let Ω be a set. A σ-algebra in Ω is a collection F of subsets of Ω with the
following properties:
(1) Ω ∈ F;
(2) A ∈ F implies ∁A ∈ F;
(3) A1, A2, · · · ∈ F implies ⋃_{n=1}^∞ An ∈ F.
Here, ∁A = Ω \ A is the complement of A.
A measurable space is a pair (Ω, F ), where Ω is a set and F is a σ-algebra
in Ω.
The above three properties express that F is non-empty, closed under
taking complements, and closed under taking countable unions. From
⋂_{n=1}^∞ An = ∁ ⋃_{n=1}^∞ ∁An
it follows that F is closed under taking countable intersections. Clearly, F is
closed under finite unions and intersections as well.
Example 1.1. In every set Ω there is a minimal σ-algebra, {∅, Ω}, and a maximal σ-algebra, the set P(Ω) of all subsets of Ω.
Example 1.2. If A is any collection of subsets of Ω, then there is a smallest
σ-algebra, denoted σ(A ), containing A . This σ-algebra is obtained as the intersection of all σ-algebras containing A and is called the σ-algebra generated
by A .
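When Ω is finite, the generated σ-algebra of Example 1.2 can be computed directly: start from A and repeatedly add complements and pairwise unions until the collection stabilises. A small illustrative sketch (the choice of Ω and the generators is a made-up example):

```python
def generate_sigma_algebra(omega, generators):
    """Smallest sigma-algebra on the finite set `omega` containing `generators`.

    On a finite space, closure under complements and pairwise unions already
    gives closure under countable unions, so a fixed-point iteration suffices.
    """
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:
            return family
        family |= new

# sigma-algebra generated by {1} and {2, 3} on Omega = {1, 2, 3, 4}:
# its atoms are {1}, {2, 3}, {4}, so it contains 2^3 = 8 sets in total
sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {2, 3}])
```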
Example 1.3. Let (Ω1 , F1 ), (Ω2 , F2 ), . . . be a sequence of measurable spaces.
The product
(∏_{n=1}^∞ Ωn, ∏_{n=1}^∞ Fn)
is defined by ∏_{n=1}^∞ Ωn = Ω1 × Ω2 × . . . (the cartesian product), and ∏_{n=1}^∞ Fn = F1 × F2 × . . . is defined as the σ-algebra generated by all sets of the form
A1 × · · · × AN × Ω_{N+1} × Ω_{N+2} × . . .
with N = 1, 2, . . . and An ∈ Fn for n = 1, . . . , N. Finite products ∏_{n=1}^N (Ωn, Fn) are defined similarly; here one takes the σ-algebra in Ω1 × · · · × ΩN generated by all sets of the form A1 × · · · × AN with An ∈ Fn for n = 1, . . . , N.
Example 1.4. The Borel σ-algebra of Rd , notation B(Rd ), is the σ-algebra
generated by all open sets in Rd . The sets in this σ-algebra are called the
Borel sets of Rd . As an exercise, check that
(Rd, B(Rd)) = ∏_{n=1}^d (R, B(R)).
Hint: Every open set in Rd is a countable union of rectangles of the form
[a1 , b1 ) × · · · × [ad , bd ).
Example 1.5. Let f : Ω → R be any function. For Borel sets B ∈ B(R) we
define
{f ∈ B} := {ω ∈ Ω : f (ω) ∈ B}.
The collection
σ(f) = { {f ∈ B} : B ∈ B(R) }
is a σ-algebra in Ω, the σ-algebra generated by f.
1.2 Measurable functions
Let (Ω1 , F1 ) and (Ω2 , F2 ) be measurable spaces. A function f : Ω1 → Ω2 is
said to be measurable if {f ∈ A} ∈ F1 for all A ∈ F2 . Clearly, compositions
of measurable functions are measurable.
Proposition 1.6. If A2 is a subset of F2 with the property that σ(A2 ) = F2 ,
then a function f : Ω1 → Ω2 is measurable if and only if
{f ∈ A} ∈ F1 for all A ∈ A2 .
Proof. Just notice that {A ∈ F2 : {f ∈ A} ∈ F1} is a sub-σ-algebra of F2 containing A2. □
In most applications we are concerned with functions f from a measurable
space (Ω, F ) to (R, B(R)). Such functions are said to be Borel measurable. In
what follows we summarise some measurability properties of Borel measurable
functions (the adjective ‘Borel’ will be omitted when no confusion is likely to
arise).
Proposition 1.6 implies that a function f : Ω → R is Borel measurable
if and only if {f > a} ∈ F for all a ∈ R. From this it follows that linear
combinations, products, and quotients (if defined) of measurable functions are
measurable. For example, if f and g are measurable, then f + g is measurable
since
{f + g > a} = ⋃_{q∈Q} ({f > q} ∩ {g > a − q}).
This, in turn, implies that f g is measurable, as we have
f g = ½ [(f + g)² − (f² + g²)].
If f = sup_{n≥1} fn pointwise and each fn is measurable, then f is measurable since
{f > a} = ⋃_{n=1}^∞ {fn > a}.
It follows from inf_{n≥1} fn = − sup_{n≥1} (−fn) that the pointwise infimum of a sequence of measurable functions is measurable as well. From this we get that the pointwise limes superior and limes inferior of measurable functions are measurable, since
lim sup_{n→∞} fn = lim_{n→∞} sup_{k≥n} fk = inf_{n≥1} sup_{k≥n} fk
and lim inf_{n→∞} fn = − lim sup_{n→∞} (−fn). In particular, the pointwise limit lim_{n→∞} fn of a sequence of measurable functions is measurable.
In these results, it is implicitly understood that the suprema and infima
exist and are finite pointwise. This restriction can be lifted by considering
functions f : Ω → [−∞, ∞]. Such a function is said to be measurable if the sets {f ∈ B}, B ∈ B(R), as well as the sets {f = ±∞} are in F. This extension will also be useful later on.
1.3 Measures
Let (Ω, F ) be a measurable space. A measure is a mapping µ : F → [0, ∞]
with the following properties:
(1) µ(∅) = 0;
(2) for all disjoint sets A1, A2, . . . in F we have µ(⋃_{n=1}^∞ An) = ∑_{n=1}^∞ µ(An).
A triple (Ω, F, µ), with µ a measure on a measurable space (Ω, F), is called a measure space. A measure space (Ω, F, µ) is called finite if µ is a finite measure, that is, if µ(Ω) < ∞. If µ(Ω) = 1, then µ is called a probability measure and (Ω, F, µ) is called a probability space. In probability theory, it is customary to use the symbol P for a probability measure.
The following properties of measures are easily checked:
(a) if A1 ⊆ A2 in F, then µ(A1) ≤ µ(A2);
(b) if A1, A2, . . . in F, then
µ(⋃_{n=1}^∞ An) ≤ ∑_{n=1}^∞ µ(An);
Hint: Consider the disjoint sets An+1 \ (A1 ∪ · · · ∪ An).
(c) if A1 ⊆ A2 ⊆ . . . in F, then
µ(⋃_{n=1}^∞ An) = lim_{n→∞} µ(An);
Hint: Consider the disjoint sets An+1 \ An.
(d) if A1 ⊇ A2 ⊇ . . . in F and µ(A1) < ∞, then
µ(⋂_{n=1}^∞ An) = lim_{n→∞} µ(An).
Hint: Take complements.
Note that in (c) and (d) the limits (in [0, ∞]) exist by monotonicity.
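Properties (c) and (d) can be seen concretely for a discrete measure. A throwaway illustration (not part of the handout), using the probability measure µ{n} = 2^{−(n+1)} on the non-negative integers:

```python
from fractions import Fraction

def mu(A):
    """The probability measure mu{n} = 2^-(n+1) on the non-negative integers."""
    return sum(Fraction(1, 2 ** (n + 1)) for n in A)

# (c) continuity from below: A_n = {0, ..., n-1} increases to all of N,
# and mu(A_n) = 1 - 2^-n increases to mu(N) = 1
below = [mu(range(n)) for n in range(1, 25)]

# (d) continuity from above: B_n = {n, n+1, ...} decreases to the empty set;
# we truncate at 60, since the tail beyond that has negligible mass
above = [mu(range(n, 60)) for n in range(25)]
```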
Example 1.7. Let (Ω, F) and (Ω̃, F̃) be measurable spaces and let µ be a measure on (Ω, F). If f : Ω → Ω̃ is measurable, then
f∗(µ)(Ã) := µ{f ∈ Ã},  Ã ∈ F̃,
defines a measure f∗(µ) on (Ω̃, F̃). This measure is called the image of µ
under f . In the context where (Ω, F , P) is a probability space, the image of
P under f is called the distribution of f .
Concerning existence and uniqueness of measures we have the following
two fundamental results.
Theorem 1.8 (Caratheodory). Let A be a non-empty collection of subsets of a set Ω that is closed under taking complements and finite unions. If µ : A → [0, ∞] is a mapping such that properties (1) and (2) in the definition of a measure hold, then there exists a unique measure µ̃ : σ(A) → [0, ∞] such that µ̃|A = µ.
1.3 Measures
5
The proof is lengthy and rather tedious; since in the typical situation the measures one works with are given in advance, we omit it. The uniqueness part is a consequence of the next result, which is useful in many other situations as well.
Theorem 1.9 (Dynkin). Let µ1 and µ2 be two finite measures defined on a
measurable space (Ω, F ). Let A ⊆ F be a collection of sets with the following
properties:
(1) A is closed under finite intersections;
(2) σ(A ), the σ-algebra generated by A , equals F .
If µ1 (A) = µ2 (A) for all A ∈ A , then µ1 = µ2 .
Proof. Let D denote the collection of all sets D ∈ F with µ1 (D) = µ2 (D).
Then A ⊆ D and D is a Dynkin system, that is,
• Ω ∈ D;
• if D1 ⊆ D2 with D1, D2 ∈ D, then also D2 \ D1 ∈ D;
• if D1 ⊆ D2 ⊆ . . . with all Dn ∈ D, then also ⋃_{n=1}^∞ Dn ∈ D.
The finiteness of µ1 and µ2 is used to prove the second property.
By assumption we have D ⊆ F = σ(A ); we will show that σ(A ) ⊆ D. To
this end let D0 denote the smallest Dynkin system in F containing A . We
will show that σ(A ) ⊆ D0 . In view of D0 ⊆ D, this will prove the lemma.
Let C = {D0 ∈ D0 : D0 ∩ A ∈ D0 for all A ∈ A }. Then C is a Dynkin
system and A ⊆ C since A is closed under taking finite intersections. It
follows that D0 ⊆ C , since D0 is the smallest Dynkin system containing A .
But obviously, C ⊆ D0 , and therefore C = D0 .
Now let C 0 = {D0 ∈ D0 : D0 ∩ D ∈ D0 for all D ∈ D0 }. Then C 0
is a Dynkin system and the fact that C = D0 implies that A ⊆ C 0 . Hence
D0 ⊆ C 0 , since D0 is the smallest Dynkin system containing A . But obviously,
C 0 ⊆ D0 , and therefore C 0 = D0 .
It follows that D0 is closed under taking finite intersections. But a Dynkin system with this property is a σ-algebra. Thus, D0 is a σ-algebra, and now A ⊆ D0 implies that also σ(A) ⊆ D0. □
Example 1.10 (Lebesgue measure). In Rd, let A be the collection of all finite unions of rectangles [a1, b1) × · · · × [ad, bd). Define µ : A → [0, ∞] by
µ([a1, b1) × · · · × [ad, bd)) := ∏_{n=1}^d (bn − an)
and extend this definition to finite disjoint unions by additivity. It can be
verified that A and µ satisfy the assumptions of Caratheodory’s theorem.
The resulting measure µ on σ(A ) = B(Rd ), the Borel σ-algebra of Rd , is
called the Lebesgue measure.
Example 1.11. Let (Ω1, F1, P1), (Ω2, F2, P2), . . . be a sequence of probability spaces. There exists a unique probability measure P = ∏_{n=1}^∞ Pn on F = ∏_{n=1}^∞ Fn which satisfies
P(A1 × · · · × AN × Ω_{N+1} × Ω_{N+2} × . . . ) = ∏_{n=1}^N Pn(An)
for all N ≥ 1 and An ∈ Fn for n = 1, . . . , N. Finite products of arbitrary measures µ1, . . . , µN can be defined similarly.
For example, the Lebesgue measure on Rd is the product measure of d
copies of the Lebesgue measure on R.
As an illustration of the use of Dynkin's lemma we conclude this section with an elementary result on independence. In probability theory, a measurable function f : Ω → R, where (Ω, F, P) is a probability space, is called a random variable. Random variables f1, . . . , fN : Ω → R are said to be independent if for all B1, . . . , BN ∈ B(R) we have
P{f1 ∈ B1, . . . , fN ∈ BN} = ∏_{n=1}^N P{fn ∈ Bn}.
Proposition 1.12. The random variables f1, . . . , fN : Ω → R are independent if and only if ν = ∏_{n=1}^N νn, where ν is the distribution of (f1, . . . , fN) and νn is the distribution of fn, n = 1, . . . , N.
Proof. By definition, f1, . . . , fN are independent if and only if ν(B) = ∏_{n=1}^N νn(Bn) for all B = B1 × · · · × BN with Bn ∈ B(R) for all n = 1, . . . , N. Since these sets generate B(RN), by Dynkin's lemma the latter implies ν = ∏_{n=1}^N νn. □
1.4 The Lebesgue integral
Let (Ω, F) be a measurable space. A simple function is a function f : Ω → R that can be represented in the form
f = ∑_{n=1}^N cn 1_{An}
with coefficients cn ∈ R and disjoint sets An ∈ F for all n = 1, . . . , N .
Proposition 1.13. A function f : Ω → [−∞, ∞] is measurable if and only if it is the pointwise limit of a sequence of simple functions. If in addition f is non-negative, we may arrange that 0 ≤ fn ↑ f pointwise.
Proof. Take, for instance, fn = ∑_{m=−2^{2n}}^{2^{2n}} (m/2^n) 1_{{f ∈ [m/2^n, (m+1)/2^n)}}. □
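For a non-negative f, the dyadic construction in the proof is easy to visualise numerically: round f down to the grid {m/2^n} and cap the values at 2^n. A sketch (the function and grid below are arbitrary choices):

```python
import numpy as np

def dyadic_simple(f_vals, n):
    """n-th simple approximation as in Proposition 1.13 (non-negative case):
    round f down to the dyadic grid {m / 2^n} and cap the values at 2^n."""
    return np.minimum(np.floor(f_vals * 2 ** n) / 2 ** n, float(2 ** n))

x = np.linspace(0.0, 10.0, 1001)
f = np.exp(x / 3.0)                      # an arbitrary non-negative function
approx = [dyadic_simple(f, n) for n in range(8)]
```

Each refinement both halves the grid spacing and doubles the cap, which is what makes the sequence increase pointwise to f.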
1.4.1 Integration of simple functions
Let (Ω, F, µ) be a measure space. For a non-negative simple function f = ∑_{n=1}^N cn 1_{An} we define
∫_Ω f dµ := ∑_{n=1}^N cn µ(An).
We allow µ(An ) to be infinite; this causes no problems because the coefficients
cn are non-negative (we use the convention 0 · ∞ = 0). It is easy to check
that this integral is well-defined, in the sense that it does not depend on the
particular representation of f as a simple function. Also, the integral is linear
with respect to addition and multiplication with non-negative scalars,
∫_Ω (a f1 + b f2) dµ = a ∫_Ω f1 dµ + b ∫_Ω f2 dµ,
and monotone in the sense that if 0 ≤ f1 ≤ f2 pointwise, then
∫_Ω f1 dµ ≤ ∫_Ω f2 dµ.
1.4.2 Integration of non-negative functions
In what follows, a non-negative function is a function with values in [0, ∞].
For a non-negative measurable function f we choose a sequence of simple functions 0 ≤ fn ↑ f and define
∫_Ω f dµ := lim_{n→∞} ∫_Ω fn dµ.
The following lemma implies that this definition does not depend on the approximating sequence.
Lemma 1.14. For a non-negative measurable function f and non-negative simple functions fn and g such that 0 ≤ fn ↑ f and g ≤ f pointwise we have
∫_Ω g dµ ≤ lim_{n→∞} ∫_Ω fn dµ.
Proof. First consider the case g = 1_A. Fix ε > 0 arbitrary and let An = {1_A fn > 1 − ε}. Then A1 ⊆ A2 ⊆ . . . ↑ A and therefore µ(An) ↑ µ(A). Since 1_A fn ≥ (1 − ε)1_{An},
lim_{n→∞} ∫_Ω 1_A fn dµ ≥ (1 − ε) lim_{n→∞} µ(An) = (1 − ε)µ(A) = (1 − ε) ∫_Ω g dµ.
Since ε > 0 was arbitrary, this proves the lemma for g = 1_A. The general case follows by linearity. □
The integral is linear and monotone on the set of non-negative measurable functions. Indeed, if f and g are such functions and 0 ≤ fn ↑ f and 0 ≤ gn ↑ g, then for a, b ≥ 0 we have 0 ≤ a fn + b gn ↑ a f + b g and therefore
∫_Ω (a f + b g) dµ = lim_{n→∞} ∫_Ω (a fn + b gn) dµ = a lim_{n→∞} ∫_Ω fn dµ + b lim_{n→∞} ∫_Ω gn dµ = a ∫_Ω f dµ + b ∫_Ω g dµ.
If such f and g satisfy f ≤ g pointwise, then by noting that 0 ≤ fn ≤ max{fn, gn} ↑ g we see that
∫_Ω f dµ = lim_{n→∞} ∫_Ω fn dµ ≤ lim_{n→∞} ∫_Ω max{fn, gn} dµ ≤ ∫_Ω g dµ.
The next theorem is the cornerstone of integration theory.
Theorem 1.15 (Monotone Convergence Theorem). Let 0 ≤ f1 ≤ f2 ≤ . . . be a sequence of non-negative measurable functions converging pointwise to a function f. Then,
lim_{n→∞} ∫_Ω fn dµ = ∫_Ω f dµ.
Proof. First note that f is non-negative and measurable. For each n ≥ 1 choose a sequence of simple functions 0 ≤ fnk ↑k fn. Set
gnk := max{f1k, . . . , fnk}.
For m ≤ n we have gmk ≤ gnk. Also, for k ≤ l we have fmk ≤ fml, m = 1, . . . , n, and therefore gnk ≤ gnl. We conclude that the functions gnk are monotone in both indices.
From fmk ≤ fm ≤ fn, 1 ≤ m ≤ n, we see that fnk ≤ gnk ≤ fn, and we conclude that 0 ≤ gnk ↑k fn. From
fn = lim_{k→∞} gnk ≤ lim_{k→∞} gkk ≤ f
we deduce that 0 ≤ gkk ↑ f. Recalling that gkk ≤ fk, it follows that
∫_Ω f dµ = lim_{k→∞} ∫_Ω gkk dµ ≤ lim_{k→∞} ∫_Ω fk dµ ≤ ∫_Ω f dµ. □
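A quick numerical illustration (not part of the handout): f(x) = 1/√x on (0, 1) is unbounded but has integral 2, and the truncations fn = min(f, n) increase to f with integrals 2 − 1/n increasing to 2, as the theorem predicts. Midpoint Riemann sums stand in for the integrals:

```python
import numpy as np

# midpoint grid on (0, 1); midpoint sums approximate the Lebesgue integrals
x = (np.arange(200_000) + 0.5) / 200_000
f = 1.0 / np.sqrt(x)

# the integral of f_n = min(f, n) equals 2 - 1/n, increasing to 2
integrals = [float(np.mean(np.minimum(f, n))) for n in (1, 2, 4, 8, 16, 32)]
```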
Example 1.16. In the situation of Example 1.7 we have the following substitution formula. For any measurable f : Ω1 → Ω2 and non-negative measurable φ : Ω2 → R,
∫_{Ω1} φ ◦ f dµ = ∫_{Ω2} φ d f∗(µ).
To prove this, note that this is trivial for indicator functions φ = 1_A with A ∈ F2. By linearity, the identity extends to non-negative simple functions φ, and by monotone convergence (using Proposition 1.13) to non-negative measurable functions φ.
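On a finite space the substitution formula is a one-line computation. An illustrative sketch with made-up µ, f, and φ:

```python
from fractions import Fraction

# a toy measure on Omega1 = {0,...,5}, the map f(w) = w mod 3, and some phi
mu = {w: Fraction(1, 6) for w in range(6)}
f = lambda w: w % 3
phi = lambda y: y * y + 1

# image measure f_*(mu) on Omega2 = {0, 1, 2}
image = {y: sum(m for w, m in mu.items() if f(w) == y) for y in range(3)}

# substitution formula: int_Omega1 phi(f) dmu = int_Omega2 phi d f_*(mu)
lhs = sum(phi(f(w)) * m for w, m in mu.items())
rhs = sum(phi(y) * m for y, m in image.items())
```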
From the monotone convergence theorem we deduce the following useful
corollary.
Theorem 1.17 (Fatou Lemma). Let (fn)_{n=1}^∞ be a sequence of non-negative measurable functions on (Ω, F, µ). Then
∫_Ω lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.
Proof. From inf_{k≥n} fk ≤ fm, m ≥ n, we infer
∫_Ω inf_{k≥n} fk dµ ≤ inf_{m≥n} ∫_Ω fm dµ.
Hence, by the monotone convergence theorem,
∫_Ω lim inf_{n→∞} fn dµ = ∫_Ω lim_{n→∞} inf_{k≥n} fk dµ = lim_{n→∞} ∫_Ω inf_{k≥n} fk dµ ≤ lim_{n→∞} inf_{m≥n} ∫_Ω fm dµ = lim inf_{n→∞} ∫_Ω fn dµ. □
1.4.3 Integration of real-valued functions
A measurable function f : Ω → [−∞, ∞] is called integrable if
∫_Ω |f| dµ < ∞.
Clearly, if f and g are measurable and |g| ≤ |f| pointwise, then g is integrable if f is integrable. In particular, if f is integrable, then the non-negative functions f⁺ and f⁻ are integrable, and we define
∫_Ω f dµ := ∫_Ω f⁺ dµ − ∫_Ω f⁻ dµ.
For a set A ∈ F we write
∫_A f dµ := ∫_Ω 1_A f dµ,
noting that 1A f is integrable. The monotonicity and additivity properties of
this integral carry over to this more general situation, provided we assume
that the functions we integrate are integrable.
The next result is among the most useful in analysis.
Theorem 1.18 (Dominated Convergence Theorem). Let (fn)_{n=1}^∞ be a sequence of integrable functions such that lim_{n→∞} fn = f pointwise. If there exists an integrable function g such that |fn| ≤ |g| for all n ≥ 1, then
lim_{n→∞} ∫_Ω |fn − f| dµ = 0.
In particular,
lim_{n→∞} ∫_Ω fn dµ = ∫_Ω f dµ.
Proof. We make the preliminary observation that if (hn)_{n=1}^∞ is a sequence of non-negative measurable functions such that lim_{n→∞} hn = 0 pointwise and h is a non-negative integrable function such that hn ≤ h for all n ≥ 1, then by the Fatou lemma
∫_Ω h dµ = ∫_Ω lim inf_{n→∞} (h − hn) dµ ≤ lim inf_{n→∞} ∫_Ω (h − hn) dµ = ∫_Ω h dµ − lim sup_{n→∞} ∫_Ω hn dµ.
Since ∫_Ω h dµ is finite, it follows that 0 ≤ lim sup_{n→∞} ∫_Ω hn dµ ≤ 0 and therefore
lim_{n→∞} ∫_Ω hn dµ = 0.
The theorem follows by applying this to hn = |fn − f| and h = 2|g|. □
Let (Ω, F, P) be a probability space. The integral of an integrable random variable f : Ω → R is called the expectation of f, notation
Ef = ∫_Ω f dP.
Proposition 1.19. If f : Ω → R and g : Ω → R are independent integrable random variables, then f g : Ω → R is integrable and
E(f g) = Ef · Eg.
Proof. To prove this we first observe that for any non-negative Borel measurable φ : R → R, the random variables φ ◦ f and φ ◦ g are independent. Taking φ(x) = x⁺ and φ(x) = x⁻, it follows that we may consider the positive and negative parts of f and g separately. In doing so, we may assume that f and g are non-negative.
Denoting the distributions of f, g, and (f, g) by νf, νg, and ν(f,g) respectively, by the substitution formula (with φ(x, y) = xy 1_{[0,∞)²}(x, y)) and the fact that ν(f,g) = νf × νg we obtain
E(f g) = ∫_{[0,∞)²} xy dν(f,g)(x, y) = ∫_{[0,∞)²} xy d(νf × νg)(x, y) = ∫_{[0,∞)} ∫_{[0,∞)} xy dνf(x) dνg(y) = Ef · Eg.
This proves the integrability of f g along with the desired identity. □
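A Monte Carlo sanity check of Proposition 1.19 (the distributions below are arbitrary choices): for large samples of independent f and g, the empirical mean of fg should match the product of the empirical means up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
f = rng.exponential(2.0, n)        # E f = 2
g = rng.uniform(-1.0, 3.0, n)      # E g = 1; drawn independently of f

lhs = float(np.mean(f * g))        # estimates E(fg)
rhs = float(np.mean(f) * np.mean(g))
```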
1.4.4 The role of null sets
We begin with a simple observation.
Proposition 1.20. Let f be a non-negative measurable function.
(1) If ∫_Ω f dµ < ∞, then µ{f = ∞} = 0.
(2) If ∫_Ω f dµ = 0, then µ{f ≠ 0} = 0.
Proof. For all c > 0 we have 0 ≤ c 1_{{f=∞}} ≤ f and therefore
0 ≤ c µ{f = ∞} ≤ ∫_Ω f dµ.
The first result follows from this by letting c → ∞. For the second, note that for all n ≥ 1 we have (1/n) 1_{{f ≥ 1/n}} ≤ f and therefore
(1/n) µ{f ≥ 1/n} ≤ ∫_Ω f dµ = 0.
It follows that µ{f ≥ 1/n} = 0. Now note that {f > 0} = ⋃_{n=1}^∞ {f ≥ 1/n}. □
Proposition 1.21. If f is integrable and µ{f ≠ 0} = 0, then ∫_Ω f dµ = 0.
Proof. By considering f⁺ and f⁻ separately we may assume f is non-negative. Choose simple functions 0 ≤ fn ↑ f. Then µ{fn > 0} ≤ µ{f > 0} = 0 and therefore ∫_Ω fn dµ = 0 for all n ≥ 1. The result follows from this. □
What these results show is that in everything that has been said so far, we
may replace pointwise convergence by pointwise convergence µ-almost everywhere, where the latter means that we allow an exceptional set of µ-measure
zero. This applies, in particular, to the monotonicity assumption in the monotone convergence theorem and the convergence and domination assumptions
in the dominated convergence theorem.
1.5 L^p-spaces
Let µ be a measure on (Ω, F) and fix 1 ≤ p < ∞. We define ℒ^p(µ) as the set of all measurable functions f : Ω → R such that
∫_Ω |f|^p dµ < ∞.
For such functions we define
‖f‖_p := (∫_Ω |f|^p dµ)^{1/p}.
The next result shows that ℒ^p(µ) is a vector space:
Proposition 1.22 (Minkowski's inequality). For all f, g ∈ ℒ^p(µ) we have f + g ∈ ℒ^p(µ) and
‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.
Proof. By elementary calculus it is checked that for all non-negative real numbers a and b one has
(a + b)^p = inf_{t∈(0,1)} [ t^{1−p} a^p + (1 − t)^{1−p} b^p ].
Applying this identity to |f(x)| and |g(x)| and integrating with respect to µ, for all fixed t ∈ (0, 1) we obtain
∫_Ω |f(x) + g(x)|^p dµ(x) ≤ ∫_Ω (|f(x)| + |g(x)|)^p dµ(x) ≤ t^{1−p} ∫_Ω |f(x)|^p dµ(x) + (1 − t)^{1−p} ∫_Ω |g(x)|^p dµ(x).
Stated differently, this says that
‖f + g‖_p^p ≤ t^{1−p} ‖f‖_p^p + (1 − t)^{1−p} ‖g‖_p^p.
Taking the infimum over all t ∈ (0, 1) gives the result. □
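The infimum identity used in the proof is easy to spot-check numerically on a grid over t ∈ (0, 1); the test values of a, b, p below are arbitrary:

```python
import numpy as np

def inf_over_t(a, b, p):
    """Grid approximation of inf over t in (0,1) of t^(1-p) a^p + (1-t)^(1-p) b^p."""
    t = np.linspace(1e-6, 1.0 - 1e-6, 200_001)
    return float(np.min(t ** (1 - p) * a ** p + (1 - t) ** (1 - p) * b ** p))

# the infimum is attained at t = a / (a + b) and equals (a + b)^p
checks = [(1.0, 2.0, 3.0), (0.5, 4.0, 1.5), (2.0, 2.0, 2.0)]
errors = [abs(inf_over_t(a, b, p) - (a + b) ** p) for a, b, p in checks]
```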
In spite of this result, ‖·‖_p is not a norm on ℒ^p(µ), because ‖f‖_p = 0 only implies that f = 0 µ-almost everywhere. In order to get around this imperfection, we define an equivalence relation ∼ on ℒ^p(µ) by
f ∼ g ⇔ f = g µ-almost everywhere.
The equivalence class of a function f modulo ∼ is denoted by [f]. On the quotient space
L^p(µ) := ℒ^p(µ)/∼
we define a scalar multiplication and addition in the natural way:
c[f] := [cf],  [f] + [g] := [f + g].
We leave it as an exercise to check that both operations are well defined. With these operations, L^p(µ) is a normed vector space with respect to the norm
‖[f]‖_p := ‖f‖_p.
Following common practice we shall make no distinction between a function in ℒ^p(µ) and its equivalence class in L^p(µ).
The next result states that L^p(µ) is a Banach space (which, by definition, is a complete normed vector space):
Theorem 1.23. The normed space L^p(µ) is complete.
Proof. Let (fn)_{n=1}^∞ be a Cauchy sequence with respect to the norm ‖·‖_p of L^p(µ). By passing to a subsequence we may assume that
‖f_{n+1} − f_n‖_p ≤ 1/2^n,  n = 1, 2, . . .
Define the non-negative measurable functions
g_N := ∑_{n=0}^{N−1} |f_{n+1} − f_n|,  g := ∑_{n=0}^∞ |f_{n+1} − f_n|,
with the convention that f_0 = 0. By the monotone convergence theorem,
∫_Ω g^p dµ = lim_{N→∞} ∫_Ω g_N^p dµ.
Taking p-th roots and using Minkowski's inequality we obtain
‖g‖_p = lim_{N→∞} ‖g_N‖_p ≤ lim_{N→∞} ∑_{n=0}^{N−1} ‖f_{n+1} − f_n‖_p = ∑_{n=0}^∞ ‖f_{n+1} − f_n‖_p ≤ 1 + ‖f_1‖_p.
It follows that g is finitely-valued µ-almost everywhere, which means that the sum defining g converges absolutely µ-almost everywhere. As a result, the sum
f := ∑_{n=0}^∞ (f_{n+1} − f_n)
converges on the set {g < ∞}. On this set we have
f = lim_{N→∞} ∑_{n=0}^{N−1} (f_{n+1} − f_n) = lim_{N→∞} f_N.
Defining f to be zero on the null set {g = ∞}, the resulting function f is measurable. From
|f_N|^p = |∑_{n=0}^{N−1} (f_{n+1} − f_n)|^p ≤ (∑_{n=0}^{N−1} |f_{n+1} − f_n|)^p ≤ |g|^p
and the dominated convergence theorem we conclude that
lim_{N→∞} ‖f − f_N‖_p = 0.
We have proved that a subsequence of the original Cauchy sequence converges to f in L^p(µ). As is easily verified, this implies that the original Cauchy sequence converges to f as well. □
In the course of the proof we obtained the following result:
Corollary 1.24. Every convergent sequence in L^p(µ) has a µ-almost everywhere convergent subsequence.
For p = ∞ the above definitions have to be modified a little. We define ℒ^∞(µ) to be the vector space of all measurable functions f : Ω → R such that
sup_{ω∈Ω} |f(ω)| < ∞
and define, for such functions,
‖f‖_∞ := sup_{ω∈Ω} |f(ω)|.
The space
L^∞(µ) := ℒ^∞(µ)/∼
is again a normed vector space in a natural way. The simple proof that L^∞(µ) is a Banach space is left as an exercise.
We continue with an approximation result, the proof of which is an easy
application of Proposition 1.13 together with the dominated convergence theorem:
Proposition 1.25. For 1 ≤ p ≤ ∞, the simple functions in L^p(µ) are dense in L^p(µ).
We conclude with an important inequality, known as Hölder's inequality. For p = q = 2, it is known as the Cauchy-Schwarz inequality.
Theorem 1.26. Let 1 ≤ p, q ≤ ∞ satisfy 1/p + 1/q = 1. If f ∈ L^p(µ) and g ∈ L^q(µ), then f g ∈ L^1(µ) and
‖f g‖_1 ≤ ‖f‖_p ‖g‖_q.
Proof. For p = 1, q = ∞ and for p = ∞, q = 1, this follows by a direct estimate. Thus we may assume from now on that 1 < p, q < ∞. The inequality is then proved in the same way as Minkowski's inequality, this time using the identity
ab = inf_{t>0} [ t^p a^p / p + b^q / (q t^q) ]. □
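With counting measure on {1, . . . , n}, Hölder's inequality becomes a statement about finite sums, which can be spot-checked on random vectors (the vectors and exponents below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(1000)
g = rng.standard_normal(1000)

slack = []
for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)                              # conjugate exponent: 1/p + 1/q = 1
    lhs = np.sum(np.abs(f * g))                  # ||fg||_1 for counting measure
    rhs = np.sum(np.abs(f) ** p) ** (1 / p) * np.sum(np.abs(g) ** q) ** (1 / q)
    slack.append(float(rhs - lhs))
```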
1.6 Fubini’s theorem
A measure space (Ω, F , µ) is called σ-finite if we can write Ω =
with sets Ω (n) ∈ F satisfying µ(Ω (n) ) < ∞.
S∞
n=1
Ω (n)
Theorem 1.27 (Fubini). Let (Ω1, F1, µ1) and (Ω2, F2, µ2) be σ-finite measure spaces. For any non-negative measurable function f : Ω1 × Ω2 → R the following assertions hold:
(1) the non-negative function ω1 ↦ ∫_{Ω2} f(ω1, ω2) dµ2(ω2) is measurable;
(2) the non-negative function ω2 ↦ ∫_{Ω1} f(ω1, ω2) dµ1(ω1) is measurable;
(3) we have
∫_{Ω1×Ω2} f d(µ1 × µ2) = ∫_{Ω1} ∫_{Ω2} f dµ2 dµ1 = ∫_{Ω2} ∫_{Ω1} f dµ1 dµ2.
We shall not give a detailed proof, but sketch the main steps. First, for
f (ω1 , ω2 ) = 1A1 (ω1 )1A2 (ω2 ) with A1 ∈ F1 , A2 ∈ F2 , the result is clear.
Dynkin’s lemma is used to extend the result to functions f = 1A with A ∈
F = F1 × F2 . One has to assume first that µ1 and µ2 are finite; the σ-finite
case then follows by approximation. By taking linear combinations, the result
extends to simple functions. The result for arbitrary non-negative measurable
functions then follows by monotone convergence.
A variation of the Fubini theorem holds if we replace 'non-negative measurable' by 'integrable'. In that case, for µ1-almost all ω1 ∈ Ω1 the function ω2 ↦ f(ω1, ω2) is integrable and for µ2-almost all ω2 ∈ Ω2 the function ω1 ↦ f(ω1, ω2) is integrable. Moreover, the functions ω1 ↦ ∫_{Ω2} f(ω1, ω2) dµ2(ω2) and ω2 ↦ ∫_{Ω1} f(ω1, ω2) dµ1(ω1) are integrable and the above identities hold.
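For finite measures on finite sets, Fubini's theorem reduces to reordering a double sum. A sketch with random (made-up) weights:

```python
import numpy as np

rng = np.random.default_rng(2)
mu1 = rng.random(30)               # weights of a finite measure on Omega1
mu2 = rng.random(40)               # weights of a finite measure on Omega2
f = rng.random((30, 40))           # a non-negative function on the product

product_int = float(np.sum(f * np.outer(mu1, mu2)))   # int f d(mu1 x mu2)
iterated_12 = float(np.sum(mu1 * (f @ mu2)))          # int ( int f dmu2 ) dmu1
iterated_21 = float(np.sum(mu2 * (mu1 @ f)))          # int ( int f dmu1 ) dmu2
```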
1.7 Notes
The proof of the monotone convergence theorem is taken from Kallenberg
[1]. The proofs of the Minkowski inequality and Hölder inequality, although
not the ones that are usually presented in introductory courses, belong to the
folklore of the subject.
2 Conditional expectations
In this lecture we introduce conditional expectations and study some of their
basic properties.
We begin by recalling some terminology. A random variable is a measurable function on a probability space (Ω, F, P). In this context, Ω is often called the sample space and the sets in F the events. For an integrable random variable f : Ω → R we write
Ef := ∫_Ω f dP
for its expectation.
Below we fix a sub-σ-algebra G of F. For 1 ≤ p ≤ ∞ we denote by L^p(Ω, G) the subspace of all f ∈ L^p(Ω) having a G-measurable representative. With this notation, L^p(Ω) = L^p(Ω, F). We will be interested in the problem of finding good approximations of a random variable f ∈ L^p(Ω) by random variables in L^p(Ω, G).
Lemma 2.1. L^p(Ω, G) is a closed subspace of L^p(Ω).
Proof. Suppose that (fn)_{n=1}^∞ is a sequence in L^p(Ω, G) such that lim_{n→∞} fn = f in L^p(Ω). We may assume that the fn are pointwise defined and G-measurable. After passing to a subsequence (when 1 ≤ p < ∞) we may furthermore assume that lim_{n→∞} fn = f almost surely. The set C of all ω ∈ Ω where the sequence (fn(ω))_{n=1}^∞ converges is G-measurable. Put f̃ := lim_{n→∞} 1_C fn, where the limit exists pointwise. The random variable f̃ is G-measurable and agrees almost surely with f. This shows that f defines an element of L^p(Ω, G). □
Our aim is to show that L^p(Ω, G) is the range of a contractive projection in L^p(Ω). For p = 2 this is clear: we have the orthogonal decomposition
L²(Ω) = L²(Ω, G) ⊕ L²(Ω, G)^⊥
and the projection we have in mind is the orthogonal projection, denoted by P_G, onto L²(Ω, G) along this decomposition. Following common usage we write
E(f|G) := P_G(f),  f ∈ L²(Ω),
and call E(f|G) the conditional expectation of f with respect to G. Let us emphasise that E(f|G) is defined as an element of L²(Ω, G), that is, as an equivalence class of random variables.
Lemma 2.2. For all f ∈ L²(Ω) and G ∈ G we have
∫_G E(f|G) dP = ∫_G f dP.
As a consequence, if f ≥ 0 almost surely, then E(f|G) ≥ 0 almost surely.
Proof. By definition we have f − E(f|G) ⊥ L²(Ω, G). If G ∈ G, then 1_G ∈ L²(Ω, G) and therefore
∫_Ω 1_G (f − E(f|G)) dP = 0.
This gives the desired identity. For the second assertion, choose a G-measurable representative of g := E(f|G) and apply the identity to the G-measurable set {g < 0}. □
Taking G = Ω we obtain the identity E(E(f|G)) = Ef. This will be used in the next lemma, which asserts that the mapping f ↦ E(f|G) is L¹-bounded.
Lemma 2.3. For all f ∈ L²(Ω) we have E|E(f|G)| ≤ E|f|.
Proof. It suffices to check that |E(f|G)| ≤ E(|f| |G), since then the lemma follows from E|E(f|G)| ≤ E E(|f| |G) = E|f|. Splitting f into positive and negative parts, almost surely we have
|E(f|G)| = |E(f⁺|G) − E(f⁻|G)| ≤ |E(f⁺|G)| + |E(f⁻|G)| = E(f⁺|G) + E(f⁻|G) = E(|f| |G). □
Since L²(Ω) is dense in L¹(Ω), this lemma shows that the conditional expectation operator has a unique extension to a contractive linear operator on L¹(Ω), which we also denote by E(·|G). This operator is a projection (this follows by noting that it is a projection in L²(Ω) and then using that L²(Ω) is dense in L¹(Ω)) and it is positive in the sense that it maps positive random variables to positive random variables; this follows from Lemma 2.2 by approximation.
Lemma 2.4 (Conditional Jensen inequality). If φ : R → R is convex, then for all f ∈ L¹(Ω) such that φ ◦ f ∈ L¹(Ω) we have, almost surely,
φ ◦ E(f|G) ≤ E(φ ◦ f|G).
Proof. If a, b ∈ R are such that at + b ≤ φ(t) for all t ∈ R, then the positivity of the conditional expectation operator gives
a E(f|G) + b = E(af + b|G) ≤ E(φ ◦ f|G)
almost surely. Since φ is convex we can find real sequences (an)_{n=1}^∞ and (bn)_{n=1}^∞ such that φ(t) = sup_{n≥1} (an t + bn) for all t ∈ R; we leave the proof of this fact as an exercise. Hence almost surely,
φ ◦ E(f|G) = sup_{n≥1} (an E(f|G) + bn) ≤ E(φ ◦ f|G). □
Theorem 2.5 (L^p-contractivity). For all 1 ≤ p ≤ ∞ the conditional expectation operator extends to a contractive positive projection on L^p(Ω) with range L^p(Ω, G). For f ∈ L^p(Ω), the random variable E(f|G) is the unique element of L^p(Ω, G) with the property that for all G ∈ G,
∫_G E(f|G) dP = ∫_G f dP.    (2.1)
Proof. For 1 ≤ p < ∞ the L^p-contractivity follows from Lemma 2.4 applied to the convex function φ(t) = |t|^p. For p = ∞ we argue as follows. If f ∈ L^∞(Ω), then 0 ≤ |f| ≤ ‖f‖_∞ 1_Ω and therefore 0 ≤ E(|f| |G) ≤ ‖f‖_∞ 1_Ω almost surely. Hence, E(|f| |G) ∈ L^∞(Ω) and ‖E(|f| |G)‖_∞ ≤ ‖f‖_∞.
For 2 ≤ p ≤ ∞, (2.1) follows from Lemma 2.2. For f ∈ L^p(Ω) with 1 ≤ p < 2 we choose a sequence (fn)_{n=1}^∞ in L²(Ω) such that lim_{n→∞} fn = f in L^p(Ω). Then lim_{n→∞} E(fn|G) = E(f|G) in L^p(Ω) and therefore, for any G ∈ G,
∫_G E(f|G) dP = lim_{n→∞} ∫_G E(fn|G) dP = lim_{n→∞} ∫_G fn dP = ∫_G f dP.
If g ∈ L^p(Ω, G) satisfies ∫_G g dP = ∫_G f dP for all G ∈ G, then ∫_G g dP = ∫_G E(f|G) dP for all G ∈ G. Since both g and E(f|G) are G-measurable, as in the proof of the second part of Lemma 2.2 this implies that g = E(f|G) almost surely.
In particular, E(E(f|G)|G) = E(f|G) for all f ∈ L^p(Ω) and E(f|G) = f for all f ∈ L^p(Ω, G). This shows that E(·|G) is a projection onto L^p(Ω, G). □
The next two results develop some properties of conditional expectations.
Proposition 2.6.
(1) If f ∈ L¹(Ω) and H is a sub-σ-algebra of G, then almost surely
E(E(f|G)|H) = E(f|H).
(2) If f ∈ L¹(Ω) is independent of G (that is, f is independent of 1_G for all G ∈ G), then almost surely
E(f|G) = Ef.
(3) If f ∈ L^p(Ω) and g ∈ L^q(Ω, G) with 1 ≤ p, q ≤ ∞, 1/p + 1/q = 1, then almost surely
E(gf|G) = g E(f|G).
Proof. (1): For all H ∈ H we have ∫_H E(E(f|G)|H) dP = ∫_H E(f|G) dP = ∫_H f dP by Theorem 2.5, first applied to H and then to G (observe that H ∈ G). Now the result follows from the uniqueness part of the theorem.
(2): For all G ∈ G we have ∫_G Ef dP = E1_G Ef = E1_G f = ∫_G f dP, and the result follows from the uniqueness part of Theorem 2.5.
(3): For all G, G′ ∈ G we have ∫_G 1_{G′} E(f|G) dP = ∫_{G∩G′} E(f|G) dP = ∫_{G∩G′} f dP = ∫_G f 1_{G′} dP. Hence E(f 1_{G′}|G) = 1_{G′} E(f|G) by the uniqueness part of Theorem 2.5. By linearity, this gives the result for simple functions g, and the general case follows by approximation. □
Example 2.7. Let f : (−r, r) → R be integrable and let G be the σ-algebra of all symmetric Borel sets in (−r, r) (we call a set B symmetric if B = −B). Then,
E(f|G) = ½ (f + f̌) almost surely,
where f̌(x) = f(−x). This is verified by checking the condition (2.1).
Example 2.8. Let f : [0, 1) → R be integrable and let G be the σ-algebra generated by the sets In := [(n−1)/N, n/N), n = 1, . . . , N. Then,
E(f|G) = ∑_{n=1}^N mn 1_{In} almost surely,
where
mn = (1/|In|) ∫_{In} f(x) dx.
Again this is verified by checking the condition (2.1).
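Example 2.8 can be reproduced on a grid: E(f|G) is the function that is constant on each In with value the average of f over In, and condition (2.1) can then be checked on the generating sets. A sketch with an arbitrary integrand:

```python
import numpy as np

M, N = 1000, 5                               # grid points, number of intervals
x = (np.arange(M) + 0.5) / M                 # grid on [0, 1)
f = np.sin(2 * np.pi * x) + x ** 2           # an arbitrary integrand

blocks = (x * N).astype(int)                 # index n of the interval I_n containing x
means = np.array([f[blocks == n].mean() for n in range(N)])
cond_exp = means[blocks]                     # E(f|G): constant on each I_n

# condition (2.1) on each generating set G = I_n (integrals w.r.t. dx ~ 1/M)
gaps = [abs(cond_exp[blocks == n].sum() - f[blocks == n].sum()) / M
        for n in range(N)]
```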
If f and g are random variables on (Ω, F , P), we write E(g|f ) := E(g|σ(f )),
where σ(f ) is the σ-algebra generated by f .
Example 2.9. Let f1, . . . , fN be independent integrable random variables satisfying E(f1) = · · · = E(fN) = 0, and put SN := f1 + · · · + fN. Then, for 1 ≤ n ≤ N,
E(SN|fn) = fn almost surely.
This follows from the following almost sure identities:
E(SN|fn) = E(fn|fn) + ∑_{m≠n} E(fm|fn) = fn + ∑_{m≠n} E(fm) = fn.
Example 2.10. Let f1, . . . , fN be independent and identically distributed integrable random variables and put SN := f1 + · · · + fN. Then
E(f1|SN) = · · · = E(fN|SN) almost surely
and therefore
E(fn|SN) = SN/N almost surely,  n = 1, . . . , N.
It is clear that the second identity follows from the first. To prove the first, let µ denote the (common) distribution of the random variables fn. By independence, the distribution of the R^N-valued random variable (f1, . . . , fN) equals the N-fold product measure µ^N := µ × · · · × µ. Then, for any Borel set B ∈ B(R) we have, using the substitution formula and Fubini's theorem,
∫_{{SN ∈ B}} E(fn|SN) dP = ∫_{{SN ∈ B}} fn dP = ∫_{{x1+···+xN ∈ B}} xn dµ^N(x) = ∫_{R^{N−1}} ∫_{{y ∈ B−z1−···−z_{N−1}}} y dµ(y) dµ^{N−1}(z1, . . . , zN−1),
which is independent of n.
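A Monte Carlo check of E(fn|SN) = SN/N (the distribution and the bin below are arbitrary choices): conditioning on SN is approximated by restricting to samples whose sum falls in a narrow bin.

```python
import numpy as np

rng = np.random.default_rng(3)
N, samples = 4, 400_000
f = rng.exponential(1.0, (samples, N))     # i.i.d. coordinates f_1, ..., f_N
S = f.sum(axis=1)

# approximate E(f_1 | S_N) by averaging f_1 over samples with S_N near 4
sel = (S > 3.9) & (S < 4.1)
est = float(f[sel, 0].mean())
target = float(S[sel].mean()) / N          # should be close to 4 / 4 = 1
```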
2.1 Notes
We recommend Williams [2] for an introduction to the subject. The bible
for modern probabilists is Kallenberg [1].
The definition of conditional expectations starting from orthogonal projections in L²(Ω) follows Kallenberg [1]. Many textbooks prefer the shorter (but less elementary and transparent) route via the Radon-Nikodým theorem. In this approach, one notes that if f is a random variable on (Ω, F, P) and G is a sub-σ-algebra of F, then
Q(G) := ∫_G f dP,  G ∈ G,
defines a probability measure Q on (Ω, G, P|G) which is absolutely continuous with respect to P|G. Hence, by the Radon-Nikodým theorem, Q has an integrable density with respect to P|G. This density equals the conditional expectation E(f|G) almost surely.
References
1. O. Kallenberg, “Foundations of Modern Probability”, second ed., Probability and its Applications (New York), Springer-Verlag, New York, 2002.
2. D. Williams, “Probability with Martingales”, Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991.