Chapter 3
Conditional Expectations
When we work on a particular branch of a multi-period binomial model, conditional
probability is involved: conditional on a node, we consider the probability of going
up or down. Given a continuous-time price process St, if at time u we want to find
a product's price at maturity T, we need to deal with the conditional distribution
of ST given Su. In this chapter we provide a rigorous introduction to conditional
probabilities and conditional expectations.
3.1 Conditional Probability and Independence
Definition 3.1. (Conditional Probability) For any events A, B ∈ F such that P(B) ≠ 0, the conditional probability of A given B is defined by
P(A|B) = P(A ∩ B) / P(B) .   (3.1)
Example 3.1. (Total Probability Formula). For any event A ∈ F and any partition B1, B2, . . . of Ω such that P(Bi) ≠ 0 and Bi ∈ F for every i,
P(A) = ∑_{i=1}^∞ P(A ∩ Bi) = ∑_{i=1}^∞ P(A|Bi) P(Bi) ,
where the first and second equalities follow from the countable additivity of the probability measure and (3.1), respectively.
Definition 3.2. (Independence of events) Two events A, B ∈ F are called independent if P(A ∩ B) = P(A)P(B). Any n events A1 , A2 , . . . , An ∈ F are independent if
P(Ai1 ∩ Ai2 ∩ · · · ∩ Aik ) = P(Ai1 )P(Ai2 ) · · · P(Aik )
for any indices 1 ≤ i1 < i2 < · · · < ik ≤ n. In general, an infinite family of events is
independent if any finite number of them are independent.
Definition 3.3. (Independence of random variables) A set of n random variables
X1 , . . . , Xn are independent if for any Borel sets B1 , . . . , Bn ∈ B, the events
{X1 ∈ B1 }, . . . , {Xn ∈ Bn }
are independent. In general, an infinite family of random variable is independent if
any finite number of them are independent.
Definition 3.4. (Independence of σ-fields) A set of n σ-fields G1, . . . , Gn contained in F are independent if any n events
B1 ∈ G1, . . . , Bn ∈ Gn
are independent. In general, an infinite family of σ-fields is independent if any finite number of them are independent.
Remark 3.1. The σ-fields Gi are contained in F, i.e., Gi ⊂ F for all i. Otherwise, the measure P would not be able to assign probabilities to the events Bi ∈ Gi.
Example 3.2. Consider the dice example (Ω, F, P) where Ω = {1, 2, 3, 4, 5, 6} and F = 2^Ω. Let P({ω}) = 1/6 for ω = 1, . . . , 6, and
A1 = {1, 2, 3}, A2 = {2, 5}, Xi = 1_{Ai}, Fi = σ({Ai}) for i = 1, 2 .
First consider independence of events: P(A1 ∩ A2) = P(A1)P(A2) = 1/6, thus A1 and A2 are independent.
Next consider independence of r.v.s. Note that the σ-fields generated by X1 and X2 are {∅, A1, {4, 5, 6}, Ω} and {∅, A2, {1, 3, 4, 6}, Ω} respectively. X1 and X2 are independent if and only if each pair of events taken from these two collections is independent (4 × 4 = 16 combinations). These 16 pairs of events can be shown to be independent by elementary calculations. Thus, X1 and X2 are independent.
Finally, to show that F1 and F2 are independent, we need to show that each pair of events from F1 and F2 is independent. Notice that Fi coincides with the σ-field generated by Xi for i = 1, 2. Thus, the calculations in the preceding paragraph show that F1 and F2 are indeed independent.
From the above, it is seen that if two σ-fields Fa and Fb are independent, then any Fa-measurable r.v. is independent of any Fb-measurable r.v..
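As a quick numerical companion to Example 3.2 (not part of the original text), the 16 pairwise independence conditions can be checked by brute force. The following Python sketch assumes the uniform measure on {1, . . . , 6} described above.

```python
# A minimal sketch (not from the text): brute-force check of Example 3.2.
from itertools import product

omega = {1, 2, 3, 4, 5, 6}
prob = lambda event: len(event) / 6          # uniform measure P on subsets of omega

A1, A2 = {1, 2, 3}, {2, 5}
F1 = [set(), A1, omega - A1, omega]          # sigma-field generated by A1
F2 = [set(), A2, omega - A2, omega]          # sigma-field generated by A2

# F1 and F2 are independent iff P(B1 & B2) = P(B1) P(B2) for every pair
# B1 in F1, B2 in F2 (the 4 x 4 = 16 combinations).
independent = all(
    abs(prob(B1 & B2) - prob(B1) * prob(B2)) < 1e-12
    for B1, B2 in product(F1, F2)
)
print(independent)                           # True
```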
3.2 Conditional Expectation
3.2.1 Conditioning on an Event
Definition 3.5. (Conditional Expectation given an Event) For any random variable X ∈ L¹ on (Ω, F, P) and any event B ∈ F such that P(B) ≠ 0, the conditional expectation of X given B is defined by
E(X|B) = (1/P(B)) ∫_B X dP .
Example 3.3. An investor plans to invest $10 in an asset. According to his estimate,
the asset value in the coming two years can be represented by the following diagram
[Binomial tree diagram: the asset is worth 10 at Year 0; at Year 1 it moves up to 11 with probability 2/3 or down to 9.5 with probability 1/3; at Year 2 it moves from 11 up to 12.1 with probability 1/4 or down to 10.45 with probability 3/4, and from 9.5 up to 10.45 with probability 1/4 or down to 9.025 with probability 3/4.]
The number along an arrow represents the probability that the event indicated by the arrow would occur. Let X be the value of the asset at the end of the second year. Recall the definition of path in Definition 1.6, where ωt = (z1, . . . , zt), zi = ±1 for i = 1, . . . , t, indicates the evolution (up = 1, down = −1) of the market. We can take the probability space to be (ΩX, FX, PX) where ΩX = {ω2 = (a, b) | a = ±1, b = ±1}, FX = 2^{ΩX} and PX is the discrete probability measure satisfying PX({(1, 1)}) = 1/6, . . . , PX({(−1, −1)}) = 1/4.
Consider the event B = {ω1 = (1)}. Clearly, P(B) = 2/3. Also, note that
X((1, 1)) = 12.1 , X((1, −1)) = 10.45 ,
and
P({(1, 1)}) = 1/6 , P({(1, −1)}) = 1/2 .
Therefore
E(X|B) = (1/P(B)) ∫_B X dP = (1/P(B)) ( X((1, 1)) P({(1, 1)}) + X((1, −1)) P({(1, −1)}) )
= (1/(2/3)) ( 12.1 × 1/6 + 10.45 × 1/2 ) = 10.8625 .
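As a sanity check (not from the text), E(X|B) = 10.8625 can be recomputed by enumerating the four two-period paths with the probabilities read off the tree; the following Python sketch assumes exactly the tree above.

```python
# A minimal sketch (not from the text): recompute E(X | B) in Example 3.3
# by summing over the two-period paths, with B = {first move is up}.
paths = {  # path -> (probability, terminal value X)
    ( 1,  1): (2/3 * 1/4, 12.100),
    ( 1, -1): (2/3 * 3/4, 10.450),
    (-1,  1): (1/3 * 1/4, 10.450),
    (-1, -1): (1/3 * 3/4,  9.025),
}

B = [w for w in paths if w[0] == 1]               # first move up
p_B = sum(paths[w][0] for w in B)                 # = 2/3
e_X_given_B = sum(paths[w][1] * paths[w][0] for w in B) / p_B
print(round(e_X_given_B, 4))                      # 10.8625
```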
3.2.2 Conditioning on a discrete random variable
Given an arbitrary random variable X in L 1 (so expectation exists) and a discrete
random variable Z : Ω → {z1 , . . . , zm }, the conditional expectation of a r.v. X given
Z, i.e. E(X|Z), must depend solely on the random variable Z. Thus E(X|Z) is itself a
random variable. Since Z takes only m possible values, so does E(X|Z). Therefore,
we can characterize E(X|Z) by m conditional expectations (on an event {Z = zi })
E(X|{Z = zi }), i = 1, . . . , m, using the definition in the last section.
Definition 3.6. (Conditional Expectation given a discrete r.v.) Let X ∈ L 1 and
Z be a discrete r.v. taking values on {zi }i=1,... . The conditional expectation of X
given Z is defined to be a discrete random variable E(X|Z) : Ω → R such that
E(X|Z)(ω) = E(X|{Z = zi}) on {ω : Z(ω) = zi} , for any i = 1, 2, . . .   (3.2)
Example 3.4. Three coins, 10, 20 and 50 cents, are tossed. The values of those coins
with heads up are added for a total amount X. Let Z be the total amount of the 10
and 20 cents, what is E(X|Z)?
First, the probability space can be taken as Ω = {HHH, HHT, . . . , TTT}, where H and T stand for Head and Tail respectively. The associated σ-field can be taken as F = 2^Ω. The probability measure P may be defined on F with P({HHH}) = · · · = P({TTT}) = 1/8. The discrete r.v. Z on (Ω, F, P) is given by Z : Ω → {0, 10, 20, 30}. Thus finding E(X|Z) is equivalent to finding E(X|{Z = 0}), . . . , E(X|{Z = 30}). For example,
E(X|{Z = 0}) = (1/P({Z = 0})) ∫_{Z=0} X dP = (1/(1/4)) ( (1/8)(0 + 50) + (1/8)(0 + 0) ) = 25 .
Using a similar argument for the other cases, we have
E(X|Z)(ω) = 25 if Z(ω) = 0 ,
E(X|Z)(ω) = 35 if Z(ω) = 10 ,
E(X|Z)(ω) = 45 if Z(ω) = 20 ,
E(X|Z)(ω) = 55 if Z(ω) = 30 .
Note that the distribution of Z is irrelevant. In other words, the above results still hold even if the 10 and 20 cent coins are biased. However, the results will be different if the 50 cent coin is assumed biased.
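A minimal Python sketch (my own illustration, assuming the fair-coin setup of Example 3.4) recovers the four conditional expectations by direct enumeration:

```python
# A minimal sketch (not from the text): compute E(X | Z) in Example 3.4 by
# enumerating the 8 equally likely outcomes of the three fair coins.
from itertools import product

outcomes = list(product([0, 1], repeat=3))        # (10c, 20c, 50c): 1 = heads
p = 1 / 8                                          # fair coins

X = lambda w: 10 * w[0] + 20 * w[1] + 50 * w[2]    # total amount showing heads
Z = lambda w: 10 * w[0] + 20 * w[1]                # 10c and 20c coins only

for z in (0, 10, 20, 30):
    event = [w for w in outcomes if Z(w) == z]     # the event {Z = z}
    e = sum(X(w) * p for w in event) / sum(p for _ in event)
    print(z, e)                                    # 0 25.0, 10 35.0, 20 45.0, 30 55.0
```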
Theorem 3.1. If X ∈ L¹ and Z is a discrete random variable, then
i) E(X|Z) is σ(Z)-measurable;
ii) for any A ∈ σ(Z),
∫_A E(X|Z) dP = ∫_A X dP .   (3.3)
Proof. i) To show that E(X|Z) is σ (Z) measurable, we have to show that for any
B ∈ B, {ω : E(X|Z)(ω) ∈ B} ∈ σ (Z). Since Z is discrete, we can assume that Z
takes values {zi }i=1,... . From (3.2), E(X|Z) also takes discrete values {yi }i=1,... ,
where yi = E(X|{Z = zi }). Note that {ω : E(X|Z)(ω) = yi } = {Z = zi }. It follows
that for any B ∈ B, {ω : E(X|Z)(ω) ∈ B} is a disjoint union of {Z = zi }, which
belongs to σ (Z). This completes the proof of i).
ii) Since E(X|Z)(ω) is constant on {ω : Z(ω) = zi}, we have
∫_{Z=zi} E(X|Z) dP = ∫_{Z=zi} E(X|{Z = zi}) dP
= E(X|{Z = zi}) ∫_{Z=zi} dP
= ( ∫_{Z=zi} X dP / ∫_{Z=zi} dP ) ∫_{Z=zi} dP
= ∫_{Z=zi} X dP .
In general, any A ∈ σ(Z) is a countable union of sets of the form {Z = zi}, which are pairwise disjoint. Thus (3.3) follows from the countable additivity of the Lebesgue integral.
Example 3.5. Continuing Example 3.4, the induced sample spaces for the r.v.s X and Z are ΩX = {0, 10, 20, 30, 50, 60, 70, 80} and ΩZ = {0, 10, 20, 30}. Let A1 = {Z = 20} and A2 = {Z = 10 or 30}. Note that A1, A2 ∈ FZ := 2^{ΩZ}. It is easy to verify that (3.3) holds for A = A1 and A2. In particular,
∫_{A1} X dP = (1/8)·20 + (1/8)·70 = 11.25 ,
∫_{A1} E(X|Z) dP = (1/4)·45 = 11.25 ,
∫_{A2} X dP = (1/8)(10 + 60 + 30 + 80) = 22.5 ,
∫_{A2} E(X|Z) dP = (1/4)(35 + 55) = 22.5 .
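The same enumeration as before can confirm (3.3) for A1 and A2 numerically; the sketch below (not from the text) reuses the coin setup of Example 3.4.

```python
# A minimal sketch (not from the text): verify (3.3) for A1 = {Z = 20} and
# A2 = {Z = 10 or 30} in Example 3.5.
from itertools import product

outcomes = list(product([0, 1], repeat=3))
p = 1 / 8
X = lambda w: 10 * w[0] + 20 * w[1] + 50 * w[2]
Z = lambda w: 10 * w[0] + 20 * w[1]
E_X_given_Z = {0: 25, 10: 35, 20: 45, 30: 55}      # from Example 3.4

for A in ([20], [10, 30]):                          # A1, then A2
    lhs = sum(E_X_given_Z[Z(w)] * p for w in outcomes if Z(w) in A)
    rhs = sum(X(w) * p for w in outcomes if Z(w) in A)
    print(lhs, rhs)                                 # 11.25 11.25, then 22.5 22.5
```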
3.2.3 Conditioning on an arbitrary random variable
If Z is a discrete r.v., we see from (3.2) that E(X|Z) is also a discrete random variable. When Z is an arbitrary r.v. (possibly with both discrete and continuous components), it is not easy to write down an explicit formula for E(X|Z) in general. (The elementary formula E(X|Z) = ∫ x f_{X,Z}(x, Z)/f_Z(Z) dx cannot be used if the p.d.f.s f_{X,Z} or f_Z do not exist.) However, Theorem 3.1 suggests a way for us to define the conditional expectation given an arbitrary r.v..
Definition 3.7. (Conditional Expectation given an arbitrary r.v.) Let X ∈ L¹ and Z be an arbitrary r.v.. Then the conditional expectation of X given Z is defined to be a random variable E(X|Z) such that
i) E(X|Z) is a σ(Z)-measurable r.v. ;
ii) for any A ∈ σ(Z),
∫_A E(X|Z) dP = ∫_A X dP .   (3.4)
Since conditional expectation is defined implicitly by (3.4) rather than an explicit
formula, we need to justify such a definition by showing that the r.v. E(X|Z) is
characterized uniquely. Since E(X|Z) is a random variable, the uniqueness depends
on the probability measure P through the notion of almost sure equivalence.
Definition 3.8. (Almost Sure Equivalence)
1. Two events A, B ∈ F are equal almost surely (a.s.) if P((A\B) ∪ (B\A)) = 0, i.e., the symmetric difference of A and B has probability 0.
2. Two random variables X and Y on (Ω, F, P) are equal almost surely (a.s.) if P(X = Y) := P({ω ∈ Ω : X(ω) = Y(ω)}) = 1.
Example 3.6. That two events A and B are equal a.s. does not imply that the two events are the same; it only means that A\B and B\A are of P-measure 0. For example, consider A = (0, 0.5) and B = (0, 0.5] on ([0, 1], B[0,1], λ[0,1]). It is clear that A = B a.s. but A ≠ B.
For the case of random variables, let Ω = {ω1, ω2}, F = 2^Ω, and let P be defined on F with P({ω1}) = 1. Suppose that Xi : Ω → R are r.v.s with X1 = 1_{{ω1}} + 2 × 1_{{ω2}} and X2 = 1_{{ω1}} + 3 × 1_{{ω2}}. Then X1 = X2 a.s. but X1 ≠ X2.
Theorem 3.2. (Uniqueness of Conditional Expectation) E(X|Z) is unique in the sense that if X = X′ a.s., then E(X|Z) = E(X′|Z) a.s.
The proof of Theorem 3.2 relies on the following lemma.
Lemma 3.3. Let (Ω, F, P) be a probability space and let σ(Z) ⊂ F. If Y is a σ(Z)-measurable r.v. and
∫_A Y dP = 0
for any A ∈ σ(Z), then Y = 0 a.s. That is, P(Y = 0) = 1.
Proof. Observe that P{Y ≥ 1/n} = 0 for any positive integer n because
0 ≤ (1/n) P( Y ≥ 1/n ) = ∫_{Y ≥ 1/n} (1/n) dP ≤ ∫_{Y ≥ 1/n} Y dP = 0 .
The last equality follows from the fact that {Y ≥ 1/n} = {Y ∈ [1/n, ∞)} ∈ σ(Z). Similarly, P{Y ≤ −1/n} = 0 for any n > 0. Set An = {−1/n < Y < 1/n}; then we have
P(An) = 1
for any positive integer n. Since {An} forms a contracting sequence of events and
{Y = 0} = ∩_{n=1}^∞ An ,
it follows that
P{Y = 0} = P( ∩_{n=1}^∞ An ) = lim_{m→∞} P( ∩_{n=1}^m An ) = lim_{m→∞} P(Am) = 1 ,   (3.5)
as required. See Exercise 3.29.
Proof. (Theorem 3.2.) Suppose that X = X′ a.s., i.e., P(X = X′) = 1. We want to show that E(X|Z) = E(X′|Z) a.s.. Let Y = E(X|Z) − E(X′|Z); note that by the definition of conditional expectation, E(X|Z) and E(X′|Z) are σ(Z)-measurable, and so is Y. For any A ∈ σ(Z), we have
∫_A Y dP = ∫_A ( E(X|Z) − E(X′|Z) ) dP
= ∫_A (X − X′) dP   (by (3.4))
= 0   (since X = X′ a.s.).
From Lemma 3.3, we have Y = 0 a.s., i.e., E(X|Z) = E(X′|Z) a.s., completing the proof of Theorem 3.2.
Example 3.7. Consider the probability space ([0, 1], B[0,1], λ[0,1]). Let
X(ω) = 2ω² ,  Z(ω) = 2 if ω ∈ [0, 1/2) , and Z(ω) = ω if ω ∈ [1/2, 1] .
The following heuristic can be used to guess the form of E(X|Z): first note that there is a discrete mass at Z = 2, so we can follow the arguments used in conditioning on a discrete r.v. to obtain that, for ω ∈ [0, 1/2),
E(X|Z)(ω) = E(X|{Z = 2}) = (1/P(ω ∈ [0, 1/2))) ∫_{[0,1/2)} X(ω) dP
= (1/(1/2)) ∫_0^{1/2} 2ω² dω = 1/6 .   (3.6)
For the continuous part Z(ω) = ω on ω ∈ [1/2, 1], note that {Z = z} ∈ σ(Z) and {Z = z} = {ω : Z(ω) = z} = {ω = z} for z ∈ [1/2, 1]; we heuristically solve for E(X|Z) from the definition by
∫_{Z=z} E(X|Z) dP = ∫_{Z=z} X dP
⇒ E(X|{Z = z}) ∫_{Z=z} dP = ∫_{ω:Z(ω)=z} X(ω) P(dω)
⇒ E(X|{Z = z}) = X(z) ∫_{ω=z} P(dω) / ∫_{ω=z} P(dω) = X(z) = 2z² .   (3.7)
Expressing E(X|Z) as a function of ω (on the sample space) instead of a function of Z (on the induced sample space), we have
E(X|Z)(ω) = 1/6 if ω ∈ [0, 1/2) , and E(X|Z)(ω) = 2ω² if ω ∈ [1/2, 1] .
To verify that E(X|Z) is really a conditional expectation, we need to verify Conditions i) and ii) in Definition 3.7. First we show i): E(X|Z) is σ(Z)-measurable. Recall that σ(Z) = {Z⁻¹(B) : B ∈ B}. For any Borel set B ∈ B, from the definition of Z, we have
Z⁻¹(B) = B ∩ [1/2, 1] if 2 ∉ B ,  and  Z⁻¹(B) = (B ∩ [1/2, 1]) ∪ [0, 1/2) if 2 ∈ B .   (3.8)
On the other hand, it can be checked that the pre-image {ω : E(X|Z)(ω) ∈ B}, where B ∈ B, also takes the same form as in (3.8). Hence, {ω : E(X|Z)(ω) ∈ B} ∈ {Z⁻¹(B) : B ∈ B} = σ(Z), i.e., E(X|Z) is σ(Z)-measurable. Finally, ii) can be directly justified by calculations similar to (3.6) and (3.7). Thus, E(X|Z) satisfies Definition 3.7, i.e., is a valid conditional expectation.
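A numerical check (not from the text) of condition ii) for this example can be done with a simple Riemann sum on [0, 1]; the two sets below are my own choices of elements of σ(Z).

```python
# A minimal sketch (not from the text): numerically check condition ii) of
# Definition 3.7 for Example 3.7 on two sets in sigma(Z).
import numpy as np

w = np.linspace(0, 1, 1_000_001)             # fine grid on Omega = [0, 1]
dw = w[1] - w[0]

X = 2 * w**2
E_X_given_Z = np.where(w < 0.5, 1 / 6, 2 * w**2)

# Two sets in sigma(Z): A1 = [0, 1/2) and A2 = [0, 1/2) union [0.6, 0.9].
for A in (w < 0.5, (w < 0.5) | ((w >= 0.6) & (w <= 0.9))):
    lhs = np.sum(E_X_given_Z[A]) * dw        # integral of E(X|Z) over A
    rhs = np.sum(X[A]) * dw                  # integral of X over A
    print(round(lhs, 4), round(rhs, 4))      # the two integrals agree
```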
Example 3.8. (Elementary definition of Conditional Expectation) Recall that Definition 3.7 of the conditional expectation E(X|Z) does not require the existence of a joint p.d.f. of X and Z. In this example we show that E(X|Z) reduces to the elementary definition of conditional expectation when the joint p.d.f. of X and Z exists.
Suppose that the joint p.d.f. of the r.v.s X and Z exists and is given by f_{X,Z}(x, z). We can define the conditional p.d.f. by
f_{X|Z}(x|z) = f_{X,Z}(x, z)/f_Z(z) if f_Z(z) ≠ 0 ,
and 0 otherwise, where f_Z(z) = ∫ f_{X,Z}(x, z) dx. Note that if the joint p.d.f. exists, then dP = f_{X,Z}(x, z) dx dz. Following the same heuristic as in Example 3.7, we solve for E(X|Z) from the definition by
∫_{Z=z} E(X|Z) dP = ∫_{Z=z} X dP
⇒ E(X|{Z = z}) ∫∫_{z=z} f_{X,Z}(x, z) dx dz = ∫∫_{z=z} x f_{X,Z}(x, z) dx dz
⇒ E(X|{Z = z}) = ∫ x f_{X,Z}(x, z) dx / f_Z(z) = ∫ x f_{X|Z}(x|z) dx =: g(z) ,   (3.9)
say. Then E(X|Z) := g(Z) is the conditional expectation of X given Z.
Next we justify Conditions i) and ii) in Definition 3.7. First, i) is trivial since g(z) is a function involving only z. Lastly we verify ii): for any B ∈ B and A := {ω : Z(ω) ∈ B} ∈ σ(Z), we have
∫_A X dP = ∫∫ I_A(z) x f_{X,Z}(x, z) dx dz ,
and
∫_A g(Z) dP = ∫ I_A(z) g(z) f_Z(z) dz = ∫ I_A(z) ( ∫ x f_{X|Z}(x|z) dx ) f_Z(z) dz
= ∫∫ I_A(z) x f_{X,Z}(x, z) dx dz = ∫_A X dP .
Thus conditions in Definition 3.7 are fulfilled.
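As an illustration (not from the text) of (3.9), g(z) can be computed by numerical integration whenever a joint p.d.f. is available. The sketch below assumes a standard bivariate normal pair (X, Z) with correlation rho, a choice of mine rather than the book's, for which g(z) = rho·z is known in closed form.

```python
# A minimal sketch (not from the text): compute g(z) in (3.9) numerically for
# an assumed standard bivariate normal (X, Z) with correlation rho.
import numpy as np

rho = 0.7
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

def f_XZ(x, z):       # joint p.d.f. of a standard bivariate normal
    c = 1 / (2 * np.pi * np.sqrt(1 - rho**2))
    q = (x**2 - 2 * rho * x * z + z**2) / (2 * (1 - rho**2))
    return c * np.exp(-q)

for z in (-1.0, 0.5, 2.0):
    f_Z = np.sum(f_XZ(x, z)) * dx                  # marginal density of Z at z
    g_z = np.sum(x * f_XZ(x, z)) * dx / f_Z        # (3.9): g(z) = E(X | Z = z)
    print(z, round(g_z, 3), rho * z)               # numeric vs. closed form
```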
3.3 Conditioning on a σ -field
Since Definition 3.7 only involves the sets in σ(Z) rather than the actual values of Z, we can generalize the definition of conditional expectation to conditioning on a given σ-field.
Definition 3.9. Let X ∈ L¹ on (Ω, F, P), and let G ⊂ F be a σ-field. Then the conditional expectation of X given G is defined to be a random variable E(X|G) such that
i) E(X|G) is a G-measurable r.v. ;
ii) for any A ∈ G,
∫_A E(X|G) dP = ∫_A X dP .   (3.10)
Remark 3.2. Combining Definitions 3.7 and 3.9, it can be seen that
E(X|σ(Z)) = E(X|Z) .
Note that different r.v.s can generate the same σ-field. For example, on the sample space Ω = {H, T}, the r.v.s X1 = 1_{H} and X2 = 1_{T} generate the same σ-field F = {∅, {H}, {T}, Ω}. Thus, conditional expectation given a σ-field is more general than conditional expectation given a r.v..
Example 3.9. If G = {∅, Ω}, then E(X|G) = E(X) a.s. To see this, note that if G = {∅, Ω}, then any constant r.v., including E(X), is G-measurable. Note that
∫_Ω X dP = E(X) = ∫_Ω E(X) dP ,
and
∫_∅ X dP = 0 = ∫_∅ E(X) dP .
Therefore, E(X) satisfies i) and ii) of Definition 3.9, thus E(X|G) = E(X) a.s..
Example 3.10. For B ∈ G with P(B) ≠ 0, we have E(E(X|G)|B) = E(X|B). To see this, note that from the definition of conditional expectation, for B ∈ G we have
∫_B E(X|G) dP = ∫_B X dP .
Thus Definition 3.5 implies that
E(E(X|G)|B) = (1/P(B)) ∫_B E(X|G) dP = (1/P(B)) ∫_B X dP = E(X|B) .
Example 3.11. Consider the fair dice example where Ω = {1, . . . , 6} and P({i}) = 1/6. Let F1 = {∅, Ω}, F2 = {∅, {1, 2, 3}, {4, 5, 6}, Ω}, F3 = 2^Ω and X : Ω → R satisfying X(i) = (i − 4)⁺. Note that
σ(X) = {∅, {5}, {6}, {5, 6}, {1, 2, 3, 4}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}, Ω} ,
thus X is F3-measurable but is neither F1- nor F2-measurable. From Example 3.9, E(X|F1) = E(X) = (5 − 4)(1/6) + (6 − 4)(1/6) = 0.5. Next, E(X|F2) is an F2-measurable function, thus it takes constant values on each of {1, 2, 3} and {4, 5, 6}. Simple calculations yield
E(X|F2)(ω) = ∫_{{1,2,3}} X dP / P({1, 2, 3}) = 0 if ω ∈ {1, 2, 3} ,
E(X|F2)(ω) = ∫_{{4,5,6}} X dP / P({4, 5, 6}) = (1/(1/2)) ( (5 − 4)(1/6) + (6 − 4)(1/6) ) = 1 if ω ∈ {4, 5, 6} .
Lastly, since X is F3-measurable, we have E(X|F3) = X .
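A short Python sketch (my own check, assuming the fair-die setup of Example 3.11) recomputes E(X|F2) by averaging X over the atoms of F2:

```python
# A minimal sketch (not from the text): recompute E(X | F2) in Example 3.11 by
# averaging X over each atom of F2 under the uniform measure.
omega = [1, 2, 3, 4, 5, 6]
p = {i: 1/6 for i in omega}
X = lambda i: max(i - 4, 0)                        # X(i) = (i - 4)^+

for atom in ({1, 2, 3}, {4, 5, 6}):                # atoms of F2
    value = sum(X(i) * p[i] for i in atom) / sum(p[i] for i in atom)
    print(atom, value)                             # 0.0 on {1,2,3}, 1.0 on {4,5,6}
```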
3.4 Radon-Nikodym derivative
In Definitions 3.7 and 3.9, it is implicitly assumed that the random variable E(X|Z) or E(X|G) exists. The existence of conditional expectation is non-trivial, but it is guaranteed by the Radon-Nikodym Theorem, an important tool in probability theory that makes connections between measures.
Definition 3.10. (Absolutely Continuous Measures) Suppose that P and Q are probability measures on (Ω, F) and that P(F) = 0 implies Q(F) = 0 for F ∈ F. Then Q is said to be absolutely continuous with respect to P, denoted by Q ≪ P.
Example 3.12. Consider the measurable space (Ω, F) where Ω = {1, 2, 3, 4, 5, 6} and F = 2^Ω. For ω ∈ Ω, let P1({ω}) = 1/6, P2({ω}) = |ω − 3.5|/9, P3({ω}) = (1/3) 1_{ω≥4} and P4({ω}) = (1/3) 1_{ω≤3}. Then, the only absolutely continuous pairs are P1 ≪ P2, P2 ≪ P1, P3 ≪ Pi and P4 ≪ Pi for i = 1, 2.
For the measurable space (R, B), let PN and PE satisfy PN((a, b]) = FN(b) − FN(a) and PE((a, b]) = FE(b) − FE(a), where FN and FE are the c.d.f.s of the standard normal and standard exponential distributions, respectively. Then we have PE ≪ PN, but PN ≪ PE does not hold.
Theorem 3.4. (Radon-Nikodym Theorem) Let P and Q be two probability measures on the measurable space (Ω, F). If Q ≪ P, then there exists an F-measurable r.v. Y ∈ L¹ such that
Q(A) = ∫_A Y dP ,   (3.11)
for any A ∈ F. The r.v. Y is called the Radon-Nikodym derivative of Q w.r.t. P and is denoted by Y = dQ/dP.
Proof. The proof consists of four steps:
i) Assume Q(A) ≤ P(A) for all A ∈ F. Construct a Yn on a finite σ-field Fn ⊂ F.
ii) Take the limit to obtain the required r.v. Y = lim_{n→∞} Yn.
iii) Show that the limit Y satisfies (3.11) for any A ∈ F.
iv) Relax the assumption Q(A) ≤ P(A).
i) [Construction on a finite σ-field] Assume that Q(A) ≤ P(A) for all A ∈ F. First we define the required function Y on a finite sub-σ-field Fn ⊂ F which is generated by a finite partition of Ω. That is, for some integer rn, Ω = ∪_{j=1}^{rn} Fj where Fi ∩ Fj = ∅ if i ≠ j and Fn = σ(F1, . . . , F_{rn}). Define Yn : Ω → R by
Yn(ω) = Q(Fj)/P(Fj) if ω ∈ Fj and P(Fj) > 0 ,  and  Yn(ω) = 0 if ω ∈ Fj and P(Fj) = 0 .   (3.12)
Note that Yn ∈ L¹ since
E(|Yn|) = ∫_Ω Yn dP = ∑_{j=1}^{rn} ∫_{Fj} Yn dP = ∑_{j=1}^{rn} Q(Fj) = Q(Ω) = 1 < ∞ .
Moreover, Yn(ω) is a constant (Q(Fj)/P(Fj)) on ω ∈ Fj, so Yn is Fn-measurable. Also, simple algebra shows that Q(Fj) = ∫_{Fj} Yn dP. Since every F ∈ Fn is a union of Fj's, (3.11) holds for all A ∈ Fn.
Next consider a refinement F_{n+1} = σ(F̃1, . . . , F̃_{r_{n+1}}) of Fn such that {F̃j}_{j=1,...,r_{n+1}} are disjoint and each Fi ∈ Fn is a union of some F̃j's. Define Y_{n+1} : Ω → R by
Y_{n+1}(ω) = Q(F̃j)/P(F̃j) if ω ∈ F̃j and P(F̃j) > 0 ,  and  Y_{n+1}(ω) = 0 if P(F̃j) = 0 .
Similarly, it can be shown that Y_{n+1} ∈ L¹, Y_{n+1} is F_{n+1}-measurable, and Q(A) = ∫_A Y_{n+1} dP for all A ∈ F_{n+1}. Moreover,
∫_Ω Y_{n+1}² dP = ∫_Ω (Y_{n+1} − Yn)² dP + 2 ∫_Ω Yn (Y_{n+1} − Yn) dP + ∫_Ω Yn² dP
= ∫_Ω (Y_{n+1} − Yn)² dP + ∫_Ω Yn² dP
≥ ∫_Ω Yn² dP ,   (3.13)
where the second equality follows from
∫_Ω Yn (Y_{n+1} − Yn) dP = ∑_{j=1}^{rn} ∫_{Fj} Yn (Y_{n+1} − Yn) dP
= ∑_{j=1}^{rn} (Q(Fj)/P(Fj)) ( ∫_{Fj} Y_{n+1} dP − ∫_{Fj} (Q(Fj)/P(Fj)) dP )
= ∑_{j=1}^{rn} (Q(Fj)/P(Fj)) ( Q(Fj) − Q(Fj) ) = 0 .
ii) [Passing to the limit] From (3.13), it can be observed that if the partition of Ω gets finer and finer (e.g., a sequence of σ-fields F1 ⊂ . . . ⊂ Fk ⊂ . . .), then the corresponding Radon-Nikodym derivatives Yk form a non-decreasing sequence in the sense that ∫_Ω Y_{k+1}² dP ≥ ∫_Ω Yk² dP. On the other hand, ∫_Ω Yk² dP is bounded because P(Ω) = 1 and Yk ≤ 1 from (3.12) and the assumption Q(A) ≤ P(A) for all A ∈ F. Thus, the limit l := lim_{k→∞} ∫_Ω Yk² dP exists. Some limit arguments show that the limit Y = lim_{k→∞} Yk also exists (see Exercise 3.30).
iii) [Verify (3.11)] Since Yn is Fn-measurable with Fn ⊂ F and Y is the limit of Yn, Y is F-measurable. For any A ∈ F, let Rn be the common refinement of Fn and {A, A^c}. It can be verified directly that Q(A) = ∫_A Y_{Rn} dP, where Y_{Rn} is defined similarly to Yn but on Rn instead of Fn. Similar limit arguments as in ii) show that ∫_A Y_{R_{nk}} dP → ∫_A Y dP for some subsequence {nk}_{k=1,2,...}, yielding (3.11) (see Exercise 3.30).
iv) [Relax the assumption Q(A) ≤ P(A)] In general, let S = P + Q, so P(A) ≤ S(A) and Q(A) ≤ S(A) for all A ∈ F. The previous results imply that there exist YQ and YP such that Q(A) = ∫_A YQ dS and P(A) = ∫_A YP dS for all A ∈ F. Now, let Y = (YQ/YP) 1_{YP>0}. Then,
Q(A) = ∫_A YQ dS
= ∫_{A∩{YP>0}} (YQ/YP) YP dS + ∫_{A∩{YP=0}} YQ dS
= ∫_A Y YP dS + 0
= ∫_A Y dP ,
where the third equality follows from the fact that on B := {YP = 0}, P(B) = ∫_B YP dS = 0, which implies Q(B) = 0 as Q ≪ P, which in turn implies that S(B) = 0. The last equality follows from Exercise 3.30. Thus the proof is completed.
Using the Radon-Nikodym Theorem, the existence of conditional expectation can be justified:
Theorem 3.5. (Existence of Conditional Expectation) Let (Ω, F, P) be a probability space and G be a σ-field contained in F. Then for any r.v. X ∈ L¹ there exists a G-measurable r.v. XG such that for every A ∈ G,
∫_A X dP = ∫_A XG dP .
Here XG can be regarded as the conditional expectation E(X|G).
Proof. Given a positive r.v. X, let QX : F → R be a set function satisfying
QX(A) = ∫_A X dP ,   (3.14)
for A ∈ F. It can be checked that QX is a finite measure (a probability measure after dividing by E(X) when E(X) > 0) and QX ≪ P on G. Thus, applying the Radon-Nikodym Theorem to the measure QX(·) yields a G-measurable r.v., say Y, such that QX(A) = ∫_A Y dP for A ∈ G. Together with (3.14), Y satisfies i) and ii) of Definition 3.9. Thus XG = Y is the desired conditional expectation of X given G.
For a general r.v. X, we can write X = X⁺ − X⁻, where X⁺ = X 1_{X>0} and X⁻ = −X 1_{X<0} are positive r.v.s. Then the preceding result can be applied to each of X⁺ and X⁻, and the difference of the two conditional expectations is the desired XG.
Example 3.13. Referring to Example 3.12, we have
dP2/dP1(ω) = 2|ω − 3.5|/3 , dP1/dP2(ω) = 3/(2|ω − 3.5|) , dP3/dP1(ω) = 2 · 1_{ω≥4} and dP4/dP1(ω) = 2 · 1_{ω≤3} . What are dP3/dP2 and dP4/dP2 ?
On the other hand, note that PN((−∞, x)) = FN(x) = ∫_{−∞}^x fN(t) dt and PE((−∞, x)) = FE(x) = ∫_0^x fE(t) dt, where fN(x) and fE(x) are the p.d.f.s of the standard normal and standard exponential distributions, respectively. Note that fE(x)/fN(x) is Borel measurable and satisfies
∫_A ( fE(x)/fN(x) ) dPN = ∫_A ( fE(x)/fN(x) ) fN(x) dx = ∫_A fE(x) dx = PE(A) .
Thus, dPE/dPN = fE(x)/fN(x). However, dPN/dPE does not exist as PN is not absolutely continuous w.r.t. PE (fN(x)/fE(x) does not exist for x < 0).
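For the discrete measures of Examples 3.12-3.13, the Radon-Nikodym derivative is simply the ratio of point masses wherever the reference measure is positive. The following Python sketch (not from the text) verifies (3.11) for dP2/dP1 over every subset of Ω:

```python
# A minimal sketch (not from the text): for discrete measures, dQ/dP is the
# ratio of point masses, and Q(A) = sum over A of (dQ/dP) dP for every A.
from itertools import chain, combinations

omega = [1, 2, 3, 4, 5, 6]
P1 = {w: 1/6 for w in omega}
P2 = {w: abs(w - 3.5) / 9 for w in omega}

dP2_dP1 = {w: P2[w] / P1[w] for w in omega}        # = 2|w - 3.5| / 3
subsets = chain.from_iterable(combinations(omega, r) for r in range(7))
ok = all(
    abs(sum(P2[w] for w in A) - sum(dP2_dP1[w] * P1[w] for w in A)) < 1e-12
    for A in subsets
)
print(dP2_dP1[1], ok)                              # 1.666..., True
```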
3.5 General Properties of Conditional Expectations
Theorem 3.6. Conditional expectation has the following properties:
1. E(aX + bY|G) = aE(X|G) + bE(Y|G)   (linearity).
2. E(E(X|G)) = E(X).
3. E(XY|G) = X E(Y|G) if X is G-measurable.
4. E(X|G) = E(X) if X is independent of G.
5. E(E(X|G)|H) = E(X|H) if H ⊂ G   (tower property).
6. If X ≥ 0, then E(X|G) ≥ 0   (positivity).
Here a, b are real numbers, X, Y ∈ L¹ are defined on a probability space (Ω, F, P) and G, H ⊂ F are σ-fields. In 3., we also assume that the product XY ∈ L¹. All equalities and inequalities hold almost surely.
Proof. 1. For any B ∈ G,
∫_B ( aE(X|G) + bE(Y|G) ) dP = a ∫_B E(X|G) dP + b ∫_B E(Y|G) dP = ∫_B (aX + bY) dP .
The result thus follows from the definition of conditional expectation.
2. This follows by putting B = Ω in Example 3.10. Also, it is a special case of 5. when H = {∅, Ω}.
3. We first verify the result for X = I_A (an indicator function) where A ∈ G. In this case
∫_B I_A E(Y|G) dP = ∫_{A∩B} E(Y|G) dP = ∫_{A∩B} Y dP = ∫_B I_A Y dP
for any B ∈ G. From the definition of conditional expectation, this implies
I_A E(Y|G) = E(I_A Y|G) .
Using 1., we can extend the preceding result to a simple function X = ∑_{j=1}^m a_j I_{A_j}, i.e.,
X E(Y|G) = ∑_{j=1}^m a_j I_{A_j} E(Y|G) = E( ∑_{j=1}^m a_j I_{A_j} Y | G ) = E(XY|G) ,
where A_j ∈ G for j = 1, 2, . . . , m. Finally, the general result follows by approximating X with an increasing sequence of simple functions and using the MCT.
4. Since X is independent of G, the random variables X and I_B are independent for any B ∈ G. Thus
∫_B E(X) dP = E(X) ∫_B dP = E(X) ∫_Ω I_B dP = E(X) E(I_B) .
Since independent random variables are uncorrelated,
E(X) E(I_B) = E(X I_B) = ∫_B X dP .
Now we have obtained that ∫_B E(X) dP = ∫_B X dP for all B ∈ G, i.e., E(X|G) = E(X).
5. By definition,
∫_B E(X|G) dP = ∫_B X dP
for every B ∈ G, and
∫_B E(X|H) dP = ∫_B X dP
for every B ∈ H. Since H ⊂ G,
∫_B E(X|G) dP = ∫_B E(X|H) dP
for every B ∈ H. Therefore, the definition of conditional expectation implies that
E(E(X|G)|H) = E(X|H) .
6. For any n we put
An = { E(X|G) ≤ −1/n } .
Since E(X|G) is a G-measurable r.v., we have An ∈ G. If X ≥ 0 a.s., then
0 ≤ ∫_{An} X dP = ∫_{An} E(X|G) dP ≤ −(1/n) P(An) ,
which implies P(An) = 0. Since
{ E(X|G) < 0 } = ∪_{n=1}^∞ An ,
it follows that P{E(X|G) < 0} ≤ ∑_{n=1}^∞ P(An) = 0, completing the proof.
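The tower property 5. is easy to check on a finite sample space. The Python sketch below uses a toy example of my own (uniform measure on {1, . . . , 6}, with H coarser than G); it is an illustration, not the book's construction.

```python
# A minimal sketch (not from the text): finite-sample-space check of the tower
# property E(E(X|G)|H) = E(X|H) of Theorem 3.6.
omega = [1, 2, 3, 4, 5, 6]
p = {i: 1/6 for i in omega}
X = {i: i**2 for i in omega}

G_atoms = [{1, 2}, {3, 4}, {5, 6}]                 # atoms of G
H_atoms = [{1, 2, 3, 4}, {5, 6}]                   # atoms of H, with H subset of G

def cond_exp(Y, atoms):                            # E(Y | sigma-field), pointwise
    out = {}
    for A in atoms:
        avg = sum(Y[i] * p[i] for i in A) / sum(p[i] for i in A)
        out.update({i: avg for i in A})
    return out

lhs = cond_exp(cond_exp(X, G_atoms), H_atoms)      # E(E(X|G) | H)
rhs = cond_exp(X, H_atoms)                         # E(X | H)
print(all(abs(lhs[i] - rhs[i]) < 1e-12 for i in omega))   # True
```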
Example 3.14. (Independence and Zero-Correlation) Recall that X ∈ L¹ if E(|X|) < ∞. If X1 and X2 are independent, then they are uncorrelated, i.e., E(X1 X2) = E(X1)E(X2). To see this, note that
E(X1 X2) = E( E(X1 X2 | σ(X2)) ) = E( X2 E(X1 | σ(X2)) ) = E( X2 E(X1) ) = E(X1)E(X2) ,   (3.15)
where the four equalities are consequences of Properties 2., 3., 4. and 1. of Theorem 3.6, respectively. Repeating the argument in (3.15), we have that, if X1, . . . , Xn ∈ L¹ are independent, then they are uncorrelated, i.e.,
E(X1 X2 · · · Xn) = E(X1)E(X2) · · · E(Xn) ,
provided that the product X1 X2 · · · Xn ∈ L¹. However, the converse is not true. See Exercise 3.15.
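A Monte Carlo illustration (not from the text, with my own choice of distributions) of the relation E(X1 X2) = E(X1)E(X2) for independent random variables:

```python
# A minimal sketch (not from the text): independent draws are uncorrelated,
# so the sample mean of X1*X2 is close to the product of the sample means.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.exponential(scale=2.0, size=1_000_000)    # independent draws
x2 = rng.normal(loc=1.0, scale=3.0, size=1_000_000)

print(round(np.mean(x1 * x2), 3), round(np.mean(x1) * np.mean(x2), 3))
# both close to E(X1) E(X2) = 2 * 1 = 2
```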
3.6 Useful Inequalities
Definition 3.11. (Convex Function) A function ϕ : R → R is convex if for any
x, y ∈ R and any λ ∈ [0, 1]
ϕ(λ x + (1 − λ )y) ≤ λ ϕ(x) + (1 − λ )ϕ(y) .
Graphically, a function ϕ is convex if it lies below the straight line from the point
(x, ϕ(x)) to the point (y, ϕ(y)) on any interval [x, y].
Theorem 3.7. (Jensen’s Inequality) Let ϕ : R → R be a convex function and let
X ∈ L 1 on (Ω , F , P) such that ϕ(X) ∈ L 1 . Then
ϕ(E(X|G )) ≤ E(ϕ(X)|G ) a.s.
(3.16)
for any σ -field G on Ω contained in F .
Proof. By the convexity of ϕ, for every fixed c ∈ R there exists an ac such that ϕ(x) ≥ ac(x − c) + ϕ(c) for all x. (If ϕ is differentiable and ac = ϕ′(x)|_{x=c}, then the line ac(x − c) + ϕ(c) is the tangent of ϕ(x) at the point (c, ϕ(c)).) Replacing x by the r.v. X and taking conditional expectations on both sides gives
E(ϕ(X)|G) ≥ ac ( E(X|G) − c ) + ϕ(c) .
Since c is arbitrary, putting c = E(X|G) yields (3.16).
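Conditional Jensen can also be checked directly on a finite space. The sketch below (my own toy example, not from the text) uses ϕ(x) = x² and the σ-field generated by {1, 2, 3} on the fair-die space:

```python
# A minimal sketch (not from the text): finite check of conditional Jensen's
# inequality (3.16) with phi(x) = x^2 on the fair-die space.
omega = [1, 2, 3, 4, 5, 6]
p = {i: 1/6 for i in omega}
X = {i: float(i) for i in omega}
phi = lambda t: t**2
G_atoms = [{1, 2, 3}, {4, 5, 6}]

def cond_exp(Y, atoms):
    out = {}
    for A in atoms:
        avg = sum(Y[i] * p[i] for i in A) / sum(p[i] for i in A)
        out.update({i: avg for i in A})
    return out

lhs = {i: phi(v) for i, v in cond_exp(X, G_atoms).items()}       # phi(E(X|G))
rhs = cond_exp({i: phi(X[i]) for i in omega}, G_atoms)           # E(phi(X)|G)
print(all(lhs[i] <= rhs[i] + 1e-12 for i in omega))              # True
```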
Theorem 3.8. (Hölder Inequality) Let p > 1, 1/p + 1/q = 1 and X, Y be r.v.s on (Ω, F, P). If E(X^p) < ∞ and E(Y^q) < ∞, then E(|XY|) < ∞ and
E(XY) ≤ (E(X^p))^{1/p} (E(Y^q))^{1/q} .
Proof. Without loss of generality, assume that X and Y are positive r.v.s. First define a new probability measure Q on (Ω, F) by
Q(A) = ∫_A ( X^p / ‖X‖_p^p ) dP ,
where ‖X‖_p = (E(X^p))^{1/p}. Then,
∫_Ω X Y dP = ∫_Ω ( Y ‖X‖_p^p / X^{p−1} ) ( X^p / ‖X‖_p^p ) dP = ∫_Ω ( Y ‖X‖_p^p / X^{p−1} ) dQ
≤ ( ∫_Ω ( Y^q ‖X‖_p^{pq} / X^{q(p−1)} ) dQ )^{1/q}   (by Jensen's inequality)
= ( ∫_Ω ( Y^q ‖X‖_p^{pq} / X^{q(p−1)} ) ( X^p / ‖X‖_p^p ) dP )^{1/q}
= ( ∫_Ω ( Y^q ‖X‖_p^{p(q−1)} / X^{qp−q−p} ) dP )^{1/q}
= ‖X‖_p ( ∫_Ω Y^q dP )^{1/q}   (since qp − q − p = 0 and p(q − 1) = q when 1/p + 1/q = 1)
= ‖X‖_p ‖Y‖_q ,
completing the proof.
The following is the triangle inequality for the L^p norm ‖X‖_p = (E|X|^p)^{1/p}.
Theorem 3.9. (Minkowski's Inequality) Let p ≥ 1 and X, Y be r.v.s on (Ω, F, P). If E(|X|^p) < ∞ and E(|Y|^p) < ∞, then
‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p .
Proof. We focus on p > 1 as p = 1 corresponds to the usual triangle inequality. Note that
|X + Y|^p ≤ |X| |X + Y|^{p−1} + |Y| |X + Y|^{p−1} .   (3.17)
Set q = p/(p − 1) so that 1/p + 1/q = 1. Applying Hölder's inequality to |X| |X + Y|^{p−1} yields
∫_Ω |X| |X + Y|^{p−1} dP ≤ ‖X‖_p ( E|X + Y|^{(p−1)q} )^{1/q} = ‖X‖_p ‖X + Y‖_p^{p/q} .   (3.18)
Similarly, we have
∫_Ω |Y| |X + Y|^{p−1} dP ≤ ‖Y‖_p ‖X + Y‖_p^{p/q} .   (3.19)
The result follows from combining (3.17), (3.18) and (3.19).
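Both inequalities hold exactly for the empirical measure of any finite sample, so a quick Monte Carlo check (my own illustration, with arbitrary positive distributions) always returns True:

```python
# A minimal sketch (not from the text): sample-based check of the Holder and
# Minkowski inequalities for p = 3, q = 3/2.
import numpy as np

rng = np.random.default_rng(1)
X = rng.gamma(shape=2.0, size=500_000)
Y = rng.lognormal(mean=0.0, sigma=0.5, size=500_000)

p, q = 3.0, 1.5                                     # 1/p + 1/q = 1
norm = lambda Z, r: np.mean(np.abs(Z)**r) ** (1 / r)

print(np.mean(X * Y) <= norm(X, p) * norm(Y, q))        # Holder: True
print(norm(X + Y, p) <= norm(X, p) + norm(Y, p))        # Minkowski: True
```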
Given a sequence of random variables X1, X2, . . . on the probability space (Ω, F, P), we have the following generalizations of the MCT, Fatou's Lemma and the DCT:
Theorem 3.10. Let G be a sub-σ-field of F.
i) (Monotone Convergence Theorem (MCT)) If 0 ≤ Xn ↑ X, then E(Xn|G) ↑ E(X|G) a.s..
ii) (Fatou's Lemma) If Xn ≥ 0, then E(lim inf Xn|G) ≤ lim inf E(Xn|G) a.s..
iii) (Dominated Convergence Theorem (DCT)) If for some r.v. M and all n ≥ 1, |Xn| ≤ M (i.e. |Xn(ω)| ≤ M(ω) for all ω ∈ Ω), E(M) < ∞ and Xn → X a.s., then
E(Xn|G) → E(X|G) a.s..
3.7 Exercises
Exercise 3.11 Consider the dice example (Ω , F , P) where Ω = {1, 2, 3, 4, 5, 6} and
F = 2Ω . Let P({ω}) = 1/6 for ω = 1, . . . , 6, and
A1 = {1, 2, 3}, A2 = {4, 5, 6}, A3 = {2, 5}, A4 = {2, 6},
Xi = 1Ai , Fi = σ ({Ai }) for i = 1, . . . , 4 ,
X5 = X5 (ω) = ω1Ai (ω), F5 = σ ({A1 , A3 }), F6 = σ ({A3 , A4 }), F7 = 2Ω .
a) Which events are independent?
b) Which random variables are independent?
c) Which σ-fields are independent?
d) Repeat a)-c) using the measure P({i}) = |i − 3.5|/9 for i = 1, . . . , 6.
Exercise 3.12 Explain the meaning of independence of two events and of mutual exclusivity of two events. Can two events be both independent and mutually exclusive? Explain.
Exercise 3.13 How to define the independence between an event and a random
variable? How to define the independence between an event and a σ -field? How to
define the independence between a random variable and a σ -field?
Exercise 3.14 Show that two random variables X and Y are independent if and only
if the σ -fields σ (X) and σ (Y ) are independent.
Exercise 3.15 Let X ∼ N(0, 1) and Y = X 2 . Show that X and Y are uncorrelated
but dependent.
Exercise 3.16 Let IA be the indicator function of an event A, i.e.
IA(ω) = 1 if ω ∈ A , and IA(ω) = 0 if ω ∉ A .
Show that E(IA|B) = P(A|B) for any event B with P(B) ≠ 0.
Exercise 3.17 For the following r.v.s, write down the probability space and the induced probability space, and find the expectation of X.
i) (Constant r.v.) X(ω) = a for all ω.
ii) X : [0, 1] → R given by X(ω) = min{ω, 1 − ω} (the distance to the nearest endpoint of the interval [0, 1]).
iii) X : [0, 1]² → R, the distance to the nearest endpoint of the unit square.
Exercise 3.18 Let Ω = [0, 1], F = B[0,1] and P = λ[0,1] . Let X(ω) = ω(1 − ω) for
ω ∈ Ω . For any random variable Y on (Ω , F , P),
i) Show that Y (ω) + Y (1 − ω) is σ (X) measurable. (Hint: Draw the graph Y
against ω)
ii) Show that E(Y|X)(ω) = ( Y(1 − ω) + Y(ω) ) / 2 for any ω ∈ Ω.
Exercise 3.19 Take Ω = [0, 1], F = B[0,1] and P = λ[0,1] , the Lebesgue measure on
[0, 1]. Find E(X|Z) if X(ω) = 2ω 2 and Z(ω) = 1{ω∈[0,1/3]} + 2 × 1{ω∈(2/3,1]} .
Exercise 3.20 Show that if Z is a constant function, then E(X|Z) is constant and
equal to E(X).
Exercise 3.21 On (Ω, F, P), show that for A, B ∈ F,
E(1A|1B)(ω) = P(A|B) if ω ∈ B , and E(1A|1B)(ω) = P(A|Ω\B) if ω ∉ B .
Exercise 3.22 Recall that if Z : Ω → R is a discrete r.v. taking n values {zi}_{i=1,...,n}, then there exist disjoint Ai such that Ω = ∪_{i=1}^n Ai and Z(ω) = zi whenever ω ∈ Ai. Show that every element in σ(Z) is a union of some Ai's. Generalize to the case where Z is a discrete r.v. taking infinitely many values {zi}_{i=1,...}.
Exercise 3.23 Show that if X is G -measurable, then E(X|G ) = X a.s.
Exercise 3.24 Refer to Example 3.10. If B ∉ G, does the result still hold? Prove it or give a counterexample.
Exercise 3.25 Show that, if Q ≪ P, then for any ε > 0, we can find a δ > 0 such that P(F) < δ implies Q(F) ≤ ε.
Exercise 3.26 Show that, if P and Q satisfy (3.11) for any A ∈ F, then Q ≪ P.
Exercise 3.27 Let Q1, Q2 and P be measures on (Ω, F). Show that if Q1 ≪ P and Q2 ≪ P, then Q1 + Q2 ≪ P.
Exercise 3.28 Consider the probability space (Ω, F, P) with Ω = {1, 2, 3, 4, 5, 6} and F = 2^Ω. Suppose P({i}) = i/21 and Q({i}) = (i − 1)/15. Is P ≪ Q? Is Q ≪ P? Find the Radon-Nikodym derivatives dP/dQ and dQ/dP if they exist.
Exercise 3.29 Justify rigorously the equality P( ∩_{n=1}^∞ An ) = lim_{m→∞} P( ∩_{n=1}^m An ) in (3.5).
Exercise 3.30 Using the notation in the proof of Theorem 3.4:
i) From the existence of l = lim_{k→∞} ∫_Ω Yk² dP, show that we can find a subsequence {nk}_{k=1,...} such that l − 1/4^k < ∫_Ω Y_{nk}² dP ≤ l.
ii) Using the definition of Yk, show that ∫_Ω (Y_{n_{k+1}} − Y_{n_k})² dP = ∫_Ω (Y_{n_{k+1}}² − Y_{n_k}²) dP < 1/4^k.
iii) By the Cauchy-Schwarz inequality, show that ∫_Ω |Y_{n_{k+1}} − Y_{n_k}| dP < 1/2^k.
iv) Using (iii) and the MCT, show that ∫_Ω ∑_{k=1}^∞ |Y_{n_{k+1}} − Y_{n_k}| dP = ∑_{k=1}^∞ ∫_Ω |Y_{n_{k+1}} − Y_{n_k}| dP < ∞.
v) Using (iv), argue that ∑_{k=1}^∞ |Y_{n_{k+1}} − Y_{n_k}| < ∞ almost surely. (Hint: proof by contradiction.) Then, show that ∑_{k=1}^∞ (Y_{n_{k+1}} − Y_{n_k}) converges almost surely.
vi) Using (v), show that lim_{k→∞} Yk exists almost surely.
vii) Noting that l − 1/4^k < ∫_Ω Y_{n_k}² dP ≤ ∫_Ω Y_{R_{n_k}}² dP ≤ l, and repeating the arguments of (ii)-(iii), show that ∫_F (Y_{R_{n_k}}² − Y_{n_k}²) dP < 1/2^k.
viii) Using (vi) and (vii), show that ∫_F Y_{R_{n_k}} dP → ∫_F Y dP.
ix) Using ∫_A YP dS = P(A) = ∫_A dP, show that E_S(Y YP) = E_P(Y) for any simple function Y on F. Extend to the case where Y is F-measurable.