MA400. September Introductory Course
(Financial Mathematics, Risk & Stochastics)
P5. Conditional Expectation
1. We assume that an underlying probability space (Ω, F, P) is fixed.
2. Recall from Chapter P2, where we defined random variables, that σ-algebras are
mathematical models for information. Suppose that we are interested in a random
variable X, say the price of a certain stock at some future date. Consider a situation
where the information provided by a σ-algebra G ⊆ F becomes available to us.
For instance, G can identify with the information σ(Z) that is associated with the
observation of another random variable Z. The conditional expectation E[X | G] is
the expected value of X if we are given the information G.
The conditional expectation E[X | G] is a random variable that is G-measurable,
namely,
σ(E[X | G]) ⊆ G.
Indeed, once we possess the information G and we have updated our beliefs about
the likelihood of random events, the conditional expectation should be simply the
“updated” expectation of X. To better appreciate this point, we consider the following “extreme” cases.
- Suppose that the information G contains the “knowledge” of the actual value of
X, i.e., X is G-measurable (σ(X) ⊆ G). In this case, E[X | G] = X (see 18.(iii)
below), which reflects the idea that “knowledge” of G implies “knowledge” of
the actual value of X.
- Suppose that X is independent of the information G. In this case, E[X | G] = E[X]
(see 18.(xi) below), which conveys the idea that “knowledge” of G provides no
information about X.
- The trivial σ-algebra {Ω, ∅} can be viewed as a model for “absence of information”:
we can interpret Ω as the event that “something occurs” and ∅ as the event
that “nothing happens”. In this context, the result E[X | {Ω, ∅}] = E[X] (see
18.(i) below), which expresses expectation as a “special case” of conditional
expectation, is only the “expected” one.
3. Given an event B ∈ F, recall that E[1B ] = P(B). In view of this observation, it
is natural to identify the conditional probability P(B | G) of B given a σ-algebra
G ⊆ F with the conditional expectation E[1B | G] of the random variable 1B given
the σ-algebra G (see Definition 17 below).
4. It is important to keep in mind that conditional expectation and conditional probability are random variables: probability theory is concerned with the future!
5. In what follows, we start from elementary considerations and we end up with the
general mathematical Definitions 15 and 17 of conditional expectation and conditional probability.
Elementary conditional probability
6. Given an event B ∈ F, P(B) quantifies our views on how likely it is for the event
B to occur. Now, suppose that we have been informed that chance outcomes are
restricted within an event A ∈ F. In other words, suppose that somebody informs
us that every event that can occur is a subset of A, and every event that is a subset
of Ac cannot occur.
How should we modify our views, namely our probability measure, to account for
this scenario? Let us denote by P(B | A) our modified belief on the likelihood of
the event B ∈ F given the knowledge that A has occurred. Since the only new
information that we possess is that chance outcomes are restricted within the event
A, it is natural to postulate that P(B | A) should be proportional to P(B ∩ A),
namely
P(B | A) ∼ P(B ∩ A). (1)
However, our beliefs should “add up” to 1, so that we have a proper probability
measure. This means that we should impose the requirement that P(Ω | A) = 1.
Since P(Ω ∩ A) = P(A), we conclude that we should scale the right hand side of (1)
by 1/P(A) to obtain
P(B | A) = P(B ∩ A)/P(A). (2)
(Of course, this formula makes sense only if P(A) > 0.)
We can check that the function P(· | A) : F → [0, 1] defined by (2) is indeed a probability measure on (Ω, F) [Exercise].
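To make (2) concrete, here is a minimal Python sketch on a finite sample space with equally likely outcomes; the particular setup (two fair dice, A = “first die is even”, B = “the sum equals 8”) is invented for the illustration.

```python
# A minimal sketch of formula (2) on a finite sample space with equally likely
# outcomes: two fair dice. The events A = "first die is even" and B = "the sum
# equals 8" are invented for the illustration.
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # all 36 equally likely outcomes
A = {w for w in omega if w[0] % 2 == 0}        # first die shows an even number
B = {w for w in omega if sum(w) == 8}          # the two dice sum to 8

def prob(event):
    """P under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

# Formula (2): P(B | A) = P(B ∩ A)/P(A), which makes sense since P(A) > 0.
p_B_given_A = prob(B & A) / prob(A)
print(p_B_given_A)   # 1/6: the outcomes (2,6), (4,4), (6,2) out of the 18 in A
```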
7. Law of total probability. Suppose that the events A1 , A2 , . . . , An ∈ F form a partition
of Ω, i.e., Ai ∩ Aj = ∅ for all i ≠ j, and A1 ∪ A2 ∪ · · · ∪ An = Ω. Given an event B ∈ F, the
events B ∩ A1 , B ∩ A2 , . . . , B ∩ An are pairwise disjoint and (B ∩ A1 ) ∪ (B ∩ A2 ) ∪ · · · ∪ (B ∩ An ) = B.
Therefore, the additivity property of a probability measure and (2) imply
P(B) = P(B ∩ A1 ) + P(B ∩ A2 ) + · · · + P(B ∩ An )
= P(B | A1 )P(A1 ) + P(B | A2 )P(A2 ) + · · · + P(B | An )P(An ). (3)
8. Bayes’ theorem. Suppose that the events A1 , A2 , . . . , An ∈ F form a partition of Ω.
Given an event B ∈ F such that P(B) > 0, and any k = 1, 2, . . . , n, (2) and the law
of total probability (3) imply
P(Ak | B) = P(Ak ∩ B)/P(B)
= P(B | Ak )P(Ak ) / [P(B | A1 )P(A1 ) + P(B | A2 )P(A2 ) + · · · + P(B | An )P(An )]. (4)
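As a numerical illustration of (3) and (4), the following Python sketch works through a standard two-event partition; the events and all the numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are invented for the example and are not part of the notes.

```python
# A numerical illustration of (3) and (4) with a two-event partition: A1 = "ill",
# A2 = "healthy", B = "the test is positive". All numbers are invented for the
# example (1% prevalence, 95% sensitivity, 5% false-positive rate).
from fractions import Fraction

P_A = [Fraction(1, 100), Fraction(99, 100)]          # P(A1), P(A2), a partition
P_B_given_A = [Fraction(95, 100), Fraction(5, 100)]  # P(B | A1), P(B | A2)

# Law of total probability (3):
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

# Bayes’ theorem (4) with k = 1:
P_A1_given_B = P_B_given_A[0] * P_A[0] / P_B
print(P_B, P_A1_given_B)   # 59/1000 and 19/118 ≈ 0.16: a positive test leaves
                           # the probability of illness at only about 16%
```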
9. Conditional probabilities defined as in (2) have an a posteriori character: we have
been informed and we know that event A has occurred. How should we develop our
theory to account for a prior-to-observation perspective? In other words, suppose
that we anticipate an observation that will inform us on whether A or Ac occurs.
How should we modify our views to account for this situation?
Given the arguments in Paragraph 6 above, the natural answer is to set
P(B | “observation of A or Ac ”) = P(B | A) if A occurs, and P(B | Ac ) if Ac occurs; that is,
P(B | “observation of A or Ac ”) = P(B | A) 1A + P(B | Ac ) 1Ac , (5)
provided, of course, that 0 < P(A) < 1. Observe that our views on how likely it
is for the event B to occur have now become a simple random variable. Given any
sample point ω ∈ Ω, the conditional probability of the event B takes the value P(B | A)
if ω ∈ A and takes the value P(B | Ac ) if ω ∈ Ac .
10. Given events A, B ∈ F such that 0 < P(A) < 1, the random variable Y defined by
Y = (P(A ∩ B)/P(A)) 1A + (P(Ac ∩ B)/P(Ac )) 1Ac
is the conditional probability of B given the σ-algebra {∅, Ω, A, Ac }. We denote this
conditional probability by
P(B | {∅, Ω, A, Ac }) = E[1B | {∅, Ω, A, Ac }].
We note that this random variable Y has the following properties:
(i) Y is {∅, Ω, A, Ac }-measurable.
(ii) E[|Y |] < ∞.
(iii) E[1C Y ] = E[1C 1B ] for all C ∈ {∅, Ω, A, Ac }. Indeed, given any C ∈ {∅, Ω, A, Ac },
E[1C Y ] = E[(P(A ∩ B)/P(A)) 1C 1A + (P(Ac ∩ B)/P(Ac )) 1C 1Ac ]
= (P(A ∩ B)/P(A)) P(A ∩ C) + (P(Ac ∩ B)/P(Ac )) P(Ac ∩ C)
= P(∅ ∩ B), if C = ∅,
  P(Ω ∩ B), if C = Ω,
  P(A ∩ B), if C = A,
  P(Ac ∩ B), if C = Ac ,
= E[1C 1B ].
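The following Python sketch replays Paragraph 10 on a four-point sample space and checks properties (i)–(iii) exactly; the particular Ω, A, and B are assumptions of the example.

```python
# A sanity check of Paragraph 10 on a four-point sample space with the uniform
# measure; the particular Ω, A, and B below are assumptions of the example.
from fractions import Fraction

omega = ["a", "b", "c", "d"]
P = {w: Fraction(1, 4) for w in omega}      # uniform probability measure
A, Ac = {"a", "b"}, {"c", "d"}
B = {"a", "b", "c"}

def prob(event):
    return sum(P[w] for w in event)

# Y = (P(A ∩ B)/P(A)) 1_A + (P(Ac ∩ B)/P(Ac)) 1_Ac is constant on A and on Ac,
# hence {∅, Ω, A, Ac}-measurable (property (i)); it is bounded, so (ii) holds.
Y = {w: prob(A & B) / prob(A) if w in A else prob(Ac & B) / prob(Ac)
     for w in omega}

# Property (iii): E[1_C Y] = E[1_C 1_B] = P(C ∩ B) for every C in the σ-algebra.
for C in [set(), set(omega), A, Ac]:
    lhs = sum(Y[w] * P[w] for w in C)       # E[1_C Y]
    assert lhs == prob(C & B)               # E[1_C 1_B]
print(Y)   # Y equals 1 on A and 1/2 on Ac
```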
Conditional expectation of a simple random variable given another
simple random variable
11. Consider two simple random variables X and Z and suppose that
X = Σ_{i=1}^n xi 1{X=xi } and Z = Σ_{j=1}^m zj 1{Z=zj } ,
for some distinct x1 , . . . , xn and z1 , . . . , zm . Also, assume that P(Z = zj ) > 0 for all
j = 1, . . . , m.
We observe that the σ-algebra σ(Z) is easy to describe: it consists of all possible
unions of sets in the family {Z = z1 }, . . . , {Z = zm }, namely,
σ(Z) = { ∪_{k∈J} {Z = zk } | J ⊆ {1, . . . , m} }, (6)
with the convention that ∪_{k∈∅} {Z = zk } = ∅.
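As a quick illustration of (6), the following Python sketch enumerates σ(Z) for a simple random variable by forming all unions of its level sets; the sample space and the choice of Z below are assumptions of the example.

```python
# A short illustration of (6): for a simple random variable, σ(Z) consists of
# all unions of the level sets {Z = z_j}. The sample space and Z below are
# assumptions of the example (Z = parity, so m = 2 and σ(Z) has 2^2 = 4 events).
from itertools import chain, combinations

omega = [0, 1, 2, 3, 4, 5]
Z = {w: w % 2 for w in omega}                  # Z(ω) = ω mod 2

# Level sets {Z = z_j}, j = 1, ..., m.
levels = {}
for w, z in Z.items():
    levels.setdefault(z, set()).add(w)
atoms = list(levels.values())

# All unions over index sets J ⊆ {1, ..., m}, with the empty union equal to ∅.
sigma_Z = [set().union(*(atoms[k] for k in J)) if J else set()
           for J in chain.from_iterable(combinations(range(len(atoms)), r)
                                        for r in range(len(atoms) + 1))]

print(sigma_Z)   # [set(), {0, 2, 4}, {1, 3, 5}, {0, 1, 2, 3, 4, 5}]
```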
12. Suppose that we have made an “experiment” that has informed us about the actual
value of Z. In particular, suppose that we have been given the information that
the actual value of the random variable Z is zj , for some j = 1, . . . , m. In this
context where we know that the event {Z = zj } has occurred, we should revise
our probabilities from P(·) to P(· | Z = zj ). Furthermore, we should revise the
expectation of X from E[X] to
E[X | Z = zj ] = Σ_{i=1}^n xi P(X = xi | Z = zj ).
This conditional expectation, the conditional expectation of X given that the random
variable Z is equal to zj , which is a real number, has an a posteriori character: we
have been informed that the actual value of Z is zj .
Prior to observation, namely, before we observe the value of the random variable
Z, it is natural to define the conditional expectation of X given the information we
can obtain from the observation of the random variable Z by
E[X | σ(Z)] ≡ E[X | Z] = Σ_{j=1}^m E[X | Z = zj ] 1{Z=zj } ,
which is a random variable.
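The construction above is easy to carry out mechanically. The following Python sketch builds E[X | Z = zj ] and hence E[X | Z] from a joint probability mass function; the table p below is an invented example.

```python
# A minimal sketch of Paragraph 12: building E[X | Z = z_j] and then the random
# variable E[X | Z] from a joint probability mass function. The table p below
# (X and Z each taking the values 0 and 1) is an invented example.
from fractions import Fraction

p = {  # joint pmf P(X = x, Z = z)
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def p_Z(z):
    return sum(q for (_, zz), q in p.items() if zz == z)

def cond_exp_X_given(z):
    """E[X | Z = z] = Σ_i x_i P(X = x_i | Z = z): a real number."""
    return sum(x * q / p_Z(z) for (x, zz), q in p.items() if zz == z)

# E[X | Z] is the random variable ω ↦ E[X | Z = Z(ω)]: it is a function of Z
# alone, hence σ(Z)-measurable.
E_X_given_Z = {z: cond_exp_X_given(z) for z in {z for (_, z) in p}}
print(E_X_given_Z)   # E[X | Z = 0] = 3/4 and E[X | Z = 1] = 1/2
```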
13. The random variable
Y = Σ_{j=1}^m E[X | Z = zj ] 1{Z=zj } ,
namely, the conditional expectation E[X | σ(Z)] ≡ E[X | Z] of X given the σ-algebra σ(Z), has the following properties:
(i) Y is σ(Z)-measurable because Y is constant on the event {Z = zj } for all
j = 1, . . . , m.
(ii) E[|Y |] < ∞ because |Y | can take only m possible values.
(iii) E[1C Y ] = E[1C X] for all C ∈ σ(Z). To see this claim, we consider any event
C ∈ σ(Z), use (6) to write C = ∪_{k∈J} {Z = zk } for some J ⊆ {1, . . . , m}, and calculate
E[1C Y ] = E[(Σ_{k∈J} 1{Z=zk })(Σ_{j=1}^m (Σ_{i=1}^n xi P(X = xi , Z = zj )/P(Z = zj )) 1{Z=zj })]
= E[Σ_{k∈J} Σ_{j=1}^m Σ_{i=1}^n xi (P(X = xi , Z = zj )/P(Z = zj )) 1{Z=zj } 1{Z=zk }]
= E[Σ_{k∈J} Σ_{i=1}^n xi (P(X = xi , Z = zk )/P(Z = zk )) 1{Z=zk }]   (since 1{Z=zj } 1{Z=zk } = 0 for j ≠ k)
= Σ_{i=1}^n xi Σ_{k∈J} (P(X = xi , Z = zk )/P(Z = zk )) E[1{Z=zk }]
= Σ_{i=1}^n xi Σ_{k∈J} P(X = xi , Z = zk )
= Σ_{i=1}^n xi Σ_{k∈J} E[1{X=xi }∩{Z=zk } ]
= Σ_{i=1}^n xi Σ_{k∈J} E[1{X=xi } 1{Z=zk } ]
= Σ_{i=1}^n xi E[1{X=xi } Σ_{k∈J} 1{Z=zk } ]
= Σ_{i=1}^n xi E[1{X=xi } 1C ]
= E[1C Σ_{i=1}^n xi 1{X=xi } ]
= E[1C X].
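The same computation can be checked numerically. The following self-contained Python sketch verifies property (iii) for the invented joint pmf used in the sketch after Paragraph 12, running over every C ∈ σ(Z).

```python
# A self-contained numerical check of property (iii) in Paragraph 13, using the
# same invented joint pmf as the sketch after Paragraph 12: E[1_C Y] = E[1_C X]
# for every C ∈ σ(Z), where C runs over all unions of level sets of Z.
from fractions import Fraction
from itertools import chain, combinations

p = {  # joint pmf P(X = x, Z = z)
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(2, 8), (1, 1): Fraction(2, 8),
}
z_values = sorted({z for (_, z) in p})

def p_Z(z):
    return sum(q for (_, zz), q in p.items() if zz == z)

# Y = E[X | Z]: on {Z = z} it takes the value Σ_i x_i P(X = x_i, Z = z)/P(Z = z).
Y = {z: sum(x * q for (x, zz), q in p.items() if zz == z) / p_Z(z)
     for z in z_values}

# Every C ∈ σ(Z) corresponds to an index set J; check E[1_C Y] = E[1_C X].
for J in chain.from_iterable(combinations(z_values, r)
                             for r in range(len(z_values) + 1)):
    lhs = sum(Y[z] * q for (x, z), q in p.items() if z in J)   # E[1_C Y]
    rhs = sum(x * q for (x, z), q in p.items() if z in J)      # E[1_C X]
    assert lhs == rhs
print("E[1_C Y] = E[1_C X] verified for all C in σ(Z)")
```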
Conditional expectation of a continuous random variable given another
continuous random variable
14. Suppose that X and Z are continuous random variables with joint probability density function fXZ , so that
fZ (z) = ∫_{−∞}^{∞} fXZ (x, z) dx
is the probability density function of Z, and assume that
E[|X|] = ∫_{−∞}^{∞} |x| fX (x) dx = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |x| fXZ (x, z) dx dz < ∞.
We define the conditional probability density function of X given Z by
fX|Z (x|z) = fXZ (x, z)/fZ (z), if fZ (z) ≠ 0, and fX|Z (x|z) = 0, if fZ (z) = 0.
The random variable
Y = ∫_{−∞}^{∞} x fX|Z (x|Z) dx
is the conditional expectation E [X | σ(Z)] ≡ E [X | Z] of X given the σ-algebra
σ(Z) and has the following properties:
(i) Y is σ(Z)-measurable.
(ii) E[|Y |] < ∞. Indeed, we note that x ↦ |x| is a convex function and we use
Jensen’s inequality to calculate
E[|Y |] = ∫_{−∞}^{∞} |∫_{−∞}^{∞} x fX|Z (x|z) dx| fZ (z) dz
≤ ∫_{−∞}^{∞} (∫_{−∞}^{∞} |x| fX|Z (x|z) dx) fZ (z) dz
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} |x| fX|Z (x|z) fZ (z) dx dz
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} |x| fXZ (x, z) dx dz
= E[|X|] < ∞.
(iii) E[1C Y ] = E[1C X] for all C ∈ σ(Z). To see this claim, we first note that, given
any event C ∈ σ(Z), there exists A ∈ B(R) such that
C = {ω ∈ Ω | Z(ω) ∈ A}.
In view of this observation, we calculate
E[1C Y ] = E[1{Z∈A} ∫_{−∞}^{∞} x fX|Z (x|Z) dx]
= ∫_{−∞}^{∞} 1{z∈A} (∫_{−∞}^{∞} x fX|Z (x|z) dx) fZ (z) dz
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} 1{z∈A} x fX|Z (x|z) fZ (z) dx dz
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} 1{z∈A} x fXZ (x, z) dx dz
= E[1{Z∈A} X]
= E[1C X].
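For a concrete continuous example, recall (a known closed-form fact, not proved in these notes) that if (X, Z) is jointly normal with standard normal marginals and correlation ρ, then E[X | Z = z] = ρz. The following Python sketch recovers this by integrating x fX|Z (x|z) numerically; the truncation interval, grid size, and value of ρ are assumptions of the sketch.

```python
# A hedged numerical illustration of Paragraph 14. It is a known closed-form
# fact (not proved in these notes) that for jointly normal (X, Z) with standard
# normal marginals and correlation rho, E[X | Z = z] = rho * z. Below we recover
# this by integrating x f_{X|Z}(x|z) numerically; the truncation interval, grid
# size, and value of rho are assumptions of this sketch.
import math

rho = 0.6

def f_XZ(x, z):
    """Bivariate standard normal density with correlation rho."""
    det = 1.0 - rho * rho
    q = (x * x - 2.0 * rho * x * z + z * z) / det
    return math.exp(-q / 2.0) / (2.0 * math.pi * math.sqrt(det))

def f_Z(z):
    """Standard normal density (the marginal of Z)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def cond_exp(z, lo=-10.0, hi=10.0, n=4000):
    """Trapezoidal approximation of ∫ x f_{X|Z}(x|z) dx over [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    ys = [x * f_XZ(x, z) / f_Z(z) for x in xs]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

for z in (-1.0, 0.0, 2.0):
    print(z, cond_exp(z), rho * z)   # the last two columns agree closely
```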
Definitions and existence
15. Definition. Consider a random variable X such that E |X| < ∞, and let G ⊆ F
be a σ-algebra on Ω. The conditional expectation E[X | G] of the random variable
X given the σ-algebra G is any random variable Y such that
(i) Y is G-measurable,
(ii) E[|Y |] < ∞, and
(iii) for every event C ∈ G,
E[1C Y ] = E[1C X].
We say that a random variable Y with the properties (i)–(iii) is a version of the
conditional expectation E[X | G] of X given G, and we write Y = E[X | G], P-a.s..
16. Theorem. Consider a random variable X such that E |X| < ∞, and let G ⊆ F
be a σ-algebra. There exists a random variable Y having properties (i)–(iii) in
Definition 15.
Furthermore, Y is unique in the sense that, if Ỹ is another random variable satisfying the required properties, then Ỹ = Y , P-a.s..
17. Definition. Consider an event B ∈ F, and let G ⊆ F be a σ-algebra. The conditional
probability of B given G is the random variable defined by
P(B | G) = E[1B | G].
Properties of conditional expectation
18. Theorem. In the following list of properties of conditional expectation, we assume
that all random variables are in L1 , and that G, H ⊆ F are σ-algebras on Ω.
(i) E[X | {Ω, ∅}] = E[X].
(ii) If Y is a version of E[X | G], then E[Y ] = E[X].
(iii) If X is G-measurable, then E[X | G] = X, P-a.s..
(iv) (Linearity) Given constants a1 , a2 ∈ R, and random variables X1 , X2 ,
E[a1 X1 + a2 X2 | G] = a1 E[X1 | G] + a2 E[X2 | G],
P-a.s..
Precisely, if Y1 is a version of E[X1 | G] and Y2 is a version of E[X2 | G], then
a1 Y1 + a2 Y2 is a version of E[a1 X1 + a2 X2 | G].
(v) (Conditional monotone convergence theorem) If (Xn ) is an increasing sequence
of positive random variables (i.e., 0 ≤ X1 ≤ X2 ≤ · · · ≤ Xn ≤ · · · ) converging
to the random variable X, P-a.s., then
lim_{n→∞} E[Xn | G] = E[X | G], P-a.s..
(vi) (Conditional dominated convergence theorem) If (Xn ) is a sequence of random
variables that converges to a random variable X, P-a.s., and is such that |Xn | ≤
Z, P-a.s., for all n ≥ 1, for some random variable Z, then
lim_{n→∞} E[Xn | G] = E[X | G], P-a.s..
(vii) (Conditional Fatou’s lemma) If (Xn ) is a sequence of random variables such
that Xn ≥ Z, P-a.s., for all n ≥ 1, for some random variable Z, then
E[lim inf_{n→∞} Xn | G] ≤ lim inf_{n→∞} E[Xn | G], P-a.s..
Similarly, if (Xn ) is a sequence of random variables such that Xn ≤ Z, P-a.s.,
for all n ≥ 1, for some random variable Z, then
E[lim sup_{n→∞} Xn | G] ≥ lim sup_{n→∞} E[Xn | G], P-a.s..
(viii) (Conditional Jensen’s inequality) Given a random variable X and a convex
function g : R → R,
E[g(X) | G] ≥ g(E[X | G]), P-a.s..
(ix) (Tower property) If H ⊆ G, then
E[E[X | G] | H] = E[X | H],
P-a.s..
(x) (“Taking out what is known”) If Z is G-measurable, then
E[ZX | G] = ZE[X | G],
P-a.s..
(xi) (Independence) If H is independent of σ(σ(X) ∪ G), then
E[X | σ(G ∪ H)] = E[X | G],
P-a.s..
In particular, if σ(X) and H are independent σ-algebras,
E[X | H] = E[X], P-a.s..
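To close, here is a small exact check of the tower property (ix) in Python; the setup (three fair coin tosses, X = number of heads, G = σ(first two tosses), H = σ(first toss)) is an invented example.

```python
# A small exact check of the tower property (ix): three fair coin tosses,
# X = number of heads, G = σ(first two tosses), H = σ(first toss). The setup
# is an invented example; the measure is uniform on the 8 outcomes.
from fractions import Fraction
from itertools import product

omega = list(product((0, 1), repeat=3))     # outcomes (t1, t2, t3), heads = 1
X = {w: sum(w) for w in omega}

def cond_exp(f, key):
    """E[f | σ(key)]: average f over each atom {key = constant}."""
    atoms = {}
    for w in omega:
        atoms.setdefault(key(w), []).append(w)
    return {w: Fraction(sum(f[v] for v in atoms[key(w)]), len(atoms[key(w)]))
            for w in omega}

E_X_G = cond_exp(X, key=lambda w: w[:2])    # E[X | G]
lhs = cond_exp(E_X_G, key=lambda w: w[0])   # E[E[X | G] | H]
rhs = cond_exp(X, key=lambda w: w[0])       # E[X | H]
assert lhs == rhs                           # the tower property, here exactly
print(sorted(set(rhs.values())))            # [Fraction(1, 1), Fraction(2, 1)]
```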