MA400. September Introductory Course (Financial Mathematics, Risk & Stochastics)

P5. Conditional Expectation

1. We assume that an underlying probability space (Ω, F, P) is fixed.

2. Recall from Chapter P2, where we defined random variables, that σ-algebras are mathematical models for information. Suppose that we are interested in a random variable X, say the price of a certain stock at some future date. Consider a situation where the information provided by a σ-algebra G ⊆ F becomes available to us. For instance, G can be identified with the information σ(Z) that is associated with the observation of another random variable Z. The conditional expectation E[X | G] is the expected value of X given the information G. The conditional expectation E[X | G] is a random variable that is G-measurable, namely, σ(E[X | G]) ⊆ G. Indeed, once we possess the information G and have updated our beliefs about the likelihood of random events, the conditional expectation should simply be the "updated" expectation of X. To better appreciate this point, we consider the following "extreme" cases.

- Suppose that the information G contains the "knowledge" of the actual value of X, i.e., X is G-measurable (σ(X) ⊆ G). In this case, E[X | G] = X (see 18.(iii) below), which reflects the idea that "knowledge" of G implies "knowledge" of the actual value of X.

- Suppose that X is independent of the information G. In this case, E[X | G] = E[X] (see 18.(xi) below), which conveys the idea that "knowledge" of G provides no information about X.

- The trivial σ-algebra {Ω, ∅} can be viewed as a model for "absence of information": we can interpret Ω as the event that "something occurs" and ∅ as the event that "nothing happens". In this context, the result E[X | {Ω, ∅}] = E[X] (see 18.(i) below), which expresses expectation as a "special case" of conditional expectation, is only the "expected" one.

3. Given an event B ∈ F, recall that E[1_B] = P(B).
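The identity E[1_B] = P(B) can be checked concretely on a finite sample space. The following sketch computes both sides directly; the two-fair-dice model with uniform probability is an illustrative assumption, not part of these notes.

```python
import itertools
from fractions import Fraction

# Illustrative model (assumed): two fair dice, uniform probability on 36 outcomes.
omega = list(itertools.product(range(1, 7), range(1, 7)))
p = {w: Fraction(1, 36) for w in omega}

# Event B: the sum of the two dice equals 7.
B = {w for w in omega if w[0] + w[1] == 7}

# P(B) computed directly ...
prob_B = sum(p[w] for w in B)

# ... coincides with the expectation of the indicator random variable 1_B.
expect_indicator = sum((1 if w in B else 0) * p[w] for w in omega)

print(prob_B, expect_indicator)  # both equal 1/6
```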
In view of this observation, it is natural to identify the conditional probability P(B | G) of B given a σ-algebra G ⊆ F with the conditional expectation E[1_B | G] of the random variable 1_B given the σ-algebra G (see Definition 17 below).

4. It is important to keep in mind that conditional expectation and conditional probability are random variables: probability theory is concerned with the future!

5. In what follows, we start from elementary considerations and end up with the general mathematical Definitions 15 and 17 of conditional expectation and conditional probability.

Elementary conditional probability

6. Given an event B ∈ F, P(B) quantifies our views on how likely it is for the event B to occur. Now, suppose that we have been informed that chance outcomes are restricted within an event A ∈ F. In other words, suppose that somebody informs us that all events that are likely to happen are subsets of A, and all events that are subsets of Aᶜ cannot occur. How should we modify our views, namely our probability measure, to account for this scenario? Let us denote by P(B | A) our modified belief about the likelihood of the event B ∈ F given the knowledge that A has occurred. Since the only new information that we possess is that chance outcomes are restricted within the event A, it is natural to postulate that P(B | A) should be proportional to P(B ∩ A), namely

    P(B | A) ∝ P(B ∩ A).    (1)

However, our beliefs should "add up" to 1, so that we have a proper probability measure. This means that we should impose the requirement that P(Ω | A) = 1. Since P(Ω ∩ A) = P(A), we conclude that we should scale the right-hand side of (1) by 1/P(A) to obtain

    P(B | A) = P(B ∩ A) / P(A).    (2)

(Of course, this formula makes sense only if P(A) > 0.) We can check that the function P(· | A) : F → [0, 1] defined by (2) is indeed a probability measure on (Ω, F) [Exercise].

7. Law of total probability. Suppose that the events A1, A2, . . .
, An ∈ F form a partition of Ω, i.e., Ai ∩ Aj = ∅ for all i ≠ j and A1 ∪ A2 ∪ · · · ∪ An = Ω. Given an event B ∈ F, the events B ∩ A1, B ∩ A2, . . . , B ∩ An are pairwise disjoint and their union is B. Therefore, the additivity property of a probability measure and (2) imply

    P(B) = P(B ∩ A1) + P(B ∩ A2) + · · · + P(B ∩ An)
         = P(B | A1)P(A1) + P(B | A2)P(A2) + · · · + P(B | An)P(An).    (3)

8. Bayes' theorem. Suppose that the events A1, A2, . . . , An ∈ F form a partition of Ω. Given an event B ∈ F such that P(B) > 0, and any k = 1, 2, . . . , n, (2) and the law of total probability (3) imply

    P(Ak | B) = P(Ak ∩ B) / P(B)
              = P(B | Ak)P(Ak) / [P(B | A1)P(A1) + P(B | A2)P(A2) + · · · + P(B | An)P(An)].    (4)

9. Conditional probabilities defined as in (2) have an a posteriori character: we have been informed, and we know, that the event A has occurred. How should we develop our theory to account for a prior-to-observation perspective? In other words, suppose that we anticipate an observation that will inform us on whether A or Aᶜ occurs. How should we modify our views to account for this situation? Given the arguments in Paragraph 6 above, the natural answer is to set

    P(B | "observation of A or Aᶜ") = P(B | A),  if A occurs,
                                      P(B | Aᶜ), if Aᶜ occurs,
                                    = P(B | A) 1_A + P(B | Aᶜ) 1_{Aᶜ},    (5)

provided, of course, that 0 < P(A) < 1. Observe that our views on how likely it is for the event B to occur have now become a simple random variable. Given any sample ω ∈ Ω, the conditional probability of the event B takes the value P(B | A) if ω ∈ A and the value P(B | Aᶜ) if ω ∈ Aᶜ.

10. Given events A, B ∈ F such that 0 < P(A) < 1, the random variable Y defined by

    Y = [P(A ∩ B) / P(A)] 1_A + [P(Aᶜ ∩ B) / P(Aᶜ)] 1_{Aᶜ}

is the conditional probability of B given the σ-algebra {∅, Ω, A, Aᶜ}. We denote this conditional probability by

    P(B | {∅, Ω, A, Aᶜ}) = E[1_B | {∅, Ω, A, Aᶜ}].

We note that this random variable Y has the following properties:

(i) Y is {∅, Ω, A, Aᶜ}-measurable.

(ii) E[|Y|] < ∞.
(iii) E[1_C Y] = E[1_C 1_B] for all C ∈ {∅, Ω, A, Aᶜ}. Indeed, given any C ∈ {∅, Ω, A, Aᶜ},

    E[1_C Y] = E[ (P(A ∩ B) / P(A)) 1_C 1_A + (P(Aᶜ ∩ B) / P(Aᶜ)) 1_C 1_{Aᶜ} ]
             = (P(A ∩ B) / P(A)) P(A ∩ C) + (P(Aᶜ ∩ B) / P(Aᶜ)) P(Aᶜ ∩ C)
             = P(∅ ∩ B),  if C = ∅,
               P(Ω ∩ B),  if C = Ω,
               P(A ∩ B),  if C = A,
               P(Aᶜ ∩ B), if C = Aᶜ,
             = E[1_C 1_B].

Conditional expectation of a simple random variable given another simple random variable

11. Consider two simple random variables X and Z, and suppose that

    X = Σ_{i=1}^{n} xi 1_{X = xi}   and   Z = Σ_{j=1}^{m} zj 1_{Z = zj},

for some distinct x1, . . . , xn and z1, . . . , zm. Also, assume that P(Z = zj) > 0 for all j = 1, . . . , m. We observe that the σ-algebra σ(Z) is easy to describe: it consists of all possible unions of sets in the family {Z = z1}, . . . , {Z = zm}, namely,

    σ(Z) = { ⋃_{k ∈ J} {Z = zk} | J ⊆ {1, . . . , m} },    (6)

with the convention that ⋃_{k ∈ ∅} {Z = zk} = ∅.

12. Suppose that we have made an "experiment" that has informed us about the actual value of Z. In particular, suppose that we have been given the information that the actual value of the random variable Z is zj, for some j = 1, . . . , m. In this context, where we know that the event {Z = zj} has occurred, we should revise our probabilities from P(·) to P(· | Z = zj). Furthermore, we should revise the expectation of X from E[X] to

    E[X | Z = zj] = Σ_{i=1}^{n} xi P(X = xi | Z = zj).

This conditional expectation, the conditional expectation of X given that the random variable Z is equal to zj, which is a real number, has an a posteriori character: we have been informed that the actual value of Z is zj. Prior to observation, namely, before we observe the value of the random variable Z, it is natural to define the conditional expectation of X given the information we can obtain from the observation of the random variable Z by

    E[X | σ(Z)] ≡ E[X | Z] = Σ_{j=1}^{m} E[X | Z = zj] 1_{Z = zj},

which is a random variable.

13.
The random variable

    Y = Σ_{j=1}^{m} E[X | Z = zj] 1_{Z = zj},

namely, the conditional expectation E[X | σ(Z)] ≡ E[X | Z] of X given the σ-algebra σ(Z), has the following properties:

(i) Y is σ(Z)-measurable, because Y is constant on the event {Z = zj} for all j = 1, . . . , m.

(ii) E[|Y|] < ∞, because |Y| can take only m possible values.

(iii) E[1_C Y] = E[1_C X] for all C ∈ σ(Z). To see this claim, we consider any event C ∈ σ(Z) and use (6) to write C = ⋃_{k ∈ J} {Z = zk} for some J ⊆ {1, . . . , m}, so that 1_C = Σ_{k ∈ J} 1_{Z = zk}. We then calculate

    E[1_C Y] = E[ ( Σ_{k ∈ J} 1_{Z = zk} ) ( Σ_{j=1}^{m} ( Σ_{i=1}^{n} xi P(X = xi, Z = zj) / P(Z = zj) ) 1_{Z = zj} ) ]
             = E[ Σ_{i=1}^{n} Σ_{k ∈ J} xi ( P(X = xi, Z = zk) / P(Z = zk) ) 1_{Z = zk} ]
             = Σ_{i=1}^{n} xi Σ_{k ∈ J} P(X = xi, Z = zk)
             = Σ_{i=1}^{n} xi Σ_{k ∈ J} E[ 1_{X = xi} 1_{Z = zk} ]
             = Σ_{i=1}^{n} xi E[ 1_{X = xi} 1_C ]
             = E[ 1_C Σ_{i=1}^{n} xi 1_{X = xi} ]
             = E[1_C X].

Conditional expectation of a continuous random variable given another continuous random variable

14. Suppose that X and Z are continuous random variables with joint probability density function fXZ, so that

    fZ(z) = ∫_{−∞}^{∞} fXZ(x, z) dx

is the probability density function of Z, and assume that

    E[|X|] = ∫_{−∞}^{∞} |x| fX(x) dx = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |x| fXZ(x, z) dx dz < ∞.

We define the conditional probability density function of X given Z by

    fX|Z(x|z) = fXZ(x, z) / fZ(z), if fZ(z) ≠ 0,
                0,                 if fZ(z) = 0.

The random variable

    Y = ∫_{−∞}^{∞} x fX|Z(x|Z) dx

is the conditional expectation E[X | σ(Z)] ≡ E[X | Z] of X given the σ-algebra σ(Z), and has the following properties:

(i) Y is σ(Z)-measurable.

(ii) E[|Y|] < ∞. Indeed, we note that x ↦ |x| is a convex function and we use Jensen's inequality to calculate

    E[|Y|] = ∫_{−∞}^{∞} | ∫_{−∞}^{∞} x fX|Z(x|z) dx | fZ(z) dz
           ≤ ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} |x| fX|Z(x|z) dx ) fZ(z) dz
           = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |x| fX|Z(x|z) fZ(z) dx dz
           = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |x| fXZ(x, z) dx dz
           = E[|X|] < ∞.

(iii) E[1_C Y] = E[1_C X] for all C ∈ σ(Z).
To see this claim, we first note that, given any event C ∈ σ(Z), there exists A ∈ B(R) such that C = {ω ∈ Ω | Z(ω) ∈ A}. In view of this observation, we calculate

    E[1_C Y] = E[ 1_{Z ∈ A} ∫_{−∞}^{∞} x fX|Z(x|Z) dx ]
             = ∫_{−∞}^{∞} 1_{z ∈ A} ( ∫_{−∞}^{∞} x fX|Z(x|z) dx ) fZ(z) dz
             = ∫_{−∞}^{∞} ∫_{−∞}^{∞} 1_{z ∈ A} x fX|Z(x|z) fZ(z) dx dz
             = ∫_{−∞}^{∞} ∫_{−∞}^{∞} 1_{z ∈ A} x fXZ(x, z) dx dz
             = E[1_{Z ∈ A} X]
             = E[1_C X].

Definitions and existence

15. Definition. Consider a random variable X such that E[|X|] < ∞, and let G ⊆ F be a σ-algebra on Ω. The conditional expectation E[X | G] of the random variable X given the σ-algebra G is any random variable Y such that

(i) Y is G-measurable,

(ii) E[|Y|] < ∞, and

(iii) for every event C ∈ G, E[1_C Y] = E[1_C X].

We say that a random variable Y with the properties (i)–(iii) is a version of the conditional expectation E[X | G] of X given G, and we write Y = E[X | G], P-a.s.

16. Theorem. Consider a random variable X such that E[|X|] < ∞, and let G ⊆ F be a σ-algebra. There exists a random variable Y having properties (i)–(iii) in Definition 15. Furthermore, Y is unique in the sense that, if Ỹ is another random variable satisfying the required properties, then Ỹ = Y, P-a.s.

17. Definition. Consider an event B ∈ F, and let G ⊆ F be a σ-algebra. The conditional probability of B given G is the random variable defined by P(B | G) = E[1_B | G].

Properties of conditional expectation

18. Theorem. In the following list of properties of conditional expectation, we assume that all random variables are in L¹, and that G, H ⊆ F are σ-algebras on Ω.

(i) E[X | {Ω, ∅}] = E[X].

(ii) If Y is a version of E[X | G], then E[Y] = E[X].

(iii) If X is G-measurable, then E[X | G] = X, P-a.s.

(iv) (Linearity) Given constants a1, a2 ∈ R and random variables X1, X2,

    E[a1 X1 + a2 X2 | G] = a1 E[X1 | G] + a2 E[X2 | G],  P-a.s.

More precisely, if Y1 is a version of E[X1 | G] and Y2 is a version of E[X2 | G], then a1 Y1 + a2 Y2 is a version of E[a1 X1 + a2 X2 | G].
(v) (Conditional monotone convergence theorem) If (Xn) is an increasing sequence of positive random variables (i.e., 0 ≤ X1 ≤ X2 ≤ · · · ≤ Xn ≤ · · · ) converging to the random variable X, P-a.s., then

    lim_{n→∞} E[Xn | G] = E[X | G],  P-a.s.

(vi) (Conditional dominated convergence theorem) If (Xn) is a sequence of random variables that converges to a random variable X, P-a.s., and is such that |Xn| ≤ Z, P-a.s., for all n ≥ 1, for some random variable Z, then

    lim_{n→∞} E[Xn | G] = E[X | G],  P-a.s.

(vii) (Conditional Fatou's lemma) If (Xn) is a sequence of random variables such that Xn ≥ Z, P-a.s., for all n ≥ 1, for some random variable Z, then

    E[lim inf_{n→∞} Xn | G] ≤ lim inf_{n→∞} E[Xn | G],  P-a.s.

Similarly, if (Xn) is a sequence of random variables such that Xn ≤ Z, P-a.s., for all n ≥ 1, for some random variable Z, then

    E[lim sup_{n→∞} Xn | G] ≥ lim sup_{n→∞} E[Xn | G],  P-a.s.

(viii) (Conditional Jensen's inequality) Given a random variable X and a convex function g : R → R,

    E[g(X) | G] ≥ g(E[X | G]),  P-a.s.

(ix) (Tower property) If H ⊆ G, then

    E[E[X | G] | H] = E[X | H],  P-a.s.

(x) ("Taking out what is known") If Z is G-measurable, then

    E[ZX | G] = Z E[X | G],  P-a.s.

(xi) (Independence) If H is independent of σ(σ(X) ∪ G), then

    E[X | σ(G ∪ H)] = E[X | G],  P-a.s.

In particular, if σ(X) and H are independent σ-algebras, then

    E[X | H] = E[X],  P-a.s.
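To make the defining properties of Definition 15 concrete, the following sketch builds E[X | Z] for simple random variables via the formula of Paragraph 12 and verifies property (iii) on the generating events of σ(Z), together with property 18.(ii). The six-point sample space and the choices X(ω) = ω, Z(ω) = ω mod 2 are illustrative assumptions, not part of these notes.

```python
from fractions import Fraction

# Assumed toy model: Omega = {0, ..., 5} with the uniform probability measure.
omega = range(6)
p = {w: Fraction(1, 6) for w in omega}

# Two simple random variables: X is the outcome itself,
# Z is the observed quantity (here, the parity of the outcome).
X = {w: w for w in omega}
Z = {w: w % 2 for w in omega}

# E[X | Z = z] = sum_i x_i P(X = x_i | Z = z), as in Paragraph 12.
def cond_exp_given_value(z):
    atoms = [w for w in omega if Z[w] == z]
    pz = sum(p[w] for w in atoms)                # P(Z = z)
    return sum(X[w] * p[w] for w in atoms) / pz

# The conditional expectation Y = E[X | Z] is the random variable
# that is constant on each event {Z = z}.
Y = {w: cond_exp_given_value(Z[w]) for w in omega}

# Definition 15(iii) on the generators C = {Z = z} of sigma(Z)
# (general C in sigma(Z) then follows by additivity):
for z in (0, 1):
    C = [w for w in omega if Z[w] == z]
    assert sum(Y[w] * p[w] for w in C) == sum(X[w] * p[w] for w in C)

# Property 18(ii): E[Y] = E[X].
assert sum(Y[w] * p[w] for w in omega) == sum(X[w] * p[w] for w in omega)

print(dict(Y))  # Y equals 2 on even outcomes and 3 on odd outcomes
```

Exact rational arithmetic (fractions.Fraction) is used so the equalities in Definition 15(iii) hold exactly rather than up to floating-point error.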