Chapter 3
Conditional Expectations

When we work on a particular branch of a multi-period binomial model, conditional probability is involved: conditional on a node, we consider the probability of going up or down. Given a continuous-time price process $S_t$, if at time $u$ we want to find a product's price at maturity $T$, we need to deal with the conditional distribution of $S_T$ given $S_u$. In this chapter we provide a rigorous introduction to conditional probabilities and conditional expectations.

3.1 Conditional Probability and Independence

Definition 3.1. (Conditional Probability) For any events $A, B \in \mathcal F$ such that $P(B) \neq 0$, the conditional probability of $A$ given $B$ is defined by
$$P(A\mid B) = \frac{P(A\cap B)}{P(B)}\,. \qquad (3.1)$$

Example 3.1. (Total Probability Formula) For any event $A\in\mathcal F$ and any partition $B_1, B_2,\dots$ of $\Omega$ such that $P(B_i)\neq 0$ and $B_i\in\mathcal F$ for every $i$,
$$P(A) = \sum_{i=1}^{\infty} P(A\cap B_i) = \sum_{i=1}^{\infty} P(A\mid B_i)\,P(B_i)\,,$$
where the first and second equalities follow from the countable additivity of the probability measure and from (3.1), respectively.

Definition 3.2. (Independence of events) Two events $A, B\in\mathcal F$ are called independent if $P(A\cap B)=P(A)P(B)$. Events $A_1, A_2,\dots,A_n\in\mathcal F$ are independent if
$$P(A_{i_1}\cap A_{i_2}\cap\cdots\cap A_{i_k}) = P(A_{i_1})P(A_{i_2})\cdots P(A_{i_k})$$
for any indices $1\le i_1 < i_2 < \cdots < i_k \le n$. In general, an infinite family of events is independent if any finite number of them are independent.

Definition 3.3. (Independence of random variables) Random variables $X_1,\dots,X_n$ are independent if for any Borel sets $B_1,\dots,B_n\in\mathcal B$, the events $\{X_1\in B_1\},\dots,\{X_n\in B_n\}$ are independent. In general, an infinite family of random variables is independent if any finite number of them are independent.

Definition 3.4. (Independence of $\sigma$-fields) $\sigma$-fields $\mathcal G_1,\dots,\mathcal G_n$ contained in $\mathcal F$ are independent if any events $B_1\in\mathcal G_1,\dots,B_n\in\mathcal G_n$ are independent. In general, an infinite family of $\sigma$-fields is independent if any finite number of them are independent.

Remark 3.1. The $\sigma$-fields $\mathcal G_i$ are contained in $\mathcal F$, i.e., $\mathcal G_i\subset\mathcal F$ for all $i$. Otherwise, the measure $P$ would not be able to assign probabilities to some of the events $B_i\in\mathcal G_i$.

Example 3.2. Consider the dice example $(\Omega,\mathcal F,P)$ where $\Omega=\{1,2,3,4,5,6\}$ and $\mathcal F=2^{\Omega}$. Let $P(\{\omega\})=1/6$ for $\omega=1,\dots,6$, and
$$A_1=\{1,2,3\},\quad A_2=\{2,5\},\quad X_i=\mathbf 1_{A_i},\quad \mathcal F_i=\sigma(\{A_i\})\ \text{ for } i=1,2\,.$$
First consider independence of events: $P(A_1\cap A_2)=P(\{2\})=\tfrac16=P(A_1)P(A_2)$, thus $A_1$ and $A_2$ are independent.

Next consider independence of random variables. Note that the $\sigma$-fields generated by $X_1$ and $X_2$ are $\{\emptyset, A_1, \{4,5,6\},\Omega\}$ and $\{\emptyset, A_2, \{1,3,4,6\},\Omega\}$, respectively. We can conclude that $X_1$ and $X_2$ are independent if each pair of events taken from these two collections is independent ($4\times 4=16$ combinations). These 16 pairs of events can indeed be shown to be independent by elementary calculations. Thus, $X_1$ and $X_2$ are independent.

Finally, to show that $\mathcal F_1$ and $\mathcal F_2$ are independent, we need to show that each combination of events from $\mathcal F_1$ and $\mathcal F_2$ is independent. Notice that $\mathcal F_i$ coincides with the $\sigma$-field generated by $X_i$ for $i=1,2$. Thus, the calculations in the preceding paragraph show that $\mathcal F_1$ and $\mathcal F_2$ are indeed independent.

From the above, it is seen that if two $\sigma$-fields $\mathcal F_a$ and $\mathcal F_b$ are independent, then any $\mathcal F_a$-measurable r.v. is independent of any $\mathcal F_b$-measurable r.v.
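As an illustration, the sixteen pairwise checks in Example 3.2 can be carried out by brute-force enumeration. The Python sketch below does this for the uniform die above; the helper names are ad hoc and only serve this example.

```python
from itertools import product

# Dice example: uniform probability on {1,...,6}
omega = {1, 2, 3, 4, 5, 6}
prob = lambda event: len(event) / 6.0

A1, A2 = {1, 2, 3}, {2, 5}

# sigma-fields generated by the indicators X1 = 1_{A1}, X2 = 1_{A2}
sigma_X1 = [set(), A1, omega - A1, omega]
sigma_X2 = [set(), A2, omega - A2, omega]

# Check all 4 x 4 = 16 pairs of events for independence
for B1, B2 in product(sigma_X1, sigma_X2):
    assert abs(prob(B1 & B2) - prob(B1) * prob(B2)) < 1e-12, (B1, B2)

print("All 16 pairs are independent, so sigma(X1) and sigma(X2) are independent.")
```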
3.2 Conditional Expectation

3.2.1 Conditioning on an Event

Definition 3.5. (Conditional Expectation given an Event) For any random variable $X\in\mathcal L^1$ on $(\Omega,\mathcal F,P)$ and any event $B\in\mathcal F$ such that $P(B)\neq 0$, the conditional expectation of $X$ given $B$ is defined by
$$E(X\mid B) = \frac{1}{P(B)}\int_B X\,dP\,.$$

Example 3.3. An investor plans to invest \$10 in an asset. According to his estimate, the asset value in the coming two years can be represented by the following tree, in which the number along an arrow is the probability that the move indicated by the arrow occurs.

[Two-period tree: the asset is worth 10 in Year 0; in Year 1 it moves up to 11 with probability 2/3 or down to 9.5 with probability 1/3; in Year 2 it moves from 11 up to 12.1 (probability 1/4) or down to 10.45 (probability 3/4), and from 9.5 up to 10.45 (probability 1/4) or down to 9.025 (probability 3/4).]

Let $X$ be the value of the asset at the end of the second year. Recall the definition of a path in Definition 1.6, where $\omega_t=(z_1,\dots,z_t)$, $z_i=\pm1$ for $i=1,\dots,t$, indicates the evolution (up $=1$, down $=-1$) of the market. We can take the probability space to be $(\Omega_X,\mathcal F_X,P_X)$ where $\Omega_X=\{\omega_2=(a,b): a=\pm1,\ b=\pm1\}$, $\mathcal F_X=2^{\Omega_X}$, and $P_X$ is the discrete probability measure satisfying $P_X(\{(1,1)\})=1/6,\ \dots,\ P_X(\{(-1,-1)\})=1/4$.

Consider the event $B=\{\omega_1=(1)\}$. Clearly, $P(B)=2/3$. Also, note that
$$X((1,1))=12.1\,,\qquad X((1,-1))=10.45\,,$$
and
$$P(\{(1,1)\})=\frac16\,,\qquad P(\{(1,-1)\})=\frac12\,.$$
Therefore
$$E(X\mid B)=\frac{1}{P(B)}\int_B X\,dP = \frac{1}{P(B)}\Big(X((1,1))\,P(\{(1,1)\})+X((1,-1))\,P(\{(1,-1)\})\Big) = \frac{1}{2/3}\Big(12.1\times\frac16 + 10.45\times\frac12\Big) = 10.8625\,.$$

3.2.2 Conditioning on a discrete random variable

Given an arbitrary random variable $X$ in $\mathcal L^1$ (so the expectation exists) and a discrete random variable $Z:\Omega\to\{z_1,\dots,z_m\}$, the conditional expectation of $X$ given $Z$, i.e. $E(X\mid Z)$, must depend solely on the random variable $Z$. Thus $E(X\mid Z)$ is itself a random variable. Since $Z$ takes only $m$ possible values, so does $E(X\mid Z)$. Therefore, we can characterize $E(X\mid Z)$ by the $m$ conditional expectations (on the events $\{Z=z_i\}$) $E(X\mid\{Z=z_i\})$, $i=1,\dots,m$, using the definition in the last section.

Definition 3.6. (Conditional Expectation given a discrete r.v.) Let $X\in\mathcal L^1$ and let $Z$ be a discrete r.v. taking values in $\{z_i\}_{i=1,\dots}$. The conditional expectation of $X$ given $Z$ is defined to be a discrete random variable $E(X\mid Z):\Omega\to\mathbb R$ such that
$$E(X\mid Z)(\omega) = E(X\mid\{Z=z_i\})\ \text{ on }\ \{\omega: Z(\omega)=z_i\}\,,\qquad\text{for any } i=1,2,\dots \qquad(3.2)$$

Example 3.4. Three coins, of 10, 20 and 50 cents, are tossed. The values of the coins landing heads up are added to give a total amount $X$. Let $Z$ be the total amount contributed by the 10- and 20-cent coins; what is $E(X\mid Z)$?

First, the probability space can be taken as $\Omega=\{HHH, HHT,\dots,TTT\}$, where $H$ and $T$ stand for Head and Tail respectively. The associated $\sigma$-field can be taken as $\mathcal F=2^\Omega$. The probability measure $P$ may be defined on $\mathcal F$ with $P(\{HHH\})=\cdots=P(\{TTT\})=\frac18$. The discrete r.v. $Z$ on $(\Omega,\mathcal F,P)$ is given by $Z:\Omega\to\{0,10,20,30\}$. Thus finding $E(X\mid Z)$ is equivalent to finding $E(X\mid\{Z=0\}),\dots,E(X\mid\{Z=30\})$. For example,
$$E(X\mid\{Z=0\}) = \frac{1}{P(\{Z=0\})}\int_{\{Z=0\}} X\,dP = \frac{1}{1/4}\Big(\frac18(0+50)+\frac18(0+0)\Big) = 25\,.$$
Using similar arguments for the other cases, we have
$$E(X\mid Z)(\omega) = \begin{cases} 25 & \text{if } Z(\omega)=0\,,\\ 35 & \text{if } Z(\omega)=10\,,\\ 45 & \text{if } Z(\omega)=20\,,\\ 55 & \text{if } Z(\omega)=30\,.\end{cases}$$
Note that the probability distribution of $Z$ is irrelevant here. In other words, the above results still hold even if the 10- and 20-cent coins are biased. However, the results will be different if the 50-cent coin is biased.
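The computation in Example 3.4 amounts to averaging $X$ over each event $\{Z=z\}$. A minimal sketch, assuming a fair 50-cent coin as in the example:

```python
from itertools import product

coins = (10, 20, 50)

# Enumerate the 8 equally likely outcomes; 1 = head, 0 = tail.
outcomes = list(product((0, 1), repeat=3))

def X(w):   # total value of the coins showing heads
    return sum(c for c, h in zip(coins, w) if h)

def Z(w):   # total contributed by the 10- and 20-cent coins
    return sum(c for c, h in zip(coins[:2], w[:2]) if h)

# E(X | Z = z) = average of X over the outcomes with Z = z (uniform measure)
for z in (0, 10, 20, 30):
    cell = [w for w in outcomes if Z(w) == z]
    cond_exp = sum(X(w) for w in cell) / len(cell)
    print(f"E(X | Z = {z:2d}) = {cond_exp}")   # 25, 35, 45, 55
```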
Theorem 3.1. If $X\in\mathcal L^1$ and $Z$ is a discrete random variable, then
i) $E(X\mid Z)$ is $\sigma(Z)$-measurable;
ii) for any $A\in\sigma(Z)$,
$$\int_A E(X\mid Z)\,dP = \int_A X\,dP\,. \qquad(3.3)$$

Proof. i) To show that $E(X\mid Z)$ is $\sigma(Z)$-measurable, we have to show that for any $B\in\mathcal B$, $\{\omega: E(X\mid Z)(\omega)\in B\}\in\sigma(Z)$. Since $Z$ is discrete, we can assume that $Z$ takes values $\{z_i\}_{i=1,\dots}$. From (3.2), $E(X\mid Z)$ also takes discrete values $\{y_i\}_{i=1,\dots}$, where $y_i=E(X\mid\{Z=z_i\})$. Note that $\{\omega: E(X\mid Z)(\omega)=y_i\}=\{Z=z_i\}$. It follows that for any $B\in\mathcal B$, $\{\omega: E(X\mid Z)(\omega)\in B\}$ is a countable disjoint union of sets of the form $\{Z=z_i\}$, and hence belongs to $\sigma(Z)$. This completes the proof of i).

ii) Since $E(X\mid Z)(\omega)$ is constant on $\{\omega: Z(\omega)=z_i\}$, we have
$$\int_{\{Z=z_i\}} E(X\mid Z)\,dP = \int_{\{Z=z_i\}} E(X\mid\{Z=z_i\})\,dP = E(X\mid\{Z=z_i\})\int_{\{Z=z_i\}} dP = \frac{\int_{\{Z=z_i\}}X\,dP}{\int_{\{Z=z_i\}}dP}\int_{\{Z=z_i\}}dP = \int_{\{Z=z_i\}} X\,dP\,.$$
In general, any $A\in\sigma(Z)$ is a countable union of sets of the form $\{Z=z_i\}$, which are pairwise disjoint. Thus (3.3) follows from the countable additivity of the Lebesgue integral.

Example 3.5. Continuing Example 3.4, the induced sample spaces for the r.v.s $X$ and $Z$ are $\Omega_X=\{0,10,20,30,50,60,70,80\}$ and $\Omega_Z=\{0,10,20,30\}$. Let $A_1=\{Z=20\}$ and $A_2=\{Z=10 \text{ or } 30\}$. Note that $A_1, A_2\in\sigma(Z)$ (equivalently, they correspond to elements of $\mathcal F_Z:=2^{\Omega_Z}$). It is easy to verify that (3.3) holds for $A=A_1$ and $A=A_2$. In particular,
$$\int_{A_1} X\,dP = 20\cdot\frac18 + 70\cdot\frac18 = 11.25\,,\qquad \int_{A_1} E(X\mid Z)\,dP = 45\cdot\frac14 = 11.25\,,$$
$$\int_{A_2} X\,dP = \frac18(10+60+30+80) = 22.5\,,\qquad \int_{A_2} E(X\mid Z)\,dP = \frac14(35+55) = 22.5\,.$$

3.2.3 Conditioning on an arbitrary random variable

If $Z$ is a discrete r.v., we see from (3.2) that $E(X\mid Z)$ is also a discrete random variable. When $Z$ is an arbitrary r.v. (with discrete and continuous components), it is in general not easy to write an explicit formula for $E(X\mid Z)$. (The elementary formula $E(X\mid Z=z)=\int x\, f_{X,Z}(x,z)/f_Z(z)\,dx$ cannot be used if the p.d.f. $f_{X,Z}$ or $f_Z$ does not exist.) However, Theorem 3.1 suggests a way to define the conditional expectation given an arbitrary r.v.

Definition 3.7. (Conditional Expectation given an arbitrary r.v.) Let $X\in\mathcal L^1$ and let $Z$ be an arbitrary r.v. Then the conditional expectation of $X$ given $Z$ is defined to be a random variable $E(X\mid Z)$ such that
i) $E(X\mid Z)$ is a $\sigma(Z)$-measurable r.v.;
ii) for any $A\in\sigma(Z)$,
$$\int_A E(X\mid Z)\,dP = \int_A X\,dP\,. \qquad(3.4)$$

Since the conditional expectation is defined implicitly by (3.4) rather than by an explicit formula, we need to justify such a definition by showing that the r.v. $E(X\mid Z)$ is characterized uniquely. Since $E(X\mid Z)$ is a random variable, the uniqueness depends on the probability measure $P$ through the notion of almost sure equivalence.

Definition 3.8. (Almost sure equivalence)
1. Two events $A, B\in\mathcal F$ are equal almost surely (a.s.) if $P((A\setminus B)\cup(B\setminus A))=0$, i.e., the symmetric difference of $A$ and $B$ has probability 0.
2. Two random variables $X$ and $Y$ on $(\Omega,\mathcal F,P)$ are equal almost surely (a.s.) if $P(X=Y) := P(\{\omega\in\Omega: X(\omega)=Y(\omega)\})=1$.

Example 3.6. Note that two events $A$ and $B$ being equal a.s. does not imply that the two events are the same. It only means that $A\setminus B$ and $B\setminus A$ are of $P$-measure 0. For example, consider $A=(0,0.5)$ and $B=(0,0.5]$ on $([0,1],\mathcal B_{[0,1]},\lambda_{[0,1]})$. It is clear that $A=B$ a.s. but $A\neq B$. For the case of random variables, let $\Omega=\{\omega_1,\omega_2\}$, $\mathcal F=2^\Omega$, and let $P$ be defined on $\mathcal F$ with $P(\{\omega_1\})=1$. Suppose that $X_i:\Omega\to\mathbb R$ are r.v.s with $X_1=\mathbf 1_{\{\omega_1\}}+2\times\mathbf 1_{\{\omega_2\}}$ and $X_2=\mathbf 1_{\{\omega_1\}}+3\times\mathbf 1_{\{\omega_2\}}$. Then $X_1=X_2$ a.s. but $X_1\neq X_2$.

Theorem 3.2. (Uniqueness of Conditional Expectation) $E(X\mid Z)$ is unique in the sense that if $X=X'$ a.s., then $E(X\mid Z)=E(X'\mid Z)$ a.s.
The proof of Theorem 3.2 relies on the following lemma.

Lemma 3.3. Let $(\Omega,\mathcal F,P)$ be a probability space and let $\sigma(Z)\subset\mathcal F$. If $Y$ is a $\sigma(Z)$-measurable r.v. and
$$\int_A Y\,dP = 0$$
for any $A\in\sigma(Z)$, then $Y=0$ a.s. That is, $P(Y=0)=1$.

Proof. Observe that $P\{Y\ge\frac1n\}=0$ for any positive integer $n$, because
$$0\le \frac1n P\Big(Y\ge\frac1n\Big) = \int_{\{Y\ge\frac1n\}}\frac1n\,dP \le \int_{\{Y\ge\frac1n\}} Y\,dP = 0\,.$$
The last equality follows from the fact that $\{Y\ge\frac1n\}=\{Y\in[\frac1n,\infty)\}\in\sigma(Z)$. Similarly, $P\{Y\le-\frac1n\}=0$ for any $n>0$. Set $A_n=\{-\frac1n<Y<\frac1n\}$; then we have $P(A_n)=1$ for any positive integer $n$. Since $\{A_n\}$ forms a contracting sequence of events and
$$\{Y=0\}=\bigcap_{n=1}^\infty A_n\,,$$
it follows that
$$P\{Y=0\} = P\Big(\bigcap_{n=1}^\infty A_n\Big) = \lim_{m\to\infty} P\Big(\bigcap_{n=1}^m A_n\Big) = \lim_{m\to\infty} P(A_m) = 1\,, \qquad(3.5)$$
as required. See Exercise 3.29.

Proof. (Theorem 3.2.) Suppose that $X=X'$ a.s., i.e., $P(X=X')=1$. We want to show that $E(X\mid Z)=E(X'\mid Z)$ a.s. Let $Y=E(X\mid Z)-E(X'\mid Z)$; by the definition of conditional expectation, $E(X\mid Z)$ and $E(X'\mid Z)$ are $\sigma(Z)$-measurable, and hence so is $Y$. For any $A\in\sigma(Z)$, we have
$$\int_A Y\,dP = \int_A \big(E(X\mid Z)-E(X'\mid Z)\big)\,dP = \int_A (X-X')\,dP = 0\,,$$
where the second equality follows from (3.4) and the last from $X=X'$ a.s. From Lemma 3.3, we have $Y=0$ a.s., i.e., $E(X\mid Z)=E(X'\mid Z)$ a.s., completing the proof of Theorem 3.2.

Example 3.7. Consider the probability space $([0,1],\mathcal B_{[0,1]},\lambda_{[0,1]})$. Let
$$X(\omega)=2\omega^2\,,\qquad Z(\omega)=\begin{cases} 2 & \text{if } \omega\in[0,\tfrac12)\,,\\ \omega & \text{if } \omega\in[\tfrac12,1]\,.\end{cases}$$
The following heuristic can be used to guess the form of $E(X\mid Z)$. First note that there is a discrete mass at $Z=2$; we can follow the arguments used in conditioning on a discrete r.v. to obtain that, for $\omega\in[0,\tfrac12)$,
$$E(X\mid Z)(\omega) = E(X\mid\{Z=2\}) = \frac{1}{P(\omega\in[0,\tfrac12))}\int_{[0,\tfrac12)} X(\omega)\,dP = \frac{1}{1/2}\int_0^{1/2} 2\omega^2\,d\omega = \frac16\,. \qquad(3.6)$$
For the continuous part $Z(\omega)=\omega$ on $\omega\in[\tfrac12,1]$, note that $\{Z=z\}\in\sigma(Z)$ and $\{Z=z\}=\{\omega: Z(\omega)=z\}=\{\omega=z\}$ for $z\in[\tfrac12,1]$; we heuristically solve for $E(X\mid Z)$ from the definition by
$$\int_{\{Z=z\}} E(X\mid Z)\,dP = \int_{\{Z=z\}} X\,dP \;\Rightarrow\; E(X\mid\{Z=z\})\int_{\{Z=z\}} dP = \int_{\{\omega:Z(\omega)=z\}} X\,P(d\omega) \;\Rightarrow\; E(X\mid\{Z=z\}) = X(z)\,\frac{\int_{\{\omega=z\}}P(d\omega)}{\int_{\{\omega=z\}}P(d\omega)} = X(z) = 2z^2\,. \qquad(3.7)$$
Expressing $E(X\mid Z)$ as a function of $\omega$ (on the sample space) instead of as a function of $Z$ (on the induced sample space), we have
$$E(X\mid Z)(\omega)=\begin{cases}\tfrac16 & \text{if }\omega\in[0,\tfrac12)\,,\\ 2\omega^2 & \text{if }\omega\in[\tfrac12,1]\,.\end{cases}$$
To verify that $E(X\mid Z)$ is really a conditional expectation, we need to verify Conditions i) and ii) in Definition 3.7. First we show i): $E(X\mid Z)$ is $\sigma(Z)$-measurable. Recall that $\sigma(Z)=\{Z^{-1}(B): B\in\mathcal B\}$. For any Borel set $B\in\mathcal B$, from the definition of $Z$ we have
$$Z^{-1}(B)=\begin{cases} B\cap[\tfrac12,1] & \text{if } 2\notin B\,,\\ \big(B\cap[\tfrac12,1]\big)\cup[0,\tfrac12) & \text{if } 2\in B\,.\end{cases}\qquad(3.8)$$
On the other hand, it can be checked that $\{\omega: E(X\mid Z)(\omega)\in B\}$, for $B\in\mathcal B$, is also a set of the form given in (3.8). Hence $\{\omega: E(X\mid Z)(\omega)\in B\}\in\{Z^{-1}(B): B\in\mathcal B\}=\sigma(Z)$, i.e., $E(X\mid Z)$ is $\sigma(Z)$-measurable. Finally, ii) can be directly justified by calculations similar to (3.6) and (3.7). Thus $E(X\mid Z)$ satisfies Definition 3.7, i.e., it is a valid conditional expectation.
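As a numerical sanity check of Example 3.7, the sketch below verifies the defining property (3.4) for two generating events of $\sigma(Z)$ by numerical integration, using the candidate for $E(X\mid Z)$ found above; it is an illustration only.

```python
import numpy as np

# Example 3.7 on ([0,1], Borel, Lebesgue): X(w) = 2 w^2,
# Z(w) = 2 on [0, 1/2) and Z(w) = w on [1/2, 1].
X = lambda w: 2 * w**2
cond_exp = lambda w: np.where(w < 0.5, 1/6, 2 * w**2)   # candidate E(X|Z)

def integral(f, a, b, n=1_000_000):
    """Integrate f over [a, b) w.r.t. Lebesgue measure (midpoint rule)."""
    w = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return np.mean(f(w)) * (b - a)

# Two events in sigma(Z): A1 = {Z = 2} = [0, 1/2), A2 = {1/2 <= Z < 3/4} = [1/2, 3/4)
for (a, b) in [(0.0, 0.5), (0.5, 0.75)]:
    lhs = integral(cond_exp, a, b)   # int_A E(X|Z) dP
    rhs = integral(X, a, b)          # int_A X dP
    print(f"A = [{a}, {b}): {lhs:.6f} vs {rhs:.6f}")
```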
Example 3.8. (Elementary definition of Conditional Expectation) Recall that Definition 3.7 of the conditional expectation $E(X\mid Z)$ does not require the existence of a joint p.d.f. of $X$ and $Z$. In this example we show that $E(X\mid Z)$ reduces to the elementary definition of conditional expectation when the joint p.d.f. of $X$ and $Z$ exists.

Suppose that the joint p.d.f. of the r.v.s $X$ and $Z$ exists and is given by $f_{X,Z}(x,z)$. We can define the conditional p.d.f. by $f_{X\mid Z}(x\mid z)=f_{X,Z}(x,z)/f_Z(z)$ if $f_Z(z)\neq0$, and $0$ otherwise, where $f_Z(z)=\int f_{X,Z}(x,z)\,dx$. Note that if the joint p.d.f. exists, then $dP=f_{X,Z}(x,z)\,dx\,dz$. Following the same heuristic as in Example 3.7, we solve for $E(X\mid Z)$ from the definition by
$$\int_{\{Z=z\}} E(X\mid Z)\,dP = \int_{\{Z=z\}} X\,dP \;\Rightarrow\; E(X\mid\{Z=z\})\iint_{\{Z=z\}} f_{X,Z}(x,z)\,dx\,dz = \iint_{\{Z=z\}} x\,f_{X,Z}(x,z)\,dx\,dz \;\Rightarrow\; E(X\mid\{Z=z\}) = \frac{\int x\,f_{X,Z}(x,z)\,dx}{f_Z(z)} = \int x\,f_{X\mid Z}(x\mid z)\,dx =: g(z)\,,\ \text{say.} \qquad(3.9)$$
Then $E(X\mid Z):=g(Z)$ is the conditional expectation of $X$ given $Z$.

Next we justify Conditions i) and ii) in Definition 3.7. First, i) is immediate since $g(Z)$ is a measurable function of $Z$ alone. Lastly we verify ii): for any $B\in\mathcal B$ and $A:=\{\omega: Z(\omega)\in B\}\in\sigma(Z)$, we have
$$\int_A X\,dP = \iint \mathbf 1_B(z)\, x\,f_{X,Z}(x,z)\,dx\,dz\,,$$
and
$$\int_A g(Z)\,dP = \int \mathbf 1_B(z)\,g(z)\,f_Z(z)\,dz = \int \mathbf 1_B(z)\Big(\int x\,f_{X\mid Z}(x\mid z)\,dx\Big) f_Z(z)\,dz = \iint \mathbf 1_B(z)\, x\,f_{X,Z}(x,z)\,dx\,dz = \int_A X\,dP\,.$$
Thus the conditions in Definition 3.7 are fulfilled.

3.3 Conditioning on a $\sigma$-field

Since Definition 3.7 only looks at sets of $\sigma(Z)$ rather than at the actual values of $Z$, we can generalize the definition of conditional expectation to conditioning on a given $\sigma$-field.

Definition 3.9. Let $X\in\mathcal L^1$ on $(\Omega,\mathcal F,P)$, and let $\mathcal G\subset\mathcal F$ be a $\sigma$-field. Then the conditional expectation of $X$ given $\mathcal G$ is defined to be a random variable $E(X\mid\mathcal G)$ such that
i) $E(X\mid\mathcal G)$ is a $\mathcal G$-measurable r.v.;
ii) for any $A\in\mathcal G$,
$$\int_A E(X\mid\mathcal G)\,dP = \int_A X\,dP\,.\qquad(3.10)$$

Remark 3.2. Combining Definitions 3.7 and 3.9, it can be seen that
$$E(X\mid\sigma(Z)) = E(X\mid Z)\,.$$
Note that different r.v.s can generate the same $\sigma$-field. For example, on the sample space $\Omega=\{H,T\}$, the r.v.s $X_1=\mathbf 1_{\{H\}}$ and $X_2=\mathbf 1_{\{T\}}$ generate the same $\sigma$-field $\mathcal F=\{\emptyset,\{H\},\{T\},\Omega\}$. Thus, conditional expectation given a $\sigma$-field is more general than conditional expectation given a r.v.

Example 3.9. If $\mathcal G=\{\emptyset,\Omega\}$, then $E(X\mid\mathcal G)=E(X)$ a.s. To see this, note that if $\mathcal G=\{\emptyset,\Omega\}$, then any constant r.v., including the constant $E(X)$, is $\mathcal G$-measurable. Note that
$$\int_\Omega X\,dP = E(X) = \int_\Omega E(X)\,dP\,,\qquad\text{and}\qquad \int_\emptyset X\,dP = 0 = \int_\emptyset E(X)\,dP\,.$$
Therefore, $E(X)$ satisfies i) and ii) of Definition 3.9, and thus $E(X\mid\mathcal G)=E(X)$ a.s.

Example 3.10. For $B\in\mathcal G$ with $P(B)\neq 0$, we have $E(E(X\mid\mathcal G)\mid B)=E(X\mid B)$. To see this, note that from the definition of conditional expectation, for $B\in\mathcal G$ we have
$$\int_B E(X\mid\mathcal G)\,dP = \int_B X\,dP\,.$$
Thus Definition 3.5 implies that
$$E\big(E(X\mid\mathcal G)\mid B\big) = \frac{1}{P(B)}\int_B E(X\mid\mathcal G)\,dP = \frac{1}{P(B)}\int_B X\,dP = E(X\mid B)\,.$$

Example 3.11. Consider the fair dice example where $\Omega=\{1,\dots,6\}$ and $P(\{i\})=\frac16$. Let $\mathcal F_1=\{\emptyset,\Omega\}$, $\mathcal F_2=\{\emptyset,\{1,2,3\},\{4,5,6\},\Omega\}$, $\mathcal F_3=2^\Omega$, and let $X:\Omega\to\mathbb R$ satisfy $X(i)=(i-4)^+$. Note that
$$\sigma(X)=\big\{\emptyset,\{5\},\{6\},\{5,6\},\{1,2,3,4\},\{1,2,3,4,5\},\{1,2,3,4,6\},\Omega\big\}\,,$$
thus $X$ is $\mathcal F_3$-measurable but is measurable with respect to neither $\mathcal F_1$ nor $\mathcal F_2$.

From Example 3.9, $E(X\mid\mathcal F_1)=E(X)=(5-4)\frac16+(6-4)\frac16=0.5$. Next, $E(X\mid\mathcal F_2)$ is an $\mathcal F_2$-measurable function, so it takes constant values on each of $\{1,2,3\}$ and $\{4,5,6\}$. Simple calculations yield
$$E(X\mid\mathcal F_2)(\omega)=\begin{cases}\dfrac{\int_{\{1,2,3\}}X\,dP}{P(\{1,2,3\})}=0 & \text{if }\omega\in\{1,2,3\}\,,\\[2mm] \dfrac{\int_{\{4,5,6\}}X\,dP}{P(\{4,5,6\})}=\dfrac{1}{1/2}\Big((5-4)\tfrac16+(6-4)\tfrac16\Big)=1 & \text{if }\omega\in\{4,5,6\}\,.\end{cases}$$
Lastly, since $X$ is $\mathcal F_3$-measurable, we have $E(X\mid\mathcal F_3)=X$.
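Example 3.11 can be reproduced mechanically: conditioning on a $\sigma$-field generated by a finite partition amounts to averaging $X$ over each cell of the partition. A short illustrative sketch:

```python
# Conditional expectation given a sigma-field generated by a finite partition:
# on each cell of the partition, E(X | F) equals the P-weighted average of X.

omega = [1, 2, 3, 4, 5, 6]
P = {i: 1/6 for i in omega}              # fair die
X = lambda i: max(i - 4, 0)              # X(i) = (i - 4)^+

def cond_exp(partition):
    """Return E(X | sigma(partition)) as a dict omega -> value."""
    out = {}
    for cell in partition:
        p_cell = sum(P[i] for i in cell)
        avg = sum(X(i) * P[i] for i in cell) / p_cell
        for i in cell:
            out[i] = avg
    return out

print(cond_exp([omega]))                        # F1: constant 0.5 = E(X)
print(cond_exp([[1, 2, 3], [4, 5, 6]]))         # F2: 0 on {1,2,3}, 1 on {4,5,6}
print(cond_exp([[i] for i in omega]))           # F3 = 2^Omega: E(X|F3) = X
```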
3.4 Radon-Nikodym derivative

In Definitions 3.7 and 3.9, it is implicitly assumed that the random variable $E(X\mid Z)$ or $E(X\mid\mathcal G)$ exists. The existence of conditional expectation is non-trivial, but it is guaranteed by the Radon-Nikodym Theorem, an important tool in probability theory that makes connections between measures.

Definition 3.10. (Absolutely continuous measures) Suppose that $P$ and $Q$ are probability measures on $(\Omega,\mathcal F)$ and that $P(F)=0$ implies $Q(F)=0$ for $F\in\mathcal F$. Then $Q$ is said to be absolutely continuous with respect to $P$, denoted by $Q\ll P$.

Example 3.12. Consider the measurable space $(\Omega,\mathcal F)$ where $\Omega=\{1,2,3,4,5,6\}$ and $\mathcal F=2^\Omega$. For $\omega\in\Omega$, let $P_1(\{\omega\})=1/6$, $P_2(\{\omega\})=|\omega-3.5|/9$, $P_3(\{\omega\})=\frac13\mathbf 1_{\{\omega\ge4\}}$ and $P_4(\{\omega\})=\frac13\mathbf 1_{\{\omega\le3\}}$. Then the only absolutely continuous pairs are $P_1\ll P_2$, $P_2\ll P_1$, $P_3\ll P_i$ and $P_4\ll P_i$ for $i=1,2$.

For the measurable space $(\mathbb R,\mathcal B)$, let $P_N$ and $P_E$ satisfy $P_N((a,b])=F_N(b)-F_N(a)$ and $P_E((a,b])=F_E(b)-F_E(a)$, where $F_N$ and $F_E$ are the c.d.f.s of the standard normal and the standard exponential distribution, respectively. Then we have $P_E\ll P_N$, but $P_N\ll P_E$ does not hold.
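The discrete claims in Example 3.12 can be checked exhaustively: $Q\ll P$ fails exactly when some point carries zero $P$-mass but positive $Q$-mass. A brief illustrative sketch:

```python
# Brute-force check of absolute continuity for the discrete measures of Example 3.12.
omega = [1, 2, 3, 4, 5, 6]

P1 = {w: 1/6 for w in omega}
P2 = {w: abs(w - 3.5) / 9 for w in omega}
P3 = {w: (1/3 if w >= 4 else 0.0) for w in omega}
P4 = {w: (1/3 if w <= 3 else 0.0) for w in omega}

def abs_cont(Q, P):
    """Q << P iff P({w}) = 0 implies Q({w}) = 0 for every atom w."""
    return all(Q[w] == 0.0 for w in omega if P[w] == 0.0)

measures = {"P1": P1, "P2": P2, "P3": P3, "P4": P4}
for qn, Q in measures.items():
    for pn, P in measures.items():
        if qn != pn and abs_cont(Q, P):
            print(f"{qn} << {pn}")
# Prints: P1 << P2, P2 << P1, P3 << P1, P3 << P2, P4 << P1, P4 << P2
```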
Theorem 3.4. (Radon-Nikodym Theorem) Let $P$ and $Q$ be two probability measures on the measurable space $(\Omega,\mathcal F)$. If $Q\ll P$, then there exists an $\mathcal F$-measurable r.v. $Y\in\mathcal L^1$ such that
$$Q(A)=\int_A Y\,dP\,,\qquad(3.11)$$
for any $A\in\mathcal F$. The r.v. $Y$ is called the Radon-Nikodym derivative of $Q$ w.r.t. $P$ and is denoted by $Y=\frac{dQ}{dP}$.

Proof. The proof consists of four steps:
i) Assume $Q(A)\le P(A)$ for all $A\in\mathcal F$. Construct a $Y_n$ on a finite $\sigma$-field $\mathcal F_n\subset\mathcal F$.
ii) Take limits to obtain the required r.v. $Y=\lim_{n\to\infty}Y_n$.
iii) Show that the limit $Y$ satisfies (3.11) for any $A\in\mathcal F$.
iv) Relax the assumption $Q(A)\le P(A)$.

i) [Construction on a finite $\sigma$-field] Assume that $Q(A)\le P(A)$ for all $A\in\mathcal F$. First we define the required function $Y$ on a finite sub-$\sigma$-field $\mathcal F_n\subset\mathcal F$ which is generated by a finite partition of $\Omega$. That is, for some integer $r_n$, $\Omega=\cup_{j=1}^{r_n}F_j$ where $F_i\cap F_j=\emptyset$ if $i\neq j$, and $\mathcal F_n=\sigma(F_1,\dots,F_{r_n})$. Define $Y_n:\Omega\to\mathbb R$ by
$$Y_n(\omega)=\begin{cases}\dfrac{Q(F_j)}{P(F_j)} & \text{if }\omega\in F_j\text{ and }P(F_j)>0\,,\\[1mm] 0 & \text{if }\omega\in F_j\text{ and }P(F_j)=0\,.\end{cases}\qquad(3.12)$$
Note that $Y_n\in\mathcal L^1$ since
$$E(|Y_n|)=\int_\Omega Y_n\,dP=\sum_{j=1}^{r_n}\int_{F_j}Y_n\,dP=\sum_{j=1}^{r_n}Q(F_j)=Q(\Omega)=1<\infty\,.$$
Moreover, $Y_n(\omega)$ is constant (equal to $Q(F_j)/P(F_j)$) on each $F_j$, so $Y_n$ is $\mathcal F_n$-measurable. Also, simple algebra shows that $Q(F_j)=\int_{F_j}Y_n\,dP$. Since every $F\in\mathcal F_n$ is a union of $F_j$s, (3.11) holds for all $A\in\mathcal F_n$.

Next consider a refinement $\mathcal F_{n+1}=\sigma(\tilde F_1,\dots,\tilde F_{r_{n+1}})$ of $\mathcal F_n$ such that $\{\tilde F_j\}_{j=1,\dots,r_{n+1}}$ are disjoint and each $F_i\in\mathcal F_n$ is a union of some $\tilde F_j$s. Define $Y_{n+1}:\Omega\to\mathbb R$ by
$$Y_{n+1}(\omega)=\begin{cases}\dfrac{Q(\tilde F_j)}{P(\tilde F_j)} & \text{if }\omega\in\tilde F_j\text{ and }P(\tilde F_j)>0\,,\\[1mm] 0 & \text{if }\omega\in\tilde F_j\text{ and }P(\tilde F_j)=0\,.\end{cases}$$
Similarly, it can be shown that $Y_{n+1}\in\mathcal L^1$, $Y_{n+1}$ is $\mathcal F_{n+1}$-measurable, and $Q(A)=\int_A Y_{n+1}\,dP$ for all $A\in\mathcal F_{n+1}$. Moreover,
$$\int_\Omega Y_{n+1}^2\,dP = \int_\Omega (Y_{n+1}-Y_n)^2\,dP + 2\int_\Omega Y_n(Y_{n+1}-Y_n)\,dP + \int_\Omega Y_n^2\,dP = \int_\Omega (Y_{n+1}-Y_n)^2\,dP + \int_\Omega Y_n^2\,dP \ge \int_\Omega Y_n^2\,dP\,,\qquad(3.13)$$
where the second equality follows from
$$\int_\Omega Y_n(Y_{n+1}-Y_n)\,dP = \sum_{j=1}^{r_n}\int_{F_j}Y_n(Y_{n+1}-Y_n)\,dP = \sum_{j=1}^{r_n}\frac{Q(F_j)}{P(F_j)}\Big(\int_{F_j}Y_{n+1}\,dP - \int_{F_j}\frac{Q(F_j)}{P(F_j)}\,dP\Big) = \sum_{j=1}^{r_n}\frac{Q(F_j)}{P(F_j)}\big(Q(F_j)-Q(F_j)\big) = 0\,.$$

ii) [Passing to the limit] From (3.13), it can be observed that if the partition of $\Omega$ gets finer and finer (e.g., along a sequence of $\sigma$-fields $\mathcal F_1\subset\cdots\subset\mathcal F_k\subset\cdots$), then the corresponding Radon-Nikodym derivatives $Y_k$ form a non-decreasing sequence in the sense that $\int_\Omega Y_{k+1}^2\,dP\ge\int_\Omega Y_k^2\,dP$. On the other hand, $\int_\Omega Y_k^2\,dP$ is bounded because $P(\Omega)=1$ and $Y_k\le1$ by (3.12) and the assumption $Q(A)\le P(A)$ for all $A\in\mathcal F$. Thus the limit $l:=\lim_{k\to\infty}\int_\Omega Y_k^2\,dP$ exists. Some limiting arguments then show that the limit $Y=\lim_{k\to\infty}Y_k$ also exists (see Exercise 3.30).

iii) [Verify (3.11)] Since $Y_n$ is $\mathcal F_n\subset\mathcal F$ measurable and $Y$ is the limit of the $Y_n$, $Y$ is $\mathcal F$-measurable. For any $A\in\mathcal F$, let $\mathcal R_n$ be the common refinement of $\mathcal F_n$ and $\{A,A^c\}$. It can be verified directly that $Q(A)=\int_A Y_{\mathcal R_n}\,dP$, where $Y_{\mathcal R_n}$ is defined in the same way as $Y_n$ but on $\mathcal R_n$ instead of $\mathcal F_n$. Similar limiting arguments as in ii) show that $\int_A Y_{\mathcal R_{n_k}}\,dP\to\int_A Y\,dP$ for some subsequence $\{n_k\}_{k=1,2,\dots}$, yielding (3.11) (see Exercise 3.30).

iv) [Relax the assumption $Q(A)\le P(A)$] In general, let $S=P+Q$, so that $P(A)\le S(A)$ and $Q(A)\le S(A)$ for all $A\in\mathcal F$. The previous results imply that there exist $Y_Q$ and $Y_P$ such that $Q(A)=\int_A Y_Q\,dS$ and $P(A)=\int_A Y_P\,dS$ for all $A\in\mathcal F$. Now let $Y=\dfrac{Y_Q}{Y_P}\mathbf 1_{\{Y_P>0\}}$. Then
$$Q(A)=\int_A Y_Q\,dS = \int_{A\cap\{Y_P>0\}}\frac{Y_Q}{Y_P}\,Y_P\,dS + \int_{A\cap\{Y_P=0\}}Y_Q\,dS = \int_A Y\,Y_P\,dS + 0 = \int_A Y\,dP\,,$$
where the third equality follows from the fact that on $B:=\{Y_P=0\}$, $P(B)=\int_B Y_P\,dS=0$, which implies $Q(B)=0$ as $Q\ll P$, which in turn implies that $S(B)=0$. The last equality follows from Exercise 3.30. Thus the proof is completed.

Using the Radon-Nikodym Theorem, the existence of conditional expectation can be justified:

Theorem 3.5. (Existence of Conditional Expectation) Let $(\Omega,\mathcal F,P)$ be a probability space and $\mathcal G$ a $\sigma$-field contained in $\mathcal F$. Then for any r.v. $X\in\mathcal L^1$ there exists a $\mathcal G$-measurable r.v. $X_{\mathcal G}$ such that for every $A\in\mathcal G$,
$$\int_A X\,dP = \int_A X_{\mathcal G}\,dP\,.$$
Here $X_{\mathcal G}$ can be regarded as the conditional expectation $E(X\mid\mathcal G)$.

Proof. Given a positive r.v. $X$, let $Q_X:\mathcal F\to\mathbb R$ be the set function satisfying
$$Q_X(A)=\int_A X\,dP\,,\qquad(3.14)$$
for $A\in\mathcal F$. It can be checked that $Q_X$ is a finite measure (a probability measure after normalization by $E(X)$) and that $Q_X\ll P$ on $\mathcal G$. Thus, applying the Radon-Nikodym Theorem to the measure $Q_X(\cdot)$ restricted to $\mathcal G$ yields a $\mathcal G$-measurable r.v., say $Y$, such that $Q_X(A)=\int_A Y\,dP$ for $A\in\mathcal G$. Together with (3.14), $Y$ satisfies i) and ii) of Definition 3.9. Thus $X_{\mathcal G}=Y$ is the desired conditional expectation of $X$ given $\mathcal G$.

For a general r.v. $X$, we can write $X=X^+-X^-$, where $X^+=X\mathbf 1_{\{X>0\}}$ and $X^-=-X\mathbf 1_{\{X<0\}}$ are positive r.v.s. The preceding result can then be applied to each of $X^+$ and $X^-$, and the difference of the two conditional expectations is the desired $X_{\mathcal G}$.

Example 3.13. Referring to Example 3.12, we have $\frac{dP_2}{dP_1}(\omega)=2|\omega-3.5|/3$, $\frac{dP_1}{dP_2}(\omega)=3/(2|\omega-3.5|)$, $\frac{dP_3}{dP_1}(\omega)=2\cdot\mathbf 1_{\{\omega\ge4\}}$ and $\frac{dP_4}{dP_1}(\omega)=2\cdot\mathbf 1_{\{\omega\le3\}}$. What are $\frac{dP_3}{dP_2}$ and $\frac{dP_4}{dP_2}$?

On the other hand, note that $P_N((-\infty,x))=F_N(x)=\int_{-\infty}^x f_N(u)\,du$ and $P_E((-\infty,x))=F_E(x)=\int_0^x f_E(u)\,du$, where $f_N$ and $f_E$ are the p.d.f.s of the standard normal and standard exponential distributions, respectively. Note that $\frac{f_E(x)}{f_N(x)}$ is Borel measurable and satisfies
$$\int_A \frac{f_E(x)}{f_N(x)}\,dP_N = \int_A \frac{f_E(x)}{f_N(x)}\,f_N(x)\,dx = \int_A f_E(x)\,dx = P_E(A)\,.$$
Thus $\frac{dP_E}{dP_N}=\frac{f_E(x)}{f_N(x)}$. However, $\frac{dP_N}{dP_E}$ does not exist, as $P_N$ is not absolutely continuous w.r.t. $P_E$ ($f_N(x)/f_E(x)$ is undefined for $x<0$ since $f_E(x)=0$ there).
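The last identity in Example 3.13 can also be checked numerically: the sketch below (illustrative, using SciPy's standard normal and exponential densities) compares $P_E(A)$ with $\int_A (f_E/f_N)\,dP_N$ for a test interval $A=(a,b]$, with the integral approximated by a midpoint rule.

```python
import numpy as np
from scipy.stats import norm, expon

# Check dPE/dPN = f_E / f_N by verifying PE(A) = int_A (f_E/f_N) dPN
# for a test interval A = (a, b].
a, b = 0.2, 1.5
n = 200_000

# Left-hand side: PE(A) from the exponential c.d.f.
lhs = expon.cdf(b) - expon.cdf(a)

# Right-hand side: int_a^b (f_E(x)/f_N(x)) f_N(x) dx via midpoint quadrature
x = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
rhs = np.mean(expon.pdf(x) / norm.pdf(x) * norm.pdf(x)) * (b - a)

print(lhs, rhs)   # the two values agree to high accuracy
```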
3.5 General Properties of Conditional Expectations

Theorem 3.6. Conditional expectation has the following properties:
1. $E(aX+bY\mid\mathcal G)=aE(X\mid\mathcal G)+bE(Y\mid\mathcal G)$ (linearity).
2. $E(E(X\mid\mathcal G))=E(X)$.
3. $E(XY\mid\mathcal G)=XE(Y\mid\mathcal G)$ if $X$ is $\mathcal G$-measurable.
4. $E(X\mid\mathcal G)=E(X)$ if $X$ is independent of $\mathcal G$.
5. $E(E(X\mid\mathcal G)\mid\mathcal H)=E(X\mid\mathcal H)$ if $\mathcal H\subset\mathcal G$ (tower property).
6. If $X\ge0$, then $E(X\mid\mathcal G)\ge0$ (positivity).

Here $a,b$ are real numbers, $X,Y\in\mathcal L^1$ are defined on a probability space $(\Omega,\mathcal F,P)$, and $\mathcal G,\mathcal H\subset\mathcal F$ are $\sigma$-fields. In (3), we also assume that the product $XY\in\mathcal L^1$. All equalities and inequalities hold almost surely.

Proof. 1. For any $B\in\mathcal G$,
$$\int_B\big(aE(X\mid\mathcal G)+bE(Y\mid\mathcal G)\big)\,dP = a\int_B E(X\mid\mathcal G)\,dP + b\int_B E(Y\mid\mathcal G)\,dP = \int_B (aX+bY)\,dP\,.$$
The result thus follows from the definition of conditional expectation.

2. This follows by putting $B=\Omega$ in Example 3.10. It is also a special case of (5) with $\mathcal H=\{\emptyset,\Omega\}$.

3. We first verify the result for $X=\mathbf 1_A$ (an indicator function), where $A\in\mathcal G$. In this case, for any $B\in\mathcal G$,
$$\int_B \mathbf 1_A E(Y\mid\mathcal G)\,dP = \int_{A\cap B} E(Y\mid\mathcal G)\,dP = \int_{A\cap B} Y\,dP = \int_B \mathbf 1_A Y\,dP\,.$$
From the definition of conditional expectation, this implies $\mathbf 1_A E(Y\mid\mathcal G)=E(\mathbf 1_A Y\mid\mathcal G)$. Using 1., we can extend the preceding result to a simple function $X=\sum_{j=1}^m a_j\mathbf 1_{A_j}$, where $A_j\in\mathcal G$ for $j=1,\dots,m$:
$$XE(Y\mid\mathcal G)=\sum_{j=1}^m a_j\mathbf 1_{A_j}E(Y\mid\mathcal G)=E\Big(\sum_{j=1}^m a_j\mathbf 1_{A_j}Y\,\Big|\,\mathcal G\Big)=E(XY\mid\mathcal G)\,.$$
Finally, the general result follows by approximating $X$ with an increasing sequence of simple functions and using the MCT.

4. Since $X$ is independent of $\mathcal G$, the random variables $X$ and $\mathbf 1_B$ are independent for any $B\in\mathcal G$. Thus
$$\int_B E(X)\,dP = E(X)\int_B dP = E(X)\int_\Omega \mathbf 1_B\,dP = E(X)E(\mathbf 1_B)\,.$$
Since independent random variables are uncorrelated,
$$E(X)E(\mathbf 1_B)=E(X\mathbf 1_B)=\int_B X\,dP\,.$$
We have thus obtained $\int_B E(X)\,dP=\int_B X\,dP$ for all $B\in\mathcal G$, i.e., $E(X\mid\mathcal G)=E(X)$.

5. By definition,
$$\int_B E(X\mid\mathcal G)\,dP=\int_B X\,dP\ \text{ for every } B\in\mathcal G\,,\qquad\text{and}\qquad \int_B E(X\mid\mathcal H)\,dP=\int_B X\,dP\ \text{ for every } B\in\mathcal H\,.$$
Since $\mathcal H\subset\mathcal G$,
$$\int_B E(X\mid\mathcal G)\,dP=\int_B E(X\mid\mathcal H)\,dP$$
for every $B\in\mathcal H$. Therefore, the definition of conditional expectation implies that $E(E(X\mid\mathcal G)\mid\mathcal H)=E(X\mid\mathcal H)$.

6. For any $n$ put
$$A_n=\Big\{E(X\mid\mathcal G)\le-\frac1n\Big\}\,.$$
Since $E(X\mid\mathcal G)$ is a $\mathcal G$-measurable r.v., we have $A_n\in\mathcal G$. If $X\ge0$ a.s., then
$$0\le\int_{A_n}X\,dP=\int_{A_n}E(X\mid\mathcal G)\,dP\le-\frac1n P(A_n)\,,$$
which implies $P(A_n)=0$. Since
$$\{E(X\mid\mathcal G)<0\}=\bigcup_{n=1}^\infty A_n\,,$$
it follows that $P\{E(X\mid\mathcal G)<0\}\le\sum_{n=1}^\infty P(A_n)=0$, completing the proof.

Example 3.14. (Independence and zero correlation) Recall that $X\in\mathcal L^1$ if $E(|X|)<\infty$. If $X_1$ and $X_2$ are independent, then they are uncorrelated, i.e., $E(X_1X_2)=E(X_1)E(X_2)$. To see this, note that
$$E(X_1X_2)=E\big(E(X_1X_2\mid\sigma(X_2))\big)=E\big(X_2E(X_1\mid\sigma(X_2))\big)=E\big(X_2E(X_1)\big)=E(X_1)E(X_2)\,,\qquad(3.15)$$
where the four equalities are consequences of Properties 2., 3., 4. and 1. of Theorem 3.6, respectively. Repeating the argument in (3.15), we have that if $X_1,\dots,X_n\in\mathcal L^1$ are independent, then they are uncorrelated, i.e.,
$$E(X_1X_2\cdots X_n)=E(X_1)E(X_2)\cdots E(X_n)\,,$$
provided that the product $X_1X_2\cdots X_n\in\mathcal L^1$. However, the converse is not true. See Exercise 3.15.

3.6 Useful Inequalities

Definition 3.11. (Convex function) A function $\varphi:\mathbb R\to\mathbb R$ is convex if for any $x,y\in\mathbb R$ and any $\lambda\in[0,1]$,
$$\varphi(\lambda x+(1-\lambda)y)\le\lambda\varphi(x)+(1-\lambda)\varphi(y)\,.$$
Graphically, a function $\varphi$ is convex if, on any interval $[x,y]$, its graph lies below the straight line joining the points $(x,\varphi(x))$ and $(y,\varphi(y))$.

Theorem 3.7. (Jensen's Inequality) Let $\varphi:\mathbb R\to\mathbb R$ be a convex function and let $X\in\mathcal L^1$ on $(\Omega,\mathcal F,P)$ be such that $\varphi(X)\in\mathcal L^1$. Then
$$\varphi\big(E(X\mid\mathcal G)\big)\le E\big(\varphi(X)\mid\mathcal G\big)\quad\text{a.s.}\qquad(3.16)$$
for any $\sigma$-field $\mathcal G$ on $\Omega$ contained in $\mathcal F$.

Proof. By the convexity of $\varphi$, for every fixed $c\in\mathbb R$ there exists an $a_c$ such that $\varphi(x)\ge a_c(x-c)+\varphi(c)$ for all $x$. (If $\varphi$ is differentiable at $c$ and $a_c=\varphi'(c)$, then the line $a_c(x-c)+\varphi(c)$ is the tangent of $\varphi$ at the point $(c,\varphi(c))$.) Replacing $x$ by the r.v. $X$ and taking conditional expectations on both sides gives
$$E\big(\varphi(X)\mid\mathcal G\big)\ge a_c\big(E(X\mid\mathcal G)-c\big)+\varphi(c)\,.$$
Since $c$ is arbitrary, putting $c=E(X\mid\mathcal G)$ yields (3.16).
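Inequality (3.16) is easy to see numerically. The sketch below is an illustration only: it checks Jensen's inequality with $\varphi(x)=x^2$ on the three-coin example of Example 3.4, where conditioning on $\sigma(Z)$ is just averaging over each event $\{Z=z\}$.

```python
from itertools import product

# Jensen: phi(E(X|G)) <= E(phi(X)|G) with phi(x) = x^2, G = sigma(Z),
# checked on the three-coin example (Z = value of the 10c and 20c coins).
coins = (10, 20, 50)
outcomes = list(product((0, 1), repeat=3))               # 8 equally likely outcomes
X = lambda w: sum(c for c, h in zip(coins, w) if h)
Z = lambda w: sum(c for c, h in zip(coins[:2], w[:2]) if h)
phi = lambda x: x**2

for z in (0, 10, 20, 30):
    cell = [w for w in outcomes if Z(w) == z]
    e_x = sum(X(w) for w in cell) / len(cell)            # E(X | Z = z)
    e_phi = sum(phi(X(w)) for w in cell) / len(cell)     # E(phi(X) | Z = z)
    assert phi(e_x) <= e_phi
    print(f"Z = {z:2d}: phi(E(X|Z)) = {phi(e_x):6.0f} <= E(phi(X)|Z) = {e_phi:6.0f}")
```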
Theorem 3.8. (Hölder's Inequality) Let $p>1$, $\frac1p+\frac1q=1$, and let $X,Y$ be r.v.s on $(\Omega,\mathcal F,P)$. If $E(|X|^p)<\infty$ and $E(|Y|^q)<\infty$, then $E(|XY|)<\infty$ and
$$E(|XY|)\le\big(E(|X|^p)\big)^{\frac1p}\big(E(|Y|^q)\big)^{\frac1q}\,.$$

Proof. Without loss of generality, assume that $X$ and $Y$ are positive r.v.s. First define a new probability measure $Q$ on $(\Omega,\mathcal F)$ by
$$Q(A)=\int_A\frac{X^p}{\|X\|_p^p}\,dP\,,$$
where $\|X\|_p=(E(X^p))^{1/p}$. Then
$$\int_\Omega XY\,dP = \int_\Omega\frac{Y\,\|X\|_p^p}{X^{p-1}}\cdot\frac{X^p}{\|X\|_p^p}\,dP = \int_\Omega\frac{Y\,\|X\|_p^p}{X^{p-1}}\,dQ \overset{\text{Jensen}}{\le}\Big(\int_\Omega\frac{Y^q\,\|X\|_p^{pq}}{X^{q(p-1)}}\,dQ\Big)^{\frac1q}$$
$$=\Big(\int_\Omega\frac{Y^q\,\|X\|_p^{pq}}{X^{q(p-1)}}\cdot\frac{X^p}{\|X\|_p^p}\,dP\Big)^{\frac1q} = \Big(\int_\Omega Y^q\,\|X\|_p^{p(q-1)}\,dP\Big)^{\frac1q} = \|X\|_p\Big(\int_\Omega Y^q\,dP\Big)^{\frac1q} = \|X\|_p\|Y\|_q\,,$$
completing the proof. (Here we used $q(p-1)=p$ and $p(q-1)=q$, which follow from $\frac1p+\frac1q=1$.)

The following is the triangle inequality for the $\mathcal L^p$ norm $\|X\|_p=(E|X|^p)^{1/p}$.

Theorem 3.9. (Minkowski's Inequality) Let $p\ge1$ and let $X,Y$ be r.v.s on $(\Omega,\mathcal F,P)$. If $E(|X|^p)<\infty$ and $E(|Y|^p)<\infty$, then
$$\|X+Y\|_p\le\|X\|_p+\|Y\|_p\,.$$

Proof. We focus on $p>1$, as $p=1$ corresponds to the usual triangle inequality. Note that
$$|X+Y|^p\le|X|\,|X+Y|^{p-1}+|Y|\,|X+Y|^{p-1}\,.\qquad(3.17)$$
Set $q=\frac{p}{p-1}$, so that $\frac1p+\frac1q=1$. Applying Hölder's inequality to $|X|\,|X+Y|^{p-1}$ yields
$$\int_\Omega|X|\,|X+Y|^{p-1}\,dP\le\|X\|_p\big(E|X+Y|^{(p-1)q}\big)^{\frac1q}=\|X\|_p\,\|X+Y\|_p^{\frac pq}\,.\qquad(3.18)$$
Similarly, we have
$$\int_\Omega|Y|\,|X+Y|^{p-1}\,dP\le\|Y\|_p\,\|X+Y\|_p^{\frac pq}\,.\qquad(3.19)$$
The result follows by combining (3.17), (3.18) and (3.19).

Given a sequence of random variables $X_1,X_2,\dots$ on the probability space $(\Omega,\mathcal F,P)$, we have the following generalizations of the MCT, Fatou's lemma and the DCT:

Theorem 3.10. Let $\mathcal G$ be a sub-$\sigma$-field of $\mathcal F$.
i) (Monotone Convergence Theorem (MCT)) If $0\le X_n\uparrow X$, then $E(X_n\mid\mathcal G)\uparrow E(X\mid\mathcal G)$ a.s.
ii) (Fatou's Lemma) If $X_n\ge0$, then $E(\liminf X_n\mid\mathcal G)\le\liminf E(X_n\mid\mathcal G)$ a.s.
iii) (Dominated Convergence Theorem (DCT)) If for some r.v. $M$ and all $n\ge1$, $|X_n|\le M$ a.s. (i.e. $|X_n(\omega)|\le M(\omega)$ for almost all $\omega\in\Omega$), $E(M)<\infty$ and $X_n\xrightarrow{\text{a.s.}}X$, then
$$E(X_n\mid\mathcal G)\xrightarrow{\text{a.s.}}E(X\mid\mathcal G)\,.$$
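Before turning to the exercises, a quick Monte Carlo sanity check of Hölder's and Minkowski's inequalities on simulated data; this is only an illustration, with expectations replaced by sample means.

```python
import numpy as np

# Empirical check of Holder's and Minkowski's inequalities.
rng = np.random.default_rng(0)
n, p = 100_000, 3.0
q = p / (p - 1.0)

X = rng.lognormal(size=n)          # positive r.v.s with all required moments
Y = rng.exponential(size=n)

lp_norm = lambda W, r: np.mean(np.abs(W) ** r) ** (1.0 / r)

# Holder: E|XY| <= ||X||_p ||Y||_q
print(np.mean(np.abs(X * Y)), "<=", lp_norm(X, p) * lp_norm(Y, q))

# Minkowski: ||X + Y||_p <= ||X||_p + ||Y||_p
print(lp_norm(X + Y, p), "<=", lp_norm(X, p) + lp_norm(Y, p))
```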
3.7 Exercises

Exercise 3.11. Consider the dice example $(\Omega,\mathcal F,P)$ where $\Omega=\{1,2,3,4,5,6\}$ and $\mathcal F=2^\Omega$. Let $P(\{\omega\})=1/6$ for $\omega=1,\dots,6$, and
$$A_1=\{1,2,3\},\ A_2=\{4,5,6\},\ A_3=\{2,5\},\ A_4=\{2,6\},\qquad X_i=\mathbf 1_{A_i},\ \mathcal F_i=\sigma(\{A_i\})\ \text{for } i=1,\dots,4\,,$$
$$X_5(\omega)=\omega\,\mathbf 1_{A_i}(\omega),\qquad \mathcal F_5=\sigma(\{A_1,A_3\}),\ \mathcal F_6=\sigma(\{A_3,A_4\}),\ \mathcal F_7=2^\Omega\,.$$
a) Which events are independent?
b) Which random variables are independent?
c) Which $\sigma$-fields are independent?
d) Repeat a)-c) using the measure $P(\{i\})=\frac{|i-3.5|}{9}$ for $i=1,\dots,6$.

Exercise 3.12. Explain the meaning of independence of two events and of two events being mutually exclusive. Can two events be both independent and mutually exclusive? Explain.

Exercise 3.13. How would one define independence between an event and a random variable? Between an event and a $\sigma$-field? Between a random variable and a $\sigma$-field?

Exercise 3.14. Show that two random variables $X$ and $Y$ are independent if and only if the $\sigma$-fields $\sigma(X)$ and $\sigma(Y)$ are independent.

Exercise 3.15. Let $X\sim N(0,1)$ and $Y=X^2$. Show that $X$ and $Y$ are uncorrelated but dependent.

Exercise 3.16. Let $\mathbf 1_A$ be the indicator function of an event $A$, i.e.
$$\mathbf 1_A(\omega)=\begin{cases}1 & \text{if }\omega\in A\,,\\ 0 & \text{if }\omega\notin A\,.\end{cases}$$
Show that $E(\mathbf 1_A\mid B)=P(A\mid B)$ for any event $B$ with $P(B)\neq0$.

Exercise 3.17. For the following r.v.s, write down the probability space and the induced probability space, and find the expectation of $X$.
i) (Constant r.v.) $X(\omega)=a$ for all $\omega$.
ii) $X:[0,1]\to\mathbb R$ given by $X(\omega)=\min\{\omega,1-\omega\}$ (the distance to the nearest endpoint of the interval $[0,1]$).
iii) $X:[0,1]^2\to\mathbb R$, the distance to the nearest edge of the unit square.

Exercise 3.18. Let $\Omega=[0,1]$, $\mathcal F=\mathcal B_{[0,1]}$ and $P=\lambda_{[0,1]}$. Let $X(\omega)=\omega(1-\omega)$ for $\omega\in\Omega$. For any random variable $Y$ on $(\Omega,\mathcal F,P)$,
i) show that $Y(\omega)+Y(1-\omega)$ is $\sigma(X)$-measurable (Hint: draw the graph of $Y$ against $\omega$);
ii) show that $E(Y\mid X)(\omega)=\dfrac{Y(1-\omega)+Y(\omega)}{2}$ for any $\omega\in\Omega$.

Exercise 3.19. Take $\Omega=[0,1]$, $\mathcal F=\mathcal B_{[0,1]}$ and $P=\lambda_{[0,1]}$, the Lebesgue measure on $[0,1]$. Find $E(X\mid Z)$ if $X(\omega)=2\omega^2$ and $Z(\omega)=\mathbf 1_{\{\omega\in[0,1/3]\}}+2\times\mathbf 1_{\{\omega\in(2/3,1]\}}$.

Exercise 3.20. Show that if $Z$ is a constant function, then $E(X\mid Z)$ is constant and equal to $E(X)$.

Exercise 3.21. On $(\Omega,\mathcal F,P)$, show that for $A,B\in\mathcal F$ with $0<P(B)<1$,
$$E(\mathbf 1_A\mid\mathbf 1_B)(\omega)=\begin{cases}P(A\mid B) & \text{if }\omega\in B\,,\\ P(A\mid\Omega\setminus B) & \text{if }\omega\notin B\,.\end{cases}$$

Exercise 3.22. Recall that if $Z:\Omega\to\mathbb R$ is a discrete r.v. taking $n$ values $\{z_i\}_{i=1,\dots,n}$, then there exist disjoint sets $A_i$ such that $\Omega=\cup_{i=1}^n A_i$ and $Z(\omega)=z_i$ whenever $\omega\in A_i$. Show that every element of $\sigma(Z)$ is a union of some of the $A_i$s. Generalize to the case where $Z$ is a discrete r.v. taking infinitely many values $\{z_i\}_{i=1,\dots}$.

Exercise 3.23. Show that if $X$ is $\mathcal G$-measurable, then $E(X\mid\mathcal G)=X$ a.s.

Exercise 3.24. Refer to Example 3.10. If $B\notin\mathcal G$, does the result still hold? Prove it or give a counterexample.

Exercise 3.25. Show that if $Q\ll P$, then for any $\varepsilon>0$ we can find a $\delta>0$ such that $P(F)<\delta$ implies $Q(F)\le\varepsilon$.

Exercise 3.26. Show that if $P$ and $Q$ satisfy (3.11) for any $A\in\mathcal F$, then $Q\ll P$.

Exercise 3.27. Let $Q_1$, $Q_2$ and $P$ be measures on $(\Omega,\mathcal F)$. Show that if $Q_1\ll P$ and $Q_2\ll P$, then $Q_1+Q_2\ll P$.

Exercise 3.28. Consider the probability space $(\Omega,\mathcal F,P)$ with $\Omega=\{1,2,3,4,5,6\}$ and $\mathcal F=2^\Omega$, where $P(\{i\})=i/21$ and $Q(\{i\})=(i-1)/15$. Is $P\ll Q$? Is $Q\ll P$? Find the Radon-Nikodym derivatives $\frac{dP}{dQ}$ and $\frac{dQ}{dP}$ if they exist.

Exercise 3.29. Justify rigorously the equality $P\big(\bigcap_{n=1}^\infty A_n\big)=\lim_{m\to\infty}P\big(\bigcap_{n=1}^m A_n\big)$ in (3.5).

Exercise 3.30. Using the notation in the proof of Theorem 3.4:
i) From the existence of $l=\lim_{k\to\infty}\int_\Omega Y_k^2\,dP$, show that we can find a subsequence $\{n_k\}_{k=1,\dots}$ such that $l-\frac{1}{4^k}<\int_\Omega Y_{n_k}^2\,dP\le l$.
ii) Using the definition of $Y_k$, show that $\int_\Omega(Y_{n_{k+1}}-Y_{n_k})^2\,dP=\int_\Omega(Y_{n_{k+1}}^2-Y_{n_k}^2)\,dP<\frac{1}{4^k}$.
iii) By the Cauchy-Schwarz inequality, show that $\int_\Omega|Y_{n_{k+1}}-Y_{n_k}|\,dP<\frac{1}{2^k}$.
iv) Using (iii) and the MCT, show that $\int_\Omega\sum_{k=1}^\infty|Y_{n_{k+1}}-Y_{n_k}|\,dP=\sum_{k=1}^\infty\int_\Omega|Y_{n_{k+1}}-Y_{n_k}|\,dP<\infty$.
v) Using (iv), argue that $\sum_{k=1}^\infty|Y_{n_{k+1}}-Y_{n_k}|<\infty$ almost surely (Hint: proof by contradiction). Then show that $\sum_{k=1}^\infty(Y_{n_{k+1}}-Y_{n_k})<\infty$ almost surely.
vi) Using (v), show that $\lim_{k\to\infty}Y_k$ exists almost surely.
vii) Noting that $l-\frac{1}{4^k}<\int_\Omega Y_{n_k}^2\,dP\le\int_\Omega Y_{\mathcal R_{n_k}}^2\,dP\le l$ and repeating the arguments of (ii)-(iii), show that $\int_F\big(Y_{\mathcal R_{n_k}}^2-Y_{n_k}^2\big)\,dP<1/2^k$.
viii) Using (vi) and (vii), show that $\int_F Y_{\mathcal R_{n_k}}\,dP\to\int_F Y\,dP$.
ix) Using $\int_A Y_P\,dS=P(A)=\int_A dP$, show that $E_S(YY_P)=E_P(Y)$ for any simple function $Y$ on $\mathcal F$. Extend to the case where $Y$ is $\mathcal F$-measurable.