18. Conditional expectation and variance
We define the conditional expectation of Y given X = x, written E(Y | X = x), by

    E(Y | X = x) := Σ_y y fY|X(y | x)        if Y is discrete,
                    ∫ y fY|X(y | x) dy       if Y is continuous.
We define the conditional variance of Y given X, denoted Var(Y | X), analogously.
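For a concrete feel for these definitions in the discrete case, here is a minimal computational sketch (not part of the notes; the joint pmf below is made up purely for illustration) that computes E(Y | X = x) and Var(Y | X = x) directly from a joint probability mass function:

    import numpy as np

    # Hypothetical joint pmf fX,Y(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}.
    f_XY = np.array([[0.10, 0.20, 0.10],
                     [0.15, 0.25, 0.20]])
    y_vals = np.array([0.0, 1.0, 2.0])

    for x in range(f_XY.shape[0]):
        f_X = f_XY[x].sum()                  # marginal fX(x)
        f_Y_given_X = f_XY[x] / f_X          # conditional pmf fY|X(y | x)
        m = np.sum(y_vals * f_Y_given_X)     # E(Y | X = x)
        v = np.sum(y_vals**2 * f_Y_given_X) - m**2   # Var(Y | X = x)
        print(f"x={x}: E(Y|X=x)={m:.4f}, Var(Y|X=x)={v:.4f}")

The continuous case is the same computation with an integral in place of the sum.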
Example 18.1. Suppose X, Y are jointly normal with EX = EY = 0, Var(X) = Var(Y) = 1, and Cov(X, Y) = ρ. Then Y | X = x ∼ N(ρx, 1 − ρ²), and

    E(Y | X = x) = ρx    and    Var(Y | X = x) = 1 − ρ².
Conditional expectation has all the usual properties of expectation since it is essentially
the expectation you would compute for the reduced sample space {ω ∈ Ω : X(ω) = x};
however, you might want to keep in mind (and come to terms with) the fact that this
reduced sample space is in fact random: it depends on the value of the random variable X.
In that sense, E(Y | X) and Var(Y | X) are, themselves, random variables.
We also have

    E(g(X)h(Y) | X = x) = g(x) E(h(Y) | X = x),
    E(g(X) | X = x) = g(x),

and, if X and Y are independent,

    E(Y | X = x) = E(Y).
18.1. Conditional mean and variance of Y given X. For each x, let ϕ(x) := E(Y | X = x).
The random variable ϕ(X) is the conditional mean of Y given X, denoted E(Y | X). The
conditional mean satisfies the tower property of conditional expectation:
EY = EE(Y | X),
which coincides with the law of cases for expectation. To define conditional variance
Var(Y | X) := ψ(X), where ψ(x) := Var(Y | X = x), we need E|Y| < ∞.
Example 18.2. Let X and Y be jointly Normal with standard Normal marginals and Cov(X, Y) = ρ.
We have already seen that the conditional law of Y given X = x is N(ρx, 1 − ρ²). We have
• E(Y | X = x) = ρx implies E(Y | X) = ρX.
• Var(Y | X = x) = 1 − ρ² implies Var(Y | X) = 1 − ρ².
• E(Y | X) has mean ρEX = 0 = EY.
• Var(Y | X) has mean 1 − ρ².
We observe

    Var(E(Y | X)) + E Var(Y | X) = ρ² + (1 − ρ²) = 1 = Var(Y).
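A quick Monte Carlo check of this decomposition (a sketch only; ρ = 0.6 is an arbitrary illustrative value) uses the closed forms E(Y | X) = ρX and Var(Y | X) = 1 − ρ² from the example:

    import numpy as np

    rng = np.random.default_rng(0)
    rho, n = 0.6, 200_000

    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # Y | X = x ~ N(rho*x, 1 - rho^2)

    cond_mean = rho * x        # E(Y | X) = rho X, a random variable
    cond_var = 1 - rho**2      # Var(Y | X), constant in this example

    print("E[E(Y|X)] ~", cond_mean.mean(), "  vs  EY ~", y.mean())            # tower property
    print("Var(E(Y|X)) + E[Var(Y|X)] ~", cond_mean.var() + cond_var,
          "  vs  Var(Y) ~", y.var())                                          # total variance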
18.2. Mean and variance of a random sum. Consider the random sum

    SN := X1 + · · · + XN = Σ_{i=1}^N Xi,

where X1, X2, . . . are independent, identically distributed and independent of N, EX1 = µ,
Var(X1) = σ², and N is a non-negative integer-valued random variable. We have already
computed ESN and Var(SN ) during our discussion on generating functions. Alternatively,
we can compute these using properties of conditional expectation and variance.
For ESN, we observe that, since the Xi are i.i.d. and independent of N,

    E(SN | N = n) = E(Sn | N = n) = E(Sn) = n EX,

so E(SN | N) = N EX and, by the tower property,

    ESN = E E(SN | N) = E(N EX) = EN EX.
For Var(SN ), we have
Var(SN | N = n) = Var(Sn ) = n Var(X),
which implies
Var(SN | N) = N Var(X).
From above, E(SN | N) = N EX, so Var(E(SN | N)) = Var(N EX) = (EX)² Var(N). Combined with E Var(SN | N) = EN Var(X), we conclude

    Var(SN) = Var(E(SN | N)) + E Var(SN | N) = (EX)² Var(N) + EN Var(X).
Note that (i) E(E(Y | X)) = EY and (ii) Var(Y) = Var(E(Y | X)) + E Var(Y | X). To see (ii):

    Var(Y) = EY² − (EY)²
           = E E(Y² | X) − [E E(Y | X)]²
           = E[E(Y² | X) − E(Y | X)²] + E[E(Y | X)²] − [E E(Y | X)]²
           = E Var(Y | X) + Var(E(Y | X)).
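These random-sum formulas are easy to corroborate numerically. The following sketch is not from the notes: the choices N ∼ Poisson(3) and Xi ∼ Exponential with mean 2 are arbitrary, and we compare the sample mean and variance of SN with EN·EX and (EX)² Var(N) + EN Var(X):

    import numpy as np

    rng = np.random.default_rng(1)
    trials = 100_000

    # Arbitrary illustrative choices: N ~ Poisson(3), X_i ~ Exponential(mean 2), independent of N.
    lam, mu = 3.0, 2.0
    EN, VarN = lam, lam          # Poisson: mean = variance = lam
    EX, VarX = mu, mu**2         # Exponential with mean mu: variance = mu^2

    N = rng.poisson(lam, size=trials)
    S = np.array([rng.exponential(mu, size=n).sum() for n in N])   # S_N = X_1 + ... + X_N

    print("sample mean of S_N:", S.mean(), " vs EN*EX =", EN * EX)
    print("sample var of S_N: ", S.var(),  " vs (EX)^2*Var(N) + EN*Var(X) =", EX**2 * VarN + EN * VarX)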
Example 18.3. Roll a fair 10-sided die. If we roll a 1, 2, . . . , 7, we win that many dollars and start
over (roll again). If we roll 8, 9, 10, we win that many dollars and the game stops. What is the
expectation and variance of the number of rolls and the amount of money won?
Consider a coin with P{Heads} = 3/10. Each time it lands Tails, we get a random number
Ui ∼ d-Unif{1, . . . , 7} of dollars. If it lands on Heads, we get V ∼ d-Unif{8, 9, 10} and quit. Let N
be the number of tails flipped before the first head appears. Then N ∼ Geometric(3/10). The number
of dollars won is M := V + SN , where SN = U1 + · · · + UN , N ∼ Geo(3/10) and P{Ui = j} = 1/7,
j = 1, . . . , 7. Hence,
EN = q/p = 7/3, EU1 = 4, EV = 9, and

    Var(N) = q/p² = 70/9,    Var(U1) = (7² − 1)/12 = 4,    Var(V) = (3² − 1)/12 = 2/3.
It follows that

    EM = E(SN + V) = EN EU1 + EV = (7/3) × 4 + 9 = 55/3;

    Var(M) = Var(SN) + Var(V)                        (since SN and V are independent)
           = (EU1)² Var(N) + Var(U1) EN + Var(V)
           = 4² × 70/9 + 4 × 7/3 + 2/3
           = 1210/9 = 134 4/9.
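The exact values are easy to corroborate by simulation; the sketch below (not part of the notes) plays the game many times and compares the sample mean and variance of the winnings with 55/3 ≈ 18.33 and 1210/9 ≈ 134.44:

    import numpy as np

    rng = np.random.default_rng(2)
    trials = 100_000

    winnings = np.empty(trials)
    for t in range(trials):
        total = 0
        while True:
            roll = rng.integers(1, 11)   # fair 10-sided die: faces 1..10
            total += roll                # we win the face value either way
            if roll >= 8:                # 8, 9, or 10 ends the game
                break
        winnings[t] = total

    print("sample mean:", winnings.mean(), " vs 55/3   =", 55 / 3)
    print("sample var: ", winnings.var(),  " vs 1210/9 =", 1210 / 9)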
18.3. Best prediction. Recall that the constant c that minimizes E(Y − c)² is c = EY, provided EY² < ∞.
Now, suppose X, Y are random variables with joint probability mass function fX,Y. Which function g(X) best predicts Y, in the sense of minimizing

    MSE = E(Y − g(X))² ?
We have

    MSE = E E((Y − g(X))² | X)
        = Σ_x E((Y − g(X))² | X = x) fX(x)
        = Σ_x E((Y − g(x))² | X = x) fX(x).
For each x, the corresponding summand is minimized by g(x) := E(Y | X = x); hence, the best predictor is g(X) = E(Y | X), which has MSE

    E(Y − E(Y | X))² = E[E((Y − E(Y | X))² | X)] = E Var(Y | X) = Var(Y) − Var(E(Y | X)).
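Outside the jointly Normal setting the best predictor need not be linear. The sketch below (an arbitrary illustration, not from the notes) uses Y = X² + noise, so E(Y | X) = X², and compares its MSE with that of the best straight line a + bX, the topic of the next subsection:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000

    x = rng.standard_normal(n)
    y = x**2 + 0.5 * rng.standard_normal(n)    # illustrative model: E(Y | X) = X^2

    mse_cond = np.mean((y - x**2) ** 2)        # MSE of the best predictor E(Y | X)

    b, a = np.polyfit(x, y, 1)                 # least-squares line a + b*x
    mse_lin = np.mean((y - (a + b * x)) ** 2)  # MSE of the best linear predictor

    print("MSE of E(Y|X):         ", mse_cond)   # ~ 0.25 (the noise variance)
    print("MSE of best linear fit:", mse_lin)    # ~ 2.25, strictly larger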
18.3.1. Best linear prediction. A predictor g(X) of the form a + bX is a linear predictor. Suppose X, Y are mean 0, variance 1 and have Cov(X, Y) = ρ. What pair (a, b) minimizes E(Y − (a + bX))²? To answer this, notice that

    Cov(Y − ρX, X) = Cov(Y, X) − Cov(ρX, X) = 0.

Therefore, for R = Y − (a + bX) = −a + (Y − ρX) + (ρ − b)X, we have

    ER = −a,
    Var(R) = Var(Y − ρX) + (ρ − b)² Var(X),
    MSE = ER² = a² + (ρ − b)² + Var(Y − ρX).
Take a = 0 and b = ρ. Then MSE(ρX) = Var(Y − ρX) = 1 − ρ². As an immediate corollary, we observe that |ρ| ≤ 1. In general, for arbitrary means µX, µY, variances σX², σY², and covariance σXY, the best linear predictor of Y given X is

    µY + ρ (σY/σX)(X − µX),

where ρ = σXY/(σX σY), with MSE = σY²(1 − ρ²).
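Numerically, the general formula can be checked against an ordinary least-squares fit; in the sketch below the means, standard deviations, and correlation are arbitrary illustrative values (not from the notes):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200_000

    mu_x, mu_y = 2.0, -1.0       # illustrative parameters
    sd_x, sd_y = 1.5, 3.0
    rho = 0.7

    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x = mu_x + sd_x * z1
    y = mu_y + sd_y * (rho * z1 + np.sqrt(1 - rho**2) * z2)   # Corr(X, Y) = rho

    pred = mu_y + rho * (sd_y / sd_x) * (x - mu_x)            # best linear predictor
    print("MSE of formula predictor:", np.mean((y - pred) ** 2),
          " vs sigma_Y^2 (1 - rho^2) =", sd_y**2 * (1 - rho**2))

    b, a = np.polyfit(x, y, 1)                                # empirical least squares
    print("slope:", b, " vs rho*sd_y/sd_x =", rho * sd_y / sd_x)
    print("intercept:", a, " vs mu_y - rho*(sd_y/sd_x)*mu_x =",
          mu_y - rho * (sd_y / sd_x) * mu_x)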