18. Conditional expectation and variance
We define the conditional expectation of Y given X = x, written E(Y | X = x), by

    E(Y | X = x) := Σ_y y fY|X(y | x)        if Y is discrete,
                    ∫ y fY|X(y | x) dy       if Y is continuous.
We define the conditional variance of Y given X, denoted Var(Y | X), analogously.
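For a concrete feel for these definitions in the discrete case, here is a minimal computational sketch (not part of the notes; the joint pmf below is made up purely for illustration) that computes E(Y | X = x) and Var(Y | X = x) directly from a joint probability mass function:

    import numpy as np

    # Hypothetical joint pmf fX,Y(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}.
    f_XY = np.array([[0.10, 0.20, 0.10],
                     [0.15, 0.25, 0.20]])
    y_vals = np.array([0.0, 1.0, 2.0])

    for x in range(f_XY.shape[0]):
        f_X = f_XY[x].sum()                  # marginal fX(x)
        f_Y_given_X = f_XY[x] / f_X          # conditional pmf fY|X(y | x)
        m = np.sum(y_vals * f_Y_given_X)     # E(Y | X = x)
        v = np.sum(y_vals**2 * f_Y_given_X) - m**2   # Var(Y | X = x)
        print(f"x={x}: E(Y|X=x)={m:.4f}, Var(Y|X=x)={v:.4f}")

The continuous case is the same computation with an integral in place of the sum.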
Example 18.1. Suppose X, Y are jointly normal with EX = EY = 0, Var(X) = Var(Y) = 1, and Cov(X, Y) = ρ. Then Y | X = x ∼ N(ρx, 1 − ρ²), and

    E(Y | X = x) = ρx    and    Var(Y | X = x) = 1 − ρ².
Conditional expectation has all the usual properties of expectation since it is essentially
the expectation you would compute for the reduced sample space {ω ∈ Ω : X(ω) = x};
however, you might want to keep in mind (and come to terms with) the fact that this
reduced sample space is in fact random: it depends on the value of the random variable X.
In that sense, E(Y | X) and Var(Y | X) are, themselves, random variables.
We also have

    E(g(X)h(Y) | X = x) = g(x) E(h(Y) | X = x),
    E(g(X) | X = x) = g(x),

and, if X and Y are independent,

    E(Y | X = x) = E(Y).
18.1. Conditional mean and variance of Y given X. For each x, let ϕ(x) := E(Y | X = x).
The random variable ϕ(X) is the conditional mean of Y given X, denoted E(Y | X). The
conditional mean satisfies the tower property of conditional expectation:
EY = EE(Y | X),
which coincides with the law of cases for expectation. To define conditional variance
Var(Y | X) := ψ(X), where ψ(x) := Var(Y | X = x), we need E|Y| < ∞.
Example 18.2. Let X and Y be jointly Normal with standard Normal marginals and Cov(X, Y) = ρ.
We have already seen that the conditional law of Y given X = x is N(ρx, 1 − ρ²). We have
• E(Y | X = x) = ρx implies E(Y | X) = ρX.
• Var(Y | X = x) = 1 − ρ² implies Var(Y | X) = 1 − ρ².
• E(Y | X) has mean ρEX = 0 = EY.
• Var(Y | X) has mean 1 − ρ².
We observe

    Var(E(Y | X)) + E Var(Y | X) = ρ² + (1 − ρ²) = 1 = Var(Y).
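A quick Monte Carlo check of this decomposition (a sketch only; ρ = 0.6 is an arbitrary illustrative value) uses the closed forms E(Y | X) = ρX and Var(Y | X) = 1 − ρ² from the example:

    import numpy as np

    rng = np.random.default_rng(0)
    rho, n = 0.6, 200_000

    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # Y | X = x ~ N(rho*x, 1 - rho^2)

    cond_mean = rho * x        # E(Y | X) = rho X, a random variable
    cond_var = 1 - rho**2      # Var(Y | X), constant in this example

    print("E[E(Y|X)] ~", cond_mean.mean(), "  vs  EY ~", y.mean())            # tower property
    print("Var(E(Y|X)) + E[Var(Y|X)] ~", cond_mean.var() + cond_var,
          "  vs  Var(Y) ~", y.var())                                          # total variance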
18.2. Mean and variance of a random sum. Consider the random sum

    SN := X1 + · · · + XN = Σ_{i=1}^N Xi,

where X1, X2, . . . are independent, identically distributed and independent of N, EX1 = µ,
Var(X1) = σ², and N is a non-negative integer-valued random variable. We have already
computed ESN and Var(SN ) during our discussion on generating functions. Alternatively,
we can compute these using properties of conditional expectation and variance.
For ESN, we observe that, since the Xi are i.i.d. and independent of N,

    E(SN | N = n) = E(Sn | N = n) = E(Sn) = n EX,

so E(SN | N) = N EX and, by the tower property,

    ESN = E E(SN | N) = E(N EX) = EN EX.
For Var(SN ), we have
Var(SN | N = n) = Var(Sn ) = n Var(X),
which implies
Var(SN | N) = N Var(X).
From above, E(SN | N) = N EX, so Var(E(SN | N)) = Var(N EX) = (EX)² Var(N). Combined with E Var(SN | N) = EN Var(X), we conclude

    Var(SN) = Var(E(SN | N)) + E Var(SN | N) = (EX)² Var(N) + EN Var(X).
Note that (i) E(E(Y | X)) = EY and (ii) Var(Y) = Var(E(Y | X)) + E Var(Y | X). To see (ii):

    Var(Y) = EY² − (EY)²
           = E E(Y² | X) − [E E(Y | X)]²
           = E[E(Y² | X) − E(Y | X)²] + E[E(Y | X)²] − [E E(Y | X)]²
           = E Var(Y | X) + Var(E(Y | X)).
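These random-sum formulas are easy to corroborate numerically. The following sketch is not from the notes: the choices N ∼ Poisson(3) and Xi ∼ Exponential with mean 2 are arbitrary, and we compare the sample mean and variance of SN with EN·EX and (EX)² Var(N) + EN Var(X):

    import numpy as np

    rng = np.random.default_rng(1)
    trials = 100_000

    # Arbitrary illustrative choices: N ~ Poisson(3), X_i ~ Exponential(mean 2), independent of N.
    lam, mu = 3.0, 2.0
    EN, VarN = lam, lam          # Poisson: mean = variance = lam
    EX, VarX = mu, mu**2         # Exponential with mean mu: variance = mu^2

    N = rng.poisson(lam, size=trials)
    S = np.array([rng.exponential(mu, size=n).sum() for n in N])   # S_N = X_1 + ... + X_N

    print("sample mean of S_N:", S.mean(), " vs EN*EX =", EN * EX)
    print("sample var of S_N: ", S.var(),  " vs (EX)^2*Var(N) + EN*Var(X) =", EX**2 * VarN + EN * VarX)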
Example 18.3. Roll a fair 10-sided die. If we roll a 1, 2, . . . , 7, we win that many dollars and start
over (roll again). If we roll 8, 9, 10, we win that many dollars and the game stops. What is the
expectation and variance of the number of rolls and the amount of money won?
Consider a coin with P{Heads} = 3/10. Each time it lands Tails, we get a random number
Ui ∼ d-Unif{1, . . . , 7} of dollars. If it lands on Heads, we get V ∼ d-Unif{8, 9, 10} and quit. Let N
be the number of tails flipped before the first head appears. Then N ∼ Geometric(3/10). The number
of dollars won is M := V + SN , where SN = U1 + · · · + UN , N ∼ Geo(3/10) and P{Ui = j} = 1/7,
j = 1, . . . , 7. Hence,
EN = q/p = 7/3, EU1 = 4, EV = 9, and

    Var(N) = q/p² = 70/9,    Var(U1) = (7² − 1)/12 = 4,    Var(V) = (3² − 1)/12 = 2/3.
It follows that

    EM = E(SN + V) = EN EU1 + EV = (7/3) × 4 + 9 = 55/3;

    Var(M) = Var(SN) + Var(V)                        (since SN and V are independent)
           = (EU1)² Var(N) + Var(U1) EN + Var(V)
           = 4² × 70/9 + 4 × 7/3 + 2/3
           = 1210/9 = 134 4/9.
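The exact values are easy to corroborate by simulation; the sketch below (not part of the notes) plays the game many times and compares the sample mean and variance of the winnings with 55/3 ≈ 18.33 and 1210/9 ≈ 134.44:

    import numpy as np

    rng = np.random.default_rng(2)
    trials = 100_000

    winnings = np.empty(trials)
    for t in range(trials):
        total = 0
        while True:
            roll = rng.integers(1, 11)   # fair 10-sided die: faces 1..10
            total += roll                # we win the face value either way
            if roll >= 8:                # 8, 9, or 10 ends the game
                break
        winnings[t] = total

    print("sample mean:", winnings.mean(), " vs 55/3   =", 55 / 3)
    print("sample var: ", winnings.var(),  " vs 1210/9 =", 1210 / 9)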
18.3. Best prediction. Recall that the constant c that minimizes E(Y − c)² is c = EY, provided EY² < ∞.
Now, suppose X, Y are random variables with joint probability mass function fX,Y. Which function g(X) best predicts Y, in the sense of minimizing

    MSE = E(Y − g(X))² ?
We have

    MSE = E E((Y − g(X))² | X)
        = Σ_x E((Y − g(X))² | X = x) fX(x)
        = Σ_x E((Y − g(x))² | X = x) fX(x).
For each x, the corresponding summand is minimized by g(x) := E(Y | X = x); hence, the best predictor is g(X) = E(Y | X), which has MSE

    E(Y − E(Y | X))² = E[E((Y − E(Y | X))² | X)] = E Var(Y | X) = Var(Y) − Var(E(Y | X)).
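Outside the jointly Normal setting the best predictor need not be linear. The sketch below (an arbitrary illustration, not from the notes) uses Y = X² + noise, so E(Y | X) = X², and compares its MSE with that of the best straight line a + bX, the topic of the next subsection:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000

    x = rng.standard_normal(n)
    y = x**2 + 0.5 * rng.standard_normal(n)    # illustrative model: E(Y | X) = X^2

    mse_cond = np.mean((y - x**2) ** 2)        # MSE of the best predictor E(Y | X)

    b, a = np.polyfit(x, y, 1)                 # least-squares line a + b*x
    mse_lin = np.mean((y - (a + b * x)) ** 2)  # MSE of the best linear predictor

    print("MSE of E(Y|X):         ", mse_cond)   # ~ 0.25 (the noise variance)
    print("MSE of best linear fit:", mse_lin)    # ~ 2.25, strictly larger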
18.3.1. Best linear prediction. A predictor g(X) of the form a + bX is a linear predictor. Suppose X, Y are mean 0, variance 1 and have Cov(X, Y) = ρ. What pair (a, b) minimizes E(Y − (a + bX))²? To answer this, notice that

    Cov(Y − ρX, X) = Cov(Y, X) − Cov(ρX, X) = 0.

Therefore, for R = Y − (a + bX) = −a + (Y − ρX) + (ρ − b)X, we have

    ER = −a,
    Var(R) = Var(Y − ρX) + (ρ − b)² Var(X),
    MSE = ER² = a² + (ρ − b)² + Var(Y − ρX).
Take a = 0 and b = ρ. Then MSE(ρX) = Var(Y − ρX) = 1 − ρ². As an immediate corollary, we observe that |ρ| ≤ 1. In general, for arbitrary means µX, µY, variances σX², σY², and covariance σXY, the best linear predictor of Y given X is

    µY + ρ (σY/σX)(X − µX),

where ρ = σXY/(σX σY), with MSE = σY²(1 − ρ²).
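Numerically, the general formula can be checked against an ordinary least-squares fit; in the sketch below the means, standard deviations, and correlation are arbitrary illustrative values (not from the notes):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200_000

    mu_x, mu_y = 2.0, -1.0       # illustrative parameters
    sd_x, sd_y = 1.5, 3.0
    rho = 0.7

    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x = mu_x + sd_x * z1
    y = mu_y + sd_y * (rho * z1 + np.sqrt(1 - rho**2) * z2)   # Corr(X, Y) = rho

    pred = mu_y + rho * (sd_y / sd_x) * (x - mu_x)            # best linear predictor
    print("MSE of formula predictor:", np.mean((y - pred) ** 2),
          " vs sigma_Y^2 (1 - rho^2) =", sd_y**2 * (1 - rho**2))

    b, a = np.polyfit(x, y, 1)                                # empirical least squares
    print("slope:", b, " vs rho*sd_y/sd_x =", rho * sd_y / sd_x)
    print("intercept:", a, " vs mu_y - rho*(sd_y/sd_x)*mu_x =",
          mu_y - rho * (sd_y / sd_x) * mu_x)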