18:10 10/16/2000
TOPIC. Expectations. This section deals with the notion of the
expected value of a random variable. We start with some definitions
and examples, then give some ways of thinking about expected values,
and then present some properties of expectation along with examples.
Definitions. Let Ω be a sample space, P a probability measure on Ω,
X a real-valued random variable on Ω with distribution function F .
This situation is illustrated below:
[Figure: the random variable X maps each sample point ω of the sample space Ω, which carries the probability P, to the point X(ω) on the real line R.]
We are going to define E(X), the expected value of X(ω) when
the sample point ω is chosen at random from Ω according to P. An
alternative notation for E(X) is $\int_{\omega\in\Omega} X(\omega)\,P(d\omega)$, or simply $\int X\,dP$.
Consider first the case where X is nonnegative: X(ω) ≥ 0 for
all ω ∈ Ω. If X is discrete, taking finitely or countably many values
$x_1, x_2, \ldots$ with corresponding probabilities $f(x_1), f(x_2), \ldots$ (here f
denotes the probability mass function of X), one takes
$$E(X) := \sum_k x_k f(x_k). \tag{1}$$
If X is continuous with density f, one takes
$$E(X) := \int_0^\infty x f(x)\,dx. \tag{2}$$
Formulas (1) and (2) are each special cases of the general definition,
stated in terms of the distribution function F(x) = P[ω ∈ Ω : X(ω) ≤ x],
$$E(X) := \int_0^\infty x\,dF(x) := \lim_{n\to\infty} \Bigl( \int_0^n x\,dF(x) \Bigr) \tag{3}$$
where the integral is taken to be a Riemann-Stieltjes integral. For
this course you don’t need to know much about Riemann-Stieltjes
integration; you can just think of the RHS of (3) as a generic way
of writing the RHSs of (1) and (2). We don’t require the sum and
integrals in (1)–(3) to converge to a finite value; E(X) = ∞ is allowed,
and happens (see Example 1 (b) below).
Example 1. (a) Suppose Z is a standard normal random variable,
and consider
$$Z^+ := \max(Z, 0) = \begin{cases} Z, & \text{if } Z \ge 0, \\ 0, & \text{if } Z < 0. \end{cases}$$
$Z^+$ is a nonnegative random variable. Its distribution has a lump of
mass of size 1/2 at 0 and density $\varphi(z) = e^{-z^2/2}/\sqrt{2\pi}$ over the interval
(0, ∞). Hence
$$E(Z^+) = 0 \times P[Z^+ = 0] + \int_0^\infty z\varphi(z)\,dz = 0 \times \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_0^\infty z e^{-z^2/2}\,dz = \frac{1}{\sqrt{2\pi}} \int_0^\infty e^{-y}\,dy = \frac{1}{\sqrt{2\pi}}. \tag{4}$$
In particular $E(Z^+)$ is finite.
(b) Suppose C is a standard Cauchy random variable, with density
$f(x) = 1/(\pi(1 + x^2))$ on R. Then $C^+ = \max(C, 0)$ is a nonnegative
random variable with expectation
$$E(C^+) = 0 \times \frac{1}{2} + \int_0^\infty \frac{x}{\pi(1 + x^2)}\,dx = \frac{1}{2\pi} \int_0^\infty \frac{1}{1 + y}\,dy = \frac{1}{2\pi} \log(1 + y)\Big|_0^\infty = \infty. \tag{5}$$
Note that $E(C^+)$ is infinite.
•
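The contrast between (4) and (5) can be seen numerically by truncating the integrals at a finite upper limit n, as in the limit in (3). The sketch below is only an illustration: the helper names and the midpoint rule are choices made here, not part of the notes. The normal case uses numerical integration; the Cauchy case uses the closed form log(1 + n²)/(2π) obtained from the substitution y = x² in (5).

```python
import math

def midpoint_integral(g, a, b, steps=100_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def normal_density(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def cauchy_density(x):
    return 1 / (math.pi * (1 + x * x))

def truncated_mean_normal_plus(n):
    # Integral of z * phi(z) over (0, n); the atom at 0 contributes 0 * 1/2 = 0.
    return midpoint_integral(lambda z: z * normal_density(z), 0.0, n)

def truncated_mean_cauchy_plus(n):
    # Closed form of the integral of x / (pi (1 + x^2)) over (0, n).
    return math.log(1 + n * n) / (2 * math.pi)

print("limit in (4):", 1 / math.sqrt(2 * math.pi))
for n in (5, 10, 20):
    print("normal, n =", n, truncated_mean_normal_plus(n))
for n in (10, 10**3, 10**6, 10**9):
    print("Cauchy, n =", n, truncated_mean_cauchy_plus(n))
```

The normal values settle at 1/√(2π), while the Cauchy values keep growing like log(n)/π, which is the divergence recorded in (5).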
Now consider the case where X can take both positive and negative values. Define random variables $X^+$ and $X^-$ on Ω by setting
$$X^+(\omega) = \max(X(\omega), 0) = \begin{cases} X(\omega), & \text{if } X(\omega) \ge 0, \\ 0, & \text{otherwise,} \end{cases} \tag{6$^+$}$$
$$X^-(\omega) = \max(-X(\omega), 0) = \begin{cases} -X(\omega), & \text{if } X(\omega) \le 0, \\ 0, & \text{otherwise,} \end{cases} \tag{6$^-$}$$
for each ω ∈ Ω, as illustrated below:
[Figure: graphs over Ω of X, of its positive part $X^+$, and of its negative part $X^-$.]
$X^+$ is called the positive part of X, and $X^-$ the negative part.
Note that $X^+$ and $X^-$ are nonnegative random variables and that
$$X(\omega) = X^+(\omega) - X^-(\omega) \quad\text{and}\quad |X(\omega)| = X^+(\omega) + X^-(\omega)$$
for all ω ∈ Ω; these identities are written more concisely as $X = X^+ - X^-$ and $|X| = X^+ + X^-$.
One says that X has an expectation, or that E(X) exists,
or that X is quasi-integrable if at least one of $E(X^+)$ and $E(X^-)$
is finite; in that case the expected value, or mean, of X is taken
to be
$$E(X) := E(X^+) - E(X^-) \tag{7}$$
with the convention that ∞ − x = ∞ and x − ∞ = −∞ for any
nonnegative real number x. One says that X is integrable, or that
X has a finite expectation, if both $E(X^+)$ and $E(X^-)$ are finite,
or, equivalently, if E(|X|) is finite. There are random variables X for
which $E(X^+) = \infty = E(X^-)$; for such X’s, E(X) is not defined.
Example 2. (a) Suppose Z is a standard normal random variable.
By Example 1 (a), we have $E(Z^+) = c := 1/\sqrt{2\pi} < \infty$. Since $Z^-$ and
$Z^+$ have the same distribution, we also have $E(Z^-) = c$. Since both
$E(Z^+)$ and $E(Z^-)$ are finite, Z is integrable; its (finite) expectation
is
$$E(Z) = E(Z^+) - E(Z^-) = c - c = 0.$$
(b) Suppose C is a standard Cauchy random variable. By Example 1 (b) and symmetry, we have $E(C^+) = \infty = E(C^-)$. Thus C
does not have an expectation, finite or otherwise.
(c) As in (b), suppose C is standard Cauchy. Put $X = C^+$. Then
$$X^+ = X = C^+ \implies E(X^+) = \infty \quad\text{and}\quad X^- = 0 \implies E(X^-) = 0.$$
Consequently X has an expectation, namely
$$E(X) = E(X^+) - E(X^-) = \infty - 0 = \infty.$$
In this case E(X) exists, but is infinite; this is an example of a random
variable that is quasi-integrable, but not integrable.
(d) Suppose X is a continuous random variable with density f on R
such that the integral $\int_{-\infty}^{\infty} x f(x)\,dx$ is absolutely convergent. Since
$$E(X^+) + E(X^-) = \int_0^\infty x f(x)\,dx + \int_{-\infty}^0 (-x) f(x)\,dx = \int_{-\infty}^\infty |x| f(x)\,dx < \infty,$$
X is integrable with finite expectation
$$E(X) = E(X^+) - E(X^-) = \int_0^\infty x f(x)\,dx - \int_{-\infty}^0 (-x) f(x)\,dx = \int_{-\infty}^\infty x f(x)\,dx. \tag{8}$$
When it applies, (8) can be used to compute E(X) directly, without
first computing $E(X^+)$ and $E(X^-)$.
•
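The bookkeeping in (7), including the conventions for infinite values, is easy to mirror in code. The following is a minimal sketch with hypothetical function names; it takes E(X⁺) and E(X⁻) as already-computed inputs (the values below come from Examples 1 and 2) and does not compute them.

```python
import math

def expectation_from_parts(e_plus, e_minus):
    """Combine E(X+) and E(X-) into E(X) as in (7).

    Returns None when both parts are infinite, i.e. when E(X) is not defined.
    """
    if math.isinf(e_plus) and math.isinf(e_minus):
        return None
    # Floating-point arithmetic already follows the stated convention:
    # inf - x = inf and x - inf = -inf for finite x.
    return e_plus - e_minus

def classify(e_plus, e_minus):
    """Name the case that the pair (E(X+), E(X-)) falls into."""
    if math.isinf(e_plus) and math.isinf(e_minus):
        return "no expectation"
    if math.isinf(e_plus) or math.isinf(e_minus):
        return "quasi-integrable, not integrable"
    return "integrable"

c = 1 / math.sqrt(2 * math.pi)
cases = {
    "Z (standard normal)": (c, c),
    "C (standard Cauchy)": (math.inf, math.inf),
    "C+ (positive part of C)": (math.inf, 0.0),
}
for name, (e_plus, e_minus) in cases.items():
    print(name, "|", classify(e_plus, e_minus), "|", expectation_from_parts(e_plus, e_minus))
```

Returning None for the undefined case, rather than letting `inf - inf` produce a NaN, keeps "E(X) does not exist" distinct from any numerical value.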
The strong law of large numbers (SLLN). Why is E(X) important? One of the main reasons is:
Theorem 1 (The SLLN). Suppose $X_1, X_2, \ldots$ is an infinite sequence of independent random variables, each distributed like a random variable X. Put
$$S_n = X_1 + X_2 + \cdots + X_n$$
for each n ∈ N. If X has an expectation E(X) = µ (possibly ±∞)
then
$$P[\,S_n/n \text{ converges to } \mu \text{ as } n \to \infty\,] = 1. \tag{9$_1$}$$
On the other hand, if X does not have an expectation, then
$$P[\,\limsup_n |S_n/n| = \infty\,] = 1; \tag{9$_2$}$$
if in addition X is symmetric, then
$$P[\,\liminf_n S_n/n = -\infty\,] = 1 = P[\,\limsup_n S_n/n = \infty\,]. \tag{9$_3$}$$
In other words, if the “population mean” µ exists, then the sample means $\bar X_n = S_n/n$ will converge to it almost surely as the sample
size n tends to infinity; but if µ does not exist, the sample means $\bar X_n$
will behave very badly as n → ∞, as illustrated in Figure 1 below.
The proof of the SLLN is not easy; we won’t go into it here (but see
Exercises 10 and 11 for some special cases).
Figure 1: A graph of $S_n/n$ versus n for a random sample
of size 12000 from the standard Cauchy distribution.
[Plot: the sample mean $\bar X_n$ (vertical axis, from −4 to 0) against n (horizontal axis, from 0 to 12000), annotated with the recursion $\frac{S_n}{n} = \frac{n-1}{n}\,\frac{S_{n-1}}{n-1} + \frac{X_n}{n}$.]
E(X) as a measure of location. The expected value of X is
often used as a measure of the location of the distribution of X. To
understand why, consider the case where X takes finitely many values
$x_1, x_2, \ldots, x_k$ with corresponding probabilities $p_1, p_2, \ldots, p_k$. We
can represent the distribution of X by a physical system in which
masses of weight $p_i$ are placed above the points $x_i$ for i = 1, . . . , k on
a dimensionless rod, as illustrated below:
[Figure: point masses $p_i$ sitting above the points $x_1, \ldots, x_k$ on a dimensionless rod, with the pivot point c marked.]
Consider the center of gravity of this mass system, i.e., the point
c at which the rod would balance if it were pivoted there. According
to physics, c must satisfy the so-called balancing equation
$$\sum_{i=1}^k p_i (x_i - c) = 0.$$
Since
$$\sum_{i=1}^k p_i x_i = E(X) \quad\text{and}\quad \sum_{i=1}^k p_i = 1,$$
the solution to the balancing equation is
$$c = \frac{\sum_{i=1}^k p_i x_i}{\sum_{i=1}^k p_i} = E(X). \tag{10}$$
In general, for any integrable random variable X, the center of gravity
of the distribution of X is c = E(X). There is an important corollary:
moving a little bit of probability mass a long way from its initial
position has a big effect on the expected value of X.
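Both the dichotomy in Theorem 1 and the balance-point corollary can be explored with a few lines of simulation. The sketch below rests on assumptions made here, not in the notes: the function names are hypothetical, the integrable case uses a uniform distribution on (−1, 1), Cauchy draws are generated as tan(π(U − 1/2)) for uniform U, and the seed is arbitrary.

```python
import math
import random

def running_means(draw, n, seed=0):
    """Return [S_1/1, S_2/2, ..., S_n/n] for i.i.d. draws produced by draw(rng)."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for k in range(1, n + 1):
        total += draw(rng)
        means.append(total / k)
    return means

def draw_uniform(rng):
    # Uniform on (-1, 1): integrable, with mean 0.
    return rng.uniform(-1.0, 1.0)

def draw_cauchy(rng):
    # Standard Cauchy via its quantile function tan(pi (u - 1/2)).
    return math.tan(math.pi * (rng.random() - 0.5))

def balance_point(xs, ps):
    """Solve the balancing equation sum_i p_i (x_i - c) = 0 for c."""
    return sum(p * x for x, p in zip(xs, ps)) / sum(ps)

n = 12_000
for name, draw in (("uniform", draw_uniform), ("Cauchy", draw_cauchy)):
    means = running_means(draw, n)
    print(name, [round(means[k - 1], 3) for k in (100, 1_000, n)])

ps = [0.49, 0.50, 0.01]
print(balance_point([0.0, 1.0, 2.0], ps))      # all masses close together
print(balance_point([0.0, 1.0, 1000.0], ps))   # the 1% mass moved far away
```

In the second balance-point call only one percent of the mass is moved, yet the center of gravity shifts from about 0.52 to about 10.5. The same sensitivity to rare large values is what keeps the Cauchy sample means from settling down.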
The expected value of a transformation of X. Suppose Y =
t(X) is a transformation t of X. The expected value of Y can be
expressed directly in terms of the distribution of X. To see how,
consider the case where X is continuous with density $f_X$ on (−∞, ∞)
and the transformation t is regular from (−∞, ∞) to (0, ∞). Since Y
has density
$$f_Y(y) = f_X\bigl(u(y)\bigr)\,|u'(y)|$$
where $u = t^{-1}$ is the inverse of t (see 3.19), Y has expectation
$$E(Y) = \int_{y>0} y f_Y(y)\,dy = \int_{y>0} t\bigl(u(y)\bigr) f_X\bigl(u(y)\bigr)\,|u'(y)|\,dy = \int_{-\infty<x<\infty} t(x) f_X(x)\,dx \tag{11}$$
by (3.18), using the substitution x = u(y). The point is that you can find E(Y) from (11) without
having to first work out the distribution of Y. This important fact is
true in general.
Theorem 2. Let X be an arbitrary (not necessarily continuous) random variable and let Y = t(X) for an arbitrary (not necessarily
regular) transformation t. Put
$$\tau_+ = \int t^+(x)\,dF_X(x) \quad\text{and}\quad \tau_- = \int t^-(x)\,dF_X(x).$$
Then Y has an expectation if and only if at least one of $\tau_+$ and $\tau_-$ is
finite, in which case
$$E(Y) = \tau_+ - \tau_- = \int t(x)\,dF_X(x). \tag{12}$$
With an appropriate definition of the integral, this formula is
valid even if X is a (multi-dimensional) random vector. These results
are proved in Stat 381.
Expressing E(X) in terms of Q and F. Let X be a random variable with quantile function Q and distribution function F. E(X) can
be expressed directly in terms of Q, and also directly in terms of F.
To see how, let U be a standard uniform random variable, with density $f_U(u) = I_{(0,1)}(u)$. Since X and Q(U) have the same distribution
by the IPT Theorem (Theorem 1.5), so do $X^+$ and $Q^+(U)$, whence, by (12),
$$E(X^+) = E\bigl(Q^+(U)\bigr) = \int_0^1 Q^+(u)\,du.$$
The integral here is the area |A| of the region A = {(x, u) : 0 < x <
Q(u)} indicated below:
[Figure: the graph of F(x) against x, with the region A = {(x, u) : 0 < x < Q(u)} above the graph to the right of the vertical axis and the region B = {(x, u) : Q(u) < x < 0} below the graph to the left of it. The infinitesimal horizontal strip of A between heights u and u + du has area $Q^+(u)\,du$.]
By slicing A into infinitesimal vertical strips instead of horizontal ones,
we can also compute its area as
$$|A| = \int_0^\infty \bigl(1 - F(x)\bigr)\,dx = \int_0^\infty P[X > x]\,dx = \int_0^\infty P[X \ge x]\,dx.$$
Similarly
$$E(X^-) = E\bigl(Q^-(U)\bigr) = \int_0^1 Q^-(u)\,du = |B| = \int_{-\infty}^0 F(x)\,dx,$$
where B = {(x, u) : Q(u) < x < 0}. This proves:
Theorem 3. Let X be a random variable with df F and quantile
function Q, and let A and B be defined as above. X is quasi-integrable
if and only if at least one of |A| and |B| is finite, and then
$$E(X) = |A| - |B| = \int_0^1 Q(u)\,du = \int_0^\infty \bigl[-F(-x) + \bigl(1 - F(x)\bigr)\bigr]\,dx. \tag{13}$$
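Formula (13) can be checked numerically on a distribution that takes both signs, so that both areas |A| and |B| are nonzero. The sketch below uses a hypothetical example chosen here, X uniform on (−1, 2), for which F(x) = (x + 1)/3 on that interval, Q(u) = 3u − 1, and E(X) = 1/2; the midpoint-rule helper is likewise just a convenience.

```python
def midpoint_integral(g, a, b, steps=100_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# Hypothetical example: X uniform on (-1, 2), so E(X) = 1/2.
def F(x):
    """Distribution function of X."""
    return min(max((x + 1.0) / 3.0, 0.0), 1.0)

def Q(u):
    """Quantile function of X, for 0 < u < 1."""
    return 3.0 * u - 1.0

area_A = midpoint_integral(lambda x: 1.0 - F(x), 0.0, 2.0)  # |A| = E(X+)
area_B = midpoint_integral(F, -1.0, 0.0)                    # |B| = E(X-)
via_Q = midpoint_integral(Q, 0.0, 1.0)
# The integrand in (13) vanishes for x > 2, so integrating over (0, 2) suffices.
via_F = midpoint_integral(lambda x: -F(-x) + (1.0 - F(x)), 0.0, 2.0)

print(area_A, area_B)
print(area_A - area_B, via_Q, via_F)
```

Here |A| = 2/3 and |B| = 1/6, and all three expressions for E(X) agree with 1/2 up to rounding.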
If X is quasi-integrable, then $E(X) = \int_0^\infty \bigl[-F(-x) + \bigl(1 - F(x)\bigr)\bigr]\,dx$.
Example 3. (a) For any random variable X,
$$E(|X|) = \int_0^\infty P[\,|X| \ge x\,]\,dx. \tag{14}$$
Consequently X is integrable if and only if this integral is convergent.
For a standard Cauchy random variable C, P[ |C| ≥ x ] ∼ 2/(πx) as
x → ∞, so the integral diverges; this is another way to see that C is
not integrable. Note that if X is integer valued, then
$$E(|X|) = \sum_{n=1}^\infty P[\,|X| \ge n\,]. \tag{15}$$
(b) Let X be a standard exponential random variable, with density
$f(x) = e^{-x} I_{(0,\infty)}(x)$ on R. Note that X takes on only nonnegative
values. We have
$$F(x) = \int_0^x e^{-\xi}\,d\xi = 1 - e^{-x}$$
for x ≥ 0, and
$$Q(u) = F^{-1}(u) = -\log(1 - u)$$
for 0 < u < 1. By calculus
$$\int_0^\infty x f(x)\,dx = \int_0^\infty x e^{-x}\,dx = \Gamma(2) = 1,$$
$$\int_0^\infty \bigl(1 - F(x)\bigr)\,dx = \int_0^\infty e^{-x}\,dx = \Gamma(1) = 1,$$
$$\int_0^1 Q(u)\,du = \int_0^1 -\log(1 - u)\,du = \int_0^1 -\log(v)\,dv = 1.$$
Of course, all three integrals had to be the same, since they each give
the value of E(X).
•
Properties of E. We state without proof some basic properties of
the expectation operator E. These properties are proved (perhaps
under some further integrability assumptions) in elementary texts in
the discrete and continuous case; they are proved in general in Stat
381.
Theorem 4. Expectation has the following properties.
E+ : If two random variables X and Y each have finite expectations,
then so does X + Y, and
$$E(X + Y) = E(X) + E(Y). \tag{16}$$
More generally, if E(X) and E(Y) exist (possibly as ±∞) and if the
sum E(X) + E(Y) is defined (i.e., is not of the form +∞ − ∞ or
−∞ + ∞), then E(X + Y) exists and is given by (16).
Ec : If X has an expectation and c is a finite real number, then cX
has an expectation, given by
$$E(cX) = cE(X). \tag{17}$$
E≤ : Suppose X and Y are two random variables such that X ≤ Y
(i.e., X(ω) ≤ Y(ω) for all sample points ω). Then
$$E(X) \le E(Y) \tag{18}$$
provided both expectations exist. If in addition the expectations are
equal and finite, then P[ X = Y ] = 1.
EI : Suppose $X_1, X_2, \ldots, X_n$ are independent random variables. Then
the product $X_1 X_2 \cdots X_n$ has an expectation provided: (a) all the $X_k$’s
are nonnegative, or (b) all the $X_k$’s are integrable. In both of these
cases,
$$E(X_1 X_2 \cdots X_n) = E(X_1) E(X_2) \cdots E(X_n). \tag{19}$$
In case (a) the product on the right-hand side is to be evaluated using
the rule ∞ × c = c × ∞ equals ∞ if 0 < c ≤ ∞, and equals 0 if c = 0.
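The properties in Theorem 4 can be verified exactly on a small finite sample space. The sketch below uses a hypothetical setting chosen here: X and Y are the faces of two independent fair dice, and exact rational arithmetic avoids rounding questions. It is a check on one example, not a proof.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)  # two independent fair dice: each pair (x, y) has probability 1/36

def E(g):
    """Expected value of g(X, Y), computed by summing over all 36 sample points."""
    return sum(p * g(x, y) for x, y in product(faces, faces))

EX = E(lambda x, y: x)
EY = E(lambda x, y: y)

print(E(lambda x, y: x + y) == EX + EY)   # (16): additivity
print(E(lambda x, y: 3 * x) == 3 * EX)    # (17): constants factor out
print(E(lambda x, y: min(x, y)) <= EX)    # (18): min(X, Y) <= X pointwise
print(E(lambda x, y: x * y) == EX * EY)   # (19): X and Y are independent
print(E(lambda x, y: x * x) == EX * EX)   # X is not independent of itself
```

The first four comparisons hold and the last one fails (E(X²) = 91/6 while E(X)² = 49/4), which shows that the independence assumption in EI cannot simply be dropped.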
Example 4. (a) Let X ∼ Gamma(r, λ), with density $\lambda^r x^{r-1} e^{-\lambda x}/\Gamma(r)$
for x > 0. Then Y = λX ∼ Gamma(r, 1), so E(X) = E(Y)/λ. Moreover
$$E(Y) = \frac{1}{\Gamma(r)} \int_0^\infty y\,y^{r-1} e^{-y}\,dy = \frac{\Gamma(r+1)}{\Gamma(r)} \Bigl( \frac{1}{\Gamma(r+1)} \int_0^\infty y^{(r+1)-1} e^{-y}\,dy \Bigr) = \frac{\Gamma(r+1)}{\Gamma(r)} = r$$
(see Exercise 5 for the last step). Hence
$$E(X) = r/\lambda. \tag{20}$$
(b) Suppose again that X ∼ Gamma(r, λ) and Y = λX. Then
$$E(1/Y) = \int_0^\infty \frac{1}{y} f_Y(y)\,dy = \frac{1}{\Gamma(r)} \int_0^\infty y^{(r-1)-1} e^{-y}\,dy = \begin{cases} \Gamma(r-1)/\Gamma(r) = 1/(r-1), & \text{if } r > 1, \\ \infty, & \text{if } r \le 1, \end{cases}$$
and
$$E(1/X) = E(\lambda/Y) = \begin{cases} \lambda/(r-1), & \text{if } r > 1, \\ \infty, & \text{otherwise.} \end{cases} \tag{21}$$
(c) Similar calculations (do them!) show that for X ∼ Beta(α, β),
with density $x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha, \beta)$ for 0 < x < 1, one has
$$E(X) = \frac{\alpha}{\alpha + \beta}. \tag{22}$$
(d) Suppose $X \sim \chi^2_n$ = Gamma(r, λ) for r = n/2 and λ = 1/2. Then
$$E(X) = \frac{r}{\lambda} = \frac{n/2}{1/2} = n \tag{23}$$
$$E(1/X) = \begin{cases} \lambda/(r-1) = 1/(n-2), & \text{if } n > 2, \\ \infty, & \text{otherwise.} \end{cases} \tag{24}$$
(e) Suppose X ∼ UF(m, n). Thus $X = SS_1/SS_2$ where $SS_1 \sim \chi^2_m$
and $SS_2 \sim \chi^2_n$, and $SS_1$ is independent of $SS_2$. Recall (19): $E(Y_1 Y_2) = E(Y_1)E(Y_2)$ if $Y_1 \ge 0$ and $Y_2 \ge 0$ are independent. Since each $SS_i$ is
nonnegative, (19) gives
$$E(X) = E(SS_1)\,E(1/SS_2) = \begin{cases} m/(n-2), & \text{if } n > 2, \\ \infty, & \text{otherwise.} \end{cases} \tag{25}$$
(f) Suppose X ∼ F(m, n). Then $X = (SS_1/m)/(SS_2/n) = (n/m)Y$
where Y ∼ UF(m, n). Hence
$$E(X) = \frac{n}{m} E(Y) = \begin{cases} n/(n-2), & \text{if } n > 2, \\ \infty, & \text{otherwise.} \end{cases} \tag{26}$$
Example 5. Consider the following game. I am going to pick a
number x at random from the F distribution with m = 3 and n = 4
degrees of freedom. Before I make my draw, you have to guess what
my x will be; call your guess c. Then I’ll make the draw, and you’ll
pay me
$$(x - c)^2 - w$$
cents (or dollars!), where w is my wager, say 10 units. For example,
if you guess my x exactly, I’ll pay you 10 units. But if your guess is
off by 2, I’ll only pay you 10 − 4 = 6 units, whereas if your guess is
off by 4, you’ll pay me 16 − 10 = 6 units. Any takers?
Classroom demonstration here
Some questions: (a) What is the best choice for your guess c? (b) Is it
fair for me to wager w = 10 units? These questions will be answered
in the next lecture.
•
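The chain of formulas in Example 4, from (20) and (21) through (23) to (26), translates directly into code. The sketch below uses hypothetical function names; it encodes the closed forms rather than integrating anything, and the only independent check is the Gamma-function ratio from part (a), evaluated with `math.gamma`.

```python
import math

def gamma_mean(r, lam):
    """E(X) for X ~ Gamma(r, lam), formula (20)."""
    return r / lam

def gamma_inverse_mean(r, lam):
    """E(1/X) for X ~ Gamma(r, lam), formula (21)."""
    return lam / (r - 1) if r > 1 else math.inf

def f_mean(m, n):
    """E(X) for X ~ F(m, n), assembled as in parts (d) to (f)."""
    e_ss1 = gamma_mean(m / 2, 0.5)               # (23): E(SS1) = m
    e_inv_ss2 = gamma_inverse_mean(n / 2, 0.5)   # (24): 1/(n - 2), or infinity
    return (n / m) * e_ss1 * e_inv_ss2           # (19), (25), then (26)

# Cross-check of part (a): Gamma(r + 1) / Gamma(r) should equal r.
r = 2.5
print(math.gamma(r + 1) / math.gamma(r), r)

for m, n in ((3, 4), (3, 10), (3, 2)):
    print((m, n), f_mean(m, n))
```

For the F distribution with m = 3 and n = 4 degrees of freedom used in Example 5, the function returns n/(n − 2) = 2, and for n ≤ 2 it returns infinity, matching (26).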
We close this section with a couple of simple but useful inequalities.
Theorem 5 (Markov’s inequality). Let X be a nonnegative random variable. One has
$$P[\,X \ge c\,] \le \frac{E(X)}{c} \tag{27}$$
for each number c > 0. Moreover for any given c, equality holds in
(27) if and only if P[ X = 0 or X = c ] = 1.
Proof. Let Ω be the sample space on which X is defined. Let V be
the random variable on Ω defined by
$$V(\omega) = \begin{cases} c, & \text{if } X(\omega) \ge c, \\ 0, & \text{otherwise.} \end{cases}$$
[Figure: graphs of X and V over Ω; V equals c where X is at least c and 0 elsewhere, so V(ω) lies at or below X(ω) at each sample point ω.]
Since
$$V(\omega) \le X(\omega) \tag{28}$$
for all ω, (18) implies that
$$E(V) \le E(X); \tag{29}$$
(27) follows since
$$E(V) = 0 \times P[\,V = 0\,] + c \times P[\,V = c\,] = c \times P[\,X \ge c\,].$$
If equality holds in (27), then it also holds in (29). By the addendum
to (18), equality must hold in (28) for almost all sample points ω,
and hence X can take only the values 0 and c, with probability one.
Conversely, if X takes just those values, equality does hold in (27).
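The construction in the proof can be followed step by step on a finite distribution. The sketch below uses hypothetical helper names and two example distributions chosen here: a fair die, where the bound is strict, and a distribution concentrated on {0, c}, where it is attained.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E(g(X)) for a discrete X given as a dict {value: probability}."""
    return sum(p * g(x) for x, p in pmf.items())

def markov_sides(pmf, c):
    """Return (P[X >= c], E(X)/c) for a nonnegative discrete X."""
    tail = sum(p for x, p in pmf.items() if x >= c)
    return tail, expectation(pmf) / c

def mean_of_V(pmf, c):
    """E(V) for the variable V of the proof: V = c if X >= c, and 0 otherwise."""
    return expectation(pmf, lambda x: c if x >= c else 0)

c = 4
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
two_point = {0: Fraction(3, 4), c: Fraction(1, 4)}  # all mass at 0 and at c

for name, pmf in (("fair die", fair_die), ("two-point", two_point)):
    tail, bound = markov_sides(pmf, c)
    print(name, tail, bound, mean_of_V(pmf, c) == c * tail)
```

For the fair die the two sides are 1/2 and 7/8; for the two-point distribution both equal 1/4, in line with the equality condition of the theorem.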
Theorem 6 (Chebychev’s inequality). Let X be an integrable
random variable with mean µ. One has
$$P[\,|X - \mu| \ge c\,] \le \frac{E\bigl((X - \mu)^2\bigr)}{c^2} \tag{30}$$
for each number c > 0. Moreover for any given c, equality holds in (30)
if and only if X takes the values µ − c, µ, and µ + c with probabilities
(1 − p)/2, p, and (1 − p)/2 respectively, for some p ∈ [0, 1].
Chebychev’s inequality follows easily from Markov’s inequality;
the proof is left to you as Exercise 7.
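The equality case in Theorem 6 can be checked on a concrete three-point distribution. The sketch below, with hypothetical names and with µ = 0, c = 2, p = 1/2 chosen only for illustration, computes both sides of (30) exactly; it checks the statement on examples and does not replace the proof asked for in Exercise 7.

```python
from fractions import Fraction

def chebyshev_sides(pmf, c):
    """Return (P[|X - mu| >= c], E((X - mu)^2) / c^2) for a discrete X."""
    mu = sum(p * x for x, p in pmf.items())
    second_moment = sum(p * (x - mu) ** 2 for x, p in pmf.items())
    tail = sum(p for x, p in pmf.items() if abs(x - mu) >= c)
    return tail, second_moment / c ** 2

mu, c, p = 0, 2, Fraction(1, 2)
three_point = {mu - c: (1 - p) / 2, mu: p, mu + c: (1 - p) / 2}
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

print(chebyshev_sides(three_point, c))  # the two sides coincide
print(chebyshev_sides(fair_die, c))     # the bound is strict
```

For the fair die the tail probability is 1/3 against a bound of 35/48, whereas the three-point distribution attains the bound with both sides equal to 1/2.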
Exercise 1. Let Z be a standard normal random variable. Show
that for positive integers k
$$E(Z^k) = \begin{cases} \prod_{j=1}^{k/2} (2j - 1), & \text{if } k \text{ is even}, \\ 0, & \text{if } k \text{ is odd}. \end{cases} \tag{31}$$
⋄
Exercise 2. Let Y and Z be independent standard normal random
variables. For positive integers n, put
$$X_n := Y\,(1 + Z/\sqrt{n}\,). \tag{32}$$
For k = 1, 2, . . . find a simple computable expression for $E(X_n^k)$ and
show that $E(X_n^k) \to E(Y^k)$ as n → ∞. Evaluate $E(X_n^k)$ for k = 1,
. . . , 4.
⋄
Exercise 3. Let X be an integrable real random variable with distribution function F, quantile function Q, and mean µ = E(X). Let
$$\delta := E\bigl(|X - \mu|\bigr) = \int_{-\infty}^\infty |x - \mu|\,F(dx) \tag{33}$$
be the so-called mean (absolute) deviation (MAD) of X about
its mean. Show that
$$\delta = \int_0^1 |Q(u) - \mu|\,du = 2 \int_{-\infty}^\mu F(x)\,dx = 2 \int_\mu^\infty \bigl(1 - F(x)\bigr)\,dx. \tag{34}$$
⋄
Exercise 4. Show that a random variable X is integrable if
and only if $\sum_{n=1}^\infty P[\,|X| \ge n\,] < \infty$.
⋄
Exercise 5. Show that the Gamma function $\Gamma(r) := \int_0^\infty x^{r-1} e^{-x}\,dx$
satisfies the recursion formula
$$\Gamma(r + 1) = r\,\Gamma(r) \tag{35}$$
for r > 0. [Hint: integrate by parts.]
⋄
Exercise 6. Find E(X) for random variables X having the following
discrete distributions.
Distribution      P[X = k]
Binomial(n, p)    $\binom{n}{k} p^k (1 - p)^{n-k}$, k = 0, . . . , n
Poisson(µ)        $e^{-\mu} \mu^k / k!$, k = 0, 1, . . .
Geometric(p)      $q^{k-1} p$ with q = 1 − p, k = 1, 2, . . .
Exercise 7. Prove Theorem 6.
⋄
Exercise 8 (A weak law of large numbers). Let $X_1, X_2, \ldots$ be independent random variables, each distributed like a random variable
X with E(X) = 0 and $\sigma^2 := E(X^2) < \infty$. For each n ∈ N set
$S_n = X_1 + \cdots + X_n$. (a) Show that $E(S_n) = 0$ and $E(S_n^2) = n\sigma^2$.
(b) Show that for each ε > 0,
$$\lim_{n\to\infty} P[\,|S_n/n| \ge \varepsilon\,] = 0. \tag{36}$$
⋄
Exercise 9. Let $X_1, \ldots, X_n$ be independent random variables, each
distributed like a random variable X with E(X) = 0 and $E(X^4) < \infty$.
(a) Show that $X^2$ and $X^3$ are integrable. (b) Put $S_n = X_1 + \cdots + X_n$.
Show that
$$E(S_n^4) = n E(X^4) + 3n(n - 1)\bigl(E(X^2)\bigr)^2. \tag{37}$$
[Hint: Write $S_n^4$ as $(\sum_{i=1}^n X_i)(\sum_{j=1}^n X_j)(\sum_{k=1}^n X_k)(\sum_{\ell=1}^n X_\ell)$ and expand the sums.]
⋄
The following information is needed for the next two exercises. Let P
be a probability measure on a sample space Ω. Let $A_1, A_2, A_3, \ldots$ be
an infinite sequence of events (i.e., subsets of Ω), and let $\limsup_n A_n$
be the set of sample points ω ∈ Ω which belong to $A_n$ for infinitely
many n’s. According to the first Borel-Cantelli lemma,
$$P[\,\limsup_n A_n\,] = 0 \quad\text{provided}\quad \sum_{n=1}^\infty P[A_n] < \infty. \tag{38}$$
According to the second Borel-Cantelli lemma,
$$P[\,\limsup_n A_n\,] = 1 \quad\text{provided}\quad \Bigl[\, \sum_{n=1}^\infty P[A_n] = \infty \text{ and the } A_n\text{’s are independent} \,\Bigr]. \tag{39}$$
Exercise 10. Let $X_1, X_2, \ldots$ be an infinite sequence of independent
standard Cauchy random variables and let c be a positive number. Use
the second Borel-Cantelli lemma to show that for almost every sample
point ω, $X_n(\omega) \ge nc$ for infinitely many n’s, and also $X_n(\omega) \le -nc$
for infinitely many n’s. Use this fact to explain the behavior of $S_n/n$
exhibited in Figure 1.
⋄
Exercise 11 (A SLLN). Let $X_1, X_2, \ldots$ be independent random
variables, each distributed like a random variable X with E(X) = 0
and $E(X^4) < \infty$. Put $S_n = X_1 + \cdots + X_n$ for each n. Use Markov’s
inequality for $S_n^4$, Exercise 9, and the first Borel-Cantelli lemma to
show that
$$P[\,|S_n|/n \ge 1/n^{1/8} \text{ for infinitely many } n\,] = 0 \tag{40}$$
and conclude that the set of sample points ω such that $S_n(\omega)/n \to E(X)$
as n → ∞ has probability 1.
⋄