Expectation of Discrete Random Variables
Def: The expected value of a discrete random variable X is
defined to be
E[X] = \sum_{x_i : p(x_i) > 0} x_i \, p(x_i)
The symbol µ is often used to denote the expectation of a random variable.
Ex: Let X denote the number of tosses of a fair coin required to get “heads”, with possible values 1, 2, 3, .... The probability that it takes k tosses is 1/2^k (the probability of k - 1 tails followed by heads).
E[X] = \sum_{k=1}^{\infty} k \, 2^{-k}
     = \sum_{k=1}^{\infty} \sum_{i=1}^{k} 2^{-k}
     = \sum_{i=1}^{\infty} \sum_{k=i}^{\infty} 2^{-k}
     = \sum_{i=1}^{\infty} 2^{-(i-1)}
     = \frac{1}{1 - \frac{1}{2}} = 2
(Sum of a geometric series. The series is convergent, so we can interchange the order of summation.)
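As a sanity check (an illustrative sketch, not part of the original notes), the partial sums of \sum_k k \, 2^{-k} can be computed numerically; they approach 2:

# Sanity check (illustrative only): partial sums of sum_{k>=1} k * 2**(-k) approach 2.
total = 0.0
for k in range(1, 60):
    total += k * 2.0 ** -k
print(total)  # ~2.0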
Theorem: When X is discrete, and E[X] exists,
E[X] = \sum_{\omega \in \Omega} X(\omega) P(\omega)
Ex: Consider a sequence of 3 coin tosses in which the 8 possible outcomes are equally likely, and define X(ω) as the number of heads in the outcome ω. The distribution of X is:
x_i    p(x_i)    x_i p(x_i)
0      1/8       0
1      3/8       3/8
2      3/8       6/8
3      1/8       3/8
The expected value is the sum of the values in the last
column (3/2).
However, we can also get this from the distribution of the
elements in the original sample space
E[X] = \sum_{\omega \in \Omega} X(\omega) P(\omega) = (0 + 1 + 1 + 1 + 2 + 2 + 2 + 3) \cdot \frac{1}{8} = 3/2
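A small enumeration (an illustrative sketch, not part of the original notes) confirms that summing X(ω)P(ω) over the 8 equally likely outcomes gives the same 3/2 as the table:

from itertools import product

# Enumerate the 8 equally likely outcomes of 3 coin tosses.
outcomes = list(product("HT", repeat=3))
p = 1 / len(outcomes)                     # P(omega) = 1/8 for each outcome
X = lambda omega: omega.count("H")        # X(omega) = number of heads

# E[X] = sum over the sample space of X(omega) * P(omega)
print(sum(X(w) * p for w in outcomes))    # 1.5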
Properties of Expectation
Theorem: Let X and Y be discrete random variables.
Then
(i): For any constant a, E(a) = a, and E[aX] = aE[X]
To prove, use E[X] = \sum_{\omega \in \Omega} X(\omega) P(\omega), grouping the outcomes by value so that the sum runs over the values x_i with pmf f(x_i):
E[a] = \sum_i a \, f(x_i) = a \sum_i f(x_i) = a
E[aX] = \sum_i a x_i \, f(x_i) = a \sum_i x_i f(x_i) = a E[X]
(ii): Additive Property of expectations:
E[X + Y] = E[X] + E[Y]
To prove:
E[X + Y] = \sum_{\omega \in \Omega} [X(\omega) + Y(\omega)] P(\omega)
         = \sum_{\omega \in \Omega} X(\omega) P(\omega) + \sum_{\omega \in \Omega} Y(\omega) P(\omega)
         = E[X] + E[Y]
(iii): E[aX + bY + c] = aE[X] + bE[Y] + c
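A quick numerical illustration of (iii) on a small made-up joint distribution (an illustrative sketch, not from the notes; the pmf values are arbitrary). Note that linearity holds even though X and Y here are dependent:

# Hypothetical joint pmf of (X, Y); values chosen arbitrarily for illustration.
joint = {(0, 1): 0.2, (1, 1): 0.3, (1, 2): 0.1, (2, 3): 0.4}

EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
a, b, c = 3.0, -2.0, 5.0
lhs = sum((a * x + b * y + c) * p for (x, y), p in joint.items())
print(lhs, a * EX + b * EY + c)   # the two numbers agree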
Ex: Let X be the number of points assigned to a playing card in a system of bidding for bridge. It has probability mass function:
P(x) = 1/13,  x = 1, 2, 3, 4   (J, Q, K, A)
     = 9/13,  x = 0            (otherwise)
E[X] = (1 + 2 + 3 + 4)(1/13) = 10/13
Alternatively, let Y_i denote the number of points in player i's hand. These Y_i are exchangeable (their joint pmf is a symmetric function of its arguments) and so have the same expectation. There are 40 points in the whole deck, so the sum of the Y_i is 40. Then 4 E[Y] = 40, so E[Y] = 10.
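A direct enumeration of the deck (an illustrative sketch; the point values A = 4, K = 3, Q = 2, J = 1 are the standard high-card-point assignment assumed here) reproduces both results:

# High-card points per rank: A=4, K=3, Q=2, J=1, all other ranks 0.
points = {"A": 4, "K": 3, "Q": 2, "J": 1}
ranks = ["A", "K", "Q", "J"] + [str(n) for n in range(2, 11)]

deck = [points.get(r, 0) for r in ranks for _ in range(4)]   # 52 cards
E_per_card = sum(deck) / len(deck)
print(E_per_card, 10 / 13)     # both ~0.769
print(13 * E_per_card)         # 10.0 expected points in a 13-card hand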
Conditional Expectation
Def: For a discrete-valued random variable, the expectation of X conditioned on Y is
E[X \mid Y = y] = \sum_{x_i : p(x_i \mid y) > 0} x_i \, p(x_i \mid Y = y)
Expectation of Functions of Discrete Random
Variables
If X is a discrete rv which takes on one of the values x_i, i ≥ 1, with probability p(x_i), then for any real-valued function g,
E[g(X)] = \sum_i g(x_i) \, p(x_i)
(Note shorthand form for index in summation.)
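For instance (an illustrative sketch, not from the notes), E[g(X)] with g(x) = x^2 for a fair six-sided die follows directly from the formula above:

# E[g(X)] = sum_i g(x_i) p(x_i) for a fair die and g(x) = x**2.
g = lambda x: x ** 2
values = range(1, 7)
p = 1 / 6
print(sum(g(x) * p for x in values))   # 91/6 ~ 15.17, i.e. E[X^2], not (E[X])^2 = 12.25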
Expectation of “Special” Functions of Discrete RVs
Theorem: E[E[X \mid Y]] = \sum_i E[X \mid Y = y_i] \, p(y_i)
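The theorem (together with the conditional-expectation definition above) can be checked on a small made-up joint pmf (a sketch, not from the notes; the pmf values are arbitrary):

# Hypothetical joint pmf p(x, y); marginal of Y and conditionals derived from it.
joint = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (2, 1): 0.4}

p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

def E_X_given_Y(y):
    # E[X | Y = y] = sum_i x_i p(x_i | Y = y)
    return sum(x * p / p_y[y] for (x, yy), p in joint.items() if yy == y)

EX = sum(x * p for (x, y), p in joint.items())
tower = sum(E_X_given_Y(y) * py for y, py in p_y.items())
print(EX, tower)   # both 1.1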
Moments of Distributions
Def: The kth moment (about zero) of the random variable X is \mu'_k = E(X^k). The kth central moment is \mu_k = E[(X - \mu)^k].
Theorem: If X has moments of order k, it has
moments of all lower orders.
Definitions:
The variance of a distribution is the average squared deviation: \sigma^2 = E[(X - \mu)^2]
The standard deviation is the positive square root of the variance: \sigma
Mean absolute deviation: m.a.d. = E|X - \mu|
Note: m.a.d.(X) ≤ \sigma_X
To show this, just use the definitions:
\mathrm{Var}[\,|X - \mu|\,] = E[|X - \mu|^2] - (E[|X - \mu|])^2 \ge 0
But note that the first term is the same as E[|X - \mu|^2] = E[(X - \mu)^2] = \sigma^2. Substitute and take the square root.
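A quick numerical check of m.a.d.(X) ≤ \sigma_X on an arbitrary small pmf (illustrative only, not from the notes):

# Arbitrary pmf used only to illustrate m.a.d.(X) <= sigma_X.
pmf = {0: 0.5, 1: 0.2, 4: 0.3}

mu    = sum(x * p for x, p in pmf.items())
sigma = sum((x - mu) ** 2 * p for x, p in pmf.items()) ** 0.5
mad   = sum(abs(x - mu) * p for x, p in pmf.items())
print(mad, sigma)   # 1.56 <= ~1.74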
Moment Generating Function:
Suppose that there is a positive number h such that for -h < t < h, the expectation E[e^{tX}] exists. For a discrete random variable X,
\psi(t) = E[e^{tX}] = \sum_i e^{t x_i} p(x_i)
is called the moment generating function of X. It has the property that
\frac{d^k}{dt^k} \psi(t) = \frac{d^k}{dt^k} E[e^{tX}] = E[X^k e^{tX}]
Now if t = 0, this becomes E(X^k).
Ex: Suppose that P(X = 1) = p and P(X = 0) = 1 - p. Then
\psi(t) = E[e^{tX}] = p e^t + (1 - p) e^0 = 1 + p(e^t - 1)
\psi^{(k)}(0) = p,  k = 1, 2, ....
So \mu = p and \sigma^2 = p - p^2 = p(1 - p).
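The derivative property of \psi(t) can be checked symbolically for this example (an illustrative sketch; using sympy is an assumption about available tooling, and p is kept symbolic):

import sympy as sp

t, p = sp.symbols("t p")
psi = 1 + p * (sp.exp(t) - 1)          # MGF of the example above

m1 = sp.diff(psi, t, 1).subs(t, 0)     # psi'(0)  = E[X]   = p
m2 = sp.diff(psi, t, 2).subs(t, 0)     # psi''(0) = E[X^2] = p
print(m1, sp.simplify(m2 - m1 ** 2))   # mean p, variance p - p**2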
Ex: Suppose that
p(x) = \frac{6}{\pi^2 x^2},  x = 1, 2, 3, ....
Then
\psi(t) = \sum_{x=1}^{\infty} \frac{6 e^{tx}}{\pi^2 x^2},
which diverges for every t > 0, so \psi(t) does not exist for -h < t < h.
Theorem (Parallel axis theorem): For any constant a,
E[(X - a)^2] = \sigma_X^2 + (a - \mu_X)^2
Corollary: The variance is the smallest second moment about any point. (To show this, note that the expression above is minimized when a = \mu_X.)
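A short derivation of the theorem (a sketch filling in the step the notes leave implicit): expand about the mean and use E[X - \mu_X] = 0:
E[(X - a)^2] = E[((X - \mu_X) + (\mu_X - a))^2]
             = E[(X - \mu_X)^2] + 2(\mu_X - a) E[X - \mu_X] + (\mu_X - a)^2
             = \sigma_X^2 + (a - \mu_X)^2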
Theorem: When X has second moments,
\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])
To prove, use the parallel axis theorem.
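The decomposition can be verified numerically on a small made-up joint pmf (a sketch, not from the notes; the pmf values are arbitrary):

# Hypothetical joint pmf p(x, y) used to check Var(X) = E[Var(X|Y)] + Var(E[X|Y]).
joint = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (2, 1): 0.4}

p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

def cond_moments(y):
    # Conditional pmf of X given Y = y, and its mean and variance.
    px = {x: p / p_y[y] for (x, yy), p in joint.items() if yy == y}
    m = sum(x * q for x, q in px.items())
    v = sum((x - m) ** 2 * q for x, q in px.items())
    return m, v

EX    = sum(x * p for (x, y), p in joint.items())
var_X = sum((x - EX) ** 2 * p for (x, y), p in joint.items())

E_condvar    = sum(cond_moments(y)[1] * py for y, py in p_y.items())
var_condmean = sum((cond_moments(y)[0] - EX) ** 2 * py for y, py in p_y.items())
print(var_X, E_condvar + var_condmean)   # both 0.69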
Def: Covariance between X and Y:
\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X) E(Y)
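Both forms of the covariance can be evaluated on a small joint pmf (an illustrative sketch, not from the notes; the pmf values are arbitrary):

# Hypothetical joint pmf p(x, y); covariance computed both ways.
joint = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (2, 1): 0.4}

EX  = sum(x * p for (x, y), p in joint.items())
EY  = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())

cov1 = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())
cov2 = EXY - EX * EY
print(cov1, cov2)   # both 0.14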