Law of the Unconscious Statistician
Introduction
If X is a random variable and p(x) is the probability density (or mass) function (p.d.f.) of X, then the expectation of X, written E(X), is defined as
$$E(X) = \sum_x x\,p(x)$$
if X is discrete and $\sum_x |x|\,p(x) < \infty$, or
$$E(X) = \int_D x\,p(x)\,dx$$
if X is continuous, where D is the domain of x and $\int_D |x|\,p(x)\,dx < \infty$.
Suppose f(x) is a function of a random variable X. To find the expectation of f(X), it seems necessary first to know the distribution of f(X).
Example 1
In throwing a die, define the random variable X as the number shown on the die.
$$E(X) = \sum_x x\,p(x) = \sum_{x=1}^{6} x\cdot\frac{1}{6} = \frac{1}{6}\cdot\frac{6}{2}(6+1) = \frac{7}{2}$$
For f(x) = 2x, the distribution of f(X) is

f(x):      2    4    6    8    10   12
Pr(f(x)):  1/6  1/6  1/6  1/6  1/6  1/6
$$E(f(X)) = \frac{1}{6}\cdot\frac{6}{2}(2+12) = 7$$
It appears that $E(f(X)) = \sum_x f(x)\,p(x)$.
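As a quick numerical confirmation, here is a minimal Python sketch (not part of the original argument) computing the expectation both ways: first via the distribution of f(X), then directly as $\sum_x f(x)\,p(x)$:

```python
from fractions import Fraction

# p.d.f. of X: a fair die, each face with probability 1/6
p = {x: Fraction(1, 6) for x in range(1, 7)}
f = lambda x: 2 * x

# Route 1: derive the distribution of f(X), then take its expectation
dist_f = {}
for x, px in p.items():
    dist_f[f(x)] = dist_f.get(f(x), Fraction(0)) + px
via_distribution = sum(y * py for y, py in dist_f.items())

# Route 2: sum f(x) p(x) directly, never touching the distribution of f(X)
via_lotus = sum(f(x) * px for x, px in p.items())

print(via_distribution, via_lotus)  # both print 7
```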
Once the function is more complicated, especially when it is not injective, it is not easy to derive the distribution of f(X).
Example 2
Use the same random variable as in Example 1 and let $f(x) = -x^2 + 7x - 6$.
The range of f is {0, 4, 6}
The distribution of f is

f(x):      0    4    6
Pr(f(x)):  1/3  1/3  1/3

$$E(f(X)) = \frac{1}{3}(0+4+6) = \frac{10}{3}$$
On the other hand,
$$\sum_x f(x)\,p(x) = \frac{1}{6}(0+4+6+6+4+0) = \frac{20}{6} = \frac{10}{3}$$
Once again, $E(f(X)) = \sum_x f(x)\,p(x)$.
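The same two-route check works for the non-injective f; the grouping step below is exactly the bookkeeping that becomes tedious by hand (again a sketch, not the author's code):

```python
from collections import defaultdict
from fractions import Fraction

p = {x: Fraction(1, 6) for x in range(1, 7)}
f = lambda x: -x**2 + 7*x - 6

# Naive route: pool the x-values sharing the same f(x) to get Pr(f(X) = y)
dist_f = defaultdict(Fraction)
for x, px in p.items():
    dist_f[f(x)] += px  # e.g. f(2) = f(5) = 4 pool together

print(dict(dist_f))                             # each of 0, 4, 6 has prob 1/3
print(sum(y * py for y, py in dist_f.items()))  # 10/3
print(sum(f(x) * px for x, px in p.items()))    # 10/3, no grouping needed
```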
The result is simple, but the logic behind it is not obvious. The formula often appears in the elementary part of statistics texts, where it is usually treated as the definition of the expectation of a function of a random variable. In fact, it should be a theorem. Some books state that the distribution of f must be known, and that fortunately the above formula holds for a general function; most books give no proof. The famous statistician Sheldon Ross called it the Law of the Unconscious Statistician, and in return he received harsh criticism. We are going to give a proof of the law.
Discussion of the Law
Discrete Case
If X is a discrete random variable with p.d.f. p(x) and f is a real-valued function such that
$$\sum_x |f(x)|\,p(x) < \infty$$
(i.e., the sum is absolutely convergent), then
$$E(f(X)) = \sum_x f(x)\,p(x)$$
Proof: Let D = {x1, x2, …} be the set of all possible values of the random variable X, let {y1, y2, …} be the range of f, and let $A_j = \{x : f(x) = y_j\}$. Then
$$E(f(X)) = \sum_j y_j \Pr(f(X) = y_j)$$
(where Pr(E) means the probability of the event E)
$$= \sum_j y_j \sum_{x \in A_j} p(x) = \sum_j \sum_{x \in A_j} f(x)\,p(x).$$
As $\bigcup_j A_j = D$ and $A_j \cap A_k = \varnothing$ for $j \neq k$,
$$\sum_j \sum_{x \in A_j} f(x)\,p(x) = \sum_{x \in D} f(x)\,p(x),$$
where the sum can be taken in any order because the series is absolutely convergent.
Continuous Case
Now suppose X is a continuous random variable with p.d.f. p(x) and f is a real-valued function.
The result is less obvious even for a uniform distribution. For example, let
$$p(x) = \begin{cases} \dfrac{1}{2} & -1 \le x \le 1 \\ 0 & \text{otherwise,} \end{cases}$$
and suppose f(x) = x². Then f(X) has p.d.f.
$$\varphi(x) = \begin{cases} \dfrac{1}{2\sqrt{x}} & 0 < x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
(the c.d.f. of f(X) is $\Pr(X^2 \le x) = \sqrt{x}$ for $0 \le x \le 1$, and differentiating it gives $\varphi$).
Then
$$E(f(X)) = \int_0^1 x\,\varphi(x)\,dx = \int_0^1 \frac{x}{2\sqrt{x}}\,dx = \frac{1}{2}\cdot\frac{2}{3}\,x^{3/2}\Big|_0^1 = \frac{1}{3}.$$
On the other hand,
$$\int_{-1}^{1} x^2\,p(x)\,dx = \frac{1}{2}\int_{-1}^{1} x^2\,dx = \frac{1}{2}\cdot\frac{1}{3}\,x^3\Big|_{-1}^{1} = \frac{1}{2}\cdot\frac{1}{3}\,(1-(-1)) = \frac{1}{3}.$$
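As a numerical cross-check of this example (a Python sketch under the stated uniform model; the sample size and grid are arbitrary choices):

```python
import random

random.seed(0)

# Monte Carlo estimate of E(f(X)) for X uniform on [-1, 1], f(x) = x^2
n = 10**6
mc = sum(random.uniform(-1.0, 1.0) ** 2 for _ in range(n)) / n

# Midpoint Riemann sum for the LOTUS integral of x^2 p(x) over [-1, 1]
m = 10**5
w = 2.0 / m
lotus = sum(((-1.0 + (i + 0.5) * w) ** 2) * 0.5 * w for i in range(m))

print(mc, lotus)  # both close to 1/3
```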

It is expected that
$$E(f(X)) = \int_{-\infty}^{\infty} f(x)\,p(x)\,dx.$$
To prove this formula, we need the identity
$$E(X) = \int_0^{\infty} \Pr(X > t)\,dt$$
for a non-negative random variable X.
Before proving the formula, let us investigate the discrete case: suppose the p.d.f. of X is

x_i:     1   3   6   7   8   9
p(x_i):  p1  p2  p3  p4  p5  p6
Pr(X > 0) = p1 + p2 + p3 + p4 + p5 + p6
Pr(X > 1) = p2 + p3 + p4 + p5 + p6
Pr(X > 2) = p2 + p3 + p4 + p5 + p6
Pr(X > 3) = p3 + p4 + p5 + p6
Pr(X > 4) = p3 + p4 + p5 + p6
Pr(X > 5) = p3 + p4 + p5 + p6
Pr(X > 6) = p4 + p5 + p6
Pr(X > 7) = p5 + p6
Pr(X > 8) = p6
Then
$$\sum_{i=0}^{8} \Pr(X > i) = p_1 + 3p_2 + 6p_3 + 7p_4 + 8p_5 + 9p_6 = E(X).$$
In general, $E(X) = \sum_{i=0}^{\infty} \Pr(X > i)$, where X is a random variable taking positive integer values.
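This identity can be verified directly in Python for the table above (the probabilities below are illustrative values of my own choosing that sum to 1):

```python
from fractions import Fraction

values = [1, 3, 6, 7, 8, 9]                            # support from the table
probs = [Fraction(k, 21) for k in (1, 2, 3, 4, 5, 6)]  # 1/21 + ... + 6/21 = 1

expectation = sum(x * p for x, p in zip(values, probs))

# tail sum: Pr(X > i) for i = 0, 1, ..., max(values) - 1
tail_sum = sum(
    sum(p for x, p in zip(values, probs) if x > i)
    for i in range(max(values))
)

print(expectation, tail_sum)  # identical Fractions (both equal 7)
```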
Proof: $p_i$ appears exactly once in each of the sums Pr(X > 0), Pr(X > 1), …, Pr(X > x_i − 1). Grouping the like terms of $\sum_{i=0}^{\infty} \Pr(X > i)$, it can be seen that the coefficient of $p_i$ is $x_i$. Hence
$$\sum_{i=0}^{\infty} \Pr(X > i) = \sum_{i \ge 1} x_i\,p_i = E(X).$$
Lemma
If X is a continuous random variable taking positive values only, then
$$E(X) = \int_0^{\infty} \Pr(X > x)\,dx.$$
Proof:
$$\int_0^{\infty} \Pr(X > x)\,dx = \int_0^{\infty}\!\int_x^{\infty} p(y)\,dy\,dx = \int_0^{\infty}\!\int_0^{y} p(y)\,dx\,dy,$$
where the order of integration over the region $\{(x, y) : 0 < x < y\}$ may be interchanged because the integrand is non-negative. The last integral equals
$$\int_0^{\infty} p(y)\left(\int_0^{y} dx\right)dy = \int_0^{\infty} y\,p(y)\,dy = E(X).$$
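As an illustration of the lemma (a Python sketch; the exponential distribution is my choice here, simply because its tail Pr(X > t) = e^{-t} has a closed form and E(X) = 1):

```python
import math
import random

random.seed(0)

# Monte Carlo estimate of E(X) for X ~ Exp(1)
n = 10**6
mean = sum(random.expovariate(1.0) for _ in range(n)) / n

# Midpoint Riemann sum for the tail integral of Pr(X > t), truncated at t = 30
m = 10**5
w = 30.0 / m
tail = sum(math.exp(-(i + 0.5) * w) * w for i in range(m))

print(mean, tail)  # both close to 1
```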
Theorem
Suppose X is a random variable with p.d.f. p(x) and f is a positive function. Then
$$E(f(X)) = \int_{-\infty}^{\infty} f(x)\,p(x)\,dx.$$
Proof: By the lemma,
$$E(f(X)) = \int_0^{\infty} \Pr(f(X) > y)\,dy = \int_0^{\infty}\left(\int_{\{x\,:\,f(x) > y\}} p(x)\,dx\right)dy$$
$$= \int_{\{x\,:\,f(x) > 0\}}\left(\int_0^{f(x)} dy\right)p(x)\,dx = \int_{-\infty}^{\infty} f(x)\,p(x)\,dx,$$
where the order of integration may again be interchanged because the integrand is non-negative.
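The chain of equalities can be checked numerically on the earlier uniform example, where Pr(X² > y) = 1 − √y for 0 ≤ y ≤ 1 (a sketch; both integrals are evaluated by midpoint Riemann sums):

```python
# X uniform on [-1, 1], f(x) = x^2
m = 10**5

# Integral of Pr(f(X) > y) over [0, 1], with Pr(X^2 > y) = 1 - sqrt(y)
tail = sum((1.0 - ((i + 0.5) / m) ** 0.5) * (1.0 / m) for i in range(m))

# LOTUS integral of x^2 p(x) over [-1, 1], with p(x) = 1/2
w = 2.0 / m
lotus = sum(((-1.0 + (i + 0.5) * w) ** 2) * 0.5 * w for i in range(m))

print(tail, lotus)  # both close to 1/3
```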
General Case
Lemma
If X is a random variable taking values of either sign, then
$$E(X) = \int_0^{\infty} \Pr(X > x)\,dx - \int_0^{\infty} \Pr(X < -x)\,dx.$$
This follows by applying the previous lemma to the positive part and the negative part of X separately.
Theorem
$$E(f(X)) = \int_{-\infty}^{\infty} f(x)\,p(x)\,dx.$$
This follows by writing $f = f^+ - f^-$, applying the positive-case theorem to each part, and subtracting.
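Finally, the general case can be exercised with a signed function (a Python sketch; the standard normal X and f(x) = x³ − x + 1 are illustrative choices of mine, for which E(f(X)) = 1):

```python
import math
import random

random.seed(0)

f = lambda x: x**3 - x + 1   # takes both signs; E(f(X)) = 1 for X ~ N(0, 1)

# Monte Carlo estimate of E(f(X))
n = 10**6
mc = sum(f(random.gauss(0.0, 1.0)) for _ in range(n)) / n

# Midpoint Riemann sum for the LOTUS integral over [-8, 8]
m = 10**5
w = 16.0 / m
pdf = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
lotus = sum(f(-8.0 + (i + 0.5) * w) * pdf(-8.0 + (i + 0.5) * w) * w
            for i in range(m))

print(mc, lotus)  # both close to 1
```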