Law of the Unconscious Statistician

Introduction

If $X$ is a random variable and $p(x)$ is its probability density (or mass) function (p.d.f.), then the expectation of $X$, written $E(X)$, is defined as
$$E(X) = \sum_{x \in D} x\,p(x)$$
if $X$ is discrete, and
$$E(X) = \int_D x\,p(x)\,dx$$
if $X$ is continuous, where $D$ is the domain of $X$.

Suppose $f(x)$ is a function of a random variable. To find the expectation of $f(X)$, it would seem necessary to know the distribution of $f(X)$.

Example 1

In throwing a die, define the random variable $X$ as the number shown. Then
$$E(X) = \sum_x x\,p(x) = \frac{1}{6}(1 + 2 + \cdots + 6) = \frac{1}{6}\cdot\frac{6(6+1)}{2} = \frac{7}{2}.$$

For $f(x) = 2x$, the distribution of $f(X)$ is

f(x)      2    4    6    8    10   12
Pr(f(x))  1/6  1/6  1/6  1/6  1/6  1/6

so
$$E(f(X)) = \frac{1}{6}(2 + 4 + \cdots + 12) = 7.$$
It appears that $E(f(X)) = \sum_x f(x)\,p(x)$.

Once the function is complicated, especially when it is not injective, it is not easy to derive the result this way.

Example 2

Use the same random variable as in Example 1 and $f(x) = -x^2 + 7x - 6$.

The range of $f$ is $\{0, 4, 6\}$, and the distribution of $f(X)$ is

f(x)      0    4    6
Pr(f(x))  1/3  1/3  1/3

so
$$E(f(X)) = \frac{1}{3}(0 + 4 + 6) = \frac{10}{3}.$$
On the other hand,
$$\sum_x f(x)\,p(x) = \frac{1}{6}(0 + 4 + 6 + 6 + 4 + 0) = \frac{20}{6} = \frac{10}{3}.$$
Once again, $E(f(X)) = \sum_x f(x)\,p(x)$.

The result is simple, but the logic is not obvious. The formula often appears in the elementary part of statistics, where it is usually treated as the definition of the expectation of a function of a random variable. In fact, it should be a theorem. Some books state that the distribution of $f(X)$ should be known and that, fortunately, the above formula holds for a general function; most give no proof. The famous statistician Sheldon Ross called it the Law of the Unconscious Statistician, and received harsh criticism in return. We are going to give a proof of the law.

Discussion of the Law

Discrete Case

If $X$ is a discrete random variable with p.d.f. $p(x)$ and $f$ is a real-valued function such that $\sum_x |f(x)|\,p(x) < \infty$ (i.e., the sum is absolutely convergent), then
$$E(f(X)) = \sum_x f(x)\,p(x).$$

Proof: Let $D = \{x_1, x_2, \ldots\}$ be the set of all possible values of the random variable $X$.
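As a sanity check (not part of the original text), both examples can be verified with a short Python sketch using exact rational arithmetic; the helper names `lotus` and `via_distribution` are hypothetical, chosen here for illustration. It computes $E(f(X))$ both ways: via the distribution of $f(X)$, and directly as $\sum_x f(x)\,p(x)$.

```python
from fractions import Fraction

# Fair die: X takes the values 1..6, each with probability 1/6.
p = {x: Fraction(1, 6) for x in range(1, 7)}

def lotus(f, p):
    """E(f(X)) computed directly as sum of f(x) * p(x) (the law's formula)."""
    return sum(f(x) * q for x, q in p.items())

def via_distribution(f, p):
    """E(f(X)) computed the long way: first build the distribution of Y = f(X)."""
    dist = {}
    for x, q in p.items():
        y = f(x)
        dist[y] = dist.get(y, 0) + q
    return sum(y * q for y, q in dist.items())

f1 = lambda x: 2 * x                  # Example 1
f2 = lambda x: -x * x + 7 * x - 6     # Example 2 (not injective)

print(lotus(f1, p), via_distribution(f1, p))  # 7 7
print(lotus(f2, p), via_distribution(f2, p))  # 10/3 10/3
```

The exact fractions confirm the agreement claimed in both examples: $7$ for Example 1 and $10/3$ for Example 2.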
Let $\{y_1, y_2, \ldots\}$ be the range of $f$, and let $A_j = \{x : f(x) = y_j\}$. Then
$$E(f(X)) = \sum_j y_j \Pr(f(X) = y_j)$$
(where $\Pr(E)$ means the probability of the event $E$)
$$= \sum_j \sum_{x \in A_j} f(x)\,p(x).$$
As $\bigcup_j A_j = D$ and $A_j \cap A_k = \varnothing$ for $j \neq k$,
$$\sum_j \sum_{x \in A_j} f(x)\,p(x) = \sum_{x \in D} f(x)\,p(x),$$
where the sum can be taken in any order because the series is absolutely convergent.

Continuous Case

Now suppose $X$ is a continuous random variable with p.d.f. $p(x)$ and $f$ is a real-valued function. The result is less obvious even for a uniform distribution. For example, let
$$p(x) = \begin{cases} \tfrac{1}{2} & -1 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
For $f(x) = x^2$, the random variable $f(X)$ has p.d.f.
$$\varphi(x) = \begin{cases} \dfrac{1}{2\sqrt{x}} & 0 < x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$E(f(X)) = \int_0^1 x\,\varphi(x)\,dx = \int_0^1 \frac{\sqrt{x}}{2}\,dx = \frac{1}{2}\cdot\frac{2}{3}x^{3/2}\Big|_0^1 = \frac{1}{3}.$$
On the other hand,
$$\int_{-1}^{1} x^2\,p(x)\,dx = \frac{1}{2}\int_{-1}^{1} x^2\,dx = \frac{1}{2}\cdot\frac{x^3}{3}\Big|_{-1}^{1} = \frac{1}{2}\cdot\frac{1}{3}\bigl(1 - (-1)\bigr) = \frac{1}{3}.$$
It is expected that
$$E(f(X)) = \int f(x)\,p(x)\,dx.$$

To prove the formula, we need the identity
$$E(X) = \int_0^\infty \Pr(X > x)\,dx$$
for a non-negative random variable $X$.

Before proving it, let us investigate the discrete case. Suppose the p.d.f. of $X$ is

x_i     1   3   6   7   8   9
p(x_i)  p1  p2  p3  p4  p5  p6

Then

Pr(X > 0) = p1 + p2 + p3 + p4 + p5 + p6
Pr(X > 1) = p2 + p3 + p4 + p5 + p6
Pr(X > 2) = p2 + p3 + p4 + p5 + p6
Pr(X > 3) = p3 + p4 + p5 + p6
Pr(X > 4) = p3 + p4 + p5 + p6
Pr(X > 5) = p3 + p4 + p5 + p6
Pr(X > 6) = p4 + p5 + p6
Pr(X > 7) = p5 + p6
Pr(X > 8) = p6

so
$$\sum_{i=0}^{8} \Pr(X > i) = p_1 + 3p_2 + 6p_3 + 7p_4 + 8p_5 + 9p_6 = E(X).$$
In general, $E(X) = \sum_{i=0}^{\infty} \Pr(X > i)$, where $X$ is a random variable taking positive integral values.

Proof: $p_i$ appears once in each of the terms $\Pr(X > 0), \Pr(X > 1), \ldots, \Pr(X > x_i - 1)$, that is, $x_i$ times. Grouping the like terms of the sum $\sum_{i=0}^{\infty} \Pr(X > i)$, it can be seen that the coefficient of $p_i$ is $x_i$.
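The tail-sum identity can be checked with exact rational arithmetic. The values $1, 3, 6, 7, 8, 9$ are taken from the table above; the probabilities $p_1, \ldots, p_6$ are symbolic in the text, so the concrete values below are an arbitrary assumption (any probabilities summing to 1 would do).

```python
from fractions import Fraction

# The values come from the text; the probabilities are arbitrary choices
# summing to 1, since p1..p6 are left symbolic in the original.
pmf = dict(zip([1, 3, 6, 7, 8, 9],
               [Fraction(1, 12), Fraction(1, 6), Fraction(1, 4),
                Fraction(1, 6),  Fraction(1, 6), Fraction(1, 6)]))

def mean(pmf):
    """E(X) by the definition: sum of x * p(x)."""
    return sum(x * q for x, q in pmf.items())

def tail_sum(pmf):
    """sum_{i >= 0} Pr(X > i); the tail vanishes once i >= max value."""
    return sum(sum(q for x, q in pmf.items() if x > i)
               for i in range(max(pmf)))

assert mean(pmf) == tail_sum(pmf)   # the identity E(X) = sum Pr(X > i)
print(mean(pmf))                    # 73/12
```

Each $p_i$ is counted exactly $x_i$ times by the tail sum, matching the grouping argument in the proof.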
Hence
$$\sum_{i=0}^{\infty} \Pr(X > i) = \sum_i x_i\,p_i = E(X).$$

Lemma

If $X$ is a continuous random variable taking positive values only, then
$$E(X) = \int_0^\infty \Pr(X > x)\,dx.$$

Proof:
$$\int_0^\infty \Pr(X > x)\,dx = \int_0^\infty \left(\int_x^\infty p(y)\,dy\right)dx = \int_0^\infty \int_0^y p(y)\,dx\,dy = \int_0^\infty p(y)\left(\int_0^y dx\right)dy = \int_0^\infty y\,p(y)\,dy = E(X),$$
where the order of integration may be interchanged because the integrand is non-negative.

Theorem

Suppose $X$ is a random variable with p.d.f. $p(x)$ and $f$ is a positive function. Then
$$E(f(X)) = \int f(x)\,p(x)\,dx.$$

Proof: By the lemma,
$$E(f(X)) = \int_0^\infty \Pr(f(X) > y)\,dy = \int_0^\infty \left(\int_{\{x : f(x) > y\}} p(x)\,dx\right)dy = \int \left(\int_0^{f(x)} dy\right)p(x)\,dx = \int f(x)\,p(x)\,dx.$$

General Case

Lemma
$$E(X) = \int_0^\infty \Pr(X > x)\,dx - \int_{-\infty}^0 \Pr(X < x)\,dx$$

Theorem
$$E(f(X)) = \int f(x)\,p(x)\,dx,$$
which follows by writing $f = f^+ - f^-$ and applying the theorem for positive functions to each part.
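The lemma and the theorem can be checked together on the uniform example from the Continuous Case. The sketch below (not from the original text; `integrate` is a hypothetical midpoint-rule helper) approximates both $\int f(x)\,p(x)\,dx$ and $\int_0^\infty \Pr(f(X) > y)\,dy$ for $X$ uniform on $[-1, 1]$ and $f(x) = x^2$; both should be close to $1/3$.

```python
import math

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum of g over [a, b] -- a crude stand-in for quadrature."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

p = lambda x: 0.5                  # density of X, uniform on [-1, 1]
# Pr(f(X) > y) = Pr(|X| > sqrt(y)) = 1 - sqrt(y) for 0 <= y <= 1
tail = lambda y: 1.0 - math.sqrt(y)

lotus = integrate(lambda x: x * x * p(x), -1.0, 1.0)   # integral of f(x) p(x) dx
lemma = integrate(tail, 0.0, 1.0)                      # integral of Pr(f(X) > y) dy

print(lotus, lemma)   # both approximately 1/3
```

The two routes agree numerically, which is exactly what the theorem asserts: the expectation of $f(X)$ can be computed from $p(x)$ without ever finding the distribution of $f(X)$.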