Law of the Unconscious Statistician

Introduction

If $X$ is a random variable and $p(x)$ is its probability density (or mass) function (p.d.f.), then the expectation of $X$, written $E(X)$, is defined as
$$E(X) = \sum_{x \in D} x\,p(x)$$
if $X$ is discrete, and
$$E(X) = \int_D x\,p(x)\,dx$$
if $X$ is continuous, where $D$ is the domain of $X$.

Suppose $f(x)$ is a function of a random variable. To find the expectation of $f(X)$, it would seem necessary to know the distribution of $f(X)$.

Example 1

In throwing a die, define the random variable $X$ as the number shown. Then
$$E(X) = \sum_x x\,p(x) = \frac{1}{6}(1 + 2 + \cdots + 6) = \frac{1}{6}\cdot\frac{6(6+1)}{2} = \frac{7}{2}.$$

For $f(x) = 2x$, the distribution of $f(X)$ is

f(x)      2    4    6    8    10   12
Pr(f(x))  1/6  1/6  1/6  1/6  1/6  1/6

so
$$E(f(X)) = \frac{1}{6}(2 + 4 + \cdots + 12) = 7.$$
It appears that $E(f(X)) = \sum_x f(x)\,p(x)$.

Once the function is complicated, especially when it is not injective, it is not easy to derive the result this way.

Example 2

Use the same random variable as in Example 1 and $f(x) = -x^2 + 7x - 6$.

The range of $f$ is $\{0, 4, 6\}$, and the distribution of $f(X)$ is

f(x)      0    4    6
Pr(f(x))  1/3  1/3  1/3

so
$$E(f(X)) = \frac{1}{3}(0 + 4 + 6) = \frac{10}{3}.$$
On the other hand,
$$\sum_x f(x)\,p(x) = \frac{1}{6}(0 + 4 + 6 + 6 + 4 + 0) = \frac{20}{6} = \frac{10}{3}.$$
Once again, $E(f(X)) = \sum_x f(x)\,p(x)$.

The result is simple, but the logic is not obvious. The formula often appears in the elementary part of statistics, where it is usually treated as the definition of the expectation of a function of a random variable. In fact, it should be a theorem. Some books state that the distribution of $f(X)$ should be known and that, fortunately, the above formula holds for a general function; most give no proof. The famous statistician Sheldon Ross called it the Law of the Unconscious Statistician, and received harsh criticism in return. We are going to give a proof of the law.

Discussion of the Law

Discrete Case

If $X$ is a discrete random variable with p.d.f. $p(x)$ and $f$ is a real-valued function such that $\sum_x |f(x)|\,p(x) < \infty$ (i.e., the sum is absolutely convergent), then
$$E(f(X)) = \sum_x f(x)\,p(x).$$

Proof: Let $D = \{x_1, x_2, \ldots\}$ be the set of all possible values of the random variable $X$.
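As a sanity check (not part of the original text), both examples can be verified with a short Python sketch using exact rational arithmetic; the helper names `lotus` and `via_distribution` are hypothetical, chosen here for illustration. It computes $E(f(X))$ both ways: via the distribution of $f(X)$, and directly as $\sum_x f(x)\,p(x)$.

```python
from fractions import Fraction

# Fair die: X takes the values 1..6, each with probability 1/6.
p = {x: Fraction(1, 6) for x in range(1, 7)}

def lotus(f, p):
    """E(f(X)) computed directly as sum of f(x) * p(x) (the law's formula)."""
    return sum(f(x) * q for x, q in p.items())

def via_distribution(f, p):
    """E(f(X)) computed the long way: first build the distribution of Y = f(X)."""
    dist = {}
    for x, q in p.items():
        y = f(x)
        dist[y] = dist.get(y, 0) + q
    return sum(y * q for y, q in dist.items())

f1 = lambda x: 2 * x                  # Example 1
f2 = lambda x: -x * x + 7 * x - 6     # Example 2 (not injective)

print(lotus(f1, p), via_distribution(f1, p))  # 7 7
print(lotus(f2, p), via_distribution(f2, p))  # 10/3 10/3
```

The exact fractions confirm the agreement claimed in both examples: $7$ for Example 1 and $10/3$ for Example 2.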
Let $\{y_1, y_2, \ldots\}$ be the range of $f$, and let $A_j = \{x : f(x) = y_j\}$. Then
$$E(f(X)) = \sum_j y_j \Pr(f(X) = y_j)$$
(where $\Pr(E)$ means the probability of the event $E$)
$$= \sum_j \sum_{x \in A_j} f(x)\,p(x).$$
As $\bigcup_j A_j = D$ and $A_j \cap A_k = \varnothing$ for $j \neq k$,
$$\sum_j \sum_{x \in A_j} f(x)\,p(x) = \sum_{x \in D} f(x)\,p(x),$$
where the sum can be taken in any order because the series is absolutely convergent.

Continuous Case

Now suppose $X$ is a continuous random variable with p.d.f. $p(x)$ and $f$ is a real-valued function. The result is less obvious even for a uniform distribution. For example, let
$$p(x) = \begin{cases} \tfrac{1}{2} & -1 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
For $f(x) = x^2$, the random variable $f(X)$ has p.d.f.
$$\varphi(x) = \begin{cases} \dfrac{1}{2\sqrt{x}} & 0 < x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$E(f(X)) = \int_0^1 x\,\varphi(x)\,dx = \int_0^1 \frac{\sqrt{x}}{2}\,dx = \frac{1}{2}\cdot\frac{2}{3}x^{3/2}\Big|_0^1 = \frac{1}{3}.$$
On the other hand,
$$\int_{-1}^{1} x^2\,p(x)\,dx = \frac{1}{2}\int_{-1}^{1} x^2\,dx = \frac{1}{2}\cdot\frac{x^3}{3}\Big|_{-1}^{1} = \frac{1}{2}\cdot\frac{1}{3}\bigl(1 - (-1)\bigr) = \frac{1}{3}.$$
It is expected that
$$E(f(X)) = \int f(x)\,p(x)\,dx.$$

To prove the formula, we need the identity
$$E(X) = \int_0^\infty \Pr(X > x)\,dx$$
for a non-negative random variable $X$.

Before proving it, let us investigate the discrete case. Suppose the p.d.f. of $X$ is

x_i     1   3   6   7   8   9
p(x_i)  p1  p2  p3  p4  p5  p6

Then

Pr(X > 0) = p1 + p2 + p3 + p4 + p5 + p6
Pr(X > 1) = p2 + p3 + p4 + p5 + p6
Pr(X > 2) = p2 + p3 + p4 + p5 + p6
Pr(X > 3) = p3 + p4 + p5 + p6
Pr(X > 4) = p3 + p4 + p5 + p6
Pr(X > 5) = p3 + p4 + p5 + p6
Pr(X > 6) = p4 + p5 + p6
Pr(X > 7) = p5 + p6
Pr(X > 8) = p6

so
$$\sum_{i=0}^{8} \Pr(X > i) = p_1 + 3p_2 + 6p_3 + 7p_4 + 8p_5 + 9p_6 = E(X).$$
In general, $E(X) = \sum_{i=0}^{\infty} \Pr(X > i)$, where $X$ is a random variable taking positive integral values.

Proof: $p_i$ appears once in each of the terms $\Pr(X > 0), \Pr(X > 1), \ldots, \Pr(X > x_i - 1)$, that is, $x_i$ times. Grouping the like terms of the sum $\sum_{i=0}^{\infty} \Pr(X > i)$, it can be seen that the coefficient of $p_i$ is $x_i$.
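The tail-sum identity can be checked with exact rational arithmetic. The values $1, 3, 6, 7, 8, 9$ are taken from the table above; the probabilities $p_1, \ldots, p_6$ are symbolic in the text, so the concrete values below are an arbitrary assumption (any probabilities summing to 1 would do).

```python
from fractions import Fraction

# The values come from the text; the probabilities are arbitrary choices
# summing to 1, since p1..p6 are left symbolic in the original.
pmf = dict(zip([1, 3, 6, 7, 8, 9],
               [Fraction(1, 12), Fraction(1, 6), Fraction(1, 4),
                Fraction(1, 6),  Fraction(1, 6), Fraction(1, 6)]))

def mean(pmf):
    """E(X) by the definition: sum of x * p(x)."""
    return sum(x * q for x, q in pmf.items())

def tail_sum(pmf):
    """sum_{i >= 0} Pr(X > i); the tail vanishes once i >= max value."""
    return sum(sum(q for x, q in pmf.items() if x > i)
               for i in range(max(pmf)))

assert mean(pmf) == tail_sum(pmf)   # the identity E(X) = sum Pr(X > i)
print(mean(pmf))                    # 73/12
```

Each $p_i$ is counted exactly $x_i$ times by the tail sum, matching the grouping argument in the proof.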
Hence
$$\sum_{i=0}^{\infty} \Pr(X > i) = \sum_i x_i\,p_i = E(X).$$

Lemma

If $X$ is a continuous random variable taking positive values only, then
$$E(X) = \int_0^\infty \Pr(X > x)\,dx.$$

Proof:
$$\int_0^\infty \Pr(X > x)\,dx = \int_0^\infty \left(\int_x^\infty p(y)\,dy\right)dx = \int_0^\infty \int_0^y p(y)\,dx\,dy = \int_0^\infty p(y)\left(\int_0^y dx\right)dy = \int_0^\infty y\,p(y)\,dy = E(X),$$
where the order of integration may be interchanged because the integrand is non-negative.

Theorem

Suppose $X$ is a random variable with p.d.f. $p(x)$ and $f$ is a positive function. Then
$$E(f(X)) = \int f(x)\,p(x)\,dx.$$

Proof: By the lemma,
$$E(f(X)) = \int_0^\infty \Pr(f(X) > y)\,dy = \int_0^\infty \left(\int_{\{x : f(x) > y\}} p(x)\,dx\right)dy = \int \left(\int_0^{f(x)} dy\right)p(x)\,dx = \int f(x)\,p(x)\,dx.$$

General Case

Lemma
$$E(X) = \int_0^\infty \Pr(X > x)\,dx - \int_{-\infty}^0 \Pr(X < x)\,dx$$

Theorem
$$E(f(X)) = \int f(x)\,p(x)\,dx,$$
which follows by writing $f = f^+ - f^-$ and applying the theorem for positive functions to each part.
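The lemma and the theorem can be checked together on the uniform example from the Continuous Case. The sketch below (not from the original text; `integrate` is a hypothetical midpoint-rule helper) approximates both $\int f(x)\,p(x)\,dx$ and $\int_0^\infty \Pr(f(X) > y)\,dy$ for $X$ uniform on $[-1, 1]$ and $f(x) = x^2$; both should be close to $1/3$.

```python
import math

def integrate(g, a, b, n=100_000):
    """Midpoint Riemann sum of g over [a, b] -- a crude stand-in for quadrature."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

p = lambda x: 0.5                  # density of X, uniform on [-1, 1]
# Pr(f(X) > y) = Pr(|X| > sqrt(y)) = 1 - sqrt(y) for 0 <= y <= 1
tail = lambda y: 1.0 - math.sqrt(y)

lotus = integrate(lambda x: x * x * p(x), -1.0, 1.0)   # integral of f(x) p(x) dx
lemma = integrate(tail, 0.0, 1.0)                      # integral of Pr(f(X) > y) dy

print(lotus, lemma)   # both approximately 1/3
```

The two routes agree numerically, which is exactly what the theorem asserts: the expectation of $f(X)$ can be computed from $p(x)$ without ever finding the distribution of $f(X)$.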