Expectation of Discrete Random Variables

Def: The expected value of a discrete random variable X is defined to be

    E[X] = \sum_{x_i : p(x_i) > 0} x_i \, p(x_i)

The symbol \mu is often used to denote the expectation of a random variable.

Ex: Let X denote the number of tosses of a fair coin required to get "heads", with possible values 1, 2, 3, .... The probability that it takes k tosses is 1/2^k (the probability of k-1 tails followed by heads). Then

    E[X] = \sum_{k=1}^{\infty} k \, 2^{-k}
         = \sum_{k=1}^{\infty} \sum_{i=1}^{k} 2^{-k}
         = \sum_{i=1}^{\infty} \sum_{k=i}^{\infty} 2^{-k}
         = \sum_{i=1}^{\infty} 2^{-(i-1)}
         = \frac{1}{1 - \frac{1}{2}} = 2

(The inner sum is a geometric series. The series is convergent, so we can interchange the order of summation.)

Theorem: When X is discrete and E[X] exists,

    E[X] = \sum_{\omega \in \Omega} X(\omega) P(\omega)

Ex: Given a sequence of 3 coin tosses where the outcomes (8 of them) are equally likely, define X(\omega) as the number of heads in the outcome \omega. The distribution of X is:

    x_i            0      1      2      3
    p(x_i)         1/8    3/8    3/8    1/8
    x_i p(x_i)     0      3/8    6/8    3/8

The expected value is the sum of the values in the last row, 3/2. However, we can also get this from the distribution of the elements in the original sample space:

    E[X] = \sum_{\Omega} X(\omega) P(\omega) = (0+1+1+1+2+2+2+3) \cdot \frac{1}{8} = 3/2

Properties of Expectation

Theorem: Let X and Y be discrete random variables. Then

(i) For any constant a, E[a] = a and E[aX] = aE[X].

To prove, use E[X] = \sum_{\Omega} X(\omega) P(\omega) with \omega = x_i, X(\omega) = g(x_i), and P(\omega) = f(x_i):

    E[a]  = \sum_{\Omega} a \, f(x_i) = a \sum_{\Omega} f(x_i) = a
    E[aX] = \sum_{\Omega} a x_i f(x_i) = a \sum_{\Omega} x_i f(x_i) = aE[X]

(ii) Additive property of expectations: E[X + Y] = E[X] + E[Y].

To prove:

    E[X + Y] = \sum_{\Omega} [X(\omega) + Y(\omega)] P(\omega)
             = \sum_{\Omega} X(\omega) P(\omega) + \sum_{\Omega} Y(\omega) P(\omega)
             = E[X] + E[Y]

(iii) E[aX + bY + c] = aE[X] + bE[Y] + c.

Ex: Let X be the number of points assigned to a playing card in a system of bidding for bridge. It has probability mass function

    p(x) = 1/13,  x = 1, 2, 3, 4   (J, Q, K, A)
    p(x) = 9/13,  x = 0            (otherwise)

so E[X] = (1 + 2 + 3 + 4)(1/13) = 10/13. Alternatively, let Y_i denote the number of points in player i's hand, and realize that these Y_i are exchangeable (their joint pmf is a symmetric function of its arguments) and so have the same expectation. There are 40 points in the whole deck, so the sum of the Y_i is 40. Then 4E[Y] = 40, so E[Y] = 10.

Conditional Expectation

Def: For a discrete-valued random variable, the expectation of X conditioned on Y = y is

    E[X | Y = y] = \sum_{x_i : p(x_i | y) > 0} x_i \, p(x_i | Y = y)

Expectation of Functions of Discrete Random Variables

If X is a discrete rv which takes on one of the values x_i, i \ge 1, with probability p(x_i), then for any real-valued function g,

    E[g(X)] = \sum_i g(x_i) p(x_i)

(Note the shorthand form for the index in the summation.)

Expectation of "Special" Functions of Discrete RVs

Theorem: E[E[X | Y]] = \sum_i E[X | Y = y_i] \, p(y_i), and this equals E[X] (the law of total expectation).

Moments of Distributions

Def: The kth moment (about zero) of the random variable X is \mu'_k = E(X^k). The kth central moment is \mu_k = E[(X - \mu)^k].

Theorem: If X has moments of order k, it has moments of all lower orders.

Definitions: The variance of a distribution is the average squared deviation:

    \sigma^2 = E[(X - \mu)^2]

The standard deviation is the positive square root of the variance, \sigma.

Mean absolute deviation: m.a.d. = E|X - \mu|

Note: m.a.d.(X) \le \sigma_X. To show this, just use the definitions:

    Var[|X - \mu|] = E[|X - \mu|^2] - (E|X - \mu|)^2 \ge 0

But note that the first term is the same as E[|X - \mu|^2] = E[(X - \mu)^2] = \sigma^2. Substitute and take the square root.
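As a quick numerical check of the two expectation examples above, here is a minimal Python sketch (not part of the notes; the names and the truncation point k = 200 are illustrative) that evaluates the truncated series for the coin-toss example and enumerates the eight three-toss outcomes:

```python
from itertools import product

# Coin-toss example: E[X] = sum_{k>=1} k * 2^-k, truncated at k = 200.
# The tail past k = 200 is negligible, so the partial sum is ~2.
e_tosses = sum(k * 2.0**-k for k in range(1, 201))
print(e_tosses)  # -> 2.0 (to machine precision)

# Three-toss example: E[X] = sum over the 8 equally likely outcomes
# of X(omega) * P(omega), where X counts the heads in omega.
outcomes = list(product("HT", repeat=3))
e_heads = sum(w.count("H") for w in outcomes) / len(outcomes)
print(e_heads)  # -> 1.5
```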
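Similarly, the inequality m.a.d.(X) \le \sigma_X can be checked on the three-toss distribution from the table above; this sketch is an illustration, not part of the notes:

```python
# pmf of X = number of heads in 3 fair tosses, from the table above.
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

mu  = sum(x * p for x, p in pmf.items())              # E[X] = 1.5
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2] = 0.75
mad = sum(abs(x - mu) * p for x, p in pmf.items())    # E|X - mu| = 0.75

print(mad, var ** 0.5)  # 0.75 <= ~0.866, so m.a.d.(X) <= sigma_X holds
```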
Moment Generating Functions

Def: Suppose that there is a positive number h such that for -h < t < h the expectation E[e^{tX}] exists. For a discrete random variable X,

    \psi(t) = E[e^{tX}] = \sum_i e^{t x_i} p(x_i)

is called the moment generating function of X. It has the property that

    \frac{d^k}{dt^k} \psi(t) = \frac{d^k}{dt^k} E[e^{tX}] = E[X^k e^{tX}]

Now if t = 0, this becomes E(X^k).

Ex: Suppose that P(X = 1) = p and P(X = 0) = 1 - p. Then

    \psi(t) = E[e^{tX}] = p e^t + (1 - p) e^0 = 1 + p(e^t - 1)

so \psi^{(k)}(0) = p for k = 1, 2, .... Hence \mu = p and \sigma^2 = p - p^2 = p(1 - p).

Ex: Suppose that p(x) = \frac{6}{\pi^2 x^2}, x = 1, 2, 3, .... Then

    \psi(t) = \sum_{x=1}^{\infty} \frac{6 e^{tx}}{\pi^2 x^2}

diverges for any t > 0, so \psi(t) does not exist for -h < t < h.

Theorem (parallel axis theorem): For any constant a,

    E[(X - a)^2] = \sigma_X^2 + (a - \mu_X)^2

Corollary: The variance is the smallest second moment. (To show this, note that the expression above is minimized when a = \mu_X.)

Theorem: When X has second moments,

    Var(X) = E[Var(X | Y)] + Var(E[X | Y])

To prove this, use the parallel axis theorem.

Def: The covariance between X and Y is

    Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y)
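The Bernoulli mgf example lends itself to a symbolic check. The sketch below (an illustration assuming SymPy is available, not part of the notes) differentiates \psi(t) = 1 + p(e^t - 1) and recovers \mu = p and \sigma^2 = p(1 - p):

```python
import sympy as sp

t, p = sp.symbols("t p")

# mgf of the Bernoulli example: psi(t) = 1 + p*(e^t - 1)
psi = 1 + p * (sp.exp(t) - 1)

m1 = sp.diff(psi, t, 1).subs(t, 0)   # first moment  E[X]   -> p
m2 = sp.diff(psi, t, 2).subs(t, 0)   # second moment E[X^2] -> p
sigma2 = sp.factor(m2 - m1**2)       # E[X^2] - E[X]^2, i.e. p*(1 - p)

print(m1, m2, sigma2)
```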
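Finally, the decomposition Var(X) = E[Var(X | Y)] + Var(E[X | Y]) can be verified by brute force on the three-toss sample space used earlier, taking Y to be the first toss; all names in this sketch are illustrative:

```python
from itertools import product

# 3 fair tosses; X = total number of heads, Y = result of the first toss.
outcomes = [tuple(int(c == "H") for c in w) for w in product("HT", repeat=3)]
P = 1 / len(outcomes)              # uniform probability of each outcome

X = lambda w: sum(w)
Y = lambda w: w[0]

EX = sum(X(w) * P for w in outcomes)
var_X = sum((X(w) - EX) ** 2 * P for w in outcomes)   # Var(X) = 0.75

def cond(y):
    """E[X | Y = y] and Var(X | Y = y), by restricting the sample space."""
    sub = [w for w in outcomes if Y(w) == y]
    m = sum(X(w) for w in sub) / len(sub)
    v = sum((X(w) - m) ** 2 for w in sub) / len(sub)
    return m, v

p_y = {y: sum(Y(w) == y for w in outcomes) * P for y in (0, 1)}
e_var = sum(cond(y)[1] * p_y[y] for y in (0, 1))      # E[Var(X|Y)] = 0.5
mu_y = {y: cond(y)[0] for y in (0, 1)}                # E[X|Y=y] = y + 1
mu = sum(mu_y[y] * p_y[y] for y in (0, 1))
var_mu = sum((mu_y[y] - mu) ** 2 * p_y[y] for y in (0, 1))  # Var(E[X|Y]) = 0.25

print(var_X, e_var + var_mu)   # both print 0.75
```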