October 21, 2002
Expected Values
Notes for Math 295
1. Expected Values
Intuitively, the expected value E(X) of a random variable X is the average value of X,
in this sense: If we repeat the relevant experiment many times, and compute the “realized”
value of X each time, and compute the average of the realized values of X, the result will be
close to E(X). Furthermore, if we keep repeating the experiment, the successive averages (for
increasing numbers of trials) will converge to E(X).
This won’t work as a mathematical definition, since it assumes too much. What if we
can’t actually repeat the experiment? What if we do repeat it, but the averages don’t
converge?
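To see this intuition concretely, here is a minimal simulation sketch (the function name, seed, and trial counts are my own choices): roll a fair die repeatedly and watch the running average of the outcomes drift toward E(X) = 3.5.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def running_average(n):
    """Roll a fair die n times and return the average of the outcomes."""
    total = 0
    for _ in range(n):
        total += random.randint(1, 6)
    return total / n

# The averages for increasing numbers of trials should approach 3.5.
for n in (100, 10_000, 1_000_000):
    print(n, running_average(n))
```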
We will give a more formal mathematical definition. Our plan is to show, first intuitively and later on formally, that our mathematical definition matches the intuitive one.
We begin with a definition for a discrete random variable X. (Recall X is discrete if it
can take on only finitely many or countably many values.)
Definition. Let X be a discrete random variable. Then the expected value of X is the
number given by
E(X)   a  P(X  a)
a
  a  p X (a)
(1)
a
where the sum is taken over all possible values a of X, provided that this sum is absolutely
convergent.
(Please ignore that “provided” clause for the moment. There is a whole section about
it, below.)
This definition gives the expected value in terms of the probability function p_X (which means, in terms of P(X = a) for the various values of a). Equation (1) is the best way to compute E(X) if you have already computed p_X. Just add a column to the table you already have, and compute the sum of the new column.
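Formula (1) translates directly into code. A small sketch (the dictionary layout and function name are mine): store the table as value → probability, then sum the products a · P(X = a).

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of a * P(X = a) over the rows of the probability table."""
    return sum(a * p for a, p in pmf.items())

# The pmf of one fair die (as in Example 1 below).
die = {a: Fraction(1, 6) for a in range(1, 7)}
print(expected_value(die))  # 7/2
```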
Example 1. The experiment is to roll one die. Let X be the number on top: 1, 2, 3, 4,
5, or 6, with uniform probabilities. Then the expected value of X is
E(X)   a  P(X  a)  1 16  2  16  3  16  4  16  5  16  6  16  3 12 .
a
Since the probabilities are all the same, it would have been easier to write
E(X) = (1+2+3+4+5+6) / 6 = 3 1/2
with the same result. This matches our intuition for the average result of rolling one die.
Example 2. The experiment is to flip three coins. Let X be the number of heads
showing. In an earlier exercise we found the probability function for X:
a         0     1     2     3
P(X = a)  1/8   3/8   3/8   1/8
One way to calculate the expected value is to add a column to the table as follows.
a            0     1     2     3
P(X = a)     1/8   3/8   3/8   1/8
a·P(X = a)   0     3/8   6/8   3/8

The sum of the new column is 12/8 = 1 1/2 = E(X).
Once again, this matches our intuition. On average, when we flip three coins, we get one and
one half heads.
Note: Since E(X) isn’t a probability, it is not necessary that E(X) ≤ 1. Our examples
show this. It is also possible for E(X) to be negative. There are plenty of examples of
negative expected values in any casino! Also see Examples 5 and 6 below.
Note also: It is possible that E(X) has a value which is not one of the possible values
of X. This happened in both Examples 1 and 2. The classic case is the average family with
1.5 children.
Always remember: In order for E(X) to make sense, X must be a random variable.
E(X) is always a number. Question: Can we make sense of an expression like E(15) or
E(E(X))? Well, yes, if we treat “15” in this context as the name for a “random” variable that
happens always to be equal to the number 15, regardless of the outcome of the experiment.
Of course, E(15) = 15. To make sense of E(E(X)), we have to treat the inner E(X) as a random variable in the same way; but since it is really a number like 15, we always have E(E(X)) = E(X).
Example 3. The experiment is to roll two dice. Let X = sum of the two numbers
showing. In this case the possible values of X are 2, 3, …, up to 12, and the slow way to
compute E(X) is from the probability function:
a            2      3      4      5      6      7      8      9      10     11     12
P(X = a)     1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
a·P(X = a)   2/36   6/36   12/36  20/36  30/36  42/36  40/36  36/36  30/36  22/36  12/36

The sum of the new column is 252/36, so E(X) = 7.
But there’s an easier way. Define two other random variables:
X1 = number on first die
X2 = number on second die
Now we have: X = X1 + X2. Also, we know from Example 1 that E(X1) = 3.5 and
E(X2) = 3.5. So, we can apply the principle we are about to learn, that the expected value of
the sum of random variables is the sum of their expected values:
E(X) = E(X1 + X2) = E(X1) + E(X2) = 3.5 + 3.5 = 7.
Again, this matches intuition.
(That principle would have saved some work in Example 2, also.)
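As a quick check of that principle (a sketch, with names of my own choosing): compute E(X1 + X2) directly over all 36 equally likely pairs, and compare it with E(X1) + E(X2).

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 6)

# Direct computation of E(X1 + X2) over all 36 equally likely (d1, d2) pairs.
e_sum = sum((d1 + d2) * p * p for d1, d2 in product(range(1, 7), repeat=2))

# E(X1) for a single die, as in Example 1.
e_one = sum(d * p for d in range(1, 7))

print(e_sum, e_one + e_one)  # 7 7
```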
Example 4. (Optional) Let’s try an example with a countably infinite number of
possible values of X. The real point of this example is a trick for summing an infinite series.
The experiment is to roll a die repeatedly until we get a six. Let X be the number of
rolls required. Then the possible values of X are 1, 2, 3, 4, … and we have seen earlier that
n 1
 1  5 
, for each integer n  1.
P(X  n)    
 6  6 
The expected value is therefore
E(X)  (1)  16   (2)  16  65   (3)  16  65   (4)  16  65  
2
3
(2)
This looks like a geometric series, but it is not a geometric series. Each term gains an extra
factor of 5/6, as in a geometric series, but the multipliers 1, 2, 3, 4 etc. keep it from fitting the
pattern. So how can we sum the series?
This is a case in which the formula for a geometric series is useless, but the method I keep urging on you works like a charm. Since 5/6 seems to play a significant role, let’s see what happens when we multiply equation (2), term by term, by 5/6:
 56  E(X)  (1)  16  56   (2)  16  65 
2
 (3)  16  65   (4)  16  65  
3
4
(3)
Now we subtract (3) from (2). It would be nice if all the terms would cancel, but that doesn’t
happen. Instead, we get (subtracting the first term of (3) from the second term of (2), etc.)
 16  E(X)  (1)  16   (1)  16  65   (1)  16  65 
2
 (1)  16  65  
3
(4)
This is a geometric series. So, we can use the formula. Or, better still, use the method again.
Multiply (4) by 5/6:
 56  16  E(X)  (1)  16  65   (1)  16  65 
2
 (1)  16  65   (1)  16  65  
3
4
(5)
Subtract (5) from (4):
 16  16  E(X)  (1)  16  .
(6)
Finally, that’s all we need to determine that
E(X) = 6.     (7)
On average, if you set out to roll a six, it will take you 6 tries.
Does that method look hard? It’s easier than learning the formula for geometric series
(which isn’t actually very hard). If you have an infinite sum, just look for a multiplier that
seems important, multiply your series by that number, subtract from your original series, and
see if anything good happens. That’s really easy. And as we have just seen, sometimes it
works even when the series in question isn’t geometric.
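If you would rather check (7) numerically, the partial sums of series (2) are easy to compute (a sketch; the cutoffs are arbitrary choices of mine):

```python
def partial_sum(terms):
    """Partial sum of series (2): n * (1/6) * (5/6)**(n-1) for n = 1..terms."""
    return sum(n * (1 / 6) * (5 / 6) ** (n - 1) for n in range(1, terms + 1))

# The partial sums approach 6, the value found by the multiply-and-subtract trick.
for terms in (10, 50, 200):
    print(terms, partial_sum(terms))
```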
Example 5. The experiment is to spin a standard roulette wheel, and bet the table
minimum, ten dollars, on red. Let X be my profit in dollars.
Now X has two possible values: +10 and −10. Since there are 18 red numbers out of 38 numbers on the wheel, we have P(X = +10) = 18/38, and P(X = −10) = 20/38. So we have:
E(X) = (10)(18/38) + (−10)(20/38) = −10/19 ≈ −0.526.
On average, I lose a bit over fifty cents every time I place the minimum bet on red.
Example 6. The experiment is to spin a standard roulette wheel, and bet ten dollars
on the single number 17 (for which the house offers 35-to-1 odds). Let X be my profit in dollars. Now the possible values of X are +350 and −10, and the probabilities are P(X = +350) = 1/38 and P(X = −10) = 37/38.
Again, we calculate
E(X) = −10/19 ≈ −0.526.
That’s the same expected value as for betting on red. It turns out that in an American
roulette game, the expected profit on every $10 bet is the same—even if you are allowed to
divide your $10 among several numbers. (There are minor exceptions, namely the 6-to-1
bets.)
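Both computations fit in a few lines; a sketch using exact fractions (variable names mine):

```python
from fractions import Fraction

# $10 on red: win $10 with probability 18/38, lose $10 with probability 20/38.
red = 10 * Fraction(18, 38) + (-10) * Fraction(20, 38)

# $10 on a single number at 35-to-1: win $350 with probability 1/38.
single = 350 * Fraction(1, 38) + (-10) * Fraction(37, 38)

print(red, single)  # -10/19 -10/19
print(float(red))   # about -0.526
```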
2. Does every random variable have an expected value?
Why did we need that “provided” clause in the definition of expected value? Is it
possible that the sum in the definition fails to converge, or fails to converge absolutely?
If X has finitely many possible values, the sum is a finite sum. So, it converges
automatically and E(X) always exists.
But if X has infinitely many possible values, it is possible that the sum diverges to
infinity. A famous example is the “St. Petersburg Paradox.” A coin is flipped until it comes
up heads. You win $2 if it happens on the first flip, $4 if it happens on the second flip, $8 if it happens on the third flip, etc. (In general, 2^n dollars if the experiment takes n flips.) If you try computing the expected value, the result of the sum is +infinity. (How much would you pay to play that game? Would your answer depend on how much money there is in the world? If you find this example interesting, take a course in game theory.)
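Here is a sketch of that divergent sum: the n-th term is (2^n dollars) · (probability 1/2^n) = 1, so the partial sums simply count the terms and grow without bound.

```python
def partial_expectation(n_terms):
    """Partial sums of the St. Petersburg series: sum of 2**n * (1/2)**n."""
    return sum(2 ** n * (1 / 2) ** n for n in range(1, n_terms + 1))

for n in (10, 100, 1000):
    print(n, partial_expectation(n))  # each partial sum equals n: no limit exists
```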
It is also possible for something even worse to happen. The following example will
never come up again in Math 295. It isn’t in any homework or on any test, and you might
never see this phenomenon again, but you’re entitled to fair warning.
Suppose X takes on both positive and negative values. In particular, suppose that the
pmf for X is as follows:
P(X = +1) = C
P(X = -1) = C
P(X = +2) = C/4
P(X = -2) = C/4
P(X = +3) = C/9
P(X = -3) = C/9
In general, if n isn’t zero, P(X = n) = C/n2.
Here C is chosen to make the sum of the probabilities equal 1. (In fact, C = 3/π², since the positive values and the negative values each contribute C · Σ 1/n² = C · π²/6.)
In this case the sum that would give E(X) is conditionally convergent. If you calculate
the sum in the order that I have given you the terms, the sum is zero. But if you put the terms
in some other order, you might get some other sum. In fact, you can get any sum you want,
including infinity or minus infinity, by putting the terms in an appropriate order. We don’t
want this kind of chaos in Math 295. So, we say that if the sum that defines E(X) is not
absolutely convergent, the expected value of X does not exist.
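To see the rearrangement effect numerically (a sketch of my own; C only scales the sums, so I take C = 1): in the order given above, the terms n · (C/n²) = C/n cancel in pairs, but an order that takes two positive terms for every negative term drifts to a different limit.

```python
C = 1.0  # the normalizing constant only scales the sums

def paired_order(n_pairs):
    """Partial sums in the given order: (+C/n) + (-C/n) for n = 1, 2, ..."""
    return sum(C / n - C / n for n in range(1, n_pairs + 1))

def two_pos_one_neg(n_triples):
    """Partial sums taking two positive terms for every negative term."""
    total, pos, neg = 0.0, 1, 1
    for _ in range(n_triples):
        total += C / pos + C / (pos + 1) - C / neg
        pos += 2
        neg += 1
    return total

print(paired_order(10_000))      # 0.0
print(two_pos_one_neg(10_000))   # near ln 2, about 0.693: a different "sum"
```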
If you are very brave, you might try to simulate this experiment. If you do, you will
find that the average value of X does not converge. It just keeps jumping around, even if you
run billions of trials.
Fortunately, examples like this are rare in applications.
3. Expected value directly from outcomes
In practice, you have probably calculated expected value once using the definition
above, and several times using a somewhat different formula:
E(X)   X(s)  P(s)
(8)
s
Here the sum is over all outcomes in the sample space.
Of course, this formula only works if the sample space is finite or countably infinite,
and if the sum in (8) converges absolutely. Maybe it is obvious that formulas (1) and (8) give
the same result. If it isn’t obvious, here’s a proof:
Σ_s X(s) · P(s) = Σ_a [ Σ_{s : X(s) = a} X(s) · P(s) ]
    (because those are the same terms, just collected into groups)
  = Σ_a [ Σ_{s : X(s) = a} a · P(s) ]     (since X(s) = a for every s in the inner sum)
  = Σ_a a · [ Σ_{s : X(s) = a} P(s) ]
  = Σ_a a · P(X = a)
    (because that’s the definition of P(event) when S is discrete)

Here each Σ_a runs over all possible values a of X.
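The grouping argument can be verified on a small sample space, for instance the three coins of Example 2 (a sketch; all names are mine):

```python
from fractions import Fraction
from itertools import product

# Sample space: the 8 equally likely outcomes of three coin flips.
outcomes = list(product("HT", repeat=3))
P = {s: Fraction(1, 8) for s in outcomes}
X = {s: s.count("H") for s in outcomes}          # X = number of heads

# Formula (8): sum over outcomes s.
by_outcomes = sum(X[s] * P[s] for s in outcomes)

# Formula (1): sum over values a, grouping the outcomes with X(s) = a.
by_values = sum(a * sum(P[s] for s in outcomes if X[s] == a)
                for a in set(X.values()))

print(by_outcomes, by_values)  # 3/2 3/2
```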
4. Expected value of a function of a random variable
Suppose X is a discrete random variable, and Y is another random variable that is
defined in terms of X, say by
Y = h(X).
That means that for every s, the value Y(s) is related to the value X(s) by
Y(s) = h ( X(s) ),
where h is some function.
If you want to calculate E(Y), you could build a probability function pY for Y and
calculate E(Y) from Equation (1). That may not be too hard, especially if you already have a
probability function for X. But you can speed up the process even more by using the formula
E(Y)   h(a)  P(X  a)
(9)
a
The significance of this formula is that you are still summing over the possible values
a of X, and you are still using the probability function for X, but you are substituting the
corresponding values of Y for the values of X in the multiplications (that is, h(a) in place of
a).
Example: Roll one die. Let X = number on top, as in example 1. Let Y = the square
of the number on top. Thus, Y = X2. Or, Y = h(X) where h(a) means a2.
In this case, we have:
E(Y)   h(a)  P(X  a)
a
 (1)(1/ 6)  (4)(1/ 6)  (9)(1/ 6)  (16)(1/ 6)  (25)(1/ 6)  (36)(1/ 6)
 91/ 6  15.167.
Note that E(Y) = E(X2) is not the same as (E(X))2.
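Formula (9) for this example, in code (a sketch using exact fractions):

```python
from fractions import Fraction

p = Fraction(1, 6)

# E(Y) = E(X^2) via formula (9): sum of h(a) * P(X = a) with h(a) = a^2.
e_y = sum(a * a * p for a in range(1, 7))

# (E(X))^2, for comparison.
e_x = sum(a * p for a in range(1, 7))

print(e_y)       # 91/6
print(e_x ** 2)  # 49/4 -- not the same number
```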
5. What about random variables that are not discrete?
We have defined E(X) when X is a discrete random variable. What if X is not
discrete?
If X isn’t discrete, then it has no p_X (at least, no useful p_X). But every random variable has a cdf, defined by F(a) = P(X ≤ a). Can we define E(X) in terms of the cdf?
Definition. If X is any random variable and F is its cumulative distribution function,
then
E(X) = ∫_0^∞ [1 − F(x)] dx − ∫_{−∞}^0 F(x) dx     (10)
provided both integrals exist.
This definition has a geometric interpretation. It says that E(X) is the combination of
the two shaded areas in this figure. The right-hand area counts as positive, and the left-hand
area counts as negative.
[Figure: the graph of the cdf F(a), with the area between F and 1 to the right of a = 0 shaded and marked +, and the area under F to the left of a = 0 shaded and marked −.]
(For random variables that can only take positive values, this definition is simpler. In
this case F(a)=0 whenever a<0. This means that the second integral in (10) is zero, and the
left-hand area in the figure vanishes.)
Before we can accept this definition for general random variables, we should check
that it agrees with our earlier definition in the case of discrete random variables. This is a
fairly large project, and we will skip it. In fact, the definitions do agree.
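For one quick sanity check (a sketch of mine, using Example 1): the cdf of a fair die is a step function, constant on each interval [k, k+1), so the area of 1 − F over [0, ∞) is a finite sum of unit-width rectangles.

```python
from fractions import Fraction

# For a fair die, F(x) = k/6 on [k, k+1) for k = 0,...,5, and F(x) = 1 for x >= 6.
# The integral of 1 - F over [0, infinity) is therefore a sum of six rectangles.
area = sum(1 - Fraction(k, 6) for k in range(6))

print(area)  # 7/2, agreeing with E(X) = 3.5 from Example 1
```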
I haven’t seen this general definition in any probability texts. (Maybe it’s there and I
just missed it.) At any rate, it is rarely used. We probably won’t return to it in this class
(hence, no homework or exams on this section).
The reason this definition isn’t used is that we will discover still another definition
later on, in the case of continuous random variables (random variables with density
functions). Between them, the discrete-case definition and the density-function definition
cover most applications. But they don’t cover all possibilities. I offer you this general
definition so that you will be confident that E(X) is a very general concept, and not limited to
special cases.
Optional Exercise. Construct cdf’s for the random variables in examples 2 and 6,
above, and show that in these cases, the definition of E(X) in this section agrees with the
definition of E(X) in Section 1.
(end)