October 21, 2002
Expected Values
Notes for Math 295
1. Expected Values
Intuitively, the expected value E(X) of a random variable X is the average value of X,
in this sense: If we repeat the relevant experiment many times, and compute the “realized”
value of X each time, and compute the average of the realized values of X, the result will be
close to E(X). Furthermore, if we keep repeating the experiment, the successive averages (for
increasing numbers of trials) will converge to E(X).
This won’t work as a mathematical definition, since it assumes too much. What if we
can’t actually repeat the experiment? What if we do repeat it, but the averages don’t
converge?
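To see this intuition concretely, here is a minimal simulation sketch (the function name, seed, and trial counts are my own choices): roll a fair die repeatedly and watch the running average of the outcomes drift toward E(X) = 3.5.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def running_average(n):
    """Roll a fair die n times and return the average of the outcomes."""
    total = 0
    for _ in range(n):
        total += random.randint(1, 6)
    return total / n

# The averages for increasing numbers of trials should approach 3.5.
for n in (100, 10_000, 1_000_000):
    print(n, running_average(n))
```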
We will give a more formal mathematical definition. Our plan is to show, first intuitively and later on formally, that our mathematical definition matches the intuitive one.
We begin with a definition for a discrete random variable X. (Recall X is discrete if it
can take on only finitely many or countably many values.)
Definition. Let X be a discrete random variable. Then the expected value of X is the
number given by
E(X)   a  P(X  a)
a
  a  p X (a)
(1)
a
where the sum is taken over all possible values a of X, provided that this sum is absolutely
convergent.
(Please ignore that “provided” clause for the moment. There is a whole section about
it, below.)
This definition gives the expected value in terms of the probability function p_X (which means, in terms of P(X = a) for the various values of a). Equation (1) is the best way to compute E(X) if you have already computed p_X. Just add a column to the table you already have, and compute the sum of the new column.
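Formula (1) translates directly into code. A small sketch (the dictionary layout and function name are mine): store the table as value → probability, then sum the products a · P(X = a).

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of a * P(X = a) over the rows of the probability table."""
    return sum(a * p for a, p in pmf.items())

# The pmf of one fair die (as in Example 1 below).
die = {a: Fraction(1, 6) for a in range(1, 7)}
print(expected_value(die))  # 7/2
```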
Example 1. The experiment is to roll one die. Let X be the number on top: 1, 2, 3, 4,
5, or 6, with uniform probabilities. Then the expected value of X is
E(X)   a  P(X  a)  1 16  2  16  3  16  4  16  5  16  6  16  3 12 .
a
Since the probabilities are all the same, it would have been easier to write
E(X) = (1+2+3+4+5+6) / 6 = 3 1/2
with the same result. This matches our intuition for the average result of rolling one die.
Example 2. The experiment is to flip three coins. Let X be the number of heads
showing. In an earlier exercise we found the probability function for X:
a         0     1     2     3
P(X = a)  1/8   3/8   3/8   1/8
One way to calculate the expected value is to add a column to the table as follows.
a            0     1     2     3
P(X = a)     1/8   3/8   3/8   1/8
a·P(X = a)   0     3/8   6/8   3/8

The sum of the new column is 12/8 = 1 1/2 = E(X).
Once again, this matches our intuition. On average, when we flip three coins, we get one and
one half heads.
Note: Since E(X) isn’t a probability, it is not necessary that E(X) ≤ 1. Our examples
show this. It is also possible for E(X) to be negative. There are plenty of examples of
negative expected values in any casino! Also see Examples 5 and 6 below.
Note also: It is possible that E(X) has a value which is not one of the possible values
of X. This happened in both Examples 1 and 2. The classic case is the average family with
1.5 children.
Always remember: In order for E(X) to make sense, X must be a random variable.
E(X) is always a number. Question: Can we make sense of an expression like E(15) or
E(E(X))? Well, yes, if we treat “15” in this context as the name for a “random” variable that
happens always to be equal to the number 15, regardless of the outcome of the experiment.
Of course, E(15) = 15. To make sense of E(E(X)), we have to treat the inner E(X) as a random variable in the same way; but since it is really a number like 15, we always have E(E(X)) = E(X).
Example 3. The experiment is to roll two dice. Let X = sum of the two numbers
showing. In this case the possible values of X are 2, 3, …, up to 12, and the slow way to
compute E(X) is from the probability function:
a            2      3      4      5      6      7      8      9      10     11     12
P(X = a)     1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
a·P(X = a)   2/36   6/36   12/36  20/36  30/36  42/36  40/36  36/36  30/36  22/36  12/36

The sum of the new column is 252/36, so E(X) = 7.
But there’s an easier way. Define two other random variables:
X1 = number on first die
X2 = number on second die
Now we have: X = X1 + X2. Also, we know from Example 1 that E(X1) = 3.5 and
E(X2) = 3.5. So, we can apply the principle we are about to learn, that the expected value of
the sum of random variables is the sum of their expected values:
E(X) = E(X1 + X2) = E(X1) + E(X2) = 3.5 + 3.5 = 7.
Again, this matches intuition.
(That principle would have saved some work in Example 2, also.)
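As a quick check of that principle (a sketch, with names of my own choosing): compute E(X1 + X2) directly over all 36 equally likely pairs, and compare it with E(X1) + E(X2).

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 6)

# Direct computation of E(X1 + X2) over all 36 equally likely (d1, d2) pairs.
e_sum = sum((d1 + d2) * p * p for d1, d2 in product(range(1, 7), repeat=2))

# E(X1) for a single die, as in Example 1.
e_one = sum(d * p for d in range(1, 7))

print(e_sum, e_one + e_one)  # 7 7
```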
Example 4. (Optional) Let’s try an example with a countably infinite number of
possible values of X. The real point of this example is a trick for summing an infinite series.
The experiment is to roll a die repeatedly until we get a six. Let X be the number of
rolls required. Then the possible values of X are 1, 2, 3, 4, … and we have seen earlier that
n 1
 1  5 
, for each integer n  1.
P(X  n)    
 6  6 
The expected value is therefore
E(X)  (1)  16   (2)  16  65   (3)  16  65   (4)  16  65  
2
3
(2)
This looks like a geometric series, but it is not a geometric series. Each term gains an extra
factor of 5/6, as in a geometric series, but the multipliers 1, 2, 3, 4 etc. keep it from fitting the
pattern. So how can we sum the series?
This is a case in which the formula for a geometric series is useless, but the method I keep urging on you works like a charm. Since 5/6 seems to play a significant role, let’s see what happens when we multiply equation (2), term by term, by 5/6:
 56  E(X)  (1)  16  56   (2)  16  65 
2
 (3)  16  65   (4)  16  65  
3
4
(3)
Now we subtract (3) from (2). It would be nice if all the terms would cancel, but that doesn’t
happen. Instead, we get (subtracting the first term of (3) from the second term of (2), etc.)
 16  E(X)  (1)  16   (1)  16  65   (1)  16  65 
2
 (1)  16  65  
3
(4)
This is a geometric series. So, we can use the formula. Or, better still, use the method again.
Multiply (4) by 5/6:
 56  16  E(X)  (1)  16  65   (1)  16  65 
2
 (1)  16  65   (1)  16  65  
3
4
(5)
Subtract (5) from (4):
 16  16  E(X)  (1)  16  .
(6)
Finally, that’s all we need to determine that
E(X) = 6.     (7)
On average, if you set out to roll a six, it will take you 6 tries.
Does that method look hard? It’s easier than learning the formula for geometric series
(which isn’t actually very hard). If you have an infinite sum, just look for a multiplier that
seems important, multiply your series by that number, subtract from your original series, and
see if anything good happens. That’s really easy. And as we have just seen, sometimes it
works even when the series in question isn’t geometric.
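If you would rather check (7) numerically, the partial sums of series (2) are easy to compute (a sketch; the cutoffs are arbitrary choices of mine):

```python
def partial_sum(terms):
    """Partial sum of series (2): n * (1/6) * (5/6)**(n-1) for n = 1..terms."""
    return sum(n * (1 / 6) * (5 / 6) ** (n - 1) for n in range(1, terms + 1))

# The partial sums approach 6, the value found by the multiply-and-subtract trick.
for terms in (10, 50, 200):
    print(terms, partial_sum(terms))
```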
Example 5. The experiment is to spin a standard roulette wheel, and bet the table
minimum, ten dollars, on red. Let X be my profit in dollars.
Now X has two possible values: +10 and −10. Since there are 18 red numbers out of 38 numbers on the wheel, we have P(X = +10) = 18/38, and P(X = −10) = 20/38. So we have:
E(X) = (10)(18/38) + (−10)(20/38) = −10/19 ≈ −0.526.
On average, I lose a bit over fifty cents every time I place the minimum bet on red.
Example 6. The experiment is to spin a standard roulette wheel, and bet ten dollars
on the single number 17 (for which the house offers 35-to-1 odds). Let X be my profit in dollars. Now the possible values of X are +350 and −10, and the probabilities are P(X = +350) = 1/38 and P(X = −10) = 37/38.
Again, we calculate
E(X) = −10/19 ≈ −0.526.
That’s the same expected value as for betting on red. It turns out that in an American
roulette game, the expected profit on every $10 bet is the same—even if you are allowed to
divide your $10 among several numbers. (There are minor exceptions, namely the 6-to-1
bets.)
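Both computations fit in a few lines; a sketch using exact fractions (variable names mine):

```python
from fractions import Fraction

# $10 on red: win $10 with probability 18/38, lose $10 with probability 20/38.
red = 10 * Fraction(18, 38) + (-10) * Fraction(20, 38)

# $10 on a single number at 35-to-1: win $350 with probability 1/38.
single = 350 * Fraction(1, 38) + (-10) * Fraction(37, 38)

print(red, single)  # -10/19 -10/19
print(float(red))   # about -0.526
```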
2. Does every random variable have an expected value?
Why did we need that “provided” clause in the definition of expected value? Is it
possible that the sum in the definition fails to converge, or fails to converge absolutely?
If X has finitely many possible values, the sum is a finite sum. So, it converges
automatically and E(X) always exists.
But if X has infinitely many possible values, it is possible that the sum diverges to
infinity. A famous example is the “St. Petersburg Paradox.” A coin is flipped until it comes
up heads. You win $2 if it happens on the first flip, $4 if it happens on the second flip, $8 if it happens on the third flip, etc. (In general, 2^n dollars if the experiment takes n flips.) If you try computing the expected value, the result of the sum is +infinity. (How much would you pay to play that game? Would your answer depend on how much money there is in the world? If you find this example interesting, take a course in game theory.)
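Here is a sketch of that divergent sum: the n-th term is (2^n dollars) · (probability 1/2^n) = 1, so the partial sums simply count the terms and grow without bound.

```python
def partial_expectation(n_terms):
    """Partial sums of the St. Petersburg series: sum of 2**n * (1/2)**n."""
    return sum(2 ** n * (1 / 2) ** n for n in range(1, n_terms + 1))

for n in (10, 100, 1000):
    print(n, partial_expectation(n))  # each partial sum equals n: no limit exists
```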
It is also possible for something even worse to happen. The following example will
never come up again in Math 295. It isn’t in any homework or on any test, and you might
never see this phenomenon again, but you’re entitled to fair warning.
Suppose X takes on both positive and negative values. In particular, suppose that the
pmf for X is as follows:
P(X = +1) = C
P(X = -1) = C
P(X = +2) = C/4
P(X = -2) = C/4
P(X = +3) = C/9
P(X = -3) = C/9
In general, if n isn’t zero, P(X = n) = C/n2.
Here C is chosen to make the sum of the probabilities equal 1. (In fact, C = 3/π², since the positive values and the negative values each contribute C · Σ 1/n² = C · π²/6.)
In this case the sum that would give E(X) is conditionally convergent. If you calculate
the sum in the order that I have given you the terms, the sum is zero. But if you put the terms
in some other order, you might get some other sum. In fact, you can get any sum you want,
including infinity or minus infinity, by putting the terms in an appropriate order. We don’t
want this kind of chaos in Math 295. So, we say that if the sum that defines E(X) is not
absolutely convergent, the expected value of X does not exist.
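To see the rearrangement effect numerically (a sketch of my own; C only scales the sums, so I take C = 1): in the order given above, the terms n · (C/n²) = C/n cancel in pairs, but an order that takes two positive terms for every negative term drifts to a different limit.

```python
C = 1.0  # the normalizing constant only scales the sums

def paired_order(n_pairs):
    """Partial sums in the given order: (+C/n) + (-C/n) for n = 1, 2, ..."""
    return sum(C / n - C / n for n in range(1, n_pairs + 1))

def two_pos_one_neg(n_triples):
    """Partial sums taking two positive terms for every negative term."""
    total, pos, neg = 0.0, 1, 1
    for _ in range(n_triples):
        total += C / pos + C / (pos + 1) - C / neg
        pos += 2
        neg += 1
    return total

print(paired_order(10_000))      # 0.0
print(two_pos_one_neg(10_000))   # near ln 2, about 0.693: a different "sum"
```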
If you are very brave, you might try to simulate this experiment. If you do, you will
find that the average value of X does not converge. It just keeps jumping around, even if you
run billions of trials.
Fortunately, examples like this are rare in applications.
3. Expected value directly from outcomes
In practice, you have probably calculated expected value once using the definition
above, and several times using a somewhat different formula:
E(X)   X(s)  P(s)
(8)
s
Here the sum is over all outcomes in the sample space.
Of course, this formula only works if the sample space is finite or countably infinite,
and if the sum in (8) converges absolutely. Maybe it is obvious that formulas (1) and (8) give
the same result. If it isn’t obvious, here’s a proof:
Σ_s X(s) · P(s) = Σ_a [ Σ_{s : X(s) = a} X(s) · P(s) ]
    (because those are the same terms, just collected into groups)
  = Σ_a [ Σ_{s : X(s) = a} a · P(s) ]     (since X(s) = a for every s in the inner sum)
  = Σ_a a · [ Σ_{s : X(s) = a} P(s) ]
  = Σ_a a · P(X = a)
    (because that’s the definition of P(event) when S is discrete)

Here each Σ_a runs over all possible values a of X.
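The grouping argument can be verified on a small sample space, for instance the three coins of Example 2 (a sketch; all names are mine):

```python
from fractions import Fraction
from itertools import product

# Sample space: the 8 equally likely outcomes of three coin flips.
outcomes = list(product("HT", repeat=3))
P = {s: Fraction(1, 8) for s in outcomes}
X = {s: s.count("H") for s in outcomes}          # X = number of heads

# Formula (8): sum over outcomes s.
by_outcomes = sum(X[s] * P[s] for s in outcomes)

# Formula (1): sum over values a, grouping the outcomes with X(s) = a.
by_values = sum(a * sum(P[s] for s in outcomes if X[s] == a)
                for a in set(X.values()))

print(by_outcomes, by_values)  # 3/2 3/2
```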
4. Expected value of a function of a random variable
Suppose X is a discrete random variable, and Y is another random variable that is
defined in terms of X, say by
Y = h(X).
That means that for every s, the value Y(s) is related to the value X(s) by
Y(s) = h ( X(s) ),
where h is some function.
If you want to calculate E(Y), you could build a probability function pY for Y and
calculate E(Y) from Equation (1). That may not be too hard, especially if you already have a
probability function for X. But you can speed up the process even more by using the formula
E(Y)   h(a)  P(X  a)
(9)
a
The significance of this formula is that you are still summing over the possible values
a of X, and you are still using the probability function for X, but you are substituting the
corresponding values of Y for the values of X in the multiplications (that is, h(a) in place of
a).
Example: Roll one die. Let X = number on top, as in example 1. Let Y = the square
of the number on top. Thus, Y = X2. Or, Y = h(X) where h(a) means a2.
In this case, we have:
E(Y)   h(a)  P(X  a)
a
 (1)(1/ 6)  (4)(1/ 6)  (9)(1/ 6)  (16)(1/ 6)  (25)(1/ 6)  (36)(1/ 6)
 91/ 6  15.167.
Note that E(Y) = E(X2) is not the same as (E(X))2.
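Formula (9) for this example, in code (a sketch using exact fractions):

```python
from fractions import Fraction

p = Fraction(1, 6)

# E(Y) = E(X^2) via formula (9): sum of h(a) * P(X = a) with h(a) = a^2.
e_y = sum(a * a * p for a in range(1, 7))

# (E(X))^2, for comparison.
e_x = sum(a * p for a in range(1, 7))

print(e_y)       # 91/6
print(e_x ** 2)  # 49/4 -- not the same number
```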
5. What about random variables that are not discrete?
We have defined E(X) when X is a discrete random variable. What if X is not
discrete?
If X isn’t discrete, then it has no p_X (at least, no useful p_X). But every random variable has a cdf, defined by F(a) = P(X ≤ a). Can we define E(X) in terms of the cdf?
Definition. If X is any random variable and F is its cumulative distribution function,
then
E(X) = ∫_0^∞ [1 − F(x)] dx − ∫_{−∞}^0 F(x) dx     (10)
provided both integrals exist.
This definition has a geometric interpretation. It says that E(X) is the combination of
the two shaded areas in this figure. The right-hand area counts as positive, and the left-hand
area counts as negative.
[Figure: the graph of the cdf F(a), with the area between F and 1 to the right of a = 0 shaded and marked +, and the area under F to the left of a = 0 shaded and marked −.]
(For random variables that can only take positive values, this definition is simpler. In
this case F(a)=0 whenever a<0. This means that the second integral in (10) is zero, and the
left-hand area in the figure vanishes.)
Before we can accept this definition for general random variables, we should check
that it agrees with our earlier definition in the case of discrete random variables. This is a
fairly large project, and we will skip it. In fact, the definitions do agree.
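For one quick sanity check (a sketch of mine, using Example 1): the cdf of a fair die is a step function, constant on each interval [k, k+1), so the area of 1 − F over [0, ∞) is a finite sum of unit-width rectangles.

```python
from fractions import Fraction

# For a fair die, F(x) = k/6 on [k, k+1) for k = 0,...,5, and F(x) = 1 for x >= 6.
# The integral of 1 - F over [0, infinity) is therefore a sum of six rectangles.
area = sum(1 - Fraction(k, 6) for k in range(6))

print(area)  # 7/2, agreeing with E(X) = 3.5 from Example 1
```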
I haven’t seen this general definition in any probability texts. (Maybe it’s there and I
just missed it.) At any rate, it is rarely used. We probably won’t return to it in this class
(hence, no homework or exams on this section).
The reason this definition isn’t used is that we will discover still another definition
later on, in the case of continuous random variables (random variables with density
functions). Between them, the discrete-case definition and the density-function definition
cover most applications. But they don’t cover all possibilities. I offer you this general
definition so that you will be confident that E(X) is a very general concept, and not limited to
special cases.
Optional Exercise. Construct cdf’s for the random variables in examples 2 and 6,
above, and show that in these cases, the definition of E(X) in this section agrees with the
definition of E(X) in Section 1.
(end)