7.4 Expected Value and Variance
Recall: A random variable is a function from the sample space of an experiment to the set of real numbers.
That is, a random variable assigns a real number to each possible outcome.
Expected Values
People who buy lottery tickets regularly often justify the practice by saying that, even though they know
that on average they will lose money, they are hoping for one significant gain, after which they believe they
will quit playing. Unfortunately, when people who have lost money on a string of losing lottery tickets win
some or all of it back, they generally decide to keep trying their luck instead of quitting.
The technical way to say that on average a person will lose money on the lottery is to say that the
expected value of playing the lottery is negative.
Definition 1. The expected value, also called the expectation or mean, of the random variable X on the
sample space S is equal to
E(X) = Σ_{s∈S} p(s)X(s).
The deviation of X at s ∈ S is X(s) − E(X), the difference between the value of X and the mean of X.
Remark 1. When the sample space S has n elements, S = {x1, . . . , xn}, then E(X) = Σ_{i=1}^{n} p(xi)X(xi).
Remark 2. When there are infinitely many elements of the sample space, the expectation is defined only
when the infinite series in the definition is absolutely convergent. In particular, the expectation of a random
variable on an infinite sample space is finite if it exists.
Example 1. What is the expected number of times a H comes up when a fair coin is flipped twice?
Solution. Sample space S = {HH, HT, T H, T T }. Random variable X:
X(HH) = 2
X(HT ) = X(T H) = 1
X(T T ) = 0
Because the coin is fair and the flips are independent, the probability of each outcome is 1/4. Consequently,

E(X) = (1/4)[X(HH) + X(HT) + X(TH) + X(TT)] = (1/4)(2 + 1 + 1 + 0) = 1.
Consequently, the expected number of heads that come up when a fair coin is flipped twice is 1.
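The calculation in Example 1 can be checked by enumerating the sample space directly; a minimal Python sketch:

```python
from itertools import product

# Sample space of two fair coin flips; each outcome has probability 1/4.
outcomes = list(product("HT", repeat=2))
p = 1 / len(outcomes)

# The random variable X counts the number of heads in an outcome.
def X(outcome):
    return outcome.count("H")

# E(X) = sum over the sample space of p(s) * X(s).
expected_heads = sum(p * X(s) for s in outcomes)
print(expected_heads)  # 1.0
```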
Notation. If X is a random variable on a sample space S, let p(X = r) be the probability that X = r, that
is,
p(X = r) = Σ_{s∈S, X(s)=r} p(s).
Example 2. Suppose that 500,000 people pay $5 each to play a lottery game with the following prizes: a
grand prize of $1,000,000, 10 second prizes of $1,000 each, 1,000 third prizes of $500 each, and 10,000 fourth
prizes of $10 each. What is the expected value of a ticket?
Solution. Each of the 500,000 lottery tickets has the same chance as any other of containing a winning lottery number, and so p(xk) = 1/500000 for all k = 1, 2, 3, . . . , 500000. Let x1, x2, x3, . . . , x500000 be the net gain for an individual ticket, where x1 = 999995 (the net gain for the grand prize ticket, which is one million dollars minus the $5 cost of the winning ticket), x2 = x3 = · · · = x11 = 995 (the net gain for each of the 10 second prize tickets), x12 = x13 = · · · = x1011 = 495 (the net gain for each of the 1,000 third prize tickets), and x1012 = x1013 = · · · = x11011 = 5 (the net gain for each of the 10,000 fourth prize tickets). Since the remaining 488,989 tickets just lose $5, x11012 = x11013 = · · · = x500000 = −5.
The expected value of a ticket is therefore

Σ_{i=1}^{500000} xi·p(xi) = (1/500000) Σ_{i=1}^{500000} xi
= (1/500000)(999995 + 10·995 + 1000·495 + 10000·5 + 488989·(−5))
= (1/500000)(999995 + 9950 + 495000 + 50000 − 2444945)
= −1.78.
In other words, a person who continues to play this lottery for a very long time will probably win some
money occasionally but on average will lose $1.78 per ticket.
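The lottery arithmetic above can be verified with a short Python sketch that tabulates each net gain together with how many tickets receive it:

```python
# Net gains and ticket counts from the lottery example
# (500,000 tickets sold at $5 each).
prizes = [
    (999995, 1),       # grand prize: $1,000,000 minus the $5 ticket price
    (995, 10),         # second prizes
    (495, 1000),       # third prizes
    (5, 10000),        # fourth prizes
    (-5, 488989),      # losing tickets
]
n_tickets = sum(count for _, count in prizes)  # 500000

# E(X) = sum of x_i * p(x_i), where each ticket has probability 1/500000.
expected_gain = sum(gain * count for gain, count in prizes) / n_tickets
print(expected_gain)  # -1.78
```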
Lemma. If X is a random variable, then

E(X) = Σ_{r∈X(S)} p(X = r)·r.
Example 3. What is the expected value of the sum of the numbers that appear when a pair of fair dice is
rolled?
Solution. Let X be the random variable equal to the sum of the numbers that appear when a pair of dice is
rolled. The range of X is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. We have
p(X = 2) = p(X = 12) = 1/36
p(X = 3) = p(X = 11) = 2/36
p(X = 4) = p(X = 10) = 3/36
p(X = 5) = p(X = 9) = 4/36
p(X = 6) = p(X = 8) = 5/36
p(X = 7) = 6/36
Thus

E(X) = 2·(1/36) + 3·(2/36) + 4·(3/36) + 5·(4/36) + 6·(5/36) + 7·(6/36) + 8·(5/36) + 9·(4/36) + 10·(3/36) + 11·(2/36) + 12·(1/36)
= (2·1 + 3·2 + · · · + 12·1)/36
= 252/36
= 7.
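The dice calculation can be reproduced exactly (no floating point) by grouping the 36 outcomes by their sum, mirroring the lemma E(X) = Σ_r p(X = r)·r:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# All 36 equally likely outcomes of rolling a pair of fair dice.
rolls = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(rolls))

# Group outcomes by the value r of the sum, then apply E(X) = sum of r * p(X = r).
counts = Counter(a + b for a, b in rolls)
expectation = sum(r * c * p for r, c in counts.items())
print(expectation)  # 7
```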
Linearity of Expectations
Recall that if f1 , f2 are functions from A to R, we may add and multiply them:
(f1 + f2 )(x) = f1 (x) + f2 (x)
(f1 f2 )(x) = f1 (x) · f2 (x)
So if X1 and X2 are random variables with sample space S, we may add or multiply them to obtain new
random variables.
Theorem 1 (Expectation is Linear). If Xi , i = 1, 2, . . . , n are random variables on S, and if a, b ∈ R, then
1. E(X1 + X2 + · · · + Xn ) = E(X1 ) + E(X2 ) + · · · + E(Xn )
2. E(aX + b) = aE(X) + b
Example 4. Suppose that n Bernoulli trials are performed, where p is the probability of success on each
trial. What is the expected number of successes?
Solution. Let Xi be the random variable with Xi ((t1 , t2 , . . . , tn )) = 1 if ti is a success and Xi ((t1 , t2 , . . . , tn )) =
0 if ti is a failure. The expected value of Xi is E(Xi ) = 1 · p + 0 · (1 − p) = p for i = 1, 2, . . . , n. Let
X = X1 + X2 + · · · + Xn , so that X counts the number of successes when these n Bernoulli trials are performed. By linearity of expectation, E(X) = E(X1 ) + E(X2 ) + · · · + E(Xn ) = np.
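The conclusion E(X) = np can be checked by brute force for small n: enumerate all 0/1 outcome tuples, weight each by its probability, and sum. The values n = 4, p = 0.3 below are just an illustrative choice:

```python
from itertools import product

# Exhaustively verify E(X) = n*p for n Bernoulli trials with success probability p.
n, p = 4, 0.3

# Each outcome is a tuple of 0/1 trial results; an outcome with k successes
# has probability p^k * (1-p)^(n-k).
expected_successes = 0.0
for trial in product([0, 1], repeat=n):
    k = sum(trial)
    prob = p**k * (1 - p)**(n - k)
    expected_successes += k * prob

print(expected_successes)  # equals n * p = 1.2, up to rounding
```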
The Geometric Distribution
We now turn our attention to a random variable with infinitely many possible outcomes.
Definition 2. A random variable X has a geometric distribution with parameter p if p(X = k) = (1−p)k−1 p
for k = 1, 2, 3, . . ., where p ∈ R with 0 ≤ p ≤ 1.
Lemma. For |x| < 1,

Σ_{j=1}^∞ j·x^{j−1} = 1/(1 − x)².
Example 5. Suppose that the probability that a coin comes up tails is p. This coin is flipped repeatedly until
it comes up tails. What is the expected number of flips until this coin comes up tails?
Solution. We first note that the sample space consists of all sequences that begin with any number of
heads, denoted by H, followed by a tail, denoted by T . Therefore, the sample space is the set S =
{T, HT, HHT, HHHT, HHHHT, . . .}. Note that this is an infinite sample space. We can determine the
probability of an element of the sample space by noting that the coin flips are independent and that the
probability of a head is 1 − p. Therefore, p(T ) = p, p(HT ) = (1 − p)p, p(HHT ) = (1 − p)2 p, and in general
the probability that the coin is flipped n times before a tail comes up, that is, that n − 1 heads come up
followed by a tail, is (1 − p)n−1 p.
Now let X be the random variable equal to the number of flips in an element in the sample space. That
is, X(T ) = 1, X(HT ) = 2, X(HHT ) = 3, and so on. Note that p(X = j) = (1 − p)j−1 p. The expected
number of flips until the coin comes up tails equals E(X). Thus, using the above lemma:
E(X) = Σ_{j=1}^∞ j·p(X = j) = Σ_{j=1}^∞ j(1 − p)^{j−1} p = p Σ_{j=1}^∞ j(1 − p)^{j−1} = p · (1/p²) = 1/p.
Theorem 2. If the random variable X has the geometric distribution with parameter p, then E(X) = 1/p.
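Theorem 2 can be checked numerically: since the geometric tail (1 − p)^j shrinks exponentially, a long partial sum of Σ j(1 − p)^{j−1} p already agrees with 1/p to machine precision. The value p = 0.25 is an arbitrary test case:

```python
# Numerically check E(X) = 1/p for the geometric distribution by summing
# j * (1-p)^(j-1) * p over enough terms that the remaining tail is negligible.
p = 0.25
partial = sum(j * (1 - p) ** (j - 1) * p for j in range(1, 500))

print(partial)  # approximately 4.0
print(1 / p)    # 4.0
```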
Independent Random Variables
Definition 3. The random variables X and Y on a sample space S are independent if
p(X = r1 and Y = r2 ) = p(X = r1 ) · p(Y = r2 ),
or in words, if the probability that X = r1 and Y = r2 equals the product of the probabilities that X = r1
and Y = r2 , for all real numbers r1 and r2 .
Theorem 3. If X and Y are independent random variables on a sample space S, then E(XY ) = E(X)E(Y ).
Proof. To prove this formula, we use the key observation that the event XY = r is the disjoint union of the
events X = r1 and Y = r2 over all r1 ∈ X(S) and r2 ∈ Y (S) with r = r1 r2 . We have
E(XY) = Σ_{r∈XY(S)} r·p(XY = r)
= Σ_{r1∈X(S), r2∈Y(S)} r1·r2·p(X = r1 and Y = r2)
= Σ_{r1∈X(S)} Σ_{r2∈Y(S)} r1·r2·p(X = r1 and Y = r2)
= Σ_{r1∈X(S)} Σ_{r2∈Y(S)} r1·r2·p(X = r1)·p(Y = r2)
= Σ_{r1∈X(S)} ( r1·p(X = r1) · Σ_{r2∈Y(S)} r2·p(Y = r2) )
= Σ_{r1∈X(S)} r1·p(X = r1)·E(Y)
= E(Y) · Σ_{r1∈X(S)} r1·p(X = r1)
= E(Y)E(X)
We complete the proof by noting that E(Y )E(X) = E(X)E(Y ), which is a consequence of the commutative
law for multiplication.
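Theorem 3 can be illustrated with two dice, which are independent by construction; exact arithmetic with fractions confirms E(XY) = E(X)E(Y):

```python
from itertools import product
from fractions import Fraction

# X and Y are the faces of two fair dice, rolled independently;
# each of the 36 joint outcomes has probability 1/36.
p = Fraction(1, 36)
outcomes = list(product(range(1, 7), repeat=2))

E_X = sum(x * p for x, _ in outcomes)
E_Y = sum(y * p for _, y in outcomes)
E_XY = sum(x * y * p for x, y in outcomes)

print(E_X, E_Y, E_XY)  # 7/2 7/2 49/4
```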
Variance
The expected value of a random variable tells us its average value. What if we want to know how far from
the average the values of a random variable are distributed?
For example, if X and Y are the random variables on the set S = {1, 2, 3, 4, 5, 6}, with X(s) = 0 for all
s ∈ S and Y (s) = −1 if s ∈ {1, 2, 3} and Y (s) = 1 if s ∈ {4, 5, 6}, then the expected values of X and Y
are both zero. However, the random variable X never varies from 0, while the random variable Y always
differs from 0 by 1. The variance of a random variable helps us characterize how widely a random variable
is distributed. In particular, it provides a measure of how widely X is distributed about its expected value.
Definition 4. Let X be a random variable on a sample space S. The variance of X, denoted by V (X), is
V(X) = Σ_{s∈S} (X(s) − E(X))² p(s).

That is, V(X) is the weighted average of the square of the deviation of X. The standard deviation of X, denoted σ(X), is defined to be √V(X).
Theorem 4. If X is a random variable on a sample space S, then V (X) = E(X 2 ) − E(X)2 .
Proof. Note that
V(X) = Σ_{s∈S} (X(s) − E(X))² p(s)
= Σ_{s∈S} X(s)² p(s) − 2E(X) Σ_{s∈S} X(s)p(s) + E(X)² Σ_{s∈S} p(s)
= E(X²) − 2E(X)·E(X) + E(X)²
= E(X²) − E(X)²

We have used the fact that Σ_{s∈S} p(s) = 1 in the next-to-last step.
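Theorem 4 can be verified exactly for a single fair die by computing the variance both ways, from the definition and from the shortcut E(X²) − E(X)²:

```python
from fractions import Fraction

# Check V(X) = E(X^2) - E(X)^2 for a single fair die.
faces = range(1, 7)
p = Fraction(1, 6)

E_X = sum(s * p for s in faces)
E_X2 = sum(s * s * p for s in faces)

# Definition 4: weighted average of the squared deviations.
V_def = sum((s - E_X) ** 2 * p for s in faces)
# Theorem 4's shortcut.
V_thm = E_X2 - E_X ** 2

print(V_def, V_thm)  # 35/12 35/12
```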
Using this theorem, we can prove the following result.
Theorem 5. If X and Y are two independent random variables on a sample space S, and a ∈ R, then
V(X + Y) = V(X) + V(Y) and V(aX) = a²V(X).
Example 6. What is the variance of the random variable X with X(t) = 1 if a Bernoulli trial is a success
and X(t) = 0 if it is a failure, where p is the probability of success and q is the probability of failure?
Solution. Because X takes only the values 0 and 1, it follows that X 2 (t) = X(t). Hence,
V (X) = E(X 2 ) − E(X)2 = p − p2 = p(1 − p) = pq.
Example 7. What is the variance of the number of successes when n independent Bernoulli trials are
performed, where, on each trial, p is the probability of success and q is the probability of failure?
Solution. Let Xi be the random variable with Xi ((t1 , t2 , . . . , tn )) = 1 if trial ti is a success and Xi ((t1 , t2 , . . . , tn )) =
0 if trial ti is a failure. Let X = X1 + X2 + · · · + Xn . Then X counts the number of successes in the n
trials. Since the Xi are independent, Theorem 5 gives V(X) = V(X1 ) + V(X2 ) + · · · + V(Xn ). Using the previous example, we have V(Xi ) = pq for i = 1, 2, . . . , n. It follows that V(X) = npq.
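The binomial variance V(X) = npq can be confirmed by the same exhaustive enumeration used for the expectation; n = 5 and p = 0.4 are an arbitrary test case:

```python
from itertools import product

# Verify V(X) = n*p*q for the number of successes in n Bernoulli trials.
n, p = 5, 0.4
q = 1 - p

E_X = 0.0
E_X2 = 0.0
for trial in product([0, 1], repeat=n):
    k = sum(trial)
    prob = p**k * q**(n - k)
    E_X += k * prob
    E_X2 += k * k * prob

variance = E_X2 - E_X**2
print(variance)  # close to n * p * q = 1.2
```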
Chebyshev’s Inequality
How likely is it that a random variable takes a value far from its expected value?
Theorem 6 (CHEBYSHEV’S INEQUALITY). Let X be a random variable on a sample space S with
probability function p. If r is a positive real number, then
p(|X(s) − E(X)| ≥ r) ≤ V(X)/r².
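As a concrete check of Chebyshev's inequality, take X to be the sum of two fair dice (E(X) = 7, V(X) = 35/6, from the examples above) and compare the exact tail probability with the bound for r = 4:

```python
from itertools import product
from fractions import Fraction

# X = sum of two fair dice: E(X) = 7, V(X) = 35/6.
p = Fraction(1, 36)
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
E_X = sum(s * p for s in sums)
V_X = sum((s - E_X) ** 2 * p for s in sums)

r = 4
# Exact probability that X deviates from its mean by at least r.
tail = sum(p for s in sums if abs(s - E_X) >= r)
bound = V_X / r**2

print(tail, bound)  # 1/6 35/96 -- the tail is below the Chebyshev bound
```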