1.7 Averages and Expected Values of Random Variables
One thing that we do frequently is compute the average of a series of related measurements.
Example 1. You are a wholesaler for gasoline and each week you buy and sell gasoline. Naturally you are
interested in how the price you pay for gasoline (the wholesale price) varies from week to week. Let Xj be
the wholesale price of gasoline on week j where week one is the first full week of February of this year.
The Xj can be regarded as random variables. Now suppose we are in mid-March and the actual wholesale
prices of gasoline in the five weeks beginning with the first full week of February were
q1 = $2.70
q2 = $2.60
q3 = $2.80
q4 = $2.70
q5 = $2.80
The average of these five prices is
q̄ = (q1 + q2 + q3 + q4 + q5)/5 = (2.70 + 2.60 + 2.80 + 2.70 + 2.80)/5 = 13.60/5 = $2.72
In general, the average q̄ of a sequence of values q1, q2, …, qn is the sum divided by the number n of values, i.e.

q̄ = (q1 + q2 + … + qn)/n
Why are we interested in averages? One reason is that the average lies somewhere in the "middle" of the values, so it
is often used to summarize a group of numbers by a single number. Another reason is the following.
Suppose you were to sell the gasoline over the five week period at a single price s. What price s should you
sell the gasoline for in order to come out even, assuming you buy and sell the same amount each week? It is
not hard to see that s = the average = $2.72, since
amount received from selling a gallon each week = 5s = (5)(2.72) = 13.60 = amount paid for buying a gallon each week
Suppose we have a sequence X1, X2, …, Xn of repeated independent trials where each of the random
variables Xj takes on the values x1, …, xm and they all have the same probability mass function f(x) where
f(xk) = Pr{Xj = xk} for each j and k. Suppose q1, q2, …, qn are the values we actually observe for the random
variables X1, X2, …, Xn. In our computation of q̄ let's group all the values of qj that equal x1 together and
all the values of qj that equal x2 together, etc. Then we have
q̄ = (q1 + q2 + … + qn)/n = [(x1 + x1 + … + x1) + (x2 + x2 + … + x2) + … + (xm + xm + … + xm)]/n

= (g1x1 + g2x2 + … + gmxm)/n = (g1/n)x1 + (g2/n)x2 + … + (gm/n)xm

where gj is the number of times that xj appears in q1, q2, …, qn. As n → ∞ one has

gk/n → Pr{X = xk} = f(xk)

where X denotes any of the Xj. So as n → ∞ one has

(1)    q̄ → μX

where

(2)    μX = E(X) = mean of X = expected value of X

= Pr{X = x1}x1 + Pr{X = x2}x2 + … + Pr{X = xm}xm = Σ_{k=1}^{m} Pr{X = xk}xk

= f(x1)x1 + f(x2)x2 + … + f(xm)xm = Σ_{k=1}^{m} f(xk)xk
The fact that (1) holds is actually an important theorem in probability theory called the Law of Large
Numbers. A precise statement is in Theorem 8 below.
Example 2. Suppose in Example 1 the set of possible values for the wholesale gasoline prices for any
particular week is Ω = {2.60, 2.70, 2.80, 2.90, 3.00} and the probabilities that the gasoline price Xj takes on
these values for the jth week are as follows:
Pr{X = 2.60} = 0.25
Pr{X = 2.70} = 0.4
Pr{X = 2.80} = 0.2
Pr{X = 2.90} = 0.1
Pr{X = 3.00} = 0.05
Then
μX = (0.25)(2.60) + (0.4)(2.70) + (0.2)(2.80) + (0.1)(2.90) + (0.05)(3.00)
= 0.65 + 1.08 + 0.56 + 0.29 + 0.15 = 2.73
If the Xj are all independent, then we would expect the average q̄n of the actual prices over n weeks to
approach $2.73 as n → ∞.
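For readers who like to check such computations, here is a short Python sketch (not part of the original example; the probabilities are just those listed above) that evaluates the sum in (2) directly from the probability mass function:

```python
# Expected weekly wholesale price, computed from the probability mass function of Example 2.
pmf = {2.60: 0.25, 2.70: 0.40, 2.80: 0.20, 2.90: 0.10, 3.00: 0.05}

mu = sum(prob * price for price, prob in pmf.items())
print(round(mu, 2))  # 2.73
```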
For certain special types of random variables there are formulas for their mean. The following propositions
give the mean for uniform, Bernoulli, geometric and Poisson distributions.
Proposition 1. Suppose X has a uniform distribution on equally spaced outcomes, i.e. the set of possible
values for X is Ω = {a, a + h, a + 2h, …, a + mh = b} and Pr{X = a + kh} = 1/(m + 1) for k = 0, 1, …, m.
Then μX = (a + b)/2.

Proof. μX = Σ_{k=0}^{m} [1/(m + 1)](a + kh) = [1/(m + 1)] Σ_{k=0}^{m} (a + kh) = [1/(m + 1)][(m + 1)a + h Σ_{k=0}^{m} k]

= a + [h/(m + 1)][m(m + 1)/2] = a + mh/2 = a + [(b - a)/m](m/2) = a + (b - a)/2 = (a + b)/2

Here we have used the fact that the sum of the integers from 1 to m is m(m + 1)/2 and the fact that h = (b - a)/m, since a + mh = b. //
Example 3. Let X be the outcome of a single roll of a fair die. Then X has outcomes 1, 2, 3, 4, 5, 6. Since
the die is assumed fair, X has a uniform distribution with a = 1, b = 6, h = 1 and m = 5. By Proposition 1,
μX = (a + b)/2 = 3.5.
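The same answer can be checked numerically; the following Python sketch (an illustration only, with variable names of our choosing) compares the direct average with the formula of Proposition 1:

```python
# Mean of a fair die: average of the equally likely outcomes versus the formula (a + b)/2.
outcomes = [1, 2, 3, 4, 5, 6]
direct = sum(outcomes) / len(outcomes)   # definition (2) with equal weights 1/6
formula = (1 + 6) / 2                    # Proposition 1 with a = 1, b = 6
print(direct, formula)                   # 3.5 3.5
```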
Proposition 2. Suppose X has a Bernoulli distribution, i.e. Pr{X = 0} = 1 - p and Pr{X = 1} = p where p
lies between 0 and 1. Then μX = p.

Proof. μX = (1 – p)(0) + (p)(1) = p. //
Proposition 3. Suppose X has a geometric distribution, i.e. Pr{X = k} = p(1 – p)^(k-1) for k = 1, 2, 3, …
where p lies between 0 and 1. Then μX = 1/p.
Proof. X =


 kp(1 – p)k-1. In order to do this sum we start with the fact that  (1 – p)m-1 = 1/p and
k=1
m=1

take the derivative of both sides with respect to p. This gives
 (m - 1)(1 – p)m-2 = 1/p2. If we replace
m=1

m - 1 by k and multiply both sides by p we get
1
 kp(1 – p)k-1 = p. //
k=1
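Since the proof works with an infinite series, it may be reassuring to watch the partial sums settle down. The Python sketch below (p = 0.25 is an arbitrary choice and the cutoff of 200 terms is ours) truncates the series:

```python
# Partial sums of k * p * (1 - p)**(k - 1) approach 1/p, as Proposition 3 asserts.
p = 0.25
partial = sum(k * p * (1 - p) ** (k - 1) for k in range(1, 201))
print(partial, 1 / p)  # approximately 4.0, and exactly 4.0
```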
Example 4. A store sells two types of tables: plain and deluxe. When a customer buys a table, there is an
80% chance that it will be a plain table. Assume that each day five tables are sold. Let N be the number of
days until a deluxe table is sold, starting with today which corresponds to N = 0. What is the expected
number of days until a deluxe table is sold?
Solution. The probability that the five tables sold on a given day are all plain is (0.8)^5 ≈ 0.3277. The
probability of selling at least one deluxe table on a given day is p = 1 – (0.8)^5 ≈ 0.6723. The probability of
first selling a deluxe table on day n is p(1 – p)^n, i.e. Pr{N = n} = p(1 – p)^n. If we let M = N + 1, then
Pr{M = m} = Pr{N + 1 = m} = Pr{N = m – 1} = p(1 – p)^(m-1). So M is geometric. By Proposition 3,
E{M} = 1/p. So E{N} = E{M – 1} = E{M} – 1 = 1/p – 1 ≈ 1/0.6723 – 1 ≈ 1.487 – 1 = 0.487.
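The arithmetic in this example is easy to reproduce; here is a Python sketch of it (the variable names are ours):

```python
# Example 4: expected number of days until a deluxe table is sold.
p = 1 - 0.8 ** 5            # probability of at least one deluxe sale on a given day
expected_N = 1 / p - 1      # E{N} = E{M} - 1 with M = N + 1 geometric
print(round(p, 4), round(expected_N, 3))  # 0.6723 0.487
```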
Proposition 4. Suppose N has a Poisson distribution, i.e. Pr{N = n} = λ^n e^(-λ)/n! for n = 0, 1, 2, 3, … where
λ is a positive parameter. Then E{N} = λ.
Proof. E{N} = Σ_{n=0}^{∞} n [λ^n e^(-λ)/n!] = Σ_{n=1}^{∞} λ^n e^(-λ)/(n - 1)!. We replace n – 1 by k and factor out λ,
giving E{N} = λ Σ_{k=0}^{∞} λ^k e^(-λ)/k! = λ. Here we have used the fact that Σ_{k=0}^{∞} λ^k e^(-λ)/k! = 1. //
Example 5. A hospital observes that the number of heart attack cases that arrive in the Emergency Room is
a Poisson random variable with mean 3 per hour. Find the probability that no more than two heart attack
cases arrive in the Emergency Room during the next hour.
Solution. By Proposition 4, one has λ = 3. Then Pr{N ≤ 2} = Pr{N = 0} + Pr{N = 1} + Pr{N = 2} =
λ^0 e^(-λ)/0! + λ^1 e^(-λ)/1! + λ^2 e^(-λ)/2! = (1 + λ + λ^2/2) e^(-λ) = (1 + 3 + 4.5) e^(-3) = 8.5e^(-3) ≈ 0.423.
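A quick Python check of this probability (a sketch using only the standard library):

```python
from math import exp, factorial

# Example 5: Pr{N <= 2} for a Poisson random variable with mean lam = 3.
lam = 3
prob = sum(lam ** n * exp(-lam) / factorial(n) for n in range(3))
print(round(prob, 3))  # 0.423
```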
The operation of finding the mean of a random variable has a number of useful properties.
Theorem 5. Let S be a sample space and X be a random variable with domain S. Let x1, …, xm be the
values X assumes and let E1, …, Eq be disjoint sets whose union is S such that for each r the random
variable X assumes the same value on Er, i.e. there is k = k(r) such that X(a) = xk for a ∈ Er. Then

(3)    E(X) = Σ_{r=1}^{q} x_{k(r)} Pr{Er}

(4)    E(X) = Σ_{a∈S} X(a) Pr{a}
Proof. By (2) one has E(X) = Σ_{k=1}^{m} Pr{X = xk}xk. For each k let E_{k1}, …, E_{k,r_k} be those of the Er on which
X assumes the value xk. Then {X = xk} = E_{k1} ∪ … ∪ E_{k,r_k} and Pr{X = xk} = Σ_{r=1}^{r_k} Pr{E_{kr}}. So
E(X) = Σ_{k=1}^{m} Σ_{r=1}^{r_k} Pr{E_{kr}}xk = Σ_{r=1}^{q} x_{k(r)} Pr{Er} which proves (3). (4) is the special case of (3) in which
each Er consists of a single outcome. //
Example 6. You have two friends, Alice and Bob. You invite both of them over to help you clean house.
If either or both of them come you get $100. If neither comes you get nothing. Suppose the probability of
either coming is ¼ and whether one comes is independent of whether the other comes. The four outcomes,
the probability of each outcome, and how much you get, W, in each case are as follows.
AB = both come           Pr{AB} = 1/16    W = 1
Ab = only Alice comes    Pr{Ab} = 3/16    W = 1
aB = only Bob comes      Pr{aB} = 3/16    W = 1
ab = neither comes       Pr{ab} = 9/16    W = 0
One has Pr{W = 0} = 9/16 and Pr{W = 1} = 7/16, so E(W) = (0)(9/16) + (1)(7/16) = 7/16. To illustrate
formula (3) in Theorem 5, consider the following three events with their probabilities and the value of W for
the outcomes in that event.
E1 = {AB} = both come             Pr{E1} = 1/16    W = 1
E2 = {Ab, aB} = only one comes    Pr{E2} = 6/16    W = 1
E3 = {ab} = neither comes         Pr{E3} = 9/16    W = 0
Then according to formula (3) one has E(W) = (1) Pr{E1} + (1) Pr{E2} + (0) Pr{E3} = (1)(1/16) +
(1)(6/16) + (0)(9/16) = 7/16. To illustrate formula (4) in Theorem 5, one has E(W) = (1) Pr{AB} + (1)
Pr{Ab} + (1) Pr{aB} + (0) Pr{ab} = (1)(1/16) + (1)(3/16) + (1)(3/16) + (0)(9/16) = 7/16.
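To see formula (4) at work in code, the following Python sketch (with the outcome labels of the table above) sums W(a) Pr{a} over the four outcomes:

```python
# Example 6: E(W) as the sum of W(a) * Pr{a} over the four outcomes, as in formula (4).
table = {"AB": (1 / 16, 1), "Ab": (3 / 16, 1), "aB": (3 / 16, 1), "ab": (9 / 16, 0)}
expected_W = sum(prob * w for prob, w in table.values())
print(expected_W)  # 0.4375, i.e. 7/16
```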
Theorem 6. Let S be a sample space and X and Y be random variables with domain S. Let c be a real
number and y = g(x) be a real valued function defined for real numbers x. Let x1, …, xm be the values X
assumes. In (9) on the left E(c) denotes the expected value of the random variable which is c for every
outcome. Then
(5)    E(X + Y) = E(X) + E(Y)

(6)    E(cX) = cE(X)

(7)    E(XY) = E(X)E(Y)    if X and Y are independent

(8)    E(g(X)) = Σ_{k=1}^{m} g(xk)f(xk)

(9)    E(c) = c
Proof. Using (4) one has E(X + Y) = Σ_{a∈S} (X(a) + Y(a)) Pr{a} = Σ_{a∈S} X(a) Pr{a} + Σ_{a∈S} Y(a) Pr{a} =
E(X) + E(Y) which proves (5). The proof of (6) is similar. Let y1, …, yr be the values Y takes on. By the
definition of expectation one has E(X) = Σ_{j=1}^{m} Pr{X = xj}xj and E(Y) = Σ_{k=1}^{r} Pr{Y = yk}yk. So E(X)E(Y) =
Σ_{j=1}^{m} Σ_{k=1}^{r} Pr{X = xj}Pr{Y = yk}xjyk. Since X and Y are independent one has Pr{X = xj}Pr{Y = yk} =
Pr{X = xj, Y = yk}. So E(X)E(Y) = Σ_{j=1}^{m} Σ_{k=1}^{r} Pr{X = xj, Y = yk}xjyk. However, by (3) this last sum equals
E(XY) which proves (7). Note that g(X) is constant on the sets {X = xj}. So (8) follows from (3). The proof
of (9) is easy. //
Example 7. A company produces transistors. They estimate that the probability that any one of the
transistors is defective is 0.1. Suppose a box contains 20 transistors. What is the expected number of
defective transistors in a box?
Solution. Let Xj = 1 if the jth transistor is defective and Xj = 0 if it is good. The number N of defective
transistors is N = X1 + … + X20. By (5) in Theorem 6 one has E(N) = E(X1) + … + E(X20). By Proposition
2 one has E(Xj) = 0.1 for each j. So E(N) = (0.1)(20) = 2.
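A two-line Python check of this count (a sketch, not required for the argument):

```python
# Example 7: E(N) = E(X1) + ... + E(X20), where each E(Xj) = 0.1 by Proposition 2.
expected_defective = sum(0.1 for _ in range(20))
print(round(expected_defective, 10))  # 2.0
```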
This example illustrates the following general proposition.
Proposition 7. If N has a binomial distribution with Pr{N = k} = C(n, k) p^k q^(n-k) for k = 0, 1, …, n, where
C(n, k) is the binomial coefficient and q = 1 – p, then E(N) = np.
Proof. N = X1 + … + Xn where Pr{Xj = 1} = p and Pr{Xj = 0} = 1 - p. By (5) in Theorem 6 one has
E(N) = E(X1) + … + E(Xn). By Proposition 2 one has E(Xj) = p for each j. So E(N) = np. //
As mentioned earlier, the fact that (1) holds is called the Law of Large Numbers. A precise statement is
given in Theorem 8 below.
Example 8. Consider a random walk where the probability of a step to the right is ½ and the probability of
a step to the left is ½. After 2 steps your position Z could be either -2, 0 or 2 with probabilities ¼, ½ and ¼
respectively. Compute E(Z^2).
Solution. By (8) one has E(Z^2) = (-2)^2 Pr{Z = -2} + (0)^2 Pr{Z = 0} + (2)^2 Pr{Z = 2} = (4)(1/4) + (0)(1/2)
+ (4)(1/4) = 2. If we were to compute E(Z^2) from the definition (2) then E(Z^2) = (4) Pr{Z^2 = 4} + (0)
Pr{Z^2 = 0} = (4)(1/2) + (0)(1/2) = 2.
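The two computations can also be mirrored in a short Python sketch (the probability mass function is the one stated in the example):

```python
# Example 8: E(Z^2) via formula (8) with g(x) = x**2, and via the pmf of Z^2 itself.
pmf_Z = {-2: 0.25, 0: 0.50, 2: 0.25}
via_g = sum(prob * z ** 2 for z, prob in pmf_Z.items())   # formula (8)
pmf_Z2 = {4: 0.5, 0: 0.5}                                 # distribution of Z^2
via_def = sum(prob * w for w, prob in pmf_Z2.items())     # definition (2)
print(via_g, via_def)  # 2.0 2.0
```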
Example 9. An unfair coin is tossed twice with the two tosses independent of each other. Suppose on each
toss Pr{H} = ¼ and Pr{T} = ¾. For each toss we win $1 if it comes up heads and lose $1 if it comes up
tails. Let Wj be the amount we win on the jth toss. Compute E(W1W2).
Solution. By (7) one has E(W1W2) = E(W1) E(W2). One has E(W1) = E(W2) = (1)(1/4) + (-1)(3/4) = -1/2.
So E(W1W2) = (-½)(-½) = ¼.
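Formula (7) can be double-checked by summing over the four joint outcomes of the two tosses; the Python sketch below does exactly that (the variable names are ours):

```python
# Example 9: E(W1 * W2) by enumeration over the four joint outcomes vs. E(W1) * E(W2).
p_heads = 0.25
win = {True: 1, False: -1}   # win $1 on heads, lose $1 on tails

def prob(heads):
    """Probability of a single toss landing heads (True) or tails (False)."""
    return p_heads if heads else 1 - p_heads

by_enumeration = sum(prob(h1) * prob(h2) * win[h1] * win[h2]
                     for h1 in (True, False) for h2 in (True, False))
mean_W = p_heads * 1 + (1 - p_heads) * (-1)   # E(W1) = E(W2) = -1/2
print(by_enumeration, mean_W ** 2)            # 0.25 0.25
```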
Theorem 8. Let X1, X2, …, Xn, … be a sequence of independent random variables all taking on the same
set of values x1, …, xm and having the same probability mass function f(x). Let

X̄n = (1/n)(X1 + X2 + … + Xn)

Then Pr{a: X̄n(a) → μX} = 1 as n → ∞.
Note that X̄n is again a random variable, so that X̄1, X̄2, …, X̄n, … is a new sequence of random variables and
for each outcome a one has a sequence of numbers X̄1(a), X̄2(a), …, X̄n(a), … The Law of Large Numbers
says that X̄n(a) → μX except for a set of outcomes that has probability zero. For the proof of the Law of Large
Numbers see a more advanced book on probability.
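To get a feeling for what Theorem 8 asserts, one can simulate the weekly prices of Example 2 and watch the running average settle near μX = 2.73. The Python sketch below (the seed and the sample sizes are arbitrary choices of ours) does this with the standard library:

```python
import random

# Simulated averages of the Example 2 prices: as n grows, the sample average
# drifts toward the mean 2.73, illustrating the Law of Large Numbers.
prices = [2.60, 2.70, 2.80, 2.90, 3.00]
weights = [0.25, 0.40, 0.20, 0.10, 0.05]

random.seed(0)
for n in (10, 100, 10_000):
    sample = random.choices(prices, weights=weights, k=n)
    print(n, round(sum(sample) / n, 3))
```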