2. Averages and Expected Values of Random Variables
In the next section we will be interested in computing the average cost of testing a diode when we test them in groups of n. This is a special case of the mean or expected value of a random variable. The mean or expected value of a random variable is related to computing the average of a sequence of related measurements, but is not quite the same. So let's look at averages of a sequence of numbers first.
Suppose we have a sequence of observations x1, x2, …, xn of something. So x1, x2, …, xn is just a sequence of numbers which may be observations of something that have already been made, so there is nothing probabilistic about them. The average $\bar{x}$ of these observations is their sum divided by the number of observations, i.e.

(1)    $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Example 1. You are a wholesaler for gasoline and each week you buy and sell gasoline. Naturally you are
interested in how the price you pay for gasoline (the wholesale price) varies from week to week. Suppose
the wholesale price of gasoline for the five weeks was
q1 = $2.70
q2 = $2.60
q3 = $2.80
q4 = $2.70
q5 = $2.80
The average of these five prices is

$\bar{q} = \frac{q_1 + q_2 + q_3 + q_4 + q_5}{5} = \frac{2.70 + 2.60 + 2.80 + 2.70 + 2.80}{5} = \frac{13.60}{5} = \$2.72$
Why are we interested in averages? One reason is that the average falls somewhat in the "middle" of the values, so it is often used to summarize a group of numbers by a single number. Another reason is the following. Suppose you were to sell the gasoline over the five week period at a single price s. What price s should you have sold the gasoline for in order to come out even for the five week period, assuming you buy and sell the same amount each week? It is not hard to see that s = the average = $2.72, since

$\left(\begin{matrix}\text{amount received for}\\ \text{selling a gallon each week}\end{matrix}\right) = 5s = (5)(2.72) = 13.60 = \left(\begin{matrix}\text{amount paid for}\\ \text{buying a gallon each week}\end{matrix}\right)$
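To make (1) concrete, here is a minimal Python sketch (the list name and the printed break-even check are ours; the prices simply restate Example 1):

# Equation (1): the average of a sequence of observations is their
# sum divided by the number of observations.
prices = [2.70, 2.60, 2.80, 2.70, 2.80]   # weekly wholesale prices, Example 1

average = sum(prices) / len(prices)
print(f"{average:.2f}")                   # 2.72

# Break-even check: selling at the single price s = 2.72 for five weeks
# brings in the same total as was paid out.
print(f"{5 * average:.2f}", f"{sum(prices):.2f}")   # 13.60 13.60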
Problem 1. Each day a newsstand buys and sells The Wall Street Journal. Suppose the number they have sold for each of the past ten days is 1, 3, 0, 1, 2, 0, 2, 1, 3, 1. Find the average number of copies they have sold per day during the past ten days.
Answer: 1.4
Problem 2. A company manufactures diodes. 100 diodes are taken from the production line and tested. 98 of these are good and 2 are bad. Suppose we assign 1 to a diode if it is bad and 0 if it is good, so that the result of testing these 100 diodes is a sequence of numbers x1, …, x100 where xj is 1 if the diode is bad and 0 if it is good. What is the average of this sequence of values? (This average is the proportion of diodes that are bad.)
Answer: 0.02
Problem 3. A company manufactures diodes. 100 diodes are taken from the production line and tested.
However, instead of testing them individually, they are tested in groups of four. So there are 25 groups of
four. The cost of testing a group of four is 2 cents if they are all good and 7 cents if one or more are bad.
So the costs of testing the 25 groups can be represented by c1, …, c25. Suppose the 23rd and the 52nd diodes are bad and the rest are good. Thus c6 = 7 and c13 = 7 and cj = 2 if j ≠ 6 and j ≠ 13. Find the average of c1, …, c25.
Answer: 2.4
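A short Python sketch of this grouping confirms the answer (we assume, as in the problem, that diodes are numbered 1 through 100, so group g covers diodes 4g+1 through 4g+4):

# Problem 3: 100 diodes tested in 25 groups of four; diodes 23 and 52 are bad.
bad = {23, 52}

costs = []
for g in range(25):                      # group g covers diodes 4g+1 .. 4g+4
    group = range(4 * g + 1, 4 * g + 5)
    cost = 7 if any(d in bad for d in group) else 2   # 7 cents if any bad, else 2
    costs.append(cost)

print(sum(costs) / len(costs))           # 2.4, the answer to Problem 3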
Now let's connect the average of a sequence of observations with random variables. Suppose we are modeling a situation where we are going to make a sequence of related observations by a sequence X1, X2, …, Xn of random variables where Xj is the result of the jth observation. Suppose each of the random variables Xj takes on the values x1, …, xm and all the random variables have the same probability mass function f(x) where f(xk) = Pr{Xj = xk} for each j and k. Suppose q1, q2, …, qn are the values we actually observe for the random variables X1, X2, …, Xn. In our computation of $\bar{q}$ let's group all the values of qj that equal x1 together and all the values of qj that equal x2 together, etc. Then we have
$\bar{q} = \frac{q_1 + q_2 + \cdots + q_n}{n} = \frac{(x_1 + x_1 + \cdots + x_1) + (x_2 + x_2 + \cdots + x_2) + \cdots + (x_m + x_m + \cdots + x_m)}{n}$

$= \frac{g_1 x_1 + g_2 x_2 + \cdots + g_m x_m}{n} = \frac{g_1}{n} x_1 + \frac{g_2}{n} x_2 + \cdots + \frac{g_m}{n} x_m$

where gj is the number of times that xj appears in q1, q2, …, qn. As $n \to \infty$ we expect $\frac{g_k}{n} \to \Pr\{X = x_k\} = f(x_k)$ where X denotes any of the Xj. So as $n \to \infty$ we expect

(2)    $\bar{q} \approx f(x_1) x_1 + f(x_2) x_2 + \cdots + f(x_m) x_m$
The sum $f(x_1) x_1 + f(x_2) x_2 + \cdots + f(x_m) x_m$ is called the mean or expected value of each of the random variables Xj. We summarize this by means of the following definition.
Definition 1. Suppose X is a random variable that takes on the values x1, …, xm. Let f(x) be the probability mass function, i.e. f(xk) = Pr{X = xk} for each k. Then

(3)    $\mu_X = E(X) = \text{mean of } X = \text{expected value of } X = \sum_{k=1}^{m} \Pr\{X = x_k\}\, x_k = \sum_{k=1}^{m} f(x_k)\, x_k$
So (2) can be restated as

$\bar{q} \approx \mu_X$

where $\mu_X$ is the common mean of X1, X2, …, Xn. The fact that (2) holds if the Xj are independent is actually an important theorem in probability theory called the Law of Large Numbers. A precise statement is in Theorem 7 below.
Example 2. Suppose in Example 1 the set of possible values for the wholesale gasoline prices for any
particular week is S = {2.60, 2.70, 2.80, 2.90, 3.00}. Let Xj be the wholesale price of gasoline on week j
where week one is the first full week of May of this year. The Xj can be regarded as random variables.
Assume each of the Xj has the same probability distribution, and the probabilities that the gasoline price Xj takes on the values in S for the jth week are as follows
Pr{Xj = 2.60} = 0.25
Pr{Xj = 2.70} = 0.4
Pr{Xj = 2.80} = 0.2
Pr{Xj = 2.90} = 0.1
Pr{Xj = 3.00} = 0.05
Then

$\mu_X$ = (0.25)(2.60) + (0.4)(2.70) + (0.2)(2.80) + (0.1)(2.90) + (0.05)(3.00)
       = 0.65 + 1.08 + 0.56 + 0.29 + 0.15 = 2.73

If the Xj are all independent, then we would expect the average $\bar{q}_n$ of the actual prices over n weeks to approach $2.73 as $n \to \infty$.
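Here is a minimal Python sketch of Definition 1 for this example (storing the distribution in a dictionary is our choice, not anything from the text):

# Definition 1: the mean is the probability-weighted sum of the values.
pmf = {2.60: 0.25, 2.70: 0.4, 2.80: 0.2, 2.90: 0.1, 3.00: 0.05}

mean = sum(prob * x for x, prob in pmf.items())
print(round(mean, 2))    # 2.73, matching Example 2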
Problem 4. Each day a newsstand buys and sells The Wall Street Journal. Based on records for the past
month they feel that they would never sell more than 4 copies in any day. Suppose the probabilities of
selling a certain number of copies on a given day are
The probability, Pr{0}, of selling zero copies in a given day = 0.21,
The probability, Pr{1}, of selling one copy in a given day = 0.26,
Pr{2} = 0.32,
Pr{3} = 0.16,
Pr{4} = 0.05.
Let X be the number of copies the newsstand sells tomorrow. Find $\mu_X$.
Answer: 1.58
Problem 5. A company manufactures diodes. Suppose the probability that a diode is defective is 0.3%,
i.e.
The probability that a diode is defective = Pr{d} = 0.003,
The probability that a diode is good = Pr{g} = 0.997.
Suppose the random variable X is defined by X(d) = 1 and X(g) = 0. Find $\mu_X$. How is $\mu_X$ related to the other parameters in the situation?
Answer: $\mu_X$ = 0.003 = Pr{d}
Problem 6. A company manufactures diodes. They are tested in groups of four. The cost C of testing a
group of four is 2 cents if they are all good and 7 cents if one or more are bad. Suppose, as in Problem 5,
the probability of one diode being defective is 0.003 and whether one diode is defective is independent of
whether any other diode is defective. We saw in Example 15 in section 1.3 that the probability that all four diodes in a group of 4 are good is (0.997)^4 and the probability that one or more is defective is 1 – (0.997)^4. Find E(C).
Answer: E(C) = 2(0.997)^4 + 7(1 – (0.997)^4) = 7 – 5(0.997)^4 ≈ 2.05973
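A quick Python check of this expectation (a sketch only; the variable names are ours):

# Problem 6: expected cost of testing a group of four diodes.
p_all_good = 0.997 ** 4                  # all four good, by independence

expected_cost = 2 * p_all_good + 7 * (1 - p_all_good)   # = 7 - 5 * p_all_good
print(round(expected_cost, 5))           # 2.05973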
Means of Special Types of Distributions. For certain special types of random variables there are formulas
for their mean. The following propositions give the mean for uniform, Bernoulli, geometric and Poisson
distributions.
Proposition 1. Suppose X has a uniform distribution on equally spaced outcomes, i.e. the set of possible values for X is S = {a, a + h, a + 2h, …, a + mh = b} and $\Pr\{X = a + kh\} = \frac{1}{m+1}$ for k = 0, 1, …, m. Then

$\mu_X = \frac{a+b}{2}$.
Proof. X =
m
m
m
1
1
1
 m + 1 (a + kh) = m + 1  (a + kh) = m + 1 [(m + 1)a + h  k ]
k=0
k=0
k=0
1  m(m + 1)
mh
= a+
m + 1 h  2  = a + 2 = a +
b - a
m
 m 
b-a
a+b
= a+
=
2
2
2
Here we have used the fact that the sum of the integers from 1 to m is
m(m + 1)
. //
2
Example 3. Let X be the outcome of a single roll of a fair die. Then X has outcomes 1, 2, 3, 4, 5, 6. Since the die is assumed fair, X has a uniform distribution with a = 1, b = 6 and m = 5. By Proposition 1, $\mu_X = \frac{a+b}{2} = 3.5$.
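The following Python sketch checks Proposition 1 against the direct computation from (3) for this example (the names a, h, m mirror the proposition; the code is illustrative only):

# Example 3: a fair die is uniform on S = {1, ..., 6}, so a = 1, h = 1, m = 5.
a, h, m = 1, 1, 5
values = [a + k * h for k in range(m + 1)]

direct = sum(values) / (m + 1)            # definition (3), equal weights 1/(m+1)
formula = (values[0] + values[-1]) / 2    # Proposition 1: (a + b)/2
print(direct, formula)                    # 3.5 3.5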
Proposition 2. Suppose X has a Bernoulli distribution, i.e. Pr{X = 0} = 1 – p and Pr{X = 1} = p where p lies between 0 and 1. Then $\mu_X = p$.
Proof. $\mu_X$ = (1 – p)(0) + (p)(1) = p. //
Proposition 3. Suppose X has a geometric distribution, i.e. $\Pr\{X = k\} = p(1-p)^{k-1}$ for k = 1, 2, 3, … where p lies between 0 and 1. Then $\mu_X = \frac{1}{p}$.
Proof. X =


 kp(1 – p)k-1. In order to do this sum we start with the fact that  (1 – p)m-1 = 1/p and
k=1
m=1

take the derivative of both sides with respect to p. This gives
 (m - 1)(1 – p)m-2 = 1/p2. If we replace
m=1

m - 1 by k and multiply both sides by p we get
1
 kp(1 – p)k-1 = p. //
k=1
Example 4. A store sells two types of tables: plain and deluxe. When a customer buys a table, there is an 80% chance that it will be a plain table. Assume that each day five tables are sold. Let N be the number of days until a deluxe table is sold starting with today, which corresponds to N = 0. What is the expected number of days until a deluxe table is sold?
Solution. The probability that the five tables sold on a given day are all plain is (0.8)^5 ≈ 0.3277. The probability of selling at least one deluxe table on a given day is p = 1 – (0.8)^5 ≈ 0.6723. The probability of first selling a deluxe table on day n is p(1 – p)^n, i.e. Pr{N = n} = p(1 – p)^n. If we let M = N + 1, then Pr{M = m} = Pr{N + 1 = m} = Pr{N = m – 1} = p(1 – p)^{m-1}. So M is geometric. By Proposition 3, $E\{M\} = \frac{1}{p}$. So $E\{N\} = E\{M - 1\} = E\{M\} - 1 = \frac{1}{p} - 1 \approx \frac{1}{0.6723} - 1 \approx 1.487 - 1 = 0.487$.
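As a numerical check, a Python sketch can compare the closed form 1/p – 1 with a truncated version of the defining sum (truncating at 200 terms is our choice; the tail beyond that is negligible):

# Example 4: at least one deluxe table on a given day has probability
# p = 1 - 0.8**5; M = N + 1 is geometric, so E{N} = 1/p - 1.
p = 1 - 0.8 ** 5                      # ~ 0.6723

exact = 1 / p - 1
approx = sum(n * p * (1 - p) ** n for n in range(200))   # truncated E{N}
print(round(exact, 4), round(approx, 4))                  # 0.4874 0.4874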
Proposition 4. Suppose N has a Poisson distribution, i.e. $\Pr\{N = n\} = \frac{\lambda^n e^{-\lambda}}{n!}$ for n = 0, 1, 2, 3, … where λ is a positive parameter. Then E{N} = λ.

Proof. $E\{N\} = \sum_{n=0}^{\infty} n\,\frac{\lambda^n e^{-\lambda}}{n!} = \sum_{n=1}^{\infty} \frac{\lambda^n e^{-\lambda}}{(n-1)!}$. We replace n – 1 by k and factor out λ, giving $E\{N\} = \lambda \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} = \lambda$. Here we have used the fact that $\sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} = 1$. //
Example 5. A hospital observes that the number of heart attack cases that arrive in the Emergency Room is a Poisson random variable with mean 3 per hour. Find the probability that no more than two heart attack cases arrive in the Emergency Room during the next hour.
Solution. By Proposition 4, one has λ = 3. Then

$\Pr\{N \le 2\} = \Pr\{N = 0\} + \Pr\{N = 1\} + \Pr\{N = 2\} = \frac{\lambda^0 e^{-\lambda}}{0!} + \frac{\lambda^1 e^{-\lambda}}{1!} + \frac{\lambda^2 e^{-\lambda}}{2!} = \left(1 + \lambda + \frac{\lambda^2}{2}\right) e^{-\lambda} = (1 + 3 + 4.5)\,e^{-3} = 8.5\,e^{-3} \approx 0.423.$
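A minimal Python check of this sum (poisson_pmf is our helper name, not anything from the text):

import math

lam = 3.0
def poisson_pmf(n):
    # Pr{N = n} = lambda**n * exp(-lambda) / n!
    return lam ** n * math.exp(-lam) / math.factorial(n)

prob = sum(poisson_pmf(n) for n in range(3))   # Pr{N <= 2}
print(round(prob, 3))                          # 0.423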
Properties of means. The operation of finding the mean of a random variable has a number of useful
properties.
Theorem 5. Let S be a sample space and X be a random variable with domain S. Let x1, …, xm be the values X assumes and let E1, …, Eq be disjoint sets whose union is S such that for each r the random variable X assumes the same value on Er, i.e. there is k = k(r) such that X(a) = xk for $a \in E_r$. Then
(4)    $E(X) = \sum_{r=1}^{q} x_{k(r)} \Pr\{E_r\}$

(5)    $E(X) = \sum_{a \in S} X(a) \Pr\{a\}$
Proof. By (3) one has $E(X) = \sum_{k=1}^{m} \Pr\{X = x_k\}\, x_k$. Let $E_{k1}, \ldots, E_{k,r_k}$ be those $E_r$ such that X(a) = xk for $a \in E_{kr}$. Then $\{X = x_k\} = E_{k1} \cup \cdots \cup E_{k,r_k}$ and $\Pr\{X = x_k\} = \sum_{r=1}^{r_k} \Pr\{E_{kr}\}$. So $E(X) = \sum_{k=1}^{m} \sum_{r=1}^{r_k} \Pr\{E_{kr}\}\, x_k = \sum_{r=1}^{q} x_{k(r)} \Pr\{E_r\}$ which proves (4). (5) is a special case of (4). //
Example 6. You have two friends, Alice and Bob. You invite both of them over to help you clean house.
If either or both of them come you get $100. If neither comes you get nothing. Suppose the probability of
either coming is ¼ and whether one comes is independent of whether the other comes. The four outcomes,
the probability of the outcomes, and how much you get, W (in units of $100), in each case are as follows.

AB = both come:          Pr{AB} = 1/16,   W = 1
Ab = only Alice comes:   Pr{Ab} = 3/16,   W = 1
aB = only Bob comes:     Pr{aB} = 3/16,   W = 1
ab = neither comes:      Pr{ab} = 9/16,   W = 0
One has Pr{W = 0} = 9/16 and Pr{W = 1} = 7/16, so E(W) = (0)(9/16) + (1)(7/16) = 7/16. To illustrate formula (4) in Theorem 5, consider the following three events with their probabilities and the value of W for the outcomes in that event.

E1 = {AB} = both come:          Pr{E1} = 1/16,   W = 1
E2 = {Ab, aB} = only one comes: Pr{E2} = 6/16,   W = 1
E3 = {ab} = neither comes:      Pr{E3} = 9/16,   W = 0

Then according to formula (4) one has E(W) = (1) Pr{E1} + (1) Pr{E2} + (0) Pr{E3} = (1)(1/16) + (1)(6/16) + (0)(9/16) = 7/16. To illustrate formula (5) in Theorem 5, one has E(W) = (1) Pr{AB} + (1) Pr{Ab} + (1) Pr{aB} + (0) Pr{ab} = (1)(1/16) + (1)(3/16) + (1)(3/16) + (0)(9/16) = 7/16.
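Here is a small Python sketch that carries out both computations (the dictionaries mirror the two tables above; the encoding is ours):

# Example 6: E(W) via formula (5) (outcomes) and formula (4) (events).
outcomes = {"AB": 1/16, "Ab": 3/16, "aB": 3/16, "ab": 9/16}
W = {"AB": 1, "Ab": 1, "aB": 1, "ab": 0}

# Formula (5): sum over individual outcomes.
e_by_outcome = sum(W[a] * pr for a, pr in outcomes.items())

# Formula (4): sum over events on which W is constant.
events = [({"AB"}, 1), ({"Ab", "aB"}, 1), ({"ab"}, 0)]
e_by_event = sum(w * sum(outcomes[a] for a in ev) for ev, w in events)

print(e_by_outcome, e_by_event)   # both 7/16 = 0.4375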
Theorem 6. Let S be a sample space and X and Y be random variables with domain S. Let c be a real
number and y = g(x) be a real valued function defined for real numbers x. Let x1, …, xm be the values X
assumes. In (10) on the left E(c) denotes the expected value of the random variable which is c for every
outcome. Then
(6)    E(X + Y) = E(X) + E(Y)

(7)    E(cX) = cE(X)

(8)    E(XY) = E(X)E(Y)    if X and Y are independent

(9)    $E(g(X)) = \sum_{k=1}^{m} g(x_k) f(x_k)$

(10)   E(c) = c
Proof. Using (5) one has $E(X + Y) = \sum_{a \in S} (X(a) + Y(a)) \Pr\{a\} = \sum_{a \in S} X(a) \Pr\{a\} + \sum_{a \in S} Y(a) \Pr\{a\} = E(X) + E(Y)$ which proves (6). The proof of (7) is similar. Let y1, …, yr be the values Y takes on. By the definition of expectation one has $E(X) = \sum_{j=1}^{m} \Pr\{X = x_j\}\, x_j$ and $E(Y) = \sum_{k=1}^{r} \Pr\{Y = y_k\}\, y_k$. So $E(X)E(Y) = \sum_{j=1}^{m} \sum_{k=1}^{r} \Pr\{X = x_j\} \Pr\{Y = y_k\}\, x_j y_k$. Since X and Y are independent one has $\Pr\{X = x_j\} \Pr\{Y = y_k\} = \Pr\{X = x_j, Y = y_k\}$. So $E(X)E(Y) = \sum_{j=1}^{m} \sum_{k=1}^{r} \Pr\{X = x_j, Y = y_k\}\, x_j y_k$. However, by (4) this last sum equals E(XY) which proves (8). Note that g(X) is constant on the sets {X = xj}, so (9) follows from (4). The proof of (10) is easy. //
Example 7. A company produces transistors. They estimate that the probability that any one of the transistors is defective is 0.1. Suppose a box contains 20 transistors. What is the expected number of defective transistors in a box?
Solution. Let Xj = 1 if the jth transistor is defective and Xj = 0 if it is good. The number N of defective transistors is N = X1 + … + X20. By (6) in Theorem 6 one has E(N) = E(X1) + … + E(X20). By Proposition 2 one has E(Xj) = 0.1 for each j. So E(N) = (0.1)(20) = 2.
This example illustrates the general fact that the expected number of successes in n independent trials, each with success probability p, is np.
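The Python sketch below checks the linearity shortcut against a direct computation with the binomial probability mass function (that N is binomial follows from independence; the code itself is just illustrative):

from math import comb

p, n = 0.1, 20
by_linearity = n * p          # E(N) via (6): sum of 20 Bernoulli means

# Direct computation: Pr{N = k} = C(n, k) p**k (1-p)**(n-k).
direct = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(by_linearity, round(direct, 10))   # 2.0 2.0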
Example 8. Consider a random walk where the probability of a step to the right is ½ and the probability of a step to the left is ½. After 2 steps your position Z could be either –2, 0 or 2 with probabilities ¼, ½ and ¼ respectively. Compute E(Z²).
Solution. By (9) one has E(Z²) = (–2)² Pr{Z = –2} + (0)² Pr{Z = 0} + (2)² Pr{Z = 2} = (4)(1/4) + (0)(1/2) + (4)(1/4) = 2. If we were to compute E(Z²) from the definition (3), then E(Z²) = (4) Pr{Z² = 4} + (0) Pr{Z² = 0} = (4)(1/2) + (0)(1/2) = 2.
Example 9. An unfair coin is tossed twice with the two tosses independent of each other. Suppose on each toss Pr{H} = ¼ and Pr{T} = ¾. For each toss we win $1 if it comes up heads and lose $1 if it comes up tails. Let Wj be the amount we win on the jth toss. Compute E(W1W2).
Solution. By (8) one has E(W1W2) = E(W1) E(W2). One has E(W1) = E(W2) = (1)(1/4) + (–1)(3/4) = –1/2. So E(W1W2) = (–½)(–½) = ¼.
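The following Python sketch double-checks (8) here by brute-force enumeration of the four outcomes (the dictionaries are our own encoding of the tosses):

# Example 9: E(W1 W2) by enumerating the four outcomes of two tosses.
pr = {"H": 0.25, "T": 0.75}
win = {"H": 1, "T": -1}

e_product = sum(pr[t1] * pr[t2] * win[t1] * win[t2]
                for t1 in pr for t2 in pr)      # direct enumeration
e_single = sum(pr[t] * win[t] for t in pr)      # E(W1) = E(W2) = -1/2
print(e_product, e_single ** 2)                 # both 0.25, as (8) predicts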
The Law of Large Numbers. As mentioned earlier, the fact that (2) holds is called the Law of Large
Numbers. A precise statement is as follows.
Theorem 7. Let X1, X2, …, Xn, … be a sequence of independent random variables all taking on the same set of values x1, …, xm and having the same probability mass function f(x). Let

$\bar{X}_n = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$

Then $\Pr\{a : \bar{X}_n(a) \to \mu_X \text{ as } n \to \infty\} = 1$.
Note that $\bar{X}_n$ is again a random variable, so that $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n, \ldots$ is a new sequence of random variables, and for each outcome a one has a sequence of numbers $\bar{X}_1(a), \bar{X}_2(a), \ldots, \bar{X}_n(a), \ldots$ The Law of Large Numbers says that $\bar{X}_n(a) \to \mu_X$ except for a set of outcomes that has probability zero. For the proof of the Law of Large Numbers see a more advanced book on probability.
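To see Theorem 7 at work, here is a short Monte Carlo sketch in Python using the gasoline-price distribution of Example 2, whose mean is 2.73 (the seed and sample sizes are arbitrary choices of ours):

import random

values = [2.60, 2.70, 2.80, 2.90, 3.00]
weights = [0.25, 0.4, 0.2, 0.1, 0.05]

random.seed(0)                        # fixed seed so the run is repeatable
for n in [10, 100, 10000]:
    sample = random.choices(values, weights=weights, k=n)
    print(n, round(sum(sample) / n, 3))   # sample averages approach 2.73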