Chapter 5
The Binomial Distribution
The binomial distribution is a very useful distribution for studying situations
where the information is available in a ``yes'' or ``no'' form. Such information
is typical of market research studies. ``Do you prefer product A or product B?'' ``Would
you buy this product if the price was $5.00?'' Another application is election polls. ``If
the election were held tomorrow, would you vote for Smith or Jones?'' A response for
Smith could be considered a yes and a response for Jones a no. Another business
application that is becoming increasingly important is quality control. A machine
is producing parts. We are interested in the proportion of defective parts (a defective part
could be considered a ``yes'') the machine produces as opposed to the proportion of good
parts (a good part considered a ``no'').
The binomial distribution arises from binomial experiments. These experiments have the
following characteristics:
1. The experiment consists of n distinct trials.
2. Each trial must result in one of two possible outcomes. Designate one of these
outcomes as a success, S, and the other as a failure, F.
3. Let the probability of a success be P(S)=p and the probability of a failure be P(F)=q.
Successes and failures are complements, so p+q=1. Also, p and q do not change from trial to trial.
4. The trials are independent. That is, the probability of a success on the third trial,
for example, does not depend on the outcome of the previous trials.
5. The random variable X is the number of S's in the n trials.
Let's conduct an experiment which consists of tossing a single coin $n=20$ times and see
if it is a binomial experiment. First of all you can easily distinguish the results of the
trials and know how many of them there are, so (1) is met. If we call getting a head a
success (S) and getting a tail a failure (F) then it is clear that only S's and F's can occur
and that one and only one of them will result on any toss, so (2) is met. Heads and tails
are complements and the probability of getting a head or getting a tail won't change from
toss to toss, so (3) is met. The probability of getting a head on any toss should not
depend on the results of any previous trial, so (4) is met. The final condition, (5), is met
because we can count the number of heads (S's). So this is a binomial experiment.
Consider a second experiment. An urn is filled with 10 red balls and 10 black balls. Two
balls are drawn from the urn. Call getting a red ball a success. Is this binomial?
Conditions (1), (2), and (5) seem to be met. Conditions (3) and (4) depend on whether
sampling with replacement (putting each ball back in the urn after it is drawn) or
sampling without replacement (the balls are not placed back after each draw) is used. If
sampling with replacement is used, the experiment is binomial. If sampling without
replacement is used, the experiment is not binomial. Condition (3) is not met because both
p and q change from trial to trial.
The probability of getting a red ball (S) on the first draw is P(S)=10/20. Suppose you get
an S on draw 1. Then on draw 2, P(S)=9/19. On the other hand, if you get an F on draw
1, the probability of getting an S on draw 2 is P(S)=10/19. So p and q change from trial
to trial. Also, the probability of getting a success on the second trial depends on the result
of the first trial. This means the trials are not independent, so condition (4) is violated.
There is a probability distribution that describes sampling without replacement: the
hypergeometric distribution.
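The effect of replacement on the urn example can be sketched in a few lines of Python. This is only an illustration: the function name `p_two_reds` is made up for this sketch, and it computes a single quantity, the probability that both draws are red.

```python
from fractions import Fraction

RED, BLACK = 10, 10  # urn contents from the example


def p_two_reds(with_replacement: bool) -> Fraction:
    """Probability that both of two draws are red."""
    total = RED + BLACK
    p_first = Fraction(RED, total)
    if with_replacement:
        # The ball is put back, so p is unchanged on draw 2.
        p_second = Fraction(RED, total)
    else:
        # One red ball is gone, so p drops to 9/19 on draw 2.
        p_second = Fraction(RED - 1, total - 1)
    return p_first * p_second


print(p_two_reds(True))   # 1/4
print(p_two_reds(False))  # 9/38
```

With replacement the result matches the binomial value (0.5)(0.5); without replacement it does not, which is why the binomial conditions fail.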
Suppose a market survey is being conducted. Is this binomial? In one sense it is like the
urn problem unless sampling with replacement is done (each person could be questioned
more than once, if they will put up with it). But if a very small sample is taken from a
very large population, the change in p and q would be so small that we could ignore it.
The trials would likely be independent in this case also. But consider the following: the
researcher asks two friends standing side by side what they think of a certain product. Do
you think one friend's response might affect the other's response?
The binomial formula is
\[ P(X = x) = C_x^n \, p^x q^{n-x} \]
where x is the number of successes in n trials, n-x is the number of failures, p is the
probability of a success on any one trial, and q is the probability of a failure on any one
trial. Further,
\[ C_x^n = \frac{n!}{x!\,(n-x)!}. \]
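For readers who prefer to check the formula numerically, here is a minimal Python sketch. The function name `binomial_pmf` is an assumption chosen for this illustration; `math.comb(n, x)` from the standard library computes n!/(x!(n-x)!).

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for n independent trials with success probability p."""
    q = 1.0 - p
    # comb(n, x) is n! / (x! (n - x)!)
    return comb(n, x) * p**x * q ** (n - x)


# Illustration: exactly 2 heads in 4 tosses of a fair coin
print(binomial_pmf(2, 4, 0.5))  # 0.375
```

Summing `binomial_pmf(x, n, p)` over x = 0, ..., n should give 1, which is a handy sanity check.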
Example 1.
Suppose you were to toss 2 coins (n=2). Call getting a head a success (S). Suppose the
probability of getting a head on either coin is 50% (p=q=0.5). Find the probability that
you will get no heads (X=0), the probability that you will get exactly one head (X=1),
and the probability that you will get 2 heads (X=2).
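For P(X=0)
\[ p = 0.5, \quad q = 0.5, \quad n = 2, \quad X = 0 \]
\[ C_x^n = \frac{n!}{x!\,(n-x)!} = \frac{2!}{0!\,(2-0)!} = \frac{1 \cdot 2}{1 \cdot (1 \cdot 2)} = \frac{2}{2} = 1 \]
\[ P(X = 0) = C_0^2 \, p^x q^{n-x} = 1 \cdot (0.5)^0 (0.5)^2 = 1 \cdot 1 \cdot 0.25 = 0.25 \]
For P(X=1)
\[ p = 0.5, \quad q = 0.5, \quad n = 2, \quad X = 1 \]
\[ C_x^n = \frac{n!}{x!\,(n-x)!} = \frac{2!}{1!\,(2-1)!} = \frac{1 \cdot 2}{1 \cdot 1} = \frac{2}{1} = 2 \]
\[ P(X = 1) = C_1^2 \, p^x q^{n-x} = 2 \cdot (0.5)^1 (0.5)^1 = 2 \cdot 0.5 \cdot 0.5 = 0.5 \]
and for P(X=2)
\[ p = 0.5, \quad q = 0.5, \quad n = 2, \quad X = 2 \]
\[ C_x^n = \frac{n!}{x!\,(n-x)!} = \frac{2!}{2!\,(2-2)!} = \frac{1 \cdot 2}{(1 \cdot 2) \cdot 1} = \frac{2}{2} = 1 \]
\[ P(X = 2) = C_2^2 \, p^x q^{n-x} = 1 \cdot (0.5)^2 (0.5)^0 = 1 \cdot 0.25 \cdot 1 = 0.25 \]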
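As a quick check on Example 1, note that zero, one, and two heads are the only possible results of two tosses, so the three probabilities should add up to one:

```latex
\[ P(X=0) + P(X=1) + P(X=2) = 0.25 + 0.5 + 0.25 = 1 \]
```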
Example 2.
We could also consider tossing a coin 10 times and finding the probability that we would
get exactly 3 heads out of the ten tosses.
\[ p = 0.5, \quad q = 0.5, \quad n = 10, \quad X = 3 \]
\[ C_x^n = \frac{n!}{x!\,(n-x)!} = \frac{10!}{3!\,(10-3)!} = \frac{3628800}{6\,(5040)} = 120 \]
\[ P(X = 3) = C_3^{10} \, p^x q^{n-x} = 120\,(0.5)^3 (0.5)^7 = 120\,(0.125)(0.0078) \approx 0.1172 \]
Example 3.
If you were to go to Las Vegas and play a game like roulette with fairly even odds (I
think you have about a 49% chance of winning), then the probability that you will win
exactly 5 games in 10 trials is
\[ p = 0.49, \quad q = 0.51, \quad n = 10, \quad X = 5 \]
\[ C_x^n = \frac{n!}{x!\,(n-x)!} = \frac{10!}{5!\,(10-5)!} = \frac{3628800}{120\,(120)} = 252 \]
\[ P(X = 5) = C_5^{10} \, p^x q^{n-x} = 252\,(0.49)^5 (0.51)^5 = 252\,(0.02825)(0.03450) \approx 0.2456 \]
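The arithmetic in Examples 2 and 3 can be reproduced with a short Python sketch that follows the hand calculation step by step (factorials first, then the powers). The variable names are chosen just for this illustration.

```python
from math import factorial

# Example 2: n = 10, x = 3, p = 0.5
c_10_3 = factorial(10) // (factorial(3) * factorial(7))
p_ex2 = c_10_3 * 0.5**3 * 0.5**7

# Example 3: n = 10, x = 5, p = 0.49
c_10_5 = factorial(10) // (factorial(5) * factorial(5))
p_ex3 = c_10_5 * 0.49**5 * 0.51**5

print(c_10_3, round(p_ex2, 4))  # 120 0.1172
print(c_10_5, round(p_ex3, 4))  # 252 0.2456
```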
It should be clear that the formula becomes difficult to use if n is very large. Excel can
be used in that case.

Probability    Excel command                Name
P(X=5)         BINOMDIST(5,10,0.5,FALSE)    Probability mass function
P(X<=5)        BINOMDIST(5,10,0.5,TRUE)     Cumulative distribution function
Some statistical terminology.
\[ P(X = x) \]
is called the probability mass function.
\[ P(X \le x) = P(X = 0) + P(X = 1) + \cdots + P(X = x) \]
is called the cumulative distribution function. That is, it accumulates (sums) a number of
mass function values.
In Excel

Probability                  Excel command
P(X = x)                     BINOMDIST(x, number of trials, p, FALSE)
P(X <= x)                    BINOMDIST(x, number of trials, p, TRUE)
P(X > x) = 1 - P(X <= x)     1-BINOMDIST(x, number of trials, p, TRUE)

where p is the probability of success on any one trial.
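If Excel is not at hand, the three Excel calls above can be imitated in Python. The sketch below is an assumption-laden stand-in, not Excel's own implementation: the name `binom_dist` and its argument order are chosen here only to mirror the spreadsheet call.

```python
from math import comb


def binom_dist(x: int, trials: int, p: float, cumulative: bool) -> float:
    """Hypothetical Python stand-in for the Excel call described above."""
    def pmf(k: int) -> float:
        return comb(trials, k) * p**k * (1 - p) ** (trials - k)

    if cumulative:
        # P(X <= x): sum the mass function from 0 through x
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)  # P(X = x)


n, p, x = 10, 0.5, 5
print(binom_dist(x, n, p, False))     # P(X = 5)
print(binom_dist(x, n, p, True))      # P(X <= 5)
print(1 - binom_dist(x, n, p, True))  # P(X > 5)
```

The `cumulative` flag plays the same role as the last argument of the Excel command: false returns one mass function value, true returns the accumulated sum.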
Example 4.
Suppose a fair coin is tossed 10 times. Call getting a head a success S, so P(S) = p = 0.5.
(a) Find the probability that you will get exactly 5 heads in the ten tosses.
(b) Find the probability that you will get 5 or fewer heads in the ten tosses.
(c) Find the probability that you will get between 3 and 5 heads in the ten tosses.
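One way to work Example 4 outside Excel is sketched below in Python. Two things here are assumptions rather than part of the text: the helper names `pmf` and `cdf`, and the reading of "between 3 and 5" as including both 3 and 5. Part (c) is computed as a difference of two cumulative values.

```python
from math import comb


def pmf(x: int, n: int = 10, p: float = 0.5) -> float:
    """P(X = x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def cdf(x: int, n: int = 10, p: float = 0.5) -> float:
    """P(X <= x)."""
    return sum(pmf(k, n, p) for k in range(x + 1))


part_a = pmf(5)           # exactly 5 heads
part_b = cdf(5)           # 5 or fewer heads
part_c = cdf(5) - cdf(2)  # 3, 4, or 5 heads (inclusive reading, an assumption)

print(round(part_a, 4), round(part_b, 4), round(part_c, 4))  # 0.2461 0.623 0.5684
```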
Example 5.
Work Example 3 using Excel. Find the probability that you would win 5 out of 10 games
in Las Vegas if the probability of winning any one game is 49%.
Example 7. For the same 10 games, we can find the probability that you will lose money (if you win 4 or fewer
times), break even (if you win 5 times), or win money (if you win 6 or more times).
Example 8. It is interesting to note what happens if you keep playing the game longer.
Suppose you play 100 times rather than 10. You will lose money if you win 49 or fewer
games, break even if you win 50 games, and win money if you win 51 or more games.
Moral: if you go to Las Vegas, the longer you play the more you lose.
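The comparison in Examples 7 and 8 can be sketched in Python as follows. The function name `outcome_probs` is hypothetical, and the sketch assumes even-money bets (each win gains what each loss costs) and an even number of games, as the examples imply.

```python
from math import comb


def outcome_probs(n: int, p: float) -> tuple[float, float, float]:
    """Return (lose, break_even, win) for n even-money games, n even."""
    q = 1 - p
    pmf = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]
    half = n // 2
    lose = sum(pmf[:half])      # fewer wins than losses
    even = pmf[half]            # wins equal losses
    win = sum(pmf[half + 1:])   # more wins than losses
    return lose, even, win


for n in (10, 100):
    lose, even, win = outcome_probs(n, 0.49)
    print(n, round(lose, 3), round(even, 3), round(win, 3))
```

Running it for n = 10 and n = 100 shows the probability of losing money growing with the number of games, which is the point of the moral.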
The Mean and Standard Deviation of the Binomial Distribution
The mean of the binomial distribution is given by
\[ \mu = E(x) = np \]
where E(x) denotes the expected value, which is just another word for mean (some
people like to say mean, others like to say expected value). Suppose you played a game
in which you tossed a coin 10 times, where each toss had a 50% chance of coming up heads.
Suppose you repeated this game a large number of times, recording X, the number of successes in each
game, and then averaged the results over all of the games. That average approximates the expected
value, and it gets closer as the number of games grows. For this case
\[ \mu = E(x) = np = 10(0.5) = 5 \]
The variance and standard deviation of the binomial distribution are given by
\[ \sigma^2 = npq \]
\[ \sigma = \sqrt{npq} \]
and for the game mentioned above
\[ \sigma^2 = npq = 10(0.5)(0.5) = 2.5 \]
\[ \sigma = \sqrt{2.5} \approx 1.58 \]
The use of the mean and standard deviation will become clearer in the next chapter. At
this point we can note that we could use them in Chebychev’s Theorem.
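The idea of repeating the 10-toss game many times and averaging can be tried out with a small simulation. This is only a sketch: the number of games (10,000) and the random seed are arbitrary choices made here, and a different seed gives slightly different numbers.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)  # fixed seed so the run is repeatable
n, p, games = 10, 0.5, 10_000

# X for one game = number of heads in n tosses
results = [sum(rng.random() < p for _ in range(n)) for _ in range(games)]

print(mean(results), n * p)                 # sample mean vs. np
print(pvariance(results), n * p * (1 - p))  # sample variance vs. npq
```

The sample mean and variance should land close to np = 5 and npq = 2.5, though not exactly on them.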
Example 9.
Suppose we were to toss a coin n=1000 times with p=0.5, where we say a success occurs if a
head shows. Find the mean and variance of the probability distribution. Then use
Chebychev’s theorem with k=2 to place limits on the distribution.
\[ \mu = np = 1000(0.5) = 500 \]
\[ \sigma^2 = npq = 1000(0.5)(0.5) = 250 \]
\[ \sigma = \sqrt{npq} = \sqrt{250} \approx 15.81 \]
Then using Chebychev’s theorem
\[ P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2} \]
\[ P(500 - 2(15.81) \le X \le 500 + 2(15.81)) \ge 1 - \frac{1}{2^2} \]
\[ P(468.38 \le X \le 531.62) \ge \frac{3}{4} \]
\[ P(468 \le X \le 532) \ge 75\% \]
Suppose we repeated this experiment a large number of times. Chebychev's Theorem
says that at least 75% of the time the number of heads will be between 468 and 532.
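The limits in Example 9 can be computed with a short Python sketch. The helper name `chebyshev_limits` is an assumption made for this illustration. The last lines also compute the exact binomial probability for the fair coin, to show how conservative the 75% bound is.

```python
from math import comb, sqrt


def chebyshev_limits(n: int, p: float, k: float) -> tuple[float, float, float]:
    """Return (lower, upper, bound) with P(lower <= X <= upper) >= bound."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu - k * sigma, mu + k * sigma, 1 - 1 / k**2


lower, upper, bound = chebyshev_limits(1000, 0.5, 2)
print(round(lower, 2), round(upper, 2), bound)  # 468.38 531.62 0.75

# Exact probability for a fair coin: C(1000, x) / 2**1000 summed over 468..532
exact = sum(comb(1000, x) for x in range(468, 533)) / 2**1000
print(round(exact, 3))
```

Calling the same function with k=3 gives wider limits and a higher guaranteed probability, 1 - 1/9.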
Problems
1. Let the probability of a success on any one trial be p = 0.1. Suppose n = 3 trials are
conducted. Find
(a) the probability that there will be no successes
(b) the probability that there will be exactly 1 success
(c) the probability that there will be exactly 2 successes
(d) the probability that there will be exactly 3 successes
(e) the probability that there will be exactly 1 failure [P(X=2)]
(f) the probability that there will be exactly 2 failures [P(X=1)]
2. Suppose the probability of having a good blind date is 0.20 and that you have 5 blind
dates during the semester. What is
(a) the probability that you will have exactly 4 good blind dates
(b) the probability that you will have exactly 3 bad blind dates
(c)
3. Suppose the probability that a student entering NAU as a freshman will graduate in four
years is p=0.3. Suppose 1000 new freshmen enter NAU this fall. Use Chebychev's
Theorem with k=2 to find limits on the number that will graduate four years hence. Repeat using
k=3.
4. Suppose that 10% of all market research surveys are returned. If a company sends out
n=10,000 surveys, set limits on the number they might expect to have returned. To do this,
use Chebychev's Theorem with k=2 and k=3.
Answers
1. Use the Excel results to check your calculations.
2. Use the Excel results to check your calculations.