sample—the distribution of the number of heads in 10 fair coin tosses, for
example.
An example of such a distribution is that of chi-square, discussed in
Chapter 7. One benefit of the chi-square distribution is that a probability
involving a chi-square statistic (a statistic computed from a random sample) can be approximated by the area under a particular smooth curve,
called a chi-square density. We can therefore use a table of probabilities for
this chi-square distribution to bypass the six-step method when doing a
chi-square test, obtaining from a chi-square table the probability (under
the null hypothesis) of a chi-square statistic being larger than the observed
chi-square value. Such a probability is determined by computing the area
under the density lying to the right of the specified value. The chi-square
density is a continuous distribution (recall Figure 7.4), whereas the above two
specified distributions are discrete. Both types are very important in statistics.
In summary, by the distribution of a statistic (also called a random variable) we mean the set of the theoretical probabilities of its possible values in
the discrete case, and a density in the continuous case. In Section 6.1 we discussed the concepts of sample and population. As discussed above, the population distribution gives the theoretical probability of each possible value
of the statistic resulting from randomly sampling once from a population.
When the population is a large real physical population, such as the
heights of adult males, we often use a continuous distribution whose density
has a shape approximately that of the relative frequency histogram (also
called probability histogram) of the population. For example, the probability
law for sampling one adult male’s height is given by the famous bell-shaped
curve (called a normal density), because the relative frequency histogram
of all the population’s heights is bell shaped. The population distribution is
selected to approximate the relative frequency histogram of the population.
We will first discuss the discrete distribution associated with yes/no or
success/failure situations, such as coin tossing or medical drug testing.
8.1 THE BINOMIAL DISTRIBUTION: SUCCESS COUNTS
The simplest random experiments focus on only two possible results.
Buy a lottery ticket: you either win or lose.
Flip a coin: it comes up heads or tails.
Randomly choose a person from the United States: does the person have
confidence in the public schools? Yes or no.
Deal a 5-card hand from a typical 52-card deck: all the cards are from the
same suit (a flush), or at least two suits appear.
A person is given a flu shot: the person either does or does not contract
the flu.
A couple has a baby: it is either a girl or a boy.
On a given day in a given city, it either snows or does not snow.
The statistic in such an experiment is either 0 or 1. One has to decide which
result is assigned 1 and which is assigned 0. For example, if your lottery
ticket wins, you record 1, but if you lose, you record 0. Often the outcome
associated with the 1 is called a “success,” especially if the context suggests
that one outcome is good and the other is bad.
Suppose you buy one lottery ticket every day for a week. Each ticket
has a 25% chance of winning (the amount won is usually fairly small).
The population distribution is p(win) = p(1) = 1/4 and p(lose) = p(0) = 3/4.
What might the data look like? Here is a typical week:
Day          Win or lose    Statistic
Monday       Win            1
Tuesday      Lose           0
Wednesday    Lose           0
Thursday     Lose           0
Friday       Lose           0
Saturday     Win            1
Sunday       Lose           0
Monday and Saturday happened to have winners. The other days lost.
Instead of looking at the above table as representing seven small
experiments, consider it to represent one larger experiment that consists of
buying seven lottery tickets and counting the number of winners. (How
would you build and sample from a box model to simulate seven purchases
of a lottery ticket with a 25% chance of winning for each purchased ticket?)
The statistic of interest is the number of winners (you do not care on which
days you won, just on how many days you won). Notice that the statistic
for the number of winners for the week is just the sum of the statistics for
the seven days:
1 + 0 + 0 + 0 + 0 + 1 + 0 = 2
That is, adding up the 1s counts the number of successes in the larger trial or
experiment. For simplicity we will call the random purchase of each ticket
a trial rather than a subtrial.
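To make the parenthetical box-model question above concrete, here is a minimal simulation sketch in Python (the language and the function name simulate_week are our own choices, not part of the text): each ticket is a draw, with replacement, from a box holding one 1 and three 0s, and the statistic is the sum of the seven draws.

    import random

    def simulate_week(num_tickets=7, p_win=0.25):
        """Simulate one week of ticket purchases: each ticket is a draw from a
        box with one 1 (win) and three 0s (lose). Return the number of wins."""
        return sum(1 if random.random() < p_win else 0 for _ in range(num_tickets))

    print(simulate_week())   # e.g., 2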
What is the chance of having no winning ticket in a week? Of having
three winners? Or of having all seven winners? That is, what is the distribution of this sum? In other words, what is the distribution of the statistic of interest computed by summing the sample of 7? Recall that the term distribution
simply means the specification of the probabilities for the various possible
values of the statistic of interest, in this case 0, 1, . . . , 7. The answer is the
binomial distribution. Bi means two, nom means name. The idea is that
each day there are two possible results: win or lose.
For a random experiment in the real world to yield a statistic with a
binomial distribution, several conditions have to be satisfied:
1. There is a fixed number of individual trials.
2. Each trial yields a statistic of either 0 or 1.
3. The chance of an individual trial yielding a 1 is the same for each trial.
4. The trials are independent. That is, the 1s and 0s that occurred on prior
trials have no influence on whether the current trial produces a 1 or
a 0. (It may be helpful to review the discussion on independence in
Section 4.3.)
If the four conditions are satisfied, the sum of the individual statistics (the
number of successes) has a binomial distribution. Later in the chapter we
will learn how to compute binomial probabilities. We refer to an experiment
satisfying the four conditions as a binomial experiment.
The lottery example appears to satisfy the conditions very well. Condition 1: there are exactly seven trials planned. Condition 2: each trial is
either a winner, in which case the statistic is 1, or a loser, in which case the
statistic is 0. Condition 3: the chance of winning (getting a 1) on any one
day is 25%. Condition 4, independence, supposes that whether you win one
day does not have any effect on whether you win on any of the other days.
Theoretically that should be true.
Flipping a coin 25 times and counting the number of heads is also well
modeled by a binomial distribution. (Ask yourself why the four required
conditions hold.) Some other examples follow.
Polling: The Gallup organization conducted a survey during May 28–29,
1996, to assess the public’s confidence in various institutions. It interviewed
1019 adults. These people were chosen randomly in such a way that
everyone in the population had an equal chance of being chosen. For each
person, a 1 was recorded if the person expressed confidence in the public
schools, and otherwise a 0 was recorded. (More specifically, a 1 means the
person had a “great deal” or “quite a lot” of confidence, while a 0 means
the person had only “some” confidence, “very little” confidence, or “no”
confidence or did not know.) The number of 1s was 387; that is, 387 of the
1019 people had confidence in the public schools. That is about 38%.
Think about the four conditions for a binomial experiment. First, the
number of individual trials is 1019, the number of people interviewed. Did
the Gallup Organization fix that number in advance, or actually interview a
random number of people? Probably the number was not exactly fixed, but
for practical purposes it can probably be considered so. Its polls often have
around 1000 respondents, but because of the vagaries of finding people at
home, having working telephones, and so on, the actual number randomly
fluctuates a bit, depending on how lucky the polling organization is in
reaching the people it intends to sample. Second, each trial indeed yields a
1 or a 0.
Third, what is the chance that a given chosen person has confidence in
the public schools? Here we have to be careful about where the randomness
is. Imagine the entire population of adults in the United States. Some have
confidence in the public schools; some do not. For argument, suppose 40%
do have confidence. Then the chance that a randomly chosen person is one
of those with confidence in the public schools is 40%, or 2/5. This chance is
exactly the same for the first person chosen, the second chosen, and so on, so
condition 3 is satisfied. This last statement is not intuitive for many students.
It becomes clearer if one realizes that the 2/5 chance of the second person
having confidence in the public schools is figured without knowledge of the
first person’s response. To make this clear, consider a simple example of two
people buying milk in the case in which one of the five bottles in the grocery
cooler is spoiled. The chance that the first person buys the spoiled milk is of
course 1/5. But the chance for the second person is also 1/5, because the result
from the first person is not known. For example, do you care whether you
are the first or the second purchaser if as the second purchaser you have no
knowledge about the first person’s purchase?
Were people chosen independently? In a practical sense, yes. Thus
the resulting 1s and 0s of the 1019 trials will be independent as required.
There could be a slight problem of dependency between people selected
because a person chosen once cannot be chosen again, which is to say that
the population being drawn from changes slightly from trial to trial. For
example, if a lot of 1s have occurred, then the chance of getting a 1 in the
current trial with so many people with 1s already selected may be less.
But the population of the United States is so much larger than 1019 that
the chance of getting a 1 on the current trial is for all practical purposes
uninfluenced by the results of previous trials. For all practical purposes,
then, independence holds. (Reviewing Section 5.7 may be useful here.)
The result of this poll does appear to be reasonably well modeled by a
binomial distribution, although there are a few subtleties to suggest caution
about our conclusion.
Drawing cards: This example uses just the four aces and four kings from
a regular deck. Shuffle those eight cards well, and choose five off the top.
Count the number of aces you draw. Is this number binomial? The first
condition holds: there are five individual trials. The second holds, too. Each
draw is either 1, if it is an ace, or 0, if it is a king. The third condition
is true, because the chance of an ace being drawn is 1/2 for any of the
cards. (Remember that you are not considering the result of previous trials
when you say the probability of an ace in the third trial, say, is 1/2.) The
fourth condition is false. The chance that the second card is an ace is not
independent of the first card. If the first card is an ace, the chance that the
second is an ace is 3/7, because three aces and four kings are left. If the first
card is a king, the chance that the second card is an ace is 4/7. What is the
chance that the fifth card is an ace if the first four cards are aces? What is the
chance that the fifth card is an ace if the first four cards are kings? Moreover,
3/7 is a lot less than 1/2. So, as was not the case in the above polling example,
where the dependence was slight and hence ignorable, here it is substantial.
The number of aces should not be modeled as a binomial.
Children: For a randomly selected family, is the number of girls among
the children binomial? Certainly, each child is either a girl or boy, and
the chance that any given child is a girl is about 49% (generally slightly
more boys are born than girls). The genders of the children in a family are
probably reasonably independent, especially if we discount the occurrence
of twins. But condition 1 may not be valid: in some cultures many parents
want to have at least one son, so the number of children is not fixed, because
they keep having children until they have at least one son. Therefore the
number of girls may not be binomial. Of course, if we only considered
three-child families, the binomial distribution would work well.
Snow: Each day of the year, it either does or does not snow in Detroit. Is
the number of days it snows during a given year binomial? No. The first
two conditions are valid, but the third is not. The chance of snow on winter
days is obviously much higher than on summer days. What about condition
4? If it is snowing on one day, then clearly the chance of snow on the next
day is higher. Thus condition 4 fails too!
Binomial Probabilities
By knowing the number of individual trials and the chance of a 1 on an
individual trial, it is possible to find the chance of obtaining a given number
of 1s, or successes, in a binomial experiment. For example, we will simulate
the above lottery example. There are seven individual trials, and each has a
1
4 chance of yielding a 1. We generate seven individual trials 100 times, and
each time we find the number of winners. Table 8.1 contains the number
of times we obtained each possible sum (for example, 10 simulated weeks
had no winner, and 37 had one winner), the proportion (experimental
probability) of seven-trial experiments in which we obtained each sum, and
the theoretical probabilities.
Table 8.1  p(number of winners) for Success Probability 1/4

Number of winners    Experimental frequency    Experimental probability    Theoretical probability
0                    10                        0.10                        0.1335
1                    37                        0.37                        0.3115
2                    32                        0.32                        0.3115
3                    17                        0.17                        0.1730
4                     3                        0.03                        0.0577
5                     1                        0.01                        0.0115
6                     0                        0.00                        0.0013
7                     0                        0.00                        0.0000
In the next section we will show how to figure out the theoretical
probabilities that are given in the last column. Note, as expected, that the
experimental probabilities are reasonably close to the theoretical ones. If
instead of simulating 100 weeks we simulated 400 or 1000, the experimental
probabilities would be even closer to the theoretical probabilities.
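Here is one way such a simulation might be coded in Python; this is a sketch under the same assumptions as Table 8.1 (100 weeks, seven tickets per week, win probability 1/4), and because it is random its counts will not match Table 8.1 exactly.

    import random
    from collections import Counter

    def simulate_weeks(num_weeks=100, num_tickets=7, p_win=0.25):
        """Repeat the seven-ticket experiment num_weeks times and tally how
        often each possible number of winners (0 through 7) occurs."""
        counts = Counter()
        for _ in range(num_weeks):
            wins = sum(random.random() < p_win for _ in range(num_tickets))
            counts[wins] += 1
        return counts

    counts = simulate_weeks()
    for k in range(8):
        print(k, counts[k], counts[k] / 100)   # winners, frequency, proportion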
Although the estimate of the theoretical probability of getting seven
winners (outcome is 1111111) rounds to 0, it is possible. The theoretical
probability of doing so is easy to figure out. Since each individual day has a
1/4 chance of winning, and the seven days are independent, the chance that
all seven are winners is
(1/4)(1/4)(1/4)(1/4)(1/4)(1/4)(1/4) = 1/16,384 = 0.000061
(recalling from Section 4.3 the fact that we multiply probabilities to determine the probability of events occurring together when the events are
independent). With this very small probability, which is about 1 in 16,384,
we would expect to wait on average over 315 years (16,384 weeks) before
having an entire week of winners.
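The arithmetic behind these last figures is quick to verify; the sketch below simply recomputes them in Python (the 52-weeks-per-year conversion is our own rounding).

    p_all_seven = (1 / 4) ** 7            # 1/16,384, about 0.000061
    expected_weeks = 1 / p_all_seven      # about 16,384 weeks between such weeks, on average
    expected_years = expected_weeks / 52  # roughly 315 years
    print(p_all_seven, expected_weeks, expected_years)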
Figuring Out Theoretical Binomial Probabilities
The approach to finding the chance of there being seven winners in a trial
can be further developed to find the chance that any specified number of
winners will occur. There are three steps:
1. List all the length 7 win/loss sequences (outcomes) having the specified
number of winners
2. Find the probability of each of those sequences of interest occurring.
3. Sum the probabilities found in step 2. This is legitimate because when
several outcomes each make the event of interest occur, probability
theory requires us to add these outcome probabilities to obtain the
event’s probability.
Let’s start with the simple example of flipping a fair coin twice, which
is a simple case of a binomial experiment. The number of heads could be
0, 1, or 2. Thus we want the probability law for p(0 heads), p(1 head), p(2
heads). For each of these, we follow the above steps.
Number of heads = 0: Only one sequence yields 0 heads: both flips are
tails. Hence step 1 yields
tails tails    00
The chance that both are tails is (1/2)(1/2), because the coin is fair and the flips
are independent. So step 2 yields
p(0 heads) = Probability of (tails, tails)
   = (probability of tails on first flip)(probability of tails on second flip)
   = (1/2)(1/2) = 1/4
Step 3 is to sum the probabilities, and since there is only one, the answer
is 1/4.
Number of heads = 1: For step 1 we need the ways to get one head and
one tail. There are two possible sequences, or outcomes:
heads tails    10
tails heads    01
The chance of each of those two outcomes can be figured out as in the
previous case, so step 2 gives us
Probability of (heads, tails)
   = (probability of heads on first flip)(probability of tails on second flip)
   = (1/2)(1/2) = 1/4
and
Probability of (tails, heads)
   = (probability of tails on first flip)(probability of heads on second flip)
   = (1/2)(1/2) = 1/4
Now add the probabilities to complete step 3, noting that the event of 1 head
occurs if either of the outcomes (heads, tails) = (10) or (tails, heads) = (01)
occurs:
p(1 head) = 1/4 + 1/4 = 1/2
Number of heads = 2: Now both flips must be heads. Step 1 gives
heads heads    11
Step 2 gives
Probability of (heads, heads)
   = (probability of heads on first flip)(probability of heads on second flip)
   = (1/2)(1/2) = 1/4
In step 3 we get a sum of 1/4.
The distribution just derived is binomial with two individual trials in
which the chance of a 1 (success) on any individual trial is 1/2. Here is the
summary:
Number of successes    Theoretical probability
0                      1/4 = 0.25
1                      1/2 = 0.50
2                      1/4 = 0.25
Notice that the probabilities for all possible values of the statistic of
interest add up to 1. This intuitive fact that the sum of the probabilities for all
possible values of a random variable adds to one will be made rigorous in
Chapter 14. But here we can say that when we list the probabilities of all the
distinct possible outcomes of a probability experiment, these probabilities
must sum to one. As a simple example, the sum of the six die probabilities
of 1/6 each is 1.
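The three-step procedure (list the sequences, multiply within each sequence, add across sequences with the given success count) can also be carried out mechanically. Here is a small Python sketch of that enumeration (binomial_by_enumeration is our own name); applied to two fair coin flips it reproduces the table just derived.

    from itertools import product

    def binomial_by_enumeration(n, p):
        """p(k successes) for k = 0..n, by listing all 0/1 sequences of length n,
        multiplying per-trial probabilities, and summing by success count."""
        probs = [0.0] * (n + 1)
        for seq in product([0, 1], repeat=n):      # step 1: all sequences
            prob = 1.0
            for outcome in seq:                    # step 2: multiply per trial
                prob *= p if outcome == 1 else (1 - p)
            probs[sum(seq)] += prob                # step 3: add by success count
        return probs

    print(binomial_by_enumeration(2, 0.5))   # [0.25, 0.5, 0.25]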
Now we tackle a slightly more complicated example. Imagine playing
the Illinois Instant Lottery game three times. Recall that the chance of a
single ticket being a winner is 1/4. The possible values of the statistic, the
number of wins in three tries, are 0, 1, 2, and 3. We will go through the steps
to find the probability of each.
Number of wins = 0: Step 1: The only sequence is
lose lose lose    000
The chance of a single ticket winning is 1/4, so the chance of losing is 1 - 1/4 = 3/4
(because the complement of winning is losing). Step 2: The chance of three
losers in a row is
(3/4)(3/4)(3/4) = 27/64 = 0.421875
Step 3: The answer is 27/64.
Number of wins = 1: Of the three tickets, one is a winner and two are
losers. The winning one could either be the first, second, or third ticket. So
for step 1 we have the three possible sequences:
Win Lose Lose    100
Lose Win Lose    010
Lose Lose Win    001
Step 2 finds the chance of each sequence by multiplication:
Chance of (win, lose, lose)
   = (chance of win on first) × (chance of loss on second) × (chance of loss on third)
   = (1/4)(3/4)(3/4) = 9/64 = 0.140625
Similarly,
Chance of (lose, win, lose) = (3/4)(1/4)(3/4) = 9/64 = 0.140625
Chance of (lose, lose, win) = (3/4)(3/4)(1/4) = 9/64 = 0.140625
Notice that each sequence has the same probability, 9/64. Step 3 adds the three
probabilities:
9/64 + 9/64 + 9/64 = 27/64 = 0.421875
Number of wins = 2: Now there are two wins and one loss. Step 1:
Win Win Lose    110
Win Lose Win    101
Lose Win Win    011
Step 2:
Chance of (win, win, lose)
   = (chance of win on first) × (chance of win on second) × (chance of loss on third)
   = (1/4)(1/4)(3/4) = 3/64 = 0.046875
Continuing:
Chance of (win, lose, win) = (1/4)(3/4)(1/4) = 3/64 = 0.046875
Chance of (lose, win, win) = (3/4)(1/4)(1/4) = 3/64 = 0.046875
Again, each sequence has the same probability, in this case 3/64. Step 3 is to
add them up:
3/64 + 3/64 + 3/64 = 9/64 = 0.140625
Number of wins = 3: There is only one sequence with three wins. Step 1:
Win Win Win    111
Step 2:
Chance of (win, win, win) = (1/4)(1/4)(1/4) = 1/64 = 0.015625
Step 3: The answer is then 1/64. There is not a very good chance that all three
tickets will win.
Now we can summarize the binomial distribution with three trials and
with the chance of a 1 on an individual trial being 1/4:

Number of successes    Theoretical probability
0                      27/64 = 0.421875
1                      27/64 = 0.421875
2                       9/64 = 0.140625
3                       1/64 = 0.015625
Add up the probabilities. Do you obtain 1?
When the number of individual trials is large, writing out all the
possibilities is tedious, but in principle it can be done. Table 8.1 showed
the theoretical probabilities for seven trials for the case in which the
probability of a 1 on a single trial is 0.25. Section 14.2 revisits the topic
of the binomial distribution and gives a compact formula for theoretical
binomial probabilities in general. For your convenience, an incomplete table
of binomial probabilities based on this formula appears in Appendix G.
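If you want to check a table entry numerically, the compact formula mentioned above can be evaluated with Python's math.comb; take this sketch as a preview of Section 14.2 rather than a substitute for it.

    from math import comb

    def binomial_probability(n, k, p):
        """Theoretical probability of exactly k successes in n independent
        trials, each with success probability p."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    # Reproduces the last column of Table 8.1 (n = 7, p = 1/4)
    for k in range(8):
        print(k, round(binomial_probability(7, k, 0.25), 4))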
Means and Standard Deviations
An advantage to recognizing that a random experiment (viewable as
sampling from a population) yields a statistic (random variable) with a
known probability distribution, such as the binomial, is that many important
characteristics of the experiment can be predicted without use of five-step simulation. For example, theoretical probabilities of interest can often
be obtained through tables or formulas (rather than the laborious five-step method). In addition, there is usually a simple expression for the
theoretical mean and the theoretical standard deviation, the most widely
used measures of the center and spread of a distribution. Concerning the
binomial distribution, in Chapter 11 confidence intervals that are useful
to estimate the theoretical mean of a binomial distribution with a built-in
measure of estimation accuracy are developed, and in Chapter 12 hypothesis
tests to carry out decision making concerning the theoretical mean of a
binomial distribution are developed. Below we discuss the theoretical mean
and standard deviation of a binomial.
In advanced statistics books that utilize calculus, general formulas for
the mean and standard deviation of a distribution (population parameters:
recall the discussion in Section 6.1) are given. Perhaps more important
than having the ability to compute such theoretical (population) means
and standard deviations, however, is having a clear understanding of their
meaning and usefulness. Our five-step method and the notion of sampling
from a distribution provide this. Consider simulating over and over the
experiment of seven lottery plays with the number of wins being the statistic
of interest. This repeated five-step simulation of the number of wins consists
of repeatedly sampling from a binomial distribution having seven trials
and success probability 1/4. Now imagine 100 trials of this experiment,
conducted with the five-step method using seven draws with replacement
from a box of 3 zeros and 1 one. As we learned in Chapter 4, which discussed
expected value, the average of these 100 sampled statistics (numbers of wins)
from the distribution will be close to the theoretical expected value, which
is the distribution mean we need. Indeed, if more and more trials were
carried out, the experimental average, or estimated expected value, would
approach closer and closer to this theoretical mean.
Thus, the real meaning of a theoretical mean of a distribution is that it
is a parameter that indicates approximately where a random sample from
a population having that distribution will center. “Center” here is measured by the sample mean of the random sample. Conversely, the sample
mean of a large random sample from a distribution (population) indicates
approximately where the unknown population’s mean lies. Similarly, the
theoretical standard deviation indicates approximately the spread likely for
a large random sample, while conversely the sample standard deviation of
a large random sample indicates approximately the size of the population’s
unknown standard deviation and hence the unknown size of the population’s spread. In our example, the population could be thought of as the
result of playing the lottery seven times over and over indefinitely, each
time recording the number of wins.
In Section 14.4 we will learn a general mathematical formula for finding
the theoretical expected value of a statistic and in particular for applying it to
finding the population mean and standard deviation of various theoretical
probability distributions.
If you flip a fair coin 10 times, you would expect about five heads. In
any binomial experiment, you would conjecture that the proportion of 1s
will be equal to the probability any individual trial is a 1 times the number
of trials. Indeed, in a binomial experiment, the theoretical mean number of
1s, or successes, is
(Number of trials) × (probability of a 1 on an individual trial)
In the lottery example above, suppose you buy a lottery ticket on each
of seven days of the week and the chance of winning on any given day is 1/4,
Theoretical mean number of wins
   = (Number of days) × (probability of winning on a given day)
   = 7 × (1/4) = 7/4 = 1.75
You expect 1.75 winning tickets in a week. What can this mean? You
know you cannot win exactly one and three-quarters times. It is important to realize that expected means that after repeating the experiment
over and over (thus repeatedly sampling from a binomial distribution),
the average number of wins per week would be around 1.75. (Recall
Chapter 4, where we found such expected values empirically by the five-step
method. Here we are using the formula for the expected number of successes
in a binomial experiment, which lets us bypass the five-step method and find
that expected number exactly, where the five-step method only approximates it.)
In the 100 five-step method experiments summarized in Table 8.1, the
actual average is 1.69:
[(10 × 0) + (37 × 1) + (32 × 2) + (17 × 3) + (3 × 4) + (1 × 5)] / 100 = 169/100 = 1.69
This experimental expected value or mean, 1.69, is fairly close to the
theoretical mean, 1.75, that it estimates.
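Both numbers are easy to check by hand or in a few lines of Python; the frequencies below are simply copied from Table 8.1.

    # Theoretical mean of a binomial: (number of trials) x (probability of a 1)
    theoretical_mean = 7 * 0.25                       # 1.75

    # Experimental average from the Table 8.1 frequencies
    frequencies = {0: 10, 1: 37, 2: 32, 3: 17, 4: 3, 5: 1, 6: 0, 7: 0}
    total_weeks = sum(frequencies.values())           # 100
    experimental_mean = sum(k * f for k, f in frequencies.items()) / total_weeks   # 1.69

    print(theoretical_mean, experimental_mean)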
How far do the actual values of the statistic of interest (number of
successes in 7 trials) from the individual experiments typically differ from
1.75? In most weeks the number of winners was 0, 1, 2, or 3, so you expect
usually to be fairly close to the theoretical mean. How close can be measured
by the theoretical standard deviation of the binomially distributed number
of wins because it is a measure of the typical spread one expects of an
observed statistic of interest from its theoretical mean. That is, being off
by up to one standard deviation from the theoretical mean is a common
occurrence. Indeed, obtaining a sampled statistic that is very close to the
theoretical mean (relative to a distance away of one standard deviation) is
quite unusual: variation is always present in random experiments! Crudely,
we expect to be off from the mean by about one standard deviation, on
average. For the data in Table 8.1, we estimate the theoretical standard
deviation by the sample standard deviation (the square root of the sample
variance of the data):
square root
冦 100 [10(0 ⫺ 1.69) Ⳮ 37(1 ⫺ 1.69) Ⳮ 32(2 ⫺ 1.69) Ⳮ
17(3 ⫺ 1.69) Ⳮ 3(4 ⫺ 1.69) Ⳮ (5 ⫺ 1.69) ]冧
1
2
2
2
2
2
2
⳱
冪
105.39
⳱ 1.0267 ⬇ 1
100
We express our conclusion about the typical variation in the sample mean
by stating that the number of winners per week averaged 1.69, plus or
minus the standard deviation, which is about 1. By “plus or minus about 1”
we mean that the deviation from 1.69 is typically about 1 or less in
magnitude and can be in either direction.
The theoretical standard deviation of the number of wins is approximately the standard deviation for the number of wins you would obtain if
you ran the experiment a large number of times. The theoretical standard
deviation for a binomial has a convenient, if not particularly intuitive, formula: In a binomial experiment, the theoretical standard deviation of the
number of 1s is
square root[(number of trials) × (probability of a 1 on an individual trial)
× (probability of a 0 on an individual trial)]
In the lottery example,
Theoretical standard deviation of the number of wins
   = square root[(number of days) × (probability of winning on a given day)
     × (probability of losing on a given day)]
   = square root[7 × (1/4) × (3/4)] = square root(21/16) = square root(1.3125) = 1.1456
It is close to the experimental value of 1.0267—not surprising, because with
100 trials we anticipate that the experimental or sample value will be close
to its theoretical or population value. (This is of course our justification for
using the five-step method with a reasonably large number of trials.)
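The same comparison can be reproduced in code, again using the Table 8.1 frequencies and dividing by 100 (as in the calculation above) rather than by 99.

    from math import sqrt

    # Theoretical standard deviation: square root of n x p x (1 - p)
    theoretical_sd = sqrt(7 * 0.25 * 0.75)                         # about 1.1456

    # Sample standard deviation of the 100 simulated weekly counts
    frequencies = {0: 10, 1: 37, 2: 32, 3: 17, 4: 3, 5: 1}
    n = sum(frequencies.values())                                  # 100
    mean = sum(k * f for k, f in frequencies.items()) / n          # 1.69
    variance = sum(f * (k - mean) ** 2 for k, f in frequencies.items()) / n
    sample_sd = sqrt(variance)                                     # about 1.0267

    print(theoretical_sd, sample_sd)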
Flip a fair coin 100 times. You expect 50 heads (100 × 1/2), and the
theoretical standard deviation of the number of heads is
square root[100 × (1/2) × (1/2)] = square root(25) = 5
Thus it is not unusual to be off by up to 5 heads or so; that is, any number
between 45 and 55 is reasonable. In fact, an important rule of thumb is that
about 2/3 of the time we expect to be off from the theoretical mean by less
than one standard deviation. In this precise sense, the size of the theoretical
standard deviation gives us a quantitative interpretation of the “typical”
variability in whatever is being observed—here the number of heads in 100
tosses. You would be surprised to get as few as 20 or 30, or as many as 70
or 80.
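The two-thirds rule of thumb can itself be checked by simulation. The sketch below (our own construction) repeats the 100-toss experiment many times and records how often the head count falls within one theoretical standard deviation of 50; because the count is discrete, the fraction hovers roughly around 2/3 rather than matching it exactly.

    import random

    def fraction_within_one_sd(num_experiments=10_000, n=100, p=0.5):
        """Estimate how often the number of heads in n tosses lands within one
        theoretical standard deviation of the theoretical mean n * p."""
        mean = n * p
        sd = (n * p * (1 - p)) ** 0.5          # 5 heads when n = 100, p = 1/2
        hits = 0
        for _ in range(num_experiments):
            heads = sum(random.random() < p for _ in range(n))
            if abs(heads - mean) < sd:
                hits += 1
        return hits / num_experiments

    print(fraction_within_one_sd())   # roughly 2/3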
The law of averages discussed in Example 4.3 tells us to expect heads in
about half the tosses. If you flip a fair coin 1000 times, you would therefore
anticipate that the number of heads would be around 500. How far off is
reasonable? More than or less than 5? The answer lies in the theoretical
standard deviation of the number of heads:
square root[1000 × (1/2) × (1/2)] = square root(250) = 15.81
Now from 484 to 516 is a likely number of heads—to be precise, an
occurrence that should happen about 2/3 of the time if we toss a coin 1000
times.
100 flips. The more flips, the larger the standard deviation. However, an
interesting fact is that as a percentage of the number of flips, the standard
deviation is actually less with 1000: 15.81 out of 1000 is 1.581%, while 5 out
of 100 is 5%. Later this observation will be seen to be involved with the law
of averages.
The interested reader is encouraged to consult Section 14.2, where the
formula for computing theoretical binomial probabilities for a specified
number of trials and specified probability of a success (getting a 1) is given.
SECTION 8.1 EXERCISES
1. A baseball team has won 60% of its games to
this point in the season. Over the weekend it
will play three games. Go through the four
conditions for a binomial experiment. Why is
the binomial distribution not appropriate for
the number of games won on the weekend?
2. On a multiple-choice quiz, Joan guesses on
each of five questions. There are four possible
answers for each question.
a. What is the probability that Joan guesses
the first question correctly?
b. Go through the four conditions for a binomial experiment. Verify that the number
of correct answers out of five follows the
binomial distribution.
c. What is the probability that Joan guesses
the correct answer on all five questions?
3. Return to the situation presented in Exercise 2.
a. Write out the possible quiz outcomes in
which Joan correctly guesses exactly two
of the five questions. There should be 10
total possible outcomes. To get you started,
here is one of the outcomes:
Question number    Guess
1                  Right
2                  Right
3                  Wrong
4                  Wrong
5                  Wrong
b. Now, next to each of the outcomes, write
the probability of seeing that particular
outcome.
c. Using the table you have created, find the
probability that Joan gets exactly two of the
five questions correct on the quiz.
4. Give an example of a situation in which the
binomial distribution could be used.
5. On each day of a certain week, there is a 20%
chance that the bus will reach the bus stop
late. Assume that each day is independent of
the next.
a. What is the probability that the bus reaches
the bus stop on time on a certain day?
b. What is the probability that the bus reaches
the bus stop on time every day in a five-day
work week?
c. What is the probability that the bus reaches
the bus stop on time exactly four days in
the five-day work week?
d. What is the theoretical mean for the number of times the bus will reach the bus stop
on time in a five-day work week? What is
the standard deviation?
6. For each of the following binomial distributions, give the theoretical mean and the
theoretical standard deviation:
a. n = 10; probability of a 1 is 0.90.
b. n = 22; probability of a 1 is 0.35.
c. n = 14; probability of a 1 is 0.12.
d. n = 7; probability of a 1 is 0.47.
7. There are 100 poker chips in a bowl, 65 of
which are blue and 35 red. You are to draw
out a chip 50 times. After each draw, you
replace the chip into the bowl (that is, there
are 100 chips in the bowl prior to each draw).
Hint: Does replacement make this a binomial
probability problem?
a. What is the theoretical expected value of
the number of blue chips you will draw in
50 draws?
b. What is the theoretical standard deviation
of the number of blue chips you will draw
in 50 draws?
c. For each of the following values, give a
subjective answer as to whether it is likely
or not to draw that many blue chips in 50
draws: 18 or fewer; 29 or fewer. Base your
decisions solely on the theoretical expected
value and standard deviation you found
in parts (a) and (b). Justify your answers.
8. You roll a fair, six-sided die six times. Calculate the following probabilities by writing out
all the possible outcomes that result in the
stated event and summing those probabilities.
a. The probability that exactly one of the six
tosses will come up a five
b. The probability that exactly five of the six
tosses will come up an even number
c. The probability that all of the tosses will
come up an odd number
9. A shopper goes to the grocery store and purchases a bag of 10 apples. Based on previous
experience, he knows that 5% of the apples of
this brand will be bruised and inedible. Assume that the apples are randomly assigned
to the bag. What is the probability that in his
bag he has either no bad apple or only one
bad apple? (Hint: The probability that there is
either one bad apple or none equals the sum of
the probability that there is no bad apple and
the probability that there is one bad apple.)
For additional exercises, see page 726.