7. A manufacturer of transistors submits a bid
on each of four government contracts. A firm
will receive the contract if it submits the lowest
bid. In the past, this manufacturer’s bid has
been the low bid 15% of the time. Assume
that this continues to be the case, and assume
independence from one bid to another.
a. Conduct an experiment of 50 trials to find
the experimental probability that the manufacturer will not receive any of the four
contracts.
b. What is the theoretical probability that the
manufacturer does not receive any of the
contracts?
c. What is the theoretical probability that the
manufacturer will receive at least one of the
contracts?
8. During a certain week, there is a 25% chance
of rain each day. Assume that whether it rains
on one day is independent of whether it rains
on any other day.
a. What is the theoretical probability that it
does not rain on any day during the week?
b. Write out the complement to the event, “It
does not rain on any day during the week.”
c. Using parts (a) and (b), find the theoretical
probability that it rains at least one day
during the week.
5.7 SAMPLING WITH OR WITHOUT REPLACEMENT
We will introduce this important topic with an example. Suppose we have
four colors of flags. We decide to wave three flags in sequence to signal from
a ship to the shore (not a likely approach in this electronic age). Suppose
a young girl finds the flags and sends a message at random. She begins
by sampling one flag—that is, she chooses the first color she will transmit.
Now she has a choice. She can sample from the three remaining flags, or
she can replace the first flag and transmit the second color by randomly
choosing from the same four flags as the first signal. This choice is the
distinction between sampling randomly without or with replacement. It
is a very important distinction in statistics, where we very often sample
randomly from a population in order to use properties of the sample to
estimate properties of the entire population.
Let’s count the number of signals possible if the child is sampling without
replacement. Denote the four colors as R (red), O (orange), Y (yellow), and
B (blue). We’ll use a tree diagram to count all possible outcomes. See Figure
5.8. Notice that 24 signals are possible (4 × 3 × 2 = 24).
What is the probability of the signal ORY (O followed by R followed by
Y)? Since this signal can occur only in one way, we have
p(ORY) = 1/24
because all possible sequences (keeping in mind that the sampling is without
replacement) are equally likely.
[Figure 5.8: Possible three-flag signals without replacement. Tree diagram: Flag 1 has four branches (R, O, Y, B); each branch leads to the three remaining colors for Flag 2, and each of those to the two remaining colors for Flag 3, giving 24 signals.]
What is the probability of a signal containing no yellow flag? Applying
the principle of equally likely outcomes, we see by counting the outcomes
in Figure 5.8 having no Y (such as OBR) that
p(no Y) = 6/24
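The counts read off the tree diagram can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original text; the variable names are illustrative. It lists every ordered choice of three distinct flags and counts the signals of interest.

```python
from fractions import Fraction
from itertools import permutations

colors = "ROYB"  # red, orange, yellow, blue

# Every ordered choice of 3 distinct flags: sampling without replacement.
signals = ["".join(p) for p in permutations(colors, 3)]

total = len(signals)
p_ory = Fraction(signals.count("ORY"), total)
p_no_y = Fraction(sum("Y" not in s for s in signals), total)

print(total)   # 24
print(p_ory)   # 1/24
print(p_no_y)  # 1/4, which is 6/24 in lowest terms
```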
If instead the child creates signals randomly with replacement, the above
probabilities will change. Even the total number of possible outcomes
will be different: there will be 4 × 4 × 4 = 64 possible signals rather than
4 × 3 × 2 = 24. For example, ORY can occur in one way (clearly) and hence
p(ORY) = 1/64. You can solve the problem of p(no Y) by drawing the tree
diagram, analogous to that of Figure 5.8, and counting outcomes having
no Y.
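If drawing the 64-branch tree by hand is tedious, the same count can be made by enumeration. This is a sketch under the same assumptions as the text (four colors, three flags, all sequences equally likely); the variable names are illustrative. Note that it prints the answer to the p(no Y) question posed above.

```python
from fractions import Fraction
from itertools import product

colors = "ROYB"  # red, orange, yellow, blue

# Every ordered choice of 3 flags where a color may repeat:
# sampling with replacement.
signals = ["".join(s) for s in product(colors, repeat=3)]

total = len(signals)
p_ory = Fraction(signals.count("ORY"), total)
p_no_y = Fraction(sum("Y" not in s for s in signals), total)

print(total)   # 64
print(p_ory)   # 1/64
print(p_no_y)  # 27/64, since each of the 3 flags has 3 non-yellow choices
```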
As another example, suppose you were to draw three balls with replacement from a jar containing 10 balls of 10 different colors. You must
return each drawn ball to the jar and mix thoroughly before drawing the
next one. It would be possible to draw the same ball more than once in one
trial. If you were not to replace each ball before drawing the next one, the
drawing would be a drawing without replacement, and in any given trial
the same ball could not be drawn more than once.
If a table of random numbers is used to represent the above drawing
of three balls with replacement, the same random number may occur more
than once in a given trial. But if the drawing is without replacement, the
same number may not be used more than once in a given trial. That is, a
duplicate must be ignored and the next number in the table must be chosen,
and so on until a number that is not a duplicate occurs. Thus we can use
random numbers to simulate sampling without replacement provided we
ignore duplicates.
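The rule of ignoring duplicates can be sketched in code. In the example below a pseudorandom generator stands in for the table of random numbers, and the function names and the labeling of the 10 balls as 0 through 9 are assumptions made for illustration.

```python
import random

def draw_with_replacement(balls, k, rng):
    """Take k draws as they come; the same ball may appear more than once."""
    return [rng.choice(balls) for _ in range(k)]

def draw_without_replacement(balls, k, rng):
    """Keep drawing, ignoring duplicates, until k different balls are found."""
    drawn = []
    while len(drawn) < k:
        ball = rng.choice(balls)
        if ball not in drawn:  # a duplicate is ignored; move to the next number
            drawn.append(ball)
    return drawn

rng = random.Random(0)    # fixed seed so the run is repeatable
balls = list(range(10))   # 10 balls, labeled 0 through 9

print(draw_with_replacement(balls, 3, rng))
print(draw_without_replacement(balls, 3, rng))
```

In practice Python's `random.sample(balls, 3)` draws without replacement directly; the loop above is written out only to mirror the table-reading procedure.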
Now consider the following problem to illustrate how this works.
Example 5.12
What is the probability of getting at least one ace in a hand of five cards dealt from
a deck of 52 ordinary playing cards?
Solution
The drawing involved here is without replacement because the same card can occur
only once in a hand of five cards. Let’s find
an experimental probability as an estimation of the theoretical probability. One way
of estimating the probability of getting at least one ace in a hand of five cards would
be to actually deal out hands of five cards to find out the number of times one or
more aces occur. Another approach would be to use a table of random numbers
and follow the five-step procedure.
1. Choice of a Model: Use two-digit random numbers from 01 to 52, inclusive.
Ignore all others.
01–04: ace
05–52: remaining cards in deck
If the first six digits are 09 75 48, we treat them as 09 48 because 75 is greater than
52 and hence is ignored. Thus we now have 52 equally likely outcomes by this
trick of ignoring 53–99 and 00. This is a powerful tool for obtaining equally likely
outcomes when the number of outcomes is not 10, 100, or 1000, say.
2. Definition of a Trial: A trial consists of reading off five random numbers
between 01 and 52, ignoring duplicates. That is, if the first six digits in the table are
03 03 27, then we treat this as 03 27 because the second 03 is a duplicate.
Table 5.11  Estimating the Probability of At Least One Ace in Five Cards

Trial   Random numbers    Success?
  1     49 29 25 02 52    Yes
  2     42 45 40 49 07    No
  3     37 20 30 38 21    No
  4     48 32 07 30 22    No
  5     43 49 04 26 09    Yes
  6     09 38 44 22 36    No
  7     39 16 51 19 06    No
  8     10 09 49 50 24    No
  9     01 21 03 26 02    Yes
 10     10 02 16 47 13    Yes

3. Definition of a Successful Trial: A successful trial occurs when at least one of
the five two-digit random numbers is between 01 and 04, inclusive (that is, when at
least one of the numbers obtained represents an ace).
4. Repetition of Trials: Do at least 100 trials.
Suppose 10 trials produced the results listed in Table 5.11. Here we have
removed all pairs larger than 52. In 4 of the 10 trials, at least one of the random
numbers is less than or equal to 4 (trials 1, 5, 9, and 10). Therefore, in four of the
trials we have drawn at least one ace.
5. Finding the Probability of a Successful Trial: (using only 10 trials: too few
for good accuracy!)
P(at least one ace in hand of five cards) = (number of successful trials) / (total number of trials) = 4/10 = 0.4
In Chapter 14 we will learn how to solve problems like this theoretically using
the hypergeometric distribution, which deals with probabilities involving two types
of outcomes (like ace or not ace) when the sampling is without replacement. The
theoretical answer is approximately 0.341.
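The theoretical value can be checked ahead of Chapter 14 with the complement rule: a hand has at least one ace unless all five cards come from the 48 non-aces. The sketch below is an illustration, not the text's own method, and assumes that every five-card hand is equally likely.

```python
from math import comb

# Hands with no ace: choose all 5 cards from the 48 non-aces.
p_no_ace = comb(48, 5) / comb(52, 5)

# Complement rule: P(at least one ace) = 1 - P(no ace).
p_at_least_one_ace = 1 - p_no_ace

print(round(p_at_least_one_ace, 3))  # 0.341
```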
Step 2 of the above procedure is the one that involved drawing samples
without replacement. We sometimes had to search through more than five
two-digit random numbers less than or equal to 52 before we got five that
were different. When drawing with replacement, however, we always take
the two-digit random numbers less than or equal to 52 as they come from
the table (that is, we do not ignore duplicate numbers).
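The five-step procedure of Example 5.12 can be run for many more than 10 trials by letting a pseudorandom generator play the role of the random number table. The following is a hedged sketch: the function names, the seed, and the `replace` flag are assumptions introduced here for illustration.

```python
import random

def two_digit_numbers(rng):
    """Endless stream of two-digit numbers 00-99, like pairs read from a table."""
    while True:
        yield rng.randrange(100)

def deal_hand(stream, replace=False):
    """Steps 1-2: read five usable numbers between 01 and 52 from the stream."""
    hand = []
    for n in stream:
        if n < 1 or n > 52:
            continue  # ignore 00 and 53-99
        if not replace and n in hand:
            continue  # without replacement: ignore duplicates
        hand.append(n)
        if len(hand) == 5:
            return hand

def estimate_p_ace(trials, seed=0, replace=False):
    """Steps 3-5: fraction of trials with at least one ace (numbers 01-04)."""
    stream = two_digit_numbers(random.Random(seed))
    successes = 0
    for _ in range(trials):
        hand = deal_hand(stream, replace)
        if any(n <= 4 for n in hand):
            successes += 1
    return successes / trials

print(estimate_p_ace(20000))                # sampling without replacement
print(estimate_p_ace(20000, replace=True))  # duplicates taken as they come
```

With this many trials the first estimate should land close to the theoretical value for five cards dealt without replacement, and the second close to 1 − (48/52)^5, the value when duplicates are allowed.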
Why Consider Sampling without Replacement?
One of the most important applications of statistics is the random sampling
of real populations, often people, in order to decide, based on the sample,
what the population is like. Since it is easier to find theoretical probabilities
for the independent subtrials that result from doing such sampling with
replacement, one prefers the theory that results when the sampling is with
replacement. In real sampling situations, however, it would be silly to
allow an individual to be sampled twice. Therefore, as a practical matter,
almost all sampling of populations in real statistical problems is done without
replacement.
It can be shown that the two theoretical probabilities of an event computed for sampling with replacement and for sampling without replacement,
such as the two differing values for p(At least 60 of 100 sampled people
favor legalizing abortion), are in fact almost equal to each other if the size
of the population is large relative to the sample size. This is true because
if the population is large, the chance of choosing an individual who has
already been sampled is very small even if the sampling is with replacement.
Thus, excluding this possibility of resampling a person, which is exactly
what sampling without replacement does, makes almost no change in a
computed sampling probability of interest that is more easily computed
assuming sampling with replacement.
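To see how close the two probabilities are, one can compute both exactly for a made-up population. The figures below (a population of 100,000 with 55% in favor, and a small population of 200 for contrast) are hypothetical and chosen only for illustration; they do not come from the text. Sampling with replacement gives a binomial count, and sampling without replacement gives a hypergeometric count.

```python
from math import comb

def p_at_least_with_replacement(k_min, n, pop, favor):
    """P(at least k_min of n sampled favor), sampling with replacement."""
    p = favor / pop
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

def p_at_least_without_replacement(k_min, n, pop, favor):
    """Same event, sampling without replacement (exact counting of samples)."""
    ways = sum(comb(favor, k) * comb(pop - favor, n - k)
               for k in range(k_min, n + 1))
    return ways / comb(pop, n)

# Hypothetical populations: 55% in favor, sample of 100, "at least 60 favor".
for pop in (200, 100_000):
    favor = pop * 55 // 100
    with_r = p_at_least_with_replacement(60, 100, pop, favor)
    without_r = p_at_least_without_replacement(60, 100, pop, favor)
    print(pop, round(with_r, 4), round(without_r, 4))
```

For the large population the two printed values should nearly agree, while for the population of 200 (where the sample is half the population) they should differ noticeably.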
Suppose we decide to address a small sample, large population sampling
situation with the five-step method because we need to estimate a probability
(or an expected value of interest to us). Then the two methods of defining
a trial (sampling with replacement and sampling without replacement
between subtrials) are, roughly, equally easy to carry out. Therefore we can
decide to sample with or without replacement, with the knowledge that
either approach is acceptable because the associated theoretical probabilities
of a successful trial are so close to each other. By contrast, if we were
instead doing a theoretical analysis of the sampling situation to solve for
a probability (or an expected value), we would likely assume sampling
with replacement because the mathematics needed to do our probability
computations is so much easier and the resulting answer is so close to the
(often difficult to compute) theoretical answer assuming sampling without
replacement.
SECTION 5.7 EXERCISES
Many of these probability problems are difficult to solve theoretically, especially when the
sampling is without replacement. You will likely
often need to resort to the five-step method, but
try to solve them theoretically first.
1. A bag contains five black marbles, four red
marbles, and three white marbles. Three marbles are drawn in succession.
a. If each marble is replaced before the next
one is drawn, what is the probability that at