Section 7.3
Sampling Distribution of the
Sample Proportion
Binomial Probability Distribution
Recall, in Chapter 6 we discussed a
binomial probability distribution.
What conditions must be satisfied to have a
binomial probability distribution?
Binomial Probability Distribution
A binomial probability distribution must
satisfy these conditions:
B: binomial, meaning each trial must have one of two
outcomes, “success” or “failure”
I: each trial is independent of the others
N: there is a fixed number, n, of trials
S: P(success) does not change
Properties of the Sampling Distribution
of the Number of Successes
If a random sample of size n is selected
from a population with proportion of
successes p, then the sampling
distribution of the number of successes X:
• has mean $\mu_X = np$
• has standard error $\sigma_X = \sqrt{np(1 - p)}$
• will be approximately normal as long as n
is large enough
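The two formulas can be checked against the exact binomial distribution. The sketch below is an illustration only, not part of the original slides; the values n = 40 and p = 0.6 and the helper name `binomial_pmf` are assumptions chosen for the example.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n binomial trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 40, 0.6  # illustrative values

# Formulas for the mean and standard error of the count X
mean_formula = n * p
se_formula = sqrt(n * p * (1 - p))

# The same quantities computed directly from the exact distribution
mean_exact = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_exact = sum((k - mean_exact) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
se_exact = sqrt(var_exact)

print(mean_formula, mean_exact)
print(se_formula, se_exact)
```

Both pairs of values should agree up to floating-point rounding.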
How Large is Large Enough?
As a conservative guideline, if both np and
n(1 - p) are at least 10, then using the
normal distribution as an approximation for
the shape of the sampling distribution will
give reasonably accurate results.
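The guideline is easy to express as a check. This is a minimal sketch; the function name `normal_approx_ok` and its `threshold` parameter are hypothetical names introduced here.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True when both np and n(1 - p) are at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(40, 0.6))  # np = 24 and n(1 - p) = 16
print(normal_approx_ok(20, 0.1))  # np = 2, which is too small
```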
The use of seat belts continues to rise in the
U.S., with overall seat belt usage of 82%.
Mississippi lags behind the rest of the
nation – only about 60% wear seat belts.
(a) Suppose you take a random sample of
40 Mississippians. How many do you
expect will wear seat belts?
You expect that 60% of the 40, or 24,
Mississippians will be wearing seat belts.
(b) What is the probability that 30 or more of
the people in the sample of 40 wear seat
belts?
Hint: Can you use the normal
approximation to the binomial distribution?
np = 40(0.6) = 24
n(1 – p) = 40 (1 – 0.6) = 16
Since both are at least 10, you can use the
normal approximation to the binomial
distribution to determine the probability.
P(30 or more) = normalcdf(lower bound, upper bound, $\mu_X$, $\sigma_X$)
P(30 or more) = normalcdf(lower bound, upper bound, $np$, $\sqrt{np(1 - p)}$)
P(30 or more) = normalcdf(30, 1E99, 24, 3.098) ≈ 0.0264
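For readers without a calculator, the same computation can be sketched in Python. The standard-library `statistics.NormalDist` stands in for `normalcdf` here; the exact binomial tail is included only for comparison and is not part of the original slides.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.6
mean = n * p                 # 24
se = sqrt(n * p * (1 - p))   # about 3.098

# Normal approximation: area to the right of 30
approx = 1 - NormalDist(mean, se).cdf(30)

# Exact binomial probability of 30 or more successes, for comparison
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(30, n + 1))

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The exact tail comes out somewhat larger than the approximation, in part because the calculation above starts the normal area at 30 rather than 29.5 (no continuity correction).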
Sampling Distribution of the
Number of Successes
A survey of hundreds of thousands of
college freshmen found that 63% believe
“dissent is a critical component of the
political process.”
Suppose you take a random sample of 100
of the freshmen surveyed. What is the
probability that you will find that between
56 and 70 of the freshmen in your sample
believe this?
Use Normal Distribution?
np = 100(0.63) = 63
n(1 – p) = 100 ( 1 – 0.63) = 37
Since both are at least 10, the shape of the
sampling distribution will be approximately
normal.
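Now find $\mu_X$ and $\sigma_X$:
$\mu_X = np = 100(0.63) = 63$
$\sigma_X = \sqrt{np(1 - p)} = \sqrt{100(0.63)(0.37)} \approx 4.83$
P(between 56 and 70) = normalcdf(lower bound, upper bound, $\mu_X$, $\sigma_X$)
P(between 56 and 70) = normalcdf(56, 70, 63, 4.83) ≈ 0.853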
Therefore, there is about an 85.3%
probability that a sample of 100 freshmen
will contain between 56 and 70 freshmen
who believe that dissent is a critical
component of the political process.
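The same probability can be reproduced in Python as a rough check. This sketch assumes the standard-library `statistics.NormalDist` as a stand-in for the calculator's `normalcdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.63
mean = n * p                 # 63
se = sqrt(n * p * (1 - p))   # about 4.83

# Area under the normal curve between 56 and 70
dist = NormalDist(mean, se)
prob = dist.cdf(70) - dist.cdf(56)

print(f"P(between 56 and 70) is about {prob:.3f}")
```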
Properties of the Sampling Distribution
of the Sample Proportion
To change from number of successes to
proportion of successes, divide by sample
size, n.
$\hat{p} = \dfrac{\text{number of successes}}{\text{sample size}} = \dfrac{X}{n}$
Properties of the Sampling Distribution
of the Sample Proportion
If a random sample of size n is selected
from a population with proportion of
successes p, then the sampling
distribution of the sample proportion $\hat{p}$ has these properties:
• Mean of the sampling distribution is equal
to the population proportion, or
$\mu_{\hat{p}} = p$
• Standard error of the sampling
distribution is equal to the standard
deviation of the population divided by the
square root of the sample size:
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
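Both properties follow from the earlier results for the count X, because the sample proportion is the count divided by n, that is, $\hat{p} = X/n$. A short derivation, assuming only $\mu_X = np$ and $\sigma_X = \sqrt{np(1 - p)}$ from the earlier slides:

```latex
\begin{align*}
\mu_{\hat{p}} &= \frac{\mu_X}{n} = \frac{np}{n} = p \\
\sigma_{\hat{p}} &= \frac{\sigma_X}{n} = \frac{\sqrt{np(1 - p)}}{n} = \sqrt{\frac{p(1 - p)}{n}}
\end{align*}
```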
• As the sample size gets larger, the shape
of the sampling distribution becomes more
normal. It will be approximately
normal if n is large enough (both np and
n(1 – p) are at least 10).
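A small simulation can make these properties concrete. The sketch below is an illustration under assumed values (n = 60, p = 0.2, 5000 repetitions, a fixed seed); the function name `simulate_sample_proportions` is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_proportions(n, p, reps, seed=0):
    """Draw `reps` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

n, p = 60, 0.2  # illustrative values; np = 12 and n(1 - p) = 48
props = simulate_sample_proportions(n, p, reps=5000)

theory_se = sqrt(p * (1 - p) / n)
sim_mean = mean(props)
sim_se = stdev(props)

# Share of simulated proportions within 1.96 standard errors of p
within = sum(abs(ph - p) <= 1.96 * theory_se for ph in props) / len(props)

print(f"mean of sample proportions: {sim_mean:.4f} (theory: {p})")
print(f"spread of sample proportions: {sim_se:.4f} (theory: {theory_se:.4f})")
print(f"share within 1.96 standard errors: {within:.3f}")
```

The simulated mean and spread should land close to p and to the standard error formula, and most of the simulated proportions should fall within 1.96 standard errors of p.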
Drivers in the Northeast and Mid-Atlantic
states had the highest failure rate, 20%,
on the GMAC Insurance National Driver’s
Test.
Describe the shape, center, and spread of
the sampling distribution of the proportion
of drivers who would fail the test in a
random sample of 60 drivers from these
states.
Random sample of 60 drivers from these
states:
Shape:
np = 60(0.2) = 12
n(1 – p) = 60(1 – 0.2) = 48
Both are at least 10, so the shape of the
sampling distribution of the sample
proportion is approximately normal.
Center: the mean of the sampling distribution is equal to
the population proportion, $\mu_{\hat{p}} = p = 0.2$
Spread:
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{0.2(1 - 0.2)}{60}} \approx 0.05$
What happens if we quadruple the
sample size? The spread is cut in half,
because n appears under a square root.
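The effect of quadrupling the sample size can be checked numerically. A minimal sketch; the helper name `se_proportion` is introduced here for illustration.

```python
from math import sqrt

def se_proportion(n, p):
    """Standard error of the sample proportion for sample size n."""
    return sqrt(p * (1 - p) / n)

p = 0.2
se_60 = se_proportion(60, p)     # original sample size
se_240 = se_proportion(240, p)   # four times as many drivers
ratio = se_240 / se_60

print(f"n = 60:  {se_60:.4f}")
print(f"n = 240: {se_240:.4f}")
print(f"ratio:   {ratio:.2f}")
```

Because n sits under a square root, multiplying n by 4 divides the standard error by 2.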
What are the reasonably likely proportions of
drivers who would fail the test?
mean ± 1.96(standard error)
0.2 ± 1.96(0.05)
So, reasonably likely proportions would be
between about 0.1 and 0.3
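The interval can be recomputed without rounding the standard error first. This sketch uses illustrative variable names and gives endpoints that differ slightly from the hand calculation, which rounds the standard error to 0.05.

```python
from math import sqrt

p, n = 0.2, 60
se = sqrt(p * (1 - p) / n)   # about 0.0516 before rounding

# Reasonably likely proportions: mean ± 1.96 standard errors
low = p - 1.96 * se
high = p + 1.96 * se

print(f"reasonably likely proportions: {low:.3f} to {high:.3f}")
```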
In the 2000 U.S. Census, 53% of the
population over age 30 were women.
Describe the shape, mean, and
standard error of the sampling
distribution of the sample proportion for
random samples of size 100 taken from
this population. Make an accurate
sketch of this distribution, with a scale
on the horizontal axis.
To be a member of the U.S. Senate, you
must be at least 30 years old. In 2000, 9
of the 100 members of the U.S. Senate
were women. Is this a reasonably likely
event if gender plays no role in whether a
person becomes a U.S. Senator?
Use mean and standard error from previous
problem.
normalcdf(lower bound, upper bound, mean,
standard error)
normalcdf(-1E99, 0.09, 0.53, 0.05) ≈ 0
The probability of getting 9 or fewer women
just by chance is essentially 0, so this is not a
reasonably likely event.
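A z-score shows why the calculator reports 0. The sketch below assumes the mean 0.53 and a standard error computed from n = 100, and uses `statistics.NormalDist` as a stand-in for `normalcdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.53, 100
se = sqrt(p * (1 - p) / n)   # about 0.05

observed = 9 / 100           # proportion of women in the Senate
z = (observed - p) / se      # number of standard errors below the mean

# Area to the left of the observed proportion
prob = NormalDist(p, se).cdf(observed)

print(f"z = {z:.1f}")
print(f"probability of 9% or fewer: {prob:.3g}")
```

An observed proportion more than 8 standard errors below the mean is far outside the reasonably likely range of mean ± 1.96 standard errors.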
About 60% of Mississippians wear seat belts.
What proportion of seat belt users would be
reasonably likely to occur in a random sample
a. of 40 drivers?
b. of 100 drivers?
c. of 400 drivers?
Because the sampling distributions are
approximately normal, in each case
about 95% of the potential values of the
sample proportion will lie within
1.96 standard errors of the mean.
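The three cases differ only in n, so one function covers all of them. A sketch under the stated rule (mean ± 1.96 standard errors); the function name `reasonably_likely_interval` is hypothetical.

```python
from math import sqrt

def reasonably_likely_interval(p, n, z=1.96):
    """Return (low, high) for p ± z standard errors of the sample proportion."""
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

p = 0.6
intervals = {n: reasonably_likely_interval(p, n) for n in (40, 100, 400)}

for n, (low, high) in intervals.items():
    print(f"n = {n:3d}: {low:.3f} to {high:.3f}")
```

The interval narrows as n grows: each time the sample size is multiplied by 4, the width is cut in half.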
Questions?