7.2 – Estimating a Population Proportion:
Objectives:
1. Find critical values.
2. Interpret confidence intervals.
3. Find the margin of error.
4. Construct confidence intervals for population proportions.
5. Determine the sample size.
Chapter Overview:
This chapter presents the beginning of inferential statistics.
• The two major activities of inferential statistics are:
1. To use sample data to estimate values of population parameters.
2. To test hypotheses or claims made about population parameters.
• We introduce methods for estimating values of these important population parameters:
proportions, means, and variances.
• We also present methods for determining sample sizes necessary to estimate those parameters.
Section Overview:
In this section we present methods for using a sample proportion to estimate the value of a population
proportion.
• The sample proportion is the best point estimate of the population proportion.
• We can use a sample proportion to construct a confidence interval to estimate the true value of a
population proportion, and we should know how to interpret such confidence intervals.
• We should know how to find the sample size necessary to estimate a population proportion.
Point Estimate:
A point estimate is a single value (or point) used to approximate a population parameter.
The sample proportion p̂ is the best point estimate of the population proportion p.
Example: In a Pew Research Center poll, 70% of 1501 randomly selected adults in the United States
believe in global warming, so the sample proportion is p̂ = 0.70. Find the best point estimate of the
proportion of all adults in the United States who believe in global warming.
Solution: Because the sample proportion is the best point estimate of the population proportion, we
conclude that the best point estimate of p is 0.70. When using the sample results to estimate the
percentage of all adults in the United States who believe in global warming, the best estimate is 70%.
A point estimate fails to indicate just how good that estimate really is. Because of this flaw, another type
of estimate has been developed called a confidence interval.
Confidence Interval:
A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true
value of a population parameter. A confidence interval is sometimes abbreviated as CI.
A confidence interval is associated with a confidence level, such as 0.95 (or 95%). The confidence level
gives us the success rate of the procedure used to construct the confidence interval.
Confidence Level:
The confidence level is the probability 1 – α (often expressed as the equivalent percentage value) that
the confidence interval actually does contain the population parameter, assuming that the estimation
process is repeated a large number of times. (The confidence level is also called degree of confidence,
or the confidence coefficient.)
For instance, for a 0.95 (or 95%) confidence level, α = 0.05. For a 0.99 (or 99%) confidence level,
α = 0.01.
The most common choices for confidence levels are 90% (α = 0.10), 95% (α = 0.05), and 99% (α = 0.01).
Interpreting a Confidence Interval:
We must be careful to interpret confidence intervals correctly. There is a correct interpretation and many
different and creative incorrect interpretations of the confidence interval 0.677 < p < 0.723.
Correct Interpretation:
“We are 95% confident that the interval from 0.677 to 0.723 actually does contain the true value of the
population proportion p.” This means that if we were to select many different samples of size 1501 and
construct the corresponding confidence intervals, 95% of them would actually contain the value of the
population proportion p.
Note:
In this correct interpretation, the level of 95% refers to the success rate of the process being used to
estimate the proportion.
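The "success rate of the process" can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it pretends the true population proportion is exactly p = 0.70 (in practice p is unknown), draws many samples of size 1501, and builds each interval with the margin of error formula given later in this section. The function name coverage_rate and the seed are hypothetical choices made for this example.

```python
import random
from math import sqrt

def coverage_rate(p=0.70, n=1501, z=1.96, trials=1000, seed=1):
    """Fraction of simulated 95% confidence intervals that contain the true p.

    Assumption for illustration only: the true proportion p is known.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the "successes".
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        # Margin of error for this particular sample.
        e = z * sqrt(p_hat * (1 - p_hat) / n)
        # Does this sample's interval capture the true proportion?
        if p_hat - e < p < p_hat + e:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Intervals containing p: {rate:.1%}")
```

The printed rate should land close to 95%. Each individual interval either contains p or it does not; the 95% describes how often the procedure succeeds over many samples.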
Incorrect Interpretation:
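“There is a 95% chance that the true value of p will fall between 0.677 and 0.723.” The population
proportion p is a fixed value, not a random variable, so it either lies in a given interval or it does
not. It would also be incorrect to say that “95% of sample proportions fall between 0.677 and 0.723.”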
Caution:
Confidence intervals can be used informally to compare different data sets, but the overlapping of
confidence intervals should not be used for making formal and final conclusions about equality of
proportions.
Critical Values:
A standard z score can be used to distinguish between sample statistics that are likely to occur and those
that are unlikely to occur. Such a z score is called a critical value. The number zα/2 is a critical value that
is a z score with the property that it separates an area of α/2 in the right tail of the standard normal
distribution.
Critical values are based on the following observations:
1. Under certain conditions, the sampling distribution of sample proportions can be approximated
by a normal distribution.
2. A z score associated with a sample proportion has a probability of α/2 of falling in the right tail.
3. The z score separating the right-tail region is commonly denoted by zα/2 and is referred to as a
critical value because it is on the borderline separating z scores from sample proportions that are
likely to occur from those that are unlikely to occur.
Notation for Critical Value:
1. The critical value zα/2 is the positive z value that is at the vertical boundary separating an area of
α/2 in the right tail of the standard normal distribution.
2. The value of –zα/2 is at the vertical boundary for the area of α/2 in the left tail.
3. The subscript α/2 is simply a reminder that the z score separates an area of α/2 in the right tail of
the standard normal distribution.
Example: Find the critical value zα/2 corresponding to a 95% confidence level.
Solution: A 95% confidence level corresponds to α = 0.05 and α/2 = 0.025. Therefore, by noting that
the cumulative area to the left must be 1 − α/2 = 1 − 0.025 = 0.975 and using Table A-2, we find the
z-score to be 1.96.
For a 95% confidence level, the critical value is therefore zα/2 = 1.96.
Example: Find the critical value zα/2 that corresponds to a 98% confidence level.
Solution: A 98% confidence level corresponds to α = 0.02 and α/2 = 0.01. Therefore, by noting that
the cumulative area to the left must be 1 − α/2 = 1 − 0.01 = 0.99 and using Table A-2, we find the
z-score to be 2.33.
Example: Find the critical value zα/2 that corresponds to a 91% confidence level.
Solution: A 91% confidence level corresponds to α = 0.09 and α/2 = 0.045. Therefore, by noting that
the cumulative area to the left must be 1 − α/2 = 1 − 0.045 = 0.955 and using Table A-2, we find the
z-score to be 1.70.
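The table look-ups above can also be done in software. The following is a minimal sketch that uses the inverse cumulative distribution function of the standard normal distribution from Python's standard library in place of Table A-2. The function name critical_value is a hypothetical name chosen for this example.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z_(alpha/2) for a confidence level such as 0.95."""
    alpha = 1 - confidence_level
    # The critical value has cumulative area 1 - alpha/2 to its left.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.95, 0.98, 0.91):
    print(f"{level:.0%} confidence level: z = {critical_value(level):.2f}")
```

Software values can differ slightly from table values in the third decimal place. For example, the sample-size examples in this section use 2.575 for a 99% confidence level, while the computed value is about 2.576.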
Margin of Error:
When data from a simple random sample are used to estimate a population proportion p, the margin of
error, denoted by E, is the maximum likely difference (with probability 1 – α, such as 0.95) between the
observed proportion and the true value of the population proportion p.
The margin of error E is also called the maximum error of the estimate and can be found by multiplying
the critical value and the standard deviation of the sample proportions:
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
where q̂ = 1 − p̂ and n is the number of sample values.
For a 95% confidence level, α = 0.05 , so there is a probability of 0.05 that the sample proportion will be
in error by more than E.
Example: Assume that a sample is used to estimate a population proportion p. Find the margin of error
E that corresponds to the given statistics and confidence level. 95% confidence level; n = 320, p̂ = 0.60.
Solution: The z-score corresponding to a 95% confidence level is 1.96.
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$E = 1.96 \sqrt{\dfrac{(0.6)(0.4)}{320}}$
$E = 1.96(0.027)$
$E = 0.054$
Example: Assume that a sample is used to estimate a population proportion p. Find the margin of error
E that corresponds to the given statistics and confidence level. 90% confidence level; n = 480, p̂ = 0.7.
Solution: The z-score corresponding to a 90% confidence level is 1.645.
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$E = 1.645 \sqrt{\dfrac{(0.7)(0.3)}{480}}$
$E = 1.645(0.021)$
$E = 0.034$
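The two margin-of-error calculations above can be checked with a few lines of code. This is a minimal sketch; the function name margin_of_error is a hypothetical name chosen here, and the critical value z is passed in as it would be read from Table A-2.

```python
from math import sqrt

def margin_of_error(p_hat, n, z):
    """E = z * sqrt(p_hat * q_hat / n), with q_hat = 1 - p_hat."""
    q_hat = 1 - p_hat
    return z * sqrt(p_hat * q_hat / n)

print(round(margin_of_error(0.60, 320, 1.96), 3))   # 0.054
print(round(margin_of_error(0.70, 480, 1.645), 3))  # 0.034
```

Carrying the unrounded square root through the calculation and rounding only at the end avoids small discrepancies from rounding the intermediate value.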
Constructing a Confidence Interval for a Population Proportion p:
We may construct a confidence interval given the following:
p = population proportion
p̂ = sample proportion
q̂ = 1 − p̂
n = number of sample values
E = margin of error
zα/2 = z score separating an area of α/2 in the right tail of the standard normal distribution
Then the confidence interval may be written as:
p̂ − E < p < p̂ + E
where
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
The confidence interval is often expressed in the following equivalent forms:
p̂ ± E
or
(p̂ − E, p̂ + E)
Requirements:
1. The sample is a simple random sample.
2. The conditions for the binomial distribution are satisfied: there is a fixed number of trials, the
trials are independent, there are two categories of outcomes, and the probabilities remain
constant for each trial.
3. There are at least 5 successes and 5 failures.
Procedure for Constructing Confidence Intervals:
1. Verify that the requirements are met.
2. Find the critical value for the confidence level.
3. Find the margin of error.
4. Determine the confidence interval.
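The four steps above can be sketched in code. This is an illustrative sketch only: the function name proportion_ci is hypothetical, the critical value is supplied by the caller, and the only requirement the code can check is the count of successes and failures (whether the sample is a simple random sample cannot be verified from the numbers alone). The inputs n = 51 and p̂ = 0.27 come from the first example in this section.

```python
from math import sqrt

def proportion_ci(p_hat, n, z):
    """Confidence interval (lower, upper) for a population proportion."""
    # Step 1: check the "at least 5 successes and 5 failures" requirement.
    successes = n * p_hat
    failures = n * (1 - p_hat)
    if successes < 5 or failures < 5:
        raise ValueError("need at least 5 successes and 5 failures")
    # Step 2: the critical value z is supplied by the caller (e.g. 1.96 for 95%).
    # Step 3: margin of error.
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    # Step 4: confidence interval.
    return p_hat - e, p_hat + e

lower, upper = proportion_ci(0.27, 51, 1.96)
print(f"{lower:.3f} < p < {upper:.3f}")  # 0.148 < p < 0.392
```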
Example: Use the given degree of confidence and sample data to construct a confidence interval for the
population proportion p. n = 51, p̂ = .27; 95% confidence level.
Solution: Assume requirements are met. The z-score corresponding to a 95% confidence level is 1.96.
The margin of error is:
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$E = 1.96 \sqrt{\dfrac{(0.27)(0.73)}{51}}$
$E = 1.96(0.062)$
$E = 0.122$
The confidence interval p̂ − E < p < p̂ + E is:
0.27 − 0.122 < p < 0.27 + 0.122
0.148 < p < 0.392
14.8% < p < 39.2%
Example: Use the given degree of confidence and sample data to construct a confidence interval for the
population proportion p. n = 110, p̂ = .55; 88% confidence level.
Solution: Assume requirements are met. The z-score corresponding to an 88% confidence level is 1.56.
The margin of error is:
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$E = 1.56 \sqrt{\dfrac{(0.55)(0.45)}{110}}$
$E = 1.56(0.047)$
$E = 0.074$
The confidence interval p̂ − E < p < p̂ + E is:
0.55 − 0.074 < p < 0.55 + 0.074
0.476 < p < 0.624
47.6% < p < 62.4%
Example: Use the given degree of confidence and sample data to construct a confidence interval for the
population proportion p. n = 125, p̂ = .72; 90% confidence level.
Solution: Assume requirements are met. The z-score corresponding to a 90% confidence level is 1.645.
The margin of error is:
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$E = 1.645 \sqrt{\dfrac{(0.72)(0.28)}{125}}$
$E = 1.645(0.040)$
$E = 0.066$
The confidence interval p̂ − E < p < p̂ + E is:
0.72 − 0.066 < p < 0.72 + 0.066
0.654 < p < 0.786
65.4% < p < 78.6%
Example: A survey of 865 voters in one state reveals that 408 favor approval of an issue before the
legislature. Construct the 95% confidence interval for the true proportion of all voters in the state who
favor approval.
Solution: Assume requirements are met. The z-score corresponding to a 95% confidence level is 1.96.
The margin of error is:
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$E = 1.96 \sqrt{\dfrac{(0.472)(0.528)}{865}}$
$E = 1.96(0.017)$
$E = 0.033$
The confidence interval p̂ − E < p < p̂ + E is:
0.472 − 0.033 < p < 0.472 + 0.033
0.439 < p < 0.505
43.9% < p < 50.5%
In the survey, 47.2% of the voters sampled favor approval of the issue. We are 95% confident that the
interval from 43.9% to 50.5% contains the true percentage of all voters in the state who favor approval.
Example: Of 367 randomly selected medical students, 30 said that they planned to work in a rural
community. Find a 95% confidence interval for the true proportion of all medical students who plan to
work in a rural community.
Solution: Assume requirements are met. The z-score corresponding to a 95% confidence level is 1.96.
The margin of error is:
$E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$E = 1.96 \sqrt{\dfrac{(0.082)(0.918)}{367}}$
$E = 1.96(0.014)$
$E = 0.028$
The confidence interval p̂ − E < p < p̂ + E is:
0.082 − 0.028 < p < 0.082 + 0.028
0.054 < p < 0.110
5.4% < p < 11.0%
Of the medical students surveyed, 8.2% planned to work in a rural community. We are 95% confident that
the interval from 5.4% to 11.0% contains the true proportion of all medical students who plan to work in
a rural community.
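When the data arrive as counts rather than as a proportion, the whole calculation can start from the number of successes x and the sample size n. The sketch below is one possible way to do this, not a prescribed method: the function name ci_from_counts is hypothetical, and the critical value is computed from the standard normal distribution rather than read from Table A-2.

```python
from math import sqrt
from statistics import NormalDist

def ci_from_counts(x, n, confidence_level=0.95):
    """Return (p_hat, E, (lower, upper)) from x successes in n trials."""
    p_hat = x / n
    # Critical value with cumulative area 1 - alpha/2 to its left.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, e, (p_hat - e, p_hat + e)

# Medical students example: 30 of 367 plan to work in a rural community.
p_hat, e, (lower, upper) = ci_from_counts(30, 367)
print(f"p_hat = {p_hat:.3f}, E = {e:.3f}")
print(f"{lower:.1%} < p < {upper:.1%}")
```

Because this version keeps unrounded intermediate values, a limit can occasionally differ in the third decimal place from a hand calculation that rounds p̂ and E first. For the voter example above (408 of 865), the lower limit comes out near 0.438 rather than 0.439.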
Analyzing Polls:
When analyzing polls consider the following:
1. The sample should be a simple random sample, not an inappropriate sample (such as a voluntary
response sample).
2. The confidence level should be provided. (It is often 95%, but media reports often neglect to
identify it.)
3. The sample size should be provided. (It is usually provided by the media, but not always.)
4. Except for relatively rare cases, the quality of the poll results depends on the sampling method
and the size of the sample, but the size of the population is usually not a factor.
Caution:
Never follow the common misconception that poll results are unreliable if the sample size is a small
percentage of the population size. The population size is usually not a factor in determining the
reliability of a poll.
Determining Sample Size:
Suppose we want to collect sample data in order to estimate some population proportion. The question
is: how many sample items must be obtained? Solving the margin of error formula for n gives the
following formula:
$n = \dfrac{[z_{\alpha/2}]^2 \, \hat{p}\hat{q}}{E^2}$
This formula requires p̂ as an estimate of the population proportion. If no such estimate exists, we
replace p̂ with 0.5 and q̂ with 0.5 to obtain:
$n = \dfrac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2}$
These formulas show that the sample size does not depend on the size of the population, but rather on
the desired confidence level, the desired margin of error, and, when one is available, the estimate p̂.
Round Off Rule for Sample Size:
If the computed sample size is not a whole number, then round the value up to the next whole number.
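The two sample-size formulas and the round-off rule can be combined in a short function. This is a minimal sketch with a hypothetical function name, sample_size; when no prior estimate is given it falls back to p̂q̂ = 0.25, as described above, and it always rounds up.

```python
from math import ceil

def sample_size(margin_of_error, z, p_hat=None):
    """Minimum n needed to estimate a proportion within the given margin of error."""
    # With no prior estimate, use p_hat = q_hat = 0.5, so p_hat * q_hat = 0.25.
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    # Round up to the next whole number (round-off rule for sample size).
    return ceil(z ** 2 * pq / margin_of_error ** 2)

print(sample_size(0.05, 2.575, p_hat=0.15))  # 339
print(sample_size(0.024, 1.96))              # 1668
```

Rounding up rather than to the nearest whole number guarantees that the margin of error is no larger than the one requested.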
Example: Use the given data to find the minimum sample size required to estimate the population
proportion. Margin of error: 0.05; confidence level: 99%; from a prior study, p̂ is estimated at 0.15.
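Solution:
$n = \dfrac{[z_{\alpha/2}]^2 \, \hat{p}\hat{q}}{E^2}$
$n = \dfrac{[2.575]^2 (0.15)(0.85)}{(0.05)^2}$
$n = \dfrac{0.8454}{0.0025}$
$n = 338.16$
Rounding up to the next whole number, the minimum sample size is n = 339.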
Example: Use the given data to find the minimum sample size required to estimate the population
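proportion. Margin of error: 0.024; confidence level: 95%; p̂ is unknown.
Solution:
$n = \dfrac{[z_{\alpha/2}]^2 (0.25)}{E^2}$
$n = \dfrac{[1.96]^2 (0.25)}{(0.024)^2}$
$n = \dfrac{0.9604}{0.000576}$
$n = 1667.36$
Rounding up to the next whole number, the minimum sample size is n = 1668.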
Example: A newspaper article about the results of a poll states: "In theory, the results of such a poll, in
99 cases out of 100 should differ by no more than 5 percentage points in either direction from what
would have been obtained by interviewing all voters in the United States." Find the sample size
suggested by this statement.
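Solution: Margin of error: 0.05; confidence level: 99%; p̂ is unknown.
$n = \dfrac{[z_{\alpha/2}]^2 (0.25)}{E^2}$
$n = \dfrac{[2.575]^2 (0.25)}{(0.05)^2}$
$n = \dfrac{1.65766}{0.0025}$
$n = 663.06$
Rounding up to the next whole number, the suggested sample size is n = 664.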
Finding the Point Estimate and E from a Confidence Interval:
Sometimes we want to better understand a confidence interval that might have been obtained from a
journal article or generated by computer software. If we already know the confidence interval limits,
then the sample point estimate and E can be found as follows:
Point estimate of p:
$\hat{p} = \dfrac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
Margin of Error:
$E = \dfrac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$
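Both formulas amount to finding the midpoint and the half-width of the interval. A minimal sketch follows, with a hypothetical function name, estimate_from_interval, and the limits 0.059 and 0.075 taken from the hearing aid example in this section.

```python
def estimate_from_interval(lower, upper):
    """Recover the point estimate and margin of error from interval limits."""
    p_hat = (upper + lower) / 2  # midpoint of the interval
    e = (upper - lower) / 2      # half the width of the interval
    return p_hat, e

p_hat, e = estimate_from_interval(0.059, 0.075)
print(round(p_hat, 3), round(e, 3))  # 0.067 0.008
```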
Example: Find the point estimate of the proportion of people who wear hearing aids if, in a random
sample of 304 people, 20 people had hearing aids. The confidence interval is 5.9% to 7.5%.
Solution: Using the formula
$\hat{p} = \dfrac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$
$\hat{p} = \dfrac{0.075 + 0.059}{2}$
$\hat{p} = 0.067$
The point estimate would be 6.7%. Also, remember that a point estimate can be obtained by simply
finding the proportion 20/304 ≈ 0.0658.