Chapter 7
Survey
Sampling and
Inference
Copyright © 2017, 2014 Pearson Education, Inc.
Slide 1
Chapter 7 Topics
• Discuss survey quality and identify possible
sources of bias in surveys
• Use the Central Limit Theorem for Sample
Proportions to construct confidence intervals
Slide 2
Section 7.1
anaken2012. Shutterstock
LEARNING ABOUT THE WORLD
THROUGH SURVEYS
• Distinguish between Populations and Samples,
Parameters and Statistics
• Identify Possible Sources of Bias in Surveys
• Identify the features of a Simple Random Sample
Slide 3
Populations and Parameters
A population is a group of objects or people we
wish to study.
A parameter is a numerical value that
characterizes some aspect of the population.
Slide 4
Populations and Parameters: Example
Data is gathered on the heights of all NBA
basketball players. The mean of the data is
calculated.
The population is all NBA basketball players.
The parameter is the mean height of all NBA
basketball players.
Slide 5
Measuring Populations: Census
A census is a survey which measures every
member of a population.
Most populations are too large for a census, so
we study populations by measuring samples
instead.
Slide 6
Samples and Statistics
A sample is a collection of objects or people
taken from the population of interest.
A statistic is a numerical characteristic of a
sample of data. Because a statistic is used to
estimate the value of a population
characteristic, it is sometimes called an estimator.
Slide 7
Samples and Statistics: Example
A researcher is interested in studying the heights
of American women. She takes a random
sample of 1500 American women and finds that the
mean height of the sample is 63.6 inches.
The sample is the 1500 American women.
The statistic is the sample mean of 63.6 inches.
The researcher may estimate that the average
American woman is 63.6 inches tall.
Slide 8
Statistical Inference
• Statistical inference is the art and science of
drawing conclusions about a population based
on observing characteristics of samples.
• It involves uncertainty because the entire
population is not being measured.
• An important component of statistical
inference is measuring our uncertainty.
Slide 9
Example:
Genetically Modified Foods
In January 2015 the Pew Research Center
published a report stating that 37% of
Americans believed that genetically modified
foods (GMOs) were safe to eat. This was based
on a survey of 2002 American adults. Identify
the population and sample. What was the
parameter of interest? What is the statistic?
Slide 10
Example: GMOs (continued)
The population is all American adults.
The sample was the 2002 American adults who
were surveyed.
The parameter of interest is the percentage of
all American adults who believe that GMOs are
safe to eat.
The statistic is 37% (the percentage of the
sample who felt this way).
Slide 11
Statistics vs. Parameters
Statistics are knowable – any time we collect
data we can find the value of a statistic.
Parameters are typically unknown – they can be
estimated with statistics, but these estimates
will involve some degree of uncertainty.
Slide 12
Notation
In general, Greek characters are used to
represent population parameters. English
letters are used to represent statistics.
Slide 13
Bias
A survey method is biased if it has a tendency to
produce an untrue value.
Three types:
1. Measurement bias
2. Sampling bias
3. Use of an estimator that is biased
Slide 14
Measurement Bias
• Results from asking questions that do not
produce a true answer; occurs when
measurements tend to record values larger (or
smaller) than the true value
Example: Asking people, “How much do you
weigh?” It is likely that people will report a
number less than their actual weight, resulting
in an estimate that tends to be too small.
Slide 15
Sources of Measurement Bias
Measurement bias can occur in a variety of
situations including:
• Self-reporting of personal data
• The use of confusing wording in survey
questions
• The use of non-neutral language in questions
Slide 16
Sampling Bias
• Occurs when a sample is used that is not
representative of the population
Example: Internet polls – people who answer
these polls tend to be those who have strong
feelings about the results and are not
necessarily representative of the population
Slide 17
Important Questions to Ask about
Sampling
1. What percentage of people who were asked
to participate actually did so?
2. Did researchers choose people to participate
in the survey or did the people themselves
choose to participate?
If a large percentage of those chosen to
participate refused to do so or if people
themselves chose to participate, the conclusions
of the survey are suspect.
Slide 18
Example: Identifying Sources of
Possible Bias
A school district is interested in finding out what
percentage of voters in the district would favor
the passage of a school bond measure. The
district sends out a survey to all school parents
asking, “Do you favor the passing of a school
bond measure to provide additional resources
to improve educational opportunities for your
student?” Identify any sources of possible bias
in this survey.
Slide 19
Example: Identifying Sources of
Possible Bias
The use of non-neutral language in the wording
of the question, “Do you favor the passing of a
school bond measure to provide additional
resources to improve educational opportunities
for your student?” introduces possible
measurement bias into the survey.
Slide 20
Simple Random Sampling (SRS)
• Draw subjects at random from the population
without replacement
• A random sample is one in which every
member of the population is equally likely to
be chosen for the sample.
• A true random sample is difficult to achieve.
• Statisticians have developed methods for
producing random samples that can be used
to estimate characteristics of populations.
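As a minimal sketch of the idea above, Python's standard `random.sample` draws subjects without replacement so that every member is equally likely to be chosen. The sampling frame of 1,000 ID numbers and the helper name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members without replacement; each member is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: ID numbers for 1,000 people.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 10, seed=7)
print(sample)
```

In practice the hard part is building a complete list of the population, not the random draw itself.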
Slide 21
Section 7.2
Lord and Leverett. Pearson Education Ltd
MEASURING THE QUALITY OF A
SURVEY
• Use Accuracy and Precision to Measure the Quality
of a Survey
• Describe the Important Features of a Sampling
Distribution, Including the Standard Error
Slide 22
Evaluating Surveys
Statisticians evaluate the method used for a
survey, not the outcome of a single survey.
Example: If a group of researchers were to
survey 1000 randomly selected people, we
would expect the results to vary from sample to
sample. Because we would want to know how
the group did as a whole, we evaluate the
estimation methods, not the individual
estimates.
Slide 23
The Goal: Accuracy and Precision
An estimation method should be both accurate
and precise.
• Accurate – The method measures what it is
intended to measure; it correctly estimates the
population parameter.
• Precise – If the method is repeated, the
estimates are very consistent.
Slide 24
Accuracy and Precision
If the goal is to get the golf ball in the hole,
which of these pictures shows a method that
is accurate? Precise? Both?
Slide 25
Accuracy and Precision
This picture shows both
accuracy and precision.
This picture shows precision, but
not accuracy.
This picture shows accuracy, but
not precision.
Slide 26
Understanding the Behavior of
Estimators: Simulation 1
We have a small population of eight people, two of
whom are Cat People (C) and six of whom are Dog
People (D). Random samples of four are taken from this
population and the percentage of Cat People is noted
for each sample.
Note: 25% of the population are Cat People.
• Will each sample contain 25% Cat People?
• Is it possible to get a sample with 0 Cat People?
Slide 27
Simulation 1: Looking at Data
Using a random number table, three random
samples were taken and results are shown
below:
D D D D (0% Cat People)
D C D D (25% Cat People)
C D D D (25% Cat People)
Slide 28
Simulation 1: Looking at Data
Note: The sample proportion p̂ changes from
sample to sample, but the population proportion (p)
remains the same.
Slide 29
Sampling Distribution of p̂
The probability distribution of p̂ is called the
sampling distribution. We can represent it
using a table or a graph.
Slide 30
Notes about the Simulation
• Our estimator, p̂ , is not always the same as
our parameter, p.
• The mean of the sampling distribution is 25%,
the same as the value of p – this indicates that
the estimator p̂ is unbiased.
• Even though p̂ is not always equal to p, the
estimate is never more than 25 percentage
points away from p.
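The notes above can be checked by listing every possible sample. The sketch below assumes the population from Simulation 1 (two Cat People, six Dog People, samples of four) and tabulates the sampling distribution of p̂; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = ["C", "C", "D", "D", "D", "D", "D", "D"]  # 2 Cat People, 6 Dog People
sample_size = 4
p = Fraction(2, 8)  # population proportion of Cat People

# combinations() works on positions, so the two C's count as different people.
samples = list(combinations(population, sample_size))
p_hats = [Fraction(s.count("C"), sample_size) for s in samples]

# Sampling distribution as a table: each possible p-hat and how often it occurs.
distribution = Counter(p_hats)
for value in sorted(distribution):
    print(f"p-hat = {float(value):.2f}  probability = {distribution[value]}/{len(samples)}")

mean_p_hat = sum(p_hats) / len(p_hats)        # center of the sampling distribution
max_error = max(abs(v - p) for v in p_hats)   # farthest any estimate lands from p
print("mean of p-hat:", float(mean_p_hat))
print("largest distance from p:", float(max_error))
```

Exact fractions are used so that the comparison of the mean with p is not affected by floating-point rounding.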
Slide 31
The Standard Error
The standard error (SE) is the standard
deviation of the sampling distribution. It
measures how much an estimator typically
varies from sample to sample. When the
standard error is small, we say the estimator is
precise.
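For the eight-person population of Simulation 1, the standard error can be computed directly as the standard deviation of p̂ over all possible samples of four. This is a sketch of the definition, not the formula used later in the chapter; the 0/1 coding of Cat and Dog People is an assumption made for convenience.

```python
from itertools import combinations
from statistics import pstdev

population = [1, 1, 0, 0, 0, 0, 0, 0]  # 1 = Cat Person, 0 = Dog Person

# p-hat for every possible sample of size 4
p_hats = [sum(sample) / 4 for sample in combinations(population, 4)]

# The standard error is the standard deviation of the sampling distribution.
standard_error = pstdev(p_hats)
print(round(standard_error, 4))
```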
Slide 32
Simulation 2:
Using a Larger Population
This time we consider a population of 1000
people, 250 of whom are Cat People (25% of the
population). We take random samples of size 10
and find the sample proportion, p̂ , of Cat
People in each sample. Using technology,
10,000 samples were taken and the results are
shown in the graph on the next slide.
Slide 33
Simulation 2: Looking at Data
Slide 34
Simulation 2: Looking at Data
Since the center of the distribution is about
25%, the bias of p̂ is still 0, even though a
larger population and sample size were used.
The variation of p̂ is less; this estimator is more
precise even though the population is larger. In
general, the precision has nothing to do with the
size of the population, but only with the size of
the sample.
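One way to probe the claim about population size is to repeat the simulation with the same sample size but very different population sizes. In this sketch the population of 100,000 is a hypothetical addition (the slides use only 1,000), and `simulated_se` is an illustrative helper name.

```python
import random
from statistics import pstdev


def simulated_se(population_size, sample_size, p=0.25, reps=10_000, seed=1):
    """Estimate the standard error of p-hat by repeated random sampling."""
    rng = random.Random(seed)
    cats = int(population_size * p)  # members numbered 0..cats-1 are Cat People
    p_hats = []
    for _ in range(reps):
        sample = rng.sample(range(population_size), sample_size)  # without replacement
        p_hats.append(sum(1 for member in sample if member < cats) / sample_size)
    return pstdev(p_hats)


se_small_pop = simulated_se(1_000, 10)
se_large_pop = simulated_se(100_000, 10)
print(f"population   1,000: SE is about {se_small_pop:.3f}")
print(f"population 100,000: SE is about {se_large_pop:.3f}")
```

With the sample size held at 10, the two estimated standard errors should come out nearly equal.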
Slide 35
Simulation 3: Large Samples Produce
More Precise Estimators
Again we consider a population of 1000 people,
250 of whom are Cat People (25% of the
population). This time we increase our sample
size from 10 to 100, take 10,000 such samples,
and compute the proportion of Cat People in
each of the samples. The following slide shows
the results of this simulation.
Slide 36
Simulation 3: Looking at Data
Slide 37
Simulation 3: Looking at Data
• The estimation method remains unbiased
(center is still at 25%, the population
proportion).
• The shape of the histogram looks more
symmetric than the one for samples of size 10.
• The estimator is more precise because it uses
a larger sample size (standard error is smaller
than for samples of size 10).
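The comparison between sample sizes 10 and 100 can be reproduced with a short simulation. This is a sketch under the settings described in the slides (population of 1000 with 250 Cat People, 10,000 samples); the seed and function name are assumptions, and the exact output depends on the random draws.

```python
import random
from statistics import mean, pstdev


def simulate_p_hats(sample_size, population_size=1_000, cats=250, reps=10_000, seed=2):
    """Return the sample proportion of Cat People from each of `reps` random samples."""
    rng = random.Random(seed)
    population = [1] * cats + [0] * (population_size - cats)  # 1 = Cat Person
    return [sum(rng.sample(population, sample_size)) / sample_size for _ in range(reps)]


results = {}
for n in (10, 100):
    p_hats = simulate_p_hats(n)
    results[n] = (mean(p_hats), pstdev(p_hats))
    print(f"n = {n:3d}: center = {results[n][0]:.3f}, standard error = {results[n][1]:.3f}")
```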
Slide 38
Summary of Three Simulations
1. The estimator p̂ is unbiased for all sample
sizes (as long as we take random samples).
2. The precision improves as the sample size
gets larger.
3. The shape of the sampling distribution is
more symmetric for larger sample sizes.
Increasing sample size improves precision.
Slide 39
Finding Bias and Standard Error
The bias of p̂ is 0.
The standard error of p̂ is
$$SE = \sqrt{\frac{p(1 - p)}{n}}$$
if the following conditions are met:
1. The sample is randomly selected from the
population of interest.
2. If the sampling is without replacement, the
population must be at least 10 times larger than the
sample size.
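The formula and its population-size condition can be sketched as a small function. The name `standard_error` and the choice to raise an error when the population is too small are illustrative assumptions, not part of the source.

```python
from math import sqrt


def standard_error(p, n, population_size=None):
    """Standard error of p-hat: sqrt(p(1 - p) / n).

    If a population size is given, enforce the condition that the population
    is at least 10 times larger than the sample.
    """
    if population_size is not None and population_size < 10 * n:
        raise ValueError("population should be at least 10 times the sample size")
    return sqrt(p * (1 - p) / n)


# 25% Cat People, samples of 100 from a population of 1000
print(round(standard_error(0.25, 100, population_size=1_000), 4))
```

The condition that the sample be randomly selected cannot be checked in code; it depends on how the data were collected.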
Slide 40
Section 7.3
racorn. Shutterstock
THE CENTRAL LIMIT THEOREM FOR
SAMPLE PROPORTIONS
• Identify the Conditions Needed to Apply the Central
Limit Theorem for Proportions
• Use the Central Limit Theorem for Proportions to
Describe the Sampling Distribution of a Sample
Proportion and to Find Probabilities
Slide 41
Central Limit Theorem
for Sample Proportions
• Used to estimate proportions in a population
• Tells us that, if some basic conditions are met,
the sampling distribution of the sample
proportion is close to the Normal distribution
Slide 42
Central Limit Theorem
for Sample Proportions
If we take a random sample from a population, and if the sample
size is large and the population size is much larger, then the
sampling distribution of p̂ is approximately Normal with a mean
of p and a standard deviation of
$$SE = \sqrt{\frac{p(1 - p)}{n}}$$
If you don’t know the value of p, you can substitute the value of
p̂ .
Slide 43
Conditions for the Central Limit
Theorem for Sample Proportions
1. Sampling is random and independent.
2. Large sample: The sample size, n, is large
enough that the sample expects at least 10
successes (yes) and 10 failures (no).
np̂ ≥ 10 and n(1 – p̂) ≥ 10
3. Big population: If sampling is done without
replacement, the population must be at least
10 times larger than the sample size.
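The three conditions can be collected into a simple checklist. In this sketch the function name `clt_conditions` is hypothetical, and whether the sample is random and independent has to be supplied by the user, since it cannot be verified from the numbers alone.

```python
def clt_conditions(n, p, population_size=None, random_sample=True):
    """Return each CLT condition for sample proportions and whether it holds."""
    return {
        "random and independent": random_sample,
        # at least 10 expected successes and 10 expected failures
        "large sample": n * p >= 10 and n * (1 - p) >= 10,
        # only relevant when sampling without replacement and the size is known
        "big population": population_size is None or population_size >= 10 * n,
    }


small = clt_conditions(n=10, p=0.25, population_size=1_000)
large = clt_conditions(n=100, p=0.25, population_size=1_000)
print("n = 10: ", small)
print("n = 100:", large)
```

With p = 0.25, a sample of 10 expects only 2.5 successes and fails the large-sample condition, while a sample of 100 passes all three.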
Slide 44
CLT for Sample Proportions:
A Picture
This figure shows the sampling distribution of p̂ for 10,000
samples of size 10 drawn from a population with p = 0.25. Note
that this does not satisfy condition 2 of the CLT – the sample size
is too small and the distribution is not Normal.
Slide 45
CLT for Sample Proportions:
A Picture
This figure shows the sampling distribution of p̂ for 10,000
samples of size 100 drawn from a population with p = 0.25. Note
that this does satisfy the sample size condition of the CLT and
the shape of the distribution is approximately Normal.
The standard deviation (or standard error) is:
$$\sqrt{\frac{0.25 \times 0.75}{100}} \approx 0.0433.$$
Slide 46
Example: Vegetarians
According to www.statisticbrain.com, 59% of
vegetarians are women. Suppose a random
sample of 500 vegetarians is taken. What is the
approximate probability that the proportion of
females in our sample will be more than 63%?
Slide 47
Example: Vegetarians
First, check conditions:
1. The sample was random and independent.
2. Large sample size:
np = 500(0.59) = 295, and 295 ≥ 10
n(1 – p) = 500(1 – 0.59) = 205, and 205 ≥ 10
3. We can assume that there are at least 10(500) = 5000
vegetarians in the population.
Since these conditions are met, the CLT tells us the sampling
distribution will be approximately normal, with
$$SE = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.59 \times 0.41}{500}} \approx 0.022.$$
Slide 48
Example: Vegetarians
To find the probability that the proportion in the
sample is greater than 63%, we can use
technology to find the probability of getting a
value larger than 0.63 in a N(0.59, 0.022).
Slide 49
Example: Vegetarians
The probability that the proportion of females in the sample will
be more than 63% is 0.0345.
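The "use technology" step could be carried out with Python's standard library, for instance. This sketch uses `statistics.NormalDist` with the unrounded standard error, so the result may differ slightly in the last digit from a calculation based on the rounded value 0.022.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.59, 500                      # population proportion and sample size
se = sqrt(p * (1 - p) / n)            # standard error of p-hat

sampling_dist = NormalDist(mu=p, sigma=se)     # Normal model from the CLT
prob_above = 1 - sampling_dist.cdf(0.63)       # area to the right of 0.63

print(f"SE = {se:.3f}")
print(f"P(p-hat > 0.63) = {prob_above:.4f}")
```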
Slide 50
Example: Voting
In a closely contested school bond election, polling
indicates that 50% of voters are in favor of the bond and
50% are against the bond. Suppose a random sample of 80
voters is selected.
1. What percentage of the sample would we expect to
favor the bond?
2. Does the Central Limit Theorem apply?
3. What is the standard error for this sample proportion?
4. If the Central Limit Theorem applies, what is the approximate probability that the
sample proportion will fall within two standard errors of
the population value p = 0.50?
Slide 51
Example: Voting
1. Because we have taken a random sample we expect the
sample proportion to be about the same as the
population proportion, so we expect 50% to favor the
bond.
2. We have a random sample, np = 80 × 0.50 = 40, and
n(1 – p) = 80 × 0.50 = 40. Since 40 ≥ 10, our sample is
large enough. We will assume there are more than 800
voters in the district, so the population size is at least 10
times larger than the sample size. The conditions for the
CLT are met.
3. The SE =
$$\sqrt{\frac{0.50 \times 0.50}{80}} \approx 0.0559.$$
Slide 52
Example: Voting
4. Since the CLT applies, we can use the Normal model for
the distribution of sample proportions
N(0.50, 0.0559).
By the Empirical Rule, we know that the probability that the
sample proportion will fall within two standard errors of
0.50 is 95%. We could also verify this using technology.
Slide 53
Example: Voting
0.50 + 2SE = 0.50 + 2(0.0559) = 0.6118
0.50 – 2SE = 0.50 – 2(0.0559) = 0.3882
The probability of being within 2 SE of the mean is 0.9545≈ 95%.
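The same calculation can be verified with Python's `statistics.NormalDist`, as a sketch of what "using technology" might look like here. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.50, 80
se = sqrt(p * (1 - p) / n)                 # about 0.0559
model = NormalDist(mu=p, sigma=se)         # N(0.50, 0.0559)

lower, upper = p - 2 * se, p + 2 * se      # two standard errors either side of p
prob_within = model.cdf(upper) - model.cdf(lower)

print(f"interval: {lower:.4f} to {upper:.4f}")
print(f"probability within 2 SE: {prob_within:.4f}")
```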
Slide 54
Key Idea
If the conditions for the CLT are met, the
probability that a sample proportion will fall
within two standard errors of the population
value is 95%.
Slide 55
Example: Dog Owners
According to the American Veterinary Medical
Association, 36.5% of all Americans own a dog.
Suppose a random sample of 200 Americans is
taken. Would it be surprising to find that 45% of
the sample owned a dog? Why or why not?
Slide 56
Example: Dog Owners
Check that the conditions needed to apply the CLT are satisfied:
1. Random independent sample
2. np = 200(0.365) = 73, n(1-p) = 200(1 – 0.365) = 127. The
sample is large enough since both 73 and 127 are greater
than or equal to 10.
3. The population is at least 10 times larger than the sample.
So the distribution of sample proportions will be approximately Normal with a
mean = 0.365 and a standard error =
$$\sqrt{\frac{0.365(1 - 0.365)}{200}} \approx 0.034.$$
Slide 57
Example: Dog Owners
mean + 2 SE = 0.365 + 2(0.034) = 0.433
Since 45% is more than two standard errors
from the mean, it would be surprising.
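Another way to express the same reasoning is to count how many standard errors the observed 45% lies from the mean. The sketch below treats "more than two standard errors" as the threshold for surprising, following the slides; the variable names are illustrative.

```python
from math import sqrt

p, n = 0.365, 200        # population proportion and sample size
observed = 0.45          # sample proportion being questioned

se = sqrt(p * (1 - p) / n)
z = (observed - p) / se           # distance from the mean in standard errors
surprising = abs(z) > 2           # rule of thumb used in this example

print(f"SE = {se:.3f}, z = {z:.2f}, surprising: {surprising}")
```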
Slide 58
Section 7.4
ocphoto. Shutterstock
ESTIMATING THE POPULATION
PROPORTION WITH CONFIDENCE
INTERVALS
• Use the Central Limit Theorem to Construct a
Confidence Interval for a Population Proportion
Slide 59
Affordable Care Act: Birth Control
The Kaiser Health Tracking Poll surveyed 1504
American adults and asked them if they
supported or opposed the requirement that
private health insurance plans cover the full cost
of birth control. The poll found that 61% of
respondents supported this requirement.
What percentage of ALL American adults
support this requirement?
Slide 60
Affordable Care Act: Birth Control
From previous sections we know:
1. Our estimator p̂ is unbiased; the population parameter will
be very close to 61%.
2. The standard error is estimated as
$$\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.61 \times 0.39}{1504}} \approx 0.0126.$$
3. Because the sample size is large, the probability distribution
of p̂ is approximately Normal and centered around the true
population proportion.
Slide 61
Affordable Care Act: Birth Control
There’s about a 95% chance that p̂ is within 2 standard
errors of the population proportion.
2SE = 2 × 0.0126 = 0.0252
So we are 95% confident that the value of the population
proportion is within 2.5 percentage points of 61%.
61% – 2.5% = 58.5%
61% + 2.5% = 63.5%
We are 95% confident that the population proportion is between
58.5% and 63.5%. This is called a confidence interval.
Slide 62
Confidence Intervals
Confidence intervals provide us with:
1. A range of plausible values for a population
parameter.
2. A confidence level, which expresses our level
of confidence that the interval contains the
population parameter.
Slide 63
Confidence Level
• Tells us how often the estimation is successful
• Measures the success rate of the method, not
of any one particular interval
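The idea of a success rate for the method can be illustrated by simulation: draw many samples from a population whose proportion is known, build an interval from each, and count how often the interval contains the true value. The true proportion of 0.61, the sample size of 500, and the 2,000 repetitions below are hypothetical settings chosen for this sketch.

```python
import random
from math import sqrt


def coverage_rate(p=0.61, n=500, reps=2_000, z_star=1.96, seed=3):
    """Fraction of simulated confidence intervals that contain the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Each respondent independently says "yes" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / reps


rate = coverage_rate()
print(f"intervals containing p: {rate:.1%}")
```

The printed rate should be close to 95%, even though any single interval either contains p or does not.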
Slide 64
Margin of Error
• Tells us how far from the population value our
estimate can be
Margin of error = z* SE
where z* is a number that tells how many
standard errors to include in the margin of error.
From the Empirical Rule we know that z* = 1
corresponds with a confidence level of 68% and
z* = 2 corresponds with a confidence level of
95%.
Slide 65
Confidence Level and Margin of Error
This table shows the more precise z* values for
some frequently used confidence levels.
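The table itself is not reproduced in this transcript. As a sketch, z* values for any confidence level can be computed from the standard Normal distribution; the levels listed below are common choices and are not necessarily the ones in the original table.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Number of standard errors that captures the central `confidence_level` of a Normal curve."""
    # Half of the leftover area sits in each tail, so look up the upper cutoff.
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```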
Slide 66
Confidence Interval for a Population
Proportion
Confidence intervals have the form
p̂ ± m
where m is the margin of error. The margin of error is z*SE
so the confidence interval is
$$\hat{p} \pm z^* \cdot SE.$$
To find SE, we substitute p̂ for p since p is unknown, and
estimate the standard error as
$$SE_{est} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}.$$
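Put together, the interval might be computed as in the sketch below. The function name `proportion_ci` is hypothetical, and the example call reuses the birth control poll from earlier in this section (p̂ = 0.61, n = 1504) with z* taken from the Normal distribution rather than rounded to 2.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence_level=0.95):
    """Confidence interval for a population proportion: p-hat plus or minus z* times SE_est."""
    z_star = NormalDist().inv_cdf((1 + confidence_level) / 2)
    se_est = sqrt(p_hat * (1 - p_hat) / n)   # p-hat stands in for the unknown p
    margin = z_star * se_est
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(0.61, 1504)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```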
Slide 67
Example: Intelligent Life
on Other Planets
A Huffpost/YouGov poll of 1000 Americans
found that 38% believed that there is intelligent
life on other planets.
a. Construct a 95% confidence interval for the
percentage of all Americans who believe
there is intelligent life on other planets.
b. Would it be plausible to conclude that 40% of
Americans believe in intelligent life on other
planets?
Slide 68
Example: Intelligent Life
on Other Planets
Check that the conditions for the Central Limit Theorem apply.
1. Random independent sample
2. Large Sample:
p̂ = 0.38
np̂ = 1000(0.38) = 380
n(1 – p̂) = 1000(1 – 0.38) = 620
Both 380 and 620 are greater than or equal to 10.
3. The population is at least 10 times larger than the sample.
Slide 69
Example: Intelligent Life on Other
Planets
The confidence interval can be constructed by first calculating
the standard error or by using technology.
1. SE = $\sqrt{\frac{0.38(1 - 0.38)}{1000}} \approx 0.0153$
2. m = 1.96(0.0153) = 0.03
3. 0.38 – 0.03 = 0.35 and 0.38 + 0.03 = 0.41
The 95% confidence interval is 35% to 41%. It is plausible that
40% of Americans believe in intelligent life on other planets,
since 40% is contained within the confidence interval.
Slide 70
Using the TI-84 Calculator
To construct a confidence interval for a population
proportion on the TI-84 calculator:
1. Push STAT then select TESTS.
2. Select option 1-PropZInt.
3. Enter values for x and n, enter the confidence
level (C-Level), and press Calculate.
Slide 71
Using StatCrunch
Stat > Proportion Stats > One sample > with data
Enter # successes (x), # observations (n), select
Confidence Interval and enter confidence level.
Click Compute.
L Limit = lower limit of confidence interval
U Limit = upper limit of confidence interval
Slide 72
Confidence Interval:
(0.350, 0.410)
Slide 73
Example: Election Prediction
A political consultant randomly surveys 300
voters in a district to see how many intend to
vote for a certain candidate. Of the 300 voters
surveyed, 165 indicate they will vote for the
candidate. Using a 99% confidence interval,
should the consultant predict that the candidate
will win the election? Why or why not?
Slide 74
Example: Election Prediction
Confidence interval by applying the CLT:
1. Random independent sample
2. Large sample size p̂ = 165/300 = 0.55
np̂ = 300(0.55) = 165
n(1- p̂) = 300(1- 0.55) = 135
Both 165 and 135 are greater than or equal to 10.
3. We can assume that the number of voters is greater than or
equal to 10(300) = 3000.
The conditions needed to apply the CLT are met.
Slide 75
Example: Election Prediction
Constructing the confidence interval:
1. The standard error =
$$\sqrt{\frac{0.55(1 - 0.55)}{300}} \approx 0.0287.$$
2. m = 2.58(0.0287) = 0.074
3. 0.55 – 0.074 = 0.476
0.55 + 0.074 = 0.624
The 99% confidence interval is (0.476, 0.624). Since the interval
contains plausible values that are less than 50% the consultant
might not predict the candidate will win the election.
Slide 76
Example: Election Prediction
Confidence intervals are frequently constructed using
technology. This is the StatCrunch output for the 99%
confidence interval based on the polling data.
Slide 77
Relationship between Confidence
Level and Interval Width
As the level of confidence increases, the width
of the confidence intervals also increases.
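To see the relationship in numbers, the sketch below reuses the election poll (165 of 300 voters) and computes the interval width at several confidence levels. The set of levels is an arbitrary choice for illustration.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n = 165 / 300, 300
se_est = sqrt(p_hat * (1 - p_hat) / n)   # same estimated SE at every confidence level

widths = {}
for level in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf((1 + level) / 2)   # more confidence needs a larger z*
    widths[level] = 2 * z_star * se_est              # width = 2 x margin of error
    print(f"{level:.0%} confidence: width = {widths[level]:.3f}")
```

Only z* changes from one level to the next, so a higher confidence level always produces a wider interval for the same data.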
Slide 78
Section 7.5
Sozaijiten
COMPARING TWO POPULATION
PROPORTIONS WITH CONFIDENCE
• Construct a Confidence Interval for the Difference of
Two Population Proportions
Slide 79
Comparing Two Population
Proportions: Introduction
In 2002, a Pew Poll based on a random sample
of 1500 people suggested that 43% of
Americans approved of stem cell research. In
2009 a new poll of a different sample of 1500
people found that 58% approved.
Did American opinion really change? Or do the
sample proportions differ just by chance? Could
the population proportions be the same even
though the sample proportions are different?
Slide 80
Sample Proportions
Even if two population proportions are equal,
the sample proportions drawn from these
populations are usually different.
Confidence intervals are one method for
determining whether different sample
proportions indicate there are “real” differences
in the population proportions.
Slide 81
Basic Approach
Our comparison of two population proportions
will be based on the statistic
$$\hat{p}_1 - \hat{p}_2.$$
This statistic is used to estimate the difference
between two population proportions
$$p_1 - p_2.$$
Slide 82
Basic Approach
1. Find a confidence interval for the difference in
proportions p1 – p2.
2. Check to see if 0 is included in the interval.
• If 0 is in the interval, this suggests the two
population proportions might be the same because if
p1 – p2 = 0, then
p1 = p2 and the proportions are the same.
• If 0 is not in the interval, the confidence interval tells
us how much greater one of the proportions might
be than the other.
Slide 83
General Method
The confidence interval for two proportions has the same
structure as the confidence interval for one proportion
statistic ± z* × SEest
where
$$SE_{est} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
and z* is chosen to get the desired confidence level, as we did for
one proportion. Putting these two together, we see the
confidence interval for the difference of two proportions is
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$
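The formula can be sketched as a function and tried on the stem cell polls from the introduction to this section. The function name `two_proportion_ci` is hypothetical, and the success counts 870 and 645 are assumptions derived from the reported 58% and 43% of 1500 people; the source gives only the percentages.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence_level=0.95):
    """Confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf((1 + confidence_level) / 2)
    se_est = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se_est, diff + z_star * se_est


# Sample 1: 2009 poll (assumed 870 of 1500 approve); sample 2: 2002 poll (assumed 645 of 1500)
low, high = two_proportion_ci(870, 1500, 645, 1500)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
print("0 in interval:", low <= 0 <= high)
```

If 0 falls outside the interval, the data suggest the two population proportions differ; the usual conditions still need to be checked before relying on the result.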
Slide 84
Using the TI-84 Calculator
To construct a confidence interval for the difference of
two population proportions on the TI-84 calculator:
1. Push STAT > TESTS and select the 2-PropZInt
option.
2. Enter x1, n1, x2, n2 and the C-level you want.
Make sure to round values for x1 and x2 to whole
numbers.
3. Press Calculate.
Slide 85
Using StatCrunch
Select STAT > Proportion Stats > Two samples > with
summary.
Enter the number of successes and sample size for each
sample. Click the option Confidence Interval for p1 – p2
and adjust your confidence level.
Slide 86
Checking Four Conditions
Before constructing a confidence interval for the difference in
the population proportions, whether you use the formula or
technology, check these conditions:
1. Random and Independent samples. Both samples are
randomly drawn from their populations and are independent
of each other.
2. Large Samples. Both sample sizes are large enough that at
least 10 successes and 10 failures can be expected in both
samples. In symbols, this means we need to check:
n1 p̂1 ≥ 10
n1 (1 – p̂1) ≥ 10
n2 p̂2 ≥ 10
n2 (1 – p̂2) ≥ 10
Slide 87
Checking Conditions (Continued)
3. Big Population. If the samples are collected without
replacement, both population sizes must be at least 10 times
bigger than their samples.
4. Independent Samples. The samples must be independent of
each other. This means there can be no relationship
between the objects in one sample and the objects in the
other.
Slide 88
Example: Space Station
In August 2014 the Pew Poll asked random samples of 2002
Americans (non-scientists) and 3748 scientists if the Space
Station has been a good investment for the US. Of the non-scientists
polled, 64% said that the Space Station has been a
good investment; of the scientists polled, 68% said that the
Space Station has been a good investment.
Construct a 95% confidence interval for the difference between
the proportion of scientists and non-scientists who believe the
Space Station has been a good investment for the US. Based on
your confidence interval, is there a difference in the population
proportions? Explain.
Slide 89
Example: Space Station
Check conditions:
1. Random and Independent
2. Large Samples: all values are greater than or equal to 10
a. Non-scientists: 2002(0.64) = 1281.3 and 2002(1 – 0.64) = 720.7
b. Scientists: 3748(0.68) = 2548.6 and 3748(1 – 0.68) = 1199.4
3. Big Populations. The populations are at least 10 times
greater than the sample sizes.
4. Independent Samples
The conditions are met. Use technology to construct the
confidence interval.
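The large-sample arithmetic in this check can be reproduced with a short Python sketch. The helper names check_large_samples and population_big_enough are hypothetical. The slides do not state the population sizes, so the Big Population helper is shown without an example call, and the Random and Independent conditions depend on how the data were collected and cannot be checked in code.

```python
def check_large_samples(n1, p1_hat, n2, p2_hat, minimum=10):
    """Return each expected count and whether it is at least `minimum`."""
    counts = {
        "n1 * p1_hat": n1 * p1_hat,
        "n1 * (1 - p1_hat)": n1 * (1 - p1_hat),
        "n2 * p2_hat": n2 * p2_hat,
        "n2 * (1 - p2_hat)": n2 * (1 - p2_hat),
    }
    return {label: (value, value >= minimum) for label, value in counts.items()}


def population_big_enough(population_size, sample_size, factor=10):
    """Big Population condition: population at least `factor` times the sample."""
    return population_size >= factor * sample_size


# Sample sizes and sample proportions from the Space Station example.
results = check_large_samples(n1=2002, p1_hat=0.64, n2=3748, p2_hat=0.68)
for label, (value, ok) in results.items():
    print(f"{label} = {value:.1f} -> {'ok' if ok else 'too small'}")
```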
Slide 90
Example: Space Station
Non-scientists:
64% of 2002 = .64(2002) = 1281.28 ≈ 1281
Scientists:
68% of 3748 = .68(3748) = 2548.64 ≈ 2549
Use technology to construct the confidence
interval.
Slide 91
Confidence Interval
(–0.0660, –0.0144)
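As a check on the technology output, the interval can also be worked by hand from the general formula. The sketch below uses the rounded counts 1281 and 2549 and assumes z* ≈ 1.96 for 95% confidence (the value of z* is not stated on this slide); intermediate values are rounded.

```latex
\begin{align*}
\hat{p}_1 &= \frac{1281}{2002} \approx 0.6399, \qquad
\hat{p}_2 = \frac{2549}{3748} \approx 0.6801 \\
SE_{est} &= \sqrt{\frac{0.6399(1-0.6399)}{2002} + \frac{0.6801(1-0.6801)}{3748}} \approx 0.01316 \\
\hat{p}_1 - \hat{p}_2 \pm z^* \, SE_{est} &\approx -0.0402 \pm 1.96(0.01316) \approx -0.0402 \pm 0.0258 \\
&\Rightarrow (-0.0660,\ -0.0144)
\end{align*}
```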
Slide 92
Interpreting Confidence Interval for
Two Proportions
When constructing a confidence interval for p1 – p2:
Interval | Interpretation
Contains 0 | The population proportions may be equal
Both values are positive (+, +) | The population proportions are different and p1 > p2
Both values are negative (–, –) | The population proportions are different and p1 < p2
Slide 93
Interpreting Space Station Confidence
Interval
Our confidence interval for p1 – p2 is (–0.0660, –0.0144).
The interval does not contain 0, so there is a significant difference
in the population proportions.
Since both values in the confidence interval are negative, this
tells us that p1 < p2. In other words, the proportion of non-scientists
who believe the Space Station has been a good investment is less than
the proportion of scientists who believe so. The plausible values for
the size of the difference in the population proportions lie between
1.4 and 6.6 percentage points.
Slide 94
Example: Interpreting Confidence
Intervals
Extracting natural gas through hydraulic fracturing, or “fracking,”
is a controversial practice. Residents in two states with large
natural gas deposits, Michigan and Pennsylvania, were polled by
the University of Michigan to determine whether they favored or
opposed fracking. Let p1 represent the proportion of Michigan
residents who favor fracking and p2 represent the proportion of
Pennsylvania residents who favor fracking. A 95% confidence
interval for the difference in the population proportions is
(–0.0184, 0.117). Does this interval indicate there is a difference
in support of fracking between these two states?
Slide 95
Example: Interpreting Confidence
Intervals
Since the confidence interval contains 0, we can
conclude there is no significant difference in the
proportion of residents who support fracking in
these two states.
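The three-way interpretation rule applied in the Space Station and fracking examples can be sketched as a small Python function. The name interpret_interval and the wording of the returned strings are illustrative only.

```python
def interpret_interval(low, high):
    """Interpret a confidence interval (low, high) for p1 - p2."""
    if low > 0:
        # Both endpoints positive: p1 - p2 is plausibly only positive.
        return "proportions are different and p1 > p2"
    if high < 0:
        # Both endpoints negative: p1 - p2 is plausibly only negative.
        return "proportions are different and p1 < p2"
    # Otherwise the interval contains 0.
    return "proportions may be equal"


# Intervals taken from the two examples in this section.
print(interpret_interval(-0.0660, -0.0144))  # Space Station
print(interpret_interval(-0.0184, 0.117))    # fracking
```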
Slide 96