Last Name___________________ First Name _________________Class Time________Chapter 8-1
Chapter 8: Confidence Intervals
Parameters are numerical values calculated from data for the entire population. Because we rarely have data for an
entire population, it is RARE to be able to calculate a parameter. When we don't know the value of a parameter, we
need an approach to estimate it.
The best single number estimate for a parameter is its corresponding sample statistic.
We can ALWAYS calculate values for statistics using sample data.
Parameter                        Symbol   Statistic (Best single point estimate)   Symbol
Population Mean                  µ        Sample Mean                              x̄
Population Standard Deviation    σ        Sample Standard Deviation                s
Population Proportion            p        Sample Proportion                        p'
Note: ALL of the symbols for both population parameters and sample statistics are lower-case letters.
Each one represents a single number. Upper-case letters are used to represent random variables.
Rather than hope that the single value of a sample statistic is a good estimate of a population
parameter, it is safer to establish a range of values to estimate a population parameter. This range of
values is called a confidence interval for the parameter.
A CONFIDENCE INTERVAL IS A RANGE OF VALUES THAT IS LIKELY
TO CONTAIN THE TRUE VALUE OF THE POPULATION PARAMETER.
THE LEVEL OF CONFIDENCE IS THE PROBABILITY 1 − α
(EXPRESSED AS A PERCENT, 100(1 − α)%)
THAT THE METHOD PRODUCES A CONFIDENCE INTERVAL CONTAINING THE POPULATION PARAMETER.
We calculate CONFIDENCE INTERVALS ONLY for PARAMETERS, the values
that we can’t calculate because we don’t have access to all of the data.
In particular, in this course, we will only calculate confidence intervals for
2 parameters: population means and population proportions.
A Confidence Interval (CI) consists of a range of numerical values which we believe
will include the true value of a population parameter with a specified level of
confidence.
The confidence interval for a parameter is:
(Best Point Estimate – Error Bound, Best Point Estimate + Error Bound)
The confidence interval for µ is (x̄ − EBM, x̄ + EBM).
The confidence interval for p is (p' − EBP, p' + EBP).
In Chapter 8, we will construct confidence intervals for parameters in three cases:
Case I. For the population proportion p.
Case II. For the population mean µ when the population standard deviation σ is known.
Case III. For the population mean µ when the population standard deviation σ is unknown.
Confidence Intervals are created ONLY for population parameters such
as µ or p. Typically, we do not know the value of a population parameter
because it's almost always impossible to accurately obtain data from an
entire population.
Confidence Intervals are NEVER created for sample statistics such as x̄
or p'. We don’t need a confidence interval for a statistic.
We can always calculate the exact value of a sample statistic by using
sample data.
Case I: Construct A Confidence Interval For p:
We can construct a confidence interval for the population proportion, p, by building on our knowledge of
the random variable P' for sample proportions.
The random variable P' is the basis for constructing a confidence interval for p.
p = The sample proportion =
x number of successes in sample

n
total number in sample
RECALL THAT: X is a binomial random variable with $X \sim B(n, p)$!
But if n is large enough, then we can use a normal distribution to approximate the distribution of X
(where q = 1 − p):
$X \sim N\left(np, \sqrt{npq}\right)$!
We can divide by n and use algebra to find the distribution of P':
$P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$
But we don't know p. Our best point estimate for p is p'. So we use:
$P' \sim N\left(p', \sqrt{\frac{p'q'}{n}}\right)$.
Confidence Interval for a Population Proportion
We want to estimate the population proportion p.
The confidence interval for p is:
(Best Point Estimate – Error Bound For The Proportion, Best Point Estimate + Error Bound For The Proportion)
= (p' − EBP, p' + EBP).
(The Error Bound For The Proportion is also called the Margin Of Error.)
(The confidence interval for the population proportion is based on the normal approximation to the binomial distribution of X described above.)
Error Bound For The Proportion: $EBP = z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$
where q' = 1 − p', and n = sample size.
$\sqrt{\frac{p'q'}{n}}$ = the estimated standard deviation (standard error) of the sample proportion P'.
$z_{\alpha/2}$ = the upper z value that bounds a middle area equal to the confidence level (1 − α).
OR $z_{\alpha/2}$ is the lower cut-off value of the top α/2 area of the Z distribution.
Confidence Level = 1 – α
(In Chapter 9, we focus on α, the level of significance.)
Confidence Interval for p
= (p' − EBP, p' + EBP)
= $\left(p' - z_{\alpha/2}\sqrt{\frac{p'q'}{n}},\; p' + z_{\alpha/2}\sqrt{\frac{p'q'}{n}}\right)$
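As a cross-check on the Case I formula, here is a minimal Python sketch using only the standard library. It assumes the normal approximation is appropriate. `prop_z_interval` is a hypothetical helper name, `NormalDist().inv_cdf` stands in for the calculator's invNorm, and the survey counts (120 successes out of 400) are made-up illustration values. It should agree with the calculator's 1-PropZInterval up to rounding.

```python
from math import sqrt
from statistics import NormalDist

def prop_z_interval(x: int, n: int, cl: float) -> tuple[float, float]:
    """Normal-approximation confidence interval for a population proportion p."""
    p_hat = x / n                              # p' = x / n, the best point estimate
    q_hat = 1 - p_hat                          # q' = 1 - p'
    alpha = 1 - cl                             # combined area of both tails
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}, like invNorm(1 - alpha/2, 0, 1)
    ebp = z * sqrt(p_hat * q_hat / n)          # error bound for the proportion
    return (p_hat - ebp, p_hat + ebp)

# Hypothetical survey: 120 successes out of 400, 95% confidence
low, high = prop_z_interval(120, 400, 0.95)
print(f"({low:.4f}, {high:.4f})")   # (0.2551, 0.3449)
```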
Case II: Construct A Confidence Interval For µ When σ Is Known:
We can construct a confidence interval for the population average µ, when we know the population standard
deviation σ, by building on our knowledge of the distribution of the random variable X̄ for sample averages.
The random variable X̄ is the basis for constructing a confidence interval for µ when σ is known.
From Chapter 7, we know that if the sample is large enough, then
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
But we don't know µ. Our best point estimate for µ is x̄. So we use:
$\bar{X} \sim N\left(\bar{x}, \frac{\sigma}{\sqrt{n}}\right)$.
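The claim that X̄ has standard deviation σ/√n can be checked by simulation. The sketch below assumes a hypothetical normal population with µ = 50 and σ = 10 and samples of size n = 25, so σ/√n = 2. The seed and the number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(1)                  # arbitrary seed so the run is reproducible
mu, sigma, n = 50.0, 10.0, 25   # hypothetical population and sample size
reps = 5000

# Draw many samples of size n and record each sample mean x-bar
xbars = [mean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

print(round(mean(xbars), 2))    # should be close to mu = 50
print(round(stdev(xbars), 2))   # should be close to sigma / sqrt(n) = 2.0
```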
Confidence Interval For The Population Mean µ when σ is known:
We want to estimate the population average µ and we know the population standard deviation σ.
The confidence interval for µ is:
(Best Point Estimate – Error Bound For The Mean, Best Point Estimate + Error Bound For The Mean)
= (x̄ − EBM, x̄ + EBM).
(The Error Bound For The Mean is also called the Margin Of Error.)
x̄ = the average of data from the sample = (∑ x's from the sample) / n
Error Bound For The Mean: $EBM = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$,
where n = sample size.
$\frac{\sigma}{\sqrt{n}}$ = the standard deviation (or standard error) of X̄.
$z_{\alpha/2}$ = the upper z value that bounds a middle area under the Z distribution equal to the confidence level (CL).
OR $z_{\alpha/2}$ is the lower cut-off value of the top α/2 area of the Z distribution.
Confidence Level = 1 – α
(In Chapter 9, we will focus on α, the level of significance.)
Confidence Interval for µ
= (x̄ − EBM, x̄ + EBM)
= $\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
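Here is a minimal sketch of the Case II formula, assuming σ is known. `z_interval` is a hypothetical name, and the inputs (x̄ = 50, σ = 8, n = 64) are illustration values chosen so that σ/√n = 1. The result should match the calculator's ZInterval (Stats input) up to rounding.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, cl: float) -> tuple[float, float]:
    """Confidence interval for mu when sigma is known: x-bar -/+ z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - cl
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    ebm = z * sigma / sqrt(n)                 # error bound for the mean
    return (xbar - ebm, xbar + ebm)

# Hypothetical values: x-bar = 50, sigma = 8, n = 64, 95% confidence
low, high = z_interval(50.0, 8.0, 64, 0.95)
print(f"({low:.4f}, {high:.4f})")   # (48.0400, 51.9600)
```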
Case III: Construct A Confidence Interval For µ When σ Is Unknown:
We can construct a confidence interval for the population average µ, when we do not know the population
standard deviation σ, by introducing a new distribution, called the t-distribution, for the standardized
sample average $T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$, in which the sample standard deviation s takes the place of σ.
The t-distribution (or Student-t distribution) has several specific characteristics:
• The mean value is zero (like the standard normal random variable, Z).
• The t random variable can have any value between −∞ and +∞ (like Z).
• The distribution is bell-shaped and symmetric about zero on the horizontal axis (like Z).
• The t-distribution has a lower central peak and thicker tails than the standard normal distribution.
• The parameter of the t-distribution is its degrees of freedom (here, df = n − 1).
• The larger the sample size, the larger the degrees of freedom and the
more the particular t-distribution resembles the Z distribution.
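To see the "lower peak, thicker tails" property numerically, the sketch below evaluates the standard Student t density formula and compares it with the standard normal density at t = 0 and t = 3. The helper name `t_pdf` is hypothetical. The gamma-function constant is computed on the log scale (`math.lgamma`) so large df values do not overflow.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # Normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), on the log scale
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

z_pdf = NormalDist().pdf   # standard normal density

for df in (2, 5, 30, 1000):
    print(f"df={df:>4}: peak {t_pdf(0, df):.4f}, height at 3: {t_pdf(3, df):.5f}")
print(f"Z:        peak {z_pdf(0):.4f}, height at 3: {z_pdf(3):.5f}")
```

As df grows, the printed t values should move toward the Z row.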
Confidence Interval for a Mean µ when σ is NOT known
We want to estimate the population average µ and we do not know the population standard deviation σ.
We use the sample standard deviation, s, to estimate the population standard deviation σ.
The confidence interval for µ is:
(Best Point Estimate – Error Bound For The Mean, Best Point Estimate + Error Bound For The Mean)
= (x̄ − EBM, x̄ + EBM).
(The Error Bound For The Mean is also called the Margin Of Error.)
x̄ = the average of data from the sample = (∑ x's from the sample) / n
Error Bound For The Mean: $EBM = t_{\alpha/2}\frac{s}{\sqrt{n}}$
$\frac{s}{\sqrt{n}}$ = the estimated standard deviation (or standard error) of X̄.
$t_{\alpha/2}$ = the upper t value that bounds a middle area equal to the confidence level
under the Student t-distribution with n − 1 degrees of freedom (df).
OR $t_{\alpha/2}$ is the lower cut-off value of the top α/2 area of the t-distribution.
Confidence Level = 1 – α
Confidence Interval for µ
= (x̄ − EBM, x̄ + EBM)
= $\left(\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right)$
where n = sample size.
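Here is a minimal sketch of the Case III formula. Python's standard library has no t quantile function, so this sketch finds $t_{\alpha/2}$ numerically: it integrates the t density with Simpson's rule and then bisects. That numerical approach, and the helper names `t_cdf`, `t_critical` and `t_interval`, are assumptions made for illustration. In practice the calculator's TInterval or a statistics package would normally be used. The sample values (x̄ = 100, s = 12, n = 16) are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0: 0.5 (by symmetry) plus Simpson's rule on [0, t]."""
    h = t / steps   # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(cl, df):
    """t_{alpha/2}: the value with area (1 - cl) / 2 to its right (bisection; assumes it is below 100)."""
    target = 1 - (1 - cl) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, cl):
    """Confidence interval for mu when sigma is unknown: x-bar -/+ t_{alpha/2} * s / sqrt(n)."""
    ebm = t_critical(cl, n - 1) * s / sqrt(n)
    return (xbar - ebm, xbar + ebm)

# Hypothetical sample: n = 16, x-bar = 100, s = 12, 95% confidence (df = 15)
low, high = t_interval(100.0, 12.0, 16, 0.95)
print(f"t = {t_critical(0.95, 15):.4f}, CI = ({low:.3f}, {high:.3f})")
```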
***********Interpreting the Confidence Interval************
For a proportion, p: 2 types of interpretation:
First Interpretation (2 ways):
We are ______% confident that the true proportion of the population (describe the population
parameter in the problem) is between _____________ and _______________.
OR
We estimate with _____% confidence that between ___________% and ___________% of the
population are successes (describe the population parameter and what a success is in the problem.)
Second Interpretation:
If we calculate confidence intervals based on repeated sampling of size ___ (fill in the value of n)
in the same way, then we expect that ________% of the confidence intervals calculated will contain
the true population proportion (describe the population parameter in the problem).
For a mean, µ: 2 types of interpretation:
First Interpretation (2 ways):
We are _____% confident that the true population average (or mean) (describe the population
parameter in the situation of this problem) is between _______ and _______ (include the units).
OR
We estimate with _____% confidence that the true population average (or mean) (describe the
population parameter in the situation of this problem) is between ___________ and ___________ (include the units).
Second Interpretation:
If we calculate confidence intervals based on repeated sampling of size ___ (fill in the value of n)
in the same way, then we expect that ________% of the confidence intervals calculated will contain
the true population mean (describe the population parameter in the problem).
Visual Look At The Second Interpretation:
For example, a confidence level of 90% means that on average, 90% of all possible confidence
intervals based on repeated sampling for a population mean µ in the same way are expected to contain
the true population mean, µ.
In the following diagram, the ten confidence intervals, constructed by using ten random samples
of the same size, n, all have the same width. And, we expect, on average, that 9 out of 10 such confidence
intervals will contain the true population mean, µ.
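The "repeated sampling" interpretation can be imitated by simulation. The sketch below assumes a hypothetical normal population with known µ = 100 and σ = 15. It draws 2,000 samples of size n = 25, builds a 90% interval (x̄ − EBM, x̄ + EBM) from each one, and counts how many capture µ. Because σ is known, every interval has the same width. The seed is arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(2)                               # arbitrary seed so the run is reproducible
mu, sigma, n, cl = 100.0, 15.0, 25, 0.90     # hypothetical population and settings
reps = 2000

z = NormalDist().inv_cdf(1 - (1 - cl) / 2)   # z_{alpha/2}
ebm = z * sigma / sqrt(n)                    # same error bound for every sample

hits = 0
for _ in range(reps):
    xbar = mean(random.gauss(mu, sigma) for _ in range(n))
    if xbar - ebm <= mu <= xbar + ebm:       # did this interval capture mu?
        hits += 1

coverage = hits / reps
print(f"{hits} of {reps} intervals contain mu (coverage {coverage:.3f})")
```

The printed coverage should land near 0.90 but not exactly on it, which is what "on average" means in the interpretation.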
Finding the Point Estimate and Error Bound if we know the Confidence Interval:
If we know that the confidence interval is: (lower bound, upper bound), then:
• Point Estimate = (lower bound + upper bound) / 2, the average of the upper and lower bounds.
• Error Bound = Margin of Error = (upper bound − lower bound) / 2, or half of the confidence interval length.
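The two bullet formulas translate directly into code. `point_and_error` is a hypothetical helper name and (40, 52) is a made-up interval.

```python
def point_and_error(lower: float, upper: float) -> tuple[float, float]:
    """Recover the point estimate and error bound from a confidence interval."""
    point_estimate = (lower + upper) / 2   # midpoint of the interval
    error_bound = (upper - lower) / 2      # half the interval length
    return point_estimate, error_bound

# Hypothetical interval (40, 52)
print(point_and_error(40, 52))   # (46.0, 6.0)
```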
What does it mean to be CL% confident?
If we were to take repeated samples and calculate many confidence interval estimates based on those samples,
then we would expect that CL% of the confidence interval estimates would be “good estimates” that would
enclose (capture) the true value of the population parameter we are trying to estimate.
If we were to take repeated samples and calculate many confidence interval estimates based on those samples,
then we would expect that 100% − CL% of the confidence interval estimates would be “bad estimates” that
would NOT enclose (capture) the true value of the population parameter we are trying to estimate.
Question for discussion: Can you ever know which confidence intervals actually do contain the true
population parameter and which intervals do not?
Note that the confidence interval is about population proportions or population averages, parameters
that are calculated from all data based on the population. The confidence interval is not about
individual data values. It does not mean that CL% of the data lie within the confidence interval.
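The difference between "CL% of intervals" and "CL% of data" is easy to see numerically. This sketch assumes a hypothetical population N(0, 1) with σ = 1 known. It draws one sample of 400 values, builds a 95% interval for µ, and counts how many individual data values fall inside that interval. The share is far below 95% because the interval describes where µ is likely to be, not where individual values lie.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(3)   # arbitrary seed
sigma, n = 1.0, 400
data = [random.gauss(0.0, sigma) for _ in range(n)]   # hypothetical sample

xbar = mean(data)
ebm = NormalDist().inv_cdf(0.975) * sigma / sqrt(n)   # 95% error bound, sigma known
lower, upper = xbar - ebm, xbar + ebm

share_inside = sum(lower <= x <= upper for x in data) / n
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
print(f"share of data values inside the CI: {share_inside:.2f}")
```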
To find $z_{\alpha/2}$ that puts an area equal to the confidence level “in the middle”:
CL is the area in the middle.
α = 1 − CL is the combined area of both tails.
α/2 is the area of one tail.
To find $z_{\alpha/2}$: $z_{\alpha/2}$ = invNorm(1 − α/2, 0, 1)
OR $z_{\alpha/2}$ = −invNorm(α/2, 0, 1). ($z_{\alpha/2}$ is always positive.)
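The two invNorm recipes above can be checked with Python's `statistics.NormalDist`, whose `inv_cdf` method plays the role of invNorm here. The helper names `z_crit` and `z_crit_alt` are hypothetical. Both should return the same positive value for each confidence level.

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)   # Z ~ N(0, 1)

def z_crit(cl: float) -> float:
    """z_{alpha/2} via invNorm(1 - alpha/2, 0, 1)."""
    alpha = 1 - cl
    return std_normal.inv_cdf(1 - alpha / 2)

def z_crit_alt(cl: float) -> float:
    """The same value via -invNorm(alpha/2, 0, 1)."""
    alpha = 1 - cl
    return -std_normal.inv_cdf(alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(cl, round(z_crit(cl), 4), round(z_crit_alt(cl), 4))
```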
EXAMPLE 1:
CONFIDENCE INTERVAL ESTIMATE for an unknown POPULATION PROPORTION p
A city government needs to determine the percent of its residents that do not have health insurance. The
city health department randomly surveys 1600 city residents and finds that 15.25% of the 1600 residents do
not have health insurance. Construct and interpret a 95% confidence interval for the true population
proportion of all city residents who do not have health insurance.
Use a 95% confidence level.
a. Define the population parameter: p =
b. Define the random variable P' =
We are using sample data to estimate an unknown proportion for the whole population.
Point Estimate = p'
Confidence Level CL is the area in the middle.
α = 1 − Confidence Level, so α/2 = area in one tail.
$z_{\alpha/2}$ = invNorm(1 − α/2, 0, 1) = invNorm(area to left of z, 0, 1)
EBP = $z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$
Confidence Interval = (p' − EBP, p' + EBP)
c. Find the confidence interval by hand, filling in the steps for the following calculations.
Round answers to 4 decimal places.
p' = _____________ = ______ , 1 − α = ________
so α = ________ and α/2 = __________
$z_{\alpha/2}$ = invNorm( ______ , ______ , ______ ) = _______
$\sqrt{\frac{p'q'}{n}}$ = ______________ = ________
EBP = $z_{\alpha/2}\sqrt{\frac{p'q'}{n}}$ = ___________________ = _________
p' − EBP = _______ − ________ = ________ ,
p' + EBP = _______ + ________ = ________
So the 95% confidence interval is: ___________________________________________________
d. Find the confidence interval using your calculator command, 1-PropZInterval.
• Key in the following sequence: STAT → TESTS → 1-PropZInterval
• Fill in the appropriate values for x, n, and the confidence level in your calculator.
Using 1-PropZInterval: the 95% confidence interval is _____________________________________
e. Find the Error Bound For the Proportion using your confidence interval. Show your work.
f. Sketch a graph of your confidence interval results.
• Draw the appropriate curve shape. Label the axis. Label the mean and key points on the axis.
• Shade the area corresponding to the confidence interval.
• Label the size of all shaded and unshaded areas.
• Draw a second axis, label it Z and label the mean.
• Calculate and label z-scores corresponding to the upper and lower bounds of your CI.
Last Name___________________ First Name _________________Class Time________Chapter 8-9
g. Interpret your confidence interval in two ways, in context of the problem.
First Interpretation:
Second Interpretation:
h. The probability distribution $N\left(p', \sqrt{\frac{p'q'}{n}}\right)$ can also be used to find the confidence interval.
Using:
invNorm(α/2, p', $\sqrt{\frac{p'q'}{n}}$) = lower bound and invNorm(1 − α/2, p', $\sqrt{\frac{p'q'}{n}}$) = upper bound
will give the left and right bounds of the same confidence interval.
This is because of the property we learned in Chapter 6: for a normal distribution, the
probabilities found using the given distribution are the same as the probabilities found using the z-scores with the standard normal distribution Z ~ N(0, 1).
Fill in the blanks below:
invnorm(______ , ______ , _______) = ______ and invnorm(______ , ______, _______) = _______
Confidence Interval = ( ______________, ______________ )
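Here is a quick numerical check of the claim in part h. It uses hypothetical values p' = 0.35 and n = 500, not the Example 1 data. The bounds from the z-score formula and the bounds from inverting $N\left(p', \sqrt{p'q'/n}\right)$ should agree up to floating-point rounding. `NormalDist(mu, sigma).inv_cdf` stands in for invNorm(area, mean, sd).

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, cl = 0.35, 500, 0.95   # hypothetical values
alpha = 1 - cl
se = sqrt(p_hat * (1 - p_hat) / n)   # sqrt(p'q'/n)

# Route 1: z-score formula p' -/+ z_{alpha/2} * se
z = NormalDist().inv_cdf(1 - alpha / 2)
ci_formula = (p_hat - z * se, p_hat + z * se)

# Route 2: invNorm(alpha/2, p', se) and invNorm(1 - alpha/2, p', se)
approx = NormalDist(p_hat, se)
ci_invnorm = (approx.inv_cdf(alpha / 2), approx.inv_cdf(1 - alpha / 2))

print(ci_formula)
print(ci_invnorm)
```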
k. Using your calculator software to find the confidence interval:
STAT → TESTS → 1-PropZInterval
→ ENTER
x: (Fill in the number of successes directly if given, or compute it as the given percent of the total sample size.)
n: (Fill in the sample size.)
C- Level: (Fill in the level of confidence as a decimal.)
Calculate ENTER
Fill in the blanks, showing any necessary calculations:
x: _____________________
and
n: _____________
C- Level: ______________
the Confidence Interval = ( ______________, ______________ )
i. It has been estimated that for the state in which that city is located, approximately 20% of residents do
not have health insurance. Based on the confidence interval you found above (in part h), can we conclude
that the proportion of city residents who lack health insurance is lower than the proportion of state
residents? Explain.
j. It has been estimated that nationally, approximately 16% of residents do not have health insurance.
Based on this confidence interval, can we conclude that the proportion of city residents who lack health
insurance is lower than the proportion of U.S. residents? Explain.
CONFIDENCE INTERVAL ESTIMATE for unknown POPULATION MEAN µ
when the POPULATION STANDARD DEVIATION σ is KNOWN
EXAMPLE 2:
a. A soda bottling plant fills 12 ounce cans with soda. The filling machine varies and does not fill each
can with exactly 12 ounces. To determine if the filling machine needs adjustment, each day the quality
control manager measures the amount of soda per can for a random sample of 50 cans. Experience
shows that its filling machines have a known (population) standard deviation of 0.35 ounces. In today's
sample of 50 cans of soda, the average amount of soda per can is 12.1 ounces with a standard deviation
of 0.42 ounces. Construct and interpret a 95% confidence interval estimate for the true population
average amount of soda contained in all cans filled today at this bottling plant.
a. X =
b. population parameter: µ =
c. random variable X̄ =
We are using sample data to estimate an unknown mean (average) for the whole population.
Point Estimate = x̄
Confidence Level CL is the area in the middle.
α = 1 − Confidence Level, so α/2 = area in one tail.
$z_{\alpha/2}$ = invNorm(1 − α/2, 0, 1) = invNorm(area to left of z, 0, 1)
EBM = $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Confidence Interval = (x̄ − EBM, x̄ + EBM)
d. Find the confidence interval by hand, filling in the steps for the following calculations.
Round answers to 4 decimal places. Show ALL WORK below.
Include both symbols and numerical values of all pieces of your work.
x̄ = _____________ = ______ , 1 − α = ________
so α = ________ and α/2 = __________
$\frac{\sigma}{\sqrt{n}}$ = ______________ = ________ , $z_{\alpha/2}$ = invNorm( ______ , ______ , ______ ) = _______
EBM = $z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$ = ___________________ = _________
x̄ − EBM = _______ − ________ = ________ ,
x̄ + EBM = _______ + ________ = ________
So the 95% confidence interval is: ___________________________________________________
e. Sketch a graph of your confidence interval results.
• Draw the appropriate curve shape. Label the axis. Label the mean and key points on the axis.
• Shade the area corresponding to the confidence interval.
• Label the size of all shaded and unshaded areas.
• Draw a second axis, label it Z and label the mean.
• Calculate and label z-scores corresponding to the upper and lower bounds of your CI.
f. As in the previous example, the probability distribution $N\left(\bar{x}, \frac{\sigma}{\sqrt{n}}\right)$ can also be used to find the confidence interval.
Using:
invNorm(α/2, x̄, $\frac{\sigma}{\sqrt{n}}$) = lower bound and invNorm(1 − α/2, x̄, $\frac{\sigma}{\sqrt{n}}$) = upper bound
will give the left and right bounds of the same confidence interval.
invnorm(____ , _____ , _______ ) = _______
and
invnorm(____ , _____, _______ ) = _______
Confidence Interval ( ________, ________ )
g. Interpret your confidence interval in two ways, in context of the problem.
First Interpretation:
Second Interpretation:
h. Using your calculator software to find the confidence interval:
STAT → TESTS → ZInterval
→ ENTER
Input: Data Stats
• Choose Data if you are providing a list of data values (L1) with a frequency
(L2 or 1). The calculator will use your data to find the sample mean.
• Choose Stats if you know x̄ and n.
σ:
x̄:
n:
C-Level:
Fill in the necessary values.
Fill in the blanks, showing any necessary calculations:
σ: _________
and
x̄ = _________
n: _____________
C- Level: ______________
the Confidence Interval = ( ______________, ______________ )
EXAMPLE 3:
CONFIDENCE INTERVAL ESTIMATE for unknown POPULATION
MEAN µ
when the POPULATION STANDARD DEVIATION σ is NOT KNOWN
a. The speeds of 20 vehicles are observed by radar on a particular road.
For the vehicles in the sample, the average speed is 31.3 miles per hour with a standard deviation
of 7.0 mph. Construct and interpret a confidence interval estimate of the true population average speed
of all vehicles traveling on this road. Use a 90% confidence level.
X = ________________________________________________________________________
population parameter: µ = ________________________________________________________________
random variable X̄ = ____________________________________________________________________
b. Using your calculator software to find the confidence interval:
STAT → TESTS → TInterval
→ ENTER
Input: Data Stats
• Choose Data if you are providing a list of data values (L1) with a frequency
(L2 or 1). The calculator will use your data to find the sample mean.
• Choose Stats if you know x̄, sx, and n.
x̄:
sx:
n:
C-Level:
Fill in the necessary values.
Fill in the blanks, showing any necessary calculations:
sx: _________
and
x̄ = _________
n: _____________
C- Level: ______________
the Confidence Interval = ( ______________, ______________ )
c. Sketch a graph of your confidence interval results.
• Draw the appropriate curve shape. Label the axis. Label the mean and key points on the axis.
• Shade the area corresponding to the confidence interval.
• Label the size of all shaded and unshaded areas.
• Calculate and label z-scores corresponding to the upper and lower bounds of your CI.
d. Interpret your confidence interval in two ways, in context of the problem.
First Interpretation:
Second Interpretation:
e. Find the Error Bound for the mean, showing all calculations.
f. What is the best point estimate of the true average speed of all vehicles on this road?
g. In Example 3, suppose that you were not given the sample mean and sample standard deviation and instead
you were given a list of data for the speeds (in miles per hour) of the 20 vehicles.
19  19  22  24  25  27  28  37  35  30
37  36  39  40  43  30  31  36  33  35
Use these data to find the 90% confidence interval.
Did you find exactly the same interval as your answer to part b above?
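Choosing Data instead of Stats means the summary statistics are computed from the list first. The sketch below applies Python's `statistics` module to the 20 speeds listed above. `stdev` divides by n − 1, which matches the sample standard deviation s. If the s computed from the raw list differs from the rounded 7.0 given in part a, the two intervals can differ slightly.

```python
from statistics import mean, stdev

# The 20 observed speeds (mph) listed above
speeds = [19, 19, 22, 24, 25, 27, 28, 37, 35, 30,
          37, 36, 39, 40, 43, 30, 31, 36, 33, 35]

n = len(speeds)
xbar = mean(speeds)   # sample mean x-bar
s = stdev(speeds)     # sample standard deviation (divides by n - 1)
print(f"n = {n}, x-bar = {xbar:.4f}, s = {s:.4f}")
```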
NOTE: Using the t- distribution requires that the underlying population of individual values is approximately
normally distributed. This assumption is somewhat "robust", meaning it can be violated to some degree. But if
the underlying population of individual values has a distribution that differs too much from the normal
distribution, then this confidence interval method would not be appropriate, and statisticians would use other
techniques that we do not study in Math 10.
EXAMPLE 4:
If we know the confidence interval: Working Backwards
The average nightly costs of hotel rooms in two resort areas are compared. Large random samples of
hotel room costs are collected for each city. The resulting confidence intervals are reported in a hotel
industry journal.
The 90% confidence interval estimate for the true population average nightly cost of a hotel room in
Surf City is $134 to $159 per night.
The 90% confidence interval estimate for the true population average nightly cost of a hotel room in
Ski Village is $123 to $141 per night.
a. Find the point estimate for the true average nightly cost of a hotel room in each city.
Which city has a higher point estimate?
Surf City:
Ski Village:
Circle the city with the higher point estimate.
Surf City
Ski Village
b. Find the error bound for each city. Which city has a smaller margin of error?
Surf City:
Ski Village:
Circle the city with the smaller error bound.
Surf City
Ski Village
c. Based on the confidence intervals only, would it be reasonable to conclude that the true average
nightly costs of hotel rooms are different in Surf City and in Ski Village?
Answer Yes or No and explain why your answer is reasonable.
d. Would it be true that 90% of hotel rooms cost between $134 and $159 per night in Surf City and that
90% of hotel rooms cost between $123 and $141 per night in Ski Village? Why or why not? Explain!
Exploring Confidence Intervals.
Suppose a representative of the dairy industry is interested in the proportion of adults in a certain city
who drink milk.
He randomly surveys 650 adults in the city and finds that 28% drink milk.
a. Write the symbol for and define the parameter.
symbol:_____ = _________________________________________________________________
b. Write the symbol for and define the random variable.
symbol:_____ = _________________________________________________________________
c. Write the distribution of the random variable: ______ ~ ______________________________
d. Find a 96% confidence interval for the true proportion of all adults in the city when 650 adults are
included in the survey. Show your work by hand. Find the error bound for the confidence interval.
Draw a well-labeled and shaded graph of the results. Include a second scale for z scores.
Error Bound:
CI by hand:
Sketch:
e. Find a 92% confidence interval for the true proportion of all adults in the city if 650 adults were
included in the survey. Show your work by hand. Find the error bound for the confidence interval.
Draw a well-labeled and shaded graph of the results. Include a second scale for z scores.
Error Bound:
CI by hand:
Sketch:
f. Find a 99% confidence interval for the true proportion of all adults in the city if 650 adults were
included in the survey. Show your work by hand and show the steps when using calculator software.
Find the error bound for the confidence interval.
CI using calculator
Error Bound:
g. Find a 96% confidence interval for the true proportion of all adults in the city if 100 adults were
included in the survey. Show your work by hand and show the steps when using calculator software.
Find the error bound for the confidence interval.
CI using calculator
Error Bound:
h. Find a 96% confidence interval for the true proportion of all adults in the city if 50 adults were
included in the survey. Show your work by hand and show the steps when using calculator software.
Find the error bound for the confidence interval.
CI using calculator
Error Bound:
Based on the intervals that you calculated above, circle the correct answers to the following questions:
i. For a constant sample size, if the confidence level is increased,
• the confidence interval becomes:
wider
narrower
no change
• the error bound becomes:
larger
smaller
no change
j. For a constant sample size, if the confidence level is decreased,
• the confidence interval becomes:
wider
narrower
no change
• the error bound becomes:
larger
smaller
no change
k. If the confidence level is held constant and the sample size is increased,
• the confidence interval becomes:
wider
narrower
no change
• the error bound becomes:
larger
smaller
no change
l. If the confidence level is held constant and the sample size is decreased,
• the confidence interval becomes:
wider
narrower
no change
• the error bound becomes:
larger
smaller
no change
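The pattern in parts i through l can also be explored numerically. The sketch below assumes a hypothetical sample proportion p' = 0.40, not the milk survey's value. It tabulates $EBP = z_{\alpha/2}\sqrt{p'q'/n}$ for several confidence levels and sample sizes. `ebp` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def ebp(p_hat: float, n: int, cl: float) -> float:
    """Error bound for a proportion: z_{alpha/2} * sqrt(p'q'/n)."""
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.40   # hypothetical sample proportion
levels = (0.90, 0.95, 0.99)
print("n     " + "  ".join(f"CL={cl:.2f}" for cl in levels))
for n in (50, 200, 800):
    print(f"{n:<5} " + "  ".join(f"{ebp(p_hat, n, cl):.4f}  " for cl in levels))
```

Since n sits under a square root, quadrupling the sample size halves the error bound at any fixed confidence level.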