Chapter 21: What is a Confidence Interval?
Thought Question
To estimate the percentage of all adults who have an Internet connection in their homes, a
properly chosen sample of 1100 adults across the U.S. was contacted, and 60% said “yes”.
If this poll were repeated many times, would it get the same sample proportion each time?
How close do you think this sample proportion is to the percentage of the entire country who
have an internet connection? Within 30%? 10%? 5%? 1%? Exactly the same?
Estimating Population Parameters
Terminology
• Statistical inference: draws conclusions about a population on the basis of data about
a sample.
• Parameter: fixed, unknown number that describes the population.
• Statistic: known value calculated from a sample, which can change from sample to
sample.
• Population proportion p: a parameter, describes the proportion of the population with
a particular characteristic.
• Sample proportion p̂: a statistic, calculates the proportion of the sample with a particular characteristic.
How do we estimate an unknown parameter?
Choose a sample from the population and use a sample statistic as an estimate.
Observations
• Statistical conclusions are uncertain because the sample isn’t the entire population.
• Statistical inference must both give conclusions and say how uncertain they are.
Review of Quantifying Uncertainty
We will never estimate the population parameter exactly. How far off are we?
Terminology
• Margin of Error: how close the sample statistic lies to the population parameter.
• Level of Confidence: what percentage of all possible samples satisfy the margin of
error.
• Confidence Statement: combines margin of error and level of confidence.
Interpretation of Common Language
Statement: “The margin of error is plus or minus two percentage points.”
Translation: If we took many samples, 95% of them would give a value of p̂ within ±2% of p.
Equivalently: If we took many samples, p is within ± 2% of 95% of the values of p̂.
Equivalently: If we took many samples, p will be captured by 95% of the intervals [p̂ − 0.02, p̂ + 0.02].
Interpretation of 95% Confidence using Margin of Error
• 95% of the time p̂ will be no more than the margin of error away from p.
• 5% of the time p̂ will “miss” p by more than the margin of error.
• Can’t tell if we “hit” or “miss.” This is just a fact of life!
Quick Method
Use p̂ from an SRS of size n to estimate p. The margin of error for 95% confidence is approximately $1/\sqrt{n}$.
Example: Internet Connection
Margin of error for 95% confidence: $\approx 1/\sqrt{1100} \approx 0.0302 = 3.02\%$.
Confidence statements:
(i) Margin of error interpretation: We are 95% confident that the true proportion of adults
who have an internet connection in their homes is within ± 3.02% of our sample proportion
60%.
(ii) Interval interpretation: We are 95% confident that between 56.98% and 63.02% of adults
have an internet connection in their homes.
Important Points
• The conclusion of a confidence statement always applies to the population, not to the
sample.
• Our conclusion about the population is never completely certain.
• We can choose to use a confidence level other than 95%.
• It is usual to report the margin of error for 95% confidence.
• Take a larger sample to get a smaller margin of error with the same confidence.
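The quick method can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name `quick_margin_of_error` is an illustrative choice. It reproduces the Internet connection figures and shows how the margin of error shrinks as the sample grows.

```python
import math

def quick_margin_of_error(n):
    """Quick-method margin of error for 95% confidence: about 1/sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)

# Internet connection poll: n = 1100, sample proportion 0.60
p_hat = 0.60
margin = quick_margin_of_error(1100)
interval = (p_hat - margin, p_hat + margin)
print(f"margin of error: {margin:.4f}")
print(f"interval: [{interval[0]:.4f}, {interval[1]:.4f}]")

# Quadrupling the sample size halves the quick-method margin of error.
for n in (1100, 4400, 17600):
    print(n, round(quick_margin_of_error(n), 4))
```

Because the margin is proportional to 1/√n, halving it requires four times as many respondents.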
Confidence Intervals For Sample Proportions
Terminology
• 95% confidence interval: an interval calculated from sample data by a process that is
guaranteed to capture the true population parameter in 95% of all samples.
Facts About Sample Proportions
For large enough n:
• The sampling distribution of p̂ is approximately Normal.
• The mean of the sampling distribution is p.
• The standard deviation of the sampling distribution is $\sqrt{p(1-p)/n}$.
Example: Internet Connection
Assume that the true proportion p of adults who have an internet connection in their homes is
0.62.
What are the mean and standard deviation of the sample proportion p̂?
mean = p = 0.62
standard deviation = $\sqrt{\frac{0.62(1 - 0.62)}{1100}} \approx 0.015$
Using the 68-95-99.7 rule, determine two values of p̂ in between which 95% of all values of p̂
will lie.
95% of all values of p̂ will fall between:
mean - 2 standard deviations = 0.62 − 2 × 0.015 = 0.59
and
mean + 2 standard deviations = 0.62 + 2 × 0.015 = 0.65.
Note that:
• Margin of error interpretation: In 95% of all samples of size 1100, the statistic p̂ is
within ± 0.030 of the parameter p.
• Confidence interval interpretation: Equivalently, 95% of all samples of size 1100 give
an outcome p̂ such that the population truth p is captured by the interval [p̂ − 0.030, p̂ +
0.030].
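The facts about the sampling distribution can be explored by simulation. The sketch below is an illustration under stated assumptions rather than anything from the original notes: it treats p = 0.62 as the true proportion, draws 2000 simulated polls of 1100 people each with a fixed random seed, and compares the simulated mean and standard deviation of p̂ with the formula. The function name and the number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(repetitions)
    ]

p, n = 0.62, 1100
theoretical_sd = math.sqrt(p * (1 - p) / n)

p_hats = simulate_sample_proportions(p, n, repetitions=2000)
simulated_mean = statistics.mean(p_hats)
simulated_sd = statistics.stdev(p_hats)
# Share of simulated polls whose p-hat lands within 0.030 of the true p
share_within = sum(abs(ph - p) <= 0.030 for ph in p_hats) / len(p_hats)

print(f"theoretical sd: {theoretical_sd:.4f}")
print(f"simulated mean: {simulated_mean:.4f}, simulated sd: {simulated_sd:.4f}")
print(f"share of samples with p-hat within 0.030 of p: {share_within:.3f}")
```

The simulated share should come out close to 95%, though it varies a little with the seed and the number of repetitions.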
In General
Approximately 95% of all samples catch p in the interval
$$\hat{p} \pm 2\sqrt{\frac{p(1-p)}{n}} = \left[\hat{p} - 2\sqrt{\frac{p(1-p)}{n}},\ \hat{p} + 2\sqrt{\frac{p(1-p)}{n}}\right].$$
Why can’t we use this formula to compute a confidence interval from our data?
We don’t know the value of p!
Approximate p by p̂.
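To see why replacing p by p̂ does little harm here, one can compare the standard deviation computed from the assumed true value p = 0.62 with the one computed from the observed p̂ = 0.60. This is a small illustrative sketch; the helper name `standard_error` is not from the notes.

```python
import math

def standard_error(proportion, n):
    """sqrt(q(1 - q)/n) evaluated at q = proportion."""
    return math.sqrt(proportion * (1 - proportion) / n)

n = 1100
sd_true = standard_error(0.62, n)  # uses the assumed true p = 0.62
sd_est = standard_error(0.60, n)   # uses the observed p-hat = 0.60

print(f"using p     = 0.62: {sd_true:.4f}")
print(f"using p-hat = 0.60: {sd_est:.4f}")
```

The two values agree to about three decimal places, so the resulting interval barely changes.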
95% Confidence Interval for a Proportion
An approximate 95% confidence interval for p is
$$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \left[\hat{p} - 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right].$$
Example: Internet Connection
Calculate a 95% confidence interval for the proportion of adults who have an internet connection
at home.
$$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.60 \pm 2\sqrt{\frac{0.60(1-0.60)}{1100}} \approx 0.60 \pm 0.0295 \approx [0.5705, 0.6295]$$
Write the two versions of a confidence statement.
(i) Margin of error interpretation: We are 95% confident that the proportion of adults that
have an internet connection in their homes is within ± 2.95% of our sample proportion 60%.
(ii) Confidence interval interpretation: We are 95% confident that between 57.05% and
62.95% of adults have an internet connection in their homes.
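The calculation above can be written as a short function. This is a minimal sketch, assuming the multiplier 2 used in this chapter for 95% confidence; the name `proportion_ci_95` is hypothetical.

```python
import math

def proportion_ci_95(p_hat, n):
    """Approximate 95% interval: p-hat +/- 2 * sqrt(p-hat(1 - p-hat)/n)."""
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

low, high, margin = proportion_ci_95(0.60, 1100)
print(f"margin of error: {margin:.4f}")
print(f"interval: [{low:.4f}, {high:.4f}]")
```

The margin of error here (about 2.95%) is slightly smaller than the quick-method value of 3.02%, because p̂(1 − p̂) is below its maximum of 0.25 when p̂ = 0.60.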
Understanding Confidence Intervals
So Far...
confidence interval = estimate ± margin of error
In General
A level C confidence interval for a parameter has two parts:
• An interval calculated from the data.
• A confidence level C, which gives the probability that the interval will capture the true
parameter value with repeated samples.
Example: Internet Connection
Assume three additional polls of adults were taken, with the following results:
(i) 671 out of 1100 adults had an internet connection in their homes
(ii) 704 out of 1100 adults had an internet connection in their homes
(iii) 638 out of 1100 adults had an internet connection in their homes
Calculate a 95% confidence interval for each poll.
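(i) $\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{671}{1100} \pm 2\sqrt{\frac{\frac{671}{1100}\left(1-\frac{671}{1100}\right)}{1100}} \approx 0.610 \pm 0.029 \approx [0.581, 0.639]$
(ii) $\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{704}{1100} \pm 2\sqrt{\frac{\frac{704}{1100}\left(1-\frac{704}{1100}\right)}{1100}} \approx 0.640 \pm 0.029 \approx [0.611, 0.669]$
(iii) $\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{638}{1100} \pm 2\sqrt{\frac{\frac{638}{1100}\left(1-\frac{638}{1100}\right)}{1100}} \approx 0.580 \pm 0.030 \approx [0.550, 0.610]$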
Which of these confidence intervals contain the true parameter 0.62?
The first and second.
If we sample forever, 95% of our confidence intervals will capture the true parameter.
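The three poll results can be checked with a few lines of Python. This is an illustrative sketch that works from the raw counts and assumes, as in the example, that the true proportion is 0.62; the names used are not from the notes.

```python
import math

def proportion_ci_95(successes, n):
    """Approximate 95% interval p-hat +/- 2*sqrt(p-hat(1 - p-hat)/n) from a count."""
    p_hat = successes / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

TRUE_P = 0.62  # the value assumed in the example
polls = {"(i)": 671, "(ii)": 704, "(iii)": 638}

captured = {}
for label, yes_count in polls.items():
    low, high = proportion_ci_95(yes_count, 1100)
    captured[label] = low <= TRUE_P <= high
    print(f"{label} [{low:.3f}, {high:.3f}] captures 0.62: {captured[label]}")
```

In practice the true p is unknown, so this check is only possible in a constructed example like this one.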
Interpretation of 95% Confidence using Confidence Intervals
• 95% of the time the confidence interval will capture p.
• 5% of the time the confidence interval will not capture p.
• Can’t tell if we “hit” or “miss.” This is just a fact of life!
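The "95% of the time" claim refers to the long-run behaviour of the procedure, which a simulation can illustrate. The sketch below is a rough illustration under assumptions not in the notes: a true proportion of 0.62, polls of size 1100, 2000 repetitions, and a fixed seed. It counts how often the interval p̂ ± 2√(p̂(1 − p̂)/n) captures p.

```python
import math
import random

def coverage_rate(p, n, repetitions, seed=1):
    """Share of simulated polls whose 95% interval captures the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # One simulated poll: each respondent says "yes" with probability p
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / repetitions

rate = coverage_rate(p=0.62, n=1100, repetitions=2000)
print(f"share of intervals capturing p: {rate:.3f}")
```

The rate should land near 0.95. For any single poll, though, there is no way to tell from the data alone whether its interval is a hit or a miss.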
Changing the Confidence Level
What if we want a confidence level other than 95%?
Confidence Level C    Critical Value z∗
50%                   0.67
60%                   0.84
70%                   1.04
80%                   1.28
90%                   1.64
95%                   1.96
99%                   2.58
99.9%                 3.29
Observations
• The sample proportion p̂ takes a value within z ∗ standard deviations of p, with probability
C.
• The interval extending z ∗ standard deviations either side of p̂ captures p, with probability
C.
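Critical values like those tabulated above can be computed, up to rounding, from the standard Normal distribution: z∗ is the point with area (1 + C)/2 to its left. The following sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `critical_value` is an illustrative choice.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """z* such that the standard Normal has probability C between -z* and z*."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    return NormalDist().inv_cdf((1 + confidence_level) / 2)

for c in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99, 0.999):
    print(f"{c:.1%}  z* = {critical_value(c):.2f}")
```

The multiplier 2 used for 95% intervals earlier in the chapter is a rounded version of z∗ = 1.96.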
Level C Confidence Interval for a Proportion
An approximate level C confidence interval for p is
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \left[\hat{p} - z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right],$$
where z ∗ is the critical value for confidence level C.
Example: Internet Connection
Using the sample proportion p̂ = 0.60, calculate a 99% confidence interval for the proportion of
adults who have an internet connection at home.
$$\hat{p} \pm 2.58\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.60 \pm 2.58\sqrt{\frac{0.60(1-0.60)}{1100}} \approx 0.60 \pm 0.038 \approx [0.562, 0.638]$$
Using the sample proportion p̂ = 0.60, calculate a 99.9% confidence interval for the proportion
of adults who have an internet connection at home.
$$\hat{p} \pm 3.29\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.60 \pm 3.29\sqrt{\frac{0.60(1-0.60)}{1100}} \approx 0.60 \pm 0.049 \approx [0.551, 0.649]$$
What happens to the confidence interval width as the confidence level increases?
The confidence interval gets wider.
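The widening can be seen by computing the interval at several confidence levels. This is a minimal sketch, assuming z∗ is taken from the standard Normal distribution via `statistics.NormalDist` rather than from a rounded table; the function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence_level):
    """Approximate level-C interval: p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    z_star = NormalDist().inv_cdf((1 + confidence_level) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

widths = []
for c in (0.95, 0.99, 0.999):
    low, high = proportion_ci(0.60, 1100, c)
    widths.append(high - low)
    print(f"{c:.1%}: [{low:.3f}, {high:.3f}], width {high - low:.3f}")
```

Because this sketch uses z∗ ≈ 1.96 rather than the rounded value 2, its 95% interval is very slightly narrower than the one obtained with the multiplier 2.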
Review of Confidence Intervals; Means
Goal : draw conclusions about a population mean µ.
How do we estimate µ?
Choose a sample from the population and use the sample mean x̄ as an estimate of µ.
Why do different samples give different values of x̄?
Different samples are made up of different people/objects by random chance. So we get different
values of x̄.
Why don’t we report just the value of x̄ that we calculate in order to draw conclusions about µ?
Why do we give the margin of error or the confidence interval?
x̄ is most likely not the same as µ, because the sample isn’t the entire population. We also give
the margin of error or confidence interval to indicate the uncertainty in our estimate of µ.
Sampling Distribution of the Sample Mean
Facts About Sample Means (Central Limit Theorem)
Choose an SRS of size n from a population with mean µ and standard deviation σ. For large
enough n:
• The sampling distribution of x̄ is approximately Normal.
• The mean of the sampling distribution is µ.
• The standard deviation of the sampling distribution is $\sigma/\sqrt{n}$.
What proportion of possible x̄ values will fall within ± 2 standard deviations of the mean µ?
[Figure: sampling distribution of x̄, with the horizontal axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ.]
95% of x̄ values will be within ± 2 standard deviations of µ. This is a statement about x̄, not
µ.
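The facts about sample means can be illustrated by simulation. In the sketch below the population is a hypothetical one chosen for illustration, not taken from the notes: an exponential distribution with mean 1 and standard deviation 1, which is clearly not Normal. Samples of size 100 are drawn 5000 times with a fixed seed, and the spread of the resulting x̄ values is compared with σ/√n.

```python
import math
import random
import statistics

def simulate_sample_means(n, repetitions, seed=2):
    """Sample means of n draws from an exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]

mu, sigma, n = 1.0, 1.0, 100
means = simulate_sample_means(n, repetitions=5000)

predicted_sd = sigma / math.sqrt(n)   # sigma / sqrt(n)
observed_sd = statistics.stdev(means)
# Share of sample means within 2 standard deviations (of x-bar) of mu
share_within_2sd = sum(abs(m - mu) <= 2 * predicted_sd for m in means) / len(means)

print(f"predicted sd of x-bar: {predicted_sd:.3f}")
print(f"observed sd of x-bar:  {observed_sd:.3f}")
print(f"share within 2 sd of mu: {share_within_2sd:.3f}")
```

Even though individual observations are strongly skewed, the share of sample means within two standard deviations of µ should be close to 95%.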
Turn it around: µ will be within ± 2 standard deviations of 95% of the x̄ values. This is a
statement about µ, not x̄.
[Figure: the same horizontal axis, marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ.]
So the interval $\left[\bar{x} - 2\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2\frac{\sigma}{\sqrt{n}}\right]$ is an approximate 95% confidence interval for µ. Approximately 95% of these intervals will capture µ.
Confidence Intervals: Sample Means
An approximate level C confidence interval for µ is
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = \left[\bar{x} - z^* \frac{\sigma}{\sqrt{n}},\ \bar{x} + z^* \frac{\sigma}{\sqrt{n}}\right],$$
where z ∗ is the critical value for confidence level C.
Why can’t we use this formula to compute a confidence interval from our data?
We don’t know the value of σ!
Approximate σ by s.
Level C Confidence Interval for a Sample Mean
An approximate level C confidence interval for µ is
$$\bar{x} \pm z^* \frac{s}{\sqrt{n}} = \left[\bar{x} - z^* \frac{s}{\sqrt{n}},\ \bar{x} + z^* \frac{s}{\sqrt{n}}\right],$$
where z ∗ is the critical value for confidence level C.
Example: Blood Pressure
The medical director of a large company looks at the medical records of 72 executives between
the ages of 35 and 44 years. He finds that the mean systolic blood pressure in this sample is
x̄ = 126.1 and the standard deviation is s = 15.2.
Find a 95% confidence interval for µ, the unknown mean systolic blood pressure of all executives
in the company.
$$\bar{x} \pm z^* \frac{s}{\sqrt{n}} = 126.1 \pm 2 \cdot \frac{15.2}{\sqrt{72}} \approx 126.1 \pm 3.6 \approx [122.5, 129.7]$$
Write a confidence statement.
We are 95% confident that the mean systolic blood pressure of all executives in the company
is within ±3.6 of 126.1.
or
We are 95% confident that the mean systolic blood pressure of all executives in the company
is between 122.5 and 129.7.
Interpret the meaning of 95% confidence if we repeat the study many times.
95% of the time x̄ will be within ± 3.6 of µ.
or
95% of the confidence intervals will contain µ.
Find a 99% confidence interval for µ, the unknown mean systolic blood pressure of all executives
in the company.
$$\bar{x} \pm z^* \frac{s}{\sqrt{n}} = 126.1 \pm 2.58 \cdot \frac{15.2}{\sqrt{72}} \approx 126.1 \pm 4.6 \approx [121.5, 130.7]$$
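The blood pressure intervals can be reproduced with a small helper. This is a sketch with an illustrative function name, `mean_ci`, not part of the original notes. It also shows how the choice of critical value for 95% matters at one decimal place: the rounded multiplier 2 gives a margin of about 3.6, while z∗ = 1.96 gives about 3.5.

```python
import math

def mean_ci(x_bar, s, n, z_star):
    """Approximate interval x-bar +/- z* * s/sqrt(n) for a population mean."""
    margin = z_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

x_bar, s, n = 126.1, 15.2, 72
cases = (("95% (z* = 2)", 2), ("95% (z* = 1.96)", 1.96), ("99% (z* = 2.58)", 2.58))
for label, z_star in cases:
    low, high, margin = mean_ci(x_bar, s, n, z_star)
    print(f"{label}: {x_bar} +/- {margin:.1f} = [{low:.1f}, {high:.1f}]")
```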
We can interpret 95% confidence in two ways:
(i) 95% of the time x̄ will be no more than the margin of error away from µ.
(ii) 95% of the time the confidence interval will capture µ.
Explain why these statements are equivalent.
If 95% of x̄ values are no more than the margin of error away from µ, then µ is no more than
the margin of error away from 95% of the values of x̄. Therefore, the confidence interval x̄ ±
margin of error will contain µ for 95% of the values of x̄.
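The equivalence can also be written as a chain of inequalities. Here m stands for the margin of error; the symbol m is introduced only for this illustration.

```latex
\[
|\bar{x} - \mu| \le m
\iff -m \le \mu - \bar{x} \le m
\iff \bar{x} - m \le \mu \le \bar{x} + m .
\]
```

The left-hand statement says that x̄ lies within the margin of error of µ; the right-hand statement says that the interval x̄ ± m contains µ. Each holds exactly when the other does, so both occur for the same 95% of samples.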
What will happen to the confidence interval if we keep the sample size the same and increase
the confidence level?
The confidence interval width will increase. The interval can’t “miss” µ as often, so it has to
get wider.
What will happen to the confidence interval if we keep the confidence level the same and increase
the sample size?
The confidence interval will shrink, since the margin of error decreases.
Example: ACT Scores
A college admissions counselor looks at the ACT scores of 1000 high school students. He finds
that the mean ACT score in this sample is x̄ = 18.2 and the standard deviation is s = 5.8.
Find a 95% confidence interval for µ, the unknown mean ACT score for high school students.
$$\bar{x} \pm z^* \frac{s}{\sqrt{n}} = 18.2 \pm 2 \cdot \frac{5.8}{\sqrt{1000}} \approx 18.2 \pm 0.37 \approx [17.83, 18.57]$$
Write a confidence statement for the true mean ACT score.
We are 95% confident that the mean ACT score of all high school students is within ±0.37 of
18.2.
or
We are 95% confident that the mean ACT score of all high school students is between 17.83
and 18.57.
The true mean ACT score is µ = 18. Did your confidence interval capture µ?
Yes!