Download + Confidence Intervals

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Degrees of freedom (statistics) wikipedia , lookup

Foundations of statistics wikipedia , lookup

Sufficient statistic wikipedia , lookup

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Statistical inference wikipedia , lookup

German tank problem wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
+
Chapter 8
Estimating with Confidence
 8.1
Confidence Intervals: The Basics
 8.2
Estimating a Population Proportion
 8.3
Estimating a Population Mean
+ Section 8.1
Confidence Intervals: The Basics
Learning Objectives
After this section, you should be able to…

INTERPRET a confidence level

INTERPRET a confidence interval in context

DESCRIBE how a confidence interval gives a range of plausible
values for the parameter

DESCRIBE the inference conditions necessary to construct
confidence intervals

EXPLAIN practical issues that can affect the interpretation of a
confidence interval
In Chapter 7, we learned that different samples yield different
results for our estimate. Statistical inference uses the
language of probability to express the strength of our
conclusions by taking chance variation due to random
selection or random assignment into account.
In this chapter, we’ll learn one method of statistical inference –
confidence intervals – so we may estimate the value of a
parameter from a sample statistic. As we do so, we’ll learn not
only how to construct a confidence interval, but also how to
report probabilities that would describe what would happen if
we used the inference method many times.
Confidence Intervals: The Basics
Our goal in many statistical settings is to use a sample statistic
to estimate a population parameter. In Chapter 4, we learned
if we randomly select the sample, we should be able to
generalize our results to the population of interest.
+
 Introduction
This is the goal of Statistical Inference
We cannot be certain our conclusion is correct.
Statistical Inference uses probability to express the strength
of our conclusions.
This is the largest part of the AP
Exam (30 – 40 percent).
Where Are We Headed?
Two common types of Statistical inference
Chapter 8 – Confidence Intervals for
estimating the value of a population
parameter.
Chapter 9 – Significance Tests –
Assess the evidence for a claim about a
population.
Both report probabilities that state what
would happen if we used the inference
method many times.
Intervals: The Basics
+
 Confidence
If you had to give one number to estimate an unknown population
parameter, what would it be?
If you were estimating a population mean  , you would probably use x.
If you were estimating a population proportion p, you might use pˆ .
In both cases, you would be
providing a point estimate of the parameter of interest.
+  Definition:
Point
estimator : a statistic that
provides an estimate of a population
parameter.

Point estimate: The value of that
statistic from a sample.
a point estimate is our “best
guess” at the value of an unknown
parameter.
Ideally,
+ We learned in Chapter 7 that an ideal
point estimator will have no bias and
low variability.
Since
variability is almost always
present when calculating statistics from
different samples, we must extend our
thinking about estimating parameters to
include an acknowledgement that
repeated sampling could yield different
results.
+
 A point


estimate is a single number,
How much uncertainty is associated with a point estimate of a
population parameter?
An interval estimate provides more information about a
population characteristic than does a point estimate. It provides
a confidence level for the estimate. Such interval estimates are
called confidence intervals
Lower
Confidence
Limit
Upper
Point Estimate
Confidence
Limit
Width of
confidence interval
+
Mystery Mean Activity
Mean(randNorm(M,20,16))
Keystrokes: 2nd Stat Math ->
Mean ( Math ->Prob ->
randNorm)
(
(Put:
mu = M , sigma=20, n=16)
Idea of a Confidence Interval
To answer this question, we must ask another:
How would the sample mean x vary if we took many SRSs
of size 16 from the population?
Shape: Since the population is Normal, so is the sampling distribution of x.
Center : The mean of the sampling distribution of
 of the population distribution , .
x is the same as the mean
Spread: The standard deviation of x for an SRS of 16 observations is

20
x 

5
n
16
Confidence Intervals: The Basics
Recall the “Mystery Mean” Activity. Is the value of the population
mean µ exactly 240.79? Probably not. However, since the sample
mean is 240.79, we could guess that µ is “somewhere” around
240.79. How close to 240.79 is µ likely to be?
+
 The

 In repeated samples, the values of x
follow a Normal distribution with mean
and standard deviation 5.

 The 68 - 95 - 99.7 Rule tells us that in 95%
of all samples of size 16, x will be within 10
(two standard deviations) of .
 If x is within 10 points of , then  is
within 10 points of x .

Therefore, the interval from x 10 to x  10 will " capture"  in about
95% of all samples of size 16.

If we estimate that µ lies somewhere in the interval 230.79 to
250.79, we’d be calculating an interval using a method that
captures the true µ in about 95% of all possible samples of this
size.
+
 The Idea of a Confidence Interval
To estimate the Mystery Mean , we can use x  240.79 as a point
estimate. We donÕt expect  to be exactly equal to x so we need to
say how accurate we think our estimate is.
Idea of a Confidence Interval
+
 The
The big idea : The sampling distribution of x tells us how close to  the
sample mean x is likely to be. All confidence intervals we construct will
have a form similar to this :
estimate ± margin of error
Definition:
A confidence interval for a parameter has two parts:
• An interval calculated from the data, which has the form:
estimate ± margin of error
• The margin of error tells how close the estimate tends to be to the
unknown parameter in repeated random sampling.
• A confidence level C, the overall success rate of the method for
calculating the confidence interval.
--That is, in C% of all possible samples, the method would yield an
interval that captures the true parameter value.
We usually choose a confidence level of 90% or higher because we want to be
quite sure of our conclusions. The most common confidence level is 95%.
Interpreting Confidence Levels and Confidence Intervals: The
confidence level is the overall capture rate if the method is used many times.
+

Starting with the population, imagine taking many SRSs of 16 observations. The sample mean will
vary from sample to sample, but when we use the method estimate ± margin of error to get an
interval based on each sample, 95% of these intervals capture the unknown population mean µ.
Interpreting Confidence Level and Confidence Intervals
Confidence level: To say that we are 95% confident
is shorthand for “95% of all possible samples of a
given size from this population will result in an
interval that captures the unknown parameter.”
Confidence interval: To interpret a C% confidence
interval for an unknown parameter, say, “We are
C% confident that the interval from _____ to _____
captures the actual value of the [population
parameter in context].”
Interpreting Confidence Levels and Confidence Intervals
The confidence level tells us how likely it is that the
method we are using will produce an interval that
captures the population parameter if we use it many
times.
The confidence level does not tell us the
chance that a particular confidence
interval captures the population
parameter.
Instead, the confidence interval gives us a set of plausible
values for the parameter.
We interpret confidence levels and confidence intervals in much the
same way whether we are estimating a population mean, proportion,
or some other parameter.
+

a Confidence Interval
When we calculated a 95% confidence interval for the mystery
mean µ, we started with
estimate ± margin of error
Our estimate came from the sample statistic x .
Since the sampling distribution of x is Normal,
about 95% of the values of x will lie within 2
standard deviations ( 2 x ) of the mystery mean .
That is, our interval could be written as :
240.79  2  5 = x  2 x
This leads to a more general formula for confidence intervals:
statistic ±
(critical value) • (standard deviation of statistic)
Confidence Intervals: The Basics
Why settle for 95% confidence when estimating a parameter?
The price we pay for greater confidence is a wider interval.
+
 Constructing
a Confidence Interval
The confidence interval for estimating a population parameter has the form
statistic ± (critical value) • (standard deviation of statistic)
where the statistic we use is the point estimator for the parameter.
Properties of Confidence Intervals:
 Margin of error = (critical value) • (standard deviation of statistic)
 The user chooses the confidence level, and the margin of error follows
from this choice.
 The critical value depends on the confidence level and the sampling
distribution of the statistic.
 Greater confidence requires a larger critical value
Confidence Intervals: The Basics
Calculating a Confidence Interval
+
 Calculating
 The standard deviation of the statistic depends on the sample size n
The margin of error gets smaller when:
 The confidence level decreases
 The sample size n increases
Confidence Intervals
Before calculating a confidence interval for µ or p there are
3 IMPORTANT conditions that you should check.
1) Random: The data should come from a well-designed random
sample or randomized experiment.
2) Normal: The sampling distribution of the statistic is approximately
Normal.
For means: The sampling distribution is exactly Normal if the population
distribution is Normal. When the population distribution is not Normal,
then the central limit theorem tells us the sampling distribution will be
approximately Normal if n is sufficiently large (n ≥ 30).
For proportions: We can use the Normal approximation to the sampling
distribution as long as np ≥ 10 and n(1 – p) ≥ 10.
3) Independent: Individual observations are independent. When
sampling without replacement, the sample size n should be no more
than 10% of the population size N (the 10% condition) to use our
formula for the standard deviation of the statistic.
+
 Using
+ Section 8.1
Confidence Intervals: The Basics
Summary
In this section, we learned that…

To estimate an unknown population parameter, start with a statistic that
provides a reasonable guess. The chosen statistic is a point estimator for
the parameter. The specific value of the point estimator that we use gives a
point estimate for the parameter.

A confidence interval uses sample data to estimate an unknown population
parameter with an indication of how precise the estimate is and of how
confident we are that the result is correct.

Any confidence interval has two parts: an interval computed from the data
and a confidence level C. The interval has the form
estimate ± margin of error

When calculating a confidence interval, it is common to use the form
statistic ± (critical value) · (standard deviation of statistic)
+ Section 8.1
Confidence Intervals: The Basics
Summary
In this section, we learned that…

The confidence level C is the success rate of the method that produces the
interval. If you use 95% confidence intervals often, in the long run 95% of
your intervals will contain the true parameter value. You don’t know whether
a 95% confidence interval calculated from a particular set of data actually
captures the true parameter value.

Other things being equal, the margin of error of a confidence interval gets
smaller as the confidence level C decreases and/or the sample size n
increases.

Before you calculate a confidence interval for a population mean or
proportion, be sure to check conditions: Random sampling or random
assignment, Normal sampling distribution, and Independent observations.

The margin of error for a confidence interval includes only chance variation,
not other sources of error like nonresponse and undercoverage.
+
Looking Ahead…
In the next Section…
We’ll learn how to estimate a population proportion.
We’ll learn about
 Conditions for estimating p
 Constructing a confidence interval for p
 The four-step process for estimating p
 Choosing the sample size for estimating p
+
AP EXAM COMMON ERRORRRRRRR!!!!!!!!!
Do
not get confused between Con Level and
Con Interval.
Whenever you construct a CI, you are
expected to INTERPRET the interval you
calculate.
You are expected to interpret the C Level
only when you are asked to do so.
+ BEWARE of the Incorrect phrase for C Interval
!!!!!!!!!!!!!!!!!!!!!!!
Incorrect “ There is a 95% probability that the true
mean falls in the interval from 230.75 to 250.78”
THIS IS INCORRECT!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Correct “ We are 95% confident that the interval
from 230.75 to 250.78 captures the actual value of the
[population parameter in context].
+
P- 481 # 1-4
Put the #s in List L1. Do 1-Var Stat : Find the mean x-bar.
To get the variance, square the sd : 14.23
36/50 = 0.72
19/172= 0.11
+ 6.
+ 8.6
+ 10.
The figure shows that all of the
25 Confidence intervals did
contain the mean. This suggests
that the confidence level was
quite high- probably 99% - but
possible 95%.
+ 12. (a) If we were to repeat the sampling procedure
many times, on average, the sample proportion
would be within 3 % points of the true proportion
in 95% of samples.
(b) The 95% confidence interval is ( 0.59 - .03 ,
0.59+ .03), which is 0.56 to 0.62.
We are 95% confident that the interval from 0.56 to
0.62 captures the true proportion of those who
would like to lose weight.
c) If we were to repeat the sampling procedure
many time, about 95% of the confidence intervals
computed would contain the true proportion of
those who would like to lose weight.
+ 16.
We are 95% confident that the interval from
0.120 to 0.297 captures the true difference in the
proportions of younger teens and older teens and
older teens who include false information on their
profiles( younger-older).
That is, we are 95% confident that between 12%
and 29.7%, more younger teens publish false
information on their profiles than older teens.
When we say 95% , we mean that if this sampling
method were employed many times, approx. 95%
of the resulting confidence intervals would
capture the true difference between the
proportions of younger teens and older teens who
include false information on their profiles.
+ 20.
The data were not randomly collected; they come
from a voluntary response sample. Those who
respond to such online polls tend to be those with
stronger opinions about the issue at hand and so
are not representative of the population of
interest. Since the data are not random, the
confidence interval cannot be generalized to any
larger group , rendering it basically useless.
+ 21 b
22 e
23 c
24 b