Confidence Intervals
Dennis Sun
Data 301
Statistical Inference
[Diagram: probability leads from the Population / Box to the Sample / Data; statistics leads from the Sample / Data back to the Population / Box.]
The goal of statistics is to infer the unknown population from the
sample.
We’ve already seen one mode of statistical inference: hypothesis
testing.
Probability vs. Statistics
Probability: I have a fair coin. If I toss it 100 times, how many
heads will I get?
[Box model: a box containing 0 and 1; 100 draws with replacement; result: ???]
Statistics: I have a coin. I do not know if it is fair or not. I toss it 100
times and get 60 heads. Is the coin fair or not?
[Box model: a box with unknown contents (? ... ?); 100 draws with replacement; result: 0, 1, 1, ..., 0, of which 60 are 1s]
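As a rough illustration of the two questions, the sketch below draws from a fair 0/1 box and then asks how unusual 60 or more heads would be if the coin were fair. The helper names `simulate_heads` and `prob_at_least`, the seed, and the number of repetitions are illustrative choices, not part of the slides.

```python
import math
import random

N_TOSSES = 100

def simulate_heads(p, n=N_TOSSES, rng=random):
    """Draw n times with replacement from a 0/1 box whose proportion of 1s is p."""
    return sum(1 for _ in range(n) if rng.random() < p)

def prob_at_least(k, n, p):
    """Exact probability of k or more heads in n tosses (binomial upper tail)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability question: what does a fair coin typically produce?
rng = random.Random(301)  # arbitrary seed, only for reproducibility
counts = [simulate_heads(0.5, rng=rng) for _ in range(10_000)]

# Statistics question: how often would a fair coin give 60 or more heads?
simulated_tail = sum(c >= 60 for c in counts) / len(counts)
exact_tail = prob_at_least(60, N_TOSSES, 0.5)

print("simulated P(heads >= 60):", simulated_tail)
print("exact     P(heads >= 60):", round(exact_tail, 4))
```

The simulated and exact tail probabilities should agree up to simulation noise.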
Confidence Intervals
Another mode of statistical inference is interval estimation.
Idea: See which box models are compatible with the given data.
Example: Let’s vary the proportion p of 1s in the box, and see
which boxes are compatible with the observed data: 60 heads in
100 tosses.
[Figure: seven successive slides, each trying a different value of p. The verdicts shown, in order: NOT COMPATIBLE, NOT COMPATIBLE?, COMPATIBLE, COMPATIBLE, COMPATIBLE, NOT COMPATIBLE?, NOT COMPATIBLE.]
Confidence Interval Interpretation #1
For any p (the proportion of 1s in the box) between .497 and .697,
the P-value for the observed data (60 heads) is above 2.5%.
In other words, all values of p in the interval (.497, .697) are
“compatible” with observing 60 heads. We call this a 95%
confidence interval for p.
In this class, we will only deal with 95% confidence intervals. But
you can obtain intervals with other confidence levels by adjusting
the minimum P-value you are willing to tolerate.
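The compatibility check can be sketched in code. The version below is a minimal sketch that swaps the simulation for exact binomial tail probabilities, so it is not necessarily how the slides computed their interval. The function names, the grid step of 0.001, and the `min_p_value` parameter are assumptions made for illustration.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k 1s in n draws from a box with proportion p of 1s."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def tail_p_values(heads, n, p):
    """Return (P(X <= heads), P(X >= heads)) for X ~ Binomial(n, p)."""
    lower = sum(binom_pmf(k, n, p) for k in range(0, heads + 1))
    upper = sum(binom_pmf(k, n, p) for k in range(heads, n + 1))
    return lower, upper

def compatible_values(heads, n, min_p_value=0.025, step=0.001):
    """List every p on a grid whose smaller tail P-value is at least min_p_value."""
    grid = [i * step for i in range(1, int(round(1 / step)))]
    return [p for p in grid if min(tail_p_values(heads, n, p)) >= min_p_value]

values = compatible_values(60, 100)
print("compatible p from", round(values[0], 3), "to", round(values[-1], 3))
```

The endpoints should land close to the interval quoted on the slide. Raising `min_p_value` narrows the interval and lowering it widens the interval, which is the adjustment for other confidence levels mentioned above.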
Confidence Intervals by Theory
Confidence intervals are easier to understand by theory than by
simulation.
Remember, if the mean of the box was µ, we compared $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
to a Normal(0, 1) distribution. (Or $T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$ to $t_{n-1}$.)
If Z is too large or too small, then the P-value will be small. We
need to find the cutoffs that make the P-value exactly 2.5%.
Confidence Intervals by Theory
The cutoff is close to 2. (It’s actually a bit lower for Z and usually a
bit higher for T.)
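For the Normal(0, 1) case, the cutoff can be checked with Python's standard library. This is a small sketch; the variable name `z_cutoff` is illustrative. The t cutoff is not computed here because the standard library has no t distribution.

```python
from statistics import NormalDist

# The cutoff z leaves 2.5% of the Normal(0, 1) distribution above it,
# so it is the 97.5th percentile.
z_cutoff = NormalDist(mu=0.0, sigma=1.0).inv_cdf(1 - 0.025)
print(round(z_cutoff, 2))  # 1.96
```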
So if we observe $\bar{X}$, we need to find all values µ such that
$\left| \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \right| < 2$.
In other words, the interval contains all µ within $\bar{X} \pm 2 \frac{\sigma}{\sqrt{n}}$.
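The step from the inequality to the interval can be written out as follows, using the same symbols as above:

```latex
\[
\left| \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \right| < 2
\;\iff\;
-2 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 2
\;\iff\;
\bar{X} - 2\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 2\,\frac{\sigma}{\sqrt{n}}
\]
```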
Theory-Based Interval for p
We observed 60 heads in 100 tosses. Let’s find a 95% confidence
interval.
The mean of the sample is $\bar{X} = \frac{60}{100} = .60$.
We don’t know σ. Approximate it by the SD of the sample, S ≈ .49.
So a 95% confidence interval for p is:
$.60 \pm 2 \cdot \frac{.49}{\sqrt{100}} = (.502, .698)$.
Compare this with the simulation-based interval we obtained
earlier, (.497, .697).
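The arithmetic can be reproduced in a few lines. This sketch assumes the data are coded as 60 ones and 40 zeros and uses `statistics.stdev` (the n − 1 version) for S; the slides do not say which SD formula was used, and both give S ≈ .49.

```python
import math
import statistics

# Assumed coding of the data: 1 = heads, 0 = tails.
tosses = [1] * 60 + [0] * 40

n = len(tosses)
x_bar = statistics.mean(tosses)   # sample mean, .60
s = statistics.stdev(tosses)      # sample SD, about .49
margin = 2 * s / math.sqrt(n)     # cutoff of 2 times the standard error

interval = (x_bar - margin, x_bar + margin)
print(round(s, 2), interval)
```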
Confidence Interval Interpretation #2
We just saw that the interval $\bar{X} \pm 2 \frac{\sigma}{\sqrt{n}}$ contains all values µ that
are “compatible” with the observed data $\bar{X}$. (If we were to test that
the mean of the box is any of these values, the P-value would be at
least 2.5%.)
Another way to interpret this is to imagine the interval as random.
If we were to collect another sample of the same size, we would
obtain a new $\bar{X}$ and thus a new interval.
A 95% confidence interval means that about 95 of every 100
intervals will cover the true mean µ.
But of course, we typically only get to observe one interval.
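This second interpretation can be checked by simulation when the box is known. The sketch below assumes a box with true proportion .6, samples of size 100, and 2000 repetitions; all three values, the seed, and the function names are illustrative choices rather than anything from the slides.

```python
import math
import random
import statistics

def interval_for_sample(sample):
    """Interval: sample mean plus or minus 2 * S / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    margin = 2 * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def coverage(true_p, n=100, repetitions=2000, seed=301):
    """Fraction of simulated intervals that contain the true box mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(repetitions):
        # One new sample of n draws with replacement from the 0/1 box.
        sample = [1 if rng.random() < true_p else 0 for _ in range(n)]
        low, high = interval_for_sample(sample)
        if low < true_p < high:
            covered += 1
    return covered / repetitions

rate = coverage(0.6)
print("fraction of intervals covering the true mean:", rate)
```

The printed fraction should be near .95, though not exactly, because of simulation noise and because the cutoff of 2 is itself an approximation.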