Confidence Intervals
According to the Empirical Rule, we can calculate intervals of values that contain specified percentages (proportions) of
the observations (distribution). Sample proportions are approximately normal (assuming we have a random sample) if $n\pi$ and
$n(1-\pi) \ge 10$ (or at least 5). The mean of the sample proportions, $\mu_p$, is $\pi$ and the standard deviation, $\sigma_p$, is
$\sqrt{\pi(1-\pi)/n}$, so a 95% interval would be about $\pi \pm 2\sqrt{\pi(1-\pi)/n}$, that is, the proportion $\pm$ the margin of
error. If we want exactly 95%, we should use $1.96 = z_{0.05/2}$ instead of 2. Remember $P(Z > z_{0.05/2}) = 0.05/2 = 0.025$, so
$P(Z < z_{0.05/2}) = 0.975$; that is, $z_{0.05/2}$ is the 97.5th percentile of the Standard Normal distribution. Also, the middle
95% of the distribution falls between the 2.5th and 97.5th percentiles.
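As a quick check of the 1.96 figure, the percentile can be computed directly. This is a minimal sketch using Python's standard-library statistics.NormalDist; the variable names (alpha, z_crit, middle_area) are illustrative and do not come from the notes.

```python
from statistics import NormalDist

# Standard Normal distribution (mean 0, standard deviation 1).
Z = NormalDist()

# Total area outside the interval (the 0.05 in z_{0.05/2}).
alpha = 0.05

# z_{0.05/2} is the value with area 0.975 to its left (the 97.5th percentile).
z_crit = Z.inv_cdf(1 - alpha / 2)

# Area between -z_crit and +z_crit, i.e. the middle of the distribution.
middle_area = Z.cdf(z_crit) - Z.cdf(-z_crit)

print(f"z critical value: {z_crit:.4f}")
print(f"middle area: {middle_area:.4f}")
```

Changing alpha to 0.10 or 0.01 gives the critical values for 90% and 99% intervals in the same way.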
The problem comes when we don’t know what the true proportion, $\pi$, is, so we have to estimate it with the sample
proportion, p. Our 95% interval would be:
$$p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$$
(this is the traditional formula)
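The traditional formula can be sketched in Python as follows. The function name traditional_interval and the data (30 successes in a sample of 100) are made up for illustration; they are not from the notes.

```python
import math

def traditional_interval(x, n, z=1.96):
    """Return (lower, upper) for the interval p ± z * sqrt(p(1-p)/n)."""
    p = x / n                          # sample proportion
    se = math.sqrt(p * (1 - p) / n)    # standard error of p
    margin = z * se                    # margin of error
    return p - margin, p + margin

# Hypothetical data: 30 successes in a random sample of 100.
lower, upper = traditional_interval(30, 100)
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

The default z=1.96 corresponds to 95% confidence; another critical value can be passed for a different level.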
Notice that we had to substitute p into the standard deviation. This is usually called the estimated standard deviation of p
or the standard error of p. The problem with this formula is that it can be quite inaccurate even for large samples (see IPS
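p. 573). A slight adjustment, moving p slightly away from 0 or 1, will do better. The one we will use is the Wilson
estimate of the population proportion,
$$\tilde{p} = \frac{X+2}{n+4},$$
where X is the number of successes in the sample. The confidence interval is:
$$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$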
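A minimal sketch of this interval in Python, assuming the count of successes x and the sample size n are known. The function name wilson_estimate_interval, the confidence parameter, and the example data are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def wilson_estimate_interval(x, n, confidence=0.95):
    """Interval centered on the Wilson estimate (x + 2) / (n + 4)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)              # z_{alpha/2}
    p_tilde = (x + 2) / (n + 4)                          # Wilson estimate
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))    # standard error
    return p_tilde - z * se, p_tilde + z * se

# Hypothetical data: 30 successes in a random sample of 100.
lower, upper = wilson_estimate_interval(30, 100)
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

Note that the interval is centered on (x + 2) / (n + 4), not on x / n, so it is pulled slightly toward 0.5.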
Intervals like this are called confidence intervals because we are confident that the true proportion will fall in this
interval. Suppose we calculate a 95% confidence interval and then make the statement “I am 95% confident that the
population proportion is between ...”. This statement of confidence is correctly interpreted by going back to the idea of a
sampling distribution. What this statement means is
***** If we could take all possible samples of size n, calculate the confidence interval in the formula above for each and
every sample, then the proportion of confidence intervals containing the true value of the population proportion would be
approximately 95%. Of course, we can’t possibly take ALL samples of size n, but we can be confident that our sampling method
will produce a sample proportion and a $(1-\alpha)100\%$ confidence interval that will contain the true proportion
approximately $(1-\alpha)100\%$ of the time. Any one confidence interval, however, either contains the true proportion or not.
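The "all possible samples" idea can be illustrated numerically. The interval depends on a sample only through its number of successes, so, assuming independent draws (a binomial model, which is an assumption of this sketch), every possible count from 0 to n can be weighted by its binomial probability. The function names covers and coverage and the setting n = 50, true proportion 0.3 are hypothetical.

```python
import math

def covers(x, n, true_pi, z=1.96):
    """True if the Wilson-estimate interval built from x successes contains true_pi."""
    p_tilde = (x + 2) / (n + 4)
    margin = z * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - margin <= true_pi <= p_tilde + margin

def coverage(n, true_pi, z=1.96):
    """Probability that a random sample of size n yields an interval containing true_pi."""
    total = 0.0
    for x in range(n + 1):
        # Binomial probability of observing exactly x successes.
        prob_x = math.comb(n, x) * true_pi**x * (1 - true_pi)**(n - x)
        if covers(x, n, true_pi, z):
            total += prob_x
    return total

# Hypothetical setting: samples of size 50 from a population with true proportion 0.3.
cov = coverage(50, 0.3)
print(f"proportion of intervals containing the true value: {cov:.3f}")
```

The result is close to, but not exactly, 0.95, and each individual count x either produces an interval that contains the true proportion or one that does not.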
A note on $\alpha$: $\alpha$ is the proportion of the distribution under the Z curve that falls outside our interval, which is why
we call our confidence intervals $(1-\alpha)100\%$ intervals. $(1-\alpha)100\%$ is called the confidence level.
Properties of Confidence Intervals:
1. The sample proportion is our ‘best guess’ for $\pi$, so it is the center of the interval.
2. The larger the level of confidence, $(1-\alpha)100\%$, the wider the confidence interval. Conversely, the larger $\alpha$ (the area
‘outside’ the interval), the narrower the width of the confidence interval. $z_{\alpha/2}$, found in the last row of the t tables,
gives us the proper width for each confidence level.
3. The larger the sample size, n, the narrower the width of the confidence interval (more data means a more precise
estimate).
4. The closer p is to 0.5, the wider the confidence interval. (the closer the proportion of success vs. failure the harder it
is to estimate)
5. As long as np and $n(1-p) \ge 10$ (remember we don’t know $\pi$, so we use p; this means the sampling distribution of p is
approximately normal), the sample size has no effect on the level of confidence; i.e., the % of confidence intervals
containing the true population proportion will be about $(1-\alpha)100\%$ no matter what n is. BUT, if the sampling distribution
is not approximately normal, our “$(1-\alpha)100\%$ confident” statement may be compromised.
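Properties 2 through 4 can be checked numerically. The sketch below uses the traditional width formula for simplicity; the function name interval_width and the specific values of p, n, and the confidence level are illustrative assumptions.

```python
import math
from statistics import NormalDist

def interval_width(p, n, confidence=0.95):
    """Full width of the interval p ± z_{alpha/2} * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * math.sqrt(p * (1 - p) / n)

base = interval_width(0.3, 100, 0.95)

wider_confidence = interval_width(0.3, 100, 0.99)  # property 2: higher confidence level
larger_sample = interval_width(0.3, 400, 0.95)     # property 3: larger n
nearer_half = interval_width(0.5, 100, 0.95)       # property 4: p closer to 0.5

print(f"base width:        {base:.4f}")
print(f"99% confidence:    {wider_confidence:.4f}")
print(f"n = 400:           {larger_sample:.4f}")
print(f"p = 0.5:           {nearer_half:.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width.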
Proportions: if we want to estimate the true center of a distribution of proportions (from categorical data), $\pi$, we use the
statistic p.
1. Sample proportions, p’s, are approximately normal (assuming we have a random sample) if $n\pi$ and $n(1-\pi) \ge 10$.
Since we don’t know $\pi$, we use np and $n(1-p)$ instead.
2. The mean of the sample proportions, $\mu_p$, is $\pi$ as long as we have a random sample.
3. The standard deviation, $\sigma_p$, is $\sqrt{\pi(1-\pi)/n}$. Again, we don’t know $\pi$. We could use the sample proportion, p,
but this can give you intervals which contain values outside 0 and 1. An adjustment is the Wilson estimate of the population
proportion, $\tilde{p} = \frac{X+2}{n+4}$, and the standard error of $\tilde{p}$ is
$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$.
4. The z-score used is like that for means.
A $(1-\alpha)100\%$ confidence interval for the population proportion, $\pi$, is given by:
$$\tilde{p} \pm z^{*}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}},$$
where $z^{*} = z_{\alpha/2}$.
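As a worked illustration of the formula, suppose (hypothetically; these numbers are not from the notes) a random sample of n = 96 contains X = 40 successes. The normality conditions hold, since np = 40 and n(1 - p) = 56 are both at least 10. For a 95% interval, $z^{*} = 1.96$:

$$\tilde{p} = \frac{40+2}{96+4} = \frac{42}{100} = 0.42$$
$$SE_{\tilde{p}} = \sqrt{\frac{0.42 \times 0.58}{100}} = \sqrt{0.002436} \approx 0.0494$$
$$0.42 \pm 1.96 \times 0.0494 \approx 0.42 \pm 0.097$$

so the 95% confidence interval is roughly (0.323, 0.517).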
NOTE: the value of our sample proportion, p, affects both the center of the interval AND the width!
Making Decisions with Confidence Intervals:
1. If a value is NOT covered by a confidence interval (it’s not included in the range), then it’s NOT a plausible value for
the parameter in question and should be rejected as such.
2. When the confidence intervals from two different populations do NOT overlap (they don’t have any values in
common), then it’s NOT plausible that they have the same value for the parameter in question.
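Both decision rules reduce to simple comparisons of interval endpoints. The helper names is_plausible and intervals_overlap and the two example intervals below are hypothetical, chosen only to illustrate the rules.

```python
def is_plausible(value, interval):
    """Rule 1: a value is plausible only if the interval covers it."""
    lower, upper = interval
    return lower <= value <= upper

def intervals_overlap(a, b):
    """Rule 2: two intervals overlap if they share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical 95% confidence intervals for two different populations.
group_a = (0.22, 0.40)
group_b = (0.45, 0.61)

print(is_plausible(0.30, group_a))          # 0.30 lies inside group_a's interval
print(is_plausible(0.50, group_a))          # 0.50 lies outside it
print(intervals_overlap(group_a, group_b))  # the two intervals share no values
```

Here 0.50 would be rejected as a value for the first population, and since the two intervals do not overlap, it is not plausible that the two populations share the same proportion.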