Chapter 8 Confidence Intervals
L8_S1 Confidence intervals
It is rare that researchers gather information from an entire population. If we did, statistics would be
unnecessary. Error is involved whenever an experiment is run or people are sampled for a survey.
Confidence intervals give us an estimate of the amount of error involved in our data. They tell us about
the precision of the statistical estimates (e.g., means, standard deviations, correlations) we have
computed. Confidence intervals are related to the concept of power. The wider the confidence
interval, the less power a study has to detect differences between treatment conditions in experiments or
between groups of respondents in survey research.
A confidence interval gives an estimated range of values, which is likely to include an unknown
population parameter, and we calculate the estimated range from a given set of sample data.
The common notation for the parameter in question is theta (θ). Often, this parameter is the population
mean mu (μ), which is estimated through the sample mean X̄ (X-bar).
L8_S2 95% confidence
If independent samples are taken repeatedly from the same population, and a confidence interval is
calculated for each sample, then a certain percentage of the intervals will include the unknown
population parameter. Confidence intervals are usually calculated so that this percentage is 95%, but we
can also produce 90%, 99%, or 99.9% confidence intervals for the unknown parameter.
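The repeated-sampling idea can be checked with a small simulation. The following is a minimal sketch, not part of the original lecture: the population (normal, mean 50, standard deviation 10), the sample size and the function name `coverage` are all assumptions chosen for illustration, and the population standard deviation is treated as known so that a z-based interval can be used.

```python
import math
import random
from statistics import NormalDist, mean


def coverage(trials=2000, n=30, mu=50.0, sigma=10.0, level=0.95, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Assumption: sigma is known, so every interval has the same half-width.
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        centre = mean(sample)
        if centre - half_width <= mu <= centre + half_width:
            hits += 1
    return hits / trials


# The printed fraction should be close to the chosen level (0.95 here).
print(coverage())
```

Changing `level` to 0.90 or 0.99 should move the fraction of intervals that capture the true mean accordingly.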
L8_S3 Factors affecting CI
A confidence interval is based on three elements: a value of a statistic (the mean, the correlation, etc.);
the standard error of the measure; and the desired confidence level (for example, 95% or 99%). A higher
confidence level gives a wider interval.
The width of the confidence interval gives us some idea about how uncertain we are about the unknown
parameter. A very wide interval may indicate that more data should be collected before anything very
definite can be said about the parameter.
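To see why collecting more data narrows the interval, here is a rough sketch that computes the half-width of a z-based interval for several sample sizes. The standard deviation of 10 and the sample sizes are made-up values, and `half_width` is a hypothetical helper name, not something from the lecture.

```python
import math
from statistics import NormalDist


def half_width(sigma, n, level=0.95):
    """Half-width of a two-sided z interval for a mean (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / math.sqrt(n)


# Quadrupling the sample size halves the half-width.
for n in (10, 40, 160):
    print(n, round(half_width(10.0, n), 2))
```

Because the standard error shrinks with the square root of the sample size, four times as much data is needed to make the interval half as wide.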
Confidence intervals are more informative than the simple results of hypothesis tests, where we decide
to reject the null hypothesis or not to reject the null hypothesis, since they provide a range of plausible
values for the unknown parameter.
Confidence limits are the lower and upper boundaries or values of a confidence interval, that is, the
values which define the range of a confidence interval.
The upper and lower bounds of a 95% confidence interval are the 95% confidence limits.
L8_S4 Confidence levels
The confidence level is the probability value (1 – alpha) associated with a confidence interval.
It is often expressed as a percentage. For example, if alpha equals 0.05 (that is, 5%), then the
confidence level is (1 - 0.05) = 0.95, that is, a 95% confidence level.
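As an illustration of how alpha maps to a confidence level and to a critical value, the sketch below uses the standard normal distribution from Python's standard library. The function names are hypothetical, and the use of z rather than t critical values is an assumption made to keep the example self-contained.

```python
from statistics import NormalDist


def confidence_level(alpha):
    """Confidence level associated with a given alpha."""
    return 1 - alpha


def z_critical(alpha):
    """Two-sided critical value: alpha is split evenly between the two tails."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  level={confidence_level(alpha):.0%}  z={z_critical(alpha):.3f}")
```

A smaller alpha means a higher confidence level and a larger critical value, which in turn widens the interval.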
L8_S5 CI for a mean
A confidence interval for a mean specifies a range of values within which the unknown population
parameter, in this case the mean, may lie. These intervals may be calculated by, for example, a
producer who wishes to estimate his mean daily output; a medical researcher who wishes to estimate
the mean response by patients to a new drug; etc.
The (two-sided) confidence interval for a mean contains all the values μ0 of the population mean
that would not be rejected in the two-sided hypothesis test of
the null hypothesis (H0) that μ equals μ0 against the alternative hypothesis (H1) that μ is not equal to
μ0.
L8_S6 CI for a mean & difference between means
We calculate these intervals for different confidence levels, depending on how confident we want to be.
We interpret an interval calculated at the 95% level as meaning that we are 95% confident that the interval
contains the true population mean. Put another way, 95% of all confidence intervals formed in this manner
(from different samples of the population) will include the true population mean. We can use either z-scores or t-scores to obtain the critical value.
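For reference, the usual textbook form of the two-sided interval for a mean can be written as below. The notation is an assumption, since the transcript does not show the formula itself: \bar{X} is the sample mean, s the sample standard deviation, n the sample size, and t_{\alpha/2, n-1} the two-sided t critical value with n - 1 degrees of freedom. When the population standard deviation \sigma is known, or the sample is large, the z critical value is commonly used instead.

```latex
\bar{X} \;\pm\; t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\qquad\text{or}\qquad
\bar{X} \;\pm\; z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
```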
The confidence interval for the difference between two means contains all the hypothesized values of µ1 - µ2
(in other words, the difference between the two population means) that would not be rejected in a two-sided
hypothesis test. In particular, it corresponds to the test of
the null hypothesis where µ1 - µ2 equals zero against the alternative hypothesis where µ1 - µ2 is not equal
to zero.
If the confidence interval includes zero we can say that there is no significant difference between the
means of the two populations, at a given level of confidence.
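The check for zero can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a large-sample z interval for two independent groups (a swapped-in simplification, not necessarily the method used in the lecture), the function name `diff_ci` is hypothetical, and the summary statistics are invented.

```python
import math
from statistics import NormalDist


def diff_ci(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Large-sample z interval for mu1 - mu2 from summary statistics."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


# Invented summary statistics, for illustration only.
low, high = diff_ci(52.0, 10.0, 100, 50.0, 10.0, 100)
print(low, high, "includes zero:", low <= 0 <= high)
```

With small samples, a t critical value with appropriate degrees of freedom would normally replace the z value used here.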
L8_S7 CI example
As an example, let’s look at a group of ten girls who, on average, went on their first camping trip at the
age of 15 and a half years. The sample standard deviation is 4.2 years, and we want to know what range of
values we can state, with 95% confidence, contains the true population mean.
Our sample size is ten, so our degrees of freedom are 9, which we use to look up the two-sided t-critical
value in the table, and see that it is about 2.26.
We apply the formula (the t-critical value multiplied by the standard deviation divided by the square root
of the sample size) and get a margin of about 3 years. We can therefore state with 95% confidence that the
true population mean lies between 12 and a half and 18 and a half years.
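The arithmetic of this example can be reproduced with a short script. It is a sketch that assumes the usual t-based margin (critical value times standard deviation over the square root of the sample size) and takes the two-sided 95% critical value for 9 degrees of freedom as roughly 2.262, read from a standard t table rather than computed.

```python
import math

n = 10               # sample size from the example
sample_mean = 15.5   # mean age in years
sample_sd = 4.2      # sample standard deviation in years
t_critical = 2.262   # assumed: two-sided 95% value for 9 df, from a t table

# Margin of error: critical value times the standard error of the mean.
margin = t_critical * sample_sd / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(round(margin, 2), round(lower, 1), round(upper, 1))
```

The margin comes out very close to 3 years, which gives the interval from about 12.5 to about 18.5 years.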
Test yourself with these…