Chapter 8 Confidence Intervals

L8_S1 Confidence intervals
It is rare that researchers gather information from an entire population. If we did, statistics would be unnecessary. Error is involved whenever an experiment is run or people are sampled for a survey. Confidence intervals give us an estimate of the amount of error involved in our data. They tell us about the precision of the statistical estimates (e.g., means, standard deviations, correlations) we have computed. Confidence intervals are related to the concept of power: the wider the confidence interval, the less power a study has to detect differences between treatment conditions in experiments or between groups of respondents in survey research.

A confidence interval gives an estimated range of values that is likely to include an unknown population parameter; we calculate this range from a given set of sample data. The common notation for the parameter in question is theta (θ). Often, this parameter is the population mean mu (μ), which is estimated by the sample mean X̄.

L8_S2 95% confidence
If independent samples are taken repeatedly from the same population, and a confidence interval is calculated for each sample, then a certain percentage of the intervals will include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%, but we can also produce 90%, 99%, or 99.9% confidence intervals for the unknown parameter.

L8_S3 Factors affecting CI
A confidence interval is based on three elements: the value of a statistic (the mean, the correlation, etc.); the standard error of that statistic; and the desired confidence level (for example, 95% or 99%). The width of the confidence interval gives us some idea of how uncertain we are about the unknown parameter.
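As a rough illustration of how these three elements combine, here is a minimal Python sketch. It assumes a z-based interval with a known standard deviation, which is a simplification. The function names `z_interval` and `width`, and the numbers (mean 50, standard deviation 10, sample sizes 25 and 100), are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sd, n, level=0.95):
    """Return (lower, upper) for a z-based confidence interval.

    Assumes the standard deviation is known, so a normal critical
    value can be used instead of a t value.
    """
    alpha = 1 - level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    standard_error = sd / sqrt(n)
    margin = z_crit * standard_error
    return mean - margin, mean + margin


def width(interval):
    """Width of an interval given as (lower, upper)."""
    lower, upper = interval
    return upper - lower


# Hypothetical data: mean 50, standard deviation 10.
w90 = width(z_interval(50.0, 10.0, n=25, level=0.90))
w95 = width(z_interval(50.0, 10.0, n=25, level=0.95))
w99 = width(z_interval(50.0, 10.0, n=25, level=0.99))
w95_big = width(z_interval(50.0, 10.0, n=100, level=0.95))

print(f"n=25:  90% width={w90:.2f}, 95% width={w95:.2f}, 99% width={w99:.2f}")
print(f"n=100: 95% width={w95_big:.2f}")
```

Under these assumptions, raising the confidence level widens the interval, while quadrupling the sample size halves the standard error and therefore halves the width.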
A very wide interval may indicate that more data should be collected before anything very definite can be said about the parameter. Confidence intervals are more informative than the simple results of hypothesis tests, where we decide only whether or not to reject the null hypothesis, since they provide a range of plausible values for the unknown parameter.

Confidence limits are the lower and upper boundaries of a confidence interval, that is, the values that define its range. The upper and lower bounds of a 95% confidence interval are the 95% confidence limits.

L8_S4 Confidence levels
The confidence level is the probability value (1 - alpha) associated with a confidence interval. It is often expressed as a percentage. For example, if alpha equals 0.05 (5%), then the confidence level is 1 - 0.05 = 0.95, that is, a 95% confidence level.

L8_S5 CI for a mean
A confidence interval for a mean specifies a range of values within which the unknown population parameter, in this case the mean, may lie. These intervals may be calculated by, for example, a producer who wishes to estimate mean daily output, or a medical researcher who wishes to estimate the mean response of patients to a new drug. The (two-sided) confidence interval for a mean contains all the values μ0 of the true population mean that would not be rejected in the two-sided hypothesis test of the null hypothesis (H0) that μ equals μ0 against the alternative hypothesis (H1) that μ is not equal to μ0.

L8_S6 CI for a mean & difference between means
We calculate these intervals for different confidence levels, depending on how confident we want to be that the interval captures the parameter. We interpret an interval calculated at the 95% level by saying that we are 95% confident that the interval contains the true population mean. We could also say that 95% of all confidence intervals formed in this manner (from different samples of the population) will include the true population mean.
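The repeated-sampling interpretation can be checked with a small simulation. The following is a minimal sketch, not a definitive procedure. It assumes a normally distributed population with a known standard deviation, so that a z-based interval has exactly the nominal coverage. The function name `coverage`, the population values (mean 15.5, standard deviation 4.2), and the seed are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(true_mean, sd, n, level=0.95, trials=2000, seed=0):
    """Fraction of simulated confidence intervals that contain true_mean.

    Assumes a normal population with known standard deviation `sd`,
    so each interval is sample mean +/- z * sd / sqrt(n).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * sd / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        sample_mean = mean(sample)
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials


rate_95 = coverage(true_mean=15.5, sd=4.2, n=10, level=0.95)
rate_90 = coverage(true_mean=15.5, sd=4.2, n=10, level=0.90)
print(f"95% intervals containing the true mean: {rate_95:.3f}")
print(f"90% intervals containing the true mean: {rate_90:.3f}")
```

With enough trials, the observed fractions should sit close to the nominal 0.95 and 0.90, although any single run will differ slightly because of simulation noise.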
We can use either z-scores or t-scores to obtain the critical value; t-scores are the usual choice when the population standard deviation is estimated from a small sample.

The confidence interval for the difference between two means contains all the values of µ1 - µ2 (in other words, the difference between the two population means) that would not be rejected in the two-sided hypothesis test of the null hypothesis that µ1 - µ2 equals zero against the alternative hypothesis that µ1 - µ2 is not equal to zero. If the confidence interval includes zero, we can say that there is no significant difference between the means of the two populations at the given level of confidence.

L8_S7 CI example
As an example, consider a group of ten girls who, on average, went on their first camping trip at the age of 15.5 years. The sample standard deviation is 4.2 years, and we want to know what range of values we can state, with 95% confidence, contains the true population mean. Our sample size is ten, so we have 9 degrees of freedom, which we use to look up the critical t value in the table: it is 2.262. Applying the formula, margin of error = t × s / √n = 2.262 × 4.2 / √10, gives a margin of about 3 years. We can therefore state with 95% confidence that the true population mean lies between 12.5 and 18.5 years.

Test yourself with these…
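The arithmetic in the camping-trip example can be reproduced with a few lines of Python. This is a minimal sketch: the critical value is entered by hand as the tabulated two-sided 95% t value for 9 degrees of freedom (approximately 2.262), and the variable names are illustrative only.

```python
from math import sqrt

n = 10              # sample size
sample_mean = 15.5  # mean age at first camping trip, in years
sample_sd = 4.2     # sample standard deviation, in years

# Two-sided 95% critical t value for n - 1 = 9 degrees of freedom,
# taken from a standard t-table (entered by hand, not computed here).
t_crit = 2.262

standard_error = sample_sd / sqrt(n)
margin = t_crit * standard_error
lower = sample_mean - margin
upper = sample_mean + margin

print(f"standard error = {standard_error:.3f}")
print(f"margin of error = {margin:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

If a statistics library is available, the critical value could be computed rather than looked up, but the table value keeps the sketch dependency-free.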