CHAPTER 9: USING SAMPLING DISTRIBUTIONS: CONFIDENCE INTERVALS

Outline
Confidence intervals using the Central Limit Theorem
Constructing the confidence interval for large samples
Sampling experiments
Confidence intervals using the t distribution
Size of confidence intervals
Graphing confidence intervals
Confidence intervals for dichotomous variables
Rough confidence intervals
What have we learned?

Figure 9.1. Histogram of animal concern score in random sample of size 10
Figure 9.2. Population of years of education that is roughly Normally distributed
Figure 9.3. Size of confidence interval as certainty in capturing the population mean increases, based on the animal concern data
Figure 9.4. Mean animal concern score by race and gender
Figure 9.5. 95% confidence interval around the mean homicide rates for Confederate states and non-Confederate states
Figure 9.6. A plot of the means for Confederate states and non-Confederate states
Figure 9.7. Confidence intervals on the sustainability index for the countries with high and low numbers of women in the legislature
Figure 9.8. Confidence intervals on the sustainability index for the countries with high and low numbers of women in the legislature and of GDP per capita
Figure 9.9. Confidence intervals around the proportion of men and women who would have genetic testing
Figure 9.10. Confidence intervals of willingness to have genetic testing by race and gender

Table 9.1. 95% confidence interval for a sampling experiment
Table 9.2. 95% confidence intervals based on applying the Central Limit Theorem to small samples
Table 9.3. Accuracy of 95% confidence interval based on Gosset's theorem when the population is not Normally distributed
Table 9.4. Levels of confidence associated with interval estimates for mean animal concern score
Table 9.5. Certainty levels for confidence intervals with "nice" values of z
Table 9.6.
Mean number of children by educational level

The last two chapters introduced the key concepts we use in making statements in the face of random error: the sampling experiment and the sampling distribution. In nearly every chapter from here on we will learn about tools that allow us to use the logic of the sampling experiment and the sampling distribution to deal with sampling error in surveys, randomization error in experiments, and measurement error or other sources of random error in the data we are analyzing. While many tools have been developed by statisticians to deal with specific problems in data analysis, most of them are applications of these two basic concepts. In fact, most of the tools are really special cases of the two general tools we introduce in this chapter and the next: the confidence interval and the hypothesis test. Confidence intervals allow us to make estimates of values in the population, such as the population mean, while taking into account that our estimates are never certain. Hypothesis tests allow us to assess the degree to which some assertion about the population is reasonable to believe, given the information about the population we have in our sample. That is, the hypothesis test gives a sense of whether or not it is reasonable to believe a particular statement about the population. For example, a hypothesis test could tell us whether or not it's reasonable to believe that men and women have the same attitude toward genetic testing, or have the same level of concern about animals. In this chapter we focus on the confidence interval. In Chapter 8 we discussed the Central Limit Theorem and Gosset's work in describing sampling distributions, both of which help us make statements about the population based on sample data in two ways.
(Remember that we can rely on the Central Limit Theorem when we have large simple random samples, and we can use Gosset's work when we have simple random samples of any size drawn from a Normally distributed population.) First, they tell us that the mean of a sample is a good guess of the mean of the population. Below we discuss in detail what we mean by "good," but to preview: We know that for large samples, most samples will have means close to the population mean, few sample means will be far from the population mean, and if we use the sample mean as our estimate of the population mean we will be right on average. The same thing applies when we have simple random samples of any size drawn from a Normally distributed population: the sample mean is a good guess of the population mean for the same reasons. The use of the sample mean to estimate the population mean is called a point estimate. We are guessing at the population parameter with a single number. But sampling distributions let us do even better, because the second way sampling distributions help us make statements about the population is by allowing us to develop an interval estimate. An interval estimate is a range that we can be quite certain includes the real population mean. This is important, since few sample means are exactly equal to the population mean, even though most are pretty close. We can have not only a single number that is a good guess of what's true in the population but also a sense of how likely that guess is to miss (or accurately reflect) the true population value.

Confidence Intervals Using the Central Limit Theorem

While the sample mean may be the best single number to use as an estimate of the population mean, how can we take into account the uncertainty involved in making such a guess? To do this, we want to develop a high and low estimate around the sample mean so that this range will be very likely to include the population mean.
That is, we can be pretty confident that the confidence interval we build for our research sample will include the real value of the population mean. We are confident because we know from sampling theory that we usually "catch" the value of the population mean in the confidence interval. So unless we are very unlucky, it's likely that the mean of the population we are studying is "captured" by the confidence interval we construct from data in our sample. We can determine the percentage of samples in the sampling distribution for which the confidence interval "catches" the population mean. It is common to use 90%, 95% and 99%. Thus, we can be 90%, 95% or 99% certain that the confidence interval includes the population mean. When we use these values we are saying that the procedure we are using to construct the confidence intervals gives us intervals that cover the population mean for 90% or 95% or 99% of the samples in the sampling distribution. To be a bit more informal, we can say that we are 90% or 95% or 99% certain that the confidence interval we have constructed for our research sample "captures" (includes) the population mean.

Constructing the Confidence Interval for Large Samples

Let's start with the case of large simple random samples. If we have a large simple random sample, and we are trying to estimate the population mean, the confidence interval will be:

Upper limit: $\bar{x} + z_{1-\alpha}\hat{\sigma}_{\bar{x}}$ (Equation 9.1)

Lower limit: $\bar{x} - z_{1-\alpha}\hat{\sigma}_{\bar{x}}$ (Equation 9.2)

Or, in the form that statisticians use:

$\bar{x} \pm z_{1-\alpha}\hat{\sigma}_{\bar{x}}$ (Equation 9.3)

These formulas are really rather simple if we take them one step at a time. First, we would read the equation as "x bar plus and minus z sub one minus alpha times sigma hat sub x bar." We start with the sample mean, which is our best estimate of the population mean. To get the upper bound of the confidence interval we are going to add a number to the sample mean. To get the lower bound of the confidence interval, we will subtract the same number from the sample mean.
We calculate the number to add and subtract by multiplying together two numbers. The first one, $z_{1-\alpha}$, is found by looking in a "z table" for the number corresponding to the confidence level we want. Let's say we want to be 95% certain that the confidence interval we build from our research sample will capture the population mean; that is, we want to use a procedure that will include the population mean in the interval for 95% of the samples in the sampling experiment. The z table tells us the value of z to use to make sure the confidence interval will include the mean of the population the percentage of times we want. The z table is based on figuring out areas under the Normal distribution, that is, how far out we have to go to get a confidence interval big enough to work the specified percentage of times. In a sense the z table saves us the trouble of having to do a sampling experiment, because the Central Limit Theorem tells us what would happen if we did the experiment. The z table just records the results of the elaborate calculations that are required by the Theorem. Computer statistical packages do the calculations directly, so we don't use tables much these days except when we are learning how to build a confidence interval. In the next chapter it will become clear why the confidence level (95% in this case) is referred to as $1-\alpha$. For the moment, accept it as an arbitrary label. Note that the term

$z_{1-\alpha}\hat{\sigma}_{\bar{x}}$ (Equation 9.4)

is also referred to as the "margin of error." Thus you will sometimes see newspapers reporting that the results of a poll have a 5% margin of error. What they are saying is that a particular confidence interval (usually 95%, but often the newspapers don't say) is the mean plus and minus the reported margin of error. The symbol "sigma hat sub x bar,"

$\hat{\sigma}_{\bar{x}}$ (Equation 9.5)

is the symbol for the standard deviation of the sampling distribution of sample means.
It is given a special name, the standard error of the mean, but it is just the standard deviation of the sampling distribution. We know from the Central Limit Theorem that

$\sigma^2_{\bar{x}} = \dfrac{\sigma^2}{n}$ (Equation 9.6)

That is, the variance of the sampling distribution equals the variance of the population divided by the sample size. Then we can get the standard error by taking the square root. But since we don't know the variance of the population, what good does this do? To actually build a confidence interval, we have to estimate the variance of the population. Gosset showed that we can get a good estimate of the population variance by dividing the sample sum of squares by n-1. That is,

$\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ (Equation 9.7)

Or, if you already have the variance of the sample, $s^2$,

$\hat{\sigma}^2 = \left(\dfrac{n}{n-1}\right)s^2$ (Equation 9.8)

Let's build a confidence interval for the animal concern scale. Recall that we have a sample of 1386 people and that the score can run from 2 to 10. The sample mean is 5.20 and the sample variance is 3.64.

1. Estimate the population variance. Given that the sample size is 1386, we have

$\hat{\sigma}^2 = \left(\dfrac{n}{n-1}\right)s^2 = \left(\dfrac{1386}{1385}\right)(3.64) = 3.64$ (Equation 9.9)

In this case the sample size is so large that the correction does not change the variance, at least to the second decimal point.

2. Estimate the variance of the sampling distribution.

$\hat{\sigma}^2_{\bar{x}} = \dfrac{\hat{\sigma}^2}{n}$ (Equation 9.10)

That is, we take the estimate of the variance of the population and divide it by the size of our sample to get an estimate of the variance of the sampling distribution. This will be 3.64/1386 = 0.0026.

3. Estimate the standard error. This is easy. Once we have an estimate of the variance of the sampling distribution, we just take the square root to get an estimate of the standard deviation of the sampling distribution, that is, the standard error. This will be the square root of 0.0026, which equals 0.05.

4. Find the z value for the level of confidence we want. If we want a 95% confidence interval, the z value will be 1.96 (see the z table in the Appendix).
5. Multiply the z value by the standard error to get the "margin of error." This is 1.96 * 0.05 = 0.098.

6. Add the margin of error (in this case, 0.098) to the mean to get the upper bound. This gives 5.20 + 0.098 = 5.298, or about 5.3.

7. Subtract the margin of error (in this case, 0.098) from the mean to get the lower bound. This gives 5.20 - 0.098 = 5.102, or about 5.1.

Thus, we can be 95% certain that the mean animal concern score for the US adult population is between about 5.1 and 5.3. By 95% certain we mean that if we used this procedure in a sampling experiment, the confidence interval would cover the true population mean for 95% of all samples, and for 5% of all samples it would miss.

Sampling Experiments

It may be helpful to show how sampling experiments work using the two examples we explored in the last chapter. Recall that we created two populations from which to draw samples. One had a uniform distribution of years of education, with a mean of 10 years of education and every number of years of education from 0 to 20 having the same number of "people." The other was a "lumpy" distribution with a mean of 13.4, with people "lumping up" at numbers of years of education that match categories such as high school graduate or college graduate. We will again draw large samples of 901 "people." For each sample in the sampling distribution, we will calculate a confidence interval using steps 1-7 above. We will see if the confidence interval for that sample includes the population mean. For the uniform distribution the mean was 10. For the non-uniform distribution the mean was 13.4. If the confidence interval includes the population mean, we count a success for that sample. The Central Limit Theorem says that a 95% confidence interval should be successful for 95% of the samples in the sampling distribution.
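The sampling experiment just described can be sketched in code. This is a minimal illustration, not the original program used for the text's tables: it builds a uniform population of years of education (0 through 20), draws many samples of 901, builds a 95% confidence interval for each using the steps above, and counts how often the interval captures the population mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform "population" of years of education: 0..20, mean 10, as in the text.
population = np.repeat(np.arange(21), 1000)
pop_mean = population.mean()  # exactly 10.0

n = 901        # sample size used in the text
z = 1.96       # z value for a 95% confidence interval
trials = 5000  # number of samples in the sampling experiment
hits = 0

for _ in range(trials):
    sample = rng.choice(population, size=n)   # sampling with replacement
    var_hat = sample.var(ddof=1)              # step 1: n-1 correction
    se = np.sqrt(var_hat / n)                 # steps 2-3: standard error
    margin = z * se                           # steps 4-5: margin of error
    lo = sample.mean() - margin               # lower bound
    hi = sample.mean() + margin               # upper bound
    hits += (lo <= pop_mean <= hi)

print(hits / trials)  # close to 0.95, echoing Table 9.1
```

Running this gives a success rate close to the 95% the Central Limit Theorem promises, just as Table 9.1 reports.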
Table 9.1 gives the results for both the uniform and non-uniform distributions.

Table 9.1. 95% confidence interval for a sampling experiment

Distribution of population   Uniform   Non-uniform
Number of successes          4739      4747
Number of misses             261       253
Percentage successes         94.78     94.94
Percentage misses            5.22      5.06

It appears that the theory is pretty accurate: for about 95% of samples the confidence interval included the population mean, and it missed for about 5%.

Confidence Intervals Using the t Distribution

But what do we do if we have small samples? Before Gosset, researchers would calculate all confidence intervals, even for small samples, using the Central Limit Theorem. And before Gosset, they had to use the sample variance as the estimate of the population variance, because they hadn't figured out how to use the correction that yields an unbiased estimate of the population variance from the sample. We can try this with some sampling experiments, using these older procedures with small samples to see how bad the problem can be. We've had the computer draw samples of size 7 from both the uniform and the non-uniform populations of education we've been using. Table 9.2 shows the performance of confidence intervals based on the Central Limit Theorem with a biased estimate of the population variance.

Table 9.2. 95% confidence intervals based on applying the Central Limit Theorem to small samples

Distribution of population   Uniform   Non-uniform
Number of successes          4551      4446
Number of misses             449       554
Percentage successes         91.02     88.92
Percentage misses            8.98      11.08

It appears that we're in trouble. The 95% confidence interval is supposed to include the population mean for 95% of samples, and miss only 5% of the time. But it's working only around 90% of the time. That is, we're missing the true population mean nearly twice as often as we should be. The problem is that using a z value with small samples makes the size of the interval too small.
Further, since we tend to underestimate the population variance when we just use the sample variance as our estimate, rather than using the formula that Gosset developed, that too will make the confidence interval too small. The result is that the confidence interval misses the population mean more often than it should. We can easily get around this by using Gosset's work to build the confidence interval. We proceed in exactly the same way as we did with the z distribution. The only difference is that we now use a value for $t_{df,\,1-\alpha}$ rather than for $z_{1-\alpha}$. Because we have a small sample from a Normally distributed population when we use Gosset's approach, the sampling distribution of sample means will have a t distribution. We use a t table to find the values of t that correspond to the accuracy of the confidence interval we want, just as we used a z table to find the appropriate values of z. But now we have to keep track of the number of degrees of freedom (n - 1 for estimating a confidence interval for the population mean) to find the right t value. Remember, we can only use Gosset's theorem and the t distribution when we have reason to believe that the population from which the sample was drawn is Gaussian in its distribution. Suppose that instead of having 1386 observations of the animal concern scale scores, we only had 10 observations. This might be because we are studying something that is expensive to measure, or because we are doing a pilot study with limited data. To show how we would proceed, we have drawn a random sample of ten cases from our sample of 1386. A random sample of a random sample is still a random sample, so our 10 cases are a random sample of the US population. Of course, this is not something we would do in research, but here it helps to show how we can build a confidence interval for the mean when we have a small sample. The mean of our sample of 10 is 5.5 and the variance is 6.05.
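Before walking through the steps by hand, here is the whole small-sample calculation in code; a minimal sketch that uses only the two summary statistics above plus the t table value of 2.262 (9 degrees of freedom, 95% confidence).

```python
from math import sqrt

n = 10
mean = 5.5    # sample mean of the 10 cases
s2 = 6.05     # sample variance of the 10 cases

var_hat = s2 * n / (n - 1)   # population variance estimate: about 6.72
se = sqrt(var_hat / n)       # standard error: about 0.820
t = 2.262                    # t table value for 9 df at 95% confidence

lo, hi = mean - t * se, mean + t * se
print(round(lo, 2), round(hi, 2))  # 3.65 7.35
```

The hand calculation that follows arrives at the same interval, step by step.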
We will now build the confidence interval assuming that the population values of the animal concern scale are Normally distributed. Of course, we don't know if that's true, and our confidence intervals will be accurate only if the population really is Normal. Figure 9.1 is a histogram of the sample of 10 cases with a Normal curve superimposed.

Figure 9.1. Histogram of animal concern score in random sample of size 10

The shape of the frequency distribution of the sample would make us cautious about the confidence intervals in that, while it has some tendency to peak in the middle, it's not very close to a Normal distribution. But of course, it's not the shape of the sample distribution we are concerned with but rather the shape of the population distribution. Still, we would want to be very cautious if an important decision rested on the confidence interval. But having noted that our confidence interval may not capture the population mean the right percentage of times if the population really isn't Normal, we will proceed through the steps of building the confidence interval. The steps we use in building the confidence interval are the same as before:

1. Estimate the population variance. Again, we take the sample variance and multiply by the correction factor.

$\hat{\sigma}^2 = \left(\dfrac{n}{n-1}\right)s^2 = \left(\dfrac{10}{9}\right)(6.05) = 6.72$ (Equation 9.11)

2. Use that to estimate the variance of the sampling distribution.

$\hat{\sigma}^2_{\bar{x}} = \dfrac{\hat{\sigma}^2}{n} = \dfrac{6.72}{10} = 0.672$ (Equation 9.12)

3. Estimate the standard error (take the square root of the variance of the sampling distribution found in step 2 to get the standard error of the mean).

$\hat{\sigma}_{\bar{x}} = \sqrt{0.672} = 0.820$ (Equation 9.13)

4. Find the t value (remember this is a small sample) that matches the desired level of confidence, 95%, and the number of degrees of freedom, which is n-1, or 9. The t value is 2.262 (see the t table in the Appendix). Remember that the 95% z value is 1.96.
The t value takes the sample size into account and is larger than the z value, thus making the confidence interval bigger than if we used a z based on the Central Limit Theorem.

5. Multiply the t value by the standard error.

$(2.262)(0.820) = 1.85$ (Equation 9.14)

6. Add this to the sample mean to get the upper bound.

$5.50 + 1.85 = 7.35$ (Equation 9.15)

7. Subtract it from the sample mean to get the lower bound.

$5.50 - 1.85 = 3.65$ (Equation 9.16)

So we can be 95% certain that the true population mean for animal concern is between about 3.65 and 7.35. Remember that the 95% means that this confidence interval procedure would catch the real population mean in 95% of samples in a sampling experiment. Also remember that the 95% depends on the population distribution of animal concern being Normal. If it's a bit different from Normal, then the confidence interval will hit less often.

What if the population isn't really Normal? We never know for certain if the population is Normally distributed, though sometimes a body of previous research allows us to be fairly certain it's roughly Normal. We can see what happens when we apply Gosset's theorem to building confidence intervals with small samples when the population is roughly Normal and when it is not by conducting another set of sampling experiments. We'll use the same two non-Normal distributions of education we used before, the uniform distribution and the lumpy distribution. We'll also add a third distribution: a roughly Normal distribution of education with a mean of 9.96 and a variance of 16.43. Figure 9.2 shows this population. Note that it does deviate from the Normal a bit, in that there are too many cases in the "tails" and a bit too few in the middle. If we had a perfectly Normal distribution, the results based on Gosset's theorem would work perfectly. But here we want to see what happens if we are a little off.

Figure 9.2.
Population of years of education that is roughly Normally distributed

Table 9.3 shows how often, out of a thousand samples in a sampling experiment, the confidence intervals based on Gosset's theorem actually include the population mean.

Table 9.3. Accuracy of 95% confidence interval based on Gosset's theorem when the population is not Normally distributed

Distribution of population            Successes (of 1000)   Misses (of 1000)   Percentage successes   Percentage misses
Normal population assumed by theory   950                   50                 95.0%                  5.0%
Uniform                               940                   60                 94.0%                  6.0%
Non-uniform, lumpy                    938                   62                 93.8%                  6.2%
Nearly Normal                         942                   58                 94.2%                  5.8%

We can see that, in each case when we apply the confidence interval constructed on the assumption that the population is Normally distributed to non-Normal populations, we actually include the true population mean fewer times than the theory predicts. We should have had 950 successes out of 1000 and 50 misses according to theory. This should make us a bit cautious about working with small samples where we don't know the distribution of the population to be Normal. But even with the very non-Normal uniform and lumpy distributions, we don't go too far wrong. Applying Gosset's theorem to these non-Normal populations still created confidence intervals that "captured" the true population mean about 94% of the time.

Size of Confidence Intervals

At the beginning of the chapter we said that you can build the confidence interval to be as certain about capturing the true population mean as you want. While 95% is a pretty good success rate, is it possible to be absolutely certain? Yes, we can be absolutely certain that the confidence interval for education runs from 0 (the lowest possible value) to infinity, or to some very large number. Of course, that's not very useful. In confidence intervals there is always a tradeoff between how wide the confidence interval is and how certain you are it catches the population mean.
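The tradeoff can be seen numerically. This sketch recomputes the intervals of Table 9.4 from the animal concern summary statistics (n = 1386, mean 5.20, variance 3.64); small discrepancies with the table come from rounding.

```python
from math import sqrt
from statistics import NormalDist

n, mean, s2 = 1386, 5.20, 3.64   # animal concern sample, as in the text
var_hat = s2 * n / (n - 1)       # population variance estimate
se = sqrt(var_hat / n)           # standard error, about 0.051

for level in (0.75, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided z value
    lo, hi = mean - z * se, mean + z * se
    print(f"{level:.0%}: {lo:.2f} to {hi:.2f} (width {hi - lo:.2f})")
```

The widths grow steadily as we demand more certainty, exactly the pattern the table and Figure 9.3 display.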
The wider the range, the more certain you are. Table 9.4 shows how the size of the confidence interval based on the Central Limit Theorem changes as we demand more certainty that we have captured the true population mean. Remember that the sample size for the animal concern variable we are using as an example is 1386.

Table 9.4. Levels of confidence associated with interval estimates for mean animal concern score

Probability of hitting population mean   Upper bound   Lower bound   Size (upper limit minus lower limit)
75%                                      5.26          5.14          0.12
90%                                      5.28          5.11          0.17
95%                                      5.30          5.10          0.20
99%                                      5.33          5.06          0.27

The more certain we want to be that the confidence interval captures the population mean, the broader the range of our estimate. Figure 9.3 shows how the size of the confidence interval increases with the certainty that we have captured the population mean.

Figure 9.3. Size of confidence interval as certainty in capturing the population mean increases

In some applications we'd like to have the size of the confidence interval relatively small but the probability relatively high. How can we do that? Remember that there are several things that influence the size of the confidence interval:

1. The confidence level. The more certain we are of getting the population mean in the interval, the larger the interval.

2. The sample size. The variance of the sampling distribution is inversely proportional to the sample size, so the larger the sample size the smaller the confidence interval. Since we actually multiply by the standard error, which is the square root of the variance of the sampling distribution, the size of the confidence interval shrinks in proportion to the square root of the sample size. If you want to cut the size of the confidence interval in half by increasing sample size, you have to quadruple the sample size.
The sample size also plays a role in the t value, in that the smaller the number of degrees of freedom, the larger the t.

3. The variance of the population. Usually, this is out of the researcher's control. But some variables may have smaller variance than others, so if you can work with variables known to have small variance, this could reduce the size of the confidence interval.

So, to reduce the size of the confidence interval without increasing the chances of missing the population mean, we must increase the sample size for a study. In fact, if we know how big the confidence interval should be, and we know how certain we must be and can make a guess at the population variance, we can calculate how large our sample needs to be. The equation is

$n = \dfrac{4\hat{\sigma}^2 t^2_{n-1,\,1-\alpha}}{c^2}$ (Equation 9.17)

Here c is the size of the confidence interval we want. Remember that $\hat{\sigma}^2$ is our estimate of the population variance and that $t_{n-1,\,1-\alpha}$ is the value of t from the table to get the probability of catching the population mean that we want. Of course the value for t depends not only on the confidence level but also on the sample size. So you can start with a guess, look up a t value, calculate the sample size required, then use that sample size for a new t and repeat the process until the answer doesn't change. The tricky part can be guessing the value of the population variance. Usually we have to rely on prior research for this. Notice that the size of the population does not enter into the formula. As long as we are drawing a relatively small proportion of the population into the sample (so that we are approximating sampling with replacement), the population size is irrelevant. For example, for a given variance, confidence level and size of confidence interval, you need the same size sample from one state as you do from the whole country.

Graphing Confidence Intervals

It is common to see graphs that display the mean and confidence interval of a variable conditional on some other variables.
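Returning to the sample-size formula: the calculation can be sketched in code. As a simplifying assumption, this sketch substitutes the z value for t, which is accurate whenever the required n turns out to be large; for small answers you would iterate with a t table as described above.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size(width, var_guess, level=0.95):
    """Sample size needed so a confidence interval is about `width` wide.

    Implements n = 4 * variance * z**2 / width**2, using z in place of t
    on the assumption that the resulting n is large enough for t to be
    close to z.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil(4 * var_guess * z ** 2 / width ** 2)

# To halve the roughly 0.20-wide 95% interval for animal concern
# (variance guess 3.64), we need about four times the original n of 1386:
print(sample_size(0.10, 3.64))
```

The answer is roughly four times the original sample of 1386, matching the rule that halving the interval requires quadrupling the sample.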
Figure 9.4 shows the mean animal concern score by race (limited to Euro-Americans and African-Americans because of the relatively small samples for other groups) and gender. The small boxes represent the sample mean scores on the scale for each gender/ethnic group (remember the sample mean is our best estimate of the population mean). The vertical bars extend to the 95% confidence intervals. [If you see an error bar graph in the literature, check to see the confidence level. A 95% confidence interval is the most common, but sometimes bars are constructed at two standard errors (because the z value for 95% is 1.96, very close to 2) or at one standard error (with a large sample this would be about a 68% confidence interval).]

Figure 9.4. Mean animal concern score by race and gender (sample sizes: 518 white men, 60 black men, 654 white women, 92 black women)

Note that because of the much smaller sample sizes, the confidence intervals for African-Americans are much larger than those for Euro-Americans. The mean is lowest for white men; black men are next, followed by white women. Black women have the highest animal concern score. Notice that the confidence interval for white men overlaps with that for black men but not with that for white women or black women. The confidence intervals for the other three groups do overlap. This might make us suspect that there are group differences, with white men different from the other three groups on the animal concern scale. The next chapter will examine how we determine whether differences across groups are likely to exist in the population, given evidence of differences in our sample.

Confidence Intervals for Dichotomous Variables

The Central Limit Theorem applies whenever we have a large simple random sample, whatever the distribution of the population. This means that we can use the Central Limit Theorem to construct a confidence interval for a dichotomous variable.
The steps are just the same as those above for a continuous variable. As an example, we will construct a 99% confidence interval for the genetic testing variable. Recall that the sample mean for this variable is 0.684.

1. Estimate the population variance. Here we use a simplification: For a dichotomous variable, if we label the proportion in the category labeled 1 as p, then the variance is just p(1-p). We know that for this sample p is 0.684, so p(1-p) will be (0.684)(1 - 0.684) = (0.684)(0.316) = 0.216. We can now estimate the population variance using Gosset's formula:

$\hat{\sigma}^2 = \left(\dfrac{n}{n-1}\right)s^2 = \left(\dfrac{901}{900}\right)(0.216) = 0.216$ (Equation 9.18)

Some statisticians use a different logic at this step. They note that the sample mean is the best estimate we have of the population mean. So we can take 0.684 as our estimate of the population mean, then apply the p(1-p) formula to the population mean to get an estimate of the population variance. This gives the same result to three decimal places in this example because we have a reasonably large sample. The two approaches would differ with small samples, but we can't use this approach with small samples anyway.

2. Estimate the variance of the sampling distribution.

$\hat{\sigma}^2_{\bar{x}} = \dfrac{\hat{\sigma}^2}{n}$ (Equation 9.19)

That is, we take the estimate of the variance of the population and divide it by the size of our sample to get an estimate of the variance of the sampling distribution. This will be 0.216/901 = 0.00024.

3. Estimate the standard error. Once we have an estimate of the variance of the sampling distribution, we just take the square root to get an estimate of the standard deviation of the sampling distribution, the standard error. This will be the square root of 0.00024, which equals 0.015.

4. Find the z value for the level of confidence we want. If we want a 99% confidence interval, the z value will be 2.576 (see the z table in the Appendix).

5. Multiply the z value by the standard error. This is 2.576 * 0.015 = 0.039.

6.
Add this to the mean to get the upper bound. This gives 0.684 + 0.039 = 0.723.

7. Subtract the number from step 5 from the mean to get the lower bound. This gives 0.684 - 0.039 = 0.645.

Thus, we can be 99% certain that the proportion of people in the US who would have a genetic test is between about 64% and 72%. Saying that we are 99% certain means that if we used this procedure in a sampling experiment, the confidence interval would cover the true population mean for 99% of all samples, and for 1% of all samples it would miss.

Statisticians have developed a more precise approach, based on what is called the binomial theorem, for use when the mean is close to one or zero. It's sometimes necessary to use this if the mean (the proportion in the category scored 1) is very close to one or zero. There is nothing in the calculations we have just described to keep the confidence interval from going below zero or over one, but such values don't mean anything when working with a proportion. In such cases the Central Limit Theorem doesn't work and the more appropriate binomial theorem does. But for most applications when the mean is not close to one or zero, the Central Limit Theorem works well. If we had used the binomial theorem for this problem, the confidence interval would have ranged from 0.645 to 0.723, hardly a difference.

Rough Confidence Intervals

Remember that the choice of 90%, 95% and 99%, rather than, for example, 85%, 97% or some other level of confidence, is arbitrary. We want the confidence intervals to capture the true population mean most of the time so we can be pretty certain we've captured it with the research sample in a particular study. But the common use of 90%, 95% and 99% instead of some other high level of confidence is just a conventional choice.1 It is common to see researchers construct confidence intervals based on z or t values of 1, 1.5, 2 or 3. They are choosing "nice" values of z or t rather than "nice" values for the confidence level.
There is nothing wrong with choosing "nice" values for z or t rather than "nice" values for the level of confidence. Table 9.5 shows the confidence levels (the chances of catching the population mean) that correspond to z values of 1, 1.5, 2 and 3. Both indicate a particular level of assurance that the confidence interval has captured the population mean. In management science, there is sometimes discussion of having quality control at the "six sigma" level, which means a z value of six (six standard deviations from the mean); this corresponds to a confidence level of about 0.999999998, or roughly two chances in a billion of missing the true value of the mean.

Table 9.5. Certainty levels for confidence intervals with "nice" values of z

Value of z    Certainty level
1.0           0.6827
1.5           0.8664
2.0           0.9545
3.0           0.9973

¹ There is a subfield of statistics (beyond the scope of this book) called decision theory that suggests how sure we can be by calculating the costs of being wrong and the costs of collecting data.

What Have We Learned?

The Central Limit Theorem and Gosset's work allow us to use the logic of the sampling experiment to estimate means of populations from sample data. The best estimate of the population mean is the sample mean. If we use just one number as the estimate, it is called a point estimate. But we know our sample mean may differ from the population mean because of sampling error. So we hedge our estimate by constructing a confidence interval that, for a designated large percentage of samples, will actually include the population mean. Several things influence the size of the confidence interval: the confidence level, the sample size, and the variance of the population. Constructing confidence intervals for large samples uses the z distribution; for small samples, the t distribution is used.
If we know how big the confidence interval should be, and we know how certain we must be and can make a guess at the population variance, we can calculate how large our sample needs to be.

Chapter 9 Applications

Example 1. Why do homicide rates vary?

We can compare plots of means and confidence intervals for the homicide rates of different groupings of states, where the groups correspond to variables we think might cause variation in homicide rates. Before we do that, however, we need to stop and reflect on how to think about confidence intervals of state homicide rates.

First, remember that the states differ substantially in their populations. So the average homicide rate across states is not the same as the average homicide rate for the US. This is because in taking the average across all states, which is 5.83, we give every state equal weight. But of course, states vary enormously in population: in 1997 Wyoming had about 481,000 people, while California had about 31,878,000. So the homicide rate of 3.50 per 100,000 for Wyoming represents about 17 homicides, while the rate of 8.00 for California represents about 2,550 homicides. If we want to get the homicide rate for the whole country, we have to take the size of each state into account in what is called a weighted average.

Second, we have to remember that the confidence interval is trying to give us a good estimate of the value of the mean in the population, taking sampling error into account. But we have data on all fifty states. So what does the confidence interval mean? One way to think about it is to use the hypothetical logic that the set of states we actually have is a sort of sample from all the states that might exist with somewhat different configurations of homicide rates, histories, poverty levels, etc. Then the confidence interval is an estimate of what the mean of a group of states might be in that hypothetical population.
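As a quick aside on the first point, the mechanics of a weighted average can be sketched with only the two states mentioned above (so this illustrates the calculation, not the actual national rate):

```python
# Weighted vs. unweighted averages, illustrated with the two states
# mentioned in the text (not the full fifty-state calculation).
rates = {"Wyoming": 3.50, "California": 8.00}          # homicides per 100,000
pops = {"Wyoming": 481_000, "California": 31_878_000}  # 1997 populations

unweighted = sum(rates.values()) / len(rates)          # every state counts equally
weighted = sum(rates[s] * pops[s] for s in rates) / sum(pops.values())

print(unweighted)          # 5.75
print(round(weighted, 2))  # 7.93 -- pulled toward California's rate,
                           # because California dominates the population
```

With all fifty states weighted this way, the result would be the national homicide rate rather than the simple average of the state rates.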
As we've mentioned before, some researchers don't like this way of thinking; others do. But unless we have some random mechanism that underlies our data, the confidence interval has no meaning. So if we are going to use confidence intervals in our research, we have to think in terms of a random process. If we are not comfortable with that logic for a data set, then we shouldn't use confidence intervals.

With those cautions in mind, we can now look at Figure 9.5. It shows the 95% confidence interval around the mean homicide rate for states that were part of the Confederacy and those that were not. Clearly, the mean across the former Confederate states is higher. But we also know from previous chapters that the homicide rate is related to the amount of poverty in a state. So it would be helpful to take account of poverty in looking at the effect of region on homicide rates. It may be that many formerly Confederate states also have substantial amounts of poverty, and the poverty level is what is really driving homicide rates. In Figure 9.6 we plot the means for states that were formerly in and not in the Confederacy, after first splitting the sample into two groups: those with higher and those with lower percentages of households in poverty. To split the states into these two groups, we used the median of the percentage of families in poverty across states, 15.0. Those states above the median were considered to have more poverty; those below the median were considered to have less poverty.

[Figure 9.5. 95% confidence interval around the mean homicide rates for Confederate states and non-Confederate states (N = 39 non-Confederate, 11 Confederate)]

[Figure 9.6. A plot of means for Confederate states and non-Confederate states, by poverty level (N = 31 and 8 for non-Confederate states, 6 and 5 for Confederate states)]

For states not in the Confederacy, poverty seems to make a difference.
States with higher levels of poverty have a higher mean homicide rate than those with less poverty, and the 95% confidence intervals around the means for these two groups don't overlap. While the mean for the low poverty group is lower than the mean for the high poverty group for states that were in the Confederacy, the 95% confidence intervals overlap, so our interval estimates would not lead us to suspect differences between the two groups. And the confidence intervals for both Confederate groups overlap with the high poverty group of non-Confederate states, though not with the low poverty non-Confederate states. We also see the effects of small sample size: three of our groups have fewer than 10 states. The confidence intervals become rather large when we have only a handful of states in each group. And of course we don't have any reason to be sure that the "population" distribution of homicide rates is Normal, so the confidence intervals based on small samples may not actually be at 95%.

Example 2. For an illustration using the animal concern scale, please review the example in the text.

Example 3. Why do nations differ in sustainability?

Here we have the same issue as with the analysis of the state homicide data. The data set is not a random sample; it is all the data available. So when we build a confidence interval, the "population" to which we are generalizing is a hypothetical one. For example, we can examine whether or not having women in the legislature is beneficial to sustainability. It can be argued that women are more concerned than men about both human welfare and the environment, so having a larger percentage of women in the legislature may lead to higher levels of sustainability. The median across countries of women in the legislature is 9.7%, so we will consider any country with 10% or more as having a high percentage and any country with less than 10% as having a low percentage of women in the legislature.
Figure 9.7 shows the confidence intervals on the sustainability index for the countries with high and low numbers of women in the legislature.

[Figure 9.7. Confidence intervals on the sustainability index for the countries with high and low numbers of women in the legislature (N = 43 low, 42 high)]

While the mean for countries with a high percentage of women in the legislature is a bit higher than for the countries with fewer women in the legislature, the difference is very small and the confidence intervals completely overlap. Perhaps the effect of women in the legislature is being masked by other variables that we are not considering. In later chapters we will consider tools that allow us to look at many variables at a time, but here we can certainly add one more variable. Let's consider the effect of affluence, measured by Gross Domestic Product (gdp) per capita. We will split the countries into more and less affluent groups at $5600 in gdp per capita (the median across all countries in the data set is $5630). Figure 9.8 shows the effects of both women in the legislature and gdp per capita.

[Figure 9.8. Confidence intervals on the sustainability index for the countries with high and low numbers of women in the legislature and of gdp per capita (N = 25 and 17 in the low group, 16 and 25 in the high group, split by affluence)]

Here we see what at first looks like a rather complex pattern. Let's look first at countries that have few women in the legislature, on the left hand side of the graph. We see that in this group of countries, the lower income countries (the confidence interval furthest left in the graph) have a higher mean sustainability than do the higher income countries. But the two confidence intervals overlap slightly, so we would be cautious about drawing any strong conclusions.
Looking at the countries with a larger proportion of women in the legislature, on the right half of the graph, we find very little difference between the two affluence levels. Indeed, the 95% confidence intervals around the means overlap for all four groups of countries, so we would be inclined to conclude that there aren't very strong effects of either women in the legislature or affluence. It may be that we need to control for other factors before we could see any effects of women in the legislature or affluence.

Example 4. Why do people differ in their views about genetic testing?

Remember that the mean of a zero-one categorical variable is just the proportion of people who fall into the category labeled one. Figure 9.9 shows the confidence intervals around the proportion of men and women who would have genetic testing.

[Figure 9.9. Confidence intervals around the proportion of men and women who would have genetic testing (N = 396 male, 505 female)]

Here the mean for men is slightly lower, but the confidence intervals overlap, so we would not conclude there is any gender difference. Now let's see how willingness to have the testing differs across both race and gender. The confidence intervals for black and white men and women are displayed in Figure 9.10. Again, we have had to restrict ourselves to just European and African Americans, as there are too few people in any other racial/ethnic category to analyze with this data set.

[Figure 9.10. Confidence intervals of willingness to have genetic testing by race and gender (N = 335 and 40 for men, 388 and 90 for women)]

Looking at the left hand side of the diagram, we see that black men are a bit more likely, on average, to say they would have the testing. But the confidence intervals overlap, so we wouldn't conclude that there's a difference between the two groups of men in the population.
Looking on the right, we see the same pattern, with black women on average more likely than white women to say they'd have the test, but with overlapping confidence intervals. All four of the confidence intervals overlap, so we can't argue for any strong gender or race effects.

Chapter 9 Exercises

Q1. One thousand physicians are surveyed nationwide about their views on the quality of care received by patients who are enrolled in health maintenance organizations (HMOs). The quality of care scale ranges from 0 to 10, with 10 reflecting the highest quality of care. The sample mean is 4.20, and the sample variance is 3.45. Based on this information, calculate the confidence interval around the sample mean. What does this confidence interval tell us?

Q2. Is a 90% confidence interval going to have a smaller or larger range than a 95% confidence interval? What are the advantages and disadvantages of selecting a 95% versus a 90% confidence interval?

Q3. A researcher is interested in whether people, on average, work a full-time 40-hour work week. To examine this research question, he turned to the 1996 General Social Survey. In a question asking respondents (N = 1935) the average number of hours they worked in the prior week, the average hours worked was 42.35, with a range of 2 to 89 hours. The sample variance was 199.95. Based on this information, construct the 95% confidence interval and show your work. Do people tend to work 40 hours a week?

Q4. A group of researchers is interested in factors that relate to the number of children people have. One of the researchers suggests that, among other factors, educational attainment may relate to number of children, with the hypothesis that people with the highest degrees have fewer children (more focus on their professional lives) than those with less educational achievement. Data from 1946 participants in the 1996 General Social Survey who are at least 35 years old are presented below by highest degree earned.
(a) Graphically present the five 95% confidence intervals. (b) Does there appear to be a relationship between number of children and educational attainment?

Table 9.6. Mean number of children by educational level

                                                              95% Confidence Interval for Mean
Educational level      N     Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound
0 LT HIGH SCHOOL       337   2.93   2.00             .11          2.72          3.15
1 HIGH SCHOOL          988   2.33   1.63             .05          2.23          2.44
2 JUNIOR COLLEGE       118   1.98   1.37             .13          1.73          2.23
3 BACHELOR             311   1.80   1.55             .09          1.63          1.98
4 GRADUATE             192   1.79   1.47             .11          1.58          2.00
Total                  1946  2.28   1.70             .04          2.20          2.35
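The standard error and confidence bound columns in Table 9.6 can be reproduced from the formulas in the chapter (SE = SD/√N, bounds = mean ± z·SE, with z ≈ 1.96 for 95%). A minimal sketch for the first row; small discrepancies with the table come from the table's rounded means:

```python
import math

def ci_from_summary(mean, sd, n, z=1.96):
    """95% confidence interval from summary statistics:
    SE = SD / sqrt(N), bounds = mean +/- z * SE."""
    se = sd / math.sqrt(n)
    return se, mean - z * se, mean + z * se

# First row of Table 9.6: LT HIGH SCHOOL (N = 337, mean = 2.93, SD = 2.00)
se, low, high = ci_from_summary(2.93, 2.00, 337)
print(round(se, 2), round(low, 2), round(high, 2))  # near the table's .11, 2.72, 3.15
```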