The last two chapters introduced the key concepts we use in making statements in
the face of random error: the sampling experiment and the sampling distribution. In
nearly every chapter from here on we will learn about tools that allow us to use the logic
of the sampling experiment and sampling distribution to deal with sampling error in
surveys, randomization error in experiments, measurement error or other sources of
random error in data we are analyzing. While many tools have been developed by
statisticians to deal with specific problems in data analysis, most of them are applications
of these two basic concepts. In fact, most of the tools are really special cases of the two
general tools we introduce in this chapter and the next: the confidence interval and the
hypothesis test. Confidence intervals allow us to make estimates of values in the
population, such as the population mean, while taking into account that our estimates are
never certain. Hypothesis tests allow us to assess the degree to which some assertion
about the population is reasonable to believe, given the information about the population
we have in our sample. That is, the hypothesis test gives a sense of whether or not it is
reasonable to believe a particular statement about the population. For example, a
hypothesis test could tell us whether or not it’s reasonable to believe that men and women
have the same attitude toward genetic testing, or have the same level of concern about
In this chapter we focus on the confidence interval. In Chapter 8 we discussed
the Central Limit Theorem and Gosset’s work in describing sampling distributions, both
of which help us make statements about the population based on sample data in two
ways. (Remember that we can rely on the Central Limit Theorem when we have large
simple random samples, and we can use Gosset’s work when we have simple random
samples of any size drawn from a Normally distributed population.)
First, they tell us that the mean of a sample is a good guess of the mean of the
population. Below we discuss in detail what we mean by “good” but to preview: We
know that for large samples, most samples will have means close to the population mean,
few sample means will be far from the population mean, and if we use the sample mean
as our estimate of the population mean we will be right on average. The same thing
applies when we have simple random samples of any size drawn from a Normally
distributed population: the sample mean is a good guess of the population
mean for the same reasons.
The use of the sample mean to estimate the population mean is called a point
estimate. We are guessing at the population parameter with a single number. But
sampling distributions let us do even better because the second way sampling
distributions help us make statements about the population is by allowing us to develop
an interval estimate. An interval estimate is a range that we can be quite certain
includes the real population mean. This is important, since few sample means are exactly
equal to the population mean, even though most are pretty close. We can have not only a
single number that is a good guess of what’s true in the population but a sense of how
likely that guess is to miss (or accurately reflect) the true population value.
Confidence Intervals Using the Central Limit Theorem
While the sample mean may be the best single number to use as an estimate of the
population mean, how can we take into account the uncertainty involved in making such
a guess? To do this, we want to develop a high and low estimate around the sample mean
so that this range will very likely include the population mean. That is, we can be
pretty confident that the confidence interval we build from our research sample will
include the real value of the population mean. We are confident because we know from
sampling theory that we usually “catch” the value of the population mean in the
confidence interval. So unless we are very unlucky, it’s likely that the mean of the
population we are studying is “captured” by the confidence interval we construct from
data in our sample.
We can determine the percentage of samples in the sampling distribution for
which the confidence interval “catches” the population mean. It is common to use 90%,
95% and 99%. Thus, we can be 90, 95 or 99% certain that the confidence interval
includes the population mean. When we use these values we are saying that the
procedure we are using to construct the confidence intervals gives us intervals that cover
the population mean for 90% or 95% or 99% of the samples in the sampling distribution.
To be a bit more informal, we can say that we are 90% or 95% or 99% certain that the
confidence interval we have constructed for our research sample “captures” (includes) the
population mean.
Constructing the Confidence Interval for Large Samples
Let’s start with the case of large simple random samples. If we have a large
simple random sample, and we are trying to estimate the population mean, the confidence
interval will be:
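Upper limit: $\bar{x} + z_{1-\alpha}\,\hat{\sigma}_{\bar{x}}$
Equation 9.1
Lower limit: $\bar{x} - z_{1-\alpha}\,\hat{\sigma}_{\bar{x}}$
Equation 9.2
Or, in the form that statisticians use:
$\bar{x} \pm z_{1-\alpha}\,\hat{\sigma}_{\bar{x}}$
Equation 9.3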
These formulas are really rather simple if we take it one step at a time. First, we
would read the equation as “x bar plus and minus z sub one minus alpha times sigma hat
sub x bar.” We start with the sample mean, which is our best estimate of the population
mean. To get the upper bound of the confidence interval we are going to add a number to
the sample mean. To get the lower bound of the confidence interval, we will subtract the
same number from the sample mean. We calculate the number to add and subtract by
multiplying together two numbers. The first one, $z_{1-\alpha}$, is found by looking in a “z table”
for the number corresponding to the confidence level we want. Let’s say we want to be
95% certain that the confidence interval we build from our research sample will capture
the population mean—that is, we want to use a procedure that will include the population
mean in the interval for 95% of the samples in the sampling experiment. The z table tells
us the value of z to use to make sure the confidence interval will include the mean of the
population the percentage of times we want. The z table is based on figuring out the
areas under the Normal distribution, that is, how far we have to go to get a confidence
interval big enough to work the specified percentage of times.
In a sense the z table saves us the trouble of having to do a sampling experiment
because the Central Limit Theorem tells us what would happen if we did the experiment.
The z table just records the results of the elaborate calculations that are required by the
Theorem. Computer statistical packages do the calculations directly, so we don’t use
tables much these days except when we are learning how to build a confidence interval.
In the next chapter it will become clear why the confidence level (95% in this case) is
referred to as $1-\alpha$. For the moment, accept it as an arbitrary label. Note that the
part of the expression to the right of the plus-and-minus sign,
$z_{1-\alpha}\,\hat{\sigma}_{\bar{x}}$
Equation 9.4
is also referred to as the “margin of error.” Thus you will sometimes see newspapers
reporting that the results of a poll have a 5% margin of error. What they are saying is that
a particular confidence interval (usually 95%, but often the newspapers don’t say) is the
mean plus and minus the reported margin of error.
The symbol “sigma hat sub x bar” is written
$\hat{\sigma}_{\bar{x}}$
Equation 9.5
and is the symbol for the standard deviation of the sampling distribution of sample
means. It is given a special name: the standard error of the mean, but it is just the
standard deviation of the sampling distribution. We know from the Central Limit
Theorem that
$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$
Equation 9.6
That is, the variance of the sampling distribution equals the variance of the population
divided by the sample size. Then we can get the standard error by taking the square root.
But since we don’t know the variance of the population, what good does this do?
To actually build a confidence interval, we have to estimate the variance of the
population. Gosset showed that we can get a good estimate of the population variance by
dividing the sample sum of squares by n-1. That is
$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n-1}$
Equation 9.7
Or if you already have the variance of the sample $s^2$
$\hat{\sigma}^2 = \left(\frac{n}{n-1}\right) s^2$
Equation 9.8
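To see that Equations 9.7 and 9.8 give the same answer, here is a minimal Python sketch. The seven scores are invented purely for illustration, and we assume that the "sample variance" s² is the sum of squares divided by n, which is what the standard library's `pvariance` computes, while `variance` divides by n-1.

```python
from statistics import pvariance, variance

# Hypothetical scores, invented purely for illustration.
scores = [2.0, 4.0, 5.0, 5.0, 6.0, 7.0, 9.0]
n = len(scores)

s2 = pvariance(scores)           # sample variance: sum of squares divided by n
sigma2_hat = variance(scores)    # Equation 9.7: sum of squares divided by n - 1
corrected = s2 * n / (n - 1)     # Equation 9.8: (n / (n - 1)) times s squared

print(round(s2, 3), round(sigma2_hat, 3), round(corrected, 3))
```

The last two numbers printed should match, and both should be a little larger than the uncorrected sample variance.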
Let’s build a confidence interval for the animal concern scale. Recall that we
have a sample of 1386 people and that the score can run from 2 to 10. The sample mean
is 5.20 and the sample variance is 3.64.
1. Estimate the population variance.
Given that the sample size is 1386, we have
$\hat{\sigma}^2 = \left(\frac{n}{n-1}\right) s^2 = \left(\frac{1386}{1385}\right)(3.64) = 3.64$
Equation 9.9
In this case the sample size is so large that the correction does not change the variance, at
least to the second decimal point.
2. Estimate the variance of the sampling distribution.
$\hat{\sigma}^2_{\bar{x}} = \frac{\hat{\sigma}^2}{n}$
Equation 9.10
That is, we take the estimate of the variance of the population and divide it by the
size of our sample to get an estimate of the variance of the sampling distribution. This
will be 3.64/1386= 0.0026.
3. Estimate the standard error.
This is easy. Once we have an estimate of the variance of the sampling
distribution, we just take the square root to get an estimate of the standard deviation of
the sampling distribution—the standard error. This will be the square root of 0.0026,
which equals 0.05.
4. Find the z value for the level of confidence we want.
If we want a 95% confidence interval, the z value will be 1.96 (see the Z table in
the Appendix).
5. Multiply the z value by the standard error to get the “margin of error.”
This is 1.96*0.05= 0.098
6. Add the margin of error (in this case, 0.098) to the mean to get the upper bound.
This gives 5.20 + 0.098 = 5.298 or about 5.3
7. Subtract the margin of error (in this case, 0.098) from the mean to get the lower
bound.
This gives 5.20 - 0.098 = 5.102 or about 5.1
Thus, we can be 95% certain that the mean animal concern score for the US adult
population is between 5.1 and 5.3. By 95% certain we mean that if we used this
procedure in a sampling experiment, the confidence interval would cover the true
population mean for 95% of all samples, and for 5% of all samples it would miss.
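The seven steps can also be sketched in a few lines of Python. This is only an illustration: the function name `z_confidence_interval` is our own, and instead of looking up z in a table we take it from the standard library's `NormalDist`. Because the code does not round the standard error to 0.05 along the way, its bounds can differ slightly from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sample_var, n, confidence=0.95):
    """Large-sample confidence interval for a mean (hypothetical helper)."""
    var_hat = sample_var * n / (n - 1)     # step 1: estimate the population variance
    var_of_mean = var_hat / n              # step 2: variance of the sampling distribution
    std_error = sqrt(var_of_mean)          # step 3: standard error
    # step 4: two-sided z value (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_error                 # step 5: margin of error
    return mean - margin, mean + margin    # steps 6 and 7: lower and upper bounds

# Animal concern scale: mean 5.20, sample variance 3.64, n = 1386
lower, upper = z_confidence_interval(mean=5.20, sample_var=3.64, n=1386)
print(f"{lower:.2f} to {upper:.2f}")
```

Changing `confidence` to 0.90 or 0.99 shows how the interval narrows or widens with the level of certainty we ask for.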
Sampling Experiments
It may be helpful to show how sampling experiments work using the two
examples we explored in the last chapter. Recall that we created two populations from
which to draw samples. One had a uniform distribution of years of education, with a
mean of 10 years of education and every number of years of education from 0 to 20
having the same number of “people.” The other was a “lumpy” distribution with a mean
of 13.4 and with people “lumping up” in certain numbers representing the number of
years of education that match categories, such as high school graduate or college
graduate. We will again draw large samples of 901 “people.”
For each sample in the sampling distribution, we will calculate a confidence
interval using steps 1-7 above. We will see if the confidence interval for that sample
includes the population mean. For the uniform distribution the mean was 10. For the
non-uniform distribution the mean was 13.4. If the confidence interval includes the
population mean, we have a success for that sample in that the confidence interval
included the true population mean. The Central Limit Theorem says that a 95%
confidence interval should be successful for 95% of the samples in the sampling
distribution. Table 9.1 gives the results for both the uniform and non-uniform
populations.
Table 9.1. 95% confidence interval for a sampling experiment
[Columns: distribution of population; number of successes; number of misses]
It appears that the theory is pretty accurate—for about 95% of samples the confidence
interval included the population mean, and it missed for about 5%.
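A sampling experiment of this kind is easy to imitate on a computer. The sketch below is not the program used for the table; it is a rough stand-in under stated assumptions: the uniform population is represented by the whole numbers 0 through 20, each equally likely, samples are drawn with replacement, and the function name, the seed and the 1,000 repetitions are our own choices.

```python
import random
from math import sqrt

def ci_coverage(population_values, n, z=1.96, n_samples=1000, seed=1):
    """Share of samples whose 95% interval covers the population mean."""
    rng = random.Random(seed)
    pop_mean = sum(population_values) / len(population_values)
    hits = 0
    for _ in range(n_samples):
        # one simple random sample, drawn with replacement
        sample = rng.choices(population_values, k=n)
        mean = sum(sample) / n
        # estimate of the population variance: sum of squares over n - 1
        var_hat = sum((x - mean) ** 2 for x in sample) / (n - 1)
        margin = z * sqrt(var_hat / n)
        if mean - margin <= pop_mean <= mean + margin:
            hits += 1                      # a "success" for this sample
    return hits / n_samples

uniform_years = list(range(21))            # 0 to 20 years of education, mean 10
coverage = ci_coverage(uniform_years, n=901)
print(f"covered the population mean in {coverage:.1%} of samples")
```

With a different seed the percentage will move a little, but it should stay close to 95%.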
Confidence Intervals Using the t Distribution
But what do we do if we have small samples? Before Gosset, researchers would
calculate all confidence intervals, even for small samples, using the Central Limit
Theorem. And before Gosset, they had to use the sample variance as the estimate of the
population variance, because they hadn’t figured out how to use the correction that yields
an unbiased estimate of the population variance from the sample. We can try this with
some sampling experiments using these older procedures with small samples to see how
bad the problem can be. We’ve had the computer draw samples of size 7 from both the
uniform and the non-uniform populations of education we’ve been using. Table 9.2
shows the performance of confidence intervals based on the Central Limit Theorem with
a biased estimate of population variance.
Table 9.2. 95% confidence intervals based on applying the Central Limit Theorem to
small samples
[Columns: distribution of population; number of successes; number of misses]
It appears that we’re in trouble. The 95% confidence interval is supposed to include the
population mean for 95% of samples, and miss only 5% of the time. But it’s working
only around 90% of the time. That is, we’re missing the true population mean nearly
twice as often as we should be.
The problem is that using a z value with small samples makes the size of the
interval too small. Further, since we tend to underestimate the population variance when
we just use the sample variance as our estimate, rather than using the formula that Gosset
developed, that too will make the confidence interval too small. The result is the
confidence interval misses the population mean more often than it should.
We can easily get around this by using Gosset’s work to build the confidence
interval. We proceed in exactly the same way as we did with the z distribution. The only
difference is that we now use a value for $t_{df,\,1-\alpha}$ rather than for $z_{1-\alpha}$. Because we have a
small sample from a Normally distributed population when we use Gosset’s approach,
the sampling distribution of sample means will have a t distribution. We use a t table to
find the values of t that correspond to the confidence level we want,
just as we used a z table to find the appropriate values of z. But now we have to keep
track of the number of degrees of freedom (n -1 for estimating a confidence interval for
the population mean) to find the right t value. Remember, we can only use Gosset’s
theorem and the t distribution when we have reason to believe that the population from
which the sample was drawn is Gaussian in its distribution.
Suppose that instead of having 1386 observations of the animal concern scale
scores, we only had 10 observations. This might be because we are studying something
that is expensive to measure, or because we are doing a pilot study with limited data. To
show how we would proceed, we have drawn a random sample of ten cases from our
sample of 1386. A random sample of a random sample is still a random sample, so our
10 cases are a random sample of the US population. Of course, this is not something we
would do in research, but here it helps to show how we can build a confidence interval
for the mean when we have a small sample.
The mean of our sample of 10 is 5.5 and the variance is 6.05. We will now build
the confidence interval assuming that the population values of the animal concern scale
are Normally distributed. Of course, we don’t know if that’s true, and our confidence
intervals will be accurate only if the population really is Normal. Figure 9.1 is a
histogram of the sample of 10 cases with a Normal curve superimposed.
Figure 9.1.
Histogram of animal concern score in random sample of size 10
The shape of the frequency distribution of the sample would make us cautious
about the confidence intervals in that, while it has some tendency to peak in the middle,
it’s not very close to a Normal distribution. But of course, it’s not the shape of the
sample distribution we are concerned with but rather the shape of the population
distribution. Still, we would want to be very cautious if an important decision rested on
the confidence interval. But having noted that our confidence interval may not capture
the population mean the right percentage of times if the population really isn’t Normal,
we will proceed through the steps of building the confidence interval.
The steps we use in building the confidence interval are the same as before:
1. Estimate the population variance. Again, we take the sample variance and
multiply by the correction factor.
$\hat{\sigma}^2 = \left(\frac{n}{n-1}\right) s^2 = \left(\frac{10}{9}\right)(6.05) = 6.72$
Equation 9.11
2. Use that to estimate the variance of the sampling distribution.
$\hat{\sigma}^2_{\bar{x}} = \frac{\hat{\sigma}^2}{n} = \frac{6.72}{10} = 0.672$
Equation 9.12
3. Estimate the standard error (take the square root of the variance of the
sampling distribution found in step 2 to get the standard error of the mean).
$\hat{\sigma}_{\bar{x}} = \sqrt{0.672} = 0.820$
Equation 9.13
4. Find the t value (remember this is a small sample) that matches the desired
level of confidence, 95%, and the number of degrees of freedom, which is n-1 or
9. The t value is 2.262 (see the t Table in the Appendix). Remember that the
95% z value is 1.96. The t value takes the sample size into account and is larger
than the z value, thus making the confidence interval bigger than if we used a z
based on the Central Limit Theorem.
5. Multiply the t value by the standard error.
$2.262 \times 0.820 = 1.85$
Equation 9.14
6. Add this to the sample mean to get the upper bound.
$5.50 + 1.85 = 7.35$
Equation 9.15
7. Subtract it from the sample mean to get the lower bound.
$5.50 - 1.85 = 3.65$
Equation 9.16
So we can be 95% certain that the true population mean for animal concern is
between about 3.65 and 7.35. Remember that the 95% means that this confidence
interval procedure would catch the real population mean in 95% of samples in a sampling
experiment. Also remember that the 95% depends on the population distribution of
animal concern scores being Normal. If it’s a bit different from Normal, then the confidence interval
will hit less often.
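The small-sample calculation can be sketched the same way as the large-sample one. In this illustration the function name is our own, and the t value is passed in by hand from a t table (2.262 for 95% confidence and 9 degrees of freedom) rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt

def t_confidence_interval(mean, sample_var, n, t_value):
    """Small-sample confidence interval for a mean (hypothetical helper)."""
    var_hat = sample_var * n / (n - 1)     # step 1: estimate the population variance
    std_error = sqrt(var_hat / n)          # steps 2 and 3: standard error of the mean
    margin = t_value * std_error           # step 5: t value times standard error
    return mean - margin, mean + margin    # lower and upper bounds

# Sample of 10: mean 5.5, sample variance 6.05, t for 9 degrees of freedom
lower, upper = t_confidence_interval(mean=5.5, sample_var=6.05, n=10, t_value=2.262)
print(f"{lower:.2f} to {upper:.2f}")
```

Passing 1.96 as `t_value` instead shows how much narrower, and how over-optimistic, the z-based interval would be for a sample this small.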
What if the population isn’t really normal? We never know for certain if the
population is Normally distributed, though sometimes a body of previous research allows
us to be fairly certain it’s roughly Normal. We can see what happens when we apply
Gosset’s theorem to building confidence intervals with small samples when the
population is roughly normal and when it is not by conducting another set of sampling
experiments. We’ll use the same two non-Normal distributions of education we used
before, the uniform distribution and the lumpy distribution. We’ll also add a third
distribution – one that is a roughly Normal distribution of education with a mean of 9.96
and a variance of 16.43. Figure 9.2 shows this population. Note that it does deviate from
the Normal a bit, in that there are too many cases in the “tails” and a bit too few in the
middle. If we had a perfectly Normal distribution, the results based on Gosset’s theorem
would work perfectly. But here we want to see what happens if we are a little off.
Figure 9.2. Population of years of education that is roughly Normally distributed
[Horizontal axis: number of years of education]
Table 9.3 shows how often out of a thousand samples in a sampling experiment
the confidence intervals based on Gosset’s theorem actually include the population mean.
Table 9.3. Accuracy of 95% confidence interval based on Gosset’s theorem when the
population is not Normally distributed
[Columns: distribution of population; number of successes out of 1,000; number of misses out of 1,000. Rows include “Nearly Normal.”]
We can see that, in each case when we apply the confidence interval constructed
on the assumption that the population is Normally distributed to non-Normal populations,
we actually included the true population mean fewer times than the theory predicted. We
should have had 950 successes and 50 misses out of 1,000 according to theory. This should
make us a bit cautious about working with small samples where we don’t know the
distribution of the population to be Normal. But even with the very non-Normal uniform
and lumpy distributions, we don’t go too far wrong. Applying Gosset’s theorem to these
non-Normal populations still created confidence intervals that “captured” the true
population mean about 94% of the time.
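The contrast between the older procedure and Gosset's can be imitated with a small simulation. This is a sketch under our own assumptions, not the experiment behind the tables: the uniform population is the whole numbers 0 through 20, samples have 7 cases (the size used in the earlier small-sample experiment), the t value 2.447 is the table value for 95% confidence with 6 degrees of freedom, and the function name, seed and 2,000 repetitions are arbitrary choices.

```python
import random
from math import sqrt

def coverage(population_values, n, critical_value, divisor, n_samples=2000, seed=1):
    """Share of samples whose interval covers the population mean."""
    rng = random.Random(seed)
    pop_mean = sum(population_values) / len(population_values)
    hits = 0
    for _ in range(n_samples):
        sample = rng.choices(population_values, k=n)
        mean = sum(sample) / n
        # divisor n gives the uncorrected variance, n - 1 gives Gosset's correction
        var_est = sum((x - mean) ** 2 for x in sample) / divisor
        margin = critical_value * sqrt(var_est / n)
        if abs(mean - pop_mean) <= margin:
            hits += 1
    return hits / n_samples

n = 7
uniform_years = list(range(21))   # very non-Normal: every value equally likely

# Older procedure: z value with the uncorrected sample variance
old_rule = coverage(uniform_years, n, critical_value=1.96, divisor=n)
# Gosset's procedure: t value (6 degrees of freedom) with the corrected variance
gosset = coverage(uniform_years, n, critical_value=2.447, divisor=n - 1)

print(f"z with uncorrected variance: {old_rule:.1%}")
print(f"t with corrected variance:   {gosset:.1%}")
```

Because both runs use the same seed, they see the same samples, so any difference in coverage comes only from the wider intervals that the t value and the variance correction produce.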
Size of Confidence Intervals
At the beginning of the chapter we said that you can build the confidence interval
to be as certain about capturing the true population mean as you want. While 95% is a
pretty good success rate, is it possible to be absolutely certain? Yes, we can be absolutely
certain that the confidence interval for education runs from 0 (the lowest possible value)
to infinity, or to some very large number. Of course, that’s not very useful. In
confidence intervals there is always a tradeoff between how wide the confidence interval
is and how certain you are it catches the population mean. The wider the range, the more
certain you are. Table 9.4 shows how the size of the confidence interval based on the
Central Limit Theorem changes as we demand more certainty that we have captured the
true population mean. Remember that the sample size for the animal concern variable we
are using as an example is 1386.
Table 9.4. Levels of confidence associated with interval estimates for mean animal
concern score
[Columns: probability of hitting population mean; upper bound; lower bound; size (upper limit minus lower limit)]
The more certain we want to be that the confidence interval captures the
population mean, the broader the range of our estimate. Figure 9.3 shows how the size of
the confidence interval increases with the certainty that we have captured the population
mean.
Figure 9.3. Size of confidence interval as certainty in capturing the population mean increases
[Axes: size of confidence interval; probability of CI capturing the population mean]
In some applications we’d like to have the size of the confidence interval
relatively small but the probability relatively high. How can we do that? Remember that
there are several things that influence the size of the confidence interval:
1. The confidence level. The more certain we are of getting the population mean
in the interval, the larger the interval.
2. The sample size. The variance of the sampling distribution is inversely
proportional to the sample size, so the larger the sample size the smaller the
confidence interval. Since we actually multiply by the standard error, which is
the square root of the variance of the sampling distribution, the size of the
confidence interval changes in inverse proportion to the square root of the sample
size. If you want to cut the size of the confidence interval in half by increasing
sample size, you have to quadruple the sample size. The sample size also plays a
role in the t value in that the smaller the number of degrees of freedom, the larger
the t.
3. The variance of the population. Usually, this is out of the researcher’s control.
But some variables may have smaller variance than others, so if you can work
with variables known to have small variance, this could reduce the size of the
confidence interval.
So, to reduce the size of the confidence interval without increasing the chances of
missing the population mean, we must increase the sample size for a study. In fact, if we
know how big the confidence interval should be, and we know how certain we must be
and can make a guess at the population variance, we can calculate how large our sample
needs to be.
The equation is
$n = \frac{4\,\hat{\sigma}^2\, t^2_{n-1,\,1-\alpha}}{c^2}$
Equation 9.17
Here c is the size of the confidence interval we want. Remember that $\hat{\sigma}^2$ is our
estimate of the population variance and that $t_{n-1,\,1-\alpha}$ is the value of t from the table to get
the probability of catching the population mean that we want. Of course the value for t
depends on the confidence level but also the sample size. So you can start with a guess,
look up a t value, calculate the sample size required, then use that sample size for a new t
and repeat the process until the answer doesn’t change. The tricky part can be guessing
the value of the population variance. Usually we have to rely on prior research for this.
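As a rough illustration of the sample size calculation, the sketch below reads Equation 9.17 as n = 4 × (variance estimate) × t² / c², where c is the full width of the interval. To keep it short we make one simplifying assumption that is not in the text: we use the z value in place of t, which is reasonable only when the required sample turns out to be large, so the iteration over t values is skipped. The function name and the target widths are hypothetical, and the variance guess of 3.64 is borrowed from the animal concern example.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(var_guess, width, confidence=0.95):
    """Sample size for a confidence interval of a given total width.

    Uses z in place of t, an approximation that assumes a large sample.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(4 * var_guess * z ** 2 / width ** 2)

n_narrow = required_sample_size(var_guess=3.64, width=0.1)
n_wide = required_sample_size(var_guess=3.64, width=0.2)
print(n_narrow, n_wide)
```

Comparing the two results shows the quadrupling rule at work: halving the width of the interval requires about four times as many cases.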
Notice that the size of the population does not enter into the formula. As long as
we are drawing a relatively small proportion of the population into the sample (so that we
are approximating sampling with replacement), then the population size is irrelevant. For
example, for a given variance, confidence level and size of confidence interval, you need
the same size sample from one state as you do from the whole country.
Graphing Confidence Intervals
It is common to see graphs that display the mean and confidence interval of a variable
conditional on some other variables. Figure 9.4 shows the mean animal concern score by
race (limited to Euro-Americans and African-Americans because of the relatively small
sample) and gender. The small boxes represent the sample mean scores on the scale for
each gender/ethnic group (remember the sample mean is our best estimate of the
population mean). The vertical bars extend to the 95% confidence intervals. [If you see
an error bar graph in the literature, check to see the confidence level. A 95% confidence
interval is the most common, but sometimes bars are constructed at two standard errors
(this is because the z value for 95% is 1.96, very close to 2) or at one standard error (with
a large sample this would be about a 68% confidence interval).]
Figure 9.4.
Mean animal concern score by race and gender
Note that because of the much smaller sample size, the confidence intervals for
African-Americans are much larger than those for Euro-Americans. The mean is lowest
for white men, black men are next, followed by white women. Black women have the
highest animal concern score. Notice that the confidence interval for white men overlaps
with that for black men but not with that for white women or black women. The
confidence intervals for the other three groups do overlap. This might make us suspect
that there are group differences, with white men different from the other three groups on
the animal concern scale. The next chapter will examine how we determine whether
differences across groups are likely to exist in the population, given evidence of
differences in our sample.
Confidence Intervals for Dichotomous Variables
The Central Limit Theorem applies whenever we have a large simple random
sample, whatever the distribution of the population. This means that we can use the
Central Limit Theorem to construct a confidence interval for a dichotomous variable.
The steps are just the same as those above for a continuous variable. As an example, we
will construct a 99% confidence interval for the genetic testing variable. Recall that the
sample mean for this variable is 0.684.
1. Estimate the population variance.
Here we use a simplification: For a dichotomous variable, if we label the
proportion in the category labeled 1 as p, then the variance is just p*(1-p). We know that
for this sample, p is 0.684, so then p*(1-p) will be (0.684)*(1-0.684), which equals
0.216.
We can now estimate the population variance using Gosset’s formula:
$\hat{\sigma}^2 = \left(\frac{n}{n-1}\right) s^2 = \left(\frac{901}{900}\right)(0.216) = 0.216$
Equation 9.18
Some statisticians use a different logic at this step. They note that the sample
mean is the best estimate we have of the population mean. So we can take 0.684 as our
estimate of the population mean, then apply the p*(1-p) formula to the population mean
to get an estimate of the population variance. This gives the same result to three decimal
places in this example because we have a reasonably large sample. The two approaches
would differ with small samples, but we can’t use this approach with small samples.
2. Estimate the variance of the sampling distribution
$\hat{\sigma}^2_{\bar{x}} = \frac{\hat{\sigma}^2}{n}$
Equation 9.19
That is, we take the estimate of the variance of the population and divide it by the
size of our sample to get an estimate of the variance of the sampling distribution. This
will be 0.216/901 = 0.00024.
3. Estimate the standard error.
Once we have an estimate of the variance of the sampling distribution, we just
take the square root to get an estimate of the standard deviation of the sampling
distribution, which is the standard error. This will be the square root of 0.00024, which equals
0.015.
4. Find the z value for the level of confidence we want. If we want a 99% confidence
interval, the z value will be 2.576 (see the z table in the Appendix).
5. Multiply the z value by the standard error. This is 2.576*0.015 = 0.039.
6. Add this to the mean to get the upper bound. This gives 0.684 + 0.039 = 0.723.
7. Subtract the number from step 5 from the mean to get the lower bound. This gives
0.684 - 0.039 = 0.645.
Thus, we can be 99% certain that the proportion of people in the US who would
have a genetic test is between about 64% and 72%. Saying that we are 99% certain
means that if we used this procedure in a sampling experiment, the confidence interval
would cover the true population mean for 99% of all samples, and for 1% of all samples
it would miss.
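The same steps for a dichotomous variable can be sketched in Python. The function name is our own, and the z value comes from the standard library's `NormalDist` rather than a table. Because the code carries full precision instead of rounding the standard error to 0.015, its bounds may differ from the hand calculation by about 0.001.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p, n, confidence=0.99):
    """Large-sample confidence interval for a proportion (hypothetical helper)."""
    s2 = p * (1 - p)                       # variance of a dichotomous variable
    var_hat = s2 * n / (n - 1)             # step 1: estimate the population variance
    std_error = sqrt(var_hat / n)          # steps 2 and 3: standard error
    # step 4: two-sided z value (about 2.576 for 99%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_error                 # step 5
    return p - margin, p + margin          # steps 6 and 7

# Genetic testing variable: sample proportion 0.684, n = 901
lower, upper = proportion_confidence_interval(p=0.684, n=901)
print(f"{lower:.3f} to {upper:.3f}")
```

Nothing in this function stops the bounds from falling below zero or above one, which is why a different approach is needed when p is close to either extreme.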
When the mean is close to one or zero, statisticians use a more
precise approach based on what is called the binomial theorem. It’s sometimes necessary
to use this if the mean (the proportion in the category scored 1) is very close to one or
zero. There is nothing in the calculations we have just described to keep the confidence
interval from going below zero or over one, but such values don’t mean anything when
working with a proportion. In such cases the Central Limit Theorem doesn’t work and
the more appropriate binomial theorem does. But for most applications when the mean is
not close to one or zero, the Central Limit Theorem works well. If we had used the
binomial theorem for this problem the confidence interval would have ranged from 0.645
to 0.723—hardly a difference.
Rough Confidence Intervals
Remember that the choice of 90%, 95% and 99% rather than, for example, 85%,
97% or some other level of confidence is arbitrary. We want the confidence intervals to
capture the true population mean most of the time so we can be pretty certain we’ve
captured it with the research sample in a particular study. But the common use of 90%,
95% and 99% instead of some other high level of confidence is just a convention.
It is common to see researchers construct confidence intervals based on z or t
values of 1, 1.5, 2 or 3. They are choosing “nice” values of z or t rather than
“nice” values for the confidence level, and there is nothing wrong with doing so. Table 9.5 shows the
confidence levels (the chances of catching the population mean) that correspond to z
values of 1, 1.5, 2 and 3. Both approaches indicate a particular level of assurance that the confidence
interval has captured the population mean. In management science, there is sometimes
discussion of having quality control at the “six sigma” level, which means a z value of six
(six standard deviations from the mean). For a Normal distribution this corresponds to a confidence level of
about 0.999999998, or roughly two chances in a billion of missing the true value of the mean.
Table 9.5. Certainty levels for confidence intervals with “nice” values of z
[Columns: value of z; certainty level]
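The certainty level that goes with any “nice” z value can be computed from the Normal curve. A minimal sketch, assuming a large sample so that the z distribution applies; the function name is our own.

```python
from statistics import NormalDist

def confidence_for_z(z):
    """Two-sided confidence level for an interval of plus and minus z standard errors."""
    return 2 * NormalDist().cdf(z) - 1

for z in (1, 1.5, 2, 3):
    print(f"z = {z}: {confidence_for_z(z):.1%}")
```

For small samples the same z multiples would give lower certainty levels, since the t distribution has heavier tails than the Normal.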
There is a subfield of statistics (beyond the scope of this book) called decision theory
that suggests how sure we can be by calculating the costs of being wrong and the costs of
collecting data.
What Have We Learned?
The Central Limit Theorem and Gosset’s work allow us to use the logic of the
sampling experiment to estimate means of populations from sample data. The best
estimate of the population mean is the sample mean. If we use just one number as the
estimate, it is called a point estimate. But we know our sample mean may differ from the
population mean because of sampling error. So we hedge our estimate by constructing a
confidence interval that, for a designated large percentage of samples, will actually
include the population mean. Several things influence the size of the confidence interval:
the confidence level, the sample size, and the variance of the population. We construct
confidence intervals for large samples with the z distribution and for small samples with the t
distribution. If we know how big the confidence interval should be, and we know
how certain we must be and can make a guess at the population variance, we can
calculate how large our sample needs to be.
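To make the summary concrete, here is a minimal sketch of the two large-sample calculations. The function names and the numbers in the usage lines are illustrative assumptions, not values from the text. We also assume that the "size" of the interval means its half-width, the distance from the sample mean to either end, and we use 1.96 as the z value for 95% confidence.

```python
import math

def mean_confidence_interval(sample_mean, sample_variance, n, z=1.96):
    """Large-sample interval: mean +/- z * sqrt(variance / n)."""
    half_width = z * math.sqrt(sample_variance / n)
    return sample_mean - half_width, sample_mean + half_width

def required_sample_size(half_width, variance_guess, z=1.96):
    """Smallest n whose interval half-width is at most half_width."""
    return math.ceil(z ** 2 * variance_guess / half_width ** 2)

# Hypothetical numbers: sample mean 50, sample variance 100, n = 400
low, high = mean_confidence_interval(50.0, 100.0, 400)
print(f"95% CI: {low:.2f} to {high:.2f}")

# Cases needed for a half-width of 0.5 if we guess the variance is 100
print(required_sample_size(0.5, 100.0))
```

Because the half-width shrinks with the square root of n, cutting the interval in half takes roughly four times as many cases.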
Chapter 9
Example 1. Why do homicide rates vary?
We can compare plots of means and confidence intervals for the homicide rates of
different groupings of states, where the groups correspond to variables we think might
cause variation in homicide rates. Before we do that, however, we need to stop and
reflect on how to think about confidence intervals of state homicide rates. First,
remember that the states differ substantially in their populations. So the average
homicide rate across states is not the same as the average homicide rate for the US. This
is because in taking the average across all states, which is 5.83, we give every state equal
weight. But of course, states vary enormously in population, from Wyoming, which in
1997 had about 481,000 people, to California, which had about 31,878,000 people in
1997. So the homicide rate of 3.50 per 100,000 for Wyoming represents about 17
homicides while the rate of 8.00 for California represents about 2,550 homicides. If we
want to get the homicide rate for the whole country, we would have to take the size of
each state into account in what is called a weighted average.
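The difference between the two kinds of average can be sketched with just the two states mentioned above. The populations and rates are the 1997 figures quoted in the text. Using only two states is our simplification for illustration, so the results are not the national figures.

```python
# (state, population, homicide rate per 100,000) as quoted in the text for 1997
states = [
    ("Wyoming", 481_000, 3.50),
    ("California", 31_878_000, 8.00),
]

# Implied number of homicides in each state
homicides = {name: pop * rate / 100_000 for name, pop, rate in states}

# Unweighted mean: each state counts equally
unweighted = sum(rate for _, _, rate in states) / len(states)

# Weighted mean: each state counts in proportion to its population
total_pop = sum(pop for _, pop, _ in states)
weighted = sum(pop * rate for _, pop, rate in states) / total_pop

print(homicides)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

With equal weights the two-state average sits halfway between the two rates, while the weighted average lands very close to California's rate because California has so many more people.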
Second, we have to remember that the confidence interval is trying to give us a
good estimate of the value of the mean in the population, taking sampling error into
account. But we have data on all fifty states. So what does the confidence interval
mean? One way to think about it is to use the hypothetical logic that the set of states we
actually have is a sort of sample from all the states that might exist with somewhat
different configurations of homicide rates, histories, poverty levels, etc. Then the
confidence interval is an estimate of what the mean of a group of states might be in that
hypothetical population. As we’ve mentioned before, some researchers don’t like this
way of thinking; others do. But unless some random mechanism underlies
our data, the confidence interval has no meaning. So if we are going to use
confidence intervals in our research, we have to think in terms of a random process. If we
are not comfortable with that logic for a data set, then we shouldn’t use confidence
intervals.
With those cautions in mind, we can now look at Figure 9.5. It shows the 95%
confidence interval around the mean homicide rate for states that were part of the
Confederacy and those that were not. Clearly, the mean across the former Confederate
states is higher. But we also know from previous chapters that the homicide rate is
related to the amount of poverty in a state. So it would be helpful to take account of
poverty in looking at the effect of region on homicide rates. It may be that many
formerly Confederate states also have substantial amounts of poverty, and the poverty
level is what is really driving homicide rates. In Figure 9.6 we plot the means for states
that were and were not formerly in the Confederacy, after first splitting the sample into two
groups, those with higher and those with lower percentages of households in poverty. To
split the states into these two groups, we used the median of the percentage of families in
poverty across states in the Confederacy, 15.0. Those states above the median were
considered to have more poverty; those below the median were considered to have less
poverty.
Figure 9.5. 95% confidence interval around the mean homicide rates for Confederate
states and non-Confederate states
Figure 9.6. A plot of means for Confederate states and non-Confederate states
(legend, Poverty Level: Low Poverty Level, High Poverty Level)
For states not in the Confederacy, poverty seems to make a difference. States
with higher levels of poverty have a higher mean homicide rate than those with less
poverty, and the 95% confidence intervals around the mean for these two groups don’t
overlap. While the mean for the low poverty group is lower than the mean for the high
poverty group for states that were in the Confederacy, the 95% confidence intervals
overlap, so our interval estimates would not lead us to suspect differences between the
two groups. And the confidence intervals for both Confederate groups overlap with the
high poverty group of non-Confederate states, though not with the low poverty
non-Confederate states. We also see the effects of small sample size: three of our groups
have fewer than 10 states. The confidence intervals become rather large when we only
have a handful of states in each group. And of course we don’t have any reason to be
sure that the “population” distribution of homicide rates is Normal, so the confidence
intervals based on small samples may not actually have 95% coverage.
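The effect of group size on the width of the interval can be sketched as follows. The standard deviation of 3.0 is a made-up value for illustration, not a figure from the homicide data. The t multipliers are the two-sided 95% values from a standard t table for n - 1 degrees of freedom.

```python
import math

# Two-sided 95% critical values of t from a standard t table,
# keyed by group size n (degrees of freedom = n - 1).
T_95 = {5: 2.776, 10: 2.262, 30: 2.045}
Z_95 = 1.96  # large-sample z value for comparison

def half_width(sd, n, multiplier):
    """Half-width of a confidence interval: multiplier * sd / sqrt(n)."""
    return multiplier * sd / math.sqrt(n)

sd = 3.0  # hypothetical standard deviation within a group of states
for n, t_value in T_95.items():
    print(f"n = {n:2d}: t-based +/- {half_width(sd, n, t_value):.2f}, "
          f"z-based +/- {half_width(sd, n, Z_95):.2f}")
```

Small groups are penalized twice: the standard error is larger because n is small, and the t multiplier is larger than 1.96 because there are few degrees of freedom.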
Example 2. For an illustration using the animal concern scale, please review the
example in the text.
Example 3. Why do nations differ in sustainability?
Here we have the same issue as with analysis of the state homicide data. The data
set is not a random sample – it is all the data available. So when we build a confidence
interval, the “population” to which we are generalizing is a hypothetical one.
For example, we can examine whether or not having women in the legislature is
beneficial to sustainability. It can be argued that women are more concerned than men
about both human welfare and the environment, so having a larger percentage of women
in the legislature may lead to higher levels of sustainability. The median percentage of
women in the legislature across countries is 9.7%, so we will consider any country with 10% or more
as having a high percentage and any country with less than 10% as having a low
percentage of women in the legislature.
Figure 9.7 shows the confidence intervals on the sustainability index for the
countries with high and low numbers of women in the legislature.
Figure 9.7. Confidence intervals of the sustainability index for the countries with high
and low numbers of women in the legislature
While the mean for countries with a high percentage of women in the legislature
is a bit higher than for the countries with fewer women in the legislature, the difference is
very small and the confidence intervals completely overlap.
Perhaps the effect of women in the legislature is being masked by other variables
that we are not considering. In later chapters we will consider tools that allow us to look
at many variables at a time, but here we can certainly add one more variable. Let’s
consider the effect of affluence, measured by gross domestic product (GDP) per capita. We
will split the countries into more and less affluent based on $5,600 in GDP
per capita (the median across all countries in the data set is $5,630). Figure
9.8 shows the effects of both women in the legislature and GDP per capita.
Figure 9.8. Confidence intervals of the sustainability index for the countries with high
and low numbers of women in the legislature and of GDP per capita.
Here we see what at first looks like a rather complex pattern. Let’s look first at
countries that have few women in the legislature, on the left hand side of the graph. We
see that in this group of countries, the lower income countries (the confidence interval
furthest left in the graph) have a higher mean sustainability than do the higher income
countries. But the two confidence intervals overlap slightly, so we would be cautious
about drawing any strong conclusions. Looking at the countries with a larger proportion
of women in the legislature on the right half of the graph, we find very little difference
between the two affluence levels. Indeed, the 95% confidence intervals around the
means overlap for all four groups of countries, so we would be inclined to conclude that
there aren’t very strong effects of either women in the legislature or affluence. It may be
that we need to control for other factors before we could see any effects of women in the
legislature or affluence.
Example 4. Why do people differ in their views about genetic testing?
Remember that the mean of a zero-one categorical variable is just the proportion
of people who fall into the category labeled one. Figure 9.9 shows the confidence
intervals around the proportion of men and women who would have genetic testing.
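The point that a proportion is just the mean of a zero-one variable can be sketched in a few lines. The responses below are made up for illustration and are not the survey data. We use the large-sample form p(1 - p) for the variance and 1.96 as the z value for 95% confidence.

```python
import math

# Hypothetical 0/1 responses: 1 = would have the test, 0 = would not
responses = [1] * 68 + [0] * 32

n = len(responses)
p = sum(responses) / n       # mean of a 0/1 variable = proportion of ones
variance = p * (1 - p)       # large-sample variance of a 0/1 variable
half_width = 1.96 * math.sqrt(variance / n)

print(f"proportion = {p:.2f}")
print(f"95% CI: {p - half_width:.3f} to {p + half_width:.3f}")
```

Nothing new is needed for proportions: the interval is built exactly as for any other mean, with the variance determined by the proportion itself.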
Figure 9.9. Confidence intervals around the proportion of men and women who would
have genetic testing
Here the mean for men is slightly lower but the confidence intervals overlap, so
we would not conclude there is any gender difference. Now let’s see how willingness to
have the testing differs across both race and gender. The confidence intervals for black
and white men and women are displayed in Figure 9.10. Again, we have had to restrict
ourselves to just European and African Americans as there are too few people in any
other racial/ethnic category to analyze with this data set.
Figure 9.10. Confidence intervals of willingness to have genetic testing by race and
gender
Looking at the left-hand side of the diagram, we see that black men are a bit more
likely than white men, on average, to say they would have the testing. But the confidence intervals
overlap so we wouldn’t conclude that there’s a difference between the two groups of men
in the population. Looking on the right, we see the same pattern, with black women on
average more likely than white women to say they’d have the test, but with overlapping
confidence intervals. All four of the confidence intervals overlap, so we can’t argue for
any strong gender or race effects.
Chapter 9
Q1. One thousand physicians are surveyed nation-wide about their views on the quality
of care received by patients who are enrolled in health maintenance organizations
(HMOs). The quality of care scale ranges from 0-10, with 10 reflecting the highest
quality of care. The sample mean is 4.20, and the sample variance is 3.45. Based on this
information, calculate the confidence interval around the sample mean. What does this
confidence interval tell us?
Q2. Is a 90% confidence interval going to have a smaller or larger range than a 95%
confidence interval? What are the advantages and disadvantages of selecting a 95%
versus a 90% confidence interval?
Q3. A researcher is interested in whether people, on average, work a full-time 40-hour
work week. To examine this research question, he turned to the 1996 General Social
Survey. When respondents (N=1935) were asked the number of hours they
worked in the prior week, the average hours worked was 42.35, with a range of 2-89
hours. The sample variance was 199.95. Based on this information, construct the 95%
confidence interval and show your work. Do people tend to work 40 hours a week?
Q4. A group of researchers is interested in factors that relate to the number of children
people have. One of the researchers suggests that, among other factors, educational
attainment may relate to number of children, with the hypothesis that people with the
highest degrees have fewer children (more focus on their professional lives) than those
with less educational achievement. Data from 1946 participants in the 1996 General
Social Survey who are at least 35 years old are presented below by highest degree earned.
(a) Graphically present the five 95% confidence intervals. (b) Does there appear to be a
relationship between number of children and educational attainment?
Table 9.6. Mean number of children by educational level
N    Mean    95% Confidence Interval for Mean