95% Confidence Intervals: Illustrating Uncertainty in Summarized Data
Paul K. Strode, Ph.D., Fairview High School, Boulder, CO
Consider the following scenario and data:
Huck found two very large fish tanks in an abandoned barn. He moved them to his room and
filled them with water, sand, and some submerged plants from the lower Mississippi River. He
then went fishing. Huck returned home with several flounder (a fish species; Paralichthys
lethostigma) that he put in his two tanks so that he could eat them later. Within a week, two of the
ten flounders in the tank next to his south-facing window (Tank 1) had died, but the ten in the
tank next to the inner wall of his room (Tank 2) were doing great. Huck went to his friend Tom
and told him, “Hey Tom I catched some fish in the river and now two of ‘em gone and went
belly up.” Tom went to the town library, did some research, and discovered that flounders have a
rather narrow range of tolerance for water temperature. When Tom returned to Huck’s house,
Huck had just returned from the school house carrying a lab-grade thermometer. “Where did you
get that?” asked Tom. “Pap always said it warn't no harm to borrow things, if you was meaning
to pay them back, sometime,” said Huck. Tom responded, “That’s nothing but a soft name for
stealing.” At Tom’s suggestion, the two took the temperature of both tanks at 8 PM that evening.
Tank 1 had a temperature of 66 °F and Tank 2 had a temperature of 65 °F. “Well, it must be
somethin’ else,” Huck said. “Not so fast!” said Tom. “The temperatures of the tanks may vary
over a 24-hour period, and they may vary differently depending on their location in the room and
their proximity to the sunny window. In fact, I predict the tank by the window that gets hit by the
sun during the day will have a higher mean temperature than the tank against the wall.” The two
then took turns measuring the tank temperatures every two hours over the next 24 hours,
converted the temperatures to Celsius, and recorded their data in the table below.
Time     Tank 1 Temp (°C)   Tank 2 Temp (°C)
8 PM     18.9               18.3
10 PM    18.7               18.1
12 AM    18.4               17.9
2 AM     18.1               17.4
4 AM     17.5               17.1
6 AM     17.5               17.0
8 AM     19.2               17.6
10 AM    20.5               18.4
12 PM    23.7               18.9
2 PM     23.9               18.9
4 PM     23.1               18.9
6 PM     19.4               18.4
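The conversion to Celsius that the boys performed can be sketched in a few lines of Python. This is only an illustration, not part of the original handout; the function name is made up for the example. Applied to the two 8 PM readings from the story, it gives the first row of the table.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# The two 8 PM readings from the story, in °F.
tank1_8pm_c = fahrenheit_to_celsius(66)
tank2_8pm_c = fahrenheit_to_celsius(65)

print(round(tank1_8pm_c, 1))  # 18.9
print(round(tank2_8pm_c, 1))  # 18.3
```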
After the last measurement, the boys calculated the mean (x̄) for each tank over the 24-hour
period. They also calculated the sample standard deviation (s) (see note a below) for each
tank’s set of data to get a sense of how much the measurements as a whole in each tank deviated
from (disagreed with) the mean for that tank. The means for Tanks 1 and 2 were 19.9 °C and
18.1 °C, respectively. The standard deviations for Tanks 1 and 2 were 2.36 °C and 0.68 °C, respectively.
Huck had already noticed that the temperatures for Tank 1 were spread out all over the place
while the temperatures for Tank 2 were less spread out. Tom knew that for a normally distributed
random variable, around 68% of the measurements should fall within 1 standard deviation of
the mean. He also knew that around 95% of the measurements should fall within 2 standard
deviations of the mean. He drew Huck Figure 1 to illustrate.
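For readers who want to check the boys' arithmetic, here is a minimal Python sketch of the mean and sample standard deviation, following the n – 1 definition in note a. The function names are illustrative and not part of the original handout; the data are copied from the table above.

```python
import math

# Temperatures (°C) from the table, 8 PM through 6 PM.
tank1 = [18.9, 18.7, 18.4, 18.1, 17.5, 17.5, 19.2, 20.5, 23.7, 23.9, 23.1, 19.4]
tank2 = [18.3, 18.1, 17.9, 17.4, 17.1, 17.0, 17.6, 18.4, 18.9, 18.9, 18.9, 18.4]


def sample_mean(values):
    """Arithmetic mean of the measurements."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation: summed squared deviations divided by n - 1."""
    mean = sample_mean(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / (len(values) - 1)
    return math.sqrt(variance)


for name, temps in (("Tank 1", tank1), ("Tank 2", tank2)):
    print(name, round(sample_mean(temps), 1), round(sample_sd(temps), 2))
# Tank 1 19.9 2.36
# Tank 2 18.1 0.68
```

Python's built-in `statistics.stdev` also divides by n – 1, so it should give the same standard deviations.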
Tom then graphed the means with a bar graph and asked Huck if he thought the means were
different (Figure 2). “Heck yes the means is different!” said Huck. Tom then asked Huck if he
was sure those were the true means for the two tanks. “I don’t get it,” said Huck. Tom asked the
question differently: “What I mean is, if we recorded these data again over the next 24 hours and
calculated the means, then did it again the next 24 hours, would we keep getting the same mean?”
“I suppose not exactly,” answered Huck. “Then how confident are you that those are the true
means for the tanks?” asked Tom. “I guess I’m not that confident now that you put it that way,”
said Huck. “What if I told you we could come up with a range of temperatures within which the
true means likely lie?” asked Tom. “I would like that,” said Huck, nodding.
Tom then went on to explain 95% confidence intervals (95% CI) (see note b below) to Huck. He
told Huck that if they took the standard deviation (s) for each tank’s temperature data,
multiplied it by about two (1.96 in Equation 1), then divided by the square root of the number of
measurements (n) they took for each tank, they would have a range within which the true means
likely were. Tom used Equation 1 below and calculated the 95% confidence interval for each tank.
The values, computed with the multiplier rounded to 2, were 1.36 °C and 0.39 °C for Tank 1 and
Tank 2, respectively. Tom then drew a new bar graph, added
the confidence interval to each of the mean bars, and showed Huck Figure 3. He explained to
Huck that the confidence intervals were also called error bars and showed the uncertainty in the
means. In other words, he explained, if they did the temperature study of the tanks 100 times, 95 out of
those times they could expect the mean for Tank 1 to lie somewhere between 19.90 – 1.36 and
19.90 + 1.36, or 18.54 °C and 21.26 °C. The mean for Tank 2 should lie between 17.71 °C and 18.49 °C for
95 out of 100 tries. “Well looky there,” said Huck, pointing to the error bars, “the bottom of the
error bar for Tank 1 doesn’t overlap the top of the error bar for Tank 2. So, the two tanks may
actually have different temperature profiles over 24 hours. I told you the means were probly
differnt!” “So do you think we should move Tank 1?” asked Tom. “No,” said Huck, “I’ll just eat
the fish in that tank and then fill it with turtles.”
$$\text{95\% CI} = \frac{1.96\,s}{\sqrt{n}} \qquad \text{(Eq. 1)}$$
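The confidence interval calculation can be sketched in Python as follows. This is an illustration rather than part of the original handout: the function names and the `multiplier` parameter are assumptions made for the example. With a multiplier of 2, as in Tom's spoken description, the sketch reproduces the 1.36 and 0.39 reported in the story; with the 1.96 of Equation 1, the interval for Tank 1 comes out slightly narrower.

```python
import math
import statistics

# Temperatures (°C) from the table, 8 PM through 6 PM.
tank1 = [18.9, 18.7, 18.4, 18.1, 17.5, 17.5, 19.2, 20.5, 23.7, 23.9, 23.1, 19.4]
tank2 = [18.3, 18.1, 17.9, 17.4, 17.1, 17.0, 17.6, 18.4, 18.9, 18.9, 18.9, 18.4]


def ci95_half_width(values, multiplier=1.96):
    """Half-width of the 95% CI: multiplier * s / sqrt(n), as in Equation 1."""
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return multiplier * s / math.sqrt(len(values))


def ci95_interval(values, multiplier=1.96):
    """Lower and upper ends of the error bar around the sample mean."""
    mean = statistics.mean(values)
    half = ci95_half_width(values, multiplier)
    return mean - half, mean + half


# Multiplier rounded to 2, matching the values in the story.
print(round(ci95_half_width(tank1, 2), 2))  # 1.36
print(round(ci95_half_width(tank2, 2), 2))  # 0.39

# Multiplier of 1.96, as written in Equation 1.
print(round(ci95_half_width(tank1), 2))  # 1.33
print(round(ci95_half_width(tank2), 2))  # 0.39

# Huck's check: does the bottom of Tank 1's bar clear the top of Tank 2's?
tank1_low, _ = ci95_interval(tank1, 2)
_, tank2_high = ci95_interval(tank2, 2)
print(tank1_low > tank2_high)  # True
```

Keeping the multiplier as a parameter makes it easy to see how little the choice between 2 and 1.96 changes the error bars for a sample of this size.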
Figure 1. Standard deviations (σ) and the normal distribution for a large population of random
measurements. s is the symbol for the standard deviation of a sample of measurements from a
population and σ as seen here is the symbol for the standard deviation for an entire population.
Figure 2. Mean temperature (°C) over 24 hours for Tank 1 and Tank 2.
Figure 3. Mean temperature (°C) over 24 hours for Tank 1 and Tank 2. Error bars are 95% confidence intervals.
Practice Problem
The data in the table below come from Letourneau and
Dyer (1998), who experimentally added a predatory
ant-eating beetle to the trophic system (food chain) that
includes the piper tree (Piper cenocladum), whose leaves
are eaten by a small herbivore, which in turn is eaten by a
predatory ant. The data in the table are the leaf area
remaining (cm²) on Piper seedlings (little trees) in each of
ten plots after an 18-month experiment. The standard
deviation (s) has been calculated for you.
1. Use 95% confidence intervals to determine if the
addition of the predatory beetle may have had an
effect on the leaf area eaten by the herbivores (see note b below).
2. Graph the means and add the 95% confidence
intervals as error bars. Label your graph axes and
write a descriptive figure caption across the bottom
of your graph.
Plot   Control (cm²)   Predatory Beetle Added (cm²)
1      208             145
2      287             160
3      198             123
4      223             187
5      248             120
6      198             87
7      290             143
8      267             187
9      287             140
10     294             158
s      40.1            30.6
Note a: The Standard Deviation (s) (how spread out the numbers in a sample are) of a sample is
the square root of the Variance (s²) of the sample. Variance (error within a sample) is calculated
by subtracting the sample mean from each datum in the sample and then squaring the difference. We then sum
(add up) all those values and divide by the sample size (n) minus 1 (n – 1). Dividing by one
less than the actual sample size keeps us from underestimating the variance.
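Written as formulas, the steps in this note look like the following. The notation is a sketch added for clarity: x_i stands for an individual measurement and x̄ for the sample mean, symbols the note itself describes only in words.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},
\qquad
s = \sqrt{s^2}
```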
Note b: 95% confidence intervals only describe the set of data (and their mean) from which they
are calculated. In other words, confidence intervals and any other error bars only describe the
uncertainty of a calculated mean from a single set of data. Other statistical tests are used to
directly compare two means and determine whether the observed difference is statistically significant.