95% Confidence Intervals: Illustrating Uncertainty in Summarized Data
Paul K. Strode, Ph.D., Fairview High School, Boulder, CO

Consider the following scenario and data:

Huck found two very large fish tanks in an abandoned barn. He moved them to his room and filled them with water, sand, and some submerged plants from the lower Mississippi River. He then went fishing. Huck returned home with several flounder (a fish species; Paralichthys lethostigma) that he put in his two tanks so that he could eat them later. Within a week, two of the ten flounders in the tank next to his south-facing window (Tank 1) had died, but the ten in the tank next to the inner wall of his room (Tank 2) were doing great.

Huck went to his friend Tom and told him, “Hey Tom, I catched some fish in the river and now two of ’em gone and went belly up.” Tom went to the town library, did some research, and discovered that flounders have a rather narrow range of tolerance for water temperature.

When Tom returned to Huck’s house, Huck had just returned from the school house carrying a lab-grade thermometer. “Where did you get that?” asked Tom.

“Pap always said it warn’t no harm to borrow things, if you was meaning to pay them back, sometime,” said Huck.

Tom responded, “That’s nothing but a soft name for stealing.”

At Tom’s suggestion, the two took the temperature of both tanks at 8 PM that evening. Tank 1 had a temperature of 66 °F and Tank 2 had a temperature of 65 °F.

“Well, it must be somethin’ else,” Huck said.

“Not so fast!” said Tom. “The temperatures of the tanks may vary over a 24-hour period, and they may vary differently depending on their location in the room and their proximity to the sunny window.
In fact, I predict the tank by the window that gets hit by the sun during the day will have a higher mean temperature than the tank against the wall.”

The two then took turns measuring the tank temperatures every two hours over the next 24 hours, converted the temperatures to Celsius, and recorded their data in the table below.

Time     Tank 1 Temp (°C)   Tank 2 Temp (°C)
8 PM     18.9               18.3
10 PM    18.7               18.1
12 AM    18.4               17.9
2 AM     18.1               17.4
4 AM     17.5               17.1
6 AM     17.5               17.0
8 AM     19.2               17.6
10 AM    20.5               18.4
12 PM    23.7               18.9
2 PM     23.9               18.9
4 PM     23.1               18.9
6 PM     19.4               18.4

After the last measurement, the boys calculated the mean (x̄) for each tank over the 24-hour period. They also calculated the sample standard deviation (s) (see note a below) for each tank’s set of data to get a sense of how much the measurements in each tank, taken as a whole, deviated from (disagreed with) that tank’s mean. The means for Tanks 1 and 2 were 19.9 °C and 18.1 °C, respectively, and the standard deviations were 2.36 °C and 0.68 °C, respectively. Huck had already noticed that the temperatures for Tank 1 were spread out all over the place, while the temperatures for Tank 2 were less spread out.

Tom knew that for a normally distributed random variable, about 68% of the measurements should fall within 1 standard deviation of the mean, and about 95% should fall within 2 standard deviations of the mean. He drew Huck a figure showing this (Figure 1).

Tom then graphed the means with a bar graph and asked Huck if he thought the means were different (Figure 2).

“Heck yes the means is different!” said Huck.

Tom then asked Huck if he was sure those were the true means for the two tanks.

“I don’t get it,” said Huck.

Tom asked the question differently: “What I mean is, if we recorded these data again over the next 24 hours and calculated the means, then did it again the next 24 hours, would we keep getting the same mean?”

“I suppose not exactly,” answered Huck.
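The boys' means and sample standard deviations can be reproduced with a minimal Python sketch. It assumes the twelve readings per tank from the table above. The helper names `sample_sd` and `summarize` are illustrative only and do not come from the original. `sample_sd` follows the n – 1 steps described in note a. As a cross-check, the sketch compares it against the standard library's `statistics.stdev`, which also computes the sample standard deviation.

```python
import math
import statistics

# Temperatures (°C) recorded every two hours from 8 PM to 6 PM, copied from the table.
TANK_1 = [18.9, 18.7, 18.4, 18.1, 17.5, 17.5, 19.2, 20.5, 23.7, 23.9, 23.1, 19.4]
TANK_2 = [18.3, 18.1, 17.9, 17.4, 17.1, 17.0, 17.6, 18.4, 18.9, 18.9, 18.9, 18.4]


def sample_sd(values):
    """Sample standard deviation, following the steps in note a."""
    n = len(values)
    mean = sum(values) / n
    # Square each datum's difference from the mean, then add them all up.
    sum_sq = sum((x - mean) ** 2 for x in values)
    # Divide by n - 1 (not n) to get the sample variance, then take the square root.
    return math.sqrt(sum_sq / (n - 1))


def summarize(values):
    """Return (mean, sample standard deviation) for one tank's readings."""
    return sum(values) / len(values), sample_sd(values)


for name, temps in (("Tank 1", TANK_1), ("Tank 2", TANK_2)):
    mean, s = summarize(temps)
    # The hand-rolled formula should agree with the standard library.
    assert math.isclose(s, statistics.stdev(temps))
    print(f"{name}: mean = {mean:.3f} °C, s = {s:.3f} °C")
```

When rounded, the printed values match the means of 19.9 °C and 18.1 °C and the standard deviations of 2.36 °C and 0.68 °C reported in the story.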
“Then how confident are you that those are the true means for the tanks?” asked Tom.

“I guess I’m not that confident now that you put it that way,” said Huck.

“What if I told you we could come up with a range of temperatures within which the true means likely lie?” asked Tom.

“I would like that,” said Huck, nodding.

Tom then went on to explain 95% confidence intervals (95% CI) (see note b below) to Huck. He told Huck to take the standard deviation (s) of each tank’s temperature data, multiply it by about two (1.96, to be exact), and then divide by the square root of the number of measurements (n) taken for that tank. The result would give them a margin on either side of the mean within which the true mean likely was. Tom used Equation 1 below, rounding 1.96 to 2, and calculated the 95% confidence interval for each tank. The values were 1.36 °C for Tank 1 and 0.39 °C for Tank 2.

Tom then drew a new bar graph, added the confidence interval to each of the mean bars, and showed Huck Figure 3. He explained to Huck that the confidence intervals were drawn as error bars and showed the uncertainty in the means. Suppose they repeated the 24-hour temperature study 100 times and calculated a 95% confidence interval each time. About 95 of those intervals should then contain the true mean. For the data they had, the true mean for Tank 1 likely lay somewhere between 19.90 – 1.36 and 19.90 + 1.36, or between 18.54 °C and 21.26 °C. The true mean for Tank 2 likely lay between 18.10 – 0.39 and 18.10 + 0.39, or between 17.71 °C and 18.49 °C.

“Well looky there,” said Huck, pointing to the error bars, “the bottom of the error bar for Tank 1 doesn’t overlap the top of the error bar for Tank 2. So, the two tanks may actually have different temperature profiles over 24 hours. I told you the means were probly differnt!”

“So do you think we should move Tank 1?” asked Tom.

“No,” said Huck, “I’ll just eat the fish in that tank and then fill it with turtles.”

$$95\%\ \mathrm{CI} = 1.96\,\frac{s}{\sqrt{n}} \qquad \text{(Eq. 1)}$$

Figure 1. Standard deviations (σ) and the normal distribution for a large population of random measurements.
s is the symbol for the standard deviation of a sample of measurements from a population, and σ, as seen here, is the symbol for the standard deviation of an entire population.

Figure 2. Mean temperature over 24 hours. (Bar graph of Tank 1 and Tank 2; y-axis: Temperature (°C), 15 to 23.)

Figure 3. Mean temperature over 24 hours. Error bars are 95% confidence intervals. (Bar graph of Tank 1 and Tank 2; y-axis: Temperature (°C), 15 to 23.)

Practice Problem

The data in the table below come from Letourneau and Dyer (1998). In their experiment, they added a predatory ant-eating beetle to a trophic system (food chain) built around the piper tree (Piper cenocladum). The tree's leaves are eaten by a small herbivore, which in turn is eaten by a predatory ant. The data in the table are the leaf area remaining (cm²) on Piper seedlings (little trees) in each of ten plots after an 18-month experiment. The standard deviation (s) has been calculated for you.

1. Use 95% confidence intervals to determine whether the addition of the predatory beetle may have had an effect on the leaf area eaten by the herbivores (see note b).
2. Graph the means and add the 95% confidence intervals as error bars. Label your graph axes and write a descriptive figure caption across the bottom of your graph.

Plot   Control (cm²)   Predatory Beetle Added (cm²)
1      208             145
2      287             160
3      198             123
4      223             187
5      248             120
6      198             87
7      290             143
8      267             187
9      287             140
10     294             158
s      40.1            30.6

a Note: The standard deviation (s) of a sample (how spread out the numbers in the sample are) is the square root of the variance (s²) of the sample. The variance (error within a sample) is calculated by subtracting the sample mean from each datum in the sample and squaring the difference. We then sum (add up) all of those squared differences and divide by the sample size (n) minus 1 (n – 1). Dividing by one less than the actual sample size keeps us from underestimating the variance.

b Note: 95% confidence intervals describe only the set of data (and its mean) from which they are calculated.
In other words, confidence intervals and other error bars describe only the uncertainty of a mean calculated from a single set of data. Other statistical tests are used to compare two means directly and determine whether an observed difference between them is statistically significant.
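Equation 1 and the interval arithmetic in the story can be checked with a short Python sketch. It assumes the same twelve readings per tank from the temperature table. It uses the standard library's `statistics.stdev` (sample standard deviation) and `statistics.mean`. The names `ci_half_width`, `ci_bounds`, and `bars_overlap` are illustrative helpers, not part of the original. The sketch computes the half-width both with the 1.96 multiplier of Equation 1 and with the rounded multiplier of 2.

```python
import math
import statistics

# Temperatures (°C) from the table, recorded every two hours over 24 hours.
TANK_1 = [18.9, 18.7, 18.4, 18.1, 17.5, 17.5, 19.2, 20.5, 23.7, 23.9, 23.1, 19.4]
TANK_2 = [18.3, 18.1, 17.9, 17.4, 17.1, 17.0, 17.6, 18.4, 18.9, 18.9, 18.9, 18.4]


def ci_half_width(values, multiplier=1.96):
    """Equation 1: multiplier * s / sqrt(n), the distance from the mean to each end of the bar."""
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return multiplier * s / math.sqrt(len(values))


def ci_bounds(values, multiplier=1.96):
    """Return (lower, upper): the mean minus and plus the half-width."""
    mean = statistics.mean(values)
    half = ci_half_width(values, multiplier)
    return mean - half, mean + half


def bars_overlap(a, b):
    """True if two (lower, upper) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


for multiplier in (1.96, 2):
    ci_1 = ci_bounds(TANK_1, multiplier)
    ci_2 = ci_bounds(TANK_2, multiplier)
    print(f"multiplier {multiplier}:")
    print(f"  Tank 1: {ci_1[0]:.2f} to {ci_1[1]:.2f} °C "
          f"(half-width {ci_half_width(TANK_1, multiplier):.2f})")
    print(f"  Tank 2: {ci_2[0]:.2f} to {ci_2[1]:.2f} °C "
          f"(half-width {ci_half_width(TANK_2, multiplier):.2f})")
    print(f"  error bars overlap: {bars_overlap(ci_1, ci_2)}")
```

With a multiplier of 2, the half-widths round to the story's 1.36 °C and 0.39 °C. With 1.96 they come out slightly smaller, about 1.33 °C and 0.39 °C. With either multiplier, the two bars do not overlap. The printed bounds differ a little from those in the story because the code uses unrounded means; for example, it uses 18.075 °C rather than 18.1 °C for Tank 2. As note b points out, non-overlapping bars are a visual cue, not a formal test for a difference between the means.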