7. The manufacturers of resistors for electric circuits have put in bids to a television company. Their prices are comparable, so the television company purchases 10 resistors of each brand and tests them. The resistors are marked 100 ohms. Each is measured for resistance, and the following values are found. Which company, Circuits R Us or Electronics Superstore, would you recommend as the supplier for the television company?

Circuits R Us   Electronics Superstore
95 91 105 103 95 99 103 101 98 99 87 106 105 79 103 102 103 97 101 98

2.7 STANDARD DEVIATION

Let’s first review the ways we can describe the spread, or variation, in a set of data.

1. Range: the distance between the smallest and the largest value. (This measure is not often used.)
2. Interquartile range: the difference between the first quartile and the third quartile. This measure tells the difference between the smallest and the largest value in the middle one-half of the data. (Recall that this measure is quite robust.)
3. Mean absolute deviation: the average (mean) difference between the data values (in absolute terms, that is, ignoring the plus and minus signs) and the mean of the set of data. (This measure is quite intuitive.)
4. Variance: the average (mean) squared difference between the data values and the mean score of the set of data. (This measure is less intuitive than the mean absolute deviation, but it possesses nice mathematical properties.)

Recall that the mean (the average) has the same units as the data it is computed from. If the data are in inches, their mean is in inches; if the data are city MPG values, their mean is a city MPG value; if the data are in baskets per 10 throws, their mean is in baskets per 10 throws. Similarly, the deviation value, which is the difference between a data point and the mean and expresses how far off that data value is from the mean, is in the same units as the data.
Finally, the mean absolute deviation, which is the average of the absolute values of the deviation values, is also in the same units as the original data.

The variance, in contrast, is not in the same units as the data. Because it is the average of the squares of the deviation values, its units are the squares of the units of the data. If the original data are in inches, the variance is in inches squared, or square inches; if the data are city MPG values, the variance is in city MPG squared.

Recall that the mean absolute deviation expresses how far off the data are, on average, from the mean of the data. The interpretation of the variance is less clear, because its units are different from those of the original measurements, and we cannot compare numbers having different units. For this reason we often use another measure of the spread of data: the standard deviation.

The standard deviation, sometimes abbreviated SD, is the square root of the variance. Taking the square root rescales the variance, putting it on the same scale as the data and the other measures of spread. So finding the standard deviation undoes the distortion of scale that was introduced by squaring the deviation values. That is, the standard deviation is a new measure of the average difference of values from their mean. It differs from the average distance produced by the mean absolute deviation. The standard deviation has many important statistical properties, some of which we will study later in this book.

Example 2.10
Consider Jayne’s basketball scoring record again.

Early in the season:
Variance = 6.7
Standard deviation = √6.7 ≈ 2.6

Late in the season:
Variance = 0.67
Standard deviation = √0.67 ≈ 0.81

Just as the variance of Jayne’s late-season values is less than the variance of her early-season values, so is the standard deviation of her late-season values less than the standard deviation of her early-season values.
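The square-root step in Example 2.10 can be checked with a few lines of Python. This is only a sketch: the two variances are the ones quoted in the example, and the variable names are chosen here for illustration.

```python
import math

# Variances quoted in Example 2.10
# (units: baskets per 10 throws, squared).
variance_early = 6.7
variance_late = 0.67

# The standard deviation is the square root of the variance,
# which returns the measure to the units of the data.
sd_early = math.sqrt(variance_early)
sd_late = math.sqrt(variance_late)

print(f"Early-season standard deviation: {sd_early:.1f}")
print(f"Late-season standard deviation:  {sd_late:.1f}")
```

Because the square root preserves order for nonnegative numbers, the smaller variance always gives the smaller standard deviation.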
The standard deviation of Jayne’s early-season values, 2.6, is again a kind of average difference between Jayne’s baskets-per-10-throws values and their average. Taking the square root undoes the effect of squaring the deviation scores, which was done to find the variance. In this new way of thinking about the average, we can say that, on the average, Jayne’s shooting record was about 2.6 from her mean of 5 baskets in every 10 throws. Late in the season, she was shooting much closer to her average value, or mean (which was now 6), since the standard deviation went down to 0.81.

Table 2.9 Deviation Values for MPG

Make and model     City MPG   Deviation MPG value   Deviation MPG value squared
Geo Metro          46         16                    256
Dodge Colt         29         −1                    1
Chevrolet Astro    15         −15                   225
Mean               30         0                     482/3 = 160.7

Example 2.11
Using Table 2.9, find the standard deviation of city mileage for the three cars.

Solution
We know that the standard deviation is given by the square root of the variance. From Table 2.9,

Square root of variance = √160.7 ≈ 12.7

It is instructive to compare the standard deviation of these mileages with their mean absolute deviation. The mean absolute deviation is 10.7 miles per gallon. This is seen by averaging the magnitudes of the deviations from Table 2.9: (16 + 1 + 15)/3 ≈ 10.7. We interpret this by saying that on average, the cars differed from the mean mileage by 10.7 miles per gallon. (Two got less than the average of 30 MPG, and one got more than 30 MPG.)

Similarly, we interpret the standard deviation of 12.7 miles per gallon by saying that “on average,” the cars differed from the mean mileage by 12.7 miles per gallon. But this time the average is determined by squaring how much each car deviated (differed) from the average MPG, averaging these squared deviation scores, and then taking the square root. Although these two statistics are on the same scale, they will almost always differ in value. As this example suggests, however, the two measures of variation will often be rather close.
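The arithmetic of Example 2.11 can be reproduced with a short Python sketch. It uses the three city MPG values from Table 2.9 and divides by n, following the definition of variance as the mean squared deviation used in this section; the variable names are illustrative only.

```python
import math

# City MPG values from Table 2.9.
city_mpg = {"Geo Metro": 46, "Dodge Colt": 29, "Chevrolet Astro": 15}

values = list(city_mpg.values())
n = len(values)
mean = sum(values) / n                       # 90 / 3 = 30

# Deviation of each value from the mean: 16, -1, -15.
deviations = [x - mean for x in values]

# Variance as defined in this section: the mean of the squared
# deviations (dividing by n).
variance = sum(d ** 2 for d in deviations) / n       # 482 / 3
std_dev = math.sqrt(variance)

# Mean absolute deviation: the mean of the deviation magnitudes.
mean_abs_dev = sum(abs(d) for d in deviations) / n   # 32 / 3

print(f"variance = {variance:.1f}")
print(f"standard deviation = {std_dev:.1f}")
print(f"mean absolute deviation = {mean_abs_dev:.1f}")
```

Python’s standard statistics module offers pvariance and pstdev, which use the same divide-by-n definition and could replace the hand-written sums.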
The mean absolute deviation is more robust against a small proportion of unusually large or small numbers. The square of a large deviation is huge, so a few extreme values inflate the variance, and with it the standard deviation, far more than they inflate the mean absolute deviation.

In summary, if you wish to use statistics with more mathematical properties, you are likely to prefer the mean and the standard deviation. However, if robustness is important (and many statisticians now insist that it is essential), then you would choose the median and either the interquartile range (of the box plot) or the mean absolute deviation.

Note: For technical reasons, we often divide by n − 1 instead of n to obtain the variance and standard deviation of a data set. If using a calculator, you should check which yours does (many do both). If yours only divides by n − 1, multiplying the variance by (n − 1)/n or the standard deviation by √((n − 1)/n) will convert the calculator’s answer to this textbook’s answer.

SECTION 2.7 EXERCISES

1. The numbers of accidents occurring in each of five weeks on a busy freeway are given below. Find the variance and standard deviation of these data.

4 0 6 10 5

2. The numbers of defective valves found in each of four batches of 1000 in a machine shop are given below. Find the range, mean absolute deviation, variance, and standard deviation of these data.

2 4 0 10

3. Refer to the quiz scores of Exercise 1 in Section 2.5. Find the standard deviation of the quiz scores.

4. Find the variance and standard deviation for each of the following data sets:

No. 1: 10 15 20 25
No. 2: 10 15 20 2500

How are the variance and standard deviation affected by changing the 25 in the first data set to a 2500 in the second data set?

5. Find the mean absolute deviation of the two data sets in Exercise 4. Which do you think is a more robust measure of variation: the mean absolute deviation or the standard deviation? Explain your answer. (Hint: Reread the discussion of the robustness of statistics in Section 2.2.)

6. The following data are the mean ages (in months) and the standard deviations of students who were tested in an international study of mathematics.

Country              Mean   Standard deviation
Belgium (Flemish)    171    8.0
Belgium (French)     174    11.3
Canada (B.C.)        168    6.0
France               170    8.3
Hungary              171    13.4
Japan                162    3.5
New Zealand          168    5.4
Nigeria              200    37.7
Scotland             168    4.3
Swaziland            188    22.5
United States        170    6.0

Source: Second International Mathematics Study, 1987.

a. In which country were the students the closest in age?
b. In which country were the students the farthest apart in age?
c. Explain your answers to parts (a) and (b).

CHAPTER REVIEW EXERCISES

1. The heights (in inches) of the five starting members of a basketball team are listed below. Find the mean and median height.

71 75 78 81 84

2. Suppose the shortest player is replaced by a 73-inch-tall player. What are the new mean and median heights? Why does the mean change?