7. The manufacturers of resistors for electric circuits have put in bids to a television company.
Their prices are comparable, so the television company purchases 10 resistors of each
brand and tests them. The resistors are marked
100 ohms. Each is measured for resistance, and
the following values are found. Which company, Circuits R Us or Electronics Superstore,
would you recommend as the supplier for the
television company?
Circuits R Us
Electronics Superstore
95
91
105
103
95
99
103
101
98
99
87
106
105
79
103
102
103
97
101
98
2.7 STANDARD DEVIATION
Let’s first review the ways we can describe the spread, or variation, in a
set of data.
1. Range: the distance between the smallest and the largest value. (This
measure is not often used.)
2. Interquartile range: the difference between the first quartile and the
third quartile. This measure tells the difference between the smallest
and the largest value in the middle one-half of the data. (Recall that this
measure is quite robust.)
3. Mean absolute deviation: the average (mean) difference between the
data values (in absolute terms, that is, ignoring the plus and minus
signs) and the mean of the set of data. (This measure is quite intuitive.)
4. Variance: the average (mean) squared difference between the data
values and the mean score of the set of data. (This measure is less intuitive
than the mean absolute deviation, but it possesses nice mathematical
properties.)
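The four measures above can be sketched in a few lines of Python. This is only an illustration: the data values are made up and do not come from the text, and the "inclusive" quartile rule is an assumption, since quartile conventions differ and the book's own rule may give a slightly different interquartile range.

```python
from statistics import quantiles

# Illustrative values only; they are not taken from the text.
data = [2, 4, 4, 5, 7, 8]
n = len(data)
mean = sum(data) / n

# 1. Range: largest value minus smallest value.
data_range = max(data) - min(data)

# 2. Interquartile range: third quartile minus first quartile.
#    The "inclusive" method is an assumed convention, one of several in use.
q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# 3. Mean absolute deviation: average distance from the mean, ignoring sign.
mad = sum(abs(x - mean) for x in data) / n

# 4. Variance: average squared difference from the mean (dividing by n).
variance = sum((x - mean) ** 2 for x in data) / n

print(data_range, iqr, round(mad, 2), variance)
```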
Recall that the mean (the average) has the same units as the data it is
computed from. If the data are in inches, their mean is in inches; if the data
are city MPG values, their mean is a city MPG value; if the data are in
baskets per 10 throws, their mean is in baskets per 10 throws. Similarly, the
deviation value, which is the difference between a data point and the mean
and expresses how far off that data value is from the mean, is in the same
units as the data. Finally, the mean absolute deviation, which is the average
of the absolute values of the deviation values, is also in the same units as
the original data.
The variance, in contrast, is not in the same units as the data. Because
it is the average of the squares of the deviation values, its units are squares
of the units of the data. If the original data are in inches, the variance is
in inches squared, or square inches; if the data are city MPG values, the
variance is a city MPG value squared.
Recall that the mean absolute deviation expresses how far off the data
are, on average, from the mean of the data. The interpretation of the
variance is less clear, because its units are different from those of the original
measurements, and we cannot compare numbers having different units. For
this reason we often use another measure of the spread of data: the standard
deviation.
The standard deviation, sometimes abbreviated SD, is the square root
of the variance. This change in the variance rescales the variance to put it
on the same scale as the data and the other measures of spread. So finding
the standard deviation undoes the effect of distorting the scale of the data
by squaring the deviation values. That is, the standard deviation is a new
measure of the average difference of values from their mean. It differs from
the average distance produced by the mean absolute deviation.
The standard deviation has many important statistical properties, some
of which we will study later in this book.
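In symbols, the definitions used in this section can be written as follows. The notation is an assumption introduced here for illustration and is not the book's own: the data values are written x_1, ..., x_n and their mean is written with a bar. The divisor is n, as in the worked examples of this section.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\text{variance} = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
\text{SD} = \sqrt{\text{variance}}
\]
```

Because each term of the variance is a squared deviation, the variance carries squared units; taking the square root returns the standard deviation to the units of the data.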
Example 2.10
Consider Jayne’s basketball scoring record again.
Early in the season:
Variance = 6.7
Standard deviation = √6.7 ≈ 2.6
Late in the season:
Variance = 0.67
Standard deviation = √0.67 ≈ 0.81
Just as the variance of Jayne’s late-season values is less than the variance of
her early-season values, so is the standard deviation of her late-season values less
than the standard deviation of her early-season values. The standard deviation
of Jayne’s early-season values, 2.6, is again a kind of average difference between
Jayne’s baskets-per-10-throws values and their average. Taking the square root
undoes the effect of squaring the deviation scores, which was done to find the
variance. In this new way of thinking about the average, we can say that, on the
average, Jayne’s shooting record was about 2.6 from her mean of 5 baskets in every
10 throws. Late in the season, she was shooting much closer to her average value,
or mean (which was now 6), since the standard deviation went down to 0.81.
Table 2.9  Deviation Values for MPG

Make and model     City MPG   Deviation MPG value   Deviation MPG value squared
Geo Metro             46              16                        256
Dodge Colt            29              −1                          1
Chevrolet Astro       15             −15                        225
Mean                  30               0              482/3 = 160.7

Example 2.11
Using Table 2.9, find the standard deviation of city mileage for the three cars.
Solution
We know that the standard deviation is given by the square root of the variance.
From Table 2.9,
Square root of variance = √160.7 = 12.7
It is instructive to compare the standard deviation of these mileages with their
mean absolute deviation. The mean absolute deviation is 10.7 miles per gallon. This is seen by
averaging the magnitudes of the deviations from Table 2.9: (16 + 1 + 15)/3 = 10.7.
We interpret this by saying that on average, the cars differed from the mean mileage
by 10.7 miles per gallon. (Two got less than the average of 30 MPG, and one got
more than 30 MPG.) Similarly, we interpret the standard deviation of 12.7 miles
per gallon by saying that “on average,” the cars differed from the mean mileage
by 12.7 miles per gallon. But this time the average is determined by squaring how
much each car deviated (differed) from the average MPG, averaging these squared
deviation scores, and then taking the square root. Although these two statistics are
on the same scale, they will almost always differ in value.
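The arithmetic of Example 2.11 can be checked with a short Python sketch. It uses the three city MPG values from Table 2.9 and divides by n, as the example does; the variable names are chosen here for illustration.

```python
import math

# City MPG for the Geo Metro, Dodge Colt, and Chevrolet Astro (Table 2.9).
city_mpg = [46, 29, 15]
n = len(city_mpg)

mean_mpg = sum(city_mpg) / n                      # 30.0
deviations = [x - mean_mpg for x in city_mpg]     # 16, -1, -15

# Variance: average of the squared deviations (482 / 3).
variance = sum(d ** 2 for d in deviations) / n

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Mean absolute deviation: average of the deviation magnitudes (32 / 3).
mad = sum(abs(d) for d in deviations) / n

print(round(variance, 1), round(std_dev, 1), round(mad, 1))  # 160.7 12.7 10.7
```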
As this example suggests, the two measures of variation are often
rather close. The mean absolute deviation, however, is more robust against a small
proportion of unusually large or small numbers. The square of a large
deviation is huge, so it tends to inflate the variance, and with it the
standard deviation derived from it, above the mean absolute deviation.
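The inflating effect of one large deviation can be seen in a small numerical sketch. The two data sets below are made up for illustration and are not from the text; the helper name `spread` is likewise hypothetical.

```python
import math

def spread(values):
    """Return (mean absolute deviation, standard deviation), dividing by n."""
    n = len(values)
    mean = sum(values) / n
    mad = sum(abs(x - mean) for x in values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return mad, sd

# Hypothetical data: the second set replaces the largest value with an outlier.
base = [10, 12, 14, 16, 18]
with_outlier = [10, 12, 14, 16, 98]

mad_base, sd_base = spread(base)
mad_out, sd_out = spread(with_outlier)

# The gap between the standard deviation and the mean absolute deviation
# widens once the unusually large value is present.
print(round(sd_base - mad_base, 2), round(sd_out - mad_out, 2))
```

Both measures grow when the outlier is added, but the standard deviation grows by more, because the outlier's deviation is squared before averaging.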
In summary, if you wish to use statistics with convenient mathematical
properties, you are likely to prefer the mean and the standard deviation.
However, if robustness is important (and many statisticians now insist that it
is essential), then you would choose the median and either the interquartile
range (as shown in a box plot) or the mean absolute deviation.
Note: For technical reasons, we often divide by n − 1 instead of n to
obtain the variance and standard deviation of a data set. If using a calculator,
you should check which yours does (many do both). If yours only divides
by n − 1, multiplying the variance by (n − 1)/n or the standard deviation
by √((n − 1)/n) will convert the calculator’s answer to this textbook’s answer.
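The relationship between the two divisors can be checked numerically. The sketch below uses Python's standard `statistics` module, in which `variance` and `stdev` divide by n − 1 while `pvariance` and `pstdev` divide by n; the data are the three city MPG values from Table 2.9.

```python
import math
import statistics

city_mpg = [46, 29, 15]  # values from Table 2.9
n = len(city_mpg)

# "Calculator-style" results that divide by n - 1.
var_n_minus_1 = statistics.variance(city_mpg)
sd_n_minus_1 = statistics.stdev(city_mpg)

# Results that divide by n, as in this section's worked examples.
var_n = statistics.pvariance(city_mpg)
sd_n = statistics.pstdev(city_mpg)

# Converting from the n - 1 divisor to the n divisor.
converted_var = var_n_minus_1 * (n - 1) / n
converted_sd = sd_n_minus_1 * math.sqrt((n - 1) / n)

print(math.isclose(converted_var, var_n), math.isclose(converted_sd, sd_n))
```

The factor is less than 1 because dividing by the smaller number n − 1 always gives the larger variance.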
SECTION 2.7 EXERCISES
1. The number of accidents occurring in each of
five weeks on a busy freeway are given below.
Find the variance and standard deviation of
these data.
4   0   6   10   5
2. The number of defective valves found in each
of four batches of 1000 in a machine shop are
given below. Find the range, mean deviation,
variance, and standard deviation of these data.
2   4   0   10
3. Refer to the quiz scores of Exercise 1 in Section
2.5. Find the standard deviation of the quiz
scores.
4. Find the variance and standard deviation for
each of the following data sets:
No 1:   10   15   20   25
No 2:   10   15   20   2500
How are the variance and standard deviation
affected by changing the 25 in the first data set
to a 2500 in the second data set?
5. Find the mean absolute deviation of the two
data sets in Exercise 4. Which do you think is a
more robust measure of variation: the mean
deviation or the standard deviation? Explain
your answer. (Hint: Reread the discussion of
the robustness of statistics in Section 2.2.)
6. The following data are the mean ages (in
months) and the standard deviations of students who were tested in an international
study of mathematics.

Country              Mean   Standard deviation
Belgium (Flemish)     171          8.0
Belgium (French)      174         11.3
Canada (B.C.)         168          6.0
France                170          8.3
Hungary               171         13.4
Japan                 162          3.5
New Zealand           168          5.4
Nigeria               200         37.7
Scotland              168          4.3
Swaziland             188         22.5
United States         170          6.0
Source: Second International Mathematics Study,
1987.

a. In which country were the students the
closest in age?
b. In which country were the students the
farthest apart in age?
c. Explain your answers to parts (a) and (b).
CHAPTER REVIEW EXERCISES
1. The heights (in inches) of the five starting members of a basketball team are
listed below. Find the mean and median height.
71   75   78   81   84
2. Suppose the shortest player is replaced by a 73-inch-tall player. What are the
new mean and median heights? Why does the mean change?