Download Chapter 2 - Dr. George Fahmy

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Time series wikipedia , lookup

Transcript
Chapter 2
DESCRIPTIVE STATISTICS
2.1 Frequency Distributions
It is often useful to organize or arrange a body of data into a frequency
distribution. This breaks up the data into groups or classes and shows the
number of observations in each class. The number of classes is usually
between 5 and 15. A relative frequency distribution ins obtained by
dividing the number of observations in each class by total number of
observations in the data as a whole. The sum of the relative frequencies
equals 1. A histogram is a bar graph of a frequency distribution, where
classes are measured along the horizontal axis and frequencies along the
vertical axis. A frequency polygon is a line graph of a frequency
distribution resulting from joining the frequency of each class plotted at
the class midpoint. A cumulative frequency distribution shows, for each
class, the total number of observations in all classes up to and including
that class. When plotted, this gives a distribution curve, or ogive.
Example 1. A student received the following grades (measured from 0 to
10) on the 10 quizzes he took during a semester: 6, 7, 6, 8, 5, 7, 6, 9, 10,
and 6. These grades can be arranged into frequency distributions as in
table 2.1 and shown graphically as in Fig. 2-1.
Table 2.1 Frequency Distributions of Grades
Grades
5
6
7
8
9
10
Absolute
Frequency
1
4
2
1
1
1
10
Relative
Frequency
0.1
0.4
0.2
0.1
0.1
0.1
1.0
Example 2. The cans in a sample of 20 cans of fruit contain weights of
fruit ranging from 19.3 to 20.9 oz, as given in Table 2.2. If we want to
group these data into 6 classes, we get class intervals of 0.3 oz [(21.019.2)/6=0.3 oz]. The weights given in Table 2.2 can be arranged into the
frequency distributions given in Table 2.3 and shown graphically in Fig.
2-2.
Table 2.2 Net Weight in Ounces of Fruits
19.7 19.9 20.2 19.9 20.0 20.6 19.3 20.4 19.9 20.3
20.1 19.5 20.9 20.3 20.8 19.9 20.0 20.6 19.9 19.8
Weight, oz
19.2-19.4
19.5-19.7
19.8-20.0
20.1-20.3
20.4-20.6
20.7-20.9
Table 2.3 Frequency Distribution of Weights
Class
Absolute
Relative
Cumulative
Midpoint
Frequency
Frequency
Frequency
19.3
1
0.05
1
19.6
2
0.10
3
19.9
8
0.40
11
20.2
4
0.20
15
20.5
3
0.15
18
20.8
2
0.10
20
20
1.00
2.2 Measures of Central Tendency
Central tendency refers to the location of a distribution. The most
important measures of central tendency are (1) the mean, (2) the median,
and (3) the mode. We will be measuring these for populations (i.e., the
collection of all the elements we are describing) and for samples drawn
from populations, as well as for grouped and ungrouped data.
1.
The arithmetic mean, or average, of a population is represented by
μ (the Greek letter mu); and for a sample, by X’ (read “X’ bar”). For
ungrouped data, μ and X’ are calculated by the following formulas:


n
and X' 

n
(2.1a, b)
Where ΣΧ refers to the sum of all the observations, while N and n refer to
the number of observations in the population and sample, respectively.
For grouped data, μ and X’ are calculated by


n
and X' 

n
(2.2a, b)
Where Σ∫Χ refers to the sum of the frequency of each class f times the
class midpoint X.
2.
The medium for ungrouped data is the value of the middle item
when all the items are arranged in either ascending or descending
order in terms of values. That is,
 N 1
Median  the 
th item in the data array
 2 
(2.3)
Where N refers to the number of items in the population (n for a
sample). The median for grouped data is given by the formula
Median  L 
n/2 - F
c
fm
(2.4)
Where
L = lower limit of the median class (i.e., the class that contains the
middle item of the distribution)
n = the number of observations in the data set
F = sum of the frequencies up to but not including the median
class
fm = frequency of the median class
c = width of the class interval
3.
The mode is the value that occurs most frequently in the data set.
For grouped data,
Median  L 
d1
c
d1  d 2
(2.5)
Where
L = lower limit of the median class (i.e., the class with the greatest
frequency)
d1 = frequency of the model class minus the frequency of the
previous class
d2 = frequency of the model class minus the frequency of the
following class
c = width of the class interval
The mean is the most used measure of central tendency. The mean,
however, is affected by extreme values in the data set, while the
median and the mode are not. Other measures of central tendency are
the weighted mean, the geometric mean, and the harmonic mean (see
probs. 2.7 to 2.9).
Example 3. The mean grade for the population on the 10 quizzes given
in Example 1, using the formula for ungrouped data, is


N

6  7  6  8  5  7  6  9  10  6 70

 7 points
10
10
To find the median for the ungrouped data, we first arrange the 10 grades
in ascending order: 5, 6, 6, 6, 6, 7, 7, 8, 9, 10. Then we find the grade of
the (N+1)/2 or (10+1)/2=5.5th item. Thus the median is the average of the
5th and 6th item in the array, or (6+7)/2=6.5. The mode for the ungrouped
data is 6 (the value that occurs most frequently in the data set).
Example 4. We can estimate the mean for the grouped data given in
Table 2.3 with the aid of Table 2.4:
' 
 f  401.6  20.08 oz
n
20
This calculation could be simplified by coding (see Prob. 2.6)
Table 2.4 Calculation of the Sample Mean for the Data in Table 2.3
Class
Frequency
Weight, oz
Midpoint X
F
FX
19.2-19.4
19.3
1
19.3
19.5-19.7
19.6
2
39.2
19.8-20.0
19.9
8
159.2
20.1-20.3
20.2
4
80.8
20.4-20.6
20.5
3
61.5
20.7-20.9
20.8
2
41.6
f
 n  20
 fX  401.6
We can estimate the median (med) for the same grouped data as follows:
Med  L 
n/2 F
20 / 2  3
7
c  19.8 
0.3  19.8  0.3
fm
8
8
 19.8  0.2625  20.06 oz
Where
L = 19.8= lower limit of the median class (i.e., the 19.8-20.0 class
which contains the 10th and 11th observations)
n = 20 = number of observations or items
F = 3 = sum of frequencies up to but not including the median
class
fm = 8 = frequency of the median class
c = 0.3 = width of class interval
Similarly,
Mode  L
d1
6
1.8
c  19.8 
0.3  19.8 
 19.8  0.18  19.98 oz
d1  d 2
64
10
As noted in Prob. 2.4, the mean, median, and mode for grouped data are
estimates used when only the grouped data are available or to reduce
calculations with a large ungrouped data set.
2.3 Measures of Dispersion
Dispersion refers to the variability or spread in he data. The most
important measures of dispersion are (1) the average deviation, (2) the
variance, and (3) the standard deviation. We will measure these for the
populations and samples, as well as for grouped and ungrouped data.
1.
The average deviation (AD) for ungrouped data is given by
AD 
 X 
N
for population s
(2.6a)
for samples
(2.6b)
and
AD 
 -
n
where the two vertical bars indicate the absolute value, or the values
omitting the sign, with the other symbols having the same meaning as in
sec.2.2. For grouped data,
AD 
f
X 
N
for population s
(2.7a)
for samples
(2.6b)
and
AD 
 f -
n
where f refers to the frequency of each class and X to the class midpoints.
2.
Variance. The population variance 2 (the Greek letter sigma
squared) and the sample variance s2 for ungrouped data are given by
    


and
N
s
2
 f    

s
2
3.
 f   

2
2

(2.8a, b)
n 1
For grouped data,
2
   

2
2
2
and
N
n 1
(2.9a, b)
Standard deviation. The population standard deviation σ and
sample standard deviation s are the positive square roots of their
respective variances. For ungrouped data,
    
 
s
and
N
For grouped data,
 f    
(2.10a, b)
n 1
 f  - 
2
2
 
  - 
2
2
s
and
N
n 1
(2.11a, b)
The most widely used measure of (absolute) dispersion in the standard
deviation. Other measures (beside the variance and average deviation) are
the range, the interquartile range, and the quartile deviation (see Probs.
2.11 and 2.12).
4.
The coefficient of variation V measures relative dispersion:
V 


for population s
(2.12a)
s

for samples
(2.12b)
and
v
Example 5. The average deviation, variance, standard deviation, and
coefficient of variation for the ungrouped data given in Example 1 can be
found with the aid of Table 2.5 (μ = 7; see Example 3):
AD 
 
2 
N
2
    
 
    

N
12
 1.2 points
10

22
 2.2 points squared
10
2

22
 2.2  1.48 points
10
N
 1.48
V 
 0.21, or 21%

7
Table 2.5 Calculations on the Data in Example 1
μ
7
7
7
7
7
7
7
7
7
7
Grade X
6
7
6
8
5
7
6
9
10
6
X- μ
-1
0
-1
1
-2
0
-1
2
3
-1
|X- μ|
1
0
1
1
2
0
1
2
3
1
(X- μ)2
1
0
1
1
4
0
1
4
9
1
     0      0     
2
 22
Example 6. The average deviation, variance, standard deviation, and
coefficient of variation for the frequency distribution of weights (grouped
data) given in table 2.3 can be found with the aid of Table 2.6 (X’=20.08
oz; see Example 4):
 f 

 f    

6.36
 0.318 oz
20

 f  -  
2.9520
 0.1554 oz squared
19

AD 
n
2
s
2
n 1
2
s
2.9520
 0.1544  0.3942 oz
19
n 1
s 0.3942oz
V 
 0.0196, or 1.96%
20.08oz

Note that in the formula for s2 and s, n-1 rather than n is used in the
denominator (see Prob. 2.16 for the reason). From the formulas for σ2, σ,
s2, and s given in this section, other may be derived that will simplify the
calculations for a large body of data (see Prob. 2.17 to 2.19 for their
derivation and application).
Table 2.6 Calculations on the data in Table 2.4
Weight, oz
Class
Midpoint X
19.20-19.40
19.50-19.70
19.80-20.00
20.10-20.30
20.40-20.60
20.70-20.90
19.30
19.60
19.90
20.20
20.50
20.80

Frequency f
Mean
X’

1
2
8
4
3
2
20.08
20.08
20.08
20.08
20.08
20.08
-0.78
-0.48
-0.18
0.12
0.42
0.72
0.78
0.48
0.18
0.12
0.42
0.72
f
 n  20
 f       
2
0.78
0.96
1.44
0.48
1.26
1.44
0.6084
0.2304
0.0324
0.0144
0.1764
0.5184
=6.36
2.4 Shape of Frequency Distributions
The shape of a distribution refers to (1) its symmetry or lack of it
(skewness) and (2) its peakedness (kurtosis).
1.
Skewness. A distribution has zero skewness if it is symmetrical
about its mean. For a symmetrical (unimodal) distribution, the mean,
median, and mode are equal. A distribution is positively skewed if the
right tail is longer. Then, mean>median>mode. A distribution is
negatively skewed if the left tail is longer. Then, mode>median>mean
(see Fig. 2-3).
Skewness can be measured by the Pearson’s coefficient of skewness:
Sk 
and
Sk 
3  med 


3   med
s

for population s
(2.13a)
for samples
(2.13b)

f 

2
0.6084
0.4608
0.2592
0.0576
0.5292
1.0368
=2.9520
Skewness also can be measured by the third moment [the numerator of
Eq. (2.14a, b)] divided by the cube of the standard deviation:
 f    
3
Sk 
3
for population s
(2.14a)
for samples
(2.14b)
and
 f    
Sk 
3
s3
2.
Kurtosis. A peaked curve is called leptokurtic, as opposed to a flat
on (platykurtic), relative to one that is mesokurtic (see Fig. 2-4).
Kurtosis can be measured by the fourth moment [the enumerator of
Eq. (2.15a, b)] divided by the standard deviation raised to the fourth
power. The kurtosis for a mesokurtic curve is 3.
 f    
Kurtosis 
4
4
and
 f  -  

for population s
(2.15a)
4
Kurtosis
s4
for samples
(2.15b)
Example 7. We can find the Pearson’s coefficient of skewness for the
grades given in Example 1 by using μ=7, med=6.5 (see Example 3), and
σ=1.48 (see Example 5):
Sk 
3  med 


37  6.5 30.5

 1.01
1.48
1.48
(seeFig. 2 - 1)
Similarly, by using X’=20.08 oz, med=20.06 oz (see Example 4), and
s=0.39 oz (see Example 6), we can find the Pearson’s coefficient of
skewness for the frequency distribution of weights in Table 2.3 as
follows:
Sk 


3   med
320.08  20.06

 0.15 (see Fig. 2 - 2c).
s
0.39
For kurtosis, see Prob. 2.23.
SOLVED PROBLEMS
Frequency Distributions
2.1
Table 2.7 gives the grades on a quiz for a class of 40 students. (a)
Arrange these grades (raw data set) into an array from the lowest to
the highest grade. (b) Construct a table showing class intervals and
class midpoints and the absolute, relative, and cumulative frequency
for each grade. (c) Present the data in the form of a histogram,
relative-frequency histogram, frequency polygon, and ogive.
Table 2.7 Grades on a Quiz for a Class of 40 Students
7
5
6
2
8
7
6
7
3
9
10
4
5
5
4
6
7
4
8
2
3
5
6
7
9
8
2
4
7
9
4
6
7
8
3
6
7
9
10
5
(One) See Table 2.8.
Table 2.8 Data Array of Grades
2
2
2
3
3
3
4
4
4
4
4
5
5
5
5
5
6
6
6
6
6
6
7
7
7
7
7
7
7
7
8
8
8
8
9
9
9
9
10
10
(Two) See Table 2.9. Note that since we are dealing here with discrete
data (i.e., data expressed in whole numbers), we used the actual grades
as the class midpoints.
Grade
1.5-2.4
2.5-3.4
3.5-4.4
4.5-5.4
5.5-6.4
6.5-7.4
7.5-8.4
8.5-9.4
9.5-10.4
Table 2.9 Frequency Distribution of Grades
Class
Absolute
Relative
Cumulative
Midpoint
Frequency
Frequency
Frequency
2
3
0.075
3
3
3
0.075
6
4
5
0.125
11
5
5
0.125
46
6
6
0.150
22
7
8
0.200
30
8
4
0.100
34
9
4
0.100
38
10
2
0.050
40
40
1.000
(Three) See Fig. 2-5.
2.2
A sample of 25 workers in a plant receives the hourly wages
given in Table 2.10. (a) Arrange these raw data into an array from the
lowest to the highest wage. (b) Group the data into classes. (c) Present
the data in the form of a histogram, relative-frequency histogram,
frequency polygon, and ogive.
3.65
3.60
3.88
3.78
3.90
3.95
Table 2.10 Hourly Wages in Dollars
3.85 3.95 4.00 4.10 4.25 3.55
4.26 3.75 3.95 4.05 4.08 4.15
4.06 4.18 4.05
3.85
3.80
3.96
4.05
Table 2.11 Data Array of Wages in Dollars
3.60 3.65 3.75 3.78 3.80 3.85 3.85 3.88
3.95 3.95 3.96 4.00 4.05 4.05 4.05 4.06
4.15 4.18 4.25 4.26
3.90
4.08
(One) See Table 2.11
3.55
3.95
4.10
(Two) The hourly wages in Table 2.10 range from $3.55 to $4.26. This
can be conveniently subdivided into 8 equal classes of $0.10 each.
That is, ($4.30-$3.50)/8 = $0.80/8 = $0.10. Note that the range was
extended from $3.50 to $4.30 so that the lowest wage, $3.55, falls
within the lowest class and the largest wage, $4.26, falls within the
largest class. It is also convenient (and needed for plotting the
frequency polygon) to find the class mark or midpoint of each class.
These are shown in Table 2.12.
Table 2.11 Data Array of Wages in Dollars
Hourly
Class
Absolute
Relative
Cumulative
Wage
Midpoint
Frequency
Frequency
Frequency
$3.50-3.59
$3.55
1
0.04
1
3.60-3.69
3.65
2
0.08
3
3.70-3.79
3.75
2
0.08
5
3.80-3.89
3.85
4
0.16
9
3.90-3.99
3.95
5
0.20
14
4.00-4.09
4.05
6
0.24
20
4.10-4.19
4.15
3
0.12
23
4.20-4.29
4.25
2
0.08
25
25
1.00
(Three) See Fig. 2-6. Another way of getting the ogive is to plot the
cumulative frequencies up to $3.595, 3.695, 3.795, and so on (so as to
include the upper limit of each class). The values $3.595, 3.695,
3.795, etc. are often referred to as the class boundaries or exact limits.
Note that the class midpoints are obtained by adding together the
lower and upper class boundaries and dividing by 2. Four example,
the second class midpoint is given by (3.595+3.695)/2 = 7.290/2 =
3.65 (see Table 2.12).
Measures of Central Tendency
2.3
Find the mean, median, and mode (a) for the grade on the quiz for
the class of 40 students given in Table 2.7 (the ungrouped data) and
(b) for the grouped data of these grades given in Table 2.9.
(One) Since we are dealing with all the grades, we want the population
mean:

   7  5  6  ...  5  240  6 points
N
40
40
That is, μ is obtained by adding together all the 40 grades given in Table
2.7 and dividing by 40 (the three centered dots were put in to avoid
repeating the 40 values in Table 2.7). the median is given by the value of
the [(N+1)/2]th item in the data array in Table 2.8. Therefore, the median
is the value of the (40+1)/2 or 20.5 th item, or the average of the 20th and
21st item. Since they are both equal to 6, the median is 6. The mode is 7
(the value that occurs most frequently in the data set).
(Two) We can find the population mean for the grouped data in Table 2.9
with the aid of Table 2.13:

 f  240  6
N
40
This is the same mean we found for the ungrouped data. Note that the
sum of the frequencies, Σf, equals the number of observations in the
population, N, and ΣX = Σ fX. The median for the grouped data of Table
2.13 is given by
Med  L 
N /2 F
40 / 2  16
c  5.5 
1  5.5  0.67  6.17
fm
6
where
L = 5.5 =lower limit of the median class (i.e., the 5.5-6.4 class,
which contains the 20th and 21st observation)
N = 40 = number of observations
F = 16 = sum of observations up to but not including the median
class
fm = 6 = frequency of the median class
c = 1 = width of class interval
The mode for the grouped data in Table 2.13 is given by
Mode  L 
d1
2
c  6.5 
1  6.5  0.33  6.83
d1  d 2
24
where
L = 6.5 =lower limit of the model class (i.e., the 6.5-7.4 class with
the highest frequency of 8)
d1 = 2 = frequency of the modal class, 8, minus the frequency of the
previous class, 6
d2 = 4 = frequency of the modal class, 8, minus the frequency of the
following class, 4
c = 1 = width of class interval
Note that while the mean calculated from the grouped data is in this case
identical to the mean calculated for the ungrouped data, the median and
the mode are only (good) approximations.
Table 2.13 Calculation of the Population Mean for the Grouped Data
in Table 2.9
Class
Grade
Midpoint X
Frequency f
FX
1.5-2.4
2
3
6
2.5-3.4
3
3
9
3.5-4.4
4
5
20
4.5-5.4
5
5
25
5.5-6.4
6
6
36
6.5-7.4
7
8
56
7.5-8.4
8
4
32
8.5-9.4
9
4
36
9.5-10.4
10
2
20
f
2.4
 N  40
 fX  240
Find the mean, median, and mode (a) for the sample of hourly
wages received by the 25 workers recorded in Table 2.10 (the
ungrouped data) and (b) for the grouped data of these wages given in
Table 2.12.
(a)
X 
X
n

$3.65  $3.78  $3.85  ...  $4.05 $98.65

 $3.946 or $3.95
25
25
Median = $3.95 [the value of the (n+1)/2 = (25 + 1)/2 = 13th item in the
data array in Table 2.11].
Mode = $3.95 and $4.05, since there are three of each of these wages.
Thus the distribution is bimodal (i.e., it has two modes).
(Three) We can find the sample mean for the grouped data in Table 2.12
with the aid of Table 2.14:
 fX
$98.75
 $3.95
n
25
Note that in this case  fX  $98.75   X  $98.65 (found in part a) since
X 

the average of the observations in each class is not equal to the class
midpoint for all classes [as in Prob. 2.3(b)]. Thus X’ calculated from the
grouped data is only a very good approximation for the true value of X’
calculated for the ungrouped data. In the real world, we often have only
the grouped data, or if we have a very large body of ungrouped data, it
will save on calculations to estimate the mean by first grouping the data.
Med  L 
n/2  F
25 / 2  9
c  $3.90 
(0.10)  $3.90  $0.07  $3.97
fm
5
as compared with the true median of $3.95 found from the ungrouped
data (see part a)
Mode  L 
d1
1
c  $4.00 
(0.10)  $4.00  $0.025  $4.025 or $4.03
d1  d 2
1 3
as compared with the true modes of $3.95 and $4.05 found from the
ungrouped data (see part a). Sometimes the mode is simply given as the
midpoint of the modal class.
Table 2.14 Calculation of the Sample Mean for the Grouped Data in
Table 2.12
Hourly
Class
Wage
Midpoint X
Frequency f
FX
$3.50-3.59
$3.55
1
$3.55
3.60-3.69
3.65
2
7.30
3.70-3.79
3.75
4
7.50
3.80-3.89
3.85
5
15.40
3.90-3.99
3.95
6
19.75
4.00-4.09
4.05
3
24.30
4.10-4.19
4.15
2
12.45
4.20-4.29
4.25
 fX  $98.75
 f  n  25
2.5
Compare the advantages and disadvantages of (a) the mean, (b)
the median, and (c) the mode as measures of central tendency.
(One) The advantages of the mean are (1) it is familiar and understood by
virtually everyone, (2) all the observations in the data are taken into
account, and (3) it is used in performing many other statistical
procedures and tests. The disadvantages of the mean are (1) it is
affected by extreme values, (2) it is time-consuming to compute for a
large body of ungrouped data, and (3) it cannot be calculated when the
last class of grouped data is open-ended (i.e., it includes the lower
limit of the last class “and over”).
(Two) The advantages of the median are (1) it is not affected by extreme
values, (2) it is easily understood (i.e., half the data are smaller than
the median and half are greater), and (3) it can be calculated even
when the last class in open-ended and when the data are qualitative
rather than quantitative. The disadvantages of the median are (1) it
does not use much of the information available, and (2) it requires that
observations be arranged into an array, which is time-consuming for a
large body of ungrouped data.
(Three) The advantages of the mode are the same as those for the
median. The disadvantages of the mode are (1) as for median, the
mode does not use much of the information available, and (2)
sometimes no value if the data is repeated more than once, so that
there is no mode, while at other times there may be many modes. In
general, the mean is the most frequently used measure of central
tendency and the mode is the least used.
2.6
Find the mean for the grouped data in Table 2.12 by coding (i.e.,
by assigning the value of μ=0 to the 4th or 5th classes and μ =-1, μ =-2,
etc. to each lower class and μ=1, μ=2, etc. to each larger class and
then using the formula
X  X0 
 f c
(2.16)
n
where X0 is the midpoint of the class assigned μ=0 and c is the width of
the class intervals).
See Table 2.15
Table 2.15 Calculation of the Sample Mean by Coding for the
Grouped Data in Table 2.12
Hourly
Class
Wage
Midpoint X
Code μ
Frequency f
FX
$3.50-3.59
$3.55
-3
1
$3.55
3.60-3.69
3.65
-2
2
7.30
3.70-3.79
3.75
-1
4
7.50
3.80-3.89
3.85
0
5
15.40
3.90-3.99
3.95
1
6
19.75
4.00-4.09
4.05
2
3
24.30
4.10-4.19
4.15
3
2
12.45
4.20-4.29
4.25
4
 f  n  25
 f  25
X  X0 
 f c  $3.85  25 ($0.10)  $3.85  $0.10  $3.95
n
25
X’ for the grouped data formed by coding is identical to that found in
Prob. 2.4b without coding. Coding eliminates the problem of having to
deal with possibly large and inconvenient class midpoints; thus it may
simplify the calculations.
2.7
A firm pays a wage of $4 per hour to its 25 unskilled workers, $6
to its 15 semiskilled workers, and $8 to its 10 skilled workers. What is
the weighted average, or weighted mean, wage paid by this firm?
In finding the weighted mean, or weighted average, of a population,
μw, or sample Xm’ , the weights, w, have the same function as the
frequency in finding the mean for the grouped data. Thus,
X w or  w 
 wX
w
(2.17)
For this problem, the weights are the number of workers employed at
each wage, and Σw equals the sum of all the workers:
w 
$425  $615  $810  $100  $90  $80  $270  $5.40
25  15  10
50
50
This weighted average compares with the simple average of $6
[($4+$6+$8)/3=$6] and is a better measure of the average wages.
2.8
A nation faces a rate of inflation of 2% in one year, 5% in the
second year, and 12,5% in the third year. Find the geometric mean of
the inflation rates. (the geometric mean, μG or XG’, of a set of n
positive numbers in the nth root of their product and is used mainly to
average rates of change and index numbers.) That is,
G
XG  3 X1  X 2 ... X n
or
(2.18)
where X1, X2, … , Xn refer to the n (or N) observations.
G  3 2512.5  3 125  5%
this compares with μ=(2+5+12.5)/3=19.5/3=6.5%. When all the
numbers are equal, μG equals μ; otherwise μG is smaller than μ. In
practice, μG is calculated by logarithms:
log  G 
 log 
(2.19)
N
The geometric mean is used primarily in the mathematics of finance and
financial management.
2.9
A commuter drives 10mi on the highway at 60mi/h and 10mi on
local streets at 15mi/h. Find the harmonic mean. The harmonic mean
μH is used primarily to average ratios:
H 
H 
2
N
1
160 115
 X

(2.20)
2
1  4
as compared with    X N

60
2
60 120
2 
 24 mi/h
5
5
5
60
 60  15
2
 75  37.5 mi/h . Note that if the
2
commuter had averaged 37.5 mi/h, it would have taken her (20
mi/37.5mi) 60 min = 32 min to drive the 20 mi. Instead she drives 6 min
on the highway (10mi at 60mi/h) and 40min on local streets (10mi at
15mi/h) for a total of 50min, and this is the (correct) answer we get by
using μH=24mi/h. That is, (20mi/24mi/h) X 60min = 50min.
2.10 (a) For the ungrouped data in Table 2.7, find the first, second, and
third quartiles and the third deciles and sixtieth percentiles. (b) Do the
same for the grouped data in Table 2.12. (Quartiles divide the data
into 4 parts, deciles into 10 parts, and percentiles into 100 parts).
(One) Q1 (first quartile) = 4 (the average of the 10 th and 11th values in
Table 2.8)
Q2 (second quartile) = 6 = the value of the 20.5th item = the median
Q3 (third quartile) = 7.5 = the value of the 30.5th item
D3 (third decile) = 5 = the value of the 12.5th item
P60 (sixtieth percentile) = 7 = the value of the 24.5th item
(b)
n F
Q1  L  4
c
ft
 $3.80 
25 4  5
($0.10)  $3.80  $0.03125  $3.83
4
(2.21)
n F
Q2  L  2
c
f2
 $3.90 
Q3  L 
25 2  9
($0.10)  $3.90  $0.07  $3.97  median
5
(2.22)
3n  F
4
c
f3
75 4  14
($0.10)  $4.00  $0.0792  $4.08
6
3n  F
D3  L  10
c
f3
 $4.00 
 $3.80 
75 10  5
($0.10)  $3.80  $0.0625  $3.86
4
(2.23)
(2.24)
Measures of Dispersion
2.11 (a) Find the range for the ungrouped data in Table 2.7. (b) Find
the range for the ungrouped data in Table 2.10 and for the grouped
data in Table 2.12. (c) What are the advantages and disadvantages of
the range?
(One) The range for ungrouped data is equal to the value of the
largest observation minus the value of the smallest
observation in the data set. The range for the ungrouped
data in Table 2.7 is from 2 to 10, or 8 points.
(Two) The range for the ungrouped data in Table 2.10 is from
$3.55 to $4.26, or $0.71. For grouped data, the range
extends from the lower limit of the smallest class to the
upper limit of the largest class. For the grouped data in
Table 2.12, the range extends from $3.50 to $4.29.
(Three) The advantages of the range are that it is easy to find
data and understand. Its disadvantages are that it
considers only the lowest and highest values of a
distribution, it is greatly influenced by extreme values,
and it cannot be found for open-ended distributions.
Because of these disadvantages, the range is of limited
usefulness (except in quality control).
2.12 Find the interquartile range and the quartile deviation (a) for the
ungrouped data in Table 2.7 and (b) for the grouped data in Table
2.12.
(One) The interquartile range is equal to difference between
the third and first quartiles. That is,
IR = Q3 – Q1
(2.26)
For the ungrouped data in Table 2.7, IR=7.5-4=3.5 points
[utilizing the values of Q3 and Q4 found in Prob. 2.10(a)]. Note
that the interquartile range is not affected by extreme values
because it utilize only the middle half of the data. It is thus
better than the range, but it is not as widely used as the other
measures of dispersion. For the quartile deviation.
QD 
Q3  Q1
2
(2.27)
Therefore, QD = (7.5 –4)/2 = 3.5/2 = 1.75 points. Quartile
deviation measures the average range of one-fourth of the data.
(Two) IR = Q3 – Q1 = $4.08 - $3.83 = $0.25 [utilizing the
values of Q3 and Q1 found in Prob. 2.10(b)]:
QD 
Q3  Q1 $4.08  $3.83

 $0.125
2
2
2.13 Find the average deviation for (a) the ungrouped data in Table
2.7 and (b) for the grouped data in Table 2.9
(One)
Since μ=6 [see Prob.2.3(a)],
 
 11 0  4  2 1 0 1 3  3  4  2 11 2  0 1 2  2  4
 3  1  0  1  3  2  4  2  1  3  2  0  1  2  3  0  1  3  4  1  72
     72  1.8 points
AD 
N
40
Note that the average deviation takes every observation into
account. It measures the average of the absolute deviation of each
observation from the mean. It takes the absolute value (indicated by
the two vertical bars) because      0 (see Example 5).
(Two) We can find the average deviation for the same
grouped data with the aid of Table 2.16:
AD 
F   
N

72
 1.8 points
40
The same as we found for the ungrouped data.
Table 2.16 Calculations for the Average Deviation for the Grouped
Data in Table 2.9
Class
Grade Midpoint X Frequency f Mean μ X- μ |X- μ |
f|X- μ |
1.5-2.4
2
3
6
-4
4
12
2.5-3.4
3
3
6
-3
3
9
3.5-4.4
4
5
6
-2
2
10
4.5-5.4
5
5
6
-1
1
5
5.5-6.4
6
6
6
0
0
0
6.5-7.4
7
8
6
1
1
8
7.5-8.4
8
4
6
2
2
8
8.5-9.4
9
4
6
3
3
12
9.5-10.4
10
2
6
4
4
8
f
 X 
 N  40
2.14 Find the average deviation for the grouped data in Table 2.12
We can find the average deviation for the grouped data of hourly wages
in Table 2.12 with aid of Table 2.17 [X’=$3.95; see Prob. 2.4(b)]:
AD 
f
X X
n

$3.60
 $0.144
25
Note that the average deviation found for the grouped data is an estimate
of the “true” average deviation that could be found for the ungrouped
data. It usually differs slightly from the true average deviation because we
use the estimate of the mean for the grouped data in our calculations
[compare the values of X’ found in Prob. 2.4(a) and (b)].
 72
Table 2.17 Calculations for the Average Deviation for the Grouped
Data in Table 2.12
Hourly
Wage
$3.50-3.59
3.60-3.69
3.70-3.79
3.80-3.89
3.90-3.99
4.00-4.09
4.10-4.19
4.20-4.29
Class
Midpoint
$3.55
3.65
3.75
3.85
3.95
4.05
4.15
4.25
Mean X’
$3.95
3.95
3.95
3.95
3.95
3.95
3.95
Frequency f
1
2
2
4
5
6
3
2
25
X-X’
-$0.40
-0.30
-0.20
-0.10
0.00
0.10
0.20
0.30
|X-X’|
$0.40
0.30
0.20
0.10
0.00
0.10
0.20
0.30
f|X-X.|
$0.40
0.60
0.40
0.40
0.00
0.60
0.60
0.60
 f     $3.60
2.15 Find the variance and the standard deviation for (a) the ungrouped
data in Table 2.7 and (b) the grouped data in Table 2.9. (c) What is
the advantage of the standard deviation over the variance?
    

2

(One)
2
    
2
and
  6 (see Prob. 2.3a)
N
 1  1  0  16  4  1  0  1  9  9  13  4  1  1  4  0  1  4  4  16
 9  1  0  1  9  4  16  4  1  9  4  0  1  4  9  0  1  9  10  1  192
    


 X -  

2

2
N
192
 4.8 points squared
40
2
 
(Two)
N
192
 4.8  2.19 points
40
We can find the variance and the standard deviation for the
grouped data of grades with the aid of Table 2.18:
 f    

2

2
N

192
 4.8 points squared
40
and
   2  4.8  2.19 points
The same as we found for the ungrouped data.
Table 2.18 Calculations for the Variance and Standard Deviation for
the Data in Table 2.9
Class
Grade Midpoint X Frequency f
Mean μ
X- μ (X- μ)2
f(X- μ)2
1.5-2.4
2
3
6
-4
16
48
2.5-3.4
3
3
6
-3
9
27
3.5-4.4
4
5
6
-2
4
20
4.5-5.4
5
5
6
-1
1
5
5.5-6.4
6
6
6
0
0
0
6.5-7.4
7
8
6
1
1
8
7.5-8.4
8
4
6
2
4
16
8.5-9.4
9
4
6
3
9
36
9.5-10.4
10
2
6
4
16
32
f
 f    
 N  40
2
(c) The advantage of the standard deviation over the variance is that
standard deviation is expressed in the same units as the data rather
than in “the width squared,” which is how the variance is expressed.
The standard deviation is by far the most widely used measure of
(absolute) dispersion.
2.16 Find the variance and the standard deviation for the grouped data
in Table 2.10.
We can find the variance and the standard deviation for the grouped
data of hourly wages with the aid of Table 2.19 [X’=$3.95; see Prob.
2.4(b)]:
 f X  X 

2
s
2
n 1

0.82
 0.0342 dollars squared
24
and
 f X - X 
2
s
n 1
 0.0342  $0.18
 192
Hourly
Wage
$3.50-3.59
3.60-3.69
3.70-3.79
3.80-3.89
3.90-3.99
4.00-4.09
4.10-4.19
4.20-4.29
Table 2.19 Calculations for the Variance and Standard Deviation
for the Data in Table 2.12
Class
Midpoint Frequency f Mean X’ X-X’ (X-X’)2
F(X-X’)2
$3.55
1
$3.95
-$0.40
0.16
0.16
3.65
2
3.95
-0.30
0.09
0.18
3.75
2
3.95
-0.20
0.04
0.08
3.85
4
3.95
-0.10
0.01
0.04
3.95
5
3.95
0.00
0.00
0.00
4.05
6
3.95
0.10
0.01
0.06
4.15
3
3.95
0.20
0.04
0.12
4.25
2
0.30
0.09
0.18
2
25
f      0.82

Note that in the formula for s2 and s, n-1 rather than n is used in the
denominator. The reason for this is that if we take many samples from a
population, the average of the sample variances does not tend to equal
population variance, 2, unless we use n-1 in the denominator of the
formula for s2 (more will be said on this in Chap.5). Furthermore, s 2 and s
for the grouped data are estimates for the true s2 and s that could be found
for the ungrouped data because we used the estimate of X’ from the
grouped data in our calculations.
2.17 Starting with the formula for 2 and s2 given in Sec. 2.3, prove
that

(One)
2
2
(Two)


2
 N 2
 f

 N 2
2
    


(a)


N
N
 

2
2
S


 n
 f

S2
 2    2
  
N
   N 2
2
2
(2.28a, b)
n 1
and
N
2
2
and
N
2
2
 n
2
n 1
2
(2.29a, b)
 2     N 2
N
2
 2 2   2 
N
We can get s by simply replacing  with X’ and N with n in the
numerator and N with n-1 in the denominator of the formula for 2.
2
(b)
 f    

2


2
 f
N
N

 f 
2
2
 2   2
N
 f  N  2
   f
2
 2  f  N 2
N
2
 2 2   2 
N
We can get s2 in the same way as we did in part a. The preceding
formulas will simplify the calculations for 2 and s2 for a large body of
data. Coding also helps (see Prob. 2.6).