Chapter 3
Data Summary Using Descriptive Measures
CHAPTER OVERVIEW AND OBJECTIVES
Chapter 2 examined techniques for visually describing a set of data.
The purpose of this chapter is to introduce techniques for describing a
set of data using one or more numerical measures.
By the end of the chapter, the student should be able to define and use
the following measures:
1. Measures of Central Tendency: Mean, Median, Mode and Midrange.
2. Measures of Variation: Range, Standard Deviation, Variance, and
Coefficient of Variation.
3. Measures of Position: Percentiles, Quartiles, and z-scores.
4. Measures of Shape: Skewness and Kurtosis.
5. Techniques for handling frequency distributions (grouped data).
6. Construction of box plots.
Instructor's Manual
Chapter 3 Glossary
box plot.
A diagram that demonstrates the lowest and highest values
within that portion of the sample not containing outliers, the three
sample quartiles, and any sample values determined to be outliers.
Chebyshev's inequality.
A rule stating at least what percentage of the
sample values are within 2, 3, ... standard deviations of the mean.
coefficient of variation.
The sample standard deviation divided by the
sample mean and multiplied by 100.
descriptive measure.
A statistic that describes the location,
variation, or shape of a sample or one that describes the position of
an individual value in a sample (such as a percentile).
empirical rule.
A rule that states approximately what percentage of the
sample values are within 1, 2, and 3 standard deviations of the mean.
This rule assumes that the population has a bell-shaped (normal)
appearance.
grouped data.
Summarized data in the form of a frequency distribution.
interquartile range.
The difference between the first and third
quartiles (Q3 - Q1).
kurtosis.
A measure of shape that describes the peakedness of a
distribution.
mean.
The average of the sample data; its symbol is x̄.
measure.
See descriptive measure.
Measures consist of measures of
central tendency, variation, position, and shape.
measures of central tendency.
Measures that describe the location
(typical value) of a sample, including the sample mean, median,
midrange, and mode.
measures of variation.
Measures that describe the variation within a
sample; they include the sample range, variance, standard deviation,
and coefficient of variation.
measures of position.
Measures that indicate the relative position of a
sample value, such as percentiles, quartiles, and z-scores.
measures of shape.
Measures that describe the shape (symmetry and
peakedness) of a sample, including measures of skewness (lack of
symmetry) and kurtosis (peakedness).
median.
The value in the center of the ordered data (if the sample size
is an odd number) or the average of the two center values (if the
sample size is an even number).
midrange.
The average of the lowest and highest values in the sample.
mode.
The sample value that occurs more than once and the most often.
outlier.
An unusually large or small data value in a sample.
Such a
value can be illustrated and detected using a box plot and is
considered to be an extreme outlier if it lies beyond either of the
outer fences.
A mild outlier is a sample value that lies beyond
either of the inner fences but not beyond the corresponding outer
fence.
percentile.
A measure of position, written PK, where at most K% of the
sample values are less than PK and at most (100 - K)% of the sample
values are greater than PK.
quartiles.
Special percentiles; the 1st quartile = 25th percentile, 2nd
quartile = 50th percentile (= median), and 3rd quartile = 75th
percentile.
range.
The difference between the highest and lowest data values in the
sample.
skewness.
A measure of shape that describes the degree of symmetry in
the sample data.
standard deviation.
The square root of the sample variance; its symbol
is s.
variance.
A measure of variation that is obtained by summing the
squared deviations from the sample mean and dividing by one less than
the sample size; its symbol is s².
z-score.
A measure of position for any particular value in a sample.
It is obtained by subtracting the mean and dividing by the standard
deviation.
It tells how many standard deviations to the right or
left of the mean this value lies.
3.1
a)
10 20 30
Mean = 60/3 = 20
There is no mode.
Median = 20
Midrange = (10 + 30) / 2 = 20
b)
3 5 6 8 9
Mean = 31/5 = 6.2
There is no mode.
Median = 6
Midrange = (3 + 9) / 2 = 6
c)
1 2 2 3 7 7 8 9 10 14
Mean = 63/10 = 6.3
Mode = 2 and 7
Median = (7 + 7) / 2 = 7
Midrange = (1 + 14) / 2 = 7.5
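The four measures of central tendency in 3.1 can be checked with a short script. This is a minimal sketch, not part of the manual: the helper names `modes` and `midrange` are illustrative, and `modes` follows the glossary's convention that a mode must occur more than once.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Glossary convention: a mode must occur more than once.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def midrange(values):
    """Average of the lowest and highest values."""
    return (min(values) + max(values)) / 2


# Data from 3.1 (c).
data_c = [1, 2, 2, 3, 7, 7, 8, 9, 10, 14]
print(mean(data_c), median(data_c), modes(data_c), midrange(data_c))
```

Running the same helpers on the data in parts (a) and (b) gives an empty list of modes, which matches "There is no mode."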
3.2
a)
4 6 7 8 10 13 13 14 18
Mean = 93/9 = 10.333 cubic meters
Mode = 13 cubic meters
Median = 10 cubic meters
Midrange = (4 + 18)/2 = 11 cubic meters
b)
Convert yards to feet
9 12 14 16 20 28 30 30
Mean = 159/8 = 19.875
Mode = 30
Median = (16 + 20)/2 = 18
Midrange = (9 + 30)/2 = 19.5
c)
Convert all numbers to percents
Mean = 310/20 = 15.5%
Mode = 10 and 15
Median = (14 + 15)/2 = 14.5%
Midrange = (7 + 30)/2 = 18.5%
3.3
a)
1
2
7
8
9
Mean = 5.4
Median = 7
b)
1
3
10
18
19
Mean = 10.2
Median = 10
3.4
a)
x̄ = 1515/30 = 50.5
Median = (50 + 50)/2 = 50
Mode = 50
Midrange = (40 + 88)/2 = 64
b)
Except for the midrange, the measures of central tendency are
approximately the same.
Any of these measures except the
midrange appears to be appropriate.
3.5
No, the median does not have to change by the same amount that
the mean changes.
3.6
a)
Mean = (12418/20) = 620.9
Median = (633 + 638)/2 = 635.5
Mode = 640
b)
The median and the mode would not be easily influenced by
extreme values.
c)
x̄ = 11998/19 = 631.47
Median = 638
Mode = 640
The mean changed the most.
3.7
Mean = 55.77/14 = 3.98
Median = (.5 + .68)/2 = .59
If 26.30 is omitted, the value of the mean will change more than
if any other values were omitted.
After omitting 26.30, the mean is 29.47/13 = 2.267. The median
is .5.
3.8
a)
Process 1:
x̄ = 129/10 = 12.9
Median = (13 + 13)/2 = 13
Mode = 13
Midrange = (11 + 15)/2 = 13
Process 2:
x̄ = 142/10 = 14.2
Median = (13 + 13)/2 = 13
Mode = 13
Midrange = (10 + 20)/2 = 15
b)
In Process 1, the data are approximately symmetrical.
Hence,
the measures of central tendency are approximately the same.
In Process 2, a few large values (19 and 20) easily affect
the mean and the midrange.
Hence these two measures are somewhat
higher than the other measures of central tendency.
3.9
a)
Mean = 4020.4 / 15 = 268.0267
Median = 168.3
Midrange = (.8 + 1400) / 2 = 700.4
Yes, these statistics would be expected to be different
especially when there is a very large value in the dataset.
b)
Mean = 2620.4 / 14 = 187.1714
Median = (70.5 + 168.3) / 2 = 119.4
Midrange = 319.9
The value of the midrange changed the most.
3.10
a)
The wait time is mostly less than 25 minutes as viewed from
the frequency table below.
However, in 30% of the observed
cases, the wait time has been 25 minutes or more. In fact, in
14% of the cases the wait time has been at least 29 minutes
but less than 31 minutes.
Frequency Distribution Table
CLASS   CLASS LIMITS       FREQUENCY
1       15 and under 17     8
2       17 and under 19     6
3       19 and under 21    10
4       21 and under 23     8
5       23 and under 25     3
6       25 and under 27     2
7       27 and under 29     6
8       29 and under 31     7
TOTAL                      50
b)
The mean is 21.84 and the median is 21. On average, the new
procedure is reaching its target.
3.11 a)
The mean of the scores is 68.67 and the median is 70.
[Figure: Frequency Histogram; x-axis: Class Limits, from "20 and under 30" to "90 and under 100"]
b)
From the histogram, the scores appear to be clustered
between 50 and 90, with outliers greater than 90 and less than
50.
About 34% of the observations fall between 70 and 80.
The shape is somewhat bell-shaped between 50 and 90.
c)
The new mean (median) should be equal to 4.5 times the old
mean (median) plus 50.
d)
The mean of the non-standardized data is 359 and the median of
the non-standardized data is 365.
3.12
Σx = 20
Σx² = 33
n = 50
x̄ = 20/50 = 0.4
s² = (33 - 20²/50)/49 = .5102
s = √.5102 = .7143
CV = (.7143/.4) × 100 = 178.6%
3.13 a)
Range = 8 - 2 = 6
s² = (118 - 22²/5)/4 = 5.3
s = √5.3 = 2.302
CV = (2.302/4.4) × 100 = 52.322%
b)
Range = 22 - 10 = 12
s² = (3049 - 171²/10)/9 = 13.8777
s = √13.877 = 3.7253
CV = (3.7253/17.1) × 100 = 21.79%
c)
Range = 5.3 - 2.1 = 3.2
s² = (338.18 - 80.4²/20)/19 = .788
s = √.788 = .888
CV = (.8877/4.02) × 100 = 22.09%
3.14 a)
The values of differences should sum to zero.
b)
The variance is the sum of the squared values of the
differences divided by the number of observations minus one.
Variance = (25 + 1 + 9 + 4 + 9 + 4) / 5
Variance = 10.4
c)
Standard deviation is √10.4 = 3.225
3.15
a)
Since the range is larger for the large carriers, one might
expect this group to have more variation.
b)
Standard deviation of the large group of carriers is 32.27.
Standard deviation of the small group of carriers is 16.63.
3.16 Stock A:
x̄ = 149/12 = 12.41667
s = 2.7122
Stock B:
x̄ = 445/12 = 37.08333
s = 6.77506
CVA = (2.7122/12.41667) × 100 = 21.8432%
CVB = (6.77506/37.08333) × 100 = 18.2698%
Stock B appears to be more stable since its coefficient of
variation is lower.
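The comparison in 3.16 can be reproduced from the summary statistics alone. The sketch below is illustrative only; the function name `coefficient_of_variation` is an assumption, and the inputs are the means and standard deviations quoted in the solution.

```python
def coefficient_of_variation(std_dev, mean):
    """Sample standard deviation divided by the sample mean, times 100."""
    return std_dev / mean * 100


# Summary statistics quoted in the solution to 3.16.
cv_a = coefficient_of_variation(2.7122, 12.41667)
cv_b = coefficient_of_variation(6.77506, 37.08333)

# The stock with the lower CV varies less relative to its own mean.
more_stable = "A" if cv_a < cv_b else "B"
print(round(cv_a, 4), round(cv_b, 4), more_stable)
```

Stock B has the larger standard deviation in absolute terms, so the CV is what makes the two stocks comparable on a relative basis.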
3.17
Note that:
Σ(x - x̄)² = Σ(x² - 2x·x̄ + x̄²) = Σx² - 2x̄Σx + n·x̄²
= Σx² - 2(Σx)²/n + (Σx)²/n = Σx² - (Σx)²/n
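The identity in 3.17 can be checked numerically. The following is a minimal sketch under the assumption that the five-value sample from problem 3.29 (1, 4, 6, 7, 9) is used as test data; the function names are illustrative.

```python
import math


def ss_definition(xs):
    """Sum of squared deviations from the mean, computed directly."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


def ss_shortcut(xs):
    """Same quantity via the shortcut: sum of x squared minus (sum of x) squared over n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


sample = [1, 4, 6, 7, 9]  # data from problem 3.29
variance = ss_shortcut(sample) / (len(sample) - 1)
std_dev = math.sqrt(variance)
print(ss_definition(sample), ss_shortcut(sample), variance, std_dev)
```

Both functions agree up to floating-point rounding, and the resulting standard deviation matches the 3.0496 reported in 3.29.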
3.18
a)
The range depends on only 2 values whereas the standard
deviation uses all the data values in its calculation.
However, the range is easier to calculate than the standard
deviation.
b)
Zero is the smallest value.
c)
We can say that all the data points are equal to the same
value.
3.19
a)
A guess at the mean can be determined by looking at where the
data are centered.
A guess at the standard deviation can be
determined by looking at the range of the data.
b)
Mean is 1368 / 7 = 195.4286.
Variance is (276762 - (1368)² / 7) / (7 - 1) = 1569.286.
Standard deviation is 39.614.
c)
Standard deviation will decrease.
d)
Removing the smallest and largest observations,
x̄ = 1008 / 5 = 201.6
s² = (205912 - (1008)² / 5) / (5 - 1) = 674.8
s = 25.977
e)
The middle values give one a clue as to what the mean might
be.
The range of the data gives a clue as to what the standard
deviation might be.
3.20
a)
The first histogram is for the government sector and the
second histogram is for the private sector. The values for the
government sector appear to be more concentrated in the 1000
and less than 2000 range. The values for the private sector
are somewhat larger and slightly more spread out.
b)
The mean and standard deviation for the government employees
are 1660.37 and 1171.106, respectively. The mean and standard
deviation for the private sector employees are 2263.14 and
1247.56, respectively.
3.21
a)
Mean is 85.2, standard deviation is 47.6, coefficient of
variation is 100(47.604/85.2) = 55.87, and the variance is
2266.182.
b)
The mean should change by dividing its value by 60. The
standard deviation should change by dividing its value by 60.
The variance should change by dividing its value by 3600.
c)
Mean is 1.42, standard deviation is .793, and the variance is
.6295.
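The scaling rule in 3.21 (b) can be illustrated with a short check. This is a sketch only: the `seconds` values below are hypothetical and are not the data from the exercise; they just show that dividing every observation by 60 divides the mean and standard deviation by 60 and the variance by 3600.

```python
from statistics import mean, stdev, variance

# Hypothetical observations, NOT the exercise data.
seconds = [30, 75, 120, 45, 156]
minutes = [x / 60 for x in seconds]

scaled_mean = mean(minutes)
scaled_stdev = stdev(minutes)
scaled_variance = variance(minutes)

print(scaled_mean, mean(seconds) / 60)
print(scaled_stdev, stdev(seconds) / 60)
print(scaled_variance, variance(seconds) / 3600)
```

Applied to the summary values in part (a), the same rule gives 85.2/60 = 1.42, 47.604/60 = .793, and 2266.182/3600 = .6295, as reported in part (c).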
3.22
a)
nP / 100 = 10 (75) / 100 = 7.5
Round up to 8.
8th value is 3.
75th percentile is 3.
nP / 100 = 10 (50) / 100 = 5
50th percentile = (5th value + 6th value) / 2
= (2 + 2) / 2
= 2
b)
x̄ = 22/10 = 2.2
s² = (80 - (22)² / 10) / (10 - 1)
= 3.511
s = 1.874
Sk = 3 (x̄ - Md) / s
= 3 (2.2 - 2) / 1.874
= .3202
3.23
Median is 1.5, Mean is 10 / 10 = 1
and s² = (20 - (10)² / 10) / 9 = 1.111
s = 1.054
Sk = 3 (x̄ - Md) / s
= 3 (1 - 1.5) / 1.054
= -1.423
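The skewness coefficient used in 3.22 and 3.23, Sk = 3(mean - median)/s, is simple enough to wrap in a helper. The sketch below is illustrative; the function name `pearson_skewness` is an assumption, and the inputs are the summary values quoted in the two solutions.

```python
def pearson_skewness(mean, median, std_dev):
    """Sk = 3 * (mean - median) / s, as used in these solutions."""
    return 3 * (mean - median) / std_dev


sk_322 = pearson_skewness(2.2, 2, 1.874)    # problem 3.22
sk_323 = pearson_skewness(1, 1.5, 1.054)    # problem 3.23
print(round(sk_322, 4), round(sk_323, 3))
```

A positive value (3.22) indicates a mean above the median, and a negative value (3.23) indicates a mean below the median.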
3.24
a)
20th percentile:
i = (20 × 20)/100 = 4; (4th value + 5th value) / 2
= (2.4 + 2.4)/2 = 2.4
b)
40th percentile:
i = (20 × 40)/100 = 8; (8th value + 9th value) / 2
= (3.7 + 3.8)/2 = 3.75
c)
60th percentile:
i = (60 × 20)/100 = 12; (12th value + 13th value) / 2
= (4.5 + 4.6)/2 = 4.55
d)
80th percentile:
i = (80 × 20)/100 = 16; (16th value + 17th value) / 2
= (6.3 + 7.2)/2 = 6.75
e)
Interquartile range:
25th percentile:
i = (20 × 25)/100 = 5; (5th value + 6th value) / 2
= (2.4 + 2.5)/2 = 2.45
75th percentile:
i = (20 × 75)/100 = 15; (15th value + 16th value) / 2
= (5.4 + 6.3)/2 = 5.85
IQR = 5.85 - 2.45 = 3.4
3.25
x̄ = 50
s = 5
a)
Z = (40 - 50)/5 = -2
b)
Z = (65 - 50)/5 = 3
c)
1 = (x - 50)/5
x = 5 + 50 = 55
d)
-2.5 = (x - 50)/5
x = -12.5 + 50 = 37.5
3.26
a)
nP / 100 = 20 (75) / 100 = 15
75th percentile = (15th value + 16th value) / 2
= (55 + 60) / 2
= 57.5
nP / 100 = 20 (25) / 100 = 5
25th percentile = (5th value + 6th value) / 2
= (35 + 38) / 2
= 36.5
Interquartile range = Q3 - Q1 = 57.5 - 36.5 = 21
b)
c)
Mean = 941 / 20 = 47.05
s² = (47587 - (941)² / 20) / 19
= 174.3658
s = 13.2048
d)
Median is (49 + 50) / 2 = 49.5
Sk = 3 (x̄ - Md) / s
= 3 (47.05 - 49.5) / 13.2048
= -.5566
3.27
a)
20th percentile:
i = 30(20)/100 = 6; (73 + 74)/2 = 73.5 = 20th percentile
b)
80th percentile:
i = 30(80)/100 = 24; (105 + 106)/2 = 105.5 = 80th percentile
c)
25th percentile:
i = 30(25)/100 = 7.5; 75 = 25th percentile
75th percentile:
i = 30(75)/100 = 22.5; 103 = 75th percentile
IQR = 103 - 75 = 28
d)
x̄ = 2719/30 = 90.6333
Median:
i = 30/2 = 15th position;
(87 + 88)/2 = 87.5 = median
s = 19.71781
Sk = 3(90.6333 - 87.5)/19.71781 = 0.47673
e)
The data are slightly skewed to the right.
The middle 50% of the data fall between 75 and 103.
The mean and standard
deviation are 90.6333 and 19.71781, respectively.
3.28
12 15 19 20 21 21 22 42 45 47
52 53 53 54 70 71 71 71 72 73
73 74 74 74 75 77 84 86 87 90
a)
Q1 = 25th percentile:
i = 30(25)/100 = 7.5; 42 = Q1
Q2 = 50th percentile:
i = 30(50)/100 = 15; (70 + 71)/2 = 70.5 = Q2
Q3 = 75th percentile:
i = 30(75)/100 = 22.5; 74 = Q3
s = 24.631
Sk = 3(56.6 - 70.5)/24.631 = -1.69
b)
The observations that are much higher than Q3 and much lower
than Q1 may be considered unusually high and low, respectively.
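The positional rule used for the quartiles in part (a), and throughout these solutions, is: compute i = nP/100; if i is a whole number, average the values in positions i and i + 1; otherwise round i up and take that position. A minimal sketch of the rule follows. The function name `percentile` is illustrative, and the sketch assumes an already sorted list and a whole-number P strictly between 0 and 100.

```python
import math


def percentile(sorted_values, p):
    """Percentile by the manual's positional rule (1-based positions)."""
    n = len(sorted_values)
    if (n * p) % 100 == 0:
        # i is a whole number: average positions i and i + 1.
        i = n * p // 100
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    # Otherwise round i up and take that position.
    i = math.ceil(n * p / 100)
    return sorted_values[i - 1]


# Ordered data from problem 3.28.
data_328 = [12, 15, 19, 20, 21, 21, 22, 42, 45, 47,
            52, 53, 53, 54, 70, 71, 71, 71, 72, 73,
            73, 74, 74, 74, 75, 77, 84, 86, 87, 90]

q1 = percentile(data_328, 25)
q2 = percentile(data_328, 50)
q3 = percentile(data_328, 75)
print(q1, q2, q3)
```

Other software may use interpolation rules that give slightly different quartiles for the same data, so results from a statistics package need not match this rule exactly.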
c)
75 is close to the 75th percentile of 74.
3.29
1 4 6 7 9
a)
s = √[(183 - 27² / 5) / 4] = 3.0496
b)
z = (1 - 5.4)/3.0496 = -1.443
z = (4 - 5.4)/3.0496 = -.4591
z = (6 - 5.4)/3.0496 = 0.1967
z = (7 - 5.4)/3.0496 = 0.52466
z = (9 - 5.4)/3.0496 = 1.18049
c)
The standard deviation of the Z score is expected to be 1.
s_z = √[(4.00053 - 0²) / 4] = 1.00006 ≈ 1
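The z-score calculations in 3.29 can be reproduced directly. This is a sketch, not the manual's own procedure; `z_scores` is an illustrative helper built on the standard library's sample standard deviation (divisor n - 1), which is the definition used in the glossary.

```python
from statistics import mean, stdev


def z_scores(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, divisor n - 1
    return [(x - m) / s for x in values]


z = z_scores([1, 4, 6, 7, 9])  # data from problem 3.29
print([round(v, 4) for v in z])
print(sum(z), stdev(z))
```

Up to floating-point rounding, the z-scores sum to 0 and have standard deviation 1, which is the point made in part (c).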
3.30
                           Boyston     Farmersville
Mean                       1.237       1.235
Standard deviation         .077896     .055227
Median                     1.2         1.245
Coefficient of skewness    1.425       -.5432
3.31
a)
x̄ = 1205; s² = 7,837,307; s = 2799.519; Median = 236
Sk = 3 (1205 - 236) / 2799.519
= 1.038393
b)
The skewness should decrease.
x̄ = 159.2857; s² = 8762.571; s = 93.6086; Median = 153
Sk = 3 (x̄ - Md) / s
= 3 (159.2857 - 153) / 93.6086
= .20145
3.32
a)
nP / 100 = 75 (75) / 100 = 56.25; round up to 57
75th percentile is 7.44
nP / 100 = 75 (25) / 100 = 18.75; round up to 19
25th percentile is 3.12
Interquartile range = 7.44 - 3.12 = 4.32
x̄ = 5.421; Md = 5.81; s² = 6.7554; s = 2.599115
Sk = 3 (x̄ - Md) / s
= 3 (5.421 - 5.81) / 2.599115
= -.449
b)
Interquartile range = .7 (4.32) = 3.024
x̄ = 3.7947; Md = 4.067; s² = 3.310146; s = 1.81938;
Sk = -.449
All values were multiplied by .7 in part (b), except the
variance, which was multiplied by (.7)² = .49, and the
skewness measure, which did not change.
3.33
a)
The data do not look to be very skewed from the histogram.
Therefore, one would expect to have a small value for the
coefficient of skewness.
[Figure: Frequency Histogram; x-axis: Class Limits, from "50 and under 70" to "190 and under 210"]
b)
x̄ = 127.3724; s² = 1782.21124; s = 42.2162; Md = 134.995
Sk = 3 (127.3724 - 134.995) / 42.2162
= -.54168
c)
Since the data are negatively skewed, the largest 5 values are
deleted.
That is, values 199.56, 199.39, 183.22, 180.38, and
174.53 are deleted.
x̄ = 120.7008; s² = 1517.6195; s = 38.9566; Md = 129.49
Sk = 3(120.7008 - 129.49)/38.9566
= -.677
3.34
x̄ = 20
s = 5
a)
x̄ ± 2s
20 ± 2(5)
At least 75% of the data lie between 10 and 30.
b)
x̄ ± 3s
20 ± 3(5)
At least 89% of the data lie between 5 and 35.
3.35 a)
x̄ - s to x̄ + s
100 - 20 to 100 + 20
80 to 120
b)
x̄ - 3s to x̄ + 3s
100 - 60 to 100 + 60
40 to 160
3.36
(1 - 1/k²) × 100% = (1 - 1/16) × 100% = 93.75%
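The Chebyshev bound used in 3.34 through 3.36 is 1 - 1/k² for k standard deviations. The sketch below evaluates the bound and also inverts it to find the k needed for a target fraction; both function names are illustrative and not taken from the manual.

```python
import math


def chebyshev_min_fraction(k):
    """Minimum fraction of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("The bound is only informative for k > 1.")
    return 1 - 1 / k ** 2


def chebyshev_k_for_fraction(fraction):
    """Smallest k whose bound guarantees the given fraction (0 <= fraction < 1)."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1).")
    return 1 / math.sqrt(1 - fraction)


print(chebyshev_min_fraction(2))   # k = 2
print(chebyshev_min_fraction(3))   # k = 3
print(chebyshev_min_fraction(4))   # k = 4, as in problem 3.36
print(chebyshev_k_for_fraction(0.55))
```

The bound holds for any data set, which is why the observed percentages in the later problems are often far above it.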
3.37
a)
x̄ = 448/10 = 44.8
s = 10.5283
b)
x̄ ± 2s
44.8 ± 2(10.5283)
At least 75% of the data lie between 23.7434 and 65.8565.
c)
x̄ ± 3s
44.8 ± 3(10.5283)
At least 89% of the data lie between 13.2152 and 76.3848.
d)
Yes, it is consistent with Chebyshev's inequality.
90% of the
observations fall between 23.7434 and 65.8565 and 100% of the
observations fall between 13.2152 and 76.3848.
3.38
x̄ = 120
s = 30
n = 300
a)
60 to 180 is equivalent to x̄ ± 2s.
At least 75% of the data or 225 observations would lie within
the interval 60 to 180.
b)
With the bell-shaped assumption, 95% of the data or 285
observations would lie between 60 and 180.
3.39
x̄ = 50.5
s = 9.7512
n = 30
x̄ ± s = 50.5 ± 9.7512 = 40.749 to 60.2512
27/30 = .90 or 90% of the data values fall in this interval.
x̄ ± 2s = 50.5 ± 2(9.7512) = 30.997 to 70.002
28/30 = 93.33% of the data values fall in this interval.
x̄ ± 3s = 50.5 ± 3(9.7512) = 21.246 to 79.753
28/30 = 93.33% of the data values fall in this interval.
The data do not appear to come from a normal population since 90%
of the data lie within one standard deviation.
3.40
30 = 45 - 15 to 45 + 15 = 60
By the empirical rule, 15 is equal to two standard deviations.
Therefore, s = 7.5.
3.41
x̄ = 90.6333
s = 19.71781
n = 30
x̄ ± 2s = 90.633 ± 2(19.71781)
At least 75% of the data will fall between 51.198 and 130.069.
Twenty-nine observations or 96.667% of the data actually lie
within this interval.
3.42
1 - 1/k² = .55
1/k² = .45
k² = 2.222
k = 1.49
At least 55% of the data lie within 1.5 standard deviations of the
mean.
3.43
x̄ = 12.733; s = 9.49
x̄ - 2s = -6.247; x̄ + 2s = 31.713;
x̄ - 3s = -15.737; x̄ + 3s = 41.203
a)
14 observations lie within two standard deviations of the
mean, that is 93.3% of data are within two standard deviations
of the mean.
100% of the data lie within three standard
deviations of the mean.
b)
The results in part (a) are consistent with Chebyshev's
inequality.
3.44
a)
The histogram is approximately bell-shaped as displayed below.
[Figure: Frequency Histogram; x-axis: Class Limits, from "0 and under 5" to "40 and under 45"]
b)
The sample mean plus or minus two standard deviations is as
follows:
19.3867 - 2(6.8554) to 19.3867 + 2(6.8554)
5.6759 to 33.0975
Approximately 95% of the data should lie between these
interval endpoints.
3.45 a)
The following interval contains at least 75% of the data:
x̄ - 2s to x̄ + 2s
18,335.76 - 2(2268.515) to 18,335.76 + 2(2268.515)
$13,798.73 to $22,872.79
The following interval contains at least 89% of the data:
x̄ - 3s to x̄ + 3s
18,335.76 - 3(2268.515) to 18,335.76 + 3(2268.515)
$11,530.22 to $25,141.31
b)
196 observations (98%) of the data lie within 2 standard
deviations of the mean.
The minimum number expected by
Chebyshev's inequality is 150.
199 observations (99.5%) of the data lie within 3 standard
deviations of the mean.
The minimum number expected by
Chebyshev's inequality is 178.
3.46
a)
x̄ = [(5)(4) + (15)(7) + (25)(5) + (35)(4)]/20 = 19.5
b)
s = √[(9700 - 390² / 20) / 19] = 10.5006
Note Σfm² = 4(5)² + 7(15)² + 5(25)² + 4(35)² = 9700
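The grouped-data formulas used in 3.46 (and in the problems that follow) treat every observation in a class as if it sat at the class midpoint m, with frequency f. A minimal sketch under that assumption is shown below; the function name `grouped_mean_and_std` is illustrative.

```python
import math


def grouped_mean_and_std(midpoints, freqs):
    """Approximate mean and sample standard deviation from class midpoints."""
    n = sum(freqs)
    sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
    sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))
    mean = sum_fm / n
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)


# Midpoints and frequencies from problem 3.46.
mean_346, s_346 = grouped_mean_and_std([5, 15, 25, 35], [4, 7, 5, 4])
print(mean_346, round(s_346, 4))
```

Because midpoints stand in for the raw values, these results are approximations; the statistics computed from the ungrouped data generally differ slightly, as the part (c) answers in 3.50 and 3.52 show.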
3.47
a)
x̄ = [(10)(10) + (10)(20) + (10)(30) + (10)(40) + (10)(50)]/50
= 30
b)
s² = (55000 - 1500² / 50)/49 = 204.0816
c)
s = √204.0816 = 14.2857
3.48
x̄ = [(30)(5) + (40)(11) + (50)(18) + (60)(6) + (70)(10)]/50
= 2550/50 = 51
3.49
x̄ = (115(2) + 145(12) + 175(4) + 205(1) + 235(2) + 265(1))
/ 22 = 3610 / 22 = 164.0909
Σfm² = 2(115)² + 12(145)² + 4(175)² + 1(205)² + 2(235)² + 1(265)²
= 623950
s² = (623950 - (3610)² / 22) / 21 = 1503.896
s = 38.7801
3.50 a)
Frequency Distribution Table
CLASS   CLASS LIMITS       FREQUENCY
1       6 and under 9       3
2       9 and under 12      6
3       12 and under 15     4
4       15 and under 18     2
TOTAL                      15
b)
x̄ = (7.5(3) + 10.5(6) + 13.5(4) + 16.5(2)) / 15
= 172.5 / 15 = 11.5
Σfm² = 3(7.5)² + 6(10.5)² + 4(13.5)² + 2(16.5)²
= 2103.75
s² = (2103.75 - (172.5)² / 15) / 14 = 8.5714
s = 2.9277
c)
x̄ = 11.66; s = 2.802
3.51
a)
x̄ = [(7.5)(5) + (12.5)(15) + (17.5)(31) + (22.5)(30)
+ (27.5)(16) + (32.5)(3)]/100
= 1980/100 = 19.8
b)
s = √[(42575 - 1980² / 100) / 99] = √34.0505 = 5.8353
c)
Yes, the mean is an appropriate summary statistic since these
data appear to have a symmetrical distribution.
3.52 a)
Frequency Distribution Table
CLASS   CLASS LIMITS       FREQUENCY
1       10 and under 15     15
2       15 and under 20     78
3       20 and under 25     78
4       25 and under 30     23
5       30 and under 35      6
TOTAL                      200
b)
x̄ = [12.5(15) + 17.5(78) + 22.5(78) + 27.5(23) + 32.5(6)]/200
= 20.675
Σfm² = 15(12.5)² + 78(17.5)² + 78(22.5)² + 23(27.5)² + 6(32.5)²
= 89450
s² = [(89450 - (4135)² / 200)] / 199 = 19.894
s = 4.46
c)
x̄ = 20.155; s = 4.106
3.53
[Figure: box plot of C1; axis ticks at 0, 20, 40, 60, 80, and 100]
3.54
a)
Lower hinge = 6.0
b)
Median = 8.0
c)
Upper hinge = 10.5
d)
Mild outlier = 18
e)
No extreme outliers
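The mild and extreme outlier labels above depend on where the inner and outer fences fall. The manual does not restate the fence formulas at this point, so the sketch below assumes the common convention: inner fences 1.5 interquartile ranges beyond the quartiles (hinges), outer fences 3 interquartile ranges beyond them. The function name and default multipliers are illustrative.

```python
def classify_outlier(value, q1, q3, inner=1.5, outer=3.0):
    """Label a value as 'none', 'mild', or 'extreme' relative to the fences.

    Assumes inner fences at q1 - inner*IQR and q3 + inner*IQR, and outer
    fences at q1 - outer*IQR and q3 + outer*IQR (a common convention).
    """
    iqr = q3 - q1
    if value < q1 - outer * iqr or value > q3 + outer * iqr:
        return "extreme"
    if value < q1 - inner * iqr or value > q3 + inner * iqr:
        return "mild"
    return "none"


# Hinges quoted above: lower hinge 6.0, upper hinge 10.5.
label_18 = classify_outlier(18, 6.0, 10.5)
print(label_18)
```

With these hinges the upper inner fence is 10.5 + 1.5(4.5) = 17.25 and the upper outer fence is 10.5 + 3(4.5) = 24, so a value of 18 lies between them.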
3.55
3.56 a)
[Figure: box plot of C1; axis ticks at 6.4, 8.0, 9.6, 11.2, and 12.8]
b)
Approximately 25% of the customers are served in 10 minutes or
more.
Therefore, the manager should rethink the policy of
allowing customers to eat for free if not served within 10
minutes.
3.57 a)
[Figure: Box Plot showing the median, the first and third quartiles, and the inner and outer fences; * mild outlier, o extreme outlier]
b)
The distribution appears to be skewed slightly to the right.
There is one mild outlier.
3.58
[Figure: box plot of C1; axis ticks at 64, 80, 96, 112, 128, and 144]
For this set of 30 observations, we would not expect any mild or
extreme outliers.
3.59
The distribution is slightly skewed to the right.
[Figure: Box Plot showing the median, the first and third quartiles, and the inner and outer fences; * mild outlier, o extreme outlier]
3.60
a)
The box plot below is for Holiday Hotel North
[Figure: Box Plot showing the median, the first and third quartiles, and the inner and outer fences; * mild outlier, o extreme outlier]
The box plot below is for Holiday Hotel South
[Figure: Box Plot showing the median, the first and third quartiles, and the inner and outer fences; * mild outlier, o extreme outlier]
b)
Holiday Hotel South is more skewed to the left (toward the
smaller values).
It has more outliers than Holiday Hotel North.
The median for Holiday Hotel South is larger than that
of Holiday Hotel North.
3.61 a)
x̄ = 15.983
Median = 11.15
There is no mode.
Since there is a very large value (46) in the data set, the
mean will be affected.
The median would be a more appropriate
measure of central tendency.
b)
Sk = 3 (x̄ - Md)/s
= 3 (15.989 - 11.15)/15.2472 = .9521
The data are slightly skewed to the right.
c)
First z = (46 - 15.983)/15.2472 = 1.968
Second z = (16 - 15.983)/15.2472 = .001
The first z-value says that Yahoo's price / sales figure is
approximately 2 standard deviations from the mean.
3.62
a)
x̄ = (1091.3)/7 = 155.9
b)
Midrange = (186.7 + 126.9)/2 = 156.8
c)
s² = (172595.33 - (1091.3)² / 7)/6 = 410.277
s = 20.255
d)
CV = s / x̄ = 13%
e)
nP / 100 = 7(40)/100 = 2.8; round up to 3
40th percentile = 146.9
f)
nP / 100 = 7(25)/100 = 1.75; round up to 2
25th percentile = 146.0
nP / 100 = 7(75)/100 = 5.25; round up to 6
75th percentile is 176
Interquartile range is 176 - 146 = 30
3.63
a)
x̄ = 7.236; s = .4843
At least 75% of the data should lie within 2 standard
deviations of the mean.
x̄ - 2s to x̄ + 2s
7.236 - 2(.4843) to 7.236 + 2(.4843)
6.268 to 8.204
b)
nP / 100 = 11(25)/100 = 2.75; round up to 3
nP / 100 = 11(75)/100 = 8.25; round up to 9
25th percentile is 6.8
75th percentile is 7.5
Interquartile range is 7.5 - 6.8 = .7
The following results are for 8.0 omitted
nP / 100 = 10(25)/100 = 2.5; round up to 3
nP / 100 = 10(75)/100 = 7.5; round up to 8
25th percentile is 6.8
75th percentile is 7.5
Interquartile range is 7.5 - 6.8 = .7
There is no change in the Interquartile range
3.64
a)
x̄ = 203.2667; median = 205; s = 24.835
Sk = 3(203.2667 - 205)/24.835 = -.2094
b)
x̄ = 275.333; median = 275; s = 60.1696
Sk = 3(275.333 - 275)/60.1696 = .0166
c)
The Caribbean data have a larger mean, median and standard
deviation.
The European data are slightly negatively skewed
whereas the Caribbean data are slightly positively skewed.
However, for both groups the magnitude of the skewness
coefficient is small.
3.65
a)
i = 18(25)/100 = 4.5
Q1 = 2.7 (5th position of the ordered data)
i = 18(75)/100 = 13.5
Q3 = 4.1 (14th position of the ordered data)
IQR = Q3 - Q1 = 4.1 - 2.7 = 1.4
s = 1.1757
b)
i = 17(25)/100 = 4.25
Q1 = 2.7 (5th position of the ordered data)
i = 17(75)/100 = 12.75
Q3 = 3.8 (13th position of the ordered data)
IQR = Q3 - Q1 = 3.8 - 2.7 = 1.1
s = .8206
The standard deviation is affected more by the removal of the
outlier than the IQR is affected.
3.66
a)
x̄ = 704/30 = 23.467
s = 7.0257
Median = (22 + 23)/2 = 22.5 (average of the 15th and 16th
positions)
Sk = 3(23.467 - 22.5)/7.0257 = .413
b)
z-scores:
 -.78  -1.92   1.50  -1.49   -.64
  .08   1.50   -.49   -.78   -.92
 1.07    .36   1.21   -.49   -.35
  .93  -1.06    .22   -.35    .79
 -.92   2.35  -1.06   -.21    .65
  .08   -.78   -.07   1.07    .50
c)
Sk is close to zero.
Therefore, the data are not very skewed.
The observation 40 has a z-score of 2.35.
This observation
may be considered an outlier since it is more than 2 standard
deviations from the mean.
3.67
a)
A high Sharpe measure indicates that you are being well paid
for the risk you are taking.
The Janus fund had a better
risk-adjusted performance than the Magellan fund.
b)
The coefficient of variation is the standard deviation divided
by the mean and multiplied by 100.
The Sharpe measure has the
standard deviation in the denominator and average return in
excess of a treasury bill's performance in the numerator.
The Sharpe measure can be thought of as the reciprocal of the
coefficient of variation.
3.68
x̄ = 20
s = 9.1287
z = (x - 20)/9.1287
z-scores are
-1.095  -.548  1.095  .548
The mean of the z-scores = 0 since the sum of the z-scores is 0.
The standard deviation of the z-scores is
s = √[(3 - 0² / 4) / 3] = 1
Show z̄ = 0 always:
z = (x - x̄)/s
Σz = (1/s)Σ(x - x̄) = (1/s)(Σx - Σx) = 0
3.69
a)
Stem | Leaf   (Leaf Unit = .10)
 -2  | 3
 -1  | 55
 -1  | 21
 -0  | 965
 -0  | 211
  0  | 3
  0  | 789
  1  | 013
  1  | 9
  2  | 04
  2  |
  3  | 1
The data are not exactly bell-shaped.
However, the data
appear to be almost uniformly distributed between -1.5 and
2.5.
b)
Sk = 3(.25 - .1)/1.405 = .32
c)
By the empirical rule, approximately 68% of the values should
fall between .25 - 1.405 to .25 + 1.405; -1.155 to 1.655
3.70
a)
x̄ = (484(17.5) + 1010(22.5) + 1188(27.5) + 795(32.5)
+ 278(37.5) + 44(42.5))/3799 = 101998/3799 = 26.8486
b)
Since grouped data are used in part (a), the mean for each
interval is approximated by the midpoint of the interval.
Therefore, the actual data should yield a different mean.
3.71
The interval that contains approximately 95% of the rates is
$75 - (2)($15) to $75 + (2)($15)
$45 to $105
3.72
-1.50 = (15 - 45)/s
s = (15 - 45)/-1.50
s = 20
s² = 400
3.73
By Chebyshev's inequality at least 88.9% of the observations will
lie within 3 standard deviations of the mean.
Therefore, at least
711 aluminum sheets will have castings between 3.0 - 3(.5) and 3.0
+ 3(.5).
This interval is 1.5 to 4.5.
Therefore, the supervisor
should accept the shipment.
3.74
n = 65
x̄ = 520
s = 25
x̄ - 2s = 520 - (2)(25) = 470
x̄ + 2s = 520 + (2)(25) = 570
There should be at least 75% of the data within this interval. If
we know that the population has a bell-shaped distribution, we
should expect approximately 95% of the data to fall within this
interval.
3.75
a)
x̄ - s to x̄ + s
13,062.33 - 7007.578 to 13,062.33 + 7007.578
6054.752 to 20069.908
b)
nP / 100 = 30(25)/100 = 7.5; round up to 8
nP / 100 = 30(50)/100 = 15; average the 15th and 16th
positions.
nP / 100 = 30(75)/100 = 22.5; round up to 23
25th percentile is 7400
50th percentile is (13500 + 14500)/2 = 14000
75th percentile is 17500
Twenty-five percent of the data are less than or equal to
7400.
Fifty percent of the data are less than or equal to 14000.
Seventy-five percent of the data are less than or equal to
17500.
3.76
a)
The histogram for calories is as follows.
The histogram for fat is as follows.
[Figure: Frequency Histogram; x-axis: Class Limits, from "0 and under 1" to "4 and under 5"]
b)
Neither histogram actually resembles a normal distribution.
However, the histogram for fat appears to be closer in shape
as it is more symmetrical than the histogram for calories and
has a single mode near the middle.
c)
Approximately 95% of the data should lie within two standard
deviations of the mean.
2 -2(1.2247) to 2 + 2(1.2247) gives an interval of -.45 to
4.45.
d)
                     Calories      Fat
Mean                 126.92308     2
Standard Error       5.2847889     0.240192231
Median               125           2
Mode                 120           2
Standard Deviation   26.947242     1.224744871
Sample Variance      726.15385     1.5
Kurtosis             -0.9103115    0.070869565
Skewness             0.3683437     -0.141526074
Range                80            4.5
Minimum              90            0
Maximum              170           4.5
Sum                  3300          52
Count                26            26
The Calories data's skewness is slightly positive and the Fat
data's skewness is slightly negative.
The mean, median, and
mode for the Fat data are all equal, whereas these values
differ somewhat for the Calories data.
[Figure: Frequency Histogram; x-axis: Class Limits, from "10 and under 20" to "80 and under 90"]
3.77
a)
[Figure: Box Plot showing the median, the first and third quartiles, and the inner and outer fences; * mild outlier, o extreme outlier]
b)
The data appear to have a bell-shaped distribution.
c)
The mean is 50.2167, standard deviation is 15.913, and the
median is 50. The coefficient of skewness is 3(50.2167 - 50)/
15.913 = .0408.
Note that the coefficient of skewness is
small in magnitude which is consistent with a bell-shaped
distribution.
3.78 The box plot for MarketExp is as follows.
[Figure: Box Plot showing the median, the first and third quartiles, and the inner and outer fences; * mild outlier, o extreme outlier]
The box plot for R&DExp is as follows.
[Figure: Box Plot showing the median, the first and third quartiles, and the inner and outer fences; * mild outlier, o extreme outlier]
b)
The histogram for MarketExp is as follows.
[Figure: Frequency Histogram; x-axis: Class Limits, from "2500 and under 3500" to "12500 and under 13500"]
The histogram for R&DExp is as follows.
[Figure: Frequency Histogram; x-axis: Class Limits, from "2500 and under 3500" to "12500 and under 13500"]
c)
The data for R&DExp appear to follow a normal distribution.
d)
The distribution of both groups appears to be centered near
8,000. The shape of the R&DExp data appears to be more
symmetrically shaped than that of the MarketExp data. The
R&DExp data have 3 mild outliers whereas the MarketExp data do
not have any outliers as indicated by the box plots.
3.79 a)
Mean              24.872
Median            25.275
Minimum           10.21
Maximum           34.42
First Quartile    21.62
Third Quartile    28.34
b)
For the case where the largest value is set to 0,
Mean              24.1836
Median            25.05
Minimum           0
Maximum           32.88
First Quartile    21.56
Third Quartile    28.34
The statistics do not change by much.
c)
For the case where the four largest values are set to 0,
Mean              22.2438
Median            24.935
Minimum           0
Maximum           31.56
First Quartile    21.05
Third Quartile    27.45
Note that the value of the mean has dropped by over 2 units.
The mean is affected more than the median. The first and third
quartiles are not affected very much.
3.80
a)
Approximately 75% of the data are between the values of 21 and
33.
The fiftieth percentile is approximately equal to 25.
Therefore, there are some mild outliers between 50 and 70 and
one extreme value over 70.