Berenson CH03
22/12/09
11:55 AM
Page 1
SOLUTIONS 1
CHAPTER 3: NUMERICAL DESCRIPTIVE MEASURES
Learning Objectives:
In this chapter, you learn:
• To calculate and interpret numerical descriptive measures of central
tendency, variation and shape for numerical data
• To calculate and interpret descriptive summary measures for a
population
• To construct and interpret a box-and-whisker plot
• To calculate and interpret the covariance and the coefficient of
correlation for bivariate data.
Solutions:
3.1 (a) Mean = 6; Median = 7; There is no mode. (b) Range = 7;
Variance = 8.5; Interquartile range = 5.5; Standard deviation = 2.9;
Coefficient of variation = (2.915/6) × 100% = 48.6%. (c) Z scores: 0.343,
–0.686, 1.029, 0.686, –1.372. None of the Z scores is larger than 3.0 or
smaller than –3.0. There is no outlier.
3.2 (a) Mean = 7; Median = 7; Mode = 7. (b) Range = 9;
Variance = 10.8; Interquartile range = 5; Standard deviation = 3.286;
Coefficient of variation = (3.286/7) × 100% = 46.94%. (c) Z scores: 0,
–0.913, 0.609, 0, –1.217, 1.522. None of the Z scores is larger than
3.0 or smaller than –3.0. There is no outlier.
3.3 (a) Mean = 6; Median = 7; Mode = 7. (b) Range = 12; Variance = 16;
Interquartile range = 6; Standard deviation = 4; Coefficient of variation
= (4/6) × 100% = 66.67%.
3.4 (a) Mean = 2; Median = 7; Mode = 7. (b) Range = 17; Variance = 62;
Interquartile range = 14.5; Standard deviation = 7.874; Coefficient of
variation = (7.874/2) × 100% = 393.7%.
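The coefficient of variation quoted in Solutions 3.1 to 3.4 is the standard deviation expressed as a percentage of the mean. The following is a minimal Python sketch of that calculation, using the standard deviations and means reported in those solutions; the function name is illustrative and does not come from the text.

```python
def coefficient_of_variation(std_dev, mean):
    """Return CV = (S / mean) * 100, i.e. the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


# (standard deviation, mean) pairs as reported in Solutions 3.1 to 3.4
reported = {
    "3.1": (2.915, 6),
    "3.2": (3.286, 7),
    "3.3": (4, 6),
    "3.4": (7.874, 2),
}

cv_by_solution = {
    label: coefficient_of_variation(s, m) for label, (s, m) in reported.items()
}

for label, cv in cv_by_solution.items():
    print(f"Solution {label}: CV = {cv:.2f}%")
```

Because the CV is unit-free, it allows the very different spreads in 3.3 and 3.4 to be compared even though their means differ.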
3.5 \( \bar{R}_G = [(1 + 0.1)(1 + 0.3)]^{1/2} - 1 = 19.58\% \).
3.6 (a)
                      Grade X    Grade Y
Mean                  575        575.4
Median                575        575
Standard deviation    6.4        2.1
(b) If quality is measured by the average inner diameter, Grade X tyres
provide slightly better quality because X’s mean and median are both
equal to the expected value, 575 mm. If, however, quality is measured by
consistency, Grade Y provides better quality because, even though Y’s
mean is only slightly larger than the mean for Grade X, Y’s standard
deviation is much smaller. The range in values for Grade Y is 5 mm
compared to the range in values for Grade X which is 16 mm.
(c)
                      Grade X    Grade Y, Altered
Mean                  575        577.4
Median                575        575
Standard deviation    6.4        6.1
In the event the fifth Y tyre measures 588 mm rather than 578 mm, Y’s
average inner diameter becomes 577.4 mm, which is larger than X’s
average inner diameter, and Y’s standard deviation swells from 2.07 mm
to 6.11 mm. In this case, X’s tyres are providing better quality in terms of
the average inner diameter with only slightly more variation among the
tyres than Y’s.
3.7 (a) & (b)
Rolls and Lean Burgers
Mean: \( \bar{X} \) = 11.7
Rank of the median is 11.5
Median = 8.15
3 modes: 5.1, 5.4 and 8.1
Rank of the first quartile is 5.75 which rounds to 6 so Q1 = 5.4
Rank of the third quartile is 17.25 which rounds to 17 so Q3 = 19.6
Variance \( S^2 \) = 69.8314
Standard deviation S = 8.3565…
Range = 30.9
Interquartile range = 14.2
Coefficient of variation CV = 71.42…%
Salads
Mean: \( \bar{X} \) = 15.12
Median = 15.8
no mode
Rank of the first quartile is 1.5 so Q1 = 5.3
Rank of the third quartile is 4.5 so Q3 = 24.6
Variance \( S^2 \) = 99.692
Standard deviation S = 9.9845…
Range = 24
Interquartile range = 19.3
Coefficient of variation CV = 66.035…%
Traditional Items
Mean: \( \bar{X} \) = 31.85
Median = 23.05
no mode
Rank of the first quartile is 1.75 so Q1 = 20.8
Rank of the third quartile is 5.25 so Q3 = 39.3
Variance \( S^2 \) = 318.019
Standard deviation S = 17.833…
Range = 45.3
Interquartile range = 18.5
Coefficient of variation CV = 55.99…%
(c) As expected the traditional items have the highest average fat content,
followed by salads, with rolls and lean burgers the least. However, the
traditional items vary the most in their fat content, as indicated by the
large standard deviation, so some of these items have less fat than some
of the ‘healthier’ options.
3.8 (a) Mean:
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{39,600}{18} = 2,200 \]
So mean daily sales is $2,200
Rank of the median is \( \frac{n + 1}{2} = \frac{18 + 1}{2} = 9.5 \)
Median = \( \frac{2,330 + 2,390}{2} = 2,360 \)
2 modes: $2,390 and $2,400
Rank of the first quartile is \( \frac{n + 1}{4} = \frac{18 + 1}{4} = 4.75 \), round to 5, so Q1 = 1,525
Rank of the third quartile is \( \frac{3(n + 1)}{4} = \frac{3(18 + 1)}{4} = 14.25 \), round to 14, so Q3 = 2,545
(b) Variance
\[ S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1} = \frac{96,601,350 - 18 \times 2,200^2}{17} = 557,726.470\ldots \]
Standard deviation
\[ S = \sqrt{S^2} = \sqrt{557,726.470\ldots} = 746.810\ldots \]
Range = 3,580 – 1,350 = 2,230
Interquartile range = 2,545 – 1,525 = 1,020
Coefficient of variation
\[ CV = \left( \frac{S}{\bar{X}} \right) 100\% = \frac{746.810\ldots}{2,200}\, 100\% = 33.94\ldots\% \]
(c) The mean of daily sales is $2,200 and the median is $2,360,
suggesting that daily sales may be skewed to the left (only a few days
with low sales) as mean < median. Furthermore, daily sales are fairly
varied, ranging from $1,350 to $3,580, with the middle 50% of days having
sales between $1,525 and $2,545. The standard deviation of about $747
implies that the majority of days have sales within $747 of the mean of
$2,200.
3.11 Excel output:
Price ($): 340, 450, 450, 280, 220, 340, 290, 370, 400, 310, 340, 430, 270, 380
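The shortcut variance formula used in Solution 3.8 (b) can be checked on raw data. The sketch below applies it to the fourteen camera prices listed in the 3.11 Excel output, then derives the standard deviation, the coefficient of variation and the Z scores. It is an illustration only: the function and variable names are assumptions, not part of the textbook's Excel workflow.

```python
import math

# Camera prices ($) as listed in the 3.11 Excel output.
prices = [340, 450, 450, 280, 220, 340, 290, 370, 400, 310, 340, 430, 270, 380]


def sample_variance_shortcut(values):
    """Sample variance via S^2 = (sum(X_i^2) - n * mean^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum(x * x for x in values)
    return (sum_of_squares - n * mean ** 2) / (n - 1)


def z_scores(values):
    """Z score of each value: (X - sample mean) / sample standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sample_variance_shortcut(values))
    return [(x - mean) / std for x in values]


mean_price = sum(prices) / len(prices)
variance = sample_variance_shortcut(prices)
std_dev = math.sqrt(variance)
cv_percent = std_dev / mean_price * 100
z = z_scores(prices)

# Flag any price more than 3 standard deviations from the mean.
outliers = [x for x, zi in zip(prices, z) if abs(zi) > 3]

print(f"mean = {mean_price:.2f}, variance = {variance:.2f}, S = {std_dev:.2f}")
print(f"CV = {cv_percent:.2f}%, outliers = {outliers}")
```

The results can be compared with the figures quoted for Problem 3.11 (mean $348, standard deviation $70, CV 20.14%, no outliers).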
3.9 (a) Mean 2.45 minutes; Median 2.5 minutes; Mode 1.4 minutes; First
quartile 1.4 minutes; Third quartile 3.1 minutes. (b) Variance 2.271...;
Standard deviation 1.507... minutes; Range 5.5 minutes; Interquartile
range 1.7 minutes; Coefficient of variation 61.52...%
Time     Z score
0.6      –1.025
0.9      –0.725
1.4      –0.225
1.4      –0.225
1.5      –0.125
2.4      0.775
2.6      0.975
2.7      1.075
2.8      1.175
3.1      1.475
3.9      2.275
6.1      4.475
(c) 6.1 is an outlier, as the positive Z scores are generally larger than the
negative Z scores, and the data are skewed to the right, in spite of the mean
and median being very close. (d) The average time to serve a customer is
approximately 2.5 minutes but this varies from 0.6 minutes to 6.1 minutes,
probably indicating that some customers just need a yes or no answer or
a brochure but that others need detailed information.
3.10 (a) and (b)
Manufacturer                 A           B
Mean                         7377.33     8260.90
First quartile               6667        7569
Median                       7316.5      8140.5
Third quartile               8091        9036
Standard deviation           856.229     909.829
Sample variance              733128      827789
Coefficient of variation     11.61%      11.01%
Range                        3187        3043
Interquartile range          1424        1467
Minimum                      5544        6701
Maximum                      8731        9744
(c) Manufacturer B’s bulbs last longer on average, as both the mean and the
median are higher; however, their lifetimes are more varied, as the standard
deviation and interquartile range are also larger.
3.11 (a) mean = $348; median = $340; 1st quartile = $290; 3rd quartile =
$400. (b) variance = 4910; standard deviation = $70; range = $230;
interquartile range = $110; CV = 20.14%.
Price ($)    Z Score
340          –0.1121
450          1.4576
450          1.4576
280          –0.9684
220          –1.8246
340          –0.1121
290          –0.8257
370          0.3160
400          0.7441
310          –0.5402
340          –0.1121
430          1.1722
270          –1.1111
380          0.4587
None of the Z scores is less than –3 or greater than 3, so there is no
outlier in the prices. (c) The distribution of digital camera prices is
fairly symmetrical. (d) The mean price is $348 while the middle ranked
price is $340. The average scatter of prices around the mean is $70. The
middle 50% of prices are spread over $110 while the difference between
the highest and the lowest price is $230.
3.12 (a) and (b)
Sample Data:                    Distance Km
Arithmetic Mean                 25.46
Median                          26
Mode                            27
Range                           43
Variance (Sample)               117.6
Standard deviation (Sample)     10.844
(c) The mean, median and mode are all similar. So on average these
people drive approximately 26 km to work. Furthermore, the distances
driven to work are clustered in the interval 15 km to 37 km.
3.13 (a) Mean = 473.5; Median = 451. There is no mode.
The median seems to be a better descriptive measure of the data, since it
is closer to the observed values than is the mean. Also the outlier of
1,049 affects the mean.
(b) Range = 785; Variance = 44,422.44; Standard deviation = 210.77
(c) From the manufacturer’s viewpoint, the worst measure would be to
compute the percentage of batteries that last over 400 hours (8/13 = .61).
The median (451) and the mean (473.5) are both over 400, and would be
better measures for the manufacturer to use in advertisements.
(d) (a), (b)
                       Original Data    Altered Data
Mean                   473.5            550.4
Median                 451              492
Mode                   none             none
Range                  785              1,078
Variance               44,422.44        99,435.26
Standard deviation     210.77           315.33
(c) From the manufacturer’s viewpoint, the worst measure remains the
percentage of batteries that last over 400 hours (9/13 = .69). The median
(492) and the mean (550.38) are both well over 400, and would be better
measures for the manufacturer to use in advertisements.
The shape of the distribution of the original data is right-skewed, since
the mean is larger than the median.
The shape of the distribution of the altered data set is right-skewed as
well, since its mean is also larger than its median.
3.14 (a) Mean = 4.287; Median = 4.5; Q1 = 3.20; Q3 = 5.55.
(b) Variance = 2.683; Standard deviation = 1.638; Range = 6.08;
Interquartile range = 2.35; Coefficient of variation = 38.21%
Z scores: –0.05 0.77 –0.77 0.51 0.30 –1.19 –0.46 –0.66 0.13 1.11
–2.39 0.51 1.33 1.16 –0.30
There are no outliers.
(c) Since the mean is less than the median, the distribution may be
left-skewed. (d) The mean and median are both under 5 minutes and the
distribution may be left-skewed, meaning that there are more unusually
low observations than there are high observations. But six of the 15 bank
customers sampled (or 40%) had wait times in excess of 5 minutes. So,
although the customer is more likely to be served in less than 5 minutes,
the manager may have been overconfident in responding that the customer
would ‘almost certainly’ not wait longer than 5 minutes for service.
3.15 (a) Mean = 7.114; Median = 6.68; Q1 = 5.64; Q3 = 8.73.
(b) Variance = 4.336; Standard deviation = 2.082; Range = 6.67;
Interquartile range = 3.09; Coefficient of variation = 29.27%.
(c) Since the mean is greater than the median, the distribution may be
right-skewed. (d) The mean and median are both well over 5 minutes
and the distribution may be right-skewed, meaning that there are more
unusually high observations than low. Further, 13 of the 15 bank
customers sampled (or 86.7%) had wait times in excess of 5 minutes.
So, the customer is more likely to experience a wait time in excess of
5 minutes. The manager overstated the bank’s service record in
responding that the customer would ‘almost certainly’ not wait longer
than 5 minutes for service.
3.16
                       Asking Price
Mean                   472440
Median                 457000
First quartile         397000
Third quartile         529000
Standard deviation     102394.989
Sample variance        1.0485E+10
Kurtosis               1.92131411
Range                  524000
Interquartile range    132000
The mean asking price is $472,440. From the histogram for asking
prices, Problem 2.54, we can see that $472,440 is in the upper of the two
central classes $400,000 to $500,000. We would expect the mean to be
higher than the “centre” of the asking prices as there are two extreme
values, the asking prices over $600,000, which will have affected the
mean.
3.17 (a)
                            Annual Return
Year                        Hang Seng    S&P/ASX 200
2002                        –18.2%       –12.1%
2003                        34.9%        9.7%
2004                        13.2%        22.8%
2005                        4.5%         17.6%
2006                        34.2%        19.0%
2007                        39.3%        11.8%
Geometric rate of return    16.0%        10.8%
(b) The average rate of return for the Hang Seng is higher; however, it
also is more variable.
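The geometric rates of return in Solutions 3.5 and 3.17 follow from compounding the yearly growth factors and taking the n-th root. A minimal Python sketch is given below; it assumes the flattened return values in 3.17 alternate Hang Seng, S&P/ASX 200 for each year from 2002 to 2007, and the function name is illustrative.

```python
def geometric_mean_return(returns):
    """Geometric mean rate of return for periodic returns given as decimals.

    Computes [(1 + R_1)(1 + R_2)...(1 + R_n)]^(1/n) - 1.
    Assumes every return is greater than -100%.
    """
    if not returns:
        raise ValueError("at least one return is required")
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


# Annual returns 2002 to 2007 (assumed column assignment, see lead-in).
hang_seng = [-0.182, 0.349, 0.132, 0.045, 0.342, 0.393]
asx_200 = [-0.121, 0.097, 0.228, 0.176, 0.190, 0.118]

print(f"Solution 3.5:  {geometric_mean_return([0.1, 0.3]):.2%}")
print(f"Hang Seng:     {geometric_mean_return(hang_seng):.1%}")
print(f"S&P/ASX 200:   {geometric_mean_return(asx_200):.1%}")
```

The geometric mean is used rather than the arithmetic mean because returns compound: a loss in one year reduces the base on which later gains are earned.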
3.18 (a)
                            Historical crediting rate for             Average returns to
                            year ending 30 June %                     30 June 2008 % p.a.
Superannuation Fund         2008     2007    2006    2005    2004     3 year    5 year
Conservative Balanced       –3.8     11.7    11.4    11.6    10.9     6.18%     8.18%
Balanced                    –5.9     14.8    14.4    13.7    14.3     7.31%     9.94%
High Growth                 –10.6    19.7    17.6    18.1    16.8     7.96%     11.66%
Socially Responsible, HG    –8.5     17.6    20.7    15.9    17.5     9.11%     12.08%
(b) The average rate of return is lower over the previous 3 years than the
previous 5 years due to the negative returns in the year ending 30th June
2008. The average returns for High Growth and Socially Responsible
shares are the highest and are similar, however, these funds had the
largest losses in the year ending 30 June 2008.
3.19 (a) Population Mean = 6. (b) σ² = 9.4; σ = 3.1.
3.20 (a) Population Mean = 6. (b) σ = 1.67; σ² = 2.8.
3.21 (a)
Population Data              Male      Female
Mean                         97.92     36.17
Variance (Pop)               93.58     38.97
Standard Deviation (Pop)     9.67      6.24
(b) Male
μm ± σm = 97.92 ± 9.67 = (88.24, 107.59); 9 out of 12 or 75% of months are
in this range.
μm ± 2σm = 97.92 ± 2 × 9.67 = (78.57, 117.26); 100% of months are in
this range.
Female
μf ± σf = 36.17 ± 6.24 = (29.92, 42.41); 7 out of 12 or 58.3% of months
are in this range.
μf ± 2σf = 36.17 ± 2 × 6.24 = (23.68, 48.65); 100% of months are in this
range.
(c) The proportion within one standard deviation of the mean for the
male distribution is higher than what would be expected from the
empirical rule, while the proportion within one standard deviation of the
mean for the female distribution is lower than what would be expected
from the empirical rule. Therefore, these distributions may not be mound
shaped.
3.22 (a) As we have weekly sales for all 52 weeks in the year, this is
population data.
Population Data              Forgive       Rejoice
Mean                         564.873       495.638
Variance (Pop)               47778.799     66121.428
Standard Deviation (Pop)     218.584       257.141
(b) Weekly sales in the previous year were higher and less varied for
Forgive chocolates than for Rejoice chocolates. However, as the
population standard deviation is large for both products, we can
conclude that the weekly quantity sold for each product was highly
variable.
(c) & (d)
Forgive                         Lower Value    Upper Value    Number    Percentage
Within 1 standard deviation     346.29         783.46         35        67.31%
Within 2 standard deviations    127.71         1002.04        51        98.08%
Within 3 standard deviations    –90.88         1220.62        52        100.00%

Rejoice                         Lower Value    Upper Value    Number    Percentage
Within 1 standard deviation     238.50         752.78         41        78.85%
Within 2 standard deviations    –18.64         1009.92        50        96.15%
Within 3 standard deviations    –275.78        1267.06        52        100.00%
Last year, for the regional city store, sales were on average 564.9kg and
495.6kg for Forgive and Rejoice chocolates respectively. Furthermore, in
a typical week sales were between 346.3kg and 783.5kg for Forgive
chocolates and between 238.5kg and 752.8kg for Rejoice chocolates.
The percentage of weeks with sales within one, two, and three standard
deviations of the mean approximately follows the empirical rule for
Forgive chocolates. Therefore, the distribution of the quantity sold weekly
of Forgive chocolates may be mound shaped, with most weeks’ sales being
close to the mean of 564.9kg, and a few weeks having very low or high
sales. However, Rejoice chocolates have a larger percentage of weeks
within one standard deviation than that given by the empirical rule, so
the distribution may not be mound shaped.
As all values are within three standard deviations of the mean there are
no outliers. However, there is one possible outlier for Forgive, sales of
1039.2kg in Week 1 (New Year) and two for Rejoice, sales of 1031.4kg
and 1056.3kg in weeks 49 and 51 (leading up to Christmas). As you
would expect sales of Rejoice chocolates to increase over the Christmas
period, and possibly Forgive chocolates are needed after New Year’s
parties, these quantities are probably not outliers.
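The interval limits and percentages discussed for Solution 3.22 can be reproduced from the population means and standard deviations alone. The sketch below is a hedged illustration: the assignment of the week counts (35, 51, 52 to Forgive and 41, 50, 52 to Rejoice) is inferred from the discussion of the empirical rule, and all function names are hypothetical.

```python
# Approximate percentages expected under the empirical rule (bell-shaped data).
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def k_sigma_interval(mean, std_dev, k):
    """Return (lower, upper) limits of mean +/- k standard deviations."""
    return mean - k * std_dev, mean + k * std_dev


def coverage_percent(count, total):
    """Percentage of observations that fall inside an interval."""
    return count / total * 100


# Population mean, population standard deviation and number of weeks within
# 1, 2 and 3 standard deviations (week counts assigned as described above).
products = {
    "Forgive": {"mean": 564.873, "sd": 218.584, "within": {1: 35, 2: 51, 3: 52}},
    "Rejoice": {"mean": 495.638, "sd": 257.141, "within": {1: 41, 2: 50, 3: 52}},
}
WEEKS = 52

for name, p in products.items():
    for k, count in p["within"].items():
        lower, upper = k_sigma_interval(p["mean"], p["sd"], k)
        observed = coverage_percent(count, WEEKS)
        print(
            f"{name} +/-{k} sd: ({lower:.2f}, {upper:.2f}) "
            f"observed {observed:.2f}% vs empirical rule {EMPIRICAL_RULE[k]}%"
        )
```

A large gap between the observed and expected percentages at one standard deviation, as for Rejoice, is the signal the solution uses to question a mound-shaped distribution.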
3.23 (a) and (b) As the data are all employee ages, this is population data
Ages    Z score
19      –0.43
19      –0.43
45      2.94
20      –0.30
21      –0.17
21      –0.17
18      –0.56
20      –0.30
23      0.09
17      –0.69
Mean                  22.30
Variance              59.810
Standard Deviation    7.734
3.24 (a) mean = 35. On average, the 30 employees worked 35 hours last
week. (b) variance = 188.45, standard deviation = 13.73. The average of
the squared differences between each employee’s working hours and the
mean is 188.45. If the distribution is approximately symmetrical, about
68% of the work hours will be within 13.73 of the mean value of 35.
(c) Since the median = 37.5 it is highly likely that the distribution is not
symmetrical. According to the Chebyshev rule at least 75% of the values
will be found within 2 standard deviations of the mean. 93% of the values
are within 2 standard deviations of the mean and 100% of the values are
within 3 standard deviations of the mean. (d) The preceding suggests
there are no outliers within the sample set.
3.25
mj     fj     mj fj    (mj – X̄)² fj
5      10     50       4,000
15     20     300      2,000
25     40     1,000    0
35     20     700      2,000
45     10     450      4,000
n = 100; ∑(mj fj) = 2,500; ∑(mj – X̄)² fj = 12,000
(a) \[ \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{2,500}{100} = 25 \]
(b) \[ S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} = 11.01 \]
3.26
mj     fj     mj fj    (mj – X̄)² fj
5      40     200      5,760
15     25     375      100
25     15     375      960
35     15     525      4,860
45     5      225      3,920
n = 100; ∑(mj fj) = 1,700; ∑(mj – X̄)² fj = 15,600
(a) \[ \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{1,700}{100} = 17 \]
(b) \[ S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} = 12.55 \]
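The grouped-data calculations in Solutions 3.25 and 3.26 approximate the mean and standard deviation from class midpoints and frequencies. The following Python sketch reproduces them under that same approximation; the function name is an assumption made for illustration.

```python
import math


def grouped_mean_and_sd(midpoints, frequencies):
    """Approximate the sample mean and standard deviation of grouped data.

    mean = sum(m_j * f_j) / n
    S    = sqrt( sum((m_j - mean)^2 * f_j) / (n - 1) )
    """
    if len(midpoints) != len(frequencies):
        raise ValueError("midpoints and frequencies must have the same length")
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    squared_dev = sum((m - mean) ** 2 * f for m, f in zip(midpoints, frequencies))
    return mean, math.sqrt(squared_dev / (n - 1))


midpoints = [5, 15, 25, 35, 45]

mean_325, sd_325 = grouped_mean_and_sd(midpoints, [10, 20, 40, 20, 10])
mean_326, sd_326 = grouped_mean_and_sd(midpoints, [40, 25, 15, 15, 5])

print(f"3.25: mean = {mean_325:.2f}, S = {sd_325:.2f}")
print(f"3.26: mean = {mean_326:.2f}, S = {sd_326:.2f}")
```

Every observation in a class is treated as if it sat at the class midpoint, so the results are approximations of the statistics that the raw data would give.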
3.23 (c) The mean age of all employees is 22.3 years with a standard
deviation of 7.7 years. However only 20% (2 out of 10) employees have
ages above the mean, so the mean is not a good measure of the ‘typical
age’ of an employee. Furthermore, 90% (9 out of 10) of employees have
ages within one standard deviation of the mean. There is one extreme
age of 45 years 2.9 standard deviations above the mean, which has
unduly affected the mean.
3.27 Excel output for March:
mj        fj     mj fj     (mj – X̄)² fj
1,000     6      6,000     83,030,400
3,000     13     39,000    38,459,200
5,000     17     85,000    1,332,800
7,000     10     70,000    51,984,000
9,000     4      36,000    73,273,600
11,000    0      0         0
n = 50; ∑(mj fj) = 236,000; ∑(mj – X̄)² fj = 2.48E+08
Excel output for April:
mj        fj     mj fj     (mj – X̄)² fj
1,000     10     10,000    1.16E+08
3,000     14     42,000    27,440,000
5,000     13     65,000    4,680,000
7,000     10     70,000    67,600,000
9,000     0      0         0
11,000    3      33,000    1.31E+08
n = 50; ∑(mj fj) = 220,000; ∑(mj – X̄)² fj = 3.46E+08
(a) March: \[ \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{236,000}{50} = 4,720 \]
April: \[ \bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{220,000}{50} = 4,400 \]
(b) March: \[ S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} = 2,250.08 \]
April: \[ S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} = 2,657.30 \]
(c) The arithmetic mean has declined by $320 while the standard
deviation has increased by $407.22.
3.32 (a)
Five-number Summary    A         B
Minimum                5544      6701
First Quartile         6667      7569
Median                 7316.5    8140.5
Third Quartile         8091      9036
Maximum                8731      9744
(b) [Box-and-whisker plots: CFL Light Bulbs, Manufacturer B and Manufacturer A]
3.32 (b) [Plot axis: Life in Hours, 5500 to 10000] Manufacturer B’s bulbs
generally last longer.
3.28 (a) Five-number summary: 2 3 7 8.5 9
(b) [Box-and-whisker plot; axis 0 to 10]
The distribution is left-skewed.
There is a longer distance from Q1 to Q2 than from Q2 to Q3, confirming
our conclusion that the data are left-skewed.
3.29 (a) Five-number summary: 3 4 7 9 12
(b) [Box-and-whisker plot; axis 0 to 15]
The distribution is almost symmetrical.
The data set is almost symmetrical since the median line almost divides
the box in half but the whiskers show right skewness.
3.30 (a) Five-number summary: 0 3 7 9 12
(b) [Box-and-whisker plot; axis –10 to 20]
The distribution is left-skewed.
The box-and-whisker plot shows a longer left box from Q1 to Q2 than
from Q2 to Q3, visually confirming our conclusion that the data are
left-skewed.
3.31 (a) Five-number summary: –8 –6.5 7 8 9
(b) [Box-and-whisker plot; axis –10 to 10]
The distribution is left-skewed.
The box-and-whisker plot shows a longer left box from Q1 to Q2 than
from Q2 to Q3, visually confirming our conclusion that the data are
left-skewed.
3.33 (a) From solution of Problem 3.8 the five-number summary is
1350 1525 2360 2545 3580
(b) [Box-and-whisker plot; axis: Daily sales ($), 1000 to 4000]
The results are inconsistent. The right-hand whisker is far longer than
the left-hand and the distance from the median to Xlargest is greater than
the distance from the median to Xsmallest indicating right skewness.
However the left-hand side of the box is far longer than the right-hand
side indicating left skewness.
3.34 (a) From solution of Problem 3.7 the five-number summaries are
                  Rolls and Lean Burgers    Salads    Traditional Items
Minimum           3.7                       3.9       19.8
First quartile    5.4                       5.3       20.8
Median            8.15                      15.8      23.05
Third quartile    19.6                      24.6      39.3
Maximum           34.6                      27.9      65.1
(b) [Box-and-whisker plots: Fat Content per Serve (grams), axis 0 to 70, for
Traditional Items, Salads, and Rolls and Lean Burgers]
(c) The traditional items have the highest average fat content, followed
by salads with rolls and lean burgers the least. However, the fat content
of traditional items and rolls and lean burgers varies more than that of the
salads. Therefore, some rolls and lean burgers have more fat than any
salad and some traditional items have less fat than some ‘healthier’
options. Rolls and lean burgers and the traditional items are both skewed
to the right while the fat content of salads seems fairly symmetric.
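Solutions 3.28 to 3.34 read skewness off the five-number summary by comparing distances on either side of the median. The sketch below encodes those three comparisons in Python. The labelling rule (any disagreement between the comparisons is reported as "inconsistent", ties are ignored) is a simplifying assumption made for this illustration, and the function names are hypothetical.

```python
def skew_signals(five_number_summary):
    """Compare left-hand and right-hand distances in a five-number summary.

    Returns, for each comparison, 'left' if the left-hand distance is longer
    (suggesting left skew), 'right' if the right-hand one is, else 'equal'.
    """
    x_min, q1, median, q3, x_max = five_number_summary
    comparisons = {
        "median to extremes": (median - x_min, x_max - median),
        "box halves": (median - q1, q3 - median),
        "whiskers": (q1 - x_min, x_max - q3),
    }
    signals = {}
    for name, (left, right) in comparisons.items():
        if left > right:
            signals[name] = "left"
        elif right > left:
            signals[name] = "right"
        else:
            signals[name] = "equal"
    return signals


def describe_shape(five_number_summary):
    """Summarise the three comparisons as a single label."""
    directions = set(skew_signals(five_number_summary).values()) - {"equal"}
    if directions == {"left"}:
        return "left-skewed"
    if directions == {"right"}:
        return "right-skewed"
    if not directions:
        return "symmetrical"
    return "inconsistent"


summaries = {
    "3.28": (2, 3, 7, 8.5, 9),
    "3.30": (0, 3, 7, 9, 12),
    "3.31": (-8, -6.5, 7, 8, 9),
    "3.33": (1350, 1525, 2360, 2545, 3580),
}
for label, summary in summaries.items():
    print(label, describe_shape(summary), skew_signals(summary))
```

A strict rule like this one cannot express "almost symmetrical", the judgement made in Solution 3.29, so borderline cases still need to be read from the plot.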
3.35 (a) Commercial district: Five-number summary: 0.38 3.2 4.5 5.55 6.46
Residential area: Five-number summary: 3.82 5.64 6.68 8.73 10.49
(b) Commercial district:
[Box-and-whisker plot; axis: Waiting Time, 0 to 8]
The distribution is skewed to the left.
Residential area:
[Box-and-whisker plot; axis: Waiting Time, 0 to 12]
The distribution is skewed slightly to the right.
(c) The central tendency of the waiting times for the bank branch located
in the commercial district of a city is lower than that of the branch
located in the residential area. There are a few longer than normal
waiting times for the branch located in the residential area whereas there
are a few exceptionally short waiting times for the branch located in the
commercial area.
3.38 (a) cov (ULP, Diesel) = 18.5301…
r = 0.6814…
(b) In NSW on this day there was a moderate positive linear
relationship between petrol and diesel prices: where petrol prices are
high, so are diesel prices.
3.39 (a)
[Plot: Annual Water Usage, Local Restaurants; vertical axis: Annual Water
Usage (kilolitres), 0 to 1000; horizontal axis: Number of Seats, 40 to 75]
3.39 (b) cov (size, water usage) = 10.4029…
r = 0.7941…
(c) From the results of parts (a) and (b) we can conclude that there is a
moderately strong positive linear relationship between size and annual
water usage.
3.36 (a) cov(X,Y) = 65.2909. (b) \( S_X^2 \) = 21.7636, \( S_Y^2 \) = 195.8727
\[ r = \frac{\mathrm{cov}(X,Y)}{\sqrt{S_X^2}\,\sqrt{S_Y^2}} = \frac{65.2909}{\sqrt{21.7636}\,\sqrt{195.8727}} = +1.0 \]
(c) There is a perfect positive linear relationship between X and Y; all the
points lie exactly on a straight line with a positive slope.
3.37 Let X = number of sales staff; Y = sales $ million.
(a) \[ SSXY = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} = 10,569 - (10 \times 22 \times 46) = 449 \]
\[ \mathrm{cov}(X,Y) = \frac{SSXY}{n - 1} = \frac{449}{9} = 49.88\ldots \]
We can conclude that there is a positive linear relationship between the
number of people in a sales team and the sales generated.
\[ SSX = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = 5,022 - (10 \times 22^2) = 182 \]
\[ SSY = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = 22,822 - (10 \times 46^2) = 1,662 \]
\[ r = \frac{SSXY}{\sqrt{SSX}\,\sqrt{SSY}} = \frac{449}{\sqrt{182}\,\sqrt{1,662}} = 0.816\ldots \]
(b) We can conclude that there is a fairly strong positive linear relationship
between the number of people in a sales team and the sales generated.
3.40 All of the data collected would be considered to be the population.
Let X = Exports; Y = Imports
(a) \[ SSXY = \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) = 1,181,005,626 \]
\[ \mathrm{cov}(X,Y) = \frac{SSXY}{N} = \frac{1,181,005,626}{42} = 28,119,181 \]
(b) \[ r = \frac{SSXY}{\sqrt{SSX}\,\sqrt{SSY}} = \frac{1,181,005,626}{\sqrt{1,638,229,311}\,\sqrt{1,501,163,228}} = 0.7531 \]
(c) The correlation coefficient is more valuable for expressing the
relationship between exports and imports as it does not depend on the
measurement units. (d) We can conclude that there is a relatively strong
positive linear relationship between exports and imports.
3.41 Central tendency or location refers to the fact that most sets of data
show a distinct tendency to group or cluster about a certain central point.
3.42 The arithmetic mean is a simple average of all the values, but is
subject to the effect of extreme values. The median is the middle ranked
value, but varies more from sample to sample than the arithmetic mean,
although it is less susceptible to extreme values. The mode is the most
common value, but is extremely variable from sample to sample.
3.43 The first quartile is the value below which 1⁄4 of the total ranked
observations will fall, the median is the value that divides the total
ranked observations into two equal halves and the third quartile is the
observation above which 1⁄4 of the total ranked observations will fall.
3.44 Variation is the amount of dispersion, or ‘spread’, in the data.
3.45 The Z score measures how many sample standard deviations an
observation in a data set is away from the sample mean.
3.46 The range is a simple measure, but only measures the difference
between the extremes. The interquartile range measures the range of the
centre fifty percent of the data. The standard deviation measures
variation around the mean while the variance measures the squared
variation around the mean, and these are the only measures that take into
account each observation. The coefficient of variation measures the
variation around the mean relative to the mean.
3.47 The Chebyshev rule applies to any type of distribution while
the empirical rule applies only to data sets that are approximately
bell-shaped. The empirical rule is more accurate than Chebyshev rule
in approximating the concentration of data around the mean.
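The contrast drawn in Solution 3.47 can be made concrete by tabulating the two rules side by side. The short Python sketch below computes the Chebyshev lower bound, 1 – 1/k², and compares it with the usual empirical-rule percentages; the function name and the dictionary are illustrative assumptions.

```python
# Approximate proportions under the empirical rule (bell-shaped data only).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean.

    The Chebyshev rule, 1 - 1/k^2, is only informative for k > 1.
    """
    if k <= 1:
        raise ValueError("the Chebyshev rule requires k > 1")
    return 1.0 - 1.0 / k ** 2


for k in (2, 3):
    print(
        f"k = {k}: Chebyshev at least {chebyshev_lower_bound(k):.2%}, "
        f"empirical rule about {EMPIRICAL_RULE[k]:.1%}"
    )
```

The Chebyshev figure is a guaranteed minimum for any distribution, which is why it is looser than the empirical-rule approximation for bell-shaped data.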
3.48 (a) mean = 5.5014; median = 5.515; first quartile = 5.44; third
quartile = 5.57. (b) range = 0.52; interquartile range = 0.13; variance =
0.0112; standard deviation = 0.10583; coefficient of variation = 1.924%.
(c) The mean weight of the tea bags in the sample is 5.5014 grams while
the middle ranked weight is 5.515 grams. The company should be concerned
about the central tendency because that is where the majority of the
weights will cluster.
The average of the squared differences between the weights in the
sample and the sample mean is 0.0112 whereas the square-root of it is
0.106 gram. The difference between the lightest and the heaviest tea
bags in the sample is 0.52. 50% of the tea bags in the sample weigh
between 5.44 and 5.57 grams. According to the empirical rule, about
68% of the tea bags produced will have weight that falls within
0.106 gram around 5.5014 grams. The company producing the tea bags
should be concerned about the variation because tea bags will not weigh
exactly the same due to various factors in the production process, e.g.
temperature and humidity inside the factory, differences in the density of
the tea, etc. Having some idea about the amount of variation will enable
the company to adjust the production process accordingly.
(d) [Box-and-whisker plot: Teabags; axis 5 to 6]
The data is slightly left skewed.
(e) On average, the weight of the tea bags is quite close to the target of
5.5 grams. Even though the mean weight is close to the target weight
of 5.5 grams, the standard deviation of 0.106 indicates that about 75%
of the tea bags will fall within 0.212 gram around the target weight of
5.5 grams. The interquartile range of 0.13 also indicates that half of the
tea bags in the sample fall in an interval 0.13 gram around the median
weight of 5.515 grams. The process can be adjusted to reduce the
variation of the weight around the target mean.
3.49 (a) and (b)
Sample Data                         NSW ULP    NSW Diesel    QLD ULP    QLD Diesel
Number of Data Points               40         40            40         40
Minimum                             137.90     172.90        133.50     164.90
Maximum                             164.90     187.90        169.90     192.90
Total                               6064.50    7278.00       5809.30    6947.90
Arithmetic mean                     151.61     181.95        145.23     173.70
Median                              151.90     180.90        144.90     172.90
Mode                                151.90     179.90        137.90     170.90
First Quartile                      144.9      179.9         137.9      170.9
Third Quartile                      158.9      185.9         149.9      176.9
Range                               27.00      15.00         36.40      28.00
Inter Quartile Range                14.00      6.00          12.00      6.00
Variance (Sample)                   53.609     13.793        68.349     23.006
Standard Deviation (Sample)         7.322      3.714         8.267      4.796
Coefficient of Variation (Sample)   4.83%      2.04%         5.69%      2.76%
We can conclude that on this day, petrol and diesel prices were on average
higher but less variable in NSW than in Queensland. Also diesel prices are
higher but less variable than unleaded petrol prices.
(c) [Box-and-whisker plots: Fuel Prices, August 2008, for Qld Diesel, Qld ULP,
NSW Diesel and NSW ULP; axis 130 to 200]
Petrol and diesel prices are skewed to the right in Queensland but
symmetric and skewed to the left respectively in New South Wales.
Furthermore, the box-and-whisker plots show that prices for both petrol
and diesel are generally higher in New South Wales than in Queensland.
(d) cov (ULP, Diesel) = 26.3039…
r = 0.6633…
In Queensland on this day there was a moderate positive linear
relationship between petrol and diesel prices: where petrol prices are
high, so are diesel prices.
3.50 (a) & (b)
Sample Data:                         Total Mark
Number of Data Points                55
Minimum                              14
Maximum                              94
Total                                3343
Arithmetic mean                      60.782
Median                               63
Mode                                 Multi
First Quartile                       56
Third Quartile                       73
Range                                80
Inter Quartile Range                 17
Variance (Sample)                    327.581
Standard Deviation (Sample)          18.099
Coefficient of Variation (Sample)    0.298
The sample mean 60.782 is slightly less than the median 63 which
indicates that the data may be skewed to the left with a few small values
(marks). From the standard deviation of 18 we can conclude that the
majority of marks, or a typical student’s mark, will be in the range of
approximately 61 ± 18, that is, 43 to 79. 50% of students have marks in
the range 56 to 73 with 25% less than 56 and 25% more than 73.
(c) [Box-and-whisker plot; axis: Total Mark out of 100]
Data seems skewed to the left as the distance from median to lowest
mark is longer than the distance from median to the highest mark, and
the left-hand whisker is longer than the right hand whisker. However, the
right-hand side of the box is longer than the left-hand side.
(d) \[ \mathrm{cov}(X,Y) = \frac{1948.8}{49} = 39.771\ldots \]
\[ r = \frac{\mathrm{cov}(X,Y)}{S_X S_Y} = \frac{39.771\ldots}{7.48\ldots \times 6.69\ldots} = 0.7939\ldots \]
(e) There is a strong positive linear relationship between a student’s
semester mark and their exam mark.
3.51 (a)
                      2001 Males    2001 Females    2006 Males    2006 Females
Mean                  35.3          37.1            36.4          38.1
Standard deviation    21.8          22.7            22.2          22.9
Note: these are population parameters and are approximations as
calculated from the frequency distribution with the midpoint of the last
class (85 and over) estimated to be 87.
(b) These statistics show that the average age for females is higher than
males and slightly more varied. Furthermore, the Australian population
is growing older, with the average age of both males and females
increasing from 2001 to 2006. Ages are also slightly more varied in
2006 than 2001.
3.52 (a), (b)
                            Plant A    Plant B
Mean                        9.382      11.354
Median                      8.515      11.96
Q1                          7.29       6.25
Q3                          11.42      14.25
Range                       17.2       23.42
Interquartile range         4.13       8
Variance                    15.981     26.277
Standard deviation          3.998      5.126
Coefficient of variation    42.61%     45.15%
(c) [Box-and-whisker plots for Plant B and Plant A]
The distribution of processing times for Plant A is right-skewed.
(d) Processing times for Plants A and B are quite different. Plant B has a
greater range of processing times, much more dispersion among data
values, a higher median, a higher value for the third quartile, and a
greater extreme value than Plant A.
3.53 (a) [Box-and-whisker plots: Annual Household Income $’000’s; Monthly Account $]
Other charts are also appropriate.
(b)
Sample Data:                         Income $’000’s    Amount $
Number of Data Points                40                40
Minimum                              11                0.47
Maximum                              245               182.2
Total                                3,622             2,104.99
Arithmetic mean                      90.55             52.625
Median                               74.5              47.18
Mode                                 Multi             Nil
First Quartile                       40                5.47
Third Quartile                       122               89.15
Range                                234               181.73
Inter Quartile Range                 77.5              78.475
Variance (Sample)                    3,950.613         1,906.184
Standard Deviation (Sample)          62.854            43.66
Coefficient of Variation (Sample)    0.694             0.83
(c) From the box-and-whisker plots we can see that the distributions of
both household income and monthly account are skewed to the right. This
is supported by the mean being higher than the median in both cases,
due to the mean being affected by a few large values. The annual
incomes of 25% of households are less than $40,000 and 25% are more
than $122,000 while the monthly long distance call accounts of 25% of
households are less than $5.47 and 25% are more than $89.15.
(d) [Scatter plot: Monthly Long Distance Account $ against Annual Household
Income $’000]
(e) r = 0.565…. (f) The scatter plot and the value of the correlation
coefficient show that there is a weak positive linear relationship between
a household’s income and the monthly long distance phone account.
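The covariance and correlation figures quoted throughout this chapter come from the same sums of squares and cross-products. As a hedged illustration, the Python sketch below recomputes them from the sums reported in Solutions 3.37 (a sample) and 3.40 (a population); the function names are assumptions and the underlying raw data are not reproduced here.

```python
import math


def covariance_from_ss(ss_xy, n, population=False):
    """Covariance from the sum of cross-products: SSXY / (n - 1), or SSXY / N for a population."""
    divisor = n if population else n - 1
    return ss_xy / divisor


def correlation_from_ss(ss_xy, ss_x, ss_y):
    """Coefficient of correlation r = SSXY / (sqrt(SSX) * sqrt(SSY))."""
    return ss_xy / math.sqrt(ss_x * ss_y)


# Solution 3.37: sums reported for n = 10 sales teams.
n = 10
mean_x, mean_y = 22, 46
ss_xy = 10_569 - n * mean_x * mean_y      # sum(XY) - n * mean_x * mean_y
ss_x = 5_022 - n * mean_x ** 2            # sum(X^2) - n * mean_x^2
ss_y = 22_822 - n * mean_y ** 2           # sum(Y^2) - n * mean_y^2

cov_337 = covariance_from_ss(ss_xy, n)
r_337 = correlation_from_ss(ss_xy, ss_x, ss_y)

# Solution 3.40: population of N = 42 observations.
cov_340 = covariance_from_ss(1_181_005_626, 42, population=True)
r_340 = correlation_from_ss(1_181_005_626, 1_638_229_311, 1_501_163_228)

print(f"3.37: cov = {cov_337:.3f}, r = {r_337:.3f}")
print(f"3.40: cov = {cov_340:,.0f}, r = {r_340:.4f}")
```

The divisor cancels out of r, which is why the correlation is the same whether the data are treated as a sample or as a population, while the covariance is not.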
3.54 (a) and (b)
Rejoice
Mean                        554.76
First Quartile              441.9
Median                      490.35
Third Quartile              596.6
Mode                        #N/A
Standard Deviation          216.898
Sample Variance             47044.927
Range                       747.3
Interquartile Range         154.7
Coefficient of Variation    39.10%
Minimum                     385.2
Maximum                     1132.5
Sum                         5547.6
Count                       10
For the capital city suburban store the mean weekly sales for Rejoice
chocolates is 554.76kg. Furthermore, for 50% of weeks sales are
between 441.9kg and 596.6kg. As the median is less than the mean, the
distribution may be skewed to the right.
The range is large compared to the interquartile range.
(c)
Rejoice    Z-score
385.2      –0.8
412.0      –0.7
441.9      –0.5
445.2      –0.5
453.4      –0.5
527.3      –0.1
545.1      0.0
596.6      0.2
608.4      0.2
1,132.5    2.7
As 90% of the values are within one standard deviation of the mean, the
distribution does not follow the empirical rule, so is not a symmetric
mound shaped distribution. Also the weekly sales of 1132.5kg may be a
possible outlier, so should be investigated.
(d) [Box-and-whisker plot: Capital City Suburb, Rejoice Chocolates; axis:
Weekly Sales, kg, 300 to 1200]
The distribution of weekly sales of Rejoice chocolates for this store is
skewed to the right.
3.55 (a) Geometric mean change in CPI for Australia 2004 to 2008 is
3.12%. Geometric mean change in CPI for New Zealand 2002 to 2006 is
3.04%. (b) For the given period New Zealand’s annual inflation rate was
on average slightly less than Australia’s annual rate of 3.12%.
3.56 (a) Forgive: r = 0.9260…
Rejoice: r = 0.9701…
For both products there is a very strong positive linear relationship
between weekly quantity sold and associated costs.
(b) r = –0.3420…; there is a weak negative linear relationship between
sales of Forgive and Rejoice chocolates. When sales are high for one,
sales tend to be low for the other.
3.57 Much of this output is not valid. Examples:
• Gender and major are categorical variables so box-and-whisker plots
are not valid for this data, nor are the calculated statistics, mean,
median etc.
• Height and grade point value are numerical variables so pie charts
are not valid for this data.
3.58 Answers will vary.