AP* SOLUTIONS
Chapter 3 Numerical Methods for Describing Data Distributions
Section 3.1 Exercise Set 1
3.1: The distribution is approximately symmetric with no outliers, so the mean and standard
deviation should be used to describe the center and spread, respectively.
[Graph: amount (mL)]
3.2: The distribution is positively skewed with an outlier, so the median and interquartile range
should be used to describe the center and spread, respectively.
[Graph: Tip Percent]
3.3: The distribution is positively skewed with a possible outlier, so the median and
interquartile range should be used to describe center and spread, respectively.
[Graph: Defects per 100 cars]
*AP and Advanced Placement Program are registered trademarks of the College Entrance Examination Board,
which was not involved in the production of, and does not endorse, this product.
©2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly accessible website, in whole or in part.
3.4: The average may not be the best measure of a typical value for this data set because
examination of the dotplot (reproduced below) indicates that the distribution is clearly
skewed and may contain an outlier.
[Dotplot: minutes]
Section 3.1 Exercise Set 2
3.5:
The distribution of times between ordering and receiving coffee is roughly symmetric, so
using the mean and standard deviation to describe center and spread, respectively, is
appropriate.
3.6:
The distribution of APEAL ratings is roughly symmetric, so using the mean and standard
deviation to describe center and spread, respectively, is appropriate.
3.7:
The distribution of male exercise times is positively skewed, so the median and
interquartile range should be used to describe the center and spread, respectively.
3.8: The dotplot of average weekday circulation (reproduced below) shows that the distribution
is strongly positively skewed. The mean is best suited to describing a typical value for
symmetric distributions, so it should not be used to describe the center of this
distribution.
Additional Exercises for Section 3.1
3.9: The distribution is skewed, so median and interquartile range should be used to describe
center and spread, respectively.
[Graph: Weekend Exercise Time (minutes)]
3.10:
The distribution of female exercise time is positively skewed, so the median and
interquartile range should be used to describe center and spread, respectively.
3.11: The distribution is roughly symmetric with no obvious outliers, so the mean and standard
deviation should be used to describe center and spread, respectively.
[Graph: Passive Knee Extension (degrees)]
Section 3.2 Exercise Set 1
3.12: The mean is \(\bar{x} = 51.33\) mL. This is a typical or representative value for the amount of
alcohol poured. The standard deviation is \(s = 15.22\) mL, which represents how much,
on average, the values in the data set spread out, or deviate, from the mean.
3.13: (a) \(\bar{x} = 59.23\) mL and \(s = 16.71\) mL. The mean represents a typical or
representative value for the amount of alcohol poured, and the standard deviation represents
how much, on average, the values in the data set spread out, or deviate, from the mean. (b)
On average, individuals pour more alcohol for one shot into short, wide glasses than into
tall, slender glasses.
3.14: (a) \(\bar{x} = 59.85\) hours, \(s = 14.78\) hours; (b) \(\bar{x} = 56.67\) hours, \(s = 9.75\) hours. When Los
Angeles was excluded from the data set, the mean and standard deviation both decreased.
This suggests that using the mean and standard deviation as measures of center and spread
for data sets with outliers present can be risky, because outliers seem to have a significant
impact on those measures.
3.15: Answers will vary; here is one possible answer. The mean, $444, is large, and we can
likely assume that some parents spend amounts close to zero. Thus it is likely that the
amounts vary greatly, making the standard deviation large.
Section 3.2 Exercise Set 2
3.16: (a) \(\bar{x} = 448.30\), which is the typical number of speed-related fatalities for these 20 days;
\(s = 28.24\) is, on average, how much the number of speed-related fatalities deviates from
the mean.
(b) It is not reasonable to generalize from the sample of 20 days to the other 345 days of the
year because these days were not randomly selected. Rather, these are the 20 days that had
the highest number of speed-related fatalities between 1994 and 2003.
3.17: \(\bar{x} = 49.40\) cents, which is the typical cost per serving (in cents) for this set of 15 high-fiber
cereals rated very good or good by Consumer Reports; s  16.10 cents is, on average, how
much the costs per serving deviate from the mean.
3.18: (a) \(\bar{x} = 152.1\) seconds; \(s = 74.6\) seconds
(b) \(\bar{x} = 139.4\) seconds; \(s = 51.6\) seconds. Deleting the observation of 380 had a profound
impact on the mean and standard deviation. The mean decreased from 152.1 to 139.4
seconds, and the standard deviation decreased from 74.6 to 51.6 seconds. This suggests
that using the mean and standard deviation to measure center and spread when outliers are
present can give a misleading perception of the distribution.
3.19: The standard deviation is a reasonable measure of volatility because it measures how much,
on average, individual asset returns deviate from the mean return of the portfolio. A
smaller standard deviation indicates smaller deviations (on average) from the mean return,
and therefore less risk.
Additional Exercises for Section 3.2
3.20: (a) \(\bar{x} = 9.625\) mg/ounce (b) The caffeine concentrations of Coca-Cola and Pepsi Cola are
quite a bit lower than those of the energy drinks. In fact, the average caffeine concentration of the
energy drinks is more than 3 times the caffeine concentration of Coca-Cola and Pepsi Cola,
and some of the individual energy drinks have even more than 3 times the caffeine
concentration.
3.21: (a) \(\bar{x} = 287.714\); the deviations from the mean are shown in the second column of the
table below. (b) The sum of the deviations is shown at the bottom of the second column.

Data value \(x_i\) | Deviation from mean \(x_i - \bar{x}\) | Squared deviation \((x_i - \bar{x})^2\)
497 | 497 - 287.714 = 209.286 | 209.286² = 43,800.630
193 | 193 - 287.714 = -94.714 | (-94.714)² = 8,970.742
328 | 328 - 287.714 = 40.286 | 40.286² = 1,622.962
155 | 155 - 287.714 = -132.714 | (-132.714)² = 17,613.006
326 | 326 - 287.714 = 38.286 | 38.286² = 1,465.818
245 | 245 - 287.714 = -42.714 | (-42.714)² = 1,824.486
270 | 270 - 287.714 = -17.714 | (-17.714)² = 313.786
Sum | \(\sum (x_i - \bar{x}) = 0.002\) | \(\sum (x_i - \bar{x})^2 = 75{,}611.43\)

The deviations from the mean always sum to exactly 0; the 0.002 shown here is rounding error
from using the rounded mean 287.714.
(c) To calculate the variance and standard deviation, the squared deviations and their sum are
needed; these are in the third column. The variance is
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{75{,}611.43}{7 - 1} = 12{,}601.905 \]
and the standard deviation is
\[ s = \sqrt{s^2} = \sqrt{12{,}601.905} = 112.258. \]
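The arithmetic in Exercise 3.21 can be checked with a minimal Python sketch using the standard-library statistics module. It recomputes the mean, the deviations, the sum of squared deviations, and the sample variance with the n - 1 divisor. Because it works from the unrounded mean, the deviations sum to zero up to floating-point error rather than to 0.002.

```python
import statistics

# The seven data values from Exercise 3.21
data = [497, 193, 328, 155, 326, 245, 270]
n = len(data)

mean = statistics.mean(data)                 # x-bar, unrounded
deviations = [x - mean for x in data]        # x_i - x-bar
squared = [d ** 2 for d in deviations]       # (x_i - x-bar)^2
sum_sq = sum(squared)

variance = sum_sq / (n - 1)                  # sample variance s^2 divides by n - 1
sd = variance ** 0.5                         # s = sqrt(s^2)

print(f"mean = {mean:.3f}")
print(f"sum of deviations = {round(sum(deviations), 6)}")  # 0 up to float error
print(f"sum of squared deviations = {sum_sq:,.2f}")
print(f"variance = {variance:,.3f}")
print(f"standard deviation = {sd:.3f}")

# statistics.variance and statistics.stdev also use the n - 1 divisor
assert abs(variance - statistics.variance(data)) < 1e-6
assert abs(sd - statistics.stdev(data)) < 1e-9
```

The two closing assertions confirm that the hand formula matches the library's sample (n - 1) versions.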
3.22: (a) \(\bar{x} = 48.36\) cm. This is a typical distance (in centimeters) at which a bat first detects a
nearby insect. (b) \(s^2 = 327.05\) cm², \(s = 18.08\) cm. The variance is, roughly, the average squared
deviation from the mean detection distance, in square centimeters. The standard deviation
represents, on average, how much the distance at which a bat first detects a nearby insect
deviates from the mean, in centimeters.
3.23: The mean found after subtracting 10 from each sample observation is \(\bar{x} = 38.36\) cm. The
table below shows the original sample observations, the values after subtracting 10, and the
deviations from the new mean.

Original sample observation | Sample observation minus 10 | Deviation from the new mean
62 | 52 | 13.64
23 | 13 | -25.36
27 | 17 | -21.36
56 | 46 | 7.64
52 | 42 | 3.64
34 | 24 | -14.36
42 | 32 | -6.36
40 | 30 | -8.36
68 | 58 | 19.64
45 | 35 | -3.36
83 | 73 | 34.64
The deviations for the data set obtained by subtracting 10 from each sample observation are
exactly the same as the corresponding deviations from the mean for the original data set.
Since the deviations are the same, the new variance (\(s^2\)) and standard deviation (\(s\)) are also
the same as the old variance and standard deviation. Subtracting or adding the same number to
every value in a data set does not change the variance (\(s^2\)) or standard deviation (\(s\)).
3.24: The standard deviation of the original data is s = 18.08 cm. After multiplying each data
value by 10, the new standard deviation is s = 180.8 cm. In general, if each observation is
multiplied by a positive constant c, the standard deviation s is also multiplied by c.
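Exercises 3.23 and 3.24 describe how shifting and rescaling data affect spread. A minimal Python sketch, using the bat detection distances listed in the Exercise 3.23 table, checks both rules numerically: subtracting 10 leaves s unchanged, while multiplying by 10 multiplies s by 10.

```python
import statistics

# Distances (cm) at which a bat first detects an insect, from Exercise 3.23
distances = [62, 23, 27, 56, 52, 34, 42, 40, 68, 45, 83]

shifted = [x - 10 for x in distances]   # Exercise 3.23: subtract 10
scaled = [10 * x for x in distances]    # Exercise 3.24: multiply by 10

s_original = statistics.stdev(distances)
s_shifted = statistics.stdev(shifted)
s_scaled = statistics.stdev(scaled)

print(f"means: {statistics.mean(distances):.2f} and {statistics.mean(shifted):.2f}")
print(f"s original: {s_original:.2f} cm")
print(f"s after subtracting 10: {s_shifted:.2f} cm")      # spread unchanged
print(f"s after multiplying by 10: {s_scaled:.2f} cm")    # spread times 10
```

The mean moves with the shift, but every deviation from the mean stays the same, which is why s does not change.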
Section 3.3 Exercise Set 1
3.25: (a) There is an even number of observations (n = 20), so the median is the average of the
two middle values:
\[ \text{median} = \frac{438{,}722 + 427{,}771}{2} = 433{,}246.5 \]
This value, 433,246.5, divides the ordered data set into two halves: half of the newspapers in
the data set had average weekday circulations of less than 433,246.5, and the other half had
average weekday circulations of more than 433,246.5. (b) The median is preferable
to the mean for describing the center for this data set because the distribution is positively
skewed and contains outliers. (c) It is not reasonable to generalize from this sample to the
population of daily newspapers in the United States because these newspapers were not
randomly selected. Rather, they are the top 20 newspapers in average weekday circulation.
3.26: Lower quartile = 10,478; upper quartile = 11,778. The lower quartile of 10,478 mg/kg is
the value such that 25% of the catsups have sodium contents lower than this value, and
75% are higher. The upper quartile of 11,778 mg/kg is the value such that 75% of the
catsups have sodium contents lower than this value, and 25% are higher. The interquartile
range is \(\text{iqr} = 11{,}778 - 10{,}478 = 1300\). The interquartile range of 1300 mg/kg is the range
of the middle 50% of the catsup sodium contents. It tells us how spread out the middle
50% of the data values are.
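The quartiles in these exercises come from splitting the ordered data into a lower and an upper half and taking the median of each. The Python sketch below assumes that convention, with the overall median left out of both halves when n is odd; software such as statistics.quantiles may use a different interpolation rule and give slightly different quartiles. The helper name five_number_summary is hypothetical, and the seven values from Exercise 3.21 are reused only as an illustration.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'median of each half' rule.

    Assumption: when n is odd, the middle value is left out of both halves.
    Needs at least 2 values so that each half is non-empty.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower_half = xs[:half]               # values before the middle position
    upper_half = xs[half + n % 2:]       # skip the middle value when n is odd
    q1 = statistics.median(lower_half)
    q3 = statistics.median(upper_half)
    return xs[0], q1, statistics.median(xs), q3, xs[-1]


# Illustration only: the seven values from Exercise 3.21
summary = five_number_summary([497, 193, 328, 155, 326, 245, 270])
print("min, Q1, median, Q3, max:", summary)
print("iqr =", summary[3] - summary[1])
```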
3.27: Because n = 25, the median is the value in the middle of the ordered list. Therefore, the
median is 142. The lower quartile is 0, and the upper quartile is 195. The interquartile
range is \(\text{iqr} = 195 - 0 = 195\). Half of the values of number of minutes used in cell phone
calls in one month are less than or equal to 142 minutes, and half of the data values of
number of minutes used in cell phone calls is greater than or equal to 142 minutes. The
middle 50% of the data values have a range of 195 minutes.
3.28: The median tipping percentage is 21%. The lower quartile is 10.75%, and the upper
quartile is 35.6%. The interquartile range is 35.6 – 10.75 = 24.85%. The median tipping
percentage of 21% indicates that half of the tips were below 21%, and the remaining half
were above 21%. The interquartile range indicates that the middle 50% of tips had a range
of 24.85%.
Section 3.3 Exercise Set 2
3.29: (a) The median repair cost is $1,688. The median is the middle value in the ordered list of
repair costs, so half the repair costs are less than or equal to $1,688, and half the repair
costs are greater than or equal to $1,688.
(b) The median is preferable to the mean because the distribution of repair costs is
positively skewed (see dotplot below).
3.30: The lower quartile is 49.5 hours, which is the number of extra hours that divides the lower
25% of values from the upper 75%. The upper quartile is 65.5 hours, which is the number
of extra hours that divides the lower 75% of values from the upper 25%. The interquartile
range is \(\text{iqr} = 65.5 - 49.5 = 16\), which is the range of the middle 50% of the data values.
3.31: The median exercise time for this set of 20 males is 31.5. The median value of 31.5 is the
middle value in the ordered list of exercise times, so half the values are less than or equal to
31.5 and half the values are greater than or equal to 31.5. The lower quartile is 3.75, and
the upper quartile is 67.5. Therefore, the interquartile range is \(\text{iqr} = 67.5 - 3.75 = 63.75\),
which is the range of the middle 50% of the exercise times.
3.32: (a) The median exercise time for this set of 20 females is 7.5, which represents the middle
value of the ordered female exercise times, so half the exercise times are less than or equal
to 7.5, and half the exercise times are greater than or equal to 7.5. The lower quartile is 1.0,
and the upper quartile is 49.50, so the interquartile range is \(\text{iqr} = 49.5 - 1.0 = 48.5\). The
middle 50% of the data has a spread of 48.5.
(b) The median male exercise time is much greater than the median female exercise time.
In addition, the male interquartile range is greater than the female interquartile range,
which indicates that there is more variability in the middle 50% of exercise times for males
than for females.
Additional Exercises for Section 3.3
3.33: The large difference between the mean and median indicates that some parents
spent large amounts of money on school supplies, while the amounts spent by the lowest
spenders were not as far from the median. These extreme values pull the mean toward
them, while the median remains largely unchanged.
3.34: The median is the measure of center that determines this salary, and its value is $4,286. The other
measure of center is the mean, and its value for this data set is \(\bar{x} = \$3{,}969\). The value of
the mean is less than that of the median, which makes the mean less favorable to the San
Luis Obispo County supervisors.
3.35: (a) The dotplot is relatively symmetric, with a possible outlier at the high end of the scale.
As such, the mean and median will be relatively close to each other, with the mean slightly
greater than the median because the possible outlier pulls the mean upward.
[Dotplot: time (s)]
(b) \(\bar{x} = 370.69\) seconds, \(\text{median} = \frac{369 + 370}{2} = 369.5\) seconds. (c) The largest time could
be increased by any amount without affecting the sample median, because the middle values
do not change when the largest value is increased. The largest time could be decreased to
370 seconds without changing the value of the median.
Section 3.4 Exercise Set 1
3.36: Minimum = 0, lower quartile = 14, median = 33.5, upper quartile = 63, maximum = 151.
3.37:
[Boxplot: manufacturing defects]
The boxplot shows that there is one outlier (170 defects), and the value of the largest non-outlier
is 146 defects. The middle 50% of the data values range between about 106 and
126 defects. The distribution is positively skewed. The median is not centered in the
middle 50% of the data values, further indicating the skewed nature of the distribution.
3.38: (a) No, they are not outliers. Any values greater than upper quartile + 1.5(iqr) or less than
lower quartile – 1.5(iqr) are considered outliers. For this data set, values are outliers if they
are greater than \(32.3 + 1.5(32.3 - 20) = 50.75\) cents or less than \(20 - 1.5(32.3 - 20) = 1.55\)
cents. The largest value is not greater than 50.75, and the smallest value is not less than
1.55, so these values are not outliers.
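The 1.5(iqr) outlier rule used here and in Exercises 3.39, 3.43, 3.44, and 3.48 reduces to two fences. A minimal Python sketch, with hypothetical helper names outlier_fences and is_outlier, reproduces the fences from the quartiles quoted in the solutions.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (q1 - k*iqr, q3 + k*iqr)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper


# Quartiles quoted in the solutions (3.38: cents, 3.39: inches, 3.43: percent)
for label, q1, q3 in [("3.38", 20, 32.3), ("3.39", 16.05, 21.93), ("3.43", 11.1, 13.4)]:
    lower, upper = outlier_fences(q1, q3)
    print(f"{label}: fences {lower:.2f} and {upper:.2f}")

# Exercise 3.43: Vermont (9.5) and Mississippi (18.0)
print(is_outlier(9.5, 11.1, 13.4), is_outlier(18.0, 11.1, 13.4))
```

Printing with two decimals hides the tiny floating-point error in values such as 32.3 - 20.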
(b) [Boxplot: gasoline tax per gallon (cents)]
The distribution is positively skewed. The median is not located at the center of the middle
50%, further indicating a skewed distribution.
3.39: (a) lower quartile = 16.05 inches, upper quartile = 21.93 inches. \(\text{iqr} = 21.93 - 16.05 = 5.88\)
inches.
(b) Any values greater than upper quartile + 1.5(iqr) or less than lower quartile – 1.5(iqr)
are considered outliers. For this data set, values are outliers if they are greater than
\(21.93 + 1.5(5.88) = 30.75\) inches or less than \(16.05 - 1.5(5.88) = 7.23\) inches. The value
31.57 inches is an outlier.
(c) [Modified boxplot: rainfall (inches)]
The modified boxplot shows one outlier at the high end of the scale. The distribution of
inches of rainfall is slightly positively skewed.
3.40:
[Graph: amount of alcohol poured (mL), Short Wide and Tall Slender]
Both distributions (short wide and tall slender) are skewed, although the direction of skew
is different for the two distributions. The distribution of amount of alcohol poured into
short wide glasses is positively skewed, and the distribution of amount of alcohol poured
into tall slender glasses is negatively skewed. The amount of alcohol poured into short
wide glasses tends to be more than the amount poured into tall slender glasses.
Specifically, the five-number-summary values for short wide glasses are all greater than the
corresponding values for tall slender glasses. For example, the maximum amount of
alcohol poured into short wide glasses (92.4 mL) is much greater than the maximum
amount of alcohol poured into tall slender glasses (73.5 mL). In addition, the median amount
of alcohol poured into short wide glasses (60.4 mL) is greater than the median amount of
alcohol poured into tall slender glasses.
Section 3.4 Exercise Set 2
3.41: Minimum: 28.8; Lower Quartile: 35.7; Median: 37.3; Upper Quartile: 38.5; Maximum:
42.2
3.42:
[Graph: Waiting time (seconds)]
The distribution of waiting times is nearly symmetric, with a median of 120 seconds. The
times range from a minimum of 40 to a maximum of 200 seconds. The middle 50% of
waiting times range between 85 and 160 seconds.
3.43: (a) Lower quartile: 11.1; Upper quartile: 13.4; interquartile range = 13.4 – 11.1 = 2.3. Any
observations smaller than \(11.1 - 1.5(2.3) = 7.65\) or larger than \(13.4 + 1.5(2.3) = 16.85\) are
considered outliers. Vermont's data value (9.5) is not an outlier because it is not smaller
than 7.65. Mississippi's data value (18.0) is an outlier because it is larger than 16.85.
(b) The boxplot shows the one large outlier (Mississippi). Excluding the outlier, the
boxplot is relatively symmetric. The median is 12.3, and the upper and lower quartiles
agree with the values stated in part (a). Clearly, Mississippi has an unusually large value
for the percent of premature births in 2008.
[Boxplot: Premature percent]
3.44: (a) Lower quartile: 81.5; Upper quartile: 94; Interquartile range = 94 – 81.5 = 12.5.
Outliers are observations that are smaller than \(81.5 - 1.5(12.5) = 62.75\) or larger than
\(94 + 1.5(12.5) = 112.75\). The farmer's observation (43) is an outlier, and the student's
observation (152) is an outlier.
(b) [Graph: Accidents per 1,000]
(c) Answers may vary. One possible answer is to offer a professional discount on auto
insurance to the professions below the lower quartile for accidents (law enforcement,
physical therapist, veterinarian, clerical (secretary), clergy, homemaker, politician, pilot,
firefighter, and farmer).
3.45:
[Graph: Exercise Time, Male and Female]
The distributions of exercise times for males and females are both positively skewed. The
distribution of the middle 50% of the male observations is approximately symmetric, but
the distribution of the middle 50% of the female observations is positively skewed. The
values of the lower quartile (3.75), median (31.5), and upper quartile (67.5) for the males
are all larger than the corresponding values for female exercise times (1.0, 7.5, and 49.5,
respectively). The male distribution has one large outlier, while the female distribution has
no outliers.
Additional Exercises for Section 3.4
3.46: The boxplot is shown below. No, the boxplot is not approximately symmetric; it is
positively skewed.
[Boxplot: Maximum Annual Wind Speed]
3.47: (a)
[Graph: Wireless %, East, Middle States, and West]
(b) The most noticeable difference between the wireless percent for the three geographical
regions is that the Middle States region is negatively skewed, and has a smaller
interquartile range than the East and West regions. The Eastern region has the smallest
median (11.4%), and the Middle States and Western regions have medians that are much
closer to each other (16.9% and 16.3%, respectively).
3.48: (a) Lower quartile: 44; Upper quartile: 53; Interquartile range: 53 – 44 = 9. Observations
smaller than \(44 - 1.5(9) = 30.5\) or larger than \(53 + 1.5(9) = 66.5\) are outliers. There are no
observations smaller than 30.5 or larger than 66.5, so there are no outliers in this data set.
(b) The boxplot is shown below. As indicated in part (a), there are no outliers. The
median of this data set is 46%. The entire data set ranges between a minimum of 33% and
a maximum of 60%. The middle 50% of observations range between 44% and 53%. The
middle 50% is also asymmetric, with the lower half ranging between 44% and 46% and the
upper half ranging between 46% and 53%.
[Boxplot: Juice Lost After Thawing (%)]
3.49: The fact that the mean is so much higher than the median indicates that the distribution is
positively skewed. There were undoubtedly some very high punitive damage awards,
which pulled the mean up toward the large values.
Section 3.5 Exercise Set 1
3.50: First national aptitude test: \( z = \frac{625 - 475}{100} = 1.5 \). Second national aptitude test:
\( z = \frac{45 - 30}{8} = 1.875 \). The student performed better on the second national aptitude test
relative to the other test takers because the z-score for the second test is higher than that for
the first test.
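Exercise 3.50 compares performances measured on two different scales by converting each to a z-score. A minimal Python sketch with a hypothetical z_score helper:

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_first = z_score(625, 475, 100)   # first national aptitude test
z_second = z_score(45, 30, 8)      # second national aptitude test

print(f"first test:  z = {z_first}")
print(f"second test: z = {z_second}")
better = "second" if z_second > z_first else "first"
print(f"Better relative standing on the {better} test")
```

The same helper reproduces the signed z-scores in Exercises 3.58 and 3.60; for example, z_score(0, 1650, 750) returns -2.2.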
3.51: (a) 40 minutes is 1 standard deviation above the mean; 30 minutes is 1 standard deviation
below the mean. The values that are 2 standard deviations away from the mean are 25 and
45 minutes. (b) Approximately 95% of times are between 25 and 45 minutes;
approximately 0.3% of times are less than 20 minutes or greater than 50 minutes; and
approximately 0.15% of times are less than 20 minutes.
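The empirical-rule reasoning in Exercise 3.51 (and later in 3.55) can be sketched in Python. The mean of 35 minutes and standard deviation of 5 minutes are inferred from part (a), where 30 and 40 minutes sit 1 standard deviation below and above the mean; the helper name empirical_interval is hypothetical.

```python
# Empirical rule for mound-shaped, roughly symmetric distributions:
# about 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard
# deviations of the mean.
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_interval(mean, sd, k):
    """Return (low, high, approx. percent inside) for mean +/- k*sd."""
    return mean - k * sd, mean + k * sd, EMPIRICAL_PERCENT[k]


# Exercise 3.51: 30 and 40 minutes are 1 sd below and above the mean,
# which implies mean = 35 minutes and sd = 5 minutes.
MEAN, SD = 35, 5

for k in (1, 2, 3):
    low, high, inside = empirical_interval(MEAN, SD, k)
    outside = 100 - inside
    # By symmetry, the values outside the interval split evenly between tails.
    print(f"{k} sd: {low} to {high} min, ~{inside}% inside, "
          f"~{outside / 2:.2f}% in each tail")
```

Halving the 0.3% outside the 3-standard-deviation interval gives the 0.15% below 20 minutes in part (b).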
3.52: The 10th percentile of $0 indicates that 10% of students have $0 or less of student debt.
The 25th percentile (which is the lower quartile) indicates that 25% of students have $0 or
less of student debt. The 50th percentile (the median) indicates that 50% of students have
$11,000 or less of student debt. The 75th percentile (the upper quartile) indicates that 75%
of students have $24,600 or less of student debt. The 90th percentile indicates that 90% of
students have $39,300 or less of student debt.
3.53: (a)
[Graph: Frequency vs. Bus Travel Times (minutes)]
(b) (i) 86th percentile is approximately 21 minutes; (ii) 15th percentile is approximately 18
minutes; (iii) 90th percentile is approximately 21.5 minutes; (iv) 95th percentile is
approximately 25.5 minutes; (v) 10th percentile is approximately 17.5 minutes
Section 3.5 Exercise Set 2
3.54: (a) The z-score tells us that the score is 2.2 standard deviations above the mean. Because
the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score
corresponds to a score slightly above the 97.5th percentile, which means the score is
greater than or equal to slightly more than 97.5% of all the scores.
(b) The z-score tells us that the score is 0.4 standard deviations above the mean. Because
the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score
tells us that the score is in the upper half of all scores.
(c) The z-score tells us that the score is 1.8 standard deviations above the mean. Because
the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score
corresponds to a score slightly below the 97.5th percentile.
(d) The z-score tells us that the score is 1.0 standard deviation above the mean. Because
the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score
corresponds to approximately the 84th percentile, which means the score is greater
than or equal to approximately 84% of all the scores.
(e) A z-score of 0 indicates that the score was equal to the mean, which for a symmetric
distribution is also the median.
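The empirical rule gives only rough benchmarks (the 50th, 84th, and 97.5th percentiles). Assuming the scores follow a normal curve, the model the empirical rule is built on, Python's statistics.NormalDist (Python 3.8 and later) gives sharper percentile estimates for the z-scores in Exercise 3.54. This is a swapped-in normal-model calculation, not the method used in the solution above.

```python
from statistics import NormalDist

# Assumption: scores are approximately normal, so z-scores follow the
# standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

for z in (2.2, 0.4, 1.8, 1.0, 0.0):
    # cdf(z) is the proportion of the distribution at or below z
    percentile = 100 * standard_normal.cdf(z)
    print(f"z = {z:4.1f}: about the {percentile:.1f}th percentile")
```

These estimates agree with the empirical-rule answers: z = 2.2 lands above the 97.5th percentile, z = 1.8 below it, and z = 1.0 near the 84th.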
3.55: (a) Given that the distribution is symmetric and mound-shaped, we can apply the empirical
rule. Twenty-seven mph is 1 standard deviation below the mean, and 57 mph is 1 standard
deviation above the mean. The empirical rule tells us that approximately 68% of the
vehicle speeds lie within one standard deviation of the mean, that is, between 27 and 57 mph.
(b) Given that the distribution is symmetric and mound-shaped, we can apply the empirical
rule. Fifty-seven mph is 1 standard deviation above the mean. About 68% of speeds lie within
1 standard deviation of the mean, and the remaining 32% split evenly between the two tails,
so approximately 84% of vehicle speeds are below 57 mph and approximately 16% are above 57 mph.
3.56: The 83rd percentile indicates that her score was greater than or equal to 83% of all scores
on the verbal section of the test. Additionally, she scored greater than or equal to 94% of
all scores on the math section.
3.57: (a) The frequency distribution is shown in the table below.

Expenditures (per capita) | Frequency
0 to <2 | 13
2 to <4 | 18
4 to <6 | 10
6 to <8 | 5
8 to <10 | 1
10 to <12 | 2
12 to <14 | 0
14 to <16 | 0
16 to <18 | 2

[Graph: Frequency vs. Expenditures (per capita)]
(b) (i) The 50th percentile is between per capita expenditures of 2 and 4.
(ii) The 70th percentile is between per capita expenditures of 4 and 6.
(iii) The 10th percentile is between per capita expenditures of 0 and 2.
(iv) The 90th percentile is between per capita expenditures of 7 and 8.
(v) The 40th percentile is between per capita expenditures of 2 and 4.
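The class intervals in Exercise 3.57(b) can be located from cumulative counts: the pth percentile falls in the first class whose cumulative frequency reaches p% of n = 51. The Python sketch below does that and, as an added assumption, also interpolates linearly within the class (treating values as spread evenly across it). The helper name percentile_class is hypothetical.

```python
# Frequency table from Exercise 3.57: classes 0 to <2, 2 to <4, ..., 16 to <18
LOWER_BOUNDS = [0, 2, 4, 6, 8, 10, 12, 14, 16]
FREQS = [13, 18, 10, 5, 1, 2, 0, 0, 2]
WIDTH = 2


def percentile_class(p):
    """Return (class low, class high, interpolated estimate) for the pth percentile.

    The class is the first one whose cumulative count reaches p% of n.
    The estimate assumes values are spread evenly within that class.
    """
    n = sum(FREQS)
    target = p / 100 * n
    cum = 0
    for low, f in zip(LOWER_BOUNDS, FREQS):
        if f > 0 and cum + f >= target:
            estimate = low + (target - cum) / f * WIDTH
            return low, low + WIDTH, estimate
        cum += f
    raise ValueError("p must be between 0 and 100")


for p in (50, 70, 10, 90, 40):
    low, high, est = percentile_class(p)
    print(f"{p}th percentile: class {low} to <{high}, about {est:.2f}")
```

For the 90th percentile the interpolated value is about 7.96, which falls inside the interval given in (iv).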
Additional Exercises for Section 3.5
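3.58: (a) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{0 - 1{,}650}{750} = -2.2 \)
(b) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{10{,}000 - 1{,}650}{750} = 11.133 \)
(c) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{4{,}500 - 1{,}650}{750} = 3.8 \)
(d) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{300 - 1{,}650}{750} = -1.8 \)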
3.59: (a) 1100 gallons; (b) 1400 gallons; (c) 1700 gallons
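3.60: (a) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{320 - 450}{70} = -1.857 \)
(b) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{475 - 450}{70} = 0.357 \)
(c) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{420 - 450}{70} = -0.429 \)
(d) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{610 - 450}{70} = 2.286 \)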
3.61: (a) 120; (b) 20; (c) \( z = \frac{90 - 100}{20} = -0.5 \); (d) 97.5; (e) Since a score of 40 is 3 standard
deviations below the mean, it corresponds to approximately the 0.15th percentile. Therefore, there
were relatively few scores below 40.
Chapter 3: Are You Ready to Move On?
3.62: (a) The distribution of female weekend exercise time is positively skewed (see the boxplot
below), so the median and interquartile range should be used to describe center and spread,
respectively.
[Boxplot: Female Weekend Exercise Time]
(b) The distribution of amount of alcohol poured is negatively skewed (see the boxplot
below), so the median and interquartile range should be used to describe center and spread,
respectively.
[Boxplot: Amount of Alcohol (mL)]
(c) The distribution of wait times is positively skewed with a large outlier (see the boxplot
below), so the median and interquartile range should be used to describe center and spread,
respectively.
[Boxplot: wait time (sec)]
3.63: The mean APEAL rating is \(\bar{x} = 792.03\), which is a typical or representative value for the
APEAL ratings in the sample. The standard deviation is \(s = 36.70\) and represents how
much, on average, the values in the data set spread out, or deviate, from the mean APEAL
rating.
3.64: The high-caffeine energy drinks show much more variability in caffeine per ounce. This
can be seen in the comparative boxplots below. In addition, since both distributions are
reasonably symmetric, the standard deviation is an appropriate measure of variability. The
standard deviation for the caffeine content per ounce in the energy drinks is s = 0.667, and
the standard deviation for the caffeine content per ounce in the high-caffeine energy drinks
is s = 8.31.
[Boxplots: Caffeine per Ounce, Top Selling Energy Drink and High Caffeine Energy Drink]
3.65: (a) The mean tipping percent is \(\bar{x} = 27.31\%\), and the standard deviation is \(s = 23.83\%\). (b)
After removing the 105% tip, the new mean and standard deviation are \(\bar{x}_{\text{new}} = 23.23\%\) and
\(s_{\text{new}} = 15.70\%\). These values are much smaller than the mean and standard deviation
computed with the 105% tip included. This suggests that the mean and standard deviation can
change dramatically when outliers are added to or removed from the data set and,
therefore, are probably not the best measures of center and spread to use in this situation.
3.66: The mean repair cost is $2,119, and the median repair cost is $1,688. These values are so
different because the distribution is positively skewed: the mean is pulled toward the large
values in the upper tail, whereas the median is resistant to them. Therefore, the median is the
better measure of a typical repair cost for this positively skewed distribution (see dotplot
below).
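A short sketch of the same effect, assuming a small made-up set of repair costs with one long right tail (these are not the exercise data). The single large cost raises the mean well above the median.

```python
import statistics

# Hypothetical, positively skewed repair costs in dollars (illustration only)
costs = [900, 1200, 1500, 1700, 2000, 5000]

mean_cost = statistics.mean(costs)      # pulled toward the $5,000 value
median_cost = statistics.median(costs)  # average of the two middle values
print(f"mean = ${mean_cost:,.0f}, median = ${median_cost:,.0f}")
```

Here the mean is $2,050 while the median is $1,600, so the mean sits well above the middle of the ordered list, as described for the skewed repair-cost data.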
3.67: (a) The median is 140 seconds, and the interquartile range is iqr = 200 – 100 = 100 seconds.
The median divides the ordered list into two equal halves, with half the values less than 140
seconds and half the values greater than 140 seconds. The interquartile range of 100
seconds indicates that the middle 50% of the data values span a range of 100 seconds. (b)
Because of the large outlier (the value 380 seconds), the median and interquartile range are
the appropriate summary measures of center and spread for this data set.
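A minimal sketch of how the median and quartiles can be found, assuming the common textbook rule that each quartile is the median of the lower or upper half of the ordered data (with the overall median left out of both halves when n is odd). Software that interpolates percentiles may report slightly different quartiles. The wait times are made up for illustration and are not the exercise data.

```python
import statistics


def quartiles(values):
    """Return (lower quartile, median, upper quartile) by the median-of-halves rule.

    When n is odd, the overall median is left out of both halves.
    Interpolation-based percentile methods used by some software can give
    slightly different quartiles.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + 1:] if n % 2 == 1 else data[half:]
    return (statistics.median(lower_half),
            statistics.median(data),
            statistics.median(upper_half))


# Made-up wait times in seconds (not the exercise data), including one large value
waits = [60, 90, 100, 120, 140, 150, 190, 210, 380]
q1, med, q3 = quartiles(waits)
iqr = q3 - q1
print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, iqr = {iqr}")
```

Because the median and quartiles depend only on the order of the values, replacing 380 with an even larger number would leave all three unchanged, which is why they are preferred here.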
3.68: (a) Median = 58, lower quartile = 53.5, upper quartile = 64.4
(b) Outliers at the low end of the distribution are values less than lower quartile – 1.5(iqr).
The iqr = 64.4 – 53.5 = 10.9, so values less than 53.5 – 1.5(10.9) = 37.15 are considered
outliers. Since the values for Alaska (28.2) and Wyoming (35.7) are both less than 37.15,
they are outliers.
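The fence arithmetic in part (b) can be checked with a small sketch. The function name `outlier_fences` is hypothetical; it applies the 1.5(iqr) rule to the quartiles from part (a). It also reports the upper fence, which the solution does not need.

```python
def outlier_fences(lower_quartile, upper_quartile, k=1.5):
    """Return (lower fence, upper fence) for the k * iqr outlier rule."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - k * iqr, upper_quartile + k * iqr


# Quartiles reported in part (a)
low_fence, high_fence = outlier_fences(53.5, 64.4)
print(f"lower fence = {low_fence:.2f}, upper fence = {high_fence:.2f}")

# The two low values named in part (b)
for state, percent in [("Alaska", 28.2), ("Wyoming", 35.7)]:
    print(state, "is an outlier" if percent < low_fence else "is not an outlier")
```

The lower fence matches the 37.15 computed above, and both Alaska and Wyoming fall below it.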
(c) The distribution is negatively skewed, with two outliers at the low end of the scale. The
median is 58%, and the lower and upper quartiles are 53.5% and 64.4%, respectively. The
middle 50% of the data values lie between these quartiles and are approximately symmetric.
Excluding the outliers, the distribution of the remaining data values is also approximately
symmetric.
[Boxplot: Percent Still Living in the State]
3.69: (a) Median = 8 grams/serving; lower quartile = 7 grams/serving; upper quartile = 12
grams/serving; interquartile range = 12 – 7 = 5 grams/serving
(b) Median = 10 grams/serving; lower quartile = 6 grams/serving; upper quartile = 13
grams/serving; interquartile range = 13 – 6 = 7 grams/serving
(c) There are no outliers in the sugar content data set because no value lies more than
1.5(iqr) above the upper quartile or more than 1.5(iqr) below the lower quartile.
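As a worked check of part (c), the outlier fences can be computed from the quartiles in part (b). This assumes, as parts (d) and (e) suggest, that part (b) reports the sugar quartiles (lower quartile 6, upper quartile 13 grams/serving).

```latex
\begin{aligned}
\text{iqr} &= 13 - 6 = 7 \\
\text{lower fence} &= 6 - 1.5(7) = -4.5 \\
\text{upper fence} &= 13 + 1.5(7) = 23.5
\end{aligned}
```

Under that assumption, a sugar value would be flagged as an outlier only if it were below –4.5 or above 23.5 grams/serving.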
(d) The minimum value and lower quartile are the same because the smallest five values in
the data set are all equal to 7.
(e) [Comparative boxplots: Fiber and Sugar, Content (grams/serving)]
The sugar content in grams/serving is much more variable than the fiber content in
grams/serving. The range in sugar content (19 grams/serving) is greater than the range in
fiber content (7 grams/serving). The boxplot of fiber content shows that the minimum and
lower quartile are equal to each other, which is not the case for sugar content. The
distribution of sugar content values is approximately symmetric, unlike the skewed fiber
distribution.
3.70: Use z-scores to make comparisons between the two different stimuli. For stimulus 1,
z = (4.2 – 6.0)/1.2 = –1.5, and for stimulus 2, z = (1.8 – 3.6)/0.8 = –2.25. The z-scores indicate that
your reaction time for stimulus 1 is 1.5 standard deviations below the mean, and your
reaction time for stimulus 2 is 2.25 standard deviations below the mean. Therefore,
compared to other people, your reaction to stimulus 2 is relatively quicker than your reaction to stimulus 1.
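The z-score comparison above can be sketched in a few lines, using the reaction times, means, and standard deviations from the solution. The helper name `z_score` is hypothetical. For reaction times, a more negative z-score means a relatively faster reaction.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z1 = z_score(4.2, 6.0, 1.2)  # stimulus 1
z2 = z_score(1.8, 3.6, 0.8)  # stimulus 2
print(f"z1 = {z1:.2f}, z2 = {z2:.2f}")

# Lower (more negative) z-score marks the relatively faster reaction
faster = "stimulus 1" if z1 < z2 else "stimulus 2"
print("relatively faster on", faster)
```

Standardizing puts both reaction times on the same scale, so they can be compared even though the two stimuli have different means and standard deviations.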
3.71: (a) The 25th percentile indicates that 25% of full-time female workers age 25 or older with
an Associate degree earn $26,800 or less. The 50th percentile indicates that 50% of full-time
female workers age 25 or older with an Associate degree earn $36,800 or less. The
75th percentile indicates that 75% of full-time female workers age 25 or older with an
Associate degree earn $51,100 or less. (b) The 25th, 50th, and 75th percentile values for men
are all greater than the corresponding percentiles for female workers, indicating that full-time
employed men age 25 or older with an Associate degree, in general, earn more than
full-time employed women age 25 or older with an Associate degree.