
AP* SOLUTIONS
Chapter 3 Numerical Methods for Describing Data Distributions

Section 3.1 Exercise Set 1

3.1: The distribution is approximately symmetric with no outliers, so the mean and standard deviation should be used to describe the center and spread, respectively.
[Figure: plot of amount (mL)]

3.2: The distribution is positively skewed with an outlier, so the median and interquartile range should be used to describe the center and spread, respectively.
[Figure: plot of Tip Percent]

3.3: The distribution is positively skewed with a possible outlier, so the median and interquartile range should be used to describe center and spread, respectively.
[Figure: plot of Defects per 100 cars]

*AP and Advanced Placement Program are registered trademarks of the College Entrance Examination Board, which was not involved in the production of, and does not endorse, this product.
©2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly accessible website, in whole or in part.

3.4: The average may not be the best measure of a typical value for this data set because examination of the dotplot (reproduced below) indicates that the distribution is clearly skewed and may contain an outlier.
[Dotplot: minutes]

Section 3.1 Exercise Set 2

3.5: The distribution of times between ordering and receiving coffee is roughly symmetric, so using the mean and standard deviation to describe center and spread, respectively, is appropriate.

3.6: The distribution of APEAL ratings is roughly symmetric, so using the mean and standard deviation to describe center and spread, respectively, is appropriate.
3.7: The distribution of male exercise times is positively skewed, so the median and interquartile range should be used to describe the center and spread, respectively.

3.8: The dotplot of average weekday circulation shows that the distribution is strongly positively skewed. The mean is an appropriate description of a typical value only when a distribution is roughly symmetric, so it should not be used to describe the center of this distribution.

Additional Exercises for Section 3.1

3.9: The distribution is skewed, so the median and interquartile range should be used to describe center and spread, respectively.
[Figure: plot of Weekend Exercise Time (minutes)]

3.10: The distribution of female exercise time is positively skewed, so the median and interquartile range should be used to describe center and spread, respectively.

3.11: The distribution is roughly symmetric with no obvious outliers, so the mean and standard deviation should be used to describe center and spread, respectively.
[Figure: plot of Passive Knee Extension (degrees)]

Section 3.2 Exercise Set 1

3.12: The mean is \(\bar{x} = 51.33\) mL. This is a typical or representative value for the amount of alcohol poured. The standard deviation is s = 15.22 mL, which represents how much, on average, the values in the data set spread out, or deviate, from the mean.

3.13: (a) \(\bar{x} = 59.23\) mL, and s = 16.71 mL. The mean represents a typical or representative value for the amount of alcohol poured, and the standard deviation represents how much, on average, the values in the data set spread out, or deviate, from the mean. (b) Individuals pouring one shot into short, wide glasses pour, on average, more alcohol than individuals pouring into tall, slender glasses.
3.14: (a) \(\bar{x} = 59.85\) hours, s = 14.78 hours (b) \(\bar{x} = 56.67\) hours, s = 9.75 hours. When Los Angeles was excluded from the data set, the mean and standard deviation both decreased. This suggests that using the mean and standard deviation as measures of center and spread for data sets with outliers can be risky, because outliers can have a substantial impact on those measures.

3.15: Answers will vary; here is one possible answer. The mean, $444, is large, and we can likely assume that some parents spend amounts close to zero. Thus it is likely that the amounts vary greatly, making the standard deviation large.

Section 3.2 Exercise Set 2

3.16: (a) \(\bar{x} = 448.30\), which is the typical number of speed-related fatalities on these 20 dates; s = 28.24 is, on average, how much the number of speed-related fatalities deviates from the mean. (b) It is not reasonable to generalize from the sample of 20 days to the other 345 days of the year because these days were not randomly selected. Rather, these are the 20 days that had the highest number of speed-related fatalities between 1994 and 2003.

3.17: \(\bar{x} = 49.40\) cents, which is the typical cost per serving (in cents) for this set of 15 high-fiber cereals rated very good or good by Consumer Reports; s = 16.10 cents is, on average, how much the costs per serving deviate from the mean.

3.18: (a) \(\bar{x} = 152.1\) seconds; s = 74.6 seconds (b) \(\bar{x} = 139.4\) seconds; s = 51.6 seconds. Deleting the observation of 380 had a profound impact on the mean and standard deviation. The mean decreased from 152.1 to 139.4 seconds, and the standard deviation decreased from 74.6 to 51.6 seconds. This suggests that using the mean and standard deviation to measure center and spread when outliers are present can give a misleading perception of the distribution.
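
The effect described in 3.14 and 3.18 can be illustrated with a short Python sketch. The wait times below are hypothetical values invented for illustration (they are not the exercise's data), with 380 playing the role of the outlier; the function name `summarize` is likewise an assumption.

```python
from statistics import mean, median, stdev

# Hypothetical wait times in seconds (NOT the exercise's data); 380 is the outlier.
times = [60, 90, 100, 120, 140, 150, 160, 200, 380]
without_outlier = [t for t in times if t != 380]

def summarize(values):
    """Return (mean, sample standard deviation, median) for a list of numbers."""
    return mean(values), stdev(values), median(values)

full = summarize(times)
trimmed = summarize(without_outlier)

print("with outlier:    mean=%.1f  s=%.1f  median=%.1f" % full)
print("without outlier: mean=%.1f  s=%.1f  median=%.1f" % trimmed)
```

In this made-up sample the mean and standard deviation shift noticeably when the largest value is dropped, while the median moves much less, which is the pattern the solutions describe.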
3.19: The standard deviation is a reasonable measure of volatility because it measures how much, on average, individual asset returns deviate from the mean return of the portfolio. A smaller standard deviation indicates smaller deviations (on average) from the mean return, and therefore less risk.

Additional Exercises for Section 3.2

3.20: (a) \(\bar{x} = 9.625\) mg/ounce (b) The caffeine concentrations of Coca-Cola and Pepsi Cola are quite a bit lower than those of the energy drinks. In fact, the average caffeine concentration of the energy drinks is more than 3 times the caffeine concentration of Coca-Cola and Pepsi Cola, and some of the individual energy drinks have even more than 3 times the caffeine concentration.

3.21: (a) \(\bar{x} = 287.714\); the table below shows the deviations from the mean. (b) The table below shows the sum of the deviations (at the bottom of column 2).

Data value \(x_i\)    Deviation from mean \(x_i - \bar{x}\)    Squared deviation \((x_i - \bar{x})^2\)
497                   497 – 287.714 = 209.286                  209.286² = 43,800.630
193                   193 – 287.714 = -94.714                  (-94.714)² = 8970.742
328                   328 – 287.714 = 40.286                   40.286² = 1622.962
155                   155 – 287.714 = -132.714                 (-132.714)² = 17,613.006
326                   326 – 287.714 = 38.286                   38.286² = 1465.818
245                   245 – 287.714 = -42.714                  (-42.714)² = 1824.486
270                   270 – 287.714 = -17.714                  (-17.714)² = 313.786
Sum                   \(\sum (x_i - \bar{x}) = 0.002\)         \(\sum (x_i - \bar{x})^2 = 75{,}611.43\)

(c) To calculate the variance and standard deviation, the squared deviations and the sum of the squared deviations are needed. The third column contains these values. The variance is computed using the formula \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{75{,}611.43}{7 - 1} = 12{,}601.905 \). The standard deviation is \( s = \sqrt{s^2} = \sqrt{12{,}601.905} = 112.258 \).

3.22: (a) \(\bar{x} = 48.36\) cm. This is a typical distance (in centimeters) at which a bat first detects a nearby insect. (b) \(s^2 = 327.05\) cm², s = 18.08 cm.
The variance is the mean squared deviation from the mean distance at which a bat first detects a nearby insect, in square centimeters. The standard deviation represents, on average, how much a distance at which a bat first detects a nearby insect deviates from the mean, in centimeters.

3.23: The mean found after subtracting 10 from each sample observation is \(\bar{x} = 38.36\) cm. The table below shows the original sample observations, the values after subtracting 10, and the deviations from the new mean.

Original sample observation    Sample observation minus 10    Deviation from the new mean
62                             52                             13.64
23                             13                             -25.36
27                             17                             -21.36
56                             46                             7.64
52                             42                             3.64
34                             24                             -14.36
42                             32                             -6.36
40                             30                             -8.36
68                             58                             19.64
45                             35                             -3.36
83                             73                             34.64

The deviations for the data set obtained by subtracting 10 from each sample observation are exactly the same as the corresponding deviations from the mean for the original data set. Since the deviations are the same, the new variance (s²) and standard deviation (s) are also the same as the old variance and standard deviation. Subtracting or adding the same number to every value in a data set does not change the variance (s²) or standard deviation (s).

3.24: The standard deviation of the original data is s = 18.08 cm. After multiplying each data value by 10, the new standard deviation is s = 180.8 cm. In general, if each observation is multiplied by a positive constant c, the standard deviation s is also multiplied by c.

Section 3.3 Exercise Set 1

3.25: (a) There is an even number of observations (n = 20), so the median is the average of the two middle values: \( \text{median} = \frac{438{,}722 + 427{,}771}{2} = 433{,}246.5 \). This value, 433,246.5, is the value that divides the ordered data set into two halves.
This tells us that half of the values in our data set had average weekday circulations of less than 433,246.5, and the other half had average weekday circulations of more than 433,246.5. (b) The median is preferable to the mean for describing the center for this data set because the distribution is positively skewed and contains outliers. (c) It is not reasonable to generalize from this sample to the population of daily newspapers in the United States because these newspapers were not randomly selected. Rather, they are the top 20 newspapers in average weekday circulation.

3.26: Lower quartile = 10,478; upper quartile = 11,778. The lower quartile of 10,478 mg/kg is the value such that 25% of the catsups have sodium contents lower than this value, and 75% are higher. The upper quartile of 11,778 mg/kg is the value such that 75% of the catsups have sodium contents lower than this value, and 25% are higher. The interquartile range is iqr = 11,778 – 10,478 = 1300. The interquartile range of 1300 mg/kg is the range of the middle 50% of the catsup sodium contents. It tells us how spread out the middle 50% of the data values are.

3.27: Because n = 25, the median is the value in the middle of the ordered list. Therefore, the median is 142. The lower quartile is 0, and the upper quartile is 195. The interquartile range is iqr = 195 – 0 = 195. Half of the values of number of minutes used in cell phone calls in one month are less than or equal to 142 minutes, and half are greater than or equal to 142 minutes. The middle 50% of the data values have a range of 195 minutes.

3.28: The median tipping percentage is 21%. The lower quartile is 10.75%, and the upper quartile is 35.6%. The interquartile range is 35.6 – 10.75 = 24.85%. The median tipping percentage of 21% indicates that half of the tips were below 21%, and the remaining half were above 21%. The interquartile range indicates that the middle 50% of tips had a range of 24.85%.

Section 3.3 Exercise Set 2

3.29: (a) The median repair cost is $1,688. The median is the middle value in the ordered list of repair costs, so half the repair costs are less than or equal to $1,688, and half the repair costs are greater than or equal to $1,688. (b) The median is preferable to the mean because the distribution of repair costs is positively skewed (see dotplot below).

3.30: The lower quartile is 49.5 hours, which is the number of extra hours that divides the lower 25% of values from the upper 75%. The upper quartile is 65.5 hours, which is the number of extra hours that divides the lower 75% of values from the upper 25%. The interquartile range is iqr = 65.5 – 49.5 = 16, which is the range of the middle 50% of the data values.

3.31: The median exercise time for this set of 20 males is 31.5. The median value of 31.5 is the middle value in the ordered list of exercise times, so half the values are less than or equal to 31.5 and half the values are greater than or equal to 31.5. The lower quartile is 3.75, and the upper quartile is 67.5. Therefore, the interquartile range is iqr = 67.5 – 3.75 = 63.75, which is the range of the middle 50% of the exercise times.

3.32: (a) The median exercise time for this set of 20 females is 7.5, which represents the middle value of the ordered female exercise times, so half the exercise times are less than or equal to 7.5, and half the exercise times are greater than or equal to 7.5. The lower quartile is 1.0, and the upper quartile is 49.5, so the interquartile range is iqr = 49.5 – 1.0 = 48.5. The middle 50% of the data has a spread of 48.5. (b) The median male exercise time is much greater than the median female exercise time.
In addition, the male interquartile range is greater than the female interquartile range, which indicates that there is more variability in the middle 50% of exercise times for males than for females.

Additional Exercises for Section 3.3

3.33: The large difference between the mean and median indicates that some parents spent large amounts of money on school supplies, while the amounts for the lowest spenders were closer to the median value. These large values pull the mean toward them, whereas the median is largely unaffected.

3.34: The median is the measure of center that determines this salary, and is $4,286. The other measure of center is the mean, and its value for this data set is \(\bar{x} = \$3{,}969\). The value of the mean is less than that of the median, which makes the mean not as favorable to the San Luis Obispo County supervisors.

3.35: (a) The dotplot is relatively symmetric, with a possible outlier at the high end of the scale. As such, the mean and median will be relatively close to each other, with the mean being greater than the median.
[Dotplot: time (s)]
(b) \(\bar{x} = 370.69\) seconds, \( \text{median} = \frac{369 + 370}{2} = 369.5 \) seconds. (c) The largest time could be increased by any amount and not affect the sample median because the position of the middle value will not change if the largest value is increased. The largest time could be decreased to 370 seconds without changing the value of the median.

Section 3.4 Exercise Set 1

3.36: Minimum = 0, lower quartile = 14, median = 33.5, upper quartile = 63, maximum = 151.
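
A five-number summary like the one in 3.36 can be computed with a small Python sketch. It assumes one common quartile convention (the quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle value when n is odd); other conventions give slightly different quartiles. The sample values are hypothetical, chosen only so that the result matches the summary reported in 3.36, and are not the exercise's actual data.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max).

    Quartiles are the medians of the lower and upper halves of the sorted
    data; the middle value is excluded from both halves when n is odd.
    Requires at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above it
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical sample (NOT the exercise's data).
sample = [0, 5, 14, 20, 33, 34, 50, 63, 90, 151]
print(five_number_summary(sample))
```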
3.37:
[Boxplot: manufacturing defects]
The boxplot shows that there is one outlier (170 defects), and the value of the largest nonoutlier is 146 defects. The middle 50% of the data values range between about 106 and 126 defects. The distribution is positively skewed. The median is not centered in the middle 50% of the data values, further indicating the skewed nature of the distribution.

3.38: (a) No, they are not outliers. Any values greater than upper quartile + 1.5(iqr) or less than lower quartile – 1.5(iqr) are considered outliers. For this data set, values are outliers if they are greater than 32.3 + 1.5(32.3 – 20) = 50.75 cents or less than 20 – 1.5(32.3 – 20) = 1.55 cents. The largest value is not greater than 50.75, and the smallest value is not less than 1.55, so these values are not outliers. (b)
[Boxplot: gasoline tax per gallon (cents)]
The distribution shown in the boxplot is positively skewed. The median is not located at the center of the middle 50%, further indicating a skewed distribution.

3.39: (a) Lower quartile = 16.05 inches, upper quartile = 21.93 inches; iqr = 21.93 – 16.05 = 5.88 inches. (b) Any values greater than upper quartile + 1.5(iqr) or less than lower quartile – 1.5(iqr) are considered outliers. For this data set, values are outliers if they are greater than 21.93 + 1.5(5.88) = 30.75 inches or less than 16.05 – 1.5(5.88) = 7.23 inches. The value 31.57 inches is an outlier. (c)
[Modified boxplot: rainfall (inches)]
The modified boxplot shows one outlier at the high end of the scale. The distribution of inches of rainfall is slightly positively skewed.
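
The 1.5(iqr) outlier rule used in 3.38 and 3.39 can be written as a brief Python sketch. The function names are assumptions made for illustration; the quartiles and the value 31.57 come from 3.39, and 25.0 is an arbitrary comparison value.

```python
def outlier_fences(lower_quartile, upper_quartile, k=1.5):
    """Return (low fence, high fence): the quartiles pushed out by k times the iqr."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - k * iqr, upper_quartile + k * iqr

def is_outlier(value, lower_quartile, upper_quartile):
    """True if the value lies below the low fence or above the high fence."""
    low, high = outlier_fences(lower_quartile, upper_quartile)
    return value < low or value > high

# Quartiles from 3.39 (rainfall, in inches).
low, high = outlier_fences(16.05, 21.93)
print(round(low, 2), round(high, 2))
print(is_outlier(31.57, 16.05, 21.93))
print(is_outlier(25.0, 16.05, 21.93))
```

Rounded to two decimals, the fences agree with the 7.23 and 30.75 inches computed in 3.39, and 31.57 is flagged as an outlier.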
3.40:
[Comparative boxplots: Short Wide and Tall Slender; axis: amount of alcohol poured (mL)]
Both distributions (short wide and tall slender) are skewed, although the direction of skew is different for the two distributions. The distribution of amount of alcohol poured into short wide glasses is positively skewed, and the distribution of amount of alcohol poured into tall slender glasses is negatively skewed. The amount of alcohol poured into short wide glasses tends to be more than the amount poured into tall slender glasses. Specifically, the five-number-summary values for short wide glasses are all greater than the corresponding values for tall slender glasses. For example, the maximum amount of alcohol poured into short wide glasses (92.4 mL) is much greater than the maximum amount of alcohol poured into tall slender glasses (73.5 mL). In addition, the median amount of alcohol poured into short wide glasses (60.4 mL) is greater than the median amount of alcohol poured into tall slender glasses.

Section 3.4 Exercise Set 2

3.41: Minimum: 28.8; Lower Quartile: 35.7; Median: 37.3; Upper Quartile: 38.5; Maximum: 42.2

3.42:
[Boxplot: Waiting time (seconds)]
The distribution of waiting times is nearly symmetric, with a median of 120 seconds. The times range from a minimum of 40 to a maximum of 200 seconds. The middle 50% of waiting times range between 85 and 160 seconds.

3.43: (a) Lower quartile: 11.1; Upper quartile: 13.4; interquartile range = 13.4 – 11.1 = 2.3. Any observations smaller than 11.1 – 1.5(2.3) = 7.65 or larger than 13.4 + 1.5(2.3) = 16.85 are considered outliers. Vermont’s data value (9.5) is not an outlier because it is not smaller than 7.65. Mississippi’s data value (18.0) is an outlier because it is larger than 16.85. (b) The boxplot shows the one large outlier (Mississippi).
Excluding the outlier, the boxplot is relatively symmetric. The median is 12.3, and the upper and lower quartiles agree with the values stated in part (a). Clearly, Mississippi has an unusually large value for the percent of premature births in 2008.
[Boxplot: Premature percent]

3.44: (a) Lower quartile: 81.5; Upper quartile: 94; Interquartile range = 94 – 81.5 = 12.5. Outliers are observations that are smaller than 81.5 – 1.5(12.5) = 62.75 or larger than 94 + 1.5(12.5) = 112.75. The farmer’s observation (43) is an outlier, and the student’s observation (152) is an outlier. (b)
[Boxplot: Accidents per 1,000]
(c) Answers may vary. One possible answer is to offer a professional discount on auto insurance to the professions below the lower quartile for accidents (law enforcement, physical therapist, veterinarian, clerical (secretary), clergy, homemaker, politician, pilot, firefighter, and farmer).

3.45:
[Comparative boxplots: Male and Female; axis: Exercise Time]
The distributions of exercise times for males and females are both positively skewed. The distribution of the middle 50% of the male observations is approximately symmetric, but the distribution of the middle 50% of the female observations is positively skewed. The values of the lower quartile (3.75), median (31.5), and upper quartile (67.5) for the males are all larger than the corresponding values for female exercise times (1.0, 7.5, and 49.5, respectively). The male distribution has one large outlier, while the female distribution has no outliers.

Additional Exercises for Section 3.4

3.46: The boxplot is shown below.
No, the boxplot is not approximately symmetric; it is positively skewed.
[Boxplot: Maximum Annual Wind Speed]

3.47: (a)
[Comparative boxplots: East, Middle States, and West; axis: Wireless %]
(b) The most noticeable difference between the wireless percent for the three geographical regions is that the Middle States region is negatively skewed, and has a smaller interquartile range than the East and West regions. The Eastern region has the smallest median (11.4%), and the Middle States and Western regions have medians that are much closer to each other (16.9% and 16.3%, respectively).

3.48: (a) Lower quartile: 44; Upper quartile: 53; Interquartile range: 53 – 44 = 9. Observations smaller than 44 – 1.5(9) = 30.5 or larger than 53 + 1.5(9) = 66.5 are outliers. There are no observations smaller than 30.5 or larger than 66.5, so there are no outliers in this data set. (b) The boxplot is shown below. As indicated in part (a), there are no outliers. The median of this data set is 46%. The entire data set ranges between a minimum of 33% and a maximum of 60%. The middle 50% of observations range between 44% and 53%. The middle 50% is also asymmetric, with the lower half ranging between 44% and 46% and the upper half ranging between 46% and 53%.
[Boxplot: Juice Lost After Thawing (%)]

3.49: The fact that the mean is so much higher than the median indicates that the distribution is positively skewed. There were undoubtedly some very high punitive damage awards, which pulled the mean up toward the large values.

Section 3.5 Exercise Set 1

3.50: First national aptitude test: \( z = \frac{625 - 475}{100} = 1.5 \). Second national aptitude test: \( z = \frac{45 - 30}{8} = 1.875 \). The student performed better on the second national aptitude test relative to the other test takers because the z-score for the second test is higher than that for the first test.

3.51: (a) 40 minutes is 1 standard deviation above the mean; 30 minutes is 1 standard deviation below the mean. The values that are 2 standard deviations away from the mean are 25 and 45 minutes. (b) Approximately 95% of times are between 25 and 45 minutes; approximately 0.3% of times are less than 20 minutes or greater than 50 minutes; approximately 0.15% of times are less than 20 minutes.

3.52: The 10th percentile of $0 indicates that 10% of students have $0 or less of student debt. The 25th percentile (which is the lower quartile) indicates that 25% of students have $0 or less of student debt. The 50th percentile (the median) indicates that 50% of students have $11,000 or less of student debt. The 75th percentile (the upper quartile) indicates that 75% of students have $24,600 or less of student debt. The 90th percentile indicates that 90% of students have $39,300 or less of student debt.

3.53: (a)
[Graph: Frequency versus Bus Travel Times (minutes)]
(b) (i) 86th percentile is approximately 21 minutes; (ii) 15th percentile is approximately 18 minutes; (iii) 90th percentile is approximately 21.5 minutes; (iv) 95th percentile is approximately 25.5 minutes; (v) 10th percentile is approximately 17.5 minutes

Section 3.5 Exercise Set 2

3.54: (a) The z-score tells us that the score is 2.2 standard deviations above the mean. Because the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score corresponds to a score slightly above the 97.5th percentile, which means the score is greater than or equal to approximately 97.5% of all the scores.
(b) The z-score tells us that the score is 0.4 standard deviations above the mean. Because the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score tells us that the score is in the upper half of all scores. (c) The z-score tells us that the score is 1.8 standard deviations above the mean. Because the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score corresponds to a little below the 97.5th percentile. (d) The z-score tells us that the score is 1.0 standard deviation above the mean. Because the distribution was mound-shaped and symmetric, the empirical rule applies, and this z-score corresponds to approximately the 84th percentile, which means the score is greater than or equal to approximately 84% of all the scores. (e) The z-score of 0 indicates my score was equal to the mean and median.

3.55: (a) Given that the distribution is symmetric and mound-shaped, we can apply the empirical rule. Twenty-seven mph is 1 standard deviation below the mean, and 57 mph is 1 standard deviation above the mean. The empirical rule tells us that approximately 68% of the vehicle speeds lie within one standard deviation of the mean, or between 27 and 57 mph. (b) Given that the distribution is symmetric and mound-shaped, we can apply the empirical rule. Fifty-seven mph is 1 standard deviation above the mean. Therefore, by the empirical rule, 84% of the vehicle speeds lie below 1 standard deviation above the mean, so 16% of the observations will lie above 1 standard deviation above the mean.

3.56: The 83rd percentile indicates that her score was greater than or equal to 83% of all scores on the verbal section of the test. Additionally, she scored greater than or equal to 94% of all scores on the math section.
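
The empirical-rule reasoning in 3.55 can be sketched in Python as follows. The mean of 42 mph and standard deviation of 15 mph are not stated in the solution; they are inferred here from the statement that 27 mph and 57 mph lie 1 standard deviation below and above the mean. The function name is an assumption for illustration.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals expected to contain about 68%, 95%, and 99.7% of the values
    of a mound-shaped, symmetric distribution (the empirical rule)."""
    return {
        0.68: (mean - sd, mean + sd),
        0.95: (mean - 2 * sd, mean + 2 * sd),
        0.997: (mean - 3 * sd, mean + 3 * sd),
    }

# Mean and sd inferred from 3.55 (27 and 57 mph are 1 sd from the mean).
intervals = empirical_rule_intervals(42, 15)
print(intervals[0.68])

# By symmetry, the share above 1 sd is half of what lies outside the 68% band.
share_above_one_sd = (1 - 0.68) / 2
print(round(share_above_one_sd, 2))
```

The second calculation mirrors part (b): about 16% of speeds lie more than 1 standard deviation above the mean, so about 84% lie below that point.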
3.57: (a) The frequency distribution is shown in the table below.

Expenditures (per capita)    Frequency
0 - < 2                      13
2 - < 4                      18
4 - < 6                      10
6 - < 8                      5
8 - < 10                     1
10 - < 12                    2
12 - < 14                    0
14 - < 16                    0
16 - < 18                    2

[Histogram: Frequency versus Expenditures (per capita)]
(b) (i) The 50th percentile is between per capita expenditures of 2 and 4. (ii) The 70th percentile is between per capita expenditures of 4 and 6. (iii) The 10th percentile is between per capita expenditures of 0 and 2. (iv) The 90th percentile is between per capita expenditures of 7 and 8. (v) The 40th percentile is between per capita expenditures of 2 and 4.

Additional Exercises for Section 3.5

3.58: (a) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{0 - 1{,}650}{750} = -2.2 \)
(b) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{10{,}000 - 1{,}650}{750} = 11.133 \)
(c) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{4{,}500 - 1{,}650}{750} = 3.8 \)
(d) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{300 - 1{,}650}{750} = -1.8 \)

3.59: (a) 1100 gallons; (b) 1400 gallons; (c) 1700 gallons

3.60: (a) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{320 - 450}{70} = -1.857 \)
(b) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{475 - 450}{70} = 0.357 \)
(c) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{420 - 450}{70} = -0.429 \)
(d) \( z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{610 - 450}{70} = 2.286 \)

3.61: (a) 120; (b) 20; (c) \( z = \frac{90 - 100}{20} = -0.5 \); (d) 97.5; (e) Since a score of 40 is 3 standard deviations below the mean, that corresponds to a percentile of 0.15%. Therefore, there were relatively few scores below 40.

Chapter 3: Are You Ready to Move On?
3.62: (a) The distribution of female weekend exercise time is positively skewed (see the boxplot below), so the median and interquartile range should be used to describe center and spread, respectively.
[Boxplot: Female Weekend Exercise Time]
(b) The distribution of amount of alcohol poured is negatively skewed (see the boxplot below), so the median and interquartile range should be used to describe center and spread, respectively.
[Boxplot: Amount of Alcohol (mL)]
(c) The distribution of wait times is positively skewed with a large outlier (see the boxplot below), so the median and interquartile range should be used to describe center and spread, respectively.
[Boxplot: wait time (sec)]

3.63: The mean APEAL rating is \(\bar{x} = 792.03\), which is a typical or representative value for the APEAL ratings in the sample. The standard deviation is s = 36.70 and represents how much, on average, the values in the data set spread out, or deviate, from the mean APEAL rating.

3.64: The high-caffeine energy drinks show much more variability in caffeine per ounce. This can be seen in the comparative boxplots below. In addition, since both distributions are reasonably symmetric, the standard deviation is an appropriate measure of variability. The standard deviation for the caffeine content per ounce in the top-selling energy drinks is s = 0.667, and the standard deviation for the caffeine content per ounce in the high-caffeine energy drinks is s = 8.31.
[Comparative boxplots: Top Selling Energy Drink and High Caffeine Energy Drink; axis: Caffeine per Ounce]

3.65: (a) The mean tipping percent is \(\bar{x} = 27.31\%\), and the standard deviation is s = 23.83%. (b) After removing the 105% tip, the new mean and standard deviation are \(\bar{x}_{new} = 23.23\%\) and \(s_{new} = 15.70\%\). These values are much smaller than the mean and standard deviation computed with 105 included. This suggests that the mean and standard deviation can change dramatically when outliers are present in (or removed from) the data set, and, therefore, are probably not the best measures of center and spread to use in this situation.

3.66: The mean repair cost is $2,119, and the median repair cost is $1,688. These values are so different because the distribution is positively skewed, and the mean tends to be pulled toward larger values in positively skewed distributions, whereas the median is more resistant. Therefore, the median is preferable to the mean because the distribution of repair costs is positively skewed (see dotplot below).

3.67: (a) The median is 140 seconds, and the interquartile range is iqr = 200 – 100 = 100 seconds. The median divides the ordered list into two equal halves, with half the values less than 140 seconds and half the values greater than 140 seconds. The interquartile range of 100 seconds indicates that the middle 50% of the data values have a range of 100 seconds. (b) Due to the presence of the large outlier (the value 380 seconds), the median and interquartile range are the appropriate summary measures to describe center and spread for this data set.

3.68: (a) Median = 58, lower quartile = 53.5, upper quartile = 64.4 (b) Outliers at the low end of the distribution are values less than lower quartile – 1.5(iqr). The iqr = 64.4 – 53.5 = 10.9, so values less than 53.5 – 1.5(10.9) = 37.15 are considered outliers. Since the values for Alaska (28.2) and Wyoming (35.7) are both less than 37.15, they are outliers.
(c) The distribution is negatively skewed with two outliers on the low end of the scale. The median is 58%, and the lower and upper quartiles are 53.5% and 64.4%, respectively. The middle 50% of the data values range between these quartiles, and is approximately symmetric. Excluding the outliers, the distribution of the remaining data values is approximately symmetric. 30 40 50 60 Percent Still Living in the State 70 80 ©2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly accessible website, in whole or in part. 25 3.69: (a) Median = 8 grams/serving; lower quartile = 7 grams/serving; upper quartile = 12 grams/serving; interquartile range = 12 – 7 = 5 grams/serving (b) Median = 10 grams/serving; lower quartile = 6 grams/serving; upper quartile = 13 grams/serving; interquartile range = 13 – 6 = 7 grams/serving (c) There are no outliers in the sugar content data set because there are no values greater than 1.5(iqr) above the upper quartile or smaller than 1.5(iqr) below the lower quartile. (d) The minimum value and lower quartile are the same because the smallest five values in the data set are all equal to 7. (e) Fiber Sugar 0 5 10 Content (grams/serving) 15 20 The sugar content in grams/serving is much more variable than the fiber content in grams/serving. The range in sugar content (19 grams/serving) is greater than the range in fiber content (7 grams/serving). The boxplot of fiber content shows that the minimum and lower quartile are equal to each other, which is not observed in the sugar content. The distribution of sugar content values are approximately symmetric, which is different from the skewed fiber distribution. 3.70: Use z-scores to make comparisons between the two different stimuli. For stimulus 1, 4.2  6.0 1.8  3.6  2.25 . 
The z-scores indicate that z  1.5 , and for stimulus 2, z  1.2 0.8 your reaction time for stimulus 1 is 1.5 standard deviations below the mean, and your ©2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly accessible website, in whole or in part. 26 reaction time for stimulus 2 is 2.25 standard deviations below the mean. Therefore, compared to other people, you are reacting to stimulus 2 more quickly. 3.71: (a) The 25th percentile indicates that 25% of full-time female workers age 25 or older with an Associate degree earn $26,800 or less. The 50th percentile indicates that 50% of fulltime female workers age 25 or older with an Associate degree earn $36,800 or less. The 75th percentile indicates that 75% of full-time female workers age 25 or older with an Associate degree earn $51,100 or less. (b) The 25th, 50th, and 75th percentile values for men are all greater than the corresponding percentiles for female workers, indicating that fulltime employed men age 25 or older with an Associate degree, in general, earn more than full-time employed women age 25 or older with an Associate degree. ©2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly accessible website, in whole or in part.
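The arithmetic behind two of the solutions above, the z-score comparison in 3.70 and the 1.5(iqr) outlier cutoff in 3.68(b), can be checked with a short Python sketch. The function names `z_score` and `outlier_fences` are hypothetical helpers introduced only for illustration; the numeric inputs are the values quoted in those solutions.

```python
# Illustrative sketch only: z_score and outlier_fences are hypothetical
# helper names, not part of the textbook or any library.

def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


def outlier_fences(lower_quartile, upper_quartile):
    """Return the (low, high) cutoffs from the 1.5(iqr) outlier rule."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr


# 3.70: reaction times compared with each stimulus's mean and standard deviation.
z1 = z_score(4.2, 6.0, 1.2)  # stimulus 1, about -1.5
z2 = z_score(1.8, 3.6, 0.8)  # stimulus 2, about -2.25
print(round(z1, 2), round(z2, 2))

# 3.68(b): quartiles 53.5 and 64.4; values below the low cutoff are outliers.
low, high = outlier_fences(53.5, 64.4)
states = {"Alaska": 28.2, "Wyoming": 35.7}
low_outliers = [name for name, pct in states.items() if pct < low]
print(round(low, 2), low_outliers)
```

Because the z-score for stimulus 2 is further below zero than the one for stimulus 1, the sketch agrees with the conclusion in 3.70, and both listed states fall below the low cutoff, as stated in 3.68(b).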
