Chapter 3
Data Summary Using Descriptive Measures

CHAPTER OVERVIEW AND OBJECTIVES

Chapter 2 examined techniques for visually describing a set of data. The purpose of this chapter is to introduce techniques for describing a set of data using one or more numerical measures. By the end of the chapter, the student should be able to define and use the following measures:

1. Measures of Central Tendency: Mean, Median, Mode, and Midrange.
2. Measures of Variation: Range, Standard Deviation, Variance, and Coefficient of Variation.
3. Measures of Position: Percentiles, Quartiles, and z-scores.
4. Measures of Shape: Skewness and Kurtosis.
5. Techniques for handling frequency distributions (grouped data).
6. Construction of box plots.

Chapter 3 Glossary

box plot. A diagram that demonstrates the lowest and highest values within that portion of the sample not containing outliers, the three sample quartiles, and any sample values determined to be outliers.

Chebyshev's inequality. A rule stating at least what percentage of the sample values are within 2, 3, ... standard deviations of the mean.

coefficient of variation. The sample standard deviation divided by the sample mean and multiplied by 100.

descriptive measure. A statistic that describes the location, variation, or shape of a sample or one that describes the position of an individual value in a sample (such as a percentile).

empirical rule. A rule that states approximately what percentage of the sample values are within 1, 2, and 3 standard deviations of the mean. This rule assumes that the population has a bell-shaped (normal) appearance.

grouped data. Summarized data in the form of a frequency distribution.

interquartile range. The difference between the third and first quartiles (Q3 - Q1).

kurtosis. A measure of shape that describes the peakedness of a distribution.

mean. The average of the sample data; its symbol is x̄.

measure. See descriptive measure. Measures consist of measures of central tendency, variation, position, and shape.

measures of central tendency. Measures that describe the location (typical value) of a sample, including the sample mean, median, midrange, and mode.

measures of variation. Measures that describe the variation within a sample; they include the sample range, variance, standard deviation, and coefficient of variation.

measures of position. Measures that indicate the relative position of a sample value, such as percentiles, quartiles, and z-scores.

measures of shape. Measures that describe the shape (symmetry and peakedness) of a sample, including measures of skewness (lack of symmetry) and kurtosis (peakedness).

median. The value in the center of the ordered data (if the sample size is an odd number) or the average of the two center values (if the sample size is an even number).

midrange. The average of the lowest and highest values in the sample.

mode. The sample value that occurs more than once and the most often.

outlier. An unusually large or small data value in a sample. Such a value can be illustrated and detected using a box plot and is considered to be an extreme outlier if it lies beyond either of the outer fences. A mild outlier is a sample value that lies beyond either of the inner fences but not beyond the corresponding outer fence.

percentile. A measure of position, written P_K, where at most K% of the sample values are less than P_K and at most (100 - K)% of the sample values are greater than P_K.

quartiles. Special percentiles; the 1st quartile = 25th percentile, 2nd quartile = 50th percentile (= median), and 3rd quartile = 75th percentile.

range. The difference between the highest and lowest data values in the sample.

skewness. A measure of shape that describes the degree of symmetry in the sample data.

standard deviation. The square root of the sample variance; its symbol is s.

variance. A measure of variation that is obtained by summing the squared deviations from the sample mean and dividing by one less than the sample size; its symbol is s².

z-score. A measure of position for any particular value in a sample. It is obtained by subtracting the mean and dividing by the standard deviation. It tells how many standard deviations to the right or left of the mean this value lies.

3.1
a) 10 20 30
   Mean = 60/3 = 20
   There is no mode.
   Median = 20
   Midrange = (10 + 30)/2 = 20
b) 3 5 6 8 9
   Mean = 31/5 = 6.2
   There is no mode.
   Median = 6
   Midrange = (3 + 9)/2 = 6
c) 1 2 2 3 7 7 8 9 10 14
   Mean = 63/10 = 6.3
   Mode = 2 and 7
   Median = (7 + 7)/2 = 7
   Midrange = (1 + 14)/2 = 7.5

3.2
a) 4 6 7 8 10 13 13 14 18
   Mean = 93/9 = 10.333 cubic meters
   Mode = 13 cubic meters
   Median = 10 cubic meters
   Midrange = (4 + 18)/2 = 11 cubic meters
b) Convert yards to feet: 9 12 14 16 20 28 30 30
   Mean = 159/8 = 19.875
   Mode = 30
   Median = (16 + 20)/2 = 18
   Midrange = (9 + 30)/2 = 19.5
c) Convert all numbers to percents.
   Mean = 310/20 = 15.5%
   Mode = 10 and 15
   Median = (14 + 15)/2 = 14.5%
   Midrange = (7 + 30)/2 = 18.5%

3.3
a) 1 2 7 8 9
   Mean = 5.4
   Median = 7
b) 1 3 10 18 19
   Mean = 10.2
   Median = 10

3.4
a) x̄ = 1515/30 = 50.5
   Median = (50 + 50)/2 = 50
   Mode = 50
   Midrange = (40 + 88)/2 = 64
b) Except for the midrange, the measures of central tendency are approximately the same. Any of these measures, except the midrange, appears to be appropriate.

3.5 No, the median does not have to change by the same amount that the mean changes.

3.6
a) Mean = 12418/20 = 620.9
   Median = (633 + 638)/2 = 635.5
   Mode = 640
b) The median and the mode would not be easily influenced by extreme values.
c) x̄ = 11998/19 = 631.47
   Median = 638
   Mode = 640
   The mean changed the most.

3.7 Mean = 55.77/14 = 3.98
    Median = (.5 + .68)/2 = .59
    If 26.30 is omitted, the value of the mean will change more than if any other values were omitted.
After omitting 26.30, the mean is 29.47/13 = 2.267. The median is .5.

3.8
a) Process 1:
   x̄ = 129/10 = 12.9
   Median = (13 + 13)/2 = 13
   Mode = 13
   Midrange = (11 + 15)/2 = 13
   Process 2:
   x̄ = 142/10 = 14.2
   Median = (13 + 13)/2 = 13
   Mode = 13
   Midrange = (10 + 20)/2 = 15
b) In Process 1, the data are approximately symmetrical. Hence, the measures of central tendency are approximately the same. In Process 2, a few large values (19 and 20) easily affect the mean and the midrange. Hence these two measures are somewhat higher than the other measures of central tendency.

3.9
a) Mean = 4020.4/15 = 268.0267
   Median = 168.3
   Midrange = (.8 + 1400)/2 = 700.4
   Yes, these statistics would be expected to be different, especially when there is a very large value in the data set.
b) Mean = 2620.4/14 = 187.1714
   Median = (70.5 + 168.3)/2 = 119.4
   Midrange = 319.9
   The value of the midrange changed the most.

3.10
a) The wait time is mostly less than 25 minutes, as viewed from the frequency table below. However, in 30% of the observed cases, the wait time has been 25 minutes or more. In fact, in 14% of the cases the wait time has been at least 29 minutes but less than 31 minutes.

   Frequency Distribution Table
   CLASS   CLASS LIMITS      FREQUENCY
   1       15 and under 17       8
   2       17 and under 19       6
   3       19 and under 21      10
   4       21 and under 23       8
   5       23 and under 25       3
   6       25 and under 27       2
   7       27 and under 29       6
   8       29 and under 31       7
           TOTAL                50

b) The mean is 21.84 and the median is 21. On average, the new procedure is reaching its target.

3.11
a) The mean of the scores is 68.67 and the median is 70.

   [Frequency histogram of the scores; class limits from "20 and under 30" through "90 and under 100"]

b) From the histogram, the scores appear to be clustered between 50 and 90, with outliers greater than 90 and less than 50. About 34% of the observations fall between 70 and 80.
The shape is somewhat bell shaped between 50 and 90.
c) The new mean (median) should be equal to 4.5 times the old mean (median) plus 50.
d) The mean of the non-standardized data is 359 and the median of the non-standardized data is 365.

3.12 Σx = 20, Σx² = 33, n = 50
     x̄ = 20/50 = 0.4
     s² = (33 - 20²/50)/49 = .5102
     s = √.5102 = .7143
     CV = (.7143/.4) · 100 = 178.6%

3.13
a) Range = 8 - 2 = 6
   s² = (118 - 22²/5)/4 = 5.3
   s = √5.3 = 2.302
   CV = (2.302/4.4) · 100 = 52.322%
b) Range = 22 - 10 = 12
   s² = (3049 - 171²/10)/9 = 13.8777
   s = √13.877 = 3.7253
   CV = (3.7253/17.1) · 100 = 21.79%
c) Range = 5.3 - 2.1 = 3.2
   s² = (338.18 - 80.4²/20)/19 = .788
   s = √.788 = .888
   CV = (.8877/4.02) · 100 = 22.09%

3.14
a) The values of differences should sum to zero.
b) The variance is the sum of the squared values of the differences divided by the number of observations minus one.
   Variance = (25 + 1 + 9 + 4 + 9 + 4)/5 = 10.4
c) Standard deviation is √10.4 = 3.225

3.15
a) Since the range is larger for the large carriers, one might expect this group to have more variation.
b) Standard deviation of the large group of carriers is 32.27. Standard deviation of the small group of carriers is 16.63.

3.16 Stock A: x̄ = 149/12 = 12.41667, s = 2.7122
     Stock B: x̄ = 445/12 = 37.08333, s = 6.77506
     CV_A = (2.7122/12.41667) · 100 = 21.8432%
     CV_B = (6.77506/37.08333) · 100 = 18.2698%
     Stock B appears to be more stable since its coefficient of variation is lower.

3.17 Note that:
     Σ(x - x̄)² = Σ(x² - 2x·x̄ + x̄²)
               = Σx² - 2x̄·Σx + n·x̄²
               = Σx² - 2(Σx)²/n + (Σx)²/n
               = Σx² - (Σx)²/n

3.18
a) The range depends on only 2 values whereas the standard deviation uses all the data values in its calculation. However, the range is easier to calculate than the standard deviation.
b) Zero is the smallest value.
c) We can say that all the data points are equal to the same value.

3.19
a) A guess at the mean can be determined by looking at where the data are centered.
A guess at the standard deviation can be determined by looking at the range of the data.
b) Mean is 1368/7 = 195.4286.
   Variance is (276762 - (1368)²/7)/(7 - 1) = 1569.286.
   Standard deviation is 39.614.
c) Standard deviation will decrease.
d) Removing the smallest and largest observations,
   x̄ = 1008/5 = 201.6
   s² = (205912 - (1008)²/5)/(5 - 1) = 674.8
   s = 25.977
e) The middle values give one a clue as to what the mean might be. The range of the data gives a clue as to what the standard deviation might be.

3.20
a) The first histogram is for the government sector and the second histogram is for the private sector. The values for the government sector appear to be more concentrated in the range from 1000 to less than 2000. The values for the private sector are somewhat larger and slightly more spread out.
b) The mean and standard deviation for the government employees are 1660.37 and 1171.106, respectively. The mean and standard deviation for the private sector employees are 2263.14 and 1247.56, respectively.

3.21
a) Mean is 85.2, standard deviation is 47.6, coefficient of variation is 100(47.604/85.2) = 55.87, and the variance is 2266.182.
b) The mean should change by dividing its value by 60. The standard deviation should change by dividing its value by 60. The variance should change by dividing its value by 3600.
c) Mean is 1.42, standard deviation is .793, and the variance is .6295.

3.22
a) nP/100 = 10(75)/100 = 7.5. Round up to 8. The 8th value is 3. The 75th percentile is 3.
   nP/100 = 10(50)/100 = 5
   50th percentile = (5th value + 6th value)/2 = (2 + 2)/2 = 2
b) x̄ = 22/10 = 2.2
   s² = (80 - (22)²/10)/(10 - 1) = 3.511
   s = 1.874
   Sk = 3(x̄ - Md)/s = 3(2.2 - 2)/1.874 = .3202

3.23 Median is 1.5, mean is 10/10 = 1, and
     s² = (20 - (10)²/10)/9 = 1.111
     s = 1.054
     Sk = 3(x̄ - Md)/s = 3(1 - 1.5)/1.054 = -1.423

3.24
a) 20th percentile: i = (20)(20)/100 = 4; (4th value + 5th value)/2 = (2.4 + 2.4)/2 = 2.4
b) 40th percentile: i = (20)(40)/100 = 8; (8th value + 9th value)/2 = (3.7 + 3.8)/2 = 3.75
c) 60th percentile: i = (60)(20)/100 = 12; (12th value + 13th value)/2 = (4.5 + 4.6)/2 = 4.55
d) 80th percentile: i = (80)(20)/100 = 16; (16th value + 17th value)/2 = (6.3 + 7.2)/2 = 6.75
e) Interquartile range:
   25th percentile: i = (20)(25)/100 = 5; (5th value + 6th value)/2 = (2.4 + 2.5)/2 = 2.45
   75th percentile: i = (20)(75)/100 = 15; (15th value + 16th value)/2 = (5.4 + 6.3)/2 = 5.85
   IQR = 5.85 - 2.45 = 3.4

3.25 x̄ = 50, s = 5
a) Z = (40 - 50)/5 = -2
b) Z = (65 - 50)/5 = 3
c) 1 = (x - 50)/5; x = 5 + 50 = 55
d) -2.5 = (x - 50)/5; x = -12.5 + 50 = 37.5

3.26
a) nP/100 = 20(75)/100 = 15
   75th percentile = (15th value + 16th value)/2 = (55 + 60)/2 = 57.5
   nP/100 = 20(25)/100 = 5
   25th percentile = (5th value + 6th value)/2 = (35 + 38)/2 = 36.5
   Interquartile range = Q3 - Q1 = 57.5 - 36.5 = 21
b), c) Mean = 941/20 = 47.05
   s² = (47587 - (941)²/20)/19 = 174.3658
   s = 13.2048
d) Median is (49 + 50)/2 = 49.5
   Sk = 3(x̄ - Md)/s = 3(47.05 - 49.5)/13.2048 = -.5566

3.27
a) 20th percentile: i = 30(20)/100 = 6; (73 + 74)/2 = 73.5 = 20th percentile
b) 80th percentile: i = 30(80)/100 = 24; (105 + 106)/2 = 105.5 = 80th percentile
c) 25th percentile: i = 30(25)/100 = 7.5; 75 = 25th percentile
   75th percentile: i = 30(75)/100 = 22.5; 103 = 75th percentile
   IQR = 103 - 75 = 28
d) x̄ = 2719/30 = 90.6333
   Median: i = 30/2 = 15th position; (87 + 88)/2 = 87.5 = median
   s = 19.71781
   Sk = 3(90.6333 - 87.5)/19.71781 = 0.47673
e) The data are slightly skewed to the right. The middle 50% of the data fall between 75 and 103. The mean and standard deviation are 90.6333 and 19.71781, respectively.

3.28 12 15 19 20 21 21 22 42 45 47 52 53 53 54 70 71 71 71 72 73 73 74 74 74 75 77 84 86 87 90
a) Q1 = 25th percentile: i = 30(25)/100 = 7.5; 42 = Q1
   Q2 = 50th percentile: i = 30(50)/100 = 15; (70 + 71)/2 = 70.5 = Q2
   Q3 = 75th percentile: i = 30(75)/100 = 22.5; 74 = Q3
   s = 24.631
   Sk = 3(56.6 - 70.5)/24.631 = -1.69
b) The observations that are much higher than Q3 and much lower than Q1 may be considered unusually high and low, respectively.
c) 75 is close to the 75th percentile of 74.

3.29 1 4 6 7 9
a) s = √((183 - 27²/5)/4) = 3.0496
b) z = (1 - 5.4)/3.0496 = -1.443
   z = (4 - 5.4)/3.0496 = -.4591
   z = (6 - 5.4)/3.0496 = 0.1967
   z = (7 - 5.4)/3.0496 = 0.52466
   z = (9 - 5.4)/3.0496 = 1.18049
c) The standard deviation of the z-scores is expected to be 1.
   s_z = √((4.00053 - 0²/5)/4) = 1.00006 ≈ 1

3.30
                               Boyston    Farmersville
   Mean                        1.237      1.235
   Standard deviation          .077896    .055227
   Median                      1.2        1.245
   Coefficient of skewness     1.425      -.5432

3.31
a) x̄ = 1205; s² = 7,837,307; s = 2799.519; Median = 236
   Sk = 3(1205 - 236)/2799.519 = 1.038393
b) The skewness should decrease.
   x̄ = 159.2857; s² = 8762.571; s = 93.6086; Median = 153
   Sk = 3(x̄ - Md)/s = 3(159.2857 - 153)/93.6086 = .20145

3.32
a) nP/100 = 75(75)/100 = 56.25; round up to 57
   75th percentile is 7.44
   nP/100 = 75(25)/100 = 18.75; round up to 19
   25th percentile is 3.12
   Interquartile range = 7.44 - 3.12 = 4.32
   x̄ = 5.421; Md = 5.81; s² = 6.7554; s = 2.599115
   Sk = 3(x̄ - Md)/s = 3(5.421 - 5.81)/2.599115 = -.449
b) Interquartile range = .7(4.32) = 3.024
   x̄ = 3.7947; Md = 4.067; s² = 3.310146; s = 1.81938; Sk = -.449
   In part (b) the interquartile range, mean, median, and standard deviation were each multiplied by .7 (and the variance by .7² = .49). The skewness measure did not change.
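The invariance seen in 3.32 can be checked numerically. The following is a minimal sketch, assuming the coefficient Sk = 3(x̄ - Md)/s used throughout these solutions; the function name pearson_skewness is an illustrative choice and does not come from the text. The inputs are the summary values reported in parts (a) and (b).

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: Sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev


# Summary values reported in 3.32 (a): the original data.
sk_original = pearson_skewness(5.421, 5.81, 2.599115)

# Summary values reported in 3.32 (b): every observation multiplied by .7.
sk_rescaled = pearson_skewness(3.7947, 4.067, 1.81938)

# The factor .7 appears in both the numerator and the denominator,
# so it cancels and the two coefficients agree up to rounding.
print(round(sk_original, 3), round(sk_rescaled, 3))
```

Because the mean, median, and standard deviation all scale by the same constant, any positive rescaling of the data leaves this coefficient unchanged.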
3.33
a) The data do not look to be very skewed from the histogram. Therefore, one would expect to have a small value for the coefficient of skewness.

   [Frequency histogram; class limits from "50 and under 70" through "190 and under 210"]

b) x̄ = 127.3724; s² = 1782.21124; s = 42.2162; Md = 134.995
   Sk = 3(127.3724 - 134.995)/42.2162 = -.54168
c) Since the data are negatively skewed, the largest 5 values are deleted. That is, values 199.56, 199.39, 183.22, 180.38, and 174.53 are deleted.
   x̄ = 120.7008; s² = 1517.6195; s = 38.9566; Md = 129.49
   Sk = 3(120.7008 - 129.49)/38.9566 = -.677

3.34 x̄ = 20, s = 5
a) x̄ ± 2s = 20 ± 2(5)
   At least 75% of the data lie between 10 and 30.
b) x̄ ± 3s = 20 ± 3(5)
   At least 89% of the data lie between 5 and 35.

3.35
a) x̄ - s to x̄ + s
   100 - 20 to 100 + 20
   80 to 120
b) x̄ - 3s to x̄ + 3s
   100 - 60 to 100 + 60
   40 to 160

3.36 (1 - 1/k²) · 100% = (1 - 1/16) · 100% = 93.75%

3.37
a) x̄ = 448/10 = 44.8, s = 10.5283
b) x̄ ± 2s = 44.8 ± 2(10.5283)
   At least 75% of the data lie between 23.7434 and 65.8565.
c) x̄ ± 3s = 44.8 ± 3(10.5283)
   At least 89% of the data lie between 13.2152 and 76.3848.
d) Yes, it is consistent with Chebyshev's inequality. 90% of the observations fall between 23.7434 and 65.8565 and 100% of the observations fall between 13.2152 and 76.3848.

3.38 x̄ = 120, s = 30, n = 300
a) 60 to 180 is equivalent to x̄ ± 2s. At least 75% of the data, or 225 observations, would lie within the interval 60 to 180.
b) With the bell-shaped assumption, 95% of the data, or 285 observations, would lie between 60 and 180.

3.39 x̄ = 50.5, s = 9.7512, n = 30
     x̄ ± s = 50.5 ± 9.7512 = 40.749 to 60.2512
     27/30 = .90 or 90% of the data values fall in this interval.
     x̄ ± 2s = 50.5 ± 2(9.7512) = 30.997 to 70.002
     28/30 = 93.33% of the data values fall in this interval.
     x̄ ± 3s = 50.5 ± 3(9.7512) = 21.246 to 79.753
     28/30 = 93.33% of the data values fall in this interval.
     The data do not appear to come from a normal population since 90% of the data lie within one standard deviation.

3.40 30 = 45 - 15 to 45 + 15 = 60
     By the empirical rule, 15 is equal to two standard deviations. Therefore, s = 7.5.

3.41 x̄ = 90.6333, s = 19.71781, n = 30
     x̄ ± 2s = 90.633 ± 2(19.71781)
     At least 75% of the data will fall between 51.198 and 130.069. Twenty-nine observations or 96.667% of the data actually lie within this interval.

3.42 1 - 1/k² = .55
     1/k² = .45
     k² = 2.222
     k = 1.49
     At least 55% of the data lie within 1.5 standard deviations of the mean.

3.43 x̄ = 12.733; s = 9.49
     x̄ - 2s = -6.247; x̄ + 2s = 31.713; x̄ - 3s = -15.737; x̄ + 3s = 41.203
a) 14 observations lie within two standard deviations of the mean; that is, 93.3% of the data are within two standard deviations of the mean. 100% of the data lie within three standard deviations of the mean.
b) The results in part (a) are consistent with Chebyshev's inequality.

3.44
a) The histogram is approximately bell-shaped as displayed below.

   [Frequency histogram; class limits from "0 and under 5" through "40 and under 45"]

b) The sample mean plus or minus two standard deviations is as follows:
   19.3867 - 2(6.8554) to 19.3867 + 2(6.8554)
   5.6759 to 33.0975
   Approximately 95% of the data should lie between these interval endpoints.

3.45
a) The following interval contains at least 75% of the data:
   x̄ - 2s to x̄ + 2s
   18,335.76 - 2(2268.515) to 18,335.76 + 2(2268.515)
   $13,798.73 to $22,872.79
   The following interval contains at least 89% of the data:
   x̄ - 3s to x̄ + 3s
   18,335.76 - 3(2268.515) to 18,335.76 + 3(2268.515)
   $11,530.22 to $25,141.31
b) 196 observations (98%) of the data lie within 2 standard deviations of the mean.
   The minimum number expected by Chebyshev's inequality is 150. 199 observations (99.5%) of the data lie within 3 standard deviations of the mean. The minimum number expected by Chebyshev's inequality is 178.

3.46
a) x̄ = [(5)(4) + (15)(7) + (25)(5) + (35)(4)]/20 = 19.5
b) s = √((9700 - 390²/20)/19) = 10.5006
   Note: Σfm² = 4(5)² + 7(15)² + 5(25)² + 4(35)² = 9700

3.47
a) x̄ = [(10)(10) + (10)(20) + (10)(30) + (10)(40) + (10)(50)]/50 = 30
b) s² = (55000 - 1500²/50)/49 = 204.0816
c) s = √204.0816 = 14.2857

3.48 x̄ = [(30)(5) + (40)(11) + (50)(18) + (60)(6) + (70)(10)]/50 = 2550/50 = 51

3.49 x̄ = (115(2) + 145(12) + 175(4) + 205(1) + 235(2) + 265(1))/22 = 3610/22 = 164.0909
     Σ f_i m_i² (i = 1 to 6) = 2(115)² + 12(145)² + 4(175)² + 1(205)² + 2(235)² + 1(265)² = 623950
     s² = (623950 - (3610)²/22)/21 = 1503.896
     s = 38.7801

3.50
a) Frequency Distribution Table
   CLASS   CLASS LIMITS      FREQUENCY
   1       6 and under 9         3
   2       9 and under 12        6
   3       12 and under 15       4
   4       15 and under 18       2
           TOTAL                15
b) x̄ = (7.5(3) + 10.5(6) + 13.5(4) + 16.5(2))/15 = 172.5/15 = 11.5
   Σ f_i m_i² (i = 1 to 4) = 3(7.5)² + 6(10.5)² + 4(13.5)² + 2(16.5)² = 2103.75
   s² = (2103.75 - (172.5)²/15)/14 = 8.5714
   s = 2.9277
c) x̄ = 11.66; s = 2.802

3.51
a) x̄ = [(7.5)(5) + (12.5)(15) + (17.5)(31) + (22.5)(30) + (27.5)(16) + (32.5)(3)]/100 = 1980/100 = 19.8
b) s = √((42575 - 1980²/100)/99) = √34.0505 = 5.8353
c) Yes, the mean is an appropriate summary statistic since these data appear to have a symmetrical distribution.
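The grouped-data calculations in 3.46 through 3.51 all follow the same pattern: x̄ = Σfm/n and s² = (Σfm² - (Σfm)²/n)/(n - 1), where m is a class midpoint and f its frequency. The following is a minimal sketch of that pattern applied to the midpoints and frequencies of 3.51; the function name grouped_mean_and_std is hypothetical and is used only for illustration.

```python
import math


def grouped_mean_and_std(midpoints, frequencies):
    """Approximate the sample mean, variance, and standard deviation
    from a frequency distribution, using class midpoints."""
    n = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    sum_fm2 = sum(f * m * m for f, m in zip(frequencies, midpoints))
    mean = sum_fm / n
    # Shortcut formula: s^2 = (sum(f*m^2) - (sum(f*m))^2 / n) / (n - 1)
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)


# Class midpoints and frequencies from 3.51.
midpoints = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
frequencies = [5, 15, 31, 30, 16, 3]

mean, variance, std_dev = grouped_mean_and_std(midpoints, frequencies)
print(mean, round(variance, 4), round(std_dev, 4))
```

These are approximations to the statistics of the raw data, since every observation in a class is treated as if it sat at the class midpoint.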
3.52
a) Frequency Distribution Table
   CLASS   CLASS LIMITS      FREQUENCY
   1       10 and under 15      15
   2       15 and under 20      78
   3       20 and under 25      78
   4       25 and under 30      23
   5       30 and under 35       6
           TOTAL               200
b) x̄ = [12.5(15) + 17.5(78) + 22.5(78) + 27.5(23) + 32.5(6)]/200 = 20.675
   Σfm² = 15(12.5)² + 78(17.5)² + 78(22.5)² + 23(27.5)² + 6(32.5)² = 89450
   s² = (89450 - (4135)²/200)/199 = 19.894
   s = 4.46
c) x̄ = 20.155; s = 4.106

3.53, 3.54
   [Box plot; horizontal axis C1 marked 0, 20, 40, 60, 80, 100]
a) Lower hinge = 6.0
b) Median = 8.0
c) Upper hinge = 10.5
d) Mild outlier = 18
e) No extreme outliers

3.55, 3.56
a) [Box plot; horizontal axis C1 marked 6.4, 8.0, 9.6, 11.2, 12.8]
b) Approximately 25% of the customers are served in 10 minutes or more. Therefore, the manager should rethink the policy of allowing customers to eat for free if not served within 10 minutes.

3.57
a) [Box plot; vertical axis from -60 to 140, marking the lower and upper inner and outer fences, the first quartile, the median, and the third quartile; legend: * mild outlier, o extreme outlier]
b) The distribution appears to be skewed slightly to the right. There is one mild outlier.

3.58 [Box plot; horizontal axis C1 marked 64, 80, 96, 112, 128, 144]
     For this set of 30 observations, we would not expect any mild or extreme outliers.

3.59 The distribution is slightly skewed to the right.
     [Box plot; vertical axis from 0 to 400, marking the lower and upper inner and outer fences, the first quartile, the median, and the third quartile; legend: * mild outlier, o extreme outlier]

3.60
a) The box plot below is for Holiday Hotel North.

   [Box plot; vertical axis from 0 to 140, marking the fences, quartiles, and median; legend: * mild outlier, o extreme outlier]

   The box plot below is for Holiday Hotel South.

   [Box plot; vertical axis from 0 to 140, marking the fences, quartiles, and median; legend: * mild outlier, o extreme outlier]

b) Holiday Hotel South is more skewed to the left (toward the smaller values). It has more outliers than Holiday Hotel North. The median for Holiday Hotel South is larger than that of Holiday Hotel North.

3.61
a) x̄ = 15.983
   Median = 11.15
   There is no mode.
   Since there is a very large value (46) in the data set, the mean will be affected. The median would be a more appropriate measure of central tendency.
b) Sk = 3(x̄ - Md)/s = 3(15.989 - 11.15)/15.2472 = .9521
   The data are slightly skewed to the right.
c) First z = (46 - 15.983)/15.2472 = 1.968
   Second z = (16 - 15.983)/15.2472 = .001
   The first z-value says that Yahoo's price/sales figure is approximately 2 standard deviations from the mean.
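The two z-values in 3.61 (c) follow directly from the glossary definition: subtract the mean and divide by the standard deviation. A minimal sketch, assuming the mean and standard deviation reported in 3.61; the function name z_score is an illustrative choice.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# Sample mean and standard deviation reported in 3.61.
mean, std_dev = 15.983, 15.2472

z_first = z_score(46, mean, std_dev)   # the very large value in the data set
z_second = z_score(16, mean, std_dev)  # a value almost equal to the mean

print(round(z_first, 3), round(z_second, 3))
```

A positive z-score places the value to the right of the mean and a negative one to the left, so the second value sits almost exactly at the mean.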
3.62
a) x̄ = 1091.3/7 = 155.9
b) Midrange = (186.7 + 126.9)/2 = 156.8
c) s² = (172595.33 - (1091.3)²/7)/6 = 410.277
   s = 20.255
d) CV = (s/x̄) · 100 = 13%
e) nP/100 = 7(40)/100 = 2.8; round up to 3
   40th percentile = 146.9
f) nP/100 = 7(25)/100 = 1.75; round up to 2
   25th percentile = 146.0
   nP/100 = 7(75)/100 = 5.25; round up to 6
   75th percentile is 176
   Interquartile range is 176 - 146 = 30

3.63
a) x̄ = 7.236; s = .4843
   At least 75% of the data should lie within 2 standard deviations of the mean.
   x̄ - 2s to x̄ + 2s
   7.236 - 2(.4843) to 7.236 + 2(.4843)
   6.268 to 8.204
b) nP/100 = 11(25)/100 = 2.75; round up to 3
   nP/100 = 11(75)/100 = 8.25; round up to 9
   25th percentile is 6.8
   75th percentile is 7.5
   Interquartile range is 7.5 - 6.8 = .7
   The following results are for 8.0 omitted:
   nP/100 = 10(25)/100 = 2.5; round up to 3
   nP/100 = 10(75)/100 = 7.5; round up to 8
   25th percentile is 6.8
   75th percentile is 7.5
   Interquartile range is 7.5 - 6.8 = .7
   There is no change in the interquartile range.

3.64
a) x̄ = 203.2667; median = 205; s = 24.835
   Sk = 3(203.2667 - 205)/24.835 = -.2094
b) x̄ = 275.333; median = 275; s = 60.1696
   Sk = 3(275.333 - 275)/60.1696 = .0166
c) The Caribbean data have a larger mean, median, and standard deviation. The European data are slightly negatively skewed whereas the Caribbean data are slightly positively skewed. However, for both groups the magnitude of the skewness coefficient is small.

3.65
a) i = 18(25)/100 = 4.5
   Q1 = 2.7 (5th position of the ordered data)
   i = 18(75)/100 = 13.5
   Q3 = 4.1 (14th position of the ordered data)
   IQR = Q3 - Q1 = 4.1 - 2.7 = 1.4
   s = 1.1757
b) i = 17(25)/100 = 4.25
   Q1 = 2.7 (5th position of the ordered data)
   i = 17(75)/100 = 12.75
   Q3 = 3.8 (13th position of the ordered data)
   IQR = Q3 - Q1 = 3.8 - 2.7 = 1.1
   s = .8206
   The standard deviation is affected more by the removal of the outlier than the IQR is affected.
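The percentile calculations in these solutions follow one position rule: compute i = nP/100; if i is not a whole number, round up and take the value in that position; if i is a whole number, average the values in positions i and i + 1. The following is a minimal sketch of that rule, checked against the 30 ordered values listed in 3.28. The function name percentile and the restriction to whole-number P between 1 and 99 are assumptions made for the illustration.

```python
def percentile(sorted_values, p):
    """P-th percentile by the position rule i = n * P / 100.

    Assumes sorted_values is in ascending order and p is a whole
    number strictly between 0 and 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_values)
    position, remainder = divmod(n * p, 100)
    if remainder == 0:
        # i is a whole number: average the i-th and (i + 1)-th values.
        return (sorted_values[position - 1] + sorted_values[position]) / 2
    # i is not a whole number: round up and take that value.
    return sorted_values[position]


# The 30 ordered values listed in 3.28.
data = [12, 15, 19, 20, 21, 21, 22, 42, 45, 47,
        52, 53, 53, 54, 70, 71, 71, 71, 72, 73,
        73, 74, 74, 74, 75, 77, 84, 86, 87, 90]

q1, q2, q3 = percentile(data, 25), percentile(data, 50), percentile(data, 75)
print(q1, q2, q3)
```

Statistical software often uses a different interpolation rule, so packaged percentile functions may return slightly different quartiles from the hand calculations shown here.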
3.66
a) x̄ = 704/30 = 23.467
   s = 7.0257
   Median = (22 + 23)/2 = 22.5 (average of the 15th and 16th positions)
   Sk = 3(23.467 - 22.5)/7.0257 = .413
   Sk is close to zero. Therefore, the data are not very skewed.
b) z-scores:
   -.78 -1.92 1.50 -1.49 -.64 .08 1.50 -.49 -.78 -.92
   1.07 .36 1.21 -.49 -.35 .93 -1.06 .22 -.35 .79
   -.92 2.35 -1.06 -.21 .65 .08 -.78 -.07 1.07 .50
c) The observation 40 has a z-score of 2.35. This observation may be considered an outlier since it is more than 2 standard deviations from the mean.

3.67
a) A high Sharpe measure indicates that you are being well paid for the risk you are taking. The Janus fund had a better risk-adjusted performance than the Magellan fund.
b) The coefficient of variation is the standard deviation divided by the mean and multiplied by 100. The Sharpe measure has the standard deviation in the denominator and the average return in excess of a treasury bill's performance in the numerator. The Sharpe measure can be thought of as the reciprocal of the coefficient of variation.

3.68 x̄ = 20, s = 9.1287
     z = (x - 20)/9.1287
     z-scores are -1.095, -.548, 1.095, .548
     The mean of the z-scores = 0 since the sum of the z-scores is 0.
     The standard deviation of the z-scores is s = √((3 - 0²/4)/3) = 1.
     Show z̄ = 0 always:
     z = (x - x̄)/s
     Σz = (1/s)Σ(x - x̄) = (1/s)(Σx - Σx) = 0

3.69
a) Stem-and-leaf display (Leaf Unit = .10)
   Stem: -2 -1 -1 -0 -0 0 0 1 1 2 2 3
   Leaf: 3 55 21 965 211 3 789 013 9 04 1
   The data are not exactly bell-shaped. However, the data appear to be almost uniformly distributed between -1.5 and 2.5.
b) Sk = 3(.25 - .1)/1.405 = .32
c) By the empirical rule, approximately 68% of the values should fall between .25 - 1.405 and .25 + 1.405; that is, -1.155 to 1.655.

3.70
a) x̄ = (484(17.5) + 1010(22.5) + 1188(27.5) + 795(32.5) + 278(37.5) + 44(42.5))/3799 = 101998/3799 = 26.8486
b) Since grouped data are used in part (a), the mean for each interval is approximated by the midpoint of the interval.
   Therefore, the actual data should yield a different mean.

3.71 The interval that contains approximately 95% of the rates is
     $75 - (2)($15) to $75 + (2)($15)
     $45 to $105

3.72 -1.50 = (15 - 45)/s
     s = (15 - 45)/(-1.50)
     s = 20
     s² = 400

3.73 By Chebyshev's inequality, at least 88.9% of the observations will lie within 3 standard deviations of the mean. Therefore, at least 711 aluminum sheets will have castings between 3.0 - 3(.5) and 3.0 + 3(.5). This interval is 1.5 to 4.5. Therefore, the supervisor should accept the shipment.

3.74 n = 65, x̄ = 520, s = 25
     x̄ - 2s = 520 - (2)(25) = 470
     x̄ + 2s = 520 + (2)(25) = 570
     There should be at least 75% of the data within this interval. If we know that the population has a bell-shaped distribution, we should expect approximately 95% of the data to fall within this interval.

3.75
a) x̄ - s to x̄ + s
   13,062.33 - 7007.578 to 13,062.33 + 7007.578
   6054.752 to 20069.908
b) nP/100 = 30(25)/100 = 7.5; round up to 8
   nP/100 = 30(50)/100 = 15; average the 15th and 16th positions
   nP/100 = 30(75)/100 = 22.5; round up to 23
   25th percentile is 7400
   50th percentile is (13500 + 14500)/2 = 14000
   75th percentile is 17500
   Twenty-five percent of the data are less than or equal to 7400. Fifty percent of the data are less than or equal to 14000. Seventy-five percent of the data are less than or equal to 17500.

3.76
a) The histogram for calories is as follows.

   [Frequency histogram for calories]

   The histogram for fat is as follows.

   [Frequency histogram for fat; class limits from "0 and under 1" through "4 and under 5"]

b) Neither histogram actually resembles a normal distribution. However, the histogram for fat appears to be closer in shape, as it is more symmetrical than the histogram for calories and has a single mode near the middle.
c) Approximately 95% of the data should lie within two standard deviations of the mean. 2 - 2(1.2247) to 2 + 2(1.2247) gives an interval of -.45 to 4.45.
d)
                         Calories       Fat
   Mean                  126.92308      2
   Standard Error        5.2847889      0.240192231
   Median                125            2
   Mode                  120            2
   Standard Deviation    26.947242      1.224744871
   Sample Variance       726.15385      1.5
   Kurtosis              -0.9103115     0.070869565
   Skewness              0.3683437      -0.141526074
   Range                 80             4.5
   Minimum               90             0
   Maximum               170            4.5
   Sum                   3300           52
   Count                 26             26

   The Calories data's skewness is slightly positive and the Fat data's skewness is slightly negative. The mean, median, and mode for the Fat data are all equal, whereas these values differ somewhat for the Calories data.

3.77
a) [Frequency histogram; class limits from "10 and under 20" through "80 and under 90"]

   [Box plot; vertical axis from -40 to 140, marking the fences, quartiles, and median; legend: * mild outlier, o extreme outlier]

b) The data appear to have a bell-shaped distribution.
c) The mean is 50.2167, the standard deviation is 15.913, and the median is 50. The coefficient of skewness is 3(50.2167 - 50)/15.913 = .0408. Note that the coefficient of skewness is small in magnitude, which is consistent with a bell-shaped distribution.

3.78
a) The box plot for MarketExp is as follows.

   [Box plot; vertical axis from -10000 to 25000, marking the fences, quartiles, and median; legend: * mild outlier, o extreme outlier]

   The box plot for R&DExp is as follows.
   [Box plot; vertical axis from 0 to 18000, marking the fences, quartiles, and median; legend: * mild outlier, o extreme outlier]

b) The histogram for MarketExp is as follows.

   [Frequency histogram; class limits from "2500 and under 3500" through "12500 and under 13500"]

   The histogram for R&DExp is as follows.

   [Frequency histogram; class limits from "2500 and under 3500" through "12500 and under 13500"]

c) The data for R&DExp appear to follow a normal distribution.
d) The distribution of both groups appears to be centered near 8,000. The shape of the R&DExp data appears to be more symmetrical than that of the MarketExp data. The R&DExp data have 3 mild outliers whereas the MarketExp data do not have any outliers, as indicated by the box plots.

3.79
a) Mean              24.872
   Median            25.275
   Minimum           10.21
   Maximum           34.42
   First Quartile    21.62
   Third Quartile    28.34
b) For the case where the largest value is set to 0,
   Mean              24.1836
   Median            25.05
   Minimum           0
   Maximum           32.88
   First Quartile    21.56
   Third Quartile    28.34
   The statistics do not change by much.
c) For the case where the four largest values are set to 0,
   Mean              22.2438
   Median            24.935
   Minimum           0
   Maximum           31.56
   First Quartile    21.05
   Third Quartile    27.45
   Note that the value of the mean has dropped by over 2 units. The mean is affected more than the median. The first and third quartiles are not affected very much.
3.80 a) Approximately 75% of the data are between the values of 21 and 33. The fiftieth percentile is approximately equal to 25. Therefore, there are some mild outliers between 50 and 70 and one extreme value over 70.
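The glossary classifies outliers by inner and outer fences, but this section does not state how the fences are computed. The following is a minimal sketch under a common convention, which is an assumption here: inner fences lie 1.5 interquartile ranges beyond the quartiles and outer fences lie 3 interquartile ranges beyond them. The text's own box plots may be built on hinges, which can differ slightly from the quartiles. The function name classify_value is hypothetical, and the example reuses the quartiles, minimum, and maximum reported in 3.79 (a).

```python
def classify_value(value, q1, q3):
    """Classify a value against box plot fences.

    Assumed convention (not stated in this section):
    inner fences = quartiles -/+ 1.5 * IQR,
    outer fences = quartiles -/+ 3 * IQR.
    """
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    if value < outer_low or value > outer_high:
        return "extreme outlier"
    if value < inner_low or value > inner_high:
        return "mild outlier"
    return "not an outlier"


# Quartiles, minimum, and maximum reported in 3.79 (a).
q1, q3 = 21.62, 28.34
minimum_label = classify_value(10.21, q1, q3)
maximum_label = classify_value(34.42, q1, q3)
print(minimum_label, "/", maximum_label)
```

Under this assumed convention the inner fences for the 3.79 data fall at 11.54 and 38.42, so the classification depends only on the two quartiles and not on the mean or standard deviation.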