
Chapter 2 ORGANIZATION AND DESCRIPTION OF DATA

2.1 (a) The percentage in other classes is $100 - 32.7 - 12.8 - 12.5 - 12.1 - 8.2 = 21.7\%$.
(b) [Figure: bar chart of percent of waste by category (Paper, Yard, Food, Plastic, Metals, Other); vertical axis: Percent, 0 to 35]
(c) The percentage of waste that is paper or paperboard is 32.7%.
The percentage of waste in the top two categories is $32.7 + 12.8 = 45.5\%$.
The percentage in the top five categories is $32.7 + 12.8 + 12.5 + 12.1 + 8.2 = 78.3\%$.

2.2 The frequency table for blood type is

Blood type   Frequency   Relative Frequency
O            16          16/40 = 0.40
A            18          18/40 = 0.45
B             4           4/40 = 0.10
AB            2           2/40 = 0.05
Total        40          1.00

2.3 The frequency table for number of activities is

Number of Activities   Frequency   Relative Frequency
0                       7           7/40 = 0.175
1                      10          10/40 = 0.25
2                      13          13/40 = 0.325
3                       5           5/40 = 0.125
4                       2           2/40 = 0.05
5                       1           1/40 = 0.025
6                       1           1/40 = 0.025
7                       1           1/40 = 0.025
Total                  40          1.00

This is the relative frequency histogram:

2.4 The frequency table for number of crashes per month is

Number of Crashes   Frequency   Relative Frequency
0                    5           5/59 = 0.085
1                   12          12/59 = 0.203
2                   11          11/59 = 0.186
3                   14          14/59 = 0.237
4                    8           8/59 = 0.136
5                    8           8/59 = 0.136
6                    1           1/59 = 0.017
Total               59          1.00 (rounding error)

This is the relative frequency histogram:

2.5 (a) The table of relative frequencies for workers in the department is

Mode of Transportation   Frequency   Relative Frequency
Drive alone              50          50/80 = 0.625
Car pool                  6           6/80 = 0.075
Ride bus                 14          14/80 = 0.175
Other                    10          10/80 = 0.125
Total                    80          1.000

(b) The pie chart for workers in the department is

2.6 The table of relative frequencies for the money raised (in million dollars) is

Source                         Frequency   Relative Frequency
Individuals and bequests       234         234/414 = 0.565
Industry and business           48          48/414 = 0.116
Foundations and associations   132         132/414 = 0.319
Total                          414         1.000

The pie chart for the university fund drive is

2.7 There are overlapping classes in the grouping.
A report of 3 stolen bicycles will fall in two classes.

2.8 There is a gap. A report of 6 complaints in one week does not fall in any class. The last class should be 6 or more.

2.9 There is a gap. The response 5 close friends does not fall in any class. The last class should be 5 or more.

2.10 The first class should be "less than 175 pounds". Otherwise, a lightweight kicker cannot be assigned to a class.

2.11 (a) Yes. (b) Yes. (c) Yes. (d) No. (e) No.

2.12 The frequency table of the survey response is

Response   Frequency   Relative Frequency
1          14          14/50 = 0.28
2          13          13/50 = 0.26
3           7           7/50 = 0.14
4          16          16/50 = 0.32
Total      50          1.00

2.13 (a) The relative frequencies are 0.18, 0.48, 0.26, and 0.08 for 0, 1, 2, and 3 bags, respectively.
(b) Nearly one-half of the passengers check exactly one bag. The longest tail is to the right.
(c) The proportion of passengers who fail to check a bag is $9/50 = 0.18$.

2.14 The dot diagram of meter readings is
[Figure: dot plot of measurements; horizontal axis from 422 to 472]

2.15 The dot diagram of amounts of radiation leakage is

2.16 The dot diagram of number of bad checks received is
[Figure: dot diagram; horizontal axis: Number of Bad Checks Received, 3 to 8]

2.17 (a) The dot diagram of number of CFUs is
(b) There is a long tail to the right with one extremely large value of 1700 CFU units.
(c) There is one day so the proportion is $1/15 = 0.067$.

2.18 (a) The frequency distribution of tornado fatalities is given in the table below.

Class Interval   Frequency   Relative Frequency
[0, 25)           2           2/58 = 0.034
[25, 50)         19          19/58 = 0.328
[50, 75)         18          18/58 = 0.310
[75, 100)         7           7/58 = 0.121
[100, 150)        5           5/58 = 0.086
[150, 200)        2           2/58 = 0.034
[200, 250)        1           1/58 = 0.017
[250, 550)        3           3/58 = 0.052
Total            58          0.982 (rounding error)

(b) The relative frequency histogram is given below.
[Figure: relative frequency histogram; horizontal axis: Number of Deaths, 0 to 250; vertical axis: Relative Frequency, 0.0 to 0.3]
(c) The proportion of years having 49 or fewer tornado fatalities is $0.034 + 0.328 = 0.362$.
(d) There is a long tail to the right: the last class interval is much wider than the others yet still has a low frequency of observations.

2.19 (a) In the following frequency distribution of lizard speed (in meters per second), the left endpoint is included in the class interval but not the right endpoint.

Class Interval   Frequency   Relative Frequency
0.45 to 0.90      2          0.067
0.90 to 1.35      6          0.200
1.35 to 1.80     11          0.367
1.80 to 2.25      5          0.167
2.25 to 2.70      6          0.200
Total            30          1.001 (rounding error)

(b) All of the class intervals are of length 0.45, so we can graph rectangles whose heights are the relative frequencies. The histogram is

2.20 In the following frequency distribution of order of earthquake magnitudes (as given on the Richter scale), the left endpoint is not included in the class interval, but the right one is.

Class Interval   Frequency   Relative Frequency
(6.0, 6.3]       12          12/55 = 0.218
(6.3, 6.6]       15          15/55 = 0.273
(6.6, 6.9]       10          10/55 = 0.182
(6.9, 7.2]       10          10/55 = 0.182
(7.2, 7.5]        5           5/55 = 0.091
(7.5, 7.8]        2           2/55 = 0.036
(7.8, 8.1]        1           1/55 = 0.018
Total            55          1.0000 (rounding error)

The class intervals all have the same length so we take the option of making the height of a rectangle equal to the relative frequency. The histogram is
[Figure: histogram; horizontal axis: Order of Earthquake Magnitude, 6.3 to 8.1; vertical axis: Frequency, 0 to 16]

2.21 This time, the frequency distribution is given by

Class Interval   Frequency   Relative Frequency
(6.0, 6.3]       12          12/55 = 0.218
(6.3, 6.6]       15          15/55 = 0.273
(6.6, 6.9]       10          10/55 = 0.182
(6.9, 7.2]       10          10/55 = 0.182
(7.2, 7.9]        8           8/55 = 0.145
Total            55          1.0000 (rounding error)

The corresponding frequency histogram is as follows:
[Figure: frequency histogram; horizontal axis: Order of Earthquake Magnitude, 6.3 to 7.2 and >7.2; vertical axis: Frequency, 0 to 16]
2.22 The stem-and-leaf display of the scores is

 9 | 58
10 | 6
11 | 559
12 | 6
13 | 135678
14 | 344557
15 | 2478
16 | 01222567
17 | 14688
18 | 24
19 | 04

2.23 The stem-and-leaf display of the amount of iron present in the oil is

0 | 6
1 | 2234455567777889
2 | 000000222445567799
3 | 022444566
4 | 1167
5 | 12

2.24 The corresponding measurements are
225 238 290 319 344 371 382 397 405 416 433 480 504 568 613

2.25 The double-stem display of the amount of iron present in the oil is

0 | 6
1 | 22344
1 | 55567777889
2 | 00000022244
2 | 5567799
3 | 022444
3 | 566
4 | 11
4 | 67
5 | 12

2.26 The corresponding measurements are
18 20 20 20 20 21 22 22 23 23 23 23 23 24 24 24 25 25 25 26 26 27 29 30 31 31 34

2.27 The five-stem display of the Consumer Price Index in 2001 for the given cities is

15 | 5
15 |
15 | 9
16 |
16 |
16 |
16 | 7
16 | 8
17 | 0
17 | 22333
17 | 4
17 | 677
17 | 88
18 | 11
18 | 2
18 |
18 | 67
18 |
19 | 011

2.28 (a) The median is 4. The sample mean is $\bar{x} = (3 + 7 + 4 + 11 + 5)/5 = 30/5 = 6$.
(b) The median is 3. The sample mean is $\bar{x} = (3 + 1 + 7 + 3 + 1)/5 = 15/5 = 3$.

2.29 (a) The sample mean is $\bar{x} = (4 + 8 + 1 + 2 + 0)/5 = 15/5 = 3$. The ordered measurements are 0, 1, 2, 4, 8, so the median is 2.
(b) The mean is $\bar{x} = (26 + 30 + 38 + 32 + 26 + 31)/6 = 183/6 = 30.5$. The ordered measurements are: 26, 26, 30, 31, 32, 38. The median is $(30 + 31)/2 = 30.5$.
(c) The ordered measurements are: $-2, -2, 0, 2, 4, 4, 8$. The sample mean is $\bar{x} = (-2 - 2 + 0 + 2 + 4 + 4 + 8)/7 = 14/7 = 2$. The median is 2.

2.30 The sample mean is $\bar{x} = (6.2 + 6.9 + 5.5 + 5.3 + 5.6 + 5.4 + 6.6 + 6.5)/8 = 48.0/8 = 6.0$.
The ordered measurements are: 5.3, 5.4, 5.5, 5.6, 6.2, 6.5, 6.6, 6.9. The median is $(5.6 + 6.2)/2 = 11.8/2 = 5.9$.

2.31 (a) $\bar{x} = 3810/15 = 254$.
(b) The ordered observations are:
10 20 50 60 80 90 90 110 140 180 260 340 380 400 1600
So, the median is 110 CFU units. The one very large observation makes the sample mean much larger. Hence, the sample median is better to use in this instance.
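The contrast between mean and median in Exercise 2.31 can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics` module; the variable names are illustrative and are not part of the original solution.

```python
from statistics import mean, median

# Ordered CFU counts as listed in Exercise 2.31(b)
cfu = [10, 20, 50, 60, 80, 90, 90, 110, 140, 180, 260, 340, 380, 400, 1600]

cfu_mean = mean(cfu)      # 3810 / 15
cfu_median = median(cfu)  # 8th of the 15 ordered values

# Dropping the single largest value shows how strongly it pulls the mean,
# while the median moves much less.
trimmed = cfu[:-1]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)

print(cfu_mean, cfu_median)
print(round(trimmed_mean, 1), trimmed_median)
```

Removing the largest observation lowers the mean by almost 100 units but shifts the median by only 10, which is the reason the solution prefers the median here.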
2.32 (a) The ordered monthly incomes are: 2300 2350 2400 2450 2575 2650 4700.
$\bar{x} = 19425/7 = 2775$, median $= 2450$.
(b) For a typical salary, the median is better. Only one person earns more than the mean.

2.33 The mean is $956/12 = 79.67$. The claim ignores variability and is not true. A daily maximum temperature of 105°F in July is certainly unpleasant.

2.34 The sample mean is $\bar{x} = (65 + 72 + 67 + 73 + 70 + 67 + 84)/7 = 498/7 = 71.14$ cases.
The ordered sales times are: 65, 67, 67, 70, 72, 73, 84. The median is 70.

2.35 (a) $\bar{x} = 212/25 = 8.48$.
(b) The sample median is 8. Since the sample mean and median are about the same, either of them can be used as an indication of radiation leakage.

2.36 The mean, 10.30, is one measure of central tendency and the median, 10.00, is another. These values may be interpreted as follows. On average, there were 10.3 reports of aggravated assault at the 27 universities. Thirteen of the universities had at least 10 such reports while 13 recorded at most 10 such reports. At least one school logged exactly 10 reports.

2.37 The mean, 118.05, is one measure of central tendency and the median, 117.00, is another. The value 118.05 tells us that, on average, a baby weighed 118.05 ounces. The median tells us that about half of the babies weighed at least 117 ounces while roughly half weighed at most 117 ounces.

2.38 (a) $\bar{x} = \dfrac{0(7) + 1(10) + 2(13) + 3(5) + 4(2) + 5(1) + 6(1) + 7(1)}{40} = 1.925$ (activities).
(b) Sample median is 2 activities.
(c) The large observations of 5, 6, and 7 activities did not drastically affect the computation of the mean in this instance.

2.39 (a) $\bar{x} = \dfrac{1(7) + 2(9) + 3(6) + 4(5) + 5(3)}{30} = 2.6$ (returns).
(b) Sample median is 2 returns.

2.40 (a) Sample median $= (235 + 242)/2 = 238.5$ (seconds).
(b) $\bar{x} = 1240/6 = 206.7$ (seconds).

2.41 (a) $\bar{x} = 271/40 = 6.775$ days.
(b) Sample median $= (6 + 7)/2 = 6.5$. Both the sample mean and the sample median give a good indication of the amount of mineral lost.
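The weighted sums used above for the activities and returns data amount to computing a mean and a median straight from a frequency table. The sketch below illustrates that calculation; the helper names `mean_from_counts` and `median_from_counts` are hypothetical, introduced only for this example.

```python
def mean_from_counts(table):
    """Mean of data summarized as {value: frequency}."""
    n = sum(table.values())
    return sum(value * freq for value, freq in table.items()) / n


def median_from_counts(table):
    """Median of data summarized as {value: frequency}."""
    # Expand the table back into the ordered list of observations.
    expanded = sorted(v for v, f in table.items() for _ in range(f))
    n = len(expanded)
    mid = n // 2
    if n % 2:
        return expanded[mid]
    return (expanded[mid - 1] + expanded[mid]) / 2


# Frequencies read from the weighted sums in the solutions above
activities = {0: 7, 1: 10, 2: 13, 3: 5, 4: 2, 5: 1, 6: 1, 7: 1}
returns = {1: 7, 2: 9, 3: 6, 4: 5, 5: 3}

print(mean_from_counts(activities), median_from_counts(activities))
print(mean_from_counts(returns), median_from_counts(returns))
```

Expanding the table is the simplest way to find the median for small samples; for large tables one would walk the cumulative frequencies instead.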
2.42 (a) Sample median for males $= (45.8 + 48.3)/2 = 47.05$.
(b) Sample median for females $= 30.3$.
(c) Sample median for the combined set of males and females $= 38.6$.

2.43 The ordered measurements are: 145, 158, 165, 176, 182, 183, 200, 205, 216, 232.
Sample median $= (182 + 183)/2 = 182.5$ (minutes).

2.44 In Exercise 2.43, the sample mean $= 1862/10 = 186.2$ (minutes). The total time for 10 games is $10\bar{x} = 1862$ minutes and this is meaningful. However, $10 \times$ median ignores the actual times of the long games and is therefore meaningless.

2.45 (a) The dot diagram for the diameters (in feet) of the Indian mounds in southern Wisconsin is
(b) $\bar{x} = 346/13 = 26.62$. Sample median $= 24$.
(c) $13/4 = 3.25$, so we count in 4 observations. $Q_1 = 22$ and $Q_3 = 30$.

2.46 $16/4 = 4$, an integer, so we average the 4th and 5th observations. $Q_1 = (15 + 20)/2 = 17.5$ and $Q_3 = (34 + 42)/2 = 38$. The median, or $Q_2$, is $(26 + 31)/2 = 28.5$ days.

2.47 (a) Median $= (152 + 154)/2 = 153$.
(b) $40/4 = 10$, so we need to count in 10 observations. The 11th smallest observation also satisfies the definition.
$Q_1 = (135 + 136)/2 = 135.5$, $Q_3 = (166 + 167)/2 = 166.5$.

2.48 $\bar{x} = 2283/25 = 91.32$ calls per shift.

2.49 The ordered data are
50 57 68 69 72 73 73 80 82 91 92 93 94 96 96 100 102 104 105 106 108 109 118 118 127
Since the number of observations is 25, the median or second quartile is the 13th ordered observation in the list. The first quartile is the 7th observation.
$Q_1 = 73$, $Q_2 = 94$, $Q_3 = 105$.

2.50 (a) The ordered data are
0.50 0.76 1.02 1.04 1.20 1.24 1.28 1.29 1.36 1.49 1.55 1.56 1.57 1.57 1.63 1.70 1.72 1.78 1.78 1.92 1.94 2.10 2.11 2.17 2.47 2.52 2.54 2.57 2.66 2.67
Since the number of observations is 30, the median or second quartile is the average of the 15th and 16th in the list. Sample median $= (1.63 + 1.70)/2 = 1.665$ meters per second. Because $30/4 = 7.5$, the first quartile is the 8th ordered observation.
$Q_1 = 1.29$, $Q_2 = 1.665$, $Q_3 = 2.11$.
(b) Since $0.9(30) = 27$, the 90th percentile is the average of the 27th and 28th observations in the ordered list. Sample 90th percentile $= (2.54 + 2.57)/2 = 2.555$.

2.51 (a) The ordered observations are
10 20 50 60 80 90 90 110 140 180 260 340 440 450 1700
Since the sample size is 15, the median is the 8th observation, 110. To obtain $Q_1$, we find $15/4 = 3.75$ so the first quartile is the 4th observation in the ordered list.
$Q_1 = 60$, $Q_3 = 340$.
(b) The 90th percentile requires us to count in at least $0.9(15) = 13.5$ or 14 observations. The 90th sample percentile $= 450$.

2.52 (a) The mean of the original data set is $\bar{x} = (4 + 8 + 8 + 7 + 9 + 6)/6 = 42/6 = 7$.
Adding $c = 4$ to the original data set we get: 8, 12, 12, 11, 13, 10. The mean of the new data set is
$(8 + 12 + 12 + 11 + 13 + 10)/6 = 66/6 = 11 = \bar{x} + c$,
which equals $\bar{x} + 4 = 7 + 4$.
Multiplying the original data set by $d = 2$ we get: 8, 16, 16, 14, 18, 12. The mean of the new data set is
$(8 + 16 + 16 + 14 + 18 + 12)/6 = 84/6 = 14 = d\,\bar{x}$,
which equals $d \cdot \bar{x} = 2(7)$.
(b) The median of the original data set is $(7 + 8)/2 = 7.5$.
When $c = 4$ is added to the original data set, the median of the new data set is
median of $(x + 4) = (11 + 12)/2 = 11.5$,
which equals (median $+\ c$) $= 7.5 + 4$.
When the original data set is multiplied by $d = 2$, the median of the new data set is
median of $2x = (14 + 16)/2 = 15$,
which equals ($d\ \times$ median) $= 2(7.5)$.

2.53 (a) The ordered data are 62, 70, 75, 75, 80. The median is 75°F and the mean is $\bar{x} = 362/5 = 72.4$°F.
(b) The mean of (°F $- 32$) is $\bar{x} - 32$ by property (i) of Exercise 2.52 with $c = -32$. By property (ii),
mean of $\tfrac{5}{9}$(°F $- 32$) $= \tfrac{5}{9}$(mean of (°F $- 32$)) $= \tfrac{5}{9}(\bar{x} - 32) = \tfrac{5}{9}(72.4 - 32) = 22.44$°C.
By similar properties for the median,
median of $\tfrac{5}{9}$(°F $- 32$) $= \tfrac{5}{9}$(median of (°F) $- 32$) $= \tfrac{5}{9}(75 - 32) = 23.89$°C.

2.54 (a) Company A.
The average is highest and a superior machinist would earn above the median.
(b) Company B. A medium quality machinist would be paid near the median. Company B has the higher median.

2.55 (a) [Figure: dot diagrams of testosterone levels for males in Lake Apopka and Lake Woodruff]
(b) Lake Apopka: $\bar{x} = 67/5 = 13.40$. Lake Woodruff: $\bar{x} = 454/9 = 50.44$.
(c) From the dot diagrams, the males in Lake Apopka have lower levels of testosterone and their sample mean is only about one-third of that for males in (uncontaminated) Lake Woodruff. This finding is consistent with the environmentalists' concern that the contamination has affected the testosterone levels and the reproductive abilities.

2.56 (a) [Figure: dot diagrams of the amount of testosterone for males and females]
(b) Males: $\bar{x} = 67/5 = 13.40$. Females: $\bar{x} = 144/11 = 13.09$.
(c) The dot diagrams of the amount of testosterone seem to be quite similar for males and females, although there is a gap in the male diagram. The two means are nearly the same, which suggests that the insecticide contamination has pushed hormone concentrations far out of balance because, ordinarily, males should have higher testosterone concentrations.

2.57 (a) We carry out all necessary calculations in the following table. The mean is $\bar{x} = 12/3 = 4$.

         x     x - x̄    (x - x̄)^2
         7      3        9
         2     -2        4
         3     -1        1
Total   12      0.0     14

(b) The variance and the standard deviation are $s^2 = \dfrac{14}{3 - 1} = 7$, $s = \sqrt{7} = 2.646$.

2.58 (a) We carry out all necessary calculations in the following table. The mean is $\bar{x} = 15/3 = 5$.

         x     x - x̄    (x - x̄)^2
         1     -4       16
        10      5       25
         4     -1        1
Total   15      0.0     42

(b) The variance and the standard deviation are $s^2 = \dfrac{42}{3 - 1} = 21$, $s = \sqrt{21} = 4.583$.

2.59 (a) We carry out all necessary calculations in the following table. The mean is $\bar{x} = 24/4 = 6$.

         x     x - x̄    (x - x̄)^2
         6      0        0
         4     -2        4
        12      6       36
         2     -4       16
Total   24      0.0     56

(b) The variance and the standard deviation are $s^2 = \dfrac{56}{4 - 1} = 18.667$, $s = \sqrt{18.667} = 4.320$.

2.60 (a) We carry out all necessary calculations in the following table. The mean is $\bar{x} = 11.5/5 = 2.3$.

         x      x - x̄    (x - x̄)^2
         2.6    0.3      0.09
         1.5   -0.8      0.64
         3.5    1.2      1.44
         2.4    0.1      0.01
         1.5   -0.8      0.64
Total   11.5    0.0      2.82

(b) The variance and the standard deviation are $s^2 = \dfrac{2.82}{5 - 1} = 0.705$, $s = \sqrt{0.705} = 0.840$.

2.61 We carry out all necessary calculations in the following table.

         x     x^2
         8     64
         3      9
         4     16
Total   15     89

The variance is
$s^2 = \dfrac{1}{n - 1}\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right] = \dfrac{1}{2}\left[89 - \dfrac{15^2}{3}\right] = \dfrac{1}{2}(89 - 75) = 7$.

2.62 We carry out all necessary calculations in the following table.

         x     x^2
         6     36
         4     16
        12    144
         2      4
Total   24    200

The variance is
$s^2 = \dfrac{1}{n - 1}\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right] = \dfrac{1}{3}\left[200 - \dfrac{24^2}{4}\right] = \dfrac{1}{3}(200 - 144) = \dfrac{56}{3} = 18.667$.

2.63 (a) $s^2 = (38 - 12^2/5)/4 = 2.30$.
(b) $s^2 = (19 - (-7)^2/6)/5 = 2.167$.
(c) $s^2 = (499 - 59^2/7)/6 = 0.286$.

2.64 (a) Many factors could explain the difference in apartment rents. One possible factor is simply that different landlords may charge different rents. Other factors are the size of the apartment, the proximity of the apartment to key locations such as parks or public transportation, and whether utilities such as water and electricity are included.
(b) $s^2 = (3{,}836{,}675 - 5155^2/7)/6 = 6730.95$.
(c) $s = \sqrt{6730.95} = 82.04$.

2.65 $s = \sqrt{(9726 - 346^2/13)/12} = 6.5643$.

2.66 (a) $s^2 = (142 - 26^2/5)/4 = 1.70$.
(b) $s = \sqrt{1.70} = 1.304$.

2.67 (a) $s^2 = (3{,}140{,}900 - 3810^2/15)/14 = 155{,}225.7$.
(b) $s = \sqrt{155{,}225.7} = 393.99$.
(c) $s^2 = (580{,}900 - 2210^2/14)/13 = 17{,}848.9$, so $s = \sqrt{17{,}848.9} = 133.6$. The single very large value greatly inflates the standard deviation.

2.68 (a) $s^2 = (2410 - 212^2/25)/24 = 25.51$.
(b) $s = \sqrt{25.51} = 5.05$.

2.69 (a) $\bar{x} = 1862/10 = 186.2$.
(b) $s^2 = (353{,}308 - 1862^2/10)/9 = 733.73$.
(c) $s = \sqrt{733.73} = 27.09$.

2.70 (a) $\bar{x} = 62/50 = 1.24$ bags.
(b) $s^2 = (112 - 62^2/50)/49 = 0.71673$, so that $s = 0.847$.

2.71 (a) Median $= 67.4$.
(b) $\bar{x} = 478.4/7 = 68.343$.
(c) $s^2 = (32{,}773.34 - 478.4^2/7)/6 = 13.020$. Hence $s = 3.608$.
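The solutions above compute the sample variance in two ways: from squared deviations about the mean (Exercises 2.57 to 2.60) and from the shortcut that uses the sum and the sum of squares (Exercises 2.61 to 2.71). The sketch below, which assumes integer data and uses `fractions.Fraction` so the comparison is exact, checks that the two routes agree; the function names are illustrative only.

```python
from fractions import Fraction


def variance_definition(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - Fraction(total ** 2, n)) / (n - 1)


data_261 = [8, 3, 4]      # Exercise 2.61: sum 15, sum of squares 89
data_262 = [6, 4, 12, 2]  # Exercises 2.59 and 2.62: sum 24, sum of squares 200

for data in (data_261, data_262):
    print(variance_definition(data), variance_shortcut(data))
```

With floating-point data the shortcut can lose precision when the mean is large relative to the spread, which is one reason the definitional form is sometimes preferred in software.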
2.72 (a) The measure of variation displayed is 7.61, the sample standard deviation. The sample variance is $s^2 = 7.61^2 = 57.9121$.
(b) The interquartile range is $Q_3 - Q_1 = 14.00 - 5.00 = 9.00$. This means the center half of the data span an interval of length 9.
(c) Any value greater than 7.61 would correspond to greater variation.

2.73 (a) The measure of variation displayed is 15.47, the sample standard deviation. The sample variance is $s^2 = 15.47^2 = 239.321$.
(b) The interquartile range is $Q_3 - Q_1 = 131.00 - 106.00 = 25.00$. This means the center half of the data span an interval of length 25 ounces.
(c) Any value smaller than 15.47 would correspond to smaller variation.

2.74 (i) For the observations 5, 9, 9, 8, 10, 7, $\bar{x} = 8$, $s^2 = 3.2$ and $s = 1.789$. Adding $c = 4$ to the observations $x$, we have 9, 13, 13, 12, 14, 11. The sample mean and variance of the new data set are
mean of $(x + 4) = (9 + 13 + 13 + 12 + 14 + 11)/6 = 12$,
variance of $(x + 4) = \dfrac{9 + 1 + 1 + 0 + 4 + 1}{6 - 1} = \dfrac{16}{5} = 3.2$.
So the standard deviation of the new data set $x + 4$ is $\sqrt{3.2} = 1.789$, which is the same as the standard deviation of $x$.
(ii) Multiply the observations $x$ by $d = 2$. We get 10, 18, 18, 16, 20, 14. The sample mean and variance of the new data set are
mean of $2x = (10 + 18 + 18 + 16 + 20 + 14)/6 = 16$,
variance of $2x = \dfrac{36 + 4 + 4 + 0 + 16 + 4}{6 - 1} = \dfrac{4(9 + 1 + 1 + 0 + 4 + 1)}{5} = 4(3.2) = 12.8$.
So the standard deviation of the new data set $2x$ is $2\sqrt{3.2} = 3.578$, or $d$ times the standard deviation of $x$.

2.75 Using the data set in Exercise 2.22, in Exercise 2.47, we determined that $Q_1 = 135.5$ and $Q_3 = 166.5$. Hence,
Interquartile range $= Q_3 - Q_1 = 166.5 - 135.5 = 31.0$ points.

2.76 From the data set of Exercise 2.33, in Exercise 2.46 we determined that $Q_1 = 17.5$ and $Q_3 = 38$. Hence,
Interquartile range $= Q_3 - Q_1 = 38 - 17.5 = 20.5$ days.

2.77 No.
Typically, the middle half of a data set is much more concentrated than the sum of the two quarters, one in each tail. As an example, for the water quality data of Exercise 2.17, the range is $1600 - 10 = 1590$ because of one extremely large observation. From the quartiles determined in Exercise 2.51, the interquartile range is $340 - 60 = 280$. The range is nearly six times as large as the interquartile range.

2.78 (a) $\bar{x} = 150.125$ and $2s = 49.354$ so $\bar{x} \pm 2s$ is the interval (100.771, 199.479). This interval contains 38 observations or proportion 0.95 of the observations. And $\bar{x} \pm 3s$ is the interval (76.094, 224.156), which contains proportion 1 of the observations.
(b) The empirical guidelines suggest proportion 0.95 in the interval $\bar{x} \pm 2s$ and we observed 0.95. They suggest proportion 0.997 for the interval $\bar{x} \pm 3s$ and we observed 1.000. The agreement is excellent.

2.79 (a) $\bar{x} = 6.775$ and $s = \sqrt{19.4096} = 4.406$.
(b) The proportions of the observations are given in the following table:

                 Interval             Proportion       Guidelines
x̄ ± s           (2.369, 11.181)      26/40 = 0.65     0.68
x̄ ± 2s          (-2.037, 15.587)     38/40 = 0.95     0.95
x̄ ± 3s          (-6.443, 19.993)     40/40 = 1.00     0.997

(c) We observe a good agreement with the proportions suggested by the empirical guideline.

2.80 (a) $\bar{x} = 51.71/30 = 1.724$ and $s = \sqrt{(98.641 - 51.71^2/30)/29} = 0.5727$.
(b) The proportions of the observations are given in the following table:

                 Interval             Proportion       Guidelines
x̄ ± s           (1.151, 2.296)       20/30 = 0.667    0.68
x̄ ± 2s          (0.579, 2.869)       29/30 = 0.967    0.95
x̄ ± 3s          (0.006, 3.442)       30/30 = 1.00     0.997

(c) We observe a good agreement with the proportions suggested by the empirical guideline.

2.81 (a) $\bar{x} = 25.160$ and $s = \sqrt{114.790} = 10.714$.
(b) The proportions of the observations are given in the following table:

                 Interval             Proportion       Guidelines
x̄ ± s           (14.446, 35.874)     36/50 = 0.72     0.68
x̄ ± 2s          (3.732, 46.588)      47/50 = 0.94     0.95
x̄ ± 3s          (-6.982, 57.302)     50/50 = 1.00     0.997

(c) We observe a good agreement with the proportions suggested by the empirical guideline.

2.82 (a) The z-values of 350 and 620 are
$z = \dfrac{350 - 490}{120} = -1.167$, $z = \dfrac{620 - 490}{120} = 1.083$.
(b) For the z-score of 2.4, the raw score is obtained by solving the equation
$z = 2.4 = \dfrac{x - 210}{50}$, so $x = 210 + 50(2.4) = 330$.

2.83 (a) $z = \dfrac{102 - 118.05}{15.47} = -1.037$.
(b) $z = \dfrac{144 - 118.05}{15.47} = 1.677$.

2.84 (a), (b) The boxplots for salaries in City A and City B are shown below.
(c) There is a greater difference between the cities with respect to the higher salaries. For instance, any salary above the median in City B is greater than the 75th percentile in City A.

2.85 For males, the minimum and the maximum horizontal velocity of a thrown ball are 25.2 and 59.9 respectively. The quartiles are:
$Q_1 = (38.6 + 39.1)/2 = 38.85$, median $= (45.8 + 48.3)/2 = 47.05$, $Q_3 = (49.9 + 51.7)/2 = 50.8$.
For females, the minimum and the maximum horizontal velocity of a thrown ball are 19.4 and 53.7 respectively. The quartiles are
$Q_1 = 25.7$, median $= 30.3$, $Q_3 = 33.5$.
The boxplots of the male and female throwing speeds are
Comparing the two boxplots, we can see that males throw the ball faster than females.
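The quartiles in these solutions follow a counting rule: compute n times p; if it is not an integer, count in to the next whole position, and if it is an integer, average that observation with the next one. The sketch below encodes that rule as read from the worked solutions (for example Exercises 2.45, 2.46 and 2.49); the function name `sample_percentile` and its integer-percentage argument are assumptions made for this illustration.

```python
def sample_percentile(data, pct):
    """Sample percentile by the counting rule used in these solutions.

    pct is an integer percentage strictly between 0 and 100
    (25 for Q1, 50 for the median, 75 for Q3).
    """
    if not 0 < pct < 100:
        raise ValueError("pct must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    # Integer arithmetic avoids floating-point surprises in n * p.
    whole, remainder = divmod(n * pct, 100)
    if remainder:
        # n*p is not an integer: use the next position (1-based whole + 1).
        return ordered[whole]
    # n*p is an integer: average that observation and the following one.
    return (ordered[whole - 1] + ordered[whole]) / 2


# Ordered calls-per-shift data from Exercise 2.49 (n = 25)
calls = [50, 57, 68, 69, 72, 73, 73, 80, 82, 91, 92, 93, 94, 96, 96,
         100, 102, 104, 105, 106, 108, 109, 118, 118, 127]

q1 = sample_percentile(calls, 25)
q2 = sample_percentile(calls, 50)
q3 = sample_percentile(calls, 75)
print(q1, q2, q3)
```

Statistical packages often interpolate between positions instead, which is why the MINITAB quartiles quoted elsewhere in this chapter can differ slightly from the hand-computed ones.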
2.86 (a) The differences, arranged in order, between the 2007 and 1992 Consumer Price Index are
13 13 14 18 20 20 21 21 21 22 23 24 25 25 26 26 27 33 33 34 35 36 37 41
$Q_1 = (20 + 21)/2 = 20.5$, $Q_2 = (24 + 25)/2 = 24.5$, $Q_3 = (33 + 33)/2 = 33$.
The five-number summary is: 13, 20.5, 24.5, 33, 41.
Alternatively, from Minitab:

Descriptive Statistics:
Variable   N    N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3      Maximum
C1         24   0    25.33   1.59      7.81    13.00     20.25   24.50    33.00   41.00

(b) [Figure: boxplot of increases in Consumer Price Indices; vertical axis from 10 to 40]

2.87 (a) $\bar{x} = 608/24 = 25.33$ and $s = \sqrt{\dfrac{16806 - (608)^2/24}{24 - 1}} = 7.811$.
(b) Since $\bar{x} - 2s = 25.33 - 2(7.811) = 9.708$ and $\bar{x} + 2s = 25.33 + 2(7.811) = 40.952$, only the increase of 41 for Honolulu lies outside the interval. The proportion $23/24 = 0.958$ lies within the interval.

2.88 (a) Using the ordered data set from Example 5, we have
min $= 4.5$, max $= 10.0$, $Q_1 = (6.0 + 6.3)/2 = 6.15$, $Q_2 = 7.3$, $Q_3 = (8.0 + 8.3)/2 = 8.15$.
(b) The box plot depicting this data set is as follows:
[Figure: box plot; vertical axis: Hours of Sleep, 4 to 10]

2.89 From Exercise 2.38, we know that $\bar{x} = 1.925$ and $s = \sqrt{\dfrac{249 - (77)^2/40}{40 - 1}} = 1.607$.

2.90 (a) $\bar{x} = 296/14 = 21.14$.
(b) $s = \sqrt{\dfrac{12078 - 296^2/14}{14 - 1}} = 21.16$.
(c) The dot plot is given below.
(d) All are losses except for two gains in the 1998 and 2002 elections.

2.91 (a) The ordered data are
$-8$ $-5$ 4 5 8 11 12 16 26 30 43 47 52 55
Median $= (12 + 16)/2 = 14$ seats lost.
(b) The maximum number of seats lost, 55, occurred when Harry S. Truman was President. The minimum number, $-8$ or a gain, occurred during G.W. Bush's term as President.
(c) range $= 55 - (-8) = 63$.

2.92 The process appears to be in statistical control. The pattern is nearly a horizontal band with one possible low value.

2.93 The value 215 from the second pay period looks high and 194 from the fifth period is possibly high.
2.94 We calculate $\bar{x} = 2283/25 = 91.32$ and $s = \sqrt{9281.44/24} = 19.67$ so the upper limit is $\bar{x} + 2s = 130.66$ and the lower limit is $\bar{x} - 2s = 51.98$. Only the value 50 calls for worker 20 is out of control.

2.95 We calculate $\bar{x} = 2501/26 = 96.2$ and $s = \sqrt{65254/25} = 51.1$ so the upper limit is $\bar{x} + 2s = 198.4$ and the lower limit is $\bar{x} - 2s = -6.0$, which we take as 0. Only the value 215 from the second pay period is out of control.

2.96 [Figure: time series plot of exchange rate; vertical axis: Exchange Rate, 1.0 to 2.4; horizontal axis: years 1992 to 2006]
The process appears to be in statistical control between 1994 and 2003, but then begins to taper off from 2003 to 2007.

2.97 [Figure: Xbar chart of exchange rate; UCL = 1.902, center line = 1.416, LCL = 0.930]
The process appears to be in statistical control. Only the value of 2.29 in 1993 is out of control.

2.98 We re-calculate without the outlier 5326.
$\bar{x} = 18329/15 = 1221.9$ and $s = \sqrt{5195138/14} = 609.16$
so the upper limit is $\bar{x} + 2s = 2440.2$ and the lower limit is $\bar{x} - 2s = 3.6$. All of the points are within the control limits.

2.99 (a) The relative frequencies of the occupation groups are:

                      Relative Frequency
                      2007      2000
Goods Producing       0.139     0.161
Service (Private)     0.722     0.702
Government            0.139     0.136
Total                 1.000     1.000

(b) The proportions of persons in private service occupations and in government have increased, while the proportion in goods producing has decreased, from 2000 to 2007.
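The control limits in Exercise 2.94 can be reproduced from the calls-per-shift data, which Exercise 2.49 lists in sorted order. The sketch below assumes Python's standard `statistics` module and uses that sorted list, so the original worker order is not recovered; limits computed from the unrounded standard deviation differ from the printed ones in the second decimal place.

```python
from statistics import mean, stdev

# Calls per shift, in the sorted order given in Exercise 2.49 (n = 25)
calls = [50, 57, 68, 69, 72, 73, 73, 80, 82, 91, 92, 93, 94, 96, 96,
         100, 102, 104, 105, 106, 108, 109, 118, 118, 127]

center = mean(calls)   # 2283 / 25
s = stdev(calls)       # sample standard deviation (divisor n - 1)

lower = center - 2 * s
upper = center + 2 * s

# Observations falling outside the two-standard-deviation band
out_of_control = [x for x in calls if x < lower or x > upper]

print(round(center, 2), round(s, 2))
print(round(lower, 2), round(upper, 2), out_of_control)
```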
2.100 (a) The frequency table of "intended major" of the students is:

Intended major        Frequency   Relative Frequency
Biological Science    18          0.367
Humanities             4          0.082
Physical Sciences      9          0.184
Social Science        18          0.367
Total                 49          1.000

(b) The frequency table of "year in college" of the students is:

Year     Frequency   Relative Frequency
1         4          0.082
2        10          0.204
3        20          0.408
4        15          0.306
Total    49          1.000

2.101 The dot diagrams of heights for the male and female students are

2.102 The frequency table of the causes for power outage is:

Frequency Table for Causes of Outage
Cause              Frequency
Trees and limbs    12
Animals             9
Lightning           3
Wind storm          1
Fuse                1
Unknown             4

The Pareto chart for the cause of outage is

2.103 (a) Yes. The exact number of lunches is the sum of the frequencies of the first four classes.
(b) Yes. The exact number of lunches is the sum of the frequencies of the last two classes.
(c) No.

2.104 The sample mean and sample standard deviation are:
$\bar{x} = (137 + 139 + 96 + 137 + 115)/5 = 124.8$ mm,
$s^2 = \dfrac{79{,}300 - 624^2/5}{4} = 356.2$, so that $s = \sqrt{356.2} = 18.9$ mm.

2.105 (a) The mean, 227.4, is one measure of central tendency and the median, 232.5, is another. These values may be interpreted as follows. On average, the 20 grizzly bears weigh 227.4 pounds apiece. Half of the grizzly bears sampled weighed at least 232.5 pounds while half weighed at most 232.5 pounds.
(b) The sample standard deviation is 82.7 pounds.
(c) The z score for a weight of 320 pounds is $z = \dfrac{320 - 227.4}{82.7} = 1.12$.

2.106 (a) Median $= (64 + 67)/2 = 65.5$.
(b) We count in $38/4 = 9.5$ or 10 observations to find $Q_1 = 50$ and $Q_3 = 79$.
(c) The proportion of students who scored below 40 is $5/38 = 0.132$. The proportion of students who scored 90 or over is $4/38 = 0.105$.

2.107 (a) Sample median $= (9 + 9)/2 = 9$.
(b) $\bar{x} = 271/30 = 9.033$.
(c) The sample variance is $s^2 = \dfrac{1}{29}\left[2587 - \dfrac{271^2}{30}\right] = 4.792$.
2.108 (a) The double stem display is

4 | 23
4 | 6677899
5 | 0011112244444
5 | 555566677778
6 | 0111244
6 | 589

(b) Median $= (54 + 55)/2 = 54.5$, $Q_1 = 50$, $Q_3 = 57$.

2.109 (a) $\bar{x} = 7$, $s = 2$.
(b) By the properties, the new data set $x + 100$ has sample mean $= (7 + 100) = 107$ and standard deviation 2. By direct calculation, we verify
$(106 + 108 + 104 + 109 + 108)/5 = 107$,
$s^2 = \dfrac{(106 - 107)^2 + (108 - 107)^2 + (104 - 107)^2 + (109 - 107)^2 + (108 - 107)^2}{4} = 4$, so $s = 2$.
(c) By the properties, the new data set $3x$ has sample mean $= 3(7) = 21$ and standard deviation $3s = 3(2) = 6$. By direct calculation, we verify
$(18 + 24 + 12 + 27 + 24)/5 = 21$,
$\dfrac{(-3)^2 + (3)^2 + (-9)^2 + (6)^2 + (3)^2}{4} = 9\left[\dfrac{1 + 1 + 9 + 4 + 1}{4}\right] = 9s^2$.

2.110 (a) For the heights of males, $\bar{x} = 69.61$, $s = 2.97$.
(b) For the heights of females, $\bar{x} = 66.14$, $s = 2.60$.
(c) For the heights of males, median $= 70$, $Q_1 = 68$, $Q_3 = 72$.
(d) For the heights of females, median $= 66.5$, $Q_1 = 65$, $Q_3 = 67$.

2.111 (a) The dot diagrams are
(b) From the dot diagrams we can see the number of flies (grape juice) is centered at about 11 and the number of flies (regular food) is centered near 25. The spread looks about the same.
(c) Regular food: $\bar{x} = 25.1$, $s = 6.84$. Grape juice: $\bar{x} = 10.6$, $s = 6.07$.

2.112 (a) The dot diagram of the usage times per ounce of toothpaste is
(b) The relative frequency of usage times that do not exceed 0.80 is
$\dfrac{\text{number of usage times less than or equal to } 0.80}{24} = \dfrac{12}{24} = 0.50$.
(c) $\bar{x} = 0.794$ and $s = 0.115$.
(d) Median $= 0.805$, $Q_1 = 0.755$ and $Q_3 = 0.86$.

2.113 (a) $\bar{x} = 5.38$ and $s = 3.42$.
(b) Median $= 5$.
(c) Range $= 13 - 0 = 13$.

2.114 (a) 0.75 since 106.0 is the third quartile.
(b) 0.50 since 94.0 is the median.
(c) 0.95 in the interval $\bar{x} \pm 2s$ or 69.5 to 111.5 if the frequency distribution is nearly bell-shaped.
(d) 0.50 since $Q_1 = 85.5$ and $Q_3 = 106.0$.
(e) 0.997 in the interval $\bar{x} \pm 3s$ or 59.0 to 122.0 if the frequency distribution is nearly bell-shaped.
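The shift and scale properties verified by hand in Exercise 2.109 can also be confirmed numerically. In the sketch below the original data set is inferred by subtracting 100 from the values shown in part (b); that inference, and the variable names, are assumptions of this illustration.

```python
from math import isclose
from statistics import mean, stdev

# Original observations, inferred from the x + 100 values in Exercise 2.109(b)
x = [6, 8, 4, 9, 8]

shifted = [v + 100 for v in x]  # adding a constant c = 100
scaled = [3 * v for v in x]     # multiplying by d = 3

# Adding c moves the mean by c and leaves the standard deviation unchanged.
print(mean(shifted), stdev(shifted))

# Multiplying by d multiplies both the mean and the standard deviation by d.
print(mean(scaled), stdev(scaled))
```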
2.115 (a) Median $= 4.505$, $Q_1 = 4.30$ and $Q_3 = 4.70$.
(b) 90th percentile $= (4.80 + 5.07)/2 = 4.935$.
(c) $\bar{x} = 4.5074$ and $s = 0.368$.
(d) The boxplot of acid rain in Wisconsin is

2.116 (a) and (b). We have $\bar{x} = 4.507$ and $s = 0.368$ so

                 Interval            Proportion        Guidelines
x̄ ± s           (4.139, 4.875)      38/50 = 0.76      0.68
x̄ ± 2s          (3.771, 5.243)      46/50 = 0.92      0.95
x̄ ± 3s          (3.403, 5.611)      50/50 = 1.000     0.997

(c) The observed proportions are somewhat close to those suggested by the empirical guidelines. However, the proportion between one and two standard deviations, $(46 - 38)/50 = 0.16$, is noticeably smaller than the expected $0.95 - 0.68 = 0.27$.

2.117 (a) Median $= 6.3$, $Q_1 = 5.9$ and $Q_3 = 6.9$.
(b) $\bar{x} = 314.8/49 = 6.424$ and $s = \sqrt{\left(2047.4 - (314.8)^2/49\right)/48} = 0.721$.
(c)

Class Interval (%)   Frequency   Relative Frequency
(5.2, 5.6]            5          0.1020
(5.6, 6.0]           11          0.2245
(6.0, 6.4]           13          0.2653
(6.4, 6.8]            7          0.1429
(6.8, 7.2]            8          0.1633
(7.2, 8.4]            5          0.1020
Total                49          1.0000

We use the convention that the right endpoint is included in the class interval.
(d) The boxplot of the data is

2.118 (a)

Class Interval      Frequency   Relative Frequency
(-4500, -1600]       1          0.026
(-1600, -850]        1          0.026
(-850, -250]         2          0.051
(-250, 0]            8          0.205
(0, 250]            13          0.333
(250, 850]           6          0.154
(850, 1650]          5          0.128
(1650, 2450]         3          0.077
Total               39          1.0000

(b) The frequency and relative frequency histograms are:
[Figure: frequency histogram of Yearly Changes in Dow Jones Averages; class boundaries from -1600 to 2450; vertical axis: Frequency, 0 to 14]
[Figure: relative frequency histogram of Yearly Changes in Dow Jones Averages; horizontal axis: Changes in DJ Average, classes 1 to 8; vertical axis: Relative Frequency, 0 to 0.35]
(Here, the classes were renumbered 1 to 8, consecutively, for simplicity.)
(c) Since the value 0 is included in the interval $(-250, 0]$, it is unclear how many of the 8 observations contained in that interval are negative. It is, however, unlikely that the Dow Jones average at the end of one year was exactly the same as that in the previous year.
It seems safe to assume that all 8 differences in the interval $(-250, 0]$ are negative. The proportion of changes that are negative is then $(1 + 1 + 2 + 8)/39 = 0.308$.
(d) The distribution is roughly bell-shaped, centered slightly above 0.

2.119 (a) [Figure: dot diagram of Winning Times in Minutes and Seconds; horizontal axis from 3.4 to 5.4]
(b) It is not reasonable, because a frequency distribution would not show the systematic decrease of the winning times over the years, which is the main feature of these observations.

2.120 The mode is 1 bag, since the sample has twenty-four 1's.

2.121 We calculate
$\bar{x} = 4167/50 = 83.34$ and $s = \sqrt{\dfrac{419{,}411 - 4167^2/50}{49}} = 38.368$.

2.122 (a) The time plot for this data set is as follows:
[Figure: Time Plot for Deaths Due to Lightning; horizontal axis: Year, 1959 to 2004; vertical axis: Number of Deaths, 50 to 200]
(b) The number of deaths due to lightning is steadily declining over the indicated time period. The mean and standard deviation, while important, would not capture this trend. The 5-number summary would be an important supplement to report when describing this data set.

2.123 (a) The partial MINITAB output is
(b) The partial MINITAB output for the acid rain data. (Note that MINITAB uses a slightly different convention for determining $Q_1$ and $Q_3$.)

2.124 (a) The histogram and box plot reveal a longer right-hand tail. The five observations 4749, 4846, 4949, 5005, and 5157 seem to be detached and larger than expected for a bell-shaped pattern.
(b) Textbook scheme: $Q_1 = 3470$. MINITAB scheme: $Q_1 = 3457.5$. The scheme used by the text yields a slightly higher value.

2.125 The partial MINITAB output for the data set in Table 4.

2.126 The MINITAB commands and partial output of the final times to run 1.5 miles in Data Bank D.5.

2.127 The mean and standard deviation given by MINITAB are the rounded-off values of the answer given by SAS.
ORGANIZATION AND DESCRIPTION OF DATA 2.128 (a) Freshwater growth of male salmon: x  98.350, s  30.03 median  98.5, Q1  80 and Q3  118 . The frequency table for freshwater growth of male salmon is: Class Interval 40-60 60-80 80-100 100-120 120-140 140-160 Total Frequency 6 4 11 10 6 3 40 Relative Frequency 0.150 0.100 0.275 0.250 0.150 0.075 1.000 Because the cells have equal width, we take the option of using relative frequency for the vertical scale. The histogram is shown below. (b) Freshwater growth of female salmon: x  114.825, s  22.22 median  115.5, Q1  98.5 and Q3  132.5 . The frequency table for freshwater growth of female salmon is: Class Interval 60-80 80-100 100-120 120-140 140-160 160-180 Total Frequency 2 9 12 13 3 1 40 Relative Frequency 0.050 0.225 0.300 0.325 0.075 0.025 1.000 Because the cells have equal width, we take the option of using relative frequency for the vertical scale. The histogram is shown below. 45 (c) The box plots for freshwater growth of male and female salmon are: 2.129 (a) The histogram of the alligator data is (b) x 4035 155672  109.1 s   65.8 37 37  1 2.130 (a) The ordered data are 16 18 19 19 24 29 32 33 46 58 68 72 78 82 82 83 99 101 109 110 114 118 125 134 140 141 142 143 163 170 184 194 200 220 220 221 228 Median  109, Q1  58 and Q3  143 . (b) We count in 34 positions to find the 90th percentile  220 . The only two observations that are higher were taken from females. 46 CHAPTER 2. ORGANIZATION AND DESCRIPTION OF DATA 2.131 (a) (b) The ordered observations are 75.3 75.7 75.9 75.9 76.2 76.3 76.4 76.4 76.6 76.6 76.7 76.9 76.9 77.0 77.0 77.1 77.4 77.4 77.4 77.4 77.4 77.5 77.6 77.6 77.8 77.9 77.9 77.9 77.9 77.9 78.0 78.1 78.3 78.4 78.4 78.5 79.1 79.2 80.0 80.4 There are 40 observations so the median  (77.4  77.4) / 2  77.4 . The first quartile is the average of the 40 / 4  10th and 11th observations in the sorted list. Q1  (76.6  76.7) / 2  76.65 and Q1  (77.9  78.0) / 2  77.95 . 
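The quartile counting used in 2.130 and 2.131(b) can be sketched in Python. This is a minimal sketch, assuming the percentile convention that the worked answers appear to follow: when n times p is a whole number k, average the k-th and (k+1)-th ordered values; otherwise round n times p up and take that ordered value. The function name `sample_percentile` and the variable names `data_2_130` and `data_2_131` are hypothetical labels chosen for this illustration.

```python
def sample_percentile(values, pct):
    """Sample percentile under the assumed textbook convention.

    pct is a whole-number percentage from 1 to 99.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids floating-point trouble when testing
    # whether n * p is a whole number.
    k, remainder = divmod(n * pct, 100)
    if remainder == 0:
        # n * p is the integer k: average the k-th and (k+1)-th values
        return (ordered[k - 1] + ordered[k]) / 2
    # n * p is not an integer: round up and take that ordered value
    return ordered[k]


# Ordered data copied from Exercise 2.130 (37 observations)
data_2_130 = [
    16, 18, 19, 19, 24, 29, 32, 33, 46, 58,
    68, 72, 78, 82, 82, 83, 99, 101, 109, 110,
    114, 118, 125, 134, 140, 141, 142, 143, 163, 170,
    184, 194, 200, 220, 220, 221, 228,
]

# Ordered data copied from Exercise 2.131 (40 observations)
data_2_131 = [
    75.3, 75.7, 75.9, 75.9, 76.2, 76.3, 76.4, 76.4, 76.6, 76.6,
    76.7, 76.9, 76.9, 77.0, 77.0, 77.1, 77.4, 77.4, 77.4, 77.4,
    77.4, 77.5, 77.6, 77.6, 77.8, 77.9, 77.9, 77.9, 77.9, 77.9,
    78.0, 78.1, 78.3, 78.4, 78.4, 78.5, 79.1, 79.2, 80.0, 80.4,
]

for label, data in (("2.130", data_2_130), ("2.131", data_2_131)):
    q1 = sample_percentile(data, 25)
    med = sample_percentile(data, 50)
    q3 = sample_percentile(data, 75)
    print(f"{label}: Q1 = {q1:.2f}, median = {med:.2f}, Q3 = {q3:.2f}")
```

With n = 37 the round-up branch applies (37 x 0.25 = 9.25, so the 10th value is used), while with n = 40 the averaging branch applies. Statistical packages such as MINITAB interpolate differently, which is why their quartiles can differ slightly from these.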
(c) The interval $\bar{x} \pm s$, or (76.36, 78.56), has relative frequency 30 / 40 = 0.75 compared to 0.683. The interval $\bar{x} \pm 2s$, or (75.26, 79.66), has relative frequency 38 / 40 = 0.95 compared to 0.95. The interval $\bar{x} \pm 3s$, or (74.16, 80.76), has relative frequency 1 compared to 0.997. The agreement is quite good.
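The comparison with the empirical guidelines in 2.131(c) can be checked with a short Python sketch. It assumes the 40 ordered observations listed in part (b) and uses the sample standard deviation with divisor n - 1. The names `data_2_131` and `proportion_within` are hypothetical, introduced only for this illustration.

```python
from statistics import mean, stdev

# Ordered data copied from Exercise 2.131(b) (40 observations)
data_2_131 = [
    75.3, 75.7, 75.9, 75.9, 76.2, 76.3, 76.4, 76.4, 76.6, 76.6,
    76.7, 76.9, 76.9, 77.0, 77.0, 77.1, 77.4, 77.4, 77.4, 77.4,
    77.4, 77.5, 77.6, 77.6, 77.8, 77.9, 77.9, 77.9, 77.9, 77.9,
    78.0, 78.1, 78.3, 78.4, 78.4, 78.5, 79.1, 79.2, 80.0, 80.4,
]


def proportion_within(values, k):
    """Fraction of values strictly inside (mean - k*s, mean + k*s)."""
    centre = mean(values)
    s = stdev(values)  # sample standard deviation, divisor n - 1
    low, high = centre - k * s, centre + k * s
    inside = sum(1 for v in values if low < v < high)
    return inside / len(values)


# Empirical guideline values as quoted in the solution text
guidelines = {1: 0.683, 2: 0.95, 3: 0.997}

for k, expected in guidelines.items():
    observed = proportion_within(data_2_131, k)
    print(f"within {k} s: observed {observed:.3f}, guideline {expected}")
```

The same function applies to the acid rain data of 2.116 once those 50 observations are supplied; they are not listed in this section, so they are left out here.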
