CHAPTER 1
INTRODUCTION AND DESCRIPTIVE STATISTICS

1-1.
1. quantitative/ratio
2. qualitative/nominal
3. quantitative/ratio
4. qualitative/nominal
5. quantitative/ratio
6. quantitative/interval
7. quantitative/ratio
8. quantitative/ratio
9. quantitative/ratio
10. quantitative/ratio
11. quantitative/ordinal

1-2. Data are based on numeric measurements of some variable, obtained either from a data set comprising an entire population of interest or from only a sample (subset) of the full population. Instead of making the measurements ourselves, we may sometimes obtain data from previously published results.

1-3. The weakest is the Nominal Scale, in which categories of data are grouped by qualitative differences and assigned numbers simply as labels, not usable in numeric comparisons. Next in strength is the Ordinal Scale: data are ordered (ranked) according to relative size or quality, but the numbers themselves do not imply specific numeric relationships. Stronger than this is the Interval Scale: the ordered data points have meaningful distances between any two of them, measured in units. Strongest is the Ratio Scale, which is like an Interval Scale except that the ratio of any two data values is also meaningful for comparing values.

1-4.
Fund: Qualitative
Style: Qualitative
US/Foreign: Qualitative
10 yr Return: Quantitative
Expense Ratio: Quantitative

1-5. Ordinal.

1-6. A qualitative variable describes different categories or qualities of the members of a data set, which have no numeric relationships to each other, even when the categories happen to be coded as numbers for convenience. A quantitative variable gives numerically meaningful information, in terms of ranking, differences, or ratios between individual values.

1-7. The people from one particular neighborhood constitute a non-random sample (drawn from the larger town population). The group of 100 people would be a random sample.

1-8.
A sample is a subset of the full population of interest, from which statistical inferences are drawn about the population, which is usually too large to permit the variables to be measured for all the members.

1-9. A random sample is a sample drawn from a population in a way that is not a priori biased with respect to the kinds of variables being measured. It attempts to give a representative cross-section of the population.

1-10. Nationality: qualitative. Length of intended stay: quantitative.

1-11. Ordinal. The colors are ranked, but no units of difference between any two of them are defined.

1-12.
Income: quantitative, ratio
Number of dependents: quantitative, ratio
Filing singly/jointly: qualitative, nominal
Itemized or not: qualitative, nominal
Local taxes: quantitative, ratio

1-13. Let us order our observations (n = 33):
109, 110, 114, 116, 118, 119, 120, 121, 121, 123, 123, 125, 125, 127, 128, 128, 128, 128, 129, 129, 130, 131, 132, 132, 133, 134, 134, 134, 134, 136, 136, 136, 136.
Lower quartile = 25th percentile = data point in position (n + 1)(25/100) = 34(25/100) = 8.5. Point is 121.
Middle quartile is in position 34(50/100) = 17. Point is 128.
Upper quartile is in position 34(75/100) = 25.5. Point is 133.5.
10th percentile is in position 34(10/100) = 3.4. Point is 114.8.
15th percentile is in position 34(15/100) = 5.1. Point is 118.1.
65th percentile is in position 34(65/100) = 22.1. Point is 131.1.
IQR = 133.5 - 121 = 12.5.

Template output (Percentile and Percentile Rank Calculations):
x      x-th Percentile
10     116.4
15     118.8
65     130.8

y        Percentile rank of y
116.4    10
118.8    15
130.8    65

Quartiles: 1st Quartile = 121, Median = 128, 3rd Quartile = 133, IQR = 12
(The template values differ slightly from the hand calculation; they are consistent with interpolating at position (n - 1)(P/100) + 1 rather than (n + 1)(P/100).)

1-14. First, order the data:
-1.2, 3.9, 8.3, 9, 9.5, 10, 11, 11.6, 12.5, 13, 14.8, 15.5, 16.2, 16.7, 18
The median, or 50th percentile, is the point in position 16(50/100) = 8. The point is 11.6.
First quartile is in position 16(25/100) = 4. Point is 9.
Third quartile is in position 16(75/100) = 12.
Point is 15.5.
55th percentile is in position 16(55/100) = 8.8. Point is 12.32.
85th percentile is in position 16(85/100) = 13.6. Point is 16.5.

1-15. Order the data:
38, 41, 44, 45, 45, 52, 54, 56, 60, 64, 69, 71, 76, 77, 78, 79, 80, 81, 87, 88, 90, 98
Median is in position 23(50/100) = 11.5. Point is 70.
20th percentile is in position 23(20/100) = 4.6. Point is 45.
30th percentile is in position 23(30/100) = 6.9. Point is 53.8.
60th percentile is in position 23(60/100) = 13.8. Point is 76.8.
90th percentile is in position 23(90/100) = 20.7. Point is 89.4.

Template output (Percentile and Percentile Rank Calculations):
x      x-th Percentile
20     46.4
30     54.6
60     76.6
y: 46.4, 54.6, 76.6
Quartiles: 1st Quartile = 52.5, Median = 70, 3rd Quartile = 79.75, IQR = 27.25

1-16. Order the data: 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7.
Lower quartile is the 25th percentile, in position 16(25/100) = 4. Point is 2.
The median is in position 16(50/100) = 8. The point is 3.
Upper quartile is in position 16(75/100) = 12. Point is 5.
IQR = 5 - 2 = 3.
60th percentile is in position 16(60/100) = 9.6. Point is 4.

Template output (Percentile and Percentile Rank Calculations):
x      x-th Percentile
60     4
y: 4.0
Quartiles: 1st Quartile = 2, Median = 3, 3rd Quartile = 5, IQR = 3

1-17. The data are already ordered; there are 16 data points.
The median is the point in position 17(50/100) = 8.5. It is 51.
Lower quartile is in position 17(25/100) = 4.25. It is 30.5.
Upper quartile is in position 17(75/100) = 12.75. It is 194.25.
IQR = 194.25 - 30.5 = 163.75.
45th percentile is in position 17(45/100) = 7.65. Point is 42.2.

Template output (Percentile and Percentile Rank Calculations):
x      x-th Percentile
45     43
y: 43.0
Quartiles: 1st Quartile = 31.5, Median = 51, 3rd Quartile = 162.75, IQR = 131.25

1-18. The mean is a central point that summarizes all the information in the data. It is sensitive to extreme observations. The median is a point "in the middle" of the data set and does not contain all the information in the set. It is resistant to extreme observations.
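The contrast between the mean and the median described in 1-18 can be checked numerically. The following is a minimal sketch in Python using the ordered observations of Problem 1-13; the appended value 1000 is a hypothetical extreme observation introduced only for illustration and is not part of any problem's data.

```python
from statistics import mean, median

# Ordered observations from Problem 1-13 (n = 33).
data = [109, 110, 114, 116, 118, 119, 120, 121, 121, 123, 123,
        125, 125, 127, 128, 128, 128, 128, 129, 129, 130, 131,
        132, 132, 133, 134, 134, 134, 134, 136, 136, 136, 136]

# Hypothetical extreme observation, added only to illustrate sensitivity.
with_extreme = data + [1000]

# The mean uses every value, so one extreme point pulls it upward.
print(round(mean(data), 2), round(mean(with_extreme), 2))

# The median depends only on the middle of the ordered data.
print(median(data), median(with_extreme))
```

Under these assumptions the mean moves from about 126.64 to about 152.32, while the median stays at 128.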
The mode is a value that occurs most frequently.

1-19. Mean, median, mode(s) of the observations in Problem 1-13:
Mean: $\bar{x} = \sum x_i / n = 126.64$
Median = 128
Modes = 128, 134, 136 (all have 4 points)
Template output (Measures of Central tendency): Mean 126.63636, Median 128, Mode 128

1-20. For the data of Problem 1-14:
Mean = 11.2533   Median = 11.6   Mode: none

1-21. For the data of Problem 1-15:
Mean = 66.955   Median = 70   Mode = 45
Template output (Measures of Central tendency): Mean 66.954545, Median 70, Mode 45

1-22. For the data of Problem 1-16:
Mean = 3.467   Median = 3   Mode = 1 and 2
Template output (Measures of Central tendency): Mean 3.4666667, Median 3, Mode 1

1-23. For the data of Problem 1-17:
Mean = 199.875   Median = 51   Mode: none
Template output (Measures of Central tendency): Mean 199.875, Median 51, Mode #N/A

1-24. For the data of Example 1-1:
Mean = 163,260   Median = 166,800   Mode: none

1-25. (Using the template: "Basic Statistics.xls", enter the data in column K.)
Basic Statistics from Raw Data, Measures of Central tendency: Mean 21.75, Median 12, Mode 13

1-26. (Using the template: "Basic Statistics.xls")
[Histogram: y-axis Frequency (0 to 2.5); x-axis values -2.6, -1.2, 0.3, 0.6, 3.4, 4.3]
Mean = 0.0514   Median = 0.3   Outliers: none

1-27. Mean = 592.93   Median = 566   Std Dev = 117.03   QL = 546   QU = 618.75
Outliers: 940   Suspected Outlier: 399

1-28. Measures of variability tell us about the spread of our observations.

1-29. The most important measures of variability are the variance and its square root, the standard deviation. Both reflect all the information in the data set.

1-30. For a sample, we divide the sum of squared deviations from the mean by n - 1, rather than by n.

1-31. For the data of Problem 1-13, assumed a sample:
Range = 136 - 109 = 27   Variance = 57.74   Standard deviation = 7.5986
If the data is of a    Sample        Population
Variance               57.7386364    55.9889807
St. Dev.               7.59859437    7.48257848

1-32. For the data of Problem 1-14:
Range = 18 - (-1.2) = 19.2   Variance = 25.90   Standard deviation = 5.0896

1-33.
For the data of Problem 1-15:
Range = 98 - 38 = 60   Variance = 321.38   Standard deviation = 17.927
If the data is of a    Sample        Population
Variance               321.378788    306.770661
St. Dev.               17.9270407    17.5148697

1-34. For the data of Problem 1-16:
Range = 7 - 1 = 6   Variance = 3.98   Standard deviation = 1.995
If the data is of a    Sample        Population
Variance               3.98095238    3.71555556
St. Dev.               1.99523241    1.92757764

1-35. For the data of Problem 1-17:
Range = 1,209 - 23 = 1,186   Variance = 110,287.45   Standard deviation = 332.096
If the data is of a    Sample        Population
Variance               110287.45     103394.484
St. Dev.               332.095543    321.550127

1-36. $n = 33$, $\bar{x} = 126.64$, $s = 7.60$, so $\bar{x} \pm 2s = [111.44, 141.84]$; this captures 31/33 of the data points, so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not apply.

1-37. $n = 15$, $\bar{x} = 11.253$, $s = 5.090$, so $\bar{x} \pm 2s = [1.073, 21.433]$; this captures 14/15 of the data points, so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not apply.

1-38. $n = 22$, $\bar{x} = 66.95$, $s = 17.93$, so $\bar{x} \pm 2s = [31.09, 102.81]$; this captures all the data points, so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not apply.

1-39. $n = 15$, $\bar{x} = 3.467$, $s = 1.995$, so $\bar{x} \pm 2s = [-0.523, 7.457]$; this captures all the data points, so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not apply.

1-40. $n = 16$, $\bar{x} = 199.9$, $s = 332.1$, so $\bar{x} \pm 2s = [-464.3, 864.1]$; this captures 15/16 of the data points, so Chebyshev's theorem holds. The data set is not mound-shaped, so the empirical rule does not apply.

1-41. [Chart with the categories Electrolux, GE, Matsushita, Whirlpool, B-S, Philips, Maytag]

1-42. [Bar chart: Stock 1 through Stock 5; axis from 0 to 20]

1-43. Mean = 0.917   Median = 0.85   Std dev = 0.4569

1-44. [Bar chart: Annual Percentage Yields; y-axis: yields (0 to 2.5); x-axis: Banks (Chase, Citi, Fleet, HSBC, Banco Popular, North Fork, Valley Nat'l, PNC, M&T)]

1-45.
Mean = $18.53   Median = $15.93
Pie chart: Average Book Prices
Adult Nonfiction    31%
Adult Fiction       27%
Children's HC       17%
Adult Trade         17%
Adult MM paper      8%

1-46.

1-47. Using MINITAB
         Stem   Leaves
   4     5      5688
   8     6      0123
  14     6      677789
  (9)    7      002223334
  11     7      55667889
   3     8      224

1-48. [Box and Whisker Plot of C1; axis from 5.5 to 8.5; 34 cases]
There are no outliers. Distribution is skewed to the left.

1-49. A stem-and-leaf display is a quickly drawn type of histogram useful in analyzing data. A box plot is a more advanced display useful in identifying outliers and the shape of the distribution of the data.

1-50.
         Stem   Leaves
   1     0      5
   1     1
   1     2
   7     3      234578
  (13)   4      2234567788899
  11     5      012235678
   2     6      3
   1     7      8

1-51. The data are narrowly and symmetrically concentrated near the median (IQR and the whisker lengths are small), not counting the two extreme outliers.
[Box and Whisker Plot of C1; axis from 0 to 80; 31 cases]

1-52. Wider dispersion in data set #2. Not much difference in the lower whiskers or lower hinges of the two data sets. The high value, 24, in data set #2 has a significant impact on the median, upper hinge and upper whisker values for data set #2 with respect to data set #1.

1-53. Mean = 127   Var = 137   sd = 11.705   mode = 127   outliers: TWA, Lufthansa
[Plot; axis from 100 to 160]

1-54. Stem-and-leaf of C2   Leaf Unit = 1.0   N = 45
   f     Stem   Leaves
  13     1      0011111223444
  18     1      55689
  (6)    2      022333
  21     2      567789
  15     3      0122234
   8     3      78
   6     4      012
   3     4      7
   2     5      23

1-55. Outliers are detected by looking at the data set, or by constructing a box plot or stem-and-leaf display. An outlier should be analyzed for information content and not merely eliminated.

1-56. The median is the line inside the box. The hinges are the upper and lower quartiles. The inner fences are the two points at a distance of 1.5 (IQR) from the upper and lower quartiles. Outer fences are similar to the inner fences but at a distance of 3 (IQR). The box itself represents 50% of the data.

1-57.
Mine A:
   f     Stem   Leaves
   2     3      24
   4     3      57
   7     4      123
  (5)    4      55689
   7     5      123
   4     5
   4     6      0
   3     7      36
   1     8      5

Mine B:
   f     Stem   Leaves
   2     2      34
   4     2      89
   6     3      24
   9     3      578
  (3)    4      034
   7     4      789
   4     5      012
   1     5      9

Values for Mine A are smaller than for Mine B, right-skewed, and there are three outliers. Values for Mine B are larger and the distribution is almost symmetric. There is larger variance in B.

1-58. No. One needs to use descriptive statistics and/or statistical inference.

1-59. Comparing two data sets using Box Plots
                Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
Shipments       1.3             1.975         2.4      3.4           4.2
Market Share    3.6             5.3           6.55     9.275         11.4
[Box plots: Shipments, Market Share]

1-60. Mean = 5.785   Median = 5.782
The mean is impacted by the high rate of fatalities for the very small car classification.
[Chart: Fatality Rates by vehicle class (Minivans, Large cars, Very small cars, Large SUVs, Small cars, Compact pickups, Midsize SUVs, Large pickups, Small SUVs, Midsize cars)]

1-61. Answers will vary.
a. If we add the value 5 to all the data points, then the average, median, mode, first quartile, third quartile and 80th percentile values will each increase by 5. There is no change in the variance, standard deviation, skewness, kurtosis, range and interquartile range values.
b. Average: if we add 5 to all the data points, then the sum of all the numbers will increase by 5*n, where n is the number of data points. The sum is divided by n to get the average. So 5*n / n = 5: the average will increase by 5.
Median: If we add 5 to all the data points, the median value will still be the midway point in the ordered array.
Its value will also increase by 5.
Mode: Adding 5 to all the data points increases the number that occurs most frequently by 5.
First Quartile: adding 5 to all the data points does not change the location of the first quartile in the ordered array of numbers, which is (.25)(n+1), where n is the number of data points. Whether the first quartile falls on a specific data point or between two data points, the resulting value will have been increased by 5.
Third Quartile: adding 5 to all the data points does not change the location of the third quartile in the ordered array of numbers, which is (.75)(n+1), where n is the number of data points. Whether the third quartile falls on a specific data point or between two data points, the resulting value will have been increased by 5.
80th percentile: adding 5 to all the data points has the same effect as in the calculation of the first or third quartile. The value will be increased by 5.
Range: adding 5 to all the data points will have no effect on the calculation of the range. Since both the highest value and the lowest value have been increased by the same number, the subtraction of the lowest value from the highest value still yields the same value for the range.
Variance: adding 5 to all the data points has no effect on the calculation of the variance. Since each data point is increased by 5 and the average has also been shown to increase by the same amount, the differences between each individual new data point and the new average will not change, and so neither will the result of squaring the differences, summing the squared differences and dividing by the number of data points.
Standard Deviation: since the variance is not affected by adding 5 to each data point, neither is the standard deviation.
Skewness: Since each data point is increased by 5 and the average has also been shown to increase by the same amount, the differences between each individual new data point and the new average will not change. Therefore, the numerator in the formula for skewness is not affected. Since the standard deviation (the denominator) is not affected either, there is no change in the value for skewness.
Kurtosis: Since each data point is increased by 5 and the average has also been shown to increase by the same amount, the differences between each individual new data point and the new average will not change. Therefore, the numerator in the formula for kurtosis is not affected. Since the standard deviation (the denominator) is not affected either, there is no change in the value for kurtosis.
Interquartile Range: given that both the first quartile and the third quartile increased by the same amount, 5, the difference between the two values remains the same.
c. Multiplying each data point by a factor 3 results in the following changes. The mean, median, mode, first quartile, third quartile and 80th percentile values will be multiplied by the same factor 3. In addition, the standard deviation and the range will also be multiplied by the same factor 3. The variance will be multiplied by the factor squared, and the skewness and kurtosis values will remain unchanged.
d. Multiplying all data points by a factor 3 and adding a value 5 to each data point has the following results. The order of operations is first to multiply each data point by the factor 3 and then to add the value 5 to each newly multiplied data point. Multiplying each data point by the factor 3 yields the results listed in c). Adding the value 5 to the newly multiplied data points yields the results listed in a).

1-62. $\bar{x} = 74.7$   $s = 13.944$   $s^2 = 194.43$

1-63.
$\mu = 504.688$   $\sigma = 94.547$
Measures of Central tendency: Mean 504.6875, Median 501.5, Mode #N/A
Range 346, IQR 149.5
Measures of Dispersion
If the data is of a    Sample        Population
Variance               9227.5121     8939.15234
St. Dev.               96.0599401    94.5470906

1-64. Step 1: Enter the data from Problem 1-63 into cells Y4:Y35 of the template Histogram.xls from Chapter 1. The template will order the data automatically.
Step 2: We need to select a starting point for the first class, an ending point for the last class, and a class interval width. The starting point of the first class should be a value less than the smallest value in the data set. The smallest value in the data set is 344, so the first class should start with a value smaller than 344. Let's use 320. We also selected 710 as the ending value of the last class, and selected 50 as the interval width. The data input column and the histogram output from the template are presented below. The end-point for each class is included in that class; i.e., the first class of data goes from more than 320 up to and including 370, the second class goes from more than 370 up to and including 420, etc.

1-65. Range: 690 - 344 = 346
90th percentile lies in position 33(90/100) = 29.7. It is 632.7.
First quartile lies in position 33(25/100) = 8.25. It is 419.25.
Median lies in position 33(50/100) = 16.5. It is 501.5.
Third quartile lies in position 33(75/100) = 24.75. It is 585.75.

1-66.

1-67.
         Stem   Leaves
   2     1      24
   7     1      56789
  (3)    2      023
   6     2      55
   4     3      24
   2     3
   2     4      01

1-68. [Box and Whisker Plot of C2; axis from 12 to 42]
The data is skewed to the right.

1-69.
         Stem   Leaves
   3     1      012
   4     1      9
  12     2      1122334
  (9)    2      556677889
   6     3      024
   3     3      57
   1     4
   1     4
   1     5
   1     5
   1     6      2
The data is skewed to the right with one extreme outlier (62) and three suspected outliers (10, 11, 12).

1-70. [Box and Whisker Plot of C1; axis from 0 to 80]

1-71. Mean = 25.857   sd = 9.651
[Bar chart: Stock Prices; y-axis: price (0 to 40); x-axis: Media Cos. (Viacom, Disney, InterActiveCorp, Liberty Media, News Corp, Time Warner, Comcast)]

1-72. Mean = 18.875   var = 38.65   outliers: none
[Box and Whisker Plot of C1; axis from 10 to 34; 16 cases]

1-73. Mean = 33.271   sd = 16.945   var = 287.15   QL = 25.41   Med = 26.71   QU = 35
Outliers: Morgan Stanley (91.36%)
[Box and Whisker Plot of C1; axis from 20 to 100; 15 cases]

1-74. Mean = 3.18   sd = 1.348   var = 1.817   QL = 1.975   Med = 2.95   QU = 3.675
Outliers: 8.70
[Box and Whisker Plot of C1; axis from 1 to 9; 20 cases]

1-75.
a. IQR = 3.5
b. The data is right-skewed.
c. 9.5 is more likely to be the mode, since the data is right-skewed.
d. Will not affect the plot.

1-76. Bar graph showing changes over time. Both the employee's out-of-pocket and payroll deduction expenses have increased substantially over the last three years.

1-77. Mean (billions of tons) = 1.439   Mean (per capita tons) = 9.98
The mathematical computation for both averages is the same, but they differ in meaning. On average, the countries listed emit 1.439 billion tons of carbon dioxide each, whereas the average emission per person is 9.98 tons. Dividing billions of tons by the rate per capita for the US, we get a population estimate of 256 million people, which is close to the actual population for 1997.

1-78. Mean = 2.75   sd = 14.44   var = 208.59   QL = 5.075   Med = 7.9   QU = 13.675
Outliers: -30.2
[Box and Whisker Plot of C1; axis from -40 to 20; 8 cases]

1-79. Mean = 10301.05   sd = 16.916   var = 286.155
(Using the template: "Basic Statistics.xls")
Measures of Central tendency: Mean 10301.05, Median 10300.5, Mode 10300
Range 54, IQR 16.25
Measures of Dispersion
If the data is of a    Sample        Population
Variance               286.155263    271.8475
St. Dev.               16.9161244    16.4877985

1-80. Mean = 99.039   sd = .4366   var = .1907   Median = 99.155

1-81.
Mean = 17.587   sd = .466   var = .2172
Measures of Central tendency: Mean 17.5875, Median 17.5, Mode 18.3
Range 1.4, IQR 0.75
Measures of Dispersion
If the data is of a    Sample        Population
Variance               0.21716667    0.20359375
St. Dev.               0.46601144    0.45121364

1-82. Mean = 29.018   sd = 4.611
(Using the template: "Basic Statistics.xls")
Measures of Central tendency: Mean 29.018, Median 29.75, Mode #N/A
Range 12.38, IQR 2.92
Measures of Dispersion
If the data is of a    Sample        Population
Variance               21.26552      17.012416
St. Dev.               4.6114553     4.12461101

1-83. Mean = 4.8394   sd = .08   Median = 4.86

1-84. Stock Prices for period: April, 2001 through June, 2001
[Answers will vary due to dates used.]

a). Mean and Standard Deviation for Wal-Mart
Basic Statistics from Raw Data: Stock Prices: Wal-Mart
Measures of Central tendency: Mean 51.041478, Median 51.1266, Mode 50.158
Range 6.1911, IQR 1.9613
Measures of Dispersion
If the data is of a    Sample        Population
Variance               2.25711298    2.22128579
St. Dev.               1.50236912    1.49039786
Higher Moments
If the data is of a    Sample        Population
Skewness               0.07083784    0.06913994
(Relative) Kurtosis    -0.711512     -0.7500338

b). Mean and Standard Deviation for K-Mart
Basic Statistics from Raw Data: Stock Prices: K-Mart
Measures of Central tendency: Mean 10.450952, Median 10.66, Mode 11.8
Range 3.51, IQR 1.955
Measures of Dispersion
If the data is of a    Sample        Population
Variance               0.9852023     0.96956417
St. Dev.               0.99257358    0.9846645
Higher Moments
If the data is of a    Sample        Population
Skewness               -0.4070262    -0.3972703
(Relative) Kurtosis    -1.132009     -1.1378913

c). Coefficient of variation: CV = std. dev / mean
For Wal-Mart:
considering the data as a population: CV = 1.49039786 / 51.041478 = 0.0292
considering the data as a sample: CV = 1.50236912 / 51.041478 = 0.02943
For K-Mart:
considering the data as a population: CV = 0.9846645 / 10.450952 = 0.0942
considering the data as a sample: CV = 0.99257358 / 10.450952 = 0.09497

d). There is a greater degree of risk in the stock prices for K-Mart than for Wal-Mart over this three month period.

e).
For DJIA:
considering the data as a population: CV = 427.913791 / 10681.11 = 0.04006
considering the data as a sample: CV = 431.350905 / 10681.11 = 0.04038
Wal-Mart stocks provided a less risky return for this time period relative to DJIA and K-Mart.

f). 100 shares of Wal-Mart stock purchased April 2, 2001:
Price = $50.5674   Cost = $5056.74
Mean of holding 100 shares: 100 x 51.041478 = $5104.15
Std dev of holding 100 shares (multiplying every price by 100 multiplies the standard deviation by 100):
100 x 1.4904 = $149.04 (rounded: if data considered a population)
100 x 1.5024 = $150.24 (rounded: if data considered a sample)

1-85.
a). For a process mean = 2004:
$\mathrm{VARP} = \text{Average SSD2004} + \text{offset}^2 = 3.5 + \text{offset}^2$, where offset = target - process
b). If target = process, then offset = 0. Substituting:
$\mathrm{VARP} = 3.5 + \text{offset}^2 = 3.5 + 0^2 = 3.5$

1-86. a) & b): CPI and gas prices for the period June 97 through May 01. (Non-seasonally adjusted series.) The CPI index was converted (by 100) in order to compare both series on the same chart.
There is no seasonal pattern present in the CPI index. There is a steady trend in the CPI and considerable variability in gas prices. Gas prices increased considerably more than the overall CPI for the same time period.

1-87.
a). Pie Chart: AIDS cases by Age groups
Age Group            No.       %
Under 5              6812      0.90%
Ages 5 to 12         1992      0.26%
Ages 13 to 19        3865      0.51%
Ages 20 to 24        26518     3.52%
Ages 25 to 29        99587     13.21%
Ages 30 to 34        168723    22.38%
Ages 35 to 39        168778    22.39%
Ages 40 to 44        124398    16.50%
Ages 45 to 49        72128     9.57%
Ages 50 to 54        38118     5.06%
Ages 55 to 59        20971     2.78%
Ages 60 to 64        11636     1.54%
Ages 65 or older     10378     1.38%
[Pie chart: AIDS cases by age, with the percentages listed above]

b).
Pie Chart: AIDS cases by Race
Race                             No.       %
White, not Hispanic              324822    43.09%
Black, not Hispanic              282720    37.50%
Hispanic                         137575    18.25%
Asian/Pacific Islander           5546      0.74%
American Indian/Alaska Native    2234      0.30%
Race/ethnicity unknown           1010      0.13%
[Pie chart: AIDS cases by Race, with the percentages listed above]

1-88. (Using the template: "Box Plot 2.xls")
Comparing two data sets using Box Plots: Salaries 2004
             Lower Whisker   Lower Hinge   Median     Upper Hinge   Upper Whisker
Cubs         300000          650000        1550000    5750000       9500000
White Sox    301000          340000        775000     3875000       8000000
[Box plots: Cubs, White Sox]
Outliers:
Cubs: Sosa's salary of $16M
White Sox: Ordonez's salary of $14M
Furthermore, the median salary of the Cubs is twice the median salary of the White Sox. There are some players on both teams making the league minimum salary.
The salary range is somewhat lower for the White Sox than for the Cubs because only seven (7) players on the Cubs were paid $500,000 or less, while eleven (11) players on the White Sox earned less than that amount.

1-89.
[Chart 1: series Errors, OT, Type; axes from 0 to 25 and 0 to 20]
[Chart 2: series Skill, Stress; axes from 0 to 250 and 0 to 18]

Correlation Table:
          Errors      OT          Type        Skill       Stress
Errors    1
OT        0.962672    1
Type      0.036243    0.065654    1
Skill     -0.89162    -0.82627    -0.00447    1
Stress    0.979628    0.926601    0.053555    -0.93428    1

There is high positive correlation between the number of errors and the amount of overtime and stress, but a high negative correlation between the number of errors and skill level. Skill level appears to decrease the number of errors, but overtime and stress add to the number of errors. Overtime is highly correlated with stress and negatively correlated with skill level. Skill level and stress are negatively correlated: the higher the skill level of the employee, the lower the stress level.
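
A lower-triangular correlation table like the one in 1-89 can be produced along the following lines. This is a minimal sketch in Python, not the template's method: the function names `pearson` and `correlation_table` are illustrative, and the column values are hypothetical because the raw data of Problem 1-89 are not reproduced in this solution.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_table(columns):
    """Lower-triangular table as a dict {(row_name, col_name): r}."""
    names = list(columns)
    table = {}
    for i, row in enumerate(names):
        for col in names[: i + 1]:
            table[(row, col)] = pearson(columns[row], columns[col])
    return table

# Hypothetical values for illustration only, NOT the data of Problem 1-89.
columns = {
    "Errors": [2, 4, 5, 7, 9],
    "OT": [1, 3, 3, 6, 8],
    "Skill": [9, 7, 6, 4, 2],
}

table = correlation_table(columns)
for (row, col), r in table.items():
    print(f"{row:>6} vs {col:<6} {r: .4f}")
```

Each entry is read the same way as in the table above: values near +1 indicate that the two variables rise together, and values near -1 indicate that one falls as the other rises.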