4.1 The mean is x̄ = $2118.71 and the median (the middle value) is $1688. The mean is larger than the median because the data set is positively skewed: two very high values pull the mean up. The median is a better representation of a typical value for this data set.

4.6
a. The data is represented in a dotplot below. Since the distribution is not symmetrical, the median would be the preferred measure of center.
b. The ordered data are:
0 0 0 0 0 0 0 59 71 83 106 130 142 142 165 177 189 189 189 201 212 224 236 236 306
Trimming three values from each end (0 0 0 and 236 236 306) leaves 19 values. The mean of the remaining values is 119.947. The trimming percentage is 3/25 × 100 = 12%.
c. You would need to trim 7 values from each side: 7/25 × 100 = 28%. Removing all of the zeros would give an unrepresentative measure of center.

4.10
a. Since the mean is greater than the median in both cases, both distributions are positively skewed, meaning that a few very large values pull the mean up. The difference between the mean and the median is greater for bypass surgery, which indicates that the positive skew is larger for that surgery. It seems that a few patients wait a long time for the surgery.
b. The median wait time is the 50% completion time. That will always be less than the 90% completion time.

4.12 It depends on what the word "average" means. If "average" is the median, then this statement is not possible. But if "average" is the mean, then it is possible. We know that wage distributions are positively skewed, with a few very large values, so the mean is higher than the median. So it makes sense that more than 50% of people will earn less than the mean.

4.14
a. p̂ = 7/10 = .7
b. x̄ = 7/10 = .7. The sample proportion of successes is equal to the mean.
c. p̂ = .8 = x/25 gives x = 20, so you need 13 more successes.

4.15 The median can be calculated by putting the values in order:
170 290 350 480 570 790 860 920 1000+ 1000+
The median is the average of the middle two values: (570 + 790)/2 = 680 hours.
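As a quick check of the median in 4.15, here is a minimal Python sketch. It assumes the two censored "1000+" lifetimes can be stored as infinity, which is a choice made for this illustration and not part of the original solution; the variable names are likewise hypothetical.

```python
import math
import statistics

# Lifetimes in hours from 4.15. The two "1000+" observations are censored,
# so this sketch stores them as infinity (an assumption for illustration).
lifetimes = [170, 290, 350, 480, 570, 790, 860, 920, math.inf, math.inf]

# statistics.median sorts the data and, for an even count,
# averages the two middle values (here 570 and 790).
median = statistics.median(lifetimes)
print(median)
```

Because the censored values sit at the top of the ordering, the two middle values are unaffected by how large those lifetimes really are.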
We cannot calculate a mean, but we could calculate a trimmed mean by trimming two values from each end. The 20% trimmed mean is 661.667 hours.

4.16 19 × 70 = 1330 points earned so far.
(1330 + x)/2000 = .71 gives x = 90.
(1330 + x)/2000 = .72 gives x = 110. Not possible.

4.22 The ordered data are:
55 60 60 110 110 115 130 140 155 180 195 195
Lower quartile: (60 + 110)/2 = 85
Median: (115 + 130)/2 = 122.5
Upper quartile: (155 + 180)/2 = 167.5
Interquartile range: 167.5 − 85 = 82.5 mg/cup

4.24
1, 2, 3, 4, 5: x̄ = 3, s = 1.58
1, 3, 3, 3, 5: x̄ = 3, s = 1.41

1, 2, 3, 4, 5: x̄ = 3, s = 1.58
6, 7, 8, 9, 10: x̄ = 8, s = 1.58

4.27
a. x̄ = (141 + 142 + 178 + 72 + 219 + 138 + 171 + 134 + 210 + 70)/10 = 147.5
Variance: s² = [(141 − x̄)² + (142 − x̄)² + (178 − x̄)² + (72 − x̄)² + (219 − x̄)² + (138 − x̄)² + (171 − x̄)² + (134 − x̄)² + (210 − x̄)² + (70 − x̄)²]/9 = 2505.83
Standard deviation: s = √2505.83 = 50.058
b. Smaller. The data for Memorial Day is much more consistent.
c. Using my calculator to find the standard deviations, I find:

Holiday        Consistent day?   Standard deviation
New Year's     No                50.058
Memorial Day   Yes               18.224
July 4th       No                47.139
Labor Day      Yes               17.725
Thanksgiving   Yes               15.312
Christmas      No                52.370

So we do see that holidays that fall on consistent days tend to have more consistent data, since their standard deviations are smaller.

4.28
a. The lower quartile must be less than the median, so the lower quartile must be less than 14.
b. The upper quartile must be greater than the median and less than the 90th percentile (the upper quartile is the 75th percentile), so the upper quartile is between 13 and 42.
c. The people who wait the most, as defined here, are in the 95th percentile. That must be greater than the 90th percentile, so it must be greater than 42.

4.29
a. There were more houses sold in Los Osos than in Morro Bay, so an average of the two areas would have to account for that fact.
b. The range of values in Paso Robles is much greater than the range of values in Grover Beach, so Paso Robles is likely to have a higher standard deviation.
c.
Since the highest value in Paso Robles is double that in Grover Beach, it is likely that the median value is higher in Paso Robles.

4.30 Using my calculator, the standard deviation of the cases listed is 606.894 and the mean is 747.370. So the mean plus two standard deviations is 747.370 + 2(606.894) = 1961.158 thousands, or $1,961,158.

4.31
a. Using my calculator, I find:

            Mean    Standard deviation   Coefficient of variation
Sample 1    7.81    0.398                0.398/7.81 × 100 = 5.096
Sample 2    49.68   1.739                1.739/49.68 × 100 = 3.500

Sample 1 is measured in ounces, and its values are small compared to Sample 2, which is measured in pounds. It makes sense that the coefficient of variation for Sample 1 would be larger, since adding just a little more or less makes a bigger difference in a small container.

4.32
a. Since the mean is greater than the median, the distribution is positively skewed.
b.
c. There are no outliers at the lower end. An extreme outlier at the upper end would need to be more than three interquartile ranges above the upper quartile. The IQR is 31 − 7 = 24, so an extreme outlier would be above 31 + 3(24) = 103. The largest value given is 205, which is beyond 103, so it is an extreme outlier.

4.33
a. Median: (57.3 + 58.7)/2 = 58
Lower quartile: 53.5
Upper quartile: 64.4
b. The IQR is 64.4 − 53.5 = 10.9, so an outlier would be below 53.5 − 1.5(10.9) = 37.15. Both Alaska and Wyoming are outliers.
c. The median percent of a state's population that was born there and still lives there is 58%. There are two outliers at the lower end. If those are ignored, the distribution is roughly symmetrical.

4.36
a. Fiber: 7 7 7 7 7 8 8 8 8 8 10 10 10 12 12 12 13 14
Median: 8
Lower quartile: 7
Upper quartile: 12
IQR: 12 − 7 = 5
b. Sugar: 0 0 5 6 6 9 9 10 10 10 10 11 11 13 14 17 18 19
Median: 10
Lower quartile: 6
Upper quartile: 13
IQR: 13 − 6 = 7
c. An outlier would be below 6 − 1.5(7) = −4.5 or above 13 + 1.5(7) = 23.5. No, there are no outliers in the sugar values.
d. The fiber data includes 5 values of 7, which is more than 25% of the data.
And so the lowest value is the same as the lower quartile.
e. The median sugar content in the cereal is 10, while the median fiber content is 8. In general, there is more sugar in the cereals than fiber. There is more variability in the sugar content distribution, with a range of 19 and an IQR of 7. The fiber distribution is less variable, with a range of 7 and an IQR of 5. The distribution of the sugar content is roughly symmetrical, while the fiber content distribution is positively skewed.

4.37
a. Since there seem to be outliers at either end, the IQR would be more useful as a measure of variability.
b. Lower quartile: (84 + 79)/2 = 81.5
Upper quartile: 94
IQR: 94 − 81.5 = 12.5
81.5 − 3(12.5) = 44
81.5 − 1.5(12.5) = 62.75
94 + 1.5(12.5) = 112.75
94 + 3(12.5) = 131.5
The data point for farmer is less than 44, so it is an extreme outlier. The data point for student is above 131.5, so it is an extreme outlier. There are no non-extreme outliers.
c.
d. It is reasonable to offer discounts to professions that would qualify as low outliers, which means just farmers would get the discount. It is also reasonable to offer discounts to those in the lower quartile, which would be the last 10 professions on the list.

4.38
a. One standard deviation above the mean: 40 min
One standard deviation below the mean: 30 min
Two standard deviations above the mean: 45 min
Two standard deviations below the mean: 25 min
b. At least 100(1 − 1/2²) = 75%.
c. These are three standard deviations away, so at least 100(1 − 1/3²) = 88.9%. So at most 11.1% are outside the given range.
d. About 95% are within two standard deviations. About .3% lie more than three standard deviations from the mean. Less than 20 minutes is half of that, so .15%.

4.39
a. 27 and 57 mph are one standard deviation below and above the mean, so about 68% of the speeds would be between those values.
b. (100 − 68)/2 = 16%

4.41
a. The values given are two standard deviations above and below the mean. So, using Chebyshev's Rule, at least 75% of the observations must lie between those values.
b. Three standard deviations: above 2.9 and below 70.94.
c. 24.76 − 2(1720) is a negative number, so the distribution cannot be normal.
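The Chebyshev percentages used in 4.38 and 4.41 can be reproduced with a short Python sketch. The function name `chebyshev_lower_bound` is a hypothetical helper introduced here for illustration; it simply evaluates 100(1 − 1/k²).

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum percentage of observations within k standard deviations of the mean.

    Chebyshev's Rule only gives a useful bound for k > 1.
    """
    if k <= 1:
        raise ValueError("Chebyshev's Rule is only informative for k > 1")
    return 100 * (1 - 1 / k**2)


within_2 = chebyshev_lower_bound(2)  # bound for two standard deviations
within_3 = chebyshev_lower_bound(3)  # bound for three standard deviations

print(f"within 2 sd: at least {within_2:.1f}%")
print(f"within 3 sd: at least {within_3:.1f}%")
print(f"outside 3 sd: at most {100 - within_3:.1f}%")
```

The bound holds for any distribution, which is why it is weaker than the 68%, 95%, and 99.7% figures that apply only to approximately normal data.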
4.43 First test: z = (625 − 475)/100 = 1.5. Second test: z = (45 − 30)/8 = 1.875. Since the z-score is higher for the second test, the student scored better relative to others on the second test.

4.48 The percentages below assume the scores are approximately normal.
a. A z-score of 2.2 means you were 2.2 standard deviations above the mean, and you performed better than about 98% of the class.
b. A z-score of .4 means you were .4 standard deviations above the mean, and you performed better than average.
c. A z-score of 1.8 means you were 1.8 standard deviations above the mean, and you are around the 96th percentile.
d. A z-score of 1 means you were 1 standard deviation above the mean, and you performed better than about 84% of the students (the 50% below the mean plus half of the 68% within one standard deviation).
e. A z-score of 0 means you were right at average.
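The z-scores in 4.43 can be checked with a minimal Python sketch. The helper names `z_score` and `normal_percentile` are hypothetical, and the percentile function assumes the scores follow a normal distribution, which the exercises do not state explicitly.

```python
import math


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd


def normal_percentile(z: float) -> float:
    """Percent of a standard normal distribution lying below z.

    Assumes normality; uses the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    """
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


z_first = z_score(625, 475, 100)   # first test in 4.43
z_second = z_score(45, 30, 8)      # second test in 4.43
print(z_first, z_second)

# Percentiles for the z-scores discussed in 4.48, under the normality assumption.
for z in (2.2, 0.4, 1.8, 1.0, 0.0):
    print(f"z = {z}: better than about {normal_percentile(z):.1f}% of scores")
```

Comparing z-scores instead of raw scores puts both tests on the same scale, so the larger value identifies the better relative performance.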