251y0212 10/07/02

ECO251 QBA1
FIRST HOUR EXAM
OCTOBER 1, 2002

Name __________________
SECTION MWF 10 11 TR 11 12:30

Part I. (10 points)

1. (Lind et al.) A student takes a survey of heights of 400 college women and divides them into 11 classes. Her first two class midpoints are 62.5" and 65.5". There are 91 women in the first class and 110 women in the second class. (5)
a. What is the (width of the) class interval $w$?
b. What is the lower limit of the third class?
c. What is the relative frequency of the second class?
d. What is the cumulative relative frequency of the second class?
e. If this distribution is skewed to the left, about what percent of the data is above the median?

Solution:
a) Since 65.5 - 62.5 = 3, the interval must be 3.
b) The lower limit must be 67. If the class interval is 3, the class must extend by 1.5 on either side of the midpoint. The layout for the first three classes is given in the table below.

n = 400

Class     Midpoint   f     f_rel = f/n   F     F_rel = F/n
61 - 64   62.5       91    .2275         91    .2275
64 - 67   65.5       110   .2750         201   .5025
67 - 70   68.5       ?     ?             ?     ?

c) The relative frequency is .275.
d) The cumulative relative frequency is .5025, which might be found as the sum of .2275 and .2750.
e) About 50%. The median is defined as a point with 50% of the data above or below it, whatever the skewness.

2. Indicate whether the following are: Nominal Data, Ordinal Data, Interval Data, Continuous Ratio Data or Discrete Ratio Data. (3)
a. A recent cartoon suggested that the numbers on pro football jerseys be replaced by the players' salaries. What kind of data would the numbers be before the change? Ans: Nominal.
b. What kind of data would the numbers usually be considered after the change? Ans: Continuous ratio.
c. What kind of data is a team's score at half time? Ans: Discrete ratio.

3. All my family doctor's patient files are coded as follows: FS (Adult females who currently smoke); FN (Adult females who do not currently smoke); MS (Adult males who currently smoke) and MN (Adult males who do not currently smoke).
Are these categories mutually exclusive and collectively exhaustive? If you say 'no' to either characteristic, explain. (2)
Ans: The categories are mutually exclusive (for instance, no person who is coded FS will be in FN, MS or MN), but not collectively exhaustive, because there is no reason to assume that all the family doctor's patients are adults.

Part II. Compute an appropriate answer, showing your work (15 points maximum - if you do more than 15 points, only your right answers will be counted):

1) A sample of pipe outside diameters gives a mean of 15.0 inches and a standard deviation of 0.1 inches.
a) If the median diameter is 14.9 inches and the mode is unknown:
(i) What is the maximum fraction of the pipes that could have a diameter above 15.2 inches? (1)
Ans: Get a z-score: $k = z = \frac{x - \bar{x}}{s} = \frac{15.2 - 15}{0.1} = 2$. We know from the Chebyshev rule that the fraction in the tail above $k$ is less than $\frac{1}{k^2}$. Since $k = 2$, this fraction is below $\frac{1}{4} = 25\%$.
(ii) Between what two diameters must at least 15/16 of the pipe diameters lie? (1)
Ans: If only $\frac{1}{16} = \frac{1}{k^2}$ are outside the interval, we must have $k^2 = 16$ or $k = 4$. Thus the interval is $\bar{x} \pm ks = 15.0 \pm 4(0.1) = 15.0 \pm 0.4$, or 14.6 to 15.4.
(iii) Is this distribution skewed to the left or the right, or is it symmetrical? (1)
Ans: Since the median is below the mean, it should be skewed to the right.
b) If, instead, the median diameter is 15.0 inches and the mode is also 15.0 inches, between what two diameters must almost all the pipe diameters lie? (1)
Ans: Since $15 \pm 3(0.1) = 15 \pm 0.3$ is 3 standard deviations from the mean, the Empirical Rule (which applies because the mean, median and mode are equal) says that there will be almost none outside 14.7 to 15.3.
c) What is the coefficient of variation for this sample of pipes? (1)
Ans: $C = \frac{s}{\bar{x}} = \frac{0.1}{15} = 0.0067$.

2) The newest computer in the headquarters of a firm that you are liquidating is two months old and the oldest is 97 months old.
a) If the (absolute) frequencies are to be presented in a line graph in seven classes, what intervals would you use? Explain your reasoning using an appropriate formula and use it to fill in the table below. (3)
Ans: $\frac{97 - 2}{7} = 13.57$, so use 14. This is only a suggestion. Any number, like 15, somewhat above 13.57 will work, as long as you cover the range. Two possibilities are shown below. You should have shown only one.

Class   A      B      C      D      E      F      G
From    2      16     30     44     58     72     86
to      15.9   29.9   43.9   57.9   71.9   85.9   99.9

From    0      15     30     45     60     75     90
to      14.9   29.9   44.9   59.9   74.9   89.9   104.9

b) What is the name of the type of graph that you are drawing (Is it a histogram?) and what would the x and y coordinates be of the last point on the line that you draw to represent the frequencies? (2)
Ans: The graph is a frequency polygon and we must create an empty class to end it. For the first classification above, the interval is 14 and the midpoint of the last class on the table is 93, so the last point is $x = 107$, $y = 0$. For the second classification above, the interval is 15 and the midpoint of the last class on the table is 97.5, so the last point is $x = 112.5$, $y = 0$.

3) For the numbers 60, 260, 460 and 660, compute the a) Geometric mean, b) Harmonic mean, c) Root-mean-square (2 each). Label each clearly. If you wish, d) Compute the geometric mean using natural or base 10 logarithms. (3 points if you need it here or two points if you need it in the next section - doing this is insurance; you cannot get more than 15 points on Part II or 25 points on Part III unless you do the extra credit in Part III.)

Solution: Note that $\sum x = 1440$. This is not used in any of the following calculations and there is no reason why you should have computed it!

(a) The Geometric Mean.
$x_g = (x_1 x_2 x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x} = \sqrt[4]{(60)(260)(460)(660)} = \sqrt[4]{4736160000} = (4736160000)^{1/4} = (4736160000)^{0.25} = 262.335$.
Do any of you really believe that $(4736160000)^{1/4} = \frac{1}{4}(4736160000)$?

(b) The Harmonic Mean.
$\frac{1}{x_h} = \frac{1}{n}\sum\frac{1}{x} = \frac{1}{4}\left(\frac{1}{60} + \frac{1}{260} + \frac{1}{460} + \frac{1}{660}\right) = \frac{1}{4}(0.0166667 + 0.0038462 + 0.0021739 + 0.0015152) = \frac{1}{4}(0.0242019) = 0.00605047$.
So $x_h = \frac{1}{\frac{1}{n}\sum\frac{1}{x}} = \frac{1}{0.00605047} = 165.276$.
As I explained several times, $\frac{1}{n}\sum\frac{1}{x}$ could not possibly be $\frac{1}{n}\cdot\frac{1}{\sum x} = \frac{1}{4}\cdot\frac{1}{1440}$.

(c) The Root-Mean-Square.
$x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{4}\left(60^2 + 260^2 + 460^2 + 660^2\right) = \frac{1}{4}(3600 + 67600 + 211600 + 435600) = \frac{1}{4}(718400) = 179600$.
So $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{179600} = 423.792$.

(d) The geometric mean using logarithms.
Using natural logarithms:
$\ln x_g = \frac{1}{n}\sum\ln(x) = \frac{1}{4}(\ln 60 + \ln 260 + \ln 460 + \ln 660) = \frac{1}{4}(4.09434 + 5.56068 + 6.13123 + 6.49224) = \frac{1}{4}(22.2785) = 5.56962$.
So $x_g = e^{5.56962} = 262.335$.
Or using logarithms to the base 10:
$\log x_g = \frac{1}{n}\sum\log(x) = \frac{1}{4}(\log 60 + \log 260 + \log 460 + \log 660) = \frac{1}{4}(1.77815 + 2.41497 + 2.66276 + 2.81954) = \frac{1}{4}(9.67543) = 2.41886$.
So $x_g = 10^{2.41886} = 262.335$.

Notice that the original numbers and all the means are between 60 and 660.

Part III. Do the following problems (25+ Points)

1. I have the following data for March electricity bills at a sample of 6 homes of similar sizes.

92  104  212  135  176  191

Compute the following:
a) The Median (1)
b) The Standard Deviation (4)
c) The 3rd Quintile (2)

Solution: Note that x is in order.

Index   x     x^2      x - xbar    (x - xbar)^2
1       92    8464     -59.6667    3560.11
2       104   10816    -47.6667    2272.11
3       135   18225    -16.6667    277.78
4       176   30976    24.3333     592.11
5       191   36481    39.3333     1547.11
6       212   44944    60.3333     3640.11
Total   910   149906   -0.0003     11889.33

Note that, to be reasonable, the mean, median and 3rd quintile must fall between 92 and 212. You should have done either the third column or the fourth and fifth columns. If you did both you were wasting time.

$n = 6$, $\sum x = 910$, $\sum x^2 = 149906$, $\sum(x - \bar{x}) = 0.00$, $\sum(x - \bar{x})^2 = 11889.33$.

a) Just put the numbers in order and average the middle numbers: $x_{.5} = \frac{x_3 + x_4}{2} = \frac{135 + 176}{2} = 155.5$.
Or formally: position $= p(n + 1) = a.b = .5(7) = 3.5$. $x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.5} = x_{.5} = x_3 + 0.5(x_4 - x_3) = 135 + 0.5(176 - 135) = 155.5$.
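The position rule just used for the median can be sketched in a few lines of Python. This is only an illustration under the convention stated above (position $= p(n+1) = a.b$, then interpolate between $x_a$ and $x_{a+1}$); the function name `quantile_by_position` is hypothetical and not part of the exam.

```python
def quantile_by_position(data, p):
    """Value with fraction p of the data below it, via position p(n+1) = a.b.

    Hypothetical helper illustrating the rule x_a + .b(x_{a+1} - x_a).
    """
    xs = sorted(data)                 # the rule assumes ordered data
    n = len(xs)
    position = p * (n + 1)
    a = int(position)                 # whole-number part of the position
    b = position - a                  # fractional part (.b)
    if a < 1 or a > n or (a == n and b > 0):
        raise ValueError("position falls outside the data")
    if b == 0:
        return float(xs[a - 1])       # lands exactly on an observation
    return xs[a - 1] + b * (xs[a] - xs[a - 1])


bills = [92, 104, 212, 135, 176, 191]
print(quantile_by_position(bills, 0.5))
```

The same function accepts any $p$ whose position lies between 1 and $n$, so it can be reused for other fractiles of the same sample.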
b) $\bar{x} = \frac{\sum x}{n} = \frac{910}{6} = 151.667$.
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{149906 - 6(151.667)^2}{5} = 2377.7$ or $s^2 = \frac{\sum(x - \bar{x})^2}{n - 1} = \frac{11889.33}{5} = 2377.9$.
$s = \sqrt{2377.9} = 48.76$.

c) The 3rd quintile has 60% below it. Position $= p(n + 1) = a.b = 0.6(7) = 4.2$. $a = 4$, $.b = 0.2$.
$x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.6} = x_{.4} = x_4 + 0.2(x_5 - x_4) = 176 + 0.2(191 - 176) = 179$.
(New Formula: position $= 1 + p(n - 1) = a.b = 1 + 0.6(5) = 1 + 3.0 = 4.0$. $a = 4$, $.b = 0.0$. $x_{1-p} = x_a + .b(x_{a+1} - x_a)$, so $x_{1-.6} = x_{.4} = x_4 + 0.0(x_5 - x_4) = 176 + 0.0(191 - 176) = 176$.)

2. A bus line takes a sample of the distances ridden by commuters and gets the following. (Assume that the numbers are a sample.)

amount               frequency   F
less than 5 miles    5           5
5 - 9.99 miles       8           13
10 - 14.99 miles     9           22
15 - 19.99 miles     11          33
20 - 24.99 miles     16          49
25 - 29.99 miles     19          68
30 - 34.99 miles     12          80

a. Calculate the Cumulative Frequency (1)
b. Calculate the Mean (1)
c. Calculate the Median (2)
d. Calculate the Mode (1)
e. Calculate the Variance (3)
f. Calculate the Standard Deviation (2)
g. Calculate the Interquartile Range (3)
h. Calculate a Statistic showing Skewness and Interpret it (3)
i. Make an ogive of the Data (Neatness Counts!) (2)
j. Extra credit: Put a (horizontal) box plot below the ogive using the same scale.

Solution: x is the midpoint of the class. Our convention is to use x as the midpoint of 0 to 2, not of 0 to 1.99999. If you did this using computational formulas, you should have the table below.

Row     class      f    x      fx       fx^2      fx^3
1       below 5    5    2.5    12.5     31.2      78
2       5-9.99     8    7.5    60.0     450.0     3375
3       10-14.99   9    12.5   112.5    1406.3    17578
4       15-19.99   11   17.5   192.5    3368.7    58953
5       20-24.99   16   22.5   360.0    8100.0    182250
6       25-29.99   19   27.5   522.5    14368.7   395141
7       30-34.99   12   32.5   390.0    12675.0   411938
Total              80          1650.0   40400.0   1069312

Definitional formulas give you the table below. There is no reason to do both the tables for computational and definitional formulas.
Row     class      f    x      fx       x - xbar   f(x - xbar)   f(x - xbar)^2   f(x - xbar)^3
1       below 5    5    2.5    12.5     -18.125    -90.625       1642.58         -29771.7
2       5-9.99     8    7.5    60.0     -13.125    -105.000      1378.12         -18087.9
3       10-14.99   9    12.5   112.5    -8.125     -73.125       594.14          -4827.4
4       15-19.99   11   17.5   192.5    -3.125     -34.375       107.42          -335.7
5       20-24.99   16   22.5   360.0    1.875      30.000        56.25           105.5
6       25-29.99   19   27.5   522.5    6.875      130.625       898.05          6174.1
7       30-34.99   12   32.5   390.0    11.875     142.500       1692.19         20094.7
Total              80          1650.0              0.000         6368.75         -26648.4

$n = \sum f = 80$, $\sum fx = 1650$, $\sum fx^2 = 40400$, $\sum fx^3 = 1069312$, $\sum f(x - \bar{x}) = 0$, $\sum f(x - \bar{x})^2 = 6368.75$, and $\sum f(x - \bar{x})^3 = -26648.4$.

Note that, to be reasonable, the mean, median and quartiles must fall between 0 and 35.

a. Calculate the Cumulative Frequency (1): (See above.) The cumulative frequency is the whole F column.

b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{1650}{80} = 20.625$.

c. Calculate the Median (2): position $= p(n + 1) = .5(81) = 40.5$. This is above $F = 33$ and below $F = 49$, so the interval is 20-24.99.
$x_{1-p} = L_p + w\left[\frac{pN - F}{f_p}\right]$, so $x_{1-.5} = x_{.5} = 20 + 5\left[\frac{.5(80) - 33}{16}\right] = 22.1875$.

d. Calculate the Mode (1): The mode is the midpoint of the largest group. Since 19 is the largest frequency, the modal group is 25 to 29.99 and the mode is 27.5.

e. Calculate the Variance (3): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} = \frac{40400 - 80(20.625)^2}{79} = \frac{6368.75}{79} = 80.6171$ or $s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{6368.75}{79} = 80.6171$.

f. Calculate the Standard Deviation (2): $s = \sqrt{80.6171} = 8.97870$.

g. Calculate the Interquartile Range (3):
First Quartile: position $= p(n + 1) = .25(81) = 20.25$. This is above $F = 13$ and below $F = 22$, so the interval is 10-14.99. $x_{1-p} = L_p + w\left[\frac{pN - F}{f_p}\right]$ gives us $Q_1 = x_{1-.25} = x_{.75} = 10 + 5\left[\frac{.25(80) - 13}{9}\right] = 10 + 3.8889 = 13.8889$.
Third Quartile: position $= p(n + 1) = .75(81) = 60.75$. This is above $F = 49$ and below $F = 68$, so the interval is 25-29.99. $Q_3 = x_{1-.75} = x_{.25} = 25 + 5\left[\frac{.75(80) - 49}{19}\right] = 25 + 2.8947 = 27.8947$.
$IQR = Q_3 - Q_1 = 27.8947 - 13.8889 = 14.0058$.
(New Formula: For the median - position $= 1 + p(n - 1) = 1 + 0.5(79) = 40.5$. This is the same result as above. For the first quartile - position $= 1 + p(n - 1) = 1 + 0.25(79) = 20.75$. This leads to the same result as above.
For the third quartile - position $= 1 + p(n - 1) = 1 + 0.75(79) = 60.25$. This leads to the same result as above.)

h. Calculate a Statistic showing Skewness and interpret it (3):
$k_3 = \frac{n}{(n - 1)(n - 2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{80}{79(78)}\left[1069312 - 3(20.625)(40400) + 2(80)(20.625)^3\right] = 0.0129828(-26648.932) = -345.97$
or $k_3 = \frac{n}{(n - 1)(n - 2)}\sum f(x - \bar{x})^3 = \frac{80}{79(78)}(-26648.4) = -345.97$
or $g_1 = \frac{k_3}{s^3} = \frac{-345.97}{(8.9787)^3} = -0.4780$
or Pearson's Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(20.625 - 27.5)}{8.9787} = -2.297$.
Because of the negative sign, the measures imply skewness to the left.

i. Make an ogive of the Data (Neatness Counts!) (2): An ogive is a line graph of the cumulative frequency. It should hit zero on the left at the origin. The next point is 5 at $x = 5$. It continues to rise until $x = 35$, when the height is 80. It ends with a horizontal line.

j. Your box plot should show the first and third quartiles at the left and right of the box and a vertical line across the box at the median. It should be immediately below the ogive and use the same x points.
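As a check on the grouped-data arithmetic in problem 2, the following Python sketch recomputes the mean, variance, quartiles and skewness measures from the class limits and frequencies. It assumes the conventions used in the solution (midpoints 2.5, 7.5, ...; position $p(n+1)$ to locate the class; $L_p + w[(pN - F)/f_p]$ to interpolate). The names `lowers`, `freqs` and `grouped_quantile` are hypothetical and chosen only for this illustration.

```python
import math

# Class lower limits, common width and frequencies from the bus-line table.
lowers = [0, 5, 10, 15, 20, 25, 30]
width = 5
freqs = [5, 8, 9, 11, 16, 19, 12]

n = sum(freqs)
mids = [low + width / 2 for low in lowers]      # class midpoints x

# Mean, sample variance and standard deviation (definitional formulas).
mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / (n - 1)
std = math.sqrt(variance)


def grouped_quantile(p):
    """Interpolated value with fraction p of the data below it."""
    position = p * (n + 1)          # locates the class, as in the solution
    cum = 0                         # cumulative frequency F below the class
    for low, f in zip(lowers, freqs):
        if cum + f >= position:
            return low + width * (p * n - cum) / f
        cum += f
    raise ValueError("position falls outside the data")


median = grouped_quantile(0.5)
q1 = grouped_quantile(0.25)
q3 = grouped_quantile(0.75)
iqr = q3 - q1

# Skewness: k3, relative skewness g1, and Pearson's measure.
k3 = n / ((n - 1) * (n - 2)) * sum(f * (x - mean) ** 3 for f, x in zip(freqs, mids))
g1 = k3 / std ** 3
mode = mids[freqs.index(max(freqs))]            # midpoint of the largest class
pearson_sk = 3 * (mean - mode) / std

print(mean, variance, std)
print(median, q1, q3, iqr)
print(k3, g1, pearson_sk)
```

All three skewness measures come out negative, which agrees with the conclusion that the distribution is skewed to the left.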