251y0815 6/11/08

ECO251 QBA1
FIRST EXAM
June 11, 2008

Name: ___KEY_________________
Student Number: _________________________
Class Hour: _____________________

Remember: neatness, or at least legibility, counts. In most non-multiple-choice questions an answer needs a calculation or short explanation to count. Show your work! The exam is normed on 50 points, so that any grade above 48 is an A+ and grades wrap around.

Part I. (7 points) The following numbers are to be considered a sample of prices of gasoline taken at five gas stations.

    4.14    4.19    4.25    4.01    3.99

Compute the following: Show your work!
a) The Median (1)
b) The Standard Deviation (3)
c) The 3rd quintile (2)
d) The Coefficient of variation (1)
Extra credit beyond this point!
e) The harmonic mean (1.5)
f) The root-mean square (1.5)
g) The geometric mean (3 ways!) (3)

Solution: My calculations are below. $\log x$ is a logarithm to the base 10 and $\ln x$ is a natural logarithm. The $x^2$, $1/x$, $\log x$ and $\ln x$ columns are computed from the $x$ column.

    Row   x       x in order   x^2       1/x        log x      ln x
    1     4.14    3.99         17.1396   0.241546   0.617000   1.42070
    2     4.19    4.01         17.5561   0.238663   0.622214   1.43270
    3     4.25    4.14         18.0625   0.235294   0.628389   1.44692
    4     4.01    4.19         16.0801   0.249377   0.603144   1.38879
    5     3.99    4.25         15.9201   0.250627   0.600973   1.38379
    Sum   20.58                84.7584   1.21551    3.07172    7.07290

a) The Median (1): The median is the middle number when the data is in order: 4.14.

b) The Standard Deviation (3): We have $n = 5$, $\sum x = 20.58$, $\sum x^2 = 84.7584$ and $\bar{x} = \frac{\sum x}{n} = \frac{20.58}{5} = 4.116$, so
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{84.7584 - 5(4.116)^2}{4} = \frac{0.05112}{4} = 0.01278$.
So that $s = \sqrt{0.01278} = 0.113049$.

If you chose to annoy me by using the definitional formula, you should have gotten the following.

    Row   x       x - xbar   (x - xbar)^2
    1     4.14     0.024     0.000576
    2     4.19     0.074     0.005476
    3     4.25     0.134     0.017956
    4     4.01    -0.106     0.011236
    5     3.99    -0.126     0.015876
    Sum   20.58    0.000     0.05112

You would have $\sum x = 20.58$, $\bar{x} = \frac{\sum x}{n} = \frac{20.58}{5} = 4.116$ and
$s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{0.05112}{4} = 0.01278$.
So that $s = \sqrt{0.01278} = 0.113049$.

c) The 3rd quintile (2): The third quintile has $\frac{3}{5} = .60$ of the data below it. The data must be in order. $\text{location} = p(n+1) = .60(6) = 3.60 = a.b$, so
$x_{1-p} = x_{.40} = x_3 + 0.60(x_4 - x_3) = 4.14 + 0.60(4.19 - 4.14) = 4.17$.

d) The Coefficient of variation (1): $C = \frac{s}{\bar{x}} = \frac{0.113049}{4.116} = .0275$ or 2.75%.

Extra credit beyond this point!

e) The harmonic mean (1.5): The formula table says $\frac{1}{x_h} = \frac{1}{n}\sum \frac{1}{x}$, or
$\frac{1}{x_h} = \frac{1}{5}\left(\frac{1}{4.14} + \frac{1}{4.19} + \frac{1}{4.25} + \frac{1}{4.01} + \frac{1}{3.99}\right) = \frac{1}{5}(1.21551) = 0.243102$.
So $x_h = \frac{1}{0.243102} = 4.1135$.

Of course some of you decided that
$\frac{1}{x_h} = \frac{1}{5}\left(\frac{1}{4.14} + \frac{1}{4.19} + \frac{1}{4.25} + \frac{1}{4.01} + \frac{1}{3.99}\right) \stackrel{?}{=} \frac{1}{5}\left(\frac{1}{4.14 + 4.19 + 4.25 + 4.01 + 3.99}\right) = \frac{1}{5}\left(\frac{1}{20.58}\right) = 0.009718$.
This is, of course, an easier way to do the problem. It is also wrong and unreasonable (since the resulting harmonic mean is not between 3.99 and 4.25), and you will get an A for the course if you can prove to me that it is not wrong! And please don’t try any math if you get on “Are you smarter than a fifth-grader.”

f) The root-mean square (1.5): The formula table says $x_{rms}^2 = \frac{1}{n}\sum x^2$, or $x_{rms} = \sqrt{\frac{1}{n}\sum x^2}$.
$x_{rms}^2 = \frac{84.7584}{5} = 16.95168$. So $x_{rms} = \sqrt{16.95168} = 4.11724$.

g) The geometric mean (3): The formula table says $x_g = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = \sqrt[n]{\prod x}$. So
$x_g = \sqrt[5]{(4.14)(4.19)(4.25)(4.01)(3.99)} = \sqrt[5]{1179.61428} = (1179.61428)^{0.2} = 4.11476$.

The formula table also says $\ln(x_g) = \frac{1}{n}\sum \ln(x)$, but I said in class that this could be either natural logs or logs to the base 10. If we use logarithms to the base 10 we get $\log(x_g) = \frac{1}{n}\sum \log(x) = \frac{3.07172}{5} = 0.614344$ and $x_g = 10^{0.614344} = 4.11476$. If we use natural logarithms we get $\ln(x_g) = \frac{1}{n}\sum \ln(x) = \frac{7.07290}{5} = 1.41458$ and $x_g = e^{1.41458} = 4.11476$.

Part II.
(18 points) According to Anderson, Sweeney and Williams a bank found the following as a sample of 30 waiting times (in seconds) for service.

    Row   Time          Frequency
    1     60-119.99     6
    2     120-179.99    10
    3     180-239.99    8
    4     240-299.99    4
    5     300-359.99    2

a. Calculate the Cumulative Frequency (1)
b. Calculate the Mean (1)
c. Calculate the Median (2)
d. Calculate the Mode (1)
e. Calculate the Variance (3)
f. Calculate the Standard Deviation (2)
g. Calculate the Interquartile Range (3)
h. Calculate a Statistic showing Skewness and interpret it (3)
i. Make a frequency polygon of the data (Neatness Counts!) (2)

Solution: Note that unreasonable answers are answers where the mean, median, mode, first quartile and third quartile do not fall between 60 and 360.

If we use the computational method, we get the following. $x$ is the midpoint of the class.

    Row   Class         f    F    x     fx     fx^2      fx^3
    1     60-119.99     6    6    90    540    48600     4374000
    2     120-179.99    10   16   150   1500   225000    33750000
    3     180-239.99    8    24   210   1680   352800    74088000
    4     240-299.99    4    28   270   1080   291600    78732000
    5     300-359.99    2    30   330   660    217800    71874000
    Sum                 30                5460   1135800   262818000

If we use the definitional method, we get the following. $x$ is the midpoint of the class. I usually tell people that they are wasting their time if they use the definitional method. Because of the large numbers here that may not be true.

    Row   Class         f    F    x     fx     x - xbar   f(x - xbar)   f(x - xbar)^2   f(x - xbar)^3
    1     60-119.99     6    6    90    540    -92        -552          50784           -4672128
    2     120-179.99    10   16   150   1500   -32        -320          10240           -327680
    3     180-239.99    8    24   210   1680   28         224           6272            175616
    4     240-299.99    4    28   270   1080   88         352           30976           2725888
    5     300-359.99    2    30   330   660    148        296           43808           6483584
    Sum                 30                5460              0             142080          4385280

If you used the computational method, you would have gotten $n = \sum f = 30$ and $\sum fx = 5460$, so that the mean is $\bar{x} = \frac{\sum fx}{n} = \frac{5460}{30} = 182.0000$. You would also find $\sum fx^2 = 1135800$ and $\sum fx^3 = 262818000$.

If you used the definitional method, you would have gotten $n = \sum f = 30$ and $\sum fx = 5460$, so that the mean is $\bar{x} = \frac{\sum fx}{n} = \frac{5460}{30} = 182.0000$. You would have followed by getting $\sum f(x - \bar{x}) = 0$ (a check), $\sum f(x - \bar{x})^2 = 142080$ and $\sum f(x - \bar{x})^3 = 4385280$. If you used one of Pearson’s measures of skewness, you would not have bothered with the $f(x - \bar{x})^3$ or the $fx^3$ columns.

a. Calculate the Cumulative Frequency (1): See the F column above.

b. Calculate the Mean (1): We have already found $\bar{x} = 182.0000$.

Remember $n = \sum f = 30$, $\sum fx = 5460$, $\bar{x} = \frac{\sum fx}{n} = \frac{5460}{30} = 182.0000$, $\sum fx^2 = 1135800$, $\sum fx^3 = 262818000$, $\sum f(x - \bar{x}) = 0$, $\sum f(x - \bar{x})^2 = 142080$ and $\sum f(x - \bar{x})^3 = 4385280$.

c. Calculate the Median (2): $\text{position} = p(n+1) = .5(31) = 15.5$. This is above $F = 6$ and below $F = 16$, so the interval is the 2nd, 120 to 180, which has a frequency of 10. Each interval width is $180 - 120 = 60$.
$x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$, so $x_{1-.5} = x_{.5} = 120 + \left[\frac{.5(30) - 6}{10}\right] 60 = 120 + 0.9(60) = 174$.
Check: this is between 120 and 180.

d. Calculate the Mode (1): The largest group is 120 to 180, which has a frequency of 10, so by convention the mode is its midpoint, which is $mo = 150$.

e. Calculate the Variance (3): We have $\sum fx^2 = 1135800$ and $\bar{x} = 182.0000$, or $\sum f(x - \bar{x})^2 = 142080$.
$s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{1135800 - 30(182.0000)^2}{29} = \frac{142080}{29} = 4899.3103$
or
$s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = \frac{142080}{29} = 4899.3103$.

f. Calculate the Standard Deviation (2): $s = \sqrt{4899.3103} = 69.9951$.

g. Calculate the Interquartile Range (3): Note that to be reasonable, $Q_1 \le x_{.50} \le Q_3$.
First Quartile: $\text{position} = p(n+1) = .25(31) = 7.75$. This is above $F = 6$ and below $F = 16$, so the interval is the 2nd, 120 to 180, which has a frequency of 10.
Each interval width is $180 - 120 = 60$.
$x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$ gives us $Q_1 = x_{1-.25} = x_{.75} = 120 + \left[\frac{.25(30) - 6}{10}\right] 60 = 120 + 0.15(60) = 129.00$.
Third Quartile: $\text{position} = p(n+1) = .75(31) = 23.25$. This is above $F = 16$ and below $F = 24$, so the interval is the 3rd, 180 to 240, which has a frequency of 8.
$Q_3 = x_{1-.75} = x_{.25} = 180 + \left[\frac{.75(30) - 16}{8}\right] 60 = 180 + 0.8125(60) = 228.75$.
So $IQR = Q_3 - Q_1 = 228.75 - 129.00 = 99.75$.

h. Calculate a Statistic showing Skewness and interpret it (3): Remember $n = \sum f = 30$, $\sum fx^2 = 1135800$, $\sum fx^3 = 262818000$, $\bar{x} = 182.0000$, $x_{.5} = 174$, $mo = 150$, $s = \sqrt{4899.3103} = 69.9951$, $\sum f(x - \bar{x}) = 0$, and $\sum f(x - \bar{x})^3 = 4385280$.
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{30}{(29)(28)}\left[262818000 - 3(182.000)(1135800) + 2(30)(182.000)^3\right]$
$= 0.0369458\left[262818000 - 620146800 + 361714080\right] = 0.0369458(4385280) = 162017.734$
or
$k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{30}{(29)(28)}(4385280) = 162017.734$
or
$g_1 = \frac{k_3}{s^3} = \frac{162017.734}{69.9951^3} = 0.4725$
or Pearson's Measure of Skewness
$SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}} = \frac{182 - 150}{69.9951} = 0.4572$
or
$SK_2 = \frac{3(\text{mean} - \text{median})}{\text{std. deviation}} = \frac{3(182 - 174)}{69.9951} = 0.3429$.
Because of the positive sign, the measures all imply (slight) skewness to the right.

i. Make a frequency polygon of the data (Neatness Counts!) (2)

    Row   Class         f    F    x     fx     fx^2     fx^3
    0     0-60          0    0    30
    1     60-119.99     6    6    90    540    48600    4374000
    2     120-179.99    10   16   150   1500   225000   33750000
    3     180-239.99    8    24   210   1680   352800   74088000
    4     240-299.99    4    28   270   1080   291600   78732000
    5     300-359.99    2    30   330   660    217800   71874000
    6     360-419.99    0    30   390

The seven points on your graph should be (30, 0), (90, 6), (150, 10), (210, 8), (270, 4), (330, 2) and (390, 0).

Part III. Multiple choice (12 points). Note: If you say ‘None of the above,’ you should supply a correct answer to get full credit.

1. If a distribution is skewed to the right, the following must be true. (Hint: making a diagram first is a good way to prevent errors.)
a. Mean < median < mode
b. Median < mean < mode
c. *Mode < median < mean
d. Mode < mean < median
e. Mean = median = mode
f. None of the above.

2. If I have a population described as grouped data and I am using definitional formulas.
a. $\sum f(x - \mu)$ [relation illegible] $0$
b. $\sum f(x - \mu)$ [relation illegible] $0$
c. $\sum f(x - \mu) = 1$
d. *None of the above.
Solution: For the same reason that $\sum f(x - \bar{x}) = 0$ on page 4, this sum is zero. To do the mathematics,
$\sum f(x - \bar{x}) = \sum fx - \sum f\bar{x} = \sum fx - \bar{x}\sum f = \sum fx - n\bar{x} = \sum fx - n\left(\frac{\sum fx}{n}\right) = \sum fx - \sum fx = 0$.

3. Which of the following does not describe a population?
a. *$\bar{x}$
b. [symbol illegible]
c. The coefficient of variation
d. [symbol illegible]
e. Pearson’s coefficient of skewness.
f. [symbol illegible]
g. All of the above describe a population.

4. Mark the following items N (nominal), O (ordinal), I (interval) or R (ratio) data. If the data is interval or ratio data, would it be considered C (continuous) or D (discrete)? (4)
a) Likert Scale. The format of a typical five-level Likert item is: 1) Strongly disagree; 2) Disagree; 3) Neither agree nor disagree; 4) Agree; 5) Strongly agree. Answer: O
b) Next year’s tuition (in dollars and cents). Answer: RC
c) Place of residence. Answer: N
d) Number of credit cards that you hold. Answer: RD

5. If you make a graph to represent a data set, the following should be plotted at class midpoints.
a. An ogive
b. *A frequency polygon
c. A Pareto diagram
d. All of the above
e. None of the above

Part IV. (13+ points)

Table 1
Given below is a stem-and-leaf display for the amount of gas purchased at a service station. Minitab gives the following information. (SE Mean is the standard deviation divided by the square root of $n$.)

     9 | 147
    10 | 02238
    11 | 125566777
    12 | 223489
    13 | 02

    MTB > describe c1

    Descriptive Statistics: x

    Variable   n    Mean     SE Mean   StDev   Minimum   Q1    Median   Q3    Maximum
    x          25   11.372   0.232     1.158   9.100     ……    ……       ……    13.200

1. In Table 1, what is the median purchase? (2)
Solution: The way that I did these problems is to write out the indices of the numbers as below.
     9 | 147          x1 - x3
    10 | 02238        x4 - x8
    11 | 125566777    x9 - x17
    12 | 223489       x18 - x23
    13 | 02           x24 - x25

It may be clearer if I actually write out the numbers and their indices.

    Index   Value      Index   Value      Index   Value
    1       9.1        10      11.2       19      12.2
    2       9.4        11      11.5       20      12.3
    3       9.7        12      11.5       21      12.4
    4       10.0       13      11.6       22      12.8
    5       10.2       14      11.6       23      12.9
    6       10.2       15      11.7       24      13.0
    7       10.3       16      11.7       25      13.2
    8       10.8       17      11.7
    9       11.1       18      12.2

$\text{location} = p(n+1) = 0.5(26) = 13$, so $\text{median} = x_{13} = 11.6$.

2. Create a 5-number summary from Table 1. (4)
Solution: To find the first quartile, we write $\text{location} = p(n+1) = 0.25(26) = 6.5$. This implies that the first quartile is $x_6 + 0.5(x_7 - x_6) = 10.2 + 0.5(10.3 - 10.2) = 10.25$. For the third quartile, $\text{location} = p(n+1) = 0.75(26) = 19.5$. This implies that the third quartile is $x_{19} + 0.5(x_{20} - x_{19}) = 12.2 + 0.5(12.3 - 12.2) = 12.25$. The five-number summary is thus (9.1, 10.25, 11.6, 12.25, 13.2).

3. In Table 1, assume that you were asked to present the data in 4 classes. Using the method you learned in class, show how you would decide what class interval to use and list the classes below with their frequencies. (4) [10]

          Class                  Frequency
    A     ___ to under ___       ___
    B     ___ to under ___       ___
    C     ___ to under ___       ___
    D     ___ to under ___       ___

Solution: The highest number is 13.2 and the lowest is 9.1. We calculate $\frac{13.2 - 9.1}{4} = 1.025$. Use 1.2 or 1.25.

If we use 1.25, we might get the following.

          Class                   Frequency
    A     8.75 to under 10.0      3
    B     10.00 to under 11.25    7
    C     11.25 to under 12.50    11
    D     12.50 to under 13.75    4

You could also start at 8.50.

If we use 1.2, we might get the following.

          Class                   Frequency
    A     9.0 to under 10.2       4
    B     10.2 to under 11.4      6
    C     11.4 to under 12.6      11
    D     12.6 to under 13.8      4

1.5 would work if you start at 8.25.

4. In Table 1, according to the Tchebyschev inequality, what is the minimum number of observations that should be between 9.056 and 13.688? What would the empirical rule say? Should the empirical rule apply here? Why? (4)

    Variable   n    Mean     SE Mean   StDev   Minimum   Q1    Median   Q3    Maximum
    x          25   11.372   0.232     1.158   9.100     ……    ……       ……

Solution: The mean is 11.372 and $11.372 - (2)1.158 = 9.056$, $11.372 + (2)1.158 = 13.688$. These points are thus 2 standard deviations from the mean. The inequality says that at most $\frac{1}{2^2} = \frac{1}{4}$ of the points should be below 9.056 or above 13.688. So at least 75% of the points should be between these numbers. This is at least 19 points. The empirical rule says that about 95% of the observations should be between these two points. This is about 24 points. The stem-and-leaf diagram shows that the data is roughly symmetrical, so we would expect this to be almost true. Actually all 25 points are in the interval. As usual the inequality gives us an underestimate of the number of points in the interval.

5. Which of the following are not sensitive to extreme values? (Circle all correct answers.) (3)
a. *The mode
b. The mean
c. The variance
d. The coefficient of variation
e. $k_3$, the third k-statistic.
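
The Part IV answers can be checked numerically. The following is a minimal sketch, not part of the original key: it assumes the 25 values read off the stem-and-leaf display, takes the mean and standard deviation from the Minitab output, and uses two hypothetical helper names (`position_quantile` and `chebyshev_min_count`) that implement the location = p(n + 1) rule and the Tchebyschev bound as described in the solutions.

```python
import math
from statistics import mean

# Table 1 values as read off the stem-and-leaf display (n = 25).
data = [9.1, 9.4, 9.7, 10.0, 10.2, 10.2, 10.3, 10.8, 11.1, 11.2, 11.5, 11.5,
        11.6, 11.6, 11.7, 11.7, 11.7, 12.2, 12.2, 12.3, 12.4, 12.8, 12.9,
        13.0, 13.2]

def position_quantile(values, p):
    """Value with a fraction p of the data below it, using location = p(n + 1).

    Hypothetical helper: interpolates between the a-th and (a+1)-th ordered
    values, where a is the integer part of the location.
    """
    xs = sorted(values)
    loc = p * (len(xs) + 1)
    a = int(loc)          # integer part of the location
    frac = loc - a        # fractional part of the location
    if a < 1:
        return xs[0]
    if a >= len(xs):
        return xs[-1]
    return xs[a - 1] + frac * (xs[a] - xs[a - 1])

def chebyshev_min_count(n, k):
    """Smallest whole number of points within k standard deviations of the mean."""
    return math.ceil(n * (1 - 1 / k**2))

# Five-number summary by the p(n + 1) rule.
summary = (min(data),
           position_quantile(data, 0.25),
           position_quantile(data, 0.50),
           position_quantile(data, 0.75),
           max(data))

# Two-standard-deviation interval, using the Minitab mean and StDev.
MEAN, STDEV = 11.372, 1.158
lower, upper = MEAN - 2 * STDEV, MEAN + 2 * STDEV
inside = sum(lower <= x <= upper for x in data)

print("five-number summary:", summary)
print("Tchebyschev minimum:", chebyshev_min_count(len(data), 2))
print("points actually inside:", inside)
```

The quantile helper deliberately follows the exam's interpolation convention rather than a library default, since statistical packages use several different quartile definitions and may not reproduce 10.25 and 12.25.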