251y0312 9/26/03
ECO251 QBA1 FIRST HOUR EXAM October 1, 2003
Name: _____KEY____________
Social Security Number: _____________________

Part I. (32 points)

1. The process of using sample statistics to draw conclusions about true population parameters is called
a) *statistical inference.
b) the scientific method.
c) sampling.
d) descriptive statistics.

2. A summary measure that is computed to describe a characteristic of an entire population is called
a) *a parameter.
b) a census.
c) a statistic.
d) the scientific method.

3. Which of the following is a discrete quantitative variable?
a) the Dow Jones Industrial Average
b) the volume of water released from a dam
c) the distance you drove yesterday
d) *the number of employees of an insurance company

TABLE 1-1
The manager of the customer service division of a major consumer electronics company is interested in determining whether the customers who have purchased a videocassette recorder made by the company over the past 12 months are satisfied with their products.

4. Referring to Table 1-1, the possible responses to the question "Are you happy, indifferent, or unhappy with the performance per dollar spent on the videocassette recorder?", if we write down a 1 for ‘happy’, a 2 for ‘unhappy’ and a 3 for ‘indifferent’, are values of which of the following kinds of random variable?
a) ratio
b) *nominal
c) interval
d) ordinal

TABLE 2-2
At a meeting of information systems officers for regional offices of a national company, a survey was taken to determine the number of employees the officers supervise in the operation of their departments, where X is the number of employees overseen by each information systems officer.

X     f
1     7
2     5
3    11
4     8
5     9

5. Referring to Table 2-2, how many regional offices are represented in the survey results?
a) 127
b) 5
c) 15
d) *40
Explanation: $n = \sum f = 40$.

TABLE 2-5
The following are the durations (in minutes) of a sample of long-distance phone calls made within the continental United States, reported by one long-distance carrier:

Time (in minutes)       Relative frequency
0 but less than 5       0.37
5 but less than 10      0.22
10 but less than 15     0.15
15 but less than 20     0.10
20 but less than 25     0.07
25 but less than 30     0.07
30 but less than 35     0.02

6. Referring to Table 2-5, if 1,000 calls were randomly sampled, how many calls lasted under 10 minutes?
a) 220
b) 370
c) 410
d) *590
Explanation: The answer is the cumulative relative frequency $F_{rel}$ for the 2nd class multiplied by 1000, that is, 0.59(1000) = 590.

class                   $f_{rel}$   $F_{rel}$
0 but less than 5       0.37        0.37
5 but less than 10      0.22        0.59
10 but less than 15     0.15        0.74
15 but less than 20     0.10        0.84
20 but less than 25     0.07        0.91
25 but less than 30     0.07        0.98
30 but less than 35     0.02        1.00

7. If I make a graph of the data in Table 2-5 (assume the table represents a sample of 1000 calls) with the following x and y coordinates for the first five points: {(0, 0), (5, 370), (10, 590), (15, 740), (20, 840)}, a one-word name for this type of graph is _ogive_, and the last point on the line could be (45, _1000_).
Explanation: The x points are the upper limits of the classes, starting with the upper limit of the last empty class before the data (x = 0). The y points are the cumulative frequencies, obtained by multiplying the $F_{rel}$ column by 1000. When the graph gets to x = 35, y hits 1000 and is 1000 for all subsequent points.

8. Referring to Table 2-5, what is $F_{rel}$ for the percentage of calls that lasted under 20 minutes?
a) 0.10
b) 0.76
c) *0.84 (Look at the table.)
d) None of the above. Write in the correct answer.

TABLE 2-7
The stem-and-leaf display below contains data on the number of months between the date a civil suit is filed and when the case is actually adjudicated for 50 cases heard in superior court.

Stem   Leaves
1      234447899
2      22223455678889
3      0011135778
4      02345579
5      112466
6      158

9.
Referring to Table 2-7, the civil suit with the fourth shortest waiting time between when the suit was filed and when it was adjudicated had a wait of _14__ months.
Explanation: The first four numbers are 12, 13, 14, 14.

10. Eunice computes the following statistics from a sample:
$k_3 = \frac{n}{(n-1)(n-2)}\sum (x-\bar{x})^3$, $g_1 = \frac{k_3}{s^3}$, $s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$, $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$, $k_4 = \frac{n(n+1)\sum (x-\bar{x})^4 - 3(n-1)^3 s^4}{(n-1)(n-2)(n-3)}$.
She thinks the sample represents a population that is skewed to the right. Which of the statistics would show skewness and what sign should she expect from them? (No partial credit on this one.)
Answer: Any legitimate measure of skewness would be positive if the population is skewed to the right. From your formula table, the measures of skewness are:
(i) $k_3 = \frac{n}{(n-1)(n-2)}\sum (x-\bar{x})^3$, skewness;
(ii) $g_1 = \frac{k_3}{s^3}$, relative skewness;
(iii) $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}}$, Pearson's measure of skewness.
The other two are $s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$, the sample variance, which is always positive and measures dispersion, and $k_4$, the coefficient of excess (in the outline), which measures kurtosis.

11. In a perfectly symmetrical distribution with one mode,
a) the arithmetic mean equals the median.
b) the median equals the mode.
c) the arithmetic mean equals the mode.
d) *all of the above.
e) none of the above.

12. According to the Bienayme-Chebyshev rule (I called it Chebyshef's Inequality), at least 93.75% of all observations in any data set are contained within a distance of how many standard deviations around the mean?
a) 1
b) 2
c) 3
d) *4
Explanation: If at least 93.75% are ‘in,’ then at most 6.25% are out in the tails. The rule says that $\frac{1}{k^2}$ is the largest possible proportion in the tails, defined as the points more than $k$ standard deviations below the mean and the points more than $k$ standard deviations above the mean. If you try out the values here, you will find $\frac{1}{k^2} = \frac{1}{4^2} = \frac{1}{16} = .0625$, so $k$ must be 4. More directly, you could solve $1 - \frac{1}{k^2} = .9375$ by trying the four values of $k$ that were given. This is a problem that was done in class.

13.
Evaluate the following statements.
(i) The median of the values 3.4, 4.7, 1.9, 7.6, and 6.5 is 4.05.
(ii) In a set of numerical data, the value for Q3 can never be smaller than the value for Q1.
(iii) In a set of numerical data, the value for Q2 is always halfway between Q1 and Q3.
a) (i) and (ii) are false.
b) *(i) and (iii) are false.
c) (ii) and (iii) are false.
d) Only one of the statements is false.
e) All of the statements are false.
Explanation: The numbers in order are 1.9, 3.4, 4.7, 6.5, 7.6, so the median is 4.7 and (i) is wrong. The order of the quartiles is Q1, median, Q3. If all the middle numbers are the same, Q3 could equal both the median and Q1, but it could never be smaller than Q1, so (ii) is true. Q2 is the second quartile and its value could be anywhere between Q1 and Q3, depending on what the numbers are. Only its position is halfway between them, so (iii) is false.

14. Which one of the following statements is false?
a) In a sample of size 40, the sample mean is 15. In this case, the sum of all observations in the sample is $\sum x = 600$.
b) *A population with 200 elements has an arithmetic mean of 10. From this information, it can be shown that the population standard deviation is 15.
c) The median of a data set with 20 items would be the average of the 10th and 11th items in the ordered array.
d) The coefficient of variation measures variability in a data set relative to the size of the arithmetic mean.
e) If every possible group of 10 individuals in the population is equally likely to be chosen to be in the sample, we must be taking a simple random sample of 10.
f) All of the above statements are false.

15. Which of the following is NOT a measure of central tendency?
a) the arithmetic mean
b) the geometric mean
c) the mode
d) *the interquartile range

16. Which of the following is most sensitive to extreme values?
a) the median
b) the interquartile range
c) *the arithmetic mean
d) the 1st quartile

Part II.
(Ng pp 77-79) (8 points)
The data below represent the number of grams of carbohydrates in a serving of breakfast cereal. It is a sample containing 11 numbers. Note: $\sum x = 217$, $\sum x^2 = 4541$.
{11, 15, 23, 29, 19, 22, 21, 20, 15, 25, 17}
Find:
a) The First Quartile (1.5)
b) The Standard Deviation (2)
c) The Coefficient of variation (1.5)
d) The five-number summary (3)

Solution:
a) Put the numbers in order.

$x_1$  $x_2$  $x_3$  $x_4$  $x_5$  $x_6$  $x_7$  $x_8$  $x_9$  $x_{10}$  $x_{11}$
11     15     15     17     19     20     21     22     23     25        29

$n = 11$, so the first quartile is at position $p(n+1) = .25(12) = 3.0$, and $Q1 = x_3 = 15$. Or, if $a.b = 3.0$, $x_{1-p} = x_{.75} = x_a + .b(x_{a+1} - x_a) = x_3 + 0(x_4 - x_3) = 15 + 0(17 - 15) = 15$.

b) $\bar{x} = \frac{\sum x}{n} = \frac{217}{11} = 19.7273$, so, using the computational formula, $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{4541 - 11(19.7273)^2}{10} = \frac{260.17}{10} = 26.017$ and $s = \sqrt{26.017} = 5.101$.

c) $C = \frac{\text{std. deviation}}{\text{mean}} = \frac{s}{\bar{x}} = \frac{5.101}{19.7273} = 0.2586$.

d) For the median, position $p(n+1) = .5(12) = 6.0$, and for the third quartile, position $p(n+1) = .75(12) = 9.0$. So $x_{.50} = x_6 = 20$ and $Q3 = x_{.25} = x_9 = 23$. The 5 number summary would be {lower bound, Q1, median, Q3, upper bound} or 11, 15, 20, 23, 29.

251x0312 9/23/03
ECO251 QBA1 FIRST EXAM October 1, 2003
TAKE HOME SECTION
Name: _____KEY________________
Social Security Number: _________________________

Throughout this exam show your work! Please indicate clearly what sections of the problem you are answering and what formulas you are using.

Part III. Do all the Following (11 Points) Show your work!

1. My Social Security Number is 265398248. If I use each digit as a frequency in the intervals below, I get:

Class (dollars)    Frequency
0-5999             2
6000-11999         6
12000-17999        5
18000-23999        3
24000-29999        9
30000-35999        8
36000-41999        2
42000-47999        4
48000-53999        8

Assume that this data represents a sample of rents paid in Chester County.
a. Calculate the Cumulative Frequency (0.5)
b. Calculate the Mean (0.5)
c. Calculate the Median (1)
d. Calculate the Mode (It is possible but unlikely that there is more than one) (0.5)
e. Calculate the Variance (1.5)
f.
Calculate the Standard Deviation (1)
g. Calculate the Interquartile Range (1.5)
h. Calculate a Statistic showing Skewness and Interpret it (1.5)
i. Make a frequency polygon of the Data (Neatness Counts!) (1)
j. Extra credit: Put a (horizontal) box plot below the histogram using the same scale. (1)

Frequencies from my number: 2 6 5 3 9 8 2 4 8
Replace my Social Security number with your own in the frequency column. To make the problem easier, you may replace all zeros in your new frequency column with 10s.

Solution: $x$ is the midpoint of the class. Our convention is to use the midpoint of 0 to 2, not 1.99999. Note also that the midpoints and class limits have been divided by 1000. Most numbers should be multiplied by 1000, the variance should be multiplied by 1,000,000 and $k_3$ by 1,000,000,000.

class            $f$   $F$   $x$   $fx$   $fx^2$   $fx^3$    $x-\bar{x}$   $f(x-\bar{x})$   $f(x-\bar{x})^2$   $f(x-\bar{x})^3$
A  0- 5.999       2     2     3      6      18        54     -26.1702       -52.340          1369.76          -35846.9
B  6-11.999       6     8     9     54     486      4374     -20.1702      -121.021          2441.02          -49236.0
C 12-17.999       5    13    15     75    1125     16875     -14.1702       -70.851          1003.97          -14226.5
D 18-23.999       3    16    21     63    1323     27783      -8.1702       -24.511           200.26           -1636.1
E 24-29.999       9    25    27    243    6561    177147      -2.1702       -19.532            42.39             -92.0
F 30-35.999       8    33    33    264    8712    287496       3.8298        30.638           117.34             449.4
G 36-41.999       2    35    39     78    3042    118638       9.8298        19.660           193.25            1899.6
H 42-47.999       4    39    45    180    8100    364500      15.8298        63.319          1002.33           15866.6
I 48-53.999       8    47    51    408   20808   1061208      21.8298       174.638          3812.32           83222.1
Total            47               1371   50175   2058075                      0.000         10182.64             400.2

$n = \sum f = 47$, $\sum fx = 1371$, $\sum fx^2 = 50175$, $\sum fx^3 = 2058075$, $\sum f(x-\bar{x}) = 0$, $\sum f(x-\bar{x})^2 = 10182.64$, and $\sum f(x-\bar{x})^3 = 400.2$. Note that, to be reasonable, the mean, median and quartiles must fall between 0 and 54.

a. Calculate the Cumulative Frequency (1): (See above) The cumulative frequency is the whole $F$ column.

b. Calculate the Mean (1): $\bar{x} = \frac{\sum fx}{n} = \frac{1371}{47} = 29.1702$

c. Calculate the Median (2): position $p(n+1) = .5(48) = 24$. This is above $F = 16$ and below $F = 25$, so the interval is E, 24-29.999 in thousands. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$, so $x_{1-.5} = x_{.5} = 24 + \left[\frac{.5(47) - 16}{9}\right]6 = 24 + 0.83333(6) = 29.0000$.

d.
Calculate the Mode (1): The mode is the midpoint of the largest group. Since 9 is the largest frequency, the modal group is E, 24 to 29.999, and the mode is 27 (in thousands).

e. Calculate the Variance (3): $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{50175 - 47(29.1702)^2}{46} = \frac{10182.673}{46} = 221.3625$ or $s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} = \frac{10182.64}{46} = 221.3617$. The computer got 221.362. (in millions)

f. Calculate the Standard Deviation (2): $s = \sqrt{221.3625} = 14.8783$ or $s = \sqrt{221.3617} = 14.8782$ (in thousands)

g. Calculate the Interquartile Range (3):
First Quartile: position $p(n+1) = .25(48) = 12$. This is above $F = 8$ and below $F = 13$, so the interval is C, 12-17.999. $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right]w$ gives us, in thousands, $Q1 = x_{1-.25} = x_{.75} = 12 + \left[\frac{.25(47) - 8}{5}\right]6 = 16.500$.
Third Quartile: position $p(n+1) = .75(48) = 36$. This is above $F = 35$ and below $F = 39$, so the interval is H, 42-47.999. $Q3 = x_{1-.75} = x_{.25} = 42 + \left[\frac{.75(47) - 35}{4}\right]6 = 42.375$.
$IQR = Q3 - Q1 = 42.375 - 16.500 = 25.875$ (in thousands).

h. Calculate a Statistic showing Skewness and interpret it (3):
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{47}{(46)(45)}\left[2058075 - 3(29.1702)(50175) + 2(47)(29.1702)^3\right] = 0.0227053\left[2058075 - 4390844.4 + 2333168.3\right] = 0.0227053(398.9) = 9.057$
or $k_3 = \frac{n}{(n-1)(n-2)}\sum f(x-\bar{x})^3 = \frac{47}{(46)(45)}(400.2) = 9.087$ (The computer gets 9.0849. The first version differs slightly because the mean was rounded before it was cubed.)
or $g_1 = \frac{k_3}{s^3} = \frac{9.085}{14.8782^3} = .00276$
or Pearson's Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(29.1702 - 27)}{14.8782} = 0.4376$
Because of the positive sign, the measures imply skewness to the right.

i. A frequency polygon is a simple line graph with frequency on the y-axis and the numbers 0-54 (thousand) on the x-axis. Since class A has a frequency of 2 plotted at x = 3 and the class width is 6, it should really start at x = -3 and y = 0. You should at least show the line falling across the y axis. Since the last nonempty class is 48-53.999, with its frequency plotted at x = 51, there should be a zero at x = 57.

j. The box plot should show the median and the quartiles. (See text)

2. My Social Security Number is 265398248.
If I write it in clumps of 2 numbers and add 100 to the end, I get: 26, 53, 98, 24, 8, 100. Write your social security number the same way, so that you have a list of six numbers. Note: If any of the five numbers from your social security number is a zero, change it to a one. For these six numbers, compute the a) Geometric Mean, b) Harmonic Mean, c) Root-mean-square (1 point each). Label each clearly. If you wish, d) compute the geometric mean using natural or base 10 logarithms (1 point extra credit each).

Solution: Note that $\sum x = 309$. This is not used in any of the following calculations and there is no reason why you should have computed it!

a) The Geometric Mean.
$x_g = (x_1 x_2 x_3 \cdots x_n)^{\frac{1}{n}} = \sqrt[6]{(26)(53)(98)(24)(8)(100)} = \sqrt[6]{2592844800} = (2592844800)^{\frac{1}{6}} = (2592844800)^{0.16667} = 37.0648$.

b) The Harmonic Mean.
$\frac{1}{x_h} = \frac{1}{n}\sum \frac{1}{x} = \frac{1}{6}\left(\frac{1}{26} + \frac{1}{53} + \frac{1}{98} + \frac{1}{24} + \frac{1}{8} + \frac{1}{100}\right) = \frac{1}{6}(0.0384615 + 0.0188679 + 0.0102041 + 0.0416667 + 0.125 + 0.01) = \frac{1}{6}(0.2442002) = 0.0407000$.
So $x_h = \frac{1}{\frac{1}{n}\sum \frac{1}{x}} = \frac{1}{0.0407000} = 24.5700$.

c) The Root-Mean-Square.
$x_{rms}^2 = \frac{1}{n}\sum x^2 = \frac{1}{6}\left(26^2 + 53^2 + 98^2 + 24^2 + 8^2 + 100^2\right) = \frac{1}{6}(676 + 2809 + 9604 + 576 + 64 + 10000) = \frac{1}{6}(23729) = 3954.83$.
So $x_{rms} = \sqrt{\frac{1}{n}\sum x^2} = \sqrt{3954.83} = 62.8875$.

d) (i) $\ln x_g = \frac{1}{n}\sum \ln(x) = \frac{1}{6}(\ln 26 + \ln 53 + \ln 98 + \ln 24 + \ln 8 + \ln 100) = \frac{1}{6}(3.25809 + 3.97029 + 4.58497 + 3.17805 + 2.07944 + 4.60517) = \frac{1}{6}(21.67594) = 3.6127$.
So $x_g = e^{3.6127} = 37.0644$.
(ii) $\log x_g = \frac{1}{n}\sum \log(x) = \frac{1}{6}(\log 26 + \log 53 + \log 98 + \log 24 + \log 8 + \log 100) = \frac{1}{6}(1.41497 + 1.72428 + 1.99123 + 1.38021 + 0.90309 + 2.00000) = \frac{1}{6}(9.41378) = 1.56896$.
So $x_g = 10^{1.56896} = 37.0649$.

Notice that the original numbers and all the means are between 8 and 100.
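
The three means in problem 2 are easy to check by machine. What follows is only a minimal Python sketch, assuming Python 3.8 or later (for math.prod). The function names and the variable data are illustrative choices and are not part of the exam or the course formula table. Each function applies the corresponding formula from the solution directly, and the logarithm version mirrors part d).

```python
import math


def geometric_mean(xs):
    """n-th root of the product of the values."""
    return math.prod(xs) ** (1.0 / len(xs))


def geometric_mean_logs(xs):
    """Same quantity via natural logs: exp of the average log."""
    return math.exp(math.fsum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the average of the reciprocals."""
    return len(xs) / math.fsum(1.0 / x for x in xs)


def root_mean_square(xs):
    """Square root of the average of the squares."""
    return math.sqrt(math.fsum(x * x for x in xs) / len(xs))


# The six numbers from the example: five two-digit clumps plus 100.
data = [26, 53, 98, 24, 8, 100]

for label, fn in [
    ("geometric mean", geometric_mean),
    ("geometric mean (logs)", geometric_mean_logs),
    ("harmonic mean", harmonic_mean),
    ("root-mean-square", root_mean_square),
]:
    print(f"{label}: {fn(data):.4f}")
```

To check your own answers, replace the list in data with your six numbers. All values must be positive, which is why zeros are changed to ones. For any such list the results should satisfy harmonic mean ≤ geometric mean ≤ root-mean-square, and all three should lie between the smallest and largest numbers.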